## 1. Introduction

A number of studies have proposed to model atmospheric and oceanic multiscale processes by treating the fast variables as additive white noise, which acts as forcing for the dynamics described by a damped linear operator (Hasselmann 1988; Penland and Matrosova 1998; von Storch et al. 1995; Branstator and Haupt 1998). Commonly, this linear operator is obtained by an empirical fitting procedure and includes not only linear processes, but also linear approximations of nonlinear processes. Certainly, a more accurate representation of nonlinear processes would be desirable, but is hampered by the data requirements posed by the estimation of a high-dimensional nonlinear operator.

On the other hand, a number of studies have suggested that tropospheric planetary wave behavior is fundamentally nonlinear (Legras and Ghil 1985; Hansen and Sutera 1986; Itoh and Kimoto 1999). These have been motivated by the highly truncated planetary wave theory of Charney and DeVore (1979) and Wiin-Nielsen (1979) or by the behavior of low-order dynamical systems that are seen as metaphors for planetary wave dynamics (Lorenz 1963; Palmer 1999).

Many of these studies are based on the notion that the nonlinearities are linked to a potential function with multiple wells, which in turn leads to multiple local density maxima in the probability density function (PDF; Hannachi 1997a, b; Palmer 1999). Evidence for multiple modes, or at least pronounced non-Gaussianities, in the PDFs of prominent patterns of observed tropospheric variability has been reported by Kimoto and Ghil (1993), Cheng and Wallace (1993), and Corti et al. (1999), although the statistical significance of some of their results has since been questioned (Hsu and Zwiers 2001; Stephenson et al. 2004).

It is a characteristic of linear stochastic models driven by additive white noise that they produce PDFs that are multivariate Gaussian. Hence, multiscale processes with non-Gaussian PDFs cannot be modeled as such. However, nonlinear stochastic models or linear stochastic models driven by multiplicative noise are capable of producing non-Gaussian PDFs. Consequently, some studies on the non-Gaussianity of planetary wave dynamics have focused on the role of multiplicative noise (Sardeshmukh et al. 2001; Sura 2003), while others have also incorporated the possibility of deterministic nonlinearities (Kwasniok 1996; Majda et al. 2001, 2003).

Recently, we have analyzed a very long GCM integration and have found distinct nonlinearities in the mean tendencies of geopotential heights in planes of the truncated phase space (Branstator and Berner 2005, hereafter BB1). We found that in the leading planes and for daily sampled data, the nonlinearities accounted for up to 40% of the mean square of the mean tendencies (Table 1 of BB1).^{1} Moreover, we found that the directions characterized by strong nonlinearities have PDFs with significant non-Gaussian features (Berner and Branstator 2005, manuscript submitted to *J. Atmos. Sci.,* hereafter BB2). This study examines whether the nonlinearities in the tendencies are sufficient to describe the non-Gaussian features in the PDFs. This is not necessarily so, because both PDFs and tendencies are projected quantities, and the projection has the potential to mask non-Gaussianity and nonlinearity to different degrees (see BB2). Moreover, from tendency information alone, it is impossible to infer the PDF of a system. The ambiguity in linking PDFs and tendencies becomes clear when considering the finite-difference estimates of the mean tendencies (Fig. 1) that were discussed in detail by BB1 and Berner (2003). While, in Fig. 1a, the mean dynamics consist of a nonlinear, undamped motion around two center points, the mean dynamics in Fig. 1b are dominated by damping. An uninformed first guess might be that the former is associated with a bimodal PDF while the latter goes along with a unimodal distribution. In fact, the tendencies in Figs. 1a,b were derived from the same data and the only difference in their construction is the time lag used for the finite-difference estimate. Since the data are the same, the PDF that goes along with these tendencies is in both cases identical and unimodal (Fig. 4a).

One way to link PDFs and tendencies is the Fokker–Planck equation (FPE), which describes the temporal evolution of the PDF as a function of drift and diffusion. As it turns out, the drift is analytically equivalent to the mean tendencies of BB1 and hence the terms drift and mean tendency are interchangeable. In its stationary form, the FPE relates the equilibrium PDF to the mean tendencies by a diffusion term. Having this in mind, the approach of Siegert et al. (1998) is followed, which consists of estimating drift and diffusion coefficients from data and then solving the empirically constructed Fokker–Planck equation. Lately, this approach has been applied to a number of other physical processes (Ditlevsen 1999; Wirth 2001; Egger and Jonsson 2002; Sura 2003). Concurrent with this study, Sura et al. (2005) have independently studied the role of multiplicative noise in modeling observed geopotential height fields.

The FPE describes the probabilistic evolution of stochastic systems that obey the general Langevin equations and can have deterministic nonlinearities and multiplicative noise. Thus, these equations are capable of reproducing the nonlinear and non-Gaussian behavior found in BB1 and BB2 that cannot be produced by linear, additive noise models. Since the parameters of the nonlinear stochastic model are estimated from data, it is necessary to perform the analysis in a highly reduced space.

Although stochastic modeling is commonly applied to a great variety of deterministic continuous physical processes, the underlying assumption of time-scale separation between fast and slow processes is often not carefully tested.^{2} To model a continuous system stochastically it is necessary that it consists of slowly evolving variables with long decorrelation times and is forced by rapidly fluctuating processes with small decorrelation times (e.g., Gardiner 1983; DelSole 2000; Penland 2003a, b). If the rapidly fluctuating processes have a sufficiently small decorrelation time, it might be possible to replace them by white noise. However, if a stochastic model is fitted on a time scale on which the rapid fluctuations are still correlated, the white-noise assumption is violated and the FPE is not applicable. An immediate consequence is that the FPE parameters should not be estimated from the smallest available time lag in the data; instead, it is necessary first to find the time lag for which the noise in the reduced space is decorrelated. Results from DelSole (2000) are used to empirically find such a time lag by inspecting decorrelation rates.

In the context of approximating a deterministic continuous process by a stochastically forced model, the mathematical distinction between Stratonovitch and Itô systems becomes very concrete. When the fast processes of a continuous system are replaced by white noise, the resulting stochastic model converges to a Stratonovitch stochastic differential equation (Wong and Zakai 1965; Papanicolaou and Kohler 1974; Gardiner 1983; Penland 2003a, b). Hence, all our results are necessarily interpreted in the Stratonovitch sense.

With these issues in mind, after describing the data (section 2), the general approach here is to empirically fit a nonlinear model with multiplicative noise to data from a GCM (section 3). Once the system is reduced, we assess how well the reduced model captures the GCM characteristics (section 4) and which aspects of the model are essential for producing these characteristics. In particular, this study focuses on those aspects that cannot be represented by a linear model with additive white noise, namely the importance of a nonlinear deterministic part and the role of multiplicative noise (section 5). While the nonlinear deterministic part describes the nonlinear interactions of the resolved processes, multiplicative noise can be viewed as comprising the nonlinear interactions between resolved and unresolved scales.^{3} If there are linear interactions between resolved and unresolved scales, the empirical fitting procedure will assign them to the deterministic part (BB1). Finally, the dependence of the results on the choice of the time lag is discussed (section 6) and conclusions are drawn (section 7).

## 2. Model and data preprocessing

Data for this investigation come from the same atmospheric GCM used in BB1 and BB2, namely the Community Climate Model Version 0 (CCM0), which was developed at the National Center for Atmospheric Research (NCAR). The formulation of this nine-level, rhomboidal-15 truncation model is described in Williamson (1983). Though no longer considered to be a state-of-the-art GCM, its variability has key similarities to observations, including the scale, structure, and maintenance of the dominant patterns (Branstator 1990, 1992), and is more realistic than the highly simplified models often used in studies of planetary wave nonlinearities.

To provide a data record consisting of 28 million samples, the model was run in perpetual January mode for a total of 14 million days and sampled twice a day. To assure statistical significance the dataset has been divided into two halves and it has been verified that the features for the entire dataset reported here are found also in each subset. For data reduction purposes, the GCM states are expressed in terms of 500-hPa geopotential height, and their dimensionality is further reduced by projecting the departures of this field from its long-term mean onto their leading EOFs (Fig. 2 of BB1). The explained variances of the first four temporal coefficients, also called principal components (PCs), based on 12-hourly data are 8%, 5%, 4%, and 4%, respectively. Together, they represent 21% of the GCM variance (and 38% of the monthly variance). Subsequently, the PCs have been standardized to have a standard deviation of 1.
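The projection onto a leading EOF and the standardization of the resulting PC can be sketched as follows. This is a minimal illustration, not the procedure used for the GCM data: the function name `leading_eof` is hypothetical, the fields are not area weighted, and the leading eigenvector is found by plain power iteration instead of a full eigen-decomposition.

```python
import math

def leading_eof(fields, n_iter=200):
    """Return (eof, standardized_pc, explained_fraction) for the leading EOF.

    fields: list of time samples, each a list of grid-point values.
    """
    n_t, n_x = len(fields), len(fields[0])
    # Departures from the long-term mean at every grid point
    mean = [sum(f[j] for f in fields) / n_t for j in range(n_x)]
    anom = [[f[j] - mean[j] for j in range(n_x)] for f in fields]
    # Covariance matrix of the anomalies
    cov = [[sum(a[i] * a[j] for a in anom) / n_t for j in range(n_x)]
           for i in range(n_x)]
    # Power iteration: converges to the eigenvector with the largest eigenvalue
    eof = [1.0] * n_x
    for _ in range(n_iter):
        w = [sum(cov[i][j] * eof[j] for j in range(n_x)) for i in range(n_x)]
        norm = math.sqrt(sum(v * v for v in w))
        eof = [v / norm for v in w]
    eigenvalue = sum(eof[i] * cov[i][j] * eof[j]
                     for i in range(n_x) for j in range(n_x))
    explained = eigenvalue / sum(cov[i][i] for i in range(n_x))
    # Principal component: projection of each anomaly field onto the EOF
    pc = [sum(a[j] * eof[j] for j in range(n_x)) for a in anom]
    std = math.sqrt(sum(p * p for p in pc) / n_t)
    return eof, [p / std for p in pc], explained
```

The returned PC has unit standard deviation, which matches the standardization described above; higher EOFs would require deflation or a full eigen-solver.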

The procedure of empirically fitting the parameters of a stochastic model makes it necessary to analyze the data in a low-dimensional phase space. In the context of stochastic modeling, this phase space should be spanned by patterns comprising the slowest evolving processes.^{4} By projecting the data onto the leading EOFs, this study concentrates on the largest scale, most frequently occurring, and highest amplitude structures. It is not necessary that this basis automatically separates fast processes from slow processes, and indeed there are bases that are better suited for this purpose, for example, the orthogonal basis spanned by the so-called optimal persistence patterns (DelSole 2001; Crommelin and Majda 2004). However, for the purpose of relating the results to other climatological studies, and to the findings of BB1 and BB2, EOF phase space is used nevertheless as the basis system for our stochastic model. This is helped by the fact that in the case of planetary wave behavior, there tends to be an inverse relationship between the 500-hPa index and the time scale of the associated PC (Branstator et al. 1993).

The decorrelation time *τ*_{d} of the leading twenty PCs is shown in Fig. 2a. It is defined as the time scale after which the autocorrelation function (ACF) *ρ*(*τ*) has decayed to 1/*e* of its value at lag *τ* = 0, that is, *ρ*(*τ*_{d}) = *ρ*(0)*e*^{−1}, where by definition *ρ*(0) = 1. The largest decorrelation time, *τ*_{d} ≈ 18 days, is obtained for PC1, indicating that the physical processes that project onto the first EOF have the longest memory. Generally, the decorrelation times decrease with increasing PC index, confirming that large-scale patterns are in general linked to slowly evolving processes and smaller-scale patterns to faster evolving processes. However, there are exceptions to this general behavior, namely PC10 and PC11, which stand out with their relatively large decorrelation times of *τ*_{d} ≈ 11 days and *τ*_{d} ≈ 7 days, respectively.

The analysis of the decorrelation times in EOF phase space suggests that a two-dimensional stochastic model should resolve PC1 and PC10 as slowly evolving variables and parameterize all other PCs as noise. However, we are especially interested in the dynamics of the plane spanned by EOF1 and EOF4 since, of the planes analyzed by BB1 and BB2, it is characterized by the most prominent nonlinearities in the tendencies (accounting for 40% of the mean squared mean tendencies; BB1) and the strongest non-Gaussian features. Together, the two PCs in this plane explain 12% of the total variance. It is argued here that if the interactions of PC1 and PC4 with other PCs with similarly large decorrelation times are not too strong, a stochastic model of PC1 and PC4 is likely to be successful, because there are still hundreds of PCs with shorter decorrelation times that take the place of the rapidly fluctuating noise. This argument is supported by the fact that stochastic models fitted in other planes and even in three- and four-dimensional subspaces were as successful as the one presented here.
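The 1/*e* definition of the decorrelation time used for Fig. 2a can be sketched as follows. The function names are hypothetical, and the linear interpolation between sampled lags is an assumption made here for illustration.

```python
import math

def autocorrelation(x, max_lag):
    """Sample autocorrelation of the series x for lags 0..max_lag (in steps)."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k))
            / ((n - k) * var)
            for k in range(max_lag + 1)]

def efolding_time(acf, dt):
    """First lag at which the ACF has dropped to 1/e (linear interpolation).

    Returns None if the ACF never reaches 1/e within the sampled lags.
    """
    target = math.exp(-1.0)
    for k in range(1, len(acf)):
        if acf[k] <= target:
            frac = (acf[k - 1] - target) / (acf[k - 1] - acf[k])
            return (k - 1 + frac) * dt
    return None
```

For 12-hourly PCs one would call `efolding_time(autocorrelation(pc, max_lag), dt=0.5)` to obtain the decorrelation time in days.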

## 3. Method

This study fits a stochastic differential equation (SDE) of Langevin type, that is, a nonlinear stochastic model with possibly multiplicative noise, to the combined time series of PC1 and PC4. This is accomplished by estimating drift and diffusion coefficients in the FPE from the data. Subsequently, a stochastic process of Langevin form is found that is consistent with this FPE. Since the FPE describes the evolution of the PDF as a function of time and, unlike the SDE, does not involve the numerical complications of a random forcing, it is used directly whenever possible. However, for some purposes, like the calculation of the autocorrelation function, the associated SDE is integrated to realize a stochastic time series.

An essential issue for the interpretation of the results is the distinction between Itô and Stratonovitch systems. While the former are valid for discrete systems with *δ*-correlated noise and are mathematically and numerically favorable, stochastic models fitted to continuous data are characterized by finitely correlated noise and must be interpreted in the Stratonovitch sense (e.g., Gardiner 1983; Penland 2003a, b). However, it is possible to transform a Stratonovitch system to an equivalent Itô system and vice versa (see appendix). Since Itô systems are easier to integrate, we will use this transformation whenever possible to integrate the SDE. Regardless of the system for which the calculations are performed, all interpretations are made for Stratonovitch systems. The methodology for Itô SDEs is summarized here, and the changes that have to be made for Stratonovitch systems are elaborated on in the appendix. The most prominent difference is that Stratonovitch systems can have a noise-induced drift in addition to the deterministic drift. From a diagnostic standpoint, the contributions of these two drifts to the total drift are not separable.

### a. Itô systems

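The temporal evolution of an *m*-dimensional stochastic variable **ξ** under the influence of a rapidly fluctuating force **Γ**(*t*) is given by the Itô SDE:

$$\frac{d\xi_i}{dt} = h_i(\boldsymbol{\xi}, t) + g_{ij}(\boldsymbol{\xi}, t)\,\Gamma_j(t) \quad (\mathrm{I}), \tag{1}$$

where *h*_{i}(**ξ**, *t*) denotes the deterministic, possibly nonlinear, part and the stochastic function **Γ**(*t*) interpreted in the Itô (I) sense consists of *m* independent *δ*-correlated Gaussian white-noise processes with mean zero and covariance *q*_{ij}:

$$\langle \Gamma_i(t) \rangle = 0, \qquad \langle \Gamma_i(t)\,\Gamma_j(t') \rangle = q_{ij}\,\delta(t - t'). \tag{2}$$

Here, Einstein’s summation convention is used, which assumes summation over repeated indices. Brackets 〈. . .〉 denote the ensemble-averaging operator. If the noise amplitude *g*_{ij} is a function of **x**, that is, state dependent, the noise is called multiplicative. Otherwise, if *g*_{ij} is state independent, the system (1) is said to be forced by additive white noise.

It is not possible to solve (1) for **ξ** as a function of time, because only the statistics, but not the actual values of the noise **Γ** at time *t*, are known. Regardless, an equation for the evolution of the PDF *W*(**x**, *t*) as a function of sharp values **x** of **ξ** and of time can be derived (e.g., Risken 1984; Gardiner 1983) and is given by the FPE:

$$\frac{\partial W(\mathbf{x},t)}{\partial t} = -\frac{\partial}{\partial x_i}\left[A_i(\mathbf{x},t)\,W(\mathbf{x},t)\right] + \frac{1}{2}\,\frac{\partial^2}{\partial x_i\,\partial x_j}\left[B_{ij}(\mathbf{x},t)\,W(\mathbf{x},t)\right]. \tag{3}$$

The parameters *A*_{i}(**x**, *t*) and *B*_{ij}(**x**, *t*) are called drift vector and diffusion matrix, respectively, and are defined as (e.g., Risken 1984):

$$A_i(\mathbf{x},t) = \lim_{\tau \to 0} \frac{1}{\tau} \left\langle \xi_i(t+\tau) - x_i \right\rangle \Big|_{\boldsymbol{\xi}(t)=\mathbf{x}}, \tag{4}$$

$$B_{ij}(\mathbf{x},t) = \lim_{\tau \to 0} \frac{1}{\tau} \left\langle \left[\xi_i(t+\tau) - x_i\right]\left[\xi_j(t+\tau) - x_j\right] \right\rangle \Big|_{\boldsymbol{\xi}(t)=\mathbf{x}}, \tag{5}$$

where **ξ**(*t*) is a single stochastic realization of the SDE (1) that starts at time *t* at **ξ**(*t*) = **x**. For an ergodic system, ensemble averaging and time averaging are interchangeable, and drift and diffusion coefficients can be estimated from the time series of realizations.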
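A bin-averaged, finite-difference estimate of drift and diffusion from a two-dimensional time series might look like the sketch below. It assumes that the drift is the conditional mean displacement per unit lag and the diffusion is the conditional second moment of the displacement per unit lag. The function name, the square bins, and the dictionary output are illustrative choices, and the finite-difference error correction used in this study is not included.

```python
import math
from collections import defaultdict

def estimate_drift_diffusion(states, lag_steps, dt, bin_width):
    """Bin-averaged finite-difference estimates of drift and diffusion.

    states: list of (x, y) samples spaced dt apart.
    Returns {bin_index: (A, B)} with A a 2-tuple and B a 2 x 2 nested tuple.
    """
    tau = lag_steps * dt
    # Per bin: count and sums of dx, dy, dx*dx, dx*dy, dy*dy
    sums = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0, 0.0])
    for t in range(len(states) - lag_steps):
        x, y = states[t]
        dx = states[t + lag_steps][0] - x
        dy = states[t + lag_steps][1] - y
        key = (math.floor(x / bin_width), math.floor(y / bin_width))
        s = sums[key]
        s[0] += 1
        s[1] += dx
        s[2] += dy
        s[3] += dx * dx
        s[4] += dx * dy
        s[5] += dy * dy
    result = {}
    for key, (n, sx, sy, sxx, sxy, syy) in sums.items():
        drift = (sx / n / tau, sy / n / tau)
        diffusion = ((sxx / n / tau, sxy / n / tau),
                     (sxy / n / tau, syy / n / tau))
        result[key] = (drift, diffusion)
    return result
```

Time averaging over all visits to a bin stands in for the ensemble average, which is where the ergodicity assumption enters.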

Drift and diffusion are related to the deterministic part *h*_{i} and the white-noise amplitude *g*_{ij} (e.g., Risken 1984):

$$A_i(\mathbf{x},t) = h_i(\mathbf{x},t), \tag{6}$$

$$B_{ij}(\mathbf{x},t) = g_{ik}(\mathbf{x},t)\,q_{kl}\,g_{jl}(\mathbf{x},t). \tag{7}$$

For a given Itô SDE (1), we can immediately write down the associated FPE using (6) and (7). For multivariate problems, the reverse does not hold: knowing *A*_{i} and *B*_{ij}, we cannot uniquely determine the underlying SDE, because the diffusion tensor *B*_{ij} is invariant under orthogonal transformations of the noise amplitude *g*_{ij}, which introduces a nonuniqueness into (7). Hence, there is not only one stochastic process, but an infinite number of stochastically equivalent processes that all have the same diffusion coefficient, but obey SDEs with different noise amplitudes *g*_{ij} (Deker and Ryter 1980). However, since all stochastically equivalent processes have the same drift and diffusion and thus the same FPE, the temporal and spatial moments of realizations from different members are identical. Thus the knowledge of the FPE for one particular member is sufficient to describe the probabilistic evolution for all members in this class. This fact will be useful when calculating the autocorrelation function of the stochastic model.

### b. Time-scale separation

The decorrelation rate *β* is defined as (DelSole 2000):

$$\beta(\tau) = -\frac{\ln \rho(\tau)}{\tau}, \tag{8}$$

where *ρ*(*τ*) is the autocorrelation at lag *τ*. The meaning of the decorrelation rate is best demonstrated with an example: The one-dimensional linear Markov model *ẋ*(*t*) = −*ax*(*t*) + Γ(*t*), where *a* is a positive real number, is of the form (1) and has a decorrelation time of *τ*_{d} = 1/*a* and a decorrelation rate of *β*(*τ*) = *a*. For this simple system, the decorrelation rate is independent of the lag *τ* and equals the characteristic damping, *a*. Obviously, there is a clear time-scale separation between the decorrelation time of the resolved variable, *τ*_{d} = 1/*a*, and the decorrelation time of the noise, which is zero. DelSole (2000) has shown that if the white noise Γ(*t*) is replaced by a noise with a finite decorrelation time *τ*_{γ}, the decorrelation rate of the system varies linearly with lag for *τ* ≪ *τ*_{γ} and asymptotes to *a* for *τ* ≫ *τ*_{d}. The result for short time lags carries over to nonlinear systems: As long as the decorrelation rate increases linearly with time lag, the noise is not yet decorrelated and a white-noise approximation of the fast processes is not valid.

The decorrelation rates for PC1 and PC4 are plotted using a double logarithmic scale in Figs. 2b and 2c, respectively. For PC1 we observe for short lags *τ* < 1 day a rapid linear increase of the decorrelation rate, followed by a transitional range 1 day ≤ *τ* ≤ 10 days with nonlinear increase. For larger lags *τ* > 10 days, the decorrelation rate shows a linear decrease that is much weaker than the increase for small time lags. The decorrelation rate for PC4 also increases linearly for *τ* < 1 day, but it has a distinct peak at *τ* = 12 days, followed by a sharp drop. With the results of DelSole (2000) in mind, we reason that the decorrelation time of the noise is given by the smallest lag for which *β*(*τ*) does not increase in a linear manner. For the GCM data this amounts to *τ*_{γ} ≈ 1 day, which is a lower limit of the time scale on which stochastic modeling is possible.
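The behavior of the decorrelation rate can be illustrated with a short sketch. It assumes that the rate is computed as −ln *ρ*(*τ*)/*τ*, the form that gives *β* = *a* for the linear Markov model. The second function uses the standard closed-form ACF of a linearly damped variable forced by exponentially correlated (Ornstein–Uhlenbeck) noise; this is a textbook example chosen here for illustration, not the GCM data, and both function names are hypothetical.

```python
import math

def decorrelation_rate(acf, dt):
    """beta(tau) = -ln(rho(tau)) / tau for lags tau = dt, 2*dt, ...

    Requires positive autocorrelations at all supplied lags.
    """
    return [-math.log(acf[k]) / (k * dt) for k in range(1, len(acf))]

def acf_red_noise_forcing(tau, a, b):
    """ACF of x' = -a x + eta, with eta exponentially correlated at rate b != a."""
    return (b * math.exp(-a * tau) - a * math.exp(-b * tau)) / (b - a)
```

With white-noise forcing the rate is flat at *a*; with red-noise forcing it grows roughly like *abτ*/2 for lags much shorter than 1/*b* and approaches *a* at long lags, which is the signature used above to identify the decorrelation time of the noise.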

A natural upper limit for the time scale for which we expect to be able to fit the nonlinear dynamics is given by the decorrelation time *τ*_{d}. For time lags *τ* > *τ*_{d}, the system will have lost most of its memory, and an empirical fit of the dynamics will generally be hampered by insufficient temporal resolution. The intersection between the decorrelation rate *β*(*τ*) and the *e*-folding line 1/*τ* yields the decorrelation time *τ*_{d} (Figs. 2b,c), which is *τ*_{d} = 18 days and *τ*_{d} = 6 days for PC1 and PC4, respectively (see also Fig. 2a).

From the investigation of the decorrelation rates it is inferred that if a stochastic approximation of the GCM dynamics is possible, it should be valid for a range of time lags between the decorrelation time of the noise, *τ*_{γ} ≈ 1 day, and the smallest decorrelation time of the resolved processes, here *τ*_{d} = 6 days for PC4. To resolve as much of the deterministic dynamics as possible, results are presented for a stochastic model on the shortest possible time scale *τ* = 1 day and the outcome is discussed for other time lags in section 6.

## 4. A stochastic model of planetary wave behavior

### a. The probability density function

The two-dimensional drift vector *A*_{i} and the elements of the diffusion tensor *B*_{ij} were calculated using the finite-difference approximations of drift and diffusion (4, 5) and a time lag *τ* = 1 day. This introduces a finite-difference error of order *τ* (Sura and Barsugli 2002), which is derived for the multivariate case in Berner (2003). It is of importance that the finite-difference error is a function of state and thus has the potential to make a state-independent diffusion appear state dependent. Hence, if the diffusion is used to determine if a system is forced by multiplicative noise, it is important to assure that the state dependence is not an artifact of this finite-difference error. To guard against such artifacts, all results in this study are based on drift and diffusion coefficients that are corrected to remove this finite-difference error.

The drift vector (Fig. 1b) is a function of the time lag *τ* and describes the mean dynamical evolution of the trajectory of the system. Theoretically, it is identical to the mean tendencies studied in BB1. The error that results from estimating the drift from finite differences is, in this dataset, typically two orders of magnitude smaller than the drift itself, and corrected and uncorrected drifts are visually indistinguishable. As reported in BB1, the drift can be decomposed into a lag-dependent linear damped operator *A*_{i,lin} and a lag-independent nonlinear operator characterized by a double swirl motion around two special states (similar to the motion depicted in Fig. 1a). Although the drift for *τ* = 1 day is dominated by the linear damping, the nonlinear contributions in orientation and speed are still detectable (Fig. 1b). In particular, this drift cannot be produced by a linear stochastic model forced by additive white noise.

The diffusion tensor for each state **x** is given by a symmetric 2 × 2 matrix. According to (5), it is estimated as the covariance matrix of the displacement *ξ*_{i}(*t* + *τ*) − *ξ*_{i}(*t*) divided by the time lag *τ*. The diagonal elements are estimated from variances and are hence positive. The symmetric off-diagonal elements are based on covariances and can thus be negative. The diffusion components are highly inhomogeneous and characterized by weak domainwide gradients (Fig. 3). They have been corrected for their finite-difference errors, which are typically one order of magnitude smaller and of different structure than the diffusion components. Therefore it is ensured that the diffusion is truly a function of the state **x**, from which it immediately follows that the corresponding SDE is forced by multiplicative noise. Overall, the diffusion, and thus the noise variance, is larger for a state that falls into the lower left quadrant than for one that falls into the upper right quadrant.

The FPE is integrated numerically using a grid spacing Δ*x* = 0.4 and an explicit Eulerian integration scheme with a time step Δ*t* = 0.1 day. If the change of probability with time is below a critical value at each grid point (in this case 10^{−8}), the resulting PDF is assumed to be the stationary solution of the FPE. The stationary solution of the FPE for the time lag *τ* = 1 day is shown together with the GCM’s PDF in Figs. 4a,c. The two PDFs are almost indistinguishable and share the same non-Gaussian features: the shift of the mode, the overall skewness, and the existence of probability ridges indicating preferred patterns. These non-Gaussian features are qualitatively and quantitatively examined in BB2. We found that a good measure for describing deviations from Gaussianity for both low- and high-amplitude states, and one that is not dominated by the apparent skewness, is given by the local dependence (LD; Stephenson et al. 2004):

$$s(x, y) = \frac{f(x, y)}{f_x(x)\,f_y(y)}, \tag{9}$$

where *f*_{x}(*x*) and *f*_{y}(*y*) are the marginals of the joint distribution *f*(*x*, *y*) = *W*(*x*, *y*). It is related to the integrand of the quantity known as mutual information (Shannon and Weaver 1949), but unlike that integrand it does not assign small weight to high-amplitude states. In short, if the data are statistically independent, then *f*(*x*, *y*) factors into its marginals and *s*(*x*, *y*) is 1. For a multivariate Gaussian with zero covariance, the LD is 1 everywhere and deviations of the LD from 1 indicate local dependence and non-Gaussianity. The LD of the GCM PDF and that of the stationary solution of the FPE (Figs. 4b,d) are very similar, with large values at the upper right and left edges of the distribution and for negative PC1 close to the PC4 axis. This demonstrates that even the subtle non-Gaussian features in the GCM are captured remarkably well by the stochastic model.
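On a regular grid, the local dependence can be evaluated as the ratio of the joint PDF to the product of its marginals, as in the following sketch. The function name and the convention of returning `None` where a marginal vanishes are choices made here for illustration.

```python
def local_dependence(pdf, dx, dy):
    """Ratio of a gridded joint PDF to the product of its marginals.

    pdf[i][j] is the joint density at (x_i, y_j) on a regular grid.
    Cells where a marginal vanishes are returned as None.
    """
    f_x = [sum(row) * dy for row in pdf]
    f_y = [sum(pdf[i][j] for i in range(len(pdf))) * dx
           for j in range(len(pdf[0]))]
    return [[pdf[i][j] / (f_x[i] * f_y[j]) if f_x[i] * f_y[j] > 0 else None
             for j in range(len(pdf[0]))]
            for i in range(len(pdf))]
```

A joint PDF that factors into its marginals gives 1 in every cell, while values above or below 1 mark states that occur more or less often than independence would imply.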

### b. Temporal behavior

By estimating drift and diffusion from finite differences at a time lag *τ* = 1 day, a stationary FPE has been found that has virtually the same PDF as the GCM states in the EOF1–EOF4 plane. However, it is unclear if the class of associated stochastic models is also able to capture the temporal behavior of the GCM. Since all members of this class have the same FPE, their temporal behavior is statistically indistinguishable and it is sufficient to investigate the temporal behavior of one candidate. To see to what degree the stochastic model can reproduce the temporal aspects of the GCM, their temporal behavior is compared in two ways. First, the autocorrelation functions of the GCM and the stochastic model are compared. The ACF averages over all states and is thus a global measure of temporal behavior. Secondly, the nonstationary FPE is used to examine the local, that is, state-dependent temporal behavior.

To obtain the ACF of the stochastic model, the SDE is integrated. The system here has finitely correlated noise and thus converges to the Stratonovitch SDE [(A2), see appendix]. Nevertheless, it is favorable to work with the equivalent Itô SDE (A3), since for Itô systems the deterministic drift equals the total drift, *A*_{i} = *h*_{i}(**x**), and does not have to be computed by subtracting the noise-induced drift from the total drift. In addition, numerical schemes for Itô systems have higher accuracy (Penland 2003a, b, and references therein). From the class of stochastically equivalent Itô SDEs, that member is selected whose noise amplitude is the root *g*_{ij}(**x**, *t*) = *E*_{ik}*d*_{k}^{1/2}*E*_{jk} of the diffusion matrix, where the *d*_{k} are the eigenvalues and *E*_{ik} is the eigenmatrix of *B*_{ij}.

The integration of an SDE is not straightforward since the accuracy of numerical schemes for integrating SDEs is different from that for deterministic differential equations.^{5} An Euler (first-order Runge–Kutta) scheme with a time step of Δ*t* = 0.1 day is chosen, which converges weakly to the solution of the Itô SDE with accuracy Δ*t*^{1/2}. This time step is about 10 times larger than the time step recommended by Kloeden and Platen (1992), but of the same order as that reported by Penland (2003b) to yield sufficiently good results. Repeating the numerical integration using more accurate numerical schemes and/or much smaller time steps did not change the results.

Outside the region of three standard deviations away from the mean, the GCM sample sizes are too small for *A*_{i} and *B*_{ij} to be accurately estimated. Hence, the trajectory is randomly reinitialized whenever it is more than three standard deviations from the origin, that is, outside a region to be referred to from now on as the 3*σ* domain. Since autocorrelations for up to 100 days are shown, the Itô SDE is integrated until there are 1.4 × 10^{5} segments of 100 consecutive days, yielding a data record of the same length as the GCM data. Thus, the stochastic time series is a concatenation of a large number of trajectories. The PDF of the integrated SDE and its LD are in very good agreement with the stationary FPE (Figs. 5a,b), indicating that the choices concerning the numerical integration of the SDE were valid. Theoretically, the PDFs of FPE and SDE are identical and any differences are the result of finite sampling and the accuracy of the numerical scheme.
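The integration procedure can be sketched for a two-dimensional Itô SDE as follows. Several ingredients are assumptions made for illustration: `drift` and `diffusion` are user-supplied callables standing in for the empirically estimated fields, the noise components are taken as independent with unit variance, the symmetric matrix root is computed with a closed-form expression valid for symmetric positive definite 2 × 2 matrices (it coincides with the eigen-decomposition root), and reinitialization draws a standard normal state inside the 3*σ* domain, a choice the text does not specify.

```python
import math
import random

def sym_sqrt_2x2(b):
    """Symmetric square root of a symmetric positive definite 2 x 2 matrix."""
    s = math.sqrt(b[0][0] * b[1][1] - b[0][1] * b[1][0])
    t = math.sqrt(b[0][0] + b[1][1] + 2.0 * s)
    return [[(b[0][0] + s) / t, b[0][1] / t],
            [b[1][0] / t, (b[1][1] + s) / t]]

def integrate_ito_sde(drift, diffusion, n_steps, dt, radius=3.0, seed=0):
    """Euler scheme for dx = A(x) dt + g(x) dW with g = B^(1/2).

    The state is redrawn whenever it leaves the disk of the given radius.
    """
    rng = random.Random(seed)

    def draw_start():
        # Hypothetical choice: standard normal start, rejected outside the disk
        while True:
            p = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
            if math.hypot(*p) <= radius:
                return p

    x, y = draw_start()
    path = []
    for _ in range(n_steps):
        a = drift(x, y)
        g = sym_sqrt_2x2(diffusion(x, y))
        # Independent Wiener increments with variance dt
        w0 = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        w1 = rng.gauss(0.0, 1.0) * math.sqrt(dt)
        x, y = (x + a[0] * dt + g[0][0] * w0 + g[0][1] * w1,
                y + a[1] * dt + g[1][0] * w0 + g[1][1] * w1)
        if math.hypot(x, y) > radius:
            x, y = draw_start()
        path.append((x, y))
    return path
```

Because reinitialization breaks temporal continuity, lagged statistics such as the ACF should only be computed within uninterrupted segments of the returned path.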

The ACFs of the integrated SDE and GCM for the first and fourth PC are plotted in Fig. 5. Recall that the stochastic trajectory is reinitialized if it leaves the 3*σ* domain and thus the correlations of the high-amplitude states are systematically discarded. Since this can have a profound effect on the global ACF, the ACF of the SDE is compared to that of the GCM within the 3*σ* domain. For the GCM, omitting high-amplitude events leads to an ACF that has less memory than the ACF using all amplitudes, indicating that the high-amplitude states have a higher autocorrelation than the average state, in particular for PC1.

The autocorrelation functions of the stochastic model capture the general decay rates of the GCM, with shorter decorrelation times for PC4 than for PC1. However, the decorrelation times of the SDE for both PCs are slightly too large, that is, the stochastic model has too much memory. The logarithmic scale in Figs. 5e and 5f reveals further subtleties. Using this scale, an exponential decay is represented as a straight line. We see that the ACFs of the GCM decay approximately exponentially, and for PC1, this decay is captured by the stochastic model over all 100 days. In this representation we become aware of a dynamically interesting feature in PC4. The ACF of the GCM shows a dip for lags of 10 to 12 days. Since the ACF is still positive, this feature cannot be explained by a simple oscillation. The stochastic model does not capture this dip; however, it reproduces the overall decay rate up to lags of 75 days very well. Thus, the ACF of the stochastic model can be seen as an upper envelope for the ACF in the GCM.

Since the decorrelation times are directly linked to the damped component of the drift and the damping is lag dependent (DelSole 2000; BB1), it is expected that decorrelation times will be a function of the lag *τ* used for estimating the coefficients in the SDE. Indeed, the decorrelation times of the SDE decrease for estimates based on lags of *τ* > 1 day and agree much better with the decorrelation times of the GCM, as will be discussed in section 6.

Though the global autocorrelation function is a good way to assess the overall temporal behavior of the stochastic model, it is expected that nonlinearities lead to state-dependent autocorrelations. Evidence of this was seen in the domain dependence of the GCM ACFs. To systematically study local behavior, the nonstationary FPE is employed to obtain the temporal evolution as function of the large-scale state. For many purposes, this approach addresses directly the issues at hand and circumvents the computational problems of integrating an SDE.

To find the temporal evolution of a state **x**_{0}(*t*_{0}), the following approach is adopted: First, the FPE is initialized with the mean of all the GCM states that fall into a certain subdomain of the PC1–PC4 plane. Then, the temporal evolution of the conditional PDF *P*(**x**,*t*_{0} + *δt*|**x**_{0},*t*_{0}) is computed by integrating the time-dependent FPE (3) and comparing it to the evolution of these states in the GCM. As an illustration the conditional PDF for the state in the center of the subdomain denoted by the black box in Fig. 6 is presented after 1, 5, and 10 days (Figs. 6d–f), and compared to the conditional PDF of the GCM states that fall into this subdomain at the same times (Figs. 6a–c). Overall, the temporal evolution of the FPE resembles that of the GCM states to a remarkable degree. The position of the mode of the conditional PDF in the PC1 direction is captured very well. However, it moves too slowly in the PC4 direction. This behavior is consistent with the earlier finding that PC4 has too much memory in the stochastic model. The FPE starting from this particular initial state **x**_{0} correctly assigns more variance in the PC4 direction than in the PC1 direction. However, the overall variance is underestimated, which leads to a conditional PDF that is more sharply peaked than that of the GCM. The covariance structure of the conditional PDF, which corresponds to the spreading of the “cloud of states” in diagonal directions, is well captured. This example demonstrates that the conditional PDF can be interpreted as an ensemble prediction system, where the drift vector predicts the ensemble mean state after *δt* days and the diffusion measures the uncertainty (or predictability) of this prediction.
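The evolution of a conditional PDF under a time-dependent FPE can be illustrated in one dimension. The sketch assumes the FPE has the form ∂*W*/∂*t* = −∂(*AW*)/∂*x* + ½ ∂²(*BW*)/∂*x*², with the diffusion defined as the second moment of the displacement per unit time. It uses an explicit Euler step in flux form with zero-flux boundaries; the function names and the point-like initial condition are illustrative, and the two-dimensional solver used for the figures is not reproduced here.

```python
def step_fpe_1d(w, x, drift, diffusion, dt):
    """One explicit Euler step of dW/dt = -d(A W)/dx + 0.5 d^2(B W)/dx^2.

    Written in flux form with zero flux through both ends of the grid,
    so that total probability is conserved.
    """
    dx = x[1] - x[0]
    aw = [drift(xi) * wi for xi, wi in zip(x, w)]
    bw = [diffusion(xi) * wi for xi, wi in zip(x, w)]
    # Probability flux at the interfaces between cells i and i + 1
    flux = [0.5 * (aw[i] + aw[i + 1]) - 0.5 * (bw[i + 1] - bw[i]) / dx
            for i in range(len(x) - 1)]
    flux = [0.0] + flux + [0.0]
    return [w[i] - dt * (flux[i + 1] - flux[i]) / dx for i in range(len(x))]

def conditional_pdf(x, i0, drift, diffusion, dt, n_steps):
    """Evolve a PDF concentrated in grid cell i0 for n_steps time steps."""
    dx = x[1] - x[0]
    w = [0.0] * len(x)
    w[i0] = 1.0 / dx
    for _ in range(n_steps):
        w = step_fpe_1d(w, x, drift, diffusion, dt)
    return w
```

For a linear drift −*ax* and constant diffusion *b*, the mean of the conditional PDF should decay like *x*₀ exp(−*at*) and its variance should grow toward *b*/(2*a*), which provides a simple check of the scheme. The explicit step is only stable if the time step is small compared with Δ*x*² divided by the diffusion.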

Next, this approach is repeated for ensembles in each subdomain and the conditional PDF after 1, 5, and 10 days is summarized by its mean and covariance structure. In 2D, this amounts to five independent parameters: two for the mean, two for the variances, and one for the covariance. The mean position is summarized after the time *δt* by calculating the root-mean-square displacement of the mean from its initial position **x**_{0} (Fig. 7). This quantity tells us how far the mean of the conditional PDF moved within the given time interval and the differences between the first and second rows of Fig. 7 are a measure of the mean error of the stochastic model. The variances and covariances of the conditional PDF are displayed separately in Figs. 8–10. All values in Figs. 7–10 are plotted at the coordinates of the initial state **x**_{0}.
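The five summary parameters can be computed directly from an ensemble of states. The following is a minimal sketch, not the procedure used for Figs. 7–10: the function name `summarize_conditional_pdf` and the array layout are hypothetical, the sample covariance uses the unbiased (n − 1) normalization, and the displacement of the mean is taken to be the Euclidean distance from **x**_{0} in the PC1–PC4 plane, which is an assumption about how the root-mean-square displacement is defined.

```python
import numpy as np

def summarize_conditional_pdf(ensemble, x0):
    """Summarize a 2D ensemble by mean, variances, covariance, and mean displacement.

    ensemble : array of shape (n_members, 2), states (PC1, PC4) at time t0 + dt
    x0       : array of shape (2,), initial state at time t0
    """
    ensemble = np.asarray(ensemble, dtype=float)
    x0 = np.asarray(x0, dtype=float)
    mean = ensemble.mean(axis=0)               # two parameters for the mean
    cov = np.cov(ensemble, rowvar=False)       # 2x2 sample covariance (ddof=1)
    return {
        "mean": mean,
        "var_pc1": cov[0, 0],                  # variance in the PC1 direction
        "var_pc4": cov[1, 1],                  # variance in the PC4 direction
        "cov_pc1_pc4": cov[0, 1],              # single independent covariance
        # Assumed definition: Euclidean distance of the mean from x0.
        "displacement": float(np.linalg.norm(mean - x0)),
    }

# Hypothetical four-member ensemble that started at the origin.
summary = summarize_conditional_pdf([[1, 0], [3, 0], [2, 1], [2, -1]], [0, 0])
```

Plotting each entry of the summary at the coordinates of **x**_{0} for every subdomain would give maps analogous in layout to those described above.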

Overall, the time-dependent FPE predicts the evolution of the mean state in the GCM very well, although the displacement is slightly too small for longer time lags (Fig. 7). This error is mostly introduced by the PC4 component and is qualitatively similar to that discussed with respect to Fig. 6. The state-dependent variance of PC1 in the conditional PDF is also modeled well (Fig. 8). Initially, for both GCM and nonstationary FPE, the variance is strongest for states with large negative PC4. After five or more days, states that initially started in the lower right quadrant exhibit the largest variance, while states with negative PC1 spread much more slowly. The variance of the conditional PDF with regard to PC4 is not modeled nearly as well (Fig. 9). The FPE correctly models the large variances for states with negative PC1 for lags up to *δt* = 5 days, but fails to model the detailed structure of the variance for longer time lags *δt* > 5 days. Quite marked is the excellent agreement in the state-dependent covariance structure and amplitude of the conditional PDFs of the FPE and GCM (Fig. 10).

## 5. The role of the nonlinearities in the drift

Now that a two-dimensional SDE has been found that models not only the main characteristics of the global and state-dependent temporal evolution, but also the subtle non-Gaussianities in the PDF, we return to the question of whether the nonlinearities in the drift are necessary for a good agreement between the stochastic model and the GCM. To address this issue, it is examined which terms in the FPE are instrumental in producing this agreement. It is not attempted to determine if the source of the drift nonlinearities is deterministic or noise-induced, since, as discussed in the appendix, this distinction cannot be made in the empirical context.

First, the stationary FPE for a process with the same nonlinear drift as the GCM, but with a state-independent diffusion tensor *B*_{ij}, is found. To preserve the average diffusion, the elements of the diffusion tensor are replaced with their spatially averaged values given in the left bottom corner of Figs. 3a–c. The diffusion tensor *B*_{ij} still consists of three independent entries, but they are not a function of **x** any more. This FPE is consistent with a Stratonovitch SDE with a nonlinear deterministic part equal to the drift, *h*_{i} = *A*_{i}, and additive white noise with amplitude *g*_{ij} (see appendix). Since the GCM data call for multiplicative noise, this is a simplification of the stochastic model found in section 4. The stationary solution of the FPE with this simplification is shown in Fig. 11a. The general features of the PDF are still captured well, although the probability ridge in the upper right quadrant is slightly too weak. The positions of the extrema in the LD are modeled accurately (Fig. 11b), but the amplitudes of the minima are too small. From this, one can conclude that a model with nonlinear deterministic part *A*_{i} forced by additive white noise is sufficient to explain the major non-Gaussian features in the PDF. While in the simplified model, the nonlinearities are purely deterministic, no claim can be made that the drift nonlinearities in the complete model are also deterministic and not noise induced. The global temporal characteristics of the GCM are well represented by the simplified model. However, the state-dependent temporal behavior does not contain the contribution of the state-dependent diffusion and is not nearly as good as for the stochastic model with multiplicative noise (not shown).

Although the diffusion tensor is state independent, it has nonvanishing off-diagonal terms *B*_{12} = *B*_{21}. This means that the noise processes forcing PC1 and PC4 are correlated via the coupling terms *g*_{12} and *g*_{21}. Is it possible to neglect this coupling and still get a good stationary PDF? To answer this, *B*_{12} = *B*_{21} is set to zero and *B*_{11} and *B*_{22} are kept constant, producing a nonlinear stochastic model forced by uncorrelated additive white noise. As seen in Fig. 11c, the stationary solution of this simplified model departs significantly from that of the more complete model. The probability ridge in the right upper quadrant is now highly overemphasized and the ridge in the upper left quadrant has disappeared. This structural change is also reflected in the LD (Fig. 11d). From this it can be concluded that correlated noise processes are necessary for a good stochastic representation.
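The difference between correlated and uncorrelated additive noise can be illustrated numerically. The sketch below assumes the normalization *B* = *g* *g*^T between the constant diffusion tensor and the noise amplitude, which may differ by a constant factor from the convention of the FPE (3); the entries of `B_mean` are hypothetical placeholders, not the spatially averaged values of Figs. 3a–c, and the Cholesky factor is only one of many admissible choices of *g*.

```python
import numpy as np

def noise_amplitude(B):
    """Return a lower-triangular g with g @ g.T == B (Cholesky factor)."""
    return np.linalg.cholesky(np.asarray(B, dtype=float))

def drop_cross_diffusion(B):
    """Set the off-diagonal terms B_12 = B_21 to zero, keeping B_11 and B_22."""
    return np.diag(np.diag(np.asarray(B, dtype=float)))

# Hypothetical state-independent diffusion tensor with negative cross diffusion.
B_mean = np.array([[1.0, -0.3],
                   [-0.3, 0.8]])

g_corr = noise_amplitude(B_mean)                          # correlated forcing
g_uncorr = noise_amplitude(drop_cross_diffusion(B_mean))  # uncorrelated forcing

# Draw independent unit-variance Gaussian noise and apply the amplitude matrix.
rng = np.random.default_rng(0)
xi = rng.standard_normal((2, 100_000))
sample_cov = np.cov(g_corr @ xi)   # approaches B_mean for large samples
```

With `g_corr` the forcing of the two components is correlated through the off-diagonal entry of the amplitude matrix, whereas `g_uncorr` is diagonal and forces each component independently.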

To address whether a nonlinear drift *A*_{i} is necessary to obtain the non-Gaussian features, or if a diffusion tensor *B*_{ij} consistent with multiplicative noise alone is sufficient to model the inhomogeneities in the PDF, the stationary solution of the FPE with the full diffusion tensor *B*_{ij} is calculated, but the nonlinear drift is replaced by the linear drift obtained from a linear regression, *A*_{i} = *A*_{i,lin}, analogous to that used to construct linear inverse models (Penland and Magorian 1993). The details of this linear regression are described in BB1. From the stationary solution and its LD (Figs. 11e,f), it follows that a model with a linear drift and full state-dependent diffusion is not able to model the non-Gaussian features in the GCM PDF. The conclusion is that a nonlinear drift is necessary to correctly model the non-Gaussianities in the GCM.
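A linear drift of this kind can be sketched as an ordinary least-squares fit of finite-difference tendencies onto the state. This is an illustrative stand-in only: the regression actually used is described in BB1 and may differ in detail, and linear inverse models are usually built from lagged covariance matrices rather than from this direct fit. The function names and the constant offset `c` are assumptions.

```python
import numpy as np

def fit_linear_drift(x, tau=1.0):
    """Least-squares fit of A_lin(x) = L @ x + c to finite-difference tendencies.

    x   : array of shape (n_times, n_dims), sampled every tau time units
    tau : sampling interval used for the forward difference
    """
    x = np.asarray(x, dtype=float)
    states = x[:-1]
    tendencies = (x[1:] - x[:-1]) / tau
    # Append a column of ones so that a constant offset c is fitted as well.
    design = np.column_stack([states, np.ones(len(states))])
    coef, *_ = np.linalg.lstsq(design, tendencies, rcond=None)
    L = coef[:-1].T   # L[i, k] multiplies x_k in the tendency of x_i
    c = coef[-1]
    return L, c

def linear_drift(x, L, c):
    """Evaluate the fitted linear drift at a state x."""
    return L @ np.asarray(x, dtype=float) + c
```

Substituting `linear_drift` for the nonlinear drift while keeping the state-dependent diffusion would correspond to the experiment described above.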

## 6. Stochastic models fitted to other time lags

To see if there are meaningful stochastic models for the entire range of lags 1 day < *τ* < 6 days, the analyses in sections 2–5 are repeated for finite-difference estimates of drift and diffusion coefficients based on other time lags within this range. As an example, the results for the lag *τ* = 4 days are shown. This lag is considerably longer than the lag used for the previous results, but short enough so that PC4 is not yet completely decorrelated. The stationary solution of the FPE together with the ACF of the associated SDE for this lag are shown in Fig. 12. Overall, the stationary solution of the FPE again has the same structure as the GCM with a distinct ridge in the upper right quadrant and a second region of increased probability in the left upper quadrant. This structural similarity is supported by the LD (Fig. 12b). However, compared to the PDF of the GCM the stationary solution of the FPE has too little variance in the PC4 direction.
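Finite-difference estimates of drift and diffusion for a chosen lag can be sketched with conditional averages over bins of the phase plane. The moment definitions (4) and (5) are not reproduced in this part of the text, so the estimators below are assumptions: the drift is taken as the bin-mean increment divided by *τ*, and the diffusion as the bin-mean outer product of increments divided by *τ*. The function name, the shared bin edges, and the minimum of two samples per bin are likewise hypothetical choices.

```python
import numpy as np

def estimate_drift_diffusion(x, lag, dt, edges):
    """Binned finite-difference estimates of drift and diffusion in 2D.

    x     : array of shape (n_times, 2), e.g. (PC1, PC4) time series
    lag   : lag in number of samples, so that tau = lag * dt
    dt    : sampling interval
    edges : 1D array of bin edges, used for both coordinates
    """
    x = np.asarray(x, dtype=float)
    tau = lag * dt
    start = x[:-lag]
    incr = x[lag:] - start                      # increments over the lag tau
    n_bins = len(edges) - 1
    ix = np.digitize(start[:, 0], edges) - 1    # bin index along PC1
    iy = np.digitize(start[:, 1], edges) - 1    # bin index along PC4
    inside = (ix >= 0) & (ix < n_bins) & (iy >= 0) & (iy < n_bins)
    drift = np.full((n_bins, n_bins, 2), np.nan)
    diffusion = np.full((n_bins, n_bins, 2, 2), np.nan)
    for i in range(n_bins):
        for j in range(n_bins):
            sel = inside & (ix == i) & (iy == j)
            count = int(sel.sum())
            if count < 2:                       # leave sparse bins undefined
                continue
            d = incr[sel]
            drift[i, j] = d.mean(axis=0) / tau
            diffusion[i, j] = (d.T @ d) / (count * tau)
    return drift, diffusion
```

Calling the function with `lag` corresponding to 1, 2, or 4 days would give one set of coefficients per lag, which is the kind of comparison made in this section.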

The ACFs of the associated SDE again decay approximately exponentially, but with decay rates that are larger than those of the ACFs for *τ* = 1 day (Figs. 12c–f). This leads to an improvement in the overall decay rate for *τ* < 50 days, but an underestimation of the decay rate for PC1 for lags *τ* > 50 days. It was found that the best overall match between the ACFs of the GCM and the stochastic model is obtained for a lag of *τ* = 2 days.
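Decay rates of this kind can be compared by fitting an exponential to the sample ACF. The sketch below is one simple way to do so and is not taken from the paper: the function names are hypothetical, the ACF uses the biased (1/n) normalization, and the e-folding time is obtained from a straight-line fit to the logarithm of the ACF over the lags before its first non-positive value.

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Sample autocorrelation of a 1D series for lags 0..max_lag."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    var = np.dot(x, x) / n
    return np.array([np.dot(x[: n - k], x[k:]) / (n * var)
                     for k in range(max_lag + 1)])

def efolding_time(acf, dt=1.0):
    """E-folding time from a log-linear fit to the leading positive part of an ACF."""
    acf = np.asarray(acf, dtype=float)
    nonpos = np.flatnonzero(acf <= 0)
    n = nonpos[0] if nonpos.size else len(acf)   # stop before the first zero crossing
    lags = np.arange(n) * dt
    slope = np.polyfit(lags, np.log(acf[:n]), 1)[0]
    return -1.0 / slope                          # decay rate is -slope
```

Applying `efolding_time` to the ACFs of the GCM series and of SDE realizations for different fitting lags would give a single number per component to compare.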

## 7. Conclusions

While there is agreement that much of the large-scale planetary wave behavior can be understood in terms of a linear stochastic model driven by additive white noise, this study focuses on the part of tropospheric large-scale dynamics that cannot be explained by such a model. Since the nonlinearities are less pronounced than the dominating linear behavior, extended integrations of an atmospheric GCM have been used and analysis in the highly reduced phase space spanned by the leading EOFs has been performed to meet the data requirements for statistically meaningful results.

Linear and nonlinear signatures of planetary wave behavior in this GCM were identified and quantified in BB1. In addition to linear damping, the mean tendencies of the leading patterns show marked nonlinearities in the form of a double swirl motion around two phase-space locations. Separately, BB2 investigated probability density functions (PDFs) of planetary wave states in the same GCM. Although unimodal, the PDFs show significant departures from Gaussianity. To link the non-Gaussian features to the nonlinear tendencies, the development of a reduced stochastic model was undertaken with possible nonlinear deterministic drift and multiplicative noise.

Ideally one would want to use a strategy for systematically substituting the fast processes with noise (Khasminsky 1966; Gardiner 1983; Majda et al. 1999, 2001) and recently this analytical approach was applied to simplified atmospheric models (Sardeshmukh et al. 2001; Majda et al. 2003; Franzke et al. 2005). However, the complexity of the dynamics in the GCM does not allow for such a mostly analytical treatment and makes an empirical approach necessary.

To find a nonlinear stochastic model for planetary wave behavior, drift and diffusion coefficients in the Fokker–Planck equation (FPE) are first estimated from the GCM data. For most purposes we are interested in the probabilistic evolution of an ensemble of nearby states, which is directly given by the FPE. Other analyses, like the computation of the autocorrelation function (ACF), require realizations of individual trajectories, which are obtained by integrating a stochastic differential equation (SDE) that has the same FPE as the GCM.

It was found that the drift vector of our model is a nonlinear function of state. In addition, the diffusion coefficients are state dependent, from which it immediately follows that the associated SDE is driven by multiplicative noise. In particular, the diffusion of states with equal and opposite coordinates is not the same, which is a manifestation of nonlinear behavior. The diffusion tensor is not diagonal, which implies correlation among the driving noise processes.

This nonlinear, multiplicative noise model captures the non-Gaussian subtleties in the climatological PDF of planetary wave behavior to a remarkable degree. Furthermore, it produces the major temporal characteristics of planetary wave behavior in the GCM, in terms of both global autocorrelations and the temporal evolution of ensembles of nearby states. Since the time-dependent FPE predicts not only the ensemble mean state, but also the ensemble spread over time, it can be used to obtain state-dependent nonlinear predictions and their associated state-dependent predictability.

To assess the importance of drift nonlinearities and the role of the multiplicative noise for the nonlinear stochastic model of planetary wave behavior, simplified models are constructed and it is determined to what degree these are able to reproduce the non-Gaussian signatures in the GCM data. By keeping the full nonlinear drift and replacing the multiplicative noise by an additive white noise process that is still correlated, a simplified model is still able to capture the main non-Gaussian features in the PDF.

It turns out that the correlations in the noise are essential and cannot be neglected. Branstator and Berner (2005) calculated the principal oscillation patterns in the phase space spanned by the first four EOFs and found an oscillatory mode that is very reminiscent of the westward propagating feature known as Branstator–Kushnir oscillation (Branstator 1987; Kushnir 1987). The projection of this oscillation onto the EOF1–EOF4 phase-space plane is shown for an arbitrarily chosen amplitude in Fig. 6c. Since the rotation axis is almost, but not entirely, parallel to our phase-space plane, the projections onto this plane are weak. This oscillation is believed to be a major contributor to the correlated noise that was found to be necessary for a good stochastic model. This is supported by the fact that the tilt of the projected oscillation is consistent with a negative cross diffusion (Fig. 3b). The inclusion of this linear oscillation as correlated fluctuating force is essential for a good stochastic model.

It was found that the nonlinearities in the drift are absolutely necessary for a reduced model of planetary wave behavior. Since the results here must be interpreted in the Stratonovitch sense, it is impossible to determine whether the drift nonlinearities are deterministic or noise induced (see appendix). However, since a noise-induced drift is a direct consequence of multiplicative noise, and multiplicative noise, in the context of empirical stochastic modeling, is a direct consequence of nonlinear interactions (BB1), the nonlinearities in the drift are in both cases a consequence of nonlinearities.

Although a nonlinear stochastic model forced by additive white noise captures many aspects of the climatological PDF, it is not as good as the multiplicative-noise model. In that regard, the findings here agree with the results of Sura et al. (2005), who found that a multiplicative noise was necessary to balance the probability budget. However, the studies come to different conclusions with regard to the importance of a nonlinear drift: while a nonlinear drift is found here to be essential for a good model, Sura et al. (2005) conclude that “the observed departures from Gaussianity do not result from the nonlinear drift term [. . .] but rather from the multiplicative structure of the noise”. The two studies differ in a number of ways. First, Sura et al. use observational data, which reduces the sample to less than 10 000 days and raises the question of statistical significance. Second, they analyze the residual term of a probability budget. However, since the contribution of the linear drift is not separated from that of the nonlinear drift, it is difficult to draw conclusions about the importance of drift nonlinearities from their work. Finally, Sura et al. (2005) estimated the drift from 5-day running means. BB1 has quantified linear and nonlinear behavior in tendencies based on 5-day lags in the GCM and found that for such a lag, the linear contribution masks the nonlinear behavior. This suggests that the nonlinearities in the observational record are more pronounced in drifts estimated from unfiltered data and from shorter lags (e.g., BB1, Fig. 1).

An analysis of the decorrelation rates in the manner of DelSole (2000) yields a range of time lags between 1 and 6 days, for which it is appropriate to model the planetary wave behavior stochastically. For time lags *τ* < 1 day, the implicit white-noise assumption is violated and a stochastic model cannot reproduce the climatological GCM. Even within the range of valid lags, there are differences in the details of the performance of the stochastic models. This is to be expected since estimating drift and diffusion coefficients from various time lags effectively shifts physical processes between the resolved and unresolved variables. Interestingly, there seems to be a trade-off between reproducing the non-Gaussianities in the PDF versus capturing the temporal aspects of planetary wave behavior. While shorter time lags yield an excellent agreement in the PDF, the stochastic model tends to have too much memory of the initial state. For longer time lags, the stochastic model captures the temporal decay rates much better, but the variance of the climatological PDF is slightly underestimated.

In summary, this study shows that although the major part of atmospheric planetary wave behavior can be modeled in linear terms, the inclusion of nonlinear interactions via nonlinear drift and multiplicative noise leads to substantial improvements. A nonlinear stochastic model of planetary wave behavior succeeds in capturing the subtle, but significant, non-Gaussian features in the climatological distribution as well as key elements of the nonlinear evolution of nearby states that cannot be captured by a linear model with additive white noise.

## Acknowledgments

The author profited from beneficial discussions with A. Fournier, A. Hense, A. Majda, M. Newman, C. Penland, C. Perez, P. Sardeshmukh, R. Saravanan, J. Tribbia, and E. Vanden-Eijnden. In particular I wish to thank G. Branstator for numerous enlightening discussions and for his invaluable input. G. Branstator and A. Fournier helped greatly improve earlier versions of this manuscript. In addition, I thank P. Sura and an anonymous reviewer for their insightful comments. A. Mai aided in processing the GCM data. During this study, the author was supported by NCAR’s Advanced Study Program.

## REFERENCES

Berner, J., 2003: *Detection and Stochastic Modeling of Nonlinear Signatures in the Geopotential Height Field of an Atmospheric General Circulation Model*. Bonner Meteorologische Abhandlungen (Heft 58), Asgard-Verlag, 156 pp.

Branstator, G., 1987: A striking example of the atmosphere’s leading traveling pattern. *J. Atmos. Sci.*, **44**, 2310–2323.

Branstator, G., 1990: Low-frequency patterns induced by stationary waves. *J. Atmos. Sci.*, **47**, 629–648.

Branstator, G., 1992: The maintenance of low-frequency atmospheric anomalies. *J. Atmos. Sci.*, **49**, 1924–1945.

Branstator, G., and S. E. Haupt, 1998: An empirical model of barotropic atmospheric dynamics and its response to tropical forcing. *J. Climate*, **11**, 2645–2667.

Branstator, G., and J. Berner, 2005: Linear and nonlinear signatures in the planetary wave dynamics of an AGCM: Phase space tendencies. *J. Atmos. Sci.*, **62**, 1792–1811.

Branstator, G., A. Mai, and D. Baumhefner, 1993: Identification of highly predictable flow elements for spatial filtering of medium- and extended-range numerical forecasts. *Mon. Wea. Rev.*, **121**, 1786–1802.

Charney, J., and J. D. DeVore, 1979: Multiple flow equilibria in the atmosphere and blocking. *J. Atmos. Sci.*, **36**, 1205–1216.

Cheng, X., and J. Wallace, 1993: Cluster analysis of the Northern Hemisphere wintertime 500-hPa height field: Spatial patterns. *J. Atmos. Sci.*, **50**, 2674–2696.

Corti, S., F. Molteni, and T. N. Palmer, 1999: Signature of recent climate change in frequencies of natural atmospheric circulation regimes. *Nature*, **398**, 799–802.

Crommelin, D. T., and A. Majda, 2004: Strategies for model reduction: Comparing different optimal bases. *J. Atmos. Sci.*, **61**, 2206–2217.

Deker, U., and D. Ryter, 1980: Properties of the noise-induced (“spurious”) drift. II. Simplifications of Langevin equations. *J. Math. Phys.*, **21**, 2666–2669.

DelSole, T., 2000: A fundamental limitation of Markov models. *J. Atmos. Sci.*, **57**, 2158–2168.

DelSole, T., 2001: Optimally persistent patterns in time-varying fields. *J. Atmos. Sci.*, **58**, 1341–1356.

Ditlevsen, P. D., 1999: Observation of *α*-stable noise induced millennial climate changes from an ice-core record. *Geophys. Res. Lett.*, **26**, 1411–1444.

Egger, J., and T. Jonsson, 2002: Dynamic models for Icelandic meteorological data sets. *Tellus*, **54A**, 1–13.

Ewald, B., C. Penland, and R. Témam, 2004: Accurate integration of stochastic climate models with application to El Niño. *Mon. Wea. Rev.*, **132**, 154–164.

Franzke, C., A. J. Majda, and E. Vanden-Eijnden, 2005: Low-order stochastic mode reduction for a realistic barotropic model climate. *J. Atmos. Sci.*, **62**, 1722–1745.

Gardiner, C. W., 1983: *Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences*. Springer-Verlag, 415 pp.

Hannachi, A., 1997a: Low-frequency variability in a GCM: Three-dimensional flow regimes and their dynamics. *J. Climate*, **10**, 1357–1379.

Hannachi, A., 1997b: Weather regimes in the Pacific from a GCM. Part II: Dynamics and stability. *J. Atmos. Sci.*, **54**, 1334–1348.

Hansen, A., and A. Sutera, 1986: On probability density distribution of planetary-scale atmospheric wave amplitude. *J. Atmos. Sci.*, **43**, 3250–3265.

Hasselmann, K., 1988: PIPs and POPs: The reduction of complex dynamical systems using principal interaction and oscillation patterns. *J. Geophys. Res.*, **93**, 11015–11021.

Horsthemke, W., and R. Levefer, 1984: *Noise-Induced Transitions*. Springer-Verlag, 318 pp.

Hsu, C., and F. Zwiers, 2001: Climate change in recurrent regimes and modes and atmospheric variability. *J. Geophys. Res.*, **106**, 20145–20159.

Itoh, H., and M. Kimoto, 1999: Weather regimes, low-frequency oscillations, and principal patterns of variability: A perspective of extratropical low-frequency variability. *J. Atmos. Sci.*, **56**, 2684–2705.

Khasminsky, R. Z., 1966: A limit theorem for the solutions of differential equations with random right-hand sides. *Theory Prob. Appl.*, **11**, 390–406.

Kimoto, M., and M. Ghil, 1993: Multiple flow regimes in the northern hemisphere winter. Part I: Methodology and hemispheric regimes. *J. Atmos. Sci.*, **50**, 2625–2643.

Kloeden, P., and E. Platen, 1992: *Numerical Solution of Stochastic Differential Equations*. Springer-Verlag, 632 pp.

Kushnir, Y., 1987: Retrograding wintertime low-frequency disturbances over the north Pacific Ocean. *J. Atmos. Sci.*, **44**, 2727–2742.

Kwasniok, F., 1996: The reduction of complex dynamical systems using principal interaction patterns. *Physica D*, **92**, 28–60.

Legras, B., and M. Ghil, 1985: Persistent anomalies, blocking and variations in atmospheric predictability. *J. Atmos. Sci.*, **42**, 433–471.

Lorenz, E. N., 1963: Deterministic nonperiodic flow. *J. Atmos. Sci.*, **20**, 130–141.

Majda, A., I. Timofeyev, and E. Vanden-Eijnden, 1999: Models for stochastic climate prediction. *Proc. Natl. Acad. Sci. USA*, **96**, 14687–14691.

Majda, A., I. Timofeyev, and E. Vanden-Eijnden, 2001: A mathematical framework for stochastic climate models. *Comm. Pure Appl. Math.*, **54**, 891–974.

Majda, A., I. Timofeyev, and E. Vanden-Eijnden, 2003: Systematic strategies for stochastic mode reduction in climate. *J. Atmos. Sci.*, **60**, 1705–1722.

Newman, M., P. D. Sardeshmukh, and C. Penland, 1997: Stochastic forcing of the wintertime extratropical flow. *J. Atmos. Sci.*, **54**, 435–455.

Palmer, T. N., 1999: A nonlinear dynamical perspective on climate prediction. *J. Climate*, **12**, 575–591.

Papanicolaou, G., and W. Kohler, 1974: Asymptotic theory of mixing stochastic ordinary differential equations. *Commun. Pure Appl. Math.*, **27**, 641–668.

Penland, C., 2003a: A stochastic approach to nonlinear dynamics: A review (electronic supplement to “Noise out of chaos and why it won’t go away”). *Bull. Amer. Meteor. Soc.*, **84**, ES43–ES51.

Penland, C., 2003b: Noise out of chaos and why it won’t go away. *Bull. Amer. Meteor. Soc.*, **84**, 921–925.

Penland, C., and T. Magorian, 1993: Prediction of Niño-3 sea surface temperatures using linear inverse modeling. *J. Climate*, **6**, 1067–1076.

Penland, C., and L. Matrosova, 1998: Prediction of tropical Atlantic sea surface temperatures using linear inverse modeling. *J. Climate*, **11**, 483–496.

Risken, H., 1984: *The Fokker-Planck Equation: Methods of Solution and Applications*. Springer-Verlag, 454 pp.

Sardeshmukh, P., C. Penland, and M. Newman, 2001: Rossby waves in a fluctuating medium. *Stochastic Climate Models*, P. Imkeller and J.-S. von Storch, Eds., Progress In Probability, Vol. 49, Birkhäuser Verlag, 369–384.

Shannon, C. E., and W. Weaver, 1949: *The Mathematical Theory of Communication*. University of Illinois Press, 117 pp.

Siegert, S., R. Friedrich, and J. Peinke, 1998: Analysis of data sets of stochastic systems. *Phys. Lett. A*, **243**, 275–280.

Stephenson, D., A. Hannachi, and A. O’Neill, 2004: On the existence of multiple climate regimes. *Quart. J. Roy. Meteor. Soc.*, **130**, 583–605.

Sura, P., 2003: Stochastic analysis of Southern and Pacific Ocean sea surface winds. *J. Atmos. Sci.*, **60**, 654–666.

Sura, P., and J. Barsugli, 2002: A note on estimating drift and diffusion parameters from timeseries. *Phys. Lett. A*, **305**, 304–311.

Sura, P., M. Newman, C. Penland, and P. Sardeshmukh, 2005: Multiplicative noise and non-Gaussianity: A paradigm for atmospheric regimes? *J. Atmos. Sci.*, **62**, 1391–1409.

von Storch, H., G. Buerger, R. Schnur, and J-S. von Storch, 1995: Principal oscillation patterns: A review. *J. Climate*, **8**, 377–400.

Wiin-Nielsen, A., 1979: Steady states and stability properties of a low-order barotropic system with forcing and dissipation. *Tellus*, **31**, 375–386.

Williamson, D. L., 1983: Description of the NCAR Community Climate Model (CCM0B). NCAR Tech. Note, NCAR/TN-210+STR, 88 pp.

Wirth, V., 2001: Detection of hidden regimes in stochastic cyclostationary time series. *Phys. Rev. E*, **64**, 016136, doi:10.1103/PhysRevE.64.016136.

Wong, E., and M. Zakai, 1965: On the convergence of ordinary integrals to stochastic integrals. *Ann. Math. Stat.*, **36**, 1560–1564.

## APPENDIX

### White-Noise Approximation and Stratonovitch Systems

This appendix outlines the cases in which a continuous deterministic process can be approximated by a Stratonovitch SDE, and the ways in which a Stratonovitch system differs from an Itô system. It is a synopsis of the various discussions in the reviews by Gardiner (1983), Risken (1984), Horsthemke and Levefer (1984), and Penland (2003a, b). The reader is pointed to these references for further details and proofs.

White noise is *δ*-correlated noise, that is, the noise at time *t* and time *t* + *τ* is uncorrelated for arbitrarily small time increments *τ*. For physical systems with continuous sample paths, the processes modeled by noise are not uncorrelated, but have a small finite correlation time, which strictly destroys the Markov property. Such systems obey the differential equation (A1), where *b*_{j}(*t*) is a forcing with some nonzero correlation time (Gardiner 1983). Wong and Zakai (1965) have shown that if *b*_{j}(*t*) is a Markov process, then in the limit that it becomes a *δ*-correlated process, (A1) converges weakly to the Stratonovitch SDE (A2), where (S) denotes that the stochastic noise is interpreted in the Stratonovitch sense [see, e.g., Gardiner (1983) for a proof]. A necessary condition for the white-noise approximation of processes with finitely correlated noise is a sufficient time-scale separation between the decorrelation time of the noise and the decorrelation time of the deterministic processes (Wong and Zakai 1965; Papanicolaou and Kohler 1974). Only if such a time-scale separation exists is it possible to find a stochastic model on a time scale that is larger than the decorrelation time scale of the noise, but small enough to resolve the deterministic dynamics. Often the goal is not to find a unique time scale for which this is possible, but a range of time scales. Since the GCM is conceptually of the form (A1), it will converge to a Stratonovitch SDE and all results need to be interpreted in the Stratonovitch sense.
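The displayed forms of (A1) and (A2) are not reproduced in this text. As a sketch only, the following forms are consistent with the symbols *h*_{i}, *g*_{ij}, *b*_{j}(*t*), and Γ_{j}(*t*) used in this appendix and with the standard presentation in Gardiner (1983). They are a reconstruction under that assumption, with summation over the repeated index *j* implied, and should not be read as the author's exact typesetting.

```latex
% (A1): continuous system forced by noise b_j(t) with finite correlation time
\frac{dx_i}{dt} = h_i(\mathbf{x},t) + g_{ij}(\mathbf{x},t)\, b_j(t)
\tag{A1}

% (A2): white-noise limit, interpreted in the Stratonovitch sense (S)
(\mathrm{S}) \quad
\frac{dx_i}{dt} = h_i(\mathbf{x},t) + g_{ij}(\mathbf{x},t)\, \Gamma_j(t)
\tag{A2}
```

In this reading, the only change between (A1) and (A2) is the replacement of the finitely correlated forcing by *δ*-correlated noise, together with the prescription of how the product of amplitude and noise is to be interpreted.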

The Itô and Stratonovitch interpretations of an SDE differ in how the noise term *g*_{ij}(**ξ**, *t*)Γ_{j}(*t*) is evaluated in the respective calculus. It is always possible to change from a Stratonovitch SDE to an Itô SDE and vice versa. Whenever possible, we will work with the equivalent Itô SDE rather than the Stratonovitch SDE, since it is computationally advantageous. Regardless, as long as the SDE is obtained as the white-noise approximation of a continuous process with finitely correlated noise, all results must be interpreted in the Stratonovitch sense.
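The change from a Stratonovitch to an Itô SDE can be sketched in one dimension. The example below is illustrative and not the scheme used in the paper: it integrates the equivalent Itô SDE with the Euler–Maruyama method, and the functions `h`, `g`, and `dg_dx` are hypothetical. The drift correction is written as (`noise_var`/2) *g* d*g*/d*x*, where `noise_var` is the variance of the noise increment per unit time; a standard Wiener process has `noise_var = 1`, and a normalization in which the noise correlation carries a factor of 2 corresponds to `noise_var = 2`. Which normalization matches the FPE (3) is not determined by this section.

```python
import numpy as np

def ito_drift(h, g, dg_dx, noise_var=1.0):
    """Drift of the Ito SDE equivalent to the Stratonovich SDE dx = h dt + g o dW.

    noise_var is <dW^2> / dt (1 for a standard Wiener process).
    """
    return lambda x: h(x) + 0.5 * noise_var * g(x) * dg_dx(x)

def euler_maruyama(drift, g, x0, dt, n_steps, rng, noise_var=1.0):
    """Integrate the Ito SDE dx = drift dt + g dW with the Euler-Maruyama scheme."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    dW = rng.normal(0.0, np.sqrt(noise_var * dt), size=n_steps)
    for n in range(n_steps):
        x[n + 1] = x[n] + drift(x[n]) * dt + g(x[n]) * dW[n]
    return x

# Hypothetical example: linear damping with state-dependent noise amplitude.
h = lambda x: -x
g = lambda x: 1.0 + 0.5 * x
dg_dx = lambda x: 0.5

drift = ito_drift(h, g, dg_dx)
path = euler_maruyama(drift, g, x0=0.0, dt=0.01, n_steps=1000,
                      rng=np.random.default_rng(1))
```

Integrating with the uncorrected drift `h` instead of `drift` would simulate a different process, which is why the interpretation of the noise has to be fixed before trajectories are compared with data.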

The equivalent Itô SDE contains, in addition to the deterministic part *h*_{i}(**ξ**, *t*), a second term that represents the influence of the integrated effects of the noise as systematic forcing on the resolved variables. It originates in the fact that for a Stratonovitch system with multiplicative noise, the noise **Γ**(*t*) and its amplitude *g*_{ij}(**x**, *t*) are correlated over a small but finite amount of time (Risken 1984). This term also appears in the FPE (3), which for a Stratonovitch system has the coefficients given in (A4). From (A4), one can see that for a Stratonovitch system the drift coefficient *A*_{i} is the sum of the deterministic drift *h*_{i} and the noise-induced drift *A*_{i,noise} = *g*_{kj}(**x**, *t*) [∂*g*_{ij}(**x**, *t*)/∂*x*_{k}]. Without knowledge of the underlying Stratonovitch SDE (A2) it is impossible to distinguish the contributions from the deterministic drift from those of the noise-induced drift.

The moment definitions (4, 5) of drift and diffusion are unchanged for Stratonovitch SDEs. However, if drift and diffusion are estimated from data generated by a system with finitely correlated noise, they cannot be obtained from infinitely small time lags *τ* → 0 (even if the data were available on such a time scale), but must be obtained from the smallest time lag for which the white-noise approximation holds.
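A one-dimensional example makes the ambiguity between deterministic and noise-induced drift concrete. This is a sketch and not the author's equation (A4): it uses only the drift expression stated in this appendix, with summation over repeated indices assumed, and the linear forms of *h* and *g*, with hypothetical constants *λ* and *σ*, are chosen purely for illustration.

```latex
% Drift coefficient for a Stratonovitch system, as stated in the text:
A_i(\mathbf{x},t) = h_i(\mathbf{x},t)
  + g_{kj}(\mathbf{x},t)\,\frac{\partial g_{ij}(\mathbf{x},t)}{\partial x_k}

% Hypothetical 1D illustration with h(x) = -\lambda x and g(x) = \sigma x:
A(x) = -\lambda x + \sigma x\,\frac{d(\sigma x)}{dx}
     = (\sigma^{2} - \lambda)\,x
```

An empirical fit of *A*(*x*) in this example returns only the combined slope *σ*² − *λ*, so the damping *λ* cannot be separated from the noise-induced contribution *σ*² without knowledge of the underlying SDE.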

Since for Stratonovitch systems the noise-induced drift is a function of *g*_{ij}(**x**), the non-uniqueness in *g*_{ij}(**x**) leads now also to a non-uniqueness in the noise-induced drift *A*_{i,noise}, and hence to a non-uniqueness in the deterministic part *h*_{i}(**x**). From this follows that different members of the class of stochastically equivalent Stratonovitch processes have, in addition to different noise amplitudes, also different noise-induced drifts and deterministic parts. In particular, in the context of empirically fitting the FPE, it is impossible to determine a unique deterministic drift *h*_{i}(**x**) from drift and diffusion alone.
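One familiar source of non-uniqueness in the noise amplitude can be checked numerically. The sketch below assumes that the diffusion tensor is proportional to *g* *g*^T and uses hypothetical constant matrices: multiplying *g* by any orthogonal matrix gives a different amplitude with the same diffusion tensor. For state-dependent amplitudes, a state-dependent rotation would additionally change the noise-induced drift, which is the situation described above.

```python
import numpy as np

def diffusion_from_amplitude(g):
    """Diffusion tensor implied by a noise amplitude g, assuming B = g @ g.T."""
    return g @ g.T

theta = 0.7                                   # arbitrary rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

g1 = np.array([[1.0, 0.0],                    # hypothetical noise amplitude
               [-0.3, 0.9]])
g2 = g1 @ R                                   # rotated amplitude, differs from g1

B1 = diffusion_from_amplitude(g1)
B2 = diffusion_from_amplitude(g2)             # equals B1 because R @ R.T is the identity
```

Both amplitudes reproduce the same three independent entries of the symmetric tensor, so drift and diffusion alone cannot select between them.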

^{1} More precisely, the measure used is the fractional reduction in the mean square of mean tendencies when their nonlinear component is removed.

^{2} In this context, the term time-scale separation strictly refers to a significant difference in the decay rates of the autocorrelation function of fast and slow processes, which does not necessarily imply a gap in the frequency spectrum. An example of a system that has time-scale separation, but no peaks in the frequency spectrum, is the linear model forced by red noise described by Newman et al. (1997) and DelSole (2000).

^{3} The terms resolved and unresolved are used strictly to refer to the deterministic and stochastic part of the stochastic differential equation.

^{4} Selecting patterns with large decorrelation times over patterns with small decorrelation times is another form of time-scale separation. It refers to the decorrelation times of the combined fast and slow processes in the different phase space, not to the decorrelation times of fast and slow processes contributing to the combined dynamics within a given phase space.

^{5} Details on numerical schemes for SDEs and their accuracy and convergence properties can be found, e.g., in Kloeden and Platen (1992), Ewald et al. (2004), and Penland (2003a, b).