## Abstract

Variability in the climate system involves interactions across a broad range of scales in space and time. While models of slow “climate” variability may not explicitly account for fast “weather” processes, the dynamical influence of these unresolved scales cannot generally be ignored. Perspectives from statistical physics indicate that if the scale separation between slow and fast scales is sufficiently large, deterministic parameterizations are appropriate, while for smaller scale separations the parameterizations should be nondeterministic. The method of “stochastic averaging” provides a framework for the reduction of coupled fast–slow systems into an effective dynamics of the slow variables. This study describes the hierarchy of approximations associated with stochastic averaging and applies this reduction methodology to two idealized models: a Stommel-type model of the meridional overturning circulation and a model of coupled atmosphere–ocean boundary layers. Finally, stochastic averaging is compared to other stochastic reduction strategies that have been applied to climate models.

## 1. Introduction

The earth’s climate system is characterized by variability coupled across a broad range of space and time scales (e.g., Palmer et al. 2008; Hurrell et al. 2009). The variability of the faster parts of the system is both modulated by and feeds back on the slower components of the system. It is conventional to refer to the slower parts of the system as “climate” and the faster parts as “weather”, keeping in mind that (i) this distinction is operational in that the climate on one time scale may be weather on another, and (ii) the time-scale separation between weather and climate may not be large. Nevertheless, this fast–slow separation is useful conceptually and generally underlies the development of subgrid-scale parameterizations (either implicitly or explicitly). As ocean–atmosphere processes on larger spatial scales tend to evolve more slowly, we can think of fast and slow variability as respectively being spatially small and large scale.

Statistical physics provides a natural framework with which to address variability of systems with coupled scales (e.g., Reichl 1998; Honerkamp 2002). As a concrete example, consider a monatomic ideal gas of *N* identical atoms in contact with an external heat bath of temperature *T*. Atoms in the gas are free to exchange energy with each other and with the external heat bath: the energy *E _{i}* of any individual atom at any time is given by the Maxwell–Boltzmann distribution

*p*(

*E*). The mean internal energy of a gas of

_{i}*N*atoms

is a random variable with probability density function (pdf)

Following standard treatments of statistical mechanics, we have that in the limit that *N* → ∞ (the “thermodynamic limit”) the mean energy *E* becomes a deterministic function of the other thermodynamic variables

That his limit holds as *N* → ∞ follows from the law of large numbers: as the gas is ideal, the energy of each atom is an independent random variable so

The variance of mean internal energy fluctuations var(*E*) vanishes as *N* → ∞. In the limit of an arbitrarily large “scale separation” between the microscale (the individual atoms) and the macroscale (the population of atoms), it is meaningful to talk about a mean internal energy as a deterministic function of other state variables (such as temperature). For finite *N*, however, fluctuations of *E* around are nonzero. The quantity

has mean zero and variance var(*E _{i}*) independent of

*N*. For finite

*N*, the mean internal energy of the system at any time is the random variable ; by the central limit theorem, the distribution of

*E*approaches a Gaussian as

*N*increases. We see that in this system, for large but finite-scale separation between the micro- and macroscales, the macroscale state of the system is given by a value averaged over the probability distribution of the microscale quantities plus a stochastic correction term.

This perspective from statistical physics suggests that if the time-scale separation between weather and climate is very large, then the net effect of the fast weather on the slow climate averages out on climate time scales and can be represented as a deterministic function of the climate state. However, for a finite time-scale separation, a given climate state is not associated with a unique upscale influence and stochastic corrections must be accounted for. This line of reasoning underpins the motivation for the representation of unresolved scales by stochastic processes (e.g., Palmer and Williams 2010). Since the early work by Mitchell (1966) and Hasselmann (1976), stochastic differential equations (SDEs) have been used to model effective climate dynamics influenced by fast weather variability (general introductions to SDEs are presented in Gardiner 1997 and Horsthemke and Lefever 2006; an introduction in the context of climate modeling is presented in Penland 2003). These stochastic climate models have generally assumed the form of the reduced model, estimating parameters from observations (e.g., Penland 1996; Kravtsov et al. 2005); or replaced unresolved variables with stochastic processes in mechanistic models (e.g., Farrell and Ioannou 1993; DelSole 2001; Monahan 2002c, 2006; Sura and Newman 2008). Such studies have displayed considerable success in accounting for a broad range of observational results, in settings as diverse as El Niño and the Southern Oscillation, extratropical atmospheric dynamics, and atmosphere–ocean boundary layer variability. These successes provide ample evidence for the utility of the stochastic ansatz. However, relatively little attention has been paid to the explicit development of effective models of the slow climate dynamics from coupled fast–slow systems. Such a program has been carried out by Majda et al. (2003, 2005, 2010) (the so-called Majda–Timofeyev–Vanden-Eijnden (MTV) theory] for the case of fast–slow systems in which the strength of the forcing of slow by fast variables increases with the scale separation so that SDEs are obtained in the strict limit of an infinite scale separation. This earlier analysis, which yields explicit formulae for the effective dynamics of the slow system, has been applied to idealized atmospheric models in Franzke et al. (2005) and Franzke and Majda (2006) and has demonstrated that even in the absence of a large scale separation between fast and slow variables, an effective stochastic model of the climate can be produced that captures essential aspects of the full coupled system (although Culina 2009 and Strounine et al. 2010 demonstrate examples in which features of interest are not captured by the MTV reduction).

The MTV reduction method is a particular case of stochastic averaging, a more general framework for the reduction of coupled fast–slow systems to an effective dynamics of the slow variable (e.g., Freidlin and Wentzell 1998; Rödenbeck et al. 2001; Arnold et al. 2003; Vanden-Eijnden 2003; Fatkullin and Vanden-Eijnden 2004). Stochastic averaging generalizes classical averaging procedures (e.g., Sanders and Verhulst 1985), by which the slow dynamics is averaged over the fast variables to produce an effective deterministic dynamics in the slow variables alone. Stochastic corrections to the averaged forcing are given by explicit formulae in terms of the statistics of the fast variables and the dynamics of the slow variables. Stochastic averaging provides a general systematic framework for the derivation of effective stochastic climate models from coupled weather–climate systems.

An introduction to stochastic averaging is presented in section 2. The discussion in this section does not introduce any new mathematical results; rather, it presents established results in a context familiar to researchers in atmosphere, ocean, and climate dynamics. The focus is intuition building rather than mathematical detail. Subsequently, this study considers the application of stochastic averaging to two idealized climate models. First, section 3 considers the reduction of a simple Stommel (1961)-type model of the meridional overturning circulation (MOC), for which an effective stochastic dynamics of the (slow) meridional salinity gradient is obtained by averaging over the variability of the (fast) meridional temperature gradient. A coupled model of sea surface winds, surface air temperature (SAT), and sea surface temperature (SST) is considered in section 4, in which effective stochastic dynamics of the coupled SAT–SST system are derived. A discussion and conclusions are presented in section 5. A number of illustrative “tutorial” examples of the reduction procedure are presented in the appendix. These examples are strictly didactic and are included to illustrate the application of the reduction method. The models considered in this study and in a previous study by Arnold et al. (2003) are highly idealized; the application of stochastic averaging to a much more complex system (a two-layer quasigeostrophic channel model) is presented in Culina (2009) and Culina et al. (2011).

## 2. Stochastic averaging

Following Arnold et al. (2003), consider the coupled fast–slow system

where , and

The quantity *τ* measures the time-scale separation between *x* (the so-called slow variables) and *y* (the so-called fast variables). The process of stochastic averaging involves computing averages over the conditional statistics of *y* for *x* fixed—that is, averages over weather variability for a fixed climate state. For convenience, we will assume that the invariant measure of *y* (denoted *μ _{x}*) is sufficiently smooth that the conditional pdf

*p*(

*y*|

*x*)

*dy*=

*μ*(

_{x}*dy*) exists. Subject to certain caveats, one of the most important of which is that the dynamics of

*y*be sufficiently “mixing” (e.g., with a sufficiently rapidly decaying autocorrelation function) the stochastic averaging results hold. These results will now be discussed informally; more precise discussions are presented in, for example, Papanicolauou and Kohler (1974), Freidlin and Wentzell (1998), Kifer (2001), Arnold et al. (2003), or Vanden-Eijnden (2003). Note that the stochastic averaging results hold if the dynamics of

*y*are nondeterministic [i.e., if Eq. (7) is an SDE] or if they are deterministic but sufficiently chaotic. Effective stochastic dynamics are known to emerge from deterministic dynamics in the limit of an arbitrarily large number of degrees of freedom (e.g., Ford et al. 1965) or rapidly decorrelating chaos (e.g., Beck and Roepstorff 1987; Majda et al. 2006). This last fact is manifest practically in the use of pseudorandom number generators to simulate stochastic processes with high accuracy (e.g., Kloeden and Platen 1992).

### a. Averaging approximation (A)

Defining

then for *τ* sufficiently small , where

The averaged approximation Eq. (10) will be denoted the “(A) approximation.” This approximation is expected to be good (in general) only up to some (*τ*-dependent) time *T*_{1}(*τ*), after which the trajectories *x*(*t*) and will diverge significantly. For a system with large time-scale separation and fluctuations in *y* that are “small enough” that they are well averaged over typical time scales of *x*, the (A) approximation can be expected to be good on climate time scales. Note that if the climate tendency *f*(*x*, *y*) is nondeterministic, then this will also be true of the (A) approximation.

For sufficiently large *T*_{1}, will settle down onto an attractor, which will in general differ substantially (both quantitatively and qualitatively) from that obtained from Eq. (6) by simply replacing *y* by its averaged value *E*{*y*} (the so-called bare truncation). In other words, the dynamics averaged over the fast variable may differ substantially from the dynamics obtained by simply neglecting fluctuations in the fast variable. This fact is familiar in the fluid mechanical context of classical Reynolds averaging. Denoting by and *u*′* _{i}*(

*x*,

*t*) the low- and high-pass-filtered components (defined by a running average over a window of width Δ) of the fluid velocity field

*u*(

_{i}*x*,

*t*), the averaged momentum budget [with force per unit mass

*F*(

*x*,

*t*)] is

The functional form of the effective dynamics differs from that of the full field *u _{i}*(

*x*,

*t*) through the inclusion of a Reynolds stress term, which through the averaging operator depends on the distribution of

*u*′

*(*

_{i}*x*,

*t*) conditioned on . Note that the existence of a time-scale separation between and

*u*′

*(*

_{i}*x*,

*t*) is implicit in Reynolds averaging: for the averaging operator to be a true projection operator [i.e., for ] there must be a spectral gap separating the time scales of the averaged and eddy fields.

### b. Linear diffusion approximation (L)

For *τ* > 0 the full and averaged trajectories of the slow variable will differ (although they may be close). For small *τ*, over the time scale *T*_{1}, the difference between the averaged and full trajectories

approaches (in distribution) the solution to the (nonautonomous) linear SDE

where is a white noise process,

is the Jacobian of the averaged dynamics, and the diffusion matrix *σ*^{2}(*x*) is given by

The quantity *C _{f′f′}* (

*s*;

*x*) is the lag covariance of the climate variable tendency:

where *f*′* _{j}*(

*x*,

*y*) is the tendency anomaly

In the above, *y*(*t*; *x*)denotes a solution to Eq. (7) with *x* fixed. The averaged approximation with the stochastic correction

is denoted the (L) approximation. In practice, the (L) approximation is meaningfully superior to (A) in situations in which fluctuations in *y* do not completely average out over the time scales of *x*, as a result of finite scale separation or large variance of *y*.

In this approximation, the stochastic process *ζ*(*t*) describes a correction superimposed on the (A) trajectory ; that is, these corrections do not feed back upon and influence the averaged trajectory. As the process *ζ*(*t*) is random, the precise trajectory of any (L) approximation *x*(*t*) will of course differ from that of the solution to the original system, but for *τ* sufficiently small these two trajectories will have the same statistical features (e.g., pdf and autocorrelation scale). The (L) approximation is analogous to that of the ideal gas example considered in the introduction: in the case of a large but finite time-scale separation a better representation of the effective dynamics involves a stochastic correction term with statistical features determined by the dynamics of both the fast and the slow variables.

The time-scale separation Eq. (8) implies that

That is, the standard deviation of the “correction” stochastic process *ζ*(*t*) scales as times the standard deviation of *f′* for *x* fixed. If std(*f′*) is bounded as *τ* decreases, then *σ*(*x*) → 0 as *τ* → 0. In the limit of infinite time-scale separation, if the term describing the upscale influence of *y* on *x* is bounded, then the stochastic corrections of the (L) approximation vanish and we recover the (A) approximation. This result is analogous to the law of large numbers: for infinitely large sample sizes, the (random) sample mean converges to the population mean. Conversely, if std(*f′*) scales as *τ*^{−1/2} (as in the MTV approximation; Majda et al. 2001), then *σ*(*x*) remains nonzero in the strict *τ* → 0 limit. This point is further discussed in example 4 of the appendix.

Heuristically, the (L) approximation arises as follows (considering scalar *x* and *y* for notational simplicity). Starting with Eqs. (6)–(18), we have

for *ζ* sufficiently small. From Eq. (10) it follows that

We can write

where the parameter *τ* scales the “memory” of *f′*(*t*), and the constant *β* is defined so that

independent of *τ*. As *τ* → 0, *h*(*s*/*τ*)/*τ* becomes increasingly concentrated around *s* = 0 and arbitrarily large at *s* = 0. That is,

For *τ* small,

where is a white noise process scaled by the factor *σ*, and where *σ* is given by Eq. (15). We note in passing the analogy between quantity *σ*(*x*) defined by Eq. (15) and the Lagrangian integral length scale of transport in isotropic, homogeneous turbulence (e.g., Tennekes and Lumley 1972).

In the (L) approximation, the stochastic correction to *x*(*t*) is described by a linear SDE driven by additive noise (i.e., it is a multivariate Ornstein–Uhlenbeck process). The linear operator corresponds to the linearization of the *averaged* dynamics around the (A) trajectory. The “bare truncation” linearized dynamics has been augmented by that part of the averaged influence of the fast variable, which can be described as a linear operator. The (L) model is a more general statement of the ansatz that the influence of fast weather variability on the slow climate variable can be represented as linear dynamics augmented with white noise variability (e.g., Penland and Sardeshmukh 1995). Models based on this ansatz have seen broad application in stochastic modeling of climate processes such as the El Niño–Southern Oscillation (e.g., Penland 1996; Kleeman 2008), midlatitude atmospheric variability (e.g., Farrell and Ioannou 1993; Newman et al. 1997; Whitaker and Sardeshmukh 1998; DelSole 2001), and dynamics of the meridional overturning circulation (e.g., Tziperman et al. 2008; Alexander 2008). A major motivation for the consideration of these models is their practical utility: they are amenable to both analytic solution and estimation from observations. However, such models cannot capture features in *ζ* such as non-Gaussian pdfs or long time-scale behavior like transitions between metastable circulation regimes that are observed in some climate processes. Such effects require that the stochastic corrections feed back on *x* through dynamical nonlinearities or state-dependent (multiplicative) noise, a point discussed in detail in Sardeshmukh and Sura (2009). It is to such approximations that we now turn.

### c. Nonlinear diffusion approximations (N+), (N)

The most accurate approximation of the coupled fast–slow system (N+) allows the fast variables to feed back on the (generally nonlinear) dynamics of the slow variables, as expressed by the SDE

where and *σ*(*x*) are given respectively by Eqs. (9) and (15). This approximation is generally valid on much longer time scales than (A) or (L). As with the (L) approximation, *x*(*t*) predicted by Eq. (26) converges to *x*(*t*) from the full system statistically (in terms of pdfs and autocorrelation structure), not for individual climate trajectories. In this most general approximation, the model for *x* becomes fully stochastic, allowing for dynamical nonlinearities and multiplicative noise determined explicitly by the slow dynamics *f*(*x*, *y*) and the statistical properties of the fast variable.

In Eq. (26), the first and second expressions for the SDE are the Stratonovitch and Ito forms, respectively. The noise-induced drift *G*(*x*) is given by

This quantity can be difficult to compute for very complex systems. In the notation of Arnold et al. (2003), the (N) approximation neglects the *O*(*τ*) noise-induced drift term. The derivation of Eq. (27) is straightforward; for notational simplicity we will consider the scalar case. The noise-induced drift associated with moving from Ito to Stratonovitch representation is (½)*σ*(*x*)∂* _{x}σ*(

*x*), so with

*σ*

^{2}(

*x*) given by (15), we compute directly

from which Eq. (27) follows (invoking ergodicity).

As with the (L) approximation, the stochastic terms in the (N+) approximation vanish as *τ* → 0 unless in the full system the upscale influence of the fast variables on the slow is rescaled appropriately as the time-scale separation increases. Such rescaling (as in MTV theory) allows the stochastic terms to survive in the strict *τ* = 0 limit. Without this rescaling, the (N+) approximation with *σ* ≠ 0 represents a “small *τ*” theory, rather than a *τ* = 0 theory.

Note that the hierarchy of approximations (A), (L), and (N+) are not to be interpreted as corresponding to successive terms in a formal series expansion in *τ*; rather, they represent approximations appropriate on different time scales. The (A) and (L) approximations are valid over shorter time scales than (N+); what defines “short” in this context is problem dependent. For many practical problems, (A) and (L) may be sufficient on time scales of interest. The (N+) approximation is more general, but it is also more difficult to implement.

## 3. Idealized “Stommel type” model of the meridional overturning circulation

The Atlantic MOC (AMOC) is an important component of the global climate system, transporting heat from the South Atlantic to the North Atlantic and playing a central role in the deep circulation of the World Ocean. Variability in the strength of the AMOC has been implicated in climate variations on time scales from decades to millennia: in particular, evidence (from models and paleoclimate proxies) that the AMOC can undergo transitions between large-scale “circulation regimes” suggests that it plays a fundamental role in variability such as Dansgaard–Oeschger oscillations (e.g., Rahmstorf 2002). Many fundamental questions remain regarding the mechanisms driving the AMOC and the character of its variability; idealized stochastic models have played a useful role in addressing some of these issues (e.g., Monahan et al. 2008).

In particular, Timmermann and Lohmann (2000, hereafter TL) considered a version of the Stommel (1961) two-box model in which the full two-dimensional dynamics (of meridional gradients of temperature and salinity) were reduced to a one-dimensional SDE (with multiplicative noise) in salinity alone through the assumption that the time scale of temperature variability is very much shorter than that of salinity. As was discussed in Monahan et al. (2002), the interpretation of the multiplicative noise terms in TL as arising from temperature fluctuations is problematic as it involves the simple transfer of the fluctuating temperature variable from inside an absolute value function to outside. In Monahan et al. (2002) and Monahan (2002c), the multiplicative noise term was instead interpreted as arising from variable large-scale diffusion (or horizontal gyre transport, cf. Monahan 2002b). The question remains regarding the correct limiting salinity dynamics when the time scale of temperature variability becomes very short; stochastic averaging provides a natural tool for addressing this question.

Following TL, we consider the idealized Stommel-type model

where the state variables *x* and *y* are the meridional salinity and temperature gradients, respectively, and *μ* is the mean meridional freshwater forcing gradient. The model state variables and parameters have been nondimensionalized so that the mean atmospheric temperature gradient (to which the oceanic temperature gradient relaxes) is unity, and the time scale of salinity dynamics is *O*(1). High-frequency variability in the freshwater and temperature forcing are modeled as white noise processes scaled, respectively, by the coefficients *σ _{A}* and

*σ*. This model and the scaling of the various terms are described in detail in Monahan (2002c).

_{M}For small *τ*, the first term in Eq. (30) becomes negligible compared to the other terms; TL approximated Eq. (30) by

The stationary pdf of *y* is then Gaussian with mean 1 and variance . It was assumed in TL that in the *τ* → 0 limit the effective equation for *x* takes the form

That is, it was assumed that the deterministic part of the effective stochastic dynamics was given by the full system with fluctuations in *y* set to zero (i.e., the “bare truncation”) and that the induced multiplicative noise was linear in the state variable. In fact, an explicit calculation using stochastic averaging demonstrates the form of the reduced model is quite different.

Averaging the slow dynamics over the invariant distribution *p*(*y*) we obtain

where we have explicitly separated the deterministic and stochastic terms in the averaged tendency. The (A) approximation is an additive SDE with stationary distribution

where is a normalization factor. Extrema of this pdf occur where (maxima around stable equilibria; minima around unstable). In particular, the number of steady states of the deterministic tendency determines if the pdf is unimodal or multimodal (i.e., if there is a single or there are multiple climate regimes). The number and location of the deterministic equilibria of the averaged model are determined by the parameters *μ* and *σ _{M}*; plots of as a function of these parameters are presented in the upper row of Fig. 1. For

*σ*= 0 three deterministic steady states exist over the range 0 ≤

_{M}*μ*≤ ¼, bounded on either end by fold bifurcations at which two of these solutions vanish. As

*σ*increases, this range of

_{M}*μ*values over which multiple deterministic steady states exist decreases. It is evident from Fig. 1 that the averaged dynamics can be very different from the bare truncation of the full system [i.e., Eq. (29) with

*y*= 1]. The differences can be in both the details of the dynamics (e.g., the location of deterministic equilibria) as well as more basic features (e.g., the number of regimes).

For this system, the diffusion coefficient *σ*^{2}(*x*) must be determined numerically from Eq. (15) with

The diffusion *σ*(*x*) can be written as , where depends only on the parameter *σ _{M}*; a plot of as a function of

*x*and

*σ*is presented in Fig. 2. Note that

_{M}*σ*(

*x*) is not a linear function of

*x*(as had been assumed in TL); in fact, for smaller values of

*σ*the diffusion coefficient is not even a monotonic function of

_{M}*x*.

The (N+) approximation SDE

has the associated Fokker–Planck equation for the stationary density *p _{s}*(

*x*)

with solution

where is a normalization constant. Plots of *p _{s}*(

*x*) as a function of

*μ*and

*σ*for

_{M}*τ*= 0.1 and

*σ*= 0.1 are displayed in Fig. 3. For most values of

_{A}*μ*in the bistable range, only one regime is substantially populated (e.g., the regime is

*stabilized*, cf. Monahan 2002c). As

*σ*increases, there are shifts in both the location of the climate regimes and the range of

_{M}*μ*displaying multiple regime behavior. These shifts reflect both changes in and in the influence of multiplicative noise (cf. Monahan 2002a,c). The second of these effects disappears as

*τ*→ 0 but the first persists.

A comparison of *p _{s}*(

*x*) as predicted from stochastic averaging [Eq. (38)] with that estimated from direct numerical simulations of the SDE [Eqs. (29) and (31)] over a range of

*τ*is presented in Fig. 4 for

*σ*= 0.05 and

_{A}*σ*= 0.25. For small

_{M}*τ*, the agreement between the averaged and numerically simulated pdfs is excellent. For larger

*τ*, the averaged solution diverges from that of the full system, but even for

*τ*= 1 the stationary pdf

*p*(

_{s}*x*) of the averaged and full systems is qualitatively similar.

Note that the starting approximation for the analysis considered in TL assumed that for small *τ* the temperature gradient variability *y* can be modeled as a red noise process independent of the salinity gradient *x*. In fact, for intermediate and large values of *τ* this approximation is not appropriate [consideration of *τ* = 0.5 and *τ* = 1.0 above was for illustration of the averaging process rather than reduction of the full system given by Eqs. (29)–(30)]. Reduction of the full coupled model [Eqs. (29) and (30)] requires consideration of the statistics of the *y* dynamics for fixed *x*. The stationary Fokker–Planck equation associated with this SDE can be solved analytically to yield the stationary pdf of *y* conditioned on *x*:

Clearly, as *τ* → 0, *p _{y}*(

*y*|

*x*) approaches the TL limit of Gaussian distribution with mean 0 and variance . The averaged drift is contoured as a function of

*x*,

*μ*, and

*σ*for

_{M}*τ*= 0.1 and

*τ*= 1 in the second and third rows of Fig. 1. As expected, for

*τ*small the differences between from the full model and from the TL approximation are small. For larger values of

*τ*, these differences (including differences in the number and location of fixed points) are stronger. Of course, the averaging approximation itself is expected to be least accurate for these larger values of

*τ*. The diffusion coefficient

*σ*

^{2}(

*x*) computed from the full system (contoured as a function of

*x*and

*σ*in Fig. 5) also differs from that of the TL system for larger values of

_{M}*τ*, particularly in the neighborhood of

*x*= 1. As

*τ*decreases, the two diffusion coefficients of the TL and full models come into agreement.

## 4. Coupled atmosphere–ocean boundary layers

The atmosphere and ocean interact through their respective boundary layers, exchanging momentum, energy, and material substances; these exchanges mediate the dynamics of the coupled climate system on seasonal to decadal time scales (e.g., Taylor 2000). The character of turbulence in the atmospheric boundary layer exercises a primary control on the strength of these exchanges; this turbulence both influences and is influenced by surface winds and thermal stratification. Because the atmospheric and oceanic boundary layers evolve over very different time scales, they are a canonical example of a coupled fast–slow geophysical system (e.g., Sura and Newman 2008). Frankingoul and Hasselmann (1977), in one of the first applications of a stochastic climate model, used a model of upper-ocean heat content driven by rapidly fluctuating surface winds as a basic “null hypothesis” explanation of the generically red spectrum of SST. Subsequent studies have addressed physical controls on the probability distributions of SAT and SST (Sura et al. 2006; Sura and Newman 2008) and of sea surface winds (Monahan 2004, 2006, 2010), informally deriving SDEs from the boundary layer heat and momentum budgets (respectively).

A model of the coupled wind–SAT–SST system follows from coupling the sea surface wind model of Monahan (2006) with the Sura and Newman (2008) model of SAT–SST dynamics:

The state variables of this system are (*u*, *υ*) (respectively, the components of the 10-m horizontal wind along and across the long-term mean wind) and (*T _{a}*,

*T*) (respectively, the anomaly relative to the long-term mean of SAT and SST). Equations (40) and (41) describe an idealized horizontal momentum budget averaged over an atmospheric boundary layer of depth

_{o}*h*(which is itself a function of the surface stratification

*T*−

_{a}*T*, as will be discussed). Tendencies in the horizontal wind are driven by imbalances between surface turbulent momentum flux [expressed using a bulk drag formula in terms of the wind speed and a drag coefficient

_{o}*c*(

_{d}*w*)], boundary layer top turbulent momentum flux (expressed in terms of an effective entrainment velocity

*w*), and a net “ageostrophic tendency” (subsuming both pressure gradient and Coriolis forces). The ageostrophic tendency is taken to have mean and fluctuating components [with respective vectors ]; for the sake of simplicity the fluctuating force is taken to be spatially isotropic and white in time. This model is described in more detail in Monahan (2006) and Monahan (2010).

_{e}The dynamics of *T _{a}* and

*T*, adapted by Sura and Newman (2008) from the earlier model of (Frankingoul and Hasselmann (1977), describes the anomaly heat budget of the coupled atmospheric column (through the depth of the troposphere) and oceanic upper mixed layer, linearized around the climate-mean state. Heat (both sensible and latent) is exchanged between the atmosphere and ocean with a net exchange coefficient

_{o}*β*=

*ρ*(1+

_{a}c_{s}c_{p}*B*) (where

*c*is the specific heat capacity of air at constant pressure,

_{p}*ρ*is the air density,

_{a}*c*is the bulk sensible heat transfer coefficient, and

_{s}*B*is the Bowen ratio; the standard model value of

*β*corresponds roughly to

*B*= 10). A nonzero mean air–sea temperature gradient

*θ*results in a nonzero average heat flux. Processes internal to the atmosphere and ocean (e.g., radiative fluxes, turbulent exchanges with the free atmosphere, or submixed layer ocean) relax

*T*and

_{a}*T*back toward their equilibrium values with rate coefficients

_{o}*λ*and

_{a}*λ*, respectively. The parameters

_{o}*γ*and

_{a}*γ*denote the effective heat capacities of the respective atmospheric and oceanic layers. The heat budgets of both the atmospheric and oceanic layers are perturbed by fast fluctuations representing variability in the heat budget not simply related to the resolved state variables (e.g., variability in turbulent fluxes, cloudiness, etc.). These white noise processes are assumed to be independent of each other and of those driving tendencies in the vector wind. Because this is an anomaly model of temperatures, the energy flux associated with the mean temperature difference

_{o}*θ*is expressed in terms of the wind speed anomaly,

*w*−

*μ*, where

_{w}*μ*is the mean wind speed.

_{w}The coupling of the winds and the temperatures in this system is two-way: the air–sea energy flux depends on the surface wind speed *w*, while the drag coefficient and boundary layer thickness both depend on the surface stratification *T _{a}* −

*T*. Monahan (2010) demonstrated that the variability of surface winds is only weakly affected by the direct dependence of surface exchange coefficients on surface stability; this dependence is neglected in the present study. A much more important process is the enhanced production (consumption) of turbulence kinetic energy in unstable (stable) conditions, which encourages the deepening (shallowing) of the atmospheric boundary layer. The boundary layer thickness

_{o}*h*is accordingly modeled as a function of the surface stratification

*T*−

_{a}*T*(Fig. 7):

_{o}(e.g., Samelson et al. 2006; Small et al. 2008; Monahan 2010). To prevent *h* from becoming unrealistically small in conditions of very stable stratification, a minimum value *h*_{min} is set. Note that one aspect of the boundary layer coupling neglected by this model is the influence of surface winds on the turbulence within the oceanic upper mixed layer and the associated variability in turbulent entrainment of colder waters from depth. A more complete treatment of the coupled system would include a prognostic model for the ocean mixed layer depth. Furthermore, maritime mixed layer depths are influenced by processes not associated with surface stratification (e.g., radiatively driven turbulence in stratocumulus clouds); for simplicity, these contributions to the variability of *h* are neglected in this study.

Model parameters were set to reproduce the observed variability at Ocean Station Papa (OSP; 50°N, 145°W). Daily-mean extended wintertime (November–March) data from 1949 to 1981 were considered. For *T _{a}* and

*T*daily anomalies were calculated relative to a smooth annual cycle fit by three sinusoids (periods of 12, 6, and 4 months) to a raw mean seasonal cycle. These model parameter values (Table 1) are based on values determined in Sura and Newman (2008), slightly tuned to improve the agreement between the model and observations. Note that because Sura and Newman (2008) fit parameters of a reduced stochastic model of

_{o}*T*and

_{a}*T*alone, their best-fit parameters are not necessarily those of the model Eqs. (40)–(43). The stratification/boundary layer height regression parameter

_{o}*α*was estimated from the data presented in Fig. 6. The mean boundary layer depth is set at an unrealistically high value (by an order of magnitude; cf. Fig. 6) so that the wind speed autocorrelation time scale matches that of observations (with the other model parameters adjusted accordingly to reproduce the correct wind speed pdf). Lag autocovariance functions of

*T*and

_{a}*T*and the joint pdf

_{o}*p*(

*T*,

_{a}*T*) (both observed and simulated) are displayed in the top panels of Fig. 7; the bottom panels illustrate the lag autocorrelation function and pdf of the wind speed

_{o}*w*. Evidently, the idealized model [Eqs. (40)–(43)] captures the basic features of observed variability.

We will consider **T** = (*T _{a}*,

*T*) as the slow variables and (

_{o}*u*,

*υ*) as the fast variables. In this case, an estimate of

*τ*is given by the ratio of characteristic time scales of the winds and

*T*:

_{a}The time-scale separation in this case is clearly not large; this system presents a strenuous challenge to the reduction technique. Equations (42) and (43) can be expressed in compact form as

where **W** = (*W*_{3}, *W*_{4}). Averaging over the fast wind variability for fixed **T** we obtain

where we have written *E*{*w*|*T _{a}* −

*T*} to emphasize that these are mean wind speeds conditioned on the air–sea temperature difference. The components of are contoured as functions of

_{o}*T*and

_{o}*T*in Fig. 8. Note that these components depend nonlinearly on

_{a}*T*and

_{a}*T*as a consequence of the dependence of the statistics of

_{o}*w*on the surface stratification through changes in

*h*. Were

*E*{

*w*|

*T*−

_{a}*T*} independent of

_{o}*T*−

_{a}*T*(as assumed in Sura and Newman 2008) the contours in Fig. 8 would all be straight lines antisymmetric across the neutral stratification

_{o}*T*−

_{a}*T*= 0 line. In fact, the dependence of on

_{o}*T*−

_{a}*T*is different for stable and unstable stratifications. The gradients in are different on either side of the neutral stratification line. The kink in in strongly stably stratified conditions is associated with the boundary layer thickness taking its minimum value. The dependence of the wind speed statistics on the surface stratification (via the thickness of the atmospheric boundary layer) has modified the averaged dynamics; these are not simply given by the full dynamics with

_{o}*w*replaced by

*μ*(the bare truncation).

_{w}The expression for *σ*(*T _{a}*,

*T*) takes a particularly simple form:

_{o}where

characterizes the state dependence of the noise terms in the (N+) approximation. A plot of *ψ* as a function of *T _{a}* −

*T*is presented in Fig. 9. This curve is manifestly nonlinear, taking a maximum at around

_{o}*T*−

_{a}*T*= 5 K. The kink in

_{o}*ψ*at around

*T*−

_{a}*T*is associated with

_{o}*h*reaching its

*h*

_{min}. In Sura and Newman (2008), the state dependence of noise (correlated with additive noise) is present only in the factor (

*T*−

_{a}*T*+

_{o}*θ*) in

*ψ*; accounting for the feedback of air–sea temperature differences on wind variability adds the extra factor under the square root in Eq. (49). This extra factor is responsible for the nonlinear dependence of

*ψ*on

*T*−

_{a}*T*. As a consequence of the modulation of the statistics of

_{o}*w*by changes in boundary layer thickness, the limiting SDE when wind speed variability is on shorter time scales than temperature variability is not simply obtained by replacing

*w*by with

*σ*a constant noise strength.

_{w}Transforming from Stratonovitch to Ito form, the SDE for **T** becomes

We can integrate this equation using standard numerical methods (e.g., Kloeden and Platen 1992) to obtain statistics of the reduced model in the (N+) approximation. Figure 10a compares the autocorrelation functions of *T _{a}* and

*T*and the joint pdf

_{o}*p*(

*T*,

_{a}*T*) to those of the full model; while the statistics of the reduced and unreduced models are in broad agreement, there are evident quantitative differences. In particular, the reduced model variables decorrelate more rapidly than those of the unreduced model, and the variance of

_{o}*T*in the reduced model is underestimated relative to the true value. In fact, the agreement between the full and reduced models is remarkably good considering the very poor time-scale separation (

_{o}*τ*~ 0.7) between the fast and slow variables. Figure 10b illustrates the reduced model statistics when

*σ*(

*T*,

_{a}*T*) is artificially reduced by a factor of 2. This simple retuning of the stochastic term brings the full and reduced models into much better agreement. The necessity of this retuning is a consequence of applying stochastic averaging to a system without a large time-scale separation, as is also seen in more complex systems (e.g., Culina et al. 2011). This rescaling of terms is similar to the “minimal regression” rescaling used in MTV theory to improve the approximation of the full system by the reduced model (e.g., Culina 2009; Franzke et al. 2005; Franzke and Majda 2006; Strounine et al. 2010).

_{o}This analysis has demonstrated how in an idealized, but physically relevant system, stochastic averaging can both produce accurate reduced models of the slow dynamics and provide insight as to how these are influenced by the feedback from the modulation of the statistics of the fast variables. Because of the poor time-scale separation between the slow and fast variables, some ad hoc retuning of the model was necessary. Application of these ideas to more complex systems (such as the two-layer quasigeostrophic channel model considered in Culina et al. 2011) requires the use of more sophisticated algorithms, but the underlying approach is the same as in this relatively simple case.

## 5. Discussion and conclusions

The mathematical theory of stochastic averaging discussed in this study provides a precise quantitative framework for investigating the effective dynamics of the slow (resolved) climate variables in the presence of fast (unresolved) weather variability. This framework characterizes a hierarchy of approximations to the full coupled fast–slow system. In the limit of infinitely large scale separation, the slow dynamics are deterministic, with tendencies given by those of the full system averaged over the statistics of the fast variable (conditioned on a fixed value of the slow variable). In general, these deterministic dynamics will differ from those associated with ignoring fluctuations in the fast variables and replacing them with their mean values (the so-called bare truncation). For large but finite scale separations, stochastic corrections emerge. A better approximation to the deterministic averaged system includes Gaussian corrections described by a linear stochastic differential equation (such as those considered in, e.g., Farrell and Ioannou 1993; Penland 1996; DelSole 2001; Kleeman 2008). At this level of approximation, the climate variable is described by the deterministic averaged dynamics with superimposed fluctuations that do not feed back on the averaged trajectory. A more accurate approximation describes the climate variable with a full stochastic differential equation, which may be linear or nonlinear and may involve additive or multiplicative noises. The drift and diffusion parameters of this SDE are predicted explicitly in terms of the original dynamics and the statistics of the fast variable.

Two examples of stochastic averaging were considered in this study: first, a simplified Stommel-type model of the meridional overturning circulation; and second, an idealized model of the coupled atmosphere–ocean boundary layers. The first example demonstrated how the averaged dynamics can differ substantially from the bare-truncation dynamics; in particular, the number and location of deterministic steady states can be changed by the averaging. Furthermore, as the time-scale separation is made arbitrarily large it was demonstrated that the full and reduced models come into good agreement. The second example, in which model parameters were set to reflect observed wind and temperature variability at Ocean Station Papa in the northeastern subarctic Pacific Ocean, we did not have the luxury of being able to adjust the time-scale separation between the fast and slow variables (respectively, wind and temperatures). In fact, in this example the time-scale separation was weak. Nevertheless, the statistics of the reduced model predicted by stochastic averaging were in reasonable agreement with those of the full model; this agreement was substantially improved by some slight ad hoc retuning of the stochastic corrections. Furthermore, this analysis demonstrated how the dynamics of sea surface temperature and surface air temperature are influenced by the dependence of wind speed statistics on the surface stratification—that is, how the modulation of the fast variables by the slow feeds back on the effective slow dynamics.

Other studies have considered the application of reduction techniques for deriving effective stochastic dynamics from a coupled fast–slow system. The so-called MTV theory is an example of precisely such a framework (e.g., Majda et al. 2001, 2003, 2005, 2010; Franzke et al. 2005; Franzke and Majda 2006; Strounine et al. 2010). In fact, MTV theory can be understood as a special case of stochastic averaging in which the influence of the fast variables on the slow is rescaled as the time-scale separation increases in such a way that an effective stochastic dynamics is obtained even in the strict limit of infinite time-scale separation (a simple exactly solvable case is presented in example 4 of the appendix). This assumed form of the dynamics has the advantage of dramatically simplifying computation of the parameters of the SDE; furthermore, the assumed quadratically nonlinear form of the dynamics assumed in MTV theory provides closed-form analytic expressions for the reduced dynamics. In practice, these simplifications are tremendously useful. However, they may not be consistent with the dynamics of the fast–slow system, and these inconsistencies may account for some of the biases that have been found in MTV reduced models (e.g., Culina 2009; Strounine et al. 2010). We further note that a reduction strategy similar to stochastic averaging known as “optimal prediction” was introduced by Chorin et al. (1998, 1999, 2000); the essential similarity between optimal prediction and stochastic averaging was demonstrated in Park et al. (2007).

In this study, stochastic averaging has been applied to highly idealized models. While the application of this reduction framework to more complicated models is the same in principle, in practice more sophisticated algorithms must be used. An application of stochastic averaging to a high-dimensional problem (a two-layer quasigeostrophic channel model) is presented in Culina et al. (2011), along with a detailed discussion of the algorithms required for this more complex system. As in the present study, it was found that the absence of a large time-scale separation requires some ad hoc a posteriori retuning of the stochastic corrections to obtain good agreement between the full and reduced models (as has been found to be the case with MTV theory; e.g., Culina 2009; Franzke et al. 2005; Franzke and Majda 2006). An important direction of future research is the investigation of the physical origin of such reduced model biases. Furthermore, to make practical the application of stochastic averaging to coupled weather–climate systems with modest time-scale separations, it is important to develop general strategies for the determination of such tuning parameters. The recent study of Strounine et al. (2010) presents such a strategy in the context of MTV theory; it would be interesting to see such an approach applied in the more general stochastic-averaging framework.

A potential benefit of stochastic reduction in systems with large time-scale separations between fast and slow variables is a significant increase in computational efficiency (e.g., Fatkullin and Vanden-Eijnden 2004). As discussed above, such a large separation often does not exist in atmosphere or ocean dynamics. The present study has focused on the existence and structure of the reduced dynamics; these are of interest on their own, in terms of the light they shed on the effective climate dynamics without explicitly modeled weather. A more detailed discussion of computational efficiency is given in Culina et al. (2011).

We will never in the foreseeable future be able to represent all dynamically relevant scales in models of the climate system, particularly as we ask questions about climate processes over ever longer time scales. Strategies for the construction of reduced models of the resolved variables alone will continue to be important. Mathematically rigorous strategies for the reduction of coupled fast–slow systems provide a potentially important direction of study, but these are only strictly valid in conditions that rarely obtain for real climate models. The great challenge of this program is to square the circle of balancing rigor with practical utility in the development of tools for systematically exploring the coupled dynamics of weather and climate.

## Acknowledgments

The Ocean Station Papa data were kindly provided by Philip Sura. We thank Christian Franzke for bringing “optimal prediction” to our attention. We would also like to thank Philip Poon, Cécile Penland, and two anonymous reviewers for their very helpful comments on the manuscript.

### APPENDIX

#### Toy Examples of Stochastic Averaging

The following are presented as toy “tutorial” examples of the application of averaging to simplified systems. Note that while the equations for the fast variables are linear in all of these cases for ease of analytic solution, this is not required by the averaging approach (cf. section 4).

##### a. Example 1

Consider the system

where Σ is a function of *x*. The stationary pdf of the fast variable *y* is Gaussian with mean 0 and autocovariance function

so and . Thus, we have the (A) approximation

with solution

Furthermore,

and so for small *τ* we have the (L) approximation

where

As *t* → ∞, in the (L) approximation *x*(*t*) approaches an Ornstein–Uhlenbeck process with variance *τ*Σ^{2}(0)/2.

In this case, the (N+) approximation corresponds to the predictions of the Wong–Zakai theorem (e.g., Horsthemke and Lefever 2006), which states (in essence) that as the autocorrelation time of noise driving a system approaches zero the limiting SDE is Stratonovitch.

##### b. Example 2

Consider the system

The stationary pdf of *y* (with *x* frozen) is Gaussian with mean 0 and autocovariance function

so

and

Thus, we have the (A) approximation

As *t* → ∞, (depending on the initial conditions).

Furthermore,

and so for small *τ* we have the (L) approximation

where

In the (L) approximation, the two attractors of are fixed points at ±1. Considering initial conditions *x*(0) for which , for large *t*

where

In this limit, *x* is Gaussian distributed with mean 1 and autocovariance function

For sufficiently small Σ, a trajectory of the full system (A10) and (A11) starting near *x* = 1 will spend most of its time in a small neighborhood of this point and will come to a local statistical equilibrium approximately described by the linearized system

Writing this SDE for a multivariate Ornstein–Uhlenbeck process in vector form:

the stationary covariance matrix is given by

where is the 2 × 2 identity matrix (e.g., Gardiner 1997) from which we compute

In the limit that *τ* → 0, reduces to the form given by Eq. (A21).

Over longer time scales, the (N+) approximation is more appropriate:

where the last expression is the SDE in its Ito representation.

The Fokker–Planck equation for the stationary pdf of *x* can be solved analytically

Plots of *p _{x}*(

*x*) from Eq. (A29) and calculated from long integrations of Eqs. (A10) and (A11) for

*τ*Σ

^{2}= 0.25 fixed and a range of

*τ*values are presented in Fig. A1. For larger values of

*τ*, the numerical simulation and the (N+) approximation do not agree well; as

*τ*is decreased, agreement improves. Note that

*p*(

_{x}*x*) from the (N+) approximation [Eq. (A29)] depends on

*τ*and Σ only through the combination

*τ*Σ

^{2}, so the (N+) approximation is the same for all values of

*τ*in this figure.

##### c. Example 3

Consider the system

for which the stationary pdf of **y** is bivariate Gaussian with mean 0 and autocovariance function

In this case,

so we have the (A) approximation

and the (L) approximation

where

(where the value of the prefactor was determined numerically). In this case, the (N+) and (L) approximations are the same. For **y** bivariate Gaussian, the distribution of ‖**y**‖ is Rayleigh:

(e.g., Monahan 2006); this distribution is positively skewed. To varying degrees, depending on the value of *τ*, the skewness of the fluctuating forcing is imprinted on the pdf of *x*. As *τ* decreases, the more “statistically independent” fluctuations of ‖**y**‖ there are per unit time scale of *x*; by the central limit theorem, the distribution of the resulting response approaches a Gaussian as *τ* → 0. Probability density functions of *x*(*t*) obtained from direct numerical integration of Eqs. (A30)–(A31) for varying values of *τ* and Σ (such that *τ*Σ^{2} is constant) are compared against the stationary pdf of *x* from Eq. (A35) in Fig. A2. As *τ* → 0, the pdfs of the full and reduced slow variables converge.

For the system in example two the multiplicative noise of the (N+) approximation allowed the reduced model to capture the skewness of *x*(*t*). In the present example, the (N+) approximation does not differ from (L), so the skewness of *x* in the full model is a feature that the reduced model cannot capture (except in the limit τ → 0, in which the skewness vanishes).

##### d. Example 4

Consider the system

(where for simplicity we restrict ourselves to the parameter range *a* < 1 so that the dynamics are stable). This system is of the form considered in MTV theory, in which there are three time scales: slow *O*(1); intermediate *O*(*τ*^{−1/2}); and fast *O*(*τ*^{−1}). In particular, the strength of the forcing of *x* by *y* scales as *τ*^{−1/2}, increasing as the time-scale separation increases. Note that unlike the previous examples the stationary distribution of *y* for *x* fixed is not independent of the time-scale separation *τ*. This model is exactly solvable; in particular, the variable **x** = (*x*, *y*) is bivariate Gaussian with covariance matrix given by Eq. (A25). It follows that

As *τ* → 0,

This limit only exists if *a* < 1; otherwise, the slow variable *x*(*t*) does not possess stationary statistics. Equation (41) is the autocovariance function of the solution *x*(*t*) of the SDE in the slow variable alone

which is the appropriate limiting SDE as *τ* → 0.

Now applying stochastic averaging: for *x* fixed, *y* is a Gaussian process with mean

and autocovariance

A direct application of averaging gives

so the (A) approximation is

Moving directly to the (N+) approximation:

where

so the averaging equation is

which is exactly the correct limiting dynamics from the earlier explicit calculation. It is worth noting that in computing the limiting dynamics, averaging was performed first and then the *τ* → 0 limit was taken. If the τ → 0 limit had been first, then the correction to the drift term would have been lost. Furthermore, the stochastic terms are still present in the *τ* → 0 limit because the factor of *τ*^{−1/2} in the term by which *y* drives *x* in the original dynamics exactly counterbalances the *τ*^{−1/2} entering the computation of *σ* from integration over the lag autocovariance function. Note that as discussed in Majda et al. (2002), the MTV reduction can be extended to the case *a* ≥ 1.

## REFERENCES

## Footnotes

^{*}

Current affiliation: Department of Mathematics and Statistics, Acadia University, Wolfville, Nova Scotia, Canada.