## Abstract

The fluctuation–dissipation theorem (FDT) has been suggested as a method of calculating the response of the climate system to a small change in an external parameter. The simplest form of the FDT assumes that the probability density function of the unforced system is Gaussian and most applications of the FDT have made a quasi-Gaussian assumption. However, whether or not the climate system is close to Gaussian remains open to debate, and non-Gaussianity may limit the usefulness of predictions of quasi-Gaussian forms of the FDT. Here we describe an implementation of the full non-Gaussian form of the FDT. The principle of the quasi-Gaussian FDT is retained in that the response to forcing is predicted using only information available from observations of the unforced system, but in the non-Gaussian case this information must be used to estimate aspects of the probability density function of the unforced system. Since this estimate is implemented using the methods of nonparametric statistics, the new form is referred to herein as a “nonparametric FDT.” Application is demonstrated to a sequence of simple models including a stochastic version of the three-component Lorenz model. The authors show that the nonparametric FDT gives accurate predictions in cases where the quasi-Gaussian FDT fails. Practical application of the nonparametric FDT may require optimization of the method set out here for higher-dimensional systems.

## 1. Introduction

The prediction of the response of the climate system to an external forcing or to a change in some parameter of the system is seen as a key problem in climate science. It is an example of the general mathematical problem of predicting the change in a dynamical system, for example, the change in the probability density function (PDF) of the system as manifested by changes in selected observables, as a response to a change in the parameters of the system. In a broad class of physical systems this problem can potentially be addressed using the fluctuation–dissipation theorem (FDT). The FDT remains an active topic of research interest—for example, to extend the range of systems to which it can be applied and to clarify the conditions under which it holds. For a recent review, see Marconi et al. (2008). The possible application of the FDT to climate was first suggested by Leith (1975) and over the last decade there has been a surge of interest in this topic (e.g., Majda et al. 2005; Gritsun and Branstator 2007; Majda et al. 2010a and references therein).

The FDT is relevant to cases where the applied forcing or change in parameters is sufficiently small that the change in the PDF, or correspondingly in any observable, is a linear function of the applied forcing. The FDT gives an expression for the corresponding linear operator, relating the response to the forcing, in terms of the statistics of the unforced system. This has potential advantages for determining the forced response in a model climate where, for example, it may be more efficient to calculate the response to a large set of different forcings via the FDT-determined operator rather than to carry out a large set of perturbation simulations. For the real climate it offers the possibility of determining the linear response operator from past climate observations. Additionally knowledge of the linear operator allows not only prediction of the response to a given forcing but also, if the inverse operator can be calculated, prediction of the forcing that excites a given response or indeed what sort of forcing will most efficiently excite a response.

One remarkable aspect of the FDT (e.g., Marconi et al. 2008) is that it does not require any detailed knowledge of the unperturbed governing equations of the system in question. It is naturally applicable to any system that behaves suitably randomly, which might be through deterministic chaos, that is, internally generated randomness, or through the inclusion of explicit randomness. Broadly speaking, the system must have a time-invariant PDF (which defines the climate) although extension to the time periodic case is possible; e.g., Majda and Wang (2010) and this PDF must be differentiable. However, there are well-known simple dynamical systems for which the differentiability property does not hold, and it remains an open question whether climate models (which might include simplified models with few degrees of freedom and complex models with very many degrees of freedom) have this property. One way of ensuring it is to add suitable explicit stochastic terms to the governing equations and, if these are small enough in magnitude, it is reasonable to assume that the overall properties of the system are not changed as a result.

Consider a dynamical system described by a state vector **x** and governed by a set of evolution equations that give *d***x**/*dt* in terms of **x**. The equilibrium state of the system is described by a PDF *ρ*(**x**), which is assumed to satisfy the differentiability requirement mentioned previously. A forcing *δ***f**(*t*) is applied to the system, meaning that *δ***f**(*t*) is added to the expression for *d***x**/*dt*. A more general perturbation where parameters appearing in the evolution equations are changed is straightforward to include (e.g., Risken 1984) but will not be considered further here except in the derivation presented in the appendix. The mean response to the forcing is given by

where 〈**x**_{0}(*t*)〉 is the mean state vector of the system at time *t* in the undisturbed system and 〈**x*** _{f}*(

*t*)〉 is the corresponding mean state vector with the applied forcing.

The mean denoted by 〈···〉 is an ensemble mean, for example, over all realizations of explicitly random terms that appear in the evolution equations, or over a large number of initial conditions if the randomness arises through deterministic equations. The FDT relates the response 〈*δ***x**(*t*)〉 to a small amplitude forcing *δ***f**(*t*), assumed only applied at times *t* > 0, through the equation

where superscript T denotes the matrix transpose (Deker and Haake 1975; Hänggi and Thomas 1977; Risken 1984; Majda et al. 2005; Marconi et al. 2008). A derivation of Eq. (1) is presented in the appendix, primarily to clarify its connection to other approaches to calculating the response, particularly that of Thuburn (2005).

The requirement of differentiability of *ρ*(**x**) is clear from the appearance of **∇***ρ* in expression (1). The expression (1) is a linear relation that will be valid only in the limit of a small forcing *δ***f**(*t*). It is, in principle, possible to write down an extended expression containing higher order terms, but the practical usefulness of such higher order expressions is not yet clear. For a linear dynamical system, of course, the expression (1) will be valid without restriction. For a nonlinear system, the skill of estimate (1) will depend on the magnitude of the forcing vector, but we expect that there is some small but finite range of magnitudes for which Eq. (1) will be of practical use.

In this paper, motivated by the steady-state response of the climate, or of a climate model, we will consider the special case of a steady forcing applied for *t* > 0 and the steady-state response 〈*δ***x**(∞)〉 in the limit *t* → ∞, which we hereafter denote 〈*δ***x**〉, given by

where from expression (1) the matrix is given by

It is useful to note at this point that , with the identity matrix, which may be shown by expressing the expected value as an integral over the state space and integrating by parts.

In both expressions (1) and (2) a significant simplification arises if the equilibrium system is assumed to be Gaussian, that is, that

in which is the lag *τ* covariance matrix, is the covariance matrix, and we have assumed without loss of generality that 〈**x**〉 = 0. Combining Eqs. (2) and (3) implies that

Note that in Eq. (4), as in Eq. (2), the integrand is equal to at *τ* = 0. Explicit dependence on *ρ*(**x**) has now disappeared and the response is predicted in terms of covariance matrices, which are in principle straightforward to evaluate from data from the unperturbed system. [A similar simplification would be possible for distributions other than Gaussian if the explicit form of *ρ*(**x**) were known.] Expression (4) and the corresponding version of the more general expression (1) applying to the time-evolving response are expressions of the quasi-Gaussian FDT, which is the form of the FDT that has appeared in the vast majority of applications.

The primary practical concerns in using Eq. (4) or similar forms of the quasi-Gaussian FDT are (see, e.g., Gritsun and Branstator 2007) in choosing an appropriate finite upper limit for approximation of the integrals, in having accurate estimates of the lagged covariance matrix for different values of *τ*, and for high -dimensional systems, avoiding inaccuracies associated with calculation of .

A related approach is to assume that the system may be modeled by linear equations driven by noise, that is, by equations of the form

where is a constant matrix and ** ξ** is a stochastic white noise term. The system is stable if eigenvalues of have negative real parts, and it then follows (Penland 1989) that

where *τ* can in principle be chosen freely [i.e., the expression in Eq. (6) is independent of *τ*, since ]. Note that (5) implies that , but Eq. (6) provides an estimate of when is unknown. Under the assumptions of Eq. (5) the expressions for in Eqs. (6) and (4) are equivalent. Note that the quasi-Gaussian assumption (3), which leads to Eq. (4), is more general than the assumptions leading to Eq. (6) since Eq. (5) implies Gaussianity.

The quasi-Gaussian form of the FDT has been applied to a general circulation model (GCM) by Gritsun and Branstator (2007). While results were encouraging, the magnitude of the difference between the response estimate and the true response was often up to 50% (and in some cases more), with predictive skill varying with the type of forcing (see their Fig. 4). Ring and Plumb (2008) applied the quasi-Gaussian FDT via the inverse linear modeling approach (6) to a simple dynamics-only GCM and found that their predicted response to a mechanical and thermal forcing was not quantitatively accurate. We have also found in work to be reported elsewhere that the Gaussian FDT fails to predict the correct response in a dynamics-only GCM. In none of these cases does the system being studied have an exactly Gaussian PDF nor, indeed, does the real atmospheric circulation (see, e.g., Branstator and Selten 2009). This suggests that it is worth investigating the possibility of implementing the general form of the FDT expressed by Eqs. (1) and (2) without any assumption of Gaussianity. Thuburn (2005) has already suggested an approach to calculating climate response for a non-Gaussian system using the Fokker–Plank equation (which assumes that explicit randomness is induced in the evolution equations via a white noise forcing), but makes no reference to the FDT. Eyink et al. (2004) and Abramov and Majda (2007, 2008, 2009) have also presented approaches requiring integration of linearized adjoint equations along trajectories of the unperturbed system. But these different approaches lose the essential simplicity of the FDT (1), which requires only observation of the unperturbed system.

In this paper the novel approach that we take is to estimate the PDF as part of a response calculation using a standard nonparametric kernel density estimation method. The assumption of a Gaussian PDF is then replaced by the assumption that the PDF may be sufficiently well estimated by a sum of kernel functions. In section 2 we introduce this “nonparametric FDT” and explain how it may be implemented in practice. In section 3 we apply the nonparametric FDT to some simple models to test its ability at predicting the response to a forcing compared against that of the quasi-Gaussian FDT. The relation to the other approaches to calculating responses in non-Gaussian systems, along with issues regarding the application of the nonparametric FDT to more complex models, is discussed in section 4.

## 2. A nonparametric fluctuation dissipation relation

### a. Application of the density estimation procedure

If *ρ*(**x**) appearing in Eqs. (1) and (2) is not known, then in principle it may be estimated from data. A standard method of estimating a multidimensional PDF is to use the kernel density estimator (Silverman 1986). For a given choice of kernel function *K* we estimate the PDF by centering a kernel function of specified scale *h* at each data point

Here is the estimated PDF, *n* is the number of available data points, **X*** _{i}* is data point

*i*,

*d*is the number of dimensions of the state vector

**x**, and

*h*> 0 is conventionally known as the bandwidth in the statistical literature (Silverman 1986). In effect the density estimate is simply a convolution of the data and the kernel function. As the number of data points

*n*→ ∞, the estimate of the PDF tends to the convolution of the true PDF

*ρ*(

**x**) and the kernel function

*K*,

where the asterisk denotes the convolution operator. See Silverman (1986, sections 3.5 and 3.7 and references therein) for a discussion of the rate of and weak conditions required for this convergence. Therefore, for any finite value of *h* the estimated PDF can only be an approximation to the true PDF and, indeed, should be regarded as a biased estimator with the bias being a function of *h*. As *h* → 0, *K* tends to a delta function and the bias tends to zero.

We use, for simplicity, an isotropic Gaussian kernel with identity covariance and, hence, write

where

denotes the multivariate normal distribution function with mean **y** and standard deviation *h*, evaluated at **x**. It is useful to note that

In estimating the operator in Eq. (2) we make two further approximations. First, we truncate the integral to have finite upper limit *T*. Second, we express the ensemble average appearing in Eq. (2) as a time average, justified by requiring the system to be sufficiently ergodic. Assume that the time interval between each data point **X*** _{i}* is Δ

*t*. Then for some lag

*s*Δ

*t*the integrand, Λ(

*s*Δ

*t*), may be approximated by

where, to recap, *h* is the bandwidth of the kernel density estimation, *m* is the number of points included in the sample used to obtain the time average, and *n* is the number of points included in the sum used for the density estimate. The operator may then be approximated as

where ** μ** is a vector of weights used to estimate the integral appearing in Eq. (2). In the calculations to be reported subsequently, the method of estimation is the trapezium rule applied to the integral truncated at some finite upper limit. It is expected that will be a good approximation of provided that the numbers

*m*and

*n*are sufficiently large, the bandwidth

*h*is sufficiently small, and the vector

**extends to large enough time lags and has a sufficient number of elements**

*μ**r*to accurately approximate the integral. Equation (12) may be regarded as the practical implementation of a nonparametric FDT that uses the methods of nonparametric statistics. No matrix inversion is required, in contrast to the standard quasi-Gaussian FDT (4). Note further that, while , it is not necessarily the case that . This suggests the alternative estimator

which is constrained to satisfy , with a correspondingly defined

Note that Eq. (11) assumes that all sample points are used in the estimation of **∇***ρ*/*ρ* (i.e., the summations in the numerator and denominator of the bracketed expression are over all sample points). But given that it is computationally expensive to compute the double sum appearing in Eq. (11), it may be that it is optimal only to use a subsample for summation in the expression within braces. Another way of saying this is that, as written, Eq. (11) assumes that the time intervals for which the lag correlations are evaluated are multiples of the time intervals of sampling used to estimate **∇***ρ*/*ρ*. But these time intervals could be multiples of some fraction of the latter sampling interval.

### b. Choice of bandwidth parameter h

For the practical application of Eq. (12) picking an optimal value of the bandwidth parameter *h* is crucial. Some insight into how to make this choice may be obtained by considering the PDF estimation procedure (7). The variance and bias of the kernel PDF estimator may be approximated, using Taylor’s theorem assuming bounded and continuous second derivatives (Silverman 1986), as

where

and

In Eqs. (16)–(18) standard suffix notation is used and the summation convention is used in Eq. (16). So, for large *h* the estimate has large bias but small variance, but for small *h* the estimate has a small bias but a large variance. When choosing *h* there is, therefore, a tradeoff between an incorrect answer (large bias) and uncertain answer (large variance). The variance is also a function of *n*, the number of data points, with larger *n* implying a more certain estimate of the PDF.

One approach to choosing an optimal value for *h* is to minimize the mean square difference between the true and estimated PDF, averaged over all **x**. The optimal value of *h* is then given by

where again the summation convention is applied. Note that *h*_{opt} reduces with *n* at the rate *n*^{−1/(d+4)}, with precise details depending on the density being estimated (Silverman 1986). Other approaches to choosing *h*_{opt} include maximizing an appropriate likelihood function and to estimate a PDF that is uniformly close to the true PDF. Again, see Silverman (1986) for further details.

While the approaches described above for optimizing the choice of *h* for the problem of estimating *ρ*(**x**) are not directly relevant to the estimation problem posed by Eqs. (2), (11), and (12), it is expected that some of the issues mentioned will also be relevant to this new estimation problem. Consider, for example, the part of Eq. (11) that corresponds to the estimate of **∇***ρ*(**x**)/*ρ*(**x**). By writing this as a Taylor series, assuming , then standard methods imply that

A new ingredient over the standard problem (16) is, therefore, that bias depends upon *n* and increases as *h* reduces. It is difficult to go further than this and estimate the bias of because of problems with divergent integrals, but the form of Eq. (21) suggests an optimal estimate at some finite value of *h*: *h*_{opt}, say, which depends upon *n*, with *h*_{opt} ∼ *n*^{−1/(d+2)}. The corresponding bias varies as *n*^{−2/(d+2)}. Thus, as *d* increases, reduction of the bias to a specified level requires larger and larger *n*.

We can gain further insight by considering various possible values of *h*, in particular taking account of the expression in braces inside the summation over *j* in Eq. (11). For large *h*, and given *j*, many *i* will contribute to each of the sums appearing in the numerator and denominator. A typical value of |**X*** _{j}* −

**X**

*| for those terms that do contribute will be*

_{i}*h*or less. [Terms that have |

**X**

*−*

_{j}**X**

*| ≫*

_{i}*h*will have

*N*(

**X**

*;*

_{j}**X**

*,*

_{i}*h*) ≪ 1.] Therefore, bearing in mind the

*h*

^{2}factor in the denominator, the term within braces is expected to be

*O*(

*h*

^{−1}) and the expression for is expected to include a factor proportional to

*h*

^{−1}. An opposite limit is for

*h*very small when for given

*j*all terms that have

*i*≠

*j*are exponentially small in

*h*

^{−2}. The largest term in the denominator will be the

*i*=

*j*term. However, the

*i*=

*j*term in the numerator is identically zero since

**X**

*−*

_{j}**X**

*= 0. Therefore the numerator will be dominated by the*

_{i}*i*≠

*j*such that |

**X**

*−*

_{j}**X**

*| is smallest. The expected size of this term decreases exponentially as*

_{i}*h*

^{−2}; therefore, will reduce to zero as

*h*reduces to zero. We expect a situation where , and therefore any estimate of

*δ*

**x**for given forcing

*δ*

**f**is expected to be small for large

*h*and to increase in magnitude as

*h*decreases. On the other hand, the estimate is likely to be very small for

*h*small, suggesting that a best estimate will come for intermediate

*h.*Furthermore, by analogy with standard results for density estimation, the variance of the estimate is expected to be large for small

*h*and also to depend on the size

*n*of the sample being used to estimate the PDF. Therefore, in considering any estimate it is likely to be important to examine the effect of changing both

*h*and

*n*.

## 3. Testing the nonparametric FDT

In this section we test the nonparametric form of the FDT developed above by applying it to some simple models. We consider only the steady-state responses as described by Eq. (2) and not finite time predictions of the state vector as potentially described by Eq. (1). We first consider two linear stochastic models that exhibit Gaussian statistics: first with 1 degree of freedom and second with 3 degrees of freedom. We are therefore able to compare the performance of the Gaussian and nonparametric algorithms. The third model considered is a one-dimensional bimodal stochastic model that has a strongly non-Gaussian PDF. Here the Gaussian form of the FDT fails, and we demonstrate that the nonparametric form does not. Finally, we consider the application of the nonparametric FDT to the Lorenz model (Lorenz 1963) with the addition of stochastic white noise to avoid difficulties with nondifferentiable PDFs. The nonparametric form of the FDT is once again able to predict the response to forcing of this stochastic Lorenz system. Throughout we take *m* = *n* when applying Eq. (11) although, as noted previously, variations on this might have advantages.

### a. A Gaussian model

To produce a system with a Gaussian PDF we use the *d*-dimensional linear stochastic model specified by Eq. (22):

where **x** is a *d* element state vector, *t* is time, is a *d* × *d* constant matrix chosen such that the system is stable, ** ξ** is a white noise stochastic term with a constant covariance matrix , and

**f**is a forcing term. For this system is simply equal to . With

**f**= 0 the covariance matrix may be found via the Lyapunov equation

For a linear stochastic system the lag *τ* covariance matrix is given by

In the one-dimensional case where say, this reduces to the autocorrelation function *a*_{corr}(*τ*):

It is the autocorrelation that determines the time scale in the model and therefore the number of independent or uncorrelated data points in a given integration.

The practical application of the FDT to such an equation has been considered in detail in various earlier publications, for example Risken (1984) and Penland (1989). We choose to restrict ourselves to the case , where is the identity matrix, and use the method of Rümelin (1982) to solve Eq. (22) by applying the Euler–Maruyama scheme

in which ** ξ** is a Gaussian distributed random vector with covariance .

In making comparison between the predicted form of and the exact form of , it needs to be kept in mind that the accuracy of Eq. (24) as an approximation to Eq. (22) is limited by the size of the time step *δt.* Note further that in some of the comparisons to be reported there was some indication of sensitivity to the type of random number generator used to generate ** ξ**, and for the work throughout this paper we used the Mersenne Twister algorithm (Matsimoto and Nishimura 1998) that is now increasingly used in standard software packages.

Given that, for the system (22), we know that the form of the PDF is described by Eq. (3), we are able to compute the expected value of the nonparametric response calculation as *n* → ∞ by evaluating in Eq. (8) as the convolution of two Gaussians and then substituting this for *ρ* in Eq. (2), which yields

In this case the bias, indicated by the fact that the product of the second two matrices in Eq. (25) is not equal to , may be removed by multiplying by the inverse of this product to give

For the one-dimensional case we take (i.e., *b* = 0.5) as a first example. We used *δt* = 0.01 and recorded the state variable for the purpose of evaluating Eqs. (11) and (12) at unit time intervals. The upper limit *T* = 10, significantly greater than the autocorrelation time, was taken for the integral (2). The integration time was taken to be 65 536 (i.e., ∼3 × 10^{4} times the correlation time of the system). Figure 1 shows predictions of the forced response based on eight independent integrations, allowing assessment of the uncertainties involved in estimating Eq. (12). There is a clear dependence on the choice of *h*, broadly consistent with the qualitative arguments presented in the previous section. (Note that the scale for *h* is logarithmic.) For large *h* the response is underpredicted, consistent with the general arguments and also with Eq. (25), but the variation between estimates from different samples is small. For an intermediate range of values of *h* the estimate is close to the correct value, still with little variation between samples. For small values of *h* there is very large variation between samples (so estimates from an individual sample would essentially be useless).

Similar comparisons were carried out for , with *b* taking various values between 0 and 1. Results were broadly consistent with those reported for *b* = 0.5, bearing in mind that increasing *b* implies a shorter autocorrelation time and hence, for fixed integration length and sampling interval, effectively larger samples of the system. Increase of *b*, keeping the magnitude of the stochastic term fixed, also implies that the bias associated with the finite width of the kernel functions will tend to be larger at fixed *h* (since the width of the distribution itself is smaller relative to *h*).

A three-dimensional system was also considered with

with chosen randomly subject to the constraint that the system is stable. Note that the eigenvalue of with smallest real part is −6.6067 × 10^{−3}, implying an autocorrelation time of ∼150. Typical results are shown in Fig. 2. In this case 10 integrations, each of 10^{6} time units, were performed and the forcing **f** was taken to be

In this case it is useful to show explicitly the effect of the choice of suitable upper limit *T* for the integral (2), or equivalently integral (4) if the quasi-Gaussian approximation is made. The top left panel shows the variation with *T* of the predicted response under the quasi-Gaussian approximation. Note that the optimal choice of *T* is finite for any finite sample size since the uncertainty in the prediction increases as *T* increases. The results shown in Fig. 2 show that, for the parameter values here, *T* = 1000 is a suitable choice and gives a quasi-Gaussian estimate within a few percent of the exact answer. The top right panel of Fig. 2 shows, for this choice of *T*, the corresponding variation of the nonparametric FDT estimate with *h*. As for the one-dimensional case, the magnitude of the response is seen to be underpredicted for large *h*. In this case the behavior at small *h* is slightly different from that seen in the one-dimensional case. In the one-dimensional case the variance of the predicted response increases rapidly as *h* decreases, but there is no clear increase in bias. In the three-dimensional case shown in Fig. 2, on the other hand, there is a clear increase in bias at small *h*, associated with an anomalously small predicted response consistent with arguments given in section 2b above. The bottom left panel shows the “corrected” FDT estimate according to Eq. (14). This reduces but does not eliminate the inaccuracies at large *h* and gives a greatly increased variance at small *h*. The bottom right panel shows the corrected FDT estimate according to Eq. (26). This highlights the bias due to finite sample size *n*, which is dominant for small *h*.

These Gaussian examples have served to confirm that the nonparametric FDT works in practice and to highlight the implications of the choice of *h*. The bias increases with *h*; therefore *h* should be chosen to be as small as possible. On the other hand, the variance and the bias of the estimate increase at small *h*, but in a way that depends strongly on the sample size *n*. Therefore in any particular application the choice of *h* should be specified on the basis that for this *h* the estimate is robust to modest changes in *h* and to modest changes in *n*.

### b. A multiwell potential model

We now test the nonparametric method on a system that is far from Gaussian. We start with a one-dimensional example defined by

in which *b*_{1} = 1, *b*_{2} = 0.02, and *b*_{3} = 5; *ξ* is Gaussian distributed stochastic noise with unit variance, and *f* represents an applied forcing.

This model has a bimodal behavior and, as can be seen in Fig. 3, its PDF is smooth and strongly non-Gaussian. Since all of the requirements for the nonparametric FDT are met, we expect it to make accurate predictions of the forced response, whereas we expect the standard Gaussian FDT to fail. We simulated the system defined by Eq. (27) by applying a fourth-order Runge–Kutta method to the deterministic part of Eq. (27) with an added noise term identical to that used in Eq. (24). We used a time step of 2 × 10^{−3} and recorded the state *x* every 0.2 time units. With *f* = 0 we generated 100 independent integrations, each of 2 × 10^{5} data points after discarding the first 100.

Both forms (2) and (4) of the FDT capture only the linearized response and can therefore make accurate predictions only when the forcing is small enough in amplitude. The system (22) considered previously is exactly linear and the linearized prediction of the response will be valid for any choice of the forcing **f**. For the system (27) on the other hand, the linearized prediction can be valid only for small enough |*f*|. To test linearity we integrate Eq. (27) for a range of choices of *f*, for each choice generating 10 independent integrations, each of 10^{5} data points (with the first 1000 being discarded). The response is plotted against forcing in Fig. 4. The graph shows only small deviations from linearity over the range 0 ≤ *f* ≤ 1, and we take *f* = 0.1 as the case against which the linearized prediction can be reliably tested. The gradual reduction of the slope of the curve as *f* increases toward 1 can be expected to strengthen as *f* increases further and the nonlinearity of the response becomes more important.

Explicit results for application of the Gaussian and nonparametric forms of the FDT to this case are shown in Fig. 5. Application of the Gaussian form of the FDT required only a subset of our dataset to reduce the variance to an acceptable level. Ten independent integrations each of 10^{5} data points were used. With this data Eq. (4) yields at an upper limit of the integral of 50, a calculated response to a forcing of *f* = 0.1 of , which is not consistent with the true response, estimated by model integrations to be (see Fig. 5). Application of the nonparametric FDT, using the full set of 100 integrations, each of 2 × 10^{5} data points, yields a response that depends on the choice of *h*. However, there is a range of *h* between about 2 × 10^{−2} and 5 × 10^{−1} where the predicted response is independent of *h* and the uncertainty is acceptably low. For *h* = 0.1, say, is consistent with the true response.

### c. A stochastic Lorenz model

We now wish to test the ability of the nonparametric FDT to predict the response of a system where the PDF is to a large extent controlled by the chaotic behavior of the internal nonlinear dynamics rather than primarily by applied stochastic forcing. The three-dimensional Lorenz model (Lorenz 1963) is a natural choice because it is often used as a conceptual paradigm for climate (e.g., Palmer 1993), although it possesses a fractal equilibrium PDF, which violates the assumption of smoothness required by the FDT. However, if we smooth the PDF by adding a stochastic white noise term, ** ξ** = (

*ξ*,

_{x}*ξ*,

_{y}*ξ*), to the Lorenz equations, we expect that the conditions required for the nonparametric FDT should apply.

_{z}With standard parameters and an additional forcing term our stochastic version of the Lorenz model is defined by

where *x*, *y*, and *z* are the components of the three-dimensional state vector and **f** = (*f _{x}*,

*f*,

_{y}*f*) is the applied forcing.

_{z}This is the model considered by Thuburn (2005), who showed, at least for a particular amplitude of the stochastic term, that the linear response to a forcing was well estimated by consideration of the Fokker–Plank equation for the system, that is, the deterministic partial differential equation for the equilibrium PDF. We first compare the FDT algorithm proposed here with Thuburn’s results and then apply it to a system with a smaller stochastic component.

#### 1) High amplitude uniform noise

In this case we take the amplitude of the noise to be the same as that in Thuburn (2005), where the standard deviation of the noise components *ξ _{x}*,

*ξ*, and

_{y}*ξ*is chosen so that Eq. (28) is equivalent to a vector stochastic differential equation including in each component term a Wiener process multiplied by a factor of 20. We approximate solutions to the stochastic Lorenz Eqs. (28) using the same method as that used in section 3b. A fourth-order Runge–Kutta scheme was applied to the deterministic parts of the equation with an added noise term identical to that in Eq. (24). We used a time step of 0.002 nondimensional time units and recorded the state vector every 0.01 nondimensional time units. We recorded 10 model runs, each of 10

_{z}^{6}data points after discarding the first 1000. A typical trajectory from this model is shown in Fig. 6.

The response in the *x* direction to unit forcings applied independently in the *x*, *y*, and *z* directions was estimated by direct simulation and is shown as solid lines in Fig. 7, alongside the corresponding estimate from Thuburn (2005), shown as dotted lines, which are in good agreement. This suggests that the unit forcings are in the amplitude range consistent with linear theory. The corresponding quasi-Gaussian FDT estimates of the response in the *x* direction to forcings in the *x*, *y*, and *z* directions are respectively 0.0252 ± 0.0018, 0.1287 ± 0.0010, and −5.6114 × 10^{−4} ± 6.0076 × 10^{−4}, where an upper limit of the integral in Eq. (4) of 1 was used in each case. These results are summarized in Table 1. In the first two cases (forcings in *x* and *y* directions), these are clearly inconsistent with the simulated response (respectively underpredicted by 65% and overpredicted by 24%). The case of forcing in the *z* direction is not so clear since the mean simulated response is significantly smaller than its variance. Also shown in Fig. 7 is the response predicted by the nonparametric FDT, with dependence on the bandwidth parameter *h* and on the sample size both shown. Taking a value of *h* around 1 as providing the “best” estimate it is clear that the nonparametric FDT is much more accurate than the Gaussian FDT and very comparable in accuracy to the Thuburn (2005) approach. Note that, if the direction of forcing is fixed, then optimization of the choice of *h* for each component of the response (by minimizing some component of bias and variance) naturally gives a different optimal value of *h* for each component. A more satisfactory approach might be to optimize by applying a criterion on *δ***x** as a vector, rather than to individual components of *δ***x**.

#### 2) Lower amplitude nonuniform noise

The forcing of trajectories toward the strange attractor of the Lorenz model is exceptionally strong in comparison with movement along the attractor itself (Tucker 2002). Therefore, the effect of a stochastic forcing in an arbitrary direction is strongly suppressed in a direction perpendicular to the attractor. On the other hand, to provide sufficient smoothing of the PDF for application of the nonparametric FDT to be feasible the isotropic noise term needs to be large, which reduces correlations in time and disrupts the structure of the attractor as a whole. The solution we use is to add a stochastic forcing locally normal to the attractor itself. To find an approximation of this direction we use numerical approximations of time differentials

and

to define the vectors **v** and , where the caret denotes a small random perturbation. If the perturbations , and are sufficiently small with respect to the time step, , and we assume that **v** and are not parallel. Then, since for a sufficiently small noise term movement in one time step along the attractor is further than movement toward it, the cross-product is a vector approximately perpendicular to the plane of the attractor. We now define the stochastic noise vector to be

in which *ξ*_{0} is a Gaussian distributed random variable with zero mean and unit variance and *a* is the noise amplitude. In practice it may be difficult to be sure that a chosen perturbation is small enough, and for very small perturbations ** ξ** may be subject to significant numerical inaccuracy. To mitigate this we doubled the time step used to calculate in our numerical solver. After applying a range of noise amplitudes we found that using Gaussian distributed random numbers with a standard deviation of 0.5 for the perturbations , and and

*a*= 5 is sufficient to enable application of the nonparametric FDT with reasonable computational effort while not being too large so that structure and lag correlation is retained, as shown in Fig. 6. We approximate solutions to these stochastic Lorenz equations using the Runge–Kutta method used above with a time step of 10

^{−4}and record the state vector every 0.01 nondimensional time units.

In this case, without any guidance from previous work, we must be careful to consider a forcing small enough to be in the linear regime. We choose a range of forcing in the *x* direction and calculate the response from a series of model integrations, Fig. 8. We generated a total of 10^{4} independent integrations, each of 10^{6} data points after discarding the first 10^{4}, for each value of applied forcing. This clearly demonstrates that a forcing of *f _{x}* = 0.1 is well within the linear regime. Applying the Gaussian FDT to predict this response, using 1000 of our unforced integrations leads to a large overestimate of the

*x*and

*y*components, as shown in Fig. 9. Again, the variation in the upper limit of the time integral is shown for the Gaussian calculation demonstrating convergence. The nonparametric FDT, using only the first 10

^{5}data points of 1000 of our unforced integrations and an upper limit of the integral in Eq. (2) of 20 time units, gives a predicted response that is much more consistent with the simulated response. Numerical results are summarized in Table 2.

The predicted response in the *y* direction, although consistent with the error bars, suggests an overestimate. This could be an effect of finite *h*, as discussed previously, and we attempt to understand it by examining the convergence of our predicted response with integration length. Figure 10 illustrates our general conclusions (stated earlier) that, as integration length is increased, we are able to choose lower and lower values of *h* and that, for a integration length *n* ≥ 4 × 10^{4} data points, change in the calculated response is slow with respect to *n*.

Naive application of the Gaussian and nonparametric forms of the FDT to the system (28) without any added noise was unable to give an accurate prediction of the response to forcing. This may be a practical issue of estimation of the underlying PDF with reasonable computational resources, or it may be a more fundamental problem to do with nondifferentiability of the PDF. The idea that in any practical application computational numerical rounding error will provide sufficient noise to smooth the PDF does not seem to apply in this case.

## 4. Discussion

In this paper we have proposed an approach to exploiting the FDT in its general form [Eq. (1)], particularly in the steady-state limit [Eq. (2)], as a predictor of the linearized response of a dynamical system to an applied forcing (or, by simple extension, to a change in the parameters of the system). The key step in the approach is to follow standard practice in nonparametric statistics and to use a kernel density estimation procedure to meet the requirement for knowledge of the equilibrium PDF of the system. The estimator for the response of the system, which results from this “nonparametric FDT,” is constructed from a sample of the unforced system, just as is the case in the much more standard quasi-Gaussian form [Eq. (4)] of the FDT. Note that, while we have focused on the forced response in the mean state vector, this nonparametric FDT can be applied straightforwardly to estimate responses in other observables, including quadratic and other nonlinear observables. The appropriate generalization of Eq. (2) is to replace the **x**(*τ*) with *g*[**x**(*τ*)] and of Eq. (11) to replace **X**_{j}_{+s} with *g*(**X**_{j}_{+s}).

We have demonstrated using relatively simple examples that the new nonparametric FDT works in practice to give a quantitatively accurate prediction of the response. Linear systems with stochastic forcing, where the response operator is known analytically and may also be evaluated using the quasi-Gaussian FDT, were considered as test cases. These demonstrated the important principle, well known in the context of density estimation, that both the bias and variance of the predicted response depend on the bandwidth parameter *h* of the kernel function used in the density estimation and on the size *n* of the sample of the unforced system. Thus, any application of the nonparametric FDT needs to consider carefully the optimal choice of *h*, perhaps by testing sensitivity to changes in *h* or by subsampling to test sensitivity to sample size *n*, and also to estimate the variance. Bearing these points in mind, the nonparametric FDT was applied successfully to two simple nonlinear systems, a one-dimensional system driven by stochastic noise, with a bimodal probability distribution, and the three-dimensional Lorenz system.

It remains to be seen whether the application of the nonparametric FDT outlined here can be improved, for example, by use of an adaptive or non-Gaussian kernel in the density estimation procedure. In the standard problem of density estimation this can improve bias from *h*^{2} to *h*^{4} (Silverman 1986). It is also possible that theoretical prediction of the asymptotic form of the bias for small *h* and large *n* would allow a correction to estimates obtained for finite *h* and *n.*

In all the systems that we considered, including the Lorenz system, the inclusion of explicit stochastic forcing ensures that the equilibrium PDF is smooth. It has been speculated that, for many chaotic dynamical systems, including the Lorenz system, whether or not the “true” equilibrium PDF is smooth, the effect of rounding error in numerical integration is akin to stochastic forcing and ensures smoothness. Whether or not that is true, our efforts to apply the nonparametric FDT to the Lorenz system without explicit stochastic forcing were unsuccessful, but this could have been due to the practical problem of requiring too small a value of *h* alongside feasible sample size rather than any fundamental barrier caused by nonsmoothness of the PDF. If nonsmoothness were relevant, then one would have to consider alternative approaches that do not make the smoothness assumption (e.g., Ruelle 1998; Abramov and Majda 2007, 2008).

The nonparametric FDT is one of several different approaches that have been proposed to consider the forced response of a dynamical system without making the assumption of Gaussianity that is necessary for the standard form of the FDT. One line of approach has been via an ensemble adjoint technique, which requires performing an ensemble of integrations of the adjoint system along trajectories of the unforced dynamical system (see, e.g., Eyink et al. 2004). This approach has the strong disadvantage that quantities of interest, in particular those that make up the response function which appears in integrals such as Eq. (1), grow without bound in individual integrations of the adjoint system. This unbounded growth is, in principle, suppressed in ensemble averaging, but the number of realizations required to suppress the growth of ensemble-averaged quantities up to some finite time *τ* grows very rapidly as a function of *τ*. Abramov and Majda (2007, 2008, 2009) have suggested a “blended-response approach” in which the response function is estimated using the ensemble adjoint technique for small times and then gradually replaced by that estimated through the quasi-Gaussian FDT at larger times. A different line of approach has been suggested by Thuburn (2005), who considers the evolution equation not for individual trajectories of the dynamical system, but for the PDF, so that the equation for the PDF is the Fokker–Planck equation (assuming that additive stochastic forcing is included in the evolution equations). The resulting expression for the forced response requires knowledge of the solution of the adjoint to the Fokker–Planck equation with appropriate forcing, which Thuburn demonstrates may be obtained either by numerical solution of this partial differential equation or else by suitable ensemble averaging of an integral along trajectories of the dynamical system. Either method essentially requires tabulation of the solution in phase space and the cost therefore rises extremely rapidly with the dimension of the phase space.

Thuburn (2005) does not mention the FDT, but the approach he proposes is clearly related to the FDT, and we clarify that relation in the appendix (in doing so noting how Thuburn’s approach, which he presents for the steady-state problem, can be extended to the time-dependent problem.) The key point is that writing down the response to forcing naturally requires solution of the equation governing the time evolution of the PDF of the undisturbed system, with an initial condition that is not the equilibrium distribution (otherwise the solution would be trivial). In Thuburn’s approach that solution appears explicitly (albeit as the solution of the adjoint problem). In the FDT approach the required quantities are obtained not by knowledge of the time evolution of the PDF but from the time decay of correlations of appropriate functions.

The nonparametric FDT approach does not require tabulation of any function in phase space, potentially very expensive in high dimensions, but instead an evaluation of an estimator at sample points. This evaluation naturally occurs in those parts of phase space favored by the system, that is, those parts where the PDF is relatively large. In other words, potentially inefficient evaluation in parts of the phase space that are infrequently occupied is avoided. That efficiency notwithstanding, the required computation is potentially expensive, with the required number of operations increasing as *n*^{2}, where *n* is the sample size, and this is the main difficulty ahead in applying the nonparametric FDT in high dimensions [where *n* must be large, perhaps impossibly large—recall the interplay between *n* and *h* and the dependence on dimension implied by Eq. (21)]. One question for the future, therefore, is whether alternative forms of the estimator are possible that improve on *n*^{2}. A possible approach would be to map the density estimation problem to a nearest neighbor problem and order the data into a tree, which reduces the number of computations required to the order of *dn*log*n*, where *d* is the dimensionality of phase space. This approach works in a low dimensional phase space, but severe difficulties present themselves when the dimensionality is increased.

The *n*^{2} dependence of the computation described in this paper requires that the sample is as efficient as possible. For example, if successive sample points are separated only by short time intervals and therefore by small distances in phase space, then it might be that *n* can be reduced without adversely affecting the bias or uncertainty in the estimate and allow considerable computational saving. Note that for accurate evaluation of the integral appearing in Eq. (2), for example, it is necessary to sample the system at relatively short time intervals so that lag correlations can be calculated as functions of delay time at correspondingly short intervals. However, it may not be optimal to use the whole sample in the estimation of **∇***ρ*/*ρ*, that is, in the inner sum in Eq. (11).

The advantage of the nonparametric FDT, as compared to the other approaches such as the ensemble adjoint method, is that no explicit knowledge of the underlying equations for the unperturbed system is needed. All necessary information about these equations is obtained from the observed decay in the unforced system of correlations of appropriate quantities. (Note that the explicit form of the perturbation is, of course, needed; otherwise *δ***f** would be unknown.) This means that the FDT is, for example, potentially a predictor of the zonal mean response of the atmospheric circulation to an applied forcing that does not require any explicit parameterization of the eddy fluxes in terms of the zonal mean flow (e.g., Ring and Plumb 2008): indeed, the FDT provides such a parameterization.

For all of the different approaches mentioned above attention needs to be paid to computational requirements and, in particular, whether the approaches are computationally feasible for systems with large or even modest numbers of dimensions. The ultimate aim is to apply the nonparametric FDT to a realistic climate model in which the dimension of the phase space is very large. Gritsun and Branstator (2007), for example, apply the quasi-Gaussian FDT to a system with 1800 degrees of freedom, and this is a significant reduction to the number of degrees of freedom of the full underlying model. The effective number of degrees of freedom of a system may be significantly less than the dimension of its phase space. (For example, consider adding extra spatial degrees of freedom to a simulation of a fluid system when the number of degrees of freedom is already sufficient for a good representation of the true solution.) Since the expressions arising in the nonparametric FDT are evaluated only in those parts of phase space that the system visits, it is possible that the convergence and accuracy properties of the nonparametric FDT depend upon the effective number of degrees of freedom present rather than the actual dimensionality of phase space. If the dimensionality of a system is found to be too large to allow practical application of the nonparametric FDT and improvements cannot be found, then another approach would be to project the large system onto a small number of degrees of freedom. This might, for example, be based on the widely made, but often poorly justified, assumption that the spatial structure of forced response is dominated by a small number of empirical orthogonal functions representing leading modes of natural variability [e.g., see the discussion in Branstator and Selten (2009)]. Majda et al. (2010b) discuss some of the conditions under which reduction in the number of degrees of freedom is possible and consider some simple examples. The nonparametric FDT that we have presented in this paper opens up various related possibilities, including, for example, some kind of combination of the nonparametric approach, to capture the dependence of the equilibrium PDF on a small set of key state variables, with a quasi-Gaussian assumption to represent dependence on remaining variables.

## Acknowledgments

This research was supported by the Natural Environment Research Council Grant NE/0005051/1 to the consortium “Solar Influences on Climate” led by Professor J. Haigh and by the Natural Environment Research Council Grant NE/G003122/1 to Dr. G. Esler.

### APPENDIX

#### Relation with Thuburn (2005)

Thuburn (2005) presents a method for calculating climate sensitivity via the adjoint of the evolution equation for the PDF. This method has advantages over those previously proposed based on linear adjoint calculations (Eyink et al. 2004 and references therein) since it does not suffer from unbounded growth of important quantities at large times. However, Thuburn (2005) does not relate his approach to the fluctuation–dissipation theorem, and therefore we clarify the relation between the two approaches below.

Consider a dynamical system with **X**(*t*) as a single realization of the time evolution and for an ensemble of realizations described by a PDF *P*(**x**, *t*) that satisfies an equation of the form

where *M*_{x} is some partial differential operator, assumed not to depend explicitly on time. Thuburn considers the case where the above equation is the Fokker–Planck equation, relevant where the evolution equations for the dynamical system contain explicit randomness. Without explicit randomness *M*_{x} would be the Liouvillian operator. Define to be the operator that integrates Eq. (A1) from an initial condition at *t* = 0 to later time *t*; that is, is the solution to Eq. (A1) with initial condition *P*(**x**, 0) = *P*_{0}(**x**). Now consider a perturbation to the dynamical system of the type considered in section 1, where *δ***f**(**x**, *t*) = *χ*** _{f}**(

*t*)

*δ*

**f**(

**x**) is added to the right-hand side of the evolution equations of the dynamical system, implying that

*M*

_{x}

*P*changes to

*M*

_{x}

*P*−

**∇**

_{x}[

*δ*

**f**(

**x**,

*t*)

*P*]. Note that in the main part of the paper we consider only the simple case where

*δ*

**f**is independent of

**x**and

*t.*

The corresponding correction to *P*(**x**, *t*) will be *δP*(**x**, *t*), which satisfies the equation

with solution

Now consider the change *δG ^{t}* in an observable

*G*= 〈

^{t}*g*〉 = ∫

*g*(

**x**)

*P*(

**x**,

*t*)

*d*

**x**, confining attention to the case where the unperturbed system is in equilibrium; that is,

*P*(

**x**,

*t*) =

*ρ*(

**x**), where consistent with the notation in section 1,

*ρ*(

**x**) is the equilibrium PDF. Then, from the above,

The fluctuation–dissipation theorem approach is to consider the lag correlation 〈*g*[**X**(*t* + *τ*)]*h*[**X**(*t*)]〉, which can be written as

where *P*(**x**, *t* + *τ*|**y**, *t*) is the conditional PDF for **X** at (**x**, *t* + *τ*), given that **X**(*t*) = **y**. It follows that and, hence, that

Comparing with Eq. (A4) and reordering the integration, it follows that

This is the general form of the fluctuation–dissipation theorem (1).

Now reconsider Eq. (A4), again with the integration reordered,

where the second expression follows by defining the adjoint operator . The integral with respect to *s* in the third expression is the solution *ψ*(**x**, *t*) of the equation

with initial condition *ψ*(**x**, 0) = 0. Integrating the third expression by parts gives

which expresses *δG ^{t}* in terms of the solution

*ψ*of the adjoint equation with the function

*g*(

**x**) appearing in the forcing term. In the steady-state limit where

*χ*

**(**

_{f}*t*) = 1 and

*t*→ ∞, this is simply the expression (15) given by Thuburn (2005).