## Abstract

A linear response function (LRF) relates the mean response of a nonlinear system to weak external forcings and vice versa. Even for simple models of the general circulation, such as the dry dynamical core, the LRF cannot be calculated from first principles owing to the lack of a complete theory for eddy–mean flow feedbacks. According to the fluctuation–dissipation theorem (FDT), the LRF can be calculated using only the covariance and lag-covariance matrices of the unforced system. However, efforts in calculating the LRFs for GCMs using FDT have produced mixed results, and the reason(s) behind the poor performance of the FDT remain(s) unclear. In Part I of this study, the LRF of an idealized GCM, the dry dynamical core with Held–Suarez physics, is accurately calculated using Green’s functions. In this paper (Part II), the LRF of the same model is computed using FDT, which is found to perform poorly for some of the test cases. The accurate LRF of Part I is used with a linear stochastic equation to show that dimension reduction by projecting the data onto the leading EOFs, which is commonly used for FDT, can alone be a significant source of error. Simplified equations and examples of 2 × 2 matrices are then used to demonstrate that this error arises because of the nonnormality of the operator. These results suggest that errors caused by dimension reduction are a major, if not the main, contributor to the poor performance of the LRF calculated using FDT and that further investigations of dimension-reduction strategies with a focus on nonnormality are needed.

## 1. Introduction

In statistical physics, the fluctuation–dissipation theorem (FDT) relates the mean response of a nonlinear system to weak external forcing with the internal variability of the system [see Marconi et al. (2008) for a review]. According to FDT, the system’s linear response function (LRF) which relates the mean-response to an imposed (external) forcing via

can be calculated using only the covariance and lag covariances of the unforced system—that is, when and the system fluctuates because of its internal dynamics (details are discussed in section 2). Here the angle brackets mean long-time average and is the state-vector response—that is, deviation from the time-mean state vector of the unforced system.

Leith (1975) introduced the FDT to climate science, arguing that the climate system approximately satisfies the conditions for the theorem to hold, and formulated how the LRF in Eq. (1) can be calculated using the fluctuations of the unforced system. Since then many studies have used climate models of varying degrees of complexity to examine the LRFs calculated using different forms of the FDT and the implications of this theorem (e.g., Bell 1980; North et al. 1993; Cionni et al. 2004; Gritsun and Branstator 2007; Abramov and Majda 2007; Gritsun et al. 2008; Ring and Plumb 2008; Gerber et al. 2008a,b; Kirk-Davidoff 2009; Majda et al. 2010; Gershgorin and Majda 2010; Achatz et al. 2013; Lutsko et al. 2015; Fuchs et al. 2015). However, the competency of the FDT for the climate system remains unclear as some of these studies found FDT to only work qualitatively, although some others, such as Gritsun and Branstator (2007) and Fuchs et al. (2015), found promising quantitative skills.

Whether the reported failure of the FDT is because of the invalidity of the underlying assumptions for the climate system or because of practical problems (e.g., associated with using limited datasets), or a combination of both is unclear (see section 2 for further discussion). How accurately the FDT holds for the climate system is important, not only because of the possibility to construct skillful LRFs from unforced GCM simulations or even ambitiously, from observational records, but also because some of the implications of this theorem. For example, FDT relates the amplitude of the forced response to the time scales of internal modes of variability of the system (e.g., Ring and Plumb 2008; Shepherd 2014). Given that idealized and comprehensive GCMs overestimate the persistence of the leading mode of the extratropical variability in both hemispheres (i.e., the annular mode) by factors as large as 2–3 (and even larger in some cases), an important implication of the FDT is that these models also overestimate the mean response to external forcings by such factors, which has significant consequences for the climate sensitivity [see Shepherd (2014); Gerber et al. (2008a,b); Ring and Plumb (2008); but also see Simpson and Polvani (2016)].

In Hassanzadeh and Kuang (2016, hereafter Part I), for zonally averaged flows in the context of an idealized dry atmosphere, we derive and discuss Eq. (1) and show that the state vector can be reduced to , where and are zonal-mean zonal wind and temperature, respectively. We also calculate the LRF of an idealized GCM, the dry dynamical core with Held–Suarez physics, using Green’s functions: we apply hundreds of weak localized forcings (of and ), one at a time, to the GCM and calculate the mean responses , which are then used along with the applied forcings to find the LRF through matrix inversion. To be consistent with Part I, the LRF calculated for this idealized setup will be denoted with :

where is zonal momentum and/or thermal forcing. Several tests in Part I show that the LRF calculated using Green’s functions accurately reproduces the mean response to imposed thermal/mechanical forcings and vice versa.

The goal of the current paper (Part II) is to use the same idealized setup and to investigate why the LRF calculated using the FDT performs poorly in some cases. The paper is structured as follows. In section 2 the formulation of the FDT is presented, in section 3 the idealized GCM and calculation of from the simulations are briefly described, and in section 4 the performance of for a few test cases (same as tests 1–3 in Part I) is discussed. In sections 5 and 6 we use linear stochastic equations with, respectively, and 2 × 2 matrices to show how the nonnormality of the operator can affect the performance of the FDT. The paper is summarized in section 7.

## 2. Formulation of the FDT

According to the most common formulation, the so-called quasi-Gaussian FDT, the LRF that relates the mean response to an imposed forcing via Eq. (2) can be calculated as

where is the lag-*τ* covariance matrix (a dagger denotes the adjoint). Recently, Majda et al. (2005) and Gritsun and Branstator (2007) have demonstrated that Eq. (3) can be derived under conditions that are more closely satisfied by the atmosphere compared to those used by Leith (1975) and Kraichnan (1959). Still, an important assumption involved in Eq. (3) is that has Gaussian statistics; however, non-Gaussianity in dynamical and thermodynamic variables has been found in observational data (e.g., Ruff and Neelin 2012; Huybers et al. 2014; Loikith and Neelin 2015) and GCM simulations (e.g., Berner and Branstator 2007; Franzke et al. 2007; Sardeshmukh and Sura 2009; Hassanzadeh et al. 2014).

Furthermore, there are practical problems with calculating in Eq. (3) generally owing to the limited length of the dataset. Realistically, the upper bound of the integral is replaced with a finite number and while a small degrades the approximation of the integral, a large can lead to an imprecise because of inaccuracies in at large *τ* due to limited sample size. Additionally, the calculation of or involves the inverse of the sum of the lag-covariance matrices or . These matrices can be close to singular because of short datasets and anisotropic internal fluctuations, which together result in the phase space not being entirely excited by the fluctuations. The common remedy for this problem is to calculate and on reduced dimensions, for example, by first projecting the data onto a specified number of the leading empirical orthogonal functions (EOFs) (Penland 1989; Gritsun and Branstator 2007; Ring and Plumb 2008). Another practical issue is the number of variables that are included in the state vector . Some studies have used one variable such as zonal wind or temperature and some other have used two or more variables.

It is plausible that the reported inaccuracies in FDT and discrepancy in the previous studies are due to the invalidity of the assumptions underlying Eq. (3), such as non-Gaussianity in the data, and/or practical problems such as short datasets and uncertainties in choosing , , and . Several interesting studies have recently attempted to systematically address the issues related to sample size and dimension reduction in calculations of (Fuchs et al. 2015; Lutsko et al. 2015; Cooper et al. 2013), state-vector reduction (Majda et al. 2010), and non-Gaussianity (Cooper and Haynes 2011; Nicolis and Nicolis 2015); however, further work is evidently needed to fully utilize the FDT for practical applications.

To further evaluate the performance of LRFs calculated from FDT and to better understand the potential sources of their inaccuracies, in section 3 we have employed multivariate FDT and a long dataset to compute for the idealized GCM and in section 4 we have tested the performance of using tests 1–3 of Part I.

## 3. The idealized simulations and calculation of

The model and setup are identical to Part I. Briefly, we use the GFDL dry dynamical core, which is a pseudospectral GCM that solves the primitive equations on sigma levels. The GCM is used with the Held–Suarez setup (Held and Suarez 1994): the model is forced by Newtonian relaxation of temperature to a prescribed equinoctial radiative-equilibrium state with a specified equator-to-pole surface temperature difference, and Rayleigh drag with a prescribed rate is used to remove momentum from the low levels and hyperdiffusion is used to remove enstrophy at small scales. The forcings, dissipations, and boundary conditions are all zonally symmetric and symmetric between the two hemispheres. A T63 spectral resolution with 40 equally spaced sigma levels and 15-min time steps are used to solve the equations. Using the control-run setup of the model, which is identical to the control run of Part I where all parameters are the same as in Held and Suarez (1994), we have constructed from Eq. (3) using a 1 000 000-day dataset as described below.

An ensemble of 10 simulations with the control-run setup, each 50 000 days, are used to create the employed dataset, which contains daily averaged anomalous and of the last 49 500 days of each simulations where each variable is weighted by (where *μ* is latitude) and normalized by area-averaged standard deviation of each pressure level (we did not find the performance of FDT sensitive to the weighting details). Covariance matrices are then calculated for each hemisphere of each simulation from (stacked together). The 20 covariance matrices are subsequently averaged to calculate the EOFs of the ensemble. Then for a given , the weighted daily averaged anomalous is projected onto the first EOFs using least squares linear regression, and the results are used to calculate the reduced-dimension covariance and lag-covariance matrices for each hemisphere in each simulation, which are then averaged for each *τ* to find for days. The integral in Eq. (3) is evaluated using the trapezoidal rule.

To find the best performance of , for each test we have tried *τ*_{∞} = 20, 30, 45, 60, and 90 days and *n*_{EOF} = 64 (97.5%), 113 (99.0%), 165 (99.5%), 200 (99.7%), and 300 (99.8%), where the parentheses show the explained variance.

## 4. Tests 1–3 for

Tests 1–3 of Part I are used to examine the performance of in predicting the mean response to external forcings and vice versa. The tests and the true mean responses are discussed in detail in Part I (section 4), but they are also briefly described here and true mean responses are shown in Fig. 1. For each test of , the best results for the attempted ranges of and are shown in Fig. 2 and discussed below.

In test 1 we examine the accuracy of in calculating the time-mean response to an external subtropical thermal forcing

with units of kelvins per day (pressure *p* is in hPa and latitude *μ* is in degrees). The true response, calculated using an ensemble GCM run forced with this forcing, is shown in Figs. 1a and 1b (see section 4a of Part I for details). The mean response to this forcing calculated using is shown in Figs. 2a and 2b. Comparing with the true response (Figs. 1a,b) shows that while can crudely reproduce the patterns of the zonal-wind and temperature response such as the poleward shift of the jet and warming in the subtropical midtroposphere, it cannot reproduce the amplitude of the response or its patterns at small scales. We have further tested the performance of using an external tropical forcing

The true mean response is shown in Figs. 1c and 1d. The mean response calculated using (Figs. 2c,d) agrees better, both qualitatively and quantitatively, with the true response compared to the subtropical forcing, but there are still notable differences particularly in the pattern of the zonal wind and amplitude of the temperature response.

The purpose of tests 2 and 3 is to test whether can accurately predict the forcing needed to produce a specified mean response (i.e., the target). In these tests, for a given target , is calculated and applied in the GCM to run a three-member ensemble, where each member is 45 000 days (note that although we use angle brackets for notation consistency, the forcing is time invariant). The mean response is then calculated with respect to a three-member control-run ensemble (see section 4 of Part I for more details). To minimize the computational cost given the large combination of , we have first used , instead of a GCM ensemble run, to test the accuracy of calculated using for the specified targets of tests 2 and 3. Once that produces the best results are found for each test, a GCM ensemble run is used to calculate the results that are discussed below and shown in Fig. 2 (as expected from the results of Part I, the results of the GCM ensemble run and agree well).

For test 2, the target is the mean response to 10% increase in the Newtonian relaxation time scale of the Held–Suarez setup (Figs. 1e,f). As shown in Figs. 2e and 2f, reproduces the pattern of the zonal-wind response relatively well, except in the tropical stratosphere, and the relative error in the amplitude of the zonal-wind response is 14%. The pattern and amplitude of the temperature response, however, are poorly reproduced using . For test 3, the target is the leading EOF (EOF1) of daily averaged zonally averaged anomalous (with respect to the climatology) zonal wind and temperature, which is the positive phase of the annular mode (Figs. 1g,h). As shown in Figs. 2g and 2h, for test 3 both zonal wind and temperature responses are reproduced reasonably well using with the exception of the pattern of the temperature response at high latitudes. It should be highlighted that in all these tests, the best results are obtained with *τ*_{∞} = 30 days, which is around the decorrelation time of EOF1 in the control run. Gritsun and Branstator (2007) and Fuchs et al. (2015) also used *τ*_{∞} = 30 days, although their GCMs and setups are very different from ours.

The results of Fig. 2 show that the LRF calculated using the FDT is relatively accurate for some problems, such as test 1 with the tropical forcing and test 3, and inaccurate and only qualitative for some other, such as test 1 with subtropical forcing and test 2, consistent with the findings of previous studies (e.g., Gritsun and Branstator 2007; Lutsko et al. 2015; Fuchs et al. 2015). However, even in test 3 where the FDT performs the best, some features of the response, such as cooling in the high latitudes, are poorly reproduced, which can limit applications of —for example, for hypothesis testing.

The source(s) of the poor performance of for some problems and its inability to reproduce some patterns of the response is (are) unclear and difficult to identify and might be due to violation of the assumptions underlying Eq. (3) such as Gaussianity and/or one or some of the practical issues discussed in section 1. The departure from Gaussianity in the control run is found to be substantial, in particular for temperature for which the skewness and kurtosis of daily averages can be as large as 2 and 14, respectively. Although we use the equivalent of a 990 000-day integration to compute , the limited dataset can certainly still be a source of error. However, the results shown in Fig. 2 are not substantially better than those obtained using only one-fifth of the dataset [but it should be noted that the error in decreases as with the length of the dataset *N* (Gritsun and Branstator 2007)]. Nonlinearity and state-vector reduction are likely not the sources of the poor performance of given the good performance of for these tests (see section 4 of Part I) and the state-vector reduction analysis of appendix A in Part I.

The dimension reduction using projection onto a number of the leading EOFs is another likely source of error. For example, Gritsun and Branstator (2007) and Fuchs et al. (2015) have found that the LRFs calculated using the FDT perform poorly when the forcings project onto the excluded EOFs. This happens when the forcing pattern differs significantly from the leading EOFs, which is the case for Gaussian forcings. In the following two sections, we show that the dimension reduction alone can result in a significantly poor performance of for systems with nonnormal LRFs and that the errors increase rapidly with nonnormality. This nonnormality, not to be confused with non-Gaussianity of the statistics of the state vector, refers to the nonorthogonality of the eigenvectors of the operator (i.e., the LRF) (Farrell and Ioannou 1996a,b; Trefethen et al. 1993; Butler and Farrell 1992; Reddy et al. 1993) and can result in strong interaction between the components of the forcing and response that project onto the included and excluded EOFs.

## 5. Tests using a linear stochastic equation with

To focus on the effect of dimension reduction and eliminate other possible causes for a poor performance of the LRF calculated using FDT, we use a dataset that consists of daily averaged obtained from integrating the linear stochastic equation

using the Euler–Maruyama method (Higham 2001) with a 0.1-day time step for 15 million days. In Eq. (6), is a 200 × 1 vector of Gaussian white noise, the 200 × 200 matrix is after inaccurate fast modes are filtered out (see section 3 of Part I for details), and is a 200 × 1 state vector consisting of the coefficients of basis functions

for and (similar to ), where , , , and (see section 3 of Part I for details). The advantage of investigating FDT using Eq. (6) is that while its LRF has the same complexity as that of the GCM, the problem is linear, it produces Gaussian statistics, it can be easily integrated to generate a very long dataset, and the noise is uniformly added to excite all basis functions. As a result of the last two, can be accurately calculated for large *τ*, and and can be computed without the need for dimension reduction.

Similar to the procedure used previously, normalized with the standard deviation of each pressure level is used to calculate for *τ* = 0, 1, …, 90 days, which are then used to compute from Eq. (3). The mean responses to a unit-amplitude thermal forcing of the basis function at calculated using (the true response) and (obtained without dimension reduction) are shown in Figs. 3a–d. The amplitude and patterns of the two responses agree well. When is calculated with dimension reduction using the first 81 EOFs (which explain 90% of the variance), the performance of FDT declines significantly (Figs. 3e,f) and relative errors in amplitude as large as 60% arise.

The errors are not simply due to the inability of the dimension-reduced to capture part of the true response that projects onto the excluded EOFs and is forced by the excluded component of the forcing (i.e., components of forcing that projects onto the excluded EOFs). In fact similar differences in pattern and errors in amplitude are found if only parts of the responses that project onto the first 81 EOFs are compared (see the caption). Therefore, the error is due to the inability of the dimension-reduced to capture part of the true response that projects onto the included EOFs and is forced by the excluded component of the forcing. This component of the response, which depends on the nonnormality of the LRF, can complicate understanding the relationship between the error in the predicted response and the excluded part of the forcing and can be best understood using simple examples of 2 × 2 matrices.

## 6. Tests using 2 × 2 normal and nonnormal matrices

We focus on a simple linear system

where is a 2 × 1 state vector and and are 2 × 1 vectors of Gaussian white noise and time-invariant external forcing, respectively. To start, we choose to be either a normal matrix :

or a nonnormal matrix :

The spectral properties of these matrices are shown in Figs. 4a and 4b. The matrices have the same eigenvalues −1 and −2 s^{−1} and the same slowest-decaying eigenvectors , which are parallel to the horizontal axis. However, while the other eigenvector of is along the vertical axis and hence orthogonal to , the second eigenvector of is nearly antiparallel to with a 11.4° angle. As a result, is nonnormal (i.e., ), and consequently, despite having negative eigenvalues, can lead to nonnormal growth in and instability [see Fig. 1 in Trefethen (1991)]. Nonnormal operators are common in engineering and geophysical/astrophysical flows [see Palmer (1999, p. 579) for an illustrative example of why] and their significance for the dynamics of the atmosphere and ocean has been recognized through the pioneering papers of Farrell and his colleagues (e.g., Farrell 1988, 1989; Farrell and Moore 1992; Butler and Farrell 1992; Farrell and Ioannou 1996a,b) and those of others (e.g., Buizza and Palmer 1995; Penland and Sardeshmukh 1995; Ioannou 1995; Zanna and Tziperman 2005; Tziperman et al. 2008; Palmer and Zanna 2013).

For both matrices, Eq. (8) is integrated for using the Euler–Maruyama method with a 0.05-s time step for 5 000 000 s. The EOFs of the results are shown in Figs. 4a and 4b. For the normal matrix, EOF1 and EOF2 are almost identical to and , respectively, while they are different for the nonnormal matrix. For both matrices, the results are used to construct from Eq. (3) without dimension reduction (denoted as FDT-Full) and with only EOF1 retained (denoted as FDT-EOF1). The mean responses of Eq. (8) with to external forcing predicted using FDT-Full and FDT-EOF1 are shown in Fig. 4c. Both full and dimension-reduced LRF are very accurate. When the forcing has 10% projection onto EOF2, the LRF calculated using FDT-EOF1 shows a small error because it cannot capture the part of the response that project onto EOF2 and is forced by the EOF2 component of . This becomes clear when Eq. (8) is transformed into the EOF space:

where is a vector of the coefficients of the EOF1 and EOF2, and are 2 × 2 matrices whose columns are the EOFs and eigenvectors of , respectively, and is a diagonal matrix of the eigenvalues of . The last term is simply a vector of the projections of onto the EOFs. The noise term is ignored for convenience. For a normal matrix such as , and are identical and the first term on the right-hand side reduces to . Hence the equations for and decouple. Then, if does not have any projection on EOF2, and FDT-EOF1 works as accurately as FDT-Full (Fig. 4c). If has a projection onto EOF2, then causes some errors in FDT-EOF1, which can still accurately calculate (Fig. 4e).

However, for nonnormal matrices [Fig. 4b; also see North (1984)] and the off-diagonal elements of the matrix in the first term on the right-hand side of Eq. (11) can be large, which strongly couples the two equations. In this case, a forcing that only projects onto EOF1 can result in large , and part of a forcing that projects onto EOF2, even if small, can have a large contribution to . Neither effects can be captured by FDT-EOF1, which can lead to large errors, as shown in Figs. 4d and 4f for . The amplitude of these errors depends on the nonnormality of . For example, as shown in Fig. 5, for the same forcing , which has a 10% contribution from EOF2, the error in the response predicted using FDT-EOF1 rapidly increases as the acute angle between the eigenvectors of deceases. The decrease in the angle results in a larger off-diagonal term (coefficient of ) in the equation of , which explains the increase in the error.

The effect of nonnormality complicates understanding the relationship between the error in dimension-reduced FDT predictions and the excluded part of the forcings. For example, nonnormality might explain why Fuchs et al. (2015) could not find a simple scaling relation between the error in the amplitude of the response and the loss of amplitude of the forcing due to dimension reduction. While the error in prediction using FDT-EOF1 for a normal matrix is proportional to the percentage of the forcing that projects onto EOF2, the relationship is not linear for a nonnormal matrix. This is seen in Fig. 6, which shows the relative error of the response that is predicted by the LRF constructed using FDT with only EOF1 retained (FDT-EOF1), as a function of the EOF2 contribution to the forcing for the 2 × 2 linear stochastic system [Eq. (8)] with either normal operator [Eq. (9)] or nonnormal operator

Finally it should be noted that while here we have focused on the effect of nonnormality on the errors in the mean response arising from dimension reduction, nonnormality can also induce errors in the forcing calculated using dimension-reduced FDT for a given mean response in a similar way.

The analyses presented in sections 5 and 6 show that dimension reduction alone can be a substantial source of error in the results obtained from LRFs constructed using FDT because of the nonnormality of the system’s operator and demonstrate that the error depends on the relationship between the included and excluded EOFs, eigenvectors of the system’s true operator, and the projections of the forcing/response onto the eigenvectors and EOFs. Based on these results, it is likely that errors arising from dimension reduction are a major, if not the main, contributor to the poor performance of the LRF calculated using FDT in section 4. This is further supported by the fact that the best performance of FDT is achieved for test 3 where the response projects mostly only onto EOF1 (there is little projection onto other EOFs because of the different weightings used in EOF calculations in section 4 of Part I and section 4 of Part II).

It should be noted that the problem with nonnormality and dimension reduction discussed here cannot be resolved just by including more EOFs in the LRF construction, because the poorly sampled EOFs degrade the accuracy of LRF, and excluding them is the rationale behind the dimension-reduction strategy. In fact, as reported by Fuchs et al. (2015) and also found here, including too many EOFs reduces the accuracy of . Furthermore, only focusing on forcings/responses that strongly project onto the leading EOFs is an imperfect solution because it can seriously limit the applications of , as many phenomena of interest do not project onto the natural modes of variability (e.g., Scaife et al. 2009).

## 7. Summary

In Part II of this study, we have calculated the LRF, , for an idealized dry CGM using the FDT. Despite efforts to maximize the performance of the FDT—for example, by using a 1 000 000-day dataset and trying a range of —the is found to perform poorly for some test cases (section 3). To eliminate some of the potential causes of this poor performance, the accurate LRF of this model, , which has been calculated in Part I of the paper using Green’s functions, is used in a linear stochastic equation driven by Gaussian white noise. Calculating from very long integrations of this equation reveals that dimension reduction by projecting the data onto leading EOFs, which is commonly used for FDT, can significantly degrade the performance of the FDT (section 5). We show in section 6 that the dimension reduction causes this error because the LRF of the system is nonnormal. For example, as a result of this nonnormality, the coefficients of the EOFs can be strongly coupled [see Eq. (11)], and even small projections of the forcing onto the excluded EOFs can have large contributions to the part of the true response that projects onto the included EOFs. Such contributions cannot be captured by the dimension-reduced , which leads to erroneous predictions.

The results of this paper point to the operator’s nonnormality as a major source of difficulty for the practical use of the FDT for the general circulation. The role of nonnormality might explain the mixed success and difficulty in understanding some of the results obtained using the FDT in other studies. Given that dimension reduction is inevitable for calculating LRFs using FDT from limited datasets and that nonnormality is common in the oceanic and atmospheric flows, we suggest further investigations of dimension-reduction strategies with a focus on nonnormality.

The results of this paper also provide another example for the applications of the accurate LRF, , which has been calculated in Part I of this study [see Hassanzadeh and Kuang (2015) for another example]. Furthermore, we suggest that the linear stochastic equation [Eq. (6)] can be helpful in evaluating various strategies related to FDT and in disentangling the contributions of different sources of error because while this equation closely retains the complexity of a full GCM through , it is computationally inexpensive to solve and flexible in terms of the driving noise. For example, replacing with correlated additive and multiplicative noise allows non-Gaussian statistics (Sardeshmukh and Sura 2009), which provides a simple framework to investigate the performance of FDT in non-Gaussian systems.

## Acknowledgments

We thank Ashkan Borna, Gang Chen, Brian Farrell, Nick Lutsko, Ding Ma, Saba Pasha, and Alan Plumb for fruitful discussions; Elizabeth Barnes, Packard Chan, Nick Lutsko, Marie McGraw, and Marty Singh for useful comments on the manuscript; and Chris Walker for help with the GCM runs at the initial stage of this study. We are grateful to two anonymous reviewers for insightful feedback. This work was supported by a Ziff Environmental Fellowship from the Harvard University Center for the Environment to P.H. and NSF Grants AGS-1062016 and AGS-1552385 to Z.K. The simulations were run on Harvard Odyssey cluster.

## REFERENCES

*Information Theory and Stochastics for Multiscale Nonlinear Systems*. CRM Monograph Series, Vol. 25, American Mathematical Society, 133 pp.

*Numerical Analysis*, D. F. Griffiths, and G. A. Watson, Eds., Longman, 234–265.