## Abstract

The most commonly used version of a linear inverse model (LIM) is forced by state-independent noise. Although having several desirable qualities, this formulation can only generate long-term Gaussian statistics. LIM-like systems forced by correlated additive–multiplicative (CAM) noise have been shown to generate deviations from Gaussianity, but parameter estimation methods are only known in the univariate case, limiting their use for the study of coupled variability. This paper presents a methodology to calculate the parameters of the simplest multivariate LIM extension that can generate long-term deviations from Gaussianity. This model (CAM-LIM) consists of a linear deterministic part forced by a diagonal CAM noise formulation, plus an independent additive noise term. This allows for the possibility of representing asymmetric distributions with heavier- or lighter-than-Gaussian tails. The usefulness of this methodology is illustrated in a locally coupled two-variable ocean–atmosphere model of midlatitude variability. Here, a CAM-LIM is calculated from ocean weather station data. Although the time-resolved dynamics is very close to linear at a time scale of a couple of days, significant deviations from Gaussianity are found. In particular, individual probability density functions are skewed with both heavy and light tails. It is shown that these deviations from Gaussianity are well accounted for by the CAM-LIM formulation, without invoking nonlinearity in the time-resolved operator. Estimation methods using knowledge of the CAM-LIM statistical constraints provide robust estimation of the parameters with data lengths typical of geophysical time series, for example, 31 winters for the ocean weather station here.

## 1. Introduction

Multivariate linear theory has been used to great success in practically all realms of climate science. One widely applied linear method is the linear inverse model (LIM) (Penland and Sardeshmukh 1995) framework, in which a linear approximation to a system dynamics is empirically obtained from the system’s covariance statistics. In this framework, a linearly stable system describing the evolution of “slow” variable anomalies (e.g., sea surface temperatures anomalies) is driven by Gaussian white noise representing the effect of unresolved “fast” variability (e.g., wind stress, convection) on the slow variable (Papanicolaou and Kohler 1974; Penland 1996). It is a common practice to restrict the noise forcing the LIM to be state independent (additive), and while often providing valuable results, it is not required by these kinds of systems. This kind of model has been used successfully as a forecast tool (Newman 2013) and performs well when the underlying slow deterministic dynamics is linear or weakly nonlinear.

Despite the qualitative (and often quantitative) success of linear inverse models, these kinds of models are unable in general to reproduce observed deviations from Gaussianity, when driven by additive Gaussian white noise. These deviations from Gaussianity are typified for example in skewed (asymmetric) or kurtotic (lighter- or heavier-than-Gaussian distribution tails) probability density functions (PDFs). Deviations from Gaussianity in geophysical variables’ distributions are commonplace and well documented (e.g., Monahan 2004; Neelin et al. 2010; Ruff and Neelin 2012; Stefanova et al. 2013; Loikith et al. 2013; Perron and Sura 2013; Cavanaugh and Shen 2014; Huybers et al. 2014; Loikith and Neelin 2015; Sardeshmukh et al. 2015) and can be generated through multiple dynamical processes. Perhaps the most intuitive of these mechanisms is through nonlinearity in the deterministic dynamics, with the models of Timmermann et al. (2001), Kravtsov et al. (2005), Kondrashov et al. (2006), and Chen et al. (2016) (among others) providing examples in the inverse modeling setting. Simple advective–diffusive prototypes for passive tracers under a mean gradient can produce distinct non-Gaussianity, most evidently at the distribution tails (Bourlioux and Majda 2002; Neelin et al. 2010). Other mechanisms that lead to non-Gaussianity include cross-frequency coupling (Rennert and Wallace 2009), jet stream meandering (Luxford and Woollings 2012), and first passage processes (Stechmann and Neelin 2014; Neelin et al. 2017). Sura and Hannachi (2015) provide a comprehensive review on the mechanisms that generate deviations from Gaussianity in the atmospheric sciences.

Alternatively, even if the deterministic term (i.e., the term in which noise is not explicit) is linear, deviations from Gaussianity may arise through interactions between a slowly evolving system and fast transients forcing the system if the fast transients depend on the state of the system (Sura et al. 2005). Strictly speaking, any differential equation with stochasticity in it represents a treatment of nonlinearity at some level. That is where dynamical stochasticity originates. A linear system forced with additive noise represents a coarse graining long enough that all of the state dependence, if any, of the nonlinear effects is averaged out. In that case, the central limit theorem (CLT) applies strongly enough to render the statistics of the measured state approximately Gaussian. When the time-scale separation between the linear decay and the rapid nonlinearities is too small to invoke such a strong version of the CLT but is large enough to average out the details of the nonlinearities, the system may be modeled as a linear process with state-dependent (multiplicative) noise. Thus, unlike additive noise, the multiplicative noise processes that drive the deterministic dynamics explicitly depend on the system state (e.g., subdaily wind variance dependence on storminess or blocking, or surface fluxes depending on local stability). Multiplicative noise is well established as a source of non-Gaussianity (Penland 2003; Sura et al. 2005; Majda et al. 2008; Sardeshmukh and Sura 2009; Franzke et al. 2015; Sura and Hannachi 2015; Berner et al. 2017) and has been employed to model several aspects of climate variability, including El Niño–Southern Oscillation (Perez et al. 2005; Jin et al. 2007; Levine and Jin 2017) and extratropical variability (Neelin and Weng 1999; Sura et al. 2005).

For evaluation and comparison purposes, it is important to establish a baseline for variability, including deviations from Gaussianity, that can be explained through a multilinear deterministic system that integrates (possibly) state-dependent noise. To do that, it is necessary to have a simple methodology to extract the multiplicative noise information from data. This has proven difficult because the state-dependent noise, as elaborated below, in general contributes to both the “signal” and the “noise,” so disentangling its contribution is not straightforward. Thus, despite important progress on the matter (e.g., Siegert et al. 1998; Peavoy et al. 2015), a simple methodology to calculate the state-dependent noise from data in a statistically consistent way has been lacking. The development of this methodology, tailored to linear deterministic systems driven by multiplicative noise, is the primary goal of this paper.

In general, fast variability may depend not only on the magnitude of the system anomalies but also on their sign. This to a first approximation can be modeled through a type of noise formulation termed correlated additive–multiplicative (CAM) noise (Müller 1987; Sura et al. 2006; Sardeshmukh and Sura 2009; Majda et al. 2009; Penland and Sardeshmukh 2012; Sardeshmukh and Penland 2015; Sardeshmukh et al. 2015; Franzke 2017). Mathematically, the CAM noise amplitude depends linearly on the state of the system, and this dependency is allowed to be asymmetric with respect to the mean. This asymmetry is expected in systems where the fast variability is modulated differently whether the system is in its positive or negative state, which naturally leads to skewness. This is the case when linearizing the effects of rapid wind variability on fluxes affecting ocean mixed layer dynamics (Sura et al. 2006; Sura and Newman 2008). For example, Sura et al. (2006), studying an ocean mixed layer model, finds at least two (related) sources for this noise amplitude asymmetry. The first one arises because of ocean–atmosphere mean-state temperature differences. This affects the sensible and latent heat fluxes driven by rapid wind variability at the ocean–atmosphere interface and can be mapped directly onto a CAM noise term. The second source arises because of the different sensitivity of boundary layer stability to positive or negative anomalies. This contribution, while not precisely following a CAM noise form (a piecewise linear function would be better), can be approximated by it.

In addition to the noise amplitude asymmetry, the CAM noise linear-state dependency is important because it modifies the probability of noise events as the system evolves, leading to higher probability of extreme events (at least in one tail), compared to similar systems forced by pure additive noise. In fact, in the univariate case, it can be shown that the skewness *S* and excess kurtosis *K* − 3 are related such that^{1} (Sura and Sardeshmukh 2008; Sardeshmukh and Sura 2009)

Several variables have been found to follow such a parabolic relationship (Sardeshmukh and Sura 2009; Sardeshmukh and Penland 2015; Sardeshmukh et al. 2015; Sura and Hannachi 2015), where is a small offset that occurs possibly because of sampling effects. In other words, this framework produces heavy-tailed distributions (although considering the skewness generated, one of the tails may be light at values less than about 10 standard deviations; at larger values, the tails behave similarly; we ignore these extreme tails in what follows) and is an attractive candidate to correctly model extreme events (Sardeshmukh et al. 2015).

Henceforth in this paper, we will consider the next step in complexity beyond estimating parameters from the standard LIM (driven by additive noise) and LIM applied to the univariate CAM system (Sardeshmukh et al. 2015). That is, we consider a linear inverse model driven by a simplified diagonal CAM noise formulation (CAM-LIM). Although this formulation neglects CAM noise covariance and nonlocal state dependency [see, for example, Sardeshmukh and Sura’s (2009) (4a) and (4b)], it is a more general model than used in previous applications and allows for the generation of deviations from Gaussianity in a linear deterministic setting.

To calculate CAM noise in a LIM setting, consistency relations between the CAM-LIM parameters and the statistics generated by it will be derived. In this way, a statistical dynamical description of a system is calculated, which can be employed for multiple purposes, including the construction of realistic forecasts and representation of its scatter, as well as the study of the underlying processes that generated the observations. Importantly, the employment of this model can be used as a baseline for the variability expected from deterministic linear dynamics and raises the bar for claims of nonlinear behavior. To do this, we use the Stratonovich Fokker–Planck equation [Fokker (1914); Kolmogoroff (1931); see Gardiner (2010) for a discussion of Ito (1951) and Stratonovich (1966) calculi]:

which is the equation satisfied by the PDF of a deterministic system driven by Gaussian white noise:

In this equation, encodes the deterministic dynamics and the amplitude of noise process affecting variable , and Ito’s circle is implied. For future reference, we will clarify the terminology used in (2). The first term in that equation corresponds to the “deterministic drift,” the second term is known as the “noise-induced drift” and is zero if the noise is independent of the state of the system, and the last term is usually called the “diffusion.” For a heuristic explanation of the noise-induced drift, see Sura and Newman (2008) (section 2). It is worth pointing out that in the LIM framework, only a combination of deterministic drift and noise-induced drift, known as effective drift, can be inferred from data rather than the terms separately (Penland 2007). An important result from the framework presented herein is that, within the confines of this model (section 3), the deterministic and noise-induced drifts can be separately resolved.

Stochastic modeling has been used to study different aspects of climate variability [see Berner et al. (2017) for a review]. In particular, simplified versions of (3) have provided important insight into the nature of ocean–atmosphere interactions in the midlatitudes (e.g., Frankignoul and Hasselmann 1977; Hall and Manabe 1997; Barsugli and Battisti 1998; Sura et al. 2006; Sura and Newman 2008). We will illustrate the derivation of the CAM-LIM parameters and the general usefulness of the model by constructing a two-variable model of ocean–atmosphere thermal coupling in midlatitudes, empirically derived from ocean weather station data. The remainder of this manuscript is organized as follows: Section 2 presents a brief overview of the LIM framework. Section 3 introduces the CAM-LIM, some important simplifications, and the derivation of the parameters of the model as a function of its statistical structure. Additionally, the constraint in (1) is updated to include the effects of the coupling. Section 4 exemplifies this in the previously mentioned two-variable thermal-coupling model, and results are compared to the standard LIM modeling of the same system. Finally, section 5 concludes the paper.

## 2. Brief review of linear inverse modeling

In this section, we present a brief overview of the LIM (Penland and Sardeshmukh 1995). In this framework, an *N*-component state vector of anomalies **x** evolves according to the following linear equation (also written in component notation for future use):

In this equation, is a constant matrix, is a state-independent matrix of noise amplitudes, and ** η** is an

*L*-component vector of Gaussian white noise processes. Note that the noise covariance matrix

^{T}has an dimensionality. The matrix denotes the slow time-resolved linearized dynamics, while the temporally unresolved fast variability is modeled by the noise input

**. In this framework, is a stable operator, so the system needs the stochastic input to generate variance. Here, the diagonal terms correspond to an effective measure of dissipating processes that depend linearly on variable , and the system is coupled through the terms. Finally, the matrix can be calculated from data (von Storch et al. 1988; Penland and Sardeshmukh 1995) using**

*η*where is the lag covariance matrix at lag *τ* and is the contemporaneous covariance matrix. Here, the angle brackets denote a long-term average.

Given an initial condition , the most probable evolution of the system is (Penland 2007)

There is one key difference in how this multilinear system behaves compared to its univariate version . In absence of stochastic forcing, the one-dimensional system decays exponentially, while in the multilinear case, short-term growth is possible if the dynamics of the system are nonnormal (; e.g., Boyd 1983; Farrell 1988; Borges and Hartmann 1992; Penland and Sardeshmukh 1995; Moore and Kleeman 1999; Thompson and Battisti 2000; Zanna and Tziperman 2005; Vimont 2010; Sévellec and Fedorov 2017; Martinez-Villalobos and Vimont 2017). This makes possible the use of this framework as a forecasting tool (Penland and Sardeshmukh 1995; Penland 1996; Johnson et al. 2000; Alexander et al. 2008; Newman et al. 2011; Zanna 2012).

There are balance conditions in the dynamics of stochastically generated systems that can be deduced from the Fokker–Planck equation [see (2)]. In statistical steady state, the fluctuation–dissipation relation (e.g., Leith 1975; Penland and Matrosova 1994; DelSole and Hou 1999; Ghil et al. 2002; Gritsun et al. 2008) relates the state variables covariance to the noise processes covariance ^{T} as

where we also write this relation in component notation for future reference:

This can be understood as a covariance budget, where the fluctuating stochastic input is dissipated by the deterministic dynamics, so statistical steady state is attained.

The LIM framework is and has been used extensively to study the state of the tropical Pacific (Penland and Sardeshmukh 1995; Penland 1996; Newman et al. 2011; Vimont et al. 2014; Capotondi and Sardeshmukh 2015), tropical Atlantic (Penland and Matrosova 1998; Vimont 2012), and extratropical dynamics (Alexander et al. 2008; Zanna 2012; Newman 2013; Newman et al. 2016). In the tropical Pacific, the forecast of sea surface temperature (SST) anomalies through this method is competitive compared to forecasts provided by general circulation models (Newman and Sardeshmukh 2017). The LIM framework provides a good description of the state variables contemporaneous and lagged covariances if the temporally resolved dynamics is close to linear, but it is not designed to account for long-term deviations from Gaussianity, for example, asymmetric behavior between positive and negative anomalies, and different than Gaussian frequency of extreme events.

## 3. CAM-LIM

In this section, we introduce a CAM-LIM framework, calculate several formulas to extract the multiplicative noise information from data, and derive and discuss the constraints that this formulation puts on the statistical moments generated.

### a. Model derivation

To retain the advantages of the LIM approach and also account for deviations from Gaussianity while keeping the modifications to a minimum, we consider a LIM-type model driven by a simple CAM noise formulation, assuming diagonal dominance in the multiplicative term. Similarly to the standard LIM, a slow variable integrates fast random forcing, but in this case, the random forcing amplitude depends on the slow variable state itself. The model is given as follows:

Here, corresponds to the *i* component of a state vector **x** of anomalies, and is an matrix that encodes the linearized deterministic dynamics of the system. Entries correspond to deterministic dissipating processes that depend linearly on , and the system is coupled through the terms . The system is driven by *L* Gaussian white noise processes whose amplitudes [in keeping with the notation of (3)] are given as follows:

The first set of coefficients corresponds to the CAM noise processes. Here, corresponds to a “local” state dependency for the noise amplitude, and accounts for the part of the additive noise that is correlated to the state-dependent (multiplicative) noise. The second set of coefficients denotes the amplitude of additive noise processes uncorrelated to the CAM noise. For simplicity, this formulation neglects direct nonlocal noise state dependency, although part of the nonlocal effects can be captured (if local and nonlocal variables are correlated) through this simple local state dependency. In this formulation, the CAM noise processes affect the individual noise variances (as seen below), while the pure additive noise carries the noise covariances information. An important feature of this model is that the noise amplitude is asymmetric with respect to the mean; that is, the magnitude of the CAM noise amplitude is zero at rather than at . This will produce an expected mean noise-induced drift that can be removed from the equation for the anomalies in (10) by a term (Sardeshmukh and Sura 2009). In the univariate case, this model corresponds exactly to the one proposed and solved by Sardeshmukh and Sura (2009).

The use of a diagonal CAM noise formulation (one independent process per variable) and the neglect of direct nonlocal noise state dependency are important simplifications but allow us to calculate relatively simple formulas for the CAM-LIM parameters. Using this particular CAM noise formulation is the logical first step to introduce noise state dependency in a LIM framework, and it is in the spirit of, though more general than, the principle of diagonal dominance postulated by Sardeshmukh and Sura (2009, section 6). This principle states the increasing importance of the self-correlation terms in representing the higher-order statistics of a system and explains the success of the univariate version of this model in representing the observed deviations from Gaussianity in several climate variables (Sardeshmukh and Sura 2009; Penland and Sardeshmukh 2012; Sardeshmukh et al. 2015; Sura and Hannachi 2015). Here, in addition to the terms considered by Sardeshmukh and Sura, coupling between the variables and noise covariance effects are incorporated. This allows for the calculation of joint statistics. Despite these simplifications, in most cases, the model will be enough to display a realistic representation of the emergent non-Gaussian behavior, while maintaining all the advantages of the standard LIM framework.

Multiplying the Fokker–Planck equation [see (2)] by the appropriate moment of **x** and integrating over from −∞ to ∞, we can calculate an equation for the first two moments of the system. In statistical steady state,

Comparing to (9) and imposing that both standard LIM and CAM-LIM describe the first two moments of the system in the same way, the following relations are obtained:

These relations relate the parameters of a standard LIM to the parameters of a CAM-LIM. Here, (14) makes explicit the partition of the effective dissipating processes into a deterministic part , and a noise-induced modification . Also, (15) enforces that both standard LIM and CAM-LIM reproduce the same noise covariance, with the right-hand side of the expression amounting to a partition of it between pure additive terms and CAM noise processes. Formulas to calculate all these terms from data are derived in the appendix, with some important ones repeated below.

Under CAM-LIM, it can be shown that the best prediction (in the mean square sense) of the evolution of the state vector^{2} given a current state is also given by (7) (Penland 2007):

which further justifies the use of the notation shown in (14). Also, (14) and (16) reiterate the message that, in general, when calculating the matrix from data, that determination not only includes the linearized deterministic drift but also a noise-induced drift component that may be confused with deterministic dynamics (Penland and Matrosova 1994). Equation (13) generalizes the fluctuation–dissipation relation to include the extra CAM noise terms. From the Fokker–Planck equation, we can also calculate an equation for the system (unnormalized) skewness and kurtosis budgets. Again, in statistical steady state,

Combining the information provided by the first four statistical moments [(12), (13), (17), and (18)], we may find an expression for the CAM-LIM parameters as

where matrices , , and entries are defined as

matrices denoted with a bar are defined as

and denote particular entries of the covariance matrix _{0} ( is the variance of variable ). The nondiagonal elements of ^{T} are calculated using (15). Notice that and are just the skewness and kurtosis of variable . Note that in the multivariate case shown here, variable influences skewness and kurtosis through . Analogous to the univariate case (Sardeshmukh and Sura 2009), the statistics generated by the CAM-LIM are constrained in a distinctive way. These constraints are explored in more detail in the section below. Remaining aspects of the derivation are shown in the appendix.

### b. CAM-LIM constraints on the statistics

In general, the moments of a CAM-LIM-generated dataset in (10) are necessarily constrained. The first constraint (denoted as ) can be derived from (19) and is given as follows^{3} for variable :

This constraint reduces to (1) in the univariate case (which is a good consistency check) and shows that given a nonzero real amplitude of the multiplicative noise term, the CAM-LIM will generate variability that is typically kurtotic. This is a manifestation of the increased chances for the system to make extreme-event excursions, due to the noise amplitude state dependency.

A second constraint arises because the pure additive covariance matrix ^{T} needs to be positive definite [see (15)]. This constraint (denoted as ) may be written as

This constraint necessarily, but not sufficiently, requires the following inequality (denoted with a prime) to be satisfied as well [(21)]:

The last inequality, given that has already been satisfied, ensures that the additive noise variances are positive. Basically, this limits the contribution of the CAM noise to the total noise covariance. In the univariate case, simultaneous consideration of constraints and leads to a stricter relation between kurtosis *K* and skewness *S*: (Sardeshmukh et al. 2015). Although a similar (but more complicated) relation could be derived in the multivariate case, here, we keep both constraints separate. These relations will be explored in practice in section 4c.

## 4. Modeling midlatitude ocean–atmosphere local coupling using CAM-LIM

In this section, we apply the CAM-LIM methodology to a simple dataset that has been investigated in the literature (Hall and Manabe 1997; Sura et al. 2006; Sura and Newman 2008). A simple model of ocean–atmosphere coupling in the midlatitudes is calculated from data and compared to observations. The CAM-LIM parameters estimation procedure is described in detail, and the information provided by the constraints described above is used to improve the calculation of the parameters.

### a. The models

Simple linear stochastic models have been extensively used to study ocean–atmosphere interactions (e.g., Frankignoul and Hasselmann 1977; North and Cahalan 1981; Kim and North 1992; Hall and Manabe 1997; Barsugli and Battisti 1998; Sura et al. 2006; Wu et al. 2006; Sura and Newman 2008; Smirnov et al. 2014). These kinds of systems are simple enough that can be regarded as a null hypothesis or baseline against which distinctively nonlinear variability can be compared. Here, we show the usefulness of this framework by modeling the local midlatitude ocean–atmosphere coupling using both standard LIMs and CAM-LIM frameworks. The CAM-LIM and standard LIM are given, respectively, as follows:

where is the *i* component (*i* = 1, 2) of vector . Here, and represent near-surface atmospheric and surface oceanic temperature anomalies, respectively, at a particular midlatitude location. Standard LIM and CAM-LIM parameters are defined as in (5) and (10), respectively, and can be calculated using (6) and (9) in the standard LIM case and (6), (19), (20), and (21) in the CAM-LIM case. LIM and CAM-LIM parameters are related as in (14) and (15). Although nonlocal noise state dependency (i.e., terms, ) is expected for this kind of interaction (e.g., Neelin and Weng 1999; Sura and Newman 2008), the simple CAM noise formulation used here provides satisfactory results (as seen below), especially compared to a standard LIM. Interestingly, within the confines of this model formulation, the noise part and deterministic part contributions to effective damping term can be cleanly separated out using this framework. Below, we show the result of the previously stated calculations.

### b. Models' parameter estimation

To estimate parameters for our models in (27) and (28), we use ocean weather station (OWS) data [for information on OWS, see Diaz et al. (1987) and Dinsmore (1996)], specifically OWS Papa (OWS P) in the North Pacific. OWS P is located far from strong currents (Hall and Manabe 1997) and is only affected weakly by ENSO (Alexander et al. 2002), thus providing an ideal location to construct these models.

We consider daily data from 1 January 1950 to 31 December 1980 (total of 31 years). The and climatologies are constructed using the annual mean plus the first three annual Fourier harmonics. Anomalies ( and ) are computed by subtracting the respective daily climatologies. The few unavailable daily values (~1.5% of the total) are neglected when computing the climatologies, and 29 February values are neglected as well. A 3-day running mean is applied to the anomalies, and only “extended winter” (November–April) values are considered to construct the model. Finally, and are standardized for easier comparison. Note that using standardized variables is only done for further plotting convenience. To help gauge the results, the standard deviations are and .

The parameter estimation algorithm starts with the calculation of from data using (6). This requires contemporaneous and lag covariance matrices. For our calculations, we use a lag *τ* of 6 days. Notice that both LIM and CAM-LIM generate the same lag covariance matrix as required by (7) and (16). Importantly, both linear models provide an excellent representation of the observed lag correlation functions, as seen in Fig. S1 of the supplemental material. The remaining model parameters are calculated using (9) for the standard LIM case and (19), (20), (21), (13), and (14) for the CAM-LIM case. The sensitivity of the and calculated values to the choice of lag is fairly minor, with maximum variations respect to the values quoted below on the order of 10% for reasonable choices of lag (Fig. S2). The results for the CAM-LIM model are given as follows:

We notice that the effect of the state-dependent noise on the damping of each variable is relatively minor (cf. with , for example). The values of (the amplitude of the multiplicative noise) and (the amplitude of the additive noise correlated to the multiplicative noise) differ from what would be calculated in an univariate setting (uncoupled system and no noise covariance). For example, and would be overestimated by 12% and 27% [calculated using (19) in the univariate case ( when ) or alternatively using Sardeshmukh et al.’s (2015) (8)] had we assumed individual, CAM noise-driven, univariate models for and .

It is tempting to compare the calculation of these parameters in (29) to Sura and Newman (2008) modeling of the same dataset [see their (29), (34), and (36)]. Although superficially similar, the two models differ in several respects, making the comparison difficult. The model presented here is totally empirical, while Sura and Newman’s takes into account the dynamical equations. Having somewhat different objectives, the two models make different assumptions that prohibit their direct comparison. For example, while the CAM-LIM simplified noise formulation allows for a direct estimation of the noise amplitudes, it will not directly represent some of the nonlocal effects in Sura and Newman’s model. It is important to emphasize that in the CAM-LIM case, there are no assumptions as to where the noise is coming from, whereas Sura and Newman neglect some potentially important processes (ocean currents, vertical entrainment, variable mixed layer depth, and mixing) in order to highlight deviations from Gaussianity arising from the effect of state-dependent rapid wind fluctuations on sensible and latent heat fluxes at the air–sea interface. Because of the positive mean climatological ocean–atmosphere temperature difference almost everywhere, models restricted to local air–sea interaction can only generate positive SST skewness (Sura and Sardeshmukh 2009). Although SST skewness is positive at OWS P, there are many parts of the globe where skewness is negative (Sura and Sardeshmukh 2008; Sardeshmukh and Penland 2015). Comparing to the dimensional reduction strategy employed in Sura and Sardeshmukh [2009, see their (16) or (19)], CAM-LIM-independent deterministic components in (27) allow for the parameterization of other processes besides air–sea temperature difference. This implies that unlike models restricted to local air–sea interactions, CAM-LIM is able to generate negative SST skewness as well if the data support it. Despite these differences, the two types of models (loosely speaking, “empirical” and “dynamical”) are complementary and, taken together, help inform the relative importance of local air–sea interaction versus other processes.

To compare both the standard LIM (28) and CAM-LIM (27) with observations, we run both models 10 times for 1000 years each with the calculated parameters in (29) using the stochastic Heun integration method (Rüemelin 1982; Ewald and Penland 2009). We remove the first 50 years of each integration as spinup time, for a total of 9500 years of LIM- and CAM-LIM-generated time series. We use an integration time step of 3 min and collect daily output. This corresponds to 9500 full years of (3-day running mean) daily values or, equivalently, to 19 157 extended winters of 181 days.

Using the generated datasets, we calculate the and joint PDFs produced by each model (Fig. 1b for standard LIM and Fig. 1c for CAM-LIM), and we compare them with the observed joint PDF in Fig. 1a. The joint PDFs are calculated using a bivariate Gaussian kernel density estimator, and shading denotes the difference from a best-fit bivariate normal distribution. As expected, the standard LIM produces a Gaussian joint PDF. On the other hand, although there are differences at the finer scale, CAM-LIM performs noticeably better at reproducing the observed deviations from Gaussianity. Visually, some of the differences between the observed and CAM-LIM joint PDFs may look important, most strikingly, what appears to be two local maxima separated by a local minimum. Here, we note that similar “inhomogeneities” in the joint PDF do arise in other contexts, most notably in the study of atmospheric “regimes” (e.g., Kimoto and Ghil 1993; Smyth et al. 1999), where they are usually explained as arising through nonlinear deterministic dynamics. It is shown below that those inhomogeneities in this case likely appear because of limited sampling and are well explained by the CAM-LIM framework.

Given the extended LIM and CAM-LIM integrations, one may ask how the observations compare with LIM and CAM-LIM integrations of the same length. Figure 2 shows the difference between the observed joint PDF and Monte Carlo estimates for the LIM joint PDF (Fig. 2a) and CAM-LIM joint PDF (Fig. 2b). For each model, Monte Carlo PDF estimates are obtained for each of 617 different 31-yr periods (181 extended winter days per year) contained within the respective 9500-yr simulations and averaged to obtain the dashed curve. Shading indicates regions where the observed PDF falls outside of the 2.5nd or 97.5th percentiles calculated from the 617 LIM and CAM-LIM PDF estimates. Comparing Figs. 2a and 2b, it is visually apparent that the observed variability can be better explained through the CAM-LIM formulation. Although there are some spots where the observed and CAM-LIM joint PDFs are different (at the 95% confidence level), noticeably for strong positive , for the most part, the CAM-LIM provides a good model to explain the observed variability, including the deviations from Gaussianity. We note that both LIM and CAM-LIM have problems explaining the largest anomalies, although that problem is much more reduced in the CAM-LIM case. Here, we point out that the inhomogeneities in the observed joint PDF are nonsignificant and can be well explained by a CAM-LIM null hypothesis at the 95% confidence level. In addition, only one local maximum in the observed joint PDF deviates significantly from Gaussian as seen in Fig. 2a. Given the good correspondence between observed and CAM-LIM joint PDFs, it is suggested that even a coarse noise state dependency, as presented here, may significantly improve coupled variability statistics.

A similar analysis can be conducted for the distribution of the individual variables. Figure 3 shows the observed and standard LIM- and CAM-LIM-generated and cumulative density functions (CDFs) in a linear axis. Similarly as before, confidence intervals are calculated using a Monte Carlo procedure. An important difference between the standard LIM and CAM-LIM is that CAM-LIM generates asymmetric confidence intervals—with narrower spread for positive and negative , where the noise amplitudes are smaller [see (29)]—whereas LIM generates symmetric confidence intervals. The top panels (Figs. 3a,b) show the CDFs in the middle range of the data (between −2 and 2 standard deviations). Both observed (Fig. 3a) and (Fig. 3b) CDFs are well within the 95% confidence interval generated by both LIM and CAM-LIM (not shown), although even in this range, the CAM-LIM fit is noticeably better. The middle and bottom panels show the CDFs at the negative tails (Figs. 3c,d) and positive tails (Figs. 3e,f), respectively. For clarity, Figs. 3c–f are also shown in a logarithmic *y* axis in Fig. S3. As seen in these panels, it is for extreme events where the differences between the standard LIM and CAM-LIM are most evident. With the exception of the largest positive anomalies (; see Fig. S3c), the CAM-LIM produces a better fit of the observed variability at the tails, including both light and heavy tails. For example, this is seen in the heavier-than-Gaussian tail of negative and the lighter-than-Gaussian tail of negative . With only the aforementioned exception, the observations stay within the 95% confidence level generated by the CAM-LIM realizations, whereas for the most part, that is not the case for the standard LIM, where only the negative tail is well captured. To put numbers in perspective, a negative value of three standard deviations (an anomaly of ~−4°C) occurs 5 times more frequently in both observations and CAM-LIM than in the standard LIM.

A general understanding of the data distribution, including the behavior of the tails, can be found by calculating the distribution’s skewness and kurtosis. Table 1 shows the observed skewness and kurtosis, as well as the values calculated using the full LIM and CAM-LIM integrations. As expected, the standard LIM skewness and kurtosis matches the ones of a Gaussian distribution. Even though the match is not perfect, it is evident that the CAM-LIM provides a closer match to observations.

There is an important degree of variability in the statistics as a function of the length of the data segment considered for the calculations. Figure 4 shows the skewness (*S*) and excess kurtosis (*K* − 3) distributions when partitioning the standard LIM- and CAM-LIM-generated time series in segments of 31 winters (the length of the OWS P observations) as done before. First, note that although the fitting works better for than , in both cases, the observed skewness and excess kurtosis are within the 95% confidence interval generated by the CAM-LIM realizations. Conversely, the observed skewness and kurtosis values fall outside the standard LIM confidence interval in all cases, implying that the observed deviations from Gaussianity are a feature of this locally coupled system and are not due to limited sampling. Second, note that the values of skewness and kurtosis in the different CAM-LIM realizations are fairly variable. For example, there are several segments where and excess kurtosis is bigger than 2 (*K* − 3 99th percentiles are 2.21 and 3.36, respectively), implying a much-higher-than-average number of extreme events over that interval. On the other hand, for example, there are segments where excess kurtosis is negative, meaning that although the system generates long-term heavy-tailed variability, quiet extreme-events periods are not unusual. This variability shows that the CAM-LIM generative process [see (27)] supports a wide range of 31-yr climates. This implies that for this system, important swings, owing to internal dynamics, in the number of extreme events decade to decade, or even century to century, is what is normal rather than the anomaly. This has important consequences, for example, for hypothesis testing of extreme events (Sardeshmukh et al. 2015).

### c. Parameter estimation and CAM-LIM-generated statistical constraints

In this section, we analyze how well the parameter calculation algorithm in (19), (20), and (21) performs on the CAM-LIM-generated variability that uses (29) as input parameters. This is an important self-consistency check as the output parameters from the estimation procedure should match the input parameters. When using the full CAM-LIM integration as our time series, we retrieve the following values:

The retrieved parameters compare very well with the input in (29) with differences starting on the third decimal value, showing that the methodology is self-consistent [i.e., input parameters are related to the statistics generated from (19), (20), and (21)]. As is the case for most stochastically generated systems, a long segment of data is needed for the retrieved parameters in (30) to match the input parameters in (29), and there will be some inherent variability when using shorter segments of the data, as shown below.

Although the observational input data (and, by construction, the full CAM-LIM integration) satisfy the CAM-LIM constraints in (24)–(26), for short enough data segments, sampling variability may cause these constraints to be not satisfied. Practically, this becomes a problem because these “short enough” segments may be longer than the available dataset for a particular application. To partially overcome this, we need a redefinition of the sample statistics that take into account the constraints in (24)–(26). This can be done in several ways. Taking into account the information provided by the constraints, we choose a simple redefinition of matrix in (22) (recall the entry corresponds to variable *j* kurtosis) as the rescaled matrix . This is similar to the methodology used by Sardeshmukh et al. (2015) in the univariate version of this problem. The strategy is as follows. We increase *α* up to the point the first constraint in (24) is satisfied for both variables.^{4} But as *α* is increased, the second constraint in (25) may or may not be satisfied. In most cases, increasing *α* just to the point where the first constraint is satisfied results in values that are barely above zero, implying large values in order to satisfy the skewness and kurtosis budgets in (17) and (18). If a particular is too large, (26) is likely not satisfied. This implies that given the statistics, and values are constrained to be inside a surface defined by in (24) and in (25) and (26). Basically, increasing *α* allows us to find the interval of values that and can take to stay inside that surface. Although we are not guaranteed an unbiased calculation of and , following this procedure, we recover values that are at least within the much narrower band of possible values.

As an example, Fig. 5 shows the histogram of retrieved values when using data segments that match the length of the observational input (31 winters; 31 × 181 = 5611 daily values; Figs. 5a,c) and 100 winters (100 × 181 = 18 100 daily values; Figs. 5b,d) of the full CAM-LIM integration (~3.47 × 10^{6} daily values). For visual clarity, there is a kernel density estimation of the distribution of values superimposed to each histogram. In each 31- or 100-winter partition, we show two different cases, one denoted “*α* = 0” and one denoted “*α* varying.” The *α* = 0 case shows the distribution of the retrieved values for the segments where both constraints in (24) and (25) are satisfied without modification of matrix (38% for the 31-winter segment length case and 69% of the time for the 100-winter case; see Fig. 5e), and for the *α*-varying case, the distribution of retrieved values after the procedure described above is followed ( redefined as ; note that this includes the *α* = 0 instances). The bright green point denotes the value of the input parameter calculated from the observational data [(29)]. Figure 5e shows the percentage of time when the constraints are satisfied as a function of *α*. Some general takeaways from this figure are as follows: as might be expected, the longer the segment considered, the better representation of the long-term statistics and the faster (24) and (25) are satisfied (Fig. 5e). Also expectedly, segments that satisfy the aforementioned constraints without modification of (*α* = 0) provide a better match of the long-term statistics, though considerable sampling variability exists. Finally, although it can be further refined, the procedure of redefining the sample matrix produces reasonable estimations, meaning that in this case, approximated values can be retrieved by redefinition of the sample statistics. This result may prove useful in practice when using CAM-LIM to model other systems. Note that in general, the noise terms are much harder to estimate. For example, a length of 500 winters is needed for the standard deviation of retrieved values to be within 10% of the input value in (29). On the other hand, as expected, the “effective drift” values are estimated much faster, for example, only needing segments of ~25 winters for the retrieved-values standard deviation to be within 10% of the input value.

## 5. Concluding remarks

In this paper, we consider a natural extension of the linear inverse model framework. Here, in addition to an additive Gaussian white noise component, the system is driven by a simple state-dependent noise formulation, termed CAM noise (Sardeshmukh and Sura 2009). Compared to a standard LIM, this framework generates the same (lag and contemporaneous) covariance structure and the same expected evolution of anomalies while at the same time generating skewness and excess kurtosis. One important result is that the statistical moments generated by this system are constrained. One of the constraints identified here generalizes the well-known univariate CAM noise constraint [(1)] between skewness *S* and kurtosis *K* to include the effects of coupling and noise covariance. In common with the univariate case, the coupled time series generated are typically kurtotic, making this an attractive framework to model extreme-events frequencies in many cases. The univariate constraint has been shown to be relevant for different climate variables (Sardeshmukh and Sura 2009; Sardeshmukh and Penland 2015; Sardeshmukh et al. 2015). We expect the multivariate constraint in (24) to provide additional information for coupled datasets.

We illustrate the general framework by using a locally coupled model of ocean–atmosphere interaction in midlatitudes. We calculate the model parameters using available sea surface temperature and near-surface atmospheric temperature at an ocean weather station. We show that, compared to a standard LIM, CAM-LIM better reproduces the joint PDF of and as well as the individual PDFs. Importantly, both light and heavy tails are better described by the CAM-LIM formulation, which may be of interest also in the modeling of lighter-than-Gaussian tails (e.g., Loikith and Neelin 2015). Practical issues related to the implementation of the model, including the amount of data needed, were also discussed. An important point here is that knowledge of the statistical constraints arising from this framework can be used to improve the parameter estimation in cases where there are insufficient data to adequately resolve the statistics of the system.

Although here we presented the concrete example of a midlatitude coupled model, we picture this framework to have wide applicability. Specifically, any system where the time resolved dynamics is reasonably linear but significant deviations from Gaussianity are present is susceptible to be modeled using CAM-LIM. Here, we note that the model has been tested in other contexts, including higher-dimensional systems, with good results (Martinez-Villalobos 2016). Implicit in the derivation of this framework is a separation of the dynamics between slow and fast time scales. Here, we argue (together with many other studies) for the importance of the fast unresolved part of the dynamics in shaping not only the variance of the resolved dynamics but also the mean state (through the noise-induced drift), asymmetry in the PDF, and the behavior of the extremes. The tool presented here can be valuable to quantify these effects.

## Acknowledgments

This work was supported by National Science Foundation Grants AGS-1463643 and AGS-1463970 (CM, DJV and MN), and National Science Foundation Grant AGS-1540518 (CM and JDN). We thank Dr. Zhaohua Wu for handling the manuscript and two anonymous reviewers for their insightful comments.

### APPENDIX

#### Derivation of CAM-LIM Parameters

The starting point here is the Fokker–Planck equation [see (2)], which applies to a system of the form (3). First, we start by rewriting (10) as

Here, , , and we have explicitly separated the CAM noise coefficients and pure additive noise coefficients . Writing (A1) in (3) form, we have

Separating the Fokker–Planck equation [see (2)] into its deterministic drift (DD), noise-induced drift (ND), and diffusion (DI) parts, we have

Using (A2), DD, ND, and DI are given as follows:

Equations. (A5) and (A6) make explicit the CAM noise processes enter the system in both noise-induced drift and diffusion parts, while the pure additive noise processes only enter in the diffusion. Equations (A4) and (A5) can be combined into an “effective” drift (ED; ED = DD + ND) term as

Here, as in (14), and we have identified the mean noise-induced drift response to be equal to . After all previous steps, the Fokker–Planck equation is the addition of (A6) and (A7):

Multiplying (A8) by the appropriate moment and integrating from −∞ to∞, we obtain (12), (13), (17), and (18) in the main text. Focusing in the diagonal terms, and in statistical equilibrium, this implies the following set of equations that need to be satisfied simultaneously:

Here, (A9) is used to eliminate the mean terms . Then (A10) and (A12) are used to simultaneously eliminate and terms, obtaining the expression for [(19) in the main text], as a function of [previously calculated using (6)], and the system statistics. Here, we can calculate . Interestingly, the skewness budget in (A11) is independent of the pure additive noise amplitude. We use that information to calculate [(20) in the main text] as a function of , , and the statistics of the system. Finally, [(21) in the main text] is calculated as the remainder needed to close the variance budget. It is well known that only the quadratic expression rather than the individual amplitude terms can be extracted from data (e.g., Monahan 2004; Sura and Newman 2008). In this simplified system, there are unique expressions for (up to a plus-and-minus sign) and , but if more complex CAM noise formulations are specified, only quadratic and forms will be extracted from data.

## REFERENCES

*Applied Smoothing Techniques for Data Analysis: The Kernel Approach with S-Plus Illustrations.*Oxford Statistical Science Series, Vol. 18, Clarendon Press, 193 pp., https://searchworks.stanford.edu/view/3749403.

*Handbook of Numerical Analysis: Computational Methods for the Atmosphere and the Oceans*, R. Temam and J. Tribbia, Eds., Elsevier, 279–306.

*Stochastic Methods: A Handbook for the Natural and Social Sciences.*Springer Series in Synergetics, Vol. 13, Springer, 447 pp.

*Nonlinear Dynamics in Geosciences*, A. A. Tsonis and J. B. Elsner, Eds., Springer, 485–515.

## Footnotes

Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/JAS-D-17-0235.s1.

*Publisher’s Note:* This article was revised on 29 January 2018 to include a funding source in the Acknowledgments that was missing when originally published.

© 2018 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

^{1}

Note that Sardeshmukh et al. (2015) derive a stricter bound . This is discussed in section 3b in the context of the multivariate system presented here.

^{2}

In this case, the mean of the conditional PDF will not correspond in general to its most probable value (Penland 2007).

^{3}

Note that the denominator of (19) is always positive (Wilkins 1944).

^{4}

This provides a conservative estimate. In this step, we may choose to calculate the constraint variable by variable, and some variables may be recovered faster.