## 1. Introduction

In climate research it is common practice to fit a statistical or stochastic model to time series of observed variables. A familiar example is the projection of data on a regression model (e.g., von Storch and Zwiers 1999). The master equation (e.g., Gardiner 1983, 8–11) is a prognostic equation for the probability density function (PDF) among discrete states of a system. A discrete time approximation of the master equation is used in this paper in discretized phase spaces spanned by climate variables. The coefficients of a discrete time master equation are probabilities for cell transitions. These probabilities can be estimated from a time series of the variables (e.g., Egger 2001), hence the attribute empirical. The empirical master equation (EME) is described in section 2. The PDF forecasts given by an EME can be used for making probabilistic predictions. Another use of the EME is for studying the processes underlying the variable set. For example, Egger (2001) studied inter alia the evolution of the mean position, that is, a trajectory in phase space, from various initial conditions. This approach is applied in the second part of this paper (Dall’Amico and Egger 2007, hereafter Part II) for studying the relationship between the variables. Another example is offered by Pasmanter and Timmermann (2002), where the entropy production is derived from the coefficients of EMEs for assessing predictability time scales.

The EME is not new in the atmospheric sciences. For instance, Spekat et al. (1983) analyzed the zonal, mixed, and meridional weather regimes from a centennial time series on the basis of an empirical first order Markov model; the latter is closely related to the EME used in this paper (see section 2). Fraedrich (1988) applied inter alia a Markov chain model to the problem of estimating predictability time scales from annual time series of ENSO. Egger (2001) derived master equations from time series of the equatorial components of the global angular momentum of the atmosphere and related torques in order to analyze dynamics in the phase plane of two variables at any one time. Pasmanter and Timmermann (2002) applied the theory of cyclic Markov chains to the ENSO predictability problem; Crommelin (2004) studied the issue of atmospheric circulation regimes in Northern Hemisphere winter using a similar Markov model.

EMEs are numerical structures whose numerical properties are not yet well known. Some of the factors influencing the quality of an EME depend on choices made by the user such as the number and the choice of the variables, and the type and degree of phase space partition. Only rules of thumb are available to assess an adequate level of partitioning. Other factors are predetermined, as for instance the accessible climate time series and computer resources. The length and resolution of the available time series greatly influence the quality of an EME. The number of variables must be small because long data records are needed to correctly estimate the coefficients of an EME, and any addition to the number of variables involved dramatically increases the amount of data required (see also Crommelin 2004, section 2). This paper presents for the first time three-dimensional EMEs, yet in many applications the studied system can be expected to have more dimensions.^{1} It is the purpose of this part of the paper to address these problems by taking output from the Lorenz (1963) model with additional white noise forcing as a data basis (section 3). Systematic variations of grid size, time series length, sampling interval, and the number of variables are conducted in section 4 in order to study their effect on how well the EME reproduces the dynamics of the studied system. The conclusions are outlined in section 5. The results of this part of the paper provide guidelines for the application of this methodology to any problem in (and beyond) the atmospheric sciences. Real data of limited length will be considered in Part II, where these guidelines are applied to EMEs for the quasi-biennial oscillation of equatorial stratospheric wind (see the review by Baldwin et al. 2001), the 11-yr solar cycle (e.g., Labitzke and van Loon 1999), and the northern annular mode (Thompson and Wallace 2000).
This two-part paper is based on work presented in Dall’Amico’s doctoral thesis (Dall’Amico 2005).

## 2. EMEs

Let the state of the system be described by a single variable $q$. We consider partitions of that region of the $q$ axis where the PDF is nonzero into grid intervals of equal size $Dq$ (see Fig. 1). (Other approaches are found in the literature.)^{2} This region extends between $q_{\min} = 0$ and $q_{\max} = i_{\max}\,Dq$. The center of grid interval ($i$) is located at $\tilde{q}(i) = q_{\min} + (i - \tfrac{1}{2})\,Dq$. Let us assume that observations are available at discrete points in time $t_n = t_0 + n\,Dt$, where $Dt$ is the sampling time interval; time is also considered as a discrete variable. Any state for which $q$ satisfies $q_{\min} + (i - 1)\,Dq \le q < q_{\min} + i\,Dq$ is located in ($i$). The PDF is defined in the following way: $f_i(t_n)\,Dq$ is the probability that the state variable $q$ is in grid interval ($i$) at time $t_n$. It is normalized so that $\sum_{i=1}^{i_{\max}} f_i(t_n)\,Dq = 1$ for every $n$. Obviously $0 \le f_i(t_n) \le 1/Dq$. The discrete time master equation describes the dynamics in this discretized version of the phase space (see Zwanzig 2001, 61–63):

$$f_i(t_{n+1}) = f_i(t_n) + \sum_{i'} W^{i}_{i'}\, f_{i'}(t_n) - \sum_{i'} W^{i'}_{i}\, f_i(t_n), \tag{1}$$

where the transition coefficient $W^{i'}_{i}$ gives the probability that the state variable $q$ leaves grid interval ($i$) to enter ($i'$) at the next time step (see Fig. 1). Gains (losses) in probability density by transitions are described by the second (third) term on the right. Boundary conditions are not needed provided the domain of the solution contains all available observations and these adequately represent all those states which the system may possibly occupy. Equation (1), though linear in $\mathbf{f}$, may capture nonlinear relationships between the variables. It can be written in a more compact form:

$$f_i(t_{n+1}) = f_i(t_n) + \sum_{i'} D^{i}_{i'}\, f_{i'}(t_n), \tag{2}$$

where $D^{i}_{i'} = W^{i}_{i'} - \delta_{ii'} \sum_{p} W^{p}_{i'}$. Since $\sum_{i} D^{i}_{i'} = 0$, probability is conserved. Further, transition probabilities cannot be negative, so all off-diagonal terms of 𝗗 will be positive or zero. Thus, according to standard matrix theorems, 𝗗 will have at least one zero eigenvalue and all other eigenvalues will have negative real parts describing an approach to equilibrium. Matrix 𝗗 may have more than one equilibrium state so that different initial states may lead to different stationary solutions at infinite time (Zwanzig 2001). From Eq. (2) follows:

$$f_i(t_{n+1}) = \sum_{i'} W^{i}_{i'}\, f_{i'}(t_n), \tag{3}$$

which shows the equivalence between a discrete time master equation in the form of Eq. (1) and a first-order Markov chain description in the form of Eq. (3). In this paper, we assume that the underlying systems are ergodic, so there is only one equilibrium state (see Spekat et al. 1983). The eigenvector associated with the biggest eigenvalue of matrix 𝗪, that is, the unit eigenvalue,^{3} represents the climatological equilibrium distribution. For a large number of time steps $n$, the PDF relaxes to this climatological mean (Spekat et al. 1983). (The value of the second biggest eigenvalue of 𝗪 gives information on how fast the system tends to this equilibrium.) The observed density of states, $\boldsymbol{\rho}$, is estimated from the time series as

$$\rho_i\,Dq = \frac{N_i}{\sum_{j=1}^{i_{\max}} N_j}, \tag{4}$$

where $N_i$ is the number of events when an observation falls in ($i$); hence $\sum_{i=1}^{i_{\max}} \rho_i\,Dq = 1$. Stationarity may be guaranteed by checking that the time series does not end in a cell from where no transition is observed.

The transition coefficients are estimated as

$$W^{i'}_{i} = \frac{M^{i'}_{i}}{N_i}, \tag{5}$$

where $M^{i'}_{i}$ is the number of transitions from ($i$) to ($i'$) observed in the time series. Conservation of probability requires $\sum_{i'} W^{i'}_{i} = 1$ so that $\sum_{i'} M^{i'}_{i} = N_i$. Therefore $N_i$ is not increased when a time series ends in ($i$). Transition probabilities have been estimated from data, for instance, by Spekat et al. (1983), Nicolis et al. (1997), Egger (2001, 2002), Pasmanter and Timmermann (2002), and Crommelin (2004).
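The counting procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `estimate_eme` and `equilibrium`, the 0-based cell indices, and the dense array storage are all assumptions made for the example. The matrix is stored as `W[i_new, i_old]`, so that one time step of the Markov chain is the matrix-vector product `W @ f`.

```python
import numpy as np

def estimate_eme(cells, n_cells):
    """Estimate W, N and rho*Dq from a sequence of visited cell indices.

    cells   : 1D sequence of 0-based cell indices, one per observation
              (hypothetical input format, assumed for this sketch).
    n_cells : total number of cells in the partition.
    """
    cells = np.asarray(cells)
    M = np.zeros((n_cells, n_cells))
    # M[i_new, i_old] counts the observed transitions i_old -> i_new.
    np.add.at(M, (cells[1:], cells[:-1]), 1)
    # N[i_old] counts departures only, so the last observation of the
    # series does not increase N.
    N = M.sum(axis=0)
    # Column-wise division; columns of cells never left stay zero.
    W = np.divide(M, N, out=np.zeros_like(M), where=N > 0)
    p = N / N.sum()          # rho_i * Dq, sums to one
    return W, N, p

def equilibrium(W):
    """Eigenvector of W for the eigenvalue with the largest real part,
    normalized to unit sum (the equilibrium distribution times Dq)."""
    vals, vecs = np.linalg.eig(W)
    k = np.argmax(vals.real)
    f = np.abs(vecs[:, k].real)
    return f / f.sum()

if __name__ == "__main__":
    series = [0, 1, 2, 0, 1, 2, 0, 1, 0]   # toy series ending where it began
    W, N, p = estimate_eme(series, 3)
    print(W)
    print(p, equilibrium(W))
```

For a toy series that ends in the cell where it started, the observed density and the equilibrium eigenvector coincide, which illustrates the stationarity remark above.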

The transitions from ($i$) toward ($i'$) are expected to be binomially distributed (Spekat et al. 1983). Out of $N_i$ transitions from ($i$), $N_i \cdot W^{i'}_{i}$ are expected to result in transitions to ($i'$), implying that $N_i \cdot W^{i'}_{i}$ is the expected value from $N_i$ independent Bernoulli trials where at each trial the probability of “success,” that is, transition to ($i'$), is $W^{i'}_{i}$ and the probability of “no success” is $1 - W^{i'}_{i}$ (Spekat et al. 1983). By the central limit theorem, the distribution will tend, for large sample sizes, to a normal distribution whose 90% confidence interval has a half-width, recall Eq. (5), of

$$c^{i'}_{i} = 1.645\,\sqrt{\frac{W^{i'}_{i}\,(1 - W^{i'}_{i})}{N_i}}.$$

In practice, $10^{2}$ to $10^{3}$ and more cells are considered, so even $10^{6}$ transitions are conceivable; an individual consideration of the significance level of just a few selected transitions^{4} is not meaningful. Therefore use is made here of a weighted ratio, $R_w$, averaged over the whole phase space, in which $\rho_i$, recall Eq. (4), acts as a weight. The ratio $R_w$ can be interpreted as a noise-to-signal ratio. A value of $R_w$ of the order unity would suggest that the half-widths of the confidence intervals of the transition coefficients’ estimates are of the same magnitude as the estimates themselves and that their statistical significance is low.
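As a small illustration of the binomial argument, the following Python sketch computes the 90% half-widths for every estimated transition coefficient. The function name `half_widths` and the array layout `W[i_new, i_old]` are assumptions of this example; the sketch stops at the half-widths and does not attempt the weighted average over phase space.

```python
import numpy as np

def half_widths(W, N, z=1.645):
    """Half-width of the 90% confidence interval of each W[i_new, i_old],
    using the normal approximation to the binomial distribution.

    W : array of estimated transition probabilities, columns = old cell.
    N : number of observed departures from each old cell.
    """
    W = np.asarray(W, dtype=float)
    N = np.asarray(N, dtype=float)
    c = np.zeros_like(W)
    visited = N > 0                      # skip cells with no departures
    c[:, visited] = z * np.sqrt(
        W[:, visited] * (1.0 - W[:, visited]) / N[visited]
    )
    return c

if __name__ == "__main__":
    W = np.array([[0.5, 0.0], [0.5, 0.0]])
    print(half_widths(W, [100, 0]))
```

With 100 departures and an estimated probability of 0.5, the half-width is 1.645 × 0.05, that is, about 16% of the estimate itself.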

Depending on the system, transition coefficients may depend on time. Seasonal dependence was introduced for instance by Pasmanter and Timmermann (2002), who estimated transition matrices for each month of the year on the basis of a 640-yr-long ENSO model run. However, the amount of data available in observational records is in most cases insufficient to introduce such a time dependence.

Statistical tests of the Markov assumption are available^{5} (e.g., Anderson and Goodman 1957). In this paper, we apply the approach suggested by Cencini et al. (1999), Egger (2001), and Pasmanter and Timmermann (2002), that is, to compare correlation functions as delivered by the EME to those obtained directly from the data (i.e., the sample correlation functions). This is a sensible way of testing whether the EME correctly reproduces the behavior of the studied system. The sample cross-covariance function for a nonnegative time lag $\tau$ between variables $q_\kappa$ and $q_\eta$ can be estimated from a time series from the normalized sum of the products $q'_\kappa(t)\,q'_\eta(t+\tau)$ over the times $t$ from $t_0$ to $t_N - \tau$ (the primes indicate that anomalies from the mean are being considered). For negative lags, the sum goes from $t_0 - \tau$ to $t_N$.

Covariance functions as given by an EME arise from spatial contributions. This paper focuses on three-dimensional systems. Let the indices of the cubic cells be $i_1$, $i_2$, and $i_3$, with $(i_1, i_2, i_3) \equiv (\mathbf{i})$. The contribution to the total estimate of the cross-covariance function for variables $q_\kappa$ and $q_\eta$ at time lag $\tau$ from an EME run starting with unit probability in cell $(i^0_1, i^0_2, i^0_3) \equiv (\mathbf{i}^0)$ is

$$[s_{q_\kappa q_\eta}(\tau)]_{\mathbf{i}^0} = \tilde{q}'_\kappa(\mathbf{i}^0)\,\rho_{\mathbf{i}^0}\,Dq^3 \sum_{\mathbf{i}} \tilde{q}'_\eta(\mathbf{i})\; g(\mathbf{i}^0, t = 0;\, \mathbf{i}, t = \tau)\,Dq^3,$$

where lags $\tau$ are nonnegative, $\tilde{q}'_\lambda$ refers to the anomaly of the $\lambda$th coordinate of the center of the particular cell (recall Fig. 1), $\rho_{\mathbf{i}^0}$ is the observed density, recall Eq. (4), in the starting cell, and $g$ is the consequent conditional probability to be calculated from the EME run with $g(\mathbf{i}^0, t = 0) = 1/Dq^3$ and $g = 0$ elsewhere, so that

$$[s_{q_\kappa q_\eta}(\tau)]_{\mathbf{i}^0} = \tilde{q}'_\kappa(\mathbf{i}^0)\,\rho_{\mathbf{i}^0}\,Dq^3\; \hat{q}'_\eta(\mathbf{i}^0, t = 0;\, \mathbf{i}, t = \tau).$$

Here $\hat{q}'_\eta(\mathbf{i}^0, t = 0;\, \mathbf{i}, t = \tau) = \sum_{\mathbf{i}} \tilde{q}'_\eta\, g(\mathbf{i}^0, t = 0;\, \mathbf{i}, t = \tau)\,Dq^3$ is the anomaly of the mean $q_\eta$ coordinate at lag $\tau$ for the PDF run that had started with unit probability in ($\mathbf{i}^0$). The contributions sum up to the total estimate of the covariance function, $s_{\mathrm{EME}}[q_\kappa(t), q_\eta(t + \tau)] = \sum_{\mathbf{i}^0} [s_{q_\kappa q_\eta}(\tau)]_{\mathbf{i}^0}$. Covariances for negative lags may be obtained from $s_{\mathrm{EME}}[q_\kappa(t), q_\eta(t + \tau)] = s_{\mathrm{EME}}[q_\eta(t), q_\kappa(t - \tau)]$.
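The summation over starting cells can be sketched as follows. This is an illustrative Python version under stated assumptions: cells are addressed by a single flat index (a three-dimensional index could be flattened with `np.ravel_multi_index`), `p[i]` stands for the probability of cell `i` (the observed density times the cell volume), lags are counted in EME time steps, and the function name `eme_covariance` is hypothetical.

```python
import numpy as np

def eme_covariance(W, p, qk, qe, max_lag):
    """Cross-covariance s[q_k(t), q_e(t + n*Dt)] for n = 0..max_lag.

    W      : transition matrix, W[i_new, i_old].
    p      : probability of each cell (sums to one).
    qk, qe : cell-center coordinates of the two variables, per cell.
    """
    p = np.asarray(p, dtype=float)
    qk = np.asarray(qk, dtype=float)
    qe = np.asarray(qe, dtype=float)
    ak = qk - p @ qk                 # anomalies of the cell centers
    ae = qe - p @ qe
    # G[:, i0] holds the conditional cell probabilities of a run that
    # started with unit probability in cell i0.
    G = np.eye(len(p))
    s = np.empty(max_lag + 1)
    for n in range(max_lag + 1):
        mean_e = ae @ G              # mean anomaly of q_e, per starting cell
        s[n] = np.sum(p * ak * mean_e)   # sum of the contributions
        G = W @ G                    # advance every run by one time step
    return s

if __name__ == "__main__":
    W = np.array([[0.9, 0.1], [0.1, 0.9]])
    q = np.array([-1.0, 1.0])
    print(eme_covariance(W, [0.5, 0.5], q, q, 2))
```

For the symmetric two-cell chain in the usage example, the autocovariance decays geometrically with the second eigenvalue of the matrix, 0.8 per step.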

## 3. The Lorenz model with additional white noise forcing

The data are generated with the Lorenz (1963) model with an additional white noise forcing *ξ*, Eqs. (7), where *x*, *y*, *z*, and time are nondimensional; the Prandtl number (Pr), *r*, and *b* are parameters; and *α* is a constant. A Runge–Kutta method of the fourth order with time step *Dt*_{RK} = 0.001 is used.^{6} New values for *ξ* are continuously generated throughout the numerical integration of the model equations. The *h*th value that *ξ* assumes is *ξ*^{h} = *γ*^{h}/√(*Dt*_{RK}), where *γ*^{h} is the *h*th value output by a random number generator with Gaussian deviate, 〈*γ*^{h}〉 = 0, and 〈*γ*^{h}*γ*^{j}〉 = *δ*_{hj} [see the Box–Muller algorithm in Press et al. (1999)]. Hence 〈*ξ*^{h}〉 = 0 and 〈*ξ*^{h}*ξ*^{j}〉 = *δ*_{hj}/*Dt*_{RK}, which is the numerical approximation to a Dirac function. The evolution of a cloud of points is not altered by reducing *Dt*_{RK} by a factor of 10. Time series are then created from the integration run by sampling variable values every time interval *Dt* ≫ *Dt*_{RK}; *Dt* represents the sampling interval of an atmospheric observation and becomes the time step of the EME.
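The integration scheme can be sketched in Python as follows. Several details are assumptions of this sketch rather than statements about the original model: the deterministic part is taken to be the standard Lorenz (1963) system, the noise term *αξ* is added to all three equations with independent values, the noise value is held fixed within each Runge–Kutta step, and NumPy's generator is used in place of a Box–Muller routine. The function names are hypothetical.

```python
import numpy as np

PR, R, B = 10.0, 28.0, 8.0 / 3.0      # standard parameter set

def lorenz(q):
    """Deterministic Lorenz (1963) tendencies."""
    x, y, z = q
    return np.array([PR * (y - x), R * x - y - x * z, x * y - B * z])

def integrate(q0, n_steps, dt_rk=0.001, alpha=2.5, sample_every=20, seed=0):
    """Fourth-order Runge-Kutta integration with additive white noise.

    Returns the state sampled every `sample_every` steps, i.e. with
    sampling interval Dt = sample_every * dt_rk.
    ASSUMPTION: noise enters all three equations independently.
    """
    rng = np.random.default_rng(seed)
    q = np.array(q0, dtype=float)
    samples = []
    for h in range(n_steps):
        # xi^h = gamma^h / sqrt(Dt_RK), with gamma^h ~ N(0, 1)
        xi = rng.standard_normal(3) / np.sqrt(dt_rk)

        def tendency(state):
            # the forcing is frozen during the step
            return lorenz(state) + alpha * xi

        k1 = tendency(q)
        k2 = tendency(q + 0.5 * dt_rk * k1)
        k3 = tendency(q + 0.5 * dt_rk * k2)
        k4 = tendency(q + dt_rk * k3)
        q = q + dt_rk * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        if (h + 1) % sample_every == 0:
            samples.append(q.copy())
    return np.array(samples)

if __name__ == "__main__":
    series = integrate([1.0, 1.0, 20.0], 2000)   # 2 time units, Dt = 0.020
    print(series.shape)
```

Because the forcing is constant over a step, each step adds a displacement of *α* *γ*^{h} √(*Dt*_{RK}) per component, so the accumulated noise variance grows linearly in time as expected for white noise.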

The Lorenz model displays chaotic dynamics for a suitable choice of the parameters. We choose the standard parameter set Pr = 10, *r* = 28, and *b* = 8/3 (e.g., Lorenz 1963; Palmer 1993; Kaplan and Glass 1995). The state vector **q** = (*q*_{1}, *q*_{2}, *q*_{3}) = (*x*, *y*, *z*) evolves around the famous Lorenz attractor with its two butterfly-wing-shaped lobes. A trajectory far from the Lorenz attractor rapidly approaches the attractor, whereas trajectories near the attractor show sensitive dependence on initial conditions.

The Lorenz model has often been taken as a paradigm of large-scale atmospheric circulation (e.g., Palmer 1993). Though only three-dimensional, it reflects many of the properties of the full climate system (Thuburn 2005). It is a shortcoming of the model in comparison to the atmosphere that its diffusivity in phase space is fairly low. White noise is inserted partly for this reason. Moreover, the divergence of the Lorenz model without stochastic forcing is, with **∇** · (*ẋ*, *ẏ*, *ż*) = −Pr − 1 − *b*, negative and constant. This means that the phase space occupied by the trajectories is shrinking continuously onto the Lorenz attractor. The additional white noise forcing acts against this frictional contraction, leading to a stationary stochastic system (see von Storch and Zwiers 1999, 1–2), as needed for EMEs with time-constant transition coefficients. A noise amplitude *α* = 2.5 is chosen, such that a modest diffusion of the states is obtained (see Figs. 5a,b), which does not drastically alter the character and shape of the trajectories (see Fig. 2).
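The value of the divergence can be checked with a short worked derivation, assuming that the deterministic part of the model is the standard Lorenz (1963) system $\dot{x} = \mathrm{Pr}\,(y - x)$, $\dot{y} = r x - y - x z$, $\dot{z} = x y - b z$:

$$\nabla \cdot (\dot{x}, \dot{y}, \dot{z}) = \frac{\partial}{\partial x}\big[\mathrm{Pr}\,(y - x)\big] + \frac{\partial}{\partial y}\big[r x - y - x z\big] + \frac{\partial}{\partial z}\big[x y - b z\big] = -\mathrm{Pr} - 1 - b.$$

With the standard parameter set this gives $-10 - 1 - 8/3 = -41/3 \approx -13.7$, so that a phase space volume $V$ carried by the deterministic flow contracts as $V(t) = V(0)\,e^{-41 t/3}$. An additive forcing that does not depend on the state leaves this divergence unchanged.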

Three time scales are associated with the model. The one describing the evolution of a trajectory about the (weakly) unstable fixed point at the center of each attractor wing is *t *_{win} ≈ 0.7. The residence time, *t *_{res}, in a wing varies approximately between 1 and 10 time units. These time scales can be seen in Fig. 2 (solid line), where the evolution of the *x* component from an arbitrary initial state is shown as a function of time. The dependence on initial conditions is revealed by the evolution of the dashed line, where the white noise forcing is identical but the initial conditions are slightly different from those of the solid line. There exists also a diffusive time scale, *t *_{dif} ≈ *L*/*α*, due to the stochastic forcing (*L* ≈ 20 is the diameter of a wing), with *t *_{win} < *t *_{dif}.

Both the choice of the Lorenz model and that of a rather weak white noise forcing are quite challenging. The attractor has a complex shape and a relatively fine grid size is required in order to resolve its wings. The demands on the quality of the EME would have been less stringent if a system with a more trivial attractor had been chosen or, as often seen in literature, if a stronger noise had been used [as in e.g., Gradišek et al. (2000) and Thuburn (2005), where Fokker–Planck equations are considered]. With weak white noise, a cloud of points smears quite slowly. In contrast, as *α* grows, the diffusion of a cloud of points due to the noise term becomes comparable to the numerical diffusion acting on the PDF forecast by an EME derived from a long time series (see section 4), leading to better predictive skill when comparing the two.

## 4. Results

The impact of the choice of the main numerical parameters on the quality of the EME is discussed in the following subsections. The transition coefficients of each EME are estimated from a single time series whose length is indicated as Δ*t *.

### a. Grid size

The choice of the grid size depends on the problem at hand. At best, a few guidelines can be formulated. The data of a “long” time series beginning near the attractor are included in a parallelepiped with a volume of about 50 · 60 · 50. This volume is the domain of solution. The grid size is set initially depending on the features that have to be resolved. Figure 3 shows how the observed state density for a time series of length Δ*t* = 51200 varies depending on the grid size chosen to partition the phase space. With a grid size *Dq* = 5.00, the above parallelepiped is partitioned into 1200 cells (see Fig. 3a, where −30 ≤ *y* < 30). Almost 250 of these cells intersect the attractor. A relatively fine grid size is needed because of the complicated structure of the attractor. As may be seen in Fig. 3a, the choice of *Dq* = 5.00 does not resolve the “holes” in each of the butterfly wings. They are resolved with *Dq* = 2.50 (Fig. 3b) and the overall picture contains more details when *Dq* = 1.25 (Fig. 3c). However, the required computer cost grows dramatically as the grid size is reduced. With *Dq* = 2.50, approximately 1000 cells intersect the attractor, and with *Dq* = 1.25 almost 4750 cells. Moreover, the number of cells representing the domain of the solution grows as *Dq* decreases and, since transitions to and from all cells have to be considered, the size of the array representing the transition matrix, 𝗪, grows with the square of the number of cells. The time needed for computing correlation functions increases with approximately the eighth power of the inverse of the grid size for this three-dimensional case. Therefore the grid size is set at *Dq* = 2.50.
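The cell counts quoted above follow from simple arithmetic, which the short Python sketch below reproduces for the whole domain of solution. The function name `grid_cost` is hypothetical, and storing 𝗪 as a dense array over all cells of the parallelepiped is an assumption of this illustration; restricting the array to the cells actually visited would reduce the numbers considerably.

```python
def grid_cost(dq, extent=(50.0, 60.0, 50.0)):
    """Number of cells in the domain and number of entries of a dense
    transition matrix for cubic cells of edge length dq."""
    n_cells = 1
    for length in extent:
        n_cells *= round(length / dq)
    return n_cells, n_cells ** 2

if __name__ == "__main__":
    for dq in (5.0, 2.5, 1.25):
        cells, entries = grid_cost(dq)
        print(f"Dq = {dq}: {cells} cells, {entries} matrix entries")
```

Halving the grid size multiplies the number of cells by 8 and the number of conceivable transitions by 64.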

Figure 4 reports the ratios *R*_{w} for grid sizes *Dq* = 2.50 and *Dq* = 1.25 as a function of the length of the time series used to estimate 𝗪 for an arbitrary time resolution; the time series must be about eight times longer for the *R*_{w} values for *Dq* = 1.25 to be as low as those for *Dq* = 2.50. While deriving an EME from an observational time series, the grid size might have to be adjusted according to the ratio *R*_{w} and to the correlation functions delivered by the EME. Correlation functions as delivered by EMEs confirm the choice *Dq* = 2.50 in terms of quality and of needed computer resources also with respect to other grid sizes (not shown).

### b. Time series length

The length of the time series affects the value of the ratio *R*_{w}, as may be seen in Fig. 4. The values of *R*_{w} in Fig. 4 are quite high for short time series even for the chosen grid size, *Dq* = 2.50. For Δ*t* < 200, for instance, *R*_{w} > 0.5, which means that the statistical significance of the transition coefficients’ estimates is low.

By affecting the estimate of the transition coefficients, the time series length can also have an impact on the PDF forecasts. As a case study, PDF forecasts from a particular initial condition, which are given by EMEs derived from time series of different lengths, are compared to the evolution of an ensemble of points. The latter is obtained by integration of Eqs. (7) and is shown for *t* = 0, 0.2, 0.4, 0.6 in Fig. 5a and for *t* = 0.8 in Fig. 5b. These points are initially located in a cell of the phase space discretized with *Dq* = 2.50. This cell includes part of the attractor. Palmer (1993) pointed out that there are portions of the Lorenz attractor where trajectories depart fairly slowly. In other cases, an ensemble of points reaches the splitting region of the attractor and adjacent trajectories diverge toward the two different wings, so that there are regions of the attractor that are relatively more sensitive to initial conditions. The cloud in Fig. 5a moves partly through the splitting region between *t* = 0.3 and *t* = 0.4, and only very few points are located on the left-hand side of the attractor at *t* = 0.8. Losses in prediction skill are expected if the PDF evolves occupying the splitting region imprecisely. Initial conditions as in Fig. 4 of Palmer (1993), which either evolve far away or directly through the splitting region, represent an easier task for the EME. The one in Fig. 5a is challenging and shows how the estimation of 𝗪 from a very short time series may lead to very poor forecasts. The forecasts are considered at time *t* = 0.8. This time is close to the minimum time generally needed for nearby initial conditions to diverge toward the two wings of the attractor (see Palmer 1993). The reference density of ensemble members, ***μ***, that is, the relative frequency of points per cell integrated along the *y* axis, is shown for *t* = 0.8 in Fig. 5c. To compare the PDF with ***μ***, the latter is normalized accordingly: Σ_{i} *μ*_{i} *Dq*^{3} = 1. In Fig. 5c there is a well-marked head on the right wing of the attractor and a long thinly populated tail. The PDF forecasts in Fig. 6 are compared to Fig. 5c. These forecasts are given by EMEs derived from time series of different lengths. The time series are obtained by extending the integration shown by the solid line in Fig. 2. No forecast is possible if the time series is shorter than 50 time units since no observation falls into the starting cell. The forecast shown in Fig. 6a is of poor quality. The forecasts shown in Figs. 6d,e,f are of better quality and almost identical. In these forecasts, the PDF is higher on the right wing of the attractor, just as the density of ensemble members in Fig. 5c; the PDF on the left wing is a result of the numerical diffusion (see below), which leads to nonzero PDF for time *t* = 0.4 (see Fig. 5a) on a wider region, a part of which evolves to the left attractor wing.

The quality of a PDF forecast is quantified by a skill score *S*. At any given time, *S* is a weighted measure of the mean square deviation between ***μ*** (Fig. 5c) and **f** (Fig. 6), the weight being ***μ*** (Gardiner 1983, p. 40). Moreover, *S* is normalized with the outcome in the case where the two patterns are fully apart (*S* = 1). A perfect forecast scores zero. Values of *S* for the PDF forecasts for *t* = 0.8 shown in Fig. 6 (and others) are shown in Fig. 7 as a function of the time series length. A value *S* > 1 may occur if the PDF largely overestimates the point density in some cells. This can be the case when very short time series are used. The scarcity of data implies that the small number of observed transitions carry higher PDF values. This could also result in a better skill measure than that obtained with a longer time series, as is the case for the lower *S* values corresponding to Δ*t* = 100 and Δ*t* = 200 (see also the forecast corresponding to Δ*t* = 200 in Fig. 6b). For Δ*t* = 50 (Fig. 6a) however, higher PDF values are erroneously carried to the left side of the attractor. It is interesting that the skill hardly improves for time series lengths Δ*t* > 10^{3}, which is a threshold length for the chosen grid size *Dq* = 2.50 and for the time resolution *Dt* = 0.020. For these time series lengths, *R*_{w} is of order 0.1 (or lower). If the time series is obtained from observations, no skill may be computed as just one realization occurs in reality; however, *R*_{w} may still be calculated.
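One form of such a score that matches the verbal description (deviation weighted by the reference density, equal to zero for a perfect forecast and to one for fully separated patterns) is sketched below in Python. The exact expression used for the figures is not reproduced here, so the formula, the function name `skill`, and the flat array layout are assumptions of this illustration only.

```python
import numpy as np

def skill(mu, f):
    """Weighted mean square deviation between a reference density mu and
    a forecast density f, both given per cell on the same grid.

    ASSUMED form: sum(mu * (mu - f)**2) / sum(mu**3). When the two
    patterns do not overlap, f = 0 wherever mu > 0 and the ratio is 1.
    """
    mu = np.asarray(mu, dtype=float)
    f = np.asarray(f, dtype=float)
    return np.sum(mu * (mu - f) ** 2) / np.sum(mu ** 3)

if __name__ == "__main__":
    mu = np.array([0.2, 0.8, 0.0])
    print(skill(mu, mu))                    # perfect forecast
    print(skill(mu, [0.0, 0.0, 1.0]))       # fully separated patterns
    print(skill(mu, [1.0, 0.0, 0.0]))       # strong local overestimate
```

With this form, a forecast that concentrates too much probability in a thinly populated cell of the reference density can score above one, in line with the remark on very short time series.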

The convergence of the estimate of 𝗪 with increasing time series length is assessed with a weighted coefficient, *C*_{w}, which measures the difference between the transition matrices 𝗪(*l*) and 𝗪(*s*) estimated on the basis of the longer and the shorter of two time series, respectively. The coefficient *C*_{w} is shown in Fig. 8 as a function of the longer time series length when this has twice the length of the shorter. As the time series length goes to infinity, *C*_{w} tends to zero, suggesting that 𝗪 converges. The weight *ρ*_{i}(*l*) reduces the effect of outliers in very long time series and at the borders of the attractor, which influence parts of 𝗪 that are not relevant for the evolution of the PDF. (Calculations carried out without weighting do not lead to convergence.) The results in Fig. 8 confirm that the extension of time series to extreme lengths does not produce a noticeable improvement in the estimate of 𝗪. As an effect of using a finer grid size, *Dq* = 1.25, corresponding *C*_{w} values in Fig. 8 are higher; the same results from the use of a coarser time resolution (not shown).

### c. Time resolution

The transition coefficients, and consequently their estimate after Eq. (5), depend on the time step. On one hand, a fine time resolution is desirable in order to improve the statistical significance of the transition coefficients. The ratio *R*_{w} generally decreases with increasing Σ_{i} *N*_{i}, as is the case for a long time series and a fine time resolution. On the other hand, the computing time increases by reducing the time step and, since the time scales of the investigated phenomena are usually known, *Dt* should not be unnecessarily fine. The time resolution also sets a limit to the highest frequency of the system that an EME will be able to reproduce.

Figure 9 shows PDF forecasts obtained with different time steps. Figure 9b shows for reference the forecast obtained with a time series of length Δ*t * = 3200 and time resolution *Dt* = 0.020 (same as Fig. 6d), that is, longer than the threshold length discussed in section 4b. Figure 9a shows the PDF forecast delivered by the EME when the sampling frequency of the same time series as for Fig. 9b is increased by a factor of 5, that is, *Dt* = 0.004. In contrast, for Fig. 9c, the time resolution is coarser than in Fig. 9b by a factor of 5, *Dt* = 0.100. The best forecast of the three is Fig. 9c, where the PDF, integrated along the *y* axis, exceeds a value of 0.0100 on the right wing of the attractor. Surprisingly, this forecast is the one obtained with the coarsest time resolution, *Dt* = 0.100. The skill scores *S* computed for the forecasts in Fig. 9 confirm this. For the finest time resolution the skill is the worst at 0.83 (Fig. 9a), and improves to 0.75 with a time resolution *Dt* = 0.020 (Fig. 9b); the skill is best at 0.48 for the coarsest time resolution (Fig. 9c).

The two drawings in Fig. 10 represent a simple example, which helps one understand the surprisingly rapid smearing of the PDF observed in connection with a finer time resolution. In Fig. 10b, the time step is 3 times larger than in Fig. 10a. The PDF values written in Fig. 10 are predicted by this illustrative EME on the basis of the few observations available. It is easy to follow the evolution of the PDF and to realize that the PDF smears much faster in the case with shorter *Dt* (Fig. 10a). Given good estimates of the transition coefficients, the EME makes its best forecasts, whatever the initial condition, over one single time step no matter how large *Dt*. (The PDF forecast is almost perfect if the cloud of states initially coincides with a cell.) For the previous case study (Fig. 9), *Dt* = 0.8 is the best time resolution since only one time step is needed for prediction. After any time step, the PDF is spread evenly over each cell. This spreading is unavoidable and implies diffusion. This is a specific example of numerical diffusion. While Fig. 10 is illustrative, exact statements about the numerical spreading of PDFs can be made for simple systems; in the appendix, an explicit solution to the problem of numerical spreading is given for an advective case.
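The effect can be reproduced with a toy one-dimensional advective EME, sketched below in Python. The setup is an assumption made for illustration and is not the example of Fig. 10 or the solution given in the appendix: states move at constant speed along a periodic axis, a state that starts uniformly distributed within a cell and is displaced by a fraction *r* of a cell ends up in the next cell with probability *r*, and the function names are hypothetical.

```python
import numpy as np

def advect_pdf(n_cells, start, shift, n_steps):
    """Cell probabilities after n_steps of a periodic 1D advective EME.

    shift : displacement per time step, in units of the cell size.
    The integer part moves the whole PDF; the fractional part r splits
    it between two neighboring cells (weights 1 - r and r).
    """
    m = int(np.floor(shift))
    r = shift - m
    p = np.zeros(n_cells)
    p[start] = 1.0                         # unit probability in one cell
    for _ in range(n_steps):
        p = (1.0 - r) * np.roll(p, m) + r * np.roll(p, m + 1)
    return p

def spread(p):
    """Variance of the cell index, in squared cell sizes."""
    idx = np.arange(len(p))
    mean = p @ idx
    return p @ (idx - mean) ** 2

if __name__ == "__main__":
    fine = advect_pdf(20, 5, 0.25, 4)      # four short steps
    coarse = advect_pdf(20, 5, 1.0, 1)     # one step, four times longer
    print(spread(fine), spread(coarse))
```

Both runs cover the same total displacement of one cell, but only the run with the short time step smears the PDF: each step adds a variance of *r*(1 − *r*) squared cell sizes, which vanishes when the displacement per step is a whole number of cells.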

Figure 11 shows some correlation functions^{7} estimated directly from the time series (solid lines) and delivered by the EMEs that gave the forecasts in Fig. 9. The transition coefficients of these EMEs are estimated from time series differing only in their sampling interval whereas the starting point, the evolution, and the length are the same. Figure 11 shows that all EMEs approximate the decay of the correlation functions extremely well. The EME based on the time series with the coarsest time resolution does the best job in reproducing the sample correlations, confirming the results seen in the previous case study. This finding contrasts with standard results where the numerical diffusion is reduced for smaller time steps (Mesinger and Arakawa 1976). Given the low diffusion in the time series due to the model’s chaotic nature and due to white noise forcing, numerical diffusion prevails unless a much finer grid size is used at which the diffusion in the data dominates the advective transport (see Durran 1998, p. 139). As the time step grows, the ratio between the scale of diffusion in the data and the grid size increases, thus weakening the impact of numerical diffusion. As can be seen in Fig. 11, the decorrelation time for the time series in question is much longer than the time resolutions considered (*Dt* = 0.004, 0.020, and 0.100). This makes numerical diffusion the sole explanation for the result that the PDF forecasts delivered by the EME improve by increasing the time step. A reduction of numerical diffusion can also be sought by reducing the grid size. However, besides implying a growth of the ratio *R*_{w}, a grid size reduction also causes a rapid increase of the requirements for the computing resources.

The encouraging results in Figs. 9c and 11 (dashed–dotted lines) have important practical implications. With *Dq* = 2.50, about *ζ* = 10^{3} cells intersect the region of the phase space occupied by the observations and up to *ζ*^{2} = 10^{6} transitions are conceivable. Yet the EME was derived from a time series of length Δ*t* = 3200 and time resolution *Dt* = 0.100, that is, of only 32 × 10^{3} data points. For the EME in question, *R*_{w} = 0.29. Transitions occur mainly toward the cells along the main direction of movement (not backward or in any transverse direction), and in this case merely 3.2 × 10^{3} transitions can actually take place. For an atmospheric application where the region occupied by the observations has a much simpler structure than the attractor of Eqs. (7), the grid size can be initially set to a value that gives a few hundred cells. The grid size can then be adjusted depending on the ratio *R*_{w} for the considered time step. The adherence of the correlation functions may also suggest a change of the numerical parameters. This approach is adopted in Part II with positive results.

The EME used here is discrete in time and phase space. The Fokker–Planck equation, on the other hand, is a partial differential equation that can also be used to predict the PDF at least under favorable circumstances (Zwanzig 2001). In practice, the Fokker–Planck equation must be solved numerically, whereby it is transformed into a finite difference equation (e.g., Thuburn 2005). The drift and diffusion terms of a Fokker–Planck equation may also be estimated from data (Siegert et al. 1998). This technique was applied, for example, by Egger and Jònsson (2002) to meteorological observations in the Icelandic region. An empirical discrete Fokker–Planck equation can be seen as a particular case of an EME. Master equations are much more general than Fokker–Planck equations (Zwanzig 2001). In a discretized version of the phase space, the coefficients within the parentheses in Eq. (8) can also be estimated from a time series of the variables. Gradišek et al. (2000) applied this technique to time series of various origins, among these some stemming from the Lorenz (1963) model with additional stochastic forcings. They found that the maximum acceptable time step needed for the estimates of the drift and diffusion coefficients to converge is shorter than the time step required for the integration of the corresponding differential equations. Moreover, the integration of the Fokker–Planck equation is constrained by the Courant–Friedrichs–Lewy criterion and its numerical implementation normally involves only neighboring cells. Large time steps can be used only with the EME, which considers transitions to any cell in phase space. Within a time *Dt* = 0.100, for instance, the state vector may jump to a cell separated from the previous one by up to 12 cells of grid size *Dq* = 2.50.

### d. Dimension of the EME

In a more realistic situation, the EME does not contain as many variables as the investigated system. In practice, EMEs with, say, three variables are applied to systems with many variables. To assess the effect of considering a reduced set of variables, a two-dimensional EME is derived from a projection on the (*x*, *z*) plane of the same time series that led to the prediction in Fig. 9c. In this case, variable *y* does not appear in the EME. The left panel of Fig. 12 shows the forecast delivered by such an EME starting from the same initial condition as in Fig. 9c. The partition of the *x* and *z* axes is unchanged. The value of *R _{w}* is now 0.19, whereas *R _{w}* was 0.29 in the three-dimensional case. Such a reduction is not surprising, since the same number of data points as in the three-dimensional case is now used to estimate a much lower number of transitions. The forecast in the left panel of Fig. 12, where the PDF is about equally distributed on either attractor wing, is clearly worse than the corresponding three-dimensional one (Fig. 9c). The evolution of the autocorrelation function of the *x* component, *r _{xx}*, is shown in the right panel of Fig. 12; beyond about 0.3 time units it is not as good as the one delivered by the three-dimensional EME. In the case of an observational time series, the introduction of another variable should be considered.^{8} While deriving EMEs from atmospheric datasets, however, it is practically impossible to consider a complete set of variables. In the example of Fig. 12, the low value of *R _{w}* might have induced optimism, yet this EME does not do a good job of reproducing the dynamics of the studied system. An *R _{w}* value below, say, 0.4 suggests that the amount of data available is adequate for the number of transitions to be estimated. For *R _{w}* values between 0.4 and 0.6, much caution in the study of the EME is recommended if the user decides not to increase the grid size. Values above 0.6 should lead to the use of a coarser grid size. Whatever the *R _{w}* value, caution is recommended in evaluating results obtained from an EME if its correlation functions suggest that the behavior of the system is not adequately reproduced. An atmospheric application requiring a five-dimensional rather than a three-dimensional EME is discussed in section 3b of Part II.

## 5. Conclusions

EMEs are constructed directly from data and provide a model of the phase space dynamics of a system. They may capture nonlinear behavior. A study of the numerical properties of EMEs has been the object of this part of the paper. This is the first time that grid size, time series length, time step, and phase space dimensionality have been studied systematically. The time series needed for the numerical study have been generated by numerical integration of the equations of the Lorenz model with additional white noise forcing. Thus, time series with the desired characteristics could be generated easily. The dynamical model chosen is quite challenging. The attractor of this model has a fairly complicated structure, so that a fine grid size is needed in order for the EME to replicate the motion in phase space. Moreover, the model’s diffusivity due to its chaotic nature and due to the white noise forcing is fairly low, making numerical diffusion a dominant factor. The significance of the estimates of the transition coefficients of the EME has been assessed in terms of a weighted average ratio, *R _{w}*, between the half-width of the confidence intervals of the transition coefficient estimates and the estimates themselves. The adherence of correlation functions as delivered by the EME to those estimated from the data has been used as a test of how well the EME reproduces the dynamics of the system.

We find that:

- The grid size choice is a compromise between the desired resolution in phase space on the one hand and the available data and computer resources on the other. A moderately fine grid size has delivered very encouraging results.
- The estimate of the transition coefficients improves with growing time series length. Moreover, we find that, for a given grid size and time resolution, there is a threshold time series length beyond which the forecast skill does not improve; this threshold length appears to be moderate. For the Lorenz model with additional white noise forcing, we find a threshold of order 10^{3} time units. An approach considering the transition coefficient matrix 𝗪 as a multidimensional vector shows that 𝗪 converges with growing time series length.
- Surprisingly, EMEs derived from time series with coarser time resolutions show better forecast skill and deliver better correlation functions. This is due to a decrease in the numerical diffusion acting on the PDF forecasts as the time step grows. The best forecasts are obtained with a single time step.
- All in all, encouraging results are obtained considering partitions of the attractor into about 1000 cells with a time series consisting of only 3.2 × 10^{4} data points. This result supports the applicability of the method to atmospheric time series. For a three-dimensional atmospheric application, we recommend beginning with partitions of the “data cloud” into several hundred cells. The numerical parameters can be adjusted according to the corresponding value of the ratio *R _{w}* and, finally, to the quality of the correlation functions delivered by the EME.
- The example chosen is unusual in that the EME has the same number of variables as the Lorenz model. An EME derived for time series of only the *x* and *z* components leads, despite a low *R _{w}* value, to poor PDF predictions. The adherence of the correlation functions deteriorates with respect to the three-dimensional case. This result demonstrates the importance of the number of variables and calls for caution in cases when the correlation functions delivered by the EME strongly disagree with the ones obtained from the data. This disagreement can occur particularly when high dimensional systems are studied, which suggests increasing the dimension of the EME where feasible.

In Part II, EMEs are derived from time series obtained from the ERA-40 re-analysis (Uppala et al. 2005) and observations. These time series are limited in time and hence the existence of a threshold length cannot be tested. However, the remaining results of this numerical study, including the surprising role of time resolution, are confirmed.

## Acknowledgments

Financial support was provided by the German Ministry of Education and Research and the German Aerospace Center within KLIMESTO, a project of the German Climate Research Program, Contract 01LD0033. See Part II for more thorough acknowledgments.

## REFERENCES

Anderson, T. W., and L. A. Goodman, 1957: Statistical inference about Markov chains. *Ann. Math. Stat.*, **28**, 89–110.

Baldwin, M. P., and Coauthors, 2001: The quasi-biennial oscillation. *Rev. Geophys.*, **39**, 179–229.

Cencini, M., G. Lacorata, A. Vulpiani, and E. Zambianchi, 1999: Mixing in a meandering jet: A Markovian approximation. *J. Phys. Oceanogr.*, **29**, 2578–2594.

Crommelin, D. T., 2004: Observed nondiffusive dynamics in large-scale atmospheric flow. *J. Atmos. Sci.*, **61**, 2384–2396.

Dall’Amico, M., 2005: Data-based master equations for the stratosphere. Ph.D. thesis, Ludwig-Maximilians-Universität of Munich, Germany, 71 pp. [Available online at http://edoc.ub.uni-muenchen.de/archive/00003890/.]

Dall’Amico, M., and J. Egger, 2007: Empirical master equations. Part II: Application to stratospheric QBO, solar cycle, and northern annular mode. *J. Atmos. Sci.*, **64**, 2996–3015.

Durran, D. R., 1998: *Numerical Methods for Wave Equations in Geophysical Fluid Dynamics*. Springer, 465 pp.

Egger, J., 2001: Master equations for climatic parameter sets. *Climate Dyn.*, **17**, 169–177.

Egger, J., 2002: Master equations for Himalayan valley winds. *Stochastic Dyn.*, **2**, 381–394.

Egger, J., and T. Jònsson, 2002: Dynamic models for Icelandic meteorological data sets. *Tellus*, **54A**, 1–13.

Egger, J., and M. Dall’Amico, 2007: Empirical master equations: Numerics. *Meteor. Z.*, **16**, 139–147.

Fraedrich, K., 1988: El Niño/Southern Oscillation predictability. *Mon. Wea. Rev.*, **116**, 1001–1012.

Gardiner, C. W., 1983: *Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences*. Springer, 442 pp.

Gradišek, J., S. Siegert, R. Friedrich, and I. Grabec, 2000: Analysis of time series from stochastic processes. *Phys. Rev. E*, **62**, 3146–3155.

Kaplan, D., and L. Glass, 1995: *Understanding Nonlinear Dynamics*. Springer, 420 pp.

Kloeden, P. E., E. Platen, and H. Schurz, 1997: *Numerical Solution of SDE through Computer Experiments*. 2d ed. Springer-Verlag, 292 pp.

Labitzke, K. G., and H. van Loon, 1999: *The Stratosphere, Phenomena, History, and Relevance*. Springer, 179 pp.

Levy, P., 1948: *Processus stochastiques et mouvement brownien*. Gauthier-Villars, 365 pp.

Lorenz, E. N., 1963: Deterministic nonperiodic flow. *J. Atmos. Sci.*, **20**, 130–141.

Mesinger, F., and A. Arakawa, 1976: Numerical methods used in atmospheric models. Global Atmospheric Research Programme (GARP) Publication Series 17, World Meteorological Organization, 64 pp.

Nicolis, C., 1990: Chaotic dynamics, Markov processes and climate predictability. *Tellus*, **42A**, 401–412.

Nicolis, C., W. Ebeling, and C. Baraldi, 1997: Markov processes, dynamic entropies and the statistical prediction of mesoscale weather regimes. *Tellus*, **49A**, 108–118.

Palmer, T. N., 1993: Extended range atmospheric prediction and the Lorenz model. *Bull. Amer. Meteor. Soc.*, **74**, 49–65.

Pasmanter, R. A., and A. Timmermann, 2002: Cyclic Markov chains with an application to an intermediate ENSO model. *Nonlinear Proc. Geophys.*, **9**, 1–14.

Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, 1999: *Numerical Recipes in Fortran 77. The Art of Scientific Computing. Volume 1 of Fortran Numerical Recipes*. 2d ed. Cambridge University Press, 933 pp.

Siegert, S., R. Friedrich, and J. Peinke, 1998: Analysis of data sets of stochastic systems. *Phys. Lett. A*, **243**, 275–280.

Spekat, A., B. Heller-Schulze, and M. Lutz, 1983: Über Großwetter und Markov-Ketten (“Großwetter” circulation analysed by means of Markov chains). *Meteor. Rundsch.*, **36**, 243–248.

Thompson, D. W. J., and J. M. Wallace, 2000: Annular modes in the extratropical circulation. Part I: Month-to-month variability. *J. Climate*, **13**, 1000–1016.

Thuburn, J., 2005: Climate sensitivities via a Fokker–Planck adjoint approach. *Quart. J. Roy. Meteor. Soc.*, **131**, 73–92.

Uppala, S. M., and Coauthors, 2005: The ERA-40 re-analysis. *Quart. J. Roy. Meteor. Soc.*, **131**, 2961–3012.

Vautard, R., K. C. Mo, and M. Ghil, 1990: Statistical significance test for transition matrices of atmospheric Markov chains. *J. Atmos. Sci.*, **47**, 1926–1931.

von Storch, H., and F. W. Zwiers, 1999: *Statistical Analysis in Climate Research*. Cambridge University Press, 484 pp.

Zwanzig, R., 2001: *Nonequilibrium Statistical Mechanics*. Oxford University Press, 222 pp.

## APPENDIX

### Numerical Diffusion in an Advecting System

[Eq. (A1), not recovered: advection equation with mean flow velocity *U*_{0}]

It is a standard procedure to test numerical methods on the basis of the advection equation (e.g., Durran 1998). Appropriate boundary conditions can be chosen so that equilibrium solutions to Eq. (A1) are obtained.

Given *f*_{J}(*t*_{0}) ≡ *f*^{0}_{J} = 1/*Dq* for a selected grid interval (*J*), let *U*_{0}*Dt* be the distance covered by the mean flow during a time step, *Dt*. Thus, the states in grid interval (*J* − 1)*Dq* ≤ *q* ≤ (*J*)*Dq* are moved to (*J* − 1)*Dq* + *U*_{0}*Dt* ≤ *q* ≤ (*J*)*Dq* + *U*_{0}*Dt*, that is, into the grid intervals (*I* − 1) and (*I*), where *I* is the integer part of *J* + 1 + (*U*_{0}*Dt*/*Dq*). Thus

[Eq. (A2), not recovered]

and, therefore,

[Eq. (A3), not recovered]

so that Eq. (1) is available with exact transition coefficients.

Given *f*^{0}_{J} = 1/*Dq* for a specific value of *J*, the analytic solution to Eq. (A1) is

[Eq. (A4), not recovered]

Equation (1) gives Eq. (A2) after one time step. By applying Eq. (A3) again, further evolution is obtained:

[equation not recovered]

At *t*_{0} + *n Dt* we obtain a binomial distribution with

[Eq. (A5), not recovered]
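The relations implied by this construction can be sketched as follows. This is a reconstruction from the surrounding text, not the original Eqs. (A2)–(A5), and the notation may differ from the authors'. The symbols α (fraction of the shifted block of states that lands in grid interval *I*), *k* (number of steps in which a state lands in the upper interval), and σ² (variance of the cell position, ignoring the width of a single cell) are introduced here for illustration only.

```latex
\begin{align}
\alpha &= \frac{U_0\,Dt}{Dq} - (I - J - 1), \qquad
1 - \alpha = \frac{(I - J)\,Dq - U_0\,Dt}{Dq}, \\
f_{I-1}(t_0 + Dt) &= (1 - \alpha)\, f_J^0, \qquad
f_I(t_0 + Dt) = \alpha\, f_J^0, \\
f_{J + n(I - J - 1) + k}(t_0 + n\,Dt) &= \binom{n}{k}\,\alpha^{k}(1 - \alpha)^{n-k}\, f_J^0,
\qquad k = 0, \dots, n, \\
\sigma^2 &= n\,\alpha(1 - \alpha)\,Dq^2 = \alpha(1 - \alpha)\,\frac{Dq^2}{Dt}\,(n\,Dt).
\end{align}
```

For fixed α, the variance reached at a given time *n Dt* scales as *Dq*^{2}/*Dt*, and it vanishes when *U*_{0}*Dt* is an integer multiple of *Dq* (α = 0).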

The analytic solution, Eq. (A4), does not show any spread, whereas the solution of the master equation is spread out over *n* + 1 grid intervals at time *t*_{0} + *n Dt*. Thus, if a given time is reached by reducing *n* and correspondingly increasing *Dt*, the spread decreases. There is, however, the complicating effect that Eq. (A5) depends also on the ratio of *U*_{0}*Dt* to *Dq*: if *U*_{0}*Dt* is an integer multiple of *Dq*, the situation is singular in that the master equation reproduces the analytic solution, Eq. (A4). In that sense, a slight reduction of *Dt* away from such a value increases the spread. We may overcome this complication by looking just at the differences (*I* − *J*)*Dq* − *U*_{0}*Dt*. If *Dt* is altered such that this difference remains the same, the spread of Eq. (A5) decreases with increasing *Dt*. The extension of Eq. (A1) with small diffusive terms would not affect the result that numerical diffusion is reduced by increasing the time step *Dt*; it would certainly complicate the treatment, while removing this singularity. For the Λ-dimensional case, numerical diffusion can be expected to grow approximately with Λ(*Dq*^{2}/*Dt*) [see Egger and Dall’Amico (2007), where the study in this appendix is extended and idealized two-dimensional flow configurations are also studied].
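The dependence of the spread on the time step can be checked numerically with a one-dimensional master equation that uses the exact splitting between two neighboring destination cells. The sketch below is an illustration under stated assumptions, not the authors' code: the function name `advect_spread`, the periodic shift via `np.roll` (harmless as long as the grid is large enough that nothing wraps around), and the parameter values in the usage note are all hypothetical.

```python
import numpy as np

def advect_spread(u0, dq, dt, t_end, n_cells=400, start=10):
    """Standard deviation (in units of q) of the master-equation solution at t_end
    for pure advection with velocity u0, starting from a single occupied cell."""
    s = u0 * dt / dq
    shift = int(np.floor(s))      # whole cells travelled per time step
    alpha = s - shift             # fraction spilling into the next cell
    n_steps = int(round(t_end / dt))
    pdf = np.zeros(n_cells)       # probability per cell
    pdf[start] = 1.0
    for _ in range(n_steps):
        # Each cell's content is split between two neighboring destination cells.
        pdf = (1 - alpha) * np.roll(pdf, shift) + alpha * np.roll(pdf, shift + 1)
    centers = (np.arange(n_cells) + 0.5) * dq
    mean = np.sum(pdf * centers)
    return np.sqrt(np.sum(pdf * (centers - mean) ** 2))
```

With `u0 = 1`, `dq = 1`, and `t_end = 15`, the time steps 0.5, 1.5, and 2.5 all leave the same difference (*I* − *J*)*Dq* − *U*_{0}*Dt*, and the spread shrinks as the time step grows; a time step of 1.0 is the singular case with no spread at all.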

^{1}

With the evolution of computing power, the implementation of four- or five-dimensional EMEs will eventually become feasible, provided that appropriately long time series are available.

^{2}

Nicolis (1990) used the concept of Markov partitions to discretize the third component of the Lorenz (1963) model into two grid intervals. Pasmanter and Timmermann (2002) made use of equipartitions (i.e., each cell contains the same number of observations) to discretize one- and two-dimensional phase planes. Our approach of considering equal-sized grid intervals has been used by Egger (2001), and, similarly, Thuburn (2005) partitioned the phase space spanned by the variables of the Lorenz (1963) model into equal-sized cubical cells.

^{3}

See also the discussion on stochastic matrices and the Perron–Frobenius theorem in Pasmanter and Timmermann (2002).

^{4}

Crommelin (2004) introduced an approach to statistical significance applicable to equipartitions into, say, up to 10 cells, whenever one is interested in a few meaningful state transitions. Vautard et al. (1990) introduced a significance test based on Monte Carlo simulations while examining a few atmospheric circulation pattern transitions.

^{5}

A second-order Markov chain involves a number of coefficients of order (*i*_{1max} · *i*_{2max} · . . . · *i*_{Λmax})^{3}, where *i*_{λmax} is the total number of grid intervals along the *λ*th axis, and Λ is the number of variables. Such a hypothesis test is impractical with the computing resources we have. (In the papers cited in section 1, such hypothesis tests have been carried out by only a few authors in situations where the total number of cells was up to 5.) Were the test computationally feasible, the limited amount of data generally available in atmospheric applications would not allow the rejection of the hypothesis because of the high number of cells we plan to work with. Even if the hypothesis is rejected, the questions remain open of whether a first-order Markov chain description may still bring some insight into the underlying processes and whether a higher-order Markov chain description represents a feasible way to deal with the available data. For these reasons, we introduce another type of test of the quality of the EME.

^{6}

If nonadditive noise terms were present, care should be taken in the choice of the numerical scheme (e.g., Kloeden et al. 1997).

^{7}

Correlation functions were not introduced in section 4b because both the ones directly estimated from a time series and the ones delivered by an EME derived from the same time series improve with increasing time series length. Correlation functions estimated from a time series of length Δ*t* = 3200 negligibly improve if a substantially longer time series is used. A time series of length Δ*t* = 3200 occupies densely and uniformly the portion of the phase space where the attractor is located [not shown, compare Fig. 3b with Fig. 3.8 in Dall’Amico (2005)]. Correlation functions are derived from this whole portion.

^{8}

Kaplan and Glass (1995, 308–311) mention how time-lag embedding enables the reconstruction of the geometry of a chaotic system from a time series even if only one of the variables is measured.