## 1. Introduction and motivation

### a. Background

Interannual variability in the Tropics is dominated by the El Niño–Southern Oscillation phenomenon (ENSO; see Neelin et al. 1998, for a review). Models of various degrees of complexity are capturing different aspects of interannual variability in the Tropics with increasing success (see Latif et al. 1998, for a review).

Consistency of the initial state with a coupled ocean–atmosphere model can result in smaller spinup errors and useful forecast skill over longer lead times. Data assimilation with a fully coupled system is therefore gaining attention (Hao 1994; Hao and Ghil 1995). Recently, Chen et al. (1998, 1999) showed that the skill of the Lamont model (Cane et al. 1986; Zebiak and Cane 1987) can be considerably improved by assimilating better wind field data [National Aeronautics and Space Administration's Scatterometer (NSCAT) satellite-derived winds instead of The Florida State University observed winds], as well as sea level data from tropical Pacific tide gauges into the coupled model. Furthermore, to obtain a better initial state for an El Niño forecast, one should use all the observations available, from both ocean and atmosphere, especially those about the upper tropical ocean (Hao and Ghil 1994; Miller et al. 1995).

Here we present a data assimilation study with an intermediate coupled model using the extended Kalman filter (EKF). The main advantage of using the EKF is that it gives explicitly the evolution of forecast-error covariances, while the major difficulties in applying it are its computational cost and the need to specify the model-error covariance matrix 𝗤.

Recent progress in data assimilation for ocean models has been reviewed by Anderson et al. (1996). Cane et al. (1996) studied data assimilation using the Kalman filter via a reduced state space approach, to deal with the issues of computational cost and sparse data coverage. Fukumori et al. (1999) evaluated the feasibility and accuracy of assimilating satellite altimetry data into a global ocean general circulation model, using an approximate Kalman filter and smoother; following the approach of Fukumori and Malanotte-Rizzoli (1995), they computed the time-independent asymptotic limit of the forecast-error covariance 𝗣^{f} on a reduced state space while carrying the model on the full grid. Verron et al. (1999) assimilated satellite data into a primitive equation model with the EKF on a reduced state space. They found that this EKF was efficient in transferring the information from the surface-height observations to the deep ocean.

In this study, we compute the Kalman gain on the full space of the coupled model and conduct several “identical twin” experiments. Our intention here is to explore ENSO data assimilation using EKF on a simple yet fully coupled nonlinear model. By understanding the details of how EKF works and the way observational information is propagated in this intermediate coupled model, we hope to gain insight into the workings of the EKF when applied to more complicated models. In Part II (unpublished manuscript) we extend the approach to the problem of parameter estimation.

### b. Outline

Part I of this two-part study has three main objectives. The first one, stated in the previous paragraph, is to gain insight into the workings of an EKF approach for the coupled ocean–atmosphere system in the Tropics. Second, we examine how error propagation differs between a fully coupled model and an uncoupled ocean model driven by wind stress. Third, we study the optimal placement of observations in the coupled case.

The present study uses synthetic data. Such studies are worthwhile supplements to more practically oriented ones that use real data, in that the true-state history and model error statistics are known and can be compared easily with the results of the assimilation. Many studies have been carried out using real observations for *uncoupled* tropical ocean models, as was the case in the past for atmospheric models (e.g., Ghil and Malanotte-Rizzoli 1991). Some studies addressing ENSO have used real data, advanced data assimilation methods, and complete ocean models, but prescribed winds (e.g., Behringer et al. 1998; Ngodock et al. 2000; Verron et al. 1999; Fukumori et al. 1999). Simple data assimilation methodologies have been applied with intermediate coupled models to assimilate real data (e.g., Chen et al. 1998, 1999).

Studies with advanced data assimilation systems and intermediate coupled models that have used real data, while valuable, are still at a fairly preliminary stage (Bennett et al. 1998, 2000; Lee et al. 2000). For example, Lee et al. (2000) used an adjoint method to assimilate monthly mean data of sea surface heights and temperatures, as well as wind stress, for September 1996 to January 1998. The method was applied to sliding 6-month intervals of data, from September 1996 to July 1997, in an intermediate coupled model with a statistical diagnostic atmosphere component. Forecasts were issued from the last month of each such window and compared with the actual evolution of the system, based on assimilation results valid at the appropriate epoch. These authors found that the intermediate coupled model has a reasonable skill in reproducing observed interannual variability of SST and sea level during the 1997–98 El Niño.

In the present study, wind stress error is considered to be the sole model deficiency. As the tropical oceans are largely wind-driven (Gill 1980; Philander 1990), errors in the wind stress have a great impact on the simulated ocean circulation (Leetmaa and Ji 1989; Sheinbaum and Anderson 1990; Fu et al. 1993). Hao and Ghil (1994) and Miller et al. (1995) have shown that, in the absence of ocean–atmosphere feedback, oceanic data can compensate for errors in the wind stress. Even though an atmospheric component is included explicitly in a coupled ocean–atmosphere model, wind stress error can still be present and produce large simulation errors, since higher-frequency variability dominates in the atmosphere and thus affects the model's ocean. The profound effect of observed wind stress errors on the prediction skill of a coupled model has been noticed by Graham et al. (1992).

We first present the dynamical structure of the forecast errors due to the wind stress error in our coupled model, estimated with a linearized Kalman filter (LKF) scheme. The role of the ocean–atmosphere coupling process in the propagation of model errors is investigated by comparing the results for the coupled case with those for the uncoupled one.

We then present the results of data assimilation using the extended Kalman filter. The most favorable places to measure the oceanic currents or sea surface temperature (SST) for recovering the ENSO cycle are examined by comparing the impact of assimilating observations at different longitudes.

Linear and nonlinear investigations of ENSO dynamics show that distinct parameter values can modify the coupled model's behavior, both quantitatively and qualitatively (Neelin et al. 1998). One way to improve our model and its forecast skill is to estimate the correct value of its parameters from observations, at the same time as estimating the model state. This estimation is carried out with an extension of the EKF scheme, and relevant results are reported in Part II of this study.

The structure of the present paper is as follows: In section 2, we introduce the coupled ocean–atmosphere model used in the study. Further model details appear in appendix A. In section 3, data assimilation methods are briefly reviewed, and two extensions of the Kalman filter to nonlinear problems, the LKF and EKF, are presented. The model-error covariance is constructed from wind stress errors, with details presented in appendix B. Section 4 contains the estimation of the forecast-error structure by the LKF experiments. Model-state estimation studies for the coupled model with the EKF and different data sources are reported in section 5. Concluding remarks appear in section 6.

## 2. Model solutions

We use the intermediate coupled model of Jin and Neelin (1993) and Neelin and Jin (1993) (jointly referred to as JN hereafter). The details of this coupled ocean–atmosphere model are described in appendix A. This model is essentially a further idealization of the model of Zebiak and Cane (1987). The major simplification is to treat explicitly only the zonal dependence of sea surface temperature over an equatorial strip, with the meridional structure of the associated atmospheric forcing specified.

This coupled model presents two different kinds of ENSO oscillation, of the delayed-oscillator and westward-propagating type. The nature of the oscillation depends on the values of two key parameters, namely the relative coupling coefficient *μ* and the surface-layer coefficient *δ*_{s} (JN). In this study, we choose *δ*_{s} to be 0, and *μ* to be 0.76. These parameter values correspond to the delayed-oscillator regime of the model. The model parameters are given in Table 1.

The meridional structure of the mean currents in the upper layer of the ocean is projected onto the Hermite functions and truncated at a limited number, 14 here. The coupled model thus boils down to a spatially one-dimensional system of evolution equations in the zonal direction *x* and time *t.* The amplitude for the Kelvin wave is *q*_{0}, while *q*_{n} for *n* = 2, 4, … , 14 are the amplitudes of the first seven Rossby waves.

We use a staggered grid, with ocean wave coefficients *q*_{n}, *n* = 0, 2, 4, … , 14, carried at full grid points, and SST carried at half grid points. The total length of the ocean basin is 150° (from 130°E to 80°W), with a grid spacing of 6.25°. The grid points in the *x* direction are numbered from 1 to 25, going from the basin's western to eastern boundary, and used in identifying the location of observing sections. The coupled system so discretized has (1 + 8) × 24 = 216 degrees of freedom. The time step is 6 h (see Table 1).
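The grid bookkeeping above can be checked with a few lines of Python. This is only an arithmetic sketch: the variable names are illustrative and not taken from the model code, and the factor of 24 is assumed here to correspond to the number of grid intervals.

```python
# Bookkeeping check of the discretization described above.
# All names are illustrative; they do not come from the model code.
basin_length_deg = 150.0   # 130E to 80W
grid_spacing_deg = 6.25

n_intervals = int(basin_length_deg / grid_spacing_deg)  # number of grid intervals
n_full_points = n_intervals + 1                         # full grid points, numbered 1..25

wave_indices = list(range(0, 15, 2))   # n = 0, 2, ..., 14 (Kelvin plus seven Rossby)
n_fields = 1 + len(wave_indices)       # SST plus eight wave coefficients

degrees_of_freedom = n_fields * n_intervals
print(n_intervals, n_full_points, n_fields, degrees_of_freedom)
```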

To illustrate the impact of the ocean-to-atmosphere coupling feedback, an uncoupled ocean model otherwise similar to the coupled one is constructed. The former displays the same time evolution as the latter when starting from the same initial state and forced with the appropriate wind stress, recorded from the coupled control run. The wind stress used in the uncoupled case is no longer a function of the SST anomalies.

A 30-year run of the model confirms that it asymptotes rapidly to a limit cycle with a period of about 3.6 years; see Fig. 1a for a verification of its SST evolution over the Niño-3 region (5°N–5°S, 90°W–150°W). Figures 1b and 1c show the meridional structure of the SST anomalies (SSTA) at a transition phase (shortly after year 5) and a warm phase (near year 6), respectively. Figures 1d and 1e show the thermocline depth anomaly *h* at these two phases. It can be seen clearly that thermocline depth anomaly *h* leads SSTA. Hereafter, we will refer to the results of the model run without data assimilation (and without any errors imposed) as the reference solution. The reference solution presents many typical features of the ENSO phenomenon, including a standing oscillation in SST, with subsurface dynamics providing the memory (Suarez and Schopf 1988; Neelin et al. 1994).

Figure 2 shows the time–longitude anomaly plots of the reference solution's SSTA, wind stress (*τ*), Kelvin wave (*q*_{0}) and first Rossby wave (*q*_{2}) amplitude for 30 years. The SST anomalies are confined to the eastern part of the basin. Their oscillation has a strong standing component, while slight eastward propagation is also present. The wind stress anomalies lag slightly behind the SST anomalies in time, and are shifted to the west of the SST anomalies in space, with the maximum located in the central part of the basin.

Differences in phase and in the location of the maxima among the SST, Kelvin wave, and Rossby wave fields indicate the important effect of ocean wave dynamics on the oscillation. Notice that the slowly propagating oceanic waves here are not the conventional free-basin modes, but rather a packet of mixed SST–ocean-dynamics modes of the coupled system (JN; Hao et al. 1993; Neelin et al. 1994). This is especially apparent in the presence of the Rossby wave component along the equator itself, with opposite phases in the eastern and western part of the basin, while the Kelvin wave component still exhibits predominantly eastward propagation.

Since the surface-layer coefficient *δ*_{s} is set to be zero here, the thermocline feedback dominates (JN; Hao et al. 1993). Large variations of the SST in the eastern part of the basin are mainly due to the shallowness of the mean thermocline there. In the western part, the SST anomalies are smaller since the mean thermocline depth there is much greater.

## 3. Data assimilation methods

### a. Methodology

#### 1) Extended Kalman filter

Since the SST equation is nonlinear, the extended Kalman filter is applied here for assimilating the observations. In the EKF, the nonlinear model is linearized around the current state when estimating the propagation of the forecast error, while the state itself is advanced according to the full nonlinear model. Miller et al. (1994) discussed a number of distinct Kalman filter extensions to nonlinear systems. Ide and Ghil (1997a,b) studied the EKF for vortex systems in the time-continuous case. We describe the EKF here briefly, following the unified notation of Ide et al. (1997).

The model forecast is

$$\mathbf{x}^{f}(t_{k+1}) = M_{k}[\mathbf{x}^{a}(t_{k})],$$

where **x**^{a}(*t*_{k}) is the best estimate of the model state at the last time step *k*, and *M*_{k} is the state transition function; superscripts “a” and “f” stand for analysis and forecast, respectively.

The true state evolves subject to the model errors **η**(*t*_{k}),

$$\mathbf{x}^{t}(t_{k+1}) = M_{k}[\mathbf{x}^{t}(t_{k})] + \boldsymbol{\eta}(t_{k}),$$

where **η** is a Gaussian white-noise sequence, with mean zero and model-error covariance 𝗤, that is, *E***η**(*t*_{k}) = 0 and *E***η**(*t*_{k})**η**^{T}(*t*_{l}) = 𝗤_{k}*δ*_{kl}, where *E* is the expectation operator and *δ*_{kl} is the Kronecker delta. The natural system's evolution is detected by observations contaminated by errors **ϵ**^{o}_{k},

$$\mathbf{y}^{o}_{k} = H_{k}[\mathbf{x}^{t}(t_{k})] + \boldsymbol{\epsilon}^{o}_{k},$$

where the observational errors **ϵ**^{o}_{k} have covariance 𝗥_{k}. The observation function *H*_{k} can be nonlinear as well. The superscripts “t” and “o” denote truth and observation. Even though the dynamic model *M*_{k} and observation *H*_{k} may be nonlinear, a linear estimation of the model state, based on the observations, is sought:

$$\mathbf{x}^{a}(t_{k}) = \mathbf{x}^{f}(t_{k}) + \mathsf{K}_{k}\,\mathbf{d}_{k},$$

where

$$\mathbf{d}_{k} = \mathbf{y}^{o}_{k} - H_{k}[\mathbf{x}^{f}(t_{k})].$$

The optimal weights 𝗞_{k} for the observations are obtained by minimizing the expectation of least squares distance between the analysis and the true state.

The EKF thus proceeds in two steps at each time *t*_{k}: (i) a forecasting step and (ii) an updating step.
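For reference, the forecasting and updating steps can be written in the standard textbook form of the EKF, using the symbols of this section. This is a sketch of the usual formulation and is assumed rather than copied from the original display equations; the identity matrix 𝗜 and the time index on 𝗥 are notational assumptions, and the equation labels cited elsewhere in the text are not assigned here.

$$
\begin{aligned}
\text{(i) forecasting step:}\quad
\mathbf{x}^{f}(t_{k}) &= M_{k-1}[\mathbf{x}^{a}(t_{k-1})], \\
\mathsf{P}^{f}_{k} &= \mathsf{M}_{k-1}\,\mathsf{P}^{a}_{k-1}\,\mathsf{M}^{T}_{k-1} + \mathsf{Q}_{k-1}; \\
\text{(ii) updating step:}\quad
\mathbf{x}^{a}(t_{k}) &= \mathbf{x}^{f}(t_{k}) + \mathsf{K}_{k}\,\mathbf{d}_{k}, \\
\mathsf{P}^{a}_{k} &= (\mathsf{I} - \mathsf{K}_{k}\mathsf{H}_{k})\,\mathsf{P}^{f}_{k}, \\
\mathsf{K}_{k} &= \mathsf{P}^{f}_{k}\mathsf{H}^{T}_{k}\,(\mathsf{H}_{k}\mathsf{P}^{f}_{k}\mathsf{H}^{T}_{k} + \mathsf{R}_{k})^{-1}.
\end{aligned}
$$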

The matrix 𝗞_{k} is the Kalman gain, which represents the optimal weight given to observations in updating the model state, and 𝗛_{k} is the linearized approximation of the observation function *H*_{k}. The state transition matrix 𝗠_{k} is the state transition function *M*_{k} linearized about the current state. The system propagates the forecast-error covariance linearly in time, while the state itself evolves nonlinearly, with *M*_{k}. When the functions *M*_{k} and *H*_{k} are linear, the EKF reduces to the conventional Kalman filter. When there is no observation, only the forecasting step is performed, and the optimal estimation of the model state is the model forecast. The analysis-error covariance 𝗣^{a}_{k} then equals the forecast-error covariance 𝗣^{f}_{k}.
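The interplay of nonlinear state propagation and linearized error propagation can be illustrated with a scalar sketch. The toy model below (a cubic relaxation stepped with forward Euler), the function names, and all numerical values are hypothetical choices made for illustration; they are not the coupled model or its parameters.

```python
def ekf_cycle(x_a, p_a, y_obs, model, model_jac, q, obs, obs_jac, r):
    """One forecast + update cycle of a scalar extended Kalman filter."""
    # (i) Forecasting step: the state is advanced with the full nonlinear model,
    # the error variance with the model linearized about the current analysis.
    x_f = model(x_a)
    m = model_jac(x_a)
    p_f = m * p_a * m + q
    # (ii) Updating step: the observation function is linearized about the forecast.
    h = obs_jac(x_f)
    gain = p_f * h / (h * p_f * h + r)   # Kalman gain
    innovation = y_obs - obs(x_f)
    x_a_new = x_f + gain * innovation
    p_a_new = (1.0 - gain * h) * p_f
    return x_f, p_f, x_a_new, p_a_new, gain


# Hypothetical toy problem (not the coupled model of this paper).
DT = 0.1


def toy_model(x):
    return x + DT * (x - x**3)


def toy_model_jac(x):
    return 1.0 + DT * (1.0 - 3.0 * x**2)


def identity_obs(x):
    return x


def identity_obs_jac(x):
    return 1.0


x_f, p_f, x_a, p_a, gain = ekf_cycle(
    x_a=0.5, p_a=0.04, y_obs=0.7,
    model=toy_model, model_jac=toy_model_jac, q=0.01,
    obs=identity_obs, obs_jac=identity_obs_jac, r=0.01,
)
print(x_f, p_f, x_a, p_a, gain)
```

In this sketch the analysis lies between the forecast and the observation, and the analysis variance is smaller than the forecast variance, as expected whenever an observation is assimilated.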

#### 2) Linearized Kalman filter

The structure of the forecast-error covariance represents the impact of the model dynamics on the model-error propagation. For a nonlinear system, the evolution of the forecast-error covariances depends on the actual model trajectory. Given a prescribed trajectory, the propagation of model-error structure can be carried out independently, that is, “offline.” When linearization is done along a prescribed trajectory, the resulting estimation process is called the linearized Kalman filter (LKF; Gelb 1974; Fukumori and Malanotte-Rizzoli 1995).

The LKF scheme, like the EKF, estimates the propagation of the model state and model error. The weights of the observations for the estimation of the model state are calculated according to (10). In the LKF scheme, however, the nonlinear functions *M*_{k} and *H*_{k} are linearized about the prescribed state vector **x**^{p}, which is specified prior to processing the observations. The Kalman gains may be precalculated and stored in the computer and there is no feedback of the updated model states on the estimation of 𝗣^{f}.

The LKF scheme is often used to avoid difficulties that arise from the need for repeated numerical differentiation of the state transition function *M*_{k}. Our main purpose in using it for the present study is to obtain as clear a picture as possible of the effects of coupling on the structure of the error-correlation field.

On the other hand, the LKF is generally less successful in assimilating data than the EKF scheme because the fixed trajectory is often not as close to the true one as the sequentially updated trajectory (Gelb 1974). Therefore, we implement the EKF scheme for the actual assimilation experiments with observations, but use the LKF scheme to investigate systematically the offline propagation of model errors in the absence of observations. In the present section, we use, therefore, Eq. (7) to compute the evolution of the error covariances, with the model matrix 𝗠_{k} given by the linearization of *M*_{k} about the reference solution of Fig. 1.

### b. Error covariances

#### 1) Model-error covariance 𝗤

The most severe errors in tropical ocean prediction seem to arise from wind stress errors (Leetmaa and Ji 1989; Graham et al. 1992; Hao and Ghil 1994; Miller et al. 1995). We introduce, therefore, model errors at every time step by adding noise to the wind stress that is obtained from the model atmosphere's response to SST anomalies. The wind stress error is specified as in Miller and Cane (1989), Miller et al. (1995), and Cane et al. (1996). It is taken as white in time and Gaussian-correlated and homogeneous in space. The errors are assumed to have the same meridional structure as the atmospheric response to the SST anomalies.

The wind stress error covariance 𝗤^{w} is given by

$$E(e_{k,i}\,e_{l,j}) = \sigma_{\tau}^{2}\,\delta_{kl}\,e^{-(x_{i}-x_{j})^{2}/(2L_{x}^{2})},$$

where *e*_{k,i} and *e*_{l,j} are the wind stress errors at locations *i* and *j*, at time steps *k* and *l*, respectively, *δ*_{kl} is the Kronecker delta, *σ*_{τ} is the standard deviation of the wind stress error, and *L*_{x} the prescribed decorrelation distance. We use *σ*_{τ} = 0.02 Pa and *L*_{x} = 10° of longitude, as used by Miller and Cane (1989).

In the error model, *τ* is the wind stress and **e**_{k} is the wind stress error at time *t*_{k}. Our model-error covariance 𝗤 is then computed from **η**, as described in appendix B.

Figure 3 shows model-error correlations between SST and selected ocean wave coefficients: the Kelvin (*q*_{0}), and the first and the last even Rossby coefficients of this model (*q*_{2} and *q*_{14}). The autocorrelations of the model variables exhibit a structure similar to that of the wind stress error imposed. All Rossby wave modes are positively correlated with each other, since they travel in the same direction and thus the wind stress projects onto them with the same sign. However, model-error correlations between the Kelvin and Rossby waves are negative. Negative correlations among these waves were also obtained by Miller and Cane (1989), who considered the wind stress noise in an equatorial ocean wave model. The SST correlates positively with the Kelvin and negatively with the Rossby waves.
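The spatial part of the Gaussian-correlated wind stress error covariance introduced above can be sketched as follows, for errors at a single time step (*k* = *l*). The values of *σ*_{τ} and *L*_{x} are those quoted in the text; placing the errors on 25 points with 6.25° spacing, and the function name, are assumptions made for illustration only.

```python
import math

SIGMA_TAU = 0.02   # standard deviation of the wind stress error (Pa), from the text
L_X = 10.0         # decorrelation distance (degrees of longitude), from the text
DX = 6.25          # grid spacing (degrees); using the model grid here is an assumption
N_POINTS = 25      # number of points; also an assumption


def wind_stress_error_covariance(n_points=N_POINTS, dx=DX,
                                 sigma=SIGMA_TAU, length=L_X):
    """Same-time covariance: sigma^2 * exp(-(x_i - x_j)^2 / (2 L^2))."""
    x = [i * dx for i in range(n_points)]
    return [
        [sigma**2 * math.exp(-((xi - xj) ** 2) / (2.0 * length**2)) for xj in x]
        for xi in x
    ]


q_w = wind_stress_error_covariance()
neighbour_correlation = q_w[0][1] / q_w[0][0]
print(q_w[0][0], neighbour_correlation)
```

With these numbers, errors at adjacent points are strongly correlated (a correlation of about 0.82), while the correlation falls off rapidly beyond a few grid intervals.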

A first-order approximate projection, Eq. (B3), is used to transfer the model-error covariance of the wind stress to the model's other variables. Hence, the estimated forecast-error structure is accurate only when the model error is fairly small. A particular problem that may arise otherwise is loss of positive definiteness of 𝗤 during the projection procedure (Balgovind et al. 1983; Parrish and Cohn 1985; Boggs et al. 1995; Gaspari and Cohn 1999). The diagonal elements of 𝗤 are required here to be positive and have a minimum value on the order of 10^{−6} to prevent this problem from occurring. The assimilation experiment results shown later suggest that the projection, so modified, is satisfactory.

#### 2) Observation-error covariance 𝗥

The observation-error covariance matrix 𝗥 is prescribed here as a diagonal matrix, *R*_{ij} = *σ*^{2}_{i}*δ*_{ij}, where *σ*_{i} is the observation-error standard deviation for the state variable *x*_{i}. The observations are represented as measurements *T* of SST, amplitudes *q*_{n} of the oceanic waves, or atmospheric wind stress data *τ* at longitudinal locations; they are all taken twice a month if no other specification is given.

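The observations are contaminated by white noise, with standard deviations *σ*^{o}_{T} for SST, *σ*^{o}_{q0} for the Kelvin wave amplitude, *σ*^{o}_{qn} for the Rossby wave amplitudes, and *σ*^{o}_{τ} for the wind stress. The corresponding standard deviations for the current *u*, thermocline depth anomaly *h*, and vertical velocity *w* are *σ*^{o}_{u}, *σ*^{o}_{h}, and *σ*^{o}_{w}, respectively.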

#### 3) Initial forecast-error covariance 𝗣^{f}_{0}

The initial errors are white in space, with standard deviations prescribed at all grid points: *σ*_{T} = 0.9 K for SST; *σ*_{q0} for the Kelvin wave; *σ*_{qn} for *n* = 2, 4, 6; and a separate value of *σ*_{qn} for *n* = 8, 10, 12, 14. The standard deviations for all wave coefficients are in nondimensional units, as for the observation errors. The initial forecast-error covariance 𝗣^{f}_{0} is therefore diagonal at *t* = 0: *P*^{f}_{0ij} = *σ*^{2}_{i}*δ*_{ij}.

## 4. Forecast-error statistics

The behavior of forecast-error covariances is of interest for our understanding of how data assimilation works, in theory as well as in practice. For a linear system, the Kalman filter correctly estimates the forecast-error covariance, which represents the expected relationship between the forecast errors, based on initial uncertainties, model errors, and model dynamics, and thus provides the optimal gain 𝗞 for the observations. For a nonlinear system, the EKF is nearly optimal in most circumstances (but not all; see the discussion of Miller et al. 1994, 1999, for various strongly nonlinear models). In this section, we study forecast-error propagation by linearized model dynamics according to (7), in preparation for the actual data assimilation experiments.

The random wind stress errors affect the evolution of 𝗣^{f} by appearing in (7) through the model-error covariance 𝗤. The latter is obtained according to the error model governed by (11) and (12). In practice, we have used a different value of 𝗤_{k} at each time step *k*. This value was obtained by using the outer product 𝗤_{k} = **η**_{k}**η**^{T}_{k}, where **η**_{k} is a particular random vector. The time-averaged effect on the evolution of 𝗣^{f}_{k} is then expected to approach that of the expected covariance, since the expectation of the outer product **η**_{k}**η**^{T}_{k} is the model-error covariance by definition.
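The use of a single-sample outer product at each step can be illustrated with a small numerical sketch: the time average of the outer products of random vectors drawn with a given covariance approaches that covariance. The 2 × 2 matrix `Q_TRUE`, the sample size, and the function names are hypothetical and chosen only for illustration.

```python
import random

# Hypothetical 2x2 model-error covariance (illustration only).
Q_TRUE = [[1.0, 0.5], [0.5, 1.0]]

# Cholesky factor of Q_TRUE, written out by hand for the 2x2 case.
L11 = Q_TRUE[0][0] ** 0.5
L21 = Q_TRUE[1][0] / L11
L22 = (Q_TRUE[1][1] - L21**2) ** 0.5


def sample_eta(rng):
    """Draw one zero-mean Gaussian vector with covariance Q_TRUE."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    return [L11 * z1, L21 * z1 + L22 * z2]


def time_averaged_outer_product(n_steps, seed=0):
    """Average the rank-one matrices eta * eta^T over n_steps time steps."""
    rng = random.Random(seed)
    acc = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(n_steps):
        eta = sample_eta(rng)
        for i in range(2):
            for j in range(2):
                acc[i][j] += eta[i] * eta[j]
    return [[value / n_steps for value in row] for row in acc]


q_mean = time_averaged_outer_product(20000)
print(q_mean)
```

Each individual outer product has rank one, so it is a poor estimate of the covariance at any single step; only the accumulated effect over many steps resembles that of the full matrix.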

### a. Uncoupled case

Experiments are carried out first for the uncoupled case. In this case, the tropical ocean model includes no ocean–atmosphere coupling feedback and is forced by a prescribed time sequence of wind stress, generated by the coupled model. With no wind stress noise added, the uncoupled run reproduces the time evolution of the ocean in the reference solution.

The state transition matrix 𝗠_{k} can be written symbolically in terms of an oceanic part 𝗠^{(o)}_{k}, an atmospheric part 𝗠^{(a)}_{k}, and a coupling part 𝗠^{(c)}_{k}. In the uncoupled case, only the oceanic part 𝗠^{(o)}_{k} is retained in 𝗠_{k}, while the model-error covariance remains 𝗤_{k}, as described above. This is not entirely self-consistent but helps us isolate and highlight further the role of coupled versus uncoupled dynamics in forecast-error evolution.

To track the error evolution, we compute the estimation error standard deviation, that is, the square root of each diagonal entry of 𝗣^{f}, and refer to it as the root-mean-square (rms) error hereafter. In the uncoupled case, the rms error of each field at each grid point reaches an asymptotic behavior after a fairly short transient of a few months. This asymptotic behavior is the result of the competition between three effects: (i) the oscillatory dynamics of the model, (ii) the steady pumping of error into the model by the addition of 𝗤_{k} at every step *k,* and (iii) numerical dissipation that mimics physical dissipation of energy and, hence, error (Ghil et al. 1981; Fukumori and Malanotte-Rizzoli 1995).
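The balance between error pumping and dissipation can be illustrated with a scalar version of the covariance recursion, in which a damped linear propagator *m* (with |*m*| < 1) stands in for the dissipative dynamics. The values of `M_DAMPED` and `Q_STEP` are hypothetical, and the oscillatory effect (i) is deliberately left out of this sketch.

```python
def propagate_variance(p0, m, q, n_steps):
    """Iterate p <- m * p * m + q with no observations (offline propagation)."""
    history = [p0]
    p = p0
    for _ in range(n_steps):
        p = m * p * m + q
        history.append(p)
    return history


M_DAMPED = 0.95   # hypothetical damped propagator
Q_STEP = 0.01     # hypothetical error variance added at every step

history = propagate_variance(0.0, M_DAMPED, Q_STEP, 400)
p_steady = Q_STEP / (1.0 - M_DAMPED**2)   # fixed point of the recursion
print(history[-1], p_steady)
```

The variance grows from zero and saturates at the fixed point *q*/(1 − *m*²), where the error added at each step is exactly offset by the damping.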

In the uncoupled model, with no feedback between ocean and atmosphere, the forecast errors for the amplitude *q*_{0} of the Kelvin wave and *q*_{2} of the first Rossby wave quickly reach their steady-state levels (Figs. 4b,c). This saturation is expected since the ocean is driven by the wind and no SST changes are allowed to feed back to the oceanic waves in this case. Thus the latter two effects balance each other almost exactly and the oscillatory effect is absent from the wave-component errors. At steady state, the Kelvin wave exhibits a fairly gradual eastward increase of the forecast error, while the Rossby waves have almost constant forecast error over most of the basin, with larger values concentrated near the eastern basin boundary.

SST forecast errors, on the other hand, are dominated by a regular oscillation with a period of about 3.6 years, as suggested by Fig. 1 (see also Jiang et al. 1995; Moron et al. 1998). They are concentrated overall in the central and eastern basin, while their peak values lie in the central basin (Fig. 4a). At the time when these peak values reach their asymptotic steady state, at *t* ≃ 15 years, they are about twice as large as the accumulated model errors directly imposed on the SST field (not shown). This difference indicates that the wind stress error transmitted through the oceanic waves to the SST is amplified by the model dynamics; but the rms errors in SST also reach a cyclo-stationary steady state, due to the balance between the model-error pumping and the dissipation.

The zonal variations of the forecast errors for the ocean waves are consistent with the analytical arguments by Miller and Cane (1989) that the faster Kelvin wave picks up more wind stress errors on its way eastward, while the slower Rossby waves do not accumulate much error during their propagation westward, since the wind stress error is white in time, but correlated spatially. The large Rossby wave forecast errors that occur at the eastern boundary are due to the reflection of the Kelvin wave, which has a larger forecast-error level there (Fig. 4c).

The forecast-error correlations estimated by LKF at the end of 5 years are shown in Fig. 5, by which time a fairly steady structure has emerged. The correlation patterns display clearly the impact of the oceanic waves: the autocorrelations for Kelvin or Rossby waves are highest in their respective directions of propagation (Figs. 5d and 5g). The angle between the axis of maximum positive correlation of distinct Rossby waves and the main diagonal—about 22° for *q*_{2} and *q*_{4} versus 39° between *q*_{2} and *q*_{14} (see Figs. 5h and 5i)—corresponds to the difference between their wave speeds; for example, *q*_{4} travels at 3/7 of the speed of *q*_{2}, while *q*_{14} travels at 1/9 of *q*_{2}'s speed. In the absence of stochastic perturbations, these correlations would be reduced to sharp straight lines whose slopes can easily be computed from the phase velocities of the various waves involved.
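The quoted angles can be recovered from the speed ratios with a short calculation. The sketch assumes that, in the plane spanned by the two zonal positions, the axis of maximum correlation between two waves has a slope equal to the ratio of their phase speeds, so that its angle from the main diagonal (slope 1, or 45°) is 45° minus the arctangent of that ratio. This geometric reading is an assumption made here for illustration.

```python
import math


def angle_from_diagonal_deg(speed_ratio):
    """Angle between a line of slope `speed_ratio` and the main diagonal."""
    return 45.0 - math.degrees(math.atan(speed_ratio))


angle_q2_q4 = angle_from_diagonal_deg(3.0 / 7.0)    # q4 travels at 3/7 of q2's speed
angle_q2_q14 = angle_from_diagonal_deg(1.0 / 9.0)   # q14 travels at 1/9 of q2's speed
print(round(angle_q2_q4, 1), round(angle_q2_q14, 1))
```

Under this assumption the two angles come out close to 22° and 39°, in line with the values read off Figs. 5h and 5i.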

The positive correlations near either zonal boundary between the Kelvin and Rossby waves (Figs. 5e,f) stem from the wave reflection there. The slower phase speed of the highest wave (*q*_{14}) results in a much smaller positive region for its correlation with the Kelvin wave (Fig. 5f) than that in the cross correlation between the lowest Rossby wave *q*_{2} and the Kelvin wave *q*_{0} (Fig. 5e).

The ocean's thermodynamic processes result in a tight SST autocorrelation for the western part of the basin, and broader structure in the central and eastern basin (Fig. 5a). The wide correlation area is associated with the large SST forecast error in the eastern basin (Fig. 4a). The positive correlation between the SST and Kelvin wave is consistent with the oceanic waves' impact in the finite basin on the SST: the Kelvin wave anomalies excited by the wind stress in the central basin induce SST changes with the same sign on the way to the east, while Rossby waves reflected from the Kelvin wave off the eastern boundary also enhance the SST changes in the eastern basin (Figs. 5b and 5c). However, the Rossby waves directly excited by the wind stress and their Kelvin wave reflection off the western basin boundary correlate negatively with the SST changes in the entire basin (Fig. 5c). These negative correlations represent the negative feedback process between the ocean and atmosphere there.

### b. Coupled case

The LKF is applied next to the coupled model to estimate its forecast-error structure. The model-error covariances 𝗤_{k} are the same as for the uncoupled case, while the linearized error-evolution dynamics is now given by the full 𝗠_{k} of Eq. (14). The difference between the coupled and uncoupled cases is in the state transition matrix 𝗠.

The forecast errors in the coupled case are much larger than those in the uncoupled case (Fig. 6). The expected forecast errors in SST anomalies (Fig. 6a) are one order of magnitude larger than in the uncoupled case, but the spatiotemporal pattern of the errors is quite similar (cf. Fig. 4a). The coupling processes clearly amplify the model's SST errors. The errors in the oceanic wave amplitudes *q*_{0} and *q*_{2} (Figs. 6b,c) are a few times larger than in the uncoupled case (see Figs. 4b,c) and their patterns differ substantially from the uncoupled case. In the latter, both *q*_{0} and *q*_{2} errors were steady in time after 5 years and increased monotonically from west to east, gradually for *q*_{0} and abruptly near the eastern boundary for *q*_{2} (Figs. 4b,c). In the coupled case at hand, both *q*_{0} and *q*_{2} evolve over the first 10 years toward a fairly regular oscillation, with two separate maxima, in the eastern and western basin (Figs. 6b,c). For the Kelvin wave, the maximum in the eastern basin is larger, while it is larger in the western basin for the first Rossby wave. This makes sense given the accumulation of errors as each wave propagates across the basin.

The forecast-error correlation structure of the coupled case, estimated by LKF at the end of year 5, is shown in Fig. 7. The forecast-error correlations in the coupled and the uncoupled cases have, overall, similar structures when SST is involved (compare Figs. 7a–c with Figs. 5a–c). Roughly speaking, the SST errors over the entire basin are negatively correlated with the errors in the oceanic waves over the western part of the basin and positively correlated over the eastern part. The relatively small differences between these correlations and those in the uncoupled case are not surprising, because the model-parameter values used in the present paper—moderate coupling of *μ* = 0.76 and no upwelling feedback, that is, *δ*_{s} = 0—yield a strong influence of thermocline depth on SST.

The forecast-error statistics of the oceanic waves, on the other hand (Figs. 7d–i), show major differences between the coupled and uncoupled cases. The coupling processes enhance all the correlations, especially the cross correlations between oceanic waves, and emphasize an East–West seesaw pattern. This seesaw structure is expressed by negative autocorrelations, as well as cross correlations, of the ocean waves in the western basin; it might be related to the most unstable coupled mode in this intermediate model (JN; see also Figs. 6b,c).

## 5. Assimilation experiments

With confidence in our sequential estimation approach confirmed in the previous section on covariance propagation, we continue the investigation by assimilating various types of observations. The questions which we want to address in this “model world” campaign are: What is the minimum number of observations needed in order to follow the evolution of an ENSO cycle? Where should these observations best be taken to minimize estimation error for the ocean state and SST?

We conduct identical-twin experiments in which the true model history, as well as the forecast and observations, are produced by the same model. The true history differs from the reference solution in Figs. 1 and 2 in two respects: (i) an initial error that is drawn from the distribution having 𝗣^{f}_{0} as its covariance matrix, and (ii) wind stress errors that are drawn from the distribution having 𝗤_{k} as its covariance matrix and are imposed at each step during the model simulation. The observations are taken twice a month and differ from the true-state history due to errors in the observation process with covariance matrix 𝗥. The matrices 𝗣^{f}_{0}
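The structure of an identical-twin experiment can be sketched with a toy scalar system. Everything in the block below is an assumption made for illustration: the function `step` is a hypothetical one-line stand-in for a model step (it is not the JN model), and the variances `p0`, `q`, and `r` and the observation interval are arbitrary numbers playing the roles of the initial-error, model-error, and observation-error covariances.

```python
import random

def step(x, a=0.95):
    """Hypothetical scalar stand-in for one model step (not the JN model)."""
    return a * x

def make_twin_data(n_steps, p0=1.0, q=0.01, r=0.04, obs_every=2, seed=0):
    """Generate a 'true' history and noisy observations from the same model."""
    rng = random.Random(seed)
    x_ref = 1.0                                  # reference initial state
    x_true = x_ref + rng.gauss(0.0, p0 ** 0.5)   # initial error with variance p0
    truth, obs = [], {}
    for k in range(1, n_steps + 1):
        # model error with variance q is imposed at every step
        x_true = step(x_true) + rng.gauss(0.0, q ** 0.5)
        truth.append(x_true)
        if k % obs_every == 0:
            # observation = truth + observation error with variance r
            obs[k] = x_true + rng.gauss(0.0, r ** 0.5)
    return truth, obs

truth, obs = make_twin_data(n_steps=24)
print(len(truth), len(obs))
```

The assimilation run would then use the same `step` function as its forecast model and see only `obs`, never `truth`; the latter is kept aside to compute estimation errors afterward.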

The assimilation experiments are all carried out for 30 years. They start from an initial state that differs from the true initial state by having a different phase along the reference solution's limit cycle. In these experiments, we use the full EKF, according to Eqs. (6)–(10). The case in which the model that generates the true-state history differs from the one used for the assimilation is handled in Part II.

In the following experiments, *q*_{n} data at a single point refer to observations of all the oceanic waves included in the model at that zonal location. Since the meridional structure of the ocean fields is projected on the Hermite functions, these observations are equivalent to having measured mass and velocity fields for a whole meridional section: the projection operator is bijective (one-to-one and onto) when restricted to the finite-dimensional subspace of interest. In reality, oceanic observations do not provide direct measurements of *q*_{n}. However, in our present, highly simplified model, *q*_{n} data can be used directly and should provide theoretical guidance on how to use subsurface mass and velocity data in a coupled system.

### a. Simulated true state

Figure 8 shows the time evolution of the SST anomalies, wind stress *τ,* Kelvin wave amplitude *q*_{0}, and the first Rossby wave amplitude *q*_{2} for the simulated true-state history that starts with perturbed initial data. In our terminology, the reference solution (see Fig. 2) is the model run without any errors imposed, while the history of the true states is generated by adding errors to the initial state and to the wind stress forcing. The wave fields *q*_{0} and *q*_{2} are noisier than the SST anomalies because the wave coefficients *q*_{n}, *n* = 0, 2, … , 14, are directly affected by the wind stress noise while SST is only influenced through the integrated effect of the wind errors [see Eq. (A20)].

We choose the initial state of the true-state history to be in a different phase of the ENSO cycle than the reference solution's initial state. The reference solution (see Fig. 2a) starts from a transient phase that enters immediately a cold-anomaly phase (La Niña), while the true-state history (see Fig. 8a) starts from the next transient phase that enters immediately a warm-anomaly phase (El Niño). Our data assimilation experiments all start out with the model's initial state being the same as in the reference solution (Fig. 2a), rather than the “true” initial state (Fig. 8a).

### b. State estimation

#### 1) SST data at a single location

We first perform an EKF assimilation run with a single SST observation located in the eastern basin at 108°W. To the extent that the SST equation (A1) in the JN model represents off-equatorial behavior as well, such an SST observation should also be thought of as a meridional section of SST measurements.

The time evolution of the SST anomaly and of the amplitude *q*_{0} of the Kelvin wave over 30 years is presented in Figs. 9a,b. EKF assimilation does recover the phase information of the true-state history, even though the initial state starts from a different phase (compare Fig. 9a with Fig. 8a). The diagonal elements of the forecast-error covariance matrix 𝗣^{f} (dashed) and analysis-error covariance matrix 𝗣^{a} (solid) for the SST (Fig. 9c) are concentrated in the eastern basin, while those of *q*_{0} are spread over the whole basin (Fig. 9d). This is not surprising because we have seen in sections 4a and 4b that SST forecast errors are the largest in the central and eastern basin, while the forecast errors for the wave amplitudes *q*_{n} are large across the whole basin (see especially Fig. 6). The improvements in both SST and *q*_{0} indicate that the SST information is transferred to the oceanic waves by the EKF assimilation. That the latter does capture the ocean–atmosphere coupling processes well is consistent with the large SST–ocean-wave cross correlations in Fig. 7.
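How an SST observation can correct an unobserved wave amplitude is easiest to see in a two-variable toy case. The sketch below applies the standard Kalman analysis step for one scalar point observation; the state `[SST, q0]`, the covariance values, and the function name `assimilate_scalar_obs` are hypothetical and serve only to illustrate the role of the cross covariance, not to reproduce the experiments above.

```python
def assimilate_scalar_obs(x_f, P_f, j, y, r):
    """Kalman analysis step for one scalar observation y of state component j.

    x_f: forecast state (list), P_f: forecast-error covariance (list of lists),
    r: observation-error variance.
    """
    n = len(x_f)
    innov_var = P_f[j][j] + r                        # H P^f H^T + R for a point observation
    gain = [P_f[i][j] / innov_var for i in range(n)]  # column j of P^f, scaled
    innovation = y - x_f[j]
    x_a = [x_f[i] + gain[i] * innovation for i in range(n)]
    # P^a = (I - K H) P^f
    P_a = [[P_f[i][k] - gain[i] * P_f[j][k] for k in range(n)] for i in range(n)]
    return x_a, P_a, gain

x_f = [0.0, 0.0]            # forecast: [SST anomaly, q0], illustrative units
P_f = [[1.0, 0.8],
       [0.8, 1.0]]          # strong SST-q0 cross covariance (hypothetical)
x_a, P_a, gain = assimilate_scalar_obs(x_f, P_f, j=0, y=1.0, r=0.25)
print(x_a, P_a[1][1])
```

With these numbers the unobserved *q*_{0} component moves from 0 to 0.64 and its error variance drops from 1.0 to 0.488, solely because of the off-diagonal term in `P_f`. Setting that term to zero leaves *q*_{0} untouched.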

We wish to determine the preferred observing location for the SST, so that the forecast errors are smallest when the unique SST observation is taken at that location. To do so, EKF experiments were performed for two other SST observation locations, in the western basin at 152°E and in the central basin at 177°W. The three single-SST experiments will be referred to by the index of the grid point at which the single observation is taken. It is found that the assimilation errors for SST observations are smallest when using the SST data in the eastern basin (Fig. 9), and increase when the SST observation moves toward the west.

The rms errors of the EKF assimilation across the basin width are shown in Fig. 10 for the model forecast without data assimilation, as well as for the three experiments with SST data located in the eastern, central, and western basin. The state-estimation errors when using an SST observation in the eastern basin at 108°W (denoted by SST_{20}) for the SST anomaly, Kelvin wave, and the first two Rossby wave amplitudes (heavy solid) are consistently quite small. Their being less than 10% of the maximal model-forecast errors (short dashes) indicates the effectiveness of the EKF scheme.

The assimilation errors with the single observation at 177°W (denoted by SST_{9}) are considerably larger, but still less than 50% of the maximal errors. The data assimilation experiment using an SST observation in the western basin at 152°E (denoted by SST_{4}) barely reduces the model forecast errors without data assimilation (short dashes). The latter assimilation experiment fails to recover the right phase of the true-state history, even though it recovers the correct magnitude of the anomalies (not shown). This is not surprising because in our simple coupled model, there is not much SST signal in the western basin at all (see Figs. 2a and 8a).

The results in Figs. 9 and 10 suggest that the single-section SST measurements in the eastern part of the basin are most valuable, those in the central basin are still quite useful but do not reduce the estimation errors as much as necessary, while those in the western basin are barely useful at all. The usefulness of SST data can be related to the corresponding forecast error at the observed location, since the SST errors are largest in the eastern part of the basin and decrease westward (Fig. 10a). In fact, roughly speaking, the contribution of an observation in an EKF-based assimilation is proportional to the corresponding forecast-error variance at the observed location; see Eqs. (4) and (10).
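The rough proportionality can be made explicit in the simplest setting. As a sketch, assume a single scalar observation of state component *j* with observation-error variance *r*; this notation is ours and need not match that of Eqs. (4) and (10). The standard Kalman gain and analysis-error variance for component *i* then reduce to

```latex
K_i = \frac{P^{f}_{ij}}{P^{f}_{jj} + r},
\qquad
P^{a}_{ii} = P^{f}_{ii} - \frac{\bigl(P^{f}_{ij}\bigr)^{2}}{P^{f}_{jj} + r}.
% At the observed point (i = j), when P^{f}_{jj} \ll r:
K_j \approx \frac{P^{f}_{jj}}{r}.
```

At a location where the forecast-error variance is small compared with *r*, the gain is thus approximately proportional to that variance and the observation changes little. Where the forecast-error variance is large, the gain approaches 1 and the observation dominates the analysis.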

#### 2) Subsurface data at a single location

The EKF assimilation of the subsurface data *q*_{n} at a single location in the eastern basin (111°W) works as well as that of the SST data nearby and even somewhat better (Figs. 11a,c). Note that we assumed all the ocean wave coefficients *q*_{n}, *n* = 0, 2, … , 14, are known at that location. The analysis-error covariances 𝗣^{a} (solid) are successfully reduced by the *q*_{n} updates and result in forecast-error covariances 𝗣^{f} (dashed) that are barely any larger (Figs. 11b,d). Alternate odd and even peaks in 𝗣^{f} for SST (Fig. 11c) correspond to positive and negative phases of the SST anomaly cycle (Fig. 8a).

The forecast-error correlations estimated by EKF at the end of 10 years are shown in Fig. 12. Compared to the LKF estimates for the coupled model (Fig. 7), the autocorrelations (Figs. 12a,d,g vs 7a,d,g) are quite similar but the cross correlations differ more noticeably. The latter differences are probably due to the fact that the EKF captures better than the LKF the evolution in the spatial error structure over a full ENSO cycle. Perhaps most telling is the difference in the structure of the cross correlations between SST and *q*_{0} (Fig. 12b vs Fig. 7b). The EKF must be capturing the nonlinearity in the subsurface temperature relation to thermocline depth that in turn affects SST. This relation is known to differ between a warm and a cold phase.

The preferred location to measure subsurface data is investigated by using ocean wave data *q*_{n} at a single location in the western (149°E), central (180°), or eastern basin (111°W). The improvements in the SST (Fig. 13a) and ocean wave fields (Figs. 13b–d) are very similar. This is to be expected because the forecast errors in Kelvin wave and Rossby wave amplitudes are comparable in different parts of the ocean basin (see Figs. 6b,c). Since strong ocean wave signals are present throughout the basin (see Figs. 2b,d and 8b,d), *q*_{n} observations taken at different locations are able to provide the information needed to recover the true state.

Uncoupled ocean wave propagation is the mechanism of ENSO memory in the delayed-oscillator “toy models” for ENSO (Suarez and Schopf 1988; Battisti and Hirst 1989; Jin 1997; Burgers 1999). If this were the case in less highly idealized models, one might think that observations in the western part of the basin would provide the most useful information. Hao and Ghil (1994) showed this to be true for the uncoupled ocean model of Cane and Patton (1984), which is slightly more realistic than the JN ocean model used here. They attributed this result in their model to the Kelvin waves' greater speed and hence efficiency in carrying information eastward.

The same could be true in the coupled situation, if the wind stress forcing were perfectly simulated by the model. Because the coupled model is imperfect, however, the western basin observational information—carried eastward, albeit quite efficiently, by Kelvin waves—becomes less accurate than the data observed in the eastern basin (Figs. 13b–d). Hence the *q*_{n} data from the central and eastern basin are just as useful in our coupled model as those from the western basin.

#### 3) Three SST observations

Additional assimilation experiments were carried out in which a number of distinct sections of SST and *q*_{n} data were used. These are included in Table 2, along with the results of the single-section experiments already discussed. We show 30-year rms values of Niño-3 SST, and those of ocean wave coefficients *q*_{0}, *q*_{2}, and *q*_{4} averaged over the whole basin.
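The summary statistics of this kind can be computed as sketched below. This is one plausible reading, stated as an assumption: the rms error is taken in time at each zonal grid point and then averaged over the basin. The exact averaging used for Table 2 is not spelled out here, and the function names and sample numbers are hypothetical.

```python
import math

def rms(values):
    """Root-mean-square of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def rms_error_in_time(estimate, truth):
    """RMS in time of (estimate - truth) for one scalar series, e.g. an SST index."""
    return rms([e - t for e, t in zip(estimate, truth)])

def basin_mean_rms_error(estimate, truth):
    """estimate[k][i], truth[k][i]: value at time step k and zonal grid point i.

    Returns the rms error in time at each grid point, averaged over all points.
    """
    n_x = len(truth[0])
    per_point = [
        rms([est[i] - tru[i] for est, tru in zip(estimate, truth)])
        for i in range(n_x)
    ]
    return sum(per_point) / n_x

# Tiny illustrative example: two time steps, two grid points.
truth = [[0.0, 0.0], [0.0, 0.0]]
estimate = [[3.0, 1.0], [4.0, 1.0]]
print(basin_mean_rms_error(estimate, truth))
```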

An EKF experiment with SST observations at three locations—177°, 139°, and 108°W—is denoted by 3*SST in Table 2. The errors in this case are much smaller than when using SST data in the west-central basin (177°W) only, but just slightly better than those with SST data in the east-central basin (139°W), or the eastern basin (108°W) alone. The small gain by using the additional SST data over a single, well-placed cross section suggests that the latter can deliver most of the benefits, at least in this simple model.

#### 4) Multiple subsurface sections

We saw that a single section of subsurface data gives assimilation results that are better than the best single-section results with SST data. The ocean wave results are almost independent of the single section's location, as the corresponding curves in Fig. 13 are almost indistinguishable. The numerical results in Table 2 show that, in fact, the farther east the section the better, albeit by a very small margin.

To see whether one can still improve, in our simple coupled model, the EKF results for subsurface data, we ran assimilation experiments with multiple *q*_{n} sections. We denote the case with two observations of *q*_{n} at 149°E and 111°W as the 2\**q*_{n} case, and the case with observations of *q*_{n} at five locations (143°E, 160°E, 180°, 142°W, and 111°W) as the 5\**q*_{n} case (Table 2).

The assimilation run with five sections does have the smallest errors overall, in both SST and Rossby waves, but the additional gains of using the 2\**q*_{n} and 5\**q*_{n} datasets over the eastern basin data at 111°W only are quite limited. In fact, the Kelvin wave error in the 5\**q*_{n} experiment is slightly worse than in other experiments with subsurface data, and both the *q*_{0} and *q*_{2} errors are slightly worse than in the *q*_{n20} experiment. These are subtle effects due to the model's distinct dispersion relation for the Kelvin versus the Rossby waves and its nonlinearity. We refer to Hao and Ghil (1994) for a discussion of the effect of wave dispersion on data assimilation in a linear, two-dimensional shallow-water model of the tropical Pacific and to Miller et al. (1994) for a similar discussion of the effect of nonlinearities in zero-dimensional but highly nonlinear models.

Such subtleties aside, one cross section of subsurface data appears rather satisfactory for this simple coupled model. Better spatial resolution and a more faithful representation of both oceanic and atmospheric processes could, however, increase the number of degrees of freedom that are active in the system and thus the number of observations required to track it accurately.

### c. Weights assigned by EKF

To understand how the EKF treats the data in different parts of the basin, the weights given to the observations in the two-section set of subsurface data are examined in Fig. 14. The weights shown here are snapshots at time step *t* = 1, 2, … , 30 years.

The observations in the eastern part of the basin are given much more weight overall, in particular when the affected variable is distinct from the observed one (see especially Figs. 14a,d). Since the strongest interaction between the SST and wave fields occurs in the eastern basin, it is reasonable to assign a larger weight to oceanic observations there, given the same observational accuracy, especially when correcting SST. This emphasis on eastern basin observations is smaller when considering the effect of *q*_{0} or *q*_{2} on each other (Figs. 14c,e). Note that the weight given to either *q*_{0} or *q*_{2} when updating the field that is being observed (Figs. 14b,f) is about 0.5 at the observation location itself. This confirms that our assimilation scheme is well balanced, in the sense that it assigns comparable weights to the current observations, on the one hand, and to the forecast based on past observations, on the other.
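The interpretation of a weight near 0.5 can be illustrated with a scalar heuristic. Assume, for illustration only, a single directly observed component *j* with observation *y* and observation-error variance *r*; in the experiments all *q*_{n} are observed together at a location, so the actual gain is a matrix and this scalar form is only a guide.

```latex
x^{a}_{j} = \bigl(1 - K_{jj}\bigr)\, x^{f}_{j} + K_{jj}\, y,
\qquad
K_{jj} = \frac{P^{f}_{jj}}{P^{f}_{jj} + r},
\qquad
K_{jj} = \tfrac{1}{2} \iff P^{f}_{jj} = r.
```

In this reading, a weight of about 0.5 means that the forecast-error variance at the observed point is comparable to the observation-error variance, so the analysis is close to the plain average of forecast and observation.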

We also find that the weights have similar shapes and magnitudes during different phases of ENSO, except for the weights given to *q*_{2} observations when updating SST (Fig. 14d). The shapes are still similar in the warm and cold phases, but the magnitude of the weights and the locations where the weights peak are shifted. During the warm phase, the *q*_{n} observations in the eastern basin (*q*_{n20}: black curves) are given larger weights in updating SST, and the largest weight occurs farther toward the basin's east coast than during the cold phase, while the observations in the western basin (*q*_{n4}: gray curves) are given smaller weights. During the cold phase, the observations in the western basin are given more weight and the maximum of the weights, for both *q*_{n4} and *q*_{n20} observations, occurs farther toward the central basin. The magnitudes of the weights during the transition phases of ENSO lie between the extreme-phase magnitudes. The EKF thus assimilates observations of oceanic waves in the eastern basin more effectively during El Niño, while the observations in the central basin are more useful during La Niña.

## 6. Conclusions

### a. Summary

In this paper, we studied data assimilation for a simple, but fully coupled tropical ocean–atmosphere model. Our coupled model is based on Jin and Neelin's (1993) and Neelin and Jin's (1993) (JN throughout the main text) “stripped down” version of Zebiak and Cane's (1987) model and consists of an upper-ocean model of the tropical Pacific and a steady-state atmospheric response to the ocean model's SST anomalies. An advanced data assimilation method, the extended Kalman filter (EKF), was used to tackle the model's nonlinearity. This coupled model is simple enough for the EKF to be applied to its full state space and can serve, therefore, as a benchmark for future data assimilation studies with more realistic and highly resolved coupled models.

Wind stress errors were assumed to be the sole source of model errors. This assumption is a useful simplification for two complementary reasons. First, wind stress errors are a dominant source of errors in the estimation and prediction of the tropical climate system (Miller and Cane 1989; Hao and Ghil 1994; Chen et al. 1999). Second, wind stress errors represent schematically the main effects of weather noise in the system.

We started by investigating the dynamical structure of the forecast errors in either the coupled or uncoupled framework, by applying the linearized Kalman filter (LKF), which propagates the error's covariance matrix linearly along a reference solution (Fig. 2). The forecast-error structure for the coupled model differs from that of the uncoupled one. The ocean–atmosphere coupling feedback enhances the forecast-error correlations, especially the cross correlations among the various oceanic waves (compare Figs. 5 and 7).

The coupling induces and reinforces, in particular, a seesaw feature along the equator, with error correlations of opposite sign in the oceanic waves on either side of the ocean basin. This seesaw pattern seems to result from the constraint imposed on the waves by the tropical ocean's Sverdrup balance between the surface forcing and the pressure gradient. The influence of wave reflection at the zonal boundaries is also enhanced in the coupled case. The difference between the forecast-error covariances of the uncoupled and coupled cases thus suggests that we pay special attention in passing from data assimilation in the former to the latter.

Next, we have shown that the EKF scheme can effectively estimate the tropical ocean–atmosphere state for the coupled model. To restore the phase and amplitude of its ENSO oscillation, the model needs only observations covering a single meridional section. Either sea surface temperature (SST) or subsurface data along such a section appear to suffice (Figs. 9 and 11).

The location of SST observations should be well chosen: a section located in the eastern basin tends to be the most useful (Fig. 10 and Table 2). The usefulness of SST observations decreases towards the western part of the basin. Subsurface data anywhere in the basin reduce the initial errors dramatically (Fig. 13), with the section in the eastern basin having the smallest errors (Table 2). The nonlocal response of the atmosphere to the SST anomalies ensures a smooth wind field and hence ocean wave fields that are smooth, too. Atmospheric observations of the wind stress, which is not a prognostic variable of the model, can be easily assimilated with the EKF scheme as well (not shown here).

### b. Discussion

The EKF extracts the information content of sparse observations, greatly enhancing the usefulness of a limited dataset. The small number of data needed to track the coupled model's periodic solution in the present model is probably due to the fact that a pair of oscillatory modes with a simple spatial structure dominate its evolution. This fact was demonstrated for the model at hand by JN and Hao et al. (1993).

The dominance of a few simple modes in the temporally irregular and spatially rich behavior of a number of related ENSO models was shown by Jin et al. (1994, 1996) and Tziperman et al. (1994), among others. Rasmusson et al. (1990), Keppenne and Ghil (1992), Jiang et al. (1995), and other authors also found a large fraction of the observed variance in the tropical Pacific's various climatic fields to be captured by a few oscillatory modes. Ghil and Jiang (1998) showed that predicting the possibly related quasi-biennial and low-frequency modes is crucial for the accuracy of both dynamical and statistical ENSO forecasts. Ghil and Robertson (2000) argued, therefore, that the evidence from a full hierarchy of climate models, up to and including general circulation models (GCMs), as well as from observations and real-time forecast skill, supports the dominance of a few oscillatory modes in ENSO variability on seasonal to interannual scales.

This situation for the tropical coupled system resembles that encountered for the midlatitude ocean or atmosphere: a very small number of data can suffice to track solutions of idealized models thereof (Todling and Ghil 1994; Ghil and Todling 1996; Ide and Ghil 1997a,b). Ghil (1997), in reviewing this work, concluded that the number of data required to track a model solution with the EKF can be comparable to the number of the model's dominant linearly unstable and nonlinearly equilibrated modes (see also Ghil and Ide 1994 and Dee 1995).

Indeed, our work, as well as other data assimilation studies for the tropical Pacific that rely on the Kalman filter (Cane et al. 1995, 1996) and the EKF (Verron et al. 1999), suggests that the number of observations needed can be as small as the number of degrees of freedom required to describe the system's evolution over the time interval of observation. As we move toward accurately describing a larger fraction of the variance, the number of modes required to capture that fraction increases, and a larger number of observations will be needed. This number can still be much smaller than the model's number of gridpoint or spectral coefficient variables, even for much more highly resolved and realistic models. Similar arguments apply to the size of a Monte Carlo ensemble required for the accurate and efficient performance of an ensemble Kalman filter (see also Keppenne 2000; Mitchell and Houtekamer 2000).

The forecast experiments carried out in Part II of our study using simulated data from a Tropical Ocean Global Atmosphere (TOGA)–Tropical Atmosphere Ocean (TAO)-like array assimilated into the model's oceanic component suggest that once a relatively large set of observations is available, initialization of the uncoupled ocean model might suffice for good coupled forecasts, given the atmosphere's rapid adjustment to the ocean state. Still, further improvements of the ENSO forecasts can be made by using the coupled model in the assimilation process. It is shown in Part II that the TOGA–TAO data are sufficient for good forecast skill in this intermediate model, and can be expected to provide accurate enough initial states for a more complete coupled model, when using the EKF.

The next step toward an advanced forecast and analysis cycle in a fully coupled GCM is to examine EKF or slightly suboptimal filter performance (Todling and Cohn 1994; Ghil 1997) in a somewhat more realistic coupled ocean–atmosphere model. Such a model could still have an oceanic component like the one used here. It would include, however, a prognostic atmospheric component, albeit a fairly simple one. For a coupled system with both components of GCM complexity, however, it is not clear how close to optimality a feasible data assimilation scheme can be. The requirement to assimilate the fairly sparse observations currently available into such a complex model, which has very different dominant time scales in its oceanic and atmospheric components, points to the need for advanced assimilation technology that yields initial fields consistent with the forecast dynamics.

## Acknowledgments

This work has evolved from ZH's doctoral work. CS thanks K. Ide for stimulating discussions and helpful suggestions on this work, and R. N. Miller for his help on generating white noise with a prescribed covariance. We thank the two anonymous reviewers for their constructive comments. MG also wishes to acknowledge the sabbatical hospitality of the Laboratoire de Météorologie Dynamique du CNRS, Ecole Normale Supérieure, Paris, and the support of a Visiting Chair of the French Académie des Sciences. Computational resources were provided mainly by the Department of Atmospheric Sciences, UCLA, the San Diego Supercomputer Center, and the Supercomputer Center of The Florida State University. This research was supported by NASA Grants NAG-5317 (MG and ZH), NAG 5-9294 (MG and CS), and NOAA Grant NA86GP0314 (JDN) at UCLA and by a UCAR fellowship for the IRICP Pilot Project at UCSD (ZH). ZH currently works in the IT industry and the other coauthors are thankful for his continued collaboration.

## REFERENCES

Anderson, D. L. T., , J. Sheinbaum, , and K. Haines, 1996: Data assimilation in ocean models.

,*Rep. Progr. Phys.***59****,**1209–1266.Balgovind, R., , A. Dalcher, , M. Ghil, , and E. Kalnay, 1983: A stochastic-dynamic model for the spatial structure of forecast error statistics.

,*Mon. Wea. Rev.***111****,**273–296.Battisti, D. S., , and A. C. Hirst, 1989: Interannual variability in a tropical atmosphere–ocean model: Influence of the basic state, ocean geometry and nonlinearity.

,*J. Atmos. Sci.***46****,**1687–1712.Behringer, D. W., , M. Ji, , and A. Leetmaa, 1998: An improved coupled model for ENSO prediction and implications for ocean initialization. Part I: The ocean data assimilation system.

,*Mon. Wea. Rev.***126****,**1013–1021.Bennett, A. F., , B. S. Chua, , D. E. Harrison, , and M. J. McPhaden, 1998: Generalized inversion of tropical atmosphere–ocean data and a coupled model of the tropical Pacific.

,*J. Climate***11****,**1768–1792.Bennett, A. F., , B. S. Chua, , D. E. Harrison, , and M. J. McPhaden, 2000: Generalized inversion of tropical atmosphere–ocean (TAO) data and a coupled model of the tropical Pacific. Part II: The 1995–96 La Niña and 1997–98 El Niño.

,*J. Climate***13****,**2770–2785.Boggs, D., , M. Ghil, , and C. Keppenne, 1995: A stabilized sparse-matrix U-D square-root implementation of a large-state extended Kalman filter.

*Proc. WMO Second Int. Symp. on Assimilation of Observations in Meteorology and Oceanography,*Tokyo, Japan, WMO/TD-No 651, 219–224.Burgers, G., 1999: The El Niño stochastic oscillator.

,*Climate Dyn.***15****,**521–531.Cane, M. A., , and R. J. Patton, 1984: Numerical model for low-frequency equatorial dynamics.

,*J. Phys. Oceanogr.***14****,**1853–1863.Cane, M. A., , S. E. Zebiak, , and S. C. Dolan, 1986: Experimental forecasts of El Niño.

,*Nature***321****,**827–832.Cane, M. A., , S. E. Zebiak, , and Y. Xue, 1995: Model studies of the long-term behavior of ENSO.

*Natural Climate Variability on Decade-to-Century Time Scales,*D. G. Martinson et al., Eds., National Academy Press, 442–457.Cane, M. A., , A. Kaplan, , R. N. Miller, , B. Tang, , E. C. Hackert, , and A. J. Busalacchi, 1996: Mapping tropical Pacific sea level: Data assimilation via a reduced state space Kalman filter.

,*J. Geophys. Res.***101****,**22599–22617.Chen, D., , M. A. Cane, , S. E. Zebiak, , and A. Kaplan, 1998: The impact of sea level data assimilation on the Lamont model prediction of the 1997/98 El Niño.

,*Geophys. Res. Lett.***25****,**2837–2840.Chen, D., , M. A. Cane, , and S. E. Zebiak, . 1999: The impact of NSCAT winds on predicting the 1997/98 El Niño: A case study with the Lamont–Doherty Earth Observatory model.

,*J. Geophys. Res.***104****,**11321–11327.Dee, D. P., 1995: On-line estimation of error variance parameters for atmospheric data assimilation.

,*Mon. Wea. Rev.***123****,**1128–1145.Fu, L-L., , I. Fukumori, , and R. N. Miller, 1993: Fitting dynamic models to the Geosat sea level observations in the tropical Pacific ocean. Part II: A linear, wind-driven model.

,*J. Phys. Oceanogr.***23****,**2162–2181.Fukumori, I., , and P. Malanotte-Rizzoli, 1995: An approximate Kalman filter for ocean data assimilation: An example with an idealized Gulf Stream model.

,*J. Geophys. Res.***100****,**6777–6793.Fukumori, I., , R. Raghunath, , L-L. Fu, , and Y. Chao, 1999: Assimilation of TOPEX/Poseidon altimeter data into a global ocean circulation model: How good are the results?

,*J. Geophys. Res.***104****,**25647–25665.Gaspari, G., , and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions.

,*Quart. J. Roy. Meteor. Soc.***125****,**723–757.Gelb, A., Ed.,. 1974:

*Applied Optimal Estimation*. The MIT Press, 374 pp.Ghil, M., 1997: Advances in sequential estimation for atmospheric and oceanic flows.

,*J. Meteor. Soc. Japan***75****,**289–304.Ghil, M., , and P. Malanotte-Rizzoli, 1991: Data assimilation in meteorology and oceanography.

*Advances in Geophysics,*Vol. 33, Academic Press, 141–266.Ghil, M., , and K. Ide, 1994: Extended Kalman filtering for vortex systems: An example of observing-system design.

*Data Assimilation: A New Tool for Modeling the Ocean in a Global Change Perspective,*P. Brasseur and J. C. H. Nihoul, Eds., Springer-Verlag, 167–191.Ghil, M., , and R. Todling, 1996: Tracking atmospheric instabilities with the Kalman filter. Part II: Two-layer Results.

,*Mon. Wea. Rev.***124****,**2340–2352.Ghil, M., , and N. Jiang, 1998: Recent forecast skill for the El Niño/Southern Oscillation.

,*Geophys. Res. Lett.***25****,**171–174.Ghil, M., , and A. W. Robertson, 2000: Solving problems with GCMs: General circulation models and their role in the climate modeling hierarchy.

*General Circulation Model Development: Past, Present and Future,*D. Randall, Ed., Academic Press, 285–325.Ghil, M., , S. E. Cohn, , J. Tavantzis, , K. Bube, , and E. Isaacson, 1981: Applications of estimation theory to numerical weather prediction.

, L. Bengtsson, M. Ghil, and E. Källén, Eds., Springer-Verlag, 139–224.*Dynamic Meteorology: Data Assimilation Methods*Gill, A. E., 1980: Some simple solutions for heat induced tropical circulation.

,*Quart. J. Roy. Meteor. Soc.***106****,**447–462.Graham, N. E., , T. P. Barnett, , and M. Latif, 1992: Considerations of the predictability of ENSO with a lower-order coupled model.

,*TOGA Notes***7****,**11–15.Hao, Z., 1994: Data assimilation for interannual climate-change prediction. Ph.D. dissertation, University of California, Los Angeles, 223 pp.

Hao, Z., and M. Ghil, 1994: Data assimilation in a simple tropical ocean model with wind-stress errors. *J. Phys. Oceanogr.*, **24**, 2111–2128.

Hao, Z., and M. Ghil, 1995: Sequential parameter estimation for a coupled ocean-atmosphere model. *Proc. WMO Second Int. Symp. on Assimilation of Observations in Meteorology and Oceanography*, Tokyo, Japan, WMO/TD-No. 651, 181–186.

Hao, Z., J. D. Neelin, and F-F. Jin, 1993: Nonlinear tropical air–sea interaction in the fast-wave limit. *J. Climate*, **6**, 1523–1544.

Ide, K., and M. Ghil, 1997a: Extended Kalman filtering for vortex systems. Part I: Methodology and point vortices. *Dyn. Atmos. Oceans*, **27**, 301–332.

Ide, K., and M. Ghil, 1997b: Extended Kalman filtering for vortex systems. Part II: Rankine vortices and observing-system design. *Dyn. Atmos. Oceans*, **27**, 333–350.

Ide, K., P. Courtier, M. Ghil, and A. C. Lorenc, 1997: Unified notation for data assimilation: Operational, sequential and variational. *J. Meteor. Soc. Japan*, **75**, 181–189.

Jiang, N., J. D. Neelin, and M. Ghil, 1995: Quasi-quadrennial and quasi-biennial variability in COADS equatorial Pacific sea surface temperature and winds. *Climate Dyn.*, **12**, 101–112.

Jin, F-F., 1997: An equatorial ocean recharge paradigm for ENSO. Part I: Conceptual model. *J. Atmos. Sci.*, **54**, 811–829.

Jin, F-F., and J. D. Neelin, 1993: Modes of interannual tropical ocean–atmosphere interaction—A unified view. Part I: Numerical results. *J. Atmos. Sci.*, **50**, 3478–3503.

Jin, F-F., J. D. Neelin, and M. Ghil, 1994: El Niño on the Devil's staircase: Annual subharmonic steps to chaos. *Science*, **264**, 70–72.

Jin, F-F., J. D. Neelin, and M. Ghil, 1996: El Niño/Southern Oscillation and the annual cycle: Subharmonic frequency-locking and aperiodicity. *Physica D*, **98**, 442–465.

Keppenne, C. L., 2000: Data assimilation into a primitive-equation model with a parallel ensemble Kalman filter. *Mon. Wea. Rev.*, **128**, 1971–1981.

Keppenne, C. L., and M. Ghil, 1992: Adaptive filtering and prediction of the Southern Oscillation index. *J. Geophys. Res.*, **97**, 20449–20454.

Latif, M., and Coauthors, 1998: A review of the predictability and prediction of ENSO. *J. Geophys. Res.*, **103**, 14375–14393.

Lee, T., J-P. Boulanger, A. Foo, L-L. Fu, and R. Giering, 2000: Data assimilation by an intermediate coupled ocean-atmosphere model: Application to the 1997–1998 El Niño. *J. Geophys. Res.*, **105**, 26063–26087.

Leetmaa, A., and M. Ji, 1989: Operational hindcasting of the tropical Pacific. *Dyn. Atmos. Oceans*, **13**, 465–490.

Miller, R. N., and M. A. Cane, 1989: A Kalman filter analysis of sea level height in the tropical Pacific. *J. Phys. Oceanogr.*, **19**, 773–790.

Miller, R. N., M. Ghil, and F. Gauthiez, 1994: Advanced data assimilation in strongly nonlinear dynamical systems. *J. Atmos. Sci.*, **51**, 1037–1056.

Miller, R. N., A. J. Busalacchi, and E. C. Hackert, 1995: Sea surface topography fields of the tropical Pacific from data assimilation. *J. Geophys. Res.*, **100**, 13389–13425.

Miller, R. N., E. F. Carter Jr., and S. T. Blue, 1999: Data assimilation into nonlinear stochastic models. *Tellus*, **51A**, 167–194.

Mitchell, H. L., and P. L. Houtekamer, 2000: An adaptive ensemble Kalman filter. *Mon. Wea. Rev.*, **128**, 416–433.

Moron, V., R. Vautard, and M. Ghil, 1998: Trends, interdecadal and interannual oscillations in global sea-surface temperatures. *Climate Dyn.*, **14**, 545–569.

Neelin, J. D., 1990: A hybrid coupled general circulation model for El Niño studies. *J. Atmos. Sci.*, **47**, 674–693.

Neelin, J. D., and F-F. Jin, 1993: Modes of interannual tropical ocean–atmosphere interaction—A unified view. Part II: Analytical results in the weak-coupling limit. *J. Atmos. Sci.*, **50**, 3504–3522.

Neelin, J. D., M. Latif, and F-F. Jin, 1994: Dynamics of coupled ocean-atmosphere models: The tropical problem. *Annu. Rev. Fluid Mech.*, **26**, 617–659.

Neelin, J. D., D. S. Battisti, A. C. Hirst, F-F. Jin, Y. Wakata, T. Yamagata, and S. E. Zebiak, 1998: ENSO theory. *J. Geophys. Res.*, **103**, 14261–14290.

Ngodock, H. E., B. S. Chua, and A. F. Bennett, 2000: Generalized inverse of a reduced gravity primitive equation ocean model and tropical atmosphere–ocean data. *Mon. Wea. Rev.*, **128**, 1757–1777.

Parrish, D. F., and S. E. Cohn, 1985: A Kalman filter for a two-dimensional shallow water model: Formulation and preliminary experiments. Office Note 304, National Meteorological Center, Washington, DC, 64 pp.

Penland, C., 1996: A stochastic model of IndoPacific sea surface temperature anomalies. *Physica D*, **98**, 534–558.

Philander, S. G., 1990: *El Niño, La Niña, and the Southern Oscillation*. Academic Press, 293 pp.

Rasmusson, E. M., X. Wang, and C. F. Ropelewski, 1990: The biennial component of ENSO variability. *J. Mar. Syst.*, **1**, 71–96.

Sheinbaum, J., and D. L. T. Anderson, 1990: Variational assimilation of XBT data. Part II. Sensitivity studies and use of smoothing constraints. *J. Phys. Oceanogr.*, **20**, 689–704.

Suarez, M. J., and P. S. Schopf, 1988: A delayed action oscillator for ENSO. *J. Atmos. Sci.*, **45**, 3283–3287.

Todling, R., and S. E. Cohn, 1994: Suboptimal schemes for atmospheric data assimilation based on the Kalman filter. *Mon. Wea. Rev.*, **122**, 2530–2557.

Todling, R., and M. Ghil, 1994: Tracking atmospheric instabilities with the Kalman filter. Part I: Methodology and one-layer results. *Mon. Wea. Rev.*, **122**, 183–204.

Tziperman, E., L. Stone, M. A. Cane, and H. Jarosh, 1994: El Niño chaos: Overlapping of resonances between the seasonal cycle and the Pacific ocean–atmosphere oscillator. *Science*, **264**, 72–74.

Verron, J., L. Gourdeau, D. T. Pham, R. Murtugudde, and A. J. Busalacchi, 1999: An extended Kalman filter to assimilate satellite altimeter data into a nonlinear numerical model of the tropical Pacific Ocean: Method and validation. *J. Geophys. Res.*, **104**, 5441–5458.

Zebiak, S. E., and M. A. Cane, 1987: A model El Niño–Southern Oscillation. *Mon. Wea. Rev.*, **115**, 2262–2278.

## APPENDIX A

### The Model

The model description here is brief and we refer the reader to Jin and Neelin (1993) for details. The feedback between the ocean and atmosphere takes place every time step.

#### The ocean

The ocean dynamics is described by linear shallow-water equations for the currents and a nonlinear equation for the sea surface temperature (SST). The dynamical variables are the three velocity components, (*u,* *υ,* *w*), and the thermocline depth anomaly *h.* We list and explain the parameter values in Table 1.

##### SST equation

In the SST equation, *T* is the temperature of the surface mixed layer, *u*_{1} the zonal current in this surface layer, *w* the vertical surface current, and *υ*_{N} the meridional surface current at the northern boundary of the equatorial box. Symmetry of SST and antisymmetry of *υ*_{N} about the equator are assumed.

The equilibrium value *T*_{0} of SST is set at 29°C, *ϵ*_{T} = (90 days)^{−1} is the Newtonian damping rate, *L*_{y} is the width of the box, and *T*_{N} is the off-equatorial SST at a distance *L*_{y} from the equator. The depths *H*_{1} and *H*_{2} of the two layers are taken here to be 50 and 100 m, while *H*_{1.5} = 75 m is the depth scale that characterizes upwelling of the subsurface temperature *T*_{sub}.

A smooth approximation of the Heaviside function *H*(*x*) is used here; see Eq. (B6) in appendix B.
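A smooth step is convenient because it has a well-defined derivative everywhere, which matters when a model is linearized about its state. The sketch below illustrates the idea with a hyperbolic-tangent form. This form, the function names, and the `scale` parameter are assumptions made for illustration only; the approximation actually used is the one given in Eq. (B6), which may differ.

```python
import numpy as np

def smooth_heaviside(x, scale):
    """Smooth stand-in for the Heaviside step: 0.5 * (1 + tanh(x / scale)).

    `scale` (hypothetical name) must be positive and sets the width of the
    transition; it plays the role of a scaling factor for the argument.
    """
    return 0.5 * (1.0 + np.tanh(np.asarray(x, dtype=float) / scale))

def smooth_heaviside_derivative(x, scale):
    """Analytic derivative of smooth_heaviside with respect to x."""
    return 0.5 / (scale * np.cosh(np.asarray(x, dtype=float) / scale) ** 2)

# As scale shrinks, the smooth function approaches the sharp step.
x = np.linspace(-3.0, 3.0, 7)
for scale in (1.0, 0.1):
    print(scale, np.round(smooth_heaviside(x, scale), 3))
```

A smaller `scale` gives a sharper switch but a larger peak derivative at the origin, so the choice trades fidelity to the step against smoothness of the linearization.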

The meridional current *υ*_{N} can be obtained by finite differencing of the continuity equation:

The subsurface temperature *T*_{sub} is parameterized as a function of the thermocline depth anomaly *h*, with a deeper thermocline associated with warmer upwelled waters:

##### Ocean currents

The vertical-mean currents are described by the linear shallow-water equations on a *β*-plane in the long-wave approximation. Here *τ* is the zonal wind stress, *ρ* is the oceanic density, *H* = *H*_{1} + *H*_{2} is the total depth of the two layers, and *ϵ* = (2.5 yr)^{−1} is the damping rate for the vertical-mean current. The relative adjustment time coefficient *δ* measures the ratio of the timescale of adjustment by oceanic dynamics to the net timescale of SST change through the SST equation.

The coordinates *x*_{E} and *x*_{W} locate the eastern and western boundaries, respectively.

The shallow-water equations are then written with *q*, *r*, and *υ* as the dependent variables, in nondimensional units. Following JN, we truncate the equations so obtained to include only the Kelvin mode and the first seven symmetric Rossby modes. The resulting equations for the oceanic wave coefficients *q*_{n} are:

Here *τ*_{n} is the zonal wind stress projected onto oceanic mode *n*, *q*_{0} is the amplitude of the Kelvin wave, and *q*_{n}, *n* = 2, 4, …, 14, are the amplitudes of the first seven Rossby waves.
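The truncation can be made concrete with a short sketch of the retained mode indices. The phase-speed formula used below is the standard long-wave result for this kind of expansion, in units of the Kelvin wave speed; it is an assumption here, since the wave equations themselves are not reproduced in this appendix, and the function name `longwave_speed` is hypothetical.

```python
import numpy as np

# Mode indices retained in the truncation: n = 0 is the Kelvin mode and
# n = 2, 4, ..., 14 are the first seven symmetric Rossby modes.
mode_indices = np.arange(0, 16, 2)

def longwave_speed(n):
    """Nondimensional zonal phase speed of mode n (assumed formula).

    Kelvin mode: +1 (eastward). Rossby mode q_n: -1 / (2n - 1) (westward).
    """
    return 1.0 if n == 0 else -1.0 / (2 * n - 1)

speeds = np.array([longwave_speed(n) for n in mode_indices])

# Time to cross the basin, in units of the Kelvin-wave crossing time.
crossing_times = 1.0 / np.abs(speeds)

for n, c, t in zip(mode_indices, speeds, crossing_times):
    kind = "Kelvin" if n == 0 else "Rossby"
    print(f"n={n:2d} {kind:6s} speed={c:+.4f} crossing time={t:.0f}")
```

Under this assumption the gravest Rossby mode travels at one third of the Kelvin speed, and the slowest retained mode takes many Kelvin crossing times to traverse the basin, which is why higher modes contribute progressively less to the fast adjustment.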

The horizontal components *u*_{s} and *υ*_{s} of the vertical-shear current are governed by steady-state equations dominated by damping due to interfacial stress between the layers (Zebiak and Cane 1987):

where *ϵ*_{s} = (2 days)^{−1} is the damping coefficient for the shear currents.

The vertical component *w*_{s} of the shear currents can be calculated from the continuity equation using *u*_{s} and *υ*_{s}:

For simplicity, the three components of the shear current are written as:

Here *b*_{u} ≈ *H*_{2}/*H*_{1}*ϵ*_{s} and *b*_{w} ≈ (*H*_{1}/*L*_{d})*b*_{u}, and *L*_{d} is the characteristic meridional length scale determined by the damping timescale of vertical mixing: *L*_{d} ≡ *ϵ*_{s}/*β*.
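To give a feel for the magnitudes involved, the following sketch evaluates these scales numerically. The layer depths and the damping rate come from the text; the value of *β* is a typical equatorial value assumed here, and the expression for *b*_{u} is read as *H*_{2}/(*H*_{1}*ϵ*_{s}), which is an interpretation of the inline formula rather than a confirmed one.

```python
SECONDS_PER_DAY = 86400.0

H1 = 50.0                               # surface-layer depth (m), from the text
H2 = 100.0                              # lower-layer depth (m), from the text
eps_s = 1.0 / (2.0 * SECONDS_PER_DAY)   # shear damping rate (1/s), i.e. (2 days)^-1
beta = 2.28e-11                         # ASSUMED equatorial beta (1/(m s)); not given in the text

L_d = eps_s / beta                      # meridional length scale (m)
b_u = H2 / (H1 * eps_s)                 # ASSUMED reading of b_u ~ H2 / (H1 eps_s), in s
b_w = (H1 / L_d) * b_u                  # b_w ~ (H1 / L_d) b_u, in s

print(f"L_d = {L_d / 1e3:.0f} km")
print(f"b_u = {b_u / SECONDS_PER_DAY:.1f} days")
print(f"b_w = {b_w:.1f} s")
```

With these assumed inputs, *L*_{d} comes out at roughly 250 km, so the shear currents are confined to a narrow band about the equator.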

#### The atmosphere

The response of the wind stress anomaly *τ*′ to SST anomalies *T*′ at the equator is:

Here *A* is an amplitude factor, *ϵ*_{a} is a Rayleigh friction coefficient due to boundary layer turbulence, and *μ* is an ocean–atmosphere coupling parameter.

#### Climatological state and coupling

… *x*_{0} = 0.57*L*, where *L* is the basin width.

The wind stress *τ* consists of a climatological part and an anomalous part *τ*′; the anomaly *τ*′ is derived from the atmospheric response to SST anomalies *T*′, the departures of *T* from its climatological state. A parameter *δ*_{s} controls the anomalous surface-layer currents due to coupling. These currents can thus be reduced or even turned off completely to examine their effects without affecting the climatology:

Here *u*_{s}, *υ*_{s}, *w*_{s} are the velocity components associated with the climatological wind stress, while *u*′_{s}, *υ*′_{s}, *w*′_{s} are those associated with the wind stress anomalies. The climatological *u*_{s} does not depend on *δ*_{s}.

## APPENDIX B

### The Model-Error Covariance 𝗤

*q*_{k} represents the state vector for all the *q*_{n}, *n* = 0, 2, …, 14, at time step *k*, and *e*_{k} is the wind stress error at *k*. Since the oceanic waves have simple linear dynamics, *L*_{τ}( · ) and *L*_{q}( · ) are linear operators, while *z*( · , · ) and *g*( · , · , · ) are nonlinear functions associated with the advection terms in the SST equation.

Because of the interaction of the model state with the random noise *e*_{k} in the wind stress, the function *g* is very rough and cannot be integrated in the traditional Riemann or even Lebesgue sense. Of the two forms of calculus available for stochastic processes, we use the Stratonovich calculus here. Unlike the often-used Itô calculus, it obeys the ordinary chain rule of differentiation (e.g., Penland 1996).
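The practical difference between the two calculi can be seen in a minimal numerical sketch, assuming a scalar test equation d*X* = *σX* d*W* with multiplicative noise. This test problem, the parameter values, and the choice of schemes are illustrative assumptions and are not taken from the coupled model. The Euler-Maruyama scheme converges to the Itô solution, while the Heun (predictor-corrector) scheme converges to the Stratonovich solution, for which the ordinary chain rule gives *X* = exp(*σW*).

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, t_end, n_steps = 0.5, 1.0, 20000
dt = t_end / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)   # Brownian increments
W_end = dW.sum()

x_ito = 1.0      # Euler-Maruyama: integrand evaluated at the start of each step
x_strat = 1.0    # Heun: integrand averaged over the start value and a predictor
for dw in dW:
    x_ito += sigma * x_ito * dw
    predictor = x_strat + sigma * x_strat * dw
    x_strat += 0.5 * sigma * (x_strat + predictor) * dw

# Closed-form solutions for X(0) = 1 along the same Brownian path.
exact_strat = np.exp(sigma * W_end)                          # ordinary chain rule
exact_ito = np.exp(sigma * W_end - 0.5 * sigma**2 * t_end)   # extra Ito correction

print(f"Stratonovich: numerical {x_strat:.4f}, exact {exact_strat:.4f}")
print(f"Ito:          numerical {x_ito:.4f}, exact {exact_ito:.4f}")
```

The two solutions differ by the factor exp(*σ*²*t*/2), so the interpretation chosen for the noise term is part of the model specification and not merely a numerical detail.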

The components *η*^{T}_{i} at grid point *i* (and time step *k*) are then computed from the wind stress errors *e*_{i} and *e*_{i+1} (both at time step *k*), at grid points *i* and *i* + 1, according to:

Here *b*_{u} and *b*_{w} are the coefficients associated with the shear currents, as in Eqs. (A17)–(A19), while Δ*t* and Δ*x* are the time step and grid spacing, respectively. Note that we choose a smooth function of *x* to represent the Heaviside function *H*(*x*) of Eq. (A2):

In (B5), *w*_{0} and *υ*_{0} are appropriate scaling factors for the respective arguments of this smooth function.

The model-error covariance 𝗤 is obtained from *ηη*^{T}, where *η*^{T} is the transpose of the vector *η*.
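As a rough illustration of how a covariance of this kind can be assembled, the sketch below forms a sample estimate of the expected outer product of *η* with itself. The expression used for *η* is a placeholder that only mimics the dependence on wind stress errors at adjacent grid points; it is not Eq. (B5), and all names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_grid, n_samples = 6, 200_000
sigma_e = 0.1          # hypothetical standard deviation of the wind stress error
dt_over_dx = 1.0       # hypothetical nondimensional ratio of time step to grid spacing

# Independent wind stress errors e_i at n_grid + 1 points, one row per sample.
e = rng.normal(0.0, sigma_e, size=(n_samples, n_grid + 1))

# Placeholder for Eq. (B5): eta_i depends only on e_i and e_{i+1}.
eta = dt_over_dx * (e[:, 1:] - e[:, :-1])

# Sample estimate of the expected outer product (eta has zero mean by construction).
Q_sample = eta.T @ eta / n_samples

# Analytic covariance for this placeholder: tridiagonal, 2 on the diagonal, -1 beside it.
Q_exact = (sigma_e * dt_over_dx) ** 2 * (
    2.0 * np.eye(n_grid) - np.eye(n_grid, k=1) - np.eye(n_grid, k=-1)
)

print(np.round(Q_sample / sigma_e**2, 2))
```

Because each component of this placeholder *η* shares one wind stress error with its neighbor, the resulting matrix is banded: adjacent grid points are correlated and distant ones are not.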

Table 1. Parameter values in the coupled ocean–atmosphere model

Comparison of forecast run and assimilation runs