## 1. Introduction

Data assimilation provides a framework for combining the information about the state of the ocean contained in an incomplete data stream with our knowledge of ocean dynamics embodied in a model. The problem of data assimilation may be formulated in statistical terms: because of uncertainty in both observations and models, an estimate of the state of the ocean at any given time is considered to be a realization of a random variable. An estimate of the ocean state is produced as a blend of observations and a model forecast, weighted according to prior knowledge of the error statistics of each, together with some measure of the uncertainty in the estimate. Assimilation methods differ primarily in how they estimate the error statistics associated with the forward (dynamical) model, the so-called background or forecast error statistics. Since an accurate representation of the observation and forecast error statistics is crucial to successful data assimilation, considerable effort has been devoted to this problem.

One simplifying assumption that is often made is that the forecast error statistics do not change significantly with time and thus can be approximated by a constant probability distribution. This is the basis of the optimal interpolation (OI) data assimilation scheme, also known as statistical interpolation (e.g., Daley 1991, chapters 4 and 5). The alternative is to allow the probability distribution to evolve in time. An example of such a data assimilation scheme is the Kalman filter (Kalman 1960), in which the model and data errors are assumed to be normally distributed and the forecast error covariance matrix is evolved prognostically. The Kalman filter can be shown to give an optimal estimate in the case of linear dynamics and a linear observation operator. To account for nonlinear processes, a generalization of the Kalman filter, the extended Kalman filter, uses instantaneous linearization (and often a truncation) of the model equations to update the error covariance matrix and the full equations to update the model forecast (e.g., Daley 1991; Ghil and Malanotte-Rizzoli 1991). However, time stepping the forecast error covariance matrix is computationally expensive, rendering this method impractical for high-resolution general circulation models. Under certain conditions it is possible to use an asymptotic Kalman filter (e.g., Fukumori et al. 1993), in which a steady-state covariance matrix replaces the time-evolving one. An ensemble Kalman filter (EnKF) was introduced by Evensen (1994) based on a Monte Carlo technique in which the forecast error statistics are computed from an ensemble of model states evolving simultaneously. The methodology of the EnKF was further refined by adding perturbations to the observations (e.g., Burgers et al. 1998) to maintain consistent variance in the ensemble analysis.
An application of this method with the Poseidon ocean model used in this study has been developed by Keppenne and Rienecker (2002, 2003). Zhang and Anderson (2003) describe an ensemble adjustment Kalman filter (EAKF), another modification of the Kalman filter based on a Monte Carlo approach, and compare it with an ensemble OI scheme (a time-invariant forecast error covariance whose spatial structure is derived from a collection of state vectors) as well as with an OI scheme with functionally prescribed covariances. They conclude that, when applied to a simple atmospheric model, an ensemble OI can produce reasonably good assimilation results if the covariance matrix is chosen appropriately.

This study focuses on the importance of the multivariate aspect of the forecast error covariance in the context of OI data assimilation. Given a sufficiently dense observing network, the background error structure can be estimated from analyses of spatial and temporal decorrelation scales, as done in numerous meteorological applications (Ghil and Malanotte-Rizzoli 1991; Derber et al. 1991). However, even for atmospheric data assimilation, the observing system is not adequate to support a full calculation of the background error covariance statistics; hence model forecasts are often used for error estimation, as, for example, in the “NMC (National Meteorological Center) method” (Derber et al. 1991).

The vastness and complexity of the domain and the relative scarcity of oceanographic observations would require additional simplifying assumptions in similar calculations. To avoid imposing severe restrictions on the error covariance calculation due to limited data availability, this paper explores the efficacy of estimating the forecast error from an ensemble of model integrations. Several studies have used a Monte Carlo approach to estimate the forecast error covariance structure from an ensemble of assimilation integrations with randomly perturbed observations (Fisher and Andersson 2001) or with randomly perturbed background states and observations (Buehner 2005). Houtekamer et al. (1996) use an ensemble in which the uncertain elements of the forecast system are perturbed in different ways for different ensemble members, including perturbations to the observations, to the model’s parameters, and to the surface fields. A Monte Carlo technique similar to the EnKF is used here. An important advantage of using an ensemble of ocean states is that it provides a natural way to estimate cross covariances between the fields of the different physical variables that make up the model-state vector, while incorporating model balance relations and the influence of boundaries. Multivariate forecast error covariance matrices have been used in the oceanographic context, for example, to relate tide gauge data (Cane et al. 1996) and surface velocity data (Oke et al. 2002) to the dynamically varying quantities in the water column below.

There are many questions that arise with this approach. For example, how large should the ensemble be and, more generally, how should it be generated? Other questions relate to the underlying assumptions in the OI algorithm that the error statistics are stationary and unbiased. Will a one-time estimate of the forecast error, derived from a Monte Carlo ensemble, be a good representation of this error at another time, or at any time during the assimilation? In other words, how variable is the forecast error covariance structure, and what are its dominant time scales? Can this information be acquired and, if so, used to improve the assimilation scheme?

The primary interest of this study is ocean phenomena taking place on seasonal-to-interannual time scales. One example of such phenomena is the quasi-regular occurrence of El Niño—a large-scale warming of near-surface temperature in the eastern equatorial Pacific Ocean accompanied by a basinwide perturbation in the tilt of the thermocline across the equatorial ocean (e.g., Philander 1990). The estimate of error statistics derived below attempts to capture errors associated with such variability. The logical organization of the paper is as follows. Next, the OI assimilation algorithm, model, and data are described (section 2). Then the forecast error covariance model, a traditional Gaussian model of the forecast error covariance, and the empirical multivariate model of interest are detailed (section 3). Then the multivariate error covariance model properties are explored (section 4). After the experimental setup is described, the results of multivariate assimilation are compared with univariate assimilation (section 5). The paper concludes with a discussion of the results and further directions of research (section 6).

## 2. OI assimilation

### a. OI framework

A detailed discussion of sequential data assimilation algorithms can be found in earlier literature (see, e.g., Lorenc 1986; Daley 1991; Cohn 1997). Here, only a brief outline is given to introduce necessary terminology and notation.

The aim of a data assimilation algorithm is to determine the best estimate of the state vector based on the estimates available from both model and observations. A dynamic (prediction) model can be represented in terms of a nonlinear operator $\Psi(\mathbf{x})$, where $\mathbf{x}$ is a state vector of length $n_x$. Let $\mathbf{d}$ denote a vector of observations of dimension $n_d$, where typically $n_d \ll n_x$ for ocean applications; an element of $\mathbf{d}$ is not necessarily an element of the state vector $\mathbf{x}$. A discrete form of the model can be written as

$$\mathbf{x}_k = \Psi_{k-1}(\mathbf{x}_{k-1}), \tag{1}$$

where $\mathbf{x}_k$ is the forecast state vector at time level $k$, and $\Psi_{k-1}$ is the numerical approximation to the set of model equations describing the evolution of the state forward from time $k-1$ to $k$. Similarly, observations available at time $k$ can be denoted as $\mathbf{d}_k$ and the observation transformation operator as $H_k(\mathbf{x}_k)$. The analysis $\mathbf{x}_k^a$ is given by

$$\mathbf{x}_k^a = \mathbf{x}_k^f + \mathsf{K}_k\left[\mathbf{d}_k - H_k(\mathbf{x}_k^f)\right]. \tag{2}$$

Here superscript $f$ stands for the forecast and $a$ stands for the analysis. The sequential data assimilation schemes that have the form of Eq. (2) differ from each other in the weight matrix $\mathsf{K}_k$, often called the *gain matrix*.

The gain $\mathsf{K}_k$ can be defined under certain assumptions about the error statistics. Most sequential data assimilation algorithms assume that the observational and model errors are unbiased, white in time, and uncorrelated with each other, and that their spatial covariances are known (usually it is assumed that, at least initially, the errors are Gaussian). The observational error may also include any error of representation of the processes of interest, although such errors will not in general satisfy the assumption of a white, Gaussian sequence. Under these assumptions, for a linear model and a linear observation transformation operator, $H_k \equiv \mathsf{H}_k$, the optimal $\mathsf{K}_k$ is given by

$$\mathsf{K}_k = \mathsf{P}_k^f\mathsf{H}_k^T\left(\mathsf{H}_k\mathsf{P}_k^f\mathsf{H}_k^T + \mathsf{R}_k\right)^{-1}. \tag{3}$$

Here $\mathsf{P}_k^f$ is the forecast error covariance matrix, which, in general, is time dependent, and $\mathsf{R}_k$ is the observational error covariance matrix. For a high-resolution ocean model with the number of state variables on the order of $10^6$, $\mathsf{P}_k^f$ is extremely expensive to store and evaluate in full. Thus, numerous approaches have been suggested to simplify the computation of $\mathsf{P}_k^f$. The traditional OI method assumes that $\mathsf{P}_k^f \equiv \mathsf{P}$ is approximately constant in time. The observational error covariance matrix $\mathsf{R}$ is often assumed to be diagonal and to contain only the variance of the measurement error due to instrumental imperfection and unresolved small-scale signal. There are means of allowing for simple time evolution of the forecast error variance (see, e.g., Ghil and Malanotte-Rizzoli 1991; Rienecker and Miller 1991), but they are not considered here. A full evolution of $\mathsf{P}_k^f$ would be a Kalman filter.
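For a small, dense problem, the analysis of Eq. (2) with the gain of Eq. (3) and a time-invariant $\mathsf{P}$ can be sketched in a few lines of NumPy. This is a minimal illustration, not the operational implementation: the function name, the 5-point state, and the Gaussian-shaped $\mathsf{P}$ are assumptions chosen only to show the algebra, and dense matrices are feasible only at toy sizes.

```python
import numpy as np

def oi_analysis(x_f, d, H, P, R):
    """One OI analysis step with a linear observation operator:
    x_a = x_f + K (d - H x_f),  K = P H^T (H P H^T + R)^{-1}."""
    S = H @ P @ H.T + R              # innovation covariance, n_d x n_d
    innovation = d - H @ x_f         # observation minus forecast
    w = np.linalg.solve(S, innovation)  # avoids forming S^{-1} explicitly
    return x_f + P @ H.T @ w

# Toy example (illustrative values): 5 state points, observations at points 1 and 3.
n_x = 5
idx = np.arange(n_x)
P = np.exp(-((idx[:, None] - idx[None, :]) / 2.0) ** 2)  # assumed Gaussian-shaped covariance
H = np.zeros((2, n_x))
H[0, 1] = 1.0
H[1, 3] = 1.0
R = 0.25 * np.eye(2)                  # (0.5)^2, i.e. sigma_TAO = 0.5 in model units
x_f = np.zeros(n_x)
d = np.array([1.0, -1.0])
x_a = oi_analysis(x_f, d, H, P, R)
```

Because $\mathsf{R}$ is nonzero, the analysis moves toward each observation without reaching it, and the symmetric, opposite-signed innovations cancel at the midpoint between the two observed points.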

The effects of nonlinear dynamics and inhomogeneities associated with ocean boundaries are implicitly taken into account when the empirical forecast error covariance matrix 𝗣 is constructed from model integrations as presented in the next section.

### b. Model and forcing

The model used for this study is the Poseidon reduced-gravity, quasi-isopycnal ocean model introduced by Schopf and Loughe (1995) and used by Keppenne and Rienecker (2002, 2003) for testing the ensemble Kalman filter. The model described by Schopf and Loughe (1995) has been updated to include the effects of salinity (e.g., Yang et al. 1999). The model was shown to provide realistic simulations of tropical Pacific climatology and variability (Borovikov et al. 2001). Explicit details about the model are provided in Schopf and Loughe (1995). The prognostic variables are layer thickness, temperature, salinity, and the zonal and meridional current components. The generalized vertical coordinate of the model includes a turbulent well-mixed surface layer with entrainment parameterized according to a Kraus–Turner (1967) bulk mixed-layer model.

For this study, the domain is restricted to the Pacific Ocean (45°S to 65°N) with realistic land boundaries. At the southern boundary the model temperature and salinity are relaxed to the Levitus and Boyer (1994) climatology. The horizontal resolution of the model is 1° in longitude; in the meridional direction a stretched grid is used, varying from 1/3° at the equator to 1° poleward of 10°S and 10°N. The effects of vertical diffusion, computed at 3-h intervals with an implicit scheme, are parameterized using Richardson-number-dependent vertical mixing following Pacanowski and Philander (1981). The diffusion coefficients are enhanced when needed to simulate convective overturning in cases of gravitationally unstable density profiles. Horizontal diffusion is also applied daily using an eighth-order Shapiro (1970) filter. The net surface heat flux is estimated using the atmospheric mixed-layer model of Seager et al. (1995) with monthly averaged time-varying air temperature and specific humidity from the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) reanalysis (e.g., Kalnay et al. 1996), climatological shortwave radiation from the Earth Radiation Budget Experiment (ERBE) (e.g., Harrison et al. 1993), and climatological cloudiness from the International Satellite Cloud Climatology Project (ISCCP) (e.g., Rossow and Schiffer 1991).

Surface wind stress forcing is obtained from the Special Sensor Microwave Imager (SSM/I) surface wind analysis (Atlas et al. 1991) based on the combination of the Defense Meteorological Satellite Program (DMSP) SSM/I data with other conventional data and the European Centre for Medium-Range Weather Forecasts (ECMWF) 10-m surface wind analysis. The surface stress was produced from this analysis using the drag coefficient of Large and Pond (1981). Monthly averaged wind stress forcing was applied to the model. The precipitation is given by monthly averaged analyses of Xie and Arkin (1997).

Model mean (1988–97) temperature, salinity, and zonal velocity sections along the equator compare very well with estimates made from observations (Johnson et al. 2002) taken during an overlapping period (Fig. 1).

### c. Data

The Tropical Atmosphere–Ocean (TAO)/Triton array (Fig. 2), consisting of more than 70 moored buoys spanning the equatorial Pacific (http://www.pmel.noaa.gov/tao; McPhaden et al. 1998), measures oceanographic and surface meteorological variables: air temperature, relative humidity, surface winds, sea surface temperatures, and subsurface temperatures down to a depth of 500 m. By 1994, daily measurements were available from moorings spaced approximately uniformly at 2°–3° in latitude and 10°–15° in longitude across the equatorial Pacific Ocean.

The temperature observations from the TAO/Triton array were the only data type used in these assimilation experiments, since the focus is on the well-known deleterious effects of temperature assimilation in the equatorial waveguide, as discussed, for example, by Troccoli et al. (2002, 2003). [In the global assimilation conducted by the National Aeronautics and Space Administration (NASA) Seasonal-to-Interannual Prediction Project to initialize seasonal forecasts, the global XBT database is included.] The standard deviation of the observational error, denoted $\sigma_{\mathrm{TAO}}$, is set to 0.5°C, and the errors are assumed to be uncorrelated in space and time. This value is high compared with the instrumental error of 0.1°C (Freitag et al. 1994) because it must also reflect the representativeness error; that is, the data contain a mixture of signals of various scales, including frequencies much higher than the target scales of the assimilation. By tuning $\sigma_{\mathrm{TAO}}$ we effectively control the ratio of the data error variance to the forecast error variance.
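The effect of tuning $\sigma_{\mathrm{TAO}}$ is easiest to see for a single observation collocated with a model grid point, where Eq. (3) reduces to the scalar weight $\sigma_f^2/(\sigma_f^2 + \sigma_{\mathrm{TAO}}^2)$. The sketch below assumes, for illustration only, that the forecast error standard deviation $\sigma_f$ equals the model-variability scale $\sigma_T = 0.65$°C quoted in section 3b; the function name `scalar_gain` is hypothetical.

```python
def scalar_gain(sigma_f, sigma_o):
    """Weight given to one observation collocated with a model point:
    K = sigma_f^2 / (sigma_f^2 + sigma_o^2)."""
    return sigma_f**2 / (sigma_f**2 + sigma_o**2)

# Assumed forecast error std = sigma_T = 0.65 degC; sigma_TAO = 0.5 degC.
k = scalar_gain(0.65, 0.5)
# Same forecast error, but only the 0.1 degC instrumental error.
k_instrument_only = scalar_gain(0.65, 0.1)
```

Under that assumption the observation receives a weight of about 0.63; with only the instrumental error it would be about 0.98, so inflating $\sigma_{\mathrm{TAO}}$ for representativeness error keeps the analysis from being pulled almost entirely onto individual mooring values.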

## 3. Forecast error covariance modeling

In error covariance structure modeling, one strives for an accurate representation of the error statistics as well as for a simple and efficient implementation that is computationally viable. With little knowledge of the true nature of the forecast error covariances, one often has to make assumptions and settle for simple methods, which usually have the advantage of being easy to implement. This section describes two different models for the forecast error covariance structure: a simple, computationally inexpensive model and a more elaborate, more accurate one. For both, an OI framework is used in which the forecast error covariance matrix, $\mathsf{P}^f$, is assumed to be time invariant.

### a. Univariate functional model

A commonly used analytical error covariance function (e.g., Carton and Hackert 1990; Ji et al. 1995) has been employed here in the tropical Pacific Ocean region: the spatial structure of the model temperature (T) forecast error is assumed to be Gaussian in all three dimensions with scales 15°, 4°, and 50 m in zonal, meridional, and vertical directions, respectively. The values used in this study were estimated from the ensemble of model integrations described in the next subsection. Those spatial scales are also (marginally) resolved by the equatorial moorings, which are nominally separated by 10° to 15° in the zonal direction and by 2° to 3° in the meridional direction. Horizontal scales are comparable to scales used in similar [three-dimensional variational data assimilation (3DVAR)] assimilation schemes (e.g., Ji et al. 1995; Rosati et al. 1996). There are several advantages to this error covariance model. For the Gaussian form of the covariance function, the minimum variance estimate for the least squares minimizing functional is the maximum likelihood estimate, and the analysis error covariance function is also Gaussian. It is relatively easy to implement and adapt to parallel computing architecture. The study by Rosati et al. (1997) also shows that use of such empirical covariance scales, though simplified, is nevertheless effective for improving seasonal forecasts.
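The univariate functional model amounts to evaluating a separable Gaussian in zonal, meridional, and vertical separation. The sketch below uses the scales quoted above (15°, 4°, 50 m); the exact exponent convention, $\exp[-(d/L)^2]$ rather than $\exp[-d^2/(2L^2)]$, is an assumption, as is measuring horizontal separation directly in degrees.

```python
import numpy as np

# Scales quoted in the text for the univariate temperature model.
LX_DEG, LY_DEG, LZ_M = 15.0, 4.0, 50.0

def gaussian_cov(dlon, dlat, dz, variance=1.0):
    """Separable 3-D Gaussian covariance of temperature forecast error.
    The exp(-(d/L)^2) convention is an assumption; some authors use
    exp(-d^2 / (2 L^2)) instead, which changes the effective scale."""
    return variance * np.exp(-(np.asarray(dlon) / LX_DEG) ** 2
                             - (np.asarray(dlat) / LY_DEG) ** 2
                             - (np.asarray(dz) / LZ_M) ** 2)

# Correlation between two equatorial points one zonal scale apart, same depth.
c_zonal = gaussian_cov(15.0, 0.0, 0.0)
```

With this convention two moorings 15° apart on the equator are correlated at $e^{-1} \approx 0.37$, which illustrates why the scales are only marginally resolved by the array.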

In the univariate implementation the temperature observations are assimilated and corrections are applied only to the model temperature field during each assimilation cycle, while the other variables adjust through the model’s dynamical response to the temperature correction.

### b. Monte Carlo method for estimating the multivariate forecast error covariance

A more realistic covariance structure that is consistent with model dynamics and the presence of ocean boundaries was sought through an application of the Monte Carlo method. The variability across an ensemble of ocean-state estimates was used for a one-time estimate of the forecast error statistics. This approach is similar in spirit to the ensemble Kalman filter, except that the error covariance does not evolve with time and is not influenced by prior data assimilation, although it could be.

The design of this forecast error covariance model was influenced by the need to assimilate TAO mooring observations for seasonal forecasts. While the Poseidon model has a layered configuration, the TAO observations are taken at approximately constant depth levels. In the implementation for this study, the covariances are calculated on predefined depth levels. At each assimilation cycle the model fields are interpolated to these depths, the assimilation increments are computed on these prespecified levels, and are then interpolated back to the temperature grid points at the center of the model layers. The discussion below deals with the three-dimensional forecast error covariance matrix whose horizontal structure coincides with the model grid, and in the vertical is arranged at depths coincident with the nominal TAO instrument depths.
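The interpolation round trip described above (model layers to fixed analysis depths, increments back to layer centers) can be sketched for a single water column with linear interpolation in depth. The layer-center depths and temperatures are hypothetical, the 11 analysis levels are assumed to be the nominal TAO depths, and holding end values constant outside the sampled range (what `np.interp` does) is an assumed extrapolation rule, not necessarily the one used in the study.

```python
import numpy as np

# Hypothetical layer-center depths (m) and temperatures (degC) for one column;
# the real model layers are quasi-isopycnal and vary in space and time.
layer_depths = np.array([10.0, 40.0, 90.0, 150.0, 230.0, 330.0, 450.0])
layer_temp = np.array([28.5, 28.0, 25.0, 19.0, 14.0, 11.0, 9.0])

# Assumed nominal TAO depths (11 levels) used as fixed analysis levels.
tao_depths = np.array([1.0, 25.0, 50.0, 75.0, 100.0, 125.0,
                       150.0, 200.0, 250.0, 300.0, 500.0])

def to_levels(src_depths, values, dst_depths):
    """Linear interpolation in depth; np.interp holds the end values
    constant outside the source range."""
    return np.interp(dst_depths, src_depths, values)

# 1) Model layers -> analysis levels (forecast used to form innovations).
t_levels = to_levels(layer_depths, layer_temp, tao_depths)

# 2) Suppose the analysis produced a +0.5 degC increment at 100 m only.
increment_levels = np.zeros_like(tao_depths)
increment_levels[4] = 0.5

# 3) Increments -> layer centers, then added to the layer temperatures.
increment_layers = to_levels(tao_depths, increment_levels, layer_depths)
layer_temp_analysis = layer_temp + increment_layers
```

Interpolating the increments, rather than the analyzed fields, leaves layers far from any corrected level untouched; here only the 90-m layer receives part of the 100-m increment.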

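The ocean-state vector is nondimensionalized as

$$\mathbf{x} = \left(\frac{T}{\sigma_T}, \frac{S}{\sigma_S}, \frac{U}{\sigma_U}, \frac{V}{\sigma_V}, \frac{\mathrm{ssh}}{\sigma_{\mathrm{ssh}}}\right)^T,$$

where $T$, $S$, $U$, $V$, and ssh are the model variables (temperature, salinity, zonal and meridional velocities, and dynamic height, respectively) and $\sigma_{[T, S, U, V, \mathrm{ssh}]}$ are nondimensionalizing factors. For the latter we took the global standard deviation of each model field at a depth of 100 m (the depth of highest variability, around the thermocline): $\sigma_T = 0.65$°C, $\sigma_S = 0.08$, $\sigma_U = 0.09$ m s$^{-1}$, $\sigma_V = 0.08$ m s$^{-1}$, and $\sigma_{\mathrm{ssh}} = 0.08$ m. Note that $\sigma_T$, which represents the internal variability of the model, is comparable to the assumed observational error standard deviation $\sigma_{\mathrm{TAO}}$, so that the model forecast and the data are given comparable weights in the assimilation. The multivariate covariance matrix has the block structure

$$\mathsf{P} = \begin{pmatrix} \mathsf{P}_{TT} & \mathsf{P}_{TS} & \cdots & \mathsf{P}_{T\,\mathrm{ssh}} \\ \mathsf{P}_{ST} & \mathsf{P}_{SS} & \cdots & \mathsf{P}_{S\,\mathrm{ssh}} \\ \vdots & \vdots & \ddots & \vdots \\ \mathsf{P}_{\mathrm{ssh}\,T} & \mathsf{P}_{\mathrm{ssh}\,S} & \cdots & \mathsf{P}_{\mathrm{ssh}\,\mathrm{ssh}} \end{pmatrix},$$

where each block holds the (nondimensional) error covariances between the two indicated variables. If the matrix $\mathsf{A} \in \mathbb{R}^{n_x \times m}$ contains the $m$-member ensemble of (anomalous) ocean states as columns, then $\mathsf{P}$ can be computed as

$$\mathsf{P} = \mathsf{A}\mathsf{A}^T.$$

$\mathsf{P}$ is an $n_x \times n_x$ matrix with $n_x \approx 10^6$ (the dimension of the state vector), while its rank is smaller than the size of the ensemble, $m$ (on the order of $10^2$ in this study). Since the rank of the error covariance matrix $\mathsf{P}$ estimated using this method is so small, it can be conveniently represented using a basis of empirical orthogonal functions (EOFs), $\mathsf{E}$. EOFs have been widely employed in oceanographic contexts (e.g., Cane et al. 1996; Kaplan et al. 1997), and the relevant theoretical background can be found, for example, in Preisendorfer (1988). The necessary linear algebra concepts may be reviewed in Golub and Van Loan (1996).

The $n_x \times n_x$ matrix $\mathsf{A}\mathsf{A}^T$ has the same nonzero eigenvalues as $\mathsf{A}^T\mathsf{A}$, which is only $m \times m$, and the eigenvectors of $\mathsf{A}\mathsf{A}^T$ are related to those of $\mathsf{A}^T\mathsf{A}$ as

$$\mathsf{E} = \mathsf{A}\mathsf{U}\Lambda^{-1/2},$$

where $\mathsf{E} \in \mathbb{R}^{n_x \times m}$ contains the eigenvectors of $\mathsf{A}\mathsf{A}^T$, $\mathsf{U} \in \mathbb{R}^{m \times m}$ contains the eigenvectors of $\mathsf{A}^T\mathsf{A}$, and $\Lambda = \mathrm{diag}(\lambda_1^2, \ldots, \lambda_m^2)$ holds the eigenvalues of $\mathsf{A}^T\mathsf{A}$ (only nonzero eigenvalues are retained when forming $\Lambda^{-1/2}$). Then, since $\mathsf{U}$ is orthogonal (Golub and Van Loan 1996, p. 393),

$$\mathsf{P} = \mathsf{A}\mathsf{A}^T = \mathsf{E}\Lambda\mathsf{E}^T.$$

The columns of $\mathsf{E}$ are orthonormal, and the eigenvalues $\lambda_i^2$, $i = 1, \ldots, m$, are the variances. Equation (3) can thus be rewritten as

$$\mathsf{K}_k = \mathsf{E}\Lambda\mathsf{E}^T\mathsf{H}_k^T\left(\mathsf{H}_k\mathsf{E}\Lambda\mathsf{E}^T\mathsf{H}_k^T + \mathsf{R}_k\right)^{-1}.$$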
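The EOF route can be sketched with NumPy: solve the small $m \times m$ eigenproblem of $\mathsf{A}^T\mathsf{A}$, map its eigenvectors to $\mathsf{E} = \mathsf{A}\mathsf{U}\Lambda^{-1/2}$, and evaluate the gain without ever forming the $n_x \times n_x$ matrix $\mathsf{P}$. This is a minimal sketch: the function names, the random 500-variable, 40-member ensemble, the observed indices, and the relative cutoff for discarding zero eigenvalues are all illustrative assumptions, not the NSIPP implementation.

```python
import numpy as np

def eofs_from_ensemble(A, rel_tol=1e-10):
    """EOFs of P = A A^T via the small m x m eigenproblem of A^T A.
    Returns E (n_x x r, orthonormal columns) and lam2 (the r nonzero
    eigenvalues lambda_i^2, descending), so that A A^T = E diag(lam2) E^T."""
    lam2, U = np.linalg.eigh(A.T @ A)      # eigh returns ascending order
    order = np.argsort(lam2)[::-1]
    lam2, U = lam2[order], U[:, order]
    keep = lam2 > rel_tol * lam2[0]        # drop (near-)zero eigenvalues
    lam2, U = lam2[keep], U[:, keep]
    E = (A @ U) / np.sqrt(lam2)            # E = A U Lambda^{-1/2}
    return E, lam2

def eof_gain(E, lam2, H, R):
    """K = E L E^T H^T (H E L E^T H^T + R)^{-1}, L = diag(lam2), without forming P."""
    HE = H @ E                             # n_d x r
    S = (HE * lam2) @ HE.T + R             # H P H^T + R, n_d x n_d
    X = np.linalg.solve(S, HE * lam2)      # S^{-1} H E L
    return E @ X.T                         # E L (H E)^T S^{-1}, using S = S^T

# Toy usage: 500 state variables, 40 members, 3 point observations.
rng = np.random.default_rng(0)
n_x, m = 500, 40
A = rng.standard_normal((n_x, m))
A -= A.mean(axis=1, keepdims=True)         # ensemble anomalies
E, lam2 = eofs_from_ensemble(A)
H = np.eye(n_x)[[3, 100, 250]]             # observe three state elements
R = 0.25 * np.eye(3)
K = eof_gain(E, lam2, H, R)
```

Removing the ensemble mean leaves one exactly zero eigenvalue, so 39 EOFs survive here; the cost is dominated by products with $\mathsf{A}$ and $\mathsf{E}$, linear in $n_x$, instead of the $O(n_x^2)$ storage a full $\mathsf{P}$ would need.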

#### 1) Ensemble generation

The ensemble members differ only in the surface forcing anomalies $\delta\mathsf{F}_n$ applied to member $n$. These are interannual anomalies that are in phase with respect to the annual cycle and the interannual SST anomalies but have different internal atmospheric chaotic variations. Surface forcing is used for the ensemble generation because it is probably the dominant source of error in the upper ocean in the equatorial Pacific. Our approach is similar to that of Cane et al. (1996) in the sense that all the ensemble variability results from perturbations to the atmospheric forcing, although the implementation details differ. Although errors in the synoptic forcing will be large, the focus here is on the longer time scales of interest for seasonal prediction. The fluxes were obtained from a series of integrations of the Aries atmospheric model (e.g., Suarez and Takacs 1995) forced by the same interannually varying sea surface temperatures (SSTs) and differing only in slight perturbations to the initial atmospheric state. The interannual anomalies in surface stress and heat flux components were added to the seasonal forcing estimated from the sources described in section 2b. This approach attributes all of the ocean model forecast error to uncertainties in the surface flux anomalies, since differences between the ensemble members were due to atmospheric internal variability. No perturbations were added to the SSTs used for the atmospheric integrations, so the long-term mean of the heat fluxes is strongly constrained.

In all, 32 runs were conducted, each 15 yr long, corresponding to the 1979–93 period of the SST data used to force the atmospheric model. Five-day averages (pentads) of the model fields were archived. These were subsequently interpolated to the 11 depth levels coincident with the depths of the TAO observations, and all the covariance estimates were made using these fields. The EOFs of $\mathsf{P}$ were first computed from the ensemble of 32 ocean-state realizations at a randomly selected pentad of the 15-yr period. The first EOF explained only about 3% of the total error variance, and this result was similar for many one-time estimates of $\mathsf{P}$ at other randomly selected dates. All eigenvalues of $\mathsf{A}\mathsf{A}^T$ appear to be so close to each other as to be virtually indistinguishable. Apparently, this ensemble was not sufficient to reliably define the subspace containing the leading directions of the forecast error variability. A possible reason for this result is the small size of the ensemble, which is not adequate to resolve the dominant modes of variability of such a complex system. The question therefore arose of how to enlarge the ensemble given the accumulated model output. A natural solution is to include fields from the same model run, selected so as to prevent contamination of the internal model error variability by temporal variability such as lag correlations or interannual variations.
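A flat eigenvalue spectrum of this kind is also what a structureless ensemble would produce when $n_x \gg m$. The sketch below is an illustration under that assumption only (a pure-noise "ensemble" of 32 members and 20 000 variables, chosen arbitrarily); it does not diagnose the study's ensemble. It computes the fraction of total variance explained by each EOF from the eigenvalues of $\mathsf{A}^T\mathsf{A}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_x, m = 20000, 32                       # many more state variables than members
A = rng.standard_normal((n_x, m))        # structureless "ensemble" of pure noise
A -= A.mean(axis=1, keepdims=True)       # anomalies about the ensemble mean

lam2 = np.linalg.eigvalsh(A.T @ A)[::-1] # eigenvalues of A^T A, descending
frac = lam2 / lam2.sum()                 # fraction of total variance per EOF
```

With 31 nonzero eigenvalues of nearly equal size, the leading EOF explains only slightly more than $1/(m-1) \approx 3.2\%$, so a leading fraction near 3% gives little evidence that the ensemble has isolated any dominant error direction.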

Thus, a matrix of ensemble members, 𝗔, was formed by selecting at random 5 yr from the 15-yr period and then choosing from each year the pentad corresponding to the same date, say, 1 January. Such a choice ensured that the states were sufficiently separated in time to be considered independent. With 32 runs and 5 yr per run, this yielded an ensemble of 160 members; this limit was set by practical considerations. The mean was removed separately for each of the 5 yr to remove the influence of interannual variability. The EOFs of the matrix 𝗣 were then computed. The properties of the error covariance matrix constructed in this way are discussed below.
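The assembly of the 160-member anomaly matrix can be sketched as follows. The archive layout (runs × years × state variables), the toy state size, and the variable names are hypothetical, and the per-year mean is taken across the 32 runs, which is one reading of "the mean was removed separately for each of the 5 yr."

```python
import numpy as np

rng = np.random.default_rng(2)
n_runs, n_years_total, n_pick, n_x = 32, 15, 5, 1000   # toy state size

# Hypothetical archive: the same calendar pentad from every run and every year.
archive = rng.standard_normal((n_runs, n_years_total, n_x))

years = rng.choice(n_years_total, size=n_pick, replace=False)  # 5 of 15 years
members = archive[:, years, :]                  # (32 runs, 5 years, n_x)

# Remove the mean over the 32 runs separately for each year, so that
# differences between years (interannual variability) do not enter P.
anomalies = members - members.mean(axis=0, keepdims=True)

# Columns of A are the 32 x 5 = 160 ensemble members.
A = anomalies.reshape(n_runs * n_pick, n_x).T
```

Because each year's 32 anomalies sum to zero, removing five separate means leaves $\mathsf{A}$ with rank at most $160 - 5 = 155$, consistent with the statement that the rank of $\mathsf{P}$ is smaller than the ensemble size.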

#### 2) Compact support

A persistent problem associated with empirical forecast error covariance estimation is the appearance of spurious, unphysically large correlations at large spatial lags, which are an artifact of the limited ensemble size (e.g., Houtekamer and Mitchell 1998, their Fig. 6). We use an ensemble size of 160, yet the potential number of degrees of freedom is $O(10^6)$. To alleviate this problem, the multivariate, anisotropic, inhomogeneous covariance matrix was multiplied elementwise by a matrix specified by a covariance function that vanishes at large distances; that is, a Hadamard product of the two matrices was employed, as discussed by Houtekamer and Mitchell (2001). Keppenne and Rienecker (2002) implemented this compact support for the ensemble Kalman filter developed by the NASA Seasonal-to-Interannual Prediction Project (NSIPP) for parallel computing architectures, and that implementation is used in the present study. The functional form follows the work of Gaspari and Cohn (1999), who provided a methodology for constructing compactly supported multidimensional covariance functions. The characteristic scales of this function were selected so that most of the local features of the empirically estimated error covariance structure are preserved but the covariance vanishes at large spatial lags: 30°, 8°, and 100 m in the zonal, meridional, and vertical directions, respectively.
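The localization step can be sketched with the standard fifth-order piecewise rational function of Gaspari and Cohn (1999), which equals 1 at zero separation and is exactly zero beyond twice its length parameter $c$. How the quoted scales (30°, 8°, 100 m) map onto $c$ is not stated here; the sketch assumes each separation is divided by its quoted scale, so correlations vanish beyond 60° zonally. The function names and the 20-member noise ensemble are hypothetical.

```python
import numpy as np

def gaspari_cohn(r):
    """Gaspari and Cohn (1999) fifth-order piecewise rational correlation
    function of normalized distance r = |d| / c: 1 at r = 0, exactly 0 for r >= 2."""
    r = np.abs(np.asarray(r, dtype=float))
    r_safe = np.maximum(r, 1.0)            # outer branch is used only for r > 1
    inner = (-0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3
             - (5.0 / 3.0) * r**2 + 1.0)
    outer = (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3 + (5.0 / 3.0) * r**2
             - 5.0 * r + 4.0 - (2.0 / 3.0) / r_safe)
    return np.where(r <= 1.0, inner, np.where(r < 2.0, outer, 0.0))

def normalized_distance(dlon, dlat, dz, scales=(30.0, 8.0, 100.0)):
    """Hypothetical anisotropic scaling by the quoted scales (deg, deg, m)."""
    sx, sy, sz = scales
    return np.sqrt((dlon / sx) ** 2 + (dlat / sy) ** 2 + (dz / sz) ** 2)

# Toy usage: points every 5 deg along the equator, 20-member noise ensemble.
lon = np.arange(0.0, 180.0, 5.0)
dlon = lon[:, None] - lon[None, :]
rng = np.random.default_rng(3)
ens = rng.standard_normal((lon.size, 20))
ens -= ens.mean(axis=1, keepdims=True)
P_sample = ens @ ens.T / (ens.shape[1] - 1)    # noisy sample covariance
rho = gaspari_cohn(normalized_distance(dlon, 0.0, 0.0))
P_loc = P_sample * rho                         # Hadamard (elementwise) product
```

Because the Gaspari–Cohn function is a valid correlation function in three dimensions, the Schur product theorem guarantees that the localized matrix remains positive semidefinite, while the variances on the diagonal are unchanged.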

To visualize the covariance structure, an artificial example is considered with a single observation that differs from the background field by one nondimensional unit. The resulting correction reflects the forecast error correlation structure: it corresponds to a section of a single row of the 𝗣 matrix. This is also termed the marginal gain, since it measures the impact of processing a single perfect measurement without reference to other data that might be assimilated. The correlations between temperature observations at several locations across the equatorial Pacific Ocean (156°E, 180°, 155°W, and 125°W), at depths roughly corresponding to the thermocline as estimated by the 20°C isotherm depth, and the temperature elsewhere in the Pacific show that compact support eliminates the long-range correlations while leaving the local structure intact (Fig. 3).
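For one observation of state element $j$ ($\mathsf{H} = \mathbf{e}_j^T$), Eq. (3) collapses to $\mathsf{K} = \mathsf{P}_{:,j}/(\mathsf{P}_{jj} + r)$, so a unit innovation with a perfect observation ($r = 0$) produces an increment equal to column $j$ of $\mathsf{P}$ scaled by $1/\mathsf{P}_{jj}$. The sketch below checks this against the general formula; the function name and the random 6 × 6 covariance are illustrative assumptions.

```python
import numpy as np

def single_obs_increment(P, j, innovation=1.0, obs_var=0.0):
    """Increment from one observation of state element j (H = e_j^T):
    dx = P[:, j] * innovation / (P[j, j] + obs_var).
    obs_var = 0 treats the observation as perfect."""
    return P[:, j] * innovation / (P[j, j] + obs_var)

rng = np.random.default_rng(5)
B = rng.standard_normal((6, 3))
P = B @ B.T + 0.1 * np.eye(6)        # illustrative symmetric positive definite covariance
dx = single_obs_increment(P, 2)      # unit innovation, perfect observation
```

Since $\mathsf{P}$ is symmetric, the column equals the row; for nondimensional fields of comparable variance, $\mathsf{P}_{ij}/\mathsf{P}_{jj}$ is close to the correlation, which is why these marginal-gain maps can be read as correlation patterns.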

#### 3) Multivariate error covariance patterns

The following discussion of the multivariate error covariance model will focus on the thermocline region in the equatorial Pacific Ocean. The shapes of the correlation structure associated with a single point differ between the eastern and western regions (Fig. 3, top four panels). The zonal scale tends to be shorter in the western and central parts of the basin and longer in the eastern part. Meridional decay scales are similar along the equator, but the vertical correlation (Fig. 3, middle four panels) varies: it is shorter and symmetrical in the western part, slightly skewed in the central part, and symmetrical but more elongated in the eastern part of the equatorial Pacific basin. Zonal sections (Fig. 3, bottom four panels) illustrate the anisotropy associated with the tilt of the thermocline. This example alone demonstrates that even the error covariance structure of temperature itself is so complex that a homogeneous error correlation structure is inadequate.

Although to date there have been very few salinity observations, this is changing with the Argo program (http://argo.jcommops.org; Wilson 2000). Hence, it is of interest to explore corrections associated with salinity observations (Fig. 4). The decorrelation scales in the western basin are noticeably longer than in the middle and eastern basin, 8°–10° in the zonal and 4°–6° in the meridional direction in the west and 2°–4° in the zonal and 1°–2° in the meridional direction in the east. The scales are notably shorter than those for temperature (Fig. 3) except for the meridional scales in the west.

In a similar fashion one can analyze the temperature–salinity, temperature–velocity, and other cross-variable relationships, that is, the effect of a single unit observation on various fields/components of the ocean-state vector. Corrections in *S* and *U* fields associated with a *T* observation and corrections in *T* and *U* associated with an *S* observation are displayed for a single location, 155°W at the equator (Fig. 5).

Examples of the temperature–salinity covariance (Fig. 5) reflect the complex and irregular nature of the temperature–salinity relationship. The change in salinity associated with a temperature increment is not necessarily density-compensating. Equatorial temperature and salinity south of the equator in the western region are anticorrelated, while temperature at the equator and salinity immediately to the north are correlated at 150 m in the western and central Pacific. The scales of influence are short compared with those of the temperature–temperature relationship. The anticorrelation is consistent with the mean thermohaline (*T*–*S*) structure, with freshwater overlying a saline core. In the east, the correlation between *T* and *S* is primarily vertical; horizontal scales are very short, on the order of 2°–4°. The positive correlations on the equator, as seen in the meridional sections of the central basin, are higher toward the Northern Hemisphere. The negative correlations to the south are consistent with higher temperatures straddling the cold tongue, with more saline water south of the equator and fresher water to the north. Thus the covariances are consistent with vertical and meridional variations.

The relationship between temperature and velocity in the western Pacific reflects temperature changes associated with upstream advection/convergence effects. At 156°E and at the date line (not shown), the higher temperatures are associated with a weaker equatorial undercurrent in a broad region to the west. At 155°W, the effects are more local and wavelike with increased temperature associated with a stronger equatorial undercurrent. At 125°W (not shown) the scales are shorter and also wavelike, with changes in temperature apparently associated with instability waves.

It is possible to infer from the multivariate analysis the effect a single salinity observation would have on temperature and zonal velocity fields at various locations across the equatorial Pacific Ocean. The large positive correlation between salinity and temperature fields in the central and to a lesser degree in the eastern Pacific indicates that the correction of the salinity field may have a significant impact on the temperature. The S–U relationship is weak in the western part of the basin and the correlation patterns are wavelike in the east, strongly pronounced in the north–south direction.

## 4. Robustness of the model error covariance estimate

In this section, the sensitivity of the covariance structure to the choices made in populating the ensemble (that is, to seasonal or interannual variations in the atmospheric forcing) is explored to evaluate the robustness of the covariance estimates. The robustness is tested by randomly sampling the full suite of integrations. Five of the 15 years (the length of the run) were picked at random, and the same pentad (e.g., 1–5 January) was taken from each year. As before, the mean across the ensemble was removed for each year. The procedure was repeated 10 times, yielding 10 realizations of the covariance matrix 𝗣. The pentads were chosen so that realizations from the same season and from different seasons could be compared. Visual assessment of figures similar to Figs. 3–5 showed that the correlation structures represented by the different estimates of 𝗣 were very similar.
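The resampling procedure described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the ensemble is represented by a hypothetical dictionary `states` mapping each year to a list of member state vectors valid at one pentad, the synthetic data are random numbers standing in for model output, and the normalization (one degree of freedom lost per removed yearly mean) is our choice rather than something specified in the text.

```python
import random


def covariance(anomalies, dof):
    """Covariance matrix sum(a a^T) / dof from a list of anomaly vectors."""
    dim = len(anomalies[0])
    return [[sum(a[i] * a[k] for a in anomalies) / dof for k in range(dim)]
            for i in range(dim)]


def resampled_P(states, n_years=5, rng=None):
    """One realization of P built from n_years randomly chosen years.

    states maps a year to the list of ensemble-member state vectors valid at
    the same pentad (e.g., 1-5 January) of that year.
    """
    if rng is None:
        rng = random.Random()
    years = rng.sample(sorted(states), n_years)
    anomalies = []
    for year in years:
        members = states[year]
        dim = len(members[0])
        # Remove the ensemble mean separately for each selected year.
        mean = [sum(m[i] for m in members) / len(members) for i in range(dim)]
        anomalies.extend([m[i] - mean[i] for i in range(dim)] for m in members)
    # One degree of freedom is used up by each yearly mean that was removed.
    return covariance(anomalies, len(anomalies) - n_years)


# Synthetic stand-in for a 15-yr suite: 6 members, 3 state variables per year.
gen = random.Random(42)
states = {year: [[gen.gauss(0.0, 1.0) for _ in range(3)] for _ in range(6)]
          for year in range(15)}
realizations = [resampled_P(states, rng=random.Random(seed)) for seed in range(10)]
```

The 10 matrices in `realizations` play the role of the 10 estimates of 𝗣 whose structures are compared in Fig. 6 and Fig. 7.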

One measure of the robustness of the covariance estimates is provided by pointwise covariance sections (Fig. 6) for simulated temperature observations at the same locations as in Figs. 3 and 4. The tight spread of the decorrelation curves from the 10 different 𝗣 realizations (thin lines) indicates that the covariance structure is well reproduced. No significant interannual variability is apparent within this collection of 𝗣 matrices. The overplotted Gaussian curves show that the decorrelation scales differ among the four locations across the equatorial basin and can hardly be fitted by a single parameter (scale estimate) in a functional covariance model. In the univariate optimal interpolation (UOI) covariance model used for comparison below, the temperature decorrelation scales are chosen to be consistent with the scales of the empirical error covariance model in the western and central equatorial Pacific.
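For contrast with the empirical estimates, a functional covariance of the UOI type (a Gaussian with different zonal, meridional, and vertical scales, as summarized in the conclusions) can be sketched as follows. The exact functional form, the factor of one-half in the exponent, and the scale values are assumptions made for illustration, not the settings used in the UOI experiment.

```python
import math


def gaussian_covariance(dlon, dlat, dz, variance, lon_scale, lat_scale, z_scale):
    """Anisotropic Gaussian covariance of temperature errors at two points.

    dlon, dlat: zonal and meridional separations (degrees); dz: vertical
    separation (m). The scales are Gaussian length scales in the same units.
    """
    r2 = (dlon / lon_scale) ** 2 + (dlat / lat_scale) ** 2 + (dz / z_scale) ** 2
    return variance * math.exp(-0.5 * r2)


# Hypothetical scales: 10 deg zonal, 2 deg meridional, 50 m vertical.
scales = dict(lon_scale=10.0, lat_scale=2.0, z_scale=50.0)
c_same_point = gaussian_covariance(0.0, 0.0, 0.0, variance=0.8, **scales)
c_one_zonal_scale = gaussian_covariance(10.0, 0.0, 0.0, variance=0.8, **scales)
```

Because one set of scales is imposed everywhere, this form cannot reproduce the location-dependent decorrelation scales seen in the ensemble estimates, which is the limitation the Gaussian curves in Fig. 6 illustrate.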

The difference among the Monte Carlo estimates of 𝗣 can also be quantified in terms of the dominant error subspaces spanned by each of the ensemble sets. These subspaces are best described by the orthonormal bases of EOFs. The use of EOFs allows a spatial filtering of the covariance structures by inclusion of only those EOFs that are non-noise-like, thus defining the dominant error subspace. This procedure also eliminates problems associated with different levels of variance even though the spatial structures (covariances) are similar.
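The EOF decomposition of an ensemble can be sketched with a thin singular value decomposition: the left singular vectors of the anomaly matrix are the EOFs, and the squared singular values divided by $N-1$ are the eigenvalues of $\mathsf{P}$. The sketch below uses NumPy and, as a simple stand-in for the "non-noise-like" criterion in the text, truncates at a cumulative explained-variance fraction; that cutoff rule, the function name, and the toy data are assumptions.

```python
import numpy as np


def dominant_eofs(A, var_fraction=0.9):
    """EOFs of an ensemble and the size of a leading (dominant) subset.

    A: (n_state, n_members) array whose columns are ensemble anomalies
    (ensemble mean already removed). Returns (eofs, eigenvalues, m): the
    orthonormal EOFs as columns, the eigenvalues of P = A A^T / (N - 1) in
    decreasing order, and the number m of leading EOFs needed to explain
    var_fraction of the total variance.
    """
    n_members = A.shape[1]
    # Thin SVD, A = U diag(s) V^T: the columns of U are the EOFs.
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    eigenvalues = s**2 / (n_members - 1)
    explained = np.cumsum(eigenvalues) / eigenvalues.sum()
    # First index where the cumulative fraction reaches var_fraction.
    m = min(int(np.searchsorted(explained, var_fraction)) + 1, len(s))
    return U[:, :m], eigenvalues, m


# Tiny rank-2 example: two orthogonal spatial patterns with zero-mean amplitudes.
p1 = np.array([1.0, 1.0, 0.0, 0.0]) / np.sqrt(2.0)
p2 = np.array([0.0, 0.0, 1.0, -1.0]) / np.sqrt(2.0)
A = np.outer(p1, [2.0, -2.0, 1.0, -1.0]) + np.outer(p2, [1.0, 1.0, -1.0, -1.0])
eofs, eigenvalues, m = dominant_eofs(A)
```

Working with the orthonormal EOFs rather than with 𝗣 itself is what allows ensembles with different variance levels but similar spatial structure to be compared on an equal footing.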

A member **a** of an ensemble can be expressed in terms of the EOF basis {*α*} as

$$
\mathbf{a} = \sum_{i=1}^{m} \left(\boldsymbol{\alpha}_i^{\mathrm{T}}\mathbf{a}\right)\boldsymbol{\alpha}_i + \boldsymbol{\delta}^{\alpha},
$$

where the sum runs over the $m$ EOFs in the set. The set of EOFs {*α*} spans the subspace $S_\alpha$ of the forecast error space, and $\boldsymbol{\delta}^{\alpha}$ is the residual lying in the complement of $S_\alpha$, that is, the subspace $S_\alpha^{c}$ not spanned by {*α*}; $S_\alpha^{c}$ may or may not contain significant forecast error covariability information. To assess the information content not included in $S_\alpha$, we examine the covariability through the EOFs of $\boldsymbol{\delta}^{\alpha}$. If the EOFs of $\boldsymbol{\delta}^{\alpha}$ are noise-like, this indicates that the EOFs {*α*} captured the significant information regarding the forecast error covariance contained in **a**. This calculation was repeated for several instances of {*α*} and {**a**} to assess the invariability of $S_\alpha$.

The spectra of various ensembles of $\boldsymbol{\delta}^{\alpha} \in S_\alpha^{c}$ are shown in Fig. 7, where {*α*} are calculated from January pentads and {**a**} are pentads from July. In every case, the eigenvalues associated with {*α*} and {*δ*} are normalized by the variance of the corresponding ensemble {**a**}. The eigenvalue curves of {*δ*} are almost flat, characteristic of white noise, and are an order of magnitude smaller than the dominant eigenvalues associated with {*α*}. Thus the error subspace generated from this Monte Carlo simulation appears to be robust.
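The projection test behind Fig. 7 can be sketched as follows: project one ensemble onto the EOFs of another, form the residual $\boldsymbol{\delta}^{\alpha}$, and compare normalized spectra. This is a sketch under assumptions: "eigenvalues of {*α*}" is interpreted here as the variance of the second ensemble carried by each reference EOF, the normalizing "variance of the ensemble" is taken as the trace of its covariance, and the January and July ensembles are synthetic (two shared patterns plus white noise), not model output.

```python
import numpy as np


def residual_spectra(alpha, A):
    """Normalized spectra of an ensemble projected onto, and orthogonal to, {alpha}.

    alpha: (n_state, m) orthonormal EOFs from a reference ensemble.
    A:     (n_state, n_members) anomalies of a second ensemble.
    Returns (lam_alpha, lam_delta), both divided by the total variance of A.
    """
    n_members = A.shape[1]
    total_var = np.sum(A**2) / (n_members - 1)
    coeffs = alpha.T @ A          # c_i = alpha_i^T a for every member a
    delta = A - alpha @ coeffs    # residual delta^alpha, orthogonal to span{alpha}
    # Variance of A carried by each reference EOF (our reading of "eigenvalues of {alpha}").
    lam_alpha = np.sort(np.sum(coeffs**2, axis=1))[::-1] / (n_members - 1)
    lam_delta = np.linalg.svd(delta, compute_uv=False) ** 2 / (n_members - 1)
    return lam_alpha / total_var, lam_delta / total_var


def synthetic_ensemble(patterns, amplitudes, n_members, noise, seed):
    """Hypothetical ensemble: shared patterns with random amplitudes plus white noise."""
    rng = np.random.default_rng(seed)
    coeffs = rng.standard_normal((patterns.shape[1], n_members)) * amplitudes[:, None]
    A = patterns @ coeffs + noise * rng.standard_normal((patterns.shape[0], n_members))
    return A - A.mean(axis=1, keepdims=True)  # remove the ensemble mean


patterns = np.linalg.qr(np.random.default_rng(0).standard_normal((40, 2)))[0]
amplitudes = np.array([5.0, 3.0])
january = synthetic_ensemble(patterns, amplitudes, 30, noise=0.1, seed=1)
july = synthetic_ensemble(patterns, amplitudes, 30, noise=0.1, seed=2)
alpha = np.linalg.svd(january, full_matrices=False)[0][:, :2]
lam_alpha, lam_delta = residual_spectra(alpha, july)
```

Because $\boldsymbol{\delta}^{\alpha}$ is orthogonal to the span of {*α*}, the two normalized spectra sum to one; a flat, small `lam_delta` is the signature of a robust dominant subspace described above.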

## 5. Assimilation experiments

The effectiveness of the empirical multivariate forecast error covariance estimate is assessed by assimilating temperature observations from the TAO moorings. The evaluation uses a set of independent (i.e., not assimilated) temperature, salinity, and zonal velocity observations from the TAO servicing cruises. The temperature and salinity data are based on conductivity–temperature–depth (CTD) profiles, and the velocity data on acoustic Doppler current profiler (ADCP) measurements. The comparison uses a gridded analysis of these data, as described by Johnson et al. (2000).

The assimilation experimental setup is as follows. The model was spun up for 10 yr with climatological forcing and then integrated with time-dependent forcing for 1988–98 in all the experiments. The assimilation began in July 1996. The initial conditions and the forcing were identical in all assimilation experiments. In addition to the data assimilation runs, a forced model integration without assimilation (referred to as the control) serves as a baseline for assessing the assimilation performance. The assimilation run with a simple univariate covariance model is denoted UOI. The run with the empirical multivariate forecast error covariance model is termed MvOI.

In every assimilation experiment, the daily averaged subsurface temperature data from the TAO moorings were assimilated once a day. To reduce the shock to the model caused by intermittently inserting imperfectly balanced increments, the incremental analysis update technique (Bloom et al. 1996) was used. In this implementation, the assimilation increment is added gradually to the forecast fields at each time step.
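The gradual insertion can be sketched as follows. This minimal Python sketch assumes the increment is split into equal fractions over a hypothetical number of model steps per assimilation window; `step` is a placeholder for one time step of the ocean model, not part of any real model interface.

```python
def run_with_iau(x0, increment, n_steps, step):
    """Advance a model n_steps while adding the analysis increment gradually.

    x0: initial state (list of floats); increment: analysis increment of the
    same length; step: function advancing a state by one time step (a
    stand-in for the ocean model). Each step adds increment / n_steps, so the
    whole increment has been applied by the end of the window.
    """
    x = list(x0)
    for _ in range(n_steps):
        x = step(x)
        x = [xi + di / n_steps for xi, di in zip(x, increment)]
    return x


# With a trivial "model" that leaves the state unchanged, the result is x0 + increment.
final = run_with_iau([0.0, 3.0], [1.0, -2.0], n_steps=4, step=lambda x: x)
# With a damping "model", the dynamics act on the partially applied increment.
damped = run_with_iau([0.0], [1.0], n_steps=2, step=lambda x: [0.5 * v for v in x])
```

Unlike direct insertion of the full increment at one time step, the model dynamics respond to each small fraction as it is added, which is what limits the initialization shock.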

The simulation (i.e., the control, with no assimilation) and the two assimilation runs are cross-validated against the independent temperature, salinity, and zonal velocity sections from Johnson et al. (2002). All of the available observed profiles are used, and the statistics are separated into four regions: Niño-4 (160°E–150°W) and Niño-3 (150°–90°W), each further divided into halves north and south of the equator (5°S–0° and 0°–5°N). To put the amplitude of the root-mean-square difference (rmsd) in perspective, the mean monthly standard deviation (std) of the model is plotted as well. It is calculated from daily values at the same predefined depth levels on which the analyses are performed. The standard deviation represents the level of internal variability in the model on submonthly time scales, which could be partly responsible for the errors in the monthly averaged profiles assessed against single synoptic ship observations. In general, the rmsd between the control and the data is about twice the model standard deviation. The MvOI experiment shows temperature skill comparable to that of the UOI, with the greatest reduction in rmsd in the thermocline in the Niño-3 region south of the equator (Figs. 8 and 9). Below 400 m, neither assimilation scheme has a smaller rmsd than the control run, because data for assimilation are available only above 500 m and are sparse at that level; the MvOI error is, however, smaller north of the equator in both the Niño-3 and Niño-4 regions. The transition from the upper part of the water column, where the temperature profile is corrected by the assimilation, to the abyss, where data are absent, may disrupt the internal dynamical balances. While the model attempts to reinstate them through its mixing parameterizations, it cannot fully preserve the temperature structure below the transition region, which is reflected in the larger rmsd (top panels in Figs. 8 and 9). Apparently we should have included broader, deeper covariances to handle this situation. [The problem has been corrected in the global implementation.] The MvOI does, however, preserve the salinity structure very well in every region except south of the equator in the Niño-3 region. The MvOI current structure is also improved compared with the UOI everywhere, especially south of the equator.
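The bookkeeping behind these statistics can be sketched as below: assigning a profile to one of the four boxes, computing an rmsd over levels where both model and observation exist, and averaging monthly standard deviations of daily model values. The function names, the 0°–360°E longitude convention, the treatment of profiles exactly on the equator, and the use of the population standard deviation are all assumptions of this sketch.

```python
import math
from statistics import fmean, pstdev


def region(lon_east, lat):
    """Validation box for a profile, or None if it lies outside all four boxes.

    lon_east is longitude in degrees east (0-360), so 160E = 160, 150W = 210
    and 90W = 270. Profiles exactly on the equator go to the northern half;
    that tie-break is an assumption of this sketch.
    """
    if not -5.0 <= lat <= 5.0:
        return None
    if 160.0 <= lon_east < 210.0:
        box = "Nino-4"
    elif 210.0 <= lon_east <= 270.0:
        box = "Nino-3"
    else:
        return None
    return box + (" N" if lat >= 0.0 else " S")


def rmsd(model, obs):
    """Root-mean-square difference over entries where both values exist (not None)."""
    pairs = [(m, o) for m, o in zip(model, obs) if m is not None and o is not None]
    if not pairs:
        raise ValueError("no overlapping values")
    return math.sqrt(sum((m - o) ** 2 for m, o in pairs) / len(pairs))


def mean_monthly_std(daily_by_month):
    """Mean over months of the standard deviation of daily model values."""
    return fmean(pstdev(days) for days in daily_by_month)


box = region(165.0, 2.0)                        # "Nino-4 N"
err = rmsd([1.0, 2.0, None], [1.0, 4.0, 3.0])   # sqrt(2)
spread = mean_monthly_std([[1.0, 3.0], [2.0, 2.0]])  # 0.5
```

In the actual comparison, `rmsd` would be evaluated level by level across all profiles falling in one box, and `mean_monthly_std` at the same depth levels, so that the two curves can be plotted together.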

The UOI assimilation improves upon the control case in the representation of temperature, yet an examination of other model fields, such as salinity, reveals potential problems in a long-term integration. To illustrate this, consider time series of equatorial salinity at thermocline depth, averaged between 2°S and 2°N, compared with the observed salinity (Fig. 10). In the UOI experiment, the salinity structure deteriorates significantly within 3–4 months. The poor performance of UOI arises because correcting the temperature field alone introduces artificial and potentially unstable water mass anomalies, whose propagation and eventual strengthening destroy the model's dynamical balances. A method to alleviate this problem, proposed by Troccoli and Haines (1999), relies on model-derived water mass properties to correct the model salinity commensurately with the temperature corrections made by assimilating temperature observations. The salinity increments are calculated from the temperature analysis by preserving the model's local *T*–*S* relationship. While this method improves the temperature and salinity analyses when tested with the Poseidon ocean model (Troccoli et al. 2003), it is limited in that the scheme is designed solely for temperature observations and relies on the model maintaining a consistently good *T*–*S* relationship.
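The idea of preserving the local *T*–*S* relationship can be illustrated for a single profile: each analysed temperature is matched to the background temperature profile, and the background salinity found at that temperature is assigned. This is a simplified sketch of the idea only, not the Troccoli and Haines (1999) algorithm itself; the strictly decreasing background temperature, the linear interpolation, and the choice to keep the background salinity outside the background temperature range are assumptions, and the profile values are invented.

```python
from bisect import bisect_left


def ts_preserving_salinity(t_b, s_b, t_a):
    """Salinity analysis that keeps the background T-S relation (simplified).

    t_b, s_b: background temperature and salinity at each level, with t_b
    strictly decreasing with depth; t_a: analysed temperature at the same
    levels. Outside the background temperature range the background salinity
    is kept (no extrapolation).
    """
    temps = t_b[::-1]  # increasing temperatures, as bisect requires
    sals = s_b[::-1]
    s_a = []
    for k, t in enumerate(t_a):
        if not temps[0] <= t <= temps[-1]:
            s_a.append(s_b[k])  # outside the background range: keep background
            continue
        i = bisect_left(temps, t)
        if temps[i] == t:
            s_a.append(sals[i])
        else:
            # Linear interpolation of background S as a function of background T.
            w = (t - temps[i - 1]) / (temps[i] - temps[i - 1])
            s_a.append(sals[i - 1] + w * (sals[i] - sals[i - 1]))
    return s_a


t_b = [28.0, 24.0, 20.0, 15.0]  # background temperature, surface to depth (degC)
s_b = [34.5, 35.0, 35.2, 34.9]  # background salinity (psu)
t_a = [28.0, 26.0, 22.0, 15.0]  # analysed temperature: warmer thermocline
s_a = ts_preserving_salinity(t_b, s_b, t_a)  # approx. [34.5, 34.75, 35.1, 34.9]
```

The example also shows the limitation noted above: the corrected salinity can only be as good as the background *T*–*S* relation it is read from.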

To test how well the assimilation schemes preserve water mass properties, we consider, in a manner similar to Troccoli et al. (2003), the *T*–*S* relationships in the same subregions as above. *T*–*S* pairs at each observation are compared with model values interpolated to the same locations, using a *T*–*S* grid with a resolution of 0.25°C in temperature and 0.1 in salinity (Figs. 11 and 12). To ensure that the features in the figures are robust, a circle is plotted in a grid cell only if the cell contains at least five *T*–*S* pairs. A black circle means that all of these pairs come from the model simulation, a cyan circle that all come from the observations, and a red circle that the cell contains at least five pairs in total, drawn from both the model and the observations.
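The binning and colouring rule can be sketched in Python. The grid resolution and the five-pair threshold come from the text; reading the red category as "both sources present, at least five pairs in total" is our interpretation, and the function name and sample pairs are hypothetical.

```python
import math
from collections import Counter


def classify_ts_bins(model_pairs, obs_pairs, dt=0.25, ds=0.1, min_count=5):
    """Colour each T-S bin: "black" (model only), "cyan" (observations only), "red" (both).

    Pairs are (temperature degC, salinity) tuples. Only bins holding at least
    min_count pairs in total are returned, keyed by integer bin indices.
    """
    def key(t, s):
        return (math.floor(t / dt), math.floor(s / ds))

    n_model = Counter(key(t, s) for t, s in model_pairs)
    n_obs = Counter(key(t, s) for t, s in obs_pairs)
    colours = {}
    for b in n_model.keys() | n_obs.keys():
        m, o = n_model[b], n_obs[b]
        if m + o < min_count:
            continue  # too few pairs for a robust feature
        colours[b] = "black" if o == 0 else "cyan" if m == 0 else "red"
    return colours


model_pairs = [(20.1, 35.03)] * 5 + [(15.1, 34.73)] * 3
obs_pairs = [(25.1, 34.53)] * 5 + [(15.1, 34.73)] * 2 + [(10.1, 34.63)] * 4
colours = classify_ts_bins(model_pairs, obs_pairs)
```

Here the bin with four observed pairs and no model pairs is dropped, while the bin shared by three model and two observed pairs is marked red.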

North and south of the equator in both the Niño-3 and Niño-4 regions, the model without assimilation (top panels) represents the *T*–*S* relationship well, except in the warmest water (cyan circles near the top of the plot) and, to some extent, in the dense, cool, saline water (a few cyan circles below the main body of red circles). The first deficiency is successfully corrected by the MvOI and, to a lesser degree, by the UOI. Some observed warm, saline surface waters in the Niño-3 region north of the equator are not reproduced in any of the model analyses, probably because of errors in the surface forcing that the assimilation is unable to rectify. The lack of dense saline water in the model is slightly overcorrected by the MvOI: all cyan circles change to red, and some black circles appear in the Niño-3 region north and south of the equator and in the Niño-4 region south of the equator. The UOI scheme grossly overproduces this type of water south of the equator and, to a lesser degree, in the north, and it misses the more saline side of the distribution for potential density anomalies $\sigma_\theta$ of 22 to 26 kg m$^{-3}$, north as well as south of the equator. Thus, significant problems are apparent in the UOI scheme, while the MvOI improves upon the control over almost the entire range of the *T*–*S* diagram.

Meridional cross sections of temperature, salinity, and zonal velocity (Figs. 13–15) are compared with a selection of the sections prepared and presented in Johnson et al. (2000). The sections are chosen so that approximately simultaneous sections across the Pacific basin can be shown after a long period of integration (about 2 yr). These sections are included in the rmsd statistics of Figs. 8 and 9. The temperature in the UOI experiment is an improvement over the control, while the UOI salinity structure bears little resemblance to the data. The model by itself (the control) produces good salinity and current fields. The UOI salinity cross sections show no penetration of saline water from the south across the equator. The salinity close to the equator is too low, and there is an erroneous deep extension of high salinity around 8°S in the eastern basin. The MvOI salinity cross sections are closer to the observations, although the near-surface salinity at 165°E north of the equator is somewhat low and the region of high salinity at 180° south of the equator is too wide. The MvOI zonal current is closest to the observations in the western and eastern Pacific, with a better representation of the deeper subsurface maxima and a surfacing of the undercurrent at 165°E. The UOI currents reach too deep. At the date line, the current structure in MvOI is exaggerated compared with the observations, but the secondary subsurface maximum at about 4°N (the northern subsurface countercurrent) is captured by the assimilation. The UOI currents are again too weak, particularly at the equator, and reach too deep south of the equator. These figures show that the MvOI corrects the current structure on and close to the equator better than the statistics of Figs. 8 and 9 might suggest.

## 6. Conclusions

Two conceptually different forecast error covariance models were considered in the context of optimal interpolation data assimilation. One is the univariate model of the temperature error, which uses a Gaussian spatial covariance function with different scales in the zonal, meridional, and vertical directions. The second is the multivariate error covariance matrix estimated in the dominant error subspace of empirical orthogonal functions (EOFs) generated from Monte Carlo simulations. The latter provides an empirical estimate of the covariability of the errors in temperature, salinity, and current fields and spatial structure consistent with the governing dynamics. Thus during an assimilation cycle not only the temperature field but the entire ocean-state vector can be updated.

The univariate assimilation scheme brought the temperature field close to the observations, yet the structure of the unobserved fields (salinity and currents) deteriorated quickly, precluding long-term integration. Most of the problems with the univariate OI run (the missing tongue of saline water from the south and the erroneous deep penetration of high salinity in the south; currents that are too deep) are due to neglecting the correlation between temperature and salinity when assimilating temperature alone, which tends to cause spurious convective overturning. The multivariate scheme corrects the salinity and currents more successfully, as verified against independent observations.

The empirical error covariance model presented in this study is an initial estimate of the forecast error covariance and is used throughout the assimilation under the assumption that the forecast error statistics do not change significantly in time or after prior assimilation. The robustness of such an estimate was investigated and it was found that it does not exhibit significant seasonal or interannual variability, although there are not enough simulation years to distinguish among statistics during El Niño, La Niña, and normal years.

The empirical multivariate forecast error covariance model provides important information about the error statistics of all the model fields, prognostic or diagnostic. This offers a natural way to include observations of different types in the state-estimation process, for example, sea surface height, which is often a model diagnostic.

Further developments are under way in implementing the MvOI method in the global ocean model configuration, particularly in improving the ensemble statistics by including synoptic perturbations to the forcing fields and perturbations to the model parameters and initial conditions. Given the formulation of the Poseidon ocean model, it is more natural to consider the covariances of the model variables within the quasi-isopycnal layers. Investigations are also under way to make the MvOI scheme more efficient by working in a reduced space spanned by only a limited number of leading EOFs.

## Acknowledgments

The study was supported by funding provided by NASA Oceanography Program under RTOP 622-48-04 and computer time was provided by the NASA Center for Computational Sciences at NASA Goddard Space Flight Center. GCJ was supported by the NOAA Office of Oceanic and Atmospheric Research, the NOAA Office of Global Programs, and the NASA Physical Oceanography Program. We also gratefully acknowledge the helpful comments from anonymous reviewers.

## REFERENCES

Atlas, R., S. C. Bloom, R. N. Hoffman, J. V. Ardizzone, and G. Brin, 1991: Space-based surface wind vectors to aid understanding of air–sea interactions. *Eos, Trans. Amer. Geophys. Union*, **72**, 201, 204–205, 208.

Bloom, S. C., L. L. Takacs, A. M. da Silva, and D. Ledvina, 1996: Data assimilation using incremental analysis updates. *Mon. Wea. Rev.*, **124**, 1256–1271.

Borovikov, A., M. M. Rienecker, and P. S. Schopf, 2001: Surface heat balance in the equatorial Pacific Ocean: Climatology and the warming event of 1994–95. *J. Climate*, **14**, 2624–2641.

Buehner, M., 2005: Ensemble-derived stationary and flow-dependent background error covariances: Evaluation in a quasi-operational NWP setting. *Quart. J. Roy. Meteor. Soc.*, **131**, 1013–1044.

Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. *Mon. Wea. Rev.*, **126**, 1719–1724.

Cane, M. A., A. Kaplan, R. N. Miller, B. Tang, E. C. Hackert, and A. J. Busalacchi, 1996: Mapping tropical Pacific sea level: Data assimilation via a reduced state Kalman filter. *J. Geophys. Res.*, **101** (C10), 22599–22617.

Carton, J. A., and E. C. Hackert, 1990: Data assimilation applied to the temperature and circulation in the tropical Atlantic, 1983–1984. *J. Phys. Oceanogr.*, **20**, 1150–1165.

Cohn, S. E., 1997: An introduction to estimation theory. *J. Meteor. Soc. Japan*, **75**, 257–288.

Daley, R., 1991: *Atmospheric Data Analysis*. Cambridge University Press, 457 pp.

Derber, J. C., D. F. Parrish, and S. J. Lord, 1991: The new global operational analysis system at the National Meteorological Center. *Wea. Forecasting*, **6**, 538–547.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. *J. Geophys. Res.*, **99** (C5), 10143–10162.

Fisher, M., and E. Andersson, 2001: Developments in 4D-Var and Kalman filtering. Tech. Memo. 347, ECMWF, 36 pp.

Freitag, H. P., Y. Feng, L. J. Mangum, M. P. McPhaden, J. Neander, and L. D. Stratton, 1994: Calibration procedures and instrumental accuracy estimates of TAO temperature, relative humidity and radiation measurements. ERL PMEL-104, NOAA, 16 pp.

Fukumori, I., J. Benveniste, C. Wunsch, and D. B. Haidvogel, 1993: Assimilation of sea surface topography into an ocean circulation model using a steady-state smoother. *J. Phys. Oceanogr.*, **23**, 1831–1855.

Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. *Quart. J. Roy. Meteor. Soc.*, **125**, 723–757.

Ghil, M., and P. Malanotte-Rizzoli, 1991: Data assimilation in meteorology and oceanography. *Advances in Geophysics*, Vol. 33, Academic Press, 141–266.

Golub, G. H., and C. F. Van Loan, 1996: *Matrix Computations*. The Johns Hopkins University Press, 694 pp.

Harrison, D. E., E. F. P. Minnis, B. Barkstrom, and G. Gibson, 1993: Radiation budget at the top of the atmosphere. *Atlas of Satellite Observations Related to Global Change*, R. J. Gurney, J. L. Foster, and C. L. Parkinson, Eds., Cambridge University Press, 19–38.

Houtekamer, P. L., and H. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811.

Houtekamer, P. L., and H. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. *Mon. Wea. Rev.*, **129**, 123–137.

Houtekamer, P. L., L. Lefaivre, J. Derome, H. Ritchie, and H. L. Mitchell, 1996: A system simulation approach to ensemble prediction. *Mon. Wea. Rev.*, **124**, 1225–1242.

Ji, M., A. Leetmaa, and J. Derber, 1995: An ocean analysis system for seasonal to interannual climate studies. *Mon. Wea. Rev.*, **123**, 460–481.

Johnson, G. C., M. J. McPhaden, G. D. Rowe, and K. E. McTaggart, 2000: Upper equatorial Pacific Ocean current and salinity variability during the 1996–1998 El Niño–La Niña cycle. *J. Geophys. Res.*, **105**, 1037–1053.

Johnson, G. C., B. M. Sloyan, W. S. Kessler, and K. E. McTaggart, 2002: Direct measurements of upper ocean currents and water properties across the tropical Pacific during the 1990s. *Progress in Oceanography*, Vol. 52, Pergamon Press, 31–61.

Kalman, R., 1960: A new approach to linear filtering and prediction problems. *J. Basic Eng.*, **D82**, 35–45.

Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. *Bull. Amer. Meteor. Soc.*, **77**, 437–471.

Kaplan, A., Y. Kushnir, M. A. Cane, and M. B. Blumenthal, 1997: Reduced space optimal analysis for historical data sets: 136 years of Atlantic sea surface temperature. *J. Geophys. Res.*, **102**, 27835–27860.

Keppenne, C. L., and M. M. Rienecker, 2002: Initial testing of a massively parallel ensemble Kalman filter with the Poseidon isopycnal ocean general circulation model. *Mon. Wea. Rev.*, **130**, 2951–2965.

Keppenne, C. L., and M. M. Rienecker, 2003: Assimilation of temperature into an isopycnal ocean general circulation model using a parallel ensemble Kalman filter. *J. Mar. Syst.*, **40–41**, 363–380.

Kraus, E. B., and J. S. Turner, 1967: A one-dimensional model of the seasonal thermocline: II. The general theory and its consequences. *Tellus*, **19**, 98–109.

Large, W. G., and S. Pond, 1981: Open ocean momentum flux measurements in moderate to strong winds. *J. Phys. Oceanogr.*, **11**, 324–336.

Levitus, S., and T. P. Boyer, 1994: Temperature. Vol. 4, *World Ocean Atlas 1994*, NOAA Atlas NESDIS, 117 pp.

Lorenc, A. C., 1986: Analysis methods for numerical weather prediction. *Quart. J. Roy. Meteor. Soc.*, **112**, 1177–1194.

McPhaden, M. J., and Coauthors, 1998: The tropical ocean global atmosphere observing system: A decade of progress. *J. Geophys. Res.*, **103**, 14169–14240.

Oke, P. R., J. S. Allen, R. N. Miller, G. D. Egbert, and P. M. Kosro, 2002: Assimilation of surface velocity data into a primitive equation coastal ocean model. *J. Geophys. Res.*, **107**, 3122, doi:10.1029/2000JC000511.

Pacanowski, R., and S. G. H. Philander, 1981: Parameterization of vertical mixing in numerical models of tropical oceans. *J. Phys. Oceanogr.*, **11**, 1443–1451.

Philander, S. G., 1990: *El Niño, La Niña, and the Southern Oscillation*. Academic Press, 239 pp.

Preisendorfer, R. W., 1988: *Principal Component Analysis in Meteorology and Oceanography*. Elsevier, 425 pp.

Rienecker, M. M., and R. N. Miller, 1991: Ocean data assimilation using optimal interpolation with a quasi-geostrophic model. *J. Geophys. Res.*, **96** (C8), 15093–15103.

Rosati, A., R. Gudgel, and K. Miyakoda, 1996: Global ocean data assimilation system. *Modern Approaches to Data Assimilation in Ocean Modeling*, P. Malanotte-Rizzoli, Ed., Elsevier, 181–203.

Rosati, A., K. Miyakoda, and R. Gudgel, 1997: The impact of ocean initial conditions on ENSO forecasting with a coupled model. *Mon. Wea. Rev.*, **125**, 754–772.

Rossow, W. B., and R. A. Schiffer, 1991: ISCCP cloud data products. *Bull. Amer. Meteor. Soc.*, **72**, 2–20.

Schopf, P. S., and A. Loughe, 1995: A reduced-gravity isopycnal ocean model: Hindcasts of El Niño. *Mon. Wea. Rev.*, **123**, 2839–2863.

Seager, R., M. B. Blumenthal, and Y. Kushnir, 1995: An advective atmospheric mixed layer model for ocean modeling purposes: Global simulation of surface heat fluxes. *J. Climate*, **8**, 1951–1964.

Shapiro, R., 1970: Smoothing, filtering, and boundary effects. *Rev. Geophys. Space Phys.*, **8**, 359–387.

Suarez, M. J., and L. L. Takacs, 1995: Documentation of the Aries/GEOS dynamical core, version 2. NASA Tech. Memo. 104606, Vol. 5, 44 pp.

Troccoli, A., and K. Haines, 1999: Use of the temperature–salinity relation in a data assimilation context. *J. Atmos. Oceanic Technol.*, **16**, 2011–2025.

Troccoli, A., and Coauthors, 2002: Salinity adjustments in the presence of temperature data assimilation. *Mon. Wea. Rev.*, **130**, 89–102.

Troccoli, A., M. M. Rienecker, C. L. Keppenne, and G. C. Johnson, 2003: Temperature data assimilation with salinity corrections: Validation in the tropical Pacific Ocean, 1993–1998. NASA GSFC Tech. Rep. 104606, Vol. 24, 23 pp.

Wilson, S., 2000: Launching the Argo armada. *Oceanus*, **42**, 17–19.

Xie, P. P., and P. Arkin, 1997: Global precipitation: A 17-year monthly analysis based on gauge observations, satellite estimates, and numerical model outputs. *Bull. Amer. Meteor. Soc.*, **78**, 2539–2558.

Yang, S., K.-M. Lau, and P. S. Schopf, 1999: Sensitivity of the tropical Pacific Ocean to precipitation-induced freshwater flux. *Climate Dyn.*, **15**, 737–750.

Zhang, S., and J. L. Anderson, 2003: Impact of spatially and temporally varying estimates of error covariance on assimilation in a simple atmospheric model. *Tellus*, **55A**, 126–147.