## 1. Introduction

A very important task in ENSO forecasting is to determine the oceanic initial conditions as accurately as possible. Since the mid-1990s, a large effort has led to significant improvements in the initialization of ENSO prediction models through the assimilation of surface wind stress, subsurface in situ observations, and satellite observations.

The availability of data is a core issue in data assimilation. Oceanic observations, in particular subsurface observations, are relatively sparse in both space and time (McPhaden et al. 1998). The problem is more serious in other regions, such as the mid- and high latitudes and the ocean below the thermocline, where few in situ observations are available. Even for the tropical Pacific Ocean, assimilating these data into ocean models requires considerable computational effort, and computational expense is a primary concern in data assimilation.

As an alternative, satellites can consistently provide nearly complete surface observations (e.g., SST and altimetry) at very high resolution. There has been considerable interest in assimilating altimetry data (e.g., Carton et al. 1996; Cooper and Haines 1996; Segschneider et al. 2000). However, relatively little attention has been paid to the assimilation of SST, particularly for ENSO prediction. Oberhuber et al. (1998) successfully initialized a coupled ocean–atmosphere general circulation model with observed SST anomalies (SSTAs) using a nudging scheme for several ENSO prediction cases. Such an initialization scheme, however, failed to produce good predictions with other models (Chen et al. 1997; Rosati et al. 1997).

A probable major reason is that the strategy usually used to assimilate SST data cannot effectively correct the subsurface ocean state. Because SST is a prognostic variable in ocean models, the general procedure of SST assimilation is to optimally insert it into the models (e.g., Chen et al. 1997; Rosati et al. 1997; Syu and Neelin 2000; Tang and Hsieh 2003). However, this strategy can lead to serious imbalances between the thermal and dynamical fields during the assimilation process, for two primary reasons. First, it cannot effectively correct the subsurface thermal structure (the thermocline), which is mainly affected by the atmospheric wind stress; the direct impact of SST on it is of little significance. Second, there are generally large systematic differences in the spatial distribution of variance between the model and observed SST fields. Compared with the observed SST, the modeled SST variability is more narrowly confined to the equator in almost all ocean models, with less variability near the eastern boundary. When observed SST is assimilated, the model SST structure is quickly forced to resemble its observational counterpart. However, the model adjustment is relatively slow, especially that of the thermocline, which largely determines the variability of SST anomalies in the equatorial central and eastern Pacific.

Therefore, a key issue for SST assimilation is to alleviate these imbalances. In theory, this could be achieved through well-defined error covariance matrices for the model and the observations. However, designing the error covariance structures is difficult, and they are not well understood. Recently, a novel assimilation scheme based on error subspace statistical estimation (ESSE) was proposed (Lermusiaux and Robinson 1999), which combines data and dynamics in accordance with their respective dominant uncertainties. However, the cost of the algorithm may limit its application to an OGCM with present computational resources.

Alternatively, Tang and Kleeman (2002) proposed a relatively simple strategy for this problem using an intermediate complexity ocean model. Their strategy assimilated two proxy datasets, SST and subsurface thermal data, into the ocean model, so that the observational forcing was not made too strong in regions where the variance structure of the model SST differs significantly from that of the observations, and so that the subsurface temperature could be adjusted properly. The subsurface thermal data were derived from SST with a statistical technique similar to those used by Fox et al. (2002) and Carnes et al. (1994).

In this paper, we further develop the strategy proposed by Tang and Kleeman (2002) and apply it to an OGCM. A statistical atmospheric model is also coupled to the OGCM to investigate the skill of ENSO predictions initialized with the assimilation scheme. The paper is structured as follows. Section 2 briefly describes the ocean model and the data assimilation scheme. Section 3 describes a new scheme for SST assimilation. Sections 4 and 5 validate and test the new scheme by examining ocean analyses. Section 6 further explores the new scheme by examining ENSO prediction skill, and section 7 gives a summary and discussion.

## 2. Ocean model and data assimilation method

### a. The ocean model

The ocean model used here is based on the Océan Parallélisé (OPA), version 8.1 (Madec et al. 1998), a primitive equation OGCM. It is configured for the tropical Pacific Ocean between 30°N and 30°S and between 120°E and 75°W. The zonal resolution is 1°, and the meridional resolution is 0.5° within 5° of the equator, increasing smoothly to 2.0° at 30°N and 30°S. There are 25 vertical levels, 17 of which are in the top 250 m. The integration time step is 1.5 h. The boundaries are closed, with no-slip conditions. The model formulation is described in detail in Vialard et al. (2002).

The model was forced with observed wind stress and a surface heat flux $Q_s$, where $Q_s$ was represented by the climatological heat flux $Q_0$, obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis project, plus a relaxation term toward $T_0$, the observed climatological SST; that is,

$$Q_s = Q_0 + \lambda\,(T - T_0), \qquad (1)$$

where $T$ is the model SST and $\lambda$ ($-40$ W m$^{-2}$ K$^{-1}$) controls the rate of relaxation to the observed SST. A similar annual mean net freshwater flux forcing was also used. This will be referred to as the control run.
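The relaxation law $Q_s = Q_0 + \lambda(T - T_0)$ is simple enough to check numerically. The sketch below is illustrative only: it assumes fluxes are counted positive into the ocean, and the seawater density, heat capacity, and 50 m mixed-layer depth used for the damping time scale are typical values chosen here, not values from the text.

```python
import numpy as np

LAMBDA = -40.0         # relaxation coefficient, W m^-2 K^-1 (value quoted in the text)
RHO_SEAWATER = 1025.0  # kg m^-3, typical value (assumption)
CP_SEAWATER = 3990.0   # J kg^-1 K^-1, typical value (assumption)


def surface_heat_flux(q0, sst_model, sst_clim, lam=LAMBDA):
    """Q_s = Q_0 + lam * (T - T_0) for scalars or arrays.

    With lam < 0 and positive flux heating the ocean (assumed convention),
    a model SST warmer than climatology reduces the heat input.
    """
    q0 = np.asarray(q0, dtype=float)
    return q0 + lam * (np.asarray(sst_model, dtype=float) - np.asarray(sst_clim, dtype=float))


def relaxation_timescale_days(mixed_layer_depth_m, lam=LAMBDA):
    """e-folding time of an SST anomaly in a slab of the given depth under the relaxation term."""
    return RHO_SEAWATER * CP_SEAWATER * mixed_layer_depth_m / abs(lam) / 86400.0


# A 1 K warm anomaly against a 100 W m^-2 climatological flux is damped by 40 W m^-2.
print(surface_heat_flux(100.0, 28.0, 27.0))        # 60.0
print(round(relaxation_timescale_days(50.0), 1))   # 59.2
```

Under these assumed constants, the relaxation damps SST anomalies in a 50 m slab on a time scale of roughly two months.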

### b. The assimilation scheme

The assimilation scheme is a three-dimensional variational (3DVar) scheme, which seeks the correction **T** that minimizes the cost function

$$I = \frac{1}{2}\,\mathbf{T}^{\mathrm{T}}\,\mathsf{E}^{-1}\,\mathbf{T} + \frac{1}{2}\,(\mathbf{D}\mathbf{T} - \mathbf{T}_0)^{\mathrm{T}}\,\mathsf{F}^{-1}\,(\mathbf{D}\mathbf{T} - \mathbf{T}_0), \qquad (2)$$

where the superscript T represents the transpose, **T** is an *N*-component vector containing the correction to the first-guess field (the first-guess field being generated by the model before the latest data are assimilated), 𝗘 is an estimate of the *N* × *N* first-guess error covariance matrix and is assumed to be a simple Gaussian function, **D** is a simple bilinear operator interpolating from the model grid to the observational stations, **T**_{0} is an *M*-component vector containing the difference between the observations and the interpolated first-guess field, and 𝗙 is an estimate of the *M* × *M* observational error covariance matrix and is assumed to be diagonal; *N* and *M* denote the number of model grid points and the number of observational stations, respectively. The first term on the right-hand side of (2) measures the fit of the corrected field to the first-guess field, while the second term measures its fit to the observations. The result is a weighted average of the first-guess field (which contains information from an earlier period) and the observations.

The function *I* in (2) is minimized using a preconditioned conjugate gradient algorithm (Gill et al. 1981). The detailed formulation and description of the data assimilation algorithm can be found in Tang and Hsieh (2003).
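As a sketch of how $I = \tfrac12 \mathbf{T}^{\mathrm T}\mathsf{E}^{-1}\mathbf{T} + \tfrac12(\mathbf{D}\mathbf{T}-\mathbf{T}_0)^{\mathrm T}\mathsf{F}^{-1}(\mathbf{D}\mathbf{T}-\mathbf{T}_0)$ can be minimized with 𝗘 acting as the preconditioner, the NumPy code below runs a standard preconditioned conjugate gradient iteration and updates 𝗘^{−1} times the search direction recursively, so 𝗘 is never inverted. This is one textbook way to realize the idea, not necessarily the implementation of Tang and Hsieh (2003); the function name and the restriction to a diagonal 𝗙 are assumptions of the sketch.

```python
import numpy as np


def minimize_3dvar_pcg(E, D, F_diag, T0, rtol=1e-10, max_iter=200):
    """Minimize I(T) = 1/2 T'E^-1 T + 1/2 (D T - T0)' F^-1 (D T - T0).

    E      : (N, N) first-guess error covariance (symmetric positive definite)
    D      : (M, N) linear interpolation operator from model grid to stations
    F_diag : (M,) observation error variances (F assumed diagonal)
    T0     : (M,) observations minus interpolated first guess
    Returns the correction T, shape (N,).
    """
    f_inv = 1.0 / np.asarray(F_diag, dtype=float)
    T = np.zeros(E.shape[0])
    r = D.T @ (f_inv * T0)     # minus the gradient of I at T = 0
    z = E @ r                  # preconditioned residual: only E is applied
    p = z.copy()               # search direction
    q = r.copy()               # q = E^-1 p, valid because p = E r initially
    rz = r @ z
    r0_norm = np.linalg.norm(r)
    for _ in range(max_iter):
        if rz <= 0.0 or np.linalg.norm(r) <= rtol * r0_norm:
            break
        Ap = q + D.T @ (f_inv * (D @ p))   # Hessian (E^-1 + D'F^-1 D) times p, no E^-1 needed
        alpha = rz / (p @ Ap)
        T += alpha * p
        r -= alpha * Ap
        z = E @ r
        rz_new = r @ z
        beta = rz_new / rz
        p = z + beta * p
        q = r + beta * q                   # E^-1 (z + beta p) = r + beta q
        rz = rz_new
    return T
```

For small problems the result can be checked against the equivalent closed form T = 𝗘**D**ᵀ(**D**𝗘**D**ᵀ + 𝗙)⁻¹**T**₀, which also needs only 𝗘.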

### c. The old scheme for SST assimilation

As SST is usually a prognostic variable in ocean models, a common strategy for SST assimilation is to insert SST observations directly into the model by minimizing the function *I* of (2). This strategy is referred to as the “old scheme” in this paper.

The SST observations assimilated in this study are the monthly mean SST from the Comprehensive Ocean–Atmosphere Data Set (COADS; Smith et al. 1996) at 2° lat by 2° lon resolution. The assimilation domain is confined to the tropical Pacific (15°S–15°N) in all assimilation experiments described in this paper.

The elements of the first-guess error covariance matrix 𝗘 are given by the Gaussian function

$$a\,e^{-r^2/(b^2\cos\phi)},$$

where $\phi$ is the latitude of the grid point, $r$ is the distance between the two points, $a = 0.01\ (^\circ\mathrm{C})^2$, and $b = 570$ km, as in Derber and Rosati (1989). The cost function is preconditioned by 𝗘 so that only 𝗘, and not its inverse 𝗘$^{-1}$, is needed in practice. The observation error variances for SST are set to $(0.5^\circ\mathrm{C})^2$.
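The Gaussian covariance $a\,e^{-r^2/(b^2\cos\phi)}$ can be assembled directly from grid coordinates. The sketch below is a minimal version under two assumptions that the text does not specify: $r$ is the great-circle distance on a sphere of radius 6371 km, and $\phi$ is the mean latitude of each pair of points, which keeps 𝗘 symmetric.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between points given in degrees (broadcasts)."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(np.asarray(lon2) - np.asarray(lon1))
    h = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(h))


def first_guess_covariance(lat, lon, a=0.01, b_km=570.0):
    """E_ij = a * exp(-r_ij^2 / (b^2 cos(phi_ij))) for 1-D arrays of grid-point lat/lon.

    Assumption: phi_ij is the mean latitude of points i and j, so E is symmetric.
    Valid away from the poles (cos(phi) > 0), e.g. within the 30S-30N domain.
    """
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    r = great_circle_km(lat[:, None], lon[:, None], lat[None, :], lon[None, :])
    phi = np.radians(0.5 * (lat[:, None] + lat[None, :]))
    return a * np.exp(-r**2 / (b_km**2 * np.cos(phi)))
```

Note that the $\cos\phi$ factor shortens the effective correlation length away from the equator.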

## 3. A new scheme for SST assimilation

As discussed in the introduction, the old scheme for SST assimilation can lead to serious imbalances among model variables. The imbalances are mainly caused by the rapid correction of model SST and the slow adjustment of model subsurface temperature during SST assimilation. Therefore, some external forcing should be imposed during the assimilation process to speed up the subsurface adjustment and so alleviate the imbalances. In this section, we propose a new scheme for this purpose. An essential issue in the new scheme is to find additional relations between SST and subsurface temperature that transfer the temperature corrections from the surface to the subsurface during assimilation. Such relations can be derived from either statistical techniques or physical laws. This strategy has been widely used in the assimilation of sea level height (SLH) (e.g., Ji et al. 2000; Carton et al. 1996; Cooper and Haines 1996). In SLH assimilation, the additional relations play two roles: (i) transferring the correction in SLH, a diagnostic variable of ocean models, to corrections in prognostic variables such as temperature and (ii) alleviating the imbalance among model variables by forcing the subsurface adjustment. Role (ii), however, is often overlooked because of the particular importance of (i) in SLH assimilation.

Figure 1 shows the first two principal components (PCs) of the temperature anomalies at several model levels, derived from the model control run. The PCs at different depths agree closely, in particular PC1. Similarly close relations are found in observations (Fig. 2). Figures 1 and 2 thus justify using statistical techniques based on SST to correct the subsurface temperature fields.

When constructing a statistical equation, one important issue is the choice of input variables (i.e., predictors). Since SST is being assimilated from observations, it seems natural to use SST as the input to the statistical equations for all subsurface levels. However, because the statistical relations between SST and subsurface temperature decay with depth, using SST as input could lead to poor statistical estimates at deep levels. Table 1 shows the correlation coefficients of temperature anomalies among several model levels, indicating a rapid decay of correlation with depth, in particular for PC2. On the other hand, the correlation between adjacent levels is always good, as indicated by the diagonal elements in Table 1, suggesting that statistical relations between adjacent levels may be a better choice for estimating temperatures at deeper levels.

A thorough statistical scheme would therefore build statistical models for every pair of adjacent levels, starting from the surface. However, because temperature variations change little over a limited depth range during SST assimilation, for simplicity we construct statistical models only for several representative regions. We divided the upper 250 m of the ocean (17 model levels) into seven regions, as shown in Table 2. A representative level in each region, referred to as the statistical level, is chosen to build the statistical model for that region.

Another issue is the choice of an adequate statistical method. Because there is no a priori reason to believe that nonlinearity is insignificant for subsurface thermal fields, even between adjacent levels, we tested three methods: PC linear regression (LR), PC nonlinear regression using a neural network (NN), and a singular value decomposition (SVD) technique. For PC regression, an EOF analysis was applied to each statistical level, and the first three modes, which account for over 70% of the total variance, were kept to construct the statistical relation between adjacent levels for the LR and NN methods. The NN model is formulated in detail in Tang et al. (2001).
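The PC linear regression (LR) step can be sketched as an EOF analysis of each statistical level via NumPy's SVD, followed by a least-squares map from the leading PCs of the upper level to those of the lower level. Array shapes, the intercept term, and the function names are illustrative assumptions; the neural-network variant of Tang et al. (2001) is not reproduced.

```python
import numpy as np


def leading_eofs(field, n_modes=3):
    """EOF analysis of a (time, space) matrix via SVD.

    Returns (pcs, eofs, explained): pcs (time, n_modes), eofs (n_modes, space),
    and the fraction of variance explained by each retained mode.
    """
    anom = field - field.mean(axis=0)
    u, s, vt = np.linalg.svd(anom, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    pcs = u[:, :n_modes] * s[:n_modes]
    return pcs, vt[:n_modes], explained[:n_modes]


def fit_pc_regression(upper, lower, n_modes=3):
    """Least-squares map from the upper level's leading PCs to the lower level's leading PCs."""
    pcs_up, eofs_up, _ = leading_eofs(upper, n_modes)
    pcs_lo, eofs_lo, _ = leading_eofs(lower, n_modes)
    X = np.column_stack([np.ones(len(pcs_up)), pcs_up])   # intercept + predictors
    coef, *_ = np.linalg.lstsq(X, pcs_lo, rcond=None)
    return coef, eofs_up, eofs_lo, upper.mean(axis=0), lower.mean(axis=0)


def predict_lower(upper_new, model):
    """Estimate the lower-level field from new upper-level fields."""
    coef, eofs_up, eofs_lo, mean_up, mean_lo = model
    pcs_up = (upper_new - mean_up) @ eofs_up.T               # project onto upper EOFs
    pcs_lo = np.column_stack([np.ones(len(pcs_up)), pcs_up]) @ coef
    return mean_lo + pcs_lo @ eofs_lo                        # back to physical space
```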

Figure 3 shows the first three PCs of temperature at 130 m predicted by LR and NN from the first three PCs of temperature at 105 m. The differences between LR and NN are relatively small, suggesting a good linear relation between the two neighboring regions. This is also true for the other regions (not shown), so a linear statistical method is used in the remainder of this paper.

SVD differs from PC-based linear regression in that the former chooses the predictor patterns that covary most strongly with the predictands, whereas the latter uses the predictors that explain the most variance of the input field (i.e., the leading PCs). We also applied the SVD technique to build the statistical relations between adjacent regions and found that it had advantages over PC regression (Fig. 4). Therefore, SVD was used to derive all the statistical relations in this paper. To reduce artificial skill, these relations were developed with the cross-validation scheme described in Tang et al. (2001). Figure 5 shows the cross-validated correlation and rmse (root-mean-square error) between the SVD-estimated temperature anomalies and the modeled values from the control run, indicating that skillful estimates, with correlations above 0.6–0.7, can be obtained at all levels.
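The SVD technique can be sketched as a regression through the leading modes of the cross-covariance between the two levels (often called maximum covariance analysis), wrapped in a simple block-omission cross-validation loop. Regressing the predictand expansion coefficients on the predictor coefficients, and omitting blocks of 12 samples, are assumptions standing in for the scheme of Tang et al. (2001), which is not reproduced here.

```python
import numpy as np


def fit_svd_relation(upper, lower, n_modes=3):
    """Regression through the leading SVD (maximum covariance) modes of two fields.

    upper: (time, nx) predictor field; lower: (time, ny) predictand field.
    """
    mu_x, mu_y = upper.mean(axis=0), lower.mean(axis=0)
    X, Y = upper - mu_x, lower - mu_y
    u, s, vt = np.linalg.svd(X.T @ Y, full_matrices=False)   # cross-covariance modes
    U, V = u[:, :n_modes], vt[:n_modes].T                      # left / right singular vectors
    a = X @ U                                                  # predictor expansion coefficients
    b = Y @ V                                                  # predictand expansion coefficients
    coef, *_ = np.linalg.lstsq(a, b, rcond=None)               # regress b on a
    return mu_x, mu_y, U, V, coef


def predict_svd(upper_new, model):
    mu_x, mu_y, U, V, coef = model
    return mu_y + ((upper_new - mu_x) @ U) @ coef @ V.T


def cross_validated(upper, lower, n_out=12, n_modes=3):
    """Omit consecutive blocks of n_out samples, fit on the rest, and predict the omitted block."""
    pred = np.empty_like(lower, dtype=float)
    n = len(upper)
    for start in range(0, n, n_out):
        test = np.arange(start, min(start + n_out, n))
        train = np.setdiff1d(np.arange(n), test)
        model = fit_svd_relation(upper[train], lower[train], n_modes)
        pred[test] = predict_svd(upper[test], model)
    return pred
```

Skill scores computed from `pred` rather than from in-sample fits avoid the artificial skill mentioned above.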

A common strategy in SLH assimilation is to transfer the corrections in SLH to corrections in other variables using derived statistical relations; that is, the statistically estimated values are used directly to update the ocean state. This strategy neglects statistical errors and assumes that the derived relations are perfect, which is far from true. We therefore adopt an alternative strategy for SST assimilation in this paper. Viewing the statistically estimated values as one estimate of the true ocean state and the simulated values from the model control run as another, an optimal estimate of the true ocean state can be obtained by combining the two via the optimization algorithm described in section 2b. This new strategy thus amounts to an iterative assimilation procedure, in which the statistically estimated values are used as proxy data for further assimilation.
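For a single grid point with independent errors, the optimal combination of a first-guess value and a statistically estimated proxy reduces to the familiar minimum-variance weighting, which (2) generalizes to full fields with spatially correlated first-guess errors. The scalar sketch below is illustrative; the variances in the example are arbitrary.

```python
def blend(first_guess, proxy, var_first_guess, var_proxy):
    """Minimum-variance combination of two unbiased estimates with independent errors.

    Returns the analysis and its error variance.
    """
    w = var_first_guess / (var_first_guess + var_proxy)   # weight given to the proxy
    analysis = first_guess + w * (proxy - first_guess)
    var_analysis = (1.0 - w) * var_first_guess            # never exceeds either input variance
    return analysis, var_analysis


# Equal error variances give the plain average with halved variance.
print(blend(1.0, 3.0, 1.0, 1.0))  # (2.0, 0.5)
```

Treating the proxy as data with a finite error variance, rather than as truth, is what keeps an imperfect statistical estimate from overriding the model.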

Thus, the detailed procedure of the new scheme for SST assimilation at a given time step can be summarized as follows:

i) The ocean model was forced with observed wind stress to form a control run.

ii) Statistical relations between any two adjacent regions in Table 2 were derived from the control run using the SVD technique.

iii) SST was inserted into level 3 (i.e., region 1 of Table 2) of the ocean model using the 3DVar algorithm described above to obtain the temperature analysis at this level.

iv) The increment, obtained by subtracting the first guess from the analysis, was used to correct the temperature at the other levels in this region.

v) Using the statistical relation derived in step ii, the temperature at level 8 was estimated from the temperature analysis at level 3. These estimated values were assimilated into level 8 as proxy observations, and the temperatures at the other levels in this region were then corrected by repeating step iv.

vi) Step v was repeated sequentially for all remaining regions, so that all subsurface temperatures in the upper 250 m were corrected.
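Steps iii–vi can be sketched as a loop over regions. To keep the sketch self-contained, the 3DVar analysis is replaced by a pointwise scalar blend (i.e., diagonal 𝗘 and 𝗙), the same increment is applied to every level of a region, and the region layout, error variances, and relation functions are hypothetical placeholders rather than the values of Table 2 or the paper's settings. Thinning of the proxy data is omitted.

```python
import numpy as np

# Hypothetical layout: (statistical level, all levels of the region), top to bottom.
REGIONS = [(1, [0, 1, 2]), (4, [3, 4, 5]), (6, [6, 7])]


def pointwise_analysis(first_guess, obs, var_fg, var_obs):
    """Stand-in for the 3DVar step: with diagonal covariances each point is blended independently."""
    w = var_fg / (var_fg + var_obs)
    return first_guess + w * (obs - first_guess)


def new_scheme_step(temp, sst_obs, relations, var_fg=0.01, var_sst=0.25, var_proxy=0.25):
    """One assimilation step on temp with shape (n_levels, n_points).

    relations[k] maps the analysed statistical level of region k to an estimate
    of the statistical level of region k + 1 (e.g. a fitted SVD relation).
    """
    temp = np.array(temp, dtype=float)                     # work on a copy
    obs, var_obs = np.asarray(sst_obs, dtype=float), var_sst
    for k, (stat, levels) in enumerate(REGIONS):
        analysis = pointwise_analysis(temp[stat], obs, var_fg, var_obs)   # steps iii / v
        increment = analysis - temp[stat]
        temp[levels] += increment                                         # step iv
        if k + 1 < len(REGIONS):
            obs, var_obs = relations[k](temp[stat]), var_proxy            # proxy data for next region
    return temp
```

Because each region's proxy data come from the analysis just above it, a correction introduced at the surface propagates downward within a single assimilation step.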

To simplify the data assimilation algorithm, the observational errors are usually assumed to be uncorrelated; that is, the matrix 𝗙 is diagonal. However, in steps v and vi above, the “observations” are actually statistical estimates, so their errors are probably spatially correlated. In this case, the assumption that 𝗙 is diagonal is incorrect and could lead to overweighting of the observations in the optimization. In principle, the problem can be solved with a nondiagonal 𝗙, although this would greatly increase the complexity of solving (2). To avoid the complications of inverting a nondiagonal 𝗙, we adopt the simpler strategy of thinning the estimated observations used in (2), thereby reducing the effect of correlated observation errors. To achieve this, we discard the “data” at grid points on even-order latitudes.
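The thinning rule can be written as a row mask. Which rows count as “even-order” is not specified, so the sketch assumes latitude rows are counted 1, 2, 3, … from the southern edge of the assimilation domain and keeps the odd-numbered ones; the function name is hypothetical.

```python
import numpy as np


def thin_even_order_latitudes(proxy, lats):
    """Drop proxy data on the 2nd, 4th, 6th, ... latitude rows (1-based, counted south to north).

    proxy: (n_lat, n_lon) statistically estimated field; lats: (n_lat,) latitudes.
    Returns the retained rows and their latitudes, in the caller's row order.
    """
    lats = np.asarray(lats, dtype=float)
    order = np.argsort(lats)            # row indices from the southernmost row northward
    keep = np.sort(order[0::2])         # 1st, 3rd, 5th, ... rows, back in original order
    return np.asarray(proxy)[keep], lats[keep]
```

Keeping every other row roughly halves the number of proxy observations, and widens the spacing between retained data, which is where the reduced influence of correlated errors comes from.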

## 4. Validation of analyses

In this section, we examine ocean analyses from two SST assimilation experiments, one using the old scheme and the other using the new scheme; results from the control run are also presented for comparison. We use two physical variables, SSTA and heat content anomalies^{1} (HCA), to validate the ocean analyses: the former is the direct target of ENSO prediction and represents significant ENSO characteristics, whereas the latter is the source of memory for the coupled system and is important for ENSO dynamics. In addition, HCA also represents the anomalous features of sea level height and of the thermocline.
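The heat content footnote is not reproduced in this section, so the sketch below assumes HCA is the thickness-weighted mean temperature anomaly of the upper 250 m, the depth range quoted for HCA in section 5; multiplying by $\rho c_p$ and the depth would convert it to J m$^{-2}$. The layer-geometry arguments are illustrative.

```python
import numpy as np


def heat_content_anomaly(temp_anom, layer_top, layer_thickness, depth=250.0):
    """Thickness-weighted mean temperature anomaly over the top `depth` metres.

    temp_anom: (..., n_levels) anomalies; layer_top, layer_thickness: (n_levels,) in m.
    A layer straddling `depth` contributes only its part above `depth`.
    """
    top = np.asarray(layer_top, dtype=float)
    dz = np.asarray(layer_thickness, dtype=float)
    dz_used = np.clip(depth - top, 0.0, dz)      # thickness of each layer lying above `depth`
    return (np.asarray(temp_anom) * dz_used).sum(axis=-1) / dz_used.sum()
```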

Figure 6 shows the correlation and rmse between the observed and modeled SSTA for the control run and the two ocean analyses. Both analyses significantly improve the tropical Pacific SSTA simulation, with correlations above 0.8 over the whole assimilation domain. The improvements occur not only in regions where the control run has good simulation skill, but also in regions where it is poor. Since the model simulations are compared with the same data that were assimilated, such high correlations are not surprising. In fact, the good SSTA skill of both schemes probably comes mainly from the direct forcing by observations rather than from adjustment of the model physics, as discussed further below.
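The correlation and rmse maps used throughout this section can be computed point by point along the time axis. The sketch assumes (time, …) arrays of anomalies on a common grid; handling of missing data is omitted.

```python
import numpy as np


def skill_maps(model, obs):
    """Point-wise temporal correlation and rmse for (time, ...) arrays."""
    m = model - model.mean(axis=0)
    o = obs - obs.mean(axis=0)
    corr = (m * o).sum(axis=0) / np.sqrt((m**2).sum(axis=0) * (o**2).sum(axis=0))
    rmse = np.sqrt(((model - obs) ** 2).mean(axis=0))
    return corr, rmse
```

Correlation is insensitive to amplitude and offset errors while rmse is not, which is why the two are reported together.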

Figure 7 compares the observed and analyzed HCA. The observed HCA is from the dataset of the Joint Environmental Data Analysis Center at the Scripps Institution of Oceanography, which consists of all available XBT, CTD, MBT, and hydrographic observations, optimally interpolated by White (1995) onto a three-dimensional grid of 2° lat by 5° lon with 11 standard depth levels between the surface and 400 m. As expected, the analysis with the new scheme has the best correlation and rmse (Fig. 7). Since we use a statistical approach to analyze subsurface temperature during SST assimilation, the improved HCA simulation in this analysis also provides some justification for this approach.

There is little difference between the HCA skill of the control run (Figs. 7a,b) and that of the old scheme (Figs. 7c,d). This contrasts sharply with the SSTA skill in Fig. 6, suggesting a serious inconsistency between the variations in SST and those in the thermocline during SST assimilation with the old scheme. This inconsistency would have a critical impact on prediction skill, as discussed in section 6. It is also visible in Figs. 8 and 9, which show time–longitude distributions of SSTA and HCA along the equator. In Fig. 8, the old scheme produces an SSTA simulation close to the observations, yet its HCA simulation (Fig. 9b) shows little improvement over the control run. The eastward propagation of HCA appears more realistic in the new scheme (Fig. 9c) than in the old scheme (Fig. 9b) and the control run (Fig. 9a), most evidently during moderate El Niño events such as 1986–87 and the early 1990s. Figure 10 shows the correlation and rmse along the equator of the control run and of the analyses with the old and new schemes against the observed HCA, clearly indicating that the new scheme improves the HCA simulation in the equatorial eastern Pacific Ocean.

The propagation features in Fig. 9 represent a phase-shift relation between the equatorial western and eastern Pacific that characterizes the ENSO delayed-action oscillation (Schopf and Suarez 1988), so the new scheme could produce better initial conditions for ENSO prediction.

## 5. Comparison of the new scheme and data assimilation using subsurface observations

As discussed above, the new scheme yields a more realistic thermal structure in the ocean analysis (i.e., the prediction initial field). In this section, we explore the new scheme further by comparing it with the assimilation of subsurface observations, focusing on one question: how much of the useful information provided by the assimilation of subsurface observations can be captured by the new scheme?

We use the NCEP reanalysis subsurface temperature (Behringer et al. 1998; referred to hereinafter as NCEP data) instead of subsurface in situ observations (e.g., TAO and XBT) for the subsurface assimilation experiment. The NCEP reanalysis product is not only much easier and more convenient to use than the sparse and sporadic subsurface in situ observations (McPhaden et al. 1998), but also effectively improves predictions of Niño-3 (5°N–5°S, 150°–90°W) sea surface temperature anomalies at all lead times and yields ocean analyses as good as those from in situ observations (Tang et al. 2003, manuscript submitted to *Mon. Wea. Rev.*, hereinafter T03). With the assimilation scheme described in section 2, NCEP data at the upper 17 levels, from 5 to 240 m, were inserted into the corresponding levels of the OGCM after being vertically interpolated onto each model level with a simple linear algorithm. Considering the computational cost, we perform the assimilation experiment for 1993–98, because this period contains the strongest El Niño event and has also been used by Weaver et al. (2003) and Vialard et al. (2003) to test 3DVar and 4DVar assimilation systems with the same OGCM.
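The vertical interpolation of NCEP profiles onto model levels can be sketched with `numpy.interp`, which requires increasing source depths. Holding the end values constant outside the source depth range is `numpy.interp`'s default and is an assumption about how the extremes were treated.

```python
import numpy as np


def to_model_levels(profiles, source_depths, model_depths):
    """Linearly interpolate (..., n_source) profiles onto model depths.

    source_depths must increase; model depths outside the source range take the
    nearest end value.
    """
    src = np.asarray(source_depths, dtype=float)
    dst = np.asarray(model_depths, dtype=float)
    return np.apply_along_axis(lambda col: np.interp(dst, src, col), -1,
                               np.asarray(profiles, dtype=float))
```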

Figure 11 shows Hovmöller diagrams of the upper-ocean (0–250 m) HCA along 5°N and 5°S from the new scheme and from the assimilation of NCEP analysis data. The HCA analysis from the new scheme is very similar to that from the NCEP data assimilation; the correlation of HCA between the two exceeds 0.8 everywhere in the equatorial belt from 10°S to 10°N (not shown). This suggests that the new scheme may yield subsurface thermal analyses as good as those from the assimilation of subsurface data.

Figure 12 shows the correlation and rmse between the observed HCA and the HCA analyzed by assimilating NCEP data. Comparing Fig. 12 with Figs. 7e,f further shows that the new scheme produces HCA analyses as good as those obtained by assimilating NCEP data. In fact, the correlation of HCA along the equator from the new scheme is almost the same as that from the assimilation of NCEP data (Fig. 13).

The new scheme also yields good analyses of dynamical variables through the model adjustment to the corrections in the subsurface thermal fields. Figure 14 shows zonal velocity analyses generated by the new scheme and by the assimilation of NCEP data. As shown in Fig. 14, taking the NCEP data assimilation as the reference, the new scheme simulates the equatorial currents better than the control run, in particular the westward equatorial currents in the central and eastern Pacific Ocean, although these currents extend farther west in the new scheme than in the NCEP data assimilation. A good example is 1998, when the new scheme produced westward equatorial currents similar in strength and phase propagation to those from the NCEP data assimilation, whereas the control run failed to simulate them. The improvement in the zonal velocity simulation is further demonstrated in Fig. 15, which shows that the skill of the new scheme is considerably better than that of the control run in the equatorial central-eastern Pacific Ocean.

These results indicate that the new scheme can effectively adjust the oceanic thermal and dynamical fields, suggesting that it provides a useful approach to ocean data assimilation, in particular where subsurface observations are unavailable or difficult to process.

## 6. Implications for ENSO prediction

A primary purpose of data assimilation is to provide good initial conditions for model predictions, so prediction performance is a valid test bed for evaluating a data assimilation scheme. In this section, we examine ENSO predictions initialized by the new scheme; predictions initialized by the control run and by the old scheme are also presented for comparison. The period 1993–98 was chosen for the prediction experiments. For each experiment, a total of 72 hindcasts, each 12 months long, were made from January 1993 to December 1998, starting on the first day of each month (1 January, 1 February, … , 1 December).

A statistical atmospheric model was coupled to the ocean model, forming a hybrid coupled model (HCM), to carry out the prediction experiments. The atmospheric model is a monthly varying statistical model identical to that of Barnett et al. (1993), trained by linear regression on the NCEP reanalysis wind products and the Reynolds–Smith SST observations (Smith et al. 1996) from 1951–80. Details of the atmospheric model can be found in T03.

Figure 16a shows the correlation skill of the predictions initialized by the control run, the old scheme, and the new scheme, where the predicted Niño-3 SSTA is compared against observations. The predictions initialized by the new scheme and by the control run beat persistence at all lead times, whereas those initialized by the old scheme beat persistence only from the seventh month on. The skill of the old scheme declines with lead time faster than that of the control run, reaching a minimum at 5 months, beyond which it rebounds and increases. This is because the old scheme produces serious imbalances between the physical fields, as discussed above, leading to a rapid decrease in skill during the first few months. The imbalances are gradually alleviated by geostrophic adjustment in the coupled run, and the skill rebounds at later lead times. The rmse of the old scheme shows similar characteristics (Fig. 16b).
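Lead-dependent correlation and rmse, together with the persistence benchmark, can be sketched as below. The alignment is an assumption, since the exact convention is not stated here: forecast `[i, k]` is taken to verify at `obs[start_index[i] + k + 1]`, and persistence carries the observed anomaly `obs[start_index[i]]` forward unchanged.

```python
import numpy as np


def lead_skill(forecasts, obs, start_index):
    """Correlation, rmse and persistence correlation per lead for a monthly index.

    forecasts: (n_starts, n_leads) predicted anomalies; obs: (n_months,) observed
    anomalies; start_index: (n_starts,) integer index of each initial month in obs.
    """
    start_index = np.asarray(start_index)
    corr, rmse, pers = [], [], []
    for k in range(forecasts.shape[1]):
        target = obs[start_index + k + 1]
        corr.append(np.corrcoef(forecasts[:, k], target)[0, 1])
        rmse.append(np.sqrt(np.mean((forecasts[:, k] - target) ** 2)))
        pers.append(np.corrcoef(obs[start_index], target)[0, 1])
    return np.array(corr), np.array(rmse), np.array(pers)
```

A scheme "beats persistence" at lead k when `corr[k] > pers[k]`.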

Compared with the control run, the new scheme improves the prediction skill at all lead times, in particular at lead times of 4–11 months. The most obvious improvement occurs at lead times of 4–7 months, with correlation skills of around 0.7–0.8. Such high correlation skills in fact exceed those of predictions initialized by the assimilation of subsurface observations (T03), which are still considered spatially sparse and temporally sporadic (McPhaden et al. 1998). In this sense the new scheme has practical significance. Likewise, the new scheme significantly reduces the rmse at all lead times (Fig. 16b), in particular at lead times of 4–7 months.

The new scheme also improved the prediction skill for Niño-3.4 SSTA (5°N–5°S, 170°–120°W) at all lead times (Fig. 17). For Niño-4 (5°N–5°S, 160°E–150°W) SSTA, the improvement over predictions initialized by the control run occurred mainly at lead times under six months (Fig. 18), although the new scheme outperformed the old scheme at all lead times.
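The Niño-3, Niño-3.4, and Niño-4 indices can be computed as area-weighted box means of SSTA. The sketch assumes longitudes in degrees east (0–360) and cos-latitude weights; the box bounds follow the definitions given in the text.

```python
import numpy as np

# (lat_min, lat_max, lon_min, lon_max) in degrees north / degrees east (0-360).
NINO_BOXES = {
    "nino3": (-5.0, 5.0, 210.0, 270.0),    # 5N-5S, 150W-90W
    "nino34": (-5.0, 5.0, 190.0, 240.0),   # 5N-5S, 170W-120W
    "nino4": (-5.0, 5.0, 160.0, 210.0),    # 5N-5S, 160E-150W
}


def nino_index(ssta, lat, lon, box):
    """Cos-latitude weighted mean of ssta[..., lat, lon] inside a Niño box."""
    lat_min, lat_max, lon_min, lon_max = NINO_BOXES[box]
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float) % 360.0
    inside = ((lat[:, None] >= lat_min) & (lat[:, None] <= lat_max)
              & (lon[None, :] >= lon_min) & (lon[None, :] <= lon_max))
    w = np.cos(np.radians(lat))[:, None] * inside     # zero weight outside the box
    return (np.asarray(ssta) * w).sum(axis=(-2, -1)) / w.sum()
```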

Figures 16, 17, and 18 are based on 72 prediction experiments. This finite sample size implies some uncertainty in the computed correlation and rmse. To estimate this uncertainty, we used the bootstrap method of sample point omission discussed in T03. The resulting uncertainty estimates (±*σ*) are superimposed on the skill curves in these figures as vertical bars. They show that the improvement in correlation and rmse skill results from the better initial conditions produced by the new scheme rather than from the uncertainty due to the finite sample size.
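The “sample point omission” procedure of T03 is not described here. As an assumption, the sketch below uses the standard leave-one-out (jackknife) estimate of the standard error of a skill score, which is one common way to obtain ±σ bars of this kind from 72 paired samples.

```python
import numpy as np


def jackknife_sigma(x, y, stat):
    """Leave-one-out (jackknife) standard error of stat(x, y) for paired samples."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    keep = ~np.eye(n, dtype=bool)                 # row i omits sample i
    reps = np.array([stat(x[m], y[m]) for m in keep])
    return np.sqrt((n - 1) / n * np.sum((reps - reps.mean()) ** 2))


def corr(a, b):
    """Pearson correlation of two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]
```

The same function works for rmse by passing `lambda a, b: np.sqrt(np.mean((a - b) ** 2))` as `stat`.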

We next show predictions for two ENSO events, the 1997 and 1994–95 El Niños. Figure 19 shows time–longitude diagrams of the SSTA along the equator predicted with the three initialization schemes for the 1997 El Niño. Compared with the observed SSTA, the new scheme successfully predicted the timing of the onset and peak of the anomalous warming in the equatorial central-eastern Pacific, whereas the prediction initialized by the control run suffered from an unrealistically early decay and a relatively weak amplitude. For this strongest El Niño, the prediction by the new scheme is clearly better than those by the control run and the old scheme.

The advantage of the new scheme is further demonstrated by the prediction of the weak 1994 El Niño. Figure 20 shows the time–longitude diagram of predicted SSTA along the equator for this event. The predictions initialized by the control run and by the old scheme failed for this weak event. In contrast, the new scheme predicted the anomalous warming in the equatorial Pacific fairly well about six months ahead, although the predicted warming was distributed rather unevenly along the equator instead of being concentrated in the eastern Pacific as observed. The predicted warming also decayed earlier than observed.

## 7. Summary and discussion

While the long and extensive SST observational record has been widely used in many applications, from climate analyses and statistical prediction to model validation, its application to ocean data assimilation and to the initialization of ENSO models has received less attention. This is probably because the scheme widely used in SST assimilation cannot effectively adjust the subsurface thermal fields, so serious imbalances among model physical fields often appear during SST assimilation.

An essential issue in SST assimilation is to alleviate the imbalance between SST and other variables. As discussed in the introduction, the major reason for the imbalance is the rapid forcing of SST by the observations and the subsequent slow adjustment of the subsurface ocean. This inconsistency can be seen in Figs. 6 and 7, where the old scheme simulates SSTA almost perfectly but barely improves the HCA simulation. A possible solution to the imbalance problem is therefore to seek external forcing that speeds up the adjustment of the subsurface ocean. Toward this goal, a new strategy is proposed here and tested using a simple 3DVar assimilation scheme. In this new scheme, statistical relations between adjacent depths were derived using an SVD technique and then used to transfer the temperature corrections from shallower to deeper levels.

To validate the new scheme, its ocean analyses were compared with those from the old scheme, the control run, and the assimilation of NCEP subsurface reanalysis temperature. Compared with the old scheme, the new scheme adjusts the oceanic thermal and dynamical fields more effectively. Its ocean analyses display a better phase-shift relation between the upper-ocean heat content variations in the western and eastern equatorial Pacific than those from the old scheme and the control run, which might explain why the new scheme yields better correlation skill for ENSO predictions. In fact, the new scheme yielded subsurface analyses as good as those obtained from the assimilation of subsurface observations.

An ensemble of SSTA predictions was performed to examine the new scheme further. The results show that the new scheme effectively improves ENSO prediction skill at all lead times, in particular for warm events and for lead times of 4–7 months. This is interesting because ENSO predictions are currently the most skillful and reliable at these lead times. The Niño-3 prediction skill obtained with the new scheme can be as high as, or even higher than, that obtained by assimilating subsurface observations for lead times shorter than 7–8 months.
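Correlation skill as a function of lead time, as used to compare the schemes, can be computed as in the minimal sketch below. The array layout (one row per start month, one column per lead) and the function name are assumptions for illustration.

```python
import numpy as np


def correlation_skill(forecasts, observed):
    """Correlation skill of Niño-3 SSTA hindcasts at each lead time.

    forecasts: (n_start, n_lead) array; forecasts[s, j] is issued at month s
        and verifies at month s + j + 1 (a lead of j + 1 months).
    observed: (n_time,) observed Niño-3 SSTA on the same monthly index.
    Returns an array of length n_lead.
    """
    n_start, n_lead = forecasts.shape
    starts = np.arange(n_start)
    skill = np.full(n_lead, np.nan)
    for j in range(n_lead):
        lead = j + 1
        valid = starts + lead < len(observed)   # keep forecasts that can be verified
        f = forecasts[starts[valid], j]
        o = observed[starts[valid] + lead]
        skill[j] = np.corrcoef(f, o)[0, 1]
    return skill
```

Comparing the skill curves from the different initialization schemes lead by lead shows where one scheme gains over another, for example at lead times of 4–7 months.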

A key issue in the new scheme is how to construct the statistical relations for transferring temperature corrections from the surface to the subsurface. We used two novel strategies here. First, the relation between two neighboring depths was used instead of the relation between the surface and each subsurface level, because the linear relation of SST to subsurface temperature decays rapidly with depth. We also constructed statistical relations using SST for all levels and found that the relations are poor for levels deeper than 120 m. Second, we did not use the statistical relations to update subsurface temperatures directly, because any statistical scheme carries estimation errors. Instead, the values estimated by the statistical scheme were used as a proxy dataset for further assimilation. This strategy allows us to obtain an optimal estimate of subsurface temperature. In fact, assimilating proxy data into ocean models for ocean analyses and ENSO predictions is common either where observations are sparse or unavailable, or where a model with low vertical resolution cannot assimilate the available observations (e.g., Moore and Anderson 1989; Kleeman et al. 1995). A third situation in which proxy data are advantageous occurs here and in Tang and Kleeman (2002), where the proxy data are used to better estimate the subsurface ocean state through a specified scheme.
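Treating the statistical estimate as a proxy observation, rather than overwriting the model temperature, means it enters a 3DVar analysis with its own error covariance. The sketch below is the textbook linear 3DVar solution, assuming a linear observation operator `H`, a background error covariance `B`, and a proxy error covariance `R` that reflects the regression's estimation error; all matrices and values are hypothetical, and the paper's own 3DVar configuration is not reproduced here.

```python
import numpy as np


def analysis_3dvar(xb, B, y, H, R):
    """Minimizer of the linear 3DVar cost function
        J(x) = 0.5 (x - xb)^T B^-1 (x - xb) + 0.5 (y - H x)^T R^-1 (y - H x),
    i.e. xa = xb + B H^T (H B H^T + R)^-1 (y - H xb).

    xb: background (model) temperatures, y: proxy temperatures from the
    statistical scheme, R: assumed error covariance of those proxies.
    """
    innovation = y - H @ xb                 # proxy minus model at proxy points
    S = H @ B @ H.T + R                     # innovation covariance
    return xb + B @ H.T @ np.linalg.solve(S, innovation)
```

For a single level with unit background error variance, a background of 10°C and a proxy of 12°C give an analysis of 11°C when the proxy error variance is 1, but only 10.4°C when it is 4. A less reliable statistical estimate therefore pulls the analysis less, which is the motivation for not inserting the estimate directly.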

Subsurface in situ observations are still spatially sparse and temporally sporadic, and are unavailable for many regions. Moreover, assimilating these data into a high-resolution OGCM requires considerable effort and computational expense (see Weaver et al. 2003; Vialard et al. 2003). SST, on the other hand, can be obtained easily and at low cost, at high resolution, and with a long-term archive. Compared with the assimilation of other types of data, techniques for SST assimilation have received relatively little attention. In this paper, a practical scheme for the assimilation of SST data has been proposed and examined. This exploratory work therefore has practical significance, in particular for ocean data assimilation in regions where subsurface observations are not available. A further study extending the proposed strategy to mid- and high-latitude regions is under way.

## Acknowledgments

This work is supported by NSF Grant 25-74200-F0960. We thank Jérôme Vialard and Anthony Weaver for their help with configuring the ocean model.

## REFERENCES

Barnett, T. P., M. Latif, N. E. Graham, M. Flügel, S. Pazan, and W. White, 1993: ENSO and ENSO-related predictability. Part I: Prediction of equatorial sea surface temperature with a hybrid coupled ocean–atmosphere model. *J. Climate*, **6**, 1545–1566.

Behringer, D. M., M. Ji, and A. Leetmaa, 1998: An improved coupled model for ENSO prediction and implications for ocean initialization. Part I: The ocean data assimilation system. *Mon. Wea. Rev.*, **126**, 1013–1021.

Carnes, M. R., W. J. Teague, and J. L. Mitchell, 1994: Inference of subsurface thermohaline structure from fields measurable by satellite. *J. Atmos. Oceanic Technol.*, **11**, 551–566.

Carton, J. A., B. S. Giese, X. Cao, and L. Miller, 1996: Impact of altimeter, thermistor, and expendable bathythermograph data on retrospective analyses of the tropical Pacific Ocean. *J. Geophys. Res.*, **101**, 14147–14159.

Chen, D., S. E. Zebiak, and M. A. Cane, 1997: Initialization and predictability of a coupled ENSO forecast model. *Mon. Wea. Rev.*, **125**, 773–788.

Cooper, M., and K. Haines, 1996: Altimetric assimilation with water property conservation. *J. Geophys. Res.*, **101** (C1), 1059–1077.

Derber, J., and A. Rosati, 1989: A global oceanic data assimilation system. *J. Phys. Oceanogr.*, **19**, 1333–1347.

Fox, D. N., W. J. Teague, and C. N. Barron, 2002: The Modular Ocean Data Assimilation System (MODAS). *J. Atmos. Oceanic Technol.*, **19**, 240–252.

Gill, P. E., W. Murray, and M. H. Wright, 1981: *Practical Optimization*. Academic Press, 401 pp.

Ji, M., R. W. Reynolds, and D. W. Behringer, 2000: Use of TOPEX/Poseidon sea level data for ocean analyses and ENSO prediction: Some early results. *J. Climate*, **13**, 216–231.

Kleeman, R., A. Moore, and N. R. Smith, 1995: Assimilation of subsurface thermal data into a simple ocean model for the initialization of an intermediate tropical coupled ocean–atmosphere forecast model. *Mon. Wea. Rev.*, **123**, 3103–3113.

Lermusiaux, P. F. J., and A. R. Robinson, 1999: Data assimilation via error subspace statistical estimation. Part I: Theory and schemes. *Mon. Wea. Rev.*, **127**, 1385–1407.

Madec, G., P. Delecluse, M. Imbard, and C. Levy, 1998: OPA 8.1 Ocean General Circulation Model reference manual. Notes du pôle de modélisation 11, Institut Pierre Simon Laplace (IPSL), 91 pp.

McPhaden, M. J., and Coauthors, 1998: The Tropical Ocean Global Atmosphere (TOGA) observing system: A decade of progress. *J. Geophys. Res.*, **103** (C7), 14169–14240.

Moore, A. M., and D. L. T. Anderson, 1989: The assimilation of XBT data into a layer model of the tropical Pacific Ocean. *Dyn. Atmos. Oceans*, **13**, 441–464.

Oberhuber, J. M., E. Roeckner, M. Christoph, M. Esch, and M. Latif, 1998: Predicting the '97 El Niño event with a global climate model. *Geophys. Res. Lett.*, **25**, 2273–2276.

Rosati, A., K. Miyakoda, and R. Gudgel, 1997: The impact of ocean initial conditions on ENSO forecasting with a coupled model. *Mon. Wea. Rev.*, **125**, 754–772.

Schopf, P. S., and M. J. Suarez, 1988: Vacillations in a coupled ocean–atmosphere model. *J. Atmos. Sci.*, **45**, 549–566.

Segschneider, J., D. L. T. Anderson, and T. N. Stockdale, 2000: Towards the use of altimetry for operational seasonal forecasting. *J. Climate*, **13**, 3115–3138.

Smith, T. M., R. W. Reynolds, R. E. Livezey, and D. C. Stokes, 1996: Reconstruction of historical sea surface temperatures using empirical orthogonal functions. *J. Climate*, **9**, 1403–1420.

Syu, H.-H., and D. Neelin, 2000: ENSO in a hybrid coupled model. Part II: Prediction with piggyback data assimilation. *Climate Dyn.*, **16**, 35–48.

Tang, Y., and R. Kleeman, 2002: A new strategy for assimilating SST data for ENSO predictions. *Geophys. Res. Lett.*, **29**, 1841, doi:10.1029/2002GL014860.

Tang, Y., and W. W. Hsieh, 2003: ENSO simulation and prediction in a hybrid coupled model with data assimilation. *J. Meteor. Soc. Japan*, **81**, 1–19.

Tang, Y., W. W. Hsieh, B. Tang, and K. Haines, 2001: A neural network atmospheric model for hybrid coupled modeling. *Climate Dyn.*, **17**, 445–455.

Vialard, J., P. Delecluse, and C. Menkes, 2002: A modeling study of salinity variability and its effects in the tropical Pacific Ocean during the 1993–99 period. *J. Geophys. Res.*, **107**, 8005, doi:10.1029/2000JC000758.

Vialard, J., A. T. Weaver, D. L. T. Anderson, and P. Delecluse, 2003: Three- and four-dimensional variational assimilation with a general circulation model of the tropical Pacific Ocean. Part II: Physical validation. *Mon. Wea. Rev.*, **131**, 1379–1395.

Weaver, A. T., J. Vialard, and D. L. T. Anderson, 2003: Three- and four-dimensional variational assimilation with a general circulation model of the tropical Pacific Ocean. Part I: Formulation, internal diagnostics, and consistency checks. *Mon. Wea. Rev.*, **131**, 1360–1378.

White, W. B., 1995: Design of a global observing system for gyre-scale upper ocean temperature variability. *Progress in Oceanography*, Vol. 36, Pergamon, 169–217.

The correlation of PCs for temperature anomalies between two levels

Division of regions

¹ $h_i$ and $T_i$ are, respectively, the thickness and temperature of level $i$.