## 1. Introduction

In Eq. (1), $\mathbf{x}_t$ is an $n$-dimensional vector representing the coupled model state at time $t$ ($n$ is the size of the model state), $\mathbf{f}$ is an $n$-dimensional vector function, $\mathbf{w}_t$ is a white Gaussian process (uncorrelated in time) of dimension $r$ with mean 0 and covariance matrix $\mathbf{S}(t)$, while $\mathbf{G}$ is an $n \times r$ matrix. The first and second terms on the right-hand side of Eq. (1) represent the deterministic modeling and uncertainty contributions in a coupled system, respectively.
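As an illustration only, the sketch below integrates a toy system with a deterministic tendency plus a white-noise term. It assumes that Eq. (1) takes the standard stochastic differential form $d\mathbf{x}_t/dt = \mathbf{f}(\mathbf{x}_t, t) + \mathbf{G}(\mathbf{x}_t, t)\mathbf{w}_t$ with a constant covariance $\mathbf{S}$. The Euler–Maruyama step, the toy tendency `f`, the matrix `G`, and all numerical values are hypothetical choices and are not taken from the coupled model.

```python
import numpy as np

def euler_maruyama(f, G, S, x0, dt, n_steps, rng):
    """Integrate dx = f(x, t) dt + G(x, t) dW, with dW ~ N(0, S dt)."""
    x = np.array(x0, dtype=float)
    chol = np.linalg.cholesky(S)          # S = chol @ chol.T
    t = 0.0
    for _ in range(n_steps):
        # White-noise increment of dimension r with covariance S * dt.
        dw = chol @ rng.standard_normal(S.shape[0]) * np.sqrt(dt)
        x = x + f(x, t) * dt + G(x, t) @ dw
        t += dt
    return x

# Toy example: n = 2 state variables, r = 1 noise source (hypothetical).
f = lambda x, t: -0.5 * x                       # deterministic term: linear damping
G = lambda x, t: np.array([[1.0], [0.5]])       # n x r matrix
S = np.array([[0.04]])                          # r x r noise covariance

rng = np.random.default_rng(0)
members = np.array([euler_maruyama(f, G, S, [1.0, -1.0], 0.01, 200, rng)
                    for _ in range(6)])
print(members.shape)   # (6, 2)
```

Each call produces one realization, so repeating the integration with independent noise gives a small ensemble whose spread reflects the uncertainty term.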

These uncertainties cause the modeled climate to drift away from the real world. Observations of climate state variables are sparse and noisy in both time and space. For example, the expendable bathythermograph (XBT), the major means of measuring the ocean state throughout the twentieth century, provides only temperature profiles based on irregular ship courses, and satellite altimeters began to provide measurements of sea surface height only in the early 1990s. All observations have instrument measurement errors and sampling (representation) errors. Neither modeling nor observations alone provides a complete picture of climate variations (which in oceans are defined by the time series of three-dimensional temperature, salinity, and currents, etc.).

Climate research requires the implementation of data assimilation with coupled climate models for 1) assessment of climate change from all perspectives (e.g., see Hahn and Manabe 1982), 2) initialization of forecasts (Rosati et al. 1997), and 3) estimation of climate state components for which adequate measurements are still unavailable. Coupled data assimilation (CDA) uses ocean–atmosphere coupled dynamics to extract the signals from available observations (some aspects of climate states during some time periods) to produce a continuous time series of climate states in which each variable is distributed over a regular mesh in time and space. Coupled dynamics impact the assimilation results in both direct and indirect ways. The direct way refers to using observations to directly adjust certain exchange fluxes between coupled components using covariances between fluxes and observed variables. Examples include the adjustment of wind stresses and heat/water fluxes at the surface by the observed temperature at the top of the ocean. On the other hand, the assimilation results can be impacted indirectly by the feedback processes between coupled components, which improve the estimate of the background covariances in the assimilation. One example is when only oceanic data assimilation (ODA) is carried out in a CDA system in which the atmospheric circulations are to be improved by the corrected sea surface temperature (SST) and in return the improved atmospheric flows provide better surface fluxes to the oceans, which improves the estimate of the background covariance in ODA. This positive feedback process is expected to speed up the convergence of assimilation and enhance the assimilation quality. Combining all aspects above, the net result is that the reconstructed historical sequence of climate states by CDA blends the observational information and coupled dynamics. 
Because all components of the CDA-estimated coupled model state are expected to stay in a dynamical balance at any instant in time, the initial shock of coupled model forecasts initialized from CDA products is expected to be minimized.

The coupled data assimilation system at the National Oceanic and Atmospheric Administration/Geophysical Fluid Dynamics Laboratory (NOAA/GFDL) solves for a temporally varying probability density function (PDF) of climate state variables by combining the PDF of observations and a prior PDF derived from dynamically coupled models using the framework described in Eq. (1). The resulting temporally varying PDF is a complete solution for the coupled data assimilation problem. The climate state is estimated by the expectation (the first moment of the PDF, i.e., the ensemble mean) and the uncertainty of the estimate is measured by all higher-order moments. The vectorized Eq. (1) means that the solved PDF has a joint-distribution nature that reflects the physical balance between state variables required by the coupled model dynamics. The prior PDF is discretely estimated using a set of ensemble integrations of the coupled model by a Monte Carlo approach. The combination of the observational PDF and the prior PDF is implemented using the ensemble adjustment Kalman filter (EAKF; Anderson 2001, 2003). Because four major components in the GFDL’s coupled climate models—atmosphere model, land model, ocean model, and sea ice simulator—are highly parallelized, the ensemble filter, also serving as the ensemble organizer, involves a so-called superparallel technique, which is an extension of the previous study of Zhang et al. (2005). The system is currently configured for assimilating both atmospheric and oceanic observations. Under the same ensemble organizer and filter framework, other assimilation components (e.g., land and sea ice) can be added feasibly in the future when the relevant measurements for assimilation become available. 
Utilizing the cross covariances provided by the joint PDF of climate state variables, evaluated by the ensemble integrations, the system is able to maintain the physical balance (relying on the ensemble size according to the availability of computation resources) between different climate state variables. Thus, it has a wide scope of applications.
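The way an ensemble yields the first moment and the cross covariances can be sketched as follows. This is a minimal illustration, assuming the ensemble is stored as an array with one row per member; the function name `ensemble_moments` and the random test data are hypothetical.

```python
import numpy as np

def ensemble_moments(ensemble):
    """Return the ensemble mean and sample covariance.

    ensemble: array of shape (M, n), i.e., M members of an n-variable state.
    """
    mean = ensemble.mean(axis=0)
    anomalies = ensemble - mean
    # Unbiased sample covariance; off-diagonal entries are cross covariances.
    cov = anomalies.T @ anomalies / (ensemble.shape[0] - 1)
    return mean, cov

# Hypothetical 6-member ensemble of a 3-variable state.
rng = np.random.default_rng(1)
ensemble = rng.normal(size=(6, 3))
mean, cov = ensemble_moments(ensemble)
```

With only a handful of members the off-diagonal estimates are noisy, which is why the quality of the estimated balance depends on the ensemble size.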

As the first step in applying the CDA system, and to serve multiple purposes such as climate detection, ocean observing system evaluation, and assimilation validation, this study and a forthcoming follow-up study (Zhang et al. 2007, manuscript submitted to *J. Geophys. Res*.) use a perfect model study framework, or so-called idealized “twin” experiments. The truth in the twin experiments is a long model integration with the temporally varying greenhouse gas and natural aerosol (GHGNA) radiative forcings. The “observations” are the projections of the truth onto a certain observational network, modified by adding white noise to simulate the observational errors. Under the perfect model study framework, the CDA system has completed a series of long (25 yr) assimilation experiments based on the twentieth-century (XBT, CTD, etc.) and twenty-first-century (Argo) ocean observational networks. This study focuses on the system description and the first step, validation. In particular, two test cases are examined to illustrate the importance of maintaining geostrophic balance in atmospheric data assimilation (ADA) and maintaining the temperature–salinity (*T*–*S*) relationship in ODA. The most difficult assimilation case in the series of ODA experiments uses a fixed-year GHGNA radiative forcing to retrieve the truth (from a simulation with the temporally varying GHGNA radiative forcings) through the XBT network. Its analysis serves as a preliminary evaluation of the system. Detailed analyses and diagnostics of the impact of the XBT–Argo ocean observational network, the temporally varying GHGNA radiative forcing in assimilation, and the atmospheric data constraint on climate detection will be presented in follow-up studies.

This study is organized as follows. Section 2 describes the coupled model and the filtering algorithm with its parallelization design. Section 3 describes the twin-experiment design. Section 4 examines the importance of maintaining the *T*–*S* relationship in oceanic data assimilation. The importance of assimilating salinity data for estimating climate states, based on a dummy salinity observing network, is also discussed there. Section 5 examines the importance of maintaining geostrophic balance in reconstructing the mid- and high-latitude atmospheric structure. Section 6 analyzes and discusses the results of a long ODA experiment, which provides a preliminary evaluation of the system. A summary and discussion are given in section 7.

## 2. Description of the coupled data assimilation system

### a. GFDL’s coupled climate model: CM2

Using both the B-grid finite-difference and finite-volume atmosphere dynamical cores, GFDL has two coupled climate models: CM2.0 and CM2.1. In this study, the B-grid version (CM2.0) is first chosen to implement the coupled data assimilation. The CM2.0 uses the GFDL atmosphere model AM2p12 (AM2/LM2; GFDL Global Atmospheric Model Development Team 2005) with a B-grid dynamical core that has 24 vertical levels and 2° latitude by 2.5° longitude horizontal resolution, including a Mellor–Yamada level-2.5 dry planetary boundary layer, relaxed Arakawa–Schubert convection, and a simple diffusive parameterization of the vertical momentum transport by cumulus convection.

The ocean component is the fourth version of the Modular Ocean Model (MOM4) configured with 50 vertical levels (22 levels of 10-m thickness each in the top 220 m) and 1° × 1° horizontal B-grid resolution, telescoping to ⅓° meridional spacing near the equator. The model has an explicit free surface with true freshwater fluxes exchanged between the atmosphere and ocean. Parameterizations include *k*-profile parameterization vertical mixing, neutral physics, a spatially dependent anisotropic viscosity, and a shortwave radiative penetration depth that depends on a prescribed climatological ocean color. Insolation varies diurnally and the wind stress at the ocean surface is computed using the velocity of the wind relative to the surface currents. An efficient time-stepping scheme (Griffies 2005) is employed. More details can be found in Gnanadesikan et al. (2006) and Griffies (2005).

The sea ice component of the CM2.0 is the GFDL Sea Ice Simulator, a dynamical ice model with three vertical layers (one snow and two ice) and five ice-thickness categories. The elastic–viscous–plastic technique (Hunke and Dukowicz 1997) is used to calculate the internal stresses of the ice, and the thermodynamics follows a modified Semtner three-layer scheme (Winton 2000). More details can be found in appendix 1 of Delworth et al. (2006). The interactions of these four major model components in the GFDL’s coupling system are schematically demonstrated in Fig. 1.

### b. EAKF under a local least squares framework

The general derivation of an ensemble filter from Bayes’s rule (Jazwinski 1970) can be found in the literature (e.g., Evensen 1994; Miller 1998; Miller et al. 1994, 1999; Houtekamer and Mitchell 1998, 2001; Burgers et al. 1998; van Leeuwen 1999; Mitchell and Houtekamer 2000; Bishop et al. 2001; Hamill et al. 2001; Anderson 2001; Whitaker and Hamill 2002). Tippett et al. (2003) analyzed existing ensemble-based filters (Anderson 2001; Bishop et al. 2001; Whitaker and Hamill 2002), pointed out that these methods are roughly equivalent, and suggested that “deterministic square-root filter” may be an appropriate unified family name. Houtekamer and Mitchell (2001) and Anderson (2003) pointed out that ensemble-based filters can be applied sequentially to individual scalar observations when each scalar observation has an independent error distribution, or with the application of a singular value decomposition technique when the observational errors are correlated (Anderson 2003). Furthermore, Anderson (2003) described a two-step data assimilation procedure for ensemble filtering under a local least squares framework, which is well suited to parallel implementation if an appropriate core domain and halo size are defined (Zhang et al. 2005). The flow of the two-step assimilation procedure is depicted here with the aid of the schematic diagram in Fig. 2 rather than through a full mathematical derivation: the first step computes ensemble increments at an observation location and the second step distributes the increments over the impacted grid points. This universal algorithm is applied to the ADA (see section 5) and the ODA (see section 4), each with its own parameters according to the different time scales and internal variabilities of the atmospheric and oceanic processes, to construct the GFDL’s CDA system.

First, the ensemble increments are computed at the observation location $y$, with the observation value $y^o$ and standard deviation $\sigma^o_y$, which has a Gaussian distribution (marked by the thick-dashed arrow and labeled STEP 1 in Fig. 2). Then, a least squares fit is used to distribute the increment over the relevant grid points (marked by the thick-dashed arrow and labeled STEP 2 in Fig. 2) for each ensemble member. The new shape (solid arrow 1) of the prior PDF at the observation location denotes the formation of the new ensemble spread ($\Delta y'$ as below; Fig. 2, dotted curve in bottom-right panel) from the prior ensemble spread ($\Delta y^p$ as below, solid-thin curve) by the observation distribution (denoted “obs PDF” in Fig. 2). Here, $\Delta y'_i$ is formulated by

$$\Delta y'_i = \sqrt{\frac{(\sigma^o_{k,k})^2}{(\sigma^o_{k,k})^2 + (\sigma^p_{k,k})^2}}\,\Delta y^p_i = \frac{\Delta y^p_i}{\sqrt{1 + r_k^2}}, \tag{3}$$

where $\Delta y^p_i = y^p_i - \bar{y}^p$ is the deviation of the $i$th prior sample from the prior ensemble mean, $i$ represents the ensemble sample index, and $k$ represents the observation index, while $\sigma^o_{k,k}$ and $\sigma^p_{k,k}$ represent the standard deviation of the observation error and its prior estimate from the ensemble, respectively. Here $r_k = \sigma^p_{k,k}/\sigma^o_{k,k}$ is the ratio of the estimated prior ensemble standard deviation and the observational error standard deviation. Equation (3) says that if the estimated prior ensemble variance is greater than the observational error variance ($r_k > 1$), the ensemble spread is largely reduced by the observation; otherwise, the ensemble stays close to the prior. The shift of the ensemble mean (solid arrow 2) induced by this observation is computed by

$$\bar{y}^u = \frac{\bar{y}^p}{1 + r_k^2} + \frac{y^o}{1 + r_k^{-2}}, \tag{4}$$

where $\bar{y}^p$ and $\bar{y}^u$ are the prior and updated ensemble means at the observation location. Then the observational increment $\Delta y^o_i$ for the $i$th ensemble sample member at the observation location is

$$\Delta y^o_i = \bar{y}^u + \Delta y'_i - y^p_i. \tag{5}$$

These increments are regressed onto the model grid points using the prior covariance between the model grid point ($j$) and the observation location ($k$), $c^p_{j,k}$, as

$$\Delta x_{j,i} = \frac{c^p_{j,k}}{(\sigma^p_{k,k})^2}\,\Delta y^o_i, \tag{6}$$

where $x_j$ represents the component of a certain state variable at grid point $j$. The computation in Eq. (6) (marked by solid arrow 3 in Fig. 2) uses the ensemble-estimated covariance between the observation location and the model grid point, $c^p_{j,k}$, denoted by the shaded region around the observation location (asterisk) and the model gridpoint location (circle), to distribute the observation increments $\Delta y^o_i$ onto all relevant grid points for each ensemble sample member so that an “analysis PDF” is formed (Fig. 2, left panel). This kind of ensemble-based algorithm is sequential because the prior ensemble estimate of any observation, which is used to compute $\sigma^p_{k,k}$, $c^p_{j,k}$, $y^p_i$, and $\bar{y}^p$ in Eqs. (3)–(6), is updated using the ensemble vector adjusted by what is already known. The background covariance is a function of time and space; that is, it is flow dependent and anisotropic.

As shown above, an ensemble filter uses finite samples to estimate the PDF of a state variable, solving the data assimilation problem by computing the product of modeled and observational PDFs. This process, called *filtering*, solves for signals that have major likelihood at the centers of PDFs and gets rid of noise with minor likelihood at the tails of PDFs; it uses a linear regression based on the error covariance between the analyzed and observed variables (as illustrated in Fig. 2). In an ensemble-based filtering algorithm, the background error covariance between state variables is directly computed from the model ensemble integrations by a Monte Carlo approach. It is convenient to conduct multivariate data assimilation using an ensemble filter because once error covariances are available, the observational increment of any observed variable can be regressed onto another variable to obtain its adjustment. This multivariate adjustment is essential for solving problems such as climate assessment that require the joint distribution to be maintained as much as possible. The other important advantage of ensemble-based filters is that the error covariances used in the regression are flow dependent and temporally varying (Zhang and Anderson 2003). Thus, they are well suited to handling nonstationary stochastic processes such as climate variations, in which the error structure of flows is highly anisotropic and strongly dependent on the seasonal cycle and interannual fluctuations.
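The two-step procedure can be sketched for a single scalar observation and a single model grid point as follows. This is a minimal illustration that assumes the standard scalar Gaussian form of the ensemble adjustment update (Anderson 2003); the function names, the optional `weight` argument (standing in for covariance localization), and the toy numbers are hypothetical and do not reproduce the GFDL implementation.

```python
import numpy as np

def obs_increments(y_prior, y_obs, sigma_obs):
    """Step 1: ensemble increments at the observation location."""
    y_mean = y_prior.mean()
    var_p = y_prior.var(ddof=1)                  # prior ensemble variance
    r2 = var_p / sigma_obs**2                    # squared ratio r_k**2
    # Shifted mean: y_mean/(1 + r^2) + y_obs/(1 + r^-2).
    y_mean_new = y_mean / (1.0 + r2) + y_obs * r2 / (1.0 + r2)
    # Contracted deviations from the mean.
    dy_new = (y_prior - y_mean) / np.sqrt(1.0 + r2)
    return y_mean_new + dy_new - y_prior

def regress_increments(x_prior, y_prior, dy_obs, weight=1.0):
    """Step 2: distribute the increments onto one grid point by regression."""
    cov_xy = np.cov(x_prior, y_prior, ddof=1)[0, 1]
    var_p = y_prior.var(ddof=1)
    return weight * cov_xy / var_p * dy_obs

# Toy six-member ensemble: x is a state variable correlated with y.
y_prior = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x_prior = 2.0 * y_prior + 1.0
dy = obs_increments(y_prior, y_obs=3.0, sigma_obs=1.0)
x_post = x_prior + regress_increments(x_prior, y_prior, dy)
```

Applying `obs_increments` once per observation and reusing the updated ensemble for the next observation gives the sequential behavior described in the text.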

Based on a previous study (Zhang et al. 2005) on covariance filtering and observation smoothing techniques, and under computational resource constraints, trial and error was used to set the ensemble size to 6 in this perfect model study.

### c. A “super”-parallelized ensemble filter with CM2

Because of the limitations on memory storage, a single GFDL coupled model run requires a parallel computation environment [e.g., a minimum number of processing elements (PEs) is 20 on the Silicon Graphics Inc. Intel–Altix cluster] and the ensemble filter demands a so-called superparallelization technique to guarantee that model ensemble integrations and the filtering computation are conducted iteratively online. First, a large number of PEs (where *K* is the total PE number) are loaded and regrouped to form a global PE list and *M* sub-PE lists, each of which has *K*/*M* PEs (where *M* is the ensemble size). The analysis domain decomposition is done on the global PE list in which *K* analysis domains [each containing a core domain and a halo; Zhang et al. (2005)] are formed. Within each sub-PE list, the model domain decomposition is first conducted and a certain ensemble member model integration is then advanced in parallel, in which each PE works for a subdomain. In this process, these *M* sub-PE lists work independently and the whole ensemble of model integrations is forwarded synchronously. Then, when the model ensemble reaches an observational time, a data transfer process from the model domains (sub-PE lists) to the analysis domain (global PE list) is activated so that an ensemble vector is formed in each analysis domain where a specific PE updates the ensemble vector by assimilating observations independently. Once the analysis process is done, data in the ensemble vectors over core domains are transferred back to the model domains for each ensemble member on a certain sub-PE list for initializing the next cycle of ensemble model integrations. A flow chart illustrating the iterative procedure specifically for a six-member ensemble is shown in Fig. 3 in which each member uses 30 PEs to carry out the model integration (left panels) while the daily filtering analysis uses 180 PEs (right panels).
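The regrouping of PEs described above can be sketched as follows. This is only an illustration of the bookkeeping: the contiguous assignment of PE ranks to ensemble members and the function name `build_pe_lists` are assumptions, since the text does not specify how ranks are mapped.

```python
def build_pe_lists(total_pes, ensemble_size):
    """Split a global PE list into one sub-PE list per ensemble member."""
    if total_pes % ensemble_size != 0:
        raise ValueError("total_pes must be divisible by ensemble_size")
    per_member = total_pes // ensemble_size          # K / M PEs per member
    global_list = list(range(total_pes))             # used for the analysis step
    sub_lists = [global_list[m * per_member:(m + 1) * per_member]
                 for m in range(ensemble_size)]      # used for model integration
    return global_list, sub_lists

# Six-member configuration from the text: 180 PEs in total, 30 per member.
global_list, sub_lists = build_pe_lists(total_pes=180, ensemble_size=6)
print(len(global_list), len(sub_lists), len(sub_lists[0]))  # 180 6 30
```

During model integration each sub-list advances one member independently; at an observation time all PEs in the global list take part in the analysis, one analysis domain per PE.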

## 3. Design of a perfect model study using the existing ocean observational network

### a. Perfect model framework

Coupled data assimilation is a multitask problem that involves many issues: coupled model bias, sampling of the observing system, validation of the analysis scheme, etc. A CDA system is so complex that any uncertainty in the aspects described above may cause the evaluation of CDA results to become extremely difficult. To reduce the complexity, this study excludes the model bias issue by using a perfect model study framework, or so-called identical twin experiments, in which a real-ocean observational network is used to sample a modeled time series of climate states serving as the true solution of the assimilation problem. Then it is feasible to evaluate the assimilation quality by verifying assimilation results with the “truth” so that any up/downgrade of the assimilation system, when a new assimilation component or observational data type is added, or when an assimilation parameter is tuned, can be quantified. Once the confidence of the assimilation scheme in a CDA system is established, the signal of the climate variations contained in the observing system can be assessed by verifying the assimilation results with the truth. This process within the identical-twin framework is referred to as an observing system simulation experiment (OSSE) or climate detection because various scale variabilities and trends in climate variations have to be assessed in this process. The perfect model framework that is designed in this study is based on the real-ocean observational network, which is important not only for the CDA system development but also for OSSEs/climate detection.

### b. Idealized “observed” data in the actual ocean observational network

In this study, all observed ocean data are produced by projecting a model integration onto a real observational network and superimposing white noise. The three-dimensional structure of the ocean observational network is based on the temperature profiles taken from the World Ocean Database, which is maintained by the National Oceanographic Data Center. Data types used here are for the most part the same as those used by Levitus et al. (2005) for the *World Ocean Analysis*, including XBT, conductivity–temperature–depth, drifting buoy, ocean station data, undulating oceanographic recorder, and moored buoy data, shown in Fig. 4. The GFDL’s Intergovernmental Panel on Climate Change (IPCC) twentieth-century historical integration, which uses the temporally varying GHGNA radiative forcings, is set to be the true solution for the assimilation experiments. The observed ocean profiles are then formed by sampling the temperature and/or salinity of the historical integration at the locations of the ocean observational network on a daily basis, and adding white noise. The projection from the model space onto the observational space (limited to the top 500 m in this study) is basically a trilinear (horizontal and vertical) interpolation. The imposed white noise attempts to account for random measurement errors of the observing system and the interpolation error in the projection. The standard deviation of the white noise is 0.5°C for temperature and 0.1 psu for salinity at the sea surface (typical error levels for SST and sea surface salinity) and exponentially decays to zero at 500-m depth. The representation errors of the observations, which reflect the limitations of the scales of the observation sampling, are not included in the superimposed white noise. How to realistically construct the error distribution to represent sampling errors could be an interesting research topic in itself.
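The construction of a synthetic profile can be sketched as follows. This is a simplified illustration: only the vertical part of the interpolation is shown, and the e-folding depth `efold_m` is a hypothetical parameter, because the text states only that the noise standard deviation decays exponentially from its surface value to zero at 500 m.

```python
import numpy as np

def noise_std(depth_m, surface_std, efold_m=100.0, max_depth_m=500.0):
    """Observation-error standard deviation decaying exponentially with depth.

    efold_m is a hypothetical e-folding depth; the value is set to zero
    at and below max_depth_m.
    """
    depth_m = np.asarray(depth_m, dtype=float)
    std = surface_std * np.exp(-depth_m / efold_m)
    return np.where(depth_m >= max_depth_m, 0.0, std)

def make_profile_obs(truth_depths, truth_values, obs_depths, surface_std, rng):
    """Sample a 'true' profile at observation depths and add white noise."""
    # Vertical linear interpolation only (horizontal part omitted here).
    projected = np.interp(obs_depths, truth_depths, truth_values)
    noise = rng.normal(size=len(obs_depths)) * noise_std(obs_depths, surface_std)
    return projected + noise

# Hypothetical 'truth' temperature profile (deg C) on three model levels.
rng = np.random.default_rng(0)
truth_depths = np.array([0.0, 250.0, 500.0])
truth_temp = np.array([20.0, 15.0, 10.0])
obs_depths = np.array([0.0, 100.0, 500.0])
obs_temp = make_profile_obs(truth_depths, truth_temp, obs_depths, 0.5, rng)
```

The same function can be used for salinity by passing a surface standard deviation of 0.1 psu.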

### c. Observed data for the atmosphere

The atmospheric observations take the monthly mean reanalysis format (i.e., gridpoint values) of atmospheric variables (full grid points) in the GFDL’s IPCC historical run described previously. In this case, an observed atmospheric variable is a monthly mean time series from the model integration, onto which white noise is superimposed with standard deviations of 1°C for temperature; 1 m s^{−1} for the *u* and *υ* wind components; and 10 hPa for surface pressure. Again those numbers represent typical standard deviation values of atmospheric observational errors that do not include the representation error of the observations.

As discussed after Eq. (4) in section 2b, the standard deviation of the observational errors is a parameter that determines the strength of the observational constraint. The values of the atmospheric observation error standard deviation set in this section and the values of the oceanic observation error standard deviation set in the last section may be tuned for an optimal observational constraint.

## 4. Tests on ODA

The ocean observational network from the last quarter of the twentieth century is used to sample the GFDL’s IPCC historical run. All assimilation experiments in this study use observed ocean data only above 500 m. A totally independent ensemble initial condition is formed by combining the atmosphere and land states at 0000 UTC 1 January of years 41, 42, 43, 44, 45, and 46 with the ocean and ice state at 0000 UTC 1 January 44 of the GFDL’s IPCC control run (using the 1860 fixed-year GHGNA radiative forcing). The assimilation model integration only uses the fixed-year GHGNA radiative forcing at 1860, which is the hardest assimilation case in this perfect model study because the different GHGNA radiative forcings in the truth and in the assimilation model may introduce model bias into the assimilation. The initial motivation to use fixed-year GHGNA radiative forcing in the assimilation model is an attempt to determine how much of the radiative forcing information is detectable by an ocean observational network, although the temporally varying GHGNA radiative forcing will be used in the real-data assimilation. The impact of the temporally varying GHGNA radiative forcing on data assimilation for climate detection and ocean observing system evaluation will be discussed in an accompanying study (Zhang et al. 2007, manuscript submitted to *J. Geophys. Res*.), where the assimilation results with the fixed-year–temporally varying GHGNA radiative forcing are compared and analyzed in detail. Then, all of the ODA tests shown below try to answer the following question: With the ocean observational network, how much can we retrieve of the truth? In other words, these tests offer a means of simultaneously evaluating the assimilation system and the ocean observing system.

Given the fact that most ocean observations in the twentieth century consist of temperature data only, once the ODA system using the GFDL’s coupled climate models is set up, the first issue we want to address is the capability of the ODA system to maintain the physical balance in ocean flows, mostly characterized by the *T*–*S* relationship, while assimilating only ocean temperature data. As shown in Fig. 4, the coverage of the ocean observational network improved from the 1970s to the 1990s. We chose 1991 as a representative sample of the 1990s coverage for this first set of tests. With such a small ensemble size, it is important to use a weighting function, Ω(*a*, *d*) (Gaspari and Cohn 1999), for covariance filtering to limit the noise in the covariance estimates (Hamill et al. 2001). Following Zhang et al. (2005), Ω(*a*, *d*) is applied in space (horizontal and vertical) and time (an observational time window for smoothing the observations). Most of the parameters in the ODA scheme are the same as in Zhang et al. (2005) except for those that need to be adjusted according to the new model configuration, such as the halo size (10° for both longitude and latitude) and the time window (2 days before and after the analysis time). In addition, the horizontal correlation scale [the parameter *a* in Ω(*a*, *d*)] is multiplied by a cos*ϕ* (*ϕ* is the grid latitude) factor up to 80°N (S) to make the scale consistent with the characteristics of the Rossby deformation radius for a global analysis scheme. The vertical scale *a* is set to be the width of a grid box (10 m above 200 m, gradually increasing up to 80 m around 500-m depth) and each observation is only allowed to impact at most four neighboring levels (two above and two below). Then, three assimilation experiments using different analysis schemes are conducted and the root-mean-square (RMS) errors of the oceanic temperature and salinity are listed in Table 1. The errors of a control simulation (starting from the same initial conditions as the assimilation) without any data constraint are also listed as the reference.
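The covariance weighting described above can be sketched with the fifth-order piecewise rational function of Gaspari and Cohn (1999). Two points in this sketch are assumptions rather than statements from the text: *a* is treated as the half-width, so the weight vanishes for separations beyond 2*a*, and the cos*ϕ* factor is held at its 80° value poleward of 80°.

```python
import numpy as np

def gaspari_cohn(a, d):
    """Compactly supported weight for separation d and half-width a."""
    z = np.abs(np.atleast_1d(np.asarray(d, dtype=float))) / a
    w = np.zeros_like(z)
    inner = z <= 1.0
    outer = (z > 1.0) & (z < 2.0)
    zi, zo = z[inner], z[outer]
    w[inner] = (-0.25 * zi**5 + 0.5 * zi**4 + 0.625 * zi**3
                - (5.0 / 3.0) * zi**2 + 1.0)
    w[outer] = (zo**5 / 12.0 - 0.5 * zo**4 + 0.625 * zo**3
                + (5.0 / 3.0) * zo**2 - 5.0 * zo + 4.0 - 2.0 / (3.0 * zo))
    return w                                  # zero for z >= 2

def horizontal_scale(a_equator, lat_deg, max_lat_deg=80.0):
    """Scale a by cos(latitude); the factor is capped at max_lat_deg (assumed)."""
    lat = np.minimum(np.abs(lat_deg), max_lat_deg)
    return a_equator * np.cos(np.deg2rad(lat))

# Hypothetical example: 1000-km scale at the equator, evaluated at 60 degrees.
a_60 = horizontal_scale(1000.0, 60.0)
weights = gaspari_cohn(a_60, [0.0, 250.0, 500.0, 1000.0])
```

The resulting weights multiply the ensemble-estimated covariances so that distant, poorly sampled covariances are damped smoothly to zero.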

### a. Importance of maintaining the T–S relationship

The first assimilation employs a univariate analysis scheme, allowing the observed temperature to only correct the temperature itself, denoted by T2T in Table 1. The univariate ODA reduced the top-500-m ocean temperature RMS errors by 43% compared with the control case. However, the univariate scheme increases the top-500-m ocean salinity error by 6% and the top-1000-m ocean salinity at the equator by 36%, compared with the control case. From the zonal-depth sections of the temperature and salinity errors at the equator, it is found that the assimilation of temperature causes the top 250 m of the central Pacific Ocean to cool because data require a relatively shallow thermocline while the west Pacific Ocean becomes too fresh. The *T*–*S* imbalance in the univariate assimilation scheme also causes larger salinity errors in other places such as the Atlantic and Indian Oceans. The following example investigates how temperature and salinity errors can both be coherently reduced over the tropical Pacific by employing the *T*–*S* covariance.

The cooling of the central Pacific caused by the assimilation of oceanic temperature can be clearly exhibited in the zonal-depth distribution of the temperature correction right at the equator (Fig. 5a). Yet the positive *T*–*S* covariance (at the same location) over the central Pacific (Fig. 5b) means that in order to simultaneously satisfy the model relationship as well as the cooling response, the ocean has to be fresher. Because the salinity remains unadjusted in the univariate assimilation scheme, the water’s density in the central Pacific Ocean is higher than it should be. This higher density causes excessive downwelling (Fig. 6b) over the central Pacific. Through the same mechanism, excessive upwelling is produced in the western Pacific by the univariate assimilation scheme because of the failure to maintain the correct *T*–*S* balance. This excessive upwelling persistently transports the 500–1000-m freshwater over the western Pacific to the top and causes a strong negative salinity error center (the water tends to be much fresher) there. Complementary to the excessive upwelling–downwelling over the western/central Pacific Ocean, an excessive westerly undercurrent also is observed (Fig. 7b) from the western to the central Pacific.

A multivariate assimilation scheme uses the covariance between any two variables estimated by the model ensemble to accordingly adjust the ocean state when observations of only one variable are available. Figures 6c and 7c depict the tropical vertical motion and undercurrent errors (zonal-depth sections) when only temperature observations are assimilated but both temperature and salinity are adjusted by applying the *T*–*S* covariance. These results are denoted by T2TS. Compared with the univariate assimilation, use of the *T*–*S* covariance in the multivariate assimilation significantly improves the assimilation quality due to the maintenance of the *T*–*S* balance. Most notably, at the equator the salinity errors are reduced by 45%, vertical motion errors by 81% (Figs. 6b and 6c), and the undercurrent errors by 50% (Figs. 7b and 7c).

We may attribute the positive *T*–*S* covariance along the thermocline (thick red line in Fig. 5b) to upward–downward thermocline oscillations associated with the isopycnal nature of ocean movements, and the negative *T*–*S* covariance at the top of the western Pacific to the atmospheric precipitation response associated with the warmer SST (over the ascending branch of Walker cells). It is worth mentioning that attributing a covariance to a particular physical process is usually very difficult because a covariance reflects the synthesis of the correlation between two variables over all scales of motion. From the viewpoint of *information estimation*, use of covariances is a means of trying to maintain the nature of the joint distribution of a multivariate stochastic dynamical process, which plays an important role in solving such a complex problem as climate assessment. Previously, methods of relaxing the inconsistencies between the adjusted temperature and the unadjusted salinity have been studied in a few other ways to build up the *T*–*S* relationship (Troccoli and Haines 1999; Troccoli et al. 2002; Han et al. 2004; Ricci et al. 2005). More experiments with the application of covariances between temperature and currents (*T*–*U*, *T*–*V*) and between temperature and the zonal and meridional wind stresses (*T*–$\tau_x$, *T*–$\tau_y$) do not produce as dramatic an improvement in the assimilation quality (not shown here) as the *T*–*S* covariance does. However, to better maintain the nature of the joint distribution, the long run in section 6 and follow-up studies for climate detection and/or ocean observing system evaluation all use the above-mentioned covariances associated with the ocean state.

### b. Importance of assimilating salinity observations

With the advent of the new century, great efforts have been made to obtain more salinity measurements [e.g., the Array for Real-time Geostrophic Oceanography (ARGO) design and deployment]. The second set of experiments discussed below primarily attempts to quantify the importance of explicitly assimilating the observed oceanic salinity as well as temperature.

Assuming that the observational network used in section 4a provides both temperature and salinity measurements, the salinity profiles have the same structure as temperature profiles except that the “observed” data are the samples (projection) of the salinity of the truth on (onto) the ocean observational network. White noise is superimposed onto the projection of the model-simulated observed salinity data by the procedure described in section 3b. Again, to maintain the nature of the joint distribution, while assimilating the salinity, the multivariate scheme also applies the *T*–*S* covariance to adjust the temperature (denoted by TS2TS in Table 1). The resulting assimilation errors are shown in columns 8 and 9 in Table 1, as well as in Figs. 6d and 7d, which are for the tropical vertical motions and undercurrents (zonal-depth sections). Comparing columns 8 and 9 in Table 1 and Figs. 6d and 7d (TS2TS case) to columns 6 and 7 in Table 1 and Figs. 6c and 7c (T2TS case), it is observed that assimilating the salinity measurements significantly improves the analysis of salinity but has a marginal effect on the temperature assimilation errors. For example, salinity assimilation errors are reduced by 40% for the global average and 50% for the Tropics, whereas the temperature assimilation errors are reduced by only 6% for the global average and 13% for the Tropics. Again, assimilating salinity observations further improves the estimate of the joint PDF of the multivariate stochastic process, and hence the errors of both the vertical motion and the undercurrent are further reduced by approximately 13% by the introduction of salinity data (see Figs. 6c,d and 7c,d).

The meridional heat and salt transports integrated zonally and vertically (∫∫ *ρc*_{p}*Tυ dx dz* and ∫∫ *ρSυ dx dz*, respectively) are indicators of how well the ocean general circulation is estimated. Figures 8a and 8b show the annual mean of the integral of the meridional heat and salt transport, respectively, for all three data assimilation experiments. Because the density is incorrect in the univariate assimilation (red curve in Fig. 8a, denoted by T2T), the gradual increase of the northward heat transport at low latitudes from south to north (black curve in Fig. 8a, denoted by truth) is significantly trapped in the Tropics. The use of the *T*–*S* covariance mostly fixes this problem (green curve in Fig. 8a, denoted by T2TS). The introduction of salinity data greatly improves the southward heat transport over the Southern Hemisphere subtropics (blue curve in Fig. 8a, denoted by TS2TS). On the other hand, while fixing the problem of the tropical northward salt transport trap, the use of the *T*–*S* covariance overestimates the northward salt transport in the Southern Hemisphere and the southward salt transport in the middle latitudes in the Northern Hemisphere (green curve in Fig. 8b). Such overestimates may come from imperfections in the *T*–*S* covariance estimates based upon the small ensemble size (six in this case), and then these overestimates are relaxed through the direct assimilation of the salinity observations (blue curve in Fig. 8b).
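As an illustration of the transport diagnostic, the following sketch evaluates the two integrals on a regular grid by summing over longitude and depth. The constants `RHO` and `CP`, the function name, and the uniform test field are assumptions made for this example; the model's own density and grid metrics would differ.

```python
import numpy as np

RHO = 1035.0  # assumed reference density of seawater, kg m^-3
CP = 3990.0   # assumed specific heat of seawater, J kg^-1 K^-1

def meridional_transports(temp, salt, v, dx, dz):
    """Zonally and vertically integrated meridional heat and salt transport.

    temp, salt, v : arrays of shape (nz, ny, nx)
    dx            : zonal cell widths in m, shape (ny, nx)
    dz            : layer thicknesses in m, shape (nz,)
    Returns two arrays of shape (ny,), one value per latitude row.
    """
    area = dz[:, None, None] * dx[None, :, :]  # cross-sectional area of each cell, m^2
    heat = RHO * CP * np.sum(temp * v * area, axis=(0, 2))
    salt_tr = RHO * np.sum(salt * v * area, axis=(0, 2))
    return heat, salt_tr

# Tiny uniform test field: 2 layers, 3 latitude rows, 4 longitudes.
nz, ny, nx = 2, 3, 4
temp = np.full((nz, ny, nx), 10.0)  # deg C
salt = np.full((nz, ny, nx), 35.0)  # psu
v = np.full((nz, ny, nx), 0.01)     # m s^-1, northward
dx = np.full((ny, nx), 1.0e5)       # m
dz = np.full(nz, 10.0)              # m
heat, salt_tr = meridional_transports(temp, salt, v, dx, dz)
```

The heat transport is in watts when temperature is in degrees Celsius or kelvin; the unit of the salt transport depends on the salinity unit used.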

## 5. Tests on ADA: Importance of maintaining the geostrophic balance

The correlation scales employed in the atmosphere-filtering analysis are 1000 km for temperature and 500 km for *u* and *υ*. In the following test cases, one or more atmospheric variables are chosen as the observed variables to be assimilated. The purpose is to understand how to assimilate the atmospheric variables to improve the estimate of the atmospheric state and the fluxes it provides to other model components in the coupled modeling system. Using the observed atmospheric data produced in section 3c, three assimilation cases are compared and analyzed: 1) case I, to assimilate only atmospheric winds; 2) case II, to assimilate only atmospheric temperature; and 3) case III, to assimilate both winds and temperature. The verification discussed below is based on the first month’s atmospheric data assimilation results from daily analyses.

### a. Assimilating winds only (case I) versus assimilating temperature only (case II)

The first experiment (case I) assimilates only winds (both *u* and *υ* components) to adjust the atmospheric wind itself as well as the temperature, based on an ODA-only case (T2TS). Results show that the assimilation of *u* and *υ* retrieves the truth’s winds very well, reducing the global RMS errors by 58% from the ODA-only results (from 4.5 to 1.9 m s^{−1}). Reconstructing the atmospheric temperature by assimilating only temperature (case II) turns out to be somewhat more difficult than reconstructing the atmospheric winds by assimilating winds (case I), the global temperature RMS error reduction from the ODA-only being 46% (from 1.7° to 0.9°C). Because of the improvement of the atmospheric bottom boundary conditions on SST generated by the ODA process, the ODA reduces the global RMS errors for both atmospheric winds (by 18%) and atmospheric temperature (by 19%), compared with the control (the RMS errors are 5.5 m s^{−1} for winds and 2.1°C for temperature).

To illustrate the impact of assimilating only atmospheric winds or assimilating only atmospheric temperature on the atmosphere analysis, the RMS error variation with respect to latitudes, of winds and temperature (summed in the zonal and vertical domain), are plotted in Fig. 9. It is observed that while the atmospheric winds are consistently reconstructed well in all latitudes by assimilating the wind observations in case I (red curve in Fig. 9a), the atmospheric temperature is improved in the Tropics but becomes worse at middle and high latitudes (red curve in Fig. 9b). On the other hand, it is relatively easier to improve the estimates of the winds at high latitudes of the Northern Hemisphere than in the Tropics by assimilating the atmospheric temperature observations in case II (green curve in Fig. 9a) while the estimate of temperature is improved in a global domain (green curve in Fig. 9b). This phenomenon can be explained by the geostrophic balance constraint on atmospheric flows at different latitudes. In the Tropics, because of the weak geostrophic balance constraint, it is the winds that govern the formation of the flows in which the temperature adapts to the flow, so that once the winds are corrected, the temperature improves (case I), while the better temperature estimates do not guarantee improved winds (case II). Meanwhile, at middle and high latitudes where geostrophic balance dominates the atmospheric flows, the thermal winds govern the formation of the flows so that in case I the imbalance of the winds and temperature causes the temperature errors to exceed those of the ODA-only results even though the winds are corrected well, and a corrected temperature easily improves the estimate of the winds in case II. These results are consistent with the simulation experiment study of Gordon et al. (1972). 
Assimilating the atmospheric temperature only cannot improve the estimate of the Southern Hemisphere subpolar jet, probably because of the strong dependence of the jet on the SST, while the ODA cannot provide a good SST estimate because of the lack of oceanic data in this region.
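The latitude dependence described above can be read from the thermal wind relation. As a reminder, in its standard textbook form in pressure coordinates (not taken from this study; $R$ is the gas constant for dry air, $f = 2\Omega\sin\varphi$ is the Coriolis parameter, and $(u_g, v_g)$ is the geostrophic wind):

$$
\frac{\partial u_g}{\partial \ln p} = \frac{R}{f}\left(\frac{\partial T}{\partial y}\right)_p, \qquad
\frac{\partial v_g}{\partial \ln p} = -\frac{R}{f}\left(\frac{\partial T}{\partial x}\right)_p .
$$

Where $f$ is large (middle and high latitudes), the horizontal temperature gradient fixes the vertical shear of the wind, so a corrected temperature field constrains the winds, while a wind correction that is not matched by a consistent temperature adjustment leaves the two fields out of balance. As $f \to 0$ near the equator the relation degenerates, and temperature carries little direct information about the winds.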

Because of the strong internal variability of the atmospheric flows and the small ensemble size in the filter (six in this case), the use of cross covariances between temperature and winds relaxes the imbalance only slightly, but not enough to significantly improve the assimilation quality.
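The effect of ensemble size on a sampled covariance can be illustrated with a small Monte Carlo sketch. It draws pairs of correlated Gaussian variables (an idealization, with a hypothetical true correlation of 0.5) and measures how much the sample correlation scatters for six members compared with sixty. The function name and all parameters are assumptions for this example only.

```python
import numpy as np

def sample_correlation_spread(true_corr, n_members, n_trials=2000, seed=0):
    """Standard deviation of the sample correlation across repeated ensembles."""
    rng = np.random.default_rng(seed)
    cov = np.array([[1.0, true_corr], [true_corr, 1.0]])
    # draws has shape (n_trials, n_members, 2): one small ensemble per trial.
    draws = rng.multivariate_normal(np.zeros(2), cov, size=(n_trials, n_members))
    x = draws[..., 0]
    y = draws[..., 1]
    xa = x - x.mean(axis=1, keepdims=True)
    ya = y - y.mean(axis=1, keepdims=True)
    # Sample correlation of each ensemble.
    corr = (xa * ya).sum(axis=1) / np.sqrt((xa**2).sum(axis=1) * (ya**2).sum(axis=1))
    return corr.std()

spread_6 = sample_correlation_spread(0.5, n_members=6)
spread_60 = sample_correlation_spread(0.5, n_members=60)
print(f"scatter of sample correlation: N=6 -> {spread_6:.2f}, N=60 -> {spread_60:.2f}")
```

With only six members the sampled correlation scatters widely around its true value, which is consistent with the limited benefit obtained here from the temperature–wind cross covariances.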

### b. Case III: Assimilating both winds and temperature

From the analyses and discussion in the last section, assimilating both the atmospheric temperature and wind observations is critically important for obtaining a self-consistent atmospheric state. In this section, we show the results of an experiment in which both winds and temperature data are assimilated. Figure 10 presents the errors of the vertical velocity in the Tropics (averaged over 20°S–20°N) for the control (Fig. 10a), as well as the ODA-only (Fig. 10b) and the ODA + ADA (Fig. 10c) simulations. Figure 10 shows that because both the winds and the temperature are consistently estimated by the assimilation, the ADA simulation (Fig. 10c) significantly improves the estimate of the Walker circulation in the Tropics compared with the ODA-only simulation (Fig. 10b). Nevertheless, it is also clear that the estimate of the Walker circulation in the ODA-only simulation is better than that of the control simulation. Again, this is because the ODA process provides a better SST bottom boundary condition for the atmosphere. Figure 11 depicts the errors of the zonal wind stress that the atmosphere exerts on the ocean surface. It shows that the estimate of the zonal wind stress is improved by the ADA over much of the globe, but especially over the North Atlantic.

It is worth mentioning that although the problem discovered in the previous section by assimilating monthly mean winds or temperature individually may more or less be relaxed by assimilating daily data, the destruction of the geostrophic balance while only using the atmospheric wind data is a nonnegligible issue. A “balanced initial condition” for a reliable coupled system will be a key element for improving the seasonal–interannual forecasts [an El Niño–Southern Oscillation (ENSO) event, for instance].

## 6. A 25-yr ODA long run test using the CDA system

The temperature of the GFDL’s IPCC twentieth-century historical run is sampled onto the last quarter of the twentieth-century ocean temperature network (Fig. 4) to produce a 25-yr idealized observed dataset, as described in section 3b. Using the ensemble initial condition and the assimilation model configuration described in section 3b, and the ocean assimilation parameters described in section 4a, the assimilation system is run to assimilate the 25-yr ocean temperature observations. However, to simulate the sparseness of the XBT observations in the deep ocean, the observations used in this experiment are limited to depths above 500 m. In addition to the temperature correction, the observed temperature is allowed to correct salinity and currents using the covariance between these variables and the temperature. Also, ocean temperature observations above 50 m are allowed to directly impact the sea surface wind stresses (*τ*_{x} and *τ*_{y}) to increase the constraint of oceanic observations on the coupled model in the absence of an atmospheric data constraint, as the direct means of applying the coupling dynamics to the CDA system mentioned in the introduction. The heat–water fluxes appear to be very sensitive to the adjustment by the ocean temperature observations, and because of the small ensemble size (six) used in this study, the adjustment of heat–water fluxes by the ocean temperature observations is not included here.

The error reduction of the ocean temperature over the top 500 m by the ODA is presented in Fig. 12. It is shown that the global RMS error is reduced by roughly 60% (from 0.85° to 0.35°C) (Fig. 12a) during the 5-yr spinup period. The 20-yr time mean errors (from 1981 to 2000) of the vertically averaged top-500-m ocean temperature are shown in Fig. 12b (control) and Fig. 12c (ODA). Comparing Fig. 12c with Fig. 12b, it is observed that except for the Southern Ocean (south of 32°S) and the North Atlantic, the ODA significantly reduces the temperature errors to below 0.2°C from 1°C in the control. The interesting portions of the assimilation temperature errors include the southwest–northeast error belt along the northwest coastline of the Atlantic and a nearly equator-symmetric error distribution over the central-eastern tropical Pacific. The former must be associated with the complex heat–salt transport mechanism over the North Atlantic including meridional overturning circulation (MOC), and the latter may be created by the extra Kelvin waves induced by the imbalance in the data constraint process and their reflection as Rossby waves at the east coast. Some degree of imbalance still exists in the data constraint process, mainly because of an imperfect *T*–*S* relation as a result of the small ensemble size. On the other hand, the temperature assimilation errors over the Southern Ocean basically can be attributed to the sparseness of the observations in that region (see Fig. 4).

To examine the capability of the ODA to reconstruct ENSO variability, the anomalies of the regionally averaged ocean temperature over the Niño-3.4 region are computed and presented in Fig. 13. Figure 13 shows that except for some small-scale details, the ODA (second panel from the top, denoted by ASSIM) captures nearly all events, that is, reproducing the phase and amplitude of all ENSO events of the truth (third panel from the top) while the control (top panel, denoted by CTL) exhibits its own ENSO variability that is entirely different from the truth. The ability of the ODA to accurately reconstruct the ENSO variability can be more clearly demonstrated by the vertical average of the Niño-3.4 ocean temperature anomalies (bottom panel). Note that the assimilation curve (red) follows the truth (black) very closely.
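A regionally averaged anomaly of this kind can be computed as in the following sketch. It assumes the commonly used Niño-3.4 box (5°S–5°N, 170°–120°W, written here as 190°–240°E), cosine-latitude area weights, monthly data starting in January, and anomalies taken relative to a reference series' mean annual cycle. The function names and the synthetic fields are hypothetical.

```python
import numpy as np

def box_mean(field, lat, lon, lat_bounds=(-5.0, 5.0), lon_bounds=(190.0, 240.0)):
    """Area-weighted mean of field (nt, ny, nx) over a latitude-longitude box."""
    lat_mask = (lat >= lat_bounds[0]) & (lat <= lat_bounds[1])
    lon_mask = (lon >= lon_bounds[0]) & (lon <= lon_bounds[1])
    sub = field[:, lat_mask][:, :, lon_mask]
    # Weight each grid cell by the cosine of its latitude.
    weights = np.broadcast_to(np.cos(np.deg2rad(lat[lat_mask]))[:, None], sub.shape[1:])
    return np.average(sub.reshape(sub.shape[0], -1), axis=1, weights=weights.ravel())

def monthly_anomaly(series, reference):
    """Remove the reference series' mean annual cycle from a monthly series."""
    ref_months = np.arange(reference.size) % 12
    clim = np.array([reference[ref_months == m].mean() for m in range(12)])
    return series - clim[np.arange(series.size) % 12]

# Synthetic two-year example: a pure seasonal cycle ("truth") and a copy offset by 0.5 deg C.
lat = np.linspace(-10.0, 10.0, 9)
lon = np.arange(180.0, 260.0, 5.0)
months = np.arange(24)
seasonal = 27.0 + np.sin(2.0 * np.pi * months / 12.0)
truth = np.broadcast_to(seasonal[:, None, None], (24, lat.size, lon.size))
assim = truth + 0.5

truth_index = box_mean(truth, lat, lon)
assim_anom = monthly_anomaly(box_mean(assim, lat, lon), reference=truth_index)
```

Because the synthetic truth contains only a repeating annual cycle, the anomaly of the offset field relative to the truth climatology is the constant offset itself.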

Another interesting point about the coupled ODA process we want to show here is the response of the atmospheric bottom winds to the SST generated by the ODA. Figure 14 presents the zonal wind stress (*τ*_{x}) exerted on the ocean surface in the Tropics by the atmosphere in three cases: the control (top panel), the ODA (second panel from the top), and the truth (third panel from the top). First, the control *τ*_{x} shows an entirely different variability from the truth; that is, while a few strong wind bursts associated with strong ENSO events appear in the truth during the first 20 yr, the control only has some weak *τ*_{x} anomalies with different phase. The ODA *τ*_{x} captures the major wind burst events that occur in the truth, although the former tends to be a smoothed version of the truth. Also note that the very strong wind burst event in 1982–83 is reconstructed precisely.

It is useful to estimate the uncertainty of various variables of the assimilation product in an ensemble filtering framework. This exercise may further our understanding of assimilation results and possibly provide clues on how to improve the assimilation system. The upper–lower bounds of the ODA ensemble spread of the Niño-3.4 temperature anomaly averaged over the top 250 m are plotted by pink dashed lines in the bottom panel in Fig. 13. (The terms “spread” and “deviation” throughout this study are used relative to the ensemble mean.) Comparing the ODA spread to the control spread (green dashed), which is estimated by using six 25-yr nonoverlapping time series of the model simulation from the 150-yr IPCC control run (using 1860 fixed-year GHGNA radiative forcings), the ODA reduces the uncertainty of the model heat content dramatically because of the direct constraint of ocean temperature observations. Comparing the spread of the assimilation wind stress (pink dashed lines in the bottom panel in Fig. 14) with the spread of the control wind stress (green dashed), although the entire ensemble of the ODA wind stresses appears to follow the truth’s trend, the uncertainty of the zonal wind stress is only slightly reduced by the ODA. The horizontal distribution of the time mean (25 yr) of the standard deviations of the spread of the zonal wind stress and SST is shown in Fig. 15. The largest uncertainty of the model wind stress is located over the North Atlantic, which may be associated with the North Atlantic Oscillation (NAO) phenomenon and the low-level jets there. Two other places exhibiting large model wind stress spread are the North Pacific and the high latitudes in the Southern Ocean, which are basically consistent with the corresponding regional storm tracks. The largest model spreads of SST (Fig. 15c) are located over the equatorial Pacific and the North Atlantic.
The former reflects the ENSO variability that is associated with the Kelvin wave activity and strong atmosphere–ocean interaction, while the latter can be linked to the North Atlantic gyre. In addition, some stronger spread centers associated with Rossby wave activity are found in the middle and high latitudes of the Pacific. Beneath the mixed layer, these Rossby-wave-related spread centers become even stronger and appear to spread over the whole Pacific (not shown here). Through the ODA, the uncertainties of ocean temperature over the Pacific and Atlantic are significantly reduced (Fig. 15d) by the direct assimilation of observed ocean temperatures, while the uncertainties of the wind stresses are reduced slightly at the equator and remain nearly unchanged off the equator (Fig. 15b; see also the bottom panel in Fig. 14). In this perfect model ODA experiment, the atmospheric spread is based on both stochastic initial conditions and SST uncertainties. The difference between the wind stress spread and the ocean temperature spread generated by the ODA (in Figs. 13–15) implies that the atmospheric spread is dominated by the strong internal variability of the atmosphere, while the convergence of SST brought about by the ODA is not sufficient to constrain the atmosphere.
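The spread diagnostics discussed above can be sketched as follows. The example assumes that the plotted bounds are the ensemble mean plus and minus one sample standard deviation about that mean; the study does not state the exact definition, and the function name and numbers are illustrative only.

```python
import numpy as np

def ensemble_bounds(ensemble):
    """Ensemble mean and one-standard-deviation envelope.

    ensemble : array of shape (n_members, n_times)
    Returns (mean, lower, upper), each of shape (n_times,).
    """
    mean = ensemble.mean(axis=0)
    # Spread is measured relative to the ensemble mean (sample standard deviation).
    spread = ensemble.std(axis=0, ddof=1)
    return mean, mean - spread, mean + spread

# Three made-up members at two times.
members = np.array([[0.0, 1.0],
                    [2.0, 3.0],
                    [4.0, 5.0]])
mean, lower, upper = ensemble_bounds(members)
```

Applying the same function to the control ensemble and to the assimilation ensemble allows the two envelopes to be compared time by time, as is done for the heat content and the wind stress.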

Finally, time series of heat content anomalies in various ocean basins (basin averaged over the top 500 m) are presented in Fig. 16. For comparison, all anomalies are computed using the truth’s climatology. The ocean mask used here is the same as in the work of Levitus et al. (2000). Comparing the black curve (truth) in Fig. 16 to the estimate of Levitus et al. (2000, their Fig. 1) for the top 300 m using real-ocean temperature observations, it is found that the GFDL’s model simulation using the historical GHGNA radiative forcing shows a consistent multidecadal warming trend in almost all oceans with its own interannual variability. Figure 16 shows that for oceans that have reasonable observation coverage (as shown in Fig. 4) the ODA process retrieves the trend and the variability of the heat content quite well, with a reduced uncertainty [pink dashed (green dashed) curves represent the upper and lower bounds of the ODA (control) spread]. While the heat content anomalies of those oceans (which have good data coverage) approach the truth quickly (i.e., within a couple of years; e.g., over the Pacific and North Atlantic Oceans), a relatively longer spinup is required for the oceans in which data coverage is sparse (e.g., over the South Atlantic, and the southern and northern Indian Ocean). It is interesting to notice that the ODA’s heat content in the Southern Ocean and the Arctic Ocean follows the general trend of the truth with different details. For example, in the assimilation, the Southern Ocean heat content maintains a nearly constant departure from the truth while the Arctic Ocean heat content exhibits some strong warm events in the mid-1990s. Because of the lack of observations in the Southern Ocean and the Arctic Ocean (see Fig. 4), the adjustment of the trend in both oceans may be attributed to the communication between ocean circulations in different areas and/or between different coupling components.
For example, the adjustment of the Southern Ocean heat content trend can be maintained by the interaction between the circulations in the Southern Ocean and other neighboring oceans such as the South Pacific, the South Atlantic, and the southern Indian Ocean, where the strong data constraint in ODA significantly corrects their circulations. The adjustment of the Arctic Ocean heat content trend may be more complex by also being associated with the ice–water interaction and ice–atmosphere flux exchanges. More detailed analyses on interactions between different ocean circulations and different coupling components and their impact on climate detection will be investigated in future studies.

## 7. Conclusions and discussion

We have described a CDA system based on the GFDL’s CM2 and an EAKF. The method produces ensemble estimates of the coupled system state and its uncertainty, by assimilating observations in a temporally continuous manner. The resulting ensemble can then be used to initialize seasonal to multidecadal forecasts of climate variability and trends. The experiments herein serve as a proof-of-concept ensemble data assimilation in comprehensive coupled models.

The CDA system is evaluated using a series of twin experiments, in which a particular model integration (with temporally evolving GHGNA radiative forcings) serves as the “truth” from which observations are drawn. These experiments highlight the importance of maintaining temperature–salinity relationships (associated with particular water masses) in ODA, and of maintaining geostrophic balance for ADA. They also address whether the ODA in the CDA system is capable of reconstructing the twentieth-century variability and trends of global ocean heat content, given the twentieth-century ocean observing network.

Because the atmosphere exhibits strong internal variability, a small ensemble (six in this case) may not accurately evaluate the covariance among the simulated fields, and thus the assimilation cannot effectively adjust the analysis solution away from the observational locations. The ADA experiments with dense atmospheric observations suggest that in the Tropics observations of the winds are most useful in reconstructing the atmospheric state (Gordon et al. 1972); while in the middle and high latitudes, atmospheric temperature data are more useful for establishing the geostrophic balance. The more slowly evolving ocean, in contrast, appears amenable to assimilation even with the small ensemble. The ODA effectively utilizes the *T*–*S* covariances to maintain realistic water masses, isopycnal transports, and the observed collocation of warm SST and enhanced precipitation in tropical warm pool regions, even when temperature-only observations are assimilated. At higher latitudes, direct salinity observations become more important for constraining the ocean circulation.

To test how well the analysis system reconstructs the oceanic impacts of twentieth-century radiative forcing changes, we performed a 25-yr CDA using the historical oceanic temperature observing network. We find that the assimilation takes at most 5 yr to spin up to equilibrium, at which point the heat content in all eight ocean basins closely resembles the true trends and variability—with a 60% reduction in RMS ocean temperature errors relative to an unconstrained control run. The true heat content variability is captured best in the Pacific, where the data coverage is relatively dense; ENSO variations in particular are reconstructed very well. The analysis is less skillful in high-latitude regions, where observations are extremely sparse over the twentieth century; while the assimilation is unable to capture the interannual variability of the oceanic heat content, it does reconstruct the long-term trend teleconnected from the lower latitudes.

The purpose of this study has been to outline the design, implementation, and initial evaluation of the ensemble CDA system, in terms of reconstructing the twentieth-century oceanic temperature trends and variability. In a series of follow-up studies, we intend to explore several remaining issues.

The impacts of temporally varying radiative forcing on the state estimate: the present study represents a particularly stringent test for the CDA, in that the truth run uses historically evolving GHGNA forcings, while the assimilation run uses GHGNA radiative forcings from a fixed year (1860). Presumably, more realistic radiative forcing will improve the CDA performance.

Impacts of the observational network on the detection of climate variability and trends: in particular, we will explore to what extent the deep-profile temperature and salinity measurements from Argo floats can better constrain the assimilation in high latitudes, which experience substantial freshwater input from river runoff and melting ice. These freshwater inputs, combined with strong thermohaline transports, may be key in determining the Atlantic MOC, an important source of multidecadal climate variability and trends.

Impacts of atmospheric observations on the coupled state estimate, and on the initial conditions used for forecasts of global climate: presumably this will have a positive impact for the Tropics and ENSO, where the air–sea fluxes of heat and momentum are largely controlled by the atmosphere. Estimation and prediction of the high-latitude oceans and the global ocean circulation may also benefit from ADA, given the link between atmospheric NAO and the MOC (Delworth and Greatbatch 2000; Delworth and Dixon 2000).

Impacts of model drifts and biases on the assimilation skill: the two main approaches we would like to explore include (a) the assimilation of additional kinds of observations—such as satellite SSTs and altimeter, and ocean currents from drifting and moored buoys to increase the sample size of oceanic observations; and (b) the use of multiple coupled models and multiple model parameters in the assimilating ensemble for improving the estimation of the prior PDF. In particular, for the first approach, because altimeter data contain integrated information about the temperature and salinity within the whole water column, based on the model dynamics the ensemble filter may project sea surface height information onto the vertical structure so as to correct the biases beneath the surface. How to use altimeter data to build the vertical structure of oceanic circulations will be the next challenge within the perfect model study framework. The multimodel approach could be a long-term goal. The improvement of the estimate of the prior PDF brought by a multimodel ensemble not only could improve assimilation skills but also may ultimately enhance the ensemble forecast quality.

## Acknowledgments

Thanks to Dr. J. L. Anderson for generous discussions on the early parallel design of the ensemble system and to V. Balaji for his continuing support of the parallel code and environment. Thanks to Dr. Guijun Han for her suggestions in processing observation data during her visit at GFDL. Thanks also to F. Zeng, Mike Spelman, Z. Liang, and H. Lee for their help in configuring the coupled model; F. Zeng also provided frequent help with data processing and visualization. The authors thank Drs. T. Gordon, S. Fan, and H. Lee for their comments on earlier versions of this manuscript.

## REFERENCES

Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. *Mon. Wea. Rev.*, **129**, 2884–2903.

Anderson, J. L., 2003: A local least squares framework for ensemble filtering. *Mon. Wea. Rev.*, **131**, 634–642.

Bishop, C. H., B. J. Etherton, and S. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. *Mon. Wea. Rev.*, **129**, 420–436.

Burgers, G., P. J. van Leeuwen, and G. Evensen, 1998: Analysis scheme in the ensemble Kalman filter. *Mon. Wea. Rev.*, **126**, 1719–1724.

Delworth, T. L., and K. Dixon, 2000: Implications of the recent trend in the Arctic/North Atlantic oscillation for the North Atlantic thermohaline circulation. *J. Climate*, **13**, 3721–3727.

Delworth, T. L., and R. J. Greatbatch, 2000: Multidecadal thermohaline circulation variability driven by atmospheric surface flux forcing. *J. Climate*, **13**, 1481–1495.

Delworth, T. L., and Coauthors, 2006: GFDL’s CM2 global coupled climate models. Part I: Formulation and simulation characteristics. *J. Climate*, **19**, 643–674.

Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. *J. Geophys. Res.*, **99**, 10143–10162.

Gaspari, G., and S. E. Cohn, 1999: Construction of correlation functions in two and three dimensions. *Quart. J. Roy. Meteor. Soc.*, **125**, 723–757.

GFDL Global Atmospheric Model Development Team, 2004: The new GFDL global atmosphere and land model AM2/LM2: Evaluation with prescribed SST simulations. *J. Climate*, **17**, 4641–4673.

Gnanadesikan, A., and Coauthors, 2006: GFDL’s CM2 global coupled climate models. Part II: The baseline ocean simulation. *J. Climate*, **19**, 675–697.

Gordon, C. T., L. Umscheid Jr., and K. Miyakoda, 1972: Simulation experiments for determining wind data requirements in the tropics. *J. Atmos. Sci.*, **29**, 1064–1075.

Griffies, S. M., 2005: Some ocean model fundamentals. *Ocean Weather Forecasting: An Integrated View of Oceanography*, E. P. Chassignet and J. Verron, Eds., Springer, 19–74.

Hahn, D. G., and S. Manabe, 1982: Observational network and climate monitoring. *Proc. Sixth Annual Climate Diagnostics Workshop*, Palisades, NY, Lamont-Doherty Geological Observatory, 229–235.

Hamill, T. M., J. S. Whitaker, and C. Snyder, 2001: Distance-dependent filtering of background error covariance estimates in an ensemble Kalman filter. *Mon. Wea. Rev.*, **129**, 2776–2790.

Han, G., J. Zhu, and G. Zhou, 2004: Salinity estimation using the *T*–*S* relation in the context of variational data assimilation. *J. Geophys. Res.*, **109**, C03018, doi:10.1029/2003JC001781.

Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. *Mon. Wea. Rev.*, **126**, 796–811.

Houtekamer, P. L., and H. L. Mitchell, 2001: A sequential ensemble Kalman filter for atmospheric data assimilation. *Mon. Wea. Rev.*, **129**, 123–137.

Hunke, E. C., and J. K. Dukowicz, 1997: An elastic–viscous–plastic model for sea ice dynamics. *J. Phys. Oceanogr.*, **27**, 1849–1867.

Jazwinski, A. H., 1970: *Stochastic Processes and Filtering Theory*. Academic Press, 376 pp.

Levitus, S., J. I. Antonov, T. P. Boyer, and C. Stephens, 2000: Warming of the World Ocean. *Science*, **287**, 2225–2229.

Levitus, S., J. I. Antonov, and T. P. Boyer, 2005: Warming of the World Ocean. *Geophys. Res. Lett.*, **32**, 1955–2005.

Miller, R. N., 1998: Introduction to the Kalman filter. *Proc. ECMWF Seminar on Data Assimilation*, Reading, United Kingdom, ECMWF, 47–59.

Miller, R. N., M. Ghil, and P. Gauthiez, 1994: Advanced data assimilation in strongly nonlinear dynamical system. *J. Atmos. Sci.*, **51**, 1037–1056.

Miller, R. N., E. F. Carter, and S. T. Blue, 1999: Data assimilation into nonlinear stochastic models. *Tellus*, **51A**, 167–194.

Mitchell, H. L., and P. L. Houtekamer, 2000: An adaptive ensemble Kalman filter. *Mon. Wea. Rev.*, **128**, 416–433.

Ricci, S., A. T. Weaver, J. Vialard, and P. Rogel, 2005: Incorporating state-dependent temperature–salinity constraints in the background error covariance of variational ocean data assimilation. *Mon. Wea. Rev.*, **133**, 317–338.

Rosati, A., K. Miyakoda, and R. Gudgel, 1997: The impact of ocean initial conditions on ENSO forecasting with a coupled model. *Mon. Wea. Rev.*, **125**, 754–772.

Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. *Mon. Wea. Rev.*, **131**, 1485–1490.

Troccoli, A., and K. Haines, 1999: Use of the temperature–salinity relation in a data assimilation context. *J. Atmos. Oceanic Technol.*, **16**, 2011–2025.

Troccoli, A., and Coauthors, 2002: Salinity adjustments in the presence of temperature data assimilation. *Mon. Wea. Rev.*, **130**, 89–102.

van Leeuwen, P. J., 1999: Comments on “Data assimilation using an ensemble Kalman filter technique.” *Mon. Wea. Rev.*, **127**, 1374–1377.

Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. *Mon. Wea. Rev.*, **130**, 1913–1924.

Winton, M., 2000: A reformulated three-layer sea ice model. *J. Atmos. Oceanic Technol.*, **17**, 525–531.

Zhang, S., and J. L. Anderson, 2003: Impact of spatially and temporally varying estimates of error covariance on assimilation in a simple atmospheric model. *Tellus*, **55A**, 126–147.

Zhang, S., M. J. Harrison, A. T. Wittenberg, A. Rosati, J. L. Anderson, and V. Balaji, 2005: Initialization of an ENSO forecast system using a parallelized ensemble filter. *Mon. Wea. Rev.*, **133**, 3176–3201.

Cartoon of how a two-step data assimilation procedure works for updating the estimate of the probability distribution of a single state variable *x* given a single observation *y* in the EAKF under the least squares framework. The right-hand column represents step 1: updating the PDF at the observation location as a new observation comes in (denoted by the thick-dotted arrow labeled STEP 1). The solid arrow 1 denotes that the prior PDF at the observation location is squashed by a new observation (denoted by the bottom-right dashed curve), computed by Eq. (3), and the solid arrow 2 represents the shift of the prior ensemble mean at the observation location due to the new observation, computed by Eq. (5). The thick-dotted arrow extending from the right-hand column to the left-hand column denotes step 2: using the correlation distribution (shaded region) to distribute the observation increments to impacted grid points, computed by Eq. (6). The solid arrow 3 represents the process of updating the PDF of a grid point.

Citation: Monthly Weather Review 135, 10; 10.1175/MWR3466.1
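
The two-step procedure described in the caption above can be illustrated with a minimal sketch for one observed quantity and one state variable. The equations referenced in the caption are not reproduced in this section, so the code follows the standard scalar form of an ensemble adjustment update (Gaussian product at the observation location, then linear regression onto the state variable) and may differ in detail from Eqs. (3), (5), and (6). The function names and the `localization` factor are hypothetical, introduced only for illustration.

```python
import math
from statistics import fmean


def observation_increments(prior_obs, obs_value, obs_var):
    """Step 1 (assumed form): update the ensemble at the observation location.

    prior_obs : prior ensemble of the observed quantity (needs nonzero spread)
    obs_value : the new observation
    obs_var   : observation error variance
    Returns one increment per ensemble member.
    """
    n = len(prior_obs)
    prior_mean = fmean(prior_obs)
    prior_var = sum((y - prior_mean) ** 2 for y in prior_obs) / (n - 1)

    # Product of two Gaussians: precisions add, so the variance shrinks.
    post_var = 1.0 / (1.0 / prior_var + 1.0 / obs_var)
    # The mean shifts toward the observation, weighted by the precisions.
    post_mean = post_var * (prior_mean / prior_var + obs_value / obs_var)

    # Contract each member about the prior mean, then recenter on the new mean.
    shrink = math.sqrt(post_var / prior_var)
    updated = [post_mean + shrink * (y - prior_mean) for y in prior_obs]
    return [u - y for u, y in zip(updated, prior_obs)]


def regress_increments(prior_state, prior_obs, increments, localization=1.0):
    """Step 2 (assumed form): distribute the increments to one grid point.

    The regression coefficient cov(x, y) / var(y) is estimated from the
    ensemble; `localization` is a hypothetical factor in [0, 1] that could
    taper the impact with distance from the observation.
    """
    n = len(prior_obs)
    x_mean = fmean(prior_state)
    y_mean = fmean(prior_obs)
    cov_xy = sum(
        (x - x_mean) * (y - y_mean) for x, y in zip(prior_state, prior_obs)
    ) / (n - 1)
    var_y = sum((y - y_mean) ** 2 for y in prior_obs) / (n - 1)
    gain = localization * cov_xy / var_y
    return [x + gain * d for x, d in zip(prior_state, increments)]
```

For example, calling `observation_increments` on a prior ensemble and then passing the result to `regress_increments` for each impacted grid point reproduces the sequence step 1, step 2 for a single observation. Processing several observations would repeat the pair of calls, one observation at a time.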

Flow chart of GFDL’s superparallelized coupled data assimilation system for the 180-PE case. In principle, the system can be scaled to any ensemble size and any sufficiently large number of PEs. In practice, for efficiency, it is currently configured to run 6, 12, or 24 ensemble members using a minimum of 120 PEs and a maximum of 1440 PEs on the SGI Altix cluster at GFDL or at the National Aeronautics and Space Administration’s Supercomputing Division.

Samples of the ocean observational network during the last quarter of the twentieth century.

Annual mean corrections of (a) potential temperature and (c) salinity, and (b) the local *T*–*S* covariance at the same location, distributed on the *x*–*z* cross section at the equator produced by the T2TS analysis scheme. The contour intervals are 0.01°C, 0.002 psu °C, and 0.005 psu for (a)–(c), respectively.

Annual mean ocean vertical motion errors at the equator for (a) the control simulation with no data constraint, (b) only allowing temperature observations to impact temperature itself (denoted by T2T, univariate analysis scheme), (c) allowing temperature observations to impact both temperature and salinity using their cross covariance (denoted by T2TS, multivariate analysis scheme), and (d) using both temperature and salinity observations to adjust both temperature and salinity (denoted by TS2TS, multivariate analysis scheme). The contour interval is 0.05 m day^{−1}, the 0 contour is omitted, and the dashed lines are negative.

Same as in Fig. 6 but for the undercurrent and with a contour interval of 0.05 m s^{−1}.

The zonal and vertical integral of the (a) meridional heat and (b) salinity transports in the truth (black), the univariate assimilation (T2T, red), the multivariate assimilation using *T*–*S* covariance without salinity observations (T2TS, green), and the multivariate assimilation using both *T* and *S* observations (TS2TS, blue).

The RMS errors computed on a zonal-vertical domain, for (a) the atmospheric zonal wind and (b) the atmospheric temperature in the ODA-only simulation (black), the case with ODA plus the atmospheric wind assimilation (red), and the case with ODA plus the atmospheric temperature assimilation (green). The control model simulation (dashed black) is also plotted as a reference.

Vertical motion errors of the tropical atmosphere (averaged over 20°S–20°N) for (a) the control, (b) the ODA-only, and (c) the ODA + ADA simulations. The contour interval is 0.1 m day^{−1}. The 0 contour is omitted.

Zonal wind stress errors for (a) the control, (b) the ODA-only, and (c) the ODA + ADA simulations. The contour interval is 0.04 N m^{−2}. The contour 0 is omitted.

Time series of the global RMS error of the top-500-m ocean temperature for (a) the control (dotted) and the ODA (solid), and the time mean of the vertically averaged ocean temperature errors over the top 500 m for (b) the control and (c) the ODA. The contour intervals in (b) and (c) are 0.2°C. The 0 contour is omitted.

Time series of the anomalies of the Niño-3.4 ocean temperature for the control (denoted CTL), the ODA (denoted ASSIM), and the truth. Curves in the bottom panel are the vertical averages over the top 250 m for the control (blue), the ODA (red), and the truth (black). The upper (lower) bounds of the control–ODA spread are plotted by the green-dashed (pink dashed) lines in the bottom panel. The control (model climatological) spread is estimated by six 25-yr nonoverlapping time series and the ODA spread is computed by six ensemble members in the filter. All anomalies are computed using the truth’s climatology, and the contour interval for the first three panels is 0.5°C.

Time series of the anomalies of the zonal wind stress (*τ_{x}*) averaged over the tropical Pacific (5°S–5°N) for the control (denoted CTL), the ODA (denoted ASSIM), and the truth. Curves in the panel second from the bottom are the zonal averages over the Pacific for the control (green), the ODA (red), and the truth (black). The upper (lower) bounds of the control–ODA spread are plotted by the green-dashed (pink dashed) lines in the bottom panel. The method for estimating the spread is the same as in Fig. 13. All anomalies are computed using the truth’s climatology, and the contour interval for the first three panels is 0.01 N m^{−2}.

(a), (b) Time mean of the standard deviations of the zonal wind stress spread and (c), (d) the SST spread in (a), (c) the control and (b), (d) the ODA. The method for estimating the spread is the same as in Fig. 13. The contour interval is 0.01 N m^{−2} for (a), (b) and 0.1°C for (c), (d).

Time series of the anomalies of the top-500-m ocean heat content (averaged temperature) in different oceans for the truth (black), the ODA (red), and the control (blue). The upper (lower) bounds of the control–ODA spread are plotted by the green-dashed (pink dashed) lines. The method for estimating the spread is the same as in Fig. 13. All anomalies are computed using the truth’s climatology.

Time-averaged RMS errors of the oceanic temperature (°C) and salinity (psu) in one control simulation and three assimilation experiments.