## 1. Introduction

Observations of sea surface height by altimeters, and the vertical structure of temperature and salinity by Argo floats, are leading to new insights into the variability and physics of the world’s oceans. Even though the global coverage of these observations is unprecedented, their use in mapping the ocean, and initializing forecast models, remains problematic because altimeters provide no direct information on vertical structure of the ocean and the horizontal spacing of Argo floats aliases mesoscale variability. It is generally accepted that such data have to be assimilated into dynamically based models in order to map ocean states, and also to physically interpret the observed variability.

Many practical schemes for sequentially assimilating ocean data are variants of the Kalman filter. For example, the ensemble Kalman filter (e.g., Evensen 2006) is being used to make 10-day forecasts of the North Atlantic and Arctic Ocean on a weekly schedule (more information is available online at http://topaz.nersc.no/). In general, however, the computational cost of the ensemble Kalman filter is high, and so simplifications have been developed for many operational applications. For example, Fukumori (2006) has constructed a simplified Kalman filter to forecast the large-scale global ocean circulation. To reduce the computational cost, the error covariance matrices are assumed time invariant, and the Kalman filtering is performed in distinct geographical regions each with its own reduced error subspace. To initialize the ocean component of a coupled atmosphere–ocean forecast system, Behringer (2007) and Sun et al. (2007) use simplified Kalman filters (based on three-dimensional variational assimilation and optimal interpolation) to assimilate sea level and in situ data. De Mey and Benkiran (2002) developed a reduced-order optimal interpolation scheme that has been used successfully to operationally forecast mesoscale variability in the Mediterranean (see also Demirov et al. 2003) and the North Atlantic (e.g., Brasseur et al. 2005).

In an independent line of research, Cooper and Haines (1996) used simple physical principles to design an effective scheme for assimilating altimeter data into ocean circulation models. Their idea was to account for differences between observed and forecast sea level by raising or lowering isopycnals until the associated dynamic height perturbation matched the sea level difference. In addition to ease of implementation and low computational cost, this scheme has the major advantage that it conserves local temperature and salinity relationships. The scheme has proved very useful in the development of practical operational ocean forecast schemes (e.g., Brasseur et al. 2005).

The study of Cooper and Haines (1996) has been extended in several ways. For example, Troccoli and Haines (1999) used the idea of vertical advection of water parcels to develop a practical method for updating salinity when assimilating temperature profiles. Haines et al. (2006) subsequently showed how to assimilate both temperature and salinity profiles in a way that allows salinity to vary on isotherms. Balmaseda et al. (2008) combined the above approaches into a practical, multistep scheme for assimilating ocean data into a global ocean model. Ricci et al. (2005) and Weaver et al. (2005) have shown how to construct background error covariance matrices for use in three- and four-dimensional variational assimilation schemes, using balance operators based in part on the ideas of Troccoli and Haines (1999).

A major difficulty with the implementation of sequential schemes, such as those mentioned above, is the specification of the background error covariance. Dee (1995) outlined an interesting scheme for sequentially estimating the parameters that define the covariance structure of the observation and background errors (e.g., their regional variances and correlation length scales). By allowing the parameters to be estimated online, the error covariances could change with time (and indeed the state of the physical system) thereby adding more realism, flexibility, and robustness to the assimilation scheme. [Dee and da Silva (1999) provide further information on the practical implementation of the approach, and Dee et al. (1999) describe several applications.]

In this study we present a computationally efficient method for assimilating sea level measured by altimeters and vertical profiles of temperature and salinity measured by Argo floats. The scheme is sequential and multivariate, and the update of the background is completed in a single step. The scheme allows for the online estimation of background error covariance parameters and also the online correction of temperature and salinity biases. The development of the scheme is strongly influenced by, and builds upon, the work of Derber and Rosati (1989), Dee (1995), Cooper and Haines (1996), Troccoli and Haines (1999), Ricci et al. (2005), and Weaver et al. (2005). The heart of the method is a transformation of sea level, temperature, and salinity into three variables that are associated with physical processes such as vertical displacement of water parcels, fluxes of heat and freshwater across the air–sea interface, turbulent mixing, and convection. The transformation simplifies the specification of the background error covariance matrices. The online estimation of background error covariance parameters is similar in philosophy to Dee (1995), although some technical modifications lead to significant computational savings. The online correction of model temperature and salinity bias is achieved using the simple spectral nudging approach of Thompson et al. (2006).

Two applications illustrate the effectiveness of the new scheme. For the first application, the background is a monthly climatology of observed temperature and salinity and the scheme is used to reconstruct mesoscale variability of the upper 1000 m of the northwest Atlantic for 2004 and 2005. The effectiveness of the scheme is assessed by how well it can recover contemporaneous Argo profiles that were not assimilated. In the second application the backgrounds are forecasts from an eddy-permitting model of the North Atlantic. The effectiveness of the scheme is assessed by comparing forecasts of temperature, salinity, and sea level to observations made at the verifying time (and thus not assimilated). Both applications show that the assimilation scheme has useful skill.

The assimilation scheme is described, and related to earlier studies, in section 2. The climatology application is described in section 3 and the forecasting application is described in section 4. Results are summarized and suggestions are made for future work in section 5. Details of the assimilation procedure are given in the two appendixes.

## 2. The assimilation scheme

To motivate the development of the assimilation scheme, the local temperature and salinity relationship is plotted in Fig. 1 at four depths for a location in the northwest Atlantic (Fig. 2). It is clear that the relationship is strongly dependent on depth: in the deep ocean temperature and salinity covary almost linearly, but the strength of the relationship weakens significantly as the surface is approached. To explain the depth-dependent covariance structure, climatological temperature and salinity profiles for March, June, September, and December have been added to the figure. Clearly the temperature and salinity covariation at a fixed depth is closely aligned with the climatological temperature–salinity relationship, consistent with the effect of vertical advection of water parcels past a fixed level (e.g., Troccoli and Haines 1999). The weakening of the temperature and salinity relationship near the surface is presumably due in part to heat and freshwater fluxes across the air–sea interface and turbulent mixing.

A strong relationship also exists between observations of sea level and hydrographic properties. To illustrate, dynamic heights were calculated for 1546 Argo profiles that surfaced during 2003–05 between 26.6°–45°N and 75°–28°W. (The monthly climatology of Yashayaev was the reference state and a level of no motion of 1160 m was assumed. Details on the climatology are available online at www.mar.dfo-mpo.gc.ca/science/ocean/woce/climatology/naclimatology.htm.) The correlation between the dynamic heights and the corresponding sea level measured by altimeters was 0.75 and a scatterplot suggests a near-linear relationship, close to the 1:1 line. Thus, for this region at least, much of the observed sea level variability is simply due to changes in temperature and salinity through their effect on the vertical integral of water density.

The assimilation scheme described below is based on simple physical processes (e.g., vertical advection of water parcels, turbulent mixing, and air–sea exchange) and simple physical balances (e.g., dynamic heights based on an assumed level of no motion). It will be shown that this simplifies the specification of the background error covariance and reduces the computational cost of the assimilation scheme.

### a. A physically motivated transformation

Let $T$ and $S$ denote the true temperature and salinity at a fixed time, depth, and horizontal location. Let $\eta$ denote the true sea level at the same time and horizontal location. Similarly let $T_b$, $S_b$, and $\eta_b$ denote the corresponding background values (e.g., a climatology or forecast from an ocean model). The relationship between true and background states is modeled by Eqs. (1)–(3), in which $T'_b$ and $S'_b$ denote vertical gradients of $T_b$ and $S_b$, $\Delta_\rho = (\rho_0 - \rho_h)/\rho_0$, and $\rho_0$, $\rho_h$ are the background density at the surface and reference depth $h$, respectively. (The reference depth was taken to be 1160 m.) The variables $\xi_D$, $\xi_T$, and $\xi_S$ define the uncertainty in the background and correspond physically to interface displacement, and sources and sinks of heat and salt, respectively. The constants $\alpha_T$ and $\alpha_S$ relate changes in $T$ and $S$ to changes in density. It is assumed that $\xi_T$, $\xi_S$, and $\xi_D$ vary with horizontal position and time; $\xi_T$ and $\xi_S$ also vary with depth.

Equation (1) models the difference between the background and true temperature (i.e., the background error) in terms of an advective contribution, based on vertically displacing the background temperature profile by $\xi_D$, and a diabatic forcing term, $\xi_T$. A similar interpretation can be given to (2), the corresponding equation for salinity. Equation (3) relates sea level to changes in dynamic height driven by (i) vertical advection of water parcels and (ii) the effect of $\xi_T$ and $\xi_S$ on the depth integral of water density.

Equations (1)–(3) define a linear relationship between ($T$, $S$, $\eta$) and ($\xi_T$, $\xi_S$, $\xi_D$). The important point to note is that even though the covariance structure of the $\xi$ variables may be simple, the covariances of the background errors can be quite complex because of their dependence on $T'_b$ and $S'_b$. This is illustrated in Fig. 3, which shows the theoretical correlation between temperature at a given depth with neighboring temperatures and salinities, assuming Gaussian covariances for the independent variables $\xi_T$, $\xi_S$, and $\xi_D$ (see caption of Fig. 3). Note that the correlations are spatially complex and, in general, the maximum $T$–$S$ correlation does not occur for collocated temperature and salinity. This figure also shows that the correlation functions for the background errors will not, in general, be separable functions of depth and horizontal position.
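To make the structure of such a transformation concrete, the following Python sketch applies one plausible additive form to a single water column: $T = T_b + T'_b\,\xi_D + \xi_T$, $S = S_b + S'_b\,\xi_D + \xi_S$, and $\eta = \eta_b + \Delta_\rho\,\xi_D + \int(\alpha_T\xi_T - \alpha_S\xi_S)\,dz$. These forms, including the signs, the limits of the depth integral, and every function and variable name, are assumptions made for illustration; they are consistent with the verbal description above but are not a transcription of Eqs. (1)–(3).

```python
import numpy as np

def apply_transformation(z, T_b, S_b, eta_b, xi_D, xi_T, xi_S,
                         delta_rho, alpha_T, alpha_S):
    """Map (xi_D, xi_T, xi_S) to (T, S, eta) for one water column.

    ASSUMED form (illustrative only). z is depth in metres, increasing
    downward; xi_D is a scalar (no depth dependence); xi_T and xi_S are
    profiles defined on z.
    """
    # Vertical gradients of the background profiles.
    dTb_dz = np.gradient(T_b, z)
    dSb_dz = np.gradient(S_b, z)
    # Advective contribution plus diabatic source/sink term.
    T = T_b + dTb_dz * xi_D + xi_T
    S = S_b + dSb_dz * xi_D + xi_S
    # Trapezoidal depth integral of the density effect of xi_T and xi_S.
    f = alpha_T * xi_T - alpha_S * xi_S
    steric = np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(z))
    eta = eta_b + delta_rho * xi_D + steric
    return T, S, eta

def displacement_from_sea_level(eta_obs, eta_b, delta_rho):
    """Displacement that matches a sea level difference when xi_T = xi_S = 0
    (the limit in which the assumed form reduces to raising or lowering
    the water column)."""
    return (eta_obs - eta_b) / delta_rho
```

In this sketch the sea level equation is the only place where the three $\xi$ variables interact, which is why setting $\xi_T = \xi_S = 0$ leaves a one-to-one relation between the sea level difference and the displacement.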

### b. Background error covariance matrix

Let $\mathbf{x}$ denote a column vector that defines the true ocean state at a given time. In the present setting $\mathbf{x}$ has three subvectors. The first two subvectors define the vertical structure of temperature and salinity at the Argo locations and the third subvector defines the sea level on a horizontal grid that spans the study area. Thus, the first two subvectors are defined in observation space (e.g., Kalnay 2003). The column vector $\boldsymbol{\xi}$ defines the three $\xi$ variables at the same locations as the elements of $\mathbf{x}$ and thus has the same structure as $\mathbf{x}$. According to (1)–(3) the ocean state and auxiliary variables are related by

$$\mathbf{x} = \mathbf{x}_b + \mathsf{M}\boldsymbol{\xi},$$

where $\mathbf{x}_b$ is the vector of corresponding background values and the elements of the square matrix $\mathsf{M}$ depend on quantities appearing in (1)–(3) including $T'_b$ and $S'_b$. The quantity $\mathsf{M}\boldsymbol{\xi}$ is the background error and its covariance matrix is $\mathsf{M}\mathsf{B}\mathsf{M}^{\mathrm{T}}$, where $\mathsf{B}$ is the covariance matrix of $\boldsymbol{\xi}$.

The covariance matrix of $\boldsymbol{\xi}$ is built from the component matrices defined below using the direct product ⊗, which arises from the assumption that the covariances of $\xi_T$ and $\xi_S$ are separable functions of horizontal and vertical position. It has also been assumed that $\xi_D$ is uncorrelated with $\xi_T$ and $\xi_S$.

The horizontal covariance structure of $\xi_T$ and $\xi_S$ is defined by $\mathsf{B}_h$. The $(i, j)$ element of $\mathsf{B}_h$ is the covariance between $\xi_T$ (or $\xi_S$) at the $i$th and $j$th Argo float position. It is assumed to depend on $\Delta_{ij}$, the horizontal distance between the two Argo positions, and on $L_2$, a correlation length scale. Here $\mathsf{B}_{TT}$ and $\mathsf{B}_{SS}$ define the vertical covariance structure of $\xi_T$ and $\xi_S$. The dimension of these matrices depends on the number of levels used to define the ocean state vector. The matrix $\mathsf{B}_{TS}$ allows for correlations between $\xi_T$ and $\xi_S$ at different levels. The overall variance level of $\xi_T$ and $\xi_S$ is set by the parameter $\gamma_2$.

The matrix $\mathsf{B}_D$ defines the covariance of $\xi_D$. In its $(i, j)$ element, the variance of $\xi_D$ varies with position according to the nonnegative function $\sigma_D$. This allows the variance of isopycnal displacements to increase in regions with strong eddy activity (e.g., the vicinity of the Gulf Stream). The correlation length scale is set by $L_1$ and the overall variance of the isopycnal displacements is set by $\gamma_1$.

The background error covariance matrix depends on four parameters that set the variance levels ($\gamma_1$ and $\gamma_2$) and length scales ($L_1$ and $L_2$) of $\boldsymbol{\xi}$. For convenience we stack these parameters in the vector $\boldsymbol{\theta} = (\gamma_1, \gamma_2, L_1, L_2)$ and write $\mathsf{B} = \mathsf{B}(\boldsymbol{\theta})$. In general it is difficult to specify $\boldsymbol{\theta}$ a priori and so it will be estimated, along with the ocean state vector $\mathbf{x}$, at each analysis time as explained in section 2c.
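The block structure described in this subsection can be sketched in Python as follows. The sketch assumes Gaussian correlation functions of horizontal distance, assumes that $\gamma_1$ and $\gamma_2$ multiply the variances directly, and orders the $\xi_T$/$\xi_S$ block with horizontal position as the outer index of the direct product. None of these choices is taken from the source, and all function names are hypothetical.

```python
import numpy as np

def gaussian_corr(dist, length_scale):
    """ASSUMED Gaussian correlation as a function of horizontal distance."""
    return np.exp(-0.5 * (dist / length_scale) ** 2)

def pairwise_distance(xy):
    """Planar distances between all pairs of points; xy has shape (n, 2)."""
    diff = xy[:, None, :] - xy[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def build_B(argo_xy, grid_xy, B_TT, B_SS, B_TS, sigma_D, theta):
    """Assemble B(theta) for xi = (xi_T and xi_S at Argo sites, xi_D on grid)."""
    gamma1, gamma2, L1, L2 = theta
    # Horizontal structure shared by xi_T and xi_S at the Argo positions.
    B_h = gaussian_corr(pairwise_distance(argo_xy), L2)
    # Vertical structure, including T-S cross covariances between levels.
    V = np.block([[B_TT, B_TS], [B_TS.T, B_SS]])
    # Separability in horizontal and vertical position gives a direct product.
    B_heat_salt = gamma2 * np.kron(B_h, V)
    # Displacement block with position-dependent standard deviation sigma_D.
    B_D = (gamma1 * np.outer(sigma_D, sigma_D)
           * gaussian_corr(pairwise_distance(grid_xy), L1))
    # xi_D is uncorrelated with xi_T and xi_S, so the off-diagonal blocks are zero.
    n1, n2 = B_heat_salt.shape[0], B_D.shape[0]
    B = np.zeros((n1 + n2, n1 + n2))
    B[:n1, :n1] = B_heat_salt
    B[n1:, n1:] = B_D
    return B
```

Because only four scalars enter `theta`, re-evaluating the matrix for a new parameter vector reuses the same distance matrices and vertical blocks.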

### c. Simultaneous state and parameter estimation

Let $\mathbf{y}$ denote the vector of observations. In the present setting it is made up of temperature and salinity profiles and along-track altimeter observations. The relationship between the observations and the true state $\mathbf{x}$ is

$$\mathbf{y} = \mathsf{H}\mathbf{x} + \boldsymbol{\epsilon},$$

where $\mathsf{H}$ defines the (linear) interpolation of $\mathbf{x}$ to the observation locations, and $\boldsymbol{\epsilon}$ denotes the observation error.

The joint probability density of $\mathbf{x}$ and $\boldsymbol{\theta}$, given the observations, is written as $p(\mathbf{x}, \boldsymbol{\theta} \mid \mathbf{y})$. This posterior density function contains all the information on the state of the ocean and the unknown parameters given the observations and the prior density $p(\boldsymbol{\theta})$. [See Wikle and Berliner (2007) for a discussion of how this approach relates to hierarchical Bayesian modeling.] If we assume $\boldsymbol{\xi} \sim N(0, \mathsf{B})$ and $\boldsymbol{\epsilon} \sim N(0, \mathsf{R})$, where $\mathsf{R}$ is the covariance matrix of observation errors, the posterior density can be written explicitly up to a normalizing constant.

Estimates of $\mathbf{x}$ and $\boldsymbol{\theta}$ at each analysis time were found by maximizing $p(\mathbf{x}, \boldsymbol{\theta} \mid \mathbf{y})$. As noted above, the state vector is not defined at every model grid point (e.g., the elements corresponding to temperature and salinity are defined at observed Argo locations). To estimate gridded fields of temperature, salinity, and sea level we optimally interpolate the estimates of $\mathbf{x}$ to the full model grid [in accord with the second step of Physical Space Analysis System (PSAS) as reviewed, e.g., by Kalnay (2003), p. 172]. More details on the optimization algorithm are given in appendix A.
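Under the Gaussian assumptions stated in this subsection, and for a fixed $\boldsymbol{\theta}$, maximizing the posterior over $\boldsymbol{\xi}$ is a standard linear-Gaussian problem. The Python sketch below illustrates that inner step only. It assumes a flat prior on $\boldsymbol{\theta}$, writes $\mathbf{d} = \mathbf{y} - \mathsf{H}\mathbf{x}_b$ and $\mathsf{G} = \mathsf{H}\mathsf{M}$, and uses dense linear algebra. It is not the optimization algorithm of appendix A, and the function names are hypothetical.

```python
import numpy as np

def map_xi(d, G, B, R):
    """Posterior mode of xi for fixed B and R, with d = y - H x_b, G = H M."""
    S = G @ B @ G.T + R                      # innovation covariance
    return B @ G.T @ np.linalg.solve(S, d)   # xi_hat = B G^T S^{-1} d

def neg_log_posterior(xi, d, G, B, R):
    """Negative log of the joint posterior, up to an additive constant,
    assuming a flat prior on theta (illustrative assumption)."""
    r = d - G @ xi
    misfit = r @ np.linalg.solve(R, r)       # observation term
    penalty = xi @ np.linalg.solve(B, xi)    # background term
    _, logdet_B = np.linalg.slogdet(B)       # depends on theta through B
    return 0.5 * (misfit + penalty + logdet_B)
```

One simple, if inefficient, way to use these two functions is to evaluate `neg_log_posterior` at `map_xi` for a set of candidate parameter vectors and keep the smallest value; the analysis state then follows from $\mathbf{x}_b + \mathsf{M}\hat{\boldsymbol{\xi}}$.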

### d. Relationship to previous studies

The idea for the vertical displacement variable, $\xi_D$, comes directly from Cooper and Haines (1996). If $\xi_T$ and $\xi_S$ are zero, and error-free sea levels are assimilated, the scheme reduces to that of Cooper and Haines (1996). More specifically from (3) it follows that the water column will be raised or lowered such that the change in dynamic height matches the difference between the observed sea level and the background. Equations (1) and (2) imply updates to temperature and salinity that preserve local water properties. If error-free temperature observations are assimilated then (1) gives the vertical displacement required to bring the model temperature into agreement with observations, and this displacement can then be used in (2) to update salinity. This is equivalent to the linearized version of Troccoli and Haines (1999) proposed by Ricci et al. (2005). If error-free temperature and salinities are assimilated, and $\xi_T = 0$, the scheme is similar to Haines et al. (2006) in that (i) the temperature observations give the vertical displacement from (1), and (ii) the salinity innovations, after adjustment for vertical displacement, are mapped horizontally using optimal interpolation based on the covariance structure of $\xi_S$.

De Mey and Benkiran (2002) described a reduced-order optimal interpolation scheme that has been used extensively to assimilate vertical profiles of temperature and salinity, and sea level, into eddy-resolving models (e.g., Demirov et al. 2003; Brasseur et al. 2005). They assumed that the background error covariance is a separable function of horizontal and vertical position, and reduced the dimension of the state vector by projecting it onto a set of multivariate EOFs defined in the vertical at each horizontal location. In the present study the background error covariance is not assumed to be a separable function of horizontal and vertical position; rather, it is allowed to change with location and time through the vertical gradients in the background state. There is also no need to calculate EOFs in the present scheme and thus the problem of scaling multivariate data prior to EOF analysis is circumvented.

Weaver et al. (2005) used the concept of “balance operators” to construct the background error covariance matrix as a prelude to variational data assimilation. Assuming temperature is given, they obtain the balanced component of salinity using the vertical displacement approach of Troccoli and Haines (1999). Balanced sea level is then obtained from dynamic height (calculated from temperature and salinity), and balanced horizontal velocity is calculated assuming geostrophy. The unbalanced components of salinity, sea level, and current are obtained by subtraction of their balanced counterparts. By conditioning on temperature in this way, and using some simple physical constraints, Weaver et al. (2005) transformed the salinity, sea level, and current into four unbalanced variables. By assuming the temperature and unbalanced variables are mutually uncorrelated, they constructed a background error covariance matrix in the original state variables that reflected the assumed physical balances. A similar type of construction is used in this study. The main differences are essentially technical but lead to different background error covariance matrices. For example, the present scheme does not condition on a single variable when constructing the background error covariance matrix; the three $\xi$ variables appear on an equal footing (i.e., the balance operator matrix is not required to be lower triangular as in Weaver et al. 2005). Another difference is that the effect of vertical displacement appears explicitly in the present scheme (through the $\xi_D$ variable) and this significantly affects the structure of the background error covariance matrix.

The online updating of the background error covariance parameters is similar in philosophy to that of Dee (1995). Some technical modifications have been made, however, that reduce significantly the computational cost of the online parameter estimation. Details are given in appendix A.

There are similarities between our assimilation scheme and the operational scheme of Balmaseda et al. (2007, 2008) used to initialize the ocean component of a coupled model for seasonal forecasting. Both schemes are motivated by the physical principles mentioned above. There are, however, some important technical differences: (i) the Balmaseda scheme takes five steps to carry out the assimilation, in contrast to the present scheme that takes one step; (ii) horizontal velocity is not explicitly updated in the present scheme, rather it is updated by the model (possible because of the frequent updating); (iii) Balmaseda et al. (2007) assimilate trends in global sea level to allow for secular changes in ocean volume (no trends are assimilated here because the focus is prediction of mesoscale variability); (iv) the Balmaseda et al. (2007) model is global and its horizontal grid spacing is 1° away from equatorial regions (in contrast the present model is regional and has an eddy-permitting resolution across the whole model domain); and (v) the present scheme uses a simple bias correction scheme for temperature and salinity and an online updating scheme for background error covariance parameters.

## 3. Application 1: Ocean hindcasting

The assimilation scheme is now used to reconstruct the daily, three-dimensional temperature and salinity fields of the northwest Atlantic (see Fig. 2) from 2003 to 2005 using a monthly temperature and salinity climatology as background. The grid has a longitudinal spacing of 1/6° and the latitudinal spacing decreases with latitude such that the grid boxes are approximately square. Horizontal distance will subsequently be specified in terms of grid box widths (denoted by Δ). The vertical grid covers the upper 1000 m of the ocean, corresponding to the depth range best monitored by Argo floats. (See Table 1 for vertical levels.)

### a. The temperature and salinity climatology

The monthly temperature and salinity climatology of Yashayaev was used following spatial smoothing to eliminate some high wavenumbers that did not appear realistic. The resulting fields are henceforth referred to as the smoothed climatology.

Comparison of the Argo observations with the smoothed climatology suggested systematic differences that reached 0.7°C and 0.1 psu close to the surface (see Table 1). The reason for the difference is probably related to low-frequency variability of the hydrographic properties of the North Atlantic; there is no reason to expect the mean of the Argo observations from 2003 to 2005 to match the Yashayaev climatology, which is based on observations covering most of the last century. To remove this bias, annual mean correction fields for temperature and salinity, based on optimal interpolation of the differences between Argo observations and the smoothed climatology, were calculated for each level. The bias-corrected and smoothed monthly climatology is used in this section as the background for the new assimilation scheme.

Visual inspection of $T'_b$ and $S'_b$ revealed that some gradients were unrealistic near the surface. Hence, for levels shallower than 222 m (level 9), the gradient at 222 m was used after scaling by a depth factor that decreased linearly from 0.9 to 0.3 as the surface was approached. Some unrealistic vertical gradients in shallow water were also removed. These modifications led to significant improvements in the skill of the assimilation scheme.

### b. Altimeter and Argo innovations

All available along-track altimeter observations were obtained from Archiving, Validation, and Interpretation of Satellite Oceanographic data (AVISO) for the period 1992–2002 in order to calculate a monthly mean climatology. [Observations from the following missions were used: *Jason-1*, *Envisat*, the *European Remote Sensing Satellite-1/2 (ERS-1/2)*, GFO, and the Ocean Topography Experiment (TOPEX)/Poseidon (T/P).] Innovations for the period 2003–05 inclusive were obtained by subtracting the monthly climatology from along-track data from the *Envisat*, *Jason-1*, and updated T/P missions.

The Argo profiles are made from a moving platform at irregular times. To simplify the temporal structure of the Argo data, they were interpolated to regular analysis times using observations from the same platform. The rationale for such Lagrangian interpolation is that Argo floats tend to follow the same water parcel and thus their hydrographic properties will change more slowly in a Lagrangian frame of reference. Note that this approach assumes the flow at the float’s parking depth is representative of flow throughout the water column. The details of the temporal interpolation are as follows. First the horizontal position of the float at the analysis time was estimated by linear interpolation of the two closest fixes from either side of the analysis time. Temperature and salinity innovations were next calculated by removing the smoothed, bias-corrected climatology from the observations. Innovations at the analysis time were estimated by optimal interpolation of innovations from the same float and same level. The autocorrelation function needed for the optimal interpolation was an exponential function of time lag fit to pairs of innovations from the same float, binned by time lag and depth. The exponential decay times ($\tau_T$ and $\tau_S$) and the autocorrelation extrapolated to zero time lag ($\rho_{0T}$ and $\rho_{0S}$) are listed in Table 1. [Daley (1991) provides details on the calculation and interpretation of autocorrelation extrapolated to zero lag.]
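The two interpolation steps described above can be sketched in Python as follows. The sketch assumes the autocorrelation model $\rho_0 \exp(-|\mathrm{lag}|/\tau)$ for distinct times with unit total variance on the diagonal (so that $1 - \rho_0$ plays the role of the noise fraction), ignores longitude wraparound, and uses hypothetical function names; the actual implementation may differ.

```python
import numpy as np

def interp_position(t_fix, lon, lat, t_a):
    """Linear interpolation of float position to analysis time t_a.

    t_fix must be increasing; np.interp uses the two fixes that bracket t_a.
    """
    return np.interp(t_a, t_fix, lon), np.interp(t_a, t_fix, lat)

def oi_in_time(t_obs, innov, t_a, tau, rho0):
    """Optimal interpolation of same-float, same-level innovations to t_a."""
    t_obs = np.asarray(t_obs, dtype=float)
    innov = np.asarray(innov, dtype=float)
    # Correlation between observations: rho0 * exp(-lag / tau) off the
    # diagonal, 1 on the diagonal (signal plus observation noise).
    lag = np.abs(t_obs[:, None] - t_obs[None, :])
    C = rho0 * np.exp(-lag / tau)
    np.fill_diagonal(C, 1.0)
    # Correlation between each observation and the signal at t_a.
    c = rho0 * np.exp(-np.abs(t_obs - t_a) / tau)
    weights = np.linalg.solve(C, c)
    return weights @ innov
```

With a single observation made exactly at the analysis time, the estimate is the innovation damped by the factor $\rho_0$, which reflects the assumed noise in the observation.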

### c. Observation and background error covariances

To estimate the observation error variance, a spatial autocorrelation function was fit to pairs of altimeter innovations from the same track, binned by spatial separation, and then extrapolated to zero separation as above. Based on this calculation, and an estimate of the error of representativeness, we took the standard deviation of the altimeter observation errors to be 2.5 cm.

To assimilate the Argo temperature and salinity it is necessary to specify the observation and background error variances in an Eulerian frame of reference. The observation error variances of the Argo observations were estimated from a spatial autocorrelation function fit to Argo innovations at the same analysis time, binned by spatial separation. The standard deviation of the observation errors, estimated from the extrapolation of the autocorrelation to zero separation, was between 0.3° and 0.5°C for temperature and between 0.04 and 0.14 psu for salinity, with the largest errors close to the surface (Table 1).
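A minimal Python sketch of the zero-separation extrapolation is given below. It assumes an exponential autocorrelation model fitted by least squares to the logarithm of positive binned correlations, and it assumes that the observation error variance is the fraction $1 - \rho_0$ of the innovation variance (the interpretation discussed by Daley 1991). The functional form for the spatial fit and the function names are illustrative assumptions.

```python
import numpy as np

def fit_exponential_acf(separation, corr):
    """Fit corr ~ rho0 * exp(-separation / L) to binned correlations.

    Requires corr > 0 in every bin. Returns (rho0, L), where rho0 is the
    autocorrelation extrapolated to zero separation.
    """
    separation = np.asarray(separation, dtype=float)
    corr = np.asarray(corr, dtype=float)
    slope, intercept = np.polyfit(separation, np.log(corr), 1)
    return np.exp(intercept), -1.0 / slope

def obs_error_std(innovation_variance, rho0):
    """Observation error standard deviation implied by rho0 (assumed split
    of innovation variance into signal and uncorrelated error)."""
    return np.sqrt((1.0 - rho0) * innovation_variance)
```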

The vertical structure of the background error covariance at Argo data locations (i.e., the 13 × 13 $\mathsf{B}_{TT}$, $\mathsf{B}_{SS}$, and $\mathsf{B}_{TS}$ matrices) was calculated for each season by (i) estimating $\xi_D$ from the altimeter observations using the approach of Cooper and Haines (1996), (ii) calculating the effect of the vertical advection on temperature and salinity using $T'_b\,\xi_D$ and $S'_b\,\xi_D$ and subtracting it from each Argo profile to give a residual profile, and (iii) calculating $\mathsf{B}_{TT}$, $\mathsf{B}_{SS}$, and $\mathsf{B}_{TS}$ from the covariance of these residual profiles.
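Steps (i)–(iii) can be sketched in Python as follows. The sketch assumes that the displacement in step (i) is the altimeter innovation divided by $\Delta_\rho$, that the inputs are innovations relative to the background, and that sample means are removed before the covariances are formed. These details and the function name are assumptions made for illustration.

```python
import numpy as np

def vertical_covariances(T_innov, S_innov, dTb_dz, dSb_dz, eta_innov, delta_rho):
    """Estimate B_TT, B_SS, B_TS from residual profiles.

    T_innov, S_innov, dTb_dz, dSb_dz: arrays of shape (n_profiles, n_levels).
    eta_innov, delta_rho: arrays of shape (n_profiles,).
    """
    # (i) Displacement implied by the collocated sea level innovation.
    xi_D = (eta_innov / delta_rho)[:, None]
    # (ii) Remove the effect of vertical advection from each profile.
    T_res = T_innov - dTb_dz * xi_D
    S_res = S_innov - dSb_dz * xi_D
    # (iii) Sample covariances of the residual profiles across levels.
    T_res = T_res - T_res.mean(axis=0)
    S_res = S_res - S_res.mean(axis=0)
    n = T_res.shape[0] - 1
    B_TT = T_res.T @ T_res / n
    B_SS = S_res.T @ S_res / n
    B_TS = T_res.T @ S_res / n
    return B_TT, B_SS, B_TS
```

Applying the function separately to the profiles from each season would give one set of matrices per season.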

### d. Time-varying parameter estimates

The assimilation scheme, monthly climatology, and covariance structures described above are now used to provide daily estimates of the four background covariance parameters ($\gamma_1$, $\gamma_2$, $L_1$, and $L_2$) and the ocean state from November 2003 to December 2005. The Argo temperature and salinity data were mapped to the analysis day using Lagrangian interpolation; the altimeter data were assimilated using a sliding window (e.g., Derber and Rosati 1989) of ±7 days centered on the analysis time. [The altimeter data were weighted by $\exp(-|t_{\mathrm{obs}} - t_a|/15)$, where the observation and analysis times are in days.]
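The sliding-window weighting can be written as a short Python sketch. How the weights enter the analysis (e.g., by inflating the observation error variance) is not specified here, so the function below, whose name is hypothetical, only evaluates the weights.

```python
import numpy as np

def altimeter_weights(t_obs, t_a, half_window=7.0, e_fold=15.0):
    """Weights exp(-|t_obs - t_a| / 15) inside a +/- 7 day window.

    Times are in days. Observations outside the window get zero weight.
    """
    dt = np.abs(np.asarray(t_obs, dtype=float) - t_a)
    weights = np.exp(-dt / e_fold)
    return np.where(dt <= half_window, weights, 0.0)
```

At the edge of the window the weight is $\exp(-7/15) \approx 0.63$, so observations are down-weighted only mildly before being cut off.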

The overall magnitudes of the correlation length scales ($L_1$ and $L_2$) are stable through time and reasonable in magnitude (between 100 and 120 km in the middle of the domain; see Fig. 4). The typical value of $\gamma_1$ corresponds to an rms of $\xi_D$ of about 20 m; $\gamma_2$ corresponds to an rms of near-surface $\xi_T$ and $\xi_S$ of about 0.5°C and 0.15 psu, respectively.

There is a clear seasonal cycle in the time variation of $\gamma_1$, $\gamma_2$, and $L_2$. One explanation for the strong seasonal cycle of $\gamma_1$ is that the vertical stratification (evident in $\Delta_\rho$, not shown) has a strong seasonal cycle and this could suppress isopycnal variability at certain times of the year. A possible explanation for the seasonal variation in $\gamma_2$ and $L_2$ is that the bias correction of the climatology was only made to the annual mean (due to paucity of Argo observations) and so the temperature and salinity innovations may still include a seasonal cycle that should have been removed by the background. There is a tendency for $\gamma_1$ to decrease from 2004 to 2005. We have no simple explanation for this trend.

### e. Effectiveness of the assimilation scheme

A simple comparison of an observed profile against a prediction is not very informative if the profile has been assimilated. In this section the effectiveness of the scheme is assessed by its ability to recover temperature and salinity profiles that were not assimilated. A simple, but stringent, rule was used for withholding observed Argo profiles. For each profile in turn, all profiles within a circle of radius *r* centered on the horizontal position of the profile of interest (henceforth the analysis point) were excluded. This automatically excludes the Argo profile of interest and all other profiles in the exclusion zone, regardless of when they were made. Note that Lagrangian interpolation is still used to estimate profiles at the analysis time, but none of the observed profiles used in the interpolation were made within a distance *r* of the analysis point. By subtracting the withheld observed profile at the analysis point from the corresponding prediction one obtains a measure of the effectiveness of the assimilation scheme. Repeating this procedure for each Argo profile in turn allows statistics of the prediction errors to be calculated.
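The withholding rule can be sketched in Python as follows. The callable `predict` is a hypothetical stand-in for the full assimilation step, positions are treated as planar coordinates in the same units as the radius $r$, and each observation is reduced to a single value per profile for brevity.

```python
import numpy as np

def kept_profiles(xy, i, r):
    """Boolean mask of profiles retained when profile i is the analysis point.

    Every profile within distance r of profile i is excluded. With r = 0 only
    profile i itself (and any exactly collocated profile) is withheld.
    """
    dist = np.hypot(xy[:, 0] - xy[i, 0], xy[:, 1] - xy[i, 1])
    return dist > r

def rms_prediction_error(predict, xy, obs, r):
    """Rms of (prediction - withheld observation) over all profiles in turn."""
    errors = []
    for i in range(len(obs)):
        keep = kept_profiles(xy, i, r)
        prediction = predict(xy[keep], obs[keep], xy[i])
        errors.append(prediction - obs[i])
    return np.sqrt(np.mean(np.square(errors)))
```

For example, passing `lambda xy_kept, obs_kept, target: obs_kept.mean()` as `predict` gives the error of a trivial predictor against which a real scheme could be compared.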

The depth dependence of the temperature and salinity variability is shown in Fig. 5. The line with circles shows the standard deviation of the difference between the observations and the smoothed, bias-corrected climatology. As expected the highest variability is observed near the surface, and also around 600 m where the thermocline and halocline are strongest in this part of the North Atlantic. The continuous line shows the rms misfit when all of the profiles are assimilated (i.e., there is no exclusion zone). As expected the assimilation scheme reduces the observed rms significantly; it is approximately half that of the observations. The more interesting results are shown by the dashed line, which gives the rms error when the Argo profile of interest is withheld. The rms prediction error falls about halfway between the rms of the observations and the misfits resulting from the assimilation of all profiles. The dashed line clearly shows that the assimilation scheme does have useful skill in terms of predicting temperature and salinity in the North Atlantic.

The impact of increasing the radius of the exclusion zone is shown in Fig. 6 for three representative depth levels. The values at zero exclusion radius correspond to the values shown by the dashed line in Fig. 5. For all curves there is a fairly rapid increase in the rms of the prediction errors as the radius increases to about 8Δ. This increase is the direct effect of losing the information from the Argo profiles as the exclusion zone increases. Beyond 8Δ the scheme still has skill resulting from the assimilation of the altimeter data.

## 4. Application 2: Ocean forecasting

The scheme is now used to assimilate along-track altimeter and Argo profile data into an eddy-permitting model of the North Atlantic. The effectiveness of the scheme is assessed by comparing its forecast errors with those resulting from using monthly climatology as a predictor, and also a model with no assimilation.

### a. The ocean model

The model configuration is similar to Thompson et al. (2006) and Wright et al. (2006). The model is version 2.0 of the Parallel Ocean Program. It is a *z*-coordinate, hydrostatic general ocean circulation model with an implicit free surface. The model grid covers the North Atlantic basin between 7° and 67°N (see Fig. 2) and has a horizontal resolution of 2Δ and 23 vertical levels. The northern and southern boundaries of the model domain are closed and temperature and salinity are strongly restored to the monthly climatology of Yashayaev within boundary sponge layers. To eliminate bias in the interior, the model’s temperature and salinity for all runs were restored to the smoothed seasonal climatology using spectral nudging with a restoring time of 20 days and *κ*^{−1} = 4 yr (see appendix B). The model was spun up from 1 January 1990 driven by surface wind stress, precipitation minus evaporation, latent and sensible heat flux, and net radiative forcing derived from NCEP daily reanalysis fields.

### b. The model runs

The atmospheric forcing for all runs is based on the same reanalysis fields. It is therefore important to note that this application quantifies only the effect of ocean data assimilation, through improved initial conditions, on forecast skill.

#### 1) Control run (RunC)

The model was initialized with the smoothed climatology and integrated from 1990 to 2004. Apart from spectral nudging to the smoothed seasonal climatology, no other data were assimilated.

#### 2) Assimilation run (RunA)

The model was initialized with results from RunC for 1 November 2002 and the model was run, with data assimilation, to the end of 2004. The first two months were used to spin up the estimates of **θ**. The spatial structure function *σ*_{D}(*x*, *y*) [see (7)] was estimated by the rms of the difference between anomalies of observed sea level and RunC from 1996 to 2002. This increases the variance of the vertical displacements in the eddy-rich Gulf Stream region. To assimilate the altimeter observations we first removed their monthly means and then added back the seasonal mean of the model sea level (calculated sequentially using the time filter employed in spectral nudging). This ensures the altimeter data do not affect the mean sea surface topography of the model, or its seasonal cycle; the altimeter data only contribute information on intraseasonal time scales.

The assimilation of the profile data requires specification of the vertical gradients of background temperature and salinity [see (1) and (2)]. Experimentation showed the most realistic results were obtained by taking a linear combination of the vertical gradients of the model forecast (averaged over the last day) and the smoothed climatology (with weights 1/4 and 3/4, respectively). To minimize shocks to the model, and to allow the velocity field to adjust slowly to changes in the density field, the increment was evenly distributed over the time steps between daily analysis times (e.g., Bloom et al. 1996).
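The two steps in this paragraph, blending the background gradients and spreading the increment over the time steps between analyses, can be sketched as follows. This is an illustration under stated assumptions: `blended_gradient`, `apply_incrementally`, and `step_model` are hypothetical names, and the model step is an arbitrary user-supplied function rather than the ocean model.

```python
import numpy as np

def blended_gradient(grad_forecast, grad_climatology, w_forecast=0.25):
    """Weighted mean of forecast and climatological vertical gradients
    (weights 1/4 and 3/4 by default, as quoted in the text)."""
    return w_forecast * grad_forecast + (1.0 - w_forecast) * grad_climatology

def apply_incrementally(state, increment, n_steps, step_model):
    """Advance the model n_steps, adding an equal share of the analysis
    increment after every step instead of inserting it all at once."""
    per_step = increment / n_steps
    for _ in range(n_steps):
        state = step_model(state) + per_step
    return state
```

For example, with a time step of 1/20 day and daily analyses, `n_steps` would be 20, so each step receives one-twentieth of the increment.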

#### 3) Forecast run (RunF)

Starting with an initial condition provided by RunA for the beginning of each month, the model was run for an additional 22 days with data assimilation but in such a way that only data prior to day 22 were assimilated. This required careful adjustment of the size of the various time windows (including the windows for Lagrangian interpolation and extrapolation of Argo data) to make sure no observations from beyond day 22 were assimilated. The model was then run for 60 days without data assimilation (i.e., in forecast mode). By comparing the forecasts from days 22 to 82 against the corresponding observations it is possible to assess the true forecast skill of the ocean model as a function of lead time. We stress that no observations past day 22 are assimilated and, in this sense, the comparisons shed light on how the assimilation scheme may perform in an operational setting. This type of run was completed for each month from January 2003 to December 2004 giving a total of 24 forecast runs, each 60 days long.
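The timing of the forecast runs can be laid out explicitly. The sketch below assumes that day 0 is the first day of each month, so the assimilation cutoff falls 22 days later and the forecast ends 60 days after that; the function name `forecast_schedule` and this date convention are illustrative assumptions.

```python
from datetime import date, timedelta

def forecast_schedule(first=(2003, 1), last=(2004, 12),
                      assim_days=22, forecast_days=60):
    """Return (start, assimilation cutoff, forecast end) for each monthly run."""
    runs = []
    year, month = first
    while (year, month) <= last:
        start = date(year, month, 1)
        cutoff = start + timedelta(days=assim_days)    # no data used after this
        end = cutoff + timedelta(days=forecast_days)   # end of free forecast
        runs.append((start, cutoff, end))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return runs
```

Any observation dated on or after `cutoff` would be excluded from that run, which is the property that makes the later comparison a fair test of forecast skill.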

### c. Description and comparison of runs

The **θ** estimated from the assimilation run (not shown) were similar to those described in the previous section. The main differences are that **θ** from RunA has (i) more variability through time, (ii) weaker seasonal cycles of *γ*_{2} and *L*_{2}, and (iii) smaller *γ*_{1}.

Typical snapshots of sea level from observations, and results from RunA and RunF at the same verifying time, are shown in Fig. 7. As expected the analysis (top-right panel) is close to the observations because they were assimilated into the model. It is encouraging to note that the forecast made 15 days earlier (bottom-left panel) remains similar to the analysis (e.g., most of the eddies evident in the analysis can be found in the 15-day forecast). As expected the 45-day forecast (bottom-right panel) for the verifying time is not as close to the analysis; much of the forecast skill has been lost, particularly in the vicinity of the Gulf Stream.

To quantify the skill of the assimilation scheme, the rms of the difference between the altimeter observations and the corresponding predictions made by RunC, RunA, and RunF were calculated as a function of forecast lead time. To allow for geographical differences in the effectiveness of the scheme, the rms were calculated for three regions (see Fig. 8). As expected the smallest rms differences were found for RunA and the largest rms values for RunC. The rms of the observed sea levels about the monthly climatology is also shown in Fig. 8; as expected it falls between the rms of the errors for RunC and RunA. Note that these three rms values do not change significantly with lead time; the small differences reflect sampling variability. Clearly the Gulf Stream region (area II) has the most energetic observed sea level variations. RunC overestimates this observed value, presumably because the model is generating eddies with strong sea level signatures but they are not necessarily appearing at the right time or location.

The most interesting information in Fig. 8 is the rate at which the rms of forecast errors from RunF increases with lead time. As expected the rms increases monotonically from the RunA to RunF values as the effect of the observations on the initial condition is forgotten. To provide a simple measure of the effectiveness of the assimilation scheme we define the lead time *τ*_{P} as the time at which the rms of RunF equals the rms of the altimeter innovations (i.e., for lead times less than *τ*_{P} the forecast run outperforms climatology as a predictor of future states). From Fig. 8 it can be seen that *τ*_{P} is about 9 days in the vicinity of the Gulf Stream area, and over 20 days for the rest of the North Atlantic.
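Given an rms curve tabulated against lead time, the skill time scale can be read off as the first crossing of the climatological rms. The sketch below is one possible way to do this, assuming a constant climatological rms and linear interpolation between tabulated lead times; the function name `skill_time_scale` is hypothetical.

```python
import numpy as np

def skill_time_scale(lead_days, rms_forecast, rms_climatology):
    """Lead time at which the forecast rms first reaches the climatological rms."""
    lead_days = np.asarray(lead_days, dtype=float)
    excess = np.asarray(rms_forecast, dtype=float) - rms_climatology
    above = np.nonzero(excess >= 0.0)[0]
    if above.size == 0:
        return np.inf            # forecast beats climatology at every lead time
    k = above[0]
    if k == 0:
        return lead_days[0]      # no skill even at the shortest lead time
    # Linear interpolation between the two lead times that bracket the crossing.
    frac = -excess[k - 1] / (excess[k] - excess[k - 1])
    return lead_days[k - 1] + frac * (lead_days[k] - lead_days[k - 1])
```

Because the crossing is interpolated, the result is not restricted to the tabulated lead times, although its precision is still limited by the sampling variability of the rms curves.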

The forecast skill time scale (*τ*_{P}) is shown in Fig. 9 for temperature and salinity. Over most of the water column the assimilation scheme outperforms climatology for about 15 days. Because of the paucity of Argo data it was not possible to calculate *τ*_{P} in different regions as for sea level. We expect, however, that *τ*_{P} will probably be shortest in the eddy-rich Gulf Stream region. The domain-averaged *τ*_{P} for temperature and salinity are shorter than for sea level (cf. Figs. 8 and 9). One reason is that the Argo observations have a mean separation of over 10Δ that is comparable to *L*_{2}. Thus, a significant part of the prediction skill must come from just the altimeter data. Another reason is that the mean time between profiles made by the same Argo float is about 10 days. This means that, on average, an extrapolation of 5 days is required to move from the last Argo observation to the analysis time. Finally, for this application we have not removed the bias from the temperature and salinity climatology and this may contribute to the shorter *τ*_{P}.

## 5. Summary and discussion

A physically motivated, sequential scheme has been described for assimilating altimeter and Argo observations. The assimilation is carried out in a single step and each observation contributes to the simultaneous update of the model’s temperature, salinity, and sea level fields. The scheme allows for the online correction of temperature and salinity biases (Thompson et al. 2006), and the simultaneous estimation of background error covariance parameters. The latter feature gives the scheme some robustness with respect to covariance choices. The development of the scheme is strongly influenced by, and builds upon, the work of Derber and Rosati (1989), Dee (1995), Cooper and Haines (1996), Troccoli and Haines (1999), Ricci et al. (2005), Weaver et al. (2005), and Haines et al. (2006). The scheme is similar in philosophy to the operational scheme of Balmaseda et al. (2007) used to initialize the ocean component of a coupled model for seasonal forecasting.

To assess the effectiveness of the scheme, two applications were carried out, both focused on the prediction of mesoscale variability of the North Atlantic. In the first application the background temperature and salinity fields were simply monthly climatologies. This was essentially a “data-only” reconstruction of the sea level and three-dimensional temperature and salinity fields of the northwest Atlantic for 2003–05. In the second application the background fields were model forecasts. Both the estimated background error covariance parameters, and the reconstructed fields, from the two applications are reasonable; cross-validation studies based on recovering observed Argo profiles that were withheld from the assimilation scheme, and also forecasting future states, show that the scheme has useful skill.

The forecasting application allowed us to compare the computational efficiency of the assimilation to that of running an eddy-permitting model. For the North Atlantic forecasting application the model has a horizontal grid spacing of 1/3°, 23 levels in the vertical, a free surface, a time step of 1/20 day, daily assimilation updates, and parameter updates every 2 days. The computational cost of the assimilation was about 40% of the cost of running the ocean model. The memory requirement of the assimilation was about one-third that of the model. The scheme is therefore relatively cheap compared to other schemes such as the ensemble Kalman filter or the singular evolutive extended Kalman (SEEK) filter.

There are several ways in which the scheme could be improved and extended. At present only profiles with matching temperature and salinity profiles have been assimilated. In principle it should be straightforward to assimilate temperature profiles alone (e.g., expendable bathythermographs). This will add a significant amount of extra information to the model. Ricci et al. (2005) point to a straightforward extension of the present scheme that could introduce some of the advantages of four-dimensional variational assimilation. Their idea is to calculate innovations over a sliding time window using model forecasts made at the exact time and location of the observations [i.e., first guess at appropriate time (FGAT)]. We plan to compare FGAT to the Lagrangian approach used to assimilate Argo profiles in the near future.

We also plan to compare the skill of the new assimilation scheme to existing schemes such as the SEEK filter. The focus will remain on mesoscale variability in the North Atlantic and the skill metrics will be the variance of forecast error as a function of lead time, and statistics such as *τ _{P}*. We also plan to use the new mean sea surface topographies resulting from the Gravity Recovery and Climate Experiment (GRACE) satellite mission (e.g., Thompson et al. 2009) to improve the prediction of the mesoscale through the correction of model biases.

## Acknowledgments

We thank Igor Yashayaev for generously making his climatology available to us, Youyu Lu and Dan Wright for help in gridding the climatology and ocean modeling, and two anonymous reviewers for their thorough and helpful comments. This work was funded by the Canadian Foundation for Climate and Atmospheric Sciences through support for the Global Ocean and Atmosphere Prediction and Predictability research network.

## REFERENCES

Balmaseda, M. A., D. Anderson, and A. Vidard, 2007: Impact of Argo on analyses of the global ocean. *Geophys. Res. Lett.*, **34**, L16605, doi:10.1029/2007GL030452.

Balmaseda, M. A., A. Vidard, and D. L. T. Anderson, 2008: The ECMWF ocean analysis system: ORA-S3. *Mon. Wea. Rev.*, **136**, 3018–3034.

Behringer, D., 2007: The Global Ocean Data Assimilation System (GODAS) at NCEP. Preprints, *11th Symp. on Integrated Observing and Assimilation Systems for the Atmosphere, Oceans, and Land Surface*, San Antonio, TX, Amer. Meteor. Soc., 3.3. [Available online at http://ams.confex.com/ams/pdfpapers/119541.pdf.]

Bloom, S., L. Takacs, A. da Silva, and D. Ledvina, 1996: Data assimilation using incremental analysis updates. *Mon. Wea. Rev.*, **124**, 1256–1271.

Brasseur, P., and Coauthors, 2005: Data assimilation for marine monitoring and prediction: The MERCATOR operational assimilation systems and the MERSEA developments. *Quart. J. Roy. Meteor. Soc.*, **131**, 3561–3582.

Cooper, M., and K. Haines, 1996: Altimetric assimilation with water property conservation. *J. Geophys. Res.*, **101**, 1059–1077.

Daley, R., 1991: *Atmospheric Data Analysis*. Cambridge University Press, 457 pp.

Dee, D., 1995: On-line estimation of error covariance parameters for atmospheric data assimilation. *Mon. Wea. Rev.*, **123**, 1128–1145.

Dee, D., and A. M. da Silva, 1999: Maximum likelihood estimation of forecast and observation error covariance parameters. Part I: Methodology. *Mon. Wea. Rev.*, **127**, 1822–1834.

Dee, D., G. Gaspari, C. Redder, L. Rukhovets, and A. da Silva, 1999: Maximum likelihood estimation of forecast and observation error covariance parameters. Part II: Applications. *Mon. Wea. Rev.*, **127**, 1835–1849.

De Mey, P., and M. Benkiran, 2002: A multivariate reduced order optimal interpolation method and its application to the Mediterranean basin-scale circulation. *Ocean Forecasting: Conceptual Basis and Applications*, N. Pinardi and J. Woods, Eds., Springer-Verlag, 281–305.

Demirov, E., N. Pinardi, C. Fratianni, M. Tonani, L. Giacomelli, and P. De Mey, 2003: Assimilation scheme of the Mediterranean forecasting system: Operational implementation. *Ann. Geophys.*, **21**, 189–204.

Derber, J., and A. Rosati, 1989: A global oceanic data assimilation system. *J. Phys. Oceanogr.*, **19**, 1333–1347.

Evensen, G., 2006: *Data Assimilation: The Ensemble Kalman Filter*. Springer, 279 pp.

Fukumori, I., 2006: What is data assimilation really solving, and how is the calculation actually done? *Ocean Weather Forecasting: An Integrated View of Oceanography*, E. Chassignet and J. Verron, Eds., Springer, 317–342.

Haines, K., J. Blower, J-P. Drecourt, C. Liu, A. Vidard, I. Astin, and X. Zhou, 2006: Salinity assimilation using *S(T)*: Covariance relationships. *Mon. Wea. Rev.*, **134**, 759–771.

Kalnay, E., 2003: *Atmospheric Modelling, Data Analysis and Predictability*. Cambridge University Press, 341 pp.

Ricci, S., A. Weaver, J. Vialard, and P. Rogel, 2005: Incorporating state-dependent temperature–salinity constraints in the background error covariance of variational data assimilation. *Mon. Wea. Rev.*, **133**, 317–338.

Sun, C., M. Rienecker, A. Rosati, M. Harrison, A. Wittenberg, C. Keppenne, J. Jacob, and R. Kovach, 2007: Comparison and sensitivity of ODASI ocean analyses in the tropical Pacific. *Mon. Wea. Rev.*, **135**, 2242–2264.

Thompson, K., D. Wright, Y. Lu, and E. Demirov, 2006: A simple method for reducing seasonal bias and drift in eddy resolving ocean models. *Ocean Modell.*, **13**, 109–125.

Thompson, K., J. Huang, M. Véronneau, D. Wright, and Y. Lu, 2009: The mean surface topography of the North Atlantic: Comparison of independent estimates based on satellite, terrestrial gravity and oceanographic observations. *J. Geophys. Res.*, in press.

Troccoli, A., and K. Haines, 1999: Use of the temperature–salinity relationship in a data assimilation context. *J. Atmos. Oceanic Technol.*, **16**, 2011–2025.

Weaver, A., C. Deltel, E. Machu, S. Ricci, and N. Daget, 2005: A multivariate balance operator for variational data assimilation. *Quart. J. Roy. Meteor. Soc.*, **131**, 3605–3625.

Wikle, C., and L. Berliner, 2007: A Bayesian tutorial for data assimilation. *Physica D*, **230**, 1–16.

Wright, D. G., K. R. Thompson, and Y. Lu, 2006: Assimilating long-term hydrographic information into an eddy-permitting model of the North Atlantic. *J. Geophys. Res.*, **111**, C09022, doi:10.1029/2005JC003200.

## APPENDIX A

### Estimation of State and Background Error Covariance

To estimate the state (**x**) and the covariance parameter vector (**θ**) we maximize the posterior density *p*(**x**, **θ**|**y**). This is equivalent to minimizing the cost function (A1) [see (10)].

In general the cost function (A1) is a complicated function of **x** and **θ** and its minimum must be found numerically. We first assume **θ** is known. The optimal value of **x** is then unique and is given by (A2). In practice, **x**_{θ} is found numerically using a preconditioned conjugate gradient descent algorithm. (To improve the condition of the background error covariance matrix a small positive number was added to its diagonal elements.) We then substitute (A2) into (A1) and minimize the resulting function of **θ**, (A3), in which *n*_{z} is the number of vertical levels and *n*_{h} is the number of Argo profiles.

Further simplifications are possible because, from (7), it is clear that the covariance function for *ξ*_{D} is separable in the east and north coordinate directions. It is thus possible to write 𝗕_{D} = 𝗕_{1} ⊗ 𝗕_{2}, where 𝗕_{1} and 𝗕_{2} are *n*_{1} × *n*_{1} and *n*_{2} × *n*_{2} covariance matrices, and thus log|𝗕_{D}| = *n*_{2} log|𝗕_{1}| + *n*_{1} log|𝗕_{2}|. The last term in (A3) comes from the prior for **θ**.
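The determinant identity for a separable covariance can be checked numerically. The following sketch uses small random positive-definite matrices as stand-ins for 𝗕_{1} and 𝗕_{2}; the sizes and the helper names `random_covariance` and `logdet` are arbitrary choices for illustration.

```python
import numpy as np

def random_covariance(n, rng):
    """Random symmetric positive-definite n x n matrix (illustrative only)."""
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)

def logdet(m):
    """Log-determinant of a positive-definite matrix via slogdet."""
    sign, value = np.linalg.slogdet(m)
    if sign <= 0:
        raise ValueError("matrix is not positive definite")
    return value

rng = np.random.default_rng(0)
n1, n2 = 4, 3
B1 = random_covariance(n1, rng)
B2 = random_covariance(n2, rng)

# Direct evaluation on the full (n1*n2) x (n1*n2) Kronecker product ...
direct = logdet(np.kron(B1, B2))
# ... versus the factored form, which only needs the two small matrices.
factored = n2 * logdet(B1) + n1 * logdet(B2)
```

The two values agree to rounding error, while the factored form avoids ever building or factorizing the full matrix, which is where the computational saving comes from.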

Although the cost function is relatively inexpensive to calculate using the above simplifications of the determinant, it is still a complicated function of **x** and **θ** and the minimum must be found numerically. We used the quasi-Newton method following careful preconditioning. (The Hessian was updated using the BFGS algorithm.)

Standard optimal interpolation, based on 𝗕(**θ**) evaluated using the optimal value of **θ**, was used to propagate the estimates of **x** to any chosen set of points (e.g., a high-resolution spatial grid). In practice the quantity 𝗕^{−1}*ξ*_{θ} is calculated in the first step of the above minimization procedure and this greatly reduces the computational cost of propagating the estimates of **x** to the chosen points. For the present study the displacement vector *ξ*_{D} was defined on a regular grid with a spacing of 3Δ and 2Δ for the applications in sections 2 and 3, respectively.

#### Maximizing the marginal posterior density

An alternative way to estimate **θ** is to maximize the marginal posterior density, *p*(**θ**|**y**) = ∫ *p*(**x**, **θ**|**y**) *d***x**. This is equivalent to the approach of Dee (1995) and requires the minimization of the corresponding function of **θ** alone.

In general the maximization of the joint and marginal posterior distributions will give different results. This result is not really surprising because any choice of a single estimate derived from the joint posterior probability density (e.g., mode or median of marginal density) is in some sense subjective. We have chosen to estimate **θ** by maximizing the joint posterior density for the state and parameters. One advantage is that this approach leads to significant computational savings because it is much easier to evaluate |𝗕| than |𝗥 + 𝗛𝗠𝗕𝗠^{T}𝗛^{T}| using the procedure given above.
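To see where the two determinants come from, it may help to write out generic Gaussian forms of the two objective functions. The expressions below are a textbook sketch under stated assumptions, not a reproduction of the equations of this appendix: the observations are taken to satisfy **y** = 𝗚**x** + **e** with 𝗚 = 𝗛𝗠, the observation error **e** has covariance 𝗥 that does not depend on **θ**, and **x** has zero mean and covariance 𝗕(**θ**).

```latex
% Assumed generic setting (not the paper's exact notation):
%   y = G x + e,  e ~ N(0, R),  x | theta ~ N(0, B(theta)),  G = H M.
\begin{align}
 -2\log p(\mathbf{x},\boldsymbol{\theta}\mid\mathbf{y}) &=
   (\mathbf{y}-\mathsf{G}\mathbf{x})^{T}\mathsf{R}^{-1}(\mathbf{y}-\mathsf{G}\mathbf{x})
   + \mathbf{x}^{T}\mathsf{B}^{-1}\mathbf{x}
   + \log|\mathsf{B}|
   - 2\log p(\boldsymbol{\theta}) + \text{const},\\
 -2\log p(\boldsymbol{\theta}\mid\mathbf{y}) &=
   \mathbf{y}^{T}\bigl(\mathsf{R}+\mathsf{G}\mathsf{B}\mathsf{G}^{T}\bigr)^{-1}\mathbf{y}
   + \log\bigl|\mathsf{R}+\mathsf{G}\mathsf{B}\mathsf{G}^{T}\bigr|
   - 2\log p(\boldsymbol{\theta}) + \text{const}.
\end{align}
```

In this generic setting the joint form involves only log|𝗕|, which factorizes when 𝗕 is separable, whereas the marginal form requires the determinant of a matrix in observation space that has no such structure.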

## APPENDIX B

### Outline of Spectral Nudging

Eddy-permitting ocean models often drift away from the observed state when integrated for long periods of time leading to significant bias errors. The model used in this study is no exception and significant biases were noted in the northwest Atlantic, particularly in the mean path of the Gulf Stream. To suppress drift and bias, we restored the model’s temperature and salinity at all grid points to the observed monthly climatology of Yashayaev. To avoid damping the mesoscale variability, we used the spectral nudging technique of Thompson et al. (2006), which restricts the nudging of the model to specified frequency and wavenumber bands; outside of these bands the model is free to evolve prognostically.

Spectral nudging is implemented by adding a restoring term of the form *λ*〈*X*_{t}^{c} − *X*_{t}^{f}〉 to the model’s temperature and salinity update equations, where *X*_{t}^{c} is the observed climatology at time *t* and *X*_{t}^{f} is the corresponding model forecast. The restoring time is proportional to *λ*^{−1}. The angle brackets denote a filtered quantity that is close to zero for wavenumbers above a specified cutoff and frequencies beyond *κ* of the climatological frequencies of 0 and 1 cycles per year. Note that the time filter used to calculate the spectral nudges has a zero phase shift at the climatological frequencies. Reducing *κ* reduces the width of the nudged frequency bands but increases the time for the filter, and the model, to spin up. (The spinup time scales with *κ*^{−1}.) The advantages of spectral nudging are that it suppresses seasonal biases in the model’s temperature and salinity fields and leaves the mesoscale variability free to evolve prognostically. For more details on the technique, including the spectral characteristic of the fourth-order Butterworth filter used to spatially smooth the nudges, see Thompson et al. (2006).
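The time-filtering part of the idea can be illustrated with a simple sequential filter. The sketch below is an assumption made for illustration and is not the filter of Thompson et al. (2006): it tracks the mean with a first-order recursive filter of rate *κ* and the annual harmonic by complex demodulation, and it omits the spatial smoothing entirely. The name `band_filter` and its arguments are hypothetical; time is measured in years.

```python
import numpy as np

def band_filter(d, dt, kappa, omega=2.0 * np.pi):
    """Sequentially estimate the part of the series d that lies within about
    kappa of frequencies 0 and omega (default: 1 cycle per year)."""
    alpha = kappa * dt          # fraction of the new value admitted per step
    mean = 0.0                  # running estimate of the frequency-0 band
    amp = 0.0 + 0.0j            # running complex amplitude of the annual band
    out = np.empty(len(d))
    for k, dk in enumerate(d):
        phase = np.exp(1j * omega * k * dt)
        mean += alpha * (dk - mean)
        # Demodulate the remainder to zero frequency, then low-pass it.
        amp += alpha * ((dk - mean) * np.conj(phase) - amp)
        out[k] = mean + 2.0 * np.real(amp * phase)
    return out
```

Applied to a series made of a mean, an annual cycle, and a 20-day oscillation, the filter output converges to the mean plus annual cycle with negligible phase shift, and the 20-day signal is almost entirely rejected. A nudge would then be formed as *λ* times the filtered difference between climatology and forecast, and the spin-up of the filter scales with *κ*^{−1} as noted in the text.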

Vertical levels and statistics of the Argo observations. The first two columns specify the depths of the 14 levels. The remaining columns give statistics calculated from observed Argo profiles. The two bias columns show the annual mean, spatially averaged differences between the Argo observations and the smoothed climatology. Columns labeled *ρ*_{0} and *τ* define the normalized signal variance, and the *e*-folding times of the exponential autocorrelation function fit to the temperature and salinity innovations from the same Argo float. The last two columns give the estimated standard deviation of the observation errors of the Argo temperatures and salinities. (See text for details.)