## 1. Introduction

The Argo project (Argo Science Team 1999) has increased the data coverage of the South Atlantic substantially (Fig. 1). These data, when combined with those from conductivity–temperature–depth (CTD) probes, are sufficient to provide an empirical basis for estimating salinity from measurements of temperature. Such estimates allow temperature-only expendable bathythermograph (XBT) data to characterize density and dynamics. As the Argo project continues and the coverage further improves, the estimates can be refined. Even now, confidence intervals for the estimates provide a useful tool for flagging suspect measurements, and the estimates are sufficiently accurate for computing geostrophic transports from XBT sections and for preserving water-mass properties when XBT data are assimilated into numerical models.

Efforts to estimate salinity from temperature date back to the work of Stommel (1947), which suggested that salinity might be estimated from previous measurements at the same temperature. This idea has often been implemented using climatological mean profiles of salinity and temperature (Conkright et al. 2002a): for the observed value of temperature, the estimate for salinity can be read from the TS curve plotted from the climatological means. For some situations this works fairly well, but for others such a curve provides a poor fit to data from individual profiles (Thacker 2006). Although fitting regression models to the profile data is considerably more work than using published climatologies, it offers greater flexibility and provides more accurate results (Thacker 2006).

The need for assimilating XBT data into numerical models of oceanic circulation has motivated much of the research on salinity estimation. Haines et al. (2006), building on the work of Troccoli and Haines (1999), implement Stommel’s idea using the TS relationship of the model, which initially might reflect that of the climatology used for the spinup and subsequently should reflect corrections from the assimilation of observed salinity. This is quite practical for data assimilation but not for other applications where there is no model to provide the TS relationship. Other approaches try to capture the TS relationship using joint temperature–salinity empirical orthogonal functions (EOFs) inferred from profile data (Carnes et al. 1994; Maes and Behringer 2000; Maes et al. 2000; Fujii and Kamachi 2003; Sparmocchia et al. 2003). For example, Sparmocchia et al. (2003) approximate the error covariance matrix needed for data assimilation with a reduced-order representation based on multivariate EOFs characterizing vertical variability around a seasonal climatology. While EOF approaches favor the use of a few modes to characterize coherent vertical structure, linear regression (Hansen and Thacker 1999; Fox et al. 2002a; Thacker 2006; Thacker and Sindlinger 2006) can exploit high-density vertical sampling to focus on the variability at specific depths, which can be chosen as desired throughout the water column. Furthermore, regression models can capture systematic local spatial variability by including longitude and latitude among the regressors. For these reasons, the regression approach is pursued here.

While previous studies (Hansen and Thacker 1999; Thacker 2006; Thacker and Sindlinger 2006) have focused on relatively small regions, giving considerable attention to finding the model most appropriate to the locale, the focus here is on treating a considerably larger area (Fig. 1) in a more automatic way. As this study was motivated by the monitoring of the meridional heat flux across 35°S using data from the AX18 high-density XBT line^{1} (Garzoli and Baringer 2007; Baringer and Garzoli 2007), the boundaries were rather arbitrarily set to 25° and 45°S to ensure there would be sufficient CTD and Argo data to characterize the TS relationship everywhere along that line. Because that relationship varies across the region, it was clear that a method was needed for capturing this variability in a systematic way.

Much of the South Atlantic between 25° and 45°S composes the subtropical gyre, but there are significant smaller-scale features (Reid 1989; Peterson and Stramma 1991). On the western boundary of the region, the cold, fresh Malvinas Current branches off from the Antarctic Circumpolar Current in the south; flows northward along the Argentine shelf; meets the warmer, saltier, southward flowing Brazil Current around 38°S offshore from the Rio de la Plata; and together they flow eastward as the South Atlantic Current (Stramma and Peterson 1990; Garzoli 1993; Goñi et al. 1996). At the southeastern boundary Agulhas eddies bring warm salty water from the Indian Ocean into the region (Gordon 1985; Schmid et al. 2003), and the Benguela Current carries water northward along the northeastern boundary (Garzoli and Gordon 1996; Garzoli et al. 1997). The subtropical front extends along roughly the 40°S parallel into the Indian Ocean and separates the South Atlantic’s subtropical gyre from the colder water north of the Antarctic Circumpolar Current (Hofmann 1985; Stramma and Peterson 1990). While better results might be obtained by more careful attention to the oceanography of these features, the objective here is to find a common framework that can provide salinity estimating capability everywhere within this relatively large area *without* having to cater to the individual features. The relatively slow evolution of the TS relationship across the region makes this possible. In some places it will be clear that more local attention is warranted, but results for the region as a whole are quite useful. Consequently, this serves as a prototype for an approach that might be used for other parts of the World Ocean.

To accommodate the spatially varying TS relationship, distance-weighted regression models are defined at regularly spaced points covering the South Atlantic region (Fig. 1). At each grid point models are constructed at 25-dbar intervals by fitting to data from a relatively large neighborhood that extends well beyond the closest neighboring grid points. The size of the neighborhoods allows sufficient data for fitting, their overlap promotes spatial smoothness, and the weighting ensures that the more local data dominate. This approach differs from that of Fox et al. (2002b) in the details determining the distance from each grid point over which data are gathered and the weighting that limits the influence of remote data, but more importantly it differs in the nature of the regression models. Rather than restricting the regression models to be simply linear functions of temperature, here models are considered that are quadratic in temperature and also in longitude and latitude, thereby capturing the curvature of the TS relationship at any point as well as the spatial tendency of the relationship across the fitting region.

The approach taken here is quite similar to local regression as described by Cleveland et al. (1994), which fits regression models locally while allowing nearby data to exert a greater influence,^{2} differing principally in where the local models are centered and in the determination of the extent of the region providing the data. Here, the models are centered on a uniform grid, while their method uses a so-called k-d tree to determine the locations.^{3} These differences are relatively minor, so similar results should be expected using their method. In fact, excellent free software implementing their approach is available,^{4} which could have been used for this application. The approach taken here has the advantage of providing regression coefficients on a regular grid.

Ridgway et al. (2002) have used a similar local regression approach to constructing climatologies, fitting models at points on a regular grid to the 400 data closest to each grid point. Viewed from that perspective the results presented here can be regarded as *conditional* climatologies (i.e., *the* local climatological mean salinity *for a given temperature*). In somewhat different words, the method used here holds the same relationship to cokriging (Cressie 1991) as that of Ridgway et al. (2002) does to simple kriging or optimal interpolation (Gandin 1965; Bretherton et al. 1976). Whereas Fox et al. (2002a) include no spatial variables in their local regression models and Ridgway et al. (2002) omit dependence on temperature from their estimates of salinity, here both spatial and thermal covariability are included.

## 2. Data

The local regression models for the South Atlantic are based on two datasets: CTD profiles from the National Oceanographic Data Center’s World Ocean Database 2001 (Conkright et al. 2002b) and the Argo Global Data Assembly Center’s profiles from the Global Ocean Data Assimilation Experiment (GODAE) Monterey Server (Carval et al. 2006). Both come with flags indicating which data are considered to be reliable, and only those flagged as the most reliable have been included in this study. Figure 1 shows the locations of these 2579 CTD profiles and the 5164 Argo profiles. Although the coverage is sparse in some places, especially the south-central part of the region, the gradual change in the TS relationship with latitude and longitude permits reaching far enough to gather sufficient data for stable estimates everywhere.

These profiles were interpolated to standard levels spaced at 25-dbar intervals, and data interpolated over large vertical gaps were discarded. The left panel of Fig. 2 indicates the number of data for each pressure level. Because the salinity estimates are to complement XBT temperature profiles, there was no need to consider levels deeper than 1000 dbar. The dips in the Argo histogram reflect gaps in the profiles. The right panel shows that the CTD and Argo data are from different periods.
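The preprocessing step just described can be sketched as follows. This is a minimal illustration, assuming linear interpolation in pressure and a hypothetical 50-dbar limit on the spacing of the two observations that bracket a standard level; the text does not state the threshold actually used.

```python
import numpy as np

# Standard levels used in the study: 25, 50, ..., 1000 dbar.
STANDARD_LEVELS = np.arange(25.0, 1025.0, 25.0)


def to_standard_levels(pressure, value, levels=STANDARD_LEVELS, max_gap=50.0):
    """Linearly interpolate one profile to the standard levels.

    max_gap (dbar) is a hypothetical threshold: a level is set to NaN when the
    two observations bracketing it are more than max_gap apart. Levels outside
    the observed pressure range are also NaN.
    """
    p = np.asarray(pressure, dtype=float)
    v = np.asarray(value, dtype=float)
    order = np.argsort(p)
    p, v = p[order], v[order]
    out = np.interp(levels, p, v, left=np.nan, right=np.nan)
    # Index of the first observation at or deeper than each level.
    below = np.searchsorted(p, levels, side="left")
    for k, level in enumerate(levels):
        if np.isnan(out[k]):
            continue
        j = below[k]
        if p[j] == level:
            continue  # observed exactly at this level, no interpolation
        if p[j] - p[j - 1] > max_gap:
            out[k] = np.nan  # interpolated across too large a vertical gap
    return out
```

The same function would be applied separately to temperature and salinity so that both end up on identical levels.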

Figures 3–6 show TS plots at 25, 100, 500, and 800 dbar, respectively, for sixteen 20° × 5° subregions laid out with columns corresponding to longitude progressing from west (left) to east (right) and with rows corresponding to latitude progressing from south (bottom) to north (top). The CTD and Argo data are distinguished by color with the CTD data plotted on top.^{5} Such plots were examined for each standard level, and a few obvious outliers were detected and removed. Most of the Argo data were found to be consistent with the CTD data; however, some of the Argo profiles with higher cycle numbers appeared to exhibit salinity drift, and those data were also removed.^{6} The fact that the CTD and Argo data cannot be distinguished without the aid of color on these plots together with the fact that they come from different years (Fig. 2, right panel) indicates that the TS relationship is stationary over the interval for which data are available.

The spread in salinity values at 25 dbar (Fig. 3) is significantly greater than at deeper levels (Figs. 4–6) and is particularly great in the west. Nevertheless, even at 25 dbar, knowledge of temperature reduces the spread, especially in the east. The reduction of spread by knowing temperature is quite dramatic at 100 and 500 dbar and is still evident at 800 dbar, where the spread of salinity values without regard for temperature is less than that conditioned on temperature at 25 dbar. At each level and for each panel, the spread remaining after accounting for temperature might be attributed to spatial variations within the subregion; such variations are evident at a larger scale in the differences between the TS relationships of the different panels.

The cluster of relatively salty points at 500 dbar in the subregion (30°–35°S, 0°–20°E) in Fig. 5 is suspect. Perhaps those points should have been discarded as bad data during the preliminary cleaning. Nevertheless, they have been included, as subregions immediately to the north and to the south have similar salty clusters, and they will influence the regression models. As more data become available, especially those for which salinity drift has been corrected, deciding which to use will be less problematic. However, it is worth noting that deviations from a smooth fit to the data can be used for detecting possibly problematic profiles in an automated quality-control procedure.

The salinity minimum seen at 100 dbar for the subregion (35°–40°S, 60°–40°W) in Fig. 4 resembles that seen at 500 and 800 dbar (Figs. 5, 6) for the same subregion, except that it is much sharper and a bit warmer.^{7} The kink in the 100-dbar TS plot suggests that separate models should be used for the two sides of the minimum. However, the goal here is to treat the region as a whole without special attention to such details.

The cluster of cold, salty Argo points at 800 dbar in the subregion (25°–30°S, 40°–20°W) in Fig. 6, which is not seen in the CTD data, appears to be an indication of increasing salinity below the minimum, which is more apparent at this depth in the data from farther southwest or northeast. Between the salinity minimum and the deeper salinity maximum, which is beyond the reach of most XBT probes, the increased variability of salinity exhibits little covariability with temperature, so observations of temperature are not particularly helpful for estimating salinity. However, deeper still below the salinity maximum, salinity again shows a strong correlation with temperature. Separate treatment for estimating salinity is indicated for three depth ranges: above the salinity minimum, between the minimum and maximum, and below the salinity maximum. As mentioned above, such special attention is beyond the scope of this project.

## 3. Models

The strategy used here to characterize the TS relationship required fitting local regression models at points spaced 2° in longitude from 60°W to 20°E and 1° in latitude from 25° to 45°S (Fig. 1) for each level from 25 to 1000 dbar at 25-dbar increments.^{8} Each local model was fitted to data from its own depth level in a neighborhood large enough to encompass at least 100 profiles.^{9} However, because not all profiles contribute data at all levels, some models were fitted to fewer than 100 samples. Once fitted to its supporting data, each model provides a prescription for estimating salinity within its grid cell for its depth level. This design yields smooth cell-to-cell and level-to-level variation of the salinity estimates.
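The grid and the assignment of an observation to its grid cell might be coded as below. Longitudes are taken as east-positive and latitudes as north-positive, which is a convention assumed for this sketch; the grid dimensions follow directly from the spacings given in the text.

```python
import numpy as np

# Model grid from the text; longitudes east-positive, latitudes north-positive (assumed convention).
GRID_LON = np.linspace(-60.0, 20.0, 41)   # 2-degree spacing, 60W to 20E
GRID_LAT = np.linspace(-45.0, -25.0, 21)  # 1-degree spacing, 45S to 25S
LEVELS = np.arange(25.0, 1025.0, 25.0)    # 25 to 1000 dbar every 25 dbar


def nearest_grid_point(lon, lat):
    """Indices (i, j) of the grid point whose 2-degree by 1-degree cell contains (lon, lat).

    Points outside the domain are assigned to the nearest edge cell.
    """
    i = int(np.clip(np.rint((lon - GRID_LON[0]) / 2.0), 0, len(GRID_LON) - 1))
    j = int(np.clip(np.rint((lat - GRID_LAT[0]) / 1.0), 0, len(GRID_LAT) - 1))
    return i, j


# One local model per grid point and level, for each model type.
n_models_per_type = len(GRID_LON) * len(GRID_LAT) * len(LEVELS)
```

With 41 longitudes, 21 latitudes, and 40 levels, each model type thus involves 34440 separate local fits.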

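Four types of local regression models were considered, differing in whether salinity depends linearly or quadratically on temperature and on position:

$$
\begin{aligned}
\text{type 1:}\quad \hat{S} &= a + bT + cx + dy,\\
\text{type 2:}\quad \hat{S} &= a + bT + cT^{2} + dx + ey,\\
\text{type 3:}\quad \hat{S} &= a + bT + cx + dy + ex^{2} + fxy + gy^{2},\\
\text{type 4:}\quad \hat{S} &= a + bT + cT^{2} + dx + ey + fx^{2} + gxy + hy^{2},
\end{aligned}
$$

where *Ŝ* denotes the estimate for salinity; *T*, *x*, and *y* denote observed temperature, longitude, and latitude, respectively; and the coefficients *a*, *b*, . . . , *h* were determined for each model by fitting to the local training data.^{10} Thus, for each grid point at each level, type-1 models require determining four coefficients by fitting to local data, whereas type-2, type-3, and type-4 models require determining five, seven, and eight coefficients, respectively.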
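The four model types differ only in their regressors. The sketch below builds the corresponding design matrices. The regressor sets are inferred from the coefficient counts (four, five, seven, and eight) and from which types are described as quadratic in temperature and which in position, so the column ordering is an assumption of this illustration.

```python
import numpy as np


def design_matrix(model_type, T, x, y):
    """Regressor columns for model types 1-4, one column per coefficient a, b, c, ...

    Type 1: linear in T, x, y              (4 coefficients)
    Type 2: quadratic in T, linear in x, y (5)
    Type 3: linear in T, quadratic in x, y (7)
    Type 4: quadratic in T and in x, y     (8)
    """
    T, x, y = (np.atleast_1d(np.asarray(v, dtype=float)) for v in (T, x, y))
    one = np.ones_like(T)
    spatial_quadratic = [x**2, x * y, y**2]
    columns = {
        1: [one, T, x, y],
        2: [one, T, T**2, x, y],
        3: [one, T, x, y] + spatial_quadratic,
        4: [one, T, T**2, x, y] + spatial_quadratic,
    }
    if model_type not in columns:
        raise ValueError("model_type must be 1, 2, 3, or 4")
    return np.column_stack(columns[model_type])
```

Including the cross term *xy* is what brings the spatially quadratic types to three extra coefficients rather than two.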

The models were fitted to their local data using weighted least squares, with the weights given by the tricube function of distance, *w*(*d*) = [1 − (*d*/*d*_{max})^{3}]^{3}. Distance is measured so that adjacent grid points are equally distant whether separated in longitude or latitude:

$$
d = \sqrt{\frac{(x - x_0)^{2}}{4} + (y - y_0)^{2}},
$$

where *x*_{0} and *y*_{0} are the longitude and latitude of the model grid point and *x* and *y* are those of the observation. Dividing by the distance to the most remote point, *d*_{max}, allows the weights to scale so that more distant points have greater influence in sparsely sampled regions.
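A minimal sketch of one local fit, assuming NumPy and type-1 regressors, is given below. It uses the scaled distance d = sqrt((x − x0)²/4 + (y − y0)²) with tricube weights and solves the weighted least-squares problem by rescaling each row by √w. Centering *x* and *y* on the grid point is a conditioning choice of this sketch: it changes the intercept but not the fitted salinities. The function names are hypothetical.

```python
import numpy as np


def scaled_distance(x, y, x0, y0):
    """Distance in which a 2-degree longitude step equals a 1-degree latitude step."""
    return np.sqrt((np.asarray(x) - x0) ** 2 / 4.0 + (np.asarray(y) - y0) ** 2)


def tricube(d, d_max):
    """Tricube weights w(d) = [1 - (d/d_max)^3]^3, set to zero for d >= d_max."""
    r = np.clip(np.asarray(d, dtype=float) / d_max, 0.0, 1.0)
    return (1.0 - r**3) ** 3


def fit_type1(S, T, x, y, x0, y0, d_max):
    """Tricube-weighted least-squares fit of a type-1 model at grid point (x0, y0).

    Returns coefficients for S ~ a + b*T + c*(x - x0) + d*(y - y0).
    """
    S, T, x, y = (np.asarray(v, dtype=float) for v in (S, T, x, y))
    w = tricube(scaled_distance(x, y, x0, y0), d_max)
    X = np.column_stack([np.ones_like(T), T, x - x0, y - y0])
    sw = np.sqrt(w)
    # Weighted least squares: minimize sum(w * (S - X @ coef)**2).
    coef, *_ = np.linalg.lstsq(X * sw[:, None], S * sw, rcond=None)
    return coef
```

Replacing the column stack with one of the other regressor sets gives the corresponding fit for model types 2 through 4.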

The data were partitioned into two groups, one for fitting the local regression models and the other for verifying their performance. Although the models were fitted to data from overlapping regions, each was intended to characterize the salinity within its own grid cell, so the verification data should be compared with the model associated with the closest grid point. The verification data were chosen randomly by taking every third profile from each of the 2° × 1° cells centered on the grid points, leaving two-thirds of the closest data for fitting. Of the 7743 profiles, 2379 were set aside for verification; even so, not all grid cells had sufficient data for verification (Fig. 7). Because the models were fitted to data within overlapping regions, data used to verify the models at one grid point were used to train the models at other grid points. Even for densely sampled grid cells, reaching into the neighboring cells for additional fitting data is advantageous, as it helps to guarantee smooth variations of the estimates across cell boundaries. Therefore, all data within an ellipse with a longitudinal radius of 2° and a latitudinal radius of 1° were always used. Consequently, for grid points in densely sampled regions, most of which are near the northeastern limit of the domain, more than 100 profiles were used. Elsewhere, the data were gathered from a larger ellipse of the same aspect ratio, enlarged until it encompassed at least 100 profiles. Figure 8 indicates the maximum reach *d*_{max} used for each grid cell.
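The neighborhood rule and the verification split might be implemented as follows. This sketch makes several assumptions: that the 2° × 1° ellipse corresponds to a scaled distance of d ≤ 1, that enlargement is done by taking the distance to the 100th-nearest profile, and that "every third profile" is taken after an optional random shuffle within each cell. The function names are hypothetical.

```python
import numpy as np


def neighborhood_reach(prof_lon, prof_lat, x0, y0, min_profiles=100, min_reach=1.0):
    """Return d_max for the grid point (x0, y0).

    In scaled distance, the 2-degree by 1-degree ellipse is d <= 1, so every
    profile inside it is always used; otherwise the ellipse is enlarged, with the
    same aspect ratio, until it encloses min_profiles profiles.
    """
    d = np.sqrt((np.asarray(prof_lon) - x0) ** 2 / 4.0 + (np.asarray(prof_lat) - y0) ** 2)
    if d.size == 0:
        raise ValueError("no profiles available")
    k = min(min_profiles, d.size) - 1  # index of the min_profiles-th nearest profile
    return max(min_reach, float(np.sort(d)[k]))


def verification_mask(cell_ids, rng=None):
    """Mark every third profile in each grid cell for verification.

    If a NumPy Generator is given, profiles are shuffled within each cell first,
    which makes the selection random (an assumption about how it was done).
    """
    cell_ids = np.asarray(cell_ids)
    mask = np.zeros(cell_ids.shape, dtype=bool)
    for cell in np.unique(cell_ids):
        idx = np.flatnonzero(cell_ids == cell)
        if rng is not None:
            idx = rng.permutation(idx)
        mask[idx[2::3]] = True
    return mask
```

Because the reach is set by a profile count rather than a fixed radius, *d*_{max} grows automatically in the data-sparse areas shown in Fig. 8.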

Figures 9–12 illustrate how well each model fits the local data at 25, 100, 500, and 800 dbar, respectively. Color is used to indicate ranges of residual standard error,^{11} a conventional measure of the model accuracy, and each panel corresponds to one of the four model types. The smallest fitting errors are expected for the type-4 models (bottom panel), as they have the most parameters.^{12} The generally small differences between the errors for the four model types indicate that the simplest type-1 model captures most of the variability. Models of types 2 and 4 with quadratic dependence on temperature definitely do better near the western boundary at 500 dbar (Fig. 11) where there is a pronounced salinity minimum; similar but less dramatic improvements due to quadratic temperature dependence are evident in the eastern quarter of the domain. To a lesser extent the same is seen at 800 dbar (Fig. 12).

As expected the residuals are largest at 25 dbar where the variability is greatest and temperature is least helpful. Nevertheless, the residual standard error at 25 dbar is less than 0.45 psu everywhere, which is considerably smaller than the ranges of salinity seen in Fig. 3. Those for 100 dbar are smaller by one-third, and those at 500 and 800 dbar are again one-third smaller.

It is reassuring that the regions of sparse sampling seen in Fig. 1 and reflected in Fig. 8 do not stand out on the plots of residual standard errors. In fact, quadratic spatial variability of type-3 and type-4 models does not seem to offer a dramatic advantage for handling data voids over the linear spatial variability of type-1 and type-2 models. At 500 dbar, quadratic temperature dependence is clearly more important in the data-sparse areas than is quadratic spatial dependence.

The data composing the salty cluster of subregion (30°–35°S, 0°–20°E) seen at 500 dbar in Fig. 5 produce the large residuals contributing to the local increase in standard error near 2°E and 32°S in Fig. 11. This suggests that questionable data, which survived the initial flag-based and visual screening, might be identified via their large residuals. Perhaps the time-consuming visual screening might even be eliminated in favor of an automated screening of residuals from a preliminary fit to type-1 models or even to models without spatial regressors.
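An automated residual screen of the kind suggested above could look like the following. The preliminary model (quadratic in temperature, no spatial regressors) and the threshold of three residual standard errors are illustrative assumptions, not values from this study.

```python
import numpy as np


def flag_large_residuals(S, T, k=3.0):
    """Flag data whose residual from a preliminary fit exceeds k residual standard errors.

    The preliminary model (quadratic in temperature, no spatial regressors) and
    the default k = 3 are illustrative choices, not values from the study.
    Returns (flags, residual_standard_error).
    """
    S = np.asarray(S, dtype=float)
    T = np.asarray(T, dtype=float)
    X = np.column_stack([np.ones_like(T), T, T**2])
    coef, *_ = np.linalg.lstsq(X, S, rcond=None)
    resid = S - X @ coef
    rse = np.sqrt(np.sum(resid**2) / (S.size - X.shape[1]))
    return np.abs(resid) > k * rse, rse
```

Because a large outlier inflates the residual standard error and can partly mask itself, one would likely refit after removing the flagged data and repeat until no new points are flagged.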

While the residual standard errors are available for each model at each grid point and each pressure level and thus provide useful information about its performance, such assessments based entirely on training data can paint an optimistic picture. Whenever possible, it is best to judge performance by seeing how well the models can reproduce independent data. For this reason not all data were used for fitting; some were held back for independent verification. Unfortunately, as Fig. 7 illustrates, there are insufficient data to score each of the many models individually. Still, because the models vary slowly from point to point, average verification scores for subregions can be used to characterize the different model types.

Figure 13 gives an idea of how well each type of model can be expected to perform within sixteen 20° × 5° subregions. Each panel corresponds to a different subregion, some with more verification data than others. Each curve corresponds to one of the four model types: at each pressure level 90% of the measurements differed from the corresponding estimates by less than the value indicated by the curve. In many cases the four types of models have only small differences in performance, so with the scale set by the problematic near-surface levels, some curves occasionally overlay others.^{13} Those that are quadratic in temperature have the smallest 90th percentile error for almost all levels and subregions; when the black curve obscures the blue, the more parsimonious type-2 models score as well as type 4. Type-1 models, linear in temperature, longitude, and latitude, score best by a tiny margin at midlevels in subregion (30°–35°S, 0°–20°E).
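The two verification statistics, the 90th percentile of absolute error and the root-mean-square error, can be computed level by level as in this sketch. It assumes the verification profiles and their estimates are stored as arrays on the standard levels, with NaN marking missing values.

```python
import numpy as np


def verification_scores(observed, estimated):
    """Per-level 90th percentile of |error| and root-mean-square error.

    observed, estimated: arrays of shape (n_profiles, n_levels); NaN marks missing data.
    """
    err = np.asarray(estimated, dtype=float) - np.asarray(observed, dtype=float)
    p90 = np.nanpercentile(np.abs(err), 90, axis=0)
    rms = np.sqrt(np.nanmean(err**2, axis=0))
    return p90, rms
```

Applying this separately to the verification profiles within each 20° × 5° subregion yields one curve per model type, of the kind plotted in Fig. 13.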

Similar curves can be drawn for the root-mean-square errors. Generally they look quite similar to the 90th percentile curves, except that the values are about two-thirds as large. The notable exception is the occurrence of large midlevel spikes in subregion (35°–40°S, 0°–20°E) for models of types 3 and 4, which are quadratic in longitude and latitude. This suggests that these models, while generally providing reasonable estimates, are more sensitive to unexpected inputs than are models with linear spatial variations.

As expected, the largest errors occur close to the surface. In subregions (35°–40°S, 60°–40°W) and (40°–45°S, 60°–40°W) in the southwest, the 90th percentile values at 25 dbar are off scale: the smallest values in both regions are for type-2 models (0.77 and 0.61 psu, respectively) and the largest are for type-3 models (0.79 and 0.65 psu). These large errors reflect the scatter observed in the TS plots (Fig. 3) for these subregions. For most of the other subregions, the 90th percentile absolute verification errors at 25 dbar are about half as large. Deeper than 100 dbar, 95% of the plotted 90th percentile values are smaller than 0.1 psu, 75% are less than 0.06 psu, and half are less than 0.044 psu. The greatest accuracy is around 400–600 dbar. The decline in accuracy with increasing depth is most pronounced for subregion (35°–40°S, 0°–20°E).

If, for convenience, estimates are to be restricted to those from a single model type for the entire South Atlantic region under consideration, the best choice would be type-2 models, which are quadratic in temperature and linear in longitude and latitude. The small advantage occasionally offered by the fully quadratic type-4 models is countered by their propensity to occasional large errors. On the other hand, a different model type can be chosen depending on the longitude, latitude, and depth at which the estimate is needed. While such a choice might be based on verification summaries within the sixteen 20° × 5° subregions, it may be desirable to choose different model types within a subregion. As the verification scores within each of the 16 subregions are generally consistent with the residual standard errors for that region’s grid points, the choice of which model to use can be guided by the residual statistics. Fortunately, the Argo program should be providing data regularly, and those profiles can be used as additional verification data to guide the local choice of model type.

It is interesting to compare the estimates from the type-2 models with those based on readily available climatological profiles. The *World Ocean Atlas 2001* (*WOA01*; Conkright et al. 2002a) has climatological mean profiles of temperature and salinity for the individual months and also without regard for seasonal variations; the latter are referred to as the annual climatological means. This offers four possibilities for estimating salinity: a choice of using either annual or monthly mean profiles combined with a choice of using estimates based on the climatological salinity profiles or based on the TS relationship inferred from climatological profiles of both temperature and salinity. Figure 14 compares the root-mean-square errors of these four approaches with those from the type-2 regression models. The errors for the regression models are scored for the verification profiles only (including the training profiles would give smaller errors); errors for climatological estimates were based on all profiles. At all pressure levels the regression approach gives substantially better estimates than do the methods that exploit the climatological profiles. Except near the surface where there is no strong relationship between salinity and temperature, it is clear that using the climatologically inferred TS relation is better than using only the mean salinity profile. And for both types of climatological estimates, the annual climatology gives better results than the monthly.
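For comparison, the two kinds of climatology-based estimates can be sketched for a single observation. This assumes the climatological temperature profile is monotonic in pressure, so that the TS curve yields a single salinity for each temperature; where it is not, reading salinity off the TS curve is ambiguous and this simple version would not apply. The function name is hypothetical.

```python
import numpy as np


def salinity_from_climatology(T_obs, p_obs, clim_p, clim_T, clim_S):
    """Two climatology-based salinity estimates for one observation.

    Returns (profile_estimate, ts_estimate):
      profile_estimate: climatological salinity interpolated to the observed pressure;
      ts_estimate: salinity read off the climatological TS curve at the observed temperature.
    Assumes clim_T is monotonic in pressure so the TS curve is single-valued in T.
    """
    clim_p, clim_T, clim_S = (np.asarray(a, dtype=float) for a in (clim_p, clim_T, clim_S))
    profile_estimate = float(np.interp(p_obs, clim_p, clim_S))
    order = np.argsort(clim_T)  # np.interp needs increasing x values
    ts_estimate = float(np.interp(T_obs, clim_T[order], clim_S[order]))
    return profile_estimate, ts_estimate
```

Passing annual or monthly mean profiles as `clim_p`, `clim_T`, and `clim_S` gives the four climatological variants compared in Fig. 14.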

While errors in the geopotential height at 25 dbar relative to 1000 dbar are relatively small for all four *WOA01*-based methods for estimating salinity, Fig. 15 indicates that those resulting from the type-2 regression estimates are considerably smaller.

## 4. Conclusions

The local regression approach to characterizing the spatially varying TS relationship has been shown to be feasible for the South Atlantic Ocean between 25° and 45°S, and it can be considered for use throughout the world’s oceans. While the results found here might not apply everywhere, they do suggest a starting point for other regions.

An obstacle hindering the implementation of this or any other approach that deals with data from the CTD and Argo archives is detecting and avoiding bad data. Even when only data flagged as the most reliable are used, among them will be data that behave quite differently from the majority. As the delayed-mode Argo data become available, this will be less of a concern. Until then, it is recommended that TS plots be examined for outliers, which can be excluded when fitting the regression models. An objective approach to accomplish the same without the need for as intense a visual inspection might be developed based on the identification of large residuals from smooth fits to TS plots.

A principal conclusion is that the systematic change in the TS relationship with longitude and latitude in this part of the South Atlantic is sufficiently gradual that it can be modeled with linear terms while still allowing relatively accurate estimates in areas for which few TS profiles are available. In fact, quadratic spatial dependence in a few instances produced estimates with unusually large errors. However, in other parts of the world where the horizontal evolution of the TS relationship is more rapid, such quadratic terms might prove necessary. Quadratic dependence on temperature, on the other hand, proved valuable and did not substantially degrade salinity estimates in situations where it was not needed.

Fronts associated with near-surface currents were not an obstacle to the basinwide characterization of the TS relationship, as at any given level all data fell nicely into a coherent pattern on TS plots. On the other hand, the basinwide salinity minimum marked a change in the nature of the TS relationship, as did the deeper salinity maximum. Above the salinity minimum and below the salinity maximum, temperature accounts for a substantial part of salinity variability. However, between the minimum and the maximum, temperature is not of much help. In this depth range the climatology of salinity conditioned on temperature reverts to an unconditioned climatology, and the methodology reverts to something quite similar to the local regression technique of Ridgway et al. (2002).

This study did not address the problem of modeling near-surface salinity, because of its large range of variability and its weak covariability with temperature. Seasonality, which might account for much of salinity’s near-surface variability, could be incorporated by using the sine and cosine of the day of the year as regressors. However, this would require relatively uniform sampling throughout the seasonal cycle. As the Argo data accumulate and coverage becomes more uniform, such models should be feasible for much of the World Ocean. Soon, satellites should be providing measurements of surface salinity, and these data may prove useful not only for improving our knowledge of surface salinity but also for sharpening our estimates at 25 and 50 dbar via salinity’s vertical autocorrelations. While waiting for these developments, the models found here for 25 dbar can be used to estimate salinity between 25 dbar and the surface.
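Adding seasonality amounts to appending an annual harmonic to the regressors. The sketch below does this for the type-2 regressors. The 365.25-day period and the use of a single harmonic are assumptions for illustration.

```python
import numpy as np


def seasonal_design_matrix(T, x, y, day_of_year, period=365.25):
    """Type-2 regressors plus one annual harmonic in day of year.

    Columns: 1, T, T**2, x, y, sin(phase), cos(phase), with
    phase = 2*pi*day_of_year/period. The period and the single harmonic are assumptions.
    """
    T, x, y, doy = (np.atleast_1d(np.asarray(v, dtype=float)) for v in (T, x, y, day_of_year))
    phase = 2.0 * np.pi * doy / period
    return np.column_stack([np.ones_like(T), T, T**2, x, y, np.sin(phase), np.cos(phase)])
```

Using both the sine and the cosine lets the fit determine the amplitude and the timing of the seasonal cycle independently. Both coefficients are well determined only if the training data span the year fairly evenly, which is the sampling requirement noted above.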

These models can be distributed as sets of model coefficients associated with each grid point and each level. To estimate salinity, choose the model coefficients for the closest grid point to the target location for the levels that bracket the target depth, evaluate the salinity from the target temperature, longitude, and latitude, and interpolate to the target depth. When using these models to estimate salinity, care should be taken when the target temperature is outside the range of values usually encountered at the target level. When the temperature is extremely warm, it might be better to base the estimate on a model for the same grid point but for a shallower level, where that temperature is more frequently encountered, as the observed value is likely to reflect a large downward displacement. Similarly, unusually cold values suggest the use of a model for deeper levels.
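The estimation recipe above might be coded as follows for type-2 models. The coefficient layout, an array indexed by longitude, latitude, level, and coefficient, is hypothetical. The clipping of the target pressure to the range of model levels follows the earlier suggestion that the 25-dbar models be used between 25 dbar and the surface. The sketch does not implement the level-shifting advice for unusually warm or cold temperatures.

```python
import numpy as np


def estimate_salinity(coef, grid_lon, grid_lat, levels, T, lon, lat, p):
    """Type-2 estimate S = a + b*T + c*T**2 + d*lon + e*lat, interpolated in pressure.

    coef is a hypothetical array of shape (n_lon, n_lat, n_levels, 5) holding
    the coefficients (a, b, c, d, e) for each grid point and level.
    """
    levels = np.asarray(levels, dtype=float)
    # Closest grid point: nearest longitude and nearest latitude on a regular grid.
    i = int(np.argmin(np.abs(np.asarray(grid_lon, dtype=float) - lon)))
    j = int(np.argmin(np.abs(np.asarray(grid_lat, dtype=float) - lat)))
    # Shallower than the top level, use the top-level model; same at the bottom.
    p = float(np.clip(p, levels[0], levels[-1]))
    k_hi = int(np.searchsorted(levels, p))  # first level at or deeper than p
    regressors = np.array([1.0, T, T**2, lon, lat])

    def at_level(k):
        return float(coef[i, j, k] @ regressors)

    if levels[k_hi] == p:
        return at_level(k_hi)
    k_lo = k_hi - 1
    frac = (p - levels[k_lo]) / (levels[k_hi] - levels[k_lo])
    return (1.0 - frac) * at_level(k_lo) + frac * at_level(k_hi)
```

Evaluating both bracketing models at the observed temperature before interpolating in pressure, rather than interpolating the coefficients, keeps each estimate tied to a model that was actually fitted.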

The best type of model for the region as a whole was type 2, with linear spatial and quadratic dependence on temperature, which requires five coefficients to be determined from the data in a neighborhood around each grid point at each depth. In this study the extent of the neighborhood from which the data were drawn was controlled by the target of 100 profiles, which seemed sufficient for determining the five coefficients while limiting the size of the neighborhoods in data-sparse areas. As predictive skill might improve if more data were used, a future study examining the sensitivity of the estimates to sample size would be useful.

## Acknowledgments

This work was supported by the National Oceanographic Partnership Program, by the National Oceanic and Atmospheric Administration’s Office of Global Programs, and by the Atlantic Oceanographic and Meteorological Laboratory.

## REFERENCES

Argo Science Team, 1999: Argo: The global array of profiling floats. *Proc. Conf. on the Ocean Observing System for Climate: OCEANOBS99*, St. Raphael, France, Argo, 1–12.

Baringer, M. O., and S. L. Garzoli, 2007: Meridional heat transport determined with expendable bathythermographs. Part I: Error estimates from model and hydrographic data. *Deep-Sea Res.*, **54**, 1390–1401.

Bretherton, F. P., R. E. Davis, and C. B. Fandry, 1976: A technique for objective analysis and design of oceanographic experiments applied to MODE-73. *Deep-Sea Res.*, **23**, 559–582.

Carnes, M. R., W. J. Teague, and J. L. Mitchell, 1994: Inference of subsurface thermocline structure from fields measurable by satellite. *J. Atmos. Oceanic Technol.*, **11**, 551–566.

Carval, T., and Coauthors, 2006: Argo data management user’s manual, version 2.1. ARGO, 60 pp. [Available online at http://www.usgodae.org/argo/argo-dm-user-manual.pdf.]

Cleveland, W. S., E. Grosse, and M.-J. Shyu, 1992: A package of C and Fortran routines for fitting local regression models: Loess user’s manual. Bell Labs, 54 pp.

Cleveland, W. S., E. Grosse, and W. M. Shyu, 1994: Local regression models. *Statistical Models in S*, J. M. Chambers and T. J. Hastie, Eds., Wadsworth & Brooks, 309–376.

Conkright, M., R. A. Locarnini, H. E. Garcia, T. O’Brien, T. Boyer, C. Stephens, and J. Antonov, 2002a: *World Ocean Atlas 2001: Objective Analysis, Data Statistics, and Figures*. NOAA/NESDIS/National Oceanographic Data Center Internal Rep. 17, CD-ROMs 1, 2.

Conkright, M., and Coauthors, 2002b: *World Ocean Database 2001*. NOAA/NESDIS/National Oceanographic Data Center Internal Rep. 16, CD-ROM, 1–8.

Cressie, N. A., 1991: *Statistics for Spatial Data*. Wiley & Sons, 900 pp.

Fox, D. N., C. N. Barron, M. R. Carnes, M. Booda, G. Peggion, and J. V. Gurley, 2002a: The modular ocean data assimilation system. *Oceanography*, **15**, 22–28.

Fox, D. N., W. J. Teague, C. N. Barron, M. R. Carnes, and C. M. Lee, 2002b: The Modular Ocean Data Assimilation System (MODAS). *J. Atmos. Oceanic Technol.*, **19**, 240–252.

Fujii, Y., and M. Kamachi, 2003: Three-dimensional analysis of temperature and salinity in the equatorial Pacific using a variational method with vertical coupled temperature-salinity empirical orthogonal function modes. *J. Geophys. Res.*, **108**, 3297, doi:10.1029/2002JC001745.

Gandin, L. S., 1965: *Objective Analysis of Meteorological Fields*. Israel Program for Scientific Translations, 242 pp.

Garzoli, S. L., 1993: Geostrophic velocity and transport variability in the Brazil-Malvinas confluence. *Deep-Sea Res.*, **40**, 1379–1403.

Garzoli, S. L., and A. L. Gordon, 1996: Origins and variability of the Benguela Current. *J. Geophys. Res.*, **101**, 897–906.

Garzoli, S. L., and M. O. Baringer, 2007: Meridional heat transport determined with expendable bathythermographs. Part II: South Atlantic transport. *Deep-Sea Res.*, **54**, 1402–1420.

Garzoli, S. L., G. J. Goñi, A. J. Mariano, and D. B. Olson, 1997: Monitoring the upper southeastern Atlantic transports using altimeter data. *J. Mar. Res.*, **55**, 453–481.

Goñi, G., S. Kamholz, S. Garzoli, and D. Olson, 1996: Dynamics of the Brazil-Malvinas confluence based on inverted echo sounders and altimetry. *J. Geophys. Res.*, **101**, 16273–16289.

Gordon, A. L., 1985: Indian-Atlantic transfer of thermocline water at the Agulhas Retroflection. *Science*, **227**, 1030–1033.

Haines, K., J. D. Blower, J.-P. Drecourt, C. Liu, A. Vidard, I. Astin, and X. Zhou, 2006: Salinity assimilation using *S*(*T*): Covariance relationships. *Mon. Wea. Rev.*, **134**, 759–771.

Hansen, D. V., and W. C. Thacker, 1999: On estimation of salinity profiles in the upper ocean. *J. Geophys. Res.*, **104**, 7921–7933.

Hofmann, E. E., 1985: The large-scale horizontal structure of the Antarctic Circumpolar Current from FGGE drifters. *J. Geophys. Res.*, **90**, 7087–7097.

Maes, C., and D. Behringer, 2000: Using satellite-derived sea level and temperature profiles for determining the salinity variability: A new approach. *J. Geophys. Res.*, **105**, 8537–8547.

Maes, C., D. Behringer, R. W. Reynolds, and M. Ji, 2000: Retrospective analysis of the salinity variability in the western tropical Pacific Ocean using an indirect minimization approach. *J. Atmos. Oceanic Technol.*, **17**, 512–524.

Peterson, R. G., and L. Stramma, 1991: Upper-level circulation in the South Atlantic Ocean. *Prog. Oceanogr.*, **26**, 1–73.

R Development Core Team, 2005: R: A language and environment for statistical computing. R Foundation for Statistical Computing Manual, Ref. Index version 2.4.0, Vienna, Austria, 2535 pp. [Available online at http://www.R-project.org.]

Reid, J. L., 1989: On the total geostrophic circulation of the South Atlantic Ocean: Flow patterns, tracers, and transports. *Prog. Oceanogr.*, **23**, 149–244.

Ridgway, K. R., J. R. Dunn, and J. L. Wilkin, 2002: Ocean interpolation by four-dimensional weighted least squares—Application to the waters around Australasia. *J. Atmos. Oceanic Technol.*, **19**, 1357–1375.

Schmid, C., G. Siedler, and W. Zenk, 2000: Dynamics of intermediate water circulation in the subtropical South Atlantic. *J. Phys. Oceanogr.*, **30**, 3191–3211.

Schmid, C., O. Boebel, W. Zenk, J. R. E. Lutjeharms, S. L. Garzoli, P. L. Richardson, and C. Barron, 2003: Early evolution of an Agulhas Ring. *Deep-Sea Res.*, **50**, 141–166.

Sparmocchia, S., N. Pinardi, and E. Demirov, 2003: Multivariate empirical orthogonal function analysis of the upper thermocline structure of the Mediterranean Sea from observations and model simulations. *Ann. Geophys.*, **21**, 167–187.

Stommel, H., 1947: Note on the use of the T-S correlation for dynamic height anomaly calculations. *J. Mar. Res.*, **6**, 85–92.

Stramma, L., and R. G. Peterson, 1990: The South Atlantic Current. *J. Phys. Oceanogr.*, **20**, 846–859.

Thacker, W. C., 2006: Estimating salinity to complement observed temperature: 1. Gulf of Mexico. *J. Mar. Syst.*, **65**, 224–248.

Thacker, W. C., and L. Sindlinger, 2006: Estimating salinity to complement observed temperature: 2. Northwestern Atlantic. *J. Mar. Syst.*, **65**, 249–267.

Troccoli, A., and K. Haines, 1999: Use of the temperature–salinity relation in a data assimilation context. *J. Atmos. Oceanic Technol.*, **16**, 2011–2025.

Venables, W. N., and B. D. Ripley, 2002: *Modern Applied Statistics with S*. 4th ed. Springer-Verlag, 495 pp.

Wong, A. P. S., G. C. Johnson, and W. B. Owens, 2003: Delayed-mode calibration of autonomous CTD profiling float salinity data by *θ*–*S* climatology. *J. Atmos. Oceanic Technol.*, **20**, 308–318.

(left) Number of data for each standard level. (right) Number of profiles contributing data for each year. Cyan, CTD; magenta, Argo.

Citation: Journal of Atmospheric and Oceanic Technology 25, 1; 10.1175/2007JTECHO530.1

Data at 25 dbar for sixteen 20° × 5° subregions. Cyan indicates CTD data; magenta, Argo. Dark green indicates contours of constant potential density.

Same as in Fig. 3, but for 100 dbar.

Same as in Fig. 3, but for 500 dbar.

Same as in Fig. 3, but for 800 dbar.

Color indicates the number of verification profiles in each 2° × 1° grid cell. Of the 774 cells, 269 have no verification profiles and 285 have more than 2. Not all profiles provide data at all levels, so the number of verification data varies slightly with level.

Distance from grid point to most remote training station indicated by color. Distance is measured in elliptical increments with one unit indicating 1° latitude and 2° longitude.

Residual standard error for each model fitted to local data at 25 dbar. Panel labels indicate model type.

Same as in Fig. 9, but at 100 dbar.

Same as in Fig. 9, but at 500 dbar.

Same as in Fig. 9, but at 800 dbar.

The 90th percentile of absolute value of verification errors by model type over sixteen 20° × 5° subregions. Cyan, model 1; blue, 2; red, 3; black, 4. Note that all curves at 25 dbar are off scale for both western subregions south of 35°S.

RMSE for five different methods for estimating salinity. Blue curve corresponds to type-2 regression models. Red (green) indicates estimates based on annual (monthly) climatology; circles indicate the climatological salinity and the curves indicate the results of using a TS relationship obtained by combining temperature and salinity climatologies.

Box-and-whisker plots of errors in geopotential height at 25 dbar relative to 1000 dbar associated with five approaches for estimating salinity. Central dot indicates median; box, interquartile range; dots beyond whiskers, outliers.

^{1}

Four times a year, voluntary observing ships traveling between Cape Town, South Africa, and Buenos Aires, Argentina, launch XBT probes every 10–50 km to measure temperature to depths of about 800 m. Preliminary results show that the meridional heat transport computed with salinity estimated from the type-2 models described below differs by about 0.03 PW (about 6%) from that computed with salinity estimated from climatological profiles.

^{2}

Their suggestion of the tricube function for distance weighting is adopted here.
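The tricube distance weighting can be sketched as follows. Each distance is scaled here by the distance to the most remote training station, so that station receives zero weight. This scaling is an assumption: it mirrors how loess scales distances by the distance to the q-th nearest point. `tricube` and `distance_weights` are illustrative names only.

```python
from typing import List, Optional, Sequence


def tricube(u: float) -> float:
    """Tricube kernel: (1 - |u|^3)^3 for |u| < 1, and 0 otherwise."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def distance_weights(
    distances: Sequence[float], radius: Optional[float] = None
) -> List[float]:
    """Tricube weights for training data at the given distances from a grid point.

    Assumption: distances are scaled by `radius`, which defaults to the distance
    of the most remote training station (that station then gets weight 0).
    """
    if radius is None:
        radius = max(distances, default=0.0)
    if radius <= 0.0:
        # All data sit at the grid point itself: weight them equally.
        return [1.0] * len(distances)
    return [tricube(d / radius) for d in distances]
```

The kernel is smooth and falls to zero at the edge of the neighborhood. Distant profiles therefore fade out gradually instead of being cut off abruptly.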

^{3}

The region is sequentially partitioned by either longitude or latitude, whichever has the greater spread of data stations, until a further partitioning would reduce the number of data below a specified threshold.
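Here is a rough sketch of the sequential partitioning. The footnote does not say where each cut is made or how spread is measured. This version therefore assumes a cut at the median station along the axis with the larger coordinate range. `partition` is a hypothetical helper, not the study's code.

```python
from typing import List, Tuple

Station = Tuple[float, float]  # (longitude, latitude)


def partition(stations: List[Station], threshold: int) -> List[List[Station]]:
    """Recursively split stations by longitude or latitude.

    At each step the axis with the greater range of station coordinates is cut
    at the median (an assumption). Splitting stops when either half would hold
    fewer than `threshold` stations.
    """
    if len(stations) < 2:
        return [stations]
    lons = [s[0] for s in stations]
    lats = [s[1] for s in stations]
    axis = 0 if (max(lons) - min(lons)) >= (max(lats) - min(lats)) else 1
    ordered = sorted(stations, key=lambda s: s[axis])
    half = len(ordered) // 2
    left, right = ordered[:half], ordered[half:]
    if len(left) < threshold or len(right) < threshold:
        return [stations]
    return partition(left, threshold) + partition(right, threshold)
```

With a threshold of 2, eight stations spread along a line end up in four parts of two stations each.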

^{4}

This functionality is provided by the loess package (Venables and Ripley 2002) for the R software (R Development Core Team 2005). Software written in C and FORTRAN is also available (Cleveland et al. 1992).

^{5}

CTD points are plotted on top of Argo points. Although this obscures some Argo points, it seemed better than obscuring the less numerous CTD points.

^{6}

At the time of this study, none of these Argo profiles had undergone the delayed-mode quality control to detect and to correct salinity drift (Wong et al. 2003).

^{7}

These data reflect the salinity minimum associated with Antarctic Intermediate Water throughout the South Atlantic. This minimum is generally found between 800 and 1000 dbar but is considerably shallower in the southwest and somewhat shallower in the northeast. The surface of the salinity minimum forms a trough running from southeast to northwest (Schmid et al. 2000).

^{8}

Because of the high variability of both salinity and temperature and their weak covariability near the sea surface, the modeling of salinity for pressures less than 25 dbar deserves a separate study that would examine the predictive utility of seasonality.

^{9}

Better results might have been obtained by using more profiles for fitting, but this was not explored.

^{10}

The R function lm was used for fitting the models to the data.
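A weighted type-2 fit like the one `lm` performs can be illustrated with a small least-squares solver. The regressors assumed here are an intercept, longitude, latitude, temperature, and temperature squared, which matches the five coefficients described for type-2 models. The helper names are hypothetical. In R this corresponds roughly to `lm(sal ~ lon + lat + temp + I(temp^2), weights = w)`, with hypothetical column names. `lm` uses a QR decomposition, whereas this sketch solves the normal equations for brevity, which is less numerically robust.

```python
from typing import List, Sequence


def type2_row(lon: float, lat: float, temp: float) -> List[float]:
    """Assumed type-2 regressors: intercept, lon, lat, T, T^2."""
    return [1.0, lon, lat, temp, temp * temp]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system: too few or degenerate data")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def fit_type2(lons, lats, temps, sals, weights) -> List[float]:
    """Coefficients minimizing sum_n w_n (S_n - S_hat_n)^2 via the normal equations."""
    p = 5
    xtwx = [[0.0] * p for _ in range(p)]
    xtwy = [0.0] * p
    for lon, lat, t, s, w in zip(lons, lats, temps, sals, weights):
        row = type2_row(lon, lat, t)
        for i in range(p):
            xtwy[i] += w * row[i] * s
            for j in range(p):
                xtwx[i][j] += w * row[i] * row[j]
    return solve_linear(xtwx, xtwy)


def predict_type2(coef: Sequence[float], lon: float, lat: float, temp: float) -> float:
    """Estimated salinity at (lon, lat) for an observed temperature `temp`."""
    return sum(c * x for c, x in zip(coef, type2_row(lon, lat, temp)))
```

In use, the weights would come from the distance weighting and the data from the neighborhood selected at one depth. The fitted model would then be evaluated at the grid point with the observed XBT temperature.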

^{11}

Residual standard error is

$$\sqrt{\frac{1}{N}\sum_n w_n \left(S_n - \hat{S}_n\right)^2},$$

where $N$ is the number of degrees of freedom (i.e., one less than the difference between the number of data determining the fit and the number of parameters determined by the fit). The sum is over all local training data, with weights $w_n$ reflecting their distance from the model grid point; the residual is the difference between the observed salinity $S_n$ and the estimated salinity $\hat{S}_n$. If $N$ were the number of data rather than the number of degrees of freedom, the expression for residual standard error would be that for the minimized root-weighted-mean-square residual.
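The two error measures described in this footnote can be computed as below. The degrees of freedom `dof` are passed in explicitly. Following the footnote, N would be one less than the number of data minus the number of fitted parameters. For comparison, R's `summary.lm` divides by n − p, where p counts the intercept among the coefficients. The function names are illustrative.

```python
import math
from typing import Sequence


def weighted_sum_sq(obs: Sequence[float], est: Sequence[float], weights: Sequence[float]) -> float:
    """Sum over local training data of w_n (S_n - S_hat_n)^2."""
    return sum(w * (s - sh) ** 2 for s, sh, w in zip(obs, est, weights))


def residual_standard_error(obs, est, weights, dof: int) -> float:
    """sqrt(N^-1 * sum_n w_n (S_n - S_hat_n)^2) with N = dof degrees of freedom."""
    if dof <= 0:
        raise ValueError("need more data than fitted parameters")
    return math.sqrt(weighted_sum_sq(obs, est, weights) / dof)


def rwms_residual(obs, est, weights) -> float:
    """Same expression with N replaced by the number of data."""
    return math.sqrt(weighted_sum_sq(obs, est, weights) / len(obs))
```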

^{12}

For the root-weighted-mean-square error this would definitely be the case. Because the residual standard error takes into account the number of parameters, ineffective additional parameters can actually increase this measure of error a bit. Such small differences are generally less than the contouring intervals and thus unlikely to be seen in these figures.
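A toy comparison of two nested weighted fits illustrates this point. Adding a regressor that carries little information lowers the root-weighted-mean-square residual slightly. It can, however, raise the residual standard error, because one degree of freedom is lost. The data are made up for illustration, and N is taken here as n − p, the R convention. For this example, the conclusion is unchanged if one more degree of freedom is subtracted.

```python
import math


def fit_const(y, w):
    """Model A: weighted mean only (1 coefficient); returns fitted values."""
    m = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    return [m] * len(y)


def fit_line(x, y, w):
    """Model B: intercept plus slope on x (2 coefficients), weighted least squares."""
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return [ym + b * (xi - xm) for xi in x]


def summarize(y, fitted, w, n_coef):
    """Return (root-weighted-mean-square residual, residual standard error)."""
    wss = sum(wi * (yi - fi) ** 2 for wi, yi, fi in zip(w, y, fitted))
    n = len(y)
    return math.sqrt(wss / n), math.sqrt(wss / (n - n_coef))


x = [1, 2, 3, 4, 5, 6]
y = [1, -1, 1, -1, 1, -1]  # x carries little information about y
w = [1.0] * 6

rwms_a, rse_a = summarize(y, fit_const(y, w), w, n_coef=1)
rwms_b, rse_b = summarize(y, fit_line(x, y, w), w, n_coef=2)
print(f"A: RWMS={rwms_a:.3f} RSE={rse_a:.3f}")  # A: RWMS=1.000 RSE=1.095
print(f"B: RWMS={rwms_b:.3f} RSE={rse_b:.3f}")  # B: RWMS=0.956 RSE=1.171
```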