## 1. Introduction

Comprehensive air–sea interaction and climate studies have been restricted in the past due to the lack of uniformly gridded upper-ocean temperature datasets with a temporal coverage of two to three decades. Unlike meteorological studies, oceanographic studies do not have the luxury of regular time series observations spanning decades. Most historical oceanographic data have been collected from ships, which not only include research vessels but also voluntary observing ships (VOSs). Consequently, due to variations in cruise times and shipping routes, oceanographic data collected from ships provide irregularly distributed information in space and time. Hence, a major consideration when analyzing historical ocean data in ocean–atmosphere interaction and other large-scale climate studies is the distribution of these data in space and time.

Given the recent increase in the number of ocean temperature profiles now available through present-day ocean observing systems—for example, through the intensive period of expendable bathythermograph (XBT) deployments during the Tropical Ocean Global Atmosphere (TOGA) program—generating uniformly gridded maps of ocean fields on a *global* scale at sufficient spatial (in three dimensions) and temporal resolution is becoming more computationally expensive, even for present-day computers. Nevertheless, the demands of present-day climate research are such that scientists require large- to global-scale gridded fields, with objective measures of the observational/mapping error fields being essential, in order to quantify estimates of climate variability and climate change. It is these data that will be used to validate and verify the predictions from climate change scenarios. The limiting factor in producing gridded maps, however, remains the large computational costs involved with the objective mapping procedures (e.g., Gandin 1963; Bretherton et al. 1976; Roemmich 1983).

Fukumori and Wunsch (1991) have more recently explored the use of empirical orthogonal function (EOF) analysis, by the singular value decomposition (SVD), to reduce the volume of data associated with profiles of temperature, salinity, oxygen, and nutrients obtained along a series of sections across the North Atlantic basin between 1981 and 1985. They showed that these data can be decomposed into a collection of EOFs, and that the number of significant modes required to represent the vertical (and horizontal) structure in the data may be dramatically reduced as a consequence of the inherent vertical and horizontal correlations within the ocean. A similar analysis was applied successfully by Bindoff and Wunsch (1992) to the South Pacific Ocean.

As indicated in their review, Fukumori and Wunsch’s (1991) application of EOFs to hydrography was, at the time, in itself not new [prior work, for example, includes the more locally/regionally focused study of Frankignoul (1981)]. Rather, Fukumori and Wunsch (1991) applied the methodology to the entire North Atlantic basin using both hydrographic and chemical observations. In the present paper, we demonstrate, by application to historical mechanical bathythermograph (MBT) and XBT observations collected in the southwest Pacific Ocean between 1955 and 1988, that the EOF methodology may be successfully extended to four dimensions through the inclusion of temporal variations in the ocean data, and that the objective mapping of only a few significant modes can produce an unbiased and statistically consistent gridded time history of the large-scale temperature changes with objective error estimates.

The combined technique described here adds to the previous literature primarily through its (i) improved estimate of the first-guess (a priori) mean field as a polynomial in two-dimensional space with annual and semiannual harmonics in time; and (ii) addition of time to produce four-dimensional gridded maps in polar coordinates, *T*(*λ, ϕ, z, t*), directly.

Other major strengths of the method are the (i) production of smooth, gridded maps in *both* space and time; (ii) identification and quantification of noise in the data; (iii) preservation of the vertical structure in the data; (iv) retention of only the most important information in the data and, as a consequence, (v) vastly improved efficiency in computing time.

This paper describes the data (section 2), the methods used to map the data (section 3), and the results from the technique applied to the MBT and XBT data, together with an analysis of the residuals and mapped fields (section 4). A discussion of the results and some concluding remarks are presented in section 5.

## 2. Data

The ocean temperature data used in this study are from the National Oceanographic Data Center (NODC), Washington, DC, CD-ROM (National Oceanographic Data Center 1991). Most of the available MBT and XBT open-ocean data, about 40 000 casts, were examined for the southwest Pacific Ocean region 0°–50°S, 140°E–180°, covering the period between 1955 and 1988. These data were selected from the total of 56 000 casts that were available. There is a disproportionately large number of observations along the East Australian shelf off the New South Wales coast, with 30% of the total dataset located between 30° and 37°S and between the East Australian coast and 154°E.

The application of the technique described in this paper focuses on the climate of the *large-scale* ocean temperature field, in particular, the broad-scale, upper-ocean temperature variability within the interior of the southwest Pacific. This paper is not concerned with the detailed description and mapping of the temperature variations in the boundary flow of the East Australian Current (EAC). This more complex region is dominated by the seasonal cycle of EAC eddies that pinch off each year (Mulhearn et al. 1986, 1988; Ridgway and Godfrey 1997), making even the time-averaged mean flow difficult to determine on the timescales and space scales of this study. Rather, the main objectives of the ocean climate study (Holbrook and Bindoff 1997, 1999) have been to concentrate on the large-scale features of the subtropical gyre. For the present application of this technique, and given our interests only in the large-scale field, the coastal data were excluded and are not analyzed.

The MBT data were provided at discrete and regular vertical intervals. In contrast, and for data storage efficiency, most of the XBT data were provided at “inflection point” depths. These depths were chosen such that the original set of temperature data could be reproduced by linear interpolation with an error of no more than ±0.10°C (D. Collins of NODC User Services, 1991, personal communication). However, the XBT data that were originally collected and stored at selected levels (or standard depths) were also provided at those selected levels by the NODC. Details and references regarding the accuracy of both instruments are contained in Holbrook (1994).

The temperature data were first linearly interpolated in the vertical onto a 5-m grid between the surface and the maximum depth of 450 m. Quality checks were made to identify any gross errors in the data, such as obvious positional errors and outlying values. Figure 1 shows the spatial positions and time distribution of all MBT and XBT casts retained in the data after quality control. Given the large number of observations and the limitations of available computer resources, it was necessary to carry out the data analysis on subsets of the original dataset. Consequently, the study region was divided into four subregions chosen with some reference to the local dynamics. Subregion 1 includes the western Coral Sea, Solomon Sea, and the Bismarck Sea [TOGA Coupled Ocean–Atmosphere Response Experiment (COARE) domain]; subregion 2 encompasses the broader tropical southwest Pacific basin; subregion 3 extends from the subtropics through to the midlatitudes, defines the western Tasman Sea, and is a region of high mesoscale variability in the vicinity of the EAC; and finally subregion 4, a less variable region, mainly represents the eastern Tasman Sea, which is broadly separated from the west by the Lord Howe Rise. These subregions overlap by 2° at each of the east–west and north–south boundaries in order to minimize sharp edges and discontinuities in the final mapped fields.

The MBT and XBT datasets complement each other well, providing an evenly distributed temporal coverage during the selected period. The main spatial variations in coverage across the study region have resulted mostly from changes in shipping routes and the establishment of the VOS network. Although the MBT data are more sparse than the XBTs in some regions, notably region 2 (cf. Figs. 1a,b), they are also the most comprehensive dataset in the region south of New Zealand due particularly to the intensive period of Southern Ocean cruises between 1956 and 1962. The work by, for example, Wyrtki (1975, 1985) during the 1970s and 1980s and, in general, the renewed interest in El Niño since the late 1960s, have resulted in a more focused program of XBT data collection in the tropical regions since that time. The more central latitudes within the region (15°–40°S) are well covered in space and time over the entire period. It should be noted that the MBT data are shallower and typically extend to 135-m depth, while the XBT data usually extend to 450-m depth or more in the open ocean.

Although the observational coverage is relatively uniform, the density of the data in time and space is sparse. As an annual average, the coverage translates to almost 100 casts per 2° latitude–longitude grid cell (about 25 casts per 1° box). As a monthly climatology (long-term averaged), this data density reduces to about eight casts per 2° cell (about 2 casts per 1° cell). However, at the seasonal scale (3-month increments) for *each* year over the full 34 years between 1955 and 1988, there is a little less than one observation per 2° spatial cell, with the higher density XBT observation period between 1973 and 1988, averaging just over one observation per 2° cell.
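The quoted densities follow from simple arithmetic; a quick back-of-envelope check is sketched below, using round numbers taken from this section (about 40 000 casts, a 50° × 40° study region, 34 years). The cell count is an illustrative assumption.

```python
# Back-of-envelope check of the quoted data densities. The round numbers
# (40,000 casts, a 50 deg x 40 deg region, 34 years) come from this section.
n_casts = 40_000
cells_2deg = (50 // 2) * (40 // 2)            # 500 cells of 2 deg x 2 deg

per_cell_pooled = n_casts / cells_2deg        # all years pooled: ~80 per cell
per_cell_monthly = per_cell_pooled / 12       # monthly climatology: ~7 per cell
per_cell_season = per_cell_pooled / (34 * 4)  # each season of each year: <1

print(per_cell_pooled, per_cell_monthly, per_cell_season)
```

These figures are consistent with the "almost 100", "about eight", and "a little less than one" casts per 2° cell quoted above.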

The data density greatly affects the temporal and spatial resolution that can be achieved from these data for this region. As a consequence of these limitations, we have been both realistic and pragmatic with our choices of space scales and timescales used for mapping the temperature field. These choices are based largely on a combination of the (i) observed anisotropy of upper ocean temperatures, with longer zonal scales than meridional scales apparent in the upper ocean (Fig. 2); (ii) consideration of the trade-offs in data resolution between the selection of large length scales for statistical reliability as opposed to the finer spatial and temporal scales; and (iii) spatial and temporal scales in the tropical–subtropical zone for the South Pacific reported in the literature (Meyers et al. 1991).

## 3. Treatment of the data

### a. Data products and analysis sequence

Two types of gridded datasets have been produced with the analysis technique described here. The first comprises climatological (long-term averaged) monthly temperatures [referred to hereafter as the *monthly* (reconstructed) *climatology*], and the second is a time series of the temperature field, at three-month intervals, for January, April, July, and October over two separate periods (referred to as the *time series*). The two periods for the time series are 1955–88 (down to 100-m depth) and 1973–88 (down to 450-m depth). The shallower vertical extent of the longer series is due to the shallower depth limitations of the MBT data and ensures internal consistency when merging these two types of data (Ridgway 1995; Holbrook and Bindoff 1997). The gridded monthly climatology and the two gridded time series are products of a case study application of the technique to upper-ocean temperature profiles in the southwest Pacific (Holbrook and Bindoff 2000).

There are three stages to the present analysis. Stage I involves determining a first-guess (a priori) mean field. As a first approximation, a polynomial expansion of the temperature field is used to estimate the a priori background monthly climatology for the region. It is essential that in data-sparse regions, the a priori estimate of the mean field is close to the expected mean so that the final fields are smooth and free of bull’s-eyes. Stage II improves our estimate of the monthly climatology through the EOF/objective mapping procedure of the data pregrouped into 12 monthly bins, where the first-guess mean used in the analysis at stage II is the polynomial mean determined in stage I. Finally, stage III calculates the two gridded time series using the same EOF/objective mapping procedure as in stage II. The first-guess mean used in stage III is the monthly (reconstructed) climatology determined in stage II. Thus, the mapping procedure is a two-tiered process, where the results of stage I are used in stage II, and likewise the results of stage II are used in stage III, to allow greater temporal resolution of the mapped time series.

In order to prepare the temperature data for the monthly climatology (stage II), the temperature profiles were first organized into 12 separate groups, each of length 365.25/12 days, approximately corresponding to the calendar months. For the time series (stage III), subsets of the original data comprising “overlapping” time periods were chosen, the period of overlap being typically between six and nine months. Each of these periods contained a limited number of data points (on average, ⩽1500 points) so that the data could be easily mapped, with the available computing resources, to create the three products described above and summarized in Table 1.

### b. Prior statistics

Before calculating the EOFs, it is important to obtain prior estimates of both the signal and the background noise in the temperature data. The signal is represented as the difference between the point observations of temperature and the a priori mean temperature at each of the observation points. The a priori noise in the data is defined here as the small-scale variability associated with the unresolved ocean processes, such as internal waves and mesoscale eddies, or errors associated with instrument type and/or reporting technique. The estimate of the a priori noise is the same for both stages II and III since it is an objective estimate of the noise.

#### 1) A priori noise

The variance of the data about its mean vertical structure contains both the signal variance and the noise variance and is not an appropriate measure for weighting the data used in the EOFs (as is commonly done). The noise variance at each depth should instead be estimated independently of the signal variance. In the present technique, it is found from the difference between each temperature value and those of close neighboring casts. Neighboring casts are defined here as those separated by less than 100 km in space and less than 1 day in time, the shorter scales that typically separate individual casts along a cruise track. The short timescale ensures that the selected casts represent those deployed from the same ship, or a separate ship close in time (and space). This scheme overcomes the problem of sampling across different seasons and ensures that the noise is represented by mesoscale processes, rather than the large-scale, longer period signals that we wish to resolve.

We have assumed that the a priori noise estimate is homogeneous over the period of the measurements within each subregion. On a year-by-year basis, the oceanographic noise within each region is difficult to estimate and is not reliable, given the data sparsity. The subdivision of the study region spatially, however, permits a simple estimate of the spatial variation in the noise statistics across the southwest Pacific.

Let *T*_{r} be the observed temperature for cast *r* at a selected depth, with signal, *s*_{r}, and noise, *n*_{r}. Therefore, *T*_{r} = *s*_{r} + *n*_{r}. The mean-square difference between this temperature value and that of a neighboring cast, *T*_{s}, at the same depth is

$$\langle (T_r - T_s)^2 \rangle = \langle (s_r - s_s)^2 \rangle + 2\langle n^2 \rangle, \quad (1)$$

since 〈*sn*〉 = 0 and 〈*n*_{r}*n*_{s}〉 = 0. Assuming the signal has a much greater correlation scale than the distance between neighboring casts, the first term in Eq. (1) is small compared with the second term and can be ignored. Thus, the a priori noise at depth *i* becomes

$$\sigma_i^2 = \frac{1}{2K}\sum_{k=1}^{K} (T_r - T_s)_k^2, \quad (2)$$

where (*T*_{r} − *T*_{s})_{k} is the *k*th of the *K* differences between pairs of distinct casts separated spatially by less than 100 km and temporally by less than 1 day.

Direct application of Eq. (2) to the data within the individual subregions is costly in computer time because of the time required to calculate distances between casts. Since the calculation above uses only neighbors within 100 km and 1 day of each other, the information from individual casts, say, 15° of latitude apart, is clearly redundant. Consequently, after some initial trials, it was found that the computing time could be dramatically reduced by successively making searches for neighboring casts within 3° latitude bands and intervals of 1 year in time, with little overall change to the results.
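As an illustrative sketch (not the authors' code), the neighbor-difference estimate of Eq. (2) at a single depth can be written as follows. The brute-force pair search, flat-earth distance approximation, synthetic data, and the function name `apriori_noise` are all assumptions for this example; the paper's 3°-band search optimization is omitted.

```python
import numpy as np

def apriori_noise(temp, lat, lon, t_days, max_km=100.0, max_days=1.0):
    """Estimate the a priori noise variance at one depth from temperature
    differences between neighboring casts (pairs closer than ~100 km and
    1 day), halving the mean-square difference since
    <(T_r - T_s)^2> ~= 2<n^2> for uncorrelated noise."""
    diffs = []
    n = len(temp)
    for r in range(n):
        for s in range(r + 1, n):
            if abs(t_days[r] - t_days[s]) > max_days:
                continue
            dy = (lat[r] - lat[s]) * 111.0  # km per degree of latitude
            dx = (lon[r] - lon[s]) * 111.0 * np.cos(
                np.radians(0.5 * (lat[r] + lat[s])))
            if np.hypot(dx, dy) <= max_km:
                diffs.append(temp[r] - temp[s])
    diffs = np.asarray(diffs)
    return 0.5 * np.mean(diffs**2)

# Synthetic demo: a smooth large-scale signal plus uncorrelated noise
rng = np.random.default_rng(0)
lat = rng.uniform(-30, -29, 200)
lon = rng.uniform(155, 156, 200)
t_days = rng.uniform(0, 0.5, 200)    # all casts within half a day
true_noise_sd = 0.3
temp = 20.0 + 0.1 * lat + true_noise_sd * rng.standard_normal(200)
sigma2 = apriori_noise(temp, lat, lon, t_days)
```

Because the large-scale signal varies only slightly over 100 km, `sigma2` recovers approximately the true noise variance (0.09 °C² here) rather than the signal variance.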

#### 2) A priori mean field

In stage I (in preparation for stage II), the a priori mean temperature field, *T*_{apriori} = *f*(*ϕ, z, t*), was modeled as a second-order polynomial in latitude (*ϕ*) and depth (*z*), with annual and semiannual harmonics in time. This a priori model of the seasonal temperature cycle is linear in the unknown coefficients to the polynomial and seasonal harmonics (with a total of 45 regression coefficients) and was found using least squares (e.g., Menke 1989). The complete expression for this polynomial is given in the appendix.

The polynomial model of the a priori mean field is not highly sophisticated. Nevertheless, this model broadly fits the large-scale vertical and meridional structure of the ocean together with the seasonality of upper-ocean temperature variations across the region. We have neglected zonal variations in the temperature field, as zonal variations are much weaker in the ocean interior of the subtropical gyre. Statistics associated with the a priori mean field are presented in section 4.
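A reduced sketch of this kind of fit is given below, assuming a smaller term set than the paper's 45 coefficients (the exact term list is in their appendix) and synthetic data; the design-matrix layout is an illustrative choice.

```python
import numpy as np

def design_matrix(phi, z, t_yr):
    """Columns of an a priori mean model: a second-order polynomial in
    latitude (phi) and depth (z), each term modulated by a mean level and
    annual and semiannual harmonics in time (6 x 5 = 30 columns here)."""
    poly = [np.ones_like(phi), phi, z, phi**2, z**2, phi * z]
    harmonics = [np.ones_like(t_yr),
                 np.cos(2 * np.pi * t_yr), np.sin(2 * np.pi * t_yr),  # annual
                 np.cos(4 * np.pi * t_yr), np.sin(4 * np.pi * t_yr)]  # semiannual
    return np.column_stack([p * h for p in poly for h in harmonics])

# The model is linear in its coefficients, so fit by least squares: G m = d
rng = np.random.default_rng(1)
phi = rng.uniform(-50, 0, 3000)     # latitude, degrees
z = rng.uniform(0, 450, 3000)       # depth, m
t_yr = rng.uniform(0, 34, 3000)     # time, years
temp = (28 + 0.3 * phi - 0.02 * z
        + 2 * np.cos(2 * np.pi * t_yr)
        + 0.2 * rng.standard_normal(3000))
G = design_matrix(phi, z, t_yr)
coef, *_ = np.linalg.lstsq(G, temp, rcond=None)
t_apriori = G @ coef
rms_resid = np.sqrt(np.mean((temp - t_apriori) ** 2))
```

With the synthetic truth lying inside the model space, the rms residual collapses to the noise level, illustrating how such a polynomial can absorb the large-scale deterministic field before the EOF step.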

For the time series products (stage III), the a priori mean field was the estimated monthly climatology (the product of stage II).

### c. Singular value decomposition

Let **A** be an *M* × *N* matrix for the data such that *M* represents the number of standard depths and *N* is the number of stations. Vertical positions are allocated every 5 m from the surface to 450-m depth, making a total of 91 standard depths. Following Fukumori and Wunsch (1991), the temperature data are normalized by subtracting an a priori mean field and dividing by the a priori noise. Hence, each element of the **A** matrix is

$$A_{ij} = \frac{T_{ij} - \langle T_{ij} \rangle}{\sigma_i}, \quad (3)$$

where *T*_{ij} is the temperature value at depth *i,* station *j;* 〈*T*_{ij}〉 is the a priori mean temperature at depth *i,* station *j;* and the normalization factor, *σ*_{i}, is the a priori error (noise) estimate at depth *i.*

The vertical EOFs are found from the eigenvectors of the covariance matrix **AA**^{T} for the data within each subregion, such that

$$\mathbf{A}\mathbf{A}^{\mathsf{T}} = \mathbf{U}\boldsymbol{\Lambda}^{2}\mathbf{U}^{\mathsf{T}}, \quad (4)$$

where the column vectors, **u**_{i}, of **U** are the vertical EOFs of **A**. Writing the singular value decomposition of **A** as

$$\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^{\mathsf{T}}, \quad (5)$$

and using the orthonormality of **U** (**U**^{−1} = **U**^{T}), yields

$$\mathbf{V} = \mathbf{A}^{\mathsf{T}}\mathbf{U}\boldsymbol{\Lambda}^{-1}, \quad (6)$$

where the diagonal elements of the *M* × *M* matrix, **Λ**, are the set of eigenvalues and **V** is an *N* × *M* matrix with each column vector, **v**_{i}, representing the amplitudes of the associated vertical eigenvector, **u**_{i}. Since each column of the data matrix, **A**, corresponds to a single cast position and time, each vector **v**_{i} represents the time and space variation of the associated vertical eigenvector **u**_{i}.

The rank, *p,* of the matrix **A** is at most the minimum of *M* and *N* and represents the total number of modes, that is, equal to the total number of eigenvectors or eigenvalues, arising from the SVD. Here, we are primarily interested in the number of significant modes, *k* ⩽ *p,* which are above the noise level of the data.

SVD is a numerically expensive procedure because it requires ∼*N*^{3} operations. However, since the rank of the covariance matrix is bounded by the minimum of *M* or *N,* then for data matrices with one dimension much smaller than the other, as in this problem, huge savings in computer time and memory can be achieved by using the vertical covariance matrix. These savings occur because there are only 91 vertical levels, compared with the 8000^{+} profiles contained in each subregion. As a consequence, the computational cost of determining the EOFs is trivial compared to the objective mapping step.
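The saving can be sketched numerically: assuming synthetic normalized data with a few dominant vertical modes, the EOFs are obtained from the small *M* × *M* covariance matrix (**AA**^{T} = **UΛ**²**U**^{T}) and the amplitudes from **V** = **A**^{T}**UΛ**^{−1}, in agreement with a direct SVD of **A**. The mode structures and sizes below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
M, N = 91, 4000                       # 91 standard depths, N >> M casts
z = np.linspace(0, 450, M)
# Synthetic normalized data: two dominant vertical modes plus unit noise
modes = np.column_stack([np.exp(-z / 100), np.cos(np.pi * z / 450)])
A = 5 * modes @ rng.standard_normal((2, N)) + rng.standard_normal((M, N))

# Eigendecomposition of the small M x M vertical covariance matrix,
# A A^T = U Lambda^2 U^T, instead of a direct (more costly) SVD of A
evals, U = np.linalg.eigh(A @ A.T)
order = np.argsort(evals)[::-1]       # modes in order of decreasing variance
evals, U = evals[order], U[:, order]
lam = np.sqrt(evals)                  # singular values
V = A.T @ U / lam                     # amplitudes: V = A^T U Lambda^{-1}

# Consistency check against the direct SVD of the full data matrix
U_svd, s_svd, _ = np.linalg.svd(A, full_matrices=False)
frac = evals[:2].sum() / evals.sum()  # variance explained by two modes
```

Only the 91 × 91 eigenproblem is solved, yet the singular values match the full SVD, and the two planted vertical modes dominate the variance.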

### d. Resolution

Both resolution and statistical reliability are important factors in the analysis of data. However, there is a trade-off between the two. Statistical reliability is increased when using large length scales at the expense of the finescale resolution. The EOF analysis provides information about the vertical resolution in the temperature data.

The data, **t**_{data}, can be reconstructed from a subset of the modes, *k* ⩽ *p,* and can be represented as the sum of modes, **t̂**_{k}, by the relation

$$\hat{\mathbf{t}}_k = \mathbf{U}_k\mathbf{U}_k^{\mathsf{T}}\,\mathbf{t}_{\mathrm{data}}, \quad (7)$$

where **U**_{k} contains only the first *k* column vectors of **U**. The matrix **U**_{k}**U**^{T}_{k} is the data resolution matrix; when all *p* modes are retained, **UU**^{T} reduces to the identity and the data are reproduced exactly.

### e. Objective mapping

The gridded fields are produced by objectively mapping the **v**_{i}’s onto the chosen regular horizontal grid. The vertical structure at each grid point for each mapped mode is defined by the corresponding vertical eigenvector, **u**_{i}. The objectively mapped **v**_{i} values are given by Bretherton et al. (1976) as

$$\hat{\mathbf{v}}_i = \mathbf{C}_{md}\mathbf{C}_{dd}^{-1}\mathbf{v}_i, \quad (8)$$

where **v**_{i} is the *i*th eigenvector of the horizontal coefficient matrix, **V**, for the *i*th mode, and **C**_{md} and **C**_{dd} are, respectively, the model–data and data–data covariance matrices.

Each element of the data–data covariance matrix for the *i*th mode, **C**_{idd}, is modeled as

$$(\mathbf{C}_{idd})_{jk} = s_i \exp\!\left(-\frac{r_{xjk}^2}{l_x^2} - \frac{r_{yjk}^2}{l_y^2} - \frac{r_{tjk}^2}{l_t^2}\right) + e_i\,\delta_{jk}, \quad (9)$$

where *r*_{xjk}, *r*_{yjk}, and *r*_{tjk} are the zonal, meridional, and temporal separations between data points *j* and *k;* *l*_{x}, *l*_{y}, and *l*_{t} are the corresponding covariance scales; *δ*_{jk} is the Kronecker delta; and *s*_{i} and *e*_{i} are estimates of the signal and noise variances for the *i*th mode. The signal variance *s*_{i} here is simply the covariance of **v**_{i}, while the noise variance *e*_{i} of **v**_{i} is obtained using the same procedure as for the a priori noise. The model–data covariance matrix **C**_{imd} takes the same form, but without the noise term,

$$(\mathbf{C}_{imd})_{jk} = s_i \exp\!\left(-\frac{r_{xjk}^2}{l_x^2} - \frac{r_{yjk}^2}{l_y^2} - \frac{r_{tjk}^2}{l_t^2}\right), \quad (10)$$

with *r*_{xjk}, *r*_{yjk}, and *r*_{tjk} now the separations between grid point *j* and data point *k.*

Computationally, the objective mapping procedure is a costly step in the process of producing the gridded climatology and time series. The procedure involves an inversion of the data–data covariance matrix. This is an *N*^{3} operation in time and can become unmanageable for *N* > 1500 points or so on most good workstations. A major advantage of the EOF analysis procedure combined with objective mapping is that it reduces the number of times that mapping is required, while still retaining the vertical correlations present in the data.
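A minimal sketch of this mapping step is given below, assuming Gaussian covariance functions, synthetic data with one spatial dimension kept for brevity, and illustrative scale and variance values (`lx`, `lt`, `s`, `e`); the function name `objective_map` is hypothetical.

```python
import numpy as np

def objective_map(xd, td, vd, xg, tg, lx, lt, s, e):
    """Map scattered mode amplitudes vd at data points (xd, td) onto grid
    points (xg, tg): v_hat = C_md C_dd^{-1} vd (Bretherton et al. 1976).
    Gaussian covariances with scales lx, lt and signal/noise variances
    s, e are assumed for this sketch."""
    def cov(x1, t1, x2, t2):
        dx = x1[:, None] - x2[None, :]
        dt = t1[:, None] - t2[None, :]
        return s * np.exp(-(dx / lx) ** 2 - (dt / lt) ** 2)

    C_dd = cov(xd, td, xd, td) + e * np.eye(len(xd))  # data-data (+ noise)
    C_md = cov(xg, tg, xd, td)                        # model-data
    # The N^3 cost lives in this solve of the data-data system
    return C_md @ np.linalg.solve(C_dd, vd)

rng = np.random.default_rng(4)
xd = rng.uniform(0, 10, 300)                 # data longitudes (degrees)
td = rng.uniform(0, 360, 300)                # data times (days)
vd = np.sin(xd / 2) + 0.1 * rng.standard_normal(300)  # amplitudes + noise
xg = np.full(25, 5.0)                        # map one longitude...
tg = np.linspace(0, 360, 25)                 # ...at 25 times through the year
v_hat = objective_map(xd, td, vd, xg, tg, lx=4.0, lt=90.0, s=1.0, e=0.01)
```

Because the EOF step collapses 91 correlated depth levels into a few modes, only a handful of such solves is needed per subregion rather than one per depth level.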

Objective mapping relies on a reasonable choice of scales based on both the underlying physics and the data distribution. As mentioned earlier, we have been both realistic and pragmatic with our choices of space scales and timescales for the ocean temperature field. The only comprehensive assessment of space scales and timescales within the region that the authors are aware of is by Meyers et al. (1991) in the tropical–subtropical South Pacific. From their study, they report median scales for the depth of the 20°C isotherm (a proxy for the depth of the thermocline, which approximates the dominant baroclinic mode structure for the subsurface ocean) across the tropical–subtropical Pacific Ocean as 3° in latitude (meridional length scale), 15° in longitude (zonal length scale) and two months in time. For sea surface temperatures, and hence mixed layer temperatures, these scales are larger.

In the present study, horizontal length scales of 4° in longitude and 2° in latitude and a timescale of 90 days were chosen. The larger scale in longitude was chosen due to the zonality of ocean temperatures, while the choice of grid was selected to be consistent with that of available surface forcing data such as the Comprehensive Ocean–Atmosphere Data Set (Woodruff et al. 1987). Given the results of the Meyers et al. (1991) study, we believe that our choices of scales for the interior of the southwest Pacific are reasonable and consistent with the available data.

### f. Reconstruction

The vectors **u**_{i} and **v**_{i} are nondimensional, while the singular value, *λ*_{i}, is a scaling factor, with *λ*^{2}_{i} representing the variance of **A** explained by mode *i.* The total variance of **A** is the sum over all *p* orthogonal modes (Lawson and Hanson 1974),

$$\|\mathbf{A}\|^2 = \lambda_1^2 + \lambda_2^2 + \cdots + \lambda_p^2, \quad (11)$$

where *p* is the rank. Selecting a smaller subset of modes, say, up to and including *λ*_{k}, then Eq. (5) can be written in the reduced matrix form

$$\mathbf{A}_k = \mathbf{U}_k\boldsymbol{\Lambda}_k\mathbf{V}_k^{\mathsf{T}}, \quad (12)$$

where the subscript *k* denotes that only the first *k* columns of the matrices are retained. An alternative representation of Eq. (12) is

$$\mathbf{A}_k = \sum_{m=1}^{k} \lambda_m \mathbf{u}_m \mathbf{v}_m^{\mathsf{T}}, \quad (13)$$

which reconstructs the data exactly when *k* = *p.* However, reconstruction of the normalized temperatures using only a small number of modes, *k,* that are able to explain a high proportion of the total variance in the data, is clearly more efficient.

The normalized a posteriori error, *σ*_{ijnorm}, of the mapped field is given by

$$\sigma_{ij\,\mathrm{norm}}^2 = \sum_{m=1}^{k} u_{im}^2 e_{jm}^2, \quad (14)$$

where *p* is the rank and *u*_{im} and *e*_{jm} are, respectively, the *m*th mode eigenvector and estimated error of the mapped coefficients at the *j*th grid point and at the *i*th depth [cf. Bretherton et al. (1976)]. Here, however, the a posteriori error [Eq. (14)] is generated using only the selected number of *k* modes, and hence this estimate of the error is incomplete since it does not include the error associated with the neglected (*p* − *k*) modes. The dimensional error, *σ̂*, is recovered by rescaling the normalized error, *σ*_{norm}, with the a priori noise, *σ*_{apriori}. The choice of the number of modes, *k,* will be described in the following section.
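The variance bookkeeping of this truncation can be illustrated with synthetic data (an assumption for the example): the squared Frobenius norm of **A** equals the sum of the squared singular values, and a rank-*k* reconstruction leaves exactly the neglected modes' variance as residual.

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic normalized data: one strong vertical structure plus unit noise
A = (3 * np.outer(np.linspace(1, 0, 91), rng.standard_normal(500))
     + rng.standard_normal((91, 500)))
U, lam, Vt = np.linalg.svd(A, full_matrices=False)

# Total variance is the sum of the squared singular values
total_var = np.linalg.norm(A) ** 2

# Rank-k reconstruction retains only the leading modes
k = 1
A_k = U[:, :k] @ np.diag(lam[:k]) @ Vt[:k, :]
explained = np.sum(lam[:k] ** 2) / total_var
residual_var = np.linalg.norm(A - A_k) ** 2   # variance in neglected modes
```

Here one mode captures most of the variance, so the truncated reconstruction is far cheaper to map while discarding little but noise.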

## 4. Results

This section describes the various statistics arising from the application of the method to the historical upper-ocean temperature observations in the southwest Pacific. The main criterion of success is that the residuals between the data and the monthly climatology or the time series are free of bias at all depth levels, and that the residuals are statistically consistent with the a priori estimate of the noise. Because this method involves multiple steps in the estimation of the final gridded field, there is no formal guarantee that the estimated temperature climatology or time series minimizes the residuals or is an unbiased estimate of the true field.

Other information gleaned from the analysis includes (i) estimates of the spatial variability of the signal variance in the data, calculated as the mean-square differences between the data and the polynomial mean, as well as between the temperature data and the estimated (mode reconstructed) temperature field, for 1° latitude (and longitude) bands; (ii) a simple estimate of the spatial variability of the noise, provided by a priori estimates for each subregion and at every 5 m in depth; (iii) a description of the vertical structure in the data for each of the four subregions, provided by the vertical modes **u**_{i}; and (iv) a description of the contribution to the horizontal and temporal variability in the data for each of the four subregions, provided by the expansion coefficients, **v**_{i}.

### a. Prior statistics

#### 1) A priori noise

The a priori noise is central to estimating the gridded fields. This noise is an objective estimate of the variability associated with the spatial sampling and unresolved oceanographic processes, and it is the benchmark against which the residual statistics from the climatology and time series products are compared as a measure of goodness of fit. Figure 3 shows the vertical distribution of the a priori noise [calculated from Eq. (2)] for the four subregions. The discontinuities evident in the a priori noise estimates for subregion 1 at 150- and 250-m depth reflect the typical depth limits of the MBT and XBT data collected in this subregion. Despite these discontinuities, the noise estimates display an interesting vertical structure. Noise is typically lower near the surface and increases with depth to a maximum between 200 and 350 m, before decreasing again at greater depths. Subregion 3 data have the largest noise, reflecting the high mesoscale variability in the vicinity of the EAC (Nilsson and Cresswell 1981; Mulhearn et al. 1986, 1988; Lilley et al. 1986; Tate et al. 1989).

The low noise near the surface in each subregion is a result of the homogeneity of the mixed layer caused by air–sea interaction. Between about 100- and 400-m depth, the vertical temperature gradient is greater and results in higher noise levels due to the vertical heaving of isotherms by mesoscale eddies and short Rossby waves. At depths greater than about 350 m, the noise estimates decrease within the main thermocline due to the weaker vertical temperature gradients.

#### 2) Polynomial mean field

Figure 4a shows the mean differences between the data and the polynomial mean estimates of the data as a function of 1° latitude bands for subregion 1. This estimate of the mean field is meant to represent the largest vertical and horizontal scales of the temperature field. The polynomial mean explains a high proportion of the underlying “deterministic” field, and the data are represented quite well at most latitudes and depths. Close to the equator, the magnitude of the mean differences at the surface and at 450-m depth increases to about 4°C. These differences are larger than the a priori error estimates and show that there is a significant temperature mismatch to be resolved by the EOFs and objective mapping. As will be discussed in section 4b, this difference near the equator is absorbed in the first-mode EOF, while the smaller-scale variations in space and time, not explained by the a priori mean, are resolved by the addition of successive higher-order modes.

Mean-square differences between the data and the polynomial mean at the data points show the distribution of the signal variance at each latitude. For subregion 3 (Fig. 4b), it is striking that the mean-square differences are less than 3°C^{2} for most latitudes, which is typically only three times the variance of the a priori noise. In the well-sampled region between Auckland and Sydney and between about 26° and 39°S (within subregion 3), there is a stronger signal at all depths. Despite the removal of all data west of 154°E between 30° and 37°S, the subsurface signal variance in excess of 3°C^{2} between about 26° and 39°S reflects the higher variability of the EAC.

Mean differences for subregion 4 were typically a little more than 1°C, while mean-square differences were, on average, no more than about 2°C^{2} (not shown). This reflects the smaller mesoscale temperature variability within this subregion compared with the outflow of the EAC in subregion 3.

### b. Vertical correlations

#### 1) Monthly climatology

The differences between the monthly pregrouped temperature data and the polynomial mean field were normalized by the a priori noise and the climatological monthly EOFs were calculated. The cumulative percentage of the variance associated with the singular value spectrum and arising from the stage II analysis of the subregion 1 data shows that five modes are sufficient to explain 93.2% of the total variance in the data (Fig. 5a). Similar results are also found for subregions 2, 3, and 4, where the first five modes, respectively, account for 94.3%, 96.4%, and 95.2% of the total variance. In each case, the first mode absorbs the largest spatial scale variability in the data. The cumulative percentage of the variance for the first five eigenvalues for each of the four subregions shows similar distributions to subregion 1 (see Table 2).

#### 2) Time series

The differences between the temperature data and the monthly climatology were also normalized by the a priori noise, and the time series EOFs were calculated. The cumulative percentage of the total variance explained by successive modes for subregion 1 shows a whiter eigenvalue spectrum (Fig. 5b) compared with the climatological monthly EOFs. A total of five modes were sufficient to explain about 86.1% of the total variance of the difference between the observations and the a priori monthly climatology. This is less than the 93.2% of the variance explained by the first five modes in the seasonal analysis of subregion 1 (cf. Fig. 5a). This difference in variance between stage II and stage III makes good sense. Much of the variance in stage II explains the differences between the data and the fitted polynomial mean, whereas for the time series (stage III), the smaller differences between the data and the estimated seasonal mean temperature field result in a flatter eigenvalue spectrum. Similar results are also apparent in comparisons of variance estimates from each of the other three subregions. Table 2 shows the first five eigenvalues, *λ*_{i}, from the SVD of the **A** matrix for each of the four subregions.

The individual vertical modes from the EOFs of the subregion 1 and subregion 3 data are, respectively, shown in Figs. 6a,b. Results for subregions 1 and 3 are discussed since they provide the most interesting features for both the low- and midlatitudes. The vertical modes also implicitly describe spatial variations and time changes in the vertical temperature field for each of these subregions. The first vertical mode EOF for subregion 1 resolves the temperature anomalies in the thermocline, while the second mode resolves the surface mixed layer (Fig. 6a). The first vertical mode EOF for subregion 3 is uniform with depth and is associated with a geographical bias in the a priori mean (Fig. 6b). In contrast, the second mode again resolves the surface mixed layer, while the third and fourth modes resolve the baroclinic structure of the thermocline. Overall, the five modes taken together successfully resolve the typical vertical scales associated with the surface mixed layer and thermocline.

### c. Resolution

The time series resolution matrix is calculated using Eq. (7) and shows how these EOFs filter the temperature observations. Since five vertical modes were sufficient to explain about 90% of the normalized variance in the data, it is clear that a high degree of vertical correlation exists for ocean temperature profiles.

Figure 7 gives a graphical representation of the data resolution matrix for the time series, but using only the first five vertical modes for subregion 1. The horizontal axes identify the depths of the 91 vertical elements for each of the five vertical modes reconstructed to form the data resolution matrix. The vertical scales and magnitudes of the resolution for each of the five vertical modes are indicated by the “depth” and “resolution” scales of each of the five major peaks along the diagonal in Fig. 7. The smallest vertical scales are observed in the upper layers of the water column to about 150-m depth, while successive depths incorporating the thermocline have larger vertical scales. The contributions with the strongest signal and the smallest vertical scales are found in the upper 50 m and upper 100 m, respectively. As depth increases, the vertical scales also increase. For example, the bandwidth of the main diagonal of the resolution matrix indicates vertical scales in excess of 75 m below 250-m depth. This progression of scale length with depth is qualitatively consistent with the vertical thermal structure and is a nice example of the EOFs being able to select the relevant vertical length scales.
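Equation (7) is not reproduced in this section; the sketch below uses the standard truncated-SVD resolution matrix, **R** = **V**_{k}**V**^{T}_{k} (e.g., Menke 1989), which is assumed to be the form intended. Applied to a profile, **R** shows how the retained vertical modes filter the observations; its trace equals the number of retained modes, and the bandwidth of its main diagonal at a given depth indicates the vertical scale resolved there.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic normalized anomaly matrix (profiles x 91 depths), dominated by a
# few smooth vertical structures whose scales broaden with depth.
n_profiles, n_depths, k = 500, 91, 5
depths = np.linspace(0.0, 450.0, n_depths)
modes = np.stack([np.exp(-(((depths - d0) / (40.0 + 0.3 * d0)) ** 2))
                  for d0 in (0.0, 50.0, 150.0, 250.0, 400.0)])
A = rng.standard_normal((n_profiles, 5)) @ modes \
    + 0.1 * rng.standard_normal((n_profiles, n_depths))

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Resolution matrix for a k-mode truncation, R = V_k V_k^T.  A profile
# filtered by the retained EOFs is T_filtered = R @ T.
Vk = Vt[:k].T                      # 91 x k matrix of vertical EOFs
R = Vk @ Vk.T                      # 91 x 91 resolution matrix

print("trace of R:", R.trace())    # equals the number of retained modes
```

Because the columns of **V**_{k} are orthonormal, **R** is a symmetric projection matrix (**RR** = **R**), so the resolution it describes is exactly the subspace spanned by the five retained vertical modes.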

### d. Residuals from mapping and reconstruction

A necessary check on any mapping procedure is to examine the residuals between the mapped fields and the original data. This is undertaken by examination of the differences between the reconstructed data, up to *k* selected modes, and the original data. Normalized temperature differences between the original data matrix **A** and the reconstruction **Â** = **U**_{k}**Λ**_{k}**V**^{T}_{k}, truncated at mode *k,* were computed after the objective mapping of the expansion coefficients, **v**_{i}.

#### 1) Monthly climatology

Figure 8a shows the root-mean-square (rms) normalized residuals between the original pregrouped monthly temperature data and the monthly (reconstructed) climatology up to mode 5 for all January data obtained in subregion 1. The solid vertical line at unity indicates the a priori noise at all depths. Hence, rms residuals less than one at each depth level indicate that the data are statistically consistent with the mapped data given our objective estimates of the a priori noise.

The first mode quickly reduces the residuals between 50- and 150-m depth, while the third mode operates on data between 150- and 400-m depth. Clearly, by mode 5, all of the January data for subregion 1 are explained to within the a priori noise except at the surface. In fact, the rms normalized residuals shown in Fig. 8a indicate that a 4-mode reconstruction is adequate to describe the original January data for subregion 1 to within the noise except at the surface and the base of the mixed layer between about 50- and 75-m depth. Overall, the results suggest that a 5-mode reconstruction is sufficient to explain the data at most depths to within the a priori noise in any month or subregion.

These vertical modes have preserved the vertical structure in the temperature data, thus avoiding the necessity for each level to be mapped separately. As a consequence of using the vertical correlations, the computing time was reduced by a factor of 36, since only five modes instead of the original 91 vertical levels needed to be objectively mapped.

#### 2) Time series

Figure 9a shows the mean residuals between the temperature data and the estimated (reconstructed) temperature time series, using five modes, as a function of 1° latitude bands for subregion 1, between January 1982 and August 1985. This example region and “overlapping” period (about 1100 data points) was chosen because it covers the period of the strong 1982–83 El Niño event. Results are presented for the sea surface, 150-, 300-, and 450-m depth. The mean residuals are clearly negligible compared with the a priori noise at most depths except 450 m, where a small mean residual of less than 0.5°C is apparent. This small bias is a result of only five modes being used in the data reconstruction, although the data are still explained to within the a priori noise at *all* depths.

The mean residuals (Fig. 9b) and mean-square residuals (Fig. 9c) as a function of 1° latitude bands for subregion 3 are also presented. Here, the time series between March 1980 and January 1985 is analyzed, which includes a similar number of data points. The peak signal variance between about 30° and 35°S in Fig. 4b is similarly observed in Fig. 9c, albeit of reduced magnitude due (at least partly) to the better estimate of the underlying large-scale monthly climatology, and reflects the high eddy variability that is mainly incoherent on the large spatial scales off the East Australian coast [e.g., Nilsson and Cresswell (1981) and Wilkin and Morrow (1994)].

Figure 8b shows the rms normalized residuals between the original data and the reconstructed temperature time series (for modes 1–5) for subregion 1 between January 1982 and August 1985. The residuals between 100- and 250-m depth are quickly reduced by the first mode to about the estimates of the a priori noise at those depths. It is apparent that the second mode affects the residuals near the surface, while mode 3 affects the residuals primarily below 250-m depth. Most of the data are explained to within the a priori noise by mode 4 in this subregion and during this period. Overall, the number of modes required to explain the data to within the a priori noise for *any* subregion or overlapping period varied between 2 and 5.

### e. Example of the mapped temperature time series

To provide an example of the gridded temperature time series data generated in this study, 3-monthly snapshots of temperature anomalies (departures from the monthly climatology) in the thermocline (the layer in the upper ocean where the vertical temperature gradient is largest) of the southwest Pacific Ocean are presented in Fig. 10 during the 1982–83 El Niño period. As a single event, the 1982–83 El Niño has been the subject of much interest and investigation, not only because of its extreme magnitude but also because of the atypical manner in which it developed (Philander 1990).

Figure 10 shows the interannual temperature changes (the seasonal cycle has been removed from the data) that occurred at 250-m depth during the months of January, April, July, and October in 1983. These temperature perturbations describe vertical excursions of the thermocline, and variability on these timescales is of interest to interannual heat budget and circulation studies. The most obvious feature of the time series of the anomaly fields during this period is the lack of signal during January 1983 followed by significant temperature changes north of about 10°S during April and July of 1983 when temperatures at 250-m depth fell by as much as 4°C. With respect to an average annual mean temperature profile calculated between 3° and 11°S, this 2°–4°C reduction in temperature at 250-m depth indicates that there was a shallowing of the base of the tropical thermocline by as much as 50–125 m. This shoaling of the thermocline is consistent with the progression of the 1982–83 El Niño event reported elsewhere (Meyers and Donguy 1984).
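The conversion from a temperature anomaly at fixed depth to a vertical excursion of the thermocline is simply Δ*z* ≈ Δ*T*/(∂*T*/∂*z*). The gradient value in the short sketch below is an assumed, illustrative figure for the base of the tropical thermocline, not a number taken from the paper:

```python
# Converting a temperature anomaly at fixed depth into a vertical excursion
# of the thermocline: dz ~ dT / (dT/dz).  The gradient is an assumed,
# illustrative value for the base of the tropical thermocline.
gradient = 0.035   # deg C per metre (assumed mean vertical gradient)

for dT in (2.0, 4.0):        # observed cooling at 250-m depth (deg C)
    dz = dT / gradient       # implied upward displacement of isotherms (m)
    print(f"{dT:.0f} deg C cooling -> ~{dz:.0f} m of shoaling")
```

A gradient of a few hundredths of a degree per metre maps the observed 2°–4°C cooling into the 50–125-m range of thermocline shoaling quoted above.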

Another feature of the temperature anomalies during April 1983 is that coincident with the strong cool anomalies observed at 250-m depth east of Papua New Guinea [in the Western Pacific warm pool (WPWP)], the Western Coral Sea was slightly warmer than normal. Here, temperature changes at 250-m depth in the Western Coral Sea are shown to be anticorrelated with the cool anomalies in the subsurface of the WPWP that are associated with the large-scale vertical movements of the thermocline across the tropical Pacific Ocean due to El Niño–Southern Oscillation (ENSO) dynamics. This feature is discussed further in Holbrook and Bindoff (1997).

Contours of the a posteriori (mapping) errors (°C) are also superimposed on Fig. 10. The a posteriori errors at 250-m depth (Fig. 10) describe the mapping error associated with the distribution in space and time of XBT casts deployed in the region that extended to a depth of 250 m or more. In the absence of observations, the objectively mapped time series relaxes to the a priori estimate of the mean, which in this case is the monthly climatological mean field. Also in the case of sparse observations, the error estimate increases to the a priori noise. For this example, the errors indicate that the data coverage was reasonably good in the Tropics (errors ∼0.4°–0.6°C) and subtropics (subtropical gyre region) (errors ∼0.2°–0.4°C). However, farther south in the midlatitudes, in particular south of Australia, the errors become much larger due to the absence of available observations during this period (errors >0.8°C).
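This relaxation behavior is a generic property of Gauss-Markov (objective) mapping and can be illustrated with a one-dimensional sketch. The covariance form, scales, noise level, and observation positions below are assumptions chosen for illustration, not the paper's actual choices:

```python
import numpy as np

# Gauss-Markov (objective) mapping of a scalar field, e.g. one EOF
# amplitude, onto a regular grid.  Covariance model and scales are
# illustrative only (Gaussian, 300-km e-folding).
L = 300.0          # assumed horizontal covariance scale (km)
sig2 = 1.0         # a priori signal variance
noise2 = 0.2       # a priori noise variance

x_obs = np.array([100.0, 150.0, 220.0, 900.0])   # sparse observation positions (km)
d = np.array([0.8, 1.1, 0.9, -0.4])              # anomalies (prior mean removed)
x_grid = np.linspace(0.0, 2000.0, 201)           # regular output grid (km)

def cov(xa, xb):
    """Assumed Gaussian signal covariance between two sets of positions."""
    return sig2 * np.exp(-(((xa[:, None] - xb[None, :]) / L) ** 2))

Cdd = cov(x_obs, x_obs) + noise2 * np.eye(len(x_obs))  # data-data covariance + noise
Cmd = cov(x_grid, x_obs)                               # grid-data covariance

estimate = Cmd @ np.linalg.solve(Cdd, d)               # mapped anomaly field
err_var = sig2 - np.sum(Cmd * np.linalg.solve(Cdd, Cmd.T).T, axis=1)

# Far from any observation the estimate relaxes to the prior mean (zero
# anomaly here) and the error variance rises to the a priori signal variance.
print("estimate at 2000 km: %.3f" % estimate[-1])
print("error variance at 2000 km: %.3f" % err_var[-1])
```

Near the cluster of observations the error variance drops well below the a priori noise, while at the far end of the grid, more than three covariance scales from any datum, both the anomaly estimate and the error recover their prior values, which is exactly the behavior of the mapped field south of Australia.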

## 5. Discussion and concluding remarks

The purpose of this paper has been to present an objective method for mapping irregularly sampled ocean temperature profiles, such that a full reconstruction of the four-dimensional temperature field (i.e., three spatial dimensions and time) can be made that is consistent with our knowledge of the spatial and temporal scales, and with a minimum of computational cost. As a case study, the irregularly spaced upper-ocean temperature profiles used in this study were collected from a mixture of merchant vessels and oceanographic cruises with individual projects in mind rather than long-term monitoring as the goal. In order to generate a useful uniformly gridded temperature time series from such data, it is important that an objective method be used because it provides an unbiased estimate of the entire gridded field together with estimates of the errors associated with the original data distribution through the mapping procedure.

Objective mapping (or objective analysis/interpolation), a least squares procedure, was chosen for this analysis. Unfortunately, for very large datasets, such as the dataset discussed in this paper, objective mapping becomes computationally very costly since it requires a matrix inversion of the large data–data covariance matrix, an *N*^{3} operation in time. Nevertheless, given the continuous vertical temperature profiles available from MBT and XBT deployments, optimal use of this vertical information is also desired. A feature of the present study is that it was able to take account of the inherent vertical correlations that physically exist within the ocean. This simple fact highlights the vertical redundancy in the data and is the key to the savings in computing time afforded by the objective mapping of only a few (five) EOFs, rather than the complete set of 91 vertical levels.

It was found that about 90% of the overall temperature variability in the vertical could, in most cases, be explained by the first five vertical modes. Furthermore, it was found that the original data could be reconstructed to within the a priori noise using only the first five vertical modes. Consequently, the gridded temperature fields at each of the 91 vertical levels could be generated using the objective mapping procedure only five times, providing dramatic computational savings. This combination of EOF analysis and objective mapping was used to produce uniformly gridded datasets of the upper-ocean seasonal mean (monthly climatology) temperature field and also the upper-ocean temperature time series between 1955 and 1988 to 100-m depth and between 1973 and 1988 to 450-m depth on a 3-monthly grid (January, April, July, and October). Each of these datasets was produced on a 2° × 2° spatial grid.

With the interest in better understanding climate variability and change, and the important role that the ocean plays in the coupled climate system, there is now perhaps more than ever before a need to be able to quantify the regional to large-scale changes in ocean and atmosphere temperatures and circulation, with estimates of uncertainty. On the large spatial scale and for long time histories, investigations of this kind often require the analysis of huge quantities of data. The technique described in this paper provides an objective, yet pragmatic approach to quantifying time changes, with errors, in these historical observations for very large datasets.

We are currently extending the technique described here to produce an optimal and sequential method, which allows updating of the vertical eigenvectors and a posteriori errors. It is expected that this method will be applied in near–real time as required by the Climate Variability and Predictability Programme and modern forecasting techniques.

## Acknowledgments

We are grateful for helpful comments provided by Ken Ridgway, CSIRO Marine Research in Hobart, and two anonymous reviewers. This project was supported in part by a National Greenhouse Advisory Committee (NGAC) grant and is a contribution to the World Ocean Circulation Experiment (WOCE).

## REFERENCES

Bindoff, N. L., and C. Wunsch, 1992: Comparison of synoptic and climatologically mapped sections in the South Pacific Ocean. *J. Climate,* **5,** 631–645.

Bretherton, F. P., R. E. Davis, and C. B. Fandry, 1976: A technique for objective analysis and design of oceanographic experiments applied to MODE-73. *Deep-Sea Res.,* **23,** 559–582.

Frankignoul, C., 1981: Low-frequency temperature fluctuations off Bermuda. *J. Geophys. Res.,* **86,** 6522–6528.

Fukumori, I., and C. Wunsch, 1991: Efficient representation of the North Atlantic hydrographic and chemical distributions. *Progress in Oceanography,* Vol. 27, Pergamon, 111–195.

Gandin, L. S., 1963: *Objective Analysis of Meteorological Fields* (English translation). Israel Program for Scientific Translations, 242 pp.

Holbrook, N. J., 1994: Temperature variability in the Southwest Pacific Ocean between 1955 and 1988. Ph.D. thesis, University of Sydney, 229 pp.

——, and N. L. Bindoff, 1997: Interannual and decadal temperature variability in the southwest Pacific Ocean between 1955 and 1988. *J. Climate,* **10,** 1035–1049.

——, and ——, 1999: Seasonal temperature variability in the upper southwest Pacific Ocean. *J. Phys. Oceanogr.,* **29,** 366–381.

——, and ——, 2000: A digital upper ocean temperature atlas for the southwest Pacific: 1955–1988. *Aust. Meteor. Mag.,* **49,** 37–49.

Lawson, C. L., and R. J. Hanson, 1974: *Solving Least Squares Problems.* Prentice-Hall, 340 pp.

Lilley, F. E. M., J. H. Filloux, N. L. Bindoff, I. J. Ferguson, and P. J. Mulhearn, 1986: Barotropic flow of a warm-core ring from seafloor electric measurements. *J. Geophys. Res.,* **91,** 12 979–13 109.

Menke, W., 1989: *Geophysical Data Analysis: Discrete Inverse Theory.* Academic Press, 289 pp.

Meyers, G., and J. R. Donguy, 1984: South Equatorial Current during the 1982–83 El Niño. *Tropical Ocean–Atmosphere Newsletter,* Vol. 27, 10–11.

——, H. Phillips, N. Smith, and J. Sprintall, 1991: Space and time scales for optimal interpolation of temperature—Tropical Pacific Ocean. *Progress in Oceanography,* Vol. 28, Pergamon, 189–218.

Mulhearn, P. J., J. H. Filloux, F. E. M. Lilley, N. L. Bindoff, and I. J. Ferguson, 1986: Abyssal currents during the formation and passage of a warm-core ring in the East Australian Current. *Deep-Sea Res.,* **33,** 1563–1576.

——, ——, ——, ——, and ——, 1988: Comparison between surface, barotropic and abyssal flows during the passage of a warm core ring. *Aust. J. Mar. Freshwater Res.,* **39,** 697–707.

National Oceanographic Data Center, 1991: CD-ROMs NODC-02 and NODC-03: Global ocean temperature and salinity profiles. NODC Informal Rep. 11, 14 pp. [Available from National Oceanographic Data Center, User Services Branch, NOAA/NESDIS E/OC21, 1825 Connecticut Ave. NW, Washington, DC 20235.]

Nilsson, C. S., and G. R. Cresswell, 1981: The formation and evolution of East Australian Current warm-core eddies. *Progress in Oceanography,* Vol. 9, Pergamon, 133–183.

Philander, S. G. H., 1990: *El Niño, La Niña and the Southern Oscillation.* Academic Press, 293 pp.

Ridgway, K. R., 1995: An application of a new depth correction formula to archived XBT data. *Deep-Sea Res.,* **42,** 1513–1519.

——, and J. S. Godfrey, 1997: Seasonal cycle of the East Australian Current. *J. Geophys. Res.,* **102,** 22 921–22 936.

Roemmich, D., 1983: Optimal estimation of hydrographic station data and derived fields. *J. Phys. Oceanogr.,* **13,** 1544–1549.

Tate, P. M., I. S. F. Jones, and B. V. Hamon, 1989: Time and space scales of surface temperatures in the Tasman Sea, from satellite data. *Deep-Sea Res.,* **36,** 419–430.

Wilkin, J. L., and R. A. Morrow, 1994: Eddy kinetic energy and momentum flux in the Southern Ocean: Comparison of a global eddy-resolving model with altimeter, drifter, and current-meter data. *J. Geophys. Res.,* **99,** 7903–7916.

Woodruff, S. D., R. J. Slutz, R. L. Jenne, and P. M. Steurer, 1987: A comprehensive ocean–atmosphere data set. *Bull. Amer. Meteor. Soc.,* **68,** 1239–1250.

Wyrtki, K., 1975: El Niño—The dynamic response of the equatorial Pacific Ocean to atmospheric forcing. *J. Phys. Oceanogr.,* **5,** 572–584.

——, 1985: Water displacements in the Pacific and the genesis of El Niño cycles. *J. Geophys. Res.,* **90,** 7129–7132.

## APPENDIX

### Polynomial Mean Field

The a priori seasonal mean temperature field, *T*_{apriori} = *f*(*ϕ*, *z*, *t*), is modeled as a second-order polynomial in latitude (*ϕ*) and depth (*z*), with annual and semiannual harmonics in time. This a priori model of the seasonal temperature cycle is linear in the unknown coefficients of the polynomial and seasonal harmonics (with a total of 45 regression coefficients) and was found using least squares (e.g., Menke 1989). In Eq. (A1), each of the nine spatial terms, *ϕ*^{i}*z*^{j} (*i*, *j* = 0, 1, 2), is multiplied by a mean term plus the four seasonal harmonics,

*X*_{1} = sin *ωt*, *X*_{2} = cos *ωt*, *X*_{3} = sin 2*ωt*, and *X*_{4} = cos 2*ωt*,

where *ω* = 2*π*/*T* and *T* = 365.25 days is the period, giving the 45 regression coefficients in total. It should be noted that in each case, the spatial variables, *ϕ* and *z*, were fit as centered differences within the region, that is, *ϕ* = *ϕ* − (−25°) and *z* = *z* − 225 m, in order to improve the conditioning of the fit. The regression coefficients, *a*_{i}, that were determined in the present study, by fitting the temperature profiles for the southwest Pacific region, are given in Table A1. Clearly, many of the higher-order cross-correlation terms are very small and make a negligible contribution to the overall polynomial fit of the southwest Pacific temperature data.

Table 1. The three mapped upper-ocean temperature products generated from the technique described in this paper.

Table 2. The first five eigenvalues from a singular value decomposition of the **A** matrix for each of the four subregions.

Table A1. The regression coefficients of the polynomial expression in Eq. (A1) for the first-guess seasonal temperature cycle (stage I of the analysis) based on a fit of the available temperature profiles (bathythermograph data) in the southwest Pacific Ocean between 1955 and 1988.