## Abstract

Systematic biases in historical expendable bathythermograph (XBT) data are examined using two datasets: 4151 XBT–CTD side-by-side pairs from 1967 to 2011 and 218 653 global-scale XBT–CTD pairs (within one month and 1°) extracted from the *World Ocean Database 2009* (*WOD09*) from 1966 to 2010. Using the side-by-side dataset, it was found that both the pure thermal bias and the XBT fall rate (from which the depth of observation is calculated) increase with water temperature. Correlations between the terminal velocity *A* and deceleration *B* terms of the fall-rate equation (FRE) and between *A* and the offset from the surface terms are obtained, with *A* as the dominant term in XBT fall-rate behavior. To quantify the time variation of the XBT fall-rate and pure temperature biases, global-scale XBT–CTD pairs are used. Based on the results from the two datasets, a new correction scheme for historical XBT data is proposed for nine independent probe-type groups. The scheme includes corrections for both temperature and depth records, which are all variable with calendar year, water temperature, and probe type. The results confirm those found in previous studies: a slowing in fall rate during the 1970s and 2000s and the large pure thermal biases during 1970–85. The performance of nine different correction schemes is compared. After the proposed corrections are applied to the XBT data in the *WOD09* dataset, global ocean heat content from 1967 to 2010 is reestimated.

## 1. Introduction

The ocean stores ~90% of the world’s anthropogenic global warming energy (Church et al. 2011), so detection of changes in ocean storage content is a key method of monitoring the global energy budget (Domingues et al. 2008; Levitus et al. 2012). Great efforts have been made to reconstruct the changes in ocean heat content (OHC) (Domingues et al. 2008; Ishii and Kimoto 2009; Levitus et al. 2009; Palmer and Haines 2009). All studies show a robust global ocean warming in the past 60 yr, but large differences exist between individual estimates (Lyman et al. 2010). Evidence indicates that many of the uncertainties originate from biases in the historical expendable bathythermograph (XBT) dataset, which dominates the past 50 yr of ocean in situ observations (Lyman et al. 2010).

Biases in XBT data have been identified since the early 1970s (Flierl and Robinson 1977). The sources of these biases originate from depth errors and pure temperature biases (Gouretski and Reseghetti 2010; Cowley et al. 2013). There are no pressure sensors in XBT probes; instead, time is converted to depth by an empirical fall-rate equation (FRE): *D* = *At* − *Bt*^{2}, where the values of coefficients *A* and *B* are specified by manufacturers and *t* is the elapsed time from when the probe hits the sea surface. The usually adopted form of FRE does not accurately describe the XBT fall rate (Cheng et al. 2011; Cowley et al. 2013; Hamon et al. 2012; Kizu and Hanawa 2002b), and there is evidence that the fall rate has changed over time (Cowley et al. 2013; DiNezio and Goni 2011; Good 2011; Gouretski 2012; Gouretski and Reseghetti 2010; Hamon et al. 2012; Ishii and Kimoto 2009; Levitus et al. 2009). Many authors calculated different fall-rate coefficients (although based only on a specific and small dataset of XBT profiles), and sometimes an additional offset term was included in the fall-rate equation.

Pure temperature biases are independent of fall rate and possibly result from thermistor errors or recording system biases (Gouretski and Reseghetti 2010; Cowley et al. 2013). The sensitivity of the XBT thermistor is ±0.01°C, but the overall accuracy of an XBT recording system is ±0.2°C (provided by the manufacturer). Experimental tests both in the ocean and laboratory have indicated that an XBT system (recorder plus cables plus XBT probe) usually measures temperatures warmer than reality, indicating a positive bias (Anderson 1980; Gouretski and Reseghetti 2010; Kizu and Hanawa 2002a; Reseghetti et al. 2007; Wright and Szabados 1989), but the causes of this bias are still unknown.

Many field tests have tried to determine the size of these biases by comparing XBT profiles with collocated conductivity–temperature–depth (CTD) profiles, which are listed in detail in Gouretski and Reseghetti (2010). The results show a high cruise-to-cruise and probe-to-probe variability, which complicates their application to the global XBT dataset. Different causes of biases in XBTs are reported in literature and can be broadly classified as follows:

mechanical (probe type, its weight and dimensions, shape, nose roughness, manufacturer, year) (Green 1984; Seaver and Kuleshov 1982)

external (launch height, atmospheric conditions, water conditions, maybe ship speed) (Gouretski and Reseghetti 2010; Hamon et al. 2012; Kizu et al. 2005; Reseghetti et al. 2007; Reverdin et al. 2009)

electrical (recording system, circuitry, ground connection, thermistor, wire)

Several authors suggest a decrease in fall rate in cooler waters because of increased viscosity (Gouretski and Reseghetti 2010; Kizu et al. 2008; Thadathil et al. 2002), but this effect has not been fully quantified, so it is not included in most of global correction schemes. Quantification of the change in fall rate with water temperature will be important for corrections in the Southern Hemisphere and high-latitude regions. Differences in fall rates between manufacturers are also important to quantify (Kizu et al. 2011).

In 1995, Hanawa et al. (1995, hereinafter H95) recalculated the fall rates for some XBT types based on 285 XBT–CTD pairs, and these corrections have been adopted as the fall rate by manufacturers and the Intergovernmental Oceanographic Commission (IOC) and the Integrated Global Ocean Services System (IGOSS) since then. Subsequently, Gouretski and Koltermann (2007) provided the first evidence of the time-varying feature of XBT biases, showing that H95’s correction is not applicable to all time periods. Adding to the confusion, since 1995 XBT metadata have contained a mix of manufacturer FRE and H95 FRE and in some cases, no indication at all of which FRE has been used.

Attempts to remove biases from the XBT dataset for use in climate studies have been made in the last 5 years. Pseudoreference datasets have been used in some cases, where each XBT profile and its reference profile are located within a specified spatial and temporal distance, usually a grid box in space and time (Gouretski and Reseghetti 2010; Hamon et al. 2012; Ishii and Kimoto 2009; Wijffels et al. 2008). These datasets are briefly summarized below:

Wijffels et al. (2008, hereinafter W08) propose a depth-independent but time-varying depth correction factor and separate probes into two groups: shallow and deep.

Ishii and Kimoto (2009, hereinafter IK09) also assume that all XBT biases are due to depth error, and they model depth error as a linear equation as function of elapsed time. Their corrections are also time varying.

Levitus et al. (2009, hereinafter L09) examine the total XBT bias, which is time and depth varying, but they do not separate the probe types.

Gouretski and Reseghetti (2010, hereinafter GR10) separate the depth error from the pure thermal bias and suggest both are time varying. Additionally, depth error is assumed to be depth varying.

Good (2011, hereinafter GD11) uses a different technique to estimate the depth error by comparing XBT depth with ocean depth measured by the General Bathymetric Chart of the Oceans (GEBCO). The results show XBT depth biases are time varying.

Gouretski (2012, hereinafter G12) extends the method proposed in GD11 and models a linear depth error with depth. A similar depth error history to previous studies is found and a time-varying pure thermal bias is presented.

Hamon et al. (2012, hereinafter H12) model the depth error using a linear equation with an offset and includes corrections for the pure thermal bias. Their corrections are separated into 0–200-m-averaged high and low temperature groups and deep and shallow groups, rather than by probe type/manufacturer.

Comparisons among these corrections show a robust time variation in the depth errors and the pure thermal biases. However, the uncertainties among the corrections are as large as natural intradecadal variation (Lyman et al. 2010).

Despite the extensive amount of work to define the biases in XBTs, how to obtain a reliable correction scheme for the historical XBT dataset is still debated because of a lack of understanding about why the biases exist in XBT data. In a recent study by Cowley et al. (2013, hereinafter CW13), a large number of side-by-side XBT–CTD comparisons (more than 4000 pairs) from the past 50 years were collected to provide a new and accurate insight into the biases. They report that the pure thermal bias is likely temperature and time varying but independent of depth, and that depth error is depth and time varying and possibly dependent on temperature. In CW13, an independent method (Cheng et al. 2011) is applied to side-by-side data to calculate fall rate coefficients and pure thermal bias, implying that they are all variable with time (hereafter CW13-CH). However, since the XBT–CTD pairs used in CW13 were mostly collected by researchers under ideal conditions, are the results of these ~4000 data representative of the ~2 200 000 XBT profiles globally?

In this study, we try to answer this question by examining the XBT bias in two XBT–CTD comparison datasets: the side-by-side XBT–CTD pairs (from CW13) and a global-scale XBT–CTD dataset. We introduce the two datasets and the method in section 2. The results derived from the two datasets are presented in sections 3 and 4. In section 5, the correction scheme for historical XBT data is proposed. In section 6, the performances of our corrections are tested and compared with other independent corrections. In section 7, global ocean heat content is reestimated after our corrections to the historical XBT data in the *World Ocean Database 2009* (*WOD09*) dataset. The conclusions are presented in section 8.

## 2. Data and methods

### a. Introduction to XBT probe types

The current official XBT manufacturers are Sippican Corporation (now Lockheed Martin Corporation, hereinafter Sippican) in the United States and TSK (Tsurumi Seiki, hereafter TSK) in Japan under a Sippican license. In the 1990s, a small amount of XBT probes was independently manufactured by Sparton of Canada, using their own models but they were very similar to Sippican XBTs. Detailed XBT–CTD comparison studies indicate that the two manufacturers’ XBTs (Sippican and TSK) have slightly different designs, which lead to different fall rates (CW13; Kizu et al. 2011). Studies based on global-scale data similarly show a difference in the depth error and the pure thermal bias between the Sippican and TSK types (Ishii and Kimoto 2009). Furthermore, differences in bias history exist among probe types from the same manufacturer.

The XBT is a relatively easy-to-use, robust, and accurate sensor that measures the temperature of a water column, designed to be dropped from a moving ship. Unfortunately, it has no pressure sensor, so its depth is estimated by a fall-rate equation provided by Sippican, the inventor of the XBT, with experimentally estimated fall-rate coefficients valid in the World Ocean. There are different XBT types, defined by their nominal maximum depth (200 m, T10; 460 m, T4–T6; 760 m, T7–DB; and 1830 m, T5). Three other types (1000 m, FD; 460 m, T11; and 2000 m, T12) were manufactured, but their presence in the historical archives are negligible. Table 1 listed the main characteristics of the different XBT types (maximum depth, maximum allowed ship speed, maximum acquisition time, and fall-rate equation coefficients). Sometimes, there are two versions of XBT probe types for the same depth rating, one for low speed and one for faster ships. Sippican stated that T4–T6–T7–DB have the same characteristics in water, so that they have the same coefficients for the fall-rate equation. T10, FD, and T5 have different coefficients. Operational experience shows that an XBT is able to reach up to about 20% deeper than the nominal maximum depths: ~250 m for T10, ~550 m for T4–T6, ~900 m for T7–DB, ~1200 m for FD, and ~2000 m for T5 without any evidence of errors. On the other hand, wire breaks, insulation penetrations, and leakage defects often shorten the maximum depth of the profile. The thermistor samples at ~0.1s, nearly coincident with the thermistor time constant. Since the early 1970s, it has also been verified in field tests comparing XBT records with more accurate and expensive instruments (usually CTD) that the uncertainties in XBT data were beyond the values stated by the manufacturer (±0.2°C in temperature and 2% or 5 m in depth, whichever is greater), whereas the stated instrumental resolution is 0.01°C in temperature and about 0.65 m in depth.

From a technical point of view, XBT–CTD comparison tests do not represent typical operational launching conditions. During an XBT–CTD test, the ship is required to be motionless, so wake and air turbulence due to hull motion and water turbulence in the near surface are absent. Most XBT–CTD pairs in the side-by-side dataset were performed from a stationary ship, and the height of the launch platform and test calibration details are frequently not included in reports/papers describing the comparison. These details could provide more insight into the reason for the offset term.

### b. Side-by-side XBT–CTD pairs

In the side-by-side XBT–CTD pairs dataset, each XBT profile is collocated in space and time with a CTD profile. This data source includes historical XBT–CTD experiments and data from the *WOD09*. The dataset analyzed in this paper is the same as that used in CW13. The dataset contains 4115 pairs, including 2096 high-resolution pairs (with a depth interval of less than 1 m for XBT, and less than 2 m for CTD), and 2019 low-resolution pairs (>1-m depth intervals in XBT and >2-m depth intervals in CTD or bottle). This side-by-side dataset is available via the Commonwealth Scientific and Industrial Research Organisation (CSIRO) Data Access Portal (doi:10.4225/08/52AE99A4663B1).

### c. Global-scale XBT–CTD pairs

Global-scale XBT–CTD pairs were sourced from *WOD09* (Boyer et al. 2009) using the following criteria:

The locations of the XBT and reference CTD profiles are within 1°.

The times of the XBT and CTD profiles are within 30 days.

There are 218 653 individual XBT profiles. Each XBT profile is used only once and a reference CTD profile is constructed from the CTD/bottle data within the distance/time range specified above. The method used to obtain the reference CTD profile is:

For each XBT profile,

*n*CTD profiles are located within the distance/time range.The

*n*CTD profiles are interpolated to standard depths: from 1 to 800 m with a 5-m interval. These CTD profiles are denoted as CT_{1}, CT_{2}, … . , CT_{n}. The distances between each CTD profile and the XBT profile are calculated and denoted as*D*_{1},*D*_{2}, … . ,*D*_{n}*.*

In the global-scale pairs dataset, pre-1990, shallow XBTs (T4–T6–T10) make up nearly half of the total dataset (Fig. 1; Table 2). T7s (deep XBT) make up the other half. XBTs with unknown type dominate during 1983–95. Deep XBTs (T5–T7–DB–FD) are the main part of the dataset after 1992. TSK XBTs become more common in the dataset after 1988; the amount of TSK XBTs is consistent in time from 1995 to 2010.

According to the manufacturers, the T4 and T6 probes are structurally the same (the same applies to T7 and DB), the only difference being the length of the wire on the canister spool. There is no significant difference in the fall-rate error or the pure thermal bias between the T7 and DB or the T4 and T6 probe types (in CW13). However, there is a statistically significant difference between T7–DB and T4–T6 (CW13), the TSK and Sippican probes, T10, and other probe types. The unknown type makes up the majority of the dataset and will have a different behavior because of the mix of probe types it contains. Keeping these differences in mind, XBT profiles are collected into nine groups: G1 includes Sippican T7–DB; G2 contains unknown-type profiles with a maximum depth deeper than 550 m post-1985 (540 m pre-1985), denoted as DX; G3 includes Sippican T4–T6; G4 includes all of the unknown-type profiles with a maximum depth shallower than 540 m pre-1985 (550 m post-1985), denoted as SX; G5 includes Sippican T10; G6 includes Sippican T5, since the terminal depth is as deep as 1800 m, which is distinct from the other probes; G7 includes the T4–T6 probes manufactured by TSK; G8 includes the TSK T5; and G9 includes the TSK T7–DB.

### d. Method to calculate fall-rate coefficients and pure thermal bias

The method proposed in Cheng et al. (2011, hereinafter CH11) is used in this paper to calculate the fall-rate coefficients from the side-by-side dataset and the global-scale dataset. The pure thermal bias is determined after the calculated fall-rate coefficients are applied. The method is also used in CW13 as an independent method to estimate the fall-rate coefficients and the pure thermal biases in the side-by-side dataset. We summarize the main steps of this method:

The fall-rate equation suggested by the manufacturers is

*D*=*At*−*Bt*^{2}, which is revised to*D*=*At*−*Bt*^{2 }*−*offset, where*A*and*B*are the terminal fall rate and deceleration term, respectively; offset “is a correction describing the up-to-now unpredictable phenomena occurring mainly in the near-surface layer, at the start up, with a deviation from the manufacturer’s equation”(CH11, p. 247).The individual fall-rate equation is obtained by minimizing the standard deviation of the temperature differences between the XBT and CTD profiles. This process helps to isolate the depth error from the pure thermal bias—that is because the standard deviation of the temperature difference is zero when a XBT temperature curve is parallel to the reference CTD profile.

The residuals after correcting an XBT profile by using the individual fall-rate equation are regarded as the pure thermal bias.

The final

*A*,*B*, and offset coefficients for the whole XBT dataset are obtained by calculating the weighted median of these coefficients from each individual XBT–CTD comparison, and the weights are the depths of each individual XBT profile. This weighting gives more importance to deep profiles.

## 3. Results from the side-by-side dataset

### a. Temperature’s influence on pure thermal bias

CW13 found a small but increasing trend of the pure thermal bias with column-averaged temperature by using side-by-side data. Though the relationship between the water temperature and the pure thermal bias is robust and is suggested by several independent studies (Gouretski and Reseghetti 2010; Reseghetti et al. 2007; Reverdin et al. 2009), CW13 finds that the temperature-varying thermal error plays a minor role in determining the time variation of the biases. Similar to CW13, we compared the Δ*T* at each 1-m interpolated depth with the CTD temperature at that depth, high-resolution T7–DB data within 1993–2005, and all of the T4–T6 data within 1982–93 are used according to Fig. 14 in CW13. The relationship (the function of “Robustfit” in MATLAB is used to fit a linear model) is described as

where Temp_{column} is the column-averaged temperature over all depths and the 95% confidence intervals are in parentheses. The trend for T7–DB is more robust than that for T4–T6, which suggests near-90% uncertainties. The statistical *p* value of the trend is 0.11 × 10^{−10} for T4–T6 and 0.21 × 10^{−11} for T7–DB, resulting in a rejection of null hypothesis (that there is no significant trend between the pure thermal bias and temperature) at the 99% confidence level. The results are in general agreement with the results from XBT calibrations in a temperature bath, quoted in Reseghetti et al. (2007), Reverdin et al. (2009), GR10, and CW13. This temperature dependence had been ascribed to both temperature-resistance characteristics of the XBT thermistor and the wire length (Reseghetti et al. 2007; Seaver and Kuleshov 1982).

### b. Temperature’s influence on depth error

Temperature’s influence on the XBT fall rate has been documented in several papers (Gouretski and Reseghetti 2010; Kizu et al. 2005; Thadathil et al. 2002), suggesting an increasing fall rate with water temperature due to decreasing water viscosity. However, an explicit relationship has not been described with the exclusion of Kizu et al. (2008) even for expendable CTD (XCTD) probes. In this paper, we try to quantify the temperature’s influences on the terminal value of the fall rate, *A*, which is derived by fitting coefficient *A* over the 0–100-m integrated water temperature. The 0–100-m depths are chosen instead of terminal depth or sea surface temperature because (i) the terminal depths of XBTs in the side-by-side pairs are not exactly the same, varying from 100 to 1000 m, depending on probe type and maximum depth attained. Using column-averaged temperature may give results dominated by the variation of the terminal depth. (ii) The terminal velocity is most likely affected by the upper-ocean (i.e., 100 m) temperature rather than the deeper ocean temperature. (iii) Terminal velocity (described by coefficient *A*) is reached within 0–100 m.

The relationship between temperature and the averaged coefficient *A* calculated over 0–100 m for the whole side-by-side dataset (Fig. 2a) shows a slightly increasing trend for both T7–DB (2220 high-resolution pairs used) and T4–T6 (684 high-resolution pairs) with a slope of 0.0038 (±0.0010) and 0.0072 (±0.0025) m s^{−1} (°C)^{−1}, respectively, but the high-resolution data (Fig. 2b) have slightly different slopes: 0.0027 (±0.0010) m s^{−1} (°C)^{−1} for T7–DB (1569 high-resolution pairs used) and 0.0080 (±0.0061) m s^{−1} (°C)^{−1} for T4–T6 (only 89 high-resolution pairs, unfortunately). The difference is so evident probably because we have removed data from the early period: most of the early period has low-resolution data and the remaining high-resolution data show a slope more representative of recent years.

Since the coefficient *A* has shown temporal variability, to fully separate the time variation from changes in *A* with temperature, we subtract the yearly averaged *A* (in Tables 3–5 in CW13) from each individual *A*. The relationship of the residual (*dA*) and temperature is presented in Fig. 2c, showing an increasing trend with temperature with slopes of 0.0025 (±0.0010) and 0.0050 (±0.0027) m s^{−1} (°C)^{−1} for T7––DB and T4–T6, respectively, ~30% smaller than that in Fig. 2a. In this scenario, we assume that the time variation has been totally removed.

Using data within a period of almost constant fall rate and containing a large amount of data from high latitudes (CW13) (1985–2000 for T7–DB, 1985–92 for T4–T6) (Fig. 2d), coefficient *A* increases with temperature, with a slope of 0.0051 (±0.0020) m s^{−1} (°C)^{−1} for T7–DB, ~20% more than that in Fig. 2a; and 0.0045 (±0.0041) m s^{−1} (°C)^{−1} for T4–T6, similar to Fig. 2a. A smaller slope [0.0037 ± 0.0036 m s^{−1} (°C)^{−1}] is found when 1966–84 data are used for T4–T6.

Based on the four tests shown above, the fall-rate coefficient *A* increases with temperature from 0.0025 to 0.0051 m s^{−1} (°C)^{−1} for T7–DB and from 0.005 to 0.008 m s^{−1} (°C)^{−1} for T4–T6 when different subsets of the side-by-side dataset are used.

In this paper, we use the relationship in Fig. 2c, where we remove the time variation in *A* as a first guess of the *A–*temperature relationship. In short, the obtained relationship for T4–T6 and T7–DB is, with 95% confidence interval, as follows:

The statistical *p* value of the trend is 0.246 × 10^{−3} for T4–T6 and 0.58 × 10^{−5} for T7–DB, resulting in a rejection of the null hypothesis (there is no significant trend between *dA*_{temp} and Temp_{100}), suggesting the two trends are robust. The different temperature dependence of the T4–T6 and T7–DB probes are possibly due to the small sample size and the low resolution of the T4–T6 sample. Physically, the two types of probes are identical when submerged in water, but they have different weight distributions: for T7–DB there is more wire in the upper part of the probe and less weight for the zinc nose.

Since a relationship between fall rate and water temperature is apparent, the fall rate may be variable with latitude as well. Despite some large uncertainties at some latitudes, the terminal fall rate appears to be lower at high latitudes and higher in the subtropics in both hemispheres (Fig. 3a). However, in the tropics (from −10° to +10°), the terminal fall rate seems to be constant, approximating the H95 FRE for both T7–DB and T4–T6. From 60° to 10°S and from 60° to 10°N, coefficient *A* increases from high to low latitudes for both T7–DB and T4–T6.

In Fig. 3b, the temporal variability of *A* is removed. When we compare the four zonal slopes of *dA* after removing the time variation with the slope of *A* (in Fig. 3a), the slope of the fall rate is reduced by 4%–40%. The largest reduction exists for T7–DB data in the Southern Hemisphere, where the slope is reduced from 0.0019 (±0.0007) to 0.0011 (±0.0007)°C (°C)^{−1}, around 40%. This test indicates that the time variation feature of *A* may only partly reduce the zonal variation, since the removal of the time variation only partly reduces the zonal variation of the total bias.

### c. Correlations between fall-rate coefficients

In this section we examine the possible correlations among the three coefficients in our proposed fall-rate equation. We assume *A* to be the main term and investigate representing the *B* and offset terms as a function of *A* and investigate whether they are linearly dependent on the speed *A*. Using this approach, the speed *A* is supposed to be able to allow us to estimate both *B*, representing the variation of the probe mass and ambient conditions, and offset, linked to the strength of the initial transient time and the deployment conditions.

The correlations between coefficients *B* and *A* are obtained by fitting the data using Robustfit in MATLAB for T4°C T6, T7°C DB, and TSK (all types from the side-by-side dataset) separately. We looked for a linear relationship between *A* and *B* (such as *B* = *a*_{1}*A* + *a*_{2}) and we found (Fig. 4a) the following:

where the 95% confidence interval is given in parentheses. The statistical *p* values for the three trends are 0.274 × 10^{−7}, 0.386 × 10^{−11}, 0.47 × 10^{−2} for T4–T6, T7–DB, and TSK, respectively, indicating the null hypothesis (there is no significant trend) is rejected. The results for T7–DB have a smaller range of uncertainties (smaller *p* value) than for T4–T6 because of a larger number of T7–DB profiles. The correlations for TSK are completely different from the Sippican probes, and all the error bars overlap, indicating that the statistical difference of the results for TSK is less robust.

Then, we looked for a linear correlation (offset = *b*_{1}*A* + *b*_{2}) for the T7–DB, T4–T6, and TSK probe types. The results are (with the 95% confidence interval) (Fig. 4b):

The statistical *p* values for the three trends are 0.345 × 10^{−7}, 0.380 × 10^{−11}, 0.0101 for T4–T6, T7–DB and TSK respectively, indicating the rejection of the null hypothesis test. The correlation for TSK is different from T4–T6 and T7–DB.

Why are the coefficients correlated? The three coefficients are physically independent of each other, because they are used to independently model the XBT falling motion. However, the results show they are significantly correlated with each other. We discuss the possible reasons for these correlations in appendix A and the impact of the offset term on the *A*–*B* correlation in appendix B. A recent study based on a numerical simulation of the XBT probe entry indicates that the velocity term is probably dominant, and it seems to influence both the deceleration and offset terms (Abraham et al. 2014). By using the correlations, we make an assumption about the XBT bias: the correlations are an intrinsic feature of the fall-rate equation. The correlations shown above are used as the first guess to apply to the global-scale dataset.

### d. Test of the correlations using the side-by-side dataset

To test whether the XBT FRE is dominated by term *A* and if a new version of FRE in terms of *A* is possible, we first check the side-by-side data using the specific *A* coefficients and then substitute *A* into the *B* and offset terms in Eqs. (5)–(7) and (8)–(10), respectively. After this step, the residual thermal bias is compared with the total thermal bias and the pure thermal bias (calculated as described in CW13) in Fig. 5. Biases for both the T4–T6 and T7–DB probe types are reduced to an amount comparable to the pure thermal bias, suggesting that the correlation works well. The correlation method is not as effective during the years around 1975 for T4–T6 when the correlation method reduces the thermal bias by about half compared to the pure thermal bias. Since the number of pre-1980 pairs in the side-by-side dataset is small, large uncertainties are expected. These results reasonably suggest that *A* is the dominant coefficient in determining XBT depth. Therefore, we apply these correlations to the global-scale analyses, trying to correct depth in a similar way.

Because of the small size of the side-by-side dataset, the spatial and temporal distribution is limited and the errors obtained using the correlation method are large in the early time period. To try to reduce these errors and to get more information on the XBT bias history, we examine the biases in the global-scale XBT dataset.

## 4. Results from global-scale pairs

### a. Simplified FRE to model the XBT depth in the global-scale pairs dataset

Since the global-scale pairs are not true XBT–CTD comparisons, to achieve computational efficiency and to avoid uncertainties introduced by using four variables (three fall-rate coefficients and the pure thermal bias), the fall-rate equation is simplified from the equations in section 3c to

For the global-scale dataset, Eq. (11) is used to model the XBT depth (*D*). The values for the constants *a*_{1}, *a*_{2}, *b*_{1}, and *b*_{2} are given in section 3c for each probe type and coefficient *A* is calculated for each XBT–CTD pair in the global-scale dataset.

### b. Bootstrap analyses to determine the length of each temporal bin

A bootstrap test is needed to determine the length of each temporal bin to obtain statistically significant results, since the global-scale pairs are not evenly distributed over time and small numbers of pairs in some time periods may lead to spurious results.

The depth of each XBT profile is calculated using Eq. (11), leaving only the pure thermal bias (individual corrections). Next, a bootstrap analysis is conducted on the temperature differences after individual corrections. For each group, we randomly select *n* pairs (from 50 to 3750 in increments of 100) at a specific depth (from 5 m to the terminal depth in increments of 50 m). The median of the temperature differences is then calculated. This procedure is repeated 200 times and 200 medians are obtained. The standard error of these medians is calculated at each depth for each *n* and each group (Fig. 6).

The pure thermal biases have larger uncertainties in the upper ocean for all the groups, where the water properties change with space and time much more than in the deeper ocean. For G1-T7–DB, G2-DX, G3-T4–T6, and G4-SX, the standard errors for all depths, except 10 m, decrease to 0.02°C (2 times the XBT thermistor sensitivity) when the number of pairs is more than a threshold of ~(2200, 2000, 2200, and 3000, respectively). There are larger errors for G5-T10 than the other Sippican groups, and the standard errors for all depths (except 10 m) decrease to 0.04°C (4 times the XBT thermistor sensitivity) when there are more than 800 pairs. The standard errors for T5 (except 10 m) are less than 0.04°C when there are more than 700 pairs. For the TSK pairs, 900 and 850 pairs are needed for TSK–T4–T6 and G9-TSK–T7–DB, respectively, to decrease the standard errors to 0.04°C. However, for G8-TSK-T5, there are only ~800 pairs in total, so we do not specify a threshold for this group.

Based on these results, we select time windows to diagnose the time variation of the depth and the pure thermal bias. Initially, 5-yr bins centered on each year are used from 1967 to 2010. Most XBTs delivered from Sippican–TSK are probably used in about 2 years, so a 5-yr window could easily include numerous distinct batches, with distinct fall rates. Additionally, in each 5-yr bin, if the number of pairs is less than the threshold shown above, then the length of the bin is extended until the number of pairs is equal or greater than the threshold. The maximum length of each window is 10 years. We used 7-yr bins for the TSK-T5 group, owing to the limited amount of data.

### c. Results after individual corrections

After the application of the individual corrections to the XBT profiles, the depth error in each profile is removed, and the residuals are regarded as pure thermal bias. CW13 found the pure thermal bias is generally consistent with depth. This property is used to test our method of depth correction. Before individual corrections, all Sippican XBTs have depth-varying temperature biases (Fig. 7). After correction of the depth error, the depth variation is reduced for all groups, leaving a more depth-independent thermal bias as found in CW13. The medians of the pure thermal biases are approximately 0.030°, 0.00°, 0.08°, 0.04°, 0.04°, 0.06°, 0.05°, 0.31°, and 0.09°C from G1 to G9, respectively. Positive bias is found in all probe types over the past 45 yr. TSK probes seem to have larger pure thermal biases than Sippican probes, and T4–T6 probes have a warmer bias than T7–DB for both manufacturers.

### d. Time variable pure thermal bias

As suggested by previous studies, the pure thermal bias is time variable and contributes to the XBT bias history. In this study, the residuals after the individual corrections to the depth error are regarded as pure thermal bias. The yearly averaged pure thermal bias is not always consistent over depth (Fig. 8, left panels). Studies by GR10 and G12 using global-scale data also show pure thermal biases varying with depth. The source of the variations in thermal bias with depth found in the global-scale studies could be the background mesoscale and seasonal errors associated with the large spatial and temporal distances between the XBT data and the reference CTD data.

The residuals are averaged over depth to obtain a depth-averaged bias (Fig. 8, right panels), compared with results from GR10, G12, H12, CW13-CH, CW13, and L09. Generally, L09 shows much larger temperature corrections for each group than the other schemes, since they encapsulate depth-related biases as well as pure thermal bias.

The pure thermal biases for G1-T7–DB show a bias shift near 1984 from a larger bias (~0.15°C) to a much smaller bias, similar to the results from the side-by-side data (CW13/CW13-CH).

G2-DX has a similar time evolution as T7–DB, but with an earlier bias shift near 1980. This result differs significantly from H12 and G12, who find a near-consistent pure thermal bias over time, but match well with GR10.

The T4–T6 shows a much later decrease of pure thermal bias (~1985) compared with the side-by-side data result in CW13 (~1975). CW13 suggests that the change from strip charts to digital recording systems may be responsible for the thermal bias shift during early 1980s. It is possible this shift happens earlier in the side-by-side data (where scientific organizations and navies were testing new digital equipment) than in the global-scale data.

The G4-SX matches well with G12-SX pre-1985 and is generally consistent with other schemes (except L09) since 1995. A larger pure thermal bias is detected from 1985 to 1995 for SX with very larger uncertainties, due to a small data amount at this period. Both G4-SX and G2-DX show an earlier shift compared with T7–DB and T4–T6, suggesting a mix of other probes or FREs in the SX and DX groups. Significant differences appear between the DX and SX groups, so we recommend not merging the two datasets into a single group.

For Sippican T10, the pure thermal bias shifts from a slightly negative bias pre-1990 to a consistent warm bias after 1990. The bias reaches a maximum near 1995–2000. G12’s result for T10 is well within the error bars of our results.

Sippican T5 probes show a larger bias pre-1980 (~0.25°C) than later (~0.05°C), values consistently larger than the other Sippican probes. For TSK-T4–T6 and TSK-T5, the pure thermal bias again shows a shift from a larger warm bias to a smaller bias in the 1990s. The TSK-T7 probes show an increasing bias history over time from the late 1980s to the 2000s.

Although some consistencies in pure thermal bias are found between global-scale data in this study and the side-by-side data in CW13 (such as T7–DB, SX), the discrepancies are significant between T4–T6 and DX. For example, during 1970–90, the pure thermal bias experiences a shift, but the shift time is different for the two datasets. This discrepancy indicates that the corrections derived from the side-by-side dataset may not be adequate to represent the bias history in the global XBT dataset.

### e. Time variable depth error

In this section, we examine the time variation in the XBT fall rate. Using the scheme to calculate the depth from correlations between *A* and the other coefficients (described in section 3c), the depth of an XBT profile is fully characterized by the fall-rate coefficient *A*. The time variation of *A* is obtained by calculating the maximum depth-weighted median of *A* from the global-scale pairs in each year bin (Fig. 9, left panels) and comparing the results from the side-by-side data. The results show again the existence of the time-varying depth error (DiNezio and Goni 2011; Gouretski and Koltermann 2007).

There is a similar evolution for G1-T7–DB, G2-DX, G3-T4–T6, and SX groups. The value of *A* reaches a peak near 1984 for T7–DB and DX, but near 1989 for T4–T6 and SX. These results from the global-scale data agree well with those from the side-by-side data, suggesting that the side-by-side dataset seems to be able to represent the time variation of the depth error in the global XBT dataset. On the other hand, G5-T10 shows a decreasing fall rate since 1997. The value of *A* for G6-T5 has a peak around 1985 and it then decreases. For TSK–T4–T6 and T7, the *A* values are all slightly larger than in the H95 FRE and show a smaller time variation. Recent Sippican T7–DB and DX probes after 2000 have a fall-rate equation closer to Kizu et al. (2011)’s results than to the manufacturer’s or H95’s fall-rate equation.

The corresponding depth errors as a function of depth (Fig. 9, middle column) lie between the H95 FRE in green and the manufacturer FRE in cyan. For G1-T7–DB and G2-DX, the depth errors diagnosed in this study generally agree with those presented in G12. A contradiction is found in the Sippican-T10 group between G12 and this study. The errors in the upper 100 m are negative in this study compared with both the manufacturer’s and H95’s FREs.

We then compared the results based on our correction scheme with those proposed by the following independent studies: H95, W08, IK09, GR10, GD11, H12, G12, CW13, and CW13-CH. First, it is apparent that there are differences among these studies:

the corrections are depth constant (W08, GD11) or depth variable (H95, IK09, GR10, G12,H12, CW13, and CW13-CH), and here depth variable is defined as a depth error (or temperature bias) that depends on depth

pure thermal bias corrections are included (GR10, G12, H12, CW13, and CW13-CH) or not (H95, W08, IK09, and GD11), and only L09 presents thermal bias correction

the corrections are proposed separately for different probe types (IK09, GD11, G12, CW13, CW13-CH), or according to the depth, namely, shallow or deep XBTs (W08, GR10, and H12)

To account for these differences, the depth corrections are converted into a depth correction factor (Fig. 9, right column) defined by CF = *D*_{cor}/*D*_{uncor}, where *D*_{cor} is the XBT depth after correction and *D*_{uncor} is the XBT depth in the global-scale dataset. The *D*_{uncor} is calculated from the H95 equation with the exception of T5 and T10, which use the original manufacturer’s fall rate (see Table 1). The correction scheme of W08 is time varying but constant over depth. The CF for IK09, CW13, CW13-CH, GD11, and this study is calculated using CF = mean[*D*_{cor}(*t*)/*D*_{uncor} (*t*)], where *t* is the elapsed time from 1 s to approximately the maximum acquisition time (40s for T10, 80s for T4–T6–SX–TSK-T4–T6, 140s for T7–DB–DX–TSK-T7, 350s for T5–TSK-T5) with a bin of 1 s. The *D*_{cor}(*t*) is calculated according to each scheme’s corrections. For GR10, GD11, and G12, CF is calculated from CF = mean[*D*_{cor}(*d*)/*D*_{uncor} (*d*)], where *d* is the depth calculated from the H95 FRE (or a specific manufacturer’s coefficients for T5–T10) from 1 m to the terminal depth (250, 520, and 890 m for T10, T4–T6–SX–TSK-T4, T7–DB–DX–T5–TSK-T5–T7, respectively) with a bin of 1 m. The *D*_{cor}(*d*) is the corrected XBT depth calculated using each scheme. It should be noted that the correction factor is only an approximation of the depth error and assumes the depth error is depth independent. However, it is still worthwhile comparing these correction schemes in this manner to provide insight into the depth error history.

Generally, all of the correction factors for G1–G4 show a slightly increasing fall rate with time during 1975–85, and a decreasing fall rate after 1990, where the corrections mostly lie between H95 and the manufacturer’s FRE. The depth-averaged corrections for T7–DB and DX for all of the correction schemes except H12 and GD11 agree quite well from 1987 to 2010. All of the schemes share a similar history, but our results show a much weaker time variation compared with other studies.

The schemes of W08, IK09, and GD11 have a larger time variation of depth corrections for T7–DB, DX, T4–T6, and SX, especially before 1990, which may be due to the assumption of zero pure thermal bias, leading to the overestimation of depth errors. Even separating the depth error from the pure thermal bias, H12 still shows a dramatic time variation of depth error, which is larger than schemes that also separate the two error sources (G12, GR10, CW13, and this study).

The depth correction factor for T10 has different histories in the five independent studies of GR10, G12, GD11, IK09, and this study. We found a slightly decreasing correction factor for T10 from the 1990s to the 2000s. For T5, all three schemes (GR10, IK09, and this study) show a peak within 2000–05 but differ for pre-1995. IK09 and this study suggest an increasing correction factor from 1980 to 2005. For TSK probes, our corrections are stable over time and generally agree with IK09 for TSK-T4–T6.

In all, the difference in depth correction factors between the global-scale dataset (this study) and the side-by-side dataset (CW13, CW13-CH) are small (within one standard error bar as in Fig. 9, left column), suggesting that the depth error is consistent over different datasets. The small number of pairs in the CW13 side-by-side dataset means comparisons with the global-scale trends for the T10, T5 and TSK groups cannot be made.

### f. The influence of temperature on depth error and pure thermal bias

From the analyses of the side-by-side dataset, both the fall rate and the pure thermal bias increase with water temperature. Therefore, two questions are proposed:

Is the time variation of the fall rate and the pure thermal bias caused by the different temperature of seawater sampled by the XBT system during XBT–CTD comparison tests?

How does water temperature contribute to the historical XBT bias?

The average temperature of the water column sampled by XBTs during the comparison tests during the early era (pre-1980), based on the XBT data distribution only, is up to 10°C warmer than that in the latter time period (post-2000). How much does the temperature contribute to the time variation of the XBT bias? To answer this question, we apply two temperature-related corrections to the historical dataset. The *A* coefficient was modified as described in Eqs. (3) and (4), and the temperatures were modified following Eqs. (1) and (2). After these steps, we found a reduced magnitude of biases (with the exception of TSK–T4–T6–T10) and a slightly reduced time variation (with the exception of TSK-T7) (Fig. 10).

The regional variation of bias may be strongly influenced by water temperature, since the upper-ocean temperature has more than a 20°C difference between the tropical and high-latitude regions. At six depths (100, 200, 300, 400, 500, and 600 m), the zonal gradient of the bias is reduced significantly when the temperature-related corrections are applied (Fig. 11). However, the residual of the biases after the temperature-related corrections (blue) is larger at the lower latitudes, since the XBT was mainly deployed at low latitudes in the early era, when data show a larger warm bias.

## 5. A time, probe type, and temperature variable bias correction

In section 3, we determined the relationship among the three fall-rate coefficients, and the relationships between the water temperature and the fall rate/pure thermal bias. Based on these results, in section 4, we simplified the fall-rate equation and diagnosed the time evolution of the fall rate and the pure thermal bias using the global-scale dataset. The influence of temperature on the fall rate and the pure thermal bias was also examined. Using these results, a new correction scheme for historical XBT data is proposed, by using the results from the global-scale dataset. Global-scale data, instead of side-by-side data, are used because (i) the side-by-side dataset contains insufficient data to derive corrections for probe types other than T4–T6 and T7–DB; (ii) although we found the fall-rate coefficient for the two datasets had a similar time variation, the pure thermal bias was considerably different (such as in the T4–T6 and SX groups). Since the global pairs database is more representative of the global historical archive, we chose to use the depth error and thermal bias calculated from this dataset to get our final correction scheme.

### a. Corrections for pure thermal bias

We assume the pure thermal bias is determined by both time and water temperature. For an XBT profile in a specific year (denoted as “year”), the correction to the pure thermal bias is determined by

where *dT*_{temp} is the temperature-dependent part of the thermal bias, which is specific to each XBT probe type. Initially, the column-averaged temperature for the XBT is calculated from the XBT profile. The term *dT*_{temp} is calculated using the relationship between the pure thermal bias and the column-averaged temperature described in Eqs. (1) and (2). For the global-scale dataset, the yearly averaged values of *dT*_{temp} are listed in Table 3.

The term *dT*_{year} represents the time variable part of the pure thermal bias. It is calculated by subtracting the yearly averaged pure thermal bias (*dT*_{temp}, Table 3) from the total pure thermal bias (red lines in Fig. 8, right panels). The values of *dT*_{year} in each year for each group based on the global-scale dataset are presented in Table 3.

By summing *dT*_{year} and *dT*_{temp}, the correction to the temperature measurements (*dT*) is obtained.

### b. Correcting the depth error

We assume that the depth error is affected by calendar year and 0–100-m-averaged water temperature. The depth in each XBT profile is first recalculated to the H95 FRE (except for the T5 and T10 groups), then corrected using the terms *dA*_{year} and *dA*_{temp}, which represent the time and temperature variable corrections, respectively. In detail, XBT depth (*D*) is recalculated as

where *A*_{XBT} is the H95 fall rate (6.691 m s^{−1}) for all the probe types with the exception of T10 and T5–TSK–T5, where *A*_{XBT} = 6.301 m s^{−1} for T10 and 6.828 m s^{−1} for T5–TSK–T5. The coefficients *B* and offset are calculated from the correlations *B* = *a*_{1}*A* + *a*_{2} and offset = *b*_{1}*A + b*_{2}, where the coefficients *a*_{1}, *a*_{2} are given in Eqs. (5)–(7). Coefficients *b*_{1}, *b*_{2} are constant with time and presented in Eqs. (8)–(10).

To correct the global-scale data, the relationships obtained from the side-by-side dataset are used. XBT profiles with types T7–DB, DX, and T5 use the relationship obtained for T7–DB; profiles of type T4–T6, SX, and T10 use the relationship obtained for T4–T6; and all of the TSK profiles use the relationship obtained using the results of the TSK pairs.

The term *dA*_{temp} is the part of the *A* coefficient influenced by water temperature. For each XBT profile, the 0–100-m-averaged temperature is obtained by using the XBT temperature records (Temp_{100}). Next, the *A*–temperature relationship described in Eqs. (3) and (4) is used to calculate *dA*_{temp} (Table 4).

The term *dA*_{year} is the time variable part of the XBT fall rate. This term is obtained by subtracting the temperature varying fall rate (*dA*_{temp}) from the total time-varying term *A*_{total} (as shown with red lines in the left panels of Fig. 9): *dA*_{year} = *A*_{total} − *A*_{XBT} − *A*_{temp}. The yearly average values of *dA*_{year} are shown in Table 4.

It should be noted that Tables 3 and 4 list the corrections applied to the *WOD09* dataset, where the temperature-related parts (*dT*_{temp} and *dA*_{temp}) are the yearly averaged values. This can be used to calculate historical ocean heat content. However, if one needs a more careful correction to an individual XBT profile, the temperature-related part should be calculated for this individual XBT profile, and then the time variable part (*dT*_{year} and *dA*_{year} in Tables 3 and 4) should be added to this temperature-related part to get the final correction.

## 6. Evaluation of the correction scheme

### a. Temperature differences between XBT and CTD in the global-scale dataset

In Fig. 12, temperature differences between XBT and CTD in the global-scale dataset before and after XBT bias corrections (as in section 5) are presented at each year–––depth for nine groups. Before correction, the XBT data for each group have a time-varying warm bias using the H95 FRE (or the Sippican FRE for T10–T5). After depth errors are removed from the global-scale dataset, reduced warm biases still exist and are time variable. After correction for both the fall-rate error and the pure thermal bias, the temperature differences are reduced further and mostly eliminated, although there remain some small errors.

### b. Quantification of correction performance

As suggested by Tim Boyer in the 2010 XBT Bias and Fall Rate Workshop in Hamburg, Germany, the corrections on XBT data should (i) be able to successfully meet a performance criterion, such as being able to correct 95% of all depths to within 0.025°C versus CTD in statistical tests; (ii) be relatively easy to implement, fully documented, and extendable into the future; and (iii) cover all XBT types, all observation depths, and all time periods (T. Boyer 2010, meeting presentation). Recent correction schemes for XBTs tend to satisfy criteria (ii). However, a consistent validation scheme has not been agreed upon. Existing schemes are validated mainly (i) by calculating the global ocean heat content history with and without the corrected XBT data or (ii) by comparing the correction scheme with the other schemes, to look for similarities between schemes. In this section, we propose a method of measuring correction performance.

At first, we assume that the temperature differences between XBT and CTD data in the global-scale dataset have a normal distribution with a mean (or median) of zero if there is no systematic bias. The distributions of the temperature differences between XBT and CTD (*T*_{xbt} − *T*_{ctd}) before and after correction are inserted in 10-m bins from a depth of 5 m to the terminal depth of each profile. Then, the frequencies of (*T*_{xbt} − *T*_{ctd}) differences are normalized by the maximum frequency (Fig. 13). Before this procedure, the (*T*_{xbt} − *T*_{ctd}) differences are positively biased in all nine groups. The mean (cross) and the median (circle) of the temperature differences are not the same, implying asymmetry in the distributions. All of the correction schemes are used to correct the XBT profiles in all groups. If there is not a correction on a specific type for a correction scheme (such as T5 in GD11), then the corrections are made according to the corrections for unknown types (such as UNK in GD11). If there is no correction available for unknown types, then corrections are made according to their shallow or deep groups. CW13/CW13-CH is not included in this comparison for TSK groups because CW13 provides corrections on TSK–T6 before 1986, but we found a very small amount of TSK data before 1986 in the global-scale dataset.

After correction, we highlight three aspects:

Mean and median. For all data, the scheme proposed in this study has reduced the total biases and the residuals are distributed around zero for all groups. Correction schemes based on the side-by-side dataset (CW13 and CW13-CH) underestimates the biases in T4/T6 and SX. IK09 underestimates the bias for all groups (except TSK-T7). Maybe this is the reason why the historical OHC time series based on IK09 always shows a weaker long-term trend.

The L09 median is negative for most probe types, suggesting an overcorrection, but positive (undercorrected) for T4–T6, T5–TSK–T5. Almost for all groups and correction schemes (except TSK-T7), the mean is distributed at the right side of (warmer than) the median, indicating asymmetric residual distributions. None of the schemes correct the TSK groups well with the exception of this study (and H12 for TSK-T4–T6 and TSK-T7). L09 looks to perform similarly to H12 for TSK-T4–T7 and TSK-T7

Symmetry. Using the 0.5 frequency line (dashed line) as a reference point, the uncorrected data are always biased to the positive. The correction scheme proposed in this study reduces the asymmetry.

Peak. The results of this study have the peak near zero for all groups with the exception of TSK T4–T6 and TSK T7. The position of the peak varies for each correction scheme.

These results give an insight into the distribution of the biases. To quantify the performance of the correction schemes, more detailed statistics are required. We suggest that after correction, the distribution of the (*T*_{xbt} − *T*_{ctd}) differences should meet the following criteria:

The (

*T*_{xbt}−*T*_{ctd}) differences should be minimized. The median of the (*T*_{xbt}−*T*_{ctd}) difference is the first quantity used to assess the correction performance, denoted as the median.The distribution of the (

*T*_{xbt}−*T*_{ctd}) differences should be symmetrical around zero (i.e., it has a normal distribution), and its skewness (calculated by the routine Skewness in MATLAB), is defined as the third standardized moment of a random variable*X*: For a sample of*n*values, the sample skewness is

where is the sample mean, *m*_{3} is the sample third central moment, and *m*_{2} is the sample variance.

These two indices are assessed before and after applying the correction schemes to the XBT profiles in the global-scale dataset. They are calculated using the available (*T*_{xbt} − *T*_{ctd}) differences in 10-m bins from 5 m to the terminal depth in each time window. After a comparison of the median of the (*T*_{xbt} − *T*_{ctd}) differences before and after corrections, Fig. 14 shows the following:

The biases are all significantly reduced for the T7–/DB, DX, T4–T6, and SX groups for all correction schemes.

For G3-T4–T6 and G4-SX, the correction schemes of CW13/CW13-CH underestimate the total bias from 1975 to 1985, which is due to the much earlier shift of the pure thermal bias in CW13 as presented in Fig. 8.

All of the schemes reduce the bias in T5.

For T10 and all of the TSK groups, all the previous corrections are weak in correcting the biases, but we underline that most of the correction schemes include no special treatment for the T5 (with the exception of IK09), T10 (excepting IK09, GD11, and G12), and TSK groups (excepting IK09 and G12).

We suggest that these probe types should be separated, since their bias history seems to be different when compared with other groups.

Comparison of the skew of the results shows that the distribution of the (*T*_{xbt} − *T*_{ctd}) differences is not always symmetrical around zero but still positively biased in general. This asymmetry may be caused by the following:

Incomplete corrections to XBT biases, especially to the depth error.

Errors due to the large temporal–spatial differences between an XBT and the correspondent CTD, introducing mesoscale noises into the distribution.

“Leakage” from small insulation penetrations in the copper wire of the XBT, causing a gradual anomalous increase in temperature with depth (Bailey et al. 1994; Heinmiller et al. 1983; Kizu and Hanawa 2002a). This bias exists in a large part of the global XBT dataset; it is difficult to detect using automated quality-control procedures, and leads to a positive skew. Anderson (1980) states that, among the known sources of wrong measurements, there are more effects (mainly electrical effects) producing a warm variation than a cold one. This means that (statistically speaking) small errors that are not easily identified are more likely to result in a warm bias in XBTs.

Surface thermal inertia caused by large differences in air temperature between the XBT storage area immediately prior to the drop and water temperature, which are also affected by the recording system used (Kizu and Hanawa 2002b). Usually, an XBT can take 3–4 m to equilibrate with the water temperature, but this time is partially dependent on the coupling between the probe and the recording system (up to ~10 m; Kizu and Hanawa 2002a). This transient depends on the thermistor and the acquisition system, which includes also the computer and the ground connection of the recording system to the hull of the ship. Surface thermal inertia may introduce a cool bias near the sea surface, which may contribute to the negative skew.

Different vertical resolution of the data, which introduces vertical variations in the (

*T*_{xbt}−*T*_{ctd}) differences.

Although the contribution of these factors to the distribution cannot be addressed in a single paper, we highlight several properties, which help us to identify the different behaviors of errors for different probe types:

The T7–DB group has a negative skew pre-1980, while DX shows positive skew at the same time. After 1980, the skew of both groups trends downward from 1980 to 1988 and upward from 1990 to 2000. This indicates that there are more mixed probe types in the DX group before 1980 (perhaps including some T5s, Sparton-XBTs).

The T4–T6 and SX groups show a very similar trend of skew, with two peaks near 1987 and 1994. This similarity suggests that the majority of the SX group is of the T4–T6 type. But the skew for SX is generally more positive than that for T4–T6 before 1985, implying that maybe some T10 data are included in the SX groups, since T10 data have a consistently positive skew in most years.

The T5 group has a consistent positive skew from 1966 to 2010, implying that some T5s may be included in the DX group, since the skew of the DX group is always more positive than the T7–DB group.

The TSK groups show a different skew history compared with the Sippican probes. Studies by Kizu et al. (2011) have shown that Sippican and TSK have different characteristics. Only the thermistor is the same, being produced by the same external manufacturer. Any other component of an XBT probe is independently produced and assembled by two manufacturers. For example, TSK probes seem to be less affected by leakage than Sippican probes, probably because TSK uses a different wire with more robust insulation (Kizu et al. 2011).

### c. Evaluation by an independent dataset

An independent dataset (EN3-v2a dataset; Ingleby and Huddleston 2007) is used in this section to further evaluate the performance of various correction schemes. EN3a-v2a is a fully quality-controlled dataset, with data sourced from the *World Ocean Database 2005*, the Global Temperature and Salinity Profile Programme (GTSPP), Argo, and Arctic Synoptic Basin-Wide Observations (ASBO). XBT–CTD pairs are identified using the same strategy proposed in section 2b. All correction schemes shown in the previous sections are used to correct the XBT profiles, with results shown in Fig. 15. All of the correction schemes considerably reduce the total XBT bias. The method presented in this study appears to be the best one, reducing the bias within the range of −0.04°~0.04°C at any depth. However, there seems to be a systematical warm bias from 0 to 450 m and a cold bias from 450 to 700 m. Almost all of the correction schemes give a similar depth variable temperature difference between XBT and CTD, but the reason is unknown. L09 does not show the depth variable difference, but it overcorrects equally at each depth. Using this independent dataset, our correction scheme results in a reduction of the systematic XBT bias.

This test gives us the opportunity to test the sensitivity of the strategy used to obtain a reference CTD. Previously, we averaged all of the CTD profiles around an XBT profile together to get the reference CTD buddy; an alternative choice is to use the nearest CTD profile to a given XBT profile as a reference buddy. The results of both choices are shown in Fig. 15, indicating very similar results using the two strategies.

### d. Evaluation by checking the temperature dependency of XBT bias

How effective are the various correction schemes at reducing the temperature dependency in the XBT bias? Using the global-scale pairs, we binned the temperature differences between the XBT and CTD buddies at each 10-m depth from 0 to 700 m into 0.5°C water temperature bins (Fig. 16). The median and mean of the temperature differences at each temperature bin are calculated. An increase in XBT bias (median) with temperature is apparent before applying any corrections, with a slope of 0.0151° ± 0.0013°C (°C)^{−1}. This slope is much stronger than that found in the side-by-side dataset. When only applying the temperature-related part of our correction scheme, this temperature dependency is reduced by 22%. When both the temperature-related and the time-related parts of our correction scheme are applied, this temperature dependency is reduced by 34% [0.0101 ± 0.0010°C (°C)^{−1}]. When other correction schemes are applied, the temperature dependency is reduced by 27% (GD11), 28% (G12), 15% (CW13), 19% (CW13-CH), 20% (L09), 14% (IK09), 40% (GR10), 7% (H12), and 17% (W08). GR10 and the scheme presented in this study result in a larger reduction of temperature dependency than the other schemes. However, even after bias corrections, temperature dependency can still be found, suggesting that there may be some unknown temperature-varying errors.

## 7. Re-estimation of global ocean heat content

After applying our corrections to the XBT data in the *WOD09* dataset, the historical ocean heat content from 1967 to 2010 is reestimated according to the methodology presented in L09. XBT data are from the *World Ocean Database* as of January 2011, which includes data to the end of December 2010. Quality-control procedures for the XBT data follow the procedures outlined in Boyer and Levitus (1994). XBT data are corrected by using three alternative correction schemes: L09, CW13, and this new scheme. All available data (XBT, MBT, CTD, bottle, moored buoy, glider, undulating CTD, drifting buoy) were used to obtain yearly anomalies and then calculated by subtracting standard-level interpolated values from mean monthly climatology, and averaging over 1° grid boxes.

The yearly anomalies were objectively analyzed to map to the full-ocean grid, following L09. The mean monthly climatologies were calculated from profiling floats only, to remove any possible effect due to XBT bias. Float cycles used were collected from 2004 to 2011, delayed-mode quality controlled when possible, real-time data otherwise. Temperature anomaly fields were converted to ocean heat content using a volume of water dependent on vertical length halfway between adjacent standard depth levels. Ocean heat content was then summed over all layers, 0–700-m depth, to arrive at the final yearly ocean heat content values.

The OHC under three correction schemes shows consistent structures as in Fig. 17. The warming “hump” during 1970–85 is reduced compared with both raw data and with CW13 corrections. The linear trend of the OHC changes from 1967 to 2010 is about ~0.36 (10^{22} J yr^{−1}) based on our results. The new correction scheme gives a robust confirmation of the so-called global warming of ocean water, due to a better evaluation of the critical period from 1960 to 1990, when XBT data dominated the ocean observation system.

## 8. Summary

The systematic biases in historical XBT data were examined by comparing XBT with CTD profiles in two datasets: side-by-side XBT–CTD comparisons and global-scale XBT–CTD pairs extracted from the *WOD09* dataset. The combination of the two datasets provided a new insight into the XBT bias history.

We first suggested in this paper that the structure of the original Sippican FRE can be improved: the addition of an offset term helps to better reproduce the XBT probe motion since the beginning. Moreover, we proposed and verified that the three coefficients in this new FRE are correlated with each other. We accordingly hypothesized that the *A*–*B* and *A*–offset correlations are part of the intrinsic nature of the XBT FRE. So, a new approach was presented to correct the XBT depths using only one fall-rate coefficient *A*, where the other two coefficients *B* and offset were calculated from *A* based on their correlation with *A*. In such an approach, the number of unknown variables needed to describe the XBT motion was reduced from four (*A*, *B*, offset, and pure thermal bias) to two (*A* and pure thermal bias). A test based on side-by-side dataset using this new description indicated the model works well, enabling us to apply the correlations to the global-scale dataset. We also verified that *A* increases slightly with water temperature in the side-by-side dataset. Then, this relationship was applied to the global-scale dataset due to its influence on the local variation of the measured bias.

By using the global-scale XBT–CTD pairs dataset, we obtained robust indications of a different temporal evolution of the XBT falling motion dependent on probe type. Similar characteristics to previous studies were reproduced: the XBT falls slower than the H95 FRE states during 1975–85 and after 2000. The most likely source of fall-rate variation over time is changes in the probe design by the manufacturer (such as probe weight, shape, dimension, etc.). A time variable pure thermal bias was also obtained; shallow and deep XBTs have a warmer period before 1985 as large as 0.2°C, but T10 and T5 probes have different thermal bias histories compared with other Sippican probes. This could be partially linked to the use of analog recording systems as well, as noted in the XBT bias literature. In addition, the different influence of even digital recording systems on the quality of the XBT profiles became evident after the field tests described by Wright and Szabados (1989). The TSK-T4 and TSK-T5 groups showed a thermal bias shift near 1998–99, while TSK-T7 showed a consistent thermal bias over time. Comparisons among several independent studies showed a similar bias history for XBT profiles, but all results had a large variation.

Therefore, a new time-, temperature-, and probe-type-dependent correction scheme was proposed to correct the historical XBT data. As validation of this new approach, we corrected the XBT data in the global-scale XBT–CTD dataset: the so calculated thermal biases are significantly reduced when compared to previous results. We checked the performance and the robustness of the new correction scheme via an objective performance test based on two simple statistics: the median of the temperature differences between XBT and CTD, and the skew of their distribution. Ideally, both these values need to be minimized so that any systematic biases are removed. We applied this technique to nine different correction schemes available in the literature. We noted that all schemes improved the median of the thermal differences, but the skew has marginally significant improvements. These metrics provide a new insight into the behavior of the biases of each probe type.

The XBT data in an alternative, fully quality-controlled dataset (EN3-v2a) was corrected by using various correction schemes. The results suggest all of the correction schemes do not perfectly correct the EN3 dataset; the best scheme is the one proposed in this study. After correction, the 0–700-m averaged bias is eliminated, and the bias at each depth is also reduced to ±0.04°C.

Based on the results presented in this study, XBT biases in both the *WOD09* and the EN3-v2a dataset can be reduced by using our proposed correction scheme. Application of this new correction scheme to an estimation of the global ocean heat content was evaluated. This re-estimation of global ocean heat content from 1967 to 2010 results in a robust warming trend.

## Acknowledgments

This work is supported by the project “Structures, Variability and Climatic Impacts of Ocean Circulation and Warm Pool in the Tropical Pacific Ocean” of the National Basic Research Program of China (Grant 2012CB417404), National Natural Science Foundation of China (Grant 41075064), and Key Technologies R&D Program of China (Grant 2011BAC03B02).

### APPENDIX A

#### Possible Reasons for Correlations in *A*–*B* and *A*–offset

The correlations suggested in section 3c indicate that the three fall-rate coefficients are not independent of each other. Therefore, we assume that the correlations are an intrinsic feature of the XBT fall rate, and then we used their correlations in the global-scale dataset. It is impossible now to directly verify this assumption; instead, in this section, we discuss some possible reasons for the correlations.

##### a. Behavior of the noise?

CH11 simulated an XBT profile by using the manufacturer’s FRE to calculate the depth and sampled the temperature from a CTD profile using H95’s FRE. In this way, the depth error is the only error source. The simulated XBT profile is corrected using both the integrated and H95 methods. Both sets of results show that *A* and *B* are distributed along a line in the *A–B* plane. As discussed by H95, the coefficients *A* and *B* are distributed in a statistical ellipse on the *A*–*B* plane, and the major axis of the ellipse lies along the line of the least depth error. The noises in the model or data force the resultant coefficients to distribute in an elliptical shape, which is similar to the correlation of the two coefficients we presented in Fig. 4.

As suggested in both analyses, if there is no surface offset, the correlation between *A* and *B* is likely to be related to the behavior of the noise, due to the inaccuracies in the fall-rate model, to improper methods, or noise from the measurements. If something different with respect to the description of the “standard” two-coefficient FRE happens in the near-surface layer (during the first seconds of the XBT motion), then the three coefficients will also be distributed in an ellipsoid following the math in H95, and a correlation still exists.

##### b. Terminal velocity error?

An offset term in an FRE sometimes appears in the XBT literature. If we exclude early “step” descriptions (Heinmiller et al. 1983; Seaver and Kuleshov 1982), then it is since the 1990s that this option was proposed as a way to improve the description of the first seconds of the XBT motion (Biggs 1992; Boyd and Linzell 1993; Hallock and Teague 1992; Prater 1991; Reseghetti et al. 2007; Singer 1990). The motivation for including this term is to model the depth error in the upper-40 m, accounting for the dynamic adjustment of the probe in the first few seconds in the water. Tests conducted in very shallow waters show that the probe is accelerating in the first 3 s (GR10) until it reaches a stable fall rate. Whatever the physical explanation of the surface offset, all of the factors lead to a variable velocity in the surface layer compared with a constant velocity *A*. We now introduce a simple conceptual model to examine the effect of this start-velocity error on the calculation of the fall-rate equation.

Assuming that the actual fall rate of the probe in the first 3 s (0 s < *t* < 3 s) in the water is modeled by *υ*_{1}(*t*) = *A*_{1 }*−* 2*B*_{1}*t − m*/(*t*^{2} + *s*^{2}*)*, where *t* is the elapsed time after the probe hits the sea surface; *υ*_{1} is the probe velocity, which varies with *t*; the term *m*/(*t*^{2} + *s*^{2}) models the velocity in the first seconds, apparent as an acceleration when *m* > 0; and *s* is a constant coefficient modeling the velocity change.

The XBT depth according to the actual fall rate is

where the series expansion is used to simplify the function of tan^{−1}(*t*/*s*). If we summarize the higher-order terms in *D*_{1} and denote the sum as *h*, then the depth *D*_{1} can be simplified as:

The coefficients offset and *A* will be offset = *mh*/*s* and *A* = *A*_{1} −*ms*^{−2}, respectively, which shows that the two coefficients are linked by the terminal velocity error [modeled by *−m*/(*t*^{2}+*s*^{2})]. Therefore, if there is a negative start-velocity error, then this error will finally lead to a positive offset term and an *A*–offset correlation when calculating the FRE.

To further clarify this model, we tried to simulate the tests described in GR10, where results of the XBT probes dropped in a shallow area (bottom depth down to about 50 m) are shown. They find that the average probe velocity changes from 6.1–6.2 m s^{−1} in the first 5.5 m to 6.45–6.6 m s^{−1} at 47.2 m. In our simulation, the coefficients in fall rate *υ*_{1} are set to be *m* = 0.8, *s*^{2} = 1.5, *A*_{1} = *A*_{H95} = 6.691 m s^{−1}, and *B*_{1} = *B*_{H95} = 0.002 25 m s^{−2} to get a similar fall-rate structure in the first 30 m as in GR10. Our simulated fall rate is shown as the green line in Fig. A1. According to the math presented above, we can calculate the depth error compared with H95 at 5 s (~35 m) as

The obtained depth error at 4 s (~30 m) is ~0.83 m, about half of the mean surface offset calculated from the side-by-side data (~1.8 m as in Fig. A2a), a value nearly coincident with the correction obtained in a different way in Reseghetti et al. (2007), where a shift up of about 2.0 m (or 0.3 s) was proposed.

In the real world, the surface velocity will not be exactly represented by *υ*_{1}(*t*) = *A*_{1} − 2*B*_{1}*t* − *m*/(*t*^{2} + *s*^{2}); instead, the velocity in the first seconds seems to be a complex function of elapsed time (in a typical standard FRE, the velocity is linearly decelerating with time since the beginning of the XBT motion). However, the start-velocity error will contribute to the *A*/offset correlation.

##### c. Start-time error?

The start-time error will occur if the XBT data recording system begins working before the probe hits the sea surface. Real ocean conditions are complex with surface waves, spray, bubbles (both natural and induced by the ship motion), etc.—all of them can lead to a false early recording start. Start time might also be affected by wake produced by the motion of the hull, creating both waves and turbulences in the water. Unfortunately, it is almost unrealistic to drop the XBT probe into unperturbed water without a drop having a significant component of the motion parallel to the sea surface (something like a javelin throw). On the other hand, the thermistor in the XBT probe has a time constant of about 0.10–0.15 s and requires around 0.60 s to detect a thermal signal (Reseghetti et al. 2007 and references therein). A start-time error near the sea surface (denoted as *dt*) can contribute to the correlation between *A* and offset. In detail,

If the manufacturer’s FRE is used, then the FRE can be rewritten as .

We define as the offset term: , which results in an apparent correlation between offset and

*A.*

In the case where surface water sprays induce a fake contact, the acquisition system will start before the probe hits the water’s surface, and *dt* is always positive. If the *dt* is set to 0.15s, then the resultant offset will be about 0.9–1 m, also approximately half of the systematical bias we detect in the side-by-side data (~1.8 m). If there is a water spray in the ocean with height of ~0.5 m above the sea surface, and the probe begins to work when splashed with the water at a velocity of 7 m s^{−1}, then there will be a time error of about ~0.072 s. This time error leads to an offset of ~0.5 m.

Although there is no direct observational evidence confirming such an error, it is possible that the start-time error can contribute to the offset term.

### APPENDIX B

#### Impact of the offset Term on *A*–*B* correlation

We assumed that the FRE with a positive offset term better describes the actual fall rate of a probe, with an average depth offset of about ~1.8 m. What if the FRE does not contain an offset term? To test the effect of the offset term, we use the FRE without an offset term to correct the side-by-side data using the integrated method. The terminal velocity without the offset term (median *A* = 6.53 m s^{−1}) is smaller than that with offset term (*A* = 6.64 m s^{−1}) by 0.11 m s^{−1} (Fig. A2a). A comparison of the *A*–*B* correlation with and without the offset term (Fig. A2b) shows the *A**–**B* correlation has a greater slope in the case of zero offset at the surface. We present some simple math to explain this slope shift:

We suppose that the actual fall-rate equation is *D* = *At* − *Bt*^{2} − *C*, where *A* = 6.64 m s^{−1}, *B* = 0.002 28 m s^{−2}, and *C* = 1.8 m are the medians of all of the FRE coefficients with the offset term; see Fig. A2a. The manufacturer FRE is *D*_{1} = *A*_{1}*t* − *B*_{1}*t*^{2}. If one uses this manufacturer’s FRE (*D*_{1}) to calculate the actual depth (*D*), the depth error has to be minimized as *D*_{error} = *D*_{1} − *D* = 0, where *D*_{error} is defined as

where *T* is the acquisition time (the time when a probe reaches its terminal depth). If we simplify the slope of the *A*–*B* correlation to *s* = *B*/*A* and *s*_{1} = *B*_{1}/*A*_{1} for the two different FRE models *D* and *D*_{1}, respectively, then the slope *s* = 0.002 28/6.64 = 0.000 343 with units of 1 (s)^{−1} at *T* = 120 s (the acquisition time of the T7–DB probe at a depth of ~760 m). Then we impose the minimization of *D*_{error} = 0, which is (*B*/*s* − *B*_{1}/*s*_{1})*T*^{2}/2 − (*B* − *B*_{1})*T*^{3}/3 − *C* × *T* = 0. So, *s*_{1} is obtained by the following equation: *s*_{1} = 1/[*B*/*sB*_{1} − 2(*B* − *B*_{1})*T*/3*B*_{1} − 2*C*/*BT*].

This solution of *s*_{1} has dependence on *B*_{1}. Since the coefficient *B* is mainly due to the probe weight loss during its fall, we assume *B* is independent of the variant choice of FRE. When *B*_{1} has the same value as *B*, *s*_{1} = 1/(*B*/*sB*_{1} − 2*C*/*BT*), is always greater than the slope *s*; this explains the slope difference when the offset term is included in Fig. A2b. For T4–T6, *T* is 80 s, so the slope *s*_{1} for T4–T6 will be greater than *s*_{1} for T7–DB under the same conditions, as shown in Fig. A2b. The resultant *A*_{1} = *B*_{1}/*s*_{1} will always be smaller than *A* = *B*/*s*, again explaining the differences in *A* (Fig. A2a).

Based on this test, we conclude that if there is no offset term in the FRE, then the surface offset can lead to the stronger *A*–*B* correlation and smaller *A* to compensate for the difference at the surface.

## REFERENCES

*World Ocean Database 2009.*S. Levitus, Ed., NOAA Atlas NESDIS 66, 216 pp.

**36,**L07608, doi:.

**39,**L10603, doi:.

*Diving Safety and Physiology, Ocean Engineering/Technology,*D. Anderson, Ed., Vol. 5,

*Oceans ʼ89: An International Conference Addressing Methods for Understanding the Global Ocean,*Publ. 89CH2780-5, IEEE, 1621–1626, doi:.