## Abstract

Collocating three different types of sea surface temperature (SST) observation provides enough information to derive the standard deviation of error on each observation type. SSTs derived from the Advanced Along-Track Scanning Radiometer (AATSR) and Advanced Microwave Scanning Radiometer for Earth Observing System (EOS; AMSR-E) instruments are used, along with SST observations from buoys. Various assumptions are made within the error theory, including that the errors are not correlated, which should be the case for three independent data sources. An attempt is made to show that this assumption is valid and that the covariances between the different observations arising from representativity error are negligible. Overall, the spatially averaged nighttime AATSR dual-view three-channel bulk SST observations for 2003 are shown to have a very small standard deviation of error of 0.16 K, whereas the buoy SSTs have an error of 0.23 K and the AMSR-E SST observations have an error of 0.42 K.

## 1. Introduction

Sea surface temperatures (SSTs) derived from satellite-borne instruments have some advantages over traditional in situ SST measurements from buoys and ships. First, they provide global coverage, which is important in regions with sparse in situ observations. Second, some satellite instruments can give well-calibrated, accurate skin SST observations with a global and temporal consistency that is not possible from in situ measurements. Satellite datasets are also beginning to span periods long enough to detect long-term changes in SST. However, it is important to understand the error characteristics of these data.

This study investigates the errors in SST observations from three different sources: infrared SSTs from the Advanced Along-Track Scanning Radiometer (AATSR), microwave SST observations from the Advanced Microwave Scanning Radiometer for Earth Observing System (EOS; AMSR-E), and in situ SST observations from drifting and moored buoys. A three-way analysis is performed in which all three SST types are collocated using data during all of 2003. Other examples of error derivation using three-way collocation include Stoffelen (1998) and Blackmore et al. (2007). These three observation types are complementary, with each having its own strengths and weaknesses.

In situ SST measurements are affected by the varying depth of measurement according to buoy type. Also, the lack of maintenance of in situ instruments, which mainly affects drifting buoys, contributes to variations in the accuracy of in situ SST observations. Additionally, there are large geographic regions not covered by the buoy network (Emery et al. 2001). Errors related to satellite-derived SST observations include cloud contamination, aerosol, and inadequacies of the retrieval process. Microwave SST retrievals can be obtained in areas that are cloudy, though not in those that are precipitating, and so provide better coverage than the infrared sensors, although generally at lower spatial resolution.

An additional issue, applicable to all three observation types, is to consider the depth at which the observation is actually representative. For in situ observations, we generally assume the depth to be around 1 m, although this will vary. Infrared satellite observations retrieve the radiative skin SST, which is only 1 *μ*m thick, whereas microwave satellite observations retrieve an SST a couple of millimeters below the skin layer of the ocean (Donlon et al. 2004). In the analyses, we consider these differences and correct for them where possible.

The three SST observation types are collocated, and statistics of the differences between each pair of observation types are computed. The standard deviations of these differences are used within a statistical method to calculate the error of each observation type, assuming uncorrelated errors. A number of experiments are performed using slightly different collocation criteria to assess whether the results are robust and, therefore, whether the various assumptions are justified.

## 2. Description of sea surface temperature datasets

### a. Advanced Along-Track Scanning Radiometer

AATSR was launched aboard the sun-synchronous *Envisat* in March 2002. AATSR has three infrared channels centered in the atmospheric windows at 3.7, 10.8, and 12 *μ*m, plus channels in the visible–near-infrared part of the spectrum used for cloud detection. The instrument has an inclined conical scan that enables it to observe the surface from two different angles, a nadir view and a forward view (∼55° from zenith), within a few minutes of each other to allow for effective correction of atmospheric absorption and aerosol. The instrument is designed to produce *dual-view* SST retrievals with an accuracy of better than 0.3 K and a long-term stability of better than 0.1 K decade^{−1} (Llewellyn-Jones et al. 2001), which is required of observing systems for climate change detection purposes (Allen et al. 1994). The AATSR instrument continues the collection of high-quality dual-view SSTs begun by the Along-Track Scanning Radiometers ATSR-1 and ATSR-2, launched on the first and second *European Remote Sensing Satellites* (*ERS-1* and *ERS-2*) in 1991 and 1995, respectively.

The measurements are most closely related to the radiative skin temperature (skin SST), which is usually cooler than the subskin by more than 0.1 K (e.g., Hasse 1963; Saunders 1967; Robinson et al. 1984; Murray et al. 2000; Donlon et al. 2002; Horrocks et al. 2003). The skin SSTs are retrieved using physically based algorithms derived from radiative transfer models (e.g., Závody et al. 1995; Merchant et al. 1999). Two types of dual-view skin SST retrieval are possible: retrievals using all three infrared channels (D3), which are available only at night because of the solar contribution to the 3.7-*μ*m channel during the day, and retrievals using just the 10.8- and 12-*μ*m channels (D2), which are available during both day and night. The AATSR data used for this study are supplied as an averaged clear-sky radiance product at a spatial resolution of 10 arc min, although 1-km resolution data are available from AATSR. In this study, D3 retrievals have been used [derived using the European Space Agency (ESA) “prelaunch” operational coefficients], which are expected to be the most accurate retrievals because of the availability of the extra shortwave infrared channel. The data supplied from ESA were already cloud screened using the operational system based on Závody et al. (2000).

For climate purposes, we require the bulk temperature, the temperature of the ocean at around 1 m in depth, which provides a more comparable measurement when looking at other climate datasets, such as the Hadley Centre Sea Ice and Sea Surface Temperature Analysis (HadISST) (Rayner et al. 2003), than skin SST and is more representative of the overall heat capacity of the upper layers of the ocean (Harris et al. 1998). The AATSR skin SSTs used in this study have been processed to a pseudo–bulk SST using the Fairall model (Fairall et al. 1996). The Fairall model was an extension of work by Saunders (1967) to treat both shear and convectively driven turbulence together through their contribution to the near-surface turbulent kinetic energy dissipation rate, which directly affects the cool skin-layer thickness. Further information on applying a skin effect model to (A)ATSR skin SSTs can be found in Horrocks et al. (2003). Figure 1 shows an example of nighttime AATSR bulk (D3) SSTs for July 2003.

### b. In situ observations

In this study, in situ SST observations have been obtained by extracting global moored and drifting buoy SST measurements obtained via the Global Telecommunications System. The buoy SSTs are collocated to the AATSR 10 arc min cell within a 3-h time window to provide an AATSR–buoy matchup dataset. Figure 2 shows a time series of weekly mean AATSR SST–buoy SST differences for 2003. The AATSR skin and bulk SSTs used within this plot are a combination of day and night D2 and D3 retrievals and are not bias corrected. The time series shows that the weekly mean differences are well within 0.5 K of each other. An obvious anomaly, which occurs in November 2003 when the AATSR SSTs are much cooler than the buoy SST, is due to two matchups where the AATSR SSTs are around 1 K cooler than the buoy SSTs. These anomalies could be investigated in more detail using 1-km AATSR data, but this is beyond the scope of this paper. The time series shows that the AATSR bulk SSTs are warmer than the AATSR skin SSTs, as expected. However, the AATSR skin SSTs are closer to the buoy SSTs than the AATSR bulk SSTs because of a 0.2-K warm bias in the AATSR skin SSTs. This warm bias, which is especially significant in the AATSR D3 retrievals, has been observed by analyzing the AATSR–buoy matchup dataset. For this bias evaluation, the matchup time window is reduced to 1 h, and the buoy “bulk” SSTs are converted to a buoy “skin” SST using the Fairall model, during nighttime only, to evaluate the biases on the D2 and D3 retrievals. The D3 biases are found to be ∼+0.21 K with respect to in situ observations, whereas the D2 biases are found to be ∼+0.05 K. More AATSR validation results against in situ SSTs and other SST datasets can be found in O’Carroll et al. (2006).

### c. Advanced Microwave Scanning Radiometer

The AMSR-E instrument is onboard the sun-synchronous *Aqua* satellite that was launched in May 2002. The instrument was provided to the National Aeronautics and Space Administration (NASA) by the National Space Development Agency of Japan (NASDA). Global and daily AMSR-E version 4 products of SST, wind speed, atmospheric water vapor, cloud water, and rain rate were obtained at 0.25° spatial resolution. Since the release of the AMSR-E version 5 products, a subset of the data has been reprocessed; for experiment 1, the standard deviations were found to be within 0.03 K, and the overall derived standard deviation of error within 0.02 K, of the values obtained with the version 4 products. The ability of AMSR-E to view the surface through nonprecipitating clouds is a major advantage over traditional cloud-free infrared measurements of SST. Details of the AMSR SST algorithm are given in Wentz and Meissner (2000). The global SSTs have been retrieved as daily averaged files from the Remote Sensing Systems (RSS) Web site (online at http://www.ssmi.com). The microwave-derived SSTs are representative of a depth of a few millimeters. RSS has applied a method for correcting a flaw in the AMSR-E calibration (see Wentz et al. 2003 for details). It should be noted that RSS has also applied bias corrections to the AMSR-E SSTs (additional information available online at http://www.ssmi.com), but the resultant SSTs remain representative of the same depth. Figure 3 shows an example of AMSR-E SSTs for July 2003.

## 3. Method of comparison

The AATSR bulk SSTs for 2003 used in this study have been previously validated against buoy SSTs (Watts et al. 2004). These results show that the mean AATSR D3 minus buoy SST is 0.22 K with a standard deviation of less than 0.3 K. Additionally, the results show that there is a difference of 0.14 K between D2 and D3 retrievals at night where the D3 SSTs are warmer. Other validation exercises of AATSR SSTs have been performed using ship radiometer data (summarized in Corlett et al. 2006), which confirm the warm bias in the AATSR SSTs.

A matchup database between AATSR SSTs and buoy SSTs was produced containing collocated observations. The matchup criteria were that the collocated observations occur within 3 h of each other and that the buoy SSTs were located within the AATSR 10 arc min grid. Throughout 2003, more than 16 000 day and nighttime matchups between AATSR SSTs and buoy SSTs were obtained. A map of their distribution is shown in Fig. 4.

These collocated observations were then compared to daily AMSR-E SSTs at 0.25° spatial resolution. Each AATSR SST location was matched to the nearest AMSR-E SST cell, that is, the cell within which the AATSR–buoy matchup is located.
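The matchup criteria described above can be sketched as follows. This is a minimal illustration rather than the processing actually used: the function names, the grid origin at 90°S, 180°W, and the cell-index convention are assumptions made for the sketch; only the 10 arc min and 0.25° cell sizes and the 3-h window come from the text.

```python
from datetime import datetime, timedelta
import math

AATSR_CELL_DEG = 10.0 / 60.0        # 10 arc min AATSR cell (from the text)
AMSRE_CELL_DEG = 0.25               # AMSR-E product resolution (from the text)
MAX_TIME_DIFF = timedelta(hours=3)  # AATSR-buoy matchup window (from the text)


def cell_index(lat, lon, cell_deg):
    """Return (row, col) of the regular grid cell containing (lat, lon).

    The grid origin at 90S, 180W is a hypothetical choice for this sketch.
    """
    row = int(math.floor((lat + 90.0) / cell_deg))
    col = int(math.floor(((lon + 180.0) % 360.0) / cell_deg))
    return row, col


def is_aatsr_buoy_matchup(aatsr_lat, aatsr_lon, aatsr_time,
                          buoy_lat, buoy_lon, buoy_time):
    """True if the buoy lies in the AATSR cell and is within 3 h of it."""
    same_cell = (cell_index(aatsr_lat, aatsr_lon, AATSR_CELL_DEG)
                 == cell_index(buoy_lat, buoy_lon, AATSR_CELL_DEG))
    close_in_time = abs(aatsr_time - buoy_time) <= MAX_TIME_DIFF
    return same_cell and close_in_time


def amsre_cell_for(lat, lon):
    """Index of the 0.25 degree AMSR-E cell that contains the matchup."""
    return cell_index(lat, lon, AMSRE_CELL_DEG)
```

A matchup that passes `is_aatsr_buoy_matchup` would then be paired with the AMSR-E value stored at `amsre_cell_for(aatsr_lat, aatsr_lon)` in the daily file.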

Before statistics on the differences were calculated between these three-way daily matchups, various quality-control processes were performed on the observations. Only nighttime AATSR observations were used to reduce the effect of diurnal warming on the observations. In addition, a thermocline flag contained within the AATSR–buoy matchup database was interrogated to see if a diurnal thermocline, created in scenarios of high insolation and low wind speed, was likely. This thermocline flag was created in the initial processing of the AATSR observations at the Met Office within which a diurnal thermocline model is run based on Kantha and Clayson (1994), although this procedure is only relevant during the day.

A buoy quality-control flag is contained in the AATSR–buoy matchup database, which indicates whether the buoy SST observation is deemed to be of high quality. The flag is set within the processing of the AATSR–buoy matchup database. Several processes affect whether the flag is set. First, a blacklist of known problem buoys is referenced. The list also contains details of inland buoys and lakes that are not used in the analysis. Second, during the processing of the AATSR–buoy matchups, a week’s worth of buoy SSTs is compared to the Met Office NWP global SST analysis and values of observation minus analysis are computed. Buoys that showed a mean weekly bias of more than 1.2 K or a standard deviation of greater than 0.6 K compared to the NWP SST were not used within the analysis. More information on the buoy quality-control procedure is given in O’Carroll et al. (2006).
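The two checks described above might be sketched as below. The function and argument names are hypothetical, and two details are assumptions not stated in the text: the bias threshold is applied to the absolute value of the mean difference, and the sample (n − 1) standard deviation is used. The 1.2-K and 0.6-K thresholds are those quoted above.

```python
import numpy as np

MAX_WEEKLY_BIAS_K = 1.2  # limit on the mean of (buoy - analysis), from the text
MAX_WEEKLY_STD_K = 0.6   # limit on the std of (buoy - analysis), from the text


def buoy_passes_qc(buoy_id, buoy_sst, analysis_sst, blacklist):
    """Return True if one week's SSTs from a buoy pass both checks.

    buoy_sst and analysis_sst are equal-length sequences (K) for one week;
    blacklist is a set of identifiers of known problem buoys.
    """
    if buoy_id in blacklist:
        return False
    diff = (np.asarray(buoy_sst, dtype=float)
            - np.asarray(analysis_sst, dtype=float))
    bias = diff.mean()
    # Assumption: sample standard deviation; a single report has zero spread.
    spread = diff.std(ddof=1) if diff.size > 1 else 0.0
    return abs(bias) <= MAX_WEEKLY_BIAS_K and spread <= MAX_WEEKLY_STD_K
```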

The AATSR skin SSTs contained within the matchups are bias corrected before use. The bias correction is calculated by observing the difference between nighttime AATSR skin SSTs and the buoy SSTs, which have been converted to a buoy skin SST using the Fairall model (Horrocks et al. 2003). The time difference between the selected collocated observations is less than 1 h to increase the confidence of the quality of the matchup. The derived bias correction to be applied to the D3 AATSR skin SSTs is −0.21 K. The bulk SSTs are then computed from the bias-corrected skin SSTs.

A statistical analysis is performed on these quality-controlled AATSR–buoy–AMSR matchups. The differences between the collocated AATSR and buoy SSTs are computed and the mean difference and standard deviation of all the differences are assessed using a three-sigma standard deviation test to remove outliers. The same method is applied to derive the mean difference and standard deviation between the buoy and AMSR-E SSTs, and also the AATSR and AMSR-E SSTs.
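A minimal sketch of such a difference statistic with a three-sigma outlier test is given below. It assumes a single rejection pass about the initial mean, which the text does not specify (an iterative scheme would be an equally plausible reading); the function name is hypothetical.

```python
import numpy as np


def difference_stats(sst_a, sst_b, n_sigma=3.0):
    """Mean, standard deviation, and count of sst_a - sst_b after one
    n-sigma outlier rejection pass (single pass is an assumption)."""
    diff = np.asarray(sst_a, dtype=float) - np.asarray(sst_b, dtype=float)
    mean, std = diff.mean(), diff.std(ddof=1)
    # Keep only differences within n_sigma standard deviations of the mean.
    kept = diff[np.abs(diff - mean) <= n_sigma * std]
    return kept.mean(), kept.std(ddof=1), int(kept.size)
```

The same function would be called once per pairing (AATSR–buoy, buoy–AMSR-E, and AATSR–AMSR-E), and the three resulting standard deviations are the inputs to the error analysis.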

Six different experiments were performed with slightly different areas and AATSR–buoy matchup criteria. The use of buoy types is defined in Table 1. When dividing the matchups spatially, the following two different global regions have been used: region 1 (0°–90°N, 0°–180°W) and region 2 (90°S–0°, 0°–180°). By selecting these different experiment types, we can observe whether the error analysis is consistent using these different regions and matchup criteria. In addition, the method of collocation was slightly altered for experiments 7 and 8, with the rest of the criteria for the two additional experiments remaining the same as experiment 1. For the main experiments (1–6), the AMSR-E observations are collocated to the nearest ⅙° grid box of the AATSR–buoy matchup. However, for experiments 7 and 8, four AMSR-E observations are taken surrounding the AATSR matchup cells and interpolated—first to the AATSR observation location (experiment 7) and second to the buoy observation location (experiment 8).
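For experiments 7 and 8, the interpolation from four surrounding AMSR-E values could, for example, be bilinear. The text does not state the interpolation scheme, so the bilinear form and all names below are assumptions made purely for illustration.

```python
def bilinear(lat, lon, lat0, lat1, lon0, lon1, sst00, sst01, sst10, sst11):
    """Interpolate to (lat, lon) from four surrounding cell-center values.

    sstRC is the value at (lat R, lon C); for example, sst01 is the value
    at (lat0, lon1). Bilinear weighting is an assumed scheme.
    """
    ty = (lat - lat0) / (lat1 - lat0)
    tx = (lon - lon0) / (lon1 - lon0)
    south = sst00 * (1.0 - tx) + sst01 * tx   # along the lat0 edge
    north = sst10 * (1.0 - tx) + sst11 * tx   # along the lat1 edge
    return south * (1.0 - ty) + north * ty
```

Experiment 7 would evaluate this at the AATSR observation location and experiment 8 at the buoy location, with the remaining criteria unchanged.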

## 4. Theory of error analysis

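In the appendix, we derive a set of simultaneous equations for estimating the error variances *σ*^{2}_{*i*} for observation type *i* (where *i* = 1, 2, or 3) for an ensemble of collocations of observation triplets:

$$
\begin{aligned}
\sigma_1^2 &= \tfrac{1}{2}\left(V_{12} + V_{31} - V_{23}\right) + r_{12}\sigma_1\sigma_2 + r_{31}\sigma_3\sigma_1 - r_{23}\sigma_2\sigma_3, \\
\sigma_2^2 &= \tfrac{1}{2}\left(V_{23} + V_{12} - V_{31}\right) + r_{23}\sigma_2\sigma_3 + r_{12}\sigma_1\sigma_2 - r_{31}\sigma_3\sigma_1, \\
\sigma_3^2 &= \tfrac{1}{2}\left(V_{31} + V_{23} - V_{12}\right) + r_{31}\sigma_3\sigma_1 + r_{23}\sigma_2\sigma_3 - r_{12}\sigma_1\sigma_2,
\end{aligned}
\tag{1}
$$

where *V*_{*ij*} is the variance of the difference between observation types *i* and *j*, and *r*_{*ij*} is the correlation of error between observation types *i* and *j*.

If the errors in the three observation types are uncorrelated, then *r*_{*ij*} = 0 for all *i* ≠ *j*, and so Eq. (1) becomes

$$
\begin{aligned}
\sigma_1^2 &= \tfrac{1}{2}\left(V_{12} + V_{31} - V_{23}\right), \\
\sigma_2^2 &= \tfrac{1}{2}\left(V_{23} + V_{12} - V_{31}\right), \\
\sigma_3^2 &= \tfrac{1}{2}\left(V_{31} + V_{23} - V_{12}\right).
\end{aligned}
\tag{2}
$$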

For Eq. (2) to be a reasonable approximation to Eq. (1), we require that the covariances of error between the observation types are small relative to the error variances. The validity of this approximation is not obvious, for the reasons discussed in the appendix. We will proceed in section 5 by tentatively making this assumption, and we will discuss its implications in section 6.
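Under the uncorrelated-error assumption, each error variance is half of (the sum of the two difference variances involving that observation type, minus the third), for example *σ*^{2}_{1} = ½(*V*_{12} + *V*_{31} − *V*_{23}). A minimal sketch of this estimator follows. The function name is hypothetical, and the data are synthetic (a common "truth" plus independent Gaussian noise), generated only to show that known error levels are recovered; they are not data from this study.

```python
import numpy as np


def three_way_errors(x1, x2, x3):
    """Estimate the error standard deviation of each of three collocated
    observation types, assuming their errors are mutually uncorrelated."""
    x1, x2, x3 = (np.asarray(x, dtype=float) for x in (x1, x2, x3))
    # Variances of the pairwise differences.
    v12 = np.var(x1 - x2, ddof=1)
    v23 = np.var(x2 - x3, ddof=1)
    v31 = np.var(x3 - x1, ddof=1)
    var1 = 0.5 * (v12 + v31 - v23)
    var2 = 0.5 * (v23 + v12 - v31)
    var3 = 0.5 * (v31 + v23 - v12)
    # Sampling noise can push a small variance estimate below zero; clip it.
    return tuple(np.sqrt(max(v, 0.0)) for v in (var1, var2, var3))


# Synthetic check: the noise levels are chosen for illustration only.
rng = np.random.default_rng(0)
truth = 290.0 + 5.0 * rng.standard_normal(200_000)
true_sigmas = (0.16, 0.23, 0.42)
obs = [truth + s * rng.standard_normal(truth.size) for s in true_sigmas]
estimated = three_way_errors(*obs)
```

Because only differences between observation types enter the calculation, the spread of the underlying truth has no effect on the estimates; any bias between the types is removed when the variance is taken about the mean difference.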

## 5. Error analysis

The statistics of differences between AATSR, buoy, and AMSR-E SSTs are presented in Table 2 for experiments 1–8, as described in Table 1. The results show that the mean AATSR bulk (D3) SSTs minus buoy SSTs are close to zero for all experiments. This is expected for this study because the AATSR SSTs have been bias corrected with respect to buoy SSTs. The standard deviation of their differences is also very low at 0.27–0.30 K.

The AMSR-E SSTs are 0.02 K cooler than the AATSR bulk (D3) SSTs and the buoy SSTs for experiment 1, and similar results are obtained for the remaining experiments, except for experiment 5. AMSR-E measures a subsurface SST, so it is expected that AMSR-E SSTs should be cooler than the bulk SST measurements. Both measurement types have been bias corrected, leading to a near-zero difference.

By inserting the standard deviations presented in Table 2 into Eqs. (2), we can calculate the standard deviation of error on each observation type for each different experiment. These errors are presented in Table 3. For experiment 1, AATSR bulk (D3) SSTs have errors of ∼0.16 K, the buoy SSTs have errors of ∼0.23 K, and the AMSR-E SSTs have errors of ∼0.42 K. All of the experiments have errors that are broadly consistent with each other, suggesting that the assumption that all of the errors are uncorrelated is quite good. The smallest error on the AATSR SSTs is estimated for experiment 2 at 0.12 K and increases to its maximum of 0.16 K for experiments 1, 6, and 8.
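As a rough consistency check on these numbers (a back-calculation under the uncorrelated-error assumption, not a value taken from Table 2), the experiment 1 errors for AATSR (subscript 1) and buoys (subscript 2) imply a standard deviation of the AATSR minus buoy difference of

```latex
\sqrt{V_{12}} = \sqrt{\sigma_1^2 + \sigma_2^2}
              = \sqrt{0.16^2 + 0.23^2}\ \mathrm{K}
              = \sqrt{0.0785}\ \mathrm{K}
              \approx 0.28\ \mathrm{K},
```

which lies within the 0.27–0.30 K range quoted above for the AATSR minus buoy differences.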

We have neglected terms involving the correlation of errors between different observations in Eq. (2), which is discussed further in the appendix. We need to assess whether this has an impact on the sizes of derived errors. For example, if we define the truth on the scale of the buoy observations, then there is no representativity error for buoys and so *r*_{12} = *r*_{23} = 0, in which the subscripts refer to 1 = AATSR, 2 = buoys, and 3 = AMSR-E. Examining Eq. (1) with *r*_{12} = *r*_{23} = 0, we can argue that *r*_{31} must be positive. Therefore, *σ*^{2}_{1} will be underestimated, and *σ*^{2}_{3} will be underestimated by the same amount. To further explain the positive nature of *r*_{31}, consider that if the average SST in the area around the buoy differs from the SST at the buoy, then AATSR and AMSR-E, which both sense some sort of areal average, will both sense this. Therefore, their differences from the “truth” (when defined as buoy location value) will tend to be positively correlated. In addition, the same effect will tend to occur when we consider the difference in depth of the AATSR and AMSR-E observations compared to buoy SSTs, and it may be even larger.
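To make the sign argument explicit, the following worked step may help. It is a sketch that follows from the general relation *V*_{*ij*} = *σ*^{2}_{*i*} + *σ*^{2}_{*j*} − 2*r*_{*ij*}*σ*_{*i*}*σ*_{*j*} with *r*_{12} = *r*_{23} = 0, as assumed in the paragraph above:

```latex
\begin{aligned}
\sigma_1^2 &= \tfrac{1}{2}\left(V_{12} + V_{31} - V_{23}\right) + r_{31}\sigma_3\sigma_1, \\
\sigma_2^2 &= \tfrac{1}{2}\left(V_{23} + V_{12} - V_{31}\right) - r_{31}\sigma_3\sigma_1, \\
\sigma_3^2 &= \tfrac{1}{2}\left(V_{31} + V_{23} - V_{12}\right) + r_{31}\sigma_3\sigma_1.
\end{aligned}
```

With *r*_{31} > 0, dropping the final term underestimates the AATSR and AMSR-E error variances by *r*_{31}*σ*_{3}*σ*_{1} each and overestimates the buoy error variance by the same amount.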

This analysis shows that, for the results to stand, we need to be confident in the assumption that the potential covariances of error between the observations are negligible. To test this assumption, we can analyze the error of representativeness by changing the method of collocation between the observation pairings. The collocation has been done by the two similar methods explained in section 3. Compiling the same difference statistics as for the other experiments gives an error of 0.15 K on the AATSR observations for experiment 7 and 0.16 K for experiment 8, which are close to the error of 0.16 K for experiment 1. Additionally, the derived errors of buoy SSTs and AMSR-E SSTs for experiments 7 and 8 support those derived from experiment 1. These results indicate that the error of representativity is small, and that it is reasonable to ignore the covariances of errors between the observations when deriving Eq. (2).

Global mean difference maps of nighttime AATSR bulk D3 SSTs minus nighttime AMSR-E SSTs have been studied to look at the global variation in differences. Figure 5 shows a global map for July 2003. For most of the globe, the differences are within ±0.5 K. However, at around 45°N, AATSR is cooler than AMSR-E by up to 2 K and at around 45°S, AATSR is warmer than AMSR-E by up to 2 K. The observed pattern is seen for other months throughout the year and is currently unexplained. However, the pattern is thought to be an artifact of the AMSR-E SSTs because AATSR comparisons against other SST datasets (O’Carroll et al. 2006; Watts et al. 2004) do not show this pattern.

The largest differences at around ±45° are also in the regions of the highest global wind speeds, most significantly in the southern oceans (see Fig. 6). Perhaps sea state is affecting the AMSR-E SST retrievals. Figure 7 shows the global AATSR minus AMSR-E difference versus binned wind speed (from the Met Office NWP model) for July 2003. The graph shows that at lower wind speeds, a larger difference between AATSR and AMSR-E SSTs occurs, with AATSR being the warmer. This relationship does not relate to the problem of larger differences between AATSR and AMSR-E at higher latitudes. Warmer AATSR SSTs at lower wind speeds could be due to diurnal warming effects of the sea surface. However, because only nighttime data are used in this analysis, and AATSR bulk SSTs are discarded if they are thought to be affected by diurnal warming effects according to Kantha and Clayson (1994), other effects must be present. Figure 7 also shows the results for the tropics (±30° latitude). The global mean difference between nighttime bias-corrected AATSR bulk D3 SSTs and nighttime AMSR-E SSTs for July 2003 is 0.03 K, with a standard deviation of 0.47 K. Because AMSR-E measures a subskin SST, which will be cooler than the AATSR bulk SST during nighttime, the global results presented in Fig. 7 are consistent with the results of skin–bulk SST difference versus wind speed in Murray et al. (2000).
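A comparison of the kind shown in Fig. 7 can be sketched by averaging the AATSR minus AMSR-E differences within wind-speed bins. The function name and the bin edges are hypothetical; the text does not state the binning that was used.

```python
import numpy as np


def mean_difference_by_wind_bin(diff_k, wind_ms, bin_edges_ms):
    """Mean SST difference (K) in each wind-speed bin [edge_i, edge_{i+1}).

    Bins that contain no matchups are returned as NaN.
    """
    diff_k = np.asarray(diff_k, dtype=float)
    wind_ms = np.asarray(wind_ms, dtype=float)
    edges = np.asarray(bin_edges_ms, dtype=float)
    # Bin index per matchup; -1 or edges.size - 1 means outside the edges.
    which = np.digitize(wind_ms, edges) - 1
    means = np.full(edges.size - 1, np.nan)
    for i in range(edges.size - 1):
        in_bin = which == i
        if in_bin.any():
            means[i] = diff_k[in_bin].mean()
    return means
```

Restricting the inputs to matchups within ±30° latitude before calling the function would give the corresponding tropical curve.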

## 6. Conclusions

Overall, the error on nighttime AATSR bulk (D3) SST observations for 2003 was evaluated at 0.16 K, whereas the error on the collocated buoy SSTs was 0.23 K and that on the AMSR-E SSTs was 0.42 K. Varying the collocation criteria while analyzing the observation errors produces similar values of error throughout for each observation type, giving confidence both in the results and in the validity of the assumption that the errors are not correlated (specifically for representativity errors). There is good reason to assume that the complete (A)ATSR time series has a similar accuracy, with the exception of the ATSR-1 data after the loss of the 3.7-*μ*m channel.

The error analysis shows that the AATSR SST retrievals are of the highest accuracy, followed by the in situ SST observations and then the AMSR-E SST observations. However, the AMSR-E SST observations do have the advantage of being able to observe the surface through cloud, giving better coverage. Because most centers are aiming toward compiling SST analyses using both infrared and microwave SST as well as in situ sources, it is essential to gain knowledge of the error characteristics of each measurement type.

## Acknowledgments

This work was carried out in support of the Climate Prediction Programme, funded by the U.K. Department for Environment, Food and Rural Affairs (DEFRA). AATSR data are provided by the European Space Agency (ESA; online at http://www.envisat.esa.int). AMSR-E data are produced by Remote Sensing Systems and sponsored by the NASA Earth Science REASoN DISCOVER Project and the AMSR-E Science Team (data are available online at http://www.remss.com).


### APPENDIX

#### Error Analysis for Three-Way Collocation Statistics

##### Basic theory

In this section, we develop the theory relevant to the analysis of statistics of differences between collocated observations of three different types.

The error in observation *x*_{*i*}, of type *i*, can be expressed as

$$
x_i = x_T + b_i + \varepsilon_i, \tag{A1}
$$

where *x*_{*T*} is the true value of variable *x*, *b*_{*i*} is the bias (mean error) in the observation, and *ɛ*_{*i*} is the random error in the observation (which, by definition, has zero mean but may be non-Gaussian).

We shall return below to the nontrivial question of what we mean by the “true value of the variable.” For the present, we will only assume that we have a consistent definition of the true value for comparison with each observation.

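For a set of three collocated observations, of types *i* = 1, 2, and 3, we can write the following corresponding set of equations for their errors:

$$
\begin{aligned}
x_1 &= x_T + b_1 + \varepsilon_1, \\
x_2 &= x_T + b_2 + \varepsilon_2, \\
x_3 &= x_T + b_3 + \varepsilon_3.
\end{aligned}
\tag{A2}
$$

Now consider each set of three observations as three sets of pairs. The difference between observations *i* and *j* is given by

$$
x_i - x_j = b_i - b_j + \varepsilon_i - \varepsilon_j. \tag{A3}
$$

For an ensemble of such sets of observations, the mean difference between observations of type *i* and *j* (with an overbar denoting the ensemble mean) is

$$
\overline{x_i - x_j} = b_i - b_j, \tag{A4}
$$

and the variance of the difference between these two observation types is

$$
V_{ij} = \overline{(\varepsilon_i - \varepsilon_j)^2}
       = \overline{\varepsilon_i^2} + \overline{\varepsilon_j^2} - 2\,\overline{\varepsilon_i \varepsilon_j}. \tag{A5}
$$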

Therefore,

$$
V_{ij} = \sigma_i^2 + \sigma_j^2 - 2 r_{ij}\sigma_i\sigma_j, \tag{A6}
$$

where *σ*^{2}_{*i*} is the variance of the error in observation type *i*, and *r*_{*ij*} is the correlation of error between types *i* and *j*.

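Stating Eq. (A6) explicitly for the three sets of observation pairs gives

$$
\begin{aligned}
V_{12} &= \sigma_1^2 + \sigma_2^2 - 2 r_{12}\sigma_1\sigma_2, \\
V_{23} &= \sigma_2^2 + \sigma_3^2 - 2 r_{23}\sigma_2\sigma_3, \\
V_{31} &= \sigma_3^2 + \sigma_1^2 - 2 r_{31}\sigma_3\sigma_1.
\end{aligned}
\tag{A7}
$$

The three simultaneous Eqs. (A7) can be solved to give the variance of error in each observation type as follows:

$$
\begin{aligned}
\sigma_1^2 &= \tfrac{1}{2}\left(V_{12} + V_{31} - V_{23}\right) + r_{12}\sigma_1\sigma_2 + r_{31}\sigma_3\sigma_1 - r_{23}\sigma_2\sigma_3, \\
\sigma_2^2 &= \tfrac{1}{2}\left(V_{23} + V_{12} - V_{31}\right) + r_{23}\sigma_2\sigma_3 + r_{12}\sigma_1\sigma_2 - r_{31}\sigma_3\sigma_1, \\
\sigma_3^2 &= \tfrac{1}{2}\left(V_{31} + V_{23} - V_{12}\right) + r_{31}\sigma_3\sigma_1 + r_{23}\sigma_2\sigma_3 - r_{12}\sigma_1\sigma_2.
\end{aligned}
\tag{A8}
$$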

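If the errors in the three observation types are uncorrelated, then *r*_{*ij*} = 0 for all *i* ≠ *j*. Therefore, Eq. (A8) becomes

$$
\begin{aligned}
\sigma_1^2 &= \tfrac{1}{2}\left(V_{12} + V_{31} - V_{23}\right), \\
\sigma_2^2 &= \tfrac{1}{2}\left(V_{23} + V_{12} - V_{31}\right), \\
\sigma_3^2 &= \tfrac{1}{2}\left(V_{31} + V_{23} - V_{12}\right).
\end{aligned}
\tag{A9}
$$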

This allows us to estimate the error variance in the three different observation types from the observation difference statistics. However, the validity and accuracy of this method depend crucially on the assumption of the independence of errors, which is examined further in the following section.
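The relationship between the general solution and its uncorrelated form can be checked numerically. The sketch below builds the three difference variances from chosen error standard deviations and error correlations, applies the uncorrelated estimate, and confirms that adding back the correlation terms recovers the true variances. All names and numerical values are hypothetical and chosen only for illustration; they are not results from this study.

```python
import numpy as np


def difference_variances(sigmas, r12, r23, r31):
    """V12, V23, V31 implied by error std devs and error correlations."""
    s1, s2, s3 = sigmas
    v12 = s1**2 + s2**2 - 2.0 * r12 * s1 * s2
    v23 = s2**2 + s3**2 - 2.0 * r23 * s2 * s3
    v31 = s3**2 + s1**2 - 2.0 * r31 * s3 * s1
    return v12, v23, v31


def uncorrelated_estimate(v12, v23, v31):
    """Error variances from the difference variances, taking all r_ij = 0."""
    return (0.5 * (v12 + v31 - v23),
            0.5 * (v23 + v12 - v31),
            0.5 * (v31 + v23 - v12))


def correlation_terms(sigmas, r12, r23, r31):
    """Terms neglected when the error correlations are set to zero."""
    s1, s2, s3 = sigmas
    c12, c23, c31 = r12 * s1 * s2, r23 * s2 * s3, r31 * s3 * s1
    return (c12 + c31 - c23, c23 + c12 - c31, c31 + c23 - c12)


# Illustrative (hypothetical) error levels and correlations.
sigmas = (0.2, 0.3, 0.4)
r = dict(r12=0.1, r23=0.2, r31=0.3)
estimate = uncorrelated_estimate(*difference_variances(sigmas, **r))
neglected = correlation_terms(sigmas, **r)
true_variances = tuple(s**2 for s in sigmas)
```

The size of each entry of `neglected` relative to the corresponding true variance indicates how far the uncorrelated estimate departs from the truth for a given set of error correlations.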

##### Some comments on the concept of the “true value” of a variable and the correlation of observation error

Equation (A9) is potentially very powerful because it offers a method of separating out the (usually elusive) error variances of different observation types, given statistics of the observation differences, which are easily obtained from samples of real data. However, the equation is also problematic because it suggests that this separation can be done without carefully defining what we mean by error in each observation, that is, what we mean by the true value of the variable. If, as in this paper, we are concerned with observations of a geophysical variable made by three different measurement techniques, then we might expect that our estimate of the error variance for observations made using any of these techniques should depend on how we define the true value, for example, whether it is a point value or an areal average. The true value must be of some variable (e.g., the temperature either at some level or averaged over some layer) at some space–time sampling. Therefore, the discussion applies equally to differences in the measurement type and to differences in space–time sampling. However, Eq. (A9) suggests that we do not have to define the true value.

The resolution of this paradox lies in the step from Eqs. (A8) to (A9), where we neglect the potential covariances of error between the observations. For the measurement errors themselves, it is usually reasonable to assume that the errors in measurements made by totally independent techniques will be truly independent. However, this is not the only source of error. We must also consider the “error of representativeness,” which captures the difference between the value of the variable on the space–time scale on which it is actually measured and its value on the space–time scale on which we wish to analyze it. For a single observation, the latter scale can be chosen arbitrarily, but whenever we wish to compare two observations, it must be taken into account.

As we change our definition of the true value of the variable, we change the value of the error variance of the observation. However, we also change the values of the error covariances between observations; thus Eq. (A8) continues to hold whatever our definition of the true value might be.

Therefore, Eq. (A8) will always be valid, but Eq. (A9) will only be a reasonable approximation to it in certain circumstances. Our ability to step between Eqs. (A8) and (A9) assumes that we are using a space–time scale for the “true value” that, although we may avoid defining it precisely, is such that the covariances of the errors of representativeness on this scale are negligible when compared with the error variances. In this paper, we proceed tentatively on the assumption that these covariances are negligible, but we also make analyses to determine if this assumption is valid.

## Footnotes

*Corresponding author address:* Anne G. O’Carroll, Met Office, Fitzroy Road, Exeter, Devon EX1 3PB, United Kingdom. Email: anne.ocarroll@metoffice.gov.uk