The National Environmental Satellite, Data, and Information Service (NESDIS) has been operationally generating sea surface temperature (SST) products (TS) from the Advanced Very High Resolution Radiometers (AVHRR) onboard NOAA and MetOp-A satellites since the early 1980s. Customarily, TS are validated against in situ SSTs. However, in situ data are sparse and are not available globally in near–real time (NRT). This study describes a complementary SST Quality Monitor (SQUAM), which employs global level 4 (L4) SST fields as a reference standard (TR) and performs statistical analyses of the differences ΔTS = TS − TR. The results are posted online in NRT. The TS data analyzed are the heritage NESDIS SST products from NOAA-16, -17, -18, and -19 and MetOp-A from 2001 to the present. The TR fields include daily Reynolds, real-time global (RTG), Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA), and Ocean Data Analysis System for Marine Environment and Security for the European Area (MERSEA) (ODYSSEA) analyses. Using multiple fields helps distinguish artifacts in satellite SSTs from those in the L4 products. Global distributions of ΔTS are mapped and their histograms are analyzed for proximity to Gaussian shape. Outliers are handled using robust statistics, and the Gaussian parameters are trended in time to monitor SST products for stability and consistency. Additional TS checks are performed to identify retrieval artifacts by plotting ΔTS versus observational parameters. Cross-platform TS biases are evaluated using double differences, and cross-L4 TR differences are assessed using Hovmöller diagrams. SQUAM results compare well with the customary in situ validation. All satellite products show a high degree of self- and cross-platform consistency, except for NOAA-16, which has flown close to the terminator in recent years and whose AVHRR is unstable.
Sea surface temperature (SST) products have been operationally derived at National Environmental Satellite, Data, and Information Service (NESDIS) from the Advanced Very High Resolution Radiometers (AVHRR) since the early 1980s, employing regression-based multichannel SST (MCSST) and nonlinear SST (NLSST) techniques (McClain et al. 1985; Walton 1988). Satellite SSTs are best validated against in situ radiometers, which also measure skin SST (e.g., Suarez et al. 1997; Donlon et al. 1998; Kearns et al. 2000; Minnett et al. 2001; Noyes et al. 2006). However, individual sea campaigns that collect in situ radiometry data are limited in space and time and their cost is prohibitive, and the long-term routine deployment of radiometers at sea still remains difficult (Donlon et al. 1998, 2002). Therefore, satellite SSTs are customarily validated against in situ SSTs from fixed and drifting buoys (e.g., Walton et al. 1998; Kilpatrick et al. 2001; Brisson et al. 2002; Dong et al. 2006; O’Carroll et al. 2006a,b; Haines et al. 2007; Lazarus et al. 2007; Merchant et al. 2008). However, the global distribution of buoys is sparse and nonuniform in space and time (cf. Garraffo et al. 2001). Furthermore, they originate from different countries and agencies, which use various measurement protocols, thus rendering their quality nonuniform (cf. Emery et al. 2001). Moreover, attaining reliable validation statistics with in situ data typically requires up to a month, still leaving large geographical areas underrepresented.
This study explores an alternative approach for the near-real-time (NRT) monitoring of satellite SST products called the SST Quality Monitor (SQUAM). SQUAM is based on statistical self- and cross-consistency checks that are applied to differences between satellite SST TS and global reference SST fields TR [level 4 (L4) products], ΔTS = TS − TR (Ignatov et al. 2004; Dash et al. 2009). Several different reference fields may be used, from an optimally interpolated blended satellite–in situ analysis (e.g., Reynolds et al. 2002, 2007; Gemmill et al. 2007; Stark et al. 2007, 2008) to single and/or multiple satellite SST analyses, or even a climatological SST (e.g., Bauer and Robinson 1985; Casey and Cornillon 1999). The underlying assumption is that the probability density function of global ΔTS is close to a Gaussian shape (although the distributions of both TS and TR are highly asymmetric). Statistical moments of a Gaussian distribution can thus be used to quality control (QC) the satellite SSTs and monitor them for stability and cross-platform consistency in NRT.
The major premises of the SQUAM approach are that global reference fields cover the world oceans much more fully and uniformly, and that the quality of such “sea truth” is also comparatively more uniform in space and time than that of in situ SST. This is because multiple satellite SST data, used in the production of L4 products, have already undergone extensive QC and have been bias adjusted to match in situ SSTs, which were also quality controlled prior to blending (cf. Reynolds et al. 2007). As a result, the number of “matchups” with L4 fields is more than two orders of magnitude larger, and their geographical coverage and quality are much more uniform than (and yet anchored to) the in situ SSTs. This provides a synoptic global snapshot of satellite SST performance (global maps, histograms, and dependencies of ΔTS), and allows monitoring of the ΔTS global statistics on fine time scales approaching NRT.
Ideally, an L4 product should optimally blend multiple satellite and in situ SSTs into a “true” SST. However, in reality most global SST analyses produced today use AVHRR data; one might therefore question whether comparison against these L4 products provides an independent assessment of the AVHRR SST. To explore sensitivity to the TR field, SQUAM employs several global L4 SSTs, including Reynolds, real-time global (RTG), Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA), and Ocean Data Analysis System for Marine Environment and Security for the European Area (MERSEA) (ODYSSEA). These products are produced by different teams using various blending and optimal interpolation (OI) methods, and with different combinations of satellite (polar and geostationary, and infrared and microwave) and in situ SST data as input. In particular, different L4s use different AVHRR SST products derived from different National Oceanic and Atmospheric Administration (NOAA) and Meteorological Operation (MetOp) platforms, and using different cloud screening and SST algorithms (cf. May et al. 1998; Kilpatrick et al. 2001; Le Borgne et al. 2007; Gemmill et al. 2007).
Currently, SQUAM evaluates NESDIS operational heritage SST products from five AVHRR/3 sensors onboard NOAA-16 (27 February 2001–present), -17 (15 April 2003–present), -18 (16 August 2005–present), and -19 (25 May 2009–present), and MetOp-A (19 September 2007–present). (Note that NESDIS heritage SST products are not input to any current L4 product employed in SQUAM.) SQUAM functions have been automated and operate with minimum manual intervention. Global processing is performed and statistics are posted to a dedicated Web site (available online at http://www.star.nesdis.noaa.gov/sod/sst/squam) within ∼24 h of availability of Main Unit Task (MUT) and L4 products. The main purposes of SQUAM are to identify, in NRT, sensor and algorithm malfunctions, assess cross-platform consistency of products, diagnose artificial dependences, and generate global SST difference maps for highlighting residual clouds.
The paper is organized as follows: Sections 2 and 3 describe the NESDIS heritage MUT SST data and the reference SST fields used. Section 3 also shows a brief comparison between different TR fields. Section 4 details the global QC concept, with emphasis on handling outliers. Monitoring of SST for stability and self- and cross-platform consistency is described in section 5. Section 6 concludes the paper and provides an outlook for the future.
2. NESDIS MUT heritage SST product
a. MUT SST observation data
The NESDIS heritage MUT system (McClain et al. 1985; Walton 1988; McClain 1989) has been in operational use since the 1980s. It first ingests AVHRR 4-km global area coverage (GAC) level 1b data (Goodrum et al. 2003), performs navigation and calibration of raw counts to radiances, and converts top-of-atmosphere radiances to brightness temperatures (BT). Further processing is performed on 2 × 2 GAC pixel arrays (referred to as unit arrays), resulting in an effective spatial resolution of ∼8 km, and the retrievals are restricted to a ±53° view zenith angle. The MUT does not attempt to process every 8-km unit array, because the heritage software was designed in an era when processing every pixel was not feasible. In the nighttime algorithm, only one unit array is chosen from a larger 11 × 11 target array sized to map to the ∼20-km High Resolution Infrared Sounder (HIRS) footprint. This unit array is chosen such that one of the four GAC pixels in the unit array is the warmest pixel in the target (i.e., it has the maximum BT in AVHRR channel 4). The warmest pixel is chosen to maximize the chance of a cloud-free retrieval. In the daytime algorithm, retrieval density varies according to an input table that specifies, for each 10° latitude × 10° longitude box, how many unit arrays it attempts to process within each 11 × 11 target array. SSTs are calculated by applying the regression equations listed in Table 1 using the unit array’s averaged brightness temperatures. Retrievals are saved in platform-specific rotating SST observation (SSTOBS) files, which are archived once a week. At any point in time, SSTOBS files contain ∼8 days of global data.
Along with SST, they also report time, location, view zenith angle (VZA), solar zenith angle (SZA), a day–night flag (based on the threshold of SZA = 75°), relative azimuth angle, reflectances in visible channels and BTs in thermal infrared channels, and the nearest-100-km analyzed field SST derived from the last 24 h of satellite SSTs. More details about MUT products are found in Ignatov et al. (2004). These archived SSTOBS “weekly” files are analyzed in this study.
Figure 1 (top panels) shows time series of the nighttime and daytime number of observations (NOBS) from all five platforms. Initially, more AVHRR pixels are classified as “nighttime” by the MUT system, because of the use of the SZA = 75° day–night threshold (which also contains the twilight zone). However, because of a much heavier subsampling in MUT at night, each weekly SSTOBS file contains only 400 000–500 000 nighttime NOBS, whereas during the daytime, NOBS range from 400 000 to 800 000. (Note that these numbers are somewhat “inflated” because each “weekly” SSTOBS file actually contains ∼8 days of data, with 1 day of overlap. No attempt was made in this study to remove these overlapping days, and thus each point in the time series plots corresponds to one entire 8-day SSTOBS file.) Examples of global nighttime and daytime SST maps from NOAA-18 are shown in Fig. 1 (bottom panels). At night, the spatial coverage is globally more uniform and complete, despite the generally smaller NOBS. Another observation is that the high latitudes are predominantly observed at night, because of the low sun in these areas.
The nighttime NOBS are relatively stable in time, whereas daytime NOBS show large variations partly resulting from the continuous monthly updates of the AVHRR calibration in the visible bands and periodic revisions of the associated cloud thresholds. Also during daytime, the MUT retrieval density can be increased regionally, using a table as previously noted, based on users’ requirements. These changes can be further modulated by the seasonal variations in illumination, sun glint, and clouds in the high-density areas.
Note that the NOAA-16 processing in MUT was suspended in mid-2005, shortly after NOAA-18 became operational, but was then resumed to facilitate multisensor consistency analyses and to better quantify the effects of NOAA-16 orbital evolution and sensor anomalies on the SST product. Also, note that in late 2008, NOAA-16 daytime NOBS decreased and nighttime NOBS increased, resulting from orbital drift. The operational production from NOAA-17 was succeeded by MetOp-A in April 2007 and from NOAA-18 by NOAA-19 in June 2009. Nevertheless, the processing of NOAA-17 and -18 continues, and their SST products are analyzed in SQUAM. The monitoring of SST from multiple platforms in SQUAM enables cross-platform analyses, and may facilitate their future potential blending for improved SST products.
b. Modifications to the original MUT SSTOBS data for SQUAM analyses
SST values are saved in the original heritage MUT SSTOBS files to only one decimal place, whereas VZA and all BTs are available to two decimal places. Also, in the original SSTOBS files, only one climatological reference SST field (Bauer and Robinson 1985) is available. Prior to the SQUAM analyses, two modifications are made to the original SSTOBS data. First, the SSTs are recalculated, using the corresponding regression equations given in Table 1, and stored with two decimal places; second, seven other reference SST fields listed in section 3 are appended.
3. Global reference SST fields used in SQUAM
This section summarizes the seven SST analysis fields appended to MUT data. The Group for High Resolution SST Pilot Project (GHRSST-PP; see Donlon et al. 2007; information online at http://www.ghrsst.org) established a concerted effort toward the generation and reconciliation of high-quality L4 SST fields. The L4 products employed in SQUAM either have been developed within the GHRSST framework or comply with its standards and specifications. The Reynolds and RTG SSTs are normalized to in situ SST and therefore are considered bulk SSTs. The OSTIA product is referred to as a “foundation SST” (Donlon et al. 2007). It minimizes the effect of the diurnal thermocline by using only nighttime satellite data, and daytime data with wind speeds above 6 m s−1. An empirical correction of 0.17 K is applied to convert satellite skin SST to the foundation. ODYSSEA is solely based on nighttime satellite SSTs (not in situ), which are subsequently corrected by 0.17 K, similar to OSTIA, and it is termed a “subskin” product. The input data to all L4 products are listed in Table 2.
a. Reynolds optimal interpolation SSTs
The weekly OI (WOI) SST 1° (180 × 360 grid) data are available for the time period from 1981 to the present in a binary format (online at ftp://ftp.emc.ncep.noaa.gov/cmb/sst/oisst_v2). The data are centered at the middle of the week (Wednesday).
Two daily 0.25° (720 × 1440 grid) OISST products are also available online (see ftp://eclipse.ncdc.noaa.gov/pub). One is a blend between AVHRR and in situ SSTs (DOI_AV), and the other additionally uses the Advanced Microwave Scanning Radiometer for Earth Observing System (EOS) (AMSR-E) SST data (DOI_AA). Both DOI_AV and DOI_AA SST data are reported in several formats, including the network Common Data Form (netCDF). All Reynolds products used Pathfinder SST (Kilpatrick et al. 2001) from January 1985 to December 2005, and operational Naval Oceanographic Office (NAVOCEANO) AVHRR SST (May et al. 1998) from January 2006 onward. The DOI_AA SST has been available since June 2002, after AMSR-E data became available from the Aqua satellite. The main benefit of using AMSR-E data is its near-all-weather SST coverage. The DOI switched from version 1 to version 2 on 6 January 2009 (information online at http://www.ncdc.noaa.gov/oa/climate/research/sst/oi-daily.php), and so did the NRT SQUAM analyses.
b. Real-time global daily SSTs
Two RTG SSTs (online at http://polar.ncep.noaa.gov/sst) are available: one at a relatively low resolution of 0.5° [RTG_LR; available on a 360 × 720 grid (see Thiébaux et al. 2003)] and the other at a higher resolution of 1/12° [RTG_HR; on a 2160 × 4320 grid (see Gemmill et al. 2007)]. The RTG_LR is available from 30 January 2001 to the present (online at ftp://polar.ncep.noaa.gov/pub/history/sst). It uses the same input data as DOI_AV (i.e., operational NAVOCEANO AVHRR and in situ SSTs) but processes them differently. The RTG_HR became operational on 27 September 2005, and the product is available (online at ftp://polar.ncep.noaa.gov/pub/history/sst/ophi) from April 2007 to the present. The SST input to the RTG_HR is based on a new physical retrieval system developed at the Joint Center for Satellite Data Assimilation (Gemmill et al. 2007).
c. Operational SST and sea ice analysis
The OSTIA product was developed at the Met Office in response to the requirements of the GHRSST (Stark et al. 2007, 2008). The data are generated daily at a 0.05° (3600 × 7200 grid) spatial resolution and made available in netCDF format (online at ftp://podaac.jpl.nasa.gov/GHRSST/data/L4/GLOB/UKMO/OSTIA). The data are available free of charge for noncommercial purposes. However, users are required to obtain a license agreement (http://www.metoffice.gov.uk/corporate/legal/data_lic_form.html). The dataset is available from 1 April 2006 to the present.
d. Ocean data analysis system for MERSEA
The ODYSSEA SST is a daily 0.1° resolution (1600 × 3600 grid; from −80° to 80° latitude) product. It was developed in the framework of the MERSEA project (information online at http://www.mersea.eu.org) and complies with the GHRSST standards (Autret and Piollé 2007). The data in netCDF format are available (online at ftp://ftp.ifremer.fr/ifremer/medspiration/data/l4hrsstfnd/eurdac/glob/odyssea) from October 2007 to the present.
e. Comparisons of global L4 SST fields
Although the primary objective of the SQUAM validation tool is to monitor satellite SSTs, diagnostics of the L4 fields are provided as well. Figure 2 shows example time series plots of global mean differences and standard deviations in several TR fields with respect to (w.r.t.) RTG_LR.
The two daily Reynolds products (DOI_AV and DOI_AA) appear mutually consistent. Their mean differences and standard deviations w.r.t. RTG_LR show a clear seasonal cycle, with amplitudes from −0.1 to +0.2 and from 0.5 to 0.9 K, respectively. The OSTIA and ODYSSEA differences w.r.t. RTG_LR also show seasonality, although the corresponding mean differences and standard deviations are somewhat smaller. Shortly after its inception in April 2006, OSTIA had a cold mean difference of −0.2 K, which reduced to −0.1 K later in 2006 but then briefly spiked again to −0.2 K in early 2007. After this initial period, the OSTIA product has been fairly consistent with DOI. ODYSSEA SST shows some short-term spikes up to ∼+0.2 K in early 2008 and 2009, and occasional data gaps (e.g., 20–21 August 2008).
To better understand the observed L4 differences, the comparisons were further stratified into those with TR ≥ 0°C (Fig. 2, middle panels) and TR < 0°C (Fig. 2, bottom panels). These analyses suggest that the major differences between L4 products take place in the high latitudes (Fig. 2, bottom panels), likely resulting from the different treatment of the marginal ice zone (e.g., Reynolds et al. 2007).
Although some products show a global mean difference to be close to zero, their large standard deviation suggests some significant regional differences. Figure 3 shows two examples of Hovmöller diagrams (the latitude–time evolution of TR minus DOI_AV) to understand the zonal differences.
In midlatitudes, the DOI_AV, OSTIA, and RTG_LR match each other quite well. However, much larger and seasonal mean differences are found in the higher latitudes.
Analyses in this subsection suggest that more work is needed to reconcile different L4 products, especially in the high latitudes. In the remainder of this paper, only three reference SSTs are used (DOI_AV, RTG_LR, and OSTIA). More intercomparisons between daily L4 fields are available (online at http://www.star.nesdis.noaa.gov/sod/sst/squam/L4/).
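The latitude–time (Hovmöller) view used in this subsection can be sketched as follows. This is a minimal illustration, not the SQUAM implementation: the function names, the use of None for missing cells, and the toy numbers are all assumptions made for the example. Each daily difference field (TR minus DOI_AV) is averaged along longitude, and the resulting zonal-mean profiles are stacked in time.

```python
from statistics import fmean


def zonal_means(diff_grid):
    """Average a 2D difference field along longitude.

    diff_grid[k][j] is the difference (K) at latitude row k, longitude column j.
    Missing cells are marked as None (an assumption of this sketch).
    """
    profile = []
    for row in diff_grid:
        valid = [v for v in row if v is not None]
        profile.append(fmean(valid) if valid else None)
    return profile


def hovmoller(diff_grids_by_time):
    """Stack zonal-mean profiles: result[t][k] is time step t, latitude row k."""
    return [zonal_means(grid) for grid in diff_grids_by_time]


# Toy input (invented numbers): two time steps on a 3 x 4 grid.
day1 = [[0.1, 0.3, None, 0.2], [0.0, 0.0, 0.1, -0.1], [None, None, None, None]]
day2 = [[0.4, 0.4, 0.4, 0.4], [0.0, 0.2, 0.0, 0.2], [-0.5, None, -0.3, None]]
diagram = hovmoller([day1, day2])
```

No area weighting is needed within a latitude row because all cells in a row of a regular latitude–longitude grid have the same area.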
f. Matchup of MUT SST with L4 fields
Satellite SSTs are matched to the reference SST datasets using the nearest-neighbor approach and no interpolation in space and time is attempted. All TR fields provide near-global and almost-gap-free coverage, so that there are only a few MUT SST retrievals found outside the domains covered by these fields. The MUT SSTs without corresponding reference SSTs are excluded from the SQUAM analyses.
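A nearest-neighbor matchup of this kind can be sketched as below. The grid layout (rows running south to north from −90°, columns running east from 0°, cell-centered values), the None marker for missing reference cells, and the function names are assumptions for illustration; the actual L4 files may be ordered differently.

```python
def nearest_cell(lat, lon, res_deg):
    """Return (row, col) of the grid cell whose center is nearest to (lat, lon).

    Assumes a regular global grid with rows from -90 deg northward and
    columns from 0 deg eastward (hypothetical layout for this sketch).
    """
    n_lat = round(180.0 / res_deg)
    n_lon = round(360.0 / res_deg)
    row = min(int((lat + 90.0) / res_deg), n_lat - 1)  # lat = +90 maps to the last row
    col = int((lon % 360.0) / res_deg) % n_lon          # wrap longitude into [0, 360)
    return row, col


def match_up(retrievals, ref_grid, res_deg):
    """Compute delta = TS - TR for each retrieval, with no interpolation.

    retrievals: iterable of (lat, lon, ts); ref_grid[row][col] is TR or None.
    Retrievals without a reference value are dropped.
    """
    deltas = []
    for lat, lon, ts in retrievals:
        row, col = nearest_cell(lat, lon, res_deg)
        tr = ref_grid[row][col]
        if tr is None:
            continue
        deltas.append(ts - tr)
    return deltas


# Toy 2 x 4 grid at 90-degree resolution (invented values, deg C).
toy_grid = [[10.0, None, 12.0, 13.0], [20.0, 21.0, 22.0, 23.0]]
toy_obs = [(-45.0, 45.0, 10.5), (-45.0, 135.0, 11.0), (45.0, -45.0, 22.5)]
toy_deltas = match_up(toy_obs, toy_grid, 90.0)
```

The second toy retrieval falls on a cell with no reference value and is excluded, mirroring the exclusion rule stated above.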
4. Global quality control and handling outliers in SQUAM
Figure 4 shows examples of nighttime and daytime maps of ΔTS for MetOp-A and OSTIA SSTs. Generally, ΔTS is close to zero. However, in some pixels, TS is either too warm or too cold relative to TR, suggesting that these points are likely outliers.
The distribution of ΔTS can be significantly distorted by outliers. The outliers may be due to “contaminant” points in TS, TR, or both, or they may be caused by “discordant” data points resulting from, for example, TS–TR space–time mismatch in areas of high SST gradients (e.g., at the boundaries of oceanic currents, upwellings, etc.). If the objective is to provide a high-quality satellite SST product, then only contaminant TSs should be excluded and discordant TSs retained to preserve SST information in the dynamic oceanic areas. If the objective is to routinely monitor the global performance of SST products, which is the subject of SQUAM analyses, then both contaminant and discordant observations are to be excluded.
Customarily, outliers in data are handled using one of the two principal approaches: “identification” or “accommodation” (e.g., Tietjen 1986). Identification involves labeling and removing outliers from the data, whereas accommodation belongs in the area of robust estimation. In SQUAM, these two approaches are used in concert, in order to most effectively handle outliers in the data.
A common identification approach is removing data points beyond a confidence interval based on three or four standard deviations about the mean (cf. Bevington and Robinson 1992). The exact number of standard deviations used in QC can be based on Chauvenet’s criterion, which links the probability to the sample size (cf. Bevington and Robinson 1992). This study employs a simpler approach based on using a fixed N = 4, irrespective of the sample size (e.g., Ostle and Malone 1988).
In reality, the conventional mean and standard deviation themselves are contaminated by outliers, rendering their use for identification progressively less effective as the fraction of outliers increases. To circumvent this problem, robust statistics [median and robust standard deviation (RSD)] are employed in SQUAM to construct the screening thresholds, that is, median ±4 × RSD (cf. Merchant and Harris 1999). The RSD of a distribution is given as IQR/S, where IQR is the interquartile range (75th percentile minus 25th percentile, in an ordered dataset) and S is a scaling factor (≈1.349 for an ideal normal distribution).
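The robust screening described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the SQUAM code: the function names are hypothetical, the quartiles use the "inclusive" interpolation of Python's statistics module (the percentile convention used in SQUAM is not specified here), and the scaling factor S is derived from the standard normal quantiles instead of being hard-coded.

```python
from statistics import NormalDist, median, quantiles

# Interquartile range of a unit normal distribution (about 1.349).
S_NORMAL = NormalDist().inv_cdf(0.75) - NormalDist().inv_cdf(0.25)


def robust_sd(values):
    """RSD = IQR / S, with quartiles from the 'inclusive' method (an assumption)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / S_NORMAL


def screening_thresholds(values, n_rsd=4.0):
    """Return (low, high) = median -/+ n_rsd * RSD."""
    med = median(values)
    half_width = n_rsd * robust_sd(values)
    return med - half_width, med + half_width


def remove_outliers(values, n_rsd=4.0):
    """Keep only values inside the robust screening interval."""
    low, high = screening_thresholds(values, n_rsd)
    return [v for v in values if low <= v <= high]


# Toy delta-T_S sample (invented numbers, K) with one gross outlier.
sample = [-0.2, -0.1, 0.0, 0.1, 0.2, 10.0]
kept = remove_outliers(sample)
```

Because the median and IQR barely move when the 10 K value is present, the thresholds stay tight and the outlier is rejected, whereas a mean ± 4σ rule on the same six values would retain it.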
a. Histograms of ΔTS
Figure 5 shows typical nighttime histograms of ΔTS for NOAA-18 against in situ (within 20 km × 1 h) and OSTIA data before and after removal of outliers. The equivalent number of matchups w.r.t. in situ data is ∼250 times smaller than that w.r.t. OSTIA (∼7000 per month and ∼450 000 per 8-day period, respectively). Also, the fraction of outliers w.r.t. in situ data is a factor of ∼2 higher than that w.r.t. OSTIA data, indicating that in situ SSTs themselves are strongly contaminated by bad data. Statistical parameters and a Gaussian fit X ∼ N(median, RSD) are also annotated in the histograms.
Prior to the removal of outliers (Fig. 5, left), min(ΔTS) and max(ΔTS) reach ∼±20°C for in situ data and ∼±10°C for OSTIA. The extreme ΔTS values (minimum and maximum) before removing the outliers are likely due to failed cloud detection and land–glint contamination (see section 4b for distribution of outliers). Mean and median estimates of the global average ΔTS distribution are close to each other, with a magnitude of only a few hundredths of a kelvin. However, the RSDs and conventional standard deviations differ significantly. For instance, the RSD = 0.25 K w.r.t. in situ data, corresponding to only ∼14% of the variance measured by the conventional standard deviation = 0.67 K. For OSTIA, the RSD is ∼0.29 K, compared with the standard deviation of ∼0.47 K. This is because the conventional standard deviation is artificially inflated by outliers. The conventional values of skewness (s of ∼2.28 for in situ and s of ∼2.62 for OSTIA) and kurtosis (k ∼ 252 and ∼39, respectively) indicate strong asymmetry and peakedness of the empirical histograms. (Note that no robust measures of the third and fourth moments are employed in SQUAM.)
After excluding outliers (Fig. 5, right), the robust statistics (median and RSD) remain practically unchanged, as expected. The conventional statistics do change, however, with the higher moments improving dramatically. In particular, the standard deviation is significantly reduced and becomes closer to the RSD, which changes only a little. The kurtosis (1.01 and 1.07 for in situ and OSTIA, respectively) becomes much more realistic and representative of the observed distribution. The min(ΔTS) and max(ΔTS) are now within ∼±1.2 K because data are not allowed to depart from the median by more than four RSDs (with typical RSD < ∼0.3 K).
To summarize analyses of histograms, the distributions of ΔTS are indeed close to Gaussian but contaminated by a small fraction of outliers. The global differences (mean and median) are close to zero and RSDs range from ∼0.3 to 0.5 K, which is quite close to the similar metric against in situ SST (cf. McClain et al. 1985; Walton et al. 1998; May et al. 1998). One thus concludes that validation against global reference fields can be successfully used to monitor satellite SST products globally and in NRT.
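The sensitivity of the conventional moments to outliers, discussed in this subsection, can be reproduced with a small sketch. The function name and the toy sample are invented for illustration, and two conventions are assumptions: population (1/n) moments are used, and kurtosis is reported as excess kurtosis (zero for a Gaussian).

```python
from math import sqrt


def moments(values):
    """Return (mean, standard deviation, skewness, excess kurtosis).

    Population (1/n) central moments are used; this convention is an
    assumption of the sketch, not necessarily the one used in SQUAM.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    sd = sqrt(m2)
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return mean, sd, skewness, excess_kurtosis


# Toy sample (invented numbers, K): a symmetric core plus one gross outlier.
clean = [-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3] * 3
dirty = clean + [8.0]

print("clean:", moments(clean))
print("dirty:", moments(dirty))
```

A single contaminated value out of 22 is enough to inflate the standard deviation and drive the skewness and kurtosis far from their values for the clean sample, which is the behavior seen in the left panels of Fig. 5.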
b. Distribution of outliers in space and time
Although outliers are generally considered a nuisance for validation purposes, their distribution in space and time may carry important information about their source and help identify potential areas for improvement in the satellite or reference SSTs. Figure 6 shows examples of global distributions of low (i.e., ΔTS < median − 4 × RSD) and high (i.e., ΔTS > median + 4 × RSD) outliers in the nighttime ΔTS for MetOp-A.
Reproducible low outliers (e.g., in the northern Pacific, “roaring forties,” off the East African coast, and southeast Arabian Sea) are predominantly associated with persistent cloud and aerosols, and suggest the need for improvements in satellite SST. On the other hand, the consistent pattern of prominent high outliers (especially in the high latitudes and in the Northern Hemisphere) may be due to a low bias in all L4 products, although a high bias in AVHRR SST cannot be ruled out either (cf. also comparisons between different L4s in section 3e, which show the highest uncertainties in the high latitudes). A reduced number of high outliers in the Arctic (above ∼65°N) in the DOI_AV SST, relative to RTG and OSTIA, may be due to their different processing of the sea ice boundary (e.g., Reynolds et al. 2007). Many reproducible distribution patterns, with high and low outliers closely interleaved, are found in the high-gradient regions (such as the Gulf Stream, Brazil Current, Mozambique and Agulhas Current to the south of Africa, and East Australia Current). Those are likely caused by mismatches between TS and TR, partly resulting from the inherent variability within a given TR grid and partly resulting from the different spatial resolutions of TS and TR, whose combined effects may become significant in highly dynamic oceanic areas.
Figure 7 shows a time series of nighttime outliers. Consistent with Fig. 5, the fraction of outliers against in situ data is a factor 2–3 larger compared to L4, which suggests a persistently strong contamination in the in situ SSTs. For all platforms, the rate of low outliers (likely indicating residual cloud in MUT nighttime SST data) is relatively flat in time and ranges from ∼0.5% to 1.0%. The right panels of Fig. 7 show corresponding time series of high outliers, which exhibit a strong annual cycle, with the maximum reaching ∼2.5% in July–August. This seasonality mainly comes from the high latitudes of the Northern Hemisphere, which are sampled by the polar-orbiting NOAA and MetOp platforms more frequently during the boreal summer (cf. Fig. 6). In December–January, the fraction of high outliers is reduced to ∼0.5%, consistent with the general level of low outliers. This seasonality suggests either consistent problems with L4 products in the ice-melting zone, or problems with AVHRR cloud screening and/or SST algorithms, or both. More analyses are needed to reconcile different satellite and L4 data in this complex area, which is generally lacking in situ data (cf. top panels of Figs. 6 and 7).
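The low and high outlier rates tracked in this subsection follow directly from the screening thresholds. A minimal sketch is given below; the function name and toy values are hypothetical, and the median and RSD are passed in as precomputed inputs.

```python
def outlier_fractions(deltas, med, rsd, n_rsd=4.0):
    """Return (low_percent, high_percent) of outliers in a delta-T_S sample.

    Low outliers:  delta < med - n_rsd * rsd
    High outliers: delta > med + n_rsd * rsd
    """
    low_limit = med - n_rsd * rsd
    high_limit = med + n_rsd * rsd
    n_low = sum(1 for d in deltas if d < low_limit)
    n_high = sum(1 for d in deltas if d > high_limit)
    total = len(deltas)
    return 100.0 * n_low / total, 100.0 * n_high / total


# Toy sample (invented numbers, K) with median 0.0 and RSD 0.3 assumed given.
toy = [-3.0, -0.1, 0.0, 0.1, 0.2, 2.5, 4.0, 0.05, -0.05, 0.0]
low_pct, high_pct = outlier_fractions(toy, med=0.0, rsd=0.3)
```

Evaluating this per 8-day file and per platform would give time series of the kind plotted in Fig. 7; keeping the two tails separate is what allows the seasonal cycle in the high outliers to be distinguished from the flat rate of low outliers.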
5. Monitoring satellite SST for stability and cross-platform consistency
In SQUAM, the statistical moments of the ΔTS distributions are monitored to assess satellite data for stability and cross-platform consistency. Following the discussions in section 4b, only outlier-free data are analyzed here.
a. Monitoring stability of satellite SST products
Figure 8 shows a time series of median ΔTS. Although the major trends are captured well in all time series, the L4 plots show more fine structure compared to in situ results, resulting from a much larger number of matchups supporting each data point, and higher temporal resolution (8 days instead of 1 month).
Two major types of anomalies are observed in the time series.
The first group is due to problems with satellite SST. For instance, the NOAA-17 and -18 ΔTSs track each other closely, whereas MetOp-A was biased ∼0.1 K higher until mid-2009. NOAA-16 shows highly anomalous behavior, including two large dips in late 2006 and 2007, followed by a series of smaller dips in 2008 and 2009. Recall that NOAA-16 currently flies close to the terminator; its AVHRR continuously experiences rapid changes in its thermal regime, and its blackbody is subject to frequent solar impingement (cf. Cao et al. 2001). Additional offline analyses (not shown here) confirm that its calibration coefficients in all bands undergo cyclic changes. Work is underway to better understand and resolve this NOAA-16 anomaly. For the rest of this study, NOAA-16 data are excluded from further analyses and discussion.
The second group of anomalies in Fig. 8 comes from the reference fields themselves. The degree and magnitude of these artifacts are L4 product specific. For instance, there are two “jumps” in the DOI_AV plots in 2004 and 2005, and one “jump” in the first half of 2007. These artifacts are also seen, although to a lesser extent, in the RTG_LR. Another example of a nonreproducible feature is observed in the OSTIA time series, which shows an elevated SST anomaly in 2006 and a spike in the first quarter of 2007. Also, different L4 products show a different degree of short-term “noise,” which is smaller in OSTIA and RTG_LR and larger in the DOI_AV time series.
Note that despite artifacts in individual L4 products, the time series of ΔTS from different platforms track each other very closely, suggesting that the selection of TR is not critical for monitoring cross-platform consistency of different TS products.
b. Using double differences to monitor satellite SST for cross-platform consistency
A more direct way to monitor TS for cross-platform consistency is based on using the double differencing (DD) technique (cf. Alber et al. 2000). The DD methodology has been employed in remote sensing for many applications, including transferring calibration from one satellite sensor to another via a third “transfer standard” sensor (cf. Wang and Wu 2008). For our analyses, the NOAA-17 ΔTS was selected as the respective “transfer standard” and subtracted from the corresponding ΔTSs for other platforms as follows: DD = (TS,SAT − TR) − (TS,N17 − TR) ≈ TS,SAT − TS,N17. Note that monitoring cross-platform consistency with direct differencing, that is, TS,SAT − TS,N17, is also possible, but only in the intersection subsample of the two satellites, and therefore it is more geographically nonuniform from one temporal snapshot to another. This issue is largely alleviated when the DD technique is used. The major premise is that the TR, which is subject to artifacts and irregularities, cancels out, and the DD thus provides a measure of average cross-platform consistency in a global domain.
Figure 9 shows time series of the DDs for several different TRs. The patterns are quite consistent for different L4s, suggesting that the respective DDs are largely insensitive to the selected TR. Based on the nighttime local overpass times for different satellites (∼2130 LT for MetOp-A, ∼2200 LT for NOAA-17, and ∼0200 LT for NOAA-18 and -19), one would expect the best consistency to be between NOAA-17 and MetOp-A, and between NOAA-18 and -19. Because none of the global L4 products currently resolves the diurnal cycle, the second cluster is expected to be several hundredths of a kelvin cooler than the first cluster, based on the expected diurnal cooling at night (cf. Stuart-Menteth et al. 2005; Gentemann et al. 2003; Kennedy et al. 2007). Indeed, NOAA-18 and -19 closely agree, but MetOp-A is biased high w.r.t. NOAA-17 by ∼+0.1 K until about mid-2009. Note that these relationships are also seen in Fig. 8, but DDs provide a better way to quantify the cross-platform biases.
The DDs look different when in situ SST is used as TR (Fig. 9, top panel). Recall that in situ bulk SSTs account for the diurnal variation in SST, but only partially, because the diurnal cycle in bulk SST is always suppressed compared with skin SST. In the top panel of Fig. 9, it is again expected that NOAA-17 and MetOp-A form one cluster, and NOAA-18 and -19 form another (and colder) one. The data do follow this expected pattern but not fully.
c. Using DD to monitor satellite SST for day–night consistency
The DD technique can also be employed to quantify platform-specific day minus night (DN) SST biases. Recall also that in addition to the diurnal cycle in SST, artificial DN biases may occur because the regression coefficients in the daytime (NLSST) and nighttime (MCSST) algorithms are tuned independently against in situ SSTs, and the DN check is also useful to verify the relative consistency of these tunings. Figure 10 shows the global average of DN satellite SST biases calculated as DN = (TS,D − TR) − (TS,N − TR) ≈ TS,D − TS,N.
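As a minimal sketch of the DN calculation, the following Python fragment applies the same differencing per platform and averages the result over time. The data layout, the function name `day_minus_night`, and all values are hypothetical; the numbers were merely chosen to be of a plausible magnitude and are not measurements.

```python
from statistics import mean


def day_minus_night(day_delta, night_delta):
    """Return DN = (TS,D - TR) - (TS,N - TR) for each date."""
    if len(day_delta) != len(night_delta):
        raise ValueError("day and night series must cover the same dates")
    return [d - n for d, n in zip(day_delta, night_delta)]


# Hypothetical global-mean delta-TS values in K (not SQUAM output).
series = {
    "NOAA-17": {"day": [0.12, 0.20, 0.15], "night": [0.02, 0.05, 0.03]},
    "NOAA-18": {"day": [0.30, 0.42, 0.35], "night": [0.00, 0.05, 0.02]},
}

# Time-averaged DN per platform.
mean_dn = {
    platform: mean(day_minus_night(s["day"], s["night"]))
    for platform, s in series.items()
}
for platform, value in mean_dn.items():
    print(f"{platform}: mean DN = {value:.2f} K")
```

Because the same TR enters both the daytime and the nighttime ΔTS, a nonzero DN in such a calculation reflects the diurnal signal in the satellite SST, any inconsistency between the daytime and nighttime regression tunings, or both.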
Should in situ SST fully account for the diurnal cycle in skin SST, then the time series on the top panel of Fig. 10 would have been flat and at 0 K. However, as discussed before, this accounting is only partial. As a result, all DNs show a small positive bias, with a clear seasonal cycle from 0 to 0.15 K. This cycle is caused by systematic changes in the skin–bulk difference, as affected by the solar insolation and wind speed and modulated by the changing global coverage.
Turning to the L4 results in Fig. 10, the shape of the corresponding DNs is largely insensitive to the reference SST. This is expected, because the current L4 products do not resolve the diurnal cycle, and therefore the global DN here captures the average difference in satellite skin SST between the day and night overpasses. For the two ∼(1000–2200) LT platforms, NOAA-17 and MetOp-A, the DNs range from 0 to 0.2 K and track each other closely. For the 0200–1400 LT platforms, the DNs range from 0.2 to 0.4 K. The DN for the 0200–1400 LT platforms is larger than that for the ∼(1000–2200) LT platforms because the corresponding local overpass times are close to the diurnal minimum and maximum of SST (cf. Stuart-Menteth et al. 2005).
Work is underway to explore the potential of the DD technique to better quantify and minimize cross-platform and day–night biases.
d. Using higher moments for SST monitoring
For all TR fields, there is excellent cross-platform consistency. The nighttime RSDs w.r.t. in situ and OSTIA SSTs are ∼0.3 K, followed by RTG_LR and DOI_AV (<0.4 K). Note that there is a close proximity of the OSTIA and in situ SST validation results.
Nonuniformities in the time series are deemed to be due to changes in the quality of the reference fields themselves. For instance, the RSD w.r.t. in situ SST has decreased from 0.4 to 0.3 K since 2003, likely resulting from the improved quality of in situ SST. The drop in DOI_AV RSD from >0.5 to <0.4 K on 1 January 2006 coincides with the switch in DOI production from Pathfinder to NAVOCEANO SST as the primary input (Reynolds et al. 2007). Similar nonuniformities (although of a somewhat smaller magnitude) are also observed in the RTG_LR time series in 2004 and 2005 for NOAA-17 and in late 2005 for NOAA-18. Some of these changes might have been caused by the incorporation in the RTG processing of the NOAA-17 and -18 data, respectively.
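To illustrate why a robust standard deviation is preferred over the conventional one for trending, the sketch below compares the two on a small sample with one cold outlier. The estimator used here, the median absolute deviation scaled by 1.4826, is an assumption made for illustration and is not necessarily the estimator implemented in SQUAM; the sample values are hypothetical.

```python
from statistics import median, stdev

# Scale factor that makes the MAD consistent with the standard
# deviation for Gaussian data.
MAD_TO_SIGMA = 1.4826


def robust_stats(values):
    """Return (median, robust standard deviation) of a sample."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med, MAD_TO_SIGMA * mad


# Hypothetical delta-TS sample in K; -5.0 mimics a cold outlier
# such as residual cloud contamination.
delta_ts = [-0.3, -0.1, 0.0, 0.1, 0.3, -0.2, 0.2, -5.0]

med, rsd = robust_stats(delta_ts)
conventional_sd = stdev(delta_ts)
print(f"median = {med:.2f} K, RSD = {rsd:.2f} K, SD = {conventional_sd:.2f} K")
```

The single outlier inflates the conventional standard deviation to well above 1 K, whereas the robust estimate stays near 0.3 K and therefore remains representative of the bulk of the distribution.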
e. Additional self-consistency diagnostics
SQUAM additionally performs self-consistency checks of SST products by plotting global ΔTS as a function of relevant observational and geophysical variables, such as the VZA. A case study is shown in Fig. 12 where nighttime NOAA-17 and -18 ΔTS are plotted against VZA for the following two different periods: one before January 2006 and one in the beginning of January 2006. The dependence prior to January 2006 (Fig. 12, left panel) shows an artificial across-swath bias of >0.3 K. This bias was caused by a faulty assignment of VZA in MUT and was uncovered with analyses from an early prototype of SQUAM. Notice the reduction in dependence and improved symmetry after correction (Fig. 12, right panel).
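A dependence of this kind can be diagnosed by binning ΔTS by VZA and inspecting the per-bin statistics. The sketch below is a minimal illustration that assumes a signed VZA (negative on one side of the swath, positive on the other), a 10° bin width, and the median as the per-bin statistic; the function name `bin_by_vza` and the sample values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def bin_by_vza(vza_deg, delta_ts, bin_width=10.0):
    """Return the median delta-TS per VZA bin, keyed by bin centre (deg)."""
    bins = defaultdict(list)
    for angle, delta in zip(vza_deg, delta_ts):
        # Floor division keeps negative (signed) angles in the right bin.
        centre = (angle // bin_width) * bin_width + bin_width / 2
        bins[centre].append(delta)
    return {c: median(v) for c, v in sorted(bins.items())}


# Hypothetical samples with an artificial across-swath trend (K).
vza = [-55.0, -52.0, -25.0, -22.0, 2.0, 5.0, 24.0, 27.0, 53.0, 56.0]
delta = [-0.20, -0.18, -0.08, -0.06, 0.00, 0.02, 0.08, 0.10, 0.18, 0.22]

profile = bin_by_vza(vza, delta)
# Range of the binned medians as a simple across-swath bias measure.
spread = max(profile.values()) - min(profile.values())
for centre, value in profile.items():
    print(f"VZA {centre:+6.1f} deg: median delta-TS = {value:+.2f} K")
print(f"across-swath spread = {spread:.2f} K")
```

A flat and symmetric profile indicates no artificial angular dependence, whereas a monotonic trend across the swath, as in this constructed example, points to a problem in the angle assignment or in the retrieval algorithm.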
Analyses similar to those shown in Fig. 12 are routinely performed in SQUAM to identify and remove any artificial dependencies. Such synoptic diagnostics can be reliably obtained in NRT only using a global field as the reference. Note that the selection of a particular reference field is not critical for these analyses. With in situ data, similar synoptic diagnostics can also be obtained but in significantly delayed and time-integrated mode (cf. Merchant et al. 2008).
6. Summary and future work
The Web-based SST quality monitor (SQUAM) is employed to continuously control the quality of NESDIS operational AVHRR SST products (TS). Similarly to the customary validation against in situ SST, SQUAM performs analyses of SST differences ΔTS = TS − TR, but calculated with respect to various L4 products, including Reynolds, RTG, OSTIA, and ODYSSEA. Processing is done automatically and results are posted online (http://www.star.nesdis.noaa.gov/sod/sst/squam) in near-real time (NRT).
The major trends and anomalies seen against in situ SSTs are also well captured against L4 fields. Because of its extensive validation statistics, SQUAM performs global quality control of satellite SSTs by checking ΔTS for proximity to a Gaussian shape and by handling outliers in NRT. Global maps, histograms, and dependency plots of ΔTS are generated for synoptic assessment of satellite SST products, and moments of the ΔTS distributions are trended in time. Satellite SSTs are further monitored for cross-platform and day–night consistency using double differences (DD).
Testing NESDIS heritage AVHRR SSTs from NOAA-16, -17, -18, -19, and MetOp-A from 2001 to the present shows that, overall, the products are stable and cross-platform consistent. The initial warm bias in nighttime MetOp-A SSTs of +0.1 K, which was likely due to specifying suboptimal regression coefficients in its MCSST equation, was greatly reduced in mid-2009. The NOAA-16 product shows a distinct out-of-family behavior, apparently resulting from unstable AVHRR calibration in recent years, likely caused by its near-terminator orbit. Improvements of NOAA-16 AVHRR calibration (cf. Trishchenko 2002; Mittaz et al. 2009) may be explored in future work. The remaining differences are largely attributed to different temporal sampling from different platforms and to the diurnal variability in the satellite SST, which is currently not resolved in the global L4 fields.
Using multiple TRs facilitates distinguishing artifacts in satellite SSTs from those in TR fields. In particular, all of the AVHRR products show widespread positive biases in the Arctic, suggesting that low biases are possible in all current L4 fields. Comparisons between different L4 fields are also performed in SQUAM. They show important differences, particularly in high latitudes, which presumably originate from different treatment of the sea ice marginal zone in different L4 analysis schemes. Some L4 products show various nonuniformities in time and a larger degree of day-to-day noise.
Identifying one “most suitable” L4 field would simplify SQUAM analyses. However, different TR fields emphasize different aspects of SST (bulk, foundation, and subskin), are available for different time periods, and have different spatial resolution, quality, and data stability. Validation statistics against some L4 fields (e.g., OSTIA) approach the biases and standard deviations measured against in situ data, while for others (RTG and Reynolds) the validation statistics are slightly degraded (larger). SQUAM analyses can contribute to the objective evaluation of different satellite and L4 SST products and facilitate their improvement, and possibly their convergence. In particular, SQUAM supplements a high-resolution diagnostic dataset (HR-DDS; http://www.hrdds.net) system, which at specified locations (not global) allows interactive analysis of several satellite, in situ, and model data, and a global and regional monitoring facility at the National Centre for Ocean Forecasting (NCOF; online at http://ghrsst-pp.metoffice.com/pages/latest_analysis/sst_monitor/). The SQUAM, HR-DDS, and NCOF tools can be used in concert for comprehensive intercomparison of global products. We also plan to explore the GHRSST ensemble of the standard L4 products in SQUAM (http://ghrsst-pp.metoffice.com/pages/latest_analysis/sst_monitor/daily/ens).
The near-term SQUAM objective will be working toward reconciliation of all AVHRR SST products from different platforms, during day and night, and establishing a consistent benchmark SST product. Two particular tasks that will be pursued toward this goal are modeling diurnal variability in SQUAM [e.g., implementing the model of Gentemann et al. (2003) or the climatological data of Kennedy et al. (2007)] and exploring improved AVHRR calibration.
Recently, NESDIS’s newly developed Advanced Clear Sky Processor for Oceans (ACSPO) and NAVOCEANO AVHRR GAC SST products were also included in the SQUAM processing. Analyses of the ACSPO SST products and establishment of reliable links with the heritage MUT SST products are underway. SQUAM will also be adapted to monitor other existing [such as the Meteosat Second Generation (MSG) Spinning Enhanced Visible and Infrared Imager (SEVIRI)] and future [such as MetOp-B and -C AVHRRs, National Polar-Orbiting Operational Environmental Satellite System (NPOESS) Visible Infrared Imager/Radiometer Suite (VIIRS), and Geostationary Operational Environmental Satellite (GOES-R) Advanced Baseline Imager (ABI)] sensors. The SQUAM will also be instrumental in the quality control of climate data records (cf. Vázquez-Cuervo et al. 2004) and in establishing links between the past, present, and next-generation SST products.
This work was supported by the Product System Development and Implementation program managed by the NESDIS Office of Systems Development, by the Internal Government Studies managed by the NPOESS Integrated Program Office, and by the GOES-R Algorithm Working Group. We thank members of the SST Team (F. Xu, X. Liang, B. Petrenko, N. Shabanov, J. Stroup, and D. Frey) for helpful discussions and constructive feedback. P. Dash acknowledges the CIRA visiting scientist fellowship. We would also like to thank the anonymous reviewers for their comments. The views, opinions, and findings contained in this report are those of the authors and should not be construed as an official NOAA or U.S. Government position, policy, or decision.
Corresponding author address: Prasanjit Dash, CSU/CIRA Research Scientist II, Room 601-1, 5200 Auth Rd., NOAA/NESDIS/STAR, WWB, Camp Springs, MD 20746. Email: email@example.com