## 1. Introduction

Sea surface temperatures (SSTs) are important to the study of the earth's climate. For this purpose it is important that the SST analyses be global and as consistent as possible for the historic record. This is difficult because of changes in the observing system. As discussed in many sources (e.g., Smith, et al. 1994) the historic distribution of in situ SST data from ships has varied with time due to a variety of economic and political changes (the opening of new canals, world wars, improved communication, etc.). In addition, biases in the ship in situ data have occurred as observational techniques have changed, and those biases must be corrected, as discussed in Folland and Parker (1995) and Smith and Reynolds (2002). Beginning in the 1970s, SST in situ data began to be available from drifting and moored buoys. Initially the number of observations from buoys was small, although the present number is comparable to the number of ship observations. Infrared (IR) satellite SST retrievals became available in November 1981, as discussed by Reynolds et al. (2002, henceforth REA), and are now supplemented with satellite SST microwave retrievals. The coverage from satellite data greatly improves SST coverage and now overwhelms the in situ data. This is an improvement for climate SSTs but may also be a problem when biases occur as discussed by Reynolds (1993) and by Reynolds et al. (2004).

Smith and Reynolds (2003, hereafter SR) introduced the Extended Reconstruction SST (ERSST) analysis, a global SST analysis that attempts to be consistent over time, here referred to as ERSST version 1 (ERSST.v1). This was done by using the REA analysis from the period with satellite data to develop global spatial correlation scales defined by a set of spatial modes. However, the ERSST.v1 analysis is always computed by a fit to in situ data, even in the period with satellite data. The U.K. Met Office computed their own analysis (Rayner et al. 2003) using a similar technique but using all available data, both in situ and satellite. Both of these analyses and others (e.g., Smith et al. 1996; REA) were used for intercomparisons. The results showed several shortcomings in the ERSST.v1 analysis. The purpose of this paper is to correct these shortcomings with an improved version of the analysis, ERSST version 2 (ERSST.v2). In the sections that follow we first briefly review the methods used to develop ERSST.v1 so that this paper can be more easily understood. We then discuss the problems in ERSST.v1. In the following sections, we develop the modifications used to create ERSST.v2 and show intercomparisons of ERSST.v2 and some of the other analyses mentioned above.

## 2. ERSST.v1

The ERSST.v1 analysis is produced from the latest version of the Comprehensive Ocean Atmosphere Data Set (COADS; Slutz et al. 1985; Woodruff et al. 1998). This version is called COADS release 2.0 and it is used for both ERSST.v1 and ERSST.v2. The analysis uses monthly and 2° spatial superobservations, which are defined as individual observations averaged onto our 2° grid. The superobservations are produced after data screening, or quality control (QC), which is needed to eliminate outliers. The superobservations are also corrected for historical biases before 1942 by the method described in Smith and Reynolds (2002). The combined satellite and in situ analysis of REA is used to develop spatially complete statistics for our reconstruction. We averaged the monthly 1982–2000 REA analysis to the same 2° grid that we use for the COADS data. In addition, we computed a SST climatology for the same period.

The ERSST.v1 analysis is performed separately for the low- and high-frequency components, which are then added together to form the total SST anomaly. The low- and high-frequency components are separated because the stationary statistics used for the high-frequency analysis are based on only 19 years of SST anomalies and, thus, may not adequately span interdecadal variations. The low-frequency analysis is computed by smoothing and filtering our QC superobservation anomalies over 10° spatial regions using 15 years of data, to generate one low-frequency analysis per year. This low-frequency anomaly is removed from the observed SST superobservations before the high-frequency analysis is computed.

The analysis of high-frequency anomalies uses our 2° version of the 1982–2000 REA analysis to define a set of anomaly increment modes, or spatial patterns. Increments are defined as the difference between the present analysis or superobservation and the previous month's analysis. The modes are computed using the method of empirical orthogonal teleconnections (EOTs; van den Dool et al. 2000), which are similar to empirical orthogonal functions. The EOTs used here are localized so that teleconnections of more than 5000 km are damped, and those of more than 8000 km are set to zero. Localization prevents the analysis of anomalies that are not locally supported by the observations. An objective method is used to select the set of EOT modes that are adequately sampled by the superobservations. For the selected modes, a weight for each mode is found by fitting the set of modes to the superobservations. Variance associated with the unselected modes is damped. The superobservations used to compute the set of weights have been adjusted in two ways: First, the low-frequency analysis has been subtracted from the observations. Second, the observations have been converted to data increments. The high-frequency component of ERSST.v1 is then computed from the weighted sum of the EOT modes. The complete analysis is the high-frequency analysis plus the low-frequency analysis.

For the satellite period the standard deviation of ERSST.v1 (*σ*_{Ev1}) is similar to that of the National Oceanic and Atmospheric Administration (NOAA) optimum interpolation (OI) data (*σ*_{NOI}) of REA in most regions. However, there are some regions where *σ*_{Ev1} is weaker than *σ*_{NOI}. These include the western tropical Pacific and Atlantic and the tropical Indian Ocean (Fig. 1). For comparison, the standard deviation from an in situ only OI analysis (*σ*_{IOI}) is also shown. The in situ OI analysis is produced using methods similar to the REA NOAA OI analysis, except that only in situ SST data are used and monthly anomalies are analyzed directly on the 2° grid. The 1982–97 period is fairly well sampled, and all three have a similar standard deviation in the Northern Hemisphere, where sampling is best. Both *σ*_{NOI} and *σ*_{IOI} are comparable in the tropics, but *σ*_{Ev1} is weaker than the other two in the western tropical Pacific, Atlantic, and Indian Oceans. Those regions have low variations in all three, but *σ*_{Ev1} is weakest, which may affect the monitoring or modeling some climate variations.

As discussed in the following section, the ERSST.v2 analysis method will be devised to correct the low tropical variance. In addition, other improvements will be added. The first is to add sea ice information to the analysis. This will improve the analysis at high latitudes where other in situ data are sparse. The second is an adjustment of the historical SST bias correction for the 1939–41 period, which affects results most strongly in 1941. The third is to add improved error statistics as well as a spatial gridded field of the monthly error. This will provide an objective way to determine how the analysis accuracy changes with time and space.

## 3. ERSST.v2

We now discuss the changes that have been made in the ERSST.v2 analysis: improved high-frequency analysis, use of a sea ice to SST conversion algorithm, adjustment of the bias correction, and improved error analysis.

### a. Improved high-frequency analysis

In many ways ERSST.v2 is nearly identical to ERSST.v1. They both separate the analysis into low- and high-frequency components, which are summed for the total analysis. Both use the same COADS SST anomaly superobservations and the same low-frequency anomaly analysis. The difference between the two versions is the high-frequency analysis.

In ERSST.v1 anomaly increments were analyzed using anomaly increment modes. There are some advantages to analyzing increments with increment modes. Increments are designed to include temporal information, and fewer increment modes may be needed provided that they can be accurately defined (Thiébaux 1997). However, month-to-month anomaly increments generally have a much weaker signal than the anomalies themselves. In addition, the modes used for analysis are computed from anomalies containing some random error, and the difference of two such anomalies can have twice the random error. Thus, the anomaly increments used for computing modes will tend to have a lower signal/noise ratio than the anomalies alone. In regions where the anomaly changes slowly this can make it difficult to accurately compute increment modes. This explains why ERSST.v1 had difficulty analyzing the western tropical Pacific, where month-to-month variations tend to be small.

*M*spatial modes except that ERSST.v2 is based on anomalies. The high-frequency anomaly for ERSST.v2 is then estimated usingHere

*E*

_{m}(

*x*) is the mode

*m*at each spatial area

*x,*and the analysis weight for time

*t*and mode

*m*is

*w*

_{m}(

*t*).

For each time step the high-frequency anomaly is analyzed by screening out modes not adequately sampled, with screening done as in ERSST.v1. Screening rejects modes that have less than a critical percent of their variance sampled. For the selected modes we find the set of weights that minimize the squared error of the fit, compared to the available data, as in the ERSST.v1 analysis. The maximum number of anomaly modes used for analysis, 130, is larger than the maximum number of increment modes in ERSST.v1, which is 75 modes. More anomaly modes are used to ensure that we better resolve tropical variations. Note that 130 is the maximum number of modes used. Undersampled modes are screened out, with sampling checked at each time step. The method for determining if a mode is adequately sampled is discussed below and in SR.

When a mode is undersampled its ERSST.v2 weight is not defined by fitting to the data. In those cases the weight is estimated using the autocorrelation for that mode, similar to ERSST.v1. The autocorrelations are estimated from the 1950–97 period, when weights for all modes can be computed most of the time. For each mode, the undefined weights are estimated using the nearest defined weights in both forward and backward directions and the autocorrelation of the mode. If the time lags to the nearest defined weights in the forward and backward directions are lp and lm, and the lag-1 autocorrelation for the mode is *r*_{m}, then the damped estimate for the undefined weight is set to *w*_{e}(*t*) = 0.5 [*w*(*t* + lp)*r*^{lp}_{m}*w*(*t* + lm)*r*^{lm}_{m}

In addition to the autocorrelation damping described above, a way to include temporal information is to pool anomalies from several months about the analysis month. Here we test the analysis two ways: using only anomalies from the analysis month and using pooled anomalies from three months centered on the analysis month. For the pooled anomalies, where the anomaly is defined for the analysis month we use that value. Where the anomaly is not defined for the analysis month but is defined in one or both of the adjacent months, the anomalies from the adjacent months are used, averaging them if both are defined. As discussed below, both methods give similar results, but the pooled data gives slightly stronger analyses when sampling is most sparse.

Beside the quality control described by SR, the 2° monthly anomalies used for the ERSST.v2 high-frequency analysis are further screened. This was found to be necessary by first performing the analysis without the second screening. In that analysis, there were a few times and locations where the analyzed anomalies were much larger than other anomalies at that location. These anomalies were apparently due to biased data. For example, in the Niño-4 area (5°S–5°N, 160°E–150°W), the range of anomalies is nearly always between −1.5° and 1.5°C. However, without the second screening, there is a time in the late 1910s when the Niño-4 average is less than −3°C. In that period sampling is sparse and the unusually large anomaly appears to be caused by a few biased data that passed the initial quality control. There must be several observations with a similar bias or else the first screening would have removed them, while random errors are filtered out by fitting to modes. This was not a problem in ERSST.v1, which used fewer modes that had larger spatial scales than some of the modes used in ERSST.v2.

The additional screening removes superobservations with a high-frequency anomaly magnitude of greater than *kσ*_{NOI} where *k* is a positive number. We also retain all superobservations with anomaly magnitudes of 1.0°C or less to avoid excessive screening where the variance is weak. To assign *k,* screening with *k* = 4, 4.5, and 5 was tested and the change to the analysis was evaluated. For *k* = 4, the resulting ERSST.v2 analysis still contained occasional outliers like the Niño-4 outliers discussed above. For both *k* = 4.5 and 5, the Niño-4 example was cleaned up without damping the analysis, and there were few other large magnitude anomalies in the analysis. We used the less restrictive *k* = 4.5 value.

Most anomalies removed by the additional screening are at high latitudes, where SST gradients are strong and small errors in latitude can produce large anomaly errors. In low latitudes relatively few are removed except for before 1880, around 1915, and around 1940. Analysis sampling is determined by how many modes are selected with the available sampling. The second screening usually only reduces the number of modes selected by one or two (out of the 70 or more modes typically selected). Therefore, the second screening does not greatly affect sampling error in the analysis. Note that the low-frequency analysis is exactly the same as was used in ERSST.v1, and does not use the second screening. Additional screening is not needed for the low-frequency analysis since that analysis filters out the effects of the outliers.

The ERSST.v2 analysis was tuned to determine 1) the critical sampling level needed to accept a mode in the analysis and 2) whether to use 1 month of data or data pooled from 3 months. In ERSST.v1 SR used cross-validation tests to determine that at least 15% of the variance of each mode should be sampled for the mode to be used in the analysis. Here we tested ERSST.v2 using 3 months of pooled data and critical values of 15%, 25%, and 35% of the variance sampled for each mode.

The variance from 15-yr running periods for each test is computed and averaged over the 60°S–60°N region. The square root of the ratio of that variance to the average 1982–97 NOAA OI variance gives the relative strength of the analyses, here called the relative standard deviation. For all three critical values the variance is similar after 1950 when sampling is relatively dense. Compared to the recent period, the 15% critical value gives high variability before 1950 when sampling is sparse (Fig. 2). The relative standard deviation is as much as 25% stronger than in the recent period, indicating that some poorly sampled modes may be fit in that period. For a 35% critical-sampling value, variability is lower before 1950 compared to the recent period. With a 25% critical-sampling value the relative standard deviation is nearly constant except before 1880 when it decreases due to sparse sampling. In addition, the relative standard deviation from ERSST.v1 is nearly the same as for the 25% critical sampling in ERSST.v2. Therefore, we use a 25% critical-sampling value for ERSST.v2.

The 1982–97 standard deviation map for the ERSST.v2 using 3-month pooled data (Fig. 3) compares well to the NOAA OI standard deviation for the same period. It is an improvement over ERSST.v1, which has more damped tropical standard deviation. For the 1-month analysis the standard deviation in this period is nearly identical to the 3-month pooled analysis. For both, nearly all modes are chosen in this recent period, but in earlier periods many more modes are lost in the 1-month analysis compared to the 3-month analysis. In the first half of the twentieth century, using 3-month pooled data increases the number of selected modes by more than 20%. The increase is even greater in the nineteenth century. In addition, the spatial standard deviation for the 60°S–60°N region for ERSST.v1 and for the 3-month analysis are comparable (see Murphy and Epstein 1989 for a definition of spatial variance). However, for the 1-month analysis the spatial standard deviation is lower before about 1910. For these reasons we use the 3-month pooled data analysis for ERSST.v2.

### b. Merging of sea ice information

*T*

_{f}, as the ice concentration approaches 0.9 (i.e., 90% ice cover). The freezing point of seawater is here set to −1.8°C, except in the Great Lakes where it is set to 0°C. A relationship between collocated SST and sea ice concentration is derived by a quadratic adjustment. The adjustment is defined as

*T*

_{Q}

*aI*

^{2}

*bI*

*c,*

*I*is the ice concentration and

*a,*

*b,*and

*c*are empirically derived coefficients derived with the constraint that

*T*

_{Q}(

*I*= 0.9) =

*T*

_{f}. Those coefficients are defined using the available data from the marginal ice zone (MIZ). In situ data in the MIZ are sparse, so both in situ and Advanced Very High Resolution Radiometer (AVHRR) satellite-based SSTs are used to derive the coefficients for a recent period. In the Arctic, coefficients are computed as a function of both month and longitude, with separate coefficients for several Northern Hemisphere regions (the Great Lakes, Baltic Sea, Sea of Okhotsk, Sea of Japan, and Gulf of Alaska). In the Antarctic the coefficients are computed only as a function of month because of limited data.

In Rayner et al. (2003), adjusted temperatures with concentrations less than 0.15 were smoothed into the no-ice analysis using Poisson blending. In our ice blending we test the quadratic adjustments, computing them the same way except that we simplify the ice-edge smoothing of *T*_{Q} so that the adjustment linearly merges with the no-ice analysis as *I* goes from 0.2 to 0.

Quadratic adjustments gives reasonable results, but there are potential problems because of reliance on regional, and month-dependent empirical coefficients, which were computed to fit the available data in a fixed base period. If conditions change significantly over the analysis period from the base conditions, the method may not fully represent those changes. In addition, some of the coefficient fits were computed using satellite SST retrievals, which have been shown to be biased because of difficulty in distinguishing ice from open water (N. A. Rayner 2003, personal communication). Therefore, we developed a simplified adjustment method, which we here examine along with the quadratic method.

To test the sea ice to SST conversion algorithms, we use the sea ice concentrations of Rayner et al. (2003). Those concentrations include Arctic summer concentrations from microwave sea ice retrievals, which tend to underestimate summer concentrations due to the formation of melt ponds on the ice. Those satellite retrievals are combined with other observations, values taken from atlases, and climatology when there are insufficient data. The methods tested all set the analysis SST to the freezing point of seawater for ice concentrations of more than 0.9.

*A*is the pivot point where the two pieces join and is set between 0 and 0.9. The unadjusted analysis temperature is

*T*

_{o}. For

*A*= 0, this gives a linear adjustment with no pivot. For

*A*= 0.9 there is only an adjustment at

*I*= 0.9. This gives a linear adjustment from

*T*

_{o}to

*T*

_{f}, as the concentration

*I*ranges between

*A*and 0.9. In practice the piecewise linear method is similar to the quadratic method, which gives slowly changing temperatures at low concentrations and more rapid changes at high concentrations. However, the piecewise linear adjustments require only one (monthly and regionally constant) parameter, compared to two monthly and spatially dependent parameters used for the quadratic adjustment.

To evaluate the different methods, we bin SST values from the unanalyzed COADS 2° superobservations, along with the ERSST.v2 analysis adjusted using several sea ice to SST algorithms. For different ice concentration intervals (i.e., intervals of *I*) the average SST from the algorithms is computed and compared to the COADS superobservations at locations where COADS observations are available averaged over time and spatially. Values are binned within ice concentration intervals of 0.1. Here we discuss comparisons for the 1980– 89 period over regions poleward of 60° latitude. Comparisons for other decades and also some regional comparisons showed similar results.

Table 1 gives the root-mean-squared difference (rmsd) of adjusted SST from COADS superobservations, comparing concentrations between 0.1 and 0.8. The rmsd is weighted by the number of comparisons in each ice concentration interval. For the 60°–90°N region in summer, winter, and all months, the piecewise linear rmsd decreases with increasing *A,* from *A* = 0 (referred to as PL0) to *A* = 0.9 (referred to as PL9). As discussed below, we believe that the high-concentration COADS values are questionable, which makes the lower rmsd values for PL9 questionable. The quadratic adjustment gives similar rmsd values to PL6, but PL0 is clearly inferior to the quadratic adjustment. From comparison in the 60°–90°S region, a similar conclusion can be drawn.

In the warm seasons for both hemispheres, the COADS superobservations tend to not decrease very much with increasing concentration (Fig. 4). This may be due to more noise at high concentrations because there are fewer temperature observations with high concentrations (shown on the lower half of the panels), or due to bad matchups between the ice concentration data and COADS from errors in either dataset, or due to SST biases in high-concentration observations. Physically, the SST must approach *T*_{f} as *I* approaches 0.9, so the high temperatures with high sea ice concentrations are questionable.

The quadratic-adjusted SST and the piecewise linear adjusted SST with *A* = 0.6 (PL6) give similar variations of temperature with ice concentration. Based on these comparisons, we use the PL6 adjustment for ERSST.v2. The same adjustment is also applied to the in situ OI comparison analysis. The entire 1854–1997 period is adjusted using the sea ice concentrations of Rayner et al. (2003).

### c. Bias correction adjustment

Corrections of SST for historical biases are needed before 1941 due to differences in measurement practices before and after that year (Folland and Parker 1995). Before World War II most SST measurements were computed from water samples taken on deck in buckets. After the war, ship condenser intake measurements were more common. Folland and Parker (1995) suggest a set of bias corrections to account for the different measurement practices. Those corrections abruptly end after 1941, and there are no suggested corrections for after that year. Following Folland and Parker (1995), Smith and Reynolds (2002) developed a set of bias corrections using different methods. Due to sparse data during World War II, they were not able to resolve that period well enough to justify a more gradual end to the corrections. Thus, they also end their corrections abruptly after 1941. Those Smith and Reynolds (2002) corrections were used in ERSST.v1.

Around 1940, COADS release 1 was dominated by Japanese data (see Fig. 3 of Woodruff et al. 1998), but new (primarily U.S.) data altered the COADS release 2.0 data used for ERSST. Over much of the oceans, the 1941 ERSST.v1 anomalies are noticeably warmer than the HadISST anomalies for that year. An analysis by C. K. Folland (2003, personal communication) indicates that the COADS release 2 SST has a positive bias relative to the data originally considered by Folland and Parker (1995). This bias begins in 1939 and gradually increases until 1941, ending after 1941.

Most differences between COADS and the data previously considered by Folland and Parker (1995) disappear if the bias correction is reduced to zero linearly over the 1939–41 period, rather than as a step after the end of 1941. Therefore, for that period we weight the Smith and Reynolds (2002) bias correction by 1 − *m*/36, where *m* = 1 for January 1939 to *m* = 36 for December 1941. That adjusted bias correction is used for ERSST.v2. In the future, more careful evaluation of biases in this period is planned.

### d. Error analysis

^{2}

^{2}

_{R}

^{2}

_{S}

^{2}

_{B}

*R,*

*S,*and

*B*represent random, sampling, and bias error variances, respectively. Random error variance in an analysis,

^{2}

_{R}

^{2}

_{S}

^{2}

_{B}

The ERSST.v2 analysis removes almost all random error. As discussed above and in SR, the low-frequency component of the analysis averages and filters a large number of data. The low-frequency analysis requires a minimum of over 100 observations and, in addition, there is spatial and temporal binomial smoothing as part of that analysis. So random error variance in the low-frequency analysis is reduced a factor of more than 100 compared to the random noise of individual observations. The individual observations are themselves quality controlled, so the low-frequency random error variance is very small.

The high-frequency component of the analysis is computed by fitting data to a set of spatial modes representing the covariance from a well-sampled period. Random errors should not project onto these modes and, therefore, they should be filtered out of the high-frequency analysis. For ERSST.v1, SR computed the signal/noise variance ratio for several 30-yr periods in the nineteenth and twentieth century and gave the 60°S– 60°N spatial average of the ratio for each period. In ERSST.v1 the ratio was found to be about 30 for all periods. The noise error variance is here represented by ^{2}_{R}*σ*^{2}_{Ta}

*T*be the true SST anomaly and

*T*

_{a}be the analysis SST anomaly. Then by definition, the total error variance iswhere the angle brackets denote averaging. The first three terms on the right-hand side of (5) involve the variance of the true and analysis SST anomalies and the correlation between them,

*r*

_{T,Ta}. Those three terms represent sampling and random error variance. The last term, (〈

*T*〉 − 〈

*T*

_{a}〉)

^{2}, is the bias error variance.

*T*

_{a}in terms of the true anomaly

*T*as

*T*

_{a}

*αT*

*β*

*R,*

*α*and

*β*are constants, and

*R*is the random noise. Random noise includes all of the uncorrelated differences, which cannot be accounted for by

*αT*+

*β.*Note that

*α*determines the relative strength of the analysis, and the sampling error

^{2}

_{S}

*α.*The systematic bias is represented by

*β.*Since R is random, its mean is zero and it is uncorrelated with anything else, and its variance is

^{2}

_{R}

*σ*

^{2}

_{Ta}

*α*

^{2}

*σ*

^{2}

_{T}

^{2}

_{R}

*σ*

^{2}

_{T}

*α*

^{2}

*σ*

^{2}

_{T}

*α*

^{2}

*σ*

^{2}

_{T}

^{2}

_{R}

*r*

_{T,Ta}≈ 1. Thus, for ERSST.v2

^{2}

_{S}

^{2}

_{R}

^{2}

_{S}

*T*

*T*

_{a}

^{2}

*σ*

_{T}

*σ*

_{Ta}

^{2}

*T*

*T*

_{a}

^{2}

*σ*

_{T}−

*σ*

_{Ta}for the high frequency. This gives the expression

^{2}

_{S}

_{hf}

*σ*

_{T}

*σ*

_{Ta}

^{2}

_{hf}

The NOAA OI from 1982 to 1997 is used to estimate the true high-frequency SST anomaly variance (*σ*^{2}_{NOI}_{hf} ≈ (*σ*^{2}_{T}_{hf} and we assume that it is stationary. Although the NOAA OI analysis contains some noise due to its use of different data types and bias corrections for satellite data, it is dominated by satellite data and gives a good estimate of the truth. Since we use this for estimating the high-frequency error, we remove the linear trend from the NOAA OI data before computing the variance to minimize its interdecadal variations. Similarly, for the analysis we remove the linear trend for running 15-yr periods centered on the time of interest and compute the analysis high-frequency variance in that period to estimate the high-frequency sampling error. Because of the possibility of seasonality in the sampling error, we use only data from the appropriate season for seasonal or monthly estimates.

The low-frequency sampling error is computed by estimating how well the available sampling can estimate a linear trend. A trend of 0.5°C (100 yr)^{−1} is assigned everywhere on the globe. That is approximately the magnitude of the trend of global average temperature over the twentieth century. For each year the low-frequency sampling error is estimated by sampling that constant trend using the observed sampling for the year, with sampling held constant over 100 simulated trend years. The mean-squared difference between the actual trend and that subsampled low-frequency analysis of the trend defines the low-frequency sampling error variance for the given year. Repeating the process using the sampling of other years gives the low-frequency sampling error for those years. Note that this estimate of the low-frequency sampling error is similar to the estimate in SR and Smith et al. (2002), who estimated the low-frequency sampling error using coupled GCM SST output that was subsampled to match historical sampling.

The bias error variance ^{2}_{B}*σ*^{2}_{Ta}_{B} value of 0.015°C is assigned whenever the measured value falls below the minimum, as discussed by SR.

The three error components, for annual and 60°S– 60°N averages (Fig. 5), show that most of the error of the average is from the low-frequency analysis. Because of its damping when data are sparse, the low-frequency analysis may underestimate interdecadal variations. Problems are greatest before 1900 and in periods associated with the two world wars. The bias error can also be large prior to 1940, but it is generally less than the low-frequency error. The high-frequency error is usually the smallest, although it is occasionally larger than the bias error. Our bias error is similar in size to the estimate of Folland et al. (2001). However, because of the large low-frequency error estimate the overall ERSST.v2 error estimate for the near-global average is larger than their estimate. Sampling of the low-frequency (trend) part of SST needs to be greater than for the high-frequency part since the trend is not described by a stationary set of basin-scale modes. In Folland et al. (2001) a 52-yr filled analysis was developed to describe covariance associated with both the low and high frequency, allowing potentially better low-frequency analysis. However, that analysis assumes that the 52-yr period spans all important interdecadal variations, while our analysis is slightly more conservative with the low-frequency analysis.

## 4. Results

Regions where ERSST.v1 variability is too weak include the western tropical Pacific and Atlantic (Fig. 1). Time series of the Niño-4 area SST anomalies (Fig. 6) from both ERSST.v1 and ERSST.v2 show differences in the new analysis and differences from the HadISST analysis of Rayner et al. (2003). All three show similar interannual variations. Both ERSST.v1 and ERSST.v2 are more damped than HadISST before 1900. The HadISST analysis more closely follows the available observations when data are sparse, to maximize analysis signal, while the ERSST analyses are more conservative in sparse-sampling situations in order to minimize noise. The ERSST.v1 anomalies are more damped than the ERSST.v2 anomalies in this west central tropical Pacific region, especially before 1960.

In section 3a (Fig. 2) we discussed the relative standard deviation for the 60°S–60°N region. Here we show the relative standard deviation for three smaller regions: 23°–60°N, 23°S–23°N, and 60°–23°S (Fig. 7). In both of the extratropical regions the ERSST.v2 analysis is slightly weaker than ERSST.v1. However, in the northern extratropics ERSST.v2 is stronger than ERSST.v1 in the most recent period, when sampling is best. The Northern Hemisphere stronger variance in ERSST.v1 before 1960, when sampling is less, may be due to slightly less filtering of noise in that analysis. The Southern Hemisphere ERSST.v2 variance is smaller and changes less over time than the ERSST.v1 variance. Variance differences are larger in the Southern Hemisphere than in the Northern Hemisphere, and differences are largest in the most recent period, suggesting that ERSST.v1 is more sensitive to changes in sampling in that region. For the Tropics and Northern Hemisphere, HadISST variations are stronger after 1950. The HadISST analysis attempts to make maximum use of the available data, so the strength of its variations are more sensitive to changes in sampling.

In the Tropics the ERSST.v2 variance is always larger than for ERSST.v1 due to better representation of the western tropical Pacific and Atlantic, as well as the tropical Indian. Both analyses have fairly constant tropical variance. The better representation of the Tropics is the main advantage of ERSST.v2 compared to version 1. The HadISST analysis tends to be weaker between before 1950 and its strength increases as sampling increases later.

The rmsd between ERSST.v2 and HadISST (over the same 15-yr running periods) shows an uncertainty similar to our total standard error (Fig. 8), which includes bias and sampling error. Differences are largest in periods of poor sampling, especially before 1900 and around the 1940s. Besides poor sampling around the 1940s, that period also has a large difference between the two SST bias corrections, which further increases the analysis differences. Since it is difficult to know which analysis is better, the rmsd between them could also be used as a measure of analysis uncertainty. The consistency of the standard error estimate with the rmsd suggest that our error estimate is reasonable.

Maps of monthly standard sampling error indicate regions where sampling is relatively good or bad. The ERSST.v2 normalized standard sampling error for January of 1900 and 1940 (Fig. 9, left panels) indicate normalized error less than 0.3 over most regions for those years. This includes both the low- and high-frequency sampling error. Standard errors discussed here are normalized with the NOAA OI standard deviation for the month. For 1900 the sampling problems are largest in the western tropical Pacific. For 1940 the largest sampling problems are in the Southern Hemisphere. Both of these years have relatively large sampling error (Fig. 8), and sampling errors after 1950 are much smaller. The comparable sampling error maps for the in situ OI (Fig. 9, right panels) clearly shows the sampling along ship tracks. In that analysis, only densely sampled regions have a normalized sampling error of less than 0.3, and there are many more regions with error above 0.9. Contrasted to ERSST.v2, the in situ OI analysis does not use large-scale spatial modes and, therefore, does not incorporate as much spatial-covariance information. The in situ OI also does not include as much temporal-covariance information. Because of its lesser use of these statistics, the in situ OI requires more dense sampling. The ERSST.v2 requires less sampling, with the trade off that it filters the input data more than the in situ OI analysis.

## 5. Conclusions

The new analysis (ERSST.v2) is an improvement over version 1 because of its stronger variance in the western tropical Pacific, its inclusion of ice concentration information, and its improved error estimates. Averaged over large regions there is relatively little difference between ERSST.v2 and version 1. However, there are at times large differences between ERSST.v2 and the HadISST analysis of Rayner et al. (2003). Those differences generally lie within the 95% confidence interval for the analysis. For example, for the 60°S–60°N annual averages (Fig. 10, upper panel) there are large differences before 1900. In that period sampling is sparse, as reflected by the wide confidence intervals. The confidence interval is most narrow after 1950 when sampling is better than in earlier periods.

Temporal correlation of monthly HadISST and ERSST.v2 anomalies is highest in the most recent decades and away from the Southern Oceans (Table 2). Except for that region and the Arctic Ocean, typical correlations are 0.7 or higher since 1940, but less than 0.6 before 1940. Correlations are lowest near Antarctica, as would be expected from the sparse data there.

Although HadISST is almost always within the 95% confidence intervals for ERSST.v2, an exception occurs in the 60°–23°S annual averages for the period after 1980 (Fig. 10, lower panel). HadISST incorporates different types of data, including satellite estimates of SST when they are available, to help the analysis where data are sparse. Biases in the satellite data may cause the Southern Ocean cool bias relative to the COADS-based ERSST.v2.

For the 1982–97 period, when the NOAA OI and the historical analyses overlap, comparisons can be made to the NOAA OI analysis. Since the NOAA OI analysis incorporates satellite and in situ data, we consider it to be a close approximation of the truth. The global spatial rmsd between the NOAA OI analysis and ERSST.v1, ERSST.v2, and HadISST (Fig. 11) show that both ERSST.v1 and HadISST have comparable overall skill. They both also show a large annual cycle in the rmsd, with larger differences around the middle of the year (when Southern Ocean ship sampling is reduced). The ERSST.v2 has consistently higher skill (lower rmsd), and also a smaller cycle in the rmsd. The improvement in ERSST.v2 rmsd is nearly the same at all latitudes except in the Arctic Ocean, where there is almost no difference between all three.

All of these historical analyses show broadly similar variations, which give us confidence in the results. Because data are sparse over much of the historical period, methods incorporating large-scale spatial modes of variation are needed to make the most efficient use of those sparse observations. In our comparisons, all methods except the in situ OI use large-scale spatial modes. However, even with the most efficient use of the available data, there are periods such as the 1940s and before 1880 when data are very sparse, and analyses from those times should be used with caution.

The monthly average ERSST.v2 are available online along with the monthly error estimate at the National Climatic Data Center (NCDC) Web site. Analysis begins January 1853 and is updated monthly (http://www.ncdc.noaa.gov/oa/climate/research/sst/sst.html.).

## Acknowledgments

We thank Vern Kousky, Yan Xue, Kevin Trenberth, Sam Shen, Sharon LeDuc, Scott Woodruff, and an anonymous reviewer for discussions and comments. In addition, the NOAA office of Global Programs provided support for some of this work.

## REFERENCES

Folland, C. K., , and D. E. Parker, 1995: Correction of instrumental biases in historical sea surface temperature data.

,*Quart. J. Roy. Meteor. Soc***121****,**319–367.Folland, C. K., and Coauthors, 2001: Global temperature change and its uncertainties since 1861.

,*Geophys. Res. Lett***28****,**2621–2624.Kagan, R. L., 1979:

*Averaging of Meteorological Fields*(in Russian). Gidrometeoizdat, 212 pp. [English translation, 1997: Kluwer, 212 pp.].Murphy, A. H., , and E. S. Epstein, 1989: Skill scores and correlation coefficients in model verification.

,*Mon. Wea. Rev***117****,**572–581.Rayner, N. A., , D. E. Parker, , E. B. Horton, , C. K. Folland, , L. V. Alexander, , D. P. Rowell, , E. C. Kent, , and A. Kaplan, 2003: Global analyses of sea surface temperature, sea ice, and night marine air temperature, sea ice, and night marine air temperature since the late nineteenth century.

*J. Geophys. Res.,***108,**4407, doi:10.1029/2002JD002670.Reynolds, R. W., 1993: Impact of Mount Pinatubo aerosols on satellite-derived sea surface temperatures.

,*J. Climate***6****,**768–774.Reynolds, R. W., , N. A. Rayner, , T. M. Smith, , D. C. Stokes, , and W. Wang, 2002: An improved in situ and satellite SST analysis.

,*J. Climate***15****,**1609–1625.Reynolds, R. W., , C. L. Gentemann, , and F. Wentz, 2004: Impact of TRMM SSTs on a climate-scale SST analysis.

*J. Climate,*in press.Slutz, R. J., , S. J. Lubker, , J. D. Hiscox, , S. D. Woodruff, , R. L. Jenne, , D. H. Joseph, , P. M. Steurer, , and J. D. Elms, 1985: COADS: Comprehensive ocean–atmosphere data set, release 1. Climate Research Program, NOAA Environmental Research Laboratories, 262 pp. [Available from Climate Research Program, Environmental Research Laboratories, 325 Broadway, Boulder, CO 80303.].

Smith, T. M., , and R. W. Reynolds, 2002: Bias corrections for historic sea surface temperatures based on marine air temperatures.

,*J. Climate***15****,**73–87.Smith, T. M., , and R. W. Reynolds, 2003: Extended reconstruction of global sea surface temperatures based on COADS data (1854–1997).

,*J. Climate***16****,**1495–1510.Smith, T. M., , R. W. Reynolds, , and C. F. Ropelewski, 1994: Optimal averaging of seasonal sea surface temperatures and associated confidence intervals (1860–1989).

,*J. Climate***7****,**949–964.Smith, T. M., , R. W. Reynolds, , R. E. Livezey, , and D. C. Stokes, 1996: Reconstruction of historical sea surface temperatures using empirical orthogonal functions.

,*J. Climate***9****,**1403–1420.Smith, T. M., , T. R. Karl, , and R. W. Reynolds, 2002: How accurate are climate simulations?

,*Science***296****,**483–484.Thiébaux, H. J., 1997: The power of the duality in spatial-temporal estimation.

,*J. Climate***10****,**567–573.van den Dool, H. M., , S. Saha, , and Å Johansson, 2000: Empirical orthogonal teleconnections.

,*J. Climate***13****,**1421–1435.Woodruff, S. D., , H. F. Diaz, , J. D. Elms, , and S. J. Worley, 1998: COADS Release 2 data and metadata enhancements for improvements of marine surface flux fields.

,*Phys. Chem. Earth***23****,**517–527.

Average rmsd (°C) from COADS superobservations for ERSST.v2 SST adjusted using the quadratic method (Quad), and the piecewise linear method with *A* = 0.0 (PLO), 0.4 (PL4), . . . , 0.9 (PL9). Results are given for the 1980–89 period for Dec–Feb (DJF), Jun–Aug (JJA), and all months (All). Results are given for the two polar regions as indicated

Spatial averages of temporal correlation of monthly SST anomalies. The HadISST and ERSST.v2 monthly anomalies are cor related for the indicated periods, averaged over the listed regions