Abstract

Radiosonde datasets of temperature often suffer from discontinuities due to changes in instrumentation, location, observing practices, and algorithms. To identify temporal discontinuities that affect the VIZ/Sippican family of radiosondes, the 1979–2004 time series of a composite of 31 VIZ stations are compared to composites of collocated values of layer temperatures from two microwave sounding unit datasets—the University of Alabama in Huntsville (UAH) and Remote Sensing Systems (RSS). Discontinuities in the radiosonde time series relative to the two satellite datasets were detected with high significance and with similar magnitudes; however, some instances occurred where only one satellite dataset differed from the radiosondes. For the products known as lower troposphere (LT; surface–300 hPa) and midtroposphere (MT; surface–75-hPa layer), significant discontinuities relative to both satellite datasets were found—two cases for LT and four for MT. These are likely associated with changes in the radiosonde system. Three apparent radiosonde discontinuities were also determined for the lower-stratospheric product (LS; 150–15 hPa). Because they cannot be definitely traced to changes in the radiosonde system, they could be the result of common errors in the satellite products. When adjustments are applied to the radiosondes based independently on each satellite dataset, 26-yr trends of UAH (RSS) are consistent with the radiosondes for LT, MT, and LS at the level of ±0.06, ±0.04, and ±0.07 (±0.12, ±0.10, and ±0.10) K decade−1. Also, simple statistical retrievals based on radiosonde-derived relationships of LT, MT, and LS indicate a higher level of consistency with UAH products than with those of RSS.

1. Introduction

Satellite-mounted microwave sounding units (MSUs), advanced MSUs (AMSUs), and radiosondes (sondes) are currently the best sources of data for deriving time series of upper-air temperatures for short-term climate studies. At the end of 2004, the satellite record was 26 yr long; the sonde record, more than 50. Neither is homogeneous over its entire period of operation because of changes in instrumentation, in methods of operation, in processing algorithms, or, for some radiosondes, in station location. With each such change a discontinuity may be introduced into the record that could affect the size of the variations or the slope of the trends of the data. The purpose of the work described here is to analyze the effect of discontinuities on a composite temperature index derived from a particular set of sondes. Because the analysis requires the use of satellite data, the study is limited to the period 1979–2004.

The sondes used in this study come from 31 U.S.-controlled stations using the VIZ/Sippican series of instruments (VIZ for short). The National Weather Service (NWS) specially manages these stations in an effort to produce a consistent, long-term record, which is preferred for climate studies. The NWS attempted to use the same type of instrument package at these stations to reduce the number of changes and give increased confidence in the validity of long-term variations and trends. Even so, changes have occurred.

Several groups interested in climate studies have developed approaches for discovering and reducing the effects of discontinuities in larger collections of sonde records. Parker et al. (1997) used the CLIMAT TEMP monthly, upper-air data to produce several adjusted time series known collectively as HadRT. The method applied for HadRT 2.1 and HadRT 2.2 identified temperature shifts relative to the version-B satellite temperatures from the University of Alabama in Huntsville (UAH). Shifts significant according to a Student’s t test and aligned with documented changes in instrumentation or procedure were taken to be the result of discontinuities. The method is limited by (a) relying on satellite data, which have their own discontinuities; (b) relying on metadata, which are incomplete; and (c) deconvolving layer-mean discontinuities into layer-specific corrections for sondes, which is a nonunique process. Parker et al. showed that significant discontinuities existing in several stratospheric time series implied too much cooling over time. Their results for the troposphere indicate that positive and negative discontinuities appear to be more randomly distributed and have little effect on composite trends.

Lanzante et al. (2003, hereafter LKS) selected 87 stations with reasonably complete data and fairly representative spatial distribution over the globe. Each station was examined individually through 1997. Using statistical analyses and human judgment, they determined adjustments to reduce discontinuities and produce the LKS dataset. Among the factors they considered were vertical coherence, day–night differences, sudden shifts, and reports of instrument changes. Free et al. (2005) have appended post-1997 data, adjusted through objective criteria, to this dataset to produce the Radiosonde Atmospheric Temperature Products for Assessing Climate (RATPAC) datasets.

Christy and Norris (2004) selected 89 stations in the Southern Hemisphere and performed an analysis similar to that of Parker et al. (1997). The sonde and UAH satellite data (v5.1) for the troposphere were directly compared station by station to determine the effects of known major changes in instrumentation. The most prominent change was caused by the switch from Philips to Vaisala sondes, which led to relatively warmer temperatures in the troposphere. Other instrument changes, for example, from either Vaisala RS-21 or VIZ-B to Vaisala RS-80, caused shifts toward temperatures cooler than those of the satellite values. The cumulative effect of the adjustments on the trend of the composite time series was very small (±0.02 K decade−1) in agreement with Parker et al.

Thorne et al. (2005) amassed data from about 650 stations (the HadAT2 dataset). A common mean period was established for the stations from which anomalies were generated. Thorne et al. identified discontinuities in each station (the target) by comparing its anomalies with those of a reference series. The reference series was obtained from the nontarget stations whose anomalies had high temporal coherence with those of the target. Statistical tests were applied to discover discontinuities in the target. Metadata were also consulted to build confidence in the statistically identified breakpoints.

Haimberger (2004) used the output of a global reanalysis or forecast model to supply the reference time series. The sonde data were compared with the first guess of the model. A statistical analysis determined whether shifts or changes in trends appeared over time.

Our approach in this analysis is different. By examining the composite time series from a small set of stations that have experienced common changes over time (a “family” of stations), we can be more confident that any discontinuities we discover are real, even when metadata are not available for confirmation, as is typical. Indeed, confirmation was not available for most of the discontinuities identified by Lanzante et al. (2003), Thorne et al. (2005), and Haimberger (2004).

In the following we describe the datasets and methods used in the analysis. Next, we state and discuss the results. Finally, we present our conclusions.

2. Data

This analysis is based on two satellite datasets and the soundings of the 31 stations in the U.S.-controlled VIZ family. The satellite datasets supply reference time series for discovering discontinuities in the sonde dataset. Table 1 provides information about the sonde stations.

Table 1.

Listing of radiosonde stations examined in this paper. We include the date of the one major instrumental change (B to B2) in this time period in addition to other notes. Other changes are not listed here (see text).

a. Satellite data

The satellite datasets are products of UAH (Christy et al. 2003) and Remote Sensing Systems (RSS) (Mears et al. 2003; Mears and Wentz 2005). Temperatures are produced for three layers: the lower troposphere (LT), roughly surface to 300 hPa; the midtroposphere to lower stratosphere (MT), roughly surface to 75 hPa; and the lower stratosphere (LS), roughly 150 to 15 hPa. The UAH LT is version 5.2, which includes the corrected diurnal shift unique to LT (Spencer et al. 2006). The UAH MT and LS are version 5.1. The RSS LT and MT are version 2.1 and RSS LS is version 1.3. The records are derived from observations of MSUs and AMSUs flown on National Oceanic and Atmospheric Administration (NOAA) weather satellites since late 1978 and are provided as monthly anomalies on a 2.5° × 2.5° grid.

Each instrument detects the intensity of upwelling microwave radiation from the column of air in the direction with which the rotating mirror is aligned. A polar-orbiting satellite may sample an earth location twice a day, around 0730 and 1930 or around 0200 and 1400 local time. Because two satellites operated simultaneously during most of the 26 yr, local areas were usually sampled four times a day, though satellite cross-swaths are not wide enough between 40°S and 40°N to fully sample all grid cells on a given day. Diurnal corrections are applied in an attempt to account for satellite drift from the initial orbital crossing times, although ambiguity remains (Mears and Wentz 2005).

b. Sonde data

The sonde data were obtained from the Integrated Global Radiosonde Archive (IGRA) of the National Climatic Data Center (Durre et al. 2006). IGRA is a comprehensive set of sondes compiled through extensive intercomparison of 11 datasets to select the most reliable reports. The records were not adjusted to remove discontinuities, though a complex quality control check was applied to eliminate obviously erroneous observations. The VIZ-family stations release balloons at 0000 and 1200 UTC each day. Some of the Caribbean stations drop 0000 UTC observations from December to May, the part of the year unaffected by hurricanes. Because of the time required for ascent, the 50-hPa report is about an hour later than the surface report. During this time the balloon may drift to a different location. This can create a mismatch in the reported and actual local times of the sondes and satellites, but such differences do not appear to affect monthly mean anomalies since the features of the monthly mean anomaly field are large and coherent (Spencer et al. 1990; Christy et al. 1998).

c. Comparison of satellite and sonde data

To be able to compare sonde and satellite data, brightness temperatures simulating the satellite observations of the three layers are computed from sonde profiles. This is done with full radiation code that includes the tiny effects of humidity and surface emission and reflection. For a profile to be acceptable it must report data to at least 50 hPa. On average over this period, 90% reached 50 hPa. Observations missed during an ascent are interpolated from previous and following soundings of the same release time provided the time gap is not too large. Any remaining missing levels are then supplied by vertical interpolation except at the top of the sounding, where missing values are replaced by the last valid observation. The great majority of months were populated with at least 10 days of soundings for each release time. Monthly means were produced for both 0000 and 1200 UTC. If enough soundings for both release times were available, a daily average of the two was calculated and is designated 99 UTC in Table 1.
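The vertical gap-filling step described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function name `fill_missing_levels` is hypothetical, interpolation linear in log pressure is an assumption (the text does not state the interpolation variable), and levels are assumed to be ordered from the surface upward. Missing values above the last valid level take the last valid observation, as the text states; the same hold-the-edge behavior at the bottom of the profile is a side effect of `np.interp`, not something the text specifies.

```python
import numpy as np

def fill_missing_levels(pressure_hpa, temperature_k):
    """Fill NaN temperatures in one sounding (levels ordered surface -> top).

    Interior gaps: linear interpolation in log pressure (an assumption).
    Gaps above the highest valid level: last valid observation is repeated.
    """
    p = np.asarray(pressure_hpa, dtype=float)
    t = np.asarray(temperature_k, dtype=float)
    ok = ~np.isnan(t)
    if not ok.any():
        return t.copy()          # nothing to interpolate from
    x = -np.log(p)               # increases with height when p decreases
    # np.interp holds the end values constant outside the valid range.
    return np.interp(x, x[ok], t[ok])

# Illustrative profile (made-up values): one interior gap and a missing top level.
p = [1000.0, 850.0, 700.0, 500.0, 100.0, 50.0]
T = [290.0, np.nan, 270.0, 250.0, 210.0, np.nan]
filled = fill_missing_levels(p, T)
```

In this sketch the 850-hPa value falls between its neighbors and the 50-hPa value repeats the 100-hPa observation.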

3. Method

The monthly average satellite and sonde anomalies were paired in time and space. If a sonde average was missing, the satellite average was set to missing. A residual mean annual cycle was calculated and removed to account for the aliasing influence of missing values and to ensure the satellite and sonde anomalies were consistent. At this point there were 31 time series of 312 months for each data source and atmospheric layer.
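The pairing and residual-annual-cycle steps can be sketched for a single station and layer as below. The function name `pair_and_deseasonalize` is hypothetical, and the sketch assumes each series is a 1-D array of monthly anomalies beginning in January, with NaN marking missing months.

```python
import numpy as np

def pair_and_deseasonalize(sonde, sat):
    """Pair two monthly anomaly series and remove each one's residual annual cycle."""
    sonde = np.asarray(sonde, dtype=float).copy()
    sat = np.asarray(sat, dtype=float).copy()
    # Pairing: where the sonde month is missing, set the satellite month to missing.
    sat[np.isnan(sonde)] = np.nan
    for series in (sonde, sat):
        for m in range(12):                  # calendar months, 0 = January
            vals = series[m::12]
            if np.all(np.isnan(vals)):
                continue                     # no data for this calendar month
            # Subtract the mean of the months that remain after pairing.
            series[m::12] = vals - np.nanmean(vals)
    return sonde, sat
```

Because the calendar-month means are computed after pairing, both series are referenced to exactly the same set of months, which is the consistency the text asks for.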

We composited each group of 31 time series into an average series for each data source and layer. The resulting nine series are shown in Fig. 1 as annual averages. Also, before compositing the series, we computed the differences between the VIZ and UAH and between the VIZ and RSS series and composited them into mean difference series for each layer. The resulting six difference series are also shown in Fig. 1 as annual averages. We will refer to any difference series obtained by subtracting the UAH series from the corresponding VIZ series as VIZ-UAH. Similarly, for the RSS differences we use VIZ-RSS. We will frequently refer to the trend of the difference series, a metric sensitive to discontinuities.
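A minimal sketch of the compositing and differencing follows. The function names are hypothetical, the inputs are assumed to be arrays of shape (stations, months) with NaN for missing values, and the ordinary least-squares trend is an assumption, since the text does not state how trends were fitted.

```python
import numpy as np

def composite(series_by_station):
    """Average a (n_stations, n_months) array over stations, ignoring NaN."""
    return np.nanmean(np.asarray(series_by_station, dtype=float), axis=0)

def difference_composite(viz, sat):
    """Difference station by station (VIZ minus satellite), then composite."""
    return composite(np.asarray(viz, dtype=float) - np.asarray(sat, dtype=float))

def trend_per_decade(series):
    """Least-squares slope of a monthly series in K per decade (assumed method)."""
    y = np.asarray(series, dtype=float)
    t = np.arange(y.size) / 120.0            # time in decades
    ok = ~np.isnan(y)
    slope, _intercept = np.polyfit(t[ok], y[ok], 1)
    return slope

# Synthetic check: a sonde series drifting 0.1 K per decade relative to the satellite.
t = np.arange(312) / 120.0
sat = np.zeros((3, 312))
viz = sat + 0.1 * t
viz_minus_sat = difference_composite(viz, sat)
```

Here `trend_per_decade(viz_minus_sat)` recovers the imposed relative drift, which is why the trend of a difference series is a useful metric for undetected discontinuities.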

Fig. 1.

Time series of LS, MT, and LT annual anomalies for the unadjusted VIZ and the UAH and RSS datasets. Differences between VIZ and the two satellite datasets are beneath each respective time series, offset for clarity. Horizontal grid lines are separated by increments of 1 K.

The composite sonde product is only an index series. Because of the irregular spatial distribution of the VIZ sondes, the composite product is not a good indicator of large-scale temperature behavior. Also, adjustments to the index series may not apply to the component stations. Variations between Tropics and high latitudes and other local climate effects may require different adjustments from site to site. Even at the same site, series constructed for different release times may need to be adjusted separately (Elliott et al. 2002; Sherwood et al. 2005). The purpose of averaging the time series of the individual stations is to smooth small-scale variations and thus make breakpoints common to the family easier to identify.

To determine whether a discontinuity occurred during a given month, we examined the composite difference series. Specifically, we took the difference of the 36-month averages immediately preceding and following the month in question. If the difference was significantly large, a discontinuity could exist. The number of degrees of freedom in a single 36-month period varied from 5 to 35 but was usually less than 15. For significance testing we chose the z score (the ratio of the current difference to the standard deviation of the differences for all months). We took z = ±3.0 as the threshold of a significant breakpoint. This ensures significance at the 95% confidence level for even the smallest number of degrees of freedom.
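The test can be sketched as follows. The function name `breakpoint_z_scores` is hypothetical, and three details are assumptions not fixed by the text: the month in question is excluded from both windows, the sign convention is "following minus preceding", and the standard deviation is the population value over all months with two full windows.

```python
import numpy as np

def breakpoint_z_scores(diff, window=36):
    """Return (z, shift) for a monthly composite difference series.

    shift[i] = mean of the `window` months after month i
               minus mean of the `window` months before month i.
    z[i]     = shift[i] / std(shift over all months with full windows).
    """
    d = np.asarray(diff, dtype=float)
    n = d.size
    shift = np.full(n, np.nan)
    for i in range(window, n - window):
        before = d[i - window:i]
        after = d[i + 1:i + 1 + window]
        shift[i] = np.nanmean(after) - np.nanmean(before)
    z = shift / np.nanstd(shift)
    return z, shift

# Synthetic example: a +0.2 K step introduced at month index 150.
d = np.zeros(312)
d[150:] = 0.2
z, shift = breakpoint_z_scores(d)
candidates = np.flatnonzero(np.abs(z) >= 3.0)   # months passing the |z| threshold
```

In the noise-free synthetic series the largest |z| falls at the imposed step, and `shift` at that month gives the size of the discontinuity.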

4. Results

We began by searching for breakpoints in the full-station composite difference series. We then looked for breakpoints in difference series formed from subsets of the full 31-station set. The purpose of considering subsets was to determine whether the breakpoints were an overall characteristic of the sondes or appeared because of peculiarities in certain subsets. One of the subset divisions was by latitude zone, the other, by day versus night release time.

a. Full-station breakpoints

Figure 2 displays the monthly time series of the absolute values of z scores for each layer (|z| scores). It also displays the difference series from which the |z| scores were derived (VIZ-UAH and VIZ-RSS). Dark arrows identify consensus breakpoints, that is, when the mean of the |z| scores for VIZ-UAH and VIZ-RSS exceeded the threshold. Gray arrows show events when the |z| score of either VIZ-UAH or VIZ-RSS exceeded 3.5, but the mean of the two was less than 3.0. When one difference series produced a significant breakpoint but the other did not, one of the satellite datasets may have been part of the problem. In this figure the breakpoints are labeled A, B, C, etc., in order of decreasing magnitude.
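The arrow classification can be written as a small decision rule. The function name `classify_breakpoint` and the returned labels are hypothetical; the thresholds of 3.0 and 3.5 are those given in the text. The rule applies to the |z| scores at a single month and does not attempt the peak selection needed to date a breakpoint.

```python
def classify_breakpoint(z_uah, z_rss, consensus=3.0, independent=3.5):
    """Classify one month from the z scores of VIZ-UAH and VIZ-RSS."""
    mean_abs = (abs(z_uah) + abs(z_rss)) / 2.0
    if mean_abs > consensus:
        return "consensus"       # dark arrow: mean |z| exceeds the threshold
    if max(abs(z_uah), abs(z_rss)) > independent:
        return "independent"     # gray arrow: one dataset only
    return "none"

# Values quoted in the text:
ls_c = classify_breakpoint(2.29, 6.03)        # LS-C, September 1986
lt_1993 = classify_breakpoint(-1.23, -3.89)   # LT, August 1993
mt_1993 = classify_breakpoint(-1.99, -3.92)   # MT, July 1993
```

With these inputs LS-C is a consensus breakpoint (mean |z| of 4.16), while both 1993 events are independent RSS breakpoints (mean |z| below 3.0).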

Fig. 2.

Time series of absolute values of z scores of 36-month differences of the difference time series between VIZ and UAH and between VIZ and RSS. Below the z scores are the actual monthly difference anomalies (K). Horizontal grid lines are separated by an increment of 2 for the z scores and 0.4 K for the anomaly differences.

For MT, the |z| scores for both VIZ-UAH and VIZ-RSS identify significant breakpoints for June 1997 (MT-A), January 1990 (MT-B), December 2001 (MT-C), and December 1982 (MT-D). A temperature shift in 1993 was significant for VIZ-RSS but not for VIZ-UAH. For MT-A and MT-B a corresponding shift occurs in LT but not in LS. Thus the source of these breakpoints appears to lie in the lower troposphere shared by both LT and MT.

For LT, a shift in VIZ-UAH corresponds to MT-C, but a corresponding shift does not occur in VIZ-RSS. The greater variability in VIZ-RSS in its last two years makes identifying a breakpoint less probable during this period. Because MT-D has no corresponding signature in LT, the breakpoint appears to derive from the stratosphere. As in MT, a shift in VIZ-RSS occurs in 1993 but not in VIZ-UAH.

For LS, VIZ-UAH and VIZ-RSS consistently identify breakpoints, although the peak in LS-B for VIZ-RSS is 11 months later than that for VIZ-UAH. Both series are significant over the same period. Because the mean of the |z| score is larger at the VIZ-RSS peak, we selected November 1983 as the date of this breakpoint. For LS-C (September 1986) the |z| score of VIZ-RSS reaches 6.03 but that of VIZ-UAH is only 2.29. The reverse situation occurs in March 1994. This latter event is not identified as a consensus breakpoint because the mean of the two |z| scores is less than 3.0.

These two inconsistencies coincide with times when satellite-merging events occurred. At the time of LS-C, both UAH and RSS dropped NOAA-6 and added NOAA-10. UAH dropped NOAA-14 in March 1994. The merging procedure at the earlier time is complicated because several satellites influence the handoff from NOAA-6 to NOAA-10. UAH uses a “backbone” method in which intersatellite biases are determined from a single path (NOAA-6 to NOAA-9 to NOAA-10). RSS uses a consensus, or “unified” method, in which biases from all overlapping satellite pairs contribute (NOAA-6 to NOAA-7, NOAA-7 to NOAA-8, NOAA-7 to NOAA-9, and NOAA-8 to NOAA-9). The methods also differ in the way adjustments are made for diurnal drift and instrument calibration.

With about 15% of the MT signal coming from LS, one would expect to see a breakpoint there for each LS breakpoint, but the shifts at March 1991 (LS-A), September 1986 (LS-C), and March 1994 have no noticeable counterparts in the |z| scores of MT for either VIZ-UAH or VIZ-RSS. Thus, satellite-merging procedures must be considered as a possible cause of the differences at these times rather than discontinuities in the sonde data.

b. Zonal breakpoints

We divided the Northern Hemisphere into three latitude zones: low (0°–30°N), mid (30°–47°N), and high (47°–72°N). There were 11 low-latitude sonde stations, 11 midlatitude stations, and 9 high-latitude stations. VIZ-UAH and VIZ-RSS composites were formed for each layer for each subset. Figures 3a–c display the temperature shifts in each difference series wherever significant discontinuities occur. For comparison, these figures also include the significant shifts in VIZ-UAH and VIZ-RSS for the full-station series. Full-station consensus breakpoints, determined in the analysis above, are labeled either UAH Cons or RSS Cons. Nonconsensus but significant breakpoints determined for each dataset independently are labeled UAH Ind or RSS Ind. Significant breakpoints from the zonal subsets are labeled Low, Mid, or High.

Fig. 3.

(a) Individual LS temperature shifts for breakpoints with z scores > 3.0 or <−3.0. Breakpoints identified from UAH data are represented by large open symbols; those from RSS are represented by small filled symbols. In the key, “cons” (consensus) indicates a breakpoint found as the average of both UAH and RSS data, and “ind” (independent) indicates a breakpoint found by either UAH or RSS data independently. Latitude zones are Low (0°–30°N), Mid (30°–47°N), and High (47°–72°N). (b) As in (a) but for MT. (c) As in (a) but for LT.

For MT (Fig. 3b), the zones repeat the four consensus breakpoints from the full-station analysis. Also, both the full-station and zonal analyses show that MT-A (1997) is strongest in the low latitudes. From the full-station, low, mid, and high VIZ-UAH datasets and the corresponding datasets of VIZ-RSS, there are eight opportunities to confirm a significant breakpoint. The numbers of confirmations were tallied from tables of |z| scores (not included). The counts for MT-A through MT-D are 8, 8, 8, and 5. Thus, both UAH and RSS robustly identify MT-A through MT-C, and MT-D less so. All four VIZ-RSS datasets identify a broad area of temperature shift in the period 1992–94; VIZ-UAH does not do so except in the midlatitudes.

For LT (Fig. 3c), |z| scores tend to be lower than those for MT or LS because of the way in which brightness temperatures are computed for this layer. The LT product, which is the residual of differing satellite view angles, is inherently noisier and thus less likely to locate significant breakpoints. Also, the RSS method for merging satellite observations produces noisier difference series than those used by UAH. Indeed, although the full-station VIZ-RSS series identifies LT-A (1997) and LT-B (1990) as significant, the zonal VIZ-RSS series do not (this is not obvious in Fig. 3c because of the superposition of symbols). Only the low-latitude VIZ-RSS identifies LT-A, and none of the zonal series identify LT-B. However, even with generally lower LT |z| scores, all four VIZ-UAH series identify both LT-A and LT-B as significant. LT-A and LT-B coincide in time with MT-A and MT-B, both of which were identified as highly significant by UAH and RSS, and thus support the hypothesis that changes in sondes at these times affected how the troposphere was monitored.

The full-station VIZ-RSS indicates relative warming around 1992–93 in LT (Fig. 3c) and MT (Fig. 3b). Only high-latitude VIZ-UAH agrees for LT, and midlatitude VIZ-UAH, for MT. The LT full-station z score for VIZ-RSS was −3.89 in August 1993. For VIZ-UAH it was only −1.23. The MT full-station z score for VIZ-RSS was −3.92 in July 1993. For VIZ-UAH it was −1.99. The RSS scores are significant; those of UAH are not. The z scores are reflected in the RSS and UAH anomaly series. The LT full-station anomaly for RSS in August 1993 was −0.140 K; for UAH, −0.044 K. The MT full-station anomaly for RSS in July 1993 was −0.093 K; for UAH, −0.052 K.

The differences between the RSS and UAH anomaly series in the 1992–94 period are the major source of the relatively more positive trends in the RSS series, since this event occurs near the center of the time series. During this time the instrument body temperature of NOAA-11 increased by more than 6 K. The adjustment in the observed temperature between 1992 and 1994 through the correction coefficient for MT is −0.21 and −0.13 K for UAH and RSS, respectively; that is, the 6-K warming of the instrument induces a spurious warming of the observed temperature that must be removed (Christy et al. 2000). Thus, a relative difference of 0.08 K in the adjustment between UAH and RSS explains most of the difference here for MT. If any problems with the sondes (discussed later) existed during this period, we are not aware of them. This leads us to conclude that the differences in the way RSS and UAH correct the observed temperature for the variation of the NOAA-11 instrument temperature are largely responsible for the differing trend results.

December 2001 also points to differences in the RSS and UAH datasets. The full-station and low-latitude VIZ-UAH series agree with MT-C that a significant temperature shift occurred in late 2001, but the noise level in the VIZ-RSS series for LT did not allow a consensus.

For LS (Fig. 3a), there is even less consistency in the results than in MT or LT. The numbers of confirmations of the breakpoints out of a possible 8 are 7, 7, 4, and 4 for LS-A through LS-D. Of these, the numbers of UAH confirmations are 3, 4, 1, and 1. Figure 3a shows that the consensus breakpoint of 1986 (LS-C) is significant in VIZ-RSS largely because of the low-latitude stations. VIZ-UAH also has a low-latitude breakpoint, but it is not strong enough to overcome the influence of the remaining stations. If sonde changes are the cause of this breakpoint, they are local or zonal, not systemic. This situation argues for site-specific rather than dataset-wide adjustments to compensate in greatest detail for the shifts that do occur, just as is done in the methodologies of HadRT, RATPAC, HadAT, and Haimberger.

Overall, most confidence can be placed in the accuracy of LT-A, LT-B, MT-A, MT-B, MT-C, LS-A, and LS-B. The less significant breakpoints, which depend primarily on only one satellite dataset, may simply represent the discovery of discontinuities in one of the satellite datasets by the sondes. The evidence also suggests that stations in specific zones may have changed at times different from the remaining population.

c. Day–night breakpoints

The final test of the breakpoints was to determine whether they appear in both daytime and nighttime releases. Because the 31 stations lie on either side of meridian 90°W, some of the sondes for a given release time were released in the day and the others, at night. Six of the stations were north of 30°N and within 10° longitude of 90°W. During either the winter or summer, the day–night classification of these stations would be incorrect, but we did not remove them from the analysis for these periods. We also limited the analysis to MT to avoid the noisier satellite data in LT and to reduce the impact of the time to reach the highest elevations and thus the potential for being in a day–night regime opposite from that of the release.

This analysis identified the same breakpoints as the previous analyses. The magnitudes were similar, and except for three minor differences, the times were identical: 1) the daytime MT-C was four months earlier, 2) the daytime MT-C was one month earlier, and 3) the nighttime MT-D was one month earlier. The average temperature of daytime ascents becomes cooler than that of nighttime ascents over the 26-yr period by 0.05 K decade−1 (see Sherwood et al. 2005). The magnitudes of the daytime breakpoints are less positive than those for the night, but even the largest difference is only +0.044 K. Thus, adjustments to the sondes based on satellite comparisons will shift daytime trends more positively than nighttime trends. Though the trends of daytime observations are slightly more negative than those of the nighttime observations and thus need more positive adjustments, the breakpoints found when the sondes were classified by release time were the same as those found without this classification.

d. A summary of the results

In viewing all the significant breakpoints, whether derived by consensus or independently, we conclude that UAH and RSS agree in identifying the following. 1) LS-B and MT-D (1982–83) represent a stratospheric change with perhaps an upper-tropospheric influence. 2) MT-A (1997) and MT-B (1990) and their counterparts LT-A and LT-B are exclusively tropospheric and highly significant. 3) LS-D and MT-C (2001) appear to have both tropospheric and stratospheric signatures. 4) LS-A is exclusively stratospheric.

The remaining differences arise where the two satellite datasets disagree. LS-C (1986) and the UAH LS breakpoint (1994) are likely related to differences in merging procedures. Also, the significant breakpoints identified by RSS for LT and MT (∼1993) were not found by UAH.

5. Breakpoint attribution

We now consider whether documented sonde changes, undocumented sonde changes present in corporate memory, or satellite data problems may be the source of the breakpoints identified above.

a. June 1997

The largest tropospheric breakpoint, MT-A and LT-A, coincides with the deployment of a new version of the VIZ-B sonde, VIZ-B2. The baroswitch of VIZ-B, which used a mechanical arm that rotated through 180 discrete electrical contacts, was replaced with a superior solid-state capacitance pressure sensor (Blackmore and Lukes 1998). The calibration methodology also changed. Before VIZ-B2, thermistors were calibrated in batches of 1000–1500. About 1% of the batch was tested. If the sample was deemed acceptable, passing the error tolerance of ±0.5 K over the tested temperature range, the entire batch was calibrated according to the sample and accepted. The VIZ-B2 procedure tests each thermistor rod individually and calculates the calibration coefficients to an acceptable error of ±0.3 K.

From April 1995 through November 1998, both NOAA-12 and NOAA-14 provided highly consistent data and show no apparent anomaly at June 1997. Thus, the evidence suggests MT-A and LT-A are related to the change in sonde packages. Both RSS and UAH agree the MT shift is −0.180 K. UAH indicates a shift of −0.215 K for LT-A, and RSS, −0.199 K. Thus, the unadjusted VIZ time series experienced a shift to cooler tropospheric values relative to the satellites when VIZ-B2 replaced VIZ-B. These shifts, the amounts to be subtracted from the VIZ MT and LT time series from June 1997 forward, are representative of a microwave weighting function. The effects of the change on the actual vertical profile could be complex.
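Applying such a shift to a monthly series can be sketched as below. The function names `month_index` and `apply_shift` are hypothetical, and the series is assumed to start in January 1979. Following the convention stated above, the shift is subtracted from the breakpoint month forward, so a negative shift raises the later part of the record.

```python
import numpy as np

def month_index(year, month, start_year=1979):
    """Zero-based index of (year, month) in a monthly series starting in January."""
    return (year - start_year) * 12 + (month - 1)

def apply_shift(series, shift, year, month):
    """Subtract `shift` from the series from (year, month) forward."""
    adjusted = np.asarray(series, dtype=float).copy()
    adjusted[month_index(year, month):] -= shift
    return adjusted

# MT-A: shift of -0.180 K at June 1997, applied to a placeholder series of zeros.
viz_mt = np.zeros(312)
viz_mt_adj = apply_shift(viz_mt, -0.180, 1997, 6)
```

Months before June 1997 are unchanged, and months from June 1997 onward are 0.180 K warmer than in the unadjusted series.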

Seven of the 31 stations did not exhaust their supplies of VIZ-B until after June 1997 (Elliott et al. 2002). The composite difference time series of these seven indicated a peak z score at December 1998. To estimate the shift if all stations had exhausted the VIZ-B supplies simultaneously, we shifted the anomaly series of the seven stations backward to align December 1998 with June 1997 in the remaining series. Now forming the 31-station composite, we determined the new shift. Because the change was less than 0.02 K, the breakpoint was left at June 1997 for the 31-station composite.
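The realignment can be illustrated with a simple lag. December 1998 is 18 months after June 1997, so the seven late-changing series are moved 18 months earlier. The function name `shift_backward` is hypothetical, and filling the vacated final months with NaN is an assumption, since the text does not say how the end of the shifted series was treated.

```python
import numpy as np

def shift_backward(series, n_months):
    """Move a monthly series n_months earlier along its last axis, padding the end with NaN."""
    s = np.asarray(series, dtype=float)
    out = np.full(s.shape, np.nan)
    n = s.shape[-1]
    out[..., :n - n_months] = s[..., n_months:]
    return out

# Lag between December 1998 and June 1997, in months.
lag = (1998 - 1997) * 12 + (12 - 6)

# Placeholder series with a step at December 1998 (index 239 from January 1979).
late = np.zeros(312)
late[239:] = -0.2
aligned = shift_backward(late, lag)
```

After the shift the step sits at index 221 (June 1997), so the seven realigned series can be composited with the other 24 stations and the breakpoint test repeated.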

b. January 1990

The second-largest breakpoint, MT-B and LT-B, corresponds to a ground station hardware/software upgrade from MiniArt 2 to MicroArt. The algorithms and selection levels were not changed, but certain settings were; for example, the minimum reporting temperature was set to 183.05 K (−90.1°C). Because this breakpoint is very significant, having a mean MT z score of +6.45 and a mean LT z score of +4.25, and of clear tropospheric importance, we speculate that the change introduced by the new processing system generated a temperature increase. The warming is not likely to have been induced by the satellite data. During November 1988 through September 1991, NOAA-10 and NOAA-11 provided highly consistent data and show no anomaly at this breakpoint. The MT-B shift is +0.156 K for both UAH and RSS. The LT-B shifts are +0.168 and +0.164 K for UAH and RSS, respectively. Both datasets agree to within 0.001 K for MT-A and MT-B and within 0.02 K for LT-A and LT-B.

c. December 2001

MT-C is possibly related to the acquisition of VIZ by Sippican in December 1997. Sippican built a plant in Juarez, Mexico, in 1998. Thousands of Philadelphia-made thermistors were stockpiled for the transition. However, a critical component, the paint that encapsulates the thermistor, was not applied until after the move. Sippican began shipping Juarez-made packages to NWS around the beginning of 2000. By mid-2000, the Juarez-made sondes were in the observational stream. At first these sondes contained components from both Philadelphia and Juarez. We have found no direct evidence to indicate that the changes in manufacturing or in observational procedures caused the shift of December 2001, but the satellite data are not likely to be responsible. Both UAH and RSS used the two AMSUs (except RSS LT), the second coming online in February 2001. These instruments exhibited improved precision and intersatellite agreement. Shifts at this breakpoint are +0.173 K for UAH and +0.145 K for RSS. This is a case in which the entire vertical profile of the sondes may be affected, because (a) significant z scores exist in LS and MT for this breakpoint, and (b) UAH identifies this breakpoint as significant in LT. Because they use different channels and methods, the three satellite products are not likely to contain simultaneous errors; however, this is still a possibility.

d. 1982–83

Now we proceed to breakpoints more difficult to explain. For MT-D and LS-B we have found no official documentation that changes were made during this time, but we are aware that the data-reporting scheme was altered for instruments shipped to NWS in 1982. Earlier, the current pressure, temperature, relative humidity, and a reference number were transmitted whenever the baroswitch moved to the next contact (“pressure commutated”). In the new package, in which transmission was governed by time instead of pressure, a single quantity was reported every half-second (“time commutated”). Temperature was an instantaneous value, but pressure was given by the current position of the baroswitch. Thus, the pressure associated with a temperature could be as much as 4 hPa too high. If adjustments were not made to account for this discrepancy, then on average the temperatures would be lower for a given pressure level than with the earlier reporting scheme. We do not know that this change is responsible for MT-D and LS-B, but net cooling is consistent with the sign of these breakpoints for the upper troposphere.

According to field personnel, NWS tried to standardize the length of the cord connecting the balloon to the sonde during this period. In the highest levels of an ascent, a longer cord leads to cooler temperatures because the sonde is less affected by the balloon’s reflected radiation and circulation. Most nonstandard cords were too short.

The breadth of the interval of elevated |z| scores in 1982–83, especially in the RSS data, suggests that if a sonde change occurred, it was staggered across many months among the stations. Figure 2 shows that the initial peak is at December 1982 but that the average of the UAH and RSS |z| scores for LS is higher in November 1983. The LS-B shift is −0.179 K according to UAH; −0.208 K, according to RSS. The MT-D shift is −0.129 K for UAH, and −0.109 K for RSS. The interpretation of this breakpoint is complicated by the presence of problems in the satellite data. During this time NOAA-8 was commissioned and decommissioned twice and NOAA-6 had to be recommissioned to replace it during the outages. Also, channel 2 on the NOAA-9 MSU failed after only two years of operation. Both the LT and MT products require this channel. Merging satellite data when only short intervals of data were available from individual satellites, a characteristic of this period, can lead to more uncertainty.

e. March 1991

LS-A is also a case where we find no clear reason in the operational reports for such a highly significant breakpoint. Because this breakpoint does not affect MT, it is apparently confined to the lowest pressure levels—for example, 50–10 hPa—if it is due to radiosonde issues. We examined daily and monthly radiosonde values for eight stations that exhibited the most significant individual shifts at March 1991.4 We did not find any systematic change in the number of reports reaching each level or other logistical aspect that might affect the temperature processing.5 We also examined the daily temperature reports and anomalies from the two satellites, NOAA-10 and -11, and found no indication of a shift in their relative difference at March 1991. We speculate that an update to the computer code in the ground station, which is not uncommon, could have altered the readings in sonde-LS or that some other, undocumented change may have occurred. A joint error in the satellite datasets is still a possibility, though this is difficult to explain given the different methods by which UAH and RSS merge the data and the independent measurements of the two satellites.

f. Effects on trends

The trend metric is very sensitive to discontinuities in time series. We close this section by considering how the trends are affected when adjustments are applied to reduce shifts at consensus breakpoints in the VIZ-UAH and VIZ-RSS sets of time series. Table 2 summarizes the results for each of the three layers. The results for MT are presented in Table 2a; for LT, in Table 2b; and for LS, in Table 2c. We do not discuss the correlation-squared and standard deviation, which are included in the table for reference. For each layer, the table begins with the statistics of the unadjusted series. Adjustments are applied in three phases: 1) for each consensus breakpoint separately, 2) for consensus breakpoints that appear to arise solely from sonde changes, and 3) for all consensus breakpoints. The table presents the statistics of the adjusted series in this order. Figure 4 shows the series of Fig. 1 after the adjustments of the second phase have been made.
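To illustrate how step adjustments enter a trend calculation, the sketch below builds a synthetic monthly difference series containing only the two MT shifts quoted above (+0.156 K at January 1990 and −0.180 K at June 1997), removes them by subtracting each shift from its breakpoint onward, and compares the least-squares trends. The helper names and the noise-free series are assumptions for illustration. The sketch does not reproduce the values in Table 2.

```python
def ols_slope(y):
    """Least-squares slope of y against its index (units of y per time step)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def adjust(series, breakpoints):
    """Subtract each shift (K) from its breakpoint month to the end of the series."""
    out = list(series)
    for start, shift in breakpoints:
        for i in range(start, len(out)):
            out[i] -= shift
    return out

N_MONTHS = 312                         # January 1979 through December 2004
JAN_1990 = (1990 - 1979) * 12          # month index 132 (January 1979 = 0)
JUN_1997 = (1997 - 1979) * 12 + 5      # month index 221
mt_breaks = [(JAN_1990, 0.156), (JUN_1997, -0.180)]   # MT-B and MT-A shifts

# Synthetic difference series: zero everywhere except for the two steps.
raw = [0.0] * N_MONTHS
for start, shift in mt_breaks:
    for i in range(start, N_MONTHS):
        raw[i] += shift

adjusted = adjust(raw, mt_breaks)
trend_raw = ols_slope(raw) * 120           # K per month -> K per decade
trend_adjusted = ols_slope(adjusted) * 120
print(f"raw: {trend_raw:+.4f} K/decade, adjusted: {trend_adjusted:+.4f} K/decade")
```

Because the two shifts have opposite signs, their effects on the trend partly cancel in this toy series; the net effect depends on the size of each step and on where it falls in the record.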

Table 2.

(a) Statistics of the time series of differences between the sondes and the respective satellite datasets for the trend and the standard deviation (std dev) of the residuals. The value of the breakpoint (BP) is that which is subtracted from the VIZ-sondes throughout the remaining part of the time series. Here R2 = correlation squared, and std dev = standard deviation. (b) As in (a) but for layer LT. (c) As in (a) but for layer LS.

Fig. 4.

As in Fig. 1 but for VIZ annual anomalies adjusted for the most robust breakpoints, which are likely due to radiosonde inconsistencies (LS-A, -B, -D; MT-A, -B, -C, -D, and LT-A, -B).

The most meaningful statistics are the 95% confidence error bands on the difference trends, expressed as K decade−1 and shown with the trend values in Table 2. In the following, adjustments are made according to each satellite dataset. For MT, the error is reduced from ±0.079 to ±0.049 in VIZ-UAH when adjustments are made for A and B. In VIZ-RSS, the error is reduced from ±0.113 to ±0.094. For LT, the error is reduced from ±0.071 to ±0.056 in VIZ-UAH when adjustments are made for A and B. In VIZ-RSS, the error is reduced from ±0.120 to ±0.115. For LS, the error is reduced from ±0.125 to ±0.041 in VIZ-UAH when adjustments are made for A, B, and D. In VIZ-RSS, the error is reduced from ±0.392 to ±0.099.

When the error bands are centered on the trends to give the 95% confidence intervals, we see that 0.0 K decade−1 lies in the interval in every case. Thus, the existence of no trend in any of the VIZ-UAH or VIZ-RSS series is statistically possible. However, the error ranges in VIZ-UAH are generally half as large as those of VIZ-RSS. For VIZ-UAH, the error ranges are consistent with the previously published global values (Christy et al. 2003): ±0.05 K decade−1 (LT and MT) and ±0.10 K decade−1 (LS).

6. Simple statistical retrievals (SSRs)

SSRs are an alternative method of estimating time series for LT. They use LS to reduce the stratospheric influence on MT and thus enhance the tropospheric signal. SSRs cannot completely eliminate the influence of LS, but they can reduce it by canceling correlated layers near the tropopause (Spencer et al. 2006).

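SSRs have the form

$$\mathrm{LT_{SSR}} = (1 + \alpha)\,\mathrm{MT} - \alpha\,\mathrm{LS} + \varepsilon_0 + \varepsilon, \qquad (1)$$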

where α is the coefficient of LS influence, ɛ0 is a constant bias, and ɛ is the time-varying error [see Fu and Johanson (2004), who apply SSRs to a deeper tropospheric column than LT]. Since satellite weighting functions have overlapping layers, the variability of these layers should be consistently apparent through statistical relationships among the weighting profiles (Fu et al. 2004).

The value of α is usually determined by the anomaly method. Values of MT, LS, and LT needed in (1) are simulated from sonde data and expressed as monthly anomalies. The values of LT replace LTSSR in (1) and α is then obtained by minimizing Σɛ². When obtained by this method, the coefficient is called αA. For sonde data we used the adjusted, 31-station, composite anomalies of MT, LS, and LT. For the 312 months of data, we determined αA to be 0.160.
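The anomaly method can be sketched as an ordinary least-squares fit. The sketch assumes that (1) has the form LTSSR = (1 + α)MT − αLS + ɛ0 + ɛ, which rearranges to LT − MT = α(MT − LS) + ɛ0 + ɛ, so that α is the regression slope of (LT − MT) on (MT − LS). The function name, the joint fit of ɛ0, and the synthetic series are assumptions for illustration, not the data or code used in this study.

```python
import math

def fit_alpha_anomaly(mt, ls, lt):
    """Least-squares alpha and eps0 for LT = (1 + alpha)*MT - alpha*LS + eps0 + eps."""
    x = [m - s for m, s in zip(mt, ls)]    # MT - LS
    y = [l - m for l, m in zip(lt, mt)]    # LT - MT
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    sxy = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    sxx = sum((a - x_mean) ** 2 for a in x)
    alpha = sxy / sxx                      # slope minimizes the sum of squared eps
    eps0 = y_mean - alpha * x_mean         # constant bias
    return alpha, eps0

# Synthetic monthly anomalies (K) built with a known coefficient, then recovered.
TRUE_ALPHA, TRUE_EPS0 = 0.160, 0.01
months = range(312)
mt = [0.3 * math.sin(0.2 * t) + 0.001 * t for t in months]
ls = [0.8 * math.cos(0.05 * t) - 0.003 * t for t in months]
lt = [(1 + TRUE_ALPHA) * m - TRUE_ALPHA * s + TRUE_EPS0 for m, s in zip(mt, ls)]

alpha_a, eps0 = fit_alpha_anomaly(mt, ls, lt)
print(round(alpha_a, 3), round(eps0, 3))
```

With real anomalies the residual ɛ is not zero, and the fitted coefficient then depends on the period chosen.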

With αA inserted, the left side of (1) predicts a series for LTSSR. We used the values for MT and LS from the sondes and from the UAH and RSS satellite products to form three series of LTSSR. We are interested in how the trends of these series compare with the trends of their actual LT counterparts; LTSSR − actual LT for the three pairs, shown in Table 3, are +0.011, +0.011, and −0.026 K decade−1 for the sondes, UAH, and RSS, respectively. These differences indicate that the value of αA produced by the anomaly method generates reasonable trends for the 31-station set over the 26-yr period.

Table 3.

Trend differences between the time series produced using simple statistical retrievals (LTSSR) and the actual LT time series. Values of the SSR coefficients applied are given as αA and αT.

To test how well the αA developed from the 26-yr dataset applies to shorter periods, we divided the data into two 13-yr periods and recomputed the trend differences for each. The results are listed in Table 3. They show that the interlayer relationships producing the 26-yr αA are not representative of the shorter periods (Spencer et al. 2006). For long-term analysis the SSR technique requires that interlayer statistics be stationary and that monthly interlayer relationships represent decadal relationships (Tett and Thorne 2004).

A value of α can also be determined by the trend method. In this case α is tuned to make the trends of the sonde LT and the sonde LTSSR identical and is denoted αT. The value of αT derived from 26 yr of sondes is 0.140. Table 3 shows the trend differences of the LTSSR and LT series for the sondes, UAH, and RSS for the full 26-yr period and for the two half periods. For the full period the trend difference for the UAH data is −0.001 K decade−1, and for RSS, −0.037 K decade−1.
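The trend method also has a closed form if (1) is assumed to be LTSSR = (1 + α)MT − αLS + ɛ0 + ɛ. Because a least-squares trend is linear in the series, equating the trends of LT and LTSSR gives b_LT = (1 + α) b_MT − α b_LS, hence αT = (b_LT − b_MT) / (b_MT − b_LS), where b denotes the trend of each layer. The sketch below checks this on synthetic series; the function names and data are assumptions for illustration only.

```python
import math

def ols_slope(y):
    """Least-squares slope of y against its index (units of y per time step)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def alpha_trend(mt, ls, lt):
    """Alpha that makes the trend of (1 + alpha)*MT - alpha*LS equal the trend of LT."""
    b_mt, b_ls, b_lt = ols_slope(mt), ols_slope(ls), ols_slope(lt)
    return (b_lt - b_mt) / (b_mt - b_ls)

# Synthetic monthly anomalies (K); LT is constructed with a known coefficient of 0.140.
months = range(312)
mt = [0.0008 * t + 0.2 * math.sin(0.3 * t) for t in months]
ls = [-0.0030 * t + 0.5 * math.cos(0.1 * t) for t in months]
lt = [1.140 * m - 0.140 * s for m, s in zip(mt, ls)]

alpha_t = alpha_trend(mt, ls, lt)
lt_ssr = [(1 + alpha_t) * m - alpha_t * s for m, s in zip(mt, ls)]
trend_diff = (ols_slope(lt_ssr) - ols_slope(lt)) * 120   # K per decade
print(round(alpha_t, 3), round(trend_diff, 6))
```

The expression is undefined when MT and LS have the same trend, and it becomes unstable when the two trends are close, which is one reason the coefficient can depend on the period chosen.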

Values of αA and αT can also be derived by using satellite data for MT, LS, and LT in (1). For UAH, the values are αA = 0.154 and αT = 0.141. For RSS, they are αA = 0.190 and αT = 0.209. Recall, when derived from the sondes, the values are αA = 0.160 and αT = 0.140. If the values from the sondes and UAH are reasonable, then one or more of the RSS products is inconsistent with its partner products to a certain degree. For example, if the RSS LS time series is artificially adjusted to be more negative by −0.10 K decade−1, then αA = 0.182 and αT = 0.176 and thus closer to the values determined from the sondes and UAH. Because of the interdependency of the products, there is no obvious way to determine where the inconsistency may lie.

We make four points about the SSR analysis that apply only to this 31-station dataset for this 26-yr period. 1) LTSSR trends derived from sonde-based SSRs by either the anomaly or the trend method are within ±0.04 K decade−1 of the actual LT trends for all datasets. 2) We are unable to determine from SSRs whether one dataset is closer to actual trends than another. We can say only whether each dataset is self-consistent in the same way as sondes; that is, UAH is self-consistent within ±0.01 K decade−1 and RSS is self-consistent within ±0.04 K decade−1. 3) The SSR coefficients determined from RSS data, viewed especially in terms of the resulting trends, are significantly different from those determined from the sondes and UAH data. This suggests a higher level of inconsistency within the RSS products. 4) The values of α depend on the method of calculation and the choice of time period.

7. Conclusions

The dataset derived from the U.S.-controlled, 31-station VIZ/Sippican sondes is characterized by considerable metadata, rich observational records, and general consistency among instruments and procedures. Our analysis of the dataset, using satellite microwave-sounding products of UAH and RSS as references, indicates that changes over time have likely introduced spurious temperature shifts. In some cases these are documented by station records, but in others they are inexplicable. Other shifts are significant relative to only one satellite dataset and suggest the inconsistency lies with the satellite product.

The most robust breakpoints in the sonde time series occur on two occasions when changes in sonde instruments or procedures are well documented. In January 1990, a change in ground-station software and hardware led to warmer tropospheric temperatures. The change from VIZ-B to VIZ-B2 packages led to cooler tropospheric temperatures starting in June 1997. A temperature shift in 2001 may be related to the relocation of the sonde manufacturing plant, and another shift in the early 1980s may be related to changes in the way the sondes radioed variables to the ground stations. Overall, the discontinuities apparently attributable to the sondes affected the trends by only ±0.02 K decade−1 in the troposphere. The significant differences between sondes and satellites in the stratosphere cannot be accounted for except by the explanations already offered for the shifts in the early 1980s and in 2001. Two shifts found in LS and one each in LT and MT appear to be related to problems in the satellite datasets and not in the sondes. In particular we note that the difference around 1993 between UAH and RSS in LT and MT appears to originate from different ways the rapid increase in the instrument body temperature of the NOAA-11 MSU (over 6 K) is accommodated. Also, the very prominent difference between both satellite datasets and VIZ sondes in LS during 1991 requires an explanation in further work.

An independent satellite dataset is useful as a reference to identify breakpoints in an index time series averaged over a family of sondes. The magnitude of the shifts found by this method is not likely to be useful to adjust the shifts in individual stations. Indeed, because satellite data represent broad vertical layers of the atmosphere, determining the adjustment at each pressure level of a sonde profile would be difficult. The primary value of the method is its ability to determine where significant breakpoints may occur. Other means may be used to determine the amount of adjustment at each level.

Overall the UAH dataset is more consistent with the sonde record than is RSS. For LT, MT, and LS, the error ranges of the trends in the UAH datasets relative to those of the sondes (after the composite sonde time series have been adjusted to compensate for the identified shifts) are ±0.06, ±0.04, ±0.07 K decade−1, respectively. The corresponding error ranges for the RSS datasets are ±0.12, ±0.10, ±0.10 K decade−1. The UAH error ranges are consistent with those published in Christy et al. (2003). Simple statistical retrievals derived from interlayer relationships of sondes show a higher consistency with UAH time series than with RSS.

Acknowledgments

We thank William Blackmore, NOAA/NWS, and Thomas Curran, Lockheed-Martin/Sippican, for many details regarding the VIZ radiosonde instrumentation and its manufacture. Three reviewers suggested important revisions to the original manuscript. This research was supported by Department of Energy DE-FG02-04ER 63841 and Department of Transportation DTFH61-99-X-00040.

REFERENCES

Blackmore, W. H., and M. M. Lukes, 1998: Implementation of the VIZ-B2 radiosonde at NWS upper-air stations. Preprints, 10th Symp. on Meteorological Observation and Instrumentation, Phoenix, AZ, Amer. Meteor. Soc., 22–27.

Christy, J. R., and W. B. Norris, 2004: What may we conclude about global tropospheric temperature trends? Geophys. Res. Lett., 31, L06211, doi:10.1029/2003GL019361.

Christy, J. R., R. W. Spencer, and E. Lobl, 1998: Analysis of the merging procedure for the MSU daily temperature time series. J. Climate, 11, 2016–2041.

Christy, J. R., R. W. Spencer, and W. D. Braswell, 2000: MSU tropospheric temperatures: Dataset construction and radiosonde comparison. J. Atmos. Oceanic Technol., 17, 1153–1170.

Christy, J. R., R. W. Spencer, W. B. Norris, W. D. Braswell, and D. E. Parker, 2003: Error estimates of version 5.0 of MSU/AMSU bulk atmospheric temperatures. J. Atmos. Oceanic Technol., 20, 613–629.

Durre, I., R. S. Vose, and D. B. Wuertz, 2006: Overview of the integrated global radiosonde archive. J. Climate, 19, 53–68.

Elliott, W. P., R. J. Ross, and W. H. Blackmore, 2002: Recent changes in NWS upper-air observations with emphasis on changes from VIZ to Vaisala radiosondes. Bull. Amer. Meteor. Soc., 83, 1003–1017.

Free, M., D. J. Seidel, J. K. Angell, J. Lanzante, I. Durre, and T. C. Peterson, 2005: Radiosonde Atmospheric Temperature Products for Assessing Climate (RATPAC): A new data set of large-area anomaly time series. J. Geophys. Res., 110, D22101, doi:10.1029/2005JD006169.

Fu, Q., and C. M. Johanson, 2004: Stratospheric influences on MSU-derived tropospheric temperature trends: A direct error analysis. J. Climate, 17, 4636–4640.

Fu, Q., C. M. Johanson, S. G. Warren, and D. J. Seidel, 2004: Contribution of stratospheric cooling to satellite-inferred tropospheric temperature trends. Nature, 429, 55–58.

Haimberger, L., 2004: Homogenization of radiosonde temperature time series using ERA-40 analysis feedback information. ERA-40 Project Report Series 23, European Centre for Medium-Range Weather Forecasts, Reading, United Kingdom, 74 pp.

Lanzante, J. R., S. A. Klein, and D. J. Seidel, 2003: Temporal homogenization of monthly radiosonde temperature data. Part I: Methodology. J. Climate, 16, 224–240.

Mears, C. A., and F. J. Wentz, 2005: The effect of diurnal correction on satellite-derived lower-tropospheric temperature. Science, 309, 1548–1550.

Mears, C. A., M. C. Schabel, and F. J. Wentz, 2003: A reanalysis of the MSU channel 2 tropospheric temperature record. J. Climate, 16, 3650–3664.

Parker, D. E., M. Gordon, D. P. N. Cullum, D. M. H. Sexton, C. K. Folland, and N. Rayner, 1997: A new global gridded radiosonde temperature data base and recent temperature trends. Geophys. Res. Lett., 24, 1499–1502.

Sherwood, S., J. Lanzante, and C. Meyer, 2005: Radiosonde daytime biases and late 20th century warming. Science, 309, 1556–1559.

Spencer, R. W., J. R. Christy, and N. C. Grody, 1990: Global atmospheric temperature monitoring with satellite microwave measurements: Method and results 1979–1985. J. Climate, 3, 1111–1128.

Spencer, R. W., J. R. Christy, W. D. Braswell, and W. B. Norris, 2006: Estimation of tropospheric temperature trends from MSU channels 2 and 4. J. Atmos. Oceanic Technol., 23, 417–423.

Tett, S., and P. Thorne, 2004: Atmospheric science: Tropospheric temperature series from satellites. Nature, 432, doi:10.1038/nature03208.

Thorne, P. W., D. E. Parker, S. F. B. Tett, P. D. Jones, M. McCarthy, H. Coleman, P. Brohan, and J. R. Knight, 2005: Revisiting radiosonde upper-air temperatures from 1958 to 2002. J. Geophys. Res., 110, D18105, doi:10.1029/2004JD005753.

Footnotes

Corresponding author address: John R. Christy, ESSC/Cramer Hall, University of Alabama in Huntsville, Huntsville, AL 35899. Email: christy@nsstc.uah.edu

1

Barbados, Belize, and Grand Cayman followed a different schedule for some of the changes, but their statistics appear to be consistent with the U.S. stations and thus remain in the database. Also, Bismarck and Del Rio used the Space Data Division sondes for a few years, and Guam used the Micosonde package for two years. Months with non-VIZ instrumentation were set to missing for these stations.

2

The change from VIZ-A to VIZ-B took place generally in October 1988, but no evidence of the change was seen.

3

August 1997 (San Juan); November 1998 (Annette Island, Guam, and Hilo); December 1998 (Barrow, Majuro, and St. Paul Island).

4

Barrow, Belize, Great Falls, Green Bay, Guam, Desert Rock, Oakland, and San Diego.

5

Some thought was given to the possibility that aerosol shading from the eruption of Mt. Pinatubo may have lowered the sonde’s stratospheric readings. However, the main eruption occurred in June 1991 and the aerosol cloud was not in the vicinity of most of these stations until months after the event.