Po-Chedley and Fu investigated the difference in the magnitude of global temperature trends generated from the Microwave Sounding Unit (MSU) for the midtroposphere (TMT, surface to about 75 hPa) between the University of Alabama in Huntsville (UAH) and Remote Sensing Systems (RSS). Their approach was to examine the magnitude of a noise-reduction coefficient for one short-lived satellite, NOAA-9, whose value differs between UAH and RSS. Using radiosonde comparisons over a 2-yr period, they calculated an adjustment to the UAH coefficient that, when applied to the UAH data, increased the UAH global TMT trend for 1979–2009 by +0.042 K decade−1, bringing it into agreement with the RSS TMT trend. In examining their analysis, the authors demonstrate that 1) the adjustment calculated using radiosondes is inconclusive when errors are accounted for; 2) the adjustment was applied in a manner inconsistent with the UAH satellite merging strategy, creating a larger change than would have resulted had the actual UAH methodology been followed; and 3) the trends of a similar product that uses the same UAH coefficient are essentially identical for UAH and RSS. Based on the authors’ previous analysis and the additional work presented here, UAH will continue using the NOAA-9 noise-reduction coefficient, as is, for version 5.4 and the follow-on version 5.5.
1. Introduction
The global Microwave Sounding Unit (MSU) midtropospheric temperature TMT product measures the bulk atmospheric temperature from the surface to about 75 hPa, with peak sensitivity in the midtroposphere, as observed by a series of polar-orbiting satellites. The University of Alabama in Huntsville (UAH) has been generating this product since 1990 (Spencer and Christy 1990) and has released newer versions as methods were developed to address errors and as instrument problems were discovered (Christy et al. 1995).
In the late 1990s, we discovered that TMT differences between two satellites orbiting and observing the globe at the same time could, in many cases, be explained by variations in the temperature of the sensor itself, as represented by the hot calibration target (HCT) temperature THCT (Christy et al. 1998, 2000). This was especially true for the MSUs on spacecraft in the afternoon orbit [nominal ~1330 local time equatorial crossing: National Oceanic and Atmospheric Administration (NOAA) Satellite NOAA-7, -9, -11, -14]. We devised a technique to calculate the linear relationship between THCT and the intersatellite differences by solving a system of equations, and we then removed this effect. Remote Sensing Systems (RSS; Mears et al. 2011) and NOAA’s Center for Satellite Applications and Research (STAR; Zou and Wang 2011) also generate TMT products and likewise use corrections based on variations of THCT. Since the differences between UAH and RSS were the main focus of Po-Chedley and Fu (2012, hereafter PF2012), we focus on those. We recognize that the details of this analysis may seem obscure and difficult to follow, so we have attempted to describe the issues as simply as possible.
As noted, the magnitudes of the UAH HCT coefficients (HCTCs) were calculated empirically as the solution to the system of daily equations relating the co-orbiting satellites’ daily brightness temperature differences to their hot target temperatures (Christy et al. 2000). The HCTC is unitless, so for a given change in a sensor’s THCT, the correction to that sensor’s observed temperature, in kelvins, is the change in THCT multiplied by the HCTC.
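As a minimal sketch of this relationship, the snippet below computes the correction for a given change in THCT. The text specifies only the product, so the sign convention (the correction is added to the observed brightness temperature, so that a negative HCTC offsets the spurious warming that accompanies a rising THCT) and the function names are our assumptions.

```python
def hct_correction(delta_t_hct_k: float, hctc: float) -> float:
    """Correction in kelvins for a change delta_t_hct_k (K) in the hot-target temperature.

    The HCTC is unitless, so the correction is just the product of the two.
    """
    return hctc * delta_t_hct_k


def corrected_tb(observed_tb_k: float, delta_t_hct_k: float, hctc: float) -> float:
    """Assumed convention: the correction is added to the observed brightness temperature."""
    return observed_tb_k + hct_correction(delta_t_hct_k, hctc)


# Example: NOAA-9's THCT rise of 1.135 K with the UAH v5.4 HCTC of -0.099.
print(f"{hct_correction(1.135, -0.099):+.3f} K")  # -0.112 K
```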
The individual HCTCs, one for each spacecraft, depend on how the satellite data have been adjusted beforehand. Prior to the HCTC calculation, 1) the scene brightness temperature effect due to the spacecraft’s drift through the diurnal cycle is removed; and 2) the data are low-pass filtered, so the solution addresses the time scales of intersatellite differences that matter for monitoring global temperature trends. This filtering is critical for the UAH merging procedure because it focuses the HCTC calculation on relative trend differences. Without this smoothing, a different (smaller) HCTC is obtained, one that introduces a trend difference between the satellites, which is the very error we seek to detect and remove. Thus, comparing unfiltered intersatellite differences with the HCT temperature, as in PF2012, will misrepresent the temporal relationship between NOAA-6 and NOAA-9 and leave spurious trends. Finally, 3) the specific satellites chosen for sequential merging determine which overlapping satellite pairs are addressed.
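The filter-then-fit idea can be sketched for the simplest case of one co-orbiting pair, assuming that the partner satellite’s target factor is essentially zero (the situation described for NOAA-6 and NOAA-10 in Section 2), so that a single slope is fit instead of the full multisatellite system. The running-mean filter, the window length, and the synthetic series are hypothetical stand-ins, not the UAH filter or data.

```python
import math


def running_mean(x, window):
    """Centered running mean (odd window); returns only fully covered points."""
    half = window // 2
    return [sum(x[i - half:i + half + 1]) / window for i in range(half, len(x) - half)]


def fit_hctc(diff, t_hct, window):
    """Fit diff = bias + hctc * t_hct by least squares after low-pass filtering both series."""
    d = running_mean(diff, window)
    h = running_mean(t_hct, window)
    n = len(d)
    h_bar, d_bar = sum(h) / n, sum(d) / n
    sxy = sum((hi - h_bar) * (di - d_bar) for hi, di in zip(h, d))
    sxx = sum((hi - h_bar) ** 2 for hi in h)
    hctc = sxy / sxx
    return hctc, d_bar - hctc * h_bar


# Hypothetical 350-day overlap: THCT warms 1.135 K; a 7-day oscillation stands in
# for high-frequency intersatellite noise.
days = range(350)
t_hct = [20.0 + 1.135 * d / 349 for d in days]
diff = [0.3 - 0.099 * h + 0.05 * math.sin(2 * math.pi * d / 7) for d, h in zip(days, t_hct)]
hctc, bias = fit_hctc(diff, t_hct, window=35)
print(f"HCTC = {hctc:.4f}, bias = {bias:.4f}")  # HCTC = -0.0990, bias = 0.3000
```

Here the 35-day window spans exactly five cycles of the synthetic oscillation, so the filter removes it entirely; with real data a filter only suppresses high-frequency differences, and the fitted HCTC depends on that choice.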
Because UAH and RSS prepare the data differently, their HCTCs differ. Specifically, 1) UAH applies a diurnal correction based on empirical information, while RSS uses diurnal output from a climate model; 2) UAH low-pass filters the daily data to isolate the low-frequency (i.e., trend) intersatellite differences, while RSS applies minimal smoothing, allowing higher frequencies to contribute to the HCTC; and 3) UAH uses a “backbone” merging sequence in which the most stable satellites (NOAA-6, -10, -12, -15, and Aqua) are given priority, while RSS applies a “consensus” approach in which all satellites are weighted according to their length of record.
While there are reasons for choosing any of these paths, the key point is that these choices affect the magnitude of the HCTC. RSS, for example, has increased the magnitude of its NOAA-9 HCTC over successive versions as updated preadjustments were applied [from −0.0195 in Mears et al. (2003), to −0.0362 in Mears and Wentz (2009), to −0.0400 as reported in PF2012]. The UAH NOAA-9 HCTC used in version 5.4 is −0.099, again a value calculated empirically from the UAH-adjusted satellite data.
With this as an introduction, the following sections address the PF2012 analysis, specifically, 1) whether the radiosonde comparison can detect errors in the HCTC with confidence, 2) the manner in which PF2012 applied their suggested correction, and 3) other information that supports the UAH value of the NOAA-9 HCTC.
2. The radiosonde comparison
PF2012 approached the study with the assumption that the main difference between the global TMT trends of UAH and RSS (about 0.04 K decade−1, 1979–2009) was due to the differing magnitudes of their respective NOAA-9 HCTCs. They therefore focused on the short 26-month period in which NOAA-9 operated (late December 1984 through February 1987). They then tested whether the difference between UAH TMT and a “reference” TMT estimated from radiosondes (i.e., treating this difference as UAH error) was related to NOAA-9’s THCT. They found a relationship between this difference, or “UAH HCTC error,” and THCT, displayed in their Fig. 2, and from it calculated an error slope, or “bias,” in the UAH HCTC of +0.051. In other words, they in effect calculated a new NOAA-9 coefficient of −0.048, that is, −0.099 plus their calculated error of 0.051.
The first concern is that the PF2012 analysis assumes that the UAH TMT values for January 1985–February 1987 come from NOAA-9 alone. However, during the NOAA-9 period, data from NOAA-6, -7, -8, and -10 were also fully part of the time series. Indeed, NOAA-7 and -8 are completely independent of the NOAA-9 HCTC. NOAA-7 in particular was problematic during its last days of operation (i.e., during its brief overlap with NOAA-9), as its drift accelerated and its THCT had increased about 9 K in just 3.5 years, rendering those late NOAA-7 adjustments (both HCTC and diurnal) very uncertain. By comparison, NOAA-11, a spacecraft launched later into the same type of orbit, saw a THCT increase of only 2 K during its first 3.5 years (Christy et al. 2000). The point is that the merged TMT values for this period represent more than NOAA-9 alone.
A second concern is the global radiosonde data. During this period, many stations changed manufacturers (the Vaisala RS80 was becoming popular), and the new instruments typically reported warmer tropospheric temperatures than the previous instrumentation (Christy and Norris 2004). The radiosonde datasets used by PF2012 attempt to remove these biases, but some very likely remain (see evidence below); moreover, radiosonde data, like all environmental data records that are not traceable to the International System of Units (SI), contain several sources of unaccounted-for errors.
We have reproduced Fig. 2 of PF2012 as our Fig. 1, but using 31 U.S. VIZ radiosonde stations spanning the western Northern Hemisphere from the western tropical Pacific to Alaska, across the conterminous United States, to the Caribbean islands. The radiosonde-simulated TMT values were computed with the full radiative transfer formula, including variations in humidity. A key feature of the VIZ radiosondes is that they experienced no known or reported changes in instrumentation during the NOAA-9 period (Christy and Norris 2006). In the top panel of the figure, we show the monthly “UAH minus radiosonde” differences versus the corresponding value of THCT and identify which satellites were involved. In the bottom panel, we limit the months to those used to calculate the NOAA-9 HCTC (for the backbone).
The plots differ from PF2012 in important ways. First, if just 2 months (February and March 1985) are removed from the top plot, the slope becomes essentially zero, that is, there is no UAH error slope relative to THCT. Even with these 2 months included, the slope is not significantly different from zero (statistical error of ±0.047). However, this characterizes only one type of error, known as statistical error. Any set of points that does not lie exactly on a line will produce statistical error around a line of best fit, even if the underlying data are perfect. This type of error simply quantifies how well a straight line represents the data points, and it assumes that the data points themselves are without error.
We know that both the radiosonde and the satellite data contain “measurement errors,” particularly at the monthly gridpoint level. Christy et al. (2011) calculated the standard deviation of the error in the UAH versus VIZ composite (i.e., 31 stations) as ±0.09 K, shown in Fig. 1 as error bars of 1.5 σɛ. The slope and error estimates using this independent source of measurement error are +0.0412 ± 0.0441. Combining the two sources of error (statistical and measurement) produces a relationship of +0.042 ± 0.073, that is, a nonsignificant relationship indicating an inconclusive result. As noted, this result is derived from the well-characterized U.S. VIZ radiosondes, which provided humidity, complete soundings for virtually every day of each month, and enhanced accuracy of the MSU layer mean due to the inclusion of all reporting levels (i.e., well beyond the mandatory levels), often with more than 50 reported levels per sounding.
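The two error types discussed above can be illustrated with a short sketch: the ordinary least-squares standard error of the slope (statistical error) and a Monte Carlo estimate of how independent measurement noise of σ = 0.09 K in each monthly point propagates into the slope. The input series are hypothetical placeholders, not the Fig. 1 values, and Monte Carlo perturbation is our illustrative choice; the text does not state how the ±0.0441 was computed.

```python
import math
import random


def ols(x, y):
    """Return (slope, intercept, slope standard error) for y = a + b * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    s2 = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    return b, a, math.sqrt(s2 / sxx)


def measurement_slope_error(x, y, sigma_meas, trials=5000, seed=1):
    """Spread of fitted slopes when each y is perturbed by independent N(0, sigma_meas) noise."""
    rng = random.Random(seed)
    slopes = [ols(x, [yi + rng.gauss(0.0, sigma_meas) for yi in y])[0] for _ in range(trials)]
    mean = sum(slopes) / trials
    return math.sqrt(sum((s - mean) ** 2 for s in slopes) / (trials - 1))


# Hypothetical placeholders: 26 monthly THCT anomalies (K) and UAH-minus-radiosonde differences (K).
x = [-2.4 + 0.1 * i for i in range(26)]
y = [0.04 * xi + 0.03 * math.sin(i) for i, xi in enumerate(x)]
slope, _, stat_err = ols(x, y)
meas_err = measurement_slope_error(x, y, sigma_meas=0.09)
print(f"slope {slope:+.4f}, statistical ±{stat_err:.4f}, measurement ±{meas_err:.4f}")
```

For independent errors, the Monte Carlo spread converges to σ/√Σ(x − x̄)², which is a useful check on the simulation.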
Increasing the number of radiosonde comparisons would obviously decrease the measurement error if the added stations were as well characterized as the U.S. VIZ stations. However, when adding (or sampling randomly) stations that have 1) far fewer pressure levels; 2) few, poor, or no humidity reports; 3) many missing days; and 4) many changes in instrumentation, a reduction in noise is not guaranteed; in fact, the noise is just as likely to increase (Christy and Norris 2004). In other words, if the radiosonde sample includes poorer-quality stations, the result will have larger measurement errors.
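A toy calculation shows why adding stations need not reduce noise, assuming independent random errors and an equal-weighted average; the per-station error values are hypothetical, and systematic problems such as instrument changes are not represented.

```python
import math


def mean_error(sigmas):
    """Standard error of an equal-weighted mean of stations with independent errors (K)."""
    return math.sqrt(sum(s * s for s in sigmas)) / len(sigmas)


well_characterized = [0.4] * 31   # hypothetical per-station monthly errors (K)
poorer = [1.5] * 10               # hypothetical poorer-quality stations (K)
print(round(mean_error(well_characterized), 3))           # 0.072
print(round(mean_error(well_characterized + poorer), 3))  # 0.128
```

Inverse-variance weighting would prevent this increase for purely random errors, but it requires knowing each station’s error and still cannot remove systematic biases from instrument changes.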
We note that PF2012 calculated their slope and error for the UAH comparison as 0.051 ± 0.031, without including measurement error and assuming all the points depended on NOAA-9 alone. We estimate that the PF2012 error range would be at least ±0.054 when only statistical and measurement errors are accounted for, which would also be an inconclusive result. For a point made below, we remind the reader that two values of the UAH HCTC have been introduced here: 1) the UAH value calculated empirically and applied in our datasets, −0.099; and 2) the PF2012 value calculated from reference radiosondes, −0.048.
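If the statistical and measurement errors are treated as independent and combined in quadrature (our assumption about the combination rule), PF2012’s statistical error of ±0.031 and the measurement error of ±0.0441 from the VIZ comparison give roughly the ±0.054 estimate above:

```python
import math


def combine_in_quadrature(*errors):
    """Root-sum-square of independent error terms."""
    return math.sqrt(sum(e * e for e in errors))


print(round(combine_in_quadrature(0.031, 0.0441), 3))  # 0.054
```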
The two key satellite overlaps with NOAA-9 (NOAA-6 and NOAA-10) are shown with a magnified y axis in the bottom panel of Fig. 1, where it can be seen that there is no relationship between UAH “error” and THCT. This is to be expected because the UAH NOAA-9 HCTC is calculated to remove the trend relationship between the three instruments and THCT. Again, without temporal filtering, one cannot remove the trend difference between NOAA-6 and NOAA-9. The NOAA-9 HCTC does not influence any other HCTC, since the only overlaps with NOAA-9 that feed into its calculation are those with NOAA-6 and NOAA-10, whose target factors are already essentially zero.
There is another way to address NOAA-9’s influence on the time series: we can check the temperature change from before to after NOAA-9’s period of service as determined from global radiosonde data. Because the stations used in the global datasets are essentially unchanged over short periods, this comparison captures the key information we seek, namely the interannual differences of anomalies (i.e., we are not concerned with absolute temperatures). For 24-month (12-month) periods before and after NOAA-9, Table 1 indicates that UAH and RSS differ by only +0.015 (+0.006) K and that both are cooler than the radiosonde average by +0.09 (+0.06) K. Thus, this apparent relative warming of the radiosondes, which was implied earlier, exceeds the difference between UAH and RSS, suggesting that radiosondes are not capable over this time span of identifying small satellite errors. Since differences between any of these datasets over short periods often exceed 0.1 K, we can only say that the satellite datasets are probably closer to each other than either is to the radiosondes.
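The before/after comparison amounts to differencing window means. A minimal sketch follows; the dataset names, index arguments, and placeholder series are hypothetical, and the Table 1 values are not reproduced.

```python
def before_after_change(anoms, first_idx, last_idx, width):
    """Mean anomaly over `width` months after last_idx minus mean over `width` months before first_idx.

    first_idx and last_idx are hypothetical indices of NOAA-9's first and last months.
    """
    if first_idx - width < 0 or last_idx + 1 + width > len(anoms):
        raise ValueError("series too short for the requested window")
    before = anoms[first_idx - width:first_idx]
    after = anoms[last_idx + 1:last_idx + 1 + width]
    return sum(after) / width - sum(before) / width


def compare(datasets, first_idx, last_idx, width):
    """Before-to-after change for each named dataset."""
    return {name: before_after_change(a, first_idx, last_idx, width)
            for name, a in datasets.items()}


# Placeholder monthly anomalies (K): 24 months before, 26 NOAA-9 months, 24 months after.
series = {
    "sat_A": [0.0] * 24 + [0.5] * 26 + [0.10] * 24,
    "sonde": [0.0] * 24 + [0.5] * 26 + [0.20] * 24,
}
changes = compare(series, first_idx=24, last_idx=49, width=24)
print({name: round(v, 3) for name, v in changes.items()})  # {'sat_A': 0.1, 'sonde': 0.2}
```

Working with changes in anomalies rather than absolute values makes the comparison insensitive to any fixed offset between datasets.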
In our view, the radiosonde information presented above does not constitute evidence requiring a change to the NOAA-9 HCTC. Further, the UAH HCTC of −0.099 was objectively and empirically determined through a system of equations and is verified in Fig. 1. Claiming otherwise is simply inconsistent with the UAH merging method. As a corollary, intercomparisons with NOAA-9 outside the November 1985–October 1986 period are irrelevant to the calculation of the UAH NOAA-9 HCTC. Even so, there remains the issue of how PF2012 applied their adjustment to the UAH time series, in a way that overmagnified its actual impact.
3. Applying an adjustment to the UAH time series
Before examining the problem of how PF2012 applied their adjustment to the UAH data, we must describe the UAH merging procedure. The procedure is unfortunately complicated because two NOAA sensors, NOAA-6 and NOAA-8, were brought online and offline during this particular period; we focus only on those components that are pertinent to the discussion at hand.
The underlying goal in this portion of the time series is to use NOAA-9 as a bridge to bring the two backbone satellites, NOAA-6 and NOAA-10, together (see Fig. 2). Since there was no direct overlap between NOAA-6 and NOAA-10, the task is to determine how much to adjust NOAA-6 so it will match up with NOAA-10 in late 1986 and continue the time series for all remaining data. It is important to note that in the UAH methodology, all data for the pre-NOAA-9 satellites are tied directly into NOAA-6 as the backbone (i.e., satellites TIROS-N, NOAA-7, and NOAA-8 are attached to NOAA-6 via direct bias removal and can be thought of in total as the “NOAA-6 backbone” in Fig. 2). Thus, all data prior to NOAA-10 depend on a good estimate of the NOAA-6–NOAA-10 bias adjustment.
With 350 days of directly overlapping data with NOAA-6, NOAA-9 is the only satellite that can effectively tie NOAA-6 to NOAA-10 [see Christy et al. (1998) for an investigation of all pathways]. By contrast, there were only 65 days of overlapping data between NOAA-7 and NOAA-9 at the end of NOAA-7’s service. As noted above, this was the period when NOAA-7’s THCT was soaring and its drift accelerating, giving little confidence that a proper bias could be obtained because of both the shortness of the overlap and the unreliability of the values [i.e., for the pathway from NOAA-6 to NOAA-7 to NOAA-9 and then to NOAA-10 (Christy et al. 1998)]. Indeed, a difference between NOAA-7 and NOAA-9 remains in their final individual time series because the adjustments to NOAA-7, both diurnal and HCTC, are large and most uncertain at its very end.
If NOAA-9 were perfectly stable during its operation, it would be a simple matter of removing the bias between NOAA-6 and NOAA-9, then between NOAA-9 and NOAA-10, and finally merging NOAA-6 and NOAA-10. However, NOAA-9’s raw TMT values experienced spurious warming during their overlaps with NOAA-6 and NOAA-10 due to NOAA-9’s increasing THCT. If not accounted for, this would leave the NOAA-6 backbone too cool relative to NOAA-10, spuriously warming the entire time series. So, using the HCTC as a factor multiplying the NOAA-9 THCT, we first remove the relative trend between NOAA-9 and NOAA-6 and then proceed with the removal of the relative biases.
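The bridging step can be sketched under simple assumptions: each overlap bias is the difference of overlap-period means, and the HCT correction is additive (HCTC × THCT). The function name, its arguments, and the constant-valued series are hypothetical; the actual UAH procedure applies the correction day by day and also handles the other satellites active in this period.

```python
def _mean(values):
    return sum(values) / len(values)


def bridge_lift(n6_a, n9_a, n9_b, n10_b, t_hct_a, t_hct_b, hctc):
    """Offset to add to the NOAA-6 backbone so it matches NOAA-10, bridged through NOAA-9.

    Overlap A pairs NOAA-6 with NOAA-9; overlap B pairs NOAA-9 with NOAA-10.
    NOAA-9 is first corrected by hctc * T_HCT (assumed additive), then the
    two overlap biases are chained.
    """
    n9c_a = [tb + hctc * h for tb, h in zip(n9_a, t_hct_a)]
    n9c_b = [tb + hctc * h for tb, h in zip(n9_b, t_hct_b)]
    bias_6_to_9 = _mean(n9c_a) - _mean(n6_a)
    bias_9_to_10 = _mean(n10_b) - _mean(n9c_b)
    return bias_6_to_9 + bias_9_to_10


# Hypothetical anomalies (K): NOAA-6 and NOAA-10 both match the truth (0 K);
# raw NOAA-9 has a 0.2 K offset plus spurious warming tied to a 1.135 K THCT rise.
t_a, t_b = [25.0] * 5, [26.135] * 5
n6_a, n10_b = [0.0] * 5, [0.0] * 5
n9_a = [0.2] * 5
n9_b = [0.2 + 0.099 * 1.135] * 5
lift_raw = bridge_lift(n6_a, n9_a, n9_b, n10_b, t_a, t_b, 0.0)
lift_cor = bridge_lift(n6_a, n9_a, n9_b, n10_b, t_a, t_b, -0.099)
print(f"{lift_raw:+.3f}")  # -0.112: without correction NOAA-6 ends up too cool
print(abs(lift_cor) < 1e-9)  # True: the HCT correction removes the spurious drift
```

The difference between the two lifts, about +0.112 K, is the HCT contribution to the lift, equal to −ΔTHCT × HCTC = −1.135 × −0.099 in this toy setup.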
In simple terms, NOAA-9’s THCT warmed by 1.135 K between its NOAA-6 and NOAA-10 overlaps. The HCT bias correction needed to “lift” NOAA-6 to match NOAA-10 via NOAA-9 would then be −1.135 KHCT × −0.099, a bias correction to the NOAA-6 backbone of +0.112 K. Applying PF2012’s estimate of the HCTC would give −1.135 KHCT × −0.048, or +0.055 K, had it been applied properly (see below). In practice, the correction is applied day by day to NOAA-9 according to that day’s THCT, but framing the discussion in terms of the fundamental bias correction, or lift, applied to the NOAA-6 backbone preserves the essence of the merging procedure.
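The lift arithmetic can be written as a one-line function (the name is ours). With the rounded coefficients shown, the PF2012 case evaluates to +0.054 K, within rounding of the +0.055 K given above.

```python
def backbone_lift(delta_t_hct_k, hctc):
    """Lift (K) applied to the NOAA-6 backbone: -delta_T_HCT multiplied by the HCTC."""
    return -delta_t_hct_k * hctc


print(f"UAH:    {backbone_lift(1.135, -0.099):+.3f} K")  # +0.112 K
print(f"PF2012: {backbone_lift(1.135, -0.048):+.3f} K")  # +0.054 K
```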
The problem with the manner in which PF2012 applied their adjustment is twofold. First, working in reverse from the published UAH data, they applied their adjustment to the completed time series rather than to the individual NOAA-9 segment. Second, the adjustment was applied over the entire 26-month NOAA-9 period, starting in January 1985, rather than over the shorter period of its overlap with NOAA-6, starting in November 1985 (when NOAA-6 was reactivated to replace the problem-plagued NOAA-8 spacecraft). The critical point here is that the time series prior to November 1985, not January 1985, depends essentially only on the NOAA-6 backbone.
Starting in January 1985, PF2012 calculated their adjustment from the NOAA-9 THCT anomaly at that time, which was about −2.4 KHCT. This produced an adjustment of −0.122 K (−2.4 KHCT × 0.051) for January 1985 and for the entire NOAA-6 backbone, that is, December 1978–January 1985. From January 1985 forward, a linear adjustment ramping upward from −0.122 K to zero by February 1987 was applied to the published UAH time series. This, in effect, imposes a ramped shift of +0.122 K over the 26 months on the published UAH time series, which increased the 1979–2009 trend by +0.042 K decade−1.
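Because an ordinary least-squares trend is linear in the data, the trend change produced by an additive adjustment equals the trend of the adjustment series itself, so the +0.042 K decade−1 figure can be checked without the UAH data. The sketch below assumes monthly data from January 1979 through December 2009 (month 0 is January 1979) and a ramp spanning the full 26-month NOAA-9 period, January 1985 (month 72) to February 1987 (month 97); these index choices and the function names are ours.

```python
def pf2012_style_adjustment(n_months, offset_k, ramp_start, ramp_end):
    """Monthly adjustment: offset_k through ramp_start, linear ramp to 0 at ramp_end, 0 after."""
    adj = []
    for t in range(n_months):
        if t <= ramp_start:
            adj.append(offset_k)
        elif t < ramp_end:
            adj.append(offset_k * (ramp_end - t) / (ramp_end - ramp_start))
        else:
            adj.append(0.0)
    return adj


def ols_trend_per_decade(y):
    """Ordinary least-squares slope of a monthly series, in K per decade."""
    n = len(y)
    t_bar = (n - 1) / 2
    y_bar = sum(y) / n
    num = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y))
    den = sum((t - t_bar) ** 2 for t in range(n))
    return 120.0 * num / den


offset = -2.4 * 0.051  # THCT anomaly (K) times PF2012 HCTC bias, about -0.122 K
adj = pf2012_style_adjustment(372, offset, ramp_start=72, ramp_end=97)
print(round(ols_trend_per_decade(adj), 3))  # 0.042
```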
To put all calculations described above in the same perspective, the lifts or bias corrections applied to the NOAA-6 backbone to match it with NOAA-10 and all subsequent data are 1) UAH: +0.112 K; 2) PF2012, if properly done: +0.055 K; and now 3) the actual application in PF2012: −0.010 K (i.e., +0.112 minus +0.122 K, since it is applied as a correction to the UAH implemented value).
Had PF2012 applied their correction according to the methodology that generated the UAH dataset (i.e., an adjustment of +0.055 K to the NOAA-6 backbone), the increase in the UAH trend would have been only +0.022 K decade−1 for 1979–2009, not +0.042 K decade−1 as they claim. We verified this result exactly by inserting the PF2012 HCTC of −0.048 directly into our merging code and checking the result. This is the complete and accurate impact of a change in the NOAA-9 HCTC, because all biases are recalculated and, as stated earlier, NOAA-9’s HCTC is independent of the other HCTCs. Additionally, it is not meaningful to express this difference as a percentage of the base trend, since the base trend is already very small, with typical error bars of ±0.04 K decade−1.
When applied to the current time series through 2012, all of these differences would be even smaller. In summary, we are able to reproduce the corrections suggested by PF2012 and their resulting trend effect, and in doing so we demonstrate how PF2012’s incorrect application of the suggested HCTC overmagnified its effect on the trend of the time series.
4. Additional evidence supporting the UAH HCTC
There is further evidence that the NOAA-9 HCTC, as calculated, should be near −0.099 for the merging strategy UAH employs. Besides TMT, UAH and RSS produce a lower-tropospheric temperature product (TLT, surface to about 300 hPa) that is more heavily used and documented. UAH uses the identical NOAA-9 HCTC for TLT because both TMT and TLT are derived from the same MSU channel 2. The global TLT trends (from 1979 to November 2012) for UAH and RSS are virtually identical (+0.138 and +0.132 K decade−1, respectively). Further, PF2012 found no TLT problem when applying their radiosonde test to UAH TLT during the NOAA-9 period (a layer for which radiosondes have less error). This is not surprising given the inherent error of all of the datasets, as mentioned earlier. From our analysis, we conclude that radiosonde comparisons over such short periods are simply inconclusive for both TMT and TLT and are thus unreliable for determining a correction to the HCTC. [Note: although UAH version 5.4 (v5.4) was replaced with v5.5 in October 2012 because of sudden excessive noise in the Aqua AMSU, this change had no impact on the issue dealt with in this discussion.]
Over longer periods, where random errors have less impact on trends, we show in Table 2 the 1979–2011 global TLT trends for the key datasets used in the State of the Climate in 2011 report (Willett et al. 2012). TLT is used here because radiosonde values are more reliable in that layer: corrections to radiosonde datasets become more uncertain as altitude increases into the stratosphere, which contributes to TMT. The results indicate good agreement among the datasets, with a median trend of +0.14 K decade−1, the magnitude reported by both UAH and RSS.
5. Documented reasons for differences
The differences between UAH and RSS TLT and TMT have been addressed many times (e.g., Christy and Norris 2006; Christy et al. 2007; Randall and Herman 2008; Christy and Norris 2009; Bengtsson and Hodges 2011; Christy et al. 2010, 2011). The clearest difference is found in the 1990s, when RSS warms relative to UAH (and the radiosonde datasets), especially in the tropics. Note that in the fourth column of Table 1, RSS warms by +0.07 K relative to UAH and the radiosonde datasets. However, in the 2000s the situation is reversed, with RSS cooling relative to UAH (not shown). In the TLT products, these gradual changes balance each other, so the overall trends are essentially identical, as indicated earlier. In TMT, however, there is less relative cooling of RSS versus UAH in the 2000s, so the RSS trend is still more positive than UAH overall. Without claiming which dataset is closest to reality, we note that the differences are consistent with the timing of the adjustments applied to account for spacecraft drift through the diurnal cycle.
Studies we have performed using well-characterized radiosonde datasets indicate that UAH has slightly smaller errors and the highest reproducibility (e.g., Christy et al. 2011). This provides some evidence that the UAH adjustments are closer to reality than those of RSS. However, using larger but less well-characterized earlier versions of the radiosonde datasets, Mears et al. (2011) found little difference between RSS and UAH in large-scale trends.
Finally, as of this writing, the January 1979–November 2012 global TMT trends for UAHv5.5 and RSSv3.3 are +0.043 and +0.078 K decade−1, respectively. None of the dataset builders claims to have constructed an error-free dataset. We believe the main differences between UAH and RSS are small (within the error bands), have been identified, and are being addressed in new versions to be released over time. From the broader scientific perspective, it is encouraging that the UAH and RSS global trend differences for TMT (TLT) are so small, −0.035 (+0.006) K decade−1, given that the datasets were constructed independently.
6. Conclusions
Through several lines of evidence, we have shown that the result of PF2012, that is, that the magnitude of the UAH hot calibration target coefficient is too large, relies on what in our opinion is inconclusive evidence from radiosondes. Indeed, we show that over short periods, UAH and RSS agree with each other better than with the radiosondes and that short-term data errors are larger than the error signal being sought. Over longer periods, the datasets agree within small margins of error. Additionally, we showed that PF2012 applied their adjustment to the UAH dataset incorrectly, owing to a misunderstanding of the UAH merging process, and thereby overstated its effect on the global UAH trend. The main differences between UAH and RSS, as noted in earlier publications, relate to the different diurnal corrections applied to the datasets. In retrospect, after 22 years of working on the MSU dataset, it is remarkable that the discrepancies between UAH and RSS decadal temperature trends for TMT (TLT), −0.035 (+0.006) K decade−1, are so small, given that the two groups follow different methodologies in taking the raw radiance counts to the final geophysical products.
This research was supported by U.S. Department of Energy Grant DE-SC0005330. The perceptive comments of the reviewer contributed to significant improvements in the original manuscript.
The original article that was the subject of this comment/reply can be found at http://journals.ametsoc.org/doi/abs/10.1175/JTECH-D-11-00147.1.