Radiosonde temperature records contain valuable information for climate change research from the 1940s onward. Since they are affected by numerous artificial shifts, time series homogenization efforts are required. This paper introduces a new technique that uses time series of temperature differences between the original radiosonde observations (obs) and background forecasts (bg) of an atmospheric climate data assimilation system for homogenization.
These obs − bg differences, the “innovations,” are a by-product of the data assimilation process. They have been saved during the 40-yr ECMWF Re-Analysis (ERA-40) and are now available for each assimilated radiosonde record back to 1958. It is demonstrated that inhomogeneities in the obs time series due to changes in instrumentation can be automatically detected and adjusted using daily time series of innovations at 0000 and 1200 UTC.
The innovations not only reveal problems of the radiosonde records but also of the data assimilation system. Although ERA-40 used a frozen data assimilation system, the time series of the bg contains some breaks as well, mainly due to changes in the satellite observing system. It has been necessary to adjust the global mean bg temperatures before the radiosonde homogenization.
After this step, homogeneity adjustments, which can be added to existing raw radiosonde observations, have been calculated for 1184 radiosonde records. The spatiotemporal consistency of the global radiosonde dataset is improved by these adjustments and spuriously large day–night differences are removed. After homogenization the climatologies of the time series from certain radiosonde types have been adjusted. This step reduces temporally constant biases, which are detrimental for reanalysis purposes. Therefore the adjustments applied should yield an improved radiosonde dataset that is suitable for climate analysis and particularly useful as input for future climate data assimilation efforts. The focus of this paper relies on the lower stratosphere and on the internal consistency of the homogenized radiosonde dataset. Implications for global mean upper-air temperature trends are touched upon only briefly.
Since the 1940s radiosondes are an essential component of the global atmospheric observing system. They reach farther back than satellite records and they also provide relatively high vertical resolution compared to satellite instruments, such as the Microwave Sounding Unit (MSU). Therefore they are a unique source of information about the upper-air climate and they are essential for climate data assimilation efforts, such as the 40-yr European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40) or the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) reanalysis (Uppala et al. 2005; Kistler et al. 2001). While radiosondes measure temperature, humidity, and wind, this work’s analysis is restricted to temperature. The quality of the radiosonde instrumentation has improved over time. Systematic errors of radiosonde temperature measurements comprised several kelvin in the stratosphere during the 1960s and 1970s. The main, but not the only reason for systematic errors, was radiation effects. In the late 1980s, at many sites the radiation error was still larger than 1 K at the 50-hPa level, as is indicated in Fig. 1. With today’s modern sounding equipment the radiation error is reduced to a few tenths of a kelvin (Nash et al. 2005).
Figure 2 shows time series of mean 1200 − 0000 UTC differences of composites of radiosondes located between 30°W and 40°E as well as between 120°E and 120°W at the 50-hPa level. The 50-hPa level has been chosen as a compromise between data availability and susceptibility to radiation errors. Between 30°W and 40°E there is darkness at 0000 UTC and high solar elevation at 1200 UTC (except at polar regions). Between 120°E and 120°W the opposite is the case. A larger longitude range has been chosen in Fig. 2b to include more stations around the Pacific. In recent years the mean differences, which exceed 1 K in the 1960s, are gradually reduced at this altitude to practically zero. The resulting “trend” in the 1200 − 0000 UTC differences is almost certainly artificial (Sherwood et al. 2005) and must be removed before the radiosonde data can be applied for climate change research. Breaks and biases also cause problems in operational data assimilation systems as well as in climate data assimilation systems; observations with large bias but otherwise good quality tend to be rejected more frequently by a data assimilation system. If they are not rejected they cause biases in the resulting analyses unless the data assimilation system is specifically designed to deal with biased observations (Dee and Da Silva 1998).
The task of removing artificial breaks from time series is referred to as homogenization. Some authors attempted to use the available metadata (Gaffen 1996) and detailed knowledge of the instruments to apply physically based corrections (Luers and Eskridge 1995; Eskridge et al. 2003; Redder et al. 2004). While this is potentially the most favorable approach, it can be applied only at selected sites since the required detailed information about equipment and launch times is often not available.
Homogeneity adjustments based on time series analysis, metadata, and expert judgment have been published by Lanzante et al. (2003a), Lanzante et al. (2003b), Free et al. (2005) (referred to as the LKS dataset), and Thorne et al. (2005a). While these datasets consider only a subset of the global radiosonde network (87 stations in the LKS dataset, 678 stations by Thorne et al. 2005a) they are important achievements and valuable tools for intercomparisons with satellite data and with climate model results.
Subsampling of the data, as performed in these analyses, is acceptable if the desire is to characterize only large-scale changes. For climate data assimilation purposes it is much harder to justify omission of at least one-third of the available radiosonde data in order to retain temporal homogeneity of the analyzed product. Since only anomaly time series have been adjusted, many records of these homogenized datasets may still have a constant bias and therefore may cause a bias in reanalyses if used as input for data assimilation. Both aspects limit their value as direct input for future reanalyses, although the information on breakpoint timing and magnitude that they contain may prove useful.
None of the above datasets has been created by automatic procedures. Any improvement of the datasets would be rather laborious. Further there is often lack of agreement where and how large the breaks are (see Free et al. 2002). Sizeable uncertainty therefore still remains on upper-air trends and low-frequency variability (Seidel et al. 2004; Free and Seidel 2005; Thorne et al. 2005b). Sherwood et al. (2005), Randel and Wu (2006), and Santer et al. (2005) have recently raised serious doubts about the validity of temperature trends from radiosondes in the Tropics. Reduced uncertainty in observed upper-air trends has been identified as one of the most pressing needs in climate research (Karl et al. 2006).
This article describes a new homogeneity adjustment method called Radiosonde Observation Correction Using Reanalyses (RAOBCORE). It addresses some of the problems of the existing homogenized datasets. A preliminary version of RAOBCORE has been documented in Haimberger (2005). The most important characteristics of RAOBCORE are the following:
It uses time series of innovation statistics (the difference between observation and the background forecast of the assimilating model) of a climate data assimilation system, such as the ERA-40 reanalysis (Uppala et al. 2005). The rationale for using ERA-40 innovation statistics for homogenization is discussed in section 2.
It uses time series of individual launches, not monthly or seasonal means, for break detection and adjustment, and it analyzes daytime and nighttime launches separately. The homogeneity adjustment method is described in section 4.
A record with no breaks may still have a large constant bias, which is problematic for reanalysis efforts. In section 7 it is outlined how RAOBCORE adjusts not only breaks of long records but also the biases of the most recent parts of the records.
It is relatively easy to create multiple realizations of the homogenized dataset, for example, for sensitivity experiments, because the adjustment software is fully automatic.
RAOBCORE uses innovations from a climate data assimilation system as reference instead of composites of neighboring radiosondes (cf. Thorne et al. 2005a) and makes no reference to the LKS dataset. Consequently it can be regarded as independent of these homogenization methods. The homogeneity properties of the ERA-40 background temperature time series are discussed in section 6 and in appendix B. Adjustment results for selected stations and for several sensitivity experiments are presented in sections 8 and 9.
2. Interpretation of the innovation statistics of a data assimilation system
The atmospheric data assimilation process may be described as applying a filter on the multivariate time series of meteorological observations. The filtering process is optimal if the time series of differences between observations (obs) and the background forecast (bg) of the assimilating model is stationary random with zero mean (Lewis et al. 2006). The obs − bg differences are often referred to as innovations and the time series of the differences is called innovation process. In practice the filtering process is never optimal, due to systematic errors of both the observing system and the assimilating model. As a result the time series of innovations have nonzero long-term means, are not stationary, and not random. These nonzero means of the innovations are often referred to as bias. One can only estimate the bias of one (test) dataset with respect to a second (reference) dataset. Since the mean of the true state is not known, the real bias is also unknown. However, if there is a shift in the observation time series, for example, due to an instrument change, it should immediately result also in a shift in the innovation time series and thus of the bias. This shift can be estimated and used for homogenization purposes.
For this study mainly the innovations from the three-dimensional variational data assimilation system (3DVAR) used in ERA-40 have been used. Within the 6-h cycle the following cost function is minimized (Courtier et al. 1998):
Here x is the state vector of the assimilating forecast model, and xb is the background state, which is obtained by a 6-h forecast run of the assimilating model. Then the forecast state mapped to observation space by the observation operator 𝗛 is compared with the available observations (the observation vector y), and 𝗕 and 𝗥 are the background error and observation error covariance matrices, respectively.
At the beginning of the minimization process, x = xb; thus the first term vanishes. The quantity (y − 𝗛xb) is called the innovation or obs − bg difference. Its accurate calculation is essential in the data assimilation process and much effort is put into the specification of the observation operator 𝗛. The difference (y − 𝗛xb) is available for every single observation presented to the data assimilation system and is saved in the so-called ERA-40 analysis feedback files. In the case of radiosonde temperatures, 𝗛 represents the transformation from the spectral space to the model grid and then simple linear interpolation from the ECMWF model grid to the observation location (ECMWF 2000).
The innovations have proven most useful for the detection and estimation of systematic observation errors (Hollingsworth et al. 1986). Monitoring of the innovations is an important task at operational forecast centers. The innovations have been used also in ERA-40 to calculate adjustments of the radiation error of radiosondes (Onogi 2000; Andrae et al. 2004). The innovations are more useful for bias estimation than differences between observations and the analyses, since the obs − bg differences are more directly related to systematic observation errors. If the bg were perfect, the obs − bg difference would be the obs error, whereas the obs − analyses (an) difference is already influenced in a complicated way by the erroneous observation (see, e.g., Dee 2004).
It is the working hypothesis of this paper that time series of innovations can be used to detect and remove artificial shifts in radiosonde temperature time series, that is, to homogenize these time series. Since the bg error is generally quite small, the differences between the bg and the corresponding observations tend to be small and therefore the series of differences between bg and obs reacts sensitively to any change in the radiosonde temperature bias.
3. Input data
With the completion of the ERA-40 project (Uppala et al. 2005), a 45-yr global time series of analyses (an) and 6-h bg forecasts has become available. While the analyses are the most useful products for many research applications, the observation database assembled during the reanalysis effort is equally valuable.
The binary unified form for the representation of meteorological data (BUFR)-coded ERA-40 analysis feedback (AF) dataset has been the main input data source for this study. The AF dataset contains all observations from 1958 to 2002 presented to the ERA-40 data assimilation system, plus quality control flags and the innovations described in section 2. No other dataset holds such long and complete time series of upper-air temperature innovations. The time series have daily resolution, and 0000 and 1200 UTC ascents were kept separate. Observations at 16 standard pressure levels (10, 20, 30, 50, 70, 100, 150, 200, 250, 300, 400, 500, 700, 850, 925, and 1000 hPa) were considered and analyzed for breakpoints.
Since the ERA-40 dataset ends in 2002, it has been decided to use operational analysis feedback data from 2001 onward. Concatenating operational and ERA-40 AF has allowed the use of radiosonde records up to December 2005. The obs − bg differences need to be adjusted since the bg of the operational version of the ECMWF data assimilation system used for 2001 and later years differs substantially from the bg of the ERA-40 assimilation system (see section 6). After 2001 no evidence for major temperature inhomogeneities in the bg temperatures due to changes in the operational ECMWF data assimilation system could be found.
The recent advent of the IGRA dataset (Durre et al. 2006) has been the second big improvement of the data basis for homogenization of the global radiosonde network. The IGRA and ERA-40/ECMWF datasets are not identical, as can be seen from Fig. 3. ERA-40 contains more data in the 1960s and 1970s, whereas IGRA contains more data in the 1990s.
The IGRA data complement ERA-40 radiosonde data at some places, especially over the United States in the 1990s. The data missing in IGRA are partly available in the older Comprehensive Aerological Reference Data Set (CARDS; Durre et al. 2006). They have not been included in IGRA because of data quality concerns. Experience from ERA-40 indicates, however, that most of these data are of sufficient quality to be assimilated. While the original IGRA data do not contain obs − bg differences, these can be calculated quite accurately from archived ERA-40 bg fields interpolated to the IGRA observation sites (Haimberger 2005).
In this paper, the union of both datasets is used. The merging procedure has been simple: if there were duplicates, preference has been given to ERA-40 data. A total of 2881 stations (most of them on land but also a few weather ships) have been identified. These have time series containing temperature on standard pressure levels and on significant levels (TEMP-Land and TEMP-Ship). The station identification procedure, which normally makes merging so complicated, has been facilitated by the good documentation of IGRA data and the station tables available from ERA-40.
Apart from pure measurement information and the innovation statistics, metadata information is also available in digitized form from the CARDS archive (Gaffen 1996; with updates from Aguilar 2000). The information about the radiosonde type and the radiation corrections used is useful for finding and interpreting breaks in the radiosonde temperature time series, although the information is incomplete and sometimes inaccurate.
Quite useful additional metadata information could be extracted from the ERA-40 feedback records. From 1979 onward, many stations have routinely transmitted the radiosonde type. This information is available in the ERA-40 feedback files as well [coded according to World Meteorological Organization (WMO)-BUFR table 02011; see WMO Manual on Codes, No. 306, Vol. I.2]. From this information more than 1000 well-documented and often precisely dated radiosonde type changes could be extracted.
4. Break detection method
The obs − bg difference series and obs(1200 UTC) − obs(0000 UTC) difference series have been analyzed with a variant of the standard normal homogeneity test (SNHT; Alexandersson and Moberg 1997) described below. Ducre-Robetaille et al. (2003) recently compared popular homogeneity tests and the SNHT performed well in this intercomparison. However, the results of this comparison are not directly applicable for the analysis of radiosonde records. The frequency of breaks requires analysis windows of only a few (<5) years. For such short time windows it is essential to take the seasonal cycle into account and it is advisable to use daily ascents instead of monthly or annual means in order to have full control over the sampling of the available data. It is worth noting that using anomalies does not remove the annual cycle in the case of radiosonde data since the annual cycles of, for example, 1200 − 0000 UTC differences at the same stations are rather different for different radiosonde types (see, e.g., Fig. 10).
a. A variant of the standard normal homogeneity test
The original SNHT (Alexandersson and Moberg 1997) calculates the series of differences between the tested series (in this study the radiosonde time series) and a reference series (the bg). Then the means of the parts of the difference series before and after a potential breakpoint k are compared. The point dividing the sample into two parts is varied but the time interval stays fixed. For each dividing point k in the sample, a test statistic can be calculated:
where N is the sample size, μ1k is the mean of the subsample before k, μ2k is the mean of the subsample after k, μ is the mean of the whole sample, and σ is the sample standard deviation. The difference series is considered inhomogeneous if the maximum value Ts of all k is above the threshold somewhere in the interval. In this case the point ks where Ts occurs is the most likely location for the breakpoint. The difference of μ2k −μ1k calculated at the point ks is the best estimate for the magnitude of the break.
The original SNHT has two drawbacks. First, it tends to indicate breaks near the edges of the time interval as either k or N − k becomes small (Ducre-Robetaille et al. 2003). Second, the position of a breakpoint is poorly estimated in the presence of a periodic signal. To overcome these problems the original SNHT has been modified such that μ1k and μ2k are calculated as 2-yr moving averages:
The sample sizes at point k are then fixed to N/2 but the interval [k − N/2, k + N/2] now depends on k. The maximum value of the test statistic changes its meaning in the sense that it is not an absolute maximum of the Tk in a fixed interval but the local maximum of the Tk in the interval [k − N/2, k + N/2]. Therefore it is denoted Tsk.
This way the means are more reliably estimated and the annual wave is averaged out exactly if N/2 corresponds to an integer multiple of 365 if daily data are used and none are missing. To make the test more robust against combined effects of an annual cycle and missing data, data before and after the midpoint k of the analyzed time interval have been placed into 12 bins, one for each month. If the data count of, for example, the January values in the earlier 2-yr interval, was less than the count of January values in the later 2-yr interval, the excessive January values in the later interval were removed (starting with the values with the largest time distance from the middle of the investigated interval). This version of the SNHT, which ensures equal sampling of the annual cycle in the intervals before and after a potential breakpoint, is referred to as equal sampling SNHT.
Figure 4 shows examples of the behavior of the moving average SNHT test statistic and related quantities with simulated data. The synthetic time series has 2920 data points, corresponding to 8 yr if daily data are considered. The interval for the moving averages before and after a suspected breakpoint is 2 yr, the same as used for the real data below.
The gray curve in Fig. 4a depicts one of 5000 realization of a N(0, 1) normal distributed random series with a break of size 0.5σ in the middle of the interval. The black Tk curve in Fig. 4a shows a distinct maximum at the right location. Figure 4b shows the distribution of Tsk gained from the 5000 realizations. Values are practically always above 50. Figure 4c shows the distribution of the break estimates μ1ks − μ2ks at the locations ks where Tk is largest. It is normally distributed around 0.5 K with a standard deviation of 0.05 K. Figure 4d shows the distribution of the detected break locations ks. The location of a break with size 0.5 K can be detected with a standard deviation of 0.08 yr (1 month).
The significance levels for rejecting the null hypothesis can be gained from applying the SNHT to a sample of 5000 random series with no break. Figure 4e shows a realization of such a series. The Tk vary erratically and Tsk is much smaller. The significance levels of the moving average SNHT can be estimated from the distribution of the Tsk in Fig. 4f. For the 95% level it is 9.6, and for the 99% level it is 12.5. Figure 4g shows again the distribution of the difference between the moving averages at the point ks. Figure 4h shows the distribution of the break location ks for the case of no breaks. It is close to uniform.
From this result, it becomes clear that breaks with a size of 0.5σ are practically always detected since in this case more than 99% of the Tsk reach values above 20. Figure 4g indicates the problems to be expected if a break is falsely detected: the estimates break size and therefore the false correction would be of magnitude 0.25σ.
The significance levels derived above are valid only for purely random processes (apart from an annual cycle) and for complete time series (1460 days). If data are missing, the Tsk value necessary to reach the 95% significance level decreases, no matter whether the data loss occurs as one big gap or intermittently. This can been verified with Monte Carlo–type experiments similar to those presented above (not shown) and has been documented by Alexandersson and Moberg 1997 as well. If there is an annual cycle in the obs − bg, the significance threshold also decreases, since it generates additional variance. In these cases smaller thresholds for the maximum SNHT should be used.
Only if the time series are generated by stochastic (autoregressive) processes, the significance levels become higher since the degrees of freedom are reduced (von Storch and Zwiers 1999). A check of the obs − bg difference time series (with annual cycle removed) showed weak autocorrelation for lags of up to 4 days at some remote areas and practically zero autocorrelation for areas with good data coverage and for obs(1200 UTC) − obs(0000 UTC) time series. Data show generally much less autocorrelation than a first-order autoregressive model with coefficient α = 0.3, and even if such a model would apply, the 95% threshold is still below 20. An exception are remote island stations before the satellite era where the autocorrelation is similar to a first-order autoregressive model with α = 0.5, for which a Monte Carlo experiment similar to those in Fig. 4 yields a 95% significance threshold of 25. Additional sensitivity experiments with the equal sampling SNHT are documented in Haimberger (2005). It performs well in general as long as there is only one break within the tested interval. If there are more breaks only the largest break is detected.
To be safe, thresholds larger than Tsk > 12.5 suggested from the above analysis have been used for break detection. For obs − bg time series Tsk values above 50 are considered significant, if no metadata are available. For obs(1200 UTC) − obs(0000 UTC) time series, which show no autocorrelation and cannot be influenced by bg errors, values above 20 are considered significant if there are no metadata available. An SNHT value above 50 is almost always reached by breaks with size 0.5σ (see Fig. 4b). SNHT values above 20 are reached by about half of the series with breaks of size 0.25σ but are still not attained by homogeneous time series.
The equal sampling SNHT is now applied to the following time series:
Log pressure–weighted 1200 − 0000 UTC temperature difference in the stratosphere: depending on data availability analysis starts with the 20–30-hPa layer. Then all the means from two pressure levels down to the 200–300-hPa layer are calculated. Each layer mean time series is analyzed with the equal sampling SNHT. For each point in time the maximum SNHT value from the analyzed pressure layers is kept. The resulting time series of maximum SNHT values is then examined for significant values. Note that the 1200 − 0000 UTC time series is independent of the ERA-40 bg. A break detected in this time series has high credibility since it can be safely attributed to the obs series.
Log pressure–weighted layer mean obs − bg temperature difference at 0000 and 1200 UTC in the stratosphere. These time series, if available, are analyzed in the same manner as the 1200 − 0000 UTC temperatures.
The 300–850-hPa log pressure–weighted layer mean obs − bg temperature difference at 0000 and 1200 UTC. These time series are sensitive to station relocations and are more complete than the stratospheric time series. Temperature means from this relatively thick layer have quite small variance, and therefore relatively subtle breaks having the same sign throughout this layer can be detected.
b. Decision algorithm for break detection
The probability for a break depends not only on the SNHT test statistics but also on the occurrence of events, such as instrument changes. These information sources may be combined with the Bayesian rule (DeGroot 1986). Let A1 be the event that a break with a given size occurs within a time interval of ±2 yr around a point in time and Ao be the event that no break occurs. Let B be the event that Tsk reaches a value above a certain threshold x. We now want to know the probability of a break given that the event B occurs. Bayes’ theorem states that this probability can be calculated as
The challenge is now to specify the probabilities on the rhs of this equation. The probabilities Pr(B/A1) and Pr(B/Ao) can be found from the histograms shown in the second column of Fig. 4. It is the area under the histograms between x and ∞. Metadata information can be included by specifying prior probabilities Pr(A1). If, for example, the date of a radiosonde change is known, one can apply a higher prior probability Pr(A1) to this particular date. In the examples presented in this paper, prior probabilities of 0.96, 0.5, and 0.02 have been chosen for radiosonde changes, radiation correction changes, and no documented changes, respectively. This choice may appear extreme given the limited reliability of metadata but is necessary to make the detection system sensitive enough for metadata. With this choice, SNHT values of 31, 43, and 50 (14, 18, 20 for 1200 − 0000 UTC difference series) are necessary to reach scores above 0.5 when put into Bayes’ formula. A documented break with size 0.3σ reaches similar scores as an undocumented break with size 0.5σ. A score above 0.5 is necessary to trigger the breakpoint adjustment algorithm. This happens with breaks of size 0.5σ without metadata.
Since the metadata information may be imprecise, the prior probabilities have been modeled as Gaussians with standard deviation of 60 days. If only the year of a change has been reported the same high prior probability is specified throughout the year in order to assure that high weight is given to the metadata and to let the SNHT detect the break date. If Global Telecommunications System (GTS) metadata have been available, the prior probability has been lowered to 0.01 between events since the sequence of radiosonde-type reports explicitly states that there was no radiosonde-type change.
Figure 5 shows the breakpoint analysis for the radiosonde station Bethel (70219, Alaska). Prior probabilities are assigned according to the CARDS/BUFR metadata. Figures. 6a,b show time series of the test statistic Tk of the SNHT at Bethel for the stratospheric obs(1200 UTC) − obs(0000 UTC) and for the tropospheric obs − bg time series at 0000 UTC. The test statistics exceed the significance levels (20 for Fig. 6a, 50 for Fig. 6b) several times at this station (see also Fig. 10 for the corresponding difference time series). One can also see that the peaks are very sharp for the large breaks (1989, 1995), so that the break location is well determined in these cases, consistent with Fig. 4.
Since there are five time series of probability scores that may contain conflicting information (e.g., the location of the diagnosed breakpoint may differ slightly between the time series) one has to choose the location. The obs(1200 UTC) − obs(0000 UTC) score, shown in Fig. 6c is given the highest priority, since it is independent of possible inhomogeneities in the bg and therefore always attributable to the radiosondes. This is consistent with previous findings (Lanzante et al. 2003a; Sherwood et al. 2005) that most changes seem to be accompanied with changes in the obs(1200 UTC) − obs(0000 UTC). The peaks whose score is above 0.5 and that are the highest peaks within a ±2 yr interval are selected. Then the two probability time series from the stratospheric obs − bg time series are analyzed (the 0000 UTC series is shown in Fig. 6b). The locations with the highest peaks are chosen as breakpoints unless a breakpoint has already been detected within a ±2 yr interval in the obs(1200 UTC) − obs(0000 UTC) series. Finally, the probabilities from the tropospheric obs − bg time series are analyzed. Other priority choices for the breaks from obs − bg series are possible. One could give the tropospheric obs − bg time series priority to stratospheric obs − bg series or one could combine the probability scores again with the Bayesian rule. Little overall sensitivity has been found in respect of this choice.
The diamonds and stars in Fig. 6c indicate the location of the finally chosen breakpoints. The diamonds indicate breaks detected already in the obs(1200 UTC) − obs(0000 UTC) time series, the stars are breaks detected in at least one of the obs − bg time series.
There are many possible ways to further improve this break detection algorithm. Perhaps the most important path for improvement is to use different time intervals (not only ±2 yr) for the SNHT since a running mean with fixed length may be incapable of detecting all breaks.
5. Estimation of the break profiles
After the break detection, the mean obs − bg differences before and after the break to be adjusted are calculated at each pressure level, again ensuring that the annual cycle is sampled equally before and after the break. The difference of the means at every pressure level yields the estimated profile of the breaks (the solid and dotted curves in Fig. 7). While this profile inevitably contains small-scale vertical noise it is not smoothed vertically, as in sensitivity experiments the smoothing did not improve the spatiotemporal consistency of adjusted trends or 1200 − 0000 UTC differences.
The time interval used for estimating the break size is varied between 1 and 8 yr. The default of 8 yr is necessary for stable statistics in the presence of frequent data gaps. It is reduced only if there is another breakpoint before the breakpoint considered within 8 yr. The interval after the breakpoint is always 8 yr (if the time series is long enough), since the time series after the breakpoint has already been homogenized. The minimum interval of 1 yr is also the minimum interval between breaks allowed by the break detection algorithm.
After estimating the break at each level, its significance is tested using Student’s t test. Only profiles where at least two pressure levels contain breaks that are significant at the 95% level are used for the adjustment.
Since the bg may be affected by changes in the satellite observing system, a second criterion to test the significance of a break profile has been developed. It compares obs − bg time series of the tested radiosonde, with a composite of obs − bg time series from radiosondes surrounding the tested site. This composite was originally intended only for adjustment of the climatology of a tested time series (see section 7) but is quite useful for checking the break estimates gained from the obs − bg information at the tested site as well. Only if the break profile from difference series between obs − bg time series of the tested radiosonde and composite obs − bg time series of the surrounding radiosondes contains significant breaks at two or more pressure levels as well, is the break profile adjusted. The second criterion is more robust against jumps in the bg since the obs − bg series of the neighboring radiosondes are likely affected by very similar bg jumps, as these have a very large-scale structure. The neighbor composite criterion used here has similarities to the methods used by Thorne et al. (2005a). They use, however, composites of obs-anomalies, not composites of bg − obs differences for break detection and adjustments. Using the above criteria, only about 70% of the detected breaks are actually adjusted.
If a break profile has been considered significant, adjustments have been applied at all levels, even if the adjustment amounts have been below the significance threshold at some pressure levels. Adjustments at very high levels may be less accurate due to lack of data but the spatiotemporal consistency could be improved by the adjustments even at the 10-hPa level. No data deletions have been performed.
6. Global mean differences between ERA-40 bg and radiosonde observations
It is essential to be aware of any inhomogeneities of the ERA-40 bg since these reduce the applicability of the ERA-40 bg as a reference. Inhomogeneities in the bg time series may be introduced by changes in the ERA-40 observation coverage, in the observation biases correction, and in the overall observation quality. Apart from radiosondes the satellite data are mainly affected by changing biases. The satellite data contributed to the realistic representation of many stratospheric features in ERA-40 analyses (Randel et al. 2004) and to a much better overall quality of the ERA-40 analyses and forecasts. However, the quality control and the adjustment of the changing biases of satellite radiances as described by Hernandez et al. (2004), Li et al. (2006), and Harris and Kelly (2001) is challenging. Suboptimal radiance bias correction can easily compromise the homogeneity of the global mean ERA-40 bg forecasts (Uppala et al. 2006).
As can be seen from the global mean obs − bg difference series in Fig. 8, some changes in the ERA-40 satellite observing system did indeed lead to breaks in the global mean bg forecasts. The most prominent breaks evident in Fig. 8 occurred in January 1975, September 1976, and April 1986 and are related to problems with the National Oceanic and Atmospheric Administration (NOAA) satellites NOAA-4 and NOAA-9. Jumps in 1995–97 coincide with the end of NOAA-11, and the start/end of NOAA-14 (see also Christy and Norris 2006). At high altitudes the effects of insufficient bias correction of radiances from the stratospheric sounding unit (SSU), particularly in the early 1980s, are noticeable (see Haimberger 2005; Uppala et al. 2006). Trenberth and Smith (2006) have recently diagnosed a spurious break in ERA-40 temperature analyses related to the assimilation of MSU-3 radiances at the end of the NOAA-9 period. These problems are the likely reason for the rather weak stratospheric cooling and rather strong upper-tropospheric heating trends of the ERA-40 bg compared to available radiosonde and satellite datasets (see Karl et al. 2006, their Fig. 3.4) in the ERA-40 analysis. This peculiar vertical pattern is also evident in the global mean trends of the unadjusted bg shown in Fig. 9. In the Tropics the upper-tropospheric heating is even more pronounced.
However, pervasive radiosonde temperature biases especially in the Tropics (Sherwood et al. 2005; Randel and Wu 2006; Christy and Norris 2006) have most likely contributed to inhomogeneities in the global mean bg − obs series as well. From the 1990s onward, there is also good correspondence between trends derived from MSU radiances (Santer et al. 2004) and ERA-40, which supports the bg. For this period the ERA-40 analysis also looks more homogeneous in the study by Trenberth and Smith (2006). Therefore, with the present knowledge it is difficult to tell to what extent the radiosonde temperatures or the bg temperatures are responsible for the breaks and in particular for the trends in the global mean obs − bg departures of Fig. 8. Despite these uncertainties, the adjustment of the global mean bg has been tried such that the chance for spurious break detection due to breaks in the bg is reduced.
This is most straightforward for the large break in January 2001, which is certainly caused by the switch from ERA-40 to the operational ECMWF data assimilation system. It is a clear indication of the temperature uncertainties that still exist at stratospheric levels. The break can be adjusted relatively accurately at each radiosonde station from an overlap year (2001) between operational and ERA-40 innovations.
The remaining shifts in the global mean bg − obs series are much harder to attribute to either the bg or the obs. In the bg adjustment procedure applied here, it is assumed that much of the trend of the global mean obs − bg difference stems from time-varying biases in the global mean bg. Therefore the global mean obs − bg difference shown in Fig. 8 has been subtracted from the bg time series at individual stations. The bg adjustment has not been applied completely uniformly but has been varied with radiosonde observation density and with latitude (see appendix B for details). Other choices for the global mean bg adjustment are definitely possible. One could, for example, try to homogenize the global mean bg in a similar manner as the individual radiosonde records. Further investigation of the properties of the bg is needed to design an optimal bg adjustment.
At individual stations this bg adjustment is subtle compared to the typical sizes of breaks in radiosonde records. In the global mean, however, the bg adjustment procedure leads to stronger cooling/weaker heating trends (Fig. 9), especially in the lower stratosphere/tropical upper troposphere. In the lower troposphere the adjustment has less impact since the bg − obs differences are smaller and show little trend below 300 hPa. It is interesting to note that both the unadjusted radiosondes and the bg are practically neutral [−0.01 and 0.02 K (10 yr)−1] in terms of 300–850-hPa trends in the Tropics. The strong warming trends in the tropical lower troposphere as suggested by the Remote Sensing Systems (RSS) MSU lower-troposphere (LT) products (Santer et al. 2005) are not supported either by the radiosondes or by the ERA-40 bg.
The necessity of this bg adjustment prior to the homogenization and the sensitivity of the homogenization results on the bg adjustment (see section 9) limit the accuracy of the homogenization results. It is expected that the size of adjustments necessary for the bg from future reanalyses will be much less due to more advanced treatment of satellite biases.
7. Adjustment of the climatology of radiosonde time series
Even after successful homogenization, the climatology of a radiosonde record may have a large bias if the most recent part of the time series is biased. The reasons are typically as follows:
Some countries still operate radiosondes with nonnegligible temperature biases. Examples are the Chinese and most Russian radiosondes.
Some radiosonde time series end well before the year 2000. Prominent examples are the weather ships in the Atlantic and Pacific Oceans. The most recent parts of these time series are likely biased.
Some time series, for example, over Russia in the 1990s, have gaps that are wider than the averaging intervals used for calculating the break profiles. In this case the time series before the gap cannot be adjusted by the homogenization procedure.
Adjusting the climatology and the choice of “trusted” radiosonde types whose measurements are left untouched, as done here, is a delicate task since it implies preference to certain radiosonde types. However, this task cannot be avoided for climate data assimilation purposes. The involved biases are large enough to cause noticeable biases of the resulting analyses or even to trigger rejection of a substantial part of the radiosonde measurements, particularly at high altitudes in the satellite era.
The adjustment of the climatology is performed after the time series homogenization; it is the final adjustment step. Therefore the time series should already be without any detectable breaks. The challenge is now to calculate a reference that is less biased than the most recent part of the tested time series.
Many ascents launched in recent years have relatively small biases up to the 10-hPa level or the biases are well documented and can be corrected. A subset of 342 radiosonde stations has been identified where the climatology is considered unbiased after homogenization. A map of these can be found in Haimberger (2006). These are stations that have been equipped with Vaisala RS80/90/92 radiosondes or temperature sensors, with MeiseiII radiosondes and Sippican radiosondes. Their performance is well documented in radiosonde intercomparison studies, such as Nash et al. (2005).
At all other stations (e.g., with VIZ radiosondes, Chinese and Russian radiosondes) the most recent part (usually the most recent 8 yr, if available) of the records has been regarded as biased and therefore the records have been adjusted.
A reference time series has been calculated from a neighbor composite of obs − bg differences from the above 342 radiosonde stations. The size of the adjustment is calculated by comparing the obs − bg difference of the tested series with the composite obs − bg difference of the neighboring time series. The mean obs − bg difference of the tested time series should be very close to the mean obs − bg difference of the composite, since the spatial pattern of an 8-yr-averaged bg temperature field is smooth and the bg temperature gradients are considered realistic. Since the obs − bg differences are such small increments, their composites are less affected by data gaps than composites of absolute temperatures.
The composite obs − bg difference is a weighted mean of neighboring homogenized radiosondes using weights decreasing exponentially with distance from the radiosonde station to be adjusted. A decorrelation distance of 3000 km has been used. At least 20 stations have been required to be available for the composite.
Experience with this approach has been encouraging. Many biases of the most recent parts of time series, as evident for example over Russia or at the weather ships in the Atlantic, could be significantly reduced (see Figs. 17 and 18).
8. Adjustment results for selected individual stations and regions
A small subsample of radiosonde records is investigated to demonstrate the strengths and the limitations of RAOBCORE. For most countries similar examples can be found since practically all long radiosonde time series contain inhomogeneities. Even if the radiosonde records presented contain large breaks, they have high information content.
a. Adjustments in the satellite era
Since homogenization works backward in time we begin with the more recent period 1979–2005. While the ERA-40 bg contains more information independent of radiosondes than in the presatellite era, there is also the risk of spurious breaks due to insufficient bias adjustment between satellites (see section 6). The following examples should demonstrate that adjustment with obs − bg difference series is possible despite these problems.
One is Bethel, Alaska, which had a relatively homogeneous time series between 1961 and 1989 but then three marked breaks in 1989, 1992, and 1995. Test statistics for this station have been shown already in Figs. 5 and 6. Due to space constraints the analysis in this and other examples is restricted to the 50-hPa level in this study.
Figure 10 shows time series of unadjusted and adjusted 1200 − 0000 UTC temperature differences at Bethel. One can see how different the day–night differences are depending on the radiosonde system in use (VIZ before 1989, then Space Data radiosondes, then Vaisala RS80 radiosondes, according to CARDS). These jumps can only come from changes of the observing system and are independent of the ERA-40 bg.
The ERA-40 bg comes into play for the adjustments. The adjustments for the 0000 and 1200 UTC ascents are calculated independently from obs − bg time series at 0000 and 1200 UTC. It is worth emphasizing that the 1200 − 0000 UTC obs time series is never used for adjustment purposes in RAOBCORE; it is used for break detection purposes only.
As one can see from Fig. 10b the diagnosed breakpoints coincide well with metadata events in the 1980s–90s. The breaks in 1976 and in the 1960s are smaller, but one can see from visual examination of the unadjusted 1200 − 0000 UTC time series that there are breaks indeed. The adjusted 1200 − 0000 UTC time series is much more homogeneous and has an almost neutral trend, as is expected for this difference series. The homogeneity of the 1200 − 0000 UTC time series is a good consistency check for RAOBCORE since this time series has not been used for calculating the adjustments.
Bethel is not the only radiosonde station in Alaska affected by instrument changes. As can be seen from Fig. 11, the unadjusted radiosonde records over Alaska and Canada yield rather different temperature trends for the period 1989–2004 at 0000 and 1200 UTC and the trends are also spatially rather heterogeneous due to the different station histories. After adjustment with RAOBCORE, the temperature trends at 0000 and 1200 UTC are in good agreement and also the spatial homogeneity is much better. Note that there were substantial adjustments not only for the daytime (0000 UTC) ascents. This is consistent with findings from radiosonde intercomparisons (Nash and Schmidlin 1987) that showed substantial nighttime deviations between VIZ and Vaisala RS80 radiosondes and contradicts the impression one may get from Sherwood et al. (2005), who stress the importance of daytime biases.
The second example is Darwin, Australia, which switched from Philips MKIII radiosondes to Vaisala RS80 radiosondes in May 1987. This station has been analyzed more thoroughly for example by Free et al. (2002). Darwin is difficult since only 0000 UTC ascents have been available before 1987 and even those had some gaps at the 50-hPa level. Figure 12 shows the adjustments suggested by RAOBCORE for this station. The break size is estimated 2 K, which is in accord with Free et al. (2002). There is a clear indication of a break in 1963 as well, which is in good agreement with breaks suggested by Lanzante et al. (2003a) for this station. Additional breaks in Fig. 12 seem evident in 1976 and 1981. but these have not been adjusted since the significance criteria have not been met.
The robustness of the RAOBCORE adjustment procedure may be seen from Fig. 13. The spurious 1200 − 0000 UTC differences are reduced at almost all stations that launch two radiosondes per day during the period 1988–90. The largest adjustments can be found over the United States, China, and Australia. The spatial consistency of the differences (as measured by the “cost” C defined in appendix A) has improved as well compared to Fig. 1, and the amplitude of the differences has been reduced to a few tenths of a kelvin except in the Tropics, in agreement with recent results from Free and Seidel (2005).
b. Adjustments in the presatellite era
In the presatellite era, the ERA-40 bg does not contain much more upper-air information than is available from radiosondes. Therefore the issue of dependence of the bg on the radiosondes to be tested becomes particularly important. Further the radiosonde density in the Tropics and the Southern Hemisphere was very sparse, making neighbor intercomparisons rather difficult. Some examples are given indicating that adjustment of radiosondes using obs − bg time series is still possible.
The dependence of the bg on the radiosondes to be tested is expected to be particularly large for (i) remote stations and (ii) large countries using the same radiosonde equipment. The degree of dependence may be assessed qualitatively by looking, for example, at Mesural radiosondes in the late 1960s–early 1970s. Particularly the Mesural FM43 radiosondes had large radiation errors. This radiosonde type was in use in France and in former French dependencies (e.g., many Pacific islands). This gives an opportunity to see whether the same breaks are diagnosed over data rich and extremely data sparse areas. Figure 14 shows obs − bg differences for local daytime ascents at the French radiosonde station Trappes, France (1200 UTC), and at the southeast Pacific station Rapa, Chile, before and after adjustment with RAOBCORE.
Both time series show very large biases around 1970 when Mesural-FMO-43B radiosondes were in use. These were replaced by Mesural-FMO-44C radiosondes in 1972 at Trappes and in 1976 at Rapa. The suggested corrections from obs − bg time series are about 20% weaker at Rapa, probably because the bg is influenced by Rapa itself. A correlogram from the stratospheric obs − bg time series at Rapa (not shown) also indicated some dependence of the bg on the observations at Rapa: the serial correlation of the obs − bg differences decreases from values of 0.3 for 1-day lags to insignificant levels for lags of 4 days. Nevertheless it is evident that the bg contains enough information to yield sensible break estimates even in this area. The obs − bg time series at Rapa after adjustment of the observations (Fig. 14d) looks much more homogeneous.
At Trappes the standard deviation of the obs − bg difference is much smaller since the bg forecast is more accurate in this region and also the serial correlation of the innovation time series is practically zero (except near a breakpoint). The adjusted obs − bg time series (Fig. 14b) is again more homogeneous although not all instationarities are removed from the strongly varying time series between 1969 and 1972.
Not only Trappes but all French radiosonde sites are affected by the Mesural problem with daytime stratospheric temperature biases in the order of 10 K. The bg over Europe seems almost unaffected by this problem, although most of the biased Mesural observations have been accepted by the ERA-40 quality control system. This can be concluded from the obs − bg time series at station Stuttgart, Germany (Fig. 15), which is situated just “downstream” of France and does not have documented changes around 1970. The obs − bg temperature time series of this station was homogeneous in 1969 and 1972 when France changed the radiosonde instrumentation. If the bg were influenced by the breaks of the French temperature time series, a shift would be visible in the obs − bg time series at Stuttgart.
While over Europe this result may be plausible since there is much independent information available to make the bg apparently immune against even large breaks; the situation may be different when a large country changes its radiosonde equipment simultaneously. Figures 16 and 17 provide two examples that even in those situations the bg is independent enough to be used for adjustment. In the Japanese radiosonde time series (Fig. 16), major breaks are adjusted in 1968, 1981, and 1993–94. The Japanese composite of adjusted 1200 − 0000 UTC time series no longer shows spurious breaks (Fig. 16b).
A similar example is far eastern Russia (station IDs 31000–33000; Fig. 17). The most prominent shift in 1969 is caused by a change from MARS to RKZ5 radiosondes and is adjusted well by RAOBCORE. This composite also shows the difficulties in data availability over Russia in the 1990s. The most recent parts of most stations, before the data gaps had to be adjusted by about 1 K since their obs − bg time series, showed a distinct bias compared to a composite of mainly Japanese and Alaskan stations. The resulting adjusted 1200 − 0000 UTC time series is nevertheless relatively stationary (apart from a remaining annual cycle) as it should be. The daily mean adjustments in 1969 are about −0.7 K at 50 hPa over eastern Russia (31000–33000). This compares well with daily mean adjustments of −0.6 K over western Russia (south of 60°N, west of 60°E), which have been taken at similar solar angles. This is remarkable since the bg over western Russia is potentially influenced by European radiosondes, whereas over eastern Russia it is more likely influenced by Alaskan and Japanese radiosondes (the latter have large breaks in 1968). Only the positive 1200 − 0000 UTC differences after adjustment in the 1960s is an indication of a slight overcorrection, due to a too-warm nighttime bg or a too-cool daytime bg caused by biases in the radiosondes 12 h earlier.
Figure 18 shows that the RAOBCORE adjustments substantially improve the spatial consistency of 1200 − 0000 UTC time series also in the early period 1964–66, although a few stations remain where the adjustments by RAOBCORE seem insufficient. Note also that many weather ships were operational in the 1960s until the early 1970s. For these ships, which operated up to the early 1970s, the adjustment of the most recent period removes the large biases visible in Fig. 18a.
Figure 19 shows the time series of the adjustments per month applied to the global radiosonde network. The combined length of all time series divided by the total number of adjustments applied (6233) yields about one adjustment every 7 yr. While this number may seem large, it is in good agreement with Thorne et al. (2005a) and is also supported by the number of radiosonde-type changes derived from GTS (about 700 from 1990 onward) and documented in CARDS (about 6700). About 30% of the adjustments coincide with metadata events within ±14 days of the detected breakpoint. Most peaks are related to changes in large countries (e.g., Russia, France in 1969). There are no peaks evident in 1973, 1975, 1976, and 1978–79 where major changes in the ERA-40 observing system occurred.
The adjustments for all 1184 stations are available from http://www.univie.ac.at/theoret-met/research/RAOBCORE/ as plots and as an ASCII-formatted file. The dataset will be updated annually, as soon as the input data from IGRA and ECMWF become available.
9. Sensitivity experiments
The homogenization process has several sources of uncertainties, as outlined, for example, by Thorne et al. (2005a) and McCarthy et al. (2006, manuscript submitted to J. Climate). Since RAOBCORE is an automated system, it has been possible to test the sensitivity with respect to various parameters of the adjustment system.
The sensitivity of the following parameters has been tested: (i) 50–100 and 300–850-hPa layer global mean trends, which may be compared, for example, with the results collected in Fig. 3.4 of Karl et al. (2006); (ii) tropical (20°N–20°S) mean trends for the same period; (iii) spatial consistency of 50-hPa 1979–2004 trends, as defined in appendix B; (iv) spatial consistency of 1200 − 0000 UTC differences averaged over 1964–66; and (v) breakpoint count. The main results are summarized in Table 1.
Time series have been used for the trend comparison only if less than 24 months of data out of the 26-yr period 1979–2004 have been missing. If only 0000 or 1200 UTC trends have been available for a station, these have been regarded as daily mean trends. If both have been available, their average has been taken. The daily mean trends have been averaged to 10° × 10° latitude–longitude grid boxes and the mean trends from these grid boxes have been used to calculate the tropical (20°S–20°N) and global mean trends. The intermediate calculation of 10° × 10° trends helped alleviate the effects of the uneven spatial distribution of radiosondes.
The first two rows BG and BGADJ describe the unadjusted and adjusted versions of the ERA-40 + ECMWF background as they are used in the sensitivity experiment below. The original ERA-40 + ECMWF bg (BG) has almost neutral trends in the lower stratosphere, which are not supported by the University of Alabama in Huntsville (UAH) and RSS satellite products and even less by the radiosondes. As one can see from row BGADJ, the adjustment described in section 6 introduces moderate stratospheric cooling (similar to that of MSU satellite products) into the bg.
Row UNADJ refers to the unadjusted radiosonde dataset used as input for RAOBCORE. Note the strong stratospheric cooling and large spatial heterogeneity compared to the bg.
Row RAOBCORE summarizes the results of RAOBCORE as used for the plots in this study, that is, with bg adjustment, with use of metadata, and with the thresholds described in section 4. The radiosonde trends adjusted with RAOBCORE show much less cooling than UNADJ and are very close to the existing homogenized radiosonde datasets [Hadley Centre Atmospheric Radiosonde Temperature Product version 2 (HadAT2), Radiosonde Atmospheric Temperature Products for Assessing Climate (RATPAC); see Karl et al. 2006]. Although the RAOBCORE trends are not directly compared to satellite data in this paper, it is clear that they remain more negative than those derived from satellite products. The spatial consistency of both trends and 1200 − 0000 UTC difference after the adjustment is much better than for the original radiosonde dataset but is still not as high as the consistency of the ERA-40 bg trends and 1200 − 0000 UTC differences.
The other rows in Table 1 show results of sensitivity experiments. For NOMETA, RAOBCORE has been applied with constant prior probability 0.02; that is, metadata information has been completely discarded. For ONLYMETA, the prior probability has been set to zero if no metadata were available and to 0.999 when metadata events (radiosonde-type changes and radiation correction changes) were documented. NOBGC refers to a RAOBCORE run where the bg was not adjusted before application of RAOBCORE. In the experiment NOBGC_ONLYMETA the bg has not been adjusted and metadata have been treated as in ONLYMETA. STRICT refers to an experiment where the thresholds for break detection and adjustments have been set higher (40% higher Tsk values required and 60% larger break size required than for the default).
The results suggest that the adjusted trends are relatively insensitive to changes in metadata treatment and to the increase of the breakpoint thresholds. However, there is sizeable sensitivity to the background adjustment applied before application of RAOBCORE. If the unadjusted bg is used as reference, the resulting RAOBCORE-adjusted trends shift toward more warming by more than 0.3 K (10 yr)−1 in the tropical stratosphere and by about 0.04 K in the tropical troposphere. In this case the adjusted radiosonde trends fit better to satellite-derived temperature trends but are still within the range of uncertainty of upper-air trends (see again Karl et al. 2006). The bg adjustment has relatively little impact on breakpoint count and on the spatial consistency of adjusted radiosonde time series. This shows that high spatial consistency of trends and 1200 − 0000 UTC differences, while a desirable property, is not sufficient to determine the quality of the diagnosed global mean upper-air trends.
While the unadjusted bg trends are most likely too weak in the stratosphere (see Fig. 3.4 in Karl et al. 2006) so that the experiment NOADJ is regarded as rather extreme, it shows the importance of a homogeneous reference. In view of mounting evidence for pervasive biases of the unadjusted radiosonde dataset particularly in the Tropics (Sherwood et al. 2005; Randel and Wu 2006), the bg adjustment applied in the best estimate of this paper (row RAOBCORE) may seem aggressive. There is indeed the possibility that the bg adjustment removes climate signals from the MSU and other satellite instruments. This is suggested by the good correspondence between ERA-40 MSU-4 equivalent and RSS (Mears et al. 2003) datasets, at least in the stratosphere, from circa 1989 onward (Santer et al. 2004). Therefore the difference in stratospheric trends between experiments RAOBCORE and NOADJ is probably larger than the true uncertainty of the radiosonde trends.
The sensitivity of the trends with respect to the bg adjustment is sizeable, even if adjustments are made only at documented breakpoints. Since the breakpoints are practically the same in the ONLYMETA and NOBGC_ONLYMETA experiments, the trend differences must mainly come from break estimation, not from break detection. As is indicated in Fig. 7 the uncertainty in break estimation is at least 0.5 K in the stratosphere, even in recent years. Since almost every radiosonde time series contains at least one break between 1979 and 2004, trend differences in the order of 0.3 K, depending on the reference used, are not surprising. It has further been tried to use shorter intervals (4 yr instead of 8 yr) for break estimation, but this did not reduce the sensitivity to the bg adjustment either.
The high values of C and low break count values for the NOMETA and STRICT experiment suggest that many breaks remain undetected in these experiments. One has to accept that there are thousands of breaks that need to be adjusted in order to get a spatially consistent dataset. The impact of these parameters on trends is small, which indicates that the largest breaks are found even without metadata and large thresholds.
The sensitivity experiments stress the importance of the reference series used for homogenization. The uncertainties in the bg adjustment lead to uncertainties that are similar to the differences between existing homogenized radiosonde and satellite datasets.
10. Conclusions and outlook
This paper has documented a method called RAOBCORE, which uses innovations (observations minus background forecasts) from a frozen data assimilation system, such as ERA-40 (Uppala et al. 2005), for automatic homogeneity adjustments of radiosonde temperature data. Innovations back to 1958 have been used. The method has been designed to make the adjusted radiosonde dataset as suitable for reanalyses as possible. This involves not only realistic trends but also completeness of the radiosonde dataset and adjustment of the absolute temperatures, not anomalies. It could be shown with several examples that RAOBCORE substantially reduces spurious trends in the 1200 − 0000 UTC differences (cf. Fig. 2 with Fig. 20) and improves the internal spatial consistency of the radiosonde measurements.
The global trend figures from RAOBCORE show good agreement with existing global homogenized radiosonde datasets, such as HadAT (Thorne et al. 2005a) and LKS (Lanzante et al. 2003b). More detailed comparisons with regional radiosonde datasets, for example, Comprehensive Alpine Radiosonde Dataset (CALRAS; Haeberli 2006) as well as MSU records (Mears et al. 2003; Christy et al. 2003), are in preparation.
Inhomogeneities introduced into the background forecast temperatures due to changes in the (satellite) observing system and the dependence of the background forecast error on the station to be tested and adjusted are regarded as the largest potential sources of uncertainty when using innovations. While the results from individual stations suggest that the latter problem is relatively small, the sensitivity of the RAOBCORE-adjusted trends with respect to the global mean bg is not negligible. The uncertainty (which is estimated 0.3 K (10 yr)−1 for the global mean 50–100-hPa layer and 0.05 K for the 300–850-hPa layer) of trends from the RAOBCORE-adjusted dataset can be reduced below that of existing upper-air datasets only if the inhomogeneities in the global mean obs − bg time series (as shown in Fig. 8 for the 50-hPa layer) can be attributed more clearly to either the bg or the obs.
The behavior of the ERA-40 data assimilation system in the satellite period is still investigated and increasingly well understood (Uppala et al. 2006). With this building knowledge the uncertainty of the ERA-40 bg and thus of the resulting RAOBCORE trends can likely be reduced. It is quite possible that the bg adjustment used in this study was too aggressive and that therefore the RAOBCORE best estimate for global trends after homogenization still shows too much cooling/little warming. A more thorough investigation of this matter is under way but is beyond the scope of this article.
Some radiosonde temperature time series have a substantial bias even in their most recent parts. For these stations the biases have been reduced by adjusting with a composite of homogenized neighboring radiosondes in an extra step after the homogenization procedure. This step is not compulsory for climate trend analysis but adds substantial value for climate data assimilation applications, which are affected by absolute biases (Dee and Da Silva 1998; Uppala et al. 2006).
Apart from the correction algorithm itself this article has documented gaps in both the ERA-40 and the IGRA (Durre et al. 2006) radiosonde datasets. The union of both datasets is about 5%–10% larger than the individual datasets. Efforts to create a comprehensive global radiosonde dataset must therefore continue.
In view of the results gained so far, it seems worthwhile to perform a pilot climate data assimilation of the period 1939–57 in order to be able to adjust also these time series using the innovations gained from such an assimilation. Numerous radiosondes are available for this period (Durre et al. 2006; Bronnimann 2003). Innovations from a four-dimensional variational data assimilation (4DVAR) system would add valuable information to these data, which are otherwise very difficult to homogenize. Preliminary results with RAOBCORE also indicate that homogenization based on analysis of innovations is feasible also for radiosonde winds. Put into a more general context, the method of reanalysis (or climate data assimilation) together with proper archival of the innovations may become a key method for the improvement of historical observations. The recent success of assimilations using surface pressure only (Compo et al. 2006) together with potential application for homogenization makes the idea of a 100-yr reanalysis attractive, despite the sparseness of the available data.
The efforts described here are part of a long-term activity to provide a mature and complete homogenized radiosonde dataset suitable as input for the next major European reanalysis planned for the end of this decade.
The foundation for this work has been laid during the European Commission Contract MEIF-CT-2003-503976, which enabled the author to work at ECMWF in 2004. I thank Sakari Uppala and Adrian Simmons in particular for their continuous support. After 2004 the work has been funded by project P18120-N10 of the Austrian Fonds zur Förderung der wissenschaftlichen Forschung (FWF). Christina Tavolato and Stefan Sperka were helpful in detecting bugs in the adjustment procedure. The comments of the anonymous reviewers led to substantial improvements of the original manuscript.
A Spatial Consistency Measure for Trends and Day–Night Differences
To estimate the spatial consistency of trends and 1200 − 0000 UTC differences, a simple cost function is defined as
where p is the pressure level, i, j are station indices, N is the number of stations, Δxij is the difference quantity between stations, and dij is the spherical distance between stations in kilometers. Here Δxij is either the difference between trends (Δ∂T/∂t)ij or the difference between 1200 − 0000 UTC differences at stations i, j. Strongly deviant trends/differences at densely covered areas contribute most to the cost. The cost function C for radiosondes should reach similar values as the ERA-40 bg or satellite products after homogenization.
Adjustment of the Global Mean bg Time Series
In section 6 it has been shown that the global mean bg has spurious breaks due to insufficient satellite bias corrections. The signature of these breaks can be found throughout the globe. Although it has a horizontally smooth pattern compared to the distribution of breaks in radiosonde records, it is not constant. In general the influence of the satellite data and excessive latent heating on the bg is smaller in regions with dense radiosonde observation coverage and in the extratropics. Therefore the adjustment is scaled with the radiosonde density in the following way:
where Δbg(λ, ϕ, p, t) is the global mean obs − bg difference and the weighting function w is defined as
The stronger weight at low latitudes helps to reduce the strong heating of the bg in the Tropics, which is considered excessive. The radiosonde observation density p is defined as
where di is the spherical distance between location (λ, ϕ) and radiosonde i. The factor fi takes into account whether the radiosonde reports once ( fi = 1) or twice daily ( fi = 5). It was set to 5, not 2, since sites with twice daily launches tend to be better maintained. Here N(t) is the number of active radiosondes and ρmax(t) is the maximum radiosonde density found at a particular time. Finally, the weights w are adjusted by a constant factor such that the global mean Δbg is equal to when averaged over all radiosonde stations. Figure 18 in Haimberger (2005) shows the spatial pattern of the weight w, which shows little variation in time.
With this choice of parameters the maximum adjustment for an individual radiosonde site is about 1.6 times the global mean adjustment (e.g., in the South Pacific), and the minimum adjustment is below 0.2 times the global mean adjustment (over central Europe, China). It has been tuned to reduce as much as possible the conspicuous jumps due to the erroneous bias correction of the NOAA-4 radiances between January 1975 and September 1976. Figure 19 in Haimberger (2005) shows the obs − bg series averaged over the radiosondes south of 25°N before and after adjustment with the weighted global mean bg. Although this simple adjustment of the bg described here can by no means remove all biases, at least the breaks in January 1975 and September 1976 are substantially reduced in the average over this data-sparse region. For other breaks, for example, due to problems with NOAA-9, other horizontal weighting functions may be optimal, but so far this has not been tested.
Corresponding author address: Dr. Leopold Haimberger, Department of Meteorology and Geophysics, University of Vienna, Althanstrasse 14, A-1090 Vienna, Austria. Email: firstname.lastname@example.org