Temperature data from radiosondes over Germany have been homogenized manually. The method makes use of the different radiosonde (RS) networks existing in East and West Germany until 1990. The largest temperature adjustments, up to 2.5 K, apply to Freiberg sondes used in the east in the 1950s and 1960s. Adjustments for Graw Hamburg 1948 (H48), 1950 (H50), and Munich 1960 (M60) sondes, used in the west from the 1950s to the late 1980s, and for RKZ sondes, used in the east in the 1970s and 1980s, are also significant: 0.3–0.5 K. Small differences between Vaisala RS80 and RS92 sondes used throughout Germany since 1990 and ~2004, respectively, were not corrected for at levels from the ground to 300 hPa. Comparison of the homogenized data with other datasets—Radiosonde Innovation Composite Homogenization (RICH) and Hadley Centre Atmospheric Temperature, version 2 (HadAT2)—and with Microwave Sounding Unit satellite data shows generally good agreement. HadAT2 data exhibit a few suspicious spikes in the 1970s and 1980s and some suspicious offsets up to 1 K after 1995. Compared to RICH, the homogenized data show slightly different temperatures, by less than ~0.4 K, in the 1960s and 1970s. As reported in other studies, the troposphere over Germany has been warming by 0.2 ± 0.1 K decade−1 from ~1950 to 2013, and the stratosphere has been cooling. The stratospheric trend increases from almost no change near 230 hPa (the tropopause) to −0.4 ± 0.2 K decade−1 near 50 hPa. Trends from the homogenized data are more positive by about 0.1 K decade−1 compared to the original data, both in the troposphere and stratosphere.
Radiosonde (RS) measurements started in the first quarter of the twentieth century. The 1957–58 International Geophysical Year provided a large impetus toward regular radiosoundings worldwide. At that time, the main goal of atmospheric soundings was to support aviation and weather forecasting. Various, often locally developed, RS systems were used, with different technical approaches to measuring temperature, relative humidity (RH), wind, and pressure or altitude. Most early RS systems had low accuracy and especially large radiation errors (Nash and Schmidlin 1987; Zhai and Eskridge 1996; Luers and Eskridge 1998; Lanzante et al. 2003).
During daytime, inadequate shielding or ventilation of the temperature sensor results in inaccurately high temperatures, particularly at levels higher than 100 hPa. During nighttime, low readings can occur because of thermal emission from the temperature sensor and lack of sufficient ventilation. These radiation errors have become much smaller for the newer radiosonde types and are usually corrected during automated data processing (MRZ, Vaisala RS80, and RS92). For the older RS types [Freiberg, Graw Hamburg 1950 (H50), Graw Munich 1960 (M60), and VIZ], however, changed temperature sensors and changed radiation corrections are clearly visible (e.g., several and station-specific radiation correction changes for the Graw M60). Sensor time lags were also often long for early sonde types, both for temperature and pressure readings (Nash and Schmidlin 1987). For the Graw M60 radiosonde, for example, varying manual and automatic corrections were applied for the slow responses of its temperature and pressure sensors (Deutscher Wetterdienst 1983; Nash and Schmidlin 1987). Humidity data had especially large errors, even in the midtroposphere (Ivanov et al. 1991).
International RS intercomparison campaigns revealed large differences between RS types (Hooper and Vockeroth 1975; Nash and Schmidlin 1987; Ivanov et al. 1991), including the Graw M60, Graw RSG, MRZ, VIZ, and Vaisala RS80 sondes used in Germany. Over the years, manufacturers have developed improved and more accurate radiosonde systems (Nash et al. 2006). Unfortunately, the corresponding instrument changes have also introduced inhomogeneities into nearly all long-term RS records (Thorne et al. 2005; Haimberger 2007a). Therefore, even though RS measurements provide six decades of valuable upper-air data, these data require homogenization before they can be used for climate trend studies.
Efforts to homogenize historic RS records include the work by Gaffen (1993), who collected metadata on worldwide changes of radiosonde types and practices in a systematic manner. This is a key requirement for any RS homogenization activity. Parker et al. (1997) used such station metadata in combination with satellite-based temperature measurements from the Microwave Sounding Unit (MSU) for homogenizing RS monthly records from the beginning of MSU data in 1979. This approach has some limitations as a result of the much coarser vertical resolution of MSU data (Lanzante et al. 2003). Additionally, homogenization using satellite data (or any other data) as a reference can be affected by temporal inhomogeneities in the reference data themselves.
Lanzante et al. (2003) selected around 90 stations worldwide for manual homogenization [the Lanzante–Klein–Seidel (LKS) dataset]. LKS considered data from each station separately, but at different levels and for different times of the day (corresponding to different solar zenith angles) to determine breakpoints and then adjustments. Comparison to other stations was not made. One problem of this approach is the difficulty of separating artificial differences caused by RS changes from true geophysical changes around a breakpoint.
The Radiosonde Atmospheric Temperature Products for Assessing Climate (RATPAC) dataset (Free et al. 2005; NOAA/National Centers for Environmental Information 2005) is based on the 87 LKS stations. RATPAC focuses on large-scale long-term climate variation, especially global, tropical, and extratropical temperature anomalies. RATPAC homogenization is based on the time-oriented LKS method, but it also uses comparison with neighboring stations through the spatially oriented first difference (FD) method of Peterson et al. (1998). RATPAC is geared toward providing large-scale averages, but the station time series are also available and extend the LKS dataset.
Another spatially oriented homogenization approach by Sherwood et al. (2008) used temperature and wind shear time series and did not rely on metadata for breakpoint determination. For elimination of artificial changes, they used the Iterative Universal Kriging homogenization process (Sherwood 2000), which also considers data from neighboring stations.
Using physically based radiation and lag corrections originally developed for Vaisala RS80 and VIZ sondes by Luers and Eskridge (1995), Durre et al. (2002) also attempted to homogenize worldwide RS data. Physically based approaches are desirable, but they can be applied only to stations where accurate metadata about equipment and launch procedures are available (Haimberger 2007a). Comparison of the Durre et al. (2002) adjusted dataset to MSU records still showed significant inhomogeneities that lacked an easy explanation.
A semiautomatic breakpoint detection, with error adjustments similar to the LKS method, was implemented by Thorne et al. (2005) for building the global Hadley Centre Atmospheric Temperature, version 2 (HadAT2), database (Met Office Hadley Centre 2012). Automatic methods allow using a much larger number of stations (e.g., almost 700 stations in the case of HadAT2). HadAT2 data are available as station time series and in a gridded form. A fully automatic global homogenization method was presented by Haimberger (2007a). He applied a variant of the standard normal homogeneity test (SNHT; Alexandersson and Moberg 1997) to differences between RS time series and reference time series from ERA-40 and ERA-Interim forecasts (Uppala et al. 2005). Adjustments for this homogenized Radiosonde Observation Correction using Reanalysis (RAOBCORE) dataset (Haimberger 2007b) were calculated from the mean of RS minus reference time series before and after breakpoints. However, it can be questioned whether the ERA-40 forecasts are independent enough (they use unhomogenized radiosonde data as one input) and free of inhomogeneities (e.g., because of changes in underlying satellite data). To reduce this problem, the Radiosonde Innovation Composite Homogenization (RICH) database was generated later (Haimberger et al. 2012a,b). It is based on the breakpoint detection method of RAOBCORE, but it also considers neighbor station time series as a reference. The RAOBCORE and RICH datasets provide temperature time series from 1958 on. Their automatic homogenization allows evaluation of a large number of stations.
In recognition of the problems generated by radiosonde biases and lacking metadata in the historic records, the Global Climate Observing System (GCOS) Reference Upper-Air Network (GRUAN; www.gruan.org) was founded almost 10 years ago. GRUAN aims to provide RS measurements of very high quality with very well characterized traceable systems and procedures that are suitable for climate research (Seidel et al. 2009). GRUAN is currently formed by about 15 stations worldwide. Key aspects of GRUAN are traceable calibrations, well-estimated measurement errors (Dirksen et al. 2014), and ensured continuity in the future.
The main focus of this study, however, is on historical German radiosonde data. Germany is interesting because unification of the Federal Republic of Germany (FRG) and the former German Democratic Republic (GDR) in 1990 brought together two quite separate networks of RS systems. These had been used from around 1950 to after 1990 in a dense network and over a small geographic region. Since about 1992, the German RS network has been largely homogeneous, first using Vaisala RS80 sondes, then, since 2003–06, Vaisala RS92 sondes (Steinbrecht et al. 2008). To our knowledge, a homogenized RS dataset for Germany has not been published. Because of the limited number of German stations, our aim was to keep the whole homogenization process under manual control, in contrast to the automatic global homogenization methods mentioned above.
Our paper is organized as follows: In the next section, a short description of the data and RS systems is provided. Sections 3 and 4 describe our methods for breakpoint detection, determination of adjustments, and homogenization. Section 5 presents trend analyses and other results for the original and the homogenized datasets.
Upper-air RS data for this study were obtained from the database of the German Weather Service [Deutscher Wetterdienst (DWD)]. Missing data were supplemented from the Integrated Global Radiosonde Archive (IGRA) (Durre et al. 2006, 2008; NOAA/National Centers for Environmental Information 2006). Before supplementing, IGRA and DWD data were checked for mutual consistency in overlapping periods. In the end, 13 German sounding stations were chosen based on launch frequency (usually two or more per day) and on record length (at least 30 yr). Table 1 shows details about the stations, and Fig. 1 shows their geographic locations. Two of the stations, Munich and Emden, were relocated during the reporting period. However, launch positions changed by less than 40 km, and these moves had no discernible effect on the data. Nearly all stations had daily 0000 and 1200 UTC launches, corresponding to midnight and midday flight conditions in Germany, respectively. Dresden had no nighttime soundings between 1972 and 1991. Adam et al. (2005) give a description of the (unhomogenized) Lindenberg time series starting as early as 1905. Hohenpeissenberg, however, had only 1–3 launches per week, made with ozone sondes coupled to higher-quality radiosondes.
Our homogenization of the normal German RS data is based on monthly means of temperature, relative humidity, and geopotential height records for 11 standard pressure levels: 850, 700, 500, 400, 300, 250, 200, 150, 100, 70, and 50 hPa. Data at 30 and 10 hPa are very sparse before ~1990, and were not homogenized. A monthly mean was computed only if there were at least eight measurements at that station for the given month and pressure level. Fewer soundings were required for Hohenpeissenberg. Rare outliers (i.e., data points more than ±3σ away from the monthly mean) were removed. Monthly means and monthly medians gave virtually the same results. The requirement of at least eight soundings per month, reaching at least the 100-hPa level, means that usable time series for the German stations start at the times given in Table 1. Some stations have earlier soundings, but these data are sparse or were not available. Since there are also no reliable reference time series before 1950, we did not attempt to homogenize data before 1950. Our focus is, therefore, on homogenizing the German temperature time series starting at the years given in Table 1. Geopotential height and relative humidity data were not homogenized, although they did provide very useful background information for the detection of breakpoints.
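The monthly-mean rule described above can be sketched as follows. This is only an illustration under stated assumptions, not the processing code used for the study: the function name `monthly_mean` is hypothetical, σ is taken as the sample standard deviation of the soundings in the given month, and the minimum-count check is repeated after outlier removal.

```python
from statistics import mean, stdev

MIN_SOUNDINGS = 8    # minimum soundings per month and level (from the text)
OUTLIER_SIGMA = 3.0  # outlier threshold in standard deviations (from the text)

def monthly_mean(values, min_count=MIN_SOUNDINGS, n_sigma=OUTLIER_SIGMA):
    """Monthly mean for one station, level, and month; None if too sparse."""
    if len(values) < min_count:
        return None
    centre = mean(values)
    spread = stdev(values)  # assumption: sigma is estimated from this month only
    # Keep soundings within n_sigma standard deviations of the raw monthly mean.
    kept = [v for v in values if abs(v - centre) <= n_sigma * spread]
    # Assumption: the count threshold is checked again after outlier removal.
    if len(kept) < min_count:
        return None
    return mean(kept)
```

Note that with a sample standard deviation, a single value can lie more than 3σ from the mean only if the month contains at least 11 soundings, so the outlier filter has no effect on months close to the eight-sounding minimum.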
Figure 2 gives an overview of the radiosonde types flown at German stations since 1950. Basic information about the different RS types is given in Table 2, compiled from various sources in the German Weather Service (e.g., Müller 1951), from Gaffen (1993), and from Durre et al. (2006). In an ideal world, there would be traceable information about which radiosonde was used at which time for each station; about data processing and corrections applied; and about the precise differences between the different RS systems and applied corrections. Unfortunately, substantial parts of this information are missing or were not accessible for the German RS stations. For example, no information was found on the precise time for the radiosonde change from Graw Hamburg 1948 (H48) to H50 in West Germany. Therefore, we estimated this time from the monthly mean time series.
In the 1950s and 1960s, the German Freiberg sonde was used in the east (GDR). In the west (FRG), H48, H50, and M60 sondes manufactured by Dr. Graw Messgeräte were flown. All these sondes used bimetal strips as temperature sensors and aneroid capsules for pressure sensors. Radiation corrections were not applied to the early Freiberg data. For the Graw sondes, radiation, temperature, and pressure lag corrections were applied. These were revised and changed several times, causing shorter homogeneous intervals for these data. The data record at Stuttgart, the West German GCOS Upper-Air Network station (Daan 2002), appears to have been homogenized at some point. Unfortunately, we were not able to find information about this apparent homogenization of the Stuttgart RS time series. In the early years, all data were processed by hand. By the mid-1980s, Graw M60 sondes started to use semiautomatic data recording and processing. This has also resulted in traceable changes in the records of some stations (e.g., Hannover).
Different from all the other stations, the data at Hohenpeissenberg are from ozone soundings only. These soundings were much less frequent, only 1 sounding per week until 1977, 2–3 soundings per week since 1978, usually at 0500–0600 UTC. However, higher-quality radiosondes than the operational Graw M60 sondes in the west were used at Hohenpeissenberg: from 1967 until 1994, mostly VIZ 1073 and 1393 (and a few AMT 12) 1680 MHz sondes with hypsometer (for accurate pressure measurements in the stratosphere), with the exception of 1972–78, when largely VIZ 1072, 1192, and 1292 (and AMT 4) sondes without hypsometer were flown. Greater care in quality assurance was also taken, and the Hohenpeissenberg data have been homogenized, especially for the 1994 switchover from VIZ 1393 to Vaisala RS80. Compared to the other stations with more substantial RS changes, the Hohenpeissenberg record can be considered fairly homogeneous.
In East Germany, Russian RKZ sondes were introduced in the 1970s to replace the Freiberg sonde. These RKZ sondes had improved temperature sensors (thermistor rods), comparable to sensors of the U.S.-made VIZ (and AMT) sondes. The RKZ sondes had no pressure sensor and used radar tracking for height and pressure determination. Automatic receivers and data processing were introduced in the late 1980s in the east, with the Russian-made MRZ (and MARS) RS systems. These were also the first lightweight sondes (300 g) in Germany, compared to the fairly heavy VIZ (700–1000 g) and Graw M60 (1200 g) sondes.
After reunification of East and West Germany in 1990, the entire German network switched to Vaisala RS80 sondes. These still used aneroid capsule pressure sensors but had more accurate and faster capacitive sensors for temperature and humidity (Thermocap and Humicap). From 2003 to 2005, the German network switched again, to the more accurate Vaisala RS92 system with further improved temperature and humidity sensors and more precise miniature solid-state pressure sensors, as well as onboard GPS (Steinbrecht et al. 2008). Vaisala RS92 sondes are also used in GRUAN (Dirksen et al. 2014). All these RS types may have changed slightly over time. However, changes within one RS type usually had minor effects only (except for repeatedly changed radiation corrections for Graw M60 and Freiberg sondes), compared to the large changes seen for changes of the RS type. For our homogenization here, such minor changes are largely ignored.
3. Determination of breakpoints
Before attempting to correct RS time series, it is necessary to identify the breakpoints (i.e., the times at which artificial changes in the upper-air time series occurred). Breakpoints usually arise from transition from one RS model to another, changes in radiation (or other) corrections, or changes in data reduction and reporting. Instrument-related breakpoints should not be confused with changes due to real atmospheric variations [e.g., due to major volcanic eruptions, El Niño or La Niña events, or the quasi-biennial oscillation (QBO)]. For breakpoint determination, we used manual inspection of time series of temperature, humidity, and geopotential height at several pressure levels (usually with the annual cycle removed), as well as metadata information on major changes. Because most of the instrumental changes did not occur at the same time at all stations, there are usually unchanged reference periods available from other stations. Periods without a change from West Germany, for example, can serve as a reference for East Germany, and vice versa.
As described in more detail below, breakpoint candidates were those times where the indicator time series show a clearly visible step-like change or where metadata indicate a change. A clearly visible step means that step size must exceed the standard uncertainty of the difference in average levels before and after the breakpoint candidate. Before accepting a breakpoint candidate, all pressure levels were investigated (see also Thorne et al. 2005), and metadata were considered as well. Final breakpoints show clearly visible step-like changes at several pressure levels and then apply to all pressure levels.
This determination of breakpoints (and associated step-like changes) was carried out manually for all German stations. It was aided greatly by a specifically developed graphical user interface (GUI) software tool. This GUI tool allows loading and visualizing various time series at different pressure levels for a station to be homogenized and up to three reference time series (e.g., at other stations). Time series and time series differences, as well as average levels can be displayed. The GUI tool then allows us to manipulate breakpoint candidates and their associated step-like changes. Resulting breakpoints and step-like changes are recorded.
Detailed information about instrument changes and their dates came from Gaffen (1993), from the IGRA database (NOAA/National Centers for Environmental Information 2006), and from DWD internal sources (e.g., Müller 1951; W. Adam 2013, personal communication). It is important to keep in mind that metadata may miss some relevant information and may contain information that is wrong (e.g., wrong date or wrong station) or not relevant (e.g., changes that did not affect the data). Therefore, information from the metadata was compared and contrasted with the observed time series. In cases where metadata indicate changes that do not appear in the data, usually no breakpoint was assigned. On the other hand, quite often the time series indicate clear changes, but there is no corresponding information in the metadata. If the changes in the time series were clear at several levels, a breakpoint was still assigned in such cases.
In the end, a surprisingly small percentage of the determined breakpoints, only 34%, was already correct in the metadata. For 14% of the breakpoints, the metadata indicated something, but the date was unclear or was wrong. For a very large fraction of the final breakpoints, 52%, no entries were found in the metadata. These large percentages show that metadata should not be taken at face value and need to be checked carefully against the observed time series.
b. Day minus night difference time series
On average, consecutive daytime and nighttime temperatures should see the same overall atmospheric variability. Time series of daytime minus nighttime temperature should, therefore, be more or less constant. Clear step-like changes in day minus night temperature indicate systematic RS changes in most cases (Lanzante et al. 2003). Examples are given in Fig. 3, for a West and an East German station. Noticeable are, for example, the step-like changes of 50-hPa temperatures recorded at Schleswig, caused by repeatedly changed radiation corrections during the Graw M60 period. At Lindenberg, very clear steps in day minus night temperature are visible for the RS changes from Freiberg to RKZ and from MRZ to RS80. Steps in the day minus night temperature difference already give a fairly clear picture of RS changes.
Figure 3 also outlines which step-like changes are significant and which are not. The figure shows several cases where the data level clearly undergoes a step-like change, and this step is larger than the standard deviation of data before and after the step. Such steps are obvious and are statistically significant. In other examples, the change in data level is less obvious and only appears in the blue average lines (e.g., at 100 hPa for the transition from Vaisala RS80 to RS92 in ~2004). Here, significance requires that the step size exceed the uncertainty (standard error of the mean) of the difference in average levels before and after the breakpoint candidate. In our manual homogenization procedure, this objective significance criterion (1σ) was usually applied. For clear breakpoints with significant steps in the stratosphere, we normally also corrected small nonsignificant steps in the troposphere. Very small corrections of less than 0.03 K, however, were usually not applied.
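The 1σ criterion can be illustrated with a short sketch. It assumes that the monthly values on each side of the breakpoint candidate are independent, so that the standard errors of the two means add in quadrature. The function names are hypothetical, and the actual decisions in this study were made manually with the GUI tool.

```python
from math import sqrt
from statistics import mean, stdev

def step_and_uncertainty(before, after):
    """Step size (after minus before) and standard error of that difference."""
    step = mean(after) - mean(before)
    # Standard error of each mean, assuming independent monthly values.
    se_before = stdev(before) / sqrt(len(before))
    se_after = stdev(after) / sqrt(len(after))
    return step, sqrt(se_before ** 2 + se_after ** 2)

def is_significant(before, after, n_sigma=1.0):
    """True if the step exceeds n_sigma times its standard error."""
    step, se = step_and_uncertainty(before, after)
    return abs(step) > n_sigma * se
```

Serial correlation in monthly anomalies would make the true uncertainty larger than this estimate, so the sketch is best read as a lower bound on the required step size.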
c. Relative humidity time series
Humidity sensors have also changed significantly between sonde types (see Table 2), in some cases also over time for one sonde type (e.g., Vaisala RS80). Modern humidity sensors (e.g., on the Vaisala RS92) are faster and much more accurate than sensors were 50 years ago (Miloshevich et al. 2004; Nash et al. 2006), especially in cold environments. Therefore, as expected, humidity sensor and RS type changes are also apparent in humidity records (Häberli 2006; Dai et al. 2011). In Fig. 4, for example, clear steps appear in the monthly mean humidity record, when the West German network changed from Graw M60 to Graw RSG and, later, to Vaisala RS80.
Not only the means, but also the standard deviations of recorded humidity show such clear changes. An example is given in Fig. 5 for an East German station. There, the standard deviation of humidity shows large step-like changes: for example, around 1982 (blue line) for the change from RKZ-2 to RKZ-5 (which includes changes in data reporting), around 1992 for the transition from MRZ to Vaisala RS80, and in 2005 for the change from Vaisala RS80 to RS92. This latter step is not very visible in the monthly mean humidity (Fig. 4) but is quite visible in the standard deviations in Fig. 5.
Figures 4 and 5 demonstrate that humidity data, both means and standard deviations, provide very useful information about RS changes, especially after 1980. However, it must be stressed that we do not attempt to homogenize the humidity time series. Because of the large changes and the much more complex response of humidity sensors to the true humidity profile, homogenization of RS humidity time series is much more difficult than the homogenization of temperature data (Adam et al. 2005; Häberli 2006; Dai et al. 2011). It is not attempted here.
4. Eliminating inhomogeneities
After determining the breakpoints from metadata and looking at all pressure levels, the next step is to determine the temperature bias during the homogeneous periods between two breakpoints. Like previous RS homogenizations (Lanzante et al. 2003; Haimberger et al. 2012a,b), we assume that biases due to the use of a specific RS type or processing can be corrected for by adding appropriate offsets to the temperature data for that RS type and processing at each station. These offsets are determined for individual stations, for all pressure levels, and separately for day- and nighttime.
Formally, the observed time series $X_i(t)$ at stations $i$ in a small region (like central Europe or Germany) can be written as the sum of a large-scale geophysical signal $\mu(t)$, a climatological station-specific offset $E_i$, and a station-specific time-dependent inhomogeneity offset $\mathit{IH}_i(t)$ (Szentimrey 2008). The offset $E_i$ is the difference between the large-scale climatological annual cycle and the climatological annual cycle at the specific station $i$. The remaining variations $\varepsilon_i(t)$ are small and essentially noise:
$$X_i(t) = \mu(t) + E_i + \mathit{IH}_i(t) + \varepsilon_i(t). \tag{1}$$
The inhomogeneity time series $\mathit{IH}_i(t)$ are assumed to be step functions. These alter their value at breakpoints $t_{i,l}$, when the $l$th change of RS type or data processing occurred at station $i$.
Taking the difference between two nearby stations $i$ and $j$ removes the large-scale signal $\mu(t)$ and leaves only the constant climatological difference $E_i - E_j$, the time-varying step-like inhomogeneity difference $\mathit{IH}_i(t) - \mathit{IH}_j(t)$, and noise $\varepsilon_i(t) - \varepsilon_j(t)$:
$$X_i(t) - X_j(t) \approx (E_i - E_j) + [\mathit{IH}_i(t) - \mathit{IH}_j(t)] + [\varepsilon_i(t) - \varepsilon_j(t)]. \tag{2}$$
For stations far apart (e.g., more than 500 km), larger differences may occur in the geophysical signal (e.g., after a large volcanic eruption like Mt. Pinatubo in 1991). In these cases, the approximation in Eq. (2) can become problematic.
To determine the step-like change in $\mathit{IH}_i(t)$ at the $l$th breakpoint $t_{i,l}$, it is necessary to find a nearby homogeneous station $j$, where no breakpoint occurs in $\mathit{IH}_j(t)$ for a year or two before and after $t_{i,l}$. Then $\mathit{IH}_j(t)$ is constant around $t_{i,l}$, and the step-like change in $\mathit{IH}_i(t)$ can be determined by averaging $X_i(t) - X_j(t)$ over sufficiently long periods, typically from one to two years, before and after the breakpoint $t_{i,l}$. This removes the noise $\varepsilon$ and isolates the step-like change in $\mathit{IH}_i(t)$ at breakpoint $t_{i,l}$ [because $E_i - E_j$ and $\mathit{IH}_j(t)$ are constant around the breakpoint].
Collection of all step-like changes in $\mathit{IH}_i(t)$ determined in this fashion at breakpoints $t_{i,l}$ then yields the time-dependent inhomogeneity offset $\mathit{IH}_i(t)$ for station $i$. For all stations, $\mathit{IH}_i(t)$ was set to zero for the period after ~2005, when accurate Vaisala RS92 sondes were used throughout Germany.
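The step estimate at a single breakpoint can be sketched as below. This is a minimal illustration, not the GUI tool used for this study. It assumes gap-free monthly series of equal length, a reference station that is homogeneous within the averaging window, and a hypothetical function name and default window of 24 months.

```python
from statistics import mean

def step_at_breakpoint(candidate, reference, bp_index, window=24):
    """Estimate the step in the candidate series at a breakpoint.

    candidate, reference: equally long lists of monthly temperatures (K)
    bp_index: index of the first month after the instrument change
    window: months averaged on each side (one to two years in the text)
    """
    # The station-minus-reference difference removes the common large-scale signal.
    diff = [c - r for c, r in zip(candidate, reference)]
    before = diff[max(0, bp_index - window):bp_index]
    after = diff[bp_index:bp_index + window]
    # The constant climatological station difference cancels in the subtraction
    # of the two averages, and averaging suppresses the noise.
    return mean(after) - mean(before)
```

In practice, several reference series would be examined for each breakpoint, and months with missing data would have to be skipped before averaging.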
a. Determination of daytime minus nighttime temperature offsets
A specific case for Eq. (2) is daytime data using nighttime data from the same station as a reference. As mentioned, the geophysical difference between day- and nighttime data is very close to zero, since both record almost the same geophysical variations. In Fig. 3, daytime minus nighttime temperatures (DN) usually show clear step-like changes, when RS type or processing change.
The first step in our homogenization procedure, therefore, was to determine the steps in the DN temperature difference time series $X_i^{D}(t) - X_i^{N}(t)$, where the superscripts $D$ and $N$ denote daytime and nighttime data at station $i$. These DN differences can already serve as a good initial guess for the daytime temperature bias of an RS type because nighttime temperature biases are small for most RS types (Nash and Schmidlin 1987; Luers and Eskridge 1998). Using the GUI tool, the step-like changes of day minus night temperature differences were determined (see Fig. 3) for each station, all pressure levels, and RS types. These steps in the DN inhomogeneity difference $\mathit{IH}_i^{D}(t) - \mathit{IH}_i^{N}(t)$ were then collected, starting with the very small DN differences for the Vaisala RS92 sonde since ~2005 and going back in time.
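Collecting the steps backward in time amounts to building a step function that is zero in the most recent period and changes by each estimated step when a breakpoint is crossed. The sketch below shows this bookkeeping under the assumptions that time is indexed by month and that each step is defined as the level after the change minus the level before it. The function names are hypothetical.

```python
def offsets_from_steps(breakpoints, steps, n_months):
    """Step-function offset series that is zero after the last breakpoint.

    breakpoints: month indices of the first month after each change
    steps: change in level at each breakpoint (after minus before), in K
    """
    offset = [0.0] * n_months
    level = 0.0      # the most recent period is the zero reference
    end = n_months
    # Walk from the most recent breakpoint back in time.
    for bp, step in sorted(zip(breakpoints, steps), reverse=True):
        offset[bp:end] = [level] * (end - bp)
        level -= step  # level before the change = level after minus step
        end = bp
    offset[0:end] = [level] * end
    return offset

def homogenize(series, offset):
    """Subtract the offset series from the original data."""
    return [x - o for x, o in zip(series, offset)]
```

For example, breakpoints at months 3 and 6 with steps of 0.5 and −0.8 K give offsets of about 0.3 K for months 0–2, 0.8 K for months 3–5, and 0 K afterward.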
b. Determination of temperature offsets using reference data
The next step in the homogenization process was the determination of separate offset time series $\mathit{IH}_i^{D}(t)$ and $\mathit{IH}_i^{N}(t)$ for day- and nighttime temperatures. These are affected not only by changes in temperature sensor and radiation correction but, in addition, also by changes in pressure sensors. The day minus night temperature differences from the previous section already give a good initial guess for the daytime offset time series $\mathit{IH}_i^{D}(t)$.
In this step, however, the day- and nighttime inhomogeneity corrections are refined, using several reference time series (e.g., unchanged East German stations for changes in West Germany, and vice versa). Stations in neighboring countries were used as well (see Fig. 1). After 1979, satellite data from the MSU served as an additional reference, keeping in mind the coarse altitude resolution of the MSU temperature data (Mears and Wentz 2009a,b). Note that the reference time series do not have to be homogeneous over the entire period. Homogeneity is only required for a few years before and after the breakpoint under investigation.
An example outlining the current homogenization step is shown in Fig. 6. The top panel compares the daytime temperature anomalies at station Lindenberg (black, to be homogenized) with nighttime temperature anomalies at a reference station, in this example Stuttgart (blue line). The main requirement for reference stations—usually at least four are used in the current step—is that their time series are homogeneous for at least one or two years before and after the breakpoint under investigation.
The middle panel shows the difference $X_i(t) - X_j(t)$ [see Eq. (2)] between the two time series (red line). The main task is to determine the step-like changes in this time series at the breakpoints $t_{i,l}$, when RS changes occurred at Lindenberg (vertical black lines). Determination of these steps is made using our GUI tool by comparing the average difference during the homogeneous periods before and after a breakpoint and by manually adjusting the step-like change at $t_{i,l}$. Note that the important point is determination of the step-like changes associated with RS changes in Lindenberg, not the overall difference between the Lindenberg and the reference time series. In the middle panel of Fig. 6, these average differences are already very similar to the ultimately determined offset time series $\mathit{IH}_i^{D}(t)$ (green line), because the Stuttgart data are already very homogeneous [$\mathit{IH}_j(t)$ is almost constant].
In many cases, however, individual reference time series are less homogeneous, and only a few step-like changes can be determined. Usually, several different reference time series have to be used, typically four or more. Homogenizing all stations at all levels then becomes quite a long and tedious process, even with the help of our GUI tool. The green offset line in the middle panel of Fig. 6 shows the ultimately resulting overall daytime offset for the Lindenberg time series.
The bottom panel of Fig. 6 compares the Lindenberg time series before (black line) and after homogenization [green line, offset $\mathit{IH}_i^{D}(t)$ applied]. The Stuttgart time series is shown as well (blue line). Homogenization has clearly corrected some suspicious features in the Lindenberg data. Both station time series now describe very similar temperature variations, as expected for stations in a limited region like Germany. For much of the time, however, the difference between the unhomogenized and homogenized data is small, less than 0.2 or 0.3 K.
The separate offset time series $\mathit{IH}_i^{D}(t)$ and $\mathit{IH}_i^{N}(t)$ for day- and nighttime temperatures were determined for each station at every breakpoint and for the whole vertical profile. At the 50-hPa level before ~1960, adjustments for a few early breakpoints were not possible because of sparse or lacking data.
c. Final temperature adjustments
If the step-like changes estimated individually at each station using the GUI tool for a switchover from RS type A to RS type B are quite similar at multiple stations, they can be averaged. This can result in an improved estimate, and the standard deviation between stations can be used to estimate uncertainty. Table 3 summarizes these average day- and nighttime step-like adjustments (to be subtracted) for the major RS types used in Germany. Adjustments are referenced to the Vaisala RS92 period, where no adjustment is applied. Uncertainty estimates are 2 times the standard deviation of the mean. It should be noted, however, that the average adjustments from Table 3 are often not applicable to individual stations, which require station-specific adjustments. This is especially true for Graw M60 sondes, for which breakpoints and required adjustments vary substantially between stations and over time. This is indicated by the differing homogeneous periods for Graw M60 stations between 1962 and 1989 in Fig. 2 and exemplified by the many steps during the Graw M60 period at Schleswig and Hannover in Figs. 3 and 7.
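The averaging of station-wise steps and the "2 times the standard deviation of the mean" uncertainty can be written down as follows. This is a minimal sketch: the function name `average_adjustment` and the four step values are hypothetical and are not taken from Table 3.

```python
from math import sqrt
from statistics import mean, stdev


def average_adjustment(steps):
    """Average step estimates from several stations.

    Returns (mean step, 2 x standard deviation of the mean), both in K.
    """
    n = len(steps)
    if n < 2:
        raise ValueError("need at least two stations")
    sem = stdev(steps) / sqrt(n)  # sample standard deviation / sqrt(n)
    return mean(steps), 2.0 * sem


# Hypothetical steps (K) for the same RS type change at four stations.
steps = [0.4, 0.5, 0.3, 0.4]
avg, unc = average_adjustment(steps)
print(f"adjustment: {avg:.2f} +/- {unc:.2f} K")
```

With only a handful of stations per RS type change, the uncertainty from this formula is itself poorly constrained, which fits the remark that station-specific adjustments are often needed.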
Generally, the largest adjustments, greater than 1 K, were necessary for the upper levels of daytime soundings by Freiberg sondes in the 1950s and 1960s. At lower levels, and after the 1960s, radiosondes were more accurate, and inhomogeneity adjustments are usually smaller than ±0.5 K. Uncertainties for the adjustments are typically between 0.05 and 0.5 K, and are quite often substantial. Normally, adjustments were applied at all pressure levels, unless adjustments were very small (<0.03 K) and statistically insignificant. Usually this happens at the lower levels in the troposphere. Most notably, we decided not to correct for the small differences between Vaisala RS80 and RS92 sondes at levels below 300 hPa (see also Steinbrecht et al. 2008).
Figure 7 shows the station-specific adjustments for an East and a West German station, and for five pressure levels. As mentioned, adjustments generally become larger with decreasing pressure (i.e., at higher levels) and are usually also larger in early years (cf. Table 3). The adjustments given in Fig. 7 have to be subtracted from the original data. Usually, daytime data from the early years have to be corrected for warm bias (e.g., the Graw H50 and M60 data before 1985 at Hannover in the right panel of Fig. 7). In some cases, however, the original data also appear to have a cold bias. Examples can be seen in the left panel of Fig. 7 for the RKZ-1 data between 1972 and 1985 at Meiningen, or in the right panel of Fig. 7, for Graw M60 data at 50 and 100 hPa between 1964 and 1970 in Hannover. Note that the error bars in Fig. 7 only show the estimated uncertainty for the individual RS type adjustments (cf. Table 3). The error bars are not propagated backward in time. This would, of course, increase the error bars in the early years, especially if many adjustments/changes have to be applied at a station.
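To illustrate how error bars would grow if they were propagated backward in time, the sketch below accumulates step uncertainties in quadrature from the most recent breakpoint to the earliest. This is an assumption made for illustration (independent step errors); it is not a procedure used in the paper, and the function name and values are hypothetical.

```python
from math import sqrt


def propagate_backward(step_uncertainties):
    """Accumulate step uncertainties backward in time.

    step_uncertainties -- one value (K) per breakpoint, oldest first
    Returns, for the period before each breakpoint, the combined uncertainty
    of all later steps, assuming independent errors (sum in quadrature).
    """
    total_sq = 0.0
    combined = []
    for u in reversed(step_uncertainties):
        total_sq += u * u
        combined.append(sqrt(total_sq))
    combined.reverse()  # back to chronological order
    return combined


# Hypothetical station with two breakpoints (oldest first).
print(propagate_backward([0.4, 0.3]))
```

The earliest period carries the uncertainty of every later step, so stations with many RS changes would end up with the widest error bars in the early years.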
d. Link to WMO radiosonde intercomparison
The Graw M60, MRZ, and MARS RS types used in Germany also participated in phases II and III of the WMO radiosonde intercomparison (Nash and Schmidlin 1987; Ivanov et al. 1991). Figure 8 compares the daytime adjustment, determined by our homogenization, to the bias seen in twin flights during these WMO comparisons. The top panel shows quite good agreement between our average adjustment for German stations using Graw M60 around 1983 and results from the phase II WMO radiosonde intercomparison [Figs. 6.2 and 11.1d of Nash and Schmidlin (1987)], which took place in early 1985. Note that after this intercomparison, several changes were implemented in the processing of Graw M60 sondes in the German network, resulting in different adjustments for Graw M60 sondes in later years in our homogenization.
In the bottom panel of Fig. 8, reasonable agreement is found near 700 hPa and above 150 hPa between our daytime adjustment for MRZ sondes, used in East Germany from 1985 to 1992, and the MRZ bias against Vaisala RS80 in phase III of the WMO radiosonde intercomparison [Fig. 6.6 of Ivanov et al. (1991)]. When the ~0.2-K cold bias of RS80 sondes near 70 and 50 hPa is considered (see Table 3), actual agreement would become better at these levels. Near 200 hPa, however, differences between the WMO intercomparison and our results are substantial. Uncertainty bars are quite large too. There is no easy explanation for these larger differences around 200 hPa. However, it must be kept in mind that singular snapshots under specific conditions, such as WMO intercomparisons, do not necessarily capture the behavior under day-in day-out use in an operational network.
a. Time series
For a typical East and West German station, Fig. 9 shows temperature anomaly time series before and after homogenization. The original, unhomogenized temperature anomalies (black lines) are usually slightly higher than the homogenized data (green lines). In some cases [e.g., around 1981 in Schleswig at 300 hPa (see also negative adjustments in Fig. 7)] the unhomogenized temperatures can be lower. Note that for the Freiberg sondes in Lindenberg before 1970, the negative adjustment for the daytime data (cf. Fig. 3) is largely compensated by a positive adjustment for the nighttime data, resulting in little change for the day plus night anomalies shown in Fig. 9.
For comparison, data from the RS-based datasets RICH (one grid point over Germany; blue lines) and HadAT2 (the same two stations; red lines) are also plotted. In addition, satellite data from the MSU lower stratosphere channel (TLS; mean over German grid points; magenta lines) are plotted in the 100-hPa panels after 1979. Differences between all datasets are quite small, for the most part less than 0.2 K. Our homogenized temperature anomalies are very similar to the RICH, HadAT2, and MSU time series. Correlation coefficients with RICH monthly mean anomalies are around 0.98 in the troposphere, around 0.93 near the tropopause, and around 0.96 in the stratosphere, slightly larger than for the unhomogenized data. Correlations with the HadAT2 data are slightly lower.
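Correlation coefficients between two monthly anomaly series, such as those quoted above, can be computed as in the following sketch. It assumes a plain Pearson correlation over months where both series have data; the function name and the sample values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two series, skipping months with missing data."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two common data points")
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)  # fails if either series is constant


# Hypothetical anomalies (K); the second series is twice the first.
ours = [0.1, -0.2, 0.3, None, 0.0]
other = [0.2, -0.4, 0.6, 0.1, 0.0]
r = pearson(ours, other)
```

A correlation near 1 only shows that the month-to-month variations agree; constant offsets or scale differences between datasets, as in this example, do not lower it.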
Spikes in HadAT2 data around 1986 in Lindenberg and around 1974 in Schleswig are not seen in our homogenized data or the RICH data. After about 1995, HadAT2 sometimes reports suspiciously high temperatures: for example, by around 0.5 K in Schleswig from 1994 to 2004 at 300 hPa, or by up to 1 K in Lindenberg after 2000 (300 and 500 hPa). Since the Vaisala RS80 and RS92 radiosondes flown at these times have provided quite homogeneous data at levels below 300 hPa and had only small errors, these large HadAT2 differences are suspicious. They might be artifacts of the automatic homogenization procedure used for HadAT2.
b. Trend analysis
Figure 10 shows the long-term temperature trends obtained by linear fits to the homogenized RS anomaly time series. Error bars are obtained from the fit residuals. Generally, all German stations show a warming trend in the troposphere, about 0.25 K decade−1. In the stratosphere, the data give a cooling trend, which increases with altitude from no trend near the tropopause (300–230 hPa) to ~−0.5 K decade−1 near 50 hPa. The spread between trends at the different stations is about ±0.1 K decade−1 in the troposphere, increasing to ±0.25 K decade−1 in the stratosphere. The time periods also differ slightly between stations (cf. Table 1 and Fig. 2). Similar trend patterns are seen for the original, unhomogenized data (not shown). While changing the magnitudes, our homogenization has not resulted in fundamental changes for the long-term trends.
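A linear trend with an error bar derived from the fit residuals can be sketched as an ordinary least-squares fit. The text does not specify the exact error estimate, so the standard error used here (which ignores autocorrelation of the residuals) is an assumption, and the synthetic anomaly series is hypothetical.

```python
from math import sqrt


def linear_trend(t, y):
    """Ordinary least-squares slope and its standard error from the residuals."""
    n = len(t)
    if n < 3:
        raise ValueError("need at least three points")
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    stt = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / stt
    intercept = y_mean - slope * t_mean
    rss = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    slope_se = sqrt(rss / (n - 2) / stt)  # assumes uncorrelated residuals
    return slope, slope_se


# Synthetic annual anomalies: 0.02 K per year plus alternating +/-0.1 K noise.
years = [1950 + i for i in range(64)]
anoms = [0.02 * i + (0.1 if i % 2 == 0 else -0.1) for i in range(64)]
slope, slope_se = linear_trend(years, anoms)
print(f"trend: {10 * slope:.2f} +/- {2 * 10 * slope_se:.2f} K per decade (2 sigma)")
```

For real monthly anomalies, serial correlation would normally widen the error bar relative to this simple estimate.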
Somewhat to our surprise, the spread of trends for the homogenized data is similar to the spread of trends for the original unhomogenized data (not shown). At most stations, the trend error bars are also only slightly smaller for the homogenized data (not shown). This indicates that trend accuracy is limited by geophysical and other noise and by true differences between stations, including the different available time periods.
Figure 11 does show, however, that trends from the homogenized data are more positive than trends from the original data. The difference is about 0.1 K decade−1 both in troposphere and stratosphere. In the stratosphere, originally uncorrected radiation errors for the daytime data usually mean that temperatures become lower in the early years in the homogenized data, and the trend becomes less negative. This is because of the larger adjustments required in the early years. A 1-K temperature reduction [e.g., for daytime data (cf. Table 3)] should translate into a 0.5-K reduction for the day plus night average in the early years and into a roughly +0.05 to +0.1 K decade−1 increase for a 50-yr temperature trend.
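The size of this trend change can be checked with a small calculation. Because a least-squares fit is linear in the data, the change in trend equals the slope fitted to the applied change alone. The sketch below assumes a 50-yr record in which the day plus night average is lowered by 0.5 K during the first 10 years; the 10-yr duration is a hypothetical choice, not a value from the paper.

```python
def ols_slope(t, y):
    """Ordinary least-squares slope of y against t."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    num = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    den = sum((ti - t_mean) ** 2 for ti in t)
    return num / den


years = list(range(50))   # 50-yr record, one value per year
early_years = 10          # hypothetical length of the adjusted early period
# Change applied to the data: -0.5 K in the early years, none afterwards.
change = [-0.5 if yr < early_years else 0.0 for yr in years]

trend_change_per_decade = 10 * ols_slope(years, change)
print(f"trend change: {trend_change_per_decade:+.3f} K per decade")
```

Under these assumptions the trend change is close to +0.1 K decade−1; shortening the adjusted period to 5 years gives roughly +0.05 K decade−1, which brackets the range quoted in the text.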
In Fig. 11, two stations show trend differences outside of the normal range. The larger trend difference for Dresden is due to the lack of nighttime launches there from 1972 to 1991 (see section 1). The homogenization correctly accounts for that, and the homogenized temperature trend at Dresden falls well within the range of the other stations in Fig. 10. At Meiningen, between 300 and 100 hPa, both the trend in Fig. 10 and the trend difference in Fig. 11 fall outside of the range of other stations. This may indicate that homogenization was not so successful for Meiningen at those levels. Meiningen is the only station using RKZ-1A radiosondes, and some of the early data for that type of sonde are sparse.
Comparison of the average temperature trend above Germany from our homogenized data to other sources is illustrated in Fig. 12. Note that the time intervals for trend calculation differ a bit between datasets. All sources report similar trends in the troposphere. Clearly, the troposphere has been warming significantly over Germany by about +0.2 K decade−1. These warming trends are consistent with global temperature trends for the last 50–60 yr, as assessed by Hartmann et al. (2013): about +0.2 K decade−1 for global land temperatures and from +0.1 to +0.3 K decade−1 for temperatures in the free troposphere. Uncertainties are similar for the different datasets in Fig. 12: ~±0.05 K decade−1 (2σ). Uncertainty bars of tropospheric trends from the various sources overlap. Our homogenization produces an ~0.1-K decade−1 larger trend in the troposphere (cf. Fig. 11), bringing the German trend into better agreement with the other datasets. As expected for data from a single station (see Fig. 10), uncertainties are larger for trends at Payerne station (WMO number 06610, 46.49°N, 6.57°E; Brocard et al. 2013), and the 700-hPa trend at Payerne differs more substantially from the other results.
In the stratosphere, all datasets show negative temperature trends. This cooling increases with height, from no cooling near the tropopause to between −0.4 and −0.6 K decade−1 near 50 hPa. Generally, differences between the datasets are larger in the stratosphere. Sometimes they exceed the uncertainty margins. The RICH dataset gives the largest stratospheric cooling trend, as expected, for example, from Fig. 9. The underlying reason for the larger RICH trend is, however, not clear. Between the tropopause and 100 hPa, our homogenized trend for the 1958–2011 period is in good agreement with HadAT2. At these altitudes the trends for the 1950–2013 and the 1958–2011 periods differ by up to 0.1 K decade−1. Near 50 hPa, the stratospheric trend from our homogenized data is almost 0.2 K decade−1 smaller than trends from RICH and HadAT2 but is in good agreement with the large-scale northern extratropical trend from RATPAC-A. Still, all datasets clearly demonstrate that the stratosphere has been cooling over Germany (and worldwide). The remaining differences between the datasets indicate that systematic uncertainties in the stratospheric trends (e.g., due to sampling and inhomogeneities in the data) are of the order of 0.1–0.2 K decade−1, slightly larger than in the troposphere but similar to what is assessed in Hartmann et al. (2013).
We have presented a manual method for homogenizing temperature records from radiosondes (RS). The method has been applied to German RS records from the early 1950s until 2013. Before German reunification in 1990, quite different RS types were flown in East and West Germany. This helps for homogenization, because, for an RS type change in West Germany, unchanged reference data are usually available in nearby East Germany, and vice versa. The largest corrections, more than −2.5 K, were determined for stratospheric daytime temperatures recorded in the 1960s in East Germany (Freiberg sonde). Homogenization corrections become smaller over time and are quite small for the Vaisala RS80 sondes flown from about 1990 to ~2004 (see also Steinbrecht et al. 2008). Uncertainty estimates for the determined corrections are typically between 0.05 and 0.5 K. Very small (<0.05 K) and statistically insignificant adjustments, usually at the lower levels in the troposphere, were not applied. Most notably, no correction was applied for the transition from Vaisala RS80 to RS92 at levels below 300 hPa.
Comparison of our homogenized radiosonde temperature time series to other homogenized datasets for Germany shows good agreement overall but some noticeable differences. Compared to the RICH dataset (Haimberger et al. 2012b), our homogenized temperature time series are almost identical after 1985 but differ a little in the 1960s and 1970s, usually by less than 0.1 K and by up to 0.4 K in some cases. In the troposphere, the HadAT2 temperatures after 1995 are sometimes unrealistically high for several years, by up to 1 K. This might be an artifact of the automatic homogenization procedures used for the HadAT2 dataset.
Trend analysis of the homogenized German RS temperatures from the 1950s to 2013 shows significant warming of the troposphere, by ~0.2 ± 0.1 K decade−1, and significant cooling in the stratosphere. Stratospheric cooling increases with altitude, from no long-term trend around 230 hPa, near the tropopause, to −0.4 ± 0.2 K decade−1 cooling near 50 hPa. Homogenization shifts both the tropospheric and the stratospheric trends by about +0.1 K decade−1. In the stratosphere, the RICH dataset, and to a lesser degree the HadAT2 dataset, show more cooling than our homogenized results, by up to 0.2 K decade−1. RATPAC-A trends for the northern extratropical stratosphere (Free et al. 2005; NOAA/National Centers for Environmental Information 2005) are quite similar to our homogenized German trends. In the troposphere, our homogenized trend results agree to better than 0.1 K decade−1 with trends from the other RS datasets and from MSU satellite data and are in the range of global temperature trends, as assessed by Hartmann et al. (2013).
Our results indicate that temperature trends from the various RS datasets are accurate to better than 0.1 K decade−1 in the troposphere and to better than 0.2 K decade−1 in the stratosphere. The fundamental information about long-term changes in the atmosphere is contained in both original and homogenized data. Long-term RS records can provide a solid base for analyses of temperature variability and trends over Germany.
Original and homogenized monthly mean datasets are available online (Pattantyús-Ábrahám 2015a,b). An extended radiosonde dataset including other European radiosonde stations has already been used for validation of decadal climate hindcast experiments within the framework of the MiKlip program (http://www.fona-miklip.de).
Comments and suggestions by two anonymous reviewers have uncovered unclear points in the initial manuscript and have helped to improve the paper. The authors thank Wolfgang Adam for providing valuable information about radiosoundings at Lindenberg and in the East German network. Support by the German Federal Ministry for Education and Research (BMBF) programme MiKlip, project MOSQUITO, Grant 01LP1106A, is also greatly appreciated.