The accuracy of Vaisala RS92 versus RS41 global radiosonde soundings, emphasizing stratospheric temperature, is assessed from January 2015 to June 2017 using ~311 500 RS92 and ~65 800 RS41 profiles and three different reference data sources. First, numerical weather prediction (NWP) model outputs are used as a transfer medium to produce relative RS92 and RS41 comparisons by analyzing observation minus NWP model background (OB–BG) and observation minus analysis (OB–AN) differences using the NOAA Climate Forecast System Reanalysis (CFSR; both comparisons) and the operational European Centre for Medium-Range Weather Forecasts (ECMWF) model (OB–AN comparison only). Second, GPS radio occultation (GPSRO) dry temperature profiles are directly compared with radiosondes, using GPSRO data from the University Corporation for Atmospheric Research (UCAR) Constellation Observing System for Meteorology, Ionosphere and Climate (COSMIC) and EUMETSAT Radio Occultation Meteorology (ROM) Satellite Application Facility (SAF). Third, dual launches (RS92 and RS41 suspended from the same balloon) at five sites allow direct assessments. Comparisons of RS92 versus RS41 from all reference data sources are basically consistent. These two sondes agree well with global average temperature differences <0.1–0.2 K in the lower stratosphere from 51.5 to 26.1 hPa based on global stations and the dual launches. RS41 appears to be less sensitive than RS92 to changes in solar elevation angle. This study indicates that nighttime RS92 and RS41 radiosonde temperature biases are negligible, but infers a stratospheric cold bias (<0.5 K) in the CFSR and ECMWF model data.
Balloonborne radiosonde observations (raobs) play a critical role in upper-air climate change detection, numerical weather prediction (NWP) data assimilation and forecasting, and satellite data calibration/validation (cal/val). Vaisala RS92 is a major sonde type in the current global operational upper-air network and a reference sonde in the Global Climate Observing System (GCOS) Reference Upper Air Network (GRUAN; Bodeker et al. 2016). However, RS92 has gradually been replaced by Vaisala RS41 since late 2013, and managing this transition is a high priority in GRUAN because RS92 production ended in August 2017.
Vaisala RS41 includes new sensor technologies aimed at providing improvements in measurement accuracy for temperature, humidity, and other variables throughout the atmosphere (https://www.vaisala.com/en/products/instruments-sensors-and-other-measurement-devices/soundings-products/rs41). Understanding the measurement accuracy of this emerging radiosonde type is of great interest to the climate trend detection, NWP, and satellite communities.
Radiation-induced error is a major issue with radiosonde temperature measurements in the upper troposphere and lower stratosphere (UTLS). Ventilation around the instrument diminishes with altitude due to reduced air density, causing radiosonde temperatures to be typically warm biased in daytime due to solar radiation and slightly cold biased at night due to radiation to space (Dirksen et al. 2014). Most weather stations apply radiation corrections to raobs, based on algorithms provided by radiosonde manufacturers. Raobs analyzed in this study include the manufacturer adjustments. However, because the radiation correction schemes were derived using limited data and most adjustments consider few factors [generally pressure, solar elevation angle (SEA), and balloon ascent speed], biases still remain even after correction (e.g., Gaffen 1994; Sherwood et al. 2005; Haimberger et al. 2008; Sun et al. 2013).
This study uses multiple datasets to characterize the differences between RS41 and RS92 upper-air temperature observations made at operational radiosonde stations, with standard operational Vaisala radiation corrections applied to both radiosonde types. Section 2 introduces methodology, section 3 describes the datasets used for the assessment, section 4 discusses results, and section 5 provides a summary and conclusions.
As described in section 3, three different reference data sources are used for the assessment: NWP model data, GPS radio occultation (GPSRO) dry temperature, and dual-radiosonde observations. In this study, the model data are used as a transfer medium to determine the relative differences of RS92 versus RS41 using two relative comparisons:
Compare raobs with forecast model background data, where the computed observation minus forecast background (OB–BG) differences for both RS92 and RS41 are used to identify their relative difference. Basically, an operational NWP model run starts with the preceding short-term forecast valid at the starting time of the new forecast cycle as a “background” or a “first guess.” For example, with a 6-h forecast cycle system, the forecast cycle starting at 1200 UTC uses the 6-h forecast from the 0600 UTC cycle as the 1200 UTC BG. Therefore, the 1200 UTC BG is an operational model forecast and is not affected by the new OB data, including the raobs near 1200 UTC to be analyzed in this work.
Compare raobs with the corresponding forecast model analysis data. The analysis at any given time, for example at 1200 UTC, is the result of the 1200 UTC BG, adjusted by assimilating all types of new observations (received generally from 1200 UTC ± 3 h) including radiosondes, surface data, satellite measurements, and GPSRO data. This study analyzes the observation minus analysis (OB–AN) differences for both RS92 and RS41 in the same way as the OB–BG differences to understand further the relative difference of the two sondes. While the analysis assimilates the same raobs for which the OB–AN differences are computed, the OB and AN values are still partially independent because the BG starting field is a physically consistent short-term forecast, and assimilation that produces the AN field does not automatically accept the reported OB values, but formally balances errors of all data sources. The OB–AN differences are therefore not zero, but are expected to be smaller than the OB–BG differences, and the AN fields could be used as an alternative reference data source to estimate radiosonde data error (as discussed in section 4).
It should be mentioned that major NWP centers, including the National Centers for Environmental Prediction (NCEP) and European Centre for Medium-Range Weather Forecasts (ECMWF), have historically employed OB–BG differences to make radiosonde data radiation corrections prior to their assimilation into the AN field for operational forecasts. To develop corrections, the center uses observations and forecasts, interpolated to observation locations, to compute the OB–BG temperature differences (or increments) at various heights and solar angles (including night) for all individual sonde types. For radiosondes, each distinct type is adjusted to be equivalent to some “reference” radiosonde type. The AN fields from the NCEP model are based on assimilated raobs adjusted with the current radiation correction (RADCOR) scheme, which at NCEP has not been updated since around 2000 (Sun et al. 2013), and NCEP currently applies no correction to newer sondes including RS92 and RS41. ECMWF uses the radiosonde bias correction scheme of Agusti-Panareda et al. (2009) with the corrections being updated monthly and being applied to RS92 and RS41. So, in this study, OB temperatures of RS92 and RS41 observations compared with the NCEP model BG or AN are the same OB values as the ones compared with ECMWF model data, but corrections have been made to the radiosondes assimilated in the ECMWF AN cycle even though the corrections to those two sondes are small (Ingleby 2017).
Direct comparisons were made of RS92 minus RS41 differences (RS92–RS41) in dual launches of both types suspended from the same balloon at five sites listed in Table 1. These provide the most rigorous radiosonde comparisons because both radiosondes sample the same air column, but the comparisons are still relative differences because neither RS41 nor RS92 provides absolute accuracy. The five sites sample different climate regimes, as summarized in section 3, to assess consistency with the “global” assessment.
GPSRO provides a technique to derive atmospheric profiles of temperature that are used to independently evaluate radiosonde observations. A GPSRO is a measurement by an accurately time-synchronized satellite of the time delays (or bending angles) of a GPS satellite signal near the horizon as the signal passes through different atmospheric layers. The bending angle varies due to atmospheric density (temperature, water vapor, and air pressure) and ionospheric electron density. Assuming spherical symmetry (relative to Earth) of the air in the GPS signal path, the ionospheric corrected bending angle profile can be used to compute a refractive index (or refractivity) profile downward from the upper stratosphere. In the stratosphere and upper troposphere, the “dry temperature” (Tdry) profile can be derived from the refractivity profile by ignoring water vapor. Computing Tdry in a moist profile causes a cold bias (Ware et al. 1996), but this study extends only down to 150 hPa, where the maximum plausible cold bias in a tropical thunderstorm is ~0.03 K, so water vapor can be ignored. Tdry has a high vertical resolution (0.1–1 km), although its horizontal resolution is low due to the long atmospheric signal path.
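The dry-temperature step described above can be written out as a short worked derivation. This is a sketch under stated assumptions: it uses the standard two-term refractivity relation with commonly quoted constants ($k_1 \approx 77.6$ K hPa$^{-1}$, $k_2 \approx 3.73\times10^{5}$ K$^2$ hPa$^{-1}$), which the text itself does not give, and the symbols $k_1$, $k_2$, $R_d$, $g$, and $z_{\mathrm{top}}$ are introduced here for illustration. The processing centers' actual implementations may differ in detail.

```latex
\begin{align}
N &= 10^{6}\,(n-1) \approx k_1\,\frac{p}{T} + k_2\,\frac{e}{T^{2}}
  && \text{(refractivity; } e \text{ is water vapor pressure)} \\
N &= k_1\,\frac{p}{T_{\mathrm{dry}}}
  && \text{(water vapor ignored, } e = 0\text{)} \\
\rho &= \frac{p}{R_d\,T_{\mathrm{dry}}} = \frac{N}{k_1 R_d}
  && \text{(ideal gas law, consistent pressure units)} \\
p(z) &= p(z_{\mathrm{top}}) + \int_{z}^{z_{\mathrm{top}}} g\,\rho(z')\,\mathrm{d}z'
  && \text{(hydrostatic integration downward)} \\
T_{\mathrm{dry}}(z) &= k_1\,\frac{p(z)}{N(z)}
\end{align}
```

Under these relations, any water vapor present adds to the measured $N$, so $k_1 p/N$ falls below the true temperature, which is the cold bias of Tdry in moist air noted above. The starting value $p(z_{\mathrm{top}})$ must come from a priori information.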
Since the GPS bending angle decreases exponentially with decreasing air density, the Tdry uncertainty reportedly increases quickly above ~25 km or pressures lower than ~25 hPa (Kursinski et al. 1997; Hajj et al. 2004; Steiner et al. 2011). The Tdry profiles above ~35 km are highly sensitive to the a priori required to initialize the hydrostatic integration at the ionosphere due to the radio occultation (RO) measurement “null space” (Tradowsky et al. 2017), and their accuracy in those upper layers is thus noticeably affected by the initialization (Steiner et al. 2013). Caution is therefore needed to analyze the radiosonde bias using GPSRO Tdry as the reference at altitudes higher than ~25 km.
For heights between 10 and 25–30 km (between about 100 and 25–10 hPa) where this study seeks to understand radiosonde accuracy, the average Tdry error is within 0.1–0.2 K (Hajj et al. 2004; Steiner et al. 2011), while the operational RS92 or RS41 uncertainty specified by Vaisala varies from 0.3 to 0.5 K (https://www.vaisala.com/en/products/instruments-sensors-and-other-measurement-devices/soundings-products; viewed October 2018). The value of GPSRO as an upper-air reference has been demonstrated in comparisons with radiosonde data to estimate their biases (Kuo et al. 2005; He et al. 2009; Sun et al. 2013; Ladstädter et al. 2015; Ho et al. 2017). As in those studies, this study analyzes raob minus GPSRO Tdry temperature differences (OB–GPSRO), so that GPSRO is compared directly with RS92 and RS41.
The Tdry data quality overall is consistent among different missions (satellite systems) and data processing centers, but differences among GPSRO products are still noticed particularly in regional analysis or analyses using limited data samples (Ho et al. 2012; Steiner et al. 2013). Therefore, two different GPSRO products are used in this study, as described in section 3.
The datasets used in this work, including raobs, model outputs, and GPSRO, have different vertical resolutions. For example, the dual-raob flights report data values at 1-s intervals, or usually ~7000 vertical levels, but standard operational raobs (reported in TEMP code format—as summarized in section 3) used in this work report ~50–150 vertical levels.
To suppress small-scale atmospheric structures in high-density vertical profiles and minimize the impact of different vertical resolutions on the raob accuracy assessment, an approach typical of satellite hyperspectral sounding retrieval validation is adopted here (Susskind et al. 2003; Tobin et al. 2006; Sun et al. 2017). Namely, radiosonde data, the NWP forecast and analysis, and GPSRO Tdry are all linearly interpolated (in the logarithmic pressure coordinate) to 100 common vertical levels from the surface to the top of the atmosphere, and the differences relative to the temperature reported by RS92 and RS41 are computed at the 100 levels. The 100-level values are then averaged into ~1-km coarse layers, and mean biases and standard deviation (SD) differences are computed in each of those layers and are used in the tables, plots, and discussions. Table 2 lists coarse layers centered from ~150 to ~10 hPa where raobs are generally subject to the strongest radiation impacts. Comparing with collocated GPSRO Tdry (see section 4c) reveals that the mean biases for RS92 and RS41 are rather small in the layers around 100 hPa, such as 92.8 and 113.9 hPa where biases for both sondes are <0.04 K. Assessment statistics are displayed in the plots up to ~10 hPa but only those for the lower-stratospheric layers (51.5–26.1 hPa) are included in the tables.
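The interpolation and layer-averaging steps described above can be sketched as follows. This is a minimal illustration, not the NPROVS implementation: the function names, the choice to return NaN outside the source pressure range instead of extrapolating, and the layer-bound convention are all assumptions made for the example.

```python
import numpy as np


def interp_log_pressure(p_src_hpa, t_src_k, p_target_hpa):
    """Interpolate temperature linearly in ln(p) onto target pressure levels.

    Assumption of this sketch: target levels outside the source pressure
    range are returned as NaN rather than extrapolated.
    """
    p_src = np.asarray(p_src_hpa, dtype=float)
    t_src = np.asarray(t_src_k, dtype=float)
    p_tgt = np.asarray(p_target_hpa, dtype=float)

    # np.interp requires increasing x values, so sort the profile by pressure.
    order = np.argsort(p_src)
    x = np.log(p_src[order])
    y = t_src[order]
    return np.interp(np.log(p_tgt), x, y, left=np.nan, right=np.nan)


def layer_mean(values, p_levels_hpa, p_bottom_hpa, p_top_hpa):
    """Average level values inside one coarse layer, skipping missing levels.

    A level belongs to the layer when p_top < p <= p_bottom (assumed convention).
    """
    v = np.asarray(values, dtype=float)
    p = np.asarray(p_levels_hpa, dtype=float)
    keep = (p <= p_bottom_hpa) & (p > p_top_hpa) & np.isfinite(v)
    return float(v[keep].mean()) if keep.any() else float("nan")
```

In use, both the raob and the reference profile would be interpolated to the same target levels, differenced level by level, and the differences passed to `layer_mean` for each coarse layer.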
Two and a half years (January 2015 through June 2017) of global operational RS92 and RS41 temperature observations are analyzed for their OB–BG and OB–AN differences. The NWP data fields used for the OB–BG differences are the NOAA Climate Forecast System Reanalysis (CFSR; Saha et al. 2010) forecast background. The NWP data fields used for the OB–AN differences are the CFSR and the operational ECMWF analysis. Two GPSRO Tdry products are used as independent profiles for assessment. The first is the University Corporation for Atmospheric Research (UCAR) Constellation Observing System for Meteorology, Ionosphere and Climate (COSMIC; http://www.cosmic.ucar.edu/cdaac) GPSRO data using GPS receivers on the COSMIC satellites. The second is the EUMETSAT Radio Occultation Meteorology (ROM) Satellite Application Facility (SAF) Global Navigation Satellite System (GNSS) Receiver for Atmospheric Sounding data (GRAS; www.romsaf.org) using GPS receivers on the MetOp-A and MetOp-B satellites. Just as the temperature differences from the raob to target sources are abbreviated OB–BG for “observation minus background” and OB–AN for “observation minus analysis,” the raob minus Tdry differences are abbreviated OB–COSMIC and OB–GRAS.
All of those datasets are collocated in the NOAA Products Validation System (NPROVS; Reale et al. 2012; Sun et al. 2017), supported by the NOAA Joint Polar Satellite System (JPSS) and operated at NOAA NESDIS office of Satellite Applications and Research (STAR) starting 2008. NPROVS provides routine data access, collocation, and intercomparison of multiple satellite temperature and water vapor sounding product suites and NWP model profiles matched with global operational radiosonde and dropsonde observations. The collocation approach is to select the “single closest” sounding from each product suite anchored to the raob launch location. Description of the individual datasets is given below.
Raobs are those assimilated operationally by NOAA’s NCEP. Those raobs are transmitted through the WMO Global Telecommunications System (GTS), and have used the alphanumeric TEMP code (WMO Code Forms FM-35 to FM-38). The TEMP code forms have used the same general format since 1968 (when teletype circuits had very limited communication capacity), so reports have much lower vertical resolution and less precision in measured values than raobs in the Binary Universal Form for Representation of Meteorological Data (BUFR) format (WMO Code Form FM-94). The rounding in the TEMP code can vary with the processing software used. According to Ingleby (2017), at temperatures below 0°C RS41 appears 0.1 K warmer than RS92 using Digicora III software due to rounding. The Digicora III RS92 reports are 60% of all RS92 profiles used in the study, and the RS92 versus RS41 differences (shown in Tables 4–6 and Fig. 3) could thus be slightly underestimated.
In the last few years, an increasing number of raobs and other observations have been transmitted in the BUFR format. According to Ingleby and Edwards (2015) and Ingleby (2017), on average those two types of raob reports are very comparable to each other (with a TEMP reporting bias ~0.1°C or smaller due to systematic rounding or data conversions in station or forecast center decoding). TEMP variable rara or BUFR variable 002011 reports a radiosonde type code, and the reported code can be wrong because of a typographical error (e.g., if the instrument group is omitted and the following group is a launch time in the 2300 hour, “23” can be interpreted as a Vaisala RS41 code). This analysis removed five erroneous cases of this type.
As part of the data assimilation process, for archiving and later analysis NCEP stores ingested raobs and all other observations into a file called PREPBUFR, with the 6-h CFSR guess (background) and analysis fields encoded into the PREPBUFR file at radiosonde balloon locations. As stated in section 1, raobs in PREPBUFR contain only corrections applied by the stations.
It was found that 7.7% of raob profiles available for this study are associated with meaningless CFSR background temperatures at all mandatory levels. The raobs associated with those “bad” values, distributed randomly in time and across the global stations, are excluded in this study.
We rejected raobs with a “failure” quality flag in the NCEP assimilation system (Collins 2001a,b) or having temperature difference ≥15 K from the NCEP background, as well as those with a vertical extent <5 km or a vertical gap ≥4 km (Reale et al. 2012), totaling ~1% of the observations. This study uses ~311 500 RS92 raobs from 344 fixed stations and 1101 ship launches, and ~65 800 RS41 raobs from 135 fixed stations and 1080 ship launches.
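The numerical screening thresholds above can be expressed as a simple profile-level check. This is a sketch only: the function name and arguments are hypothetical, the NCEP quality flag is not modeled, and treating a single level with a difference of 15 K or more as grounds for rejecting the whole profile is an assumption, since the text does not say whether the test is applied per level or per profile.

```python
import numpy as np


def passes_screening(height_km, t_obs_k, t_bg_k,
                     max_abs_diff_k=15.0, min_extent_km=5.0, max_gap_km=4.0):
    """Return True if a profile passes the three numerical screening tests."""
    h = np.asarray(height_km, dtype=float)
    d = np.asarray(t_obs_k, dtype=float) - np.asarray(t_bg_k, dtype=float)
    if h.size < 2:
        return False

    h_sorted = np.sort(h)
    # Test 1: observation differs from the background by 15 K or more.
    if np.any(np.abs(d) >= max_abs_diff_k):
        return False
    # Test 2: vertical extent of the profile is less than 5 km.
    if h_sorted[-1] - h_sorted[0] < min_extent_km:
        return False
    # Test 3: a vertical gap of 4 km or more between adjacent levels.
    if np.any(np.diff(h_sorted) >= max_gap_km):
        return False
    return True
```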
Figure 1 shows the spatial distribution of the radiosonde stations. As of June 2017, RS41 radiosondes are most used in western Europe, the Middle East, Southeast Asia, and New Zealand, but few or no RS41 are launched from North America, Australia, and Brazil where RS92 dominates. As with all other sonde types, operational RS92 and RS41 raobs are launched primarily around 0000 and 1200 UTC. Dual launches of RS41 and RS92 made at five sites are listed in Table 1, along with launch locations, time periods (some launches were in 2014), and number of analyzed dual launches.
b. CFSR background and analysis and ECMWF analysis
The CFSR data are those included in the PREPBUFR files generated in the NCEP data assimilation (see this section’s raob data description). This study also uses ECMWF (ECMWF 2018) operational analyses available at 0000, 0600, 1200, and 1800 UTC, with 91 vertical pressure levels thinned from the 137 model sigma levels and horizontal resolution of 0.25° × 0.25° (Eresmaa and McNally 2014). CFSR atmospheric profiles (with 64 vertical levels and ~38-km horizontal resolution) are 4D interpolated to radiosonde profiles (J. Wollen 2018, personal communication; Saha et al. 2010), meaning that radiosonde balloon drift in space and time starting from the launch location is taken into account. The balloon drift information is also ingested in the ECMWF system.
c. RO Tdry data
Due to degrading COSMIC constellation satellites in this period, the average daily number of RO profiles is about 600 from COSMIC, but about 1200 from GRAS on MetOp-A and MetOp-B combined. Both sources are approximately evenly distributed across the globe.
Since CFSR profiles were spatially and temporally interpolated to radiosonde data, their collocation error is already minimized. Since ECMWF data fields are at four synoptic times, almost all raob–ECMWF collocations accepted for analysis are within 1 h (including launches around 0600 and 1800 UTC), and raobs not launched close to synoptic times are not analyzed.
GPSRO and radiosonde collocations in this study are within 3 h and 250 km, and a sensitivity test is also conducted to understand the impact of sampling or collocation errors on the raob accuracy assessment. Because the assessment focuses on the upper troposphere and above, and the balloon takes about 30 min to rise to the upper troposphere (Seidel et al. 2011), 30 min is added to the radiosonde launch time when computing the temporal collocation mismatch with respect to GPSRO (and ECMWF). The GPSRO profile location at around 100 hPa is used to compute the GPSRO distance mismatch with the radiosonde profile.
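The collocation test described above can be sketched as follows. The function names, the spherical-Earth haversine distance, the mean Earth radius of 6371 km, and the inclusive treatment of the 3-h and 250-km limits are assumptions of this example, not details taken from NPROVS.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_collocated(launch_time, raob_lat, raob_lon, ro_time, ro_lat, ro_lon,
                  max_hours=3.0, max_km=250.0, ascent_offset_min=30.0):
    """Check the time and distance windows between a raob and a GPSRO profile.

    The raob time is shifted by the ascent offset (30 min) before the time
    mismatch is computed; ro_lat/ro_lon stand for the GPSRO location near 100 hPa.
    """
    effective_time = launch_time + timedelta(minutes=ascent_offset_min)
    dt_hours = abs((ro_time - effective_time).total_seconds()) / 3600.0
    dist_km = great_circle_km(raob_lat, raob_lon, ro_lat, ro_lon)
    return dt_hours <= max_hours and dist_km <= max_km
```

With a 1200 UTC launch, the 30-min shift means an occultation at 1520 UTC is accepted (2 h 50 min from 1230 UTC) while one at 0920 UTC is rejected (3 h 10 min), even though the latter is closer to the nominal launch time.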
As usual, no reference dataset is the “truth,” but consistent comparison results using datasets of different sources give more confidence in the analysis, as discussed below.
As in Sun et al. (2013), the raob temperature bias is estimated using the mean raob minus collocated target (i.e., NWP or GPSRO) temperature, along with the SD of the differences, sorted into four SEA classes: NIGHT (SEA < −7.5°), DUSK/DAWN (SEA = −7.5°–7.5°), LOW (SEA = 7.5°–22.5°), and HIGH (SEA > 22.5°). The number of soundings in the DUSK/DAWN and LOW SEA classes is typically much smaller than in the NIGHT or HIGH classes because, on average, DUSK/DAWN and LOW sun angles account for small portions of the 24-h day, and standard observation times at 0000 and 1200 UTC near 0° longitude are all near solar midnight and noon. All raobs occurring near solar noon are in the HIGH SEA class, except in winter at fairly high latitudes.
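The SEA classification and per-class averaging can be sketched as below. The function names are hypothetical, and the handling of the shared class boundaries is an assumption: the text gives 7.5° as the edge of both DUSK/DAWN and LOW, and this sketch assigns exactly 7.5° to LOW.

```python
from collections import defaultdict
from statistics import mean


def sea_class(sea_deg):
    """Map a solar elevation angle in degrees to one of the four SEA classes."""
    if sea_deg < -7.5:
        return "NIGHT"
    if sea_deg < 7.5:
        return "DUSK/DAWN"
    if sea_deg <= 22.5:
        return "LOW"
    return "HIGH"


def mean_difference_by_class(sea_degs, differences_k):
    """Mean raob-minus-target difference (K) per SEA class, plus the ALL average.

    ALL is the simple average over every sounding, not the mean of class means.
    Both inputs are expected to be non-empty sequences of equal length.
    """
    groups = defaultdict(list)
    for sea, diff in zip(sea_degs, differences_k):
        groups[sea_class(sea)].append(diff)
    result = {name: mean(vals) for name, vals in groups.items()}
    result["ALL"] = mean(differences_k)
    return result
```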
Results for ALL SEA are also computed using simple averages of all included raobs, to facilitate understanding of the overall raob bias tendency. “Global” difference statistics in this study are computed by simply averaging sounding data from all sites.
a. Analysis of OB–BG differences in CFSR
Figure 2 shows average raob minus CFSR OB–BG differences in the four SEAs for RS92 (top panels) and RS41 (bottom panels). The black curve in each left panel is the ALL SEA OB–BG difference. While it is desirable for all soundings to reach 10 hPa, the number of soundings decreases noticeably starting around 50 hPa due to balloon burst, by about 18% at 26 hPa, 38% at 19 hPa, and 59% at 10 hPa. Since balloons tend to burst at lower altitudes in winter (or at night when the balloon is not warmed by sunlight), higher-altitude unweighted statistics can be less seasonally, diurnally, and spatially representative.
In the top panels of Fig. 2, starting around 70 hPa, RS92 is systematically warmer than the CFSR BG, with warming increasing with altitude and with SEA class. The increasing warm difference as the SEA increases is a pattern of radiation-induced radiosonde bias. The BG minus COSMIC differences shown in the gray bars of Fig. 11 of Sun et al. (2013) indicate a small daytime warm bias increasing with SEA to ~0.16 K in the HIGH SEA class averaged for 10–70 hPa. Apparently, because NCEP applies no correction to newer radiosonde types (see the fourth paragraph of section 2), the earlier assimilation of the biased radiosonde data into the NCEP forecast model causes a small diurnal radiation bias in the BG field. Therefore, radiosonde warm biases inferred from OB–BG for both RS92 and RS41 in Fig. 2, and somewhat smaller warm biases inferred using CFSR OB–AN in section 4b could be slightly underestimated.
The standard deviations of the differences are similar among the different SEAs. They reflect the combination of random errors in both radiosonde and the model BG and the spatial and temporal mismatch between the two profiles. They are <1.0 K throughout the troposphere (not shown) and stratosphere except at the surface and at altitudes above 19 hPa where the SD values are larger. In the bottom panels of Fig. 2, RS41 is also systematically warmer than the CFSR BG, but the RS41 OB–BG shows less dispersion with SEA, suggesting RS41 is less sensitive than RS92 to solar radiation errors.
An objective method to quantify the solar radiation sensitivity shown in the left panels of Fig. 2 is to compute the spread of the OB–BG differences among the four SEA classes (i.e., NIGHT, DUSK/DAWN, LOW, and HIGH). The spread, or mean absolute difference (MAD), in any pressure layer is defined as the arithmetic average, over the four SEA classes, of the absolute deviation of each class's mean OB–BG difference (blue, gray, red, and purple lines in the left panels of Fig. 2) from the corresponding OB–BG difference for the ALL SEA class (black line). Note that the “ALL” OB–BG difference is a measure of the “inherent” radiosonde type bias, except that it includes model biases, while MAD is a measure of the sensitivity of radiosonde temperature bias to radiation. MAD should be a good estimate of the raob solar heating error even though OB–BG differences are not absolute comparisons. MAD is computed in the same way using OB–AN or OB–GPSRO differences.
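The MAD definition above reduces to a few lines of code. The function name and the numbers in the example are hypothetical and chosen only to make the arithmetic easy to follow; they are not values from the study.

```python
def mean_absolute_difference(class_means_k, all_mean_k):
    """MAD for one pressure layer.

    class_means_k: mapping of SEA class name to its mean difference (K).
    all_mean_k: mean difference (K) for the ALL SEA class in the same layer.
    """
    if not class_means_k:
        raise ValueError("at least one SEA class mean is required")
    deviations = [abs(m - all_mean_k) for m in class_means_k.values()]
    return sum(deviations) / len(deviations)


# Hypothetical layer-mean OB-BG differences (K), for illustration only.
example_class_means = {"NIGHT": -0.10, "DUSK/DAWN": 0.00, "LOW": 0.10, "HIGH": 0.20}
example_mad = mean_absolute_difference(example_class_means, all_mean_k=0.05)
```

Here the four absolute deviations are 0.15, 0.05, 0.05, and 0.15 K, giving a MAD of 0.10 K. The ALL value is passed in separately because it is an average over all soundings and generally differs from the unweighted mean of the four class means.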
Table 3 shows the MAD values for RS92 and RS41 in three lower-stratospheric layers (26.1, 37.4, and 51.5 hPa). The “OB–BG CFSR” column shows MAD values in these three layers in the left panels of Fig. 2. The increasing MAD value with decreasing pressure for both RS92 and RS41 confirms that the radiation impact on radiosonde bias increases with height. The smaller dispersions among SEA classes in the RS41 OB–BG compared to RS92 (also seen in Fig. 2) confirm that RS41 is less sensitive to solar radiation than RS92.
Using the BG as the transfer medium to compute RS92 minus RS41 differences assumes that, because the model assimilates many observation types and establishes physical consistency, the BG field for the new forecast cycle (e.g., the preceding 6-h forecast) almost completely averages out individual instrument biases. So, if one station uses RS92 and another uses RS41, the BG has negligibly different biases from previous RS92 and RS41 soundings at these locations. Figure 3 and the left column of Table 4 indicate that relative to RS41, RS92 tends to show a radiation-induced temperature bias pattern in the lower stratosphere, for example, with a difference in the 37.4-hPa layer of −0.10 K for NIGHT and 0.11 K for HIGH SEA. Although small, within the manufacturer-specified uncertainty (Jensen et al. 2016), the differences are statistically significant, and could pose a potential challenge for climate trend detection associated with temporal changes in instrumentation (Sherwood et al. 2005).
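The role of this assumption can be made explicit with a short decomposition. The notation is introduced here for illustration and is not the authors': $T$ is the true temperature, $b_{92}$ and $b_{41}$ are the sonde biases, and $b_{\mathrm{BG}}^{(92)}$ and $b_{\mathrm{BG}}^{(41)}$ are the BG biases averaged over the RS92 and RS41 collocations, respectively.

```latex
\begin{align}
\overline{(\mathrm{OB}-\mathrm{BG})}_{92} &= (T + b_{92}) - (T + b_{\mathrm{BG}}^{(92)}) = b_{92} - b_{\mathrm{BG}}^{(92)} \\
\overline{(\mathrm{OB}-\mathrm{BG})}_{41} &= b_{41} - b_{\mathrm{BG}}^{(41)} \\
\overline{(\mathrm{OB}-\mathrm{BG})}_{92} - \overline{(\mathrm{OB}-\mathrm{BG})}_{41}
  &= (b_{92} - b_{41}) - \bigl(b_{\mathrm{BG}}^{(92)} - b_{\mathrm{BG}}^{(41)}\bigr)
\end{align}
```

The double difference equals the RS92 minus RS41 bias difference only to the extent that the last term is negligible, that is, when the BG bias is nearly the same at the RS92 and RS41 collocations.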
Note that the RS92 minus RS41 differences for ALL and particularly for HIGH shown in Fig. 3 increase with height above 26.1 hPa. The value for HIGH SEA reaches 0.37 K at 13.5 hPa and 0.52 K at 10.2 hPa. Similar tendencies are observed in the RS92 minus RS41 differences using either CFSR AN or ECMWF AN as the transfer medium (figures not shown). Further investigation is needed to find out if the increasing RS92 minus RS41 daytime warm difference (with height) in the stratosphere is related to the reduction in radiosonde sample sizes, as mentioned in the first paragraph of this subsection (more balloons reach high altitudes in warm conditions, or when sunlight warms the balloon), or reflects the true nature of radiosonde biases at high altitudes.
Radiosonde biases at night tend to be much smaller than in daytime and are often considered to be negligible, or at least satisfactorily corrected by the radiation corrections applied by the stations. So, if the NIGHT OB–BG (OB–AN) difference is not close to zero, it is possible that OB is correct at night, and BG (or AN) is the actual biased quantity (when OB is considered correct, the bias becomes either BG–OB or AN–OB). The considerable magnitudes of the NIGHT biases in the left panels of Fig. 2, which are quite variable with altitude but very similar for both RS92 and RS41, suggest that, instead of RS92 and RS41 both having a fairly large warm bias at the highest levels at night, CFSR BG has a cold bias at those levels. From Table 4, if the NIGHT raobs are the reference, then CFSR BG has a cold bias in the 26.1-hPa layer of −0.239 ± 0.84 K (± indicates one standard deviation in this paper) for RS92 and −0.346 ± 0.84 K for RS41, with the difference in average biases probably due to differing station locations. The CFSR biases are likely to change with altitude due to differing mixes of ingested data, and the CFSR warm bias centered at 92.8 hPa could be an artifact of interpolation of 64 coarsely spaced CFSR model levels to the radiosonde levels (Ballish and Kumar 2008; Saha et al. 2010).
The MAD values in Table 3 (and seen by the spread between the NIGHT and HIGH category lines in Fig. 2) are still a measure of the RS92 and RS41 radiative biases, especially because they increase at higher levels in Table 3, from 0.069 K at 51.5 hPa to 0.098 K at 26.1 hPa for RS92, but only from 0.010 to 0.025 K in the same layers for RS41.
While Fig. 2 comparison statistics by SEA class are based on global data, Fig. 4 shows the comparison statistics for Lindenberg, Germany (WMO ID 10393, 52.21°N, 14.12°E, 112.0 m), launching RS92 four times per day (most 0600 and 1800 UTC soundings have DUSK/DAWN or LOW SEA and some cold season 1200 UTC soundings have LOW SEA), and Zagreb/Maksimir, Croatia (WMO ID 14240, 45.82°N, 16.03°E, 127.6 m), launching RS41 twice a day (all 1200 UTC soundings are in the HIGH SEA class). They are about 750 km apart. The results in Fig. 4 are similar to Fig. 2 even though the difference statistics for each radiosonde are based on only one station. The MAD values in the UTLS, which measure the dispersion between SEA classes, are much smaller for the RS41 OB–BG at Zagreb/Maksimir than for RS92 at Lindenberg, for example, 0.033 K for RS41 versus 0.065 K for RS92 at 37.4 hPa. So the smaller radiation errors for RS41 compared to RS92 are supported by similar analyses using individual stations.
b. Analysis of the CFSR and ECMWF OB–AN differences
The CFSR OB–AN plots for RS92 and RS41 (figure not shown, but see OB–AN MAD values in Table 3 and OB–AN values in Table 4) are similar to the CFSR OB–BG plots. As anticipated in section 2, Table 4 shows that the CFSR OB–BG differences and standard deviations are slightly larger than the CFSR OB–AN differences and standard deviations (for RS92 at 51.5 hPa, the standard deviation decreases from 0.77 K with OB–BG to 0.58 K with OB–AN, and the corresponding RS41 change is from 0.81 to 0.60 K). This is expected because the analysis assimilates the current raobs while the background is only a forecast valid at the analysis time. That the OB–AN values are only slightly smaller than the OB–BG values indicates that the analysis process does not significantly correct any biases that are attributed to either inherent model biases or systematic biases in ingested data other than raobs. Similarly, the CFSR OB–AN MAD values in Table 3, for the same stratospheric layers as for OB–BG, indicate again that RS41 is less sensitive to SEA than RS92.
Figure 5 shows, and the right column of Table 4 lists, the ECMWF OB–AN differences, and the third data column in Table 3 lists the MAD values for RS92 and RS41. The analysis indicates that RS41 again shows improvement over RS92, but by <0.2 K for HIGH and <0.1 K for ALL in the layers of 51.5 to 26.1 hPa. The RS41 OB–AN curves for NIGHT and HIGH are almost identical except near the surface (not shown in Fig. 5) and in the highest layers. The separation of RS41 LOW and DUSK/DAWN from NIGHT and HIGH in the stratosphere in the lower-left panel of Fig. 5 could be related to small sample sizes for the former cases.
In Table 4, the NIGHT OB–AN ECMWF values are similar to the NIGHT OB–AN CFSR values in the 26.1-hPa layer, but the ECMWF values are roughly the same down to the 51.5-hPa layer while the CFSR values diminish down to that layer. Figure 5 shows statistics similar to Table 4, but for all four SEA categories (and the ALL average) in all layers from 10.2 to 151.2 hPa (the x axis is at 151.2 hPa in each panel). The NIGHT bias for RS92 diminishes at higher pressures, and that for RS41 diminishes toward the highest and lowest pressures. So, in this data period, the ECMWF analysis probably has a cold bias that is less variable with UTLS altitude than the cold bias of CFSR (roughly 0.1 to 0.7 K for CFSR, but 0.1 to 0.2 K for ECMWF), and ECMWF does not show the spike to a warm bias that occurs in CFSR centered on 92.8 hPa, primarily because ECMWF has a much higher vertical resolution than CFSR.
c. Analysis of the OB–COSMIC and OB–GRAS differences
As stated in section 2, Tdry obtained from GPSRO is an independent estimate of temperature with high vertical resolution, but agreement with a single raob is expected to be degraded due to the long GPSRO horizontal signal paths. While GPSRO measurements are available worldwide, they are quite randomly distributed. The local time of an RO occultation is determined by the orbit positions of the receiver and the GPS transmitter. COSMIC occultations are more random in local time, since its satellites are on different orbit planes. The EUMETSAT MetOp-A and MetOp-B satellites are in a fixed orbit plane with local equator crossing times at 0930 and 2130. Thus the local times of GRAS occultations are scattered between about 0800 and 1100 and between about 2000 and 2300 equatorward of 55° latitude, and can be any local time near the poles. GPSRO collocations are allowed within 3 h and 250 km of each raob, so only about 7% (11%) of raobs have a suitable COSMIC (GRAS) GPSRO. Table 5 shows the numbers of COSMIC and GRAS collocations for NIGHT, HIGH, and ALL sun angles, along with other statistics, in the same format as Table 4.
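The collocation criterion can be sketched in Python as a simple time and distance test. This is only a minimal illustration under stated assumptions: the function names `great_circle_km` and `is_collocated`, the spherical-Earth radius, and the use of a single representative latitude and longitude for each profile are choices made here, not details of the matching procedure used in this study.

```python
import math
from datetime import datetime

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for a spherical Earth


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance (km) between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2.0) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def is_collocated(raob_time, raob_lat, raob_lon,
                  ro_time, ro_lat, ro_lon,
                  max_hours=3.0, max_km=250.0):
    """True if the RO profile lies within the time and distance window."""
    dt_hours = abs((raob_time - ro_time).total_seconds()) / 3600.0
    dist_km = great_circle_km(raob_lat, raob_lon, ro_lat, ro_lon)
    return dt_hours <= max_hours and dist_km <= max_km


# Hypothetical raob and occultation: 2 h apart, 1 degree of longitude apart
launch = datetime(2016, 1, 1, 12, 0)
occultation = datetime(2016, 1, 1, 14, 0)
print(is_collocated(launch, 0.0, 0.0, occultation, 0.0, 1.0))
```

Passing `max_hours=1.0` and `max_km=150.0` gives the tighter window that is also examined in this section.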
Figure 6 shows OB–COSMIC differences and standard deviations. OB–COSMIC differences are used to infer radiosonde biases, but differences for both RS92 and RS41 become large at pressures greater than 150 hPa (not shown in Fig. 6) because GPSRO Tdry values are too cold there owing to increasing moisture at lower altitudes.
The RS92 OB–COSMIC curves in Fig. 6 show a slight radiosonde cold bias at night and a daytime warm bias increasing with SEA and altitude, similar to Table 5. This pattern was also obtained using 2008–11 data (Sun et al. 2013). However, the average (ALL SEA) RS92 biases in the UTLS layers for 113.9 to 92.8 hPa are very small, within 0.04 K.
The small RS92 OB–COSMIC difference at night and the increasing daytime warm bias continue upward even to 10 hPa, indicating that the COSMIC Tdry quality may be good even for altitudes higher than 25 km. Nevertheless, we need to be cautious about the bias statistics above 25 km for the reasons mentioned in section 2, in addition to small sample sizes.
Due to the small number of RS41 collocation samples, the curves in the bottom panels of Fig. 6 are noisy, but qualitative conclusions can still be made. Compared to the nearly zero RS92 night OB–COSMIC biases (upper-left panel of Fig. 6), RS41 shows small night warm OB–COSMIC biases, possibly related to the limited RS41 geographical regions. Daytime RS41 warm biases increase with altitude and sun angle, but are smaller than for RS92 (as indicated in Table 5). The smaller MAD values for RS41 than for RS92, shown in the OB–COSMIC column of Table 3, again indicate that RS41 is less sensitive to radiation impact.
Figure 7 shows OB–GRAS differences corresponding to the OB–COSMIC differences in Fig. 6, and the curves are very similar including high noise in RS41 for DAWN/DUSK and LOW due to small sample sizes. The general similarities of corresponding OB–GRAS and OB–COSMIC bias patterns for both RS92 and RS41 add to the confidence that most of the OB–GRAS biases are caused by radiosonde radiation errors. Colder RS41 OB–GRAS than OB–COSMIC biases with NIGHT and DAWN/DUSK sun angles (lower-left panels of both figures), and the similar RS92 HIGH and LOW OB–GRAS biases (Fig. 7, upper left), could be rooted in differing diurnal and geographical COSMIC and GRAS data sampling.
Note that, in Figs. 6 and 7 and Table 5, GRAS has larger standard deviations than COSMIC for both RS92 and RS41, even though the GRAS sample sizes are ~50% larger than for COSMIC. Both GPSRO products used in the analysis are nearly real time. GRAS instrument and level 2 processing was upgraded around October 2016 to include additional quality screening (S. Syndergaard 2018, personal communication), leading to smaller GRAS Tdry standard deviations in about the last third of the analysis period. For example, relative to global radiosonde data (including RS92 and RS41) collocated within 6 h and 250 km, the GRAS SD errors at 37.4 hPa average 1.91 K in February 2015 (0.29 K greater than the SD errors of COSMIC data) but decrease to 1.77 K in February 2017 (comparable to the COSMIC SD error values).
Temporal and spatial collocation mismatches impact the statistics of the observations being compared or validated (Tobin et al. 2006; Sun et al. 2010, 2017). To test the collocation sampling impact, we repeat the computations from Table 5 (collocations within 3 h and 250 km) using collocations within 1 h and 150 km in Table 6. This tighter window reduces the sample sizes by about 80%–90%. However, the OB–COSMIC and OB–GRAS bias patterns for RS92 and RS41 are still similar, although with more noise (figure not shown), and the stratospheric standard deviations in Table 6 are about 15% smaller than in Table 5. At night, some of the small biases change sign between Tables 5 and 6, with the RS41 OB–GRAS difference changing from negative for the large collocation window to positive with the small collocation window. This exercise indicates that the error statistics obtained using GPSRO for RS92 are robust, although there is less confidence in the sign of near-zero differences with small samples, which is the case with RS41 in this study.
Overall, the OB–GPSRO biases suggest that the biases are mainly in the radiosonde data. RS92 has daytime stratospheric warm biases in both OB–COSMIC and OB–GRAS, and RS41 has smaller warm biases than RS92, with more noise in SEA classes with small samples. The nighttime stratospheric OB–COSMIC differences in Table 5 are all within ±0.1 K for both instrument types, but RS41 differences are larger than for RS92. At 26.1 hPa and lower pressures, the diurnal range of raob minus GPSRO is larger than the diurnal range of raob minus NWP; the reasons for this are not currently clear.
d. Analysis of RS41 and RS92 dual-launch data
Table 7 lists the RS92 minus RS41 difference statistics from five dual-launch sites (Table 1) for SEA classes of NIGHT, DAY, and ALL. Due to the small number of launches at all sites, statistics for DAY combine LOW and HIGH SEA. “ALL” is the number of all dual soundings at all sun angles, but as described in the legend for Table 1, Table 7 omits analyses for the DUSK/DAWN category (no dual soundings except from two stations), and for the NIGHT category from Lamont and Lauder (<6 dual soundings). While these are relative differences, and within the limitations of a small number of cases, these comparisons are rigorous measurements that indicate whether RS92 or RS41 has a larger bias.
Special GRUAN data processing (GDP; GRUAN software, version 2) was performed on RS92 data from all sites except Lauder, aiming to remove systematic biases in the data with uncertainty estimates provided (Dirksen et al. 2014). At all sites, RS41 data were processed with standard Vaisala procedures (as at other synoptic stations), including standard Vaisala RS41 corrections.
RS92 agrees well with RS41 in the troposphere except for being <0.1 K colder than RS41 in the middle and upper troposphere at Lindenberg, Lamont, and Lauder (figure not shown; also see Jensen et al. 2016). RS92 shows an increasing warm bias relative to RS41 at higher altitudes. As indicated in Table 7, for layers between 51.5 and 26.1 hPa, RS92 tends to be ~0.05 K warmer than RS41 for NIGHT and is mostly <0.25 K warmer than RS41 for DAY. The RS92 minus RS41 DAY positive differences actually increase with height in the stratosphere through ~13.5 hPa at sites where the data samples are sufficient. Specifically, at 18.6 and 13.5 hPa (respectively), RS92 is warmer than RS41 by 0.36 and 0.51 K at Lindenberg, by 0.37 and 0.41 K at Ny Alesund, and by 0.21 and 0.26 K at Payerne.
A warm bias in the lower stratosphere has been noticed in RS92 with GDP, in comparisons to GPSRO Tdry (Ladstädter et al. 2015). That study pointed out that GDP tends to undercorrect RS92 data, which causes slight warming (less than 0.2 K) compared to synoptic RS92 data containing the operational Vaisala corrections. This suggests that the actual difference between the RS92 and RS41 daytime radiative heating errors may be smaller than the differences found in the direct comparisons, and solar radiative errors in operational data appear to be improved in RS41 relative to RS92, particularly for altitudes above 26.1 hPa. The smaller RS92 minus RS41 difference (by ~0.12 K) at 18.6 and 13.5 hPa at Lauder, where GDP was not applied to the RS92 data, supports this improvement.
The ALL averages in those three lower-stratospheric layers show RS92 is generally <0.2 K warmer than RS41, but since all sites have more day than night profiles, the RS92 minus RS41 differences for ALL are weighted toward DAY. In Table 7, the standard deviations of the RS92 minus RS41 differences at all sites are greater for DAY than for NIGHT, indicating that these two sondes have larger differences during daytime than nighttime [simply, that RS92 has more radiative heating than RS41, consistent with Jensen et al. (2016)]. Table 7 also shows moderate differences among the sites in RS92 minus RS41 NIGHT or DAY difference values. That could be due to the small number of flights and local factors affecting the radiation balance surrounding the sonde such as clouds and surface albedo (Bower and Fitzgibbon 2004).
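The weighting of the ALL average toward DAY follows directly from pooling soundings with unequal class sizes. The short Python sketch below illustrates this with a sample-weighted mean; the function name `pooled_mean`, the class counts, and the class means are hypothetical and chosen only to resemble the magnitudes discussed above, and the DAWN/DUSK class is ignored for simplicity.

```python
def pooled_mean(class_means, class_counts):
    """Sample-weighted mean over classes (e.g., NIGHT and DAY)."""
    total = sum(class_counts.values())
    if total == 0:
        raise ValueError("no samples in any class")
    weighted = sum(class_means[k] * class_counts[k] for k in class_counts)
    return weighted / total


# Hypothetical RS92 minus RS41 class means (K) and dual-sounding counts
means = {"NIGHT": 0.05, "DAY": 0.25}
counts = {"NIGHT": 20, "DAY": 60}

all_mean = pooled_mean(means, counts)
print(round(all_mean, 3))  # lies much closer to the DAY mean than to NIGHT
```

With three times as many DAY as NIGHT soundings, the pooled value sits three-quarters of the way from the NIGHT mean to the DAY mean, which is why ALL statistics should be read together with the class sample sizes.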
5. Summary and discussion
The accuracy of upper-air temperature observations of Vaisala RS92 versus RS41 was assessed using two and a half years (January 2015 to June 2017) of global raobs with ~311 500 RS92 profiles and ~65 800 RS41 profiles. This was achieved using three different data sources as references for comparison, including NWP outputs, GPSRO Tdry, and dual radiosondes. The relative differences of the two sondes are estimated by analyzing observation minus NWP model background (OB–BG) and observation minus analysis (OB–AN) differences, as well as a small number of RS92 and RS41 dual launches (RS92–RS41 differences). GPSRO Tdry profiles are used for direct assessments of radiosonde observations.
These two sondes were found to be in good agreement with average difference <0.1–0.2 K in the lower stratosphere from 51.5 to 26.1 hPa based on globally (but not evenly) distributed stations, and dual-launch data from five different sites. However, the daytime RS92 warm bias relative to RS41 tends to become greater at heights above 26.1 hPa. Further investigation is needed to understand if that tendency is related to radiosonde sample reduction (early balloon burst) or whether RS92 actually has a larger solar radiation error than RS41.
Accuracy of radiosonde pressure measurements can affect the bias statistics, simply by attributing temperatures to incorrect pressures. Radiosonde atmospheric pressure is measured either by direct pressure sensor measurement, or by hydrostatic computation of pressure from reported temperatures and GPS heights. All Vaisala RS92 models have a pressure sensor, but RS41 is available with or without a pressure sensor (instrument codes 123–125 with a pressure sensor, or 141–142 without). According to Vaisala (2013), GPS-based pressures from RS41 are smaller than those from RS92 on average by <0.4 hPa near the surface and <0.02 hPa above 30 hPa, and GPS-based pressures from RS41 are greater than sensor-based pressures from RS92 by <0.3 hPa near the surface and 0.2 hPa above 30 hPa. Those differences are small and within the instrument specification (Vaisala 2013).
By increasing the reported RS92 pressure by 0.15 hPa at 750 hPa to 0.02 hPa at 10 hPa in proportion to the logarithm of pressure, the raob minus Tdry biases for GRAS change slightly. For example, the RS92 bias at 37.4 hPa becomes warmer by 0.04 K for HIGH and colder by 0.02 K for NIGHT. However, the net effect of the uncertainty of pressure measurements using different methods for RS92 versus RS41 may not be significant in the stratosphere where temperature gradients are small.
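One way to construct such a pressure adjustment is to interpolate the offset linearly in the logarithm of pressure between the two stated end points. The Python sketch below assumes exactly that interpretation; the function names `pressure_offset_hpa` and `adjusted_pressure` are hypothetical, and the precise functional form used in this sensitivity test is not specified beyond the description above.

```python
import math


def pressure_offset_hpa(p_hpa, p_low=750.0, off_low=0.15,
                        p_high=10.0, off_high=0.02):
    """Offset (hPa) varying linearly in ln(p) between two anchor levels.

    Assumed anchors: off_low at p_low (750 hPa), off_high at p_high (10 hPa).
    Values outside the anchor range are linearly extrapolated in ln(p).
    """
    if p_hpa <= 0.0:
        raise ValueError("pressure must be positive")
    frac = (math.log(p_low) - math.log(p_hpa)) / (math.log(p_low) - math.log(p_high))
    return off_low + frac * (off_high - off_low)


def adjusted_pressure(p_hpa):
    """Reported pressure plus the assumed log-pressure-dependent offset."""
    return p_hpa + pressure_offset_hpa(p_hpa)


for p in (750.0, 37.4, 10.0):
    print(p, round(pressure_offset_hpa(p), 4))
```

Temperatures would then be reassigned to the adjusted pressures before recomputing the raob minus Tdry differences; that step depends on the vertical interpolation scheme and is not sketched here.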
Previous studies (Sun et al. 2013; Ho et al. 2017) show a radiation-induced bias pattern in RS92 data, with a daytime warm bias increasing with altitude and with SEA. All three comparisons in this study, comparing raobs with NWP model data or GPSRO Tdry profiles, and dual RS92 and RS41 soundings, give similar results for both RS92 and RS41 with RS41 showing less difference in the warm bias with increasing SEA, indicating that RS41 is less sensitive to radiation than RS92.
While model grids are easily interpolated to the raob location and time, the collocation error introduced through spatial and temporal mismatch between GPSRO and a raob could influence the radiosonde accuracy statistics obtained. In this study, when the collocation window was reduced from 3 h and 250 km to 1 h and 150 km, the sample size was reduced by 80%–90%. The pattern and magnitude of the RS92 bias show no pronounced change, but the SD error is systematically reduced. For RS41, which has a much smaller number of launches and collocations with GPSRO Tdry profiles, the reduction in sample size using the tight collocation window does not change the overall night-versus-day bias pattern but causes noisier differences, including some noticeably differing biases or even opposite signs in some SEA classes or pressure layers, because suitable collocations are located in only a few climate regions (Sun et al. 2013). This study should therefore be updated as RS41 observations become more numerous. However, Tradowsky et al. (2017) use NWP model data to reduce the collocation error of individual GPSRO–raob collocations, and their method is recommended to understand the raob bias when the number of collocations available is small.
The RS92 versus RS41 accuracy comparisons obtained using two different near-real-time GPSRO products, UCAR COSMIC and ROM SAF GRAS, are basically consistent, reinforcing the robustness of the sonde bias analysis. However, it is preferable to use reprocessed RO products for analysis in the future, because GPSRO changes cause discontinuities, and reprocessing allows inclusion of delayed data and consistent methods to be applied to a long data record.
NWP model forecast background or analysis biases, originating from earlier assimilation of biased observations or systematic biases in forecast models (Eyre 2016), are generally unknown. This study generally supports the hypothesis that RS92 and RS41 radiosonde biases during nighttime are negligible. Therefore, similar positive OB–BG and OB–AN differences at night (~0.2–0.5 K) in the lower stratosphere with both radiosonde types are inferred to indicate model cold biases rather than persistent nighttime radiosonde warm biases. The stratospheric cold biases in the models are most probably caused by the radiative cooling effect of moisture leaking into the region from the upper troposphere (Shepherd et al. 2018). The spread of differences with sun angle classes indicates superimposed radiosonde daytime radiative biases, but these model biases are likely to be similar at all sun angles and are larger in CFSR than in ECMWF.
We thank Dennis Kaiser and Jack Woollen at NCEP for discussing the operational radiosonde data assimilation processing at NCEP. Comments and suggestions from Bruce Ingleby at ECMWF and Xavier Calbet at AEMET greatly improved the quality of the paper. Comments on the DMI GRAS data characteristics by Stig Syndergaard and Johannes Nielsen at DMI and Axel Von Engeln at EUMETSAT are appreciated. We thank UCAR/CDAAC for COSMIC data, ROM SAF at DMI for GRAS data, the GRUAN Lead Centre at Lindenberg, Germany, for access to the dual-launch radiosonde data, and the agencies and individuals who made the dual launches. The views, opinions, and findings contained in this report are those of the author(s) and should not be construed as an official National Oceanic and Atmospheric Administration or U.S. government position, policy, or decision.