Quest over the Sampling Error of COSMIC Radio Occultation Temperature Climatologies

The sampling error caused by the uneven distribution of radio occultation (RO) profiles in both the space and time domains is an important error source of RO climatologies. In this paper, the sampling error of RO temperature climatologies is investigated using 4 years (2007–10) of data from the Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) mission. The error is divided into three parts: the local time component (LTC), the temporal component (TC), and the spatial component (SC). The characteristics of the three components are investigated. Results show the following: 1) The LTC part of the total sampling error is characterized by a pattern of periodic positive and negative deviations, with a full cycle of about four months. The most significant LTC values are found in the area around 60°N/S and in the polar regions. 2) The TC part is mainly associated with the extent of day-to-day temperature variability and the daily number of RO profiles observed in each month. The most pronounced TC part is found in high-latitude areas in wintertime, where the day-to-day temperature variability is high. 3) The SC part shows distinct features in different altitude ranges. It is characterized by a systematic error in the lower troposphere (2–8 km) but exhibits a seasonal trend in the altitude range from 8 to 40 km. 4) The total sampling error is dominated by the TC and SC parts in the troposphere and lower stratosphere, whereas in the upper stratosphere it is dominated by the LTC part.


Introduction
The Global Navigation Satellite System (GNSS) radio occultation (RO) technique provides atmospheric profiles with global coverage, high vertical resolution, all-weather capability, and long-term stability, which makes them well suited to global climate monitoring. The RO technique has been used since the mid-1960s to probe the atmospheric structure of Venus, Mars, and the outer planets (Eshleman and Fjeldbo 1967; Fjeldbo and Eshleman 1969; Kliore and Woiceshyn 1976). The first RO mission to sound Earth's atmosphere was the Global Positioning System/Meteorology (GPS/MET) experiment, conducted from 1995 to 1997, in which about 150 atmospheric profiles were obtained per day. The GPS/MET mission proved the feasibility of the RO technique for sounding Earth's atmosphere (Kursinski et al. 1996, 1997). The GPS/MET RO measurements were shown to be very effective in revealing the detailed temperature structure near the tropical tropopause, even though only a very limited number of measurements were available (Nishida et al. 2000). The next milestone mission was the Challenging Minisatellite Payload (CHAMP) project, which was launched in July 2000 and remained operational until September 2010 (Ho et al. 2012). From March 2002 onward, about 230 atmospheric profiles per day were obtained from the CHAMP satellite. The number of RO measurements grew substantially with the start of the Constellation Observing System for Meteorology, Ionosphere and Climate (COSMIC) project in 2006. The COSMIC constellation, consisting of six low-Earth-orbit (LEO) satellites, was the first space mission to dedicate a group of satellites to GNSS RO.
It was launched into its parking orbits at 512-km altitude in April 2006; from June 2006 onward, the six satellites were successively deployed into their final orbits at about 800-km altitude, with a separation of 30° in the longitude of the ascending node; FORMOSAT-3 (FM-3) is maintained at an orbit altitude of 711 km because of a solar array drive mechanism problem. During its optimal working time (~2007-10), the COSMIC constellation provided more than 2000 atmospheric profiles per day (Anthes et al. 2008; Fong et al. 2011).
RO atmospheric data have been widely used in global weather and climate studies. On the one hand, they are assimilated into atmospheric reanalyses to reduce temperature biases with respect to other data sources, for example, radiosondes (Poli et al. 2010; Rennie 2010). On the other hand, the large number of RO atmospheric profiles provides the opportunity to establish RO climatologies, which can be used directly in climate monitoring and research (Scherllin-Pirscher et al. 2017; Schmidt et al. 2010; von Engeln et al. 2005; Zhang et al. 2018). RO climatologies refer to gridded data of atmospheric parameters (including temperature, pressure, atmospheric density, etc.) that are established from RO atmospheric profiles. RO climatologies are obtained most simply through binning and averaging of RO profiles (hereinafter the "bin" method). The bin method was first presented in the CHAMPCLIM project, which was dedicated to exploiting the CHAMP RO data in the best possible manner for climate monitoring. In the bin method, Earth's atmosphere is divided into evenly distributed geographical cells (bins) with different latitudinal and longitudinal extensions. The mean temperature of each cell is estimated as the average of all RO measurements inside it, weighted with the cosine of their latitudes (Foelsche et al. 2006).
The GPS/MET, CHAMP, and COSMIC missions, together with other RO projects such as GRACE (2002-17), MetOp-A (2007-15), MetOp-B (2013-15), and FY-3 (from 2015 to present), have provided an RO dataset of about 20 yr, from 2000 to 2020. This creates an opportunity for long-term climate analysis using independent RO measurements. Therefore, how best to use the long-term RO measurements for climatological purposes needs to be investigated, and characterizing the errors of RO-based climatologies becomes a necessary research task.
The error of RO climatologies is a combination of observational errors and sampling errors. The observational error is predominantly associated with the performance of the GNSS receivers on board LEO satellites and with RO data retrieval methods (Kursinski et al. 1997; Ladstadter et al. 2011; Staten and Reichler 2009; Steiner and Kirchengast 2005). This paper focuses on the sampling error, which results from the uneven distribution of RO profiles in both the spatial and temporal domains. The bin method makes it possible to represent atmospheric states based on discrete RO profiles. However, RO measurements are irregularly distributed around the globe, resulting in an unequally weighted sampling of the atmosphere. Thus, even perfect measurements would yield a sampling error in RO climatologies. The sampling error is regarded as an important error source of RO climatologies and has to be considered carefully, especially when only a single LEO satellite is available (Foelsche et al. 2003).
To alleviate sampling error's impact on RO climatologies, Foelsche et al. (2006) proposed a method to estimate RO sampling error by subsampling a reference atmospheric model [e.g., European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis data] to the locations of RO profiles. By subtracting the estimated error from the RO climatologies, most of the sampling error can then be removed, leaving a residual sampling error, which is about 30% of the original values. Next, an analytical error model for the residual sampling error is established based on empirical error estimates (Scherllin-Pirscher et al. 2011a,b). Leroy et al. (2012) explored the method of mapping RO profiles by Bayesian interpolation, which is ideally suited to fit randomly and nonuniformly distributed data.
In this study, the sampling error of the monthly mean RO temperature climatologies based on COSMIC atmospheric profiles is investigated. According to the sources of the sampling errors, the local time component (LTC), temporal component (TC), and spatial component (SC) parts are first categorized. Then, the climatologies from RO measurements are investigated using 4-yr COSMIC RO data during 2007-10. The structure of this paper is outlined below. Section 2 gives the methods of sampling error estimation and the strategies used to separate different contributing components from the total sampling error. Section 3 analyzes the characteristics of these three components and their contributions to RO climatologies. A summary and conclusions are presented in section 4.

Methods
For RO atmospheric profiles, the COSMIC2013 data series provided by the COSMIC Data Analysis and Archive Center (CDAAC) was used. CDAAC provides several types of RO profiles (WetPrf, AtmPrf, etc.). In this study, the WetPrf files (level 2; wet atmospheric parameter profiles from the surface to 40-km altitude) are used. Note that because only the geolocations and times of the RO profiles are used in the sampling error estimation, the sampling error estimates are not affected by which type of RO file is chosen. There are 2 612 779 high-quality atmospheric profiles in total, an average of 1788 profiles per day during the selected data span (2007-10). The 12-h forecasts of ERA-Interim were used as the "truth" fields in the sampling error estimation (Berrisford et al. 2009).

a. Total sampling error estimation
To separate the sampling error from the total error of monthly mean RO climatologies (with the removal of the corresponding observational errors), interpolated profiles are extracted from the truth field (ERA-Interim) by spatial and temporal interpolation. First, based on the time of each RO profile, the two nearest ERA-Interim time layers are selected. Then, the two ERA-Interim fields are spatially interpolated to the geolocation of the corresponding RO profile, yielding two profiles with a time interval of 6 h (ERA-Interim is available at 0000, 0600, 1200, and 1800 UTC). Finally, the two profiles are linearly interpolated to the time of the RO profile, yielding the interpolated profile. RO profiles below 2 km are often unavailable for reasons such as complex surface topography and the high water vapor content of the lower atmosphere (Yu et al. 2017); thus, a 2-km cutoff altitude is used. The interpolated profiles are averaged into longitude-latitude bins of size 60° × 5°, weighted with the cosine of their latitudes, resulting in the proxy RO climatologies $T(\lambda, \theta, h)_{\text{Prof}}$.
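The interpolation steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a regular, ascending latitude-longitude grid at a single height level, uses bilinear interpolation in space (the text does not state which spatial scheme was used), and ignores longitude wrap-around. The function names `bilinear` and `colocate` are hypothetical.

```python
import numpy as np

def bilinear(field, lats, lons, lat, lon):
    """Bilinearly interpolate a 2-D (lat, lon) field to one point.

    Assumes ascending, regularly spaced `lats` and `lons` and a point inside
    the grid; longitude wrap-around is not handled in this sketch.
    """
    i = int(np.clip(np.searchsorted(lats, lat) - 1, 0, len(lats) - 2))
    j = int(np.clip(np.searchsorted(lons, lon) - 1, 0, len(lons) - 2))
    wy = (lat - lats[i]) / (lats[i + 1] - lats[i])
    wx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    return ((1 - wy) * (1 - wx) * field[i, j]
            + (1 - wy) * wx * field[i, j + 1]
            + wy * (1 - wx) * field[i + 1, j]
            + wy * wx * field[i + 1, j + 1])

def colocate(field_t0, field_t1, lats, lons, lat, lon, hours_after_t0, step_h=6.0):
    """Interpolate two bracketing time layers to a profile's place and time.

    field_t0 and field_t1 are the time layers just before and just after the
    profile, `step_h` hours apart (6 h for the analysis times quoted above).
    """
    v0 = bilinear(field_t0, lats, lons, lat, lon)
    v1 = bilinear(field_t1, lats, lons, lat, lon)
    w = hours_after_t0 / step_h  # linear weight in time
    return (1 - w) * v0 + w * v1
```

Repeating `colocate` for every height level and every RO profile would give the set of interpolated profiles that are then binned.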
The reference temperature field is obtained by averaging the temperature profiles (from ERA-Interim) at all grid points inside the fundamental bins and all time layers within each month (also weighted with the cosine of latitude). Then those fundamental bins are averaged into 10° zonal bands, weighted with the area of the corresponding fundamental bin, resulting in the reference climatologies $T(\lambda, \theta, h)_{\text{Ref}}$.
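The area-weighted aggregation of fundamental bins into a zonal band might look like the sketch below. It assumes a spherical Earth, on which the area of a latitude-longitude bin is proportional to its longitude extent times the difference of the sines of its bounding latitudes. The function names and the handling of empty bins (NaN) are illustrative assumptions, not details taken from the text.

```python
import numpy as np

def bin_area_weight(lat_south, lat_north, dlon_deg):
    """Relative area of a lat-lon bin on a unit sphere."""
    dlon = np.deg2rad(dlon_deg)
    return dlon * (np.sin(np.deg2rad(lat_north)) - np.sin(np.deg2rad(lat_south)))

def zonal_band_mean(bin_means, lat_edges, dlon_deg=60.0):
    """Area-weighted mean of fundamental-bin means within one zonal band.

    bin_means has shape (n_lat_rows, n_lon_bins); lat_edges holds the
    n_lat_rows + 1 latitude edges. Bins without data (NaN) are skipped.
    """
    bin_means = np.asarray(bin_means, dtype=float)
    south = np.asarray(lat_edges[:-1], dtype=float)
    north = np.asarray(lat_edges[1:], dtype=float)
    w = bin_area_weight(south, north, dlon_deg)[:, None] * np.ones_like(bin_means)
    valid = ~np.isnan(bin_means)
    return np.sum(np.where(valid, bin_means * w, 0.0)) / np.sum(w * valid)
```

For a 10° band built from two 5° rows, the equatorward row receives slightly more weight because it covers more area.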
Since the proxy RO climatologies $T(\lambda, \theta, h)_{\text{Prof}}$ are calculated by subsampling ERA-Interim at the geolocations and times of the RO profiles, the observational error of the RO measurements is filtered out while the sampling error remains unchanged. Thus, the separated sampling error can be calculated by

$$\Delta T(\lambda, \theta, h)_{\text{Total}} = T(\lambda, \theta, h)_{\text{Prof}} - T(\lambda, \theta, h)_{\text{Ref}},$$

where $\Delta T(\lambda, \theta, h)_{\text{Total}}$ is the total sampling error.
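For a single bin at a single height, the calculation reduces to a cosine-weighted mean of the subsampled values minus the reference mean. The sketch below illustrates this with hypothetical function names and made-up numbers.

```python
import numpy as np

def cos_weighted_mean(values, lats_deg):
    """Mean of profile values weighted by the cosine of each profile's latitude."""
    w = np.cos(np.deg2rad(np.asarray(lats_deg, dtype=float)))
    return np.sum(w * np.asarray(values, dtype=float)) / np.sum(w)

def total_sampling_error(profile_values, profile_lats_deg, reference_mean):
    """Proxy (subsampled) bin mean minus the reference (fully sampled) bin mean."""
    return cos_weighted_mean(profile_values, profile_lats_deg) - reference_mean

# Illustrative values only: two subsampled temperatures (K) and a reference mean (K).
error = total_sampling_error([200.0, 220.0], [0.0, 60.0], 210.0)
```

A negative result means the locations and times at which the profiles were taken under-represent the warmer parts of the bin.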

b. Local time component estimation
The LTC part of the total sampling error originates from the fact that RO atmospheric measurements cannot cover the full 24-h cycle of the atmospheric temperature. For non-sun-synchronous satellites such as CHAMP and COSMIC, the orbits usually take several months to drift through all local times (about 260 days for CHAMP and 120 days for COSMIC). So, at a given latitude, the local times of RO atmospheric profiles are roughly constant over a short period (from days to weeks). Those profiles cannot sample the full diurnal temperature cycle well, resulting in systematic biases in RO climatologies.
To analyze local time influences on CHAMP RO climatologies, Pirscher et al. (2007) suggested a method to separate the LTC part from the total sampling error by randomizing the local times of all RO profiles. In this paper, a new method is proposed. For every RO profile, three additional auxiliary profiles are generated at the same geolocation and on the same calendar day as the original profile, with an interval of 6 h in local time. For example, if an RO measurement takes place at 0830 local time, three auxiliary profiles are set at 0230, 1430, and 2030 local time at the same geolocation and on the same day as the original measurement. Then the ERA-Interim field is interpolated to the times and geolocations of both the original and auxiliary profiles, yielding a new set of interpolated profiles. Based on these profiles, a new temperature field $T(\lambda, \theta, h)_{\text{Prof-1}}$ and its sampling error field $\Delta T(\lambda, \theta, h)_{\text{Prof-1}}$ are calculated using the method described in section 2a.
Since subsampling profiles at 6-h intervals is sufficient to filter out the second harmonic of the diurnal temperature cycle, most of the LTC (about 90%) in the sampling error field $\Delta T(\lambda, \theta, h)_{\text{Prof-1}}$ is removed. On the other hand, since the geolocations and dates of the auxiliary profiles are the same as those of the original profiles, both the TC and SC parts of the total sampling error remain unchanged. Therefore, the separated LTC part can be obtained as the difference between the total sampling error field $\Delta T(\lambda, \theta, h)_{\text{Total}}$ and the new sampling error field $\Delta T(\lambda, \theta, h)_{\text{Prof-1}}$:

$$\Delta T(\lambda, \theta, h)_{\text{LTC}} = \Delta T(\lambda, \theta, h)_{\text{Total}} - \Delta T(\lambda, \theta, h)_{\text{Prof-1}},$$

where $\Delta T(\lambda, \theta, h)_{\text{LTC}}$ is the separated LTC part of the total sampling error.
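The effect of the three auxiliary profiles can be checked with a toy diurnal cycle. The sketch below generates the auxiliary local times and confirms that averaging four samples spaced 6 h apart cancels a signal made of a 24-h and a 12-h harmonic. The amplitudes and peak times are arbitrary assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def auxiliary_local_times(lt_hours, step_h=6.0, n_aux=3):
    """Local times (h) of the auxiliary profiles for one observed profile."""
    return [(lt_hours + k * step_h) % 24.0 for k in range(1, n_aux + 1)]

def diurnal_anomaly(lt_hours, amps=(1.0, 0.4), peaks=(14.0, 3.0)):
    """Toy temperature anomaly (K): 24-h harmonic plus 12-h harmonic.

    Amplitudes and peak local times are illustrative assumptions.
    """
    lt = np.asarray(lt_hours, dtype=float)
    total = np.zeros_like(lt)
    for k, (amp, peak) in enumerate(zip(amps, peaks), start=1):
        total = total + amp * np.cos(2.0 * np.pi * k * (lt - peak) / 24.0)
    return total

observed = 8.5                                   # 0830 local time
aux = auxiliary_local_times(observed)            # 1430, 2030, and 0230 local time
single = float(diurnal_anomaly(observed))        # bias from one local time only
combined = float(np.mean(diurnal_anomaly([observed] + aux)))  # close to zero
```

Harmonics whose period divides 6 h (for example a 6-h harmonic) would not cancel in this way, which is consistent with a small residual being left after the filtering.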

c. Spatial component estimation
The SC part of the total sampling error results from the uneven geographical distribution of RO profiles. For atmospheric parameters that vary significantly with location, it is almost impossible to capture the full spectrum of the location-dependent variability with a limited number of RO measurements. Furthermore, the uneven spatial distribution of RO measurements may lead to an unequally weighted sampling of the atmosphere. As a result, RO-based climatologies will be biased toward the portions of each bin where more RO measurements are recorded. The separated SC part of the total sampling error is also obtained by adding auxiliary profiles to the original set of RO profiles. First, for each RO profile, three additional profiles are added at the same location and on the same day, with a 6-h interval in local time. Then, for each profile (both original and auxiliary), another profile is added on every other day of that month, with the same local time and geolocation. For example, if an RO measurement is recorded at 0830 local time on the first day of January, a total of 123 (31 × 4 − 1) auxiliary profiles will be added at 0230, 0830, 1430, and 2030 local time on each day of that month. After that, the ERA-Interim field is interpolated to the times and locations of all those profiles (original and auxiliary), yielding a new set of interpolated profiles. Then a new temperature field $T(\lambda, \theta, h)_{\text{Prof-2}}$ is established based on those interpolated profiles.
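The bookkeeping for the auxiliary profiles can be sketched as below. The function name `auxiliary_times` is hypothetical, and the sketch assumes observation times given in local time with whole-second resolution.

```python
import calendar
from datetime import datetime, timedelta

def auxiliary_times(obs_local, step_h=6, n_per_day=4):
    """Local times of all auxiliary profiles for one observed profile.

    Four local times per day (6 h apart, wrapped within the day) are generated
    for every day of the observation's month; the observation itself is
    excluded. Assumes `obs_local` has no sub-second component.
    """
    n_days = calendar.monthrange(obs_local.year, obs_local.month)[1]
    sec_of_day = obs_local.hour * 3600 + obs_local.minute * 60 + obs_local.second
    slots = []
    for day in range(1, n_days + 1):
        midnight = datetime(obs_local.year, obs_local.month, day)
        for k in range(n_per_day):
            offset = (sec_of_day + k * step_h * 3600) % 86400
            slots.append(midnight + timedelta(seconds=offset))
    return [t for t in slots if t != obs_local]

aux = auxiliary_times(datetime(2007, 1, 1, 8, 30))
print(len(aux))  # 123 for a 31-day month, matching 31 × 4 − 1
```

Every geolocation then contributes the same number of samples on every day of the month and at the same four local times, so only the spatial weighting of the original profile set is preserved.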
The purpose of the above operations is to remove the LTC and TC parts of the total sampling error from the temperature field $T(\lambda, \theta, h)_{\text{Prof-2}}$. As mentioned in section 2b, the LTC part is filtered out by the interpolated profiles at 6-h intervals. At the same time, the number of interpolated profiles is identical on each day of the month, so the TC part no longer exists. Finally, the SC part remains unchanged because the proportion of profiles at each geolocation is not altered. Therefore, the SC part of the total sampling error can be calculated by

$$\Delta T(\lambda, \theta, h)_{\text{SC}} = T(\lambda, \theta, h)_{\text{Prof-2}} - T(\lambda, \theta, h)_{\text{Ref}},$$

where $\Delta T(\lambda, \theta, h)_{\text{SC}}$ is the separated SC part of the total sampling error. Note that a residual LTC part is included in $\Delta T(\lambda, \theta, h)_{\text{SC}}$ because sampling ERA-Interim at 6-h intervals is not enough to filter out all harmonics of the diurnal cycle. However, since the magnitude of the residual LTC part is very small relative to that of the SC part, it is ignored in this study.

d. Temporal component estimation
The TC part of the total sampling error results from the different numbers of daily RO profiles observed in each month. Since the atmospheric state varies with time, the binning and averaging processes will skew climate averages toward the days where more measurements are recorded, leading to a potentially inaccurate estimation of the atmospheric state. For example, the lack of RO data during one or more weeks in a month will lead to an overestimation/underestimation of temperature in the RO climatologies.
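A small numerical illustration of this skew is sketched below. The numbers are round, illustrative values (a 28-day month with 20 cold days that are sampled more densely than 8 warm days), and the function name is hypothetical.

```python
import numpy as np

def temporal_sampling_bias(daily_mean_temp, daily_counts):
    """Profile-weighted monthly mean minus the equal-weight monthly mean (K)."""
    t = np.asarray(daily_mean_temp, dtype=float)
    n = np.asarray(daily_counts, dtype=float)
    sampled = np.sum(n * t) / np.sum(n)  # each day weighted by its profile count
    return sampled - t.mean()            # reference: every day weighted equally

# Illustrative month: 20 days at 210 K with 210 profiles per day,
# followed by 8 days at 220 K with 170 profiles per day.
temps = [210.0] * 20 + [220.0] * 8
counts = [210] * 20 + [170] * 8
bias = temporal_sampling_bias(temps, counts)  # about -0.41 K
```

With equal daily counts the bias vanishes, so under these assumptions the effect comes from the day-to-day variation in the number of profiles combined with the day-to-day variation in temperature.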
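After an in-depth study of the characteristics of the sampling errors, we found that the total sampling error is a linear superposition of the three components. So, the TC part can be calculated by removing the SC and LTC parts from the total sampling error:

$$\Delta T(\lambda, \theta, h)_{\text{TC}} = \Delta T(\lambda, \theta, h)_{\text{Total}} - \Delta T(\lambda, \theta, h)_{\text{LTC}} - \Delta T(\lambda, \theta, h)_{\text{SC}},$$

where $\Delta T(\lambda, \theta, h)_{\text{TC}}$ is the TC part of the total sampling error.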
To verify the correctness of the methods and further investigate the characteristics of the three components, simulation experiments are performed in the appendix.

Results and discussion
a. Local time component
Figure 1 shows the LTC part of the total sampling error in January 2008 and July 2008 (winter and summer in the Northern Hemisphere, respectively) at different altitudes. As Fig. 1 shows, in the lower atmosphere, significant LTC parts are found at midlatitudes, with extreme values of about ±0.3 K. Between 9 and 23 km, the LTC part decreases with increasing altitude, and its magnitude is, in general, less than 0.2 K. Above 23 km, the summer hemisphere (i.e., the hemisphere where it is summertime) is mainly dominated by negative LTC parts, with a maximum magnitude of about 0.4 K at midlatitudes (30°-60°N/S), whereas in the winter hemisphere (i.e., the hemisphere where it is wintertime), positive values are more pronounced and rarely exceed 0.3 K. According to the altitudinal characteristics of the LTC part, its vertical structure is divided into three ranges: the lower troposphere (2-8 km), the upper troposphere and lower stratosphere (UTLS; 8-24 km), and the upper stratosphere (24-40 km). At a given altitude the distribution of the LTC part is zonally symmetric. Thus, two types of regional mean LTC part are calculated in zonal bands of 10° latitudinal width across the above three ranges:

$$T(\theta)_{\text{LTC}} = \frac{1}{N_{\text{grid}}} \sum_{i=1}^{N_{\text{grid}}} \Delta T(\lambda_i, \theta, h)_{\text{LTC}}, \qquad (4)$$

$$T(\theta)^{\text{ABS}}_{\text{LTC}} = \frac{1}{N_{\text{grid}}} \sum_{i=1}^{N_{\text{grid}}} \left| \Delta T(\lambda_i, \theta, h)_{\text{LTC}} \right|, \qquad (5)$$

where $N_{\text{grid}}$ is the number of bins in each zonal band; $T(\theta)_{\text{LTC}}$ and $T(\theta)^{\text{ABS}}_{\text{LTC}}$ are the zonal mean LTC and its absolute magnitude, respectively; and $\theta$ denotes the latitude of the bins. The time series of these two variables between 2007 and 2010 are presented in Fig. 2. The temporal features of the LTC part are characterized by a pattern of alternating positive and negative deviations with a full cycle of about four months (Figs. 2a,c,e), which is the time it takes for the line of nodes to regress 24 h in local (solar) time. For example, in January 2007, the equatorial crossing solar time of the six COSMIC satellites is about 1300-2300 (ascending node) and 0200-1200 (descending node) local time (Fig. 3a).
As a result, in the Northern Hemisphere, most RO profiles occur at 0000-0900 and 1600-2200 local time (Fig. 3b), so the nighttime carries more weight than the daytime. However, since the apparent motion of the sun relative to the COSMIC satellites' orbits is about 3.33° per day (in their parking orbit at 512-km altitude), after about 60 days, in March 2007 (2007.060-069, where the numbers to the right of the decimal are yeardays), the equatorial crossing time of the COSMIC satellites had moved forward about 13 h to 0000-1000 (ascending node) and 1300-2300 (descending node) local time (Fig. 3c). So, most RO profiles in the Northern Hemisphere are found between 0400 and 2000 local time (Fig. 3d), resulting in a negative deviation. Antisymmetric behavior is found in the Southern Hemisphere because the distribution of RO profiles in local time is antisymmetric with respect to the equator (Figs. 3a,c,e).
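The local time arithmetic in this paragraph can be verified with a few lines. The conversion uses 15° of longitude per hour of local solar time. The 3° per day drift used for the final orbits is not quoted in the text; it is back-computed here from the 120-day cycle and should be read as an assumption.

```python
def local_time_shift_hours(drift_deg_per_day, n_days):
    """Shift of the equator-crossing local time after n_days (15 deg = 1 h)."""
    return drift_deg_per_day * n_days / 15.0

def days_for_full_cycle(drift_deg_per_day):
    """Days needed for the orbital plane to drift through all 24 h of local time."""
    return 360.0 / drift_deg_per_day

shift = local_time_shift_hours(3.33, 60)  # about 13.3 h, the "about 13 h" above
cycle = days_for_full_cycle(3.0)          # 120 days under the assumed 3 deg/day
```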
The LTC part behaved somewhat differently in 2007, when the COSMIC constellation was in its orbit-raising phase. Before 2008, the zonal mean LTC part is more pronounced, and the regions with significant deviations are wider than in the subsequent periods. This change mainly results from the variation of the COSMIC satellites' orbits. At the beginning of 2007, two of the six COSMIC satellites (FM-2 and FM-5) had already been raised into their final orbits; FM-6, FM-4, and FM-1 finished their final deployment in February, May, and November 2007, respectively (FM-3 is maintained at an orbital altitude of 711 km) (Fong et al. 2011). Satellites in separated orbital planes provide a more even distribution of RO profiles over local time (Figs. 3e,f) and consequently lead to smaller LTC values.
FIG. 4. As in Fig. 1, but for the TC part of the total temperature sampling error.
The latitudinal behavior of the LTC part is mainly associated with two factors. First, the latitude of a location determines the ability of RO profiles to sample the diurnal temperature cycle. Because of the orbital characteristics of the COSMIC satellites, in the tropics (20°S-20°N) each satellite crosses the equator twice in one orbital period with a local time interval of 12 h, so that the six satellites in orbital planes separated by 30° can adequately sample the full diurnal temperature cycle within 9-10 days. However, the local time interval between the two times at which each COSMIC satellite crosses the same latitude decreases poleward. Moreover, the ascending and descending branches coincide in the high-latitude region (about 75°N/S; see Figs. 3a,b). So, strictly speaking, in any region other than the equator, the COSMIC satellites cannot sample the diurnal temperature cycle ideally in less than 120 days. In general, the ability of COSMIC RO data to sample the diurnal cycle decreases with increasing latitude.
The other factor that influences the latitudinal behavior of the LTC part is the diurnal temperature range, which is usually defined as the difference between the highest and lowest temperature during the same day. Generally speaking, the higher the diurnal temperature range, the greater the magnitude of the LTC. In the lower troposphere, for example, the diurnal temperature range is small near the equator, peaks at 20°-40°N and 20°-30°S, and then decreases poleward (Geerts 2003). Under the combined effects of the changes in diurnal temperature range and in the sampling ability of COSMIC RO data, in the lower troposphere the magnitude of the LTC part is smallest around the equator and gradually increases poleward until about 60°N/S, where the maximum deviations are reached, and then decreases poleward (Fig. 2a). Some significant deviations are also observed in the Arctic region. These deviations mainly result from the sparse RO measurements in the polar regions, as well as from specific meteorological phenomena such as the polar vortex.
FIG. 5. As in Fig. 2, but for zonal mean TC.

MARCH 2021
SHEN ET AL.
In the UTLS region and the upper stratosphere, the overall characteristics of the LTC part are similar to those in the lower troposphere, but its magnitude varies greatly. In the UTLS region, the zonal mean LTC part is, in general, smaller than ±0.1 K (Fig. 2c). However, in the upper stratosphere, the LTC part becomes more pronounced, with maximum zonal mean values of ~±0.2 K (Fig. 2e). Since the local times of RO profiles do not change with altitude, those variations are mainly attributed to the different extent of the diurnal temperature variations in different height ranges (Seidel et al. 2005).
The absolute magnitudes of the LTC part display a wavelike structure (Figs. 2b,d,f). The absolute magnitudes are always more pronounced in the winter hemisphere than in the summer hemisphere, which mainly results from the large diurnal temperature range during wintertime. In the UTLS region and the upper stratosphere, some significant deviations are also observed in the tropics. Those characteristics mainly result from the seasonal and altitudinal changes in the diurnal temperature variation.

b. Temporal component
Figure 4 shows the TC part of the total sampling error in January 2008 and July 2008. In the lower troposphere, significant TC values are observed at midlatitudes and high latitudes in both hemispheres, with an extreme value of about ±1 K. In the altitude range from 9 to 23 km, the magnitude of the TC part decreases with increasing altitude, and the maximum deviations are about ±1.5 K. Above 23 km, the values of the TC part in the summer hemisphere decrease gradually, while in the winter hemisphere they increase with altitude and reach about ±3 K at 38-km altitude.
Zonal mean TC part and its absolute magnitude are calculated using Eqs. (4) and (5), where $\Delta T(\lambda, \theta, h)_{\text{LTC}}$ is replaced by $\Delta T(\lambda, \theta, h)_{\text{TC}}$. The results are presented in Fig. 5.
The characteristics of the TC part are strongly associated with the intensity of day-to-day temperature variation. To investigate the relationship between the TC part and the day-to-day temperature variability, we calculate the monthly temperature differences, which are defined as the difference between the maximum and minimum daily mean temperature in each month (based on ERA-Interim). The results are presented in Fig. 6.
FIG. 6. The monthly temperature differences (K) in the (a) lower troposphere, (b) UTLS region, and (c) upper stratosphere. Also shown are (d) the daily mean temperature (K) in the UTLS high-latitude region in February 2007-10 (the red, green, black, and blue lines, respectively) and the daily number of RO profiles in the corresponding region and time (the red, green, black, and blue dots, respectively).
In the lower troposphere, the zonal mean TC parts show a random pattern of negative and positive values, with most values within ±1 K (Fig. 5a). That is predominantly attributed to the different number of RO profiles on each day of the corresponding month. The number of RO profiles is influenced by many factors, such as the tracking capabilities of GNSS receivers on LEO satellites, the performance of ground systems, and RO data retrieval methods. So, the number of RO profiles shows no apparent regularity within a month, and the sign of the TC part is therefore not predictable.
In the UTLS region, the TC part shows characteristics similar to those of the monthly temperature differences (Figs. 6d,b). In general, the larger the monthly temperature differences are, the greater the TC part will be. At midlatitudes and low latitudes, the monthly temperature differences stay within 5 K, and values of the TC part are smaller than 0.5 K in most places. At high latitudes, some notable negative TC parts are shown during the first quarter (January-March), with extreme values of approximately −3 K. To investigate the reasons for those notable TC values, we calculated the daily mean temperature and the daily number of RO profiles in the UTLS high-latitude region in February 2007-10. The results are shown in Fig. 6d. An interesting fact is that more RO measurements are recorded during cold periods than during warm periods. For example, in the first 20 days of February 2007, the daily mean temperature in the UTLS region is about 208-212 K (the red line in Fig. 6d), and the daily number of RO profiles is ~210 per day (the red circles in Fig. 6d); after the twentieth day, the daily mean temperature increases to about 220 K and the number of RO profiles drops to about 170 per day. Similar situations are also found in 2008-10 (Fig. 6d). The reasons for that situation are beyond the scope of this paper, but the consequence is clear: the days with lower temperatures carry more weight than the warmer days, resulting in significant negative TC parts. In the upper stratosphere, the characteristics of the TC part are similar to those in the UTLS region, but its magnitudes are smaller (Fig. 5e).

c. Spatial component
Figure 7 shows the SC part of the total sampling error in January 2008 and July 2008. As Fig. 7 shows, in the lower atmosphere, some significant deviations are observed at high latitudes, with a maximum value of about ±1 K. With increasing altitude, the magnitude of the SC part gradually decreases. In the altitude range from 9 to 23 km, the values of the zonal mean SC part rarely exceed ±0.5 K. Above 23 km, the SC part in the summer hemisphere decreases with increasing altitude, whereas in the winter hemisphere it increases rapidly, with extreme values of ~±0.8 K.
FIG. 7. As in Fig. 1, but for the SC part of the total temperature sampling error.
The time series of the zonal mean SC part and its absolute magnitude are presented in Fig. 8. The characteristics of the SC part are associated with two factors:

1) The geographical distribution of RO profiles. According to the orbital characteristics of COSMIC satellites, the geographical distribution of COSMIC atmospheric profiles is symmetric with respect to the equator. Figure  positive deviations are observed around 50° and 25°N/S, respectively. In the tropics, the SC part is relatively small because the north-south temperature gradient is quite weak.
In the UTLS region, the meridional structure of temperature shows seasonal trends. In the winter hemisphere, the zonal mean temperature is highest in the polar regions and decreases uniformly from pole to equator; in the summer hemisphere, the temperature rises from a minimum at the equator to a peak at about 40°-50° latitude and then drops to another minimum in the polar regions (Figs. 9b,c). As a result, the SC part exhibits an annual cycle: during wintertime, the SC part is dominated by negative values, and only some slightly positive deviations appear around 50°N/S. In summertime, positive deviations are more pronounced than negative deviations. At low latitudes, the values of the zonal mean SC part are relatively small (Fig. 8c).
In the upper stratosphere, the distribution of temperature is similar to that of the UTLS in most places, although noticeable differences are found at high latitudes, especially in the Southern Hemisphere. There, the positive SC deviations are larger than in the UTLS region, and the areas with positive deviations become wider.

d. Total sampling error
The distribution of the total sampling error in January 2008 and July 2008 is presented in Fig. 10. As Fig. 10 shows, in the lower troposphere the total sampling error in the winter hemisphere is slightly larger than that in the summer hemisphere. The extreme values are about ±2 K and are observed at midlatitudes and high latitudes. Around the tropopause (9-16 km), the magnitudes of the total sampling error in the two hemispheres are similar, and the values fall within ±1 K in most places. In the stratosphere, the total sampling error in the summer hemisphere decreases with increasing altitude; above 20 km, the magnitude of the total sampling error rarely exceeds 0.5 K. In the winter hemisphere, however, the deviation increases with altitude, and the maximum values exceed about ±2 K at 37-km altitude, especially in the Northern Hemisphere.
The time series of the zonal mean total sampling error are presented in Fig. 11. In the lower troposphere (Figs. 11a,b), the total sampling error shows both positive and negative deviations at high latitudes, but the negative parts are more pronounced than the positive parts. At midlatitudes and in the subtropics, the zonal mean series has a noticeable negative jump around 50°N/S, which is mainly attributed to the SC part. In the tropics, the zonal mean total sampling error is close to zero (within ±0.2 K).
In the UTLS region (Figs. 11c,d), at high latitudes the most significant total sampling error is observed in wintertime, with extreme values of about ±1.5 K; in other seasons, the errors are mainly negative, and their magnitudes are within ±0.5 K. At midlatitudes and in the subtropics, the zonal mean total sampling error is dominated by negative values, but some slight positive deviations emerge around 50°S. In the tropics, the total sampling error is very small (within ±0.1 K).
In the upper stratosphere (Figs. 11e,f), the maximum zonal mean error at high latitudes is approximately −1.5 K in wintertime but decreases to ±0.5 K for the rest of the year. At midlatitudes and in the subtropics, the error shows a pattern of episodic positive and negative deviations, with extreme values of about ±0.5 K. In the tropics, the magnitudes of the zonal mean total sampling error are less than 0.3 K.
The features of the total sampling error of COSMIC RO climatologies presented here show some differences from previous studies (Foelsche et al. 2007; Pirscher 2010). These differences mainly result from the different orbital characteristics of the COSMIC satellites (compared to other RO missions). The most significant differences are found in the tropics of the lower troposphere, where a "dry sampling error" has been reported. In a moist atmosphere, RO atmospheric profiles tend to stop at higher altitudes, and the lowest part of RO ensembles is therefore biased toward dry conditions, resulting in a systematic positive deviation in RO climatologies, which is the dry sampling error (Foelsche et al. 2006). The dry sampling error has an evident influence on RO climatologies based on earlier RO missions (e.g., CHAMP), but not on COSMIC RO-based climatologies. This difference probably results from the COSMIC mission adopting an open-loop tracking technique, which results in significantly deeper penetration of RO profiles below 8-km altitude (Anthes et al. 2008). In the lower troposphere, the number of COSMIC RO profiles is much less affected by humidity than that of earlier RO missions. At the same time, the temperature variability in the tropical lower troposphere is very weak in both the temporal and spatial domains. The total sampling error of COSMIC RO climatologies is therefore very small in such regions.
FIG. 10. As in Fig. 1, but for the total temperature sampling error.
To further investigate the three components' contributions to the total sampling error, the 4-yr averages of all those errors in different height ranges and latitude bands are given in Table 1, together with their absolute magnitudes. The zonal mean total sampling error is equal to the sum of the zonal mean LTC, TC, and SC parts. However, the absolute values of the total sampling error are always smaller than the sum of the absolute values of the three components. That is understandable because components with opposite signs partly cancel and thus reduce the magnitude of the total sampling error.
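The cancellation described above amounts to the triangle inequality. The short relation below is only a sketch, assuming that the absolute magnitudes are built from the absolute values of the individual errors; the numbers in the example are purely illustrative and are not taken from Table 1.

$$\left|\mathrm{LTC} + \mathrm{TC} + \mathrm{SC}\right| \le \left|\mathrm{LTC}\right| + \left|\mathrm{TC}\right| + \left|\mathrm{SC}\right|$$

For instance, with hypothetical values LTC = 0.05 K, TC = −0.20 K, and SC = 0.10 K, the total is −0.05 K, so its magnitude (0.05 K) is well below the sum of the component magnitudes (0.35 K). Equality would hold only if all three components had the same sign.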
The absolute magnitudes of the LTC part are relatively small compared to the other two components. The largest magnitude of LTC, 0.09 K, is observed in the upper stratosphere, where the diurnal temperature variability is strong. The zonal mean LTC part is also very small, with most values between −0.07 and 0.00 K, because the periodic positive and negative values average out over the 4-yr period. The influence of the LTC part on the total sampling error is not evident in the lower troposphere and UTLS regions. In the upper stratosphere, however, the LTC's influence becomes more pronounced because significant TC and SC parts occur only at high latitudes in wintertime (Fig. 11e).
The TC part has the largest absolute magnitude among the three components. In most instances, its values are about 4 times those of LTC and 2 times those of SC. The zonal mean TC part is relatively small, with most values between −0.1 and 0.1 K. The most significant zonal mean TC parts are found in the northern high latitudes of the UTLS region and the upper stratosphere (−0.4427 and −0.2619 K, respectively). That can be attributed to the notable negative TC parts during the first quarter, which were explained in section 3b.
The absolute magnitudes of the SC part are larger than those of the LTC part but smaller than those of the TC part. The sole exception is found in the upper stratosphere, where the LTC part is slightly higher than the SC part. The largest SC part is found in the lower troposphere, where the north-south temperature gradient is relatively strong. The influence of the SC part on the total sampling error is clearly shown in the lower troposphere, especially at midlatitudes and low latitudes.
FIG. 11. As in Fig. 2, but for the zonal mean total sampling error.

Summary and conclusions
The sampling error of the COSMIC RO climatologies from 2007 to 2010 is investigated in this paper. According to its sources, the error is divided into three parts: LTC, TC, and SC. From the characteristics of these three components and their contributions to the total sampling error, our main conclusions are summarized as follows:
1) The LTC part of the total sampling error shows a pattern of periodic positive and negative deviations, with a full cycle of about four months, which mainly results from nodal regression of the COSMIC satellites: the COSMIC configuration is spread over only 180° in ascending node, leaving enormous gores in local time coverage at high latitudes in each hemisphere. Under the combined effects of the diurnal temperature range and the sampling ability of COSMIC RO data, the LTC part is weakest in the tropics and most pronounced around 60°N/S. Compared to the other two components, the magnitude of the LTC part is relatively small, but it still has a nonnegligible influence on RO climatologies, especially in the upper stratosphere. Because of the periodic pattern of the LTC part, a suitable temporal resolution (e.g., seasonal or annual means) can effectively reduce its influence on RO climatologies.
2) The TC part is mainly associated with the extent of day-to-day temperature variability and the daily number of RO profiles observed in each month. In the lower troposphere, the magnitude of the TC part is relatively small, whereas in the UTLS region and the upper stratosphere, significant deviations are observed at high latitudes in wintertime. The TC part's influence on the total sampling error is the most pronounced among the three components. A longer averaging period of RO profiles can reduce the deviation but will weaken the ability of RO climatologies to detect short-term variations in the atmosphere.
3) The SC part of the total sampling error is associated with the geographical distribution of RO measurements and the meridional gradients of atmospheric temperature. The SC part shows distinctive characteristics in different height ranges. In the lower troposphere, it is characterized by a pattern of systematic deviations. In the UTLS region and the upper stratosphere, the SC part displays a seasonal trend. The SC part's influence on the total sampling error is most evident in the lower troposphere and less pronounced in the UTLS region and the upper stratosphere.
4) The total sampling error is more pronounced in the upper stratosphere and the lower troposphere than in the UTLS region, more pronounced in wintertime than in summertime, and more pronounced at high latitudes than at low and midlatitudes. The magnitude of the total sampling error is also associated with the resolution of RO climatologies. In our study, with a spatial-temporal resolution of 60° × 5° (longitude × latitude) and one month, its extreme values are ±2.5 K. So, even though most of the deviation can be removed by the sampling error estimation process, the residual error cannot be neglected and should be considered carefully.

Simulation Experiments
To further investigate the characteristics of the sampling errors in RO climatologies, experiments are performed for three cases in which simulated RO measurements over a 3-day period have different spatial-temporal distributions. In the setup of RO climatologies, the atmospheric state in a geographic bin is estimated by averaging all RO measurements in the bin with certain latitudinal and longitudinal extensions. To simplify the calculation, we make the following two assumptions: (i) a geographic bin is composed of two equal-area regions named A and B, and the atmospheric temperature is uniform within each region at any given time (because atmospheric temperature varies slowly over short distances, e.g., 20 km); (ii) atmospheric temperature in each region changes once each day at 1200 local time. The evolution of the temperature over the 3-day period is shown in Fig. A1. The spatial-temporal distributions of the simulated measurements in the three cases are 1) only one temperature measurement located in region A in the morning of the first day (the red square in Fig. A1), 2) one temperature measurement per day in region A (the green triangles), and 3) multiple temperature measurements randomly distributed in both regions A and B (the blue triangles).
The sampling errors (including the total sampling error, LTC, TC, and SC parts) of the measurements for each of the three cases are calculated using the methods introduced in section 2. Results are shown in Table A1, from which the characteristics of the sampling errors from the three different sampling conditions can be investigated.
The findings from Table A1 are as follows:
1) The total sampling error is equal to the difference between the average of all the simulated measurements in the bin (regions A and B together) and the "true" mean temperature of the geographic bin (the average of all temperature values in the geographic bin):
$$\text{total sampling error} = \frac{1}{n}\sum_{i=1}^{n} T_i - T_{\mathrm{bin}}, \tag{A1}$$
where $n$ is the number of all simulated temperature measurements in each case, $T_i$ is the value of the $i$th temperature measurement, and $T_{\mathrm{bin}}$ is the true mean temperature of the geographic bin.
2) With regard to the LTC part, in case 1, that is, only one temperature measurement simulated in region A, the LTC part is equal to the difference between the temperature measurement and the true daily mean temperature on the first day in region A. In case 2 and case 3, in which more measurements are recorded in the 3-day period, the LTC part is equal to the average of the differences between each measurement and the daily mean temperature on the same day and in the same region as the measurement. All three of these cases can be generalized as
$$\mathrm{LTC} = \frac{1}{n}\sum_{i=1}^{n}\left(T_i - T_{\mathrm{day,region}}\right), \tag{A2}$$
where $T_{\mathrm{day,region}}$ denotes the daily mean temperature on the same day in the same region as $T_i$.
3) With regard to the TC part, in case 1, the value of the TC part is equal to the difference between the daily mean temperature on the first day in region A and the 3-day mean temperature of region A; in case 2, in which three simulated measurements are evenly distributed in region A, the TC part is equal to the average of the differences between each daily mean temperature in region A and the 3-day mean temperature of region A; in case 3, that is, eight measurements distributed in both of the two regions, the TC component is equal to the average of the differences between every daily mean temperature in each of the two regions and the 3-day mean temperature of the same region, weighted by the number of measurements on each day in the region. Therefore, the TC part can be calculated by
$$\mathrm{TC} = \sum_{\mathrm{day}=1}^{N_1}\sum_{\mathrm{region}=1}^{N_2}\left(T_{\mathrm{day,region}} - T_{\mathrm{region}}\right)\frac{n_{\mathrm{day,region}}}{n}, \tag{A3}$$
where $N_1$ and $N_2$ denote the numbers of days and regions, and their values in the above cases are 3 and 2, respectively. The $T_{\mathrm{region}}$ is the 3-day mean temperature in the region, and $n_{\mathrm{day,region}}$ denotes the number of measurements on the day in the region (e.g., in case 3, the values of $n_{1,A}$ and $n_{2,B}$ are 0 and 2, respectively).
4) With regard to the SC part, in cases 1 and 2, the SC part is equal to the difference between the 3-day mean temperature of region A and the mean temperature of the geographic bin; in case 3, the SC part is equal to the mean of the differences between the 3-day mean temperature in each of the two regions and the mean temperature of the bin. So the SC part can be obtained by
$$\mathrm{SC} = \sum_{\mathrm{region}=1}^{N_2}\left(T_{\mathrm{region}} - T_{\mathrm{bin}}\right)\frac{n_{\mathrm{region}}}{n}, \tag{A4}$$
where $n_{\mathrm{region}}$ is the number of the measurements in the region (e.g., in case 2, the values of $n_A$ and $n_B$ are 3 and 0, respectively).
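As a minimal sketch of Eqs. (A1)-(A4), the Python snippet below decomposes the total sampling error for a toy two-region, 3-day bin. The temperature values, the sample placements, and the names `decompose`, `truth`, and `cases` are hypothetical and are not taken from Fig. A1 or Table A1. The sketch assumes equal-area regions and two equal-length time slots per day (before and after 1200 local time).

```python
from collections import Counter
from statistics import fmean


def decompose(truth, samples):
    """Return (total, LTC, TC, SC) following Eqs. (A1)-(A4).

    truth[region][day] lists the true temperature in each equal-length time
    slot of that day; samples is a list of (region, day, slot) tuples.
    Regions are assumed to have equal area.
    """
    n = len(samples)
    # Daily mean per (region, day), multi-day mean per region, and bin mean.
    t_day = {(r, d): fmean(slots)
             for r, days in truth.items() for d, slots in enumerate(days)}
    t_region = {r: fmean([t_day[r, d] for d in range(len(days))])
                for r, days in truth.items()}
    t_bin = fmean(list(t_region.values()))

    # Number of measurements per (region, day) and per region.
    n_day_region = Counter((r, d) for r, d, _ in samples)
    n_region = Counter(r for r, _, _ in samples)

    total = fmean([truth[r][d][s] for r, d, s in samples]) - t_bin        # (A1)
    ltc = fmean([truth[r][d][s] - t_day[r, d] for r, d, s in samples])    # (A2)
    tc = sum((t_day[r, d] - t_region[r]) * c
             for (r, d), c in n_day_region.items()) / n                   # (A3)
    sc = sum((t_region[r] - t_bin) * c for r, c in n_region.items()) / n  # (A4)
    return total, ltc, tc, sc


# Hypothetical temperatures: [morning, afternoon] for days 1-3 in each region.
truth = {
    "A": [[10, 14], [13, 17], [11, 13]],
    "B": [[13, 17], [15, 19], [15, 17]],
}
cases = {
    "case 1": [("A", 0, 0)],                            # one morning sample
    "case 2": [("A", 0, 0), ("A", 1, 0), ("A", 2, 0)],  # one sample per day
    "case 3": [("A", 1, 0), ("A", 1, 1), ("A", 2, 0), ("B", 0, 1),
               ("B", 1, 0), ("B", 1, 1), ("B", 2, 0), ("B", 2, 1)],
}
for name, samples in cases.items():
    total, ltc, tc, sc = decompose(truth, samples)
    # The three components should add up to the total sampling error.
    assert abs(total - (ltc + tc + sc)) < 1e-9
    print(name, round(total, 3), round(ltc, 3), round(tc, 3), round(sc, 3))
```

In this toy setup, the TC part vanishes in case 2 because region A is sampled once on every day, while the SC part is identical in cases 1 and 2 because both sample only region A.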
To verify the correctness of the above findings, an additional case, case 4, is introduced. In this case, the geographic bin is divided into four equal-area regions A, B, C, and D, the temperature changes four times each day at 0600, 1200, 1800, and 2400 local time, and 20 simulated measurements are randomly distributed in these four regions (see the blue squares in Fig. A2).
The first set of sampling errors for case 4 is calculated using the method introduced in section 2, and the results for the total sampling error and the LTC, TC, and SC components are 0.26, 0.28, −0.07, and 0.06, respectively. Then, another set of sampling errors is calculated using Eqs. (A1)-(A4). The two sets of sampling errors are compared and are found to be the same. For further verification of the above findings, we also use more complex models to simulate temperature measurements, such as a variable-amplitude sinusoidal function, and the same results are obtained (not shown here).
In RO climatologies, each geographic bin can be regarded as an area composed of an infinite number of small regions, and the establishment of RO climatologies is a process of estimating the atmospheric state with a limited number of measurements in the regions. Ideally, the true monthly mean temperature of a geographic bin can be obtained from the following steps: 1) the daily mean temperature of each region is calculated by averaging the atmospheric temperature at all local times on that day, 2) the monthly mean temperature of each region is obtained by averaging all daily mean temperatures of that region in that month, and 3) the monthly mean temperature of the geographic bin is calculated by averaging the monthly mean temperatures of all regions in the bin (note: the order of the above three steps can be different). In practice, however, because RO measurements are limited in number and unevenly distributed, none of the three averages can be obtained exactly. The errors resulting from the three steps are the aforementioned LTC, TC, and SC components, the sum of which is the total sampling error. This can also be proved using Eqs. (A1)-(A4).
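The three averaging steps, and the remark that their order can be changed, can be illustrated with a small sketch. The snippet below assumes a complete, regularly gridded temperature field with equal-area regions and equal-length time slots; the values and the names `field`, `bin_mean_time_first`, and `bin_mean_space_first` are hypothetical.

```python
from statistics import fmean


def bin_mean_time_first(field):
    """Steps 1-3 in the order given: local time, then days, then regions."""
    daily = [[fmean(slots) for slots in days] for days in field]  # step 1
    monthly = [fmean(days) for days in daily]                     # step 2
    return fmean(monthly)                                         # step 3


def bin_mean_space_first(field):
    """Same average with the order changed: regions first, then time."""
    n_regions, n_days, n_slots = len(field), len(field[0]), len(field[0][0])
    daily = []
    for d in range(n_days):
        # Average over regions at each time slot, then over the slots of the day.
        slot_means = [fmean([field[r][d][s] for r in range(n_regions)])
                      for s in range(n_slots)]
        daily.append(fmean(slot_means))
    return fmean(daily)


# Hypothetical complete field: field[region][day][slot], 2 regions x 3 days x 4 slots.
field = [
    [[250.0, 252.0, 255.0, 251.0], [249.0, 251.5, 254.0, 250.5], [251.0, 253.0, 256.0, 252.0]],
    [[248.0, 250.0, 253.5, 249.0], [247.5, 249.0, 252.0, 248.5], [249.0, 251.0, 254.5, 250.0]],
]
mean_a = bin_mean_time_first(field)
mean_b = bin_mean_space_first(field)
assert abs(mean_a - mean_b) < 1e-9  # the order of averaging does not matter here
```

The two orders agree only because every region, day, and time slot is present with equal weight. With a limited and uneven set of measurements, each averaging step is incomplete, which is where the LTC, TC, and SC components arise.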
Furthermore, the influence of the spatial-temporal evolution of atmospheric temperature on the sampling errors can be analyzed. For example, a large LTC part is more likely to occur in regions with large diurnal temperature variation, which may lead to a larger difference between measurements and the daily mean temperature ($T_i - T_{\mathrm{day,region}}$); a large TC part is always found in regions with strong day-to-day temperature variability, since the difference between the daily mean and regional mean temperature ($T_{\mathrm{day,region}} - T_{\mathrm{region}}$) is relatively large in these regions. The above conclusions are consistent with the results in section 3.
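The dependence of the LTC part on diurnal temperature variation can be illustrated with a small sketch. The snippet below assumes an idealized single-harmonic diurnal cycle, which is a simplification chosen here for illustration and not the model used in this study; the function names, the 250-K mean, the peak hour, and the sampling hours are all hypothetical.

```python
import math
from statistics import fmean


def diurnal_temperature(hour, amplitude, mean=250.0, peak_hour=14.0):
    """Idealized diurnal cycle (an assumption made for this sketch)."""
    return mean + amplitude * math.cos(2.0 * math.pi * (hour - peak_hour) / 24.0)


def ltc_for_sampling(sample_hours, amplitude):
    """LTC of samples taken at the given local hours within one day."""
    # Daily mean from hourly values over one full cycle.
    daily_mean = fmean([diurnal_temperature(h, amplitude) for h in range(24)])
    return fmean([diurnal_temperature(h, amplitude) - daily_mean
                  for h in sample_hours])


afternoon_only = [13, 14, 15]  # hypothetical uneven local-time coverage
even_coverage = [0, 6, 12, 18]   # hypothetical even local-time coverage

small = ltc_for_sampling(afternoon_only, amplitude=0.5)
large = ltc_for_sampling(afternoon_only, amplitude=2.0)
balanced = ltc_for_sampling(even_coverage, amplitude=2.0)
assert abs(large - 4.0 * small) < 1e-9  # LTC scales with the diurnal amplitude
assert abs(balanced) < 1e-9             # even local-time coverage cancels it
```

Under these assumptions, the LTC part grows in proportion to the diurnal amplitude when the local-time coverage is uneven and vanishes when the samples are spread evenly over the day.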