## Abstract

A principal component analysis (PCA) method is applied to Challenging Minisatellite Payload (CHAMP) level-2 radio occultation (RO) observations and the corresponding global analyses from the National Centers for Environmental Prediction (NCEP) in March 2004. The PCA is performed on a square symmetric vertical correlation matrix of observed or modeled RO profiles. By decomposing the matrix into pairs of loadings (EOFs) and associated principal components (PCs), outliers are identified and important modes that explain most variances of the vertical variability of the atmosphere as represented by the GPS RO data and the NCEP analyses are extracted and compared. Specifically, a quality control of RO data based on Hotelling’s *T* ^{2} index is applied first, which removes 255 RO profiles from 4884 total profiles (about 5%) and smoothes the distributions of PC modes, making the remaining GPS RO dataset much more meaningful. The leading PC mode for global refractivity explains 60% of the total variance and is associated with a symmetric zonal pattern, with positive anomalies in the Tropics and negative anomalies at the two poles. The second PC mode explains an additional 16% of the total variance and shows a dipole pattern with positive anomalies in the North Pole and negative anomalies in the South Pole. Three significant positive anomalies are also found in the second and third PC modes over three predominant convective areas in the western Pacific, South America, and Africa in the Tropics. The first leading PC mode calculated from global NCEP analyses compared favorably with that from CHAMP observations, which proves that NCEP analyses are capable of representing most of the variance of the atmospheric profiles. However, disagreements between CHAMP observations and NCEP analyses are noticed in the second EOF over the Tropics and the Southern Hemisphere (SH). 
It is also found that the NCEP analyses describe CHAMP-observed larger vertical scale features better than smaller-scale features, capture features of more leading EOF modes in the Northern Hemisphere than in the SH and the Tropics, and do not capture the vertical structures revealed by the EOFs in CHAMP observations near and above the tropopause in the Tropics.

## 1. Introduction

During the past decade, the radio occultation (RO) technique has been applied for limb sounding of the earth’s atmosphere with the establishment of the global positioning system (GPS) constellation (Yunck et al. 2000). A number of GPS RO missions, such as GPS/Meteorology (GPS/MET), the German–U.S. Challenging Minisatellite Payload (CHAMP), the Argentinian Satélite de Aplicaciones Cientificas-C, and so on, have successfully demonstrated the feasibility and accuracy of the RO technique for sounding the earth’s atmosphere (Ware et al. 1996; Kursinski et al. 1996; Rocken et al. 1997; Hajj et al. 2002, 2004). Having a high vertical resolution of 0.1–0.5 km in the neutral atmosphere under all weather conditions with long-term stability and global coverage, GPS ROs have proven to have unique applications in numerical weather prediction (NWP), and climate and space weather research (Eyre 1994; Kuo et al. 2000; Hajj et al. 2000; Steiner and Kirchengast 2000; Tsuda et al. 2000; Zeng et al. 2002).

A GPS RO measures the phase delay of the radio waves transmitted from a GPS satellite occulted by the earth as viewed from a low-earth-orbiting satellite. Based on the observation geometry of occultations, vertical profiles of bending angle and atmospheric refractivity are derived under the local spherical symmetry assumption of the atmosphere using Snell’s law and Abel inversion (Kursinski et al. 1997). Various data assimilation strategies for RO bending angle and refractivity data have been studied comprehensively (Eyre 1994; Zou et al. 1995; Zou et al. 1999; Kuo et al. 2000; Healy et al. 2005). In this study, a principal component analysis (PCA) method is applied to CHAMP level-2 RO bending angle and refractivity data of March 2004 to remove outliers in the CHAMP data, and to compare CHAMP data with the National Centers for Environmental Prediction (NCEP) analyses in terms of the leading EOFs that represent most of the vertical variability of the observed and modeled atmosphere, in order to gain further insight into how closely the GPS RO profiles are represented by the current large-scale global analysis.

PCA is a classical multivariate statistical method that has been widely applied for analyzing the spatial and temporal variability of geophysical fields (Preisendorfer 1988; Bjornsson and Venegas 1997; Jolliffe 2002). It is commonly used for two purposes: 1) to reduce the number of variables and 2) to detect structure in the relationships between variables. It has become indispensable in extracting essential information from massive datasets (von Storch and Zwiers 1999). The main motivation for applying PCA to RO data is to exploit these two capabilities. It is hoped that PCA will provide new insights into both the atmospheric structures revealed from RO data and how accurately these structures are represented by large-scale analyses.

The paper is organized as follows. Brief descriptions of the datasets and PCA equations are given in section 2. Identification of outliers in the RO bending angle and refractivity profiles using PCA results is described in section 3. Section 4 shows the variances and spatial structures of RO observations revealed by PCA analysis, which are compared with those from the NCEP analyses. A summary and conclusions are given in section 5.

## 2. CHAMP data and the PCA method

### a. CHAMP data

We use CHAMP observations in March 2004 published by the University Corporation for Atmospheric Research’s (UCAR) Constellation Observing System for Meteorology, Ionosphere and Climate (COSMIC) Data Analysis and Archival Center (CDAAC; version 002) for this study. A total of 4884 RO profiles, taken before CDAAC quality control (QC), are available during this month and are used as input into a PCA QC procedure. The optimized bending angle and refractivity (Kuo et al. 2004) from these 4884 RO profiles from the surface to 40-km height are investigated.

The original vertical resolution of the CHAMP RO bending angle and refractivity dataset varies from about 0.1 km in the lower troposphere to about 1.5 km in the stratosphere. These RO profiles are interpolated to 0.2-km vertical resolution using an exponential extrapolation method. The monthly biweight means and standard deviations (BSD; Lanzante 1996; Zou and Zeng 2006) as well as the traditional means and standard deviations of the RO retrievals are then calculated at each vertical layer (Fig. 1). Both the traditional and biweight means and standard deviations are shown, because estimates of the mean and standard deviation by the traditional (biweight) method could be greatly (less) influenced by the existence of relatively few outliers. As shown in Fig. 1, the biweight means of the RO bending angle and refractivity decrease exponentially with height, as do the means calculated by the traditional method. The BSDs of the RO bending angle and refractivity increase sharply below 8 km, and have an obvious protuberance around 8–20 km. The BSDs are far smaller than the standard deviations calculated by the traditional method, implying the presence of outliers, which will be dealt with in the next section.

For the PCA analysis, the RO profiles are first subtracted by the biweight mean and then normalized by their BSDs. This process can be expressed mathematically as

$$\hat{\mathbf{x}}_j = \frac{\mathbf{x}_j - \bar{\mathbf{x}}}{\sigma_{\mathrm{BSD}}(\mathbf{x})}, \tag{1}$$

where **x**_{j} denotes the *j*th RO profile (*j* = 1, 2, . . . , *N*) and is a vector of dimension *M* (*M* is the number of vertical layers), *N* is the total number of ROs employed in the PCA, and **x̄** and *σ*_{BSD}(**x**) represent the biweight mean and BSD of variable **x** (e.g., bending angle or refractivity), respectively. The variable **x̂**_{j} (*j* = 1, 2, . . . , *N*) in Eq. (1) is used in the subsequent PCA.
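The biweight statistics and the normalization of Eq. (1) can be sketched as follows. This is a minimal illustration, not the code used in this study: it assumes the biweight estimators of Lanzante (1996) with the commonly used censoring parameter `c = 7.5`, applies them level by level, and ignores missing values stored as NaN. The function names `biweight_mean_std` and `normalize_profiles` are hypothetical.

```python
import numpy as np

def biweight_mean_std(x, c=7.5):
    """Biweight mean and standard deviation of a 1-D sample (NaNs ignored).

    Assumes the Lanzante (1996) form with censoring parameter c (assumed 7.5).
    """
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]
    med = np.median(x)
    mad = np.median(np.abs(x - med))
    if mad == 0.0:
        # Degenerate case: at least half of the values equal the median.
        return med, 0.0
    u = (x - med) / (c * mad)
    inside = np.abs(u) < 1.0              # samples beyond c*MAD get zero weight
    d = x[inside] - med
    w = 1.0 - u[inside] ** 2
    mean = med + np.sum(d * w**2) / np.sum(w**2)
    num = np.sqrt(x.size * np.sum(d**2 * w**4))
    den = np.abs(np.sum(w * (1.0 - 5.0 * u[inside] ** 2)))
    return mean, num / den

def normalize_profiles(X):
    """Apply Eq. (1) at each level of an (M levels x N profiles) array."""
    X = np.asarray(X, dtype=float)
    out = np.empty_like(X)
    for m in range(X.shape[0]):
        mean, bsd = biweight_mean_std(X[m])
        out[m] = (X[m] - mean) / bsd      # NaNs (missing levels) stay NaN
    return out
```

Because samples far from the median receive zero weight, a few gross outliers barely change the biweight estimates, whereas they can inflate the traditional standard deviation considerably.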

### b. PCA method

PCA is a multivariate statistical technique based on the fundamental matrix operations. It has proven to be useful in identifying the principal modes of variability of a given field. Herein, the normalized RO variables (**x̂**_{j}, *j* = 1, 2, . . . , *N*) form a vertical–horizontal field with zero horizontal mean. An *M* × *N* data matrix 𝗫 can be constructed by organizing its *M* rows (vertical layers *m*) and *N* columns (RO locations *n*) as follows:

$$\mathsf{X} = (\hat{\mathbf{x}}_1, \hat{\mathbf{x}}_2, \ldots, \hat{\mathbf{x}}_N) = \begin{pmatrix} \hat{x}_{1,1} & \hat{x}_{1,2} & \cdots & \hat{x}_{1,N} \\ \hat{x}_{2,1} & \hat{x}_{2,2} & \cdots & \hat{x}_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{x}_{M,1} & \hat{x}_{M,2} & \cdots & \hat{x}_{M,N} \end{pmatrix}.$$

The vertical correlation matrix of 𝗫 is defined by the symmetric matrix 𝗥 with its components

$$R_{ij} = \frac{1}{N_{ij}} \sum_{n=1}^{N_{ij}} \hat{x}_{i,n}\,\hat{x}_{j,n}, \qquad i, j = 1, 2, \ldots, M,$$

where *N*_{ij} is the number of observations concurrently available at the *i*th and *j*th levels (generally, *N*_{ij} ≤ *N* because not all RO soundings penetrate down to the surface). The eigenvalues and eigenvectors of the matrix 𝗥 are then calculated as

$$\mathsf{R}\,\mathsf{E} = \mathsf{E}\,\boldsymbol{\Lambda},$$

where **Λ** is the *M* × *M* diagonal matrix containing the eigenvalues *λ*_{m} of 𝗥, which are usually sorted in decreasing order, so that *λ*_{1} > *λ*_{2} > . . . > *λ*_{M}; 𝗘 has dimension *M* × *M* with its *m*th column representing the eigenvector **e**_{m} of 𝗥 corresponding to the eigenvalue *λ*_{m}. Each eigenvector (EOF) can be treated as a vertical pattern. To see how a given vertical pattern varies over the globe, the original field (**x̂**_{j}, *j* = 1, 2, . . . , *N*) or simply 𝗫 is projected onto these eigenvectors to obtain

$$\mathsf{P} = \mathsf{E}^{\mathrm{T}}\,\mathsf{X}.$$

The *i*th row of the coefficient matrix 𝗣 in the above expansion represents the horizontal variation of the weight of the *i*th EOF and will be denoted as PC*i* (the *i*th principal component). The *j*th column of 𝗣 consists of the expansion coefficients of the *j*th RO (**x̂**_{j}) onto the eigenvectors (**e**_{i}, *i* = 1, 2, . . . , *M*) or 𝗘.

The *m*th eigenvalue *λ*_{m} is proportional to the percentage of the variance of the field 𝗫 that is accounted for by the *m*th EOF mode. Just as the EOFs are orthogonal in the vertical dimension, the associated PCs are orthogonal in the horizontal dimension. An EOF, its corresponding eigenvalue, and its expansion coefficients (PCs) define a vertical mode (revealing the vertical variability), its relative importance, and its global variability, respectively. The leading mode (corresponding to the largest eigenvalue) explains the largest fraction of the total variance, the second mode explains the largest fraction of the remaining variance, and so on. By including all (or several) modes, the original data sample can exactly (or approximately) be recovered using EOFs and PCs based on 𝗫 = 𝗘𝗣 (or its truncated version).
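The decomposition described in this subsection can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions rather than the procedure actually used: it assumes that each element of 𝗥 is an average over the *N*_{ij} profiles available at both levels, that missing levels are stored as NaN, and that missing values contribute zero to the projection. The names `pairwise_correlation`, `pca`, and `reconstruct` are hypothetical.

```python
import numpy as np

def pairwise_correlation(X):
    """R[i, j] averaged over the profiles available at both levels i and j."""
    valid = ~np.isnan(X)
    X0 = np.where(valid, X, 0.0)
    v = valid.astype(float)
    counts = v @ v.T                       # N_ij for every pair of levels
    return (X0 @ X0.T) / np.maximum(counts, 1.0)

def pca(X):
    """Eigenvalues (decreasing), EOFs (columns of E), and PCs (rows of P)."""
    R = pairwise_correlation(X)
    lam, E = np.linalg.eigh(R)             # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]
    lam, E = lam[order], E[:, order]
    # Assumption: a missing level contributes zero to the projection.
    P = E.T @ np.where(np.isnan(X), 0.0, X)
    return lam, E, P

def reconstruct(E, P, k):
    """Truncated expansion using only the first k modes."""
    return E[:, :k] @ P[:k]
```

With complete data, `reconstruct(E, P, M)` returns the original matrix, and keeping fewer modes gives the truncated approximation mentioned above.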

Figure 2 shows the individual (bars) and accumulated percentage variances (curves) explained by the first eight EOFs for CHAMP-observed bending angle (Fig. 2a) and refractivity (Fig. 2b) profiles in March 2004, where the percentage variance explained by the *m*th EOF is calculated as

$$\frac{\lambda_m}{\sum_{k=1}^{M} \lambda_k} \times 100\%.$$

About 80% (95%) of the total variance is explained by the first eight EOFs for bending angle (refractivity) profiles.
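As a small worked example, the individual and accumulated percentage variances can be computed from a set of eigenvalues as shown below. The function name `explained_variance` is hypothetical, and any eigenvalues passed to it in an example are made up for illustration, not taken from this study.

```python
import numpy as np

def explained_variance(lam):
    """Individual and accumulated percentage variance for eigenvalues lam."""
    lam = np.asarray(lam, dtype=float)
    pct = 100.0 * lam / lam.sum()          # share of each mode
    return pct, np.cumsum(pct)             # bars and curves of a scree plot
```

For instance, eigenvalues of 6, 3, and 1 give individual shares of 60%, 30%, and 10%, and accumulated shares of 60%, 90%, and 100%.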

## 3. Outliers identified by a quality control procedure based on PCA

Figure 3 shows the first three PCs of bending angle (left panels) and refractivity (right panels) that describe the horizontal variance of the three leading vertical patterns. Outliers of RO profiles are readily noticeable, characterized by anomalous jumps as seen in these PC distributions. To quantitatively detect these suspicious RO data, we define a parameter called the Hotelling *T*^{2}, which is a measure of the distance of each sample with respect to the data center, as follows:

$$T_j^2 = \sum_{i=1}^{m_T} \frac{P_{ij}^2}{\lambda_i},$$

where *P*_{ij} is the PC of the *j*th RO observation sample onto the *i*th EOF, *λ*_{i} is the *i*th eigenvalue, and *m*_{T} is the total number of kept modes. In our study, *m*_{T} = 5 is chosen. Then, the larger the value of *T*^{2}, the greater the distance of the sample to the data center. We separately apply PCA to normalized RO bending angle and refractivity data, and detect the outliers according to their *T*^{2} criteria. We sort *T*^{2} for bending angle (Fig. 4a) and refractivity (Fig. 4b) separately in decreasing order. Each sorted logarithmic *T*^{2} curve can be approximated by two piecewise regression lines. The two regression lines are confirmed via the three-group resistant regression method (Hoaglin et al. 1983). The *T*^{2} value corresponding to their intersection is defined as the threshold *T*^{2}_{crs}. It is found that *T*^{2}_{crs} = 4194 for bending angle and *T*^{2}_{crs} = 10 330 for refractivity. They are taken as the PCA QC criteria. Any RO sample whose value of bending angle *T*^{2} is larger than 4194 is considered as an outlier based on the bending angle QC (BAQC), while those whose *T*^{2} values for refractivity are greater than 10 330 are identified by the refractivity QC (RFQC) as outliers. A total of 210 ROs are identified by both BAQC and RFQC, with another 23 ROs identified only by BAQC and 22 only by RFQC. The 255 ROs identified by either BAQC or RFQC will be taken as outliers, which are removed from the 1-month data sample for the data comparison between CHAMP observations and NCEP analyses in terms of EOFs to be discussed in section 4.
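The QC steps above can be sketched as follows. The sketch assumes the standard form of Hotelling's *T*^{2} (squared PCs scaled by their eigenvalues and summed over the kept modes). For the threshold it substitutes an ordinary least squares two-segment fit with an exhaustive breakpoint search for the three-group resistant regression used in this study, so it illustrates the idea of the intersection threshold only. The names `hotelling_t2` and `two_line_threshold` and the parameter `min_pts` are hypothetical.

```python
import numpy as np

def hotelling_t2(P, lam, m_t=5):
    """T^2 of each sample from the first m_t PCs (rows of P) and eigenvalues."""
    P = np.asarray(P, dtype=float)
    lam = np.asarray(lam, dtype=float)
    return np.sum(P[:m_t] ** 2 / lam[:m_t, None], axis=0)

def two_line_threshold(t2, min_pts=3):
    """T^2 value at the intersection of two lines fitted to sorted log10(T^2).

    Assumption: plain least squares replaces the resistant regression.
    """
    y = np.log10(np.sort(np.asarray(t2, dtype=float))[::-1])
    x = np.arange(y.size, dtype=float)
    best = None
    for k in range(min_pts, y.size - min_pts + 1):
        fit1 = np.polyfit(x[:k], y[:k], 1)     # steep part (suspected outliers)
        fit2 = np.polyfit(x[k:], y[k:], 1)     # flat part (bulk of the data)
        sse = (np.sum((np.polyval(fit1, x[:k]) - y[:k]) ** 2)
               + np.sum((np.polyval(fit2, x[k:]) - y[k:]) ** 2))
        if best is None or sse < best[0]:
            best = (sse, fit1, fit2)
    _, (a1, b1), (a2, b2) = best
    x_cross = (b2 - b1) / (a1 - a2)            # abscissa where the lines meet
    return 10.0 ** (a1 * x_cross + b1)
```

Samples would then be flagged with a comparison such as `hotelling_t2(P, lam) > threshold`, applied separately to bending angle and refractivity, and the union of the two flagged sets removed.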

Two scatterplots, which are shown in Fig. 5, present the distribution of outliers in bending angle PC space for all ROs. Outliers identified by both BAQC and RFQC are marked with asterisks, while those only recognized by BAQC or RFQC are marked with triangles and diamonds, respectively. From Fig. 5a, it is clear that points with very large absolute values of PC1 and PC2 are identified as outliers by both BAQC and RFQC. To have a clear view of the majority of the data points near the center characterized by small values of both PC1 and PC2, the same figure is plotted on a smaller scale (Fig. 5b). It is seen that a few outliers, especially those identified by RFQC, are speckled in the center of the bending angle PC1 and PC2 plane. These outliers must have been removed because of their anomalous behavior in PC3, PC4, and/or PC5. Figure 6 shows the values of bending angle Hotelling’s *T*^{2} of the 233 outliers identified by BAQC, and the individual contribution (in percentage) of the first five PCs to *T*^{2} of each outlier. Because *m*_{T} = 5 is chosen in this study, the sum of the percentage contributions of the first five PCs to the *T*^{2} values is equal to 100%. Although the majority of outliers are characterized by anomalous behavior in PC1 and PC2, there are still some outliers for which any one of the first five PCs could contribute more than 50% of the Hotelling’s *T*^{2} value.

To see if there is any latitudinal dependence of PCs as well as the characteristics of outliers, we separate the PCs of ROs in Fig. 5b into three latitudinal zones, with good data points and outliers shown in separate panels (Fig. 7). In Fig. 7 ROs in the Tropics (30°N–30°S), middle latitudes (30°–60°N and 30°–60°S), and high latitudes (60°–90°N and 60°–90°S) are marked in red, green, and blue, respectively. Good data points (nonoutliers) are shown in Figs. 7a and 7b as dotted points and outliers are shown in Figs. 7c and 7d. It is obvious that PCs of good data points from each latitudinal zone are approximately aggregated into their own clusters, while the outliers are not. The PC distributions for the first and third EOFs imply a zonal structure of the atmospheric variability, which will be further discussed in section 4. We find that there are more outliers in the Tropics (7.6%) than in the middle (3.1%) and high (5.8%) latitudes.

The scatterplots of PCs for the three latitudinal zones are shown separately in Figs. 8–10, in which nonoutliers are represented by black dots, and outliers are marked with crosses or asterisks. The outliers that are located away from the majority of data points are indicated by crosses, while those outliers that are well mixed with the good data points are denoted by asterisks. Figure 8a includes all ROs in the Tropics and Fig. 8b only includes those ROs whose absolute values of PC1 and PC2 are less than 60 and 40, respectively. In Fig. 8c (Fig. 8d), only the outliers that are well mixed with good data points in Fig. 8b (Fig. 8c) are shown, along with all the good data points. In the Tropics, most outliers that seem well mixed with good data points in the PC1 and PC2 plot (Fig. 8b) are located in the periphery of the datapoint cloud in the PC3 and PC4 (Fig. 8c) plot except for two, which are distinguishable in the PC5 and PC6 plane (Fig. 8d). In the middle latitudes, there are only four outliers within the scale ranges shown in Fig. 9b: two have anomalous values of PC1 and PC2 and the other two have large projections on higher modes, especially on PC5 and PC6. At high latitudes, almost all outliers are located near the boundary of the “datapoint cloud” in PC1 and PC2 (Fig. 10b), with those closer to the data center in Fig. 10b having large projections on higher modes (Figs. 10c and 10d). In fact, some outliers might not be easily distinguishable in any 2D PC component plane because the PCA quality criterion based on the Hotelling’s *T*^{2} index is a squared sum of all kept PCs, that is, an integrated contribution from all kept PCs.

Although only 255 (5%) of ROs are removed, the standard deviations of both bending angle and refractivity are significantly decreased compared with the ones without the applied PCA QC (see Fig. 1). The traditional standard deviations after PCA QC also become close to BSD, indicating the effectiveness of the PCA QC in removing outliers.

Of the 255 profiles identified as the outliers by PCA QC, 122 profiles are also removed by CDAAC QC, which is based on the values of a series of parameters for each RO, such as the mean and standard deviation of the ionosphere-free bending angle from the first guess between 60 and 80 km, the maximum refractivity departure from the first guess, and so on (Kuo et al. 2004). Compared with other data QC methods (Hajj et al. 2004; Healy et al. 2005; Zou and Zeng 2006), PCA QC is unique for the following two reasons: 1) PCA QC is a robust QC method that is resistant to the effect of outliers and 2) no ancillary data or information is needed in PCA QC in which outliers are identified based on the inherent consistency of data itself.

## 4. Comparison between CHAMP RO observations and NCEP analyses

The purpose of the above PCA QC is to compare CHAMP GPS RO data with the NCEP analyses in an optimal way. Unlike the comparisons between the RO data and the large-scale analysis that were presented in a number of earlier studies (Rocken et al. 1997; Steiner et al. 1999; Hajj et al. 2002; Gorbunov and Kornblueh 2003; Kuo et al. 2004), it is the individual modes of the atmospheric vertical variability decomposed by PCA that are compared between CHAMP RO data and the NCEP analyses. The NCEP bending angle profiles are calculated using the Abel integral equation and NCEP vertical refractivity profiles (Kursinski et al. 1997). The mean and standard deviation of the departures of the CHAMP RO data from the NCEP analyses, computed for the variable **x** being either bending angle **α** or refractivity **N**, before and after PCA QC are shown in Fig. 11. The positive bias of CHAMP refractivity is removed and the standard deviations for both bending angle and refractivity are reduced after applying PCA QC.

The vertical structures of the atmosphere and their global variability revealed by CHAMP RO observations can now be examined and compared with those of NCEP analyses. The same PCA procedure as described in section 2 is applied to the remaining CHAMP RO profiles with those 255 anomalous ROs removed. The same PCA is also applied to the corresponding bending angle and refractivity profiles derived from NCEP analyses. The eigenvectors, representing the vertical structure of the atmosphere, will be denoted as EOF1, EOF2, and so on. The horizontal variability of the EOF expansion coefficients over the globe is represented by the PCs. Because bending angle and refractivity have similar structures and variability, we will only show results derived from CHAMP-observed and NCEP-modeled refractivity profiles.

The vertical structures of the atmosphere represented by the first three eigenvectors and the horizontal variability of the first three PCs over the globe are shown in Figs. 12–14. The first mode explains 60% of the variance (see Fig. 2). The variance explained by the first EOF calculated from NCEP analyses is slightly higher (63%). It can be seen that EOF1 represents a dominant upper-atmospheric structure, with its amplitude being close to zero from the surface to about 7 km (Fig. 12). The amplitude of EOF1 increases to about 0.08 near 10 km and remains nearly constant with increasing altitude above that level. The EOF1 derived from the NCEP analyses is almost identical to that of the CHAMP observations, with a correlation coefficient of 1.00. This mode shows a zonal pattern symmetric with respect to the latitude 10°S. Positive anomalies are found in the Tropics while negative anomalies are located near the North and South Poles. The correlation coefficient of PC1 between CHAMP observations and NCEP analyses is as high as 0.99.

The second EOF (EOF2, Fig. 13) explains 16% of the variance for both CHAMP and NCEP data and changes sign at about 22 km. It represents a dominant middle- and upper-tropospheric structure of the atmosphere, where its amplitude is increasing from the surface to a maximum at about 8 km. It has an out-of-phase relationship between the North and South Poles, with positive anomalies in the North Pole and negative anomalies in the South Pole. Another significant feature revealed by PC2 is the positive anomalies over the western Pacific, South America, and Africa in the Tropics, where convection prevails. Although PC2 from the NCEP analyses shows a similar global variability of EOF2 to that of the CHAMP observations, the amplitude of the positive anomalies in the convective areas is noticeably smaller than that from the CHAMP observations.

The vertical structure represented by the third EOF (EOF3) observed by CHAMP is also well captured by the NCEP analyses (Fig. 14). The largest amplitude of EOF3 is found in the middle troposphere. The variability of this mode has smaller spatial scales than those of PC1 and PC2 as seen in the global distribution of PC3. Zonal bands of negative anomalies are found in the extratropics and poles, and those of positive anomalies in the Tropics and midlatitudes. The three distinct positive anomalies seen in Fig. 13 for EOF2 over the western Pacific, South America, and Africa in the Tropics can still be found in the PC3 distribution. Similarly to EOF2, the magnitude of PC3 from the NCEP analyses is smaller than that of CHAMP in these three regions. The negative anomaly of PC3 from CHAMP data is weaker than that of NCEP data in the extratropics.

To examine the latitudinal dependence of both the EOFs and the differences between the CHAMP and NCEP data, the global dataset is partitioned into three subsets: 90°–60°N [Northern Hemisphere (NH)], 30°N–30°S (Tropics), and 60°–90°S [Southern Hemisphere (SH)], and PCA is applied to each subset. The first 11 EOFs derived from the global dataset and the above three subsets are shown in Fig. 15. As the EOF number increases in Fig. 15a, the vertical structure of the atmosphere described by each EOF becomes more complicated, mainly in the troposphere, characterized by decreasing vertical scales and increasing amplitudes. Except for the 10th and 11th EOFs, the NCEP EOFs have similar vertical structures to those derived from global CHAMP observations (Fig. 15a), but a downward phase shift is noticed for EOF5–EOF11. In the Tropics (Fig. 15b), the NCEP EOFs show more vertical oscillations in the lower troposphere than the CHAMP EOFs. On the contrary, the oscillations of the CHAMP EOFs near and above the tropopause are stronger than those of the NCEP EOFs. The fact that the oscillations of the NCEP EOFs in the lower troposphere are stronger than those of the CHAMP RO observations might be related to the phase-locked loop (PLL) signal-tracking technique applied in the CHAMP GPS receiver. This technique is incapable of reliable RO signal tracking under conditions of multipath propagation, which often occurs in the troposphere because of strong vertical refractivity gradients. The more complicated the structure of the refractivity, the more likely the loss of lock on the RO signal in the GPS receiver. Thus, the penetration depth of the RO is correlated with the vertical structure of refractivity. The occultations corresponding to smooth refractivity profiles appear to be more heavily weighted in the correlation matrix 𝗥, thereby smoothing its eigenvectors. This problem is expected to be overcome in future GPS RO missions, such as COSMIC, by applying the open-loop tracking technique (Sokolovskiy 2001), which is not affected by the structure of the refractivity. The fact that the oscillations of the CHAMP EOFs near and above the tropopause are stronger than those of the NCEP EOFs may be caused by a coarser resolution of the NCEP model in these height ranges. In the NH, the CHAMP EOFs and NCEP EOFs match very well at all altitudes up to EOF8. Starting with EOF9, a downward phase shift is noticed in the NCEP analyses. The CHAMP EOFs do not match the NCEP EOFs in the SH as well as they do in the NH (Figs. 15c and 15d). Larger (smaller) amplitudes of EOFs are noticed in the lower troposphere (near the tropopause) of the NCEP analyses than in the CHAMP observations.

The variances explained by each of the first 11 EOFs for each subset of CHAMP observations from three different latitudinal zones are shown in Fig. 16a. The variance explained by the leading EOF for the subset from 30°N–30°S (upward-pointing triangle in Fig. 16) is the smallest compared with those for subsets in higher latitudes. More EOFs must be included to account for the same amount of percentage variance in the Tropics as in the higher latitudes. The first leading EOF from the SH (downward-pointing triangle in Fig. 16) accounts for more variance than that from the NH (diamond in Fig. 16). We also give the explained variances by the first 11 EOFs for NCEP analyses in Fig. 16a (dashed–dotted lines), which show similar characteristics as those for CHAMP observations, except for the first NCEP EOF in the Tropics, which explains much more variance than do the CHAMP data.

The correlations between the CHAMP and NCEP data for the first 11 EOFs and PCs in three latitudinal zones are calculated and shown in Figs. 16b and 16c. The first three EOFs of CHAMP are highly correlated with those of NCEP (Fig. 16b). Correlations close to 1 extend to the eighth mode for the NH (diamond), while the correlation drops quickly beyond EOF6 in the Tropics (upward-pointing triangle) and SH (downward-pointing triangle). The correlation of PCs between the CHAMP and NCEP data decreases with increasing EOF number (Fig. 16c), except for the first mode in the Tropics, where the correlation for PC1 is much lower than that for PC2. The correlations of PCs for the NH are higher than those for the SH and the Tropics for all modes. This may imply that the NCEP analyses in the Tropics and SH are less accurate than those in the NH, especially for the first mode in the Tropics and higher modes (>6) in both the Tropics and the SH.
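A mode-by-mode comparison of this kind can be sketched as below. This is an assumption-laden illustration: it uses the Pearson correlation between matching columns of two arrays (EOFs stored as columns, or PCs passed in transposed), and because the sign of an eigenvector is arbitrary it optionally reports the absolute value. The source does not state how signs were handled, so the `align_sign` option and the function name `mode_correlations` are hypothetical.

```python
import numpy as np

def mode_correlations(A, B, n_modes=11, align_sign=True):
    """Pearson correlation between matching columns of A and B.

    A, B: (n_points x n_modes) arrays, e.g. CHAMP and NCEP EOFs as columns.
    align_sign: if True, return |r| to remove the arbitrary eigenvector sign.
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    n = min(n_modes, A.shape[1], B.shape[1])
    r = np.array([np.corrcoef(A[:, k], B[:, k])[0, 1] for k in range(n)])
    return np.abs(r) if align_sign else r
```

For PCs stored as rows of the coefficient matrices, the same function can be called as `mode_correlations(P_a.T, P_b.T)`, provided the two sets of PCs refer to the same RO locations.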

The PCA is based on the explained proportions of the total variance of the vertical profiles of the atmospheric refractivity. Do the data in certain altitude ranges dominate in the total variance? Because of the pronounced difference of the atmospheric vertical structure between the stratosphere and the troposphere, we partition the whole height range (0–40 km) into two parts: the tropospheric part (0–17 km) and the stratospheric part (20–40 km). PCA is applied separately for the data in the troposphere and in the stratosphere in the Tropics. We compare the vertical profiles of EOF1 and the horizontal distributions of the PC1 mode of refractivity field obtained in the whole height range (0–40 km), in the troposphere (0–17 km), and in the stratosphere (20–40 km; Fig. 17). Percentages of explained variances by EOF1 are given in the subtitles of the PC mode, while the correlation coefficients of EOF1 between CHAMP and NCEP are indicated in the subtitles of the EOF mode. The first EOF mode for the whole vertical layer has larger amplitude above 20 km, which mainly represents the variance in the stratosphere. The high degree of consistency of the PC1 mode between the 0–40 km PCA (Fig. 17b) and the 20–40 km PCA (Fig. 17d) also confirms this point. The EOF1 and PC1 in the tropical troposphere (Fig. 17f) mainly reflect features of three prevailing convective areas. The CHAMP EOF1 (Fig. 17e) has a pronounced maximum in the middle troposphere, which is not very well captured by the NCEP analyses.
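Restricting the PCA to a height range and a latitudinal zone amounts to selecting rows and columns of the data matrix before the decomposition. The following is a minimal sketch, assuming that the level heights and profile latitudes are available as arrays; the function name `select_subset` and its arguments are hypothetical.

```python
import numpy as np

def select_subset(X, heights_km, lats_deg, z_range, lat_range):
    """Rows of X within z_range (km) and columns within lat_range (degrees)."""
    heights_km = np.asarray(heights_km, dtype=float)
    lats_deg = np.asarray(lats_deg, dtype=float)
    rows = (heights_km >= z_range[0]) & (heights_km <= z_range[1])
    cols = (lats_deg >= lat_range[0]) & (lats_deg <= lat_range[1])
    return np.asarray(X)[np.ix_(rows, cols)]
```

For example, `select_subset(X, heights_km, lats_deg, (0, 17), (-30, 30))` would keep the tropical tropospheric part and `(20, 40)` the stratospheric part, each of which can then be decomposed separately.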

The explained variance by each EOF and the correlations of the first 11 PCs and EOFs between the CHAMP and NCEP data in the two height regions of the Tropics are shown in Fig. 18. The first EOF for the tropical stratosphere accounts for 46.1% of the variance, which is larger than that for 0–40 km (23.9%) and 0–17 km (36.1%), while the second EOF sharply decreases to 13.0%, which is smaller than that for other height ranges (Fig. 18a). This illustrates to some extent that the variability of the stratosphere is relatively simple, and the stratospheric atmospheric state can be more effectively approximated by the first PCA mode. Comparison of the explained variance for the CHAMP (solid) and NCEP (dashed) data shows better agreement for 0–17 km than for 20–40 km. As for the correlations of EOFs (Fig. 18b), the correlation of the first EOF for 20–40 km is the strongest among all correlations of EOF1, but it generally decreases with increasing EOF number and becomes lower than the correlations for other height regions. Except for the first EOF, the correlations between the NCEP and CHAMP EOFs are higher in the troposphere than in the stratosphere. This implies that the NCEP analyses adequately describe the vertical structures of the troposphere except for the first EOF, for which a significant vertical phase shift is noticed (Fig. 17e). The correlations of PCs between the CHAMP and NCEP analyses (Fig. 18c) primarily decrease with increasing EOF number, being significantly smaller for the tropical stratosphere than those for the troposphere in the Tropics. This may indicate that NCEP analyses can better represent spatial variability of the atmospheric state in the troposphere than in the stratosphere.

## 5. Summary

In this paper, a PCA is applied to analyze CHAMP-observed and NCEP-modeled bending angle and refractivity profiles. This analysis technique decomposes the RO data fields into a set of EOFs that describes the vertical structures of the atmosphere and into a series of PCs that describes the horizontal variations of the expansion coefficients of these EOFs. A distinctive feature of EOFs is that, for a given dataset, they explain a given amount of variance with the fewest number of functions. The horizontal variations of these vertical structures are described by the PCs.

First, PCA is applied for identifying the outliers so that an outlier-free dataset can be reconstructed. The effectiveness of the proposed PCA QC procedure is demonstrated on the basis of 1-month CHAMP RO observations. The variabilities of RO-observed bending angle and refractivity revealed by PCA are then studied and compared with those derived from the NCEP analyses. The dominant mode of the global analysis represents a strong zonal distribution and very little vertical variation. The next two modes reveal the main convection areas in the Tropics and have oscillating vertical structures. A comparison of PCA modes between the CHAMP and NCEP data shows that 1) the NCEP analyses describe CHAMP-observed larger vertical scale features better than smaller-scale features, 2) the NCEP analyses are capable of capturing features of more modes representing the vertical and horizontal variability of the atmosphere in the NH than in the SH and the Tropics, and 3) the NCEP analyses do not capture the vertical structures revealed by the EOFs in the CHAMP observations near and above the tropopause in the Tropics. These results suggest that GPS RO observations, with their high vertical resolution, may contribute significantly to global analysis in the SH and near and above the tropical tropopause. The inconsistency between the CHAMP RO observations and the NCEP analyses for high-order EOFs may also suggest that EOF-expanded and -truncated RO data, rather than the raw GPS RO data, may be considered for GPS RO data assimilation in order to reduce aliasing.

Further studies are required to explore the potential applications of the PCA to GPS RO data. For example, a similar comparison between GPS CHAMP observations and NCEP analyses for more than 1 month will allow the GPS-observed and -modeled intraseasonal variations of the atmosphere to be examined. A more in-depth understanding of the physical meaning of each vertical mode revealed by the PCA is required. The implications for NWP of the major differences found by such a study between GPS RO observations and the NCEP analyses need to be studied. The use of PCA as an antialiasing filter for GPS RO data assimilation has yet to be developed and tested. The impact on the resulting EOFs of the signal-tracking technique (an open-loop tracking technique versus a phase-locked loop) in the lower troposphere, where multipath propagation results in strong fluctuation of the GPS signal amplitude and phase, must be examined using both CHAMP and COSMIC data. Differences in the structures of EOFs in the stratosphere near and above 30 km using the optimized and the raw bending angles should be examined. This may provide additional insight into the choice of the optimized or raw bending angles for assimilation.

## Acknowledgments

This study is supported by the National Science Foundation under Project ATM-0101036.

## Footnotes

*Corresponding author address:* Dr. X. Zou, Dept. of Meteorology, The Florida State University, Tallahassee, FL 32306-4520. Email: zou@met.fsu.edu