The following report summarizes the NOAA Products Validation System (NPROVS), operated at the NOAA National Environmental Satellite, Data, and Information Service (NESDIS) Center for Satellite Applications and Research (STAR). NPROVS provides centralized collocation and intercomparison of multiple suites of satellite-derived atmospheric sounding products, global operational radiosonde and dropsonde observations, and numerical weather prediction (NWP) data. The report addresses the screening and processing of radiosonde and dropsonde data, the method of collocating to the respective satellite product data, and perceived characteristic differences among the respective satellite products. The analysis of characteristic tendencies among the satellite products underscores the need for absolute consistency when compiling validation datasets of respective satellite, ground target, and NWP observations in order to minimize the varying degrees of inherent differences among these systems.
The access and collocation of sonde and satellite observations occur daily, with routine archiving of all collocated data at STAR. The minimum requirement for retaining a given sonde is that the collective temperature and moisture profiles extend vertically at least 5 km without a gap. Retained reports are further processed, including analysis to identify suspicious observations, temperature and moisture profile features, and impacts when applying sampling constraints. The collocation approach is optimized for each satellite system to select a single “closest” sounding from each satellite that lies within 6 h and 250 km of a given sonde. The NPROVS analytical and graphical interface serves a dual purpose: troubleshooting to assure the integrity of collocations, and validation through intercomparison of the satellite products. Results indicate notable differences in the perceived characteristics of the product systems, with seasonal tendencies.
The validation of derived sounding products from environmental satellites using global datasets of collocated conventional radiosonde and satellite observations has been a baseline component of operational satellite programs at NOAA (see the appendix for acronym expansions) since 1979 and the deployment of the TOVS operational soundings (Smith and Woolf 1976). NOAA/STAR now supports multiple programs and scientific approaches for deriving atmospheric sounding products from a variety of radiometric sensor configurations onboard NOAA, MetOp, DMSP, and NASA EOS polar satellites and NOAA GOES satellites. Each program conducts validation, with a common component being the use of collocated conventional radiosonde and satellite data. However, each program uses variations in strategies for compiling collocations and statistics. Such differences can skew a given set of results and associated interproduct analysis when comparing the validation results compiled by the respective programs. Recognizing this and the risk-reduction aspect of being able to routinely compare products from newly deployed and existing legacy product systems, the requirement for centralized, multiple product comparisons was appended to the Cal–Val program for next-generation JPSS CrIMSS EDRs. The NPROVS, which was already operating at STAR, was subsequently incorporated into the CrIMSS EDR Cal–Val program in support of pending NPP sounding products in October 2011.
Figure 1 provides a schematic diagram of the multiple satellite and sonde observations that are collected within NPROVS. A key feature of NPROVS is that the dataset of conventional global radiosondes and dropsondes used to validate each satellite product system is identical. As shown, seven operational satellites and 12 independently operated sounding product systems are accessed, with plans to expand to the NPP and MetOp-B product systems scheduled for deployment in 2012. The satellite product systems operated at NOAA are the ATOVS (Reale et al. 2008), MIRS (Boukabara et al. 2007), IASI (Maddy et al. 2009), AIRS (Goldberg et al. 2003), and GOES (Ma et al. 1999). The satellite product systems operated external to NOAA are EUMETSAT’s IASI (Schlussel et al. 2005) and UCAR’s COSMIC GPSRO (Kuo et al. 2004).
A second key feature of NPROVS is that the collocation strategy across all satellite product systems is consistent. The goal is to select the single “closest” observation from each system that lies within 6 h and 250 km of a given sonde observation, assuring at least one collocation from most polar satellite platforms. Ancillary satellite parameters such as terrain, QC, and cloud are not considered in the initial compilation of collocations. NPROVS differs in this respect from typical project-oriented validation, which often restricts the satellite data to satisfy specific QC, terrain, and space–time window constraints. NPROVS relegates such considerations to the analytical interface, which then provides such sampling options later. The important point is that the initial collocation dataset is compiled in a consistent manner for all satellite systems.
Finally, NPROVS provides routine monitoring of the relative performance of each satellite product system through the identification of characteristic differences versus a large, common sample of identically processed global sondes. Varying degrees of random and systematic differences inherent, for example, in comparing sun-synchronous polar satellites to synoptic radiosondes, regional and seasonal effects, and differences in sensors and scientific approach cannot be avoided. However, the identification of characteristic patterns of interproduct differences more attributable to the combination of sensor and scientific approach, an important goal of NPROVS, is nominally achieved using the large, contiguous global samples of “consistently” processed collocation datasets. These differences reveal underlying characteristic patterns, including seasonal and airmass effects that are often missed in project-oriented or shorter-term, intensive validation exercises.
Accuracy assessment using NPROVS is not the main objective given the inherent sonde errors, collocation mismatch and the fact that satellite sensors have limited vertical sensitivity. A variety of authors have undertaken the work of defining vertical profile error characteristics (“error bars”) for respective radiosonde instrument types including Sun et al. (2010) and Wang and Zhang (2008). Pougatchev et al. (2009) and Sun et al. (2010) have quantified spatial and temporal mismatch error sensitivity using IASI and COSMIC atmospheric profiles, respectively. The use of averaging kernels to model satellite sensor vertical sensitivity to a given set of profiles is discussed in Maddy and Barnet (2008) and Pougatchev et al. (2009). Integrating such information for each collocation can potentially move the characteristic performance assessment in the direction of an accuracy assessment. However, absolute accuracy ultimately requires traceable standards, and these are not currently available on a global scale. The deployment of programs such as the GRUAN (Seidel et al. 2009), which begin to address these issues, is at an early stage of development.
2. Radiosonde processing
A standard automated system for radiosonde processing, QC, and archiving at NOAA is the IGRA, as documented in Durre et al. (2006, 2008). The objectives of IGRA and NPROVS are similar in that both seek to identify “suspicious” radiosonde observations, but their context and respective analysis strategies vary. For example, a primary objective of NPROVS is validation in the context of real-time weather applications, whereas IGRA focuses on longer-term climate monitoring. NPROVS utilizes previous NOAA/EMC complex QC decisions based on differences from a collocated weather forecast background, whereas IGRA derives QC decisions based on differences from climatological statistics at each site. NPROVS includes reports from ships and dropsondes, and airmass characterizations based on temperature and moisture profile structures, clouds, and terrain, which are a lesser concern in IGRA. The value-added radiosonde analyses provided by IGRA and NPROVS are complementary, and more direct comparison and merging of these processes warrant future consideration.
NPROVS processing is a refinement and expansion of procedures developed and operated at NOAA in support of heritage temperature and moisture sounding products since 1979 as presented in Reale et al. (1994) and Reale (2001). Conventional radiosonde data available over the GTS under the oversight of the WMO are accessed from NOAA/EMC (available online at http://www.emc.ncep.noaa.gov/mmb/data_processing/prepbufr.doc/document.htm). In addition to the mandatory and significant level report data for temperature (T), dewpoint temperature (Td), and wind, these data include QC marks (Collins 2001a,b) for each parameter and level as assigned in NOAA/GFS data assimilation (see http://www.emc.ncep.noaa.gov/GFS/doc.php). The closest-in-time GFS 6-h forecast data spatially interpolated to the radiosonde and used to assign the QC marks are also included on this file.
NPROVS processing initially provides one of the following terrain designations to each radiosonde report: land, coast, island, island–land, and ship. Land and coast designate continental (mainland) reports, with land reports not intersecting open water (ocean, bay, and large lake) within a 50-km radius. Island designates a maritime environment and includes island and selected peninsula formations with horizontal scale less than 500 km. For example, Japan and England are designated as islands. Island stations that do not intersect open water within 50 km are designated as island–land. Terrain designations are assigned manually and retained in an offline directory, and new reports are monitored daily and assigned a terrain designation at the earliest convenience.
Radiosonde and dropsonde QC initiates with the NOAA/EMC QC markers for T, Td, and pressure (P). Data with marks of 0–3 are automatically accepted, consistent with weather forecast data assimilation practices. However, because the forecast assimilation systematically rejects Td above 300 hPa, surface data over high terrain, and selected ship and dropsonde data, the following expanded list of QC marks is used for acceptance: 0–3 for respective parameters; 9, 14, and 15 for Td at or above 300 hPa; 9, 14, and 15 for consecutive levels of Td up to the top of the profile; 8 and 9 for surface level, unless observed for consecutive levels from the surface; and 14 and 15 if there is a ship or dropsonde for all levels and parameters.
Analyses of rejected levels confirm that in most cases the data are suspicious. Cases of unrejected suspicious data are typically associated with the approximately 15% of older sondes for which EMC continues to apply a radiation correction to the report temperature prior to assigning the QC mark and assimilation. The temperature data as received from the field are suspect in these cases, requiring an added test to ensure that they are within 15 K of the appended GFS forecast.
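The acceptance rules above can be sketched in code. The following is a simplified illustration only: it covers the always-accepted marks, the Td exception at or above 300 hPa, the ship and dropsonde exception, and the added 15-K test, and it deliberately omits the consecutive-level and surface-level rules. The function names, the parameter labels, and the treatment of "within 15 K" as an inclusive bound are assumptions, not part of NPROVS.

```python
ALWAYS_ACCEPTED = {0, 1, 2, 3}

def mark_accepted(mark, parameter, pressure_hpa, is_ship_or_dropsonde=False):
    """Return True if an EMC QC mark is accepted under the simplified rules.

    parameter is one of "T", "Td", "P" (hypothetical labels).
    """
    if mark in ALWAYS_ACCEPTED:
        return True
    # Td at or above the 300-hPa level (pressure <= 300 hPa): marks 9, 14,
    # and 15 are also accepted because the assimilation rejects these data
    # systematically.
    if parameter == "Td" and pressure_hpa <= 300.0 and mark in {9, 14, 15}:
        return True
    # Ship and dropsonde reports: marks 14 and 15 accepted for all levels
    # and parameters.
    if is_ship_or_dropsonde and mark in {14, 15}:
        return True
    return False

def temperature_plausible(t_obs_k, t_gfs_k, max_diff_k=15.0):
    """Added test: observed temperature within 15 K of the GFS forecast.

    The inclusive comparison is an assumption.
    """
    return abs(t_obs_k - t_gfs_k) <= max_diff_k
```

For example, `mark_accepted(9, "Td", 250.0)` is accepted while `mark_accepted(9, "T", 250.0)` is not, which mirrors the parameter-specific nature of the expanded list.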
The sets of accepted T and Td profile data must attain a vertical extent of 5.0 km for radiosondes. There is no vertical extent requirement for dropsondes.
Remaining reports are tested for gaps. Gaps are defined if the separation between successive report levels exceeds 1.0 km between the surface and 700 hPa, 2.0 km between 700 and 200 hPa, 3.0 km between 200 and 50 hPa, or 4.0 km above 50 hPa.
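Separation or thickness (Z) is defined using the hydrostatic equation

$$Z = \frac{R}{g}\left(\frac{T_i + T_{i+1}}{2}\right)\ln\left(\frac{P_i}{P_{i+1}}\right) \qquad (1)$$

where P_i and P_{i+1} are the lower and upper pressures (hPa), T_i and T_{i+1} are the lower and upper temperatures (K), R is the gas constant for dry air (287.04 J kg−1 K−1), and g is the gravitational acceleration (9.8 m s−2).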
Gap testing is done separately for the T and Td profiles. If a gap is observed, then the report is “capped” at the lower level bounding the gap. Data above the gap are retained but not recommended for use in validation. The capped T and Td profiles must pass the vertical extent requirement; otherwise, the report is rejected.
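The gap, cap, and vertical extent tests can be illustrated with a short sketch. It assumes the layer thickness is the hydrostatic (hypsometric) thickness computed from the layer-mean temperature, and that the gap threshold for a layer is chosen by the pressure of its lower level; the text does not state how layers straddling a boundary are handled, so that choice and all function names are assumptions.

```python
import math

R_DRY = 287.04  # gas constant for dry air, J kg-1 K-1
G = 9.8         # gravitational acceleration, m s-2

def thickness_km(p_low, p_up, t_low, t_up):
    """Hydrostatic thickness (km) between two levels, layer-mean temperature."""
    mean_t = 0.5 * (t_low + t_up)
    return (R_DRY / G) * mean_t * math.log(p_low / p_up) / 1000.0

def max_gap_km(p_low):
    """Allowed separation (km); keyed on the lower level (an assumption)."""
    if p_low > 700.0:
        return 1.0
    if p_low > 200.0:
        return 2.0
    if p_low > 50.0:
        return 3.0
    return 4.0

def cap_profile(levels):
    """levels: list of (pressure_hPa, temperature_K), surface upward.

    Returns (number of levels at or below the cap, gap-free extent in km).
    """
    extent = 0.0
    for i in range(len(levels) - 1):
        (p0, t0), (p1, t1) = levels[i], levels[i + 1]
        dz = thickness_km(p0, p1, t0, t1)
        if dz > max_gap_km(p0):
            return i + 1, extent  # cap at the lower level bounding the gap
        extent += dz
    return len(levels), extent

def passes_extent(extent_km, is_dropsonde=False, minimum_km=5.0):
    """Radiosondes need 5.0 km; dropsondes have no extent requirement."""
    return is_dropsonde or extent_km >= minimum_km
```

In this sketch a report containing only the 1000-, 925-, 850-, and 700-hPa levels would be capped at 850 hPa, because the 850 to 700 hPa separation is roughly 1.5 km and exceeds the 1.0-km limit below 700 hPa.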
Profile data from accepted reports that are retained in NPROVS include wind, drift, cloud, TPW, solar angle, and metadata, most notably on the instrument type and radiation correction(s) applied (i.e., solar, infrared, EMC; see http://www.ofcm.gov/fmh3/text/default.htm).
Similar to the GFS 6-h forecast, CFSR (http://cfs.ncep.noaa.gov/cfsr/) and associated background profiles are retained on the radiosonde file stored in NPROVS. CFSR data are not available in the near-real-time environment of NPROVS and are postprocessed onto the file later. At this time, the original report is replaced with a postprocessed report that includes the CFSR and associated background and to which the GFS 6-h forecast from the original report is appended. The QC marks, originally based on the GFS 6-h forecast, are based on the CFSR background on the final postprocessed file. Minor, insignificant differences among the original and postprocessed radiosondes are observed. Intermittent systematic errors in the collocation of CFSR data to Antarctic dropsondes from the Concordiasi Project (see http://www.eol.ucar.edu/projects/concordiasi) were also noted and reported to EMC, which acknowledged processing problems in these isolated cases.
All of the appended forecast data undergo separate gap and cap analysis similar to the radiosonde. This was necessary because cases were observed in which GFS and, in particular, CFSR data were missing for existing report levels. Similar to the sondes, data above caps are retained but are not recommended for use in validation.
One problem in utilizing ensembles of radiosonde profiles is that each is reported at a different vertical level density. This is mitigated by log/linear interpolation of the capped temperature and Td profiles to a fixed set of 42 pressure levels, 42L (Reale et al. 1994), plus the report surface pressure as available. The average thickness between successive levels ranges from about 500 m below 400 hPa to over 1 km above 100 hPa. Interpolation considers only the two report levels immediately adjacent to a given 42L level, and report levels identical to a 42L level are used directly. Averaging schemes were not applied given disparities in the vertical density of global reports. Original and interpolated report data are retained.
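A minimal sketch of this interpolation step follows. It interprets "log/linear" as linear interpolation in the logarithm of pressure, which is an assumption, and the fixed levels used in the usage note are a hypothetical subset rather than the actual 42L set, which is not listed in the text.

```python
import math

def interp_log_p(p_target, p_low, v_low, p_up, v_up):
    """Interpolate a value linearly in ln(p) between two adjacent levels."""
    weight = math.log(p_low / p_target) / math.log(p_low / p_up)
    return v_low + weight * (v_up - v_low)

def to_fixed_levels(report, fixed_levels):
    """Map a report onto fixed pressure levels.

    report: list of (pressure_hPa, value), ordered surface upward.
    Returns {level: value} for the fixed levels bracketed by the report;
    only the two immediately adjacent report levels are used, and a report
    level identical to a fixed level is used directly.
    """
    out = {}
    for p_t in fixed_levels:
        for (p0, v0), (p1, v1) in zip(report, report[1:]):
            if p0 == p_t:
                out[p_t] = v0
                break
            if p1 == p_t:
                out[p_t] = v1
                break
            if p0 > p_t > p1:
                out[p_t] = interp_log_p(p_t, p0, v0, p1, v1)
                break
    return out
```

With a report at 1000, 900, and 700 hPa and hypothetical fixed levels of 1000, 850, 700, and 500 hPa, the 1000- and 700-hPa values are copied, the 850-hPa value is interpolated from the 900- and 700-hPa levels, and 500 hPa is left unfilled because it lies above the report.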
The tropopause temperature and pressure are calculated on 42L according to WMO guidelines (World Meteorological Organization 1957) and retained along with the reported (radiosonde only) tropopause data. Calculated tropopause data are used as the upper pressure boundary for superadiabatic layer, temperature inversion, and tropospheric moisture tests for each report.
Testing for superadiabatic layers is done for each 42L layer from the surface to the tropopause. Profiles containing tropospheric temperature inversions are characterized by the total depth (D), defined as the thickness (1) over which the lower and upper temperature bounds are the same. Figure 2 illustrates D as compared to the inverted layer depth d, the former estimating the threshold vertical resolution for a satellite sensor to begin resolving this feature (Rodgers 1976). An inversion is only defined if d exceeds 100 m and a surface inversion if the lower bound is within 100 m of the surface; the inversion in Fig. 2 is not a surface inversion. The inversion strength, estimated as the maximum temperature increase over D, is also retained.
NPROVS includes three tests to characterize the moisture profile. The first two tests characterize the profile by TPW (mm) amount and by the vertical variation of the DD, respectively. TPW categories are grouped into 15-mm intervals, with the fifth interval covering values greater than 60 mm. DD variation tests identify profiles for which the DD is nearly constant over the extent of the moisture profile. Initialized to 0, the DD flag is set to 1 for profiles with variations less than 5.0 K, 2 for variations less than 2.5 K, and 3 for variations less than 1.0 K, the latter suggestive of instrument malfunction.
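The first two moisture tests reduce to simple threshold logic, sketched below. Two details are assumptions because the text does not specify them: the DD variation is taken as the range (maximum minus minimum) of the DD over the profile, and a TPW value falling exactly on an interval boundary is assigned to the higher interval except at 60 mm.

```python
def tpw_category(tpw_mm):
    """Return the TPW interval number, 1 through 5.

    Intervals are 15 mm wide; interval 5 holds values greater than 60 mm.
    """
    if tpw_mm > 60.0:
        return 5
    return min(int(tpw_mm // 15.0) + 1, 4)

def dd_flag(dewpoint_depressions_k):
    """Return the DD flag (0 to 3) from the DD values of a moisture profile.

    Variation is taken as max minus min, which is an assumption.
    """
    variation = max(dewpoint_depressions_k) - min(dewpoint_depressions_k)
    if variation < 1.0:
        return 3  # nearly constant DD, suggestive of instrument malfunction
    if variation < 2.5:
        return 2
    if variation < 5.0:
        return 1
    return 0
```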
The final moisture test characterizes the profile shape by assigning a score that increases proportional to the number of abrupt changes in water vapor mixing ratio (MR) with height. The goal is to identify profiles that are ill posed for moisture retrieval using typical satellite sensors and in limiting cases indicative of radiosonde sensor malfunction (http://www.ua.nws.noaa.gov/study2.htm). Testing consists of sequential intercomparisons of MR at successive 42L levels to identify layers with increasing MR (moistening) or abrupt decreases in MR (drying). Testing begins at a pressure P1, defined as the upper bound of the surface layer (SFL) with thickness (1) closest to 1.0 km and ends two 42L levels below the calculated tropopause. This is referred to as the base moisture profile. Separate tests are also done within the SFL and for adjacent layers spanning the upper two levels of the troposphere (UT) and lower two levels of the stratosphere (LS), respectively.
Base profile testing compares successive MR values for interpolated levels from i to i + 1 to identify layers over which the MR increases with height. If encountered, a positive event E+ is denoted and the observed rate of increase ROIObs (% km−1) is determined using

$$\mathrm{ROI}_{\mathrm{Obs}} = \frac{\mathrm{MR}_{i+1} - \mathrm{MR}_{i}}{\mathrm{Mean}(\mathrm{MR}_{i,i+1}) \times \mathrm{Thickness}} \times 100 \qquad (2)$$

where thickness (km) is given by (1) and Mean (MRi,i+1) denotes the mean MR for a given layer.
If a given ROIObs exceeds 50% km−1, then an extreme positive event xE+ is denoted. A given E+ ends when the MR for some upper level (i + 1, i + 2, etc.) becomes less than the original MRi that was exceeded, defined as MR0.
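The moistening test for a single layer can be sketched as follows. The form of the rate of increase used here, the MR change normalized by the layer-mean MR and the layer thickness and expressed in percent per kilometer, is inferred from the stated units and definitions and should be read as an assumption. The function names are illustrative, and the tracking of when an event ends (MR falling below MR0) is not included.

```python
def roi_obs(mr_low, mr_up, thickness_km):
    """Rate of increase of mixing ratio with height (% per km), assumed form."""
    mean_mr = 0.5 * (mr_low + mr_up)
    return 100.0 * (mr_up - mr_low) / (mean_mr * thickness_km)

def classify_moistening(mr_low, mr_up, thickness_km, extreme_threshold=50.0):
    """Return "xE+" or "E+" for a moistening layer, otherwise None.

    Thresholds follow the text: any increase (> 0 % per km) is an E+,
    and an increase above 50 % per km is an extreme event xE+.
    """
    roi = roi_obs(mr_low, mr_up, thickness_km)
    if roi <= 0.0:
        return None
    return "xE+" if roi > extreme_threshold else "E+"
```

As a worked case, an increase from 4.0 to 6.0 g kg−1 across a 0.5-km layer gives 100 × 2.0 / (5.0 × 0.5) = 80% km−1 under this form, which would be flagged as an extreme event.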
Concurrent testing is also done to identify successive levels with abrupt drying, denoted E−. This is defined as consecutive interpolated levels (from i to i + 1) for which TDi+1 is less than a “target” value, defined using
where X = 0.45.
If TDi+1 for some upper level is less than TDtarget for X = 0.70 in (3), an extreme negative event xE− is denoted. The threshold values for E+, xE+, E−, and xE−, namely, 0% km−1, 50% km−1, 0.45, and 0.70, were determined empirically with the overall goal to select values for the respective pairs [(E+ and E−) and (xE+ and xE−)] that occur at approximately the same probability distribution function percentiles and in a similar number of reports. Further discussion is presented in section 4.
The moisture score for the base profile is determined by counting the number of E+ occurrences from the top of the SFL to two levels below the tropopause. Examples of profiles with respective moisture scores are illustrated in Fig. 2. The occurrence of an xE− or xE+, indicating more abrupt changes within a given base profile, is also denoted.
A final set of tests characterizes moisture in the SFL, UT, and LS, respectively. The SFL has relatively high moisture content and variability coupled with reduced satellite sensor sensitivity, in contrast to the UT and LS, which have low moisture and associated deficiencies in the radiosonde observations. Numerous publications, including Wang et al. (2003), Miloshevich et al. (2006), and Vömel et al. (2007), address measurement concerns in these atmospheric zones and implications for monitoring and validation. Moisture characterization in the SFL and UTLS identifies profiles for which a water vapor increase with height exceeds 10% km−1 (2). Such cases within the SFL are problematic for satellite validation and within the UTLS represent potentially suspicious moisture reports.
The complete radiosonde data record, including original and interpolated 42L observations, appended forecasts, and test results, is retained for collocation with satellite observations.
3. Collocation with satellites
The first step in the collocation process is to add 45 min to the radiosonde launch time, referred to as the target time. This is done to minimize satellite and radiosonde time differences in cases of two radiosondes from a given site that are equally distant in time from a candidate satellite observation.
The collocation strategy is to respectively access the output files from each satellite product system for a given day and select the “single, closest” satellite profile per system and radiosonde. A candidate satellite observation must be within ±6 h of the adjusted radiosonde launch time and 250 km of the radiosonde launch site.
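The time and distance window can be expressed as a small candidate test. This sketch assumes great-circle (haversine) distance on a spherical Earth and inclusive window bounds; the text does not say how distance is computed, so both are assumptions, as are the function names.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance (km) between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def target_time(launch_time):
    """Radiosonde launch time plus 45 min."""
    return launch_time + timedelta(minutes=45)

def is_candidate(launch_time, sonde_lat, sonde_lon,
                 sat_time, sat_lat, sat_lon,
                 max_hours=6.0, max_km=250.0):
    """True if the satellite observation lies inside the collocation window."""
    dt_hours = abs((sat_time - target_time(launch_time)).total_seconds()) / 3600.0
    dist_km = great_circle_km(sonde_lat, sonde_lon, sat_lat, sat_lon)
    return dt_hours <= max_hours and dist_km <= max_km
```

For a launch at 1115 UTC the target time is 1200 UTC, so a satellite observation at 1730 UTC lies inside the time window while one at 1830 UTC does not.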
The strategy for identifying collocations from each satellite system is essentially the same and does not consider available satellite ancillary information, such as terrain type, QC flag, or cloudiness. However, some consideration of a given satellite’s spatial and temporal aspects is needed to minimize collocation selection bias. For example, candidate observations from polar satellites that are equatorward of 60°N are typically from a single orbit only. However, for polar satellite data poleward of 60°N, and for GOES and COSMIC, multiple candidate observations an hour or more apart can lie within the time and space window. A generic approach for selecting a single closest collocation from respective satellite systems uses the closeness parameter C given as
Where C is the closeness parameter (km), Del(t) is the observed time difference from the radiosonde, F is a time penalty factor (km h−1), and Dist is the observed distance from the radiosonde.
Equation (4) only impacts the collocation in cases for which a set of candidates for a given satellite product vary on the order of an hour or more and is transparent to time changes among respective sets of candidates from a given GOES pass or polar orbit. The parameter F serves as a time penalty factor; for example, setting F to zero would result in the closest observation in distance being selected regardless of time difference, and making F large would result in the closest observation in time being selected regardless of distance. The goal is to set a pragmatic value of F for consistent collocation selection among all satellites, given their specific spatial and temporal characteristics and bounded by wind speed.
Table 1 highlights the variable spatial and temporal aspects of the satellite product systems compiled by NPROVS. For example, polar satellites provide global, all-weather, sun-synchronous observations at designated LST with periodicity at a given location ranging from 12 h in the tropics to 106 min (1.7 h) at high latitudes. GOES observations, conversely, are hourly and confined to clear regions centered over CONUS. Differences in product spatial density are reflected in their respective 50th percentile distances from the sondes, ranging from about 15 km for high-density GOES to a nominal 30 km for the lower-density polar satellites, and 175 km for COSMIC, with a density of about 1% of polar soundings.
Given the characteristics listed in Table 1, the value of F for compiling collocations with each product system group is set to 15 km h−1 for GOES, 30 km h−1 for polar satellites, and 50 km h−1 for COSMIC. For example, setting F to 50 km h−1, a typical wind speed in the troposphere, is reasonable for COSMIC but, for polar satellites and particularly GOES, would skew collocation selection to those closest in time.
Once collocations are made, the associated satellite data records are appended to the report data record, creating individual collocation data records for each sonde. The primary information common to all satellites is the retrieved temperature and moisture profiles, but varying amounts of ancillary data, such as terrain designation, clouds, first guess, appended forecast data, and the sensor observations, are also available from each system. NPROVS retains all the available satellite data from a given system. Some of the more common ancillary data for the respective satellites and systems are listed in the rightmost columns of Table 1.
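The effect of the time penalty factor on selection can be illustrated with a short sketch. It assumes the closeness parameter is the linear sum of the distance and the time difference scaled by F, which is one form consistent with the definitions and limiting behavior described for (4) but is not confirmed here. The group labels and function names are also illustrative.

```python
# Time penalty factors (km per hour) for each product system group.
F_KM_PER_H = {"GOES": 15.0, "polar": 30.0, "COSMIC": 50.0}

def closeness_km(dist_km, dt_hours, f_km_per_h):
    """Closeness parameter C (km); the linear form is an assumption."""
    return dist_km + f_km_per_h * abs(dt_hours)

def select_closest(candidates, group):
    """Pick the candidate with the smallest C.

    candidates: list of (dist_km, dt_hours) already inside the
    6-h and 250-km window. Returns None if the list is empty.
    """
    f = F_KM_PER_H[group]
    return min(candidates,
               key=lambda c: closeness_km(c[0], c[1], f),
               default=None)
```

Consider two candidates, one 40 km away but 3 h off and one 100 km away but 0.5 h off. With the polar value of 30 km h−1 the second is selected (C of 115 km versus 130 km), whereas with the GOES value of 15 km h−1 the first is selected (85 km versus 107.5 km), showing how F trades distance against time.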
The collocation dataset is a primary output from NPROVS, providing identically processed baseline sonde, multiple satellite profiles, and appended information, including weather forecast and ancillary data. Approximately 1200 collocation data records are compiled daily and have been archived since April 2008.
The following section presents sets of results from the radiosonde and collocation processing and concludes with examples of satellite product system intercomparisons. Radiosonde results summarize the QC and meteorological aspects, including the frequency of occurrence of temperature inversions and moisture profile structures that are difficult to retrieve from satellite sensor data. Collocation results focus on respective satellite product sample yields under various sampling constraints and compiling common samples among multiple satellite product systems. Satellite product intercomparisons illustrate examples of characteristic satellite product performance and the impact of sampling strategy on the perceived performance of a given system.
Tables 2 and 3 summarize radiosonde screening results over respective 5-day periods in January and July 2010. Given that over 85% of all radiosonde reports are from the Northern Hemisphere, with over 65% poleward of 30°N, these reflect typical winter versus summer profile tendencies.
Table 2 lists the percentage of reports for which an EMC QC mark resulted in a rejected observation, a gap was observed resulting in a capped profile, and the observed or capped profile did not extend 5 km without a gap. Reports with rejected observations resulting from the QC mark tend to be higher during winter, with almost 40% of the reports having at least one datapoint rejection. Gaps were observed in about 8% of the reports overall, and approximately 5% of reports during winter and 3% during summer did not extend 5 km without a gap. The remaining columns of Table 2 show the percentages among reports exceeding 5 km that contained a superadiabatic layer, temperature inversion, and associated D values less than 1.5 km and greater than 2.5 km, respectively. The percentage of reports with superadiabatic layers is higher during summer and with inversions is higher during winter with an almost fourfold increase in inversion depths greater than 1.5 km during winter. Reports with missing or rejected surface data are slightly higher during winter versus summer and among reports with valid surface data about 32% indicated a surface inversion during winter compared to 23% during summer.
Table 3 lists the percentage of accepted reports during winter and summer with specified moisture profile characteristics. The features summarized are DD variation; base profile score and abruptness; moisture in the SFL, UT, and LS; and TPW content in 15-mm increments. Fewer cases of DD profiles that vary less than 5.0 K are observed during summer. Base moisture profile scores show a slight tendency for higher scores during summer associated with the increased moisture contents and transport; about 50% of profiles exhibit layers of “abrupt” moistening and/or drying. Moisture testing in the SFL and UTLS identifies profiles for which the water vapor increase exceeds 10% km−1. A twofold increase in such occurrences is observed in the SFL during winter and typically associated with temperature inversions. Occurrences in the UTLS are more frequent in the LS and associated with the tendency for profiles above the tropopause to have an approximately constant DD. The TPW increments clearly indicate the Northern Hemisphere influence on the global dataset with higher frequencies of higher values during summer.
Tables 4 and 5 provide additional statistical summaries underlying the moisture profile scores and selection of test thresholds. Table 4 lists the percentage of base profile layers containing specified mean MR increments ranging from >10.0 to <0.01 g kg−1, and the associated percentage of layers containing an E+ and among those an xE+ for the January and July periods. As expected, the distribution of layer mean MR values shifts toward higher values during summer; however, the frequency of E+ per increment is similar for both seasons, with a slight increase in xE+ occurrences during winter. The fact that the frequency of E+ does not steadily increase as layer mean MR decreases alleviates concerns for excessive test bias in cases of low MR.
Table 5 summarizes probability distribution function percentiles and the associated fraction of radiosondes in which the specified ROI and X values were exceeded within the base profile during the January and July periods. Percentiles are referenced from the positive X (dry) and ROI (moist) tails of their respective distributions and are shown for values of maximum frequency of occurrence, for ROI thresholds of 0% km−1 and 50% km−1 for E+ and xE+, and for X thresholds of 0.45 and 0.70 for E− and xE−, respectively. Table 5 shows that the maximum occurring ROI values shift from −34% km−1 in July to −50% km−1 during January, with a corresponding shift from the 45th to 52nd percentile, whereas the maximum occurring X is approximately constant at 0.18 in July and 0.19 in January but shifts from the 49th to 43rd percentile. These seasonal shifts are consistent with the transition to more moist and warm summertime parcels.
Seasonal shifts are less evident in the distribution tails containing the selected ROI thresholds. As seen, the ROI threshold value of 0% km−1 is located at the 16th and 17th percentiles and is exceeded at least once in 73% of the radiosondes during July and January, respectively. The X threshold value of 0.45 occurs at the 11th and 9th percentiles during July and January, respectively; however, this value is exceeded in 72% of radiosondes in summer, lowering to 61% in winter. Even though the X value lies farther out on the distribution tail, it is exceeded in nearly the same fraction of reports as the ROI threshold during summer. The paired ROI and X values best satisfy the goal of using a single set of moistening and drying thresholds for all seasons that are exceeded in a similar number of reports, with emphasis on moistening. Similar features are shown for corresponding extreme (x) event thresholds. As presented in the satellite products intercomparisons, the moisture characterization provides a meaningful sorting of reports in the context of satellite sensor sensitivity with notable impacts on results, including for the more highly resolved hyperspectral sensors, satisfying a primary objective of this work.
The characteristics of the collocation samples drawn from the more than 1200 collocation records processed and archived daily provide relevant information for interpreting satellite product performance. The left panel of Fig. 4 shows an example of global radiosondes over North America, color coded by assigned terrain type; the middle panel shows a single set of collocated radiosonde and satellite products from Wallops Island, Virginia; and the right panel shows the collocated temperature (solid) and water vapor (dashed) profiles. The drift (pink) of the radiosonde relative to the spatial domain of the collocated observations is also illustrated and, in this case, clearly exceeds the spatial domain of the satellite data. A comprehensive global climatology of typical radiosonde balloon drift distances has been documented using 2 yr of NPROVS data (Seidel et al. 2011). Conventional collocation and validation strategies typically ignore the radiosonde drift in space and time, but explicit consideration of drift parameters is available through NPROVS data, as demonstrated in Sun et al. (2010).
The comparison of multiple product systems using independent versus common sampling strategies is a valuable analytical asset. In general, the use of a common sample over multiple systems is preferred because it improves the integrity of the intercomparison. However, as the number of systems and constraints accumulate, sample yields can drop significantly and potentially skew respective performance results. This is illustrated in Tables 6a,b, which provide examples of collocation sample yield ratios for individual product systems and constraints (Table 6a) and common samples of multiple systems (Table 6b). “None” indicates no constraints and corresponds to the ratio of the total number of collocations containing a given satellite product over the total number of radiosondes (i.e., 36 641). “QC” indicates the ratio of collocations for which a given product passed QC over the total number of collocations containing the given product. Similarly, “QC + sea” and “QC + clear” indicate the ratios of collocations over sea and mostly clear sky (<10% cloud cover), respectively, over the number of collocations that passed QC. Respective designations for QC, sea, and clear are as defined by the provider of a given product system.
One problem is that the respective product systems tend to define QC, terrain, and cloudiness differently, adding ambiguity to intercomparisons and reducing common sampling yields. For example, in Table 6a, yield ratios for IASI and AIRS products that pass QC range from 0.32 for IASI EU (which rejects profile segments in cloud) to 0.74 for AIRS, compared to 0.95 or greater for ATOVS and MIRS. Similarly, among the QCed products, the MIRS sample yield for sea collocations is 0.18 compared to 0.10 for ATOVS and IASI, and a meager 0.02 for AIRS. Such differences clutter the intercomparisons, and in the case of common samples with multiple constraints they can significantly reduce sample yields and representation. For example, in Table 6b, QCed products for ATOVS and MIRS (1 and 2) from MetOp have a common sample yield ratio of 0.94. Adding QCed NOAA IASI, also from MetOp (1, 2, and 3), reduces the ratio to 0.58, and adding QCed IASI EU (1–4) further reduces the ratio to 0.25. Adding further constraints, such as cloudiness or sea terrain, reduces the final sample yield to about 0.15 and 0.01 of the original QCed sample, respectively, which in the best case would be about 4% and 0.3% of the originally available collocations. Tables 6a,b serve as a guide to the sample yield “penalties” incurred when comparing multiple systems with various sampling constraints.
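The compounding of these yields is simple multiplication, and a short Python sketch makes the quoted "best case" percentages explicit. The helper name `overall_yield` is hypothetical, and the numbers are the rounded ratios quoted in the text for Table 6b, not the table values themselves.

```python
import math


def overall_yield(*ratios):
    """Multiply successive yield ratios to get the fraction of the starting sample."""
    return math.prod(ratios)


common_qc_yield = 0.25   # QCed ATOVS + MIRS + NOAA IASI + IASI EU (systems 1-4)
clear_fraction = 0.15    # further restriction to mostly clear sky
sea_fraction = 0.01      # further restriction to sea terrain

clear_overall = overall_yield(common_qc_yield, clear_fraction)  # 0.0375, about 4%
sea_overall = overall_yield(common_qc_yield, sea_fraction)      # 0.0025, about 0.3%
```

These are best-case figures because they assume every available collocation contains all four products before QC is applied; any missing product lowers the starting sample further.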
Building on the results of Tables 2–6, examples of validation results and impacts are presented for various seasonal, radiosonde profile, and satellite product attributes. Common sampling strategies are used for the shorter-term vertical profile statistics and independent strategies for the longer-term trend analysis. Results are compiled using features of the NPROVS PDISP and NARCS analytical interfaces (Reale et al. 2010), which can be downloaded along with associated datasets online (see http://www.star.nesdis.noaa.gov/smcd/opdb/poes/NPROVS.php).
As discussed, validation results are primarily indicative of characteristic product performance in a relative sense, with outlier curves being suggestive of possible problem areas. Figures 5–8 show sets of vertical profile statistics of satellite-minus-radiosonde mean (left) and standard deviation (right) for temperature (K) and water vapor fraction (%) for the denoted satellite product systems; the GFS 6-h forecast-minus-radiosonde differences are denoted as the GFS product. The periods of record are 30 days during January and July 2010, with pressure (hPa) along the vertical axis and sample size across the topmost axis. All curves are based on a common sample of QCed products for all terrain and sky conditions. Sample sizes are indicated by the dark blue curves. Water vapor fraction is expressed as a percent difference of mean water vapor (g kg−1) at a given level. A statistical technique to weight these differences in proportion to water vapor amount (Susskind et al. 2003; Tobin et al. 2006) is being considered, which can reduce standard deviations above 400 hPa on the order of 20%–40%.
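As a rough illustration of the statistics plotted at each pressure level, the following Python sketch computes the mean and standard deviation of satellite-minus-radiosonde differences. Several details are assumptions rather than the documented NPROVS procedure: the helper name `level_stats` is hypothetical, the percent normalization uses the mean radiosonde water vapor at the level, the standard deviation is the population form, and the sample values are invented.

```python
import statistics


def level_stats(satellite, sonde, as_percent=False):
    """Mean and standard deviation of satellite-minus-sonde differences at one level.

    With as_percent=True the results are expressed as a percentage of the mean
    radiosonde value at that level (assumed normalization for water vapor fraction).
    """
    diffs = [s - r for s, r in zip(satellite, sonde)]
    mean_diff = statistics.fmean(diffs)
    std_diff = statistics.pstdev(diffs)  # population form; an assumption
    if as_percent:
        scale = 100.0 / statistics.fmean(sonde)
        return mean_diff * scale, std_diff * scale
    return mean_diff, std_diff


# Invented water vapor mixing ratios (g/kg) for three collocations at one level.
sat_q = [2.2, 1.8, 3.3]
sonde_q = [2.0, 2.0, 3.0]
bias_pct, std_pct = level_stats(sat_q, sonde_q, as_percent=True)

# Invented temperatures (K) for the same collocations.
bias_k, std_k = level_stats([250.5, 249.5, 251.0], [250.0, 250.0, 250.0])
```

Because the normalization divides by the mean water vapor at the level, small absolute differences in the dry upper troposphere translate into large percentages, which is the behavior the moisture-weighting technique mentioned above is meant to temper.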
Figures 5 and 7 illustrate product performance curves typical of Northern Hemisphere winter (top) and summer (bottom) seasons for temperature and water vapor fraction, respectively. Figure 6 isolates temperature performance during winter using radiosondes with (top) and without (bottom) temperature inversions, and Fig. 8 depicts moisture performance during summer using radiosondes with moisture scores of 0 and 1 (top) and 2 or more (bottom). Figures 5 and 7 exclude MIRS, which only provides moisture soundings over land, and Figs. 6 and 8 exclude COSMIC, which has limited sensitivity to moisture. Otherwise, the systems compared in Figs. 5–8 are the same.
The common sample sizes (blue curve) used for Figs. 5 and 7 have associated yield ratios that are slightly under 10%, primarily because of the inclusion of the low-density COSMIC coupled with the relatively low QCed product yield of IASI EU. Using less than 10% of the total sample raises concern, especially regarding the impact of IASI EU, which skews the sample toward clear-sky conditions. However, within this context the intercomparisons are meaningful, and in fact represent a favorable satellite sensor retrieval environment. The exclusion of COSMIC in Figs. 6 and 8 more than doubles the available sample, but these samples are subdivided among the specified radiosonde features. In many cases a characteristic performance tendency for a given system becomes more discernible under different sets of constraints. For example, Fig. 5 shows that the NOAA IASI has distinct features near the surface in winter (top) versus summer (bottom). This performance is also reflected in Fig. 6 for the winter period even though the sample size is approximately doubled. Figure 6 also identifies surface temperature inversions as a root cause of the outlier performance of NOAA IASI.
Finally, the factors outlined in Table 1 must also be considered when analyzing respective satellite product performance. For example, COSMIC is a comparatively low-density, 250-km-resolved profile that is typically 150 km or more from a given sonde, compared to the high-density polar and geostationary products that are typically less than 50 km from a report site. Figures 5 and 7 show elevated standard deviations for COSMIC (yellow), consistent with its increased mean distance from the sondes.
Given the above context, a variety of characteristic features are illustrated in Figs. 5–8 among the respective product systems. Most evident is that the GFS-minus-radiosonde differences are typically smallest, particularly for standard deviation, and more so for temperature than for moisture. This is expected because the GFS data are spatially (horizontally and vertically) interpolated to the sonde and thus serve as a quasi target or baseline for satellite assessment.
Among the satellite products, mean temperature differences are typically within ±0.5 K, except near the surface and for a persistent cold bias observed for IASI EU and AIRS in the lower to middle troposphere (Fig. 5). Characteristic differences between winter and summer are most notable for temperature near the surface (Fig. 5), which is associated with the occurrence of inversions, particularly for NOAA’s IASI (Fig. 6). Comparisons against winter profiles with and without inversions show little or no difference above the surface layer (Fig. 6). Increased vertical oscillations in mean temperature are observed for hyperspectral systems in the upper troposphere and stratosphere, particularly for NOAA’s IASI during summer (Fig. 5). Standard deviation for temperature is smallest for hyperspectral products, averaging about 1.5 K excluding the surface, and largest for ATOVS, with average values approaching 2.5 K (Fig. 5).
Typical mean and standard deviation differences for moisture fraction (%) among satellite product systems range from 25% to 50% below 400 hPa to 100% or more in the upper troposphere (Fig. 7). As mentioned, moisture weighting can reduce perceived differences by up to 40% in the middle and upper troposphere. MIRS exhibits a relatively high moist bias in the lower troposphere, up to 35% during winter (Fig. 7). All satellites exhibit a moist bias in the upper troposphere (Figs. 7 and 8) because of the documented radiosonde dry bias (Soden and Lanzante 1996; Nakamura et al. 2004; Sun et al. 2010). Satellite comparisons against radiosondes with moisture profile scores of 2+ appear wetter with increased standard deviation in the low and middle troposphere (Fig. 8). These characteristics are associated with a slight decrease in mean moisture for these samples.
The panels of Fig. 9 show examples of satellite-minus-radiosonde mean (solid) and standard deviation (dashed) trends for 700-hPa moisture and 50-hPa temperature, with the gray curve depicting GFS-minus-sonde differences. Differences are based on weekly averages of respective “independent” samples of QCed products from all the systems from Figs. 5–8 plus GOES. Unlike Figs. 5–8, the samples for each system are independent and optimized for each system under the common constraint that the satellite products pass their respective QC. For example, GOES samples are confined to CONUS under clear-sky conditions, whereas the polar systems are global with yields as shown in Table 6a. The period of record is one year from June 2009 through May 2010.
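The weekly trend statistics can be sketched as a simple grouping of differences by week. The following Python is a minimal sketch under assumptions: the helper name `weekly_stats` is hypothetical, weeks are taken to be ISO calendar weeks (the text does not say how weeks are delimited), the standard deviation is the population form, and the input values are invented.

```python
import statistics
from collections import defaultdict
from datetime import date


def weekly_stats(records):
    """Group (date, satellite-minus-sonde difference) pairs by ISO week.

    Returns {(iso_year, iso_week): (mean, standard deviation)} in time order.
    """
    by_week = defaultdict(list)
    for day, diff in records:
        iso = day.isocalendar()
        by_week[(iso[0], iso[1])].append(diff)
    return {
        week: (statistics.fmean(vals), statistics.pstdev(vals))
        for week, vals in sorted(by_week.items())
    }


# Invented 50-hPa temperature differences (K) for one product system.
records = [
    (date(2009, 6, 1), 0.4),
    (date(2009, 6, 3), 0.6),
    (date(2009, 6, 8), -0.2),
]
trend = weekly_stats(records)
```

Running this separately for each product system, with each system's own QC filter applied beforehand, corresponds to the independent sampling described above.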
Long-term trend analysis using independent samples complements the shorter-term comparisons using common samples. Immediately evident are the seasonal differences and pattern shifts, for example, the tendency toward higher standard deviation in winter, particularly for moisture, and the shift in NOAA’s IASI performance around 20 September. Comparing contrasting parameters such as 50-hPa temperature, which is a relatively stable level with respect to satellite retrieval, and 700-hPa moisture, which is a highly variable level and difficult to retrieve, presents a broad range of performance features.
Overall, the GFS-minus-radiosonde (gray) mean differences indicate a persistent background moist bias of about 10% at 700 hPa. MIRS SSMIS appears to be an outlier at about 30% moist bias, MIRS MetOp and GOES are nominal at about 12% moist bias, and AIRS is an outlier at about 12% dry bias. Satellite standard deviation values appear lowest for the NOAA IASI, with the highest values for ATOVS, IASI EU, and MIRS SSMIS. The seasonal shift toward higher values in winter is associated with the lower moisture values during this period (Table 3).
Temperature bias at 50 hPa is generally within 0.5 K and negative, except for ATOVS. GOES and COSMIC show the lowest sustained bias and standard deviations, with the COSMIC result complementing its status as a candidate reference temperature measurement in the upper troposphere and stratosphere (Kursinski and Hajj 2001). Two break points in the record are evident: the one mentioned earlier for the NOAA IASI, and one for the MIRS MetOp, which appears to merge with the group beginning in March. Standard deviation appears lowest for GOES and highest for MIRS, and it is lower for NOAA’s IASI after 20 September (along with a slight shift in the NOAA IASI moisture).
The above examples represent the tip of the iceberg concerning validation strategies that are available using NPROVS. Available options for constraining samples that were not discussed include the use of specified time difference increments from the sonde, radiosonde instrument type, regional studies, and specific site analysis, but each comes with an associated sample size reduction, particularly when using common samples. However, over long periods the sample sizes for performing such analyses can be significant, justifying the NPROVS goal of maintaining a permanently expanding archive.
NPROVS provides NOAA/STAR with a centralized validation protocol for the routine monitoring and intercomparison of derived atmospheric weather products from polar-orbiting and GOES environmental satellites. This is achieved through the compilation and analysis of collocated conventional radiosonde, dropsonde, NWP, and independently processed satellite product observations. NPROVS compiles collocations on a daily basis with all collocations routinely archived at STAR. The associated EDGE analytical interface provides a variety of sampling options for detailed analysis on real-time weather (daily and weekly) and seasonal time scales.
A variety of data processing, quality control, and screening tests are performed on the sondes, including the assignment of a terrain designation and the segregation of profiles containing temperature inversions and/or abrupt changes in moisture. For each sonde, a single sounding from each satellite system is collocated, namely the one closest in space and time within a window of 6 h and 250 km. Approximately 1200 collocation data records are processed daily and have been archived at STAR since April 2008. Results are presented that intercompare the characteristic performance of the satellite product systems and that illustrate the utility of various sampling strategies and their impacts on sample size yield and perceived characteristic performance tendencies.
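The single-closest-sounding selection can be sketched as follows in Python. This is not the NPROVS implementation: the report states that the collocation approach is optimized for each satellite system, so the ranking rule used here (smallest distance first, time difference as a tie-breaker) is an assumption, as are the helper names, the tuple layout, and the invented coordinates and times.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0
MAX_DT = timedelta(hours=6)   # time window from the text
MAX_DIST_KM = 250.0           # distance window from the text


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def closest_sounding(sonde, soundings):
    """Pick one sounding for a sonde; both are (time, lat, lon) tuples.

    Returns None when no sounding lies within both windows.
    """
    best, best_key = None, None
    for snd in soundings:
        dt = abs(snd[0] - sonde[0])
        dist = great_circle_km(sonde[1], sonde[2], snd[1], snd[2])
        if dt > MAX_DT or dist > MAX_DIST_KM:
            continue
        key = (dist, dt)  # assumed ranking: distance first, then time difference
        if best_key is None or key < best_key:
            best, best_key = snd, key
    return best


# Invented example: a sonde launched near Wallops Island, Virginia.
launch = datetime(2010, 1, 15, 12, 0)
sonde = (launch, 37.94, -75.47)
candidates = [
    (launch + timedelta(hours=1), 38.50, -75.00),     # within both windows
    (launch + timedelta(hours=7), 37.90, -75.50),     # too late
    (launch + timedelta(minutes=30), 41.00, -75.00),  # too far
    (launch + timedelta(hours=5), 38.00, -75.40),     # nearest valid candidate
]
picked = closest_sounding(sonde, candidates)
```

Applying `closest_sounding` once per satellite system for each sonde yields at most one sounding per system, which matches the structure of a collocation record described above.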
NPROVS applications are not limited to routine satellite product monitoring. For example, 2 yr of NPROVS collocation data have helped define uncertainties associated with space and time mismatch (Sun et al. 2011) and compile radiosonde balloon drift statistics (Seidel et al. 2011). NPROVS expansion is underway to support GRUAN in the analysis of observations from collocated ground instruments, and ultimately to integrate balloon and available ground data to derive a site’s atmospheric state best estimates (Tobin et al. 2006). Studies of radiation-induced temperature errors in global radiosonde data identified using NPROVS collocations containing COSMIC data (Sun et al. 2011) are helping to narrow down uncertainties in long-term upper-air temperature change (Sherwood et al. 2005) and to improve the radiation correction schemes used in NWP data assimilation and weather forecasting.
This work was supported by the JPSS Program in conjunction with the EDR calibration and validation of operational weather products from the NPP polar-orbiting satellite. We especially acknowledge Dr. Mitch Goldberg, Chief of the NOAA/STAR/Satellite Meteorology and Climatology Division (SMCD), who initially proposed the development of centralized multiple satellite product and ground truth validation protocols as a benchmark process at NOAA, and the ongoing support from our colleagues at STAR. We also acknowledge the NOAA EMC, and in particular Jack Woollen, for providing access and support for processing radiosonde observations, and NCAR, including Doug Hunt, for support in accessing COSMIC data. The contents of this manuscript do not necessarily reflect any position of the U.S. government or NOAA.
AIRS: Atmospheric Infrared Sounder
ATOVS: Advanced TIROS Operational Vertical Sounder
Cal–Val: Calibration and validation
CFSR: Climate Forecast System Reanalysis
CONUS: Continental United States
COSMIC: Constellation Observing System for Meteorology Ionosphere and Climate
CrIMSS: Cross-track Infrared Microwave Sounder Suite
DMSP: Defense Meteorological Satellite Program
EDGE: Environmental Data Graphical Evaluation system
EDR: Environmental Data Record
EMC: Environmental Meteorological Center
EOS: Earth Observing System
EUMETSAT: European Organization for the Exploitation of Meteorological Satellites
EU: Same as EUMETSAT
GCOS: Global Climate Observing System
GFS: Global Forecast System
GOES: Geostationary Operational Environmental Satellite
GPSRO: Global Positioning System Radio Occultation
GRUAN: GCOS Reference Upper Air Network
GTS: Global Telecommunications System
IASI: Infrared Atmospheric Sounding Interferometer
IGRA: Integrated Global Radiosonde Archive
JPSS: Joint Polar Satellite System
LST: Local standard time
MetOp: Meteorological Operational (satellites)
MIRS: Microwave Integrated Retrieval System
NARCS: NPROVS Archive Summary analytical interface
NASA: National Aeronautics and Space Administration
NOAA: National Oceanic and Atmospheric Administration
NOMADS: National Operational Model Archive and Distribution System
NPOESS: National Polar-orbiting Operational Environmental Satellite System
NPP: NPOESS Preparatory Project
NPROVS: NOAA Products Validation System
NWP: Numerical Weather Prediction
SSMIS: Special Sensor Microwave Imager/Sounder
STAR: Center for Satellite Applications and Research
TIROS: Television Infrared Observation Satellite
TOVS: TIROS Operational Vertical Sounder
TPW: Total precipitable water
UCAR: University Corporation for Atmospheric Research
WMO: World Meteorological Organization