The Comprehensive Aerological Reference Data Set (CARDS) is a database of radiosonde records from worldwide sources. However, many of the records are short or incomplete, and there are clear spatial distribution deficiencies.
The objective of this investigation is to select a subset of core stations, with data that give the best possible temporal and spatial coverage for studies of atmospheric temperature or water vapor trends. This is accomplished by assessing the records for content, and calculating the areas for which stations are representative, from ECMWF gridpoint analysis correlation decay.
A subset of 188 core stations is chosen. Their records still contain some periods of missing data; leave some of the earth’s surface unrepresented, mostly over the southern oceans; and may have quality or homogeneity deficiencies. However, coverage, as assessed from information currently stored in CARDS, has been optimized.
The Comprehensive Aerological Reference Data Set (CARDS) is, as its title suggests, a compilation of all available radiosonde and rawinsonde information into a comprehensive dataset (Eskridge et al. 1995). It currently contains over 24 million observations, from 2522 listed stations, extending from the early 1940s to 1992. The dataset continues to expand as records are updated, and, as original sources are tapped, missing information is filled in.
Some records are, however, of limited value for climate studies, because of their short, sporadic, or incomplete content. There are also easily recognizable deficiencies in spatial distribution, with Northern Hemisphere landmasses well covered but a significant lack of data from most other areas.
The objective of this study is to select a subset of core station data that investigators can use, confident in the knowledge that it will give the best possible temporal and spatial coverage of the globe, without unnecessary duplication. There is no suggestion that data not included in the subset should become unavailable or that use of the abbreviated station list should be mandatory. However, it is expected that the 188-station selection will prove to be useful as a dataset of manageable size that will form the basis for future CARDS products and applications.
There have been several climate change studies that have incorporated radiosonde data (Angell and Korshover 1975, 1983; Angell 1988; Oort and Liu 1992). Angell and Korshover (hereafter AK) used a relatively sparse, but carefully selected, network of 63 stations in seven bands of latitude: 60°–90°, 30°–60°, and 10°–30° north and south, and 10°N–10°S. With only nine stations in each band, there were large areas of the earth’s surface unrepresented, but equal area weighting translated this network into a global coverage adequate for calculating global trends. However, given that the distribution of radiosonde stations is heavily biased toward landmasses, especially those in the Northern Hemisphere, it is inevitable that some oceanic areas will be underrepresented.
Oort and Liu (OL) used a much larger dataset of 700–800 stations. Their criterion for inclusion was at least 10 days month−1 with observations to 500 hPa up to 1973, and 15 days month−1 thereafter, regardless of spatial distribution. A maximum of 849 stations met the criterion in 1989. The observations were weighted to represent equal area. In spite of the data-handling differences, global trends they found were similar to those detected by AK.
Parker and Cox (1995), in their recent study of radiosonde data, also mentioned 800 stations, but they recognized that this number is considerably reduced if only those with extended records (>20 yr) are included.
All of these authors have studied historic records. Objectives of the Global Climate Observing System (GCOS) are somewhat different (WMO 1994). GCOS hopes to establish a network of 142 rawinsonde stations worldwide, to monitor future climate. Many of these stations have long records (49 also appear in AK), but others are not yet open, have brief sporadic records, or are winds-only stations. Nevertheless, the list proved to be a good starting point for this investigation.
There have been several assessments of AK and OL estimates of climate behavior, mostly by comparison with model gridpoint data (Madden et al. 1993; Trenberth and Olsen 1991). The consensus conclusion was that, while there are deficiencies inherent in using inevitably imperfect spatial sampling, they are probably less of a problem than temporal inadequacies. However, rms differences between AK regionally averaged time series and European Centre for Medium-Range Weather Forecasts (ECMWF) gridded data are of the same order as the sought-after signal, and trends could possibly be exaggerated.
The effects of incomplete station records have also been studied (Kidson and Trenberth 1988). They found that, because of autocorrelation, if the missing data are evenly spaced, climate may be defined by as few as eight observations per month. If, however, the gaps are randomly distributed, or there are blocks of missing data, the most frequent real-world situation, then the ratio of the rmse of estimated monthly mean gridpoint values to the daily standard deviation increases substantially.
Temporal and spatial distribution deficiencies are not the only problems involved in the use of radiosonde data for climate studies. Eskridge et al. (1995) suggested that many records contain errors, and they have classified them as from random or systematic observational sources, or as “rough” errors such as those that might occur through poor radio reception. CARDS data have been passed through quality control procedures (Alduchov and Eskridge 1996), which include spatial, temporal, and hydrostatic consistency checks, to detect and rectify the random and rough errors.
Systematic observational biases are, however, unlikely to be detected with these checks. They may be from a variety of sources (Elliott and Gaffen 1991; Gaffen 1993, 1994) that include changes in radiosonde design or sensors, changes in algorithms used to calculate the variables or in the application of lag or radiation corrections, changes in observation time, changes in train length, and changes in balloon type or tracking method. It is clear that upper-air observation has developed considerably since its beginnings in the 1940s, so few long records are free of such biases. Indeed, Gaffen (1994) estimated that 43% of AK station records have significant inhomogeneities.
The two essential criteria for inclusion in the core station list must, therefore, be long-period records, with gaps reduced to a minimum, and the data representative of areas that, when composited, cover as much of the globe as possible. It is probably safe to assume that random and rough errors have been detected by CARDS quality control procedures (they are, in fact, not removed but are flagged and an additional, corrected value inserted), but systematic biases are so pervasive that they may be unavoidable. As a consequence, no core station quality criteria have been adopted. However, after selection, perhaps the first product should involve inhomogeneity detection and adjustment.
Of secondary importance, but necessary for research continuity and convenience, priority should be given to AK or GCOS stations.
2. Selection procedure
CARDS has observations from 2522 stations that have World Meteorological Organization (WMO) identification numbers. These stations, however, include radiosonde-launching cargo ships, some stations that only have information for short periods, and others in which the data are sporadic. Although most records contain a full suite of variables (pressure, temperature, water vapor content, wind speed, and direction), there are some with wind observations only. All of these incomplete records are clearly unsuitable for inclusion in a core station set.
There are also many of the 2522 listed sites that have either very close, or even coincidental, locations. Data from these stations may be consolidated into longer records, with fewer missing periods. Some are simply a WMO identification number change for the same station with no period of disruption. Others have involved a relatively short distance site move, possibly with a discontinuity from change of equipment or position, but often with no gap in the daily observation routine. In other cases, there was no intention of continuity, but nevertheless, one relatively long time series, even with gaps, may be more useful than two or more disconnected short records. Discontinuities in combined station data, arising from equipment, administrative, or personnel changes, were not expected to be of greater magnitude than those from the same sources in long, single-station records.
Analysis of the database, with a proximity criterion of 150 km, found that 551 listed sites have near neighbors, and their data are amalgamated into 256 combined records. The majority of these records have no temporal overlap and they are merged. Where there is an overlap, the average is taken [i.e., (n1T1 + n2T2)/(n1 + n2), where n is the number of days of observation in a month and T is the temperature]. Two further additions have been made to this cross-referencing procedure, with GCOS stations 930120 (Kaitaia, New Zealand) and 479710 (Chi Chi Jima, Japan) being combined with 931190 (Auckland, New Zealand) and 479810 (Iwo Jima, Japan), respectively. They are approximately 250 km apart in both cases but consolidate into near-continuous records. The site identification number and location of the latest data are used for all of these combined records.
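As an illustration only, the proximity test and the overlap average described above might be sketched as follows. The function names, the haversine distance formula, and the mean earth radius are assumptions made for this sketch; the text does not state how station separations were computed.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # assumed mean earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance between two points, in km."""
    p1, p2 = radians(lat1), radians(lat2)
    dlat = p2 - p1
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(p1) * cos(p2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def are_near_neighbors(lat1, lon1, lat2, lon2, limit_km=150.0):
    """True if two sites fall within the proximity criterion."""
    return great_circle_km(lat1, lon1, lat2, lon2) <= limit_km


def overlap_mean(n1, t1, n2, t2):
    """Monthly mean for an overlapping month: (n1*T1 + n2*T2)/(n1 + n2).

    n1, n2 are days of observation in the month; t1, t2 are the
    monthly mean temperatures. A record with no days contributes nothing.
    """
    if n1 == 0 and n2 == 0:
        return None
    if n1 == 0:
        return t2
    if n2 == 0:
        return t1
    return (n1 * t1 + n2 * t2) / (n1 + n2)
```

Weighting by days of observation means that a month with only a few soundings at one site has little influence on the combined value.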
Analysis of record content is the next priority. Although CARDS data run from the early 1940s to 1992, focus is placed on the standard 30-yr climatic period, 1961–90. Data from before this period are, in fact, generally sparse, and they also contain the global discontinuity that occurred in 1957 when observation times were changed from 0300 and 1500 to 0000 and 1200 UTC, respectively. Figure 1 shows the distribution of the 471 stations that had 75% or more months, with a minimum of 10 days of temperature observations to 100 hPa month−1, in the 30-yr period. A similar assessment was made for 1971–90. Figure 2 shows the locations of the 555 stations that satisfied the criteria for the shorter period. The general scarcity of data from the Southern Hemisphere and oceans is very apparent.
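The content criterion can be expressed compactly. The sketch below is illustrative and assumes a hypothetical input: a list giving, for each calendar month of the chosen period, the number of days with temperature observations reaching 100 hPa (zero for months with no data).

```python
def meets_content_criterion(days_per_month, min_days=10, min_fraction=0.75):
    """Check the 75%-of-months test for one station.

    days_per_month: one entry per calendar month of the period
    (360 entries for 1961-90, 240 for 1971-90), each the count of
    days with temperature observations to 100 hPa in that month.
    """
    if not days_per_month:
        return False
    good_months = sum(1 for d in days_per_month if d >= min_days)
    return good_months / len(days_per_month) >= min_fraction
```

For the 30-yr period this requires at least 270 of the 360 months to have 10 or more qualifying days.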
WMO recommendations suggest optimum radiosonde spacing of 600 km (WMO 1977), but this is mainly to catch synoptic-scale disturbances. Such a dense network is not necessary for study of global- or regional-scale trends of temperature or water vapor and, in fact, enhances the risk of introducing inhomogeneous data.
To ascertain area of representativeness, time series of ECMWF monthly gridpoint 850–300- and 300–100-hPa thickness data are used, with boundaries drawn where correlation coefficients decay to less than the inverse of e, the base of natural logarithms (1/e = 0.3679).
The information was acquired from the National Climatic Data Center in CD-ROM format, and thickness values are obtained by subtracting monthly geopotential heights at each point on the 73 × 144 global grid. Seasonal, latitudinal means are then removed, and 1980–89 time series of the anomalies are analyzed. Correlations between points and adjacent points, in zonal and meridional directions, are calculated, and the distance of decay to 0.3679 is noted. These distances are then averaged over the 144 points of each latitude band, and the resultant dimensions are assumed to indicate an appropriate area of representativeness. As expected, the decay distances are not isotropic. The areas have elliptic boundaries with symmetric major axes in the zonal direction but asymmetry of the minor meridional axes.
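A minimal sketch of the decay-distance calculation in one direction is given below. It is not the code used in this study: the function names are hypothetical, the 2.5° grid step is inferred from the 73 × 144 grid dimensions, and the linear interpolation between the last two separations is an assumption about how the crossing point might be located.

```python
from math import cos, exp, radians, sqrt

THRESHOLD = exp(-1)        # 1/e, about 0.3679
EARTH_RADIUS_KM = 6371.0   # assumed mean earth radius


def zonal_spacing_km(lat_deg, step_deg=2.5):
    """Distance between adjacent grid points along a latitude circle."""
    return radians(step_deg) * EARTH_RADIUS_KM * cos(radians(lat_deg))


def pearson(x, y):
    """Pearson correlation of two equal-length anomaly series.

    Both series must have nonzero variance.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def decay_distance(reference, neighbors, spacing_km):
    """Distance at which correlation with `reference` falls to 1/e.

    neighbors[k] is the anomaly series k + 1 grid steps away from the
    reference point. Returns None if the threshold is never crossed.
    """
    prev_r, prev_d = 1.0, 0.0  # a series correlates perfectly with itself
    for k, series in enumerate(neighbors, start=1):
        r = pearson(reference, series)
        d = k * spacing_km
        if r < THRESHOLD:
            # Assumed: linear interpolation between the two separations
            # that bracket the threshold.
            return prev_d + (prev_r - THRESHOLD) * (d - prev_d) / (prev_r - r)
        prev_r, prev_d = r, d
    return None
```

Repeating the calculation from each of the 144 points of a latitude band and averaging the results would give the zonal dimension for that band; the meridional dimensions would be obtained in the same way, separately toward each pole.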
These dimensions are, however, calculated for average monthly thickness values only and have been smoothed by taking latitudinal means. They may not be indicative of the correlation decay of other climatic variables and would certainly be different for other timescales, so they do not represent a “worst case scenario.” For safety in their use as station selection criteria, therefore, the dimensions are halved. Additional precautions are also taken by only using tropospheric values, which are generally lower than those calculated from 300–100-hPa thicknesses. For ease of computation in handling meridional asymmetry, the smaller of the two minor axes is always chosen. The dimensions relevant to each station are then calculated by interpolation from the values at adjacent grid points.
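The safety adjustments can be illustrated with a short sketch. It assumes that the band-averaged tropospheric decay distances are tabulated by latitude and that interpolation is linear in latitude; the function names, argument layout, and the order of the interpolation and minimum steps are all hypothetical.

```python
from bisect import bisect_right


def interpolate(lat, lats, values):
    """Linear interpolation of `values` (tabulated at ascending `lats`)."""
    if not lats[0] <= lat <= lats[-1]:
        raise ValueError("latitude outside tabulated range")
    i = max(1, min(bisect_right(lats, lat), len(lats) - 1))
    lo, hi = lats[i - 1], lats[i]
    w = (lat - lo) / (hi - lo)
    return values[i - 1] + w * (values[i] - values[i - 1])


def station_dimensions(lat, lats, zonal_km, north_km, south_km, safety=0.5):
    """Zonal and meridional dimensions for a station at latitude `lat`.

    zonal_km, north_km, south_km are band-averaged decay distances at
    each tabulated latitude. The smaller of the two meridional values
    is used, and both dimensions are halved (safety=0.5).
    """
    zonal = interpolate(lat, lats, zonal_km)
    meridional = min(interpolate(lat, lats, north_km),
                     interpolate(lat, lats, south_km))
    return safety * zonal, safety * meridional
```

For example, with made-up decay distances at 40.0° and 42.5°, `station_dimensions(41.25, [40.0, 42.5], [2000.0, 2400.0], [1000.0, 1200.0], [800.0, 1000.0])` interpolates to 2200 km zonally and 900 km meridionally before halving.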
For final core station selection, it was intended that priority be given to AK and GCOS sites, so their areas of representativeness are plotted and are shown in Figs. 3 and 4, respectively. Both are drawn against a background of the “good” 1971–90 stations in Fig. 2.
The 63-station AK network was never intended to give maximum worldwide coverage. The sites were carefully chosen for representativeness in a study of global upper-air temperature trends. It is, therefore, no surprise that there are huge gaps, some of which, such as that over the United States, can easily be filled.
Figure 4 shows that the GCOS network also has significant gaps. However, using this 142-station plot, together with the background 1971–90 sites, it is possible to identify unrepresented areas and, at least over the northern landmasses, to select stations that give full coverage.
As noted earlier, the GCOS list has some sites that might be considered unsuitable for inclusion as historic dataset core stations. After investigation, some have not been used; others have been replaced with more complete, relatively close stations; and a few have been retained as there is no realistic alternative. Several additional stations are also added for more complete coverage. In spite of overlapping areas, because of their often incomplete content, no GCOS tropical records were excluded from the CARDS subset.
The final CARDS core station subset contains 188 records, some of which are composite. Figures 5 and 6 show their distribution and areas of representativeness, respectively, the latter also with the 1971–90 background. Table 1 lists the stations, together with their positions and elevations, and whether they have combined records or appear in either the AK or GCOS lists.
It may be indicative of the care AK took in selection that 59 of their 63 stations are included in this subset. There are 119 from the GCOS network, and 49 appear in all three lists. There are also 36 records, listed as if from single stations, that are, in fact, from two or more cross-referenced records. These stations, together with their content, are listed in Table 2.
The CARDS database contains approximately 24 million observations from 2522 listed stations, extending from the early 1940s to 1992. The bulk of these data are in a relatively small number of records, as illustrated by only 555 stations meeting the criteria of 75% or more months, with at least 10 days of temperature observations to 100 hPa month−1, in the period 1971–90. This does not mean that information not contained in these records is irrelevant to all investigations but clearly indicates that use of CARDS for climate studies must involve selection of suitable data. The objective of this investigation is to bypass this selection process, at least partially, by basing CARDS products on a subset of “good” core stations that are strategically positioned for optimal coverage.
By a process that involves elimination of short, sporadic, and winds-only station records, consolidation of some that are very close, and then careful selection to ensure optimal coverage, a core set of 188 sites is chosen.
There are significant areas that are not included, mostly over the oceans and, although quality control procedures have been applied to the dataset, neither completeness nor homogeneity can be guaranteed. It should be emphasized that the content analysis is based on the records currently stored in CARDS. It may be possible to rectify some deficiencies by adding to, or amending, the list as further information is received. An interesting by-product of this research is the clear indication that there are enough stations for full tropical representation, although their records are frequently sporadic and some of the data might be of questionable quality.
Neither atmospheric sounding by satellite-borne instruments nor modeling has made radiosonde data obsolete. In fact, these data may possibly be more essential than ever, as they are the only viable historic source of upper-air data and are the only “ground truth” available for verification of models or calibration of satellite information. Products based on this 188-station subset of CARDS are likely to prove very useful for upper-air research.
The CARDS program is supported by the U.S. DOE under Contract DE-AI05-90ER61011, the Climate and Global Change program of NOAA, and the National Climatic Data Center. I would like to thank Mr. S. R. Doty, Dr. R. E. Eskridge, and Ms. H. V. Frederick of the CARDS project for their help and support.
Corresponding author address: Dr. Trevor W. Wallis, The Orkand Corporation, Federal Building, 151 Patton Avenue, Asheville, NC 28806.