Abstract
In any study, the collection, processing and storage of data are important, as is the question of whether the data are clean, biased or contaminated. Polluted or adulterated data can mislead the investigator.
Data do not necessarily fall into neatly packaged boxes or groups. Usually a data set is a mixture of several types of phenomena, some of which are basically deterministic in nature while others are not.
This paper illustrates the use of a clustering technique to separate mixed data sets into subsets that exhibit group characteristics. The investigator can then assess the nature and relative importance of each subset, and perhaps judge whether a particular subset is biased, contaminated or adulterated; that is, an assessment of the quality of the data may be made.
The technique is applicable to any data set that is multivariate normal. Here it is applied to the climatological set composed of the winds, temperatures and heights at the 30 mb level over Canton Island, with particular application to the quasi-biennial oscillation of the tropical equatorial stratosphere.
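As a rough illustration of the kind of separation described above, the following sketch fits a two-group normal mixture by expectation-maximization (EM) to synthetic data. It should not be read as the paper's algorithm or data. The EM procedure, the diagonal-covariance simplification, the function names `log_normal_diag` and `fit_mixture`, and the generated "wind" and "temperature" values are all assumptions made for illustration only.

```python
import math
import random


def log_normal_diag(x, mean, var):
    """Log density of a normal with diagonal covariance (assumed simplification)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def fit_mixture(data, init_means, n_iter=50):
    """Hypothetical helper: EM for a k-group normal mixture with diagonal covariances."""
    k, dim, n = len(init_means), len(data[0]), len(data)
    means = [list(m) for m in init_means]
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each group for each observation.
        resp = []
        for x in data:
            logp = [
                math.log(weights[j]) + log_normal_diag(x, means[j], variances[j])
                for j in range(k)
            ]
            top = max(logp)  # subtract the maximum for numerical stability
            p = [math.exp(lp - top) for lp in logp]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixing weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            for d in range(dim):
                means[j][d] = sum(r[j] * x[d] for r, x in zip(resp, data)) / nj
                spread = sum(
                    r[j] * (x[d] - means[j][d]) ** 2 for r, x in zip(resp, data)
                ) / nj
                variances[j][d] = max(spread, 1e-6)
    # Assign each observation to its most probable group.
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return weights, means, variances, labels


# Synthetic two-regime sample (NOT observed data): columns are a made-up
# "wind" and "temperature", drawn from two well-separated normal groups.
rng = random.Random(0)
group_a = [(rng.gauss(-20.0, 3.0), rng.gauss(-55.0, 1.0)) for _ in range(200)]
group_b = [(rng.gauss(10.0, 3.0), rng.gauss(-50.0, 1.0)) for _ in range(200)]
data = group_a + group_b

# Start from the observations with the smallest and largest first variable.
init_means = [min(data), max(data)]
weights, means, variances, labels = fit_mixture(data, init_means)

print("mixing weights:", [round(w, 3) for w in weights])
print("group means:", [[round(m, 2) for m in mu] for mu in means])
```

Each recovered group can then be inspected on its own, for example by comparing its mean and spread with what the investigator expects of clean data. A full-covariance fit would be closer to a general multivariate normal model; the diagonal form is used here only to keep the sketch short.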