Using imagery from NOAA’s Advanced Very High Resolution Radiometer (AVHRR) orbiting sensor, one of the authors (RLB) earlier developed a probabilistic neural network cloud classifier valid over the world’s maritime regions. Since then, the authors have created a database of nearly 8000 16 × 16 pixel cloud samples (from 13 Northern Hemispheric land regions) independently classified by three experts. Of these samples, 1605 were of sufficient quality to represent 11 conventional cloud types (including clear). This database serves as the training and testing samples for developing a classifier valid over land. Approximately 200 features, calculated from a visible and an infrared channel, form the basis for the computer vision analysis. Using a 1–nearest neighbor classifier, meshed with a feature selection method using backward sequential selection, the authors select the fewest features that maximize classification accuracy. In a leave-one-out test, overall classification accuracies are 86% and 78% for the water and land classifiers, respectively, with accuracies at 88% or greater for general height-dependent groupings. Details of the databases, feature selection method, and classifiers, as well as example simulations, are presented.
The proliferation of satellite imagery today is surpassing our manual ability to provide meaningful and timely analyses for the user. This statement is particularly true for the U.S. Navy, for which frequent tour rotations preclude the buildup of imagery interpretation expertise at remote, worldwide locations, particularly aboard ship. For these reasons, the Naval Research Laboratory (NRL) has, over the past 8–10 years, applied pattern recognition technology, now commonly called computer vision, to meteorological satellite imagery. This research has focused on cloud system (Peak and Tag 1992, 1994; Bankert and Aha 1995), as well as individual cloud, classification (Bankert 1994; Bankert and Aha 1996). Recently, automated tropical cyclone intensity analysis has been undertaken (Bankert and Tag 1998).
A number of researchers have addressed cloud classification from a variety of perspectives. Other imagery classification studies include, but are not limited to, the discrimination of surface and cloud types in polar regions (Ebert 1987, 1989; Key et al. 1989; Key 1990; Welch et al. 1992), the classification of cloud types in tropical scenes (Shenk et al. 1976; Desbois et al. 1982; Inoue 1987), the discrimination of ice and water clouds (Knottenberg and Raschke 1982), the separation of clouds and snow (Tsonis 1984; Allen et al. 1990), and the classification of ocean clouds (Garand 1988). Lee et al. (1990) concluded that nonparametric classification methods such as neural networks and K nearest neighbor produced results superior to, and needed less training data than, linear discriminant analysis in the classification of stratocumulus, cumulus, and cirrus. More recently, Lewis et al. (1997) added spatial and temporal cloud characteristics to conventional spectral and textural cloud features in an attempt to improve classification accuracy. Using Advanced Very High Resolution Radiometer (AVHRR) imagery, Miller and Emery (1997) developed an automated classifier for both underlying land and ocean surfaces; the technique is intended for determining the choice of a passive microwave rain-rate algorithm for use with the Tropical Rainfall Measuring Mission satellite. Baum et al. (1997) applied fuzzy logic in their classification approach to discriminate between single- and multilayered clouds. Lubin and Morrow (1998) applied Ebert’s (1987) classifier to overwater AVHRR Arctic scenes in which surface observations of cloudiness were used as ground-truth verification.
Cloud classification procedures by the above authors serve diverse purposes. Our purpose is to develop a classifier that can provide a quick “first-look” analysis of conventional meteorological cloud types to a naval meteorologist. Because the only digital imagery data that U.S. Navy ships can acquire directly is from orbiting satellites, we limited this initial work to AVHRR [in contrast to Defense Meteorological Satellite Program (DMSP)] imagery because of its availability. As with other artificial intelligence (AI) products that we have developed, such as expert systems, we treat this product as an aid to the forecaster and not an end in itself. Note that, because we originally expected to apply any developed algorithm to DMSP data, which has only corresponding visible and infrared channels, the AVHRR development was limited to the use of channels 1 and 4 only. Current development of a Geostationary Operational Environmental Satellite (GOES)-based classifier will consider additional channels.
Section 2 discusses the development of the cloud databases used for our supervised cloud classifications. Section 3 covers the feature selection and classification procedures. Section 4 summarizes the classification accuracies for the water/land daytime components of the classification system. Section 5 discusses postprocessing and display. Section 6 provides a summary of lessons learned and final discussion.
In general, classification procedures can be divided into two types: supervised and unsupervised. The maritime daytime classification procedure developed by Bankert (1994) was based upon supervised classification, using a database of NRL-classified cloud images (Crosiar 1993), supplemented by approximately 600 classifications from the Naval Postgraduate School (C. Wash 1991, personal communication). See Table 1 for a listing of the regions and months covered by the maritime database.
The premise of supervised classification is the “training” of a classifier based upon known, typed cases of specific clouds (in a 16 × 16 pixel area for our purposes) such that the classifier, once trained, can be used with confidence on unknown cloud image samples. This method, although straightforward, entails considerable effort in the manual typing of the training samples. Bankert’s (1994) database had a total of 1633 samples in 10 classes [later increased to 11 classes (see Table 2), for a total of 1834 samples, to allow for the addition of cumulus congestus (CuC) and altocumulus (Ac), and the removal of nimbostratus (Ns)]. Manual classifications by the experts included the following requirements: stratiform clouds had to cover 75% or greater of a 16 × 16 pixel area; for cumuliform clouds (except Cb), the coverage requirement was 50%. Note that the only cirrus-type cloud allowed to be defined as optically thin was Ci (not Cs or Cc). Samples did not include any areas covered by smoke or fires. Considering the multitude of these complexities, it would be very desirable if training data could be generated without the time-consuming process of manual classification.
One automated method for creating such a database would be to use unsupervised learning techniques. An unsupervised method allows the classifier to determine, using a mathematical separability of classes based upon the designated cloud characteristics (features), its own division of cloud types. These types may or may not coincide with conventional typing nomenclatures, but are often useful in understanding new sets of data for which there are no recognized separations. In an attempt at determining whether unsupervised classification could be used to generate a land database, Gordon et al. (1995) applied two unsupervised learning procedures, the K-Means algorithm (MacQueen 1967) and AutoClass (Cheeseman et al. 1988), to our previously hand-classified maritime cloud database. Our goal was to determine if existing clustering algorithms could type clouds into classes similar to conventional meteorological cloud classes (Table 2). Without presenting the detailed results here, we concluded that the classes developed by the clustering methods, although generally consistent with the classes shown in Table 2, were not sufficiently segregated in order to formulate a new land database.
Because of limitations and uncertainties imposed by using an unsupervised learning procedure, the general procedure established by Crosiar (1993) was followed again for development of the cloud database over land. Over a 2-month period during the spring of 1996, three meteorological satellite experts (Mr. Robert Fett and Mr. Ron Englebretson of Science Applications International Corporation, and Mr. Robin Brody of NRL) were employed to hand-classify cloud samples on pixel data taken from AVHRR local area coverage images, the same as used by Crosiar. The AVHRR instrument collects radiance measurements of reflected solar radiation and emitted thermal infrared (IR) energy in five spectral window regions at a spatial resolution of 1.1 km at nadir. Identical to Bankert’s (1994) scheme, visible channel 1 (0.55–0.68 μm) and IR channel 4 (10.5–11.5 μm) were chosen as the channels to be used in the classification procedures. As was done with the maritime cloud data, visible channel data are normalized by the cosine of the solar zenith angle (SZA). For both the training data and later operational application, a maximum SZA of 75° was arbitrarily imposed.
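The visible-channel normalization and the 75° daytime cutoff described above can be sketched as follows; the function name and the convention of rejecting high-angle samples by returning None are illustrative assumptions, not the operational code.

```python
import numpy as np

def normalize_visible(albedo, sza_deg, max_sza=75.0):
    """Normalize a visible-channel albedo by the cosine of the solar
    zenith angle (SZA).  Samples beyond the arbitrary 75-degree cutoff
    are rejected (None); names and conventions here are illustrative."""
    if sza_deg >= max_sza:
        return None  # too close to the terminator for daytime use
    return albedo / np.cos(np.radians(sza_deg))
```

The same correction is applied both when building the training database and when classifying operational imagery, so that samples taken at different solar angles are comparable.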
Data from 13 worldwide regions were used for the training/testing dataset; see Table 3 for a listing. In addition to the visible and IR imagery, the human classifiers had pixel-level temperatures (computed from the IR data) and surface and upper-level synoptic charts at their disposal. From the imagery represented in Table 3, a total of 7912 16 × 16 pixel cloud samples (see section 3 for size restrictions) were analyzed by the three experts for the cloud classes shown in Table 2. Of these, all experts agreed on the classes of 2945 samples. However, because many of these agreements were for samples that fell into an unknown or “mixed” cloud category, only 1201 samples were of use in a supervised classifier. Samples were considered mixed if more than one cloud type was determined to exist in the 16 × 16 pixel region. No mixed samples are used in the training data. With 7 of 11 classes falling below the nominal 100 samples per class (see section 2a), we decided to augment this number with an additional 421 samples in which two out of three experts agreed, and the third differed only in type but not in height of cloud (low, middle, high). For example, if two experts classified an image as Sc and the third as Cu, the Sc classification was accepted; however, if two classified an image as As and the third registered Cs, this sample was thrown out (see Table 4).
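The two-of-three agreement rule just described can be sketched as follows. The class abbreviations follow Table 2; the text defines only the low, middle, and high groups, so the treatment of the vertical and clear classes in this mapping is our assumption for illustration.

```python
from collections import Counter

# Height groups: low/middle/high follow the text; the vertical and
# clear entries are our assumption for completeness.
HEIGHT = {"St": "low", "Sc": "low", "Cu": "low",
          "Ac": "middle", "As": "middle",
          "Ci": "high", "Cs": "high", "Cc": "high",
          "CuC": "vertical", "Cb": "vertical", "Cl": "clear"}

def consensus(labels):
    """Accept a sample if all three experts agree, or if two agree and
    the dissenter names a different type at the same height; otherwise
    (including any 'mixed' vote) discard it by returning None."""
    if "mixed" in labels:
        return None
    (label, n), = Counter(labels).most_common(1)
    if n == 3:
        return label
    if n == 2:
        odd = next(l for l in labels if l != label)
        if HEIGHT.get(odd) == HEIGHT.get(label):
            return label
    return None
```

Under this rule the paper's two examples come out as stated: (Sc, Sc, Cu) is accepted as Sc, while (As, As, Cs) is discarded because the dissenting Cs is at a different height.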
Eliminating an additional 88 samples for either noise or SZA restrictions, and later adding 71 independent samples for cirrocumulus (Cc, for which there were only seven agreed-upon samples in the original database), produced a total of 1605 samples, distributed as shown in Table 5. Note that there are disproportionately more samples in the clear category. However, the number of clear samples was not limited for two reasons: these samples had agreement from all three experts, and they represent an important component for the classifier—a worldwide database of land backgrounds valid nearly year-round. Except for this qualitative worldwide sampling, no sampling distinction was made with regard to vegetation, soil type, or wetness.
Feature selection and classification
The visible and IR imagery databases form the basis for computing features used to classify 16 × 16 pixel areas into the 11 cloud classes shown in Table 2. We separate features, for both water and land classifications, into one of three categories: spectral (e.g., maximum and minimum pixel values), textural (spatial distribution of pixel values), and physical (e.g., cloud fraction, latitude). Because textural features are computed within subregions (4 × 4 pixel areas) of the samples, a 16 × 16 pixel area is the minimum area for which textural characteristics can be defined. For the results presented here, there are 181 total features for the land classifications, and 110 for the water, calculated from this 16 × 16 pixel area (see Table 6). [The original water classifications described by Bankert (1994) used over 200 features; for technical reasons, this total feature set was reduced with no loss in accuracy.]
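A minimal sketch of feature computation for one sample follows; the specific features shown are illustrative stand-ins for the operational 181-feature land set, not a reproduction of it.

```python
import numpy as np

def sample_features(vis):
    """Spectral and textural features for one 16 x 16 visible-channel
    sample.  A sketch only: physical features (cloud fraction,
    latitude) would come from ancillary data not shown here."""
    assert vis.shape == (16, 16)
    feats = {
        "vis_max": vis.max(),      # spectral
        "vis_min": vis.min(),
        "vis_mean": vis.mean(),
        "vis_sd": vis.std(),
    }
    # Textural: statistics over 4 x 4 subregions, which is why 16 x 16
    # is the minimum sample size; this simple subregion SD stands in
    # for the GLDV/SADH measures used in the paper.
    blocks = vis.reshape(4, 4, 4, 4).mean(axis=(1, 3))
    feats["subregion_sd"] = blocks.std()
    return feats
```

The same structure applies to the IR channel, so the full feature vector concatenates spectral, textural, and physical measures from both channels.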
From this large number of features, we must isolate those features that optimize classification accuracy. Because there is redundant or irrelevant information within the feature set, the use of all features necessarily degrades classifier performance. Feature reduction is performed with a feature selection algorithm developed by Dr. David Aha from the NRL AI Center in Washington, District of Columbia. This algorithm, which uses backward sequential selection (BSS), is embedded with a 1–nearest neighbor (1–NN) classifier acting as the evaluation function. In effect, different randomly selected feature subsets are presented to the algorithm. Within the algorithm, BSS is a process in which features are progressively removed from the feature subset until performance (using 1–NN evaluation) does not improve; see Aha and Bankert (1995) and Bankert and Aha (1996). The reduced subset that provides the highest accuracy in a cross-validation test is selected. The feature selection sequence is illustrated in Fig. 1.
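In outline, the BSS search reduces to a greedy elimination loop. Here `evaluate` stands for the 1–NN leave-one-out accuracy used as the evaluation function; the restart-on-improvement policy is a simplification of the published algorithm, offered as a sketch only.

```python
def backward_select(features, evaluate):
    """Backward sequential selection: starting from the full feature
    set, greedily drop the feature whose removal improves the
    evaluation score, and stop when no single removal helps.
    `evaluate` maps a feature subset to a score (e.g., 1-NN
    leave-one-out accuracy).  A generic sketch, not NRL's code."""
    current = set(features)
    best = evaluate(current)
    improved = True
    while improved and len(current) > 1:
        improved = False
        for f in sorted(current):
            trial = current - {f}
            score = evaluate(trial)
            if score > best:
                best, current, improved = score, trial, True
                break  # restart the scan from the reduced set
    return current, best
```

With an evaluation function that penalizes irrelevant features, the loop strips them away and returns the small subset that scores highest, mirroring the behavior described in the text.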
The feature selection testing procedure uses leave-one-out (LOO) cross validation of the 1–NN classifier. A LOO test is an extreme version of cross validation. Whereas, for example, 10-fold cross validation removes 10% of the dataset each iteration to be used as testing data (with the remaining 90% used for training), with this process repeated 10 times, a LOO cross validation repeats the training and testing the same number of times as there are data samples; if there were 1000 data samples, the training and testing process would be repeated 1000 times, with each sample serving as an independent test based on a training set of 999. The average of the 1000 tests would provide an estimate of the algorithm’s accuracy on unseen data.
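The LOO evaluation of a 1–NN classifier can be written directly. This sketch uses Euclidean distance and a brute-force neighbor search, which suffices at the database sizes described here; it is an illustration of the procedure, not the operational implementation.

```python
import numpy as np

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier:
    each sample is classified by its nearest neighbor (Euclidean
    distance) among the remaining N-1 samples."""
    X, y = np.asarray(X, float), np.asarray(y)
    n = len(X)
    correct = 0
    for i in range(n):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf            # exclude the held-out sample itself
        correct += y[d.argmin()] == y[i]
    return correct / n
```

Each of the n passes trains on n − 1 samples and tests on the one held out, so the returned fraction is exactly the average of n independent single-sample tests.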
In operations, the 1–NN classifier is used with the feature set selected by the BSS/1–NN algorithm. This procedure is different from that of Bankert (1994), who used a probabilistic neural network (PNN) to do the classification. The changeover to a 1–NN was made because of its use in the feature selector. Because the PNN requires manual adjustment of a smoothing parameter to optimize performance, the PNN was not suitable for the iterative feature selection process. Features selected using the 1–NN as the evaluation function were usually not ideal, with some accuracy lost, when used as the PNN input vector. As noted by Lee et al. (1990), both the 1–NN and the PNN are nonparametric classifiers, referring to the fact that no a priori assumption is made regarding the statistical distributions of the supporting feature sets. This characteristic permits accuracies for the 1–NN and the PNN to be similarly high.
From the initial feature sets summarized in Table 6, and using the Aha feature selection program, 10 features for water and 11 features for land provided the highest 1–NN LOO accuracy. See Tables 7 and 8. Textural measures, representing the spatial distribution of pixel gray-level values within the image, are denoted GLDV (gray-level difference vector) and SADH (sum and difference histogram). ASM is the angular second moment; SD, standard deviation; SZA, solar zenith angle; and ABS, absolute value. See Welch et al. (1992) for a detailed description of texture measurements.
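As an example of the textural category, a GLDV and the ASM derived from it can be sketched as follows; the one-pixel horizontal displacement and the quantization level are illustrative choices, not the values used in the selected feature sets.

```python
import numpy as np

def gldv_features(img, levels=16):
    """Gray-level difference vector (GLDV) for a one-pixel horizontal
    displacement: the normalized histogram of absolute gray-level
    differences between neighboring pixels, from which the angular
    second moment (ASM) and the mean difference follow.  A sketch of
    the standard textbook definition."""
    img = np.asarray(img, int)
    diffs = np.abs(img[:, 1:] - img[:, :-1]).ravel()
    hist = np.bincount(diffs, minlength=levels) / diffs.size
    asm = float(np.sum(hist ** 2))          # high for uniform texture
    mean = float(np.sum(np.arange(len(hist)) * hist))
    return {"ASM": asm, "mean": mean}
```

A perfectly smooth sample concentrates the histogram at zero difference, giving an ASM of 1; rough, high-contrast texture spreads the histogram and drives the ASM down, which is what makes such measures useful for separating stratiform from cumuliform samples.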
LOO testing, as described in section 3, is used to compute accuracies. As noted earlier, redundant or irrelevant information within the feature set degrades classifier performance—for example, the use of all features (Table 6) produces an 11-class accuracy of 73.5% for water and 68.0% for land. Specific portions of the feature database produce lower accuracies as well—the use of spectral features alone results in 76.2% and 68.0% for water and land, respectively. Using a mixture of feature types almost always produces a higher accuracy. Detailed accuracies using the selected features listed in Tables 7 and 8 are now presented.
Tables 9 and 10 provide LOO testing accuracies for the water and land databases, respectively. The overall, 11-class accuracies for the water and land classifiers are 86% and 78%, respectively. Individual cloud class percentages vary from 98 (Cl) to 72 (Ac) for water and 96 (Cl) to 57 (As) for land classifications. Overall, the water classifications are superior to those for land, probably because of the large disparity of land backgrounds in comparison with the relatively minor variations in underlying water conditions.¹ The lowest accuracy percentages for water occur for Cs (75) and Ac (72), with all other accuracies at 85% or higher. Over land the accuracies are not as good, with As (57%), Ac (57), and Cc (58) being the lowest, and other class accuracies above 70%. Although the relatively low accuracies of Ac and As are due partially to misclassifications between the two classes (see section 4b), many of the additional misclassifications are with the lower Sc. Also, notice the number of Cb samples misclassified as high clouds (Ci, Cc, and Cs) and vice versa. These latter misclassifications remind us that the algorithm is basing its classifications on cloud-top information only, and that such a viewpoint by itself can lead to ambiguity. Inclusion of features computed from channel 3 (3.7 μm) could have improved these accuracies and will be a part of the upcoming GOES classification project (see section 6).
A close look at the confusion matrices (Tables 9 and 10) reveals that many of the misclassifications are between clouds of similar height (e.g., cirrus versus cirrostratus; altocumulus versus altostratus, etc.). For many users, a classification based primarily upon height provides a sufficient depiction without the meteorological detail provided by Latin-based cloud types. For this reason, the classifications have also been grouped into classes labeled low, middle, high, vertical, and clear. The “vertical” class was chosen to represent cumulus congestus and cumulonimbus, two classes that indicate convection of some significance. Tables 11 and 12 summarize these accuracies, which, by definition, are higher, with average overall values of 92.7% and 88.1%, respectively, for the water and land classifications. Simple groupings of 11 classes into 5 classes were shown to have approximately the same accuracy as selecting features based upon a 5-class dataset (as done in Bankert 1994). The lowest accuracy over water is 86%, for middle-level clouds. Over land, the lowest accuracy is also for middle clouds (75%), with vertical clouds at 76%.
Processing and display
The classifications described in section 4, based upon the training data, consider individual, hand-typed samples of 16 × 16 pixel areas. Even when the samples are not as pure as the training set (i.e., mixed cloud types), our “winner-take-all” classifiers type every 16 × 16 pixel area into a single cloud type. In practice, this classification procedure must be applied sequentially to an entire image. Rather than moving 16 pixels at a time across the grid (resulting in the minimum amount of computation), the classifier is applied every 8 pixels (see Fig. 2). As a result, most pixels (except those near the edge of the domain) are classified four times rather than once. The dominant classification for a pixel is the chosen classification. This method, although taking more computational time, provides more resolution and a softer appearance for cloud boundaries. Dependent upon the underlying surface, the water or land classifier is used to classify each 16 × 16 pixel area in an unregistered image. Along a coastline, the choice of classifier is dependent upon where (land or water) the majority of pixels in a square reside.
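The overlapping-window voting scheme just described can be sketched as follows. `classify_window` stands for either the water or the land 1–NN classifier; the brute-force per-pixel vote lists are an illustration of the logic, not the operational code.

```python
import numpy as np
from collections import Counter

def classify_image(img, classify_window, win=16, stride=8):
    """Apply a window classifier with a stride of half the window
    size, so most interior pixels receive up to four labels, then
    keep the most frequent label per pixel.  `classify_window` is
    any callable returning a class id for a win x win array."""
    h, w = img.shape
    votes = [[[] for _ in range(w)] for _ in range(h)]
    for r in range(0, h - win + 1, stride):
        for c in range(0, w - win + 1, stride):
            label = classify_window(img[r:r + win, c:c + win])
            for i in range(r, r + win):
                for j in range(c, c + win):
                    votes[i][j].append(label)
    # dominant classification per pixel (None where no window fits)
    return [[Counter(v).most_common(1)[0][0] if v else None for v in row]
            for row in votes]
```

The half-window stride quadruples the computation relative to non-overlapping windows but, as noted above, yields finer effective resolution and softer cloud boundaries.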
The classifications and corresponding satellite images are available for viewing on the satellite page of the NRL Monterey Web site (www.nrlmry.navy.mil/projects/sat_products.html) on the Internet. Sample output from the classifiers, illustrating typical winter and summer West Coast weather patterns, is shown in Figs. 3 and 4.² The Web page allows several display options. Both unregistered and registered satellite imagery (with both visual and IR) and classifications are available. For unregistered imagery, the entire satellite pass is shown. For the registered image, a specific West Coast subsection is isolated and redisplayed in Mercator coordinates. (Note that for unregistered imagery from a satellite perspective, all pixels are the same size; in transforming to a Mercator projection for the registered imagery, pixel size becomes distorted.) For the Web presentations, both 11- and 5-class displays are available. Cloud types are color-coded, with general color classifications as follows: yellow, high clouds; green, midlevel; blue, low level; and red, vertical (CuC and Cb).
Evaluating classification output and making improvements have been ongoing since the summer of 1997. One of the obvious observations was that single classifications for each 16 × 16 pixel area visually dominate regions for which we know that clouds are not continuous, particularly those involving cumuliform clouds such as cumulus, stratocumulus, cumulus congestus, and cumulonimbus. For these cloud types, there can be significant clear areas between the clouds, particularly at the AVHRR resolution of 1 km. For this reason, one of our colleagues (J. E. Peak 1997, personal communication) suggested a postprocessing step: after all classifications are performed for the entire image, we examine each image pixel to determine if that pixel represents clear conditions. This process, although more complicated over land than over water, can be a simple test based upon whether the pixel albedo (visible channel) lies below a certain threshold. For example, we found that an albedo of 15% (after correcting for SZA) provides a conservative threshold. If a pixel is deemed to be clear, that pixel is then set to black, representing clear conditions. This postprocessing step, along with the computational smoothing technique discussed earlier, has the significant benefit of making the classification pattern appear, visually, much more realistic. Future work may involve enhancing this postprocessing step to include pixel temperature as well.
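The clear-pixel test above amounts to a simple threshold mask applied after classification; the array conventions and the use of 0 as the clear code are illustrative assumptions.

```python
import numpy as np

def mask_clear_pixels(class_map, albedo, threshold=0.15):
    """Postprocessing sketch: relabel as clear (coded 0 here) any
    pixel whose SZA-corrected albedo falls below the threshold
    (15%, the conservative value quoted in the text)."""
    out = np.array(class_map, copy=True)
    out[np.asarray(albedo) < threshold] = 0
    return out
```

Because the test is per pixel rather than per 16 × 16 window, it restores the clear gaps between individual cumuliform elements that the winner-take-all window labels would otherwise paint over.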
Discussion and lessons learned
The cloud classification algorithms used to produce the displayed classifications over land and water (see Figs. 3 and 4) have been evaluated over the eastern Pacific and the western and central United States during the summer, fall, and winter months using NOAA-14 imagery valid between 2000 and 2300 UTC. During recent months we have evaluated morning NOAA-12 and NOAA-15 imagery as well. Some classifier strengths have become evident. For example, aside from generally acceptable classifications of small cloud groupings, the distribution of cloud types around midlatitude storm systems appears reasonable. The postprocessing step in which a pixel-by-pixel examination identifies clear pixels within classified (16 × 16 pixel) cloud areas results in a more accurate product, particularly for cumuliform clouds.
During early development, to maximize LOO accuracy, we included region as a feature for the land classifier. Both the original 12 regions and 4 integrated, more general categories—desert, high-latitude polar, eastern continental (such as east Asia and eastern North America), and western continental (including the Mediterranean)—were tried. While including region resulted in a several percent gain in LOO accuracy, testing revealed, in practice, occasional geographic classification discontinuities when more than one region existed in a satellite pass. This result stems primarily from the fact that the training database is far from being complete in terms of cloud types for all regions. For this reason, region was eliminated as a feature. We understand that this limitation means that the classifier must be retested for any new region in which the classifier is applied.
As with any numerical analysis based upon computer vision, the product must be interpreted and its idiosyncrasies noted. The following list summarizes a few of the classifier’s weaknesses.
Because entirely different classifiers (in terms of features and training data) are used on either side of a coastline, an occasional discontinuity in cloud type can occur across that boundary.
For the few cases that we observed last summer in which sunglint was seen over the water, the classifier appeared not to be fooled by the increased reflectivity. However, recent classifications of high-angle NOAA-12 morning imagery show that sunglint can be labeled as a low cloud. As a result, sunglint areas must be interpreted with caution. Because the location of sunglint can be determined mathematically, one option being considered for presentation purposes is an automated delineation of sunglint areas.
The classifier can be fooled by snow-covered surfaces, particularly over high terrain. This weakness has been noted in springtime classifications for the Sierra Nevada of California. Fortunately, an analyst can discern this anomaly because of the recurring patterns from day to day.
Classifications over water tend to overestimate the occurrence of Cb.
Our near-term plans involve evaluating the classifiers in other parts of the world, the first two being the eastern United States and the European/Mediterranean area, geographic regions covered by the U.S. Navy’s regional weather centers in Norfolk, Virginia, and Rota, Spain. In addition, we are developing software to create expert-defined databases for a similar GOES classifier. Development of this classifier should proceed more smoothly, evolving from our experience thus far, and produce changes that address many of the weaknesses noted above. Improvements, such as including the 3.9-μm GOES channel, should provide a more robust, accurate, and useful classifier that can take advantage of the geostationary satellite’s high temporal frequency.
The support of the sponsors, the Office of Naval Research under Program Element 0602435N, and the Oceanographer of the Navy through the program office at the Navy’s Space and Naval Warfare System Command under Program Element 0603207N, is gratefully acknowledged. The authors wish to thank the following individuals: Mr. Robert W. Fett and Mr. Ron Englebretson of Science Applications International Corp. (SAIC) for their careful typing of the 8000 land cloud samples; Dr. David Aha of NRL for his feature selection algorithm; Mr. Jonathan Allen of Computer Sciences Corporation (CSC) for implementing the classifier code and developing the classification graphical display; Mr. John Kent, of SAIC, for the overall Web page design and the integration of the cloud classifications into the page; and Mr. James E. Peak of CSC for writing the summarization code for the land database and his suggestions for an improved classification display.
Corresponding author address: Dr. Paul M. Tag, Department of the Navy, Naval Research Lab, Monterey, CA 93943-5502.
¹ We realize that region-specific classifiers would produce better results, but our limited numbers of samples would make such a classifier less robust. However, the distribution of the misclassifications leads us to believe that this single classifier is acceptable. Application to regional data on a worldwide basis will provide optimum testing.
² For the winter depiction shown in Fig. 3, the classifier distinguishes the various frontal clouds (cirrostratus/cirrus tapering off to altocumulus/altostratus toward the southwest where the front weakens) with embedded Cb associated with the frontal band. Also shown, as one would expect, are the cumulus congestus that predominate behind the front on the cyclonic side of the polar jet stream, while stratocumulus predominates ahead of the front on the anticyclonic side of the polar jet. For the summer case (Fig. 4), the classifier shows the typical stratocumulus found within the boundary layer over the eastern north Pacific. The middle and high clouds in the northwest corner are associated with an approaching trough.