An automated neural network cloud classifier that functions over both land and ocean backgrounds is presented. Motivated by the development of a combined visible, infrared, and microwave rain-rate retrieval algorithm for use with data from the 1997 Tropical Rainfall Measuring Mission (TRMM), an automated cloud classification technique is sought to discern different types of clouds and, hence, different types of precipitating systems from Advanced Very High Resolution Radiometer (AVHRR) type imagery. When this technique is applied to TRMM visible–infrared imagery, it will allow the choice of a passive microwave rain-rate algorithm, which performs well for the observed precipitation type, theoretically increasing accuracy at the instantaneous level when compared with the use of any single microwave algorithm. A neural network classifier, selected because of the strengths of neural networks with respect to within-class variability and nonnormal cluster distributions, is developed, trained, and tested on AVHRR data received from three different polar-orbiting satellites and spanning the continental United States and adjacent waters, as well as portions of the Tropics from the Tropical Ocean and Global Atmosphere Coupled Ocean–Atmosphere Response Experiment (TOGA COARE). The results are analyzed and suggestions are made for future work on this technique. The network selected the correct class for 96% of the training samples and 82% of the test samples, indicating that this type of approach to automated cloud classification holds considerable promise and is worthy of additional research and refinement.
Of the numerous entities involved in the earth’s climate, clouds and precipitation are among the most important. The latent heat released in the formation of clouds and precipitation is a major player in both local and large-scale atmospheric circulation, and this importance is mirrored in the development and operation of atmospheric models. Clouds also play a major role in the earth’s radiation budget, reflecting solar shortwave radiation back into space and trapping the earth’s own emitted infrared radiation. Precipitation is an integral part of local and global hydrologic cycles. Clouds are the subject of this paper; precipitation will be discussed briefly here, as its measurement has provided much of the motivation for the current study.
In 1997, the National Aeronautics and Space Administration will launch the Tropical Rainfall Measuring Mission (TRMM) satellite. TRMM’s primary mission objective will be to provide the data for a 3-yr time series of monthly averaged rainfall over the Tropics in 5° × 5° bins (Simpson et al. 1988). Toward this end, TRMM will be equipped with three primary instruments—a visible–infrared radiometer, a passive microwave radiometer, and a precipitation radar. The availability of concurrent data from these three sensors will facilitate a number of new combined algorithms for estimating rainfall from space. Any such algorithm should make use of the strengths of each sensor’s electromagnetic regime with respect to remote sensing of precipitating systems.
The following rationale is presented for a combined algorithm. There are a variety of passive microwave rain-rate retrieval methods available in the scientific community, such as Grody (1991), Wilheit et al. (1991), Adler et al. (1993), Bauer and Schluessel (1993), Liu and Curry (1992), Ferraro et al. (1994), Ferriday and Avery (1994), Kummerow and Giglio (1994), and Petty (1994). Some methods work only over oceans, some require the presence of an ice layer overlying precipitation, some are developed using models, some are empirical, and so on. Each algorithm performs differently depending partially upon the type of precipitation it is attempting to measure, and no one algorithm has emerged from various intercomparisons, such as Arkin and Xie (1994) and Ebert (1996), as a clear choice for a global algorithm. Additionally, it is noted that visible–infrared algorithms for measuring rain rate, such as those detailed in Wu et al. (1985), Arkin and Meisner (1987), and Adler and Negri (1988), are less physically direct than passive microwave algorithms, and hence visible–infrared methods are intrinsically less precise than microwave techniques under similar sampling conditions. Visible and infrared sensors detect more information about clouds than about rain. The following strategy is therefore proposed for TRMM: use the visible and infrared data to perform a cloud classification that effectively separates the imagery into different types of precipitating systems and then calculate the rain rate using a passive microwave technique that is best suited for the type of precipitating system observed. The radar data would be used as a means of validation at nadir. This approach would theoretically provide a higher accuracy at the instantaneous level than would a single microwave algorithm.
The rest of this paper summarizes a cloud classifier that has been developed with the above application in mind. An automated cloud classifier is sought to function over a wide range of imagery, including different types of backgrounds, different atmospheric temperature profiles, and varying scene geometry. A neural network was chosen for this task, with a front-end cloud filter designed to work over both land and water surfaces. The classifier was trained on data originating from regions across the continental United States and adjacent waters, and it was tested on similar data, along with imagery from the Tropical Ocean and Global Atmosphere Coupled Ocean–Atmosphere Response Experiment (TOGA COARE). The results and some analysis are both presented in this paper. In section 2, the data used in this development are described. Sections 3 and 4 describe the cloud filter and neural network components of the cloud classifier, respectively. Section 5 describes the choices of input features and output classes, and section 6 details the training of the network. The classifier’s performance for several independent test images is analyzed in section 7, and some conclusions and suggestions for improvement are given in section 8. It should be noted that generalization for season and latitude dependencies and for nighttime classification has not been implemented. Both issues will be addressed in the conclusions.
The cloud classifier, Cloud Automated Neural Network (CANN), was designed to operate on imagery from the Advanced Very High Resolution Radiometer (AVHRR) and, as currently structured, the CANN requires data from four of the five channels of the AVHRR, as well as the viewing geometry. The AVHRR channels used are summarized in Table 1. An operational AVHRR is currently carried on two National Oceanic and Atmospheric Administration (NOAA) satellites, NOAA-12 and NOAA-14, which are in sun-synchronous polar orbits with an inclination of approximately 98.9° and an altitude on the order of 850 km. The ground swath of the AVHRR is roughly 2700 km. AVHRR data covering the continental United States are received by an automated antenna system run by the Colorado Center for Astrodynamics Research at the University of Colorado (CU-CCAR). This antenna receives high resolution picture transmission formatted data from the operational NOAA polar orbiters. These data are at full resolution, nominally 1.1 km at nadir. All of the data used to train the CANN originated from the AVHRR on the NOAA-11 satellite, which is no longer operational. Data from NOAA-11, NOAA-12, and NOAA-14 were used in the test cases discussed in section 7; the NOAA-12 data originated from the TOGA COARE dataset; however, they are in an identical format to that of the data received by CU-CCAR. The NOAA-14 data were received by CU-CCAR as described above.
Each desired satellite pass at CU-CCAR is received, downloaded, and archived to 8-mm tape. Calibration and navigation software developed and maintained at CU-CCAR is used to extract albedos and brightness temperatures for georegistered images as subsets of the passes. The accuracy of the AVHRR calibration is a subject of some debate. The reflective sensors were calibrated prior to launch, and attempts to assess the accuracy of AVHRR-derived albedos are often quite involved, as issues such as sensor degradation play a major role (see, e.g., Vermote and Kaufman 1995). For current purposes, the albedos are used as inputs to a classifier, so self-consistency matters more than strict absolute accuracy; the typical values observed for a given cloud type remained quite consistent across different images from the same satellite or even from different satellites. The total rms error in the thermal brightness temperatures is assumed here to be approximately 0.55 K, as taken from Weinreb et al. (1990), although transient biases associated with terminator crossing, as high as 1 K in channel 3, have also been reported in Steyn-Ross et al. (1992). These transient biases are not considered to be important for the current study, as all of the imagery here is far removed from terminator crossing. The navigation here is accurate to the order of 3 km or less. An accuracy of 1 km is attainable using spacecraft attitude and time correction, as detailed in Rosborough et al. (1994); these methods were not applied for the current study, as the cloud classification was carried out on a scale of 32 km. The CANN operates on 32 × 32 pixel regions in AVHRR imagery. The reasoning behind selecting this region size will be presented in section 5.
The front-end cloud filter, however, functions on individual AVHRR pixels by labeling each one as either cloudy or clear. No attempt is made to estimate cloud fraction within a single AVHRR pixel. The output of the cloud filter is thus a full-resolution image of zeroes and ones. The filter proceeds as follows. First, the normalized difference vegetation index (NDVI) is determined for each pixel:
NDVI = (ch2 − ch1)/(ch2 + ch1),    (1)
where ch1 and ch2 are the respective reflectances in channels 1 and 2. The NDVI is normally used to discern the presence, density, and health of vegetation. However, it also has some utility as a cloud discriminator; clouds typically show little variation in albedo between the wavelengths of channels 1 and 2 for the AVHRR, so the NDVI for clouds is often near zero. In contrast, even barren land typically exhibits NDVI values notably higher than zero, while water bodies generally have NDVI values substantially lower than zero. There is some degree of overlap, however, for both cloud–land and cloud–water. The cloud filter for the CANN considers all pixels with NDVI values less than zero to be either cloud or water, and all pixels with NDVI values greater than or equal to zero are considered to be either cloud or land. For the cloud–water case, an albedo threshold is then used to determine whether the pixel in question should be classified as cloud or water. This albedo is first corrected to an equivalent overhead-sun value by dividing by the cosine of the solar zenith angle, and then it is adjusted by a bidirectional reflectance model for clear skies over ocean generated from Suttles et al. (1988). The model is used to reduce the number of instances where sun glint is falsely identified as cloud. A channel 1 albedo threshold of 16% was chosen here, based upon manual inspection and trial-and-error adjustment using a set of AVHRR images separate from the training set. For the cloud–land case, a brightness temperature threshold of 295 K is used to distinguish cloud from land. This value was also chosen based upon manual inspection and adjustment.
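The per-pixel filter logic described above can be sketched as follows. This is a minimal sketch under stated assumptions: the bidirectional reflectance (sun glint) adjustment is omitted for brevity, and all array and parameter names are illustrative, not taken from the CANN code.

```python
import numpy as np

def cloud_mask(ch1_albedo, ch2_albedo, ch4_tb, cos_sza,
               albedo_thresh=0.16, land_tb_thresh=295.0):
    """Per-pixel cloud/clear filter sketch following the NDVI-based scheme:
    NDVI < 0 means cloud or water (decided by an albedo threshold);
    NDVI >= 0 means cloud or land (decided by a brightness temperature
    threshold). Returns 1 for cloudy, 0 for clear."""
    ndvi = (ch2_albedo - ch1_albedo) / (ch2_albedo + ch1_albedo)
    # Correct channel 1 albedo to an equivalent overhead-sun value.
    alb_corr = ch1_albedo / cos_sza
    mask = np.zeros(ch1_albedo.shape, dtype=np.uint8)
    water_or_cloud = ndvi < 0.0
    land_or_cloud = ~water_or_cloud
    # Over water: bright pixels are taken to be cloud.
    mask[water_or_cloud & (alb_corr > albedo_thresh)] = 1
    # Over land: cold pixels are taken to be cloud.
    mask[land_or_cloud & (ch4_tb < land_tb_thresh)] = 1
    return mask
```

The output is the full-resolution image of zeroes and ones on which the region features are later computed.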
Solid thresholds and an NDVI filter were chosen here for simplicity and to provide a completely self-contained AVHRR cloud classifier. In an operational version of this classifier, the use of ancillary information such as climatologies, land–water databases, and data from numerical weather prediction analyses would probably be preferable to this self-contained approach. Surface albedo and brightness temperature thresholds can vary significantly with changes in season and latitude, so numerical weather prediction analyses of surface temperature would be considerably more robust than a simple threshold. A preexisting land–water database would be less subject to the errors inherent in the NDVI.
Neural network classifier
Once all the pixels in an image have been assigned a status of either cloudy or clear, the image is divided into regions with 32 pixels on a side for the classification process. A neural network was chosen as the classification technique. The classification method described by Garand (1988) was quite successful on its own and has seen widespread application; however, the topic of neural network cloud classification is fairly new, and the authors felt this approach worthy of investigation. Neural networks have a theoretical advantage over the more standard maximum likelihood and clustering techniques in their ability to handle within-class variability and nonnormal cluster distributions, and this has sometimes been seen in practice as well (see, e.g., Key et al. 1989). This robustness is helpful for any global cloud classification technique. “Global” here is taken to mean ranging over all longitudes for temperate and tropical latitudes. Polar regions, such as those seen in the cloud classification technique developed by Ebert (1987), present enough problems to effectively preclude their combination with lower latitudes for a single cloud classifier, due in large part to the frequent temperature inversions and the great abundance of ice and snow at high latitudes. Examples of existing neural network cloud classifiers can be found in Key et al. (1989), Lee et al. (1990), and Bankert (1994); while all three techniques can be considered successful to some degree, no attempt was made to test them globally, which is one of the goals of the current work.
Neural networks are an attempt to make computers function, on a basic level, more like the human brain. Abstract objects called nodes, which correspond to human neurons, are formulated within a programming structure, as shown in Fig. 1. The network shown is a three-layer, feed-forward neural network. The nodes in the input layer correspond to input features; for example, the first node might contain an albedo value, the second node might contain thermal brightness temperature, and so on. These values were all normalized between zero and one for the CANN. The values of the input features are combined to produce different values in the nodes of the hidden layer, where the network forms its internal representation of the patterns observed. These values are subsequently fed into the output nodes, each of which corresponds to an output class such as stratus or cumulus. The output values are also normalized. A properly functioning network will then take as inputs some set of feature values corresponding to a region of cumulus clouds and produce a high value in the output node representing the cumulus class, along with low values in all the other output nodes. Each hidden or output node in the network receives a “signal” from each node in a previous layer; these signals are summed and fed into an activation function to produce a single output signal for that node. The logistic function was chosen as the activation function for all the nodes in the CANN; it is given by
f(x) = 1/(1 + e^−x),    (2)
and it is quite effective at compressing extreme input values while retaining good dynamic behavior for moderate inputs. More importantly, it has a very simple derivative,
f′(x) = f(x)[1 − f(x)].    (3)
The importance of this simple relation will be discussed momentarily. It is the nonlinearity of activation functions such as (2) that allows neural networks to be useful in a number of different applications.
The ability of a neural network to learn input–output relationships is embodied in the connections between nodes in successive layers. Each connection has an associated weight, which can be either positive or negative. This weight is multiplied by the signal traveling along the connection from one node to another, and this product is fed into the activation function, along with similar products for all the other connections leading into the node in question. The weights in a neural network can be thought of as a large array of independent variables that can be adjusted, and the output error, typically the total mean squared error of the output node activations with respect to the desired output, can be thought of as a cost function. The training of a neural network is therefore identical to a nonlinear optimization problem. It was shown in Rumelhart et al. (1986) that the gradient of the mean square error is related to the values of the weights themselves through the values of the activation function and its derivative at each node. This is why the simplicity of (3) is so useful; if the value of the activation function is known, so is the value of its derivative. This greatly aids the computation process. The process of using the mean square error for a network to work backward and adjusting the weights so that the network can produce a smaller error is known as error backpropagation. The CANN uses the conjugate gradient technique, described in Johansson et al. (1992), for error backpropagation.
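The forward pass described above, with weighted sums fed through the logistic activation at each node, can be sketched briefly. This is an illustrative sketch only; the layer sizes match the CANN (11 inputs, 8 hidden, 9 outputs), but the random weights and function names are hypothetical.

```python
import numpy as np

def logistic(x):
    # Activation function (2); its derivative is f(x) * (1 - f(x)).
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, w_hid, b_hid, w_out, b_out):
    """One forward pass through a three-layer feed-forward network: each
    hidden or output node sums its weighted incoming signals plus a bias
    and passes the sum through the logistic activation function."""
    hidden = logistic(w_hid @ x + b_hid)
    return logistic(w_out @ hidden + b_out)

# Illustrative sizes matching the CANN: 11 inputs, 8 hidden, 9 outputs.
rng = np.random.default_rng(0)
w_hid, b_hid = rng.normal(size=(8, 11)), rng.normal(size=8)
w_out, b_out = rng.normal(size=(9, 8)), rng.normal(size=9)
outputs = forward(rng.uniform(size=11), w_hid, b_hid, w_out, b_out)
```

Because the derivative of the logistic is available directly from its value, the gradient needed for error backpropagation can be formed without any extra function evaluations.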
A final note on the operation of neural networks is that one must first get close to a solution before error backpropagation can find it. The sheer number of weights in the typical network allows the existence of numerous local minima on the error surface. To improve the chances of finding a good minimum rather than a poor local one, the CANN first uses the method of simulated annealing; good discussions of this technique can be found in Press et al. (1989) and Masters (1993). It proceeds as follows: 10,000 sets of random weights with a fairly large spread in values are generated. The set of weights with the lowest error is then used as the center for the generation of a new succession of random weights with a smaller spread. This process is carried out five times to increase the likelihood that a useful minimum will be found. The method of conjugate gradients is then applied to the set of weights that produces the lowest overall error during simulated annealing.
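The annealing-style initialization above can be sketched as follows, assuming a candidate count, number of stages, and shrink factor that are illustrative rather than the CANN's actual settings (the paper specifies 10,000 candidates and five stages but not the shrink factor).

```python
import numpy as np

def anneal_init(error_fn, n_weights, n_candidates=10000, n_stages=5,
                spread=5.0, shrink=0.5, rng=None):
    """Sketch of the simulated-annealing-style weight initialization:
    draw many random weight vectors, keep the one with lowest error,
    then redraw around it with a smaller spread, repeating for several
    stages. The winning weights would then seed conjugate gradients."""
    rng = np.random.default_rng() if rng is None else rng
    center = np.zeros(n_weights)
    best = center
    for _ in range(n_stages):
        candidates = center + spread * rng.standard_normal(
            (n_candidates, n_weights))
        errors = np.array([error_fn(w) for w in candidates])
        best = candidates[np.argmin(errors)]
        # Recenter on the best candidate and tighten the spread.
        center, spread = best, spread * shrink
    return best
```

On a simple convex error surface this converges close to the true minimum; on a rugged neural network error surface it only provides a good starting point, as the text notes.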
Selection of inputs and outputs
A number of candidates were considered as inputs; a final set of 11 was chosen. Each feature is generated for a 32 × 32 region of AVHRR pixels. A number of factors were considered before choosing this particular region size. It is important to balance the region size between good statistics, which require a large number of pixels, and good separation of classes, which places both lower and upper limits on the region size. Region sizes that are a power of two are often useful for certain types of features, such as Fourier spectral methods or wavelet transforms. While features of this type were not used in the CANN, the authors chose to leave this option open. Starting with this constraint, 16 × 16 regions were considered borderline in terms of good statistics; 32 × 32 regions are more reliable. For AVHRR imagery, this corresponds to regions approximately 32 km on a side. Garand (1988) used regions 128 km on a side, but at lower pixel resolution; the infrared image regions in that study each contained 32 × 32 pixels, with the visible image regions containing 64 × 64 pixels. Manual inspection of the available imagery for the current study made it clear that 128 km was too large, as most of the regions, particularly in areas of convective activity, would need to be classified as “mixed” at that scale. Even 64-km regions seemed to suffer from this effect; 32-km regions offered substantial improvement. Finally, a number of studies, including Gu et al. (1989), Gu et al. (1991), and Ebert (1987), have achieved some degree of success using 32-km classification regions.
The cloud fraction (CF) is simply the fraction of pixels in the region that the cloud filter labels cloudy,
CF = Nc/m,
where Nc is the number of cloudy pixels in the region and m is the total number of pixels in the region. This parameter is interesting on its own, but it is also helpful in distinguishing clouds that have a large horizontal extent. The albedo feature (ALB) is the average of the mean and maximum albedos observed for all the cloudy pixels in the region,
ALB = [mean(α1) + max(α1)]/2,
where α1 is the albedo in channel 1 for a given pixel, mean(α1) is its average over the cloudy pixels in the region, and max(α1) is the highest channel 1 albedo found in a given region. This feature distinguishes brighter and, hence, thicker clouds. The cloud-top temperature (CTT) is simply the minimum brightness temperature observed in the region,
CTT = min(Tb4),
where Tb4 is the pixel brightness temperature in channel 4. No other measure of cloud height showed significant improvement over CTT. The effective droplet radius (EDR) is generated from channels 1 and 3; it utilizes the parameterization developed in Slingo (1989) relating albedos from visible and near-infrared radiance measurements to optical depth and droplet radius within water clouds. The parameterization involved an iterative process; this was replaced with simple exponential functions, which were obtained from graphical analysis of the parameterization’s behavior for channels 1 and 3. The EDR for a given pixel was then obtained as
where α3 is the albedo in channel 3, estimated by subtracting the channel 3 emissive radiance for an object at the temperature given by channel 4 from the measured channel 3 radiance. EDR for the region was then simply calculated as the average of the pixel EDR values over the cloudy pixels in the region.
This feature is by no means intended to serve as a true measure of EDR; the errors and approximations involved in its calculation are numerous. Still, it does prove useful as an input feature for the purposes presented here, and its values tend to be in the neighborhood of what one would expect for the effective radius parameter.
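The CF, ALB, and CTT definitions above translate directly into code. This is a minimal sketch; the function and argument names are illustrative, and the EDR feature is omitted because the fitted exponential functions are not reproduced in the text.

```python
import numpy as np

def basic_features(ch1_albedo, ch4_tb, mask):
    """CF, ALB, and CTT for one image region, following the definitions
    above: CF is the cloudy-pixel fraction, ALB averages the mean and
    maximum cloudy-pixel channel 1 albedos, and CTT is the minimum
    channel 4 brightness temperature observed in the region. `mask` is
    the 0/1 output of the cloud filter."""
    cloudy = np.asarray(mask, dtype=bool)
    cf = cloudy.sum() / cloudy.size
    ctt = float(np.min(ch4_tb))
    if not cloudy.any():
        return cf, None, ctt
    a = np.asarray(ch1_albedo)[cloudy]
    alb = 0.5 * (a.mean() + a.max())
    return cf, alb, ctt
```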
Two connectivity features were borrowed from Garand (1988), namely, background connectivity (BGC) and cloud connectivity (CLC). To calculate these two features, all of the separate cloud and/or background elements in the region are isolated. Two cloud pixels are considered to be part of the same cloud if they are horizontally or vertically adjacent. A similar rule is used for background elements. CLC is calculated from
where n(k) represents the number of pixels in the kth cloud element, m is the number of pixels in the region (1024 here), and k is the smallest integer such that
Similarly, BGC is calculated as
where nb(kb) represents the number of background pixels in the kbth background element and kb is the smallest integer such that
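The isolation of separate cloud elements that underlies both CLC and BGC can be sketched with a simple flood fill under the adjacency rule stated above (two cloudy pixels belong to the same element if horizontally or vertically adjacent). This is only the bookkeeping step; the CLC and BGC summations themselves follow Garand (1988) and are not reproduced here.

```python
def cloud_elements(mask):
    """Label the separate cloud elements in a 0/1 cloud mask using
    4-connectivity and return the pixel count of each element, largest
    first. An iterative flood fill avoids recursion limits."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r0 in range(rows):
        for c0 in range(cols):
            if mask[r0][c0] and not seen[r0][c0]:
                stack, count = [(r0, c0)], 0
                seen[r0][c0] = True
                while stack:
                    r, c = stack.pop()
                    count += 1
                    # Visit the four horizontally/vertically adjacent pixels.
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and mask[rr][cc] and not seen[rr][cc]):
                            seen[rr][cc] = True
                            stack.append((rr, cc))
                sizes.append(count)
    return sorted(sizes, reverse=True)
```

Applying the same routine to the inverted mask yields the background elements needed for BGC.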
For a better description of texture, two channel 1 gray-level difference features are generated as inputs to the CANN. These features were first introduced in Haralick et al. (1973), and the reader is directed to that work for a detailed description of their mathematical derivation; a portion of that description is presented here. For a given image region, we have Nx pixels in the x direction and Ny pixels in the y direction. In this case, Nx = Ny = 32. Let Lx = (1, 2, . . . , Nx) be the x spatial domain, Ly = (1, 2, . . . , Ny) be the y spatial domain, and G = (1, 2, . . . , Ng) be the set of quantized gray levels; in this case, Ng = 256. The set Ly × Lx is the image region itself. The image I can be represented as a function that assigns some value in G to each pixel in the region; that is, I: Ly × Lx → G. We can now construct angular gray-tone spatial-dependence matrices P(i, j, d, θ) such that Pij contains the number of pixels separated by distance d at an angle θ, one with gray level i and the other with gray level j. The angle θ is defined as 0° for pixels on the same line, 45° for pixels along a lower left to upper right diagonal, 90° for pixels in the same column, and 135° for pixels along an upper left to lower right diagonal. Thus, for example,
P(i, j, d, 0°) = #{((k, l), (m, n)) ∈ (Ly × Lx) × (Ly × Lx): k − m = 0, |l − n| = d, I(k, l) = i, I(m, n) = j},
where the pound sign denotes the number of elements in the set. These matrices are symmetric, and their elements can be summed in various manners to produce mathematical descriptions of texture within an image region. Two such descriptions were used here, labeled as the CON and HOM features. Both features are averages over the four angles and always for a distance d = 1. CON measures the amount of contrast or local variations present in an image region and is calculated as
CON = (1/4) Σ_θp [1/N(θp)] Σ_q q² h(q, θp),
where θp = (0°, 45°, 90°, 135°), N(θp) is the number of pairs of pixels in the image region separated by distance 1 in the direction θp, and h(q, θp) is a histogram function, first introduced in Weszka et al. (1976), indicating the number of times that the difference q occurs between the gray levels i and j in pixels separated by distance 1 and in the direction θp:
h(q, θp) = #{pairs of pixels at distance 1 in the direction θp whose gray levels i and j satisfy |i − j| = q}.
HOM is somewhat opposite to CON, although not entirely; HOM represents the homogeneity across an image region and is calculated from
HOM = (1/4) Σ_θp [1/N(θp)] Σ_q h(q, θp)/(1 + q²).
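The two texture features can be sketched as follows. This uses the standard gray-level difference-statistic forms of Weszka et al. (1976), averaging a mean-squared-difference contrast and a 1/(1 + q²)-weighted homogeneity over the four directions at d = 1; the exact normalization in the paper is assumed, not verified, and the function name is illustrative.

```python
import numpy as np

def texture_features(region):
    """CON and HOM sketch: for each of the four directions at distance
    d = 1, collect the absolute gray-level differences q of all pixel
    pairs; CON averages mean(q^2) over directions and HOM averages
    mean(1 / (1 + q^2))."""
    img = np.asarray(region, dtype=np.int64)
    rows, cols = img.shape
    # Row/column offsets for 0, 45, 90, 135 degrees (rows increase downward).
    offsets = [(0, 1), (-1, 1), (1, 0), (1, 1)]
    con, hom = [], []
    for dr, dc in offsets:
        r0, r1 = max(0, -dr), rows - max(0, dr)
        c0, c1 = max(0, -dc), cols - max(0, dc)
        q = np.abs(img[r0:r1, c0:c1]
                   - img[r0 + dr:r1 + dr, c0 + dc:c1 + dc])
        q = q.astype(float).ravel()
        con.append(np.mean(q ** 2))
        hom.append(np.mean(1.0 / (1.0 + q ** 2)))
    return float(np.mean(con)), float(np.mean(hom))
```

A checkerboard region, for example, yields high contrast and low homogeneity relative to a uniform region, which is the behavior the two features are meant to capture.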
The final three channel 4 features used as inputs to the CANN are the low etage fraction (LEF), the middle etage fraction (MEF), and the high etage fraction (HEF):
where NL, NM, and NH are the number of cloudy pixels in the low, middle, and high levels in the atmosphere, respectively. The levels are defined from the values given in WMO (1956) for temperate latitudes, with the standard atmosphere defined by the International Civil Aviation Organization, given in Iribarne and Godson (1981), being used as the relation between height and temperature so that cloud-top height, and hence level, can be determined from channel 4 thermal brightness temperatures in AVHRR imagery. Again, making these features dynamic by altering the temperature–height relationship based on numerical weather prediction analyses would provide a substantial increase in generality.
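The etage fractions can be sketched as below. Two assumptions are made plain: the temperature thresholds here are illustrative placeholders (the paper derives its level boundaries from the WMO etage heights and the ICAO standard atmosphere), and the fractions are normalized by the number of cloudy pixels, which the text does not state explicitly.

```python
import numpy as np

def etage_fractions(ch4_tb, mask, t_low=280.0, t_high=250.0):
    """LEF/MEF/HEF sketch: assign each cloudy pixel to the low, middle,
    or high etage from its channel 4 brightness temperature, then return
    the fraction of cloudy pixels in each etage. `t_low` and `t_high`
    are assumed threshold values, not the paper's."""
    tb = np.asarray(ch4_tb, dtype=float)[np.asarray(mask, dtype=bool)]
    nc = tb.size
    if nc == 0:
        return 0.0, 0.0, 0.0
    n_low = (tb >= t_low).sum()    # warm tops: low etage
    n_high = (tb < t_high).sum()   # cold tops: high etage
    n_mid = nc - n_low - n_high
    return n_low / nc, n_mid / nc, n_high / nc
```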
Output cloud classes
The issue of choosing outputs for any classification process is not a trivial one. For the purposes here, we have tried to adhere as closely as possible to ground-based cloud types. The need to separate different types of precipitating systems was also a motivating factor. In the end, the nine classes shown in Table 4 were selected. Stratus (St) and stratocumulus (Sc) were chosen as separate classes to test the textural capabilities of the CANN. Cumulus (Cu) and cumulus congestus (Cg) are separated because the probability of substantial precipitation is higher with the latter. The three midlevel classes, altostratus, altocumulus, and nimbostratus, are confined to one class (MD) because of the difficulty in truly distinguishing the three. Some sources, such as Dvorak and Smigielski (1990), indicate that the individual cells in altocumulus cannot be discerned at 1-km resolution, while others, such as Ebert (1987), do make a distinction between altostratus and altocumulus. Further, nimbostratus is often simply separated from the other types by an albedo threshold, while in truth the only way to distinguish nimbostratus is to verify the presence of surface rainfall. Cumulonimbus (Cb) is separated from anvil cirrostratus (Cs) by more surface roughness and slightly colder cloud-top temperatures for the former. The cirrus class (Ci) contains broken and wispy occurrences of cirrus cloud with no other cloud present. Because a global neural network classifier is a relatively new undertaking, a separate multilayer class was not included, for simplicity. Multilayers are defined here as instances where cirriform cloud obscures some other cloud underneath (with parts of the lower cloud being visible through thinner portions of the cirrus) or cases where cirrus and another clearly visible cloud type are both present in different horizontal locations. These cases are grouped together into the Cs class, as there is a continuum between cirrostratus and cirrus cloud with multilayers.
A region is classified as having no cloud (NC) if less than 5% of the pixels are determined to be cloudy by the cloud filter; thus, the CANN does not include an output node for NC. Sample regions for each of the cloud types are shown in Fig. 2 for channels 1, 3, and 4 of the AVHRR.
With 11 inputs and nine outputs, the final choice for the network architecture was the number of hidden nodes. Once the total number of nodes is set, so is the number of adjustable weights in the network. This decision can generally be made only through trial and error, as no single rule applies in all cases. Beginning with a small number of hidden nodes, a network such as the CANN will typically show increased accuracy on its training set with each additional hidden node until a certain threshold number is reached, at which point the overall error will decrease much more slowly. Eight hidden nodes appeared to give the network an optimal balance between overfitting, where too many weights cause the network to become too focused upon its training set, and underfitting, where too few weights are present to make efficient use of the input features and their combinations with one another; this choice brings the total number of weights in the network, including those on the biases, to 177. A good topic for future study would be to rigorously investigate which features could be combined or eliminated from the set of 11 used here, because 177 weights demand an enormous training set if one is to thoroughly eliminate the problem of overfitting. Limits imposed by available resources allowed only 322 samples to be extracted for the CANN’s training set, as discussed in section 6; it would be preferable to have at least 100 samples for each class, or 10 times as many training samples as weights. This shortfall virtually guarantees some degree of overfitting, which may explain some of the problems that the CANN encountered. Still, in light of this shortcoming, the CANN’s overall performance on the test images is favorable.
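The weight count quoted above follows from the architecture alone and can be checked directly; for a fully connected three-layer network with one bias per hidden and output node:

```python
def n_weights(n_in, n_hidden, n_out):
    """Adjustable weights in a fully connected three-layer network,
    counting one bias term per hidden node and per output node."""
    return n_in * n_hidden + n_hidden + n_hidden * n_out + n_out

# The CANN's 11-8-9 architecture: 88 + 8 + 72 + 9 = 177 weights.
total = n_weights(11, 8, 9)
```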
For the training process, 57 summer afternoon images spanning the continental United States and adjacent waters were generated and used as a source of 322 training samples for the various cloud types. These samples were subjectively classified by the primary author. The time period was restricted to June through August 1993; the solar zenith angles for dates beyond this range became increasingly high, compromising the effectiveness of the reflective features. Issues concerning the absence of reflective features for nighttime data will be discussed in the conclusion. Each of the images in the training set was 512 pixels on a side, with a resolution of approximately 1.1 km per pixel. The images were broken into 256 square regions, each with 32 pixels on a side, for the extraction of samples. Some classes appeared more often than others, and these classes had more representative regions in the training set, as the larger number of occurrences for such classes also allowed for a larger number of possible within-class variations. An attempt was made to ensure the samples were not restricted to “typical” situations, as neural networks require samples near the decision boundary to maximize their accuracy in borderline cases. Figure 3 illustrates how the input features related to the output classes for the training set. ALB behaves much as one would expect, with generally the highest values for cumulonimbus, midlevel, and cirrostratus clouds. Cirrus exhibits very low albedos relative to the other classes. Note that some of the values of ALB exceed 100%; these are generally associated with cumulonimbus clouds. 
Recall that the albedos are corrected for the solar zenith angle by dividing by its cosine; cumulonimbus clouds often have very large portions tilted away from horizontal, so that in some instances the radiation arriving at a satellite sensor will have been reflected from the sunward side of a cumulonimbus tower, effectively lowering the solar zenith angle and causing our correction to overcompensate. Since this typically occurred only for cumulonimbus clouds, these occurrences were simply reduced to 100% before they were fed into the CANN. The CTT feature distinguishes the different levels quite well, although the results are a bit fuzzy for cirriform classes; it should be noted that the thermal brightness temperature is often quite removed from the physical temperature for cirrus clouds, weakening the relation to cloud height.
The performance of the EDR feature is interesting. This feature also lacks physical meaning for cirriform clouds, and the effects of ice in cumulonimbus clouds must also be kept in mind. The range of EDR seen in Fig. 3 is on the right scale for an effective droplet radius; however, there are some discrepancies with what might be expected, reaffirming that this should not be interpreted as a highly accurate measure of effective radius. Mason (1971) includes a rather thorough discussion of droplet size distributions within various cloud types. He focuses on average droplet radius, which is generally about two-thirds of the effective radius, the latter placing a weight on the effective cross section of the droplets. Table 5 is adapted from Mason (1971) for comparison with the EDR values shown in Fig. 3. Maritime cumulus clouds will tend to have higher droplet radii because there are fewer particles competing to become condensation nuclei over oceans than over land. The imagery used in the training process here was dominated by continental cumulus, however, which is one explanation for the lower EDR values seen for Cu. The only major discrepancy in the EDR feature from what is observed in Table 5 occurs for Cg, where the EDR actually seems a bit lower than for Cu, yet the value for congestus clouds in Table 5 is 24 μm. There is no obvious reason for this discrepancy, although there are a number of possibilities. As mentioned earlier, for example, the issue of what any class truly represents is worthy of more investigation, as there may not be a complete match between Cg and the traditional definition of Cg. It is assumed for the present that ice is not a factor in the discrepancy, as the presence of a large amount of ice would preclude labeling of the cloud as Cg. 
The measurements themselves are also of different types; EDR was generated from area averages on the scale of 1000 km2 and based upon satellite radiance measurements, while the values shown in Table 5 were generated from various aircraft in situ measurements.
The connectivity features show similar behavior to one another; however, their linear correlation from Table 3 is only −0.30. They show some utility for separating the low-level stratiform classes, but the variation is rather high, particularly for stratocumulus. Cumulonimbus and cirrostratus show a high degree of cloud connectivity, and cumulus and cirriform classes show high background connectivity. These observations do make sense, as cumulonimbus and cirrostratus are generally massive complexes of connected cloud, while scattered cumulus and cirriform clouds in general do not have a great number of holes in them. CON clearly singles out cirriform clouds, and cumulonimbus and midlevel clouds appear to exhibit the lowest degree of homogeneity according to the plot of HOM.
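CON and HOM are gray-level difference features; one common formulation of such statistics is sketched below. The exact definitions used for the CANN may differ, and the function name and displacement convention are ours.

```python
import numpy as np

def gld_features(region, displacement=1):
    """Contrast (CON) and homogeneity (HOM) computed from the histogram
    of absolute gray-level differences at a fixed horizontal displacement.

    CON weights large differences heavily (strong local contrast);
    HOM weights small differences heavily (uniform regions).
    """
    reg = np.asarray(region, dtype=float)
    diff = np.abs(reg[:, displacement:] - reg[:, :-displacement])
    d, counts = np.unique(diff, return_counts=True)
    p = counts / counts.sum()                # difference histogram
    con = float(np.sum(d ** 2 * p))          # large for busy texture
    hom = float(np.sum(p / (1.0 + d ** 2)))  # large for smooth regions
    return con, hom
```

A perfectly homogeneous region yields CON = 0 and HOM = 1, while a one-pixel checkerboard yields CON = 1 and HOM = 0.5.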
The CANN was trained multiple times, each time with a different random number seed, before it was considered to have been successfully trained. The CANN chose the correct class for 96% of the samples. This was encouraging, as the training regions spanned a very wide range of clouds and locations, including both land and water backgrounds and varying scene geometry. Again, there was undoubtedly a certain degree of overfitting involved in the training process, as indicated by some of the results in section 7. Backward adjustment of the number of nodes in the CANN based on this phenomenon could possibly have reduced this problem, but it would also have essentially rendered the test images part of the training set, necessitating the creation of a new test set. As a result, the performance of the CANN on the test imagery as initially trained is presented in section 7 for the reader to evaluate.
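One common way to organize such repeated training runs is to restart from several random initializations and retain the best-scoring run. The sketch below assumes that selection criterion; `train_fn` and `evaluate_fn` are hypothetical placeholders standing in for the actual CANN training and evaluation routines.

```python
def train_with_restarts(train_fn, evaluate_fn, n_restarts=5):
    """Retrain a network from several random seeds and keep the run
    that scores best.

    train_fn(seed) -> trained model; evaluate_fn(model) -> fraction
    of training samples classified correctly.
    """
    best_model, best_score = None, -1.0
    for seed in range(n_restarts):
        model = train_fn(seed)
        score = evaluate_fn(model)
        if score > best_score:
            best_model, best_score = model, score
    return best_model, best_score
```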
Five additional AVHRR images, listed in Table 6, were generated for testing of the CANN. Images 1 and 2 were selected to test the network in situations similar to those it had “seen” during training, but for data from a year later (1994). Images 3 and 4 were chosen to test the network for more tropical regions and in the morning instead of the afternoon. Image 5 provided a winter test case. The CANN should not have difficulty jumping between different satellites, as it relies entirely upon physical parameters, whose end state should be independent of the platform so long as the calibration coefficients are properly provided. The latest coefficients provided by NOAA were used for all of the training and test imagery. In all, the test images provided 1280 samples. Additionally, they provided a realistic application of the classifier, as opposed to selecting a group of “representative” samples from a multitude of imagery. The primary author acted as the human classifier to which the CANN was compared. Since this person also selected and labeled the training imagery, the test imagery provided a venue in which the CANN’s ability to mimic its trainer could be assessed. The five test images and their CANN classifications are shown in Figs. 4 through 8.
The results for image 1 are summarized in Table 7. This is the confusion matrix for the two classifiers, manual and automated, tabulated in numbers of 32 × 32 regions. High numbers in the diagonal are desired. Some confusion clearly exists between St and Sc for the CANN. These two classes are very similar to one another, with the only major differences being that St is more textured and slightly less bright on average than Sc. These two classes have a high propensity to transform from one to the other as well and, particularly in marine environments, they are almost always found together. In general, St near a coast often transforms into Sc farther out over the ocean, which in turn eventually scatters into Cu. Figure 4 illustrates some of this behavior. The only other problem evident for image 1 is the noticeable number of cloud regions, both Sc and Cu, which were labeled as NC by the CANN. This is entirely independent of the neural network itself, as the determination of which pixels are cloudy is made before the network ever sees the data. While some number of borderline cases such as these will always be misclassified, an improvement in the performance of the cloud filter would certainly be expected if surface temperatures from numerical weather prediction models were used in place of the NDVI, albedo, and brightness temperature used here. Some degree of overfitting was likely another factor in the current results. The correct class was chosen 84% of the time for image 1.
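Tables 7 through 11 are confusion matrices of this kind. The tabulation itself can be sketched as follows; the helper names are ours, not the code used in the study.

```python
import numpy as np

def confusion_matrix(manual, automated, classes):
    """Confusion matrix between manual (rows) and automated (columns)
    classifications, tabulated in numbers of regions.  Diagonal entries
    count agreements between the two classifiers."""
    idx = {c: i for i, c in enumerate(classes)}
    cm = np.zeros((len(classes), len(classes)), dtype=int)
    for m, a in zip(manual, automated):
        cm[idx[m], idx[a]] += 1
    return cm

def percent_correct(cm):
    """Overall agreement: diagonal sum over total, as a percentage."""
    return 100.0 * np.trace(cm) / cm.sum()
```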
Table 8 shows the results for image 2, shown in Fig. 5. The only notable discrepancy here seems to occur with Cb regions (as defined by the manual classification), which were labeled as Cs by the classifier. The exact line between these two classes is a very fuzzy one; there is a continuum of gradually increasing albedo and texture going from cirrostratus through anvil cirrus into overshooting cumulonimbus cores. Defining boundaries such as this one and the one between St and Sc is key to a successful classifier, and any operational classifier would probably require rigorous training and testing by more than one nephanalysis expert to ensure that performance at the boundaries was meaningful and accurate. In the work presented here, the trainer's experience with nephanalysis is limited and carries its own biases. The CANN still performed fairly well overall despite this limitation, selecting the correct class in 87% of the cases.
Table 9 shows the results for image 3, which is reproduced in Fig. 6. This image produced the most difficulty for the CANN. In particular, this was the only test image with a substantial amount of midlevel cloud, and a large portion of that class was misclassified into one of the cirriform classes, probably due in part to a change in brightness temperature of the midlevel clouds. The midlevel clouds in the training set typically displayed brightness temperatures from 260 to 270 K. The midlevel cloud in image 3, manually identified as such because of the larger patterns and the texture in the visible channel combined with indications of minimal ice content from channel 3, was often dominated by brightness temperatures from 250 to 260 K. This type of problem can be approached in either or both of two ways. One might add more training data to include a wider range of brightness temperatures and reduce the effects of overfitting, and/or one might utilize temperature profiles from climatologies or numerical models to generate a more robust relationship between cloud-top temperature and height. The CANN selected the correct class 73% of the time for image 3. Some of the problems encountered in image 3 might also be attributable to differing viewing geometry; image 3 is a morning image from the Tropics, while the training data were dominated by afternoon imagery at temperate latitudes.
The performance for image 4 is summarized in Table 10, and the image itself is shown in Fig. 7. This was the most successful test case, with 90% correct classification. This is not surprising, as the CANN’s toughest problems, midlevel cloud and St–Sc occurrences, are largely absent from image 4. Here as elsewhere, Cs was the dominant class because of two factors. First, thick cirrostratus is a very common cloud type. Second, bright cirrus over some other type of cloud is also very common, and these instances were also classified as cirrostratus; in truth, it would be impossible to draw an objective line between the two types using visible and infrared data alone. It is encouraging that the CANN was able to recognize these highly varied samples as the same class, although this may also have somewhat compromised the CANN’s ability to recognize midlevel cloud in the process.
The performance for the final test image is shown in Table 11, and the corresponding image is shown in Fig. 8. This particular image originated from NOAA-14, which flies over the continental United States closer to local noon than the typical NOAA afternoon pass. This allows the use of visible imagery year round, meaning that winter cases can be investigated. Image 5 is from January 1995, in the early afternoon over the Pacific Ocean near the coast of Baja California. The St–Sc problem is clearly illustrated here once again, but the remaining types are, for the most part, successfully classified. The CANN chose the correct class for 77% of the samples in image 5.
Table 12 summarizes the overall performance of the classifier for the five test images. The CANN selected the correct class 82% of the time for the full test set, which indicates that the concept of a global neural network cloud classifier holds some promise, even given the limitations of the CANN, such as the generalization problems for thresholds and cloud levels and the limited scope of the training set. These results suggest that the development of a more robust and detailed version of the CANN with a much broader training set and a more sophisticated treatment of temperature profiles is a worthwhile pursuit. For a more detailed description of the development of the CANN, the reader is directed to Miller (1996).
An automated neural network cloud classifier has been developed and presented here. The results of applying the classifier to five independent test images indicate that even a moderately well-trained classifier with rigid cloud filtering thresholds can provide correct classifications over a wide range of imagery. This holds promise for further development of classifiers such as the CANN. Any such development will need to address several key issues. First, the thresholds involved in the cloud filter should be replaced with surface temperatures from climatologies or numerical models. The brightness temperature threshold between land and cloud cannot remain constant for both winter and summer imagery; land-surface temperature was observed to drop substantially in the winter imagery we have seen, and the presence of snow provides an additional challenge. Normally, snow can be separated from clouds by the co-occurrence of low values in channel 3, indicating possible ice, and relatively warm values in the thermal infrared, as ice clouds primarily occupy only the highest levels of the troposphere. Stratus clouds, however, exhibit much the same behavior as snow in these two channels. More accurate treatment of surface temperatures would allow better elimination of snow. It may also be worthwhile to consider snow as another class and allow the network itself to distinguish it from clouds. There were not enough good samples of snow to attempt this approach here; correspondingly, snow occurred in only a small number of instances in the test imagery.
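A pixel-level sketch of the snow test described above follows. The threshold values are purely illustrative assumptions, not values from this study, and, as the text notes, stratus can satisfy the same test.

```python
def looks_like_snow(ch3_value, ch4_temp_k, ch3_max=2.0, ch4_min=270.0):
    """Flag a pixel whose low channel-3 signal (suggesting ice) coincides
    with a relatively warm thermal infrared brightness temperature, a
    combination more typical of surface snow than of high ice cloud.

    Thresholds here are illustrative only; stratus exhibits much the
    same behavior in these two channels, which is the ambiguity this
    test cannot resolve on its own.
    """
    return ch3_value < ch3_max and ch4_temp_k > ch4_min
```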
Temperature profiles either from climatologies or numerical weather prediction models would also allow more robust definition of the cloud levels used to generate the LEF, MEF, and HEF features. Additionally, the manual classifications would be improved by more accurate information concerning cloud-top temperature versus height.
To reduce the effects of overfitting, a larger number of training samples with a wider amount of variation should be assembled and a rigorous effort to pare down the number of inputs should be conducted. A general rule of thumb for training neural networks is to have at least 10 times as many training samples as weights; for our study, the two numbers were on the same order of magnitude, which undoubtedly was one of the reasons for the CANN's difficulty in distinguishing certain classes. Available resources simply did not allow as large a training set as desired, yet this additional limitation makes the CANN's overall good performance all the more encouraging. We submit that if the CANN can achieve the degree of accuracy presented here, a classifier more thoroughly trained and with a more adaptive cloud filtering technique should be able to rival human-assisted classifiers on any imagery from low and middle latitudes.
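The sample-to-weight rule of thumb is easy to make concrete. The sketch below counts the trainable weights of a fully connected feed-forward network; the layer sizes used in the example are illustrative only, not the CANN's actual architecture.

```python
def mlp_weight_count(layer_sizes, bias=True):
    """Number of trainable weights in a fully connected feed-forward
    network, e.g. layer_sizes = (n_inputs, n_hidden, n_outputs).
    Each layer contributes (fan_in + 1) * fan_out weights when bias
    units are included."""
    extra = 1 if bias else 0
    return sum((a + extra) * b
               for a, b in zip(layer_sizes, layer_sizes[1:]))

# Illustrative architecture only -- not the CANN's actual layer sizes.
weights = mlp_weight_count((20, 15, 10))   # 475 weights
samples_by_rule_of_thumb = 10 * weights    # 4750 training samples
```

For comparison, the 322 training samples used here are on the same order as a typical weight count, illustrating the imbalance noted above.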
The features themselves should also be more thoroughly considered. Sensitivity tests conducted by using overlapping instead of rigid region boundaries would provide insight on how the features react to spatial perturbations and whether the 32 × 32 scale is truly optimal. The relationships among the features might also be broken down into eigenvectors and eigenvalues to help determine which features are more important; as mentioned above, the fewer inputs the network receives, the fewer the weights that need to be trained and the smaller the effects of overfitting for a given training set become. This would also make an investigation of the transfer functions between individual inputs and outputs easier, allowing a glimpse of the network’s internal “machinery.”
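The eigenvector and eigenvalue breakdown suggested above amounts to a principal component analysis of the feature correlation matrix. A minimal sketch, with a function name of our own choosing and without the CANN's actual feature data:

```python
import numpy as np

def feature_eigenanalysis(X):
    """Eigenvalues and eigenvectors of the feature correlation matrix,
    sorted by decreasing eigenvalue (equivalent to PCA on standardized
    features).

    X is an (n_samples, n_features) array.  Features with small
    loadings in all of the leading eigenvectors are candidates for
    pruning, which shrinks the number of weights to be trained.
    """
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    evals, evecs = np.linalg.eigh(corr)
    order = np.argsort(evals)[::-1]
    return evals[order], evecs[:, order]
```

A duplicated feature, for instance, produces a near-zero eigenvalue, signaling that one of the pair carries no independent information.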
A final but important note is that the CANN relies heavily on reflective data for discernment of cloud types. At first glance, it seems there may be no way to classify the cloud types presented here from infrared data alone; visible texture and brightness information are often crucial. The initial motivation for the CANN originated with measurement of rainfall on a global scale; however, it is important to address the issue of nighttime imagery. The reflective information from channels 1, 2, and 3 of the AVHRR or its TRMM counterpart is lost in these situations, meaning that some sort of reduced form of the CANN will be needed for the combined rain-rate algorithm. The way to approach this challenge probably lies along the lines of new spatial features generated from the thermal channels. The spectral combination of channels 4 and 5 can provide information at the pixel level about cirrus clouds, but only when they are very thin. Gray-level difference features such as CON and HOM could be generated for the thermal channels as well, although thermal infrared texture is never as distinct as visible texture. Differences in thermal brightness temperature require substantial changes in height, while differences in visible albedo can be effected by mere changes in orientation of the cloud-top “surface.” Still, there are more sophisticated measures of texture available, as well as new means of isolating shapes and patterns, which could prove invaluable for a cloud classifier operating on thermal imagery alone. This is another topic we deem worthy of more attention in future studies.
The authors would like to thank Dan Baldwin and Chuck Fowler for their helpful input in interpreting the imagery used in this study, Phil Mislinski for providing access to the AVHRR data from TOGA COARE, and Phil Arkin for supplying helpful comments in the revision of this paper. Thanks must also go to Jeff Forbes for providing much-needed funding during an important phase of the project. The majority of the work presented here was conducted under a subcontract with The University of Arizona, under NASA Goddard Contract NAG5-1642. Additional funding via Jeff Forbes originated with the National Science Foundation Grant ATM-9415874.
Corresponding author address: Dr. Shawn W. Miller, Dept. of Meteorology, University of Maryland at College Park, College Park, MD 20742-2425.