Classification of Cloud Particle Imagery from Aircraft Platforms Using Convolutional Neural Networks

Vanessa M. Przybylo, University at Albany, State University of New York, Albany, New York (https://orcid.org/0000-0003-4380-3543)

Kara J. Sulia, University at Albany, State University of New York, Albany, New York

Carl G. Schmitt, National Center for Atmospheric Research, Boulder, Colorado

Zachary J. Lebo, University of Wyoming, Laramie, Wyoming

Abstract

A vast amount of ice crystal imagery exists from a variety of field campaign initiatives that can be utilized for cloud microphysical research. Here, nine convolutional neural networks are used to classify particles into nine regimes on over 10 million images from the Cloud Particle Imager probe, including liquid and frozen states and particles with evidence of riming. A transfer learning approach proves that the Visual Geometry Group (VGG-16) network best classifies imagery with respect to multiple performance metrics. Classification accuracies on a validation dataset reach 97% and surpass traditional automated classification. Furthermore, after initial model training and preprocessing, 10 000 images can be classified in approximately 35 s using 20 central processing unit cores and two graphics processing units, which reaches real-time classification capabilities. Statistical analysis of the classified images indicates that a large portion (57%) of the dataset is unusable, meaning the images are too blurry or represent indistinguishable small fragments. In addition, 19% of the dataset is classified as liquid drops. After removal of fragments, blurry images, and cloud drops, 38% of the remaining ice particles are largely intersecting the image border (≥10% cutoff) and therefore are considered unusable because of the inability to properly classify and dimensionalize. After this filtering, an unprecedented database of 1 560 364 images across all campaigns is available for parameter extraction and bulk statistics on specific particle types in a wide variety of storm systems, which can act to improve the current state of microphysical parameterizations.

© 2022 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Schmitt’s current affiliation: University of Alaska Fairbanks, Fairbanks, Alaska.

Corresponding author: Vanessa Przybylo, vprzybylo@albany.edu


1. Introduction

Minute atmospheric particles govern large-scale systems through upscale growth and development, which are inherently linked to cloud formation and evolution. These processes scale globally to impact Earth’s energy budget through latent heating, buoyancy changes, and hydrometeor loading. Fundamentally, these dynamic and thermodynamic processes are understood, but perfect representation in models is infeasible due to their inherent complexity, arising from varied shapes and features, and the current lack of computational power. Consequently, atmospheric models often operate on a statistical basis to capture a broad yet likely outcome for a multitude of processes within a given parameterization, which presumes that data samples are sufficient to draw analytical conclusions. As computational efficiencies and new data practices emerge, scientists can deduce more substantial analytical estimates from sizeable in situ datasets, diminishing uncertainties within cloud models.

Geophysical calculations on observational data, such as radiative transfer measurements, radar retrievals, and particle fall speed relationships, suffer from large uncertainties related to the physical properties of ice cloud particles (e.g., Protat and Williams 2011; Sun et al. 2011). Clouds that contain ice differ from warm-phase clouds in that the crystals form different shapes or habits, which contribute to uncertainties in regional and global climate and Earth system models (e.g., Comstock et al. 2007). Many remote sensing algorithms rely on particle characteristics gleaned from distributions of geometric or fractal properties of crystals that lie within specified bounds. Any uncertainties in habit distributions can propagate into subsequent parameterizations.

High‐altitude ice clouds cover >50% of Earth’s surface (Wang et al. 1996; Wylie et al. 2005; Hong and Liu 2015; Huang et al. 2015), and their role in latent heat and mass redistribution in the upper troposphere is known but poorly quantified due to microphysical uncertainty. This uncertainty propagates into future climate scenarios through numerical simulations due to incomplete and overly simplistic parameterizations of clouds and microphysical properties (Iacobellis et al. 2003). Moreover, it is recognized that ice clouds have a profound impact on the radiation budget and cannot be ignored (e.g., Sun and Shine 1994). For example, cirrus clouds can be present at all latitudes and can cover as much as 40% of Earth’s surface (Liou 1986). Studies have shown that the cirrus radiative impact could be modified by as much as 30% given the major source of uncertainty attributed to ice particle shape (Takano and Liou 1989; Stephens et al. 1990; Mishchenko 1996). Given the expanse of ice and mixed-phase clouds studied in past field campaigns, plentiful scientific opportunities exist for creating the sizeable datasets needed to hone empirical estimates.

Many past field campaigns have leveraged aircraft platforms for differing scientific endeavors within a variety of cloud systems. Often, aircraft payloads include a suite of optical imagers and environmental probes, which provide data-rich, underutilized documentation across a nondiscriminatory spatial and temporal scale. Recent computational achievements have provided the unique opportunity to perform comprehensive analysis on a multiplex of big-data applications within the atmospheric sciences and beyond. Of late, widespread success surrounds machine learning (ML) techniques as powerful tools for efficiently operating on large datasets. ML is particularly beneficial when manually producing rules for label distinction is too laborious (Witten et al. 2016). ML algorithms quickly and efficiently operate on large quantities of data and identify features otherwise unnoticed or ignored given the abundance of data, which would be nearly impossible to analyze individually. Proper uses of ML require large quantities of data of the same type, which is why classification of the millions of cloud particle images is an ideal application.

2. Background

Since the wide acceptance and ever increasing use of optical array probes beginning in the early 1970s with the Particle Measuring Systems 2D-C (Knollenberg 1972), documentation of the microphysical structure and composition of individual particles (Korolev and Isaac 2003; Baum et al. 2005) has shed light on particle scattering properties and satellite retrievals, such as effective particle size, growth rates, and terminal fall velocity (Korolev et al. 2000; Heymsfield et al. 2002a, 2004; Heymsfield and Westbrook 2010). In calculating the mass of precipitation, incorrect ice shape (e.g., a 420-μm sphere vs 5-mm stellar dendrite) may result in a particle mass error factor of ∼15 (Mason 1994). Mason (1994) showed that columns can fall 1.5–2 times as far as a plate of equal mass (crystal length/radius = 5), although more extreme aspect ratios would have more extreme differences. The high-resolution imagery and environmental properties associated with each habit from a suite of in situ aircraft probes make for an invaluable test bed to implement these findings into cloud models, improve radiative forcing measurements, and better understand and represent climatic thermodynamic feedbacks across all latitudes. Classification of probe imagery as a function of particle shape (and appearance characteristics such as texture), along with associated particle and environmental characteristics, is the first step in developing a means by which such a massive dataset can be exploited for more focused microphysical research.

Classification of particle images via airborne optical array probes is not a novel research project; a multitude of techniques and mathematical methods have been applied at various complexities to achieve automated habit classification in addition to feature descriptor understanding. Initial efforts for hydrometeor classification on probe imagery used simple geometric features (e.g., maximum dimension, perimeter, area) extracted from shadowgraph images (Hunter et al. 1984; Holroyd 1987; Moss and Johnson 1994). These trials included decision tree classification algorithms that automatically define conditions based on particle descriptors. While useful, they were not able to identify common yet more intricate habits such as aggregates or bullet rosettes; thus, advancements were proposed by Korolev et al. (2000) to include more complex particle descriptors based on ratios of geometrical measures. With progress in imaging techniques came new probes with higher pixel resolution, such as the Cloud Particle Imager (CPI) probe (2.3-μm pixel size resolution with particles that pass through the sample volume at speeds up to 200 m s−1; SPEC 2012b). Lawson and Baker (2006) developed a habit classification scheme based on CPI images for nine categories dependent on acceptance criteria from more sophisticated measures such as radial harmonics, along with past measures (e.g., length, width, area, and perimeter). However, it is argued that when deviating from past well-understood quantities, physical interpretability is obscured. Lawson and Baker (2006) found that on occasion, the automatic crystal classification program could not distinguish reliably between crystal types for 20% of the dataset taken at the South Pole Station. Specifically, when compared with manual classification, 12% of all categories were misclassified (Lawson and Baker 2006). More recently, Lindqvist et al. (2012) used principal component analysis of selected physical and statistical features of ice‐crystal perimeters to classify particles from the CPI probe into eight categories. Three randomly selected test cases of 222, 200, and 201 crystals from tropical, midlatitude, and Arctic ice clouds, respectively, resulted in a combined classification accuracy of 81.1% (Lindqvist et al. 2012). Moreover, computations of shortwave radiative fluxes showed that the flux differences between clouds of manually and automatically classified crystals can be as significant as 10 W m−2 but also that two manual classifications of the same imagery result in even more considerable differences, implying the need for a systematic and repeatable classification method (Lindqvist et al. 2012).

McFarquhar et al. (1999) used two different techniques for the classification of four habits, combining both simulated and observed crystals from a 2D cloud probe and a video ice particle sampler. Discriminant analysis was used to predict where the observed crystals fell in a simulated phase space of 1000 randomly oriented crystals and resulted in success rates between 31% for bullet rosettes and 96% for polycrystals (McFarquhar et al. 1999). A second technique used an unsupervised self-organized neural network (NN) and returned much improved success rates ranging between 69% and 87% (McFarquhar et al. 1999). Further analysis using a Monte Carlo radiative transfer routine showed that reflectances depend significantly on the habit predicted by the classification scheme (McFarquhar et al. 1999). Similarly, one approach by Feind (2008) used a backpropagation NN trained on 2000 images labeled into eight classes from the 2D-C probe in one storm. The model reached 85.1% accuracy, but preliminary feature extraction was still used as opposed to automatic feature extraction through deep learning architectures such as convolutional neural networks (CNNs).

Classification of surface-based imagery has also been recently developed using the three-view Multi-Angle Snowflake Camera (MASC; Garrett et al. 2012) based on geometric characteristics and degree of riming (Hicks and Notaroš 2019); 1450 prelabeled images composing six classes were used as the training dataset for a CNN, resulting in a mean accuracy of 93.4% on the prelabeled data and excellent generalization to new data. Additionally, 3000 hand-labeled MASC images were classified by means of multinomial logistic regression (MLR) and achieved a classification accuracy of 95%, although 72 feature descriptors were required to determine which variables are most relevant for classification (Praz et al. 2017). These descriptors included particle size, shape, and textural information and were all extracted during preprocessing. Praz et al. (2017) validated the classification method through comparison with the two-dimensional video disdrometer (2DVD) data of Grazioli et al. (2014).

Grazioli et al. (2014) used 2DVD data to employ a support vector machine (SVM) for classification tasks. The SVM was trained over 1-min averages to predict the dominant hydrometeor type during precipitation events, which were visually labeled into eight dominant hydrometeor classes. The algorithm achieved relatively high classification performances, with accuracies higher than 84% for each hydrometeor class and median overall accuracies of 90%. Bernauer et al. (2016) extended the 2DVD classification to three dominant crystal types (single crystals, complex crystals, and pellets) and three classes for the degree of riming (weak, moderate, and strong) through a simple decision tree algorithm; however, the classification failed for 12% of the hydrometeor populations. Resolution for the 2DVD is limited to approximately 2 mm and produces binary black-and-white images, which neglects particle structural details and forces classification averaging over 1-min intervals for prevailing hydrometeor type.

More recently, Praz et al. (2018) tested and compared MLR, SVMs, and artificial NNs to predict six particle habits from the high-volume precipitation spectrometer, the 2D-S stereo probe, and the CPI probe. Praz et al. (2018) achieved 95.3% accuracy on testing data for the CPI probe using MLR. However, feature extraction is required for MLR, an arduous preprocessing phase that may also require retraining for images originating in different environmental conditions. O’Shea et al. (2016) developed a habit recognition algorithm for sorting CPI images into nine categories using a three-layer feed-forward NN and geographically diverse samples from two flights during the Cirrus Coupled Cloud-Radiation Experiment (CIRCCREX) project (Lindqvist et al. 2012). The network had an estimated classification accuracy of 79% when compared with manually classified images, which increased to 88% if the nine categories were merged into six. If the predicted classification probability was less than 50% across all habits, the image was not classified. Note that the process of image flattening in a feed-forward NN tends to lose spatial relationships as compared with a CNN (Hou et al. 2021).

Recently, the rapid development and use of artificial intelligence, in particular CNNs, has progressed image processing and classification capabilities. Touloupas et al. (2020) used a CNN to classify an ensemble of particles from holographic imagers as liquid, artifact, or ice. The CNN surpassed the predictive capabilities of decision tree and SVM approaches in six of seven metrics, with the median overall accuracy being as high as 96.8%. Touloupas et al. (2020) also stated that using the CNN not only improves the generalization ability but also requires less engineering effort because it automatically extracts features from data, unlike decision trees and SVMs. It should be noted that SVM classification requires feature extraction on multidimensional arrays because the algorithms cannot handle images directly. Thus, it is desirable to advance classification toward algorithms that do not rely on a large number of experimentally determined features extracted prior to training. In addition, SVM classifiers tend to be unreliable on unseen data, which requires updates to training datasets for generalizability and broad applicability. Most recently, Wu et al. (2020) used eight different predefined CNNs to classify airborne Cloud Imaging Probe (CIP) images. The CIP probe has a resolution that is approximately 6 times as coarse as that of the CPI probe, and yet the best model reached an average precision of 98% on nine classes. However, the training dataset of 1800 images from 10 years of data between 2008 and 2018 is geographically constrained to the Beijing, China, area (Wu et al. 2020). For comparison, the dataset used herein consists of an order of magnitude more images, six additional years of research flight initiatives, and most importantly, generalizability to a wider variety of particle types.

Even with past advancements in image categorization, there is much room for improvement and interpretability in future models, given the lack of automation on a wide range of crystal types and incorrect labeling on a portion of testing data. Instead of writing a classification model by hand, which would lack “rules” that are both simple and reliable, this work turns to deep learning innovations because layers of NNs have the advantage of learning more complex, nonlinear functions without the need for separate algorithms for each particle type or predefined feature descriptors. There is great benefit to leveraging deep learning CNNs: no background knowledge on feature metrics for classification is required, nor are inferences on how these features are distributed among and between classes. In reality, it is seldom that two crystal classes are separable in lower-dimensional space (i.e., 1D or 2D) with features that uniquely distinguish image identity. Feature extraction is purposefully averted in this study as the assets of NNs are utilized, which excel at higher-dimensional tasks and automatically deduce which features are most important for classification. This comes at the expense of significant computational resources, increased model complexity, and a lack of full model interpretability and transparency.

This work begins by discussing CPI probe details and image collection in section 3, and then classification models and the training process are described in section 4. A summary of the results is presented in section 5 including performance assessment and time trials.

3. Data collection

The SPEC, Inc., CPI probe (Lawson et al. 2006b) is used because of its fine-scale resolution: 2.3-μm pixel size, 256 levels of gray at each pixel, and a maximum detectable particle size of 2.3 mm × 2.3 mm (1024 × 1024 pixels; SPEC 2012a), making the probe advantageous over other optical array probes (OAPs) [see Lawson et al. (2006a), their Fig. 1, and Lawson et al. (2006b), their Figs. 4, 5, 6, and 8]. The CPI operation principle is different from that of other OAPs in that it relies on a square photo-detector array, meaning that an entire image is captured instantaneously when the device is triggered. As a result, the measurement is less sensitive to distortion effects, but discontinuous. While the CPI is unreliable for particle size distributions because of limitations with the detected size range (∼15–2500 μm) and slow sampling speeds relative to other OAPs, only the imagery itself is used. Also, while it is possible for shattered particles to enter the sample volume, artifacts are minimized by only using single particles per frame, which is filtered during preprocessing (McFarquhar et al. 2013).

Feind (2008) found that separating not-so-circular hail images (with relatively rough perimeters) from snow images (with not-so-convoluted perimeters) required larger pixel areas due to the binary (black or white) images produced by the 2D-C probe. Moreover, there is a huge benefit to the 256 grayscale levels that depict surface roughness in CPI images. Although background noise can be substantial, the intricate details in the CPI images can be used as additional descriptors. Praz et al. (2018) found that the presence of transparent areas within the ice particle boundaries is a relevant criterion to distinguish among particles, which is not a feature found in lower-resolution probes. Importantly, from Praz et al. (2018), riming degree identification with MLR was not achievable and would require additional dedicated effort due to some limitations intrinsic to the CPI device: noise in the photodiodes, lack of a constant background intensity threshold, and highly variable contrast, average brightness, and focus from image to image. In this work, the CNNs can distinguish between unrimed and rimed particles without additional effort beyond prelabeling a relatively small number of rimed particles. Once CPI imagery from all campaign IOPs is gathered, processed, and trained using CNNs for habit classification, these images and their associated classification and geometrical properties serve as critical information used to populate a CPI database.

To build a robust image dataset with heterogeneity in crystal type, geographical location, and system dynamics, past field campaigns that carried the CPI probe on the aircraft payloads are utilized. A detailed description on image preprocessing from the CPI probe is included in appendix A. Table 1 provides a brief description of the location, date, and aircraft that housed the CPI probe, along with the number of images generated per campaign to be classified; the selected 12 campaigns provide 10 492 847 images. This massive dataset contains both ice and liquid phases, various particle orientations, and in the case of aggregates, a varying number and aspect ratio of constituents. To this end, the comprehensiveness of this dataset is unparalleled and provides plentiful scientific opportunities to study microphysics on a scale that has yet to be achieved in terms of in situ airborne probe imagery. For example, this dataset will make for an invaluable test bed for cloud modeling, radiative forcing, and climatic thermodynamic feedbacks across all latitudes.

Table 1. CPI data acquisition details for each campaign. The following campaigns are included—ARM-IOP: Department of Energy Atmospheric Radiation Measurement Intensive Operating Period at the Southern Great Plains (SGP) site; ATTREX: Airborne Tropical Tropopause Experiment; CRYSTAL-FACE: Cirrus Regional Study of Tropical Anvils and Cirrus Layers–Florida-Area Cirrus Experiment; AIRS II: Alliance Icing Research Study II; MidCiX: Midlatitude Cirrus Experiment; ICE-L: Ice in Clouds Experiment–Layers; MACPEX: Midlatitude Airborne Cirrus Properties Experiment; MC3E: Midlatitude Continental Convective Clouds Experiment; OLYMPEX: Olympic Mountains Experiment; MPACE: Mixed Phase Arctic Cloud; POSIDON: Pacific Oxidants, Sulfur, Ice, Dehydration, and Convection Experiment. The last column represents processing time in hours to extract individual images from the sheets using five compute cores.

4. Machine learning for habit classification

After all images are extracted, image classification is achieved using an ML approach. The simplest NNs are defined by three layers: an input layer where the data enter the model, a hidden layer that processes the information, and an output layer where a decision is made (e.g., Marr 2019). A deep NN contains multiple hidden layers to process the information in multiple complex ways, with a typical trade-off between accuracy and computational expense. However, acceleration of graphics processing unit (GPU) technology has facilitated the ability to efficiently run dense deep NNs. For example, this study utilized up to eight Tesla V100 GPUs, which substantially increased pixel rendering speeds for training and analysis.

a. Classification models

CNNs are popular in image processing due to their ability to extract dominating features from matrices in multiple channels (Yu et al. 2016; Sharma et al. 2018). Multiple CNNs are trained on a number of CPI images to identify a given particle type. Nine categories are chosen in this work; examples of each category can be seen in Fig. 1 with descriptions in Table 2. It is important to note that categories were chosen such that there would be an appropriate category for any particle type given a stream of aircraft data. In addition, a rimed category, while unique among most other studies, is useful in representing and studying regions with high concentrations of supercooled cloud droplets.

Fig. 1. Examples of the nine categories used to train the CNNs.

Table 2. Category descriptions.

Here, multiple CNNs are utilized and tested from the PyTorch library (Torchvision 2019) [e.g., residual networks (ResNet; He et al. 2015), AlexNet (Krizhevsky et al. 2012), Visual Geometry Group (VGG; Simonyan and Zisserman 2014), dense convolutional network (DenseNet; Huang et al. 2016), and EfficientNet (Paszke et al. 2016)]. Throughout initial examination, training relied on transfer learning, that is, pretrained models with fixed model weights and parameters for quick use on later tasks and versatility in application. The pretrained models use the extensive ImageNet dataset (Deng et al. 2009) and optimize convergence toward a minimum in error from incorrect predictions. Transfer learning is attractive due to the lack of a laborious model development phase and its straightforward implementation; however, to utilize frozen model weights, the target dataset must be intrinsically similar to the data the model was initially trained on. Given the deviations between the ImageNet and CPI datasets, analysis proves that the additional training time required to produce weights derived entirely from the CPI dataset is a worthwhile trade-off resulting in improved model performance, and thus, each model is trained from scratch using the predefined and widely tested architectures.
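As a minimal sketch (not the authors’ exact code; it assumes the torchvision API of this era), instantiating VGG-16 with randomly initialized weights and a nine-way output layer might look like the following:

```python
# Sketch: VGG-16 from torchvision with random weights (training from
# scratch rather than transfer learning) and a nine-class output layer.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 9  # the nine habit categories in Table 2

model = models.vgg16(pretrained=False)  # no ImageNet weights
# VGG-16's last classifier layer maps 4096 features to 1000 ImageNet
# classes; swap it for a nine-way output.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_CLASSES)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
```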

b. Training dataset

To aid in the tedious hand-labeling phase into predefined categories, a non-ML algorithm is developed, namely, the Particle Image Characterization Tool (PICT). PICT acts on single images to find contours from a predefined grayscale image, combine small gaps or irregularities in the main foreground contour for geometric calculations, and discard images where a large portion of the crystal is cut off. It then computes the variance of the Laplacian on the grayscale image to detect “blurriness.” After these checks, PICT calculates dimensional characteristics on the largest contour through a series of variables (Table 3), which include but are not limited to solidity, area, perimeter, area ratio, circularity, complexity, length, and width from an encompassing rectangular box. From the particle properties listed in Table 3, conditional statements are used to automatically filter images that mostly fit each category in an unbiased manner, which greatly eases the preprocessing steps. The definitions used for each class are not listed since they are altered on a campaign-by-campaign basis, especially the Laplacian value threshold that detects blur, and human inspection is still needed to double-check the output from PICT. A sketch of these geometric checks follows Table 3.

Table 3. Predictor descriptions used in PICT.
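As an illustrative sketch (assuming OpenCV; the inverted Otsu thresholding step and the blur threshold value are assumptions, since PICT tunes its thresholds per campaign), the two PICT-style checks described above might be implemented as follows:

```python
# Illustrative sketch of two PICT-style checks (not the PICT code itself):
# (1) blur detection via the variance of the Laplacian and (2) geometric
# descriptors of the largest foreground contour.
import cv2
import numpy as np

def pict_checks(path, blur_threshold=100.0):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)

    # Low variance of the Laplacian implies few sharp edges, i.e., blur.
    blurry = cv2.Laplacian(gray, cv2.CV_64F).var() < blur_threshold

    # Largest contour of the thresholded foreground (particles assumed dark
    # on a light background, hence the inverted Otsu threshold).
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    largest = max(contours, key=cv2.contourArea)

    area = cv2.contourArea(largest)
    perimeter = cv2.arcLength(largest, True)
    hull_area = cv2.contourArea(cv2.convexHull(largest))
    return {
        "blurry": blurry,
        "area": area,
        "perimeter": perimeter,
        "solidity": area / hull_area if hull_area > 0 else 0.0,
        "circularity": 4.0 * np.pi * area / perimeter**2 if perimeter > 0 else 0.0,
    }
```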

Note that the CNNs used herein require a fixed and symmetric image size. Initial testing proved that degrading resolution upon enlarging small images resulted in higher accuracy than did adding padding to the smaller images to maintain the initial aspect ratio and size. While this unfortunately results in images that become visually “stretched” when fed into the model (e.g., a thin column that stretches to look wider), this transformation can be reversed to obtain the original image following classification. PICT is used as the sole classifier for spheres, columns and needles, and blurry images; however, human confirmation of some category labels is still required at the current state given the lack of automation for complex particle shapes, especially those that show riming. In all, PICT is used to aid in initial CPI labeling and extract detailed CPI particle feature and dimensional characteristics either before or after passing each image through the CNN.

For particle imagery that is insufficiently classified by geometric properties, hand labeling is necessary and was completed by one individual, which inherently could introduce some bias. Additional labelers would limit such bias with a trade-off for additional person-hours. Details on the training dataset with respect to each class can be found in Fig. 2. Class totals include data augmentation, which consists of rotating and transposing the orientation of each image to artificially build upon already present images without compromising model accuracy (i.e., 4 times as many images are generated). The labeled dataset includes 24 720 images to broadly represent a variety of particle habits and configurations. To improve the efficiency and decision making of the labeling pipeline, early versions of the model trained on a subset of data were used to suggest new labels on unseen data, which acted to accelerate data labeling after careful inspection. Better calibrated predictions are expected with less overfitting as more data reinforce the training dataset. Further, keeping in mind that there are only relative categories within a classification phase space and that it is uncommon to see particles with identical features within a class, this training dataset reinforcement not only helps to build each main category, but can also identify subcategories (e.g., columns hollowing out to needles or dendrites growing from plates) to reduce model uncertainty and the need to “force” into a predefined category. For example, while rarer in nature, pristine dendritic crystals could be reconsidered as a future category in the context of this application. A major benefit to using CNNs is the relative ease of allowing a human operator to build the network classifications based on the desired classes for a specific need.
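A minimal sketch of the fourfold augmentation described above (assuming PIL; the exact transforms beyond rotation and transposition are not specified in the text) is:

```python
# Fourfold data augmentation sketch: each labeled image yields itself plus
# rotated and transposed copies, quadrupling the training samples without
# new labeling effort. The particular rotations shown are assumptions.
from PIL import Image

def augment(img: Image.Image) -> list:
    return [
        img,                             # original orientation
        img.rotate(90, expand=True),     # rotated a quarter turn
        img.rotate(180),                 # rotated a half turn
        img.transpose(Image.TRANSPOSE),  # rows and columns swapped
    ]
```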

Fig. 2. Class distribution of the labeled dataset for the nine classes totaling 24 720 images. Weighted random sampling was used to oversample minority classes and undersample majority classes based on an inverse weighting via the number of samples per class.

For training, 80% of the labeled dataset is used for training and 20% for validation; the labels of the validation portion are not used to adjust the model, so its predictions remain unbiased. The same transformations are applied to each portion of the dataset; all images are resized to 224 × 224 pixels and normalized between −1 and 1 by subtracting the mean and dividing by the standard deviation of the pixel values in the dataset. Instead of using the mean and standard deviation statistics calculated on the pretrained ImageNet dataset, mean and standard deviation metrics are calculated on the CPI imagery and consistently applied to the training, validation, and testing datasets. Because of the imbalanced number of images per category, weighted random sampling (“WeightedRandomSampler” method; Torchvision 2019) was used to oversample minority classes and undersample majority classes based on an inverse weighting via the number of samples per class so that each class is learned equally. During the model exploration phase, the learning rate is halved from 0.01 whenever the validation accuracy does not improve, to optimize convergence. Also, Nesterov stochastic gradient descent with a decay of 0.9 is used for accelerating gradient descent in the relevant direction (Laborde and Oberman 2020).
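A sketch of this training setup in PyTorch idioms follows (CPI_MEAN and CPI_STD stand in for the statistics computed on the CPI imagery, train_dataset is a placeholder ImageFolder-style dataset, and the decay of 0.9 is interpreted here as the momentum parameter):

```python
# Training-setup sketch under the assumptions stated above.
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler
from torchvision import transforms

CPI_MEAN, CPI_STD = [0.5], [0.5]  # placeholder per-channel statistics

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # fixed, symmetric input size
    transforms.ToTensor(),
    transforms.Normalize(mean=CPI_MEAN, std=CPI_STD),
])

# Inverse-frequency sample weights so every class is drawn equally often.
targets = torch.tensor(train_dataset.targets)
class_counts = torch.bincount(targets)
sample_weights = 1.0 / class_counts[targets].float()
sampler = WeightedRandomSampler(sample_weights, num_samples=len(sample_weights))
loader = DataLoader(train_dataset, batch_size=64, sampler=sampler)

# Nesterov SGD; halve the learning rate when validation accuracy stalls.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, nesterov=True)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=0)
# ...inside the epoch loop, after validation: scheduler.step(val_accuracy)
```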

5. Results

Once data samples have been processed and split into subcategories for training and validation, nine CNNs are trained on 80% of the labeled dataset (19 776 images) to obtain a baseline performance in terms of loss and accuracy. Figure 3 shows learning curves for accuracy (top) and loss (bottom) of each supervised CNN (colors) with weights entirely derived from CPI imagery. The VGG networks (Sharma et al. 2018) display the best accuracy and loss on both the training (left; dots) and validation (right; stars) datasets. The training data adjust the weights of the NN, and the validation dataset acts to verify that the models are not overfitting, such that any increase in accuracy over the training dataset yields an increase in accuracy over a dataset that has not been shown to the network (Xu and Goodacre 2018). Last, the testing set is entirely unseen data and is used to confirm the predictive power of the network in the final stage once the model development phase is satisfactory (not shown in this figure).

Fig. 3. (top) Accuracy and (bottom) loss for 20 epochs on a dataset totaling 24 720 images split 80% for (left) training (dots) and 20% for (right) validation (stars). Colors represent different CNNs that categorize nine habits. The learning-rate parameter decreases upon reaching a plateau in validation accuracy with a batch size of 64 images read into memory at a time.

A logarithmic increase in model accuracy and a decrease in loss across epochs are indications of learning. A good fit is identified by training and validation losses that decrease to a point of stability with a minimal gap between the two final loss values. Attention should be brought to how quickly this is achieved in Fig. 3 without overfitting. All networks obtain validation accuracies over 90% after five epochs (Fig. 3, top right), which indicates appropriate labeling and the ability for the dataset to be learned. Since the training loss steadily decreases while the validation loss stabilizes (and, importantly, does not increase), each network sufficiently generalizes on the validation dataset.

Given nine separate model architectures, there is minimal spread in performance. Between AlexNet and VGG-16, the worst and best performing models, respectively, validation accuracy varies by ∼6% (Fig. 3, top right). After each model has been roughly quantified, user-defined model parameters need to be considered. These hyperparameters, whose values control learning progression, govern the training process and dictate overall model performance. Hyperoptimization is used to determine the best set of parameters (e.g., batch size, epochs, dropout values, and learning rate) for model accuracy on the validation dataset. To filter through what can be an exhaustive list of user-defined parameters, a grid search (Pedregosa et al. 2011) with a manually specified estimator and parameter search space is utilized; a sketch is given below. The grid search is completed on the basis that validation accuracy is maximized. Note that the accuracy of the classification changes each time the network is trained due to the different initial weights and biases that are used, unless a fixed random seed is used throughout all stochastic processes within the pipeline. Figure 4 shows an example of hyperparameterizing on one such parameter, batch size (x axis; colors), with respect to maximum validation accuracy (y axis; %) out of 20 epochs for the VGG-16 network, initially proven to be the best classifier. The maximum validation accuracy is chosen as the classification metric because the model parameters are saved and updated contingent on an increase in validation accuracy. From Fig. 4, validation accuracy maxima tend to exhibit an inverse trend as a function of batch size but vary by less than 2%. Larger batch sizes make larger gradient steps that can get misguided to local minima during descent. Also, larger batch sizes result in fewer model updates given the same number of epochs, which could explain the lower performance if node weights are not ideal by the specified epoch. Ultimately, the batch size is set to 64 for the remainder of the study since the validation accuracy is the highest when 64 images are loaded into memory.
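A hand-rolled version of such a grid search over two of the named hyperparameters might look like the following (train_and_validate is a hypothetical helper returning the best validation accuracy over 20 epochs; the grid values are illustrative):

```python
# Hypothetical grid-search sketch maximizing validation accuracy.
from itertools import product

param_grid = {"batch_size": [32, 64, 128, 256], "lr": [0.001, 0.01, 0.1]}

best_acc, best_params = 0.0, None
for batch_size, lr in product(param_grid["batch_size"], param_grid["lr"]):
    # train_and_validate is a stand-in for the full training loop.
    val_acc = train_and_validate(batch_size=batch_size, lr=lr, epochs=20)
    if val_acc > best_acc:
        best_acc = val_acc
        best_params = {"batch_size": batch_size, "lr": lr}

print(f"best: {best_params} at {best_acc:.3f} validation accuracy")
```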

Fig. 4. Maximum validation accuracy (y axis; %) with respect to batch size (x axis; colors) for the VGG-16 network out of 20 epochs.

a. Performance assessment

After hyperparameterization, all nine models are assessed for sensitivity to random sampling effects through cross validation, which is a resampling technique used to ensure the models can generalize to new, independent data; K-fold cross validation (“StratifiedKFold”; Pedregosa et al. 2011) splits the labeled dataset into “k” folds, where each fold is used once as a validation dataset while the k − 1 remaining folds are concatenated to form the training set. This process is repeated until all folds and labeled samples have been used as the validation dataset to eliminate sampling bias. The type of cross validation used here is stratified k-fold, where the entire dataset is shuffled once and then folds are made by preserving the percentage of samples for each class. In this study, five folds are used, and each fold results in separate performance metrics.
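A sketch of this stratified five-fold procedure with scikit-learn follows (labels is the per-image class index; train_model_on and evaluate_on are hypothetical stand-ins for the training and scoring steps):

```python
# Stratified five-fold cross-validation sketch: class proportions are
# preserved in every fold, and each fold serves once as validation.
import numpy as np
from sklearn.model_selection import StratifiedKFold

labels = np.asarray(all_labels)  # one class index per labeled image
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

fold_metrics = []
for train_idx, val_idx in skf.split(np.zeros(len(labels)), labels):
    train_model_on(train_idx)                  # hypothetical training call
    fold_metrics.append(evaluate_on(val_idx))  # hypothetical P/R/F1 scoring
```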

Choosing the correct performance metric to evaluate and interpret model performance is vital and should be considered based on the task at hand. For example, Fig. 3 quantifies accuracy as the ratio of the number of samples correctly classified by the model to the total number of images for a given training or validation dataset. However, accuracy is not a proper measure here due to the minor class imbalance on the validation dataset (no weighted sampling) and its failure to distinguish how correctly identified samples are spread across classes. For this reason, three more performance metrics are considered: precision, recall, and F1 score (Pedregosa et al. 2011). Precision determines how often the model is correct when it predicts positive, accounting for both true positives (TP) and false positives (FP). As such, precision helps identify models with higher FP rates. Recall refers to the percentage of total relevant particles correctly classified and helps identify the occurrence of false negatives (FN). As an example, for the classification of aggregates, a TP occurs when an aggregate is correctly classified as an aggregate, an FP is when a nonaggregate is classified as an aggregate, and an FN is when an aggregate is classified as a nonaggregate. The ideal scenario is when the numbers of FP and FN are minimized, which is captured in the F1 score through the combination of precision and recall. Precision, recall, and F1 scores range between 0.0 and 1.0, where 1.0 is considered perfect and 0.0 is no skill. Mathematically, precision P, recall R, and the F1 score (F1) are calculated as
$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}},$$
$$R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \quad \text{and}$$
$$F_1 = 2\left(\frac{PR}{P + R}\right) = \frac{\mathrm{TP}}{\mathrm{TP} + 0.5(\mathrm{FP} + \mathrm{FN})}.$$

Hence, given the example above, P is a measure of the trust of the model when it classifies an aggregate (the positive class), R is a measure of the trust of the model to find all aggregates out of the total number of actual aggregates (including aggregates that the model classifies as nonaggregates), and F1 is a measure of accurate classification that incorporates these two parameters to optimize a balance between model strictness and lenience in the criteria for determining an aggregate.
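These per-class metrics can be computed directly with scikit-learn (a sketch; y_true and y_pred are the integer class labels and model predictions for the validation images):

```python
# Per-class precision, recall, and F1 in the style of Fig. 6.
from sklearn.metrics import classification_report, precision_recall_fscore_support

precision, recall, f1, support = precision_recall_fscore_support(
    y_true, y_pred, average=None)  # one value per class

# The macro average weights every class equally, as in Fig. 5c; the
# weighted average weights classes by sample count.
print(classification_report(y_true, y_pred, digits=3))
```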

Figure 5 shows the variability in each metric between classes and across five folds or iterations of the labeled validation dataset for P (orange), R (green), and F1 (blue) for all nine CNNs (x axis) after 20 epochs. The y axes represent the magnitudes of each performance metric between 0.75 and 1.00. First, Fig. 5a (top) shows performance metric distributions across all 45 data points from five folds and nine classes. Figure 5b (middle) then averages metrics across the five folds for each class to emphasize variation in performance across classes (nine data points). Figure 5c (bottom) computes the equal-weighted macro average across classes to show variation across folds or sampling bias including five data points per distribution. Comparison between Figs. 5b and 5c proves that there is greater variability in terms of performance spread (interquartile ranges) among class predictions (Fig. 5b) than there is from sampling bias or lack of generalizability (folds, Fig. 5c). In addition, the VGG-16 model proves to consistently achieve the best performance in terms of all metrics with the least amount of variability; P, R, and F1 generally have a median value near 0.96. The VGG-19 model results in similar P, R, and F1 scores, but it takes longer to classify when compared with the VGG-16 network given the dense layer architecture.

Fig. 5. (a) Classification report distributions for F1 scores (blue), precision (orange), and recall (green) including all folds and classes with respect to each CNN (x axis). (b) Classification report distributions with each metric averaged across five folds to show variation in classes. (c) Classification report distributions averaging metrics across all classes (using the macro average) to show variation across folds or independent groups of the labeled dataset. All data are taken at epoch 20. Boxes extend between the first and third quartiles, with vertical black lines extending between the minimum and maximum values. The median is denoted by the horizontal black line within each box.

Given confirmation of model accuracy in the results presented above, the remaining analysis is completed for the VGG-16 network only. Measures P, R, and F1 are broken down by class across all 24 720 labeled images for bulk statistics in Fig. 6. Colors range between magnitudes of 0.90 (blue) and 1.0 (red), representing worse and better classification, respectively. Each row represents a class, with the final three rows representing an average across classes with respect to each metric: P (first column), R (second column), and F1 (third column). Notably, a majority of classes measure greater than 0.95 for all metrics, with an average near 0.97, which is also confirmed in Fig. 5b for the VGG-16 network. Fragments (which include blurry images) and spheres are best classified, likely due to their distinct and easily discernible features, whereas plates and budding rosettes are worst classified. Aggregates and bullet rosettes fall toward the middle of the classification metric spectrum relative to the seven other classes.

Fig. 6. Precision, recall, and F1 score predicted by the VGG-16 on each class (rows) from the labeled dataset of 24 720 images at epoch 20. The last three rows represent bulk statistics across classes both weighted by count (weighted avg) and unweighted (macro avg). Shades of blue and red represent values that are respectively less than and greater than 0.95.

To further visualize classification error, confusion matrices are presented in Fig. 7 for the VGG-16 network. Because of the supervised learning approach of prelabeled images, confusion matrices have the benefit of statistically displaying what class a model predicts versus what the actual label is. The confusion matrices in Fig. 7 (top) display predicted labels on the x axis and actual labels on the y axis for all 24 720 labeled images (i.e., all folds during cross validation). The confusion matrix on the left is normalized by class samples given the imbalanced dataset. The confusion matrix on the right shows the unweighted or total number of images classified for each possible scenario. Accurate predictions fall along the one–one line with darker blues representing higher frequencies. Figure 7 (bottom) shows examples of incorrect model predictions that correspond to the location on the normalized confusion matrix with the same color. The ordering is such that the examples are presented from top to bottom of each column and from left to right following the normalized confusion matrix. Note that in many cases of misclassification, the particles misclassified are quite similar (e.g., budding and bullet rosettes), and in some cases could be considered for either category. While great supervision went into labeling, which included a combination of automated and human hand labeling, ultimately some images may be mislabeled, and in fact instances occur when the model correctly predicts the particle type, even if the label is incorrect. These two points imply that the accuracies of the image classification may actually be higher than the results indicate.
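The two matrices in Fig. 7 correspond to the two normalization choices available in scikit-learn (a sketch; y_true and y_pred are as before):

```python
# Confusion-matrix sketch: raw counts and row-normalized frequencies.
from sklearn.metrics import confusion_matrix

cm_counts = confusion_matrix(y_true, y_pred)  # total images per cell
cm_norm = confusion_matrix(y_true, y_pred,
                           normalize="true")  # each row sums to 1.0,
                                              # accounting for class imbalance
```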

Fig. 7. (top) Confusion matrices from the VGG-16 network for all 24 720 labeled images. Each column represents model predictions for each class, whereas each row represents ground truth according to the manual labeling performed beforehand. A perfect model would display no values outside of the 1-to-1 line. The color bar in the top-left panel represents the percentage (0.0–1.0, normalized for class imbalance), and the color bar in the top-right panel represents the number of images sampled, with darker colors indicating higher frequencies. (bottom) Colored boxes that show examples of incorrect predictions that correspond to the location in the top-left normalized confusion matrix with the same color.

Normalized accuracies (Fig. 7, top left) range between 0.92 for budding rosettes and 1.00 for fragments and blurry images. Budding rosettes are most often misclassified as compact irregulars, bullet rosettes, and occasionally aggregates, none of which is surprising. There are subjective delineations between the length of the branches contributing to a budding versus a bullet rosette, and budding rosettes technically could be a subcategory of compact irregulars. Aggregates, compact irregulars, and rimed particles are also naturally misclassified. Rimed particles have no definitive measure that determines the degree of riming at which a particle is deemed rimed versus unrimed. Should the particle not be declared as rimed after visual inspection, the size and shape of the particle are used to distinguish a compact irregular from an aggregate. For the purposes of this study, compact irregulars tend to have fewer monomers and exhibit less expansive 2D surface area than aggregates, which are classified as having a branched and identifiable collection of monomers. However, these rules are at the discretion of the classifier and periodically deceive the network as well. Another notable region that could be improved upon is the classification of plates. Plates are misclassified as columns or compact irregulars 29 and 44 times, respectively, out of 1070 total plates (Fig. 7, top right). Plates can be difficult to appropriately classify due to orientation and the lack of a third field of view from the CPI probe. The broad basal face of a plate tilted into the field of view of the camera becomes a thin marking that resembles the rectangular prism face of a column. The primary feature to consider between similarly sized plates, columns, and compact irregulars is the number of sides, or lack thereof. Plates are typically classified with six sides (also guided by the existence of intricate transparent regions), whereas columns are usually longer than they are wide and can show eight sides in a 2D view but are most often classified with four sides in a rectangular shape. A compact irregular can comprise a plate as the primary particle to which another (smaller) particle has attached, or there is a deformity on the particle that hinders easy identification. Note that there are exceptions to the “rules” used here, such as when a plate is angled on its side and can show eight sides, in which case it is up to the user to deduce the appropriate class. In fact, in the other eight classes, typically the particle type can be determined regardless of viewing angle, though the correct dimensionality characteristics may be warped. However, the unique shape and aspect ratio of plates lend to an asymmetry in viewing angle, increasing the likelihood that a plate may be mistaken for another particle type. Moreover, the plate category here is established to contain any pristine plate-like particles, including solid hexagonal plates, sectors, stellars, and dendrites. The diversity in this category alone (unlike most other classes) contributes to the lower classification accuracy as compared with the other classes. Reinforcement could be added to the plate category by splitting the class into dendrites and sectors and by increasing the number and diversity of samples in each of these classes. For these reasons, the shortcomings of classifying 2D images make multiangle probes such as the MASC enticing (Praz et al. 2017).

b. Time trials

The creation of a well-performing model is evident from Figs. 3–7; however, consideration must be given to the amount of time it takes to train each CNN, as this model will continue to be reinforced and retraining will be necessary in the future. Figure 8 shows the time in minutes that it took to train each CNN on 24 720 images. Profiling was completed using 20 central processing unit (CPU) cores and 2 Tesla V100 GPUs. Networks with fewer layers completed training the fastest; for example, ResNet-18 finished in ∼35 min, whereas ResNet-152 took over 80 min to complete. At the current state, the bottleneck occurs when pushing image data from CPUs to GPUs, and thus, the GPUs are not running at maximum capacity; however, this issue affects all models and so does not alter the relative speeds. That being said, now that the best model has been confirmed, it only needs to be retrained when new imagery is added for additional categories or for broadening the scope of image variation that the model can capture. The training times presented in Fig. 8 are not unreasonable given the validation accuracies and generalizability to new data.

Fig. 8. Time (min) to train each CNN on 24 720 images with a batch size of 64 for 50 epochs to emphasize differences. Twenty CPU cores and 2 GPUs are used to process the training dataset. Notice that networks of the same group share the same hue.

While Fig. 8 indicates the time to train the models, Fig. 9a shows the time to make new predictions in seconds (y axis) for 100 (blue), 1000 (black), and 10 000 (red) samples for each model (x axis); Fig. 9b shows the corresponding processing efficiency in terms of samples per second. It is important to note that this figure reports from a system architecture again with 20 CPUs and 2 GPUs; the absolute times can vary greatly between systems, but the analysis remains useful for comparing relative speeds among CNNs as well as sample sizes. Every model shows very similar results for 100 and 1000 samples; it takes no more than 8 s to transform and classify 1000 images. Prediction time increases sublinearly when 10 000 samples are loaded, which actually increases sampling efficiency as bottlenecks are minimized (Fig. 9b). All predictions are completed in less than 40 s, or more than 250 images per second. In older models, the CPI charge-coupled device camera flashes up to 75 frames per second (fps), with more recent camera upgrades capable of bringing the frame rate to nearly 500 fps (Tagg 2021); therefore, the processing speeds given in Figs. 9a and 9b are within the range of real-time processing capabilities. For comparison, Touloupas et al. (2020) classified 10 000 particles in ∼15 s for a decision tree, ∼30 s for an SVM, and ∼60 s for a CNN on a local server, but such comparisons depend highly on the computational power of the computer.
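A simple timing harness for such measurements might look like the following (a sketch; loader is assumed to yield preprocessed 224 × 224 batches, and model and device are as defined earlier):

```python
# Inference-timing sketch in the style of the Fig. 9 measurements.
import time
import torch

model.eval()
start = time.perf_counter()
with torch.no_grad():  # no gradients needed for prediction
    for images, _ in loader:
        _ = model(images.to(device))
elapsed = time.perf_counter() - start
n = len(loader.dataset)
print(f"{n} images in {elapsed:.1f} s ({n / elapsed:.0f} images per second)")
```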

Fig. 9. (a) Time (s) to make predictions for 100 (light blue), 1000 (black), and 10 000 (red) samples with respect to nine different CNNs (x axis). (b) The number of samples per second that can be processed (efficiency; y axis) for each model (x axis) using 20 CPU cores and 2 GPUs.

c. Dataset statistics

After all campaign imagery is processed through the VGG-16 network, statistics on the number and percent of fragments or blurry images, cloud drops, and particles nearly entirely in frame are displayed in Table 4. The number of images is displayed on the left of each column and the percent with respect to each campaign (rows) is displayed on the right. The percentage of fragmented and blurry images highly varies between 5.9% and 86.1% for ATTREX and ICE-L, respectively (Table 4, second column). On average, 57.3% of the campaign datasets are discarded for lack of quality imagery, which is substantial and should be acknowledged in terms of processing speed. Removal of CPI sheets during times of CPI probe malfunction would greatly enhance dataset statistics for quality ice particles; however, no removal of imagery is performed as the objective is to solely utilize the CNN, training the model on as many image variations as possible given a stream of raw campaign data.

Table 4. Statistics on the number and percent of fragmented or blurry particles, liquid cloud drops, and images that were nearly entirely in frame after processing each campaign through the VGG-16 network. Percentages are taken out of all available images that are >200 pixels squared for all campaigns except OLYMPEX (>1000 pixels squared). The “total” row sums the statistics for fragments/blurry images, cloud drops, and images with ≥10% cutoff across all campaigns. Statistics on cutoff images are considered after removal of fragments, blurry images, and liquid drops, whereas statistics on the number and percent of fragmented or blurry particles and liquid cloud drops are with respect to the entire dataset for that campaign.

Similarly, liquid drops make up a substantial portion of the datasets as research aircraft climb and descend through lower altitudes, where liquid water content is typically at least an order of magnitude higher than ice water content (Carey et al. 2008). It is important to recall that images are included in the datasets only if the square area is greater than 200 pixels squared, or ∼35 μm, which removes a large concentration of small drops and fragments (see appendix A). Table 4, third column, shows that 18.6% of images across all campaigns, on average, are classified as spheres, with extreme variation between 0.93% (OLYMPEX) and 84.59% (ATTREX). After blurry images, fragments, and liquid drops are removed, the number and percent of ice particles that largely intersect the image border are calculated. Table 4 considers an arbitrary cutoff threshold of 10%, which means the proportion of each particle that intersects the image border relative to the perimeter of the largest contour in the image must be ≤10%. This is done so that the image is not misclassified and so that dimensional characteristics of classified images can be later determined and analyzed. Of all quality ice particles (not including fragments, blurry images, or drops), 38.4% of the images, on average, are discarded from geometrical calculations. Dimensional characteristics such as particle major axis length are only accurate if the particle is mostly captured within the size limits of the probe and is generally centered in the frame, leaving a final database of 1 560 364 images.
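One way to implement the 10% cutoff check is sketched below (an assumption-laden illustration, not the paper’s code: the contour is assumed to be extracted with cv2.CHAIN_APPROX_NONE so that the fraction of contour points on the border approximates the fraction of the perimeter that is cut off):

```python
# Border-cutoff sketch: keep a particle only if at most 10% of its largest
# contour lies on the image border.
import numpy as np

def mostly_in_frame(image_shape, largest_contour, max_cutoff=0.10):
    h, w = image_shape[:2]
    pts = largest_contour.reshape(-1, 2)  # (x, y) contour points
    on_border = ((pts[:, 0] == 0) | (pts[:, 0] == w - 1) |
                 (pts[:, 1] == 0) | (pts[:, 1] == h - 1))
    return on_border.mean() <= max_cutoff  # fraction of perimeter cut off
```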

6. Discussion and uncertainties

Outside of parameter uncertainty within the ML model, which requires hyperparameter optimization to minimize prediction error (especially in regions of sparse observations), intrinsic uncertainty in preprocessing is unavoidable. Small images are upsampled to meet a dimension criterion, and thus resolution degradation is inescapable. The confusion matrices presented help define these uncertainties and summarize performance for each class on data for which the correct values are known; however, the same level of understanding cannot be applied to unseen data. It is assumed that the substantial training dataset is, in essence, representative of particles with a wide variety of features that grew in dynamically unique systems. As more campaigns are processed and added to the training dataset, better-calibrated predictions with less overfitting are expected. Even with a large training dataset, manual classification is taken to be “truth,” or 100% correct. In reality, the division between some of the habits is subjective even when an expert performs manual identification, and it is quite possible that particle selection by eye for the training dataset is biased toward more pristine examples of each habit. In practice, the CNNs have been observed to find important features (see appendix B) and might even exceed the classification abilities of humans. To that end, it is common to apply a softmax layer to the end of each CNN to produce normalized probabilities that sum to 1.0, or 100% (e.g., 90% column, 8% rimed column, 2% fragment). Probabilities across multiple classes for a given classification capture the uncertainty recognized by the CNN in the final databases for each campaign. All campaign databases will be made available upon project completion.
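As an illustrative sketch (the stand-in model weights, batch size, and tensor shapes below are assumptions, not the exact study configuration), per-class probabilities can be obtained in PyTorch by applying a softmax to the network's raw class scores:

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Stand-in for the trained network; this study fine-tunes VGG-16.
model = models.vgg16()
model.classifier[6] = torch.nn.Linear(4096, 9)  # nine habit classes
model.eval()

image_batch = torch.rand(4, 3, 224, 224)  # placeholder preprocessed images

with torch.no_grad():
    logits = model(image_batch)       # raw class scores, shape [4, 9]
    probs = F.softmax(logits, dim=1)  # each row sums to 1.0 (e.g., 0.90, 0.08, ...)

top_p, top_idx = probs.topk(3, dim=1)  # highest-probability classes per image
```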

Note that there are also caveats to the CPI probe itself; large ice crystals can shatter or break, which artificially amplifies small-particle concentrations and may contribute to the large percentage of fragmented/blurry images (Table 4, second column). In the past, ice microphysical research, particularly observations of cirrus clouds, has been subject to bias in particle size distributions from shattered artifacts (Lawson 2011). Furthermore, imaging probes flown on aircraft platforms take 2D images (typically silhouettes lacking textural information), which cannot yield a perfectly accurate shape classification for three-dimensional objects. However, particle orientation is sufficiently randomized within the turbulent environment of the inlet, leading to better representation across thousands of particle samples. Finally, the data in this work are constrained to cloud ice up to small precipitation-sized hydrometeors, given the size range of the CPI probe (∼15–2500 μm). Larger hydrometeors would require other instrumentation, which is beyond the scope of this work but is encouraged in future planning of field campaign configurations with dual aircraft and surface-based high-resolution imagers such as the MASC. Hence, while the databases derived from the classification of these CPI images will provide invaluable insight into the microphysics of each case, the statistics derived therefrom cannot be treated as a holistic microphysical representation.

Moreover, through iterations of trial and error, the classes chosen for this work were found to cover the diversity of particle habit while allowing the model to learn effectively. Several other training iterations were performed with other class types, but misclassification occurred more frequently because the model had difficulty differentiating among certain habits (e.g., rimed aggregates vs rimed columns). That said, the nine classes chosen for this work could be expanded to better capture the diversity of habit type, including an emphasis on rimed particles or a distinction among pristine crystals (e.g., plates vs dendrites). Doing so would require a new labeling phase and a more extensive training set that includes each new habit. There is also the possibility of adding a metric that determines the degree of riming within the rimed category.

7. Conclusions

A detailed understanding of the microphysical nature of cloud systems is critical not only for closing knowledge gaps that may still exist on these topics but also for refining and improving current microphysical parameterizations that aim to simulate these systems. The large cache of in situ measurements available from past field campaigns provides the perfect opportunity for this task. Recent advances in computational power, and specifically the acquisition of GPUs, have made it possible to exploit these large data archives more fully than in the past, enabling the analysis and classification of ice crystal imagery not only under a wide variety of environmental conditions but also over decades of data, processing millions of images efficiently (on the order of minutes to hours).

The work presented herein compiled 10 492 847 images from the Cloud Particle Imager probe across 12 different field campaigns. A laborious yet robust preprocessing methodology was established to extract and label the many images for machine learning model development and training. A convolutional neural network approach was chosen given its well-documented success in image processing; initial tests with other models, including support vector machines and k-means clustering, proved inferior to the CNNs. Nine different CNNs were each trained on nine classes and then analyzed to determine the optimal CNN in terms of predictive accuracy. Training used k-fold cross validation to ensure consistent and unbiased training across the labeled dataset.

The highest-performing CNN across multiple metrics is the VGG-16 network, which has classified over 10 million CPI images across a spatial and temporal scale that has yet to be achieved (16 yr). The results from this work outperform traditional techniques for classifying habit type and encompass a broad spectrum of particles, including those with evidence of riming, with all statistical quantities (accuracy, precision, recall, and F1 score) ≥90% for all classes and an average of 97% for the model. An end-to-end methodology is established from probe retrieval to network classification and database creation, which can be further analyzed. Statistics are shown for the number and percentage of fragmented particles, cloud drops, and images that are largely cut off or out of frame, all of which vary widely from campaign to campaign. On average, 57% of the over 10 million images are indistinguishable, 19% are spherical, and 38% of quality, nonspherical particles are too cropped for further dimensional calculations. Plates and budding rosettes are misclassified most often, with recall values of 0.90 and 0.91, respectively (likely a result of limitations in the dataset for these classes), while all other classes achieve performance metrics above 0.95. Last, there is not a substantial trade-off between accuracy and processing time; 10 000 images can be processed in 34 s, and processing efficiency in fact increases with increasing sample numbers, reaching real-time or operational capabilities. In conclusion, following the filtering of liquid drops and fragmented, blurry, and cutoff particles, 1 560 364 ice particle images have been classified across nine classes, providing the basis for further microphysical analysis.

1 The learning rate is a tuning parameter in an optimization algorithm that determines the step size at each iteration while moving toward a minimum of a loss function.

2 An epoch refers to one cycle through the full training dataset, which is divided into batches; based on the output, the model then performs adjustments and reruns (the next epoch).

3 The batch size is the number of training examples loaded into memory and utilized in one iteration within an epoch.
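For concreteness, the toy PyTorch training loop below shows where these three footnoted terms enter; the dataset, model, and values are placeholders for illustration only, not the configuration used in this study.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset: 256 random "images" with random labels in [0, 9)
dataset = TensorDataset(torch.rand(256, 3, 224, 224),
                        torch.randint(0, 9, (256,)))
loader = DataLoader(dataset, batch_size=32)               # batch size (footnote 3)
model = torch.nn.Sequential(torch.nn.Flatten(),
                            torch.nn.Linear(3 * 224 * 224, 9))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # learning rate (footnote 1)
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(5):                 # each epoch is one full pass (footnote 2)
    for images, labels in loader:      # one batch per iteration within the epoch
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()               # lr-sized step toward a loss minimum
```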

Acknowledgments.

Authors V. Przybylo, K. Sulia, C. Schmitt, and Z. Lebo thank the U.S. Department of Energy (DOE) for support under DOE Grants DE-SC0016354 and DE-SC0021033. The authors are grateful to the funding sources and for the efforts of the instrument personnel and aircraft flight crews in collecting the campaign data. Sulia is additionally supported through an appointment under the SUNY 2020 Initiative. The authors also thank the ASRC Extreme Collaboration, Innovation, and Technology (xCITE) Laboratory for development support. The National Center for Atmospheric Research is sponsored by the National Science Foundation.

Data availability statement.

The datasets generated during and/or analyzed during the current study are not publicly available because of size limitations and ongoing research initiatives. However, the source code relating to the specific version of this work can be found online (https://doi.org/10.5281/zenodo.4687947), and the latest version of the software has been made publicly visible (https://github.com/vprzybylo/cocpit).

APPENDIX A

Description of Image Preprocessing

A significant amount of preprocessing on a large quantity of data is required before automated classification can occur. A schematic of the steps required before classification can be applied is illustrated in Fig. A1. CPI region-of-interest files (*.roi) are used to generate “sheets” (Fig. A1; CPI sheet) of CPI images using CPIview software (SPEC 2012c,b). If only the raw data (*.roi files) from a campaign are available, an Interactive Data Language (IDL) license is needed so that CPIview can generate and save clean images devoid of time stamps or characteristic markings on the images themselves. CPIview takes downloaded *.roi files and produces *.obj files, which are used to calculate particle dimensional characteristics. If the sheets are already available in campaign archives, CPIview is not necessary; in these cases, text typically accompanies the images, either bordering the particles in the empty space surrounding each particle or on the particle rectangle itself.

Text is initially removed through image dilation, using a 5 × 5 kernel to traverse each sheet (Fig. A1; remove text). Dilation on a binary image enlarges the boundaries of regions of foreground (white) pixels that encompass the text, which eradicates most small-scale elements (i.e., the text) given the kernel size. To remove text that is connected to the rectangular border, after dilation, all contours on the sheet with an area <200 pixels squared (approximately 32.5 μm in length) are masked by overlaying white filled contours. Particles smaller than approximately 32.5 μm typically represent cloud drops, fragmented ice, or indistinguishable blurry images; thus, for the purposes of this study, it is more important to remove extraneous bordering text than to include minute images. It should be noted that the removal of small-scale images will affect the relative liquid water content and the fraction of fragmented or shattered particles with respect to the entire dataset. If the text is on the image itself and within the border of each rectangle (e.g., the case for OLYMPEX), the image is included in the campaign dataset if its area is greater than 1000 pixels squared, or a particle length greater than approximately 72 μm. This increased threshold ensures that the size of the text relative to the particle is sufficiently small and inconsequential during classification. In trial runs, the background of each image was masked surrounding the largest contour to eliminate any effect that text may have on classification; however, the difference in model accuracies between images with and without a background is negligible, and the extra processing time proves this task unnecessary. A minimal sketch of the text-removal step follows.
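The snippet below is a minimal, assumed sketch of the dilation-and-masking step using OpenCV; the file path and Otsu binarization are illustrative choices, and the cocpit source contains the authoritative routine.

```python
import cv2
import numpy as np

# Hypothetical path to one extracted CPI sheet
sheet = cv2.imread("cpi_sheet.png", cv2.IMREAD_GRAYSCALE)

# Binarize so that particles and text become white foreground pixels
_, binary = cv2.threshold(sheet, 0, 255,
                          cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

kernel = np.ones((5, 5), np.uint8)
dilated = cv2.dilate(binary, kernel, iterations=1)  # merges small-scale text

# Mask any remaining small contour (residual text/fragments) with white fill
contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    if cv2.contourArea(c) < 200:                    # below the 200-pixel threshold
        cv2.drawContours(sheet, [c], -1, 255, cv2.FILLED)
```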

Fig. A1.

Conceptual flow of CPI image preprocessing. 1) CPI sheets are extracted from probe software data using CPIview or downloaded from campaign archives. 2) Sheets are processed to remove text through image dilation. 3) Particle regions of interest are found using a connected component algorithm. 4) Individual images are extracted from the original sheets using the bounding indices from the connected component algorithm.

After the surrounding extraneous text is removed, each sheet is turned into a binary image and all contours are found (Fig. A1; find connected components). Each contour at this point represents an individual rectangular image (Fig. A1; colors), which is checked to ensure that at least one feature exists within the cropped region to avoid blank imagery. Each rectangle is then extracted from the sheet with the height and width recorded (Fig. A1; extract individual images). The number of hours to process all sheets into individual images from each campaign using five CPU cores is displayed in Table 1.
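Continuing the sketch above (again an assumed illustration that reuses the cleaned sheet and its dilated binary counterpart), steps 3 and 4 of Fig. A1 can be expressed with OpenCV's connected-component routine:

```python
import cv2

# `dilated` and `sheet` carry over from the text-removal sketch above
n, labels, stats, _ = cv2.connectedComponentsWithStats(dilated, connectivity=8)

particles = []
for i in range(1, n):                      # label 0 is the background
    x = stats[i, cv2.CC_STAT_LEFT]         # bounding indices of one rectangle
    y = stats[i, cv2.CC_STAT_TOP]
    w = stats[i, cv2.CC_STAT_WIDTH]
    h = stats[i, cv2.CC_STAT_HEIGHT]
    crop = sheet[y:y + h, x:x + w]         # extract the individual image
    if crop.min() < 255:                   # require at least one feature (nonblank)
        particles.append((crop, h, w))     # record height and width
```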

APPENDIX B

Assessing Appropriate Performance of CNNs

To build confidence that the models rely on appropriate features for identification, saliency maps are created that differentiate visual features across the entire image and show the strength of each pixel’s contribution to the final output from the CNN. Figure B1 shows rows of the original image and the corresponding saliency map for one example of each particle type. In each saliency map, pixels are highlighted red based on the gradient of the class score with respect to the image pixels, determined during backpropagation. Brighter red coloring represents a greater (absolute) contribution to the final probability of a class and confirms that the model, like humans, uses particle structure, orientation, and edges to classify an image. This process can be completed for any untrained image and gives confidence that the CNNs are appropriately “learning.”
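A minimal sketch of this computation in PyTorch (assumed; it reuses the VGG-16 stand-in from the earlier sketch, and the input here is a random placeholder) takes the gradient of the top class score with respect to the input pixels:

```python
import torch

image = torch.rand(1, 3, 224, 224, requires_grad=True)  # placeholder input
logits = model(image)                  # trained CNN from the earlier sketch
score = logits[0, logits.argmax()]     # class score for the top prediction
score.backward()                       # backpropagate to the input pixels

# Per-pixel saliency: maximum absolute gradient across the color channels
saliency = image.grad.abs().max(dim=1)[0].squeeze()
# Brighter values mark pixels contributing most to the class probability.
```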

Fig. B1.

Saliency maps that reflect the degree of importance of a pixel contributing to the final probability of a class. Each saliency map (red) is plotted directly below the original image (blue).

REFERENCES

  • Baum, B. A., A. J. Heymsfield, P. Yang, and S. T. Bedka, 2005: Bulk scattering properties for the remote sensing of ice clouds. Part I: Microphysical data and models. J. Appl. Meteor., 44, 1885–1895, https://doi.org/10.1175/JAM2308.1.
  • Bernauer, F., K. Hürkamp, W. Rühm, and J. Tschiersch, 2016: Snow event classification with a 2D video disdrometer—A decision tree approach. Atmos. Res., 172–173, 186–195, https://doi.org/10.1016/j.atmosres.2016.01.001.
  • Carey, L. D., J. Niu, P. Yang, J. A. Kankiewicz, V. E. Larson, and T. H. Vonder Haar, 2008: The vertical profile of liquid and ice water content in midlatitude mixed-phase altocumulus clouds. J. Appl. Meteor. Climatol., 47, 2487–2495, https://doi.org/10.1175/2008JAMC1885.1.
  • CIRES, 2016: POSIDON mission goals. Accessed 30 March 2021, https://ciresblogs.colorado.edu/posidon/wp-content/uploads/sites/59/2016/10/POSIDION-Open-House-Flyer-10-11-16.pdf.
  • Comstock, J. M., and Coauthors, 2007: An intercomparison of microphysical retrieval algorithms for upper-tropospheric ice clouds. Bull. Amer. Meteor. Soc., 88, 191–204, https://doi.org/10.1175/BAMS-88-2-191.
  • Deng, J., W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, 2009: ImageNet: A large-scale hierarchical image database. 2009 IEEE Conf. on Computer Vision and Pattern Recognition, Miami, FL, IEEE, 248–255, https://doi.org/10.1109/CVPR.2009.5206848.
  • Feind, R. E., 2008: Comparison of three classification methodologies for 2D probe hydrometeor images obtained from the armored T-28 aircraft. South Dakota School of Mines and Technology Institute of Atmospheric Sciences Rep. SDSMT/IAS/R08-01, 61 pp.
  • Field, P. R., A. J. Heymsfield, B. J. Shipway, P. J. DeMott, K. A. Pratt, D. C. Rogers, J. Stith, and K. A. Prather, 2012: Ice in Clouds Experiment–Layer Clouds. Part II: Testing characteristics of heterogeneous ice formation in lee wave clouds. J. Atmos. Sci., 69, 1066–1079, https://doi.org/10.1175/JAS-D-11-026.1.
  • Garrett, T. J., C. Fallgatter, K. Shkurko, and D. Howlett, 2012: Fall speed measurement and high-resolution multi-angle photography of hydrometeors in free fall. Atmos. Meas. Tech., 5, 2625–2633, https://doi.org/10.5194/amt-5-2625-2012.
  • Grazioli, J., D. Tuia, S. Monhart, M. Schneebeli, T. Raupach, and A. Berne, 2014: Hydrometeor classification from two-dimensional video disdrometer data. Atmos. Meas. Tech., 7, 2869–2882, https://doi.org/10.5194/amt-7-2869-2014.
  • He, K., X. Zhang, S. Ren, and J. Sun, 2015: Deep residual learning for image recognition. arXiv, https://arxiv.org/abs/1512.03385.
  • Heymsfield, A. J., and C. Westbrook, 2010: Advances in the estimation of ice particle fall speeds using laboratory and field measurements. J. Atmos. Sci., 67, 2469–2482, https://doi.org/10.1175/2010JAS3379.1.
  • Heymsfield, A. J., A. Bansemer, P. R. Field, S. L. Durden, J. Stith, J. E. Dye, W. Hall, and T. Grainger, 2002a: Observations and parameterizations of particle size distributions in deep tropical cirrus and stratiform precipitating clouds: Results from in situ observations in TRMM field campaigns. J. Atmos. Sci., 59, 3457–3491, https://doi.org/10.1175/1520-0469(2002)059<3457:OAPOPS>2.0.CO;2.
  • Heymsfield, A. J., S. Lewis, A. Bansemer, J. Iaquinta, L. M. Miloshevich, M. Kajikawa, C. Twohy, and M. R. Poellot, 2002b: A general approach for deriving the properties of cirrus and stratiform ice cloud particles. J. Atmos. Sci., 59, 3–29, https://doi.org/10.1175/1520-0469(2002)059%3C0003:AGAFDT%3E2.0.CO;2.
  • Heymsfield, A. J., A. Bansemer, C. Schmitt, C. Twohy, and M. Poellot, 2004: Effective ice particle densities derived from aircraft data. J. Atmos. Sci., 61, 982–1003, https://doi.org/10.1175/1520-0469(2004)061<0982:EIPDDF>2.0.CO;2.
  • Heymsfield, A. J., C. Schmitt, A. Bansemer, G. van Zadelhoff, M. McGill, C. Twohy, and D. Baumgardner, 2006: Effective radius of ice cloud particle populations derived from aircraft probes. J. Atmos. Oceanic Technol., 23, 361–380, https://doi.org/10.1175/JTECH1857.1.
  • Hicks, A., and B. M. Notaroš, 2019: Method for classification of snowflakes based on images by a Multi-Angle Snowflake Camera using convolutional neural networks. J. Atmos. Oceanic Technol., 36, 2267–2282, https://doi.org/10.1175/JTECH-D-19-0055.1.
  • Holroyd, E. W., 1987: Some techniques and uses of 2D-C habit classification for snow particles. J. Atmos. Oceanic Technol., 4, 498–511, https://doi.org/10.1175/1520-0426(1987)004<0498:STAUOC>2.0.CO;2.
  • Hong, Y., and G. Liu, 2015: The characteristics of ice cloud properties derived from CloudSat and CALIPSO measurements. J. Climate, 28, 3880–3901, https://doi.org/10.1175/JCLI-D-14-00666.1.
  • Hou, Q., Z. Jiang, L. Yuan, M. Cheng, S. Yan, and J. Feng, 2021: Vision permutator: A permutable MLP-like architecture for visual recognition. arXiv, https://arxiv.org/abs/2106.12368.
  • Houze, R. J., Jr., and Coauthors, 2017: The Olympic Mountains Experiment (OLYMPEX). Bull. Amer. Meteor. Soc., 98, 2167–2188, https://doi.org/10.1175/BAMS-D-16-0182.1.
  • Huang, G., Z. Liu, and K. Q. Weinberger, 2016: Densely connected convolutional networks. arXiv, https://arxiv.org/abs/1608.06993.
  • Huang, L., J. H. Jiang, Z. Wang, H. Su, M. Deng, and S. Massie, 2015: Climatology of cloud water content associated with different cloud types observed by A-Train satellites. J. Geophys. Res. Atmos., 120, 4196–4212, https://doi.org/10.1002/2014JD022779.
  • Hunter, H. E., R. M. Dyer, and M. Glass, 1984: A two-dimensional hydrometeor machine classifier derived from observed data. J. Atmos. Oceanic Technol., 1, 28–36, https://doi.org/10.1175/1520-0426(1984)001<0028:ATDHMC>2.0.CO;2.
  • Iacobellis, S. F., G. M. McFarquhar, D. L. Mitchell, and R. C. J. Somerville, 2003: The sensitivity of radiative fluxes to parameterized cloud microphysics. J. Climate, 16, 2979–2996, https://doi.org/10.1175/1520-0442(2003)016<2979:TSORFT>2.0.CO;2.
  • Isaac, G., and Coauthors, 2005: First results from the Alliance Icing Research Study II. 43rd AIAA Aerospace Sciences Meeting and Exhibit, Reno, NV, AIAA, https://doi.org/10.2514/6.2005-252.
  • Jensen, E., and Coauthors, 2005: Formation of a tropopause cirrus layer observed over Florida during CRYSTAL-FACE. J. Geophys. Res., 110, D03208, https://doi.org/10.1029/2004JD004671.
  • Jensen, E., and Coauthors, 2017: The NASA Airborne Tropical Tropopause Experiment: High-altitude aircraft measurements in the tropical western Pacific. Bull. Amer. Meteor. Soc., 98, 129–143, https://doi.org/10.1175/BAMS-D-14-00263.1.
  • Jensen, M. P., and Coauthors, 2016: The Midlatitude Continental Convective Clouds Experiment (MC3E). Bull. Amer. Meteor. Soc., 97, 1667–1686, https://doi.org/10.1175/BAMS-D-14-00228.1.
  • Knollenberg, R. G., 1972: Measurements of the growth of the ice budget in a persisting contrail. J. Atmos. Sci., 29, 1367–1374, https://doi.org/10.1175/1520-0469(1972)029<1367:MOTGOT>2.0.CO;2.
  • Korolev, A., and G. Isaac, 2003: Roundness and aspect ratio of particles in ice clouds. J. Atmos. Sci., 60, 1795–1808, https://doi.org/10.1175/1520-0469(2003)060<1795:RAAROP>2.0.CO;2.
  • Korolev, A., G. Isaac, and J. Hallett, 2000: Ice particle habits in stratiform clouds. Quart. J. Roy. Meteor. Soc., 126, 2873–2902, https://doi.org/10.1002/qj.49712656913.
  • Krizhevsky, A., I. Sutskever, and G. E. Hinton, 2012: ImageNet classification with deep convolutional neural networks. 25th Conf. on Neural Information Processing Systems, Lake Tahoe, NV, NeurIPS, 1097–1105, https://papers.nips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html.
  • Laborde, M., and A. M. Oberman, 2020: Nesterov’s method with decreasing learning rate leads to accelerated stochastic gradient descent. arXiv, https://arxiv.org/abs/1908.07861.
  • Lawson, R. P., 2011: Effects of ice particles shattering on the 2D-S probe. Atmos. Meas. Tech., 4, 1361–1381, https://doi.org/10.5194/amt-4-1361-2011.
  • Lawson, R. P., and B. A. Baker, 2006: Improvement in determination of ice water content from two-dimensional particle imagery. Part II: Applications to collected data. J. Appl. Meteor. Climatol., 45, 1291–1303, https://doi.org/10.1175/JAM2399.1.
  • Lawson, R. P., B. A. Baker, P. Zmarzly, Q. M. D. O’Connor, J.-F. Gayet, and V. Shcherbakov, 2006a: Microphysical and optical properties of atmospheric ice crystals at South Pole station. J. Appl. Meteor. Climatol., 45, 1505–1524, https://doi.org/10.1175/JAM2421.1.
  • Lawson, R. P., D. O’Connor, P. Zmarzly, K. Weaver, B. Baker, Q. Mo, and H. Jonsson, 2006b: The 2D-S (stereo) probe: Design and preliminary tests of a new airborne, high-speed, high-resolution particle imaging probe. J. Atmos. Oceanic Technol., 23, 1462–1477, https://doi.org/10.1175/JTECH1927.1.
  • Lindqvist, H., K. Muinonen, T. Nousiainen, J. Um, G. M. McFarquhar, P. Haapanala, R. Makkonen, and H. Hakkarainen, 2012: Ice-cloud particle habit classification using principal components. J. Geophys. Res., 117, D16206, https://doi.org/10.1029/2012JD017573.
  • Liou, K. N., 1986: Influence of cirrus clouds on weather and climate processes: A global perspective. Mon. Wea. Rev., 114, 1167–1199, https://doi.org/10.1175/1520-0493(1986)114<1167:IOCCOW>2.0.CO;2.
  • Marr, B., 2019: Deep learning vs neural networks—What’s the difference? Bernard Marr and Co., https://bernardmarr.com/default.asp?contentID=1789.
  • Mason, B. J., 1994: The shapes of snow crystals—Fitness for purpose? Quart. J. Roy. Meteor. Soc., 120, 849–860, https://doi.org/10.1002/qj.49712051805.
  • McFarquhar, G. M., A. J. Heymsfield, A. Macke, J. Iaquinta, and S. M. Aulenbach, 1999: Use of observed ice crystal sizes and shapes to calculate mean-scattering properties and multispectral radiances: CEPEX April 4, 1993, case study. J. Geophys. Res., 104, 31 763–31 779, https://doi.org/10.1029/1999JD900802.
  • McFarquhar, G. M., J. Um, and R. Jackson, 2013: Small cloud particle shapes in mixed-phase clouds. J. Appl. Meteor. Climatol., 52, 1277–1293, https://doi.org/10.1175/JAMC-D-12-0114.1.
  • Mishchenko, M., 1996: Sensitivity of cirrus cloud albedo, bidirectional reflectance and optical thickness retrieval accuracy to ice particle shape. J. Geophys. Res., 101, 16 973–16 985, https://doi.org/10.1029/96JD01155.
  • Mordvintsev, A., 2013: Canny edge detection. OpenCV-Python Tutorials, https://opencv24-python-tutorials.readthedocs.io/en/latest/py_tutorials/py_imgproc/py_canny/py_canny.html.
  • Moss, S. J., and D. W. Johnson, 1994: Aircraft measurements to validate and improve numerical model parameterization of ice to water ratios in clouds. Atmos. Res., 34, 1–25, https://doi.org/10.1016/0169-8095(94)90078-7.
  • O’Shea, S. J., and Coauthors, 2016: Airborne observations of the microphysical structure of two contrasting cirrus clouds. J. Geophys. Res. Atmos., 121, 13 510–13 536, https://doi.org/10.1002/2016JD025278.
  • Paszke, A., A. Chaurasia, S. Kim, and E. Culurciello, 2016: ENet: A deep neural network architecture for real-time semantic segmentation. arXiv, https://arxiv.org/abs/1606.02147.
  • Pedregosa, F., and Coauthors, 2011: Scikit-learn: Machine learning in Python. J. Mach. Learn. Res., 12, 2825–2830.
  • Praz, C., Y.-A. Roulet, and A. Berne, 2017: Solid hydrometeor classification and riming degree estimation from pictures collected with a Multi-Angle Snowflake Camera. Atmos. Meas. Tech., 10, 1335–1357, https://doi.org/10.5194/amt-10-1335-2017.
  • Praz, C., S. Ding, G. M. McFarquhar, and A. Berne, 2018: A versatile method for ice particle habit classification using airborne imaging probe data. J. Geophys. Res. Atmos., 123, 13 472–13 495, https://doi.org/10.1029/2018JD029163.
  • Protat, A., and C. R. Williams, 2011: The accuracy of radar estimates of ice terminal fall speed from vertically pointing Doppler radar measurements. J. Appl. Meteor. Climatol., 50, 2120–2138, https://doi.org/10.1175/JAMC-D-10-05031.1.
  • Schmitt, C. G., and A. J. Heymsfield, 2014: Observational quantification of the separation of simple and complex atmospheric ice particles. Geophys. Res. Lett., 41, 1301–1307, https://doi.org/10.1002/2013GL058781.
  • Schmitt, C. G., M. Schnaiter, A. J. Heymsfield, P. Yang, E. Hirst, and A. Bansemer, 2016: The microphysical properties of small ice particles measured by the Small Ice Detector-3 probe during the MACPEX field campaign. J. Atmos. Sci., 73, 4775–4791, https://doi.org/10.1175/JAS-D-16-0126.1.
  • Sharma, N., V. Jain, and A. Mishra, 2018: An analysis of convolutional neural networks for image classification. Proc. Comput. Sci., 132, 377–384, https://doi.org/10.1016/j.procs.2018.05.198.
  • Simonyan, K., and A. Zisserman, 2014: Very deep convolutional networks for large-scale image recognition. arXiv, https://arxiv.org/abs/1409.1556.
  • SPEC, 2012a: 3V-CPI Combination Cloud Particle Probe. Stratton Park Engineering Company (SPEC, Inc.), http://www.specinc.com/3v-cpi-combo.
  • SPEC, 2012b: Cloud Particle Imager (CPI). Stratton Park Engineering Company (SPEC, Inc.), http://www.specinc.com/cloud-particle-imager.
  • SPEC, 2012c: CPIview QuickLook and eXtractor CPI data processing software. Stratton Park Engineering Company (SPEC, Inc.) Doc., 33 pp., http://www.specinc.com/sites/default/files/software_and_manuals/CPI_Post%20Processing%20Software%20Manual_rev1.2_20120116.pdf.
  • Stephens, G. L., S. C. Tsay, P. W. Stackhouse, and P. J. Flatau, 1990: The relevance of microphysical and radiative properties of cirrus clouds to climate and climate feedback. J. Atmos. Sci., 47, 1742–1754, https://doi.org/10.1175/1520-0469(1990)047<1742:TROTMA>2.0.CO;2.
  • Sun, W., Y. Hu, B. Lin, Z. Liu, and G. Videen, 2011: The impact of ice cloud particle microphysics on the uncertainty of ice water content retrievals. J. Quant. Spectrosc. Radiat. Transfer, 112, 189–196, https://doi.org/10.1016/j.jqsrt.2010.04.003.
  • Sun, Z., and K. P. Shine, 1994: Studies of the radiative properties of ice and mixed-phase clouds. Quart. J. Roy. Meteor. Soc., 120, 111–137, https://doi.org/10.1002/qj.49712051508.
  • Tagg, B. A., 2021: Cloud Particle Imager (CPI). NASA Airborne Science Program, https://airbornescience.nasa.gov/instrument/CPI.
  • Takano, Y., and K. N. Liou, 1989: Solar radiative transfer in cirrus clouds. Part I: Single-scattering and optical properties of hexagonal ice crystals. J. Atmos. Sci., 46, 3–19, https://doi.org/10.1175/1520-0469(1989)046<0003:SRTICC>2.0.CO;2.
  • Torchvision, 2019: Torchvision models. PyTorch, https://pytorch.org/vision/stable/models.html.
  • Touloupas, G., A. Lauber, J. Henneberger, and A. Beck, 2020: A convolutional neural network for classifying cloud particles recorded by imaging probes. Atmos. Meas. Tech., 13, 2219–2239, https://doi.org/10.5194/amt-13-2219-2020.
  • Verlinde, J., and Coauthors, 2007: The Mixed-Phase Arctic Cloud Experiment. Bull. Amer. Meteor. Soc., 88, 205–222, https://doi.org/10.1175/BAMS-88-2-205.
  • Wang, P. H., P. Minnis, M. P. McCormick, G. S. Kent, and K. M. Skeens, 1996: A 6-year climatology of cloud occurrence frequency from Stratospheric Aerosol and Gas Experiment II observations (1985–1990). J. Geophys. Res., 101, 29 407–29 429, https://doi.org/10.1029/96jd01780.
  • Witten, I. H., E. Frank, M. A. Hall, and C. J. Pal, 2016: Data Mining: Practical Machine Learning Tools and Techniques. 4th ed. Morgan Kaufmann, 621 pp.
  • Wu, Z., and Coauthors, 2020: Neural network classification of ice-crystal images observed by an airborne Cloud Imaging Probe. Atmos.–Ocean, 58, 303–315, https://doi.org/10.1080/07055900.2020.1843393.
  • Wylie, D. P., D. L. Jackson, W. P. Menzel, and J. J. Bates, 2005: Trends in global cloud cover in two decades of HIRS observations. J. Climate, 18, 3021–3031, https://doi.org/10.1175/JCLI3461.1.
  • Xu, Y., and R. Goodacre, 2018: On splitting training and validation set: A comparative study of cross-validation, bootstrap and systematic sampling for estimating the generalization performance of supervised learning. J. Anal. Test., 2, 249–262, https://doi.org/10.1007/s41664-018-0068-2.
  • Yu, W., K. Yang, Y. Bai, T. Xiao, H. Yao, and Y. Rui, 2016: Visualizing and comparing AlexNet and VGG using deconvolutional layers. Proc. 33rd Int. Conf. on Machine Learning, New York, NY, JMLR, https://icmlviz.github.io/icmlviz2016/assets/papers/4.pdf.