1. Introduction
Cloud system appearance, defined here as structure and distribution, results from a complex interplay of physical processes and large-scale environmental conditions. Since the ground-breaking work by Howard (1803), there has been considerable interest in simplifying the complexity of cloud appearance and classifying cloud systems. With the availability of geostationary satellites providing images over large domains with ever-increasing temporal, spatial, and spectral resolution, the need for automatic classification becomes ever more pressing. A prominent pixel-based classification is the International Satellite Cloud Climatology Project (ISCCP; Rossow and Schiffer 1999), which defines nine cloud types based on thresholds in cloud optical depth (COD) and cloud-top pressure (CTP).
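To make this threshold logic concrete, the following minimal sketch assigns one cloudy pixel to an ISCCP type; the CTP boundaries (440 and 680 hPa) and COD boundaries (3.6 and 23) follow the standard ISCCP convention, and the Python function itself is ours, for illustration only.

```python
import numpy as np

# ISCCP cloud-type names, indexed by (CTP level, COD bin).
ISCCP_TYPES = [
    ["cumulus", "stratocumulus", "stratus"],          # CTP > 680 hPa (low)
    ["altocumulus", "altostratus", "nimbostratus"],   # 440-680 hPa (mid)
    ["cirrus", "cirrostratus", "deep convection"],    # CTP < 440 hPa (high)
]

def isccp_type(cod, ctp):
    """Assign the ISCCP cloud type for one cloudy pixel (COD, CTP in hPa)."""
    level = 2 if ctp < 440.0 else (1 if ctp < 680.0 else 0)
    tau = 0 if cod < 3.6 else (1 if cod < 23.0 else 2)
    return ISCCP_TYPES[level][tau]

print(isccp_type(cod=30.0, ctp=350.0))  # -> "deep convection"
```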
Extending pixel-based cloud classifications to a regional context, ISCCP has also established the concept of cloud regimes, sometimes also referred to as weather states, for analyzing cloudiness at regional scales. This approach was introduced by Jakob and Tselioudis (2003) and uses the joint two-dimensional histograms of cloud optical thickness and cloud-top pressure across a domain as a basis. Representative mean histograms for the different cloud regimes are obtained by k-means clustering (see MacQueen 1967). The method has subsequently been refined and extended to global-scale analyses (Tselioudis et al. 2013), applied to other satellite instruments such as the Moderate Resolution Imaging Spectroradiometer (MODIS) (Oreopoulos et al. 2014), and used for climate model evaluation (e.g., Williams and Webb 2008; Jin et al. 2017; Tselioudis et al. 2021). Recently, Tzallas et al. (2022) introduced a cloud regime dataset for Europe based on Meteosat Second Generation satellite observations and the Cloud Property Dataset Using SEVIRI, edition 2 (CLAAS-2) climate data record (Benas et al. 2017).
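A minimal sketch of this histogram-based regime approach is given below; the bin edges are illustrative ISCCP-like values, the scene data are synthetic placeholders, and scikit-learn's k-means stands in for the clustering used in the cited studies.

```python
import numpy as np
from sklearn.cluster import KMeans

# Illustrative ISCCP-like bin edges for the joint COD-CTP histogram.
cod_bins = [0.0, 1.3, 3.6, 9.4, 23.0, 60.0, 380.0]
ctp_bins = [50.0, 180.0, 310.0, 440.0, 560.0, 680.0, 800.0, 1100.0]  # hPa

def scene_histogram(cod, ctp):
    """Joint COD-CTP histogram of one scene, normalized by pixel count."""
    h, _, _ = np.histogram2d(cod.ravel(), ctp.ravel(), bins=[cod_bins, ctp_bins])
    return (h / cod.size).ravel()

# One flattened histogram per scene, then k-means into k regimes.
rng = np.random.default_rng(0)
scenes = [(rng.gamma(2.0, 8.0, (64, 64)), rng.uniform(100, 1000, (64, 64)))
          for _ in range(200)]                        # synthetic placeholder data
X = np.array([scene_histogram(c, p) for c, p in scenes])
regimes = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)
```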
However, one major shortcoming of this approach is that using the joint two-dimensional histograms of cloud properties completely neglects the spatial structure and organization of clouds within the considered domain. For oceanic cloud systems in the trade-wind regimes, Bony et al. (2020) showed that different mesoscale cloud patterns affect the top-of-the-atmosphere radiation budget differently and, thus, the local climate. Therefore, assessing the organization of cloud systems is important for understanding regional climate (and its changes) but also has potential for other applications such as renewable energy.
Various authors have also proposed using machine learning algorithms to perform the cloud classification task. Visa et al. (1998) suggested an unsupervised approach based on self-organizing maps (SOMs), fine-tuned by learning vector quantization (LVQ), using five channels of the polar-orbiting Advanced Very High Resolution Radiometer (AVHRR). However, in that approach, the uncertainty in the cloud classification was high, and there was ambiguity in the physical understanding of the classification (Tian et al. 1999). Fabel et al. (2022) used a self-supervised approach to classify clouds from ground-based, all-sky images, labeling the clouds according to low, medium, and high cloud-base height. However, when prescribed to identify specific cloud features, a neural network does not fully utilize the rich representations present in the data, as it focuses on a narrow objective. To overcome the issue of a missing “ground truth,” Stevens et al. (2020) tried to understand cloud systems through visual inspection by scientific peers labeling satellite images, identifying four major patterns. Rasp et al. (2020), through a crowdsourcing effort, asked a group of scientists to label multiple independent and overlapping cloud organizations present in visible images of a roughly 2000-km domain into these predefined classes. Their study showed that the average participant needed 30 s to classify one image, and agreement over cloud patterns was reached in only 37% of cases, showing significant disagreement among scientists on the labeling task. Training a neural network with limited classes can also suppress information on other relevant cloud regimes. Thus, Denby (2020) used unsupervised machine learning, again over the tropical North Atlantic Ocean, in which combinations of input images and a loss function train the neural network’s parameters to learn the similarity between images. Denby showed that the identified classes are physically independent, supported by spectral bands representative of the physical properties. However, the approach involved human decisions in the learning stage, such as the selection of image triplets for training. Additionally, in contrast to crowdsourcing and the Denby (2020) approach, Kurihana et al. (2022a,b) learned rotation-invariant, dimensionally reduced features through unsupervised autoencoders to generate a unique 22-yr-long record of global cloud classification with 42 clusters from MODIS. However, their dataset covers ocean regions only.
In this work, we want to investigate the suitability of a self-supervised deep neural network to identify distinct cloud systems over central Europe. In contrast to the tropical oceans, the central European domain is characterized by more synoptically driven cloud systems and by strong variability in land surface type and topography, which might hamper our endeavor. We use a resolution-enhanced cloud optical depth (COD) product, following Deneke et al. (2021), derived from Meteosat Second Generation (MSG). This can act as a proxy for the improved spatial resolution of the Meteosat Third Generation (MTG) observations, whose first satellite was launched at the end of 2022. While other approaches directly used red–green–blue (RGB) images (Denby 2020) or infrared brightness temperature images (Schulz et al. 2021) and, thus, relied directly on the level-1 satellite observations, we opt for COD and thus a level-2 product as input. The choice of COD is beneficial as it is a cloud parameter that should be invariant to other influences; for example, it avoids the influence of the diurnal cycle evident in the reflected radiances measured by the satellite. Furthermore, COD has the highest accuracy of all the retrieved parameters in the satellite dataset (Deneke et al. 2021) and physically relates to the attenuation of solar radiation by clouds, the quantity that most strongly modulates solar energy yields. However, it needs to be noted that many of the boundary layer clouds driving the variability in solar energy might still not be fully resolved at the spatial resolution of our product.
The main challenge we tackle in the present study is whether deep neural networks can distinguish different cloud regimes based on the structure and distribution of COD and how this helps explore the cloud systems’ physical properties. Four guiding research questions frame our work:
- How can we use deep learning to capture cloud variability from COD images and identify different cloud regimes? Satellite images contain rich visual information, which we transfer into the feature space from which a given number of classes is extracted.
- How can we physically interpret the identified classes? In the traditional computer vision domain, self-supervision is often used as pretraining, followed by fine-tuning with some labels as a downstream task for vision applications, for example, for medical image classification (Azizi et al. 2021). Without using labels to classify the scenes, we use information from the cloud and radiative properties retrieved from the multichannel MSG images to interpret the physical meaning of the classes.
- What are the measures to identify the correct number of classes? Our principle is that the right number of classes should best separate the distinct properties of the classes in the physical parameter space while avoiding correlation between classes.
- Can the model generalize well on unseen data? In deep learning, the true capability of a trained model is realized on unseen data, so testing the model on unseen data (i.e., another year, in our case) will ultimately reveal the network’s performance.
One of the inherent challenges of our study is the closeness of the cloud system patterns as they transition from one form to another. Cloud systems are generally chaotic, meaning their transformation from one type to another can be quick and transient. We want the neural network to understand this continuous relationship and to encode this connection in the feature space, that is, the n-dimensional space in which the features of the cloud fields are collected.
An additional challenge is that the traditional datasets in computer vision, such as ImageNet (Russakovsky et al. 2015), are prepared in a controlled environment and have a well-balanced class distribution. In contrast, the satellite images are uncurated, as their distribution space is not controlled. It is, thus, very likely that the classes are not well balanced and hence exhibit a sample imbalance among the categories, also known as the long-tailed effect. Under this effect, during training, the major cloud systems present in the satellite measurements drive the decision boundary of the neural network. Recent efforts have shown that self-supervised learning (SSL) handles the imbalanced learning problem well (Liu et al. 2022). The learned features are more robust and easier to generalize, as SSL extracts more comprehensive information from the data space.
The present work has the following structure: Section 2 describes the satellite dataset and our working domain. Section 3 provides details on the deep learning architecture, section 4 presents our results that aim at answering the questions posed above, and section 5 concludes and points at future work. Throughout, cloud regimes, cloud classes, and cloud patterns are used interchangeably.
2. Satellite dataset
Geostationary satellites provide high-spatial- and high-temporal-resolution monitoring of Earth with ever-increasing capabilities and, thus, data volume. For Europe, the Meteosat Second Generation satellites located over the equator and the prime meridian have offered full-disk, multispectral images at 15-min temporal resolution and 3 × 3 km² nadir spatial resolution in 11 narrow-band spectral channels since their launch in 2002 (Schmetz et al. 2002). To mimic the capabilities of the upcoming MTG mission, we use an MSG cloud dataset that features an improved spatial resolution (Deneke et al. 2021). Herein, the Spinning Enhanced Visible and InfraRed Imager (SEVIRI) on Meteosat-9 is used, which is positioned at 9.6°E as a backup satellite and provides the so-called rapid scan service on a 5-min repeat cycle. Note that within the operational Meteosat program, Meteosat-11 has since taken over the rapid scanning at 9.6°E and Meteosat-9 has drifted to 45.5°E. Channels 1–3 (λ = 0.635, 0.810, and 1.640 μm) are used in combination with the broadband high-resolution visible (HRV) channel (λ = 0.4–1.1 μm) to retrieve and downscale COD and effective cloud radius re from the reflected solar radiation observations. The retrieval is based on the Cloud Physical Properties algorithm developed at the Royal Meteorological Institute of the Netherlands and applied within the context of the Satellite Application Facility on Climate Monitoring (CMSAF) for the generation of the CLAAS-2 cloud data record (Roebeling et al. 2006). Together with COD and re, other cloud parameters are retrieved, such as liquid water path WL and droplet number concentration ND. The dataset also includes CTP derived from thermal infrared channels via radiative transfer calculations, but this product does not benefit from the resolution enhancement. All products have 2 km × 1 km spatial resolution and 5-min temporal resolution over central Europe.
We decided to use COD as input for our self-supervised approach because (i) the use of COD instead of reflectance(s) avoids the consideration of the solar zenith angle and surface characteristics, which have a strong influence on the reflectance; (ii) COD is the most accurate retrieval product (Deneke et al. 2021) at the enhanced resolution; and (iii) COD is related to surface radiation and, thus, is important for the solar energy applications we aim to target at a later stage. Nevertheless, it is important to understand the magnitude and impact of retrieval uncertainties. The bispectral retrieval method used for estimating the COD relies on the principles described by Nakajima and King (1990) and is highly nonlinear. A variety of uncertainty sources affect the overall accuracy, as discussed in great detail by Grosvenor et al. (2018). Specifically, retrieval results are affected by uncertainties of the instrument itself and its calibration, atmospheric correction, and assumptions on surface albedo. Errors in COD become excessively large outside a value range of approximately 5–50 but are expected to lie below 8%–10% within this range (see Fig. 14 in Platnick et al. 2017). This estimate is valid for warm clouds as observed by the MODIS instrument. Given the efforts directed to validation and instrumental calibration within the context of the Satellite Application Facility on Climate Monitoring (see Benas et al. 2017; Meirink et al. 2013), a similar accuracy is expected for the MSG-based retrievals of COD used here. It must, however, be noted that this uncertainty estimate does not take into account the effects of unresolved cloud variability, retrieval uncertainties arising for mixed-phase and ice clouds, or three-dimensional radiative effects. Given the focus of the present paper on classifying cloud scenes into characteristic regimes based on the COD, there is hope that such uncertainties are regime dependent and implicitly taken into account by the classification.
We focus on a central European domain (Fig. 1), which exhibits a wide variety of clouds associated with synoptic events and local forcings. While we intentionally apply a machine-learning (ML)-based cloud classification over land surfaces for the first time, we exclude the Alps’ topography, where the reflective snow can complicate cloud retrievals. The study period includes spring and summer, that is, from April to July, to capture more variability in the cloud conditions, including locally triggered low clouds, which pose a challenge to numerical modeling and solar energy production due to their high variability. We avoid dusk and dawn by limiting the period to 0600 to 1800 UTC. The year 2013 is used for training purposes, and the year 2015 to test the general applicability of the approach.
Fig. 1. (top) Geographical map of Europe shown along with topography. Thick black lines follow the coast, the thin black lines are political boundaries, and the black-outlined box shows the central European domain. (bottom left) MSG high-resolution cloud optical depth of the central European domain at 1330 UTC 22 May 2013. Four images from the 128 × 128 and 64 × 64 configurations are randomly cropped out at every time step. The bounding box inside the COD field represents a 128 × 128 area over Jülich; this fixed crop is not used during training but is reserved for the transfer learning task. (bottom right) The central European domain’s natural color RGB image at standard resolution is shown for better context. Note that cyan colors in the RGB indicate ice.
A priori, it is unknown which spatial scale is most meaningful for classification. Here, we use images of 128 × 128 pixel size, corresponding roughly to 260 km × 135 km, where cloud system distributions up to the mesoscale are well represented. This choice is consistent with previous studies that used domain sizes of around 250 km × 250 km (Denby 2020) or from 1° × 1° to 2.5° × 2.5° (Jakob and Tselioudis 2003; Oreopoulos et al. 2016). We also test a smaller image size of 64 × 64 pixels, approximately 130 km × 68 km, to study the influence of image size. At any instant, we crop four random images from our parent central European domain (Fig. 1), as sketched below. Therefore, with 144 parent images per day for about 120 days in 2013, we obtain about 68 000 images for the year. We withhold 10 000 images from the model for evaluation purposes. We also extract data over the specific location of Jülich, Germany, to investigate the temporal variability and the transition from one class to another. The data over Jülich are not used during training and are reserved for the transfer learning task. Also, data from 2015 are left out of training to investigate the generalization.
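The cropping step can be sketched as follows; the parent-array dimensions and the gamma-distributed placeholder values are illustrative, not the actual data.

```python
import numpy as np

def random_crops(parent, size=128, n=4, rng=None):
    """Crop n random size x size patches from a parent COD field (2D array)."""
    if rng is None:
        rng = np.random.default_rng()
    rows, cols = parent.shape
    crops = []
    for _ in range(n):
        i = rng.integers(0, rows - size + 1)   # top-left corner, row
        j = rng.integers(0, cols - size + 1)   # top-left corner, column
        crops.append(parent[i:i + size, j:j + size])
    return crops

# One parent scene per 5-min time step; four crops each, as described above.
parent_cod = np.random.default_rng(1).gamma(2.0, 8.0, (512, 1024))  # placeholder
patches = random_crops(parent_cod, size=128, n=4)
```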
3. Deep learning architecture
Recent years have seen enormous progress in representation learning by the deep learning community (Caron et al. 2020; van den Oord et al. 2017; He et al. 2020). These methods facilitate the intelligent use of uncurated images with modest computational requirements. In our case, we want the neural network to learn distinctiveness in the distribution pattern of the cloud systems. The network should learn the global and local patterns and distribution properties of cloud systems from satellite images. As cloud systems change continuously, the neural network should learn the features of the satellite images such that it can discriminate among them instantaneously.
To set up this architecture, we make use of the software package DeepCluster, version 2 (DCv2), from Facebook Artificial Intelligence Research (FAIR) (Caron et al. 2018, 2021). The open-source Vision Library for State-of-the-Art Self-Supervised Learning Research (VISSL) (Goyal et al. 2021) was used to adapt the DCv2 neural network to our requirements. Based on sensitivity tests, we always set the number of epochs to 800 and, depending on the available graphics processing unit (GPU) type, the batch size to 750 or 800.
For the classification task, we opted for a self-supervised approach because, during the representation learning phase, it learns more comprehensively and freely from the data distribution. In our self-supervised method, the network learns directly from the satellite COD observations from an initial dataset of 58 000 images from 2013. The network ingests data in batches, that is, small groups of images, and, for each epoch, the entire 58 000-image dataset makes a complete pass in 72 batches. For each batch, the network adjusts its parameters through stochastic gradient descent to minimize the cross-entropy loss function. After 800 epochs (see above), the parameters are optimized. The network can now work with a dataset of COD images the model has not been trained on, and it will assign new images to k categories (or cloud classes). The user decides the number of categories k for clustering clouds; for this study, we varied k between 4 and 12. Our experiment is guided by the well-established ISCCP classification with nine cloud classes and the clear-sky category. Thus, we started with k = 10. To investigate the sensitivity to the number of clusters, we also ran the experiment with a higher class number, such as k = 12, and reduced the number down to k = 4. The optimum number is investigated in section 4b.
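To illustrate the alternation between clustering and gradient descent described above, here is a minimal, self-contained sketch of one such cycle; the tiny backbone, the prototype layer, and all sizes are toy placeholders rather than the actual DCv2/VISSL implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans

torch.manual_seed(0)
# Toy stand-in for f: maps a 1-channel 128x128 COD image to a 128-d embedding.
model = nn.Sequential(nn.Conv2d(1, 8, 5, stride=4), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 128))
images = torch.randn(64, 1, 128, 128)   # placeholder COD batch
k = 10

# (i) Embed and L2-normalize; (ii) k-means on the embeddings yields
# pseudo-labels; (iii) one SGD step on the cross-entropy against them.
with torch.no_grad():
    z = F.normalize(model(images), dim=1)
labels = torch.as_tensor(
    KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(z.numpy()),
    dtype=torch.long)
prototypes = nn.Linear(128, k, bias=False)   # one learnable prototype per class
opt = torch.optim.SGD(list(model.parameters()) + list(prototypes.parameters()),
                      lr=0.3, momentum=0.9)
loss = F.cross_entropy(prototypes(F.normalize(model(images), dim=1)), labels)
opt.zero_grad(); loss.backward(); opt.step()
```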
a. Definition of the network input
The input training dataset consists of the 58 000 COD images from 2013 described in section 2, randomly cropped to 128 × 128 (or 64 × 64) pixels; the remaining 10 000 images are withheld for evaluation.
Fig. 2. Schematic diagram of the deep learning network adapted in this work from the VISSL library. From left to right, the input COD images are mapped by the convolutional backbone (CNN) and the multilayer perceptron (MLP) head to the feature vector Z, which is clustered by spherical k-means to provide the class assignments used in the cross-entropy loss.
b. Network general architecture
The neural network’s task is to learn visual features from each satellite image. A function f describes the learning process via a nonlinear mapping f(xn) of an input image xn to an M-dimensional feature vector Z (here, M = 128).
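A plausible instantiation of f is sketched below; the text only specifies a CNN followed by an MLP (see Fig. 2 and section 4e), so the ResNet-50 backbone and the head sizes here are our assumptions for illustration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Hypothetical f: a CNN backbone plus an MLP head mapping a single-channel
# COD image x_n to a feature vector Z with M = 128 dimensions.
backbone = resnet50(weights=None)
backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3,
                           bias=False)      # accept 1-channel COD input
backbone.fc = nn.Identity()                 # expose the 2048-d CNN features
head = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, 128))
f = nn.Sequential(backbone, head)

Z = f(torch.randn(4, 1, 128, 128))          # Z has shape (4, 128)
```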
c. Upper branch of the network
In the upper branch (Fig. 2), applying the function f to the image xn yields the feature vector Z (the embedding), which is collected over the dataset and clustered by spherical k-means to obtain the class assignments.
d. Lower branch of the network and cost function minimization
In the lower branch, the class assignments obtained from the spherical k-means serve as targets, and the network parameters are adjusted by stochastic gradient descent to minimize the cross-entropy loss between the predicted and assigned classes, as outlined above.
e. Network learning behavior
To better understand how the computational model works internally and learns progressively from subsequent epochs, we extract the features at different levels of training to visualize and better comprehend the progress in the clustering around the k classes provided as input. Figure 3 shows the ability of a self-supervised approach to extract the rich representations in the dataset. In our case, we do not have labels to measure the accuracy of the scene classification task. Therefore, we assess the quality of the learning algorithm by looking at the dimensionally reduced feature vector.
Fig. 3. Progressive visualization of the dimensionally reduced feature space for 10 classes (color coded) and a subsample size of 15 000 at epochs (a) 1, (b) 25, (c) 250, and (d) 800 of the neural network training. The perplexity used for the tSNE runs is 30.0, and the epsilon derived from autoconfiguration is 1150.
To visualize the M-dimensional feature vector Z, we use the t-distributed stochastic neighbor embedding (tSNE) algorithm (van der Maaten and Hinton 2008), which tries to preserve the relative positions of points while mapping them to a lower-dimensional space. Here, we map the 128 dimensions into two and, for the visualization in Fig. 3, we prescribe the number of classes to 10 (k = 10).
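A minimal sketch of this dimensionality reduction is given below, with the perplexity of 30.0 stated in Fig. 3; the random embeddings and the smaller subsample are placeholders (the paper uses 15 000 samples), and the PCA initialization is our choice.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
Z = rng.normal(size=(3000, 128)).astype(np.float32)  # placeholder embeddings
Z2 = TSNE(n_components=2, perplexity=30.0, init="pca",
          random_state=0).fit_transform(Z)           # (3000, 2), for plotting
```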
The neural network ingests the data at the first epoch and performs spherical clustering for the first time using the embeddings (a synonym for the feature vector Z). In this way, the model generates a first idea of a few different class distributions (e.g., the pink and yellow dots in Fig. 3a). At epoch 25, spirals of data points are clustering together (Fig. 3b). Self-supervised learning can be interpreted in an energy-based, regularized framework (LeCun and Huang 2005): an energy function (here, the cross-entropy loss) assigns low energies to training samples, while a regularizer (here, the spherical k-means) limits the volume of the low-energy regions as the loss pushes the energy down along the data points. In this context, Ranzato et al. (2007) showed that compressing the information needed to represent the data can help to reduce the training points’ energy and maximize the unobserved points’ energy. We observe a spiraling behavior supporting the energy-based-model theory. At epoch 250 (Fig. 3c), the model achieves convergence, and near epoch 800 (Fig. 3d), a clear segregation among the classes appears with, perhaps, some outliers (e.g., the yellow dots in Fig. 3d). The good performance displayed by the network convinced us to stop the training at 800 epochs.
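For reference, a from-scratch sketch of spherical k-means, the clustering step named above, reads as follows; DCv2/VISSL's actual implementation is GPU based and distributed, so this is only a conceptual illustration.

```python
import numpy as np

def spherical_kmeans(Z, k, iters=50, seed=0):
    """k-means on the unit sphere: cosine similarity as the affinity,
    centroids re-normalized after every update."""
    rng = np.random.default_rng(seed)
    Z = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    C = Z[rng.choice(len(Z), k, replace=False)]         # initial centroids
    for _ in range(iters):
        labels = np.argmax(Z @ C.T, axis=1)             # nearest centroid (cosine)
        for j in range(k):
            members = Z[labels == j]
            if len(members):
                c = members.sum(axis=0)
                C[j] = c / np.linalg.norm(c)            # project back to sphere
    return labels, C
```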
f. Optimization for the convergence
We used stochastic gradient descent (SGD) with momentum (Ketkar 2017) to optimize the training process; in contrast to full gradient descent, which uses the average gradient over all training samples, SGD evaluates a slightly different loss on a randomly selected minibatch, which reduces the computations enormously. Using momentum in the SGD lets the past and the current gradients jointly guide the overall direction of the descent. In addition, DCv2 uses a learning-rate scheduler that modifies the learning rate based on the progression of the training process. We use a linear warm-up plus cosine learning-rate schedule with a linear start value of 0.3, end value of 4.8, cosine start value of 4.8, and end value of 0. VISSL supports mixed-precision training from NVIDIA’s Apex library to reduce the model memory requirement and improve training speed. We have used mixed precision to save GPU memory and thus allow larger-batch training. In addition, NVIDIA’s Layer-wise Adaptive Rate Control (LARC), developed for large-batch training of convolutional networks, is used together with SGD to improve large-batch-size training convergence. To avoid collapse, DCv2 applies batch normalization for stability, apart from the clustering constraint. The task of the batch-normalization layer is to take outputs from the last layer and normalize them before passing them as input to the next layer. For multi-GPU training, synchronized batch normalization normalizes the input within the minibatch while sharing the statistics across all the devices. For conversion to global batch normalization, VISSL supports NVIDIA’s Apex SyncBatchNorm. We have used four NVIDIA V100 GPUs for training purposes. Training the neural network for 800 epochs on four V100 GPUs took 16 h, or 64 GPU hours in total.
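The warm-up-plus-cosine schedule with the values quoted above can be sketched as a simple function of the epoch; the warm-up length of 10 epochs is an assumption, as the text does not state it.

```python
import math

def lr_at(epoch, total_epochs=800, warmup_epochs=10,
          start_lr=0.3, peak_lr=4.8, end_lr=0.0):
    """Linear warm-up followed by cosine decay, using the values from the
    text; the warm-up length is an assumption."""
    if epoch < warmup_epochs:
        frac = epoch / warmup_epochs
        return start_lr + frac * (peak_lr - start_lr)
    frac = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return end_lr + 0.5 * (peak_lr - end_lr) * (1 + math.cos(math.pi * frac))

print([round(lr_at(e), 2) for e in (0, 5, 10, 400, 799)])
```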
4. Results
a. Analysis of the clustering capability
Once the network has sorted the individual images into different classes, we want to check the consistency within a single class. Thus, we quantify how close to each other the evolving cloud patterns are in the feature space. To gain confidence in the quality of the representation of the different cloud patterns, we check whether images from the same cluster that are close or far from each other in the feature space are visually similar. For this analysis, for each class, we select the image closest to the cluster centroid (hereinafter referred to as the centroid) over central Europe (Fig. 4) and compare it with its two closest and two farthest images. The seven centroids in Fig. 4 show distinct COD patterns; however, the physical separation of the identified classes and their interpretation will become clearer in the later sections. Centroid 2 is associated mainly with clear-sky conditions, and centroid 3 has optically thin clouds. Centroids 4–6 reveal extended cloud fields with a mean cloud fraction higher than 94%, though their distribution differs, with centroid 6 having the highest COD values, representing intensive convective cells. The images closest to the centroids resemble their behavior, while the ones farthest away sometimes deviate to a larger extent. This indicates that the distance to a centroid can be used to measure the classification’s quality. It is worth noting that Fig. 4 visually demonstrates the distinctiveness and separation of the features learned within a single class from the cloud systems found in other classes. Additionally, to quantitatively assess the quality of our embeddings in seven clusters, we calculated the silhouette score (Rousseeuw 1987) for our high-dimensional and complex dataset, yielding a score of 0.54. This score ranges between −1 and +1, and low or negative values indicate that the clustering configuration may have too many or too few clusters. Thus, the mean score of 0.54 for all the embeddings shows that the clusters are considerably compact and nonoverlapping, supporting our qualitative assessment. Therefore, scores like the silhouette can also be used to find the optimal number of clusters, as discussed in section 4b. Further analogies in the cloud patterns will be investigated in section 4d, where we aim at a physical interpretation.
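The silhouette computation amounts to one library call on the embeddings and their cluster labels; in this sketch, both arrays are random placeholders, and the cosine metric is our assumption given that the embeddings live on the unit sphere.

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
Z = rng.normal(size=(5000, 128))          # placeholder embeddings
labels = rng.integers(0, 7, size=5000)    # placeholder cluster assignments
# In [-1, 1]; ~0.54 is reported above for the real embeddings in 7 clusters.
score = silhouette_score(Z, labels, metric="cosine")
```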
Fig. 4. Each row shows five 128 × 128 COD images belonging to a certain cloud regime (CR) over central Europe for (left) centroid, (left center) closest first, (center) closest second, (right center) second farthest, and (right) farthest; the color scale is shown for COD reference purposes.
b. Determination of the optimal number of classes
As stated in section 3, users provide the number of classes k utilized by the network to cluster the data. We want to understand, for each configuration, what the optimum number of classes is. This number should be as low as possible but, at the same time, should separate the radiative and physical properties of different clouds. We use, for this purpose, the spatial distribution of COD and CTP as retrieved from MSG. In this way, we follow Oreopoulos et al. (2016), who distinguished cloud regimes based on COD–CTP using k-means clustering methods, starting from a core ensemble of six cloud regimes and then verifying whether each regime can be split into subregimes. To split a regime into subregimes, they compared the differences in the spatial pattern correlation coefficients between their centroids (mean joint histograms) with threshold values.
Here, we use a bottom-up approach, starting from a larger number of cloud regimes (12) and successively reducing it. We check the correlation in the joint histograms of COD and CTP and also look at sensible distributions of the mean cloud fraction and the relative frequency of occurrence (shown in Fig. 6) over the central European domain (more details on the joint histograms can be found in section 4d). Based on this, we define the following thresholds for the correlations: 0.65 for the 128 × 128 and 0.7 for the 64 × 64 sample images. A correlation between two clusters higher than the threshold value indicates that the clusters are not sufficiently independent. In fixing the threshold, we were inspired by Oreopoulos et al. (2016), who selected a threshold of 0.6; our thresholds are slightly more restrictive. Additionally, since the 64 × 64 samples cover a comparatively smaller domain than the 128 × 128 samples, they exhibit more variability in cloud system pattern and distribution, and we therefore apply the higher threshold of 0.7.
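A sketch of this independence check reads as follows; the function assumes one mean joint COD–CTP histogram per cluster centroid and flags the pairs whose Pearson correlation exceeds the configuration-dependent threshold.

```python
import numpy as np

def overcorrelated_pairs(centroid_hists, threshold=0.65):
    """Flag cluster pairs whose mean joint COD-CTP histograms correlate
    above the threshold (0.65 for 128 x 128, 0.7 for 64 x 64)."""
    H = np.array([h.ravel() for h in centroid_hists])
    R = np.corrcoef(H)                                  # pairwise Pearson r
    k = len(H)
    return [(i, j, R[i, j]) for i in range(k) for j in range(i + 1, k)
            if R[i, j] > threshold]
```

If the returned list is empty, the clusters are treated as sufficiently independent; otherwise, the number of regimes is reduced and the check repeated.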
For the 128 × 128 configuration, Fig. 5 shows that the configuration with seven classes is the first one in which the correlation among the classes is never larger than the prescribed threshold. The number of class pairs whose correlation is higher than the threshold (red squares in Fig. 5) is reduced as we decrease the number of cloud regimes from 12 to 7. Seven is thus the largest number of classes the network can distinguish without producing overlap of the cloud radiative properties. In conclusion, we find the optimal number of classes to be seven for the 128 × 128 configuration and eight for the 64 × 64 configuration (see the online supplemental material).
Fig. 5. Correlation patterns among classes for the number of cloud regimes varying from 12 to 7 for the 128 × 128 configuration. Red squares with hatching indicate the classes with a correlation higher than 0.65 in the COD–CTP space. Note that for smaller numbers of cloud regimes no significant correlation appears. A similar figure for the 64 × 64 configuration is shown in the online supplemental material.
c. Optimum classification
Now we want to investigate how the clustered satellite images arrange themselves in the representation space. Our goal is to see whether, with k classes (as determined in section 4b) as a clustering constraint, the features from the satellite images are strong enough to be captured by the deep neural network for a qualitative classification of distinct types of clouds. Since the feature vector Z cannot be visualized directly because it has 128 dimensions, we reduce the dimension of the feature array to a 2D space using the tSNE algorithm of van der Maaten and Hinton (2008) and visualize the classes in a two-dimensional space in Fig. 6.
Fig. 6. Classification by the deep neural network at (a) k = 7 for the 128 × 128 configuration and (b) k = 8 for the 64 × 64 configuration. Thirty random samples were selected over central Europe and visualized for each class. Each image is assigned a unique class through a colored frame. The bar charts in the lower part of (a) and (b) represent the number of images in relative percentage and the cloud fraction in each regime. To better associate the class number with the cloud regime, the centroid image is shown as well.
Figure 6 shows the representation learning capability of the neural architecture using the self-supervised algorithm. When looking at the two-dimensional reduced feature space, distinct cloud patterns occupy distant areas. As already noted in section 4a, class 2 relates to clear-sky conditions and is located clearly on the edge of the 2D space (Fig. 6a, left). In contrast, on the opposite edge (Fig. 6a, right), we find the classes with the strongest convective cloudiness (classes 5 and 6), as identified by small areas of enhanced COD. Similar distinct behavior can be found in other regions of the 2D space and will be discussed in more detail in the next section. In comparing the configurations with 128 × 128 and 64 × 64 pixels, one can initially notice that the latter captures more variability in the cloud systems because smaller images can detail more information on cloud structures. Section 4d will investigate this in more detail.
d. Physical significance of the identified classes
In this section, we analyze the physical properties of the identified cloud classes. The goal is to verify the independence of the identified classes in the physical spaces given by a combination of radiative and cloud properties. An overview of the different characteristics of the classes is given in Table 1. We first use the surface downwelling solar radiation (SDS) and the top-of-the-atmosphere (TOA) reflected solar radiation (TRS) (Fig. 7a). Note that these products have been derived with radiative transfer calculations (see section 2). Moreover, we analyze CTP and COD (Fig. 7b). As in Denby (2020), we randomly select 1000 images for each of the seven optimal classes of the 128 × 128 configuration. Then, we derive their mean position in the SDS–TRS space (Fig. 7a) and similarly for COD and CTP (Fig. 7b).
Table 1. The relative frequency of occurrence (RFO) and the mean and standard deviation (std) of cloud mask (CMASK), cloud optical depth (COD), cloud-top height (CTH), cloud-top pressure (CTP), cloud-top temperature (CTT), surface downwelling solar radiation (SDS), and TOA reflected solar radiation (TRS) for all of the classes identified by the network in the 128 × 128 configuration.
Fig. 7. (a) Mean TOA TRS vs mean SDS given by dots of 1000 randomly selected images of the 128 × 128 configuration, with color indicating the class to which they belong. Triangles indicate the mean position of the classes. The images associated with the triangles are the closest to the mean position of the 1000 images. The crosses in the triangles represent the error bars of each cluster group for the radiative properties on x and y. (b) As in (a), but in the 2D space given by COD (abscissa) and CTP (ordinate).
The collocation of the classes in the radiative and cloud properties space permits the identification of the types of clouds the network distinguishes (Table 1, Fig. 7). Class 1 (light blue dots) probably corresponds to cirrus clouds, with a mean COD value smaller than 10 and cloud-top pressure around 400 hPa, at approximately 7500 m. Cirrus clouds show a large variability in albedo and surface downwelling solar radiation that reflects their natural variability: they can be thicker or semitransparent and contain larger or smaller ice crystals, profoundly impacting their radiative properties. The cloud-free class 2 has the lowest cloud fraction (27%), with most solar radiation reaching the surface. Because of the lower albedo of the surface when compared with clouds, less radiation is reflected back to space. Note that SDS and TRS experience a strong diurnal cycle, and only at noon are SDS values >700 W m−2 reached. Classes 3 and 7 correspond to relatively low-level clouds (orange-shaded dots in Fig. 7) with medium cloud fractions of 65% and 88%, respectively. These clouds have relatively small cloud optical depth values, with mean cloud-top altitudes below the 600-hPa level (around 3000 m). Note that low-level clouds are frequently topped by overlying (semitransparent) ice clouds (as shown in Fig. 1), which explains the relatively high standard deviation in cloud-top height (CTH) of more than 2 km. However, there are differences between class 3 and class 7: Class 3 has only half the COD of class 7, but its relative standard deviation is much higher, indicating high variability, which is demanding for solar energy production. Class 7, possibly associated with deeper low-level clouds, displays larger COD values and smaller surface downwelling solar radiation than the shallower low-level clouds of class 3.
Classes 4, 5, and 6 (blue-shaded dots in Fig. 7) all describe overcast situations with cloud fractions of more than 94%. For these classes, average COD values are larger than 10, and they span a broad region of CTP values above 800 hPa, indicating various stages and intensities of convective development. There are further distinguishing features among these classes. Classes 5 and 6 have very similar mean values of COD and CTP, around 25 and 500 hPa, which differ especially from the mean COD of 15 and the CTP of 650 hPa of class 4. These differences indicate that classes 5 and 6 correspond to deeper and thicker clouds than class 4. However, class 6 has the highest relative standard deviation of COD (55%) of the three classes, which aligns with the impression of the strongest convective activity, as discussed with respect to Fig. 6. Still, there is a distinct difference, as class-5 clouds reflect twice as much radiation into space as class-6 clouds. Thus, despite similar cloud fractions, the network is able to identify different structural features within these three “overcast” classes.
Last, we investigate the joint histograms of cloud optical depth and cloud-top pressure (Fig. 8) for distinctness. Clearly, each class occupies its characteristic areas in the two-dimensional space; however, some overlap exists. From the three overcast classes, both classes 5 and 6 include high clouds, though class 6 is more limited to the highest tops and has a broader spectrum of COD. The third overcast class (class 4) also includes lower clouds and thus slightly overlaps with the low-cloud class 7, but the occurrence of high COD values (23 and above) becomes more prominent for class 4 at the midcloud level. The overlap between classes 2 and 3, clear-sky and shallow low-level clouds, respectively, can be explained by the rapidly changing nature of low-level clouds, which can easily form or dissolve and show large variability in space.
Fig. 8. Joint histogram of COD (abscissa) and CTP (ordinate) for the (top) 128 × 128 pixel and (bottom) 64 × 64 pixel configurations.
For the finer 64 × 64 configuration, no strong overlap exists either, but the high occurrences appear bound to smaller areas in the COD–CTP space, except for classes 6 and 8, likely due to the stronger separation into eight classes. In fact, this might be helpful to identify low-level clouds, which seem to be more confined in class 5 than in the 128 × 128 configuration. Overall, we find that the identified classes and the properties represented by them are exclusive to a large extent, although some overlaps occur. Therefore, using satellite-retrieved products supports the physical interpretation of the identified cloud classes. Note that a detailed investigation of the different classes’ characteristics is beyond this study’s scope.
e. Generalization
This section tests how the model behaves when fed satellite data from a different year. The research question we want to answer is whether the model can identify, using data from 2015, the same centroids detected with the data from 2013. The goal is, thus, to discover how the centroids will arrange themselves to represent the cloud regimes of unseen cloud system distributions. The only information the network retains from the run on 2013 data is the values of the internal parameters of the neural network (CNN and MLP). We also classify the 2015 dataset into the same number of optimal classes as discussed in section 4b.
To answer this question, we correlate the 2013 and 2015 centroid embeddings and compare each centroid with the other respective centroids for both configurations (Fig. 9). This type of analysis should reveal the intercluster spread in both years. The 128 × 128 configuration generally performs better on unseen data than does the 64 × 64 configuration. In the 128 × 128 configuration, all centroids from the unseen (2015) data coincide with the centroids from the seen (2013) data (Fig. 9a). The highest off-diagonal, columnwise centroid correlation is 0.53, which suggests that, in the 128 × 128 configuration, the newly identified classes from the 2015 dataset fit nicely to the pretrained network from 2013.
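The cross-year comparison and the re-sorting of class numbers mentioned in Fig. 9 can be sketched as follows; pairing classes by a maximum-correlation assignment is our assumption of how the matching is done.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_centroids(C_2013, C_2015):
    """Correlate centroid embeddings across years and re-sort the 2015 classes
    to their best 2013 counterparts (maximum total correlation)."""
    k = len(C_2013)
    R = np.corrcoef(np.vstack([C_2013, C_2015]))[:k, k:]  # cross-year block
    rows, cols = linear_sum_assignment(-R)                # maximize correlation
    return cols, R[rows, cols]        # 2015 ordering, matched (diagonal) values
```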
Fig. 9. Correlation of centroids obtained from the 2013 and 2015 datasets for the (a) 128 × 128 configuration and (b) 64 × 64 configuration. Below each joint histogram are the images closest to the centroid from 2015 (indicated with an asterisk) and 2013 for each class. Note that the class number is re-sorted, as indicated by the asterisk, according to the matching in the correlation of centroids.
In the 64 × 64 configuration, it is more challenging for the network to classify scenes similarly in both years. Here, the off-diagonal, columnwise centroid correlations are much higher, with values up to 0.93. This especially holds for two classes: 1) a mixture of deep convection along with spread-out low convection (class 3) and 2) varying clear-sky conditions (class 1). For example, both centroids of classes 3 and 6 from 2015 display a correlation higher than 0.9 (both main and off diagonal) with class 3 of 2013, while other centroids are discriminated more clearly (Fig. 9b). These classes have almost the same type of mixed convection, but rather than being in one class, as in 2013, they belong to two classes in 2015. Furthermore, from 2015, class 8 represents very low convection, whereas class 4 groups almost clear-sky cases. Both of these classes have correlations of 0.84 (main diagonal) and 0.74 (off diagonal) with class 1 of 2013, which represents an almost clear sky. Similar to the mixed-convection case, rather than being in one class of almost clear-sky type, this type is present in two different cloud classes in 2015.
While the generalization works well for the 128 × 128 configuration, problems appear for the smaller-scale configuration. We see two probable reasons. The first is the spatial scale of cloud systems, that is, the scale at which cloud systems occur over a region. For the network to properly represent the clouds, such a scale should be well represented in the input dimensions (which 128 × 128 achieves over central Europe). In the 64 × 64 case, the scale captured might be too small to understand the structure and distribution of cloud systems, which is especially challenging when scenes include a mixture of clear-sky, convective, and stratiform regimes. The second reason is that the smaller scale exhibits more variability among classes and thus likely requires more training data. However, letting the network run on the 2015 data for a few more epochs might also improve its performance and decrease the spurious centroid correlations for complex cloud systems.
f. Persistency of cloud regimes and probability of transition
We move toward applications and use the network’s ability to identify cloud regimes over a fixed location (Jülich in our case). We can study the temporal development of specific cloud regimes and exploit the 5-min temporal resolution of the dataset to observe the transformation of cloud systems with time at a higher resolution. Note that the network, trained with random crop settings over central Europe, has not previously ingested the temporally continuous data over Jülich. In other words, the neural network has no notion of the temporal cohesion of the satellite images. This experiment also gives us insights into the application of transfer learning for the neural network to study cloud systems over a fixed geospatial location. Looking at images from Jülich (see Fig. 1) for an entire day, we can investigate how one class transitions toward another in time. The goal is to understand how close classes should be in the feature space to make a transition and to derive a transition probability from one cloud regime to another. Figure 10 illustrates the temporal evolution of cloud regimes for two selected days. In this way, the computational model can provide information on the persistency of the cloud classes over a specific region. Furthermore, it also reveals that sometimes a mixed regime appears, as can be seen for 20 June 2013 (Fig. 10a) in the morning, when an alternation between the two deep-convective regimes (classes 5 and 6) is observed. The network also shows how the deep convection decays, lower-level clouds appear (class 3), and clear-sky conditions prevail in the afternoon. Figure 10 also illustrates how similar the four closest images (in terms of cosine distance) look to a randomly selected test image for both days.
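Retrieving the four closest images reduces to a cosine-similarity search in the embedding space; a minimal sketch (our own, with placeholder shapes) follows.

```python
import numpy as np

def nearest_images(z_query, Z, n=4):
    """Indices of the n embeddings closest to a query by cosine similarity."""
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    q = z_query / np.linalg.norm(z_query)
    return np.argsort(-(Zn @ q))[:n]

rng = np.random.default_rng(0)
Z = rng.normal(size=(1000, 128))       # placeholder embeddings over Jülich
idx = nearest_images(Z[0], Z[1:], n=4)  # four nearest to a test image
```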
Fig. 10. Temporal evolution and transition of two example days for the cloud systems over Jülich on (a) 20 Jun 2013 and (b) 25 Jul 2013 as identified by the neural network. Here t = 0 represents a COD image that is given as a test case (identified with the plus sign), and the neural network is asked for the four nearest images sampling over Jülich. The time stamps represent the order of their closeness from the test image (t = 0) in the feature space.
More specifically, we can detect how long a specific cloud regime stays, on average, over the location and which cloud regime it most probably evolves into. In the 128 × 128 configuration, the clear-sky regime (class 2) displays the longest persistency over the site (Fig. 11a). It is also the cloud regime that most probably evolves into the shallow, low-level type (class 3) (Fig. 11b). Similarly, classes 4, 5, and 6, representing different stages of convective cloud structures and distributions, display a persistency of around 75 min and often evolve into one another. Class 7 stays over one fixed location for the shortest time: an average of 30 min. These clouds are located near the boundary layer and could represent deeper low-level clouds. They often evolve into class-5 clouds, that is, grow into a full convective cloud, or, with slightly less probability, can also evolve into class-3 (shallower low-level) clouds, maybe after a precipitation event. Note that the low persistence time results from the occurrence of frequent jumps between two classes (see Fig. 11) during the transition time and that temporal smoothing might be beneficial for the analysis of long-term time series.
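Both diagnostics can be derived from the class-label time series over the site; the sketch below computes the mean persistence (average run length times the 5-min sampling) and the row-normalized transition matrix, with the function name and interface being ours.

```python
import numpy as np

def persistence_and_transitions(labels, k, dt_minutes=5):
    """Mean persistence per class and the class-to-class transition
    probability matrix from a label time series over one site."""
    runs = [[] for _ in range(k)]
    T = np.zeros((k, k))
    run = 1
    for prev, cur in zip(labels[:-1], labels[1:]):
        if cur == prev:
            run += 1                 # same regime persists another time step
        else:
            runs[prev].append(run)   # close the run, record the transition
            T[prev, cur] += 1
            run = 1
    runs[labels[-1]].append(run)
    persistence = [dt_minutes * np.mean(r) if r else np.nan for r in runs]
    P = T / np.maximum(T.sum(axis=1, keepdims=True), 1)  # row-normalized
    return persistence, P

labels = list(np.random.default_rng(0).integers(0, 7, 2000))  # placeholder
persistence, P = persistence_and_transitions(labels, k=7)
```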
Fig. 11. (a) Cloud systems persistency over Jülich given by the average interval of occurrence of cloud regimes (CRs) over the site. (b) Matrix representing the probability of transition of each cloud regime to another class derived over Jülich. We used 1 year of data for both analyses.
5. Conclusions and outlook
This work uses a deep neural network architecture and a spherical k-means algorithm for the first time to detect cloud regimes from cloud optical depth images over Europe. Classifying cloud regimes is a long-standing problem in Earth system sciences. Central Europe’s midlatitude location gives rise mainly to two cloud families over the diurnal cycle: 1) a mix of stratiform and convective clouds at nighttime and 2) a convective type in the daytime. Keeping our long-term vision on solar energy applications, we wanted first to disentangle them and focus on convective cloud systems. We used COD measurements, available in the daytime only, for summer periods. In this way, we were able to investigate a wide variety of convective and stratiform clouds. In the future, with many more years of data, we intend to analyze the diurnal behavior of the different classes. We propose this approach to exploit the greater freedom that the deep neural network offers in understanding the complexity of the cloud patterns. We then verify the results by analyzing the cloud physical and radiative properties of the cloud regimes obtained from the network. The cloud optical depth values and all the other radiative properties are retrieved as a postprocessed satellite product following Deneke et al. (2021) to mimic the MTG. In this way, our approach can also be seen as an opportunity to reduce future satellite missions’ complexity and data volume. We investigated two different configurations of 128 × 128 and 64 × 64 pixels to study the effect of scale. At first, we tried to understand how close classes are in the feature space to better capture how cloud regimes can transform into one another. A high similarity is found among the closest and farthest images to a randomly selected image from each class. We then determined the optimal number of classes representing the cloud pattern variability. The cloud regime number that adequately represents independent and mutually exclusive cloud patterns is seven for the 128 × 128 configuration and eight for the 64 × 64 configuration. By adopting those numbers of classes, the cloud regimes appear visually distinct and positioned in different areas of the feature space.
Exploiting cloud and radiative satellite products, namely cloud optical depth, cloud-top pressure, surface downwelling solar radiation, and TOA reflected solar radiation, we associated each cloud regime with specific physical properties for the 128 × 128 configuration. Class 1 represents cirrus clouds, while class 2 groups clear-sky cases; classes 3 and 7 represent two types of low-level clouds, namely shallower and deeper ones. Classes 4, 5, and 6 consist of various convective clouds.
To test the applicability of the network, which did not get any temporal information during training, we investigated the temporal variability of the cloud regimes over a specific location (Jülich). In the 128 × 128 configuration, we identified the clear sky as the most persistent cloud regime (155 min), followed by convective clouds (70–75 min). Low-level clouds, as well as cirrus clouds, last around 30–40 min. These results can help us better understand each cloud class’s diurnal cycle over the area of interest and are relevant for describing the highly variable conditions for solar energy production.
We also tested how the network clusters a new, unseen dataset by running it with data from 2015. We found that all centroids from the unseen data coincide with those from 2013 in the 128 × 128 configuration. In the 64 × 64 configuration, the network had difficulties discriminating the same centroids, possibly also due to the smaller dataset available. However, the generally good agreement between 2013 and 2015 shows that the data distribution between the 2 years has not changed much. Nevertheless, this highlights the importance of choosing the right image size according to the problem at hand.
This study followed a purely data-driven approach and did not investigate the properties of the detected cloud regimes in detail. However, in the future, we aim to investigate which classes correspond to the situations most relevant for the production and distribution of solar energy. Here, the transition probabilities between different classes also offer the potential for short-term prediction. For this application, additional training information will also be explored.
Despite using a resolution-enhanced product, geostationary measurements are limited in the detection of boundary layer clouds because of their lower spatial resolution. The question that can still be explored further is whether the lower image resolution offered by geostationary satellites eliminates valuable information for the classification of boundary layer clouds. Sabottke and Spieler (2020) find a case-to-case dependence on image resolution in radiography. Therefore, determining the optimal satellite resolution for identifying different cloud regimes with machine learning applications remains an open problem and will depend on the application. COD retrieval is also prone to uncertainties (see section 2). Tan et al. (2015) find that cloud regimes are less susceptible to minor, random, or statistically insignificant changes in cloud properties. The highest uncertainties occur for COD retrievals at large values, above approximately 50. However, at these high values, downward solar radiation is already heavily extinguished, so the application for solar energy, which we aim to pursue next, will not be affected strongly.
This work shows the potential of the inductive bias of convolutional architectures to handle complex patterns already during the first few epochs of self-supervised training. With each epoch, the learned feature representation improves. However, future work can focus on whether the network can also learn some hidden properties of the satellite images, such as the geolocalization of cloud systems or the environmental parameters associated with a given cloud pattern. The network, in the feature space, might be able to reveal the specific combination of environmental parameters leading to convection.
In this work, one of the main difficulties connected with the training of DCv2 on satellite images is the absence of control over the distribution of images. For example, specific cloud types, like shallow clouds, could be overrepresented within the dataset in comparison with deep convection. In this case, the model could saturate, and it would not be able to extract more information when fed additional images. Moreover, by design, DCv2 favors a well-balanced class dataset; therefore, the network’s performance is expected to decrease when the algorithm works with satellite images, because naturally occurring cloud systems do not preserve class balance. One solution to attempt in future work could be to attach a weight (a 1D tensor) to the cross-entropy loss function that contains the prior knowledge about the class distribution of the dataset.
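In PyTorch, such a weighting is a one-line change to the loss; the class counts below are invented placeholders for a long-tailed distribution.

```python
import torch
import torch.nn as nn

# Hypothetical class counts from the training set (long-tailed distribution).
counts = torch.tensor([21000., 12000., 9000., 7000., 5000., 2500., 1500.])
weights = counts.sum() / (len(counts) * counts)   # inverse-frequency weights
criterion = nn.CrossEntropyLoss(weight=weights)   # down-weights frequent classes

logits = torch.randn(8, 7)                        # placeholder predictions
labels = torch.randint(0, 7, (8,))
loss = criterion(logits, labels)
```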
Satellite images have more factors of variation than do typical benchmark computer vision datasets such as ImageNet. For example, clouds usually spread over the whole image, generating a distribution different from a usual computer vision dataset, which is normally highly object oriented and centered. Moreover, two or more very different cloud patterns may be present in an image, so that a single image contains multiple concepts. However, the spherical k-means algorithm uses the dominant concept (pattern and distribution) present in the images and attributes the whole image to a single label. Thus, we think it would be beneficial if we could extract multiple concepts from the features and use them for training. Present-day transformers pretrained using self-supervision and different pretext tasks (Atito et al. 2021; Caron et al. 2021) have shown good promise in understanding the contextual relations of the semantics present in an image. Their suitability will be investigated in the future.
Acknowledgments.
Author Dwaipayan Chatterjee’s research has been supported by the Federal Ministry for the Environment, Nature Conservation, Nuclear Safety, and Consumer Protection (Grant 67KI2043; https://kiste-project.de/). This work was performed as part of the Helmholtz School for Data Science in Life, Earth, and Energy (HDS-LEE). Chatterjee also thanks Quentin Duval and others from Facebook’s artificial intelligence research for maintaining the VISSL computer vision library. Chatterjee thanks the three anonymous reviewers for taking the time to provide constructive comments and for making the paper better. Author Claudia Acquistapace’s research has been supported by the Deutsche Forschungsgemeinschaft (Project 437320342) “Precipitation life cycle in trade wind cumuli” (https://gepris.dfg.de/gepris/projekt/437320342; last accessed: 23 December 2021).
Data availability statement.
The data and code to produce this work can be accessed online (https://doi.org/10.5281/zenodo.7437949 or https://zenodo.org/record/7437949).
REFERENCES
Atito, S., M. Awais, A. Farooq, Z. Feng, and J. Kittler, 2021: Mc-SSL0.0: Towards multi-concept self-supervised learning. arXiv, 2111.15340v1, https://doi.org/10.48550/ARXIV.2111.15340.
Azizi, S., and Coauthors, 2021: Big self-supervised models advance medical image classifications. 2021 IEEE/CVF Int. Conf. on Computer Vision (ICCV), Montreal, QC, Canada, Institute of Electrical and Electronics Engineers, 3458–3468, https://doi.org/10.1109/ICCV48922.2021.00346.
Benas, N., S. Finkensieper, M. Stengel, G.-J. van Zadelhoff, T. Hanschmann, R. Hollmann, and J. F. Meirink, 2017: The MSG-SEVIRI-based cloud property data record CLAAS-2. Earth Syst. Sci. Data, 9, 415–434, https://doi.org/10.5194/essd-9-415-2017.
Bony, S., H. Schulz, J. Vial, and B. Stevens, 2020: Sugar, gravel, fish, and flowers: Dependence of mesoscale patterns of trade-wind clouds on environmental conditions. Geophys. Res. Lett., 47, e2019GL085988, https://doi.org/10.1029/2019GL085988.
Caron, M., P. Bojanowski, A. Joulin, and M. Douze, 2018: Deep clustering for unsupervised learning of visual features. arXiv, 1807.05520, https://doi.org/10.48550/arXiv.1807.05520.
Caron, M., I. Misra, J. Mairal, P. Goyal, P. Bojanowski, and A. Joulin, 2020: Unsupervised learning of visual features by contrasting cluster assignments. arXiv, 2006.09882v5, https://doi.org/10.48550/arXiv.2006.09882.
Caron, M., H. Touvron, I. Misra, H. Jégou, J. Mairal, P. Bojanowski, and A. Joulin, 2021: Emerging properties in self-supervised vision transformers. arXiv, 2104.14294v2, https://doi.org/10.48550/arXiv.2104.14294.
Denby, L., 2020: Discovering the importance of mesoscale cloud organization through unsupervised classification. Geophys. Res. Lett., 47, e2019GL085190, https://doi.org/10.1029/2019GL085190.
Deneke, H., and Coauthors, 2021: Increasing the spatial resolution of cloud property retrievals from Meteosat SEVIRI by use of its high-resolution visible channel: Implementation and examples. Atmos. Meas. Tech., 14, 5107–5126, https://doi.org/10.5194/amt-14-5107-2021.
Fabel, Y., and Coauthors, 2022: Applying self-supervised learning for semantic cloud segmentation of all-sky images. Atmos. Meas. Tech., 15, 797–809, https://doi.org/10.5194/amt-15-797-2022.
Fukushima, K., 1975: Cognitron: A self-organizing multilayered neural network. Biol. Cybern., 20, 121–136, https://doi.org/10.1007/BF00342633.
Goyal, P., and Coauthors, 2021: Self-supervised pretraining of visual features in the wild. arXiv, 2103.01988, https://doi.org/10.48550/arXiv.2103.01988.
Grosvenor, D. P., and Coauthors, 2018: Remote sensing of droplet number concentration in warm clouds: A review of the current state of knowledge and perspectives. Rev. Geophys., 56, 409–453, https://doi.org/10.1029/2017RG000593.
He, K., X. Zhang, S. Ren, and J. Sun, 2015: Deep residual learning for image recognition. arXiv, 1512.03385v1, https://doi.org/10.48550/arXiv.1512.03385.
He, K., H. Fan, Y. Wu, S. Xie, and R. Girshick, 2020: Momentum contrast for unsupervised visual representation learning. arXiv, 1911.05722v3, https://doi.org/10.48550/arXiv.1911.05722.
Hornik, K., I. Feinerer, M. Kober, and C. Buchta, 2012: Spherical k-means clustering. J. Stat. Software, 50 (10), 1–22, https://doi.org/10.18637/jss.v050.i10.
Howard, L., 1803: I. On the modifications of clouds, and on the principles of their production, suspension, and destruction; being the substance of an essay read before the Askesian Society in the session 1802–3. Philos. Mag., 17, 5–11, https://doi.org/10.1080/14786440308676365.
Jakob, C., and G. Tselioudis, 2003: Objective identification of cloud regimes in the tropical western Pacific. Geophys. Res. Lett., 30, 2082, https://doi.org/10.1029/2003GL018367.
Jin, D., L. Oreopoulos, and D. Lee, 2017: Regime-based evaluation of cloudiness in CMIP5 models. Climate Dyn., 48, 89–112, https://doi.org/10.1007/s00382-016-3064-0.
Ketkar, N., 2017: Stochastic gradient descent. Deep Learning with Python, Apress, 113–132, https://doi.org/10.1007/978-1-4842-2766-4_8.
Kurihana, T., E. Moyer, R. Willett, D. Gilton, and I. Foster, 2022a: Data-driven cloud clustering via a rotationally invariant autoencoder. IEEE Trans. Geosci. Remote Sens., 60, 1–25, https://doi.org/10.1109/TGRS.2021.3098008.
Kurihana, T., E. J. Moyer, and I. T. Foster, 2022b: AICCA: AI-driven cloud classification atlas. Remote Sens., 14, 5690, https://doi.org/10.3390/rs14225690.
LeCun, Y., and F. J. Huang, 2005: Loss functions for discriminative training of energy-based models. Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, R. G. Cowell and Z. Ghahramani, Eds., Society for Artificial Intelligence and Statistics, 206–213, https://proceedings.mlr.press/r5/lecun05a.html.
Liu, H., J. Z. HaoChen, A. Gaidon, and T. Ma, 2022: Self-supervised learning is more robust to dataset imbalance. arXiv, 2110.05025v2, https://doi.org/10.48550/arXiv.2110.05025.
MacQueen, J., 1967: Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium Mathematical Statistics and Probability, Vol. 1, L. M. Le Cam and J. Neyman, Eds., University of California Press, 281–297, http://projecteuclid.org/euclid.bsmsp/1200512992.
Meirink, J. F., R. A. Roebeling, and P. Stammes, 2013: Inter-calibration of polar imager solar channels using SEVIRI. Atmos. Meas. Tech., 6, 2495–2508, https://doi.org/10.5194/amt-6-2495-2013.
Nakajima, T., and M. D. King, 1990: Determination of the optical thickness and effective particle radius of clouds from reflected solar radiation measurements. Part I: Theory. J. Atmos. Sci., 47, 1878–1893, https://doi.org/10.1175/1520-0469(1990)047<1878:DOTOTA>2.0.CO;2.
Oreopoulos, L., N. Cho, D. Lee, S. Kato, and G. J. Huffman, 2014: An examination of the nature of global MODIS cloud regimes. J. Geophys. Res. Atmos., 119, 8362–8383, https://doi.org/10.1002/2013JD021409.
Oreopoulos, L., N. Cho, D. Lee, and S. Kato, 2016: Radiative effects of global MODIS cloud regimes. J. Geophys. Res. Atmos., 121, 2299–2317, https://doi.org/10.1002/2015JD024502.
Platnick, S., and Coauthors, 2017: The MODIS cloud optical and microphysical products: Collection 6 updates and examples from Terra and Aqua. IEEE Trans. Geosci. Remote Sens., 55, 502–525, https://doi.org/10.1109/TGRS.2016.2610522.
Ranzato, M., Y.-L. Boureau, S. Chopra, and Y. LeCun, 2007: A unified energy-based framework for unsupervised learning. Proc. 11th Int. Conf. on Artificial Intelligence and Statistics, San Juan, Puerto Rico, PMLR, 371–379, https://proceedings.mlr.press/v2/ranzato07a.html.
Rasp, S., H. Schulz, S. Bony, and B. Stevens, 2020: Combining crowdsourcing and deep learning to explore the mesoscale organization of shallow convection. Bull. Amer. Meteor. Soc., 101, E1980–E1995, https://doi.org/10.1175/BAMS-D-19-0324.1.
Roebeling, R. A., A. J. Feijt, and P. Stammes, 2006: Cloud property retrievals for climate monitoring: Implications of differences between Spinning Enhanced Visible and Infrared Imager (SEVIRI) on METEOSAT-8 and Advanced Very High Resolution Radiometer (AVHRR) on NOAA-17. J. Geophys. Res., 111, D20210, https://doi.org/10.1029/2005JD006990.
Rossow, W. B., and R. A. Schiffer, 1999: Advances in understanding clouds from ISCCP. Bull. Amer. Meteor. Soc., 80, 2261–2288, https://doi.org/10.1175/1520-0477(1999)080<2261:AIUCFI>2.0.CO;2.
Rousseeuw, P. J., 1987: Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math., 20, 53–65, https://doi.org/10.1016/0377-0427(87)90125-7.
Russakovsky, O., and Coauthors, 2015: ImageNet large scale visual recognition challenge. Int. J. Comput. Vision, 115, 211–252, https://doi.org/10.1007/s11263-015-0816-y.
Sabottke, C. F., and B. M. Spieler, 2020: The effect of image resolution on deep learning in radiography. Radiol. Artif. Intell., 2, e190015, https://doi.org/10.1148/ryai.2019190015.
Schmetz, J., P. Pili, S. Tjemkes, D. Just, J. Kerkmann, S. Rota, and A. Ratier, 2002: An introduction to Meteosat Second Generation (MSG). Bull. Amer. Meteor. Soc., 83, 977–992, https://doi.org/10.1175/BAMS-83-7-Schmetz-2.
Schulz, H., R. Eastman, and B. Stevens, 2021: Characterization and evolution of organized shallow convection in the downstream North Atlantic trades. J. Geophys. Res. Atmos., 126, e2021JD034575, https://doi.org/10.1029/2021JD034575.
Stevens, B., and Coauthors, 2020: Sugar, gravel, fish and flowers: Mesoscale cloud patterns in the trade winds. Quart. J. Roy. Meteor. Soc., 146, 141–152, https://doi.org/10.1002/qj.3662.
Tan, J., C. Jakob, W. B. Rossow, and G. Tselioudis, 2015: Increases in tropical rainfall driven by changes in frequency of organized deep convection. Nature, 519, 451–454, https://doi.org/10.1038/nature14339.
Tian, B., M. A. Shaikh, M. R. Azimi-Sadjadi, T. H. Vonder Haar, and D. L. Reinke, 1999: Errata to "A study of cloud classification with neural networks using spectral and textural features." IEEE Trans. Neural Networks, 10, 722, https://doi.org/10.1109/TNN.1999.761731.
Tselioudis, G., W. Rossow, Y. Zhang, and D. Konsta, 2013: Global weather states and their properties from passive and active satellite cloud retrievals. J. Climate, 26, 7734–7746, https://doi.org/10.1175/JCLI-D-13-00024.1.
Tselioudis, G., W. B. Rossow, C. Jakob, J. Remillard, D. Tropf, and Y. Zhang, 2021: Evaluation of clouds, radiation, and precipitation in CMIP6 models using global weather states derived from ISCCP-H cloud property data. J. Climate, 34, 7311–7324, https://doi.org/10.1175/JCLI-D-21-0076.1.
Tzallas, V., A. Hünerbein, M. Stengel, J. F. Meirink, N. Benas, J. Trentmann, and A. Macke, 2022: CRAAS: A European cloud regime dataset based on the CLAAS-2.1 climate data record. Remote Sens., 14, 5548, https://doi.org/10.3390/rs14215548.
van den Oord, A., O. Vinyals, and K. Kavukcuoglu, 2017: Neural discrete representation learning. arXiv, 1711.00937v2, https://doi.org/10.48550/arXiv.1711.00937.
van der Maaten, L., and G. Hinton, 2008: Visualizing data using t-SNE. J. Mach. Learn. Res., 9, 2579–2605.
Visa, A., J. Iivarinen, K. Valkealahti, and O. Simula, 1998: Neural network based cloud classifier. World Scientific, 303–309, https://www.worldscientific.com/doi/abs/10.1142/9789812816955_0035.
Williams, K. D., and M. J. Webb, 2008: A quantitative performance assessment of cloud regimes in climate models. Climate Dyn., 33, 141–157, https://doi.org/10.1007/s00382-008-0443-1.