Combining crowd-sourcing and deep learning to explore the meso-scale organization of shallow convection

Humans excel at detecting interesting patterns in images, for example those taken from satellites. This kind of anecdotal evidence can lead to the discovery of new phenomena. However, it is often difficult to gather enough data of subjective features for significant analysis. This paper presents an example of how two tools that have recently become accessible to a wide range of researchers, crowd-sourcing and deep learning, can be combined to explore satellite imagery at scale. In particular, the focus is on the organization of shallow cumulus convection in the trade wind regions. Shallow clouds play a large role in the Earth’s radiation balance yet are poorly represented in climate models. For this project four subjective patterns of organization were defined: Sugar, Flower, Fish and Gravel. On cloud labeling days at two institutes, 67 scientists screened 10,000 satellite images on a crowd-sourcing platform and classified almost 50,000 mesoscale cloud clusters. This dataset is then used as a training dataset for deep learning algorithms that make it possible to automate the pattern detection and create global climatologies of the four patterns. Analysis of the geographical distribution and large-scale environmental conditions indicates that the four patterns have some overlap with established modes of organization, such as open and closed cellular convection, but also differ in important ways. The results and dataset from this project suggest promising research questions. Further, this study illustrates that crowd-sourcing and deep learning complement each other well for the exploration of image datasets.

(Capsule Summary) Crowd-sourcing and deep learning are combined to explore the meso-scale organization of shallow clouds in the subtropics.
The [...] selected ahead of the classification according to a similarity analysis of atmospheric conditions that resemble the conditions encountered during the DJF season east of Barbados, where these patterns were found to [...]

Finally, "Gravel" describes fields of granular features marked by arcs or rings. The typical scale of these arcs is around 20 km. We suspect that these patterns are driven by cold pools caused by raining cumulus clouds (Rauber et al. 2007). In this regard, Gravel is fundamentally different from open-cell MCC, which has larger cells that are driven by overturning circulations in the boundary layer. However, the line between these two mechanisms can blur at times.

It is also interesting to compare our subjectively chosen labels to those of Denby (2020), who used an unsupervised learning algorithm to automatically detect different types of cloud organization (their Fig. 2). Some of their patterns bear resemblance to our classes, e.g. "Sugar" seems to most closely correspond to their patterns A and B, "Gravel" to G and H. However, their automatically detected classes appear less striking to the human eye.

Crowd-sourced labels

To obtain a large pool of labeled images from the community, an accessible user interface is [...] and Gravel could be robustly detected. [...] class separately. We also computed an IoU score for the "Not classified" area. Finally, the "All [...] label. This results in three "no label"-"label" comparisons and three "label"-"label" comparisons.

Even with perfect agreement between the three Flower labelers, the mean IoU would only be 0.5. In reality it is 0.44 for this example. These "no label"-"label" pairs with IoU = 0 make up 63% [...] is an adjustable parameter, it has not been given instructions to only label larger patches. An interesting and advantageous feature of the segmentation algorithm is that, despite all training labels being rectangular, it appears to focus on the actual, underlying shape of the patterns, as seen in the rounded outlines of the predicted shapes. This suggests that despite the uncertainty in the human dataset, the deep learning algorithms are able to filter out a significant portion of this noise and manage to distill the underlying human consensus.
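The arithmetic behind the Flower example (three labelers marking Flower, one marking nothing, giving three "label"-"label" and three "no label"-"label" pairs) can be checked with a minimal Python sketch. The function names, the toy 10 by 10 masks and the choice to skip pairs in which both masks are empty are illustrative assumptions, not the code used in the study.

```python
from itertools import combinations

import numpy as np


def iou(mask_a, mask_b):
    """Intersection over union of two boolean masks.

    Returns None when both masks are empty (assumption: such pairs
    are skipped rather than scored).
    """
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return None
    return np.logical_and(mask_a, mask_b).sum() / union


def mean_pairwise_iou(masks):
    """Mean IoU over all labeler pairs, ignoring pairs with no labels at all."""
    scores = [iou(a, b) for a, b in combinations(masks, 2)]
    scores = [s for s in scores if s is not None]
    return float(np.mean(scores))


# Toy image: three labelers draw the identical Flower box, one draws nothing.
flower = np.zeros((10, 10), dtype=bool)
flower[2:6, 3:8] = True
no_label = np.zeros((10, 10), dtype=bool)

masks = [flower, flower.copy(), flower.copy(), no_label]

# Three "label"-"label" pairs score 1, three "no label"-"label" pairs score 0.
perfect_mean = mean_pairwise_iou(masks)
print(perfect_mean)
```

With imperfectly overlapping boxes the three "label"-"label" scores drop below 1, which is how a value such as the reported 0.44 can arise.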

To quantitatively compare the deep learning algorithms against the human labelers, we compute the mean IoU for each human individually as well as for the two algorithms (Fig. 4b). Both algorithms show a large agreement with the human labels for a random validation dataset. The fact that the scores are higher than the mean inter-human IoU directly reflects the fact that the algorithms tend to produce less noisy predictions. Further analysis shows that the algorithms inherit some biases from the human training labels. The frequency and accuracy of the predicted labels are higher for patterns with a higher inter-human agreement, most notably Flower (Supplemental Fig. 3), which could slightly bias the deep learning predictions.

The main advantage of deep learning algorithms is that they are very fast at inference, one [...]

To obtain global climatologies of Sugar, Flower, Fish and Gravel we ran the algorithm on daily global images for the entire year of 2017 (Fig. 7b-e). The resulting heatmaps reveal coherent hotspots for the four cloud patterns. The spatial distribution of these hotspots helps answer some further questions raised by the ISSI team's study. For instance, the heatmaps indicate that organization is most common over the ocean. Only Sugar, the one pattern characterized by its lack of mesoscale organization, was identified over land (but keeping in mind the potential bias of the algorithm). Our results also indicate that Sugar, followed by Flower, are the most common forms [...] patterns are more geographically intertwined, which is in agreement with the similarity of the environmental profiles in Fig. 6. Interestingly, Gravel seems to be relatively confined to the Barbados region, the west of Hawaii, and the southern tropical Pacific near regions of climatological convergence, like the South Pacific Convergence Zone (Fig. 2). Hence the prevalence of Gravel in [...] been highlighted in this paper, and the answers to which we present as follows.

The first question (Q1) was concerned with how best to configure a crowd-sourcing activity. We

Our second question, Q2, asked whether sufficient agreement exists between the human labelers to warrant scientific use of the labels. We believe that this is indeed the case. As discussed in the section titled "Inferences from human labels", there is a significant amount of disagreement between the participants, particularly because many cloud formations did not fit one of the four classes exactly. However, more importantly there was significant agreement on patterns that [...] (Table 1)

For the segmentation model the images were downscaled to 700 by 466 pixels (batch size = 6). To create the prediction masks, first a Gaussian filter with a half-width of 10 pixels was applied to smooth the predicted field. Then, for each pixel, the pattern with the highest probability among the four was used, provided this probability exceeded 30%. This last step counteracts the tendency to predict background, which is by far the most common class in the training set.
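The mask-creation steps can be illustrated with a short NumPy sketch. Several choices here are assumptions made only for illustration: the "half-width" is treated as the Gaussian standard deviation, the kernel is truncated at three standard deviations, image edges are padded by repeating border values, and background is encoded as -1. None of the names below come from the original code.

```python
import numpy as np

PATTERNS = ("Sugar", "Flower", "Fish", "Gravel")


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at 3 sigma (assumed cutoff)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def smooth(field, sigma):
    """Separable Gaussian blur of a 2D field, with edge-value padding."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(field, radius, mode="edge")

    def blur(v):
        return np.convolve(v, kernel, mode="valid")

    rows = np.apply_along_axis(blur, 1, padded)
    return np.apply_along_axis(blur, 0, rows)


def prediction_mask(probs, sigma=10.0, threshold=0.3):
    """Turn per-pattern probabilities of shape (4, H, W) into a label mask.

    Each pixel gets the index of the most probable pattern if that
    probability exceeds the threshold, and -1 (background) otherwise.
    """
    smoothed = np.stack([smooth(p, sigma) for p in probs])
    best = smoothed.argmax(axis=0)
    best_prob = smoothed.max(axis=0)
    return np.where(best_prob > threshold, best, -1)


# Toy field: confident Flower on the left, weak Gravel on the right.
probs = np.zeros((len(PATTERNS), 40, 60))
probs[1, :, :30] = 0.9
probs[3, :, 30:] = 0.2

# A small sigma keeps the toy example readable; the text quotes 10 pixels.
mask = prediction_mask(probs, sigma=2.0)
print(mask[20, 5], mask[20, 55])
```

In this toy field the left region is labeled Flower, while the weak Gravel signal on the right stays below the 30% threshold and is left as background.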