Unsupervised Clustering of Geostationary Satellite Cloud Properties for Estimating Precipitation Probabilities of Tropical Convective Clouds

ABSTRACT: Understanding the growth of tropical convective clouds (TCCs) is of vital importance for the early detection of heavy rainfall. This study explores the properties of TCCs that can cause them to develop into clouds with a high probability of precipitation. Remotely sensed cloud properties, such as cloud-top temperature (CTT), cloud optical thickness (COT), and cloud effective radius (CER), as measured by a geostationary satellite, are used to train a neural network. First, an image segmentation algorithm identifies TCC objects with different cloud properties. Second, a self-organizing map (SOM) algorithm clusters TCC objects with similar cloud microphysical properties. Third, the precipitation probability (PP) for each cluster of TCCs is calculated as the proportion of precipitating TCCs among the total number of TCCs. Precipitating TCCs are distinguished from nonprecipitating TCCs using Integrated Multi-Satellite Retrievals for Global Precipitation Measurement precipitation data. Results show that SOM clusters with a high PP (>70%) satisfy a certain range of cloud properties: CER ≥ 20 μm and CTT < 230 K. PP generally increases with increasing COT, but COT alone is not a clear indicator of a high PP. For relatively thin clouds (COT < 30), however, CER should be much larger than 20 μm to have a high PP. More importantly, these TCC conditions associated with a PP ≥ 70% are consistent across regions and periods. We expect our results will be useful for satellite nowcasting of tropical precipitation using geostationary satellite cloud properties.

SIGNIFICANCE STATEMENT: We aim to identify the properties of tropical convective clouds (TCCs) that have a high precipitation probability. We designed a two-step framework to identify TCC objects and the conditions of cloud properties for TCCs to have a high precipitation probability. The TCCs with a precipitation probability >70% tend to have a low cloud-top temperature and a cloud particle effective radius ≥ 20 μm. Cloud optical thicknesses are distributed over a wide range, but thinner clouds require a particle radius larger than 20 μm. These conditions of cloud properties appear to be unchanged under various spatial–temporal conditions over the tropics. This important observational finding advances our understanding of the cloud–precipitation relationship in TCCs and can be applied to satellite nowcasting of precipitation in the tropics, where numerical weather forecasts are limited.


Introduction
Tropical convective clouds (TCCs) are of great interest in atmospheric science as well as hydrological risk management. TCCs are found across the tropics in the 20°N–20°S region, where large populations are exposed to flood risks (Rentschler and Salhab 2020). In such vulnerable regions, early detection of precipitation from TCCs would be an effective way to minimize losses due to flooding. However, early detection remains challenging because TCCs can develop in as little as 3 to 6 h before producing heavy rainfall (Chen and Houze 1997). Forecasters, therefore, need to identify discernible indicators of precipitation potential from the optical and microphysical properties of developing TCCs.
Satellite observations are now indispensable to weather forecasting. A wide range of TCCs is continuously monitored by geostationary satellites. In the early days of satellite observations, detecting precipitating clouds was based on the statistical relationship between infrared brightness temperature T_B at 10.4 μm and surface rainfall rates. This band is generally known as an "atmospheric window channel," which is less affected by absorption by gas molecules and can identify thick clouds among high-altitude clouds. The T_B threshold method assumes that rainfall rates on the ground are proportional to the decrease in T_B, which serves as an indicator of the growth stage of sufficiently thick clouds, such as mature TCCs (Spencer et al. 1983; Yuter and Houze 1998; Yuan and Houze 2010). For example, Arkin (1979) proposed that the strongest correlation between cold cloud cover (above 10-km height) and precipitation over the tropics would appear for a T_B ≤ 235 K.
However, T_B snapshots capture entire clouds, including rainless areas, rather than precipitation-band boundaries. Mapes and Houze (1993) introduced adjusted T_B thresholds to distinguish precipitable areas. Thresholds from 180 to 265 K were commonly used to identify the area of TCCs. Within the TCC area, T_B below 200 K indicates cumulonimbus cores. The core region of TCCs can produce heavy precipitation, but cold cloud tops do not necessarily imply precipitation (Xu et al. 1999).
Moreover, these empirical T_B thresholds depend on the region and seasonal conditions. If upper thin-ice clouds overlie TCCs in their initial state, the observed T_B may be lower than would be expected under single-layer conditions. In other words, detecting the precipitating area of TCCs with only T_B thresholds involves large uncertainties.
Recent studies have used satellite retrievals of cloud properties to estimate precipitation probabilities in growing TCCs (Lin and Rossow 1996, hereinafter LR96; Hong et al. 2007; Rosenfeld et al. 2008; Mecikalski et al. 2011; Senf et al. 2015; Mecikalski et al. 2016; van Diedenhoven et al. 2016). Experimental and theoretical studies have shown that cloud properties, including cloud-top temperature (CTT), cloud optical thickness (COT), and cloud effective radius (CER), correlate with rainfall rates (Rosenfeld and Lensky 1998). CTT and COT are directly related to changes in the vertical growth of clouds. CER has been used to indicate the magnitude of the updraft and glaciation, which are prerequisites for the formation of raindrops in convective clouds. Using these properties, it is possible to estimate the probability of precipitation in TCCs. LR96 compared the liquid water path of nonprecipitating and precipitating clouds using COT and CER. They found that precipitating clouds have a liquid water path of more than 40 mg cm⁻². Rosenfeld et al. (2008) revealed that, in CTT–CER relationships, the probability of precipitation increases rapidly when the CER exceeds 14 μm, and conversion from cloud water to rainwater becomes frequent at CER ≥ 25 μm. Mecikalski et al. (2011) and Senf et al. (2015) showed that both COT and precipitation probability increased as the cloud deepened. These studies demonstrated the potential to infer precipitation using cloud properties as discernible indicators.
A problem with this approach is that the collected TCC properties also include features of areas that may not be directly relevant to precipitation. The pixel-averaged cloud properties of an entire TCC differ from the state appearing in the convective core region of the TCC. This is because the convective core, representing the latest growth state from a vigorous updraft, occupies only a portion of the total TCC area. For this reason, previous studies have elaborated on how to obtain the properties that belong only to the core of the TCC. These tasks rely on human labor to extract features from TCC objects. However, various studies that have attempted to detect individual TCC objects by manual tracking (Rosenfeld and Lensky 1998; Vila et al. 2008; Feng et al. 2018) are hampered by time and space constraints. Recent studies have attempted to segment cloud masks from satellite images effectively using deep learning methods (Nailussa'ada et al. 2018; Morales et al. 2018; Gonzales and Sakla 2019), but there is almost no research that classifies individual cloud objects according to cloud microphysics. Consequently, previous observational results on cloud–precipitation relationships have rarely been stable or reproduced.
This study aims to explore the cloud properties of TCCs that indicate a high probability of precipitation by training a neural network with daytime satellite imagery. We assumed that most TCCs over the tropics are mesoscale convective systems (MCSs). An MCS consists of cumulonimbus, nimbostratus, and anvil cirrus clouds. MCSs are generally located in the tropics and account for most of the rainfall there. They also play a significant role in heat and energy transport (Chen and Houze 1997; Houze 2014). We propose an efficient two-step framework to identify conditions for cloud properties based on individual TCC objects at various stages of growth. In the proposed network, we segment South Korean geostationary satellite images to detect TCC objects in tropical Pacific regions. Next, we use a clustering method to classify TCCs with similar cloud properties. We also analyze the cloud properties of clusters with a high probability of precipitation. Section 2 describes the dataset used, followed by an explanation of the image segmentation and clustering method in section 3. Section 4 explains the clustering results and cloud property conditions. Section 5 discusses the specific conditions associated with a high probability of precipitation.

Data
a. GEO-KOMPSAT-2A products
GEO-KOMPSAT-2A (GK2A) is a geostationary satellite managed by the Korea Meteorological Administration (Choi and Ho 2015). Considering the field of view of GK2A, we set the study area to 80°–175°E and 20°N–20°S, known as the Indo-Pacific warm pool, where the sea surface temperature is relatively high throughout the year (Takayabu 1994). TCCs frequently occur in this domain, directly influencing tropical Pacific Ocean rainfall patterns (Sassen et al. 2008; De Deckker 2016). We concentrated on the latitudes between 20°N and 20°S instead of the 30°N–30°S zone (known as the tropical belt) to minimize the influence of convective clouds from the subtropics (Schumacher and Houze 2003).
The GK2A satellite carries the Advanced Meteorological Imager (AMI), which provides high-resolution imagery in 16 visible and infrared bands. This study focuses on AMI-retrieved cloud property (CTT, COT, and CER) data at 2-km and 10-min resolutions. All of the retrieved AMI cloud properties are derived from the varying reflectance and emittance of liquid- and ice-phase clouds. The physical meaning and retrieval methods of the three cloud properties are described in detail below.
The CTT is the temperature corresponding to the emission level near cloud tops. In the GK2A algorithm, the CTT is calculated considering the actual cloud emissivity from the double-infrared spectrum test. Liquid clouds are assumed to have an emissivity of 1 in the 10.4-μm band, and the CTT is estimated from the amount of observed radiation (Kim et al. 2019). For ice- and uncertain-phase clouds, an additional emissivity calculated in the 11.2-μm band is used to estimate the range of CTTs, considering the various emissivities of ice particles (Strabala et al. 1994; Kim et al. 2019). In this study, we used CTT retrievals rather than T_B to account for multilayered and thin-ice clouds. Choi et al. (2007) showed that a significant linear relationship between T_B and CTT is valid for optically thick clouds (i.e., COT > 10). For optically thin clouds, T_B is usually higher than CTT.
The COT and CER retrieval is based on the bispectral solar reflectance method described by Nakajima and King (1990). Since then, retrieval methods have continued to develop, and the GK2A algorithm calculates the solar radiation for various particle phases, as in the cloud optical properties algorithm of MODIS Collection 6 (Platnick et al. 2015). In the GK2A algorithm, the reflection function of clouds in the nonabsorbing visible bands (0.64 and 0.87 μm) is primarily a function of COT. The COT is defined as the optical thickness in a vertical profile from the cloud bottom to its maximum height and is often referred to as cloud optical depth in MODIS products. COT quantifies the magnitude of extinction, due to scattering and absorption by cloud droplets, when light passes vertically through the cloud. The reflection function in the absorbing near-infrared band (1.6 μm) is primarily a function of the CER, which represents the radiative characteristics of whole clouds composed of particles of several different sizes, taking into account the size of each cloud particle in a column. The CER is the ratio of the third to the second moment of the particle-size distribution. In practice, an optimal estimation technique is used to determine the COT and CER from cloud reflectivity (Yang et al. 2013). The reflection, transmission, and absorption of clouds depend strongly on these two parameters, COT and CER.
The COT and CER are available only during the daytime because their accuracy depends mainly on visible radiation. Accordingly, AMI cloud data retrieved during the day–night transition in the tropics are flagged as low quality. To keep the data as clean as possible, we used hourly AMI cloud products between 0200 and 0400 UTC (1100 and 1300 Korea standard time) from August 2020 through December 2021.

b. GPM products
This study also uses the Global Precipitation Measurement (GPM) product, which is intercalibrated with precipitation data from infrared estimates of partner satellites and gauge analysis. The GPM was launched in early 2014 by the National Aeronautics and Space Administration and the Japan Aerospace Exploration Agency as the successor to the Tropical Rainfall Measuring Mission (TRMM) satellite. In comparison with TRMM, the GPM products provide more accurate rainfall estimates with finer spatiotemporal resolution (Prakash et al. 2016). Among the Integrated Multi-Satellite Retrievals for GPM (IMERG) products, we used the "final" run product (IMERG-F), which is released approximately 3.5 months after the observation time following climatological correction with the Global Precipitation Climatology Centre precipitation gauge analysis (Hou et al. 2014; Huffman et al. 2015).
The IMERG-F products were rescaled with the nearest-neighbor interpolation method to match the 2-km resolution of the AMI. IMERG-F provides precipitationCal data that include the precipitation rate (mm h⁻¹) at a resolution of 0.1° × 0.1° (approximately 10 km) on a half-hourly time scale with quasi-global coverage (60°N–60°S latitude band) (Huffman et al. 2015). We calculated the average and maximum rain rate for each TCC object using IMERG-F products. TCC objects with an average rain rate of more than 0.1 mm h⁻¹ were defined as precipitating clouds.
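The nearest-neighbor rescaling described above can be sketched as follows. This is a minimal illustration, not the processing chain used in the study: it assumes the coarse and fine grids are aligned and differ by an exact integer factor of 5 (roughly 10 km to 2 km), and the array values and the helper name `nearest_regrid` are hypothetical.

```python
import numpy as np

def nearest_regrid(coarse, factor):
    """Nearest-neighbour upsampling: copy each coarse cell into a
    factor x factor block of fine pixels (assumes aligned grids)."""
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)

# Hypothetical 2 x 2 block of 0.1-degree IMERG-F cells (rain rate in mm/h).
imerg = np.array([[0.0, 0.4],
                  [0.0, 0.0]])

# Assumed 5:1 ratio between the ~10-km IMERG grid and the 2-km AMI grid.
rain_2km = nearest_regrid(imerg, factor=5)
```

Each 2-km pixel simply inherits the value of the coarse cell that contains it, so no new rain-rate values are created by the rescaling.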

Methods
a. Network architecture
This section explains the details of the network architecture, as illustrated in Fig. 1.

1) DISTINGUISHING TCC OBJECTS FROM SEGMENTED SATELLITE IMAGERY
The image segmentation module in the "precipitating cloud classification using a self-organizing map (SOM)" (PCCS) network distinguishes cloud pixels from the background and identifies the location of each object. Each TCC object can consist of a single cloud cell or multiple cloud cells, reflecting the varying sizes and growth states of TCCs. We divided the input satellite imagery into multiple TCC objects and collected the cloud properties used to discriminate among precipitation conditions in the subsequent clustering module. CTT, COT, and CER values at the location of the minimum-CTT pixel were extracted as the cloud properties of each TCC object. Here, the minimum CTT is assumed to represent the latest physical state of TCC growth because cloud growth is centered on the location where the updraft is strongest.
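The extraction of properties at the minimum-CTT pixel can be illustrated with a short sketch. The function name `core_properties` and the 2 × 2 arrays are hypothetical; the only thing taken from the text is the rule that the three properties are read at the coldest cloud-top pixel of an object.

```python
import numpy as np

def core_properties(ctt, cot, cer, object_mask):
    """Return (CTT, COT, CER) at the coldest cloud-top pixel of one object."""
    # Pixels outside the object are set to +inf so they can never be the minimum.
    masked = np.where(object_mask, ctt, np.inf)
    row, col = np.unravel_index(np.argmin(masked), masked.shape)
    return float(ctt[row, col]), float(cot[row, col]), float(cer[row, col])

# Hypothetical 2 x 2 scene: CTT in K, COT unitless, CER in micrometres.
ctt = np.array([[230.0, 215.0], [205.0, 240.0]])
cot = np.array([[20.0, 35.0], [60.0, 10.0]])
cer = np.array([[18.0, 24.0], [30.0, 12.0]])
mask = np.array([[True, True], [True, False]])

props = core_properties(ctt, cot, cer, mask)
```

Reading all three properties at one pixel keeps them physically consistent with each other, in contrast to taking separate extremes or object-wide averages.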
Several studies have adopted the U-Net architecture for satellite imagery segmentation (Gonzales and Sakla 2019; Guo et al. 2020; Trebing et al. 2021). We adopted the U-Net (Ronneberger et al. 2015) architecture to generate a cloud mask map at the pixel level. As in Ronneberger et al. (2015), the network consisted of four encoder and four decoder blocks connected via bridges (i.e., skip connections) to avoid loss of information from layer to layer. A sigmoid activation function, added at the back end, produced the predicted cloud mask classified at the pixel level.
Before segmentation, we generated preprocessed data in which cirrus was filtered from the CTT imagery based on the International Satellite Cloud Climatology Project (ISCCP) cloud classification (Rossow and Schiffer 1999). Cirrus clouds, which appear in the upper atmosphere, are thin and comprise primarily ice particles. This type of extensive cloud cover can significantly affect Earth's radiation but is unlikely to evolve into precipitating clouds. According to ISCCP-based thresholds, cirrus pixels with a cloud-top pressure (CTP) < 440 hPa and a COT < 3.6 were eliminated from the CTT imagery. The preprocessed image was assigned a pixelwise class label (cloud and noncloud) and used as the reference data for image segmentation.
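The ISCCP-based cirrus filter amounts to a simple pixelwise mask. The sketch below assumes that clear-sky pixels are stored as NaN and that CTP is available on the same grid as CTT and COT; both are assumptions made for illustration, as are the function name `remove_cirrus` and the sample values.

```python
import numpy as np

CTP_MAX_HPA = 440.0  # cirrus threshold on cloud-top pressure (from the text)
COT_MAX = 3.6        # cirrus threshold on cloud optical thickness (from the text)

def remove_cirrus(ctt, ctp, cot):
    """Blank out cirrus pixels in CTT and build a cloud/noncloud label map."""
    cirrus = (ctp < CTP_MAX_HPA) & (cot < COT_MAX)
    ctt_filtered = np.where(cirrus, np.nan, ctt)
    # Label 1 = cloud kept for training, 0 = noncloud (clear sky or cirrus).
    label = (~cirrus) & np.isfinite(ctt)
    return ctt_filtered, label.astype(np.uint8)

# Hypothetical pixels: thick high cloud, thin cirrus, low cloud, clear sky (NaN).
ctt = np.array([[210.0, 225.0], [280.0, np.nan]])
ctp = np.array([[200.0, 300.0], [800.0, np.nan]])
cot = np.array([[45.0, 1.2], [10.0, np.nan]])

filtered, label = remove_cirrus(ctt, ctp, cot)
```

Only the pixel that is both high (CTP < 440 hPa) and optically thin (COT < 3.6) is removed; a high but thick pixel is retained as a candidate convective cloud.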
For U-Net training, we chose CTT and COT images as "input" data and cirrus-free CTT images as "reference" data. The network was trained from scratch with a learning rate of 10⁻⁴ using an Adam optimizer and a batch size of 8, as described by Gonzales and Sakla (2019). We trained the network for 10 000 iterations using binary cross-entropy loss and dice loss functions. Both loss terms measure pixelwise differences between the reference data and our predicted cloud mask. Figures 2a and 2b show the input satellite imagery, while Fig. 2c shows the reference data used for network training. In testing, the segmentation by the U-Net model achieved an average pixel accuracy of 97% in predicting cloud pixels, as shown in Fig. 2d. The resulting accuracy was superior to that reported by Gonzales and Sakla (2019).
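For readers unfamiliar with the two loss terms, the following sketch writes them out in plain NumPy. How the study combined them is not stated, so the unweighted sum in `combined_loss` is an assumption, as are the smoothing constant and the toy masks.

```python
import numpy as np

def bce_loss(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between a 0/1 reference mask and probabilities."""
    p = np.clip(y_prob, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p)))

def dice_loss(y_true, y_prob, smooth=1.0):
    """One minus the (smoothed) Dice overlap coefficient."""
    inter = np.sum(y_true * y_prob)
    return float(1.0 - (2.0 * inter + smooth) / (np.sum(y_true) + np.sum(y_prob) + smooth))

def combined_loss(y_true, y_prob):
    # Assumption: the two terms are simply added with equal weight.
    return bce_loss(y_true, y_prob) + dice_loss(y_true, y_prob)

reference = np.array([[1.0, 1.0], [0.0, 0.0]])  # toy cloud mask
good = np.array([[0.9, 0.8], [0.1, 0.2]])       # prediction close to the reference
bad = np.array([[0.2, 0.1], [0.9, 0.8]])        # prediction far from the reference
```

Cross-entropy penalizes each pixel independently, whereas the Dice term rewards overlap of the whole predicted mask, which is why the two are often used together for segmentation.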

2) UNSUPERVISED CLUSTERING OF GEOSTATIONARY SATELLITE CLOUD PROPERTIES
Supervised machine learning requires data labeled with the correct answer. However, it is impossible to label all the clouds that appear in satellite imagery because we do not know the correct boundaries of overlapping clouds. Learning on such unlabeled datasets is called unsupervised learning, which we therefore used for the tasks of this study. Clustering is an unsupervised machine learning approach that automatically groups data samples with similar characteristics. K-means is a simple and popular clustering algorithm for intuitively retrieving data structure. However, the K-means method is rarely applied to satellite-retrieved cloud data, given the high computational costs and difficulties in visualizing topological data. We also applied the K-means algorithm to our dataset, and its ability to separate dense data regions was worse than that of the SOM (not shown). We, therefore, adopted the SOM developed by Kohonen (1990) using the MiniSom library (Vettigli 2019). The SOM has proven highly successful in clustering high-dimensional data without requiring prior knowledge of the features of the input data or the class labels of the output data (Zhang et al. 2018). In addition, the mapping process of the SOM, which maps high-dimensional data to a plane, can preserve the topology, that is, the relative distances among the original data. This allows the network to analyze the cloud properties of each cluster by tracing back the corresponding features in the original satellite product.
The values of the cloud properties (CTT, COT, and CER) were extracted at the minimum-CTT pixel position of each TCC object. We generated SOM input vectors consisting of the three cloud properties for all n extracted TCC objects, so that the dimension of the input data was n × 3. Although sufficient TCC objects can be extracted from a single satellite image, we used monthly images to avoid overfitting the SOM results. We randomly selected the dates within each month. Using these input vectors, the SOM first initializes the weight vector of each node on a two-dimensional (x × y) lattice. In every iteration, the best matching unit for a training sample was selected from the set of weight vectors. The network then adjusted the weights of nearby nodes to be closer to the input data point.
We set the sizes of the SOM lattice on the x and y axes of the target discretized space to 4 and 5, respectively. These settings are empirical values. As the number of nodes increases, the data become overly fragmented, making it difficult to find general characteristics. We set sigma to 0.2 and performed 10 000 iterations. As a result, the SOM projected all TCC objects into 20 different clusters based on their characteristics, within minutes. In our network, the SOM was trained with a learning rate of 10⁻⁴ on all input data as an unsupervised learning technique.
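The SOM update described above can be sketched in a few lines of NumPy. This is a simplified stand-in for the MiniSom library used in the study, not a reproduction of it: the linear learning-rate decay, the initialization from random samples, the standardization of the inputs, and the synthetic data are all assumptions made for illustration. Only the lattice size (4 × 5), sigma (0.2), and learning rate (10⁻⁴) defaults follow the text.

```python
import numpy as np

def train_som(data, nx=4, ny=5, sigma=0.2, lr=1e-4, n_iter=10_000, seed=0):
    """Train a small rectangular SOM and return the (nx*ny, n_features) weights."""
    rng = np.random.default_rng(seed)
    # Initialise each node from a randomly chosen sample (assumption).
    weights = data[rng.integers(0, len(data), nx * ny)].astype(float)
    # Lattice coordinates of every node, used by the neighbourhood function.
    grid = np.array([(i, j) for i in range(nx) for j in range(ny)], dtype=float)
    for t in range(n_iter):
        sample = data[rng.integers(0, len(data))]
        # Best matching unit: node whose weight vector is closest to the sample.
        bmu = np.argmin(np.sum((weights - sample) ** 2, axis=1))
        # Gaussian neighbourhood centred on the BMU in lattice space.
        dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-dist2 / (2.0 * sigma ** 2))
        decay = 1.0 - t / n_iter  # simple linear decay (assumption)
        weights += lr * decay * h[:, None] * (sample - weights)
    return weights

def assign_clusters(data, weights):
    """Cluster index (1..n_nodes) of the closest node for every sample."""
    d2 = np.sum((data[:, None, :] - weights[None, :, :]) ** 2, axis=2)
    return np.argmin(d2, axis=1) + 1

# Synthetic stand-in for the n x 3 input (CTT in K, COT, CER in micrometres).
rng = np.random.default_rng(1)
samples = np.column_stack([
    rng.uniform(190, 260, 500),
    rng.uniform(1, 100, 500),
    rng.uniform(5, 50, 500),
])
# Standardise so the three properties contribute comparably (assumption).
z = (samples - samples.mean(axis=0)) / samples.std(axis=0)

weights = train_som(z, n_iter=2000)
clusters = assign_clusters(z, weights)
```

With sigma = 0.2 on an integer lattice, the Gaussian neighborhood is nearly zero for all nodes except the best matching unit, so in this sketch each update moves essentially one node at a time.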

b. Calculation of precipitation probability
We proposed a precipitation probability (PP) index that indicates the probability that a TCC in a cluster with specific cloud properties will have a rainfall rate ≥ 0.1 mm h⁻¹. Some studies (Hong et al. 2004; Kidder et al. 2005; Risyanto et al. 2019) have already defined and used various indices similar to PP (e.g., rain probability, probability of rain, tropical rainfall potential). The difference is that PP is calculated at the level of clusters classified by the similar cloud properties learned by the SOM, rather than for individual TCC objects.
In the image segmentation module of PCCS, we detected individual TCC objects from the contour lines of adjacent cloud mask pixels (here, we used the contour function of OpenCV in Python). First, we determined the average rainfall rate of each TCC object. We interpolated the IMERG-F data using the nearest-neighbor interpolation method: the IMERG-F value closest to a cloud pixel was taken as the rainfall rate of that pixel. The average rainfall rate was the sum of the rainfall rates of all pixels in the individual TCC object divided by its area. We assumed that the TCC object was precipitating when this value exceeded 0.1 mm h⁻¹ (Lau and Wu 2003; Tao et al. 2016).
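The object-level rain-rate calculation can be sketched as follows. To keep the example free of external dependencies, objects are found with a plain 8-connected flood fill, which is a stand-in for the OpenCV contour step named in the text, not the same algorithm. The function names and the toy arrays are hypothetical.

```python
from collections import deque
import numpy as np

def label_objects(mask):
    """Label 8-connected groups of cloud pixels; 0 marks the background."""
    labels = np.zeros(mask.shape, dtype=int)
    n_rows, n_cols = mask.shape
    current = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if not mask[r, c] or labels[r, c]:
                continue
            current += 1
            labels[r, c] = current
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one object
                y, x = queue.popleft()
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < n_rows and 0 <= nx < n_cols
                                and mask[ny, nx] and not labels[ny, nx]):
                            labels[ny, nx] = current
                            queue.append((ny, nx))
    return labels, current

def mean_rain_per_object(labels, n_objects, rain):
    """Object-mean rain rate: sum over the object's pixels divided by its area."""
    return {k: float(rain[labels == k].mean()) for k in range(1, n_objects + 1)}

cloud_mask = np.array([[1, 1, 0, 0],
                       [1, 0, 0, 1],
                       [0, 0, 0, 1]], dtype=bool)
rain = np.array([[0.3, 0.3, 0.0, 0.0],   # mm/h on the same grid as the mask
                 [0.0, 0.0, 0.0, 0.05],
                 [0.0, 0.0, 0.0, 0.05]])

labels, n = label_objects(cloud_mask)
rates = mean_rain_per_object(labels, n, rain)
precipitating = {k: v > 0.1 for k, v in rates.items()}  # 0.1 mm/h threshold from the text
```

In this toy scene the first object averages about 0.2 mm h⁻¹ and is flagged as precipitating, while the second averages 0.05 mm h⁻¹ and is not.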
We then calculated the PP for each cluster using Eq. (1). Every TCC object in the study domain is assigned to one of the given number of SOM nodes. For the cluster at each node, the PP was calculated as the ratio of the number N of precipitating TCC objects to the total number of TCC objects. If the PP of a cluster was high, we presumed that its TCC properties are highly correlated with precipitation. However, the number of SOM clusters and the criteria for separating them are set with empirical values and are, therefore, variable. To reduce the uncertainty, we analyzed only the range between the 25th and 75th percentiles of the physical properties appearing in each cluster.

\[
\mathrm{PP} = \frac{N(\text{precipitating TCCs in a cluster})}{N(\text{TCCs in a cluster})} \times 100\ (\%). \tag{1}
\]

We attempted to determine the PP threshold above which a cluster was very likely precipitating. For the same purpose, LR96 used the cloud water path (WP, defined as the weight of the water droplets in the cloud), which was estimated from the COT and CER as follows: WP = (0.6292 × COT × CER)/10. According to LR96, more than one-half of the precipitating clouds had WP values exceeding 40–50 mg cm⁻². Note that they assumed only liquid water particles in clouds.
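Equation (1) is a simple per-cluster ratio, as the following sketch shows. The cluster indices and precipitation flags are made-up values for illustration and are not taken from Table 1; the function name `precipitation_probability` is likewise hypothetical.

```python
from collections import defaultdict

def precipitation_probability(cluster_ids, precipitating_flags):
    """PP (%) per cluster: precipitating objects / all objects x 100, as in Eq. (1)."""
    total = defaultdict(int)
    wet = defaultdict(int)
    for cid, flag in zip(cluster_ids, precipitating_flags):
        total[cid] += 1
        wet[cid] += int(flag)
    return {cid: 100.0 * wet[cid] / total[cid] for cid in total}

# Hypothetical TCC objects: SOM cluster index and precipitating flag for each.
cluster_ids = [18, 18, 18, 18, 7, 7, 7, 7, 7]
flags = [True, True, True, False, True, False, False, False, False]

pp = precipitation_probability(cluster_ids, flags)
high_pp = sorted(cid for cid, value in pp.items() if value >= 70.0)
```

Here three of four objects in cluster 18 are precipitating (PP = 75%), so it passes the 70% threshold, whereas cluster 7 (one of five, PP = 20%) does not.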
We were able to obtain a new WP range from the specific COT and CER ranges (Table 1) revealed by SOM clustering. As a PP threshold that can distinguish precipitating TCCs, a PP ≥ 70% closely resembles the WP results of LR96 for precipitating clouds. We used this PP threshold in this study. Substituting the COT and CER values of the clusters satisfying PP ≥ 70% into LR96's equation, the WP of TCCs with a precipitation probability of 70% or more was 40–300 mg cm⁻². This new WP range is wider than that of LR96, probably because our data are not limited to liquid water but include the various water phases in TCCs.
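As a worked illustration of the LR96 relation quoted above, the sketch below evaluates WP for two hypothetical COT and CER pairs. The input values are chosen only to show the arithmetic and are not taken from Table 1; CER is assumed to be in micrometers and WP in mg cm⁻², following the text.

```python
def water_path(cot, cer_um):
    """LR96 relation quoted in the text: WP = 0.6292 * COT * CER / 10 (mg cm^-2)."""
    return 0.6292 * cot * cer_um / 10.0

# Hypothetical examples: a thinner cloud and a thick cloud with large particles.
wp_thin = water_path(30.0, 20.0)    # 0.6292 * 600 / 10
wp_thick = water_path(100.0, 40.0)  # 0.6292 * 4000 / 10
```

The first case gives about 37.8 mg cm⁻², just below the 40–50 mg cm⁻² level reported by LR96, and the second gives about 251.7 mg cm⁻², which falls inside the 40–300 mg cm⁻² range found for high-PP clusters.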

Results
In the PCCS network, each TCC object was assigned one of 20 cluster indices corresponding to the 20 SOM nodes. In comparison with the original input data (Fig. 3a), the index-colored cloud masks in Fig. 3b make each TCC object more discernible. In Fig. 3b, identically colored objects have the same cluster index, which means they have similar cloud properties. This simple method reduced the ambiguity in cloud recognition caused by a wide range of thin upper clouds. At the same time, small TCC objects that can develop into precipitating clouds were preserved. Approximately 700–800 TCC objects were detected in a single satellite image.
During the study period, the total number of TCC objects obtained from 15 randomly selected images in each month was approximately 10 500. Cloud property values were extracted from these TCC samples, input to the SOM, and then assigned to 20 clusters. The COT–CER scatterplot for each cluster index is shown in Fig. 4.
In Fig. 4, each marker represents one TCC object. This graph shows only the relationship between COT and CER, but the CTT values contributed an identical weight to the SOM clustering process (see Table 1). Markers of the same color represent TCC objects assigned to the same cluster. Of the 20 SOM clusters, the 8 clusters with a PP ≥ 70% are denoted by filled circles, and the "×" markers indicate the 12 clusters with a lower PP. Most clusters with a high PP had CER values of more than 20 μm (clusters 2, 4, 12, 13, 14, 17, 18, and 20). Among these high-PP clusters, PP tended to increase as COT increased, reaching 96% in the rightmost cluster (cluster 18). Also, clusters 12, 13, 14, 18, and 20, on the right side of the high-PP clusters, had a PP in excess of 80%. However, if the 75th-percentile COT value of a cluster fell below 30, the CER required for a high PP became progressively larger. On the left side, clusters 2, 4, and 17, with relatively thin (COT < 30) TCCs, showed an inverse relationship between CER and COT, as reported earlier by Nakajima and King (1990). These clusters have a relatively low PP (<80%).
Statistics for each cluster are provided in Table 1. The range of cloud properties is that between the 25th and 75th percentiles of all TCC values belonging to the same cluster. The averaged maximum rainfall rate was defined as the average of the maximum rainfall rates of the TCC objects in the same cluster, with values calculated from IMERG-F. The clusters in bold in Table 1 are the high-PP (≥70%) clusters. In general, and with few exceptions, precipitating clouds have lower CTT ranges than nonprecipitating clouds. Low-PP clusters fall in a CTT range of 215 to 235 K, whereas the high-PP clusters are associated with 190 to 230 K. If the CER was not large enough (<20 μm), the PP was small regardless of the other cloud property values. For example, clusters 1 and 15 have sufficiently large COTs but small CERs and high CTTs, indicating a low probability of precipitation.
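The 25th–75th percentile range reported for each cluster can be computed as in the sketch below. The CER values are hypothetical, and NumPy's default linear interpolation between order statistics is an assumption, since the study does not state which percentile definition was used.

```python
import numpy as np

def interquartile_range(values):
    """Return the (25th, 75th) percentile pair of one property within a cluster."""
    q25, q75 = np.percentile(values, [25, 75])
    return float(q25), float(q75)

# Hypothetical CER values (micrometres) of the TCC objects in one cluster.
cer_in_cluster = np.array([18.0, 22.0, 26.0, 30.0, 34.0])
cer_range = interquartile_range(cer_in_cluster)
```

Reporting the interquartile range rather than the full minimum-to-maximum span keeps outlying objects near cluster boundaries from dominating the description of each cluster.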
Next, because the results in Fig. 4 and Table 1 are average values over all input data, we examined how the cloud properties of the high-PP clusters differ under various spatiotemporal conditions. Because countries within the study domain have different monsoon and dry seasons owing to the intertropical convergence zone (ITCZ), we identified clusters satisfying the high-PP threshold depending on the location of the ITCZ.
Figure 5a shows the results for the ITCZ facing south from December to March (DJFM), and Fig. 5b shows the results for the ITCZ facing north from June to September (JJAS). To test the SOM, 15 scenes were randomly selected from the given period of each seasonal condition. Coincidentally, the pattern in which clusters have a high PP (≥70%) under both conditions has a shape similar to that in Fig. 4. Also, the overall scattered distribution of TCC objects is nearly identical. Under seasonal constraints, the highest-PP clusters appear at the far right of the distribution. In the DJFM season, cluster 3 has an 89% PP and 11.35 mm h⁻¹ of averaged maximum rainfall, and cluster 1 in the JJAS season has a 96% PP and 16.26 mm h⁻¹ of averaged maximum rainfall.

We also compared cloud properties according to where the clouds formed, finding distributions similar to the previous results. Takayabu et al. (1994) and Rosenfeld and Lensky (1998) demonstrated that continental and maritime clouds have different microphysical precipitation-forming processes. Park et al. (2007) reported that cases of lightning and low-T_B clouds were detected more frequently over land than over the ocean. To confirm the difference between the two locations, TCC objects were sorted into continental and maritime clouds using the land–sea mask of GK2A. The SOM test results for the continental–maritime dataset are shown in Fig. 6. Because clouds lying over both land and ocean were not considered here, the number of available TCC samples for clustering was lower than in the previous experiments.

TABLE 1. Cloud properties and precipitation probabilities (PP) of SOM clusters in TCC samples extracted from 15 randomly selected satellite images each month from August 2020 to December 2021 over the tropical warm pool. The boldface numbers are for the clusters of high PP (≥70%). The averaged max rainfall rate (mm h⁻¹) is obtained from IMERG-F and is defined as the mean of the maximum rain rate of TCCs in the same cluster. This was calculated with a bilinear interpolation process because the IMERG-F data have a coarser spatiotemporal resolution than the GK2A data. The range of cloud properties is between the 25th and 75th percentile values of each cluster. Cloud properties include cloud optical thickness (COT; no units), cloud effective radius (CER; μm), and cloud-top temperature (CTT; K).

Under both locational conditions, the pattern of high-PP (≥70%) clusters appearing where CER is larger than 20 μm is similar to the previous results. However, the highest-PP cluster, cluster 19, had a relatively small COT (Figs. 6a,b). For continental clouds (Fig. 6a), the PP was 93% for cluster 19, where the COT was between 40 and 50. In maritime clouds (Fig. 6b), cluster 19, with an 82% PP, had a COT of 30–40, slightly lower than that of the continental clouds. The CER variance of the high-PP clusters was larger in maritime clouds than in continental clouds.

Conclusions and discussion
This study revealed the relationship among the cloud properties (CTT, COT, and CER) of precipitating TCCs at various growth stages. Although we collected TCC objects under different spatiotemporal conditions, TCCs with high precipitation probabilities appeared under specific conditions of cloud properties. First, as shown in , high-precipitation-probability (PP ≥ 70%) clusters commonly satisfy the CER ≥ 20 μm and CTT ≤ 230 K conditions regardless of COT. However, relatively thin TCCs (COT < 30) must have a CER much larger than 20 μm in order to have a high PP. On the other hand, relatively thick TCCs (COT ≥ 30) generally showed a higher PP than thin TCCs. This result corresponds with a report by van Diedenhoven et al. (2014), who noted that convective clouds over the tropics have CER values of 28–36 μm. In addition, several studies have shown that in one-half of the cases the CER stays constant or decreases during the continuous growth of convective clouds (Mecikalski et al. 2011; Senf et al. 2015; van Diedenhoven et al. 2016; van Diedenhoven 2021). The decrease and congestion of CER with increasing COT in Fig. 4 can be accounted for by ice splintering from riming or particle sedimentation from fallout at the cloud top. This indicates that the high-PP (≥70%) clusters are either actively growing with vigorous convection or have already reached the mature stage of TCCs.
Furthermore, when the COT or CER condition is satisfied together with the CTT ≤ 230 K condition, that cluster shows the highest probability of rain. Although few objects with PP ≥ 70% were considered precipitating TCCs by the LR96 equation, tracking results showed that these TCCs occupied more than half of the total cloud area over the tropics and were accompanied by heavy rainfall, with maximum rates exceeding 6 mm h⁻¹ (Yuan and Houze 2010). However, the precipitating area is only a small part of a large TCC. When using only the T_B threshold method, it was difficult to distinguish the actual precipitation area from the cold clouds. In this context, the microphysical conditions revealed in this study can help delineate precipitation areas more strictly within TCCs covered with extensive anvil cirrus clouds.
Regardless of spatiotemporal conditions, the cloud properties of high-PP TCCs appear as general characteristics over the tropical warm pool region. However, the differences among the plots for the various spatiotemporal conditions are noteworthy, and two of them warrant discussion.
First, the cloud microphysical properties of high-PP clusters differed with season. In the DJFM season shown in Fig. 5a, the CER decreased slightly as the COT increased in high-PP clusters. The cluster with the largest PP among all clusters was in the most extensive COT range. The CER range for cluster 3 during the DJFM season is 25.26–33.56 μm, lower than the range for cluster 1 of the JJAS season (31.26–41.33 μm). In other words, TCCs in DJFM have smaller CER values than those in JJAS under the same COT conditions. Rosenfeld et al. (2008) revealed that a small CER value indicates the presence of a strong updraft in clouds. We can therefore presume that updrafts are stronger in the DJFM season than in the JJAS season.
Second, cloud characteristics over land may involve aerosol effects. Previous studies have shown that aerosols may lead to different physical processes of precipitation formation over land and over ocean (Rosenfeld and Woodley 2003). In general, continental air contains a higher concentration of aerosols, released from more pollution sources, than maritime air and therefore has a higher cloud condensation nuclei concentration. Continental clouds are made up of many small droplets, whereas maritime clouds consist of fewer, relatively large droplets. Continental clouds, therefore, glaciate at a lower temperature, in combination with updrafts caused by strong convection over land. Because the highest-PP clusters (a total of 19 in Figs. 6a and 6b) have a relatively small CER, we can infer that the convection over land is relatively strong (Rosenfeld et al. 2008).
The uncertainties of the data and the model should also be noted. We used the IMERG-F products to address the lack of observation equipment in the tropics. Several studies have evaluated the performance of the IMERG-F products in comparison with other observation products over the tropics (Liu 2016; Sunilkumar et al. 2019). The results showed significantly superior performance when compared with ground-gauge data, except for an underestimation of extreme precipitation (Xu et al. 2017). Ramadhan et al. (2022) showed that IMERG-F products tended to smooth out extreme rainfall values at an hourly time scale. However, we did not focus on extreme precipitation and evaluated only the minimum precipitation in the TCC, so this limitation had little impact on the conclusions of this study. In addition, in our PCCS network, the accuracy of clustering depends on the accuracy of TCC detection. We used a contour line-based classification method relying on pixel contiguity. However, this approach may result in errors for overlapping clouds. Future work will require more elaborate detection methods and reasonable classification policies for ambiguous cloud boundaries.
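The limitation of contiguity-based detection for overlapping clouds can be illustrated with a small example. The sketch below is not the contour line-based method used in this study; it substitutes plain 4-connected component labeling on a binary cloud mask as a stand-in, and the function name and toy masks are hypothetical. It shows that two clouds are counted as separate objects only while no cloudy pixels connect them.

```python
from collections import deque


def label_contiguous_objects(mask):
    """Label 4-connected regions of 1s in a binary mask (list of lists).

    Returns (labels, n_objects), where labels has the same shape as mask
    and holds 0 for background or an object index starting at 1.
    """
    n_rows = len(mask)
    n_cols = len(mask[0]) if n_rows else 0
    labels = [[0] * n_cols for _ in range(n_rows)]
    n_objects = 0
    for r in range(n_rows):
        for c in range(n_cols):
            if mask[r][c] != 1 or labels[r][c] != 0:
                continue
            # Start a new object and flood-fill its contiguous pixels.
            n_objects += 1
            labels[r][c] = n_objects
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < n_rows and 0 <= nx < n_cols
                            and mask[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = n_objects
                        queue.append((ny, nx))
    return labels, n_objects


# Two clouds separated by clear pixels.
separated = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
# The same two clouds after they grow and touch.
touching = [
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 1, 1],
]

print(label_contiguous_objects(separated)[1])
print(label_contiguous_objects(touching)[1])
```

Once the clouds touch, they are merged into a single object, so the cloud properties extracted for that object may mix two systems at different growth stages.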
The results are expected to help detect and predict precipitation in tropical regions by clarifying the conditions of cloud properties for high precipitation probabilities (Platt 1989; Hu and Stamnes 2000; Grabowski et al. 2019). Because TCCs directly impact tropical monsoons and can cause flooding, nowcasting that provides early warning is required. With the results of this study, TCCs satisfying the conditions of SOM clusters with high PPs can be detected in a target region. Because the experimental method is simple and insensitive to the region and period considered, it can be reproduced in any region for which satellite images with the same cloud products (CTT, COT, and CER) are available. Our results could be used to improve the initial conditions of numerical weather prediction (NWP) models or other models that forecast tropical rainfall. Because rainfall rates can be converted into latent heating rates, NWP models could benefit other routinely operated forecast algorithms.

The first author was supported by the Hyundai Motor Chung Mong-Koo Foundation.

FIG. 1. Overview of the proposed PCCS network, including an image segmentation module and an unsupervised clustering module. In the image segmentation module, the CTT and COT of AMI/GK2A are used as input data for the U-Net model for noncirrus cloud pixel masking. The cloud mask from the image segmentation module is multiplied by each cloud property image to distinguish TCC objects [section 3a(1)]. Next, we detect each TCC object and extract its cloud properties. In the unsupervised clustering module with a self-organizing map (SOM), we use a vector with components $(r_n^{\mathrm{CTT}}, r_n^{\mathrm{COT}}, r_n^{\mathrm{CER}})$ as the input dataset of the SOM. After each TCC object is allocated to a SOM cluster index, we calculate a precipitation probability for each cluster [section 3a(2)].

FIG. 2. The input data of U-Net: (a) CTT and (b) COT snapshots from GK2A (bands in parentheses are used) over the tropical warm pool (80°–175°E and 20°N–20°S) at 0300 UTC 4 Oct 2020. Bright pixels in (a) and (b) represent highly developed cloud regions. (c) Reference data for segmentation training; the cirrus clouds have been removed from the CTT image based on the ISCCP cloud classification thresholds. (d) The result of image segmentation by U-Net. In (c) and (d), the white cloud area and the black background area are shown as binary values (1, 0).

FIG. 3. (a) Normalized grayscale CTT imagery from GK2A (from 0 to 255) over the tropical warm pool at 0300 UTC 20 Sep 2021. (b) TCC objects obtained by applying the 20 SOM cluster indices to the cloud mask of (a). TCC objects shown in the same color are assigned to the same SOM node, in which TCCs have similar cloud microphysical properties. Cirrus clouds have been removed from the cloud area in (b) by the image segmentation module of PCCS.

FIG. 4. A scatterplot of COT and CER from the monthly random dataset for October 2020 to December 2021 in the tropical warm pool. Each symbol represents one TCC object, plotted at the COT and CER values extracted from its minimum-CTT pixel. The TCC objects in clusters that meet the PP criterion of 70% are denoted by colored circles [the cluster number and PP (%) are also marked]. The "×" symbols are TCC objects of clusters with PP < 70%. Each color represents a specific cluster index, meaning that same-colored TCCs have similar microphysical properties.

FIG. 5. As in Fig. 4, but for the seasonal condition: scatterplots of the random dataset obtained (a) during the DJFM 2020 and 2021 period and (b) for the JJAS 2020 and 2021 period. More than eight clusters indicated by filled circles satisfy the high-PP (≥70%) condition.

FIG. 6. As in Fig. 4, but for topological conditions: (a) continental clouds and (b) maritime clouds during the study period. Clouds that span continents and oceans are not included.