A Parameter for Quantifying the Macroscale Asymmetry of Tropical Cyclone Cloud Clusters

A parameter to quantify macroscale (i.e., systemwide) asymmetry of tropical cyclones (TC) in infrared satellite images, galaxy asymmetry (GASYM), which is adopted from astronomy, is described. In addition, an alternative approach to identify TC cloud clusters that is based on a density-based spatial clustering algorithm, cluster identification (CI), is presented in this study. Although a commonly used approach in TC study, the predefined radius of calculation (ROC), can be used to identify the TC region in the calculation of GASYM, this approach is not optimal because the size of the TC cloud cluster is often unknown in the calculation. The area specified by the ROC often includes pixels that do not belong to the TC cloud cluster and excludes pixels that belong to the TC cloud cluster. The CI approach addresses this issue by identifying TC cloud clusters of any size with any shape, because it depends solely on the threshold brightness temperature that corresponds to the upper bound of the brightness temperature of the specific cloud types. This study shows that the CI approach can be integrated into the GASYM calculation as an objective measure of TC symmetry. Although GASYM-CI and intensity are correlated, the relationship between GASYM-CI and intensity depends on the size of the TC cloud cluster. Comparison between GASYM and an existing objective method to quantify symmetry of TCs, the deviation angle variance technique, is also presented.


Introduction
One of the key observations about tropical cyclones (TCs) is that intense TCs tend to have circularly symmetric cloud tops in satellite images (e.g., Dvorak 1975). Since the cloud top can be considered as the horizontal cross section of a TC structure, the high degree of circular symmetry at the top implies a high degree of axisymmetry in the three-dimensional structure of the TC. An axisymmetric structure is more favorable to TC intensification than an asymmetric structure (Persing et al. 2013). The axisymmetric structure of TCs also forms the basis of many TC intensity theories such as maximum potential intensity (Emanuel 1986, 1988, 1995; Bister and Emanuel 1998). Moreover, the degree of circular symmetry of the cloud top of TCs has been one of the key ingredients for various intensity estimation methods, e.g., the Dvorak technique (Dvorak 1972, 1975, 1984; Olander and Velden 2007) and the deviation angle variance (DAV) technique (Piñeros et al. 2008, 2011; Ritchie et al. 2012, 2014). The DAV technique is an objective method to estimate TC intensity by quantifying the axisymmetry of a TC using infrared (IR) satellite imagery (Piñeros et al. 2008, 2011; Ritchie et al. 2012, 2014). The key idea of the DAV technique is to construct the distribution of deviation angle (DA), i.e., angular deviation of the brightness temperature gradient vectors for the pixels in the IR satellite image from the idealized circularly symmetric image (i.e., from the direction pointing radially from the center of the TC) within a predefined radius of calculation (ROC).
The degree of circular symmetry is then given by the variance of the DA distribution, that is, the DAV. Large DAV indicates that the TC cloud cluster (TCCC) is highly asymmetric and disorganized, and small DAV indicates that the TCCC is highly symmetric and organized. The DAV technique has been shown to be useful in determining the intensity of TCs. Time series of intensity can be built using the DAV technique, and it is correlated with the best-track intensity records (Piñeros et al. 2011; Ritchie et al. 2012, 2014). Furthermore, the DAV technique has been used in TC genesis detection (Piñeros et al. 2010; Rodriguez-Herrera et al. 2015; Wood et al. 2015) and significant wind radii estimation (Dolling et al. 2016).
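The DAV idea described above can be illustrated with a minimal Python sketch. It is not the published implementation: the function name, the central-difference gradient, and the folding of angles into the interval from −90° to 90° are assumptions made for illustration. The two synthetic fields (a radially symmetric "bowl" and a one-directional "ramp") are likewise hypothetical test inputs.

```python
import math
import statistics

def deviation_angle_variance(tb, center, roc_pixels):
    """Variance (deg^2) of deviation angles for pixels within roc_pixels of center.

    tb is a 2D list of brightness temperatures; center is (row, col).
    """
    r0, c0 = center
    angles = []
    for r in range(1, len(tb) - 1):
        for c in range(1, len(tb[0]) - 1):
            dy, dx = r - r0, c - c0
            dist = math.hypot(dx, dy)
            if dist == 0 or dist > roc_pixels:
                continue
            # Central-difference gradient of brightness temperature.
            gx = (tb[r][c + 1] - tb[r][c - 1]) / 2.0
            gy = (tb[r + 1][c] - tb[r - 1][c]) / 2.0
            if gx == 0 and gy == 0:
                continue  # gradient direction undefined
            # Angle between the gradient and the radial direction.
            da = math.degrees(math.atan2(gy, gx) - math.atan2(dy, dx))
            # Assumed convention: fold into [-90, 90) so that inward- and
            # outward-pointing gradients count as equally "radial".
            angles.append((da + 90.0) % 180.0 - 90.0)
    return statistics.pvariance(angles)

size, mid = 41, 20
# Hypothetical fields: a circularly symmetric bowl and a uniform ramp.
bowl = [[(r - mid) ** 2 + (c - mid) ** 2 for c in range(size)] for r in range(size)]
ramp = [[float(c) for c in range(size)] for r in range(size)]
dav_bowl = deviation_angle_variance(bowl, (mid, mid), 15)
dav_ramp = deviation_angle_variance(ramp, (mid, mid), 15)
```

Under these assumptions the symmetric bowl gives a DAV near zero, while the ramp, whose gradient ignores the storm center, gives a much larger value.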
Since the DAV technique uses gradients of the brightness temperature field in the calculation, the DAV is sensitive to pixel-to-pixel variations (i.e., small-scale features) in the brightness temperature field, which include actual small-scale features and instrumental artifacts. On the other hand, the DAV technique utilizes all DA within the area of a given ROC; hence it also includes information on systemwide (i.e., macroscale) features. This means that information on small-scale and macroscale features is combined in the DAV technique. To have a better understanding of the relationship between macroscale asymmetry of TCCC and TC intensity, it is useful to quantify macroscale asymmetry and small-scale asymmetry independently. In this study we focus on quantifying the macroscale asymmetry of TCCC by adapting a parameter from astronomy, called the galaxy asymmetry (GASYM) parameter, which is widely used for examining asymmetry in the study of galaxy morphology (Conselice 1997; Conselice et al. 2000). We present the case that GASYM is a useful parameter for quantifying macroscale asymmetry for TCCC that can be used in future studies of TCs using machine-learning techniques.
In addition, this study investigates the disadvantages of using predefined ROC and introduces a new approach to address the issue. The use of predefined ROC is a common practice in the study of TCs because it is difficult to determine the size of TCCCs using satellite images; in particular, TCs could be surrounded by random unorganized tropical cloud clusters and random deep convection. However, because the size of the TCCC varies as the TC evolves (Knaff et al. 2014) and the TCCC may not have a circular shape, some of the TC-related information in a satellite image (i.e., clouds that are associated with the TC) that lies farther from the TC center than the ROC would be excluded from the calculation. Similarly, some of the non-TC-related information in a satellite image, for example, pixels of the ocean and land surfaces and non-TC cloud pixels, could be included in the calculation. Consequently, a calculation using predefined ROC is not optimal. Ritchie et al. (2014) pointed out that due to the large variation of TC sizes in the western North Pacific Ocean, it would be better to use the radius of the TC as the ROC. However, no such algorithm has been developed to determine the size of the TCCC from IR satellite imagery. In the series of papers by Piñeros et al. (2011) and Ritchie et al. (2012, 2014), the authors attempted to find the optimal ROC, which would lead to the optimal relation between DAV and intensity, by testing many different ROC. The result shows that different basins and different years have different optimal ROC, which indicates that the optimal ROC is case dependent. In this study, we introduce a new approach to identify TCCCs that does not rely on predefined ROC. This approach makes use of a density-based spatial clustering algorithm, Ordering Points to Identify the Clustering Structure (OPTICS; Ankerst et al. 1999).
We should emphasize that while the DAV technique is used as a comparison with GASYM in this study, the scope of this study and the series of studies of the DAV technique (Piñeros et al. 2008, 2010, 2011; Ritchie et al. 2012, 2014; Rodriguez-Herrera et al. 2015; Wood et al. 2015; Dolling et al. 2016) are not the same. There are three objectives in this study: 1) the introduction of a parameter to quantify macroscale asymmetry of TCs using IR images (i.e., GASYM), 2) an investigation of the relationship between macroscale asymmetry of TCCC and intensity, and 3) the introduction of an approach to identify TCCCs without predefined ROC.
The rest of the paper is organized as follows. The dataset of historical observations used in this study is described in section 2. The construction of GASYM is discussed in section 3. Section 4 presents an analysis of the role of macroscale asymmetry in TC intensity using historical observations. Some specific examples of the behavior of DAV and GASYM are shown and discussed in section 5. Section 6 addresses the predefined ROC issue by introducing a cluster identification method using OPTICS (Ankerst et al. 1999). Discussion can be found in section 7. The conclusions of this study and some remarks are presented in section 8.

Data
IR satellite images of TCs are obtained from the Hurricane Satellite Data B1, version 5 (HURSAT b1 v05; Knapp and Kossin 2007). HURSAT contains storm-centered satellite images of TCs around the globe, which are obtained from various geostationary satellites, with a resolution of ~0.07°. The IR window channel (10.3-12.1 μm) measures the brightness temperature of land, sea, ice, and clouds. This provides information on the cloud top temperature of TCs. Satellite images in HURSAT have been recalibrated and shown to be temporally consistent (Knapp and Kossin 2007). Thus, HURSAT can be used for global TC analysis.
Following Piñeros et al. (2011) and Ritchie et al. (2012, 2014), all of the IR satellite images are first transformed into a Cartesian grid of 10 km × 10 km. Piñeros et al. (2011) indicated that reducing the resolution of images does not affect the results of the DAV calculation but it is computationally less expensive. Missing lines of data or unrealistic values in these satellite images are corrected by interpolation using "poisson_grid_fill" in the NCAR Command Language (NCL) provided that the percentage of missing pixels or pixels with unrealistic values is ≤35%. The threshold (i.e., 35%) of whether the above correction should be performed on the images is somewhat arbitrary. It aims to remove images that are completely unusable. Furthermore, this procedure should not induce significant smoothing bias caused by interpolation to the overall dataset because over 97% of the images in HURSAT have ≤11% of the image pixels that are either missing or have unrealistic values.
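The screening step can be sketched as follows. This covers only the 35% acceptance test, not the interpolation itself. The function names and the bounds used to flag "unrealistic" values (150 to 350 K) are hypothetical, since the text does not state how unrealistic values are defined.

```python
import math

def missing_fraction(tb, valid_min=150.0, valid_max=350.0):
    """Fraction of pixels that are missing (None or NaN) or outside the
    assumed realistic range [valid_min, valid_max] in kelvin."""
    total = bad = 0
    for row in tb:
        for value in row:
            total += 1
            if value is None or math.isnan(value) or not (valid_min <= value <= valid_max):
                bad += 1
    return bad / total

def is_usable(tb, max_bad_fraction=0.35):
    """True when the image is eligible for gap filling (bad fraction <= 35%)."""
    return missing_fraction(tb) <= max_bad_fraction

# Hypothetical 10 x 10 image with 35 missing pixels: right at the limit.
image = [[250.0] * 10 for _ in range(10)]
for i in range(35):
    image[i // 10][i % 10] = float("nan")
usable_at_limit = is_usable(image)
```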
The HURSAT dataset consists of satellite images from multiple geostationary satellites. Although some of these satellites were monitoring the same ocean basins at the same time, they were at different longitudes over the equator, and the viewing zenith angle (VZA) of the satellites at the TC center would be different. The difference in the VZA would affect the image in several ways as summarized by Kidder and Vonder Haar (1995). To minimize the issues associated with large VZA, the image with the lowest VZA at the TC center among all of the images of the same TC at the given time stamp is used. There may be a consistency issue with switching satellites used for the same TC that are due to VZA. However, the number of occasions in which an image from a different satellite is used because of smaller VZA is less than 1.7% of the total number of images. Thus, this should not have a significant influence on the result. In this study, the number of images used for tropical depressions, tropical storms, and hurricanes is 37 033, 32 032, and 19 634, respectively.
The International Best Track Archive for Climate Stewardship (IBTrACS) (Knapp et al. 2010) is used to locate TCs. The IBTrACS dataset is the most complete archive of global TC best-track data today. Most of the best-track records are recorded in 6-h intervals, whereas the HURSAT data are in 3-h intervals. Therefore only 0000, 0600, 1200, 1800 UTC data are used in the study. The basin classification as defined by the IBTrACS is used in the current study. These basins are the North Atlantic (NA), the South Atlantic (SA), the western North Pacific (WP), the eastern North Pacific (EP), the South Pacific (SP), the northern Indian (NI), and the southern Indian (SI). SA is not analyzed in this study as only one TC is observed in that basin throughout the satellite era.
The study period is 1979-2009 to avoid poor observations in the early observational period. The intensity data around the globe are combined by first converting into 1-min maximum wind and then averaging it over all possible records of the same TC from various agencies, except for the China Meteorological Administration (CMA) dataset. CMA records 2-min maximum sustained wind speed of TCs. The values of 2-min maximum sustained wind speed are theoretically similar to 1-min maximum sustained wind speed (Powell et al. 1996). However, the CMA records have similar behavior as 10-min maximum sustained wind speed, and the 1-min maximum sustained wind records converted from the CMA records are statistically lower than other converted records for the same storm. Thus, the CMA records are not used in the current study to avoid significant inconsistency in the intensity data. It should be noted that this method of combining intensity records would result in a more complete intensity record, yet discrepancies in the intensity records between different agencies exist (Knapp and Kruk 2010;Schreck et al. 2014) due to the subjectivity of the intensity estimation method and the different conversion tables used.
In summary, the dataset consists of (i) IR brightness temperature images from HURSAT between 1979 and 2009 at 0000, 0600, 1200, and 1800 UTC, centered on the TC center and interpolated to a Cartesian grid with a spacing of 10 km × 10 km, with 65% or more nonmissing pixels (if multiple images are available in HURSAT for a TC at a particular time, the image captured with the smallest VZA is used); and (ii) combined 1-min maximum wind intensity data from IBTrACS. CMA records and records in SA are excluded. Some of the data may be removed from the dataset for the reasons outlined in the following sections.

a. Definition
In astronomy, the asymmetry of galaxies is of interest because it is related to the age and history of the galaxies. Conselice (1997) proposed a method to calculate the degree of asymmetry of a galaxy using images from astronomical telescopes. As mentioned in the introduction, we refer to this asymmetry parameter as the galaxy asymmetry, or GASYM, parameter in this study. Conselice's method has been used extensively in the study of galaxy morphology due to the simplicity and nonparametric nature of the parameter.

SEPTEMBER 2020 NG ET AL.
The key idea of Conselice's method is to take the difference between the original image and the image rotated by 180° about the center of the galaxy. We adopt and modify this idea so that it can be used on satellite images of TCs [Eq. (1)]. In Eq. (1), the brightness temperature of the ith cloud pixel in the original image is compared with T_i^(180), the brightness temperature of the ith cloud pixel in the image rotated by 180° about the center of the TC, and T_b is the background value of the image. The selection of T_b is discussed below. By construction, the value of GASYM ranges between 0 and 1. For a TC that is completely circularly asymmetric, GASYM = 1; for a TC that is completely circularly symmetric, GASYM = 0. Although there are known issues with the use of ROC, ROC is still necessary at this point for computing GASYM. The cluster identification (CI) approach is introduced in section 6 to address the issue associated with the use of ROC.
Since TCs consist of high clouds and deep convective clouds, pixels with brightness temperature above T_b are set to T_b, which ensures that pixels that do not correspond to high clouds or deep convective clouds are not involved in the calculation. In addition, if the average brightness temperature of the area in question is higher than T_b (before the pixels with brightness temperature above T_b are set to T_b), we do not calculate GASYM because the number of pixels that contribute to the calculation is insufficient.
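A minimal Python sketch of this calculation is given below. The displayed form of Eq. (1) is not reproduced in this text, so the sketch assumes a normalized absolute-difference form in the spirit of Conselice's parameter, namely the sum of |T − T^(180)| divided by twice the sum of |T − T_b|, which is bounded by 0 and 1. That form, the function names, and the square, odd-sized, storm-centered image layout are all assumptions for illustration and may differ from the authors' definition.

```python
import math

def rotate_180(img):
    """Rotate a 2D list by 180 degrees about its central pixel."""
    return [row[::-1] for row in img[::-1]]

def gasym(tb, t_b, roc_pixels):
    """Assumed GASYM form for a square, odd-sized, storm-centered image.

    Returns None when the mean brightness temperature inside the ROC
    exceeds t_b (too few cloud pixels), following the rule in the text.
    """
    n = len(tb)
    mid = n // 2
    inside = [(r, c) for r in range(n) for c in range(n)
              if math.hypot(r - mid, c - mid) <= roc_pixels]
    if sum(tb[r][c] for r, c in inside) / len(inside) > t_b:
        return None
    # Pixels warmer than the background are set to the background value.
    clipped = [[min(v, t_b) for v in row] for row in tb]
    rotated = rotate_180(clipped)
    numerator = sum(abs(clipped[r][c] - rotated[r][c]) for r, c in inside)
    denominator = 2.0 * sum(abs(clipped[r][c] - t_b) for r, c in inside)
    return numerator / denominator if denominator > 0 else None

n = 21
# Hypothetical images: a uniform cold shield, a cloud on one side only,
# and a cloud-free scene.
symmetric = [[200.0] * n for _ in range(n)]
one_sided = [[200.0 if r < 10 else 260.0 for _ in range(n)] for r in range(n)]
cloud_free = [[290.0] * n for _ in range(n)]
results = [gasym(img, 248.0, 10) for img in (symmetric, one_sided, cloud_free)]
```

With this assumed form, the uniform shield gives 0, the one-sided cloud gives 1, and the cloud-free scene is skipped.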

b. T_b for high clouds
In the literature, many different cloud top temperature thresholds have been used to identify high clouds (Machado and Rossow 1993; Mapes and Houze 1993). If we define that the highest cloud top of midlevel cloud is located at 8 km above Earth's surface (i.e., high clouds are any cloud with cloud top height above 8 km), and assume that 1) the tropical atmosphere has a lapse rate that is close to the moist adiabatic lapse rate and 2) the temperature of the ocean surface is around 300 K (27°C), the temperature at 8 km above Earth's surface is about 248 K (−25°C). This is similar to the threshold that was used by Machado and Rossow (1993) [i.e., 245 K (−28°C)]. Thus T_b = 248 K (−25°C) is chosen to be the temperature threshold for the high clouds in this study.
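The 248-K value can be checked with a one-line estimate. The calculation below assumes a constant layer-mean lapse rate of about 6.5 K km⁻¹ as a stand-in for the moist adiabat between the surface and 8 km; that number is an assumption for illustration and is not given in the text.

```latex
T(8\,\mathrm{km}) \approx T_{\mathrm{sfc}} - \Gamma z
  = 300\,\mathrm{K} - \left(6.5\,\mathrm{K\,km^{-1}}\right)\left(8\,\mathrm{km}\right)
  = 248\,\mathrm{K} \approx -25\,^{\circ}\mathrm{C}
```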

c. T_b for deep convective clouds
Other than the high-cloud temperature threshold [248 K (−25°C)], a lower temperature threshold that represents the deep convective clouds can also be used to isolate the deep convective TCCC from the surrounding cloud clusters. In the literature, the typical cloud top temperature threshold corresponding to deep convective clouds takes a value between 208 and 219 K (−65° and −54°C) (Fu et al. 1990; Mapes and Houze 1993; Chen et al. 1996; Evans and Shemo 1996; Machado et al. 1998; Hall and Vonder Haar 1999; Zuidema 2003), with the exception of Tian et al. (2004), who used 230 K (−43°C). We choose the temperature threshold for deep convective clouds to be 219 K (−54°C), i.e., the upper bound of the typical range.

d. Limitations
One of the limitations of GASYM is that it cannot distinguish a TC with 180° rotational symmetry (such as an elliptical TC) from a circularly symmetric TC. A solution to this problem is to introduce another parameter, GASYM90, which is similar to GASYM, except that the comparison is made between the original image and the 90° rotated image. The parameter is effective because an object with 180° rotational symmetry does not necessarily have continuous rotational symmetry. For example, an ellipse is an object with 180° rotational symmetry but it does not have continuous rotational symmetry. The value of GASYM of an ellipse is zero but the value of GASYM90 is not. Figure 1 shows the joint probability density function (PDF) for the entire dataset in the GASYM-GASYM90 space for both T_b = 248 and 219 K with ROC = 100, 300, and 500 km. Although the population is displaced away from the 1-to-1 line due to the quasi-rotational symmetry of the TCCCs, it is clear that it is very rare to find observations with high GASYM but low GASYM90 and vice versa. Thus, we can conclude that it is very rare to have a highly elliptical TCCC with 180° rotational symmetry and that GASYM90 is not needed in practice.
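The ellipse example can be reproduced with a short Python sketch. As an assumption for illustration, it uses a normalized absolute-difference asymmetry (sum of |T − T_rot| divided by twice the sum of |T − T_b|), which may differ from the exact form of Eq. (1). The function names and the synthetic elliptical cloud shield are hypothetical.

```python
def rotate_90(img):
    """Rotate a square 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def rotate_180(img):
    """Rotate a 2D list by 180 degrees about its central pixel."""
    return [row[::-1] for row in img[::-1]]

def asymmetry(img, rotated, t_b):
    """Assumed normalized form: sum|T - T_rot| / (2 * sum|T - T_b|)."""
    n = len(img)
    cells = [(r, c) for r in range(n) for c in range(n)]
    numerator = sum(abs(img[r][c] - rotated[r][c]) for r, c in cells)
    denominator = 2.0 * sum(abs(img[r][c] - t_b) for r, c in cells)
    return numerator / denominator

T_B = 248.0
n, mid = 41, 20
# Elliptical cold cloud shield (200 K) on a background already set to T_B.
ellipse = [[200.0 if ((c - mid) / 16.0) ** 2 + ((r - mid) / 6.0) ** 2 <= 1.0 else T_B
            for c in range(n)] for r in range(n)]
gasym_180 = asymmetry(ellipse, rotate_180(ellipse), T_B)
gasym_90 = asymmetry(ellipse, rotate_90(ellipse), T_B)
```

Here `gasym_180` is zero because the ellipse maps onto itself under a 180° rotation, while `gasym_90` is positive because the rotated ellipse only partly overlaps the original.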
The other limitation of GASYM is the use of T_b. If a majority of the pixels that are used in the calculation have values close to T_b, the value of GASYM would have a bias toward the symmetric side. This effect can be seen from Eq. (1). Immature and weak TCCCs would be influenced by this effect because their average cloud top temperature would be close to T_b. However, since immature and weak TCs are usually highly asymmetric, the impact on the current analysis is negligible.

Macroscale asymmetry of TCCC and TC intensity
In this section, we investigate the relationship between macroscale asymmetry and intensity using historical observations. This is done by comparing the original, unaltered images with corresponding images that are spatially smoothed using a Gaussian smoothing filter to remove small-scale features. The Gaussian smoothing filter is a weighted averaging procedure with the weight W_g determined by the 2D Gaussian distribution,

$$W_g(r) = \frac{1}{2\pi\sigma_{\mathrm{smooth}}^{2}} \exp\left(-\frac{r^{2}}{2\sigma_{\mathrm{smooth}}^{2}}\right),$$

where r is the distance from the subject pixel to the target pixel and σ_smooth is the smoothing standard deviation that controls the behavior of W_g. More weight is given to the subject pixel and less weight to the other pixels with increasing distance.
Values of σ_smooth of 1, 2, and 3 grid units are used for light, medium, and heavy smoothing, respectively. A visual example of the performance of the Gaussian smoothing filter is shown in Fig. 2. Smoothing with σ_smooth = 3 grid units is sufficient to smooth out the small-scale features without distorting the macroscale structures as demonstrated in Fig. 2, and the images with heavy, medium, and light smoothing and the original image can be considered as a sequence with increasing amplitude of pixel-to-pixel brightness temperature variation (i.e., small-scale asymmetry).
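The smoothing step can be sketched in plain Python as a normalized, Gaussian-weighted average. The truncation of the kernel at three standard deviations and the renormalization of weights near the image edge are assumptions made for this sketch; the text does not specify either choice. The checkerboard test image is hypothetical.

```python
import math

def gaussian_smooth(img, sigma, truncate=3.0):
    """Gaussian-weighted average of a 2D list; sigma is in grid units."""
    rows, cols = len(img), len(img[0])
    radius = int(math.ceil(truncate * sigma))
    # Precompute the weight for every offset inside the truncated kernel.
    kernel = [(dr, dc, math.exp(-(dr * dr + dc * dc) / (2.0 * sigma * sigma)))
              for dr in range(-radius, radius + 1)
              for dc in range(-radius, radius + 1)]
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            weight_sum = value_sum = 0.0
            for dr, dc, w in kernel:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    weight_sum += w
                    value_sum += w * img[rr][cc]
            # Dividing by the in-bounds weights keeps the result a true average.
            out[r][c] = value_sum / weight_sum
    return out

# Hypothetical field with pure pixel-to-pixel variation of +/- 5 K about 205 K.
board = [[205.0 + 5.0 * (-1) ** (r + c) for c in range(15)] for r in range(15)]
light = gaussian_smooth(board, sigma=1.0)
```

Even the light setting removes nearly all of the pixel-to-pixel signal in this example, leaving the interior of the field close to its 205-K mean.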
FIG. 2. Effect of the Gaussian smoothing filter on the brightness temperature field: the image of a TC with (a) no smoothing (i.e., the original image) and smoothing with σ_smooth of (b) 1, (c) 2, and (d) 3 grid units.

The median and interquartile range of DAV and GASYM of the original and smoothed images at different intensity are shown in Figs. 3 and 4, respectively. Figures 3 and 4 share many similarities. The overall trend is the same for all ROC for both DAV and GASYM; i.e., symmetry increases as intensity increases. This is in agreement with the general observation of TCs. The change of asymmetry parameters per unit intensity is larger for small ROC in comparison with large ROC. This is because it is more likely to include pixels that are not related to the TCCC (i.e., noise pixels) in the calculation when a large ROC is used. For DAV, since there is no preferred direction in which the local temperature gradient should point for noise pixels, the deviation angle distribution would become similar to the uniform distribution when more noise pixels are included and thus the DAV value would move toward 2700°². For GASYM, since noise pixels would dilute the signal of macroscale symmetry, the GASYM value would increase.
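The limiting DAV value for noise-dominated images follows from the variance of a uniform distribution, assuming that the deviation angles lie between −90° and 90°:

```latex
\mathrm{Var}(\mathrm{DA}) = \frac{(b-a)^{2}}{12}
  = \frac{\left[90^{\circ} - (-90^{\circ})\right]^{2}}{12}
  = \frac{(180^{\circ})^{2}}{12}
  = 2700\,\mathrm{deg}^{2}
```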
Smoothing has little impact on the calculation of both asymmetry parameters for low intensity TCs but the impact increases with higher intensity TCs. This is because low intensity TCs are usually highly asymmetric and disorganized, and as such the signals of both small-scale and macroscale asymmetry are strong. Smoothing of low intensity TC images simply allows the macroscale asymmetric signal to dominate over the small-scale asymmetry signal. Consequently, smoothing of low intensity TC images has little impact on both asymmetry calculations. On the other hand, smoothing of high intensity TC images leads to a more dominant signal of the macroscale symmetry than the small-scale asymmetry.
Since DAV is sensitive to the small-scale asymmetry signal, DAV decreases for smoothed high intensity TCs (Fig. 3), whereas GASYM is less sensitive to small-scale changes [i.e., Eq. (1)], and smoothing has very limited impact on GASYM (Fig. 4). Beyond 130 kt (1 kt ≈ 0.51 m s⁻¹), both asymmetry parameters (for all ROC) do not decrease further. This indicates that a high degree of axisymmetry is needed to reach very high intensity but factors other than convective organization are needed for further increase in intensity.
Despite the similarities in trends (Figs. 3, 4), there are differences between DAV and GASYM. The effects of smoothing on the asymmetry parameters are different. For GASYM, the effect of smoothing on the median is relatively small; for example, for ROC = 300 km, the percentage change of the median of GASYM at the 150-kt bin is −5% and −22% for light and heavy smoothing, respectively. For DAV, on the other hand, for ROC = 300 km, the percentage change of the median of DAV at the 150-kt bin is −14% and −38% for light and heavy smoothing, respectively. Furthermore, smoothing would increase the width of the interquartile range (IQR) in general, yet GASYM and DAV do not have the same resulting change. For the case of ROC = 500 km at the 150-kt bin, the percentage change of GASYM IQR and DAV IQR for the heavy smoothing case is ~23% and ~111%, respectively. This implies that GASYM is weak in quantifying small-scale asymmetry but strong in quantifying macroscale asymmetry, whereas DAV is strong in quantifying both small-scale and macroscale asymmetry. Furthermore, Kendall's tau-b correlation coefficient τ_b between DAV and intensity for different levels of smoothing has been evaluated. The τ_b is a nonparametric measure of the rank association between two quantities that also takes ties into account [see Abdi (2007) for detailed explanation and examples]. The value of τ_b between DAV and intensity decreases with increasing strength of smoothing across all ROC with percentage change ranging between −12.3% and −22.4%. This indicates that small-scale asymmetric features also play a role in controlling intensity. A full discussion of the relationship between intensity and asymmetry parameters is presented in section 7.

FIG. 4. As in Fig. 3, but for the median and upper and lower quartiles of GASYM.

Figure 5 shows scatterplots of GASYM versus DAV for different ROCs. The large scatter suggests that for the same ROC, a TC with low GASYM does not guarantee it has low DAV and that a TC with high GASYM does not guarantee it has high DAV. There are cases where a TC has high DAV but low GASYM and vice versa. This is because these two parameters quantify different features. In this section, three examples are discussed. These examples represent three different cases: 1) low GASYM, high DAV; 2) low GASYM, low DAV; and 3) high GASYM, low DAV. Here we define low or high DAV respectively as DAV less than or greater than 1800°² at an ROC of 300 km. This roughly separates into major hurricanes (i.e., category 3 or above; >95 kt) and below (Fig. 3). TCs reaching major hurricane strength typically have a high degree of symmetry. Similarly, we define low or high GASYM respectively as GASYM less than or greater than 0.475 at an ROC of 300 km.
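For reference, the Kendall's tau-b statistic used in the smoothing comparison can be computed with a short Python sketch. The pairwise formulation below is the standard definition with the tie correction in the denominator; the function name and the sample DAV and intensity values are hypothetical and are not taken from the dataset.

```python
import math

def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length sequences (O(n^2) pair loop)."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from both factors
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    denominator = math.sqrt((pairs + tied_x_only) * (pairs + tied_y_only))
    return (concordant - discordant) / denominator

# Hypothetical sample: DAV (deg^2) falling as intensity (kt) rises, with one tie.
intensity_kt = [35, 50, 65, 65, 100, 130]
dav_deg2 = [2300, 2100, 1900, 1700, 1200, 900]
tau = kendall_tau_b(intensity_kt, dav_deg2)
```

In this made-up sample the tied intensity pair keeps tau-b slightly above −1 even though every untied pair is discordant.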

a. Low GASYM, high DAV: Gay 1992, intensity = 82 kt [Geostationary Meteorological Satellite-4 (GMS-4)]
The brightness temperature of the central region of the cloud cluster of Gay is relatively uniform (Figs. 6a,c), yet the DA field is not (Figs. 6b,e). As we can see from Fig. 6e, the DA distribution for ROC = 100 km has entries ranging from −90° to 90°. This is partly due to the weak temperature gradient across the ROC. As a result, the small-scale signal (e.g., the surface roughness of the cloud top) would dominate the DAV. From the macroscale point of view, the differences in the brightness temperature field are very small, which is reflected by the small GASYM value for ROC = 100 km (Figs. 6c,f). Note that even the largest ROC (500 km) cannot enclose the entire TCCC (Fig. 6a). The majority of the pixels within the 500-km ROC are related to the cloud cluster of Gay; thus comparatively few pixels in the region do not carry information about the TC.

b. Low GASYM, low DAV: Gerald 1987, intensity = 100 kt (GMS-3)
Gerald is presented at its mature phase with an intensity of 100 kt. The brightness temperature field shows a relatively symmetric cloud structure with an eye (Fig. 7a). Both DAV and GASYM are low for the cloud cluster of Gerald for all of the ROC (Fig. 7d) because the cloud cluster itself is reasonably symmetric within a 400-km radius from the storm center. The DA distribution (Fig. 7e) shows that although the outer radii contribute more to the distribution, many of the pixels have values within the −30° to 30° range. The brightness temperature difference distribution shows that a majority of the pixels have very small temperature difference (Fig. 7f). This case shows that if the TC has low GASYM and low DAV, the TCCC typically has a high degree of macroscale symmetry and low small-scale asymmetry (i.e., circular and smooth cloud top).

FIG. 6. (a) IR brightness temperature from GMS-4 satellite of Tropical Cyclone Gay at 1200 UTC 23 Nov 1992; (b) the associated deviation angle field of Gay; (c) the difference between rotated and original brightness temperature fields with the application of the 248-K threshold; (d) asymmetry parameters calculated with different ROCs, where the blue line represents DAV and the red line represents GASYM; (e) deviation angle distribution with different contribution from each of the following annuli with respect to the center: 0-100 (red), 100-200 (green), 200-300 (blue), 300-400 (black), and 400-500 (purple) km; (f) as in (e), but for the distribution of temperature difference between rotated and original brightness temperature fields with the application of the 248-K threshold. The black lines in (a) and (b) are provided for reference of the extent of Gay, which is identified using the CI approach with T_b = 248 K.
c. High GASYM, low DAV: Arlene 1987, intensity = 65 kt (Meteosat-2)
In this case, Arlene has an intensity of 65 kt. The value of DAV decreases from 2444°² at ROC = 100 km to 1661°² at ROC = 350 km and then increases to 1957°² at ROC = 500 km (Fig. 8d). Meanwhile, the value of GASYM does not change with respect to ROC. The distinctively different behavior can be explained by the construction of GASYM and DAV. From the construction of GASYM, the use of T_b helps to eliminate pixels of water and ground surfaces and also low and midlevel clouds (i.e., noise pixels). Many of those noise pixels are not included in the calculation. However, in the DAV calculation, a procedure to remove pixels of water, ground surfaces, and low and midlevel clouds is not used for this kind of scenario. Thus many noise pixels are used in the DAV calculation for this image. This leads to the high GASYM, low DAV scenario. Furthermore, this example demonstrates the problem of the use of ROC in any calculation. Since the cloud cluster is highly asymmetric, the use of ROC cannot enclose all of the pixels of the cloud cluster of Arlene without including a large number of noise pixels.

a. Description of the OPTICS CI approach
As mentioned in section 1, a fixed, predefined ROC is used in calculations because it is difficult to determine the sizes of TCCCs using satellite images. In section 3, T_b is introduced to the GASYM calculation for removing pixels that correspond to land, ocean, and low- and midlevel clouds. However, the use of T_b cannot prevent 1) the exclusion of pixels that are associated with the target TCCC in the calculation (see Fig. 6a), and 2) the inclusion of noise pixels in the calculation (see Fig. 8a). The solution to these problems is to identify the pixels that are associated with the TC cloud cluster using a clustering algorithm. The goal of the application of CI to GASYM is to first identify an image mask for the TCCC, then process the IR image by setting every temperature outside the mask to T_b without affecting the temperature values within the mask. Consequently, the calculation of GASYM becomes scale independent, and ROC is not required.

FIG. 7. As in Fig. 6, but for Tropical Cyclone Gerald at 0000 UTC 9 Sep 1987 and (a) the IR brightness temperature is obtained from GMS-3.

OPTICS (Ankerst et al. 1999) is a density-based spatial clustering algorithm used for cluster identification. OPTICS is an improved version of the Density Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm (Ester et al. 1996). OPTICS creates an ordering of the dataset that corresponds to its density-based clustering structure. Two parameters are needed for OPTICS: 1) the maximum spatial radius to consider (ε_thres), and 2) the minimum number of points in the ε-neighborhood (MinPts), where the ε-neighborhood of an object p is the space within the radius ε from p. These two parameters define the range of density of clusters that would be found. From the ordering found by OPTICS, one way to extract the clusters is to use a single reachability distance cutoff. Details of the OPTICS algorithm can be found in Ankerst et al. (1999). We use the implementation of OPTICS in the dbscan package of R (Hahsler et al. 2019). The values of ε_thres and MinPts are chosen to be 100 km and 15, respectively. The reachability distance cutoff is set to be 25 km. These parameters are chosen following a sensitivity analysis of various combinations of parameters.
By visual inspection, TCCCs, i.e., large cirrus anvil and the associated spiral rainbands, can be consistently identified using this set of parameters. CI in this study is performed based on the locations of the cloud pixels in the IR satellite images. Since only points corresponding to the TC, i.e., high clouds, should be clustered, points corresponding to ocean, land, and low- and midlevel clouds are removed from the dataset by removing pixels with brightness temperature equal to or above T_b. Figure 9 shows examples of the output of the clustering algorithm at different values of T_b. For a high value of T_b, 273 K (0°C), a majority of the cloud clusters are connected to each other as one large cluster. As the value of T_b decreases, more connections between cloud pixels are broken; hence cloud clusters can be identified.
As shown in the third column of Fig. 9, the 248-K threshold can isolate the TC (including spiral rainbands and inner structure) from low- and midlevel clouds, local deep convection, and other cloud clusters. Consequently, this method can isolate the TCCC from other cloud clusters in the environment.
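The thresholding and clustering steps described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the study uses OPTICS from the R dbscan package, whereas the sketch swaps in a plain DBSCAN-style pass at the 25-km cutoff with MinPts = 15, which is close to (but not identical with) what a single reachability cutoff extracts from an OPTICS ordering. The names `cold_pixels`, `density_clusters`, and `PIXEL_KM` are hypothetical, and the image is assumed to be a list of rows of brightness temperatures on a regular 10-km grid.

```python
from collections import deque

PIXEL_KM = 10.0  # assumed regular grid spacing (10 km by 10 km pixels)


def cold_pixels(tb_image, t_b):
    """Return (row, col) of pixels strictly colder than the threshold t_b (K)."""
    return [(r, c)
            for r, row in enumerate(tb_image)
            for c, value in enumerate(row)
            if value < t_b]


def density_clusters(pixels, eps_km=25.0, min_pts=15):
    """DBSCAN-style clustering of grid pixels; returns {pixel: label}.

    Label -1 marks noise. A pixel is a core point when at least min_pts
    pixels (itself included) lie within eps_km of it.
    """
    occupied = set(pixels)
    reach = int(eps_km // PIXEL_KM)
    # Grid offsets that fall inside the eps_km radius
    offsets = [(dr, dc)
               for dr in range(-reach, reach + 1)
               for dc in range(-reach, reach + 1)
               if (dr * dr + dc * dc) * PIXEL_KM ** 2 <= eps_km ** 2]

    def neighbours(p):
        return [(p[0] + dr, p[1] + dc) for dr, dc in offsets
                if (p[0] + dr, p[1] + dc) in occupied]

    labels = {}
    next_label = 0
    for seed in sorted(occupied):
        if seed in labels:
            continue
        if len(neighbours(seed)) < min_pts:
            labels[seed] = -1  # provisional noise; may become a border pixel
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            current = queue.popleft()
            nbrs = neighbours(current)
            if len(nbrs) < min_pts:
                continue  # border pixel: belongs to the cluster, does not expand it
            for q in nbrs:
                if labels.get(q, -1) == -1:
                    labels[q] = next_label
                    queue.append(q)
        next_label += 1
    return labels
```

On a regular 10-km grid, a 25-km radius covers 21 pixels (the pixel itself included), so with MinPts = 15 a pixel counts as a core point only when most of its immediate surroundings are also colder than the threshold.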
A set of selection criteria is used to objectively identify the TCCC from the clustering output: 1) the size of the cluster must be larger than 200 pixels, i.e., 20 000 km²; 2) the distance from the closest pixel of the selected cluster to the center of circulation must be the shortest; 3) if two or more clusters satisfy criteria 1 and 2, the largest cloud cluster is selected as the TCCC. Criterion 1 ensures that the cluster size is not arbitrarily small. Given that each pixel represents 100 km², the minimum size of the cloud cluster with this criterion is 20 000 km². Criteria 2 and 3 are chosen based on observations that the TCCC would not be far away from the circulation center.
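The three selection criteria, together with the masking step that feeds the GASYM calculation, can be expressed as the following sketch. It is an illustration under stated assumptions: clusters are assumed to arrive as a dictionary mapping a label to a list of (row, col) pixels, the circulation center is assumed to be given in the same pixel coordinates, and the names `select_tccc` and `apply_mask` are hypothetical.

```python
import math


def select_tccc(clusters, center, min_pixels=200):
    """Pick the TCCC label from {label: [(row, col), ...]}, or None."""
    # Criterion 1: the cluster must be larger than min_pixels pixels
    candidates = {k: v for k, v in clusters.items() if len(v) > min_pixels}
    if not candidates:
        return None

    def nearest(pixels):
        # Distance (in pixels) from the center to the closest cluster pixel
        return min(math.hypot(r - center[0], c - center[1]) for r, c in pixels)

    # Criterion 2: shortest closest-pixel distance to the center
    # Criterion 3: on a tie, prefer the largest cluster
    return min(candidates,
               key=lambda k: (nearest(candidates[k]), -len(candidates[k])))


def apply_mask(tb_image, mask_pixels, t_b):
    """Set every pixel outside the mask to t_b; keep values inside the mask."""
    keep = set(mask_pixels)
    return [[v if (r, c) in keep else t_b for c, v in enumerate(row)]
            for r, row in enumerate(tb_image)]
```

Returning `None` when no cluster passes criterion 1 makes the "undetermined" case explicit, which matters for weak systems whose cloud tops are warmer than the threshold.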

The method described above works reasonably well from the early mature phase to the beginning of the decaying phase. It sometimes struggles to identify TC clusters in the very early stage and the end stage of the TC life cycle. In the very early stage, TCCCs are not well developed or organized. Thus, TCCCs selected using the above criteria could be small and short lived; i.e., the same cloud cluster may not be found in the next consecutive image. In other words, large variation and inconsistency of the cloud clusters would be expected at the very early stage of the TC life cycle. Near the end of the TC life cycle, TCs can no longer maintain their axisymmetric structure and either dissipate or undergo extratropical transition, which leads to a frontal structure (Kossin and Velden 2004). In the former case, the TC can no longer sustain deep convection and the brightness temperature of the cloud top increases. In the latter case, since the cloud cluster structure is frontal, the cloud cluster associated with the system would be away from the center. Therefore, a small cloud cluster that does not belong to the TC might be selected. It should also be pointed out that the size of the identified TCCC varies throughout the TC life cycle because of the merger and separation of cloud clusters with the TCCC.
Hereinafter, GASYM calculated using the CI approach is referred to as GASYM-CI, and GASYM calculated using the fixed radius (FR) approach, i.e., using ROC, is referred to as GASYM-FR.
The middle and right columns of Fig. 10 show examples of the CI approach for different intensity categories using T_b = 248 K (−25°C) and 219 K (−54°C), respectively. The 219-K threshold can isolate the inner structure of the TC from the rest of the clouds, as deep convective clouds are the dominant cloud type within the inner core. This means that the choice of the threshold would depend on the purpose of the study. Intense TCs tend to have much more organized inner structures than weaker TCs because their structures are more axisymmetric. While the TCCC of hurricanes can be identified from the environment, the cloud cluster of a TC with tropical storm (TS) strength is not guaranteed to be found. Near the end of the life cycle of Severe Tropical Cyclone Ken (1983) (hereinafter STC Ken) (Fig. 10, TS case), STC Ken was dissipating over land and was not able to maintain deep convection as strong as it had over the ocean. Thus, the cloud-top brightness temperature of STC Ken at that stage was higher than 219 K (−54°C). As a result, the CI approach is unable to locate the inner structure of STC Ken.
Note that GASYM-CI does not only measure the asymmetry of external geometry, i.e., the shape of the periphery of the TCCC. GASYM-CI also measures the internal asymmetry, i.e., the asymmetry due to the uneven spatial distribution of deep convective clouds or high clouds within the cloud cluster. This means that GASYM-CI measures the asymmetry of the entire TCCC. If the TC is internally symmetric and externally asymmetric (or internally asymmetric and externally symmetric), the overall symmetry of the TC depends on the number of pixels that contribute to the internal symmetry and external asymmetry (or internal asymmetry and external symmetry).

b. Relationship between GASYM-CI and intensity
While inner-core asymmetry affects the maximum potential intensity of TCs (e.g., Emanuel 1988; Wang and Zhou 2007), the macroscale asymmetry of the TCCC is one of the parameters in the Dvorak technique (Dvorak 1984). The relationships between GASYM-CI and intensity are shown in Fig. 11. Although the relationships are broadly similar across basins, they differ in detail from basin to basin, which suggests that other processes play a role in controlling intensity, as suggested in the literature. Within each basin, the relationships for the two thresholds are similar to each other. Both GASYM-CI (248 K) and GASYM-CI (219 K) are sensitive to intensity change over a wide range of intensities, but they have plateau-like behavior in the low- and high-intensity regimes for some of the basins. In particular, GASYM-CI (219 K) is not sensitive to low intensity, with the median of GASYM-CI (219 K) reaching the maximum value. Moreover, many low-intensity TCs have undetermined GASYM-CI (219 K) because their entire cloud top has a brightness temperature higher than 219 K (see, e.g., the TS case in Fig. 10 discussed in section 6a). Therefore, even though GASYM-CI (219 K) is more physically motivated, GASYM-CI (248 K) is a better option for estimating intensity via macroscale asymmetry. Yet saturation occurs in the high-intensity regime, where GASYM-CI (248 K) is almost independent of intensity and GASYM-CI (219 K) decreases only slightly with intensity. This saturation also occurs in the Dvorak technique, where the errors increase as intensities exceed 120 kt (Knaff et al. 2010).

Discussion
The τ_b correlation between intensity and DAV is higher than that for GASYM-FR and GASYM-CI for a majority of ROCs (Table 1). This implies that DAV performs better than GASYM in estimating TC intensity. We also observe that the correlation between DAV and intensity increases with increasing ROC. The cause of the increase can be found in Fig. 3 (black lines). The width of the IQR of DAV decreases with increasing ROC for most intensities. For example, in the 90-kt bin, the widths of the DAV IQR for ROC = 100 and 500 km are 651.1°² and 249.9°², respectively. As a result, the correlation increases with increasing ROC. However, the shrinking of the IQR width is related to noise pixels. Typically, a circular boundary cannot enclose the entire TCCC without including noise pixels (Figs. 6a, 7a, 8a). Noise pixels do not have a preferential direction for the temperature gradient because the temperature gradient is weak. When noise pixels are included, the symmetry signal of the TCCC is diluted and the value of DAV therefore moves toward 2700°²; i.e., the deviation-angle distribution approaches a uniform distribution. Thus, while DAV with a large ROC has a much stronger correlation with intensity, most of the contributing signal actually comes from the noise pixels and not from the TC itself.
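The limiting value of 2700°² follows from the variance of a uniform distribution. As a brief worked check, assume that the deviation angle θ is uniformly distributed over an interval of width 180°, for example [−90°, 90°] (the interval is an assumption here; it is not stated in this section):

```latex
\operatorname{Var}(\theta) = \frac{(b-a)^2}{12}
  = \frac{\left(90^\circ - (-90^\circ)\right)^2}{12}
  = \frac{(180^\circ)^2}{12}
  = 2700\ \mathrm{deg}^2 .
```

A field of noise pixels with no preferred gradient direction therefore pulls DAV toward this value regardless of the structure of the TC itself.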
For GASYM-FR, τ_b peaks at ROC = 150 km across all basins except NI, where τ_b is marginally higher at ROC = 100 km but the number of cases (~4100) is also the fewest. The variation of τ_b is nevertheless small for all ROCs (≤ 0.062 in all basins except NI). GASYM-CI (219 K) has a larger τ_b than GASYM-FR at any ROC and a larger or similar τ_b compared with GASYM-CI (248 K). Last, even though GASYM is less sensitive to small-scale asymmetry than DAV and CI can isolate the TCCC from other cloud clusters in its environment, the magnitude of τ_b for GASYM-CI never exceeds 0.393 in any basin, which reflects the large interquartile ranges seen in Fig. 11. This again suggests that other processes play a role in controlling intensity.
The effect of TCCC size on the relationships between asymmetry parameters and intensity has also been investigated. Here the size of the TCCC is defined as the number of pixels (each 10 km by 10 km) that are associated with the TCCC as identified by the CI method with T_b = 248 K. TCCCs are divided into three size categories: small (< 3000 pixels), medium (3000 ≤ pixels < 6000), and large (≥ 6000 pixels). In the idealized case of a symmetrical TCCC, 3000 and 6000 pixels represent an ROC of ~310 and ~440 km, respectively. This simple categorization is chosen because roughly 37% of TCCCs fall into the small category and roughly 37% of TCCCs fall into the medium category. Thus, the numbers of entries in the categories are of the same magnitude. Under this categorization, the relationships between intensity and the asymmetry parameters behave differently (Fig. 12).
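The correspondence between pixel counts and an equivalent ROC can be checked with a short calculation, assuming a perfectly circular cluster and 100 km² per pixel. The function names `equivalent_radius_km` and `size_category` are hypothetical and serve only to illustrate the arithmetic and the category boundaries.

```python
import math

PIXEL_AREA_KM2 = 100.0  # each pixel is 10 km by 10 km


def equivalent_radius_km(n_pixels):
    """Radius of a circle whose area equals that of n_pixels pixels."""
    return math.sqrt(n_pixels * PIXEL_AREA_KM2 / math.pi)


def size_category(n_pixels):
    """Size category using the 3000- and 6000-pixel boundaries."""
    if n_pixels < 3000:
        return "small"
    if n_pixels < 6000:
        return "medium"
    return "large"


for n in (3000, 6000):
    print(n, size_category(n), round(equivalent_radius_km(n)))
```

The radii come out at about 309 and 437 km, consistent with the rounded values of ~310 and ~440 km.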
For ROC = 100 km, small (large) TCCCs tend to have a higher (lower) median GASYM-FR than the overall median for intensities below 105 kt; as intensity increases, the discrepancy between the overall median and the median GASYM-FR of the different size categories shrinks. This discrepancy in GASYM-FR decreases as ROC increases at all intensities. For DAV with ROC = 100 km, the difference between the overall median and the median DAV of small TCCCs increases with intensity, whereas the medians of the other size categories closely follow the overall median. As with GASYM-FR, this discrepancy in DAV decreases as ROC increases at all intensities. For a small ROC (e.g., 100 km), using the overall median of either GASYM-FR or DAV is not ideal for intensity estimation. GASYM-FR (ROC = 100 km) significantly underestimates the intensity of small TCs, even for TCs with intensity close to 90 kt, and it overestimates the intensity of large TCs. DAV (ROC = 100 km) overestimates the intensity of small TCs, and this effect becomes more pronounced in the high-intensity regime. Because the discrepancy between the overall median and the medians of the individual size categories diminishes as ROC increases, a large ROC is a better choice than a small ROC for asymmetry parameters that depend on ROC.
The relationships between GASYM-CI and intensity for TCCCs in different size categories behave very differently from those of GASYM-FR and DAV. With respect to the overall median, the intensity of small TCs is likely to be underestimated for intensities between 40 and ~65 kt and overestimated beyond ~65 kt. For medium-size TCs, the median values are similar to the overall median for intensities below ~55 kt; thus, the intensity estimates would be in good agreement. Beyond 55 kt, the intensity of medium-size TCs is likely to be overestimated. For large TCs, the overall median underestimates (overestimates) the intensity of TCs above (below) ~60 kt. This indicates that the size of the TCCC is also an important quantity to consider for intensity estimation. From the operational intensity estimation perspective, DAV is a better parameter to use than either GASYM-FR or GASYM-CI because it has higher correlations as well as a better-shaped relationship with intensity (Figs. 3, 4, 11). In comparison with the relationship between DAV and intensity, the relationship between GASYM and intensity has large intervals that are almost constant, i.e., the low- and high-intensity regimes. Where the relationship is plateau-like, a given value of the parameter (here GASYM) cannot discriminate between TC intensities.
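The size-stratified comparison above rests on medians and quartiles computed in 5-kt intensity bins (as plotted in Fig. 12). A minimal sketch of that binning is given below. The function name `binned_quartiles` is hypothetical, the quartile convention (Python's "inclusive" method) is an assumption because the convention used in the study is not stated, and bins with fewer than two samples are skipped.

```python
import statistics
from collections import defaultdict


def binned_quartiles(intensity_kt, values, bin_width=5):
    """Return {bin_start: (q1, median, q3)} for bins with at least 2 samples."""
    bins = defaultdict(list)
    for kt, v in zip(intensity_kt, values):
        # Assign each sample to the bin that starts at a multiple of bin_width
        bins[int(kt // bin_width) * bin_width].append(v)
    result = {}
    for start, vals in sorted(bins.items()):
        if len(vals) >= 2:
            q1, med, q3 = statistics.quantiles(vals, n=4, method="inclusive")
            result[start] = (q1, med, q3)
    return result
```

Running the same function once on all TCCCs and once per size category yields the overall and category medians whose differences are discussed in the text.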
From the asymmetry quantification perspective, GASYM-CI provides a robust and simple metric to quantify the macroscale asymmetry of the TCCC. Because it depends on the CI approach, all of the pixels used in the calculation are related to the TCCC. This asymmetry parameter can be used in future studies that require a quantification of TC asymmetry. This study further demonstrates that TC asymmetry alone cannot explain the variability of TC intensity. Other quantities need to be considered in intensity estimation.

FIG. 12. Median (solid lines) and upper and lower quartiles (dashed lines) for GASYM-FR with ROC of (a) 100, (b) 300, and (c) 500 km; for DAV with ROC of (d) 100, (e) 300, and (f) 500 km; and for (g) GASYM-CI (248 K) in intensity bins of 5 kt, for different size categories: including all sizes (black), small (blue), medium (green), and large (red).

Conclusions and remarks
In this study, we have introduced GASYM, a parameter to quantify the macroscale asymmetry of the TCCC. The relationship between the macroscale asymmetry of the TCCC and intensity is investigated using GASYM and DAV. It is shown that while macroscale asymmetry is negatively correlated with intensity, it does not explain all of the variance of intensity. This suggests that other factors must play a role in determining intensity (e.g., small-scale asymmetry). Furthermore, we have introduced a scale-independent approach to identify the TCCC in satellite images, cluster identification, as an alternative to the commonly used predefined ROC approach. This technique can identify TCCCs of different shapes because it depends solely on the threshold brightness temperature T_b, which corresponds to the upper bound of the brightness temperature of the specific cloud types. This study demonstrates that while GASYM-CI and intensity are correlated, the relationship differs for different sizes of TCCC. Because of the simple and nonparametric nature of GASYM-CI, this parameter can be a useful tool for future studies of TCs, particularly those that use machine-learning techniques.
Although GASYM-CI has been shown to be a useful parameter for quantifying the macroscale asymmetry of the TCCC, the value of T_b that is used in GASYM-CI is by no means optimal. The choice of T_b, either 248 or 219 K, corresponds to clouds at specific heights in the tropical troposphere. However, these thresholds would be different in high-latitude regions, where the tropopause is lower than in the tropics. Consequently, a dynamic temperature threshold should be developed as part of the improvement of the GASYM-CI approach. Furthermore, as pointed out by Fu et al. (1990), cloud-top brightness temperature alone may not be able to differentiate cirrus from deep convective clouds. A better method to identify deep convective clouds for both daytime and nighttime would also improve GASYM-CI. Another issue for further study is that the current study uses only the IR channel from the HURSAT data. Images in other channels, such as water vapor, should be examined to determine whether asymmetry in those channels has a better correlation with intensity.