Nowcasting Multicell Short-Term Intense Precipitation Using Graph Models and Random Forests

Cong Wang (https://orcid.org/0000-0003-2262-2874), School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Ping Wang, School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Di Wang, School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Jinyi Hou, School of Electrical and Information Engineering, Tianjin University, Tianjin, China
Bing Xue, CMA Public Meteorological Service Centre, Beijing, China

Abstract

Short-term intense precipitation (SIP; i.e., convective precipitation exceeding 20 mm h⁻¹) nowcasting is important for urban flood warning and natural hazards management. This paper presents an algorithm that couples automatic weather station data and single-polarization S-band radar data with a graph model and a random forest for SIP nowcasting. Unlike pixel-by-pixel precipitation nowcasting algorithms, this algorithm takes convective cells as the basic units so that their interactions can be considered, and it focuses on multicell convective systems. In particular, it addresses the following question: Will a multicell convective system cause SIP events in the next hour? First, a method based on the spatiotemporal superposition between cells is proposed for identifying multicell systems. Then, a graph model is used to represent the physical attributes of the cells and the spatial distribution of the entire system. For each graph model, a fusion operation is used to form a 42-dimensional graph feature vector. Finally, a random forest classifier is trained on the graph feature vectors to predict whether SIP will occur. In the experiment, the algorithm achieves a probability of detection (POD) of 79.2% and a critical success index (CSI) of 68.3% on data from 2015 and 2016 in North China. Compared with other precipitation nowcasting algorithms, the combination of graph model and random forest predicts SIP events more accurately and produces fewer false alarms.

© 2020 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Jinyi Hou, houjinyi@tju.edu.cn

1. Introduction

Short-term intense precipitation (SIP) events are defined by the China National Meteorological Center’s Practice Standard as convective precipitation events with hourly precipitation greater than 20 mm. Unlike other convective hazards (e.g., hail and convective winds), SIPs tend to occur over longer time periods and may be associated with multiple convective cells that cause damage cumulatively (Guo et al. 2006; Seo et al. 2014; Han et al. 2016; B. Yang et al. 2016). Since SIPs can result in urban waterlogging, flash floods, landslides, and other hazardous conditions (Iadanza et al. 2016; Akbari Asanjan et al. 2018; Liu et al. 2020; Gao et al. 2020), SIP nowcasting is important for disaster prevention and control.

Although atmospheric science has made significant progress in recent years, convective precipitation nowcasting remains a challenge. Convective systems often change significantly in a short time, and their internal convective cells interact in complex ways. Two approaches have traditionally been used for precipitation nowcasting: one based on numerical weather prediction (NWP) models and the other based on radar extrapolation techniques. Although the two approaches differ, they have played effective and complementary roles in precipitation nowcasting (Golding 1998; Wilson et al. 2004; Liang et al. 2010).

Each of these methods has advantages and disadvantages. The methods based on NWP can simulate the physical processes inside convective systems, but they require a long time for computing and forecast generation. Thanks to progress in computing technology, NWP models of convective systems can now be used for nowcasting (Sokol et al. 2016), but they still have limitations. For example, NWP models are sensitive to the initial conditions and are limited by their horizontal resolution (Bližňák et al. 2017). The methods based on extrapolation techniques mainly include centroid tracking methods (Handwerker 2002; Han et al. 2009; Zahraei et al. 2013; Rossi et al. 2015) and cross-correlation methods (Rinehart and Garvey 1978; Li et al. 1995; Lai 1998; Zahraei et al. 2012; Liu et al. 2015). The thunderstorm identification, tracking, analysis, and nowcasting (TITAN) method (Dixon and Wiener 1993) and the storm cell identification and tracking (SCIT) algorithm (Johnson et al. 1998) are representative centroid tracking methods, which take convective cells as the research object and predict their future positions. The cross-correlation-based methods do not target a single cell; instead, they calculate the regional correlation between radar images at the pixel level, from which the motion vector of each point can be obtained. These methods need radar data from only two consecutive moments and consume few computing resources. Therefore, they have better real-time performance and higher temporal and spatial resolution. However, the extrapolation-based methods can represent only simple changes in the observed data and cannot simulate complex meteorological processes, so they may struggle to predict future changes accurately, and their nowcasting ability is limited.

In recent years, machine learning (ML) techniques have been applied in meteorology because of their ability to model nonlinear meteorological systems directly from data without much computation (Gagne et al. 2017; Wang et al. 2018; Akbari Asanjan et al. 2018; Czernecki et al. 2019; Loken et al. 2019). Gagne et al. (2014) used a random forest for ensemble forecast postprocessing to enhance storm-scale quantitative precipitation forecasts. The random forest could correct systematic biases in the ensemble precipitation forecast and incorporate additional uncertainty information from aggregations of the ensemble members and additional model variables. Shi et al. (2015) treated the extrapolation problem as a video prediction problem and used deep learning techniques to model changes between radar images. Han et al. (2017) translated convective system nowcasting problems into classification problems. They gridded radar reflectivity and environmental field information onto 3 km × 3 km grids and trained a support vector machine (SVM) classifier for prediction. Herman and Schumacher (2018) used NOAA’s second-generation Global Ensemble Forecast System Reforecast dataset to train a random forest model for predicting locally extreme precipitation over the next several days. Compared with traditional methods, ML algorithms have significantly enhanced forecasting skill, especially when combined with NWP data and when used for lead times of several hours to several days. However, there are still challenges in using ML algorithms to predict convective disasters on short time scales. One challenge is that individual convective cells or systems are usually the desired basic unit of research, yet convective cells do not have fixed shapes or sizes, whereas ML algorithms require fixed-length feature inputs. Thus, a method is required to characterize convective cells and convert them into feature vectors that can be used by ML algorithms. Because SIPs are often caused by convective systems composed of multiple unique cells (e.g., Fig. 1), SIP nowcasting algorithms should consider the attributes of individual cells as well as those of their parent convective systems.

Fig. 1.

From 1100 to 1154 UTC 21 Jul 2015, a SIP occurred in the radar area of Shijiazhuang, China. White points indicate stations with hourly precipitation greater than 20 mm. It can be seen that the SIP event is caused by multiple cells together instead of a single independent cell.

Citation: Monthly Weather Review 148, 11; 10.1175/MWR-D-20-0050.1

In this paper, the graph model and random forest are combined to address the SIP event nowcasting problem. Unlike other methods that apply ML algorithms directly to gridded radar or NWP data, we use convective cells and systems as the research objects, which better suits the characteristics of convective events. To apply ML algorithms to convective systems that contain any number of cells, graph models are used to represent multicell convective systems. The graph model is a concept from discrete mathematics and has been introduced in meteorology to describe the structure and evolution of meteorological systems (Whitehall et al. 2015; Liu et al. 2016; Hou and Wang 2017). A graph model consists of several nodes and the edges connecting them. Because this structure matches the spatial distribution of a multicell convective system, the graph model can combine the cells’ attributes and the overall distribution of the system in one model and transform them into a fixed-length feature vector. In this way, the question “Will there be a SIP event?” is transformed into a graph classification problem. Finally, a random forest model is trained using the graph feature vectors and observation data for SIP event nowcasting.

This paper is organized as follows. Section 2 introduces the datasets and preprocessing methods. Section 3 presents the method in detail. In section 4, the algorithm parameters are selected and the performance is evaluated. Section 5 discusses the model’s interpretability and error analysis. The summary and future work are given in section 6.

2. Data

The automatic weather station (AWS) data and S-band Doppler weather radar data provided by the Meteorological Observation Center of the China Meteorological Administration are used for nowcasting. The data cover the regions around Binzhou, Jinan, Qingdao, Shijiazhuang, and Weifang in North China. A general view of the study areas and the radar and AWS locations is shown in Fig. 2. The time range is from June to September in 2015 and 2016; June to September is the flood season in North China. The AWS data are recorded every 10 min and contain minute- and hourly-resolution precipitation data. They are used to determine whether a SIP event has occurred, as the labels for training ML models, and as the basis for evaluating the performance of the algorithms. A weather radar scan is performed approximately every 6 min, and the scan range is 230 km. Each radar scan includes nine elevation angles between 0.5° and 19.5°. The radar data are converted from the polar coordinate system to the Cartesian coordinate system with a spatial resolution of 1 km × 1 km × 1 km, and the reflectivity product is used for SIP nowcasting.

Fig. 2.

Study area and station locations. The black circles indicate the scanning range of the radars, and the red points show the AWS sites locations.

For each multicell convective system, we overlay the coverage area of the system in the next hour. If the number of AWSs with precipitation higher than 20 mm in the coverage area of a multicell system is greater than two, it is considered that the system will have SIP in the next hour, and we use it as a positive sample. Otherwise, it is considered that the system will not have SIP in the next hour, and we use it as a negative sample. The multicell convective systems located in oceans and mountains are not taken as samples because these places lack AWSs as labels. To reduce the correlation between samples and improve the generalization ability of the model, the samples in this study do not overlap in time. We have a total of 1349 samples, including 590 samples with SIP and 759 samples without SIP.
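The labeling rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the coverage area is represented as a set of grid pixels, the stations as a mapping from grid pixel to hourly precipitation, and the names `label_system`, `coverage`, and `stations` are hypothetical rather than taken from the paper.

```python
def label_system(coverage, stations, threshold_mm=20.0, min_stations=2):
    """Return True (positive sample) when more than `min_stations` AWSs inside
    the system's 1-h coverage area report hourly precipitation above the threshold.

    coverage: set of (x, y) grid pixels covered by the system in the next hour
    stations: dict mapping an AWS grid pixel (x, y) to its hourly precipitation (mm)
    """
    hits = sum(
        1
        for position, rain_mm in stations.items()
        if position in coverage and rain_mm > threshold_mm
    )
    return hits > min_stations


# Toy data (made up): a 10 x 10 coverage area and five stations.
coverage = {(x, y) for x in range(10) for y in range(10)}
stations = {
    (1, 1): 25.0,    # inside, above 20 mm
    (2, 5): 31.2,    # inside, above 20 mm
    (7, 3): 22.4,    # inside, above 20 mm
    (4, 4): 12.0,    # inside, below 20 mm
    (50, 50): 40.0,  # heavy rain, but outside the coverage area
}
is_positive = label_system(coverage, stations)
```

Here three stations inside the coverage area exceed 20 mm, which is greater than two, so the system is labeled as a positive sample.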

3. Method

Figure 3 shows the complete algorithm flowchart. To account for the particular characteristics of SIP events caused by multicell convective systems, the algorithm has three key designs: 1) a multicell clustering method, based on the spatiotemporal superposition of the cells, is proposed to identify multicell convective systems; 2) a graph model is used to represent each multicell system; and 3) a random forest model is used to carry out the nowcasting of multicell system SIP.

Fig. 3.

Algorithm flowchart. The dashed line indicates the algorithm model training process, and the solid line indicates the nowcasting process. During model training, radar data and AWS observation data are used. Multicell convective systems are identified from radar data and represented using graph models. Then we form a set of graph features and combine the observation data to train a random forest model. In the process of nowcasting, the trained random forest model and the graph features obtained from the radar data are used to predict whether SIP events will occur.

a. Multicell convective system identification

For multicell convective system SIP nowcasting, the convective system should first be identified from the radar data. Because the reflectivity of a cell on the radar image decreases from the inside to the outside, an obvious approach is to treat multiple cells connected by weak echo (such as a group of cells in the same 25-dBZ region) as a convective system (Hou and Wang 2017). However, a single threshold brings its own problems. If the threshold is too small, unnecessary cells are introduced into the convective system. If the threshold is too large, some cells are missed. Moreover, this study targets SIP and seeks a “system” consisting of all the cells that may jointly cause SIP, rather than a system in the strict meteorological sense. The key to SIP caused by multiple cells is that these cells pass over the same point on the ground. When time and space are considered together, the essence of multicell SIP is that these cells are superimposed over a short time. The intensity of the spatiotemporal superposition between cells can therefore be used to measure the correlation between the cells in one system. In this paper, a new multicell convective system identification method based on cell spatiotemporal overlap and transitive closure is proposed (Fig. 4) and applied to SIP nowcasting in place of the traditional single-threshold identification method.

Fig. 4.

Flowchart of multicell system identification. At the current moment, cells 1 and 2 are isolated cells, and cells 3 and 4 belong to one convective system. However, extrapolation over the next nine moments shows that cells 1, 2, and 3 may interact to cause precipitation. The algorithm calculates the superposition coefficient η and clusters the cells, grouping cells 1, 2, and 3 into one system for precipitation prediction.

1) Cell identification and tracking

Convective cells are the basic units of convective systems. Before identifying the convective system, the cells need to be identified first. In this paper, the Intertwined Irregular Convective Storm Segmentation Arithmetic (IICSSA) method is used to identify cells (Wang et al. 2013). Like most widely used cell identification methods (Johnson et al. 1998; Han et al. 2009), IICSSA uses two thresholds called Thigh and Tlow to identify cells; Thigh is responsible for determining the cores of the cells, and Tlow is used to identify the peripheral part of the cells. Through the combination of Thigh and Tlow, the entire convective cell regions can be obtained. Furthermore, cell movement information is also needed to identify the multicell convective systems. The optical flow algorithm has been widely used in radar extrapolation and nowcasting, which could obtain point-by-point speed information by analyzing two consecutive images (Horn and Schunck 1981; Bowler et al. 2004; Woo and Wong 2017). In this paper, an optical flow algorithm based on the polynomial expansion (Farnebäck 2003) is used for cell tracking and calculating the velocity of the cell. This algorithm uses polynomials to fit the neighborhood of points on the image and uses the change of polynomial coefficients of corresponding points in two adjacent images to estimate the displacement. It avoids the process of global iteration to find the optimal solution and has better computational efficiency.

2) Estimating spatiotemporal cell overlap

To observe the spatiotemporal superposition trend between the cells in the future, a simple extrapolation should be carried out first. The speeds from cell tracking are used for extrapolation. Because the extrapolation results are only used as rough estimates of the spatiotemporal relationships between the cells, it is assumed that the cell size and shape do not change during the extrapolation. Then, the 1-h extrapolation results are superimposed to calculate the superposition coefficient η between the cells. If there are two cells $O_A^{t}$ and $O_B^{t}$ at time $t$, and the extrapolation results of the next 9 moments are $O_A^{t+1}, O_A^{t+2}, \ldots, O_A^{t+9}$ and $O_B^{t+1}, O_B^{t+2}, \ldots, O_B^{t+9}$, then one can calculate $\eta_{AB}$ by

$$\eta_{AB} = \frac{O_A^{\Sigma} \cap O_B^{\Sigma}}{\min\left(O_A^{\Sigma},\, O_B^{\Sigma}\right)}, \tag{1}$$

where $O_A^{\Sigma} = \bigcup_{n=0}^{9} O_A^{t+n}$, $O_B^{\Sigma} = \bigcup_{n=0}^{9} O_B^{t+n}$, and $\cap$ is the operator of the intersection of sets; $O_A^{\Sigma}$ and $O_B^{\Sigma}$ represent the area covered by the cell in 10 moments. The superposition coefficient $\eta_{AB}$ can be applied as a spatiotemporal relationship measurement of the cells $O_A^{t}$ and $O_B^{t}$ in the next multicell clustering.
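As a rough illustration of the superposition coefficient, the sketch below represents each cell as a set of grid pixels, sweeps it along a constant velocity for nine further scans, and divides the overlap area of the two swept footprints by the smaller footprint area. The function names, the pixel-set representation, and the rounding of displacements to whole pixels are assumptions made for this example, not details given in the paper.

```python
def footprint(cell, velocity, steps=9):
    """Union of a cell's pixels at time t and at `steps` extrapolated scans.

    cell: set of (x, y) grid pixels; velocity: (dx, dy) in pixels per scan.
    Size and shape are held fixed during extrapolation, as assumed in the text.
    """
    dx, dy = velocity
    covered = set()
    for n in range(steps + 1):
        covered |= {(x + round(n * dx), y + round(n * dy)) for x, y in cell}
    return covered


def superposition(cell_a, vel_a, cell_b, vel_b, steps=9):
    """eta_AB: overlap area of the two footprints over the smaller footprint area."""
    swept_a = footprint(cell_a, vel_a, steps)
    swept_b = footprint(cell_b, vel_b, steps)
    return len(swept_a & swept_b) / min(len(swept_a), len(swept_b))


# Toy example: two 3 x 3 cells; A moves east by one pixel per scan, B is stationary.
cell_a = {(x, y) for x in range(0, 3) for y in range(0, 3)}
cell_b = {(x, y) for x in range(6, 9) for y in range(0, 3)}
eta = superposition(cell_a, (1, 0), cell_b, (0, 0))
```

In this toy case the swept footprint of A covers all nine pixels of B, so η equals 1; if neither cell moves, the footprints never meet and η is 0.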

3) Multicell clustering

The cells that jointly cause the same SIP event should be treated as a system to analyze, and clustering methods could be used to identify multicell systems. Traditional clustering methods mostly require a certain number of categories (Jain et al. 1999), but in reality, the number of precipitation systems is uncertain. Transitive closure is a concept in discrete mathematics (Rosen and Krithivasan 2012), and it could be used to cluster without the need to specify the number of categories in advance. In this paper, this method is used to divide the cells into different convective systems by the spatiotemporal relationship between the cells.

Assume that there are n cells $O_1, O_2, \ldots, O_n$ in a radar scan. The superposition coefficient η is used as the similarity measure in the clustering algorithm, and the steps for the transitive closure are as follows:

  1. Establish an $n \times n$ relationship matrix $R_{n\times n} = [r_{ij}]_{n\times n}$. For each $r_{ij}$, use $O_i$ and $O_j$ to calculate $\eta_{ij}$. Then, set the correlation metric threshold $T_{re}$ (where $T_{re} = 0.1$) and obtain $r_{ij}$ according to (2). The value of $r_{ij}$ indicates whether there is a correlation between $O_i$ and $O_j$: $r_{ij} = 1$ indicates that $O_i$ and $O_j$ belong to the same precipitation system, and $r_{ij} = 0$ indicates that they do not:
    $$r_{ij} = \begin{cases} 1, & \text{if } \eta_{ij} \ge T_{re} \\ 0, & \text{else.} \end{cases} \tag{2}$$
  2. Calculate the $p$th power of the boolean matrix $R_{n\times n} = [r_{ij}]_{n\times n}$ to get $R_{n\times n}^{[p]}$, where $p$ is an integer with $p \ge 2$; in the matrix power operations, the addition and multiplication of real numbers are replaced by the boolean OR and AND operations. If $R_{n\times n}^{[p]}$ is an all-one matrix or $R_{n\times n}^{[p]} = R_{n\times n}^{[p-1]}$, the clustering ends. Otherwise, set $p := p + 1$ and repeat step 2, where $:=$ is the assignment operator.

When the clustering is completed, $R_{n\times n}^{[p]}$ no longer changes. For each row in the relationship matrix $R_{n\times n}^{[p]}$, all elements with a value of 1 make up a clustering result. All the cells in each clustering result form a potential SIP system based on spatiotemporal superposition.
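The two clustering steps can be sketched with plain boolean matrices. This is a minimal illustration, not the authors' implementation: it assumes that the threshold test is "greater than or equal to", that every cell is related to itself (its superposition with itself is 1), and the names `bool_matmul` and `cluster_cells` are hypothetical.

```python
def bool_matmul(p, q):
    """Boolean matrix product: OR replaces addition, AND replaces multiplication."""
    n = len(p)
    return [
        [any(p[i][k] and q[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def cluster_cells(eta, t_re=0.1):
    """Group cell indices by the transitive closure of the thresholded relation."""
    n = len(eta)
    # Step 1: relationship matrix; the diagonal is set because eta_ii = 1.
    relation = [[i == j or eta[i][j] >= t_re for j in range(n)] for i in range(n)]
    # Step 2: raise to successive boolean powers until the matrix stops changing.
    power = relation
    while True:
        next_power = bool_matmul(power, relation)
        if next_power == power:
            break
        power = next_power
    # Each distinct row of the closure lists the members of one system.
    clusters = []
    for row in power:
        members = [j for j, linked in enumerate(row) if linked]
        if members not in clusters:
            clusters.append(members)
    return clusters


# Toy coefficients (made up): cells 0-1 and 1-2 overlap, cell 3 overlaps nothing.
eta = [
    [1.0, 0.3, 0.0, 0.0],
    [0.3, 1.0, 0.2, 0.0],
    [0.0, 0.2, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
]
systems = cluster_cells(eta)
```

Cells 0 and 2 never overlap directly, but both overlap cell 1, so the closure places all three in one system and leaves cell 3 on its own: `systems` is `[[0, 1, 2], [3]]`.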

b. Multicell graph model

1) Build graph models for multicell systems

In this paper, each multicell system is represented as a graph model. A graph G = (U, E) consists of several nodes and edges between nodes, where U is the set of nodes and E is the set of edges. Each node and each edge has its own attributes. Each cell in the multicell system is regarded as a node, and the node attributes characterize the physical attributes of the cell. The cell nodes are connected by edges, and the edge attributes represent the spatial distribution information between the cells.

For each cell, the features listed in Table 1 are used to describe the cells’ physical attributes, which can be divided into the following categories:

  1. The features used to describe the spatial-scale information of the cell are given by EOCl, EOCs, HOR30, and LOR30.

  2. The features used to indicate the water content of the cell are VILmax and VILaver.

  3. The feature used to indicate the superposition effect of the cell itself is RPS. Generally speaking, the slower a cell moves, the more likely it is to cause SIP. However, the speed of the cell must be judged relative to its size. A large cell can cause SIP even if it moves quickly, whereas a small cell may not produce SIP even if it moves slowly. Fundamentally, the cell itself has a superposition effect on a given point on the ground over a period of time. Therefore, we calculate the projection length of the cell in the velocity direction and take the ratio of the projection length to the speed magnitude as RPS (Fig. 5). RPS combines the size and speed of the cell to represent the superposition effect of the cell itself.
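To make the RPS idea concrete, the sketch below projects every pixel of a cell onto the unit velocity vector, takes the extent of those projections as the projection length, and divides by the speed. How the projection length is measured in the paper is not spelled out, so the pixel-extent definition, the treatment of a stationary cell as infinite, and the function name `rps` are all assumptions for illustration.

```python
import math


def rps(cell, velocity):
    """Ratio of the cell's projection length along its motion to its speed.

    cell: iterable of (x, y) pixel coordinates in km
    velocity: (vx, vy) in km per hour
    The result has units of hours and approximates how long the cell takes
    to pass over a fixed point on the ground.
    """
    vx, vy = velocity
    speed = math.hypot(vx, vy)
    if speed == 0:
        return math.inf  # assumption: a stationary cell never moves off the point
    projections = [(x * vx + y * vy) / speed for x, y in cell]
    return (max(projections) - min(projections)) / speed


# Toy cell (made up): 10 km long in x, 4 km wide in y.
cell = [(x, y) for x in range(0, 11) for y in range(0, 5)]
along_long_axis = rps(cell, (20.0, 0.0))   # moving east at 20 km/h
along_short_axis = rps(cell, (0.0, 20.0))  # moving north at 20 km/h
```

At the same speed, the cell lingers longer over a point when it moves along its long axis (0.5 h) than along its short axis (0.2 h), which is the size-relative effect the feature is meant to capture.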

Table 1.

Features of cell physical attributes.

Fig. 5.

RPS is the ratio of the projection length of the cell region in the direction of the cell velocity to the cell speed. It combines the size and moving speed of the convective cell and represents the superposition effect of the cell itself.

After obtaining the cell features, as shown in Fig. 6, the graph model is constructed to represent the multicell system according to the following steps:
  1. Create a graph G = (U, E). Create a node ui for each cell of the multicell system. The attributes of ui are the features of the cell. Then add ui to the set U.

  2. For any two nodes $u_i$ and $u_j$ in $U$, calculate their velocities as $\mathbf{v}_i$ and $\mathbf{v}_j$, respectively. Calculate the distance vector $\mathbf{l}_{ij}$ from $u_i$ to $u_j$. Then calculate $t_{ij}$ to represent the relative distance between $u_i$ and $u_j$ by

$$t_{ij} = \frac{|\mathbf{l}_{ij}|}{\left|\, \mathbf{v}_i \cdot \dfrac{\mathbf{l}_{ij}}{|\mathbf{l}_{ij}|} - \mathbf{v}_j \cdot \dfrac{\mathbf{l}_{ij}}{|\mathbf{l}_{ij}|} \,\right|}, \tag{3}$$

where $|\ast|$ is the operator that evaluates the magnitude of a vector, and $\cdot$ is the operator that computes the inner product of vectors. Set a threshold $T_t$; if $t_{ij} < T_t$ (where $T_t = 5$), then create an edge $e_{ij}$ to link $u_i$ and $u_j$, and use $e^{-t_{ij}}$ (here $e$ is the natural constant) as the attribute of edge $e_{ij}$. Finally, add $e_{ij}$ to the set $E$.
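The edge-building step can be sketched as follows. The sketch reads the relative distance as the separation between two cells divided by the magnitude of their relative speed along the line joining them, and it takes the edge attribute to be a decaying exponential of that quantity so that closer pairs get larger weights; both readings, the choice to treat a zero relative speed as an infinite relative distance, and the names `relative_distance` and `build_edges` are assumptions for illustration. Units follow whatever the positions and velocities use.

```python
import math


def relative_distance(pos_i, vel_i, pos_j, vel_j):
    """Separation divided by the relative speed along the line joining two cells."""
    lx, ly = pos_j[0] - pos_i[0], pos_j[1] - pos_i[1]
    separation = math.hypot(lx, ly)
    ux, uy = lx / separation, ly / separation  # unit vector from cell i to cell j
    along_i = vel_i[0] * ux + vel_i[1] * uy
    along_j = vel_j[0] * ux + vel_j[1] * uy
    closing_speed = abs(along_i - along_j)
    return math.inf if closing_speed == 0 else separation / closing_speed


def build_edges(positions, velocities, t_t=5.0):
    """Return {(i, j): weight} for every pair whose relative distance is below t_t."""
    edges = {}
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            t_ij = relative_distance(
                positions[i], velocities[i], positions[j], velocities[j]
            )
            if t_ij < t_t:
                edges[(i, j)] = math.exp(-t_ij)
    return edges


# Toy system (made up): cell 0 approaches cell 1; cell 2 is far away.
positions = [(0.0, 0.0), (30.0, 0.0), (0.0, 200.0)]
velocities = [(10.0, 0.0), (0.0, 0.0), (10.0, 0.0)]
edges = build_edges(positions, velocities)
```

Cells 0 and 1 are 30 units apart with a relative speed of 10 along their connecting line, so their relative distance is 3 and they are linked with weight exp(-3); the pairs involving cell 2 exceed the threshold and get no edge.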
Fig. 6.

Steps for converting a convective system into a graph model. There are four cells in the convective system. For each cell, its seven physical attributes are calculated. The relative distance t between any two cells is also calculated. A graph model is used to represent the convective system. Each cell in the convective system is represented as a node in the graph model. Node attributes are the physical attributes of the convective cell. A threshold Tt is set. If the relative distance t between two nodes satisfies t < Tt, then an edge is established between the two nodes.

2) Graph model features

The nowcasting of multicell system SIP has been turned into the classification problem of cell graph, and the graph needs to be represented as feature vectors for ML classification algorithms.

In ML, the number of features is usually fixed. However, the number of cells in a multicell system varies, so the features of the individual cells cannot simply be concatenated as the graph features. Moreover, the features should represent the entire cell graph and contain not only the information of each cell but also the spatial distribution information between the cells. In ML, there are two main ideas for extracting the features of graph models. One relies on spectral filtering to form graph features (Bruna et al. 2013; Defferrard et al. 2016) and can be applied only to fixed graph structures. The other is directly inspired by the convolutional neural network; it extracts the features of graphs from a spatial perspective (Niepert et al. 2016; Zhang et al. 2018) and is not limited by the graph structure. In this paper, the latter method is used to extract the features of graph models.

A graph model G = (U, E) contains n nodes, and the attributes of each node are represented as a seven-dimensional vector (Table 1). An $n \times 7$ matrix $X_0$ is constructed to represent all of the node attributes in the graph, with each row of $X_0$ representing one node. At the same time, the adjacency matrix $A$ is constructed to represent the edges in the graph. $A$ is a symmetric matrix: if there is an edge $e_{ij}$ between the nodes $u_i$ and $u_j$, then $A_{ij}$ is set to the attribute of $e_{ij}$; otherwise, $A_{ij} = 0$. To generate feature vectors that can represent the entire graph model, a node fusion operation is defined to fuse the features of different cells:

$$X_{i+1} = D(A + I)X_i, \quad i = 0, 1, 2, \ldots, \tag{4}$$

where $D$ is the row normalization matrix and $I$ is the identity matrix. The fusion operation uses the edge attributes as weights, and the features of each node are fused with those of the surrounding nodes according to the spatial distribution information. From a meteorological point of view, each time a fusion operation is performed, each cell is combined with its adjacent cells to form a subgraph, which represents part of the entire convective system. For each cell graph, we perform two fusion operations and combine the two fusion results with the original attributes as the raw feature matrix $X_{\text{feature}}$ of the entire cell graph, where $X_{\text{feature}} = [X_0, X_1, X_2]$.
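A small sketch of the fusion operation may help. It assumes that the row normalization divides each row of A + I by its sum, so that every fused row is a weighted average of a node and its neighbors; the paper does not state the exact form of the normalization, and the names `fuse` and `raw_features` are hypothetical. Two features per node are used here for brevity instead of seven.

```python
def fuse(adjacency, features):
    """One fusion step: row-normalised (A + I) applied to the feature matrix."""
    n = len(features)
    width = len(features[0])
    fused = []
    for i in range(n):
        # Row i of A + I: edge weights to neighbours plus a self-weight of 1.
        weights = [adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(weights)
        fused.append(
            [sum(weights[j] * features[j][k] for j in range(n)) / total
             for k in range(width)]
        )
    return fused


def raw_features(adjacency, x0, steps=2):
    """Concatenate X0 with the results of `steps` fusion operations, row by row."""
    blocks = [x0]
    for _ in range(steps):
        blocks.append(fuse(adjacency, blocks[-1]))
    return [sum((block[i] for block in blocks), []) for i in range(len(x0))]


# Toy graph (made up): nodes 0 and 1 share an edge of weight 0.5, node 2 is isolated.
adjacency = [
    [0.0, 0.5, 0.0],
    [0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
x0 = [[2.0, 0.0], [4.0, 2.0], [10.0, 10.0]]
x_feature = raw_features(adjacency, x0)
```

Each row of `x_feature` has three times as many columns as `x0` (original attributes plus two fused copies), so seven attributes per node would give the 21 columns used in the text. The isolated node keeps its own values through both fusion steps, while the linked nodes drift toward each other.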

One of the key points of the graph features is that they should not be affected by the order of the nodes (Zhang et al. 2018). From a meteorological point of view, the overall description of the multicell system should not depend on the order in which the cells are listed. In this paper, the simple Weisfeiler–Lehman (WL) algorithm (Weisfeiler and Lehman 1968; Shervashidze et al. 2011) is used to obtain an order-independent graph feature result. The WL algorithm is iterative. In each iteration, every node fuses the attributes of its neighboring nodes and uses the fusion result to indicate its position in the graph. After several iterations, only nodes in the same position have the same fusion result. Finally, the fusion results are sorted. Therefore, regardless of the order in which the nodes are loaded when the graph is built, the graph features are the same. The fusion operation defined above can itself be regarded as the iteration step of the WL algorithm. For this reason, we can directly sort $X_{\text{feature}}$ after the fusion operations to get the final graph features, which are not affected by the order of the cells. Because the nowcasting target is the maximum precipitation, the feature VILmax, which represents the local water content of the cells, is specified as the sorting index to rank the rows of $X_{\text{feature}}$ from large to small. Finally, to ensure that the number of rows is constant, the first two rows of the sorted $X_{\text{feature}}$ are taken as the final features $X_{\text{feature}}^{\text{sort}} \in \mathbb{R}^{2 \times 21}$ of the graph.
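The sorting and truncation step can be illustrated as below. The column index of VILmax within the 21-column matrix depends on the attribute order in Table 1, which is not reproduced here, so `VIL_MAX_COLUMN` is a hypothetical placeholder, as is the function name.

```python
VIL_MAX_COLUMN = 4  # hypothetical position of VILmax among the node attributes


def graph_feature_vector(x_feature, sort_column=VIL_MAX_COLUMN, keep=2):
    """Sort rows by the chosen column (descending), keep the top rows, and flatten."""
    ranked = sorted(x_feature, key=lambda row: row[sort_column], reverse=True)
    return [value for row in ranked[:keep] for value in row]


# Toy matrix (made up): three cells with 21 fused attributes each.
rows = [[float(column + 100 * cell) for column in range(21)] for cell in range(3)]
vector = graph_feature_vector(rows)

# Listing the same cells in a different order gives the same feature vector.
reordered = [rows[1], rows[2], rows[0]]
same_vector = graph_feature_vector(reordered)
```

Keeping two rows of 21 values yields the 42-dimensional vector mentioned in the abstract. Note that Python's sort is stable, so cells with exactly equal VILmax would keep their input order; a tie-breaking rule would be needed for strict order independence in that case.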

c. Random forest algorithm

The decision tree is an ML algorithm widely used in meteorology (Burrows et al. 1995; Gagne et al. 2009; Gao et al. 2016; T. Yang et al. 2016; Ma et al. 2018). When a decision tree is constructed, the optimal attribute is selected from all attributes to split each node. The optimal attribute is the one that makes the samples in each split node belong to the same category as much as possible. This strategy gives decision trees excellent performance and good interpretability. However, the split depth of decision trees must be carefully controlled to prevent them from overfitting the training set. Random forest is an ensemble learning method developed from decision trees. It reduces the risk of overfitting by considering the output from multiple unique decision trees, which are built using random attributes. For each decision tree in a random forest, each time a node splits, $k$ attributes are randomly selected from all $d$ candidate attributes (usually $k = \log_2 d$), and the optimal attribute is selected from these $k$ attributes for node splitting (Breiman 2001). In this paper, Gini impurity is used to select the optimal attribute in node splitting and is defined as

$$\text{Gini} = \sum_{k} \sum_{k' \neq k} p_k p_{k'} = 1 - \sum_{k} p_k^2, \tag{5}$$

where $p_k$ represents the proportion of class $k$ samples in the current node. The attribute whose split reduces the Gini impurity the most is called the optimal attribute. Randomly selecting features introduces randomness into the different decision trees of one random forest and improves the robustness and generalization ability of the model. Furthermore, the joint decision of multiple decision trees further lowers the risk of overfitting. During the training process, random forests use the “bagging” method to form the training set. For an original training set containing $m$ samples, $m$ draws with replacement are performed, and the obtained $m$ samples form the actual training set for one tree. When the original dataset is large, about 63.2% of its samples are drawn for training, and the remaining 36.8% of the samples are called “out-of-bag” (OOB) data, which can be used for model evaluation. In this paper, a random forest model is trained to perform nowcasting for multicell system graph models.
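The Gini impurity and the out-of-bag proportion quoted above can be checked with a few lines. The impurity decrease below uses the usual size-weighted average of the child impurities, which is the standard convention for decision trees but is stated here as an assumption about the paper's exact criterion; the function names are illustrative.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent minus the size-weighted impurity of its children."""
    n = len(parent)
    children = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - children


def oob_fraction(m):
    """Probability that a given sample is never picked in m draws with replacement."""
    return (1.0 - 1.0 / m) ** m


parent = [1, 1, 1, 0, 0, 0]
perfect_split = gini_decrease(parent, [1, 1, 1], [0, 0, 0])
useless_split = gini_decrease(parent, [1, 0, 1], [0, 1, 0])
oob_share = oob_fraction(1079)
```

A balanced two-class node has impurity 0.5, and a split that separates the classes completely removes all of it. The out-of-bag share approaches 1/e (about 36.8%) as the training set grows, which is where the 63.2% and 36.8% figures come from.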

4. Experiments and results

a. Evaluation method

A total of 1349 SIP event samples are divided into a training set and a test set at a ratio of 8:2. To reduce the correlation between the two sets, their samples are drawn from different dates. The training set contains 1079 samples and is used for model training, and the test set contains 270 samples and is used for the final performance evaluation. The dataset partition for random forest training differs from that of other ML models. For other ML models, the dataset is often divided into a training set, a validation set, and a test set: the training and validation sets are used for model training and parameter selection, and the test set is used for performance evaluation. In a random forest, the OOB samples within the training set are not used to train the corresponding trees and can therefore be used for parameter selection (Breiman 2001). The OOB set thus plays the role of the validation set in other ML models.
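The text does not state how dates were assigned to the two sets. One possible way to keep dates disjoint is sketched below; the function `split_by_date`, its greedy fill rule, and the toy samples are assumptions for illustration, not the procedure used in the paper.

```python
import random
from collections import defaultdict


def split_by_date(samples, test_ratio=0.2, seed=0):
    """Split (date, sample) pairs so that no date appears in both sets."""
    by_date = defaultdict(list)
    for date, sample in samples:
        by_date[date].append(sample)

    dates = sorted(by_date)
    random.Random(seed).shuffle(dates)

    target = test_ratio * len(samples)
    train, test = [], []
    for date in dates:
        # Fill the test set with whole dates until it reaches the target size.
        if len(test) < target:
            test.extend(by_date[date])
        else:
            train.extend(by_date[date])
    return train, test


# Hypothetical data: 10 dates with 5 samples each; a sample is (day, index).
samples = [(f"day{d}", (d, i)) for d in range(10) for i in range(5)]
train, test = split_by_date(samples)
print(len(train), len(test))
```

Because whole dates are moved at once, the realized ratio is only approximately 8:2 when dates hold different numbers of samples.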

The contingency table method is used to evaluate model performance (Wilks 2011). As shown in Table 2, a is the number of SIP events that are correctly identified (hits); b is the number of nonevents falsely identified as events, often referred to as false alarms; c is the number of SIP events that are not identified (misses); and d is the number of nonevents that are correctly identified. From these, six evaluation indicators can be calculated: probability of detection (POD) = a/(a + c), false alarm ratio (FAR) = b/(a + b), critical success index (CSI) = a/(a + b + c), Bias = (a + b)/(a + c), true skill statistic (TSS, also named the Hanssen–Kuipers discriminant) = (ad − bc)/[(a + c)(b + d)], and Accuracy = (a + d)/(a + b + c + d).

Table 2. Contingency table of indicators for model evaluation.

POD represents the ability of the algorithm to identify SIP events correctly. FAR is the fraction of events predicted as “yes” by the algorithm that did not actually occur. CSI penalizes both misses and false alarms and gives a more comprehensive evaluation of the hits on “yes observation” events. Bias is the ratio of the “yes forecasts” to the “yes observations” and indicates whether the model underforecasts (Bias < 1) or overforecasts (Bias > 1). TSS represents the model’s ability to separate positive samples from negative samples. These five indicators are used to evaluate model performance in meteorological applications. Accuracy is a classification indicator from ML theory computed over all samples and is often used for training and fine-tuning ML models.
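The six indicators defined above can be computed directly from the four contingency table counts. The sketch below is illustrative only; the function name and the counts passed to it are hypothetical and are not results from this paper.

```python
def contingency_scores(a, b, c, d):
    """Scores from a 2x2 contingency table.

    a: hits, b: false alarms, c: misses, d: correct negatives.
    No guard against zero denominators is included in this sketch.
    """
    return {
        "POD": a / (a + c),
        "FAR": b / (a + b),
        "CSI": a / (a + b + c),
        "Bias": (a + b) / (a + c),
        "TSS": (a * d - b * c) / ((a + c) * (b + d)),
        "Accuracy": (a + d) / (a + b + c + d),
    }


# Hypothetical counts, chosen only to exercise the formulas.
scores = contingency_scores(a=40, b=10, c=10, d=40)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```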

b. Algorithm parameters selection

Random forest is an ensemble learning method composed of multiple decision trees according to the “bagging” principle. During the training of random forests, both the number of decision trees and the depth of the decision trees affect model performance.

Figure 7 shows the trends of model accuracy on the training set and OOB set as the number of decision trees increases. It can be seen that as the number of decision trees increases, the accuracy of the model gradually increases, and overfitting gradually decreases. This is consistent with the random forest theory. Random forest reduces the variance of the model through joint decisions, thereby improving model performance and reducing the risk of overfitting. When the number of decision trees is greater than 100, the accuracy and overfitting of the model tend to be stable. Further increase in the number of decision trees will not bring significant improvement to the model but will consume more computing resources. To balance model performance and computing resource overhead, the number of decision trees is set to 100.

Fig. 7. The relationship between random forest performance and the number of decision trees. The curves show the performance of the model (left axis). The bars represent the margin of the model performance between the training set and the OOB set (right axis).

Citation: Monthly Weather Review 148, 11; 10.1175/MWR-D-20-0050.1

Figure 8 shows the trends of model accuracy on the training set and OOB set as the depth of decision trees increases. As the depth increases, the performance of the model improves, but overfitting (the accuracy margin between the training set and the OOB set) also gradually increases. This shows that although random forests reduce the overfitting risk of the whole model through joint decisions, the forest as a whole still overfits when the base decision trees overfit seriously. When the depth is greater than 7, the overfitting reaches 10%, which is unacceptable; the decision trees are then too complicated relative to the sample distribution. When the depth is greater than 4, the performance of the model on the OOB set becomes stable, but the degree of overfitting still increases. To balance model overfitting and model performance, the depth of decision trees is set to 4.
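The selection rule described above (take the shallowest depth beyond which OOB accuracy no longer improves noticeably, and inspect the training-to-OOB margin) can be sketched as follows. The function `select_depth`, the tolerance `tol`, and all accuracy values are invented for illustration; they are not read from Fig. 8.

```python
def select_depth(train_acc, oob_acc, tol=0.005):
    """Return the shallowest depth on the OOB plateau and the margin per depth.

    train_acc and oob_acc map tree depth -> accuracy (fractions in [0, 1]).
    A depth is on the plateau if it loses less than tol OOB accuracy
    relative to the best deeper setting.
    """
    depths = sorted(oob_acc)
    margin = {d: train_acc[d] - oob_acc[d] for d in depths}
    chosen = depths[-1]
    best_deeper = oob_acc[chosen]
    for d in reversed(depths[:-1]):
        if best_deeper - oob_acc[d] >= tol:
            break  # going shallower than `chosen` would cost OOB accuracy
        chosen = d
        best_deeper = max(best_deeper, oob_acc[d])
    return chosen, margin


# Made-up accuracies for depths 1..8 (illustration only).
train = {1: 0.70, 2: 0.76, 3: 0.81, 4: 0.85, 5: 0.88, 6: 0.91, 7: 0.94, 8: 0.96}
oob = {1: 0.69, 2: 0.74, 3: 0.78, 4: 0.81, 5: 0.812, 6: 0.813, 7: 0.812, 8: 0.811}
depth, margin = select_depth(train, oob)
print(depth, round(margin[depth], 3))
```

The same rule could be applied to the number of trees; in both cases the margin dictionary plays the role of the bars in the figures.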

Fig. 8. The relationship between the random forest performance and the depth of decision trees. The curves show the performance of the model (left axis). The bars represent the margin of the model performance between the training set and the OOB set (right axis).

c. Algorithm performance

After selecting the optimal parameters, our algorithm is run on the test set to show the final performance. The POD, FAR, CSI, bias, and TSS are shown in Table 3 and are also shown in Fig. 9 using the performance diagram (Roebber 2009).

Table 3. Performance of different algorithms on the test set.

Fig. 9. Performance diagram summarizing the POD, FAR, bias, and CSI. The curved dotted lines indicate CSI. The straight dotted lines indicate bias.

In addition, the tracking radar echo by cross-correlation (TREC) method (Rinehart and Garvey 1978) and the optical flow-based precipitation prediction method (Bowler et al. 2004) are run on the test set as baselines to illustrate the performance of our algorithm. The results are also shown in Table 3 and Fig. 9. These two algorithms are based on radar data and are widely used in operational systems for precipitation forecasting. TREC uses the correlation of echoes between different radar regions to obtain the direction of movement of the precipitation system for nowcasting. The optical flow-based method also uses correlation to estimate the direction of movement and adds smoothing constraints to obtain better forecast results. We fine-tuned the parameters of the two algorithms to achieve their best performance on our dataset. Although all three algorithms use the same radar region, both baseline algorithms are point-to-point algorithms, so an area threshold is set to determine whether their prediction results include SIP events. The area threshold is set to 60 km², which is the average area covered by one AWS in our dataset. In other words, when the area in which a baseline algorithm forecasts precipitation of more than 20 mm h−1 is large enough to cover one AWS, the result is considered to include a SIP event. On this basis, the prediction results are compared with the real labels to calculate the POD, FAR, CSI, bias, and TSS of the two algorithms. As shown in Table 3, our algorithm and the optical flow algorithm both have higher POD than the TREC algorithm. However, both the TREC algorithm and the optical flow algorithm have higher FAR than our algorithm, which means that they are more inclined to predict non-SIP events as SIP events. Our algorithm’s FAR is much lower than those of the two baselines, and its TSS is higher. This indicates that our algorithm distinguishes SIP events from non-SIP events better. The combination of higher POD and lower FAR gives our algorithm a higher CSI. According to the bias score, the optical flow-based method tends to overforecast, whereas TREC and our algorithm show only slight overforecasting or underforecasting.
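The area-threshold rule used to turn a pixel-based forecast into an event label can be sketched as below. The 20 mm h−1 and 60 km² values come from the text; the function name, the list-of-lists grid format, and the pixel area of 1 km² are assumptions that depend on the radar product actually used.

```python
def forecast_has_sip(rain_rate_grid, pixel_area_km2=1.0,
                     rate_threshold=20.0, area_threshold_km2=60.0):
    """Label a gridded forecast as containing a SIP event.

    rain_rate_grid: 2D list of forecast rain rates in mm/h.
    pixel_area_km2: area of one grid cell (assumed value).
    Returns True when the area forecast to exceed rate_threshold is at
    least area_threshold_km2.
    """
    n_pixels = sum(1 for row in rain_rate_grid
                   for rate in row if rate > rate_threshold)
    return n_pixels * pixel_area_km2 >= area_threshold_km2


def toy_grid(block):
    """10 x 10 grid with a block x block patch of 25 mm/h (hypothetical)."""
    return [[25.0 if r < block and c < block else 0.0 for c in range(10)]
            for r in range(10)]


print(forecast_has_sip(toy_grid(8)))  # 64 km^2 above the rate threshold
print(forecast_has_sip(toy_grid(7)))  # 49 km^2 above the rate threshold
```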

Two other questions also concern model performance: whether the model is overfitting, and whether more samples would further improve its performance. These two questions are analyzed using the “learning curve” (Fig. 10). When there are few training samples, the model can easily fit them, but the distributions of the training set and the test set then differ significantly, so the model performance on the test set is poor. As the number of training samples increases, the training set becomes more complex, and the model performance on the training set decreases. At the same time, the distributions of the training set and the test set gradually become consistent, and the performance gap between the two sets becomes smaller. In Fig. 10, the final margin between the two curves is less than 5%. This shows that the model performs similarly on the training set and the test set and that there is no serious overfitting. In addition, once the number of training samples exceeds 500, the performance of the model on the test set tends to be stable. This suggests that the dataset is already large enough for this model and that adding more samples would bring little further improvement.

Fig. 10. The learning curve shows the model performance when the number of training samples increases from 1 to 1079. The curves show the performance of the model (left axis). The bars represent the margin of the model performance between the training set and the test set (right axis).

5. Discussion

a. Feature importance analysis

In ML, feature importance represents the contribution of each feature to the final result and can be used to analyze the behavior of the model, thereby improving the interpretability of the model or exploring the underlying principles of the problem. In the random forest used here, the Gini impurity is used as the basis for each node split in each tree. Every time a split of a node is made on one feature, the Gini impurity criterion for the two descendent nodes is less than the parent node. Adding up the Gini reduction of each individual feature on all trees in the forest gives the contribution of each feature to the entire model result. At the same time, the feature depth in decision trees can also be used as a measure of the contribution of the feature to the model. By combining these two contributions and normalizing them, the importance of each feature can be obtained.
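The first of the two contributions, the accumulated Gini reduction per feature, can be sketched as follows. This illustrates only the Gini-based part: the depth-based contribution and the rule for combining the two are not specified in the text and are omitted. The function name, the tuple format, and the feature names other than HOR30 are hypothetical.

```python
from collections import defaultdict


def gini_importance(splits):
    """Normalized total Gini decrease per feature.

    splits: iterable of (feature_name, sample_fraction, gini_decrease),
    one tuple per internal node over all trees, where sample_fraction is
    the share of the tree's training samples that reach the node.
    """
    raw = defaultdict(float)
    for feature, sample_fraction, decrease in splits:
        raw[feature] += sample_fraction * decrease
    total = sum(raw.values())
    if total == 0:
        return {feature: 0.0 for feature in raw}
    return {feature: value / total for feature, value in raw.items()}


# Hypothetical node statistics from a tiny forest.
splits = [
    ("HOR30", 1.00, 0.20),
    ("feature_b", 0.50, 0.10),
    ("HOR30", 0.50, 0.10),
    ("feature_c", 0.25, 0.20),
]
for name, score in sorted(gini_importance(splits).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

Weighting each decrease by the fraction of samples reaching the node keeps splits near the root, which affect more samples, from being outweighed by many small splits deep in the trees.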

Figure 11 shows the 30 most important features in this algorithm. As described in the section on graph model features, two fusion operations are applied after the original cell features are obtained, and the features obtained from the two fusions are combined with the original features. Finally, the features of the first two subgraphs are taken as the features of the entire multicell convective system. This process can be explained as follows. After the fusion operations, each node in the graph model no longer represents a single cell but represents the common attributes of this cell and its surrounding cells. Because each node can represent multiple cells, each node after the fusion operations is also called a subgraph. The first subgraph represents the features of the strongest part of the multicell convective system, and the second subgraph represents the features of the other, weaker cells. The two subgraphs jointly represent the overall features of a multicell system. In this way, each original feature in Table 1 gives rise to multiple derived features. For convenience, a superscript is added to each original feature name to distinguish these derived features. The α and β in the superscripts identify whether the feature comes from the first subgraph or the second subgraph. The 0, 1, and 2 in the superscripts distinguish whether the feature comes from the original feature, the first fusion operation, or the second fusion operation (e.g., HOR30α2 is the feature derived from HOR30 after two fusion operations that belongs to the first subgraph).

Fig. 11. The importance of features. Each vertical axis tick label consists of two parts. The first part consists of capital letters and subscripts and represents the symbol of the feature in Table 1. The second part is the superscript and represents the stage of the feature in the fusion operations, which is described in detail in section 5a. For the superscripts, α indicates that this feature belongs to the first subgraph, and β indicates that this feature belongs to the second subgraph. The numbers 0, 1, and 2 respectively correspond to the original feature, the feature of the first fusion, and the feature of the second fusion.

In Fig. 11, the HOR30α0 feature is far more important than the other features. HOR30 represents the 30-dBZ top height of cell echoes. It potentially reflects the strength of updrafts and the moisture content in cells. This result is consistent with general knowledge. A moderate top height allows more moisture to fall to the ground in the liquid phase. A higher top height means that more moisture is carried to higher altitudes, where it is more likely to form hail. A top height that is too low means that not enough moisture may be carried aloft, making intense precipitation difficult to produce. The result shows that for SIP events, the updraft strength or moisture content of the strongest cell in the convective system is the most critical factor. The features of the second subgraph, indexed by β in the superscripts, also make a large contribution. The second subgraph features were originally intended to supplement the strongest part of the system and to make the information on the multicell system more complete. This result shows that the strongest part of a multicell system alone cannot effectively distinguish SIP events from non-SIP events. Instead, the overall distribution of the multicell system is more important for predicting SIP events. This also suggests that SIP is caused by all the cells in the multicell convective system and that analyzing any one of them independently cannot effectively predict SIP.

b. Error analysis

The graph model and random forest show excellent performance on SIP nowcasting but still can make classification errors. Diagnosing the source of these errors can be challenging since random forest classification depends on the output from all the unique trees in the forest. One of the simplest ways to analyze ML model errors is to investigate the mistakes the ML model makes, so as to estimate the possible problems of the model and improve it.

Figure 12 shows a false alarm case. At 0700 UTC, the multicell convective system is large, and the algorithm judges from the situation at this time that the system will cause a SIP event in the next hour. However, the convective system weakens quickly over the next few scans and does not cause SIP. From the spatiotemporal characteristics of this convection process, we infer that the algorithm lacks the ability to characterize convective systems that change dramatically in time. Figure 13 shows a SIP event that is not correctly identified (a false negative). Viewed as a whole, this convective system is large, and its size does not change significantly over time. Therefore, one reason for the error is attributed to the poor generalization ability of the algorithm. At the scale of individual cells, however, the convective cells at the center of the system gradually weaken, while the cells around the system progressively strengthen. Therefore, the algorithm’s inability to characterize changes in convective cell intensity is another reason for this error.

Fig. 12. A SIP event false alarm case. The time period is 0700–0754 UTC 29 Jun 2016, and the data come from the radar located in Shijiazhuang. The colored points are the AWS sites with precipitation. The area marked with a blue outline in the image at 0700 UTC is the multicell convective system at the moment of the false alarm. In the next hour of the false alarm, the intensity of this multicell system quickly weakened.

Fig. 13. A SIP event that is not correctly identified. The data come from the radar located in Jinan, and the time period is 1000–1053 UTC 25 Jul 2016. The area surrounded by the blue outline is the multicell convective system when the omission occurs. The colored points are the AWS sites with precipitation. During this SIP event, the scale of the entire convective system increases slightly but not dramatically. The size of the small cells located around the system increases significantly.

According to the above analysis, the causes of false alarms and false negatives can be divided into two categories: poor ability to represent the growth and dissipation of convective systems, and poor generalization ability. For false alarms, the proportions of errors attributed to these two causes are approximately 36.8% and 63.2%, respectively. For false negatives, the proportions are about 20% and 80%. This classification seems simple, but the meaning behind it deserves consideration. The first category represents errors caused by the complexity of the research object. This kind of error is specific and points to a prominent characteristic of the research object itself. It can often be addressed by a targeted model design that incorporates domain knowledge. The second category represents systematic error in the learning process of the model, which often needs to be addressed by improving the overall accuracy of the system or changing the learning strategy of the model.

For the first problem, an explicit description of the growth and dissipation of multicell systems should be included in the model design. Specifically, for each cell, growth and dissipation could be described by the change in the long axis and short axis of the external ellipse of the cell. For a more precise description, mathematical morphology could also be used to describe the growth and dissipation of the cell at key points. However, there is always a trade-off. Simple descriptors are less accurate, but they are more robust to noise and can be easily combined with existing models. More complex descriptors are more accurate, but they are more sensitive to noise and harder to integrate into existing models. Whether simple or complex descriptors are preferable requires more detailed experimental support in the future.
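One possible form of the simple ellipse-based descriptor mentioned above is the relative change in ellipse area between two consecutive scans. The sketch below is a hypothetical illustration: the function names, the use of area rather than the individual axes, and the example axis lengths and scan interval are all assumptions, not part of the proposed method.

```python
import math


def ellipse_area(major_axis_km, minor_axis_km):
    """Area of an ellipse given its full major and minor axis lengths."""
    return math.pi * (major_axis_km / 2.0) * (minor_axis_km / 2.0)


def growth_rate(axes_prev, axes_curr, dt_min):
    """Relative change in ellipse area per minute between two scans.

    axes_prev, axes_curr: (major_axis_km, minor_axis_km) at each scan.
    Positive values suggest growth; negative values suggest dissipation.
    """
    area_prev = ellipse_area(*axes_prev)
    area_curr = ellipse_area(*axes_curr)
    return (area_curr - area_prev) / (area_prev * dt_min)


# Hypothetical cell: axes grow from (10, 6) km to (12, 7) km in 6 min.
print(growth_rate((10.0, 6.0), (12.0, 7.0), dt_min=6.0))
# The reverse change gives a negative rate (dissipation).
print(growth_rate((12.0, 7.0), (10.0, 6.0), dt_min=6.0))
```

A descriptor of this kind would add one value per cell to the node attributes, which is what makes it easy to combine with an existing feature vector.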

For the second problem, a model derived from data often produces unexpected errors, and locating the source of an error becomes more difficult when the algorithm is composed of multiple steps. All the details of the algorithm then need to be carefully examined, and adjusting the learning strategy of the model should also be considered. For the former, the focus is cell speed estimation. The polynomial expansion optical flow method proposed by Farnebäck (2003) is used to estimate the cell velocity, and the dense optical flow field can be obtained efficiently. However, because of the limitations of the optical flow method, good velocity results are obtained for larger cells, but the velocity estimates for very small cells are not accurate enough. To address this problem, the velocity estimation method based on the locus of regional centers (Johnson et al. 1998; Hou and Wang 2017) will be incorporated in future work to improve the velocity estimates of small cells. For the model learning strategy, the most intuitive improvement is to use more labeled samples together with more complex models, such as large-scale deep learning models supported by more samples. It should also be noted that multicell precipitation systems differ geographically, especially in China, which has vast areas and diverse climatic conditions. Therefore, given more samples, different models will be trained for different regions to improve performance in practical applications.

6. Summary

The contributions of this paper can be summarized as follows:

  1. The SIP nowcasting problem is considered from the perspective of the multicell system and is combined with graph theory and ML theory, providing a novel perspective for SIP nowcasting.

  2. The graph model is used to represent multicell systems. It takes into account the features of each cell in the multicell system and the spatial distribution of the entire system. Then, the feature vectors that could be used by ML algorithms are generated.

  3. The random forest algorithm is used for the SIP event nowcasting of multicell convective system graph models and shows excellent performance.

In the future, the research will focus on enhancing the model’s ability to represent the growth and dissipation of the system, as well as further improving the model’s training strategy. In addition, the research will introduce environmental information and construct spatial–temporal models. Environmental field information includes temperature, convective available potential energy, convective inhibition, etc. These data describe the water vapor source and the dynamical system and can more fundamentally predict the growth and decay of cells (Han et al. 2017). Because this work takes multicell systems rather than pixels as the basic unit, which differs from the traditional format of environmental field data, future efforts will focus on how to introduce environmental field information into multicell systems effectively. Finally, the weather system is a spatial–temporal system, so a spatial–temporal model is a natural and more effective way to describe it. Shi et al. (2015) showed that introducing the time-based LSTM model into the radar extrapolation task can improve the accuracy of the results. A future focus is how to combine LSTM with the multicell graph model in this paper to construct a spatial–temporal nowcasting model for convective systems.

Acknowledgments

This work is supported by the Natural Science Foundation of Tianjin, China (14JCYBJC21800). All the data are provided by the Meteorological Observation Center of the China Meteorological Administration.

REFERENCES

  • Akbari Asanjan, A., T. Yang, K. Hsu, S. Sorooshian, J. Lin, and Q. Peng, 2018: Short-term precipitation forecast based on the PERSIANN system and LSTM recurrent neural networks. J. Geophys. Res. Atmos., 123, 12 543–12 563, https://doi.org/10.1029/2018jd028375.

  • Bližňák, V., Z. Sokol, and P. Zacharov, 2017: Nowcasting of deep convective clouds and heavy precipitation: Comparison study between NWP model simulation and extrapolation. Atmos. Res., 184, 24–34, https://doi.org/10.1016/j.atmosres.2016.10.003.

  • Bowler, N. E., C. E. Pierce, and A. Seed, 2004: Development of a precipitation nowcasting algorithm based upon optical flow techniques. J. Hydrol., 288, 74–91, https://doi.org/10.1016/j.jhydrol.2003.11.011.

  • Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, https://doi.org/10.1023/A:1010933404324.

  • Bruna, J., W. Zaremba, A. Szlam, and Y. LeCun, 2013: Spectral networks and locally connected networks on graphs. arXiv:1312.6203, https://arxiv.org/abs/1312.6203.

  • Burrows, W. R., M. Benjamin, S. Beauchamp, E. R. Lord, D. McCollor, and B. Thomson, 1995: CART decision-tree statistical analysis and prediction of summer season maximum surface ozone for the Vancouver, Montreal, and Atlantic regions of Canada. J. Appl. Meteor., 34, 1848–1862, https://doi.org/10.1175/1520-0450(1995)034<1848:CDTSAA>2.0.CO;2.

  • Czernecki, B., M. Taszarek, M. Marosz, M. Półrolniczak, L. Kolendowicz, A. Wyszogrodzki, and J. Szturc, 2019: Application of machine learning to large hail prediction—The importance of radar reflectivity, lightning occurrence and convective parameters derived from ERA5. Atmos. Res., 227, 249–262, https://doi.org/10.1016/j.atmosres.2019.05.010.

  • Defferrard, M., X. Bresson, and P. Vandergheynst, 2016: Convolutional neural networks on graphs with fast localized spectral filtering. 30th Conf. on Neural Information Processing System (NIPS 2016), Barcelona, Spain, NIPS, https://papers.nips.cc/paper/6081-convolutional-neural-networks-on-graphs-with-fast-localized-spectral-filtering.pdf.

  • Dixon, M., and G. Wiener, 1993: TITAN: Thunderstorm Identification, Tracking, Analysis, and Nowcasting—A radar-based methodology. J. Atmos. Oceanic Technol., 10, 785–797, https://doi.org/10.1175/1520-0426(1993)010<0785:TTITAA>2.0.CO;2.

  • Farnebäck, G., 2003: Two-frame motion estimation based on polynomial expansion. Image Analysis SCIA 2003, J. Bigun and T. Gustavsson, Eds., Lecture Notes in Computer Science, Vol. 2749, Springer, 363–370.

  • Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 1341–1353, https://doi.org/10.1175/2008JTECHA1205.1.

  • Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 1024–1043, https://doi.org/10.1175/WAF-D-13-00108.1.

  • Gagne, D. J., A. McGovern, S. E. Haupt, R. A. Sobash, J. K. Williams, and M. Xue, 2017: Storm-based probabilistic hail forecasting with machine learning applied to convection-allowing ensembles. Wea. Forecasting, 32, 1819–1840, https://doi.org/10.1175/WAF-D-17-0010.1.

  • Gao, S., W. Zhang, J. Liu, I.-I. Lin, L. S. Chiu, and K. Cao, 2016: Improvements in typhoon intensity change classification by incorporating an ocean coupling potential intensity index into decision trees. Wea. Forecasting, 31, 95–106, https://doi.org/10.1175/WAF-D-15-0062.1.

  • Gao, X., M. Guo, Z. Yang, Q. Zhu, Z. Xu, and K. Gao, 2020: Temperature dependence of extreme precipitation over mainland China. J. Hydrol., 583, 124595, https://doi.org/10.1016/j.jhydrol.2020.124595.

  • Golding, B. W., 1998: Nimrod: A system for generating automated very short range forecasts. Meteor. Appl., 5, 1–16, https://doi.org/10.1017/S1350482798000577.

  • Guo, X., D. Fu, and J. Wang, 2006: Mesoscale convective precipitation system modified by urbanization in Beijing city. Atmos. Res., 82, 112–126, https://doi.org/10.1016/j.atmosres.2005.12.007.

  • Han, L., S. Fu, L. Zhao, Y. Zheng, H. Wang, and Y. Lin, 2009: 3D convective storm identification, tracking, and forecasting—An enhanced TITAN algorithm. J. Atmos. Oceanic Technol., 26, 719–732, https://doi.org/10.1175/2008JTECHA1084.1.

  • Han, L., J. Sun, W. Zhang, Y. Xiu, H. Feng, and Y. Lin, 2017: A machine learning nowcasting method based on real-time reanalysis data. J. Geophys. Res. Atmos., 122, 4038–4051, https://doi.org/10.1002/2016JD025783.

  • Han, X., H. Xue, C. Zhao, and D. Lu, 2016: The roles of convective and stratiform precipitation in the observed precipitation trends in northwest China during 1961–2000. Atmos. Res., 169, 139–146, https://doi.org/10.1016/j.atmosres.2015.10.001.

  • Handwerker, J., 2002: Cell tracking with trace 3D: A new algorithm. Atmos. Res., 61, 15–34, https://doi.org/10.1016/S0169-8095(01)00100-4.

  • Herman, G. R., and R. S. Schumacher, 2018: Money doesn’t grow on trees, but forecasts do: Forecasting extreme precipitation with random forests. Mon. Wea. Rev., 146, 1571–1600, https://doi.org/10.1175/MWR-D-17-0250.1.

  • Horn, B. K., and B. G. Schunck, 1981: Determining optical flow. Artif. Intell., 17, 185–203, https://doi.org/10.1016/0004-3702(81)90024-2.

  • Hou, J., and P. Wang, 2017: Storm tracking via tree structure representation of radar data. J. Atmos. Oceanic Technol., 34, 729–747, https://doi.org/10.1175/JTECH-D-15-0119.1.

  • Iadanza, C., A. Trigila, and F. Napolitano, 2016: Identification and characterization of rainfall events responsible for triggering of debris flows and shallow landslides. J. Hydrol., 541, 230–245, https://doi.org/10.1016/j.jhydrol.2016.01.018.

  • Jain, A. K., M. N. Murty, and P. J. Flynn, 1999: Data clustering: A review. ACM Comput. Surv., 31, 264–323, https://doi.org/10.1145/331499.331504.

  • Johnson, J. T., P. L. Mackeen, A. Witt, E. D. Mitchell, G. J. Stumpf, M. D. Eilts, and K. W. Thomas, 1998: The Storm Cell Identification and Tracking Algorithm: An enhanced WSR-88D algorithm. Wea. Forecasting, 13, 263–276, https://doi.org/10.1175/1520-0434(1998)013<0263:TSCIAT>2.0.CO;2.

  • Lai, E. S., 1998: TREC application in tropical cyclone observation. ESCAP/WMO Typhoon Committee Annual Review, N. C. Lomarda, Ed., Typhoon Committee Secretariat, 135–139.

  • Li, L., W. Schmid, and J. Joss, 1995: Nowcasting of motion and growth of precipitation with radar over a complex orography. J. Appl. Meteor., 34, 1286–1300, https://doi.org/10.1175/1520-0450(1995)034<1286:NOMAGO>2.0.CO;2.

  • Liang, Q., Y. Feng, W. Deng, S. Hu, Y. Huang, Q. Zeng, and Z. Chen, 2010: A composite approach of radar echo extrapolation based on TREC vectors in combination with model-predicted winds. Adv. Atmos. Sci., 27, 1119–1130, https://doi.org/10.1007/s00376-009-9093-4.

  • Liu, W., X. Li, and D. A. Rahn, 2016: Storm event representation and analysis based on a directed spatiotemporal graph model. Int. J. Geogr. Info. Sci., 30, 948–969, https://doi.org/10.1080/13658816.2015.1081910.

  • Liu, Y., D.-G. Xi, Z.-L. Li, and Y. Hong, 2015: A new methodology for pixel-quantitative precipitation nowcasting using a pyramid Lucas Kanade optical flow approach. J. Hydrol., 529, 354–364, https://doi.org/10.1016/j.jhydrol.2015.07.042.

  • Liu, Y.-Y., L. Li, Y.-S. Liu, P. W. Chan, and W.-H. Zhang, 2020: Dynamic spatial-temporal precipitation distribution models for short-duration rainstorms in Shenzhen, China based on machine learning. Atmos. Res., 237, 104861, https://doi.org/10.1016/j.atmosres.2020.104861.

  • Loken, E. D., A. J. Clark, A. McGovern, M. Flora, and K. Knopfmeier, 2019: Postprocessing next-day ensemble probabilistic precipitation forecasts using random forests. Wea. Forecasting, 34, 2017–2044, https://doi.org/10.1175/WAF-D-19-0109.1.

  • Ma, L., G. Zhang, and E. Lu, 2018: Using the gradient boosting decision tree to improve the delineation of hourly rain areas during the summer from advanced Himawari imager data. J. Hydrometeor., 19, 761–776, https://doi.org/10.1175/JHM-D-17-0109.1.

  • Niepert, M., M. Ahmed, and K. Kutzkov, 2016: Learning convolutional neural networks for graphs. Int. Conf. on Machine Learning, New York, NY, JMLR, 2014–2023.

  • Rinehart, R., and E. Garvey, 1978: Three-dimensional storm motion detection by conventional weather radar. Nature, 273, 287–289, https://doi.org/10.1038/273287a0.

  • Roebber, P. J., 2009: Visualizing multiple measures of forecast quality. Wea. Forecasting, 24, 601–608, https://doi.org/10.1175/2008WAF2222159.1.

  • Rosen, K. H., and K. Krithivasan, 2012: Discrete Mathematics and Its Applications: With Combinatorics and Graph Theory. McGraw-Hill Education, 1071 pp.

  • Rossi, P. J., V. Chandrasekar, V. Hasu, and D. Moisseev, 2015: Kalman filtering–based probabilistic nowcasting of object-oriented tracked convective storms. J. Atmos. Oceanic Technol., 32, 461–477, https://doi.org/10.1175/JTECH-D-14-00184.1.

  • Seo, J.-H., Y. H. Lee, and Y.-H. Kim, 2014: Feature selection for very short-term heavy rainfall prediction using evolutionary computation. Adv. Meteor., 2014, 203545, https://doi.org/10.1155/2014/203545.

  • Shervashidze, N., P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt, 2011: Weisfeiler–Lehman graph kernels. J. Mach. Learn. Res., 12, 2539–2561.

  • Shi, X., Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-C. Woo, 2015: Convolutional LSTM network: A machine learning approach for precipitation nowcasting. NIPS’15: Proceedings of the 28th International Conference on Neural Information Processing Systems, C. Cortes et al., Eds., Vol. 1, MIT Press, 802–810.

  • Sokol, Z., V. Bližňák, P. Zacharov, and K. Skripniková, 2016: Nowcasting of hailstorms simulated by the NWP model COSMO for the area of the Czech Republic. Atmos. Res., 171, 66–76, https://doi.org/10.1016/j.atmosres.2015.12.006.

  • Wang, P., C. Li, and Y. Zhang, 2013: An adaptive segmentation arithmetic adapted to intertwined irregular convective storm images. 2013 Int. Conf. on Machine Learning and Cybernetics, Tianjin, China, IEEE, 896–900.

  • Wang, P., J. Shi, J. Hou, and Y. Hu, 2018: The identification of hail storms in the early stage using time series analysis. J. Geophys. Res. Atmos., 123, 929–947, https://doi.org/10.1002/2017jd027449.

  • Weisfeiler, B., and A. A. Lehman, 1968: A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Tech. Info., 2, 12–16.

  • Whitehall, K., and Coauthors, 2015: Exploring a graph theory based algorithm for automated identification and characterization of large mesoscale convective systems in satellite datasets. Earth Sci. Info., 8, 663–675, https://doi.org/10.1007/s12145-014-0181-3.

  • Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. International Geophysics Series, Vol. 100, Academic Press, 704 pp.

  • Wilson, J. W., E. E. Ebert, T. R. Saxen, R. D. Roberts, C. K. Mueller, M. Sleigh, C. E. Pierce, and A. Seed, 2004: Sydney 2000 forecast demonstration project: Convective storm nowcasting. Wea. Forecasting, 19, 131–150, https://doi.org/10.1175/1520-0434(2004)019<0131:SFDPCS>2.0.CO;2.

  • Woo, W., and W. Wong, 2017: Operational application of optical flow techniques to radar-based rainfall nowcasting. Atmosphere, 8, 48, https://doi.org/10.3390/atmos8030048.

  • Yang, B., J. Sun, M. Xu, and Y. Lin, 2016: Multi-scale characteristics of atmospheric circulation related to short-time strong rainfall events in Beijing. Acta Meteor. Sin., 74, 919–934.

  • Yang, T., X. Gao, S. Sorooshian, and X. Li, 2016: Simulating California reservoir operation using the classification and regression-tree algorithm combined with a shuffled cross-validation scheme. Water Resour. Res., 52, 1626–1651, https://doi.org/10.1002/2015WR017394.

  • Zahraei, A., K. Hsu, S. Sorooshian, J. Gourley, V. Lakshmanan, Y. Hong, and T. Bellerby, 2012: Quantitative precipitation nowcasting: A Lagrangian pixel-based approach. Atmos. Res., 118, 418–434, https://doi.org/10.1016/j.atmosres.2012.07.001.

  • Zahraei, A., K. Hsu, S. Sorooshian, J. Gourley, Y. Hong, and A. Behrangi, 2013: Short-term quantitative precipitation forecasting using an object-based approach. J. Hydrol., 483, 1–15, https://doi.org/10.1016/j.jhydrol.2012.09.052.

  • Zhang, M., Z. Cui, M. Neumann, and Y. Chen, 2018: An end-to-end deep learning architecture for graph classification. 32nd AAAI Conf. on Artificial Intelligence, New Orleans, LA, AAAI, 4438–4445.

Save
  • Akbari Asanjan, A., T. Yang, K. Hsu, S. Sorooshian, J. Lin, and Q. Peng, 2018: Short-term precipitation forecast based on the PERSIANN system and LSTM recurrent neural networks. J. Geophys. Res. Atmos., 123, 12 54312 563, https://doi.org/10.1029/2018jd028375.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bližňák, V., Z. Sokol, and P. Zacharov, 2017: Nowcasting of deep convective clouds and heavy precipitation: Comparison study between NWP model simulation and extrapolation. Atmos. Res., 184, 2434, https://doi.org/10.1016/j.atmosres.2016.10.003.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bowler, N. E., C. E. Pierce, and A. Seed, 2004: Development of a precipitation nowcasting algorithm based upon optical flow techniques. J. Hydrol., 288, 7491, https://doi.org/10.1016/j.jhydrol.2003.11.011.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Breiman, L., 2001: Random forests. Mach. Learn., 45, 532, https://doi.org/10.1023/A:1010933404324.

  • Bruna, J., W. Zaremba, A. Szlam, and Y. LeCun, 2013: Spectral networks and locally connected networks on graphs. arXiv:1312.6203, https://arxiv.org/abs/1312.6203.

  • Burrows, W. R., M. Benjamin, S. Beauchamp, E. R. Lord, D. McCollor, and B. Thomson, 1995: CART decision-tree statistical analysis and prediction of summer season maximum surface ozone for the Vancouver, Montreal, and Atlantic regions of Canada. J. Appl. Meteor., 34, 18481862, https://doi.org/10.1175/1520-0450(1995)034<1848:CDTSAA>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Czernecki, B., M. Taszarek, M. Marosz, M. Półrolniczak, L. Kolendowicz, A. Wyszogrodzki, and J. Szturc, 2019: Application of machine learning to large hail prediction—The importance of radar reflectivity, lightning occurrence and convective parameters derived from ERA5. Atmos. Res., 227, 249262, https://doi.org/10.1016/j.atmosres.2019.05.010.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Defferrard, M., X. Bresson, and P. Vandergheynst, 2016: Convolutional neural networks on graphs with fast localized spectral filtering. 30th Conf. on Neural Information Processing System (NIPS 2016), Barcelona, Spain, NIPS, https://papers.nips.cc/paper/6081-convolutional-neural-networks-on-graphs-with-fast-localized-spectral-filtering.pdf.

  • Dixon, M., and G. Wiener, 1993: TITAN: Thunderstorm Identification, Tracking, Analysis, and Nowcasting—A radar-based methodology. J. Atmos. Oceanic Technol., 10, 785797, https://doi.org/10.1175/1520-0426(1993)010<0785:TTITAA>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Farnebäck, G., 2003: Two-frame motion estimation based on polynomial expansion. Image Analysis SCIA 2003, J. Bigun and T. Gustavsson, Eds., Lecture Notes in Computer Science, Vol. 2749, Springer, 363–370.

  • Gagne, D. J., A. McGovern, and J. Brotzge, 2009: Classification of convective areas using decision trees. J. Atmos. Oceanic Technol., 26, 13411353, https://doi.org/10.1175/2008JTECHA1205.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Gagne, D. J., A. McGovern, and M. Xue, 2014: Machine learning enhancement of storm-scale ensemble probabilistic quantitative precipitation forecasts. Wea. Forecasting, 29, 10241043, https://doi.org/10.1175/WAF-D-13-00108.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Gagne, D. J., A. McGovern, S. E. Haupt, R. A. Sobash, J. K. Williams, and M. Xue, 2017: Storm-based probabilistic hail forecasting with machine learning applied to convection-allowing ensembles. Wea. Forecasting, 32, 18191840, https://doi.org/10.1175/WAF-D-17-0010.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Gao, S., W. Zhang, J. Liu, I.-I. Lin, L. S. Chiu, and K. Cao, 2016: Improvements in typhoon intensity change classification by incorporating an ocean coupling potential intensity index into decision trees. Wea. Forecasting, 31, 95106, https://doi.org/10.1175/WAF-D-15-0062.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Gao, X., M. Guo, Z. Yang, Q. Zhu, Z. Xu, and K. Gao, 2020: Temperature dependence of extreme precipitation over mainland China. J. Hydrol., 583, 124595, https://doi.org/10.1016/j.jhydrol.2020.124595.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Golding, B. W., 1998: Nimrod: A system for generating automated very short range forecasts. Meteor. Appl., 5, 116, https://doi.org/10.1017/S1350482798000577.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Guo, X., D. Fu, and J. Wang, 2006: Mesoscale convective precipitation system modified by urbanization in Beijing city. Atmos. Res., 82, 112126, https://doi.org/10.1016/j.atmosres.2005.12.007.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Han, L., S. Fu, L. Zhao, Y. Zheng, H. Wang, and Y. Lin, 2009: 3D convective storm identification, tracking, and forecasting—An enhanced TITAN algorithm. J. Atmos. Oceanic Technol., 26, 719732, https://doi.org/10.1175/2008JTECHA1084.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Han, L., J. Sun, W. Zhang, Y. Xiu, H. Feng, and Y. Lin, 2017: A machine learning nowcasting method based on real-time reanalysis data. J. Geophys. Res. Atmos., 122, 40384051, https://doi.org/10.1002/2016JD025783.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Han, X., H. Xue, C. Zhao, and D. Lu, 2016: The roles of convective and stratiform precipitation in the observed precipitation trends in northwest China during 1961–2000. Atmos. Res., 169, 139146, https://doi.org/10.1016/j.atmosres.2015.10.001.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Handwerker, J., 2002: Cell tracking with trace 3D: A new algorithm. Atmos. Res., 61, 1534, https://doi.org/10.1016/S0169-8095(01)00100-4.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Herman, G. R., and R. S. Schumacher, 2018: Money doesn’t grow on trees, but forecasts do: Forecasting extreme precipitation with random forests. Mon. Wea. Rev., 146, 15711600, https://doi.org/10.1175/MWR-D-17-0250.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Horn, B. K., and B. G. Schunck, 1981: Determining optical flow. Artif. Intell., 17, 185203, https://doi.org/10.1016/0004-3702(81)90024-2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hou, J., and P. Wang, 2017: Storm tracking via tree structure representation of radar data. J. Atmos. Oceanic Technol., 34, 729747, https://doi.org/10.1175/JTECH-D-15-0119.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Iadanza, C., A. Trigila, and F. Napolitano, 2016: Identification and characterization of rainfall events responsible for triggering of debris flows and shallow landslides. J. Hydrol., 541, 230245, https://doi.org/10.1016/j.jhydrol.2016.01.018.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Jain, A. K., M. N. Murty, and P. J. Flynn, 1999: Data clustering: A review. ACM Comput. Surv., 31, 264323, https://doi.org/10.1145/331499.331504.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Johnson, J. T., P. L. Mackeen, A. Witt, E. D. Mitchell, G. J. Stumpf, M. D. Eilts, and K. W. Thomas, 1998: The Storm Cell Identification and Tracking Algorithm: An enhanced WSR-88D algorithm. Wea. Forecasting, 13, 263276, https://doi.org/10.1175/1520-0434(1998)013<0263:TSCIAT>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lai, E. S., 1998: TREC application in tropical cyclone observation. ESCAP/WMO Typhoon Committee Annual Review, N. C. Lomarda, Ed., Typhoon Committee Secretariat, 135–139.

  • Li, L., W. Schmid, and J. Joss, 1995: Nowcasting of motion and growth of precipitation with radar over a complex orography. J. Appl. Meteor., 34, 12861300, https://doi.org/10.1175/1520-0450(1995)034<1286:NOMAGO>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Liang, Q., Y. Feng, W. Deng, S. Hu, Y. Huang, Q. Zeng, and Z. Chen, 2010: A composite approach of radar echo extrapolation based on TREC vectors in combination with model-predicted winds. Adv. Atmos. Sci., 27, 11191130, https://doi.org/10.1007/s00376-009-9093-4.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Liu, W., X. Li, and D. A. Rahn, 2016: Storm event representation and analysis based on a directed spatiotemporal graph model. Int. J. Geogr. Info. Sci., 30, 948969, https://doi.org/10.1080/13658816.2015.1081910.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Liu, Y., D.-G. Xi, Z.-L. Li, and Y. Hong, 2015: A new methodology for pixel-quantitative precipitation nowcasting using a pyramid Lucas Kanade optical flow approach. J. Hydrol., 529, 354364, https://doi.org/10.1016/j.jhydrol.2015.07.042.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Liu, Y.-Y., L. Li, Y.-S. Liu, P. W. Chan, and W.-H. Zhang, 2020: Dynamic spatial-temporal precipitation distribution models for short-duration rainstorms in Shenzhen, China based on machine learning. Atmos. Res., 237, 104861, https://doi.org/10.1016/j.atmosres.2020.104861.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Loken, E. D., A. J. Clark, A. McGovern, M. Flora, and K. Knopfmeier, 2019: Postprocessing next-day ensemble probabilistic precipitation forecasts using random forests. Wea. Forecasting, 34, 20172044, https://doi.org/10.1175/WAF-D-19-0109.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Ma, L., G. Zhang, and E. Lu, 2018: Using the gradient boosting decision tree to improve the delineation of hourly rain areas during the summer from advanced Himawari imager data. J. Hydrometeor., 19, 761776, https://doi.org/10.1175/JHM-D-17-0109.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Niepert, M., M. Ahmed, and K. Kutzkov, 2016: Learning convolutional neural networks for graphs. Int. Conf. on Machine Learning, New York, NY, JMLR, 2014–2023.

  • Rinehart, R., and E. Garvey, 1978: Three-dimensional storm motion detection by conventional weather radar. Nature, 273, 287289, https://doi.org/10.1038/273287a0.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Roebber, P. J., 2009: Visualizing multiple measures of forecast quality. Wea. Forecasting, 24, 601608, https://doi.org/10.1175/2008WAF2222159.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Rosen, K. H., and K. Krithivasan, 2012: Discrete Mathematics and Its Applications: With Combinatorics and Graph Theory. McGraw-Hill Education, 1071 pp.

    • Search Google Scholar
    • Export Citation
  • Rossi, P. J., V. Chandrasekar, V. Hasu, and D. Moisseev, 2015: Kalman filtering–based probabilistic nowcasting of object-oriented tracked convective storms. J. Atmos. Oceanic Technol., 32, 461477, https://doi.org/10.1175/JTECH-D-14-00184.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Seo, J.-H., Y. H. Lee, and Y.-H. Kim, 2014: Feature selection for very short-term heavy rainfall prediction using evolutionary computation. Adv. Meteor., 2014, 203545, https://doi.org/10.1155/2014/203545.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Shervashidze, N., P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt, 2011: Weisfeiler–Lehman graph kernels. J. Mach. Learn. Res., 12, 25392561.

    • Search Google Scholar
    • Export Citation
  • Shi, X., Z. Chen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-C. Woo, 2015: Convolutional LSTM network: A machine learning approach for precipitation nowcasting. NIPS’15: Proceedings of the 28th International Conference on Neural Information Processing Systems, C. Cortes et al., Eds., Vol. 1, MIT Press, 802–810.

  • Sokol, Z., V. Bližňák, P. Zacharov, and K. Skripniková, 2016: Nowcasting of hailstorms simulated by the NWP model COSMO for the area of the Czech Republic. Atmos. Res., 171, 6676, https://doi.org/10.1016/j.atmosres.2015.12.006.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Wang, P., C. Li, and Y. Zhang, 2013: An adaptive segmentation arithmetic adapted to intertwined irregular convective storm images. 2013 Int. Conf. on Machine Learning and Cybernetics, Tianjin, China, IEEE, 896–900.

  • Wang, P., J. Shi, J. Hou, and Y. Hu, 2018: The identification of hail storms in the early stage using time series analysis. J. Geophys. Res. Atmos., 123, 929947, https://doi.org/10.1002/2017jd027449.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Weisfeiler, B., and A. A. Lehman, 1968: A reduction of a graph to a canonical form and an algebra arising during this reduction. Nauchno-Tech. Info., 2, 1216.

    • Search Google Scholar
    • Export Citation
  • Whitehall, K., and Coauthors, 2015: Exploring a graph theory based algorithm for automated identification and characterization of large mesoscale convective systems in satellite datasets. Earth Sci. Info., 8, 663675, https://doi.org/10.1007/s12145-014-0181-3.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. International Geophysics Series, Vol. 100, Academic Press, 704 pp.

  • Wilson, J. W., E. E. Ebert, T. R. Saxen, R. D. Roberts, C. K. Mueller, M. Sleigh, C. E. Pierce, and A. Seed, 2004: Sydney 2000 forecast demonstration project: Convective storm nowcasting. Wea. Forecasting, 19, 131150, https://doi.org/10.1175/1520-0434(2004)019<0131:SFDPCS>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Woo, W., and W. Wong, 2017: Operational application of optical flow techniques to radar-based rainfall nowcasting. Atmosphere, 8, 48, https://doi.org/10.3390/atmos8030048.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Yang, B., J. Sun, M. Xu, and Y. Lin, 2016: Multi-scale characteristics of atmospheric circulation related to short-time strong rainfall events in Beijing. Acta Meteor. Sin., 74, 919934.

    • Search Google Scholar
    • Export Citation
  • Yang, T., X. Gao, S. Sorooshian, and X. Li, 2016: Simulating California reservoir operation using the classification and regression-tree algorithm combined with a shuffled cross-validation scheme. Water Resour. Res., 52, 16261651, https://doi.org/10.1002/2015WR017394.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Zahraei, A., K. Hsu, S. Sorooshian, J. Gourley, V. Lakshmanan, Y. Hong, and T. Bellerby, 2012: Quantitative precipitation nowcasting: A Lagrangian pixel-based approach. Atmos. Res., 118, 418434, https://doi.org/10.1016/j.atmosres.2012.07.001.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Zahraei, A., K. Hsu, S. Sorooshian, J. Gourley, Y. Hong, and A. Behrangi, 2013: Short-term quantitative precipitation forecasting using an object-based approach. J. Hydrol., 483, 115, https://doi.org/10.1016/j.jhydrol.2012.09.052.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Zhang, M., Z. Cui, M. Neumann, and Y. Chen, 2018: An end-to-end deep learning architecture for graph classification. 32nd AAAI Conf. on Artificial Intelligence, New Orleans, LA, AAAI, 4438–4445.

  • Fig. 1.

    From 1100 to 1154 UTC 21 Jul 2015, a SIP event occurred within the coverage area of the radar at Shijiazhuang, China. White points indicate stations with hourly precipitation greater than 20 mm. The SIP event is caused by multiple cells acting together rather than by a single independent cell.

  • Fig. 2.

    Study area and station locations. The black circles indicate the scanning range of the radars, and the red points show the AWS site locations.

  • Fig. 3.

    Algorithm flowchart. The dashed line indicates the model training process, and the solid line indicates the nowcasting process. During model training, radar data and AWS observation data are used. Multicell convective systems are identified from radar data and represented using graph models. A set of graph features is then formed and combined with the observation data to train a random forest model. During nowcasting, the trained random forest model and the graph features obtained from the radar data are used to predict whether SIP events will occur.

  • Fig. 4.

    Flowchart of multicell system identification. At the current moment, cells 1 and 2 are isolated, whereas cells 3 and 4 belong to one convective system. However, extrapolation to nine future times shows that cells 1, 2, and 3 may interact to cause precipitation. The algorithm calculates the superposition coefficient η and clusters the cells, grouping cells 1, 2, and 3 into one system for precipitation prediction.

  • Fig. 5.

    RPS is the ratio of the projection of the cell region in the direction of the cell velocity to the cell speed. It combines the size and moving speed of the convective cell and represents the superposition effect of the cell with itself.

  • Fig. 6.

    Steps for converting a convective system into a graph model. There are four cells in the convective system. For each cell, seven physical attributes are calculated. The relative distance t between any two cells is also calculated. A graph model is used to represent the convective system. Each cell in the convective system is represented as a node in the graph model. Node attributes are the physical attributes of the convective cell. A threshold Tt is set. If the relative distance t between two nodes satisfies t < Tt, then an edge is established between the two nodes.

  • Fig. 7.

    The relationship between random forest performance and the number of decision trees. The curves show the performance of the model (left axis). The bars represent the margin of the model performance between the training set and the OOB set (right axis).

  • Fig. 8.

    The relationship between the random forest performance and the depth of decision trees. The curves show the performance of the model (left axis). The bars represent the margin of the model performance between the training set and the OOB set (right axis).

  • Fig. 9.

    Performance diagram summarizing the POD, FAR, bias, and CSI. The curved dotted lines indicate CSI. The straight dotted lines indicate bias.

  • Fig. 10.

    The learning curve shows the model performance when the number of training samples increases from 1 to 1079. The curves show the performance of the model (left axis). The bars represent the margin of the model performance between the training set and the test set (right axis).

  • Fig. 11.

    The importance of features. Each vertical axis tick label consists of two parts. The first part consists of capital letters and subscripts and represents the symbol of the feature in Table 1. The second part is the superscript and represents the stage of the feature in the fusion operations, which is described in detail in section 5a. For the superscripts, α indicates that this feature belongs to the first subgraph, and β indicates that this feature belongs to the second subgraph. The numbers 0, 1, and 2 respectively correspond to the original feature, the feature of the first fusion, and the feature of the second fusion.

  • Fig. 12.

    A SIP event false alarm case. The time period is 0700–0754 UTC 29 Jun 2016, and the data come from the radar located in Shijiazhuang. The colored points are the AWS sites with precipitation. The area marked with a blue outline in the image at 0700 UTC is the multicell convective system at the moment of the false alarm. In the hour following the false alarm, the intensity of this multicell system weakened quickly.

  • Fig. 13.

    A SIP event that is not correctly identified. The data come from the radar located in Jinan, and the time period is 1000–1053 UTC 25 Jul 2016. The area surrounded by the blue outline is the multicell convective system at the moment of the omission. The colored points are the AWS sites with precipitation. During this SIP event, the scale of the entire convective system increases only slightly, whereas the small cells located around the system grow significantly.
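
As a rough illustration of the construction described in the Fig. 6 caption, the sketch below turns a list of cells into a graph: each cell becomes a node carrying its attributes, and an edge is added whenever the relative distance t between two cells satisfies t < Tt. The `Cell` class, the `build_graph` function, the coordinates, and the use of Euclidean centroid distance for t are assumptions made for illustration only; the paper's own definition of t and its seven physical attributes are not reproduced here.

```python
from dataclasses import dataclass
from itertools import combinations
from math import hypot


@dataclass
class Cell:
    """One convective cell (hypothetical container, not from the paper)."""
    cell_id: int
    x_km: float          # assumed centroid position, km
    y_km: float
    attrs: tuple = ()    # placeholder for the cell's physical attributes


def build_graph(cells, distance_threshold):
    """Return (nodes, edges) for one convective system.

    nodes maps cell id -> attribute tuple; edges is a set of id pairs
    whose relative distance t is strictly below the threshold Tt.
    """
    nodes = {c.cell_id: c.attrs for c in cells}
    edges = set()
    for a, b in combinations(cells, 2):
        # Assumption: t is taken as the Euclidean distance between centroids.
        t = hypot(a.x_km - b.x_km, a.y_km - b.y_km)
        if t < distance_threshold:
            edges.add((a.cell_id, b.cell_id))
    return nodes, edges


# Four hypothetical cells, as in a four-cell system.
cells = [Cell(1, 0.0, 0.0), Cell(2, 3.0, 4.0), Cell(3, 30.0, 0.0), Cell(4, 4.0, 3.0)]
nodes, edges = build_graph(cells, distance_threshold=10.0)
print(sorted(edges))  # [(1, 2), (1, 4), (2, 4)]
```

With these made-up positions, cells 1, 2, and 4 are mutually connected and cell 3 remains an isolated node. Any other pairwise measure could be substituted for the distance line without changing the rest of the sketch.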
