1. Introduction
An ensemble prediction system comprises multiple forecasts valid at the same time, generated by perturbing the initial conditions, boundary conditions, and model physical processes to represent the forecast uncertainties inherent in numerical weather prediction (NWP) models. Ensemble prediction systems traditionally provide the ensemble mean as a deterministic prediction product, which is effective for forecasting continuous variables, such as temperatures and wind fields. However, the ensemble mean tends to underestimate quantitative precipitation forecasts (QPFs) because precipitation is a noncontinuous variable with a highly non-Gaussian spatiotemporal distribution. The probability-matched mean (PMM) method was proposed (Ebert 2001) to mitigate the underestimation of extreme values by the ensemble mean and has been operationally evaluated by the Central Weather Administration (CWA) of Taiwan for ensemble QPFs. Although the PMM method has significantly improved the performance of the QPF product, previous studies (Su et al. 2016; Yeh et al. 2016) have shown that the distribution of PMM QPFs is dominated by the mean state of the ensemble system. Therefore, the present study proposes a machine learning–based method that combines convolutional neural networks (CNNs) with a space-based attention mechanism [convolutional block attention module (CBAM)] to consider weightings from individual members and relieve the constraint imposed by the ensemble mean state.
The CNN method is widely used in image processing, image recognition, and semantic segmentation, and it has already been applied in weather forecasting tasks. Grönquist et al. (2021) applied CNNs and locally connected networks (LCNs) to improve NWP forecast accuracy and efficiency, achieving a 16% improvement in the root-mean-square error (RMSE) of temperature forecasts; adjusting for local weather patterns yielded a further 7.9% improvement in the RMSE. Li et al. (2022) applied CNN-based statistical postprocessing to ensemble precipitation forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF), using bilinear interpolation to rescale the data; for rainfall forecasting in the China Huaihe River region, both the accuracy and reliability of their approach outperformed those of the joint probability model. With another deep learning approach, Rojas-Campos et al. (2023) evaluated the efficacy of probabilistic artificial neural networks (ANNs) in postprocessing ensemble precipitation forecasts at four locations; both the probabilities of precipitation and the hourly predictions outperformed those of classical statistical methods at most stations. Ritvanen et al. (2023) developed the L-CNN model for convective rainfall nowcasting, combining a CNN with temporal differencing in Lagrangian coordinates; their experiments revealed that the L-CNN better captures the growth and decay of heavy rainfall, especially when forecasting higher rain rates and small-scale rainfall. Wang et al. (2017) introduced a residual attention network based on an encoder–decoder attention structure; by adding refinement steps, the CNN in this method becomes more robust to noisy inputs. Given the distinct spatial features of each ensemble member and the discontinuous distribution of QPF data (strong features within a limited geographic area), we should focus on analyzing the features of each ensemble member while filtering out the irrelevant ones. The CBAM can be trained to prioritize crucial information while ignoring irrelevant information and to enhance the spatial weighting of the properties of the various ensemble members, thereby improving image prediction accuracy (Woo et al. 2018). The CBAM can potentially highlight the significant features while downplaying the less crucial features across both the channel and spatial dimensions. As a result, adding the CBAM could provide better forecasting performance.
In this study, a Weather Research and Forecasting (WRF)-based ensemble prediction system (WEPS) is combined with a machine learning method to provide a more reliable QPF product. Four experiments are designed to establish an optimal postprocessing strategy for ensemble QPFs. Experiment A includes three subexperiments for the PMM and artificial intelligence (AI) algorithms and discusses the sensitivity to the forecast lead time in light of the available computing resources. The sensitivities to the training dataset and to the training approach of the AI algorithms are addressed in experiments B and C, respectively. Two severe weather cases, a mei-yu front and afternoon thunderstorms, are selected to demonstrate the postprocessing strategy for ensemble QPF data via the AI algorithm. Experiment D assesses the capabilities established in the experiments mentioned above for the Typhoon Doksuri (2023) case.
The ensemble data and preprocessing procedures are described in section 2. Section 3 presents the AI model configuration, experimental design, and verification description. The performance of the postprocessing products is discussed in section 4. Conclusions and implications are presented in section 5.
2. Data
a. Ensemble prediction system and observation data
A 20-member ensemble prediction system based on the WRF community model (Skamarock et al. 2008; Powers et al. 2017) is operated by the CWA. The WEPS (Li et al. 2020) serves as a principal tool for comprehensive weather forecasting. It operates in cycles of four runs per day (0000, 0600, 1200, and 1800 UTC), producing forecasts with a duration of 108 h. The system employs a horizontal grid spacing of 3 km. It is composed of initial field perturbations, lateral boundary condition perturbations, and multiphysical suites with stochastic parameterization schemes (Li et al. 2020). The initial condition is obtained from the mesoscale deterministic forecast system of the CWA, and perturbations drawn sequentially from a 32-member ensemble adjustment Kalman filter (EAKF; Anderson 2001) are added to configure the 20-member ensemble. The lateral boundary conditions (LBCs) are sequentially chosen from the 10 members of the NCEP Global Ensemble Forecast System (GEFS; Wei et al. 2008), and each LBC is used for two members. More details on the configuration of the multiphysical perturbations for each ensemble member can be found in Table 1 of Li et al. (2020). The WEPS QPF data cover the region including Taiwan, that is, from 118° to 123.5°E and 20° to 27°N (Fig. 1). Hourly ensemble QPFs are generated by this system. This study focuses specifically on the 0–34-h time frame, which serves as the primary period for evaluating accurate QPFs. Data from May and June 2018–2020 and May 2021 are collected to train the models, and data from 1 to 30 June 2021 are collected to evaluate the performance of the proposed models (Table 1).
Table 1. Descriptions of the training and testing datasets.
The rainfall data are derived from radar gauge–corrected quantitative precipitation estimation (QPE) data generated from the CWA operational system (Chang et al. 2021). This QPE product provides a rainfall estimate every 10 min with a 0.0125° horizontal resolution by combining single- and dual-polarimetric QPE relations and rain gauge observations (Chen et al. 2020). The ensemble QPF data matrix is remapped to the same resolution as that of the radar QPE. Considering computational resource limitations, a reduced domain (312 × 312; 97 344 grids) is selected in this study (Fig. 1). The primary focus concerns the prediction of 24-h accumulated rainfall, which is initially derived from the 1-h model output for validation purposes (Fig. 2). A mei-yu heavy rainfall event, with a peak 24-h accumulated rainfall amount of more than 600 mm between 21 and 22 June 2021, is evaluated.
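The remapping of the ensemble QPF fields onto the radar QPE grid can be illustrated with the short sketch below. It assumes a regular latitude–longitude source grid and bilinear interpolation via SciPy; the operational remapping tool and interpolation method are not specified in the text, so this is a minimal sketch only, with illustrative variable names.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

def remap_member(qpf, src_lat, src_lon, dst_lat, dst_lon):
    """Remap one member's QPF field from the 3-km WEPS grid to the
    0.0125-degree radar-QPE grid (bilinear interpolation; illustrative only)."""
    interp = RegularGridInterpolator((src_lat, src_lon), qpf,
                                     bounds_error=False, fill_value=0.0)
    lat2d, lon2d = np.meshgrid(dst_lat, dst_lon, indexing="ij")
    points = np.stack([lat2d.ravel(), lon2d.ravel()], axis=-1)
    return interp(points).reshape(lat2d.shape)

# After remapping, the 312 x 312 analysis window (Fig. 1) would be cropped
# from the full QPE grid, e.g., field[i0:i0 + 312, j0:j0 + 312].
```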
b. Probability-matched mean (PMM)
The PMM is a postprocessing method that redistributes the rainfall forecast based on the ensemble mean and enhances extreme rainfall forecasts relative to the ensemble mean (Ebert 2001; Su et al. 2016). Figure 3, modified from Fig. 2 of Tsai et al. (2021), demonstrates the PMM with a five-member ensemble and four model grid points (step 1). In step 2, the forecast grid values of each member are ranked individually from largest to smallest, and the values sharing the same rank across members form subgroups; the PMM column value for each rank is the average of the elements in that subgroup. Finally, the model grid point with the first rank (5) in the ensemble mean (Mean) is reassigned the first-ranked value (7.8) from the PMM column, and the procedure continues through the remaining ranks for the other grid points, so that the result follows the spatial distribution of the ensemble mean.
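A minimal NumPy sketch of the PMM procedure described above is given below. It follows the ranking scheme of Fig. 3 (member values ranked individually, rank subgroups averaged, and the resulting column reassigned along the ranks of the ensemble mean); the array shapes and function name are illustrative and do not represent the operational implementation.

```python
import numpy as np

def probability_matched_mean(qpf):
    """PMM sketch (Ebert 2001).

    qpf : array of shape (n_members, n_grid), ensemble QPFs flattened
          over the forecast domain.
    Returns an (n_grid,) field whose amplitudes come from the ensemble
    value distribution but whose spatial pattern follows the ensemble mean.
    """
    n_members, n_grid = qpf.shape

    # Step 2: rank each member's values from largest to smallest and
    # average the values sharing the same rank to build the PMM column.
    ranked = np.sort(qpf, axis=1)[:, ::-1]       # (n_members, n_grid)
    pmm_column = ranked.mean(axis=0)             # rank-wise averages, largest first

    # Step 3: reassign the PMM column following the rank order of the
    # ensemble-mean field (largest mean gets the largest PMM value).
    ens_mean = qpf.mean(axis=0)
    order = np.argsort(-ens_mean)                # grid indices, largest mean first
    pmm = np.empty(n_grid)
    pmm[order] = pmm_column
    return pmm

# usage: pmm_24h = probability_matched_mean(member_qpf.reshape(20, -1)).reshape(312, 312)
```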
c. AI training dataset
This study focuses primarily on enhancing rainfall forecasts for extreme rainfall cases. However, the QPF data acquired from the WEPS from 2018 to 2021 contain relatively few instances of heavy rainfall, which could produce a training set imbalanced toward light-rainfall forecasts and introduce bias. As a result, the dataset includes data from days marked by substantial rainfall accumulation, particularly spanning the months of April–November, and is used as an annual training dataset. A heavy rainfall scenario within the domain (97 344 grid points) is defined as follows: if the number of grid points in the PMM field with daily rainfall exceeding 15 mm constitutes more than 25% (approximately 24 366 grid points) of the total, the criterion for a heavy rainfall event is met. The PMM dataset is used only to select training samples for heavy rainfall cases and is not utilized as input for the AI model.
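The selection criterion can be expressed as a one-line check, sketched below under the assumption that the PMM 24-h field is available as a NumPy array; the function name and argument defaults simply restate the criterion above.

```python
import numpy as np

def is_heavy_rain_case(pmm_daily, threshold_mm=15.0, area_fraction=0.25):
    """True if more than `area_fraction` of the domain grid points in the
    PMM daily-rainfall field exceed `threshold_mm` (the training-sample
    selection criterion described above)."""
    n_exceed = np.count_nonzero(pmm_daily > threshold_mm)
    return n_exceed > area_fraction * pmm_daily.size

# usage: keep the case when is_heavy_rain_case(pmm_24h) is True, where
# pmm_24h is the 312 x 312 PMM 24-h accumulation (97 344 grid points).
```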
3. Model and experimental design
a. Model design
CNNs represent a remarkable advancement in deep learning, especially for tasks involving visual recognition, such as image classification (Li et al. 2014; Ramprasath et al. 2018), object detection (Lin et al. 2017; Wang et al. 2023; Zhou et al. 2019), and segmentation (Badrinarayanan et al. 2017; Chen et al. 2017; Long et al. 2015; Ronneberger et al. 2015). In this study, a modified CNN model is designed on the basis of the CNN architecture described in LeCun et al.’s (2015) work. The proposed model comprises five convolution layers, an adaptive average pooling layer, and a CBAM, as shown in Fig. 4. The convolution layer transforms the input image from points into local features before synthesizing the recognition results through layer-by-layer feature discrimination.
The CBAM is a simple and effective space-based attention and channel-based attention module (Woo et al. 2018). The spatial attention module identifies key spatial areas within images. It assimilates the feature responses from different channels to ascertain the importance of spatial locations. By computing average and maximum values across feature map dimensions and applying convolution, it generates a spatial attention map that adjusts the feature map by enhancing or reducing features in certain areas. The channel attention module determines the significance of channels within feature maps. It strengthens pivotal features for the current task by learning the importance of various channels. The process involves using the global average and maximum pooling to calculate the weights for each channel, which then modulate the primary features. It can be directly connected to a CNN training framework, primarily by integrating an intermediate feature map, assigning weights sequentially along the spatial and channel dimensions, and multiplying the CBAM output with the original feature map. Compared with the squeeze-and-excitation network (SENet) (Hu et al. 2018), which only includes channel attention, the CBAM focuses on both channel and spatial attention, and it performs better than SENet on image classification and object detection tasks. The CBAM can be divided into two modules (Fig. 5): space-based attention (Fig. 5a) and channel-based attention (Fig. 5b) modules.
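A minimal PyTorch sketch of the CBAM is shown below, following the channel-then-spatial ordering of Woo et al. (2018): channel attention from global average and maximum pooling through a shared two-layer perceptron, and spatial attention from channel-wise average and maximum maps passed through a convolution. The reduction ratio and the 7 × 7 kernel are the defaults of Woo et al. (2018) and are assumptions here, not the authors' exact settings.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # shared MLP applied to the global-average- and global-max-pooled descriptors
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)                     # (B, C, 1, 1) channel weights

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)           # (B, 1, H, W)
        mx, _ = torch.max(x, dim=1, keepdim=True)          # (B, 1, H, W)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # spatial map

class CBAM(nn.Module):
    """Channel attention followed by spatial attention (Woo et al. 2018)."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention(kernel_size)

    def forward(self, x):
        x = x * self.ca(x)     # reweight channels (ensemble members / feature maps)
        x = x * self.sa(x)     # reweight spatial locations
        return x
```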
As highlighted by Scheuerer and Hamill (2015), precipitation data exhibit a combination of discrete and continuous characteristics. The discrete aspect indicates whether it is rainy or not, whereas the continuous aspect quantifies the accumulation of rainfall. This character requires a more sensitive spatial perception than that needed for conventional continuous data. Therefore, improved sensitivity to the rainfall location is expected when the CNN and CBAM are combined. By adding the CBAM module, the CNN can identify which locations (spatial) and ensemble members (channel) are more critical. In our experiments, adding the CBAM improves the RMSE and structural similarity index measure (SSIM) from 26.08 mm and 0.43 (vanilla CNN) to 26.03 mm and 0.53, respectively. Sadeghi et al. (2019) used Precipitation Estimation from Remotely Sensed Information Using ANNs via a CNN (PERSIANN-CNN) to estimate rainfall from remote sensing data, and the results revealed that a CNN can capture more precise spatial distributions of rainfall and predict maximum rainfall values more accurately.
With five convolutional layers that are used to capture rainfall locations and accumulated rainfall levels, the proposed CNN model does not have a maximum pooling layer for downsampling. Instead, this model uses adaptive average pooling (van Wyk and Bosman 2019) to reshape the feature map back to (312, 312). During the training process of machine learning models, data imbalance can cause prediction biases. If a particular type of data is overly abundant, the model will overlearn the features of this data type. In our case, because statistics show that most rainfall events are light rain events, the model tends to adapt more easily to light rain data during training, resulting in decreased accuracy when predicting other types of rainfall. This occurs because the model has developed a preference for light rain. To address this issue, we design a weighted loss function to reduce the influence of light rain data on model training, allowing the model to predict various rainfall events more evenly.
The class imbalance in rainfall data needs to be addressed. Based on the criteria of the CWA for issuing warnings, the collected rainfall data are divided into five intervals: 0 mm ≤ x < 80 mm, 80 mm ≤ x < 200 mm, 200 mm ≤ x < 350 mm, 350 mm ≤ x < 500 mm, and 500 mm ≤ x. The data points in each interval are identified for each case. The corresponding data points for each interval are approximately 3.79 × 107, 1.51 × 106, 1.35 × 105, 1.03 × 104, and 2.74 × 103, respectively, out of a total of 3.95 × 107 points. The proportions of data in each interval are as follows: 95.79%, 3.83%, 0.34%, 0.03%, and 0.01%.
To address the rainfall class imbalance, we employed several weighting schemes, all adhering to the principle that intervals with fewer data points are assigned higher weights:

- Geometric series weights: The interval with the most data is assigned a weight of 1, and the other intervals are weighted via a geometric series based on the data size, resulting in weights of [1, 2, 4, 8, 16].
- Arithmetic series weights: The interval with the most data is assigned a weight of 1, and the other intervals are weighted via an arithmetic series, resulting in weights of [1, 1.2, 1.4, 1.6, 1.8].
- Inverse proportional weights: Weights are assigned as the inverse of the data proportions for each interval, yielding weights of [1.13, 9.46, 106.5, 1392.54, 5240.74].
- Square root inverse weights: The square root inverse method offers a balanced way to set the weights by using the square root of the inverse class proportions, yielding weights of [1.06, 3.08, 10.32, 37.32, 72.39].
- Empirical weights: On the basis of past experience and observations, two empirically chosen configurations are tested: [1, 2, 4, 6, 8] and [1, 5, 10, 15, 20].
The weight design of the loss function follows Shi et al. (2017). These weighting schemes are intended to balance the data distribution through the weighted loss function, thereby enhancing the model’s prediction accuracy across all intervals. We utilized data from May and June of 2018 to 2020, as well as from May 2021, as our training and validation datasets. We employed the “train_test_split” function from Scikit-learn to split the data into training and validation sets at a ratio of 8:2. To ensure consistency across all the experiments, we set the “random_state” parameter to 0, thereby fixing the random seed for reproducibility. The combinations of hyperparameters used are shown in Table 2.
Table 2. The hyperparameters of the experiment.
On the basis of the results presented in Table 3, the performance of each hyperparameter group was often similar. Therefore, we initially evaluated the RMSE and selected the group with the lowest RMSE. If the RMSE values were close, we then considered the SSIM. The group with weights [1, 1.2, 1.4, 1.6, 1.8] presented the lowest RMSE value (25.87) and a relatively high SSIM value (0.53), making it the best-performing configuration. Consequently, we chose this weight scheme for our weighted loss function setting.
Table 3. Different loss function weights tested on a validation set.
The final weight settings are shown in Table 4. The hyperparameters of the model are shown in Table 2.
Table 4. Weighted MSE settings.
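A minimal PyTorch sketch of such a weighted MSE is given below. It bins each grid point by the CWA warning intervals and applies the selected arithmetic-series weights [1, 1.2, 1.4, 1.6, 1.8]; binning on the observed (target) value follows the convention of Shi et al. (2017) and is an assumption about the implementation rather than a confirmed detail.

```python
import torch

# Interval edges (mm) from the CWA warning criteria and the selected
# arithmetic-series weights (Table 4).
EDGES = torch.tensor([80.0, 200.0, 350.0, 500.0])
WEIGHTS = torch.tensor([1.0, 1.2, 1.4, 1.6, 1.8])

def weighted_mse(pred, target):
    """Mean-squared error in which each grid point is weighted according
    to the rainfall interval of its target (observed) value."""
    edges = EDGES.to(target.device)
    weights = WEIGHTS.to(target.device)
    idx = torch.bucketize(target, edges, right=True)   # class 0..4 per grid point
    return torch.mean(weights[idx] * (pred - target) ** 2)
```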
To reduce the computing time and release memory capacity, the output layer is not a fully connected layer. Instead, an adaptive average pooling layer, which has the same effect as a fully connected layer here, is used as the output layer. The adaptive average pooling layer simplifies the final output conversion, requires fewer parameters than a fully connected layer does, and helps prevent overfitting. The flowchart of the training model proposed in this study is shown in Fig. 6.
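The overall architecture (five convolution layers, a CBAM block, and an adaptive average pooling output that returns the 312 × 312 field) can be sketched as below. The channel widths, kernel sizes, and the exact placement of the CBAM are illustrative assumptions rather than the authors' reported configuration; the `CBAM` class refers to the sketch given earlier.

```python
import torch
import torch.nn as nn

class EnsembleQPFNet(nn.Module):
    """Five convolution layers + CBAM + adaptive average pooling, mapping
    20 (or 21, with the observation channel) member QPF maps to a single
    312 x 312 rainfall field.  Channel widths and kernel sizes are illustrative."""
    def __init__(self, in_channels=20, hidden=64, out_size=(312, 312)):
        super().__init__()
        chans = [in_channels, hidden, hidden, hidden, hidden]
        layers = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):   # four convolution layers
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
        self.features = nn.Sequential(*layers)
        self.cbam = CBAM(hidden)                          # attention block sketched earlier
        self.head = nn.Conv2d(hidden, 1, kernel_size=3, padding=1)   # fifth convolution
        self.pool = nn.AdaptiveAvgPool2d(out_size)        # replaces a fully connected output layer

    def forward(self, x):                                 # x: (B, in_channels, 312, 312)
        x = self.features(x)
        x = self.cbam(x)
        x = self.head(x)
        return self.pool(x).squeeze(1)                    # (B, 312, 312) rainfall field
```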
b. Experimental design
Four experiments are designed to evaluate the performance of the proposed algorithms.
- Experiment A: Due to computational resource limitations and observation data cutoff issues in the NWP modeling system, ensemble precipitation systems generate QPF data after a delay of 6–10 h, which means that the first 6–10 h of the ensemble QPF data are unavailable for real-time operational use. The postprocessing methods therefore integrate the 6–10-h rainfall observations to improve forecasts for the 6th–30th and 10th–34th hours.
- Experiment B: Given the small quantity of trainable data after filtering, experiment B divides the 24-h cumulative rainfall data into one 24-, two 12-, and four 6-h periods. This approach increases the data quantity, explores additional features, and incorporates the available observations for the first 6–10 h to assess whether the rainfall location can be adjusted on the basis of the observations to achieve improved forecast accuracy.
- Experiment C: Because the period and amount of rainfall in the same month may vary slightly across different years, training the model and making predictions with data from all years may introduce errors. Accordingly, transfer learning (e.g., Tan et al. 2018) can be used to adapt the model to the annual rainfall distribution. Transfer learning is a crucial machine learning process for addressing insufficient training data; it is adopted here to incorporate new data that weight the model in favor of the rainfall pattern and amount of the current year, which can increase the forecasting accuracy, enhance generalizability, and reduce the computational resource requirement. For example, the model learns the ensemble rainfall locations from 2018 to 2020 and then performs transfer learning to learn the rainfall locations for 2021. This experiment also evaluates the effect of online learning on the QPF prediction ability of the model.
- Experiment D: Torrential rainfall resulting from Typhoon Doksuri (2023), with a maximum value exceeding 1000 mm over 3 days, is selected to assess the capabilities established in the experiments mentioned above.
Figure 7 illustrates the experimental flowchart. The ensemble data are preprocessed to construct a network structure for selecting the most suitable ensemble QPF data. The postprocessing approach is then formulated and trained via various hyperparameter combinations. For all the experiments, model training was performed on an NVIDIA RTX 3090 GPU with an Intel i9-9900K CPU and 64 GB of system memory, using the PyTorch 2.1 deep learning framework with Compute Unified Device Architecture (CUDA) version 11.8.
1) Comparisons among the 24-h accumulated rainfall levels of the three periods
Owing to the computing resource limitations of the CWA, the operational ensemble QPF is available 10 h after the initial modeling time. As a new generation of high-performance computers is under construction at the CWA, the ensemble QPF delay could be shortened from 10 to 6 h. Experiment A evaluates the 24-h accumulated rainfall prediction performance for the three periods (Fig. 8). The 0–24-h accumulated QPF is taken as the baseline representing the model performance, and the forecasts for the different valid times are also compared with those of the PMM.
2) Sensitivity tests conducted on the training dataset
The number of preprocessed extreme heavy rainfall events is 406. To assess the influences of different durations on the proposed AI model, the cumulative rainfall data are partitioned into multiple training datasets, with one group encompassing a 24-h period (406 datasets), two groups spanning 12-h periods (812 datasets), and four groups covering 6-h periods (1624 datasets). In the experiment, the influence of incorporating available observed 6- or 10-h rainfall measurements is assessed.
For each ensemble QPF, the observational rainfall data are incorporated into the training dataset, which allows the influence of the observational rainfall data to be examined by comparing datasets with and without such data. This investigation aims to increase the precision of the predicted 24-h cumulative rainfall distribution by exploiting the initial 6–10-h observational rainfall data. By partitioning the 24-h accumulated rainfall data into 12- and 6-h intervals, the numbers of training datasets are expanded to 812 and 1624, respectively.
Additionally, the cumulative rainfall observations acquired during the initial 6- or 10-h period are incorporated as the 21st ensemble component within the training dataset. This strategy leads to the enlargement of the training dataset and the refinement of the 20-member ensemble QPF model through the incorporation of input rainfall observations. These modifications are implemented to enhance the model’s generalization ability.
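The assembly of one training input, including the optional observation channel, can be sketched as follows; the variable names and shapes are illustrative assumptions consistent with the 312 × 312 domain described above.

```python
import numpy as np

def build_training_sample(member_qpf, obs_accum=None):
    """Assemble one training input.

    member_qpf : (20, 312, 312) accumulated QPF of the 20 ensemble members
                 for one 6-, 12-, or 24-h period.
    obs_accum  : optional (312, 312) observed accumulation for the first
                 6 or 10 forecast hours; when provided it is appended as
                 the 21st channel.
    """
    if obs_accum is None:
        return member_qpf                                           # (20, 312, 312)
    return np.concatenate([member_qpf, obs_accum[None]], axis=0)    # (21, 312, 312)

# Splitting a 24-h case into four 6-h samples from hourly QPF, e.g.:
# samples = [hourly_qpf[:, i:i + 6].sum(axis=1) for i in range(0, 24, 6)]
# where hourly_qpf has shape (20, 24, 312, 312).
```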
3) Sensitivity tests for the training algorithms
Rainfall characteristics may vary from year to year; even during the same season (May–June), the spatial–temporal characteristics of rainfall can differ markedly (Table 5). The accumulated rainfall data for May and June from 2018 to 2021 were collected from the Taichung station in central Taiwan. The analysis revealed that the total rainfall accumulated from May to June 2019 surpassed that from 2016 through 2018; this may have led to the underestimation of the rainfall forecast in 2019 with the proposed machine learning–based algorithm. To mitigate this underestimation problem, transfer learning allows the prediction model to acquire rainfall amounts and distribution features from the existing available data. This approach can lead to enhanced generalizability and improved precision in capturing the distinct patterns of each annual rainfall total. These transfer learning processes are executed and evaluated in experiment C.
Table 5. Accumulated rainfall for Taichung station in May and June from 2018 to 2021.
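The transfer learning step of experiment C can be sketched as below: the weights learned from the 2018–2020 (and May 2021) data are loaded and fine-tuned on the most recent month. The checkpoint filename, data loader, optimizer, and learning rate are hypothetical; `EnsembleQPFNet` and `weighted_mse` refer to the earlier sketches.

```python
import torch

# Load the model trained on the earlier years (hypothetical checkpoint name).
model = EnsembleQPFNet(in_channels=21)
model.load_state_dict(torch.load("weps_qpf_2018_2020.pt"))

# Fine-tune on the newest month at a reduced learning rate so the model
# adapts to the current year's rainfall pattern without discarding the
# features learned from earlier years.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
model.train()
for inputs, target in new_month_loader:          # hypothetical DataLoader for the new month
    optimizer.zero_grad()
    loss = weighted_mse(model(inputs), target)
    loss.backward()
    optimizer.step()
```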
4) AI model evaluations
To validate the proposed machine learning–based postprocessing algorithm, Typhoon Doksuri (2023) is selected as a demonstration case. According to the observations, Typhoon Doksuri caused a tremendous amount of accumulated rainfall, with a maximum of more than 1000 mm within 3 days, in the eastern and southern parts of Taiwan.
c. Validation method
QPF data are analyzed and evaluated in terms of their rainfall amounts and distributions. The evaluation indicators are as follows.
1) Critical success index
Confusion matrix (contingency table). The terms O and P represent the observed and forecasted rainfall values, respectively. The term “th” denotes the rainfall threshold.
2) Root-mean-square error
3) Structural similarity index measure
Wang et al. (2004) introduced the SSIM, which indicates the similarity between two images on a scale of 0 to 1. A value closer to 1 indicates that the observed and predicted images are nearly indistinguishable, meaning that the prediction result is superior; in contrast, a value approaching 0 indicates that the prediction result is inferior. This study uses the SSIM to evaluate the distribution of rainfall and the discrepancy between the predicted and observed rainfall locations.
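The three verification metrics can be computed on the 312 × 312 fields as in the sketch below. The CSI follows the standard hits/(hits + misses + false alarms) definition (Schaefer 1990), and the SSIM call uses scikit-image with its default window settings, which may differ from the authors' implementation.

```python
import numpy as np
from skimage.metrics import structural_similarity

def csi(obs, fct, th):
    """Critical success index at threshold th (mm)."""
    hits = np.sum((obs >= th) & (fct >= th))
    misses = np.sum((obs >= th) & (fct < th))
    false_alarms = np.sum((obs < th) & (fct >= th))
    return hits / max(hits + misses + false_alarms, 1)

def rmse(obs, fct):
    """Root-mean-square error (mm)."""
    return float(np.sqrt(np.mean((fct - obs) ** 2)))

def ssim(obs, fct):
    """Structural similarity index; data_range is required for float rainfall fields."""
    return structural_similarity(obs, fct,
                                 data_range=float(max(obs.max(), fct.max())))
```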
4. Model performance evaluation
In this section, we discuss the results of the four experiments and propose an optimal postprocessing strategy.
a. Comparisons of 24-h accumulated rainfall among three periods
In experiment A, three forecast periods (0–24, 6–30, and 10–34 h) are designed to determine which 24-h period is the best choice for operational application. The results show that the QPFs from the AI algorithm are superior to those of the PMM postprocessing method for all three lead times (Fig. 9), indicating that the machine learning–based postprocessing technique can further improve the accuracy of the QPF products from an ensemble system. Comparing the three periods, the forecast obtained with the 6-h lead time (6–30 h) is considerably more reliable than the other 24-h accumulated rainfall periods: the RMSE improves from 57.4 to 43.61 mm (approximately 24%), and the SSIM improves from 0.42 to 0.56 (approximately 41.1%). An analysis of the CSI reveals that the average CSI increases from 0.77 to 0.88 at the 10-mm threshold and from 0.70 to 0.78 at the 20-mm threshold, enhancements of approximately 14.8% and 12%, respectively, and consistent forecasting accuracy is retained at the different thresholds (Fig. 10).
Figure 11 shows the observed rainfall distribution over Taiwan and the 24-h accumulated rainfall forecasts. On 21 June, the daily accumulated rainfall forecast from the AI algorithm is highly accurate in central and southern Taiwan when the ensemble forecast is initialized at 1200 UTC (Fig. 11c), but the extreme rainfall in southwestern Taiwan is underestimated. In addition, heavy rainfall is forecast in the southern areas for the 1800 UTC initial time, with rainfall overestimations in southwestern Taiwan.
When the forecast is delayed by 6 h, both the PMM and machine learning–based approaches predict heavy rainfall in southwestern Taiwan, but both fail to predict the heavy rainfall in northern Taiwan. In contrast with the PMM predictions, the proposed AI method notably overestimates rainfall in central Taiwan on 21 June 2021 (Fig. 11f). However, the substantial errors in the rainfall amounts predicted for southern Taiwan decrease for both the PMM and machine learning–based approaches, and the underestimation of rainfall in eastern Taiwan is reduced. The 10-h delayed forecast for 21 June (initialized at 0000 UTC) shows reasonable results in central Taiwan; however, although southern Taiwan encountered significant rainfall, rainfall is underestimated in the other regions. In general, a 6–30-h accumulated rainfall forecast from an ensemble forecast system combined with AI postprocessing is suggested because of its consistent performance across various initial times.
b. Sensitivity to the training data
A histogram of the evaluation indicators for June 2021 is depicted in Fig. 12. For each forecasting period (6–30 and 10–34 h), the sensitivity of the model to the observation data and to the data quantity augmentation is investigated. Because real-time observation data may be available for the first 6 or 10 forecasting hours, the extra 6- and 10-h observed data are added to address the impact of the available observation data. The data quantity enlargement is investigated by dividing the rainfall data into 12- and 6-h intervals. When the observation data are excluded, the best RMSE performance is obtained for the one 24-h period, whereas the best SSIM is obtained for the four 6-h periods. In contrast, when the observation data are incorporated and the data quantity is increased, the RMSE decreases and the SSIM increases. For the four 6-h periods in the two forecasting periods (6–30 and 10–34 h), the RMSE improves from 57.4 and 53.6 mm to 16.8 and 19.2 mm, and the SSIM improves from 0.4 and 0.37 to 0.54 and 0.54, respectively, corresponding to improvements of approximately 65% and 45%.
Figure 13 depicts the QPF performance in terms of the CSI for the different data quantity experiments. The findings indicate that increasing the data quantity without incorporating observation data does not improve the forecasting ability of the model; in this case, the forecasting ability decreases with increasing data quantity (Fig. 13d). However, when observation data are incorporated, the forecasting ability increases with increasing data quantity. Therefore, the best QPF performance is achieved by training the model with the four 6-h periods of input data. The CSI results for the 10–34-h forecast are similar to those for the 6–30-h forecast. Therefore, incorporating observed rainfall data and increasing the data quantity can improve the forecasting ability of the model, and the CSI values for the four 6-h periods in each forecasting period (6–30 and 10–34 h) increase from 0.78 and 0.72 to 0.87 and 0.77 at the 10- and 20-mm thresholds, respectively, compared with those of the PMM (Figs. 13b,d).
Figures 14 and 15 depict the projected 24-h rainfall results obtained with and without observation data, respectively. An analysis of the QPF forecasts obtained without observation data indicates that the rainfall distributions are essentially the same for forecasts produced with different quantities of data, the maximum rainfall is predicted to occur in the southern mountainous region (the predicted value varies slightly from the observed value near the coast of southern Taiwan), and the overall rainfall forecast decreases as the data quantity increases. This might result from 6-h accumulated rainfall often being much less than 24-h accumulated rainfall. In contrast, the forecasts obtained with observation data indicate that, between the 6th and 30th hours of the forecast, the location of heavy rainfall within the 24-h interval matches the observations, although heavy rainfall in small areas along the northern and eastern coasts is not properly predicted. In addition, the rainfall forecast initialized at 1200 UTC 21 June is slightly overestimated. Notably, while the prediction with the two 12-h intervals accurately identifies the areas of heavy rainfall, the forecasted amount of heavy rainfall is underestimated at 0600 UTC 21 June. The rainfall forecast locations closely match the observed locations for the four 6-h intervals, but the amount of rainfall is underestimated. Reflecting the limitation of the PMM method mentioned in the previous section, the PMM result shows not only rainfall overestimation in southwestern Taiwan but also inaccuracies in the predicted rainfall in eastern Taiwan.
During the time span between the 10th and 34th hours, the forecasted value of the AI method for the one 24-h period closely resembles the observed value. However, without the observation data, the rainfall forecasts from the two 12- and four 6-h training datasets underestimate the actual rainfall, leading to a significant lack of ability to predict substantial rainfall events, and minor underestimations of extreme rainfall (130–150 mm) in the southern regions are also identified. When the observations are incorporated, the underestimation issue is mitigated for the two 12- and four 6-h intervals, and the model produces accurate heavy rainfall distribution predictions.
Based on the June 2021 forecast model evaluation, Figs. 16a and 16b present a comparative analysis between the AI approach and the PMM in terms of the RMSE and SSIM for the one 24-h period in the 6–30-h forecasting period. Notably, for the forecasts initialized at 0600 and 1200 UTC 21 June, the AI model substantially reduces the RMSE from 59.59 and 55.21 mm to 35.06 and 43.68 mm, reductions of 41.16% and 20.88%, respectively. The SSIM, which reflects the visual similarity between the forecasted and observed rainfall fields, also exhibits notable improvement during these periods, increasing from 0.38 and 0.42 to 0.61 and 0.63, improvements of approximately 61.24% and 51.03%, respectively. Overall, relative to the PMM, the monthly average RMSE (Fig. 16a) decreases from 29.52 to 22.46 mm and the monthly average SSIM (Fig. 16b) increases from 0.55 to 0.62, improvements of 23.9% and 13.95%, respectively.
During the concentrated rainfall events from 3 to 5 and 19 to 22 June, the AI forecasts are substantially more accurate. This pattern is consistent across various rainfall scenarios, highlighting the robustness of the AI model. The pronounced superiority of the AI model under diverse meteorological conditions highlights significant advancements in terms of forecasting precision, demonstrating that our proposed AI approach can greatly increase the reliability of meteorological forecasts.
c. Training algorithm sensitivity results
In this section, the strategies for introducing the training datasets are examined. The first technique involves a month-by-month approach, in which the training data are progressively expanded by retraining the model on a monthly basis. The second method employs transfer learning, in which the model weights are preserved and these preexisting weights are loaded to enable training with the subsequent month’s data. Figure 17 presents the evaluation indicator values obtained by the various training algorithms, including the month-by-month and transfer learning approaches. After month-by-month data incorporation and transfer learning, the RMSE and SSIM are better than those of the PMM, with the RMSE improving from 48.1 to 37.22 mm and the SSIM from 0.41 to 0.57. The best results are achieved through transfer learning: the RMSE is reduced by 10.88 mm, and the SSIM is increased by approximately 0.16 relative to the PMM values. The results of the CSI analysis are presented in Fig. 18. The results of transfer learning are superior to those of the PMM at all threshold values, superior to those of the month-by-month data incorporation approach for thresholds below 20 mm, and comparable at the larger thresholds. The prediction results obtained through month-by-month incorporation are superior to those of the PMM at a rainfall threshold of 20 mm, with the CSI increasing from 0.66 to 0.75, an improvement of 13.02%, and the forecast performance can be further enhanced via the transfer learning approach.
Figure 19 shows the 24-h accumulated rainfall forecast maps of a severe weather event from 21 to 23 June 2021. According to the comparison, both methods effectively predict the distribution of heavy rainfall. After the datasets from May 2018 to May 2021 are included and transfer learning is conducted via data from June 2021, the heavy rainfall forecast matches the actual value for a small area in northeastern Taiwan (Fig. 19d) and significantly reduces the overforecast compared with that of the PMM method (Fig. 19b). Through transfer learning, the significant error range is reduced, and the SSIM values are increased compared with the values obtained after incorporating month-by-month data.
In this experiment, both the month-by-month and transfer learning approaches are used as online learning methods to incorporate new data and validate their generalizability. During the period of 21–23 June 2021, which is characterized by heavy rainfall, as well as throughout June 2021, transfer learning performs better than does the approach of incorporating data on a month-by-month basis without transfer learning.
d. AI model evaluations
On the basis of the results of the previous experiments, the four 6-h periods with observation data yield the most accurate QPF products from an ensemble prediction system combined with the machine learning–based postprocessing method. In the current experiment, a comparison between the PMM and our AI model is conducted for the case of Typhoon Doksuri (2023), which caused significant accumulated rainfall in eastern Taiwan from 25 to 27 July 2023. Figure 20 shows that the AI model forecasts yield CSI values comparable to those of the PMM at thresholds lower than 100 mm. However, the PMM performs better than the proposed AI model in terms of the CSI at the 100-mm threshold. Comparative analysis reveals that the PMM exhibits overestimation (Fig. 20a), which not only yields better CSI performance for rainfall exceeding 100 mm but also indicates a bias toward overestimation. Upon incorporating rainfall observations, the AI forecast generally outperforms the PMM at CSI thresholds exceeding 10 mm, especially at thresholds greater than 100 mm with the two 12- and four 6-h periods (Fig. 20b). This finding indicates that the AI bias correction mechanism effectively mitigates the overestimation issues observed in the PMM, leading to improved outcomes.
Figure 21 shows the RMSE and SSIM results obtained for AI forecasts produced both without and with rainfall observations. Notably, when the forecast is segmented into four 6-h periods, a substantial reduction in the RMSE is evident (Fig. 21a). The RMSE values exhibit a remarkable improvement for the four 6-h periods in each forecasting period (6–30 and 10–34 h), decreasing from 89.3 and 86.6 to 22.2 and 20.8 mm, reductions of 75% and 76%, respectively. Furthermore, the incorporation of observation values (Fig. 21b) yields a substantial increase in the SSIM. Specifically, the SSIM values show a pronounced increase from 0.39 and 0.38 to 0.69 and 0.71, corresponding to notable improvements of 77.8% and 85.2%, respectively.
Figure 22 shows the forecasting outcomes produced for Typhoon Doksuri (2023). In the absence of observations, the 24-h forecast period underestimates the heavy rainfall in eastern Taiwan (Figs. 22c,f). However, this result improves when the forecasting period is divided into two 12-h segments (Fig. 22d) and further into four 6-h segments (Fig. 22e), although slight rainfall underestimation in western Taiwan persists. Accurate heavy rainfall prediction is particularly important in the context of typhoon-induced disasters. Furthermore, the incorporation of observations refines the rainfall distribution even more effectively. Specifically, the daily accumulated rainfall product for Typhoon Doksuri (2023) provided by the four 6-h training models, complemented by observation values, demonstrates superior performance, consistent with the results of the previous experiments.
5. Conclusions
In this study, a postprocessing method combining a space-based attention mechanism with a CNN was proposed to achieve improved ensemble QPF prediction performance. The proposed method was utilized for ensemble QPFs, with a specific focus on predicting 24-h accumulated rainfall amounts and distributions over the island of Taiwan. Preliminary investigations were conducted using the most appropriate network architecture for obtaining ensemble QPFs. The optimal postprocessing approach was subsequently formulated through the integration of a space-based attention mechanism coupled with fine-tuning adjustments to the total number of layers and the size of the convolutional kernels.
These experiments involved the integration of a CNN and the CBAM method within the framework of an ensemble prediction system. The study investigated postprocessing strategies by assessing the performance achieved for individual forecasting periods within three 24-h validation periods (0–24, 6–30, and 10–34 h), conducting a sensitivity analysis concerning the number of training datasets, and enhancing the AI prediction model through different training data inputs and training algorithms employing online learning. For the mei-yu event specifically, the proposed strategies exhibited notable improvements: compared with the PMM approach, the proposed models provided significant enhancements across multiple evaluation metrics. At a 20-mm rainfall threshold, the forecast methodology proposed for the different periods in the case studies led to a 12.29% increase in the CSI, a notable 24.73% reduction in the RMSE, and a 42.26% increase in the SSIM. Furthermore, when the data were segmented into four 6-h periods with the inclusion of observational data, the models outperformed the PMM with a 13.74% increase in the CSI, a 64.86% decrease in the RMSE, and a 44.53% increase in the SSIM. Finally, the integration of transfer learning in training further elevated the CSI by 13.02%, decreased the RMSE by 20.92%, and enhanced the SSIM by 39.34%. These findings highlight the critical role of AI-based forecasting in advancing ensemble precipitation postprocessing methods.
In addition, evaluations of Typhoon Doksuri (2023) between the AI model and the PMM reveal distinct performance characteristics across multiple metrics and scenarios. Without incorporating observational data, the PMM demonstrates superior CSI values for rainfall exceeding 100 mm but tends toward overestimation. In contrast, the AI model exhibits superior performance in terms of the RMSE and SSIM metrics across all rainfall intensities, which is attributed to its MSE-based loss function. Upon the integration of observational data, the AI model further enhances its performance on the basis of the CSI, RMSE, and SSIM metrics, indicating effective bias correction capabilities compared with those of the PMM. These findings underscore the AI model’s ability to mitigate overestimation issues inherent in the PMM, which aligns with the study’s objectives and expectations. Enhancing the QPF through AI algorithms can improve the accuracy of rainfall distribution predictions, especially when prior rainfall observations are incorporated during training, which is crucial for improving disaster prevention and management operations during heavy rainfall events.
We used a CPU (Intel Core i9-10900K) and a graphics processing unit (GPU) (NVIDIA RTX 3090) to apply the PMM and AI methods separately, and the calculations were repeated 30 times to obtain statistical results. The average computation times of the proposed AI model on the CPU and GPU were approximately 0.036 and 0.034 s, whereas the PMM took 3.8 s on the CPU. These results demonstrate that the proposed AI model offers significantly faster inference times than the PMM method does, particularly when GPU acceleration is used. This increased computational efficiency makes the AI-based approach more effective for real-time applications as the number of ensemble members grows, including operational weather forecasting and disaster prediction.
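The inference timing comparison could be reproduced with a routine like the one below; GPU timing requires synchronizing the device before reading the clock. The function and variable names are illustrative, and the 30 repetitions follow the procedure described above.

```python
import time
import torch

def time_inference(model, batch, device, n_repeats=30):
    """Average wall-clock inference time (s) over n_repeats forward passes."""
    model = model.to(device).eval()
    batch = batch.to(device)
    with torch.no_grad():
        model(batch)                              # warm-up pass
        if device.type == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(n_repeats):
            model(batch)
        if device.type == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_repeats

# usage: time_inference(model, member_batch, torch.device("cuda"))
```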
To further enhance the forecasting performance of the ensemble QPF postprocessing approach, future investigations should concentrate on incorporating additional data and model fields, such as the ensemble radar echo and wind speed, while also exploring multimodal learning techniques to increase the forecasting accuracy. With an expanded dataset and improved training tools, it may also be worthwhile to investigate other potential models, such as U-Net (Ronneberger et al. 2015) and SmaAt-UNet (Trebing et al. 2021).
Acknowledgments.
The authors thank the Central Weather Administration in Taiwan for providing the WEPS QPF and radar QPE data. This research is supported by the Ministry of Science and Technology of Taiwan under Grant 111-2625-M-052-001.
Data availability statement.
The WEPS QPF and radar QPE data utilized in this study were sourced from the Central Weather Administration. Owing to its proprietary characteristics, regrettably, the accompanying data cannot be openly shared. For details regarding data access and the necessary conditions, kindly refer to the contact information provided at https://www.cwa.gov.tw/V8/E/S/service_guide.html. The AI model architecture is available at https://github.com/terry258369/CWA_QPF_CNN.
REFERENCES
Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903, https://doi.org/10.1175/1520-0493(2001)129<2884:AEAKFF>2.0.CO;2.
Badrinarayanan, V., A. Kendall, and R. Cipolla, 2017: SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 39, 2481–2495, https://doi.org/10.1109/TPAMI.2016.2644615.
Chang, P.-L., and Coauthors, 2021: An operational Multi-Radar Multi-Sensor QPE system in Taiwan. Bull. Amer. Meteor. Soc., 102, E555–E577, https://doi.org/10.1175/BAMS-D-20-0043.1.
Chen, I.-H., J.-S. Hong, Y.-T. Tsai, and C.-T. Fong, 2020: Improving afternoon thunderstorm prediction over Taiwan through 3DVar-based radar and surface data assimilation. Wea. Forecasting, 35, 2603–2620, https://doi.org/10.1175/WAF-D-20-0037.1.
Chen, L.-C., G. Papandreou, F. Schroff, and H. Adam, 2017: Rethinking atrous convolution for semantic image segmentation. arXiv, 1706.05587v3, https://doi.org/10.48550/arXiv.1706.05587.
Ebert, E. E., 2001: Ability of a poor man’s ensemble to predict the probability and distribution of precipitation. Mon. Wea. Rev., 129, 2461–2480, https://doi.org/10.1175/1520-0493(2001)129<2461:AOAPMS>2.0.CO;2.
Grönquist, P., C. Yao, T. Ben-Nun, N. Dryden, P. Dueben, S. Li, and T. Hoefler, 2021: Deep learning for post-processing ensemble weather forecasts. Philos. Trans. Roy. Soc., A379, 20200092, https://doi.org/10.1098/rsta.2020.0092.
Hu, J., L. Shen, and G. Sun, 2018: Squeeze-and-excitation networks. Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, Salt Lake City, UT, Institute of Electrical and Electronics Engineers, 7132–7141, https://doi.org/10.1109/CVPR.2018.00745.
LeCun, Y., Y. Bengio, and G. Hinton, 2015: Deep learning. Nature, 521, 436–444, https://doi.org/10.1038/nature14539.
Li, C.-H., J. Berner, J.-S. Hong, C.-T. Fong, and Y.-H. Kuo, 2020: The Taiwan WRF ensemble prediction system: Scientific description, model-error representation and performance results. Asia-Pac. J. Atmos. Sci., 56 (1), 1–15, https://doi.org/10.1007/s13143-019-00127-8.
Li, Q., W. Cai, X. Wang, Y. Zhou, D. D. Feng, and M. Chen, 2014: Medical image classification with convolutional neural network. 2014 13th Int. Conf. on Control Automation Robotics and Vision (ICARCV), Singapore, Institute of Electrical and Electronics Engineers, 844–848, https://doi.org/10.1109/ICARCV.2014.7064414.
Li, W., B. Pan, J. Xia, and Q. Duan, 2022: Convolutional neural network-based statistical post-processing of ensemble precipitation forecasts. J. Hydrol., 605, 127301, https://doi.org/10.1016/j.jhydrol.2021.127301.
Lin, T.-Y., P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie, 2017: Feature pyramid networks for object detection. 2017 Proc. IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, Institute of Electrical and Electronics Engineers, 936–944, https://doi.org/10.1109/CVPR.2017.106.
Long, J., E. Shelhamer, and T. Darrell, 2015: Fully convolutional networks for semantic segmentation. arXiv, 1411.4038v2, https://doi.org/10.48550/arXiv.1411.4038.
Powers, J. G., and Coauthors, 2017: The Weather Research and Forecasting Model: Overview, system efforts, and future directions. Bull. Amer. Meteor. Soc., 98, 1717–1737, https://doi.org/10.1175/BAMS-D-15-00308.1.
Ramprasath, M., M. V. Anand, and S. Hariharan, 2018: Image classification using convolutional neural networks. Int. J. Pure Appl. Math., 119, 1307–1319.
Ritvanen, J., B. Harnist, M. Aldana, T. Mäkinen, and S. Pulkkinen, 2023: Advection-free convolutional neural network for convective rainfall nowcasting. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 16, 1654–1667, https://doi.org/10.1109/JSTARS.2023.3238016.
Rojas-Campos, A., M. Wittenbrink, P. Nieters, E. J. Schaffernicht, J. D. Keller, and G. Pipa, 2023: Postprocessing of NWP precipitation forecasts using deep learning. Wea. Forecasting, 38, 487–497, https://doi.org/10.1175/WAF-D-21-0207.1.
Ronneberger, O., P. Fischer, and T. Brox, 2015: U-net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015, N. Navab et al., Eds., Lecture Notes in Computer Science, Vol. 9351, Springer International Publishing, 234–241.
Sadeghi, M., A. A. Asanjan, M. Faridzad, P. Nguyen, K. Hsu, S. Sorooshian, and D. Braithwaite, 2019: PERSIANN-CNN: Precipitation estimation from remotely sensed information using Artificial Neural Networks–convolutional neural networks. J. Hydrometeor., 20, 2273–2289, https://doi.org/10.1175/JHM-D-19-0110.1.
Schaefer, J. T., 1990: The critical success index as an indicator of warning skill. Wea. Forecasting, 5, 570–575, https://doi.org/10.1175/1520-0434(1990)005<0570:TCSIAA>2.0.CO;2.
Scheuerer, M., and T. M. Hamill, 2015: Statistical postprocessing of ensemble precipitation forecasts by fitting censored, shifted gamma distributions. Mon. Wea. Rev., 143, 4578–4596, https://doi.org/10.1175/MWR-D-15-0061.1.
Shi, X., Z. Gao, L. Lausen, H. Wang, D.-Y. Yeung, W.-k. Wong, and W.-c. Woo, 2017: Deep learning for precipitation nowcasting: A benchmark and a new model. NIPS'17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc., 5622–5632.
Skamarock, W. C., and Coauthors, 2008: A description of the Advanced Research WRF version 3. NCAR Tech. Note NCAR/TN-475+STR, 113 pp., https://doi.org/10.5065/D68S4MVH.
Su, Y.-J., J.-S. Hong, and C.-H. Li, 2016: The characteristics of the probability matched mean QPF for 2014 Meiyu Season (in Chinese). Atmos. Sci., 22, 113–134.
Tan, C., F. Sun, T. Kong, W. Zhang, C. Yang, and C. Liu, 2018: A survey on deep transfer learning. Artificial Neural Networks and Machine Learning—ICANN 2018, V. Kůrková et al., Eds., Lecture Notes in Computer Science, Vol. 11141, Springer, 270–279.
Trebing, K., T. Staǹczyk, and S. Mehrkanoon, 2021: SmaAt-UNet: Precipitation nowcasting using a small attention-UNet architecture. Pattern Recognit. Lett., 145, 178–186, https://doi.org/10.1016/j.patrec.2021.01.036.
Tsai, C.-C., J.-S. Hong, P.-L. Chang, Y.-R. Chen, Y.-J. Su, and C.-H. Li, 2021: Application of bias correction to improve WRF ensemble wind speed forecast. Atmosphere, 12, 1688, https://doi.org/10.3390/atmos12121688.
van Wyk, G. J., and A. S. Bosman, 2019: Evolutionary neural architecture search for image restoration. 2019 Int. Joint Conf. on Neural Networks (IJCNN), Budapest, Hungary, Institute of Electrical and Electronics Engineers, 1–8, https://doi.org/10.1109/IJCNN.2019.8852417.
Wang, C.-Y., A. Bochkovskiy, and H.-Y. M. Liao, 2023: YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, Institute of Electrical and Electronics Engineers, 7464–7475, https://doi.org/10.1109/CVPR52729.2023.00721.
Wang, F., M. Jiang, C. Qian, S. Yang, C. Li, H. Zhang, X. Wang, and X. Tang, 2017: Residual attention network for image classification. Proc. 2017 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, Institute of Electrical and Electronics Engineers, 6450–6458, https://doi.org/10.1109/CVPR.2017.683.
Wang, Z., A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, 2004: Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process., 13, 600–612, https://doi.org/10.1109/TIP.2003.819861.
Wei, M., Z. Toth, R. Wobus, and Y. Zhu, 2008: Initial perturbations based on the Ensemble Transform (ET) technique in the NCEP global operational forecast system. Tellus, 60A, 62–79, https://doi.org/10.1111/j.1600-0870.2007.00273.x.
Woo, S., J. Park, J.-Y. Lee, and I. S. Kweon, 2018: CBAM: Convolutional Block Attention Module. Proc. Computer Vision—ECCV 2018: 15th European Conf. on Computer Vision (ECCV), Munich, Germany, Springer-Verlag, 3–19, https://doi.org/10.1007/978-3-030-01234-2_1.
Yeh, S.-H., P.-L. Lin, J.-S. Hong, and T.-S. Huang, 2016: Modified probability matched mean QPF of the ensemble prediction system (in Chinese). Atmos. Sci., 44, 83–111.
Zhou, X., D. Wang, and P. Krähenbühl, 2019: Objects as points. arXiv, 1904.07850v2, https://doi.org/10.48550/arXiv.1904.07850.