1. Introduction
The tropical Pacific region plays a crucial role in global climate variability. Rainfall anomalies associated with the El Niño–Southern Oscillation phenomenon generate atmospheric waves that propagate across the globe and impact weather in remote regions. However, climate models still suffer from biases in this region that affect assessments of weather and climate risk (Lee et al. 2022; Sobel et al. 2023). Several attempts have been made to use statistical and machine learning frameworks to estimate rain rates in the tropics. Previous attempts applied machine learning techniques to the parameterization of convection using cloud-resolving model or high-resolution model simulations (Brenowitz and Bretherton 2018; O’Gorman and Dwyer 2018; Rasp et al. 2018). These efforts bring computational gains, but they cannot overcome inherent model deficiencies and may not identify the underlying physical relationships. Instead, Yang et al. (2019) sought to capture the relationship between atmospheric features and the rain amount by applying a generalized linear model (GLM) to observational data. GLMs naturally provide this relationship through their coefficients. However, they may struggle to explain the tail of the rain-rate distribution accurately due to their parametric assumption on the density function. Building upon this work, Wang et al. (2021) improved the predictive performance by implementing random forest (RF) and neural network (NN) methods. These machine learning methods allowed for more flexibility in the relationship and demonstrated better performance than the GLM in predicting the rain-rate distribution and capturing its tail behavior. However, the distributions from their outputs still deviated significantly from the observations (Obs).
In this work, we experiment with overparameterized NNs (Over NN), where the number of NN parameters exceeds the sample size, and find that they enhance the previous results and effectively capture the tail of the rain-rate distribution.
The overparameterized regime has gained significant attention since the double descent phenomenon was proposed by Belkin et al. (2019). Double descent describes the pattern of the test error curve with respect to the model capacity (i.e., the number of parameters; Fig. 1). According to classical wisdom, the test curve exhibits a U-shaped pattern before reaching the interpolation threshold, where perfect interpolation of the training data can be attained. Belkin et al. (2019) showed that beyond the interpolation threshold, the test error decreases again, creating a double descent pattern in the test curve. Interestingly, the notorious issue of overfitting, where an excessive number of model parameters compared to the number of samples harms the test performance, does not show up beyond the interpolation threshold. Since Belkin et al. (2019), numerous studies have explored and explained this phenomenon theoretically and experimentally (Geiger et al. 2019; Bartlett et al. 2020; d’Ascoli et al. 2020; Nakkiran et al. 2020, 2021; Muthukumar et al. 2021; Gamba et al. 2022; Hastie et al. 2022). However, most of these studies focused on classification tasks, with little attention given to leveraging the double descent phenomenon for regression on real datasets. In addition, to the best of our knowledge, the capability of overparameterized NNs to explain the tail behavior of data has never been investigated. In this study, we introduce overparameterized NNs to rainfall prediction and present numerical findings that demonstrate the efficacy of this approach.
Another crucial aspect of interest in the rainfall dataset is identifying key features for our model. Even if NNs successfully capture complex relationships between variables, it may be difficult to disentangle the individual contribution of each feature. To address this issue, we employed the permutation importance (PI) approach devised by Breiman (2001). The PI approach assesses feature importance by examining how much the model error changes when the values of a given feature are permuted. By adopting the PI approach, we are able to provide insights into the key features for our overparameterized NNs and enhance the interpretability of our model.
The structure of the remainder of this paper is as follows. We provide a detailed description of the data used in our study in section 2. In section 3, we recap machine learning methods used in previous studies and introduce the overparameterized regime with a suitable training method. Section 4 first checks if our overparameterized NNs are properly trained and exhibit the double descent phenomenon. Then, we compare prediction results of the rain-rate distribution with observations and other machine learning methods. Last, we summarize our findings and discuss future work.
2. Data description
We used 8 years of June–August (JJA) rain-rate data from 2015 to 2022 (Fig. 2). The rain-rate observations were obtained from the Global Precipitation Measurement (GPM) dual-frequency precipitation radar (DPR) version 7 dataset (Iguchi and Meneghini 2021). JJA data from 2015 to 2018 were used for training, and JJA data from 2019 to 2022 were used for testing. The observational domain was limited to the tropical west Pacific (WP; 130°E–180°, 15°S–15°N) and east Pacific (EP; 180°–130°W, 15°S–15°N) regions, as illustrated in Fig. 2. These two regions were selected for their distinct convective environments, with the WP box representing the warm pool region and South Pacific convergence zone (SPCZ) and the EP box impacted by equatorial upwelling with a more distinct intertropical convergence zone (ITCZ) north of the equator. The EP box is also where global climate models (GCMs) commonly experience an erroneous double ITCZ in boreal summer (e.g., Oueslati and Bellon 2015).
The orbital DPR data were gridded to a temporal resolution of 3 h and a spatial resolution of 0.5°, resulting in 6000 spatial locations for each region. We chose this time and space resolution to match the specifications of a higher-resolution GCM output; however, it is important to note that the DPR data represent just snapshots within the 3-h period because of its scanning geometry. GPM is in an inclined low-Earth orbit, so it only revisits particular points on Earth’s surface every day or so, and the times of day vary because of the precessing nature of the orbit (although this feature of the orbit is important to capture the full diurnal cycle). The DPR has a footprint size of 5 km at nadir and a swath width of 245 km that samples only part of each domain each day. To ensure robust sampling in each grid box, we only consider 0.5° grids that had an overpass containing at least 50 DPR pixels (or about half of the grid box) regardless of whether the pixels were raining or not. We average the mean DPR-observed near-surface rain rate over the entire 0.5° grid for a particular 3-h period and a particular rain type. Therefore, the grid-averaged values will be lower than instantaneous 5-km pixel values.
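As an illustration, the pixel-screening and averaging step can be sketched as follows; the function and argument names are our own, and the routine assumes the DPR pixels falling within one 0.5° grid box and 3-h window have already been collected:

```python
import numpy as np

def grid_mean_rain(pixel_rain, pixel_valid, min_pixels=50):
    """Average DPR pixel rain rates within one 0.5-degree grid box for
    one 3-h window, returning NaN when sampling is insufficient.

    pixel_rain  : 1D array of near-surface rain rates (mm/h) for the
                  DPR pixels that fell inside the grid box
    pixel_valid : boolean mask of pixels with a valid observation
                  (raining or not)
    """
    n_valid = int(np.sum(pixel_valid))
    if n_valid < min_pixels:   # fewer than ~half the grid box sampled
        return np.nan          # discard this grid box / time window
    # average over all valid pixels, zeros included, so the grid-mean
    # value is lower than instantaneous 5-km pixel values
    return float(np.mean(pixel_rain[pixel_valid]))
```

Because nonraining pixels enter the average, this reproduces the grid-mean shift toward lower values noted above.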
For the rain types, we adopted a three-type categorization: stratiform, deep convective, and shallow convective (Funk et al. 2013). This separation allows us to account for the unique nature of each rain type, with deep convection extending through the depth of the troposphere due to its stronger updrafts and shallow convection being confined to echo tops below the 0°C level because of weaker updrafts and/or a limiting environment (such as dry air) at upper levels (Schumacher and Funk 2023). Stratiform rain in the tropics forms from deep convection and is an important component of mesoscale convective systems (Houze 1997). The percent of total rain for each rain type in each domain is summarized in Table 1. As our focus is on predicting the rain rate, we only use samples with nonzero rain rates, with numbers of samples for each region indicated in Table 2. Tables 1 and 2 show that while the deep convective rain type has the lowest frequencies in both the WP and EP regions, it contributes about 40% of the total rain. On the contrary, the shallow convective rain type shows the highest frequencies in both regions, yet it makes up the smallest portion of the total rain amount (about 10%–20%) in both regions. This emphasizes the distinct nature of each rain type and demonstrates the importance of separately considering each rain type.
Rain amount percentage of each rain type to the total rainfall amount in the WP and EP regions during JJA.
The number of 0.5° grid boxes with nonzero rain rate for each rain type and region in the training (JJA 2015–18) and testing (JJA 2019–22) datasets.
As in Yang et al. (2019), we consider humidity, temperature, zonal wind, meridional wind, latitude, longitude, and latent heat flux as features for the rain-rate prediction. The features used in this study were computed from the MERRA-2 reanalysis (Rienecker et al. 2011). The MERRA-2 humidity, temperature, zonal wind, and meridional wind fields consist of vertical profiles at 40 pressure levels, leading to 163 features in total including latitude, longitude, and surface latent heat flux. MERRA-2 data are available at 3 hourly and 0.5° resolution and represent instantaneous atmospheric state snapshots. The MERRA-2 grid value that occurs before the DPR overpass was used in the training process (e.g., a 0600 UTC grid value would be used for a DPR overpass between 0600 and 0900 UTC). MERRA-2 grids not matching DPR overpasses were ignored.
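The overpass-to-analysis time matching described above amounts to flooring the overpass time to the preceding 3-h boundary; a minimal sketch (function name illustrative):

```python
from datetime import datetime

def merra2_time_for_overpass(overpass_utc):
    """Return the MERRA-2 3-hourly grid time at or before a DPR
    overpass, e.g., an overpass at 0743 UTC maps to the 0600 UTC
    grid value."""
    hour = (overpass_utc.hour // 3) * 3  # floor to the 3-h boundary
    return overpass_utc.replace(hour=hour, minute=0, second=0,
                                microsecond=0)
```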
Rainfall prediction presents a notable challenge due to the extremely heavy tail in the rain-rate distribution, indicating a relatively large portion of extreme values. Table 3 summarizes some statistics of the training dataset for each rain type and region. The percentiles in Table 3 represent the ranking of a particular value within a dataset. For instance, the 75th percentile indicates that 25% of the data values are above it and the 99th percentile means that only 1% of the data values exceed it. In the case of stratiform and deep convective rain, the mean exceeds both the median and the 75th percentile value. This indicates that the data are highly heavy tailed, as the mean values are significantly influenced by the presence of extreme values. Although shallow convective rain may not be as heavy tailed as the other two rain types, the mean values lying between the median and 75th percentile values still point out the presence of a heavy tail in the data. Moreover, finer temporal or spatial resolutions shift the distributions of rain rates further to the right, indicating heavier tails in the data. We hypothesize that overparameterized methods possess the capability to perform comparably well even with more heavily tailed distributions. However, employing such methods could introduce additional challenges in terms of optimization.
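The heavy-tail diagnostic used in Table 3 amounts to comparing the mean against the upper percentiles; a minimal sketch:

```python
import numpy as np

def heavy_tail_summary(rain):
    """Summary statistics used to diagnose a heavy right tail: if the
    mean exceeds the median (or even the 75th percentile), a small
    number of extreme values is dominating the average."""
    rain = np.asarray(rain, dtype=float)
    return {
        "mean": rain.mean(),
        "p50": np.percentile(rain, 50),
        "p75": np.percentile(rain, 75),
        "p90": np.percentile(rain, 90),
        "p99": np.percentile(rain, 99),
        "max": rain.max(),
    }
```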
Mean, percentiles (median, 75%, 90%, and 99%), and maximum DPR rain-rate value (mm h−1) of the training dataset for each rain type in each region for a 0.5° grid resolution.
A challenge from dealing with heavy-tailed data arises from a large portion of extreme samples. These extreme values can distort the overall patterns from the majority of the data and, hence, decrease the predictive power of models. In addition, many traditional methods may lack the necessary flexibility to cover the full range of values from a majority of the data to the extreme values. As a result, these methods may struggle to estimate and predict the data, resulting in unsatisfactory overall performance. This limitation can particularly affect a model’s ability to accurately capture and forecast extreme events, which is crucial in tasks like rainfall forecasting.
3. Methods
In the following, we present four statistical and machine learning frameworks to investigate the relationship between the rain rates and features.
a. Generalized linear model
The GLM (McCullagh and Nelder 1989) is a flexible statistical model that provides an extension to ordinary linear regression by allowing a nonnormal distribution of the response variable. In contrast to ordinary linear regression, where the mean of the response μ is directly modeled as a linear function of the features, the GLM introduces the link function to establish a nonlinear relationship between the response and the features. The link function enables the GLM to handle a wider range of data distributions and provide greater capability in modeling complex relationships.
Despite some advantages of the gamma regression used in these previous studies, such as its ability to handle skewed data, the limited parametric distribution assumed for the error may not adequately capture the characteristics of heavy-tailed data such as rainfall (Yang et al. 2019; Wang et al. 2021). Thus, adopting machine learning techniques, which do not impose such strong restrictions as GLMs, could potentially enhance the estimation performance.
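For concreteness, a gamma GLM with a log link of the kind used in these studies can be sketched with scikit-learn's GammaRegressor on synthetic, right-skewed data; the features and coefficients below are illustrative and are not the rainfall features of section 2:

```python
import numpy as np
from sklearn.linear_model import GammaRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))             # stand-ins for environmental features
mu = np.exp(0.2 + 0.5 * X[:, 0])          # log link: E[y] = exp(X @ beta)
y = rng.gamma(shape=2.0, scale=mu / 2.0)  # positive, right-skewed "rain rates"

glm = GammaRegressor(alpha=0.0)           # no penalty: a plain gamma GLM
glm.fit(X, y)
# the fitted coefficients are directly interpretable on the log scale
print(glm.coef_)
```

The log link guarantees positive predictions, but the parametric gamma error keeps the fitted distribution thinner tailed than the observed rain rates.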
b. Random forest
RF has gained attention for its powerful prediction capability since it first appeared in Breiman (2001). RF is an ensemble method that aggregates multiple decision trees to achieve accurate and robust predictions. In an RF model, each decision tree independently learns from random samples of training data with a random subset of features at each split. This random sampling and subsetting process reduces the risk of overfitting by preventing the trees from becoming too specialized to the training data or correlated to one another. Aggregating the outcomes from multiple less correlated trees leads to outcomes with lower variance and improved accuracy.
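The ensemble just described can be sketched in Python; our RF was trained in R, so scikit-learn's RandomForestRegressor is used here only as a stand-in, with 100 trees and roughly one-third of the features tried at each split (the R regression default), and an illustrative synthetic target:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 9))                      # stand-in feature matrix
y = np.exp(X[:, 0]) + 0.1 * rng.normal(size=1000)   # skewed target

rf = RandomForestRegressor(
    n_estimators=100,    # number of trees in the ensemble
    max_features=1 / 3,  # ~p/3 features considered per split
    bootstrap=True,      # each tree learns from a bootstrap sample
    random_state=0,
)
rf.fit(X, y)
```

Aggregating the 100 decorrelated trees (the mean of their predictions) yields the lower-variance outcome described above.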
Although RF may lack interpretability compared to the GLM, its performance in fitting and predicting rainfall data exceeded that of the GLM and matched the NN (Wang et al. 2021). In our study, we train a random forest model in R with the randomForest function supported by the randomForest package. The number of trees was set to 100, and the number of features selected at each split was set to the largest number below
c. Neural networks
1) Multilayer perceptron NN
As dataset sizes grow and high-performance computing becomes more available, NNs have become one of the most popular frameworks in the statistical and machine learning communities. Inspired by the structure of the human brain, NNs are computational models defined by hidden units and the connections between them. Hidden units and connections correspond to the neurons and the synapses in the human brain, respectively. A typical example is the multilayer perceptron (MLP), which connects all the units from the input to the output (LeCun et al. 2015). Besides the MLP, convolutional NNs (Simonyan and Zisserman 2014), recurrent NNs (Hochreiter and Schmidhuber 1997), and transformers (Vaswani et al. 2017) are frequently used structures. In this study, the MLP is used to compare standard NN performance with the GLM and RF, as our primary goal lies in introducing overparameterized NNs.
2) Overparameterized regime
An overparameterized regime refers to when the number of parameters in a model exceeds the training sample size. Overparameterized models have gained significant attention in recent years (Cao et al. 2022; Hastie et al. 2022; Liu et al. 2022; Xu et al. 2022; Gao et al. 2023), particularly with the rise of deep learning, as many state-of-the-art deep learning methods fall into this category. From classical wisdom, an excess number of parameters compared to the number of data points usually leads to overfitting. However, it has been observed that overparameterized NN models do not suffer from traditional overfitting and result in decent generalization.
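Whether a given MLP sits in the overparameterized regime can be checked by counting its parameters against the sample size; a sketch, assuming fully connected hidden layers and counting both weights and biases (the input dimension of 163 matches the features of section 2):

```python
def mlp_param_count(width, depth, n_in=163, n_out=1):
    """Number of weights and biases in an MLP with `depth` hidden
    layers of size `width`."""
    n = n_in * width + width                    # input -> first hidden
    n += (depth - 1) * (width * width + width)  # hidden -> hidden
    n += width * n_out + n_out                  # last hidden -> output
    return n

def smallest_overparameterized_width(depth, n_samples, widths):
    """First width whose parameter count exceeds the sample size,
    i.e., the nominal interpolation threshold."""
    for w in sorted(widths):
        if mlp_param_count(w, depth) > n_samples:
            return w
    return None
```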
Belkin et al. (2019) experimentally showed how classical understanding of overfitting may have to change in an overparameterized regime. Their study demonstrated that overparameterized models not only perfectly interpolate the training data but also attain lower test error than what can be achieved in underparameterized regimes. According to the classical understanding, the test error curve first decreases until an unknown “sweet spot” where the minimum test error is obtained and then it starts to increase again. The maximum test error occurs around the interpolation threshold, where the number of parameters matches the sample size. Beyond the interpolation threshold, also called the overparameterized regime, the test error starts to decrease again. This phenomenon is often called double descent.
However, it becomes more challenging to train overparameterized models as we need to explore a larger parameter space. Belkin et al. (2019) suggested a sequential training procedure to gradually increase the model size, with the final weights of smaller models serving as the pretrained initialization for the larger models. We followed a similar sequential training procedure and considered NNs with different depths (L = 4, 5, 6) and widths (W = 12, 24, …, 600). For example, when training NNs with L = 4, the learned weights of an NN with L = 4 and W = 12 were used as the initialization for an NN with L = 4 and W = 24. The remaining weights were initialized by default in PyTorch, which in our case was the Kaiming uniform initialization (He et al. 2015). For deeper NNs (L = 5, 6), the same width was maintained, but NNs with one less layer were used for initialization. Our final models were NNs with L = 6, but NNs with L = 5 were considered for initialization of NNs with L = 6, as we found that this process stabilized the training procedure. This pretraining strategy allowed us to optimize over a larger parameter space while benefiting from the knowledge gained by smaller models in the sequential progression.
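A minimal PyTorch sketch of the width-growing warm start, assuming each learned weight matrix of the smaller net is copied into the top-left block of the corresponding wider matrix while untouched entries keep the default Kaiming uniform initialization (function and variable names are our own; the depth-growing step is analogous):

```python
import torch
import torch.nn as nn

def make_mlp(width, depth, n_in=163, n_out=1):
    """MLP with `depth` fully connected hidden layers of size `width`."""
    layers, d = [], n_in
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, n_out))
    return nn.Sequential(*layers)

@torch.no_grad()
def warm_start(small, large):
    """Copy each trained weight/bias of `small` into the top-left block
    of the matching layer in `large`; remaining entries keep PyTorch's
    default (Kaiming uniform) initialization."""
    for s, l in zip(small, large):
        if isinstance(s, nn.Linear):
            r, c = s.weight.shape
            l.weight[:r, :c] = s.weight
            l.bias[:r] = s.bias

small = make_mlp(width=12, depth=4)   # e.g., an already-trained model
large = make_mlp(width=24, depth=4)   # next model in the sequence
warm_start(small, large)
```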
Every experiment with NNs was conducted by PyTorch with the Adam optimizer (Kingma and Ba 2014). The total number of epochs for every NN with a different size was set to be 2000, and the learning rate decayed by 0.9 every 100 epochs. To quantify the discrepancy between fitted and target values, we used the mean-square error (MSE) as the loss function. It is important to note that no explicit or implicit regularization techniques, such as dropout or imposing a penalty on weights, were utilized. More details on the training configurations for each region and rain type are listed in Table 4.
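A skeleton of this training configuration is sketched below; the base learning rate is illustrative only, as the per-case settings are given in Table 4:

```python
import torch
import torch.nn as nn

def train(model, x, y, epochs=2000, base_lr=1e-3):
    """MSE training with Adam; the learning rate decays by 0.9 every
    100 epochs, and no dropout or weight penalty is applied."""
    opt = torch.optim.Adam(model.parameters(), lr=base_lr)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=100, gamma=0.9)
    loss_fn = nn.MSELoss()  # discrepancy between fitted and target values
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
        sched.step()
    return loss.item()
```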
Training configurations for NNs with six layers.
Each model was trained on a single Tesla V100 GPU, with training times influenced by both rain type and region due to varying sample sizes. The most data-intensive case, shallow convective rain in the WP region, demanded the longest training due to its largest sample size. In this case, training the smallest model (L = 4 and W = 12) took 56 min, while the largest (L = 6 and W = 600) required 65 min. Due to the sequential training procedure, the entire training time for the largest model extended to 55 h as it requires training of smaller models for initialization. Conversely, the deep convective rain type in the EP region had shorter training times, with the smallest and largest models taking 25 and 26 min, respectively. The entire training time for this case was 23 h.
d. Model interpretation
One of the popular approaches to explain variable importance in machine learning is permutation importance (Breiman 2001; Fisher et al. 2019; Sood and Craven 2022; Cheung et al. 2022; Ramos-Valle et al. 2023; Sekiyama et al. 2023). Permutation importance quantifies the importance of individual features by assessing the impact on test errors when their values are randomly shuffled while keeping the other features unchanged. To calculate the permutation importance for a specific feature (e.g., temperature at a particular height), we first shuffle values of the target feature while leaving the remaining features in place. Then, we obtain model outcomes from these permuted data and compute the increase in the loss values. A larger increase in loss values indicates that the target feature plays a more significant role in the model, as breaking the relationship between this target feature and the model outcome leads to a greater deterioration in model performance.
The original idea, proposed by Breiman (2001), aimed to explore local variable importance, i.e., the variable importance for an individual sample. However, our primary objective is to achieve transparency for the entire model; hence, we employ permutation importance to derive global variable importance (König et al. 2021; Sood and Craven 2022). The only distinction between calculating local and global variable importance lies in whether each permutation is kept distinct or identical across samples for a given target feature. While there are other popular interpretation methods such as Shapley additive explanations (SHAP; Lundberg and Lee 2017) and local interpretable model-agnostic explanations (LIME; Ribeiro et al. 2016), they mainly provide local variable importance. Since our focus is on global variable importance, permutation importance was considered the most suitable choice for this study.
To summarize our variable importance findings, we employed the following process. For each feature, we conducted 100 random permutations, ensuring the robustness of our variable importance investigations. In each permutation, we identified the top 10 most significant features and aggregated these selections across all permutations. Finally, we determined key features for our model by reporting the top 10 most frequently occurring ones within this combined set of features.
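The procedure above can be sketched as follows, with an identical permutation applied across samples for each target feature (global importance) and the top-k tally over repeated permutations; the function names are our own:

```python
import numpy as np

def permutation_importance(predict, X, y, n_perm=100, top_k=10, seed=0):
    """Global permutation importance with a top-k frequency summary.

    For each feature, shuffle its column (the same permutation for all
    samples), recompute the MSE, and record the increase over the
    baseline; features whose permutation hurts the model most are the
    most important. Repeating this `n_perm` times, we tally how often
    each feature lands in the top_k and report the most frequent ones.
    """
    rng = np.random.default_rng(seed)
    base = np.mean((predict(X) - y) ** 2)
    counts = np.zeros(X.shape[1], dtype=int)
    for _ in range(n_perm):
        deltas = np.empty(X.shape[1])
        for j in range(X.shape[1]):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break only feature j
            deltas[j] = np.mean((predict(Xp) - y) ** 2) - base
        counts[np.argsort(deltas)[::-1][:top_k]] += 1
    return np.argsort(counts)[::-1][:top_k]
```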
4. Results
Only data points with nonzero rain rates were used for the results presented below. We compare four statistical and machine learning methods: GLM, RF, the underparameterized NN (Under NN), and the overparameterized NN. We chose the underparameterized NN model that yielded the best test error before the peak in the test curve and the overparameterized NN model with the largest number of parameters. The best models were separately obtained for the WP and EP regions and three different rain types: stratiform, deep convective, and shallow convective.
a. Double descent
We first examine whether overparameterized NNs are well trained. To properly train overparameterized NNs, the model size should be increased beyond the interpolation threshold, where we can achieve zero training error and observe the double descent curve (Belkin et al. 2019; Fig. 1). Then, overparameterized models at the end of the double descent curve are used to see how these models perform with heavy-tailed rainfall data.
Figure 3 illustrates the training and testing curves obtained from NNs with varying depths for different regions and rain types. As the model size increases, we are able to achieve nearly zero training error across all rain types. However, due to the heavy-tailed nature of the rainfall dataset, larger models were necessary to attain sufficiently low training error beyond the typical interpolation thresholds (i.e., where the sample size is equal to the number of model parameters as indicated by the vertical dashed line in each panel). In particular, NNs for deep convective rain needed larger models to achieve a nearly zero training error, as it is the most heavy-tailed rain type. Nakkiran et al. (2021) also pointed out that other factors such as data distribution and training procedure may influence the model complexity, which is directly related to the double descent phenomenon.
Notably, all the test curves in Fig. 3 exhibit the double descent phenomenon. The test errors after the peaks decrease and reach minimum or near-minimum points, although the peaks in the test curves occur at different positions. This observation holds true for all regions and rain types examined in the study, indicating that the overparameterized NNs have been legitimately trained. A minor point to discuss is the position of the peaks in the test errors. Specifically, models with six layers often reach their peak before reaching the interpolation threshold. In theory, with simple frameworks such as linear regression, the peaks are expected to occur at the interpolation threshold. However, a shift in the peak position has been observed in the literature, and various factors such as the data distribution, noise in the responses, or regularization may cause the shift (d’Ascoli et al. 2020; Nakkiran et al. 2021; Gamba et al. 2022). Henceforth, all overparameterized NN results presented for each region and rain type are derived from the models at the rightmost end of the x axis in Fig. 3.
b. Rain-rate distributions
Figure 4 illustrates percentile plots of estimated (train) and predicted (test) rain-rate values from the GLM, RF, and NN methods over the WP and EP. Recall that the rain rates are based on 0.5° grid means, so the values will be shifted lower compared to the native 5-km footprint resolution of the DPR. For both regions and all rain types, the distributions of estimated values from the overparameterized NNs (red dashed line) successfully align with the distribution of the DPR observations (solid black line). This result is expected given that overparameterized NNs achieved a nearly zero training error in Fig. 3. The other methods could not fully recover the training data, overestimating occurrence at low rain rates and underestimating occurrence at large rain rates. This result is consistent with Yang et al. (2019) and Wang et al. (2021).
In the test results, overparameterized NNs outperform the other methods and accurately capture the general shape of the rain-rate distributions of the test dataset (Fig. 4). While the test lines for the overparameterized NNs no longer lie directly on the observed DPR rain rates, they still hew closely to each other at both low-to-moderate percentiles (0%–90%) and tail values (90%–99.9%). There are generally better rain-rate predictions over the EP, similar to the GLM performance in Yang et al. (2019). In particular, the predicted distribution of stratiform rain rate over the EP is very well represented from overparameterized NNs. The deep convective prediction has especially high fidelity in the EP, whereas deep convective rain rates are all overpredicted in the WP.
Figure 4 also shows that all the other methods exhibit markedly different patterns in their prediction distributions compared to the true rain-rate distribution. Both GLM and RF tend to overpredict small rain-rate values in the test data while underpredicting large values. This behavior is often seen in GCMs (e.g., Dai 2006; Fiedler et al. 2020) and indicates that these methods produce a narrower range of rain-rate predictions compared to the wider distributions in overparameterized NNs and the actual observations. Underparameterized NNs initially show an underprediction at low rain rates, but then, they bounce back to the observed rain-rate distribution at higher values. However, underparameterized NNs eventually generate rain rates that are too large compared to the true values. Overall, these findings verify the superiority of the overparameterized NNs in accurately predicting rain-rate distributions.
MedAE of the predicted values (mm h−1) for each model, with values in bold indicating the smallest value among all methods considered.
c. Spatial maps
We next present the geographical patterns produced by each method. The rain-rate prediction results are averaged over multiple years (2019–22) to generate a single spatial map for each method. Figures 5 and 6 display the averaged spatial maps over the WP and EP regions, respectively. While the observations indicate strong spatial heterogeneity over the domains for each rain type, RF and GLM produce very smooth prediction maps. Underparameterized NNs produce nonsmooth spatial patterns, but the high rain rates are too clustered and large. Overparameterized NNs, on the other hand, demonstrate their remarkable ability to capture the spatial heterogeneity of the DPR rain-rate observations. Although the results do not portray individual weather events but rather a 4-yr average, their pebble-like quality is much more representative of true weather events, compared to the underparameterized NN boulders and ruffled sand of the RF and GLM maps.
In terms of the overall spatial patterns, all the methods capture the widespread rainfall of the WP warm pool (Fig. 5) and the ITCZ in the northern part of the EP domain (Fig. 6). However, the overparameterized NN produces higher fidelity in the patterns. For example, while most of the training was done over ocean grid points, the overparameterized NN provides more distinct predictions over Papua New Guinea in the southwest corner of the WP domain and is able to capture orographic features to a higher degree than the other methods, especially in the shallow convective field (Fig. 5c). This result is promising for the future use of overparameterized NNs for rainfall prediction over continents. In the EP, the overparameterized NN produces a more distinct ITCZ in the stratiform and deep convective fields (Figs. 6a,b) while also reproducing the pattern in shallow convective rainfall at the equator (Fig. 6c) despite the fact that sea surface temperatures are not used as a predictor.
d. Interpreting overparameterized neural networks
In this section, we provide the feature importance for overparameterized NNs with permutation importance. Table 6 presents the top 10 features for the test output, categorized into five groups: humidity, temperature, zonal wind, meridional wind, and others. Each category comprises values at 40 pressure levels except for the “Others” category, which contains fields reported at only one level (i.e., longitude, latitude, and latent heat flux). The individual feature contributions to the model output might not hold significant meaning since permuting one variable while keeping the other variables the same may lead to unrealistic environmental conditions. Instead, our primary focus lies in identifying which category plays a significant role in the model output. Future work would be to analyze the individual and combined feature contributions in more detail.
Categories of the top 10 important features from Over NN on test data for the WP and EP regions. The numbers in parentheses represent pressure levels (hPa) of selected features.
Higher numbers in Table 6 represent a larger importance. For example, six pressure levels from the humidity category at or below 750 hPa are identified as the most significant for explaining shallow convective rain rates in the EP, while the other four most important features are temperature values at low (925 and 875 hPa) and mid-to-upper (400 and 350 hPa) levels. While we used all available height levels (including the stratosphere) from MERRA-2 in our training and testing, we only report on the importance of pressure levels within the troposphere (i.e., up to 100 hPa), as almost all weather phenomena occur within this atmospheric layer.
Table 6 shows that the average number of variables from the humidity category selected as key features is 4.33; i.e., on average, more than four of the 10 most influential features in rain-rate prediction across the tropical Pacific come from the humidity category. This result is consistent with our common understanding that humidity has a direct relationship with rain rate (e.g., Bretherton et al. 2004), making it a critical factor in predicting rainfall. Furthermore, for most of the rain types, humidity contains the most variables among key features, which is consistent with the results of Ahmed and Schumacher (2015) that indicated a strong relationship between column humidity and shallow convective, deep convective, and stratiform rain rates across the tropical oceans. The humidity category is closely followed by temperature in importance, which aligns well with the results from Yang et al. (2019). We note that most of the pressure levels highlighted for humidity and temperature are at or below 700 hPa.
The most influential categories in explaining the rainfall prediction results over the tropical Pacific in Table 6 after humidity and temperature are, in their respective order, zonal wind, meridional wind, and others. Even though wind (or wind shear) is not typically included as a parameter in GCM convective parameterizations, the small but perceptible presence of zonal and meridional winds in Table 6 suggests that the inclusion of winds has the potential to improve convective and stratiform rain predictions. A more precise quantification of these relationships is left for future studies; however, differences in the category rankings between the WP and EP and between rain types are consistent with our general understanding of large-scale environmental factors that affect rain production in the tropics.
5. Discussion
Capturing the observed climate variability in the tropical Pacific is one of the great challenges currently faced by climate modeling (Lee et al. 2022; Sobel et al. 2023). The rainfall response to sea surface temperature anomalies in this region plays a key role in driving this climate variability. In this study, we applied statistical and machine learning techniques to model the relationship between the large-scale environment and rainfall. With growing interest in the application of machine learning to climate modeling (Brenowitz and Bretherton 2018; O’Gorman and Dwyer 2018; Rasp et al. 2018), our results can help guide the development of new machine learning–derived parameterizations of rainfall.
We found that properly trained overparameterized NNs correctly reproduced the rain-rate distributions over the tropical Pacific for multiple rain types (stratiform, deep convective, and shallow convective), including their tail behavior. Overparameterized NNs also outperformed the other methods, such as GLM and RF, in both predicting the rain-rate distribution and capturing spatial patterns. We used permutation importance to assess feature importance in the NN models, producing results consistent with those from the GLM framework of Yang et al. (2019), who also found humidity and temperature to be the most important environmental variables. To the best of our knowledge, the benefits of overparameterized models have been neither experimentally nor theoretically verified for heavy-tailed datasets. Given our successful results, the applicability of overparameterized NNs could be expanded to various other real-world datasets.
It is notable that the training and test curves in Fig. 3 show slightly different behaviors compared to previous studies. The interpolation thresholds have shifted to the right because of the heavy-tailed nature of the dataset: larger model complexity is needed to fully fit samples with large rain-rate values. In addition, the peaks in the test curves were often located well before the interpolation thresholds. While it is partly known that factors such as the data distribution and noise in the responses can influence the locations of these peaks, the underlying reasons and their impacts remain largely unclear and unexplored. One workaround for this issue is to consider models whose complexity lies well beyond the interpolation threshold, where test performance stabilizes.
In addition, the final models for each rain type and region have the potential for further improvement through fine-tuning and optimal selection of the hyperparameters. For instance, as observed in Fig. 3, deeper neural networks consistently lead to better prediction performance even when the number of parameters is similar. The choice of activation function and optimizer also played a crucial role in determining model performance. However, the search for optimal hyperparameters must be repeated for each dataset individually, which can be tedious, and the improvements achieved through hyperparameter tuning may not yield significant differences in the final results. We followed Nakkiran et al. (2021) in choosing the Adam optimizer over stochastic gradient descent and used an MLP structure to demonstrate that desirable results can be obtained even with simple models. While further fine-tuning and customization of the model structure could potentially lead to improvements, the current method allows us to obtain promising results without an exhaustive search for hyperparameters. This helps maintain a balance between model performance and computational efficiency, making the approach more practical and applicable to real datasets.
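The Adam update (Kingma and Ba 2014) that we preferred over plain stochastic gradient descent can be written in a few lines; the sketch below is a generic illustration of the update rule on a toy objective, not our training code, and the learning rate and iteration count are arbitrary.

```python
import numpy as np

def adam_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: bias-corrected running moments of the gradient."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad           # first moment (running mean of grads)
    v = b2 * v + (1 - b2) * grad ** 2      # second moment (running mean of grad^2)
    m_hat = m / (1 - b1 ** t)              # bias corrections for zero initialization
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, (m, v, t)

# Minimize f(theta) = (theta - 3)^2 with Adam.
theta = np.array([0.0])
state = (np.zeros(1), np.zeros(1), 0)
for _ in range(20000):
    grad = 2.0 * (theta - 3.0)             # exact gradient of the toy objective
    theta, state = adam_step(theta, grad, state, lr=0.01)
# theta converges toward the minimizer at 3
```

The per-coordinate rescaling by the second moment is what makes Adam comparatively insensitive to the gradient scale, one reason it required less tuning than SGD in our experiments.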
We further investigated data transformations, such as logarithmic and square root transformations, to alleviate the skewness of the data distribution. However, these transformations made it harder for the neural networks to learn the distribution of extreme events because they reduce variability in the tail. We also explored alternative probability distributions, such as the lognormal, for the GLM, but none yielded satisfactory results. Last, we performed a number of sensitivity tests to see how flexible the overparameterized NN framework is in predicting rain rates, including (i) not separating by rain type, (ii) training on a combined WP/EP domain, and (iii) considering both rain and no-rain events. The results (not shown) indicate that overparameterized NNs still do better than GLM, RF, and underparameterized NNs at capturing the tail and spatial structure of the rainfall over the tropical Pacific, but not as well as overparameterized NNs that are trained for separate rain types or regions. This highlights the limitations in capturing the full complexity of total rainfall over large regions. Furthermore, simultaneously modeling both rain and no-rain events proved challenging because of the dataset's zero-inflated and heavy-tailed nature. The training procedure encountered significant instability, highlighting the complexity of this task, so specialized architectures or optimization methods may be necessary. We leave this exploration for future research.
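The tail-compression effect of such transformations can be seen directly on a synthetic heavy-tailed sample; the lognormal draw and the quantile-ratio tail measure below are illustrative choices, not our rain-rate data or a formal tail statistic.

```python
import numpy as np

rng = np.random.default_rng(0)
# Heavy-tailed stand-in for rain rates (lognormal with a wide spread).
rain = rng.lognormal(mean=0.0, sigma=2.0, size=100_000)

def tail_spread(x):
    """Crude tail measure: an extreme quantile relative to the median."""
    return np.quantile(x, 0.999) / np.quantile(x, 0.5)

raw_spread = tail_spread(rain)             # extreme events dwarf the median
log_spread = tail_spread(np.log1p(rain))   # after log(1 + x) transform
# The log transform sharply compresses the relative spread of the tail,
# which is exactly the variability the network needs to learn extremes.
```

In other words, the transformation trades skewness for a flattened tail signal, which is why the transformed models struggled to reproduce extreme events.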
Acknowledgments.
Aaron Funk processed the gridded radar and MERRA-2 data used in the analysis. Mikyoung Jun acknowledges support by NSF IIS-2123247 and DMS-2105847. Courtney Schumacher acknowledges support by NASA 80NSSC22K0617. Funding for this project was also provided by the TAMU College of Arts and Sciences Seed Funds.
Data availability statement.
The GPM DPR and MERRA-2 data are publicly available from the NASA GES DISC (https://disc.gsfc.nasa.gov/). The specific DPR and MERRA-2 training and test datasets used in this study have been placed on the Texas Data Repository (https://dataverse.tdl.org/). The final overparameterized models can be found in our GitHub repository (https://github.com/HojunYou/Rainfall_prediction_with_overNN).
REFERENCES
Ahmed, F., and C. Schumacher, 2015: Convective and stratiform components of the precipitation-moisture relationship. Geophys. Res. Lett., 42, 10 453–10 462, https://doi.org/10.1002/2015GL066957.
Bai, T., and P. Tahmasebi, 2023: Graph neural network for groundwater level forecasting. J. Hydrol., 616, 128792, https://doi.org/10.1016/j.jhydrol.2022.128792.
Bartlett, P. L., P. M. Long, G. Lugosi, and A. Tsigler, 2020: Benign overfitting in linear regression. Proc. Natl. Acad. Sci. USA, 117, 30 063–30 070, https://doi.org/10.1073/pnas.1907378117.
Belkin, M., D. Hsu, S. Ma, and S. Mandal, 2019: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proc. Natl. Acad. Sci. USA, 116, 15 849–15 854, https://doi.org/10.1073/pnas.1903070116.
Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, https://doi.org/10.1023/A:1010933404324.
Brenowitz, N. D., and C. S. Bretherton, 2018: Prognostic validation of a neural network unified physics parameterization. Geophys. Res. Lett., 45, 6289–6298, https://doi.org/10.1029/2018GL078510.
Bretherton, C. S., M. E. Peters, and L. E. Back, 2004: Relationships between water vapor path and precipitation over the tropical oceans. J. Climate, 17, 1517–1528, https://doi.org/10.1175/1520-0442(2004)017<1517:RBWVPA>2.0.CO;2.
Cao, Y., Z. Chen, M. Belkin, and Q. Gu, 2022: Benign overfitting in two-layer convolutional neural networks. NIPS’22: Proc. 36th Int. Conf. on Neural Information Processing Systems, New Orleans, LA, Association for Computing Machinery, 25 237–25 250, https://dl.acm.org/doi/10.5555/3600270.3602100.
Cheung, H. M., C.-H. Ho, and M. Chang, 2022: Hybrid neural network models for postprocessing medium-range forecasts of tropical cyclone tracks over the western North Pacific. Artif. Intell. Earth Syst., 1, e210003, https://doi.org/10.1175/AIES-D-21-0003.1.
Dai, A., 2006: Precipitation characteristics in eighteen coupled climate models. J. Climate, 19, 4605–4630, https://doi.org/10.1175/JCLI3884.1.
d’Ascoli, S., M. Refinetti, G. Biroli, and F. Krzakala, 2020: Double trouble in double descent: Bias and variance(s) in the lazy regime. Proceedings of the 37th International Conference on Machine Learning, Z. Abbas et al., Eds., Vol. 119, PMLR, 2280–2290, https://proceedings.mlr.press/v119/d-ascoli20a.html.
Fiedler, S., and Coauthors, 2020: Simulated tropical precipitation assessed across three major phases of the Coupled Model Intercomparison Project (CMIP). Mon. Wea. Rev., 148, 3653–3680, https://doi.org/10.1175/MWR-D-19-0404.1.
Fisher, A., C. Rudin, and F. Dominici, 2019: All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. J. Mach. Learn. Res., 20, 1–81.
Funk, A., C. Schumacher, and J. Awaka, 2013: Analysis of rain classifications over the tropics by version 7 of the TRMM PR 2A23 algorithm. J. Meteor. Soc. Japan, 91, 257–272, https://doi.org/10.2151/jmsj.2013-302.
Gamba, M., E. Englesson, M. Björkman, and H. Azizpour, 2022: Deep double descent via smooth interpolation. arXiv, 2209.10080v4, https://doi.org/10.48550/arXiv.2209.10080.
Gao, Y., S. Mahadevan, and Z. Song, 2023: An over-parameterized exponential regression. arXiv, 2303.16504v1, https://doi.org/10.48550/arXiv.2303.16504.
Geiger, M., S. Spigler, S. d’Ascoli, L. Sagun, M. Baity-Jesi, G. Biroli, and M. Wyart, 2019: Jamming transition as a paradigm to understand the loss landscape of deep neural networks. Phys. Rev., 100E, 012115, https://doi.org/10.1103/PhysRevE.100.012115.
Hastie, T., A. Montanari, S. Rosset, and R. J. Tibshirani, 2022: Surprises in high-dimensional ridgeless least squares interpolation. Ann. Stat., 50, 949–986, https://doi.org/10.1214/21-AOS2133.
He, K., X. Zhang, S. Ren, and J. Sun, 2015: Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. Proc. IEEE Int. Conf. on Computer Vision, Santiago, Chile, Institute of Electrical and Electronics Engineers, 1026–1034, https://doi.org/10.1109/ICCV.2015.123.
Hochreiter, S., and J. Schmidhuber, 1997: Long short-term memory. Neural Comput., 9, 1735–1780, https://doi.org/10.1162/neco.1997.9.8.1735.
Houze, R. A., Jr., 1997: Stratiform precipitation in regions of convection: A meteorological paradox? Bull. Amer. Meteor. Soc., 78, 2179–2196, https://doi.org/10.1175/1520-0477(1997)078<2179:SPIROC>2.0.CO;2.
Iguchi, T., and R. Meneghini, 2021: GPM DPR precipitation profile L2A 1.5 hours 5 km v07. Goddard Earth Sciences Data and Information Services Center (GES DISC), accessed 15 February 2022, https://doi.org/10.5067/GPM/DPR/GPM/2A/07.
Kingma, D. P., and J. Ba, 2014: Adam: A method for stochastic optimization. arXiv, 1412.6980v9, https://doi.org/10.48550/arXiv.1412.6980.
König, G., C. Molnar, B. Bischl, and M. Grosse-Wentrup, 2021: Relative feature importance. 2020 25th Int. Conf. on Pattern Recognition (ICPR), Milan, Italy, Institute of Electrical and Electronics Engineers, 9318–9325, https://doi.org/10.1109/ICPR48806.2021.9413090.
LeCun, Y., Y. Bengio, and G. Hinton, 2015: Deep learning. Nature, 521, 436–444, https://doi.org/10.1038/nature14539.
Lee, S., M. L’Heureux, A. T. Wittenberg, R. Seager, P. A. O’Gorman, and N. C. Johnson, 2022: On the future zonal contrasts of equatorial pacific climate: Perspectives from observations, simulations, and theories. npj Climate Atmos. Sci., 5, 82, https://doi.org/10.1038/s41612-022-00301-2.
Liu, C., L. Zhu, and M. Belkin, 2022: Loss landscapes and optimization in over-parameterized non-linear systems and neural networks. Appl. Comput. Harmonic Anal., 59, 85–116, https://doi.org/10.1016/j.acha.2021.12.009.
Longman, R. J., A. J. Newman, T. W. Giambelluca, and M. Lucas, 2020: Characterizing the uncertainty and assessing the value of gap-filled daily rainfall data in Hawaii. J. Appl. Meteor. Climatol., 59, 1261–1276, https://doi.org/10.1175/JAMC-D-20-0007.1.
Lucas, M. P., R. J. Longman, T. W. Giambelluca, A. G. Frazier, J. Mclean, S. B. Cleveland, Y.-F. Huang, and J. Lee, 2022: Optimizing automated kriging to improve spatial interpolation of monthly rainfall over complex terrain. J. Hydrometeor., 23, 561–572, https://doi.org/10.1175/JHM-D-21-0171.1.
Lundberg, S. M., and S.-I. Lee, 2017: A unified approach to interpreting model predictions. NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc., 4768–4777, https://dl.acm.org/doi/10.5555/3295222.3295230.
McCullagh, P., and J. A. Nelder, 1989: Generalized Linear Models. 2nd ed. Chapman and Hall, 526 pp.
Muthukumar, V., A. Narang, V. Subramanian, M. Belkin, D. Hsu, and A. Sahai, 2021: Classification vs regression in overparameterized regimes: Does the loss function matter? J. Mach. Learn. Res., 22, 10 104–10 172.
Nakkiran, P., P. Venkat, S. Kakade, and T. Ma, 2020: Optimal regularization can mitigate double descent. arXiv, 2003.01897v2, https://doi.org/10.48550/arXiv.2003.01897.
Nakkiran, P., G. Kaplun, Y. Bansal, T. Yang, B. Barak, and I. Sutskever, 2021: Deep double descent: Where bigger models and more data hurt. J. Stat. Mech., 2021, 124003, https://doi.org/10.1088/1742-5468/ac3a74.
O’Gorman, P. A., and J. G. Dwyer, 2018: Using machine learning to parameterize moist convection: Potential for modeling of climate, climate change, and extreme events. J. Adv. Model. Earth Syst., 10, 2548–2563, https://doi.org/10.1029/2018MS001351.
Oueslati, B., and G. Bellon, 2015: The double ITCZ bias in CMIP5 models: Interaction between SST, large-scale circulation and precipitation. Climate Dyn., 44, 585–607, https://doi.org/10.1007/s00382-015-2468-6.
Ramos-Valle, A. N., J. Alland, and A. Bukvic, 2023: Using machine learning to understand relocation drivers of urban coastal populations in response to flooding. Artif. Intell. Earth Syst., 2, 220054, https://doi.org/10.1175/AIES-D-22-0054.1.
Rasp, S., M. S. Pritchard, and P. Gentine, 2018: Deep learning to represent subgrid processes in climate models. Proc. Natl. Acad. Sci. USA, 115, 9684–9689, https://doi.org/10.1073/pnas.1810286115.
Ribeiro, M. T., S. Singh, and C. Guestrin, 2016: “Why should I trust you?”: Explaining the predictions of any classifier. Proc. 22nd ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, San Francisco, CA, Association for Computing Machinery, 1135–1144, https://dl.acm.org/doi/10.1145/2939672.2939778.
Rienecker, M. M., and Coauthors, 2011: MERRA: NASA’S Modern-Era Retrospective Analysis for Research and Applications. J. Climate, 24, 3624–3648, https://doi.org/10.1175/JCLI-D-11-00015.1.
Schumacher, C., and A. Funk, 2023: Assessing convective-stratiform precipitation regimes in the tropics and extratropics with the GPM satellite radar. Geophys. Res. Lett., 50, e2023GL102786, https://doi.org/10.1029/2023GL102786.
Sekiyama, T. T., S. Hayashi, R. Kaneko, and K.-i. Fukui, 2023: Surrogate downscaling of mesoscale wind fields using ensemble super-resolution convolutional neural networks. Artif. Intell. Earth Syst., 2, 230007, https://doi.org/10.1175/AIES-D-23-0007.1.
Simonyan, K., and A. Zisserman, 2014: Very deep convolutional networks for large-scale image recognition. arXiv, 1409.1556v6, https://doi.org/10.48550/arXiv.1409.1556.
Singh, S. K., and G. A. Griffiths, 2021: Prediction of streamflow recession curves in gauged and ungauged basins. Water Resour. Res., 57, e2021WR030618, https://doi.org/10.1029/2021WR030618.
Sobel, A. H., and Coauthors, 2023: Near-term tropical cyclone risk and coupled Earth system model biases. Proc. Natl. Acad. Sci. USA, 120, e2209631120, https://doi.org/10.1073/pnas.2209631120.
Sood, A., and M. Craven, 2022: Feature importance explanations for temporal black-box models. Proc. Conf. AAAI Artif. Intell., 36, 8351–8360, https://doi.org/10.1609/aaai.v36i8.20810.
Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, 2017: Attention is all you need. Advances in Neural Information Processing Systems, Curran Associates, Inc., 5998–6008, https://www.bibsonomy.org/bibtex/c9bf08cbcb15680c807e12a01dd8c929.
Wang, J., R. K. W. Wong, M. Jun, C. Schumacher, R. Saravanan, and C. Sun, 2021: Statistical and machine learning methods applied to the prediction of different tropical rainfall types. Environ. Res. Commun., 3, 111001, https://doi.org/10.1088/2515-7620/ac371f.
Xu, W., R. T. Chen, X. Li, and D. Duvenaud, 2022: Infinitely deep Bayesian neural networks with stochastic differential equations. Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, G. Camps-Valls, F. J. R. Ruiz, and I. Valera, Eds., Vol. 151, PMLR, 721–738, https://proceedings.mlr.press/v151/xu22a/xu22a.pdf.
Yang, J., M. Jun, C. Schumacher, and R. Saravanan, 2019: Predictive statistical representations of observed and simulated rainfall using generalized linear models. J. Climate, 32, 3409–3427, https://doi.org/10.1175/JCLI-D-18-0527.1.