Prediction of Tropical Pacific Rain Rates with Overparameterized Neural Networks

Hojun You, Department of Mathematics, University of Houston, Houston, Texas

Jiayi Wang, Department of Mathematical Sciences, The University of Texas at Dallas, Richardson, Texas

Raymond K. W. Wong, Department of Statistics, Texas A&M University, College Station, Texas

Courtney Schumacher, Department of Atmospheric Sciences, Texas A&M University, College Station, Texas

R. Saravanan, Department of Atmospheric Sciences, Texas A&M University, College Station, Texas

Mikyoung Jun, Department of Mathematics, University of Houston, Houston, Texas

Open access

Abstract

The prediction of tropical rain rates from atmospheric profiles poses significant challenges, mainly due to the heavy-tailed distribution exhibited by tropical rainfall. This study introduces overparameterized neural networks not only to forecast tropical rain rates but also to explain their heavy-tailed distribution. The investigation is separately conducted for three rain types (stratiform, deep convective, and shallow convective) observed by the Global Precipitation Measurement satellite radar over the west and east Pacific regions. Atmospheric profiles of humidity, temperature, and zonal and meridional winds from the MERRA-2 reanalysis are considered as features. Although overparameterized neural networks are well known for their “double descent phenomenon,” little has been explored about their applicability to climate data and capability of capturing the tail behavior of data. In our results, overparameterized neural networks accurately estimate the rain-rate distributions and outperform other machine learning methods. Spatial maps show that overparameterized neural networks also successfully describe the spatial patterns of each rain type across the tropical Pacific. In addition, we assess the feature importance for each overparameterized neural network to provide insight into the key factors driving the predictions, with low-level humidity and temperature variables being the overall most important. These findings highlight the capability of overparameterized neural networks in predicting the distribution of the rain rate and explaining extreme values.

Significance Statement

This study aims to introduce the capability of overparameterized neural networks, a type of neural network with more parameters than data points, in predicting the distribution of tropical rain rates from gridscale environmental variables and explaining their tail behavior. Rainfall prediction has been a topic of importance, yet it remains a challenging problem for its heavy-tailed nature. Overparameterized neural networks correctly captured rain-rate distributions and the spatial patterns and heterogeneity of the observed rain rates for multiple rain types, which could not be achieved by any other previous statistical or machine learning frameworks. We find that overparameterized neural networks can play a key role in general prediction tasks, with potential expanded applicability to other domains with heavy-tailed data distribution.

© 2024 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Mikyoung Jun, mjun@central.uh.edu


1. Introduction

The tropical Pacific region plays a crucial role in global climate variability. Rainfall anomalies associated with the El Niño–Southern Oscillation phenomenon generate atmospheric waves that propagate across the globe and impact weather in remote regions. However, climate models still suffer from biases in this region that affect assessments of weather and climate risk (Lee et al. 2022; Sobel et al. 2023). Several attempts have been made to use statistical and machine learning frameworks to estimate rain rates in the tropics. Earlier efforts applied machine learning techniques to the parameterization of convection using cloud-resolving model or high-resolution model simulations (Brenowitz and Bretherton 2018; O’Gorman and Dwyer 2018; Rasp et al. 2018). These efforts bring computational gains, but they cannot overcome inherent model deficiencies and may not identify the underlying physical relationships. Instead, Yang et al. (2019) sought to capture the relationship between atmospheric features and the rain amount by applying a generalized linear model (GLM) to observational data. GLMs naturally provide this relationship through their coefficients, but they may struggle to explain the tail of the rain-rate distribution accurately because of the parametric assumption on the density function. Building upon this work, Wang et al. (2021) improved the predictive performance by implementing random forest (RF) and neural network (NN) methods. These machine learning methods allow for a more flexible relationship and demonstrated better performance than the GLM in predicting the rain-rate distribution and capturing its tail behavior. However, the distributions from their outputs still deviated significantly from the observations (Obs). In this work, we experiment with overparameterized NNs (Over NN), where the number of NN parameters exceeds the sample size, and find that they improve on the previous results and effectively capture the tail of the rain-rate distribution.

The overparameterized regime has gained significant attention since the double descent phenomenon was proposed by Belkin et al. (2019). Double descent describes the pattern of the test error curve with respect to the model capacity (i.e., the number of parameters; Fig. 1). According to classical wisdom, the test curve exhibits a U-shaped pattern before reaching the interpolation threshold, where perfect interpolation of the training data can be attained. Belkin et al. (2019) showed that beyond the interpolation threshold, the test error decreases again, creating a double descent pattern in the test curve. Interestingly, the notorious issue of overfitting, where an excessive number of model parameters relative to the number of samples harms the test performance, does not show up beyond the interpolation threshold. Since Belkin et al. (2019), numerous studies have explored and explained this phenomenon theoretically and experimentally (Geiger et al. 2019; Bartlett et al. 2020; d’Ascoli et al. 2020; Nakkiran et al. 2020, 2021; Muthukumar et al. 2021; Gamba et al. 2022; Hastie et al. 2022). However, most of these studies focused on classification tasks, with little attention given to leveraging the double descent phenomenon for regression on real datasets. In addition, to the best of our knowledge, the capability of overparameterized NNs in explaining the tail behavior of data has never been investigated. In this study, we introduce overparameterized NNs to rainfall prediction and present numerical findings that demonstrate the efficacy of this approach.

Fig. 1. Illustration of the double descent phenomenon. The training curve is represented by the black line, while the test curve is depicted in blue. The point on the x axis where the training error theoretically reaches zero is the “interpolation threshold.”

Another crucial aspect of interest in the rainfall dataset is identifying key features for our model. Even if NNs successfully capture complex relationships between variables, it may be difficult to disentangle the individual contributions of each feature. To address this issue, we employed the permutation importance (PI) approach devised by Breiman (2001). The PI approach assesses feature importance by examining how much the model performance degrades when the values of a given feature are randomly permuted. By adopting the PI approach, we are able to provide insights into the key features for our overparameterized NNs and enhance the interpretability of our model.

The structure of the remainder of this paper is as follows. We provide a detailed description of the data used in our study in section 2. In section 3, we recap machine learning methods used in previous studies and introduce the overparameterized regime with a suitable training method. Section 4 first checks if our overparameterized NNs are properly trained and exhibit the double descent phenomenon. Then, we compare prediction results of the rain-rate distribution with observations and other machine learning methods. Last, we summarize our findings and discuss future work.

2. Data description

We used 8 years of June–August (JJA) rain-rate data from 2015 to 2022 (Fig. 2). The rain-rate observations were obtained from the Global Precipitation Measurement (GPM) dual-frequency precipitation radar (DPR) version 7 dataset (Iguchi and Meneghini 2021). JJA data from 2015 to 2018 were used for training, and JJA data from 2019 to 2022 were used for testing. The observational domain was limited to the tropical west Pacific (WP; 130°E–180°, 15°S–15°N) and east Pacific (EP; 180°–130°W, 15°S–15°N) regions, as illustrated in Fig. 2. These two regions were selected for their distinct convective environments, with the WP box representing the warm pool region and South Pacific convergence zone (SPCZ) and the EP box impacted by equatorial upwelling with a more distinct intertropical convergence zone (ITCZ) north of the equator. The EP box is also where global climate models (GCMs) commonly experience an erroneous double ITCZ in boreal summer (e.g., Oueslati and Bellon 2015).

Fig. 2. The average GPM DPR rain rate (mm day−1) during JJA 2015–22 over the tropical Pacific. The data domains for this study were separated into the WP (blue) and EP (red) regions.

The orbital DPR data were gridded to a temporal resolution of 3 h and a spatial resolution of 0.5°, resulting in 6000 spatial locations for each region. We chose this time and space resolution to match the specifications of a higher-resolution GCM output; however, it is important to note that the DPR data represent just snapshots within the 3-h period because of its scanning geometry. GPM is in an inclined low-Earth orbit, so it only revisits particular points on Earth’s surface every day or so, and the times of day vary because of the precessing nature of the orbit (although this feature of the orbit is important to capture the full diurnal cycle). The DPR has a footprint size of 5 km at nadir and a swath width of 245 km that samples only part of each domain each day. To ensure robust sampling in each grid box, we only consider 0.5° grids that had an overpass containing at least 50 DPR pixels (or about half of the grid box) regardless of whether the pixels were raining or not. We average the mean DPR-observed near-surface rain rate over the entire 0.5° grid for a particular 3-h period and a particular rain type. Therefore, the grid-averaged values will be lower than instantaneous 5-km pixel values.
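To make the gridding rule concrete, the following is a minimal pandas sketch, assuming a hypothetical DPR pixel table with columns lat, lon, time, rain_rate, and rain_type; it illustrates the 50-pixel sampling requirement and the averaging over all pixels in a box, not the actual processing code used for this study.

```python
import numpy as np
import pandas as pd

def grid_dpr(pixels: pd.DataFrame, min_pixels: int = 50) -> pd.DataFrame:
    """Average DPR pixel rain rates onto a 0.5-deg, 3-h grid, keeping only grid
    boxes sampled by at least `min_pixels` pixels (raining or not). The input is
    assumed to have columns lat, lon, time (datetime), rain_rate, rain_type,
    where non-raining pixels carry rain_rate = 0 and a missing rain_type."""
    df = pixels.copy()
    # Snap each pixel to its 0.5-deg grid-box center and 3-h time window.
    df["glat"] = np.floor(df["lat"] / 0.5) * 0.5 + 0.25
    df["glon"] = np.floor(df["lon"] / 0.5) * 0.5 + 0.25
    df["twin"] = df["time"].dt.floor("3h")
    keys = ["glat", "glon", "twin"]
    # Sampling requirement: at least `min_pixels` DPR pixels in the box.
    n_pix = df.groupby(keys).size().rename("n_pix")
    # Sum the rain of each type, then divide by ALL pixels in the box, so the
    # grid means are lower than the instantaneous 5-km pixel values.
    sums = df.groupby(keys + ["rain_type"])["rain_rate"].sum().rename("rain_sum")
    out = sums.reset_index().merge(n_pix.reset_index(), on=keys)
    out = out[out["n_pix"] >= min_pixels]
    out["rain_rate"] = out["rain_sum"] / out["n_pix"]
    return out[keys + ["rain_type", "rain_rate"]]
```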

For the rain types, we adopted a three-type categorization: stratiform, deep convective, and shallow convective (Funk et al. 2013). This separation allows us to account for the unique nature of each rain type, with deep convection extending through the depth of the troposphere due to its stronger updrafts and shallow convection being confined to echo tops below the 0°C level because of weaker updrafts and/or a limiting environment (such as dry air) at upper levels (Schumacher and Funk 2023). Stratiform rain in the tropics forms from deep convection and is an important component of mesoscale convective systems (Houze 1997). The percent of total rain for each rain type in each domain is summarized in Table 1. As our focus is on predicting the rain rate, we only use samples with nonzero rain rates, with the number of samples for each region indicated in Table 2. Tables 1 and 2 show that while the deep convective rain type has the lowest frequency in both the WP and EP regions, it contributes about 40% of the total rain. In contrast, the shallow convective rain type occurs most frequently in both regions, yet it makes up the smallest portion of the total rain amount (about 10%–20%). This emphasizes the distinct nature of each rain type and demonstrates the importance of considering each rain type separately.

Table 1. Rain amount percentage of each rain type to the total rainfall amount in the WP and EP regions during JJA.
Table 2. The number of 0.5° grid boxes with nonzero rain rate for each rain type and region in the training (JJA 2015–18) and testing (JJA 2019–22) datasets.

As in Yang et al. (2019), we consider humidity, temperature, zonal wind, meridional wind, latitude, longitude, and latent heat flux as features for the rain-rate prediction. The features used in this study were computed from the MERRA-2 reanalysis (Rienecker et al. 2011). The MERRA-2 humidity, temperature, zonal wind, and meridional wind fields consist of vertical profiles at 40 pressure levels, leading to 163 features in total including latitude, longitude, and surface latent heat flux. MERRA-2 data are available at 3-hourly and 0.5° resolution and represent instantaneous atmospheric state snapshots. The MERRA-2 grid value immediately preceding the DPR overpass was used in the training process (e.g., a 0600 UTC grid value would be used for a DPR overpass between 0600 and 0900 UTC). MERRA-2 grids not matching DPR overpasses were ignored.
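As a hedged illustration of the time matching and of how the 163 features stack together (function and argument names are ours, not the study's code), a sketch might look like:

```python
import numpy as np
import pandas as pd

def merra2_match_time(overpass_time: pd.Timestamp) -> pd.Timestamp:
    """Return the 3-hourly MERRA-2 analysis time used for a DPR overpass: the
    time step at or immediately before the overpass; e.g., an overpass at
    0734 UTC is matched to the 0600 UTC MERRA-2 snapshot."""
    return overpass_time.floor("3h")

def build_feature_vector(q, t, u, v, lat, lon, lhf) -> np.ndarray:
    """Stack the 163 features for one grid box: humidity, temperature, zonal
    wind, and meridional wind profiles (40 pressure levels each, 160 values),
    plus latitude, longitude, and surface latent heat flux."""
    return np.concatenate([q, t, u, v, [lat, lon, lhf]])
```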

Rainfall prediction presents a notable challenge due to the extremely heavy tail of the rain-rate distribution, indicating a relatively large portion of extreme values. Table 3 summarizes some statistics of the training dataset for each rain type and region. The percentiles in Table 3 represent the ranking of a particular value within a dataset. For instance, the 75th percentile indicates that 25% of the data values are above it, and the 99th percentile means that only 1% of the data values exceed it. For stratiform and deep convective rain, the mean exceeds both the median and the 75th percentile value. This indicates that the data are highly heavy tailed, as the mean values are strongly influenced by the presence of extreme values. Although shallow convective rain may not be as heavy tailed as the other two rain types, the mean values lying between the median and the 75th percentile still point to the presence of a heavy tail in the data. Moreover, finer temporal or spatial resolutions shift the distributions of rain rates further to the right, indicating heavier tails. We hypothesize that overparameterized methods possess the capability to perform comparably well even with more heavily tailed distributions. However, employing such methods could introduce additional challenges in terms of optimization.

Table 3. Mean, percentiles (median, 75th, 90th, and 99th), and maximum DPR rain-rate value (mm h−1) of the training dataset for each rain type in each region at 0.5° grid resolution.

A key challenge in dealing with heavy-tailed data arises from the large portion of extreme samples. These extreme values can distort the overall patterns learned from the majority of the data and, hence, decrease the predictive power of models. In addition, many traditional methods may lack the necessary flexibility to cover the full range of values, from the bulk of the data to the extremes. As a result, these methods may struggle to estimate and predict the data, resulting in unsatisfactory overall performance. This limitation can particularly affect a model’s ability to accurately capture and forecast extreme events, which is crucial in tasks like rainfall forecasting.

3. Methods

In the following, we present four statistical and machine learning frameworks to investigate the relationship between the rain rates and features.

a. Generalized linear model

The GLM (McCullagh and Nelder 1989) is a flexible statistical model that extends ordinary linear regression by allowing nonnormal distributions of the response variable. In contrast to ordinary linear regression, where the mean of the response μ is directly modeled as a linear function of the features, the GLM introduces a link function to establish a nonlinear relationship between the response and the features. The link function enables the GLM to handle a wider range of data distributions and provides greater capability in modeling complex relationships.

The GLM consists of three essential components: the random component, the systematic component, and the link function. The random component deals with the distributional assumption of the response variable, allowing for nonnormal distributions. The systematic component represents the linear predictor, which is formed by combining the features with their regression coefficients. The link function constructs a relationship between the random and systematic components, i.e.,
$$E(Y|X) = \mu = g^{-1}(X^{\top}\beta),$$
where $Y \in \mathbb{R}$ is the response, $X \in \mathbb{R}^{p}$ is the $p$-dimensional feature vector, $\beta \in \mathbb{R}^{p}$ is the coefficient vector of $X$, and $g$ is the link function. Depending on the distribution of the response and the link function, GLMs can take on different names such as logistic regression and gamma regression.
Following Yang et al. (2019) and Wang et al. (2021), we implement gamma regression for its resemblance to the distribution of rainfall in terms of skewness. Suppose $Y(s)$ is a response at location $s \in S \subset \mathbb{R}^{d}$, where $S$ is the observation domain. Then, gamma regression with the log-link function $g(x) = \log(x)$ has the following structure:
$$Y(s) \sim \mathrm{gamma}[\alpha(s), \beta(s)], \quad E[Y(s)|X(s)] = \alpha(s)\beta(s) = \mu(s), \quad \log[\mu(s)] = \eta_{0} + \eta_{1}X_{1}(s) + \cdots + \eta_{p}X_{p}(s),$$
where $\alpha(s) > 0$ and $\beta(s) > 0$ are parameters of the gamma distribution, whose probability density function is given as
$$f(y; \alpha, \beta) = \frac{1}{\beta^{\alpha}\Gamma(\alpha)}\, y^{\alpha-1} e^{-y/\beta}, \quad 0 < y < \infty.$$
Here, $\Gamma$ is the gamma function, $X(s)$ is the feature vector at location $s$, and $\eta_{i}$ ($i = 0, \ldots, p$) are the coefficients, with $p$ features in total. Parameter estimation in the gamma regression can be done by maximum likelihood estimation, which is provided in R by the glm function.
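The study fits this model in R with the glm function; as a non-authoritative Python sketch of the same gamma regression with a log link, using statsmodels and assuming a feature matrix X and strictly positive rain rates y:

```python
import numpy as np
import statsmodels.api as sm

def fit_gamma_glm(X: np.ndarray, y: np.ndarray):
    """Gamma regression with a log link, analogous to R's
    glm(y ~ ., family = Gamma(link = "log")); y must be strictly positive."""
    design = sm.add_constant(X)                       # adds the intercept eta_0
    family = sm.families.Gamma(link=sm.families.links.Log())
    return sm.GLM(y, design, family=family).fit()     # ML fit via IRLS

# Example usage:
# result = fit_gamma_glm(X_train, y_train)
# mu_hat = result.predict(sm.add_constant(X_test))    # predicted mean rain rate
```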

Despite some advantages of the gamma regression, such as its ability to handle skewed data, the limited parametric distribution assumed for the response may not adequately capture the characteristics of heavy-tailed data such as rainfall (Yang et al. 2019; Wang et al. 2021). Thus, adopting machine learning techniques, which do not impose such strong restrictions as GLMs, could potentially enhance the estimation performance.

b. Random forest

RF has gained attention for its powerful prediction capability since it first appeared in Breiman (2001). RF is an ensemble method that aggregates multiple decision trees to achieve accurate and robust predictions. In an RF model, each decision tree independently learns from random samples of training data with a random subset of features at each split. This random sampling and subsetting process reduces the risk of overfitting by preventing the trees from becoming too specialized to the training data or correlated to one another. Aggregating the outcomes from multiple less correlated trees leads to outcomes with lower variance and improved accuracy.

Although RF may lack interpretability compared to the GLM, its performance in fitting and predicting rainfall data exceeded that of the GLM and matched the NN (Wang et al. 2021). In our study, we train a random forest model in R with the randomForest function from the randomForest package. The number of trees was set to 100, and the number of features selected at each split was set to the largest integer below the square root of the total number of features, which is 12. For the other configurations, we used the default values provided with the randomForest function.
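For illustration, a scikit-learn analogue of this configuration might look like the sketch below; the study itself used R's randomForest, whose secondary defaults differ somewhat from scikit-learn's.

```python
from sklearn.ensemble import RandomForestRegressor

# 100 trees and 12 candidate features per split (the floor of the square root
# of the 163 features); remaining settings are left at scikit-learn defaults.
rf = RandomForestRegressor(n_estimators=100, max_features=12, random_state=0)
# rf.fit(X_train, y_train)
# y_pred = rf.predict(X_test)
```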

c. Neural networks

1) Multilayer perceptron NN

As datasets grow and high-performance computing becomes more available, NNs have become one of the most popular frameworks in the statistical and machine learning communities. Inspired by the structure of the human brain, NNs are computational models defined by hidden units and the connections between them. Hidden units and connections correspond to the neurons and the synapses in the human brain, respectively. A typical example is the multilayer perceptron (MLP), which connects all the units from the input to the output (LeCun et al. 2015). Besides the MLP, the convolutional NN (Simonyan and Zisserman 2014), recurrent NN (Hochreiter and Schmidhuber 1997), and transformer (Vaswani et al. 2017) are frequently used structures. In this study, the MLP is used to compare the standard NN performance with the GLM and RF, as our primary goal lies in introducing overparameterized NNs.

2) Overparameterized regime

An overparameterized regime refers to when the number of parameters in a model exceeds the training sample size. Overparameterized models have gained significant attention in recent years (Cao et al. 2022; Hastie et al. 2022; Liu et al. 2022; Xu et al. 2022; Gao et al. 2023), particularly with the rise of deep learning, as many state-of-the-art deep learning methods fall into this category. From classical wisdom, an excess number of parameters compared to the number of data points usually leads to overfitting. However, it has been observed that overparameterized NN models do not suffer from traditional overfitting and result in decent generalization.

Belkin et al. (2019) experimentally showed how classical understanding of overfitting may have to change in an overparameterized regime. Their study demonstrated that overparameterized models not only perfectly interpolate the training data but also attain lower test error than what can be achieved in underparameterized regimes. According to the classical understanding, the test error curve first decreases until an unknown “sweet spot” where the minimum test error is obtained and then it starts to increase again. The maximum test error occurs around the interpolation threshold, where the number of parameters matches the sample size. Beyond the interpolation threshold, also called the overparameterized regime, the test error starts to decrease again. This phenomenon is often called double descent.

However, it becomes more challenging to train overparameterized models as we need to explore a larger parameter space. Belkin et al. (2019) suggested a sequential training procedure to gradually increase the model size, with the final weight of smaller models serving as the pretrained initialization for the larger models. We followed a similar sequential training procedure and considered NNs with different depths (L = 4, 5, 6) and widths (W = 12, 24, …, 600). For example, when training NNs with L = 4, the learned weights of an NN with L = 4 and W = 12 were used as the initialization for an NN with L = 4 and W = 24. The remaining weights were initialized by default in PyTorch, which in our case was the Kaiming uniform initialization (He et al. 2015). For deeper NNs (L = 5, 6), the same width was maintained, but NNs with one less layer were used for initialization. Our final models were NNs with L = 6, but NNs with L = 5 were considered for initialization of NNs with L = 6, as we found that this process stabilized the training procedure. This pretraining strategy allowed us to optimize over a larger parameter space while benefiting from the knowledge gained by smaller models in the sequential progression.
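A minimal PyTorch sketch of this width-growing warm start is given below, under our reading that the smaller network's weights are copied into the corresponding leading block of each layer while the rest keep the default initialization; the exact embedding is an assumption, not taken from the study's code.

```python
import torch
import torch.nn as nn

def make_mlp(in_dim: int, width: int, depth: int) -> nn.Sequential:
    """MLP with `depth` hidden layers of size `width`, ReLU activations, and a
    scalar rain-rate output; weights start from PyTorch's default Kaiming
    uniform initialization."""
    layers, d = [], in_dim
    for _ in range(depth):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, 1))
    return nn.Sequential(*layers)

@torch.no_grad()
def warm_start_wider(small: nn.Sequential, large: nn.Sequential) -> None:
    """Copy the trained weights of a narrower net into the leading block of
    each corresponding layer of a wider net of the same depth; the remaining
    entries keep their default initialization."""
    for s, l in zip(small, large):
        if isinstance(s, nn.Linear):
            o, i = s.weight.shape
            l.weight[:o, :i] = s.weight
            l.bias[:o] = s.bias

# Sequential width growth for a fixed depth, e.g., L = 4 hidden layers:
# prev = None
# for width in WIDTHS:          # W = 12, 24, ..., 600 as described in the text
#     net = make_mlp(163, width, depth=4)
#     if prev is not None:
#         warm_start_wider(prev, net)
#     ...                        # train `net` (see the training sketch below)
#     prev = net
```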

Every experiment with NNs was conducted in PyTorch with the Adam optimizer (Kingma and Ba 2014). The total number of epochs for every NN, regardless of size, was set to 2000, and the learning rate decayed by a factor of 0.9 every 100 epochs. To quantify the discrepancy between fitted and target values, we used the mean-square error (MSE) as the loss function. It is important to note that no explicit or implicit regularization techniques, such as dropout or imposing a penalty on weights, were utilized. More details on the training configurations for each region and rain type are listed in Table 4.

Table 4. Training configurations for NNs with six layers.
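A minimal sketch of the stated training setup (Adam, MSE loss, 2000 epochs, learning rate decayed by 0.9 every 100 epochs, no explicit regularization); the base learning rate and the use of full-batch updates are our assumptions, with the actual per-case settings summarized in Table 4.

```python
import torch
import torch.nn as nn

def train(model: nn.Module, X: torch.Tensor, y: torch.Tensor,
          epochs: int = 2000, lr: float = 1e-3) -> nn.Module:
    """Train with Adam and an MSE loss; the learning rate is multiplied by 0.9
    every 100 epochs. No dropout or weight penalty is applied."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    sched = torch.optim.lr_scheduler.StepLR(opt, step_size=100, gamma=0.9)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X).squeeze(-1), y)  # full-batch update (assumption)
        loss.backward()
        opt.step()
        sched.step()
    return model
```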

Each model was trained on a single Tesla V100 GPU, with training times influenced by both rain type and region due to varying sample sizes. The most data-intensive case, shallow convective rain in the WP region, demanded the longest training because of its largest sample size. In this case, training the smallest model (L = 4 and W = 12) took 56 min, while the largest (L = 6 and W = 600) required 65 min. Because of the sequential training procedure, the entire training time for the largest model extended to 55 h, as it required training the smaller models for initialization. Conversely, the deep convective rain type in the EP region had shorter training times, with the smallest and largest models taking 25 and 26 min, respectively. The entire training time for this case was 23 h.

d. Model interpretation

One of the popular approaches to explain variable importance in machine learning is permutation importance (Breiman 2001; Fisher et al. 2019; Sood and Craven 2022; Cheung et al. 2022; Ramos-Valle et al. 2023; Sekiyama et al. 2023). Permutation importance quantifies the importance of individual features by assessing the impact on test errors when their values are randomly shuffled while keeping the other features unchanged. To calculate the permutation importance for a specific feature (e.g., temperature at a particular height), we first shuffle values of the target feature while leaving the remaining features in place. Then, we obtain model outcomes from these permuted data and compute the increase in the loss values. A larger increase in loss values indicates that the target feature plays a more significant role in the model, as breaking the relationship between this target feature and the model outcome leads to a greater deterioration in model performance.

The original idea, proposed by Breiman (2001), aimed to explore local variable importance, i.e., the variable importance for an individual sample. However, our primary objective is to achieve transparency for the entire model; hence, we employ permutation importance to derive global variable importance (König et al. 2021; Sood and Craven 2022). The only distinction between calculating local and global variable importance lies in whether each permutation is kept distinct or identical across samples for a given target feature. While there are other popular interpretation methods such as Shapley additive explanations (SHAP; Lundberg and Lee 2017) and local interpretable model-agnostic explanations (LIME; Ribeiro et al. 2016), they mainly provide local variable importance. Since our focus is on global variable importance, permutation importance was considered the most suitable choice for this study.

To summarize our variable importance findings, we employed the following process. For each feature, we conducted 100 random permutations, ensuring the robustness of our variable importance investigations. In each permutation, we identified the top 10 most significant features and aggregated these selections across all permutations. Finally, we determined key features for our model by reporting the top 10 most frequently occurring ones within this combined set of features.
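A sketch of this global permutation-importance procedure (100 permutations per feature, top-10 selection per repeat, and frequency counting) is given below; it assumes the fitted model is exposed as a predict callable and uses the training MSE as the loss, which is our assumption beyond what is stated.

```python
import numpy as np

def permutation_importance(predict, X, y, n_repeats=100, top_k=10, seed=0):
    """Global permutation importance: in each repeat, shuffle one feature
    column at a time (the same shuffle applied to every sample), record the
    increase in MSE over the unpermuted baseline, keep that repeat's top-k
    features, and count how often each feature appears in a top-k set."""
    rng = np.random.default_rng(seed)
    baseline = np.mean((y - predict(X)) ** 2)
    n, p = X.shape
    top_counts = np.zeros(p, dtype=int)
    for _ in range(n_repeats):
        increase = np.empty(p)
        for j in range(p):
            Xp = X.copy()
            Xp[:, j] = Xp[rng.permutation(n), j]   # break only feature j
            increase[j] = np.mean((y - predict(Xp)) ** 2) - baseline
        top_counts[np.argsort(increase)[-top_k:]] += 1
    # The features with the largest counts are reported as the key features.
    return top_counts
```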

4. Results

Only data points with nonzero rain rates were used for the results presented below. We compare four statistical and machine learning methods: GLM, RF, the underparameterized NN (Under NN), and the overparameterized NN. We chose the underparameterized NN model that yielded the best test error before peaks in the test curve and the overparameterized NN model with the largest number of parameters. The best models were separately obtained for the WP and EP regions and three different rain types: stratiform, deep convective, and shallow convective.

a. Double descent

We first examine whether overparameterized NNs are well trained. To properly train overparameterized NNs, the model size should be increased beyond the interpolation threshold, where we can achieve zero training error and observe the double descent curve (Belkin et al. 2019; Fig. 1). Then, overparameterized models at the end of the double descent curve are used to see how these models perform with heavy-tailed rainfall data.

Figure 3 illustrates the training and test curves obtained from NNs with varying depths for different regions and rain types. As the model size increases, we are able to achieve nearly zero training error across all rain types. However, due to the heavy-tailed nature of the rainfall dataset, larger models were necessary to attain sufficiently low training error beyond the typical interpolation thresholds (i.e., where the sample size equals the number of model parameters, as indicated by the vertical dashed line in each panel). In particular, NNs for deep convective rain needed larger models to achieve nearly zero training error, as it is the most heavy-tailed rain type. Nakkiran et al. (2021) also pointed out that other factors such as the data distribution and training procedure may influence the effective model complexity, which is directly related to the double descent phenomenon.

Fig. 3. Training and test curves for the WP (left two columns) and EP (right two columns). In each subpanel, the training results are shown on the left and the test results are shown on the right. The ratio of the number of parameters to the number of samples is on the x axis, and the RMSE values (mm h−1) are on the y axis. The theoretical interpolation threshold is marked as a blue dashed line. Black, red, and blue curves represent NNs with 4, 5, and 6 layers, respectively.

Notably, all the test curves in Fig. 3 exhibit the double descent phenomenon. The test errors after the peaks decrease and reach minimum or near-minimum points, although the peaks in the test curves occur at different positions. This observation holds true for all regions and rain types examined in the study, indicating that the overparameterized NNs have been properly trained. A minor point to discuss is the position of the peaks in the test errors. Specifically, models with six layers often reach their peak before the interpolation threshold. In theory, with simple frameworks such as linear regression, the peaks are expected to occur at the interpolation threshold. However, a shift in the peak position has been observed in the literature, and various factors such as the data distribution, noise in the responses, or regularization may cause the shift (d’Ascoli et al. 2020; Nakkiran et al. 2021; Gamba et al. 2022). Henceforth, all overparameterized NN results presented for each region and rain type are derived from the models at the rightmost end of the x axis in Fig. 3.

b. Rain-rate distributions

Figure 4 illustrates percentile plots of estimated (train) and predicted (test) rain-rate values from the GLM, RF, and NN methods over the WP and EP. Recall that the rain rates are based on 0.5° grid means, so the values will be shifted lower compared to the native 5-km footprint resolution of the DPR. For both regions and all rain types, the distributions of estimated values from the overparameterized NNs (red dashed line) successfully align with the distribution of the DPR observations (solid black line). This result is expected given that the overparameterized NNs achieved nearly zero training error in Fig. 3. The other methods could not fully recover the training data; they overestimated occurrence at low rain rates and underestimated occurrence at large rain rates. This result is consistent with Yang et al. (2019) and Wang et al. (2021).

Fig. 4. Rain-rate percentile plots (mm h−1) for 0.5° grids in the WP and EP regions for each rain type. Values on the x axis indicate the 90%, 99%, and 99.9% percentiles from the training and test datasets for each domain and rain type.

In the test results, overparameterized NNs outperform the other methods and accurately capture the general shape of the rain-rate distributions of the test dataset (Fig. 4). While the test lines for the overparameterized NNs no longer lie directly on the observed DPR rain rates, they still hew closely to each other at both low-to-moderate percentiles (0%–90%) and tail values (90%–99.9%). There are generally better rain-rate predictions over the EP, similar to the GLM performance in Yang et al. (2019). In particular, the predicted distribution of stratiform rain rate over the EP is very well represented from overparameterized NNs. The deep convective prediction has especially high fidelity in the EP, whereas deep convective rain rates are all overpredicted in the WP.

Figure 4 also shows that all the other methods exhibit markedly different patterns in their prediction distributions compared to the true rain-rate distribution. Both the GLM and RF tend to overpredict small rain-rate values in the test data while underpredicting large values. This behavior is often seen in GCMs (e.g., Dai 2006; Fiedler et al. 2020) and indicates that these methods produce a narrower range of rain-rate predictions compared to the wider distributions of the overparameterized NNs and the actual observations. Underparameterized NNs initially show an underprediction at low rain rates but then bounce back to the observed rain-rate distribution at higher values. However, underparameterized NNs eventually generate rain rates that are too large compared to the true values. Overall, these findings verify the superiority of the overparameterized NNs in accurately predicting rain-rate distributions.

Table 5 provides quantitative comparison results between the different model rain-rate distributions using median absolute errors (MedAEs) (Longman et al. 2020; Singh and Griffiths 2021; Lucas et al. 2022; Bai and Tahmasebi 2023), defined as follows:
$$\mathrm{MedAE} = \mathrm{median}\left(|y_{1} - \hat{y}_{1}|, \ldots, |y_{n} - \hat{y}_{n}|\right),$$
where $y_i$ and $\hat{y}_i$ ($i = 1, \ldots, n$) represent the true observation and the model outcome, respectively, with a sample size of $n$. Despite the common use of RMSE and mean absolute error (MAE) (Wang et al. 2021), we found them to be sensitive to extreme values. Given the relatively large proportion of extreme values in the rainfall dataset, and considering our focus on explaining those extreme values, a few erroneous yet extreme outcomes could potentially distort these metrics. Consequently, methods that primarily produce small values might attain smaller RMSE or MAE values, which does not align with the objective of our study. In contrast, MedAE is well known for its robustness to extreme values, making it a suitable choice for summarizing the outcomes of our study. In Table 5, across all cases, overparameterized NNs consistently exhibit the smallest or nearly the smallest MedAE values, suggesting that this method produces more accurate predictions than the other candidates in general.
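For reference, the metric is a one-liner in NumPy:

```python
import numpy as np

def medae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Median absolute error; unlike RMSE or MAE, it is insensitive to a few
    extreme rain-rate errors in the tail."""
    return float(np.median(np.abs(y_true - y_pred)))
```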
Table 5. MedAE of the predicted values (mm h−1) for each model, with values in bold indicating the smallest value among all methods considered.

c. Spatial maps

We next present the geographical patterns produced by each method. The rain-rate prediction results are averaged over multiple years (2019–22) to generate a single spatial map for each method. Figures 5 and 6 display the averaged spatial maps over the WP and EP regions, respectively. While the observations indicate strong spatial heterogeneity over the domains for each rain type, RF and GLM produce very smooth prediction maps. Underparameterized NNs produce nonsmooth spatial patterns, but the high rain rates are too clustered and large. Overparameterized NNs, on the other hand, demonstrate their remarkable ability to capture the spatial heterogeneity of the DPR rain-rate observations. Although the results do not portray individual weather events but rather a 4-yr average, their pebble-like quality is much more representative of true weather events, compared to the underparameterized NN boulders and ruffled sand of the RF and GLM maps.

Fig. 5. Maps of the predicted mean 0.5° JJA rain rates (mm h−1) from 2019 to 2022 over the WP region for (a) stratiform, (b) deep convective, and (c) shallow convective rain types using results from the Over NN, Under NN, RF, and GLM. The MedAE is given in parentheses next to each model name.

Fig. 6. As in Fig. 5, but over the EP region.

In terms of the overall spatial patterns, all the methods capture the widespread rainfall of the WP warm pool (Fig. 5) and the ITCZ in the northern part of the EP domain (Fig. 6). However, the overparameterized NN produces higher fidelity in the patterns. For example, while most of the training was done over ocean grid points, the overparameterized NN provides more distinct predictions over Papua New Guinea in the southwest corner of the WP domain and is able to capture orographic features to a higher degree than the other methods, especially in the shallow convective field (Fig. 5c). This result is promising for the future use of overparameterized NNs for rainfall prediction over continents. In the EP, the overparameterized NN produces a more distinct ITCZ in the stratiform and deep convective fields (Figs. 6a,b) while also reproducing the pattern in shallow convective rainfall at the equator (Fig. 6c) despite the fact that sea surface temperatures are not used as a predictor.

d. Interpreting overparameterized neural networks

In this section, we assess the feature importance for the overparameterized NNs with permutation importance. Table 6 presents the top 10 features for the test output, categorized into five groups: humidity, temperature, zonal wind, meridional wind, and others. Each category comprises values at 40 pressure levels except for the “Others” category, which contains fields reported at only one level (i.e., longitude, latitude, and latent heat flux). Individual feature contributions to the model output might not hold significant meaning, since permuting one variable while keeping the others fixed may lead to unrealistic environmental conditions. Instead, our primary focus lies in identifying which category plays a significant role in the model output. Future work would be to analyze the individual and combined feature contributions in more detail.

Table 6. Categories of the top 10 important features from Over NN on test data for the WP and EP regions. The numbers in parentheses represent pressure levels (hPa) of selected features.

Higher numbers in Table 6 represent a larger importance. For example, six pressure levels from the humidity category at or below 750 hPa are identified as the most significant for explaining shallow convective rain rates in the EP, while the other four most important features are temperature values at low (925 and 875 hPa) and mid-to-upper (400 and 350 hPa) levels. While we used all available height levels (including the stratosphere) from MERRA-2 in our training and testing, we only report on the importance of pressure levels within the troposphere (i.e., up to 100 hPa), as almost all weather phenomena occur within this atmospheric layer.

Table 6 shows that, on average, 4.33 of the variables in the humidity category are selected as key features, meaning that humidity variables consistently account for more than four of the 10 most influential features in rain-rate prediction across the tropical Pacific. This result is consistent with our common understanding that humidity has a direct relationship with rain rate (e.g., Bretherton et al. 2004), making it a critical factor in predicting rainfall. Furthermore, for most of the rain types, humidity contributes the most variables among the key features, which is consistent with the results of Ahmed and Schumacher (2015), who indicated a strong relationship between column humidity and shallow convective, deep convective, and stratiform rain rates across the tropical oceans. The humidity category is closely followed by temperature in importance, which aligns well with the results from Yang et al. (2019). We note that most of the pressure levels highlighted for humidity and temperature are at or below 700 hPa.

The most influential categories in explaining the rainfall prediction results over the tropical Pacific in Table 6 after humidity and temperature are, in their respective order, zonal wind, meridional wind, and others. Even though wind (or wind shear) is not typically included as a parameter in GCM convective parameterizations, the small but perceptible presence of zonal and meridional winds in Table 6 suggests that the inclusion of winds has the potential to improve convective and stratiform rain predictions. A more precise quantification of these relationships is left for future studies; however, differences in the category rankings between the WP and EP and between rain types are consistent with our general understanding of large-scale environmental factors that affect rain production in the tropics.

5. Discussion

Capturing the observed climate variability in the tropical Pacific is one of the great challenges currently faced by climate modeling (Lee et al. 2022; Sobel et al. 2023). The rainfall response to sea surface temperature anomalies in this region plays a key role in driving this climate variability. In this study, we applied statistical and machine learning techniques to model the relationship between the large-scale environment and rainfall. With growing interest in the application of machine learning to climate modeling (Brenowitz and Bretherton 2018; O’Gorman and Dwyer 2018; Rasp et al. 2018), our results can help guide the development of new machine learning–derived parameterizations of rainfall.

We found that properly trained overparameterized NNs correctly explained the rain-rate distributions over the tropical Pacific for multiple rain types (stratiform, deep convective, and shallow convective), including their tail behavior. Overparameterized NNs also outperformed the other methods such as GLM and RF in both predicting the rain-rate distribution and capturing spatial patterns. The permutation importance was implemented to address the feature importance in the NN model outcomes, producing consistent results with those obtained from the GLM framework in Yang et al. (2019), who also found humidity and temperature to be the most important environmental variables. To the best of our knowledge, the benefits of overparameterized models have been neither experimentally nor theoretically verified for heavy-tailed datasets. Given our successful results, the applicability of overparameterized NNs can be expanded to various other real-world datasets.

It is notable that the training and test curves in Fig. 3 show slightly different behaviors compared to previous studies. The interpolation thresholds have shifted to the right due to the heavy-tailed nature of the dataset, as larger model complexity is needed to fully explain samples with large rain-rate values. The peaks in the test curves were often located much earlier than the interpolation thresholds. While it is partly known that factors like the data distribution and noise in the responses can influence the locations of the peaks, the majority of reasons and their impacts remain unclear and unexplored. One workaround for this issue is to apply l2 regularization to the NN weights (Nakkiran et al. 2020), who found that with an optimally chosen tuning parameter, the peak can be completely eliminated, and even nonoptimal regularization can alleviate the peak. Additionally, optimal early stopping based on test errors may also help remove the peak, though this happened only in certain cases (Nakkiran et al. 2021). Our study does not delve further into this matter, as our primary goal is to introduce overparameterized NNs to rainfall prediction and examine their prediction performance. Our findings indicate that no matter where the peaks are observed, we can attain minimum or near-minimum test errors in the overparameterized regime, and the obtained models correctly predict the rain-rate distribution. We hope that future research, both in theory and applications, will shed light on the different patterns of the double descent curve that we observed with heavy-tailed data.

In addition, the final models for each rain type and region have the potential for further improvement through fine-tuning and optimal selection of the hyperparameters. For instance, as observed in Fig. 3, deeper neural networks consistently lead to better prediction performance even when the number of parameters is similar. The choice of activation functions and the optimizer also played a crucial role in determining the model performance. However, the search for optimal hyperparameters needs to be done for each dataset individually, and it can be tedious work. Furthermore, the improvements achieved through hyperparameter tuning may not yield significant differences in the final results. We followed Nakkiran et al. (2021) in choosing the Adam optimizer over stochastic gradient descent and used an MLP structure to show that we can obtain desirable results even with simple models. While further fine-tuning and customization of the model structure could potentially lead to improvements, the current method allows us to obtain promising results without an exhaustive search for hyperparameters. This helps maintain a balance between model performance and computational efficiency, making the approach more practical and applicable to real datasets.

We further investigated data transformations such as logarithmic or square root to alleviate the skewness in the data distribution. However, these transformations presented challenges for neural networks in understanding the distribution of extreme events, as they reduced variability in the tail distribution. We also delved into alternative probability distributions like the lognormal distribution for GLM, yet none of the distributions yielded satisfactory results. Last, we performed a number of sensitivity tests to see how flexible the overparameterized NN framework is in predicting rain rates including (i) not separating by rain type, (ii) training on a combined WP/EP domain, and (iii) considering both rain and no-rain events. The results (not shown) indicate that overparameterized NNs still do better than GLM, RF, and underparameterized NNs at capturing the tail and spatial structure of the rainfall over the tropical Pacific but not as well as overparameterized NNs that are trained for separate rain types or regions. This highlights the limitations in capturing the full complexity of total rainfall over large regions. Furthermore, simultaneously modeling both rain and no-rain events proved challenging due to the dataset’s zero-inflated and heavy-tailed nature. The training procedure encountered significant instability, highlighting the complexity of this task, so specialized architectures or optimization methods may be necessary. We leave this exploration for future research.

Acknowledgments.

Aaron Funk processed the gridded radar and MERRA-2 data used in the analysis. Mikyoung Jun acknowledges support by NSF IIS-2123247 and DMS-2105847. Courtney Schumacher acknowledges support by NASA 80NSSC22K0617. Funding for this project was also provided by the TAMU College of Arts and Sciences Seed Funds.

Data availability statement.

The GPM DPR and MERRA-2 data are publicly available from the NASA GES DISC (https://disc.gsfc.nasa.gov/). The specific DPR and MERRA-2 training and test datasets used in this study have been placed on the Texas Data Repository (https://dataverse.tdl.org/). The final overparameterized models can be found in our GitHub repository (https://github.com/HojunYou/Rainfall_prediction_with_overNN).

REFERENCES

  • Ahmed, F., and C. Schumacher, 2015: Convective and stratiform components of the precipitation-moisture relationship. Geophys. Res. Lett., 42, 10 45310 462, https://doi.org/10.1002/2015GL066957.

    • Search Google Scholar
    • Export Citation
  • Bai, T., and P. Tahmasebi, 2023: Graph neural network for groundwater level forecasting. J. Hydrol., 616, 128792, https://doi.org/10.1016/j.jhydrol.2022.128792.

    • Search Google Scholar
    • Export Citation
  • Bartlett, P. L., P. M. Long, G. Lugosi, and A. Tsigler, 2020: Benign overfitting in linear regression. Proc. Natl. Acad. Sci. USA, 117, 30 06330 070, https://doi.org/10.1073/pnas.1907378117.

    • Search Google Scholar
    • Export Citation
  • Belkin, M., D. Hsu, S. Ma, and S. Mandal, 2019: Reconciling modern machine-learning practice and the classical bias–variance trade-off. Proc. Natl. Acad. Sci. USA, 116, 15 84915 854, https://doi.org/10.1073/pnas.1903070116.

    • Search Google Scholar
    • Export Citation
  • Breiman, L., 2001: Random forests. Mach. Learn., 45, 532, https://doi.org/10.1023/A:1010933404324.

  • Brenowitz, N. D., and C. S. Bretherton, 2018: Prognostic validation of a neural network unified physics parameterization. Geophys. Res. Lett., 45, 62896298, https://doi.org/10.1029/2018GL078510.

    • Search Google Scholar
    • Export Citation
  • Bretherton, C. S., M. E. Peters, and L. E. Back, 2004: Relationships between water vapor path and precipitation over the tropical oceans. J. Climate, 17, 15171528, https://doi.org/10.1175/1520-0442(2004)017<1517:RBWVPA>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Cao, Y., Z. Chen, M. Belkin, and Q. Gu, 2022: Benign overfitting in two-layer convolutional neural networks. NIPS’22: Proc. 36th Int. Conf. on Neural Information Processing Systems, New Orleans, LA, Association for Computing Machinery, 25 237–25 250, https://dl.acm.org/doi/10.5555/3600270.3602100.

  • Cheung, H. M., C.-H. Ho, and M. Chang, 2022: Hybrid neural network models for postprocessing medium-range forecasts of tropical cyclone tracks over the western North Pacific. Artif. Intell. Earth Syst., 1, e210003, https://doi.org/10.1175/AIES-D-21-0003.1.

    • Search Google Scholar
    • Export Citation
  • Dai, A., 2006: Precipitation characteristics in eighteen coupled climate models. J. Climate, 19, 46054630, https://doi.org/10.1175/JCLI3884.1.

    • Search Google Scholar
    • Export Citation
  • d’Ascoli, S., M. Refinetti, G. Biroli, and F. Krzakala, 2020: Double trouble in double descent: Bias and variance (s) in the lazy regime. Proceedings of the 37th International Conference on Machine Learning, Z. Abbas et al., Eds., Vol. 119, PMLR, 2280–2290, https://proceedings.mlr.press/v119/d-ascoli20a.html.

  • Fiedler, S., and Coauthors, 2020: Simulated tropical precipitation assessed across three major phases of the Coupled Model Intercomparison Project (CMIP). Mon. Wea. Rev., 148, 36533680, https://doi.org/10.1175/MWR-D-19-0404.1.

    • Search Google Scholar
    • Export Citation
  • Fisher, A., C. Rudin, and F. Dominici, 2019: All models are wrong, but many are useful: Learning a variable’s importance by studying an entire class of prediction models simultaneously. J. Mach. Learn. Res., 20, 181.

    • Search Google Scholar
    • Export Citation
  • Funk, A., C. Schumacher, and J. Awaka, 2013: Analysis of rain classifications over the tropics by version 7 of the TRMM PR 2A23 algorithm. J. Meteor. Soc. Japan, 91, 257272, https://doi.org/10.2151/jmsj.2013-302.

    • Search Google Scholar
    • Export Citation
  • Gamba, M., E. Englesson, M. Björkman, and H. Azizpour, 2022: Deep double descent via smooth interpolation. arXiv, 2209.10080v4, https://doi.org/10.48550/arXiv.2209.10080.

  • Gao, Y., S. Mahadevan, and Z. Song, 2023: An over-parameterized exponential regression. arXiv, 2303.16504v1, https://doi.org/10.48550/arXiv.2303.16504.

  • Geiger, M., S. Spigler, S. d’Ascoli, L. Sagun, M. Baity-Jesi, G. Biroli, and M. Wyart, 2019: Jamming transition as a paradigm to understand the loss landscape of deep neural networks. Phys. Rev., 100E, 012115, https://doi.org/10.1103/PhysRevE.100.012115.

    • Search Google Scholar
    • Export Citation
  • Hastie, T., A. Montanari, S. Rosset, and R. J. Tibshirani, 2022: Surprises in high-dimensional ridgeless least squares interpolation. Ann. Stat., 50, 949986, https://doi.org/10.1214/21-AOS2133.

    • Search Google Scholar
    • Export Citation
  • He, K., X. Zhang, S. Ren, and J. Sun, 2015: Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. Proc. IEEE Int. Conf. on Computer Vision, Santiago, Chile, Institute of Electrical and Electronics Engineers, 1026–1034, https://doi.org/10.1109/ICCV.2015.123.

  • Hochreiter, S., and J. Schmidhuber, 1997: Long short-term memory. Neural Comput., 9, 17351780, https://doi.org/10.1162/neco.1997.9.8.1735.

    • Search Google Scholar
    • Export Citation
  • Houze, R. A., Jr., 1997: Stratiform precipitation in regions of convection: A meteorological paradox? Bull. Amer. Meteor. Soc., 78, 21792196, https://doi.org/10.1175/1520-0477(1997)078<2179:SPIROC>2.0.CO;2.

  • Iguchi, T., and R. Meneghini, 2021: GPM DPR precipitation profile L2A 1.5 hours 5 km v07. Goddard Earth Sciences Data and Information Services Center (GES DISC), accessed 15 February 2022, https://doi.org/10.5067/GPM/DPR/GPM/2A/07.

  • Kingma, D. P., and J. Ba, 2014: Adam: A method for stochastic optimization. arXiv, 1412.6980v9, https://doi.org/10.48550/arXiv.1412.6980.

  • König, G., C. Molnar, B. Bischl, and M. Grosse-Wentrup, 2021: Relative feature importance. 2020 25th Int. Conf. on Pattern Recognition (ICPR), Milan, Italy, Institute of Electrical and Electronics Engineers, 9318–9325, https://doi.org/10.1109/ICPR48806.2021.9413090.

  • LeCun, Y., Y. Bengio, and G. Hinton, 2015: Deep learning. Nature, 521, 436–444, https://doi.org/10.1038/nature14539.

  • Lee, S., M. L’Heureux, A. T. Wittenberg, R. Seager, P. A. O’Gorman, and N. C. Johnson, 2022: On the future zonal contrasts of equatorial Pacific climate: Perspectives from observations, simulations, and theories. npj Climate Atmos. Sci., 5, 82, https://doi.org/10.1038/s41612-022-00301-2.

  • Liu, C., L. Zhu, and M. Belkin, 2022: Loss landscapes and optimization in over-parameterized non-linear systems and neural networks. Appl. Comput. Harmonic Anal., 59, 85–116, https://doi.org/10.1016/j.acha.2021.12.009.

  • Longman, R. J., A. J. Newman, T. W. Giambelluca, and M. Lucas, 2020: Characterizing the uncertainty and assessing the value of gap-filled daily rainfall data in Hawaii. J. Appl. Meteor. Climatol., 59, 1261–1276, https://doi.org/10.1175/JAMC-D-20-0007.1.

  • Lucas, M. P., R. J. Longman, T. W. Giambelluca, A. G. Frazier, J. Mclean, S. B. Cleveland, Y.-F. Huang, and J. Lee, 2022: Optimizing automated kriging to improve spatial interpolation of monthly rainfall over complex terrain. J. Hydrometeor., 23, 561–572, https://doi.org/10.1175/JHM-D-21-0171.1.

  • Lundberg, S. M., and S.-I. Lee, 2017: A unified approach to interpreting model predictions. NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, Curran Associates Inc., 4768–4777, https://dl.acm.org/doi/10.5555/3295222.3295230.

  • McCullagh, P., and J. A. Nelder, 1989: Generalized Linear Models. 2nd ed. Chapman and Hall, 526 pp.

  • Muthukumar, V., A. Narang, V. Subramanian, M. Belkin, D. Hsu, and A. Sahai, 2021: Classification vs regression in overparameterized regimes: Does the loss function matter? J. Mach. Learn. Res., 22, 10 104–10 172.

  • Nakkiran, P., P. Venkat, S. Kakade, and T. Ma, 2020: Optimal regularization can mitigate double descent. arXiv, 2003.01897v2, https://doi.org/10.48550/arXiv.2003.01897.

  • Nakkiran, P., G. Kaplun, Y. Bansal, T. Yang, B. Barak, and I. Sutskever, 2021: Deep double descent: Where bigger models and more data hurt. J. Stat. Mech., 2021, 124003, https://doi.org/10.1088/1742-5468/ac3a74.

  • O’Gorman, P. A., and J. G. Dwyer, 2018: Using machine learning to parameterize moist convection: Potential for modeling of climate, climate change, and extreme events. J. Adv. Model. Earth Syst., 10, 2548–2563, https://doi.org/10.1029/2018MS001351.

  • Oueslati, B., and G. Bellon, 2015: The double ITCZ bias in CMIP5 models: Interaction between SST, large-scale circulation and precipitation. Climate Dyn., 44, 585–607, https://doi.org/10.1007/s00382-015-2468-6.

  • Ramos-Valle, A. N., J. Alland, and A. Bukvic, 2023: Using machine learning to understand relocation drivers of urban coastal populations in response to flooding. Artif. Intell. Earth Syst., 2, e220054, https://doi.org/10.1175/AIES-D-22-0054.1.

  • Rasp, S., M. S. Pritchard, and P. Gentine, 2018: Deep learning to represent subgrid processes in climate models. Proc. Natl. Acad. Sci. USA, 115, 9684–9689, https://doi.org/10.1073/pnas.1810286115.

  • Ribeiro, M. T., S. Singh, and C. Guestrin, 2016: “Why should I trust you?”: Explaining the predictions of any classifier. Proc. 22nd ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, San Francisco, CA, Association for Computing Machinery, 1135–1144, https://dl.acm.org/doi/10.1145/2939672.2939778.

  • Rienecker, M. M., and Coauthors, 2011: MERRA: NASA’s Modern-Era Retrospective Analysis for Research and Applications. J. Climate, 24, 3624–3648, https://doi.org/10.1175/JCLI-D-11-00015.1.

  • Schumacher, C., and A. Funk, 2023: Assessing convective-stratiform precipitation regimes in the tropics and extratropics with the GPM satellite radar. Geophys. Res. Lett., 50, e2023GL102786, https://doi.org/10.1029/2023GL102786.

  • Sekiyama, T. T., S. Hayashi, R. Kaneko, and K.-i. Fukui, 2023: Surrogate downscaling of mesoscale wind fields using ensemble super-resolution convolutional neural networks. Artif. Intell. Earth Syst., 2, e230007, https://doi.org/10.1175/AIES-D-23-0007.1.

  • Simonyan, K., and A. Zisserman, 2014: Very deep convolutional networks for large-scale image recognition. arXiv, 1409.1556v6, https://doi.org/10.48550/arXiv.1409.1556.

  • Singh, S. K., and G. A. Griffiths, 2021: Prediction of streamflow recession curves in gauged and ungauged basins. Water Resour. Res., 57, e2021WR030618, https://doi.org/10.1029/2021WR030618.

  • Sobel, A. H., and Coauthors, 2023: Near-term tropical cyclone risk and coupled Earth system model biases. Proc. Natl. Acad. Sci. USA, 120, e2209631120, https://doi.org/10.1073/pnas.2209631120.

  • Sood, A., and M. Craven, 2022: Feature importance explanations for temporal black-box models. Proc. Conf. AAAI Artif. Intell., 36, 8351–8360, https://doi.org/10.1609/aaai.v36i8.20810.

  • Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, 2017: Attention is all you need. Advances in Neural Information Processing Systems, Curran Associates, Inc., 5998–6008, https://www.bibsonomy.org/bibtex/c9bf08cbcb15680c807e12a01dd8c929.

  • Wang, J., R. K. W. Wong, M. Jun, C. Schumacher, R. Saravanan, and C. Sun, 2021: Statistical and machine learning methods applied to the prediction of different tropical rainfall types. Environ. Res. Commun., 3, 111001, https://doi.org/10.1088/2515-7620/ac371f.

  • Xu, W., R. T. Chen, X. Li, and D. Duvenaud, 2022: Infinitely deep Bayesian neural networks with stochastic differential equations. Proceedings of the 25th International Conference on Artificial Intelligence and Statistics, G. Camps-Valls, F. J. R. Ruiz, and I. Valera, Eds., Vol. 151, PMLR, 721–738, https://proceedings.mlr.press/v151/xu22a/xu22a.pdf.

  • Yang, J., M. Jun, C. Schumacher, and R. Saravanan, 2019: Predictive statistical representations of observed and simulated rainfall using generalized linear models. J. Climate, 32, 3409–3427, https://doi.org/10.1175/JCLI-D-18-0527.1.

  • Fig. 1.

    Illustration of the double descent phenomenon. The training curve is represented by the black line, while the test curve is depicted in blue. The point on the x axis where the training error theoretically reaches zero is the “interpolation threshold.” A minimal synthetic sketch of this behavior is given after the figure list.

  • Fig. 2.

    The average GPM DPR rain rate (mm day−1) during JJA 2015–22 over the tropical Pacific. The data domains for this study were separated into the WP (blue) and EP (red) regions.

  • Fig. 3.

    Training and test curves for the WP (left two columns) and EP (right two columns). In each subpanel, the training results are shown on the left and the test results are shown on the right. The ratio of the number of parameters to the number of samples is on the x axis, and the RMSE values (mm h−1) are on the y axis. The theoretical interpolation threshold is marked as a blue dashed line. Black, red, and blue curves represent NNs with 4, 5, and 6 layers, respectively.

  • Fig. 4.

    Rain-rate percentile plots (mm h−1) for 0.5° grids in the WP and EP regions for each rain type. Values on the x axis indicate the 90th, 99th, and 99.9th percentiles from the training and test datasets for each domain and rain type.

  • Fig. 5.

    Maps of the predicted mean 0.5° JJA rain rates (mm h−1) from 2019 to 2022 over the WP region for (a) stratiform, (b) deep convective, and (c) shallow convective rain types using results from the Over NN, Under NN, RF, and GLM. The median absolute error is given in parentheses next to each model name.

  • Fig. 6.

    As in Fig. 5, but over the EP region.
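To complement the schematic in Fig. 1, the short Python sketch below sweeps the width of a simple random-feature model across the interpolation threshold and prints training and test RMSE, which typically trace out a double descent curve. This is a minimal illustration under stated assumptions, not the networks, features, or rain-rate data used in this study: the data are synthetic, the model is a fixed random ReLU layer with a minimum-norm least-squares readout, and all sizes and names (w_true, Phi_tr, and so on) are arbitrary choices made here for illustration.

```python
import numpy as np

# Minimal synthetic sketch of double descent (cf. Fig. 1). This is NOT the
# study's overparameterized neural network or its rain-rate data: the inputs,
# targets, and model widths below are arbitrary illustrative choices.
rng = np.random.default_rng(0)

n_train, n_test, d = 200, 2000, 10
w_true = rng.standard_normal(d)                   # hypothetical "true" signal direction
X_tr = rng.standard_normal((n_train, d))
X_te = rng.standard_normal((n_test, d))
y_tr = np.sin(X_tr @ w_true) + 0.1 * rng.standard_normal(n_train)
y_te = np.sin(X_te @ w_true) + 0.1 * rng.standard_normal(n_test)

def rmse(pred, truth):
    return float(np.sqrt(np.mean((pred - truth) ** 2)))

# Sweep the number of random ReLU features p across the interpolation threshold p = n_train.
for p in (20, 50, 100, 150, 190, 200, 220, 400, 1000, 4000):
    W = rng.standard_normal((d, p)) / np.sqrt(d)  # fixed random first layer
    Phi_tr = np.maximum(X_tr @ W, 0.0)            # ReLU random features
    Phi_te = np.maximum(X_te @ W, 0.0)
    # lstsq returns the minimum-norm solution, i.e., "ridgeless" interpolation once p > n_train.
    beta, *_ = np.linalg.lstsq(Phi_tr, y_tr, rcond=None)
    print(f"p/n = {p / n_train:6.2f}   train RMSE = {rmse(Phi_tr @ beta, y_tr):.3f}"
          f"   test RMSE = {rmse(Phi_te @ beta, y_te):.3f}")
```

In such a sweep the training RMSE is expected to drop to essentially zero beyond p/n = 1, while the test RMSE typically peaks near that threshold and decreases again in the heavily overparameterized regime, mirroring the test curve sketched in Fig. 1.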
