Application of Deep Learning to Understanding ENSO Dynamics

Na-Yeon Shin, Division of Environmental Science and Engineering, Pohang University of Science and Technology, Pohang, South Korea
Yoo-Geun Ham, Department of Oceanography, Chonnam National University, Gwangju, South Korea
Jeong-Hwan Kim, Department of Oceanography, Chonnam National University, Gwangju, South Korea
Minsu Cho, Department of Computer Science and Engineering, Pohang University of Science and Technology, Pohang, South Korea
Jong-Seong Kug, Division of Environmental Science and Engineering, Pohang University of Science and Technology, Pohang, South Korea

Abstract

Many deep learning technologies have been applied to the Earth sciences. Nonetheless, the difficulty of interpreting deep learning results still prevents their application to studies of climate dynamics. Here, we applied a convolutional neural network to understand El Niño–Southern Oscillation (ENSO) dynamics from long-term climate model simulations. The deep learning algorithm successfully predicted ENSO events with a high correlation skill (∼0.82) at a 9-month lead. To interpret the deep learning results beyond the prediction itself, we present a “contribution map,” which estimates how much each grid box and variable contributes to the output, and a “contribution sensitivity,” which estimates how much the output variable changes in response to small perturbations of the input variables. Both are calculated by modifying the input variables fed to the pretrained deep learning network, in a manner quite similar to occlusion sensitivity. Based on the two methods, we identified three precursors of ENSO and investigated the physical processes linking them to El Niño and La Niña development. In particular, it is suggested here that the roles of the precursors are asymmetric between El Niño and La Niña. Our results suggest that the contribution map and sensitivity, though simple approaches, can be powerful tools for understanding ENSO dynamics and might also be applied to other climate phenomena.

© 2022 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Jong-Seong Kug, jskug@postech.ac.kr


1. Introduction

There have long been continuous efforts to understand weather/climate phenomena. Classically, weather/climate phenomena have been analyzed with linear methods such as simple (or multiple) linear regression, correlation, or empirical orthogonal functions. However, given their intrinsic nonlinear characteristics (Burgers and Stephenson 1999; Kang and Kug 2002; An and Jin 2004; An et al. 2005; Liu and Alexander 2007; Stan et al. 2017; Jiménez‐Esteve and Domeisen 2019; Domeisen et al. 2019), understanding weather/climate phenomena with linear methods has inevitable limitations. Thus, nonlinear methods have been introduced to climate studies (Grieger and Latif 1994; Hsieh and Tang 1998; Monahan 2001; Gámez et al. 2004; Ross et al. 2008; Mukhin et al. 2015) to overcome such limitations.

In the early days, feed-forward neural networks were widely used to account for the nonlinearity of climate phenomena (Hsieh and Tang 1998); these have recently developed into various deep learning algorithms (Reichstein et al. 2019). In this regard, deep learning has exhibited superior performance in detecting weather features such as hurricanes, clouds, and weather fronts (Liu et al. 2016; Racah et al. 2016; Xie et al. 2016; Biard and Kunkel 2019; Prabhat et al. 2021). It has also been applied to weather forecasts and climate variability predictions (Shi et al. 2015; Ham et al. 2019; Ise and Oba 2019; Herman and Schumacher 2018; Chattopadhyay et al. 2020b,c; Rasp and Lerch 2018; Nooteboom et al. 2018; Dueben and Bauer 2018; Toms et al. 2021; Ham et al. 2021; Kim et al. 2022). Rodrigues et al. (2018) used a superresolution method to convert low-resolution data into high-resolution data. Other studies improved the parameterization of convection and clouds using deep learning (Schneider et al. 2017; Gentine et al. 2018; Brenowitz and Bretherton 2018; Rasp et al. 2018; O’Gorman and Dwyer 2018). Furthermore, Scher (2018) attempted to emulate the complete physics and dynamics of a general circulation model (GCM) and to add information to or extract features from the GCM through deep learning. These studies were based on convolutional neural networks (CNNs) (Lecun et al. 1998; LeCun et al. 2015; Goodfellow et al. 2016), which are suited to pattern recognition and image processing, or on recurrent neural networks (RNNs) and long short-term memory networks, which consider the temporal evolution of target phenomena (Elman 1990; Hochreiter and Schmidhuber 1997; Kim et al. 2021). Datasets autoclustered by unsupervised learning have also been used for training (Chattopadhyay et al. 2020a) to avoid the inconvenience of manually labeling datasets.

Although deep learning has successfully predicted, detected, and processed various meteorological/climate phenomena, an unsolved problem in applying deep learning to meteorology and other fields is the limited objective interpretability of its results. In this light, various efforts have been made to interpret deep learning results objectively. For instance, Simonyan et al. (2013) developed a saliency map that finds the maximum change in the classification score against the minimum change in the input pixels; in other words, the saliency map uses gradients to indicate which parts of the input image the output class is sensitive to. Meanwhile, Zeiler and Fergus (2013) introduced a deconvolutional network (deconvnet) method: they ran a convolutional network (convnet) in the opposite direction to visualize feature maps. Concretely, it reconstructed a feature map with the same dimension as that of a previous layer (upsampling) using max-unpooling, with the locations of the local maxima recorded during the convnet pass. Zhou et al. (2016) visualized the output variable through global average pooling without a fully connected layer using a class activation map (CAM). Ham et al. (2019) produced a heat map while maintaining spatial information instead of flattening the last convolutional layer as in CAM. McGovern et al. (2019) applied the previously introduced deep learning techniques for interpretation and visualization to meteorology, describing the characteristics of each technique to make the black box more transparent. Toms et al. (2020) applied layerwise relevance propagation to geoscience for physical interpretation.

However, despite various attempts at the physical interpretation of deep learning results, their dynamical interpretation is still challenging. Therefore, we present an easy and simple method to interpret deep learning results for understanding climate phenomena, adopting the concept of occlusion (Zeiler and Fergus 2013). El Niño–Southern Oscillation (ENSO) is the most dominant climate phenomenon, with a distinguishable impact on a global scale, and it is relatively well understood despite considerable complexity and nonlinearity (Bjerknes 1969; Wyrtki 1975; Rasmusson and Carpenter 1982; Jin 1997a,b; Kang and Kug 2002; Kug et al. 2009; Timmermann et al. 2018; Cai et al. 2019; Shin et al. 2021). Moreover, ENSO is among the best-simulated climate phenomena in climate models, enabling the extension of the dataset size through GCM simulations, as in previous studies (Yu and Mechoso 2001; Kim et al. 2008; Russon et al. 2014; Wittenberg 2009; Kug et al. 2010a; Ham and Kug 2012). This is useful for compensating for the lack of observational datasets. In this study, we show an example of how to interpret deep learning results on ENSO dynamics using a long-term model simulation.

2. Data and method

a. Data

Increasing the sample size is one of the most critical issues in applying deep learning to climate studies. An insufficient sample size for training leads to overfitting, which degrades the performance of deep learning. Thus, we used long-term climate simulation data of more than 1000 years to prevent overfitting. The monthly output from the Geophysical Fluid Dynamics Laboratory Coupled Model 2.1 (GFDL-CM2.1) (Gnanadesikan et al. 2006; Delworth et al. 2006; Wittenberg et al. 2006; Lim et al. 2019a,b) control experiment was used for the training and test sets of our deep learning model. Here, we refer to the deep learning model as the DL network and to GFDL-CM2.1 as the model, to avoid terminological confusion between the climate model and the deep learning model. In particular, this model has been used in many previous studies, as it is one of the models that simulate ENSO dynamics well (Kug et al. 2010b; Atwood et al. 2017).

b. CNN

CNN, one of the most popular neural networks for deep learning, has proven useful for analyzing images (LeCun and Bengio 1995). It can learn to extract effective features and classify an image into target classes from a training dataset of images. The key advantage of CNN is the preservation of the spatial structure in intermediate layers while processing the image. For each convolutional layer of CNN, the output feature map retains the same spatial arrangement as that of the input when it uses zero padding. The convolutional layer applies learned convolution kernels at each position on the input feature map to produce more effective and distinct features as an output. Along with the convolutional layers, the spatial dimension of feature maps is often reduced by subsampling or (max-) pooling operations to derive a classification or regression value as a final output at the end of the network. In our work, as the climate data have similar spatial characteristics to those of the images, we used CNN for regression as our DL network.
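The spatial-structure-preserving property described above can be illustrated with a minimal numpy sketch (this is an illustration of the general principle, not the paper's code): with zero padding and a stride of 1, a convolution's output feature map has the same spatial shape as its input. The 24 × 144 grid size below is an assumption based on the 2.5° input grid described in section 2c.

```python
import numpy as np

def conv2d_same(x, kernel):
    """'Same' 2D convolution: zero-pad the input so that the output keeps
    the input's spatial shape (stride 1). A toy illustration of why CNN
    feature maps preserve spatial arrangement."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, kh - 1 - ph), (pw, kw - 1 - pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * kernel)
    return out

field = np.random.randn(24, 144)            # e.g., 30°S-30°N at 2.5° x all longitudes
feat = conv2d_same(field, np.ones((3, 3)) / 9.0)
print(feat.shape)                           # (24, 144): spatial arrangement preserved
```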

c. Architecture of the DL network

We adopted a CNN architecture similar to that of Ham et al. (2019) as the DL network for ENSO prediction. Figure 1 illustrates the architecture of the network. It consists of three convolutional layers, a fully connected layer, and one output layer. Each convolutional layer has 30 activation maps of 12 × 72 dimension, obtained with a 2 × 2 convolutional kernel and a stride of 2, and the fully connected layer contains 50 units. There is no max-pooling layer; the detailed reasons will be explained in the following subsection. The final output was obtained by averaging the results of 20 ensemble members of the network trained with different random weight initializations, each for 10 epochs. In the present DL network, we applied the rectified linear unit (Glorot et al. 2011) as the activation function after each of the three convolutional layers and the fully connected layer, but not the output layer. The mean-square error was chosen as the loss function, and we used the Adam optimizer (Kingma and Ba 2014).
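The 12 × 72 activation-map size follows from the standard convolution output-size formula. As a sketch (the 24 × 144 input grid is our reading of the 2.5° input described in section 2c, not an explicit statement in the text):

```python
def conv_out_size(n, kernel=2, stride=2, pad=0):
    """Standard convolution output-size formula:
    floor((n + 2*pad - kernel) / stride) + 1."""
    return (n + 2 * pad - kernel) // stride + 1

# Assumed input grid: 30°S-30°N at 2.5° -> 24 latitudes; all longitudes -> 144 points.
lat, lon = 24, 144
print(conv_out_size(lat), conv_out_size(lon))  # 12 72
```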

Fig. 1.

The architecture of our DL network. The main framework is taken from Ham et al. (2019). The input has two variables (SSTa and SSHa), and the DL network consists of three convolutional layers, a fully connected layer, and one output layer. This figure is generated by adapting the code from https://github.com/gwding/draw_convnet.

Citation: Artificial Intelligence for the Earth Systems 1, 4; 10.1175/AIES-D-21-0011.1

The input variables are the 3-month-averaged sea surface temperature anomaly (SSTa) and sea surface height anomaly (SSHa) in March–May (MAM) over 30°S–30°N and all longitudes. The target variable is the Niño-3.4 SSTa index (5°S–5°N, 170°–120°W) for December–February (DJF). The resolution of each input variable is 2.5° latitude × 2.5° longitude. The land points of SST and SSH are masked with zero. The target variable is normalized by the standard deviation (STD) of the DJF Niño-3.4 SSTa index. Input variables are also normalized, by their area-averaged STD, to match the scale of the data. The training and test periods are 800 and 200 years, respectively. Because we have 1000 years of data (years 1001–2000), we made five datasets by rotating the training and test periods. Therefore, we have 1000 years of predictions from five 200-yr test datasets.
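The split and normalization described above can be sketched as follows. This is a minimal illustration of our reading of section 2c (contiguous 200-yr test blocks, area-averaged STD normalization); the random arrays stand in for the actual GFDL-CM2.1 fields.

```python
import numpy as np

n_years = 1000

# Five train/test splits: each fold holds out a contiguous 200-yr test
# period and trains on the remaining 800 yr.
folds = []
for k in range(5):
    test_idx = np.arange(k * 200, (k + 1) * 200)
    train_idx = np.setdiff1d(np.arange(n_years), test_idx)
    folds.append((train_idx, test_idx))

# Normalization sketch: target by the STD of the DJF Nino-3.4 index,
# inputs by their area-averaged STD (arrays here are placeholders).
target = np.random.randn(n_years)
sst = np.random.randn(n_years, 24, 144)
target_norm = target / target.std()
sst_norm = sst / sst.std(axis=0).mean()   # STD per grid point, then area mean

print(len(folds), folds[0][0].size, folds[0][1].size)  # 5 800 200
```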

d. Contribution map

Our “contribution map” represents the contribution of each variable at each grid box to the output value, adopting the concept of the “occlusion sensitivity” method introduced by Zeiler and Fergus (2013). Occlusion sensitivity was originally used to demonstrate the validity of the feature maps that they had visualized for each layer using deconvnet. They occluded part of an image with a gray square and fed it into a pretrained convnet to monitor its classification. The results showed that if an important part selected by the feature map was occluded, the probability of correct classification was significantly low, whereas the predictions tended to be correct otherwise.

In our contribution map, we use zero instead of a gray square for occlusion because we deal with climate data from an anomaly perspective. Also, since ours is a regression problem rather than a classification, the effect of occlusion is judged by the difference between the original prediction and the new prediction after the occlusion. Specifically, we computed the root-mean-square difference (RMSD) over all predictions from our test datasets to generate a contribution map. The contribution map is designed to estimate how much each input variable contributes to the output variable, in a relative manner on average, discerning which variables are important for ENSO prediction. The contribution map is calculated using a pretrained DL network. First, we reproduced the ENSO prediction by modulating the input variables: after replacing the input variable at a grid box (2 × 2) with zero to eliminate its anomalous effect, we produced a new ENSO prediction using the pretrained DL network. Then, we calculated the difference from the original ENSO prediction. This zero-replacement process is repeated for every grid box and variable. The difference between the new output variable [P1 in Eq. (1)] and the original output variable [P2 in Eq. (1)] can be interpreted as the contribution of the input variable at a given grid box. We set the size of the zeroed-out grid box and its stride to match the convolution kernel (2 × 2) and stride so that the effect of each grid box is removed completely; this makes the convolutional kernel process the zeroed-out grid box without being affected by nonzero points. Therefore, the resolution of the contribution map is 5° latitude × 5° longitude. Since the kernel size affects the resolution of the contribution map, we used 2 × 2 kernels and a stride of 2 to obtain a relatively fine contribution map. Increasing the kernel size, however, does not change the main results (not shown).

To quantify the contribution of each grid box and variable, we calculated the RMSD of the 1000-yr output variable over the 20 ensembles as follows:
$$\mathrm{RMSD}(x,y)=\frac{1}{m}\sum_{j=1}^{m}\sqrt{\frac{\sum_{i=1}^{n}\left[P_{1}(x,y)_{i,j}-P_{2}(x,y)_{i,j}\right]^{2}}{n}},\qquad(1)$$
where x, y, m, and n are the longitude, the latitude, the number of ensembles (m = 20), and the number of time samples (n = 1000), respectively. This RMSD can be interpreted as the time-averaged contribution to the output variable. Relatively large values indicate that the variables at the grid box are vital for changing the output variable on average. For example, in the case of ENSO prediction, larger RMSD values suggest that these variables are important precursors of ENSO. Here, we will identify precursors of ENSO and understand the dynamical processes associated with them through the contribution map.
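The occlusion loop described above can be sketched in a few lines. This is a minimal sketch of the procedure for a single sample and variable, with a fixed linear readout standing in for the pretrained DL network (the full contribution map would further take the RMSD of these differences over all time samples and ensemble members, as in the equation above).

```python
import numpy as np

def contribution_map(predict, inputs, box=2, stride=2):
    """Occlusion-style contribution: zero out each (box x box) patch in
    turn and record the change in the scalar prediction. `predict` maps
    an input field (nlat, nlon) to a scalar."""
    nlat, nlon = inputs.shape
    cmap = np.zeros((nlat // stride, nlon // stride))
    p_orig = predict(inputs)
    for i in range(0, nlat - box + 1, stride):
        for j in range(0, nlon - box + 1, stride):
            occluded = inputs.copy()
            occluded[i:i + box, j:j + box] = 0.0   # remove the anomaly here
            cmap[i // stride, j // stride] = predict(occluded) - p_orig
    return cmap

# Toy stand-in for the pretrained DL network: a fixed linear readout.
rng = np.random.default_rng(0)
w = rng.standard_normal((24, 144))
predict = lambda x: float((w * x).sum())

diff = contribution_map(predict, rng.standard_normal((24, 144)))
print(diff.shape)  # (12, 72): the 5-degree resolution noted in the text
```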

e. Contribution sensitivity

Since the contribution map represents only simple differences, it is limited in clearly explaining why such differences occur. A large contribution can arise from two effects. First, if the magnitude of an input variable is sufficiently large, the contribution can be large when it is removed. Second, if the target output is sensitive to some input variables, even small changes in those variables lead to large changes in the target output. If we can separate these two effects within the large contributions, it will be quite beneficial for understanding the roles of the ENSO precursors. To do this, we suggest the contribution sensitivity, obtained by applying small perturbations of the input variables to the trained DL network (Zeiler and Fergus 2013; Zhou and Troyanskaya 2015; Zintgraf et al. 2017). After detecting important precursors for ENSO prediction using the contribution map, we estimate how sensitively the output variable responds to small perturbations in the input variables. As with the contribution map, we used the pretrained DL network. To measure the contribution sensitivity, we add or subtract a small perturbation from the original input values in the area of the precursor, produce the new prediction, and calculate the difference from the original output variable. In this study, the small perturbation is 0.5 STD at each grid point. When the input variables are slightly changed by adding/subtracting a small perturbation, the changes in the output variables depend on the sensitivity. Even though small perturbations can change the sign of input variables near zero, this does not affect the sensitivity estimate, because it is the sign of the perturbation itself that matters for the change in the output variable.
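The perturbation procedure can be sketched as follows. This is a minimal illustration of the idea for a single sample, not the authors' code: the region bounds and the positive-weight linear "network" are hypothetical stand-ins.

```python
import numpy as np

def contribution_sensitivity(predict, inputs, region, delta=0.5):
    """Add/subtract a fixed perturbation (0.5 STD in the paper) over a
    precursor region and record the change in the scalar prediction."""
    lat0, lat1, lon0, lon1 = region          # hypothetical index bounds
    p_orig = predict(inputs)
    out = {}
    for sign in (+1, -1):
        perturbed = inputs.copy()
        perturbed[lat0:lat1, lon0:lon1] += sign * delta
        out[sign] = predict(perturbed) - p_orig
    return out

w = np.ones((24, 144)) * 0.01                # toy positive-weight "network"
predict = lambda x: float((w * x).sum())
rng = np.random.default_rng(1)
resp = contribution_sensitivity(predict, rng.standard_normal((24, 144)),
                                region=(10, 14, 90, 114))
print(resp[1] > 0 and resp[-1] < 0)          # True for this linear toy
```

For a linear toy the two responses are exactly symmetric; the point of the diagnostic in the paper is that the trained (nonlinear) DL network need not respond symmetrically.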

3. Results

a. Skills of the DL network prediction

We calculated the correlation between the target and the output variables during the test period (5 × 200 years) to verify the skill of our DL network. As mentioned in section 2c, we separated 1000-yr data into five combinations with 800 and 200 years for the training and test periods, respectively. We calculated the correlation coefficient between the target and the output variables while changing the lead month of the input variables (Fig. 2a). For each lead month, a new DL network is trained so that they are completely independent. To obtain the statistical significance, we conducted random sampling for both the ensemble mean of DL predictions and target variables with replacement 10 000 times over the 1000 years using the bootstrap method. Evidently, the DL network successfully predicts DJF Niño-3.4 SSTa for the long-lead forecasts. In particular, the correlation skill in the perfect model framework is more than 0.8 at a 9-month lead forecast (Fig. 2b), suggesting that more than 60% of the variability of DJF Niño-3.4 SSTa can be explained by MAM precursors. Therefore, on the basis of the present DL network with predictive skill, we attempt to find which MAM precursor is important for the DJF Niño-3.4 SSTa by applying the contribution map and the sensitivity.
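The bootstrap significance test described above can be sketched as follows. This is an illustrative implementation under our reading of section 3a (resampling prediction-target pairs with replacement); the synthetic series with correlation near 0.82 stand in for the actual predictions and targets.

```python
import numpy as np

def bootstrap_corr_ci(pred, target, n_boot=10_000, alpha=0.05, seed=0):
    """Bootstrap confidence interval for the prediction-target correlation:
    resample year pairs with replacement and recompute the correlation."""
    rng = np.random.default_rng(seed)
    n = pred.size
    corrs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)
        corrs[b] = np.corrcoef(pred[idx], target[idx])[0, 1]
    return np.quantile(corrs, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(42)
target = rng.standard_normal(1000)
pred = 0.82 * target + np.sqrt(1 - 0.82**2) * rng.standard_normal(1000)
lo, hi = bootstrap_corr_ci(pred, target, n_boot=2000)
print(lo < np.corrcoef(pred, target)[0, 1] < hi)
```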

Fig. 2.

(a) Lead-month correlation skills between the DL network output of the test set and the target variables (DJF Niño-3.4 SSTa). The black solid line represents the correlation coefficients of 20 ensembles mean. The gray shaded area indicates a 95% confidence level using the bootstrap method. The red dotted lines denote the 9-month lead (using MAM season as the input) and the corresponding correlation skill. (b) Relationship between the DL network output variables (x axis) and the target variables (y axis). The scatters indicate DJF Niño-3.4 SSTa indices, and the black line is y = x.


b. Contribution map compared with the linear correlation map

Deep learning has the great advantage of learning not only linearity but also nonlinearity through an activation function. In this light, the contribution map is utilized to detect important precursors. Before applying the contribution map, we examined a linear correlation coefficient map to understand the underlying linear relations between the input and output variables (Figs. 3a,c). Generally, the correlation coefficients between MAM SSTa and DJF Niño-3.4 SSTa (Fig. 3a) are weaker than those between MAM SSHa and DJF Niño-3.4 SSTa (Fig. 3c) because SSH, which represents the subsurface temperature signal, has a longer temporal memory than SST. For the SSTa, a relatively high positive correlation appears in the North Pacific, possibly related to Pacific meridional modes (Vimont et al. 2001, 2003a,b; Alexander et al. 2010; Zhao et al. 2020). In contrast, negative correlations appear in the Atlantic and Indian Oceans, which are also consistent with observational relations (Kug et al. 2005; Ham et al. 2013; Cai et al. 2020). For SSHa, DJF Niño-3.4 SSTa has a strong positive linear relationship with the equatorial western–central Pacific, suggesting that the recharge of the equatorial heat content precedes El Niño development (Jin 1997a,b; Jin and An 1999). Some negative correlations over other basins and the subtropical Pacific are observed, but they are relatively weak (Fig. 3c). Overall, the well-known ENSO precursors clearly appear in the linear correlation map. Therefore, one may conclude that the most dominant precursors of ENSO during MAM are the off-equatorial SSTa and the western Pacific SSHa in the Pacific basin.

Fig. 3.

(left) The linear correlation maps between DJF Niño-3.4 SSTa and input variables and (right) the contribution map. (a),(b) MAM SSTa; (c),(d) MAM SSHa. The black boxes in the contribution maps indicate the regions that were selected as precursors due to their large contributions.


However, our nonlinear analysis shows somewhat different results. Figures 3b and 3d show the contribution maps representing the relative contribution of the input variables to the output variable. SSTa in the equatorial Pacific has a greater contribution than in other regions (Fig. 3b), suggesting that the equatorial Pacific MAM SSTa is important for predicting DJF Niño-3.4 SSTa, consistent with Kug et al. (2010a). SSTa in the northern tropics and the eastern Indian Ocean also makes non-negligible contributions, but their magnitudes are smaller than that in the equatorial Pacific. In addition, a relatively large contribution appears in the SSHa over the western–central Pacific (Fig. 3d). In particular, SSHa in the equatorial central Pacific and the western South Pacific has considerably large contributions. Comparing the linear correlation map with the contribution map in Figs. 3c and 3d, the SSHa contribution map matches the linear correlation well, although the equatorial SSHa signal shifts farther east in the contribution map. However, the contribution map of the SSTa (Fig. 3b) is quite different from the linear correlation map. In particular, the contribution map shows a large signal in the equatorial central Pacific, where the linear correlation is almost zero. This suggests a strong nonlinear relation between the MAM central Pacific SSTa and DJF Niño-3.4 SSTa; thus, a linear approach cannot represent the role of the equatorial central Pacific SSTa. We will deal with this issue later.

Based on the contribution map, three highlighted regions are taken as key precursors: the central Pacific SSTa (CP_SST; 5°S–5°N, 180°–100°W), the western South Pacific SSHa (WP_SSH; 5°–20°S, 150°E–180°), and the equatorial central Pacific SSHa (CP_SSH; 5°S–5°N, 180°–120°W). Although some large signals are observed in the Indian and Atlantic Oceans, we discuss only the predictors in the Pacific domain to demonstrate the usefulness of the contribution map. Hereafter, we examine these three precursors and show how the DL network results can be interpreted.
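Area-averaged indices over such boxes can be computed along the following lines. This is a hedged sketch: the latitude/longitude grid values are assumptions based on the 2.5°, 30°S–30°N input grid of section 2c, the random field is a placeholder, and western longitudes are expressed as degrees east (e.g., 120°W → 240°E).

```python
import numpy as np

# Assumed grid: 2.5 degrees, 24 latitudes (30S-30N), 144 longitudes (0-357.5E).
lats = np.arange(-28.75, 30, 2.5)
lons = np.arange(0, 360, 2.5)

def box_index(field, lat_range, lon_range):
    """Average a (time, lat, lon) anomaly field over a precursor box."""
    la = (lats >= lat_range[0]) & (lats <= lat_range[1])
    lo = (lons >= lon_range[0]) & (lons <= lon_range[1])
    return field[:, la][:, :, lo].mean(axis=(1, 2))

ssh = np.random.randn(1000, 24, 144)            # placeholder SSHa field
# CP_SSH: 5S-5N, 180-120W (= 180-240E); WP_SSH: 20S-5S, 150E-180.
cp_ssh = box_index(ssh, (-5, 5), (180, 240))
wp_ssh = box_index(ssh, (-20, -5), (150, 180))
print(cp_ssh.shape, wp_ssh.shape)               # (1000,) (1000,)
```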

c. Central Pacific SSH

The first precursor is CP_SSH, which has a moderate linear relationship with ENSO. Figure 4a shows the scatter diagram of the normalized MAM CP_SSH and the normalized DJF Niño-3.4 SSTa from the model data (blue dots). The correlation coefficient between CP_SSH and DJF Niño-3.4 SSTa is 0.36, which is statistically significant at the 99% confidence level. This indicates that positive (negative) SSHa in the equatorial central Pacific during MAM tends to precede El Niño (La Niña). This is consistent with the recharge oscillator theory, which reflects the recharge (discharge) of warm water volume in the development phase of El Niño (La Niña) (Jin 1997a,b; Meinen and McPhaden 2000).

Fig. 4.

(a) Relations between the normalized MAM CP_SSH (x axis) and normalized DJF Niño-3.4 SSTa (y axis). The blue and red scatter indicate DJF Niño-3.4 SSTa from the target variables and DL network output variables, respectively. (b) The effect of MAM CP_SSH from the reproduced DL network output variables (y axis), calculated using the difference between the original output and reproduced output variables after replacing CP_SSH with zero. The x axis is as in (a). All values are unitless because of normalization.


The above relationship is also captured by the present DL network. In Fig. 4a, the output variables of the DL network (red dots) also show a linear relationship with CP_SSH. The correlation coefficient is 0.42, which is statistically significant at the 99% confidence level. However, as CP_SSH is dynamically coupled to climate anomalies over not only adjacent but also remote regions, we cannot be sure whether this linear relationship is induced solely by CP_SSH. To isolate the role of CP_SSH, we produce a new prediction with the pretrained DL network by replacing SSHa in the central Pacific (5°S–5°N, 180°–120°W) with zero. We then calculate the difference between the original output and the new output variables, which can be interpreted as the effect of CP_SSH. As shown in Fig. 4b, CP_SSH and the difference show a positive relationship. In other words, positive (negative) SSHa contributes to increasing Niño-3.4 SSTa 9 months later in the DL network, which is quite consistent with the linear relationship shown in Fig. 4a. Interestingly, the effect of positive CP_SSH is found to be more efficient than that of negative CP_SSH, suggesting an asymmetric role depending on the ENSO phase.

This asymmetry is also supported by the contribution sensitivity results. The contribution sensitivity measures how sensitively the output variable responds to small perturbations in the input variables. When adding (red) or subtracting (blue) identical small perturbations (0.5 STD) in the CP_SSH area, the magnitude of the response differs depending on the sign and magnitude of CP_SSH (Fig. 5a). That is, the change in the output variable is more sensitive when CP_SSH is positive. Figure 5b shows the average values of the differences within the ±0.2 STD range at each level of CP_SSH (0, ±1, and ±2 STD). The asymmetry in the differences in response to positive/negative CP_SSH is evident (Figs. 5a,b). For example, even if a small and equal perturbation is added or subtracted, the response to positive CP_SSH at 2 STD is about 3.8 times greater than that to negative CP_SSH at −2 STD. This suggests that CP_SSH can be a more credible precursor for El Niño than for La Niña.
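The binned averaging behind Fig. 5b can be sketched as follows. This is an illustrative implementation of "average the differences within ±0.2 STD of each precursor level," with a deterministic toy precursor and an artificially asymmetric response standing in for the real data.

```python
import numpy as np

def binned_response(precursor, diff, levels=(-2, -1, 0, 1, 2), halfwidth=0.2):
    """Average the prediction differences within +/- halfwidth STD of each
    precursor level (the Fig. 5b diagnostic)."""
    out = {}
    for lev in levels:
        mask = np.abs(precursor - lev) <= halfwidth
        out[lev] = diff[mask].mean() if mask.any() else np.nan
    return out

x = np.linspace(-3, 3, 1001)             # toy precursor values (in STD)
d = np.where(x > 0, 0.8 * x, 0.2 * x)    # toy asymmetric response
resp = binned_response(x, d)
print(resp[2] > abs(resp[-2]))           # asymmetry: positive side stronger
```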

Fig. 5.

(a) Differences between the reproduced output variables with contribution sensitivity and the original output variables for MAM CP_SSH. The contribution sensitivity results upon adding and subtracting 0.5 STD are shown in red and blue, respectively. The nonlinear fitting lines are calculated using polynomial regression of degree 3. (b) The average values at each STD level within the ±0.2 STD range. All values are unitless because of normalization.


Then, how can we interpret this asymmetric role of CP_SSH? Although more detailed analyses are needed, the stronger response to positive CP_SSH may reflect the stronger Bjerknes feedback during El Niño than during La Niña (Kang and Kug 2002; Burgers and Stephenson 1999; An and Jin 2004; Rodgers et al. 2004; McPhaden and Zhang 2009; Takahashi and Dewitte 2016; Choi et al. 2013; Cai et al. 2015; Kessler 2002; Hannachi et al. 2003; Su et al. 2010; Takahashi et al. 2011). Given the same magnitudes of the initial equatorial heat recharge and discharge, a positive initial perturbation may grow faster due to the strong Bjerknes feedback, reflecting a stronger relationship between positive CP_SSH and DJF Niño-3.4 SSTa. Although the asymmetric features of ENSO are already well known in the ENSO community (Burgers and Stephenson 1999; Kang and Kug 2002; Monahan 2001; An and Jin 2004), our contribution sensitivity analysis provides clear evidence of such nonlinear relationships, for example, the nonlinear response of DJF Niño-3.4 SSTa to spring CP_SSH.

d. Contribution maps for El Niño and La Niña

As previously discussed, a precursor may have an asymmetric relationship with DJF Niño-3.4 SSTa. Given this asymmetry, the precursors that importantly influence El Niño and La Niña may differ. However, the contribution map in Fig. 3b does not reflect the asymmetry of ENSO well because the contribution is calculated over all periods. Is there a way to separate the precursors for the El Niño and La Niña phases? To address this issue, we calculated the contribution maps for El Niño and La Niña separately (Fig. 6). We classified the target value (DJF Niño-3.4 SSTa) as El Niño when it is greater than 1 STD and as La Niña when it is less than −1 STD. In SSTa, the spatial pattern is generally similar for El Niño and La Niña, but the contribution of the equatorial central Pacific is slightly larger for the La Niña cases. However, the most pronounced differences between the El Niño and La Niña cases appear in SSHa (Figs. 6c,d). CP_SSH is stronger for El Niño and weaker for La Niña, as shown in section 3c. Conversely, WP_SSH is stronger for La Niña and weaker for El Niño, suggesting that WP_SSH is a more important precursor for La Niña development. That CP_SSH and WP_SSH are asymmetrically related to ENSO has not been well reported in previous studies. We will cover this in the following subsection.

Fig. 6.

The contribution maps calculated for (left) El Niño and (right) La Niña. (a),(b) MAM SSTa; (c),(d) MAM SSHa. The black boxes in the contribution maps are the regions denoted in Figs. 3b and 3d.

Citation: Artificial Intelligence for the Earth Systems 1, 4; 10.1175/AIES-D-21-0011.1

e. Western Pacific SSH

As with CP_SSH, the relationship between MAM WP_SSH and DJF Niño-3.4 SSTa is confirmed by the scatter diagram in Fig. 7a. The correlation coefficient between WP_SSH and DJF Niño-3.4 SSTa is 0.28 (blue dots), which is significant at the 99% confidence level but weaker than that for CP_SSH. The correlation coefficient increases (0.34) when only the negative WP_SSH is considered but weakens (0.11) for the positive WP_SSH. In addition to this asymmetric relation, WP_SSH has a strongly negatively skewed distribution (skewness = −0.41). Quantitatively, there are 97 negative WP_SSH cases (less than −1.5 STD), more than double the 42 positive cases (greater than 1.5 STD). Note that WP_SSH develops mainly in response to preceding El Niño and La Niña signals. Because the amplitude of El Niño is stronger than that of La Niña in the model simulation and the atmospheric response is asymmetric (Choi et al. 2013; Takahashi and Dewitte 2016; Zheng et al. 2014), strong negative WP_SSH anomalies are larger and more frequent, which leads to the negative skewness.
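
The skewness and tail-count diagnostics used here are standard; a minimal numpy illustration on a synthetic left-skewed sample (the negated gamma distribution is arbitrary, chosen only to produce negative skew and an asymmetric tail count):

```python
import numpy as np

def skewness(x):
    # sample skewness: the third standardized moment
    z = (x - x.mean()) / x.std()
    return float((z ** 3).mean())

def tail_counts(x, thresh=1.5):
    # number of cases below -thresh STD and above +thresh STD
    z = (x - x.mean()) / x.std()
    return int((z < -thresh).sum()), int((z > thresh).sum())

# toy left-skewed sample: a negated gamma distribution
rng = np.random.default_rng(1)
sample = -rng.gamma(shape=2.0, scale=1.0, size=5000)
neg, pos = tail_counts(sample)
```

For a negatively skewed variable such as WP_SSH, the count of strong negative cases exceeds the count of strong positive ones, exactly as the 97-versus-42 comparison above.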

Fig. 7.

As in Fig. 4, but for MAM WP_SSH. The black dotted line indicates |1.5 STD| in (a).


The DL network also captures the relationship between WP_SSH and the output variables. The correlation is 0.33 (red dots), which is significant at the 99% confidence level (Fig. 7a). Although the overall relation looks weaker than that of CP_SSH, this might be due to the mixed effects of other factors. To isolate the WP_SSH effect, we produced new predictions with the DL network after WP_SSH (5°–20°S, 150°E–180°) was set to zero. Remarkably, WP_SSH has a robust linear relation with the differences in the output variables in Fig. 7b, and the relation is quite symmetric between the positive and negative phases. We also calculated the contribution sensitivity of WP_SSH, which likewise shows a clear linear and symmetric response to WP_SSH changes (Fig. 8). These results suggest that the positive (negative) WP_SSH is linearly related to the increase (decrease) in Niño-3.4 SSTa 9 months later.
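
Zeroing out a precursor box and differencing the predictions is essentially occlusion sensitivity; a schematic numpy version (with a trivial stand-in for the trained network and a hypothetical index box) is:

```python
import numpy as np

def occlusion_effect(predict, field, region):
    # Zero out the given (lat, lon) region and compare predictions:
    # a large difference means the region contributes strongly.
    occluded = field.copy()
    occluded[region] = 0.0
    return predict(field) - predict(occluded)

# toy "network": an area average of the input field
predict = lambda x: float(x.mean())
field = np.ones((10, 20))
box = (slice(0, 5), slice(0, 10))   # hypothetical precursor box (50 of 200 cells)
effect = occlusion_effect(predict, field, box)
```

Here the occluded field averages 0.75 instead of 1.0, so the effect is 0.25; in the actual analysis, `predict` is the pretrained DL network and the box covers a precursor region such as WP_SSH.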

Fig. 8.

As in Fig. 5, but for MAM WP_SSH.


Interestingly, compared with CP_SSH (Fig. 4b), WP_SSH contributes more strongly to La Niña development than to El Niño development. This relative importance is also clearly shown in Fig. 6d. The stronger contribution can be explained by the negative skewness of WP_SSH: although the response to a given WP_SSH is symmetric (Fig. 8), the larger magnitudes of negative WP_SSH lead to stronger contributions to La Niña development than those of CP_SSH (Figs. 6c,d). In this way, we can capture the asymmetric character of WP_SSH for the DJF Niño-3.4 SSTa index by utilizing the contribution map and sensitivity together.

f. Central Pacific SST

Last, the most interesting precursor, CP_SST, is analyzed. As noted earlier, CP_SST contributes strongly to DJF Niño-3.4 SSTa even though its linear relationship is not clear in Fig. 3a (corr = 0.04, blue dots). As shown in Fig. 9a, the relationship between MAM CP_SST and DJF Niño-3.4 SSTa is considerably nonlinear. For example, CP_SST shows a somewhat positive relationship with DJF Niño-3.4 SSTa when it is negative or weakly positive. However, when the positive CP_SST becomes stronger, the DJF Niño-3.4 SSTa tends to be strongly negative, suggesting a nonlinear relationship. The output variables also clearly show this nonlinear relationship (corr = 0.04, red dots). In this regard, we used the pretrained DL network by changing CP_SST to understand its actual role. First, we calculated the sole effect of CP_SST after SSTa in the central Pacific region (5°S–5°N, 180°–100°W) was set to zero. Surprisingly, as shown in Fig. 9b, the nonlinear relation disappears, and a quite linear relationship is observed. The positive linear relationship indicates that removing positive SSTa from the input decreases the DJF Niño-3.4 SSTa relative to the original prediction. That is, positive CP_SST is related to increasing DJF Niño-3.4 SSTa and vice versa. This suggests that the role of CP_SST is linear but that other factors make it look nonlinear.
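
A nonlinear scatter of this kind can be mimicked with synthetic data: when the target depends on both a linear and a cubic term of the precursor, the linear correlation can nearly vanish even though a degree-3 polynomial fit (as used for the fitting lines in Fig. 9) recovers clear structure. A toy sketch, with coefficients invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(5)
cp_sst = rng.standard_normal(300)
# toy relation: positive for weak CP_SST but negative when CP_SST is
# strong, so the linear correlation is close to zero by construction
nino34 = 1.5 * cp_sst - 0.5 * cp_sst ** 3 + 0.3 * rng.standard_normal(300)

coeffs = np.polyfit(cp_sst, nino34, 3)   # degree-3 (cubic) fit
fitted = np.polyval(coeffs, cp_sst)
```

The fitted cubic coefficient is negative and the linear one positive, reproducing the "positive for weak, negative for strong" shape of the scatter.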

Fig. 9.

As in Fig. 4, but for MAM CP_SST. The nonlinear fitting lines are calculated using polynomial regression of degree n = 3. The red and blue boxes in (a) indicate weak and strong cases, respectively. A weak case is defined as having DJF Niño-3.4 SSTa exceeding 1 STD and CP_SST between 0 and 1.5 STD; a strong case is defined as having DJF Niño-3.4 SSTa less than −1 STD and CP_SST greater than 1.5 STD.


To further examine what causes this nonlinear relation, we classified two distinctly different groups [weak positive CP_SST (0–1.5 STD) with positive DJF Niño-3.4 SSTa (>1 STD) versus strong positive CP_SST (>1.5 STD) with negative DJF Niño-3.4 SSTa (<−1 STD)]. Figure 10 shows the SSTa and SSHa patterns during MAM and their evolution up to the subsequent DJF season. For the weak case, weak warm SSTa is located over the central Pacific (Fig. 10a). In the strong case, positive SSTa is located in the central Pacific, but much stronger anomalies exist in the eastern Pacific (Fig. 10b). The difference between the two cases is more distinct in the SSHa: in the weak case, SSHa is positive overall in the equatorial Pacific, whereas in the strong case the SSHa shows a distinctive zonal contrast, similar to the pattern during the El Niño mature phase.

Fig. 10.

(top) The shadings and lines indicate the composite of MAM SSHa and SSTa, respectively, for the (a) weak and (b) strong cases. The shadings and contours indicate a 95% confidence level. (middle) The evolution of SSTa from D(−1)JF(0) to FMA(1) for the (c) weak and (d) strong cases. (bottom) The evolution of SSHa for the (e) weak and (f) strong cases. The black solid lines denote the MAM. The shadings indicate a 95% confidence level. The unit of SSTa is °C, and that of SSHa is m.


The difference between the two cases is clearer in the evolution of SSTa and SSHa from D(−1)JF(0) to FMA(1), shown in Figs. 10c–f. In the weak case, weak positive SSTa persists in the central–eastern Pacific from the boreal spring to the following boreal winter (Fig. 10c): El Niño develops from boreal spring and reaches its peak in boreal winter. In the strong case, however, El Niño has already developed before MAM (Fig. 10d), and it rapidly decays and transitions to La Niña because of the poleward discharge of warm water induced by the strong zonal contrast of SSHa (Jin 1997a,b). Therefore, even if CP_SST is positive in MAM, the two cases are expected to evolve into opposite phases of ENSO. On the basis of the recharge oscillator theory, warm water is recharged in MAM for the weak case (Fig. 10e) but discharged to the off-equator in MAM for the strong case (Fig. 10f). Because these two cases coexist, the relationship between CP_SST and DJF Niño-3.4 SSTa looks nonlinear.

Although we understood why the relationship looks nonlinear, we also tried to identify, on the basis of a linear analysis, whether the positive CP_SST actually plays a role in increasing the DJF Niño-3.4 SSTa, as our DL network analysis suggested in Fig. 9b. In this regard, we estimated the sole effect of CP_SST after removing the effect of the zonal contrast of SSHa. We calculated a partial regression after linearly removing the effect of the zonal contrast of SSHa [east SSHa (120°–90°W) − west SSHa (150°–170°E)] from the DJF Niño-3.4 SSTa. Figure 11a shows a significant positive correlation between CP_SST and the DJF Niño-3.4 SSTa, consistent with Fig. 9b. Additionally, the relationship between CP_SST and the zonal contrast of SSHa is investigated in Fig. 11b. The correlation coefficient between CP_SST and the zonal contrast of SSHa is distinctly high (0.74; Fig. 11b), and the linear relationship is more pronounced when CP_SST is positive. Although the dynamic roles of CP_SST and the zonal contrast of SSHa are different, the relationship between CP_SST and the DJF Niño-3.4 SSTa can be inferred to be strongly affected by the zonal contrast of SSHa; the relation thus appears nonlinear because of the high correlation between the two. To further emphasize the role of CP_SST, we investigated how ENSO develops in boreal winter when the zonal contrast of SSHa is weak (<|0.5 STD|) but CP_SST is positive (>0.5 STD) or negative (<−0.5 STD). Consistent with the previous results, warm and cold CP_SST develop into El Niño (Fig. 11c) and La Niña (Fig. 11d), respectively. This supports the argument that positive CP_SST actually contributes to increasing the Niño-3.4 SSTa 9 months later.
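
The partial-regression step — subtracting the least-squares fit on the zonal SSH contrast before correlating with CP_SST — can be illustrated with synthetic data. The coefficients below are invented solely to make the masking effect visible (they are deliberately exaggerated relative to the paper's values):

```python
import numpy as np

def remove_linear_effect(y, x):
    # residual of y after subtracting its least-squares linear fit on x
    slope, intercept = np.polyfit(x, y, 1)
    return y - (intercept + slope * x)

# synthetic setup: CP_SST and the zonal SSH contrast are highly
# correlated, and the zonal contrast dominates the raw relation with
# the DJF index, hiding the positive CP_SST effect
rng = np.random.default_rng(2)
zonal_ssh = rng.standard_normal(500)
cp_sst = 0.74 * zonal_ssh + 0.3 * rng.standard_normal(500)
nino34 = -1.0 * zonal_ssh + 0.5 * cp_sst + 0.2 * rng.standard_normal(500)

resid = remove_linear_effect(nino34, zonal_ssh)
```

Before the removal the raw correlation of `nino34` with `cp_sst` is negative; after the removal a positive correlation emerges, the same qualitative behavior as Fig. 11a.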

Fig. 11.

(a) The relations between MAM CP_SST and the values that linearly removed the effects of the zonal contrast of MAM SSHa from the DJF Niño-3.4 SSTa through partial regression. (b) The scatter diagram for MAM CP_SST and the zonal contrast of MAM SSHa. The red and blue boxes indicate positive and negative cases, respectively. A positive (negative) case is defined as having a zonal contrast of SSHa of less than |0.5 STD| and CP_SST greater (less) than 0.5 STD (−0.5 STD). The composite of the DJF SSTa for the (c) positive and (d) negative cases. The shadings indicate a 95% confidence level. The values are unitless for (a) and (b) because of normalization but are °C for (c) and (d).


In summary, CP_SST shows a nonlinear relation with the DJF Niño-3.4 SSTa, but its actual role is quite linear. Using several linear analyses, such as correlation, composites, partial regression, and physical interpretation, we can recover the actual role of CP_SST. However, our DL network-based analysis directly suggests that the effect of CP_SST is linear, as shown in Fig. 9b. This indicates that our DL network analysis can be a powerful tool for understanding physical phenomena under complex interactions.

4. Comparison with other interpretation methods

In this study, we present simple methods, based on occlusion sensitivity, for interpreting deep learning results to understand ENSO dynamics. Recently, numerous attempts have been made to interpret and visualize the inside of deep learning models (Simonyan et al. 2013; Zeiler and Fergus 2013; McGovern et al. 2019; Ham et al. 2019; Ebert-Uphoff and Hilburn 2020; Mamalakis et al. 2021, 2022). Therefore, in this section, we discuss the differences between the contribution map and other interpretation methods. Among the various methods, the saliency map (Figs. 12a–d) and activation map (Figs. 12e,f) are used for comparison.

Fig. 12.

The saliency maps of SSTa for (a) El Niño and (b) La Niña and the SSHa for (c) El Niño and (d) La Niña. The activation maps for (e) El Niño and (f) La Niña.


Figures 12a–d show the results of applying saliency maps to our pretrained DL network (Simonyan et al. 2013). A pixel highlighted in the saliency map (δy/δx) denotes where the output variable (y) is greatly affected by a small change in the input variable (x); that is, the saliency map shows how sensitive the output variable is to changes in the input variables. Thus, the concept of the saliency map matches that of our contribution sensitivity measure. In contrast, the contribution map represents the contribution of the input variables at each grid box, estimated from the difference between the original output and the reproduced output after eliminating the influence of the input variable. The common feature of these methods is that both can generate a map for each input sample from the pretrained DL network. However, the saliency map differs from the contribution map in several respects. First, the saliency map is calculated from gradient information obtained by backpropagation, whereas the contribution map is calculated as the RMSD between the output variables. Second, the saliency map can select precursors only through gradients, whereas the contribution map can detect precursors by considering both the sensitivity and the magnitude of the input variables. Because the saliency map is represented only by the gradient, its value decreases as the input and output approach a local extremum; in our methodology, we can overcome this by adding/subtracting 0.5 STD in the contribution sensitivity measure.
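
The gradient underlying a saliency map can be approximated numerically. The sketch below uses central finite differences on a toy linear "network", for which the saliency exactly recovers the weights; a real implementation would obtain δy/δx by backpropagation rather than by perturbing each grid box:

```python
import numpy as np

def saliency_map(predict, x, eps=1e-4):
    # central finite-difference estimate of d(output)/d(input) at each
    # grid box; backpropagation would give this quantity directly
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    for _ in it:
        i = it.multi_index
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        grad[i] = (predict(xp) - predict(xm)) / (2.0 * eps)
    return grad

# toy "network": a weighted sum, whose saliency is just the weights
w = np.array([[1.0, -2.0], [0.5, 0.0]])
predict = lambda x: float((w * x).sum())
sal = saliency_map(predict, np.zeros((2, 2)))
```

Note that the saliency captures only the local gradient; it carries no information about the magnitude of the input anomaly itself, which is the limitation the contribution map addresses.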

A saliency map can be produced for each input sample; thus, we averaged the maps for El Niño and La Niña, distinguished on the basis of ±1 STD of the DJF Niño-3.4 SSTa. The averaging identifies common gradient features for El Niño/La Niña, but individual saliency maps may differ because each ENSO event has slightly different characteristics. In the saliency map, a positive (negative) value implies that the output increases (decreases) when the input increases. Overall, the magnitude for SSTa is smaller than that for SSHa. The saliency map for SSTa shows patterns similar to the linear correlation map in Fig. 3a in that it fails to capture the importance of CP_SST; because CP_SST is quite diverse among individual El Niño events, the averaged gradient seems unable to capture its role. For SSHa, the influence of the central Pacific is greater in El Niño than in La Niña, consistent with the contribution sensitivity of CP_SSH. The magnitudes in the western South Pacific are comparable between El Niño and La Niña; in the contribution map, however, a distinctly stronger signal appears for La Niña because the contribution map accounts for the asymmetric behavior of the precursor itself. Furthermore, the overall pattern of the saliency map tends to be relatively noisy because of the gradient averaging.

Another method is the activation map, sometimes called the “heat map” (Ham et al. 2019). The activation map quantifies the effect of each grid point in the input variable on the output variable. Spatial information is normally lost when the feature map of the last convolutional layer is flattened; in the heat map analysis of Ham et al. (2019), however, the weights of the fully connected layer are reshaped and multiplied with the feature map of the last convolutional layer, so the contribution of each grid point to the output is obtained while spatial information is maintained. A positive (negative) value means that a grid point of the input variable contributes to a positive (negative) output variable. Positive and negative contributions are dominant in El Niño (Fig. 12e) and La Niña (Fig. 12f), respectively. Specifically, for the El Niño phase, a positive contribution is dominant in the central Pacific; for the La Niña phase, the negative contribution is large in the western Pacific with a partial positive contribution in the central Pacific. The results of the activation map appear consistent with ours. However, only regional information about the quantified contribution is represented in the heat map, so identifying which variable causes the contribution is challenging; the contribution map is more useful in this respect because it can be produced for each variable. Also, the resolution of the activation map is coarse because it has the same spatial size as the last convolutional layer, and, as with the saliency map, isolating the effect of a specific precursor is difficult.
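
The reshape-and-multiply step described above can be sketched as follows (a schematic numpy version under assumed channels-first shapes; the real method operates on the trained CNN's last convolutional layer and its fully connected weights):

```python
import numpy as np

def activation_map(feature_maps, fc_weights):
    # Reshape the fully connected weights back to (channels, h, w) and
    # take the channel-weighted sum of the last conv feature maps, so
    # each grid point keeps its spatial contribution to the output.
    c, h, w = feature_maps.shape
    w3 = fc_weights.reshape(c, h, w)
    return (w3 * feature_maps).sum(axis=0)   # (h, w) heat map

rng = np.random.default_rng(3)
feats = rng.standard_normal((4, 6, 18))      # toy coarse last-conv features
weights = rng.standard_normal(4 * 6 * 18)    # toy flattened FC weights
heat = activation_map(feats, weights)
```

Summing the heat map over all grid points recovers the flattened dot product, which is why the map decomposes the scalar output spatially; its resolution, however, is that of the last convolutional layer, hence the coarseness noted above.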

5. Summary and discussion

In this study, we suggested an easy and simple method called the contribution map, whose concept is similar to that of occlusion sensitivity (Zeiler and Fergus 2013). It was applied to detecting precursors of the output variable by estimating the contribution of each variable at each grid box. To isolate the sole effect of each precursor, we produced a new prediction from the pretrained DL network after the precursor was set to zero, allowing us to deduce the actual dynamic role of the precursor. These DL network-based methods can help us understand dynamics that are difficult to represent in simple linear analyses. In addition, the contribution sensitivity was presented to estimate how sensitive the output variable is to a small perturbation of the input variable. Using the contribution map and sensitivity, we not only confirmed well-known ENSO precursors and their roles but also found new insights into ENSO dynamics, such as the asymmetric nature of CP_SSH and WP_SSH for El Niño and La Niña and the actual role of CP_SST in ENSO, which have not been captured well by conventional linear analyses. Therefore, our study suggests that the contribution map and sensitivity provide useful information to help us understand deep learning results and, eventually, our climate system.

As mentioned earlier, we used climate model data for our analysis instead of observational data. Reliable observational or reanalysis data have been available only since 1950. Because one sample per year is used at the climate time scale, fewer than 100 samples are available; even if 80% of them are used for training, fewer than 80 samples remain. This is too small for the given time scale of ENSO (2–7 years). An insufficient sample size can induce overfitting by learning incorrect information, which can lead to a misunderstanding of our climate system and degradation of the deep learning performance. In contrast, climate model data provide sufficient samples, and the climate system can be learned correctly under the model physics. However, it should be noted that the models sometimes differ considerably from actual situations because of their systematic biases.

To compensate for this issue, transfer learning has been suggested (Bozinovski 2020). Transfer learning updates the parameters of a pretrained deep learning model by fine-tuning; it can overcome the lack of data and thus improve accuracy. Ham et al. (2019) first built a pretrained CNN using the phase 6 of the Coupled Model Intercomparison Project dataset, which has abundant data, and then fine-tuned their model with observational data. We also tested our network on observational data: Simple Ocean Data Assimilation (SODA), version 3 (Carton et al. 2018a,b, 2019), from 1980 to 2015 for SST and SSH. Interestingly, the prediction skill is high (about 0.86) even without transfer learning (Fig. 13a). Performance increases slightly after transfer learning (not shown), but there is no considerable change. This result means that the climate model captures the characteristics of ENSO well and that our DL network reflects this. Therefore, we generated a contribution map using the observational data (Figs. 13b,c). Overall, the regions with the strongest signal appear similar to those in the model, though somewhat noisier. This supports the idea that the dynamics learned and interpreted through the DL network using climate model data can also be applied to observations.
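
The fine-tuning idea can be caricatured with a linear model standing in for the CNN: start from parameters fit to abundant "model world" data and take a few small gradient steps on a short "observational" record. Everything below is a toy assumption for illustration, not the authors' setup:

```python
import numpy as np

def fine_tune(w, x_obs, y_obs, lr=0.01, steps=200):
    # a few small gradient-descent steps on the short observational
    # record, starting from the pretrained weights (MSE loss)
    w = w.copy()
    for _ in range(steps):
        grad = 2.0 * x_obs.T @ (x_obs @ w - y_obs) / len(y_obs)
        w -= lr * grad
    return w

rng = np.random.default_rng(4)
x = rng.standard_normal((36, 5))               # e.g. 36 years of predictors
w_true = np.array([1.0, -0.5, 0.2, 0.0, 0.3])  # "observed world" relation
y = x @ w_true
w_pre = w_true + 0.3 * rng.standard_normal(5)  # biased "model world" weights
w_ft = fine_tune(w_pre, x, y)
```

The small learning rate and few steps are the essence of fine-tuning: the pretrained weights are nudged toward the observations without discarding what was learned from the model world.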

Fig. 13.

(a) Relationship between the DL network output variables without transfer learning (x axis) and the target variables (y axis) for the SODA dataset. The scatter points indicate DJF Niño-3.4 SSTa indices, and the red line is y = x. (b),(c) The contribution maps for the SODA dataset. The black boxes represent the three indices (CP_SST, CP_SSH, and WP_SSH).


Although deep learning has great advantages in capturing not only linear but also nonlinear relationships, our DL network was trained with a fixed input season [MAM(0)] and target [D(0)JF(1)]; therefore, our dynamic interpretation is limited to that fixed season. Combining our approach with other methods, such as RNNs, could allow continuous time-dependent changes to be examined. Moreover, our contribution map and sensitivity can be applied to other types of deep learning models, although further studies are needed on the proper ways of combining our methodologies with them.

Additionally, we used SST and SSH, two variables closely related to ENSO, as input variables. Thus, our methods can find important precursors individually but do not specify which combination of input variables is most skillful. Accordingly, experiments to find appropriate combinations of input variables (e.g., impurity importance, permutation importance, and sequential selection) should be conducted, considering computing costs and performance (Siedlecki and Sklansky 1989; Leardi 1996; Breiman 2001; Lakshmanan et al. 2015; McGovern et al. 2019).

Despite these considerations, applying our method to climate research is useful. In understanding climate dynamics, knowing the exact relationship between precursors and target variables is important, but the actual relationship can easily be hidden by simple visualizations such as scatterplots or linear analyses. Interestingly, we revealed the actual dynamic role of a precursor that appeared to be unrelated (e.g., CP_SST), suggesting that numerous dynamics may have been inadvertently missed when only linear analysis was conducted. Therefore, our methodologies will be a good stepping stone for expanding our climate knowledge.

Acknowledgments.

This work was supported by the Institute of Information and communications Technology Planning and Evaluation (IITP) grant funded by the South Korean government (MSIT) [2019-0-01906, Artificial Intelligence Graduate School Program (POSTECH)]. J.-S. Kug is partly supported by the National Research Foundation of Korea (NRF) grant funded by the South Korean government (NRF-2022R1A3B1077622, NRF-2018R1A5A1024958).

Data availability statement.

The datasets for training and testing the deep learning network used in this study are available at https://doi.org/10.6084/m9.figshare.17194220.v1.

REFERENCES

  • Alexander, M. A., D. J. Vimont, P. Chang, and J. D. Scott, 2010: The impact of extratropical atmospheric variability on ENSO: Testing the seasonal footprinting mechanism using coupled model experiments. J. Climate, 23, 2885–2901, https://doi.org/10.1175/2010JCLI3205.1.

  • An, S.-I., and F.-F. Jin, 2004: Nonlinearity and asymmetry of ENSO. J. Climate, 17, 2399–2412, https://doi.org/10.1175/1520-0442(2004)017<2399:NAAOE>2.0.CO;2.

  • An, S.-I., Y.-G. Ham, J.-S. Kug, F.-F. Jin, and I.-S. Kang, 2005: El Niño–La Niña asymmetry in the coupled model intercomparison project simulations. J. Climate, 18, 2617–2627, https://doi.org/10.1175/JCLI3433.1.

  • Atwood, A. R., D. S. Battisti, A. T. Wittenberg, W. H. G. Roberts, and D. J. Vimont, 2017: Characterizing unforced multi-decadal variability of ENSO: A case study with the GFDL CM2.1 coupled GCM. Climate Dyn., 49, 2845–2862, https://doi.org/10.1007/s00382-016-3477-9.

  • Biard, J. C., and K. E. Kunkel, 2019: Automated detection of weather fronts using a deep learning neural network. Adv. Stat. Climatol. Meteor. Oceanogr., 5, 147–160, https://doi.org/10.5194/ascmo-5-147-2019.

  • Bjerknes, J., 1969: Atmospheric teleconnections from the equatorial Pacific. Mon. Wea. Rev., 97, 163–172, https://doi.org/10.1175/1520-0493(1969)097<0163:ATFTEP>2.3.CO;2.

  • Bozinovski, S., 2020: Reminder of the first paper on transfer learning in neural networks, 1976. Informatica, 44, 291–302, https://doi.org/10.31449/inf.v44i3.2828.

  • Breiman, L., 2001: Random forests. Mach. Learn., 45, 5–32, https://doi.org/10.1023/A:1010933404324.

  • Brenowitz, N. D., and C. S. Bretherton, 2018: Prognostic validation of a neural network unified physics parameterization. Geophys. Res. Lett., 45, 6289–6298, https://doi.org/10.1029/2018GL078510.

  • Burgers, G., and D. B. Stephenson, 1999: The “normality” of El Niño. Geophys. Res. Lett., 26, 1027–1030, https://doi.org/10.1029/1999GL900161.

  • Cai, W., and Coauthors, 2015: ENSO and greenhouse warming. Nat. Climate Change, 5, 849–859, https://doi.org/10.1038/nclimate2743.

  • Cai, W., and Coauthors, 2019: Pantropical climate interactions. Science, 363, eaav4236, https://doi.org/10.1126/science.aav4236.

  • Cai, W., and Coauthors, 2020: Climate impacts of the El Niño–Southern Oscillation on South America. Nat. Rev. Earth Environ., 1, 215–231, https://doi.org/10.1038/s43017-020-0040-3.

  • Carton, J. A., G. A. Chepurin, and L. Chen, 2018a: SODA3: A new ocean climate reanalysis. J. Climate, 31, 6967–6983, https://doi.org/10.1175/JCLI-D-18-0149.1.

  • Carton, J. A., G. A. Chepurin, L. Chen, and S. A. Grodsky, 2018b: Improved global net surface heat flux. J. Geophys. Res. Oceans, 123, 3144–3163, https://doi.org/10.1002/2017JC013137.

  • Carton, J. A., S. G. Penny, and E. Kalnay, 2019: Temperature and salinity variability in SODA3, ECCO4r3, and ORAS5 ocean reanalyses, 1993–2015. J. Climate, 32, 2277–2293, https://doi.org/10.1175/JCLI-D-18-0605.1.

  • Chattopadhyay, A., P. Hassanzadeh, and S. Pasha, 2020a: Predicting clustered weather patterns: A test case for applications of convolutional neural networks to spatio-temporal climate data. Sci. Rep., 10, 1317, https://doi.org/10.1038/s41598-020-57897-9.

  • Chattopadhyay, A., P. Hassanzadeh, and D. Subramanian, 2020b: Data-driven prediction of a multi-scale Lorenz 96 chaotic system using deep learning methods: Reservoir computing, artificial neural network, and long short-term memory network. Nonlinear Processes Geophys., 27, 373–389, https://doi.org/10.5194/npg-27-373-2020.

  • Chattopadhyay, A., E. Nabizadeh, and P. Hassanzadeh, 2020c: Analog forecasting of extreme-causing weather patterns using deep learning. J. Adv. Model. Earth Syst., 12, e2019MS001958, https://doi.org/10.1029/2019MS001958.

  • Choi, K.-Y., G. A. Vecchi, and A. T. Wittenberg, 2013: ENSO transition, duration, and amplitude asymmetries: Role of the nonlinear wind stress coupling in a conceptual model. J. Climate, 26, 9462–9476, https://doi.org/10.1175/JCLI-D-13-00045.1.

  • Delworth, T. L., and Coauthors, 2006: GFDL’s CM2 global coupled climate models. Part I: Formulation and simulation characteristics. J. Climate, 19, 643–674, https://doi.org/10.1175/JCLI3629.1.

  • Domeisen, D. I. V., C. I. Garfinkel, and A. H. Butler, 2019: The teleconnection of El Niño Southern Oscillation to the stratosphere. Rev. Geophys., 57, 5–47, https://doi.org/10.1029/2018RG000596.

  • Dueben, P. D., and P. Bauer, 2018: Challenges and design choices for global weather and climate models based on machine learning. Geosci. Model Dev., 11, 3999–4009, https://doi.org/10.5194/gmd-11-3999-2018.

  • Ebert-Uphoff, I., and K. Hilburn, 2020: Evaluation, tuning, and interpretation of neural networks for working with images in meteorological applications. Bull. Amer. Meteor. Soc., 101, E2149–E2170, https://doi.org/10.1175/BAMS-D-20-0097.1.

  • Elman, J. L., 1990: Finding structure in time. Cognit. Sci., 14, 179–211, https://doi.org/10.1207/s15516709cog1402_1.

  • Gámez, A. J., C. S. Zhou, A. Timmermann, and J. Kurths, 2004: Nonlinear dimensionality reduction in climate data. Nonlinear Processes Geophys., 11, 393–398, https://doi.org/10.5194/npg-11-393-2004.

  • Gentine, P., M. Pritchard, S. Rasp, G. Reinaudi, and G. Yacalis, 2018: Could machine learning break the convection parameterization deadlock? Geophys. Res. Lett., 45, 5742–5751, https://doi.org/10.1029/2018GL078202.

  • Glorot, X., A. Bordes, and Y. Bengio, 2011: Deep sparse rectifier neural networks. J. Mach. Learn. Res., 15, 315–323.

  • Gnanadesikan, A., and Coauthors, 2006: GFDL’s CM2 global coupled climate models. Part II: The baseline ocean simulation. J. Climate, 19, 675–697, https://doi.org/10.1175/JCLI3630.1.

  • Goodfellow, I., Y. Bengio, and A. Courville, 2016: Deep Learning. MIT Press, 800 pp.

  • Grieger, B., and M. Latif, 1994: Reconstruction of the El Niño attractor with neural networks. Climate Dyn., 10, 267–276, https://doi.org/10.1007/BF00228027.

  • Ham, Y.-G., and J.-S. Kug, 2012: How well do current climate models simulate two types of El Nino? Climate Dyn., 39, 383–398, https://doi.org/10.1007/s00382-011-1157-3.

  • Ham, Y.-G., J.-S. Kug, J.-Y. Park, and F.-F. Jin, 2013: Sea surface temperature in the north tropical Atlantic as a trigger for El Niño/Southern Oscillation events. Nat. Geosci., 6, 112–116, https://doi.org/10.1038/ngeo1686.

  • Ham, Y.-G., J.-H. Kim, and J.-J. Luo, 2019: Deep learning for multi-year ENSO forecasts. Nature, 573, 568–572, https://doi.org/10.1038/s41586-019-1559-7.

  • Ham, Y.-G., J.-H. Kim, E.-S. Kim, and K.-W. On, 2021: Unified deep learning model for El Niño/Southern Oscillation forecasts by incorporating seasonality in climate data. Sci. Bull., 66, 1358–1366, https://doi.org/10.1016/j.scib.2021.03.009.

  • Hannachi, A., D. Stephenson, and K. Sperber, 2003: Probability-based methods for quantifying nonlinearity in the ENSO. Climate Dyn., 20, 241256, https://doi.org/10.1007/s00382-002-0263-7.

  • Herman, G. R., and R. S. Schumacher, 2018: Money doesn’t grow on trees, but forecasts do: Forecasting extreme precipitation with random forests. Mon. Wea. Rev., 146, 1571–1600, https://doi.org/10.1175/MWR-D-17-0250.1.

  • Hochreiter, S., and J. Schmidhuber, 1997: Long short-term memory. Neural Comput., 9, 1735–1780, https://doi.org/10.1162/neco.1997.9.8.1735.

  • Hsieh, W. W., and B. Tang, 1998: Applying neural network models to prediction and data analysis in meteorology and oceanography. Bull. Amer. Meteor. Soc., 79, 1855–1870, https://doi.org/10.1175/1520-0477(1998)079<1855:ANNMTP>2.0.CO;2.

  • Ise, T., and Y. Oba, 2019: Forecasting climatic trends using neural networks: An experimental study using global historical data. Front. Rob. AI, 6, 32, https://doi.org/10.3389/frobt.2019.00032.

  • Jiménez-Esteve, B., and D. I. V. Domeisen, 2019: Nonlinearity in the North Pacific atmospheric response to a linear ENSO forcing. Geophys. Res. Lett., 46, 2271–2281, https://doi.org/10.1029/2018GL081226.

  • Jin, F.-F., 1997a: An equatorial ocean recharge paradigm for ENSO. Part I: Conceptual model. J. Atmos. Sci., 54, 811–829, https://doi.org/10.1175/1520-0469(1997)054<0811:AEORPF>2.0.CO;2.

  • Jin, F.-F., 1997b: An equatorial ocean recharge paradigm for ENSO. Part II: A stripped-down coupled model. J. Atmos. Sci., 54, 830–847, https://doi.org/10.1175/1520-0469(1997)054<0830:AEORPF>2.0.CO;2.

  • Jin, F.-F., and S.-I. An, 1999: Thermocline and zonal advective feedbacks within the equatorial ocean recharge oscillator model for ENSO. Geophys. Res. Lett., 26, 2989–2992, https://doi.org/10.1029/1999GL002297.

  • Kang, I.-S., and J.-S. Kug, 2002: El Niño and La Niña sea surface temperature anomalies: Asymmetry characteristics associated with their wind stress anomalies. J. Geophys. Res., 107, 4372, https://doi.org/10.1029/2001JD000393.

  • Kessler, W. S., 2002: Is ENSO a cycle or a series of events? Geophys. Res. Lett., 29, 2125, https://doi.org/10.1029/2002GL015924.

  • Kim, D., J.-S. Kug, I.-S. Kang, F.-F. Jin, and A. T. Wittenberg, 2008: Tropical Pacific impacts of convective momentum transport in the SNU coupled GCM. Climate Dyn., 31, 213–226, https://doi.org/10.1007/s00382-007-0348-4.

  • Kim, H., Y. G. Ham, Y. S. Joo, and S. W. Son, 2021: Deep learning for bias correction of MJO prediction. Nat. Commun., 12, 3087, https://doi.org/10.1038/s41467-021-23406-3.

  • Kim, J., M. Kwon, S.-D. Kim, J.-S. Kug, J.-G. Ryu, and J. Kim, 2022: Spatiotemporal neural network with attention mechanism for El Niño forecast. Sci. Rep., 12, 7204, https://doi.org/10.1038/s41598-022-10839-z.

  • Kingma, D. P., and J. Ba, 2014: Adam: A method for stochastic optimization. arXiv, 1412.6980v9, https://doi.org/10.48550/arXiv.1412.6980.

  • Kug, J.-S., S.-I. An, F.-F. Jin, and I.-S. Kang, 2005: Preconditions for El Niño and La Niña onsets and their relation to the Indian Ocean. Geophys. Res. Lett., 32, L05706, https://doi.org/10.1029/2004GL021674.

  • Kug, J.-S., F.-F. Jin, and S.-I. An, 2009: Two types of El Niño events: Cold tongue El Niño and warm pool El Niño. J. Climate, 22, 1499–1515, https://doi.org/10.1175/2008JCLI2624.1.

  • Kug, J.-S., K.-P. Sooraj, T. Li, and F.-F. Jin, 2010a: Precursors of the El Niño/La Niña onset and their interrelationship. J. Geophys. Res., 115, D05106, https://doi.org/10.1029/2009JD012861.

  • Kug, J.-S., J. Choi, S.-I. An, F.-F. Jin, and A. T. Wittenberg, 2010b: Warm pool and cold tongue El Niño events as simulated by the GFDL 2.1 coupled GCM. J. Climate, 23, 1226–1239, https://doi.org/10.1175/2009JCLI3293.1.

  • Lakshmanan, V., C. Karstens, J. Krause, K. Elmore, A. Ryzhkov, and S. Berkseth, 2015: Which polarimetric variables are important for weather/no-weather discrimination? J. Atmos. Oceanic Technol., 32, 1209–1223, https://doi.org/10.1175/JTECH-D-13-00205.1.

  • Leardi, R., 1996: Genetic algorithms in feature selection. Genetic Algorithms in Molecular Modeling, 1st ed. J. Devillers, Ed., Academic Press, 67–86, https://doi.org/10.1016/B978-012213810-2/50004-9.

  • LeCun, Y., and Y. Bengio, 1995: Convolutional networks for images, speech, and time series. The Handbook of Brain Theory and Neural Networks, M. A. Arbib, Ed., MIT Press, 255–258.

  • LeCun, Y., L. Bottou, Y. Bengio, and P. Haffner, 1998: Gradient-based learning applied to document recognition. Proc. IEEE, 86, 2278–2324, https://doi.org/10.1109/5.726791.

  • LeCun, Y., Y. Bengio, and G. Hinton, 2015: Deep learning. Nature, 521, 436–444, https://doi.org/10.1038/nature14539.

  • Lim, H.-G., J.-S. Kug, and J.-Y. Park, 2019a: Biogeophysical feedback of phytoplankton on the Arctic climate. Part I: Impact of nonlinear rectification of interactive chlorophyll variability in the present-day climate. Climate Dyn., 52, 5383–5396, https://doi.org/10.1007/s00382-018-4450-6.

  • Lim, H.-G., J.-S. Kug, and J.-Y. Park, 2019b: Biogeophysical feedback of phytoplankton on Arctic climate. Part II: Arctic warming amplified by interactive chlorophyll under greenhouse warming. Climate Dyn., 53, 3167–3180, https://doi.org/10.1007/s00382-019-04693-5.

  • Liu, Y., and Coauthors, 2016: Application of deep convolutional neural networks for detecting extreme weather in climate datasets. arXiv, 1605.01156v1, https://arxiv.org/abs/1605.01156.

  • Liu, Z., and M. Alexander, 2007: Atmospheric bridge, oceanic tunnel, and global climatic teleconnections. Rev. Geophys., 45, RG2005, https://doi.org/10.1029/2005RG000172.

  • Mamalakis, A., I. Ebert-Uphoff, and E. A. Barnes, 2021: Neural network attribution methods for problems in geoscience: A novel synthetic benchmark dataset. arXiv, 2103.10005v2, https://arxiv.org/abs/2103.10005.

  • Mamalakis, A., E. A. Barnes, and I. Ebert-Uphoff, 2022: Investigating the fidelity of explainable artificial intelligence methods for applications of convolutional neural networks in geoscience. arXiv, 2202.03407v1, https://doi.org/10.48550/arXiv.2202.03407.

  • McGovern, A., R. Lagerquist, D. J. Gagne, G. E. Jergensen, K. L. Elmore, C. R. Homeyer, and T. Smith, 2019: Making the black box more transparent: Understanding the physical implications of machine learning. Bull. Amer. Meteor. Soc., 100, 2175–2199, https://doi.org/10.1175/BAMS-D-18-0195.1.
