1. Introduction
First, the LRF estimated for a complex system is often ill-conditioned even after dimension reduction, and inverting an ill-conditioned matrix requires dividing by small numbers (its small singular values), which amplifies any noise in the response. Moreover, this problem worsens as the dimension of the matrix increases (Kabanikhin 2008). Yet capturing a substantial fraction of the system’s variability requires retaining a relatively large number of EOFs, so one must strike a delicate balance between capturing as much of the response signal as possible and keeping the solution procedure stable. The third challenge is that the linearity of Green’s function perturbation experiments is not guaranteed. Because Green’s function experiments are expensive, the forcing magnitude is usually a compromise: it must be large enough to yield an adequate signal-to-noise ratio within a limited length of model integration, yet small enough not to violate the linearity between forcing and response.
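To make the division-by-small-numbers problem concrete, the sketch below works in the singular-vector basis of a hypothetical LRF, where inversion reduces to dividing each response coefficient by its singular value. The singular values, forcing, and noise are illustrative assumptions, not values from the experiments, and `truncated_inverse` is a hypothetical helper whose cutoff `k` plays the role of the number of retained EOFs.

```python
def truncated_inverse(response_coeffs, singular_values, k):
    """Invert a linear operator expressed in its singular-vector basis.

    Each of the first k forcing coefficients is response / singular value;
    the remaining modes are set to zero (truncation), trading lost signal
    for stability against division by small singular values.
    """
    return [r / s if i < k else 0.0
            for i, (r, s) in enumerate(zip(response_coeffs, singular_values))]


# Hypothetical ill-conditioned operator: one singular value is tiny.
singular_values = [2.0, 0.5, 1e-4]
true_forcing = [1.0, 1.0, 1.0]
noise = [0.0, 0.0, 1e-3]  # small response noise projecting onto the weakest mode

# Forward model in the singular basis: r_i = s_i * f_i + noise_i.
response = [s * f + n for s, f, n in zip(singular_values, true_forcing, noise)]

condition_number = max(singular_values) / min(singular_values)
full = truncated_inverse(response, singular_values, k=3)       # keep all modes
truncated = truncated_inverse(response, singular_values, k=2)  # drop the weakest

print(f"condition number: {condition_number:.0f}")
print("full inverse:   ", [round(x, 3) for x in full])
print("truncated, k=2: ", [round(x, 3) for x in truncated])
```

Keeping all three modes turns a response noise of 0.001 into a forcing error of about 10 in the weakest mode, whereas truncation limits that error to 1 (the discarded true signal). Retaining more modes pushes the balance toward the first outcome.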
Recently, there has been growing interest in employing deep learning, specifically deep neural networks, to solve inverse problems. Compared with traditional regularization-based inversion, deep neural networks can reduce the cost of computing variances and covariance matrices and of estimating hyperparameters (Mohammad-Djafari 2021). Deep learning techniques are transforming image reconstruction and impacting applications in hydrological, geophysical, scientific, and medical imaging (Das et al. 2019; Ongie et al. 2020; Vijayakumar 2020; Wang et al. 2020). Owing to their more complex structures and stronger nonlinear mapping capabilities, deep learning models can learn the nonlinear inversion directly from data and labels, eliminating the need for inaccurate, ill-posed initial models or expensive inverse modeling. In particular, deep convolutional neural networks (CNNs) are among the most advanced approaches for extracting features from multidimensional images (Krizhevsky et al. 2017), which makes them well suited to analyzing global response images. However, few efforts have been made to apply these techniques to climate inversion problems such as the one considered here.
Here, to obtain a functional forcing–response relationship for the global pattern of surface temperature, we use a CNN to construct a supervised regression model that takes the surface temperature responses as input images and predicts the climate model forcing as the labeled output; both are obtained from a large set of q-flux Green’s function experiments (see Liu et al. 2018a,b). The resulting network learns an inverse mapping from the surface temperature images to the q-flux forcing at every representative location of the global ocean (see the q-flux perturbation array in Fig. S1 in the online supplemental material). Note that the q-flux represents the ocean heat flux into the atmosphere that drives the global climate response, including that of surface temperature.
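The sketch below spells out, in plain Python, the core operations a CNN applies to a gridded temperature-response "image" on its way to a vector of forcing amplitudes: a 2D convolution (computed as cross-correlation, as in common deep learning libraries), a ReLU, global average pooling, and a dense output layer. The grid, kernels, and weights are hypothetical toy values; the architecture actually used here (layer count, pooling, dropout) comes from the tuning procedure in section 2c and is not reproduced.

```python
def conv2d_valid(image, kernel):
    """2D cross-correlation, stride 1, no padding ('valid'), as in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Elementwise max(0, x) on a 2D feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


def global_average_pool(fmap):
    """Collapse a 2D feature map to its mean value."""
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)


def dense(features, weights, biases):
    """Map the feature vector to one output per forcing patch."""
    return [sum(w * x for w, x in zip(w_row, features)) + b
            for w_row, b in zip(weights, biases)]


# Toy 4x4 "temperature response" (K); values are hypothetical.
image = [[0.0, 1.0, 2.0, 3.0],
         [1.0, 2.0, 3.0, 4.0],
         [2.0, 3.0, 4.0, 5.0],
         [3.0, 4.0, 5.0, 6.0]]
# Two hypothetical 2x2 kernels: a diagonal-gradient detector and a local mean.
kernels = [[[-1.0, 0.0], [0.0, 1.0]],
           [[0.25, 0.25], [0.25, 0.25]]]

features = [global_average_pool(relu(conv2d_valid(image, k))) for k in kernels]
# Hypothetical dense weights mapping 2 features to 3 forcing patches.
forcing = dense(features,
                weights=[[1.0, 0.5], [0.0, 1.0], [-1.0, 0.0]],
                biases=[0.0, 0.0, 0.5])
print(features, forcing)
```

In practice a deep learning framework would learn many such kernels per layer; the explicit loops only make each operation visible.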
2. Data and methods
a. Green’s function perturbation experiments
The data used to train the CNN model for the forcing–response relationship are taken from a set of q-flux Green’s function experiments using the atmospheric component [Community Atmosphere Model, version 5 (CAM5)] of the Community Earth System Model, version 1.1 (CESM1.1), coupled to a slab ocean representing a thermodynamic ocean mixed layer, the Community Land Model, version 4 (CLM4), and a thermodynamic sea ice component. We refer to this model as CAM5-SOM in this study. The horizontal resolution of CAM5-SOM is approximately 1°, telescoped meridionally to ∼0.3° at the equator.
Prior to the perturbation runs, a long control run is conducted with preindustrial climate forcing and a seasonally repeating q-flux estimated from a long preindustrial coupled CESM1.1 simulation. The mixed layer depth is set to the annual-mean value estimated from the CESM1.1 control simulation; it varies geographically but is constant in time. For the q-flux forcing perturbation runs, we perturb CAM5-SOM with an array of 97 q-flux anomaly patches, as illustrated in Fig. S1. For each patch, the experiment pair consists of a positive and a negative forcing anomaly (peaking at 12 W m−2); each simulation branches from year 51 of the control run and is integrated for 40 years, of which only the last 20 years are used for analysis. Using the forcing pair, we define the symmetric component and the asymmetric (or quadratic nonlinear) component of the response for each patch as $(x^{+} - x^{-})/2$ and $(x^{+} + x^{-})/2$, respectively, where the superscript + (−) denotes the response to the positive (negative) forcing. The symmetric component isolates the linear aspect of the forcing–response relationship even when the response x is not entirely linear given the sizable forcing magnitude (12 W m−2), and only this linear component is investigated in the current study. Our primary objective in this first attempt to tackle the inverse problem of climate forcing is to focus on the linear component. This choice reflects the recognition that the nonlinear component in our CAM5-SOM model is considerably larger than the nonlinearity in a more realistic modeling framework with feedback from active ocean dynamics (e.g., Tseng et al. 2023). More details about the configuration of the Green’s function forcing experiments using CAM5-SOM can be found in Lu et al. (2020). We note that, lacking ocean dynamical feedback, the slab-coupled model does not capture the complete equilibrium response. However, running a comparable number of equilibrium simulations with a state-of-the-art fully coupled climate model is currently far beyond our computational capacity and would become practicable only with significant improvements in computing power.
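The pair decomposition can be written out directly. In the sketch below, the responses come from a hypothetical quadratic model $x(F) = cF + dF^2$ at three grid points (the coefficients are illustrative, not from the experiments), and `decompose_pair` is a hypothetical helper. For such a model the symmetric part recovers $cF$ and the asymmetric part recovers $dF^2$, and the two parts $s$ and $a$ recombine as $x^{+} = s + a$ and $x^{-} = a - s$.

```python
def decompose_pair(x_plus, x_minus):
    """Split a +/- forcing response pair into symmetric and asymmetric parts.

    symmetric  = (x+ - x-)/2 : odd in the forcing sign, i.e. the linear part
    asymmetric = (x+ + x-)/2 : even in the forcing sign, e.g. the quadratic part
    """
    symmetric = [(p - m) / 2 for p, m in zip(x_plus, x_minus)]
    asymmetric = [(p + m) / 2 for p, m in zip(x_plus, x_minus)]
    return symmetric, asymmetric


# Hypothetical response model x(F) = c*F + d*F**2 at three grid points.
c = [0.5, -0.25, 1.0]    # linear sensitivities (illustrative)
d = [0.125, 0.0, -0.5]   # quadratic coefficients (illustrative)
F = 2.0                  # forcing amplitude in arbitrary units

x_plus = [ci * F + di * F**2 for ci, di in zip(c, d)]
x_minus = [ci * (-F) + di * F**2 for ci, di in zip(c, d)]

sym, asym = decompose_pair(x_plus, x_minus)
print("symmetric (linear):", sym)       # matches c*F
print("asymmetric (quadratic):", asym)  # matches d*F**2

# The parts recombine exactly: x+ = sym + asym and x- = asym - sym.
assert all(p == s + a for p, s, a in zip(x_plus, sym, asym))
assert all(m == a - s for m, s, a in zip(x_minus, sym, asym))
```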
b. Efficient sampling-based data augmentation
Fig. 1. Climate forcing inversion framework using CNN.
Citation: Artificial Intelligence for the Earth Systems 3, 3; 10.1175/AIES-D-23-0053.1
c. CNN for image-based inversion
Hyperparameter tuning is an iterative process that combines grid search with manual adjustments based on the model’s training history and its performance on the training and validation sets. A coarse grid search over predetermined sets of hyperparameters was first performed on the training and validation sets to determine the best model architecture, i.e., the number of convolutional layers, the additional pooling and dropout layers, and the type of optimizer. A finer grid search was then performed within the chosen architecture to further improve model performance. Two evaluation metrics were adopted: the R² score and the root-mean-square error (RMSE). Together they provide a balanced assessment of model performance, combining goodness of fit (R²) with error magnitude (RMSE), and allow us to balance model accuracy against computational efficiency. Key hyperparameters, such as batch size, learning rate, and the number of neurons in each layer, were varied within predetermined ranges to identify the configuration that yields the best validation accuracy. For the final selection, we chose the set of hyperparameters for which the R² score converges to its highest value and the RMSE to its lowest for both the training and validation datasets.
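The two metrics can be computed as follows. This sketch uses the standard coefficient of determination, $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$, and the RMSE for one predicted forcing vector against its truth; scoring each sample separately is an assumption, and the four forcing values are hypothetical.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the forcing."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical true and predicted forcing amplitudes at four patches.
truth = [1.0, 2.0, 3.0, 4.0]
prediction = [1.5, 2.0, 2.5, 4.0]

print(r2_score(truth, prediction), rmse(truth, prediction))
```

R² rewards reproducing the spatial variation of the forcing, while RMSE tracks the absolute error, which is why the two are read together during selection.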
As the neural network becomes deeper and more complex, the CNN model becomes more susceptible to overfitting, so cross validation and independent testing are needed to guard against it. To this end, the synthetic samples are divided into three subsets (training, validation, and testing) for model development and evaluation. Training and validation together use 85% of the samples during CNN model development to determine the optimal model configuration and hyperparameters; the remaining 15% form the testing set used to evaluate the final model. Because all samples are synthetically generated from the 97 independent perturbation cases of the Green’s function experiments, the training and testing sets are not entirely independent. To provide fully independent cases for verification, we perform an additional pair of experiments with a uniform ±2 W m−2 q-flux perturbation (see Fig. 4a).
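The synthetic samples rest on linear additivity, with the weights of the patch responses drawn by Latin hypercube sampling (LHS), as noted in section 3. A minimal sketch of that idea, followed by the 85%/15% split, is below. The weight range [−1, 1], the toy sizes, the random patch fields, and the helper names `latin_hypercube` and `superpose` are assumptions for illustration, not the study’s settings.

```python
import random


def latin_hypercube(n_samples, n_dims, low, high, rng):
    """Latin hypercube sample in [low, high)^n_dims.

    Each dimension is split into n_samples equal strata; every stratum
    receives exactly one point, and the stratum order is shuffled
    independently per dimension.
    """
    width = (high - low) / n_samples
    samples = [[0.0] * n_dims for _ in range(n_samples)]
    for d in range(n_dims):
        strata = list(range(n_samples))
        rng.shuffle(strata)
        for i, s in enumerate(strata):
            samples[i][d] = low + (s + rng.random()) * width
    return samples


def superpose(weights, patch_fields):
    """Linear additivity: weighted sum of per-patch fields, point by point."""
    n_points = len(patch_fields[0])
    return [sum(w * field[j] for w, field in zip(weights, patch_fields))
            for j in range(n_points)]


rng = random.Random(0)
n_patches, n_points, n_samples = 3, 4, 20  # toy sizes; the study has 97 patches

# Hypothetical per-patch forcing maps and (symmetric) response maps.
patch_forcing = [[rng.uniform(0.0, 1.0) for _ in range(n_points)]
                 for _ in range(n_patches)]
patch_response = [[rng.uniform(-1.0, 1.0) for _ in range(n_points)]
                  for _ in range(n_patches)]

# Assumed weight range [-1, 1]; each sample pairs an input response image
# with its label (the forcing built from the same weights).
weights = latin_hypercube(n_samples, n_patches, -1.0, 1.0, rng)
samples = [(superpose(w, patch_response), superpose(w, patch_forcing))
           for w in weights]

# 85% for training and validation, 15% held out for testing.
rng.shuffle(samples)
n_dev = round(0.85 * len(samples))
dev_set, test_set = samples[:n_dev], samples[n_dev:]
print(len(dev_set), len(test_set))
```

Because every input and label is built from the same linear combination of the 97 patch experiments, the held-out samples share that structure with the training data, which is why the text calls for separate, fully independent experiments.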
3. Results
We first assess model performance by examining the goodness of fit, measured by R², which indicates the model’s prediction skill. Figure 2a shows the R² scores for the training, validation, and testing datasets. The median R² is around 0.85 for all three datasets, suggesting that the model has been effectively trained. Moreover, both the medians and the interquartile ranges of R² are comparable across the three datasets, indicating no obvious overfitting. Figure 3 shows the three top scorers, with R² > 0.95, for which the CNN model predicts the “true” forcing quite accurately. Intriguingly, the temperature responses in all three cases share the same large-scale features as the leading neutral mode of the surface temperature of the CAM5-SOM system (see Fig. 2a in Lu et al. 2020). However, caution should be exercised in interpreting the contribution of this neutral-mode-like pattern to the skillful prediction of the CNN model, because, as shown later, the same pattern is also present in the failed predictions (see Fig. 5).
Fig. 2. CNN model evaluation with the final CNN model configuration. (a) Goodness-of-fit R² scores and their interquartile range for the training, validation, and testing datasets. (b) The absolute prediction error (PW) and its interquartile range at each forcing location for the testing dataset (orange bars), compared with the errors of the naïve prediction of 0 PW everywhere (gray bars). The whiskers in (a) and (b) span 99.3% of the R² scores and of the absolute errors, respectively.
Fig. 3. Three top scorers illustrating the skill of the trained CNN model in predicting the corresponding synthetic forcings.
To systematically evaluate the model performance at different forcing locations, we calculate the absolute errors between the predicted and “true” forcings for each of the 4500 test cases and summarize them for each of the 97 forcing locations in Fig. 2b, alongside the absolute errors of the naïve prediction of no forcing (gray bars). At more than 90% of the locations, the vast majority of absolute errors are smaller than 0.05 PW, clearly exceeding the skill of the naïve predictions. If the linearity and the skill of the CNN held for real climate problems, the CNN-predicted forcing could serve as a valuable first guess for the climate forcing solution. In any potential deployment of the forcing, this first-guess solution could then be combined with control algorithms being developed concurrently in the geoengineering research community (MacMartin et al. 2014; Kravitz et al. 2016) to further reduce errors.
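The per-location comparison with the naïve baseline can be sketched as follows: for each forcing location, the CNN’s absolute errors across test cases are set against those of predicting zero forcing, whose error is simply the magnitude of the true forcing. The toy arrays (3 cases, 2 locations) and the helper names are hypothetical; the real evaluation uses 4500 test cases and 97 locations.

```python
import statistics


def per_location_abs_errors(truth, pred):
    """Return errors[k] = |pred - truth| at location k across all test cases.

    truth and pred are lists indexed as [case][location].
    """
    n_loc = len(truth[0])
    return [[abs(p[k] - t[k]) for t, p in zip(truth, pred)] for k in range(n_loc)]


def fraction_within(errors, threshold):
    """Fraction of errors no larger than the threshold."""
    return sum(e <= threshold for e in errors) / len(errors)


# Toy test set: 3 cases x 2 forcing locations, in PW (hypothetical values).
truth = [[0.25, -0.125], [0.0, 0.5], [-0.5, 0.0625]]
pred = [[0.25, -0.0625], [0.03125, 0.25], [-0.375, 0.0]]
naive = [[0.0] * len(row) for row in truth]  # "no forcing" baseline

cnn_err = per_location_abs_errors(truth, pred)
naive_err = per_location_abs_errors(truth, naive)

summary = [(statistics.median(c), statistics.median(n), fraction_within(c, 0.05))
           for c, n in zip(cnn_err, naive_err)]
for k, (med_cnn, med_naive, frac) in enumerate(summary):
    print(f"location {k}: median |err| {med_cnn:.4f} vs naive {med_naive:.4f}, "
          f"{frac:.0%} within 0.05 PW")
```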
The data for training, validation, and testing are not entirely independent, as all are synthetically generated from the 97 Green’s function perturbation runs; this is also reflected in their comparable R² scores. It is therefore necessary to test the CNN model with fully independent forcing cases that are not among the 97 perturbation cases. The independent test case is forced by a spatially uniform q-flux forcing of 2 W m−2 (Fig. 4a). The actual temperature response in this independent experiment (Fig. 4b) is provided as input to the CNN model to predict the forcing. Encouragingly, the CNN prediction (Fig. 4c) does capture a relatively uniform pattern of q-flux forcing; however, the magnitude is underestimated by a factor of 3 (cf. Fig. 4c with Fig. 4a). Since the 97 forcing patches sum to a uniform 12 W m−2 forcing over the global open ocean, we can add up the responses to all 97 patches and rescale the sum by a factor of 1/6 (= 2/12) to construct a synthetic response to a uniform 2 W m−2 forcing (Fig. 4d). Running this synthetic response through the CNN model, we again predict the forcing pattern. The predicted forcing for the synthetic response has a magnitude much closer to the truth (Fig. 4e), with most predicted values falling within ±20% of the true values. Because the LHS data augmentation assumes linear additivity, the disparity between the predicted forcing magnitudes for the actual and synthetic 2 W m−2 cases underscores the limitation of the linearity assumption. That the synthetic temperature response is much larger in magnitude than the actual simulated one (cf. Figs. 4b,d) implies nonlinearity in the temperature response as the forcing magnitude is ramped up. Specifically, the actual response is much weaker than the one linearly interpolated from the experiments with larger forcing magnitudes.
By corollary, had the CNN model been trained on samples actually simulated by CAM5-SOM with different forcing magnitudes (instead of synthetic samples), its prediction would likely have matched the 2 W m−2 forcing magnitude much more closely. However, we cannot verify this without running actual simulations with different forcing amplitudes to generate additional training samples. Moreover, the limitations of the LHS data augmentation may not be the only reason the CNN model significantly underestimates the forcing magnitude, as elaborated below.
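The synthetic uniform-forcing response is a direct application of linear superposition: sum the patch responses, then rescale from the 12 W m−2 total to the 2 W m−2 target. The sketch assumes, as the text states, that the patches tile to a uniform total; the three toy patch responses, the predicted forcing values, and the helper names are hypothetical, and the ±20% check only mirrors the comparison in Fig. 4e schematically.

```python
def synthetic_response(patch_responses, patch_total, target):
    """Sum per-patch responses and rescale linearly from the total forcing
    (patch_total) to the target uniform forcing. Assumes linear additivity."""
    scale = target / patch_total  # 2 / 12 = 1/6 for the case in the text
    n_points = len(patch_responses[0])
    return [scale * sum(r[j] for r in patch_responses) for j in range(n_points)]


def within_fraction(pred, true, tol):
    """Fraction of predicted values within +/- tol (relative) of the true values."""
    hits = sum(abs(p - t) <= tol * abs(t) for p, t in zip(pred, true))
    return hits / len(true)


# Toy example: 3 patches, 4 grid points, responses (K) to the 12 W m-2 patches.
patch_responses = [[1.5, 0.75, 0.0, 0.5],
                   [0.75, 2.25, 0.75, 0.0],
                   [0.0, 0.75, 3.0, 1.0]]
print(synthetic_response(patch_responses, patch_total=12.0, target=2.0))

# Hypothetical predicted forcing (W m-2) against a uniform 2 W m-2 truth.
predicted = [1.75, 2.25, 2.5, 1.875]
print(within_fraction(predicted, [2.0] * 4, tol=0.2))
```

Because this construction is exactly linear, any nonlinearity in the actual 2 W m−2 run appears as a mismatch between the simulated and synthetic responses, which is the disparity discussed above.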
Fig. 4. Model prediction for the uniform forcing case. (a) True forcing with uniform pattern; (b) temperature response obtained from the GCM simulation; (c) forcing predicted by the CNN model from the GCM-simulated response; (d) synthetically constructed temperature response; and (e) forcing predicted by the CNN model from the synthetic response.
The second reason is related to the nonuniqueness, or equifinality (e.g., Tang and Zhuang 2008), of the inverse problem. By systematically examining the test samples (4500 cases in total), we find that the trained CNN tends to produce disparate forcing solutions and to underestimate the forcing magnitude for a given characteristic temperature response pattern. To illustrate this point, Fig. 5 shows two cases selected from the test samples with very similar (but not identical) temperature patterns (Figs. 5c,d) but distinct forcing patterns (Figs. 5a,b). Moreover, when these temperature patterns are fed to the CNN model as inputs, the CNN predicts much weaker q-flux forcings (Figs. 5e,f) than the actual forcings from which the temperature patterns were constructed (Figs. 5a,b). When the predicted forcings are used to reconstruct the temperature responses (Figs. 5g,h), the reconstructed responses are much weaker in magnitude, although spatially similar to the original patterns. This suggests that the CNN model does attend to the spatial features but is uncertain about the precise location and magnitude of the forcing that gives rise to the characteristic features in Figs. 5c,d. This is not necessarily a caveat specific to the CNN model selected here, but rather a reflection of the inherently underdetermined nature and equifinality of the inverse problem itself.
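A minimal linear-algebra illustration of equifinality: when the forcing-to-response operator has fewer independent response dimensions than forcing degrees of freedom, adding any null-space vector to a forcing leaves the response unchanged. The 2 × 3 operator `G` and the forcings are hypothetical stand-ins for the much larger, noisy LRF, not quantities derived from the experiments.

```python
def linear_response(G, f):
    """Linear forcing-to-response map x = G f."""
    return [sum(g * fi for g, fi in zip(row, f)) for row in G]


# Hypothetical operator: 3 forcing patches, 2 observed response points.
G = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]

f1 = [1.0, 0.0, 1.0]
null_vector = [1.0, -1.0, 1.0]  # G applied to this vector gives [0, 0]
f2 = [a + 0.5 * b for a, b in zip(f1, null_vector)]  # a different forcing

# Distinct forcings, identical responses: the inverse is not unique.
print(f1, f2, linear_response(G, f1), linear_response(G, f2))
```

With a full-rank but noisy operator the same effect appears in softened form: directions with small singular values are effectively indistinguishable from a null space once noise is present.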
Fig. 5. Forcing and temperature response comparison between two synthetic cases. (a),(b) Synthetic forcings with two different but similar patterns (W m−2); (c),(d) temperature responses constructed from the two synthetic forcing patterns (K); (e),(f) forcings predicted by the CNN model; and (g),(h) temperature responses constructed from the model-predicted forcings. Note that the color bar for the reconstructed temperature responses ranges only from −1 to 1 K.
In retrospect, the underestimation of the forcing magnitude for the independent test case is to be expected from its temperature response pattern (Fig. 4b), which closely resembles the pattern associated with multiple forcing solutions and which the CNN model often predicts with a subdued forcing magnitude. More intriguingly, this pattern bears a strong resemblance to the leading neutral mode of surface temperature identified for this CAM5-SOM climate system in an earlier study (see Fig. 2a in Lu et al. 2020), and the feedbacks through the cloud radiative effect and the wind–SST–evaporation mechanism that shape the Pacific pattern of the mode have been identified (e.g., Lin et al. 2021; Hsiao et al. 2022). Because the leading neutral mode is the most excitable mode of the system (that is, an arbitrary forcing is likely to produce the neutral-mode pattern), its neutrality makes it difficult to pin down the exact forcing through a data-driven approach, especially when the operator of the system is contaminated with noise (which is always the case). Notwithstanding the underdetermined nature of the problem, the CNN model trained here captures the spatial uniformity of the forcing, suggesting some feasibility for designing climate forcing to mitigate climate warming.
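To see why a dominant, easily excited mode blurs the inverse mapping, consider a hypothetical operator with one large singular value. Power iteration on $G G^{\top}$ recovers its leading left singular vector, the response pattern most amplified by the operator, and quite different forcings then yield responses whose pattern correlation with that mode is close to one. The matrix and forcings are toy assumptions, and this is only an analogy to the neutral-mode analysis of Lu et al. (2020), not a reproduction of it.

```python
import math


def matvec(A, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def leading_response_mode(G, iters=200):
    """Power iteration on G G^T: returns the leading left singular vector,
    i.e. the response pattern most amplified by G."""
    Gt = transpose(G)
    u = normalize([1.0] * len(G))
    for _ in range(iters):
        u = normalize(matvec(G, matvec(Gt, u)))
    return u


def cosine(a, b):
    """Pattern correlation (uncentered) between two fields."""
    return sum(x * y for x, y in zip(a, b)) / math.sqrt(
        sum(x * x for x in a) * sum(y * y for y in b))


# Hypothetical operator: every forcing patch excites the same broad pattern
# strongly, plus a weak local response that distinguishes the patches.
G = [[3.0, 3.0, 3.0],
     [2.0, 2.0, 2.0],
     [0.1, 0.0, -0.1]]
mode = leading_response_mode(G)

for f in ([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.5, 0.5, 0.0]):
    x = matvec(G, f)
    print(f, round(abs(cosine(x, mode)), 4))
```

Only the weak third response component separates the first two forcings; once noise of comparable size is added, their responses become practically indistinguishable.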
4. Summary and discussion
In this study, we use an existing set of Green’s function perturbation experiments generated by a slab-coupled climate model to construct a novel inversion framework that predicts the forcing from the response by training a deep CNN. The resulting CNN model shows no clear signs of overfitting and predicts the forcing of the synthetic test cases and the synthetic 2 W m−2 case with considerable skill. When applied to an independent test case with a uniform 2 W m−2 forcing, the CNN model captures the uniformity of the forcing but significantly underestimates its magnitude. This is likely because the true nonlinear forcing–response relationship is never seen during training and because the response in the independent test case projects strongly onto the leading neutral mode of the slab-coupled climate system. For patterns projecting onto the neutral mode, it is difficult for a data-driven approach to predict the forcing unambiguously, especially when the data contain noise. This may also reflect the underdetermined nature and/or equifinality of an ill-posed inverse problem.
A main caveat of using a CNN for the regression problem here is the difficulty of interpretation: Which response features does the neural network use to predict the q-flux forcing? The heatmap for the prediction of the independent test case turned out to be uninformative, owing to the complex architecture used here and the related shattered gradients problem (Balduzzi et al. 2017). Because lack of interpretability is intrinsic to deep neural networks, an alternative, more interpretable approach to the problem would be preferable, in light of the emerging field of machine learning known as explainable artificial intelligence (XAI; e.g., Kohlbrenner et al. 2020; Das and Rad 2020).
This study is our first attempt to identify the optimal two-dimensional forcing for a desired climate state. As such, we started by examining the forcing for the linear component of the temperature response. In parallel, we attempted to train a CNN model for the response to the positive q-flux forcing only, i.e., a response that includes both the symmetric and asymmetric components. The trained model has a much lower goodness-of-fit score and predictive skill than its counterpart for the symmetric response (compare Fig. S2 with Fig. 2). This underscores the greater challenge of finding the optimal forcing for the full temperature response as an inverse problem for a system with large quadratic nonlinearity and strong mode behavior (Lu et al. 2021; Bloch-Johnson et al. 2024), although the nonlinearity may be substantially damped in the real climate with active ocean dynamical feedback (Tseng et al. 2023). In this first attempt, we investigate only the forcing–response relationship for the equilibrium response of a slab-coupled climate system. The real climate spans a much richer range of time scales, especially slower ones arising from the large heat inertia of the ocean and from ocean dynamics. Searching for an optimal time-dependent forcing that reaches a given climate target within a certain time horizon in a fully coupled climate system would be more relevant to the imminent climate challenges of this century. This would require much longer simulations, and even ensembles of simulations, with state-of-the-art climate models, a computational and experimental design challenge requiring concerted community effort.
Acknowledgments.
This work is supported by the Office of Science, U.S. Department of Energy Biological and Environmental Research, as part of the Regional and Global Model Analysis program area. The Pacific Northwest National Laboratory (PNNL) is operated for DOE by Battelle Memorial Institute under Contract DE-AC05-76RLO1830. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility located at Lawrence Berkeley National Laboratory, operated under Contract DE-AC02-05CH11231 using NERSC Award ERCAP 0017358.
Data availability statement.
The monthly model datasets from Green’s function experiments using CAM5-SOM used for the current study can be found in an open-access data repository at https://doi.org/10.5281/zenodo.4588073. The code for producing the main results of this study is available at https://doi.org/10.5281/zenodo.13146522.
REFERENCES
Balduzzi, D., M. Frean, L. Leary, J. P. Lewis, K. W.-D. Ma, and B. McWilliams, 2017: The shattered gradients problem: If resnets are the answer, then what is the question? Proc. 34th Int. Conf. on Machine Learning, Sydney, New South Wales, Australia, ICML, 342–350, http://proceedings.mlr.press/v70/balduzzi17b/balduzzi17b.pdf.
Barsugli, J. J., S.-I. Shin, and P. D. Sardeshmukh, 2006: Sensitivity of global warming to the pattern of tropical ocean warming. Climate Dyn., 27, 483–492, https://doi.org/10.1007/s00382-006-0143-7.
Bloch-Johnson, J., and Coauthors, 2024: The Green’s Function Model Intercomparison Project (GFMIP) protocol. J. Adv. Model. Earth Syst., 16, e2023MS003700, https://doi.org/10.1029/2023MS003700.
Das, A., and P. Rad, 2020: Opportunities and challenges in explainable artificial intelligence (XAI): A survey. arXiv, 2006.11371v2, https://doi.org/10.48550/arXiv.2006.11371.
Das, V., A. Pollack, U. Wollner, and T. Mukerji, 2019: Convolutional neural network for seismic impedance inversion. Geophysics, 84, R869–R880, https://doi.org/10.1190/geo2018-0838.1.
Hassanzadeh, P., and Z. Kuang, 2016: The linear response function of an idealized atmosphere. Part I: Construction using Green’s functions and applications. J. Atmos. Sci., 73, 3423–3439, https://doi.org/10.1175/JAS-D-15-0338.1.
Hsiao, W.-T., Y.-T. Hwang, Y.-J. Chen, and S. M. Kang, 2022: The role of clouds in shaping tropical Pacific response pattern to extratropical thermal forcing. Geophys. Res. Lett., 49, e2022GL098023, https://doi.org/10.1029/2022GL098023.
Kabanikhin, S. I., 2008: Definitions and examples of inverse and ill-posed problems. J. Inverse Ill-posed Probl., 16, 317–357, https://doi.org/10.1515/JIIP.2008.019.
Kohlbrenner, M., A. Bauer, S. Nakajima, A. Binder, W. Samek, and S. Lapuschkin, 2020: Towards best practice in explaining neural network decisions with LRP. 2020 Int. Joint Conf. on Neural Networks, Glasgow, United Kingdom, Institute of Electrical and Electronics Engineers, 1–7, https://doi.org/10.1109/IJCNN48605.2020.9206975.
Kravitz, B., D. G. MacMartin, H. Wang, and P. J. Rasch, 2016: Geoengineering as a design problem. Earth Syst. Dyn., 7, 469–497, https://doi.org/10.5194/esd-7-469-2016.
Krizhevsky, A., I. Sutskever, and G. E. Hinton, 2017: ImageNet classification with deep convolutional neural networks. Commun. ACM, 60, 84–90, https://doi.org/10.1145/3065386.
LeCun, Y., Y. Bengio, and G. Hinton, 2015: Deep learning. Nature, 521, 436–444, https://doi.org/10.1038/nature14539.
Lin, Y.-J., Y.-T. Hwang, J. Lu, F. Liu, and B. E. J. Rose, 2021: The dominant contribution of Southern Ocean heat uptake to time-evolving radiative feedback in CESM. Geophys. Res. Lett., 48, e2021GL093302, https://doi.org/10.1029/2021GL093302.
Liu, F., J. Lu, O. Garuba, L. R. Leung, Y. Luo, and X. Wan, 2018a: Sensitivity of surface temperature to oceanic forcing via q-flux Green’s function experiments. Part I: Linear response function. J. Climate, 31, 3625–3641, https://doi.org/10.1175/JCLI-D-17-0462.1.
Liu, F., J. Lu, O. A. Garuba, Y. Huang, L. R. Leung, B. E. Harrop, and Y. Luo, 2018b: Sensitivity of surface temperature to oceanic forcing via q-flux Green’s function experiments. Part II: Feedback decomposition and polar amplification. J. Climate, 31, 6745–6761, https://doi.org/10.1175/JCLI-D-18-0042.1.
Lu, J., F. Liu, L. R. Leung, and H. Lei, 2020: Neutral modes of surface temperature and the optimal ocean thermal forcing for global cooling. npj Climate Atmos. Sci., 3, 9, https://doi.org/10.1038/s41612-020-0112-6.
Lu, J., D. Xue, L. R. Leung, F. Liu, F. Song, B. Harrop, and W. Zhou, 2021: The leading modes of Asian summer monsoon variability as pulses of atmospheric energy flow. Geophys. Res. Lett., 48, e2020GL091629, https://doi.org/10.1029/2020GL091629.
MacMartin, D. G., B. Kravitz, D. W. Keith, and A. Jarvis, 2014: Dynamics of the coupled human–climate system resulting from closed-loop control of solar geoengineering. Climate Dyn., 43, 243–258, https://doi.org/10.1007/s00382-013-1822-9.
Mohammad-Djafari, A., 2021: Regularization, Bayesian inference, and machine learning methods for inverse problems. Entropy, 23, 1673, https://doi.org/10.3390/e23121673.
National Academies of Sciences, Engineering, and Medicine, 2021: Reflecting Sunlight: Recommendations for Solar Geoengineering Research and Research Governance. National Academies Press, 328 pp.
Ongie, G., A. Jalal, C. A. Metzler, R. G. Baraniuk, A. G. Dimakis, and R. Willett, 2020: Deep learning techniques for inverse problems in imaging. IEEE J. Sel. Areas Inf. Theory, 1, 39–56, https://doi.org/10.1109/JSAIT.2020.2991563.
Simonyan, K., and A. Zisserman, 2014: Very deep convolutional networks for large-scale image recognition. arXiv, 1409.1556v6, https://doi.org/10.48550/arXiv.1409.1556.
Tang, J., and Q. Zhuang, 2008: Equifinality in parameterization of process-based biogeochemistry models: A significant uncertainty source to the estimation of regional carbon dynamics. J. Geophys. Res., 113, G04010, https://doi.org/10.1029/2008JG000757.
Tseng, H.-Y., Y.-T. Hwang, S.-P. Xie, Y.-H. Tseng, S. M. Kang, M. T. Luongo, and I. Eisenman, 2023: Fast and slow responses of the tropical Pacific to radiative forcing in northern high latitudes. J. Climate, 36, 5337–5349, https://doi.org/10.1175/JCLI-D-22-0622.1.
Vijayakumar, T., 2020: Posed inverse problem rectification using novel deep convolutional neural network. J. Innovative Image Process., 2, 121–127, https://doi.org/10.36548/jiip.2020.3.001.
Wang, Y., Q. Ge, W. Lu, and X. Yan, 2020: Well-logging constrained seismic inversion based on closed-loop convolutional neural network. IEEE Trans. Geosci. Remote Sens., 58, 5564–5574, https://doi.org/10.1109/TGRS.2020.2967344.