1. Introduction
The challenge of analyzing turbulence experimentally and numerically is tied to the broad range of multiscale physics constituting such flows. The computational cost of simulating these flows increases steeply with Reynolds number, as is typical of flows encountered in nature and engineering. More specifically, for atmospheric turbulence, the added complexity of density stratification, involving cloud microphysics and turbulence, poses greater challenges (Wyngaard 1992). Capturing the complex subgrid-scale processes at the scale of clouds is important for constructing accurate global climate models (Schneider et al. 2017b), and resolving these processes requires increased spatial and temporal resolution of the simulations.
There have been various efforts in the atmospheric science community to capture the effects of subgrid-scale processes using some form of parameterization. Such parameterizations, as in large-eddy simulation, rely on filtering out the high-wavenumber (subgrid scale) structures and resolving only a coarser domain with embedded subgrid-scale (SGS) modeling (Pressel et al. 2015, 2017). The SGS models parameterize the viscous dissipation of the filtered subgrid-scale structures. Cloud-resolving models (CRMs) are another example of such a modeling framework, used within global climate models (Randall et al. 2003; Randall 2013; Hannah et al. 2020). CRMs embed a high-resolution cloud model that parameterizes deep convection for the fluid state at each location in the host model.
Such models are limited by the appropriate choice of parameterization. Moreover, running these models within a global climate model over long climate time scales remains a computational challenge, even with the current generation of accelerator-based high-performance computing systems (Norman et al. 2022). Recently, artificial intelligence has been at the forefront of efforts to form surrogate models parameterizing the subgrid-scale processes (Duraisamy et al. 2019; Schneider et al. 2017a; Brunton et al. 2020), taking advantage of the availability of high-resolution simulation data. These models generally rely on components in the eddy viscosity term of the filtered Navier–Stokes equations, which concentrate on capturing the dissipative effects (Gamahara and Hattori 2017; Rasp et al. 2018; Maulik et al. 2019a,b; Beck et al. 2019; Yuval and O'Gorman 2020). Moreover, deep learning techniques have been used to capture and parameterize the subgrid-scale processes in global climate models (Krasnopolsky and Fox-Rabinovitz 2006; Krasnopolsky et al. 2013; Brenowitz and Bretherton 2018; Gentine et al. 2018; Rasp et al. 2018; Brenowitz and Bretherton 2019).
As stratified turbulent flows encountered in cloud physics also comprise significant antidissipative effects, it is beneficial to capture both the dissipative and antidissipative effects through the SGS parameterization. We approach this problem by directly modeling the state difference between a coarse simulation and a simultaneously running very fine-resolution simulation, enabling us to capture both the dissipative and antidissipative effects from the subgrid-scale processes. An overview of the modeling framework is shown in Fig. 1. We use the two-dimensional inviscid, nonhydrostatic, compressible Euler equations with density stratification for solving canonical mesoscale atmospheric flows and to generate subgrid-scale data as the difference between high- and low-resolution flow field data. Instead of handling the problem on the global spatial domain all at once, we choose to handle it locally using stencils of data (not unlike the nature of convolutions) to reduce the deep learning model size, improve generalization, and take advantage of parallel computations for inference. While the stencil-based framework also results in an increased number of samples available for training, our deep learning models require low amounts of training data. The current approach aims to assess the capability of spatially local stencil-based models to capture the complex nonlinear dynamics of stratified turbulence in idealized atmospheric flows.
An overview of the NN-based approach to capture the subgrid-scale effects missing in a coarse-resolution simulation. Potential temperature perturbations are used to visualize the flow fields.
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
A similar approach has recently been utilized by Watt-Meyer et al. (2021) and Bretherton et al. (2022), where the authors use a nudging technique to integrate the machine-learned model into the solver. Their model is invoked at a fixed time interval as a correction step, nudging the states back to the ideal or expected values. In our framework, the model is invoked after every time step of the flow solver. To avoid chaotic divergence between the two simulations, the states of the coarse simulation are updated with the (coarse-grained) states of the fine simulation after each time step. This coupling differs from Watt-Meyer et al. (2021) in that it is stricter and more frequent. However, such coupling is not guaranteed to be stable for more realistic applications (Bretherton et al. 2022; Clark et al. 2022, manuscript submitted to ESS Open Arch.). Future research is needed to determine which approach is ultimately more helpful toward the goal of developing a surrogate for SGS effects. The dissipative and antidissipative components are embedded into the coarse simulation as it is integrated with the high-resolution simulation acting as a parallel driver.
Moreover, we utilize residual neural networks to increase model capacity and obtain a highly accurate model with improved stability. Random forest architectures are utilized in the aforementioned efforts (Watt-Meyer et al. 2021; Bretherton et al. 2022) for modeling the state tendencies; they help stabilize the climate model by limiting the generation of arbitrarily high values by the machine-learned model. Although random forests yield lower accuracy than neural networks (NNs) (Brenowitz et al. 2020; Yuval et al. 2021), the stability they bring to online implementations of these models is of significant importance. NNs are known to perform better than random forests on accelerators because NNs are based on dense linear algebra, whereas random forests are based on graph traversals. That is not to say NNs are inherently better, but they do make more efficient use of accelerated hardware.
The current modeling approach for two-dimensional flows is useful as a pilot project to develop the surrogate modeling technique as shown by previous research (Kochkov et al. 2021), as it can give insights into the capability of different ML architectures to capture the nonlinear effects for a canonical flow. The insights gained can aid the modeling of more complex atmospheric flows. The present supervised approach can serve as a novel alternative to the SGS models of CRMs, capturing both dissipative and antidissipative effects of the subgrid-scale processes. The contributions of this work include the use of networks with a small number of parameters enabled by the stencil-based framework, a small amount of data required for training, and single time step corrections (similar to traditional turbulence model implementations) to achieve accurate generalizable models for capturing the high gradients of flow states.
In what follows, we first introduce the modeling approach and model problem setup in section 2. Then the results are discussed in section 3. Finally, we provide concluding remarks and a brief discussion on future extension of the current work in section 4.
2. Approach and problem setup
a. General procedure
b. Fluid flow problem
Equation (3) can be represented in vector form as ∂tq + ∂xf + ∂zg = s, with the state vector q, the flux vectors f and g in the x and z directions, and the source-term vector s.
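For concreteness, a plausible component form consistent with the two-dimensional, inviscid, compressible Euler equations with gravity is sketched below; the exact state vector and any background-state (hydrostatic) splitting used by the solver (Norman 2021) may differ:

```latex
\partial_t \mathbf{q} + \partial_x \mathbf{f} + \partial_z \mathbf{g} = \mathbf{s}, \qquad
\mathbf{q} = \begin{pmatrix} \rho \\ \rho u \\ \rho w \\ \rho\theta \end{pmatrix}, \quad
\mathbf{f} = \begin{pmatrix} \rho u \\ \rho u^{2} + p \\ \rho u w \\ \rho u \theta \end{pmatrix}, \quad
\mathbf{g} = \begin{pmatrix} \rho w \\ \rho u w \\ \rho w^{2} + p \\ \rho w \theta \end{pmatrix}, \quad
\mathbf{s} = \begin{pmatrix} 0 \\ 0 \\ -\rho g \\ 0 \end{pmatrix}.
```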
We show the time evolution of potential temperature from the initial condition to a turbulent state in Fig. 2. The flow is perturbed from a neutrally stratified dry atmosphere. The domain is of size (x, z) ∈ [0, 20] km × [0, 10] km. Slip, solid wall boundary conditions are prescribed for the top and bottom walls, and periodic boundary conditions are prescribed for the left and right walls. We choose the grid resolution, nx × nz, of the coarse simulation to be 200 × 100 (grid spacing of 100 m in all directions). The fine-resolution domain has a resolution of 1000 × 500, resulting in a grid-mapping ratio of 5× between the coarse and fine domains. This grid mapping leads to a fine-grid spacing of 20 m in the fine-resolution simulation. The time steps are adjusted so that the Courant–Friedrichs–Lewy (CFL) number is 0.8, corresponding to time steps of 0.23 and 0.046 s for the coarse- and fine-resolution simulations, respectively. We simulate the flows until 2000 s. After the initial laminar regime dominated by antidissipative effects, the flow evolves to a highly turbulent regime from about 500 s onward, as shown in Fig. 2. The temporal regimes consisting of both antidissipative and dissipative effects make the colliding thermal test case a useful model problem to test the capability of the current framework to capture these effects. Further details of the test case, solver, and numerical schemes can be found in Norman (2021).
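The quoted time steps are consistent with an acoustic CFL constraint. Assuming the maximum wave speed is approximately the dry speed of sound (roughly 350 m s−1, an assumption on our part), the arithmetic works out as

```latex
\Delta t \approx \mathrm{CFL}\,\frac{\Delta x}{c_{\max}}
= 0.8 \times \frac{100\,\mathrm{m}}{\approx 350\,\mathrm{m\,s^{-1}}} \approx 0.23\,\mathrm{s} \;\;\text{(coarse)}, \qquad
0.8 \times \frac{20\,\mathrm{m}}{\approx 350\,\mathrm{m\,s^{-1}}} \approx 0.046\,\mathrm{s} \;\;\text{(fine)}.
```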
Time evolution of the colliding thermal problem from initial condition to a turbulent state portrayed using potential temperature perturbations, θ′.
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
To generate the training data for Δq, we start with the fine state qfine and the coarse state qcoarse, which is the coarse-grained high-resolution state. Both simulations are advanced one coarse time step, and the target Δq is the difference between the coarse-grained fine state and the advanced coarse state.
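A minimal sketch of this data-generation loop is given below. We assume block-average coarse graining with the 5× grid-mapping ratio; the step functions are hypothetical stand-ins for the coarse and fine solvers, with the fine solver substepping internally to cover one coarse time step.

```python
import numpy as np

R = 5  # grid-mapping ratio between fine and coarse grids

def coarse_grain(q_fine, r=R):
    """Block-average each r x r patch of fine cells onto one coarse cell."""
    nvar, nz, nx = q_fine.shape
    return q_fine.reshape(nvar, nz // r, r, nx // r, r).mean(axis=(2, 4))

def generate_samples(q_coarse, q_fine, step_coarse, step_fine, n_steps):
    """Run coarse and fine solvers in lockstep; harvest state differences.

    step_coarse/step_fine are hypothetical callables advancing each state
    by one coarse-grid time step.
    """
    samples = []
    for _ in range(n_steps):
        q_coarse = step_coarse(q_coarse)
        q_fine = step_fine(q_fine)
        dq = coarse_grain(q_fine) - q_coarse   # training target
        samples.append((q_coarse.copy(), dq))
        q_coarse = coarse_grain(q_fine)        # sync to avoid chaotic divergence
    return samples
```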
c. Network setup
We test three NN architectures of increasing complexity:
- A single layer of 45 neurons (single-layer model): 1849 learnable parameters
- Ten layers with 45 neurons/layer in a ResNet-based configuration (He et al. 2016) for multilayer perceptrons, which we call the ResNet model: 23 499 learnable parameters
- Ten layers with 45 neurons/layer in a DenseNet-based configuration (Huang et al. 2017) for multilayer perceptrons, which we call the DenseNet model: 132 808 learnable parameters
Illustration of the network model (the results discussed focus on the ResNet model) for modeling the state difference using stencil data.
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
We tune various parameters of the NN architecture, called hyperparameters, through grid search to arrive at these architectures. The hyperparameters for the current approach are as follows (the chosen values are provided in parentheses):
- Number of neurons per layer (varied between 10 and 50; 45 was chosen)
- Number of layers (varied between 1 and 20; 10 was chosen)
- Activation function (ReLU and LeakyReLU; LeakyReLU with a slope of 0.1 was chosen to allow negative state values to propagate)
- Loss function at the output layer (L1; a combination of L1 for large errors and squared L2 for small errors; and squared L2; the squared L2, i.e., mean-squared-error, loss was chosen)
- Values for regularization at the input, hidden, and output layers (L1 and L2 regularization with values ranging between 1 × 10−9 and 1 × 10−1; although both L1 and L2 regularization with a value of 1 × 10−6 gave the best results within the range, no regularization was ultimately used, as it did not drastically affect model accuracy or numerical stability)
- Drop-out regularization (randomly zeroing out some elements of the tensors with a given probability; Srivastava et al. 2014) at the input and hidden layers (varied between 0 and 0.1; only the input layer was given dropout, with a value of 0.01)
- Optimization routine (stochastic gradient descent, Adam, and NAdam; NAdam was chosen for its higher convergence rate)
- Batch size of training samples (varied between 1 and 2048; 1024 was chosen).
We note that these grid searches are exploratory in nature, and there are other, more systematic avenues for finding an optimized NN architecture (Balaprakash et al. 2018; Egele et al. 2021; Hertel et al. 2020).
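A minimal sketch of such an exploratory grid search is shown below. The grid values are illustrative, and train_and_validate is a dummy stand-in for a full training run returning a validation error.

```python
from itertools import product

def train_and_validate(config):
    # Stand-in: in the real workflow this would train a model with the
    # given hyperparameters and return its validation MSE.
    return hash(tuple(sorted(config.items()))) % 1000 / 1000.0

# Exploratory grid over a subset of the hyperparameters listed above
grid = {
    "neurons": [10, 20, 45, 50],
    "layers": [1, 5, 10, 20],
    "batch_size": [256, 1024, 2048],
    "negative_slope": [0.01, 0.1],   # LeakyReLU slope
}

best = None
for values in product(*grid.values()):
    config = dict(zip(grid.keys(), values))
    val_mse = train_and_validate(config)
    if best is None or val_mse < best[1]:
        best = (config, val_mse)
print("best config:", best)
```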
The single-layer model is used as a shallow, simple NN model to test the capability of NNs to capture the grid mapping. Drop-out regularization (Srivastava et al. 2014) is applied to the input layer to avoid overfitting and stabilize the single-layer model. Adding more linear layers with drop-out regularization to this model (layers equivalent to the only layer in the single-layer model) reduced the stability of the model during testing without a considerable increase in accuracy or in the ability to capture the nonlinear physics. This leads us to the last two models, the ResNet and DenseNet models. These deep NNs are used to increase the capacity of the network and accurately capture the highly nonlinear physics; they take advantage of skip connections to enable sparse, complex, nonlinear relations between the input and output (He et al. 2016). The two differ only in the manner in which the residual (or input) for each layer is combined: the ResNet model uses addition, whereas the DenseNet model uses concatenation.
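A minimal PyTorch sketch of the two skip-connection styles and a stencil-input ResNet is shown below. Assuming four state variables on a 3 × 3 stencil (36 inputs, 4 outputs), a single hidden layer of 45 neurons yields exactly the 1849 parameters quoted above for the single-layer model; the exact layer arrangement of the deep models (and hence their parameter counts) may differ slightly from the authors' implementation.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """MLP layer with an additive skip connection (ResNet style)."""
    def __init__(self, width):
        super().__init__()
        self.fc = nn.Linear(width, width)
        self.act = nn.LeakyReLU(0.1)
    def forward(self, x):
        return self.act(self.fc(x)) + x          # addition combines input and output

class DenseBlock(nn.Module):
    """MLP layer whose input and output are concatenated (DenseNet style)."""
    def __init__(self, in_width, growth):
        super().__init__()
        self.fc = nn.Linear(in_width, growth)
        self.act = nn.LeakyReLU(0.1)
    def forward(self, x):
        return torch.cat([x, self.act(self.fc(x))], dim=-1)

class StencilResNet(nn.Module):
    """3x3 stencil of nvar states in, state difference at the center cell out."""
    def __init__(self, nvar=4, width=45, depth=10):
        super().__init__()
        self.inp = nn.Sequential(nn.Dropout(0.01),          # input-layer dropout
                                 nn.Linear(9 * nvar, width),
                                 nn.LeakyReLU(0.1))
        self.blocks = nn.Sequential(*[ResBlock(width) for _ in range(depth)])
        self.out = nn.Linear(width, nvar)
    def forward(self, x):
        return self.out(self.blocks(self.inp(x)))
```

For example, StencilResNet()(torch.randn(8, 36)) returns a batch of eight predicted state differences.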
We have found that the simple, small framework of a stencil-based input and output to a network of MLPs performs similarly to convolutional neural networks (CNNs) when testing in unknown data regions; the architecture is intrinsically similar to a convolution operation. More importantly, the stencil-based approach mimics a higher-order finite difference scheme. The current formulation does not assume a dissipation- or advection-based mapping between qcoarse and Δq. Our approach is to let the NN capture the complex mathematical relations between the states of the center cell and neighboring cells and the state difference at the center cell in the coarse domain.
It is important to note that the actual domain of constraint for the fine-scale model, the domain of inputs used to predict the outputs, is wider than the 3 × 3 coarse cells over the temporal domain of a coarse-grid time step. Therefore, full constraint of the state difference is likely not possible. Still, we wished to maintain a simpler NN model with minimal inputs, and results will show that the 3 × 3 stencil is effective in providing a constraining set of inputs.
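A minimal sketch of gathering the 3 × 3 stencil inputs is shown below. We assume periodic wrapping in x and clamping to the wall in z; the solver's actual boundary treatment may differ.

```python
import numpy as np

def stencil_inputs(q, i, k):
    """Gather the 3x3 neighborhood of coarse cell (k, i) for all nvar states.

    q has shape (nvar, nz, nx). x is treated as periodic; in z we clamp
    to the wall (an assumption on our part).
    """
    nvar, nz, nx = q.shape
    ks = np.clip([k - 1, k, k + 1], 0, nz - 1)   # solid walls: clamp in z
    iss = [(i - 1) % nx, i, (i + 1) % nx]        # periodic in x
    patch = q[:, ks][:, :, iss]                  # shape (nvar, 3, 3)
    return patch.reshape(-1)                     # flattened 9 * nvar input vector
```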
d. Training and validation of model
We have learned that sampling the data for training is a crucial step for building an accurate and stable NN model, which we elaborate on here. The training dataset comprises 1-M samples collected over the temporal regime [0, 900] s. We performed a preliminary grid search on the number of training samples required, spanning from a few hundred thousand to 10-M samples, and found 1-M samples to be sufficient for training accurate models. Note that this sampling set uses less than 2% of the spatial data at each time step of the simulation. For comparison, the number of training data points generated for a full-flow-field model (like a CNN model) would be over 313 M over the same temporal regime. The small stencil-based framework thus allows us to use significantly fewer samples for training the models.
Naive random sampling is biased toward the smooth regions that dominate the domain. To alleviate this bias, we use a 50:50 split to curate samples with and without high total variation (TV) weighting. To sample n = 10^6/NTS data points at an instant in time, where NTS is the number of time steps, we apply a strict sampling criterion using the TV of all the states of a data point. First, we perform a min–max normalization of the TV, V(q), of each state computed over the global spatial domain at that instant in time. Next, a threshold is selected to identify n/2 random samples with high TV weighting for all the states. Selecting this threshold poses further challenges because the flow evolves through both laminar and turbulent regimes, leading to a broad range of state values over time; thus, the threshold for identifying the 50:50 split is manually adjusted for the various flow regimes. Finally, the remaining n/2 samples at an instant in time are randomly sampled from the remaining data points in the domain.
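The sketch below illustrates this 50:50 curation at one time instant. The per-cell TV proxy (sum of absolute neighbor differences) and the fixed threshold are our assumptions; the paper adjusts the threshold manually per flow regime.

```python
import numpy as np

def total_variation(q):
    """Per-cell TV proxy: sum of absolute neighbor differences over all states."""
    tv = np.abs(np.diff(q, axis=-1, prepend=q[..., :1])) \
       + np.abs(np.diff(q, axis=-2, prepend=q[..., :1, :]))
    return tv.sum(axis=0)                              # shape (nz, nx)

def sample_indices(q, n, threshold=0.5, rng=np.random.default_rng(0)):
    """50:50 split between high-TV cells and random cells at one instant."""
    v = total_variation(q)
    v = (v - v.min()) / (v.max() - v.min() + 1e-12)    # min-max normalization
    flat = v.ravel()
    high = np.flatnonzero(flat >= threshold)           # high-TV candidates
    rest = np.flatnonzero(flat < threshold)
    half = n // 2
    pick_high = rng.choice(high, size=min(half, high.size), replace=False)
    pick_rest = rng.choice(rest, size=n - pick_high.size, replace=False)
    return np.concatenate([pick_high, pick_rest])      # flat cell indices
```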
We train the model in a Python environment using PyTorch and then deploy the model in the C++ solver (Norman 2021) with graphics processing unit (GPU) acceleration for production usage. The NAdam optimizer is used for optimizing the weights of the NNs (Kingma and Ba 2014; Dozat 2016; Ma and Yarats 2019). For training on the 1-M data points (with a 70:30 split between training and validation loops and random shuffling of training samples), we use minibatches of 1024 samples. Even with the use of minibatches, the model incurs high variance in accuracy over the training epochs. We have learned that using a higher learning rate during the initial training epochs, until the model accuracy saturates, and then lowering the value helps reduce this high variance. This process thus represents a two-step learning rate reduction, with high and low learning rate values of 1 × 10−1 and 5 × 10−2, respectively. Furthermore, for the deep NNs, we also rely on an averaging technique to ensemble the model weights across various training epochs to alleviate the high variance in the training accuracy, as shown in Fig. 4. After a regime of stable accuracy is reached over epochs (3000 epochs is set for our current problem), we collect an ensemble of model weights from training instances with errors below a particular threshold (a mean-squared-error value of 2.5 × 10−4 is used). The weights of the final model are an ensemble of these collected weights, and predictions are made only with this single final model. The procedure helps average over the gradient basin in the optimization manifold, aiding in forming a stable solution, and has similarities with the stochastic weighted averaging (SWA) method introduced by Izmailov et al. (2018).
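A minimal sketch of this training and SWA-like weight-ensembling procedure is shown below. The epoch at which the learning rate drops and at which snapshot collection begins are our assumptions; the paper switches once accuracy saturates.

```python
import copy
import torch
import torch.nn.functional as F

def run_epoch(model, loader, opt):
    """One pass over the training minibatches with squared-L2 (MSE) loss."""
    model.train()
    for x, y in loader:
        opt.zero_grad()
        F.mse_loss(model(x), y).backward()
        opt.step()

@torch.no_grad()
def validate(model, loader):
    """Mean-squared error over the validation set."""
    model.eval()
    losses = [F.mse_loss(model(x), y).item() for x, y in loader]
    return sum(losses) / len(losses)

def train_with_weight_ensembling(model, train_loader, val_loader,
                                 n_epochs=3000, err_cutoff=2.5e-4,
                                 collect_after=2000):
    opt = torch.optim.NAdam(model.parameters(), lr=1e-1)
    snapshots = []
    for epoch in range(n_epochs):
        if epoch == n_epochs // 2:             # two-step LR reduction (switch
            for g in opt.param_groups:         # point is an assumption here)
                g["lr"] = 5e-2
        run_epoch(model, train_loader, opt)
        if epoch >= collect_after and validate(model, val_loader) < err_cutoff:
            snapshots.append(copy.deepcopy(model.state_dict()))
    # Final weights: element-wise mean over the collected snapshots
    avg = {k: torch.stack([s[k].float() for s in snapshots]).mean(0)
           for k in snapshots[0]}
    model.load_state_dict(avg)
    return model
```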
(top) Illustration of collecting ensembles to form ensemble model weights [similar to the stochastic weighted averaging technique by Izmailov et al. (2018)]. The green line represents the training regime over which the ensemble of the model weights is computed. The red shaded region denotes the training instances when the model acquires error values higher than the cutoff error and hence the weights of which are not considered for the ensembling procedure. (bottom) A sample training curve for the current modeling framework.
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
3. Results
For an effective model predicting the subgrid-scale effects, the model should be accurate as well as computationally stable when tested in temporal regimes outside the training regime. As with subgrid-scale turbulence models, we perform both offline and online testing, commonly known as a priori and a posteriori testing in the turbulence literature (Piomelli et al. 1988), of the NN model in turbulent regimes outside the training regime. We also test the models in the initial laminar regime where antidissipative effects dominate. Testing was performed not only on the flow field snapshots but also on the state gradients, their statistical estimates, root-mean-square (rms) values of the states, total kinetic energy, and grid independence. The results reveal that the ResNet-based model gives the most accurate and computationally stable NN-coupled solver for a finite time period. Herein, we only show the results of the ResNet-based model, while we also discuss insights gained from the other models.
a. Offline testing
For the offline testing, the results from the NN model are directly compared with the expected solution and evaluated independently without coupling the NN model with the numerical solver. We first test the model with data randomly sampled in space and time outside the training regime, as shown in Fig. 5. The comparison between the true and predicted Δq for each flow state is shown in Figs. 5a–d, along with the coefficient of determination (regression score), R2. The ResNet model predicts the state differences with high accuracy and low spread, with R2 > 0.95. The normalized errors have a mean and standard deviation of
Comparison between the true state differences and those predicted by the ResNet-based NN model for random data points in space and time. The coefficient of determination, R2 value, for each state difference is also provided.
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
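For reference, a minimal sketch of the coefficient of determination used in these offline comparisons is given below; the array names are hypothetical.

```python
import numpy as np

def r2_score(true, pred):
    """Coefficient of determination (regression score), R^2."""
    ss_res = np.sum((true - pred) ** 2)
    ss_tot = np.sum((true - true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Usage sketch: one R^2 per state difference, given sample arrays
# dq_true, dq_pred of shape (n_samples, nvar):
# r2 = [r2_score(dq_true[:, j], dq_pred[:, j]) for j in range(dq_true.shape[1])]
```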
Next, we observe the behavior of the model by using data from a full snapshot in a turbulent regime outside the temporal regime of the training data (after 900 s), as shown in Fig. 6 for Δ(ρu). The ResNet model predicts the state difference accurately with an overall Euclidean error norm of
Full flow field prediction by the NN model. Corrections of horizontal velocity are shown along with the L1 error norm, ϵ = |Δq − ΔqNN|/max(|Δq|), and L2 norm of the full flow field,
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
b. Online testing
The online testing is done by coupling the NN model correction with the coarse flow simulation. A sample depiction of the NN-coupled solver is shown in Fig. 7. The fine simulation is used as the correction until the testing time, t0, after which the NN model is invoked. Therefore, at time t0, the model state is essentially perfect with respect to the fine-grid model, and the solution at t0 can be viewed as the coarse-grained fine-resolution simulation result. We use the PyTorch C++ API1 to couple the NN model with the C++ solver, performing in-the-loop ML integration with the scientific solver. A GPU kernel is used to update qcoarse with the machine-learned ΔqNN after every time step, following Eq. (2). Even without special batching considerations for online inference or refactoring the code for GPU performance, we attain speed-ups of up to 8× when using the NN-coupled solver compared to a fine-resolution simulation. Since the current discussion concentrates on the scientific aspects of the model, we reserve the full computational aspects of the framework for future investigation. In the following, we show the results of testing the NN-coupled solver in a turbulent regime as well as in a laminar regime.
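One way to hand the trained model to the C++ solver, continuing the StencilResNet sketch above, is to export a traced TorchScript module, which the PyTorch C++ API loads via torch::jit::load; the file name and batch shape below are illustrative.

```python
import torch

# `model` is a trained instance of, e.g., the StencilResNet sketched earlier;
# in practice, trained weights would be loaded first.
model = StencilResNet()
model.eval()
example = torch.randn(1024, 36)               # a batch of 3x3-stencil inputs
scripted = torch.jit.trace(model, example)    # TorchScript via tracing
scripted.save("sgs_resnet.pt")                # loadable from C++ with torch::jit::load
```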
Sample depiction of online testing using the NN model.
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
We invoke the NN model at time t0 = 1000 s, which is outside the training regime and well within the turbulent regime. Recall that the training data were sampled from t ∈ [0, 900] s. The results we show here are those after the simulation has run for t0 + 5 s, which is approximately 25 coarse-grid time steps. The results for ρu of the coarse-grained fine-resolution simulation (
Full flow state prediction using NN-based emulation in turbulent regime outside of training regime. Snapshots of horizontal velocity (ρu) after 5 s from the initial state of the testing regime are shown for (top) coarse-grained fine-resolution simulation (
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
The NN model, ρucoarse-NN, accurately predicts the flow evolution with sharp boundaries. The sharp boundaries in the shear-layer regions above and below the vortex core are captured better in ρucoarse-NN than in ρucoarse-No correction. Moreover, the NN model maintains a maximum L1 difference of 6%, lower than that of ρucoarse-No correction (7%), as shown in Fig. 8 (bottom right). Note that the chaotic divergence of the modeled flow and its difference from the ideal states are inseparably combined in the L1 difference norm. The main coherent structures are accurately captured in ρucoarse-NN, and the larger L1 errors from the small-scale fluctuations are isolated. Note that these regions of high local error are located in high-gradient regions.
To provide a clear quantitative measure of the capability of the NN model to capture the sharp boundaries or high-gradient regions, we compute the gradients of ρu in both x and z directions, as shown in Figs. 9 and 10. The L2 difference norm of the gradients with respect to that of
(top) Comparison of gradients of ρu in the x direction for (left) coarse-grained fine-resolution simulation (
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
(top) Comparison of gradients of ρu in the z direction for (left) coarse-grained fine-resolution simulation (
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
Comparison of the probability density distributions of the gradients of (left) ∇x(ρu) and (right) ∇z(ρu) for coarse-grained fine-resolution simulation (
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
We summarize the L2 difference norm and the KL divergence values with respect to the
Comparison of ∇x(⋅) and ∇z(⋅) for qcoarse-No correction (abbreviated NC) and qcoarse-NN using the L2 difference norm and Kullback–Leibler divergence, DKL, with respect to the coarse-grained fine-resolution simulation. The results are for online testing performed from t0 = 1000 s, and the metrics are computed at t0 + 5 s. The values in bold indicate comparatively lower error.
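For reference, a minimal sketch of the Kullback–Leibler divergence between histogram estimates of two gradient fields, as used in these comparisons, is shown below; the bin count and the masking of empty bins are our choices.

```python
import numpy as np

def kl_divergence(p_samples, q_samples, bins=200):
    """D_KL(P || Q) between histogram estimates of two gradient fields."""
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    p, edges = np.histogram(p_samples, bins=bins, range=(lo, hi), density=True)
    q, _ = np.histogram(q_samples, bins=bins, range=(lo, hi), density=True)
    dx = edges[1] - edges[0]
    p, q = p * dx, q * dx                 # convert densities to bin probabilities
    mask = (p > 0) & (q > 0)              # drop empty bins to avoid log(0)
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

# e.g., compare d(rho*u)/dx of the coarse-grained fine run vs. the NN run:
# dkl = kl_divergence(grad_x_fine.ravel(), grad_x_nn.ravel())
```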
We test the generalization of the model to grid resolutions other than the one used for training, probing ideas of scale invariance for the NN model (and potentially for the dynamics themselves). We perform online testing of the model trained using data from the 1000 × 500 → 200 × 100 grid mapping on the following grid mappings: 1) 500 × 250 → 100 × 50, 2) 1500 × 750 → 300 × 150, and 3) 2000 × 1000 → 400 × 200. Note that the grid-mapping ratio of 5× is retained for each of the mappings. The total error at an instant in a turbulent regime, at t0 + 5 s, is shown in Fig. 12. The θ′ fields are also shown as insets for reference. The accuracy of the NN model for all the flow states scales according to the grid mapping: the error increases linearly with grid resolution.
Behavior of the model accuracy at online testing with simulation setups of various grid resolutions. The model was trained with stencil data from the 1000 × 500 → 200 × 100 grid mapping. All other cases have the same grid-mapping ratio of 5× between the respective coarse and fine grids. The total L1 error is given by
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
These results suggest various advantages of the stencil-based architecture used in the current modeling framework. The NN model trained at a lower grid resolution can be used at a higher resolution, though with slightly more error, saving significant computational cost during training. Moreover, such scaling would be difficult to achieve if the surrogate model had global spatial and/or temporal dependence, which the current local, stencil-based framework avoids. While we attribute this scaling to the stencil-based approach, which resembles a finite difference scheme, we have yet to fully understand the mechanism behind this behavior of the model, and this investigation will be performed in future work.
Comparison of (top) the rms values of the flow velocities and (bottom) the total kinetic energy over time. The various lines represent the following: – the coarse-grained fine-resolution simulation, ⋅⋅⋅ the coarse simulation without correction from t0 = 1000 s, and – – the coarse simulation with NN correction. (left) The raw data and (right) the L1 difference with respect to the coarse-grained fine-resolution simulation.
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
The results suggest that the NN-based correction is highly accurate over a finite time interval of ∼90 coarse-grid time steps, whereas further research is needed to keep the solver stable against the accumulation of errors in high-gradient regions. We note that the DenseNet model performs similarly to the ResNet model but becomes unstable after t0 + 12 s, with higher error levels before the instability. The single-layer model is far less robust, becoming unstable after only t0 + 3 s. This inability of the single-layer model suggests the utility of a more complex NN formulation with multiple layers and skip connections.
Various regularization techniques can be used to address these overestimates, although such strategies produced little change in our tests. Utilizing the skip connections in ResNets was one way the overestimates were controlled, though not fully alleviated. Since the overestimates do not satisfy the physics constraints of the fluid flow system, physics-informed techniques could be used to constrain these variabilities. We provide insights from our ongoing work on improving the stability of the NN-coupled solver in the conclusions section.
Testing in the laminar regime
Testing the model in a turbulent regime shows its capability to capture predominantly dissipative effects. Next, we demonstrate the capability of the NN model to capture predominantly antidissipative effects by testing it in a transient, laminar-dominated regime where mixing is not prevalent. For this test, we invoke the NN model from t0 = 300 s. The results after the simulation has run for t0 + 5 s are shown in Fig. 14 (analogous to the turbulent-regime plots in Fig. 8). The NN model accurately predicts the corrections in the laminar regime as well, with a difference (L2 = 0.009) smaller than that of the coarse simulation without correction (L2 = 0.013).
Full flow state prediction using NN-based emulation in laminar regime outside of training regime. Snapshots of horizontal velocity (ρu) after 5 s from the initial state of the testing regime are shown for (top) coarse-grained fine-resolution simulation (
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
We also plot the gradients of ρu in both the x and z directions, as shown in Figs. 15 and 16. The NN correction outperforms the coarse simulation without correction for both ∇x(ρu) and ∇z(ρu). Moreover, Table 2 shows that the NN correction outperforms the coarse simulation without correction at capturing the high gradients for all the variables. These results in the laminar regime demonstrate the capability of the NN model to accurately capture high gradients in a flow regime not dominated by mixing and diffusion. Moreover, note that the NN model is accurate for both ρu and θ′, unlike in the turbulent regime. This observation suggests that we need to be more careful when choosing the variables being modeled and include the effects of hydrostasis along with the perturbed variables. It could also suggest that the NN model treats mixing in the vertical direction similarly to mixing in the horizontal direction; thus, a more anisotropic framework may suit the current flow setup.
(top) Comparison of gradients of ρu in the x direction at the laminar regime for (left) coarse-grained fine-resolution simulation (
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
(top) Comparison of gradients of ρu in the z direction at the laminar regime for (left) coarse-grained fine-resolution simulation (
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
Comparison of ∇x(⋅) and ∇z(⋅) for qcoarse-No correction (abbreviated NC) and qcoarse-NN using the L2 difference norm and Kullback–Leibler divergence, DKL, with respect to the coarse-grained fine-resolution simulation. The results are for online testing performed from t0 = 300 s, and the metrics are computed at t0 + 5 s. The values in bold indicate comparatively lower error.
We analyze the states along a one-dimensional slice, which helps reveal the performance of the model at capturing sharp gradients in the laminar regime and hence demonstrates its ability to capture antidissipative effects. For the above online test results at time t0 + 5 s, we analyze the flow states along a horizontal slice at z = 6 km, as shown in Fig. 17. We chose the z = 6 km slice because sharp gradients are present there. The results show that the NN model (orange lines) captures the most highly variable regions more accurately than the noncorrected coarse simulation (blue lines), e.g., at x = 6 and 15 km in Fig. 17a. However, in some regions, such as x ∈ [11, 13] km, the NN-corrected coarse simulation performs worse than the noncorrected coarse simulation. Nonetheless, even though qcoarse-No correction captures the sharp gradients over the short testing period of 5 s, these gradients will eventually be smoothed (diffused) over a finite period of time. The results of qcoarse (the coarse simulation run from t0 = 0 s without any correction; green lines) demonstrate this inability of the coarse simulation to retain sharp gradients over time. All the above results suggest that the NN model is capable of capturing and steepening gradients when appropriate, particularly in regions not dominated by mixing.
Slices of flow states at z = 6 km from the results in Fig. 14 at time t = t0 + 5 s. The line legends correspond to the following: dashed line (– –) is qfine, the blue line is qcoarse-No correction, the orange line is qcoarse-NN, and the green line is qcoarse, the coarse simulation without any correction from the initial condition at t = 0 s.
Citation: Artificial Intelligence for the Earth Systems 3, 4; 10.1175/AIES-D-23-0043.1
4. Concluding remarks and future work
We use deep learning to generate spatially local surrogate models from high-resolution data for capturing subgrid-scale effects in dry, stratified turbulence in atmospheric flows. Starting from the inviscid Euler equations with density stratification, we generate subgrid-scale data as a difference between high- and low-resolution flow field data. The coarse-resolution simulation is kept in sync with the fine-resolution simulation via coarsening interpolation (coarse graining) after each time step to avoid chaotic nonlinear divergence. The results show that a deep ResNet-type NN model using the stencil information around a given grid cell is able to accurately perform online tests for certain flow states and is stable over a finite time period. Such accuracy is consistent over both laminar and turbulent regimes, which are dominated by antidissipative and dissipative effects, respectively. Even without special batching considerations and refactoring the code for GPU performance, the NN-coupled solver attains speed-ups of up to 8× compared to the fine-resolution simulation. Moreover, we find that the model generalizes well to various grid resolutions even though trained with data from a particular grid resolution, giving potential computational savings by training at a lower resolution and inferencing at a higher resolution.
We find that in turbulent regimes, the model accurately predicts the flow velocities but has lower accuracy in capturing the behavior of density and potential temperature. This suggests that hydrostasis might have to be explicitly taken into consideration in the framework. We note that while the model outperforms coarse simulations at capturing the high gradients in the flow states, in some spatial regions the model has low accuracy in capturing the state differences. This could reflect the difficulty the chosen simple neural network-based models have in emulating highly discontinuous and variable phenomena, or it could be due to the need for more training data in this regime. Due to the accumulation of such errors over time, the model becomes computationally unstable after a finite time of ∼90 coarse-grid time steps. The high accuracy of the model for certain flow states, and its capability to capture high gradients over the finite time where it is numerically stable, are encouraging lessons from this pilot study for further enhancing the capabilities of the framework.
The present supervised learning approach of capturing the state difference using a stencil framework could serve as a novel alternative to the subgrid-scale models in large-eddy simulations and CRMs, capturing both dissipative and antidissipative effects of the subgrid-scale phenomena. We are currently investigating alternative formulations to enhance the numerical stability of the model. Diffusion-based approaches inspired by traditional SGS models are one alternative in the physics-informed realm. Such models, along with enforcing the conservation of conserved quantities, can help stabilize the NN-coupled solver (Ling et al. 2016). Moreover, recurrent neural networks and generative adversarial networks can be used for correcting the emulation. Addressing the numerical stability of the solver against extrema generation can also help stabilize the NN-coupled solver.
The current approach is an exploratory work to assess the capabilities and nuances of spatially local stencil-based models in capturing the complex nonlinear dynamics of stratified turbulence in idealized atmospheric flows. We do not perform a comparison with spatially global models, such as convolutional (CNN) models that consider the full flow field; such a comparison between local stencil-based and global CNN-based models is an important topic for future research. Moreover, while we have tested the generalizability of the model across different grid resolutions, we have not tested its generalizability to other types of flows. In this pilot study, we tackle a single type of flow, which comprises both laminar and turbulent regimes, to understand the capability of the framework to model subgrid-scale effects. Future work would involve extending the generalizability of the framework to different types of flows. Such efforts would require updating the framework according to the equations being solved and retraining the model. A more achievable goal would be to incorporate appropriate flow parameters into the model so that a single model can be tested at different parameter regimes.
Another aspect we have not explored in great depth in this manuscript is the explainability and interpretability of the NN models. Recently, various techniques have been used to understand the physical implications of machine learning, particularly in the meteorological domains (McGovern et al. 2019). Finally, performant, portable, and scalable integration of the model to the solver in hybrid architectures (Partee et al. 2021; Brewer et al. 2021; Bhushan et al. 2021) is of utmost urgency for practical production deployment of such models in climate models.
https://pytorch.org/cppdocs/. A sample implementation of the API is provided at https://github.com/muralikrishnangm/pytorch-cpp-example.
Acknowledgments.
The authors thank Kyle Pressel for insightful discussions. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources from the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy (DOE). The U.S. government retains, and the publisher, by accepting the article for publication, acknowledges that the U.S. government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript or allow others to do so, for U.S. government purposes. The DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).
Data availability statement.
The specific data, codes, and trained models are openly accessible in Gopalakrishnan Meena and Norman (2024). A collection of sample datasets used for training and testing can also be found in Gopalakrishnan Meena and Norman (2022a,b). The source code for the original ILES method, AWFL, used in this work can be found at https://github.com/mrnorman/awflCloud.
REFERENCES
Balaprakash, P., M. Salim, T. D. Uram, V. Vishwanath, and S. M. Wild, 2018: DeepHyper: Asynchronous hyperparameter search for deep neural networks. IEEE 25th Int. Conf. on High Performance Computing, Bengaluru, India, Institute of Electrical and Electronics Engineers, 42–51, https://doi.org/10.1109/HiPC.2018.00014.
Beck, A., D. Flad, and C.-D. Munz, 2019: Deep neural networks for data-driven LES closure models. J. Comput. Phys., 398, 108910, https://doi.org/10.1016/j.jcp.2019.108910.
Bhushan, S., G. W. Burgreen, W. Brewer, and I. D. Dettwiller, 2021: Development and validation of a machine learned turbulence model. Energies, 14, 1465, https://doi.org/10.3390/en14051465.
Boyd, S. P., and L. Vandenberghe, 2004: Convex Optimization. Cambridge University Press, 716 pp.
Brenowitz, N. D., and C. S. Bretherton, 2018: Prognostic validation of a neural network unified physics parameterization. Geophys. Res. Lett., 45, 6289–6298, https://doi.org/10.1029/2018GL078510.
Brenowitz, N. D., and C. S. Bretherton, 2019: Spatially extended tests of a neural network parametrization trained by coarse-graining. J. Adv. Model. Earth Syst., 11, 2728–2744, https://doi.org/10.1029/2019MS001711.
Brenowitz, N. D., B. Henn, J. McGibbon, S. K. Clark, A. Kwa, W. A. Perkins, O. Watt-Meyer, and C. S. Bretherton, 2020: Machine learning climate model dynamics: Offline versus online performance. arXiv, 2011.03081v1, https://doi.org/10.48550/arXiv.2011.03081.
Bretherton, C. S., and Coauthors, 2022: Correcting coarse-grid weather and climate models by machine learning from global storm-resolving simulations. J. Adv. Model. Earth Syst., 14, e2021MS002794, https://doi.org/10.1029/2021MS002794.
Brewer, W., D. Martinez, M. Boyer, D. Jude, A. Wissink, B. Parsons, J. Yin, and V. Anantharaj, 2021: Production deployment of machine-learned rotorcraft surrogate models on HPC. 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), St. Louis, MO, Institute of Electrical Electronics Engineers, 21–32, https://doi.ieeecomputersociety.org/10.1109/MLHPC54614.2021.00008.
Brunton, S. L., B. R. Noack, and P. Koumoutsakos, 2020: Machine learning for fluid mechanics. Annu. Rev. Fluid Mech., 52, 477–508, https://doi.org/10.1146/annurev-fluid-010719-060214.
Csiszár, I., 1975: I-divergence geometry of probability distributions and minimization problems. Ann. Probab., 3, 146–158, https://doi.org/10.1214/aop/1176996454.
Dozat, T., 2016: Incorporating Nesterov momentum into Adam. Proc. Int. Conf. on Learning Representations, Workshop Track, San Juan, Puerto Rico, ICLR, 1–4.
Duraisamy, K., G. Iaccarino, and H. Xiao, 2019: Turbulence modeling in the age of data. Annu. Rev. Fluid Mech., 51, 357–377, https://doi.org/10.1146/annurev-fluid-010518-040547.
Egele, R., P. Balaprakash, I. Guyon, V. Vishwanath, F. Xia, R. Stevens, and Z. Liu, 2021: AgEBO-tabular: Joint neural architecture and hyperparameter search with autotuned data-parallel training for tabular data. Proc. Int. Conf. for High Performance Computing, Networking, Storage and Analysis, St. Louis, MO, Association for Computing Machinery, 1–14, https://doi.org/10.1145/3458817.3476203.
Gamahara, M., and Y. Hattori, 2017: Searching for turbulence models by artificial neural network. Phys. Rev. Fluids, 2, 054604, https://doi.org/10.1103/PhysRevFluids.2.054604.
Gentine, P., M. Pritchard, S. Rasp, G. Reinaudi, and G. Yacalis, 2018: Could machine learning break the convection parameterization deadlock? Geophys. Res. Lett., 45, 5742–5751, https://doi.org/10.1029/2018GL078202.
Glorot, X., A. Bordes, and Y. Bengio, 2011: Deep sparse rectifier neural networks. Proc. 14th Int. Conf. on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, PMLR, 315–323, https://proceedings.mlr.press/v15/glorot11a.html.
Goodfellow, I., Y. Bengio, and A. Courville, 2016: Deep Learning. MIT Press, 775 pp.
Gopalakrishnan Meena, M., and M. Norman, 2022a: Subgrid-scale effects in cloud-like atmospheric flows: Colliding thermals – Volume 1. Zenodo, accessed 5 January 2022, https://doi.org/10.5281/zenodo.5732523.
Gopalakrishnan Meena, M., and M. Norman, 2022b: Subgrid-scale effects in cloud-like atmospheric flows: Colliding thermals – Volume 2. Zenodo, accessed 5 January 2022, https://doi.org/10.5281/zenodo.5732987.
Gopalakrishnan Meena, M., and M. Norman, 2024: Awfl-sgs-ml: A deep learned spatially local surrogate model of subgrid-scale effects in idealized atmospheric flows. Zenodo, accessed 13 June 2024, https://doi.org/10.5281/zenodo.11636025.
Hannah, W. M., and Coauthors, 2020: Initial results from the super-parameterized E3SM. J. Adv. Model. Earth Syst., 12, e2019MS001863, https://doi.org/10.1029/2019MS001863.
He, K., X. Zhang, S. Ren, and J. Sun, 2016: Deep residual learning for image recognition. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Las Vegas, NV, Institute of Electrical and Electronics Engineers, 770–778, https://doi.org/10.1109/CVPR.2016.90.
Hertel, L., J. Collado, P. Sadowski, J. Ott, and P. Baldi, 2020: Sherpa: Robust hyperparameter optimization for machine learning. SoftwareX, 12, 100591, https://doi.org/10.1016/j.softx.2020.100591.
Huang, G., Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, 2017: Densely connected convolutional networks. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Honolulu, HI, Institute of Electrical and Electronics Engineers, 2261–2269, https://doi.org/10.1109/CVPR.2017.243.
Izmailov, P., D. Podoprikhin, T. Garipov, D. Vetrov, and A. G. Wilson, 2018: Averaging weights leads to wider optima and better generalization. 34th Conf. on Uncertainty in Artificial Intelligence, Vol. 2, Monterey, CA, Association for Uncertainty in Artificial Intelligence, 876–885, https://nyuscholars.nyu.edu/en/publications/averaging-weights-leads-to-wider-optima-and-better-generalization.
Kingma, D. P., and J. Ba, 2014: Adam: A method for stochastic optimization. arXiv, 1412.6980v9, https://doi.org/10.48550/arXiv.1412.6980.
Kochkov, D., J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer, 2021: Machine learning–accelerated computational fluid dynamics. Proc. Natl. Acad. Sci. USA, 118, e2101784118, https://doi.org/10.1073/pnas.2101784118.
Krasnopolsky, V. M., and M. S. Fox-Rabinovitz, 2006: Complex hybrid models combining deterministic and machine learning components for numerical climate modeling and weather prediction. Neural Networks, 19, 122–134, https://doi.org/10.1016/j.neunet.2006.01.002.
Krasnopolsky, V. M., M. S. Fox-Rabinovitz, and A. A. Belochitski, 2013: Using ensemble of neural networks to learn stochastic convection parameterizations for climate and numerical weather prediction models from data simulated by a cloud resolving model. Adv. Artif. Neural Syst., 2013, 485913, https://doi.org/10.1155/2013/485913.
Kullback, S., and R. A. Leibler, 1951: On information and sufficiency. Ann. Math. Stat., 22, 79–86, https://doi.org/10.1214/aoms/1177729694.
Ling, J., A. Kurzawski, and J. Templeton, 2016: Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. J. Fluid Mech., 807, 155–166, https://doi.org/10.1017/jfm.2016.615.
Ma, J., and D. Yarats, 2019: Quasi-hyperbolic momentum and Adam for deep learning. Int. Conf. on Learning Representations, New Orleans, LA, ICLR, 1–38.
Maas, A. L., A. Y. Hannun, and A. Y. Ng, 2013: Rectifier nonlinearities improve neural network acoustic models. Proc. 30th Int. Conf. on Machine Learning, Atlanta, GA, International Machine Learning Society, 1–6.
Maulik, R., O. San, J. D. Jacob, and C. Crick, 2019a: Sub-grid scale model classification and blending through deep learning. J. Fluid Mech., 870, 784–812, https://doi.org/10.1017/jfm.2019.254.
Maulik, R., O. San, A. Rasheed, and P. Vedula, 2019b: Subgrid modelling for two-dimensional turbulence using neural networks. J. Fluid Mech., 858, 122–144, https://doi.org/10.1017/jfm.2018.770.
McGovern, A., R. Lagerquist, D. J. Gagne II, G. E. Jergensen, K. L. Elmore, C. R. Homeyer, and T. Smith, 2019: Making the black box more transparent: Understanding the physical implications of machine learning. Bull. Amer. Meteor. Soc., 100, 2175–2199, https://doi.org/10.1175/BAMS-D-18-0195.1.
Norman, M. R., 2021: A high-order WENO-limited finite-volume algorithm for atmospheric flow using the ADER-differential transform time discretization. Quart. J. Roy. Meteor. Soc., 147, 1661–1690, https://doi.org/10.1002/qj.3989.
Norman, M. R., and Coauthors, 2022: Unprecedented cloud resolution in a GPU-enabled full-physics atmospheric climate simulation on OLCF’s summit supercomputer. Int. J. High Perform. Comput. Appl., 36, 93–105, https://doi.org/10.1177/10943420211027539.
Partee, S., M. Ellis, A. Rigazzi, S. Bachman, G. Marques, A. Shao, and B. Robbins, 2021: Using machine learning at scale in HPC simulations with SmartSim: An application to ocean climate modeling. arXiv, 2104.09355v1, https://doi.org/10.48550/arXiv.2104.09355.
Piomelli, U., P. Moin, and J. H. Ferziger, 1988: Model consistency in large eddy simulation of turbulent channel flows. Phys. Fluids, 31, 1884–1891, https://doi.org/10.1063/1.866635.
Pressel, K. G., C. M. Kaul, T. Schneider, Z. Tan, and S. Mishra, 2015: Large-eddy simulation in an anelastic framework with closed water and entropy balances. J. Adv. Model. Earth Syst., 7, 1425–1456, https://doi.org/10.1002/2015MS000496.
Pressel, K. G., S. Mishra, T. Schneider, C. M. Kaul, and Z. Tan, 2017: Numerics and subgrid-scale modeling in large eddy simulations of stratocumulus clouds. J. Adv. Model. Earth Syst., 9, 1342–1365, https://doi.org/10.1002/2016MS000778.
Randall, D., 2013: Beyond deadlock. Geophys. Res. Lett., 40, 5970–5976, https://doi.org/10.1002/2013GL057998.
Randall, D., M. Khairoutdinov, A. Arakawa, and W. Grabowski, 2003: Breaking the cloud parameterization deadlock. Bull. Amer. Meteor. Soc., 84, 1547–1564, https://doi.org/10.1175/BAMS-84-11-1547.
Rasp, S., M. S. Pritchard, and P. Gentine, 2018: Deep learning to represent subgrid processes in climate models. Proc. Natl. Acad. Sci. USA, 115, 9684–9689, https://doi.org/10.1073/pnas.1810286115.
Schneider, T., S. Lan, A. Stuart, and J. Teixeira, 2017a: Earth system modeling 2.0: A blueprint for models that learn from observations and targeted high-resolution simulations. Geophys. Res. Lett., 44, 12 396–12 417, https://doi.org/10.1002/2017GL076101.
Schneider, T., J. Teixeira, C. S. Bretherton, F. Brient, K. G. Pressel, C. Schär, and A. P. Siebesma, 2017b: Climate goals and computing the future of clouds. Nat. Climate Change, 7, 3–5, https://doi.org/10.1038/nclimate3190.
Srivastava, N., G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, 2014: Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 15, 1929–1958.
Watt-Meyer, O., N. D. Brenowitz, S. K. Clark, B. Henn, A. Kwa, J. McGibbon, W. A. Perkins, and C. S. Bretherton, 2021: Correcting weather and climate models by machine learning nudged historical simulations. Geophys. Res. Lett., 48, e2021GL092555, https://doi.org/10.1029/2021GL092555.
Wyngaard, J. C., 1992: Atmospheric turbulence. Annu. Rev. Fluid Mech., 24, 205–234, https://doi.org/10.1146/annurev.fl.24.010192.001225.
Yuval, J., and P. A. O’Gorman, 2020: Stable machine-learning parameterization of subgrid processes for climate modeling at a range of resolutions. Nat. Commun., 11, 3295, https://doi.org/10.1038/s41467-020-17142-3.
Yuval, J., P. A. O’Gorman, and C. N. Hill, 2021: Use of neural networks for stable, accurate and physically consistent parameterization of subgrid atmospheric processes with good performance at reduced precision. Geophys. Res. Lett., 48, e2020GL091363, https://doi.org/10.1029/2020GL091363.