Spatially Local Surrogate Modeling of Subgrid-Scale Effects in Idealized Atmospheric Flows: A Deep Learned Approach Using High-Resolution Simulation Data

Muralikrishnan Gopalakrishnan Meena,a Matthew R. Norman,a David M. Hall,b and Michael S. Pritchardb,c

a National Center for Computational Sciences, Oak Ridge National Laboratory, Oak Ridge, Tennessee
b NVIDIA Corporation, Lafayette, Colorado
c University of California, Irvine, California

ORCID: Gopalakrishnan Meena, https://orcid.org/0000-0003-4048-4639; Norman, https://orcid.org/0000-0003-4764-3348; Hall, https://orcid.org/0000-0002-0961-1196; Pritchard, https://orcid.org/0000-0002-0340-6327

Open access


Abstract

We introduce a machine learned surrogate model, trained on high-resolution simulation data, to capture the subgrid-scale effects in dry, stratified atmospheric flows. We use deep neural networks (NNs) to model the spatially local state differences between a coarse-resolution simulation and a high-resolution simulation. This setup enables the capture of both dissipative and antidissipative effects in the state differences. The NN model accurately captures the state differences in offline tests outside the training regime. In online tests intended for production use, the NN-coupled coarse simulation is more accurate over a significant period of time than the coarse-resolution simulation without any correction. We provide evidence of the capability of the NN model to accurately capture high-gradient regions in the flow field. As errors accumulate, the NN-coupled simulation becomes computationally unstable after approximately 90 coarse simulation time steps. Insights gained from these surrogate models pave the way for formulating stable, complex, physics-based, spatially local NN models driven by traditional subgrid-scale turbulence closure models.

Significance Statement

Flows in the atmosphere are highly chaotic and turbulent, comprising flow structures of broad scales. For effective computational modeling of atmospheric flows, the effects of both the small- and large-scale structures need to be captured by the simulations. Capturing the small-scale structures requires fine-resolution simulations. Even with the current state-of-the-art supercomputers, simulating these flows for the entire earth over climate time scales can be prohibitively expensive. Thus, it is necessary to resolve only the larger-scale structures with a coarse-resolution simulation while capturing the effects of the smaller-scale structures through a parameterization (approximation) scheme incorporated into the coarse-resolution simulation. We use machine learning to model the effects of the small-scale structures (subgrid-scale effects) in atmospheric flows. Data from a fine-resolution simulation are used to compute the subgrid-scale effects missing from coarse-resolution simulations. We then use machine learning models to approximate these differences between the coarse- and fine-resolution simulations. We see improved accuracy for the coarse-resolution simulations when corrected using these machine learned models.

© 2024 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Muralikrishnan Gopalakrishnan Meena, gopalakrishm@ornl.gov


1. Introduction

The challenge of analyzing turbulence experimentally and numerically is tied to the broad range of multiscale physics constituting such flows. The computational cost of simulating such flows increases significantly with Reynolds number, as is the case for flows encountered in nature and engineering. For atmospheric turbulence specifically, the added complexity of density stratification interacting with cloud microphysics and turbulence poses even greater challenges (Wyngaard 1992). Capturing the complex subgrid-scale processes at the scale of clouds is important for constructing accurate global climate models (Schneider et al. 2017b). Resolving these processes requires increased spatial and temporal resolution in the simulations.

There have been various efforts in the atmospheric science community to capture the effects of subgrid-scale processes through some form of parameterization. Approaches such as large-eddy simulation filter out the high-wavenumber (subgrid scale) structures and resolve only a coarser domain with embedded subgrid-scale (SGS) modeling (Pressel et al. 2015, 2017). The SGS models parameterize the viscous dissipation from the filtered subgrid-scale structures. Cloud resolving models (CRMs) are another example of such a modeling framework, used in simulating global climate models (Randall et al. 2003; Randall 2013; Hannah et al. 2020). CRMs embed a high-resolution cloud model that parameterizes deep convection for the fluid state at that location in the host model.

Such models are limited by the appropriate choice of the parameterization. Moreover, running such models within a global climate model over long climate time scales remains a computational challenge, even with the current generation of accelerator-based high-performance computing systems (Norman et al. 2022). Recently, artificial intelligence has been at the forefront of efforts to build surrogate models that parameterize the subgrid-scale processes (Duraisamy et al. 2019; Schneider et al. 2017a; Brunton et al. 2020), taking advantage of the availability of high-resolution simulation data. These models generally rely on components in the eddy viscosity term of the filtered Navier–Stokes equations, which concentrate on capturing the dissipative effects (Gamahara and Hattori 2017; Rasp et al. 2018; Maulik et al. 2019a,b; Beck et al. 2019; Yuval and O’Gorman 2020). Moreover, deep learning techniques have been used to capture and parameterize the subgrid-scale processes in global climate models (Krasnopolsky and Fox-Rabinovitz 2006; Krasnopolsky et al. 2013; Brenowitz and Bretherton 2018; Gentine et al. 2018; Rasp et al. 2018; Brenowitz and Bretherton 2019).

As stratified turbulent flows encountered in cloud physics also comprise significant antidissipative effects, it is beneficial to capture both the dissipative and antidissipative effects through the SGS parameterization. We approach this problem by directly modeling the state difference between a coarse simulation and a simultaneously running very fine-resolution simulation, enabling us to capture both the dissipative and antidissipative effects from the subgrid-scale processes. An overview of the modeling framework is shown in Fig. 1. We use the two-dimensional inviscid, nonhydrostatic, compressible Euler equations with density stratification to solve canonical mesoscale atmospheric flows and to generate subgrid-scale data as the difference between high- and low-resolution flow field data. Instead of handling the problem on the global spatial domain all at once, we handle it locally using stencils of data (not unlike the nature of convolutions) to reduce the deep learning model size, improve generalization, and take advantage of parallel computations for inference. While the stencil-based framework also increases the number of samples available for training, our deep learning models require only a modest amount of training data. The current approach aims to assess the capability of spatially local stencil-based models to capture the complex nonlinear dynamics of stratified turbulence in idealized atmospheric flows.

Fig. 1. An overview of the NN-based approach to capture the subgrid-scale effects missing in a coarse-resolution simulation. Potential temperature perturbations are used to visualize the flow fields.

A similar approach has recently been utilized by Watt-Meyer et al. (2021) and Bretherton et al. (2022), where the authors use a nudging technique to integrate the machine learned model into the solver. Their model is invoked at a certain time interval, as a correction step, nudging the states back to the ideal or expected values. In our framework, the model is invoked after every time step of the flow solver. To avoid chaotic divergence between the two simulations, the states of the coarse simulation are overwritten with the coarse-grained states of the fine simulation after each time step. This coupling is stricter and more frequent than that of Watt-Meyer et al. (2021). However, such coupling need not be stable for more realistic applications (Bretherton et al. 2022; Clark et al. 2022, manuscript submitted to ESS Open Arch.). Future research is needed to determine which approach is ultimately more helpful toward the goal of developing a surrogate for SGS effects. The dissipative and antidissipative components are embedded into the coarse simulation as it is integrated with the high-resolution simulation acting as a parallel driver.

Moreover, we utilize residual neural networks to increase the model complexity and obtain a highly accurate model with improved stability. Random forest architectures are utilized in the aforementioned efforts in the literature (Watt-Meyer et al. 2021; Bretherton et al. 2022) for modeling the state tendencies; they help stabilize the climate model by limiting the generation of arbitrarily high values by the machine learned model. Although that framework results in lower accuracy than neural networks (NNs) (Brenowitz et al. 2020; Yuval et al. 2021), the stability it brings to online implementation of these models is of significant importance. NNs are known to perform better than random forests on accelerators because NNs are based on dense linear algebra, whereas random forests are based on graph traversals. That is not to say NNs are inherently better, but they do make more efficient use of accelerated hardware.

The current modeling approach for two-dimensional flows is useful as a pilot project for developing the surrogate modeling technique, as shown by previous research (Kochkov et al. 2021): it can give insights into the capability of different ML architectures to capture the nonlinear effects in a canonical flow. The insights gained can aid the modeling of more complex atmospheric flows. The present supervised approach can serve as a novel alternative to the SGS models of CRMs, capturing both dissipative and antidissipative effects of the subgrid-scale processes. The contributions of this work include the use of networks with a small number of parameters enabled by the stencil-based framework, a small amount of data required for training, and single time step corrections (similar to traditional turbulence model implementations) to achieve accurate, generalizable models for capturing the high gradients of the flow states.

In what follows, we first introduce the modeling approach and model problem setup in section 2. Then the results are discussed in section 3. Finally, we provide concluding remarks and a brief discussion on future extension of the current work in section 4.

2. Approach and problem setup

a. General procedure

We train NNs to learn corrections to the flow states of a coarse-resolution simulation, qcoarse, to capture missing subgrid-scale dynamics. The NNs are trained by running a high-resolution simulation in parallel, qfine, and building a training dataset of coarse fields and their corresponding corrections. The majority of subgrid-scale effects missing from the coarse-resolution simulation are captured by the state difference as follows:
$$\Delta q = \overline{q}_{\text{fine}} - q_{\text{coarse}}, \tag{1}$$
where q̄fine denotes qfine interpolated to the coarse-resolution grid by averaging over the stencil of fine-grid cells corresponding to a given coarse cell, reducing the dimension to that of the coarse grid. We provide a detailed description of generating the training data for Δq in section 2b. A sample portrayal of the procedure is shown in Fig. 1. Given coarse-resolution data, the objective is to obtain an NN-learned state difference, ΔqNN, and correct the coarse states to get
$$q_{\text{NN}} = q_{\text{coarse}} + \Delta q_{\text{NN}}. \tag{2}$$
The procedure resembles a mapping of the states from a fine grid to a coarse grid. The coarse-resolution grid results in higher numerical diffusion compared to the high-resolution grid and is unable to capture the small-scale eddies. Thus, the coarse simulation fails to capture the diffusive and high-gradient changes to the states caused by the subgrid-scale structures. However, these diffusive and advective effects are captured by the higher-order domain discretization performed in the fine-resolution grid. Thus, modeling the state difference between the coarse- and fine-resolution simulations enables us to capture both the dissipative and antidissipative (or advection dominated) effects of the subgrid-scale processes. In the following section, we will introduce the flow problem we use to demonstrate the modeling technique. We will discuss the details of the NN architecture and training procedures in sections 2c and 2d.
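As a minimal illustration of these two steps (our sketch, not the authors' released code), the block averaging of Eq. (1) and the correction of Eq. (2) can be written with NumPy arrays, assuming fields stored with shape (nz, nx) and the 5× grid mapping ratio used later in this paper:

```python
import numpy as np

def coarse_grain(q_fine: np.ndarray, ratio: int = 5) -> np.ndarray:
    """Average non-overlapping ratio x ratio blocks of a fine-grid field."""
    nz, nx = q_fine.shape
    return q_fine.reshape(nz // ratio, ratio, nx // ratio, ratio).mean(axis=(1, 3))

# State difference (Eq. 1) and NN-based correction (Eq. 2) for one state variable:
# delta_q = coarse_grain(q_fine) - q_coarse
# q_nn = q_coarse + delta_q_nn   # delta_q_nn is predicted by the trained NN
```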

b. Fluid flow problem

We consider the two-dimensional collision of a hot and a cold thermal as a sample problem to demonstrate the modeling framework (Norman 2021). The colliding thermals create strong discontinuities, strong winds, and significant turbulent regimes as the flow evolves in time. The flow evolution is described by the two-dimensional, dry, compressible, nonhydrostatic Euler equations, given by
$$\frac{\partial}{\partial t}\begin{bmatrix} \rho \\ \rho u \\ \rho w \\ \rho\theta \end{bmatrix} + \frac{\partial}{\partial x}\begin{bmatrix} \rho u \\ \rho u^{2} + p \\ \rho u w \\ \rho u\theta \end{bmatrix} + \frac{\partial}{\partial z}\begin{bmatrix} \rho w \\ \rho u w \\ \rho w^{2} + p - p_{H} \\ \rho w\theta \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ -(\rho - \rho_{H})g \\ 0 \end{bmatrix}, \tag{3}$$

$$\rho_{H} = -\frac{1}{g}\frac{\partial p_{H}}{\partial z}, \tag{4}$$
where ρ is the density; u and w are the wind velocities in the x and z (horizontal and vertical) directions, respectively; θ is the potential temperature; p = C0(ρθ)γ is the pressure (C0 is a constant, γ = cp/cυ, where cp and cυ are the specific heats of dry air at constant pressure and constant volume, respectively); and g is the acceleration due to gravity. The vertical profiles of pressure and density are dominated by hydrostatic balance. The variables pH and ρH are the hydrostatic pressure and density arising from this hydrostasis, defined by Eq. (4).

Equation (3) can be represented in vector form as ∂tq + ∂xf + ∂zg = s with the state vector q = [ρ, ρu, ρw, ρθ]T. We only model the perturbed scalars, ρ′ and (ρθ)′, removing the dominant underlying hydrostatic balance described by Eq. (4). Although inviscid, the Euler equations form the basis of atmospheric flow simulations, with inherent numerical dissipation maintaining stability. Moreover, the two-dimensional setting is an apt choice for an idealized simulation, as the majority of idealized test cases in the literature are two-dimensional.

We show the time evolution of potential temperature from the initial condition to a turbulent state in Fig. 2. The flow is perturbed from a neutrally stratified dry atmosphere. The domain is of size (x, z) ∈ [0, 20] km × [0, 10] km. Slip, solid wall boundary conditions are prescribed for the top and bottom walls. The left and right walls are prescribed with periodic boundary conditions. We choose the grid resolution, nx × nz, of the coarse simulation to be 200 × 100 (grid spacing of 100 m in all directions). The fine-resolution domain has a resolution of 1000 × 500, resulting in a grid mapping ratio of 5× between the coarse and fine domains. This grid mapping leads to a grid spacing of 20 m in the fine-resolution simulation. The time steps are adjusted so that the Courant–Friedrichs–Lewy number is 0.8, corresponding to time steps of 0.23 and 0.046 s for the coarse- and fine-resolution simulations, respectively. We simulate the flows until 2000 s. After the initial laminar regime dominated by antidissipative effects, the flow evolves to a highly turbulent regime from about 500 s onward, as shown in Fig. 2. The temporal regimes consisting of both antidissipative and dissipative effects make the colliding thermal test case a useful model problem for testing the capability of the current framework to capture these effects. Further details of the test case, solver, and numerical schemes can be found in Norman (2021).

Fig. 2. Time evolution of the colliding thermal problem from initial condition to a turbulent state portrayed using potential temperature perturbations, θ′.

To generate the training data for Δq, we start with the fine state qfine and the coarse state qcoarse, which is the coarse-grained high-resolution state q̄fine computed with a 5 × 5 averaging stencil corresponding to the 5× grid mapping ratio. The coarse graining for the boundary cells is computed using ghost cell values based on the respective boundary conditions at the horizontal and vertical walls. We then advance each of these two states, qcoarse and qfine, by a single coarse-resolution time step (five time steps for the fine simulation in the current case) and calculate the state difference, Δq, between them. Finally, qcoarse is replaced by q̄fine at the end of the coarse time step, and the procedure is repeated.
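A sketch of one cycle of this procedure follows, reusing the coarse_grain helper from section 2a and assuming hypothetical helpers step_coarse and step_fine that advance the respective solvers by a given time step:

```python
def generate_training_pair(q_fine, dt_coarse, ratio=5, substeps=5):
    """One data-generation cycle: sync, advance both solvers, record (input, target)."""
    q_coarse = coarse_grain(q_fine, ratio)            # sync: q_coarse <- coarse-grained q_fine
    q_coarse = step_coarse(q_coarse, dt_coarse)       # one coarse-resolution time step
    for _ in range(substeps):                         # five fine-resolution time steps
        q_fine = step_fine(q_fine, dt_coarse / substeps)
    delta_q = coarse_grain(q_fine, ratio) - q_coarse  # training target, Eq. (1)
    return q_coarse, delta_q, q_fine                  # (NN input field, target, carried fine state)
```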

c. Network setup

We deploy a supervised deep learning technique to model the relation between the inputs, a local stencil of spatially coarse-resolution states qcoarse, and the output, the state difference Δq between the center cell of the stencil and the spatially fine-resolution states. Note that the mapping is only in space and not in time; thus, the coarsening is done just in space. We model this mapping between qcoarse and Δq using deep NNs of the form N(qcoarse, w), with w representing the parameters ("weights") of the NN. The learning process involves optimizing the parameters w of the NN by minimizing the loss computed by the function L[Δq, N(qcoarse, w)]. In the current analysis, we use a squared L2 norm as the loss function:
$$\mathcal{L}[\Delta q, \mathcal{N}(q_{\text{coarse}}, \mathbf{w})] = \left\| \Delta q - \mathcal{N}(q_{\text{coarse}}, \mathbf{w}) \right\|_{2}^{2}. \tag{5}$$
A stencil-based approach is used to build the architecture of N(qcoarse, w). We use the stencil around each grid cell in the coarse simulation to model Δq at the respective location, as shown in Fig. 3. The input to N(qcoarse, w) comprises 36 values: the flattened vector of the 4 state variables over the 3 × 3 grid stencil around a given coarse cell. The outputs (4 values) are the components of Δq at the given grid cell (the center of the input stencil). The hidden layer(s) of the NN comprise neurons whose nonlinear activation functions capture the relation between the inputs and outputs (Goodfellow et al. 2016). We compare three types of NN architectures for the hidden layer(s), each with a leaky ReLU activation function with a slope of 0.1 (Glorot et al. 2011; Maas et al. 2013):
  1. A single layer of 45 neurons (single-layer model); 1849 learnable parameters

  2. Ten layers with 45 neurons per layer in a ResNet-based configuration (He et al. 2016) for multilayer perceptrons, which we call the ResNet model; 23 499 learnable parameters (a minimal sketch of this configuration follows the list)

  3. Ten layers with 45 neurons per layer in a DenseNet-based configuration (Huang et al. 2017) for multilayer perceptrons, which we call the DenseNet model; 132 808 learnable parameters
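To make the ResNet-based configuration concrete, here is a minimal PyTorch sketch under our own assumptions about the layer arrangement (the text specifies 36 inputs, 4 outputs, 45 neurons per layer, 10 layers, leaky ReLU with slope 0.1, and input dropout of 0.01; the authors' exact block structure, and hence the parameter count, may differ):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One hidden layer with an additive skip connection (ResNet-style MLP block)."""
    def __init__(self, width: int, slope: float = 0.1):
        super().__init__()
        self.linear = nn.Linear(width, width)
        self.act = nn.LeakyReLU(slope)

    def forward(self, x):
        return x + self.act(self.linear(x))

class StencilResNet(nn.Module):
    """Maps a flattened 3 x 3 stencil of 4 states (36 inputs) to Δq (4 outputs)."""
    def __init__(self, n_in=36, n_out=4, width=45, depth=10, p_drop=0.01):
        super().__init__()
        self.drop = nn.Dropout(p_drop)   # dropout on the input layer
        self.lift = nn.Linear(n_in, width)
        self.blocks = nn.Sequential(*[ResidualBlock(width) for _ in range(depth)])
        self.head = nn.Linear(width, n_out)

    def forward(self, x):
        return self.head(self.blocks(self.lift(self.drop(x))))
```

A DenseNet-style variant would concatenate, rather than add, each block's input to its output.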

Fig. 3. Illustration of the network model (the results discussed focus on the ResNet model) for modeling the state difference using stencil data.

We tune various parameters of the NN architecture, called hyperparameters, through a grid search to arrive at these architectures. The hyperparameters for the current approach are as follows (the values chosen are provided in parentheses):

  1. Number of neurons per layer (varied between 10 and 50; 45 was chosen)

  2. Number of layers (varied between 1 and 20; 10 was chosen)

  3. Activation function (ReLU and LeakyReLU; LeakyReLU with a slope of 0.1 was chosen to accommodate negative values of the states)

  4. Loss function at the output layer (L1, a combination of L1 for large errors and squared L2 for small errors, and squared L2; squared L2 or mean-squared-error loss was chosen)

  5. Regularization values at the input, hidden, and output layers (L1 and L2 regularization with values ranging between 1 × 10−9 and 1 × 10−1; although both L1 and L2 regularization with a value of 1 × 10−6 gave the best results within the range, no regularization was ultimately used, as it did not drastically affect model accuracy or numerical stability)

  6. Drop-out regularization (randomly zeroing out some of the elements of the tensors with a given probability; Srivastava et al. 2014) at the input and hidden layers (varied between 0 and 0.1; only the input layer was given dropout, with a value of 0.01)

  7. Optimization routine (stochastic gradient descent, Adam, and NAdam; NAdam was chosen due to its faster convergence)

  8. Batch size of training samples (varied between 1 and 2048; 1024 was chosen).

We note that these grid searches are exploratory in nature and there are other avenues to find the most optimized NN architecture (Balaprakash et al. 2018; Egele et al. 2021; Hertel et al. 2020).

The single-layer model is used as a shallow, simple NN model to test the capability of NNs to capture the grid mapping. Drop-out regularization (Srivastava et al. 2014) is applied to the input layer to avoid overfitting and stabilize the single-layer model. Adding more linear layers with drop-out regularization to this model (layers equivalent to the only layer in the single-layer model) reduced the stability of the model during testing without a considerable increase in accuracy or ability to capture the nonlinear physics. This leads us to the other two models, the ResNet and DenseNet models. These are deep NNs, used to increase the complexity of the network and accurately capture the highly nonlinear physics. They take advantage of skip connections to enable sparse, complex, nonlinear relations between the input and output (He et al. 2016). The difference between the two lies only in the manner in which the residual (or input) of each layer is incorporated: the first uses addition, whereas the second uses concatenation.

We have found that the simple, small framework of a stencil-based input and output to a network of MLPs performs similarly to convolutional neural networks (CNNs) when tested in unseen data regions. The architecture is intrinsically similar to convolution operations. More importantly, the stencil-based approach mimics a higher-order finite difference scheme. The current formulation does not assume a dissipation- or advection-based mapping for the relationship between qcoarse and Δq. Instead, our approach is to let the NN capture the complex mathematical relations between the states of the center and neighboring cells and the state difference at the center cell in the coarse domain.

It is important to note that the actual domain of dependence for the fine-scale model, i.e., the domain of inputs that determines the outputs, is wider than the 3 × 3 coarse cells over the duration of a coarse-grid time step. Therefore, fully constraining the state difference is likely not possible. Still, we wished to maintain a simple NN model with minimal inputs, and the results will show that the 3 × 3 stencil provides an effective constraining set of inputs.

d. Training and validation of model

We have learned that sampling of the training data is a crucial step for building an accurate and stable NN model, which we elaborate on here. The training dataset comprises 1 million (1-M) samples collected over the temporal regime [0, 900] s. We performed a preliminary grid search on the number of training samples required, spanning from a few hundred thousand to 10-M samples, and found 1-M samples to be sufficient for training accurate models. Note that this sampling set uses less than 2% of the spatial data at each time step of the simulation. For comparison, the number of training data points generated for a full flow field model (like CNN models) would exceed 313-M over the same temporal regime. The small stencil-based framework thus allows us to use significantly fewer samples for training the models.

We have found that the accuracy of the model in regions with high gradients of the states is significantly low when trained with data randomly sampled in space and time. Moreover, the online testing of the model (details of the online testing procedure are discussed later in section 3b) became numerically unstable immediately after starting the simulation. Thus, we use a total variation (TV) weighted random sampling technique. The TV of a sample at the cell (i, j) in the two-dimensional domain is given by
$$V(q_{i,j}) = \sum_{k=-1}^{0} \left\{ \left| q_{i+(2k+1),j} - q_{i,j} \right| + \left| q_{i,j+(2k+1)} - q_{i,j} \right| \right\}. \tag{6}$$
TV identifies local regions with high variations. As the majority of the domain comprises quiescent regions with small variations in the states during the initial transient regime, these regions are not captured by TV weighting. Because TV weighting misses these quiescent regions, models trained only on high-TV samples become biased against the background mean.

To alleviate this bias, we use a 50:50 split to curate samples with and without high TV weighting. To sample n = 106/NTS data points at an instant in time, where NTS is the number of time steps, we apply a strict sampling criterion using the TV of all the states of a data point. First, we perform a min–max normalization of V(q) for each state, computed over the global spatial domain at that instant in time. Next, a threshold is selected to identify n/2 random samples with high TV weighting for all the states. Selecting this threshold poses further challenges, as the flow evolves through both laminar and turbulent regimes, leading to a broad range of state values over time. Thus, the threshold identifying the 50:50 split is manually adjusted for the various flow regimes. Finally, the remaining n/2 samples at an instant in time are randomly drawn from the remaining data points in the domain.
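The following sketch illustrates the TV computation of Eq. (6) and the 50:50 split for one state field, under our own simplifications (interior cells only, whereas the paper uses ghost cells at the boundaries; the threshold value is a hypothetical placeholder that would be tuned per flow regime):

```python
import numpy as np

def total_variation(q: np.ndarray) -> np.ndarray:
    """Local TV, Eq. (6): absolute differences with the four nearest neighbors."""
    v = np.zeros_like(q)
    v[1:-1, 1:-1] = (np.abs(q[:-2, 1:-1] - q[1:-1, 1:-1])    # neighbor below
                     + np.abs(q[2:, 1:-1] - q[1:-1, 1:-1])   # neighbor above
                     + np.abs(q[1:-1, :-2] - q[1:-1, 1:-1])  # left neighbor
                     + np.abs(q[1:-1, 2:] - q[1:-1, 1:-1]))  # right neighbor
    return v

def tv_weighted_sample(q, n, threshold=0.1, rng=np.random.default_rng(0)):
    """50:50 split: n/2 cells above the min-max-normalized TV threshold, n/2 elsewhere."""
    v = total_variation(q)
    v_norm = (v - v.min()) / (v.max() - v.min() + 1e-12)
    hi = np.flatnonzero(v_norm >= threshold)
    lo = np.flatnonzero(v_norm < threshold)
    idx = np.concatenate([rng.choice(hi, n // 2, replace=False),
                          rng.choice(lo, n // 2, replace=False)])
    return np.unravel_index(idx, q.shape)   # (i, j) indices of sampled cells
```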

The input and output data for the NN are scaled with min–max normalization. For the vector of input and output variables of the network, Q = [q, Δq]T, the normalization is given by

$$\tilde{Q} = \frac{Q - \min_{\text{train}} Q}{\max_{\text{train}} Q - \min_{\text{train}} Q}, \tag{7}$$
where Q˜ is the rescaled vector and the operators mintrain and maxtrain refer to the minimum and maximum operators over the training sample space of the variables, respectively.

We train the model in a Python environment using PyTorch and then deploy it in the C++ solver (Norman 2021) with graphics processing unit (GPU) acceleration for production use. The NAdam optimizer is used for optimizing the weights of the NNs (Kingma and Ba 2014; Dozat 2016; Ma and Yarats 2019). For training on the 1-M data points (with a 70:30 split between the training and validation loops and random shuffling of the training samples), we use minibatches of 1024 samples. Even with the use of minibatches, the model incurs high variance in accuracy over the training epochs. We have learned that using a higher learning rate in the initial training epochs until the model accuracy saturates, and then lowering the value, helps reduce this variance. This two-step learning rate reduction uses high and low values of 1 × 10−1 and 5 × 10−2, respectively. Furthermore, for the deep NNs, we also rely on an averaging technique that ensembles the model weights across various training epochs to alleviate the high variance in training accuracy, as shown in Fig. 4. After a regime of stable accuracy is reached over the epochs (3000 epochs is set for our current problem), we collect ensembles of model weights from models with errors below a particular threshold (a mean-square-error value of 2.5 × 10−4 is used). The weights of the final model are an ensemble of these collected weights. Predictions are then made using only this single final model, whose weights are computed from the ensembling procedure. The procedure helps average over the gradient basin in the optimization manifold, aiding in forming a stable solution. This procedure has similarities with the stochastic weight averaging (SWA) method introduced by Izmailov et al. (2018).
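A minimal sketch of this ensembling loop follows, with hypothetical helpers train_one_epoch and validate (returning the validation MSE) and an assumed burn-in period before collection begins; it keeps a running average of the weights from low-error epochs, in the spirit of SWA:

```python
import copy

ENSEMBLE_THRESHOLD = 2.5e-4      # MSE cutoff used in the paper
N_EPOCHS, BURN_IN = 3000, 1000   # BURN_IN is our assumption, not from the paper

avg_state, n_kept = None, 0
for epoch in range(N_EPOCHS):
    train_one_epoch(model, optimizer, train_loader)   # hypothetical helper
    val_mse = validate(model, val_loader)             # hypothetical helper
    if epoch >= BURN_IN and val_mse < ENSEMBLE_THRESHOLD:
        state = copy.deepcopy(model.state_dict())
        n_kept += 1
        if avg_state is None:
            avg_state = state
        else:
            for k in avg_state:                       # running mean of collected weights
                avg_state[k] += (state[k] - avg_state[k]) / n_kept
model.load_state_dict(avg_state)                      # single final ensembled model
```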

Fig. 4. (top) Illustration of collecting ensembles to form the ensemble model weights [similar to the stochastic weight averaging technique of Izmailov et al. (2018)]. The green line represents the training regime over which the ensemble of the model weights is computed. The red shaded region denotes the training instances when the model acquires error values higher than the cutoff error, whose weights are therefore not considered for the ensembling procedure. (bottom) A sample training curve for the current modeling framework.

3. Results

For an effective model predicting the subgrid-scale effects, the model should be both accurate and computationally stable when tested in temporal regimes outside the training regime. As with subgrid-scale turbulence models, we perform both offline and online testing, commonly known as a priori and a posteriori testing in the turbulence literature (Piomelli et al. 1988), of the NN model in turbulent regimes outside the training regime. We also test the models in the initial laminar regime, where antidissipative effects dominate. Testing was performed not only on the flow field snapshots but also on the state gradients, their statistical estimates, root-mean-square (rms) values of the states, total kinetic energy, and grid independence. The results reveal that the ResNet-based model gives the most accurate and computationally stable NN-coupled solver over a finite time period. Herein, we show only the results of the ResNet-based model, while also discussing insights gained from the other models.

a. Offline testing

For the offline testing, the results from the NN model are directly compared with the expected solution and evaluated independently, without coupling the NN model to the numerical solver. We first test the model with data randomly sampled in space and time outside the training regime, as shown in Fig. 5. The comparison between the true and predicted Δq for each flow state is shown in Figs. 5a–d, along with the coefficient of determination (regression score), R2. The ResNet model predicts the state differences with high accuracy and low spread, with R2 > 0.95. The normalized errors have a mean and standard deviation of O(10−3) and O(10−2), respectively.

Fig. 5. Comparison between the true state differences and those predicted by the ResNet-based NN model for random data points in space and time. The coefficient of determination, R2, for each state difference is also provided.

Next, we observe the behavior of the model using data from a full snapshot in a turbulent regime outside the temporal regime of the training data (after 900 s), as shown in Fig. 6 for Δ(ρu). The ResNet model predicts the state difference accurately, with an overall Euclidean error norm of L2 = ‖Δq − ΔqNN‖2/‖Δq‖2 = 0.195. We also plot the L1 error, ϵ = |Δq − ΔqNN|/max(|Δq|), to study the results locally in space, as shown in Fig. 6 (bottom). The main large-scale structures are accurately captured, whereas the smaller-scale features have lower accuracy. Nonetheless, the maximum error is 10%, which is isolated to a small region in the middle-left side of the domain. The DenseNet model performed accurately for both the random samples [normalized errors with a mean and standard deviation of O(10−3) and O(10−2), respectively] and the full flow field (L2 = 0.55) test datasets. Surprisingly, the single-layer model did give accurate results [normalized errors with a mean and standard deviation of O(10−2) and O(10−2), respectively, and L2 = 1.32]. However, neither the DenseNet nor the single-layer model performed as well as the ResNet model.

Fig. 6. Full flow field prediction by the NN model. Corrections of horizontal velocity are shown along with the L1 error norm, ϵ = |Δq − ΔqNN|/max(|Δq|), and the L2 norm of the full flow field, L2 = ‖Δq − ΔqNN‖2/‖Δq‖2.

b. Online testing

The online testing is done by coupling the NN model correction with the coarse flow simulation. A sample depiction of the NN-coupled solver is shown in Fig. 7. The fine simulation is used as the correction until the testing time, t0, after which the NN model is invoked. Therefore, at time t0, the model state is essentially perfect with respect to the fine-grid model. The solution at t0 can also be viewed as the coarse-grained fine-resolution simulation result. We use the PyTorch C++ API to couple the NN model with the C++ solver, performing the in-the-loop ML integration with the scientific solver. A GPU kernel is used to update qcoarse with the machine learned ΔqNN after every time step, following Eq. (2). Even without special batching considerations for online inference or refactoring the code for GPU performance, we attain speed-ups of up to 8× with the NN-coupled solver compared to a fine-resolution simulation. Since the current discussion concentrates on the scientific aspects of the model, we reserve the full computational aspects of the framework for future investigation. In the following, we show the results of testing the NN-coupled solver in a turbulent regime as well as in a laminar regime.
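For illustration, one plausible way to hand the trained model to a C++ solver through the PyTorch C++ API is TorchScript serialization; this is our sketch (the file name and the stencil-extraction helper are hypothetical), with inference batched over all coarse cells at once:

```python
import torch

model.eval()
scripted = torch.jit.script(model)   # serialize the trained stencil model
scripted.save("stencil_resnet.pt")   # loadable in C++ via torch::jit::load

# In-the-loop correction after one coarse time step (Python-side equivalent;
# q_coarse is flattened to shape (nx*nz, 4) here for simplicity):
with torch.no_grad():
    net = torch.jit.load("stencil_resnet.pt").to("cuda")
    stencils = extract_stencils(q_coarse).to("cuda")  # hypothetical: (nx*nz, 36) tensor
    q_coarse = q_coarse + net(stencils).cpu()         # Eq. (2)
```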

Fig. 7. Sample depiction of online testing using the NN model.

We invoke the NN model at time t0 = 1000 s, which is outside the training regime and well within the turbulent regime. Recall that the training data were sampled from t ∈ [0, 900] s. The results shown here are those after the simulation has run to t0 + 5 s, which is approximately 25 coarse-grid time steps. The results for ρu from the coarse-grained fine-resolution simulation (ρu¯fine), the coarse-resolution simulation without correction (ρucoarse-No correction), and the coarse-resolution simulation with correction using the NN model (ρucoarse-NN) are shown in Fig. 8. The overall accuracy of the NN model, quantified by the L2 norm, is similar to that of ρucoarse-No correction. We focus on the eddy at the bottom-right corner of the domain to exemplify the prominence of the diffusive effects of a coarse-grid resolution in the turbulent regime. Without correction after t0 (ρucoarse-No correction, Fig. 8, middle), diffusive effects start to dominate the coarse simulation, and the subgrid-scale effects are not captured. The sharp boundaries of the coherent structure are smoothed in ρucoarse-No correction by the coarse grid compared to the fine grid in ρu¯fine. The vortex core is predicted with reasonable accuracy in ρucoarse-No correction, but the fine-scale boundaries (or gradients) at the edge of the vortex core and in the shear layer regions are not captured.

Fig. 8. Full flow state prediction using NN-based emulation in a turbulent regime outside of the training regime. Snapshots of horizontal velocity (ρu) after 5 s from the initial state of the testing regime are shown for (top) the coarse-grained fine-resolution simulation (ρu¯fine), (middle) the coarse simulation without correction (ρucoarse-No correction), and (bottom) the coarse simulation with correction using the NN model (ρucoarse-NN). The L2 difference norm is also given for ρucoarse-No correction and ρucoarse-NN. A zoomed-in view of the eddy at the bottom-right corner of the domain is shown adjacent to the respective contour plots. The L1 difference with respect to ρu¯fine, ϵ = |q̄fine − qcoarse-No correction or NN|/max(|q̄fine|), is also shown in the right columns. The respective contour plots of the flow fields of ρw, ρ, and θ′ are provided in the online supplemental material document.

The NN model, ρucoarse-NN, accurately predicts the flow evolution with sharp boundaries. The sharp boundaries in the shear-layer regions above and below the vortex core are captured better in ρucoarse-NN than in ρucoarse-No correction. Moreover, the NN model maintains a maximum L1 difference of 6%, better than that of ρucoarse-No correction (7%), as shown in Fig. 8 (bottom right). Note that the chaotic divergence of the modeled flow and its difference from the ideal states are inseparably combined in the L1 difference norm. The main coherent structures are accurately captured in ρucoarse-NN, and the larger L1 errors from the small-scale fluctuations are isolated. Note that these regions of high local error coincide with high-gradient regions.

To provide a clear quantitative measure of the capability of the NN model to capture the sharp boundaries or high-gradient regions, we compute the gradients of ρu in both the x and z directions, as shown in Figs. 9 and 10. The L2 difference norm of the gradients with respect to those of ρu¯fine is also computed to quantify the overall difference in the gradients. Both ∇x(ρucoarse-NN) and ∇z(ρucoarse-NN) have a smaller L2 difference than those of ρucoarse-No correction. The comparatively low local errors of ρucoarse-NN, depicted by the contours of the L1 error, also portray the ability of the NN model to capture the high gradients in the flow field. Since the L2 norm also suffers from the effects of the chaotic divergence of the flow and the spatial repositioning of the fine-scale features, we further use the probability density distributions of the gradients to compare the three simulations.

Fig. 9. (top) Comparison of gradients of ρu in the x direction for (left) the coarse-grained fine-resolution simulation (ρu¯fine), (middle) the coarse simulation without correction (ρucoarse-No correction), and (right) the coarse simulation with correction using the NN model (ρucoarse-NN). The L2 difference norm is also given for ρucoarse-No correction and ρucoarse-NN. (bottom) The L1 difference with respect to ρu¯fine, ϵ = |∇x(q̄fine) − ∇x(qcoarse-No correction or NN)|/max[|∇x(q̄fine)|]. The respective contour plots of the flow fields of ρw, ρ, and θ′ are provided in the supplemental material document.

Fig. 10. (top) Comparison of gradients of ρu in the z direction for (left) the coarse-grained fine-resolution simulation (ρu¯fine), (middle) the coarse simulation without correction (ρucoarse-No correction), and (right) the coarse simulation with correction using the NN model (ρucoarse-NN). The L2 difference norm is also given for ρucoarse-No correction and ρucoarse-NN. (bottom) The L1 difference with respect to ρu¯fine, ϵ = |∇z(q̄fine) − ∇z(qcoarse-No correction or NN)|/max[|∇z(q̄fine)|]. The respective contour plots of the flow fields of ρw, ρ, and θ′ are provided in the supplemental material document.

The probability density distributions of the gradients ∇x(ρu) and ∇z(ρu) for the three simulations are shown in Fig. 11. The NN model enables the distributions of both ∇x(ρu) and ∇z(ρu) to better match those of ρu¯fine, particularly at high gradient values. The distributions of the gradients of ρucoarse-No correction fall below those of ρu¯fine at large values. The Kullback–Leibler (KL) divergence (Kullback and Leibler 1951; Csiszár 1975) is used to compare the probability distributions, which for distributions q and q̂ is given by

$$D_{\mathrm{KL}}(q, \hat{q}) = \begin{cases} q \log(q/\hat{q}) - q + \hat{q}, & q > 0,\ \hat{q} > 0 \\ \hat{q}, & q = 0,\ \hat{q} \geq 0 \\ \infty, & \text{otherwise}, \end{cases} \tag{8}$$
where the function is nonnegative and jointly convex in q and q̂ (Boyd and Vandenberghe 2004). The observations regarding the distributions are quantified by the lower KL divergence values of the ρucoarse-NN distributions compared to ρucoarse-No correction.
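As an aside, a histogram-based estimate of this divergence can be sketched as follows (our construction; the paper does not specify the binning used):

```python
import numpy as np

def kl_divergence(ref_field, test_field, bins=100):
    """Histogram estimate of D_KL(q, q_hat) per Eq. (8); binning is our choice."""
    lo = min(ref_field.min(), test_field.min())
    hi = max(ref_field.max(), test_field.max())
    q, _ = np.histogram(ref_field, bins=bins, range=(lo, hi))
    qh, _ = np.histogram(test_field, bins=bins, range=(lo, hi))
    q = q / q.sum()      # bin probabilities for the reference distribution
    qh = qh / qh.sum()   # bin probabilities for the tested distribution
    both = (q > 0) & (qh > 0)
    # First branch of Eq. (8); bins with q > 0 but q_hat = 0 are formally
    # infinite and are ignored in this sketch. The q = 0 branch contributes q_hat.
    return float(np.sum(q[both] * np.log(q[both] / qh[both]) - q[both] + qh[both])
                 + np.sum(qh[q == 0]))
```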
Fig. 11. Comparison of the probability density distributions of the gradients (left) ∇x(ρu) and (right) ∇z(ρu) for the coarse-grained fine-resolution simulation (ρu¯fine, black dash), the coarse simulation without correction (ρucoarse-No correction, blue dash), and the coarse simulation with correction using the NN model (ρucoarse-NN, orange dash). The Kullback–Leibler divergences, DKL, of the ρucoarse-No correction and ρucoarse-NN distributions with respect to ρu¯fine are also denoted.

We summarize the L2 difference norm and the KL divergence values with respect to ρu¯fine for all the flow variables in Table 1. As seen from the L2 values, the NN model has higher accuracy at predicting the gradients of the velocities ρu and ρw but lower accuracy for the gradients of density ρ and potential temperature perturbations θ′. The increase in L2 error for θ′ and ρ with the NN model could suggest that the hydrostasis has not been fully captured, since both of these state variables are dominated by hydrostasis. The comparison of the KL divergence of the gradients shows that the NN-model-based simulation outperforms the uncorrected coarse simulation at capturing sharp gradients for all flow variables. Next, we address the ability of the model to scale to spatial-grid resolutions other than the one for which it was trained, which would suggest that the model generalizes across resolutions.

Table 1. Comparison of ∇x(⋅) and ∇z(⋅) for qcoarse-No correction (abbreviated NC) and qcoarse-NN using the L2 difference norm and the Kullback–Leibler divergence, DKL, with respect to the coarse-grained fine-resolution simulation. The results are for online testing performed from t0 = 1000 s, and the metrics are computed at t0 + 5 s. The values in bold indicate comparatively lower error.

We test the generalization of the model to grid resolutions other than that used for training, probing ideas of scale invariance for the NN model (and potentially the dynamics themselves). We perform online testing of the model trained using data from the 1000 × 500 → 200 × 100 grid mapping on the following grid mappings: 1) 500 × 250 → 100 × 50, 2) 1500 × 750 → 300 × 150, and 3) 2000 × 1000 → 400 × 200. Note that the grid mapping ratio of 5× is retained for each of the mappings. The total error at an instant in the turbulent regime, at t0 + 5 s, is shown in Fig. 12. The θ′ fields are also shown as insets for reference. The accuracy of the NN model for all the flow states scales with the grid mapping: the error increases linearly with grid resolution.

Fig. 12. Behavior of the model accuracy in online testing with simulation setups of various grid resolutions. The model was trained with stencil data from the 1000 × 500 → 200 × 100 grid mapping. All other cases have the same grid-mapping ratio of 5× between the respective coarse and fine grids. The total L1 error is given by ϕ = (Σi,j |qNN,i,j − q̄fine,i,j|)/(Σi,j |q̄fine,i,j|), where q̄fine is the interpolated fine-resolution state (coarse-grained fine-resolution simulation) corresponding to each tested coarse grid and the sums run over all nx × nz cells. Insets show the coarse-grid potential temperature perturbation fields for each grid mapping.

These results suggest various advantages of the stencil-based architecture used in the current modeling framework. An NN model trained at a lower grid resolution can be used at a higher resolution, albeit with slightly more error, saving significant computational costs during training. Moreover, such scaling would be difficult to achieve if the surrogate model had global spatial and/or temporal dependence, which the current local, stencil-based framework avoids. While we attribute this scaling to the stencil-based approach, which resembles a finite difference scheme, we have yet to fully understand the mechanism behind this behavior of the model, and this investigation will be performed in future work.

With spatial accuracy comes the next challenge: the temporal stability of the NN-coupled solver. The high spatial accuracy attained by the NN does not remain numerically stable indefinitely. As the flow evolves, the simulation becomes unstable, which the laminar results will show is most likely attributable to new and growing extrema produced by the NN model (see Fig. 17). The root-mean-square (rms) values of the wind speeds and the total kinetic energy over time are shown in Fig. 13. The spatial rms of the wind speeds and the total kinetic energy are defined by

$$u_{\mathrm{rms}} = \left[ \frac{1}{n_x n_z} \sum_{i,j}^{n_x, n_z} \mathbf{u}_{i,j}^{2} \right]^{1/2} \quad \text{and} \quad \text{Total KE} = \frac{1}{2} \sum_{i,j}^{n_x, n_z} \left[ (\rho u)_{i,j}^{2} + (\rho w)_{i,j}^{2} \right], \tag{9}$$

respectively, where u = [ρu, ρw]T. We also plot the L1 difference of the rms velocity and the total KE with respect to the coarse-grained fine simulation in the right column of Fig. 13, computed as ϵ = |(urms,coarse-NN − urms,fine)/urms,fine|. Overestimates of variability, as seen previously in the L1 difference plots of Fig. 8, accumulate over time until the solver becomes unstable at t0 + 20 s, which is after ∼90 coarse-grid time steps. However, in the stable regime, the (ρu)rms of the coarse simulation with NN correction maintains the average rms of the field over time better than the coarse simulation without correction (see the blue lines in Fig. 13, top right). The accuracy of the (ρw)rms of the coarse simulation with NN correction is comparable to that of the coarse simulation without correction. The total KE (Fig. 13, bottom right) shows that the NN-based correction indeed outperforms the coarse simulation without correction during most of the stable temporal regime.
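These diagnostics are straightforward to reproduce; a small sketch, assuming momentum fields stored as 2D NumPy arrays:

```python
import numpy as np

def rms(field: np.ndarray) -> float:
    """Spatial root-mean-square of a momentum component [Eq. (9), left]."""
    return float(np.sqrt(np.mean(field ** 2)))

def total_ke(rho_u: np.ndarray, rho_w: np.ndarray) -> float:
    """Total kinetic energy diagnostic summed over the grid [Eq. (9), right]."""
    return float(0.5 * np.sum(rho_u ** 2 + rho_w ** 2))

# L1 difference of a diagnostic relative to the coarse-grained fine simulation:
# eps = abs((rms(rho_u_nn) - rms(rho_u_fine)) / rms(rho_u_fine))
```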
Fig. 13. Comparison of (top) the rms values of the flow velocities and (bottom) the total kinetic energy over time. The lines represent the coarse-grained fine-resolution simulation (solid), the coarse simulation without correction from t0 = 1000 s (dotted), and the coarse simulation with NN correction (dashed). (left) The raw data and (right) the L1 difference with respect to the coarse-grained fine-resolution simulation.

The results suggest that the NN-based correction is highly accurate over a finite time interval of ∼90 coarse-grid time steps, whereas further research is needed to stabilize the solver against the accumulation of errors in high-gradient regions. We note that the DenseNet model performs similarly to the ResNet model, becoming unstable after t0 + 12 s but with higher error levels before the instability. The single-layer model is far less robust, becoming unstable after only t0 + 3 s. This shortcoming of the single-layer model suggests the utility of a more complex formulation of the NN model with multiple layers and skip connections.

Various regularization techniques can be used to address these overestimates, although we did not see much change with such strategies. Utilizing the skip connections in ResNets was one of the ways the overestimates were controlled, but not fully alleviated. Since the overestimates do not satisfy the physics constraints of the fluid flow system, physics-informed techniques could be used to constrain the variability. We provide insights from our ongoing work on improving the stability of the NN-coupled solver in the concluding section.

Testing in the laminar regime

Testing the model in a turbulent regime shows its capability to capture predominantly dissipative effects. Next, we demonstrate the capability of the NN model to capture predominantly antidissipative effects by testing it in a transient regime dominated by laminar flow, where mixing is not prevalent. For this testing in the laminar regime, we invoke the NN model from t0 = 300 s. The results after the simulation has run to t0 + 5 s are shown in Fig. 14 (similar to the plots in the turbulent regime, Fig. 8). The NN model accurately predicts the corrections in the laminar regime as well, with a difference (L2 = 0.009) smaller than that incurred by the coarse simulation without correction (L2 = 0.013).

Fig. 14. Full flow state prediction using NN-based emulation in a laminar regime outside of the training regime. Snapshots of horizontal velocity (ρu) after 5 s from the initial state of the testing regime are shown for (top) the coarse-grained fine-resolution simulation (ρu¯fine), (middle) the coarse simulation without correction (ρucoarse-No correction), and (bottom) the coarse simulation with correction using the NN model (ρucoarse-NN). The L2 difference norm is also given for ρucoarse-No correction and ρucoarse-NN. The L1 difference with respect to ρu¯fine, ϵ = |q̄fine − qcoarse-No correction or NN|/max(|q̄fine|), is also shown in the right column. The respective contour plots of the flow fields of ρw, ρ, and θ′ are provided in the supplemental material document.

We also plot the gradients of ρu in both the x and z directions, as shown in Figs. 15 and 16. The NN correction outperforms the coarse simulation without correction for both ∇xρu and ∇zρu. Moreover, Table 2 shows that the NN correction outperforms the coarse simulation without correction at capturing the high gradients for all the variables. These results in the laminar regime demonstrate the capability of the NN model to accurately capture high gradients in a flow regime not dominated by mixing and diffusion. Moreover, note that the NN model is accurate for both ρu and θ′, unlike in the turbulent regime. This observation suggests that we need to be more careful when choosing the variables being modeled and include the effects of hydrostasis along with the perturbed variables. It could also suggest that the NN model treats mixing in the vertical direction the same as in the horizontal direction; thus, a more anisotropic framework may suit the current flow setup.

Fig. 15. (top) Comparison of gradients of ρu in the x direction in the laminar regime for (left) the coarse-grained fine-resolution simulation (ρu¯fine), (middle) the coarse simulation without correction (ρucoarse-No correction), and (right) the coarse simulation with correction using the NN model (ρucoarse-NN). The L2 difference norm is also given for ρucoarse-No correction and ρucoarse-NN. (bottom) The L1 difference with respect to ρu¯fine, ϵ = |∇x(q̄fine) − ∇x(qcoarse-No correction or NN)|/max[|∇x(q̄fine)|]. The respective contour plots of the flow fields of ρw, ρ, and θ′ are provided in the supplemental material document.

Fig. 16. (top) Comparison of gradients of ρu in the z direction in the laminar regime for (left) the coarse-grained fine-resolution simulation (ρu¯fine), (middle) the coarse simulation without correction (ρucoarse-No correction), and (right) the coarse simulation with correction using the NN model (ρucoarse-NN). The L2 difference norm is also given for ρucoarse-No correction and ρucoarse-NN. (bottom) The L1 difference with respect to ρu¯fine, ϵ = |∇z(q̄fine) − ∇z(qcoarse-No correction or NN)|/max[|∇z(q̄fine)|]. The respective contour plots of the flow fields of ρw, ρ, and θ′ are provided in the supplemental material document.

Table 2. Comparison of ∇x(⋅) and ∇z(⋅) for qcoarse-No correction (abbreviated NC) and qcoarse-NN using the L2 difference norm and the Kullback–Leibler divergence, DKL, with respect to the coarse-grained fine-resolution simulation. The results are for online testing performed from t0 = 300 s, and the metrics are computed at t0 + 5 s. The values in bold indicate comparatively lower error.

We analyze the states along a one-dimensional slice, which helps reveal the performance of the model at capturing sharp gradients in the laminar regime and, hence, demonstrates the ability of the model to capture antidissipative effects. For the above online test results at time t0 + 5 s, we analyze the flow states along a horizontal slice at z = 6 km, as shown in Fig. 17. We chose the slice at z = 6 km deliberately, where sharp gradients are present. The results show that the NN model (orange lines) captures the most highly variable regions more accurately than the uncorrected coarse simulation (blue lines), e.g., at x = 6 and 15 km in Fig. 17a. However, in some regions, such as x ∈ [11, 13] km, the NN-corrected coarse simulation performs worse than the uncorrected coarse simulation. Nonetheless, even though qcoarse-No correction captures the sharp gradients over the short testing time period of 5 s, these will eventually be smoothed (diffused) over a finite period of time. The results of qcoarse (the coarse simulation starting from t0 = 0 s without any correction; green lines) demonstrate this inability of the coarse simulation to retain sharp gradients over time. All the above results suggest that the NN model is capable of capturing and steepening gradients when appropriate, particularly in regions not dominated by mixing.

Fig. 17.

Slices of flow states at z = 6 km from the results in Fig. 14 at time t = t0 + 5 s. The line legend is as follows: the dashed line (– –) is qfine, the blue line is qcoarse-No correction, the orange line is qcoarse-NN, and the green line is qcoarse, the coarse simulation run without any correction from the initial condition at t = 0 s.


4. Concluding remarks and future work

We use deep learning to generate spatially local surrogate models from high-resolution data for capturing subgrid-scale effects in dry, stratified turbulence in atmospheric flows. Starting from the inviscid Euler equations with density stratification, we generate subgrid-scale data as the difference between high- and low-resolution flow field data. The coarse-resolution simulation is kept in sync with the fine-resolution simulation via coarsening interpolation (coarse graining) after each time step to avoid chaotic nonlinear divergence. The results show that a deep ResNet-type NN model using the stencil information around a given grid cell is able to accurately perform online tests for certain flow states and is stable over a finite time period. Such accuracy is consistent over both laminar and turbulent regimes, which are dominated by antidissipative and dissipative effects, respectively. Even without special batching considerations or refactoring the code for GPU performance, the NN-coupled solver attains speedups of up to 8× compared to the fine-resolution simulation. Moreover, we find that the model generalizes well across grid resolutions even though it is trained with data from a single resolution, offering potential computational savings by training at a lower resolution and running inference at a higher resolution.
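To make the model structure concrete, below is a minimal PyTorch sketch of a stencil-based ResNet-type surrogate. The stencil size, layer width, block count, and activation are illustrative placeholders rather than the tuned architecture used in this work, and the four input/output channels stand in for the modeled state variables and their predicted state differences.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Fully connected residual block in the style of He et al. (2016)."""
    def __init__(self, width):
        super().__init__()
        self.fc1 = nn.Linear(width, width)
        self.fc2 = nn.Linear(width, width)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(x + self.fc2(self.act(self.fc1(x))))

class StencilSurrogate(nn.Module):
    """Maps the flattened stencil of coarse-grid states around a cell
    to the predicted coarse/fine state difference at that cell."""
    def __init__(self, n_vars=4, stencil=3, width=128, n_blocks=4):
        super().__init__()
        in_dim = n_vars * stencil * stencil
        self.net = nn.Sequential(
            nn.Linear(in_dim, width), nn.ReLU(),
            *[ResidualBlock(width) for _ in range(n_blocks)],
            nn.Linear(width, n_vars),
        )

    def forward(self, x):  # x: (batch, n_vars * stencil * stencil)
        return self.net(x)
```

Because each cell's flattened stencil maps independently to that cell's predicted state difference, inference batches naturally over all cells of the coarse grid.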

We find that in turbulent regimes, the model accurately predicts flow velocities but has lower accuracy in capturing the behavior of density and potential temperature. This suggests that hydrostasis may have to be explicitly taken into consideration in the framework. We note that while the model outperforms the uncorrected coarse simulation at capturing high gradients in the flow states, in some spatial regions it has low accuracy in capturing the state differences. This could stem from the limited ability of the chosen simple neural network–based models to emulate highly discontinuous and variable phenomena, or from the need for more training data in this regime. Due to the accumulation of such errors over time, the NN-coupled simulation becomes computationally unstable after a finite time, approximately 90 coarse-grid time steps. The high accuracy of the model for certain flow states, and its capability to capture high gradients during the period in which it remains numerically stable, are encouraging; the lessons from this pilot study can be used to further enhance the capabilities of the framework.
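The online coupling and its failure mode can be summarized schematically as follows; coarse_step and nn_correct are hypothetical stand-ins for the coarse solver update and the trained NN correction, and the blow-up tolerance is arbitrary.

```python
import numpy as np

def run_coupled(coarse_step, nn_correct, q0, n_steps, blowup_tol=1e3):
    """Advance the coarse solver with the NN state-difference correction
    applied after every coarse time step, monitoring for blow-up."""
    q = q0.copy()
    for n in range(n_steps):
        q = coarse_step(q)        # one uncorrected coarse-grid step
        q = q + nn_correct(q)     # add the predicted subgrid correction
        if not np.all(np.isfinite(q)) or np.max(np.abs(q)) > blowup_tol:
            print(f"Instability detected at step {n}")
            break
    return q
```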

The present supervised learning approach of capturing the state difference using a stencil framework could serve as a novel alternative to the subgrid-scale models in large-eddy simulations and CRMs, capturing both dissipative and antidissipative effects of the subgrid-scale phenomena. We are currently investigating alternative formulations to enhance the numerical stability of the model. Complex diffusion-based approaches inspired by traditional SGS models offer one alternative in the physics-informed realm. Such models, along with enforcing the conservation of relevant quantities, can help stabilize the NN-coupled solver (Ling et al. 2016). Moreover, recurrent neural networks and generative adversarial networks could be used to correct the emulation. Guarding the solver against the generation of spurious extrema can also help stabilize the NN-coupled solver.
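As a simple illustration of the kind of conservation constraint meant here, one could project the predicted correction so that it does not alter the domain-integrated mass. The sketch below assumes the correction is stored as an (nz, nx, n_vars) array with density in channel 0; the cited works use more sophisticated, embedded-invariance formulations.

```python
import numpy as np

def project_mass_conserving(correction, cell_area):
    """Remove the domain-mean density increment so the NN correction
    neither adds nor removes mass.
    correction: (nz, nx, n_vars) state increments, density in channel 0.
    cell_area: (nz, nx) grid-cell areas (or volumes in 3D)."""
    d_rho = correction[..., 0]
    mean_incr = np.sum(d_rho * cell_area) / np.sum(cell_area)
    correction[..., 0] = d_rho - mean_incr
    return correction
```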

The current approach is exploratory work to assess the capabilities and nuances of spatially local stencil-based models in capturing the complex nonlinear dynamics of stratified turbulence in idealized atmospheric flows. We do not perform a comparison with spatially global models, such as convolutional neural network (CNN) models that consider the full flow field; such a comparison between local stencil-based and global CNN-based models is an important topic for future research. Moreover, while we have tested the generalizability of the model across different grid resolutions, we have not tested its generalizability to other types of flows. In this pilot study, we tackle a single type of flow, comprising both laminar and turbulent regimes, to understand the capability of the framework to model subgrid-scale effects. Future work will involve extending the framework to different types of flows. Such efforts would require updating the framework according to the equation of state being solved and retraining the model. A more readily achievable goal would be to incorporate appropriate flow parameters into the model so that a single model can be tested at different parameter regimes.

Another aspect we have not explored in depth in this manuscript is the explainability and interpretability of the NN models. Recently, various techniques have been used to understand the physical implications of machine learning, particularly in the meteorological domain (McGovern et al. 2019). Finally, performant, portable, and scalable integration of the model into the solver on hybrid architectures (Partee et al. 2021; Brewer et al. 2021; Bhushan et al. 2021) is essential for practical production deployment of such models in climate models.

Acknowledgments.

The authors thank Kyle Pressel for insightful discussions. This research was supported by the Exascale Computing Project (17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration. This research used resources from the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. This manuscript has been authored by UT-Battelle, LLC, under contract DE-AC05-00OR22725 with the U.S. Department of Energy (DOE). The U.S. government retains, and the publisher, by accepting the article for publication, acknowledges that the U.S. government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this manuscript or allow others to do so, for U.S. government purposes. The DOE will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

Data availability statement.

The specific data, codes, and trained models are openly accessible in Gopalakrishnan Meena and Norman (2024). A collection of sample datasets used for training and testing can also be found in Gopalakrishnan Meena and Norman (2022a,b). The source code for the original ILES method, AWFL, used in this work can be found at https://github.com/mrnorman/awflCloud.

REFERENCES

  • Balaprakash, P., M. Salim, T. D. Uram, V. Vishwanath, and S. M. Wild, 2018: DeepHyper: Asynchronous hyperparameter search for deep neural networks. IEEE 25th Int. Conf. on High Performance Computing, Bengaluru, India, Institute of Electrical and Electronics Engineers, 42–51, https://doi.org/10.1109/HiPC.2018.00014.

  • Beck, A., D. Flad, and C.-D. Munz, 2019: Deep neural networks for data-driven LES closure models. J. Comput. Phys., 398, 108910, https://doi.org/10.1016/j.jcp.2019.108910.

  • Bhushan, S., G. W. Burgreen, W. Brewer, and I. D. Dettwiller, 2021: Development and validation of a machine learned turbulence model. Energies, 14, 1465, https://doi.org/10.3390/en14051465.

  • Boyd, S. P., and L. Vandenberghe, 2004: Convex Optimization. Cambridge University Press, 716 pp.

  • Brenowitz, N. D., and C. S. Bretherton, 2018: Prognostic validation of a neural network unified physics parameterization. Geophys. Res. Lett., 45, 6289–6298, https://doi.org/10.1029/2018GL078510.

  • Brenowitz, N. D., and C. S. Bretherton, 2019: Spatially extended tests of a neural network parametrization trained by coarse-graining. J. Adv. Model. Earth Syst., 11, 2728–2744, https://doi.org/10.1029/2019MS001711.

  • Brenowitz, N. D., B. Henn, J. McGibbon, S. K. Clark, A. Kwa, W. A. Perkins, O. Watt-Meyer, and C. S. Bretherton, 2020: Machine learning climate model dynamics: Offline versus online performance. arXiv, 2011.03081v1, https://doi.org/10.48550/arXiv.2011.03081.

  • Bretherton, C. S., and Coauthors, 2022: Correcting coarse-grid weather and climate models by machine learning from global storm-resolving simulations. J. Adv. Model. Earth Syst., 14, e2021MS002794, https://doi.org/10.1029/2021MS002794.

  • Brewer, W., D. Martinez, M. Boyer, D. Jude, A. Wissink, B. Parsons, J. Yin, and V. Anantharaj, 2021: Production deployment of machine-learned rotorcraft surrogate models on HPC. 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC), St. Louis, MO, Institute of Electrical and Electronics Engineers, 21–32, https://doi.ieeecomputersociety.org/10.1109/MLHPC54614.2021.00008.

  • Brunton, S. L., B. R. Noack, and P. Koumoutsakos, 2020: Machine learning for fluid mechanics. Annu. Rev. Fluid Mech., 52, 477–508, https://doi.org/10.1146/annurev-fluid-010719-060214.

  • Csiszár, I., 1975: I-divergence geometry of probability distributions and minimization problems. Ann. Probab., 3, 146–158, https://doi.org/10.1214/aop/1176996454.

  • Dozat, T., 2016: Incorporating Nesterov momentum into Adam. Proc. Int. Conf. on Learning Representations, Workshop Track, San Juan, Puerto Rico, ICLR, 1–4.

  • Duraisamy, K., G. Iaccarino, and H. Xiao, 2019: Turbulence modeling in the age of data. Annu. Rev. Fluid Mech., 51, 357–377, https://doi.org/10.1146/annurev-fluid-010518-040547.

  • Egele, R., P. Balaprakash, I. Guyon, V. Vishwanath, F. Xia, R. Stevens, and Z. Liu, 2021: AgEBO-tabular: Joint neural architecture and hyperparameter search with autotuned data-parallel training for tabular data. Proc. Int. Conf. for High Performance Computing, Networking, Storage and Analysis, St. Louis, MO, Association for Computing Machinery, 1–14, https://doi.org/10.1145/3458817.3476203.

  • Gamahara, M., and Y. Hattori, 2017: Searching for turbulence models by artificial neural network. Phys. Rev. Fluids, 2, 054604, https://doi.org/10.1103/PhysRevFluids.2.054604.

  • Gentine, P., M. Pritchard, S. Rasp, G. Reinaudi, and G. Yacalis, 2018: Could machine learning break the convection parameterization deadlock? Geophys. Res. Lett., 45, 5742–5751, https://doi.org/10.1029/2018GL078202.

  • Glorot, X., A. Bordes, and Y. Bengio, 2011: Deep sparse rectifier neural networks. Proc. 14th Int. Conf. on Artificial Intelligence and Statistics, Ft. Lauderdale, FL, PMLR, 315–323, https://proceedings.mlr.press/v15/glorot11a.html.

  • Goodfellow, I., Y. Bengio, and A. Courville, 2016: Deep Learning. MIT Press, 775 pp.

  • Gopalakrishnan Meena, M., and M. Norman, 2022a: Subgrid-scale effects in cloud-like atmospheric flows: Colliding thermals – Volume 1. Zenodo, accessed 5 January 2022, https://doi.org/10.5281/zenodo.5732523.

  • Gopalakrishnan Meena, M., and M. Norman, 2022b: Subgrid-scale effects in cloud-like atmospheric flows: Colliding thermals – Volume 2. Zenodo, accessed 5 January 2022, https://doi.org/10.5281/zenodo.5732987.

  • Gopalakrishnan Meena, M., and M. Norman, 2024: Awfl-sgs-ml: A deep learned spatially local surrogate model of subgrid-scale effects in idealized atmospheric flows. Zenodo, accessed 13 June 2024, https://doi.org/10.5281/zenodo.11636025.

  • Hannah, W. M., and Coauthors, 2020: Initial results from the super-parameterized E3SM. J. Adv. Model. Earth Syst., 12, e2019MS001863, https://doi.org/10.1029/2019MS001863.

  • He, K., X. Zhang, S. Ren, and J. Sun, 2016: Deep residual learning for image recognition. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Las Vegas, NV, Institute of Electrical and Electronics Engineers, 770–778, https://doi.org/10.1109/CVPR.2016.90.

  • Hertel, L., J. Collado, P. Sadowski, J. Ott, and P. Baldi, 2020: Sherpa: Robust hyperparameter optimization for machine learning. SoftwareX, 12, 100591, https://doi.org/10.1016/j.softx.2020.100591.

  • Huang, G., Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, 2017: Densely connected convolutional networks. Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Honolulu, HI, Institute of Electrical and Electronics Engineers, 2261–2269, https://doi.org/10.1109/CVPR.2017.243.

  • Izmailov, P., D. Podoprikhin, T. Garipov, D. Vetrov, and A. G. Wilson, 2018: Averaging weights leads to wider optima and better generalization. 34th Conf. on Uncertainty in Artificial Intelligence, Vol. 2, Monterey, CA, Association for Uncertainty in Artificial Intelligence, 876–885, https://nyuscholars.nyu.edu/en/publications/averaging-weights-leads-to-wider-optima-and-better-generalization.

  • Kingma, D. P., and J. Ba, 2014: Adam: A method for stochastic optimization. arXiv, 1412.6980v9, https://doi.org/10.48550/arXiv.1412.6980.

  • Kochkov, D., J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer, 2021: Machine learning–accelerated computational fluid dynamics. Proc. Natl. Acad. Sci. USA, 118, e2101784118, https://doi.org/10.1073/pnas.2101784118.

  • Krasnopolsky, V. M., and M. S. Fox-Rabinovitz, 2006: Complex hybrid models combining deterministic and machine learning components for numerical climate modeling and weather prediction. Neural Networks, 19, 122–134, https://doi.org/10.1016/j.neunet.2006.01.002.

  • Krasnopolsky, V. M., M. S. Fox-Rabinovitz, and A. A. Belochitski, 2013: Using ensemble of neural networks to learn stochastic convection parameterizations for climate and numerical weather prediction models from data simulated by a cloud resolving model. Adv. Artif. Neural Syst., 2013, 485913, https://doi.org/10.1155/2013/485913.

  • Kullback, S., and R. A. Leibler, 1951: On information and sufficiency. Ann. Math. Stat., 22, 79–86, https://doi.org/10.1214/aoms/1177729694.

  • Ling, J., A. Kurzawski, and J. Templeton, 2016: Reynolds averaged turbulence modelling using deep neural networks with embedded invariance. J. Fluid Mech., 807, 155–166, https://doi.org/10.1017/jfm.2016.615.

  • Ma, J., and D. Yarats, 2019: Quasi-hyperbolic momentum and Adam for deep learning. Int. Conf. on Learning Representations, New Orleans, LA, ICLR, 1–38.

  • Maas, A. L., A. Y. Hannun, and A. Y. Ng, 2013: Rectifier nonlinearities improve neural network acoustic models. Proc. 30th Int. Conf. on Machine Learning, Atlanta, GA, International Machine Learning Society, 1–6.

  • Maulik, R., O. San, J. D. Jacob, and C. Crick, 2019a: Sub-grid scale model classification and blending through deep learning. J. Fluid Mech., 870, 784–812, https://doi.org/10.1017/jfm.2019.254.

  • Maulik, R., O. San, A. Rasheed, and P. Vedula, 2019b: Subgrid modelling for two-dimensional turbulence using neural networks. J. Fluid Mech., 858, 122–144, https://doi.org/10.1017/jfm.2018.770.

  • McGovern, A., R. Lagerquist, D. J. Gagne II, G. E. Jergensen, K. L. Elmore, C. R. Homeyer, and T. Smith, 2019: Making the black box more transparent: Understanding the physical implications of machine learning. Bull. Amer. Meteor. Soc., 100, 2175–2199, https://doi.org/10.1175/BAMS-D-18-0195.1.

  • Norman, M. R., 2021: A high-order WENO-limited finite-volume algorithm for atmospheric flow using the ADER-differential transform time discretization. Quart. J. Roy. Meteor. Soc., 147, 1661–1690, https://doi.org/10.1002/qj.3989.

  • Norman, M. R., and Coauthors, 2022: Unprecedented cloud resolution in a GPU-enabled full-physics atmospheric climate simulation on OLCF's Summit supercomputer. Int. J. High Perform. Comput. Appl., 36, 93–105, https://doi.org/10.1177/10943420211027539.

  • Partee, S., M. Ellis, A. Rigazzi, S. Bachman, G. Marques, A. Shao, and B. Robbins, 2021: Using machine learning at scale in HPC simulations with SmartSim: An application to ocean climate modeling. arXiv, 2104.09355v1, https://doi.org/10.48550/arXiv.2104.09355.

  • Piomelli, U., P. Moin, and J. H. Ferziger, 1988: Model consistency in large eddy simulation of turbulent channel flows. Phys. Fluids, 31, 1884–1891, https://doi.org/10.1063/1.866635.

  • Pressel, K. G., C. M. Kaul, T. Schneider, Z. Tan, and S. Mishra, 2015: Large-eddy simulation in an anelastic framework with closed water and entropy balances. J. Adv. Model. Earth Syst., 7, 1425–1456, https://doi.org/10.1002/2015MS000496.

  • Pressel, K. G., S. Mishra, T. Schneider, C. M. Kaul, and Z. Tan, 2017: Numerics and subgrid-scale modeling in large eddy simulations of stratocumulus clouds. J. Adv. Model. Earth Syst., 9, 1342–1365, https://doi.org/10.1002/2016MS000778.

  • Randall, D., 2013: Beyond deadlock. Geophys. Res. Lett., 40, 5970–5976, https://doi.org/10.1002/2013GL057998.

  • Randall, D., M. Khairoutdinov, A. Arakawa, and W. Grabowski, 2003: Breaking the cloud parameterization deadlock. Bull. Amer. Meteor. Soc., 84, 1547–1564, https://doi.org/10.1175/BAMS-84-11-1547.

  • Rasp, S., M. S. Pritchard, and P. Gentine, 2018: Deep learning to represent subgrid processes in climate models. Proc. Natl. Acad. Sci. USA, 115, 9684–9689, https://doi.org/10.1073/pnas.1810286115.

  • Schneider, T., S. Lan, A. Stuart, and J. Teixeira, 2017a: Earth system modeling 2.0: A blueprint for models that learn from observations and targeted high-resolution simulations. Geophys. Res. Lett., 44, 12 396–12 417, https://doi.org/10.1002/2017GL076101.

  • Schneider, T., J. Teixeira, C. S. Bretherton, F. Brient, K. G. Pressel, C. Schär, and A. P. Siebesma, 2017b: Climate goals and computing the future of clouds. Nat. Climate Change, 7, 3–5, https://doi.org/10.1038/nclimate3190.

  • Srivastava, N., G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, 2014: Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 15, 1929–1958.

  • Watt-Meyer, O., N. D. Brenowitz, S. K. Clark, B. Henn, A. Kwa, J. McGibbon, W. A. Perkins, and C. S. Bretherton, 2021: Correcting weather and climate models by machine learning nudged historical simulations. Geophys. Res. Lett., 48, e2021GL092555, https://doi.org/10.1029/2021GL092555.

  • Wyngaard, J. C., 1992: Atmospheric turbulence. Annu. Rev. Fluid Mech., 24, 205–234, https://doi.org/10.1146/annurev.fl.24.010192.001225.

  • Yuval, J., and P. A. O'Gorman, 2020: Stable machine-learning parameterization of subgrid processes for climate modeling at a range of resolutions. Nat. Commun., 11, 3295, https://doi.org/10.1038/s41467-020-17142-3.

  • Yuval, J., P. A. O'Gorman, and C. N. Hill, 2021: Use of neural networks for stable, accurate and physically consistent parameterization of subgrid atmospheric processes with good performance at reduced precision. Geophys. Res. Lett., 48, e2020GL091363, https://doi.org/10.1029/2020GL091363.

Supplementary Materials
