Learned 1D Passive Scalar Advection to Accelerate Chemical Transport Modeling: A Case Study with GEOS-FP Horizontal Wind Fields

Manho Park,a Zhonghua Zheng,b Nicole Riemer,c and Christopher W. Tessuma

Manho Park ORCID: https://orcid.org/0000-0002-8645-3507

a Department of Civil and Environmental Engineering, University of Illinois Urbana–Champaign, Urbana, Illinois
b Department of Earth and Environmental Sciences, The University of Manchester, Manchester, United Kingdom
c Department of Climate, Meteorology and Atmospheric Sciences, University of Illinois Urbana–Champaign, Urbana, Illinois

Open access

Abstract

We developed and applied a machine-learned discretization for one-dimensional (1D) horizontal passive scalar advection, which is an operator component common to all chemical transport models (CTMs). Our learned advection scheme resembles a second-order accurate, three-stencil numerical solver but differs from a traditional solver in that coefficients for each equation term are output by a neural network rather than being theoretically derived constants. We subsampled higher-resolution simulation results—resulting in up to 16× larger grid size and 64× larger time step—and trained our neural-network-based scheme to match the subsampled integration data. In this way, we created an operator that has low resolution (in time or space) but can reproduce the behavior of a high-resolution traditional solver. Our model shows high fidelity in reproducing its training dataset (a single 10-day 1D simulation) and is similarly accurate in simulations with unseen initial conditions, wind fields, and grid spacing. In many cases, our learned solver is more accurate than a low-resolution version of the reference solver, but the low-resolution reference solver achieves greater computational speedup (500× acceleration) over the high-resolution simulation than the learned solver is able to (18× acceleration). Surprisingly, our learned 1D scheme—when combined with a splitting technique—can be used to predict 2D advection and is in some cases more stable and accurate than the low-resolution reference solver in 2D. Overall, our results suggest that learned advection operators may offer a higher-accuracy method for accelerating CTM simulations as compared to simply running a traditional integrator at low resolution.

Significance Statement

Chemical transport modeling (CTM) is an essential tool for studying air pollution, but CTM simulations require long computing times. Modeling pollutant transport (advection) is the second most computationally intensive part of the model. Decreasing the resolution reduces the advection computing time but also decreases accuracy. We employed machine learning to reduce the resolution of advection while retaining accuracy, and we verified the robustness of our solver with several generalization testing scenarios. In our 2D simulations, our solver ran up to 100 times faster with fair accuracy. Integrating our approach into existing CTMs will allow broadened participation in the study of air pollution and related solutions.

© 2024 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Christopher W. Tessum, ctessum@illinois.edu


1. Introduction

Atmospheric chemical transport models (CTMs) are a key research tool for studying air pollution and for predicting the outcomes of pollution mitigation efforts. Currently available CTMs [e.g., GEOS-Chem (Bey et al. 2001), the Community Multiscale Air Quality (CMAQ) model (EPA 2022), and WRF-Chem (Grell et al. 2005)] numerically represent physical and chemical processes, including emissions, chemical reactions, transport, and deposition, discretized across many grid boxes. This numerical discretization engenders a dilemma between accuracy and computational cost: higher-resolution simulations are more accurate but can be computationally intractable for some use cases. Parallel and distributed computation can reduce the overall execution time of simulations but adds cost and complexity; for example, Zhuang et al. (2020) found that an 8× increase in distributed computing power resulted in only a 4× increase in speed for GEOS-Chem simulations.

Recent efforts have explored the use of machine-learned (ML) surrogate modeling to address computational costs in CTMs. Because the chemistry solver is the most expensive part of an atmospheric chemistry model, most ML research on accelerating CTMs has focused on the chemistry operator (Keller et al. 2017; Keller and Evans 2019; Shen et al. 2022; Kelp et al. 2020, 2022; Huang and Seinfeld 2022). The advective transport operator is generally the second-most expensive module in chemical transport models such as GEOS-Chem (Eastham et al. 2018) and the most expensive module in atmospheric models that do not consider detailed chemistry, such as the E3SM Atmosphere Model (EAM) (Golaz et al. 2022; Caldwell et al. 2019). However, we are aware of only one existing study of machine-learned advection (Zhuang et al. 2021); Sturm et al. (2023) accelerated the advection operator by compressing the number of transported species rather than by accelerating the advection solver itself. Meanwhile, recent progress in machine-learned computational fluid dynamics (CFD) and turbulence simulation (Kochkov et al. 2021; Stachenfeld et al. 2021; Brunton et al. 2020; Vinuesa and Brunton 2022) has laid a solid foundation for accelerating the transport operator in the context of air quality modeling.

In this study, we develop a machine-learned 1D horizontal advection solver in the context of an air quality model. We build on the work of Zhuang et al. (2021), who trained a model using synthetic velocity fields within the rectangular computational domain with a fixed grid size. Here, as a step toward full-scale deployment of an ML advection operator in a CTM, we instead use real-world historical wind velocity data and grid cells that change size with changing latitude. Furthermore, we extensively explore the robustness of our approach by limiting the training data used and by testing its performance under a variety of different conditions. Altogether, the results herein characterize the promise of machine-learned advection operators for use in CTMs and outline the remaining challenges to their full-scale adoption.

The remaining parts of this paper are structured as follows: Section 2 describes our methodology, including the numerical advection scheme used as a reference solver, the structure of the learned scheme which we train to match the results of our reference solver, and numerical experiments including generalization testing and 2D application. Section 3 presents the results of the experiments introduced in the methodology section. Section 4 discusses strengths and limitations of our approach and future research needs. Section 5 concludes this paper.

2. Methodology

Our methodology is described in detail below. In summary, we first create a training dataset based on a single 1D, 10-day-long reference model simulation with square-wave initial concentrations and wind fields sampled from a single longitude across the United States. This simulation results in concentration predictions for 2880 time steps in each of 224 grid cells, which we optionally reduce with spatial or temporal coarsening of up to a factor of 64. We create our training dataset by splitting these results into segments of 10 time steps, resulting in a number of training samples ranging from (2880/64 − 10) = 35 to (2880/1 − 10) = 2870, depending on the temporal coarsening factor used. Our models are trained by minimizing recurrent prediction error over those 10-time step simulation segments.

We test the model by evaluating its average error over the entire 10-day simulation, rather than over the 10-time step segments it was trained on. Finally, we evaluate the model’s generalization capability based on its performance in other times and locations, for longer simulation durations, and even for 2D predictions. Figure 1 shows a visual summary of the training, testing, and generalization process in this study.

Fig. 1.

Symbolic representation of the training, testing, and generalization process in this study.

Citation: Artificial Intelligence for the Earth Systems 3, 3; 10.1175/AIES-D-23-0080.1

By training our models on results of a single small simulation and evaluating the models under a wide range of conditions, our goal is to explore the potential of our approach, rather than to create models immediately ready for widespread use.

a. Numerical advection

1) Numerical scheme and dataset generation

We implemented the so-called L94 advection scheme (Lin et al. 1994) using the Julia scientific computing language (Bezanson et al. 2012). The L94 advection scheme is a second-order accurate van Leer-type advection scheme and has been used in a well-established chemical transport model, GEOS-Chem (Bey et al. 2001), when coupled with a nondirectional multidimensional splitting (Lin and Rood 1996). An implementation in Julia—a differentiable programming language (Innes et al. 2019)—allows us to train machine-learned models by backpropagating error gradients through multiple model time steps. Following Lin et al. (1994), the equations for our 1D numerical advection reference model are as follows.

Suppose we have a scalar field discretized across a 1D grid, with concentration $\Phi_i^n$ at the ith grid point and the nth time step. Then, the cell average value $\Phi_{i+1/2}^n$ can be computed as Eq. (1) because the L94 scheme assumes a piecewise linear distribution of the scalar field inside a cell:
\[
\Phi_{i+1/2}^n = \frac{\Phi_i^n + \Phi_{i+1}^n}{2}. \tag{1}
\]
The ultimate purpose of this flux-form operator is to calculate the next-time cell average as in Eq. (2):
\[
\Phi_{i+1/2}^{n+1} = \Phi_{i+1/2}^n - \frac{\Delta t}{\Delta x}\left[f(i+1) - f(i)\right], \tag{2}
\]
where the flux f(i) through the cell boundary can be expressed by Eq. (3a) for $U_i \ge 0$ or Eq. (3b) for $U_i < 0$, where $U_i$ is the velocity at the ith edge of the grid cell:
\[
f(i) = U_i\left[\Phi_{i-1/2}^n + \frac{\Delta\Phi_{i-1/2}^n}{2}\left(1 - C_i^-\right)\right], \tag{3a}
\]
\[
f(i) = U_i\left[\Phi_{i+1/2}^n + \frac{\Delta\Phi_{i+1/2}^n}{2}\left(1 + C_i^+\right)\right]. \tag{3b}
\]
Both $C_i^-$ and $C_i^+$ are Courant–Friedrichs–Lewy (CFL) numbers (Courant et al. 1928), which can be calculated using Eqs. (4a) and (4b), respectively. Here, $\Delta x_{i-1/2}$ and $\Delta x_{i+1/2}$ are the grid spacings on the left and right sides of the ith gridcell edge and $\Delta t$ is the time step:
\[
C_i^- = \frac{U_i\,\Delta t}{\Delta x_{i-1/2}}, \tag{4a}
\]
\[
C_i^+ = \frac{U_i\,\Delta t}{\Delta x_{i+1/2}}. \tag{4b}
\]
The key feature of this reference scheme is the method for deriving the derivative of the cell-averaged scalar, which is implemented using a monotonicity constraint [denoted "mono5" following Lin et al. (1994)] to ensure numerical stability, as shown in Eq. (5):
\[
\left[\Delta\Phi_{i+1/2}\right]_{\text{mono5}} = \operatorname{sign}\!\left(\left[\Delta\Phi_{i+1/2}\right]_{\text{avg}}\right) \times \min\!\left[\left|\left[\Delta\Phi_{i+1/2}\right]_{\text{avg}}\right|,\; 2\left(\Phi_{i+1/2} - \Phi_{i+1/2}^{\min}\right),\; 2\left(\Phi_{i+1/2}^{\max} - \Phi_{i+1/2}\right)\right]. \tag{5}
\]
In Eq. (5), $\Phi_{i+1/2}^{\max}$ and $\Phi_{i+1/2}^{\min}$ represent the local maximum and minimum, respectively, of $\Phi$ in the three-point stencil centered at i + 1/2. Then, $[\Delta\Phi_{i+1/2}]_{\text{avg}}$ is the average of the local spatial difference in $\Phi$ as in Eq. (6):
\[
\left[\Delta\Phi_{i+1/2}\right]_{\text{avg}} = \frac{\delta\Phi_i + \delta\Phi_{i+1}}{2}, \tag{6}
\]
where
\[
\delta\Phi_i = \Phi_{i+1/2}^n - \Phi_{i-1/2}^n. \tag{7}
\]
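As a concrete illustration, one update of the scheme in Eqs. (1)–(7) can be sketched as below. This is a simplified sketch, not the authors' Julia implementation: it assumes a uniform grid, nonnegative edge velocities, and zero-gradient boundaries.

```python
def l94_step(phi, u, dt, dx):
    """One flux-form L94 update. phi: n cell averages; u: n+1 edge
    velocities, assumed >= 0 here; dt, dx: time step and grid spacing."""
    n = len(phi)
    p = [phi[0]] + list(phi) + [phi[-1]]      # ghost cells, zero-gradient BCs
    # mono5-limited slope [Eqs. (5)-(7)] for each interior cell, 0 at ghosts
    slope = [0.0] * (n + 2)
    for i in range(1, n + 1):
        avg = 0.5 * (p[i + 1] - p[i - 1])     # Eq. (6)
        lo, hi = min(p[i - 1:i + 2]), max(p[i - 1:i + 2])
        mag = min(abs(avg), 2 * (p[i] - lo), 2 * (hi - p[i]))
        slope[i] = mag if avg >= 0 else -mag  # Eq. (5): sign(avg) * min(...)
    # upwind flux through edge i [Eq. (3a) for U >= 0]; the upwind cell
    # i - 1 sits at index i in the padded array p
    f = []
    for i in range(n + 1):
        c = u[i] * dt / dx                    # CFL number, Eq. (4)
        f.append(u[i] * (p[i] + 0.5 * slope[i] * (1 - c)))
    # flux-form update, Eq. (2)
    return [phi[i] - (dt / dx) * (f[i + 1] - f[i]) for i in range(n)]
```

For $U_i < 0$ the upwind cell switches to the right neighbor, per Eq. (3b); the limiter keeps the advected square wave bounded and free of new extrema.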
Our 1D simulation domain is a straight line across the GEOS-FP 0.25° × 0.3125° North America nested grid (NASA 2022), which covers 130°–60°W and 9.75°–60°N. We used the component of the ground-level wind field aligned with the domain line direction; for example, if the line lay in the east–west direction, we used the east–west component of the wind field. The velocity data are retrieved from the 0.25° × 0.3125° GEOS-FP wind fields (NASA 2022), which have been preprocessed for use with GEOS-Chem Classic and are therefore on a rectangular rather than a cubed-sphere grid.

We generated a training dataset by simulating advection along the east–west horizontal line at 39.00°N with 0.3125° spatial resolution, using a 300-s time step with the reference solver described above. The thick blue line in Fig. 2 shows the training domain. The simulation period is the first 10 days of January 2019. We used a square-shaped initial condition with a 100-ppb mixing ratio in the central one-third of the domain and zero elsewhere. We chose 100 ppb for the initial mixing ratio of the passive tracer because it is a typical mixing ratio for atmospheric pollutants and the default initial mixing ratio of the passive scalar in the GEOS-Chem transport tracer simulation. The square-shaped initial condition results in a challenging training dataset because the integrated wave has a steep gradient at the beginning of the simulation, which becomes smoother as the simulation progresses. We specified zero-gradient spatial boundary conditions. This setup results in simulations with 224 grid cells and 2880 integration time steps.
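The square-wave initial condition can be generated as below; the exact cell indexing of "the central one-third" is our assumption.

```python
def square_wave_ic(n_cells, peak_ppb=100.0):
    # peak_ppb mixing ratio on the (assumed) central one-third of the
    # domain, zero elsewhere
    lo, hi = n_cells // 3, 2 * n_cells // 3
    return [peak_ppb if lo <= i < hi else 0.0 for i in range(n_cells)]
```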

Fig. 2.

Visualization of the 1D training and testing domain.


2) Spatiotemporal coarsening

We subsampled the scalar outputs of the advection simulation by factors of 1×, 2×, 4×, 8×, and 16× in space and 1×, 2×, 4×, 8×, 16×, 32×, and 64× in time. We applied spatial and temporal subsampling together, resulting in 35 different data-coarsening scenarios. (We subsampled the output of the reference simulation by averaging, as shown in Fig. A1 in the appendix.) At maximum coarsening in both space and time, the data dimensionality was reduced from 224 cells × 2880 steps to 14 cells × 45 steps. We then trained machine learning models to reproduce the subsampled reference simulation.
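A sketch of the subsampling step follows; we assume block averaging over both dimensions (the paper's Fig. A1 gives the exact procedure).

```python
def coarsen(series, sx, st):
    # series: list of time snapshots, each a list of cell values.
    # Average blocks of st consecutive snapshots in time, then blocks of
    # sx consecutive cells in space.
    ncell = len(series[0])
    out = []
    for t in range(0, len(series) - st + 1, st):
        tavg = [sum(series[t + k][i] for k in range(st)) / st
                for i in range(ncell)]
        out.append([sum(tavg[i:i + sx]) / sx
                    for i in range(0, ncell, sx)])
    return out
```

At the maximum factors (16× in space, 64× in time) this reduces a 224-cell × 2880-step simulation to 14 cells × 45 steps, matching the dimensions quoted above.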

b. Development of the learned advection scheme

1) Model structure

As illustrated in Fig. 3, we designed a physics-informed machine learning solver that is fed with scalar and velocity fields and returns surrogate coefficients for the numerical integration of a three-stencil, second-order accurate scheme. We represented the temporal gradient using six terms describing three-stencil first- and second-order derivatives, as shown in Eq. (8):
\[
\Phi_i^{n+1} = \Phi_i^n + k_1\frac{\Delta t}{\Delta x}\,[a_1, a_2, a_3]\cdot[\Phi_{i-1}^n, \Phi_i^n, \Phi_{i+1}^n]^{\mathsf T} + k_2\left(\frac{\Delta t}{\Delta x}\right)^{2}[b_1, b_2, b_3]\cdot[\Phi_{i-1}^n, \Phi_i^n, \Phi_{i+1}^n]^{\mathsf T}, \tag{8}
\]
where $k_1$ and $k_2$ are constant gradient-scaling factors (to help with training) and $a_i$ and $b_i$ are the surrogate coefficients generated by the learned solver. This formulation gives our learned scheme the flexibility to use different values of $\Delta x$, which can represent the variable relationship between gridcell size in meters and degrees at different latitudes. Overall, Eq. (8) results in a machine-learned model with a strong inductive bias based on the mathematical framework described in section 2a(1).
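Given the six coefficients per cell, applying Eq. (8) is a simple stencil operation. The sketch below is ours, not the authors' code; the zero-gradient ghost-cell treatment at the boundaries is an assumption.

```python
def learned_update(phi, coeffs, dt, dx, k1=100.0, k2=10_000.0):
    # coeffs[i] = (a1, a2, a3, b1, b2, b3), emitted by the neural net for
    # cell i; k1 and k2 are the gradient-scaling factors of Eq. (8).
    n = len(phi)
    p = [phi[0]] + list(phi) + [phi[-1]]  # assumed zero-gradient ghost cells
    out = []
    for i in range(n):
        a1, a2, a3, b1, b2, b3 = coeffs[i]
        left, mid, right = p[i], p[i + 1], p[i + 2]  # phi_{i-1}, phi_i, phi_{i+1}
        d1 = a1 * left + a2 * mid + a3 * right       # first-order term
        d2 = b1 * left + b2 * mid + b3 * right       # second-order term
        out.append(phi[i] + k1 * (dt / dx) * d1 + k2 * (dt / dx) ** 2 * d2)
    return out
```

For instance, the constant coefficients $(u/k_1, -u/k_1, 0, 0, 0, 0)$ recover a plain first-order upwind step for velocity $u > 0$; the network's job is to output better, flow-dependent coefficients.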
Fig. 3.

Design of the neural-net-based models to emulate the L94 advection solver. Here, $\Phi^n$ is the scalar field and $U^n$ is the velocity field.


We used a 3-layer convolutional neural network (CNN) with 32 filters per layer to fit the surrogate coefficients. Zhuang et al. (2021) used a 4-layer CNN with 32 filters per layer to emulate 1D advection and a 10-layer CNN with 32 filters per layer for 2D. We likewise used 32 filters per layer but reduced the number of layers from 4 to 3 to reduce computational cost. We used the Flux.jl (Innes et al. 2018) machine learning library to implement the neural network and trained a separate network for each combination of temporal and spatial coarsening. We used a default kernel size of 3, corresponding to a three-stencil scheme, but for simulations where the maximum CFL number is larger than 1, we increased the CNN kernel size (and stencil size) to 5, 9, 17, or 31 to provide a suitable spatiotemporal information horizon. For example, if the maximum CFL number of the subsampled velocity field is between 1 and 2, we set the kernel size to 5 (i.e., two cells on the left side, one in the center, and two cells on the right side) to allow our algorithm to access all scalar values that may pass through the grid cell of interest during a given time step. To facilitate training across the steep gradients that characterize this system, we employed the Gaussian error linear unit (GeLU; Hendrycks and Gimpel 2016) activation function, as suggested by Kim et al. (2021). For the final activation layer, we used the hyperbolic tangent to limit the predicted coefficients to the range −1 to 1, for consistency with the Taylor series coefficients typically used in advection schemes and to encourage numerical stability by limiting the rate of potential error growth. To normalize the temporal gradient, we multiplied each term by a scaling factor [$k_1$ and $k_2$ in Eq. (8)] with magnitude similar to the inverse of $\Delta t/\Delta x$ and $(\Delta t/\Delta x)^2$, respectively. In this study, we used 100 for $k_1$ and 10 000 for $k_2$, because $\Delta t$ = 300 s and $\Delta x \approx$ 30 000 m at the baseline resolution.
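The stencil-widening rule implied by the listed kernel sizes (3, 5, 9, 17, 31) can be sketched as follows; the exact formula is our inference, not the authors' stated rule.

```python
import math

def kernel_size(max_cfl):
    # Information can cross up to ceil(max_cfl) cells per time step, so keep
    # that many cells on each side of the center cell (minimum kernel of 3).
    return 2 * math.ceil(max(max_cfl, 1.0)) + 1
```

This reproduces the paper's choices: a maximum CFL between 1 and 2 yields a kernel of 5, and the unstable regimes with CFL up to 14.52 yield a kernel of 31.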

2) Neural net training

We trained a separate neural network for each coarse-graining resolution listed in section 2a(2). We trained our neural network to minimize the absolute error of a 10-step prediction, following Zhuang et al. (2021)'s suggestion, to reflect the temporal variation of the advected scalar field. (This means that each gradient descent step involves running the model forward for 10 steps with the current neural network weights, calculating the gradient of the average error among the 10 steps with respect to the weights, and then updating the weights in the opposite direction of the gradient.) In our experiments, the use of a single-time-step loss function resulted either in numerical instability or in mode collapse (where the model kept reproducing the initial condition rather than showing appropriate temporal evolution). Input feature standardization was performed by multiplying the scalar array by $10^7$ and dividing the velocity array by 15 to ensure both scalar and velocity would be bound to a scale of $10^0$ for compatibility with the default parameters of the optimizer that we use. We introduced uniformly distributed noise with a maximum value of $3\times10^{-4}$ times the magnitude of the initial condition into the scalar array to let the model learn the physical patterns behind the noise and prevent error accumulation, as shown by Stachenfeld et al. (2021). We used the Adam optimizer (Kingma and Ba 2014) with the default parameters in Flux.jl. We modified the learning rate $\rho$ to decay with the number of training epochs as in Eq. (9):
\[
\rho = \frac{2\times10^{-3}}{1 + \text{epoch}}. \tag{9}
\]
The purpose of the decaying learning rate is to encourage model convergence. We trained the model for 100 epochs by default and saved the model parameters every epoch. When the local minimum of training loss was observed within 100 epochs, we chose the model weights from the best epoch. When the training loss was still decreasing after 100 epochs, we continued training until the loss stopped decreasing. (The training epoch with the best performance for each resolution is summarized in Table A1.)
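The schedule of Eq. (9) and the best-epoch selection can be sketched directly:

```python
def learning_rate(epoch):
    # decaying learning rate, Eq. (9)
    return 2e-3 / (1 + epoch)

def best_epoch(loss_history):
    # index of the epoch whose saved weights gave the lowest training loss
    return min(range(len(loss_history)), key=lambda e: loss_history[e])
```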

After training the model, we evaluated the correctness of our learned schemes in emulating the reference solver. Following common practice for the evaluation of CTMs (Simon et al. 2012), we evaluated the performance of the learned scheme using three different statistics: mean absolute error, root-mean-square error, and R2, all averaged throughout the entire simulation rather than for a single time step. We tested our model’s ability to replicate the full 10-day simulation it was trained on—which is longer than the 10-time step segments of that simulation which were used as training data—as well as additional tests of model generality as described below.

We evaluated the computational acceleration by the learned solver against the finest-resolution reference solver. The computational time to integrate the same duration (10 days) by each solver was measured using a single CPU (dual 6248 Cascade Lake CPU) within an HPE Apollo 6500 node installed in the University of Illinois Campus Cluster. We used the @benchmark macro of BenchmarkTools.jl (JuliaCI 2022) to collect the statistics in time evaluation. We calculated the speedup as shown in Eq. (10):
\[
\text{speedup} = \frac{\operatorname{mean}(\text{time}_{\text{baseline}})}{\operatorname{mean}(\text{time}_{\text{learned}})}, \tag{10}
\]
where $\text{time}_{\text{learned}}$ is the computational time spent by the learned solver and $\text{time}_{\text{baseline}}$ is the computational time spent by the reference solver at the finest resolution. When the learned solver is faster than the baseline, the speedup is larger than 1.
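The speedup computation of Eq. (10) amounts to a ratio of mean wall-clock times:

```python
def speedup(times_baseline, times_learned):
    # Eq. (10): values > 1 mean the evaluated solver is faster than the
    # finest-resolution reference solver
    mean = lambda xs: sum(xs) / len(xs)
    return mean(times_baseline) / mean(times_learned)
```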

c. Generalization ability testing

We evaluated the generalization ability of the learned schemes against a longer time span (3× longer than the training dataset), different initial conditions (Dirac-delta shape and Gaussian distribution shape), and different wind fields. The tests on the different wind fields can be further categorized by seasons (the first 10 days in April, July, and October to represent spring, summer, and fall, respectively), latitudes (29.5° and 45°N), and domain direction (longitudinal line at 76.875°W using the north–south wind velocity component). The pale blue lines in Fig. 2 show the spatial domains of the generalization tests. For the longer-term stability test, we simulated scalar advection from 1 to 30 January. For other tests, the integration time span is 10 days as in the training data.

Each test has a different purpose. The longer-duration simulations evaluate the degradation of predictive performance over longer time horizons. Changing the shape of the initial conditions tests whether the learned schemes were overfitted to the initial condition shape used in the training dataset; the Dirac delta initial condition, in particular, probes the performance of the learned scheme with extremely large spatial gradients. The purpose of using different velocity fields is to assess the robustness of the learned scheme to different wind patterns. Finally, the tests with changes in latitude and domain direction evaluate the learned schemes' ability to work with different grid sizes. For context, the grid size of the training set (Δx, 0.3125° at 39°N) is approximately 27.0 km, whereas Δx at 29.5° and 45°N is 30.3 and 24.6 km, respectively. The north–south grid size, Δy = 0.25°, is approximately 27.8 km at all latitudes.
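The latitude dependence of the east–west grid spacing can be reproduced with a spherical-Earth approximation (the radius value is our assumption):

```python
import math

def zonal_dx_km(lat_deg, dlon_deg=0.3125, earth_radius_km=6371.0):
    # arc length of dlon_deg of longitude at a given latitude; the
    # east-west cell width shrinks with the cosine of latitude
    return (2 * math.pi * earth_radius_km
            * math.cos(math.radians(lat_deg)) * dlon_deg / 360.0)
```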

d. Multidimensional splitting

We implemented 2D advection in the same way for both the reference solver and the learned scheme developed in this study. In both cases, 1D advection is converted to 2D using nondirectional splitting as described in Lin and Rood (1996) and reproduced in Eq. (11):
\[
Q^{n+1} = Q^n + F\left[Q^n + \tfrac{1}{2}G(Q^n)\right] + G\left[Q^n + \tfrac{1}{2}F(Q^n)\right], \tag{11}
\]
where $Q^n$ is the scalar value at a specific point on the 2D domain at time n and F and G are the x-axis and y-axis discretization operators, respectively. This means that for the learned solver, we used models to make 2D predictions that were only trained to make 1D predictions (at the same spatial and temporal resolutions). All simulations were performed over the GEOS-FP 0.25° × 0.3125° North America nested grid (130°–60°W, 9.75°–60°N). The simulation period is the first 10 days of January 2019. The baseline resolution is 0.25° × 0.3125° in space and 5 min in time. As in the 1D study, coarse-graining was performed by repeatedly doubling the spatial grid size and the time step, from 0.25° × 0.3125° × 5 min up to 4.0° × 5.0° × 5 h 20 min.
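Eq. (11) can be sketched as below, where F and G are functions returning the one-time-step x- and y-direction increments; the wrapper is ours, and any 1D operator (reference or learned) can be plugged in.

```python
def split_step_2d(Q, F, G):
    # nondirectional splitting, Eq. (11):
    # Q^{n+1} = Q^n + F[Q^n + G(Q^n)/2] + G[Q^n + F(Q^n)/2]
    ny, nx = len(Q), len(Q[0])
    fq, gq = F(Q), G(Q)
    half_g = [[Q[j][i] + 0.5 * gq[j][i] for i in range(nx)] for j in range(ny)]
    half_f = [[Q[j][i] + 0.5 * fq[j][i] for i in range(nx)] for j in range(ny)]
    fin, gin = F(half_g), G(half_f)
    return [[Q[j][i] + fin[j][i] + gin[j][i] for i in range(nx)]
            for j in range(ny)]
```

For commuting linear operators $F = aI$ and $G = bI$, this yields the factored update $(1 + a)(1 + b)Q$, illustrating how the cross terms of the two directions are captured.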

3. Results

a. Model performance in emulating the training dataset

Figure 4 summarizes trained model performance in integrating the training dataset with different levels of spatial and temporal coarsening. There are three cases where the learned solver returned unstable outputs, which are discussed in the final paragraph of this section. Except for these unstable cases, the overall errors are fairly small: MAEs range from 2.47 to 24.8 ppb and RMSEs from 3.94 to 45.38 ppb (Figs. 4a,b), and both are mostly less than 10 ppb. (For comparison, initial concentrations are 100 ppb.) The R2 values are mostly higher than 0.87, with a worst case of 0.70 (Fig. 4c). The most accurate emulation occurred in the 8Δx–16Δt case rather than on a finer grid with a shorter time interval; this is because fine-scale features are averaged out on the coarser grid, making model training easier than on the finer grid. The maximum computational speedup compared to the high-resolution reference model is 18.0× at the coarsest spatial and temporal resolutions. However, under many conditions, our ML model is slower than the reference model owing to the computational intensity of the neural network. We find that temporal coarsening reduces computation time proportionally to the coarsening factor (e.g., 4Δt is 2× faster than 2Δt). Spatial coarsening also reduces computation time, but subproportionally (e.g., 4Δx is <2× faster than 2Δx), because operations that are sequential in space can leverage single instruction, multiple data (SIMD) compiler optimizations whereas operations that are sequential in time cannot. (The statistics of the computational time for the learned solvers and the reference solvers are given in Tables A2 and A3.)

Fig. 4.

Performance of the learned solver in emulating the training dataset at different coarse-graining resolutions as compared to the highest-resolution reference solver simulation. Conditions where integration failed are marked “Unstable.”


Computational speedup values reported in Fig. 4 and elsewhere are computed against our Julia implementation of the reference model described in section 2a(1). As a sanity check to demonstrate that any ML model speedups we observed are not owing to an overly slow implementation of the reference model, we compare the execution time of our reference model to that of a GEOS-Chem "transport tracer" configuration, which only computes advection; we turned off the other modules, including chemistry, deposition, radiation, convection, and boundary layer mixing. On a single CPU in the Illinois Campus Cluster hardware described above, a 10-day GEOS-Chem "transport tracer" simulation in the nested 0.25° × 0.3125° North America domain, with a 4.0° × 5.0° global simulation providing boundary conditions, takes 53 677 s, whereas the global 4.0° × 5.0° simulation by itself takes 613 s. Therefore, the computational time for a 10-day simulation in nested North America is 53 677 − 613 = 53 064 s, or 1.89 μs per grid cell per time step on a single CPU core, whereas our reference solver takes 0.06 μs per grid cell per time step on a single CPU core. Although this comparison does not show that our reference solver is faster than GEOS-Chem (a "transport tracer" simulation does more than advection alone, so it is not a fair comparison), it does suggest that our reference algorithm is not unreasonably slow.

To evaluate the benefit of our learned advection model, we compare its performance with simulations using the reference method run at spatial and temporal resolutions matching those of the learned models. Figure 5 summarizes the performance of the lower-resolution reference solver. Interestingly, the coarsening with the reference solver appears to have a preferential regime that lies along diagonal elements of the performance matrix (e.g., 2Δx–4Δt and 4Δx–16Δt). Previous research has suggested that shorter time steps unconditionally result in higher model accuracy (Philip et al. 2016), but results here may provide contradictory evidence. Our results instead suggest that matching temporal and spatial coarsening may in some cases enhance the performance of the model as well as the computational speed, possibly because temporal coarsening averages out the high-frequency features which could trigger numerical diffusion in low spatial resolution. Also, the native time step in the GEOS-FP wind field is 60 min, and thus, temporal coarsening from 5 min to a longer step can be beneficial by reducing unnecessary computation and minimizing possible numerical error accumulation. Regardless, additional investigation is required to fully understand the phenomenon we observe here.

Fig. 5.

Performance of the reference solver coarse-graining the fine-resolution training dataset at different resolutions, as compared to the highest-resolution reference solver simulation. Conditions where integration failed are marked "Unstable."


When both solvers are run at the same resolution, the computational speed of the reference solver is higher than that of the learned solver, owing to the computational complexity of the neural network (Figs. 4d and 5d). For resolutions where we use a 3-layer CNN, the low-resolution reference solver was 18.7–28.5 times faster than the learned solver. For the (1Δx, 4Δt) and (1Δx, 32Δt) resolutions, we add an additional CNN layer to prevent instability, resulting in learned solvers that are ∼1000× slower than the low-resolution reference solvers. Given that the reference solver with certain coarsening factors, such as (8Δx, 64Δt), achieves both a much larger speedup than the maximum achieved by the learned solver and favorable accuracy, we cannot say that the learned solver in this study is strictly superior to the traditional solver. Instead, we argue that the learned solver presented here could be a complementary option in cases where the traditional solver does not work well. For example, the reference solver performs poorly in cases with large grid cells combined with short time steps (Fig. 5)—such as nested simulations with coarse global domains but small time steps to match the higher-resolution inner domains—but the learned solver can still operate with high accuracy in this regime. GEOS-Chem Classic uses a default transport time step of 10 min in global simulations and 5 min in nested simulations; using our learned solver, the time step could be increased beyond 10 min.

The learned solvers with the highest accuracy in predicting the training dataset (8Δx, 16Δt) and with the largest acceleration (16Δx, 64Δt) are shown in Fig. 6, which shows that the learned schemes describe the time-evolving spatial patterns well at both resolutions. Accuracy is not perfect, and deviations are more noticeable for (16Δx, 64Δt), but R2 values combined across all time steps are high in both cases (0.99 and 0.90, respectively).

Fig. 6.

Time series representation of the learned emulation of the training dataset. (top row) 8Δx, 16Δt and (bottom row) 16Δx, 64Δt. The initial shapes are slightly different from the square initial condition because of coarse-graining.


We are unable to successfully train the model with the coarsening factors of (1Δx, 8Δt), (1Δx, 16Δt), and (1Δx, 64Δt); in these cases, the learned model simulations become numerically unstable. One possible explanation is that these regimes have large CFL numbers owing to a temporal coarsening factor that is large relative to the spatial one. (The low-resolution reference model also becomes unstable under similar conditions.) The maximum CFL numbers for (1Δx, 8Δt), (1Δx, 16Δt), (1Δx, 32Δt), and (1Δx, 64Δt) are 1.82, 3.63, 7.26, and 14.52, respectively. (The maximum CFL numbers for all tested coarsening cases are summarized in Table A4.) We use larger stencils and larger neural networks in such cases to account for the increased spatiotemporal transport horizon; this solved the problem for (1Δx, 4Δt) and (1Δx, 32Δt) but not for the cases above. Resolving this instability within a computationally efficient framework is a topic for future research.
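The relationship between coarsening factors and the CFL number can be made concrete with a little arithmetic. The Python sketch below (the study's own code is written in Julia) assumes a native-resolution maximum CFL of about 0.227, a value inferred from the numbers reported above rather than stated in the text, and shows how temporal coarsening alone multiplies the CFL while matched spatial coarsening cancels it.

```python
# CFL = |u| * dt / dx: coarsening dt by a factor kt multiplies the CFL
# number by kt, while coarsening dx by a factor kx divides it by kx.
def coarsened_cfl(cfl_native, kx, kt):
    """Maximum CFL after coarsening space by kx and time by kt."""
    return cfl_native * kt / kx

# Native (1dx, 1dt) maximum CFL inferred from the values reported above
# (an assumption for illustration, not a number given in the text):
cfl0 = 0.2269
for kt in (8, 16, 32, 64):
    print(f"(1dx, {kt}dt): max CFL = {coarsened_cfl(cfl0, 1, kt):.2f}")
# Matched coarsening keeps the CFL moderate: (8dx, 64dt) gives
# 0.2269 * 64 / 8, the same value as (1dx, 8dt).
```

This is why purely temporal coarsening, e.g., (1Δx, 64Δt), drives the CFL number far above one, while diagonal combinations such as (8Δx, 64Δt) remain in a workable range.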

b. Generalization ability

Figure 7 summarizes error statistics for the generalization tests described in section 2c. (Detailed results for each test are shown in Figs. A2–A6.) The numerical errors on the generalization testing datasets are similar to the error on the training dataset in most cases, but some generalization tests produce errors substantially larger than those found with the training dataset, and numerical instability does occur in some cases. In most of the unstable cases, we could resolve the instability by using model parameters from a training epoch other than the one that is optimal for the training dataset (those cases are shown in bold in Figs. A2–A6). The goal of the analysis here is to explore the performance and flexibility of our method when trained on a small dataset, but we would expect to see better generalization if we used a more diverse training dataset.

Fig. 7.

Boxplots of error statistics in generalization tests for (a) MAE, (b) RMSE, and (c) R2. Each data point represents one combination of spatial and temporal resolutions. Unstable simulations are not shown here but are shown in Figs. A2–A6.


In Fig. 7, the “3 times longer” case shows that predictive performance does not significantly degrade over an extended prediction horizon. It also suggests that the learned solver would work well when integrating wind fields with characteristics (e.g., season and latitude) similar to the training dataset.

The Dirac delta initial condition test shows that changing the initial condition degrades performance when the new initial condition is extraordinarily stiff (Fig. 7). The normalized error is small, but this reflects the small initial mass rather than good model performance; the R2 value better captures the large uncertainty of the learned emulation in this case. In contrast, the learned solver is robust to an initial condition with a smooth gradient (Gaussian shape; Fig. 7). Zhuang et al. (2021) found that their learned 1D advection scheme tended to produce a square-shaped scalar field even when fed a Gaussian-like initial condition, because their scheme was trained only on a square wave moving in one direction. That does not happen in our study because our solver learns not only from the initial shape but also from the different spatial patterns that develop over time. Nevertheless, our results imply that training our model on a wider range of initial conditions would be useful.
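The contrast between these initial-condition tests can be illustrated with a simple stiffness measure. The following Python sketch (the study's code itself is in Julia) builds square, Gaussian, and Dirac-delta profiles on an arbitrary grid; the grid size and widths are illustrative choices, not the paper's configuration.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 128)

square = np.where((x > 0.4) & (x < 0.6), 1.0, 0.0)  # training-like shape
gaussian = np.exp(-0.5 * ((x - 0.5) / 0.05) ** 2)   # smooth gradient
dirac = np.zeros_like(x)
dirac[len(x) // 2] = 1.0                            # stiffest case

def max_jump(c):
    """Largest cell-to-cell jump relative to the peak value,
    a crude proxy for the stiffness of the profile."""
    return np.max(np.abs(np.diff(c))) / np.max(c)

# The Gaussian is far smoother than the square wave, which in turn is
# no stiffer than the single-cell Dirac delta:
print(max_jump(gaussian) < max_jump(square) <= max_jump(dirac))  # True
```

The ordering of the stiffness proxy mirrors the ordering of generalization difficulty observed in Fig. 7.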

The effects of seasonality and of different spatial domains are larger than those of the other testing scenarios in terms of median error (Fig. 7). The largest discrepancy in the median appears when changing the spatial domain to the longitudinal direction. The October wind case seemingly shows the largest outlier, but only because Fig. 7 omits the unstable simulations in the July and 29.5°N tests at the (2Δx, 64Δt) resolution. As above, we expect that these issues could be resolved by training our model on a more diverse dataset spanning different ranges of velocity, gradient stiffness, etc.

To look more closely at the effect of the wind velocity range on the generalization skill of the learned solvers, we plot MAE versus the maximum CFL number of the velocity fields used in generalization testing (Fig. 8). The plot shows that the learned solver tends to become unstable when the maximum CFL is far from that of the training dataset; the testing cases with the largest discrepancy in maximum CFL show numerical instabilities at the (2Δx, 64Δt) resolution. This behavior may originate from the structure of our learned solver: as shown in Fig. 3, the learned solver can rescale the magnitude of the scalar values but not of the velocity field. The neural network may therefore fail to produce learned coefficients in a reasonable range when the input velocity is far from the training regime. Based on this intuition, future work should train the solver on a larger dataset spanning a wider range of velocities. However, as mentioned above, our goal here is to explore the potential of this approach rather than to produce a highly tuned 1D advection solver.

Fig. 8.

Dependence of generalization test accuracy on the maximum CFL number of the target wind fields. Each boxplot summarizes the errors of one testing scenario that integrates wind data different from the training dataset.


Overall, the results from our 1D generalization tests suggest that our learned scheme can work in a relatively wide set of circumstances, even when trained on an extremely limited dataset. Our results additionally suggest that it is important for the training dataset to include a distribution of CFL numbers similar to the conditions the model is expected to be deployed under. This will likely be even more crucial in 3D applications where wind speed varies greatly with altitude.

c. 2D demonstration

Figure 9 shows the performance of the learned 1D advection solvers when implemented in a 2D simulation. Even though our learned solvers are trained in 1D, 10 of the 35 2D simulation cases achieve R2 values above 0.8. This is a notable result given that the velocity fields used in 2D advection have more complex and diverse features than exist in our 1D training dataset: each 2D simulation comprises 224 × 201 individual 1D advection operations, so the variety of wind patterns encountered in the 2D dataset scales accordingly. In this regard, our approach looks promising even in 2D applications. The best accuracy is observed at (2Δx, 8Δt), which shows remarkable performance, including R2 = 1.00. The maximum acceleration is 340× at the coarsest resolution. (Computational time statistics for 2D advection by the learned solvers are given in Table A5.)
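The dimensional-splitting idea, applying a 1D operator along each row and then each column, can be sketched as follows. Here a first-order upwind step stands in for the learned (or reference) 1D operator, and the constant speeds and periodic boundaries are simplifying assumptions; the paper's actual splitting and solver details may differ.

```python
import numpy as np

def upwind_1d(c, u, dt_dx):
    """One explicit first-order upwind step of 1D advection for a
    constant speed u, with periodic boundaries."""
    if u >= 0:
        return c - u * dt_dx * (c - np.roll(c, 1))
    return c - u * dt_dx * (np.roll(c, -1) - c)

def advect_2d_split(c, ux, uy, dt_dx):
    """One 2D advection step by dimensional splitting:
    a 1D sweep along x (each row), then along y (each column)."""
    c = np.apply_along_axis(upwind_1d, 1, c, ux, dt_dx)
    return np.apply_along_axis(upwind_1d, 0, c, uy, dt_dx)

# A square blob is transported while total mass is conserved.
c0 = np.zeros((32, 32))
c0[8:12, 8:12] = 1.0
c1 = advect_2d_split(c0, ux=1.0, uy=1.0, dt_dx=0.5)
print(np.isclose(c1.sum(), c0.sum()))  # True
```

Because the 2D step is just repeated application of the 1D operator, any 1D solver with the same call signature, learned or traditional, can be dropped into `advect_2d_split`.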

Fig. 9.

Performance of the learned solver in 2D implementation in different coarse-graining resolutions. Cases with bold numbers were unstable with the optimal training epoch but stable when using a different training epoch.


Although the fact that our 1D solver works at all in a 2D context exceeded our expectations, we nonetheless analyze its strengths and weaknesses to inform future work. Unlike the 1D results (Fig. 4), 2D simulations at several resolutions show poor results (R2 < 0.5). There are unstable simulations in 2D as well: simulations at (1Δx, 2Δt), (1Δx, 4Δt), (1Δx, 32Δt), and (2Δx, 64Δt) are unstable in 2D advection even though they are stable in 1D. We suspect that this limited performance may be caused by conditions (e.g., a greater range of wind speeds) that were not present in the training dataset.

To evaluate our approach fairly, we also run the 2D simulations with the low-resolution reference solver. As in the 1D coarsening cases, 2D advection using the reference solver performs well along the diagonal elements of the coarsening performance matrix (Fig. 10). In the other coarsening regimes, the reference solvers show poor R2 but usually maintain small errors. This is because the reference solver uses a local slope limiter to satisfy the total variation diminishing property, so the simulation is free from spurious oscillation as long as it does not violate the CFL condition (Durran 2010). Instead, the solution dissipates toward zero through numerical diffusion, resulting in both low error and low correlation. Instability occurs when coarsening yields a CFL number greater than one. Numerical techniques exist that can handle conditions with CFL > 1, but they lie outside the focus of the current study. The low-resolution reference solvers also achieve very good performance at some resolutions, including (2Δx, 4Δt) and (2Δx, 8Δt). The computational speedup from coarsening is always greater for the reference solver than for the learned solver at the same resolution, as explained in section 3a. The maximum acceleration in 2D by the reference solver is 7200×. (Table A6 summarizes the computational time for 2D advection by the reference solvers at different scales.)
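The dissipative (rather than oscillatory) failure mode described above follows from the slope limiter. The sketch below shows a monotonized central slope of the van Leer type used in flux-form schemes such as Lin and Rood (1996); the exact limiter in the reference solver may differ, so treat this as illustrative.

```python
import numpy as np

def limited_slope(c):
    """Monotonized central (van Leer-type) slope with periodic
    boundaries: the slope is set to zero at local extrema, which
    prevents spurious oscillation but adds numerical diffusion."""
    dc_left = c - np.roll(c, 1)
    dc_right = np.roll(c, -1) - c
    dc_avg = 0.5 * (dc_left + dc_right)
    limited = np.sign(dc_avg) * np.minimum(
        np.abs(dc_avg), 2.0 * np.minimum(np.abs(dc_left), np.abs(dc_right))
    )
    return np.where(dc_left * dc_right > 0, limited, 0.0)

# At the jumps of a square wave the one-sided differences disagree (or
# one of them is zero), so every limited slope vanishes and the scheme
# locally degrades to diffusive first-order upwind:
print(limited_slope(np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])))
```

This zeroing of the slope at sharp features is exactly why the coarse reference solution decays quietly toward zero, yielding small pointwise error but poor correlation with the true plume.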

Fig. 10.

Performance of the reference solver in 2D at different resolutions.


We visualize the 2D implementation results of both the reference and learned solvers at two resolutions (Fig. 11). We select (2Δx, 8Δt) and (16Δx, 64Δt), representing the best performance and the maximum acceleration of the learned solvers, respectively. At (2Δx, 8Δt), both solvers almost perfectly emulate the finest-grid simulation; both tend to exaggerate the thickness of the plume, but overall they represent the spatial pattern well. For (16Δx, 64Δt), we additionally feed the 1D generalization testing datasets, along with the original training dataset, to the model to enhance it. The learned scheme trained only on the original training dataset has limited performance, with R2 = 0.27 (Fig. 9); with the additional training data, its R2 doubles, showing the benefit of introducing more data into the learned model. However, the learned scheme at this resolution overestimates the concentration of the plume and sometimes produces negative concentrations (with magnitudes less than 1 ppb). These artifacts result in a relatively large MAE. The reference solver at this resolution shows a similar range of R2 but dissipates the plume too quickly, and the overall concentration range near the plume is much too low. Nevertheless, even though the reference solver is dissipative at this resolution, its MAE is still smaller than that of the learned solver.

Fig. 11.

2D demonstration of coarse-graining by the learned and reference advection schemes. The first row shows the results of the high-resolution baseline solver. The second and fourth rows are the learned advection in different resolutions. The third and fifth rows are the low-resolution reference solvers at different resolutions.


As an additional test, we analyze the power spectra of the simulations shown in Fig. 11 by performing a fast Fourier transform (FFT) in each grid cell and averaging the FFT output over the 2D domain, to evaluate the ability of the learned solvers to predict realistic mass transport patterns (Fig. A7). The learned schemes at the native resolution and with (2Δx, 8Δt) coarsening reproduce the power spectra of the reference scheme well, while the learned scheme with (16Δx, 64Δt) coarsening correctly represents the shape of the power spectrum but overestimates its amplitude, consistent with the excess dispersion shown in Fig. 11.
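The spectral diagnostic just described can be sketched as follows, assuming the scalar field is stored as a (time, ny, nx) array; the use of a real FFT and the averaging details are illustrative assumptions rather than the paper's exact procedure.

```python
import numpy as np

def mean_power_spectrum(c):
    """FFT the time series in each grid cell of a (time, ny, nx) array
    and average the resulting power over the 2D domain."""
    power = np.abs(np.fft.rfft(c, axis=0)) ** 2
    return power.mean(axis=(1, 2))

# Synthetic check: the same 8-cycle sinusoid in every cell should put
# the domain-mean power at frequency bin 8.
t = np.arange(128)
signal = np.sin(2.0 * np.pi * 8 * t / 128)
c = signal[:, None, None] * np.ones((128, 4, 4))
ps = mean_power_spectrum(c)
print(int(np.argmax(ps)))  # 8
```

Comparing such domain-mean spectra between the coarse learned runs and the high-resolution reference run is what reveals the amplitude overestimate at (16Δx, 64Δt).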

4. Discussion

We discuss the approach described herein in terms of its strengths, weaknesses, and directions for future research. The first notable strength is the solid performance of the learned model in emulating the reference model, even without a large training dataset or hyperparameter tuning. The coarse-grained learned solver achieves accuracy that is similar to or better than the corresponding-resolution reference solver under conditions similar to the training dataset, and it performs well in some cases where the reference solver does not, particularly in simulations with temporal coarsening but without corresponding spatial coarsening. This is a key benefit of our approach because, for many applications of air quality modeling, high spatial resolution is much more important than high temporal resolution. For example, CTMs typically save results to disk only every 1 or 3 simulation hours, discarding all information at higher temporal resolution. Therefore, an approach that allows temporal coarse-graining without corresponding spatial coarse-graining can achieve computational acceleration without any reduction in the detail of the CTM output.

Furthermore, our learned solver generalizes fairly well to conditions it was not trained on, even though it was trained on an extremely limited amount of data. This good generalization performance extends to the application of our learned model for 2D simulations, even though it was only trained in 1D. Finally, our learned solvers can achieve substantial computational acceleration relative to the reference solver, with maximum acceleration at ∼10× for 1D and ∼100× for 2D simulations. Together, these results suggest that a scaled-up version of the approach shown here could be a good candidate for embedding within a full CTM for further testing, and this is a priority area for future work.

One limitation of our learned advection model is that it is numerically unstable under certain conditions. The number of scenarios in which numerical instability occurred was similar for the learned and reference solvers, but the conditions that cause instability in the learned solver are somewhat unpredictable, whereas the reference solver predictably becomes unstable under the conditions with the highest CFL values.

This limitation could be addressed in future work by using a larger and more diverse training dataset to reduce the chance of the solver encountering conditions that were not well represented during training, or by minimizing error over a larger number of time steps to encourage stability over longer time horizons. Fine-tuning the hyperparameters and architecture choices (e.g., number and size of layers and type of activation functions) could also help to develop a more accurate and stable model. Other possibilities include bounding the magnitude of the learned coefficients, similar to the idea of a flux limiter, or adding terms to the training loss function that encourage mass conservation or stability (e.g., requiring the amplification ratio in von Neumann analysis to be less than or equal to one).
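As one concrete shape for the loss-augmentation idea, a mass-conservation penalty can be added to the pointwise training loss. The sketch below uses NumPy and an arbitrary weight; the paper's actual loss and framework (Flux.jl in Julia) differ, so this is only an illustration of the idea, not the implementation.

```python
import numpy as np

def loss_with_mass_penalty(pred, target, prev, weight=1.0):
    """Pointwise MSE plus a penalty on the change in total mass across
    the predicted step (advection should conserve mass exactly)."""
    mse = np.mean((pred - target) ** 2)
    mass_drift = (pred.sum() - prev.sum()) ** 2
    return mse + weight * mass_drift

prev = np.ones(16)    # state before the step
target = np.ones(16)  # reference state after the step
perfect = loss_with_mass_penalty(np.ones(16), target, prev)
leaky = loss_with_mass_penalty(np.ones(16) * 1.1, target, prev)
print(perfect, leaky > perfect)  # 0.0 True
```

A prediction that matches the reference pointwise and conserves mass incurs zero loss, while one that creates or destroys mass is penalized even when its pointwise error is small.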

The models we have trained are specific to rectangular grids at particular resolutions. Future work could develop models that generalize across resolutions and that are applicable to other grid types, such as the cubed-sphere grid natively used by GEOS-FP.

We consider a learned solver to be useful when it is faster than the high-resolution reference solver and more accurate than the equivalent-resolution reference solver. A second limitation of the current study is that this is only true under conditions where the reference solver is particularly inaccurate, i.e., simulations with large time steps but small grid cells, or large grid cells but small time steps (Fig. 5).

Here, we have trained separate models for each combination of spatial and temporal resolutions. Future work could make ML advection operators that can generalize across spatial and temporal resolutions, for example, by taking Courant number and spatial and temporal grid size as inputs rather than wind speed.

In this work, our surrogate solver has a higher computational cost than the reference solver at the same resolution; the computational acceleration is therefore enabled by a large spatiotemporal coarsening factor. However, future work could reduce the computational intensity of the learned model by exploring different machine-learning architectures for predicting the operator coefficients [a and b in Eq. (8)]. A learned model with a smaller per-step computational penalty relative to the reference solver would be useful over a larger range of conditions.

We have compared the speed of our reference and ML models using standard CPU processors, but both model types could be accelerated using specialized hardware such as GPUs. Future work could implement a GPU-accelerated version of a traditional or ML-based advection operator within a full-scale atmospheric model.

Our ultimate goal is to incorporate an ML advection operator into a full-scale CTM for general use. Our ML model is designed as a drop-in replacement—its inputs and outputs are the same as those of a traditional advection operator. However, opportunities exist to improve the tools available for integrating ML models into large-scale geoscientific models, as the two model types are typically written in different languages.

5. Conclusions

In this study, we train a learned 1D advection solver on 10 days of transport data and characterize its performance, with the goal of exploring its general strengths and weaknesses rather than optimizing it for a particular use case. Even with this limited training dataset, the learned 1D advection scheme shows generally stable and accurate emulation that is robust to unforeseen conditions. The 1D learned solver runs simulations up to 18 times faster than the baseline solver and can be used for 2D advection with an appropriate splitting technique. The 2D implementation results are mostly stable, reasonably accurate, and overall more successful than we expected beforehand; our approach demonstrates up to 340× computational speedup in the 2D application. Numerical instability occasionally occurred for simulations with temporal coarse-graining but without spatial coarse-graining; addressing this instability in future research will help to fully realize the potential of this approach. We hope that the findings of this study serve as an inspiration and a benchmark for future studies on multidimensional learned advection operators.

Acknowledgments.

We appreciate the time and feedback of three anonymous reviewers. This work is supported by an Early Career Faculty grant from the National Aeronautics and Space Administration (Grant 80NSSC21K1813) and Assistance Agreement RD-84001201-0 awarded by the U.S. Environmental Protection Agency. It has not been formally reviewed by EPA. The views expressed in this document are solely those of the authors and do not necessarily reflect those of the Agency. EPA does not endorse any products or commercial services mentioned in this publication. MP is supported by the Carver Fellowship and Illinois Distinguished Fellowship.

Data availability statement.

The code to implement the reference and learned advection and the velocity dataset are available at https://doi.org/10.5281/zenodo.8111623. The output of the model training including the model parameters and outputs are available at https://doi.org/10.13012/B2IDB-4743181_V1.

APPENDIX

Appendix Tables and Figures

a. Appendix tables

Table A1 summarizes the training epochs with lowest error in the spatial and temporal coarsening cases tested in this study. Tables A2 and A3 show the computational time taken to integrate 10-day-long 1D advection by the learned advection solver and the reference solver, respectively. Table A4 shows the maximum CFL numbers in the tested spatial and temporal coarsening cases in this study. Tables A5 and A6 summarize the computational time taken to integrate 10-day-long 2D advection by the learned advection solver and the reference solver, respectively.

Table A1.

The best training epoch at each resolution tested in this study. In table cells with boldface numbers, training was not successful with a 3-layer CNN but was successful with a 4-layer CNN. Cases marked “Unstable” could not produce stable simulations even with a 5-layer CNN.

Table A2.

Computational time taken to integrate 10-day-long 1D advection by the learned advection solvers (ms; *: measured only once because of the long run time; N/A: not measured because the solver is unstable).

Table A3.

Computational time taken to integrate 10-day-long 1D advection by the reference advection solvers (ms).

Table A4.

The maximum CFL number of each wind field tested in this study.

Table A5.

Computational time taken to integrate 10-day-long 2D advection by the learned advection solvers (ms; *: measured only once because of the long run time; N/A: not measured because the solver is unstable).

Table A6.

Computational time taken to integrate 10-day-long 2D advection by the reference advection solvers (ms; *: measured only once because of the long run time; N/A: not measured because the solver is unstable).


b. Appendix figures

Figure A1 is a schematic of the subsampling method used in this study. Figures A2–A6 show the statistical results of the generalization tests. Figure A7 shows the power spectral density plot of the simulations by the reference solver and the learned coarse solvers.

Fig. A1.

Illustration of the mass-conserving subsampling method.


Fig. A2.

Summary of generalization testing with a longer time span. The cases with bold numbers were unstable when using the optimal epoch for the training data but stable when using a different epoch.


Fig. A3.

Summary of generalization testing with different initial conditions. The cases with the bold numbers were unstable when using the optimal epoch for the training data but stable when using a different training epoch.


Fig. A4.

Summary of generalization testing with different latitudes. The cases with the bold numbers were unstable when using the optimal epoch for the training data but stable when using a different training epoch.


Fig. A5.

Summary of generalization testing with different seasons. The cases with the bold numbers were unstable when using the optimal epoch for the training data but stable when using a different training epoch.


Fig. A6.

Summary of generalization testing with the longitudinal application. The cases with the bold numbers were unstable when using the optimal epoch for the training data but stable when using a different training epoch.


Fig. A7.

Power spectral density plots of the 2D advection results.


REFERENCES

  • Bey, I., and Coauthors, 2001: Global modeling of tropospheric chemistry with assimilated meteorology: Model description and evaluation. J. Geophys. Res., 106, 23 07323 095, https://doi.org/10.1029/2001JD000807.

    • Search Google Scholar
    • Export Citation
  • Bezanson, J., S. Karpinski, V. B. Shah, and A. Edelman, 2012: Julia: A fast dynamic language for technical computing. arXiv, 1209.5145v1, https://doi.org/10.48550/arXiv.1209.5145.

  • Brunton, S. L., B. R. Noack, and P. Koumoutsakos, 2020: Machine learning for fluid mechanics. Annu. Rev. Fluid Mech., 52, 477508, https://doi.org/10.1146/annurev-fluid-010719-060214.

    • Search Google Scholar
    • Export Citation
  • Caldwell, P. M., and Coauthors, 2019: The DOE E3SM coupled model version 1: Description and results at high resolution. J. Adv. Model. Earth Syst., 11, 40954146, https://doi.org/10.1029/2019MS001870.

    • Search Google Scholar
    • Export Citation
  • Courant, R., K. Friedrichs, and H. Lewy, 1928: Über die partiellen Differenzengleichungen der mathematischen Physik. Math. Ann., 100, 3274, https://doi.org/10.1007/BF01448839.

    • Search Google Scholar
    • Export Citation
  • Durran, D. R., 2010: Numerical Methods for Fluid Dynamics: With Applications to Geophysics. Vol. 32. Springer Science & Business Media, 516 pp.

  • Eastham, S. D., and Coauthors, 2018: GEOS-Chem High Performance (GCHP v11-02c): A next-generation implementation of the GEOS-Chem chemical transport model for massively parallel applications. Geosci. Model Dev., 11, 29412953, https://doi.org/10.5194/gmd-11-2941-2018.

    • Search Google Scholar
    • Export Citation
  • EPA, 2022: CMAQ. Zenodo, accessed 1 July 2024, https://doi.org/10.5281/zenodo.7218076.

  • Golaz, J.-C., and Coauthors, 2022: The DOE E3SM model version 2: Overview of the physical model and initial model evaluation. J. Adv. Model. Earth Syst., 14, e2022MS003156, https://doi.org/10.1029/2022MS003156.

    • Search Google Scholar
    • Export Citation
  • Grell, G. A., S. E. Peckham, R. Schmitz, S. A. McKeen, G. Frost, W. C. Skamarock, and B. Eder, 2005: Fully coupled “online” chemistry within the WRF model. Atmos. Environ., 39, 69576975, https://doi.org/10.1016/j.atmosenv.2005.04.027.

    • Search Google Scholar
    • Export Citation
  • Hendrycks, D., and K. Gimpel, 2016: Gaussian Error Linear Units (GELUs). arXiv, 1606.08415v5, https://doi.org/10.48550/arXiv.1606.08415.

  • Huang, Y., and J. H. Seinfeld, 2022: A neural network-assisted Euler integrator for stiff kinetics in atmospheric chemistry. Environ. Sci. Technol., 56, 46764685, https://doi.org/10.1021/acs.est.1c07648.

    • Search Google Scholar
    • Export Citation
  • Innes, M., and Coauthors, 2018: Fashionable modelling with Flux. arXiv, 1811.01457v3, https://doi.org/10.48550/arXiv.1811.01457.

  • Innes, M., A. Edelman, K. Fischer, C. Rackauckas, E. Saba, V. B. Shah, and W. Tebbutt, 2019: A differentiable programming system to bridge machine learning and scientific computing. arXiv, 1907.07587v2, https://doi.org/10.48550/arXiv.1907.07587.

  • JuliaCI, 2022: BenchmarkTools.jl. Accessed 27 July 2023, https://github.com/JuliaCI/BenchmarkTools.jl.

  • Keller, C. A., and M. J. Evans, 2019: Application of random forest regression to the calculation of gas-phase chemistry within the GEOS-Chem chemistry model v10. Geosci. Model Dev., 12, 12091225, https://doi.org/10.5194/gmd-12-1209-2019.

    • Search Google Scholar
    • Export Citation
  • Keller, C. A., M. J. Evans, J. N. Kutz, and S. Pawson, 2017: Machine learning and air quality modeling. 2017 IEEE Int. Conf. on Big Data (Big Data), Boston, MA, Institute of Electrical and Electronics Engineers, 4570–4576, https://doi.org/10.1109/BigData.2017.8258500.

  • Kelp, M. M., D. J. Jacob, J. N. Kutz, J. D. Marshall, and C. W. Tessum, 2020: Toward stable, general machine-learned models of the atmospheric chemical system. J. Geophys. Res. Atmos., 125, e2020JD032759, https://doi.org/10.1029/2020JD032759.

    • Search Google Scholar
    • Export Citation
  • Kelp, M. M., D. J. Jacob, H. Lin, and M. P. Sulprizio, 2022: An online-learned neural network chemical solver for stable long-term global simulations of atmospheric chemistry. J. Adv. Model. Earth Syst., 14, e2021MS002926, https://doi.org/10.1029/2021MS002926.

    • Search Google Scholar
    • Export Citation
  • Kim, S., W. Ji, S. Deng, Y. Ma, and C. Rackauckas, 2021: Stiff neural ordinary differential equations. Chaos, 31, 093122, https://doi.org/10.1063/5.0060697.

    • Search Google Scholar
    • Export Citation
  • Kingma, D. P., and J. Ba, 2014: Adam: A method for stochastic optimization. arXiv, 1412.6980v9, https://doi.org/10.48550/arXiv.1412.6980.

  • Kochkov, D., J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer, 2021: Machine learning–accelerated computational fluid dynamics. Proc. Natl. Acad. Sci. USA, 118, e2101784118, https://doi.org/10.1073/pnas.2101784118.

    • Search Google Scholar
    • Export Citation
  • Lin, S.-J., and R. B. Rood, 1996: Multidimensional flux-form semi-Lagrangian transport schemes. Mon. Wea. Rev., 124, 20462070, https://doi.org/10.1175/1520-0493(1996)124<2046:MFFSLT>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Bey, I., and Coauthors, 2001: Global modeling of tropospheric chemistry with assimilated meteorology: Model description and evaluation. J. Geophys. Res., 106, 23 073–23 095, https://doi.org/10.1029/2001JD000807.

  • Bezanson, J., S. Karpinski, V. B. Shah, and A. Edelman, 2012: Julia: A fast dynamic language for technical computing. arXiv, 1209.5145v1, https://doi.org/10.48550/arXiv.1209.5145.

  • Brunton, S. L., B. R. Noack, and P. Koumoutsakos, 2020: Machine learning for fluid mechanics. Annu. Rev. Fluid Mech., 52, 477–508, https://doi.org/10.1146/annurev-fluid-010719-060214.

  • Caldwell, P. M., and Coauthors, 2019: The DOE E3SM coupled model version 1: Description and results at high resolution. J. Adv. Model. Earth Syst., 11, 4095–4146, https://doi.org/10.1029/2019MS001870.

  • Courant, R., K. Friedrichs, and H. Lewy, 1928: Über die partiellen Differenzengleichungen der mathematischen Physik. Math. Ann., 100, 32–74, https://doi.org/10.1007/BF01448839.

  • Durran, D. R., 2010: Numerical Methods for Fluid Dynamics: With Applications to Geophysics. Vol. 32, Springer Science & Business Media, 516 pp.

  • Eastham, S. D., and Coauthors, 2018: GEOS-Chem High Performance (GCHP v11-02c): A next-generation implementation of the GEOS-Chem chemical transport model for massively parallel applications. Geosci. Model Dev., 11, 2941–2953, https://doi.org/10.5194/gmd-11-2941-2018.

  • EPA, 2022: CMAQ. Zenodo, accessed 1 July 2024, https://doi.org/10.5281/zenodo.7218076.

  • Golaz, J.-C., and Coauthors, 2022: The DOE E3SM model version 2: Overview of the physical model and initial model evaluation. J. Adv. Model. Earth Syst., 14, e2022MS003156, https://doi.org/10.1029/2022MS003156.

  • Grell, G. A., S. E. Peckham, R. Schmitz, S. A. McKeen, G. Frost, W. C. Skamarock, and B. Eder, 2005: Fully coupled “online” chemistry within the WRF model. Atmos. Environ., 39, 6957–6975, https://doi.org/10.1016/j.atmosenv.2005.04.027.

  • Hendrycks, D., and K. Gimpel, 2016: Gaussian Error Linear Units (GELUs). arXiv, 1606.08415v5, https://doi.org/10.48550/arXiv.1606.08415.

  • Huang, Y., and J. H. Seinfeld, 2022: A neural network-assisted Euler integrator for stiff kinetics in atmospheric chemistry. Environ. Sci. Technol., 56, 4676–4685, https://doi.org/10.1021/acs.est.1c07648.

  • Innes, M., and Coauthors, 2018: Fashionable modelling with Flux. arXiv, 1811.01457v3, https://doi.org/10.48550/arXiv.1811.01457.

  • Innes, M., A. Edelman, K. Fischer, C. Rackauckas, E. Saba, V. B. Shah, and W. Tebbutt, 2019: A differentiable programming system to bridge machine learning and scientific computing. arXiv, 1907.07587v2, https://doi.org/10.48550/arXiv.1907.07587.

  • JuliaCI, 2022: BenchmarkTools.jl. Accessed 27 July 2023, https://github.com/JuliaCI/BenchmarkTools.jl.

  • Keller, C. A., and M. J. Evans, 2019: Application of random forest regression to the calculation of gas-phase chemistry within the GEOS-Chem chemistry model v10. Geosci. Model Dev., 12, 1209–1225, https://doi.org/10.5194/gmd-12-1209-2019.

  • Keller, C. A., M. J. Evans, J. N. Kutz, and S. Pawson, 2017: Machine learning and air quality modeling. 2017 IEEE Int. Conf. on Big Data (Big Data), Boston, MA, Institute of Electrical and Electronics Engineers, 4570–4576, https://doi.org/10.1109/BigData.2017.8258500.

  • Kelp, M. M., D. J. Jacob, J. N. Kutz, J. D. Marshall, and C. W. Tessum, 2020: Toward stable, general machine-learned models of the atmospheric chemical system. J. Geophys. Res. Atmos., 125, e2020JD032759, https://doi.org/10.1029/2020JD032759.

  • Kelp, M. M., D. J. Jacob, H. Lin, and M. P. Sulprizio, 2022: An online-learned neural network chemical solver for stable long-term global simulations of atmospheric chemistry. J. Adv. Model. Earth Syst., 14, e2021MS002926, https://doi.org/10.1029/2021MS002926.

  • Kim, S., W. Ji, S. Deng, Y. Ma, and C. Rackauckas, 2021: Stiff neural ordinary differential equations. Chaos, 31, 093122, https://doi.org/10.1063/5.0060697.

  • Kingma, D. P., and J. Ba, 2014: Adam: A method for stochastic optimization. arXiv, 1412.6980v9, https://doi.org/10.48550/arXiv.1412.6980.

  • Kochkov, D., J. A. Smith, A. Alieva, Q. Wang, M. P. Brenner, and S. Hoyer, 2021: Machine learning–accelerated computational fluid dynamics. Proc. Natl. Acad. Sci. USA, 118, e2101784118, https://doi.org/10.1073/pnas.2101784118.

  • Lin, S.-J., and R. B. Rood, 1996: Multidimensional flux-form semi-Lagrangian transport schemes. Mon. Wea. Rev., 124, 2046–2070, https://doi.org/10.1175/1520-0493(1996)124<2046:MFFSLT>2.0.CO;2.

  • Lin, S.-J., W. C. Chao, Y. C. Sud, and G. K. Walker, 1994: A class of the van Leer-type transport schemes and its application to the moisture transport in a general circulation model. Mon. Wea. Rev., 122, 1575–1593, https://doi.org/10.1175/1520-0493(1994)122<1575:ACOTVL>2.0.CO;2.

  • NASA, 2022: GMAO product GEOS near-real time data products. Accessed 27 July 2023, https://gmao.gsfc.nasa.gov/GMAO_products/NRT_products.php.

  • Philip, S., R. V. Martin, and C. A. Keller, 2016: Sensitivity of chemistry-transport model simulations to the duration of chemical and transport operators: A case study with GEOS-Chem v10-01. Geosci. Model Dev., 9, 1683–1695, https://doi.org/10.5194/gmd-9-1683-2016.

  • Shen, L., D. J. Jacob, M. Santillana, K. Bates, J. Zhuang, and W. Chen, 2022: A machine-learning-guided adaptive algorithm to reduce the computational cost of integrating kinetics in global atmospheric chemistry models: Application to GEOS-Chem versions 12.0.0 and 12.9.1. Geosci. Model Dev., 15, 1677–1687, https://doi.org/10.5194/gmd-15-1677-2022.

  • Simon, H., K. R. Baker, and S. Phillips, 2012: Compilation and interpretation of photochemical model performance statistics published between 2006 and 2012. Atmos. Environ., 61, 124–139, https://doi.org/10.1016/j.atmosenv.2012.07.012.

  • Stachenfeld, K., and Coauthors, 2021: Learned coarse models for efficient turbulence simulation. arXiv, 2112.15275v3, https://doi.org/10.48550/arXiv.2112.15275.

  • Sturm, P. O., A. Manders, R. Janssen, A. Segers, A. S. Wexler, and H. X. Lin, 2023: Advecting superspecies: Efficiently modeling transport of organic aerosol with a mass-conserving dimensionality reduction method. J. Adv. Model. Earth Syst., 15, e2022MS003235, https://doi.org/10.1029/2022MS003235.

  • Vinuesa, R., and S. L. Brunton, 2022: Enhancing computational fluid dynamics with machine learning. Nat. Comput. Sci., 2, 358–366, https://doi.org/10.1038/s43588-022-00264-7.

  • Zhuang, J., D. J. Jacob, H. Lin, E. W. Lundgren, R. M. Yantosca, J. F. Gaya, M. P. Sulprizio, and S. D. Eastham, 2020: Enabling high-performance cloud computing for earth science modeling on over a thousand cores: Application to the GEOS-Chem atmospheric chemistry model. J. Adv. Model. Earth Syst., 12, e2020MS002064, https://doi.org/10.1029/2020MS002064.

  • Zhuang, J., D. Kochkov, Y. Bar-Sinai, M. P. Brenner, and S. Hoyer, 2021: Learned discretizations for passive scalar advection in a two-dimensional turbulent flow. Phys. Rev. Fluids, 6, 064605, https://doi.org/10.1103/PhysRevFluids.6.064605.
  • Fig. 1.

    Symbolic representation of the training, testing, and generalization process in this study.

  • Fig. 2.

    Visualization of the 1D training and testing domain.

  • Fig. 3.

    Design of the neural-network-based models to emulate the L94 advection solver. Φn is the scalar field, and Un is the velocity field.

  • Fig. 4.

    Performance of the learned solver in emulating the training dataset at different coarse-graining resolutions as compared to the highest-resolution reference solver simulation. Conditions where integration failed are marked “Unstable.”

  • Fig. 5.

    Performance of the reference solver in coarse-graining the fine-resolution training dataset at different resolutions, as compared to the highest-resolution reference solver simulation. Conditions where integration failed are marked “Unstable.”

  • Fig. 6.

    Time series representation of the learned emulation of the training dataset: (top row) 8Δx, 16Δt and (bottom row) 16Δx, 64Δt. The initial shapes differ slightly from the square initial condition because of coarse-graining.

  • Fig. 7.

    Boxplots of error statistics in generalization tests for (a) MAE, (b) RMSE, and (c) R2. Each data point represents one combination of spatial and temporal resolutions. Unstable simulations are not shown here but are shown in Figs. A2–A6.

  • Fig. 8.

    Dependence of generalization test accuracy on the maximum CFL number of target wind fields. Each boxplot summarizes the errors in one testing scenario, integrating wind data that differ from the training dataset.
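    The CFL number referenced in this caption follows the standard definition from Courant et al. (1928): the maximum of |u|Δt/Δx over the wind field. A minimal sketch (an illustration of the definition, not the paper's code, which is written in Julia) might look like:

    ```python
    import numpy as np

    def max_cfl(u, dt, dx):
        """Maximum CFL number, max(|u|) * dt / dx, for a wind field u (m/s),
        time step dt (s), and grid spacing dx (m)."""
        return np.max(np.abs(u)) * dt / dx

    # Example: a 10 m/s peak wind on a 25 km grid with a 600 s step.
    u = np.array([-4.0, 10.0, 7.5])
    print(max_cfl(u, dt=600.0, dx=25_000.0))  # → 0.24
    ```

    Coarsening the grid by 16× and the time step by 64× (as in this study) multiplies this number by 4, which is why stability at large coarse-graining factors is a central concern.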

  • Fig. 9.

    Performance of the learned solver in the 2D implementation at different coarse-graining resolutions. Cases with bold numbers were unstable with the optimal training epoch but stable when using a different training epoch.

  • Fig. 10.

    Performance of the reference solver in 2D at different resolutions.

  • Fig. 11.

    2D demonstration of coarse-graining by the learned and reference advection schemes. The first row shows the results of the high-resolution baseline solver, the second and fourth rows show the learned advection at different resolutions, and the third and fifth rows show the low-resolution reference solver at the same resolutions.

  • Fig. A1.

    Illustration of the mass-conserving subsampling method.
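    One common way to realize mass-conserving subsampling of the kind illustrated in Fig. A1 is block averaging, in which each coarse cell takes the mean of the fine cells it covers; because coarse cells are proportionally wider, the total mass (concentration times cell width) is unchanged. The sketch below illustrates that idea under this assumption; the paper's exact subsampling procedure may differ in detail.

    ```python
    import numpy as np

    def coarse_grain(phi, factor):
        """Coarse-grain a 1D scalar field by averaging `factor` adjacent cells.
        Averaging (rather than point sampling) preserves total mass, since
        each coarse cell is `factor` times wider than a fine cell."""
        assert phi.size % factor == 0, "field length must be divisible by factor"
        return phi.reshape(-1, factor).mean(axis=1)

    phi = np.array([1.0, 3.0, 2.0, 6.0])      # fine field, dx = 1
    coarse = coarse_grain(phi, 2)             # coarse field, dx = 2
    # Mass check: sum(phi) * dx_fine == sum(coarse) * dx_coarse.
    assert phi.sum() * 1.0 == coarse.sum() * 2.0
    ```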

  • Fig. A2.

    Summary of generalization testing with a longer time span. Cases with bold numbers were unstable when using the optimal training epoch but stable when using a different training epoch.

  • Fig. A3.

    Summary of generalization testing with different initial conditions. Cases with bold numbers were unstable when using the optimal training epoch but stable when using a different training epoch.

  • Fig. A4.

    Summary of generalization testing with different latitudes. Cases with bold numbers were unstable when using the optimal training epoch but stable when using a different training epoch.

  • Fig. A5.

    Summary of generalization testing with different seasons. Cases with bold numbers were unstable when using the optimal training epoch but stable when using a different training epoch.

  • Fig. A6.

    Summary of generalization testing with the longitudinal application. Cases with bold numbers were unstable when using the optimal training epoch but stable when using a different training epoch.

  • Fig. A7.

    Spectral density analysis of the 2D advection results.
