Deep Learning Parameterization of Vertical Wind Velocity Variability via Constrained Adversarial Training

Donifan Barahona,a Katherine H. Breen,a,b Heike Kalesse-Los,c and Johannes Röttenbacherc

a Global Modeling and Assimilation Office, NASA Goddard Space Flight Center, Greenbelt, Maryland
b Morgan State University, Baltimore, Maryland
c Leipzig Institute for Meteorology, Leipzig University, Leipzig, Germany

Open access

Abstract

Atmospheric models with typical resolution in the tens of kilometers cannot resolve the dynamics of air parcel ascent, which varies on scales ranging from tens to hundreds of meters. Small-scale wind fluctuations are thus characterized by a subgrid distribution of vertical wind velocity W with standard deviation σW. The parameterization of σW is fundamental to the representation of aerosol–cloud interactions, yet it is poorly constrained. Using a novel deep learning technique, this work develops a new parameterization for σW merging data from global storm-resolving model simulations, high-frequency retrievals of W, and climate reanalysis products. The parameterization reproduces the observed statistics of σW and leverages learned physical relations from the model simulations to guide extrapolation beyond the observed domain. Incorporating observational data during the training phase was found to be critical for its performance. The parameterization can be applied online within large-scale atmospheric models, or offline using output from weather forecasting and reanalysis products.

Significance Statement

Vertical air motion plays a crucial role in several atmospheric processes, such as cloud droplet and ice crystal formation. However, it often occurs at scales smaller than those resolved by standard atmospheric models, leading to uncertainties in climate predictions. To address this, we present a novel deep learning approach that synthesizes data from various sources, providing a representation of small-scale vertical wind velocity suitable for integration into atmospheric models. Our method demonstrates high accuracy when compared to observation-based retrievals, offering potential to mitigate uncertainties and enhance climate forecasting.

© 2023 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Donifan Barahona, donifan.o.barahona@nasa.gov


1. Introduction

Many atmospheric processes depend on the movement of individual air parcels at scales ranging from tens to hundreds of meters. General circulation models (GCMs) typically operate at horizontal resolutions on the order of 100 km (IPCC 2013). Reanalysis and short-term numerical weather prediction (NWP), which require skill in reproducing storm dynamics, typically use a higher resolution, spanning from about 10 km for NWP (Johnson et al. 2019) to about 50 km for subseasonal and seasonal forecasts (e.g., Molod et al. 2020). Resolving convective transport, gravity wave motion, cloud and aerosol microphysics, and turbulent mixing often requires meter-scale resolution, and these processes are still heavily parameterized, even in NWP applications (Bauer et al. 2015). Given the typical horizontal resolution of atmospheric models (∼10–100 km), multiple ascending air parcels are likely to be found within each grid cell, each driven by its own vertical velocity W. This leads to a subgrid distribution of vertical wind velocities, characterized by a standard deviation σW.

The parameterization of σW plays a crucial role in accurately representing clouds and their interaction with aerosol emissions. Aerosol activation into cloud droplets and ice crystals results from the generation of supersaturation in ascending parcels. Gridscale cloud formation rates are obtained by integrating over the subgrid spectrum of vertical wind velocities, determined by σW (Pruppacher and Klett 1997). Variability in σW accounts for about 70% of the total variability in ice crystal and droplet formation rates (Sullivan et al. 2016). Uncertainty in σW thus translates directly into uncertainty in cloud representation, with profound implications for climate predictions (IPCC 2013; Seinfeld et al. 2016).

Atmospheric models traditionally rely on episodic campaign data (Peng et al. 2005; Shi and Liu 2016) and empirical approximations (Morrison et al. 2005; Ghan et al. 1997; Joos et al. 2008; Dean et al. 2007) to estimate σW. More reliable estimates can be obtained using modern turbulence/convection schemes (e.g., Bogenschutz et al. 2013; Lopez-Gomez et al. 2020), as, for example, those adopted in the most recent version of the Community Earth System Model (Danabasoglu et al. 2020). These higher-order schemes, however, depend on numerous parameters that can be challenging to constrain and significantly increase computational cost in climate simulations (Guo et al. 2014). Moreover, these schemes are primarily designed to represent warm, shallow, and stratocumulus clouds, which are strongly influenced by boundary layer turbulence. Consequently, they may tend to underestimate σW in high-level clouds that are impacted by orography and convection (Barahona et al. 2017; Patnaude et al. 2021).

Spatial variability in W at typical GCM resolutions can also be estimated by downsampling high-resolution simulations, such as large-eddy simulations (LES), which explicitly resolve vertical air motion at the scale of a few meters (Lenschow et al. 2012). Due to their computational expense, LES simulations are, however, limited to small domains. Another approach is to use global cloud-resolving models (GCRMs), which explicitly resolve kilometer-scale atmospheric motion and its interaction with cloud microphysics (Satoh et al. 2019). GCRMs provide global coverage at a higher spatial resolution than typical GCMs, although still coarser than LES. GCRMs work by either embedding high-resolution two-dimensional cloud-resolving models within a coarser GCM grid (Terai et al. 2020), or by direct downscaling of the model physics using nonhydrostatic dynamics (Fudeyasu et al. 2008; Putman and Suarez 2011). GCRMs offer significant advantages over traditional, low-resolution GCMs, since they are able to explicitly resolve convection and better link transport, cloud and aerosol microphysical processes, and atmospheric air motion. However, conducting GCRM simulations requires substantial technical resources (Satoh et al. 2019). As a result, most GCRM simulations span relatively short periods, typically a few weeks, limiting their ability to represent climate seasonality (Judt et al. 2021; Satoh et al. 2019). Alternatively, slightly coarser global storm-resolving model (GSRM) simulations can span longer periods, up to a few years (Putman and Suarez 2011).

While downscaling high-resolution simulations offers insights into the climatological behavior of σW, it poses challenges in generating state-dependent parameterizations for standard GCMs. However, these challenges can be overcome by employing artificial neural networks (ANNs) (Rasp et al. 2018; Gettelman et al. 2021; Mooers et al. 2020; Beucler et al. 2021). ANNs have the capability of synthesizing large volumes of data into a compressed representation, while retaining the most significant relationships within the dataset (Goodfellow et al. 2016; LeCun et al. 2015; Schmidhuber 2015). For instance, Rasp et al. (2018) trained an ANN using GCRM output to parameterize subgrid-scale variability of moisture in a GCM, resulting in improved representation of tropical precipitation. The ANN, however, lacked generalization skill for temperatures outside of the training data manifold. Recent proposed architectures show potential in improving the stability of subgrid ANN parameterizations (Lopez-Gomez et al. 2022; Iglesias-Suarez et al. 2023). ANNs are data-driven algorithms, meaning that predictions are made via the identification of mapping functions to transform inputs as opposed to numerical simulations or theoretical models. Successful deep learning applications often require data on the petascale (1024 terabytes) and exascale (1024 petabytes) (Chi et al. 2016; Laney 2001).

Despite their strength in representing small-scale processes, GCRMs exhibit biases in the representation of turbulence, shallow convection, and cloud microphysics (Roh et al. 2021). To address this, Beucler et al. (2021) demonstrated that enforcing energy and mass conservation during training improves the stability and accuracy of simulations incorporating subgrid-scale ANN parameterizations. However, global constraints such as top of the atmosphere radiative balance may compel ANN parameterizations to compensate for errors in other parts of the GCM as opposed to refining predictions of interest. Additionally, ANN models aiming to replace entire GCM components cannot be directly evaluated against observations, but only as part of the complete GCM simulation. To address these challenges, adopting a process-level approach is desirable, enabling the evaluation of individual parameterizations against experimental observations. Merging observational data with GCRM output during the training of surrogate ANN models may also reduce biases resulting from deficiencies in theoretical models.

Generative algorithms offer a promising approach to incorporate subgrid physics inherent in observations into ANN models while mitigating the impact of experimental errors. These algorithms aim to train ANNs by aligning with the data distribution, rather than simply learning the data representation (Zeng et al. 2021). Global statistics tend to be more resilient to nonsystematic experimental errors compared to individual values, making them valuable for guiding the training process of the ANN. Among the various generative models, the introduction of generative adversarial networks (GANs) has significantly improved accuracy and efficiency (Goodfellow et al. 2016). GANs employ a novel framework where two ANNs, a generator and a discriminator, engage in a “competition” during probabilistic training. The generator produces examples that the discriminator either accepts or rejects based on its own learned representation of the target data distribution (LeCun et al. 2015). GANs can be formulated as semi- or fully supervised algorithms (Mirza and Osindero 2014); more recent advances have improved their stability and convergence (Radford et al. 2015; Arjovsky et al. 2017; Zhu et al. 2017; Berthelot et al. 2017; Creswell et al. 2018; Pan et al. 2020). GANs have been widely successful in various domains including computer graphics and natural language processing (Creswell et al. 2018), and found applications in developing physical models (Willard et al. 2020), and in weather and climate prediction (Leinonen et al. 2019; Bihlo 2021; Besombes et al. 2021).

In this work, we propose a novel generative approach to develop an ANN representation of the subgrid distribution of vertical wind velocity. Our method involves combining W retrievals from various sources, global storm-resolving simulations, and reanalysis products. The key aspect of our approach is the integration of observational constraints directly within the ANN parameterization. Ground-based remote sensing stations worldwide have collected extensive high-frequency radar and lidar measurements, enabling the retrieval of W (Kalesse and Kollias 2013; Giangrande et al. 2016; Newsom et al. 2019). Although these measurements span different time periods and have limited spatial coverage, they collectively provide nearly 100 years of W retrievals at a sampling frequency from 2 s to 5 min. We leverage this wealth of observational data to enforce constraints on the ANN model.

2. Components and data

Our parameterization approach uses output from high-resolution simulations, reanalysis products, and observational datasets. These are detailed in this section.

a. The NASA GEOS model and MERRA-2

The NASA Goddard Earth Observing System (GEOS) consists of a set of components that numerically represent different aspects of the Earth system (atmosphere, ocean, land, sea ice, and chemistry), coupled following the Earth System Modeling Framework (https://gmao.gsfc.nasa.gov/GEOS_systems/). In the AGCM mode, atmospheric transport of water vapor, condensate, and other tracers, and associated land–atmosphere exchanges, are computed explicitly, whereas sea ice fraction and sea surface temperature are prescribed as time-dependent boundary conditions (Reynolds et al. 2002; Rienecker et al. 2008). Transport of aerosols and gaseous tracers such as CO are simulated using the Goddard Chemistry Aerosol and Radiation model (GOCART; Chin et al. 2002; Colarco et al. 2010). Cloud microphysics is described using a two-moment scheme where the mixing ratio and number concentration of cloud droplets and ice crystals are prognostic variables for stratiform clouds (i.e., cirrus, stratocumulus) and convective clouds (Barahona et al. 2014; Tan and Barahona 2022). GEOS has been shown to reproduce the global distribution of clouds, radiation, and precipitation in agreement with satellite retrievals and in situ observations (Barahona et al. 2014), and it is used operationally in subseasonal and seasonal forecast prediction (Molod et al. 2020).

The second version of the Modern-Era Retrospective Analysis for Research and Applications (MERRA-2) was the first multidecadal reanalysis where aerosol and meteorological observations are jointly assimilated (Gelaro et al. 2017; Randles et al. 2017). GEOS forms the core model of MERRA-2, which is constrained by ingesting a wealth of data from satellite, ground-based, and aircraft observations using the NASA Data Assimilation (Rienecker et al. 2008) and Aerosol Assimilation Systems (Randles et al. 2017). Because it is highly constrained by observations, MERRA-2 can be collocated in time and space with field retrievals (Gelaro et al. 2017). Section 3b details how this feature allows us to build an ANN model that directly incorporates observational data.

b. GEOS global storm-resolving simulations

Over the last decade, the NASA Global Modeling and Assimilation Office has performed a series of global nonhydrostatic integrations of the NASA GEOS model (Putman et al. 2015) as part of the Dynamics of the Atmospheric General Circulation Modeled on Nonhydrostatic Domains (DYAMOND) project (Stevens et al. 2019). In "nature" mode, these simulations use a free-running configuration of GEOS constrained only by climatological sea surface temperatures (most recently, a fully coupled atmosphere–ocean GSRM simulation at 5-km resolution has been achieved). They therefore capture the variability arising from the physical relationships within the model itself, but only at massive computational expense. The longest of these runs spanned two years at a horizontal resolution of 7 km and is referred to as the GEOS-5 Nature Run (G5NR; Gelaro et al. 2015).

Although G5NR has a slightly lower resolution than current GSRMs, it stands as the only multiyear kilometer-scale simulation achieved thus far. However, G5NR does not resolve boundary layer processes and generally underestimates σW. This is to be expected, as a significant portion of the variability in W originates at scales smaller than 7 km. Nevertheless, G5NR captures the impact of large-scale features that trigger small-scale turbulence, such as the enhancement of σW over mountain ranges, convective systems, and along jet streams (Barahona et al. 2017). These large-scale effects are challenging to discern solely from ground-based data due to their limited spatial coverage. In this context, G5NR provides a valuable reference derived from physical principles that enables the ANN parameterization to extrapolate beyond local environments, complementing the observational data.

c. Vertical wind velocity retrievals

We use reported and new W retrievals from ground-based Doppler radar (DR) and Doppler lidar (DLi) instruments at different locations around the world. Both DR and DLi operate in a similar fashion, where backscattered pulses of electromagnetic energy are analyzed in time and space to retrieve W. In DR, the observed mean Doppler velocity is decomposed into the air velocity and the reflectivity-weighted hydrometeor velocity. DR depends on the presence of hydrometeors; hence, W can only be retrieved when clouds are present. Because DLi is sensitive to both clouds and aerosols, W can be retrieved in clear-sky conditions, although it is usually confined to the planetary boundary layer where micrometer-size particles are abundant (Newsom et al. 2019). To take advantage of the strengths of each technique, we have selected 11 diverse sites corresponding to different meteorological conditions, seasonality, and orographic features. Put together, they correspond to more than 100 years of continuous W retrievals.

Table 1 describes the data available at each of the selected sites; their locations are depicted in Fig. 1. Most datasets were obtained from the Atmospheric Radiation Measurement archive (http://www.archive.arm.gov/). The sites at Leipzig, Germany (LEI), and Limassol, Cyprus (LIM), were collected with the Leipzig Aerosol and Cloud Remote Observations System (LACROS; Bühl et al. 2013) within the Cloudnet network (Illingworth et al. 2007) and are reported for the first time in this work. We have selected datasets that span at least a year, to ensure a sufficiently large dataset to train the ANN. In general, all the lidar datasets correspond to retrievals done within the planetary boundary layer (Newsom et al. 2019; Berg et al. 2017), whereas the radar retrievals predominantly focus on ice clouds. The only exception is the MAO dataset (Giangrande et al. 2016), which is radar-based and focuses on convective clouds. When σW is not reported (which is particularly the case for Doppler radar datasets), we use the average horizontal wind velocity at each site to calculate σW from the retrieved W (Illingworth et al. 2007).
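The conversion from a retrieved W time series to σW can be sketched as follows. This is one plausible reading of the procedure, assuming Taylor's frozen-turbulence hypothesis: the mean horizontal wind advects a grid cell of width L past the instrument in a time T = L/U, so the standard deviation of W within each window of length T approximates σW at the model's resolution. The function name and exact windowing are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def sigma_w_from_timeseries(w, dt, grid_size_m, u_mean):
    """Estimate subgrid sigma_W from a high-frequency W time series.

    w          : 1D array of retrieved vertical velocities (m/s)
    dt         : sampling interval (s)
    grid_size_m: nominal horizontal grid size of the target model (m)
    u_mean     : site-average horizontal wind speed (m/s)
    """
    # Samples per advective window T = L / U (Taylor's hypothesis)
    window = max(1, int(round(grid_size_m / (u_mean * dt))))
    n = (len(w) // window) * window
    blocks = w[:n].reshape(-1, window)
    # Per-window standard deviation approximates sigma_W per grid cell
    return blocks.std(axis=1, ddof=1)
```

With 2-s sampling, a 50-km cell, and a 10 m s−1 mean wind, each window spans 2500 samples (about 83 min of data per σW estimate).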

Table 1. Datasets used for training. Site locations are shown in Fig. 1.

Fig. 1. Location of the field sites for the datasets described in Table 1.

Citation: Artificial Intelligence for the Earth Systems 3, 1; 10.1175/AIES-D-23-0025.1

All datasets were interpolated to match the MERRA-2 vertical grid and filtered for outliers, defined as data outside 2.5 standard deviations for each site. To balance the training set, data were augmented by repeating the LIM, LEI, MAN, and MAO sites four times, introducing a 1% random perturbation each time. These sites correspond to cirrus and convective conditions that are underrepresented in the collected dataset. It is challenging to unambiguously split the data to ensure a clear separation between the training and test sets while incorporating observations from different sites with distinct conditions. We adopted a sequential splitting approach, where the observational data were divided into training and testing sets based on time periods. Specifically, we allocated the first 80% of the time periods for training and the last 15% for testing, with a 5% gap to prevent overlap.
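The preprocessing steps described above (outlier filtering at 2.5 standard deviations, fourfold augmentation of underrepresented sites with 1% perturbations, and the sequential 80/5/15 split) can be sketched as below. Function names and the Gaussian noise model for the perturbation are illustrative assumptions.

```python
import numpy as np

def filter_outliers(x, n_std=2.5):
    # Drop samples beyond 2.5 standard deviations of the site mean
    mu, sd = x.mean(), x.std()
    return x[np.abs(x - mu) <= n_std * sd]

def augment(x, repeats=4, noise=0.01, seed=0):
    # Repeat an underrepresented site, adding a 1% random perturbation
    # to each copy to avoid exact duplicates in the training set
    rng = np.random.default_rng(seed)
    copies = [x * (1.0 + noise * rng.standard_normal(x.shape))
              for _ in range(repeats)]
    return np.concatenate([x] + copies)

def sequential_split(x, train_frac=0.80, gap_frac=0.05):
    # First 80% of the record for training, a 5% gap to prevent
    # temporal overlap, and the final 15% for testing
    n = len(x)
    i_train = int(train_frac * n)
    i_test = int((train_frac + gap_frac) * n)
    return x[:i_train], x[i_test:]
```

The sequential split preserves temporal ordering, so the test period never leaks autocorrelated information into training.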

3. Parameterization approach

The goal of this work is to develop an ANN parameterization that uses low-resolution state input from a GCM to estimate σW. This is accomplished using a two-stage approach. In the first step, we build an ANN, termed “Wnet-prior,” as a surrogate model of the high-resolution G5NR output (section 3a). In the second step, Wnet-prior is incorporated as the first layers of a second ANN, termed “Wnet” (section 3b), which is trained using an adversarial approach. Wnet constitutes the final parameterization, constrained both by the field data and by the physical relations implicit in G5NR. This methodology is depicted in Fig. 2 and detailed below.

Fig. 2. Scheme of the parameterization approach. (top left) Wnet-prior is trained by downsampling the G5NR output. (bottom left) Its hidden layers are frozen and incorporated as the first layers of the final ANN parameterization, Wnet, which is then (bottom right) adversarially trained against observations. The terms Ns and Nf refer to the number of samples and input features, respectively; XG5NR and XMERRA2 to the state defined by output from the G5NR simulation and the MERRA-2 reanalysis, respectively; Y^GEN and Y^Obs to generated and observed data, respectively; and h is used to represent the "latent space" of the discriminator.


a. Generation of an ANN representation of σW from global storm-resolving simulations: Wnet-prior

Figure 2, top left, depicts the development of Wnet-prior. On the “input” side (blue), the G5NR data are downsampled by averaging over a nominal resolution, that is, about 0.5°, to represent the variables resolved in a low-resolution GCM. On the “output” side (green), σW (m s−1) is calculated directly from W using about 64 values for each 0.5° cell, which constitutes the target values used for training and validation. Prior experience in upper-tropospheric clouds (Barahona et al. 2017), as well as theoretical considerations, suggested that σW is dependent on orography, turbulence, convection, winds, and thermodynamics. Based on this, and by trial and error, we selected a set of inputs, XG5NR, at the coarse resolution to train the ANN. These include the Richardson number (Ri, dimensionless), total scalar diffusivity for momentum (Km, in m2 s−1), the three-dimensional wind velocity (U, V, and W, in m s−1), the water vapor, liquid, and ice mass mixing ratios (Qυ, Ql, and Qi in kg kg−1), air density (ρa, in kg m−3), and air temperature (T, in K). These variables are found in typical GCM output.
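The construction of the training targets on the "output" side can be sketched as follows. The 8 × 8 block size is an assumption chosen so that each ∼0.5° coarse cell contains roughly 64 of the ∼7-km G5NR values, consistent with the "about 64 values" stated above; the function name is illustrative.

```python
import numpy as np

def coarse_grain_sigma_w(w_highres, block=8):
    """Downsample a 2D high-resolution W field (one vertical level) to
    coarse cells and compute the target sigma_W for each cell."""
    ny, nx = w_highres.shape
    ny, nx = (ny // block) * block, (nx // block) * block  # trim edges
    w = w_highres[:ny, :nx].reshape(ny // block, block, nx // block, block)
    sigma_w = w.std(axis=(1, 3), ddof=1)  # training target per coarse cell
    w_mean = w.mean(axis=(1, 3))          # coarse-resolved W (an input)
    return w_mean, sigma_w
```

The same block averaging applied to the other G5NR fields yields the coarse-resolution input state XG5NR.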

Wnet-prior was designed to work on individual grid cells. That is, to predict a single scalar, σW, from a one-dimensional input vector. Ideally, using three-dimensional (3D) input fields may inform the ANN of large-scale spatial correlations affecting σW, as for example, orography, convection, and teleconnections (Kärcher and Podglajen 2019). However, it might also make the parameterization resolution-dependent, limiting its application in different models. Another caveat is that field observations used to constrain the ANN (section 3b) would need to have global 3D coverage, which is almost never the case. Even using two-dimensional features (i.e., atmospheric columns) as input (e.g., Rasp et al. 2018) results in "gaps" in the prediction during the refinement step (section 3b), for vertical levels where most of the observational datasets are not available, such as between 4 and 6 km of altitude (not shown). However, when comparing the 2D and 1D models, it was observed that the impact of orography on σW can be effectively approximated by incorporating a specific set of surface variables into the input of the ANN at all levels. Through trial and error (details not shown), we determined that the optimal set of surface variables for this purpose consists of Km, Qυ, ρa, and Ri. As a result, the final input set for the ANN was a 14-dimensional vector, including these variables. The training and optimization procedure for Wnet-prior is detailed in section A1 of the appendix.
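Assembling the 14-dimensional input vector (10 per-level variables plus the 4 surface variables broadcast to every level) can be sketched as follows. The variable ordering is an assumption; the paper does not specify it.

```python
import numpy as np

# Per-level inputs (10): Ri, Km, U, V, W, Qv, Ql, Qi, rho_a, T
LEVEL_VARS = ["Ri", "Km", "U", "V", "W", "Qv", "Ql", "Qi", "rho_a", "T"]
# Surface values appended at every level (4): Km, Qv, rho_a, Ri
SURF_VARS = ["Km_sfc", "Qv_sfc", "rho_sfc", "Ri_sfc"]

def build_features(column, surface):
    """Stack a (nlev, 10) column of per-level variables with the 4
    surface scalars broadcast to every level, giving the (nlev, 14)
    input vector fed to Wnet-prior one grid cell at a time."""
    nlev = column.shape[0]
    sfc = np.tile(surface, (nlev, 1))  # replicate surface state per level
    return np.concatenate([column, sfc], axis=1)
```

Broadcasting the surface state to all levels is what lets a purely 1D (per-cell) network retain information about orographic forcing from below.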

Data used for training, validation, and testing were randomly selected from the G5NR data without replacement: training data were selected from the years 2005–06 of the simulation and testing from the year 2007. Training data are used to optimize the mapping between input features and targets. At every epoch (training iteration), it is common to evaluate the current state of the ANN on a dataset for which the targets are known. Training is stopped based on criteria defined using the validation loss. The trained/validated model is then applied to the test set, consisting of 20 half-hourly output files of randomly selected output from G5NR (∼3 × 108 samples), not used during training.

b. Refinement by constrained adversarial training: Wnet

Neural networks are typically trained to identify a nonlinear mapping between a feature vector X and the corresponding target vector Y, yielding the estimation Y^. Instead of learning Y directly, GANs aim to learn the data distribution P(Y) (Goodfellow et al. 2016). This is achieved by contrasting the predictions of two networks, a generator and a discriminator, that train simultaneously. When trained in this way, the generator would produce new examples that conform with the data distribution by sampling a learned distribution, P(Y^) (Zeng et al. 2021). By concatenating labeled data to the target, it is possible to restrict P(Y) and P(Y^) within a particular reference class, hence creating a conditional GAN, cGAN (Mirza and Osindero 2014).

We adapt the cGAN architecture for parameterization development by using the input state to condition the observed and generated distributions so that they become Pobs(Y|X) and P(Y^|X), respectively. This approach entails considering only the distribution defined over a specific state X similar to how Mirza and Osindero (2014) constrained the learned distribution to a particular label. Notice that in the original cGAN formulation, the labels of the data were used to constrain the observed distribution and the input vector was typically random noise. Here instead, the state X is used as both the constraints and the input. In this way, the training procedure yields a regression from the state vector X to the target Y while conforming with Pobs(Y|X).

GANs rely on introducing sufficient variability to the ANN, often achieved through the use of random noise. Mirza and Osindero (2014) specifically focused on generating examples that conform to a target distribution by incorporating random noise as input. In contrast, our objective is to develop an ANN regression that accurately captures the target statistics, taking into account the random experimental error associated with the target values. While the input state remains deterministic, the target values exhibit stochastic behavior. Therefore, it is more appropriate to introduce random variability in Y, allowing the target values to distribute around the corresponding states. To address this, we introduce a random perturbation, typically between 1% and 5%, to Y during training. By doing so, the GAN algorithm effectively compels the generator to train on the more probable target values given the input state.
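The target perturbation described above can be sketched as below. The uniform noise model is an assumption; the paper only states the magnitude (between 1% and 5%).

```python
import numpy as np

def perturb_targets(y, frac=0.03, seed=None):
    """Apply a small multiplicative random perturbation to the targets
    at each training pass, so that target values distribute around the
    corresponding deterministic states and the generator learns the
    more probable values given the input, rather than fitting single
    noisy retrievals."""
    rng = np.random.default_rng(seed)
    return y * (1.0 + frac * rng.uniform(-1.0, 1.0, size=y.shape))
```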

Generally, the loss function of the discriminator is a measure of the statistical distance between the estimated and the target distributions. It was originally based on the Kullback–Leibler divergence (Goodfellow et al. 2014), although a number of other functions have been proposed (Pan et al. 2020). The loss function of the generator is typically written so that it maximizes the loss of the discriminator, setting up the adversarial training as a minimax game. Although highly effective, this procedure suffers from the caveat that there is no clear convergence criterion (Berthelot et al. 2017).

Adversarial training can also be understood as forcing the discriminator to simultaneously encode the target and the generated distributions. We exploit this view to generalize the cGAN algorithm. The role of the discriminator is to distinguish generated examples that lie outside the data manifold and hence do not likely correspond to Pobs(Y|X). By alternatively feeding generated and real data to the discriminator during training, it would try to encode both P(Y^|X) and Pobs(Y|X). Since the discriminator cannot simultaneously encode two different distributions, the optimization algorithm promotes P(Y^|X) → Pobs(Y|X). Using this, the loss functions of the cGAN can generally be written as follows. Let l[A,B] be a generic metric representing the distance between two vectors, A and B, and G and D be the output of the generator and the discriminator, respectively. The discriminator is a conditional autoencoder learning the nonlinear mapping (X, Y) → Y. By evaluating the discriminator on both the real data and the prediction of the generator, its loss function can be written as
L_D = l[D(Y|X), Y] + l{D[G(X)|X], G(X)},  (1)
where Y is the target data and X is the input vector.
The role of the generator is to produce examples that have a probability distribution indistinguishable from Pobs(Y|X), so that when the autoencoder is evaluated on G it produces similar output as when evaluated on Y. Hence, the loss of the generator can be written in the form:
L_G = l{D(Y|X), D[G(X)|X]}.  (2)
These equations can be simplified when the discriminator outputs a probability score. The function l[A,B] then becomes a statistical distance between two generic probability distributions, that is, P(A) and P(B). This can be achieved by introducing in the autoencoder an extra, fully connected layer producing a single scalar output. Using a sigmoidal activation function in this layer ensures that the output remains bounded within the 0 to 1 range. Since the discriminator should classify real data as "real," and as "fake" when driven by the output of the generator, the loss can be simplified as
L_D = l[D(Y|X), 1] + l{D[G(X)|X], 0}.  (3)
Similarly, to force the discriminator to classify the generator output as “real,” we write the generator loss as
L_G = l{D[G(X)|X], 1}.  (4)
In this work, only Eqs. (3) and (4) are used; that is, the discriminator always outputs a probability score, and formulations in terms of Eqs. (1) and (2) are left for future investigation. It can be readily shown that when l corresponds to the binary cross entropy (Goodfellow et al. 2016), Eqs. (3) and (4) are equivalent to the original cGAN formulation (Goodfellow et al. 2014; Mirza and Osindero 2014). Equations (1)–(4) thus merely allow for the testing of different functional forms of the statistical distance l between P(Ŷ|X) and Pobs(Y|X). The framework outlined is referred to as constrained adversarial training (CAT), summarized in Fig. 3.
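As a concrete illustration, Eqs. (3) and (4) with l taken as the binary cross entropy can be sketched in a few lines of NumPy (a minimal sketch for exposition; the function names are ours, and the trainable implementation uses Keras/TensorFlow):

```python
import numpy as np

def bce(p, label):
    """Binary cross entropy between discriminator scores p in (0, 1)
    and a constant label (1 = "real", 0 = "fake")."""
    eps = 1e-7
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(label * np.log(p) + (1.0 - label) * np.log(1.0 - p))

def discriminator_loss(d_real, d_fake):
    # Eq. (3): push scores on real data toward 1 and on generated data toward 0
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    # Eq. (4): reward the generator when the discriminator labels its output "real"
    return bce(d_fake, 1.0)
```

A well-trained discriminator (scores near 1 on real data, near 0 on generated data) yields a small L_D, while the generator lowers L_G only by producing examples the discriminator scores as real.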
Fig. 3. Constrained adversarial training algorithm.
Citation: Artificial Intelligence for the Earth Systems 3, 1; 10.1175/AIES-D-23-0025.1

Figure 2 (bottom) shows the cGAN architecture. The generator combines the hidden layers from Wnet-prior (nontrainable, preserving the physics learned from G5NR) with two new, trainable layers: one to transform and the other to output. The output of the generator feeds the discriminator, which outputs a single scalar indicating the probability that the generated σw is within Pobs(Y|X). After the CAT optimization, the trained generator constitutes the final parameterization of σw. In the final architecture, the generator is an ANN with 6 dense layers of 128 nodes each and a single-node dense output layer. The discriminator consists of an encoder (3 dense layers with 128, 64, and 32 nodes, respectively) that compresses the data into the latent space (a single 8-node dense layer), a decoder (3 dense layers with 32, 64, and 128 nodes, respectively) that samples from the latent space and reconstructs the data, and a single-node output layer. Although slightly different from the traditional application of an encoder–decoder architecture (Billault-Roux et al. 2023), this design aligns with the concept of the discriminator acting as a pseudoautoencoder, even though it ultimately outputs a probability score. Binary cross entropy is used as the statistical distance l[A, B]. The optimization procedure is detailed in the appendix. The final model is termed “Wnet.”
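The layer sizes above can be sketched in Keras as follows (a simplified sketch: the frozen Wnet-prior layers are replaced here by ordinary dense layers, and all names are illustrative):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model, Input

def build_generator(n_inputs=14):
    # 6 dense hidden layers of 128 nodes each, single-node output.
    # In the actual model the first hidden layers come from the frozen Wnet-prior.
    x = inp = Input(shape=(n_inputs,))
    for _ in range(6):
        x = layers.LeakyReLU()(layers.Dense(128)(x))
    return Model(inp, layers.Dense(1)(x))

def build_discriminator(n_inputs=15):
    # encoder (128-64-32) -> 8-node latent layer -> decoder (32-64-128)
    # -> dropout -> single-node sigmoid score
    x = inp = Input(shape=(n_inputs,))
    for n in (128, 64, 32):
        x = layers.LeakyReLU()(layers.Dense(n)(x))
    x = layers.Dense(8)(x)          # latent space
    for n in (32, 64, 128):
        x = layers.LeakyReLU()(layers.Dense(n)(x))
    x = layers.Dropout(0.3)(x)      # regularization before the output
    return Model(inp, layers.Dense(1, activation="sigmoid")(x))
```

The discriminator input is 15-dimensional (the 14 conditioning variables plus σw), and its sigmoid output is the probability score used in Eqs. (3) and (4).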

c. Alternative models

We developed three alternative models to investigate the role of different aspects of the CAT algorithm in the ANN performance. The first model, termed “Obs-only,” was trained directly on the observational data using supervised learning, without incorporating any information from G5NR.

In the “Transfer” model, we extended the pretrained Wnet-prior model by adding a trainable layer. The final layer of the ANN was then trained using supervised learning with the observational data, resembling a typical transfer learning approach (Daw et al. 2017).

The “EMD” model used the same architecture as Wnet but, instead of binary cross entropy, employed the Earth mover’s distance (EMD) as the loss function in Eqs. (3) and (4) (Arjovsky et al. 2017). Neither gradient clipping nor spectral normalization (Miyato et al. 2018) was applied to the EMD function, on the rationale that conditioning on the state helps to stabilize the training process, which was indeed observed during experimentation.
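For equal-size minibatches of a scalar target such as σw, the discretely defined EMD reduces to the mean absolute difference between the sorted samples. A minimal NumPy sketch (our illustration, not the trainable TensorFlow implementation):

```python
import numpy as np

def emd_1d(a, b):
    """1D earth mover's distance between two equal-size samples,
    computed as the mean absolute difference of the sorted values."""
    assert len(a) == len(b)
    return np.mean(np.abs(np.sort(a) - np.sort(b)))
```

Because it compares sorted samples, this distance depends only on the two empirical distributions within the minibatch, not on the pairing of individual examples.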

The Obs-only and Transfer models represent alternative parameterizations that incorporate the observed data but do not aim to precisely reproduce the observed distribution. The EMD model, on the other hand, although PDF-based, imposes an integral constraint on P(Ŷ|X) rather than the divergence constraint imposed by binary cross entropy. A summary of the main features of these models is shown in Table 2.

Table 2. ANN models developed in this work.

4. Results

To test the effectiveness of the training procedure outlined above, we assessed the accuracy of Wnet-prior at reproducing G5NR output and the skill of Wnet at reproducing the observed statistics at each site.

Wnet-prior captures the global spatial distribution of σw predicted by G5NR. Figure 4 compares σw obtained from G5NR and from the Wnet-prior model calculated on the test set. Different processes govern the distribution of σw at different atmospheric levels. Near the surface (900 hPa, Fig. 4), wind shear near the continental shores enhances variability in W. Similarly, shallow convection enhances σw, evident in the tropics and the storm tracks of the midlatitudes of the Southern Hemisphere. G5NR depicts such dependencies as a result of the resolved synoptic-scale motion (Stull 1988), and they are well reproduced by Wnet-prior. However, the main factors driving σw in the planetary boundary layer (PBL), that is, buoyancy and turbulence, cannot be resolved by the G5NR simulation, as that would require much higher spatial resolution. As a result, σw is about an order of magnitude lower than typically observed values within the PBL [O(1) m s−1]. Wnet-prior thus only imposes a physical constraint at the resolved scale [O(7) km], and the final parameterization relies heavily on the observational data.

Fig. 4. Comparison of σw predicted by (left) the Wnet-prior model and (center) the G5NR run, averaged over 20 randomly selected output files of the simulation not used during training. (right) Root-mean-square differences for each grid cell. Numbers in parentheses indicate global means at a given level.


At 500 hPa (Fig. 4), orographic features and deep convection are the main drivers of variability (Barahona et al. 2017; Dean et al. 2007). This is evident in the tropical oceans and over the Tibetan Plateau, the Andes, and the west coast of North America, where σw peaks in G5NR. Wnet-prior reproduces such patterns. It also accurately represents the minima in σw in the eastern equatorial Pacific cold tongue, and off the coasts of North and South America, associated with atmospheric stability and low sea surface temperature (Liu et al. 2019).

At 250 hPa, Wnet-prior slightly underestimates σw around mountain regions, that is, the Andes, the Tibetan Plateau, and the Himalayas, predicting weaker maxima in σw around mountain ranges than G5NR. This is possibly a result of the lack of spatial information in the input to Wnet-prior, which leads the ANN to underpredict the propagation of orographically induced gravity waves originating at the surface (McFarlane 1987; Dean et al. 2007; Barahona et al. 2017). This is suggested by tests (not shown) using a two-dimensional model (i.e., where each sample corresponds to an atmospheric column), which tended to better represent the peak σw in the upper troposphere. Such a model was deemed impractical as a parameterization since it depends on the vertical resolution of MERRA-2.

For the test set, Wnet-prior reproduces the G5NR predictions with a mean bias of −0.004 ± 0.05 m s−1. The slight underestimation of σw by the ANN results from the tendency of Wnet-prior to underestimate the vertical propagation of gravity waves compared to G5NR: the ANN has to learn the dynamics of wave propagation from the state vector at each level instead of from the whole atmospheric column. Multilayer perceptrons (MLPs) also have limited skill at elucidating spatial patterns in three-dimensional data. On the other hand, the simple architecture eases implementation in GCMs. Favoring the flexibility of the parameterization thus subjects the ANN to a more challenging learning problem. Figure 4, however, shows that Wnet-prior is an accurate surrogate model of G5NR, providing a solid physical basis for the parameterization of σw.

Figure 5 compares the predictions of the Wnet ANN against observations for the test set (i.e., data not used during training), for all the sites of Table 1. Wnet reproduces the observed σw data with a normalized mean bias, Nmb (i.e., the mean bias divided by the mean σw), around ±15%. For individual measurements, the discrepancy can be larger, and the normalized root-mean-square error hovers around ∼50% for most sites. Nmb tends to be lower (typically around ±10%) at sites with multiyear measurements and low influence from convective activity (NSA, MAN, ENA, and both SGP sites). These datasets provide a better constraint on the ANN and are subject to less variability in the atmospheric state. As shown in Fig. 5, the distribution of σw is also well reproduced at these sites. Accuracy was lower at sites subject to high convective activity (MAO, Nmb = −27%, and TWP, Nmb = 30%). It is likely that the higher error is driven by bias in the timing, strength, and location of convection predicted by MERRA-2. The ANN also tends to underpredict variability in σw at the MAO site and overpredict it at the TWP site. As these errors are driven by errors in the input to the ANN, no attempts were made to improve the accuracy of the parameterization at individual sites. Doing so may lead to overfitting, forcing the ANN to encode uncertainty brought about either by errors in the reanalysis used to drive Wnet or by uncertainty in the measurements.
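The two skill metrics quoted above can be computed as follows (a short sketch; the function names are ours):

```python
import numpy as np

def normalized_mean_bias(pred, obs):
    # Nmb: mean bias divided by the mean observed value
    return np.mean(pred - obs) / np.mean(obs)

def normalized_rmse(pred, obs):
    # root-mean-square error divided by the mean observed value
    return np.sqrt(np.mean((pred - obs) ** 2)) / np.mean(obs)
```

Both metrics are dimensionless, so they can be compared across sites with very different mean σw.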

Fig. 5. Boxplots showing the statistics of σw calculated on the test set (data not used during training) by the different models of Table 2 for the sites of Table 1. “Wpr,” “Oo,” and “Tr” correspond to the Wnet-prior, Obs-only, and Transfer models, respectively. Also shown are statistics from the observed data (Obs) at each site.


Except for Wnet-prior, all the models shown in Fig. 5 represent reasonable parameterizations of σw. Wnet-prior tends to significantly underestimate the observed σw. This results from the limited ability of G5NR to explore the range of vertical wind velocities observed in nature: even at 7-km spatial resolution, the model misses a significant fraction of the observed variability, which is then inherited by Wnet-prior. Since there are different ways to develop the ANN parameterization, the other models of Table 2 explore different aspects in which the CAT algorithm contributes to building a robust parameterization. “Obs-only” represents a direct approach, training on the observed data with no underlying physical constraints, whereas the “Transfer” model does not rely on adversarial training to ingest observations. Figure 5 shows that although these two models reasonably reproduce the observed data, they tend to overpredict variability in σw, particularly for the radar sites [SGP (cirrus), MAN, LIM, and LEI]. The accuracy of the “EMD” model, which tests the impact of the function used to define the distance between P(Ŷ|X) and Pobs(Y|X), is similar to that of Wnet, although in some cases it tends to underrepresent variability in σw (e.g., at the ASI, MAO, and ENA sites).

Further evidence that Wnet represents the observed statistics of σw well is presented in Fig. 6, which shows the probability distribution function (PDF) of σw for the different models of Table 2. The agreement between Wnet and the observations is evident from the near overlap of the black and red curves of Fig. 6. Quantitatively, out of the models of Table 2, Wnet has the lowest Kolmogorov–Smirnov statistic (the largest absolute difference between two cumulative distributions) calculated against observations (Wilks 2011). The positive effect of ingesting observed data within the parameterization is evidenced by comparison against the PDF predicted by Wnet-prior, which was trained on simulated data only. Without refinement by observational data, the PDF is much narrower and centered at a value of σw about an order of magnitude lower than measured. The CAT algorithm thus significantly improves the accuracy of the predicted PDF by bringing it closer to the measured distribution.
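The two-sample Kolmogorov–Smirnov statistic used here, the largest absolute difference between two empirical cumulative distributions, can be computed as, for example:

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of samples a and b."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))
```

The statistic lies in [0, 1]: 0 for identical samples and 1 for samples with disjoint support, so lower values indicate a predicted PDF closer to the observed one.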

Fig. 6. Probability distribution functions of σw for all the sites of Table 1 predicted by the models listed in Table 2, and from the observations. The calculated Kolmogorov–Smirnov distance against observations is indicated for each model.


There is a slight discrepancy between the Wnet and observed PDFs for σw > 1 m s−1, explained by bias in the onset of convection predicted by MERRA-2, which propagates to the σw prediction. This is further investigated by comparing the predicted PDF for individual sites against observations (Fig. A1). Across all sites, Wnet consistently outperforms all other models. There is some influence of target variable imbalance, as Wnet appears to perform better at sites with longer σW records. A more pronounced pattern, however, is that the sites with the largest discrepancies between the predicted PDF and the observations are primarily located in the tropics (specifically MAO, PGH, and TWP). This suggests that the errors are introduced by biases in the prediction of convection by MERRA-2 in these regions and may impact the accuracy of σw predictions in areas with complex atmospheric dynamics, such as convective systems.

Application of the Obs-only, Transfer, and EMD models results in similar distributions (Fig. 6). They are narrower than the observed PDF and characterized by two modes centered around σw ∼ 0.5 and ∼1.5 m s−1. Although also present in the data, the peak at σw ∼ 1.5 m s−1 is more subtle; in fact, the observed PDF would be well approximated by a lognormal distribution, an observation made for the first time in this work. The apparent second mode at higher σw results from the MAO dataset: as W at MAO is retrieved from the core of convective systems, it tends to show high σw values. The Obs-only, Transfer, and EMD models are highly impacted by these high values since there is a lack of observations at moderate σw, leading to the predicted bimodality. In contrast, only Wnet effectively captures the transition between the high σw induced by convective systems and the more moderate σw values observed in other regions.
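The lognormal approximation can be checked directly from a sample of σw values: since ln σw is then normally distributed, a method-of-moments fit in log space recovers the two distribution parameters (a sketch, assuming strictly positive samples; the function name is ours):

```python
import numpy as np

def fit_lognormal(samples):
    """Method-of-moments fit in log space: returns the mean and standard
    deviation of ln(x), the two parameters of a lognormal distribution."""
    logs = np.log(samples)
    return logs.mean(), logs.std()
```

Comparing the histogram of ln σw against a Gaussian with these parameters is a quick visual test of the single-mode lognormal hypothesis versus the bimodality predicted by the alternative models.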

Figure 7 shows the global spatial distribution of σw predicted by the Wnet (left panels) and Obs-only models (right panels) for the test set, as in Fig. 4. Since no simulated data were used to train the Obs-only model, the comparison in Fig. 7 is a qualitative assessment of the effect of ingesting G5NR data in Wnet. In general, Wnet predicts about an order of magnitude higher σw than Wnet-prior. That is, Wnet accounts for the variability in W missing in G5NR and by extension in Wnet-prior. However, Wnet also inherits spatial structure in σw imposed by the physical constraints of the atmospheric model. This is evident over the mountain ranges of Asia and North and South America where Wnet (Fig. 7) and Wnet-prior (Fig. 4) display high σw at 500 and 250 hPa. Deep convection also leads to high σw in the tropical regions around the intertropical convergence zone (between 30°S and 30°N), and in the storm tracks of the Southern Hemisphere. This fine structure is largely missing in the predictions of the Obs-only model, which essentially lacks any significant effects from localized convection in the tropics at 500 hPa, and largely misses the effect of orography on σw at 250 hPa. This strongly suggests that the spatial structure depicted by Wnet in Fig. 7 is introduced by the model physics inherited by the incorporation of Wnet-prior into the ANN (Fig. 2).

Fig. 7. As in Fig. 4, but for the (left) Wnet and (right) Obs-only models. Values in parentheses are global means.


5. Discussion

A premise of this work is that robust parameterizations can be developed by targeting the observed PDF rather than by merely minimizing the difference between predictions and observations. Reproducing the observed PDF, as opposed to discrete value matching, makes training more resilient to experimental error and buffers the parameterization against skewness introduced by extreme, low-probability outlier events. The latter could be detrimental in atmospheric simulations, as they propagate to other parts of the system by modifying processes like cloud formation. Except for Wnet-prior, all the models tested represent plausible alternative parameterizations of σw (Fig. 5). Figure 6, however, shows that only Wnet reproduces the observed distribution of σw.

It is worth noting that even though EMD is adversarially trained, it approximates the PDF only marginally better than the Transfer and Obs-only models. Our implementation of EMD defines it discretely over each minibatch, which may limit the variability to which the GAN is exposed. Additionally, the absence of spectral normalization in the EMD loss implementation might penalize the ANN too heavily when it explores a wider PDF. These factors likely contribute to the suboptimal performance of EMD as compared to Wnet. Our tests, however, underscore the critical role of the loss function in the success of the GAN-based approach (Pan et al. 2020). The selection of an appropriate functional form for the loss functions is facilitated by the generalized formulation of the GAN equations introduced in section 3b, which enables a more informed exploration of different loss functions to improve the ANN’s ability to capture and reproduce the observed PDF accurately.

The proposed method does not solely depend on the loss function to generalize the behavior of the parameterization. Instead, it leverages the physical principles encoded by Wnet-prior to guide extrapolation beyond the domain of the observations. Wnet-prior, which is pretrained on G5NR data, remains frozen during the refinement step and serves as a feature extractor, capturing essential patterns and physics-based information from the simulation. By integrating this knowledge into Wnet, the parameterization is guided to follow the physics learned from G5NR while its predictions are refined with observational data. This approach combines the strengths of both G5NR and the observations, leading to a parameterization that generalizes well beyond the observed domain.

This is evident in Fig. 7, where the spatial distribution of σw predicted by Wnet shares many features with those shown in Fig. 4. On the other hand, the Obs-only model tends to exhibit less variability, and it is prone to predict high values of σw. For example, at 500 hPa, the Obs-only approach predicts a wide band of high σw covering most of the region between 60°S and 60°N. This may be a consequence of limited data, since only the MAO dataset has measurements at 500 hPa in the tropics. Wnet also predicts high σw in that region; however, it shows features associated with the presence of strong convection and a land–ocean contrast. As both features are evident in Wnet-prior (Fig. 4), it is likely that such a structure is associated with underlying physical constraints inherited by Wnet.

The impact of orography on σw is much more evident in Wnet compared to the Obs-only prediction. Encoding such an effect solely from ground-based data is challenging for an ANN model, as orography remains fixed at each site. The partial display of orographic features in the Obs-only model may result from concatenating the surface state to each level. It is plausible that a deeper, more intricate model architecture (e.g., based on convolutional layers) could learn the relationships provided by Wnet-prior directly from observational data. However, a sophisticated model may have limited applicability as a parameterization for GCMs due to potential computational expenses.

By design, σw predicted by Wnet is in good agreement with ground-based observations, as shown in Fig. 5. It is also within the range of in situ values reported from aircraft measurements (West et al. 2014). Besides reproducing field campaign data, a state-dependent parameterization can be applied globally to predict the distribution of σw, as shown in Fig. 7. There are, however, few reports on the spatial distribution of σw, particularly near the surface: almost all work is based on field campaign data and in situ analyses. Nevertheless, the predicted σw (Fig. 7, 900 hPa) shows expected features, with higher values over land than over ocean (Peng et al. 2005; Morales and Nenes 2010), and the well-known effect of wind-driven turbulence on σw (Bogenschutz et al. 2013), as, for example, in the storm tracks of the Southern Hemisphere around 40°S, evident as well in the Obs-only model. At higher levels, the distribution of σw predicted by Wnet agrees with published theoretical studies on the effect of gravity waves and orography on wind variability (Dean et al. 2007; Joos et al. 2008; Barahona et al. 2017). Qualitatively, the global distribution of σw at 250 hPa shown in Fig. 7 closely resembles operational air turbulence products (Williams and Storer 2022; Sharman et al. 2006), raising the possibility that a real-time prediction of σw could complement the estimation of air turbulence indexes.

6. Conclusions

This work presents a novel approach to estimate the spatial standard deviation in vertical wind velocity at scales typical of GCM simulations. The new parameterization results from the combination of global storm-resolving simulations, long-term observational data, and climate reanalysis products. In this way, it is constrained by the physical model driving the GSRM and by the observations. This is achieved by using a two-step technique where an ANN trained on the GSRM output is incorporated within a second, larger ANN model trained on the observational data. The new parameterization uses the meteorological state (winds, temperature, and water concentration) and coarse metrics of turbulence (Richardson number and scalar momentum diffusivity) to predict σw at each grid cell. The model introduced here is suitable for use online within a GCM, or offline, driven by the output of real-time numerical weather forecasts.

Inclusion of observational data was critical to the performance of the new parameterization. Measurements from 11 stations around the world were used to develop the ANN, including new radar-derived data from two European sites (LEI and LIM). The ANN reproduces these measurements and generalizes well outside the data manifold. Previous work has focused on upper-tropospheric statistics, relevant to cirrus formation. Here, we extended the parameterization of σW to the surface, making it relevant not only to cloud formation but also to diagnose mixing within the PBL (Santanello et al. 2007), and even as a diagnostic tool for air travel safety (Williams and Storer 2022).

In developing the parameterization, emphasis was placed on reproducing the observed PDF of σW. This was achieved by using a conditional GAN algorithm to train the ANN against observations. Compared to direct training against the observational data, the constrained adversarial training algorithm results in a robust estimation of σW that reproduces the observed statistics. At the same time, the ANN parameterization of σW inherits spatial structure from the global storm-resolving simulation that it might not learn from the observational data alone.

The ANN parameterization was designed for host models operating at spatial resolutions coarser than approximately 25 km. As the resolution of the host model increases, the contribution of the parameterized σW to the total W variability is expected to diminish. However, the evident underestimation of σW by G5NR highlights the continued importance of a parameterization in most GSRMs. Scaling arguments (Barahona et al. 2017) could be employed to adapt the predictions of Wnet so that the parameterization remains effective across different spatial resolutions and can be seamlessly integrated into models operating at varying scales.

It would also be interesting to further elucidate the relative contributions of the observational data and the prior model to the final ANN. Besides σW, the general approach presented here is suitable to study and parameterize other variables, for example, cloud liquid and ice water, supercooled cloud fraction, and water vapor. The successful implementation of the parameterization would also rely on efficient Fortran libraries; some projects are already underway to address this need (Curcic 2019; Ott et al. 2020). Future work will address these topics. Using the tools of deep learning, this work for the first time leverages vertical air velocity data from different sources, that is, GSRMs, observations, and data assimilation, to generate a robust representation of subgrid-scale variability suitable for real-time and online atmospheric predictions.

Acknowledgments.

This work was supported by the NASA MEASURES Program WBS: 281945.02.31.04.39. K.H. Breen was supported by the NASA Postdoctoral Program Fellowship. The authors thank Moritz Hoffman for his input. The authors also thank Patrick Seifert and his team for the cloud radar data at Leipzig and Limassol. Resources supporting this work were provided by the NASA High-End Computing (HEC) Program through the NASA Center for Climate Simulation (NCCS) at Goddard Space Flight Center.

Data availability statement.

The GEOS-5 source code is available under the NASA Open Source Agreement at http://opensource.gsfc.nasa.gov/projects/GEOS-5/. All data generated in this work will be made publicly available through the NASA technical reports server (https://ntrs.nasa.gov) and PubSpace (https://www.nasa.gov/open/researchaccess/pubspace). The MERRA-2 Reanalysis and GEOS-5 nature run datasets are publicly available from https://disc.gsfc.nasa.gov/. Field campaign datasets were downloaded from the Atmospheric Radiation Measurement Archive at https://www.arm.gov/data/. Keras and Tensorflow libraries were obtained from https://keras.io/. Maps were created using the NCAR Command Language software (version 6.6.2; UCAR/NCAR/CISL/TDD 2019, https://doi.org/10.5065/D6WD3XH5). All codes developed in this work are available upon request.

APPENDIX

Training and Optimization

Wnet-prior was implemented as a stack of fully connected layers in an MLP architecture (Goodfellow et al. 2016). Due to the massive size of the G5NR output (about 17 000 half-hourly files), we developed a custom subsampling technique: a set of files (about 3, encompassing approximately 7.5 × 10^7 samples) was randomly selected, without replacement, from the G5NR output and processed for a few epochs (about 5), after which the entire training set was replaced. This adaptive approach, acting as a regularization method, allowed Wnet-prior to generalize the behavior of σW effectively and ensured robustness across varying environmental conditions. However, it also led to “jumps” in the loss every time a new set of files was loaded, although these generally receded within one epoch. While other approaches, like generating ANN ensembles (Zhang and Ma 2012), may have benefits during training, they come with significant computational costs, making them less practical for operational use. Although our method worked well for this specific problem, it may require further refinement to serve as a general regularization method. Nevertheless, it provides an efficient and robust solution to parameterize σW using the vast G5NR dataset.
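The rotation scheme can be sketched as a plain Python generator (our reconstruction of the scheme described above; names and defaults are illustrative):

```python
import random

def rotating_subsets(files, subset_size=3, epochs_per_subset=5, seed=0):
    """Yield (subset, epoch) pairs: draw `subset_size` files without
    replacement, train on them for a few epochs, then replace the whole
    subset, repeating until the file pool is exhausted."""
    pool = list(files)
    random.Random(seed).shuffle(pool)
    while pool:
        subset, pool = pool[:subset_size], pool[subset_size:]
        for epoch in range(epochs_per_subset):
            yield subset, epoch
```

Each pass exposes the model to every file exactly once, while keeping only a small subset in memory at any time; the discontinuity when a new subset is loaded is what produces the loss “jumps” described above.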

Wnet-prior was trained using the Keras library with Tensorflow backend (Chollet et al. 2015). Optimization was carried out with the Adam algorithm (Kingma and Ba 2014). Hyperparameter optimization was carried out using the Keras tuner (Chollet et al. 2015) and is summarized in Table A1. To optimize the model, each configuration was run three times for 50 epochs with the same set of G5NR files. The final chosen configuration was selected based on the lowest mean validation loss across the three runs per trial.

Table A1. Parameters used during hyperparameter tuning for Wnet-prior. Optimal hyperparameters are shown in bold.

The loss function was found to be critical for performance since σW spans four orders of magnitude, from ∼0.001 to ∼10 m s−1. To capture both minima and maxima in σw, a custom loss function, termed “PolyMSE,” was developed as
L_prior = ‖f(Y_prior) − f(Ŷ_prior)‖²,  (A1)

with

f(y_i) = Σ_{n=n1}^{n2} y_i^(n/10),  (A2)

where Y_prior and Ŷ_prior are the target and predicted σW values from G5NR, n1 = 2, and n2 = 14. The polynomial expansion in Eq. (A2) has the effect of smoothing L_prior around σW = 1 m s−1, thereby reducing the dominance of high values.
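A NumPy sketch of the PolyMSE loss (the exponent reading y^(n/10) is our interpretation of the expansion; σW is strictly positive, so the fractional powers are well defined):

```python
import numpy as np

def poly_features(y, n1=2, n2=14):
    # f(y) = sum_{n=n1}^{n2} y**(n/10): the fractional exponents compress
    # large sigma_w values and expand small ones, smoothing the loss
    # around sigma_w = 1 m/s (assumed reading of the expansion)
    return sum(y ** (n / 10.0) for n in range(n1, n2 + 1))

def polymse(y_true, y_pred):
    # mean square error computed in the transformed space
    return np.mean((poly_features(y_true) - poly_features(y_pred)) ** 2)
```

A Keras version would replace `np` with `tensorflow` ops so the transform stays differentiable during training.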

The final architecture selected for Wnet-prior was an MLP composed of five fully connected layers of 128 nodes each, with a single-node output layer (Fig. 2). The hidden layers used the leaky rectified linear unit (Leaky ReLU) activation function (Maas et al. 2013). The input to the ANN was standardized using fixed global means and standard deviations from G5NR, calculated over 100 randomly selected half-hourly output files. Using a batch size of 2048 samples, convergence (in terms of ‖Y − Ŷ‖ calculated on the validation set) was typically obtained after ∼500 epochs. Despite the occasional “jumps” in the loss of Wnet-prior, it remained smooth enough to effectively utilize early stopping during training.

Refinement step

To train the cGAN, input from the MERRA-2 reanalysis was collocated in time and space with each of the datasets of Table 1 and used to drive the generator. MERRA-2 is highly constrained by conventional data assimilation and represents the best approximation to the actual environmental state for each measurement. Optimization was carried out with the Adam algorithm (Kingma and Ba 2014), using binary cross entropy as the loss function. The leaky ReLU activation function was used for the hidden layers (Agarap 2018; Maas et al. 2013), and the output layer of the discriminator used sigmoidal activation. Dropout with a rate of 0.3 (discriminator only) was applied before the last hidden layer to avoid overfitting (Srivastava et al. 2014). The discriminator takes as input a 15-dimensional vector (the 14 input variables plus σw) and can in principle be updated several times for each update of the generator; in this work, however, it is updated once per iteration. Allowing the discriminator to train multiple times per generator update generally degraded the model’s performance; the exact reason remains unclear and requires further investigation.
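One cGAN iteration, a single discriminator update followed by a single generator update, can be sketched in TensorFlow as follows (a simplified reconstruction of the scheme described above; variable names and optimizer settings are illustrative):

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

def train_step(generator, discriminator, x, y):
    # discriminator update: real (x, y) labeled 1, generated (x, y_hat) labeled 0
    y_hat = generator(x, training=True)
    with tf.GradientTape() as tape:
        d_real = discriminator(tf.concat([x, y], axis=-1), training=True)
        d_fake = discriminator(tf.concat([x, y_hat], axis=-1), training=True)
        d_loss = (bce(tf.ones_like(d_real), d_real)
                  + bce(tf.zeros_like(d_fake), d_fake))
    d_opt.apply_gradients(zip(tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    # generator update: drive the discriminator to label generated samples "real"
    with tf.GradientTape() as tape:
        d_fake = discriminator(tf.concat([x, generator(x, training=True)], axis=-1),
                               training=True)
        g_loss = bce(tf.ones_like(d_fake), d_fake)
    g_opt.apply_gradients(zip(tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return float(d_loss), float(g_loss)
```

Performing additional discriminator updates per iteration would simply repeat the first block; as noted above, doing so degraded performance in our tests.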

To optimize the GAN, various experiments were conducted by exploring different settings for the distance function [Eqs. (3) and (4)], the prior model, discriminator layers, batch size, learning rate, and data augmentation factor (i.e., the number of times certain sites were repeated to balance the training set). We trained each model for a fixed number of epochs (approximately 500) and selected the weights at the epoch with the lowest mean square error against observations. Due to the nonsmooth nature of the cGAN loss function, which depends on the interaction between the generator and discriminator, formal parameter search was challenging. Therefore, the selection of the best model was guided by expert knowledge, considering not only the error against observations but also the overall behavior of the global distribution of σw. The resulting PDF for the selected model at each individual site is presented in Fig. A1.

Fig. A1. Probability distribution functions of σw predicted by the Wnet-prior (blue), Wnet (black), Obs-only (green), Transfer (cyan), and EMD (purple) models (Table 2), and from the observations (red), at each of the sites of Table 1.

Citation: Artificial Intelligence for the Earth Systems 3, 1; 10.1175/AIES-D-23-0025.1

REFERENCES

  • Agarap, A. F., 2018: Deep learning using rectified linear units (ReLU). arXiv, 1803.08375v2, https://doi.org/10.48550/arXiv.1803.08375.

  • Arjovsky, M., S. Chintala, and L. Bottou, 2017: Wasserstein generative adversarial networks. ICML’17: Proc. Int. Conf. on Machine Learning, Sydney, NSW, Australia, PMLR, 214–223, https://dl.acm.org/doi/abs/10.5555/3305381.3305404.

  • Barahona, D., A. Molod, J. Bacmeister, A. Nenes, A. Gettelman, H. Morrison, V. Phillips, and A. Eichmann, 2014: Development of two-moment cloud microphysics for liquid and ice within the NASA Goddard Earth Observing System Model (GEOS-5). Geosci. Model Dev., 7, 1733–1766, https://doi.org/10.5194/gmd-7-1733-2014.

  • Barahona, D., A. Molod, and H. Kalesse, 2017: Direct estimation of the global distribution of vertical velocity within cirrus clouds. Sci. Rep., 7, 6840, https://doi.org/10.1038/s41598-017-07038-6.

  • Bauer, P., A. Thorpe, and G. Brunet, 2015: The quiet revolution of numerical weather prediction. Nature, 525, 47–55, https://doi.org/10.1038/nature14956.

  • Berg, L. K., R. K. Newsom, and D. D. Turner, 2017: Year-long vertical velocity statistics derived from Doppler lidar data for the continental convective boundary layer. J. Appl. Meteor. Climatol., 56, 2441–2454, https://doi.org/10.1175/JAMC-D-16-0359.1.

  • Berthelot, D., T. Schumm, and L. Metz, 2017: BEGAN: Boundary Equilibrium Generative Adversarial Networks. arXiv, 1703.10717v4, https://doi.org/10.48550/arXiv.1703.10717.

  • Besombes, C., O. Pannekoucke, C. Lapeyre, B. Sanderson, and O. Thual, 2021: Producing realistic climate data with generative adversarial networks. Nonlinear Processes Geophys., 28, 347–370, https://doi.org/10.5194/npg-28-347-2021.

  • Beucler, T., M. Pritchard, S. Rasp, J. Ott, P. Baldi, and P. Gentine, 2021: Enforcing analytic constraints in neural networks emulating physical systems. Phys. Rev. Lett., 126, 098302, https://doi.org/10.1103/PhysRevLett.126.098302.

  • Bihlo, A., 2021: A generative adversarial network approach to (ensemble) weather prediction. Neural Networks, 139, 1–16, https://doi.org/10.1016/j.neunet.2021.02.003.

  • Billault-Roux, A.-C., G. Ghiggi, L. Jaffeux, A. Martini, N. Viltard, and A. Berne, 2023: Dual-frequency spectral radar retrieval of snowfall microphysics: A physics-driven deep-learning approach. Atmos. Meas. Tech., 16, 911–940, https://doi.org/10.5194/amt-16-911-2023.

  • Bogenschutz, P. A., A. Gettelman, H. Morrison, V. E. Larson, C. Craig, and D. P. Schanen, 2013: Higher-order turbulence closure and its impact on climate simulations in the Community Atmosphere Model. J. Climate, 26, 9655–9676, https://doi.org/10.1175/JCLI-D-13-00075.1.

  • Bühl, J., and Coauthors, 2013: LACROS: The Leipzig Aerosol and Cloud Remote Observations System. Proc. SPIE, 8890, 889002, https://doi.org/10.1117/12.2030911.

  • Chi, M., A. Plaza, J. A. Benediktsson, Z. Sun, J. Shen, and Y. Zhu, 2016: Big data for remote sensing: Challenges and opportunities. Proc. IEEE, 104, 2207–2219, https://doi.org/10.1109/JPROC.2016.2598228.

  • Chin, M., and Coauthors, 2002: Tropospheric aerosol optical thickness from the GOCART model and comparisons with satellite and sun photometer measurements. J. Atmos. Sci., 59, 461–483, https://doi.org/10.1175/1520-0469(2002)059<0461:TAOTFT>2.0.CO;2.

  • Chollet, F., and Coauthors, 2015: Keras. GitHub, https://github.com/fchollet/keras.

  • Colarco, P., A. da Silva, M. Chin, and T. Diehl, 2010: Online simulations of global aerosol distributions in the NASA GEOS-4 model and comparisons to satellite and ground-based aerosol optical depth. J. Geophys. Res., 115, D14207, https://doi.org/10.1029/2009JD012820.

  • Creswell, A., T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, 2018: Generative adversarial networks: An overview. IEEE Signal Process. Mag., 35, 53–65, https://doi.org/10.1109/MSP.2017.2765202.

  • Curcic, M., 2019: A parallel Fortran framework for neural networks and deep learning. ACM SIGPLAN Fortran Forum, New York, NY, Association for Computing Machinery, 4–21, https://dl.acm.org/doi/abs/10.1145/3323057.3323059.

  • Danabasoglu, G., and Coauthors, 2020: The Community Earth System Model version 2 (CESM2). J. Adv. Model. Earth Syst., 12, e2019MS001916, https://doi.org/10.1029/2019MS001916.

  • Daw, A., A. Karpatne, W. Watkins, J. Read, and V. Kumar, 2017: Physics-Guided Neural Networks (PGNN): An application in lake temperature modeling. arXiv, 1710.11431v3, https://doi.org/10.48550/arXiv.1710.11431.

  • Dean, S. M., J. Flowerdew, B. N. Lawrence, and S. D. Eckermann, 2007: Parameterisation of orographic cloud dynamics in a GCM. Climate Dyn., 28, 581–597, https://doi.org/10.1007/s00382-006-0202-0.

  • Fudeyasu, H., Y. Wang, M. Satoh, T. Nasuno, H. Miura, and W. Yanase, 2008: Global cloud-system-resolving model NICAM successfully simulated the lifecycles of two real tropical cyclones. Geophys. Res. Lett., 35, L22808, https://doi.org/10.1029/2008GL036003.

  • Gelaro, R., and Coauthors, 2015: Evaluation of the 7-km GEOS-5 Nature Run. NASA Tech. Rep., NASA/TM2014-104606/Vol.36, 305 pp., https://ntrs.nasa.gov/api/citations/20150011486/downloads/20150011486.pdf.

  • Gelaro, R., and Coauthors, 2017: The Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2). J. Climate, 30, 5419–5454, https://doi.org/10.1175/JCLI-D-16-0758.1.

  • Gettelman, A., D. J. Gagne, C.-C. Chen, M. W. Christensen, Z. J. Lebo, H. Morrison, and G. Gantos, 2021: Machine learning the warm rain process. J. Adv. Model. Earth Syst., 13, e2020MS002268, https://doi.org/10.1029/2020MS002268.

  • Ghan, S. J., L. R. Leung, R. C. Easter, and H. Abdul-Razzak, 1997: Prediction of cloud droplet number in a general circulation model. J. Geophys. Res., 102, 21 777–21 794, https://doi.org/10.1029/97JD01810.

  • Giangrande, S. E., and Coauthors, 2016: Convective cloud vertical velocity and mass-flux characteristics from radar wind profiler observations during GoAmazon2014/5. J. Geophys. Res. Atmos., 121, 12 891–12 913, https://doi.org/10.1002/2016JD025303.

  • Goodfellow, I. J., J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, 2014: Generative adversarial nets. NIPS’14: Proc. 27th Int. Conf. on Neural Information Processing Systems, Montreal, QC, Canada, Association for Computing Machinery, 2672–2680, https://dl.acm.org/doi/10.5555/2969033.2969125.

  • Goodfellow, I. J., Y. Bengio, and A. Courville, 2016: Deep Learning. MIT Press, 800 pp.

  • Guo, Z., and Coauthors, 2014: A sensitivity analysis of cloud properties to CLUBB parameters in the Single-column Community Atmosphere Model (SCAM5). J. Adv. Model. Earth Syst., 6, 829–858, https://doi.org/10.1002/2014MS000315.

  • Iglesias-Suarez, F., P. Gentine, B. Solino-Fernandez, T. Beucler, M. Pritchard, J. Runge, and V. Eyring, 2023: Causally-informed deep learning to improve climate models and projections. arXiv, 2304.12952v3, https://doi.org/10.48550/arXiv.2304.12952.

  • Illingworth, A. J., and Coauthors, 2007: Cloudnet: Continuous evaluation of cloud profiles in seven operational models using ground-based observations. Bull. Amer. Meteor. Soc., 88, 883–898, https://doi.org/10.1175/BAMS-88-6-883.

  • IPCC, 2013: Climate Change 2013: The Physical Science Basis. Cambridge University Press, 1535 pp., https://doi.org/10.1017/CBO9781107415324.

  • Johnson, S. J., and Coauthors, 2019: SEAS5: The new ECMWF seasonal forecast system. Geosci. Model Dev., 12, 1087–1117, https://doi.org/10.5194/gmd-12-1087-2019.

  • Joos, H., P. Spichtinger, U. Lohmann, J.-F. Gayet, and A. Minikin, 2008: Orographic cirrus in the global climate model ECHAM5. J. Geophys. Res., 113, D18205, https://doi.org/10.1029/2007JD009605.

  • Judt, F., and Coauthors, 2021: Tropical cyclones in global storm-resolving models. J. Meteor. Soc. Japan, 99, 579–602, https://doi.org/10.2151/jmsj.2021-029.

  • Kalesse, H., and P. Kollias, 2013: Climatology of high cloud dynamics using profiling ARM Doppler radar observations. J. Climate, 26, 6340–6359, https://doi.org/10.1175/JCLI-D-12-00695.1.

  • Kärcher, B., and A. Podglajen, 2019: A stochastic representation of temperature fluctuations induced by mesoscale gravity waves. J. Geophys. Res. Atmos., 124, 11 506–11 529, https://doi.org/10.1029/2019JD030680.

  • Kingma, D. P., and J. Ba, 2014: Adam: A method for stochastic optimization. arXiv, 1412.6980v9, https://doi.org/10.48550/arXiv.1412.6980.

  • Laney, D., 2001: 3D data management: Controlling data volume, velocity and variety. META Group Research Note 6, 1 pp.

  • LeCun, Y., Y. Bengio, and G. Hinton, 2015: Deep learning. Nature, 521, 436–444, https://doi.org/10.1038/nature14539.

  • Leinonen, J., A. Guillaume, and T. Yuan, 2019: Reconstruction of cloud vertical structure with a generative adversarial network. Geophys. Res. Lett., 46, 7035–7044, https://doi.org/10.1029/2019GL082532.

  • Lenschow, D. H., M. Lothon, S. D. Mayor, P. P. Sullivan, and G. Canut, 2012: A comparison of higher-order vertical velocity moments in the convective boundary layer from lidar with in situ measurements and large-eddy simulation. Bound.-Layer Meteor., 143, 107–123, https://doi.org/10.1007/s10546-011-9615-3.

  • Liu, J., J. Tian, Z. Liu, T. D. Herbert, A. V. Fedorov, and M. Lyle, 2019: Eastern equatorial Pacific cold tongue evolution since the late Miocene linked to extratropical climate. Sci. Adv., 5, eaau6060, https://doi.org/10.1126/sciadv.aau6060.

  • Lopez-Gomez, I., Y. Cohen, J. He, A. Jaruga, and T. Schneider, 2020: A generalized mixing length closure for eddy-diffusivity mass-flux schemes of turbulence and convection. J. Adv. Model. Earth Syst., 12, e2020MS002161, https://doi.org/10.1029/2020MS002161.

  • Lopez-Gomez, I., C. Christopoulos, H. L. Langeland Ervik, O. R. Dunbar, Y. Cohen, and T. Schneider, 2022: Training physics-based machine-learning parameterizations with gradient-free ensemble Kalman methods. J. Adv. Model. Earth Syst., 14, e2022MS003105, https://doi.org/10.1029/2022MS003105.

  • Maas, A. L., A. Y. Hannun, and A. Y. Ng, 2013: Rectifier nonlinearities improve neural network acoustic models. Proc. 30th Int. Conf. on Machine Learning, Atlanta, GA, JMLR, 6 pp., https://ai.stanford.edu/∼amaas/papers/relu_hybrid_icml2013_final.pdf.

  • McFarlane, N. A., 1987: The effect of orographically excited gravity wave drag on the general circulation of the lower stratosphere and troposphere. J. Atmos. Sci., 44, 1775–1800, https://doi.org/10.1175/1520-0469(1987)044%3C1775:TEOOEG%3E2.0.CO;2.

  • Mirza, M., and S. Osindero, 2014: Conditional generative adversarial nets. arXiv, 1411.1784v1, https://doi.org/10.48550/ARXIV.1411.1784.

  • Miyato, T., T. Kataoka, M. Koyama, and Y. Yoshida, 2018: Spectral normalization for generative adversarial networks. arXiv, 1802.05957v1, https://doi.org/10.48550/arXiv.1802.05957.

  • Molod, A., and Coauthors, 2020: GEOS-S2S Version 2: The GMAO high-resolution coupled model and assimilation system for seasonal prediction. J. Geophys. Res. Atmos., 125, e2019JD031767, https://doi.org/10.1029/2019JD031767.

  • Mooers, G., M. Pritchard, T. Beucler, J. Ott, G. Yacalis, P. Baldi, and P. Gentine, 2020: Assessing the potential of deep learning for emulating cloud superparameterization in climate models with real-geography boundary conditions. arXiv, 2010.12996v3, https://doi.org/10.48550/arXiv.2010.12996.

  • Morales, R., and A. Nenes, 2010: Characteristic updrafts for computing distribution-averaged cloud droplet number, and stratocumulus cloud properties. J. Geophys. Res., 115, D18220, https://doi.org/10.1029/2009JD013233.

  • Morrison, H., J. A. Curry, and V. I. Khvorostyanov, 2005: A new double-moment microphysics parameterization for application in cloud and climate models. Part I: Description. J. Atmos. Sci., 62, 1665–1677, https://doi.org/10.1175/JAS3446.1.

  • Newsom, R. K., C. Sivaraman, T. R. Shippert, and L. D. Riihimaki, 2019: Doppler lidar vertical velocity statistics value-added product. Rep. DOE/SC-ARM-TR-149, 22 pp., https://www.arm.gov/publications/tech_reports/doe-sc-arm-tr-149.pdf?id=1000.

  • Ott, J., M. Pritchard, N. Best, E. Linstead, M. Curcic, and P. Baldi, 2020: A Fortran-Keras deep learning bridge for scientific computing. Sci. Program., 2020, 8888811, https://doi.org/10.1155/2020/8888811.

  • Pan, Z., W. Yu, B. Wang, H. Xie, V. S. Sheng, J. Lei, and S. Kwong, 2020: Loss functions of Generative Adversarial Networks (GANs): Opportunities and challenges. IEEE Trans. Emerging Top. Comput. Intell., 4, 500–522, https://doi.org/10.1109/TETCI.2020.2991774.

  • Patnaude, R., M. Diao, X. Liu, and S. Chu, 2021: Effects of thermodynamics, dynamics and aerosols on cirrus clouds based on in situ observations and NCAR CAM6. Atmos. Chem. Phys., 21, 1835–1859, https://doi.org/10.5194/acp-21-1835-2021.

  • Peng, Y., U. Lohmann, and R. Leaitch, 2005: Importance of vertical velocity variations in the cloud droplet nucleation process of marine stratus clouds. J. Geophys. Res., 110, D21213, https://doi.org/10.1029/2004JD004922.

  • Pruppacher, H. R., and J. D. Klett, 1997: Microphysics of Clouds and Precipitation. 2nd ed. Kluwer Academic, 954 pp.

  • Putman, W. M., and M. Suarez, 2011: Cloud-system resolving simulations with the NASA Goddard Earth Observing System global atmospheric model (GEOS-5). Geophys. Res. Lett., 38, L16809, https://doi.org/10.1029/2011GL048438.

  • Putman, W. M., M. Suarez, and A. Trayanov, 2015: 1.5-km global cloud-resolving simulations with GEOS-5. NASA, https://gmao.gsfc.nasa.gov/research/science_snapshots/1.5km_cloud_simulation.php.

  • Radford, A., L. Metz, and S. Chintala, 2015: Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv, 1511.06434v2, https://doi.org/10.48550/arXiv.1511.06434.

  • Randles, C. A., and Coauthors, 2017: The MERRA-2 aerosol reanalysis, 1980 onward. Part I: System description and data assimilation evaluation. J. Climate, 30, 6823–6850, https://doi.org/10.1175/JCLI-D-16-0609.1.

  • Rasp, S., M. S. Pritchard, and P. Gentine, 2018: Deep learning to represent subgrid processes in climate models. Proc. Natl. Acad. Sci. USA, 115, 9684–9689, https://doi.org/10.1073/pnas.1810286115.

  • Reynolds, R. W., N. A. Rayner, T. M. Smith, D. C. Stokes, and W. Wang, 2002: An improved in situ and satellite SST analysis for climate. J. Climate, 15, 1609–1625, https://doi.org/10.1175/1520-0442(2002)015<1609:AIISAS>2.0.CO;2.

  • Rienecker, M. M., and Coauthors, 2008: The GEOS-5 Data Assimilation System—Documentation of Versions 5.0.1, 5.1.0, and 5.2.0. Tech. Memo. NASA/TM-2008-104606, Vol. 27, 97 pp., http://gmao.gsfc.nasa.gov/pubs/docs/Rienecker369.pdf.

  • Roh, W., M. Satoh, and C. Hohenegger, 2021: Intercomparison of cloud properties in DYAMOND simulations over the Atlantic Ocean. J. Meteor. Soc. Japan, 99, 1439–1451, https://doi.org/10.2151/jmsj.2021-070.

  • Röttenbacher, J., 2021: Further development of an algorithm to determine cirrus cloud dynamics. M.S. thesis, Institute for Meteorology, Leipzig University, 159 pp.

  • Satoh, M., B. Stevens, F. Judt, M. Khairoutdinov, S.-J. Lin, W. M. Putman, and P. Düben, 2019: Global cloud-resolving models. Curr. Climate Change Rep., 5, 172–184, https://doi.org/10.1007/s40641-019-00131-0.

  • Schmidhuber, J., 2015: Deep learning in neural networks: An overview. Neural Networks, 61, 85–117, https://doi.org/10.1016/j.neunet.2014.09.003.

  • Seinfeld, J. H., and Coauthors, 2016: Improving our fundamental understanding of the role of aerosol cloud interactions in the climate system. Proc. Natl. Acad. Sci. USA, 113, 5781–5790, https://doi.org/10.1073/pnas.1514043113.

  • Sharman, R., C. Tebaldi, G. Wiener, and J. Wolff, 2006: An integrated approach to mid- and upper-level turbulence forecasting. Wea. Forecasting, 21, 268–287, https://doi.org/10.1175/WAF924.1.

  • Shi, X., and X. Liu, 2016: Effect of cloud-scale vertical velocity on the contribution of homogeneous nucleation to cirrus formation and radiative forcing. Geophys. Res. Lett., 43, 6588–6595, https://doi.org/10.1002/2016GL069531.

  • Srivastava, N., G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, 2014: Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 15, 1929–1958, https://dl.acm.org/doi/abs/10.5555/2627435.2670313.

  • Stevens, B., and Coauthors, 2019: DYAMOND: The Dynamics of the Atmospheric General Circulation Modeled on Non-hydrostatic Domains. Prog. Earth Planet. Sci., 6, 61, https://doi.org/10.1186/s40645-019-0304-z.

  • Stull, R. B., 1988: An Introduction to Boundary Layer Meteorology. Kluwer Academic, 666 pp.

  • Sullivan, S. C., D. Lee, L. Oreopoulos, and A. Nenes, 2016: Role of updraft velocity in temporal variability of global cloud hydrometeor number. Proc. Natl. Acad. Sci. USA, 113, 5791–5796, https://doi.org/10.1073/pnas.1514039113.

  • Tan, I., and D. Barahona, 2022: The impacts of immersion ice nucleation parameterizations on Arctic mixed-phase stratiform cloud properties and the Arctic radiation budget in GEOS-5. J. Climate, 35, 4049–4070, https://doi.org/10.1175/JCLI-D-21-0368.1.

  • Terai, C. R., M. S. Pritchard, P. Blossey, and C. Bretherton, 2020: The impact of resolving subkilometer processes on aerosol-cloud interactions of low-level clouds in global model simulations. J. Adv. Model. Earth Syst., 12, e2020MS002274, https://doi.org/10.1029/2020MS002274.

  • West, R. E. L., P. Stier, A. Jones, C. E. Johnson, G. W. Mann, N. Bellouin, D. Partridge, and Z. Kipling, 2014: The importance of vertical velocity variability for estimates of the indirect aerosol effects. Atmos. Chem. Phys., 14, 6369–6393, https://doi.org/10.5194/acp-14-6369-2014.

  • Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. Elsevier, 676 pp.

  • Willard, J., X. Jia, S. Xu, M. Steinbach, and V. Kumar, 2020: Integrating physics-based modeling with machine learning: A survey. arXiv, 2003.04919v4, https://doi.org/10.48550/arXiv.2003.04919.

  • Williams, P. D., and L. N. Storer, 2022: Can a climate model successfully diagnose clear-air turbulence and its response to climate change? Quart. J. Roy. Meteor. Soc., 148, 1424–1438, https://doi.org/10.1002/qj.4270.

  • Zeng, Y., J.-L. Wu, and H. Xiao, 2021: Enforcing imprecise constraints on generative adversarial networks for emulating physical systems. Commun. Comput. Phys., 30, 635–665, https://doi.org/10.4208/cicp.OA-2020-0106.

  • Zhang, C., and Y. Ma, 2012: Ensemble Machine Learning: Methods and Applications. Springer, 332 pp.

  • Zhu, J.-Y., T. Park, P. Isola, and A. A. Efros, 2017: Unpaired image-to-image translation using cycle-consistent adversarial networks. Proc. IEEE Int. Conf. on Computer Vision, Venice, Italy, Institute of Electrical and Electronics Engineers, 2242–2251, https://doi.org/10.1109/ICCV.2017.244.
