Improving Ensemble Extreme Precipitation Forecasts Using Generative Artificial Intelligence

Yingkai Sha, Ryan A. Sobash, and David John Gagne II
NSF National Center for Atmospheric Research, Boulder, Colorado

Open access

Abstract

An ensemble postprocessing method is developed to improve the probabilistic forecasts of extreme precipitation events across the conterminous United States (CONUS). The method combines a 3D vision transformer (ViT) for bias correction with a latent diffusion model (LDM), a generative artificial intelligence (AI) method, to postprocess 6-hourly precipitation ensemble forecasts and produce an enlarged generative ensemble that contains spatiotemporally consistent precipitation trajectories. These trajectories are expected to improve the characterization of extreme precipitation events and offer skillful multiday accumulated and 6-hourly precipitation guidance. The method is tested using the Global Ensemble Forecast System (GEFS) precipitation forecasts out to day 6 and is verified against the Climatology-Calibrated Precipitation Analysis (CCPA) data. Verification results indicate that the method generated skillful ensemble members with improved continuous ranked probability skill scores (CRPSSs) and Brier skill scores (BSSs) over the raw operational GEFS and a multivariate statistical postprocessing baseline. It showed skillful and reliable probabilities for events at extreme precipitation thresholds. Explainability studies were further conducted, which revealed the decision-making process of the method and confirmed its effectiveness in ensemble member generation. This work introduces a novel, generative AI–based approach to address the limitation of small numerical ensembles and the need for larger ensembles to identify extreme precipitation events.

Significance Statement

We use a new artificial intelligence (AI) technique to improve extreme precipitation forecasts from a numerical weather prediction ensemble, generating more scenarios that better characterize extreme precipitation events. This AI-generated ensemble improved the accuracy of precipitation forecasts and probabilistic warnings for extreme precipitation events. The study explores AI methods to generate precipitation forecasts and explains the decision-making mechanisms of such AI techniques to prove their effectiveness.

© 2025 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Yingkai Sha, ksha@ucar.edu

1. Introduction

The accurate prediction of extreme precipitation events is crucial for saving lives and property but continues to challenge our best prediction systems (e.g., Sukovich et al. 2014; Herman and Schumacher 2016). State-of-the-art global numerical weather prediction (NWP) models typically have 10–50-km horizontal grid spacing (e.g., Molteni et al. 1996; Buizza et al. 2019; Zhou et al. 2017, 2022). These horizontal resolutions require parameterization schemes to approximate small-scale processes that contribute to the generation of rainfall, such as convection and the microphysical interactions of cloud and precipitation water particles (Stensrud 2009). Despite the large improvements made over recent decades (Bauer et al. 2015), systematic error remains in these parameterization schemes, particularly for the modeling of extreme precipitation events (Wilcox and Donner 2007; Wehner et al. 2010; Sun and Liang 2020).

Postprocessing methods have been proposed for the bias correction and calibration of precipitation forecasts. These methods include both parametric methods, which assume a known predictive distribution, and nonparametric methods, which derive a predictive distribution from the training data with no prior distribution assumptions. One common parametric method is nonhomogeneous regression (Scheuerer and Hamill 2015; Baran and Nemoda 2016), which is an approach where a regression model predicts the parameters of a parametric distribution, such as a censored, shifted gamma distribution for precipitation. Bayesian model averaging (Sloughter et al. 2007) estimates a multimodal predictive distribution from an ensemble of deterministic predictions by learning the predicted variance and weight of each ensemble member. Nonparametric methods include the analog ensemble (Hamill and Whitaker 2006; Monache et al. 2013), which identifies the most similar prior NWP predictions to a new prediction and creates a distribution of observed values mapped to those NWP analogs, and quantile regression (Bremnes 2004), which optimizes regression models to predict conditional predictive quantiles rather than the most likely values.

While these methods have achieved great success, their primary focus was the overall postprocessing quality, which was dominated by the calibration performance of mild-to-moderate precipitation events. Machine learning–based methods have been introduced to specifically improve extreme precipitation forecasts [e.g., random forest (Herman and Schumacher 2018), feed-forward neural network (Bodri and Čermák 2000), and deep neural network (Li et al. 2022)]. However, a key challenge of these methods is generating skillful forecast trajectories that represent the evolution of extreme precipitation events. Many machine learning models were trained to predict probabilistic values on locations and forecast lead times independently, whereas end users may look for multivariate forecast trajectories for flood risk assessments (e.g., Lai et al. 2020; Huang et al. 2021) and water resource management (e.g., Strauch et al. 2012).

Producing trajectories that can represent extreme precipitation events properly requires a large ensemble set. Currently, the ensemble size of operational global NWP models is typically limited to 30–50 (e.g., Leutbecher 2019; Zhou et al. 2022), which is insufficient to capture extreme events on the very tail side of the precipitation intensity spectra (Bevacqua et al. 2023). Most statistical and machine learning–based postprocessing methods cannot solve this problem because they are designed to utilize available ensemble members; they can hardly create new forecast scenarios from an existing ensemble set to improve the estimation of extreme events. One possible solution is producing hundreds of numerical ensemble members from a regional numerical model configuration (e.g., Ghazvinian et al. 2024), although such efforts are computationally costly.

Recent advances in generative artificial intelligence (AI) have brought new insights into extreme weather prediction problems. State-of-the-art generative AI can learn distribution properties from training data and produce conditional samplings from the target distribution (Creswell et al. 2018; Bond-Taylor et al. 2022; Yang et al. 2023). Compared to physics-based NWP ensembles, generative AI can expand ensemble sizes by producing more forecast members at minimal computational cost. This enlarged generative ensemble set would contain possible evolutions of the state of the atmosphere, thus providing better support for the estimation of high-impact extreme weather.

Several studies have leveraged generative AI in either NWP or AI-based weather prediction. On regional scales, Sha et al. (2024) found that generated ensemble members from deterministic convection-allowing model forecasts improved probabilistic estimations of tornadoes, hail, and wind gusts. On global scales, Li et al. (2024) applied generative AI to postprocess and emulate ensemble forecasts, resulting in improved forecast skill and more accurate predictions of extreme weather. Zhong et al. (2023) integrated generative AI with an AI weather prediction model to produce multiday forecasts with finer-scale spatial details that outperformed the original AI weather forecasts on various extreme weather–based metrics. Price et al. (2023) developed generative AI–based weather prediction models that produced skillful ensemble forecasts for up to 15 days. Generative AI has also been applied to other topics that are closely related to weather forecasting and postprocessing (e.g., Asperti et al. 2023; Mardani et al. 2023; Ling et al. 2024; Gao et al. 2023; Leinonen et al. 2023 for precipitation nowcasting; Ravuri et al. 2021; Zhang et al. 2023; Bassetti et al. 2024 for downscaling).

Motivated by the challenge of multivariate extreme precipitation postprocessing and the application of generative AI in extreme weather prediction, this research proposes a postprocessing framework that incorporates generative AI to improve the estimation of extreme precipitation events. Specifically, we aimed to produce a skillful generative ensemble of 6-hourly precipitation forecasts out to 6 days. This generative ensemble is expected to provide precipitation forecast trajectories that characterize extreme precipitation events properly and can be summarized with probabilistic guidance.

The methodology of this research was developed over the conterminous United States (CONUS) using the Global Ensemble Forecast System (GEFS) as inputs and the Climatology-Calibrated Precipitation Analysis (CCPA) as targets. The following research questions are addressed: 1) How can generative AI methods be incorporated into precipitation forecast postprocessing, and how well do they verify at producing reliable and discriminative probabilistic forecasts at extreme precipitation thresholds? 2) Can we explain the performance of AI-based precipitation postprocessing methods, and what insights can we gain from such explainability analysis? By answering these, the authors examine the effectiveness of generative AI in extreme precipitation forecasts and explore its decision-making mechanisms for postprocessing. Broadly, the authors also wish to introduce generative AI to severe weather–related studies and inspire future creative works.

2. Research domain and data

a. Region of interest and the definition of extreme precipitation events

This research focuses on precipitation events within the CONUS (Fig. 1a). Following the definition of the Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6), gridpoint-wise 99th percentile values were used as thresholds to identify extreme precipitation events (Pörtner et al. 2022). These values were estimated separately for different time periods of the day (0000–0600, 0600–1200, 1200–1800, and 1800–0000 UTC) to capture diurnal variations. The 0000–0600 and 1200–1800 UTC values are provided as examples in Figs. 1b and 1c, respectively.
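The percentile-based definition above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the array names `precip` and `window`, and the function names, are hypothetical, and the sketch assumes the analysis record is already arranged as a (time, lat, lon) array with one integer label per record for its 6-h window of the day.

```python
import numpy as np

def percentile_thresholds(precip, window, q=99.0, n_windows=4):
    """Gridpoint-wise q-th percentile, estimated separately per 6-h window.

    precip : (time, lat, lon) 6-hourly precipitation, mm per 6 h
    window : (time,) integer label of the 6-h window of the day (0..n_windows-1)
    returns: (n_windows, lat, lon) threshold fields
    """
    out = np.empty((n_windows,) + precip.shape[1:])
    for w in range(n_windows):
        # Pool all records that fall in the same time-of-day window.
        out[w] = np.percentile(precip[window == w], q, axis=0)
    return out

def extreme_mask(precip, window, thresholds):
    """True where a record exceeds the threshold of its own window."""
    return precip > thresholds[window]

# Synthetic example: 400 records cycling through the four windows.
rng = np.random.default_rng(0)
precip = rng.gamma(0.5, 4.0, size=(400, 3, 5))
window = np.arange(400) % 4
thresholds = percentile_thresholds(precip, window)
mask = extreme_mask(precip, window, thresholds)
```

Estimating the thresholds per window, rather than from the pooled record, is what lets the definition follow the diurnal cycle described in the text.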

Fig. 1.

(a) The 0.125° grid spacing CONUS domain with shaded elevation. (b),(c) The 2002–19 climatology of gridpoint-wise 99th percentile values of 6-hourly precipitation rates for 0000–0600 and 1200–1800 UTC, respectively. (d) The 1986–2015 climatological probabilities of tornadoes, hail, and wind gusts derived from NOAA Storm Prediction Center (SPC) reports. (e),(f) The corresponding 2002–19 climatological percentiles of 40 mm (6 h)−1 threshold for 0000–0600 and 1200–1800 UTC, respectively. Hatched area in (e) and (f) means percentiles cannot be estimated as 40 mm (6 h)−1 is close to or larger than the historical maximum.

Citation: Artificial Intelligence for the Earth Systems 4, 2; 10.1175/AIES-D-24-0063.1

The percentile-based peaks-over-threshold approach can be inconsistent due to large regional differences (Wang and Tang 2020). Thus, a fixed precipitation rate threshold of 40 mm (6 h)−1 was also used. Extreme precipitation events defined in this way are not only difficult to predict but also have a higher impact by causing flash floods (e.g., Nair et al. 1997; Caracena et al. 1979; Smith et al. 2001). Figures 1e and 1f provide the corresponding percentiles of the 40 mm (6 h)−1 threshold for 0000–0600 and 1200–1800 UTC. This threshold does not occur within the CCPA in the dry areas on the west side of the Rocky Mountains and is most frequent in the southcentral United States (cf. Figs. 1b,c,e,f). This fixed threshold is more extreme than the 99th percentile definition and exhibited some consistencies with the climatology of severe weather (cf. Figs. 1d–f).

Overall, tropical weather systems, such as tropical cyclones, originating from the Gulf of Mexico (e.g., Shepherd et al. 2007) have a large impact on the spatial distribution of precipitation extremes, especially along the Gulf Coast. In addition, for 0000–0600 UTC (i.e., evening-to-night local time) patterns in Fig. 1e, nocturnal convection over the southcentral United States would play an important role (e.g., Jiang et al. 2006; Johnson and Wang 2017; Blake et al. 2017). For the 1200–1800 UTC (i.e., morning-to-afternoon local time) patterns in Fig. 1f, smaller-scale events, including summertime deep convection (e.g., Tian et al. 2005), and small-scale convection introduced by the sea–breeze circulation (e.g., Hill et al. 2010) may also contribute to the daytime precipitation extremes. Aside from the 40-mm threshold, as described in Figs. 1e and 1f and above, other fixed thresholds, ranging from 1 to 35 mm (6 h)−1, were examined to provide comprehensive views of the performance of postprocessing methods.

b. Forecast data

This research aimed to improve the extreme precipitation forecasts from GEFS, version 12 (GEFSv12; Zhou et al. 2022). The GEFSv12 is a state-of-the-art real-time ensemble forecast system operated by the National Oceanic and Atmospheric Administration (NOAA) since September 2020. GEFSv12 implements the Geophysical Fluid Dynamics Laboratory (GFDL) finite-volume cubed-sphere (FV3) dynamical core, ensemble Kalman filter–based data assimilation to generate initial condition uncertainty, and the GFDL cloud microphysics. Its quantitative precipitation forecasts over CONUS were largely improved from its previous versions (Zhou et al. 2022). GEFSv12 has 0.25° output horizontal grid spacing and 64 vertical hybrid levels. It produces 31-member ensemble forecasts four times per day, with 3-hourly output available within the first 10 forecast days. In this research, the 0000 UTC GEFSv12 initializations were used, and the 6-hourly total precipitation forecast [accumulated precipitation (APCP)] was selected as the main variable, whereas total-column precipitable water was used for one of the baseline methods.

The operational GEFSv12 comes with a 30-yr reforecast archive to support postprocessing studies (Guan et al. 2022). This reforecast dataset was produced from the same dynamical core, ensemble generation, and model physics as the operational GEFSv12 but with five members and 0000 UTC initializations only. The phase-two, 2000–19 reforecasts, initialized from the GEFSv12 reanalysis (Hamill et al. 2022), were used by this research as training data.

c. Analysis data

CCPA (Hou et al. 2014) was used in this research to represent the analyzed state of precipitation. CCPA is a precipitation dataset that covers the entire CONUS. It statistically adjusts and combines the National Centers for Environmental Prediction (NCEP), Climate Prediction Center (CPC) unified global daily gauge analysis, and the NCEP stage IV multisensor quantitative precipitation estimation (Hou et al. 2014). CCPA was used as the verification target of the operational GEFS system (Zhou et al. 2017, 2022) and has been applied as training and verification targets in various GEFS-based postprocessing studies (e.g., Scheuerer and Hamill 2015; Hamill and Scheuerer 2018; Stovern et al. 2023; Hamill et al. 2023). We conducted a detailed statistical analysis of CCPA and compared the result with total precipitation from ERA5, the fifth major global reanalysis produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). We found that CCPA captured extreme precipitation events well in CONUS. Details of this analysis are summarized in the supplemental material. The CCPA was also used as a climatology reference, including the estimation of gridpoint-wise precipitation cumulative distribution functions (CDFs), which were used to define percentile-based extreme precipitation events (see section 2a) and compute skill scores.

3. Methods

Two neural network–based postprocessing steps were combined as the main methodology of this research (Fig. 2). First, a 3D vision transformer (ViT) was proposed to reduce the forecast bias of each GEFSv12 member. Second, each bias-corrected member was used as the conditional input of a diffusion model, which generates postprocessed members as outputs. The two steps above were conducted within the latent space created by a vector quantized variational autoencoder (VQ-VAE), with the VQ-VAE encoder projecting GEFS members into the latent space and the VQ-VAE decoder projecting the latent space outputs back to the real space. The term “latent diffusion model (LDM)” was used to highlight the application of VQ-VAE, and hereafter, the method is named “ViT-LDM.”

Fig. 2.

(a) The general concept and (b) technical steps of ViT-LDM. Steps that solve the oversmoothness problem of the VAE-based latent diffusion are highlighted using a yellow background color.

The ViT-LDM postprocessing was trained on data from 2002 to 2019 using the 6-hourly GEFS reforecasts out to 6 days (i.e., 6–144-h lead times) and the CCPA data. The validation set was a 10% random sampling from the training set and fixed for all training steps. When applied to the operational GEFSv12 in 2021, ViT-LDM generates two postprocessed members from each GEFSv12 member, thus producing 62 generative members from all 31 operational members. The generated 62 members were verified against the CCPA data from 1 January to 31 December 2021 with a focus on extreme precipitation events. The basics and applications of VQ-VAE, ViT, and LDM are introduced in this section. Hyperparameter optimization, training, and other related information are summarized in the online supplemental material. For data preprocessing, 6-hourly precipitation and the CCPA-based climatology were normalized using a rescaled logarithm transformation: y = log (0.1x + 1); the elevation input was normalized to [−1, 1] by linear scaling. Value truncation was applied to the decoder outputs: precipitation rates lower than 0.1 mm (6 h)−1 were replaced by zero.
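The preprocessing and truncation steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, "log" is taken to be the natural logarithm, and the elevation is scaled with the minimum and maximum of the array passed in (the text does not say which bounds were used).

```python
import numpy as np

def normalize_precip(x):
    """Rescaled logarithm transform, y = log(0.1 x + 1), x in mm per 6 h."""
    return np.log(0.1 * x + 1.0)

def denormalize_precip(y, floor=0.1):
    """Invert the transform and truncate values below `floor` to zero."""
    x = (np.exp(y) - 1.0) / 0.1
    return np.where(x < floor, 0.0, x)

def normalize_elevation(z):
    """Linear scaling of elevation to [-1, 1] (bounds taken from z itself)."""
    z_min, z_max = z.min(), z.max()
    return 2.0 * (z - z_min) / (z_max - z_min) - 1.0

# Round trip: 0.05 mm falls below the 0.1 mm floor and is set to zero.
x = np.array([0.0, 0.05, 1.0, 40.0])
x_back = denormalize_precip(normalize_precip(x))
elev = normalize_elevation(np.array([0.0, 1500.0, 3000.0]))
```

The transform compresses the heavy right tail of precipitation, so a 40 mm (6 h)−1 extreme maps to log(5), about 1.6, rather than dominating the loss.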

a. Latent space projection using VQ-VAE

A VQ-VAE (van den Oord et al. 2017) was employed to convert gridded precipitation fields, either from the GEFS APCP forecasts or the CCPA data, into a compressed and regularized latent space, enabling effective bias correction and ensemble generation.

VQ-VAE is a type of neural network that combines variational autoencoders with discrete latent representations, typically used in generative AI–related applications (e.g., Gu et al. 2022; Hu et al. 2022; Cohen et al. 2022). VQ-VAE uses an encoder to map input data to a latent space, where the representation is quantized using a fixed number of discrete embeddings (i.e., “codebook”). The quantized latent vectors are then decoded to reconstruct the input data. The VQ-VAE of this research was designed based on 2D convolutional layers. Its encoder contains two 4 × 4 downsampling layers with 16 times compression on latitude and longitude dimensions. The VQ-VAE decoder has two 4 × 4 upsampling layers; it takes the encoded latent space features as input and projects them to the original size with minimal information loss. The output section of the VQ-VAE decoder has an extended substructure that accepts elevation and CCPA climatology as additional inputs to improve the decoding quality. The technical highlight of VQ-VAE is its VQ layer. The VQ layer converts continuous encoded information into discrete values by selecting the nearest vector from a discrete and learnable codebook. VQ-VAE can be viewed as a VAE that produces discrete latent space embeddings.
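The nearest-vector lookup performed by the VQ layer can be illustrated with a short NumPy sketch. The function name, the Euclidean distance, and the tiny codebook are assumptions for illustration only; the codebook size used in the paper is not stated in this section.

```python
import numpy as np

def vector_quantize(z_e, codebook):
    """Replace each encoded vector with its nearest codebook entry.

    z_e      : (..., D) continuous encoder outputs
    codebook : (K, D) learnable discrete embeddings
    returns  : indices (...) into the codebook and quantized vectors (..., D)
    """
    flat = z_e.reshape(-1, z_e.shape[-1])
    # Squared Euclidean distance from every vector to every codebook entry.
    d2 = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)
    z_q = codebook[idx].reshape(z_e.shape)
    return idx.reshape(z_e.shape[:-1]), z_q

# Toy example with a three-entry codebook in two dimensions.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
z_e = np.array([[[0.1, -0.1], [0.9, 1.2]],
                [[4.0, 6.0], [2.0, 2.0]]])
idx, z_q = vector_quantize(z_e, codebook)
```

In training, the argmin is not differentiable, which is why VQ-VAE relies on the stop-gradient terms of its loss to update the encoder and the codebook.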

The use of VQ-VAE in this research brought two major benefits: 1) The VQ-VAE latent space projection reduces data size, so the ViT and LDM can be designed and trained more effectively. 2) A regularized VAE latent space disentangles the input data. This means each VAE latent variable would represent its own factors of variation, and small perturbations within the VAE latent space would not lead to dramatically different outputs. The disentanglement property benefits the stability of ensemble member generation and model interpretation. Compared to direct diffusion, a known disadvantage of VAE-based latent diffusion is the oversmoothness of its outputs (e.g., Yang and Mandt 2024). Two steps were applied to solve this problem: 1) The embedded GEFS APCP forecasts were linearly combined with the generated latent information to guide the VQ-VAE decoder in producing more physically realistic outputs in inference (Fig. 2). 2) The decoder substructure that incorporates elevation and climatology information, as mentioned in the previous paragraph, was also intended to mitigate the oversmoothness issue (Figs. 2 and 3).

Fig. 3.

(a) The architecture of VQ-VAE with 2D convolutional layers, down- and upsampling layers, batch normalization (Ioffe and Szegedy 2015), Gaussian error linear unit (GELU; Hendrycks and Gimpel 2016) activation function, and dropout (Srivastava et al. 2014). (b) The schematic of the VQ layer; its dashed-line arrows represent identity mapping. (c) The design of the residual block. A separate output section is highlighted using a yellow background color.

VQ-VAE features self-supervised training. Its optimization objective contains three components (van den Oord et al. 2017):
$$\mathcal{L}(x) = \left\| x - z_d[z_e(x)] \right\|_2^2 + \left\| \mathrm{sg}[z_e(x)] - e \right\|_2^2 + \beta \left\| z_e(x) - \mathrm{sg}(e) \right\|_2^2, \tag{1}$$
where x is a training batch, $\|\cdot\|_2^2$ is the mean-square error computation, $z_e$ and $z_d$ are the VQ-VAE encoder and decoder, respectively, e is the codebook, and “sg” is the stop-gradient operator, which keeps its argument from being updated by the current gradient descent step. The first term of Eq. (1) is the reconstruction loss; it minimizes the difference between the input and the reconstructed input. The second term is the codebook loss; it updates the discrete codebook values to keep them close to the continuous encoded information. The last term of Eq. (1) is the commitment loss; it regularizes the encoder to prevent its encoded continuous values from diverging from the current codebook. The β is a constant hyperparameter that defines the relative importance of commitment loss.

The architecture of the VAE here aligns with other studies that experimented with conditional diffusion in latent space. For example, PreDiff (Gao et al. 2023), a precipitation nowcasting system, applied a similar VAE design with additional postprocessing steps to refine generated outputs. Leinonen et al. (2023) also combined a postprocessing neural network with a VAE-based latent diffusion model for precipitation nowcasting.

The VQ-VAE described above was trained on the 1/8° CCPA data with (224, 464) input sizes; its encoder produces (14, 29, 4) sized latent variables as outputs (the last dimension represents hidden-layer channels). The 0.25° GEFS APCP forecasts were linearly interpolated to 1/8° before encoding. The same interpolation and VQ-VAE weights were applied to all forecast lead times and both the GEFS reforecasts and operational forecasts. The training of VQ-VAE requires roughly 12 h of wall time using a single NVIDIA A100 with 40-GB memory. The inference of VQ-VAE can be completed efficiently on CPUs, on the order of seconds per forecast.

b. ViT-based forecast bias correction

A 3D ViT (Dosovitskiy et al. 2020; Vaswani et al. 2021; Arnab et al. 2021) was applied within the VQ-VAE latent space for the bias correction of GEFS APCP forecasts. Its architecture consists of three components: 1) an input section that converts 3D tensors into embedded patches, 2) stacked ViT blocks that perform attention-based learning, and 3) an output section that converts embedded patches to the original tensor size. The input section conducts patch partition using 3D convolution kernels, and the positional indices are embedded by a dense layer. This design is similar to many AI-based weather forecast models (e.g., Chen et al. 2023). The ViT block follows the conventional design of Arnab et al. (2021); it features multihead self-attention to learn the cross relationships among embedded patches. The 3D ViT is an ideal choice for coupling with diffusion models. We found that 3D ViT has the ability to adjust the latent space representations of ensemble members and guide diffusion models to generate better forecasts. This part will be examined later in the explainability study. More advanced ViT designs, such as shift window–based transformers (SwinTs; Liu et al. 2021), were also examined during the hyperparameter search, but they did not bring better performance.

The 3D ViT operates on (1, 1, 1) sized patch partitions with 128 embedded dimensions. Its ViT block has eight stacks with four attention heads (see Fig. 4 for further details). This configuration was trained using the encoded GEFS reforecast ensemble mean as inputs, encoded CCPA as targets, and mean absolute error as the loss function. It processes eight steps along the temporal dimension at once and was trained using the 6–54-h reforecasts only. This training strategy will be discussed further within the context of AI explainability studies. For inference, the same 3D ViT was applied to the operational GEFS members on 6–54, 54–102, and 102–144-h forecasts to generate bias-corrected ensemble trajectories within the VQ-VAE latent space. The training of 3D ViT requires roughly 36 h of wall time using four NVIDIA A100 GPUs with 40-GB memory. The inference of 3D ViT can be completed on a single graphics processing unit (GPU) in minutes per forecast.
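For readers unfamiliar with the attention computation inside each ViT block, the following NumPy sketch shows standard scaled dot-product multihead self-attention with the sizes quoted above (128 embedded dimensions, four heads). It is a generic illustration, not the authors' network: the projection matrices are random placeholders, biases are omitted, and the function names are hypothetical.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def multihead_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """x: (n_patches, d_model); each w_*: (d_model, d_model)."""
    n, d = x.shape
    d_head = d // n_heads

    def split(t):
        # (n, d) -> (heads, n, d_head)
        return t.reshape(n, n_heads, d_head).transpose(1, 0, 2)

    # Query, key, and value are three projections of the same input.
    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, n, n)
    attn = softmax(scores, axis=-1)
    merged = (attn @ v).transpose(1, 0, 2).reshape(n, d)  # concatenate heads
    return merged @ w_o, attn

# Small example: 6 embedded patches, 128 dimensions, 4 heads.
rng = np.random.default_rng(0)
d_model, n_heads, n_patches = 128, 4, 6
x = rng.standard_normal((n_patches, d_model))
weights = [rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
           for _ in range(4)]
out, attn = multihead_self_attention(x, *weights, n_heads=n_heads)
```

If the (1, 1, 1) patches are taken over eight time steps of the (14, 29) latent grid, each trajectory segment would yield 8 × 14 × 29 = 3248 patches, so every latent grid cell at every lead time can attend to every other one; this count is inferred from the sizes given in the text rather than stated by the authors.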

Fig. 4.

(a) The architecture of the 3D ViT. The “C” indicates the number of channels. (b) The design of ViT blocks with layer normalization (Ba et al. 2016), multihead self-attention (Vaswani et al. 2017), GELU activation, and dropout. (c) The design of multihead self-attention. The “Q,” “K,” and “V” represent “query,” “key,” and “value,” respectively, which are three copies of the input tensor for self-attention computation (Vaswani et al. 2017).

c. Ensemble generation using LDM

An LDM was implemented to produce generative ensembles conditioned on the bias-corrected GEFS members. The archetype of LDM is the denoising diffusion probabilistic model (DDPM) proposed by Ho et al. (2020), and it was extended to a 3D configuration that supports the generation of the entire forecast trajectory.

DDPM contains forward and reverse diffusion processes. For a given sample of the target distribution $X_0 \sim q(X_0)$, the forward diffusion process adds Gaussian noise into the sample iteratively by following a variance schedule:
$$q(X_t \mid X_{t-1}) = \mathcal{N}\left(X_t; \sqrt{1 - b_t}\, X_{t-1},\, b_t \mathbf{I}\right), \tag{2}$$
where t = {0, 1, …, T} are the diffusion time steps and B = {b0, b1, …, bT} is the diffusion schedule. The reverse diffusion process is achieved by a neural network θ that approximates $q_\theta(X_{t-1} \mid X_t)$ and recovers the noised sample to its original state $X_0$ iteratively. The optimization objective of DDPM is summarized as follows (Ho et al. 2020; Dhariwal and Nichol 2021):
$$\mathcal{L}(t) = \left\| \epsilon_t - \theta(X_t, b_t, V) \right\|_2^2, \tag{3}$$
where $\epsilon_t$ is the accumulated effect of forward diffusion, so that Eq. (3) is the mean-square error of its prediction by θ, and V is an optional, conditional input that can be incorporated during the reverse diffusion processes to influence the estimation of $q_\theta(X_{t-1} \mid X_t)$. For the sample generation of DDPM, $X_T$ is a random draw from $\mathcal{N}(0, \mathbf{I})$ and reverse diffused to $X_0$, which results in a generated sample.
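The forward process and the training objective can be made concrete with a small NumPy sketch. It uses the standard DDPM result (Ho et al. 2020) that the accumulated effect of t forward steps has a closed form, so a noised sample can be drawn directly from the clean one. The schedule end points (1e-4 and 2e-2) and the function names are hypothetical choices for illustration, and array indices are zero based.

```python
import numpy as np

def linear_schedule(n_steps=100, b_start=1e-4, b_end=2e-2):
    """Linearly spaced noise variances b_t (end points are assumptions)."""
    return np.linspace(b_start, b_end, n_steps)

def forward_diffuse(x0, t, betas, rng):
    """Draw X_t from q(X_t | X_0) in closed form.

    Returns the noised sample and the noise eps that produced it, which
    serves as the regression target of the denoising network.
    """
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps
    return x_t, eps

def ddpm_loss(eps, eps_pred):
    """Mean-square error between the true and the predicted noise."""
    return np.mean((eps - eps_pred) ** 2)

rng = np.random.default_rng(0)
betas = linear_schedule()
x0 = rng.standard_normal((4, 5))       # stand-in for a latent sample
x_t, eps = forward_diffuse(x0, 50, betas, rng)
alpha_bar_50 = np.cumprod(1.0 - betas)[50]
x0_recovered = (x_t - np.sqrt(1.0 - alpha_bar_50) * eps) / np.sqrt(alpha_bar_50)
```

Because the clean sample can be recovered exactly once the noise is known, training the network to predict the noise is equivalent to training it to denoise.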

The LDM of this research was designed based on the DDPM above and applied within the VQ-VAE latent space. Its architecture is similar to a 3D U-Net (e.g., Ronneberger et al. 2015; Çiçek et al. 2016) but without down- and upsampling levels. The LDM was configured with a 100-step linear schedule; it takes three inputs (Fig. 5a): 1) the output of the previous reverse diffusion step, 2) a ViT bias-corrected ensemble member as conditional input, and 3) the diffusion schedule of the current step; it produces the reverse diffusion output on the current step. The LDM was trained using the ViT-corrected reforecast members as inputs and CCPA as targets. During the sample generation process, the weights of LDM were modified using the exponential moving average. The application of LDM was based on NVIDIA A100 GPUs with the same specs as 3D ViT and a slightly longer training time. A single NVIDIA A100 GPU was used for inference with a time cost of minutes per ensemble member per forecast lead time.
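The three-input reverse diffusion loop can be sketched as below. This follows the generic DDPM sampler of Ho et al. (2020) with noise variance equal to the schedule value, which is an assumption; the paper's exact sampler, schedule end points, and network are not given in this section. The `toy_denoiser` is a hypothetical stand-in that knows the answer (the target equals the conditional input), used only to show that the loop is wired correctly.

```python
import numpy as np

def reverse_diffusion(denoiser, cond, betas, rng):
    """Generate one sample by iterating the reverse diffusion steps.

    denoiser(x, cond, b_t) receives the previous step's output, the
    conditional input, and the schedule value, and returns predicted noise.
    """
    alphas = 1.0 - betas
    alpha_bar = np.cumprod(alphas)
    x = rng.standard_normal(cond.shape)            # X_T ~ N(0, I)
    for t in range(len(betas) - 1, -1, -1):
        eps_hat = denoiser(x, cond, betas[t])
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bar[t]) * eps_hat)
        mean = mean / np.sqrt(alphas[t])
        # No noise is added on the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x

# Hypothetical 100-step linear schedule and a toy "perfect" denoiser.
betas = np.linspace(1e-4, 2e-2, 100)
alpha_bar = np.cumprod(1.0 - betas)

def toy_denoiser(x, cond, b_t):
    t = int(np.argmin(np.abs(betas - b_t)))        # recover the step index
    return (x - np.sqrt(alpha_bar[t]) * cond) / np.sqrt(1.0 - alpha_bar[t])

cond = np.arange(6.0).reshape(2, 3)                # stand-in for a latent member
sample = reverse_diffusion(toy_denoiser, cond, betas, np.random.default_rng(1))
```

With a trained network in place of the toy denoiser, repeating the call with different random seeds yields different members conditioned on the same bias-corrected input, which is how the ensemble is enlarged.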

Fig. 5.

(a) The architecture of the 3D diffusion model. The “C” indicates the number of channels. (b) The design of the residual block.

d. Baseline methods

The combination of analog ensemble (AnEn; Hamill and Whitaker 2006) and ensemble copula coupling (ECC; Schefzik et al. 2013) was considered as the baseline of this research (“AnEn-ECC”). AnEn is a regression-based method that performs univariate bias correction and ensemble calibration. For each forecast lead time and location, AnEn identifies similar historical dates/times within its reforecast training set and forms an ensemble composed of the CCPA training targets at the identified dates/times. As a nonparametric method, AnEn leverages a large reforecast archive without requiring an a priori distribution assumption; it is easy to implement and can produce realizations with flexible ensemble sizes. These strengths make AnEn a good option for precipitation forecast postprocessing. The AnEn baseline here follows its improved version as introduced by Hamill et al. (2015) but without supplemental locations. It was trained using the 2002–19 GEFS reforecasts and the CCPA target.
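
The analog search can be illustrated with a deliberately simplified sketch for one location and one lead time. It uses a single scalar predictor and absolute distance, which is an assumption made for brevity; the Hamill et al. (2015) version used as the baseline is more elaborate. The function and variable names are hypothetical.

```python
import numpy as np

def analog_ensemble(current_forecast, past_forecasts, past_analyses, n_members):
    """Pick the analyses paired with the past forecasts closest to today's forecast."""
    distance = np.abs(past_forecasts - current_forecast)
    nearest = np.argsort(distance, kind="stable")[:n_members]
    return past_analyses[nearest]

# Toy archive for one grid point: reforecast values and matching analysis values.
past_forecasts = np.array([0.0, 5.0, 10.0, 20.0, 11.0])
past_analyses = np.array([0.0, 4.0, 12.0, 25.0, 9.0])
members = analog_ensemble(10.5, past_forecasts, past_analyses, n_members=2)
```

Because the members are drawn from past analyses, the ensemble size is set by `n_members` rather than by the size of the numerical ensemble.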

ECC is a multivariate, nonparametric method that recovers spatiotemporal consistencies from univariate postprocessing outputs. Given calibrated AnEn members, ECC applies 31 operational GEFS members as “dependence templates” and reindexes 31 AnEn members based on the rank structure of the selected templates.
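
The rank-based reindexing can be sketched in a few lines of NumPy. This is a minimal illustration, assuming arrays with the member axis first; ties in the template (common for zero precipitation) are broken here by member order, whereas random tie-breaking is often used in practice. The names are hypothetical.

```python
import numpy as np

def ecc_reorder(calibrated, template):
    """Reorder calibrated members so their ranks match the template's rank structure.

    Both arrays have shape (n_members, ...); ranks are taken independently at
    every location and lead time along the member axis.
    """
    order = np.argsort(template, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")   # rank of each template member
    sorted_calibrated = np.sort(calibrated, axis=0)
    return np.take_along_axis(sorted_calibrated, ranks, axis=0)

# Three members at two grid points.
template = np.array([[3.0, 1.0], [1.0, 2.0], [2.0, 3.0]])        # raw members
calibrated = np.array([[10.0, 40.0], [30.0, 60.0], [20.0, 50.0]])  # calibrated values
reordered = ecc_reorder(calibrated, template)
```

The calibrated values at each grid point are unchanged as a set; only their assignment to members follows the template, which is what restores the spatiotemporal dependence.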

More advanced precipitation postprocessing methods were considered for use as a baseline, such as Scheuerer and Hamill (2015) and Stovern et al. (2023), but these methods typically produce probabilistic values directly rather than forecasted trajectories with physics-based units. We prefer AnEn-ECC because, similar to ViT-LDM, AnEn-ECC can postprocess GEFS precipitation ensembles into forecast trajectories, which allows the flexibility of extreme precipitation verification with different definitions and thresholds (see section 2a).

The original 31 operational GEFS members (“GEFS-Raw”) were also used as a baseline. The two baselines, AnEn-ECC and GEFS-Raw, will be contrasted with ViT-LDM in extreme precipitation verification. Note that each of the two baselines contains 31 members, whereas the ViT-LDM generates 62 members. Although the total number of ensemble members is unequal, we think such a comparison is still fair because generating more ensemble members is part of the methodology and purpose of ViT-LDM.

e. Verification methods

ViT-LDM and the two baselines were verified from 1 January to 31 December 2021. The general postprocessing performance of all methods was examined using the continuous ranked probability scores (CRPSs) and CRP skill scores (CRPSSs), whereas the performance of extreme precipitation forecasts was verified using the Brier score (BS) and Brier skill score (BSS; Murphy 1973). The climatology reference of CRPSS and BSS was calculated from the 2002–19 CCPA data (section 2b). The probabilistic forecasts of extreme precipitation events were averaged from the deterministic results of ensemble members.
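
As a small illustration of how member-wise deterministic results become probabilities and scores, the sketch below computes an exceedance probability, a Brier score, and a skill score. The toy values, the threshold, and the function names are assumptions for illustration only.

```python
import numpy as np

def exceedance_probability(ensemble, threshold):
    """Fraction of members exceeding the threshold; member axis is first."""
    return np.mean(ensemble > threshold, axis=0)

def brier_score(prob, observed_event):
    """Mean squared difference between forecast probability and the 0/1 outcome."""
    return float(np.mean((prob - observed_event) ** 2))

def skill_score(score, reference_score):
    """Skill relative to a reference such as climatology; 1 is perfect, 0 is no skill."""
    return 1.0 - score / reference_score

# Four members at two grid points, in mm (6 h)^-1.
ensemble = np.array([[0.0, 50.0], [10.0, 45.0], [0.0, 30.0], [2.0, 60.0]])
analysis = np.array([1.0, 42.0])
prob = exceedance_probability(ensemble, threshold=40.0)
event = (analysis > 40.0).astype(float)
bs = brier_score(prob, event)
```

Because the probabilities are derived from trajectories rather than predicted directly, the same ensemble can be verified against any threshold or event definition.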

The computation of spatiotemporally aggregated BSSs follows Hamill and Juras (2006), with the BS on individual grid points and forecast lead times being computed and aggregated first and then converted to BSS by applying the climatology reference. The three-component decomposition of BS and reliability diagrams were also computed to attribute the BSS difference; their computation follows Murphy (1973) and Hsu and Murphy (1986). Bootstrapping was applied to estimate the confidence intervals of skill scores. It was conducted separately on positive (i.e., extreme precipitation cases) and negative samples to preserve their relative ratios. Two-sided Wilcoxon signed-rank tests were applied to the CRPSS comparisons to determine if skill scores were statistically significantly different.
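
The aggregation order and the three-component decomposition can be sketched as below. The decomposition BS = REL − RES + UNC is the standard Murphy (1973) form, binned here on the distinct forecast probabilities so that the identity holds exactly. The function names and toy values are assumptions.

```python
import numpy as np

def aggregated_bss(bs_forecast, bs_climatology):
    """Average BS over grid points and lead times first, then convert to BSS."""
    return 1.0 - np.mean(bs_forecast) / np.mean(bs_climatology)

def brier_decomposition(prob, event):
    """Return reliability, resolution, and uncertainty, binned on distinct probabilities."""
    prob = np.asarray(prob, dtype=float).ravel()
    event = np.asarray(event, dtype=float).ravel()
    n = prob.size
    o_bar = event.mean()                                   # sample event frequency
    values, inverse, counts = np.unique(prob, return_inverse=True, return_counts=True)
    o_k = np.bincount(inverse, weights=event) / counts     # observed frequency per bin
    rel = np.sum(counts * (values - o_k) ** 2) / n
    res = np.sum(counts * (o_k - o_bar) ** 2) / n
    unc = o_bar * (1.0 - o_bar)
    return rel, res, unc

prob = np.array([0.0, 0.0, 0.5, 0.5, 1.0, 1.0])
event = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
rel, res, unc = brier_decomposition(prob, event)
bs = np.mean((prob - event) ** 2)
```

In this toy case the forecasts are perfectly reliable (`rel` is zero), so the Brier score is determined by resolution and uncertainty alone.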

Note that the BSS values of this study are expected to be relatively low compared to those of regular probabilistic forecasts. This is because extreme precipitation events were verified at fine spatial resolutions. The BSSs on extreme events are generally low because these events are statistically rare, and observations are typically dominated by nonevents of interest. The finer spatial resolution (i.e., 1/8°) of this study brings additional challenges to the estimation of extreme precipitation events because larger penalties would be assigned for small displacement errors.

4. Results

a. Case-based assessments

A case-based assessment is presented to demonstrate the generative ensemble produced by ViT-LDM. In Fig. 6, an example of 48–54-h GEFS forecasts, initialized at 0000 UTC 30 December 2020, is presented. At this time, a synoptic-scale system was forecast in the southeastern United States. The system caused extreme precipitation (see the dotted area in Fig. 6d) and gradually moved toward the East Coast. A subset of the ViT-LDM generative ensemble members is shown in Figs. 6e–i with the corresponding extreme precipitation events highlighted by the dotted area. Comparing the ViT-LDM outputs with the two baselines, several performance highlights are evident:

  1. Precipitation patterns generated by ViT-LDM shared roughly the same locations as the CCPA verification target. The LDM-based conditional sampling from the bias-corrected GEFS ensemble members can preserve the broadscale structure of the forecast event. This ensures that the generative ensemble does not exhibit large spatial discrepancies that would negatively affect the prediction of extreme precipitation events.

  2. Differences in terms of the shape and intensity of the generated precipitation patterns can be found. For example, in Fig. 6f, the generated precipitation pattern had a similar shape to the CCPA target, but its forecast extreme precipitation area was shifted to the east. In Figs. 6g and 6i, extreme precipitation events were forecast on the correct grid points, but the precipitation pattern was extended to the south. Such small-scale variations provided useful perspectives on how this extreme precipitation event would develop. The probabilistic forecasts, collectively summarized from the ViT-LDM generative ensemble, showed good BS and outperformed the two baselines.

  3. The generated members were smoother than the CCPA verification target. This indicates that, although ViT-LDM was trained using the 1/8° CCPA target, it may not have the full ability to downscale 0.25° GEFS inputs into the 1/8° target resolution. That said, spatial downscaling is not the purpose of ViT-LDM.

Fig. 6.

An example of 48–54-h forecasts with extreme precipitation events on 0000–0600 UTC 1 Jan 2021. (a) Calibrated probabilities of precipitation rate > gridpoint-wise 99th percentile events. (b) As in (a), but for GEFS-Raw. (c) As in (a), but for AnEn-ECC. (d) The CCPA verification target. (e)–(i) Example of generative ensembles produced by ViT-LDM. Hatched areas represent where the 99th percentile extreme events were (d) analyzed or (e)–(i) forecasted.

b. General postprocessing performance verification

CRPSSs were averaged over all CONUS grid points and shown as functions of 6-hourly forecast lead times. CRPSS compares the entire predicted CDF, as represented by ensemble members, against the deterministic verification target. The CRPSS verification here is not focused on extreme precipitation events; rather, it measures the general forecast skill of the precipitation ensemble.

ViT-LDM and AnEn-ECC performed better than the GEFS-Raw (Fig. 7a), indicating that both postprocessing methods can produce more skillful precipitation forecasts than the raw ensemble output. Their CRPSS gains were positive throughout but larger for shorter forecast lead times and smaller for 72-h and longer lead times (Fig. 7b). The reduced forecast skills in longer forecast lead times indicate that the limited predictability of GEFS APCP placed a strong impact on all postprocessing methods. With the raw precipitation forecasts gradually diverging from the verification target on longer forecast lead times, it is difficult for postprocessing methods to reconstruct the correct precipitation fields. The CRPSS differences between ViT-LDM and AnEn-ECC were statistically significant for the first 48 h, with ViT-LDM performing better. For longer forecast lead times, the performance of AnEn-ECC was slightly superior to ViT-LDM (Fig. 7c). This indicates that AnEn-ECC is a competitive baseline; it can produce statistically calibrated precipitation forecasts with improved CRPSS.

Fig. 7.

Verification of ViT-LDM (red solid line), GEFS-Raw (blue dashed line), and AnEn-ECC (cyan dashed line) with CRPSSs (higher is better) in 2021. (a) Domainwise averaged CRPSS curves by forecast lead times. (b) The CRPSS differences between ViT-LDM and GEFS-Raw. (c) The CRPSS differences between ViT-LDM and AnEn-ECC. CRPSS curves in (a) were averaged from 100 bootstrapped replicates with error bars representing the 95% confidence intervals.

The BSSs of precipitation events computed from a set of fixed thresholds, ranging from 1 to 40 mm (6 h)−1, were examined to provide further insights into the general performance of the precipitation ensembles. The lower end of these thresholds, such as 1, 5, and 10 mm (6 h)−1, is related to mild and moderate precipitation events, whereas 20 mm (6 h)−1 and above characterizes heavy-to-extreme events.

At lower thresholds, AnEn-ECC had the largest BSS among the three techniques (Fig. 8). Its BSS at 1-mm (6 h)−1 events was ≈ 0.5, which also provided major contributions to the CRPSS increase in Fig. 7a. For heavy-to-extreme events, however, the performance of AnEn-ECC decreased quickly with increasing threshold values, indicating that it is not an ideal option for postprocessing extreme precipitation events. While the performance of ViT-LDM was suboptimal for precipitation events with lower thresholds, it was superior for heavy-to-extreme events, with increasing benefit as the threshold increased. The good performance of ViT-LDM on extreme precipitation events will be examined further with reliability diagrams. Its suboptimal performance on mild and moderate precipitation events will also be discussed with explainability studies.

Fig. 8.

Verifications of ViT-LDM (red bars), GEFS-Raw (blue bars), and AnEn-ECC (cyan bars) with BSSs (higher is better) in 2021. (a) BSSs of precipitation events derived from 1- to 40-mm (6 h)−1 thresholds and for 6–54-h forecasts. (b) As in (a), but for 54–102-h forecasts. (c) As in (a), but for 102–144-h forecasts. The “x” means the BSS is lower than 0.001.

c. Extreme precipitation verification with reliability diagrams

Reliability diagrams in Fig. 9 show the detailed calibration performance of all methods on forecasting 6-hourly extreme precipitation events, defined based on the fixed 40 mm (6 h)−1 threshold. As introduced in section 2a and Fig. 1, this threshold emphasizes forecast performance across the Great Plains and the southeastern United States, where extreme precipitation events are typically triggered by supercell thunderstorms, mesoscale convective systems, and other forms of small-scale convection. Producing both well-calibrated and sharp probabilistic precipitation predictions at such a high threshold is a challenge with which most postprocessing methods have struggled.

Fig. 9.

Verification of (top) forecasted 6-hourly extreme precipitation events with reliability diagrams, (middle) frequency of occurrence, and (bottom) BS (“Brier”; lower is better) decompositions [reliability (“REL”; lower is better), resolution (“RES”; higher is better), and climatological uncertainty (ō)] in 2021. All scores were computed based on events of 6-hourly precipitation rates > 40 mm (6 h)−1 threshold. In (a)–(c), metrics were averaged over 6–54-, 54–102-, and 102–144-h forecasts, respectively. Dashed no-skill reference lines and perfect reliability diagonal reference lines are included. Calibration curves were averaged from 100 bootstrap replicates with error bars and color shades representing the 95% confidence intervals.

The performance of AnEn-ECC on extreme precipitation events was found to be suboptimal in Fig. 8. The BS decompositions revealed that AnEn-ECC improved the reliability from GEFS-Raw, but its resolution was too low due to probabilities that rarely exceeded 30%. In addition, the AnEn-ECC results were also underconfident, likely because 40 mm (6 h)−1 is roughly the 99.6th percentile of the verified area (Figs. 1e,f). These events may not be represented well within the GEFS reforecast training set.

The GEFS-Raw forecasts exhibited better resolution than the AnEn-ECC forecasts, but the GEFS-Raw forecasts were also unreliable; their calibration curve stayed close to the “no-skill” reference line. The GEFS-Raw has improved resolution compared to AnEn-ECC because it overpredicted extreme precipitation events among all its members, which resulted in the probabilistic forecasts being distinguishable from the climatological mean. However, the GEFS-Raw forecasts were found to have poor reliability as their forecasted probabilities often did not co-occur with observed extreme precipitation events.

The ViT-LDM forecasts showed the best performance in this verification. Their 6–54-h calibration performance was impressive, with the number of high-probability forecasts comparable to that of the GEFS-Raw and a reliability curve that followed the perfect reliability line. For longer forecast lead times, particularly 102–144 h, the resolution of ViT-LDM decreased, mainly due to the reduced predictability of these extreme events. Nonetheless, ViT-LDM still outperformed the two baselines.

Overall, for extreme precipitation events closely related to deep and intense convection in the Great Plains and the southeastern United States, ViT-LDM exhibited excellent calibration performance for short forecast lead times and clearly outperformed the two baselines for all verified forecast lead times. This result is also aligned with Sha et al. (2024), which revealed the good performance of generative AI in predicting severe weather events in this area.

Figure 10 examines forecast reliability for extreme precipitation events identified based on the gridpoint-wise 99th percentile thresholds. Similar to the 40-mm-based verification in Fig. 9, the resolution of AnEn-ECC was too low, which reduced its calibration performance. Since the actual precipitation rates of 99th percentile thresholds were typically lower than 40 mm (6 h)−1, AnEn-ECC generated more large probabilities (Fig. 9). The GEFS-Raw was capable of issuing higher probabilities for extreme precipitation events as well; however, probabilities were often overforecast and its reliability curves stayed around the no-skill reference line.

Fig. 10.

As in Fig. 9, but for extreme events of 6-hourly precipitation rates > gridpoint-wise 99th percentile thresholds. Note that ō is not strictly equal to 0.01 because it was derived from the 2002–19 CCPA climatology, not from the 2021 verification period.

Among the three methods, the ViT-LDM had the best calibration. Its reliability was comparable to the AnEn-ECC baseline but preserved the resolution of the GEFS-Raw forecasts. The latter can be further confirmed by the frequency of occurrence plots, where the number of high-probability extreme precipitation forecasts issued by the ViT-LDM was comparable to that of the GEFS-Raw. Meanwhile, the AnEn-ECC rarely produced extreme precipitation probabilities > 0.5. Overall, for 6-hourly extreme precipitation events defined based on 99th percentile thresholds, the postprocessed ensemble trajectories produced by ViT-LDM were verified to be skillful compared to the two baseline forecasts.

Finally, we examine the reliability of 6-day accumulated precipitation greater than the gridpoint-wise 99th percentile (Fig. 11). Temporally aggregated precipitation forecasts are sensitive to the spatiotemporal covariability of the forecasted trajectories. Thus, this verification examines how well these methods can produce spatiotemporally consistent forecasts. In addition, it is also a good indicator of the usefulness of postprocessing methods in real-world scenarios where end users can be warned of a sequence of incoming extreme precipitation events.
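
A toy example may help show why accumulated-precipitation probabilities depend on the consistency of the trajectories. The sketch below sums each member over lead times before thresholding, and then compares a consistent ensemble with one whose members were shuffled at one lead time. All values and names are illustrative assumptions.

```python
import numpy as np

def accumulated_exceedance_probability(trajectories, threshold):
    """trajectories: (n_members, n_lead_times, ...) of 6-hourly amounts."""
    totals = trajectories.sum(axis=1)           # accumulate along each member's trajectory
    return np.mean(totals > threshold, axis=0)

# Two members, two lead times, one grid point.
consistent = np.array([[[20.0], [20.0]], [[0.0], [0.0]]])   # one wet and one dry trajectory
shuffled = np.array([[[20.0], [0.0]], [[0.0], [20.0]]])     # same values, members swapped at step 2
p_consistent = accumulated_exceedance_probability(consistent, threshold=30.0)
p_shuffled = accumulated_exceedance_probability(shuffled, threshold=30.0)
```

Both ensembles have identical 6-hourly distributions at every lead time, yet only the consistent one assigns a nonzero probability to the accumulated event.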

Fig. 11.

As in Fig. 9, but for extreme events of 6-day accumulated precipitation amount > gridpoint-wise 99th percentile thresholds.

Based on the shape and position of reliability curves in Fig. 11, all methods were as reliable as they were for 6-hourly forecast lead times, indicating that ViT-LDM and the two baselines produced spatiotemporally consistent forecast trajectories. The reliability and resolution of the 6-day accumulated forecasts were consistent with the performance of the short lead-time 6–54-h forecasts. This is because the timing error of extreme events was largely eliminated when the entire trajectory was aggregated into a single time frame (e.g., Jeworrek et al. 2021).

For the two baselines, their spatiotemporal consistency was expected because ECC reassembles AnEn members by using the GEFS-Raw as dependence templates (Schefzik et al. 2013), and the GEFS-Raw, as produced by a physics-based numerical model, is spatiotemporally consistent. For ViT-LDM, its spatiotemporal consistency was confirmed in this verification, and it outperformed the two baselines with the best reliability and resolution decompositions. This result shows the effectiveness of 3D ViT and LDM in bias-correcting and generating forecast trajectories that characterize the evolution of extreme precipitation events well, and these trajectories are practical for use as 6-day guidance.

d. VQ-VAE latent space visualization and explainability studies

In this section, the predictive behavior of the 3D ViT and LDM was examined within the VQ-VAE latent space. The purpose of this study is to identify the contribution of the two neural networks in precipitation postprocessing and ensure that their performance can be attributed to meaningful decisions rather than overfitting or artifacts.

The same 1 January 2021 extreme precipitation events as in Fig. 6, but with 0–6-, 48–54-, and 96–102-h forecasts, were selected, and the technical steps of their explainability studies are introduced in Fig. 12a: The encoder projects the raw GEFS forecasts, CCPA targets, and the outputs of the two neural networks into the VQ-VAE latent space. Two latent space dimensions out of four were selected, and their averaged codebook values were computed and visualized on 2D axes.

Fig. 12.

(a) The schematics of the VQ-VAE latent space visualization. (b) Visualization examples on 1 Jan 2021 with 00–06-, 48–54-, and 96–102-h forecasts, respectively. Small and large blue dots are the latent space representations of the original GEFS ensemble members and the ensemble mean, respectively. Yellow and red dots are the representations of 3D ViT outputs and ViT-LDM outputs. Star symbols are the representations of the CCPA verification target. Dashed lines were produced from kernel density estimates.

In Fig. 12b, the latent space representations of the GEFS APCP forecasts and the CCPA targets exhibited large spatial differences (cf. blue dots and star symbols in Fig. 12b). Such differences may not be identifiable in the real space with 224 × 464 grid points; however, when projected to a condensed and regularized VQ-VAE latent space, different positions were assigned for forecasts and analysis fields. That is, with the disentanglement property of a pretrained VQ-VAE, GEFS APCP forecasts and the CCPA targets are clearly separable within the latent space.

The separation of forecasts and analyses further revealed the decision-making process of the 3D ViT; it relocates each GEFS APCP member from a forecast-oriented representation to an analysis-oriented representation (cf. blue dots and yellow dots in Fig. 12b). When all the GEFS members were postprocessed in this way, they would stay around the position of the CCPA target; therefore, the overall CRPSS would be expected to increase. Section 3b mentioned the training procedure of the 3D ViT; it was trained using the 6–54-h GEFS ensemble mean but applied to individual members and all forecast lead times. This training strategy can be explained in Fig. 12b. For short forecast lead times, the latent space representation of the GEFS ensemble mean was well surrounded by all its ensemble members, which means the learned relationships between the ensemble mean and the CCPA target can be applied to individual members directly. For longer forecast lead times, the latent space position of the GEFS ensemble mean no longer stayed close to its ensemble members (cf. large and small blue dots in Fig. 12b for different forecast lead times), so it has lost its ability to represent the bias correction relationships between individual ensemble members and the CCPA target. In addition, the relative positions of the GEFS forecasts and the CCPA targets stayed roughly the same for all visualized forecast lead times, which means the learned bias correction relationships for short forecast lead times can potentially be generalized to longer lead times. Thus, training the 3D ViT on short forecast lead times and applying it to a longer range of hours is a valid option.

The role of the diffusion model and the limitation of ViT-based postprocessing can be identified from this explainability study. The 3D ViT is a deterministic neural network; it relocates GEFS ensemble members within the VQ-VAE latent space to achieve bias correction, but the relocated members stayed very close to each other, and the spatial coverage of their representations was smaller than that of the raw GEFS members. We suspect these closely clustered members may have amplified the overprediction of the GEFS on drizzle forecasts and caused the suboptimal performance on calibrating 1- and 5-mm (6 h)−1 events. The diffusion model showed the ability to enlarge such spatial coverage. Its generative ensemble preserved the latent space locations of the 3D ViT outputs while expanding their spatial coverage (cf. yellow and red dots in Fig. 12b). The spatial expansion is expected to improve the overall CRPSS performance because the CRPS computation rewards intermember differences when the mean absolute error is preserved (Grimit et al. 2006). In addition, the spatial expansion also generated a few outliers from the 3D ViT outputs. As discussed in Li et al. (2024), generated outliers are connected to possible scenarios for the evolution of extreme weather, thus benefiting the calibration of extreme precipitation events.
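
The statement that CRPS rewards intermember differences follows from the kernel form of the ensemble CRPS, which subtracts half the mean absolute difference between members from the mean absolute error. The sketch below, with hypothetical names and toy values, compares a collapsed ensemble and a dispersed ensemble that share the same mean absolute error.

```python
import numpy as np

def ensemble_crps(ensemble, obs):
    """Kernel-form CRPS: E|X - y| - 0.5 * E|X - X'|, with the member axis first."""
    error_term = np.mean(np.abs(ensemble - obs), axis=0)
    spread_term = np.mean(np.abs(ensemble[:, None] - ensemble[None, :]), axis=(0, 1))
    return error_term - 0.5 * spread_term

obs = np.array([0.0])
collapsed = np.array([[1.0], [1.0]])     # two identical members
dispersed = np.array([[1.0], [-1.0]])    # same absolute errors, but members differ
crps_collapsed = ensemble_crps(collapsed, obs)
crps_dispersed = ensemble_crps(dispersed, obs)
```

Both ensembles have a mean absolute error of 1, but the dispersed one obtains the lower (better) CRPS because of its nonzero spread term.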

The predictive behaviors of ViT-LDM were further analyzed by comparing the relative contribution of the ViT against the full ViT-LDM. Based on the verification set scores in Table 1, both neural network components contributed to the ensemble postprocessing, and their relative contributions were comparable for the estimation of extreme precipitation events. For the general performance as measured by CRPS, the contribution of the LDM is relatively larger than that of the 3D ViT, although both are necessary to produce unbiased, well-dispersed forecasts. This is aligned with the explainability studies, where the generative members were found to represent larger areas within the VQ-VAE latent space.

Table 1.

Ablation studies of 3D ViT only and the full ViT-LDM predictions, contrasted by the GEFS-Raw baseline in 2021. Domainwise CRPS (lower is better) and BS of 6-hourly precipitation rate > gridpoint-wise 99th percentile events (lower is better) were applied as metrics.

The verification and explainability studies confirmed the effectiveness of ViT-LDM as a postprocessing framework that supports the probabilistic estimation of extreme precipitation events. Here, the potential benefits of ViT-LDM are examined by comparing its performance across different extreme precipitation events and ensemble sizes. As shown in Fig. 13, generating more ensemble members would typically lead to more skillful probabilistic forecasts of extreme precipitation events. For short forecast lead times and more extreme thresholds [i.e., 40 mm (6 h)−1], these benefits are more pronounced, with the ensemble size increase and the BS decrease exhibiting a close-to-linear relationship (Figs. 13a,b). For longer forecast lead times and less extreme thresholds (i.e., 99th percentile), diminishing marginal effects were observed. When the ensemble size exceeds 100 members, the benefit of adding more members becomes limited (Figs. 13c–f).

Fig. 13.

The BS (lower is better) performance of ViT-LDM on 6-hourly extreme precipitation events and with varying ensemble sizes. The ensemble sizes were increased by generating more members from each GEFS-Raw member.

Two reasons may explain such phenomena: 1) For less extreme thresholds, 100 members may have provided a sufficiently large ensemble space for calibration; 2) for 40-mm (6 h)−1 events in longer forecast lead times, the forecasts of GEFS-Raw and ViT-LDM may lack sufficient predictability (cf. Figs. 13a,c), so generating more ensemble members of the same quality does not improve performance. In summary, the potential benefits of ViT-LDM are considered the largest for short-term extreme precipitation events with higher/stricter thresholds. This finding can be considered for the research and operation of generative AI–based methods in the future.

5. Discussion

This research utilized generative AI for the ensemble postprocessing of extreme precipitation events. Two research questions were examined. The first was related to the feasibility of generative AI and its performance in forecasting extreme precipitation. The implementation of generative AI in this study was successful, and its technical approach was similar to that of Li et al. (2024) and Zhong et al. (2023), which applied diffusion models using forecasted fields as conditional inputs. An important technical choice of ViT-LDM that differs from other research is the use of latent space projection, which reduced the overall data sizes and simplified the training of 3D ViT and LDM. In the early stages of this research, the authors experimented with 3D ViTs configured with larger patch sizes and without using a VQ-VAE. Strong checkerboard artifacts were identified with this choice. The VQ-VAE-based latent space projection and patch size 1 were then proposed, leading to the success of this generative AI application. Elevation and climatology inputs were incorporated into the ViT-LDM pipeline as background information. This option benefits the generation of precipitation fields with better quality, and its effectiveness has been discussed in other downscaling (e.g., Sha et al. 2020a,b; Wang et al. 2021) and bias-correction (e.g., Sha et al. 2022) studies. The performance of ViT-LDM in predicting extreme precipitation events was verified to be better than the AnEn-ECC baseline, a set of widely used postprocessing techniques, as well as the original GEFS ensemble. The improved performance of ViT-LDM is especially evident from the verification of 40-mm (6 h)−1 events, which many existing postprocessing methods cannot calibrate properly. We think generative AI has the potential to be applied to the ensemble generation of regional precipitation forecasts and improve the prediction of extreme precipitation events.

The second research question focused on the explainability of ViT-LDM. Here, example-based explainability studies were conducted. An important finding of this study is that the VQ-VAE latent space is capable of separating GEFS forecasts and the CCPA targets, which further explained the effectiveness of 3D ViT in relocating the forecast-oriented representations. The role of LDM was identified as expanding the spatial area of latent space representations, which improved both the general postprocessing performance and the calibration of extreme precipitation events. The explainability of AI-based forecast postprocessing methods is generally lacking. For many neural network–based postprocessing methods, it is unclear how they have improved the quality of numerical forecasts. This research provided an example of exploring the decision-making mechanism of these neural networks. In addition, explorations of performance with varying ensemble sizes in section 4d have also brought evidence and new insights on the potential benefits of implementing generative AI in extreme weather problems.

The ViT-LDM was found to be suboptimal for calibrating mild and moderate precipitation events, defined by 1- and 5-mm (6 h)−1 thresholds. Future work could be conducted to tackle this challenge. Based on the explainability results, we think the order of the two postprocessing steps can be changed. That is, generative AI can be applied to the raw forecast directly, and the resulting generative ensemble can be recalibrated by another postprocessing method. This choice avoids the use of a deterministic postprocessing neural network and enables the flexibility of implementing ensemble-based calibration methods. With the potential improvement of our method, more benchmarking efforts can be made to quantify the relative contribution of generative AI–based precipitation postprocessing. In addition, LDM-based postprocessing can be generalized to other regions, climate conditions, and other meteorological extremes [e.g., Chen et al. (2023) for extreme 2-m air temperature and 10-m wind].

6. Conclusions

A novel postprocessing method, ViT-LDM, was proposed by incorporating a vector quantized-variational autoencoder (VQ-VAE), a 3D vision transformer (ViT), and a latent diffusion model (LDM). The method takes 6-hourly precipitation forecasts from numerical ensembles as inputs and generates postprocessed trajectories that are skillful for the probabilistic estimation of extreme precipitation events. The 3D ViT aims to reduce conditional bias from the original numerical ensemble, while the LDM produces an expanded generative ensemble that better characterizes extreme events.

The method was trained using the Global Ensemble Forecast System, version 12 (GEFSv12), reforecasts as inputs and the Climatology-Calibrated Precipitation Analysis (CCPA) as targets from 2002 to 2019 and tested with the operational GEFSv12 over the conterminous United States (CONUS) from 1 January 2021 to 31 December 2021. Verification results showed that the method generated skillful precipitation forecast trajectories, as indicated by continuous ranked probabilistic skill scores (CRPSSs) and Brier skill scores (BSSs). Its calibration performance for extreme precipitation events was superior to that of the operational GEFS and the combination of analog ensemble (AnEn) and ensemble copula coupling (ECC). Reliability diagrams demonstrated that the probabilistic extreme precipitation forecasts of ViT-LDM were as reliable as those of the AnEn-based calibrations but with better resolution scores. For the verification of 6-day accumulated extreme precipitation events, ViT-LDM maintained the same good reliability and resolution seen across the individual 6-hourly forecast lead times, indicating that its generated trajectories were spatiotemporally consistent and could be aggregated to provide multiday forecast guidance.

Explainability studies were conducted to examine the decision-making process of ViT-LDM and provided evidence of the potential benefits of implementing generative methods in severe weather problems. These studies revealed that the VQ-VAE latent space clearly separated the GEFS forecasts from the CCPA analysis, while the 3D ViT was capable of relocating the latent space representations of the raw GEFS members to achieve bias correction. It was also confirmed that the latent space representations of the LDM-generated members were clustered around the location of the CCPA verification target with an enlarged spatial coverage. This enlarged cluster, together with the expanded ensemble size, improved the characterization of extreme precipitation events. A potential weakness of ViT-LDM was its suboptimal calibration performance for mild and moderate precipitation events, attributed to the smoothing effect of the VQ-VAE decoder and the deterministic nature of the 3D ViT bias-correction network. Possible solutions were discussed as future research directions.

In summary, ViT-LDM leverages a generative artificial intelligence (AI) approach for extreme precipitation forecasts. It produces skillful and spatiotemporally consistent precipitation forecast trajectories, bridging the gap between limited numerical ensemble sizes and the need for large ensemble sets to assess extreme precipitation events. More broadly, it provides a framework for implementing generative AI methods to address weather forecasting challenges.

Acknowledgments.

The authors thank Dr. Yan Luo, I.M. Systems Group, Inc. (IMSG), NOAA, for the archived CCPA dataset. This material is based upon work supported by the National Center for Atmospheric Research (NCAR), which is a major facility sponsored by the National Science Foundation (NSF) under Cooperative Agreement 1852977. This research was supported by NOAA OAR Grant NA19OAR4590128, the NSF NCAR Short-Term Explicit Prediction Program, and NSF Grant RISE-2019758. Supercomputing support was provided by NSF NCAR Cheyenne and Casper [Computational and Information Systems Laboratory (CISL) 2020]. The authors also thank Dr. John S. Schreck at NSF NCAR and anonymous reviewers for their feedback.

Data availability statement.

The data preprocessing, neural network training, and data visualization code of this research can be found at https://github.com/yingkaisha/AIES_24_0063. A frozen release of the code is available at https://doi.org/10.5281/zenodo.14541840. The GEFSv12 forecasts and reforecasts are available at https://aws.amazon.com/marketplace/pp/prodview-qumzmkzc2acri and https://registry.opendata.aws/noaa-gefs-reforecast/, respectively. The long-term CCPA data of this research are archived in the NOAA supercomputing system; readers may contact Dr. Jun Du at the Environmental Modeling Center (EMC), NOAA, for details. The 1-week near-real-time CCPA data are available at https://ftp.ncep.noaa.gov/data/nccf/com/ccpa/prod/.

REFERENCES

  • Arnab, A., M. Dehghani, G. Heigold, C. Sun, M. Lučić, and C. Schmid, 2021: ViViT: A video vision transformer. 2021 IEEE/CVF Int. Conf. on Computer Vision (ICCV), Montreal, QC, Canada, Institute of Electrical and Electronics Engineers, 6836–6846, https://doi.org/10.1109/ICCV48922.2021.00676.
  • Asperti, A., F. Merizzi, A. Paparella, G. Pedrazzi, M. Angelinelli, and S. Colamonaco, 2023: Precipitation nowcasting with generative diffusion models. arXiv, 2308.06733v2, https://doi.org/10.48550/arXiv.2308.06733.
  • Ba, J. L., J. R. Kiros, and G. E. Hinton, 2016: Layer normalization. arXiv, 1607.06450v1, https://doi.org/10.48550/arXiv.1607.06450.
  • Baran, S., and D. Nemoda, 2016: Censored and shifted gamma distribution based EMOS model for probabilistic quantitative precipitation forecasting. Environmetrics, 27, 280–292, https://doi.org/10.1002/env.2391.
  • Bassetti, S., B. Hutchinson, C. Tebaldi, and B. Kravitz, 2024: DiffESM: Conditional emulation of temperature and precipitation in Earth system models with 3D diffusion models. J. Adv. Model. Earth Syst., 16, e2023MS004194, https://doi.org/10.1029/2023MS004194.
  • Bauer, P., A. Thorpe, and G. Brunet, 2015: The quiet revolution of numerical weather prediction. Nature, 525, 47–55, https://doi.org/10.1038/nature14956.
  • Bevacqua, E., L. Suarez-Gutierrez, A. Jézéquel, F. Lehner, M. Vrac, P. Yiou, and J. Zscheischler, 2023: Advancing research on compound weather and climate events via large ensemble model simulations. Nat. Commun., 14, 2145, https://doi.org/10.1038/s41467-023-37847-5.
  • Blake, B. T., D. B. Parsons, K. R. Haghi, and S. G. Castleberry, 2017: The structure, evolution, and dynamics of a nocturnal convective system simulated using the WRF-ARW model. Mon. Wea. Rev., 145, 3179–3201, https://doi.org/10.1175/MWR-D-16-0360.1.
  • Bodri, L., and V. Čermák, 2000: Prediction of extreme precipitation using a neural network: Application to summer flood occurrence in Moravia. Adv. Eng. Software, 31, 311–321, https://doi.org/10.1016/S0965-9978(99)00063-0.
  • Bond-Taylor, S., A. Leach, Y. Long, and C. G. Willcocks, 2022: Deep generative modelling: A comparative review of VAEs, GANs, normalizing flows, energy-based and autoregressive models. IEEE Trans. Pattern Anal. Mach. Intell., 44, 7327–7347, https://doi.org/10.1109/TPAMI.2021.3116668.
  • Bremnes, J. B., 2004: Probabilistic forecasts of precipitation in terms of quantiles using NWP model output. Mon. Wea. Rev., 132, 338–347, https://doi.org/10.1175/1520-0493(2004)132%3C0338:PFOPIT%3E2.0.CO;2.
  • Buizza, R., J. Du, Z. Toth, and D. Hou, 2019: Major operational Ensemble Prediction Systems (EPS) and the future of EPS. Handbook of Hydrometeorological Ensemble Forecasting, Q. Duan et al., Eds., Springer, 151–193.
  • Caracena, F., R. A. Maddox, L. R. Hoxit, and C. F. Chappell, 1979: Mesoanalysis of the Big Thompson storm. Mon. Wea. Rev., 107, 1–17, https://doi.org/10.1175/1520-0493(1979)107<0001:MOTBTS>2.0.CO;2.
  • Chen, L., X. Zhong, F. Zhang, Y. Cheng, Y. Xu, Y. Qi, and H. Li, 2023: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast. npj Climate Atmos. Sci., 6, 190, https://doi.org/10.1038/s41612-023-00512-1.
  • Çiçek, Ö., A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, 2016: 3D U-Net: Learning dense volumetric segmentation from sparse annotation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016, S. Ourselin et al., Eds., Lecture Notes in Computer Science, Vol. 9901, Springer, 424–432.
  • Cohen, M., G. Quispe, S. L. Corff, C. Ollion, and E. Moulines, 2022: Diffusion bridges vector quantized variational autoencoders. arXiv, 2202.04895v2, https://doi.org/10.48550/arXiv.2202.04895.
  • Creswell, A., T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, 2018: Generative adversarial networks: An overview. IEEE Signal Process. Mag., 35, 53–65, https://doi.org/10.1109/MSP.2017.2765202.
  • Dhariwal, P., and A. Nichol, 2021: Diffusion models beat GANs on image synthesis. NIPS’21: Proceedings of the 35th International Conference on Neural Information Processing Systems, Curran Associates Inc., 8780–8794, https://dl.acm.org/doi/10.5555/3540261.3540933.
  • Dosovitskiy, A., and Coauthors, 2020: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv, 2010.11929v2, https://doi.org/10.48550/arXiv.2010.11929.
  • Gao, Z., and Coauthors, 2023: PreDiff: Precipitation nowcasting with latent diffusion models. Advances in Neural Information Processing Systems 36, A. Oh et al., Eds., Neural Information Processing Systems Foundation, 78621–78656, https://proceedings.neurips.cc/paper_files/paper/2023/file/f82ba6a6b981fbbecf5f2ee5de7db39c-Paper-Conference.pdf.
  • Ghazvinian, M., and Coauthors, 2024: Deep learning of a 200-member ensemble with a limited historical training to improve the prediction of extreme precipitation events. Mon. Wea. Rev., 152, 1587–1605, https://doi.org/10.1175/MWR-D-23-0277.1.
  • Grimit, E. P., T. Gneiting, V. J. Berrocal, and N. A. Johnson, 2006: The continuous ranked probability score for circular variables and its application to mesoscale forecast ensemble verification. Quart. J. Roy. Meteor. Soc., 132, 2925–2942, https://doi.org/10.1256/qj.05.235.
  • Gu, S., D. Chen, J. Bao, F. Wen, B. Zhang, D. Chen, L. Yuan, and B. Guo, 2022: Vector quantized diffusion model for text-to-image synthesis. Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, New Orleans, LA, Institute of Electrical and Electronics Engineers, 10 696–10 706, https://openaccess.thecvf.com/content/CVPR2022/html/Gu_Vector_Quantized_Diffusion_Model_for_Text-to-Image_Synthesis_CVPR_2022_paper.html.
  • Guan, H., and Coauthors, 2022: GEFSv12 reforecast dataset for supporting subseasonal and hydrometeorological applications. Mon. Wea. Rev., 150, 647–665, https://doi.org/10.1175/MWR-D-21-0245.1.
  • Hamill, T. M., and J. Juras, 2006: Measuring forecast skill: Is it real skill or is it the varying climatology? Quart. J. Roy. Meteor. Soc., 132, 2905–2923, https://doi.org/10.1256/qj.06.25.
  • Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, https://doi.org/10.1175/MWR3237.1.
  • Hamill, T. M., and M. Scheuerer, 2018: Probabilistic precipitation forecast postprocessing using quantile mapping and rank-weighted best-member dressing. Mon. Wea. Rev., 146, 4079–4098, https://doi.org/10.1175/MWR-D-18-0147.1.
  • Hamill, T. M., M. Scheuerer, and G. T. Bates, 2015: Analog probabilistic precipitation forecasts using GEFS reforecasts and climatology-calibrated precipitation analyses. Mon. Wea. Rev., 143, 3300–3309, https://doi.org/10.1175/MWR-D-15-0004.1.
  • Hamill, T. M., and Coauthors, 2022: The reanalysis for the Global Ensemble Forecast System, version 12. Mon. Wea. Rev., 150, 59–79, https://doi.org/10.1175/MWR-D-21-0023.1.
  • Hamill, T. M., D. R. Stovern, and L. L. Smith, 2023: Improving National Blend of Models probabilistic precipitation forecasts using long time series of reforecasts and precipitation reanalyses. Part I: Methods. Mon. Wea. Rev., 151, 1521–1534, https://doi.org/10.1175/MWR-D-22-0308.1.
  • Hendrycks, D., and K. Gimpel, 2016: Gaussian Error Linear Units (GELUs). arXiv, 1606.08415v5, https://doi.org/10.48550/arXiv.1606.08415.
  • Herman, G. R., and R. S. Schumacher, 2016: Extreme precipitation in models: An evaluation. Wea. Forecasting, 31, 1853–1879, https://doi.org/10.1175/WAF-D-16-0093.1.
  • Herman, G. R., and R. S. Schumacher, 2018: Money doesn’t grow on trees, but forecasts do: Forecasting extreme precipitation with random forests. Mon. Wea. Rev., 146, 1571–1600, https://doi.org/10.1175/MWR-D-17-0250.1.
  • Hill, C. M., P. J. Fitzpatrick, J. H. Corbin, Y. H. Lau, and S. K. Bhate, 2010: Summertime precipitation regimes associated with the sea breeze and land breeze in southern Mississippi and eastern Louisiana. Wea. Forecasting, 25, 1755–1779, https://doi.org/10.1175/2010WAF2222340.1.
  • Ho, J., A. Jain, and P. Abbeel, 2020: Denoising diffusion probabilistic models. NIPS’20: Proceedings of the 34th International Conference on Neural Information Processing Systems, Curran Associates Inc., 6840–6851, https://dl.acm.org/doi/abs/10.5555/3495724.3496298.
  • Hou, D., and Coauthors, 2014: Climatology-Calibrated Precipitation Analysis at fine scales: Statistical adjustment of Stage IV toward CPC gauge-based analysis. J. Hydrometeor., 15, 2542–2557, https://doi.org/10.1175/JHM-D-11-0140.1.
  • Hsu, W.-r., and A. H. Murphy, 1986: The attributes diagram: A geometrical framework for assessing the quality of probability forecasts. Int. J. Forecasting, 2, 285–293, https://doi.org/10.1016/0169-2070(86)90048-8.
  • Hu, M., Y. Wang, T.-J. Cham, J. Yang, and P. N. Suganthan, 2022: Global context with discrete diffusion in vector quantised modelling for image generation. Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, New Orleans, LA, Institute of Electrical and Electronics Engineers, 11 502–11 511, https://doi.org/10.1109/CVPR52688.2022.01121.
  • Huang, H., H. Cui, and Q. Ge, 2021: Assessment of potential risks induced by increasing extreme precipitation under climate change. Nat. Hazards, 108, 2059–2079, https://doi.org/10.1007/s11069-021-04768-9.
  • Ioffe, S., and C. Szegedy, 2015: Batch normalization: Accelerating deep network training by reducing internal covariate shift. ICML’15: Proc. 32nd Int. Conf. on Machine Learning, Vol. 37, Lille, France, JMLR.org, 448–456, https://dl.acm.org/doi/10.5555/3045118.3045167.
  • Jeworrek, J., G. West, and R. Stull, 2021: WRF precipitation performance and predictability for systematically varied parameterizations over complex terrain. Wea. Forecasting, 36, 893–913, https://doi.org/10.1175/WAF-D-20-0195.1.
  • Jiang, X., N.-C. Lau, and S. A. Klein, 2006: Role of eastward propagating convection systems in the diurnal cycle and seasonal mean of summertime rainfall over the U.S. Great Plains. Geophys. Res. Lett., 33, L19809, https://doi.org/10.1029/2006GL027022.
  • Johnson, A., and X. Wang, 2017: Design and implementation of a GSI-based convection-allowing ensemble data assimilation and forecast system for the PECAN field experiment. Part I: Optimal configurations for nocturnal convection prediction. Wea. Forecasting, 32, 289–315, https://doi.org/10.1175/WAF-D-16-0102.1.
  • Lai, C., X. Chen, Z. Wang, H. Yu, and X. Bai, 2020: Flood risk assessment and regionalization from past and future perspectives at basin scale. Risk Anal., 40, 1399–1417, https://doi.org/10.1111/risa.13493.
  • Leinonen, J., U. Hamann, D. Nerini, U. Germann, and G. Franch, 2023: Latent diffusion models for generative precipitation nowcasting with accurate uncertainty quantification. arXiv, 2304.12891v1, https://doi.org/10.48550/arXiv.2304.12891.
  • Leutbecher, M., 2019: Ensemble size: How suboptimal is less than infinity? Quart. J. Roy. Meteor. Soc., 145, 107–128, https://doi.org/10.1002/qj.3387.
  • Li, L., R. Carver, I. Lopez-Gomez, F. Sha, and J. Anderson, 2024: Generative emulation of weather forecast ensembles with diffusion models. Sci. Adv., 10, eadk4489, https://doi.org/10.1126/sciadv.adk4489.
  • Li, W., B. Pan, J. Xia, and Q. Duan, 2022: Convolutional neural network-based statistical post-processing of ensemble precipitation forecasts. J. Hydrol., 605, 127301, https://doi.org/10.1016/j.jhydrol.2021.127301.
  • Ling, F., and Coauthors, 2024: Diffusion model-based probabilistic downscaling for 180-year East Asian climate reconstruction. npj Climate Atmos. Sci., 7, 131, https://doi.org/10.1038/s41612-024-00679-1.
  • Liu, Z., Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, 2021: Swin transformer: Hierarchical vision transformer using shifted windows. 2021 Proc. IEEE/CVF Int. Conf. on Computer Vision (ICCV), Montreal, QC, Canada, Institute of Electrical and Electronics Engineers, 10 012–10 022, https://doi.org/10.1109/ICCV48922.2021.00986.
  • Mardani, M., and Coauthors, 2023: Generative residual diffusion modeling for km-scale atmospheric downscaling. arXiv, 2309.15214v1, https://doi.org/10.48550/arXiv.2309.15214.
  • Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF Ensemble Prediction System: Methodology and validation. Quart. J. Roy. Meteor. Soc., 122, 73–119, https://doi.org/10.1002/qj.49712252905.
  • Monache, L. D., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, https://doi.org/10.1175/MWR-D-12-00281.1.
  • Murphy, A. H., 1973: A new vector partition of the probability score. J. Appl. Meteor., 12, 595–600, https://doi.org/10.1175/1520-0450(1973)012<0595:ANVPOT>2.0.CO;2.
  • Nair, U. S., M. R. Hjelmfelt, and R. A. Pielke, 1997: Numerical simulation of the 9–10 June 1972 Black Hills Storm using CSU RAMS. Mon. Wea. Rev., 125, 1753–1766, https://doi.org/10.1175/1520-0493(1997)125<1753:NSOTJB>2.0.CO;2.
  • Pörtner, H. O., and Coauthors, 2022: Climate Change 2022: Impacts, Adaptation and Vulnerability. Cambridge University Press, 3056 pp.
  • Price, I., and Coauthors, 2023: GenCast: Diffusion-based ensemble forecasting for medium-range weather. arXiv, 2312.15796v2, https://doi.org/10.48550/arXiv.2312.15796.
  • Ravuri, S., and Coauthors, 2021: Skilful precipitation nowcasting using deep generative models of radar. Nature, 597, 672–677, https://doi.org/10.1038/s41586-021-03854-z.
  • Ronneberger, O., P. Fischer, and T. Brox, 2015: U-Net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, N. Navab et al., Eds., Lecture Notes in Computer Science, Vol. 9351, Springer, 234–241.
  • Schefzik, R., T. L. Thorarinsdottir, and T. Gneiting, 2013: Uncertainty quantification in complex simulation models using ensemble copula coupling. arXiv, 1302.7149v2, https://doi.org/10.48550/arXiv.1302.7149.
  • Scheuerer, M., and T. M. Hamill, 2015: Statistical postprocessing of ensemble precipitation forecasts by fitting censored, shifted gamma distributions. Mon. Wea. Rev., 143, 4578–4596, https://doi.org/10.1175/MWR-D-15-0061.1.
  • Sha, Y., D. J. Gagne II, G. West, and R. Stull, 2020a: Deep-learning-based gridded downscaling of surface meteorological variables in complex terrain. Part I: Daily maximum and minimum 2-m temperature. J. Appl. Meteor. Climatol., 59, 2057–2073, https://doi.org/10.1175/JAMC-D-20-0057.1.
  • Sha, Y., D. J. Gagne II, G. West, and R. Stull, 2020b: Deep-learning-based gridded downscaling of surface meteorological variables in complex terrain. Part II: Daily precipitation. J. Appl. Meteor. Climatol., 59, 2075–2092, https://doi.org/10.1175/JAMC-D-20-0058.1.
  • Sha, Y., D. J. Gagne II, G. West, and R. Stull, 2022: A hybrid analog-ensemble–convolutional-neural-network method for postprocessing precipitation forecasts. Mon. Wea. Rev., 150, 1495–1515, https://doi.org/10.1175/MWR-D-21-0154.1.
  • Sha, Y., R. A. Sobash, and D. J. Gagne, 2024: Generative ensemble deep learning severe weather prediction from a deterministic convection-allowing model. Artif. Intell. Earth Syst., 3, e230094, https://doi.org/10.1175/AIES-D-23-0094.1.
  • Shepherd, J. M., A. Grundstein, and T. L. Mote, 2007: Quantifying the contribution of tropical cyclones to extreme rainfall along the coastal southeastern United States. Geophys. Res. Lett., 34, L23810, https://doi.org/10.1029/2007GL031694.
  • Sloughter, J. M. L., A. E. Raftery, T. Gneiting, and C. Fraley, 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon. Wea. Rev., 135, 3209–3220, https://doi.org/10.1175/MWR3441.1.
  • Smith, J. A., M. L. Baeck, Y. Zhang, and C. A. Doswell, 2001: Extreme rainfall and flooding from supercell thunderstorms. J. Hydrometeor., 2, 469–489, https://doi.org/10.1175/1525-7541(2001)002<0469:ERAFFS>2.0.CO;2.
  • Srivastava, N., G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, 2014: Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 15, 1929–1958.
  • Stensrud, D. J., 2009: Parameterization Schemes: Keys To Understanding Numerical Weather Prediction Models. Cambridge University Press, 480 pp.
  • Stovern, D. R., T. M. Hamill, and L. L. Smith, 2023: Improving National Blend of Models probabilistic precipitation forecasts using long time series of reforecasts and precipitation reanalyses. Part II: Results. Mon. Wea. Rev., 151, 1535–1550, https://doi.org/10.1175/MWR-D-22-0310.1.
  • Strauch, M., C. Bernhofer, S. Koide, M. Volk, C. Lorz, and F. Makeschin, 2012: Using precipitation data ensemble for uncertainty analysis in SWAT streamflow simulation. J. Hydrol., 414–415, 413–424, https://doi.org/10.1016/j.jhydrol.2011.11.014.
  • Sukovich, E. M., F. M. Ralph, F. E. Barthold, D. W. Reynolds, and D. R. Novak, 2014: Extreme quantitative precipitation forecast performance at the Weather Prediction Center from 2001 to 2011. Wea. Forecasting, 29, 894–911, https://doi.org/10.1175/WAF-D-13-00061.1.
  • Sun, C., and X.-Z. Liang, 2020: Improving US extreme precipitation simulation: Sensitivity to physics parameterizations. Climate Dyn., 54, 4891–4918, https://doi.org/10.1007/s00382-020-05267-6.
  • Tian, B., I. M. Held, N.-C. Lau, and B. J. Soden, 2005: Diurnal cycle of summertime deep convection over North America: A satellite perspective. J. Geophys. Res., 110, D08108, https://doi.org/10.1029/2004JD005275.
  • van den Oord, A., O. Vinyals, and K. Kavukcuoglu, 2017: Neural discrete representation learning. 31st Conference on Neural Information Processing Systems (NIPS 2017), Curran Associates Inc., https://proceedings.neurips.cc/paper_files/paper/2017/file/7a98af17e63a0ac09ce2e96d03992fbc-Paper.pdf.
  • Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, 2017: Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS 2017), Curran Associates Inc., 5998–6008, https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
  • Vaswani, A., P. Ramachandran, A. Srinivas, N. Parmar, B. Hechtman, and J. Shlens, 2021: Scaling local self-attention for parameter efficient visual backbones. 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, Institute of Electrical and Electronics Engineers, 12 894–12 904, https://doi.org/10.1109/CVPR46437.2021.01270.
  • Wang, F., D. Tian, L. Lowe, L. Kalin, and J. Lehrter, 2021: Deep learning for daily precipitation and temperature downscaling. Water Resour. Res., 57, e2020WR029308, https://doi.org/10.1029/2020WR029308.
  • Wang, T., and G. Tang, 2020: Spatial variability and linkage between extreme convections and extreme precipitation revealed by 22-year space-borne precipitation radar data. Geophys. Res. Lett., 47, e2020GL088437, https://doi.org/10.1029/2020GL088437.
  • Wehner, M. F., R. L. Smith, G. Bala, and P. Duffy, 2010: The effect of horizontal resolution on simulation of very extreme US precipitation events in a global atmosphere model. Climate Dyn., 34, 241–247, https://doi.org/10.1007/s00382-009-0656-y.
  • Wilcox, E. M., and L. J. Donner, 2007: The frequency of extreme rain events in satellite rain-rate estimates and an atmospheric general circulation model. J. Climate, 20, 53–69, https://doi.org/10.1175/JCLI3987.1.
  • Yang, L., and Coauthors, 2023: Diffusion models: A comprehensive survey of methods and applications. ACM Comput. Surv., 56, 105, https://doi.org/10.1145/3626235.
  • Yang, R., and S. Mandt, 2024: Lossy image compression with conditional diffusion models. NIPS ’23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Curran Associates Inc., 64 971–64 995, https://dl.acm.org/doi/10.5555/3666122.3668957.
  • Zhang, Y., M. Long, K. Chen, L. Xing, R. Jin, M. I. Jordan, and J. Wang, 2023: Skilful nowcasting of extreme precipitation with NowcastNet. Nature, 619, 526–532, https://doi.org/10.1038/s41586-023-06184-4.
  • Zhong, X., L. Chen, J. Liu, C. Lin, Y. Qi, and H. Li, 2023: FuXi-extreme: Improving extreme rainfall and wind forecasts with diffusion model. arXiv, 2310.19822v1, https://doi.org/10.48550/arXiv.2310.19822.
  • Zhou, X., Y. Zhu, D. Hou, Y. Luo, J. Peng, and R. Wobus, 2017: Performance of the new NCEP Global Ensemble Forecast System in a parallel experiment. Wea. Forecasting, 32, 1989–2004, https://doi.org/10.1175/WAF-D-17-0023.1.
  • Zhou, X., and Coauthors, 2022: The development of the NCEP Global Ensemble Forecast System version 12. Wea. Forecasting, 37, 1069–1084, https://doi.org/10.1175/WAF-D-21-0112.1.

Supplementary Materials

Save
  • Arnab, A., M. Dehghani, G. Heigold, C. Sun, M. Lučić, and C. Schmid, 2021: ViVit: A video vision transformer. 2021 IEEE/CVF Int. Conf. on Computer Vision (ICCV), Montreal, QC, Canada, Institute of Electrical and Electronics Engineers, 68366846, https://doi.org/10.1109/ICCV48922.2021.00676.

    • Search Google Scholar
    • Export Citation
  • Asperti, A., F. Merizzi, A. Paparella, G. Pedrazzi, M. Angelinelli, and S. Colamonaco, 2023: Precipitation nowcasting with generative diffusion models. arXiv, 2308.06733v2, https://doi.org/10.48550/arXiv.2308.06733.

    • Search Google Scholar
    • Export Citation
  • Ba, J. L., J. R. Kiros, and G. E. Hinton, 2016: Layer normalization. arXiv, 1607.06450v1, https://doi.org/10.48550/arXiv.1607.06450.

  • Baran, S., and D. Nemoda, 2016: Censored and shifted gamma distribution based EMOS model for probabilistic quantitative precipitation forecasting. Environmetrics, 27, 280292, https://doi.org/10.1002/env.2391.

    • Search Google Scholar
    • Export Citation
  • Bassetti, S., B. Hutchinson, C. Tebaldi, and B. Kravitz, 2024: Diffesm: Conditional emulation of temperature and precipitation in Earth system models with 3D diffusion models. J. Adv. Model. Earth Syst., 16, e2023MS004194, https://doi.org/10.1029/2023MS004194.

    • Search Google Scholar
    • Export Citation
  • Bauer, P., A. Thorpe, and G. Brunet, 2015: The quiet revolution of numerical weather prediction. Nature, 525, 4755, https://doi.org/10.1038/nature14956.

    • Search Google Scholar
    • Export Citation
  • Bevacqua, E., L. Suarez-Gutierrez, A. Jézéquel, F. Lehner, M. Vrac, P. Yiou, and J. Zscheischler, 2023: Advancing research on compound weather and climate events via large ensemble model simulations. Nat. Commun., 14, 2145, https://doi.org/10.1038/s41467-023-37847-5.

    • Search Google Scholar
    • Export Citation
  • Blake, B. T., D. B. Parsons, K. R. Haghi, and S. G. Castleberry, 2017: The structure, evolution, and dynamics of a nocturnal convective system simulated using the WRF-ARW model. Mon. Wea. Rev., 145, 31793201, https://doi.org/10.1175/MWR-D-16-0360.1.

    • Search Google Scholar
    • Export Citation
  • Bodri, L., and V. Čermák, 2000: Prediction of extreme precipitation using a neural network: Application to summer flood occurrence in Moravia. Adv. Eng. Software, 31, 311321, https://doi.org/10.1016/S0965-9978(99)00063-0.

    • Search Google Scholar
    • Export Citation
  • Bond-Taylor, S., A. Leach, Y. Long, and C. G. Willcocks, 2022: Deep generative modelling: A comparative review of VAEs, GANs, normalizing flows, energy-based and autoregressive models. IEEE Trans. Pattern Anal. Mach. Intell., 44, 73277347, https://doi.org/10.1109/TPAMI.2021.3116668.

    • Search Google Scholar
    • Export Citation
  • Bremnes, J. B., 2004: Probabilistic forecasts of precipitation in terms of quantiles using NWP model output. Mon. Wea. Rev., 132, 338347, https://doi.org/10.1175/1520-0493(2004)132%3C0338:PFOPIT%3E2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Buizza, R., J. Du, Z. Toth, and D. Hou, 2019: Major operational Ensemble Prediction Systems (EPS) and the future of EPS. Handbook of Hydrometeorological Ensemble Forecasting, Q. Duan et al., Eds., Springer, 151193.

    • Search Google Scholar
    • Export Citation
  • Caracena, F., R. A. Maddox, L. R. Hoxit, and C. F. Chappell, 1979: Mesoanalysis of The Big Thompson Storm. Mon. Wea. Rev., 107 (1), 117, https://doi.org/10.1175/1520-0493(1979)107<0001:MOTBTS>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Chen, L., X. Zhong, F. Zhang, Y. Cheng, Y. Xu, Y. Qi, and H. Li, 2023: FuXi: A cascade machine learning forecasting system for 15-day global weather forecast. npj Climate Atmos. Sci., 6, 190, https://doi.org/10.1038/s41612-023-00512-1.

  • Çiçek, Ö., A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, 2016: 3D U-Net: Learning dense volumetric segmentation from sparse annotation. Medical Image Computing and Computer-Assisted Intervention—MICCAI 2016, S. Ourselin et al., Eds., Lecture Notes in Computer Science, Vol. 9901, Springer, 424–432.

  • Cohen, M., G. Quispe, S. L. Corff, C. Ollion, and E. Moulines, 2022: Diffusion bridges vector quantized variational autoencoders. arXiv, 2202.04895v2, https://doi.org/10.48550/arXiv.2202.04895.

  • Creswell, A., T. White, V. Dumoulin, K. Arulkumaran, B. Sengupta, and A. A. Bharath, 2018: Generative adversarial networks: An overview. IEEE Signal Process. Mag., 35, 53–65, https://doi.org/10.1109/MSP.2017.2765202.

  • Dhariwal, P., and A. Nichol, 2021: Diffusion models beat GANs on image synthesis. NIPS’21: Proceedings of the 35th International Conference on Neural Information Processing Systems, Curran Associates Inc., 8780–8794, https://dl.acm.org/doi/10.5555/3540261.3540933.

  • Dosovitskiy, A., and Coauthors, 2020: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv, 2010.11929v2, https://doi.org/10.48550/arXiv.2010.11929.

  • Gao, Z., and Coauthors, 2023: Prediff: Precipitation nowcasting with latent diffusion models. Advances in Neural Information Processing Systems 36, A. Oh et al., Eds., Neural Information Processing Systems Foundation, 78621–78656, https://proceedings.neurips.cc/paper_files/paper/2023/file/f82ba6a6b981fbbecf5f2ee5de7db39c-Paper-Conference.pdf.

  • Ghazvinian, M., and Coauthors, 2024: Deep learning of a 200-member ensemble with a limited historical training to improve the prediction of extreme precipitation events. Mon. Wea. Rev., 152, 1587–1605, https://doi.org/10.1175/MWR-D-23-0277.1.

  • Grimit, E. P., T. Gneiting, V. J. Berrocal, and N. A. Johnson, 2006: The continuous ranked probability score for circular variables and its application to mesoscale forecast ensemble verification. Quart. J. Roy. Meteor. Soc., 132, 2925–2942, https://doi.org/10.1256/qj.05.235.

  • Gu, S., D. Chen, J. Bao, F. Wen, B. Zhang, D. Chen, L. Yuan, and B. Guo, 2022: Vector quantized diffusion model for text-to-image synthesis. Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, New Orleans, LA, Institute of Electrical and Electronics Engineers, 10 696–10 706, https://openaccess.thecvf.com/content/CVPR2022/html/Gu_Vector_Quantized_Diffusion_Model_for_Text-to-Image_Synthesis_CVPR_2022_paper.html.

  • Guan, H., and Coauthors, 2022: GEFSv12 reforecast dataset for supporting subseasonal and hydrometeorological applications. Mon. Wea. Rev., 150, 647–665, https://doi.org/10.1175/MWR-D-21-0245.1.

  • Hamill, T. M., and J. Juras, 2006: Measuring forecast skill: Is it real skill or is it the varying climatology? Quart. J. Roy. Meteor. Soc., 132, 2905–2923, https://doi.org/10.1256/qj.06.25.

  • Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, https://doi.org/10.1175/MWR3237.1.

  • Hamill, T. M., and M. Scheuerer, 2018: Probabilistic precipitation forecast postprocessing using quantile mapping and rank-weighted best-member dressing. Mon. Wea. Rev., 146, 4079–4098, https://doi.org/10.1175/MWR-D-18-0147.1.

  • Hamill, T. M., M. Scheuerer, and G. T. Bates, 2015: Analog probabilistic precipitation forecasts using GEFS reforecasts and climatology-calibrated precipitation analyses. Mon. Wea. Rev., 143, 3300–3309, https://doi.org/10.1175/MWR-D-15-0004.1.

  • Hamill, T. M., and Coauthors, 2022: The reanalysis for the Global Ensemble Forecast System, version 12. Mon. Wea. Rev., 150, 59–79, https://doi.org/10.1175/MWR-D-21-0023.1.

  • Hamill, T. M., D. R. Stovern, and L. L. Smith, 2023: Improving National Blend of Models probabilistic precipitation forecasts using long time series of reforecasts and precipitation reanalyses. Part I: Methods. Mon. Wea. Rev., 151, 1521–1534, https://doi.org/10.1175/MWR-D-22-0308.1.

  • Hendrycks, D., and K. Gimpel, 2016: Gaussian Error Linear Units (GELUs). arXiv, 1606.08415v5, https://doi.org/10.48550/arXiv.1606.08415.

  • Herman, G. R., and R. S. Schumacher, 2016: Extreme precipitation in models: An evaluation. Wea. Forecasting, 31, 1853–1879, https://doi.org/10.1175/WAF-D-16-0093.1.

  • Herman, G. R., and R. S. Schumacher, 2018: Money doesn’t grow on trees, but forecasts do: Forecasting extreme precipitation with random forests. Mon. Wea. Rev., 146, 1571–1600, https://doi.org/10.1175/MWR-D-17-0250.1.

  • Hill, C. M., P. J. Fitzpatrick, J. H. Corbin, Y. H. Lau, and S. K. Bhate, 2010: Summertime precipitation regimes associated with the sea breeze and land breeze in southern Mississippi and eastern Louisiana. Wea. Forecasting, 25, 1755–1779, https://doi.org/10.1175/2010WAF2222340.1.

  • Ho, J., A. Jain, and P. Abbeel, 2020: Denoising diffusion probabilistic models. NIPS’20: Proceedings of the 34th International Conference on Neural Information Processing Systems, Curran Associates Inc., 6840–6851, https://dl.acm.org/doi/abs/10.5555/3495724.3496298.

  • Hou, D., and Coauthors, 2014: Climatology-Calibrated Precipitation Analysis at fine scales: Statistical adjustment of Stage IV toward CPC gauge-based analysis. J. Hydrometeor., 15, 2542–2557, https://doi.org/10.1175/JHM-D-11-0140.1.

  • Hsu, W.-r., and A. H. Murphy, 1986: The attributes diagram: A geometrical framework for assessing the quality of probability forecasts. Int. J. Forecasting, 2, 285–293, https://doi.org/10.1016/0169-2070(86)90048-8.

  • Hu, M., Y. Wang, T.-J. Cham, J. Yang, and P. N. Suganthan, 2022: Global context with discrete diffusion in vector quantised modelling for image generation. Proc. IEEE/CVF Conf. on Computer Vision and Pattern Recognition, New Orleans, LA, Institute of Electrical and Electronics Engineers, 11 502–11 511, https://doi.org/10.1109/CVPR52688.2022.01121.

  • Huang, H., H. Cui, and Q. Ge, 2021: Assessment of potential risks induced by increasing extreme precipitation under climate change. Nat. Hazards, 108, 2059–2079, https://doi.org/10.1007/s11069-021-04768-9.

  • Ioffe, S., and C. Szegedy, 2015: Batch normalization: Accelerating deep network training by reducing internal covariate shift. ICML’15: Proc. 32nd Int. Conf. on Machine Learning, Vol. 37, Lille, France, JMLR.org, 448–456, https://dl.acm.org/doi/10.5555/3045118.3045167.

  • Jeworrek, J., G. West, and R. Stull, 2021: WRF precipitation performance and predictability for systematically varied parameterizations over complex terrain. Wea. Forecasting, 36, 893–913, https://doi.org/10.1175/WAF-D-20-0195.1.

  • Jiang, X., N.-C. Lau, and S. A. Klein, 2006: Role of eastward propagating convection systems in the diurnal cycle and seasonal mean of summertime rainfall over the U.S. Great Plains. Geophys. Res. Lett., 33, L19809, https://doi.org/10.1029/2006GL027022.

  • Johnson, A., and X. Wang, 2017: Design and implementation of a GSI-based convection-allowing ensemble data assimilation and forecast system for the PECAN field experiment. Part I: Optimal configurations for nocturnal convection prediction. Wea. Forecasting, 32, 289–315, https://doi.org/10.1175/WAF-D-16-0102.1.

  • Lai, C., X. Chen, Z. Wang, H. Yu, and X. Bai, 2020: Flood risk assessment and regionalization from past and future perspectives at basin scale. Risk Anal., 40, 1399–1417, https://doi.org/10.1111/risa.13493.

  • Leinonen, J., U. Hamann, D. Nerini, U. Germann, and G. Franch, 2023: Latent diffusion models for generative precipitation nowcasting with accurate uncertainty quantification. arXiv, 2304.12891v1, https://doi.org/10.48550/arXiv.2304.12891.

  • Leutbecher, M., 2019: Ensemble size: How suboptimal is less than infinity? Quart. J. Roy. Meteor. Soc., 145, 107–128, https://doi.org/10.1002/qj.3387.

  • Li, L., R. Carver, I. Lopez-Gomez, F. Sha, and J. Anderson, 2024: Generative emulation of weather forecast ensembles with diffusion models. Sci. Adv., 10, eadk4489, https://doi.org/10.1126/sciadv.adk4489.

  • Li, W., B. Pan, J. Xia, and Q. Duan, 2022: Convolutional neural network-based statistical post-processing of ensemble precipitation forecasts. J. Hydrol., 605, 127301, https://doi.org/10.1016/j.jhydrol.2021.127301.

  • Ling, F., and Coauthors, 2024: Diffusion model-based probabilistic downscaling for 180-year East Asian climate reconstruction. npj Climate Atmos. Sci., 7, 131, https://doi.org/10.1038/s41612-024-00679-1.

  • Liu, Z., Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, 2021: Swin transformer: Hierarchical vision transformer using shifted windows. 2021 Proc. IEEE/CVF Int. Conf. on Computer Vision (ICCV), Montreal, QC, Canada, Institute of Electrical and Electronics Engineers, 10 012–10 022, https://doi.org/10.1109/ICCV48922.2021.00986.

  • Mardani, M., and Coauthors, 2023: Generative residual diffusion modeling for km-scale atmospheric downscaling. arXiv, 2309.15214v1, https://doi.org/10.48550/arXiv.2309.15214.

  • Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF Ensemble Prediction System: Methodology and validation. Quart. J. Roy. Meteor. Soc., 122, 73–119, https://doi.org/10.1002/qj.49712252905.

  • Monache, L. D., F. A. Eckel, D. L. Rife, B. Nagarajan, and K. Searight, 2013: Probabilistic weather prediction with an analog ensemble. Mon. Wea. Rev., 141, 3498–3516, https://doi.org/10.1175/MWR-D-12-00281.1.

  • Murphy, A. H., 1973: A new vector partition of the probability score. J. Appl. Meteor., 12, 595–600, https://doi.org/10.1175/1520-0450(1973)012<0595:ANVPOT>2.0.CO;2.

  • Nair, U. S., M. R. Hjelmfelt, and R. A. Pielke, 1997: Numerical simulation of the 9–10 June 1972 Black Hills Storm using CSU RAMS. Mon. Wea. Rev., 125, 1753–1766, https://doi.org/10.1175/1520-0493(1997)125<1753:NSOTJB>2.0.CO;2.

  • Pörtner, H. O., and Coauthors, 2022: Climate Change 2022: Impacts, Adaptation and Vulnerability. Cambridge University Press, 3056 pp.

  • Price, I., and Coauthors, 2023: GenCast: Diffusion-based ensemble forecasting for medium-range weather. arXiv, 2312.15796v2, https://doi.org/10.48550/arXiv.2312.15796.

  • Ravuri, S., and Coauthors, 2021: Skilful precipitation nowcasting using deep generative models of radar. Nature, 597, 672–677, https://doi.org/10.1038/s41586-021-03854-z.

  • Ronneberger, O., P. Fischer, and T. Brox, 2015: U-Net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, N. Navab et al., Eds., Lecture Notes in Computer Science, Vol. 9351, Springer, 234–241.

  • Schefzik, R., T. L. Thorarinsdottir, and T. Gneiting, 2013: Uncertainty quantification in complex simulation models using ensemble copula coupling. arXiv, 1302.7149v2, https://doi.org/10.48550/arXiv.1302.7149.

  • Scheuerer, M., and T. M. Hamill, 2015: Statistical postprocessing of ensemble precipitation forecasts by fitting censored, shifted gamma distributions. Mon. Wea. Rev., 143, 4578–4596, https://doi.org/10.1175/MWR-D-15-0061.1.

  • Sha, Y., D. J. Gagne II, G. West, and R. Stull, 2020a: Deep-learning-based gridded downscaling of surface meteorological variables in complex terrain. Part I: Daily maximum and minimum 2-m temperature. J. Appl. Meteor. Climatol., 59, 2057–2073, https://doi.org/10.1175/JAMC-D-20-0057.1.

  • Sha, Y., D. J. Gagne II, G. West, and R. Stull, 2020b: Deep-learning-based gridded downscaling of surface meteorological variables in complex terrain. Part II: Daily precipitation. J. Appl. Meteor. Climatol., 59, 2075–2092, https://doi.org/10.1175/JAMC-D-20-0058.1.

  • Sha, Y., D. J. Gagne II, G. West, and R. Stull, 2022: A hybrid analog-ensemble–convolutional-neural-network method for postprocessing precipitation forecasts. Mon. Wea. Rev., 150, 1495–1515, https://doi.org/10.1175/MWR-D-21-0154.1.

  • Sha, Y., R. A. Sobash, and D. J. Gagne, 2024: Generative ensemble deep learning severe weather prediction from a deterministic convection-allowing model. Artif. Intell. Earth Syst., 3, e230094, https://doi.org/10.1175/AIES-D-23-0094.1.

  • Shepherd, J. M., A. Grundstein, and T. L. Mote, 2007: Quantifying the contribution of tropical cyclones to extreme rainfall along the coastal southeastern United States. Geophys. Res. Lett., 34, L23810, https://doi.org/10.1029/2007GL031694.

  • Sloughter, J. M. L., A. E. Raftery, T. Gneiting, and C. Fraley, 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon. Wea. Rev., 135, 3209–3220, https://doi.org/10.1175/MWR3441.1.

  • Smith, J. A., M. L. Baeck, Y. Zhang, and C. A. Doswell, 2001: Extreme rainfall and flooding from supercell thunderstorms. J. Hydrometeor., 2, 469–489, https://doi.org/10.1175/1525-7541(2001)002<0469:ERAFFS>2.0.CO;2.

  • Srivastava, N., G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, 2014: Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res., 15, 1929–1958.

  • Stensrud, D. J., 2009: Parameterization Schemes: Keys To Understanding Numerical Weather Prediction Models. Cambridge University Press, 480 pp.

  • Stovern, D. R., T. M. Hamill, and L. L. Smith, 2023: Improving National Blend of Models probabilistic precipitation forecasts using long time series of reforecasts and precipitation reanalyses. Part II: Results. Mon. Wea. Rev., 151, 1535–1550, https://doi.org/10.1175/MWR-D-22-0310.1.

  • Strauch, M., C. Bernhofer, S. Koide, M. Volk, C. Lorz, and F. Makeschin, 2012: Using precipitation data ensemble for uncertainty analysis in SWAT streamflow simulation. J. Hydrol., 414–415, 413–424, https://doi.org/10.1016/j.jhydrol.2011.11.014.

  • Sukovich, E. M., F. M. Ralph, F. E. Barthold, D. W. Reynolds, and D. R. Novak, 2014: Extreme quantitative precipitation forecast performance at the Weather Prediction Center from 2001 to 2011. Wea. Forecasting, 29, 894–911, https://doi.org/10.1175/WAF-D-13-00061.1.

  • Sun, C., and X.-Z. Liang, 2020: Improving US extreme precipitation simulation: Sensitivity to physics parameterizations. Climate Dyn., 54, 4891–4918, https://doi.org/10.1007/s00382-020-05267-6.

  • Tian, B., I. M. Held, N.-C. Lau, and B. J. Soden, 2005: Diurnal cycle of summertime deep convection over North America: A satellite perspective. J. Geophys. Res., 110, D08108, https://doi.org/10.1029/2004JD005275.

  • van den Oord, A., O. Vinyals, and K. Kavukcuoglu, 2017: Neural discrete representation learning. 31st Conference on Neural Information Processing Systems (NIPS 2017), Curran Associates Inc., https://proceedings.neurips.cc/paper_files/paper/2017/file/7a98af17e63a0ac09ce2e96d03992fbc-Paper.pdf.

  • Vaswani, A., N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, 2017: Attention is all you need. Proceedings of the 31st International Conference on Neural Information Processing Systems (NeurIPS 2017), Curran Associates Inc., 5998–6008, https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.

  • Vaswani, A., P. Ramachandran, A. Srinivas, N. Parmar, B. Hechtman, and J. Shlens, 2021: Scaling local self-attention for parameter efficient visual backbones. 2021 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, Institute of Electrical and Electronics Engineers, 12 894–12 904, https://doi.org/10.1109/CVPR46437.2021.01270.

  • Wang, F., D. Tian, L. Lowe, L. Kalin, and J. Lehrter, 2021: Deep learning for daily precipitation and temperature downscaling. Water Resour. Res., 57, e2020WR029308, https://doi.org/10.1029/2020WR029308.

  • Wang, T., and G. Tang, 2020: Spatial variability and linkage between extreme convections and extreme precipitation revealed by 22-year space-borne precipitation radar data. Geophys. Res. Lett., 47, e2020GL088437, https://doi.org/10.1029/2020GL088437.

  • Wehner, M. F., R. L. Smith, G. Bala, and P. Duffy, 2010: The effect of horizontal resolution on simulation of very extreme US precipitation events in a global atmosphere model. Climate Dyn., 34, 241–247, https://doi.org/10.1007/s00382-009-0656-y.

  • Wilcox, E. M., and L. J. Donner, 2007: The frequency of extreme rain events in satellite rain-rate estimates and an atmospheric general circulation model. J. Climate, 20, 53–69, https://doi.org/10.1175/JCLI3987.1.

  • Yang, L., and Coauthors, 2023: Diffusion models: A comprehensive survey of methods and applications. ACM Comput. Surv., 56, 105, https://doi.org/10.1145/3626235.

  • Yang, R., and S. Mandt, 2024: Lossy image compression with conditional diffusion models. NIPS ’23: Proceedings of the 37th International Conference on Neural Information Processing Systems, Curran Associates Inc., 64 971–64 995, https://dl.acm.org/doi/10.5555/3666122.3668957.

  • Zhang, Y., M. Long, K. Chen, L. Xing, R. Jin, M. I. Jordan, and J. Wang, 2023: Skilful nowcasting of extreme precipitation with NowcastNet. Nature, 619, 526–532, https://doi.org/10.1038/s41586-023-06184-4.

  • Zhong, X., L. Chen, J. Liu, C. Lin, Y. Qi, and H. Li, 2023: FuXi-extreme: Improving extreme rainfall and wind forecasts with diffusion model. arXiv, 2310.19822v1, https://doi.org/10.48550/arXiv.2310.19822.

  • Zhou, X., Y. Zhu, D. Hou, Y. Luo, J. Peng, and R. Wobus, 2017: Performance of the new NCEP Global Ensemble Forecast System in a parallel experiment. Wea. Forecasting, 32, 1989–2004, https://doi.org/10.1175/WAF-D-17-0023.1.

  • Zhou, X., and Coauthors, 2022: The development of the NCEP Global Ensemble Forecast System version 12. Wea. Forecasting, 37, 1069–1084, https://doi.org/10.1175/WAF-D-21-0112.1.

  • Fig. 1.

    (a) The 0.125° grid spacing CONUS domain with shaded elevation. (b),(c) The 2002–19 climatology of gridpoint-wise 99th percentile values of 6-hourly precipitation rates for 0000–0600 and 1200–1800 UTC, respectively. (d) The 1986–2015 climatological probabilities of tornadoes, hail, and wind gusts derived from NOAA Storm Prediction Center (SPC) reports. (e),(f) The corresponding 2002–19 climatological percentiles of 40 mm (6 h)−1 threshold for 0000–0600 and 1200–1800 UTC, respectively. Hatched area in (e) and (f) means percentiles cannot be estimated as 40 mm (6 h)−1 is close to or larger than the historical maximum.

  • Fig. 2.

    (a) The general concept and (b) technical steps of ViT-LDM. Steps that solve the oversmoothness problem of the VAE-based latent diffusion are highlighted using a yellow background color.

  • Fig. 3.

    (a) The architecture of VQ-VAE with 2D convolutional layers, down- and upsampling layers, batch normalization (Ioffe and Szegedy 2015), Gaussian error linear unit (GELU; Hendrycks and Gimpel 2016) activation function, and dropout (Srivastava et al. 2014). (b) The schematic of the VQ layer; dashed-line arrows represent identity mappings. (c) The design of the residual block. A separate output section is highlighted using a yellow background color.

  • Fig. 4.

    (a) The architecture of the 3D ViT. The “C” indicates the number of channels. (b) The design of ViT blocks with layer normalization (Ba et al. 2016), multihead self-attention (Vaswani et al. 2017), GELU activation, and dropout. (c) The design of multihead self-attention. The “Q,” “K,” and “V” represent “query,” “key,” and “value,” respectively, which are three copies of the input tensor for self-attention computation (Vaswani et al. 2017).

  • Fig. 5.

    (a) The architecture of the 3D diffusion model. The C indicates the number of channels. (b) The design of residual block.

  • Fig. 6.

    An example of 48–54-h forecasts with extreme precipitation events for 0000–0600 UTC 1 Jan 2021. (a) Calibrated probabilities of precipitation rate > gridpoint-wise 99th percentile events. (b) As in (a), but for GEFS-Raw. (c) As in (a), but for AnEn-ECC. (d) The CCPA verification target. (e)–(i) Examples of generative ensemble members produced by ViT-LDM. Hatched areas represent where the 99th percentile extreme events were (d) analyzed or (e)–(i) forecasted.

  • Fig. 7.

    Verification of ViT-LDM (red solid line), GEFS-Raw (blue dashed line), and AnEn-ECC (cyan dashed line) with CRPSSs (higher is better) in 2021. (a) Domainwise averaged CRPSS curves by forecast lead times. (b) The CRPSS differences between ViT-LDM and GEFS-Raw. (c) The CRPSS differences between ViT-LDM and AnEn-ECC. CRPSS curves in (a) were averaged from 100 bootstrapped replicates with error bars representing the 95% confidence intervals.

  • Fig. 8.

    Verifications of ViT-LDM (red bars), GEFS-Raw (blue bars), and AnEn-ECC (cyan bars) with BSSs (higher is better) in 2021. (a) BSSs of precipitation events derived from 1- to 40-mm (6 h)−1 thresholds and for 6–54-h forecasts. (b) As in (a), but for 54–102-h forecasts. (c) As in (a), but for 102–144-h forecasts. The “x” means the BSS is lower than 0.001.

  • Fig. 9.

    Verification of (top) forecasted 6-hourly extreme precipitation events with reliability diagrams, (middle) frequency of occurrence, and (bottom) BS (“Brier”; lower is better) decompositions [reliability (“REL”; lower is better), resolution (“RES”; higher is better), and climatological uncertainty (ō)] in 2021. All scores were computed based on events of 6-hourly precipitation rates > 40 mm (6 h)−1 threshold. In (a)–(c), metrics were averaged over 6–54-, 54–102-, and 102–144-h forecasts, respectively. Dashed no-skill reference lines and perfect reliability diagonal reference lines are included. Calibration curves were averaged from 100 bootstrap replicates with error bars and color shades representing the 95% confidence intervals.

  • Fig. 10.

    As in Fig. 9, but for extreme events of 6-hourly precipitation rates > gridpoint-wise 99th percentile thresholds. Note that ō is not strictly equal to 0.01 because it was derived from the 2000–19 CCPA climatology, not from the 2021 verification period.

  • Fig. 11.

    As in Fig. 9, but for extreme events of 6-day accumulated precipitation amount > gridpoint-wise 99th percentile thresholds.

  • Fig. 12.

    (a) The schematics of the VQ-VAE latent space visualization. (b) Visualization examples on 1 Jan 2021 with 00–06-, 48–54-, and 96–102-h forecasts, respectively. Small and large blue dots are the latent space representations of the original GEFS ensemble members and the ensemble mean, respectively. Yellow and red dots are the representations of 3D ViT outputs and ViT-LDM outputs. Star symbols are the representations of the CCPA verification target. Dashed lines were produced from kernel density estimates.

  • Fig. 13.

    The BS (lower is better) performance of ViT-LDM for 6-hourly extreme precipitation events with varying ensemble sizes. The ensemble sizes were increased by generating more members from each GEFS-Raw member.
