1. Introduction
Weather forecasting has been studied for decades, evolving from subjective forecasts to today's numerical weather prediction (NWP), and it has a great impact on society as well as on people’s daily lives. NWP predicts the future state of the atmosphere by integrating the governing partial differential equations, using the currently observed weather conditions as input (Bauer et al. 2015). While NWP performance has advanced, especially over recent decades, substantial uncertainties persist, notably in extreme weather scenarios like heavy precipitation and typhoons. The uncertainty of NWP mainly stems from two kinds of errors: initial condition errors and model errors (Hailing and Pu 2010; Duan et al. 2022). To quantitatively predict atmospheric state uncertainty, global numerical weather prediction centers have embraced ensemble forecasting (Buizza et al. 2005). Unlike single forecasts, ensemble forecasting generates multiple concurrent forecasts with distinct initial conditions or model formulations. Epstein (1969) first introduced ensemble forecasting in meteorology to represent true forecast distributions. Various ensemble methods have since emerged to quantify uncertainties in both initial and model errors (Toth and Kalnay 1997; Hoffman and Kalnay 1983; Toth and Kalnay 1993; Mureau et al. 1993; Buizza et al. 2005, 2008; Wei et al. 2008). Due to their effectiveness in handling uncertainty and superior performance compared to single forecasts, ensemble forecasts find applications in high-impact precipitation forecasting (Du et al. 1997; Evans et al. 2014; Clark 2017) and tropical cyclone forecasting (Zhang and Krishnamurti 1997; Mackey and Krishnamurti 2001; Lin et al. 2020; J. Zhang et al. 2021). However, despite its impressive performance, ensemble forecasting faces significant challenges, with computational expense being a major drawback.
Theoretically, larger ensembles better resolve probability distributions, but costs scale proportionally with ensemble size (Buizza and Palmer 1998). The high computational demands lead to time-consuming delays from ensemble forecast execution to delivery for end users. Therefore, accurately quantifying forecast uncertainty while optimizing computational resources and minimizing time lags remains an ongoing research challenge.
As a data-driven method, deep learning has been widely adopted across many fields, enabled by increased computational power, availability of large datasets, and rapid advances in neural network architectures (Schultz et al. 2021; Schmidhuber 2015). In recent years, weather forecasting has also been addressed by deep learning methods (Reichstein et al. 2019). The initial successes focused on single deterministic forecasting. Studies have shown the potential of deep learning methods in precipitation forecast (Shi et al. 2015, 2017; Ayzel et al. 2020; Espeholt et al. 2022; Fernández and Mehrkanoon 2021; F. Zhang et al. 2021; Zhang et al. 2023), typhoon track and intensity forecast (Zhuo and Tan 2021; Chen and Yu 2021; Zhang et al. 2022), and other meteorological variables’ forecasts (Rasp et al. 2020; Pathak et al. 2022; Fan et al. 2023; Bi et al. 2022; Lam et al. 2022).
Considering the importance of uncertainty quantification (Nilsen et al. 2022) and the limitations of traditional ensemble forecasts, several studies have employed deep learning methods to estimate forecast uncertainty. Scher and Messori (2018) proposed a convolutional neural network (CNN) to predict forecast errors or ensemble spread from past forecasts, showing that uncertainty can be estimated via machine learning. Vannitsem et al. (2021) gave an overview of the development and challenges of postprocessing methods in NWP, including the generation of probabilistic forecasts from deterministic inputs. Grönquist et al. (2021) trained a deep neural network to estimate the spread of weather forecasts using a small set of simulations, reducing computational costs compared to full ensembles. Chapman et al. (2022) found that a deep learning model could compete with or outperform the Global Ensemble Forecast System for probabilistic integrated vapor transport forecasts. Brecht and Bihlo (2023a) proposed a computationally efficient 3D Pix2Pix model to quantify 500-hPa geopotential height uncertainty from a control forecast. They further used generative adversarial networks to generate realistic precipitation ensembles from just the control forecast, with performance identical to the ECMWF IFS ensemble (Brecht and Bihlo 2023b). Meanwhile, some other ensemble methods within deep learning architectures have been proposed. Clare et al. (2021) trained several neural networks on different small subsets to predict probability density functions for the target weather variable. Bihlo (2021) used a conditional generative adversarial network to forecast variables and quantify uncertainty via Monte Carlo dropout. Scher and Messori (2021) tested different methods for a neural network to generate ensemble forecasts, which showed better performance than the unperturbed neural network forecasts.
Compared to traditional physical models, deep learning (DL) models exhibit significantly enhanced computational efficiency in prediction tasks. Once trained, they are often orders of magnitude faster than the original physically based models (Boukabara et al. 2019). DL models present an appealing alternative for uncertainty predictions compared to resource-intensive ensemble forecasting. While previous studies predominantly employed CNN approaches like U-Net to estimate forecast uncertainty, these models have limited capacity for spatiotemporal learning, crucial for meteorological data. Furthermore, existing research has mainly focused on variables like geopotential height and temperature, neglecting precipitation despite its significance and the challenges in accurate prediction. For variables such as temperature and geopotential height that closely follow normal distributions, the ensemble mean and spread can well approximate the forecast distributions. However, precipitation, characterized by sparse distribution and nonnormal behavior, requires different uncertainty quantification approaches. One approach is to generate ensemble members directly with multiple DL models (Brecht and Bihlo 2023b), but training multiple networks is computationally expensive. Therefore, this work aims to avoid ensemble complexity and provide useful precipitation uncertainty information by predicting ensemble spread and probabilistic forecasts, respectively, given only a single deterministic forecast. The predicted spread indicates forecast confidence and uncertainty, while the probabilistic forecasts provide precipitation occurrence probabilities for different thresholds, which also reflects forecast uncertainty. To achieve this, we introduce a novel spatiotemporal transformer network named ST-TransNet. Employing a hierarchical encoder–decoder structure, the model learns multiscale precipitation information. 
Given the spatial and temporal correlations in both inputs and outputs, a spatiotemporal transformer module is incorporated to comprehensively extract this information. Since meteorological variables like precipitation exhibit high spatial localization, window-based attention captures local precipitation patterns, simultaneously reducing computational costs. Additionally, adversarial learning is used to learn rich spatiotemporal representations and generate visually realistic results. The proposed model undergoes training and evaluation using the China Meteorological Administration (CMA) ensemble forecasts and the Global Precipitation Measurement (GPM) precipitation dataset. Results demonstrate that a well-trained DL model can estimate precipitation forecast uncertainty without relying on computationally expensive ensemble model runs. By predicting spread and probabilities from a single forecast, our model provides uncertainty information in an efficient computational manner.
The rest of this article is organized as follows: section 2 introduces the dataset used in this study. Section 3 provides a detailed description of the proposed model. Section 4 presents the experiment configuration, baselines for comparison, and evaluation methods. Section 5 analyzes and discusses the evaluation results. Finally, section 6 summarizes the main conclusions and discussion.
2. Data
The purpose of this study is to estimate precipitation forecast uncertainty from a single deterministic forecast. Thus, a high-quality ensemble dataset is crucial for training the proposed model. We utilize the dataset from the THORPEX Interactive Grand Global Ensemble (TIGGE) (Bougeault et al. 2010), which contains global ensemble forecasts from NWP centers. Specifically, we use the CMA ensemble forecasts in the TIGGE archive. The CMA ensemble forecasts provide 360-h forecasts (72-h forecasts are used in this study) at 6-h intervals, initialized at 0000 and 1200 UTC daily since May 2007. The ensemble consists of one unperturbed control forecast and 14 ensemble members, extended to 30 members in October 2020. The spatial resolution of the model is 0.5° × 0.5°, which is approximately 50 km. The global model grid has 720 × 360 points. To improve efficiency, we focus on the domain 64°–160°E, 6.5°–70°N, shown in Fig. 1. The model outputs many variables, but we target total precipitation because of its importance and the difficulty of predicting it. The spread is defined as the standard deviation of the members in ensemble forecasting (Leutbecher and Palmer 2008). Higher ensemble spreads indicate greater uncertainty and disagreement among the ensemble forecast members, while lower spreads signify more confidence and consensus with more similar predictions across the ensemble. Probability forecasts give the proportion of members exceeding a threshold. Figure 1 illustrates a 24-h deterministic control forecast (Fig. 1a), the corresponding ensemble spread (Fig. 1b), and the probability of precipitation exceeding 25 mm (Fig. 1c), initialized at 0000 UTC 21 July 2021.
(a) The 24-h precipitation forecast (mm) from the unperturbed control forecast initialized at 0000 UTC 21 Jul 2021, (b) the computed uncertainty (mm) of 24-h precipitation forecast from the ensemble forecasts, and (c) the probabilistic forecast of 24-h precipitation (>25 mm) from the ensemble forecasts.
Citation: Monthly Weather Review 152, 5; 10.1175/MWR-D-23-0097.1
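The two training targets defined above, the ensemble spread and the exceedance probability, can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the toy array, and the choice of the population standard deviation (ddof = 0) are assumptions.

```python
import numpy as np


def spread_and_probability(members, threshold_mm):
    """Derive both targets from an ensemble.

    members: array of shape (n_members, H, W), precipitation in mm.
    Returns (spread, probability), each of shape (H, W).
    """
    members = np.asarray(members, dtype=float)
    # Spread: standard deviation across the member axis
    # (population form, ddof=0; this choice is an assumption).
    spread = members.std(axis=0)
    # Probability: proportion of members exceeding the threshold.
    probability = (members > threshold_mm).mean(axis=0)
    return spread, probability


# Toy ensemble: 4 hypothetical members on a 1 x 2 grid.
toy = np.array([[[0.0, 30.0]], [[0.0, 20.0]], [[0.0, 40.0]], [[0.0, 10.0]]])
spread, prob = spread_and_probability(toy, threshold_mm=25.0)
print(spread, prob)
```

In the dry grid cell all members agree, so both the spread and the probability are zero; in the wet cell half of the members exceed 25 mm.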
In addition to the ensemble dataset, we utilize precipitation data from the Integrated Multi-satellitE Retrievals for GPM (IMERG) mission as the observation ground truth. The GPM mission is composed of an international network of satellites that provide the next-generation global observations of rain and snow. The IMERG algorithm combines information from the GPM satellite constellation to estimate precipitation over the majority of Earth’s surface (Huffman et al. 2015). IMERG provides three types of products: early-run, late-run, and final-run products. Both the early-run and late-run products are near–real time, with a latency of 4 and 12 h, respectively. The final run is a post-real-time research product with a latency of about 4 months. It has a spatial resolution of 0.1° × 0.1° and a temporal resolution of 30 min, 1 day, and 1 month. We use the half-hourly final- and late-run products as reference data in this study.
3. Methodology
a. Problem formulation
We formulate the problem of uncertainty estimation as a spatiotemporal sequence-to-sequence translation task. Let $X = \{x_1, x_2, \ldots, x_T\}$ represent a sequence of deterministic forecasts, such as precipitation predictions. The variable $x_i \in \mathbb{R}^{M \times N}$, $i = 1, 2, \ldots, T$, is the forecast at time step $i$ within a spatial region of $M \times N$. Given $X$, the objective of this study is to predict the corresponding ensemble spread or forecasting probability, denoted by $Y$.
During the training phase, the ground truth Y is provided alongside the deterministic forecast X. Here, Y represents the spread or forecast probability derived from historical ensembles. The neural network’s input and output are intricately correlated in both temporal and spatial dimensions. Consequently, the model must accurately capture relationships along both dimensions. Moreover, precipitation forecasts predominantly exhibit local features within specific regions. To solve this, we propose a spatiotemporal transformer neural network (ST-TransNet) to estimate the uncertainty of deterministic precipitation forecasts, as shown in Fig. 2. The ST-TransNet comprises a transformer-based generator and a discriminator, detailed in the subsequent subsections. Once trained, the model adeptly estimates precipitation forecast uncertainty without the need for computationally intensive ensemble forecasts.
The architecture of the ST-TransNet: (a) generator and (b) discriminator.
b. Transformer-based generator
The proposed ST-TransNet adopts a structure inspired by the transformer architecture, which comprises a patch embedding layer and multiple transformer blocks. These components work together to systematically reduce the spatial dimensions of the input data. The input sequence X has dimensions of T × H × W, representing time, height, and width. We first apply a 3D patch embedding layer to divide the input into patches before feeding them into the transformer blocks. The standard transformer receives a sequence of token embeddings as input for natural language processing. In contrast to methods like the vision transformer (ViT) (Dosovitskiy et al. 2021), which reshape 2D images into sequences of 2D patches, ST-TransNet maps the input sequence into 3D tokens. Assuming a 3D token size of 2 × 4 × 4, the input is divided into (T/2) × (H/4) × (W/4) 3D tokens. A linear projection then maps the features of each token to an arbitrary dimension denoted by C. Subsequently, multiple transformer blocks operate on these patch tokens to extract features and learn representations.
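As a rough illustration of the 3D patch embedding described above, the following NumPy sketch cuts a T × H × W sequence into 2 × 4 × 4 tokens and projects each token to C features. The function names, the 64 × 96 grid, the value C = 8, and the random projection weights (standing in for learned ones) are all assumptions made for the example.

```python
import numpy as np


def split_into_tokens(x, patch=(2, 4, 4)):
    """Cut a (T, H, W) sequence into non-overlapping 3D tokens.

    Returns an array of shape (T/pt, H/ph, W/pw, pt*ph*pw).
    """
    pt, ph, pw = patch
    T, H, W = x.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("each input dimension must be divisible by the token size")
    # Split every axis into (token index, offset inside the token).
    x = x.reshape(T // pt, pt, H // ph, ph, W // pw, pw)
    # Move the three token-index axes to the front, then flatten each token.
    x = x.transpose(0, 2, 4, 1, 3, 5)
    return x.reshape(T // pt, H // ph, W // pw, pt * ph * pw)


def patch_embed_3d(x, embed_dim, patch=(2, 4, 4), seed=0):
    """Project every flattened token to embed_dim (C) features."""
    flat = split_into_tokens(x, patch)
    # Random weights stand in for the learned linear projection.
    rng = np.random.default_rng(seed)
    weight = rng.standard_normal((flat.shape[-1], embed_dim))
    return flat @ weight


# Hypothetical input: 12 time steps (72 h at 6-h intervals) on a 64 x 96 grid.
sequence = np.arange(12 * 64 * 96, dtype=float).reshape(12, 64, 96)
tokens = split_into_tokens(sequence)
embedded = patch_embed_3d(sequence, embed_dim=8)
print(tokens.shape, embedded.shape)
```

With these sizes the 12 × 64 × 96 input becomes 6 × 16 × 24 tokens, matching the (T/2) × (H/4) × (W/4) count given in the text.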
The transformer block serves as the cornerstone of our model, playing a pivotal role in capturing correlations present in both spatial and temporal dimensions of the input and output. Temporally, our model acknowledges the influence of earlier time steps on subsequent ones, accounting for long-range dependencies. Spatially, robust interactions exist between different regions, particularly notable in scenarios involving highly localized precipitation. Drawing inspiration from Liu et al. (2022), our model employs a 3D shifted window multihead self-attention (3D SW-MSA) module to effectively learn spatiotemporal interactions among 3D tokens, illustrated in the top section of Fig. 2a. The 3D SW-MSA serves as a replacement for the standard MSA in our transformer blocks. The architecture includes 3D window MSA (3D W-MSA), 3D SW-MSA, multiple feed-forward networks [also called multilayer perceptrons (MLPs)], and layer normalization (LN). The input is divided into nonoverlapping 3D windows, and multihead self-attention is initially applied within each window. This window attention design is computationally efficient and adept at capturing local features, as demonstrated in image and video recognition (Liu et al. 2021, 2022). To facilitate connections across windows, the 3D window is shifted between blocks, introducing connections between nonoverlapping windows. A more detailed description of different modules can be found in the appendix.
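The window partition and the shift between consecutive blocks can be sketched as below. This is a simplified NumPy illustration under stated assumptions: the window size of 2 × 4 × 4, the half-window cyclic shift, and the function names are chosen for the example, and the attention computation itself (as well as any masking of wrapped-around tokens) is omitted.

```python
import numpy as np


def window_partition_3d(tokens, window=(2, 4, 4)):
    """Group tokens of shape (T, H, W, C) into non-overlapping 3D windows.

    Returns an array of shape (num_windows, wt*wh*ww, C); self-attention
    would then be computed independently inside each window.
    """
    wt, wh, ww = window
    T, H, W, C = tokens.shape
    if T % wt or H % wh or W % ww:
        raise ValueError("token grid must be divisible by the window size")
    x = tokens.reshape(T // wt, wt, H // wh, wh, W // ww, ww, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, wt * wh * ww, C)


def shift_tokens(tokens, window=(2, 4, 4)):
    """Cyclically shift the token grid by half a window along T, H, and W."""
    shift = tuple(-(w // 2) for w in window)
    return np.roll(tokens, shift=shift, axis=(0, 1, 2))


# Hypothetical token grid: 4 x 8 x 8 tokens with a single feature each.
grid = np.arange(4 * 8 * 8, dtype=float).reshape(4, 8, 8, 1)
regular = window_partition_3d(grid)                # windows for 3D W-MSA
shifted = window_partition_3d(shift_tokens(grid))  # windows for 3D SW-MSA
print(regular.shape, shifted.shape)
```

After the shift, each window contains tokens that belonged to several different windows in the previous block, which is how information crosses window boundaries.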
An inherent challenge in weather forecasting lies in its multiscale characteristics. We employ a hierarchical encoder–decoder structure with multiple stages to effectively capture these multiscale features. After transformer blocks, patch merging layers are strategically introduced to systematically reduce the number of tokens and generate hierarchical representations. Assuming the feature map dimensions after the initial transformer layers are (T/2) × (H/4) × (W/4) × C, they are downsampled by a patch merging layer into dimensions of (T/2) × (H/8) × (W/8) × 2C. This downsampling process is iteratively applied over several stages, aligning with the multiscale nature of the weather data. Inspired by the work of Ronneberger et al. (2015), our generator is meticulously crafted with a symmetric transformer-based encoder–decoder architecture. The encoder employs patch merging for efficient downsampling, while the decoder incorporates patch expanding layers for precise upsampling, thereby restoring the original resolution. To seamlessly integrate multiscale feature representations, skip connections are strategically placed between corresponding scales in the encoder and decoder. This holistic approach ensures comprehensive information flow across scales. Finally, an upsampling module restores the output to the original input resolution.
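The downsampling step described above can be illustrated with a patch merging sketch. It assumes the Swin-style recipe (concatenate each 2 × 2 spatial neighborhood into 4C features, then project linearly to 2C), which reproduces the stated change in dimensions but is not confirmed here as the authors' exact layer; normalization is omitted and random weights stand in for learned ones.

```python
import numpy as np


def patch_merging(x, seed=0):
    """Downsample tokens of shape (T, H, W, C) to (T, H/2, W/2, 2C)."""
    T, H, W, C = x.shape
    if H % 2 or W % 2:
        raise ValueError("H and W must be even")
    # Gather the four tokens of every 2 x 2 spatial neighborhood
    # and stack their features: (T, H/2, W/2, 4C).
    stacked = np.concatenate(
        [x[:, 0::2, 0::2], x[:, 1::2, 0::2], x[:, 0::2, 1::2], x[:, 1::2, 1::2]],
        axis=-1,
    )
    # Linear reduction from 4C to 2C features (random stand-in weights).
    rng = np.random.default_rng(seed)
    weight = rng.standard_normal((4 * C, 2 * C))
    return stacked @ weight


# Hypothetical stage-1 features: (T/2, H/4, W/4, C) = (6, 16, 24, 8).
stage1 = np.ones((6, 16, 24, 8))
stage2 = patch_merging(stage1)
print(stage2.shape)
```

The time dimension is left untouched, so only the spatial token count is reduced, in line with the (T/2) × (H/8) × (W/8) × 2C shape quoted in the text.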
c. Discriminator
In ST-TransNet, the discriminator plays a pivotal role in ensuring that the output sequence from the generator faithfully captures both low- and high-frequency components resembling the ground truth sequence. Illustrated in Fig. 2b, the discriminator receives three distinct inputs: the generated output sequence, the ground truth sequence, and the model input sequence. Employing multiple convolutional layers, each augmented with appropriate activation functions, the discriminator extracts features at various levels. A final sigmoid layer is responsible for classifying each input patch as either real or fake. Adhering to the principles of Isola et al. (2017), we adopt a patch generative adversarial network (GAN) approach. The discriminator’s output manifests as a sequence of N × N patches, each corresponding to distinct areas of the original input feature map. The overall discriminator output is determined by averaging the classifications across all individual patches. Comprehensive details are provided in the appendix.
4. Experiments
a. Baseline models
To fully evaluate the performance of the ST-TransNet model, we present two main tasks for estimating precipitation forecast uncertainty: spread estimation and probabilistic forecasting. The input is a single deterministic forecast as the control forecast. For spread estimation, the model is trained on ensemble spread derived from historical ensemble weather forecasts. In the case of probabilistic forecasting, the target is to forecast the probability of exceeding several precipitation thresholds. We compare ST-TransNet to several baseline methods: 1) Linear regression model: Recognizing the strong correlation between precipitation uncertainty and the actual forecast, where increased rainfall implies higher uncertainty, we establish a simple linear regression model as the baseline for spread estimation. We train one overall linear model with all grids in the training dataset. 2) U-Net (Ronneberger et al. 2015; Grönquist et al. 2021): Employing an encoder–decoder architecture, U-Net is adapted for sequence data by concatenating the time dimension as channels. 3) 3D U-Net (Çiçek et al. 2016; Grönquist et al. 2019): The 3D U-Net model is the extension of U-Net tailored for extracting information from data with 3D dimensions, utilizing 3D operations like convolutions to replace 2D operations in the U-Net. 4) TimeSformer (Bertasius et al. 2021): Originally designed for video classification, TimeSformer learns spatiotemporal features through a transformer architecture. Given the spatiotemporal nature of our data, this method is employed for uncertainty estimation. 5) Video vision transformer (ViViT; Arnab et al. 2021): ViViT is also designed for video classification tasks with spatiotemporal attention operations of transformer, which is appropriate in this study to process sequence meteorological data. 6) 3D Pix2Pix: It is an extension of the Pix2Pix model (Isola et al. 2017) used for image translation problems. 
The 3D Pix2Pix model aims to solve the video translation problem. Brecht and Bihlo (2023a) learned the ensemble spread of 500-hPa geopotential height with the 3D Pix2Pix model. The detailed descriptions of these baselines are provided in the appendix, offering in-depth insights into the methodologies and configurations of each baseline method used for comparison.
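The linear regression baseline, a single overall model fitted on all grid points, can be sketched as follows. The function names and the synthetic data are assumptions for illustration; the example uses an exactly linear toy relation so that the fitted coefficients are easy to check.

```python
import numpy as np


def fit_linear_baseline(control, spread):
    """Fit one global model spread = slope * control + intercept.

    All grid points and time steps are pooled into a single sample.
    """
    slope, intercept = np.polyfit(np.ravel(control), np.ravel(spread), deg=1)
    return slope, intercept


def predict_spread(control, slope, intercept):
    """Apply the fitted line to a control forecast of any shape."""
    return slope * np.asarray(control, dtype=float) + intercept


# Synthetic training data: 2 time steps on a 10 x 10 grid, exactly linear.
control = np.linspace(0.0, 50.0, 200).reshape(2, 10, 10)
spread = 0.4 * control + 1.0
slope, intercept = fit_linear_baseline(control, spread)
estimate = predict_spread(control, slope, intercept)
print(slope, intercept)
```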
b. Implementation details
5. Results
In this section, we present evaluation results comparing the proposed ST-TransNet model to the baseline methods. We start by assessing performance on ensemble spread estimation, which serves as a reliable predictor of errors in a single deterministic forecast. Subsequently, we delve into a comparison of performance in probabilistic forecasting, offering valuable uncertainty information to end users. These experiments showcase the effectiveness of neural networks in learning to estimate precipitation forecast uncertainty at a lower computational cost.
a. Uncertainty estimation by ensemble spread
We compare ST-TransNet to the baseline models on spread estimation using several quantitative metrics. Table 1 shows the CSI, HSS, MSE, MAE, and SSIM scores averaged over the 72-h forecast period, against ensemble forecast spread labels. CSI and HSS are evaluated at spread thresholds of 5, 10, 20, and 30 mm. The linear regression (LR) model has the lowest CSI and HSS at all thresholds. Thus, though forecast uncertainty is usually proportional to the actual precipitation forecast, a simple linear relation cannot accurately predict the corresponding ensemble spread. Among the deep learning models, U-Net performs worst. Because it concatenates time steps along the channel dimension, U-Net fails to capture temporal dynamics with its 2D convolutions. Extending the 2D operations to 3D improves the performance of 3D U-Net by incorporating spatiotemporal operations. Compared to the U-Net structure, TimeSformer and ViViT utilize self-attention to model relationships in space and time. However, their single-layer architectures limit multiscale feature learning for meteorological data. ViViT performs better at the larger threshold of 30 mm, where its scores even exceed those of ST-TransNet, indicating a stronger capability on high-spread samples despite its overall limitations. For MSE and MAE, LR again has the highest errors while ST-TransNet achieves the lowest. The first five methods are trained without adversarial loss, resulting in low values of SSIM, which measures the visual quality of the output sequence. In comparison, clear improvements can be found in SSIM values for 3D Pix2Pix and ST-TransNet, indicating better visual quality of the output sequence. Overall, ST-TransNet generally outperforms the baselines across the quantitative metrics, demonstrating the advantages of its spatiotemporal transformer architecture with adversarial training for spread estimation.
Comparison of quantitative evaluation results with baseline methods. CSI and HSS are given for different thresholds of spread at 5, 10, 20, and 30 mm and the average value. The bold values indicate the best scores.
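For reference, CSI and HSS at a given threshold can be computed from the 2 × 2 contingency table using their standard definitions, as in the sketch below. The function names and the toy values are assumptions, as is the convention that a value equal to the threshold counts as an event.

```python
def contingency(pred, truth, threshold):
    """Count hits, misses, false alarms, and correct negatives."""
    hits = misses = false_alarms = correct_negatives = 0
    for p, t in zip(pred, truth):
        # Assumption: values at or above the threshold count as events.
        p_event, t_event = p >= threshold, t >= threshold
        if p_event and t_event:
            hits += 1
        elif t_event:
            misses += 1
        elif p_event:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def csi(hits, misses, false_alarms, correct_negatives=0):
    """Critical success index: hits / (hits + misses + false alarms)."""
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")


def hss(hits, misses, false_alarms, correct_negatives):
    """Heidke skill score from the 2 x 2 contingency table."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    return 2 * (a * d - b * c) / denom if denom else float("nan")


# Toy flattened spread fields (mm), evaluated at the 5-mm threshold.
predicted = [6, 12, 3, 0, 7, 1]
reference = [8, 4, 6, 0, 9, 2]
table = contingency(predicted, reference, threshold=5)
csi_5mm = csi(*table)
hss_5mm = hss(*table)
print(table, csi_5mm, hss_5mm)
```

Because CSI ignores correct negatives, it is sensitive to how many grid points exceed the threshold at all, which matters when interpreting scores at short lead times where few points have large spread.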
To evaluate the performance at different forecasting lead times, we take CSI and HSS for thresholds of 5 and 10 mm as examples to show how the metrics evolve with forecast time (Fig. 3). As forecast time increases, the performance of almost all the models improves, which differs from typical deterministic forecasting results. Because of the chaotic nature of the atmosphere, deterministic skill degrades with lead time as uncertainty grows. However, this reduction in deterministic skill corresponds to an increase in forecast errors, which is represented by greater ensemble spread. Thus, the ensemble spread exhibits an increasing trend over lead time. Initially, few grid points exceed the 5- or 10-mm spread thresholds. Consequently, the initial CSI and HSS metrics are low for these thresholds. As lead time increases, the number of samples exceeding 5- and 10-mm spread rises, leading to improved CSI and HSS scores. In comparison, ST-TransNet consistently outperforms the baselines across all lead times. Among the baselines, 3D U-Net achieves the best performance in the first few forecast hours before being surpassed by 3D Pix2Pix at longer leads. Especially for larger uncertainty, 3D Pix2Pix attains higher metrics than other baselines except ST-TransNet. Among all methods, LR, U-Net, and ViViT achieve relatively lower CSI and HSS scores. Overall, the proposed ST-TransNet outperforms other methods across thresholds and lead times, demonstrating clear advantages in spread estimation performance.
The CSI and HSS score evolution over forecast times of different methods at the threshold of (top) 5 mm and (bottom) 10 mm.
Ensemble spread is a good predictor of errors in the deterministic forecast. High spread indicates greater disagreement among ensemble members and larger uncertainty. Thus, ensemble spread can be evaluated using binned average spread–skill plots (Wang and Bishop 2003; McLay et al. 2008). Figure 4 displays binned average spread–skill scatterplots and fitted linear relationships for 24-, 48-, and 72-h forecasts. For clarity, we show three representative methods: LR as a traditional regression model, 3D Pix2Pix as the best DL performer besides ST-TransNet, and the proposed ST-TransNet. The RMSE denotes the root-mean-square forecast error (control forecasts against observations). The ordered spread–RMSE pairs are divided into 20 bins, with bin-averaged values plotted. The dotted gray line depicts perfect correlation between spread and RMSE. For precipitation forecasting, all experiments exhibit slight underdispersion, with smaller spread than RMSE. At 24 h, LR agrees more closely with forecast errors for lower spreads. For higher spreads, ST-TransNet maintains the spread–skill relationship closest to the ideal situation. At 48 and 72 h, ST-TransNet sustains the strongest alignment with perfect correlation. LR still performs better than 3D Pix2Pix for lower spreads but worse for higher spreads. Generally, ST-TransNet demonstrates the most favorable relationship between predicted spread and forecast errors.
Scatter diagram involving bin-mean ensemble spread and bin-mean RMSE of (a) 24-h forecasts, (b) 48-h forecasts, and (c) 72-h forecasts. The solid lines with different colors denote the least squares fit lines to scatter points for different experiments. The gray dotted line indicates a perfect spread–skill line. The binning process is based on 20 bins and makes use of all the testing datasets.
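The binning procedure behind such a spread–skill diagram can be sketched as follows. This is an illustrative version only: the function name, the equal-population bins, the use of the plain bin-mean spread, and the synthetic data (errors drawn with a standard deviation equal to the spread, i.e., a well-calibrated case) are assumptions.

```python
import numpy as np


def binned_spread_skill(spread, control, observed, n_bins=20):
    """Return bin-mean spread and bin RMSE for a spread-skill diagram.

    Pairs are ordered by spread and split into n_bins groups of
    (nearly) equal size.
    """
    spread = np.ravel(spread)
    sq_err = (np.ravel(control) - np.ravel(observed)) ** 2
    order = np.argsort(spread)
    spread_bins = np.array_split(spread[order], n_bins)
    error_bins = np.array_split(sq_err[order], n_bins)
    bin_spread = np.array([b.mean() for b in spread_bins])
    # RMSE within each bin: square root of the mean squared error.
    bin_rmse = np.array([np.sqrt(b.mean()) for b in error_bins])
    return bin_spread, bin_rmse


# Synthetic, well-calibrated example: error std equals the spread.
rng = np.random.default_rng(0)
toy_spread = rng.uniform(0.5, 10.0, size=5000)
toy_control = rng.gamma(2.0, 5.0, size=5000)
toy_observed = toy_control - rng.normal(0.0, toy_spread)
bin_spread, bin_rmse = binned_spread_skill(toy_spread, toy_control, toy_observed)
print(np.round(bin_spread, 2))
print(np.round(bin_rmse, 2))
```

Points lying above the one-to-one line (RMSE larger than spread) would indicate underdispersion, as reported for all experiments in Fig. 4.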
To evaluate the proposed model both qualitatively and quantitatively, we present a case study to show the individual comparison results of different methods. Figure 5 shows the spread estimation of 72-h precipitation forecasts initialized at 0000 UTC 15 August 2021. Each result in Fig. 5 represents the last 6-h prediction results. Due to limited space, results are shown every 12 h. The first row displays the deterministic control forecasts from the ensemble, which is also the input sequence of our model. Correspondingly, the second row represents the ground truth of ensemble spread, indicating the uncertainty of precipitation forecasts. As shown in the first two rows, high uncertainty occurred in the Indian Ocean, where a large area of strong precipitation took place. Additionally, clear uncertainty appears in the northwest Pacific south of Japan and the East China Sea for the 12- and 24-h predictions. As time increases, the main precipitation shifts to the East China Sea and the sea between Japan and South Korea, where high uncertainty is also observed. Another region of high uncertainty is located in 112°–118°E, 55°–65°N after 36- and 48-h predictions. Overall, LR substantially overestimates the spreads for the first few forecasting steps and underestimates the spreads at later periods. Compared to all methods, the proposed ST-TransNet best captures the main uncertainties in the Indian Ocean and northwest Pacific at 12- and 24-h predictions. U-Net and 3D Pix2Pix slightly overestimate the uncertainty in the northwest Pacific Ocean. As forecast time increases, almost all methods except LR overestimate the uncertainty around 112°–118°E, 55°–65°N. For longer 60–72-h predictions, although all methods successfully predict the uncertainty between Japan and South Korea, ST-TransNet still yields the best agreements with the ground truth. In summary, the proposed ST-TransNet demonstrates great skill in estimating ensemble spread.
The top row shows a case study of uncertainty estimation of different methods from deterministic forecasts. The second row represents the ground truth of ensemble spread. The other rows show the prediction results from different methods. The predictions are initialized at 0000 UTC 15 Aug 2021; each time represents the last 6-h prediction.
b. Uncertainty estimation by probabilistic forecasts
The assessment of the predicted spread serves as an initial evaluation of our model’s effectiveness. While spread can offer a reference for indicating confidence in forecasts, it provides limited information about the specific range of forecasts, especially considering precipitation does not closely follow a normal distribution. Therefore, to comprehensively quantify uncertainty in precipitation forecasting, this study proceeds to examine the probabilistic forecasting ability of the ST-TransNet. Unlike deterministic forecasts, which provide a single predicted precipitation value, probabilistic forecasts indicate the likelihood of occurrence for potential precipitation events. This inherently encodes uncertainty information, providing insights into the confidence level associated with the forecast.
Table 2 presents the quantitative evaluation results against labels from ensemble forecasts. DL-based models are trained on the historical ensemble forecasts. Five thresholds, 0.1, 1, 5, 10, and 25 mm (6 h)−1, are selected to show the general performance at different precipitation levels. The scores of all models decrease as the threshold increases, indicating better performance for lighter rain owing to the imbalanced samples. ST-TransNet has higher CSI and HSS scores than other models for all thresholds except 25 mm. For heavier rainfall (>25 mm), ViViT is more skillful than ST-TransNet in terms of CSI and HSS. On average, the proposed ST-TransNet outperforms other baselines for probabilistic forecasting. Apart from ST-TransNet, TimeSformer obtains relatively higher scores for smaller thresholds like 0.1 and 1.0 mm. ViViT performs better for heavier rainfall, which is also found in the spread estimation task. In comparison, U-Net has the lowest CSI and HSS scores.
CSI and HSS scores for different precipitation thresholds (mm). Scores are averaged over probability thresholds from 0.1 to 0.9. The bold values indicate the best scores.
Figure 6 shows the evolution of CSI and HSS scores as forecast time increases. Both scores are averaged on the whole test dataset, with the thresholds of 0.1 and 1 mm given as examples. For 0.1 mm (top of Fig. 6), ST-TransNet clearly scores better than all baselines for both CSI and HSS, especially in early forecasting periods. Except for U-Net, the gap between ST-TransNet and other baselines decreases gradually over time. In contrast, U-Net obtains the lowest scores among all models, with scores decreasing quickly as time increases. For 1.0 mm (bottom of Fig. 6), ST-TransNet maintains improvements over baselines across all forecast periods. Apart from ST-TransNet, TimeSformer shows better forecast skills than other baselines, while U-Net still performs worst. Overall, the proposed ST-TransNet achieves the highest scores over all forecast periods, demonstrating the best performance for probabilistic precipitation forecasts.
The CSI and HSS score evolution over forecast times of different methods at the threshold of (top) 0.1 mm and (bottom) 1 mm.
Citation: Monthly Weather Review 152, 5; 10.1175/MWR-D-23-0097.1
The model performance is also evaluated using the BS, a proper scoring rule equivalent to the mean-squared error between the predicted probabilities and ground truth. The ground truth is obtained by interpolating the GPM IMERG products to the spatial resolution of the model output and setting values to 1 where precipitation exceeds the threshold and 0 otherwise. Figure 7 shows the BS at different forecast times, averaged over the five thresholds (0.1, 1, 5, 10, and 25 mm) and the whole test dataset. Because the BS is negatively oriented, lower values indicate better forecasts. Within the first 36 h, U-Net shows a relatively low BS, whereas ST-TransNet obtains higher scores at the beginning of the forecast. After 36 h, the performance of U-Net declines as its BS increases. Meanwhile, the proposed ST-TransNet gradually outperforms the other baselines, especially late in the forecast period. Apart from ST-TransNet, ViViT is slightly better than the other baselines at most forecast times. Therefore, when verified against the observational ground truth, the proposed ST-TransNet shows more apparent improvements for longer-range forecasts.
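The BS computation described here can be sketched as follows. The sketch assumes the observed precipitation has already been interpolated to the model grid (the interpolation step is omitted), and the function names and the dictionary layout are hypothetical.

```python
import numpy as np

def brier_score(pred_prob, obs_precip, threshold):
    """Mean-squared error between predicted probability and the binary event."""
    # Binary ground truth: 1 where precipitation exceeds the threshold, else 0.
    obs_event = (np.asarray(obs_precip, dtype=float) > threshold).astype(float)
    pred_prob = np.asarray(pred_prob, dtype=float)
    return float(np.mean((pred_prob - obs_event) ** 2))

def mean_brier_score(pred_prob_by_threshold, obs_precip):
    """Average the BS over thresholds, e.g., 0.1, 1, 5, 10, and 25 mm.

    pred_prob_by_threshold maps each threshold (mm) to the predicted
    exceedance probabilities on the same grid as obs_precip.
    """
    scores = [brier_score(prob, obs_precip, thr)
              for thr, prob in pred_prob_by_threshold.items()]
    return float(np.mean(scores))
```

A perfect forecast gives a BS of 0, and a forecast that is always wrong with full confidence gives 1.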
BS evolution over forecast times for probabilistic precipitation forecasts from different methods. The ordering of the bars follows the ordering of the legend in different colors.
Similar to the spread prediction results, we present a case study to visually evaluate the probabilistic forecasts. Figure 8 compares observations with predictions from different methods for three thresholds. We show the 48-h forecasts initialized on 15 August 2021, where each prediction represents the precipitation accumulated over the past 6 h. The GPM precipitation product is used as ground truth, and the label refers to the probability forecast derived from ensemble forecasts. The 0.1-mm threshold (Fig. 8a) essentially shows whether precipitation occurred. Compared to ground truth, all models generally predict the main precipitation occurrences. However, larger uncertainty is found in the U-Net probabilistic forecasts. For example, in northwestern China, where precipitation above 0.1 mm is predicted confidently by the ensemble forecast and ST-TransNet, U-Net’s probability remains low, indicating high uncertainty. For 1 mm (Fig. 8b), the precipitation distribution becomes more concentrated and localized. Again, U-Net yields more uncertainty than other models, especially at the boundaries of precipitation areas. A similar situation appears for 3D Pix2Pix. In comparison, ST-TransNet agrees better with the ensemble forecast probabilities. As the threshold increases to 5 mm (Fig. 8c), the area of precipitation occurrence continues to decrease. All models predict the main precipitation around the East China Sea with high confidence. However, for the precipitation in central China (100°–110°E, 30°–40°N), all models, including the traditional ensemble forecast, have a position bias, placing the precipitation north of the ground truth. U-Net, 3D U-Net, and 3D Pix2Pix are more uncertain in predicting this precipitation, leading to lower probabilities. In comparison, ST-TransNet shows higher confidence at the 5-mm threshold. Another example is found in the north of the study area (100°–120°E, 50°–60°N), where heavy rainfall occurred. The proposed ST-TransNet clearly outperforms all baselines, making the best probabilistic forecast.
A case study of 48-h probability forecasting of precipitation at different thresholds. The prediction is initialized at 0000 UTC 15 Aug 2021, and the results represent the 6-h accumulated precipitation.
6. Discussion and conclusions
In this study, we mainly focus on estimating precipitation forecast uncertainty from a single deterministic forecast with a deep learning model. Whereas traditional methods rely on computationally expensive ensemble forecasts to estimate uncertainty, our study aims to showcase the potential of a neural network to learn and estimate precipitation forecast uncertainty without the need for ensembles. Once trained, the model’s inference at testing time takes only seconds, making it much more cost-effective than running NWP ensemble forecasts.
We select precipitation as the target variable due to its significant impact on daily life and socioeconomic factors. Unlike other meteorological variables such as temperature, precipitation is highly localized and spatially discontinuous, posing a challenge for accurate prediction. Given that precipitation and its errors do not closely follow a normal distribution, relying solely on ensemble spread might not offer sufficient uncertainty information. Thus, our study trains the proposed model on both ensemble spread and probabilistic forecasts to comprehensively quantify forecast uncertainty. The uncertainty estimation task is formulated as a spatiotemporal sequence-to-sequence translation problem. We introduce a novel spatiotemporal transformer network, ST-TransNet, to explore the ability of deep learning to estimate precipitation forecast uncertainty. ST-TransNet employs a hierarchical encoder–decoder structure to learn multiscale forecast features. Transformer-based blocks extract spatiotemporal correlations, using window-based attention to capture localized precipitation patterns and global information. To ensure that the predictions resemble the ground truth at both low and high frequencies, adversarial training is used. The proposed model is trained and evaluated on the TIGGE ensemble forecasting dataset and the GPM IMERG precipitation product. The effectiveness of ST-TransNet is verified, and its performance is compared with that of other baselines. Results demonstrate that the proposed ST-TransNet generally outperforms all baselines on both the spread estimation and probabilistic forecast tasks. This shows the capacity of a well-designed deep learning model to estimate precipitation forecast uncertainty directly from a single deterministic forecast.
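For concreteness, the two kinds of training targets mentioned above can be derived from an ensemble as in the sketch below. It assumes the spread is the standard deviation across members and the probabilistic forecast is the fraction of members exceeding a threshold; these are common conventions and the function names are hypothetical, so the exact definitions used for the labels in this study may differ.

```python
import numpy as np

def ensemble_spread(members):
    """Standard deviation across ensemble members (axis 0)."""
    # Assumption: population standard deviation (ddof=0).
    return np.std(np.asarray(members, dtype=float), axis=0)

def exceedance_probability(members, threshold):
    """Fraction of members whose precipitation exceeds the threshold."""
    return np.mean(np.asarray(members, dtype=float) > threshold, axis=0)
```

For an array of shape (members, time, lat, lon), both functions return fields of shape (time, lat, lon), which is the form of target a sequence-to-sequence model would be trained to reproduce from a single deterministic forecast.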
This study highlights the promising potential for neural networks to efficiently infer precipitation forecast uncertainties. Considering computational efficiency, it could become a valuable reference in operational contexts, particularly for policymakers with limited computational resources. The proposed model can bypass the complexity of running every ensemble member while still providing uncertainty information about precipitation through spread estimation and probabilistic forecasts. Despite this progress, the study has some limitations and room for improvement. First, the proposed model, while effective, cannot provide the full set of ensemble members with their multimodality. Rather than replacing ensemble forecasts, the proposed model might serve as guidance so that limited resources can be directed to generating sufficient ensemble forecasts in situations with significant uncertainty. Moreover, this study specifically targets uncertainty in precipitation forecasts. Given the high correlation of precipitation with other variables such as moisture flux and geopotential height, future work should include additional features and explore further improvements in estimating precipitation forecast uncertainty. Last, the network presented still relies on an ensemble forecast dataset; as the quality of ensemble forecasts improves in terms of the number of members and resolution, the proposed network’s performance can continue to advance.
Acknowledgments.
This work was supported by the Natural Science Foundation of China (41975066). The authors thank the editor and anonymous reviewers for their constructive comments.
Data availability statement.
CMA ensemble forecast data can be retrieved through the TIGGE archive (https://apps.ecmwf.int/datasets/data/tigge). The IMERG precipitation data are also available online (https://gpm.nasa.gov/data/directory).
APPENDIX
Details of Model Architectures and Configurations
a. ST-TransNet
The patch merging and expanding modules enable the model to extract features with multiple scales in both encoder and decoder. We incorporate skip connections within our model, establishing direct links between corresponding scales in both the encoder and the decoder (Fig. 2a). These skip connections enhance information flow throughout the network by creating direct connections between different layers. The operational principle of a skip connection involves adding the output of one layer to that of a subsequent layer. The features from the encoder are then concatenated to the corresponding decoder stage at the same scale, contributing to the reconstruction of high-level representations. By facilitating connections at different scales, skip connections enable the network to access both low-level and high-level information. This, in turn, promotes more effective information flow and preserves intricate details during both the encoding and decoding phases. Ultimately, skip connections empower the model to learn representations across diverse scales of features.
The discriminator architecture is defined by incorporating three convolutional layers, each with a distinct number of filters: 32, 64, and 1. After each convolution operation, a leaky rectified linear unit (ReLU) activation function is employed to introduce nonlinearity into the model. This activation function allows the network to capture complex patterns and relationships within the data. Specifically, the leaky ReLU activation function is chosen due to its ability to handle gradient-related issues during training, enabling the model to learn more effectively. The final layer of the discriminator employs a sigmoid activation function. This choice is motivated by the need to classify each output patch as either real or fake. The sigmoid activation function, which squashes the output values between 0 and 1, is well suited for binary classification tasks, making it suitable for the discriminator’s final decision-making layer.
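A forward pass through a discriminator of this form can be sketched in NumPy as follows. Only the filter counts (32, 64, and 1), the leaky ReLU activations, and the final sigmoid come from the description above. The use of 2D convolutions, the 3 × 3 kernels, the strides, the absence of padding, the leaky ReLU slope of 0.2, and the random initialization are all assumptions made for illustration, and the sigmoid is assumed to replace the leaky ReLU after the last convolution.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d(x, weight, bias, stride):
    """Valid 2D convolution. x: (C_in, H, W); weight: (C_out, C_in, k, k)."""
    k = weight.shape[-1]
    # windows: (C_in, H - k + 1, W - k + 1, k, k)
    windows = sliding_window_view(x, (k, k), axis=(1, 2))
    windows = windows[:, ::stride, ::stride]
    out = np.einsum("chwij,ocij->ohw", windows, weight)
    return out + bias[:, None, None]

def leaky_relu(x, slope=0.2):
    # Assumed negative slope; the text does not specify it.
    return np.where(x > 0, x, slope * x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_discriminator(in_channels, filters=(32, 64, 1), kernel=3, seed=0):
    """Random weights for three convolutional layers (illustrative only)."""
    rng = np.random.default_rng(seed)
    params, c_in = [], in_channels
    for c_out in filters:
        w = rng.normal(0.0, 0.02, size=(c_out, c_in, kernel, kernel))
        params.append((w, np.zeros(c_out)))
        c_in = c_out
    return params

def discriminator(x, params, strides=(2, 2, 1)):
    """Return a map of per-patch probabilities that the input is real."""
    h = x
    last = len(params) - 1
    for i, ((w, b), s) in enumerate(zip(params, strides)):
        h = conv2d(h, w, b, stride=s)
        # Leaky ReLU after the hidden layers, sigmoid after the final one.
        h = sigmoid(h) if i == last else leaky_relu(h)
    return h
```

With these assumed settings, a two-channel 32 × 32 input yields a 1 × 5 × 5 output, each element scoring one receptive-field patch as real or fake.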
b. Baselines
The U-Net (Ronneberger et al. 2015) architecture is a convolutional neural network designed for image segmentation tasks, and it is widely used in medical image analysis. It consists of an encoder and a decoder, forming a U-shaped structure. To adapt it to the spatiotemporal prediction task, the input sequence is concatenated along the channel dimension. The input passes through four convolutional blocks, each followed by a maximum pooling operation. These blocks progressively reduce spatial resolution while doubling the number of channels. The decoder consists of four up-convolutional blocks, each followed by a concatenation operation and a convolutional block. The purpose of these blocks is to upsample the spatial resolution while maintaining rich feature information. Skip connections are incorporated between corresponding scales in the encoder and decoder. These connections concatenate feature maps from the encoder with those from the decoder, aiding in the preservation of spatial details during upsampling. The 3D U-Net incorporates the temporal dimension by replacing 2D operations with 3D operations, preserving spatial relationships.
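The input handling and the encoder geometry described above can be illustrated with the short sketch below. It assumes 2 × 2 pooling and a base width of 64 channels, as in the original U-Net; the width actually used for this baseline is not stated here, and the function names are hypothetical.

```python
import numpy as np

def stack_time_into_channels(seq):
    """Fold the time axis into channels: (T, C, H, W) -> (T * C, H, W)."""
    t, c, h, w = seq.shape
    return seq.reshape(t * c, h, w)

def encoder_shapes(height, width, base_channels=64, depth=4):
    """Feature-map shapes (channels, H, W) produced by each encoder block."""
    shapes, channels = [], base_channels
    for _ in range(depth):
        shapes.append((channels, height, width))
        # Assumed 2 x 2 max pooling halves the resolution; channels double.
        height, width = height // 2, width // 2
        channels *= 2
    return shapes
```

For a 64 × 64 input with 16 base channels, the four blocks would produce maps of 16 × 64 × 64, 32 × 32 × 32, 64 × 16 × 16, and 128 × 8 × 8.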
The Pix2Pix model is designed for classical paired image-to-image translation problems. To incorporate time embedding, Brecht and Bihlo (2023a) extend it to a 3D Pix2Pix model. It is based on a conditional GAN, in which the generator is a three-dimensional encoder–decoder U-Net consisting of several three-dimensional convolutional layers. The output of each convolutional layer is then concatenated to the transposed convolutional layers, which increase the latent dimension back to the original input shape. The discriminator D is a PatchGAN using the same encoder blocks as the generator. A total of three layers with 32, 64, and 1 filters, respectively, are used. The last layer uses a sigmoid activation function.
TimeSformer (Bertasius et al. 2021) is a variant of the ViT designed for processing temporal sequences. It extends the transformer architecture to effectively capture temporal dependencies in sequential data. The input to the TimeSformer is a sequence of frames. Each frame is treated as a token, and the entire sequence is represented as a 1D sequence of tokens. Similar to ViT, the input sequence is divided into fixed-size patches, and each patch is linearly embedded into a high-dimensional space. This process allows the model to capture local features within each patch. TimeSformer proposes an efficient architecture named “Divided Space–Time Attention,” where temporal attention and spatial attention are separately applied one after the other. Notably, we modified the original TimeSformer by adding a transpose convolutional layer to restore the spatial resolution to be consistent with input.
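The divided space–time attention pattern can be sketched as below. This is a simplified, single-head illustration with residual connections only; layer normalization, the MLP sublayers, multiple heads, positional embeddings, and the classification token of the original TimeSformer are omitted, and the projection matrices passed in are hypothetical placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product attention. tokens: (..., L, D)."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def divided_space_time_attention(x, time_params, space_params):
    """x: (T, N, D) with T frames, N patches per frame, D embedding size."""
    # Temporal attention: each patch attends to the same location in all frames.
    xt = np.swapaxes(x, 0, 1)                    # (N, T, D)
    xt = xt + self_attention(xt, *time_params)   # residual connection
    x = np.swapaxes(xt, 0, 1)                    # (T, N, D)
    # Spatial attention: each patch attends to all patches in its own frame.
    x = x + self_attention(x, *space_params)
    return x
```

Applying the two attentions in sequence compares each token with T + N others rather than with all T × N tokens, which is the source of the efficiency gain over joint space–time attention.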
Similar to TimeSformer, ViViT (Arnab et al. 2021) is a transformer-based video-processing model designed to effectively analyze and comprehend video sequences. ViViT starts by breaking down video frames into spatial–temporal patches, extending the embedding technique of ViT into the three-dimensional domain. What distinguishes ViViT is its innovative approach to self-attention computation on sequences of spatiotemporal tokens. This is achieved through a factorized self-attention strategy, employing two distinct transformer encoders. The first encoder focuses solely on spatial self-attention, while the subsequent encoder operates exclusively in the temporal dimension. In alignment with TimeSformer’s methodology, we modified the original architecture by introducing a transpose convolutional layer. This serves the purpose of restoring spatial resolution, ensuring consistency with the input frames, which effectively tackles the uncertainty estimation problem formulated in this study.
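The first step, breaking a video into spatial–temporal patches, can be sketched as a pure reshaping operation. The sketch assumes a channels-last layout and non-overlapping patches whose sizes divide the input evenly; the function name and the patch sizes are hypothetical, and the subsequent linear projection and the two transformer encoders are not shown.

```python
import numpy as np

def extract_tubelets(video, t, h, w):
    """Split a video into non-overlapping spatiotemporal patches.

    video: (T, H, W, C) -> (T // t, H // h, W // w, t * h * w * C)
    """
    T, H, W, C = video.shape
    assert T % t == 0 and H % h == 0 and W % w == 0, "sizes must divide evenly"
    nt, nh, nw = T // t, H // h, W // w
    x = video.reshape(nt, t, nh, h, nw, w, C)
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)         # (nt, nh, nw, t, h, w, C)
    return x.reshape(nt, nh, nw, t * h * w * C)  # one flat vector per patch
```

Each flattened patch would then be linearly embedded into a token; in the factorized scheme, the spatial encoder operates over the nh × nw tokens that share a temporal index, and the temporal encoder operates across the nt indices.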
REFERENCES
Arnab, A., M. Dehghani, G. Heigold, C. Sun, M. Lucic, and C. Schmid, 2021: ViViT: A video vision transformer. 2021 IEEE/CVF Int. Conf. on Computer Vision (ICCV), Montreal, QC, Canada, Institute of Electrical and Electronics Engineers, 6816–6826, https://doi.org/10.1109/ICCV48922.2021.00676.
Ayzel, G., T. Scheffer, and M. Heistermann, 2020: RainNet v1.0: A convolutional neural network for radar-based precipitation nowcasting. Geosci. Model Dev., 13, 2631–2644, https://doi.org/10.5194/gmd-13-2631-2020.
Barnston, A. G., 1992: Correspondence among the correlation, RMSE, and Heidke forecast verification measures; refinement of the Heidke score. Wea. Forecasting, 7, 699–709, https://doi.org/10.1175/1520-0434(1992)007<0699:CATCRA>2.0.CO;2.
Bauer, P., A. Thorpe, and G. Brunet, 2015: The quiet revolution of numerical weather prediction. Nature, 525, 47–55, https://doi.org/10.1038/nature14956.
Bertasius, G., H. Wang, and L. Torresani, 2021: Is space-time attention all you need for video understanding? arXiv, 2102.05095v4, https://doi.org/10.48550/arXiv.2102.05095.
Bi, K., L. Xie, H. Zhang, X. Chen, X. Gu, and Q. Tian, 2022: Pangu-weather: A 3D high-resolution model for fast and accurate global weather forecast. arXiv, 2211.02556v1, https://doi.org/10.48550/arXiv.2211.02556.
Bihlo, A., 2021: A generative adversarial network approach to (ensemble) weather prediction. Neural Networks, 139, 1–16, https://doi.org/10.1016/j.neunet.2021.02.003.
Bougeault, P., and Coauthors, 2010: The THORPEX interactive grand global ensemble. Bull. Amer. Meteor. Soc., 91, 1059–1072, https://doi.org/10.1175/2010BAMS2853.1.
Boukabara, S.-A., V. M. Krasnopolsky, J. Q. Stewart, E. S. Maddy, N. Shahroudi, and R. N. Hoffman, 2019: Leveraging modern artificial intelligence for remote sensing and NWP: Benefits and challenges. Bull. Amer. Meteor. Soc., 100, ES473–ES491, https://doi.org/10.1175/BAMS-D-18-0324.1.
Brecht, R., and A. Bihlo, 2023a: Computing the ensemble spread from deterministic weather predictions using conditional generative adversarial networks. Geophys. Res. Lett., 50, e2022GL101452, https://doi.org/10.1029/2022GL101452.
Brecht, R., and A. Bihlo, 2023b: Towards replacing precipitation ensemble predictions systems using machine learning. arXiv, 2304.10251v1, https://doi.org/10.48550/arXiv.2304.10251.
Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. Mon. Wea. Rev., 78, 1–3, https://doi.org/10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2.
Buizza, R., and T. N. Palmer, 1998: Impact of ensemble size on ensemble prediction. Mon. Wea. Rev., 126, 2503–2518, https://doi.org/10.1175/1520-0493(1998)126<2503:IOESOE>2.0.CO;2.
Buizza, R., P. L. Houtekamer, G. Pellerin, Z. Toth, Y. Zhu, and M. Wei, 2005: A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems. Mon. Wea. Rev., 133, 1076–1097, https://doi.org/10.1175/MWR2905.1.
Buizza, R., M. Leutbecher, and L. Isaksen, 2008: Potential use of an ensemble of analyses in the ECMWF Ensemble Prediction System. Quart. J. Roy. Meteor. Soc., 134, 2051–2066, https://doi.org/10.1002/qj.346.
Chapman, W. E., L. D. Monache, S. Alessandrini, A. C. Subramanian, F. M. Ralph, S.-P. Xie, S. Lerch, and N. Hayatbini, 2022: Probabilistic predictions from deterministic atmospheric river forecasts with deep learning. Mon. Wea. Rev., 150, 215–234, https://doi.org/10.1175/MWR-D-21-0106.1.
Chen, Z., and X. Yu, 2021: A novel tensor network for tropical cyclone intensity estimation. IEEE Trans. Geosci. Remote Sens., 59, 3226–3243, https://doi.org/10.1109/TGRS.2020.3017709.
Çiçek, Ö., A. Abdulkadir, S. S. Lienkamp, T. Brox, and O. Ronneberger, 2016: 3D U-Net: Learning dense volumetric segmentation from sparse annotation. arXiv, 1606.06650v1, https://doi.org/10.48550/arXiv.1606.06650.
Clare, M. C. A., O. Jamil, and C. J. Morcrette, 2021: Combining distribution-based neural networks to predict weather forecast probabilities. Quart. J. Roy. Meteor. Soc., 147, 4337–4357, https://doi.org/10.1002/qj.4180.
Clark, A. J., 2017: Generation of ensemble mean precipitation forecasts from convection-allowing ensembles. Wea. Forecasting, 32, 1569–1583, https://doi.org/10.1175/WAF-D-16-0199.1.
Dosovitskiy, A., and Coauthors, 2021: An image is worth 16 × 16 words: Transformers for image recognition at scale. arXiv, 2010.11929v2, https://doi.org/10.48550/arXiv.2010.11929.
Du, J., S. L. Mullen, and F. Sanders, 1997: Short-range ensemble forecasting of quantitative precipitation. Mon. Wea. Rev., 125, 2427–2459, https://doi.org/10.1175/1520-0493(1997)125<2427:SREFOQ>2.0.CO;2.
Duan, W., J. Ma, and S. Vannitsem, 2022: An ensemble forecasting method for dealing with the combined effects of the initial and model errors and a potential deep learning implementation. Mon. Wea. Rev., 150, 2959–2976, https://doi.org/10.1175/MWR-D-22-0007.1.
Epstein, E. S., 1969: Stochastic dynamic prediction. Tellus, 21, 739–759, https://doi.org/10.3402/tellusa.v21i6.10143.
Espeholt, L., and Coauthors, 2022: Deep learning for twelve hour precipitation forecasts. Nat. Commun., 13, 5145, https://doi.org/10.1038/s41467-022-32483-x.
Evans, C., D. F. V. Dyke, and T. Lericos, 2014: How do forecasters utilize output from a convection-permitting ensemble forecast system? Case study of a high-impact precipitation event. Wea. Forecasting, 29, 466–486, https://doi.org/10.1175/WAF-D-13-00064.1.
Fan, H., Y. Liu, Y. Li, Y. Liu, J. Duan, L. Li, and Z. Huo, 2023: A deep learning method for predicting lower troposphere temperature using surface reanalysis. Atmos. Res., 283, 106542, https://doi.org/10.1016/j.atmosres.2022.106542.
Fernández, J. G., and S. Mehrkanoon, 2021: Broad-UNet: Multi-scale feature learning for nowcasting tasks. Neural Networks, 144, 419–427, https://doi.org/10.1016/j.neunet.2021.08.036.
Grönquist, P., T. Ben-Nun, N. Dryden, P. Dueben, L. Lavarini, S. Li, and T. Hoefler, 2019: Predicting weather uncertainty with deep convnets. arXiv, 1911.00630v2, https://doi.org/10.48550/arXiv.1911.00630.
Grönquist, P., C. Yao, T. Ben-Nun, N. Dryden, P. Dueben, S. Li, and T. Hoefler, 2021: Deep learning for post-processing ensemble weather forecasts. Philos. Trans. Roy. Soc., A379, 20200092, https://doi.org/10.1098/rsta.2020.0092.
Hailing, Z., and Z. Pu, 2010: Beating the uncertainties: Ensemble forecasting and ensemble-based data assimilation in modern numerical weather prediction. Adv. Meteor., 2010, 432160, https://doi.org/10.1155/2010/432160.
Hoffman, R. N., and E. Kalnay, 1983: Lagged average forecasting, an alternative to Monte Carlo forecasting. Tellus, 35A, 100–118, https://doi.org/10.3402/tellusa.v35i2.11425.
Huffman, G. J., D. T. Bolvin, D. Braithwaite, K. Hsu, R. Joyce, P. Xie, and S.-H. Yoo, 2015: NASA Global Precipitation Measurement (GPM) Integrated Multi-satellitE Retrievals for GPM (IMERG). Algorithm Theoretical Basis Doc., version 4, 30 pp.
Isola, P., J.-Y. Zhu, T. Zhou, and A. A. Efros, 2017: Image-to-image translation with conditional adversarial networks. 2017 IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, Institute of Electrical and Electronics Engineers, 5967–5976, https://doi.org/10.1109/CVPR.2017.632.
Jolliffe, I. T., and D. B. Stephenson, 2012: Forecast Verification: A Practitioner’s Guide in Atmospheric Science. John Wiley and Sons, 304 pp.
Kingma, D. P., and J. Ba, 2015: Adam: A method for stochastic optimization. arXiv, 1412.6980v9, https://doi.org/10.48550/arXiv.1412.6980.
Lam, R., and Coauthors, 2022: GraphCast: Learning skillful medium-range global weather forecasting. arXiv, 2212.12794v2, https://doi.org/10.48550/arXiv.2212.12794.
Leutbecher, M., and T. N. Palmer, 2008: Ensemble forecasting. J. Comput. Phys., 227, 3515–3539, https://doi.org/10.1016/j.jcp.2007.02.014.
Lin, J., K. A. Emanuel, and J. L. Vigh, 2020: Forecasts of hurricanes using large-ensemble outputs. Wea. Forecasting, 35, 1713–1731, https://doi.org/10.1175/WAF-D-19-0255.1.
Liu, Z., Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, 2021: Swin Transformer: Hierarchical vision transformer using shifted windows. 2021 IEEE/CVF Int. Conf. on Computer Vision (ICCV), Montreal, QC, Canada, Institute of Electrical and Electronics Engineers, 9992–10 002, https://doi.org/10.1109/ICCV48922.2021.00986.
Liu, Z., J. Ning, Y. Cao, Y. Wei, Z. Zhang, S. Lin, and H. Hu, 2022: Video Swin Transformer. 2022 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, Institute of Electrical and Electronics Engineers, 3192–3201, https://doi.org/10.1109/CVPR52688.2022.00320.
Mackey, B. P., and T. N. Krishnamurti, 2001: Ensemble forecast of a typhoon flood event. Wea. Forecasting, 16, 399–415, https://doi.org/10.1175/1520-0434(2001)016<0399:EFOATF>2.0.CO;2.
McLay, J. G., C. H. Bishop, and C. A. Reynolds, 2008: Evaluation of the ensemble transform analysis perturbation scheme at NRL. Mon. Wea. Rev., 136, 1093–1108, https://doi.org/10.1175/2007MWR2010.1.
Mureau, R., F. Molteni, and T. N. Palmer, 1993: Ensemble prediction using dynamically conditioned perturbations. Quart. J. Roy. Meteor. Soc., 119, 299–323, https://doi.org/10.1002/qj.49711951005.
Nilsen, G. K., A. Z. Munthe-Kaas, H. J. Skaug, and M. Brun, 2022: Epistemic uncertainty quantification in deep learning classification by the Delta method. Neural Networks, 145, 164–176, https://doi.org/10.1016/j.neunet.2021.10.014.
Pathak, J., and Coauthors, 2022: FourCastNet: A global data-driven high-resolution weather model using adaptive Fourier neural operators. arXiv, 2202.11214v1, https://doi.org/10.48550/arXiv.2202.11214.
Rasp, S., P. D. Dueben, S. Scher, J. A. Weyn, S. Mouatadid, and N. Thuerey, 2020: WeatherBench: A benchmark data set for data-driven weather forecasting. J. Adv. Model. Earth Syst., 12, e2020MS002203, https://doi.org/10.1029/2020MS002203.
Reichstein, M., G. Camps-Valls, B. Stevens, M. Jung, J. Denzler, N. Carvalhais, and Prabhat, 2019: Deep learning and process understanding for data-driven Earth system science. Nature, 566, 195–204, https://doi.org/10.1038/s41586-019-0912-1.
Ronneberger, O., P. Fischer, and T. Brox, 2015: U-Net: Convolutional networks for biomedical image segmentation. arXiv, 1505.04597v1, https://doi.org/10.48550/arXiv.1505.04597.
Scher, S., and G. Messori, 2018: Predicting weather forecast uncertainty with machine learning. Quart. J. Roy. Meteor. Soc., 144, 2830–2841, https://doi.org/10.1002/qj.3410.
Scher, S., and G. Messori, 2021: Ensemble methods for neural network-based weather forecasts. J. Adv. Model. Earth Syst., 13, e2020MS002331, https://doi.org/10.1029/2020MS002331.
Schmidhuber, J., 2015: Deep learning in neural networks: An overview. Neural Networks, 61, 85–117, https://doi.org/10.1016/j.neunet.2014.09.003.
Schultz, M. G., C. Betancourt, B. Gong, F. Kleinert, M. Langguth, L. H. Leufen, A. Mozaffari, and S. Stadtler, 2021: Can deep learning beat numerical weather prediction? Philos. Trans. Roy. Soc., A379, 20200097, https://doi.org/10.1098/rsta.2020.0097.
Shi, X., Z. Chen, H. Wang, D. Y. Yeung, W. K. Wong, and W. C. Woo, 2015: Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural. Inf. Process Syst., 28, 802–810.
Shi, X., Z. Gao, L. Lausen, H. Wang, D.-Y. Yeung, W.-K. Wong, and W.-C. Woo, 2017: Deep learning for precipitation nowcasting: A benchmark and a new model. arXiv, 1706.03458v2, https://doi.org/10.48550/arXiv.1706.03458.
Toth, Z., and E. Kalnay, 1993: Ensemble forecasting at NMC: The generation of perturbations. Bull. Amer. Meteor. Soc., 74, 2317–2330, https://doi.org/10.1175/1520-0477(1993)074<2317:EFANTG>2.0.CO;2.
Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. Mon. Wea. Rev., 125, 3297–3319, https://doi.org/10.1175/1520-0493(1997)125<3297:EFANAT>2.0.CO;2.
Vannitsem, S., and Coauthors, 2021: Statistical postprocessing for weather forecasts: Review, challenges, and avenues in a big data world. Bull. Amer. Meteor. Soc., 102, E681–E699, https://doi.org/10.1175/BAMS-D-19-0308.1.
Wang, X., and C. H. Bishop, 2003: A comparison of breeding and ensemble transform Kalman filter ensemble forecast schemes. J. Atmos. Sci., 60, 1140–1158, https://doi.org/10.1175/1520-0469(2003)060<1140:ACOBAE>2.0.CO;2.
Wang, Z., A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, 2004: Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process., 13, 600–612, https://doi.org/10.1109/TIP.2003.819861.
Wei, M., Z. Toth, R. Wobus, and Y. Zhu, 2008: Initial perturbations based on the ensemble transform (ET) technique in the NCEP global operational forecast system. Tellus, 60A, 62–79, https://doi.org/10.1111/j.1600-0870.2007.00273.x.
Zhang, F., X. Wang, and J. Guan, 2021: A novel multi-input multi-output recurrent neural network based on multimodal fusion and spatiotemporal prediction for 0–4 hour precipitation nowcasting. Atmosphere, 12, 1596, https://doi.org/10.3390/atmos12121596.
Zhang, J., J. Feng, H. Li, Y. Zhu, X. Zhi, and F. Zhang, 2021: Unified ensemble mean forecasting of tropical cyclones based on the feature-oriented mean method. Wea. Forecasting, 36, 1945–1959, https://doi.org/10.1175/WAF-D-21-0062.1.
Zhang, R., Q. Liu, R. Hang, and G. Liu, 2022: Predicting tropical cyclogenesis using a deep learning method from gridded satellite and ERA5 reanalysis data in the western North Pacific basin. IEEE Trans. Geosci. Remote Sens., 60, 1–10, https://doi.org/10.1109/TGRS.2021.3069217.
Zhang, Y., M. Long, K. Chen, L. Xing, R. Jin, M. I. Jordan, and J. Wang, 2023: Skilful nowcasting of extreme precipitation with NowcastNet. Nature, 619, 526–532, https://doi.org/10.1038/s41586-023-06184-4.
Zhang, Z., and T. N. Krishnamurti, 1997: Ensemble forecasting of hurricane tracks. Bull. Amer. Meteor. Soc., 78, 2785–2796, https://doi.org/10.1175/1520-0477(1997)078<2785:EFOHT>2.0.CO;2.
Zhuo, J.-Y., and Z.-M. Tan, 2021: Physics-augmented deep learning to improve tropical cyclone intensity and size estimation from satellite imagery. Mon. Wea. Rev., 149, 2097–2113, https://doi.org/10.1175/MWR-D-20-0333.1.