Use of Deep Learning for Weather Radar Nowcasting

Joaquin Cuomo, Colorado State University, Fort Collins, Colorado (https://orcid.org/0000-0002-9608-7766)

and

V. Chandrasekar, Colorado State University, Fort Collins, Colorado

Abstract

Nowcasting based on weather radar uses current and past observations to estimate future radar echoes. There are many types of operationally deployed nowcasting systems, but none of them is currently based on deep learning, despite it being an active area of research in recent years. This paper explores deep learning models as alternatives to current methods by proposing different architectures and comparing them against some operational nowcasting systems. The methods proposed here, harnessing residual convolutional encoder–decoder architectures, reach the level of performance expected of current systems and in certain scenarios can even outperform them. Finally, some of the potential drawbacks of using deep learning are analyzed. No decay in performance was found when the models were applied to a geographical area different from the one they were trained on. No edge or checkerboard artifacts, common in convolutional operations, were found to affect the nowcasting metrics.

Significance Statement

Deep learning methods have become increasingly popular for weather nowcasting, but none is operational. We noticed that most studies do not present a benchmark against operational systems, and in many cases the evaluation methods used were limited to light rain scenarios. Our goal is to propose deep learning models, analyze how they perform against operational methods in many scenarios, and explore some of the potential limitations. We found that our models perform better for low to mild storms. While we showed that common side effects of convolutional operations (a common technique in deep learning) did not impair performance, we agree with many other authors that the major problem is the smoothing effect that hinders the nowcasting of intense storms.

© 2021 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Joaquin Cuomo, jcuomo@alumni.colostate.edu


1. Introduction

Storm nowcasting consists of predicting the behavior of storms from a few minutes to hours ahead of time, allowing local authorities to be warned about hazardous situations. Usually, storm nowcasting relies on data from weather radars, which allow meteorologists to produce highly detailed, location-specific predictions, providing information on the shape, intensity, size, direction, and speed of storms.

The nowcasting systems deployed worldwide mostly rely on optical flow methods, extrapolating the radar echoes to estimate the upcoming weather. Since around 2015, however, machine learning approaches have become more popular for storm nowcasting (Shi et al. 2015), yet to the best of the authors' knowledge, no DL-based model is operational at the time of writing. While some publications have shown DL models to outperform methods based on optical flow [such as Real-time Optical flow by Variational methods for Echoes of Radar (ROVER; Shi et al. 2015); tracking radar echoes by correlation (TREC) and continuous TREC (CO-TREC; Shi et al. 2018); and Farneback (Akbari Asanjan et al. 2018)], few studies compare their methods against operational systems.

The main difference of DL compared to many other methods is that it relies more on the data fed to the network than on a physics-based model. In addition, over the last few years, neural networks have proven to outperform many traditional approaches in fields that require expert feature extraction, and they are now starting to show promising results in nowcasting. In many fields, DL models suffer from a lack of training data, which limits their ability to generalize. However, radar data have been available for many decades, so these techniques are starting to gain attention. We believe that DL-based models have the potential to replace current nowcasting methods in the next few years. Thus, we intend to bridge that gap by proposing three DL models and comparing them with two operational systems and six other methods, four of which are based on optical flow and two on DL. The proposed methods are based on convolutional networks with skip connections (He et al. 2016), recurrent layers (Shi et al. 2015), and an ensemble of several models. The latter aims to mitigate the common problem in DL models of blurry predictions over time (Jing et al. 2019b; Tran and Song 2019; Franch et al. 2020).

After benchmarking the proposed models, we explore some of the limitations and advantages of DL methods and how they perform compared to different approaches. The limitations analyzed concern two inherent aspects of DL approaches: first, the data-driven aspect, which can restrict a model's use to the same geographical areas the training data came from (explained in section 4c); second, the learning aspect, which can produce artifacts in the predictions (explained in section 4d). To facilitate future comparisons with other papers, we present the evaluation metrics most commonly found in the literature: critical success index, false alarm ratio, probability of detection, mean squared error, mean absolute error, equitable threat score, accuracy, and structural similarity (explained in section 3d).

2. Background

The nowcasting of weather radar echoes is the prediction of the distribution, intensity, and appearance of future echo sequences based on historical observations. This problem can be described as a sequence-to-sequence translation of images or as a spatiotemporal prediction problem.

The formalization of the nowcasting problem in the context of this paper is as follows: given a sequence of $n$ past frames $f_{t-n}, f_{t-n+1}, \ldots, f_t$, estimate the $k$ future frames $f_{t+1}, \ldots, f_{t+k}$; that is, the goal is to find a model $M$ such that
$$M(f_{t-n}, f_{t-n+1}, \ldots, f_t) \approx f_{t+1}, \ldots, f_{t+k}.$$
Nowcasting methods have evolved from the earliest human interpretations of what could happen in the next few hours to modern methods that automate the predictions using models of varying complexity. Extrapolation techniques are the most frequent choice when the time lapse is short and the scale is small. Moreover, radar is the primary observation type used to issue warnings, as it is well suited to tracking storms (Wang et al. 2017). To estimate precipitation, the most important measurement from the radars is the reflectivity factor, or Z factor, normally expressed in decibels (dBZ). As Z increases from 0.001 mm⁶ m⁻³ (caused by light fog) to 36 000 000 mm⁶ m⁻³ (large hail), dBZ increases from −30 to 75 dBZ (Milrad 2018).
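For concreteness, the logarithmic scale is simply $\mathrm{dBZ} = 10\log_{10}(Z)$; the following short check (ours, for illustration only) reproduces the end points quoted above:

```python
import numpy as np

def z_to_dbz(z_linear):
    """Convert the linear reflectivity factor Z (mm^6 m^-3) to dBZ."""
    return 10.0 * np.log10(z_linear)

print(z_to_dbz(0.001))       # -30.0 dBZ (light fog)
print(z_to_dbz(36_000_000))  # ~75.6 dBZ (large hail)
```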

Nowcasting systems need to run in real time, so computationally intensive models are often impractical. This is especially the case with numerical weather prediction (NWP), whose standard update time is on the order of hours, whereas meteorological radar observations can have a high temporal sampling on the order of minutes for urban radar networks. There are, however, adaptations of NWP intended to run faster. Because finding globally valid parameters is complex, the methods in use have different advantages and limitations, and they usually perform best for a specific region. Table 1, taken from the 2017 version (the last available) of the Guidelines for Nowcasting Techniques by the World Meteorological Organization (Wang et al. 2017), lists some of the operational systems divided into categories according to how they approach the predictions. Some methods detect the precipitation area and then track it, while others extrapolate the whole radar image. It is common for the latter to use methods originally built on the principles of optical flow to obtain the motion vectors and then apply semi-Lagrangian advection schemes to make the predictions. More sophisticated methods integrate several inputs and different nowcasting algorithms. It is worth noting that no DL approach is listed. After applying one or more of these algorithms, a human forecaster examines and processes the results together with other information, such as changes that could occur during the day, and decides whether severe weather may happen.

Table 1. Literature summary of prediction models from Wang et al. (2017), categorized by their approach.

Table 2 lists different papers using machine learning approaches to address weather radar nowcasting. In 2017, Shi et al. (2017) proposed trajGRU, which outperformed the optical flow–based model ROVER from Woo and Wong (2017). In 2018, Shi et al. (2018) proposed recurrent dynamic CNNs (RDCNN), which also obtained better scores than CO-TREC. In 2019, Jing et al. (2019a) proposed a model [adversarial extrapolation neural network (AENN)] using a generative adversarial network; AENN did better than ROVER and TREC in most of the metrics and produced more realistic predictions on inspection. In 2020, Ayzel et al. (2020) presented RainNet, which outperforms the authors' optical flow–based model (RainyMotion; Ayzel et al. 2019a) at lower thresholds.

Table 2. Literature review of machine learning models used for weather nowcasting. Abbreviations used in this table: NM = not mentioned; CORR = correlation.

3. Research methodology

The first part of this section briefly explains the data used to train and test the models. Then the proposed models are presented. After that, the baseline models are described. Finally, the metrics to compare all models are introduced.

a. Dataset

The datasets used to train, evaluate, and test the different models were built using reflectivity (Z) data from the WSR-88D radar in Dallas–Fort Worth in the United States, belonging to the NEXRAD network. We used level-II data downloaded from NCEI (2021). The training dataset is from 2005 to 2017 and contains 1200 events, the validation dataset is from 2018 and contains 150 events, and the testing dataset is from 2019 and contains 50 events. A testing dataset from Denver, containing 30 events from 2017, was also used in one experiment. The data were converted to Cartesian coordinates using a range of 300 km × 300 km (latitude and longitude) at 20 km of altitude. The gridded map, after converting from polar to Cartesian coordinates, was defined on a 64 × 64 matrix, and the reflectivity values were clipped to 0–70 dBZ. Each event consists of 32 frames, where the first 16 are used as the context to make the predictions and the next 16 as the observations to compare against the predictions. We provide step-by-step instructions on how the dataset was constructed in the Cuomo (2020) repository.

Some of the baseline models used in the benchmark rely on a strict sampling rate. Thus, the testing dataset was resampled to have a regular time step. The resampling was done in Cartesian coordinates using linear interpolation from a ~4 min spacing to 5 min. It is important to remark that interpolating in polar coordinates on each scan ray or applying an advection correction in the Cartesian coordinates would likely be more accurate.
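As an illustration of the polar-to-Cartesian gridding step, the sketch below uses the open-source Py-ART library to map one Level-II volume onto a clipped 64 × 64 Cartesian frame. The grid limits and the use of Py-ART are our assumptions for illustration only; the authoritative step-by-step pipeline is the one documented in the Cuomo (2020) repository.

```python
import numpy as np
import pyart  # open-source radar toolkit, used here only as an illustration

def nexrad_to_frame(filename):
    """Grid one NEXRAD Level-II volume onto a 64 x 64 Cartesian frame."""
    radar = pyart.io.read_nexrad_archive(filename)
    grid = pyart.map.grid_from_radars(
        radar,
        grid_shape=(1, 64, 64),            # single level, 64 x 64 horizontal grid
        grid_limits=((0.0, 20e3),          # vertical extent up to 20 km (illustrative)
                     (-150e3, 150e3),      # 300 km in y
                     (-150e3, 150e3)),     # 300 km in x
    )
    frame = grid.fields["reflectivity"]["data"][0]        # (64, 64) masked array
    return np.clip(np.ma.filled(frame, 0.0), 0.0, 70.0)  # clip to 0-70 dBZ
```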

b. Models

Before describing the models, we point readers unfamiliar with convolutional layers and residual architectures to Brownlee (2019) and Ruiz (2019) as introductory guides on these topics.

Three machine learning models are proposed, named resConv, resGRU, and Composite. The models are based on the encoder–decoder implementation of Shi et al. (2017). They differ somewhat in structure, most significantly in the residual connections (also known as skip connections) between the encoder and the decoder, inspired by the ResNet model (He et al. 2016). The idea behind residual connections is to build deeper networks while avoiding the vanishing gradient problem, since backpropagation has shortcuts to follow compared to a network without them. This allowed us to deepen the up-/downsampling stages into compounds of several layers, splitting the process of learning relevant features from that of changing scale.
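As a minimal PyTorch sketch of the skip-connection pattern (not the full architectures of Tables 3 and 4), the fragment below adds an encoder activation back in before upsampling, giving backpropagation a shortcut around the middle layers:

```python
import torch
import torch.nn as nn

class TinyResEncDec(nn.Module):
    """Minimal encoder-decoder with one additive (residual) skip connection."""

    def __init__(self, ch=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, ch, 3, stride=2, padding=1), nn.ReLU())
        self.mid = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())
        self.dec = nn.ConvTranspose2d(ch, 1, 4, stride=2, padding=1)

    def forward(self, x):
        e = self.enc(x)      # downsample, e.g., 64x64 -> 32x32
        m = self.mid(e) + e  # skip connection: gradients bypass self.mid
        return self.dec(m)   # upsample back to 64x64

out = TinyResEncDec()(torch.zeros(1, 1, 64, 64))  # shape (1, 1, 64, 64)
```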

The architecture of the first model, resConv, is shown in Table 3, where the downsampling is done with convolutions and the upsampling with transposed convolutions. For the second model, resGRU, shown in Table 4, we incorporated a recurrent neural network (RNN) as the last layer of each upsampling–downsampling block. Following the design of Shi et al. (2017), the initial states of the decoder's RNN layers are given by the output states of the encoder's RNN layers. Figure 1 exemplifies this type of architecture with a diagram. We decided to build these two models, differing only in whether they include RNNs, because they harness two of the most popular architecture families and thus provide a more comprehensive benchmark for future reference.

Table 3. ResConv_16-16 model architecture. References used: K = kernel; S = stride; output shape = (channels, frames, height, width). Skip connections are the addition of inputs/outputs marked with the same number of asterisks, and the initial states for recurrent layers in the decoder (upsampling section) are the outputs from the encoder (downsampling section) marked with the same number of hyphens. The third column lists the type of activation (act.), normalization (norm.), and regularization (reg.) layers used at each stage.

Table 4. ResGRU model architecture. References used: K = kernel; S = stride; output shape = (channels, frames, height, width). Skip connections are the addition of inputs/outputs marked with the same number of asterisks, and the initial states for recurrent layers in the decoder (upsampling section) are the outputs from the encoder (downsampling section) marked with the same number of hyphens.
Fig. 1. Example architecture of the proposed models with RNN layers; resConv, which has no RNNs, uses the same architecture with the RNN layers removed.

The third model, Composite, consists of an ensemble of multiple predictions. Franch et al. (2020) previously proposed an ensemble that also aims to minimize the blur in the prediction, which they formulated as conditional bias (Ciach et al. 2000). The difference is that they generated the layers of their ensemble by varying the loss function as a function of a reflectivity threshold, whereas our approach relies on thresholding the target values at different reflectivity levels, as shown in Fig. 2. The ensemble is constructed from a base prediction, obtained by one of the previous models, whose values range continuously from 0 to 70 dBZ. Additional layers of binary predictions are then stacked by keeping the maximum pixel value. These binary predictions come from models with the same architecture as before but trained on the dataset thresholded at different values. For example, for the first binary layer, where the threshold is 20 dBZ, the model will predict only reflectivity values higher than 20 dBZ, and these are interpreted as being exactly 20 dBZ. The goal is to improve the base prediction with all these additional layers, as they have been trained on a more specific target. Equation (1) formalizes how all predictions are combined pixelwise to produce the output of the Composite model:
$$\mathrm{Composite}_{ij} = \max\{\mathrm{base}_{ij},\ \mathrm{pred20dBZ}_{ij},\ \mathrm{pred25dBZ}_{ij},\ \mathrm{pred30dBZ}_{ij},\ \mathrm{pred35dBZ}_{ij}\}, \qquad (1)$$
where the indices $i$, $j$ correspond to the pixel coordinates of each frame, "base" is the nonthresholded prediction, and each predXdBZ is the prediction from the respective model using threshold X dBZ.
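Because Eq. (1) is a pixelwise maximum, the ensemble combination reduces to a few lines; the sketch below uses random dummy arrays in place of the actual model outputs:

```python
import numpy as np

def composite(base, binary_layers):
    """Pixelwise maximum of the continuous base nowcast and the binary layers.

    base          : (H, W) array, continuous 0-70 dBZ prediction
    binary_layers : iterable of (H, W) arrays, each either 0 or its threshold value
    """
    return np.maximum.reduce([base, *binary_layers])

rng = np.random.default_rng(0)
base = 70 * rng.random((64, 64))                                       # dummy base prediction
layers = [t * (rng.random((64, 64)) > 0.8) for t in (20, 25, 30, 35)]  # dummy binary layers
out = composite(base, layers)
```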
Fig. 2. Diagram of the binary model. The adaptation of a nonbinary model to a binary one consists of adding a threshold operation on the target at the input and a scale factor at the output. The output takes two values, 0 and x dBZ, where x is the threshold used.

All models were trained using the Adam optimizer with default parameters, LogCosH as the loss function (Ayzel et al. 2019b), a learning rate of 0.0001, and a dropout value of 0.01. Training was stopped when the loss on the validation dataset reached its minimum or at 20 epochs, whichever condition was satisfied first (Cuomo and Chandrasekar 2021).
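For reference, the LogCosH loss is straightforward to implement; the sketch below uses a numerically stable identity for log(cosh(x)) and is an illustration rather than the exact training code:

```python
import math
import torch
import torch.nn.functional as F

def log_cosh_loss(pred, target):
    """LogCosh loss: ~(error^2)/2 for small errors, ~|error| - log 2 for large ones."""
    d = pred - target
    # Stable evaluation of log(cosh(d)) = |d| + softplus(-2|d|) - log 2
    return (d.abs() + F.softplus(-2.0 * d.abs()) - math.log(2.0)).mean()

loss = log_cosh_loss(torch.randn(4, 16, 64, 64), torch.randn(4, 16, 64, 64))
```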

c. Baseline models

To compare against the proposed models, we used baseline models based both on optical flow and DL.

1) Based on optical flow

  1. Extrapolation—Generates a nowcast by applying a simple advection-based extrapolation to the given precipitation field.

  2. Spectral Prognosis (S-PROG)—The motion field is used to generate a deterministic nowcast with the Spectral Prognosis model, which implements a scale filtering approach in order to progressively remove the unpredictable spatial scales during the forecast.

  3. Short-Term Ensemble Prediction System (STEPS)—An extension of the S-PROG approach that includes a stochastic term to account for the variance produced by the unpredictable development of the storm. It is currently operational in Finland, and other implementations of the same algorithm are used by the Australian Bureau of Meteorology and the Royal Meteorological Institute of Belgium.

  4. ANVIL—Originally developed for use with vertically integrated liquid (VIL) data, it can be used with any two-dimensional input. It is an extrapolation-based nowcast that uses a cascade decomposition and a multiscale autoregressive integrated model that can predict growth and decay.

All of the above models are implementations from the pySTEPS project (Pulkkinen et al. 2019), which provides regularly improved open-source nowcasting models; a minimal usage sketch follows this list. For each model, the motion field was estimated using the Lucas–Kanade optical flow method, and the hyperparameters of the models were optimized using Bayesian optimization on the validation dataset.

While extrapolation is a static nowcast, S-PROG can predict growth and decay but with loss of small-scale features and with postprocessing to correct bias and wet-area ratios. On the other hand, ANVIL can predict growth and decay while preserving the small-scale structures without the need for postprocessing.

  5. Dynamic and Adaptive Radar Tracking of Storms (DARTS)—Currently operational in Dallas–Fort Worth using Collaborative Adaptive Sensing of the Atmosphere (CASA) radars (Ruzanski et al. 2011), it issues warnings to more than 7 million people. It uses linear least squares estimation implemented in the Fourier domain for motion estimation, with advection performed via a kernel-based method formulated in the spatial domain.
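A minimal sketch of how a pySTEPS nowcast of this kind can be produced, with Lucas–Kanade motion estimation followed by semi-Lagrangian extrapolation; the random input stands in for real reflectivity frames, and the Bayesian hyperparameter optimization mentioned above is omitted:

```python
import numpy as np
from pysteps import motion, nowcasts

frames = np.random.rand(3, 64, 64)                # stand-in for the most recent frames
velocity = motion.get_method("LK")(frames)        # Lucas-Kanade motion field
extrapolate = nowcasts.get_method("extrapolation")
forecast = extrapolate(frames[-1], velocity, 16)  # 16 lead times of semi-Lagrangian advection
```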

2) Machine learning based models

  1. RainNet—A model proposed by Ayzel et al. (2020), conceptually similar to resConv in that it uses an encoder–decoder architecture with residual connections. The authors were mainly inspired by SegNet (Badrinarayanan et al. 2017) and U-Net (Ronneberger et al. 2015), which are used for image segmentation. The authors' implementation was used in this paper (Ayzel 2020).

  2. trajGRU—Proposed by Shi et al. (2017) and the inspiration for resGRU. It is an encoder–decoder architecture using a recurrent neural network specifically designed to capture radar data dynamics. The authors' implementation in MXNet (model "trajGRU L17" from Shi 2017) was used, as well as Huang's (2019) implementation in PyTorch. Because we obtained better results with the latter, only those results are shown.

All models were trained using the original authors' parameters but on the same training dataset as the proposed models. The stopping criterion was the minimum of the evaluation loss (i.e., the point at which the models start to overfit).

d. Evaluation metrics

Multiple metrics are used to evaluate the predictions. The Developmental Testbed Center has a comprehensive guide (Fowler et al. 2020) that includes most of them. Two main categories pertain to this paper: categorical metrics (which use 0 to indicate no reflectivity presence and 1 to indicate the presence of the reflectivity range under analysis), and continuous metrics, which use continuous reflectivity values.
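The categorical scores named below all derive from the 2 × 2 contingency table obtained after thresholding; a compact sketch of the standard definitions:

```python
import numpy as np

def categorical_scores(pred, obs, thr):
    """Contingency-table scores for a prediction at a reflectivity threshold (dBZ)."""
    p, o = pred >= thr, obs >= thr
    hits   = np.sum(p & o)
    misses = np.sum(~p & o)
    fas    = np.sum(p & ~o)    # false alarms
    cns    = np.sum(~p & ~o)   # correct negatives
    n = hits + misses + fas + cns
    h_rand = (hits + misses) * (hits + fas) / n  # hits expected by chance (for ETS)
    return {
        "CSI": hits / (hits + misses + fas),
        "FAR": fas / (hits + fas),
        "POD": hits / (hits + misses),
        "ACC": (hits + cns) / n,
        "ETS": (hits - h_rand) / (hits + misses + fas - h_rand),
    }
```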

The metrics used in this paper are the critical success index (CSI), false alarm ratio (FAR), probability of detection (POD), mean squared error (MSE), mean absolute error (MAE), equitable threat score (ETS), accuracy (ACC), and structural similarity (SSIM). The only one not explained by Fowler et al. (2020) is SSIM, as it is not normally used in the nowcasting community. Nevertheless, some authors have used it, and it is widely adopted in the computer vision community. Moreover, it provides a good measure of the structural degradation in an image, which we consider potentially useful for capturing the blur we observed in the predictions. Thus, we include this metric to step back from the nowcasting-specific scores and analyze the results from a computer vision point of view. The metric is defined as
$$\mathrm{SSIM} = \frac{(2\mu_P\mu_O + C_1)(2\sigma_{PO} + C_2)}{(\mu_P^2 + \mu_O^2 + C_1)(\sigma_P^2 + \sigma_O^2 + C_2)},$$
where $\mu$ is the average, $\sigma$ the standard deviation, $\sigma_{PO}$ the covariance between prediction and observation, $C_1 = (k_1 L)^2$ and $C_2 = (k_2 L)^2$ stabilizers for when the denominator is weak, $L$ the pixel-value dynamic range, and $k_1$, $k_2$ constants. The formula is applied with a sliding window of size 11 × 11 using a Gaussian weighting function. It considers the luminance, the contrast, and the structure of the frames, and it ranges from −1 to 1, with 1 being the perfect score.
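One way to compute SSIM as configured above is with scikit-image, whose Gaussian-weighted mode uses an 11 × 11 window with σ = 1.5; choosing this library is our illustrative assumption, as the paper's own implementation is not named:

```python
import numpy as np
from skimage.metrics import structural_similarity

obs = 70 * np.random.rand(64, 64)
pred = np.clip(obs + np.random.normal(0, 3, obs.shape), 0, 70)

# data_range=70 matches the 0-70 dBZ clipping used for the frames
score = structural_similarity(obs, pred, data_range=70.0, gaussian_weights=True)
```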

4. Experiments

The experiments are organized as follows. First, a comparison between using and not using residual connections in the proposed models. Second, a comparison against the baseline models to evaluate the performance of the proposed models. Then, two experiments illustrate some of the advantages and disadvantages of machine learning approaches: the first addresses potential limitations of their data-driven nature, and the second addresses potential issues with the specific techniques used in the proposed models to generate the prediction frames.

a. Assessing the use of residual connections

We first analyze the effect of residual connections on performance. For this, the two proposed models, resGRU and resConv, are compared against their corresponding versions without any residual connections.

The results consist of two parts: an example prediction at the farthest predicted frame (approximately 1 h into the future), and the average of different metrics over multiple thresholds. Figure 3a shows the predictions: the first frame corresponds to the observation of what happened, followed by the predicted frame for each of the models. Visually, we can observe that the resConv model performed better than its counterpart Conv. The differences between resGRU and convGRU are more subtle: the former seems closer in shape to the observation, while the latter better predicts the intensity values. Analyzing the metrics shown in Fig. 3b (blue curves), resConv outperforms Conv in every score. As expected from the visual analysis, the metrics are not conclusive for the models with recurrent layers: convGRU has better CSI scores, but resGRU does better on FAR, SSIM, and most of the MSE lead times.

Fig. 3. Comparison between using and not using residual connections. (a) Example prediction at 1 h in the future: the first frame is the observation, and the following four frames are the predictions for resConv, Conv, resGRU, and convGRU. (b) Average metrics over multiple thresholds at every predicted lead time. Red denotes models with recurrent layers, blue models without; dashed lines correspond to models without residual connections.

b. Comparing proposed models against baseline models

The proposed models resGRU, resConv, and Composite (using resConv as base) are compared against the models listed in section 3c. The comparisons against DARTS and STEPS are the most relevant experiments in this paper, as they pit the proposed models against currently operational nowcasting systems. The nowcasts from DARTS were produced externally, while we ran the nowcasts for pySTEPS, trajGRU, and RainNet. The pySTEPS models were optimized on the validation dataset, and the DL models were trained on the training set. We acknowledge that, despite our intention to produce the best nowcasts from all of them, the original authors could likely further optimize their models for the datasets used in this paper and obtain better results; the comparisons must be taken in that context.

The average metrics over different thresholds are shown in Fig. 4 for all models. On the CSI score, resGRU, STEPS, ANVIL, S-PROG, and Extrapolation show similar behavior and performance. At the initial lead times, Composite and resConv do not perform well relative to the rest but then outperform them at the farthest lead times. DARTS behaves similarly to Composite and resConv but slightly worse. On the FAR scores, resGRU and resConv do better overall than the rest, while DARTS has the worst score. On the MSE scores, resGRU and resConv again have the best scores, along with Composite; in this case, DARTS performs better than S-PROG, ANVIL, and STEPS and similarly to Extrapolation. Finally, neither RainNet nor trajGRU did well overall.

Fig. 4. Performance metrics for all analyzed models. CSI, FAR, POD, and ETS are calculated using a threshold to binarize the predictions, so the results shown here are the average over the multiple thresholds used (listed in parentheses) for those metrics.

An example prediction is displayed in Figs. 5 and 6. We can observe that resConv did not predict reflectivity values above 45 dBZ, and the optical flow implementations do not predict the whole frame area. The Composite model, in this example, seems to overestimate lower reflectivity values, and DARTS did not accurately predict the storm's translation. ANVIL predicted growth exceeding 70 dBZ, and RainNet a decay in both size and intensity. To understand more about the differences between the models, Fig. 7 shows the average over the testing dataset of only the CSI and FAR scores at each threshold for the better-performing models. Here, we can observe that Composite and resConv have the best overall scores except at reflectivities higher than 35 dBZ, where Extrapolation, S-PROG, and DARTS perform better.

Fig. 5. Example predictions for ML-based models. (top row) The observation; (remaining rows) the predictions for the different models.

Fig. 6. Example predictions for optical flow–based models. The observation can be found in Fig. 5; each row is a prediction from a different model.

Fig. 7. Benchmark of the top-performing models at different reflectivity thresholds using only (top) CSI and (bottom) FAR. The figure should be read columnwise, with thresholds of (a) 20, (b) 25, (c) 30, and (d) 35 dBZ; this characterizes the models' performance at different storm severities.

c. Performance on different geographical regions

As machine learning models are heavily dependent on the training set, it is of interest to compare a model's performance on a dataset from a different location. For this, a dataset with Denver data (NEXRAD KFTG radar) from the spring of 2017 is used, as well as the spring 2019 data from Dallas–Fort Worth (the same radar as the training dataset). The ideal comparison would be a cross-training/testing: train the same model on radar A, test it on radars A and B, then train on radar B and test on radars A and B. However, because two different training datasets were not available, a different approach is taken. The comparison is made against the DARTS nowcasting system, which is radar independent (in the sense that no training data are used). The analysis therefore examines how similar the relative performance of DARTS and resConv is on one dataset versus the other.

For these experiments, the metrics for both datasets and both models, DARTS and resConv, are presented in Fig. 8. The relative performance between the two models on each dataset seems consistent, as both do worse on the Denver predictions. To quantify this similarity, the absolute difference between the datasets is computed and shown in Fig. 9. The ranges of each metric are the same as in Fig. 8 or the metrics' theoretical ranges. The more the curves overlap, the more similar the models' relative performance is across the two datasets.

Fig. 8. Comparison of average metrics over thresholds (listed in parentheses) for the different regions' datasets.

Fig. 9. Comparison of the difference between average metrics over thresholds (resConv − DARTS) for the different regions' datasets.

d. On the generation of frames

The way prediction frames are generated in machine learning approaches differs greatly from other methods. In optical flow, for example, frames result from calculating the advection of a storm, while in the methods proposed here they are generated by convolutional operations from an encoded state. The encoder–decoder architecture can generate issues in the prediction during the upsampling convolutional layers, where the frames' dimensions are increased, such as checkerboard artifacts (Odena et al. 2016), blurriness (Tran and Song 2019), and edge artifacts (Liu and Jia 2008). In this experiment, we analyze whether these specific aspects impact the quality of the predicted frames. To minimize checkerboard artifacts, we kept the kernel sizes of the upsampling convolutions as multiples of their strides and used stride-1 convolutions after them (Odena et al. 2016). During the training phase, it is common to see the edges being the last part of the frame to show correct predictions, and blurriness can be seen across the entire frame. Figure 10 exemplifies these problems on two storm events from our dataset; the observations are contrasted with predictions from an early stage of the training phase, where these artifacts are more evident.
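A sketch of the mitigation just described: the transposed convolution's kernel size is a multiple of its stride, and a stride-1 convolution follows it to smooth any remaining uneven kernel overlap (Odena et al. 2016). The layer widths are illustrative:

```python
import torch
import torch.nn as nn

up = nn.Sequential(
    nn.ConvTranspose2d(32, 32, kernel_size=4, stride=2, padding=1),  # kernel (4) is a multiple of stride (2)
    nn.ReLU(),
    nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),           # stride-1 conv smooths the upsampled map
)
print(up(torch.zeros(1, 32, 16, 16)).shape)  # torch.Size([1, 32, 32, 32])
```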

Fig. 10. Example of storm events showing (a) checkerboard artifacts and (b) edge problems and blurriness. (top) Observations of two storm frames and (bottom) predictions at an early stage of the model training.

The first analysis evaluates whether the fully trained model suffers from edge problems. For this, the metrics were computed pixelwise instead of framewise and then averaged over the validation dataset, containing 134 events. This results in each pixel of each frame having the average metric for that location. Only MSE, ETS, and FAR are shown in Fig. 11. The results show no clear pattern that would indicate that the edges perform worse. The metrics would be expected to be homogeneous across each frame, but they are not, most likely because of the sample size: with different datasets the pattern varies, which indicates that the model does not actually perform better in the upper-right corner as these results might suggest.

Fig. 11. Pixelwise metrics over the frames of the validation set containing storm events, used to detect edge artifacts in the predictions. Rows are different metrics—(top) MSE, (middle) ETS, and (bottom) FAR—and columns are different lead times. The optimal result is a homogeneous frame, while an unwanted result is a repetitive pattern across frames.

The following analysis attempts to measure blurriness at the edges. To quantify this, a Laplacian kernel (which estimates the second-order derivative of a matrix and responds to rapid intensity changes in an image) was convolved with each frame, and then the standard deviation was computed at each pixel with a filter size of 3 × 3. The higher the variance, the less blurry an image is (Pech-Pacheco et al. 2000). Because a kernel is convolved twice, this would introduce artifacts at the edges, as they are abrupt transitions, which interferes directly with the experiment's goal; for that reason, different ways of extending the frames were tried, all yielding similar results. As shown in Fig. 12, a comparison between the observation and the prediction was made. In the predictions, the edges have higher values, the opposite of what was expected, meaning they are less blurry than the rest of the image. The absence of these edge responses in the observation frames means they are not a consequence of the method used to detect blurriness. Even though this is not actually blur in the predictions, it is undoubtedly a type of artifact; however, following the previous experiment's result, it does not have a significant impact on the standard metrics.
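A sketch of this blur measure, assuming SciPy's filters and reflective padding as one of the frame-extension choices mentioned above:

```python
import numpy as np
from scipy import ndimage

def local_blur_map(frame):
    """Pixelwise sharpness: local 3x3 std of the Laplacian response.

    Higher values indicate sharper (less blurry) local structure
    (Pech-Pacheco et al. 2000).
    """
    lap = ndimage.laplace(frame.astype(np.float64), mode="reflect")
    mean = ndimage.uniform_filter(lap, size=3, mode="reflect")
    mean_sq = ndimage.uniform_filter(lap**2, size=3, mode="reflect")
    return np.sqrt(np.maximum(mean_sq - mean**2, 0.0))  # std = sqrt(E[x^2] - E[x]^2)

blur_map = local_blur_map(70 * np.random.rand(64, 64))
```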

Fig. 12. Pixelwise blur-detection plots over the validation set containing storm events, used to detect blur artifacts in the predicted frames. Values are calculated as the standard deviation of a Laplacian kernel response; refer to section 4d for a detailed explanation. (top) Observations and (bottom) predictions. Any artifact found also in the observation was not produced by the model but by the radar or postprocessing.

Two other “anomalies” appeared in the blur analysis: one in the center with a circular shape, and another resembling rays emanating from the center, which are stronger in the observations. The former is most likely the radar itself, and the latter could be a calibration issue with the radar, as there seems to be a more abrupt change between two parts of the sweep.

5. Discussion

In this paper, we proposed three deep learning models for weather nowcasting using radar echo data and compared them with existing models, particularly against two operational models.

A common issue with deep learning approaches is that they tend to underestimate the intensity of reflectivity. To address this, we proposed the Composite model, which shows promising results in enhancing the predictions at higher reflectivity values. Although this ensemble can improve any base model, we believe the primary focus should be on improving the training dataset, which lacks intense storm events.

During the process of designing the new models, we analyzed the use of residual connections. For purely convolutional networks, they prove to have a positive impact on performance: resConv reached performance similar to the convGRU model, suggesting that residual connections can narrow the gap introduced by recurrent layers. Similar results were obtained by Bai et al. (2018). On the other hand, residual connections in models with recurrent layers did not show the same boost in performance. Metrics such as SSIM, which tries to reflect human interpretation of image similarity, favored resGRU, consistent with the visual analysis in section 4a; however, the CSI score, which can be considered the most relevant for weather nowcasting, favored convGRU.

DARTS makes good predictions and outperforms the proposed methods in some aspects, such as predicting high reflectivity values and not suffering from the blurring effect. Similarly, Extrapolation and S-PROG from the pySTEPS project did very well, correctly predicting the reflectivity range and the shape changes. On the other hand, the proposed models outperform all baseline systems at lower reflectivities, especially improving the false alarm ratio, which is an important issue in the nowcasting field. In particular, the Composite model has the advantage of boosting the CSI scores at high Z values, but at the expense of increasing the FAR scores.

Neither the proposed models nor the baseline models outperformed the rest in all of the analyzed metrics. This is because each method has advantages only in certain aspects, and the metrics used here try to capture each of them. As mentioned before, some metrics, e.g., CSI and FAR, are counterparts, meaning that they quantify opposite sides of a trade-off; therefore, unless a model does better at both (or at all metrics), it only improves one side of that trade-off. For example, the proposed models improved the FAR scores overall, but they outperformed the baseline models across the board only at reflectivity values below 30 dBZ. These results fall in line with other studies (refer to section 1), where the blur (smoothing effect) impairs the predictions of DL models at higher reflectivities.

Finally, we showed that some of the potential limitations of machine learning models do not have a major impact on the predictions. The models showed promising performance in geographical areas other than the one they were trained on. The checkerboard and edge artifacts are inherent to the way frames are generated, but they vanish once the training phase is complete. The remaining disadvantages of machine learning approaches are that they are slow to train, require large amounts of historical data and high computational power, and act as black boxes, making it difficult to provide confidence intervals for the predictions.

Acknowledgments

This research is supported by the National Science Foundation (Grant 1639570).

Data availability statement

These datasets were derived from the following public domain resource: NEXRAD Inventory (https://www.ncdc.noaa.gov/nexradinv/). The code used to process NEXRAD data is available in the Cuomo (2020) repository (https://github.com/JCuomo/NEXRAD_dataset).

REFERENCES

  • Agrawal, S., L. Barrington, C. Bromberg, J. Burge, C. Gazen, and J. Hickey, 2019: Machine learning for precipitation nowcasting from radar images. arXiv, https://arxiv.org/abs/1912.12132.

  • Akbari Asanjan, A., T. Yang, K. Hsu, S. Sorooshian, J. Lin, and Q. Peng, 2018: Short-term precipitation forecast based on the PERSIANN system and LSTM recurrent neural networks. J. Geophys. Res. Atmos., 123, 12 543–12 563, https://doi.org/10.1029/2018JD028375.

  • Ayzel, G., 2020: Rainnet implementation in PyTorch. Github, https://github.com/hydrogo/rainnet.

  • Ayzel, G., M. Heistermann, and T. Winterrath, 2019a: Optical flow models as an open benchmark for radar-based precipitation nowcasting (rainymotion v0.1). Geosci. Model Dev., 12, 1387–1402, https://doi.org/10.5194/gmd-12-1387-2019.

  • Ayzel, G., M. Heistermann, A. Sorokin, O. Nikitin, and O. Lukyanova, 2019b: All convolutional neural networks for radar-based precipitation nowcasting. Procedia Comput. Sci., 150, 186–192, https://doi.org/10.1016/j.procs.2019.02.036.

  • Ayzel, G., T. Scheffer, and M. Heistermann, 2020: Rainnet v1.0: A convolutional neural network for radar-based precipitation nowcasting. Geosci. Model Dev., 13, 2631–2644, https://doi.org/10.5194/gmd-13-2631-2020.

  • Badrinarayanan, V., A. Kendall, and R. Cipolla, 2017: SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 39, 2481–2495, https://doi.org/10.1109/TPAMI.2016.2644615.

  • Bai, S., J. Z. Kolter, and V. Koltun, 2018: An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv, https://arxiv.org/abs/1803.01271.

  • Bowler, N. E. H., C. E. Pierce, and A. Seed, 2004: Development of a precipitation nowcasting algorithm based upon optical flow techniques. J. Hydrol., 288, 74–91, https://doi.org/10.1016/j.jhydrol.2003.11.011.

  • Bowler, N. E. H., C. E. Pierce, and A. Seed, 2006: STEPS: A probabilistic precipitation forecasting scheme which merges an extrapolation nowcast with downscaled NWP. Quart. J. Roy. Meteor. Soc., 132, 2127–2155, https://doi.org/10.1256/qj.04.100.

  • Brownlee, J., 2019: A gentle introduction to padding and stride for convolutional neural networks. Machine Learning Mastery, https://machinelearningmastery.com/padding-and-stride-for-convolutional-neural-networks/.

  • Ciach, G. J., M. L. Morrissey, and W. F. Krajewski, 2000: Conditional bias in radar rainfall estimation. J. Appl. Meteor., 39, 1941–1946, https://doi.org/10.1175/1520-0450(2000)039<1941:CBIRRE>2.0.CO;2.

  • Cuomo, J., 2020: NEXRAD dataset. Github, https://github.com/JCuomo/NEXRAD_dataset.

  • Cuomo, J., and V. Chandrasekar, 2021: Developing deep learning models for storm nowcasting. IEEE Trans. Geosci. Remote Sens., in press.

  • Dixon, M., and G. Wiener, 1993: TITAN: Thunderstorm identification, tracking, analysis, and nowcasting—A radar-based methodology. J. Atmos. Oceanic Technol., 10, 785–797, https://doi.org/10.1175/1520-0426(1993)010<0785:TTITAA>2.0.CO;2.

  • Foresti, L., I. V. Sideris, D. Nerini, L. Beusch, and U. Germann, 2019: Using a 10-year radar archive for nowcasting precipitation growth and decay: A probabilistic machine learning approach. Wea. Forecasting, 34, 1547–1569, https://doi.org/10.1175/WAF-D-18-0206.1.

  • Fowler, T., J. H. Gotway, K. Newman, T. Jensen, B. Brown, and R. Bullock, 2020: The Model Evaluation Tools v7.0 (METv7.0) user’s guide. Developmental Testbed Center Doc., 482 pp., https://dtcenter.org/sites/default/files/community-code/met/docs/user-guide/MET_Users_Guide_v9.0.1.pdf.

  • Franch, G., D. Nerini, M. Pendesini, L. Coviello, G. Jurman, and C. Furlanello, 2020: Precipitation nowcasting with orographic enhanced stacked generalization: Improving deep learning predictions on extreme events. Atmosphere, 11, 267, https://doi.org/10.3390/atmos11030267.

  • Germann, U., and I. Zawadzki, 2002: Scale-dependence of the predictability of precipitation from continental radar images. Part I: Description of the methodology. Mon. Wea. Rev., 130, 2859–2873, https://doi.org/10.1175/1520-0493(2002)130<2859:SDOTPO>2.0.CO;2.

  • Haiden, T., A. Kann, C. Wittmann, G. Pistotnik, B. Bica, and C. Gruber, 2011: The Integrated Nowcasting through Comprehensive Analysis (INCA) system and its validation over the eastern Alpine region. Wea. Forecasting, 26, 166–183, https://doi.org/10.1175/2010WAF2222451.1.

  • Han, L., J. Sun, W. Zhang, Y. Xiu, H. Feng, and Y. Lin, 2017: A machine learning nowcasting method based on real-time reanalysis data. J. Geophys. Res. Atmos., 122, 4038–4051, https://doi.org/10.1002/2016JD025783.

  • He, K., X. Zhang, S. Ren, and J. Sun, 2016: Deep residual learning for image recognition. 2016 IEEE Conf. on Computer Vision and Pattern Recognition, Las Vegas, NV, IEEE, 770–778, https://doi.org/10.1109/CVPR.2016.90.

  • Hering, A. M., C. Morel, G. Galli, P. Ambrosetti, and M. Boscacci, 2004: Nowcasting thunderstorms in the alpine region using a radar based adaptive thresholding scheme. Proc. Third European Conf. on Radar Meteorology, Visby, Sweden, ERAD, 206–211.

  • Huang, Z., 2019: trajGRU implementation in PyTorch. Github, https://github.com/Hzzone/Precipitation-Nowcasting.

  • Isaac, G. A., and Coauthors, 2014: The Canadian Airport Nowcasting System (CAN-NOW). Meteor. Appl., 21, 30–49, https://doi.org/10.1002/met.1342.

  • James, P. M., B. K. Reichert, and D. Heizenreder, 2018: NowCastMIX: Automatic integrated warnings for severe convection on nowcasting time scales at the German Weather Service. Wea. Forecasting, 33, 1413–1433, https://doi.org/10.1175/WAF-D-18-0038.1.

  • Jing, J., Q. Li, X. Ding, N. Sun, R. Tang, and Y. Cai, 2019a: AENN: A generative adversarial neural network for weather radar echo extrapolation. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci., 42-3, 89–94, https://doi.org/10.5194/isprs-archives-XLII-3-W9-89-2019.

  • Jing, J., Q. Li, and X. Peng, 2019b: MLC-LSTM: Exploiting the spatiotemporal correlation between multi-level weather radar echoes for echo sequence extrapolation. Sensors, 19, 3988, https://doi.org/10.3390/s19183988.

  • Johnson, J. T., P. L. MacKeen, A. Witt, E. D. W. Mitchell, G. J. Stumpf, M. D. Eilts, and K. W. Thomas, 1998: The Storm Cell Identification and Tracking algorithm: An enhanced WSR-88D algorithm. Wea. Forecasting, 13, 263–276, https://doi.org/10.1175/1520-0434(1998)013<0263:TSCIAT>2.0.CO;2.

  • Jung, S., and G. Lee, 2015: Radar-based cell tracking with fuzzy logic approach. Meteor. Appl., 22, 716–730, https://doi.org/10.1002/met.1509.

  • Li, L., W. Schmid, and J. Joss, 1995: Nowcasting of motion and growth of precipitation with radar over a complex orography. J. Appl. Meteor., 34, 1286–1300, https://doi.org/10.1175/1520-0450(1995)034<1286:NOMAGO>2.0.CO;2.

  • Li, P., and E. Lai, 2004: Short-range quantitative precipitation forecasting in Hong Kong. J. Hydrol., 288, 189–209, https://doi.org/10.1016/j.jhydrol.2003.11.034.

  • Liu, R., and J. Jia, 2008: Reducing boundary artifacts in image deconvolution. 2008 15th IEEE Int. Conf. on Image Processing, San Diego, CA, IEEE, 505–508, https://doi.org/10.1109/ICIP.2008.4711802.

  • Milrad, S., 2018: Radar imagery. Synoptic Analysis and Forecasting: An Introductory Toolkit, Elsevier, 163–177, https://doi.org/10.1016/B978-0-12-809247-7.00012-0.

  • Mueller, C., T. Saxen, R. Roberts, J. Wilson, T. Betancourt, S. Dettling, N. Oien, and J. Yee, 2003: NCAR Auto-Nowcast System. Wea. Forecasting, 18, 545–561, https://doi.org/10.1175/1520-0434(2003)018<0545:NAS>2.0.CO;2.

  • NCEI, 2021: NEXRAD data inventory search. NOAA, accessed 10 October 2019, http://www.ncdc.noaa.gov/nexradinv/.

  • Odena, A., V. Dumoulin, and C. Olah, 2016: Deconvolution and checkerboard artifacts. Distill, https://distill.pub/2016/deconv-checkerboard/.

  • Pech-Pacheco, J. L., G. Cristobal, J. Chamorro-Martinez, and J. Fernandez-Valdivia, 2000: Diatom autofocusing in brightfield microscopy: A comparative study. Proc. 15th Int. Conf. on Pattern Recognition, Barcelona, Spain, IEEE, 314–317, https://doi.org/10.1109/ICPR.2000.903548.

  • Pulkkinen, S., D. Nerini, A. A. P. Hortal, C. Velasco-Forero, A. Seed, U. Germann, and L. Foresti, 2019: Pysteps: An open-source python library for probabilistic precipitation nowcasting (v1.0). Geosci. Model Dev., 12, 4185–4219, https://doi.org/10.5194/gmd-12-4185-2019.

  • Ronneberger, O., P. Fischer, and T. Brox, 2015: U-net: Convolutional networks for biomedical image segmentation. 18th Int. Conf. on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, MICCAI Society, 234–241.

  • Ruiz, P., 2019: Understanding and visualizing ResNets. Medium, https://towardsdatascience.com/understanding-and-visualizing-resnets-442284831be8.

  • Ruzanski, E., V. Chandrasekar, and Y. Wang, 2011: The CASA nowcasting system. J. Atmos. Oceanic Technol., 28, 640–655, https://doi.org/10.1175/2011JTECHA1496.1.

  • Sato, R., H. Kashima, and T. Yamamoto, 2018: Short-term precipitation prediction with skip-connected PredNet. 27th Int. Conf. on Artificial Neural Networks, Rhodes, Greece, European Neural Network Society, 373–382.

  • Shi, E., Q. Li, D. Gu, and Z. Zhao, 2018: A method of weather radar echo extrapolation based on convolutional neural networks. Int. Conf. on Multimedia Modeling, Bangkok, Thailand, MMM, 16–28.

  • Shi, X., 2017: HKO-7. Github, https://github.com/sxjscience/HKO-7.

  • Shi, X., Z. Chen, H. Wang, D. Y. Yeung, W. K. Wong, and W. C. Woo, 2015: Convolutional LSTM network: A machine learning approach for precipitation nowcasting. 29th Conf. on Neural Information Processing Systems, Montreal, QC, Canada, NeurIPS, 802–810.

  • Shi, X., Z. Gao, L. Lausen, H. Wang, D. Y. Yeung, W. K. Wong, and W. C. Woo, 2017: Deep learning for precipitation nowcasting: A benchmark and a new model. 31st Conf. on Neural Information Processing Systems, Long Beach, CA, NeurIPS, 5618–5628.

  • Su, A., H. Li, L. Cui, and Y. Chen, 2020: A convection nowcasting method based on machine learning. Adv. Meteor., 2020, 5124274, https://doi.org/10.1155/2020/5124274.

  • Tran, Q. K., and S. K. Song, 2019: Computer vision in precipitation nowcasting: Applying image quality assessment metrics for training deep neural networks. Atmosphere, 10, 244, https://doi.org/10.3390/atmos10050244.

  • Wang, Y., and Coauthors, 2017: Guidelines for nowcasting techniques. WMO Rep. WMO-1198, 82 pp.

  • Woo, W.-C., and W. K. Wong, 2017: Operational application of optical flow techniques to radar-based rainfall nowcasting. Atmosphere, 8, 48, https://doi.org/10.3390/atmos8030048.
  • Agrawal, S., L. Barrington, C. Bromberg, J. Burge, C. Gazen, and J. Hickey, 2019: Machine learning for precipitation nowcasting from radar images. arXiv, https://arxiv.org/abs/1912.12132.

  • Akbari Asanjan, A., T. Yang, K. Hsu, S. Sorooshian, J. Lin, and Q. Peng, 2018: Short-term precipitation forecast based on the PERSIANN system and LSTM recurrent neural networks. J. Geophys. Res. Atmos., 123, 12 54312 563, https://doi.org/10.1029/2018JD028375.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Ayzel, G., 2020: Rainnet implementation in PyTorch. Github, https://github.com/hydrogo/rainnet.

  • Ayzel, G., M. Heistermann, and T. Winterrath, 2019a: Optical flow models as an open benchmark for radar-based precipitation nowcasting (rainymotion v0.1). Geosci. Model Dev., 12, 13871402, https://doi.org/10.5194/gmd-12-1387-2019.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Ayzel, G., M. Heistermann, A. Sorokin, O. Nikitin, and O. Lukyanova, 2019b: All convolutional neural networks for radar-based precipitation nowcasting. Procedia Comput. Sci., 150, 186192, https://doi.org/10.1016/j.procs.2019.02.036,.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Ayzel, G., T. Scheffer, and M. Heistermann, 2020: Rainnet v1.0: A convolutional neural network for radar-based precipitation nowcasting. Geosci. Model Dev., 13, 26312644, https://doi.org/10.5194/gmd-13-2631-2020.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Badrinarayanan, V., A. Kendall, and R. Cipolla, 2017: SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 39, 24812495, https://doi.org/10.1109/TPAMI.2016.2644615.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bai, S., J. Z. Kolter, and V. Koltun, 2018: An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv, https://arxiv.org/abs/1803.01271.

  • Bowler, N. E. H., C. E. Pierce, and A. Seed, 2004: Development of a precipitation nowcasting algorithm based upon optical flow techniques. J. Hydrol., 288, 7491, https://doi.org/10.1016/j.jhydrol.2003.11.011.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bowler, N. E. H., C. E. Pierce, and A. Seed, 2006: STEPS: A probabilistic precipitation forecasting scheme which merges an extrapolation nowcast with downscaled NWP. Quart. J. Roy. Meteor. Soc., 132, 21272155, https://doi.org/10.1256/qj.04.100.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Brownlee, J., 2019: A gentle introduction to padding and stride for convolutional neural networks. Machine Learning Mastery, https://machinelearningmastery.com/padding-and-stride-for-convolutional-neural-networks/.

  • Ciach, G. J., M. L. Morrissey, and W. F. Krajewski, 2000: Conditional bias in radar rainfall estimation. J. Appl. Meteor., 39, 19411946, https://doi.org/10.1175/1520-0450(2000)039<1941:CBIRRE>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Cuomo, J., 2020: NEXRAD dataset. Github, https://github.com/JCuomo/NEXRAD_dataset.

  • Cuomo, J., and V. Chandrasekar, 2021: Developing deep learning models for storm nowcasting. IEEE Trans. Geosci. Remote Sens., in press.

  • Dixon, M., and G. Wiener, 1993: TITAN: Thunderstorm identification, tracking, analysis, and nowcasting—A radar-based methodology. J. Atmos. Oceanic Technol., 10, 785–797, https://doi.org/10.1175/1520-0426(1993)010<0785:TTITAA>2.0.CO;2.

  • Foresti, L., I. V. Sideris, D. Nerini, L. Beusch, and U. Germann, 2019: Using a 10-year radar archive for nowcasting precipitation growth and decay: A probabilistic machine learning approach. Wea. Forecasting, 34, 1547–1569, https://doi.org/10.1175/WAF-D-18-0206.1.

  • Fowler, T., J. H. Gotway, K. Newman, T. Jensen, B. Brown, and R. Bullock, 2020: The Model Evaluation Tools v7.0 (METv7.0) user’s guide. Developmental Testbed Center Doc., 482 pp., https://dtcenter.org/sites/default/files/community-code/met/docs/user-guide/MET_Users_Guide_v9.0.1.pdf.

  • Franch, G., D. Nerini, M. Pendesini, L. Coviello, G. Jurman, and C. Furlanello, 2020: Precipitation nowcasting with orographic enhanced stacked generalization: Improving deep learning predictions on extreme events. Atmosphere, 11, 267, https://doi.org/10.3390/atmos11030267.

  • Germann, U., and I. Zawadzki, 2002: Scale-dependence of the predictability of precipitation from continental radar images. Part I: Description of the methodology. Mon. Wea. Rev., 130, 2859–2873, https://doi.org/10.1175/1520-0493(2002)130<2859:SDOTPO>2.0.CO;2.

  • Haiden, T., A. Kann, C. Wittmann, G. Pistotnik, B. Bica, and C. Gruber, 2011: The Integrated Nowcasting through Comprehensive Analysis (INCA) system and its validation over the eastern Alpine region. Wea. Forecasting, 26, 166–183, https://doi.org/10.1175/2010WAF2222451.1.

  • Han, L., J. Sun, W. Zhang, Y. Xiu, H. Feng, and Y. Lin, 2017: A machine learning nowcasting method based on real-time reanalysis data. J. Geophys. Res. Atmos., 122, 4038–4051, https://doi.org/10.1002/2016JD025783.

  • He, K., X. Zhang, S. Ren, and J. Sun, 2016: Deep residual learning for image recognition. 2016 IEEE Conf. on Computer Vision and Pattern Recognition, Las Vegas, NV, IEEE, 770–778, https://doi.org/10.1109/CVPR.2016.90.

  • Hering, A. M., C. Morel, G. Galli, P. Ambrosetti, and M. Boscacci, 2004: Nowcasting thunderstorms in the alpine region using a radar based adaptive thresholding scheme. Proc. Third European Conf. on Radar Meteorology, Visby, Sweden, ERAD, 206–211.

  • Huang, Z., 2019: trajGRU implementation in PyTorch. Github, https://github.com/Hzzone/Precipitation-Nowcasting.

  • Isaac, G. A., and Coauthors, 2014: The Canadian Airport Nowcasting System (CAN-NOW). Meteor. Appl., 21, 30–49, https://doi.org/10.1002/met.1342.

  • James, P. M., B. K. Reichert, and D. Heizenreder, 2018: NowCastMIX: Automatic integrated warnings for severe convection on nowcasting time scales at the German Weather Service. Wea. Forecasting, 33, 1413–1433, https://doi.org/10.1175/WAF-D-18-0038.1.

  • Jing, J., Q. Li, X. Ding, N. Sun, R. Tang, and Y. Cai, 2019a: AENN: A generative adversarial neural network for weather radar echo extrapolation. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci., 42-3, 89–94, https://doi.org/10.5194/isprs-archives-XLII-3-W9-89-2019.

  • Jing, J., Q. Li, and X. Peng, 2019b: MLC-LSTM: Exploiting the spatiotemporal correlation between multi-level weather radar echoes for echo sequence extrapolation. Sensors, 19, 3988, https://doi.org/10.3390/s19183988.

  • Johnson, J. T., P. L. MacKeen, A. Witt, E. D. W. Mitchell, G. J. Stumpf, M. D. Eilts, and K. W. Thomas, 1998: The Storm Cell Identification and Tracking algorithm: An enhanced WSR-88D algorithm. Wea. Forecasting, 13, 263–276, https://doi.org/10.1175/1520-0434(1998)013<0263:TSCIAT>2.0.CO;2.

  • Jung, S., and G. Lee, 2015: Radar-based cell tracking with fuzzy logic approach. Meteor. Appl., 22, 716–730, https://doi.org/10.1002/met.1509.

  • Li, L., W. Schmid, and J. Joss, 1995: Nowcasting of motion and growth of precipitation with radar over a complex orography. J. Appl. Meteor., 34, 1286–1300, https://doi.org/10.1175/1520-0450(1995)034<1286:NOMAGO>2.0.CO;2.

  • Li, P., and E. Lai, 2004: Short-range quantitative precipitation forecasting in Hong Kong. J. Hydrol., 288, 189–209, https://doi.org/10.1016/j.jhydrol.2003.11.034.

  • Liu, R., and J. Jia, 2008: Reducing boundary artifacts in image deconvolution. 2008 15th IEEE Int. Conf. on Image Processing, San Diego, CA, IEEE, 505–508, https://doi.org/10.1109/ICIP.2008.4711802.

  • Milrad, S., 2018: Radar imagery. Synoptic Analysis and Forecasting: An Introductory Toolkit, Elsevier, 163–177, https://doi.org/10.1016/B978-0-12-809247-7.00012-0.

  • Mueller, C., T. Saxen, R. Roberts, J. Wilson, T. Betancourt, S. Dettling, N. Oien, and J. Yee, 2003: NCAR Auto-Nowcast System. Wea. Forecasting, 18, 545–561, https://doi.org/10.1175/1520-0434(2003)018<0545:NAS>2.0.CO;2.

  • NCEI, 2021: NEXRAD data inventory search. NOAA, accessed 10 October 2019, http://www.ncdc.noaa.gov/nexradinv/.

  • Odena, A., V. Dumoulin, and C. Olah, 2016: Deconvolution and checkerboard artifacts. Distill, https://distill.pub/2016/deconv-checkerboard/.

  • Pech-Pacheco, J. L., G. Cristobal, J. Chamorro-Martinez, and J. Fernandez-Valdivia, 2000: Diatom autofocusing in brightfield microscopy: A comparative study. Proc. 15th Int. Conf. on Pattern Recognition, Barcelona, Spain, IEEE, 314–317, https://doi.org/10.1109/ICPR.2000.903548.

  • Pulkkinen, S., D. Nerini, A. A. P. Hortal, C. Velasco-Forero, A. Seed, U. Germann, and L. Foresti, 2019: Pysteps: An open-source python library for probabilistic precipitation nowcasting (v1.0). Geosci. Model Dev., 12, 4185–4219, https://doi.org/10.5194/gmd-12-4185-2019.

  • Ronneberger, O., P. Fischer, and T. Brox, 2015: U-net: Convolutional networks for biomedical image segmentation. 18th Int. Conf. on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, MICCAI Society, 234–241.

  • Ruiz, P., 2019: Understanding and visualizing ResNets. Medium, https://towardsdatascience.com/understanding-and-visualizing-resnets-442284831be8.

  • Ruzanski, E., V. Chandrasekar, and Y. Wang, 2011: The CASA nowcasting system. J. Atmos. Oceanic Technol., 28, 640–655, https://doi.org/10.1175/2011JTECHA1496.1.

  • Sato, R., H. Kashima, and T. Yamamoto, 2018: Short-term precipitation prediction with skip-connected PredNet. 27th Int. Conf. on Artificial Neural Networks, Rhodes, Greece, European Neural Network Society, 373–382.

  • Shi, E., Q. Li, D. Gu, and Z. Zhao, 2018: A method of weather radar echo extrapolation based on convolutional neural networks. Int. Conf. on Multimedia Modeling, Bangkok, Thailand, MMM, 16–28.

  • Shi, X., 2017: HKO-7. Github, https://github.com/sxjscience/HKO-7.

  • Shi, X., Z. Chen, H. Wang, D. Y. Yeung, W. K. Wong, and W. C. Woo, 2015: Convolutional LSTM network: A machine learning approach for precipitation nowcasting. 29th Conf. on Neural Information Processing Systems, Montreal, QC, Canada, NeurIPS, 802–810.

  • Shi, X., Z. Gao, L. Lausen, H. Wang, D. Y. Yeung, W. K. Wong, and W. C. Woo, 2017: Deep learning for precipitation nowcasting: A benchmark and a new model. 31st Conf. on Neural Information Processing Systems, Long Beach, CA, NeurIPS, 5618–5628.

  • Su, A., H. Li, L. Cui, and Y. Chen, 2020: A convection nowcasting method based on machine learning. Adv. Meteor., 2020, 5124274, https://doi.org/10.1155/2020/5124274.

  • Tran, Q. K., and S. K. Song, 2019: Computer vision in precipitation nowcasting: Applying image quality assessment metrics for training deep neural networks. Atmosphere, 10, 244, https://doi.org/10.3390/atmos10050244.

  • Wang, Y., and Coauthors, 2017: Guidelines for nowcasting techniques. WMO Rep. WMO-1198, 82 pp.

  • Woo, W.-C., and W. K. Wong, 2017: Operational application of optical flow techniques to radar-based rainfall nowcasting. Atmosphere, 8, 48, http://doi.org/10.3390/atmos8030048.

  • Fig. 1. Example architecture of the proposed models with RNN layers. resConv, which has no RNN, uses the same architecture with the RNN layers removed.

  • Fig. 2. Diagram of the binary model. Adapting a nonbinary model into a binary one consists of applying a threshold operation to the target at the input and a scale factor at the output of the model. The output takes two values, 0 and x dBZ, where x is the threshold used.

  • Fig. 3. Comparison between using and not using residual connections. (a) Example prediction at 1 h into the future; the first frame is the observation and the following four frames are the predictions from resConv, Conv, resGRU, and convGRU. (b) Average metrics over multiple thresholds at every predicted lead time. Red denotes models with recurrent layers and blue models without; dashed lines correspond to models without residual connections.

  • Fig. 4. Performance metrics for all analyzed models. CSI, FAR, POD, and ETS are calculated by binarizing the predictions at a threshold, so the results shown are averages over the thresholds used (listed in parentheses).
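
    These four scores are the standard contingency-table metrics. A minimal NumPy sketch of one threshold's computation follows; `skill_scores` and the example thresholds are illustrative, not code from the paper. Averaging over several thresholds, as in the figure, simply repeats this per threshold.

    ```python
    import numpy as np

    def skill_scores(pred_dbz, obs_dbz, threshold):
        """CSI, FAR, POD, and ETS at a single reflectivity threshold."""
        pred = np.asarray(pred_dbz) >= threshold
        obs = np.asarray(obs_dbz) >= threshold
        hits = np.sum(pred & obs)
        misses = np.sum(~pred & obs)
        false_alarms = np.sum(pred & ~obs)
        total = pred.size
        pod = hits / (hits + misses)                 # probability of detection
        far = false_alarms / (hits + false_alarms)   # false alarm ratio
        csi = hits / (hits + misses + false_alarms)  # critical success index
        hits_random = (hits + misses) * (hits + false_alarms) / total
        ets = (hits - hits_random) / (hits + misses + false_alarms - hits_random)
        return {"POD": pod, "FAR": far, "CSI": csi, "ETS": ets}

    # Averaging over thresholds, as in Fig. 4 (illustrative threshold values):
    # scores = [skill_scores(p, o, t) for t in (20, 25, 30, 35)]
    ```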

  • Fig. 5. Example predictions for ML-based models. (top row) The observation and (remaining rows) the predictions for different models.

  • Fig. 6. Example predictions for optical flow–based models. The observation can be found in Fig. 5; each row is the prediction from a different model.

  • Fig. 7. Columns show the benchmark of the top-performing models at different reflectivity thresholds using only the metrics (top) CSI and (bottom) FAR. The figure should be read columnwise, with thresholds of (a) 20, (b) 25, (c) 30, and (d) 35 dBZ, which allows the models' performance to be characterized at different storm severities.

  • Fig. 8. Comparison of average metrics over thresholds (listed in parentheses) for the different regions' datasets.

  • Fig. 9. Comparison of the difference between average metrics over thresholds (ResConv − DARTS) for the different regions' datasets.

  • Fig. 10. Example of a storm event showing (a) checkerboard artifacts and (b) edge problems and blurriness. (top) Observations of two storm frames and (bottom) predictions at an early stage of model training.

  • Fig. 11. Pixelwise metrics over the frames of the validation set containing storm events, used to detect edge artifacts in the predictions. Rows are different metrics—(top) MSE, (middle) ETS, and (bottom) FAR—and columns are different lead times. The desired result is a spatially homogeneous frame; a repetitive pattern across frames indicates an artifact.
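
    The per-pixel aggregation behind these maps can be sketched briefly. The snippet below is a minimal sketch assuming predictions and observations stacked as (N, H, W) NumPy arrays and shows only the MSE panel; the binary scores would be accumulated per pixel from thresholded frames in the same way.

    ```python
    import numpy as np

    def pixelwise_mse(preds, obs):
        # Average the squared error over the frame axis, leaving one
        # value per pixel; a spatially homogeneous map indicates no
        # edge artifacts, whereas elevated values along the borders
        # would point to convolution padding effects.
        preds = np.asarray(preds, dtype=np.float64)
        obs = np.asarray(obs, dtype=np.float64)
        return np.mean((preds - obs) ** 2, axis=0)  # shape (H, W)
    ```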

  • Fig. 12. Pixelwise blur-detection plots over the validation set containing storm events, used to detect blur artifacts in the predicted frames. Values are calculated as the standard deviation of the response to a Laplacian kernel; refer to section 4d for a detailed explanation. (top) Observations and (bottom) predictions. Any artifact also present in the observations was not produced by the model but by the radar or postprocessing.
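
    A per-frame version of this measure can be sketched with SciPy, following the Laplacian focus measure of Pech-Pacheco et al. (2000) and taking the standard deviation as in the caption; how the response is mapped back to pixels for the plots is not reproduced here.

    ```python
    import numpy as np
    from scipy import ndimage

    def laplacian_sharpness(frame):
        # Convolve with a Laplacian kernel and take the standard
        # deviation of the response: low values indicate blur,
        # high values indicate sharp edges.
        lap = ndimage.laplace(np.asarray(frame, dtype=np.float64))
        return lap.std()
    ```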
