1. Introduction
Lightning poses significant hazards to society, whether directly through lightning strikes on humans or indirectly by, for instance, igniting fires, damaging electrical infrastructure, or disrupting aviation. This causes significant harm to human health and life as well as considerable monetary costs (e.g., Holle 2014, 2016), and climate change is expected to further exacerbate these societal impacts (Price and Rind 1994; Koshak et al. 2015). Conversely, lightning avoidance also incurs significant costs in the form of cancellations, delays, and service outages. It is therefore important that such precautions be taken when required but that unnecessary interventions be avoided when possible. Making informed decisions about the proper response to impending lightning requires its accurate prediction.
Lightning usually originates in convective storms, which develop rapidly and occur in limited areas. This makes it difficult to predict their exact location using numerical weather prediction (NWP). For short lead times, it is often preferable to use nowcasting, the statistical prediction of the development of weather patterns using the most recent available observations. On the other hand, statistical nowcasting models, being ignorant of the physics of the atmosphere, begin to lose their predictive power at longer forecast time scales. Merging the two approaches by using the statistical nowcast for near-term predictions, the NWP forecast at longer lead times, and a combination of the two in between, is known as seamless nowcasting (Kober et al. 2012; Wastl et al. 2018; Nerini et al. 2019; Sideris et al. 2020).
As with many complex statistical data problems, lightning nowcasting has recently been the subject of research applying machine learning (ML) models. The ML approaches can be broadly divided into two categories: object-based nowcasting, which uses conventional methods to detect storm objects and their motion and then applies machine learning to predict their development, and grid-based nowcasting, which operates directly on gridded data and produces gridded outputs. Grid-based nowcasting avoids the complexity of the storm detection and tracking algorithms (including the somewhat arbitrary cell definition and the problems with merging and splitting cells) at the cost of requiring more advanced machine learning methods, especially if it is also desired to predict the motion of thunderstorm cells. In the object-based category, recently published research includes the studies of Shrestha et al. (2021) and Leinonen et al. (2022b). Among studies on grid-based ML prediction of lightning, Lin et al. (2019), Zhou et al. (2020), and Geng et al. (2021) used deep learning techniques based on convolutional neural networks (CNNs), while Blouin et al. (2016) and La Fata et al. (2021) used methods based on decision trees. Mostajabi et al. (2019) considered the nowcasting of lightning at weather station locations using ML.
Our study falls into the category of grid-based nowcasting. We present a neural network with convolutional layers to model spatial features and recurrent layers to model temporal development. This network draws data from multiple sources including lightning detection, weather radar, satellite imagery, NWP and topographical information. The architecture of the network is specifically designed to seamlessly combine data from observational and NWP sources. The output of the network can be interpreted as the probability of lightning occurrence, and therefore is suitable for uncertainty quantification and for enabling each end user to select thresholds that best conform to their needs. Our work is similar to that of Geng et al. (2021) but complements and improves upon their work with a larger set of input data and a neural network specifically designed to incorporate NWP data seamlessly and to retain high-resolution input features in the near-term predictions. We also use higher spatial and temporal resolution (1 km and 5 min as compared with their 4 km and 60 min) and a shorter maximum lead time (1 h vs 6 h), emphasizing very-short-term warnings of lightning hazards. Furthermore, we present a more extensive analysis of various model features, especially training loss functions and calibration.
In this article, we first describe the input datasets (section 2) and the neural network model used to make the predictions (section 3). We then compare the different network features, evaluate the best models and discuss examples of predicted cases of lightning (section 4). We then summarize the results and present final conclusions and directions for future work (section 5).
2. Data
a. Study area
We carried out our study in an area roughly defined by the coverage of the Swiss operational radar network, shown in Fig. 1. This area contains all of Switzerland as well as a considerable distance in each direction beyond its borders. We chose the area because it provides plentiful data for thunderstorm research, is covered comprehensively by radars with high spatial and temporal resolution, and has a high population density and sensitive infrastructure that make the prediction of severe weather particularly important. Selecting this area also eases the later adaptation of the research results into operational applications.
Fig. 1. The study area, with the terrain elevation shown in color and the international borders shown as black lines. The locations of the weather radars are shown as red-outlined circles, and the shaded area depicts the area outside the range of the radars and hence excluded from the study. The scale bar indicates a distance of 256 km, the size of the subdomains used for training.
Citation: Artificial Intelligence for the Earth Systems 1, 4; 10.1175/AIES-D-22-0043.1
The study area is characterized by highly variable terrain. Terrain types range between the flat plains of the Po, Rhine, and Saône Valleys, the moderate-elevation regions of the Black Forest, the Vosges, the Jura, the French Massif Central, and the Ligurian Apennines, and the high main chain of the Alps where the surface elevation frequently reaches above 3000 m above sea level. A small part of the area in the south is covered by the Mediterranean Sea. The climatological occurrence of lightning in the area, particularly on the southern flank of the Alps, is among the highest in Europe (Taszarek et al. 2019), making it highly suitable for the purposes of this research. Thunderstorms are also frequent on the northern flank of the Alps. A comparison of the occurrence, characteristics, and driving forces of thunderstorms on the northern and southern sides reveals similarities but also differences, as has been shown for hail (Nisi et al. 2018; Barras et al. 2021).
We process our data on a regular grid that covers the study area with 1 km resolution. The grid is defined using the European Petroleum Survey Group (EPSG): 21781 projection, covering the range [255, 965 km] in the projection coordinates in the east–west direction and [−160, 480 km] in the north–south direction, resulting in a grid of 710 × 640 points. Some products such as the radar composite are natively produced on this grid; other products were projected into this grid before further processing. The regions out of range of the radars, shown shaded in Fig. 1, are excluded from the study.
b. Data sources and preprocessing
Below, we describe the data sources used in this study, how we expect them to contribute to the prediction of lightning, and the preprocessing applied to them. A complete list of the input variables can be found in Table A1 of the appendix.
1) Lightning detection
The current lightning activity is an excellent predictor for the occurrence of lightning in the near future. Our lightning data were collected with the European Cooperation for Lightning Detection (EUCLID) network of ground-based lightning detection antennas that determine the location of lightning strikes using triangulation and time-of-arrival differences. The operation of the network is described by Schulz et al. (2016) and Poelman et al. (2016). The data were processed and provided to MeteoSwiss by Météorage.
The raw data products consist of the time, location, estimated current, and various other descriptors of individual lightning strikes. To transform these data into a format compatible with our network, we accumulated the individual strikes into maps of lightning density covering 5-min periods. We also created similar maps of current-weighted density. These maps were normalized to bring the mean activity close to 1. Furthermore, we used the lightning data to create our targets, where a pixel is set to 1 if a lightning strike occurred within 8 km of that pixel within the last 10 min, and otherwise to 0. This definition is used in safety procedures at airports for takeoff and landing operations based on the regulations of the European Union (2017) and the International Civil Aviation Organization (2018). Such a binary definition is not concerned with the lightning flash rate but with the occurrence of lightning regardless of the rate, and therefore emphasizes cases with a low flash rate. Such situations are particularly hazardous because lightning may surprise people who have not yet sought shelter (Holle et al. 1993). This definition is used here to demonstrate the algorithm for one realistic use case, but the definition can be easily modified to accommodate the needs of other users. For example, to emphasize low-flash-rate situations further, the radius used to define lightning occurrence could be increased from 8 km or the time window widened from 10 min.
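The target construction amounts to a morphological dilation: any pixel within 8 km of a strike in the last two 5-min accumulation maps is set to 1. The sketch below illustrates this on the 1-km grid; the function name and array layout are hypothetical, not the operational implementation.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def lightning_target(strike_maps, radius_km=8, window_steps=2):
    """Binary target: 1 where any strike occurred within `radius_km`
    of a pixel during the last `window_steps` 5-min accumulation maps.
    `strike_maps` is a (T, H, W) array of per-step strike counts on the
    1-km grid (illustrative layout, not the authors' exact code)."""
    # Disk-shaped structuring element with an 8 km radius (1 px = 1 km)
    yy, xx = np.mgrid[-radius_km:radius_km + 1, -radius_km:radius_km + 1]
    disk = (xx**2 + yy**2) <= radius_km**2
    any_strike = strike_maps[-window_steps:].sum(axis=0) > 0
    return binary_dilation(any_strike, structure=disk).astype(np.uint8)
```

Widening the time window or the radius, as suggested above, only changes `window_steps` or `radius_km`.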
2) Precipitation radar
The precipitation radar data originate from the Swiss operational weather radar network operated by MeteoSwiss (Germann et al. 2016, 2022). The network consists of five C-band dual-polarization Doppler radars, which cover the entire area of Switzerland along with regions of the neighboring countries. The radars indirectly measure the intensity of precipitation at the surface. As the radars scan at multiple elevation angles and the scans of different radars overlap, the radars also provide information on the three-dimensional structure of the radar reflectivity. The relatively short distances between the radars in comparison with their range, and the strategic placement of certain radars at high elevations, mitigate the observational gaps caused by the terrain blocking the radar beams. The operational processing chain merges the measurements from the different radars into one composite on the study grid.
Of the radar data, we used the vertical-profile-corrected estimate of the precipitation at the surface (RZC), the maximum column reflectivity (CZC), echo-top heights at radar reflectivity thresholds of 20 and 45 dBZ (EZC-20 and EZC-45, respectively), the vertically integrated liquid water content (LZC), and the height of the maximum radar echo (HZC). These were chosen from among the descriptors available in the radar archive because the radar reflectivity and its vertical profile provide a direct observation of the hydrometeors relevant for the initiation of lightning. In particular, nonsticky collisions between graupel (or larger ice crystals) coated with a quasi-liquid layer of supercooled water and smaller upward-moving ice crystals in the convective updraft region cause a separation of electric charges (Takahashi 1978). The chosen radar observations have been identified by many sources as indicators of thunderstorms and lightning (e.g., Marshall and Radhakant 1978; Gremillion and Orville 1999; Hering et al. 2004, 2006; Houze 2014). For processing in the neural network, we transformed RZC and LZC to a logarithmic scale, motivated by the globally lognormal distribution of rain intensity (Kedem and Chiu 1987). We shifted and scaled RZC, CZC, and LZC to distributions that are close to the standard normal distribution. EZC and HZC, which have natural minima at 0, were scaled to a mean of approximately 1. The details of the transformations can be found in the appendix (Table A1).
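A logarithmic transform followed by standardization can be sketched as below. The small offset and the on-the-fly statistics are illustrative assumptions; in practice the shift and scale would be fixed constants computed on the training set (the actual values are listed in Table A1).

```python
import numpy as np

def standardize_log(field, eps=0.01, mean=None, std=None):
    """Log-transform a nonnegative radar field (e.g., rain rate RZC or
    vertically integrated liquid LZC) and shift/scale it toward a
    standard normal distribution. `eps` keeps zeros finite; `mean` and
    `std` would normally be precomputed training-set constants
    (illustrative sketch, not the authors' exact transformation)."""
    x = np.log10(field + eps)
    if mean is None:
        mean, std = x.mean(), x.std()
    return (x - mean) / std
```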
3) Satellite imagery: SEVIRI
The satellite data were obtained from the Rapid Scan service of the Spinning Enhanced Visible and InfraRed Imager (SEVIRI; Schmid 2000), which is flown on board each of the Meteosat Second Generation (MSG) satellites. The products we used originate from the MSG-3 satellite, which is in geostationary orbit over 0° longitude. Every 5 min, the Rapid Scan service scans one-third of the disk visible from the satellite, centered on central and western Europe. The SEVIRI instrument provides data at 11 narrow-wavelength bands ranging from the visible to the infrared. The details of the bands can be found in Table A1 of the appendix, where the three numbers in each band name indicate the band wavelength: for instance, IR-087 corresponds to a band at 8.7 μm. Furthermore, SEVIRI produces a broadband high-resolution visible wavelength data product (HRV). Reflectances, brightness temperatures, as well as differences and temporal derivatives of the MSG bands are associated with convective cloud properties, as discussed by Mecikalski et al. (2010b,a). Higher-level cloud data products are derived from the SEVIRI data by the Nowcasting Satellite Application Facility (NWCSAF). These include information about the type, height, and microphysical properties of clouds. We included them because we expected them to convey further information about cloud-top phenomena associated with severe weather, such as overshooting tops (Bedka et al. 2010; Bedka 2011) and above-anvil cirrus plumes (Bedka et al. 2018).
We input transformed versions of each of the SEVIRI bands into our network. The thermal bands are available at all times of day and are expressed as brightness temperatures, which were transformed to mean μ ≈ 0 and standard deviation σ ≈ 1 using the same scaling for all bands except for IR-039, which is sensitive to both solar and infrared radiation, and hence was given its own scaling parameters.
The solar bands include HRV, the two visible-wavelength bands, as well as IR-016, the infrared band nearest to the visible range. These are not available at night and are set to zero during these times. To cancel out diurnal variation, we divided the solar band radiances by cosθz, where θz is the solar zenith angle. After this, we applied thresholds below which the radiance was set to zero in order to mask out signals originating from the surface. While this may also hide some thin clouds, we do not expect this to be a major issue for the present application as we concentrate on phenomena that are associated with very thick clouds. Finally, we transformed the solar bands to μ ≈ 1.
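The solar-band preprocessing (zenith-angle normalization, surface-signal masking, and zeroing at night) can be sketched as follows. The threshold value and function name are illustrative assumptions, not the constants actually used.

```python
import numpy as np

def normalize_solar(radiance, cos_zenith, threshold=0.05):
    """Sketch of the solar-band preprocessing: divide by cos(theta_z)
    to cancel diurnal variation, then zero out nighttime pixels and
    sub-threshold signals assumed to originate from the surface.
    The threshold of 0.05 is illustrative only."""
    # Guard the divisor so nighttime pixels do not divide by <= 0
    out = np.where(cos_zenith > 0.0,
                   radiance / np.maximum(cos_zenith, 1e-3), 0.0)
    # Mask out weak signals (surface reflectance rather than cloud)
    return np.where(out < threshold, 0.0, out)
```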
Of the NWCSAF products, we use the cloud phase, cloud-top temperature, cloud-top height and cloud optical thickness, the last of which is not available at night while the others are available at all times of day. Combinations of these have been used to identify deep convective (i.e., tall and optically thick) clouds in previous studies (e.g., Oreopoulos et al. 2014). The cloud phase is a categorical variable indicating either no cloud, liquid cloud, ice cloud, or mixed-phase cloud. This was transformed into a one-hot feature. The cloud-top temperature and height were scaled close to (μ, σ) ≈ (0, 1). Meanwhile, we took the logarithm of the cloud optical thickness following Leinonen et al. (2016) and normalized this to near (μ, σ) ≈ (0, 1).
The subsatellite resolution of all satellite channels and products except HRV is 3 km, corresponding to roughly 3 km × 5 km at the latitude range of the study area. These products were resampled to a grid with a resolution of 4 km, shared with the NWP products. The HRV has a subsatellite resolution of 1 km and was resampled to the 1 km resolution grid also used for the radar and lightning observations. The resampling was performed using projection to the 1 km grid with PyTroll (Raspaud et al. 2018), followed by taking the average of each 4 km × 4 km square to reduce the resolution of the channels other than HRV.
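The reduction from the 1-km grid to the shared 4-km grid by averaging each 4 km × 4 km square can be written compactly with a reshape; this is a minimal sketch of that averaging step, assuming the grid dimensions are divisible by the factor.

```python
import numpy as np

def block_average(field, factor=4):
    """Reduce a (H, W) field to (H/factor, W/factor) by averaging each
    factor x factor square, as used to bring 1-km fields onto the 4-km
    grid shared with the NWP products. H and W must be divisible by
    `factor`."""
    h, w = field.shape
    return field.reshape(h // factor, factor,
                         w // factor, factor).mean(axis=(1, 3))
```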
We recognized that the lack of availability of the solar bands and the cloud optical thickness at night could confuse the machine learning algorithm into considering the scene cloudless. To provide information about the time of day, we therefore also provided cosθz as an input. In principle, this should give the network enough information to disregard the solar bands when cosθz ≤ 0.
4) Numerical weather prediction
The NWP products were derived from the archived operational forecast runs of the Consortium for Small-Scale Modeling (COSMO) model (Baldauf et al. 2011), which MeteoSwiss uses for operational NWP. Analysis products would also have been available in the archive but were not used as they would not be available in real-time operations.
Using the results of Leinonen et al. (2022b) as a guideline, we selected various features that pertain to the occurrence of deep convection from among the COSMO model outputs. These were the convective available potential energy with respect to the most unstable level (CAPE-MU), the convective inhibition (CIN), the height of the 0°C isotherm (HZEROCL), the lifting condensation level (LCL), the moisture convergence (MCONV), the vertical velocity of air in pressure coordinates (OMEGA), the surface lifted index (SLI), the soil type, and the temperatures at the surface and at 2-m height (T-SO and T-2M, respectively). The COSMO model produces more variables that would potentially be useful for our prediction tasks. Unfortunately, because of the vast data volume created, some operational forecast outputs had been retained in the archive only for a limited time, and it was not possible to recover all potentially interesting variables at the time the data collection was performed.
As with the other data sources, the features that have a natural zero point, such as those expressing height from the surface, were scaled to μ ≈ 1. Those with an essentially open data range—for example, the temperatures—were shifted and scaled to approximately (μ, σ) ≈ (0, 1). The soil type is a categorical variable, and accordingly we transformed it into a one-hot feature.
The native resolution of the COSMO-1 version used operationally in Switzerland is 1.1 km. We expected that the operational forecast would provide the largest benefits at longer lead times, when the spatial uncertainty of forecast events is very high. With this consideration, and since it was desirable to constrain the amount of data passed to the model, we downsampled the data to 4-km resolution by averaging 4 × 4 pixel squares after projecting the data to the study grid. The forecast products were available at a time resolution of 1 h; linear interpolation was used to produce frames at a 5-min resolution. Reflecting the expected operational use pattern, the latest NWP forecast with lead time of at least 1 h was selected for each time step. Thus, the lead time of the NWP forecast used in the ML model ranges from 1 to 4 h; information about the lead time was not passed to the ML model.
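The linear interpolation of hourly NWP frames to 5-min time steps can be sketched as follows; the function is a hypothetical helper that covers only the interpolation, not the selection of the newest forecast run with lead time ≥ 1 h.

```python
import numpy as np

def interpolate_hourly(frames, minutes):
    """Linearly interpolate hourly NWP frames to arbitrary minute
    offsets from the first frame. `frames` is a (T, H, W) array of
    hourly fields; e.g. minutes=35 blends hours 0 and 1 with weights
    25/60 and 35/60. Illustrative sketch only."""
    frames = np.asarray(frames, dtype=float)
    out = []
    for m in np.atleast_1d(minutes):
        i = min(int(m // 60), frames.shape[0] - 2)  # clamp to last interval
        frac = m / 60.0 - i                         # fraction into interval i
        out.append((1.0 - frac) * frames[i] + frac * frames[i + 1])
    return np.stack(out)
```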
5) Digital elevation model
The elevation in parts of the study area is considerable, indeed the highest in Europe outside the Caucasus, and orography is widely known to physically influence convective processes (Kirshbaum et al. 2018) and to be statistically linked to lightning occurrence (e.g., Dissing and Verbyla 2003). Therefore, we expected that it would be important to include information about the elevation in the analysis despite the results of Leinonen et al. (2022b) that suggested that the DEM does not contribute significantly to the prediction skill. We used a set of DEM data derived from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) global DEM (Abrams et al. 2020). Of the DEM features, we passed the elevation and the east–west and north–south direction derivatives to the model. The elevation was scaled to a mean of 1, and the derivatives were scaled to (μ, σ) ≈ (0, 1).
c. Selection and processing
Because we wanted to focus on predicting lightning, we downselected data from our study area and time period into the dataset such that only those spatiotemporal regions where convective activity was likely occurring nearby were included. We identified these regions based on the radar-derived rainfall rate. At each time step in the study period, we located regions in the study area where the rainfall rate exceeded 10 mm h−1 in a contiguous area of more than 10 pixels. For each such area, we added to the dataset the spatiotemporal box containing the 256 × 256 pixels surrounding the area at every time step ±2 h from the time of occurrence. Although lightning can occur at rain rates lower than 10 mm h−1 or even with no rain (e.g., Hodanish et al. 2004; Schultz et al. 2021), the 10 mm h−1 threshold concerns the maximum in the 256 km × 256 km box; therefore, the training data should also contain many cases of lightning with lower rain rates and consequently the threshold should not overly restrict the training dataset.
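The region identification described above (contiguous areas exceeding the rain-rate threshold over more than 10 pixels) can be sketched with connected-component labeling; the function below is an illustrative reconstruction of that step, not the authors' code.

```python
import numpy as np
from scipy import ndimage

def convective_regions(rain_rate, rate_thresh=10.0, min_pixels=10):
    """Find contiguous regions where the radar rain rate (mm/h) exceeds
    `rate_thresh` over more than `min_pixels` pixels, and return the
    centroid of each such region (around which a 256 x 256 pixel box
    would then be extracted)."""
    mask = rain_rate > rate_thresh
    labels, n = ndimage.label(mask)                  # connected components
    sizes = ndimage.sum(mask, labels, index=np.arange(1, n + 1))
    keep = np.flatnonzero(sizes > min_pixels) + 1    # labels large enough
    return ndimage.center_of_mass(mask, labels, keep)
```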
After the downselection, the volume of the data was still prohibitively large for the typical approach of storing the training dataset in a single static tensor. However, the sequences in different training samples overlap spatially and temporally. To avoid duplication of data storage, we divided the study area into tiles of 32 × 32 pixels such that each tile is stored at most once on a given time step. Only tiles that are within the regions of interest defined by the rainfall rate threshold were stored. The training samples are generated on demand from these tiles during network training and evaluation. The elimination of data duplication allows us to greatly reduce the memory footprint or, conversely, use a larger amount of data within the limits of the available memory.
We divided the dataset into a training set used to train the model, a validation set used for performance evaluation at training time and for hyperparameter tuning, and a test set used for final evaluation. Following the strategy of Leinonen et al. (2022b), entire days were set aside for the training set and the testing set in order to minimize overlap between the three subsets of data. First, days were randomly selected until at least 10% of the available time steps had been assigned into the validation set. Then, the same process was repeated with the remaining data to assign another at least 10% into the testing set. The remaining data were used as the training set.
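The day-based split can be sketched as below. `split_days` is a hypothetical helper implementing the described strategy (random whole days assigned until each holdout set reaches at least 10% of the time steps), not the authors' exact code.

```python
import random

def split_days(timesteps_per_day, frac=0.10, seed=0):
    """Assign whole days to the validation and test sets until each
    holds at least `frac` of all time steps; the remaining days form
    the training set. `timesteps_per_day` maps a day identifier to its
    number of time steps. Illustrative sketch of the split strategy."""
    rng = random.Random(seed)
    days = list(timesteps_per_day)
    rng.shuffle(days)
    total = sum(timesteps_per_day.values())
    splits = {"valid": set(), "test": set()}
    it = iter(days)
    for name in ("valid", "test"):
        count = 0
        while count < frac * total:       # keep adding whole days
            d = next(it)
            splits[name].add(d)
            count += timesteps_per_day[d]
    splits["train"] = set(it)             # everything left over
    return splits
```

Splitting by whole days rather than by individual time steps minimizes the temporal overlap (and hence correlation) between the three subsets.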
The final dataset includes a total of 30 641 different possible starting times for the training sequences, of which 24 113 are in the training set, 3210 in the validation set and 3318 in the testing set. In total, 1 021 447 different samples, that is, around 30 samples per starting time, can be generated (not including the further diversity added by data augmentation). However, there is considerable overlap between these both spatially and temporally. The effective number of unique training samples in the dataset can be estimated from the total volume of the data: The total number of data points is approximately 6.7 × 109, which corresponds to 5680 image sequences of 18 × 256 × 256 size. Approximately 7.7 × 107 data points fulfill the condition that lightning occurred within 8 km in the previous 10 min; this is roughly 1.1% of the total, indicating a severely unbalanced dataset. The generation of training samples will be discussed in more detail in section 3b.
3. Models
a. Neural network
The model we propose for lightning nowcasting is a neural network that uses convolutional layers to model spatial relationships and recurrent connections to model the temporal evolution of the state of the atmosphere. The network follows an encoder–forecaster architecture, where the encoder produces an analysis of the state of the atmosphere as a deep representation, while the forecaster decodes this representation into a prediction of the future evolution of the target variable. Shortcut connections similar to U-Net networks (Ronneberger et al. 2015) are used to allow the encoder to be connected to the forecaster simultaneously at multiple different scales. The architecture was developed from that used by Leinonen et al. (2021); a variant developed in parallel work was used by Leinonen (2021a,b). The design resembles those already used in nowcasting applications by Franch et al. (2020), Cuomo and Chandrasekar (2021) and Ravuri et al. (2021).
A diagram of the network architecture is shown in Fig. 2. The downsampling connections use residual blocks derived from ResNet (He et al. 2016), with strided convolutions to downsample the resolution by a factor of 2. The upsampling connections are similar, but they apply bilinear upsampling by a factor of 2 before the residual block. The recurrent connections use a variant of the convolutional gated recurrent unit (Ballas et al. 2016) where the convolutions have been replaced with residual blocks as in Leinonen (2021b). The weights of the downsampling blocks in the encoder are shared between the time steps, as are those of the upsampling blocks in the forecaster. The recurrent connections at a given resolution use shared weights in the encoder and shared weights in the forecaster, but the weights are not shared between the encoder and the forecaster. The hidden states of the recurrent units in the encoder are initialized to zero, while those in the forecaster are initialized to the final state of the corresponding recurrent unit in the encoder, passed through a convolutional layer to allow the forecaster to have representations different from those of the encoder. The output of the forecaster is a time series of images with each pixel set to a value between 0 and 1, with larger values corresponding to higher confidence in lightning occurrence at that location and time step.
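The gated recurrent update can be written out explicitly. Below is a minimal numpy sketch of one ConvGRU step in which 1 × 1 convolutions (plain channel-mixing matrices) stand in for the residual blocks that the network actually uses; the weight layout is an assumption made for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv_gru_step(x, h, W, U):
    """One gated recurrent unit step with 1x1 convolutions standing in
    for the residual blocks of the paper's variant.
    x: (H, W, Cin) input; h: (H, W, C) hidden state.
    W: dict of (Cin, C) input weights; U: dict of (C, C) recurrent
    weights; keys 'z' (update gate), 'r' (reset gate), 'h' (candidate)."""
    z = sigmoid(x @ W["z"] + h @ U["z"])               # update gate
    r = sigmoid(x @ W["r"] + h @ U["r"])               # reset gate
    h_tilde = np.tanh(x @ W["h"] + (r * h) @ U["h"])   # candidate state
    return (1 - z) * h + z * h_tilde                   # blended new state
```

In the full network, the update is applied at every spatial scale, with separate (non-shared) weights in the encoder and the forecaster.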
Fig. 2. An illustration of the network architecture. For clarity, only 3 time steps are shown for both the past and the future; the actual network processes 6 past time steps and 12 future time steps. Here, N is the number of predictors, lr indicates low resolution, hr indicates high resolution, p indicates the past time frame, and f indicates the future time frame (i.e., COSMO variables). In our case Nf,lr = 9, Nf,hr = 10, Np,lr = 20, and Np,hr = 20.
The encoder has multiple branches that correspond to the different input data time frames (past or future) and spatial resolutions. When a prediction is made, observational data are clearly only available for the past time frame, but NWP forecasts are also available for the future and can be exploited to produce a seamless nowcast. The output of the future branch of the encoder, which uses the NWP data, is used as the input to the deepest recurrent layer of the forecaster. The branches corresponding to different spatial resolutions are treated equally except that the downsampling operations are skipped in the lower-resolution branches until the resolution becomes equal to the next-highest resolution branch, at which point the two branches are merged using concatenation. The network can create past branches with the full resolution and additional resolutions 2, 4, and 8 times lower, and future branches with the same set of resolutions. The network construction script analyzes the inputs and automatically creates only the branches that are necessary for the input data. In the case of our input data, we have two past branches with resolutions of 1 km × 1 km (radar, lightning, DEM, and HRV data) and 4 km × 4 km (SEVIRI data other than HRV) and two future branches with the same resolutions (NWP data at 4 km × 4 km and static data such as the DEM at 1 km × 1 km). An architecturally simpler alternative would have been to upsample all data to the highest resolution before processing. Processing the lower-resolution input data at a resolution closer to the original saves processing time, memory and CPU-to-GPU transfer time. In the future, the model can also easily be adapted to new predictor datasets. 
Because of its capacity to process datasets with different resolutions, the model will also be well suited to processing the observations from the upcoming Meteosat Third Generation, whose solar and thermal channels are imaged at 1- and 2-km subsatellite resolution, respectively.
The model in its baseline configuration did not use normalization or dropout. The experience from Leinonen (2021b) was that the model architecture is resilient to overfitting and these mitigation tools are not necessarily needed. Dropout can be optionally included and the consequences of this will be discussed in section 4a.
The network was implemented in Python using Tensorflow/Keras (Chollet 2015). NumPy (Harris et al. 2020), SciPy (Virtanen et al. 2020), and the PyTroll libraries (Raspaud et al. 2018) were used for data processing; Numba (Lam et al. 2015) and Dask (Rocklin 2015) were used for optimization; and Matplotlib (Hunter 2007) was used for visualization.
b. Training
We trained the model to predict Nf = 12 time steps in the future from observational data from the Np = 6 time steps in the past and NWP data for the Nf future time steps. At the 5-min time resolution, this corresponds to 30 min in the past and 60 min in the future. We briefly examined also using 12 time steps for the past time frame but found no noticeable benefit over 6 steps.
At initialization, the data generator locates all available three-dimensional boxes of (Np + Nf) × Nh × Nw pixels (where Nh = Nw = 256 are the height and width of the training images, respectively) in the 1-km-resolution training data. These have been selected to be in or near regions containing likely convection as described in section 2c. During each training epoch, the data generator iterates over all possible starting times of the training sequences in randomized order. Only one training sample is generated for each starting time at each epoch, selected randomly from among the possible choices. This reduces the overlap of training samples, with the objective of avoiding overfitting to the cases of most widespread convection. Additionally, random rotations at 90° intervals and random mirroring are used to further increase the diversity of training samples and to incentivize the network to learn to be approximately invariant with respect to these transformations.
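The augmentation step (random rotations at 90° intervals and random mirroring, applied identically to inputs and targets) can be sketched as follows; the function name is hypothetical.

```python
import numpy as np

def augment(sample, rng):
    """Randomly rotate a (T, H, W, C) training sequence by a multiple
    of 90 degrees and randomly mirror it. The same transformation must
    be applied to inputs and targets to preserve their spatial
    correspondence (illustrative sketch)."""
    k = rng.integers(0, 4)                     # 0, 90, 180, or 270 degrees
    sample = np.rot90(sample, k, axes=(1, 2))  # rotate in the spatial plane
    if rng.integers(0, 2):
        sample = sample[:, :, ::-1]            # mirror along the width axis
    return sample
```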
The training was performed on a computing cluster node with eight Nvidia V100 GPUs. With this hardware, one epoch required approximately 18 min of training time. The number of training epochs was not fixed, as different loss functions may require different amounts of training time. Instead, we followed an early stopping strategy where the learning rate is divided by 5 if the loss in the validation set has not improved for three epochs, and the training is stopped if the validation loss has not improved for six epochs; after stopping, the model weights giving the best validation loss are saved. Unlike with the training set, the order of samples in the validation and testing sets was not shuffled, nor was random data augmentation applied, in order to prevent a spurious improvement in the validation and test losses due to a random selection of more favorable inputs. We found that the training typically stopped after 20–30 epochs.
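The early stopping schedule can be replayed as a simple loop over per-epoch validation losses. The sketch below is an illustrative re-implementation of the described rules, not the training script itself (in Keras, the equivalent behavior is available through the ReduceLROnPlateau and EarlyStopping callbacks).

```python
def train_schedule(val_losses, lr0=1e-4, reduce_patience=3, stop_patience=6):
    """Replay the described schedule: divide the learning rate by 5
    after `reduce_patience` epochs without validation improvement, and
    stop after `stop_patience`. Returns the final learning rate and the
    number of epochs actually run. The initial learning rate is an
    assumed placeholder."""
    lr, best, bad = lr0, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, bad = loss, 0           # improvement: reset the counter
        else:
            bad += 1
            if bad % reduce_patience == 0:
                lr /= 5.0                 # no improvement for 3 epochs
            if bad >= stop_patience:
                return lr, epoch          # no improvement for 6: stop
    return lr, len(val_losses)
```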
In contrast to the training time, the model is very fast to evaluate: we were able to generate a single sample and produce a prediction for it in 1.2 s on a modern computer with 16 CPU cores. Thus, GPU hardware is not necessary to use the model operationally, and bottlenecks in producing warnings are likely to be in data acquisition rather than computation.
c. Evaluation
Various skill scores can be computed to describe the predictive power of a binary forecast model; these have been summarized in the atmospheric science context by Hogan and Mason (2012). Scores differ in terms of whether they assign more weight to the correct prediction of occurring events or that of nonoccurrence. Moreover, for highly unbalanced datasets such as that used in our study, some skill scores are less suitable than others (Branco et al. 2016).
From the scores defined in Eqs. (1)–(4), one can derive various relatively straightforward metrics of success. Many of the metrics have varying names in different fields; below we define each metric only once and mention the alternative names.
The scores above are tradeoffs with respect to each other: For example, a decision threshold of T = 0 gives a POD of 1 but a high FAR, whereas T = 1 gives a FAR of 0 but also a POD of 0. For this reason, other scores have been devised which balance the different correct and incorrect predictions and generally attain their maximum at some threshold 0 < T < 1. Of these, we use the following:
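To make the tradeoff concrete, the threshold-dependent scores can be computed from the 2 × 2 contingency table; the helper below is an illustration using the standard definitions (POD, FAR, CSI, and PSS = POD − POFD), not the paper's evaluation code.

```python
import numpy as np

def skill_scores(p, obs, T):
    """POD, FAR, CSI, and PSS for a probability field p, boolean
    observation field obs, and decision threshold T, derived from the
    2x2 contingency table. Illustrative helper."""
    pred = p >= T
    hits = np.sum(pred & obs)
    misses = np.sum(~pred & obs)
    false_alarms = np.sum(pred & ~obs)
    correct_neg = np.sum(~pred & ~obs)
    pod = hits / (hits + misses)                 # probability of detection
    far = false_alarms / (hits + false_alarms)   # false-alarm ratio
    csi = hits / (hits + misses + false_alarms)  # critical success index
    pofd = false_alarms / (false_alarms + correct_neg)
    return pod, far, csi, pod - pofd             # PSS = POD - POFD
```

At T = 0 every pixel is predicted positive, so POD = 1 at the cost of a high FAR, reproducing the tradeoff described above.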
d. Training losses
Our prediction task falls into the general category of predicting the probability of a binary event occurring. As the probability is predicted for each pixel, the prediction also has much in common with image segmentation tasks, where the task of a model is to identify the pixels in an image that belong to a certain category. Loss functions for image segmentation have recently been systematically reviewed by Jadon (2020) and Mehrtash et al. (2020). One important difference from image segmentation is that our model also needs to consider the shift of the regions over time. While this does not affect the definitions of the losses, it means that conclusions drawn about the relative merits of loss functions in image segmentation may not be applicable to our problem.
4. Results and discussion
a. Model selection
To refine the network, we experimentally evaluated the effect of various design choices on its skill. All such evaluations were carried out using the validation dataset; the test set was set aside for the evaluation of the final selected model. Many hyperparameter choices were already examined for a similar network by Leinonen (2021a), though this was done in the context of predicting a continuous variable using mean square error loss. Hence, the main focus of model tuning in this work was on adapting the network to probabilistic predictions.
1) Loss functions
The most important difference between the continuous predictions in Leinonen (2021a) and the binary categorical predictions in this work is the choice of loss function. We evaluated the performance of the model using the various losses introduced in section 3d. Each loss was evaluated using two choices of class weighting: equal weighting and inverse frequency weighting [IFW; Eq. (17)], using the occurrence frequency of the target variable, f1 = 0.0106. An exception is the CSI loss, which is naturally weighted and accordingly was tested only once.
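Eq. (17) defines the weighting; a common normalization, assumed in the sketch below, makes the class weights proportional to 1/fi while keeping the expected weight over the dataset equal to 1. The paper's exact constants may differ.

```python
def inverse_frequency_weights(f1):
    """Class weights inversely proportional to class frequency, normalized
    so that the expected weight over the dataset is 1. One common form of
    IFW, assumed here for illustration."""
    f0 = 1.0 - f1
    return 0.5 / f0, 0.5 / f1   # (weight for class 0, weight for class 1)

# with the lightning occurrence frequency f1 = 0.0106:
w0, w1 = inverse_frequency_weights(0.0106)
```

With this normalization the rare lightning class receives a weight of roughly 47, while the nonoccurrence class receives roughly 0.5.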
In Fig. 3, we show the CSI and PSS metrics for the different losses as a function of the threshold chosen. The ETS and HSS metrics, while numerically different from CSI, exhibit essentially the same patterns as CSI, hence they are omitted from Fig. 3. FL with γ = 2 attains the highest of both scores, with CSI of 0.391 and PSS of 0.910. The differences in the top performance scores between the loss choices are modest except for the CSI loss, which produces a fairly good CSI but performs poorly with PSS. The choice of loss strongly affects the threshold T where the metrics peak. With both CSI and PSS, the equally weighted models peak at lower T than the IFW models. Increasing γ, that is, giving more weight to the uncertain cases, increases the optimal T with the equally weighted losses but decreases it with the IFW losses. The CSI loss tends to produce outputs that are very close to either 0 or 1 with few values between, so the choice of T has little effect on the metrics.
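For reference, the equally weighted focal loss of Lin et al. (2017) has a compact per-pixel form, with γ = 0 recovering cross entropy; the NumPy sketch below is illustrative (the trained model uses an equivalent loss in its deep learning framework).

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, eps=1e-7):
    """Per-pixel focal loss (Lin et al. 2017) for predicted probability p
    and binary target y. The factor (1 - pt)**gamma down-weights pixels
    that are already classified confidently; gamma = 0 recovers the
    cross entropy. Illustrative NumPy sketch."""
    p = np.clip(p, eps, 1.0 - eps)
    pt = np.where(y == 1, p, 1.0 - p)          # probability of the true class
    return -((1.0 - pt) ** gamma) * np.log(pt)
```

Increasing γ thus shifts the effective weight of the loss toward the uncertain pixels, which is consistent with the threshold behavior described above.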
(a) CSI and (b) PSS skill scores as a function of the threshold for different loss functions.
Citation: Artificial Intelligence for the Earth Systems 1, 4; 10.1175/AIES-D-22-0043.1
The equally weighted losses outperform the IFW losses. We find this result somewhat surprising, as IFW has been established as a relatively standard way of helping models learn from unbalanced datasets and is endorsed by, for example, the original paper on FL (Lin et al. 2017). The result is further supported by Fig. 4, where we show the precision–recall curves for the different models. These curves also show the equally weighted losses slightly outperforming the IFW losses. For visual clarity, we have omitted the γ = 1 losses from Fig. 4, as their precision–recall curves are indistinguishable from the CE and γ = 2 equivalents. The CSI loss has also been omitted, as its tendency to yield values very close to 0 or 1 makes it difficult to produce precision–recall curves.
Precision–recall curves for various loss functions.
2) Calibration
To yield proper probabilistic forecasts, the output of the model must correctly reflect the probability of the event occurring. Among the loss functions presented in section 3d, only CE is strictly a probability-theoretic metric of the distance between distributions. Accordingly, this is the only loss that we can expect to accurately represent the probability of an event occurring in a given pixel. Both the IFW and the additional factor introduced in FL break the probabilistic assumptions.
Calibration curves, which express the occurrence rate of the target variable as a function of the predicted probability p, are shown in Fig. 5 for the different losses. The occurrence rate has been calculated for 100 bins equally spaced in p. As expected, only the CE loss produces a near 1:1 correspondence between the predicted and observed occurrence. Equally weighted FL results in a roughly sigmoid-shaped curve that crosses the 1:1 line near 0.5. The IFW losses result in calibration curves that remain low until rather high p, in particular for WCE, and then increase steeply. The output of the CSI loss is very heavily weighted toward values near 0 and 1, producing sampling noise in the other bins.
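The binned occurrence-rate computation underlying such calibration curves can be sketched as follows (illustrative helper; empty bins yield NaN):

```python
import numpy as np

def calibration_curve(p, y, n_bins=100):
    """Observed occurrence rate of a binary target y in n_bins equally
    spaced bins of predicted probability p. Returns the bin centers and
    the per-bin occurrence rate; empty bins yield NaN."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    occurrences = np.bincount(idx, weights=y.astype(float), minlength=n_bins)
    with np.errstate(divide="ignore", invalid="ignore"):
        rate = occurrences / counts
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, rate
```

For a perfectly calibrated forecast, the per-bin rate follows the 1:1 line up to sampling noise.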
Model calibration. The probability predicted by the model is shown on the horizontal axis, and the actual rate of occurrence for that prediction is on the vertical axis. The dotted gray line indicates 1:1 correspondence.
It is possible to recalibrate models by applying the curves shown in Fig. 5 to the model output. The recalibration does not affect metrics like CSI and PSS or the precision–recall curves because it merely shifts the thresholds. Recalibration is easier if the curve properly covers the full [0, 1] range of occurrence rates and if it is not too steep at any point. The WCE has a steep calibration curve near p = 1, which complicates the calibration, while the CSI loss is essentially impossible to calibrate. The focal losses produce usable calibration curves for both the IFW and equally weighted variants. Meanwhile, the equally weighted CE loss can probably be used without recalibration in most applications.
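A minimal recalibration, assuming a monotone calibration curve such as those produced by the focal losses, linearly interpolates the raw output along the curve; because the mapping is monotone, it only shifts decision thresholds and leaves CSI, PSS, and the precision–recall curves unchanged. The helper and curve values below are illustrative.

```python
import numpy as np

def recalibrate(p_raw, curve_p, curve_rate):
    """Map raw model outputs onto calibrated probabilities by linear
    interpolation along the calibration curve (bin centers `curve_p`,
    observed rates `curve_rate`). Assumes a monotone curve; a
    non-monotone curve would need smoothing first."""
    return np.interp(p_raw, curve_p, curve_rate)

# a sigmoid-shaped curve, qualitatively like the equally weighted FL case
curve_p = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
curve_rate = np.array([0.0, 0.1, 0.5, 0.9, 1.0])
p_cal = recalibrate(np.array([0.2, 0.4, 0.6]), curve_p, curve_rate)
```

A steep segment of the curve, as for WCE near p = 1, compresses many raw outputs into a narrow calibrated range, which is why such curves complicate recalibration.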
It is interesting to contrast the results shown here to the conclusions of Mukhoti et al. (2020), who argued that FL improves the calibration of the models over CE. We find that this is true for the IFW losses, but the opposite happens with the equally weighted losses where increasing γ makes the calibration worse.
3) Dropout
Dropout and weight regularization are commonly used to prevent overfitting and improve convergence in the training of neural networks. Dropout was omitted by Leinonen (2021a), as the model architecture was found not to be very prone to overfitting. In this work, we examined this in more detail. We experimented with two alternatives: an FL (γ = 2) model using dropout with a rate of 0.1 in the downsampling and upsampling layers as well as weight decay with a rate of 10−4 in the AdaBelief optimizer, and another without these features. For both alternatives, we trained three instances, identical except for the random initialization of weights.
The CSI, PSS, and PR AUC scores, and their standard deviations, are shown in the first two rows of Table 1. They indicate that the models using dropout (DO) and weight decay (WD) perform slightly better than the models without them, although the differences are only on the order of one standard deviation. The standard deviation values also suggest that using these features improves training stability, producing more consistent scores across different training runs. Therefore, we elected to use dropout and weight decay in this study.
Skill score comparison of different models with the validation set. The numbers indicate the mean score; if a plus/minus sign is present, the following value is the sample standard deviation; DO stands for dropout, and WD stands for weight decay. The skill scores are determined with different thresholds T optimizing each skill score independently.
4) Model variance and ensembling
We selected the model using FL with γ = 2 with dropout and weight decay as our primary model, based on its good performance with the metrics and on the ease of recalibrating it. Since three instances of the model had been trained for examining the effect of DO and WD, we also created an ensemble model that outputs the average of the three models. Such ensembling is often found to result in performance that is better than that of any of the individual models (Ganaie et al. 2021). We obtained the same result: the ensemble scores shown in the last row of Table 1 demonstrate that the ensemble outperforms the individual models.
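The ensembling itself is a plain average of the member outputs. The sketch below assumes model objects exposing a `predict` method, standing in for the three trained instances.

```python
import numpy as np

def ensemble_predict(models, x):
    """Average the probability fields of independently trained model
    instances. `models` is any sequence of objects with a `predict`
    method; here they stand in for the three trained networks."""
    return np.mean([m.predict(x) for m in models], axis=0)
```

Averaging probabilities preserves the [0, 1] range, so the ensemble output can be recalibrated in the same way as a single model.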
b. Evaluation
Having selected the model based on the results of section 4a, we use the recalibrated ensemble of three FL γ = 2 models to evaluate results on the test dataset.
1) Skill metrics
Various skill scores of our model are shown in Table 2, compared with the Eulerian and Lagrangian persistence models. The Eulerian persistence model assumes that lightning activity, including its location, remains the same as at the last past time step. The Lagrangian persistence model additionally extrapolates the lightning field from the final past time step using motion detected from the RZC field with the Lucas–Kanade method (Lucas and Kanade 1981) as implemented in the Pysteps library (Pulkkinen et al. 2019). The skill scores were calculated by selecting the threshold T that gives the optimal CSI in the validation dataset and then evaluating the scores on the test dataset. The optimal T for the test dataset would have been only slightly different: 0.421 instead of 0.426. The skill scores other than CSI in Table 2 were also computed using this T, even though other choices of T may be preferable if one wishes to optimize those scores instead. This is in contrast to Table 1, where each score was computed using the T optimal for that score. Consequently, the PSS is worse in Table 2 than in Table 1, but since T was selected differently, this does not indicate that the model performs worse with the test dataset than with the validation set. Indeed, the better CSI score for the test set (0.453) than for the validation set (0.398) indicates that the test set is somewhat less challenging for the model. Regardless, our model clearly outperforms the persistence models for all metrics. The AUC scores cannot be computed for the persistence models, as they are not probabilistic.
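As a simplified illustration of the two baselines, the sketch below uses a single integer displacement per time step in place of the dense Lucas–Kanade motion field from Pysteps:

```python
import numpy as np

def persistence_forecasts(last_frame, motion, n_steps):
    """Simplified persistence baselines: Eulerian persistence repeats the
    last observed lightning field unchanged, while Lagrangian persistence
    advects it by a (dy, dx) pixel displacement per step. The single
    integer motion vector stands in for the dense motion field detected
    with Lucas-Kanade in the operational comparison. Returns two
    (n_steps, H, W) arrays."""
    dy, dx = motion
    eulerian = np.repeat(last_frame[None], n_steps, axis=0)
    lagrangian = np.stack(
        [np.roll(last_frame, ((t + 1) * dy, (t + 1) * dx), axis=(0, 1))
         for t in range(n_steps)])
    return eulerian, lagrangian
```

The wrap-around behavior of `np.roll` at the domain edges is an artifact of this simplification; a real extrapolation scheme handles outflow boundaries explicitly.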
Skill scores of the model on the test dataset. The scores for the Lagrangian and Eulerian persistence models are shown for comparison. The threshold T was selected to give the optimal CSI for the validation set.
Comparisons of the skill metrics with the results of earlier studies, whether ML based or traditional, are difficult to perform because their definitions for lightning occurrence differ from our “within 8 km in the last 10 min” definition. Some studies also calculate the metrics using each thunderstorm cell as a data point, while we use a more demanding pointwise calculation that is likely to produce lower scores. Hence, we avoid a direct comparison here.
2) Effect of lead time on skill
The skill of a forecast model can be expected to degrade with increasing lead time. In Fig. 6, we show the optimal CSI and PSS for the model as a function of lead time between 5 and 60 min. These are compared with the equivalent results of the Eulerian and Lagrangian persistence models. The performance at the first time step with 5-min lead time is high because the target variable is lightning occurrence within the last 10 min, meaning that some of the correct answers on the first time step can be inferred directly from the input. There is a rather sharp drop in CSI from 5 to 10 min, followed by a more gradual decline. The relative advantage of our deep learning model over the persistence models grows with increasing lead time. With PSS, our model retains rather high scores even at long lead times because PSS weights detections much more heavily than false alarms.
(a) CSI and (b) PSS of the model (blue) at lead times from 5 to 60 min, in comparison with the Lagrangian (orange) and Eulerian (green) persistence assumptions.
3) Example cases
While it is not possible to cover the wide variety of possible cases within the constraints of this article, we chose three examples for discussion that demonstrate the ability of the model to predict the movement, growth, and decay of convective systems. These have all been evaluated with the calibrated model such that the predicted probabilities accurately reflect the probability of occurrence.
The first example, shown in Fig. 7, shows a relatively fast-moving system that is actively producing lightning. A comparison of the observed and forecast lightning activity demonstrates that the model has correctly inferred the speed and direction of the motion of the system. It also correctly predicts that lightning activity in the system will continue at a similar intensity, giving high confidence that lightning will occur on the center right of the image even at the last time step of the prediction at t = +60 min.
An example of predictions with our model in a case with a moving convective system at 1200 UTC 11 Jul 2020. The two columns of images on the left show four input variables: rain rate, lightning occurrence as defined for the target variable, the HRV satellite image, and cloud-top height (CTH). The four columns on the right show the NWP-predicted CAPE, observed lightning occurrence, and the lightning probability predicted by our model at lead times indicated on top of each column.
In the second example in Fig. 8, the lightning activity decreases from t ≤ 0 to t > 0. The model is able to recognize this from the input sequence and correctly predicts decreasing probabilities in all regions of lightning activity. It forecasts very little lightning activity for the last time step, and this is indeed the case in the observation.
As in Fig. 7, but with decaying thunderstorms at 0130 UTC 2 Aug 2020. The HRV data are missing because the case occurs at night.
In contrast to the previous example, Fig. 9 shows initiating and intensifying cells. Three slightly different cases can be identified. First, the cell on the bottom left is already active at t = −15 min and the model predicts with high confidence that it will also remain active. Second, on the top right lightning activity has just initiated, being present at t = 0 but not at t = −15 min. The model infers that activity in this area will continue. Near the center there is an area where no lightning has been detected in the input data. Nevertheless, the model detects a growing cell in this area, presumably using other input variables, and correctly predicts that lightning activity will begin there. There is a region on the center right of this example where lightning occurs in the observation while the predicted probability does not exceed the p = 0.025 threshold for visualization. However, the assigned probability is still nonzero in this area, ranging between approximately 0.01 and 0.02 at t = +60 min. In contrast, in the top left corner that is farthest from lightning activity, the predicted probability is approximately 3 × 10−5, indicating much higher confidence in the absence of lightning.
As in Fig. 7, but with growing thunderstorms at 1015 UTC 2 Aug 2020. The data are missing for the t = 0 frame of the CTH.
As manually selected cases, the examples shown in this section are not statistically representative of the dataset; to provide a representative sample, we have included equivalent figures of 32 random cases from the test dataset in the accompanying data archive (Leinonen et al. 2022a). These cases, or indeed the entire test dataset, cannot fully cover the variety of cases that our model might encounter, especially if it is used in conditions that are considerably outside the training distribution. However, they demonstrate that the network works well in a wide variety of cases and does not easily suffer from artifacts or glitches creating spurious lightning predictions.
5. Conclusions
The ability of deep neural networks to learn spatial relationships using convolutional layers and temporal relationships using recurrent layers makes recurrent-convolutional networks, which utilize both of these, a natural fit for forecasting the spatiotemporal evolution of atmospheric fields. The network introduced in this study utilizes this architecture to estimate the probability of lightning occurrence at each grid point and time step in the 60 min following the reference time. It uses inputs from ground-based radar, satellite observations, lightning detection, numerical weather prediction and a digital elevation model. The output probability can be utilized flexibly by the end users to issue warnings at specific thresholds, balancing their tolerance to nondetections and false alarms.
The results indicate that our selected model is able to infer the stage of the life cycle of convection from the input data. It can predict the motion, growth and decay of lightning-producing thunderstorm cells, adding to its ability to accurately predict the occurrence of lightning. The ability to directly predict the movement using only convolutional and recurrent layers also means that object detection and tracking are not necessarily needed to nowcast thunderstorm hazards.
Ideally, a probabilistic nowcasting model for the occurrence of rare events should separate the occurrences from nonoccurrences as efficiently as possible and accurately represent the probability of the event occurring. Whether the latter is the case depends on the choice of loss function. Among the losses we examined, only the cross entropy is strictly probabilistic; others, such as the focal loss or losses utilizing unequal class weighting, break the assumptions of the probabilistic loss and require recalibration of the output before it can be interpreted as a probability. In this work, we adopted the focal loss with focusing parameter γ = 2, which does require recalibration but whose calibration curve is well behaved enough that this can be done easily. While this loss achieved the best metrics, cross entropy performed only slightly worse and is naturally calibrated, making it a good alternative in cases where recalibration is not practical or desirable. Contrary to established practice, we found that equal weighting of the lightning occurrence and nonoccurrence classes produced better results than inverse frequency weighting, even though the dataset is severely unbalanced.
Given that our network architecture is not specific to lightning, and that we exploit a multisource dataset in this study that can give us information about many different hazards, it is expected that our approach can be easily adapted to other input and target variables. For instance, the upcoming Meteosat Third Generation satellites will provide higher-resolution geostationary observations for Europe, potentially helping CNNs extract more information. Furthermore, the importance of different data sources, previously examined by Zhou et al. (2020) and Leinonen et al. (2022b), is yet to be quantified in this context, but is necessary in order to understand the expected performance of the network in, for example, regions where ground-based radar observations are not available. We intend to investigate these topics in detail in a follow-up study. Further input variables could also be added in future versions, such as a distinction between cloud-to-ground and cloud-to-cloud lightning, polarimetric radar variables, observed or simulated hydrometeor densities that can act as indicators of lightning activity (Besic et al. 2016; Figueras i Ventura et al. 2019), and a more detailed description of the planetary boundary layer.
We found it difficult to compare our results with those of other studies because of differences in the datasets and the lightning occurrence definitions. To remedy this problem, we recommend that the community adopt standardized definitions and benchmark datasets to enable fair comparisons between different approaches.
Acknowledgments.
We thank Simone Balmelli for his assistance with the lightning data. The work of author Leinonen was supported by the fellowship “Seamless Artificially Intelligent Thunderstorm Nowcasts” from the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT). The hosting institution of this fellowship is MeteoSwiss in Switzerland.
Data availability statement.
The preprocessed training, validation, and testing datasets created for this study, as well as the trained models and precomputed results, are available for noncommercial use under the CC BY-NC-SA 4.0 license (https://doi.org/10.5281/zenodo.6802292; Leinonen et al. 2022a). The ML and analysis code used in this study can be found online (https://github.com/MeteoSwiss/c4dl-lightningdl). The original data from the EUCLID lightning network are proprietary and cannot be made available in raw form. The original data from the Swiss radar network and the COSMO NWP model can be made available for research purposes on request. The MSG SEVIRI Rapid Scan radiances are available to EUMETSAT members and participating organizations at the EUMETSAT Data Store (https://data.eumetsat.int/). The NWCSAF products can be created from these data using the publicly available NWCSAF software (https://www.nwcsaf.org/). The ASTER DEM can be obtained online (https://doi.org/10.5067/ASTER/ASTGTM.003; NASA/METI/AIST/Japan Spacesystems and U.S./Japan ASTER Science Team 2019).
APPENDIX
REFERENCES
Abrams, M., R. Crippen, and H. Fujisada, 2020: ASTER global digital elevation model (GDEM) and ASTER Global Water Body Dataset (ASTWBD). Remote Sens., 12, 1156, https://doi.org/10.3390/rs12071156.
Baldauf, M., A. Seifert, J. Förstner, D. Majewski, M. Raschendorfer, and T. Reinhardt, 2011: Operational convective-scale numerical weather prediction with the COSMO model: Description and sensitivities. Mon. Wea. Rev., 139, 3887–3905, https://doi.org/10.1175/MWR-D-10-05013.1.
Ballas, N., L. Yao, C. Pal, and A. Courville, 2016: Delving deeper into convolutional networks for learning video representations. arXiv, 1511.06432, https://doi.org/10.48550/arXiv.1511.06432.
Barras, H., O. Martius, L. Nisi, K. Schroeer, A. Hering, and U. Germann, 2021: Multi-day hail clusters and isolated hail days in Switzerland—Large-scale flow conditions and precursors. Wea. Climate Dyn., 2, 1167–1185, https://doi.org/10.5194/wcd-2-1167-2021.
Bedka, K. M., 2011: Overshooting cloud top detections using MSG SEVIRI infrared brightness temperatures and their relationship to severe weather over Europe. Atmos. Res., 99, 175–189, https://doi.org/10.1016/j.atmosres.2010.10.001.
Bedka, K. M., J. Brunner, R. Dworak, W. Feltz, J. Otkin, and T. Greenwald, 2010: Objective satellite-based detection of overshooting tops using infrared window channel brightness temperature gradients. J. Appl. Meteor. Climatol., 49, 181–202, https://doi.org/10.1175/2009JAMC2286.1.
Bedka, K. M., E. M. Murillo, C. R. Homeyer, B. Scarino, and H. Mersiovsky, 2018: The above-anvil cirrus plume: An important severe weather indicator in visible and infrared satellite imagery. Wea. Forecasting, 33, 1159–1181, https://doi.org/10.1175/WAF-D-18-0040.1.
Besic, N., J. Figueras i Ventura, J. Grazioli, M. Gabella, U. Germann, and A. Berne, 2016: Hydrometeor classification through statistical clustering of polarimetric radar measurements: A semi-supervised approach. Atmos. Meas. Tech., 9, 4425–4445, https://doi.org/10.5194/amt-9-4425-2016.
Blouin, K. D., M. D. Flannigan, X. Wang, and B. Kochtubajda, 2016: Ensemble lightning prediction models for the province of Alberta, Canada. Int. J. Wildland Fire, 25, 421–432, https://doi.org/10.1071/WF15111.
Branco, P., L. Torgo, and R. P. Ribeiro, 2016: A survey of predictive modeling on imbalanced domains. ACM Comput. Surv., 49 (2), 1–50, https://doi.org/10.1145/2907070.
Chang, J., X. Zhang, J. Chang, M. Ye, D. Huang, P. Wang, and C. Yao, 2018: Brain tumor segmentation based on 3D Unet with multi-class focal loss. 11th Int. Congress on Image and Signal Processing, BioMedical Engineering and Informatics, Beijing, China, Institute of Electrical and Electronics Engineers, 1–5, https://doi.org/10.1109/CISP-BMEI.2018.8633056.
Chollet, F., 2015: Keras. GitHub, https://keras.io.
Cuomo, J., and V. Chandrasekar, 2021: Use of deep learning for weather radar nowcasting. J. Atmos. Oceanic Technol., 38, 1641–1656, https://doi.org/10.1175/JTECH-D-21-0012.1.
Davis, J., and M. Goadrich, 2006: The relationship between precision–recall and ROC curves. Proc. 23rd Int. Conf. on Machine Learning, New York, NY, Association for Computing Machinery, 233–240, https://doi.org/10.1145/1143844.1143874.
Dissing, D., and D. L. Verbyla, 2003: Spatial patterns of lightning strikes in interior Alaska and their relations to elevation and vegetation. Can. J. For. Res., 33, 770–782, https://doi.org/10.1139/x02-214.
Doi, K., and A. Iwasaki, 2018: The effect of focal loss in semantic segmentation of high resolution aerial image. Int. Geosci. Remote Sens. Symp., Valencia, Spain, Institute of Electrical and Electronics Engineers, 6919–6922, https://doi.org/10.1109/IGARSS.2018.8519409.
European Union, 2017: Commission implementing regulation (EU) 2017/373 of 1 March 2017 laying down common requirements for providers of air traffic management/air navigation services and other air traffic management network functions and their oversight. Off. J. Eur. Union, 60, L62, https://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:32017R0373&from=EN.
Figueras i Ventura, J., and Coauthors, 2019: Analysis of the lightning production of convective cells. Atmos. Meas. Tech., 12, 5573–5591, https://doi.org/10.5194/amt-12-5573-2019.
Franch, G., D. Nerini, M. Pendesini, L. Coviello, G. Jurman, and C. Furlanello, 2020: Precipitation nowcasting with orographic enhanced stacked generalization: Improving deep learning predictions on extreme events. Atmosphere, 11, 267, https://doi.org/10.3390/atmos11030267.
Ganaie, M. A., M. Hu, A. K. Malik, M. Tanveer, and P. N. Suganthan, 2021: Ensemble deep learning: A review. arXiv, 2104.02395, https://doi.org/10.48550/arXiv.2104.02395.
Geng, Y., and Coauthors, 2021: A deep learning framework for lightning forecasting with multi-source spatiotemporal data. Quart. J. Roy. Meteor. Soc., 147, 4048–4062, https://doi.org/10.1002/qj.4167.
Germann, U., M. Boscacci, M. Gabella, and M. Schneebeli, 2016: Weather radar in Switzerland. From Weather Observations to Atmospheric and Climate Science in Switzerland: Celebrating 100 Years of The Swiss Society for Meteorology, S. Willemse and M. Furger, Eds., Vdf Hochschulverlag AG, 165–188.
Germann, U., M. Boscacci, L. Clementi, M. Gabella, A. Hering, M. Sartori, I. V. Sideris, and B. Calpini, 2022: Weather radar in complex orography. Remote Sens., 14, 503, https://doi.org/10.3390/rs14030503.
Goodfellow, I., Y. Bengio, and A. Courville, 2016: Deep Learning. MIT Press, 800 pp., http://www.deeplearningbook.org.
Gremillion, M. S., and R. E. Orville, 1999: Thunderstorm characteristics of cloud-to-ground lightning at the Kennedy Space Center, Florida: A study of lightning initiation signatures as indicated by the WSR-88D. Wea. Forecasting, 14, 640–649, https://doi.org/10.1175/1520-0434(1999)014<0640:TCOCTG>2.0.CO;2.
Harris, C. R., and Coauthors, 2020: Array programming with NumPy. Nature, 585, 357–362, https://doi.org/10.1038/s41586-020-2649-2.
He, K., X. Zhang, S. Ren, and J. Sun, 2016: Deep residual learning for image recognition. 2016 IEEE Conf. on Computer Vision and Pattern Recognition, Las Vegas, NV, Institute of Electrical and Electronics Engineers, 770–778, https://doi.org/10.1109/CVPR.2016.90.
Hering, A. M., C. Morel, G. Galli, S. Sénési, P. Ambrosetti, and M. Boscacci, 2004: Nowcasting thunderstorms in the Alpine region using a radar based adaptive thresholding scheme. Third European Conf. on Radar in Meteorology and Hydrology, Visby, Sweden, Copernicus, 206–211, https://www.copernicus.org/erad/2004/online/ERAD04_P_206.pdf.
Hering, A. M., U. Germann, M. Boscacci, and S. Sénési, 2006: Operational thunderstorm nowcasting in the Alpine region using 3D-radar severe weather parameters and lightning data. Fourth European Conf. on Radar in Meteorology and Hydrology, Barcelona, Spain, Centre de Recerca Aplicada en Hidrometeorologia, http://www.crahi.upc.edu/ERAD2006/proceedingsMask/00122.pdf.
Hodanish, S., R. L. Holle, and D. T. Lindsey, 2004: A small updraft producing a fatal lightning flash. Wea. Forecasting, 19, 627–632, https://doi.org/10.1175/1520-0434(2004)019<0627:ASUPAF>2.0.CO;2.
Hogan, R. J., and I. B. Mason, 2012: Deterministic forecasts of binary events. Forecast Verification: A Practitioner’s Guide in Atmospheric Science, I. T. Jolliffe and D. B. Stephenson, Eds., John Wiley & Sons, Ltd., 31–59.
Holle, R. L., 2014: Some aspects of global lightning impacts. Int. Conf. on Lightning Protection, Shanghai, China, Institute of Electrical and Electronics Engineers, 1390–1395, https://doi.org/10.1109/ICLP.2014.6973348.
Holle, R. L., 2016: A summary of recent national-scale lightning fatality studies. Wea. Climate Soc., 8, 35–42, https://doi.org/10.1175/WCAS-D-15-0032.1.
Holle, R. L., R. E. López, R. Ortiz, C. H. Paxton, D. M. Decker, and D. L. Smith, 1993: The local meteorological environment of lightning casualties in central Florida. 17th Conf. on Severe Local Storms and Conf. on Atmospheric Electricity, St. Louis, MO, Amer. Meteor. Soc., 779–784.
Houze, R. A., Jr., 2014: Cloud Dynamics. 2nd ed. Elsevier, 496 pp.
Hunter, J. D., 2007: Matplotlib: A 2D graphics environment. Comput. Sci. Eng., 9, 90–95, https://doi.org/10.1109/MCSE.2007.55.
International Civil Aviation Organization, 2018: Annex 3 Meteorological Service for International Air Navigation. 20th ed. International Civil Aviation Organization Doc., 250 pp.
Jadon, S., 2020: A survey of loss functions for semantic segmentation. Conf. on Computational Intelligence in Bioinformatics and Computational Biology, Via del Mar, Chile, Institute of Electrical and Electronics Engineers, 1–7, https://doi.org/10.1109/CIBCB48159.2020.9277638.
Kedem, B., and L. S. Chiu, 1987: On the lognormality of rain rate. Proc. Natl. Acad. Sci. USA, 84, 901–905, https://doi.org/10.1073/pnas.84.4.901.
Kirshbaum, D. J., B. Adler, N. Kalthoff, C. Barthlott, and S. Serafin, 2018: Moist orographic convection: Physical mechanisms and links to surface-exchange processes. Atmosphere, 9, 80, https://doi.org/10.3390/atmos9030080.
Kober, K., G. C. Craig, C. Keil, and A. Dörnbrack, 2012: Blending a probabilistic nowcasting method with a high-resolution numerical weather prediction ensemble for convective precipitation forecasts. Quart. J. Roy. Meteor. Soc., 138, 755–768, https://doi.org/10.1002/qj.939.
Koshak, W. J., K. L. Cummins, D. E. Buechler, B. Vant-Hull, R. J. Blakeslee, E. R. Williams, and H. S. Peterson, 2015: Variability of CONUS lightning in 2003–12 and associated impacts. J. Appl. Meteor. Climatol., 54, 15–41, https://doi.org/10.1175/JAMC-D-14-0072.1.
La Fata, A., F. Amato, M. Bernardi, M. D’Andrea, R. Procopio, and E. Fiori, 2021: Cloud-to-ground lightning nowcasting using machine learning. 35th Int. Conf. on Lightning Protection (ICLP) and XVI Int. Symp. on Lightning Protection, Colombo, Sri Lanka, Institute of Electrical and Electronics Engineers, 1–6, https://doi.org/10.1109/ICLPandSIPDA54065.2021.9627428.
Lam, S. K., A. Pitrou, and S. Seibert, 2015: Numba: A LLVM-based Python JIT compiler. Proc. Second Workshop on the LLVM Compiler Infrastructure in HPC, New York, NY, Association for Computing Machinery, 1–6, https://doi.org/10.1145/2833157.2833162.
Leinonen, J., 2021a: Improvements to short-term weather prediction with recurrent-convolutional networks. 2021 IEEE Int. Conf. on Big Data, Orlando, FL, Institute of Electrical and Electronics Engineers, 5764–5769, https://doi.org/10.1109/BigData52589.2021.9671869.
Leinonen, J., 2021b: Spatiotemporal weather data predictions with shortcut recurrent-convolutional networks: A solution for the Weather4cast challenge. First Workshop on Complex Data Challenges in Earth Observation, Gold Coast, Queensland, Australia, CEUR, 3052, http://ceur-ws.org/Vol-3052/short15.pdf.
Leinonen, J., M. D. Lebsock, G. L. Stephens, and K. Suzuki, 2016: Improved retrieval of cloud liquid water from CloudSat and MODIS. J. Appl. Meteor. Climatol., 55, 1831–1844, https://doi.org/10.1175/JAMC-D-16-0077.1.
Leinonen, J., D. Nerini, and A. Berne, 2021: Stochastic super-resolution for downscaling time-evolving atmospheric fields with a generative adversarial network. IEEE Trans. Geosci. Remote Sens., 59, 7211–7223, https://doi.org/10.1109/TGRS.2020.3032790.
Leinonen, J., U. Hamann, and U. Germann, 2022a: Data archive for “Seamless lightning nowcasting with recurrent-convolutional deep learning”. Zenodo, https://doi.org/10.5281/zenodo.6802292.
Leinonen, J., U. Hamann, U. Germann, and J. R. Mecikalski, 2022b: Nowcasting thunderstorm hazards using machine learning: The impact of data sources on performance. Nat. Hazards Earth Syst. Sci., 22, 577–597, https://doi.org/10.5194/nhess-22-577-2022.
Lin, T., and Coauthors, 2019: Attention-based dual-source spatiotemporal neural network for lightning forecast. IEEE Access, 7, 158 296–158 307, https://doi.org/10.1109/ACCESS.2019.2950328.
Lin, T.-Y., P. Goyal, R. Girshick, K. He, and P. Dollár, 2017: Focal loss for dense object detection. IEEE Int. Conf. on Computer Vision, Venice, Italy, Institute of Electrical and Electronics Engineers, 2999–3007, https://doi.org/10.1109/ICCV.2017.324.
Lucas, B. D., and T. Kanade, 1981: An iterative image registration technique with an application to stereo vision. Proc. Seventh Int. Joint Conf. on Artificial Intelligence, San Francisco, CA, Morgan Kaufmann Publishers Inc., 674–679.
Marshall, J. S., and S. Radhakant, 1978: Radar precipitation maps as lightning indicators. J. Appl. Meteor., 17, 206–212, https://doi.org/10.1175/1520-0450(1978)017<0206:RPMALI>2.0.CO;2.
Mecikalski, J. R., W. M. MacKenzie Jr., M. König, and S. Muller, 2010a: Cloud-top properties of growing cumulus prior to convective initiation as measured by Meteosat Second Generation. Part II: Use of visible reflectance. J. Appl. Meteor. Climatol., 49, 2544–2558, https://doi.org/10.1175/2010JAMC2480.1.
Mecikalski, J. R., W. M. MacKenzie, M. Koenig, and S. Muller, 2010b: Cloud-top properties of growing cumulus prior to convective initiation as measured by Meteosat Second Generation. Part I: Infrared fields. J. Appl. Meteor. Climatol., 49, 521–534, https://doi.org/10.1175/2009JAMC2344.1.
Mehrtash, A., W. M. Wells, C. M. Tempany, P. Abolmaesumi, and T. Kapur, 2020: Confidence calibration and predictive uncertainty estimation for deep medical image segmentation. IEEE Trans. Med. Imaging, 39, 3868–3878, https://doi.org/10.1109/TMI.2020.3006437.
Mostajabi, A., D. L. Finney, M. Rubinstein, and F. Rachidi, 2019: Nowcasting lightning occurrence from commonly available meteorological parameters using machine learning techniques. npj Climate Atmos. Sci., 2, 41, https://doi.org/10.1038/s41612-019-0098-0.
Mukhoti, J., V. Kulharia, A. Sanyal, S. Golodetz, P. Torr, and P. Dokania, 2020: Calibrating deep neural networks using focal loss. Advances in Neural Information Processing Systems, H. Larochelle et al., Eds., Vol. 33, Curran Associates, Inc., 15 288–15 299, https://proceedings.neurips.cc/paper/2020/file/aeb7b30ef1d024a76f21a1d40e30c302-Paper.pdf.
NASA/METI/AIST/Japan Spacesystems and U.S./Japan ASTER Science Team, 2019: ASTER Global Digital Elevation Model V003. NASA EOSDIS Land Processes DAAC, accessed 14 February 2022, https://doi.org/10.5067/ASTER/ASTGTM.003.
Nerini, D., L. Foresti, D. Leuenberger, S. Robert, and U. Germann, 2019: A reduced-space ensemble Kalman filter approach for flow-dependent integration of radar extrapolation nowcasts and NWP precipitation ensembles. Mon. Wea. Rev., 147, 987–1006, https://doi.org/10.1175/MWR-D-18-0258.1.
Nisi, L., A. Hering, U. Germann, and O. Martius, 2018: A 15-year hail streak climatology for the Alpine region. Quart. J. Roy. Meteor. Soc., 144, 1429–1449, https://doi.org/10.1002/qj.3286.
Oreopoulos, L., N. Cho, D. Lee, S. Kato, and G. J. Huffman, 2014: An examination of the nature of global MODIS cloud regimes. J. Geophys. Res. Atmos., 119, 8362–8383, https://doi.org/10.1002/2013JD021409.
Poelman, D. R., W. Schulz, G. Diendorfer, and M. Bernardi, 2016: The European lightning location system EUCLID–Part 2: Observations. Nat. Hazards Earth Syst. Sci., 16, 607–616, https://doi.org/10.5194/nhess-16-607-2016.
Price, C., and D. Rind, 1994: The impact of a 2 × CO2 climate on lightning-caused fires. J. Climate, 7, 1484–1494, https://doi.org/10.1175/1520-0442(1994)007<1484:TIOACC>2.0.CO;2.
Pulkkinen, S., D. Nerini, A. A. Pérez Hortal, C. Velasco-Forero, A. Seed, U. Germann, and L. Foresti, 2019: Pysteps: An open-source Python library for probabilistic precipitation nowcasting (v1.0). Geosci. Model Dev., 12, 4185–4219, https://doi.org/10.5194/gmd-12-4185-2019.
Rahman, A., and Y. Wang, 2016: Optimizing intersection-over-union in deep neural networks for image segmentation. Advances in Visual Computing, G. Bebis et al., Eds., Lecture Notes in Computer Science, Vol. 10072, Springer, 234–244, https://doi.org/10.1007/978-3-319-50835-1_22.
Raspaud, M., and Coauthors, 2018: PyTroll: An open-source, community-driven Python framework to process Earth observation satellite data. Bull. Amer. Meteor. Soc., 99, 1329–1336, https://doi.org/10.1175/BAMS-D-17-0277.1.
Ravuri, S., and Coauthors, 2021: Skilful precipitation nowcasting using deep generative models of radar. Nature, 597, 672–677, https://doi.org/10.1038/s41586-021-03854-z.
Rocklin, M., 2015: Dask: Parallel computation with blocked algorithms and task scheduling. Proc. 14th Python in Science Conf., Austin, TX, SciPy, 126–132, https://doi.org/10.25080/Majora-7b98e3ed-013.
Ronneberger, O., P. Fischer, and T. Brox, 2015: U-Net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, N. Navab et al., Eds., Springer, 234–241, https://doi.org/10.1007/978-3-319-24574-4_28.
Schmid, J., 2000: The SEVIRI instrument. Proc. 2000 EUMETSAT Meteorological Satellite Data Users’ Conf., Bologna, Italy, EUMETSAT, 10 pp., https://www-cdn.eumetsat.int/files/2020-04/pdf_ten_msg_seviri_instrument.pdf.
Schultz, C. J., R. E. Allen, K. M. Murphy, B. S. Herzog, S. A. Weiss, and J. S. Ringhausen, 2021: Investigation of cloud-to-ground flashes in the non-precipitating stratiform region of a mesoscale convective system on 20 August 2019 and implications for decision support services. Wea. Forecasting, 36, 717–735, https://doi.org/10.1175/WAF-D-20-0095.1.
Schulz, W., G. Diendorfer, S. Pedeboy, and D. R. Poelman, 2016: The European lightning location system EUCLID–Part 1: Performance analysis and validation. Nat. Hazards Earth Syst. Sci., 16, 595–605, https://doi.org/10.5194/nhess-16-595-2016.
Shrestha, Y., Y. Zhang, R. Doviak, and P. W. Chan, 2021: Lightning flash rate nowcasting based on polarimetric radar data and machine learning. Int. J. Remote Sens., 42, 6762–6780, https://doi.org/10.1080/01431161.2021.1933243.
Sideris, I. V., L. Foresti, D. Nerini, and U. Germann, 2020: NowPrecip: Localized precipitation nowcasting in the complex terrain of Switzerland. Quart. J. Roy. Meteor. Soc., 146, 1768–1800, https://doi.org/10.1002/qj.3766.
Takahashi, T., 1978: Riming electrification as a charge generation mechanism in thunderstorms. J. Atmos. Sci., 35, 1536–1548, https://doi.org/10.1175/1520-0469(1978)035<1536:REAACG>2.0.CO;2.
Taszarek, M., and Coauthors, 2019: A climatology of thunderstorms across Europe from a synthesis of multiple data sources. J. Climate, 32, 1813–1837, https://doi.org/10.1175/JCLI-D-18-0372.1.
Virtanen, P., and Coauthors, 2020: SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nat. Methods, 17, 261–272, https://doi.org/10.1038/s41592-019-0686-2.
Wastl, C., and Coauthors, 2018: A seamless probabilistic forecasting system for decision making in civil protection. Meteor. Z., 27, 417–430, https://doi.org/10.1127/metz/2018/902.
Zhou, K., Y. Zheng, W. Dong, and T. Wang, 2020: A deep learning network for cloud-to-ground lightning nowcasting with multisource data. J. Atmos. Oceanic Technol., 37, 927–942, https://doi.org/10.1175/JTECH-D-19-0146.1.