In this work deep convolutional neural networks (CNNs) are shown to be an effective model for fusing heterogeneous geospatial data to create radar-like analyses of precipitation intensity (i.e., synthetic radar). The CNN trained in this work has a directed acyclic graph (DAG) structure that takes inputs from multiple data sources with varying spatial resolutions. These data sources include geostationary satellite (1-km visible and four 4-km infrared bands), lightning flash density from Earth Networks' Total Lightning Network, and numerical model data from NOAA's 13-km Rapid Refresh model. A regression is performed in the final layer of the network using NEXRAD-derived data mapped onto a 1-km grid as a target variable. The outputs of the CNN are fused with analyses from NEXRAD to create seamless radar mosaics that extend to offshore sectors and beyond. The model is calibrated and validated using both NEXRAD and spaceborne radar from NASA's Global Precipitation Measurement (GPM) Mission's Core Observatory satellite. The advantages over a random forest–based approach used in previous works are discussed.
1. Introduction

Depictions of storm location and intensity obtained from weather radar are extremely important for public safety, transportation, agriculture, tourism, and several other areas. The United States is covered by a network of 159 Weather Surveillance Radar-1988 Dopplers (WSR-88Ds or, more commonly, NEXRAD;1 Crum and Alberty 1993), which are long-range S-band radars that provide frequent and detailed analyses of reflectivity (dBZ), radial velocity, and a number of polarimetric variables. Precipitation mosaics constructed from the NEXRAD network enable meteorologists and operational forecasters to track and anticipate hazardous weather, such as heavy precipitation, thunderstorms, hail, snow, and tornadoes.
Weather radar is particularly important in air transportation (Evans et al. 2006). In the United States, air traffic controllers (ATC) and air traffic managers (ATM) employed by the Federal Aviation Administration (FAA) rely heavily on weather radar to track storms that can impact the safety and efficiency in the National Airspace System (NAS). However, the NAS includes areas both inside and outside of land-based radar range, leaving many offshore and oceanic controllers without direct access to weather information required for proper air traffic management. The left panel of Fig. 1 demonstrates this shortcoming by showing storm activity (indicated by lightning detection) outside radar range that is not depicted in the NEXRAD mosaic. This lack of adequate situational awareness may be detrimental to aviation safety and can lead to inefficiencies in the NAS.
This deficiency motivated the development of the Offshore Precipitation Capability (OPC; Veillette et al. 2016, 2015; Veillette and DeLaura 2016; Ryan 2016), a system that fills gaps outside the coverage of weather radar with synthetic radar: a radar-like depiction of precipitation created by combining data from multiple nonradar sources that remain available where weather radar is not. These nonradar data include lightning flash detections, geostationary satellite imagery, and numerical weather prediction (NWP) model output. Individually, these datasets do not provide depictions as descriptive as radar; however, OPC effectively combines these quantities and provides radar-like mosaics familiar to air traffic controllers and air traffic managers. The right panel of Fig. 1 shows an example of the NEXRAD mosaic augmented with the version of OPC described in this paper.
More formally, let $C$ be a 2D image2 generated from weather radar data. In general, $C$ can represent a number of meteorological quantities; however, our focus lies with images derived from radar reflectivity (dBZ) that represent precipitation intensity, for example, base reflectivity, composite reflectivity, echo top, vertically integrated liquid (VIL), and rainfall rate.3 The image $C$ is valid only in range of weather radar and is otherwise assigned a missing value. By “radar like” or synthetic radar, we mean an image $\hat{C}$ that depicts the same meteorological quantity as $C$ but is estimated from nonradar datasets. Let $X = \{X_1, X_2, \ldots, X_n\}$ be a collection of images obtained from a set of nonradar sources. In OPC, $\hat{C}$ is constructed using the conditional expectation given these additional data:

$$\hat{C} = E[\,C \mid X_1, X_2, \ldots, X_n\,]. \quad (1)$$
Typically, these nonradar data will have a greater coverage umbrella than radar and thus are able to fill regions outside the radar range with radar-like estimates. With this method, radar mosaics with far greater coverage (potentially global) can be constructed.
The initial version of OPC utilized a random forest (OPC-RF; Veillette et al. 2016) to estimate the expectation in (1). This model was trained using NEXRAD as a target variable and generated imagery depicting VIL and echo tops,4 chosen because of their relevance to air traffic control. The OPC-RF used a feature vector made up of roughly 100 subjectively determined features constructed from patches of input satellite, lightning, and numerical model images extracted around each pixel. While this approach provided adequate radar-like estimates of offshore and oceanic storms, the subjective feature extraction was computationally expensive and raised the question of whether the constructed feature vectors were “optimal” for estimating radar-derived quantities.
In this paper we show that convolutional neural networks (CNNs; LeCun et al. 1998; Zhang et al. 2016; Szegedy et al. 2015) are an effective class of models for generating radar-like depictions from nonradar data. We present a CNN-based OPC (OPC-CNN) that outperforms its predecessor (OPC-RF). CNNs have provided state-of-the-art results in many areas, including computer vision, remote sensing, natural language processing, and others (Krizhevsky et al. 2012; Zhang et al. 2015). In many of these applications, CNNs are able to learn compact feature representations that can efficiently transform highly detailed pixel-level information into higher-order features relevant for a particular objective. These features can then be used for classification, segmentation, or, in the case of OPC, regression. With CNNs it is no longer necessary to subjectively engineer features from input images (as is done in OPC-RF) and instead the image data can be passed directly to the model. As we will show, this offers many advantages in accuracy and efficiency over the random forest method.
This paper is organized as follows. Section 2 provides a brief overview of related work. Section 3 describes the data used to train OPC-CNN, as well as the sampling and preprocessing steps involved in constructing the training set. The CNN architecture created for OPC is discussed in section 4, and the results of training and additional verification are presented in section 5. Methods to generalize the model outside the training domain are discussed in section 6.
2. Related work
Besides OPC-RF, a number of systems in both industry and academia create some form of synthetic radar imagery. However, details of these systems are often proprietary or remain largely unpublished. In Iskenderian (2008), a probability matching method is used to map cloud-to-ground lightning densities to an estimate of VIL; this work was a major source of inspiration for OPC-RF. Commercial products that correlate cloud-to-ground and intercloud lightning detections to radar reflectivity are also available, for example, PulseRad (Earth Networks 2017) and Thunderstorm Manager (Vaisala 2017). Methods that rely more on satellite data and/or numerical model data include NowRad (The Weather Company 2017). Based on publicly available descriptions of these products, none fuse together as much data as OPC (namely, lightning, geostationary satellite imagery, and numerical weather prediction model output) to provide a radar-like depiction of storm intensity.
A related problem that has been studied for decades is that of estimating surface precipitation (e.g., rainfall rate) from data sources other than rain gauges (Kidd and Huffman 2011; Tapiador et al. 2012). Since geostationary (GEO) satellites provide imagery at semiregular intervals, multichannel rainfall estimates derived from infrared (IR) satellite imagery have been developed (Ba and Gruber 2001; Behrangi et al. 2009; Hsu et al. 1999). Other methods are the CPC morphing technique (CMORPH; Joyce et al. 2004) and the Integrated Multisatellite Retrievals for GPM (IMERG; Huffman et al. 2017). These methods combine visible (VIS) and IR imagery from GEO satellites with the limited coverage of passive microwave (PMW) sounders, which are capable of more accurate precipitation estimates than IR sensors. Some work has also been done to combine data across multiple sensors, for example, combining lightning with satellite data (Grecu et al. 2000). While machine learning is used throughout these works, they often invoke much simpler methods and shallower models than those described here, for example, lookup tables, linear regression, and shallow, fully connected neural networks that do not generalize well to the multiple-input framework inherent in OPC.
3. Data

This section describes the data used to train and validate the OPC-CNN model, as well as the preprocessing involved in preparing the training set. Four major data sources are used in training the model: (i) cloud-to-ground lightning flashes, (ii) VIS and IR imagery from geostationary satellite, (iii) six parameters from a numerical weather prediction model, and (iv) VIL obtained from weather radar data (the target variable). Two additional derived products, cloud-top height and solar zenith angle, are also created from these data and are used as additional training inputs. Validation data include GPM Dual-Frequency Precipitation Radar (DPR) reflectivity values that are preprocessed into VIL.
Incoming data are mapped onto a common map projection (e.g., Lambert equal area) using bilinear interpolation. The output horizontal resolution of this remapping is chosen based on the native resolution of each data source (specified below). From here on, the resolution of each data source will refer to units of this map projection (even though technically the true resolutions may vary spatially, as is also the case with geosynchronous footprints). After data are mapped onto a common set of grids, a patch selection methodology (explained below) is used to subsample input images and to create a training dataset for the CNN. Each sample in this training dataset consists of a time stamp $t_i$, a latitude $\phi_i$ and longitude $\lambda_i$, and the set of image “patches” obtained from input sources that match this time and location.
The following sections provide more detail on the data sources and the construction of this dataset.
a. Input data descriptions
1) Lightning (LGHT)

Global lightning (LGHT) datasets are provided through Earth Networks’ Total Lightning Network (Earth Networks 2017, unpublished data; Heckman 2014). The data consist of latitude–longitude, a time stamp, amplitude, multiplicity, and height information for all detected flashes, and are received every second. To be used in OPC, cloud-to-ground detections are geographically binned into grids with a resolution of 2 km. Three images of this type are constructed using 10-, 20-, and 30-min histories of lightning flash data. The binned lightning data are smoothed with three Gaussian kernels having standard deviations of 6, 12, and 18 km, respectively. This creates a set of nine lightning-derived images. As a final step, a logarithmic transform is applied to the density images to make the distributions less concentrated near zero.
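As a concrete illustration, the following Python sketch reproduces the binning–smoothing–log pipeline just described. The function name, argument conventions, and grid handling are our own assumptions, not the operational OPC code.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def lightning_density_images(lats, lons, times_s, t0_s, lat_edges, lon_edges):
    """Sketch of the lightning preprocessing described above.
    lats/lons/times_s: arrays of flash locations and times (seconds);
    lat_edges/lon_edges: 1D bin edges of the 2-km analysis grid."""
    images = []
    for history_min in (10, 20, 30):
        # Keep flashes within the trailing history window ending at t0.
        mask = (times_s >= t0_s - 60 * history_min) & (times_s <= t0_s)
        # Geographically bin flash counts onto the 2-km grid.
        counts, _, _ = np.histogram2d(lats[mask], lons[mask],
                                      bins=[lat_edges, lon_edges])
        for sigma_km in (6.0, 12.0, 18.0):
            # Gaussian smoothing; sigma is converted from km to 2-km pixels.
            smooth = gaussian_filter(counts, sigma=sigma_km / 2.0)
            # Log transform spreads out the mass concentrated near zero.
            images.append(np.log1p(smooth))
    return np.stack(images)  # nine lightning-derived channels
```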
2) VIS and IR GEO satellite
In this work satellite data for OPC are taken from the Geostationary Operational Environmental Satellite-13 (GOES-13). All imager channels provided by the satellite are utilized, including the VIS (0.65 µm) channel at 1-km horizontal resolution and four IR channels (3.9, 6.7, 10.7, and 13.3 µm), each at 4-km horizontal resolution. Data arrival times depend on the current GOES-13 schedule set by the Office of Satellite and Product Operations (OSPO). During routine operations, continental U.S. (CONUS) scans arrive roughly every 10–15 min, whereas the larger Northern Hemisphere scans arrive approximately every 30–45 min, and full-disk scans arrive every 3 h. For any given time, the most recently available satellite images are selected for training. Archives of GOES-13 data used in this work can be obtained from NOAA’s Comprehensive Large Array-Data Stewardship System (CLASS) website (NOAA 2018).
3) NWP model (MOD)
NWP models are capable of simulating a large number of meteorological quantities in regions not covered by traditional sensors, including 3D soundings of temperature, pressure, and humidity, as well as relevant 2D fields, such as precipitable water, convective available potential energy, and temperature and pressure slices at various layers of the atmosphere (e.g., surface, tropopause). In this paper, model fields are sampled from the National Oceanic and Atmospheric Administration (NOAA)’s 13-km Rapid Refresh (RAP) model (Earth System Research Laboratory 2017, https://rapidrefresh.noaa.gov/, unpublished data; Benjamin et al. 2016) that covers a large portion of North America, including the oceanic regions of interest. It is executed every hour and provides hourly forecasts out to 22 h. At any given target time $t$, to account for forecast latency, OPC uses the most recent 2-h forecast product valid at $t$. For convenience, these data are upsampled onto a 4-km grid using a bilinear interpolation scheme.
Although NWP models with much better resolution and update rates exist, such as the High-Resolution Rapid Refresh (HRRR), at the time of writing these models do not extend as far into the oceanic regions of interest. However, once higher-resolution models become available over these regions, the framework set out in the remainder of the paper is adaptable to include this information. A global model, such as the Global Forecast System (GFS; https://www.ncdc.noaa.gov/), can also be used as a lower-resolution alternative to RAP.
4) Weather radar
Radar data are used as the target variable for training our model. The main source of radar data used here is VIL mosaic images obtained from the Corridor Integrated Weather System (CIWS; MIT Lincoln Laboratory 2017, https://ciws.wx.ll.mit.edu/, unpublished data; Evans and Ducot 2006; Klingle-Wilson and Evans 2005). The CIWS uses NEXRAD data to generate VIL images at a 1-km resolution every 2.5 min that cover the CONUS. These data are plentiful; however, CIWS is available only over the United States and southern Canada, which may impact model skill when generalized to other parts of the world. When discussing VIL in the context of aviation applications, it is common to bin VIL into six intensity categories, or levels, defined using the thresholds 0.16 (level 1), 0.78 (level 2), 3.5 (level 3), 6.2 (level 4), 12 (level 5), and 32 kg m−2 (level 6).
Radar data from the GPM DPR (NASA 2018) are used for model validation over areas not covered by NEXRAD. Though these data are not used to train the model, they are invaluable for assessing model performance in its intended domain. The data used for the assessments in this paper are version 06 level 2A DPR reflectivity (dBZ) and for brevity are referred to as “GPM DPR data.” These reflectivity data are converted into VIL for use in validating the model.
5) Derived inputs
In addition to the aforementioned sources, two additional images are derived from previous inputs. The first is an image of solar zenith angles (SZA), which represent the angle made between the sun and the zenith at each pixel. This angle is a function of location, time of day (particularly day/night), and time of year (Woolf 1968). Not only is SZA a useful proxy for time of day, but it also has a drastic influence on satellite data, particularly VIS data that lose brightness as the SZA approaches 90° and become completely unavailable at night (SZA > 90°).
The other derived input is an estimate of cloud-top height (CTH) obtained by combining IR 10.7-µm brightness temperature with RAP model forecasts of temperature and pressure. In a method similar to that found in Donovan et al. (2008), the height of the cloud top is estimated by matching the satellite brightness temperature to a height from the vertical profile of atmospheric temperature generated from NWP output. CTH is computed on a 4-km grid similar to the IR satellite data.
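The brightness temperature matching can be illustrated with a short sketch. The profile interpolation below is an assumption in the spirit of the description above, not the exact Donovan et al. (2008) procedure.

```python
import numpy as np

def cloud_top_height(bt_ir_k, profile_temp_k, profile_height_m):
    """Illustrative cloud-top height retrieval: find the altitude at
    which the NWP temperature profile matches the 10.7-um brightness
    temperature. profile_temp_k / profile_height_m: 1D model sounding
    for a pixel, assumed monotonic in temperature below the tropopause."""
    # np.interp needs increasing x, so sort the profile by temperature.
    order = np.argsort(profile_temp_k)
    return np.interp(bt_ir_k, profile_temp_k[order], profile_height_m[order])

# Example: a 260-K cloud top in a simple constant-lapse-rate profile.
heights = np.array([0.0, 2000.0, 4000.0, 6000.0, 8000.0])
temps = 288.0 - 6.5e-3 * heights               # ~6.5 K/km lapse rate
print(cloud_top_height(260.0, temps, heights))  # ~4300 m
```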
b. Training domain and patch selection
The data sources listed above generate far more data than are actually needed for training. Not only are the images they produce quite large, but the majority of pixels in any given image will likely contain little or no precipitation that would be detected by a radar. Moreover, not all sources provide the same coverage (radar being the most limited). To mitigate these issues, a training domain is used to limit where data are sampled from, and image patches are subsampled from these regions so as to create a balanced training set that contains roughly equal portions of zero-intensity, low-intensity, and high-intensity VIL. This balancing is necessary so that the training set is not dominated by cases with low or no radar return. In an unbalanced training set sampled uniformly at random, high-intensity cases would be relatively sparse, and their effect on model weights would be minor; the resulting model would show a low area bias for high-intensity storms because the fit necessarily favors low intensity. By raising the frequency of high-intensity storms in the training data, the algorithm becomes more sensitive to these cases. While this balancing achieves the desired effect of raising the detection rate for high-intensity storms, it also increases the tendency to predict low- and medium-intensity storms and leads to a high area bias in these regimes. This effect is accounted for in postprocessing with a calibration described in section 5.
A training domain is defined as a pair (G, T) that contains a geographic component G and a temporal component T. The G component consists of a region that is adequately covered by all the input data sources listed above and is large enough to contain a range of possible weather scenarios that should be captured by the model. The T component consists of a time span from which to sample training data. This time span should also be chosen to match the time stamps for which the model is being applied (for instance, a model trained using data sampled in July may not perform well if applied in January).
In this paper our training domain is selected to be the eastern half of the United States that is in range of NEXRAD, and our temporal domain includes July–September of 2015. The eastern U.S. training domain is ideal because of the overlapping coverage of both the GOES-13 satellite and NEXRAD. Note that a “fully operational” version of OPC that provides data year-round for various sections of the NAS will likely need a larger training domain (as well as additional data sources); however, we choose to limit the domain here for the purposes of demonstration. Generalizing the models to different regions (particularly to those far outside the range of NEXRAD) will be addressed in a later section.
A set of target times from which to sample input data sources was selected based on the scan schedule of the GOES-13 satellite.5 For the 3-month window considered, this resulted in approximately 9000 target times. For each target time $t_i$, images from each input source were gathered and trimmed to the region G. Since input data are created on different schedules and take different lengths of time to refresh, the closest time stamp to $t_i$ within a maximum time offset was selected from each source for inclusion in the training data. The maximum time offsets are sensor dependent and were chosen based on sensor refresh rate, which was 5 min for radar and 1 h for the NWP forecasts. A set of patch locations $(\phi_i, \lambda_i)$ was then randomly sampled within G. To ensure that the training is not dominated by cases without precipitation, locations with greater VIL values were sampled with higher probability such that the final training set contains roughly equal proportions of no VIL (0 kg m−2), mild to moderate VIL (0–0.77 kg m−2), and strong to severe VIL (>0.77 kg m−2). Additionally, areas of degraded NEXRAD coverage, such as beam-blocked areas, were excluded from the training set, as they would affect model performance. To limit oversampling certain regions, the number of points sampled per time was limited to 200, or less than 0.01% of the pixels in G (and was limited even further if there was no severe weather present). Around each point, a patch of size 64 × 64 km2 was extracted from each input image and saved for training. This resulted in approximately 1 million training samples for the 3-month period, which were then randomly subsampled down to approximately 150 000 samples, which was found to be sufficient for training.
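A minimal sketch of this intensity-balanced sampling follows; the VIL thresholds come from the text, while the inverse-frequency weighting and the function interface are illustrative assumptions.

```python
import numpy as np

def sample_patch_centers(vil, n_max=200, rng=np.random.default_rng(0)):
    """Sketch of intensity-balanced patch-center sampling.
    vil: 2D VIL analysis (kg m^-2) over the training region G.
    Returns row/col indices drawn so the three intensity regimes
    (none, mild-moderate, strong-severe) are roughly balanced."""
    flat = vil.ravel()
    regime = np.digitize(flat, [1e-6, 0.77])   # 0: none, 1: mild, 2: strong
    # Inverse-frequency weights oversample the rarer high-VIL pixels.
    counts = np.bincount(regime, minlength=3).astype(float)
    weights = 1.0 / counts[regime]
    weights /= weights.sum()
    n = min(n_max, flat.size)
    idx = rng.choice(flat.size, size=n, replace=False, p=weights)
    return np.unravel_index(idx, vil.shape)
```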
c. Handling image misalignment
For optimal training, input data sources should be aligned spatially and temporally as accurately as possible. While samples are chosen to be close to a common target time, there will still be some difference between the target time and the exact time data are measured. This, together with other issues such as sensor latency and satellite parallax shift, will lead to image features not being perfectly aligned across patches taken from different sources (Lindstrom 2006). This can be partially improved by limiting the difference between sensor valid time and target time, or by shifting the center of patches taken from satellite data to account for a parallax shift (which is height, location, and sensor dependent). Ongoing improvements in sensor technology will also help reduce this misalignment in the future, for example, rapid updates on the GOES-R series of geostationary satellites; however, perfect alignment remains a challenge in this type of approach and will limit the overall accuracy of the final output.
To help alleviate this problem, we introduce some smoothing in our target output variable (VIL) and use these smoothed images in the loss layers of the CNN. The output VIL image is smoothed with a mean kernel of size 4 km × 4 km (a hyperparameter) and downsampled to 4-km resolution. This smearing was found to dull the intensity of severe storms during training, so to increase the probability of detection, a histogram matching procedure is applied to match the smoothed 4-km VIL to the original 1-km VIL; this new set of patches is named S-VIL. The implication of using S-VIL to train OPC-CNN is that the output may suffer from some storm location error related to the degree of misalignment of the structure on the 4-km grid to the original structure at 1 km. Also, storms may be enlarged, since histogram matching spreads out high values from the 1-km grid over a 4-km pixel. In practice, this results in a lower-resolution, or “softer,” output, where the radar-like output visually appears smoother than true weather radar imagery. This effect should be noted in aviation applications of OPC; for example, storm sizes may be overestimated by a factor related to the applied smoothing and image misalignment. Examples of this effect will be observed in section 5.
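The S-VIL construction can be sketched as follows; the kernel size follows the text, while the empirical histogram matching estimator and edge handling are illustrative simplifications.

```python
import numpy as np

def make_svil(vil_1km):
    """Sketch of the smoothed target (S-VIL): 4x4 mean smoothing with
    4x downsampling, then histogram matching back onto the 1-km
    intensity distribution so that storm peaks are not dulled."""
    m, n = (s - s % 4 for s in vil_1km.shape)
    x = vil_1km[:m, :n]
    # 4 km x 4 km mean kernel combined with 4x downsampling.
    coarse = x.reshape(m // 4, 4, n // 4, 4).mean(axis=(1, 3))
    # Histogram match: map each coarse pixel's quantile back onto
    # the original 1-km value distribution.
    ranks = np.searchsorted(np.sort(coarse.ravel()), coarse.ravel(),
                            side='right') / coarse.size
    matched = np.quantile(x.ravel(), np.clip(ranks, 0.0, 1.0))
    return matched.reshape(coarse.shape)
```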
4. Network architecture
This section presents the network architecture used in the OPC-CNN. We start by giving a brief overview of CNNs in general and then describe the network constructed for our problem. To define the notation used below, suppose that at each time, an input data source provides $d$ images (or patches) of size $m \times n$. We can represent this collection of images as a 3D array, which we express as $X \in \mathbb{R}^{m \times n \times d}$, where the $d$ layers of $X$ are $m \times n$ matrices representing the input images. A collection, or minibatch, of $N$ observations from $X$ will therefore make up a 4D array $\mathcal{X} \in \mathbb{R}^{m \times n \times d \times N}$. The training dataset for OPC-CNN consists of a set of 4D tensors corresponding to each of the input sources listed in Table 1.
We start with a short review of components that make up a CNN and then move into the specific architecture used in OPC-CNN. A more thorough review of CNNs can be found in Goodfellow et al. (2016).
a. Review of CNNs
CNNs were first introduced in LeCun et al. (1989). In contrast to classical feed-forward neural networks with fully connected layers, nodes in a convolutional layer are connected to only a subset of nodes in the previous layer. This local connectivity is useful when dealing with images, whose dimensionality is often too large for classical neural networks. For images, the subset of connected nodes corresponds to a rectangular slice of the image known as a receptive field. This local connectivity makes for an efficient feed-forward operation, and convolutional layers contain far fewer parameters to learn than classical (fully connected) networks.
A CNN is made up of several types of layers that can be described by a directed acyclic graph (DAG). These layers can be categorized as either input layers, image processing layers, or loss layers. Input layers are 4D arrays consisting of a batch of samples from a data source and are provided by the training data. Image processing layers are functions that transform one or more 4D arrays into a new 4D array. In OPC-CNN, the following types of image processing layers are utilized:
Convolutional layers (conv): These are the main image processing layers of the CNN. A convolutional layer is parameterized by a bias vector $b \in \mathbb{R}^D$ and a 3D array of weights $W = [W_1, \ldots, W_D]$ that are learned during training. The hyperparameter $D$ is the depth of the layer, and the $W_j$ represent weight matrices (or filters) of size $w \times h$ that are typically much smaller than the input image size. Given an input array $\mathcal{X}$, a convolutional layer outputs another 4D array $\mathcal{Y}$ by convolving the layers of $\mathcal{X}$ with the weight matrices, that is, $\mathcal{Y}_{k,j} = b_j + \sum_{i=1}^{d} W_j * \mathcal{X}_{k,i}$, where $j = 1, \ldots, D$ indexes the filters and $k = 1, \ldots, N$ is the batch index. Note the size of the $\mathcal{Y}_{k,j}$ is dependent on the definition of the convolution operator. In OPC-CNN, convolutional layers use a stride of 1 and use the “valid” convention that does not add any image padding to $\mathcal{X}$. This results in a reduction in patch size computed by $m' = m - w + 1$ and $n' = n - h + 1$.
Nonlinearity: Nonlinearity layers (also known as activation layers) are elementwise computations that typically follow convolutional layers. These layers simply apply a function $f$ to each pixel of their input. Common choices for $f$ include the rectified linear unit (relu) $f(x) = \max(x, 0)$, a sigmoid $f(x) = (1 + e^{-x})^{-1}$, and the identity function $f(x) = x$.
Pooling layers (pool): Pooling layers reduce the dimensionality of the input image by summarizing groups of neighboring pixels. The groups of pixels in the input array are formed by subregions of size $p \times p$ spaced $s$ pixels apart. A common choice is the max pooling layer, which computes the maximum of all pixels in a subregion. By setting the spacing $s$ equal to the subregion size $p$, max pooling layers effectively decrease the resolution of an image by a factor of $p$ but retain the same key features. In OPC-CNN, 2 × 2 max pooling layers are used to downsample images by a factor of 2.
Dropout layers (drop): Dropout layers (Srivastava et al. 2014) attempt to prevent overfitting the training data. During training, these layers randomly “turn off” (set to 0) pixels in an input array with some probability $p$. By doing so, the network is forced not to rely on a small number of activations, which helps with robustness.
Batch normalization layers (batch norm): Many machine learning algorithms are more effective when inputs are properly normalized (e.g., have a mean of 0 and a variance of 1). Often in neural networks with several layers, the distribution of hidden layers over the training set can differ from this ideal, so batch normalization layers (Ioffe and Szegedy 2015) rescale data from the previous layer to have a more desirable mean and variance.
Concatenation layers (concat): Concatenation layers combine or “stack” multiple 4D input arrays. These layers receive multiple input arrays with sizes $m \times n \times d_i \times N$ ($i = 1, \ldots, J$) and combine them into a single array of size $m \times n \times (\sum_{i=1}^{J} d_i) \times N$ by concatenating along the depth dimension $d$. Note that if image sizes do not match across inputs, then cropping of larger images is necessary.
The aforementioned layers are commonly organized into groups or “blocks.” A typical block consists of a convolutional layer, a nonlinearity, a pooling layer, and possibly a dropout layer. When describing networks, it is often simpler to explain the structure of a network in terms of blocks than individual layers.
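The following sketch shows one such block, written in PyTorch for illustration; the paper does not specify a framework, and the depths, kernel size, and dropout rate here are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative "block": a valid (unpadded) convolution, a relu
# nonlinearity, 2x2 max pooling, and dropout.
def conv_block(in_depth, out_depth, kernel=3, p_drop=0.25, pool=True):
    layers = [nn.Conv2d(in_depth, out_depth, kernel, padding=0),  # "valid"
              nn.ReLU()]
    if pool:
        layers.append(nn.MaxPool2d(2))   # halves image resolution
    layers.append(nn.Dropout2d(p_drop))
    return nn.Sequential(*layers)

x = torch.randn(8, 1, 64, 64)   # batch of 64x64 single-channel patches
y = conv_block(1, 16)(x)
print(y.shape)                  # torch.Size([8, 16, 31, 31])
```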
The third category of layer used in OPC-CNN is a loss layer. A loss layer receives output from an image processing layer and compares it to some form of ground truth provided by an input layer. The loss layer outputs a scalar-valued score that represents the quality of the network’s output. The objective of training the CNN is to minimize the output loss through backpropagation. The type of loss function used in OPC-CNN is a mean-squared loss layer, which averages the squared error across all pixels and layers of an image, that is, $\ell(\mathcal{Y}, \mathcal{T}) = \frac{1}{mnd} \sum_{i,j,l} (\mathcal{Y}_{ijl} - \mathcal{T}_{ijl})^2$ for an output $\mathcal{Y}$ and target $\mathcal{T}$.
Loss layers are minimized by tuning the weights and biases in the convolutional layers through stochastic gradient descent (Bishop 2006), in which a minibatch of samples from the training set is used to estimate the gradient of the loss function through backpropagation. In the case of multiple loss layers $\ell_1, \ldots, \ell_L$, the gradient descent update uses a weighted average across all loss layers, $\ell = \sum_{i=1}^{L} \alpha_i \ell_i$, where the $\alpha_i$ are a set of weights that typically sum to 1.
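In code, the weighted multiloss objective amounts to a short helper; the sketch below (PyTorch, with an interface of our choosing) averages per-head MSE losses with weights $\alpha_i$:

```python
import torch

def combined_loss(outputs, targets, alphas):
    """Weighted sum of per-head MSE losses, as in the update rule above.
    outputs/targets: lists of tensors, one per loss layer; alphas:
    weights that sum to 1. A sketch of the idea, not the exact code."""
    return sum(a * torch.mean((y - t) ** 2)
               for a, y, t in zip(alphas, outputs, targets))
```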
b. Architecture in OPC-CNN
An overview of the network architecture used in OPC-CNN is shown in Fig. 2. The network consists of 51 layers that are grouped into processing blocks for clarity. The lettered blocks (Table 2) represent convolutional layers followed by some combination of a nonlinearity, dropout layer, and/or a pooling layer. Conceptually, this network can be decomposed into two parts—a set of feature extraction layers that process individual input sources into features and a set of fusion layers that combine these features and map to an output image that compares to a true radar image in a final loss layer.
Multiple variations of the chosen architecture were attempted, and the best performing on a holdout set was selected. Most network designs that were tested were motivated by popular CNNs in the literature, for example, Szegedy et al. (2015) and Krizhevsky et al. (2012). This particular architecture was also heavily influenced by the following factors.
Output resolution: For this work, 4 km was chosen as the output resolution. To achieve this, inputs of different resolutions were mapped to this common resolution via pooling layers or upsampling layers.
Concatenation: After images are mapped to a common resolution, it is necessary to combine all images into a single tensor for further processing. This layer is added after all layers are mapped to the common output resolution.
Regularization: Dropout layers are utilized in this work to regularize the network and to avoid overfitting. These are typically added after convolution–relu combinations, with the exception of later layers. Dropout layer rates were chosen via experimentation.
Convolution depth: As convolution layers are the main image processing layer of the CNN, these should appear in both the feature extraction and fusion sections of the network. The number of convolution layers in each section was also chosen via experimentation.
In the feature extraction layers, processing blocks are trained on individual data sources: one group of blocks is applied only to VIS inputs, another group is applied to IR/CTH inputs, and a third is applied only to LGHT inputs (Table 2). To enable better learning in the early layers of the network, additional output and mean-square-error (MSE) loss layers were added to the early stages of the network. It was found in this case that networks relying on only one final loss layer were unable to propagate error back effectively to the initial layers. This led to noisy (untrained) feature extraction layers and less accuracy overall. By shortening pathlengths (e.g., from the loss attached to the VIS output back to the VIS input), meaningful weight matrices were successfully updated in the early stages (see the results section for more details). Convolutional blocks were not used for MOD and SZA because their lower spatial resolution made them unnecessary.
Outputs of the feature extraction layers are concatenated into one 4D array. The combined features undergo batch normalization and are further processed by three convolutional blocks. The output from the last processing block has one layer representing the radar-like estimate and is passed to a final MSE loss layer and compared to S-VIL.
Network training is done using input patches described in Table 1; however, a network that only works on small patches would not be very practical in an operational setting. It is preferable that the trained network be applicable to an arbitrarily sized domain. This property is achieved in OPC-CNN by avoiding fully connected layers in the network architecture, which flatten images created by hidden layers of the neural network into 1D vectors of fixed length. Not including fully connected layers maintains the image structure of the data throughout the network and allows the network to accept input images of sizes different than what was used during training. CNNs that have this property are referred to as fully convolutional neural networks and have been applied in other areas, such as image segmentation (Long et al. 2015). Because these types of convolutions will trim off the outer edges of an image, the output of the network will be of a slightly smaller size. This can be compensated for in test mode by adding appropriate image padding to full-size input images.
Another important consideration in the network design is image resolution. Input images arrive in differing resolutions of 1, 2, 4, and 13 km. To ensure images are properly aligned spatially in the concatenation layers of the network, features must be mapped into a common resolution prior to concatenation. To achieve this, VIS (1 km) and LGHT (2 km) features are downsampled to a 4-km grid using max pooling layers of 2 × 2 pixels. MOD patches are upscaled to 4 km using a bilinear upsampling (upsamp block in Fig. 2), and SZA is computed natively on a 4-km resolution and thus does not require any additional processing.
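To make the overall dataflow concrete, the sketch below wires up a small multi-input network with the same resolution-matching structure: VIS pooled twice (1 km to 4 km), lightning pooled once (2 km to 4 km), IR/CTH processed natively at 4 km, and pre-upsampled MOD and SZA concatenated directly, followed by batch normalization, fusion convolutions, and auxiliary heads. Channel counts, depths, and the use of padded rather than "valid" convolutions are illustrative choices, not the published configuration.

```python
import torch
import torch.nn as nn

class OPCLikeCNN(nn.Module):
    """Minimal sketch of the multi-resolution fusion idea in Fig. 2."""
    def __init__(self):
        super().__init__()
        conv = lambda ci, co: nn.Sequential(
            nn.Conv2d(ci, co, 3, padding=1), nn.ReLU())
        self.vis = nn.Sequential(conv(1, 16), nn.MaxPool2d(2),
                                 conv(16, 16), nn.MaxPool2d(2))  # 1 km -> 4 km
        self.lght = nn.Sequential(conv(9, 16), nn.MaxPool2d(2))  # 2 km -> 4 km
        self.ir = conv(5, 16)                                    # 4 IR bands + CTH
        self.norm = nn.BatchNorm2d(16 + 16 + 16 + 6 + 1)
        self.fuse = nn.Sequential(conv(55, 32), conv(32, 32),
                                  nn.Conv2d(32, 1, 3, padding=1))
        # Auxiliary heads shorten gradient paths to the early layers.
        self.vis_head = nn.Conv2d(16, 1, 1)
        self.lght_head = nn.Conv2d(16, 1, 1)
        self.ir_head = nn.Conv2d(16, 1, 1)

    def forward(self, vis, lght, ir, mod, sza):
        fv, fl, fi = self.vis(vis), self.lght(lght), self.ir(ir)
        # mod is assumed pre-upsampled to the 4-km grid; sza is native 4 km.
        z = self.norm(torch.cat([fv, fl, fi, mod, sza], dim=1))
        return (self.fuse(z), self.vis_head(fv),
                self.lght_head(fl), self.ir_head(fi))

net = OPCLikeCNN()
out, *aux = net(torch.randn(2, 1, 64, 64),   # VIS patch, 1 km
                torch.randn(2, 9, 32, 32),   # 9 lightning channels, 2 km
                torch.randn(2, 5, 16, 16),   # IR + CTH, 4 km
                torch.randn(2, 6, 16, 16),   # 6 NWP fields on 4-km grid
                torch.randn(2, 1, 16, 16))   # solar zenith angle, 4 km
print(out.shape)                             # torch.Size([2, 1, 16, 16])
```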
c. Model training
Network weights were initialized randomly using a Gaussian distribution with zero mean and a standard deviation of 10−3. Biases were initialized to 0. The training dataset was split into training, validation, and testing components using a 70%, 15%, 15% split, respectively. The training component was used for stochastic gradient descent, the validation set was used for stopping criteria and model selection, and the test set was used to estimate final network performance and for hyperparameter tuning.
As described in section 4b, the network is trained by minimizing the MSE criteria over multiple loss layers. Three of these output layers depend exclusively on VIS, IR, and LGHT inputs, respectively, and the final output layer depends on the output of the fusion layers. The influence that these output layers have on the final objective depends on the weights $\alpha_i$, which are hyperparameters. These weights are initially set to be equal to ensure that the feature extraction layers are adequately trained, but as the number of epochs increases, the weights corresponding to VIS, IR, and LGHT inputs decay exponentially, while the weight associated with the final output approaches 1. The network was trained for 300 epochs. To choose the final network, the fusion loss layer was evaluated for each training epoch on a separate holdout set, and a median filter was applied to smooth this error curve over epochs. The epoch that provided the minimum error was chosen as the final network.
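One simple way to realize such a schedule is sketched below; the exponential decay rate is an illustrative hyperparameter, not a published value.

```python
import math

def loss_weights(epoch, n_aux=3, decay=0.02):
    """Sketch of the schedule described above: auxiliary (VIS/IR/LGHT)
    loss weights start equal to the fusion weight, decay exponentially
    with epoch, and the fusion weight approaches 1."""
    aux = math.exp(-decay * epoch) / (n_aux + 1)
    w_aux = [aux] * n_aux
    return w_aux + [1.0 - sum(w_aux)]     # [a_VIS, a_IR, a_LGHT, a_fusion]

print(loss_weights(0))    # [0.25, 0.25, 0.25, 0.25] -- equal at the start
print(loss_weights(300))  # aux weights ~6e-4; fusion weight ~0.998
```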
Training was performed using the Lincoln Laboratory Supercomputing Center (Byun et al. 2016). The processing node used in training had 64 GB of RAM and access to a Tesla K80 graphics processing unit (GPU) with 4 GB of memory. Training for 300 epochs with this setup took approximately 30 min for the chosen dataset.
5. Results

This section provides visualizations and performance estimates of the trained network applied to full-size images (i.e., not limited to the patches used during training). Postprocessing steps, including calibration and the incorporation of actual radar data, are also discussed.
a. Application to full-size inputs
The OPC-CNN was trained to output radar estimates on small patches of input data. However, in practice, the trained layers can be applied to inputs of arbitrary size. Inputs are first remapped to the same patch resolution used during training and then cropped so each input tensor covers identical geographic regions. After normalization, the trained network is then applied to these inputs. In this case, the output of the network is also a full-size image of nearly equal size to the original data, except for a small loss of data around the edges as a result of the convolutional layers (recall no padding was used). Because input images can be quite large, inputs may need to be divided into smaller-size geographic blocks and processed individually before reconstructing upon output. A reverse parallax shift is applied to the output image using the same vectors as used previously to align the output to approximate the view that would come from ground-based radar. Finally, a calibration explained later in section 5b is applied.
Figures 3a–d show examples of OPC-CNN being applied to portions of the Southeast United States. The left panel of each case shows output from OPC-CNN based on only satellite, lightning, and numerical model inputs. In the right panel of each case, the target variable (NEXRAD VIL) is shown along with an indication of radar coverage (gray shading). Qualitatively, in each case OPC-CNN is able to capture the size, intensity, and orientation of the storms, albeit at a slightly lower resolution. Small isolated convective cells, typically those that generate cloud-to-ground lightning, are very well depicted by OPC-CNN as can be seen in the isolated convection over the Florida Peninsula in Fig. 3a and thunderstorms over Alabama in Fig. 3c. Nonconvective storms that do not produce lightning can sometimes be missed or underrepresented, for example, the line of precipitation south of the Florida Panhandle in Fig. 3b. Overall, the number of false alarms in OPC-CNN is low; in very few instances does OPC-CNN depict storm cells (which are defined for aviation purposes as VIL greater than 3.5 kg m−2, represented by yellow shading in Fig. 3) in radar coverage that do not match an observed storm. This suggests that storms outside radar range that are depicted by OPC-CNN are reliable indications of hazardous weather, for example, the cluster of storms in the Gulf of Mexico in Fig. 3d.
b. Calibration

Recall from section 3b that the training set was constructed using roughly equal portions of no VIL, moderate VIL, and severe VIL. This distribution is far from the true relative frequencies of these regimes observed in nature and thus may lead to area biases when the model is applied to actual, nonbalanced data. The left panel of Fig. 4 shows a comparison between the distribution function of VIL measured from NEXRAD and that of OPC-CNN over the training domain. It is clear that OPC-CNN overestimates the lower levels of VIL and slightly underestimates the higher levels (note the logarithmic scale).
To address these area biases, a histogram matching procedure is applied to the output of the final layer. This procedure modifies pixel intensities of the output in such a way that the distribution function of the resulting images over the training domain matches that of a target collection of images (NEXRAD VIL in this case). In other words, if $F_R$ and $F_C$ represent the distribution functions of NEXRAD VIL and OPC-CNN VIL, respectively, then the histogram matching computes a mapping $\phi$ that corrects the OPC-CNN intensities. The mapping can be found by minimizing the square of the log ratio of the distribution functions:

$$\phi(x) = \operatorname*{argmin}_{y} \left[\log\frac{F_R(y)}{F_C(x)}\right]^2. \quad (2)$$
The left panel of Fig. 4 shows $F_R$ and $F_C$ estimated within the training domain on even Julian days. The right panel of Fig. 4 shows the resulting mapping $\phi$, with knots corresponding to the six VIL levels (L1–L6) along the y axis and a piecewise linear interpolation applied in between these levels. As expected, this function suppresses lower VIL levels while enhancing higher levels.
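A sketch of fitting this calibration from data follows; it uses empirical exceedance distributions and a brute-force search over a VIL grid, which are estimator choices of ours rather than the paper's.

```python
import numpy as np

def fit_calibration(nexrad_vil, opc_vil,
                    levels=(0.16, 0.78, 3.5, 6.2, 12.0, 32.0)):
    """For each VIL level, find the OPC-CNN intensity whose exceedance
    frequency best matches NEXRAD's in the log-ratio sense of (2),
    then interpolate piecewise-linearly between the knots."""
    grid = np.linspace(0.0, 80.0, 2001)
    # Empirical exceedance functions P(VIL > v), decreasing in v.
    f_r = np.array([(nexrad_vil > v).mean() for v in grid])
    f_c = np.array([(opc_vil > v).mean() for v in grid])
    knots = []
    for lv in levels:
        p = f_r[np.searchsorted(grid, lv)]   # NEXRAD exceedance at level lv
        # OPC intensity x whose exceedance minimizes [log(F_C(x)/p)]^2.
        j = int(np.argmin(np.log((f_c + 1e-9) / (p + 1e-9)) ** 2))
        knots.append((grid[j], lv))
    xs, ys = map(np.array, zip(*sorted(knots)))
    return lambda v: np.interp(v, xs, ys)    # piecewise-linear phi
```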
c. Model validation
In this section we validate the OPC-CNN on full-size inputs. Standard performance metrics—probability of detection (POD), false alarm rate (FAR), bias (BIAS), and critical success index (CSI)—commonly used for validating weather forecasts (Stanski et al. 1989) are used. To define these metrics, assume $\hat{C}$ is a radar-like image created by OPC-CNN, and $C$ is the associated radar-derived target variable. These two images are compared in regions of adequate radar coverage. To avoid overfitting from the histogram matching procedure, these metrics are computed only on odd Julian days.
BIAS computes the ratio of area coverage of storm intensity greater than a threshold $T$ for both model and target in regions with valid radar coverage:

$$\mathrm{BIAS} = \frac{\#\{\hat{C} > T\}}{\#\{C > T\}},$$

where $\#\{\cdot\}$ indicates pixel counts over the training domain. Ideally, BIAS should be close to 1.
The other metrics are POD, FAR, and CSI. For a given $T$, these metrics compare the ratio of pixels correctly classified or misclassified as greater than $T$ to those that were not. More precisely, define hits $H$, misses $M$, and false alarms (FA) for $T$ as

$$H = \#\{\hat{C} > T,\; C > T\}, \qquad M = \#\{\hat{C} \le T,\; C > T\}, \qquad \mathrm{FA} = \#\{\hat{C} > T,\; C \le T\}.$$
Then POD, FAR, and CSI are defined as

$$\mathrm{POD} = \frac{H}{H + M}, \qquad \mathrm{FAR} = \frac{\mathrm{FA}}{H + \mathrm{FA}}, \qquad \mathrm{CSI} = \frac{H}{H + M + \mathrm{FA}}.$$
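These four scores are straightforward to compute from a pair of co-registered VIL fields, as the sketch below shows; the interface and the assumption that invalid pixels have already been removed are ours.

```python
import numpy as np

def verification_scores(model, target, T):
    """BIAS, POD, FAR, and CSI for threshold T, following the
    definitions above. model/target: VIL arrays restricted to pixels
    with valid radar coverage."""
    m, t = model > T, target > T
    hits = np.sum(m & t)
    misses = np.sum(~m & t)
    fa = np.sum(m & ~t)
    bias = m.sum() / max(t.sum(), 1)
    pod = hits / max(hits + misses, 1)
    far = fa / max(hits + fa, 1)
    csi = hits / max(hits + misses + fa, 1)
    return bias, pod, far, csi
```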
Because this training domain is the same as used to create the patches used to train the CNN, there is a small amount of overlap between this validation and the data used to train OPC-CNN. However, this overlap represents an extremely small proportion of all pixels used for scoring in this section, and separate experiments revealed that scores were not sensitive to this overlap. For a baseline reference, the results of the OPC-CNN are compared to the OPC-RF that was running operationally at the Massachusetts Institute of Technology (MIT) Lincoln Laboratory during the summer of 2015.
Six scoring thresholds corresponding to the six VIL intensity levels shown in Fig. 1 were used when computing CSI and BIAS. OPC-CNN scores were computed for both calibrated (OPC-CNN-Cal) and uncalibrated (OPC-CNN-NoCal) outputs and compared to the output from the legacy random forest algorithm6 (OPC-RF). The results of this scoring are summarized in the performance diagram shown in Fig. 5. This diagram shows POD and 1 − FAR along the axes, BIAS along rays emanating from the origin, and contours of constant CSI. For CSI, OPC-CNN outperforms both OPC-RF and the uncalibrated results in all six VIL levels. The most significant boosts are seen in lower VIL levels, with a >12% CSI increase for level 1 and a 13% increase for level 2. These results also show the importance of calibration, as some scores increased dramatically after calibration was applied. The calibrated OPC-CNN had a BIAS closer to 1 compared to OPC-RF in four out of six levels. The calibrated OPC-CNN shows a high area bias for level 6 VIL (>4), which is likely a result of the histogram matching procedure overfitting to a small set of cases in the even Julian days. This area bias is likely to be improved with additional data.
d. Radar stitching
As a final postprocessing step, calibrated output from the OPC-CNN is stitched, or blended, with actual NEXRAD returns in regions with adequate radar coverage. The stitching procedure used here is similar to the method described in Veillette et al. (2015), which computes a weighted average of actual radar $C$ with OPC-CNN output $\hat{C}$, where the weights are based on the distance to the closest functioning NEXRAD. To ensure that strong radar returns far from the radar are not dulled in this average, the maximum of the average field and $C$ is then taken. More precisely, the stitched radar field is given by

$$C_{\mathrm{stitch}} = \max\{\, w(d)\,C + [1 - w(d)]\,\hat{C},\; C \,\},$$

where the weights depend on the distance $d$ (km) to the closest radar via

$$w(d) = \begin{cases} 1, & d \le r_1, \\ \sigma\!\left(\dfrac{\tfrac{1}{2}(r_1 + r_2) - d}{s}\right), & r_1 < d < r_2, \\ 0, & d \ge r_2, \end{cases}$$

$s$ scales the width of the transition, and $\sigma(x) = (1 + e^{-x})^{-1}$ is the logistic sigmoid function. In this formula, $r_1$ represents the radius of what is considered to be good radar coverage (where no OPC will be merged in), and $r_2$ represents the maximum range of radar, outside of which only OPC will be used for the final output. In this application, $r_1$ and $r_2$ were chosen to be 230 and 450 km, respectively. The right panel of Fig. 1 shows an example of radar stitching applied.
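The stitching rule translates directly into array operations. In the sketch below, the sigmoid transition scale and the zero-filling of radar values outside coverage are our assumptions:

```python
import numpy as np

def stitch(radar_vil, opc_vil, dist_km, r1=230.0, r2=450.0):
    """Blend NEXRAD and OPC-CNN VIL by distance to the nearest radar,
    then take a max against the radar field so strong distant returns
    are not dulled, following the stitching rule above."""
    # Sigmoid transition between r1 (pure radar) and r2 (pure OPC);
    # the transition scale (r2 - r1)/8 is an illustrative choice.
    s = 1.0 / (1.0 + np.exp(-(0.5 * (r1 + r2) - dist_km)
                            / ((r2 - r1) / 8.0)))
    w = np.where(dist_km <= r1, 1.0, np.where(dist_km >= r2, 0.0, s))
    radar = np.where(np.isfinite(radar_vil), radar_vil, 0.0)  # 0 outside coverage
    return np.maximum(w * radar + (1.0 - w) * opc_vil, radar)
```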
6. Expansion outside the training domain
Much of the data and results up until this point have relied on land-based NEXRAD. NEXRAD was chosen for training because it provides a high-quality and plentiful source of “truth”; however, its coverage is limited mainly to the continental United States. Using a training set sampled from a relatively small training domain calls into question how well the resulting model will generalize to different global climates (Pacific Ocean, Alaska, desert regions, etc.). This problem can be partially alleviated by including data obtained from other radar networks (European, China, India, etc.), but these data will still not provide truly global coverage.
Despite these concerns, nothing prevents applying the current model in other parts of the world and testing the results. Initial testing outside the United States has been performed (Veillette and DeLaura 2016). In that work the OPC-RF algorithm trained over the eastern CONUS was tested in regions in the Gulf of Mexico and the western Atlantic Ocean. The output was compared to both NEXRAD VIL and GPM DPR-derived VIL. These results showed that although an area bias existed for intense storms, the model still provided a useful radar-like depiction in areas lacking any weather radar. Qualitative studies have also shown that the network can provide reasonable radar-like depictions in other regions of the world, but a quantitative study has not yet been performed.
This issue should be addressed more rigorously if the model is to be applied far outside the training domain used in this work. For global training, a valuable source of radar data is the DPR on board the GPM satellite described in section 3a; however, its footprint is limited to relatively narrow swaths along the satellite’s orbit. The size of this footprint limits the amount of training data that can be collected, so such a training set will require a larger temporal component in the training domain to compensate for the limited spatial coverage of the GPM footprint. This type of approach may be considered in future work.
A simpler option for domain adaptation is to modify the calibration procedure described in section 5b to have regional dependence. This approach will utilize the same OPC-CNN trained in the current training domain, but it will ensure that the output has some degree of statistical similarity to precipitation intensity outside the training domain by using a regional calibration. This proposed approach also has the benefit of not requiring vast amounts of data to train a CNN from scratch.
To incorporate this regional dependence, the objective function (2) is modified in two ways: 1) OPC-CNN VIL intensity is compared to GPM-derived VIL intensity rather than NEXRAD VIL and 2) a regularization term is added to prevent overfitting in data-sparse situations. Because OPC-CNN is trained using NEXRAD as a target variable, modification 1 also requires the introduction of a GPM correction that addresses the difference between NEXRAD VIL and GPM-derived VIL. To motivate this correction, let $F_G$ be the distribution function for GPM-derived VIL and notice that the quotient inside the logarithm in (2) can be rewritten as

$$\frac{F_R(y)}{F_C(x)} = \frac{F_R(y)}{F_G(y)} \cdot \frac{F_G(y)}{F_C(x)},$$

where the correction factor $F_R/F_G$ is estimated within NEXRAD range.
The regularization term is used to prevent overfitting in situations where the sample size is small. For multiple domains, $\phi$ can be computed separately for each region and interpolated between regions to avoid edge effects.
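The paper does not spell out the exact form of this regularization; one plausible realization, sketched below purely as an assumption, shrinks a region's raw calibration knots toward the CONUS-trained calibration when the regional GPM sample is small:

```python
import numpy as np

def regional_calibration_knots(raw_knots, base_knots, n_samples, n0=1e5):
    """Shrink a region's raw calibration knots toward the CONUS-trained
    calibration when the regional GPM sample is small. The shrinkage
    form and the scale n0 are assumptions, not the paper's formula."""
    w = n_samples / (n_samples + n0)
    return w * np.asarray(raw_knots) + (1.0 - w) * np.asarray(base_knots)

# Example: with few GPM overpasses, the regional curve stays near the base.
print(regional_calibration_knots([0.1, 0.9, 4.0], [0.16, 0.78, 3.5], 2e4))
```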
This regional adaptation was applied to the five regions shown in Fig. 6. These include two regions over land, which are covered by NEXRAD, and three offshore regions outside radar range. OPC-CNN VIL mosaics were constructed within all five regions without merging NEXRAD, and they were used to compute $F_C$ separately within all five regions. The distribution $F_G$ was computed in offshore regions using all available overlapping GPM data, and the correction $F_R/F_G$ was estimated for regions with radar coverage. The calibration functions $\phi$ were then computed for each region and plotted in Fig. 6 using the regularization described above. Overall, the calibration curves are similar, with minor differences observed in low-level and severe precipitation. This result suggests that a model trained over the CONUS (Northeast and Southeast United States) retains skill when applied directly outside the training domain.
7. Concluding remarks
Air traffic controllers rely heavily on NEXRAD to provide safe, effective, and reliable transportation; however, these data are lacking in many regions of the world without weather radar. This paper provides a methodology for creating radar-like precipitation mosaics in areas lacking weather radar. Multiple forms of nonradar data, including geostationary satellite imagery, lightning detections, numerical weather prediction model output, and sun position are used as input layers to a convolutional neural network. A training database was constructed using VIL mosaics taken from the Corridor Integrated Weather System (CIWS) to use as the target variable in a regression. The OPC-CNN has a DAG structure that takes inputs in their native resolution and combines them into estimates of radar-derived quantities. These OPC-CNN-generated images are combined with actual radar data that exist in areas with radar coverage to create seamless precipitation mosaics that can be extended far outside the range of radar.
The model was verified over the summer of 2015 and compared to a prior version of the model trained using a random forest algorithm. The CNN version showed higher skill scores for both CSI and BIAS. A calibration methodology was developed to help generalize the model to regions outside the training domain. This was done by calibrating the output of the trained model using data from the GPM DPR as validation in regions outside radar coverage.
During the summer of 2017, a research prototype of OPC-CNN was transitioned into an operational demonstration at five key air traffic control centers, including Miami, Florida; San Juan, Puerto Rico; Houston, Texas; New York, New York; and the Air Traffic Control System Command Center. This capability will provide an interim solution while OPC is transitioned into future FAA weather systems and will provide valuable feedback that will influence future improvements to the capability.
Moving forward, advanced sensors and increasing amounts of training data will allow better model performance and greater spatial coverage of the OPC model. The GOES-R series of geostationary satellites, the first of which launched in November of 2016, will provide higher-resolution and faster updating image data, and also lightning data from the Geostationary Lightning Mapper that can be leveraged to improve the OPC algorithm. Methods like OPC can be used to take full advantage of these rich datasets to generate effective data-driven meteorological tools.
Acknowledgments. This material is based upon work supported by the Federal Aviation Administration under Air Force Contract FA8721-05-C-0002 and/or FA8702-15-D-0001 under Interagency Agreement DTFAWA-11-X-80007. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Federal Aviation Administration. We wish to recognize the late Leonard Story of the Federal Aviation Administration for his advocacy and guidance in the early stages of this work. We also wish to thank Earth Networks for providing the global lightning data to us, and two anonymous reviewers for their help with improving the article.
1. At the time of this writing, the San Juan NEXRAD was out of service, having been disabled in September 2017 by Hurricane Maria.
2. The term image here actually means a raster, which is an image embedded in a spatial reference system that maps between geocoordinates (latitude, longitude) and pixel coordinates through a predefined map projection. With this notion, we can discuss the distance between pixels in units of actual distance (e.g., km).
3. Even though VIL and rainfall rate are not strictly “radar quantities,” well-known methods exist that map reflectivity to estimates of liquid water content and rainfall.
4. Echo tops are defined as the height of the 18-dBZ layer of reflectivity.
6. The OPC-RF results scored here also include a histogram matching–based calibration.