Commercial microwave links are installed and maintained for the purpose of telecommunication. Hydrometeors between transmitting and receiving antennas cause the microwave signal to be attenuated. From signal attenuation, the path-averaged rainfall intensity can be calculated. A 7-month dataset of instantaneously logged signal powers from almost 2000 unique links in the Netherlands is analyzed. Rainfall intensities are calculated with the RAINLINK package with a novel preprocessing module, enabling the package to be applied on instantaneously logged data from now on. Rainfall intensities per link are validated with the path-averaged rainfall intensities according to a gauge-adjusted radar product. Both the overall performance and the dependence of errors on link characteristics and measurement conditions are evaluated. The coefficient of variation decreases from 3.70 to 2.32 and the correlation increases from 0.30 to 0.63 from instantaneous to daily estimates of rainfall accumulations. The coefficient of variation is also smaller during heavy rainfall. Errors are largest for pathlengths shorter than 2 km, for observations during the late night and early morning, and for observations during colder months (when solid or melting precipitation could occur and dew is more likely to form on the antennas). Comparison of our results with those of earlier studies shows that minimum/maximum sampling (widely employed in network management systems) outperforms instantaneous sampling regarding detection of both quantity and occurrence of rain at a 15-min sampling rate in the Dutch climate.
Rain gauge networks and weather radar are well-known and prevalent rainfall monitoring technologies. However, rainfall observations are sparse or lacking in large areas of the world. Sensors are costly to install and maintain. Especially developing countries and urban areas are likely to benefit from denser rainfall measurements than currently available (Schilling 1991; Berne et al. 2004; Kidd et al. 2017).
Microwave links consist of a transmitting and a receiving antenna at several tens of meters above the ground and typically several kilometers apart, between which microwave signals propagate. The links, owned and maintained by commercial telecom operators for the purpose of cellular communication, are abundant in populated areas. Microwave links covered 20% of the world’s land surface in 2007 [Global System for Mobile Communications Association (GSMA; GSMA 2012)]. Upton et al. (2005) first suggested making use of signal attenuation over commercial microwave links (CMLs) to determine rainfall, notably in urban areas. It was demonstrated that this could be achieved with actual commercial microwave link data (Messer et al. 2006; Leijnse et al. 2007). This is promising as CMLs are widespread even in sparsely gauged areas (GSMA 2016) and a CML rainfall product is found to outperform some satellite rain products (for the Netherlands) (Rios Gaona et al. 2017).
CML data are obtained by logging the power levels of mobile operators, which is typically done by determining the minimum and maximum power levels over a time interval [minimum/maximum (min/max) sampling] or by logging instantaneous power levels every time interval. There have been numerous validation studies on several types of microwave link data in various climates (e.g., Goldshtein et al. 2009; Zinevich et al. 2009; Schleiss and Berne 2010; Chwala et al. 2012; Doumounia et al. 2014; Kim and Kwon 2018), and the technique is maturing toward becoming a real-time rainfall monitoring tool (Chwala et al. 2016). A good understanding of the accuracy of rainfall estimations from realistic CML networks is needed for its operational use. Several studies explored rainfall observations from large CML networks in the Netherlands with min/max 15-min sampling (Overeem et al. 2011, 2013; Rios Gaona et al. 2015; Overeem et al. 2016a,b; Van het Schip et al. 2017; Rios Gaona et al. 2017). Instantaneous sampling of signal levels is another common strategy for telecom operators, but to our knowledge, large validation studies were only conducted with min/max-sampled datasets. This study, for the first time, validates CML observations of a countrywide CML network (almost 2000 links) over a long period (~7 months), where the data are sampled instantaneously.
The general CML rainfall retrieval principle is explained in section 2. Many factors can influence the accuracy of rainfall estimates. This study aims to determine the overall accuracy of rainfall observations from instantaneously sampled CMLs and to identify the factors that affect that accuracy. Section 3 provides an overview of the known CML characteristics and their expected impact on rainfall estimation accuracy based on previous studies. Contrary to experimental or simulation studies (e.g., Berne and Uijlenhoet 2007; Leijnse et al. 2008, 2010; van Leth et al. 2018), these characteristics cannot be completely isolated in this dataset, but findings are indicative for large-scale occurrence. This is followed by the methods in section 4 detailing specifics of the instantaneous CML dataset and how rainfall estimates are calculated from it. The results in section 5 are related to the expected behavior (from section 3) in the discussion in section 6, followed by the conclusions on the value and prospects of this measurement technique for rainfall retrieval in section 7.
2. General rainfall retrieval principle
As telecommunication companies strive for a fully functional network at all times, avoiding too-low signal-to-noise ratios is a point of attention. The transmitted signal levels (TSL) and received signal levels (RSL) are typically stored in the mobile operators’ network management systems for the purpose of network quality monitoring. CMLs usually operate at frequencies where the signal is sensitive to hydrometeors, resulting in increased attenuation (TSL − RSL) during rainfall compared to the attenuation due to free-space loss only. The more raindrops and the larger their sizes, the larger the signal attenuation. From this signal attenuation with respect to dry weather, path-averaged rainfall intensity can be obtained via this power law (Atlas and Ulbrich 1977):
\[ R = a\,k^{b} \tag{1} \]
between specific signal attenuation k (dB km−1) and link path-averaged rainfall intensity R (mm h−1), where coefficients a (mm h−1 dB−b km^b) and b (nondimensional) are dependent on signal frequency and polarization (Olsen et al. 1978; Jameson 1991). Leijnse et al. (2010) determine these a and b values experimentally for Dutch conditions for a range of frequencies based on measured raindrop size distributions. For typical operational microwave frequencies, the R–k relation is considered almost linear, which means that, in principle, the path-averaged rain rate can be determined irrespective of the variability of the rain rates along the path (Overeem et al. 2011).
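As a minimal sketch of the power law in Eq. (1), the function below converts a specific attenuation into a path-averaged rain rate. The coefficient values are hypothetical and chosen only to make the arithmetic easy to follow; they are not the values of Leijnse et al. (2010).

```python
def rain_rate_from_attenuation(k_db_per_km: float, a: float, b: float) -> float:
    """Path-averaged rain rate R (mm/h) from specific attenuation k (dB/km), R = a * k**b."""
    if k_db_per_km <= 0.0:
        # No rain-induced attenuation means no rain in this sketch.
        return 0.0
    return a * k_db_per_km ** b


# Hypothetical coefficients, for illustration only.
A_EXAMPLE = 4.0
B_EXAMPLE = 1.0

print(rain_rate_from_attenuation(2.5, A_EXAMPLE, B_EXAMPLE))  # 10.0
```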
Rain-induced specific attenuation k can be isolated from the total signal loss over the link path (TSL − RSL) by subtracting the baseline attenuation Adry (dB), that is, the attenuation (TSL − RSL) representative of dry conditions (free-space loss), and the rain-induced wet-antenna attenuation Aa (dB), and dividing the remaining attenuation by the pathlength L (km):
\[ k = \frac{(\mathrm{TSL} - \mathrm{RSL}) - A_{\mathrm{dry}} - A_{a}}{L} \]
Note that the wet-antenna attenuation Aa will lead to considerable rainfall overestimation if not accounted for (Leijnse et al. 2008). Identification of dry intervals to determine the baseline attenuation Adry (to avoid nonzero rain estimations during dry weather), the so-called wet–dry classification, has been constructed in various ways: based on auxiliary data from, for example, ground-based radars or satellites (Overeem et al. 2011; Van het Schip et al. 2017), time series of a single link at high sampling frequencies (Upton et al. 2005; Schleiss and Berne 2010; Chwala et al. 2012; Wang et al. 2012), or on observations of multiple links (Rahimi et al. 2003; Overeem et al. 2011; Rayitsfeld et al. 2012; Overeem et al. 2013). Alternatively, Fenicia et al. (2012) obtain the fraction of the total attenuation that is caused by rain without a wet–dry classification, but calculate a variable baseline with a one-parameter linear low-pass filter.
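The subtraction and division described above can be sketched as follows. The function name, the argument names, and the clamping of negative values to zero are assumptions made for this illustration, and the numbers in the example call are invented.

```python
def specific_attenuation(tsl_dbm: float, rsl_dbm: float, a_dry_db: float,
                         a_wet_antenna_db: float, path_length_km: float) -> float:
    """Rain-induced specific attenuation k (dB/km) for one interval of one link."""
    total_attenuation = tsl_dbm - rsl_dbm                 # total loss over the path (dB)
    rain_attenuation = total_attenuation - a_dry_db - a_wet_antenna_db
    # Assumption of this sketch: negative remainders are treated as no rain.
    return max(rain_attenuation, 0.0) / path_length_km


# Invented example: 65 dB total loss, 60 dB dry baseline, 1.4 dB wet antennas, 4-km path.
k_example = specific_attenuation(20.0, -45.0, 60.0, 1.4, 4.0)
print(round(k_example, 3))
```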
3. Error dependency on link characteristics and observation conditions
As stated in section 2, path-averaged rainfall intensity can be calculated after extracting k from the total signal power loss, calculated as TSL − RSL. The accuracy of the resulting rainfall observations is affected by several factors whose impacts depend on each other. This section highlights these dependencies as reported in previous studies and formulates the expected impact for the CML dataset evaluated in this study.
a. Sampling strategy
In min/max sampling, the minimum and the maximum RSL over the time interval are recorded, where the TSL is often constant. Instantaneous sampling yields snapshots of RSL and TSL. Telecom providers may choose to vary TSL depending on RSL to ensure signal transfer. This does not negatively affect the rainfall retrieval as long as the changing TSL are logged, in the case of instantaneous sampling at the same time as the RSL, with similar accuracy and precision as the RSL (although in practice the TSL are often logged with less accuracy and precision, hence leading to a degradation in the quality of rainfall retrievals).
Different types of CML sampling strategies result in different errors. Leijnse et al. (2008) simulate microwave link signals from radar data. Three sampling strategies have been compared: (i) continuous powers are converted to rainfall intensities and averaged over 15 min, (ii) averaged signal powers over 15 min are converted to rainfall intensities (somewhat similar to min/max sampling), and (iii) instantaneous powers every 15 min are converted to rainfall intensities (i.e., instantaneous sampling). Based on the overall simulation experiment, Leijnse et al. (2008) show the limitations of instantaneous sampling as compared to the other two strategies.
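The three strategies can be mimicked on a synthetic 15-min shower, as in the sketch below. This is not the simulation of Leijnse et al. (2008): the rain series and the coefficients are hypothetical, and averaging attenuation in decibels stands in for averaging signal power, which is a simplification.

```python
from statistics import mean

# Hypothetical power-law coefficients, for illustration only.
A_COEF, B_COEF = 4.0, 1.1


def k_from_r(rain_rate: float) -> float:
    """Specific attenuation (dB/km) that would produce the given rain rate (mm/h)."""
    return (rain_rate / A_COEF) ** (1.0 / B_COEF)


def r_from_k(k: float) -> float:
    """Rain rate (mm/h) from specific attenuation (dB/km)."""
    return A_COEF * k ** B_COEF if k > 0.0 else 0.0


# Invented 1-min rain rates (mm/h) for a short shower inside one 15-min interval.
rain_1min = [0, 0, 0, 2, 8, 20, 30, 12, 4, 1, 0, 0, 0, 0, 0]
k_1min = [k_from_r(r) for r in rain_1min]

# (i) convert every sample, then average the rain rates
r_continuous = mean(r_from_k(k) for k in k_1min)
# (ii) average the attenuation first, then convert once
r_averaged_power = r_from_k(mean(k_1min))
# (iii) convert only the snapshot taken at the end of the interval
r_instantaneous = r_from_k(k_1min[-1])

print(round(r_continuous, 2), round(r_averaged_power, 2), r_instantaneous)
```

In this invented case the snapshot falls after the shower has passed, so strategy (iii) reports no rain at all, while strategies (i) and (ii) differ only through the nonlinearity of the power law.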
Ostrometzky et al. (2017) show that min/max sampling in combination with rounding of logged signal levels will yield a positive bias equal to the quantization interval in TSL plus twice the quantization interval in RSL. This bias is the same for both wet and dry periods. Therefore, by correcting with a baseline based on dry weather intervals, this bias is canceled out completely. Nevertheless, rounding of power levels will lead to errors that are most severe in cases where the rain-induced attenuation over the link path is small and hence the effects of rounding are large. This is the case for short links and low frequencies, especially for low rainfall intensities (Leijnse et al. 2008).
Although instantaneous sampling may be preferable at high sampling frequencies, at 15-min intervals it is expected to miss some of the rainfall dynamics over the interval and to be outperformed by the min/max-sampling strategy that uses information over the complete interval.
An extensive number of min/max-sampled CML datasets have been evaluated in the Netherlands (Overeem et al. 2011, 2013; Rios Gaona et al. 2015; Overeem et al. 2016a,b; Van het Schip et al. 2017; Rios Gaona et al. 2017). As the current instantaneously sampling network is employed in the same climatic conditions, this allows for a fair comparison between the two sampling strategies. Appendix A includes a table with an in-depth overview of the previous CML studies carried out in the Netherlands, in which the dataset of the study by Overeem et al. (2016b) has been revisited to improve comparability with the current study.
b. Link characteristics
The accuracy of link rainfall estimates is influenced by frequency and pathlength in three distinct ways:
Frequency-dependent choice of a and b [Eq. (1)]
When estimating R from k [Eq. (1)], one needs to obtain a and b values that describe that relation for a certain polarization and frequency in that climate. First, the actual relation may not be as deterministic in reality, and assuming the relation as such can yield an error in estimated R. Berne and Uijlenhoet (2007) quantify this error in a simulation experiment to be nearly negligible (mean relative error < 2%) for link paths longer than 5 km. Second, climatological standardized parameters, for example, from the International Telecommunication Union (2005), are often the only available information, although these can yield very different rainfall estimates than a and b values fitted to data from the region under study. Berne and Uijlenhoet (2007) show that assuming these climatological parameters can result in a mean relative error up to 20% for frequencies of 15 GHz.
The a and b parameters used in this study were determined experimentally by Leijnse et al. (2010) for a range of frequencies for both horizontal and vertical polarization in the Dutch climate. Leijnse et al. (2008, 2010) showed a nearly perfect fit between k and R at 30 GHz and considerable scatter at 10 GHz. This suggests an expected larger error for the links at the lower end of the frequency range in this study.
Pathlength and rainfall variability
Path-averaged rain rate estimation relies on a near-linear R–k relation, where k is constant over the link path. The errors due to variable rain rates along the path will be largest for frequency and polarization combinations where b deviates from 1 (Leijnse et al. 2010; Overeem et al. 2011), which is at frequencies considerably higher or lower than 32 GHz in Dutch conditions (Leijnse et al. 2008). Overeem et al. (2011) estimate that the error due to rainfall variability over the link path varies between ~+2.2% for 13 GHz and ~−0.4% for 39 GHz.
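The effect of a nonlinear R–k relation on a variable rain profile can be illustrated with a small sketch. The rain profile, the equal segment lengths, and the b values are hypothetical; the sketch only shows the sign and rough size of the effect, not the percentages reported by Overeem et al. (2011).

```python
def retrieval_error(rain_along_path, b, a=1.0):
    """Relative error of the retrieved path-averaged rain rate for a variable rain profile.

    The path is assumed to consist of equal-length segments, each with its own rain rate.
    """
    # Specific attenuation per segment, from inverting R = a * k**b.
    k_segments = [(r / a) ** (1.0 / b) for r in rain_along_path]
    k_mean = sum(k_segments) / len(k_segments)      # what the link actually senses
    r_retrieved = a * k_mean ** b                   # retrieval from path-averaged k
    r_true = sum(rain_along_path) / len(rain_along_path)
    return (r_retrieved - r_true) / r_true


# Invented rain rates (mm/h) along six equal-length path segments.
profile = [0.0, 1.0, 5.0, 20.0, 5.0, 1.0]
for b_value in (0.9, 1.0, 1.1):
    print(b_value, round(100.0 * retrieval_error(profile, b_value), 1), "%")
```

For b = 1 the error vanishes; for b below 1 the retrieval overestimates and for b above 1 it underestimates the true path average.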
Pathlength and relative measurement error
Rainfall can only be estimated as path-averaged rainfall intensities, meaning that the spatial scale of the observation increases with pathlength. Under constant rainfall intensities, absolute attenuation due to rain is larger over long link paths than over short links (given the same signal frequency). Measurement errors in total attenuation over long links are divided by longer pathlengths resulting in a relatively smaller error in k. Also the contribution of wet-antenna attenuation is relatively larger for shorter link paths.
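A short sketch makes the pathlength dependence concrete. It assumes a fixed 1-dB measurement error in total attenuation, a fixed wet-antenna attenuation, and a constant rain-induced specific attenuation; all numbers and function names are hypothetical.

```python
def specific_attenuation_error(measurement_error_db: float, path_length_km: float) -> float:
    """Error in k (dB/km) caused by a given error in total attenuation (dB)."""
    return measurement_error_db / path_length_km


def wet_antenna_share(a_wet_antenna_db: float, k_db_per_km: float, path_length_km: float) -> float:
    """Fraction of the rain-related attenuation that is due to wet antennas."""
    path_attenuation = k_db_per_km * path_length_km
    return a_wet_antenna_db / (a_wet_antenna_db + path_attenuation)


for length_km in (0.5, 2.0, 10.0):
    print(length_km,
          specific_attenuation_error(1.0, length_km),
          round(wet_antenna_share(1.4, 0.5, length_km), 2))
```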
These three mechanisms show the intricate way link characteristics may impact the uncertainty in a deterministic R–k relationship and thereby in CML rainfall estimations. The CML dataset evaluated in this study contains links with frequencies between 12.8 and 39.3 GHz and pathlengths between 0.5 and 18 km, where short links typically correspond with high signal frequencies and vice versa. The majority of the links (72%) are vertically polarized.
Based on the previous research described in this section, we expect best results in our dataset for links longer than 5 km transmitting signals at frequencies near 32 GHz.
c. Meteorological conditions
1) Heavy rainfall
Rainfall intensity is related to the uncertainty in the assumptions made in wet-antenna attenuation and to the effect of power-level rounding. The wet-antenna attenuation results in a larger overall attenuation than merely due to k, which could lead to a considerable overestimation if not accounted for. In the simulation study by Leijnse et al. (2008) this Aa depends on rainfall intensity and to a lesser extent on frequency (over the range of the current dataset). At 20 GHz, Aa of a single antenna varies between ~1 dB at 1 mm h−1 and ~2 dB at 10 mm h−1 (Leijnse et al. 2008). Nevertheless, Overeem et al. (2016b) show good rainfall observations from a Dutch min/max-sampling CML network where Aa has a constant value of 2.3 dB (for two antennas). This corresponds with Schleiss et al. (2013), who find the dependency of wet-antenna attenuation on time after the start and end of the rainfall is larger than the dependence on rainfall intensity. Second, the error due to rounding of RSL and TSL becomes relatively smaller when the total attenuation increases, that is, for larger rainfall intensities.
2) Time of day
Rainfall estimations are prone to errors because of dew formation on the antenna (van Leth et al. 2018) and changes in the atmosphere that affect the refractive index of air and thereby the microwave signal propagation (i.e., superrefraction and ducting). Valtr et al. (2011) report how the attenuation of a microwave signal may vary during the day because of these effects on beam propagation over the link path. Both the dew formation and the atmospheric conditions related to beam propagation mostly occur during late night or early morning, and they therefore impact the attenuation most severely then.
3) Time of year
The rainfall retrieval principle described in section 2 only applies for liquid precipitation. Solid precipitation along the link path attenuates the signal far less than rain droplets. However, melting precipitation along the link path or on the antenna heavily attenuates the signal, possibly over a long period. This may lead to considerable errors in microwave rainfall estimation during and after solid or melting precipitation events if not properly accounted for (Paulson and Al-Mreri 2011). In the Netherlands these events are most likely to occur in winter (December–February).
This study uses a CML dataset from telecom provider T-Mobile NL, consisting of instantaneous transmitted and received powers at 15-min resolution over 1936 unique microwave link paths in the Netherlands. The signal frequency ranges between 12.8 and 39.3 GHz. Transmitted powers (dBm) are truncated to integers and typically remain constant over long periods of time. Received powers (dBm) are provided in one decimal accuracy. The period from 18 February to 16 October 2016 is evaluated, chosen because of the high data availability during this period. Metadata of the links consist of location and height of receiving and transmitting antennas, link ID, signal polarization, and microwave frequency. Figure 1 shows the layout and data availability of the link network.
For validation a climatological rainfall dataset was obtained from the Royal Netherlands Meteorological Institute (KNMI) based on data from two C-band radars (in Den Helder and de Bilt, and a new radar in Herwijnen during the last 4 weeks of the study period) adjusted with data from two rain gauge networks (31 automatic and 325 manual gauges) (Overeem et al. 2009a,b, 2011), freely available online (https://data.knmi.nl/datasets/rad_nl25_rac_mfbs_em_5min/2.0). Rainfall amounts are given over the entire land surface of the Netherlands in pixels of ~1 km2 with a temporal resolution of 5 min.
b. RAINLINK processing
The software package RAINLINK (Overeem et al. 2016a), written in the scripting language R, has been designed to calculate CML rainfall estimates from minimum and maximum RSL with a constant TSL. A simple preprocessing module replaces the minimum and maximum RSL columns (dBm) with RSL − TSL (negative attenuation; dB). The unit change and the negative values have no consequences for the subsequent calculations. This approach enables the package in its current form to calculate rainfall from instantaneous power measurements. In addition, it no longer requires constant TSL and makes use of polarization-dependent a and b parameters [Eq. (1)]. The R package, its documentation (including documentation on the additional preprocessing module for instantaneous sampling), and a 2-day-sample dataset of over 2000 CMLs are freely available on GitHub (via https://github.com/overeem11/RAINLINK).
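The idea behind the preprocessing step can be sketched in a few lines. This is not the RAINLINK module itself (which is written in R): the record layout and the column names `RSL`, `TSL`, `Pmin`, and `Pmax` are assumptions made for this illustration.

```python
def to_min_max_columns(record: dict) -> dict:
    """Replace instantaneous RSL and TSL by RSL - TSL in both the minimum and maximum column."""
    negative_attenuation_db = record["RSL"] - record["TSL"]   # dB, negative by construction
    converted = {key: value for key, value in record.items() if key not in ("RSL", "TSL")}
    # Hypothetical column names; both columns carry the same instantaneous value.
    converted["Pmin"] = negative_attenuation_db
    converted["Pmax"] = negative_attenuation_db
    return converted


# Invented record for one link and one 15-min time stamp.
sample = {"ID": "link_1", "RSL": -45.3, "TSL": 20.0}
print(to_min_max_columns(sample))
```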
The RAINLINK package is used to convert RSL − TSL into rainfall observations following six steps:
1) Duplicated link IDs, IDs with inconsistent metadata, and links with frequencies outside the range 12.5–40.5 GHz are excluded from the analysis. Note that all the links in the dataset operate within this frequency range.
2) The wet–dry classification relies on the spatial correlation of rainfall. When more than half of nearby links (within a radius of 15 km) experience reduced RSL − TSL levels, the interval is labeled as wet. These reductions are calculated relative to the maximum (i.e., least negative) RSL − TSL values of the respective links over the past 24 h, both as a difference and as a difference divided by the link pathlength. If the median of all nearby links is more negative than predefined thresholds [QmP (dB) for the difference and QmPL (dB km−1) for the difference per kilometer of link path], the link is labeled as wet for that interval. This step is optional in the software; omitting it results in all intervals being labeled as dry in step 3.
3) The median RSL − TSL of all dry intervals during the previous 24 h is considered the dry weather reference signal level.
4) In the outlier filter, intervals of a link are excluded if the cumulative difference between its specific attenuation and that of the surrounding links over the previous 24 h becomes lower than the outlier filter threshold.
5) From the attenuation during wet intervals, the dry weather reference and a fixed value for wet-antenna attenuation Aa are subtracted. This value divided by the pathlength yields k.
6) The value of R is calculated from k [with Eq. (1)], using frequency-dependent a and b values provided in the package. In case of min/max-sampled data the weighted average of the Rmin and Rmax rainfall would be calculated with weighting factor α. However, the α parameter becomes obsolete in case of instantaneous sampling.
An extensive explanation of the rainfall retrieval algorithm is provided by Overeem et al. (2016a). Note that in the default version of the RAINLINK package steps 2 and 3 are applied on power levels instead of attenuation values.
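Steps 2, 3, 5, and 6 can be sketched for a single link as follows. This is a simplified illustration and not the RAINLINK code: the function names are hypothetical, the neighbor reductions are assumed to be precomputed, both thresholds are assumed to be required at once, and the a and b values in the example are invented.

```python
from statistics import median


def classify_wet(neighbor_drops_db, neighbor_drops_db_per_km, qmp=-0.6, qmpl=-0.4):
    """Step 2: wet if the median reductions of nearby links are below both thresholds."""
    return median(neighbor_drops_db) < qmp and median(neighbor_drops_db_per_km) < qmpl


def dry_reference(levels_db, wet_flags):
    """Step 3: median RSL - TSL over the dry intervals of the previous 24 h."""
    dry_levels = [level for level, wet in zip(levels_db, wet_flags) if not wet]
    return median(dry_levels)


def rain_rate(level_db, reference_db, is_wet, path_length_km, a, b, aa_db=1.4):
    """Steps 5 and 6: subtract reference and wet-antenna attenuation, then apply R = a * k**b."""
    if not is_wet:
        return 0.0
    # RSL - TSL is negative and becomes more negative in rain.
    attenuation_db = reference_db - level_db - aa_db
    if attenuation_db <= 0.0:
        return 0.0
    k = attenuation_db / path_length_km
    return a * k ** b


# Invented RSL - TSL levels (dB) of one link; the last interval is rainy.
levels = [-60.0, -60.2, -59.8, -66.0]
wet = [False, False, False, classify_wet([-1.0, -0.8, -0.2], [-0.5, -0.6, -0.1])]
reference = dry_reference(levels, wet)
print(round(rain_rate(levels[-1], reference, wet[-1], 4.0, a=4.0, b=1.0), 2))
```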
As parameter values may differ between climates and sampling strategies, it is recommended to calibrate RAINLINK parameters on a subset of the dataset of interest (Overeem et al. 2016a). The default parameter values for QmP, QmPL in step 2, and Aa in step 5 in RAINLINK were based on min/max-sampled data (Overeem et al. 2016a). These parameters have been newly determined according to an optimization analysis described in appendix B. The outlier filter threshold (used in step 4 of the RAINLINK package) was kept at its default value. The resulting values for QmP and QmPL (−0.6 dB and −0.4 dB km−1, respectively) became less negative, and the Aa value (1.4 dB) was lower than the default. Note that this default value was determined in a previous analysis on min/max data by jointly optimizing α and Aa (Overeem et al. 2016a), which may have resulted in values of α compensating for the higher values of Aa (or vice versa).
To obtain a link-based validation dataset from radar pixel measurements, the rainfall intensities of all pixels overlying the link path are averaged, weighted by the length of the link path segment through the pixel. Two types of radar rainfall intensity references (mm h−1) are used in this research. The first one is used for the validation of instantaneous CML observations, and is constructed from radar data of the single 5-min time interval closest to the instantaneous CML time stamp, denoted Rrad,5min.
The second one is the temporal average of the three path-averaged radar rainfall intensities within the 15 min for which the instantaneous CML measurement is assumed to be representative, denoted Rrad,15min. This value includes all rainfall information between instantaneous CML observations, and is used to examine how well CMLs describe rainfall over longer periods when measurements are aggregated over longer time intervals. Whenever Rrad,15min or rainfall intensities from link observations were aggregated over longer durations, any interval with less than 80% availability of 15-min intervals was discarded.
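The construction of the radar reference can be sketched as a length-weighted average along the link path, followed by an aggregation that respects the 80% availability rule. Function names, the use of `None` for missing intervals, and the example values are assumptions made for this illustration.

```python
def path_average(pixel_intensities, segment_lengths_km):
    """Average the pixel rain rates (mm/h), weighted by the path length inside each pixel."""
    total_length = sum(segment_lengths_km)
    weighted_sum = sum(r * length for r, length in zip(pixel_intensities, segment_lengths_km))
    return weighted_sum / total_length


def aggregate(values_15min, min_availability=0.8):
    """Mean intensity over a longer duration, or None if too many 15-min intervals are missing."""
    valid = [value for value in values_15min if value is not None]
    if len(valid) / len(values_15min) < min_availability:
        return None
    return sum(valid) / len(valid)


# Invented example: a link crossing two radar pixels over 1 km and 3 km.
print(path_average([2.0, 4.0], [1.0, 3.0]))   # 3.5
# Hourly aggregation with one of four 15-min values missing (75% availability).
print(aggregate([1.0, None, 3.0, 2.0]))       # None
```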
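Path-averaged link rainfall estimates are determined over ~7 months (excluding the 10-day calibration subset used in the optimization analysis). To validate the link estimates Rlink with the radar reference Rrad, where Rrad can be either Rrad,5min or (aggregated) Rrad,15min, the following diagnostics were calculated. Intervals where both Rlink and Rrad observe zero rainfall (i.e., dry weather) are excluded. The Pearson correlation coefficient r is defined as
\[ r = \frac{\mathrm{cov}(R_{\mathrm{link}}, R_{\mathrm{rad}})}{\mathrm{sd}(R_{\mathrm{link}})\,\mathrm{sd}(R_{\mathrm{rad}})} \]
where cov(x, y) is the covariance between x and y, and sd(x) is the standard deviation of x. The relative bias (bias from now on) is defined as
\[ \mathrm{bias} = \frac{\overline{R_{\mathrm{res}}}}{\overline{R_{\mathrm{rad}}}} \]
with \(\overline{R}\) indicating the mean of R and
\[ R_{\mathrm{res}} = R_{\mathrm{link}} - R_{\mathrm{rad}} \]
The coefficient of variation (CV) is calculated as the standard deviation of the difference divided by the mean of the reference:
\[ \mathrm{CV} = \frac{\mathrm{sd}(R_{\mathrm{res}})}{\overline{R_{\mathrm{rad}}}} \]
Two measures of how often link observations correctly estimate the occurrence of rainfall are probability of detection (POD) and false alarm ratio (FAR):
\[ \mathrm{POD} = \frac{\mathrm{hits}}{\mathrm{hits} + \mathrm{misses}}, \qquad \mathrm{FAR} = \frac{\mathrm{false\ alarms}}{\mathrm{hits} + \mathrm{false\ alarms}} \]
Here, hits are the number of intervals when both Rlink and Rrad detect rainfall, misses are the number of intervals when Rlink detects dry weather while Rrad is rainy, and false alarms are the number of intervals when Rlink is rainy but Rrad is dry. An interval can be labeled as “wet” when the rainfall is above zero, or above a certain rainfall minimum (where “dry” intervals measure zero rainfall or rainfall amounts below this minimum).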
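The diagnostics above can be computed with a short sketch such as the following. The function name, the use of population (rather than sample) standard deviations, and the example values are assumptions; the sketch also does not guard against empty categories, which would cause a division by zero.

```python
from statistics import mean, pstdev


def validation_metrics(r_link, r_rad, threshold=0.0):
    """Correlation, relative bias, CV, POD, and FAR of link estimates against a reference."""
    # Exclude intervals where both link and reference report zero rainfall.
    pairs = [(l, r) for l, r in zip(r_link, r_rad) if not (l == 0 and r == 0)]
    links = [l for l, _ in pairs]
    rads = [r for _, r in pairs]
    residuals = [l - r for l, r in pairs]

    mean_link, mean_rad = mean(links), mean(rads)
    covariance = mean((l - mean_link) * (r - mean_rad) for l, r in pairs)

    hits = sum(1 for l, r in pairs if l > threshold and r > threshold)
    misses = sum(1 for l, r in pairs if l <= threshold and r > threshold)
    false_alarms = sum(1 for l, r in pairs if l > threshold and r <= threshold)

    return {
        "r": covariance / (pstdev(links) * pstdev(rads)),
        "bias": mean(residuals) / mean_rad,
        "cv": pstdev(residuals) / mean_rad,
        "pod": hits / (hits + misses),
        "far": false_alarms / (hits + false_alarms),
    }


# Invented rain rates (mm/h) for five intervals of one link.
metrics = validation_metrics([0.0, 2.0, 0.0, 4.0, 1.0], [0.0, 1.0, 1.0, 3.0, 0.0])
print({name: round(value, 2) for name, value in metrics.items()})
```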
The entire dataset was subdivided based on known features in the CML dataset to determine their impact on the overall accuracy:
Link geographical location
Link signal frequency
Link signal polarization
Time of day
Time of year
Rain type, for example, during heavy rainfall events
Intervals of heavy rainfall were determined based on the radar reference: intervals in the link dataset were selected during all events when the radar reference reported at least an average of 6 mm h−1 over 2.5 h or longer. Intervals at the start and end of an event with rainfall intensities below 3 mm h−1 were excluded, and the complete event was discarded if it became shorter than 1 h after excluding start and end intervals, that is, if it contained only a few or a single high rainfall intensity value(s). The link rainfall estimates were compared with Rrad,5min for all selected intervals for all links. These criteria were found to provide a substantial subset while still containing continuous periods of heavy rainfall.
a. Overall comparison
Figure 2 shows time series of the number of links in the dataset before and after applying the RAINLINK algorithm. The link density decreases over time. The gaps in the time series are in principle due to data storage issues in the network management system of the cellular telecommunication company rather than the CML network being out of service. After processing the data there are on average 1458 links available per 15-min interval over the total period. Data availability reduced to an average of 1451 links per 15-min interval when missing values in the radar reference were taken into account.
The performance of all links over the entire period, disregarding any characteristic of the links or time of observation, is visualized in Fig. 3. The links overestimate rainfall and there is a considerable spread around the diagonal (gray line) that would indicate perfect agreement between the link and reference. The comparison between links and instantaneous radar Rrad,5min during heavy rainfall intervals shows a far smaller bias and a CV of 1.59, compared with 3.70 in the total dataset (Fig. 3).
Figure 4 shows the comparison with the 15-min-averaged radar reference for 15-min, hourly, 3-hourly, and daily accumulations. This demonstrates the ability of the link observations, taken once every 15 min, to describe rainfall dynamics at various temporal aggregation scales. The accuracy of the links increases for longer accumulation intervals, as shown by a decrease in CV (4.15, 3.43, 3.24, and 2.32, respectively) and an increase in r (0.28, 0.52, 0.57, and 0.63, respectively). Link rainfall estimates are higher than the radar reference on average for all accumulation intervals.
To explore whether over- or underestimation occurred similarly for all links, all double-mass curves (i.e., total link accumulation vs total radar accumulation) are evaluated in a density plot (Fig. 5). Intervals where either the link or the reference contained missing data were excluded. This figure shows that the densest region of the double-mass curves is parallel to the diagonal, indicating agreement between link and reference. However, there is overestimation by the links as shown by the large density area above the diagonal. A considerable fraction of the overestimation by CMLs occurs near the origin, corresponding in most cases with the colder earlier months of the dataset when melting precipitation may occur.
The accuracy of the links in determining rainfall occurrence, as opposed to rainfall amounts, is visualized in Fig. 6. The values of POD and FAR are displayed for a rain occurrence threshold of 0 mm, as well as 0.1 and 0.5 mm during the time interval. Although the POD increases and the FAR decreases for longer time intervals, the links missed the occurrence of rainfall in the reference in ~60% of the hourly intervals at a threshold of 0.1 mm, when we assume the radar reference to be accurate.
b. Link characteristics
Known link characteristics are geographic location, link pathlength, signal frequency, and signal polarization. Geographic location was found not to be a significant factor on accuracy in this dataset, as the validation metrics of links showed no spatial dependence over the Netherlands (not shown).
Each link has its individual signal frequency. Shorter links generally operate at higher frequencies. In Fig. 7 the comparisons of link rainfall and radar reference per bin of frequency and pathlength show relatively large bias, low POD, and high CV for the shortest links, especially up to 2 km. The longer links show mixed results and a poor performance in terms of POD values, although this may in some cases (e.g., bin A2) be due to outliers in the relatively small number of values (Fig. 7). The error seems to depend more on pathlength than on frequency; the dataset is therefore further evaluated based on pathlength only. Figure 8 shows similar results for all links except for links shorter than 2 km and between 12 and 13 km (corresponding with bin A2 in Fig. 7), where particularly the bias and CV are largest. Further examination showed that the poor performance of links with pathlengths of 12–13 km is due to 2 of the total 17 links of this length class that consistently overestimate rainfall during the complete study period.
The metrics of horizontally and vertically polarized links were also compared (not shown). Measurements from the horizontally polarized links (26% of observations in dataset) have a smaller relative bias (0.25 vs 0.33), similar r (0.29 vs 0.30), and smaller CV (3.6 vs 3.7) compared with measurements from vertically polarized links. The subsets are different in the sense that link paths of over half of horizontally polarized links are shorter than 2 km, against only one-third of the vertically polarized links.
c. Time of observation
Because of the weak signal effect of snow and heavy attenuation during melting precipitation on antennas and along the link path, it is likely that the link rainfall retrieval error is larger in winter months. The CV and bias were particularly large in the first month, which also had a lower correlation than the other months (Fig. 9). Further evaluation showed that solid precipitation did occur in the Netherlands in February and March.
The metrics show high positive bias and large values of CV and FAR particularly between 0400 and 0600 UTC. This corresponds with the end of night/early morning (local time), and may be related to dew formation and atmospheric changes affecting beam propagation (Fig. 10).
a. RAINLINK parameter choices
The optimization analysis of RAINLINK on instantaneous data results in less negative thresholds than the parameters that were obtained in min/max-sampled data in the same climate: QmP = −0.6 dB instead of −1.4 dB and QmPL = −0.4 dB km−1 instead of −0.7 dB km−1. This means that an interval is more easily classified as wet with the new parameters. However, the relatively low POD values in Fig. 6 imply that a significant fraction of rainfall is missed, even with the new thresholds. The Dutch min/max-sampled CML dataset from Overeem et al. (2016b) (evaluated in the same manner as the current dataset) indeed shows a slightly higher POD and a lower FAR than the instantaneously sampled dataset (see appendix A). This suggests that less rainfall is missed in min/max sampling than in instantaneous sampling, even with adjusted parameter settings. Min/max sampling enables the detection of rainfall at any time in the interval, where instantaneous sampling may miss it.
The optimized Aa parameter changed from 2.3 to 1.4 dB, which is more in the range of the values found by Overeem et al. (2011) (1.2–1.9 dB), although the authors find an Aa value of 2.3 dB more suitable in a later study (Overeem et al. 2013). In both studies, the Aa value is determined together with another parameter α, the weighting factor between the calculated rainfall intensity from the minimum and maximum RSL, which is obsolete for instantaneously sampled data. These two parameters (α and Aa) are highly dependent, where higher values of α yield similar accuracy as long as Aa increases as well (Overeem et al. 2016b). This highlights the need for determining Aa in the absence of α in the processing of the instantaneous dataset, and that such a lower value would make more sense for use in the algorithm where min/max values are available.
A fixed Aa of 1.4 dB (effectively 0.7 dB per antenna if both are wet) could be an underestimation of the actual Aa for rainfall intensities above 1 mm h−1 (Leijnse et al. 2008). This would result in a rainfall overestimation during heavy rainfall. This overestimation was not found for the current dataset (see Fig. 3), which suggests that other factors overshadow this effect.
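The direction of this effect can be checked with a short calculation. The sketch below assumes the usual power-law relation R = a k^b between specific attenuation k (dB km−1) and rain rate R, with a fixed wet-antenna attenuation Aa subtracted from the path attenuation first. The function name, the coefficients a and b, the pathlength, and the attenuation value are hypothetical and not taken from the study.

```python
def rain_rate(path_attenuation_db, length_km, aa_db, a, b):
    """Path-averaged rain rate (mm/h) from attenuation above the dry baseline.

    Subtracts a fixed wet-antenna attenuation aa_db, converts the remainder
    to specific attenuation k (dB/km), and applies the power law R = a * k**b.
    """
    corrected = max(path_attenuation_db - aa_db, 0.0)  # no negative attenuation
    k = corrected / length_km
    return a * k ** b


# Hypothetical coefficients and link properties, for illustration only.
A_COEF, B_COEF = 4.0, 1.0
LENGTH_KM = 5.0
ATTENUATION_DB = 6.0  # total attenuation above the dry baseline

r_aa_14 = rain_rate(ATTENUATION_DB, LENGTH_KM, aa_db=1.4, a=A_COEF, b=B_COEF)
r_aa_23 = rain_rate(ATTENUATION_DB, LENGTH_KM, aa_db=2.3, a=A_COEF, b=B_COEF)

print(r_aa_14, r_aa_23)
```

For the same measured attenuation, the smaller Aa leaves more attenuation attributed to rain and therefore yields the larger rain rate, which is the overestimation mechanism described above.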
b. Accuracy dependence on link and rain characteristics
The figures in sections 5b and 5c should not be read as isolating the effect of individual characteristics, but as a means to identify the most prominent influences on rainfall observations that are subject to multiple error sources. As short links typically operate at high frequencies, causality of the relatively larger errors in this subset can be difficult to determine. Horizontally polarized links perform slightly better than vertically polarized links, especially concerning bias. Horizontally polarized links are more sensitive to rainfall along the link path (Ruf et al. 1996).
The graphs in Figs. 8–10 were also constructed on hourly values (not shown). Although, as expected, the values indicated better agreement with the reference (see also Fig. 4b vs Fig. 4a), the dependence of the values on pathlength, time of day, and time of year was the same as for the instantaneous comparison.
The error seems to depend more on pathlength than on frequency in Fig. 7. Not all bins of pathlength and frequency are equally well represented in the dataset, and the low number of values on which the calculations are based may be responsible for cases of poor accuracy. The poor accuracy of the 12–13-km pathlength links is due to two links with continuous rainfall overestimation. Excluding these two links would result in validation metrics similar to those of 11–12- and 13–14-km bins. The accuracy is lowest for links shorter than 2 km, which was not due to outliers in this subset. It could however be caused by larger relative measurement errors (see also section 3), or the large uncertainty in the ground-truth reference that is constructed from pixel-averaged rainfall intensities.
Previous research described in section 3b indicated that links longer than 5 km and transmitting signals with frequencies near 32 GHz are expected to be most accurate. Figure 7 shows that the corresponding bin (E6) indeed shows relatively high r and low CV, and it is the only bin where rainfall is not overestimated. However, the POD and FAR are not as good as for some other pathlengths and frequencies.
The temporal variation in the error corresponds with the possible influence of atmospheric processes, dew, and solid precipitation (Valtr et al. 2011; Paulson and Al-Mreri 2011; van Leth et al. 2018), but this cannot be verified in our study setup. Additional measurements in an experimental setup are needed to confirm a causal relation.
c. Performance compared to other studies
The current dataset originates from a network in the same climate (the Netherlands) as several 15-min min/max-sampled datasets explored by Overeem et al. (2011), Rios Gaona et al. (2015), Overeem et al. (2016a,b), Van het Schip et al. (2017), and Rios Gaona et al. (2017). Although these studies differed slightly from the current one in terms of validation approach (e.g., the decision to exclude intervals below a certain threshold, map-based or path-based comparison), network (e.g., number of links), and accumulation interval, they enable us to draw conclusions on the relative performance of instantaneous datasets as compared to min/max-sampled CML data. The results found in these studies that most closely resemble the current study are summarized in appendix A (Table A1), and the dataset in Overeem et al. (2016b) has been reanalyzed in the same manner as the current approach for better comparison. These values show a smaller bias, lower CV, and higher correlations in rainfall observations from min/max-sampled data at 15-min intervals than from instantaneously sampled observations, as was hypothesized in section 3a.
a. Validation outcome
This study explores the accuracy of rainfall observations obtained from an instantaneously sampling CML network covering the Netherlands, operating at frequencies between 12.8 and 39.3 GHz. A preprocessing module was successfully added in order to enable the open-source R package RAINLINK to process data from this sampling format. Instantaneous rainfall observations were found to have a CV of 3.70 and r of 0.30, with improved results during heavy rainfall and at longer time intervals, with a CV of 2.32 and r of 0.63 for daily observations. The links tend to miss a considerable fraction of rainfall that is captured by the ground-truth reference (POD of ~40% for rainfall exceeding 0.1 mm), and overestimate rainfall intensities when they do detect it (overall bias of ~+30%).
There are various ways in which signal frequency, polarization, and geographic location could potentially influence the accuracy of CML rainfall observations, but for this dataset the effects of pathlength and temporal variability were most prominent. This was evident from the larger errors in links shorter than 2 km, in the first month of the dataset, and between 0400 and 0600 UTC. These errors could be related to large relative measurement errors at small spatial sampling and/or uncertainty in the ground-truth reference over short links, solid or melting precipitation in February/March, and dew formation and/or beam propagation changes in the early morning.
Comparisons with the performances of CML networks as explored in previous studies in the same climate indicate that min/max sampling, which effectively includes information over the whole time interval, is preferable over instantaneous sampling at 15-min time intervals. The instantaneous sampling approach is expected to improve at shorter time intervals as more information on rainfall dynamics can be captured that way. It should be noted that CML networks are not installed or maintained for rainfall monitoring, but can successfully be used as such.
b. Future prospects
Understanding the potential of obtaining rainfall estimates from CML datasets captured in commonly used sampling strategies is vital for future operational use. Communicating how slight modifications in the way RSLs are logged in network management systems could improve rainfall observation accuracy, without deteriorating the information that telecommunication companies obtain from these observations, may be the way forward for future CML collaborations. These modifications would comprise more frequent logging of TSL and RSL, as well as making these values available in real time.
In this work the gauge-adjusted radar product was considered as ground truth, even though radar estimates rainfall based on observations of an atmospheric volume aloft. The differences in sampling compared to ground-based sensors can lead to considerable differences in observations that in our current work are interpreted as errors in the CML estimate. It may therefore be interesting to validate in the future with collocated ground-based sensors, that is, by comparing CML estimates with nearby WMO rain gauges. This can also be done to investigate in further detail the CML accuracy under various weather conditions.
Even though CML networks have not been designed for environmental sensing, they are successful in capturing rainfall signals. However, the introduction of fiber optical communication technology has led to a reduction in the number of operational commercial microwave links in the Netherlands. The T-Mobile CML network in the Netherlands is currently far sparser than represented in Fig. 1. The use of CML observations for rainfall monitoring has most potential for those areas in the world where fiber optical cable communication is not favorable (e.g., because of local topography, limited benefits, or financial constraints) and other rainfall sensing techniques are lacking. According to Ericsson, a key player in the construction of commercial microwave links, 40% of global radio sites will be connected by CMLs in 2023 (Sellin 2018).
This study shows that the RAINLINK package is applicable to instantaneously sampled datasets. Apart from studies in Brazil (Rios Gaona et al. 2018), Italy (Alberoni et al. 2018), and Pakistan (Sohail Afzal et al. 2018), RAINLINK has mainly been applied to Dutch datasets. To advance the process toward operational application and upscaling of this technique, long time series should be analyzed for other networks and climates. It can be particularly useful to blend rainfall data from CMLs with other rainfall observations (e.g., Haese et al. 2017).
The authors are grateful to T-Mobile NL for making the microwave link data available. The assistance of Ronald Kloeg and Ralph Koppelaar in the provision and interpretation of the data is greatly appreciated. This work was part of a multiyear strategic research project (MSO) by the Royal Netherlands Meteorological Institute, partially financially supported by the Amsterdam Institute for Advanced Metropolitan Solutions (AMS) and the SMART city project (Project 13760) funded by Netherlands Technology Foundation (STW; currently NWO-TTW).
POD and FAR of Min/Max-Sampled and Instantaneously Sampled Data
The POD and FAR values indicate how often the CMLs and reference agree on the occurrence of rainfall. Figure A1 shows these values for two different thresholds above which the interval is considered as wet, with higher POD and FAR for higher thresholds. It should be noted that the validation dataset used here is a radar product that measures rainfall at an elevation of typically ~1.5 km because of the beam sampling by radar, which leads to uncertainties regarding actual rainfall at the surface. Nevertheless, the min/max-sampled results are consistently better (i.e., higher POD and lower FAR) than their instantaneously sampled counterparts.
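For reference, the two scores can be computed from paired link and reference values as in the sketch below. It uses the standard contingency-table definitions (POD as hits divided by hits plus misses, FAR as false alarms divided by hits plus false alarms), which are assumed here to match the definitions used in this study. The function name and the example values are hypothetical.

```python
def pod_far(link_mm, reference_mm, threshold_mm=0.1):
    """Return (POD, FAR) in percent for paired rainfall depths.

    An interval counts as wet when the depth exceeds threshold_mm.
    """
    hits = misses = false_alarms = 0
    for est, ref in zip(link_mm, reference_mm):
        est_wet = est > threshold_mm
        ref_wet = ref > threshold_mm
        if est_wet and ref_wet:
            hits += 1
        elif ref_wet:
            misses += 1
        elif est_wet:
            false_alarms += 1
    # Scores are undefined when their denominator is zero.
    pod = 100.0 * hits / (hits + misses) if hits + misses else float("nan")
    far = (100.0 * false_alarms / (hits + false_alarms)
           if hits + false_alarms else float("nan"))
    return pod, far


# Made-up example values (mm per interval), not data from the study.
link = [0.0, 0.5, 0.0, 1.2, 0.3, 0.0]
reference = [0.0, 0.4, 0.6, 1.0, 0.0, 0.2]

pod, far = pod_far(link, reference)
print(f"POD = {pod:.1f}%, FAR = {far:.1f}%")
```

In the example, two of the four wet reference intervals are detected and one of the three wet link intervals is a false alarm.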
Table A1 provides an overview of results found in similar CML validation studies. The work of Overeem et al. (2016b) was revisited. It is included once with the metrics reported in the original paper, and once where the results were validated in the same manner as the current study, in terms of thresholds, link-based comparison instead of map-based comparison, and months making up the study period.
Cost Function Determining the Optimal RAINLINK Parameters
The parameters QmP, QmPL, and Aa of the RAINLINK software were newly determined for the instantaneous dataset. The rainfall retrieval is polarization dependent; that is, the a and b values depend on both the polarization and frequency of the link. A subset of 10 rainy days spread over the study period was used to calculate hourly rainfall estimates for all combinations of QmP between −2 and −0.2 dB, QmPL between −1.4 and −0.2 dB km−1, and Aa between 0 and 3 dB (steps of 0.2). Validation with gauge-adjusted radar accumulated to hourly values yields the following metrics for each of the 10 days: coefficient of variation (CV), Pearson correlation r, number of intervals where rainfall could be estimated n, relative bias in the mean (bias), probability of detection (POD; %), and false alarm ratio (FAR; %). The best combination of parameters was determined as the set where the cost function was minimal:
Here j indicates the combination of parameters, with i a day d in the subset of 10 rainy days. For the instantaneous CML dataset, the parameter combination that yields the smallest CV, bias, and FAR values and the highest r, n, and POD values for the 10 days in the optimization is QmP of −0.6 dB, QmPL of −0.4 dB km−1, and Aa of 1.4 dB.
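The exhaustive search over parameter combinations can be sketched as follows. This is a minimal illustration and not the RAINLINK implementation. It assumes that the step of 0.2 applies to all three parameters, and it takes the cost as a user-supplied function. The function toy_cost is a placeholder that is deliberately centered on the reported optimum so that the example has a known answer; it is not the cost function used in this study, which combines the validation metrics listed above.

```python
from itertools import product


def grid(start, stop, step=0.2):
    """Evenly spaced values from start to stop inclusive, rounded to 0.1."""
    n = round((stop - start) / step) + 1
    return [round(start + i * step, 1) for i in range(n)]


# Search ranges as described in the text (step of 0.2 assumed for all three).
QMP = grid(-2.0, -0.2)    # dB
QMPL = grid(-1.4, -0.2)   # dB/km
AA = grid(0.0, 3.0)       # dB


def best_combination(cost):
    """Return the (QmP, QmPL, Aa) tuple that minimizes cost."""
    return min(product(QMP, QMPL, AA), key=lambda combo: cost(*combo))


def toy_cost(qmp, qmpl, aa):
    """Placeholder cost, NOT the study's cost function.

    A real cost would run the retrieval for each of the 10 days and combine
    CV, r, n, bias, POD, and FAR from the validation against the reference.
    """
    return (qmp + 0.6) ** 2 + (qmpl + 0.4) ** 2 + (aa - 1.4) ** 2


n_combinations = len(QMP) * len(QMPL) * len(AA)
best = best_combination(toy_cost)
print(n_combinations, best)
```

Under the assumed step size, the grid contains 10 × 7 × 16 = 1120 parameter combinations, each of which requires a full retrieval and validation run for the 10 selected days.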