## Abstract

Calibration error represents a significant source of uncertainty in quantitative applications of ground-based radar (GR) reflectivity data. Correcting it requires knowledge of the true reflectivity at well-defined locations and times during a volume scan. Previous work has demonstrated that observations from certain spaceborne radar (SR) platforms may be suitable for this purpose. Specifically, the Ku-band precipitation radars on board the Tropical Rainfall Measuring Mission (TRMM) satellite and its successor, the Global Precipitation Measurement (GPM) mission *Core Observatory* satellite together provide nearly two decades of well-calibrated reflectivity measurements over low-latitude regions (±35°). However, when comparing SR and GR reflectivities, great care must be taken to account for differences in instrument sensitivity and frequency, and to ensure that the observations are spatially and temporally coincident. Here, a volume-matching method, developed as part of the ground validation network for GPM, is adapted and used to quantify historical calibration errors for three S-band radars in the vicinity of Sydney, Australia. Volume-matched GR–SR sample pairs are identified over a 7-yr period and carefully filtered to isolate reflectivity differences associated with GR calibration error. These are then used in combination with radar engineering work records to derive a piecewise-constant time series of calibration error for each site. The efficacy of this approach is verified through comparisons between GR reflectivities in regions of overlapping coverage, with improved agreement when the estimated errors are removed.

## 1. Introduction

Since their development following the end of the Second World War, ground-based weather radars have become an indispensable tool for studying precipitation systems and associated phenomena on scales ranging from tens of meters to thousands of kilometers. Of particular value is their ability to provide quantitative information about surface rainfall intensity. Such information is used by forecasters to monitor and warn for hazardous extreme-rain events, serves as input data to hydrological models, and allows for areal verification of quantitative precipitation forecasts. However, radar-derived rainfall estimates are subject to significant uncertainties (Villarini and Krajewski 2010). Many of these relate to assumptions that must be made regarding the drop size distribution and its evolution as hydrometeors fall from the level of observation to the surface, but perhaps the most fundamental uncertainty is that associated with errors in radar calibration.

The primary quantity measured by weather radars is the equivalent reflectivity factor *Z* (hereinafter reflectivity^{1}; mm^{6} m^{−3}). This is related to the returned power $P_r$ from a target at range *r* via the radar equation

$$Z = C P_r r^2. \tag{1}$$

Here, *C* is the so-called radar constant, which depends on the radar system’s characteristics (e.g., transmitted power, wavelength, beamwidth, pulse duration, antenna gain). In reality *C* is not constant but varies as a result of degradation, maintenance, and replacement of various radar system components, as well as thermal effects. Taking the common logarithm of (1) and multiplying by 10, we obtain an expression for the reflectivity in decibel units (dB*Z*)

$$\hat{Z} = \hat{C} + \hat{P}_r + 20 \log_{10} r, \tag{2}$$

where $\hat{\chi} = 10 \log_{10} \chi$ for variable *χ*. Hereinafter, we drop the circumflex and simply use *Z* to denote reflectivity, irrespective of the units. It can be seen that any error in the assumed value of $\hat{C}$ will produce an equivalent error in the reflectivity. This is referred to as a calibration error.
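The way an error in the radar constant carries through to the reflectivity can be illustrated with a minimal Python sketch. It assumes the radar equation takes the common form Z = C P_r r², so that the decibel form is a simple sum; the function names and all numerical values are hypothetical, and unit conventions are ignored for simplicity.

```python
import math

def to_db(x):
    """Decibel form of a positive quantity: 10 * log10(x)."""
    return 10.0 * math.log10(x)

def reflectivity_dbz(c_db, power_db, range_km):
    """Log form of the radar equation, assuming Z = C * P_r * r**2."""
    return c_db + power_db + to_db(range_km ** 2)

# Hypothetical numbers, chosen only to illustrate the arithmetic.
true_c_db = 70.0      # radar constant the hardware actually has (dB)
assumed_c_db = 68.0   # value assumed when converting power to reflectivity (dB)
power_db = -105.0     # returned power (dB)
range_km = 100.0

z_true = reflectivity_dbz(true_c_db, power_db, range_km)
z_reported = reflectivity_dbz(assumed_c_db, power_db, range_km)

# The calibration error equals the error in the assumed radar constant,
# independent of the returned power and the range.
calibration_error_db = z_reported - z_true
```

Because the decibel form is additive, the difference between the reported and true reflectivities depends only on the difference between the assumed and true radar constants.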

Maintaining a well-calibrated radar system requires regular testing and maintenance of those components that influence the true value of *C*. Since this can be both time consuming and costly, there is great value in so-called end-to-end calibration tests that characterize the system as a whole. These tests typically involve the measurement of a target (or targets) with well-defined scattering properties, such as a standard reflector or metal sphere (Atlas 2002; Chandrasekar et al. 2015). An alternative approach is to compare reflectivity measurements with those from an independent well-calibrated radar system. The Ku-band precipitation radar (PR) on the Tropical Rainfall Measuring Mission (TRMM; Simpson et al. 1996) satellite, operational from 1997 to 2014, represented one such system. Internal and external calibration checks showed that in the absence of attenuation, PR reflectivity measurements were accurate to within 1 dB (Kawanishi et al. 2000; Takahashi et al. 2003). The Ku-band component of the Dual-Frequency Precipitation Radar (KuPR) on board the Global Precipitation Measurement mission (GPM; Hou et al. 2014) *Core Observatory* satellite, which has now superseded TRMM, is anticipated to be equally accurate.

The task of comparing reflectivities observed by spaceborne and ground-based radars (hereinafter SRs and GRs, respectively) is complicated by the wildly different sampling characteristics of the two instruments. Operational GRs typically perform volume scans at regular intervals of 5–10 min. These scans consist of 360° radial sweeps performed at multiple elevation angles, ranging from near zero to around 20°–30°. Samples of reflectivity are recorded every 0.5°–1° in azimuth and every 250 m–1 km in range, out to maximum ranges of 150–300 km. By comparison, the TRMM PR and GPM KuPR measure quasi-vertical profiles of reflectivity within ~250-km-wide orbital swaths, with horizontal and vertical sampling intervals of 5 km and 125–250 m, respectively. Sun-asynchronous orbits give rise to quasi-periodic observations at all locations within the satellite’s latitudinal range (±35° for TRMM, ±65° for GPM) with typical overpass frequencies of 1–2 day^{−1}. Another important difference between GR and SR measurements relates to the atmospheric volume sampled by each radar pulse. This volume is proportional to the angular beamwidth and increases with the square of range due to beam broadening. As a consequence, GR sample volumes vary by approximately five orders of magnitude within the instrument’s field of view. In contrast, for SRs the extent of measurements in the range (vertical) direction is limited to the first 20 km above the surface and thus the relative variation in sample volume is small.

To quantitatively compare SR and GR reflectivities, measurements must be associated in time and space. The ground speed of the satellites is sufficiently high that measurements across a typical GR field of view can be treated as instantaneous. Temporal association is thus achieved simply by identifying the GR volume scan closest in time to a given SR overpass. Because of the different sampling geometries, spatial association is much more challenging. Many researchers have taken the fairly simple approach of remapping both observation sets to a common three-dimensional Cartesian grid, using nearest-neighbor or linear interpolation (e.g., Anagnostou et al. 2001; Liao et al. 2001; Bolen and Chandrasekar 2003; Liao and Meneghini 2009b; Wang and Wolff 2009; Park et al. 2015). However, such procedures necessarily introduce errors that may swamp systematic differences in reflectivity associated with GR miscalibration. To overcome this issue, Schwaller and Morris (2011, hereinafter SM11) introduced what will herein be referred to as the volume-matching method (VMM). In this approach, intersections between individual SR beams and GR elevation sweeps are identified and the reflectivity values from both instruments are averaged within a spatial neighborhood around the intersection. Specifically, SR data are averaged in range over the width of the GR beam at the GR range of the intersection, while GR data are averaged in the range–azimuth plane within the footprint of the SR beam. The result is a pair of reflectivity measurements corresponding to approximately the same volume of atmosphere.
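The temporal association step can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used here: the function name, the use of a single time stamp per volume scan, and the maximum allowed time difference are all assumptions.

```python
from datetime import datetime, timedelta

def closest_scan(scan_times, overpass_time, max_diff=timedelta(minutes=5)):
    """Return (scan_time, time_difference) for the GR volume scan nearest the
    SR overpass, or None if no scan lies within max_diff (a hypothetical limit)."""
    if not scan_times:
        return None
    best = min(scan_times, key=lambda t: abs(t - overpass_time))
    diff = abs(best - overpass_time)
    return (best, diff) if diff <= max_diff else None

# Hypothetical example: volume scans every 6 min, overpass at 03:14:20.
scans = [datetime(2013, 11, 22, 3, 0) + timedelta(minutes=6 * i) for i in range(5)]
overpass = datetime(2013, 11, 22, 3, 14, 20)
match = closest_scan(scans, overpass)
```

In this example the scan starting at 03:12 is selected, giving a GR–SR time difference of 140 s.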

The VMM was originally developed as part of ground validation efforts in support of the GPM mission. While the potential of the method as a means to track GR calibration was immediately apparent to the developers, its use in this context has thus far been very limited. Kim et al. (2014) applied the VMM to four GRs in the Korean Peninsula for the period 2006–10, finding time-averaged calibration errors of between −2 and +1 dB. However, they were unable to identify shorter-time-scale variations in GR calibration as a result of the noisiness of the GR–SR comparisons. This characteristic was also noted by SM11 and is believed to result from a combination of factors, including imperfect spatial and temporal matching, differences in radar frequency, and errors in SR attenuation correction.

The present study summarizes our efforts using the VMM to quantify and correct historical calibration errors for three GRs in the vicinity of Sydney, Australia. We first explore how systematic variations in GR–SR reflectivity difference can be related to certain characteristics of the volume-matched sample pair. By isolating samples that are least influenced by these artifacts, it is possible to significantly reduce the noise in GR bias estimates. We then demonstrate how VMM results can be used in combination with radar engineering maintenance records to identify variations in GR calibration on inter- and intra-annual time scales. Finally, we present a simple method for comparing GR observations in regions of overlapping coverage as a means to validate the estimated bias corrections.

## 2. Methodology

### a. Data

The Australian Bureau of Meteorology (BoM) operates a diverse network of over 60 single-polarization GRs comprising a mixture of C- and S-band systems of varying age and make. This study uses data from three S-band radars located close to the cities of Sydney (SYD), Wollongong (WOL), and Newcastle (NEW) in the state of New South Wales (Fig. 1). Together, these sites provide coverage of a densely populated stretch of coastline that is frequently affected by high-impact weather, including damaging hail and extreme precipitation. The characteristics of the radars are listed in Table 1. For each GR, volume scan data for the period 15 May 2009 (the start of operational monitoring at SYD) to 31 December 2015 were extracted from BoM archives and converted from the in-house Radar Picture (RAPIC) format to OPERA (Operational Programme for the Exchange of Weather Radar Information; Köck et al. 2000) Data Information Model–Hierarchical Data Format, version 5 (ODIM-HDF5; Michelson et al. 2014), for processing. Note that all data are subject to on-site processing to mitigate ground clutter and noise (Rennie 2012). No additional quality control was applied for the present analysis.

BoM engineering staff perform regular (approximately once every 6 months) maintenance work at all GR sites. Relevant to the radar calibration are checks on the transmitted peak power, frequency, and pulse duration, and the receiver gain. Where necessary, these settings are adjusted and the radar constant is updated accordingly. In addition to these routine activities, unscheduled maintenance is sometimes required to deal with system failures or suspected faults. Records of all site visits are maintained on an internal database called SitesDB. While the information contained in these records is minimal, with only a date and a brief description of what was done (e.g., “02/12/2009: 6 monthly maintenance carried out”), it is sufficient to identify dates of *possible* calibration changes. In theory, calibration accuracy should improve following all maintenance work; however, as we shall demonstrate, this is often not the case.

The characteristics of the TRMM PR and GPM KuPR are listed in Table 2. TRMM operated almost continuously from December 1997 to April 2015, with the PR providing reliable measurements up to September 2014. This study uses data from version 7 of the Level 2 products 2A23 and 2A25 (Table 3). These consist of orbital swaths made up of a large number of individual PR scans that in turn comprise 49 individual rays. Each scan has a unique time stamp, and rays are georeferenced by the latitude–longitude coordinates of their intersection with the Earth ellipsoid. The 2A23 product contains information on precipitation type and the characteristics of the radar bright band^{2} (where present) for each ray, while 2A25 contains the vertical profiles of attenuation-corrected reflectivity. Precipitation type is determined based on the horizontal and vertical echo structure (Awaka et al. 2007), with three basic classifications: stratiform, convective, and other. The bright band is identified as outlined in Awaka et al. (2009). A hybrid method (Meneghini et al. 2004), combining the approaches of Hitschfeld and Bordan (1954) and Meneghini et al. (2000), is used to correct for attenuation of the SR beam, which can be significant in heavy rainfall. For the GPM KuPR, data are available from March 2014 onward. Version 4 of the 2AKu product is used, which contains the same basic variables as the 2A23 and 2A25 TRMM products (Table 3). All SR data were obtained using the STORM online data-access interface to NASA’s precipitation processing system archive (https://storm.pps.eosdis.nasa.gov). To reduce data volumes, only those sections of orbital swaths corresponding to GR site overpasses were extracted.

It is noted that at the time of writing, new product versions (version 8 for TRMM and version 5 for GPM) are in the process of being released. These include changes to the SR calibrations, corresponding to reflectivity increases of 1.1 and 1.3 dB for the TRMM PR and GPM KuPR, respectively (NASA 2017; Iguchi et al. 2017). It remains to be seen whether these will be the final adjustments, but for now it must be assumed that the GR calibration errors derived herein are biased low by a little over 1 dB. This serves to illustrate the main limitation of using radar intercomparisons to assess calibration: even the most carefully monitored systems can be in error.

### b. Volume-matching method

The VMM allows for a quantitative comparison of SR and GR reflectivities with minimal spatial processing of the two datasets. Intersections between an SR beam and a GR elevation sweep are identified, and the reflectivities from both instruments are averaged to roughly equate the sample volumes. SR reflectivities are averaged along the SR beam (approximately vertically) between the half-power points of the GR sweep. GR reflectivities are averaged in the range–azimuth plane (approximately horizontally) within the footprint of the SR beam. Figure 2 illustrates these averaging procedures for idealized cases at GR ranges of 50 and 100 km. Full details of the procedure are provided in the appendix. Here we note only the key differences between our implementation of the method and the original algorithm as described by SM11 and Morris and Schwaller (2009).

#### 1) Minimum and maximum ranges

As previously discussed, the volume of atmosphere sampled by a GR varies significantly across the instrument’s field of view as a result of beam broadening. This means that samples considered in the VMM also increase in volume with GR range. Given the limited vertical extent of many precipitating systems, it is appropriate to define a maximum range $r_{\max}$ for volume matching to proceed. SM11 specified $r_{\max} = 100$ km, while we use a slightly higher value of 115 km. For the WOL radar, which has an angular beamwidth $\theta \approx 2^\circ$, this corresponds to a maximum beam diameter of 4 km. Since all the GRs considered by SM11 had $\theta \approx 1^\circ$, their maximum beam diameters were <2 km. However, as we shall show, the GR–SR reflectivity difference displays relatively little sensitivity to range (and thus beam diameter). Unlike SM11, we additionally specify a minimum range $r_{\min}$ in order to exclude samples where the GR beam diameter is smaller than the SR gate spacing. Specifically, $r_{\min} = 15$ km, which for $\theta = 1^\circ$ corresponds to a beam diameter of just over 250 m, the gate spacing of the TRMM PR.
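The beam diameters quoted above follow from the small-angle relation between range and beamwidth, as the following sketch shows. The function name is hypothetical, and the beamwidths (2° and 1°) and the 100- and 15-km ranges are illustrative inputs chosen to be consistent with the diameters stated in the text.

```python
import math

def beam_diameter_m(range_km, beamwidth_deg):
    """Approximate beam diameter (m) at a given range: d = r * theta,
    with theta in radians (small-angle approximation)."""
    return range_km * 1000.0 * math.radians(beamwidth_deg)

d_wide_beam_max = beam_diameter_m(115.0, 2.0)    # about 4 km
d_narrow_beam_max = beam_diameter_m(100.0, 1.0)  # a little under 2 km
d_narrow_beam_min = beam_diameter_m(15.0, 1.0)   # just over 250 m
```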

#### 2) Frequency correction

The different frequencies used by the SR and GR systems promote systematic differences between the reflectivity measured by the two instruments that vary in both sign and magnitude depending on the scattering characteristics of particles within the sample volume. Scattering simulations can be used to quantify these differences and to derive empirical relationships for converting reflectivity measurements from one frequency to another. SM11 used the equations from Liao and Meneghini (2009a) to convert their GR reflectivities from S to Ku band, applying the equations for snow and rain above and below the bright band, respectively. Since we are interested in quantifying GR errors, it is desirable to instead convert the SR reflectivities from Ku to S band. We therefore use equations from Cao et al. (2013) that have the following form:

$$Z(\mathrm{S}) = Z(\mathrm{Ku}) + \sum_{i=0}^{4} a_i \left[ Z(\mathrm{Ku}) \right]^i. \tag{3}$$

The coefficients $a_i$ (given in Table 1 of Cao et al. 2013) are specified for rain, dry snow, and dry hail, and for snow and hail at varying stages of melting (from 10% to 90% in 10% increments). The melting layer (ML) is defined as extending from $z_{\mathrm{BB}} - \Delta z_{\mathrm{BB}}/2$ to $z_{\mathrm{BB}} + \Delta z_{\mathrm{BB}}/2$, where $z_{\mathrm{BB}}$ and $\Delta z_{\mathrm{BB}}$ are the SR-derived brightband height and width, respectively. To deal with the fact that a bright band is present only in stratiform precipitation, both quantities are computed as the median value across all stratiform SR rays that intercept the Earth ellipsoid between the minimum and maximum GR ranges, $r_{\min}$ and $r_{\max}$. Overpasses where there are fewer than 10 such rays are excluded from further analysis.
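A minimal sketch of the frequency correction and melting-layer logic is given below. It assumes a polynomial correction in the Ku-band reflectivity and an ML centred on the bright band and spanning its width. The function names are hypothetical, and the coefficients are placeholders, not the values tabulated by Cao et al. (2013).

```python
from statistics import median

def ku_to_s(z_ku, coeffs):
    """Polynomial correction: Z(S) = Z(Ku) + sum_i a_i * Z(Ku)**i (dBZ)."""
    return z_ku + sum(a * z_ku ** i for i, a in enumerate(coeffs))

def melting_layer(bb_heights_km, bb_widths_km, min_rays=10):
    """Median bright-band height and width -> (ML bottom, ML top) in km.
    Returns None when fewer than min_rays stratiform rays are available."""
    if len(bb_heights_km) < min_rays or len(bb_widths_km) < min_rays:
        return None
    z_bb = median(bb_heights_km)
    dz_bb = median(bb_widths_km)
    # Assumption: the ML is centred on the bright band and spans its width.
    return (z_bb - dz_bb / 2.0, z_bb + dz_bb / 2.0)

def layer_of(sample_bottom_km, sample_top_km, ml):
    """Classify a sample as 'below', 'within', or 'above' the ML.
    Any overlap with the ML counts as 'within'."""
    ml_bottom, ml_top = ml
    if sample_top_km < ml_bottom:
        return "below"
    if sample_bottom_km > ml_top:
        return "above"
    return "within"

# Placeholder coefficients a_0..a_4, for illustration only.
PLACEHOLDER_COEFFS = [0.5, -0.01, 0.0, 0.0, 0.0]
```

In practice a different coefficient set would be selected for each sample according to its position relative to the ML (rain below, melting snow within, dry snow above).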

#### 3) Reflectivity thresholds

Both the TRMM PR and GPM KuPR have nominal sensitivities of around 18 dB*Z* (Hou et al. 2014), although prelaunch tests showed that the KuPR may detect reflectivities as low as 14.5 dB*Z* (Toyoshima et al. 2015). In the VMM, only SR bins for which $Z_{\mathrm{SR}} \geq 18$ dB*Z* are included in the calculation of the average SR reflectivity. For each volume-matched sample, the fraction of SR bins within the volume that meet this criterion, $f_{\mathrm{SR}}$, is recorded. A similar approach is taken with the GR using a different reflectivity threshold, with the fraction of GR bins at or above that threshold denoted as $f_{\mathrm{GR}}$. When analyzing the GR reflectivity bias, effects associated with nonuniform beam filling and the low PR sensitivity can be mitigated by excluding samples with $f_{\mathrm{SR}}$ or $f_{\mathrm{GR}}$ less than some threshold $f_{\min}$. Based on the analysis presented below, we set $f_{\min} = 0.7$, while SM11 used a more stringent criterion. As discussed by Morris and Schwaller (2011) and illustrated below, GR–SR reflectivity differences derived using the VMM can vary substantially depending on the value of this threshold.

Another key difference is in our choice of the GR reflectivity threshold. SM11 set this threshold to 15 dB*Z* to match the SR sensitivity with an allowance for a −3-dB GR calibration error. While it is necessary to match the sensitivity of the two instruments when using one to quantify bias in the other, we argue that this should be done at a later stage in the analysis, namely, when comparing the spatially averaged reflectivities from the volume-matched samples. As detailed in section 3c, this allows for the implementation of an iterative bias correction procedure where GR samples are filtered according to their bias-corrected reflectivity at the *n*th iteration (Protat et al. 2011). In volume matching we therefore employ a much lower GR reflectivity threshold.
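The bookkeeping behind the above-threshold fractions can be sketched as follows. The function names are hypothetical, missing bins are represented here by `None`, and the GR threshold value is a placeholder used only for illustration.

```python
def fraction_above(bins_dbz, threshold_dbz):
    """Return (fraction of bins at or above threshold, list of those bins).
    Missing bins (None) count as below threshold."""
    if not bins_dbz:
        return 0.0, []
    valid = [z for z in bins_dbz if z is not None and z >= threshold_dbz]
    return len(valid) / len(bins_dbz), valid

def keep_sample(f_sr, f_gr, f_min=0.7):
    """Retain a sample only if both fractions reach f_min."""
    return min(f_sr, f_gr) >= f_min

SR_THRESHOLD_DBZ = 18.0  # nominal SR sensitivity quoted in the text
GR_THRESHOLD_DBZ = 0.0   # placeholder value, not taken from the text

# Hypothetical bins belonging to one volume-matched sample.
sr_bins = [21.0, 19.5, None, 25.0, 17.0]
gr_bins = [20.0, 22.5, 3.0, 24.0]
f_sr, sr_valid = fraction_above(sr_bins, SR_THRESHOLD_DBZ)
f_gr, gr_valid = fraction_above(gr_bins, GR_THRESHOLD_DBZ)
retained = keep_sample(f_sr, f_gr)
```

Here three of the five SR bins exceed the threshold, so the sample falls below a limit of 0.7 and would be excluded even though every GR bin qualifies.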

#### 4) Reflectivity averaging

In the original VMM implementation, reflectivities for GR bins within the SR footprint are averaged using a Barnes Gaussian inverse-distance weighting, where distance is measured horizontally from the center of the SR footprint to the center of the GR bin (Morris and Schwaller 2009). This weighting is designed to account for the nonuniform distribution of power within the SR beam. The algorithm has since been updated to also include linear weighting based on the volume of the GR bins so that larger volumes are weighted more heavily (K. Morris 2015, personal communication). This is justified by the fact that GR bin volumes can vary by up to a factor of 2 within the PR footprint. Our VMM implementation uses this modified weighting scheme. As in the original algorithm, no weighting is applied in averaging the SR reflectivities as a result of uncertainties in the GR beam height associated with nonstandard refraction.
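The combined weighting can be sketched as a Gaussian distance weight multiplied by the bin volume. This is an illustration under stated assumptions rather than the operational algorithm: the Gaussian length scale is a hypothetical parameter, and the averaging is assumed to be carried out on linear reflectivity (mm^{6} m^{−3}) before converting back to dB*Z*.

```python
import math

def weighted_mean_dbz(z_dbz, distances_km, volumes, length_scale_km):
    """Average GR bins within an SR footprint.

    Each bin is weighted by a Gaussian function of its horizontal distance
    from the footprint centre, multiplied linearly by the bin volume.
    """
    if not z_dbz:
        raise ValueError("at least one bin is required")
    weights = [v * math.exp(-(d / length_scale_km) ** 2)
               for d, v in zip(distances_km, volumes)]
    linear = [10.0 ** (z / 10.0) for z in z_dbz]   # dBZ -> mm^6 m^-3
    mean_linear = sum(w * z for w, z in zip(weights, linear)) / sum(weights)
    return 10.0 * math.log10(mean_linear)          # back to dBZ

# Hypothetical sample: three bins at increasing distance from the centre.
z_avg = weighted_mean_dbz([32.0, 30.0, 27.0], [0.5, 1.5, 2.4],
                          [1.0, 1.3, 1.8], length_scale_km=2.5)
```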

## 3. Results

### a. Comparison examples

Figures 3 and 4 show examples of GR–SR comparisons for the SYD radar. The former shows a comparison with TRMM on 22 November 2013, while the latter shows a comparison with GPM a little over a year later on 27 January 2015. The top row in each figure shows plan views at a particular elevation angle of the (frequency corrected) SR and GR reflectivities and their difference. The middle row shows vertical cross sections along a particular SR scan of the same fields. The bottom row presents a statistical comparison of the reflectivities from the two instruments across all volume-matched samples. Note that samples in which fewer than 70% of the SR or GR bins exceed the respective reflectivity thresholds ($f_{\mathrm{SR}} < 0.7$ or $f_{\mathrm{GR}} < 0.7$) have been excluded from this analysis.

From the plan views of reflectivity it appears that the VMM produces good spatial agreement between the reflectivity measurements from the two instruments. This is confirmed by high values of the Pearson correlation coefficient (0.95 and 0.87 for the first and second comparisons, respectively). This agreement allows us to estimate the GR calibration error. In the first case, the error is close to zero (Fig. 3i); however, in the second case the GR shows a substantial negative bias of around 4 dB (Fig. 4i). It is thus apparent that the calibration of the SYD radar changed sometime between late 2013 and early 2015.

It is noteworthy that, on a point-by-point basis, the GR–SR reflectivity difference displays a large degree of scatter. For example, for the first case, the difference varies by more than 10 dB (from <−5 to >5 dB) across the 1.3° elevation sweep (Fig. 3c). Part of this variation will be associated with imperfect spatial matching of the data as a result of a combination of advection and evolution of the precipitation features during the time between measurements (200 s in this case) and beam propagation effects (e.g., nonstandard refraction of the GR beam). However, as we shall demonstrate in the next section, other factors, including the Ku-to-S-band frequency correction and the reflectivity value itself, also strongly influence GR–SR reflectivity differences.

### b. Comparison sensitivities

In this section, we investigate the sensitivity of the GR–SR reflectivity difference to various characteristics of the volume-matched samples. To eliminate effects associated with the time-varying GR calibration errors, we have applied the corrections derived in the next section to all GR data. We begin by examining the relationship between the GR–SR reflectivity difference $\Delta Z$ and $f_{\min}$, which is the minimum fraction of SR and GR bins within the sample volume with reflectivities above the respective SR and GR thresholds. This is illustrated in Fig. 5 for each of the GRs. The data are binned using $f_{\min}$ values from 1 to 0 in increments of 0.1, with the median and interquartile range (IQR) of the $\Delta Z$ distribution in each bin plotted together with the number of volume-matched sample pairs.

For the most restrictive case of $f_{\min} = 1$, all SR and GR bins comprising a sample must satisfy the reflectivity criteria. This ensures good volume matching but severely limits the number of valid samples. In contrast, for $f_{\min} = 0$ only a single bin for each radar needs to exceed the respective reflectivity thresholds. This gives many more valid samples, but it can lead to very poor volume matching. As $f_{\min}$ is decreased, we thus observe an increase in both the number of samples and the variability in $\Delta Z$ (Fig. 5). The change in sample size is more pronounced for the NEW and WOL radars as a result of their larger beam widths; at a given GR range, more SR bins are included in each sample, so the probability that $f_{\mathrm{SR}} < 1$ is higher. For all three GRs, there is a pronounced decrease in the median $\Delta Z$ with decreasing $f_{\min}$, with the total change being around 1–1.5 dB. This trend, also noted by Morris and Schwaller (2011, their Figs. 2–5), results from the low sensitivity of the SRs. As $f_{\min}$ is reduced, an increasing number of samples comprise bins with $Z < 18$ dB*Z*, which the GR can observe but the SR cannot. Thus, the average volume-matched GR reflectivity decreases, while the corresponding SR reflectivity remains approximately constant.
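A sensitivity analysis of this kind can be sketched as follows. The sketch assumes that each value of the threshold selects all samples whose smaller above-threshold fraction is at least that value, which is one reading of the binning consistent with the sample count growing as the threshold is lowered. The function name and the sample values are hypothetical.

```python
from statistics import median, quantiles

def sensitivity_to_fmin(samples):
    """samples: list of (f_sr, f_gr, delta_z) tuples.
    Returns rows of (f_min, n_samples, median, IQR) for f_min = 1.0 ... 0.0."""
    rows = []
    for step in range(10, -1, -1):
        f_min = step / 10.0
        # Samples exist only if at least one bin per radar is valid, so a
        # threshold of 0 simply retains every sample.
        dz = [d for f_sr, f_gr, d in samples if min(f_sr, f_gr) >= f_min]
        if len(dz) >= 2:
            q1, _, q3 = quantiles(dz, n=4)
            rows.append((f_min, len(dz), median(dz), q3 - q1))
        else:
            rows.append((f_min, len(dz), None, None))
    return rows

# Hypothetical samples: (f_SR, f_GR, GR minus SR reflectivity in dB).
example = [(1.0, 1.0, 0.2), (1.0, 0.9, 0.1), (0.8, 1.0, -0.3),
           (0.5, 0.7, -1.0), (0.2, 0.4, -1.5), (0.05, 0.3, -2.0)]
table = sensitivity_to_fmin(example)
```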

Clearly, it is important to exclude samples with low values of $f_{\mathrm{SR}}$ or $f_{\mathrm{GR}}$. Ideally, we would set $f_{\min} = 1$; however, testing showed that the associated reduction in sample size severely limits our ability to derive a complete time series of GR calibration error (not shown). As a compromise we therefore set $f_{\min} = 0.7$. In doing so, Fig. 5 suggests that we will introduce a slight negative bias in our calibration error estimates. However, it turns out that this bias is largely mitigated by the reflectivity thresholding described below (not shown).

We now examine how the reflectivity difference $\Delta Z$ varies with precipitation type and height together with the impact of the Ku-to-S-band frequency correction that is applied to SR reflectivities. This information is summarized using box-and-whisker diagrams in Fig. 6. Here, samples for each GR are divided according to the SR precipitation-type classification (stratiform or convective) and based on their height with respect to the ML (below, within, or above). For both precipitation types, the frequency corrections for dry and melting snow have been used above and within the ML, respectively. The relationships for hail were initially used in convective precipitation but were found to worsen the agreement between $\Delta Z$ above and below the ML (not shown). Samples with precipitation type “other” account for a very small proportion (<1%) of the total and are therefore excluded from this analysis.

The frequency correction results in an increase in $\Delta Z$ (via a decrease in $Z_{\mathrm{SR}}$) below the ML and a decrease in $\Delta Z$ (via an increase in $Z_{\mathrm{SR}}$) within and above the ML. Changes are more pronounced in convective than stratiform precipitation because the former is characterized by higher reflectivities. For all three GRs, we observe good agreement between the frequency-corrected $\Delta Z$ distributions above and below the ML in stratiform precipitation. However, within the ML the distributions are shifted upward, suggesting that the frequency correction for melting snow is underestimated. This layer also shows higher variability in $\Delta Z$ because it includes all samples whose volume overlaps the bright band. For convective precipitation, the frequency correction clearly increases the discrepancy between the different vertical layers, promoting a systematic decrease in $\Delta Z$ with height. We speculate that this is associated with an undercorrection of SR beam attenuation in convective precipitation (leading to an underestimation of $Z_{\mathrm{SR}}$ and thus an overestimation of $\Delta Z$); however, errors in the frequency correction may also contribute. In addition to a disagreement between the layers, we note that the convective samples feature a larger spread in $\Delta Z$, consistent with higher spatial variability in the precipitation field and associated poorer volume matching.

Based on these results we exclude convective precipitation samples and stratiform samples within the ML from all subsequent analysis. This reduces the SYD radar sample size by approximately 62% and the WOL and NEW radar sample sizes by approximately 77% (the larger beam widths of these radars mean that more samples overlap with the ML). It should be noted that in order to mitigate potential biases associated with the SR attenuation correction, only stratiform samples *above* the ML are used in ground validation of the GPM DPR (W. Petersen 2017, personal communication). However, when combined with the reflectivity criteria introduced below, the exclusion of samples below the ML was found to excessively limit the total number of samples. Testing reveals a slight (typically <0.5 dB) but systematic increase in calibration error estimates when only samples above the ML are used (not shown). This may be indicative of excessive attenuation correction in stratiform precipitation (cf. Wang and Wolff 2009) and/or undercorrection for the frequency difference in snow.

Figure 7 summarizes the influence of two further sample characteristics, GR range $r$ and GR–SR time difference $\Delta t$, on the reflectivity difference $\Delta Z$. The data are plotted as bivariate histograms with the median and IQR of $\Delta Z$ overlaid for each $r$ and $\Delta t$ bin. As one might expect, there is little dependence for either variable. At ranges beyond ~60 km, $\Delta Z$ shows a weak decreasing trend with increasing $r$ for the WOL and NEW radars that is not present for the SYD radar. This is likely due to the beam widths of the WOL and NEW radars being around twice the angular beam spacing, whereas for SYD the two are approximately equal. Where the beamwidth exceeds the beam spacing, the GR reflectivity of the volume-matched sample tends to represent a larger area (in the range–azimuth plane) than observed by the SR, giving rise to a slight negative bias in $\Delta Z$, particularly at long ranges where the absolute difference in area is large. It should be noted that Morris and Schwaller (2011) found the same trend (increasing reflectivity in their case) despite the fact that the radar they considered (the WSR-88D in Melbourne, Florida) had a 1° beam. This probably reflects their use of a higher GR reflectivity threshold, which will have reduced the number of samples with low $Z_{\mathrm{GR}}$ (and thus low $\Delta Z$; see below) at short range, where many GR bins are averaged.

Turning to the lower row of Fig. 7, it is clear that there is no systematic variation in $\Delta Z$ with $\Delta t$; however, larger time differences are associated with higher variability, as seen from the IQRs. This again is consistent with the findings of Morris and Schwaller (2011) and makes intuitive sense: a larger $\Delta t$ implies greater spatial mismatch between the SR and GR volumes, leading to larger random errors in $\Delta Z$. These errors could be reduced by applying an advection correction to each GR sweep; however, we do not attempt this here.

The final sensitivity we consider is to the reflectivity itself. Of course, there are two measures of this quantity and it is important to consider both. The top row of Fig. 8 shows how $\Delta Z$ varies with SR reflectivity $Z_{\mathrm{SR}}$ for the three GRs, using the same format as in Fig. 7. For SYD and WOL, $\Delta Z$ shows a slight increasing trend at low $Z_{\mathrm{SR}}$, while for all three radars there is a similarly weak decreasing trend at high $Z_{\mathrm{SR}}$. The origin of the first of these trends is unclear; however, the second trend may be associated with the Ku-to-S-band frequency correction. Without this correction, the trend is much more pronounced (not shown), suggesting that with larger corrections it would disappear altogether. It is quite possible that the Cao et al. (2013) method underestimates the frequency correction at high reflectivities; however, given the other sources of uncertainty, it is difficult to be sure. In any case, the associated variation in $\Delta Z$ is small (<1 dB in the median).

The variations in $\Delta Z$ with GR reflectivity $Z_{\mathrm{GR}}$ are much more substantial (bottom row of Fig. 8). For all three radars, there are three distinct portions of the parameter space. At low $Z_{\mathrm{GR}}$, $\Delta Z$ is negative and shows a strong positive trend. This is a direct consequence of the low sensitivity of the SRs. For $Z_{\mathrm{GR}} < 18$ dB*Z*, the GR reflectivity is constrained to be lower than the SR reflectivity and thus $\Delta Z$ is constrained to be negative; similarly, if $Z_{\mathrm{GR}}$ only slightly exceeds 18 dB*Z*, then $\Delta Z$ can only be slightly positive. Effectively, the top-left portion of the histogram has been cut off. The trend disappears only once the reflectivity is large enough that the distribution of $\Delta Z$ becomes roughly symmetric, which occurs around $Z_{\mathrm{GR}} = 24$ dB*Z*. Beyond this point, $\Delta Z$ remains almost constant up to around $Z_{\mathrm{GR}} = 36$ dB*Z*, when it begins to rapidly increase again. We believe the latter trend to be associated with attenuation of the SR beam in regions of intense stratiform precipitation. This would be consistent with Liao and Meneghini (2009b) and SM11, both of whom noted undercorrection of attenuation in version 6 of the TRMM 2A25 product, as well as several studies (Wolff and Fisher 2008; Amitai et al. 2009; Chen et al. 2013; Kirstetter et al. 2013; Rasmussen et al. 2013) which identified negative biases in PR rainfall estimates at high rain rates.

Summarizing the results of this section, we have identified several factors that strongly influence the GR–SR reflectivity difference estimates obtained using the VMM, namely, the percentage of above-threshold reflectivity values within a sample, the height of the sample with respect to the ML, the application of a Ku-to-S-band frequency correction, the precipitation type, and the reflectivity itself. Based on these findings we extract the subset of volume-matched samples for each radar that are expected to most accurately isolate reflectivity differences associated with GR calibration errors. Specifically, samples are included only if they meet the following criteria:

1. comprise at least 70% SR and GR bins with reflectivities above the respective thresholds;
2. are located entirely above or below the ML in stratiform precipitation;
3. have volume-averaged SR and GR reflectivity values between 24 and 36 dB*Z*.

Table 4 shows how the sample size, mean Δ*Z*, and standard deviation vary with the application of these criteria. Consistent with the discussion given above, criteria 1 and 3 produce a pronounced positive shift in mean Δ*Z*, while criterion 2 produces a smaller negative shift. All three criteria act to reduce variability, with criterion 3 having by far the biggest impact. This is almost entirely due to the lower reflectivity threshold; the impact of the higher threshold is much smaller because there are far fewer samples with high reflectivities. Applying all three criteria together results in a 1.6–2.2 dB increase in mean Δ*Z* and a 2–2.4 dB decrease in the standard deviation (even though the sample size decreases by more than 90% for each radar).
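The three criteria can be expressed as a simple predicate on each volume-matched sample. The following is a minimal sketch rather than the authors' implementation: the `Sample` class and its field names are hypothetical, the 70% requirement is assumed to apply to the SR and GR fractions separately, and the reflectivity bounds are assumed to be inclusive.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Hypothetical container for one volume-matched GR-SR sample."""
    frac_sr: float     # fraction of SR bins above the SR threshold (0-1)
    frac_gr: float     # fraction of GR bins above the GR threshold (0-1)
    z_sr: float        # volume-averaged SR reflectivity (dBZ)
    z_gr: float        # volume-averaged GR reflectivity (dBZ)
    stratiform: bool   # True if the sample is classified as stratiform
    outside_ml: bool   # True if entirely above or below the melting layer


def is_valid(s, min_frac=0.7, z_min=24.0, z_max=36.0):
    """Return True if the sample meets criteria 1-3 described in the text."""
    # Criterion 1: enough above-threshold bins in both instruments
    # (assumed here to mean each fraction individually).
    c1 = s.frac_sr >= min_frac and s.frac_gr >= min_frac
    # Criterion 2: stratiform precipitation, not intersecting the ML.
    c2 = s.stratiform and s.outside_ml
    # Criterion 3: moderate reflectivities for both SR and GR
    # (inclusive bounds are an assumption).
    c3 = z_min <= s.z_sr <= z_max and z_min <= s.z_gr <= z_max
    return c1 and c2 and c3
```

Keeping the thresholds as keyword arguments makes it straightforward to reproduce the kind of sensitivity comparison summarized in Table 4 by switching criteria on and off.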

### c. Correcting calibration errors

Figure 9 shows the complete 7-yr time series of GR–SR comparisons for the SYD radar. Plotted are the mean reflectivity difference (symbols, colored according to the number of samples) and its standard deviation (vertical lines) for each SR overpass. It is apparent that even with the filtering criteria detailed above there is considerable variability in Δ*Z* values (cf. Table 4), particularly for those comparisons with fewer than 100 samples (white and light gray symbols). This is most likely associated with residual volume-matching errors in the presence of rapidly moving/evolving precipitation features and/or nonstandard GR beam refraction. Nevertheless, it is possible to identify the basic temporal evolution of GR calibration.

From the start of operations in May 2009 until the middle of 2014, the calibration appears to be quite accurate and stable, with mean errors generally less than 2 dB. A possible exception is September/October 2012, when several comparisons suggest a negative offset of around 4–5 dB, although the sample sizes for these are small. The period August 2014–May 2015 shows more significant GR errors, with positive offsets of 3–4 dB during the first 3 months and negative offsets of 3–5 dB thereafter. There are no comparisons during June and July 2015 and only one each in August and September; however, toward the end of the year errors return to near zero.

While the VMM does not provide sufficiently precise estimates of GR reflectivity error to identify gradual changes in calibration associated with the degradation of radar hardware, it can pick out sudden jumps that may result from component failures or engineering activities. The problem is that suitable SR site overpasses are rarely frequent enough to determine the exact date of these changes. Fortunately, as discussed in section 2a, the BoM maintains records of all operational GR maintenance work. From these records, the dates of possible calibration changes were identified and used to group the GR–SR comparisons into periods ranging in length from a few weeks to around 18 months. The calibration error ϵ during each period is assumed to be constant and is calculated using the following iterative procedure:

1. Valid samples (i.e., those meeting criteria 1–3, above) comprising all GR–SR comparisons during the period are grouped, and the mean Δ*Z* is computed as an initial estimate of ϵ.
2. The set of valid samples is recomputed, incorporating the estimated calibration error (i.e., with ϵ subtracted from the GR reflectivities), and a new value of ϵ is computed as the mean of the uncorrected Δ*Z* values.
3. Step 2 is repeated until a stable estimate of ϵ is obtained (to the nearest 0.1 dB). Typically, this takes fewer than five iterations.

As discussed by Protat et al. (2011), an iterative calculation is required when thresholding the reflectivity to account for the fact that, given a nonzero calibration error, samples will be incorrectly included/excluded from the calculation of ϵ. For example, consider a situation where the true ϵ is −3 dB. In the initial estimation (step 1, above), samples with uncorrected reflectivities of 21–24 dB*Z* (true reflectivities of 24–27 dB*Z*) will be incorrectly excluded, while those with uncorrected reflectivities of 33–36 dB*Z* (true reflectivities of 36–39 dB*Z*) will be incorrectly included. Similarly, if the true ϵ is +3 dB, then samples with uncorrected reflectivities of 24–27 dB*Z* (true reflectivities of 21–24 dB*Z*) will be incorrectly included, while those with uncorrected reflectivities of 36–39 dB*Z* (true reflectivities of 33–36 dB*Z*) will be incorrectly excluded. In either case, the magnitude of ϵ will be underestimated. By subsetting samples according to the corrected GR reflectivities and recomputing ϵ iteratively, this bias can be eliminated. Figure 10 illustrates the procedure for two consecutive periods (one with positive ϵ, one with negative ϵ) from the SYD radar time series. In both cases, iteration increases the magnitude of the calibration error estimate by 0.6 dB.
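The iterative procedure can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: the function name, the `(z_gr, z_sr)` pair representation, and the `max_iter` safeguard are hypothetical, and the samples passed in are assumed to have already met criteria 1 and 2 so that only the reflectivity thresholding (criterion 3) is re-evaluated on each pass.

```python
def estimate_epsilon(pairs, z_min=24.0, z_max=36.0, max_iter=20):
    """Iteratively estimate the GR calibration error (dB) for one period.

    pairs: list of (z_gr, z_sr) volume-averaged reflectivities in dBZ,
           assumed to have already passed criteria 1 and 2.
    """
    eps = 0.0  # the first pass uses uncorrected GR reflectivities (step 1)
    for _ in range(max_iter):
        # Apply criterion 3 using the GR reflectivity corrected by the
        # current estimate (eps subtracted), as in step 2.
        valid = [(g, s) for g, s in pairs
                 if z_min <= g - eps <= z_max and z_min <= s <= z_max]
        if not valid:
            raise ValueError("no valid samples for this period")
        # The new estimate is the mean of the *uncorrected* differences.
        new_eps = sum(g - s for g, s in valid) / len(valid)
        # Stop once the estimate is stable to the nearest 0.1 dB (step 3).
        if round(new_eps, 1) == round(eps, 1):
            return round(new_eps, 1)
        eps = new_eps
    return round(eps, 1)
```

With noise-free synthetic data the first pass already returns the true offset; the underestimation described above appears only when the differences are scattered, because the thresholds then truncate the distribution asymmetrically.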

Not every single maintenance event will be associated with a change in radar calibration. For example, checks may show the transmitter and receiver settings to be stable with respect to the previous site visit. We therefore test whether the calibration error during each period is statistically distinct from the one that preceded it. Specifically, a difference of means test is performed using the error-adjusted samples from each period. If the difference is significant at the 5% level^{3} and ≥0.5 dB, then both periods are retained; otherwise, the two are combined and the GR bias estimate is recomputed. Periods are also combined if one contains fewer than two comparisons comprising at least 50 samples each; we consider this the minimum requirement for a robust error estimate. The choice of 0.5 dB as a minimum difference is somewhat arbitrary but reflects the remaining uncertainty in the GR–SR comparisons (i.e., we do not expect the method to reliably detect changes in calibration of less than 0.5 dB).
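The decision of whether to retain or merge two consecutive periods can be sketched as follows. The text does not say which difference-of-means test is used, so this sketch assumes a two-sided, unequal-variance test with a normal approximation to the test statistic (reasonable only for large samples); the function names and arguments are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def has_enough_data(comparison_sizes, min_comparisons=2, min_samples=50):
    """True if a period has at least two comparisons of >= 50 samples each."""
    return sum(1 for n in comparison_sizes if n >= min_samples) >= min_comparisons


def periods_differ(dz_a, dz_b, alpha=0.05, min_diff=0.5):
    """Test whether two periods have distinct calibration errors.

    dz_a, dz_b: error-adjusted GR-SR differences (dB) for each period.
    Uses a normal approximation to the difference-of-means statistic,
    which is an assumption of this sketch.
    """
    diff = mean(dz_a) - mean(dz_b)
    se = sqrt(variance(dz_a) / len(dz_a) + variance(dz_b) / len(dz_b))
    if se == 0.0:
        # Degenerate case: no spread, so only the size criterion applies.
        return abs(diff) >= min_diff
    z = abs(diff) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(z))
    # Both conditions must hold: significance and a difference >= 0.5 dB.
    return p_value < alpha and abs(diff) >= min_diff
```

If `periods_differ` returns `False`, or either period fails `has_enough_data`, the two periods would be combined and ϵ recomputed over the merged set of samples.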

Figure 11 shows the time series of GR–SR reflectivity difference for the SYD radar following the calculation of calibration error. The dates of possible calibration changes and the mean Δ*Z* (estimated ϵ) and its standard deviation for each intervening period are also indicated. Comparing Fig. 11 with Fig. 9, it can be seen that the sample size and mean Δ*Z* values for each comparison have changed, particularly where ϵ is large in magnitude (e.g., in September and October 2012), as a result of the use of bias-corrected GR reflectivities in the filtering of samples. Overall, the method appears to work very well. It is able to identify the above-noted major calibration changes in 2012, 2014, and 2015, as well as more subtle changes, for example, in December 2011. Values of ϵ range from −5.3 to +3.5 dB with the average over the entire 7-yr period being −0.6 dB. The same analysis for the WOL and NEW radars (not shown) reveals similar maximum error magnitudes but more negative values on average with means of −1.4 and −1.7 dB, respectively.

Two aspects of these results must be remarked upon. The first is the large magnitude of the calibration errors, with values frequently >1 dB and occasionally >5 dB. These errors have the potential to mislead forecasters (by suggesting that storms are more/less intense than they really are) and significantly impact radar-derived products, particularly when values are integrated in time (e.g., precipitation accumulations) or space (e.g., vertically integrated liquid water content). The second aspect to remark upon is the change in calibration associated with radar maintenance activities. One would hope that system checks and modifications always act either to maintain an existing good calibration or to improve a poor one. However, our results show that this is often not the case. For example, following preventative maintenance of the SYD radar in July 2014, the calibration error was increased from +0.8 to +3.5 dB (Fig. 11). Further work later that year saw the introduction of an error of roughly the same magnitude but opposite sign (−3.7 dB). It is difficult to ascertain the reason for these changes from the limited textual information contained in SitesDB; however, both human error and miscalibrated test equipment may play a role. Clearly, there is a need for more careful monitoring of radar calibration during operations, an issue we discuss further in section 4.

### d. Verification

By combining filtered GR–SR comparisons with radar engineering records, we have been able to quantify historical calibration errors for three GRs in the vicinity of Sydney. We now seek to evaluate the benefits achieved by accounting for these errors. A comparison against ground truth, such as rain gauges, is theoretically one means to achieve this goal; however, as noted in the introduction, radar rainfall estimates are subject to many additional sources of uncertainty. We therefore instead investigate how the consistency of our three radars differs with and without calibration adjustments. Agreement between neighboring GRs is important both from an operational perspective (e.g., forecasters viewing a storm using different radars should obtain the same impression of its intensity) and for the production of multiradar products, such as regional and national rainfall maps. Since we are adjusting the GRs relative to the same SR reference, we would expect the agreement between them to improve.

Following the rationale behind the VMM, we minimize spatial processing (interpolation and averaging) of the measured reflectivities and associated errors by directly matching sample volumes in space and time. The only spatial processing we apply is the averaging of reflectivities in range to achieve a consistent gate spacing ( m) across all three radars. For all possible radar pairs (SYD–WOL, SYD–NEW, and WOL–NEW), we then identify bins that are (i) close in space (centers < 500 m apart), (ii) close in time (elevation sweeps < 2 min apart), and (iii) similar in size (difference in volume < 10%). Spatial association is achieved by mapping data to a common Cartesian grid using an azimuthal equidistant projection centered halfway between the sites. For simplicity, we model the volume of atmosphere sampled by each bin as a cuboid with dimensions of , , and in the range, azimuth, and elevation directions, respectively. Here, we account for the fact that each azimuthal sector comprises multiple rays that overlap by an increasing degree with increasing elevation angle (the term). The fractional volume difference between radar bins *i* and *j* is computed as . To reduce computational expense, only days with widespread rainfall in the area of overlapping coverage are processed. Specifically, we use gridded rain gauge data (Jones et al. 2009) to identify days with at least 1 mm of rain over two-thirds of the land portion of the overlap area. For each pair of temporally matched scans, the reflectivities and volumes of each bin pair are stored together with their spatial and temporal offsets.
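The three matching conditions for a pair of GR bins can be sketched as a single predicate. This is a hedged illustration only: the function and argument names are hypothetical, bin centers are assumed to be given as Cartesian `(x, y, z)` tuples in meters on the common grid, and the fractional volume difference is assumed to be normalized by the larger of the two volumes, which may differ from the expression used in the paper.

```python
from math import dist


def bins_match(pos_i, pos_j, t_i, t_j, vol_i, vol_j,
               max_dist=500.0, max_dt=120.0, max_dvol=0.10):
    """Return True if two GR bins qualify as a matched pair.

    pos_i, pos_j: bin-center coordinates (x, y, z) in meters.
    t_i, t_j:     elevation sweep times in seconds.
    vol_i, vol_j: modeled bin volumes (any consistent unit).
    """
    # (i) close in space: centers less than 500 m apart
    close_in_space = dist(pos_i, pos_j) < max_dist
    # (ii) close in time: sweeps less than 2 min apart
    close_in_time = abs(t_i - t_j) < max_dt
    # (iii) similar in size: volume difference below 10%
    # (normalization by the larger volume is an assumption)
    similar_size = abs(vol_i - vol_j) / max(vol_i, vol_j) < max_dvol
    return close_in_space and close_in_time and similar_size
```

In practice a spatial index on the common Cartesian grid would be needed to avoid testing every possible bin pair; the predicate above only captures the acceptance test itself.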

This GR–GR comparison method is very similar to that used in the original version of the Radar Reflectivity Comparison Tool (RRCT; Gourley et al. 2003), which was developed for monitoring the relative calibration of radars in the U.S. WSR-88D network. Tolerances in the RRCT were 500 m in horizontal distance, 50 m in vertical distance, 5% in volume, and 3 min in time (between volume scans). Note, however, that the method was subsequently modified to use less stringent tolerances (750 m in distance and 6 min in time) while considering only bins within a rectangular region (120 km in length, 20 km in width, and 20 km in height) centered equidistant between the radars to ensure comparable bin volumes (http://rrct.nwc.ou.edu/). The latter approach would not work here because, unlike the WSR-88Ds, our three radars all have different beam widths (Table 1) and thus different bin volumes at a given range.

Figure 12 summarizes the GR–GR comparison results using smoothed kernel density estimation violin plots (Hintze and Nelson 1998). Shown are the distributions of reflectivity difference computed with and without calibration adjustments, together with the sample size and Pearson correlation coefficients. Note that there are over a million samples for each radar pair, with nearly 10 million for the SYD–WOL comparison as a result of the close proximity of these sites (Fig. 1). It can be seen that the agreement of both the WOL and NEW radars with the SYD radar is improved, with smaller values of median difference, smaller IQRs, and higher correlation coefficients for the calibrated reflectivities (Figs. 12a,b). While the latter two changes are also seen in the WOL–NEW comparison, the median difference in this case actually increases in an absolute sense, from −0.1 to +0.5 dB. Taken alone, this would suggest that NEW reflectivities are being overcorrected and/or WOL reflectivities are being undercorrected. However, based on the SYD comparisons, we would expect a difference of only around 0.1 dB between these two radars after calibration adjustments. This discrepancy may be indicative of poorer volume matching between the WOL and NEW radars as a result of the large distance between them (Fig. 1).

## 4. Summary and outlook

In this paper we have presented a method for estimating ground-based radar (GR) calibration errors through comparisons with spaceborne radar (SR) measurements from the TRMM and GPM satellites. This has been developed and tested using data from three Bureau of Meteorology (BoM) operational GRs in the vicinity of Sydney, Australia, for the period 2009–15.

Spatially and temporally coincident GR and SR observations are first obtained using the volume-matching method (VMM) of SM11, which was originally developed to support ground validation efforts for GPM. Following Cao et al. (2013), a precipitation-phase-dependent reflectivity correction is applied to the SR data to account for differences in measurement frequency (S band for GRs, Ku band for SRs). The resulting sample pairs are then filtered to isolate reflectivity differences associated with GR calibration error. Specifically, samples are retained only if they (i) predominantly comprise bins with reflectivities above the respective instrument sensitivities (18 dB*Z* for the SRs, 0 dB*Z* for the GRs), (ii) are located in stratiform precipitation outside of the melting layer, and (iii) have moderate reflectivities (24–36 dB*Z*) that are largely unaffected by the low SR sensitivity or attenuation of the SR beam. It was shown that the application of these criteria on average increases the estimated GR offset by around 2 dB while decreasing its variability.

Time series of the filtered GR–SR comparisons show periods of relatively stable GR calibration separated by sudden jumps of several decibels. However, it is not possible to determine the precise date of these changes because of the low frequency of suitable satellite overpasses. In addition, residual noise in the comparisons, resulting from imperfect volume matching, makes it difficult to detect more subtle changes in calibration. To address these issues, we make use of radar engineering work records maintained by the BoM. Dates of possible calibration changes are identified, between which the GR error is assumed to be constant. The calibration error for each period is then computed as the mean GR–SR reflectivity difference across all contemporaneous samples, using an iterative procedure to account for biases introduced by the reflectivity thresholding (Protat et al. 2011). This method produces results that are consistent with a subjective assessment of the time series while providing precise estimates of calibration error.

Since no ground truth exists to verify the accuracy of our calibration error estimates, we have examined the impact of correcting for these errors on the agreement between the three radars. Following the rationale behind the VMM, a method has been developed where spatially and temporally coincident GR sample volumes are identified and their reflectivities are compared (cf. Gourley et al. 2003). It was found that the calibration corrections in general lead to a robust improvement in the agreement between GRs, with an increase in correlation coefficients and a narrowing and shift toward zero of the reflectivity difference distributions.

In the future it would be valuable to explore ways to further reduce the variability in GR–SR comparisons. One method that is currently being investigated is the use of quality indices to screen out GR samples that may be contaminated by ground clutter, anomalous propagation, or beam blockage (Crisologo et al. 2017). Screening could also be applied in cases where the orientation of the two radar beams leads to poor volume matching (e.g., Fig. 2c). The accuracy of the VMM would likely be further improved by accounting for nonstandard GR beam refraction and the movement of precipitation features between SR and GR scans. Future work could also explore refinements to the Cao et al. (2013) frequency correction in the melting layer, with a view to eliminating the need to filter out volume-matched samples that fall within this layer.

In theory, the approach presented in this paper could be applied to any radar that falls within the coverage of the SRs (±35° and ±65° during the TRMM and GPM eras, respectively). In practice, however, its potential is limited by the requirement for reliable engineering records, as these may not be available for many GR networks. Furthermore, since several months can pass between suitable satellite overpasses, the approach cannot be used for operational calibration monitoring. An alternative method that does not suffer from these issues is the relative calibration adjustment (RCA) technique (Silberstein et al. 2008; Wolff et al. 2015). This uses the statistical properties of ground clutter to provide a precise (±0.5 dB) measure of day-to-day variations in GR calibration relative to some baseline. The problem in this case is identifying an accurate baseline.

Clearly, the two techniques—SR comparison and RCA—are complementary. We are thus exploring the potential of applying them in tandem: using SR comparisons to set and periodically check the baseline reflectivity and the RCA to monitor and correct day-to-day fluctuations in calibration. This approach has already been successfully applied to 16 years of observations from the C-band polarimetric (CPOL) research radar in Darwin, Australia, and work is ongoing to incorporate it into operational radar quality control procedures at the BoM (Louf et al. 2017). Given the near-global coverage of GPM and the ubiquity of ground clutter, we believe that this approach has the potential to improve the accuracy and stability of GR calibration the world over.

## Acknowledgments

RAW was funded by an Australian Research Council Linkage Project grant (LP130100679). We are grateful to Walt Petersen for insightful discussions on this work, to Bob Morris for providing details of the VMM implementation, and to Mark Curtis for his assistance with SitesDB. We also thank editor Luca Baldini and two anonymous reviewers for their comments.

### APPENDIX

#### Details of the VMM Algorithm

The VMM algorithm was coded up using Interactive Data Language (IDL) based on the descriptions in SM11 and Morris and Schwaller (2009), with modifications as detailed in section 2b. At the time of writing, work is ongoing to incorporate it into the wradlib Python library (Heistermann et al. 2013). Here we summarize the steps involved in creating the GR–SR comparison for a single SR overpass. The geometry of the GR and SR measurements is illustrated in Fig. A1.

The first step is to determine the location of each SR bin with respect to the GR. For this, we use an azimuthal equidistant projection centered on the GR. Each SR ray has an associated longitude–latitude pair corresponding to its intersection with the Earth ellipsoid [both TRMM and GPM use the World Geodetic System of 1984 (WGS 84) ellipsoid]. These are easily converted to Cartesian coordinates using standard map projection routines. To determine the full three-dimensional coordinates of each SR bin, we must apply a parallax correction. The magnitude of the parallax error is

where is the range of the bin from the Earth ellipsoid and *α* is the local zenith angle of the ray (Fig. A1a). The parallax-corrected horizontal coordinates are then

Here and are the coordinates of the ellipsoid intersection (), and *γ* is the angle of the SR scan line (Fig. A1a). Finally, the height of the bin is computed as

Note that we do not account for the curvature of the Earth in these calculations. This is a reasonable approximation because is small (typically <5 km).
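The flat-Earth parallax correction described above can be sketched as follows. This is an assumption-laden illustration rather than a transcription of the paper's equations: it takes the horizontal displacement to be the bin range multiplied by the sine of the zenith angle, the height to be the range multiplied by the cosine, and it assumes a signed zenith angle with the displacement subtracted along the scan-line direction. The function name and sign convention are hypothetical.

```python
from math import cos, radians, sin


def sr_bin_position(x0, y0, r, alpha_deg, gamma_deg):
    """Approximate 3D position of an SR bin, neglecting Earth curvature.

    x0, y0:    Cartesian coordinates of the ray's ellipsoid intersection (m)
    r:         range of the bin from the ellipsoid along the ray (m)
    alpha_deg: local zenith angle of the ray (degrees, assumed signed)
    gamma_deg: angle of the SR scan line (degrees)
    """
    alpha = radians(alpha_deg)
    gamma = radians(gamma_deg)
    # Horizontal displacement of the bin relative to the surface footprint.
    delta = r * sin(alpha)
    # Shift along the scan line; the sign convention is an assumption.
    x = x0 - delta * cos(gamma)
    y = y0 - delta * sin(gamma)
    # Height of the bin above the ellipsoid.
    z = r * cos(alpha)
    return x, y, z
```

For a nadir ray (zero zenith angle) the correction vanishes and the bin sits directly above its ellipsoid intersection, which provides a quick sanity check.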

In addition to the coordinates of each SR bin, we calculate their horizontal and vertical dimensions. The radius of a bin projected onto the horizontal plane is computed as the average of the projected radii in the along-track and cross-track directions (the latter varies with ),

where is the SR range of the bin (Fig. A1a) and is the SR angular beamwidth (0.71° for both TRMM and GPM). The vertical depth of a bin is given by

where is the SR gate spacing (250 m for TRMM, 125 m for GPM).

The next step is to identify the nearest GR volume scan in time. Each SR scan has a unique time stamp; however, since it takes less than a minute for the satellite to traverse the GR field of view, a single time may be reasonably applied to all scans in the overpass. Specifically, we use the time corresponding to the closest point of approach to the GR . For the BoM radars, volume scans have a time stamp for every elevation sweep (corresponding to the start of that sweep) , but they are named according to the start time of the entire scan . Preliminary work indicated that the largest number of GR–SR matched volumes occurred around the third or fourth elevation sweep or about s into the scan. Thus, to ensure the best temporal matching, we identify the scan that minimizes and proceed only if min.

The Cartesian coordinates of the GR bins are next determined under the assumption of standard refraction, that is, modeling the Earth as a sphere of equivalent radius , where and *a* is the geocentric Earth radius at the latitude of the GR. The geometry illustrated in Fig. A1b leads to the following simultaneous equations:

where *h* is the height of the GR antenna above the Earth ellipsoid; is the height of the GR bin; is its horizontal distance from the radar; and and are the GR range and elevation angle, respectively. Solving for , we obtain

from which the *x* and *y* coordinates can be determined as
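For readers who wish to experiment with the GR beam geometry, the following sketch uses the standard textbook equivalent-Earth-radius model with a factor of 4/3. It may differ in detail from the equations referenced in the text; the function name, the default Earth radius (a mean value used as a placeholder for the latitude-dependent geocentric radius), and the convention that azimuth is measured clockwise from north are assumptions.

```python
from math import asin, cos, radians, sin, sqrt


def gr_bin_position(r, elev_deg, az_deg, h, a=6371000.0, ke=4.0 / 3.0):
    """Approximate Cartesian position of a GR bin under standard refraction.

    r:        GR range of the bin (m)
    elev_deg: elevation angle (degrees)
    az_deg:   azimuth angle, clockwise from north (degrees, assumed)
    h:        height of the GR antenna above the ellipsoid (m)
    a:        Earth radius (m); placeholder mean value by default
    ke:       equivalent-Earth-radius factor (4/3 for standard refraction)
    """
    theta = radians(elev_deg)
    phi = radians(az_deg)
    ae = ke * a  # equivalent Earth radius
    # Height of the bin from the law of cosines on the triangle formed by
    # the Earth's center, the antenna, and the bin.
    z = sqrt(r * r + (ae + h) ** 2 + 2.0 * r * (ae + h) * sin(theta)) - ae
    # Great-circle distance of the bin from the radar along the surface.
    s = ae * asin(r * cos(theta) / (ae + z))
    x = s * sin(phi)
    y = s * cos(phi)
    return x, y, z
```

As an example, a bin at 100 km range on a 0.5° sweep lies roughly 1.4–1.5 km above a radar at sea level under these assumptions, illustrating how quickly the beam rises with range.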

At this point we have the coordinates of every SR and GR bin in a common reference frame. We now compute the median brightband height and width and apply a Ku-to-S-band frequency correction to the SR data as described in section 2b. The volume matching then proceeds by looping first over SR rays and then over GR elevation sweeps. SR rays are considered only if they (i) contain precipitation ( for TRMM and for GPM; Table 3) and (ii) are located between GR ranges of and . GR sweeps are considered only if min. The steps involved in identifying a volume-matched GR–SR sample pair are as follows:

- Calculate the GR elevation angle of each SR bin [using (A6)] as where is the horizontal distance of the SR bin from the GR.
- Identify the SR bins that fall within the GR beam, that is, for which , where is the GR’s angular beamwidth. Note the fraction of these for which .
- Average the values of , , and to get the coordinates of the sample centroid (, , and ) and approximate its horizontal and vertical dimensions ( and ) by the maximum and total , respectively. Also, determine the GR range of the sample [again using (A6)] as where is the horizontal distance of the sample from the GR.
- Linearly average the reflectivity values (in mm^{6} m^{−3}) for which to get the SR reflectivity of the matched volume . Do this for both raw and frequency-corrected reflectivities.
- Identify the GR bins that fall within the footprint of the SR beam, that is, for which , where . Note the fraction of these for which .
- Average the reflectivity values (in mm^{6} m^{−3}) for which , weighting bins inversely by *d* (using a Barnes Gaussian function with radius ) and linearly by (proportional to the bin volume), to get the GR reflectivity of the matched volume .
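The averaging steps above operate on reflectivity in linear units rather than in dB*Z*. A minimal sketch of this conversion and (optionally weighted) averaging is given below. The function names are hypothetical, and the weights are left to the caller because the exact form of the Barnes distance weighting is not reproduced here.

```python
from math import log10


def dbz_to_linear(dbz):
    """Convert reflectivity from dBZ to linear units (mm^6 m^-3)."""
    return 10.0 ** (dbz / 10.0)


def linear_to_dbz(z):
    """Convert reflectivity from linear units (mm^6 m^-3) to dBZ."""
    return 10.0 * log10(z)


def average_dbz(values, weights=None):
    """Average dBZ values in linear space and return the result in dBZ.

    values:  reflectivities (dBZ) of the bins in the matched volume
    weights: optional per-bin weights (e.g., distance weight times bin
             volume for GR bins); equal weights are used if omitted
    """
    if weights is None:
        weights = [1.0] * len(values)
    total = sum(weights)
    z_lin = sum(w * dbz_to_linear(v) for v, w in zip(values, weights)) / total
    return linear_to_dbz(z_lin)
```

Averaging in linear space gives more weight to high reflectivities: the unweighted mean of 20 and 40 dB*Z* is about 37 dB*Z*, not 30 dB*Z*.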

For every SR overpass, a single file is produced containing data for all volume-matched samples. The variables stored for each sample are as follows:

- Cartesian coordinates (, , );
- volume dimensions (, );
- GR range ();
- averaged SR and GR reflectivities [, , ];
- fraction of SR and GR bins above the respective minimum reflectivity thresholds (, );
- precipitation-type index ( for stratiform, for convective, for other);
- GR–SR time difference ().

The file also contains the median brightband height and width for the overpass.

Note that the quality parameters listed in Table 3 are used at various stages of the algorithm to ensure that all matched samples are accurate. Specifically, SR scans are rejected if and SR rays are rejected if for TRMM and if and/or for GPM.

## REFERENCES

*Measuring Precipitation from Space: EURAINSAT and the Future*, V. Levizzani, P. Bauer, and F. J. Turk, Eds., Advances in Global Change Research, Vol. 28, Springer, 213–224.

*38th Conf. on Radar Meteorology*, Chicago, IL, Amer. Meteor. Soc., 23B.4, https://ams.confex.com/ams/38RADAR/meetingapp.cgi/Paper/321038.

*31st Conf. on Radar Meteorology*, Seattle, WA, Amer. Meteor. Soc., P3C.1, https://ams.confex.com/ams/32BC31R5C/techprogram/paper_64171.htm.

*38th Conf. on Radar Meteorology*, Chicago, IL, Amer. Meteor. Soc., 23B.3, https://ams.confex.com/ams/38RADAR/webprogram/Paper320618.html.

*34th Conf. on Radar Meteorology*, Williamsburg, VA, Amer. Meteor. Soc., P7.3, https://ams.confex.com/ams/34Radar/techprogram/paper_155254.htm.

*35th Conf. on Radar Meteorology*, Pittsburgh, PA, Amer. Meteor. Soc., 68, https://ams.confex.com/ams/35Radar/webprogram/Paper191729.html.

## Footnotes

© 2018 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

^{1} Technically, the term *reflectivity* refers to the quantity , where *λ* is wavelength and is the dielectric constant for liquid water. However, for brevity, and in keeping with previous studies on radar calibration, we will refer to *Z* as reflectivity.

^{2} The bright band is a layer of locally enhanced reflectivities around the melting level that occurs as a result of changes in the scattering properties of snow as it melts.

^{3} Other significance levels (10% and 1%) were tested with almost no change in the results.