Validation of GPM Dual-Frequency Precipitation Radar (DPR) Rainfall Products over Italy

Abstract
The Ka–Ku Dual-Frequency Precipitation Radar (DPR) and the Microwave Imager on board the Global Precipitation Measurement (GPM) mission core satellite have been collecting data for more tha...


Introduction
The Global Precipitation Measurement (GPM) Core Observatory has been collecting data by both the passive GPM Microwave Imager (GMI; Draper et al. 2015) and the Dual-Frequency Precipitation Radar (DPR; Furukawa et al. 2015) for more than 3 years (Neeck et al. 2014). The DPR consists of a Ku-band (13.6 GHz) precipitation radar, similar to the Precipitation Radar (PR) on board the Tropical Rainfall Measuring Mission (TRMM) satellite (Kummerow et al. 1998), and an unprecedented Ka-band (35 GHz) radar.
GPM provides several precipitation products at different scales by using different sensor combinations and synergies. The DPR plays a key role in the GPM precipitation estimation scheme, being the main calibration instrument, serving as space reference for radiometer-derived global precipitation algorithms (Neeck et al. 2014), and providing basic information on the vertical cloud structure (Grecu et al. 2016). Three algorithms (the KuPR algorithm, the KaPR algorithm, and the dual-frequency algorithm) are applied to DPR data, yielding up to three different precipitation rate estimates for each GPM DPR footprint. Although Level-2 DPR products do not provide frequent snapshot coverage of the same area of the globe, they play an important role in the generation of the Level-3 products. Moreover, they are particularly useful in the regions where ground-based measurements are sparse or not available, and where information on vertical structure, not provided by radiometers, is required.
As such, it is important to provide as much reliable verification as possible in different orographic and climatological conditions (Speirs et al. 2017; Iguchi et al. 2016). A physical validation of DPR observations would require dedicated observation tools, that is, ground-based radar, operating purposely to assess the reflectivity measurements of DPR radars. Two postlaunch ground validation field campaigns in complex terrain were performed: the Integrated Precipitation and Hydrology Experiment (IPHEx; Barros et al. 2014) and the National Aeronautics and Space Administration (NASA) Olympic Mountains Ground Validation Experiment (OLYMPEX; Houze et al. 2015, 2017), involving different instruments such as Micro Rain Radars (MRRs), disdrometer and rain gauge networks, aircraft-based radiometers, and X-, Ku-, Ka-, and W-band radar measurements.
However, it is also worth directly validating the rain rate at the ground, as extracted with different approaches and algorithms (Chandrasekar et al. 2008). For this reason, NASA and the Japan Aerospace Exploration Agency (JAXA) carried out an extensive ground validation program in North America (mainly the United States) and Europe (Schwaller and Morris 2011), as well as partnering with various other groups elsewhere in the world. As an example, a scientific collaboration between the EUMETSAT Satellite Application Facility on Support to Operational Hydrology and Water Management (H SAF) and GPM called "H SAF and GPM: Precipitation algorithm development and validation activity" was established in 2014.
The H SAF started in 2005 as part of the EUMETSAT SAF Network (Mugnai et al. 2013). In March 2017, the program entered its Third Continuous Development and Operation Phase, which will last until February 2022. The H SAF is a consortium with the aim of retrieving satellite observations to estimate hydrological parameters such as precipitation, soil moisture, and snow cover to provide products and services in support of operational meteorology, hydrology, oceanography, and climate and risk management. In particular, the consortium produces precipitation estimates from passive microwave (PMW) data (Panegrossi et al. 2014, 2016; Casella et al. 2015a,b; Marra et al. 2015) and from PMW and infrared (IR) data (Feidas et al. 2018) over the Meteosat Second Generation (MSG) full disk. Inside the H SAF community, a program to evaluate the accuracy of precipitation maps retrieved from satellite data has been defined. Every year, the H SAF Precipitation Product Validation Group validates PMW-only and IR-PMW products over Europe, taking into account the different sensor features and the satellite native grid. This service provides information useful for ingesting the observations into a numerical model or a decision cycle.
In the H SAF context, instantaneous and accumulated precipitation products are usually validated with respect to ground radar and rain gauge data over European areas (Belgium, Bulgaria, Germany, Hungary, Italy, Poland, Slovakia, and Turkey), and the same procedures can be applied to validate other satellite products, such as the GPM ones. In particular, DPR products are of deep interest for H SAF, allowing extensive validation of H SAF instantaneous precipitation products over the MSG full disk in regions not covered by ground-based measurements, such as oceans and a large part of Africa. The DPR has the great advantage of providing consistent observations over the globe, including oceans, mountainous areas, and remote areas where ground-based precipitation measurements are scarce or not available, as in parts of Africa.
However, DPR measurements are affected by some limitations, such as attenuation, ground clutter, nonuniform beam filling (NUBF), and multiple scattering. These effects are taken into account in the NASA/JAXA algorithms used to retrieve DPR precipitation products (Iguchi et al. 2017), but the surface precipitation estimates are affected by errors that need to be quantified.
The work presented here focuses on the performance of the DPR precipitation rate products in complex terrain such as Italy, over an 18-month time frame. This paper reviews the performance of precipitation retrieval algorithms by evaluating the rainfall rates estimated with the KuPR algorithm, KaPR algorithm, and dual-frequency algorithm against both ground radars and rain gauges. The results are analyzed in order to define the potential and limitations of GPM-DPR products as a reference for the validation of the H SAF precipitation products over the MSG full disk.
Italy is an ideal test bed for complex terrain in a Mediterranean climatic regime, as it consists of a mixture of mountainous terrain (Alps, Apennines) and flatter/coastal areas. The country is well instrumented by a network of 22 weather radars (Rinollo et al. 2013), as well as a network of around 3000 rain gauges.
The aim of this paper is to assess the overall performances of the DPR-derived products and to focus on the sensitivity of the validation results to the retrieval settings.
The paper is structured as follows. The description of the ground data and of the DPR products is presented in section 2. In section 3, the overall results are reported by comparing satellite with both ground datasets, while section 4 is focused on the sensitivity of the results to seasonal cycle and rainfall rate. A detailed study of possible error sources is reported in section 5, while the results related to this analysis are drawn in section 6. A summary of conclusions is reported in section 7.

Data and algorithms description
A period of 18 months from 1 July 2015 to 31 December 2016 over the Italian Peninsula has been analyzed. The GPM Level-2A precipitation products have been evaluated by comparing them with the ground estimates from a radar network and a rain gauge network, both delivered by the Italian Department of Civil Protection (DPC). Only instantaneous fields of view (IFOVs) with liquid precipitation (excluding solid and mixed-phase precipitation) over land have been considered, applying the flags of the GPM precipitation phase product and of the GPM surface type product, respectively. A total of 902 overpasses and more than 2.5 million pieces of satellite data [for the DPR normal scan (NS) acquisition mode over Italian land areas] were analyzed: of these, 621 (585) events with 103 304 (73 207) IFOVs have a rainfall rate estimated at the surface above 0.0 (0.5) mm h⁻¹. Focusing on the liquid-only precipitation phase, the dataset decreases to 460 overpasses with 47 459 IFOVs with rain-rate intensity (RR) ≥ 0.5 mm h⁻¹.

a. DPR products
While the DPR can help validate GMI, its primary role is to construct three-dimensional precipitation and drop size distribution (DSD) maps by using its Ka- and Ku-band radar measurements. The GPM Core Observatory flies in a non-sun-synchronous orbit at 65° inclination to cover a larger latitudinal extension with respect to the TRMM orbit, which extended from 35°N to 35°S.
The high-sensitivity scan (HS) Ka-band radar footprints are interlaced with Ku-band footprints (NS, 49 footprints) and have 24 angle bins with a range bin size of 250 m. The Ka- and Ku-band radar-matched scan (MS) footprints, across the central beams of Ku footprints, have 25 angle bins with a range bin size of 125 m. The swath widths of Ka- and Ku-band radars are 120 and 245 km, respectively, while both Ka- and Ku-band footprints are of 5.2-km diameter. The minimum detectable signal is around 13 dBZ for Ku band, while it is 17 dBZ for Ka and dual-frequency band. This corresponds to a minimum rainfall rate of 0.2 mm h⁻¹ for Ka band and DPR and 0.5 mm h⁻¹ for Ku band (Iguchi et al. 2017).
The version V04A of both the DPR Level-2A product (2A-DPR) and the Ka/Ku (Ka single orbit, Ku single orbit) Level-2A product (2A-Ka/Ku) have been used in this study. While the 2A-DPR products are based on the dual-frequency information, both the 2A-Ka/Ku products are derived separately from the single-frequency signal (Iguchi et al. 2017). Please note that products are not fully independent: in particular, the inner swath (footprints 13-37) of 2A-DPR-NS is the same as 2A-DPR-MS, while the outer swath (footprints 1-12 and 38-49) of 2A-DPR-NS is the same as the outer swath of 2A-Ku NS. The 2A-DPR and 2A-Ka/Ku products present many output variables (NASA/JAXA 2016). In this work, we considered the precipRateNearSurface (prNs) and precipRateESurface (prEs) products. While the former refers to the rain estimation at the first DPR bin free from ground clutter, the latter estimates the precipitation rate at the surface.

b. Rain gauge data and processing
The Italian rain gauge network comprises over 3000 tipping-bucket type sensors (Fig. 1) with variable temporal sampling (1-60 min), different spatial density (a mean of 1 sensor every 100 km²) over the country, and a common minimum detectable rain amount of 0.2 mm. In this study, we consider half-hour accumulated data. This rain gauge accumulation interval provides better accuracy in comparison with instantaneous satellite estimates with respect to shorter and longer sampling rates (Porcù et al. 2014). Moreover, 30 min is a tradeoff between the need to monitor short-lived heavy showers and long-lasting weak precipitation (Chen and Chandrasekar 2015).
To homogenize the two ground datasets, rain gauge data, preprocessed according to range, persistence, step, and spatial consistency (Shafer et al. 2000) to screen out suspect values, have been interpolated over a regular grid (1 km × 1 km) through the Random Generator of Spatial Interpolation from uncertain Observations (GRISO). The GRISO (Pignone et al. 2010; Feidas et al. 2018) is an improved kriging-based technique implemented by the International Centre on Environmental Monitoring (CIMA Research Foundation). This technique preserves the values observed at the rain gauge location, allowing for a dynamical definition of the covariance structure associated with each rain gauge by the interpolation procedure. Each correlation structure depends both on the rain gauge location and on the accumulation time considered. GRISO is adopted by all European participating countries in the H SAF validation procedure.

c. Radar data and processing
The Italian radar network is currently composed of 20 C-band and 2 X-band systems, managed by 11 administrations. The product generation at national level is carried out by the DPC, which currently manages 7 C-band and 2 X-band systems, all with dual-polarization capability. The spatial distribution of single- and dual-polarization systems is depicted in Fig. 2.
The processing architecture is partially distributed, as the observations are first processed locally by a unique software system, then the single-radar products are centralized to generate the national-level products. In this work, radar products with a time resolution of 10 min mapped over a 1-km equispaced grid have been used.
The operational radar processing chain aims at identifying most of the uncertainty sources affecting the rainfall estimation process (Friedrich et al. 2006). Among them, the following error sources are primarily considered: contamination by nonweather returns (clutter), partial beam blocking (PBB), beam broadening at increasing distances, vertical variability of precipitation (Joss and Lee 1995; Germann and Joss 2002; Marzano et al. 2004), and rain path attenuation (Carey et al. 2000; Testud et al. 2000; Bringi and Chandrasekar 2001; Vulpiani et al. 2008).
Because of the characteristics of the Italian territory, most of the uncertainty has to be ascribed to the orographic complexity, especially in southern Italy, where the overlap in radar coverage, where available at all, is poor (Vulpiani et al. 2012). Every error source is quantified through specific tests ending with the estimation of a specific (partial) data quality index (QI) and its compensation, whenever possible. The overall data QI is then obtained as a combination of the partial quality matrices. The quality scheme, described in Rinollo et al. (2013), is embedded within the overall processing chain.
The radar observations are processed according to the following steps:
1) Nonweather returns are identified by means of the combination of a static clutter map, texture analysis on reflectivity Z, and differential reflectivity Z_DR, whenever available.
2) The PBB is quantified through the combination of empirically derived and digital terrain model-derived visibility maps (Bech et al. 2003). The resulting partial data QI is derived as in Rinollo et al. (2013).
3) The differential phase Φ_DP is filtered and the specific differential phase K_DP is computed through the iterative finite-difference scheme proposed in Vulpiani et al. (2012), tested in different environmental conditions (Crisologo et al. 2014), and integrated in open-source libraries (Heistermann et al. 2013).
4) Attenuation correction (for dual-polarization systems) is based on filtered Φ_DP measurements according to the ZPHI methodology (Testud et al. 2000). In the case of single-polarization radars, attenuation is only evaluated with the aim to determine the corresponding QI as in Rinollo et al. (2013).
5) The detrimental effects related to distance (broadening and height of observations with respect to the zero-thermal height), possibly enhanced by orography, are evaluated through the data quality model proposed by Rinollo et al. (2013).
6) The overall data quality is computed as the geometric mean of the partial quality matrices.
7) The retrieved mean vertical profile of reflectivity (VPR) is applied to the entire volumetric scan with the aim of using all the observations along the vertical to retrieve the surface rainfall rate.
8) The single-radar rainfall intensity map is computed by combining parametric algorithms based on the use of reflectivity and specific differential phase (Vulpiani et al. 2015).
9) The national-level rainfall product is built by combining the single-radar rainfall maps through a quality-based combination criterion.
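Step 6 can be illustrated with a minimal Python sketch. It assumes that each partial quality index lies in [0, 1] and that the geometric mean is taken pixel by pixel; the function name `overall_quality` and the example values are hypothetical and do not come from the operational processing chain.

```python
import math


def overall_quality(partial_qis):
    """Geometric mean of the partial quality indices of one pixel.

    Assumption: every partial index lies in [0, 1].
    """
    if not partial_qis:
        raise ValueError("at least one partial quality index is required")
    if any(q < 0.0 or q > 1.0 for q in partial_qis):
        raise ValueError("quality indices must lie in [0, 1]")
    # A single zero index forces the overall quality to zero.
    if any(q == 0.0 for q in partial_qis):
        return 0.0
    # Average the logarithms, then exponentiate (geometric mean).
    log_sum = sum(math.log(q) for q in partial_qis)
    return math.exp(log_sum / len(partial_qis))


# Hypothetical partial indices (e.g., clutter, PBB, attenuation, distance).
print(overall_quality([0.9, 0.8, 1.0, 0.5]))
```

With a geometric mean, one poor partial index lowers the overall quality more than it would with an arithmetic mean, which suits a scheme where any single error source can compromise the estimate.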
Moreover, a filtering process over the full radar dataset considered was carried out. The goal is to eliminate radar data not congruent with the pluviometric ones, the latter being considered as reference. These radar data are mainly due to spurious signals caused by radar failure, such as an antenna malfunction detecting system, or ground clutter due to an antenna pointing system failure. A rainy pixel is considered affected by a radar anomaly, and is thus discarded, if the ratio between the hourly accumulated rain gauge and ground radar rainfall is less than 0.1. This value was chosen because it is much smaller than the average value that the ratio assumes close to the location of a calibrated polarimetric weather radar (Sebastianelli et al. 2013) and for this reason is attributable to a radar failure. The adopted methodology allows us to reduce cases of partial or total beam blocking, which would locally invalidate the estimation of the radar error trend with range (Sebastianelli et al. 2013). Even so, the true rainfall at the ground remains unknown with current technologies, so the ground reference itself carries uncertainty (Kirstetter et al. 2012; Habib et al. 2004).
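The screening rule described above can be sketched as follows. This is only an illustration: the function name is hypothetical, and the treatment of pixels with no radar rain (kept, since the rule applies to rainy pixels only) is an assumption rather than something stated in the text.

```python
RATIO_THRESHOLD = 0.1  # gauge-to-radar ratio below which a pixel is discarded


def keep_radar_pixel(gauge_mm, radar_mm, threshold=RATIO_THRESHOLD):
    """Return True if an hourly radar pixel passes the gauge-based screening.

    gauge_mm and radar_mm are hourly accumulations (mm) at the same pixel.
    """
    if radar_mm <= 0.0:
        # Assumption: the rule only concerns rainy radar pixels.
        return True
    return (gauge_mm / radar_mm) >= threshold


# A radar pixel reporting 5 mm where the gauge field reports 0.1 mm is rejected.
print(keep_radar_pixel(0.1, 5.0))  # False
print(keep_radar_pixel(2.0, 4.0))  # True
```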
Moreover, a sensitivity analysis with respect to the radar QI values (Vulpiani et al. 2012; Rinollo et al. 2013) was conducted to exclude events with path attenuation due to heavy rainfall as well as sampling above precipitation areas. The mean absolute error (MAE) and the mean squared error (MSE) are computed for comparison between hourly rain gauge and radar data for increasing QI values. These statistical scores, as well as the remaining data for different QI thresholds, are shown in Table 1. The error decreases as QI increases and reaches a minimum value (MAE = 1.35 mm h⁻¹ and MSE = 6.41 mm² h⁻²), corresponding to a QI threshold equal to 0.60, whereas beyond this QI value the error begins to increase because of the residual presence of outliers caused by radar failure (as explained above), which weigh more heavily on the error estimate as the sample size shrinks sharply. By observing Table 1, it can be noted that MAE is stable for QI ranging from 0.40 to 0.60, while the MSE index shows greater variability, as does the Nash-Sutcliffe model efficiency coefficient (EC; Nash and Sutcliffe 1970), not shown in Table 1. The EC measures the reliability of a model in predicting the observations (in our case, radar estimates and rain gauge measurements, respectively). It ranges from −∞ to 1. When the index is equal to 1, the maximum efficiency of the model occurs. We found that a better model performance corresponds to a QI of 0.6 instead of a QI of 0.5, with EC in these cases equal to 0.30 and 0.28, respectively. For this reason, only radar data with QI values greater than 0.60 are used for comparison with satellite data.
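The QI sensitivity analysis can be sketched in Python as below. The standard definitions of MAE, MSE, and the Nash-Sutcliffe coefficient are used; the data layout (triples of QI, radar rate, gauge rate), the function names, and the sample values are hypothetical and serve only to show the mechanics, not to reproduce Table 1.

```python
def mae(est, ref):
    """Mean absolute error between estimates and reference."""
    return sum(abs(e - r) for e, r in zip(est, ref)) / len(ref)


def mse(est, ref):
    """Mean squared error between estimates and reference."""
    return sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref)


def nash_sutcliffe(est, ref):
    """Efficiency coefficient: 1 is a perfect match, values can be negative.

    Raises ZeroDivisionError if all reference values are identical.
    """
    mean_ref = sum(ref) / len(ref)
    residual = sum((r - e) ** 2 for e, r in zip(est, ref))
    spread = sum((r - mean_ref) ** 2 for r in ref)
    return 1.0 - residual / spread


def scores_by_qi(samples, thresholds):
    """Scores for radar data whose QI exceeds each threshold.

    samples: iterable of (qi, radar_rate, gauge_rate) triples (assumed layout).
    """
    table = {}
    for t in thresholds:
        kept = [(rad, gau) for qi, rad, gau in samples if qi > t]
        radar = [rad for rad, _ in kept]
        gauge = [gau for _, gau in kept]
        table[t] = {
            "n": len(kept),
            "MAE": mae(radar, gauge),
            "MSE": mse(radar, gauge),
            "EC": nash_sutcliffe(radar, gauge),
        }
    return table


# Hypothetical hourly samples: (QI, radar mm/h, gauge mm/h).
samples = [(0.3, 5.0, 1.0), (0.7, 2.0, 2.0), (0.8, 3.0, 4.0), (0.9, 1.0, 1.0)]
for threshold, row in scores_by_qi(samples, [0.0, 0.6]).items():
    print(threshold, row)
```

Reporting the remaining sample size next to each score, as Table 1 does, makes the trade-off between stricter quality and fewer samples visible.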

d. Performance indicators
The performance of satellite products is evaluated by considering continuous and multicategorical statistical scores.
The continuous statistical scores considered are mean error (ME), root-mean-square error (RMSE) as defined in Nurmi (2003), fractional standard error (FSE) defined as the ratio between the RMSE and the average of the observations at the ground, and the Pearson correlation coefficient (CC).
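The continuous scores listed above can be sketched as follows, assuming paired lists of satellite and ground rain rates in mm h⁻¹. The function name and the sample values are hypothetical; the FSE follows the definition given in the text (RMSE divided by the mean ground observation).

```python
import math


def continuous_scores(sat, obs):
    """Return ME, RMSE, FSE, and Pearson CC for paired rain rates."""
    n = len(sat)
    if n == 0 or n != len(obs):
        raise ValueError("sat and obs must be non-empty and of equal length")
    mean_sat = sum(sat) / n
    mean_obs = sum(obs) / n
    # Mean error: positive values mean the satellite overestimates.
    me = mean_sat - mean_obs
    rmse = math.sqrt(sum((s - o) ** 2 for s, o in zip(sat, obs)) / n)
    # Fractional standard error: RMSE normalized by the mean ground value.
    fse = rmse / mean_obs
    # Pearson correlation (undefined if either series is constant).
    cov = sum((s - mean_sat) * (o - mean_obs) for s, o in zip(sat, obs))
    var_sat = sum((s - mean_sat) ** 2 for s in sat)
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    cc = cov / math.sqrt(var_sat * var_obs)
    return {"ME": me, "RMSE": rmse, "FSE": fse, "CC": cc}


# Hypothetical matched samples (mm/h).
print(continuous_scores([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 6.0]))
```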
The multicategorical statistical scores evaluated (derived by the contingency table) are probability of detection (POD) and false alarm rate (FAR) as defined in Nurmi (2003). Moreover, we considered the volumetric indices of the precipitation correctly or incorrectly detected, as proposed by AghaKouchak and Mehran (2013): volumetric hit index (VHI) and volumetric false alarm ratio (VFAR).
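A compact sketch of the detection scores follows. POD and FAR use the usual contingency-table counts; VHI and VFAR follow our reading of the volumetric definitions of AghaKouchak and Mehran (2013), in which rain volumes replace counts. Treating a sample as rainy when its rate is at least 0.5 mm h⁻¹ mirrors the rain event definition used later in this paper; the function name and sample values are hypothetical.

```python
def detection_scores(sat, obs, threshold=0.5):
    """Return POD, FAR, VHI, and VFAR for paired rain rates (mm/h)."""
    hits = misses = false_alarms = 0
    vol_hit = vol_miss = vol_false = 0.0
    for s, o in zip(sat, obs):
        sat_wet = s >= threshold
        obs_wet = o >= threshold
        if sat_wet and obs_wet:
            hits += 1
            vol_hit += s      # satellite volume where both detect rain
        elif obs_wet:
            misses += 1
            vol_miss += o     # observed volume missed by the satellite
        elif sat_wet:
            false_alarms += 1
            vol_false += s    # satellite volume with no observed rain

    def ratio(num, den):
        # None flags an undefined score (empty category).
        return num / den if den > 0 else None

    return {
        "POD": ratio(hits, hits + misses),
        "FAR": ratio(false_alarms, hits + false_alarms),
        "VHI": ratio(vol_hit, vol_hit + vol_miss),
        "VFAR": ratio(vol_false, vol_hit + vol_false),
    }


# Hypothetical samples: two hits, one false alarm, one miss, one correct null.
print(detection_scores([0.0, 1.0, 2.0, 0.0, 3.0], [0.0, 1.5, 0.0, 2.0, 4.0]))
```

Because the volumetric scores weight each sample by its rain amount, a VHI well above the POD suggests that the missed samples are mostly light rain.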

Bulk comparisons
To have an overall indication of the performances of DPR products, we computed basic indicators over the whole dataset. The aim of this section is to assess which of the DPR products available to users and described in section 2a shows better performance as compared to a ground reference and also to highlight differences between radar and rain gauge validation.
The reason to consider two reference datasets is that radar and rain gauge networks measure precipitation rates looking at different characteristics of the rain structure, so the two measures do not necessarily coincide at all scales (Harrison et al. 2000). Moreover, while a rain gauge provides point-like, time-cumulated quantities, a radar computes instantaneous, volume-integrated values. Although this latter feature makes for a more direct comparison of satellite products with radar, the validation of satellite products with rain gauges is still a widespread practice, especially in regions where the radar coverage is poor or limited by orography. To mitigate the limitations of the point nature of rain gauge measurements, these data are interpolated to obtain a spatially continuous precipitation field over an equispaced grid, as indicated in section 2b.
The DPR and ground reference data have been temporally and spatially matched to perform the comparison. The temporal matching pairs the DPR observation with the radar and the accumulated rain gauge field closest in time. The spatial matching upscales the 1 km × 1 km reference data to the DPR grid through a Gaussian-weighted (G) average. For each satellite IFOV, a 2D Gaussian function is computed with the maximum value in the IFOV center and the full width at half maximum corresponding to the IFOV's diameter, and its value is used to weight the contribution of 1 km × 1 km grid points within the IFOV. Only ground data with QI above the threshold (0.6) and a corresponding G value greater than half of the maximum height are considered. Then, the upscaled ground data are obtained as a weighted average, with weights that depend on the distance from the IFOV center.
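The spatial matching can be sketched as below, under a few stated assumptions: coordinates are expressed in kilometers on a local flat grid, the Gaussian peak is normalized to 1, and the standard relation between full width at half maximum and standard deviation, σ = FWHM / (2√(2 ln 2)), is used. The function name, the data layout, and the sample points are hypothetical; for the gridded rain gauge field, which has no QI, a value of 1.0 could be passed.

```python
import math


def gaussian_upscale(points, ifov_center, fwhm_km=5.2, qi_min=0.6):
    """Gaussian-weighted average of 1-km ground data inside one IFOV.

    points: iterable of (x_km, y_km, rain_rate, qi) tuples (assumed layout).
    Returns None when no grid point qualifies.
    """
    sigma = fwhm_km / (2.0 * math.sqrt(2.0 * math.log(2.0)))
    cx, cy = ifov_center
    weight_sum = 0.0
    value_sum = 0.0
    for x, y, rain_rate, qi in points:
        if qi <= qi_min:
            continue  # keep only data with QI above the threshold
        dist2 = (x - cx) ** 2 + (y - cy) ** 2
        weight = math.exp(-dist2 / (2.0 * sigma ** 2))
        if weight <= 0.5:
            continue  # keep only points above half of the Gaussian maximum
        weight_sum += weight
        value_sum += weight * rain_rate
    return value_sum / weight_sum if weight_sum > 0.0 else None


# Hypothetical grid points around an IFOV centered at the origin.
grid = [
    (1.0, 0.0, 2.0, 0.9),     # used
    (-1.0, 0.0, 4.0, 0.9),    # used, same weight as the first point
    (10.0, 0.0, 100.0, 0.9),  # outside the half-maximum contour
    (0.0, 1.0, 50.0, 0.3),    # rejected by the QI threshold
]
print(gaussian_upscale(grid, (0.0, 0.0)))
```

With the full width at half maximum equal to the IFOV diameter, the half-maximum condition keeps exactly the grid points closer to the center than the IFOV radius.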
Before evaluating the performances of the GPM products with respect to the ground reference, a comparison between the precipitation rates as estimated by prEs and prNs products has been carried out. The differences between the prEs and prNs values are minimal for all the scan types and for DPR and Ka-/Ku-only products, with prEs reporting lower rain rates with respect to prNs according to the DPR retrieval algorithm settings (Iguchi et al. 2017). The mean difference ranges from 0.01 mm h⁻¹ for lower rainfall rates to 0.8 mm h⁻¹ for higher rainfall rates for all products except the DPR-HS, where it reaches 1.4 mm h⁻¹. Given the negligible difference between prEs and prNs, we decided to carry on the analysis on the prEs values for all scan types and instrument combinations. This results in six products considered in the following analysis, three from DPR and three from the Ka-/Ku-only retrieval algorithm.
One of the first steps in comparing rainfall rates measured with different instruments is to define a threshold to discriminate between wet and dry samples. This is a very important point because, given the power-law distribution of rain-rate values, the number of samples with a very low rain rate is relatively large and able to dominate any statistics. Moreover, most precipitation-measuring instruments are less accurate at the lowest rain rates, for various reasons. In particular, Italian rain gauges are of the tipping-bucket type, with well-known problems for very low rates (Tokay et al. 2003), with a tip size of 0.2 mm, resulting in an hourly minimum detectable rate of 0.4 mm h⁻¹ when the 30-min accumulation is considered. As for the DPR, the Ka radar has a minimum measurable rain rate of 0.2 mm h⁻¹, while for the Ku radar it is 0.5 mm h⁻¹ (NASA/JAXA 2016). Given these shortcomings, which make both estimate and reference fields particularly unreliable at low rates, we decided to exclude from all validation calculations those samples with at least one of the two elements (estimate or reference) reporting rain rates between 0.0 and 0.5 mm h⁻¹ (both extremes kept in the calculations). To support this choice, we performed a separate validation on only these samples, finding values of CC very close to zero (0.02-0.04), which indicate a very weak relationship between estimated and reference values when one of the two quantities is below the sensitivity of the instrument.
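The two numerical points of this paragraph can be made concrete with a short sketch. It reads "both extremes kept" as meaning that samples equal to exactly 0.0 or 0.5 mm h⁻¹ are retained, so that only rates strictly between the two values are excluded; this reading, like the function names, is an assumption.

```python
def gauge_min_rate(tip_mm, accumulation_min):
    """Minimum detectable hourly rate (mm/h) for one tip per accumulation."""
    return tip_mm * 60.0 / accumulation_min


def is_valid_pair(sat_rate, ref_rate, threshold=0.5):
    """True if neither rate falls strictly between 0 and the threshold."""
    def ambiguous(rate):
        return 0.0 < rate < threshold
    return not (ambiguous(sat_rate) or ambiguous(ref_rate))


# One 0.2-mm tip in 30 min corresponds to about 0.4 mm/h.
print(gauge_min_rate(0.2, 30))
# The first pair is dropped because the reference lies between 0 and 0.5 mm/h.
print(is_valid_pair(1.2, 0.3), is_valid_pair(1.2, 0.0), is_valid_pair(0.5, 2.0))
```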
The first analysis is carried out on the whole database for the six prEs products. The scatterplots between prEs and ground reference datasets for all available scan types are shown in Figs. 3a-f (rain gauges) and Figs. 3g-l (radar). For each of the two blocks, the top line refers to the Ku-/Ka-only products, while the bottom line refers to the DPR products. At first glance, the two series look similar, indicating that the two ground-based observations show overall agreement in describing the distribution of rain rates during the analyzed months.
Focusing on the differences between products, the Ku-NS plots (Figs. 3a,g) indicate a general satellite overestimation for all rain-rate ranges, with the large majority of the points lying around the 1:1 line. This is more evident when the satellite products are compared with the radar. The Ka-MS product (Figs. 3b,h) shows a prevailing underestimation for moderate and higher rain rates. The Ka-HS products (Figs. 3c,i) show a general underestimation with even lower agreement at higher rain rates.
DPR with respect to the corresponding Ku-/Ka-only products. Nevertheless, the points are slightly more concentrated along the 1:1 line, with a greater concentration at higher values for DPR-MS, indicating more skill in correctly estimating heavy rain. Further quantitative analysis can be done based on the computation of the indicators introduced in section 2d, which are reported in Table 2 after comparison between the six prEs products and rain gauge and radar reference fields for all rain-no rain events (with rain event defined as RR ≥ 0.5 mm h⁻¹). Each cell of Table 2 reports two rows, the top referring to the comparison between GPM and radar products and the bottom referring to the comparison with respect to rain gauges. We also bolded the best score for each indicator.
The sample size of NS is almost twice the sample size of MS and HS, given the different scan structure. Statistical scores for radar and rain gauges are comparable in terms of continuous indicators, while the categorical and volumetric indicators are generally better for comparison with radar.
DPR products generally show better performance with respect to Ka-/Ku-only products, mainly for continuous scores, indicating that the synergy between the two frequencies increases the quality of the estimate. In particular, DPR-NS and MS obtain lower ME and RMSE and higher CC with respect to the DPR-HS product, while FSE values are comparable. Categorical indicators show acceptable quality detection values, better for volumetric scores and for comparison with radar, and low false alarm values (except for rain gauges matching) with limited variability among the different products. The POD indicators are rather high, even if the VHI is significantly greater, highlighting the difficulty of satellites in detecting mainly the light precipitation.
A last look into the general behavior of the GPM products is made by considering the distribution of the error of the single measure. We analyzed the percentage error (PE), defined as PE_i = 100 × (SAT_i − OBS_i)/OBS_i for each ith couple of satellite (SAT) and ground (OBS) precipitation values where both estimates are above 0.5 mm h⁻¹.
In Fig. 4, the frequency of samples with PE below a given threshold for both radar (Fig. 4a) and rain gauge (Fig. 4b) validation is shown. All distributions of PE peak around −50% (for less than 10% of the dataset); the frequency then decreases rapidly. The lower end is bounded at PE = −100%, while all the samples with PE higher than 400% are assigned the value 400%. The above-described properties of the error distribution hold for both DPR and Ku-/Ka-only products as well as for the comparison with radar and rain gauges.
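The percentage error and the capping at 400% can be sketched as follows. The function name and the sample values are hypothetical; the selection of pairs (both values above 0.5 mm h⁻¹) and the cap follow the description in the text.

```python
def percentage_errors(sat, obs, min_rate=0.5, cap=400.0):
    """PE (%) for pairs with both rates above min_rate, capped at `cap`."""
    errors = []
    for s, o in zip(sat, obs):
        if s > min_rate and o > min_rate:
            pe = 100.0 * (s - o) / o
            # Values above the cap are stored at the cap itself.
            errors.append(min(pe, cap))
    return errors


# Hypothetical pairs (mm/h); the last one is skipped (satellite below 0.5).
print(percentage_errors([1.0, 2.0, 10.0, 0.2], [2.0, 2.0, 1.0, 3.0]))
# prints [-50.0, 0.0, 400.0]
```

Since rain rates are positive, the PE cannot fall below −100%, which explains the lower bound of the distributions.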

Sensitivity to rain intensity, seasonal cycle, and altitude
The physiography and geographical setting of the Italian Peninsula make it a key region to represent the Mediterranean climate, with dry summers and wet winters. Cold months are dominated by cyclonic development and frontal structures, while in summer, which is generally dry, the occurrence of isolated and mesoscale organized convection is possible. Mesoscale convection is particularly severe during late summer and early autumn (September-November), when such intense systems hit Italy, causing flooding, landslides, and other damage (Marra et al. 2017; Silvestro et al. 2016). On the one hand, the complex orography that characterizes the Italian Peninsula adds local forcing to the precipitation formation and enhancement processes; on the other hand, it makes it difficult to measure precipitation with ground-based instruments. Moreover, even remote sensing from satellite-borne sensors may suffer from shortcomings at all wavelengths because of the small-scale variation of terrain elevation.
For these reasons, the performances of satellite products are expected to be dependent on the season and rain-rate intensity because of the strict relationship with the microphysical structure of the precipitating cloud and its interaction with local to mesoscale forcing.
We . The analysis is focused on the best GPM radar products, as derived in section 3, that is, the DPR-NS, DPR-MS, and DPR-HS. All indicators are computed with respect to both rain gauges and radar ground reference, but only the latter are reported here, with the rain gauge indicators being very similar to the radar ones even if, in general, they are slightly worse.

a. Sensitivity to rain-rate intensity
Table 3 reports the values obtained for the whole period, dividing the dataset according to the rainfall intensity classes defined above. In each cell, we reported three lines referring from top to bottom to the performance of DPR-NS, DPR-MS, and DPR-HS, respectively. We also bolded the best score for each indicator. For light precipitation, the DPR-HS product markedly outperforms the NS and MS products in terms of ME, RMSE, and FSE. For the moderate class, DPR-HS and DPR-MS underestimate the precipitation, while DPR-NS slightly overestimates it. The CC is generally low and better for the DPR-NS products for moderate and heavy precipitation regimes. For the heavy rain class, the rank is reversed, with the DPR-NS outperforming the other products for all the statistical indicators. In this class, the precipitation is markedly underestimated by the DPR, while the best value for FSE is reached. DPR-HS for this class has the worst values, indicating that the high-sensitivity scan, more sensitive to lower rain rates, is less reliable in estimating higher rain rates. Categorical and volumetric indicators show that most of the heavy precipitation is missed by DPR, while the detection skill increases for the other two DPR products. Volumetric indicators also show that DPR-NS/-MS products assign most of the light and moderate rainfall to the right class, while DPR-HS has lower performance. FAR (and VFAR) increases for higher rainfall rate classes.

FIG. 4. Distribution of the PE for satellite prEs products in comparison with (a) ground radar and (b) rain gauge data.

MAY 2018 PETRACCA ET AL.

b. Sensitivity to seasonal cycle
In Table 4 (which follows the same structure as Table 3), the impact of the seasonal cycle is addressed by computing the statistical indicators for the different seasons for rain rates higher than 0.5 mm h⁻¹. SON has a larger number of samples, since the study covered the period from July 2015 to December 2016; thus, the behavior of this season could have a relative impact on the overall data presented in Table 3. This feature is not evidenced by JJA, it being the driest season in Italy. A large part of the seasonal signal can be attributed to the higher occurrence of heavy precipitation in warm months with respect to light/moderate rain rates. The DPR overestimates the precipitation only during SON (except for DPR-HS) and underestimates it during all other seasons. DPR-NS and DPR-MS show better indicators for DJF, MAM, and JJA. RMSE is better in MAM for all products, and this is also true for FSE (except DPR-HS for heavy rain, which has the best value in SON), while higher CC is found for DPR-NS and DPR-MS for JJA.
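The uneven seasonal sampling of the study period can be checked with a small sketch that assigns each calendar month to a meteorological season and counts the months falling in each one. The helper names are hypothetical; the month-to-season mapping is the conventional DJF/MAM/JJA/SON grouping.

```python
from collections import Counter

SEASON_BY_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def months_in_period(start_year, start_month, end_year, end_month):
    """Yield (year, month) pairs from start to end, both inclusive."""
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def season_month_counts(start_year, start_month, end_year, end_month):
    """Number of calendar months per season within the period."""
    period = months_in_period(start_year, start_month, end_year, end_month)
    return Counter(SEASON_BY_MONTH[month] for _, month in period)


# Study period: July 2015 to December 2016.
print(season_month_counts(2015, 7, 2016, 12))
```

For the study period this gives 6 months of SON, 5 of JJA, 4 of DJF, and 3 of MAM, so SON is sampled over two full seasons while MAM is covered only once.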
The categorical and volumetric indicators show a slightly worse capability of all DPR products to detect the precipitation during MAM and DJF with respect to the other seasons, while FAR and VFAR present better scores during DJF for all the DPR products. Generally, the DPR products show a tendency to underestimate the precipitation in both intensity and areal coverage.
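The categorical and volumetric indicators discussed above can be illustrated with a minimal sketch. It assumes the usual contingency-table definitions of POD and FAR and volume-weighted analogues for VHI and VFAR; the exact formulas adopted in section 3 may differ in detail. The function names, the use of "greater than or equal to" at the 0.5 mm h⁻¹ threshold, and the toy values are assumptions made only for illustration.

```python
def categorical_scores(sat, ref, threshold=0.5):
    """POD and FAR from paired rain rates (mm/h); assumed standard definitions."""
    hits = misses = false_alarms = 0
    for s, r in zip(sat, ref):
        if s >= threshold and r >= threshold:
            hits += 1
        elif r >= threshold:
            misses += 1
        elif s >= threshold:
            false_alarms += 1
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    return pod, far


def volumetric_scores(sat, ref, threshold=0.5):
    """VHI and VFAR as volume-weighted counterparts of POD and FAR (assumed form)."""
    pairs = list(zip(sat, ref))
    vol_hit = sum(s for s, r in pairs if s >= threshold and r >= threshold)
    vol_miss = sum(r for s, r in pairs if s < threshold and r >= threshold)
    vol_false = sum(s for s, r in pairs if s >= threshold and r < threshold)
    nan = float("nan")
    vhi = vol_hit / (vol_hit + vol_miss) if vol_hit + vol_miss else nan
    vfar = vol_false / (vol_hit + vol_false) if vol_hit + vol_false else nan
    return vhi, vfar


# Toy values (not from the paper): one hit, one miss, one false alarm, one null.
toy_sat = [1.0, 0.0, 2.0, 0.0]
toy_ref = [1.5, 3.0, 0.0, 0.0]
print(categorical_scores(toy_sat, toy_ref))
print(volumetric_scores(toy_sat, toy_ref))
```

In this form a single missed heavy-rain sample lowers VHI much more than POD, which is consistent with the low VHI values reported for the heavy class.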
Deeper insight into the relationship between seasons and rainfall intensity can be gained from Tables 5-7 (light, moderate, and heavy precipitation classes, respectively), where the indicators for each class and season are reported. As a general comment, DPR-HS is better for the light class in all seasons and DPR-NS often prevails in the heavy class, even if its performance does not differ significantly from that of DPR-MS, while for the moderate class there is no clearly prevailing product. The highest CCs are reached for moderate rain rates by DPR-MS in JJA and by DPR-NS in DJF. The FSE has its lowest values in the heavy rainfall class (DPR-MS in DJF), while it increases in the moderate class and often exceeds 200% in the case of light precipitation for all seasons and DPR products. The RMSE presents strong differences between light/moderate and heavy precipitation, especially in JJA, while the differences decrease in the other seasons to reach the minimum in DJF. We remark here that for DJF/MAM the heavy class is probably undersampled.
The analysis of the VHI shows, however, that only a relatively small fraction of the total amount of heavy rainfall is correctly classified, with better values in JJA and SON, while for the moderate class VHI increases, reaching its maximum for the light class in JJA for DPR-MS. The larger part of the error for the heavy precipitation class in DJF and SON is due to false alarms, while the opposite is true in JJA, indicating a likely dependence on cloud system types (i.e., small-scale convection in JJA/SON and embedded convection in DJF). The low performance of all products and seasons for heavy precipitation could be related to the sub-IFOV scale of convective rainfall structures in Italy (low POD/VHI). This will be partially addressed in section 5, where the sensitivity of DPR products to rainfall pattern variability is evaluated.

c. Sensitivity to the altitude
This section is dedicated to the analysis of the performance of DPR products as a function of the altitude when compared with the rain gauges. For comparison with radar, as described in section 2c, the QI takes into account the complex orography, and this will be discussed in section 5b. Three levels have been identified by analyzing the distribution of the rain intensity measured by the rain gauges and are labeled as ''plain'' (0-400 m MSL), ''hill'' (400-800 m MSL), and ''mountain'' (higher than 800 m MSL) regions. The quantification of the DPR performance is reported in Table 8. The number of samples decreases moving from the plain to the hill and mountain regions, which show comparable sample sizes. The indicators show an overall better performance for DPR-NS and DPR-MS in the hill region (even if ME and RMSE are better in the plain and mountain regions, respectively), while they get worse from the plain to the mountain regions for DPR-HS. The ME also shows a general DPR rain-rate underestimation, increasing with the altitude.
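As a minimal sketch of the altitude stratification described above, the following code assigns each matched sample to one of the three levels and computes the ME (DPR minus gauge) per level. The function names and the sample layout are hypothetical, and the treatment of the boundary values (400 m and 800 m both assigned to ''hill'') is an assumption, since only the mountain class is stated explicitly as ''higher than 800 m''.

```python
def altitude_class(elevation_m):
    """Map gauge-grid elevation (m MSL) to the three levels used in the text."""
    if elevation_m < 400:
        return "plain"
    if elevation_m <= 800:  # assumption: both boundary values fall in "hill"
        return "hill"
    return "mountain"


def mean_error_by_class(samples):
    """samples: iterable of (elevation_m, dpr_rr, gauge_rr) in m and mm/h."""
    totals = {}
    for elevation_m, dpr_rr, gauge_rr in samples:
        key = altitude_class(elevation_m)
        err_sum, count = totals.get(key, (0.0, 0))
        totals[key] = (err_sum + (dpr_rr - gauge_rr), count + 1)
    # A negative ME means that the DPR underestimates the gauge reference.
    return {key: err_sum / count for key, (err_sum, count) in totals.items()}


# Toy samples (not from the paper).
toy = [(100, 2.0, 1.0), (100, 1.0, 2.0), (900, 1.0, 3.0)]
print(mean_error_by_class(toy))
```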

Outliers
A deeper study of the overall results presented in previous sections is carried out here to investigate the main causes of the largest discrepancies between DPR products and the ground reference. In particular, we focused on the marked over- and underestimation of the satellite products. We applied this analysis to the DPR products, considering the validation with both rain gauges and radar data reported in the next two subsections.
Starting from the study reported in section 3, we focused on the samples with the largest errors [hereafter called outliers (OUT)], namely, the samples where the DPR RR is ≥ 10 mm h⁻¹ and at least 4 times the ground RR, or where the ground RR is ≥ 10 mm h⁻¹ and the DPR RR is at most 1/4 of it. The outliers with a DPR overestimate are labeled as DPRout, while those with a DPR underestimate are labeled as GAUGEout or RADARout when compared with rain gauges or radar, respectively. To evaluate the causes of the large discrepancies we also considered a benchmark set (BS), composed of the pairs where the estimate is very close to the reference value, that is, their normalized absolute difference does not exceed 5%. Figure 5 graphically shows the samples selected as OUT (close to the axes) and BS (close to the 1:1 line) for the three different DPR scans of prEs.
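The selection rules for OUT and BS can be sketched as follows. The thresholds (10 mm h⁻¹, a factor of 4, and 5%) come from the text; the function name, the label ''GROUNDout'' (standing for GAUGEout or RADARout), the normalization of the absolute difference by the ground value, and the handling of zero ground rain rates are assumptions made for illustration.

```python
def classify_pair(dpr_rr, ground_rr, rr_min=10.0, ratio=4.0, bs_tol=0.05):
    """Label a matched DPR/ground pair (mm/h) as DPRout, GROUNDout, BS, or other."""
    # DPR overestimate: DPR is heavy and at least `ratio` times the ground value.
    # The product form avoids dividing by a zero ground rain rate.
    if dpr_rr >= rr_min and dpr_rr >= ratio * ground_rr:
        return "DPRout"
    # DPR underestimate: ground is heavy and DPR is at most 1/ratio of it.
    if ground_rr >= rr_min and ground_rr >= ratio * dpr_rr:
        return "GROUNDout"
    # Benchmark set: normalized absolute difference within the tolerance
    # (assumption: normalized by the ground reference).
    if ground_rr > 0 and abs(dpr_rr - ground_rr) / ground_rr <= bs_tol:
        return "BS"
    return "other"


# Toy pairs (not from the paper).
for pair in [(40.0, 10.0), (2.0, 12.0), (10.2, 10.0), (5.0, 3.0)]:
    print(pair, classify_pair(*pair))
```

With these rules the OUT samples cluster near the axes of a scatterplot and the BS samples near the 1:1 line, as described for Fig. 5.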

a. Rain gauge outlier analysis
We considered seven attributes of the matched DPR-reference pair that are expected to have an impact on the discrepancies. The considered attributes are gridpoint elevation above mean sea level (GE), average terrain slope around the grid point (GS), gauge density around the grid point (GD), time of the DPR observation with respect to the half-hourly gauge integration (GTD), rainfall pattern variability (GRV), position of the DPR IFOV within the swath (IP), and height above sea level of the first DPR bin not affected by clutter (BH).
The GE attribute is extracted, for each grid point, from the global 30-arc-s elevation dataset (GTOPO30), a digital elevation model (DEM) with a horizontal grid spacing of 30 arc s (~1 km). From the same database, we also computed the GS as the standard deviation of the DEM elevation, over each grid point, on a 3 × 3 neighborhood. GD is set as the number of working gauges within a circle of 25-km radius, centered on the grid point and corresponding to the radius of influence in the GRISO procedure. The GTD is computed as the time difference between the rain gauge time stamp (end of the 30-min accumulation time) and the DPR passage within the previous 30 min. The GRV is the standard deviation of all GRISO grid points within the DPR IFOV. IP is the position of the DPR IFOV within the swath and ranges between 1 and 49 for the NS acquisition mode, with the IFOV 1 and 49 at the edges of the swath. BH is the height above the sea level of the first DPR bin not affected by clutter.
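Two of the attributes defined above, GS and GD, lend themselves to a short sketch. The code assumes a DEM stored as a list of rows, a population standard deviation for GS, and great-circle (haversine) distances on a spherical Earth for GD; none of these choices is specified in the text, and all function names are hypothetical.

```python
import math
from statistics import pstdev


def terrain_slope_proxy(dem, row, col):
    """GS: standard deviation of DEM elevation in the 3 x 3 window around (row, col)."""
    if not (1 <= row < len(dem) - 1 and 1 <= col < len(dem[0]) - 1):
        raise ValueError("the 3 x 3 window must lie inside the DEM")
    window = [dem[r][c] for r in range(row - 1, row + 2)
              for c in range(col - 1, col + 2)]
    return pstdev(window)  # assumption: population standard deviation


def haversine_km(lat1, lon1, lat2, lon2, earth_radius_km=6371.0):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def gauge_density(grid_lat, grid_lon, gauges, radius_km=25.0):
    """GD: number of working gauges within radius_km of the grid point."""
    return sum(1 for lat, lon in gauges
               if haversine_km(grid_lat, grid_lon, lat, lon) <= radius_km)


# Toy inputs (not from the paper).
toy_dem = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
toy_gauges = [(42.0, 12.0), (42.1, 12.0), (43.0, 12.0)]
print(terrain_slope_proxy(toy_dem, 1, 1))
print(gauge_density(42.0, 12.0, toy_gauges))
```

GRV can be obtained in the same way as GS, by applying the standard deviation to the GRISO rain rates that fall within the DPR IFOV.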
In Table 9 the numbers of samples in the three categories (BS, GAUGEout, and DPRout) are reported for the three DPR products, as well as the percentage of total OUT with respect to the total number of samples selected. For DPR-HS there is a marked prevalence of GAUGEout, indicating an underestimation by this product. This could be due to the higher attenuation of Ka band at higher rainfall rates. For DPR-NS and DPR-MS the number of BS is generally higher than the total OUT, with a prevalence of GAUGEout for both of them. Given the results of the previous sections and this preliminary analysis, we carried out the analysis of the OUT referring only to the DPR-NS product, and we also investigated the possible sources of error in the whole DPR swath. Figure 6 shows the distribution of the three categories, which are GAUGEout (red bars), DPRout (blue bars), and BS (green bars), with respect to the more significant attributes. The percentage of total OUT for each class is shown by the open bars, and relative values are indicated on the right y axis.
The distribution of OUT and BS according to their GE, sampled in 250-m-wide classes, is presented in Fig. 6a. The number of BS for DPR-NS is larger than the sum of the OUT for the first three GE classes, the difference decreases when GE increases, and for the 0.75-1.00-km class the number of OUT is more than the number of BS. Very similar behavior is found also for the GS attribute (not shown), indicating that the impact of orography on OUT/BS distribution is expected at a higher altitude.
Considering the impact of the gauges' spatial density on the DPR-NS validation, Fig. 6b shows that, except for the case of only a few gauges (fewer than five) around the grid point, the number of BS equals or slightly exceeds {from [5-10) to ≥50 GD classes} the total number of OUT. Among the OUT, DPRout numbers prevail at lower classes {from [5-10) to [15-20)}, while for the other classes GAUGEout numbers have a larger occurrence.
The distributions of BS and OUT according to GTD (Fig. 6c) show that the number of BS exceeds that of OUT only when the DPR observations occur around the center of the gauges' accumulation interval (30 min). When the observation is closer to the beginning of the interval, the number of OUT slightly exceeds that of BS.
Almost the totality of the BS and most of the OUT are within the low rainfall-rate spatial variability (Fig. 6d). When the GRV is higher than 1 mm h⁻¹, the number of OUT systematically exceeds the BS, indicating the difficulty of the DPR observation to describe patchy rain patterns (in these cases the nonuniform beam filling could markedly affect the DPR observation). The second {[1-2) mm h⁻¹} and higher GRV classes are filled almost exclusively by GAUGEout, showing a marked DPR-NS underestimation. Since the GRISO has an influence radius of 25 km, GRV has low sensitivity, and a value of 2 mm h⁻¹ has to be considered as relatively high.
The distribution of OUT and BS according to the IP (Fig. 6e)   lowest bin is between 1.0 and 2.5 km above the ground (Fig. 6f). In the case of higher and lower heights, the numbers of DPRout and GAUGEout grow equally with respect to BS. The analysis shows no particular feature that can be related to the samples with large discrepancies between GPM products and the rain gauge reference. The presence of OUT could be due to a combination of the analyzed parameters that cannot be disentangled.

b. Radar outlier analysis
The analysis carried out in section 5a is repeated here using the radar data, considering five features that are expected to impact the discrepancies. These are the time difference between the radar and DPR observations (RTD), the rainfall pattern variability of the radar acquisitions (RRV) within the DPR IFOV, IP, the upscaled radar quality index (UQI), and the number of radar data inside each DPR IFOV (RND).
RTD is computed as the difference between the DPR acquisition time and the radar nominal time, that is, the start of the 10-min acquisition time. RRV is the standard deviation of all radar data within the DPR IFOV. IP is the same as in section 5a. UQI indicates the QI upscaled through a Gaussian weight to the satellite grid. RND represents the number of radar pixels (1-km equispaced grid) contained in the DPR IFOV.
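The three IFOV-level radar attributes (UQI, RRV, and RND) can be sketched together for a single DPR IFOV. The text states only that the QI is upscaled ''through a Gaussian weight''; the isotropic Gaussian form, the width parameter sigma_km, and the function name below are assumptions and do not reproduce the actual upscaling scheme.

```python
import math
from statistics import pstdev


def ifov_radar_attributes(pixels, sigma_km=2.6):
    """Return (UQI, RRV, RND) for one IFOV.

    pixels: list of (distance_km_from_ifov_center, rain_rate_mm_h, qi) for the
    1-km radar pixels falling inside the IFOV. sigma_km is a hypothetical width.
    """
    if not pixels:
        raise ValueError("no radar pixels inside the IFOV")
    # Gaussian weight decreasing with distance from the IFOV center (assumed form).
    weights = [math.exp(-0.5 * (dist / sigma_km) ** 2) for dist, _, _ in pixels]
    uqi = sum(w * qi for w, (_, _, qi) in zip(weights, pixels)) / sum(weights)
    rrv = pstdev([rr for _, rr, _ in pixels])  # spread of radar rain rates
    rnd = len(pixels)                          # number of radar pixels
    return uqi, rrv, rnd


# Toy IFOV (not from the paper): three pixels with uniform rain and quality.
print(ifov_radar_attributes([(0.0, 1.0, 0.9), (1.0, 1.0, 0.9), (2.0, 1.0, 0.9)]))
```

Because the weights peak at the IFOV center, low-quality pixels near the edge of the footprint reduce the UQI less than those near its center.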
The numbers of samples in the three categories (BS, RADARout, and DPRout) for the three DPR products are reported in Table 10, with the percentage of total OUT with respect to the total number of samples selected. For all DPR scans the number of BS is almost twice the total OUT, indicating a better agreement between DPR and radar than in the rain gauge comparison. Among the OUT, there is a prevalence of DPRout for NS and MS, while for DPR-HS the number of RADARout is higher than DPRout (but lower than BS), indicating that the signal for this product (based on Ka radar data) could be attenuated for higher rain rates, resulting in a marked underestimation. As for the rain gauge comparison in section 5a, we focused on the DPR-NS product.
The distributions of the DPRout (blue bars), BS (green bars), and RADARout (red bars) categories with respect to the individual features are shown in Fig. 7. The percentage of total OUT for each class is shown by the open bars, and relative values are indicated on the right y axis. Figure 7a shows the OUT and BS distribution with respect to RTD, which ranges between 0 and 10 min because of the time frequency of the ground radar product. The number of BS increases slightly for greater time differences, but the percentage of total OUT is evenly distributed, with values around 37%. The RTD therefore does not appear to have a significant impact on the large discrepancies between ground radar data and satellite estimates. The distribution of OUT and BS according to UQI is presented in Fig. 7b. The number of BS increases steadily with improving UQI values, supporting the validity of the applied scheme. Focusing on the outliers' distribution, DPRout prevails for UQI less than 0.75, probably because of radar underestimation at greater distances from the radar site. The tendency to underestimate precipitation at increasing distance is mainly related to the increasing altitude of observation with distance from the radar, due to Earth's curvature, in combination with the vertical variability of precipitation (Vulpiani et al. 2012). RADARout has a higher occurrence for higher UQI values, even if it is never dominant in any of the considered categories. The resulting percentage of outliers decreases as the UQI threshold increases, from 40% at UQI greater than 0.8 to only 15% for UQI greater than 0.95.
As observed for the rain gauge comparison, low rainfall-rate spatial variability favors the occurrence of BS (Fig. 7c). The percentage of OUT increases for higher RRV, exceeding 70% when the RRV is above 5 mm h⁻¹, even if the subset above this threshold is small. For RRV ≥ 5 mm h⁻¹ RADARout has a higher occurrence than DPRout and BS, indicating the difficulty of the DPR to describe patchy rain patterns. Figure 7d shows the BS and OUT distributions with respect to the number of radar pixels within the DPR IFOV (RND). The number of BS exceeds the total OUT independently of RND, even if the percentage of OUT slightly decreases at higher RND. Figure 7e shows the BS and OUT distribution with respect to the IP along the scanline. RADARout and DPRout are fairly evenly distributed, so no clear signal emerges.
The analysis shows a strong dependence of many discrepancies on the quality of the radar data upscaled to the satellite grid. Moreover, even if to a lesser extent, the different spatial resolution of the two instruments highlights the difficulty, from the satellite point of view, of observing in detail the spatially less homogeneous precipitation patterns.

c. Outliers conclusion
While the rain gauge outlier analysis did not show any particular feature able to justify the larger discrepancies observed between the DPR products and the rain gauge data, the radar outlier analysis showed a clearer signal. The UQI shows a good agreement with the benchmark

FIG. 7. As in Fig. 6, but for (a) RTD, (b) UQI, (c) RRV, (d) RND, and (e) IP. Note that for the RRV distribution only, the occurrence axis ranges up to 800.

UQI-filtered dataset results
As described in section 2c, the QI depends on different factors such as PBB, beam broadening, vertical variability of precipitation, and rain path attenuation. For greater distances (i.e., for lower QI values), there is a higher probability of underestimating the precipitation. To best evaluate the performance of the DPR product and, at the same time, maintain a statistically significant reference dataset, we considered the ground radar estimates with UQI greater than 0.8, hereafter referred to as the UQI-filtered dataset.
In this analysis, we recomputed the statistical scores for the DPR products with respect to the UQI-filtered dataset. Figure 8a shows the scatterplot between the DPR prEs NS and the radar estimates (as in Fig. 3j), while Fig. 8b shows the same scatterplot for the UQI-filtered data. The comparison between Figs. 8a and 8b shows a significant improvement when the UQI filter is applied. In the resulting density scatterplot, samples are more normally distributed around the 1:1 line, and a general reduction of points along the x and y axes and of samples with greater discrepancies is evident. Table 11 reports the number of BS, RADARout, and DPRout samples before and after the UQI-filter processing and the relative percentage variation. As suggested by Fig. 7b, the UQI filter mainly reduced the DPRout category (by almost 65%), and it removed over a quarter of the RADARout samples. Only 21% of BS samples were filtered out. The UQI-filtered dataset shows a comparable number of RADARout and DPRout, indicating a lower bias with respect to the previous dataset.
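The filtering step and the per-category bookkeeping of the kind reported in Table 11 can be sketched as follows. The 0.8 threshold is taken from the text; the dictionary layout of the samples, the function names, and the toy values are hypothetical.

```python
from collections import Counter


def uqi_filter(samples, uqi_min=0.8):
    """Keep only the samples whose upscaled quality index exceeds uqi_min."""
    return [s for s in samples if s["uqi"] > uqi_min]


def percent_removed(before, after):
    """Percentage of samples removed by the filter, per category."""
    n_before = Counter(s["category"] for s in before)
    n_after = Counter(s["category"] for s in after)
    return {cat: 100.0 * (n - n_after[cat]) / n for cat, n in n_before.items()}


# Toy samples (not from the paper).
toy = [
    {"category": "DPRout", "uqi": 0.50},
    {"category": "DPRout", "uqi": 0.90},
    {"category": "RADARout", "uqi": 0.70},
    {"category": "BS", "uqi": 0.95},
    {"category": "BS", "uqi": 0.85},
]
filtered = uqi_filter(toy)
print(percent_removed(toy, filtered))
```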
The statistical scores discussed in section 3 are computed for the UQI-filtered dataset and compared with those of the DPR-NS ''whole dataset'' as shown in Table 12. The best value for each score is bolded. The applied filter reduces the whole dataset by about 30%. All statistical scores considered (except the ME that decreases and VHI that remains unchanged) improve. The RMSE improves by 8% (from 3.86 to 3.57 mm h⁻¹), the CC by 17% (from 0.44 to 0.52), and the FSE by 14% (from 165% to 142%). The most significant improvement is in terms of the false alarm and volumetric false alarm, which reach the values FAR = 1% and VFAR = 0%, respectively. The ME moves from −0.17 to −0.44 mm h⁻¹, indicating a greater tendency to underestimate the radar estimates.
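The relative improvements quoted above can be checked with a few lines of arithmetic. This sketch assumes that the percentages are relative changes with respect to the whole-dataset value; the function name is hypothetical, and the inputs are the rounded scores given in the text.

```python
def relative_change_percent(before, after):
    """Relative change of a score, in percent of its initial value."""
    return 100.0 * (after - before) / abs(before)


# Rounded scores quoted in the text: (whole dataset, UQI-filtered dataset).
quoted_scores = {
    "RMSE (mm/h)": (3.86, 3.57),
    "CC": (0.44, 0.52),
    "FSE (%)": (165.0, 142.0),
}
for name, (whole, filtered) in quoted_scores.items():
    print(f"{name}: {relative_change_percent(whole, filtered):+.1f}%")
```

From the rounded inputs this gives changes of roughly 7.5%, 18%, and 14% in magnitude, close to the quoted 8%, 17%, and 14%; the small difference for CC would be expected if the published percentage was computed from unrounded scores.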
This slight underestimation is partly attributable to the lower DPR spatial resolution (5.2 km) with respect to ground radar data (1 km). In fact, as shown in Fig. 7c, the OUT distribution as a function of the RRV highlights the lower capability of DPR to adequately detect irregular rain patterns. At the same time, for homogeneous and widespread precipitation characterized by lower spatial rain variability, the number of BS is clearly predominant, highlighting the DPR's high performance in detecting and correctly estimating the rainfall rate.

FIG. 8. Density scatterplot for the DPR prEs NS product in comparison with the ground radar estimates considering (a) the whole dataset and (b) the dataset with radar UQI ≥ 0.8.

Conclusions
In this work we compared the GPM Level-2A (Ka/Ku and DPR) estimated surface products with instantaneous radar estimates and 30-min accumulated rain gauge data for an 18-month period (July 2015-December 2016) over Italian land areas for liquid-phase precipitation, with a total of 460 DPR overpasses. The validation approach is based on the H SAF methodology, which consists of evaluating the satellite product on the grid of the product itself (pixel-based approach), so that only the ground data undergo the upscaling process to the satellite grid.
As a first comparison, DPR (dual-frequency) products generally show better performance with respect to the Ka-/Ku-only products, confirming that the synergy between the two frequencies increases the overall quality. On this basis, the analysis was carried out on the DPR prEs product for all scans (NS, MS, and HS). Moreover, the statistical scores computed between the satellite and ground references are very close to each other, showing a slightly better performance for the radar comparison.
The Italian Peninsula represents a key region to describe the Mediterranean climate, and its complex orography makes it difficult to measure precipitation from both ground-based instruments and remote satellite sensors. For these reasons, we evaluated the DPR products' performance for different rain-rate intensities, seasons, and altitudes. For light precipitation (0.5 ≤ RR < 1.0 mm h⁻¹), the HS scan markedly outperforms the NS and MS products, but shows worse performance for the heavy rain class (RR ≥ 10 mm h⁻¹), indicating its greater aptitude in detecting lower rain rates with respect to heavier precipitation. The DPR-NS product shows similar performance for the different precipitation classes. Looking at the seasonal scores, we noted better performance during warmer months (mainly MAM) in terms of RMSE, CC, and FSE for the NS and MS products. The winter season (DJF) prevails only for lower false alarm values, indicating a dependence on cloud system types when wider-spread precipitation is more frequent. It has to be remarked that for the winter season (and higher elevation) the results may be contaminated by snowfall episodes not screened by the DPR detection of liquid/solid precipitation. The study on the orography highlights a greater agreement for DPR-NS and DPR-MS in the hill region and an increasing underestimation with altitude.
A deeper analysis was carried out to investigate the main causes of the largest discrepancies between the DPR-NS product and ground reference data. We focused both on the BS and on the OUT samples. Applied to the rain gauge data, this analysis did not show any clear signal able to explain the large discrepancies. From the radar analysis, we found a more marked dependence on the ground radar quality, with the number of BS increasing as the UQI values improve. Analogously, the prevalence of DPRout at lower UQI indicates a general underestimation of the precipitation intensity by ground radars with respect to the DPR products. The results obtained by filtering the datasets for UQI values above 0.8 show a significant improvement for all statistical scores considered, resulting in RMSE = 3.57 mm h⁻¹, FSE = 142%, POD = 65%, VHI = 83%, FAR = 1%, and VFAR = 0% for rainfall above 0.5 mm h⁻¹.