This study assesses the level-2 precipitation estimates from 10 radiometers relative to the Global Precipitation Measurement (GPM) Ku-band precipitation radar (KuPR) in two parts. First, nine sensors are evaluated over the 65°S–65°N region: four imagers [Advanced Microwave Scanning Radiometer 2 (AMSR2) and three Special Sensor Microwave Imager/Sounders (SSMISs)] and five sounders [Advanced Technology Microwave Sounder (ATMS) and four Microwave Humidity Sounders (MHSs)]. Over ocean, imagers outperform sounders, primarily because they use low-frequency channels. Furthermore, AMSR2 is clearly superior to the SSMISs, likely because of its finer footprint. Over land, all sensors perform similarly, except for the noticeably worse performance of ATMS and SSMIS-F17. Second, we include the Sondeur Atmospherique du Profil d’Humidite Intertropicale par Radiometrie (SAPHIR) in the evaluation, contrasting it against the other sensors within the SAPHIR latitudes (30°S–30°N). SAPHIR has a slightly worse detection capability than the other sounders over ocean but a detection performance comparable to that of the MHSs over land. The intensity estimates from SAPHIR show a larger normalized root-mean-square error over both land and ocean, likely because only 183.3-GHz channels are available. Currently, imagers are preferred to sounders when level-2 estimates are incorporated into level-3 products. Our results suggest a sensor-specific priority order. Over ocean, this study indicates a priority order of AMSR2, SSMISs, MHSs and ATMS, and SAPHIR. Over land, SSMIS-F17, ATMS, and SAPHIR should be given a lower priority than the other sensors.
Satellite precipitation estimates have been widely used in many areas ranging from real-time, high-impact weather detection and short-term weather prediction to long-term climate monitoring. Accurate precipitation estimation is of critical importance for these applications. Currently, there are several well-documented and operational precipitation datasets, including NASA’s Integrated Multisatellite Retrievals for GPM (IMERG) (Huffman et al. 2015), Climate Prediction Center’s morphing technique (CMORPH) (Xie et al. 2017), JAXA’s Global Satellite Mapping of Precipitation (GSMaP) (Kubota et al. 2007), and Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks–Cloud Classification System (PERSIANN-CCS) (Hsu et al. 1997; Hong et al. 2004). These gridded precipitation datasets are often referred to as “level-3” products in the satellite precipitation community.
A key step in improving the accuracy of satellite precipitation estimates is to validate them against independent references. Indeed, these level-3 products have been extensively validated against gauge and ground radar observations over different land regions at hourly, daily, and monthly time scales. Maggioni et al. (2016) surveyed the validation work on these gridded satellite precipitation products from 1998 to 2015. These validation studies consistently showed that level-3 satellite rainfall products are generally more accurate over densely vegetated regions and in the warm season. Similarly, Khan et al. (2018) showed that IMERG performs best over the southeastern United States compared with other U.S. regions, relative to ground and spaceborne radar observations. Over ocean, validation of these level-3 products is limited to rain gauge observations on atolls and buoys. Prakash and Gairola (2014) showed that the Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) tends to systematically overestimate precipitation relative to 22 buoy rain gauges in the tropical Indian Ocean. Validation work has continued in the Global Precipitation Measurement (GPM) era, which began in 2014. Recent studies show that IMERG performs better than TMPA (Liu 2016; Liu et al. 2018; Wu and Wang 2019; Gebregiorgis et al. 2018), partly because of improvements to the level-2 passive microwave retrieval algorithm.
These surface- and geography-dependent performance characteristics are inherited from the level-2 (swath) rainfall retrievals of passive microwave radiometers, which serve as the basis for generating the widely used level-3 (gridded) precipitation datasets (except the PERSIANN estimates, which are derived from IR only). These passive microwave radiometers include the Special Sensor Microwave Imager (SSMI) on board the Defense Meteorological Satellite Program (DMSP) F11–F15 satellites; the Special Sensor Microwave Imager/Sounder (SSMIS) on board the DMSP F16–F19 satellites; the TRMM Microwave Imager (TMI); the GPM Microwave Imager (GMI) on board the GPM Core Observatory; the Advanced Microwave Sounding Unit-B (AMSU-B) on board the NOAA-15–NOAA-17 satellites; the Advanced Technology Microwave Sounder (ATMS) on board the Suomi National Polar-Orbiting Partnership (Suomi-NPP) and NOAA-20 satellites; the Microwave Humidity Sounder (MHS) on board the NOAA-18, NOAA-19, MetOp-A, MetOp-B, and MetOp-C satellites; the Advanced Microwave Scanning Radiometer-Earth Observing System (AMSR-E) on board the Aqua satellite; and its follow-on sensor, AMSR2, on board the GCOM-W1 satellite. In addition, several future satellite missions carrying radiometers suitable for precipitation measurement are already planned. For example, several Joint Polar Satellite System (JPSS) satellites carrying ATMS, and FengYun satellites carrying the Microwave Radiation Imager (MWRI) and the Microwave Humidity Sounder (MWHS), are planned for launch in the near future (Goldberg 2018; Gu and Tong 2015).
Efforts have been made to directly evaluate the level-2 rainfall retrievals for several of the aforementioned sensors. For example, Conner and Petty (1998) compared the rainfall retrievals from five different algorithms for SSMI on board the DMSP F11 satellite with hourly gauge and gauge-corrected radar rain rates over the continental United States (CONUS). They found that all five retrieval algorithms perform similarly, with higher retrieval skill for heavy rainfall. McCollum et al. (2002) validated the rainfall retrievals from SSMI on board F13, F14, and F15 against gauge-corrected hourly radar data over the eastern CONUS and concluded that SSMI overestimates (underestimates) rainfall intensity in the summer (winter) months. Kummerow et al. (2001) aggregated the instantaneous TMI rain rates to monthly, 2.5° grid boxes for comparison with the gauge-based precipitation product from the Global Precipitation Climatology Centre (GPCC) and showed that the two agree very well.
However, these early level-2 validation studies suffer from a time-scale mismatch: the satellite-derived rain rate is an instantaneous snapshot, which does not match well with hourly or even monthly reference products. With the availability of ground radar rainfall estimates at a temporal resolution of a few minutes over CONUS (Kirstetter et al. 2014; Zhang et al. 2016) and western Europe (Kidd et al. 2018), this time-scale mismatch has been greatly alleviated. Tang et al. (2014) assessed the rainfall retrievals from 12 passive microwave radiometers over the eastern CONUS using 5-min ground radar observations as the reference. They concluded that precipitation retrievals from microwave imagers notably outperform those from sounders. It is worth noting that the retrieval algorithms for these 12 radiometers came from different developers and used different retrieval techniques, some of which were ad hoc in nature.
A recent study by Kidd et al. (2018) evaluated the level-2 retrievals from the GPM constellation radiometers, all of which are generated by the Goddard profiling algorithm (GPROF) (Kummerow et al. 2015). They found that GPROF retrievals tend to overestimate light rainfall and underestimate heavy rainfall. The GPROF level-2 retrievals for GMI have also been evaluated by Tan et al. (2018) against three dense gauge networks at pixel resolution and on the instantaneous time scale; they showed that GPROF still faces challenges over coastal and semiarid regions. Snowfall validation for GMI has been conducted for several snowfall events using a radar station in Finland (von Lerber et al. 2018), which showed that GPROF retrieval performance depends clearly on storm-top height, with much better performance for tall storm systems.
As reviewed above, validation of level-2 retrievals at the instantaneous temporal resolution is limited to a few land regions because reference data with high spatial and temporal resolution are lacking. Over ocean, the data availability issue is even more severe. In the TRMM era, level-2 retrieval validation often relied on observations from the single radar at Kwajalein Atoll (Kummerow et al. 2001; Kubota et al. 2007; Wolff and Fisher 2008, 2009). Clearly, observations from a single radar station cannot represent the rainfall characteristics over the oceans.
To mitigate the representativeness issue over both land and ocean, Lin and Hou (2008) exploited the coincident observations from the TRMM Precipitation Radar (PR) and eight passive microwave sensors. They showed that over land, AMSU-B (a sounder) and SSMI (an imager) perform similarly for instantaneous rain rates between 1.0 and 10.0 mm h−1, whereas over ocean, the imagers are noticeably better than the sounders for rain rates greater than 5.0 mm h−1.
This study follows the strategy of Lin and Hou (2008), taking the spaceborne KuPR as the reference to evaluate the precipitation estimates from 10 sensors. First, we evaluate the precipitation estimates generated by the latest version of GPROF (Kummerow et al. 2015) for nine sensors: four imagers (AMSR2 and SSMIS on board F16, F17, and F18) and five sounders (ATMS on board Suomi-NPP and MHS on board NOAA-18, NOAA-19, MetOp-A, and MetOp-B). This evaluation covers the GPM region of 65°S–65°N. Second, we add to the evaluation the precipitation estimates from the Sondeur Atmospherique du Profil d’Humidite Intertropicale par Radiometrie (SAPHIR) on board Megha-Tropiques, which are generated by the Precipitation Retrieval and Profiling Scheme (PRPS) (Kidd 2018). Since SAPHIR covers only 30°S–30°N, the precipitation estimates from the other radiometers are restricted to this latitudinal band for a fair comparison.
All sensors evaluated in this study except SAPHIR are in sun-synchronous orbits and thus cross the equator at the same local times twice a day. Therefore, the comparisons may reflect systematically different precipitation regimes with diurnal variability. For example, a sensor with a late-afternoon equator-crossing time may sample more thunderstorms than a sensor with a noontime crossing. While this is a common issue in intersensor comparisons, the goal of this study is an operationally oriented, rather than absolute, evaluation of retrieval performance. Additionally, the possible adverse effect of the diurnal cycle is mitigated by orbital drift over time, which evens out the diurnal sampling of a sensor on a particular platform. For example, MHS-NOAA18 drifted from about 0400/1600 in March 2014 to 0830/2030 in December 2018. This diurnal sampling bias is further reduced by considering, in our conclusions, the same sensor on different platforms, which have different diurnal sampling times.
Results from this work are expected to have important implications for improving level-3 merged precipitation products. Currently, imagers are preferred to sounders when level-2 retrievals are incorporated into the level-3 merged product IMERG (Huffman et al. 2015). Knowing the performance of each sensor may enable a better prioritization scheme. Furthermore, assessments of the recently released PRPS precipitation rates from SAPHIR are limited, in part because its 30°N coverage limit constrains the use of CONUS-based ground radars for evaluation. The results can provide insight into the SAPHIR precipitation estimates relative to those from other radiometers in the tropics.
a. KuPR precipitation rate
This study uses the latest version (V06) of the KuPR precipitation rate as the “reference.” Specifically, we use the variable “precipRateNearSurface” from the 2A-DPR product for KuPR. It is worth noting that “precipRateESurface,” which is extrapolated from the lowest clutter-free bin to the surface, is slightly smaller than “precipRateNearSurface,” by about 2%. KuPR is a cross-track scanning radar on board the GPM Core Observatory with a nadir resolution of about 5 km. Operating at 13.6 GHz, similar to the TRMM PR, it is well suited to observing moderate-to-heavy rain. Its detection threshold is 18 dBZ (~0.5 mm h−1) according to its initial design specifications, but postlaunch analysis suggests that it can identify precipitation signals down to 12 dBZ (~0.2 mm h−1) (Hamada and Takayabu 2016). Precipitation estimates that include the Ka-band PR (KaPR) reflectivity from the 2A-DPR product are not used in this study because 1) the KuPR swath (245 km) is about twice as wide as that of the KaPR, yielding more coincident observations between KuPR and each radiometer, and 2) no clear sensitivity advantage of KaPR over KuPR has been observed (Toyoshima et al. 2015; Hamada and Takayabu 2016; Skofronick-Jackson et al. 2019).
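The dBZ-to-rain-rate pairs quoted above can be roughly reproduced with a simple power-law Z–R relation. The sketch below assumes the classic Marshall–Palmer coefficients (Z = 200R^1.6); this is an illustrative assumption only, not the relation used by the KuPR retrieval algorithm, and the function name is hypothetical.

```python
def dbz_to_rain_rate(dbz: float, a: float = 200.0, b: float = 1.6) -> float:
    """Convert reflectivity (dBZ) to rain rate (mm/h) by inverting Z = a * R**b.

    The default a and b are the Marshall-Palmer coefficients, assumed here
    purely for illustration.
    """
    z_linear = 10.0 ** (dbz / 10.0)  # reflectivity factor in mm^6 m^-3
    return (z_linear / a) ** (1.0 / b)


for threshold_dbz in (18.0, 12.0):
    rate = dbz_to_rain_rate(threshold_dbz)
    print(f"{threshold_dbz:.0f} dBZ -> {rate:.1f} mm/h")
```

With these coefficients, 18 and 12 dBZ map to roughly 0.5 and 0.2 mm h−1, consistent with the thresholds quoted in the text; other Z–R coefficients would shift these values.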
Although this study uses the KuPR precipitation rate as the “reference,” it is certainly not perfect. For example, KuPR misses most precipitation with rates below 0.2 mm h−1 because of its detection limit (Hamada and Takayabu 2016; Skofronick-Jackson et al. 2019). Previous studies have also shown that KuPR underestimates high rain rates in convective storms compared with ground radar observations (Schwaller and Morris 2011; Biswas and Chandrasekar 2018; Warren et al. 2018). However, it provides precipitation estimates from 65°S to 65°N, which is particularly valuable over ocean, where observations from other instruments (e.g., radars and gauges) are very sparse.
Another potential caveat in using KuPR as the reference is the overlap between the data used for calibration and for validation. Both GPROF and PRPS use KuPR information as a priori knowledge, from which the retrieval estimates the precipitation rate from the brightness temperatures observed by the passive sensors. In GPROF, the a priori knowledge, consisting of coupled passive microwave brightness temperatures and KuPR precipitation rates, is used by a Bayesian scheme to generate a weighted mean of all known KuPR precipitation rates. To further constrain the solution, GPROF stores this a priori information in so-called databases, subset by surface type and environmental conditions [e.g., total precipitable water (TPW) and 2-m temperature]. While this approach provides a robust precipitation retrieval and is applicable to any passive microwave instrument, validating it against KuPR may easily inflate the apparent performance. The same applies to PRPS. For example, biases in KuPR may propagate into the GPROF/PRPS databases, thereby hiding any systematic error present in the estimates. However, we expect such situations to affect only a small fraction of the results presented in this study, for the following reason. GPROF builds its retrieval database primarily from GMI and KuPR, which together constrain the precipitation vertical profile. This database is then used in the same fashion by the other sensors (except SAPHIR) to estimate the surface precipitation rate. How much accurate precipitation information can be extracted from the database depends on the information content of each sensor and on the retrieval algorithm’s ability to obtain it. We emphasize that GMI is not included in the comparison. In contrast, PRPS directly uses the KuPR precipitation rate to populate the database for the SAPHIR precipitation estimates. In this sense, the statistics for SAPHIR may represent an upper limit of its performance, and our results later show that SAPHIR performs slightly worse than the other radiometers. Therefore, despite the use of KuPR in constructing the databases of both GPROF and PRPS, its use in evaluating the passive microwave precipitation estimates can still provide valuable information on the relative performance of each sensor.
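The Bayesian weighting described above can be illustrated with a generic sketch. This is a simplified illustration, not GPROF's implementation: it assumes a diagonal error covariance with per-channel uncertainties `sigma` and a tiny hypothetical database, weights each database entry by exp(−χ²/2), and returns the weighted mean of the paired precipitation rates. The function and variable names are hypothetical.

```python
import numpy as np


def bayesian_weighted_rate(tb_obs, tb_db, rate_db, sigma):
    """Chi-square-weighted mean of database precipitation rates.

    tb_obs : (n_chan,) observed brightness temperatures [K]
    tb_db  : (n_entries, n_chan) database brightness temperatures [K]
    rate_db: (n_entries,) precipitation rates paired with tb_db [mm/h]
    sigma  : (n_chan,) assumed per-channel uncertainty [K]
             (a diagonal error covariance is a simplifying assumption)
    """
    tb_obs, tb_db = np.asarray(tb_obs, float), np.asarray(tb_db, float)
    rate_db, sigma = np.asarray(rate_db, float), np.asarray(sigma, float)
    chi2 = np.sum(((tb_db - tb_obs) / sigma) ** 2, axis=1)
    # Subtracting the minimum avoids underflow and cancels out in the ratio.
    weights = np.exp(-0.5 * (chi2 - chi2.min()))
    return float(np.sum(weights * rate_db) / np.sum(weights))


# Hypothetical two-channel database with three entries.
tb_db = [[200.0, 250.0], [210.0, 250.0], [260.0, 280.0]]
rate_db = [1.0, 3.0, 20.0]
print(bayesian_weighted_rate([205.0, 250.0], tb_db, rate_db, sigma=[5.0, 5.0]))  # 2.0
```

The observation lies equally close to the first two entries and far from the third, so the estimate is essentially the average of 1 and 3 mm h−1. This also shows why biases in the database rates carry straight through into the retrieval.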
b. Precipitation estimates from 10 radiometers with GPROF
The first part of this study uses the surface precipitation estimates from 10 sensors over the 65°S–65°N latitudinal band: AMSR2 on board the GCOM-W1 satellite; GMI on board the GPM Core Observatory; SSMIS on board the F16, F17, and F18 satellites; ATMS on board the Suomi-NPP satellite; and MHS on board the MetOp-A, MetOp-B, NOAA-18, and NOAA-19 satellites.
For clarity and convenience, these sensors are referred to as AMSR2, GMI, ATMS, SSMIS-F16, SSMIS-F17, SSMIS-F18, MHS-MetOpA, MHS-MetOpB, MHS-NOAA18, and MHS-NOAA19. Hereafter, these abbreviations denote either the sensors themselves or their GPROF-retrieved precipitation rates, depending on the context.
Precipitation rates for all 10 sensors are generated by GPROF (Kummerow et al. 2015). GPROF uses 14 surface types: ocean, sea ice, maximum vegetation, high vegetation, moderate vegetation, low vegetation, minimum vegetation, maximum snow, moderate snow, low snow, minimum snow, standing water, water–land boundary (coast), and water–sea ice boundary. To increase the sample size for each surface type, this study combines the five vegetation categories into a single “vegetation” type and the four snow categories into a single “snow” type, yielding seven surface types in total. In addition, GPROF does not use all available channels of each sensor, and for SSMIS the channels used differ between land and ocean (Table 1). This feature has important implications for the retrieval results.
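The regrouping of the 14 GPROF surface types into the seven categories used here can be written as a simple lookup table. The sketch below keys on the surface-type names listed above rather than on numeric codes, because the numeric codes stored in GPROF files are not given in this section; the table and function names are hypothetical.

```python
# The 14 GPROF surface types named in the text, grouped into 7 categories.
GPROF_TO_COMBINED = {
    "ocean": "ocean",
    "sea ice": "sea ice",
    "maximum vegetation": "vegetation",
    "high vegetation": "vegetation",
    "moderate vegetation": "vegetation",
    "low vegetation": "vegetation",
    "minimum vegetation": "vegetation",
    "maximum snow": "snow",
    "moderate snow": "snow",
    "low snow": "snow",
    "minimum snow": "snow",
    "standing water": "standing water",
    "water-land boundary": "coast",
    "water-sea ice boundary": "water-sea ice boundary",
}


def combine_surface_type(gprof_type: str) -> str:
    """Map a GPROF surface-type name (case-insensitive) to its combined category."""
    return GPROF_TO_COMBINED[gprof_type.lower()]


print(len(GPROF_TO_COMBINED), len(set(GPROF_TO_COMBINED.values())))  # 14 7
print(combine_surface_type("Low Vegetation"))  # vegetation
```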
c. Precipitation estimates from SAPHIR by PRPS
In the second part of this work, we include the precipitation estimates from the cross-track-scanning SAPHIR in the evaluation. Kidd (2018) recently developed the Precipitation Retrieval and Profiling Scheme (PRPS) to estimate precipitation rates from SAPHIR, which has only six channels, with different bandwidths around 183.3 GHz, and covers the tropical region of 30°S–30°N. The precipitation estimates from SAPHIR have not been evaluated as extensively as those from GPROF, and understanding of their performance remains limited.
We obtain the SAPHIR precipitation rate from the latest version of PRPS (V02–02). For a fair comparison, precipitation estimates from the other sensors (KuPR and the other radiometers) are limited to the SAPHIR-covered region of 30°S–30°N.
d. Spatial and temporal coverage
Data from all sensors cover the GPM region of ~65°S–65°N, except the SAPHIR observations, which cover only ~30°S–30°N. The input brightness temperatures for all sensors are the intercalibrated level-1C data (Berg et al. 2016), produced and distributed by the NASA Precipitation Processing System (PPS). In terms of temporal coverage, data for all sensors span March 2014 (shortly after the launch of the GPM Core Observatory) to December 2018, except those from MHS-NOAA18 and the three SSMIS sensors. MHS-NOAA18 ceased to work in late October 2018, so only its data from March 2014 to October 2018 are used. For SSMIS-F16, the 183.3-GHz channels (183.3 ± 1, ± 3, and ± 7 GHz) were not used in GPROF from December 2013 to August 2015 because of excessive noise in the brightness temperatures. These data were flagged as bad and set to missing in the input level-1C dataset by the quality control performed during dataset development and production. To avoid any influence of these unavailable 183.3-GHz channels on the SSMIS-F16 retrievals, we select the period from September 2015 to December 2018, when all three SSMIS sensors had working 183.3-GHz channels. It is also worth noting that the 150-GHz channel of SSMIS-F18 stopped functioning in February 2012; no obvious impact of this loss on SSMIS-F18 is noticed in the later analysis.
e. Collocation scheme
The KuPR precipitation rate has a spatial resolution of ~5 km. Retrieved surface precipitation rates for SSMIS, AMSR2, and GMI are provided at the resolutions of their ~19-GHz channels, which are approximately 59, 22, and 15 km, respectively. For SAPHIR, MHS, and ATMS, the nominal resolutions of the retrieved surface precipitation rates are ~10, ~16, and ~16 km at nadir, respectively. Note that GPROF does not produce retrievals for the 9 and 11 pixels at the right and left edges, respectively, of each ATMS scan line. For MHS, no retrieval is performed for five pixels at each end of the scan line. Omitting these pixels alleviates the negative impact of the large footprints at the edges of the sounder scan lines.
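To make the scan-edge exclusion concrete, the sketch below drops edge pixels along the cross-track axis. It assumes 96 fields of view per ATMS scan line and 90 per MHS scan line, stores each swath as a hypothetical (scan, pixel) array, and treats pixel index 0 as the "left" end of the scan; that orientation, the array layout, and the function name are assumptions for illustration.

```python
import numpy as np


def trim_scan_edges(swath, n_left, n_right):
    """Drop n_left / n_right pixels from the two ends of every scan line.

    swath is assumed to be shaped (n_scans, n_pixels) with pixel index 0 at
    the 'left' end of the scan (an assumed orientation).
    """
    swath = np.asarray(swath)
    n_pix = swath.shape[-1]
    return swath[..., n_left:n_pix - n_right]


atms = np.zeros((3, 96))  # hypothetical ATMS swath: 96 fields of view per scan
mhs = np.zeros((3, 90))   # hypothetical MHS swath: 90 fields of view per scan
print(trim_scan_edges(atms, n_left=11, n_right=9).shape)  # (3, 76)
print(trim_scan_edges(mhs, n_left=5, n_right=5).shape)    # (3, 80)
```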
To mitigate the large resolution difference between KuPR and these radiometers, we take 15 km as the nominal resolution and average 3 × 3 KuPR pixels to match it. Specifically, we take the arithmetic mean of the KuPR precipitation rates over every three pixels in both the cross-track and along-track directions. An alternative scheme would be to compute the average KuPR value within each radiometer footprint, instead of averaging the nine KuPR pixels first. We average 3 × 3 KuPR pixels first to reduce the KuPR sample size, which makes the collocation between KuPR and each radiometer more computationally efficient (keeping in mind that there are 11 radiometers in this study, including GMI).
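The 3 × 3 aggregation can be sketched with a NumPy reshape. The sketch assumes the KuPR rates are held in a (scan, ray) array with missing values already removed, and it drops trailing scans or rays that do not fill a complete 3 × 3 block; how partial blocks are handled is not stated in the text, so that choice, like the function name, is an assumption.

```python
import numpy as np


def block_average_3x3(rate):
    """Mean of non-overlapping 3 x 3 blocks of a (scans, rays) KuPR array.

    Trailing scans/rays that do not fill a complete block are dropped
    (an assumption). Missing values are assumed to be removed beforehand.
    """
    rate = np.asarray(rate, dtype=float)
    n_scan3, n_ray3 = rate.shape[0] // 3, rate.shape[1] // 3
    trimmed = rate[: n_scan3 * 3, : n_ray3 * 3]
    # Split each axis into (block index, position within block), then average
    # over the within-block positions.
    return trimmed.reshape(n_scan3, 3, n_ray3, 3).mean(axis=(1, 3))


demo = np.arange(36, dtype=float).reshape(6, 6)  # synthetic ~5-km "rates"
print(block_average_3x3(demo))
```

The same routine could be applied to pixel-center latitudes and longitudes to obtain the coordinates of the aggregated ~15-km pixels, although simple averaging of longitudes would need care near the dateline.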
For the radiometer retrievals, their native retrieval resolutions are used. As mentioned previously, the retrieval resolutions of all sensors are close to the 15-km nominal resolution, except that of SSMIS (59 km). Roughly 16 aggregated (15 km) KuPR pixels are needed to match the SSMIS retrieval resolution. Analyses indicate that the statistical metrics shown later can vary depending on the KuPR subpixel variability within each SSMIS footprint, as demonstrated by previous studies (Varma et al. 2004; Kirstetter et al. 2015). We nevertheless use the original SSMIS retrieval resolution for two reasons. First, it is almost impossible to know the subpixel variability when incorporating SSMIS retrievals into the level-3 merged product; in fact, IMERG does not account for the retrieval resolution and maps level-2 pixels onto the 0.1° grid via nearest-neighbor interpolation. Second, the collocated sample size between KuPR and SSMIS would otherwise be too small for robust statistical analysis. For example, the current collocation scheme yields 357 893 collocations between KuPR and SSMIS-F16 over ocean; this number drops to 38 963 when only SSMIS pixels with 16 KuPR pixels in their field of view are kept.
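A quick check of the two numbers in this paragraph, assuming that footprints can be treated as squares whose side equals the nominal resolution:

```latex
N_{\mathrm{KuPR}} \approx \left(\frac{59\ \mathrm{km}}{15\ \mathrm{km}}\right)^{2}
\approx 3.93^{2} \approx 15.5 \approx 16,
\qquad
\frac{38\,963}{357\,893} \approx 0.109 .
```

That is, requiring full KuPR coverage of each SSMIS footprint would retain only about 11% of the ocean collocations for SSMIS-F16.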
f. Coincident observations from KuPR and other sensors
The purpose of this study is to use the KuPR observations as the reference. To this end, we must clearly define when the observations from KuPR and other sensors are considered “coincident” observations.
For GMI, obtaining coincident observations is simple, since KuPR and GMI are both on the GPM Core Observatory and observe at nearly the same time. For each aggregated KuPR pixel (~15 km), the nearest GMI pixel is attached, yielding the KuPR–GMI coincident observations.
We define observations from KuPR and each of the other nine sensors (e.g., KuPR and AMSR2) as coincident when they are less than 5 min apart and less than 5 km away. These two thresholds (5 min and 5 km) are chosen as a trade-off between sample size and the accuracy of the coincidence. A similar procedure has been used in satellite intercalibration (Yang et al. 2011) and in computing brightness temperature temporal variations (You et al. 2017a, 2018). We have also tested other thresholds (e.g., 10 min, 15 min, 2 km, and 10 km) and found that the conclusions of this study (e.g., the performance ranking of all sensors) are robust to the choice of coincidence parameters, even though the absolute numbers change.
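A brute-force sketch of the coincidence test follows. It assumes pixel centers in degrees and observation times in minutes since a common epoch, measures great-circle distance with the haversine formula on a spherical Earth (radius 6371 km), and, for each aggregated KuPR pixel, keeps the nearest radiometer pixel among those observed less than 5 min apart, provided it lies less than 5 km away. The function names are hypothetical; the full distance matrix is only practical for small examples, and a spatial index (e.g., a k-d tree) would be needed for whole swaths.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical-Earth assumption)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees (broadcasts)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def match_coincident(ku_lat, ku_lon, ku_time, rad_lat, rad_lon, rad_time,
                     max_km=5.0, max_minutes=5.0):
    """Index of the coincident radiometer pixel for each KuPR pixel, or -1."""
    ku_lat, ku_lon, ku_time = map(np.asarray, (ku_lat, ku_lon, ku_time))
    rad_lat, rad_lon, rad_time = map(np.asarray, (rad_lat, rad_lon, rad_time))
    dist = haversine_km(ku_lat[:, None], ku_lon[:, None],
                        rad_lat[None, :], rad_lon[None, :])
    # Exclude radiometer pixels outside the time window before the search.
    dt = np.abs(ku_time[:, None] - rad_time[None, :])
    dist = np.where(dt < max_minutes, dist, np.inf)
    nearest = np.argmin(dist, axis=1)
    nearest_dist = dist[np.arange(ku_lat.size), nearest]
    return np.where(nearest_dist < max_km, nearest, -1)


# Hypothetical example: the first KuPR pixel has a radiometer pixel ~2.2 km
# away and 3 min apart; the second has one at the same spot but 12 min apart.
idx = match_coincident([0.0, 10.0], [0.0, 0.0], [0.0, 0.0],
                       [0.02, 10.0], [0.0, 0.0], [3.0, 12.0])
print(idx.tolist())  # [0, -1]
```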
After obtaining the coincident observations, we analyze the precipitation retrieval performance of all passive microwave sensors except GMI against KuPR. We intentionally do not compare the GMI precipitation estimates with those from the other radiometers, because such a comparison would favor GMI, which shares a satellite with KuPR and observes at nearly the same time. Instead, this study uses the performance of the GMI precipitation estimates as a sample size indicator, as explained in more detail in section 4.
To assess the precipitation detection performance of the passive microwave sensors, we compute the four numbers in the 2 × 2 contingency table (i.e., hit, miss, false alarm, and correct negative). In the following discussions, we take the AMSR2 precipitation estimates as an example, and the definitions are equally applied to the other passive microwave sensors.
Definitions for these four numbers are based on Wilks (2011). A hit is defined as both KuPR and AMSR2 detecting precipitation. A false alarm is when AMSR2 detects precipitation while KuPR detects no precipitation. A miss is when KuPR detects precipitation but AMSR2 does not. A correct negative is when both KuPR and AMSR2 detect no precipitation. This study uses 0.2 mm h−1 as the precipitation/no-precipitation threshold value. Choosing other values (e.g., 0.1 or 0.5 mm h−1) does not change our conclusions, though the numerical values change.
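The four contingency counts can be computed with boolean masks. The sketch below applies the 0.2 mm h−1 threshold to both KuPR and the radiometer; treating a rate equal to the threshold as precipitating is an assumption, since the text does not say whether the threshold is inclusive, and the function name is hypothetical.

```python
import numpy as np


def contingency_counts(ku_rate, rad_rate, threshold=0.2):
    """Return (hits, false_alarms, misses, correct_negatives).

    A pixel is 'precipitating' when its rate is >= threshold (mm/h);
    inclusivity of the threshold is an assumption.
    """
    ku_yes = np.asarray(ku_rate, dtype=float) >= threshold
    rad_yes = np.asarray(rad_rate, dtype=float) >= threshold
    hits = int(np.sum(ku_yes & rad_yes))
    false_alarms = int(np.sum(~ku_yes & rad_yes))
    misses = int(np.sum(ku_yes & ~rad_yes))
    correct_negatives = int(np.sum(~ku_yes & ~rad_yes))
    return hits, false_alarms, misses, correct_negatives


ku = [0.0, 0.5, 1.2, 0.1, 3.0]   # synthetic KuPR rates (mm/h)
rad = [0.3, 0.0, 0.9, 0.0, 2.5]  # synthetic radiometer rates (mm/h)
print(contingency_counts(ku, rad))  # (2, 1, 1, 1)
```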
From these four numbers, we compute the commonly used accuracy metrics: probability of detection (POD), false alarm rate (FAR), and Heidke skill score (HSS). POD quantifies the fraction of precipitating KuPR pixels that AMSR2 correctly identifies as precipitating; it varies from 0 to 1, with a larger POD indicating better detection performance. FAR quantifies the fraction of nonprecipitating KuPR pixels that AMSR2 incorrectly identifies as precipitating; it varies from 0 to 1, with a smaller FAR indicating better detection performance. FAR should not be confused with the false alarm ratio, defined as the fraction of precipitating AMSR2 pixels that are false alarms (Wilks 2011). A large POD value is often associated with a large FAR value, which makes it difficult to assess detection performance using POD or FAR alone. This study therefore uses HSS (ranging from −1 to +1) to judge the overall detection performance. HSS is a generalized skill score that quantifies how well AMSR2 detects precipitation relative to random chance; an HSS greater than zero indicates performance better than random chance, and an HSS of 1 indicates perfect detection.
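Written with a = hits, b = false alarms, c = misses, and d = correct negatives, the three scores can be computed as below. The HSS expression is the standard 2 × 2 form, 2(ad − bc)/[(a + c)(c + d) + (a + b)(b + d)]; the function name is hypothetical, and the sketch assumes nonzero denominators.

```python
def detection_scores(hits, false_alarms, misses, correct_negatives):
    """POD, false alarm rate (FAR), and Heidke skill score from a 2 x 2 table."""
    a, b, c, d = hits, false_alarms, misses, correct_negatives
    pod = a / (a + c)  # fraction of KuPR-precipitating pixels that are detected
    far = b / (b + d)  # false alarm RATE: fraction of KuPR-dry pixels flagged
    # (The false alarm RATIO would instead be b / (a + b).)
    hss = 2.0 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    return pod, far, hss


pod, far, hss = detection_scores(hits=50, false_alarms=10, misses=50,
                                 correct_negatives=890)
print(f"POD={pod:.3f}  FAR={far:.3f}  HSS={hss:.3f}")  # POD=0.500  FAR=0.011  HSS=0.595
```

A perfect detector (b = c = 0) gives HSS = 1, while a detector with no skill beyond chance gives HSS near 0.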
In the hit category (i.e., when both KuPR and AMSR2 detect precipitation), we also compute the correlation coefficient between KuPR and AMSR2.
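Restricting the correlation to the hit category amounts to masking both series before computing the coefficient. The sketch assumes Pearson correlation and the same 0.2 mm h−1 threshold as the contingency table; the function name is hypothetical.

```python
import numpy as np


def hit_correlation(ku_rate, rad_rate, threshold=0.2):
    """Pearson correlation between KuPR and radiometer rates over hits only."""
    ku = np.asarray(ku_rate, dtype=float)
    rad = np.asarray(rad_rate, dtype=float)
    hit = (ku >= threshold) & (rad >= threshold)
    return float(np.corrcoef(ku[hit], rad[hit])[0, 1])


# Synthetic example: only the middle three pixels are hits, and there the
# radiometer rate is exactly the KuPR rate plus 0.5 mm/h.
r = hit_correlation([0.0, 1.0, 2.0, 4.0, 0.1], [0.5, 1.5, 2.5, 4.5, 0.0])
print(round(r, 6))  # 1.0
```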
To further evaluate precipitation intensity, we compute the normalized bias (nBIAS) and normalized root-mean-square error (nRMSE) in different KuPR precipitation intensity bins (Lin and Hou 2008; Tang et al. 2014). Without binning by precipitation intensity, these two metrics can easily be weighted toward the most frequent light-precipitation pixels (Conner and Petty 1998; Lin and Hou 2008). In each KuPR precipitation intensity bin (e.g., 0.2–0.5 mm h−1), the two metrics are computed as follows:
$$\mathrm{nBIAS} = \frac{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i\right)}{\bar{x}},$$

$$\mathrm{nRMSE} = \frac{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - x_i\right)^{2}}}{\bar{x}},$$

where $x_i$ and $y_i$ are the KuPR and AMSR2 precipitation rates, respectively, and $\bar{x}$ and $n$ are the mean KuPR precipitation rate and the sample size in that bin. Both zero and nonzero concurrent AMSR2 precipitation rates are included.
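The binned nBIAS and nRMSE can be sketched as follows. Bins are taken as half-open intervals [lo, hi) of the KuPR rate, which is an assumption, and zero radiometer rates are kept, as stated in the text. Each metric is normalized by the mean KuPR rate in the bin. The function name and the example bin edges are hypothetical.

```python
import numpy as np


def binned_nbias_nrmse(ku_rate, rad_rate, bin_edges):
    """nBIAS and nRMSE of radiometer rates within bins of KuPR rate.

    Returns a list of (lo, hi, nbias, nrmse); empty bins give NaN.
    Bins are [lo, hi) (an assumption). Zero radiometer rates are kept.
    """
    x = np.asarray(ku_rate, dtype=float)   # KuPR (reference) rates
    y = np.asarray(rad_rate, dtype=float)  # radiometer rates
    results = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (x >= lo) & (x < hi)
        if not np.any(in_bin):
            results.append((lo, hi, np.nan, np.nan))
            continue
        xb, yb = x[in_bin], y[in_bin]
        x_mean = xb.mean()
        nbias = np.mean(yb - xb) / x_mean
        nrmse = np.sqrt(np.mean((yb - xb) ** 2)) / x_mean
        results.append((lo, hi, nbias, nrmse))
    return results


for lo, hi, nb, nr in binned_nbias_nrmse([0.3, 0.4, 1.0, 1.0],
                                         [0.0, 0.7, 0.5, 1.5],
                                         bin_edges=[0.2, 0.5, 2.0]):
    print(f"{lo}-{hi} mm/h: nBIAS={nb:.3f}, nRMSE={nr:.3f}")
```

In this synthetic example, both bins have zero bias because over- and underestimates cancel, yet nRMSE is large. This is why the two metrics are reported together.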
This section begins by explaining why we use the performance of the GMI precipitation estimates as a sample size indicator over different surface types. The results are then presented in two subsections. The first focuses on the performance of nine sensors (AMSR2, SSMIS-F16, SSMIS-F17, SSMIS-F18, ATMS, MHS-MetOpA, MHS-MetOpB, MHS-NOAA18, and MHS-NOAA19) over the 65°S–65°N region. In the second, we include the SAPHIR precipitation estimates in the evaluation and limit the other sensors to the SAPHIR latitudes (30°S–30°N).
a. GMI performance as a sample size indicator
Figures 1a–j show the coincident observations between KuPR and each of AMSR2, SSMIS-F16, SSMIS-F17, SSMIS-F18, ATMS, MHS-MetOpA, MHS-MetOpB, MHS-NOAA18, and MHS-NOAA19. The sample size differs greatly among these sensors, from the largest for AMSR2 (40 980 427) to the smallest for ATMS (6 441 257).
The large differences in sample size among the sensors may lead to an unfair comparison. For example, AMSR2 has many more samples over snow-covered areas and sea ice than the other sensors. Precipitation retrieval over these surface types remains very challenging and is known to have larger uncertainties (Kummerow et al. 2015; You et al. 2015, 2016, 2017a). To ensure that the results for each sensor are not affected by its sampling, we use GMI performance as a sampling indicator as follows. Each KuPR observation has a concurrent GMI observation. Therefore, for each coincident pair between KuPR and a passive microwave sensor, we also compare GMI with KuPR; effectively, we select subsets of the GMI–KuPR comparison based on the sampling of the various coincidences. This reveals how differences in sampling may affect the performance.
We compare each of these GMI subsets (i.e., the GMI retrievals concurrent with a given radiometer) with KuPR over the different surface types. The essential idea is that if each GMI subset is large enough, all subsets should show similar performance.
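The subset idea can be sketched by computing the GMI–KuPR correlation separately for each boolean mask of coincidences. The data below are synthetic (gamma-distributed "KuPR" rates plus Gaussian noise for "GMI"), and the mask dictionary, names, and subset fractions are hypothetical. Large random subsets should give nearly identical correlations, whereas a very small subset typically scatters more.

```python
import numpy as np


def gmi_subset_correlations(ku_rate, gmi_rate, subset_masks):
    """GMI-vs-KuPR correlation within each coincidence subset.

    subset_masks maps a sensor name to a boolean array flagging the KuPR
    pixels that are coincident with that sensor (hypothetical structure).
    """
    ku = np.asarray(ku_rate, dtype=float)
    gmi = np.asarray(gmi_rate, dtype=float)
    return {name: float(np.corrcoef(ku[mask], gmi[mask])[0, 1])
            for name, mask in subset_masks.items()}


rng = np.random.default_rng(0)
ku = rng.gamma(shape=0.8, scale=2.0, size=100_000)  # synthetic "KuPR" rates
gmi = ku + rng.normal(0.0, 1.0, size=ku.size)       # synthetic "GMI" rates
masks = {
    "large subset A": rng.random(ku.size) < 0.3,
    "large subset B": rng.random(ku.size) < 0.3,
    "small subset": rng.random(ku.size) < 0.0005,
}
for name, r in gmi_subset_correlations(ku, gmi, masks).items():
    print(f"{name}: r = {r:.3f}")
```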
The correlation between the GMI and KuPR precipitation rates for each subset (i.e., concurrent with each of the other radiometers) is shown in Fig. 2a for the seven surface types. Over ocean, the nine GMI subsets (indicated by sensor name in Fig. 2b) perform very similarly, with correlations of ~0.75. This means that the sampling differences among the sensors have a negligible effect. A similar pattern is observed over vegetation and coast, with all correlations clustered closely around ~0.60 and ~0.55, respectively. On the other hand, the correlation varies greatly over the other four surface types (sea ice, snow, standing water, and water–ice boundary), indicating a possible sampling effect on the results (keeping in mind that these are all GMI–KuPR comparisons). A similar feature appears in the bias analysis (Fig. 2b): the biases of the nine GMI subsets are closer to each other over ocean, vegetation, and coast than over the other four surface types.
To further investigate the sample size issue, we analyze quantile–quantile (Q–Q) plots of all KuPR precipitation rates against the KuPR precipitation rates in each subset (i.e., KuPR coincident with each sensor). Kirstetter et al. (2012) employed a similar approach to check sample representativeness between TRMM PR precipitation rates and ground radar observations. Figure 2c shows the Q–Q plot of all KuPR precipitation rates against the ATMS-subset KuPR precipitation rates over ocean. The quantiles of the two datasets are almost identical, as indicated by their close match to the 1:1 line (red curve in Fig. 2c). The quantiles of the other KuPR subsets (not shown) also closely match those of all KuPR precipitation rates. These nearly identical quantiles indicate that the subset samples are representative of the overall KuPR distribution over ocean. The Q–Q plots over vegetation and coast are similar to those over ocean.
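The Q–Q comparison reduces to evaluating matching quantiles of the full KuPR sample and of a coincidence subset; plotting one against the other (e.g., with a 1:1 reference line) gives the Q–Q plot, which is omitted here. The sketch uses synthetic gamma-distributed rates, and the function name and the choice of the 1st–99th percentiles are assumptions. A random subset tracks the full sample closely, whereas a subset that oversamples heavier rates drifts away from the 1:1 line.

```python
import numpy as np


def qq_points(all_rates, subset_rates, probs=None):
    """Matching quantiles of the full KuPR sample and a coincidence subset."""
    if probs is None:
        probs = np.linspace(0.01, 0.99, 99)  # 1st to 99th percentiles
    return np.quantile(all_rates, probs), np.quantile(subset_rates, probs)


rng = np.random.default_rng(1)
all_rates = rng.gamma(shape=0.8, scale=2.0, size=200_000)  # synthetic rates
random_subset = rng.choice(all_rates, size=20_000, replace=False)
heavy_subset = all_rates[all_rates > 0.5]  # oversamples heavier rates
for name, sub in (("random", random_subset), ("heavy-biased", heavy_subset)):
    q_all, q_sub = qq_points(all_rates, sub)
    gap = float(np.max(np.abs(q_sub - q_all)))
    print(f"{name}: max |quantile gap| = {gap:.3f} mm/h")
```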
In contrast, over snow-covered regions, clear discrepancies appear between the quantiles from all KuPR precipitation rates and those from the ATMS-subset KuPR precipitation rates (Fig. 2d) for intensities greater than 4 mm h−1. This implies that the ATMS-subset KuPR precipitation rates are not sufficiently representative of the overall KuPR precipitation distribution. Similar discrepancies are observed for the other sensor subsets and over the other three surface types (sea ice, standing water, and water–ice boundary) (not shown).
Although the subsets of GMI precipitation retrievals over coastal regions show similar performance (Figs. 2a,b), this study does not analyze retrieval results over coastal regions. This is because pixels from SSMIS and pixels at the scan edges of the sounders (ATMS and MHSs) have a much larger footprint than those of AMSR2. Over coastal regions, retrieval performance is thus strongly influenced by the land/ocean fraction within each pixel rather than by physical factors such as channel availability. Our analyses therefore focus only on the vegetation and ocean surface types.
b. Precipitation estimate performance over 65°S–65°N
1) Detection performance
Figure 3 shows the detection statistics over ocean and vegetation for the GPROF estimates from the nine sensors. Over ocean, all five sounders (MHSs and ATMS) have similar POD values of ~53% (Fig. 3a) and FAR values of ~8% (Fig. 3b). Their overall detection performance is also similar, with HSS values of ~0.33 (Fig. 3c). The detection performance of the imagers (SSMIS and AMSR2) is markedly better. Specifically, the three SSMISs perform similarly, with POD, FAR, and HSS of ~65%, ~6%, and ~0.46, respectively. AMSR2 performs marginally better than the SSMISs, with an HSS of 0.49.
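The detection scores follow from the standard 2×2 contingency table of hits (a), false alarms (b), misses (c), and correct negatives (d): POD = a/(a + c), FAR = b/(a + b), and HSS = 2(ad − bc)/[(a + c)(c + d) + (a + b)(b + d)]. The sketch below computes them from paired rates. The rain/no-rain `threshold` is a placeholder, since the study's value is not restated in this section, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def contingency(est: Sequence[float], ref: Sequence[float],
                threshold: float) -> Tuple[int, int, int, int]:
    """Count hits (a), false alarms (b), misses (c), and correct negatives (d).

    A pixel counts as raining when its rate exceeds `threshold` (mm/h); the
    threshold is an illustrative placeholder, not the study's setting.
    """
    a = b = c = d = 0
    for e, r in zip(est, ref):
        sat_rain, radar_rain = e > threshold, r > threshold
        if sat_rain and radar_rain:
            a += 1
        elif sat_rain:
            b += 1  # radiometer reports rain, KuPR does not
        elif radar_rain:
            c += 1  # KuPR reports rain, radiometer does not
        else:
            d += 1
    return a, b, c, d


def detection_scores(a: int, b: int, c: int, d: int) -> Tuple[float, float, float]:
    """Return (POD, FAR, HSS) from the 2x2 contingency table."""
    pod = a / (a + c)
    far = b / (a + b)
    hss = 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))
    return pod, far, hss


# Toy example with hypothetical rates (mm/h): one pixel in each category.
print(detection_scores(*contingency([0.0, 1.2, 2.5, 0.0], [0.0, 0.8, 0.0, 3.1], threshold=0.1)))
# → (0.5, 0.5, 0.0)
```

HSS measures skill relative to random chance, so a table with equal counts in every cell scores zero even though half the rain is detected.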
The different channels used for detection over ocean (Table 1) largely account for the better detection performance of the imagers relative to the sounders. On the sounders, the low-frequency channels below 89 GHz are either unavailable (MHS) or not used (ATMS). The sounders therefore capture almost no emission signature from liquid water drops, leading to their inferior detection performance. In contrast, the 19-GHz channel on AMSR2 and SSMIS captures this emission signature much better, resulting in better detection performance.
A very different picture emerges over vegetation. The most striking difference is that neither the imagers nor the sounders are clearly superior there. The HSS values are all close to 0.60 (Fig. 3f), except for the notably smaller values of 0.54 from SSMIS-F17 and 0.56 from ATMS. SSMIS-F17 also has the largest FAR value (Fig. 3e). The equator crossing times of SSMIS-F17 and SSMIS-F18 are very similar from 2015 to 2018; in particular, in 2017 they are less than 20 min apart. Therefore, the precipitation diurnal cycle is unlikely to be responsible for the poorer performance of SSMIS-F17. Work is underway to track down the exact reason why SSMIS-F17 performs worse than the other two SSMISs on board F16 and F18. Our preliminary analysis indicates that the unavailability of the vertically polarized 37-GHz channel on SSMIS-F17 since April 2016 is a highly likely cause. Another possible reason is calibration error in the high-frequency channels (91, 150, and 183.3 GHz), since all three SSMISs perform similarly over ocean, where the primary signature is the liquid water emission signal in the low-frequency channels (e.g., 19 GHz). The Precipitation Measurement Missions (PMM) intercalibration working group found significant biases resulting from an emissive main reflector on the SSMISs (Berg and Sapiano 2013). Corrections of up to 5 K or more were applied, but some residual calibration errors may remain and need to be reevaluated, given that SSMIS-F17 differs so markedly from the other SSMISs in precipitation detection. As for ATMS, its lower HSS and POD values compared with the MHS sensors (Figs. 3d,f) are possibly caused by the doubled footprint size of its 89-GHz channel (e.g., 32 versus 16 km at nadir).
Interestingly, the HSS value over ocean (Fig. 3c) for each sensor is clearly smaller than over land (Fig. 3f). This lower performance arises because, over ocean, GPROF adds precipitating pixels below the KuPR detection limit to minimize the discrepancy between observed and simulated brightness temperatures (Kummerow et al. 2011). Adding these raining pixels compensates for the limitation of KuPR in detecting drizzle, a limitation revealed by comparison with the more sensitive CloudSat. This procedure inevitably increases the number of light rainfall events below the KuPR detection limit. Consequently, these pixels are counted as “false alarms” when KuPR is taken as the reference, leading to larger FAR values (cf. Figs. 3b and 3e) and smaller HSS values (cf. Figs. 3c and 3f) over ocean than over land. In short, the apparently poorer performance of the sensors over ocean than over vegetation reflects a limitation of using KuPR as the “reference.”
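The arithmetic behind this argument is simple: a pixel where GPROF adds drizzle but KuPR records no rain turns from a correct negative into a false alarm, while hits and misses are untouched. The sketch below uses hypothetical counts (not values from Fig. 3) to show that this relabeling alone raises FAR and lowers HSS; the function names are illustrative.

```python
from typing import Tuple

Table = Tuple[int, int, int, int]  # hits a, false alarms b, misses c, correct negatives d


def far(a: int, b: int) -> float:
    """False alarm ratio b / (a + b)."""
    return b / (a + b)


def hss(a: int, b: int, c: int, d: int) -> float:
    """Heidke skill score of the 2x2 contingency table."""
    return 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))


def add_sub_threshold_drizzle(a: int, b: int, c: int, d: int, n_drizzle: int) -> Table:
    """Move pixels that receive retrieved drizzle but are dry in KuPR from d to b."""
    if not 0 <= n_drizzle <= d:
        raise ValueError("n_drizzle must be between 0 and the number of correct negatives")
    return a, b + n_drizzle, c, d - n_drizzle


# Hypothetical ocean counts; only b and d change when drizzle is added.
base = (600, 40, 300, 9060)
with_drizzle = add_sub_threshold_drizzle(*base, n_drizzle=200)
print(far(*base[:2]), far(*with_drizzle[:2]))
print(hss(*base), hss(*with_drizzle))
```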
To summarize, in terms of the detection performance, imagers (AMSR2 and SSMIS) outperform sounders (ATMS and MHS) over ocean indicated by larger HSS values, but no clear superiority is observed among these nine sensors over vegetation.
2) Precipitation intensity comparison over ocean
This section compares the precipitation intensity based on coincident observations between KuPR and each sensor. Only observations in the hit category (i.e., when both KuPR and each sensor detect precipitation) are included. The results are presented separately over ocean and over vegetation.
Figure 4 shows scatterplots between KuPR and each sensor over ocean. Precipitation rates from AMSR2 agree best with those from KuPR, as indicated by a higher density of data close to the one-to-one line in Fig. 4a and a larger correlation coefficient of 0.69 (Fig. 5a). The SSMIS sensors have similar scatterplots (Figs. 4b–d), with a correlation coefficient of 0.57 (Fig. 5a). All five sounders show similar scatterplots (Figs. 4e–i), with a much lower correlation coefficient of ~0.40 (Fig. 5a).
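Because only the hit category enters the intensity comparison, the correlation is computed on pairs where both instruments report rain. A minimal sketch, assuming a single rain/no-rain threshold applied to both datasets and the Pearson coefficient on linear (not log-transformed) rates; neither the threshold value nor any transformation is stated in this section, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def hit_pairs(est: Sequence[float], ref: Sequence[float],
              threshold: float) -> List[Tuple[float, float]]:
    """Keep only hit pairs: both radiometer and KuPR exceed the rain threshold."""
    return [(e, r) for e, r in zip(est, ref) if e > threshold and r > threshold]


def pearson(pairs: Sequence[Tuple[float, float]]) -> float:
    """Pearson correlation coefficient of (radiometer, KuPR) rate pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two hit pairs")
    mean_e = sum(e for e, _ in pairs) / n
    mean_r = sum(r for _, r in pairs) / n
    sxy = sum((e - mean_e) * (r - mean_r) for e, r in pairs)  # co-deviation sum
    sxx = sum((e - mean_e) ** 2 for e, _ in pairs)            # radiometer deviation sum
    syy = sum((r - mean_r) ** 2 for _, r in pairs)            # KuPR deviation sum
    return sxy / math.sqrt(sxx * syy)


# Hypothetical rates (mm/h); the first and last pairs are not hits and are dropped.
pairs = hit_pairs([0.0, 1.0, 2.0, 3.0, 5.0], [0.4, 2.0, 4.0, 6.0, 0.0], threshold=0.1)
print(len(pairs), pearson(pairs))
```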
To give a clearer picture of the precipitation intensity, Fig. 6 shows the probability density function (PDF) of precipitation occurrence as a function of precipitation intensity for the nine radiometers over ocean. The occurrence number (y axis) is normalized by the total number of observations from each sensor and is thus expressed as a percentage. The PDF from AMSR2 (Fig. 6a) is closest to that from KuPR, followed by those from the three SSMISs (Figs. 6b–d). The PDFs from ATMS (Fig. 6e) and the four MHS sensors (Figs. 6f–i) deviate the most from that of KuPR.
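The occurrence PDF and the amount-weighted histogram of Fig. 7 can be computed from the same bins. The sketch below is a minimal illustration, assuming half-open, logarithmically spaced bins that double in width (the bin edges used for Figs. 6 and 7 are not given here) and normalizing by the total number, or total amount, of a sensor's observations so that each curve is in percent. The names `EDGES`, `pdf_by_occurrence`, and `pdf_by_amount` are hypothetical.

```python
import bisect
from typing import List, Optional, Sequence

# Hypothetical doubling bin edges from 0.125 to 64 mm/h.
EDGES = [2.0 ** k for k in range(-3, 7)]


def _bin_index(rate: float, edges: Sequence[float]) -> Optional[int]:
    """Index of the half-open bin [edges[i], edges[i+1]) holding rate, or None if outside."""
    i = bisect.bisect_right(edges, rate) - 1
    return i if 0 <= i < len(edges) - 1 else None


def pdf_by_occurrence(rates: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Percentage of all observations that fall in each bin."""
    counts = [0] * (len(edges) - 1)
    for r in rates:
        i = _bin_index(r, edges)
        if i is not None:
            counts[i] += 1
    return [100.0 * c / len(rates) for c in counts]


def pdf_by_amount(rates: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Percentage of the total precipitation amount contributed by each bin."""
    amounts = [0.0] * (len(edges) - 1)
    for r in rates:
        i = _bin_index(r, edges)
        if i is not None:
            amounts[i] += r
    total = sum(rates)
    return [100.0 * a / total for a in amounts]
```

Heavy rates are rare but carry much of the water, so the amount-weighted version emphasizes the heavy tail more than the occurrence PDF does, which is why differences in heavy precipitation stand out more clearly in histograms by amount.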
Furthermore, the PDFs from these five sounders have two peaks, around 1 and 4 mm h−1. The peak around 1 mm h−1 is also apparent in Figs. 4e–i. The reason for this bimodal distribution is currently under investigation. The GPROF algorithm team speculates that the first peak is related to an emission increase in the 89-GHz channel that averages together all the light precipitation in the Bayesian database, resulting in a mean value around 1 mm h−1. The sounders then sense heavier rain only where the scattering signal is strong enough. This may indicate that, in the current GPROF framework, the sounders lack sufficient skill to retrieve light precipitation but detect some signal that they always convert into a similar precipitation intensity (C. Kummerow 2019, personal communication).
We further show the histogram by precipitation amount in Fig. 7. These histograms corroborate the major conclusions drawn so far: the precipitation rate retrieved from AMSR2 is the closest to KuPR, followed by the three SSMISs, and finally the sounders. The double peaks around 1 and 4 mm h−1 from ATMS and MHS are even more apparent. In addition, all sensors greatly underestimate heavier precipitation (>16 mm h−1), although the underestimation from AMSR2 is notably smaller. Underestimation of heavy precipitation is a known artifact of Bayesian averaging in the GPROF retrieval algorithm. In the retrieval process, the Bayesian method averages multiple profiles in the database, so the most extreme precipitation intensities in the database can never be retrieved: they are always averaged with lower values.
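The averaging effect can be illustrated with a toy Bayesian retrieval. GPROF weights each a priori database profile by how well its simulated brightness temperatures match the observation and returns the weighted mean precipitation rate. The sketch below assumes a single hypothetical channel mimicking a scattering channel, a five-entry database, and a Gaussian weight with an assumed 10-K uncertainty; the real GPROF uses multiple channels, a full error covariance, and a far larger database. Because the result is a weighted mean, it can never exceed the largest rate in the database, and even an observation that exactly matches the heaviest entry is pulled toward its lighter neighbors.

```python
import math
from typing import Sequence, Tuple

# Hypothetical one-channel database: (simulated brightness temperature in K, rain rate in mm/h).
# Mimicking a scattering channel, brightness temperature falls as rain increases.
DATABASE: Sequence[Tuple[float, float]] = [
    (280.0, 0.5), (270.0, 2.0), (260.0, 5.0), (250.0, 10.0), (240.0, 30.0),
]


def bayesian_retrieval(tb_obs: float, database: Sequence[Tuple[float, float]],
                       sigma: float) -> float:
    """Weighted mean rain rate with weights exp(-0.5 * ((tb_obs - tb_db) / sigma) ** 2)."""
    num = den = 0.0
    for tb_db, rain in database:
        w = math.exp(-0.5 * ((tb_obs - tb_db) / sigma) ** 2)
        num += w * rain
        den += w
    return num / den


# An observation identical to the heaviest entry (30 mm/h) is still averaged down.
print(round(bayesian_retrieval(240.0, DATABASE, sigma=10.0), 1))  # 21.0
```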
The better performance of the imagers is attributable to their use of low-frequency channels (11–37 GHz) on AMSR2 and SSMIS (Table 1), which better exploit the emission signature from liquid water drops. AMSR2 also clearly outperforms the SSMISs, most likely because of its much finer footprint (e.g., 22 versus 59 km at ~19 GHz) and the availability of the 10.7-GHz channel on AMSR2.
3) Precipitation intensity comparison over vegetation
The scatterplots between coincident observations from KuPR and each sensor over vegetation are shown in Fig. 8. The differences among the sensors over land are not as large as those over ocean.
The most obvious feature is that AMSR2 has an intensity peak around 0.7 mm h−1 (Fig. 8a), which is more apparent in the histogram in Fig. 9a. AMSR2 likely produces fewer precipitation rates lighter than 1 mm h−1 because it lacks channels with frequencies higher than 89 GHz (Table 1), whereas all eight other sensors have channels around 150 and 183.3 GHz. Over land, the precipitation retrieval algorithm depends primarily on the ice-scattering signature, and these higher-frequency channels (150 and 183.3 GHz) are more sensitive to this signature than the 89-GHz channel (Bennartz and Petty 2001; Skofronick-Jackson and Johnson 2011; You et al. 2017b). Therefore, AMSR2 has a lower ability to retrieve light precipitation.
Comparing the three SSMIS sensors on board F16, F17, and F18, precipitation rates less than 2 mm h−1 from SSMIS-F17 have a larger spread around the one-to-one line (Figs. 8b–d). This leads to a smaller correlation of 0.44 between KuPR and SSMIS-F17, while the correlations from SSMIS-F16 and SSMIS-F18 are around 0.54 (Fig. 5b). As discussed earlier in section 4b(1), the exact reason why SSMIS-F17 has an inferior performance relative to the other two SSMISs on board F16 and F18 is unknown, but work is ongoing to find an explanation.
All four MHS sensors perform very similarly, indicated by the similar scatterplots from Fig. 8f to Fig. 8i and the similar histogram distributions, either by occurrence (Fig. 9f to Fig. 9i) or amount (Fig. 10f to Fig. 10i). The similar performance is further corroborated by the similar correlation coefficients between KuPR and each MHS sensor, which are ~0.55 (Fig. 5b).
From the scatterplots in Fig. 8 and the PDFs by occurrence in Fig. 9, there appears to be little difference between ATMS and each MHS sensor. However, the difference is clearly shown in the PDFs by amount (Fig. 10). Specifically, ATMS has a larger percentage of light precipitation around 2 mm h−1 and a smaller percentage of heavy precipitation greater than 8 mm h−1 (cf. Fig. 10e and Figs. 10f–i). This is most likely caused by the coarser resolution of the ATMS 89-GHz channel, whose footprint, as mentioned previously, is twice as large as that of MHS (e.g., 32 versus 16 km at nadir). The poorer performance of ATMS relative to each MHS sensor is further shown by its smaller correlation of 0.45, compared with ~0.55 for all MHS sensors (Fig. 5b). According to the PMM intercalibration working group, significant calibration changes will be implemented for ATMS in the near future, though it remains to be seen whether these changes will affect the performance of ATMS precipitation estimates.
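The footprint argument can be illustrated by averaging one rain field over two footprint sizes. A minimal sketch, assuming a one-dimensional rain field on a hypothetical 4-km grid and a boxcar (uniform) footprint in place of the real, roughly Gaussian antenna pattern; the cell values are invented for illustration. The wider footprint lowers the peak rate below 8 mm h−1, shifting rain from heavy into lighter intensity bins.

```python
from typing import List, Sequence


def footprint_average(field: Sequence[float], width: int) -> List[float]:
    """Mean rain rate seen by a boxcar footprint `width` cells wide, at every full-overlap position."""
    if not 1 <= width <= len(field):
        raise ValueError("width must be between 1 and len(field)")
    return [sum(field[i:i + width]) / width for i in range(len(field) - width + 1)]


# Hypothetical convective cell on a 4-km grid (mm/h).
field = [0.0] * 6 + [2.0, 4.0, 12.0, 24.0, 12.0, 4.0, 2.0] + [0.0] * 6
mhs_like = footprint_average(field, 4)   # ~16-km footprint
atms_like = footprint_average(field, 8)  # ~32-km footprint
print(max(mhs_like), max(atms_like))  # 13.0 7.5
```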
4) Normalized bias and normalized RMSE
The normalized bias in Fig. 11a immediately reveals the superior performance of AMSR2 over ocean, which is particularly evident for precipitation intensities greater than ~2 mm h−1. The advantage of the three SSMISs over the sounders is also apparent for intensities from ~0.5 to ~4 mm h−1. The normalized RMSE of AMSR2 is also lower than that of all sounders over the full range of precipitation intensities (Fig. 11b), and the SSMIS sensors show a smaller normalized RMSE than the sounders from ~1 to ~12 mm h−1. These results largely agree with our previous conclusions: AMSR2 performs best over ocean, followed by the three SSMISs, and the sounders perform the poorest.
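A minimal sketch of intensity-binned normalized errors, assuming that each sample is binned by its KuPR rate and that, within a bin, the normalized bias is (Σ est − Σ ref)/Σ ref and the normalized RMSE is the RMSE divided by the mean KuPR rate. These common definitions are adopted here only for illustration; the study's exact definitions are not restated in this section and may differ. The function name is hypothetical, and the lowest bin edge must be positive so that every nonempty bin has a nonzero reference sum.

```python
import bisect
import math
from typing import List, Optional, Sequence, Tuple


def binned_normalized_errors(est: Sequence[float], ref: Sequence[float],
                             edges: Sequence[float]) -> List[Optional[Tuple[float, float]]]:
    """Per KuPR-intensity bin [lo, hi): (normalized bias, normalized RMSE), or None if empty."""
    groups: List[List[Tuple[float, float]]] = [[] for _ in range(len(edges) - 1)]
    for e, r in zip(est, ref):
        i = bisect.bisect_right(edges, r) - 1  # bin chosen by the reference (KuPR) rate
        if 0 <= i < len(groups):
            groups[i].append((e, r))
    out: List[Optional[Tuple[float, float]]] = []
    for g in groups:
        if not g:
            out.append(None)
            continue
        sum_ref = sum(r for _, r in g)
        nbias = (sum(e for e, _ in g) - sum_ref) / sum_ref
        rmse = math.sqrt(sum((e - r) ** 2 for e, r in g) / len(g))
        out.append((nbias, rmse / (sum_ref / len(g))))
    return out


# Hypothetical hits (mm/h): light bin halved, middle bin exact, heavy bin empty.
print(binned_normalized_errors([1.0, 1.0, 5.0], [2.0, 2.0, 5.0], [1.0, 4.0, 8.0, 16.0]))
```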
Over vegetation, the most apparent characteristic is that ATMS has the largest negative bias from ~1 to ~12 mm h−1, likely caused by the larger footprint of its 89-GHz channel compared with the other sensors. Apart from this bias, neither the normalized bias nor the normalized RMSE reveals a clearly superior sensor.
c. SAPHIR precipitation retrieval performance
Until now, we have focused on assessing the GPROF-retrieved precipitation rates of nine sensors in the GPM microwave constellation over the 65°S–65°N region. SAPHIR is also part of the GPM constellation and provides observations two to five times per day over the 30°S–30°N region. Although precipitation estimates from SAPHIR are included in IMERG V06, evaluations of their performance remain limited. This section evaluates the SAPHIR retrievals from the Precipitation Retrieval and Profiling Scheme (PRPS) developed by Kidd (2018). To ensure a consistent evaluation, the results from all passive microwave sensors and KuPR are limited to the SAPHIR latitudes (30°S–30°N, Fig. 1k).
1) SAPHIR detection performance
Figure 12 shows the detection performance of SAPHIR in contrast to the other GPM constellation sensors. First, the HSS value for SAPHIR over ocean is 0.44, only slightly smaller than the ~0.50 from the other sounders (ATMS and MHS) (Fig. 12c). This slightly smaller HSS value is primarily caused by the notably smaller POD value (Fig. 12a). Second, over vegetation, SAPHIR has an HSS value of 0.60, comparable to that from most other sensors except SSMIS-F17.
Incidentally, this reduction in latitude range also reveals that the HSS values from ATMS and MHS increase markedly, from ~0.34 over the 65°S–65°N region (Fig. 3c) to ~0.50 over the 30°S–30°N region (Fig. 12c). The POD and FAR values from ATMS and MHS also improve when only data within 30°S–30°N are used (cf. Figs. 3a and 12a; cf. Figs. 3b and 12b). One reason ATMS and MHS detect precipitation much better over the tropics is that tropical precipitation systems are heavier than those in the midlatitudes. Heavier precipitation systems have larger water paths, which makes them easier for passive microwave radiometers to detect.
In summary, over the 30°S–30°N region, SAPHIR has slightly worse detection performance over ocean and comparable detection performance over vegetation relative to the other sounders (MHS and ATMS), even though SAPHIR has only six channels, all around 183.3 GHz (Table 1).
2) SAPHIR precipitation intensity retrieval performance
Over ocean, the correlation between KuPR and SAPHIR is 0.40, comparable to the ~0.44 from MHS and ATMS (Fig. 13a), and the normalized bias is similar to those from MHS and ATMS (Fig. 14a). However, the most evident difference is the much larger normalized RMSE from SAPHIR at precipitation intensities less than 8 mm h−1 (Fig. 14b). The wider spread in the scatterplot (Fig. 15d) further corroborates this large RMSE. Compared with SSMIS (Fig. 15b) and MHS (Fig. 15c), this large random error is probably explained by the absence of channels at frequencies below 183.3 GHz on SAPHIR. Previous studies have shown that 183.3-GHz channels are less sensitive to the ice-scattering signature than the high-frequency window channel near 150 GHz (Bennartz and Bauer 2003; You et al. 2017b).
Over vegetation, the correlation coefficient from SAPHIR is 0.45, notably smaller than those from the MHSs but larger than those from ATMS and SSMIS-F17 (Fig. 13b). As over ocean, the precipitation rates retrieved from SAPHIR show a larger spread around the one-to-one line (Fig. 16d) than those from the other sensors, and consequently a larger normalized RMSE (Fig. 14d). Again, the lack of a high-frequency window channel (~150 GHz) is most likely responsible for the weak correlation and large RMSE.
5. Conclusions and discussions
This study compares the precipitation rates retrieved from 10 sensors using multiyear coincident observations between each sensor and KuPR. We first assess the precipitation estimates from nine sensors over 65°S–65°N: AMSR2, SSMIS-F16, SSMIS-F17, SSMIS-F18, ATMS, MHS-MetOpA, MHS-MetOpB, MHS-NOAA18, and MHS-NOAA19. We intentionally exclude GMI from this comparison to avoid an unfair comparison, since GMI and KuPR are on the same satellite platform. Our analysis focuses only on ocean and vegetation in order to have sufficient samples. Key results are summarized as follows:
For detection over ocean, imagers (AMSR2 and SSMISs) have much better performance than sounders (ATMS and MHSs). The utilization of the low-frequency channels from imagers primarily accounts for the better detection from these four sensors.
For precipitation intensity over ocean, AMSR2 correlates most strongly with KuPR, followed by the SSMISs, and finally the sounders (ATMS and MHSs). The better performance from AMSR2 relative to the SSMISs is likely caused by the finer footprint size and the availability of 10.7 GHz channel. The better performance from imagers than that from sounders is again due to the utilization of low-frequency channels.
For both precipitation detection and intensity over vegetation, no sensor among these nine passive microwave radiometers is clearly superior. However, ATMS and SSMIS-F17 perform notably worse than the other seven sensors. The exact reasons are unknown; for SSMIS-F17, the poorer performance likely stems from larger biases in the high-frequency channels (91–183.3 GHz) and the unavailability of the vertically polarized 37-GHz channel, and for ATMS from its large footprint size.
Comparing SAPHIR to the other radiometers within 30°S–30°N, SAPHIR shows a slightly worse detection performance over ocean than the other sounders, while its detection performance is similar to MHSs over land. In terms of the precipitation intensity, SAPHIR shows a larger normalized RMSE over both land and ocean, which is particularly evident over ocean. Considering that SAPHIR has only six channels, all around 183.3 GHz, the poorer performance is consistent with our expectations based on emission- and scattering-based retrievals.
These results have important implications for generating level-3 merged products. Currently, the official GPM gridded product, IMERG, prioritizes imagers (e.g., AMSR2 and SSMIS) over sounders (e.g., ATMS and MHS) when observations from more than one passive microwave radiometer fall in a grid box within the same half-hour period. Our results suggest that this is generally valid over ocean, but a more refined hierarchy can be adopted. Specifically, we recommend the following priority order over ocean: AMSR2; the three SSMISs; ATMS and the four MHSs; and SAPHIR. Over vegetation, our results do not support the priority order in IMERG. Instead, they suggest a two-level priority order: estimates from AMSR2, SSMIS-F16, SSMIS-F18, and the four MHSs should be preferred over those from ATMS, SSMIS-F17, and SAPHIR.
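The recommended hierarchy can be expressed as a surface-dependent lookup table. The sketch below is hypothetical: it is not IMERG's implementation, the sensor labels follow the names used in this paper, GMI is omitted because it was not ranked here, and ties within a tier are broken by input order, an arbitrary choice made only for illustration.

```python
from typing import Dict, Sequence, Tuple

# Lower tier number = higher priority (ocean order recommended in this study).
OCEAN_TIERS: Dict[str, int] = {
    "AMSR2": 0,
    "SSMIS-F16": 1, "SSMIS-F17": 1, "SSMIS-F18": 1,
    "ATMS": 2, "MHS-MetOpA": 2, "MHS-MetOpB": 2, "MHS-NOAA18": 2, "MHS-NOAA19": 2,
    "SAPHIR": 3,
}
# Two-level order recommended over vegetation.
VEGETATION_TIERS: Dict[str, int] = {
    "AMSR2": 0, "SSMIS-F16": 0, "SSMIS-F18": 0,
    "MHS-MetOpA": 0, "MHS-MetOpB": 0, "MHS-NOAA18": 0, "MHS-NOAA19": 0,
    "ATMS": 1, "SSMIS-F17": 1, "SAPHIR": 1,
}


def select_estimate(candidates: Sequence[Tuple[str, float]],
                    surface: str) -> Tuple[str, float]:
    """Pick the (sensor, rate) with the best tier in one grid box and half-hour window."""
    if not candidates:
        raise ValueError("no candidate observations in this grid box")
    tiers = {"ocean": OCEAN_TIERS, "vegetation": VEGETATION_TIERS}[surface]
    # The input index breaks ties within a tier (hypothetical rule).
    ranked = [(tiers[name], i, name, rate) for i, (name, rate) in enumerate(candidates)]
    _, _, name, rate = min(ranked)
    return name, rate


# The same pair of overpasses resolves differently over ocean and vegetation.
box = [("MHS-NOAA19", 2.0), ("SSMIS-F17", 3.0)]
print(select_estimate(box, "ocean"), select_estimate(box, "vegetation"))
```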
Although this study does not include GMI, the performance of the other sensors leads us to hypothesize that GMI would perform best. This study shows that two key factors, channel availability and footprint size, strongly affect retrieval performance. For example, the better results from AMSR2 compared with ATMS/MHS over ocean highlight the importance of low-frequency channels, and the better results from AMSR2 compared with SSMIS over ocean highlight the importance of footprint size. Because GMI has the finest footprint resolution and the full spectrum of frequencies from 10 to 183 GHz, we expect it to rank ahead of all the others, even though it was excluded from the comparison. Indeed, Kidd et al. (2018) showed that GMI has the best correlation and HSS values among the radiometers in the GPM constellation, relative to ground radar precipitation estimates over Western Europe and CONUS. More generally, this study demonstrates a framework for determining the priority order of the various passive microwave sensors used in IMERG for any period, which can be extended back to the TRMM era (using the TRMM PR instead of GPM KuPR) as well as into the future to include next-generation sensors.
As mentioned previously, because the same retrieval algorithm is applied to all three SSMISs, the inferior performance of SSMIS-F17 compared with the SSMISs on board F16 and F18 is potentially due to large biases in its high-frequency channels and the unavailability of its vertically polarized 37-GHz channel. This issue may pose a challenge when calibrating this sensor. In addition, the current GPROF retrieval framework treats all three SSMISs equally. This study indicates that it may be necessary to treat SSMIS-F17 differently from the other two SSMISs, for example, by using a different channel weighting scheme in the retrieval process.
The conclusions in this study are obtained using an evaluation against a spaceborne precipitation radar. Compared to conventional evaluation using ground measurements, the use of KuPR as a reference may have some shortcomings in terms of the independence of data (section 2a). However, its near-global coverage confers the advantage of an evaluation over different conditions, particularly over oceans where reliable ground reference is sparse. Together with ground validation of KuPR itself (Skofronick-Jackson et al. 2017, 2018), the framework used in this study can serve as a synergistic evaluation of global precipitation, with potential improvements to widely used gridded products through a refinement of the sensor priority order.
Finally, we would like to emphasize that the conclusions drawn from the comparison in this study are relevant to the GPROF algorithm and are not indicative of the potential power of each of these sensors if different retrieval algorithms were applied to them.
GPM data are downloaded from NASA Precipitation Processing System (PPS) website (https://storm.pps.eosdis.nasa.gov/storm/). We thank Dr. Kummerow for the discussions related to the sounder performance over ocean. This work is supported by NASA’s Precipitation Measurement Missions Program science team via solicitation NNH15ZDA001N-PMM. Y.Y.L. would like to acknowledge the financial support from NOAA Grant NA14NES4320003 (Cooperative Institute for Climate and Satellites-CICS) at the University of Maryland/ESSIC.
This article is included in the Global Precipitation Measurement (GPM) special collection.