The evaluation uncertainty caused by the standard reference itself hampers both algorithm developers and data users in thoroughly understanding the error features and performance of satellite precipitation products (SPPs). In this study, the Climate Prediction Center Unified (CPCU) data and the Merged Precipitation Analysis (MPA) data are used as benchmarks to investigate the evaluation uncertainties of satellite precipitation estimates introduced by the reference itself. Two SPPs, IMERG-Late and GSMaP-MVK, are employed. The results show that using two different ground-based precipitation products as references can effectively reveal the potential evaluation uncertainties. Interestingly, the evaluation results tend to exhibit larger uncertainties over semihumid areas. Furthermore, the evaluation uncertainty of the statistical metrics is closely related to rainfall intensity, decreasing gradually as rainfall intensity increases. Additionally, the dependence of the false alarm ratio (FAR) and root-mean-square error (RMSE) scores on the spatial density of rain gauges is relatively low. Both the relative bias (RBIAS) and normalized root-mean-square error (NRMSE) scores for light precipitation (1–5 mm day⁻¹) increase with the spatial density of the rain gauges, suggesting that the evaluation of light precipitation is more prone to uncertainty than that of medium-to-high rain rates. Finally, the minimum gauge density required for different scores and rainfall intensities is discussed. This study is expected to provide criteria for assessing the reliability of evaluation results in the satellite quantitative precipitation estimation community.