Quantitative precipitation estimation (QPE) products from the next-generation National Mosaic and QPE system (Q2) are cross-compared to the operational, radar-only product of the National Weather Service (Stage II) using the gauge-adjusted and manual quality-controlled product (Stage IV) as a reference. The evaluation takes place over the entire conterminous United States (CONUS) from December 2009 to November 2010. The annual comparison of daily Stage II precipitation to the radar-only Q2Rad product indicates that both have small systematic biases (absolute values < 8%), but the random errors with Stage II are much greater, as indicated by a root-mean-squared difference of 4.5 mm day−1 compared to 1.1 mm day−1 with Q2Rad and a lower correlation coefficient (0.20 compared to 0.73). The Q2 logic of identifying precipitation types as being convective, stratiform, or tropical at each grid point and applying differential Z–R equations has been successful in removing regional biases (i.e., overestimated rainfall from Stage II east of the Appalachians) and greatly diminishes seasonal bias patterns that were found with Stage II. Biases and radar artifacts along the coastal mountain and intermountain chains were not mitigated with rain gauge adjustment and thus require new approaches by the community. The evaluation identifies a wet bias by Q2Rad in the central plains and the South and then introduces intermediate products to explain it. Finally, this study provides estimates of uncertainty using the radar quality index product for both Q2Rad and the gauge-corrected Q2RadGC daily precipitation products. This error quantification should be useful to the satellite QPE community who use Q2 products as a reference.
1. Introduction
Reliable quantitative information on the spatial and temporal distribution of rainfall is essential for a wide range of applications including real-time flood forecasting, evaluation of atmospheric model forecasts, evaluation of precipitation estimates from remote sensing platforms, dam operations and hydroelectric power generation, transportation, and agriculture. Therefore, accurate measurement and uncertainty estimation of precipitation at a range of spatial and temporal resolutions are paramount for a variety of scientific applications. Rainfall measurement has been a challenge to the research community predominantly because of its high variability in space and time, thus limiting the representativeness of in situ measurements from rain gauges and disdrometers (Zawadzki 1975). Ground- and space-based remote sensing instruments provide an opportunity to capture rainfall variability at high resolution, but they have their own set of instrument-specific errors.
While rain gauges collect rainfall directly in a small orifice and measure the water depth, weight, or volume, remote sensing instruments must rely on a more indirect signal. Radar, for instance, transmits and receives a polarized pulse of electromagnetic radiation in order to deduce a great deal of information about the scatterers within the sampling volume. Weather radar has proven its value in the United States leading up to the installation of the Weather Surveillance Radar-1988 Doppler (WSR-88D) network [Next Generation Weather Radar (NEXRAD)], which is the basis for quantitative precipitation estimation (QPE) products evaluated in this study. These radar-based QPEs require evaluation because of their susceptibility to error. Radar QPE errors can be generally grouped into two categories: 1) obtaining a sample representative of precipitation at the surface and 2) conversion of the measured radar signal to rainfall rate. For a detailed summary of radar rainfall errors, the reader is referred to Krajewski et al. (2010).
A project built upon data collected by the NEXRAD network is the National Oceanic and Atmospheric Administration's (NOAA) next-generation National Mosaic and QPE system (Q2; http://nmq.ou.edu), described in Zhang et al. (2011). The Q2 system combines information from all ground-based radars comprising the National Weather Service's (NWS) NEXRAD network, mosaics reflectivity data onto a common 3D grid, estimates surface rainfall accumulations and types, and blends the estimates with collocated rain gauge networks to arrive at accurate, ground-based estimates of rainfall. The uniqueness of the Q2 system lies in its 1-km2 horizontal resolution and its high frequency of product generation (every 5 min). A significant amount of research has been conducted over the past 10 years to improve the data quality and accuracy of the Q2 rainfall products (Lakshmanan et al. 2007; Vasiloff et al. 2007; Zhang et al. 2005). To date, the most comprehensive study on the evaluation of Q2 rainfall products was completed by Wu et al. (2012). The evaluation was completed for a year's worth of data, primarily using gauges from the NWS Automated Surface Observing System (ASOS) as ground truth, and they supplemented their analysis using gridded QPEs from the NWS Stage IV rainfall product. They concluded that Q2 generally had lower bias and better correlations with reference rainfall compared to QPEs from the NWS Precipitation Processing System (PPS; also referred to as Stage II) described in Fulton et al. (1998). The errors, however, were shown to be relatively high during the cool season months, at higher rainfall rate thresholds, and within NWS River Forecast Center (RFC) areas of responsibility located in the Intermountain West.
The principal aim of this study is to complement the recent work of Wu et al. (2012) by accomplishing the following goals: 1) describe the spatial and temporal error characteristics of Q2 relative to Stage II and Stage IV QPEs at the highest possible resolution, 2) elucidate the underlying causes of the errors in order to provide valuable feedback to the algorithm developers, and 3) provide an error modeling framework to quantify the uncertainty with Q2 products. We believe this multitiered strategy will be useful for algorithm developers and users of the data alike. Recently, the satellite QPE community has been using Q2 datasets for algorithm validation (Amitai et al. 2009, 2012; Kirstetter et al. 2012); it is thus imperative to identify and quantify the error characteristics of Q2. The next section describes the datasets used in the study and the comparative statistics. Section 3 presents results from the CONUS-wide evaluation for the entire year and then broken down seasonally. RFC-specific statistics are provided in this section as well as in-depth explanations on the error sources. Section 3 also quantifies the uncertainty with Q2 products using a radar quality index (RQI) as the primary describing factor. A summary of results and concluding remarks close the paper in section 4.
2. Data and evaluation methods
a. Stage II radar-only and Stage IV products
Operational NWS precipitation products from the National Centers for Environmental Prediction (NCEP) Environmental Modeling Center (EMC) were obtained from 1 December 2009 to 30 November 2010 over the CONUS at hourly resolution on the 4-km Hydrologic Rainfall Analysis Project (HRAP) grid. The fully automated, radar-only component of the PPS is based on NEXRAD data and is referred to hereafter as “Stage II radar only”. Additional details of the product archive are described at the following website, where the data were obtained: http://data.eol.ucar.edu/codiac/dss/id=21.090. The Stage II radar-only product uses reflectivity on the hybrid scan, which is composed of the lowest elevation angles that clear the underlying terrain. Quality control procedures are applied to the hybrid scan reflectivity in order to remove isolated targets presumed to be ground clutter from buildings and trees and to remove anomalous propagation (AP) by evaluating the percent reduction in reflectivity in tilts above the hybrid scan. Reflectivity (Z) data are converted to rainfall rates (R) using a single Z–R relationship applied to all bins underneath the radar umbrella. Rainfall estimates from adjacent Weather Surveillance Radar-1988 Doppler (WSR-88D) radars are then composited together at NCEP to arrive at a CONUS product. All procedures in generating Stage II radar only are automatic and are considered as inputs to the Stage IV precipitation analysis generated at individual RFCs.
A major responsibility of each RFC is producing a high-quality precipitation analysis on an hourly or at least 6-hourly basis. The Stage IV product is composed of data from WSR-88Ds, rain gauges, and satellite data, with the capability of manual quality control performed by forecasters. The technique of bias correction called P1 was originally developed by forecasters at the Arkansas–Red Basin RFC. A bias field is computed by comparing the Stage II radar estimates of rainfall to gauge reports on an hourly basis. These biases are then spatially analyzed using an inverse distance weighted (IDW) interpolation scheme. The bias field is then applied back to the original radar field so as to preserve the spatial variability of rainfall captured by radar but correct its bias using rain gauge reports. The forecaster then has the capability to perform quality control on the gauge-corrected rainfall field by direct interaction and can utilize satellite-based rainfall estimates in regions of poor radar coverage from the Hydro-Estimator algorithm developed at the National Environmental Satellite, Data, and Information Service (Scofield and Kuligowski 2003; Vicente et al. 1998). The multisensor inputs to the Stage IV product combined with the manual interaction performed by trained NWS forecasters result in a very high quality product at 4-km–1-h resolution. Finally, Stage IV products from the 12 individual RFCs are mosaicked over the CONUS, displayed, and archived at NCEP (http://www.emc.ncep.noaa.gov/mmb/ylin/pcpanl/stage4/).
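The bias-correction steps described above (gauge-to-radar comparison, IDW analysis of the biases, and application of the bias field to the radar grid) can be sketched as follows. This is a minimal illustration and not the operational P1 code: the multiplicative gauge/radar ratio, the function names, and the default cutoff radius and exponent are all assumptions made for the example.

```python
import numpy as np

def idw_bias_field(gauge_xy, gauge_bias, grid_xy, radius=100.0, power=2.0):
    """Spread gauge-located biases to grid points by inverse distance weighting.

    gauge_xy: (G, 2) gauge coordinates, gauge_bias: (G,) bias values,
    grid_xy: (P, 2) grid coordinates. Grid points with no gauge inside
    `radius` receive a neutral bias of 1.0 (no correction).
    """
    d = np.linalg.norm(grid_xy[:, None, :] - gauge_xy[None, :, :], axis=2)
    # Weights fall off with distance and are zero beyond the cutoff radius.
    w = np.where(d <= radius, 1.0 / np.maximum(d, 1e-6) ** power, 0.0)
    wsum = w.sum(axis=1)
    field = np.ones(len(grid_xy))
    ok = wsum > 0
    field[ok] = (w[ok] * gauge_bias).sum(axis=1) / wsum[ok]
    return field

def gauge_correct(radar_grid, radar_at_gauges, gauge_obs, gauge_xy, grid_xy, **kw):
    """Apply an IDW-analyzed multiplicative bias (assumed form) to radar rainfall."""
    valid = radar_at_gauges > 0            # avoid division by zero
    bias = gauge_obs[valid] / radar_at_gauges[valid]
    return radar_grid * idw_bias_field(gauge_xy[valid], bias, grid_xy, **kw)
```

Because the correction multiplies the radar field, the spatial structure observed by radar is retained and only its magnitude is adjusted near the gauges.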
The original intent of the study was to use the Stage IV CONUS mosaics as the reference, or ground truth, to evaluate the other products at hourly resolution. Upon examining the CONUS mosaics, however, it was discovered that not all RFCs use the same protocols in terms of generating Stage IV precipitation products at hourly resolution, particularly in the Pacific Northwest. For this reason, the Stage IV 24-h precipitation product is used as a reference hereafter. In addition, the western offices, that is, California–Nevada RFC, Northwest RFC, and Colorado Basin RFC, use a different scheme to estimate hourly (or daily) precipitation amounts than the RFCs in the rest of the country. Because the reference rainfall product in the western RFCs is generated quite differently from the radar-based products being evaluated, larger differences and lower correlations are expected there.
In the western RFCs, gauge-recorded precipitation is spatially interpolated to the terrain using the Parameter-Elevation Regressions on Independent Slopes Model (PRISM) developed by Daly et al. (1994). Recorded precipitation is essentially used to scale the monthly precipitation climatologies, which reflect orographic precipitation patterns over steep terrain. Thus, the Stage IV precipitation product in the West does not use weather radar to depict spatial precipitation patterns, but rather relies on the monthly climatological patterns. Nonetheless, Stage IV precipitation at daily scale and 4-km spatial resolution is considered the “gold standard” in QPE; thus, we rely on it as the reference product hereafter.
b. NOAA/NSSL National Mosaic and QPE products
The Q2 system was originally developed from a joint initiative between the NOAA/National Severe Storms Laboratory (NSSL), the Salt River Project (SRP), and the Federal Aviation Administration's Aviation Weather Research Program. The system assimilates different observational networks, including “raw” level-2 data from the NEXRAD network, model hourly analyses from the Rapid Update Cycle (RUC) model (Benjamin et al. 2004), and rain gauge observations from the Hydrometeorological Automated Data System (HADS). Similar to the Stage II radar-only product, computations are first performed in polar coordinates on grids centered on each WSR-88D site. Hybrid scan reflectivity is computed in the same manner as in Stage II radar only, but the quality control procedures are based on a neural network approach that examines the 3D spatial characteristics of reflectivity (Lakshmanan et al. 2007). The quality-controlled reflectivity data from each WSR-88D site are mosaicked on a Cartesian 3D grid covering the CONUS using the objective analysis scheme described in Zhang et al. (2005). The next step in QPE generation involves a classification step. Each vertical column of reflectivity is examined to determine whether the precipitating echoes are convective, stratiform, or generated through warm rain processes. Moreover, the RUC freezing level height analysis is combined with radar observations to determine locations of inflated reflectivity in the bright band (Zhang et al. 2008). The Z observations are corrected to represent near-surface conditions, and the classification type dictates which Z–R equation is applied (i.e., convective, stratiform, or tropical). This logic is applied to each 1-km grid cell independently on a 5-min basis to yield the Q2 radar-only product, referred to as Q2Rad hereafter.
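The type-dependent Z–R step can be illustrated with a short sketch. The coefficients below are commonly cited power-law relationships (Z = 300R^1.4 for convective, Z = 200R^1.6 for stratiform, and Z = 250R^1.2 for tropical rain); they are assumed here for illustration and are not stated in this section, and the function and variable names are hypothetical.

```python
import numpy as np

# (a, b) in Z = a * R**b, with Z in mm^6 m^-3 and R in mm/h.
# Assumed values for illustration; not taken from the text.
ZR_COEFFS = {
    "convective": (300.0, 1.4),
    "stratiform": (200.0, 1.6),
    "tropical": (250.0, 1.2),
}

def rain_rate(dbz, ptype):
    """Convert reflectivity (dBZ) to rain rate (mm/h) for one precipitation type."""
    a, b = ZR_COEFFS[ptype]
    z = 10.0 ** (np.asarray(dbz, dtype=float) / 10.0)  # dBZ -> linear Z
    return (z / a) ** (1.0 / b)

def rain_rate_grid(dbz, types):
    """Apply the Z-R equation selected by each grid cell's classified type.

    Cells whose type is not in ZR_COEFFS are left at 0 mm/h.
    """
    dbz = np.asarray(dbz, dtype=float)
    types = np.asarray(types)
    out = np.zeros(dbz.shape)
    for name in ZR_COEFFS:
        mask = types == name
        out[mask] = rain_rate(dbz[mask], name)
    return out
```

With these assumed coefficients, a 40-dBZ echo yields a higher rain rate when classified as tropical than when classified as convective or stratiform, which is why the classification step has a direct influence on the bias of the radar-only estimate.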
The second Q2 product to be evaluated in this study is the gauge-corrected Q2Rad product, or Q2RadGC. Similar to the Stage IV bias correction method, hourly rain gauge reports are collected and collocated with overlying Q2Rad estimates of rainfall. A bias is computed at each rain gauge location and spatially analyzed using an IDW scheme. In the case of Q2, the two adaptable parameters, the cutoff radius and the exponent controlling the weighting function, are optimized on an hourly basis using a leave-one-out, cross-validation scheme. The interpolated bias field is applied back to the gridded Q2Rad estimates to yield the bias-corrected Q2RadGC field. Because the nominal temporal resolution of gauge reports is hourly, the Q2RadGC product's resolution is 1 km–1 h (Table 1), although Kirstetter et al. (2012) downscaled the hourly bias values to yield the Q2RadGC product at 5-min resolution.
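The leave-one-out selection of the two IDW parameters can be sketched as a small grid search. This is an illustration under stated assumptions: the candidate radii and exponents, the mean absolute error criterion, and all function names are hypothetical, and the Q2 optimization itself may differ.

```python
import numpy as np
from itertools import product

def idw_predict(xy_known, values, xy_target, radius, power):
    """IDW estimate at one target point; NaN if no gauge lies within radius."""
    d = np.linalg.norm(xy_known - xy_target, axis=1)
    w = np.where(d <= radius, 1.0 / np.maximum(d, 1e-6) ** power, 0.0)
    return (w * values).sum() / w.sum() if w.sum() > 0 else np.nan

def loo_error(xy, bias, radius, power):
    """Mean absolute leave-one-out error for one (radius, power) pair."""
    errs = []
    for i in range(len(bias)):
        keep = np.arange(len(bias)) != i   # withhold gauge i
        pred = idw_predict(xy[keep], bias[keep], xy[i], radius, power)
        if not np.isnan(pred):
            errs.append(abs(pred - bias[i]))
    return np.mean(errs) if errs else np.inf

def optimize_idw(xy, bias, radii=(50.0, 100.0, 200.0), powers=(1.0, 2.0, 3.0)):
    """Return the (radius, power) pair with the smallest leave-one-out error."""
    return min(product(radii, powers), key=lambda rp: loo_error(xy, bias, *rp))
```

In an hourly setting, `optimize_idw` would be called once per hour with that hour's gauge biases, so the parameters can adapt to gauge density and rainfall structure.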
c. Cross-evaluation periods and methods
Given their best joint data availability, we chose one full year from 1 December 2009 through 30 November 2010 to intercompare the four radar-based QPE products in this study. Note that four months of this time period coincide with that of Wu et al. (2012), who examined algorithm performance during the warm season of 2009 (April–September) and cool season (October 2009 to March 2010). Figure 1 shows the percentage of data available from each of the four products for the 1-yr study period. Both Q2Rad and Q2RadGC have the highest data availability (100%) during this period while Stage IV has comparable availability. However, the Stage II radar-only products show data outages as high as 20%, which seem to be radar specific and concentrated in the Intermountain West. The reader should not confuse Fig. 1 with effective radar coverages in complex terrain, as in Maddox et al. (2002). Rather, the lack of data availability can be interpreted as a loss in transmitted data due to radar outages. In any case, we only conducted the intercomparisons for grid cells containing data availability of 80% or greater, which still covers 99% of the CONUS.
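The 80% availability screening can be expressed compactly. The sketch below assumes that missing values are stored as NaN and that a grid cell must meet the threshold in every product to be retained; the text does not specify either detail, and the function name is hypothetical.

```python
import numpy as np

def availability_mask(products, threshold=0.80):
    """Flag grid cells with sufficient data in all products.

    products: dict mapping product name -> (time, y, x) array, NaN = missing.
    Returns a boolean (y, x) mask and the per-product availability fractions.
    """
    fractions = {k: np.mean(~np.isnan(v), axis=0) for k, v in products.items()}
    # Assumed rule: keep a cell only if every product meets the threshold.
    mask = np.all([f >= threshold for f in fractions.values()], axis=0)
    return mask, fractions
```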
For quantitative evaluation indices, we use the following statistics to compare the Q2Rad, Q2RadGC, and Stage II radar-only products (R) against the Stage IV daily precipitation accumulations: difference (Diff) in millimeters per day, relative difference (RD) in percent (also referred to as bias), root-mean-squared difference (RMSD) in millimeters per day, Pearson linear correlation coefficient (CC), and normalized standard error (NSE), where
In (4), cov refers to the covariance, and σ is the standard deviation. The total number of comparisons N varies according to the spatiotemporal domain over which the statistic is computed (i.e., annual, seasonal, or RFC wide). NSE in (5) normalizes the mean standard error (with mean denoted by overbar) for a given comparison category so that the values scale from 0 to 1, with 1 being the worst performance. As with many statistical measures, there is not a single one that adequately describes the overall error, so it is recommended to examine all of them jointly.
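For readers who wish to reproduce the comparison, the first four statistics can be computed as in the sketch below, which assumes their standard forms: Diff as the mean of R minus Stage IV, RD as the summed difference normalized by the summed Stage IV amount, RMSD as the root of the mean squared difference, and CC as the covariance divided by the product of the standard deviations. NSE is left out because its normalization is not specified beyond the description above. The function name is hypothetical.

```python
import numpy as np

def comparison_stats(est, ref):
    """Diff, RD, RMSD, and CC of an estimate against a reference (assumed forms)."""
    est = np.asarray(est, dtype=float).ravel()
    ref = np.asarray(ref, dtype=float).ravel()
    ok = ~(np.isnan(est) | np.isnan(ref))      # use only paired, valid samples
    est, ref = est[ok], ref[ok]
    return {
        "N": int(est.size),
        "Diff": np.mean(est - ref),                       # mm/day
        "RD": 100.0 * np.sum(est - ref) / np.sum(ref),    # percent
        "RMSD": np.sqrt(np.mean((est - ref) ** 2)),       # mm/day
        "CC": np.corrcoef(est, ref)[0, 1],                # cov / (sigma * sigma)
    }
```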
3. Results
a. Daily precipitation evaluation
Figures 2a–d show daily mean precipitation from 1 December 2009 to 30 November 2010 estimated by the four QPE algorithms investigated in this study. Each algorithm depicts similar precipitation patterns over the CONUS, but there are notable differences. First, the Q2Rad product appears to have too much precipitation in the central plains and the South compared to the other products. The three fully automated algorithms (i.e., Stage II, Q2Rad, and Q2RadGC) are similar in their lack of precipitation, compared to Stage IV, along the coastal mountain chains in the West and their radar-centric, circular patterns of precipitation in the Intermountain West and north-central plains. Q2RadGC does slightly better than the other two with precipitation on the West Coast. The same three algorithms depict an unlikely precipitation minimum in eastern West Virginia, which was likely a result of beam blockage and was subsequently corrected in the Stage IV product.
Density-colored scatterplots and statistics are shown in Figs. 2e–g using the Stage IV product as a reference. The CC with Stage II is the lowest at 0.20 because several isolated points were overestimated as a result of ineffective ground clutter removal. These problematic points are largely responsible for Stage II having the highest RMSD at 4.5 mm day−1. Overall, though, Stage II is associated with little bias, with an RD of −6.8%. Stage II is compared next to the other automated, radar-only product, Q2Rad, which benefits greatly from improved data quality procedures. Q2Rad overestimates by 7.9%, most prominently in the regions of the central and southern United States that receive the most warm-season rainfall. The gauge-correction scheme applied to Q2Rad in the Q2RadGC product reduces the magnitude of this bias, to −4.1%, and is also successful in dropping the RMSD from 1.1 to 0.8 mm day−1 and increasing the CC from 0.73 to 0.81. The remaining differences between Q2RadGC and Stage IV are in the Intermountain West and West Coast, where Stage IV relies on precipitation gauges and the mountain mapper technique to represent climatological precipitation patterns over complex terrain.
Table 2 provides overall statistical performance as quantified by Diff, CC, and NSE over the CONUS and is then broken down by regions and RFCs. The Diff statistic indicates that while the Q2Rad product performs well when considering all grid points over the CONUS, it underestimates more severely in the western RFCs than the other algorithms (−0.64 mm day−1) and overestimates more than the others in the central and eastern RFCs (0.33 and 0.61 mm day−1). The Stage II product, on the other hand, suffers similar underestimation problems as Q2Rad in the western RFCs (−0.61 mm day−1) but is less biased than Q2Rad in the central and eastern RFCs (0.21 and 0.13 mm day−1). The overestimation of Q2Rad in the central and eastern RFCs is a topic we address more completely in later sections. The gauge-correction scheme in Q2RadGC removes the bias in Q2Rad in the central and eastern RFCs almost completely, but the western RFCs are still plagued by underestimation (−0.49 mm day−1). Recall that Stage IV uses the gauge-based mountain mapper approach in the West, so it is not surprising to see large discrepancies (i.e., lower CC values) with the other products that use radar data. These algorithmic differences impact the CC in the western RFCs, with all three algorithms apparently performing the worst there. The CCs for Q2Rad and Q2RadGC are better than Stage II in the central and western regions.
Next, we produced spatial maps of Diff, CC, and RMSD at each 4-km grid point using daily rainfall accumulations over the 1-yr study period. Again, Stage IV is assumed to be the reference. The bias patterns are quite similar among the radar-based products in the West, where there is moderate underestimation in the intermountain regions and much more significant underestimation along the coastal mountain chains, like the Sierra Nevada and Cascade Range (Fig. 3a–c), and the interior chains, like the northern Rocky Mountains and the Bitterroot Range. These deficiencies are a result of 1) poor low-level coverage in these complex terrain regions and/or 2) an inappropriate Z–R equation applied for orographic precipitation (Rosenfeld and Ulbrich 2003). It is also worth noting that all three algorithms yield too much rainfall in the valleys near Seattle and Portland. The reason for the overestimation in the northern valleys is likely because of bright band contamination from radar sampling of partially wetted hydrometeors in the melting layer. Radar range–dependent biases are prevalent in the intermountain regions, and the gauge correction in Q2RadGC does little to remove this bias pattern. In the eastern United States, the Appalachian Mountains stand out with underestimated precipitation, but not as severely as in the West. In terms of algorithmic differences, Stage II yields several “hot spots” with overestimation. Some of these occur because of the presence of nearby mountains, such as the Mogollon Rim in Arizona, the Ozarks in Arkansas, and the Wichita Mountains in Oklahoma. The apparent data quality control problems also appear in parts of Texas, where there is not significant terrain relief. Q2Rad overestimates precipitation in the central and southern United States, which tends to be most prominent where there are precipitation maxima. This bias is nominally removed using gauge adjustment in the Q2RadGC product, with the exception of mountainous regions.
The spatial pattern of CC makes it quite obvious where the radar-based algorithms suffer from lack of low-level coverage in the West. Figure 3d–f corresponds quite well to the effective radar coverage maps of Maddox et al. (2002). The incorporation of rain gauges in the estimation scheme improves the CC but indicates that much research is needed for automated QPE in these intermountain regions. The distribution of RMSD reiterates the aforementioned algorithmic deficiencies in the coastal and intermountain ranges of the West (Fig. 3g–i). The Stage II RMSD reveals an interesting demarcation of larger errors to the south and east of the Appalachians. There is a suggestion of this pattern in the Q2Rad product, but it is not as obvious. This feature exists in regions of the southeast where there are large annual rainfall amounts; thus, one might expect higher RMSD values there. However, the high RMSD feature also exists in drier areas like South Carolina and eastern parts of Florida (see Fig. 2d). The air mass in this region generally has higher moisture content, and we hypothesize that this impacts the drop size distribution and thus the accuracy of the reflectivity-to-rainfall (Z–R) equation being used. Q2Rad differs from Stage II in this regard, as it applies a differential Z–R relationship on a pixel-by-pixel basis, whereas Stage II uses the same equation for all grid points within a given radar umbrella.
b. Seasonal analysis
Seasonal precipitation patterns are shown in Fig. 4 for all four algorithms. The range-dependent biases plaguing the Stage II, Q2Rad, and Q2RadGC algorithms in the Intermountain West appear most prominently in winter and spring when precipitation typically falls from shallow, stratiform systems. The radar-centric patterns are much less obvious in fall and are practically gone in summer. The hotspots noted in the Stage II product appear more numerous in spring and fall months. These are times of the year when midlatitude synoptic storms traverse the CONUS and are associated with the most frequent frontal passages. We hypothesize that low-level temperature and humidity profiles produce beam-ducting conditions during these transitional times of the year and cause AP that is not properly removed from Stage II precipitation products.
Density-colored scatterplots and associated statistics quantify the seasonal errors in Fig. 5. All three algorithms yield the greatest spread about the 1:1 line in winter, while the smallest random errors for each algorithm appear in summer. The relatively high RMSDs and low CCs with the Stage II product during the spring and fall months confirm the prior assertion that very large accumulations at isolated points, presumably from AP, are the culprit and require further attention to improve the product's data quality. It is also noteworthy that the Stage II product underestimates during the spring and fall with an RD of approximately −10%, which worsens in the winter to −32.9%, and then overestimates in the summer by 16.5%. The Q2Rad product yields less bias and smaller random errors (i.e., lower RMSD and higher CC) compared to the Stage II product; the remaining errors also have much less seasonal dependence. The only exception to this generalization is the RD of 21.9% that occurs during the summer. Figure 5g shows that this overestimation occurs with daily average rainfall amounts greater than 8 mm day−1. Because the bias is a summertime phenomenon, it is less likely to be associated with a radar sampling issue and more likely with the logic for classifying precipitation types and applying different Z–R relationships. When hourly rain gauge data are introduced in the Q2RadGC product, RMSD decreases and CC increases for all seasons. The absolute bias remains below 10% except for the winter season (RD = −16.9%). Apparently, present algorithms that use radar and rain gauge data alone are insufficient to properly estimate cool season precipitation in complex terrain.
c. Statistics for each RFC
The Stage IV analysis is a QPE product generated by the 12 RFCs and is later mosaicked for the CONUS at NCEP. These 12 RFC regions of responsibility include the Northwest (NW), California–Nevada (CN), Colorado Basin (CB), Missouri Basin (MB), Arkansas–Red Basin (AB), West Gulf (WG), North Central (NC), Ohio (OH), Lower Mississippi (LM), Northeast (NE), Middle Atlantic (MA), and Southeast (SE). Statistical performance of the three algorithms in each of the RFC regions (annual average and broken down by season) is provided in Tables 2 and 3. Below, we evaluate the seasonal differences for each RFC region.
Figure 6 shows underestimation being prevalent from all three algorithms in the western RFCs (NW, CN, and CB), especially during winter. These radar-sampling errors are most pronounced during the cool season because of precipitation falling from shallow, stratiform clouds. Large gradients in spatial precipitation patterns caused by the complex terrain limit the spatial representativeness of rain gauges, highlighting the need for developing novel QPE data collection and precipitation estimation strategies. The residuals with the Stage II product reveal a seasonal pattern of negative perturbations from their annual mean difference in the winter that become positive in the summer; this is the case for every RFC region. In the case of the NW, NE, and MA RFC regions (Figs. 6a,d,h), the differences remain negative throughout the entire year. In the central and southern plains, the differences are generally negative in winter and become positive in summer. With the exception of the western RFCs (and perhaps the AB), this seasonal pattern is greatly reduced with the Q2Rad product. However, the average differences exceed 1.0 mm day−1 in the NC, AB, and LM RFCs (Figs. 6c,f,k), and are consistently positive in the OH, WG, and SE (Figs. 6g,j,l) RFCs. The Q2RadGC product mitigates these errors, except in the western RFC regions.
The incorporation of rain gauge data has improved QPE in the Q2 system for regions located east of the Rocky Mountains. It is worthwhile to further examine the bias that has been identified with the unadjusted Q2Rad product. Continued improvements to this radar-only product will apply at smaller scales (5 min) where rain gauge data do not contribute as much. Applications at this finer scale include flash flood forecasting and evaluation of spaceborne precipitation estimates from passive and active microwave sensors (see, e.g., Kirstetter et al. 2012). The previously identified bias with Q2Rad appears in areas of heavy climatological precipitation in the southeast United States and does not appear to be an outcome of radar-sampling deficiencies, such as in the West. A distinguishing characteristic of the Q2Rad algorithm is the method of identifying each pixel as being convective, stratiform, or tropical and applying a suitable Z–R equation. Convective echoes are distinct from stratiform precipitation because of their larger updraft strengths, while tropical precipitation suggests active warm rain microphysical processes. While this differential Z–R logic has been successful in removing regional biases (i.e., overestimated rainfall from Stage II east of the Appalachians) and seasonal bias patterns, a seemingly stationary bias remains. Below, we introduce results from the precipitation type classification within Q2 to further illuminate the source of these errors.
Figure 7 shows the percentages of precipitation types that were identified by Q2 for each season and for each RFC region. The greatest contribution is from stratiform rainfall with values greater than 90% for each season and for each RFC region. This is not a surprising result given the larger spatiotemporal scale associated with stratiform precipitation systems compared to convective storms. Convective precipitation peaks in the summer at all RFCs and tends to have the largest relative contribution to total precipitation in the northern-tier RFC regions (i.e., MB and NC in Figs. 7b,c). The Q2 system uses a Z–R equation for grid cells identified as “tropical” that results in the highest rainfall rates per reflectivity value (Xu et al. 2008). In contrast to the spatial and seasonal behavior of Q2 convective precipitation types, the contribution from tropical precipitation does not conform to physical intuition. Figures 7f, 7g, 7j, and 7k all show that the maximum in tropical rainfall occurs in spring in the southern and central plains, while Fig. 7l shows a maximum occurrence in the winter months. Rainfall from warm rain processes in these regions is expected mostly in the summer and fall months, which is not reflected in the results. We conclude that the precipitation typing algorithm in Q2 has been overclassifying precipitation as tropical in the central and eastern RFC regions, where there is comparatively good radar sampling, and that this overclassification explains the bias revealed in the Q2Rad product.
d. Error modeling of Q2 rainfall products
Uses of Q2 products to evaluate other remote sensing rainfall estimates and for hydrological applications will greatly benefit from uncertainty estimates of the QPEs. The product suite includes a radar quality index (RQI) product (Zhang et al. 2012) that scales from 0 to 1 depending on the height of the lowest elevation angle relative to the melting layer and percent beam blockage by terrain. An RQI value of 1 indicates unblocked sampling in rain below the melting layer. Kirstetter et al. (2012) used gauge-adjusted Q2Rad at 5-min resolution to examine errors from space-based rainfall estimates. They noted better agreement between datasets following each level of Q2 processing and recommended restricting the most trustworthy rainfall dataset to RQI values equal to 1. In this study, we rely on the Stage IV daily precipitation amounts to compute residuals for each RFC region. Figure 8 shows the quantiles of the residuals plotted as a function of RQI. The greatest dependence of the residuals on RQI exists primarily in the western and northern regions (Figs. 8a–e,i). Increasing slopes of the quantile curves indicate that underestimation occurs as a function of beam height. This range-dependent bias, due to dependence on the vertical profile of reflectivity (VPR), is most pronounced where stratiform precipitation from shallow clouds is most common in the western and northern-tier RFCs. We note that the systematic error (bias) approaches 0 as the RQI increases. However, positive residuals remain at RQI values of 1 for several of the RFC regions, primarily in the central plains and the South, in agreement with prior analyses.
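The quantile curves of the residuals can be produced by binning on RQI, as in the following sketch. The residual is assumed to be the Q2 estimate minus Stage IV, and the bin width, quantile levels, and function name are choices made for this example rather than values given in the text.

```python
import numpy as np

def residual_quantiles_by_rqi(residual, rqi, bin_edges=None,
                              probs=(0.05, 0.25, 0.50, 0.75, 0.95)):
    """Quantiles of residuals (assumed Q2 minus Stage IV) within RQI bins.

    Returns bin centers and an array of shape (n_bins, n_probs);
    bins without samples are filled with NaN.
    """
    if bin_edges is None:
        bin_edges = np.linspace(0.0, 1.0, 11)   # ten bins of width 0.1
    residual = np.asarray(residual, dtype=float).ravel()
    rqi = np.asarray(rqi, dtype=float).ravel()
    n_bins = len(bin_edges) - 1
    # Clip so that RQI exactly equal to the last edge falls in the last bin.
    idx = np.clip(np.digitize(rqi, bin_edges) - 1, 0, n_bins - 1)
    out = np.full((n_bins, len(probs)), np.nan)
    for b in range(n_bins):
        r = residual[idx == b]
        if r.size:
            out[b] = np.quantile(r, probs)
    centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    return centers, out
```

The median column describes the systematic error as a function of RQI, while the distance between the outer quantiles gives a measure of the random error in each bin.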
The error quantiles have also been computed for the Q2RadGC product in Fig. 9. First, we note that the introduction of the gauge correction does little to mitigate the VPR dependence in the NW, MS, CN, and CB RFC regions (Figs. 9a,b,e,i). Systematic biases remain because of underestimation at low RQI, which typically occurs at far range from radars. The slopes of the curves for the other regions, which lack significant terrain blockage, have become much flatter compared to Fig. 8, indicating that the gauge-correction scheme largely resolved the prior dependence on RQI. Moreover, the 50% quantiles with the Q2RadGC product converge to residuals of 0 at RQI values close to 1. This indicates that the systematic bias in these RFC regions can be safely neglected, while the spread between the quantiles provides information about the random errors. This information can now be used to yield QPE ensembles or probabilistic QPE values from Q2.
4. Summary and conclusions
Quantitative precipitation estimation (QPE) products from the next-generation National Mosaic and QPE system (Q2) provide the highest temporal and spatial resolution (5 min–1 km) precipitation products from ground-based observations with the largest coverage over the CONUS. Applying such high-resolution QPE products to hydrological applications, such as flash flood forecasting systems, has become feasible in the hydrometeorological community. Moreover, it is possible to use these products to evaluate satellite-based QPE products from polar-orbiting platforms (e.g., Kirstetter et al. 2012). However, Q2 has only been evaluated by Wu et al. (2012) in a comprehensive way over large spatiotemporal domains. In this study, we evaluated daily Q2 precipitation products at annual and seasonal time scales over the CONUS and then for each of the National Weather Service River Forecast Center's (RFC) regions of responsibility. The comparison was carried out for the fully automated radar-only Q2Rad product, the gauge-adjusted Q2RadGC product, and the Stage II radar-only product using daily accumulations from December 2009 to November 2010.
We evaluated products at the 4 km–daily scale because of the trustworthiness of the Stage IV product at this scale, which incorporates radar, rain gauges, a model for climatological rainfall patterns due to orographic enhancement, and manual forecaster quality control. All products evaluated use the same radar inputs, and several of the gauge inputs are the same as those used in the reference. Despite this lack of independence, we believe there is much information to be learned from the evaluation. While the Stage IV product is considered the best at daily/4-km scale, the finer scale Q2Rad and Q2RadGC products are useful and appropriate for evaluating spaceborne products from low-earth-orbiting satellites and have applications in flash flood and flood forecasting. We provide explanations for some of the error characteristics by incorporating additional Q2 products and finally provide error models of the Q2 products based on the radar quality index (RQI) product. Findings from this study are summarized as follows:
In comparing the radar-only Stage II and Q2Rad products over the entire CONUS, both have relatively small biases (−6.8% and 7.9% respectively), but Q2Rad has a much smaller RMSD at 1.08 mm day−1 (compared to 4.51 mm day−1 with Stage II) and greater CC at 0.73 (compared to 0.20 with Stage II). The low CC with Stage II is due to several isolated points that were overestimated because of ineffective ground clutter removal. These hotspots occurred most frequently during the spring and fall months.
The three fully automated algorithms (i.e., Stage II, Q2Rad, and Q2RadGC) are similar in their lack of precipitation along the coastal mountain chains in the West and in their radar-centric, circular patterns of precipitation in the Intermountain West and north-central plains. These deficiencies were found to be a result of inadequate low-level coverage by NEXRAD, which produces the greatest errors with shallow, stratiform clouds during the cool season, and perhaps of inappropriate Z–R equations applied for orographic precipitation. These problems are not readily corrected using rain gauges and thus require new approaches by the research community.
The Q2 logic of identifying precipitation types as being convective, stratiform, or tropical at each grid point and applying differential Z–R equations has been successful in removing regional biases (i.e., overestimated rainfall from Stage II east of the Appalachians) and greatly diminishes seasonal bias patterns.
Q2Rad overestimated average daily rainfall in the LM and NC RFC regions by 0.91 and 0.73 mm day−1, respectively. Although this bias was readily corrected through gauge adjustment, intermediate Q2 products were introduced in order to help explain the overestimation problem. It was discovered that the automatic algorithm for identifying precipitation profiles representative of warm rain microphysics, thus adapting the Z–R relation to this tropical environment, had nonintuitive seasonal and spatial characteristics. In particular, the identification of tropical rain in these regions was deemed to be too high, especially for winter and spring months.
Daily precipitation errors for Q2Rad and Q2RadGC are now modeled as a function of the RQI product. The resulting quantile plots revealed a strong dependence of the systematic bias in the Q2Rad and Q2RadGC products on RQI in the western RFCs. For the remaining RFCs with reasonable low-level radar coverage, the gauge adjustment yielded quantile plots that depended much less on RQI. Moreover, the residuals approached 0 for RQI values of 1, indicating success of the gauge-adjustment scheme.
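The bulk statistics quoted in the first finding above (relative bias, RMSD, and CC) can be computed along the following lines. This is a minimal sketch under assumed definitions: the relative bias is taken as the difference in total accumulation normalized by the reference total, and the function name `evaluation_stats` is hypothetical rather than part of any Q2 or Stage IV software.

```python
import numpy as np

def evaluation_stats(estimate, reference):
    """Relative bias (%), RMSD, and Pearson CC of an estimate against a reference.

    Assumed definitions, for illustration only:
      relative bias = 100 * (sum(estimate) - sum(reference)) / sum(reference)
      RMSD          = sqrt(mean((estimate - reference)**2))
    """
    est = np.asarray(estimate, dtype=float).ravel()
    ref = np.asarray(reference, dtype=float).ravel()
    # Keep only grid cells where both products report a finite value.
    valid = np.isfinite(est) & np.isfinite(ref)
    est, ref = est[valid], ref[valid]
    rel_bias = 100.0 * (est.sum() - ref.sum()) / ref.sum()
    rmsd = np.sqrt(np.mean((est - ref) ** 2))
    cc = np.corrcoef(est, ref)[0, 1]
    return rel_bias, rmsd, cc

# Toy example: an estimate that is uniformly 10% wetter than the reference.
toy_reference = np.array([1.0, 2.0, 3.0, 4.0])
toy_stats = evaluation_stats(1.1 * toy_reference, toy_reference)
```

The toy example shows why the three measures are reported together: a uniform 10% overestimate leaves the correlation at 1 while the bias and RMSD are nonzero, whereas a few isolated hotspots can degrade the CC and RMSD even when the overall bias is small.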
The identification and quantification of seasonal and spatial characteristics of errors in the Q2 system now provide detailed information to algorithm developers and users alike. The sampling-related deficiencies in the Intermountain West were largely anticipated following the work of Maddox et al. (2002). A revelation was that the overestimation problem resulted from the tropical rain identification algorithm, which has now been corrected for present and future algorithm versions. Nonetheless, this points to the need for continuing, regular studies on the evaluation of precipitation products from remote sensing platforms. The quantification of uncertainty in daily precipitation amounts segregated by RFC region will be of great use to users who can utilize the products in an ensemble or probabilistic sense. Future work will focus on the utilization of data from other spaceborne platforms to address the challenging lack of NEXRAD coverage in the vast intermountain regions of the West. Downscaling the error models developed for the daily Q2Rad and Q2RadGC products to hourly and 5-min resolution is a topic worthy of research. The dependence of systematic errors on RQI should prove useful in this research, as the RQI product is generated on a 5-min basis.
The authors acknowledge the Atmospheric Radar Research Center's financial support. NOAA/NSSL is gratefully acknowledged for providing the Q2 data. Partial funding for this work has been provided by the NASA Global Precipitation Measurement Ground Validation Program. Partial funding was provided by NOAA/Office of Oceanic and Atmospheric Research under NOAA–University of Oklahoma Cooperative Agreement NA11OAR4320072, U.S. Department of Commerce. We appreciate the comments from two anonymous reviewers who helped improve the quality of the manuscript.