The National Centers for Environmental Prediction (NCEP) stage IV quantitative precipitation estimates (QPEs) are used in many studies for intercomparisons, including those for satellite QPEs. An overview of the National Weather Service precipitation processing system is provided here to set the stage IV product in context and to give users some knowledge of how it is developed. Then, an assessment of the stage IV product over the period 2002–12 is provided. The assessment shows that the stage IV product can be useful for conditional comparisons of moderate-to-heavy rainfall for select seasons and locations. When evaluating the product at the daily scale, there are many discontinuities due to the operational processing at the radar site as well as discontinuities due to the merging of data from different River Forecast Centers (RFCs) that use very different processing algorithms for generating their precipitation estimates. An assessment of the daily precipitation estimates is provided based on the cumulative distribution function of all daily estimates for each RFC by season. In addition, it is found that the hourly estimates at certain RFCs suffer from a lack of manual quality control, and caution should be used when employing them.
Many studies have used the National Weather Service/National Centers for Environmental Prediction (NWS/NCEP) stage IV quantitative precipitation estimates (QPEs) for analysis and comparison. The NCEP stage IV product, herein referred to as stage IV, is a near-real-time product generated at NCEP from regional analyses that are based on the NEXRAD Precipitation Processing System (PPS; Fulton et al. 1998) and NWS River Forecast Center (RFC) precipitation processing (Seo and Breidenbach 2002). Originally, the stage IV product was intended for assimilation into atmospheric forecast models to improve quantitative precipitation forecasts (QPFs; Lin and Mitchell 2005). The product as it is currently generated and archived has become quite popular for various applications. However, there is some confusion as to what the stage IV product is, what it is not, and how it is produced. This paper attempts to answer these questions as well as present quantitative and qualitative measures of the product itself.
A review of the studies that use the stage IV product can be categorized into five main topics: assimilation, hydrologic model evaluation, QPF evaluation, radar QPE evaluation and comparison, and satellite QPE comparison. Lopez and Bauer (2007) investigated the impact of using stage IV data for 1D and 4D variational assimilations, and Lopez (2011) used the stage IV product for 4D variational assimilation in the eastern United States in the ECMWF’s global Integrated Forecast System.
Gourley et al. (2011) used stage IV and satellite QPEs in a comparison for calibration of a hydrologic model. Kalinga and Gan (2010) used stage IV results for assessing the infrared microwave rainfall algorithm via the Sacramento soil moisture model (Burnash et al. 1973). Yilmaz et al. (2005) compared gauge-, radar-, and satellite-based precipitation estimates accounting for hydrologic modeling. Habib et al. (2008) used stage IV data as input into a salinity model to study the effects of the variability of rainfall on the model outputs. Several studies used the stage IV data for evaluation of QPFs from different forecast models (Gallus 2002; Davis et al. 2006a,b; Yuan et al. 2007a,b, 2008, 2009).
Several studies have performed evaluations of the stage IV product, but for limited spatial and/or temporal extents. Wang et al. (2008) used gauges to compare the stage IV product with the RFC-specific stage III product for one basin in Texas. The main finding for this basin was that the Multisensor Precipitation Estimation (MPE) has a higher capability of rain detection than do gauges or stage III data. Westcott et al. (2008) compared monthly gauge amounts to the stage IV product and found that stage IV overestimates precipitation at the low end and underestimates precipitation at the high end. Habib et al. (2009a) performed a validation of the stage IV product using a dense network of gauges over Louisiana at small spatial and temporal scales. Nelson et al. (2010) used stage IV data for comparison to a reanalysis product over the southeastern United States. They found that the added quality control of the input radar-only products and gauge data provided better estimates of multisensor precipitation as compared to stage IV. Wu et al. (2012) evaluated the National Mosaic and Multisensor Quantitative (NMQ) Precipitation Estimation System (Zhang et al. 2011) for two seasons during 2009 using gauges and the stage IV product as supplementary information. They found some improved statistics for the NMQ versus the stage IV products, but the 6-hourly stage IV product has a higher correlation coefficient than the 1-h stage IV product. Habib et al. (2013) evaluated six different products from the NWS MPE algorithm over Louisiana using rain gauges. They conclude that the most effective improvement in the rainfall products comes from applying the mean-field bias adjustment to the radar-only product. Hou et al. (2014) describe a methodology for generating a new dataset by adjusting 6-h accumulations of the stage IV data to the NWS Climate Prediction Center (CPC) unified gauge estimates (Chen et al. 2008). The method has some limitations with heavy-to-extreme precipitation. Although there are many studies evaluating the stage IV product, these studies are limited in their scope spatially and/or temporally. To date, no comprehensive assessment of the product exists.
The stage IV product sees by far the most use in satellite QPE intercomparisons. Joseph et al. (2009) created a high-resolution multisatellite-derived QPE [using the Special Sensor Microwave Imager (SSM/I), Advanced Microwave Sounding Unit (AMSU-B), Advanced Microwave Scanning Radiometer (AMSR), Advanced Microwave Scanning Radiometer for Earth Observing System (AMSR-E), and TRMM Microwave Imager (TMI)] and used the stage IV data for comparison. Several studies used stage IV for comparison and verification of algorithms to improve satellite QPEs (Ferraro et al. 2005; Hong et al. 2006; Barros and Tao 2008; Tao and Barros 2010; Tesfagiorgis et al. 2011). Villarini et al. (2011) evaluated QPEs from a satellite (TRMM), a model [North American Land Data Assimilation System (NLDAS)], and radar (stage IV) for three hurricanes during 2004 and found stage IV to be the best. Jiang et al. (2008) used stage IV to evaluate satellite QPEs for Hurricane Isidore. Zagrodnik and Jiang (2014) used the stage IV product to evaluate the TRMM PR and TMI products for landfalling tropical cyclones over the southeastern United States. Habib et al. (2009b) evaluated TRMM-based estimates over Louisiana for tropical heavy rainfall events. Habib et al. (2012) evaluated the CPC morphing technique (CMORPH; Joyce et al. 2004) using a dense rain gauge network over Louisiana. Finally, AghaKouchak et al. (2011) evaluated four different satellite QPEs [CMORPH, Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks (PERSIANN; Hsu et al. 1997), TRMM-RT, and TRMM-V6] using stage IV. Each of these studies used the stage IV product for analysis, but no assessment of the stage IV product exists that these studies can reference. This paper provides such an assessment.
This paper is organized as follows. Section 2 provides an overview of the stage IV development and processing. Section 3 presents the issues and biases that are evident in the stage IV product. Section 4 presents an attempt at conterminous United States (CONUS) wide comparisons for illustration of the issues and biases raised in section 3. Section 5 provides conclusions and recommendations.
2. NWS precipitation processing
We provide a brief overview of the data products generated at the radar site, the RFCs, and NCEP. Currently there are 144 Weather Surveillance Radar-1988 Doppler (WSR-88D) sites in the CONUS, Hawaii, and Alaska. WSR-88D sites also exist in Guam, South Korea, and Japan. Klazura and Imy (1993) provide details as to the products that were initially generated at the WSR-88D sites, and Fulton et al. (1998) provide a detailed description of the operational WSR-88D precipitation algorithm. Kitzmiller et al. (2013) give an overview of the estimation techniques in the National Weather Service’s hydrologic operations.
a. Radar-site processing
1) Level products
Historically, WSR-88Ds have produced three base products—radial velocity, spectrum width, and reflectivity—referred to as level II data. Initially, these products were recorded at 1° and 1-km spatial resolution and at 5-, 6-, or 10-min (depending on the radar scanning strategy) temporal resolution (Crum et al. 1993). Recently, the WSR-88D sites were upgraded to be able to provide products at higher spatial resolutions of 0.5° and 0.5 km. Level II data are used by NWS Weather Forecast Offices (WFOs) for real-time monitoring and warning of severe weather. In addition, the level II data (specifically the reflectivity product) are processed through the WSR-88D rainfall algorithm (Fulton et al. 1998) to provide products to the NWS RFCs for real-time decision support and warnings. These are typically called level III products.
2) Stage I
Calibrated, quality controlled reflectivity data are input into the PPS. The PPS algorithm involves five scientific processing components: (i) reflectivity preprocessing, (ii) rain-rate conversion, (iii) rainfall accumulation, (iv) precipitation adjustment, and (v) precipitation products; see Fulton et al. (1998) for a detailed description of each of the five processing components. This algorithm, which runs at each WSR-88D site, is called stage I and produces the Digital Precipitation Array (DPA). The DPAs are produced at hourly 4 km × 4 km resolution. The DPA data are then passed to the NWS RFC for hydrologic forecasting.
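The rain-rate conversion component (ii) can be illustrated with the default WSR-88D convective Z–R relationship, Z = 300R^1.4 (Fulton et al. 1998). The sketch below is ours: the function name is an assumption, and the operational PPS additionally applies reflectivity preprocessing and rain-rate caps that are omitted here.

```python
import numpy as np

def reflectivity_to_rain_rate(dbz, a=300.0, b=1.4):
    """Convert reflectivity (dBZ) to rain rate (mm/h) via Z = a * R**b.

    a=300, b=1.4 is the default WSR-88D convective Z-R relationship
    (Fulton et al. 1998). Illustrative sketch only, not the PPS code.
    """
    z = 10.0 ** (np.asarray(dbz, float) / 10.0)  # dBZ -> linear Z (mm^6 m^-3)
    return (z / a) ** (1.0 / b)
```

For example, 40 dBZ corresponds to roughly 12 mm h−1 under this relationship.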
3) Stage II
Stage II data are generated from the stage I and rain-gauge estimates. Stages II and III (Hudlow 1988) were used until about 2002, at which point MPE replaced stage III and stage II was retired (Seo and Breidenbach 2002). Stage II consists of a radar-only estimate, rain-gauge data, and a multisensor estimate. The stage II process consists of a mean field bias correction (Smith and Krajewski 1991) and radar–gauge merging (Hudlow 1988), resulting in the multisensor precipitation estimate.
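The mean field bias step can be sketched as a single multiplicative factor estimated from collocated gauge–radar pairs. This is a deliberate simplification with hypothetical names; the operational Smith and Krajewski (1991) estimator is more elaborate, treating the bias within a filtering framework rather than as a simple ratio of sums.

```python
import numpy as np

def mean_field_bias(gauge_mm, radar_mm, min_pairs=5):
    """Multiplicative mean field bias from collocated gauge/radar hourly
    accumulations (mm). A simplified sketch, not the operational estimator."""
    g = np.asarray(gauge_mm, float)
    r = np.asarray(radar_mm, float)
    valid = (g > 0) & (r > 0)        # use only pairs where both report rain
    if valid.sum() < min_pairs:
        return 1.0                   # too few pairs: leave the field unadjusted
    return g[valid].sum() / r[valid].sum()

# The bias-adjusted radar field is then bias * radar_field.
```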
b. RFC processing
1) Stage III
Prior to 2002, stage III of the NWS rainfall PPS referred to the creation of an RFC-wide mosaic of the stage II multisensor data from the multiple radars in an RFC's domain, gridded to the national Hydrologic Rainfall Analysis Projection (HRAP; Greene and Hudlow 1982; Reed and Maidment 1999). Figure 1 shows the 13 RFC domains, and Table 1 provides the RFC names and abbreviations as well as their locations. RFCs had the ability to manually edit or remove bad data in the stage III processing. Stages II and III were replaced in 2002 by the multisensor precipitation estimator (for most RFCs).
2) Multisensor precipitation estimator
The multisensor precipitation estimator incorporated improvements based on experience with the stage II and III processing. However, some RFCs do not use the MPE algorithm. For example, the Arkansas–Red basin RFC (ABRFC) uses its own version of multisensor processing called the P3 algorithm (Young et al. 2000; Seo and Breidenbach 2002). Furthermore, the California–Nevada RFC, the Northwest RFC, and the Colorado-basin RFC do not include radar-based estimates in their regional processing (Hou et al. 2014). In addition, radar coverage in the western United States is poor as a result of the scarcity of the radar network and blockage from the mountains (Maddox et al. 2002). Figure 1 shows the radar coverage over the CONUS at 230 km from the radar, along with the radar locations, their range rings, and the overlap of nearby radars. The MPE is similar to stage III in that there is flexibility for the RFC to quality control bad radar and gauge data, after which the DPA and rain-gauge data are merged via the MPE algorithm. In addition, MPE has a component designed to merge satellite QPEs from the NOAA Hydroestimator (Scofield and Kuligowski 2003). The MPE includes several different products at the RFC-wide domain, including radar only, bias-adjusted radar (both mean field and local), gauge only, and satellite bias adjusted (Seo and Breidenbach 2002).
3) Mountain Mapper
The western CONUS RFCs use a different rainfall processing algorithm. For instance, the California–Nevada, Northwest, and Colorado-basin RFCs use what is known as the Mountain Mapper (Hou et al. 2014; Schaake et al. 2004). Mountain Mapper is a gauge-only algorithm that attempts to adjust for climatological variations due to topography and wind directions similar to a Parameter-Elevation Regressions on Independent Slopes Model (PRISM) type adjustment (Daly et al. 1994).
4) P3 algorithm
In P3, the first step is to merge the hourly digital precipitation (HDP) products to produce a radar-only product for the entire RFC area. The next step involves computing the ratio between the gauge- and radar-only precipitation estimates and assigning the ratio to the HRAP cell that contains the gauge. For HRAP cells that do not contain a gauge, a ratio is computed from nearby cells by interpolation using a distance-weighting scheme. In the final step, the radar precipitation estimates are multiplied by the ratio field to produce the P3 multisensor precipitation product. The P3 algorithm generates a QPE field that tends to agree more with gauge reports near gauge sites; thus, long-term averaging shows evidence of the “bull’s-eye” effect.
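The ratio steps described above can be sketched as follows. The inverse-distance weighting and its exponent are generic assumptions on our part, and the names are illustrative; the exact interpolation used operationally in P3 may differ.

```python
import numpy as np

def p3_ratio_field(gauge_xy, gauge_ratio, grid_xy, power=2.0):
    """Interpolate gauge/radar ratios to grid (HRAP-like) cells.

    Cells containing a gauge take that gauge's ratio; other cells get a
    distance-weighted average of the gauge ratios. A sketch of the P3
    ratio step; weighting scheme and names are illustrative assumptions.
    """
    gauge_xy = np.asarray(gauge_xy, float)
    ratios = np.asarray(gauge_ratio, float)
    out = np.empty(len(grid_xy))
    for i, p in enumerate(np.asarray(grid_xy, float)):
        d = np.hypot(*(gauge_xy - p).T)
        if np.any(d == 0.0):             # cell contains a gauge
            out[i] = ratios[d == 0.0][0]
        else:                            # inverse-distance weighting
            w = 1.0 / d ** power
            out[i] = np.sum(w * ratios) / np.sum(w)
    return out

# Final P3 product: multisensor = ratio_field * radar_only_field.
```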
c. NCEP processing
1) NCEP stage II
During the 1995–96 time frame, NCEP started producing its own stage II product. We describe the NCEP stage II product only for information; we do not use it in our analysis. NCEP produces two CONUS-wide products, stages II and IV, which causes some confusion with the stage II product that was produced at the RFCs (described previously in this section). It is important to note that the NCEP stage II product is different from the NWS PPS stage II product. The NCEP stage II algorithm was adapted from the optimal rainfall algorithm developed by the NWS Office of Hydrologic Development (OHD; Seo 1998) and is produced on the national HRAP grid. This national product was developed for data assimilation for use in national QPF (Lin and Mitchell 2005). NCEP stage II uses Automated Surface Observing System (ASOS) sites and the DPA (also known as stage I) data to generate an hourly 4-km QPE in near–real time, and then it uses the Hydrometeorological Automated Data System (HADS; Kim et al. 2009) automated gauges and DPA to rerun the hourly 4-km QPE 6 and 18 h later. The impetus for generating the near-real-time QPE is QPF data assimilation.
2) NCEP stage IV
In 2001, stage IV data began to be operationally produced at NCEP’s Environmental Modeling Center (EMC). Lin and Mitchell (2005) provide an overview of the stage IV process, and NCEP maintains an informational web page to provide other details of the stage IV process (http://www.emc.ncep.noaa.gov/mmb/ylin/pcpanl/stage4/). Archived stage IV data are available from the National Center for Atmospheric Research (NCAR) (CPC/National Centers for Environmental Prediction/NWS/NOAA/U.S. Department of Commerce, and Joint Office for Science Support/UCAR 2000). Stage IV data are the mosaicked data from the 12 RFCs in the CONUS (Alaska is not included in stage IV). The precipitation estimates from each RFC (i.e., the MPE, P3, or Mountain Mapper estimates), which NCEP terms the Regional Multisensor Precipitation Analysis (RMPA), are sent to NCEP at semiregular intervals. NCEP performs the mosaic for hourly and 6-hourly CONUS-wide maps that are gridded in their original projection (HRAP).
The 24-h stage IV product, reprocessed 24 h after the verification time, is the most accurate product for postanalysis of precipitation data. Use of the stage IV products in real time is not optimal because the hourly and 6-hourly maps are generated at differing times and for different reasons. For example, RFCs first generate hourly and 6-hourly analyses automatically, with no manual quality control; the analysis is then redone several hours later in manual mode, with quality control performed by a human analyst. NCEP generates the national hourly, 6-hourly, and daily mosaics shortly after the top of the hour (i.e., with approximately a 30-min delay), but some RFCs send their RMPAs at varying times, so a complete CONUS-wide map may not be available until several hours later. NCEP also produces a 24-h analysis that is the summation of the 6-hourly analyses. At least one RFC (the Northwest RFC) does not send hourly RMPAs to NCEP for processing; it sends only the 6-h RMPA. It is also important to note that the 6-hourly analysis files are not the accumulation of the hourly analyses: the RFCs send the hourly and 6-hourly analyses separately. The hourly RMPAs that NCEP receives are not manually quality controlled, whereas the 6-hourly analyses included in NCEP stage IV are the precipitation maps that are manually quality controlled at the RFCs. RFCs do quality control the hourly maps, but later than the time at which they are sent to NCEP, and NCEP is sometimes unable to include the manually quality controlled hourly maps from every RFC in the stage IV hourly estimates.
3. Biases in stage IV
In this section we present the sources of uncertainty and problematic areas that exist in both the hourly and 6-hourly stage IV products. We do not identify the well-known discontinuities and anomalies that exist as part of single-radar reflectivity scanning and processing [i.e., beam blockage (Young et al. 1999; Nelson et al. 2010), hot and cold biases (Nelson et al. 2010), bright band (Smith et al. 1996), anomalous propagation (Krajewski and Vignal 2001), and the cone of silence (Nelson et al. 2010)]. Rather, we focus on the larger aspects of combined radar processing, combined RFC processing, and CONUS-wide QPE estimation.
As a purely qualitative snapshot, Fig. 2a shows the yearly average of the stage IV product (using the daily product maps) over the 11-yr period, and Fig. 2b shows the climatology from the PRISM for the same period.
a. Mosaicking process
The mosaicking process creates discontinuities in areas that are distinctly outside of a certain RFC but may fall within the mosaicking domain of more than one RFC. This is mostly relevant over the oceans and in Canada but can also be seen in the area covering the Great Lakes region. There is also an issue with mosaicking in the western part of the Missouri RFC and the eastern part of the Northwest RFC for the hourly stage IV products. Evidently, data from the Missouri RFC are used over the Northwest RFC at certain times and not others, which causes the "rectangular" discontinuity in the hourly maps.
Pixels that fall outside of an RFC domain have an additional uncertainty associated with their estimation. The process of determining these estimates follows certain steps (Lin and Mitchell 2005). If a point falls outside an RFC's domain but an estimate exists (from one or several RFCs), the average of those estimates is used. If a point falls within an RFC's domain but that RFC's estimate is missing, the average of any estimates from other RFCs covering that point is taken. The method can be illustrated by the discontinuity created along the boundaries of RFCs (Fig. 3). As an example in Fig. 3, the overlap region of the Missouri RFC and the North Central RFC can be seen along the Canadian border. The long-term average of precipitation in this overlap region shows a discontinuity as compared to either RFC alone, due in part to the averaging of daily values from both RFCs. Of note at the hourly scale, the Northwest RFC does not provide precipitation estimates, so the hourly stage IV data in this RFC include data from other RFCs but not the Northwest RFC. The bottom of Fig. 3 shows the rectangles that correspond to the RFC processing areas as they exist in the CONUS-wide processing grid.
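Our reading of these per-cell rules can be condensed into a sketch; the function and argument names are ours, and the operational mosaicking in Lin and Mitchell (2005) involves additional details not captured here.

```python
import numpy as np

def mosaic_cell(estimates, inside_domain):
    """Combine per-RFC estimates for one grid cell.

    estimates     : value from each RFC, np.nan where that RFC has none.
    inside_domain : True where the cell lies inside that RFC's domain.
    A simplified sketch of the averaging rules described in the text.
    """
    est = np.asarray(estimates, float)
    inside = np.asarray(inside_domain, bool)
    own = est[inside]
    own = own[~np.isnan(own)]
    if own.size:                          # average estimates from owning RFC(s)
        return float(own.mean())
    other = est[~np.isnan(est)]           # otherwise average any other estimate
    return float(other.mean()) if other.size else float("nan")
```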
b. RFC processing
Not all RFCs generate precipitation estimates with the same rainfall algorithm. For instance, the majority of the RFCs (Missouri, North Central, Ohio, Mid-Atlantic, Northeast, Southeast, Lower Mississippi, and West Gulf) use the MPE algorithm developed by the NWS OHD (J. P. Breidenbach et al. 2002, unpublished manuscript). A key step in the MPE is identifying which reflectivity value to use for each pixel. The MPE algorithm uses the pixel associated with the lowest unobstructed sampling volume, which usually results in using the pixel closest to the identified radar. This processing produces defined borders for each radar, visually akin to Thiessen polygons. The pattern is actually due to slight differences in reflectivity values from adjacent radars, which can result from hot and cold biases in radar operating strategies (Breidenbach et al. 1999, 2001). Figure 4a clearly shows the artificial "border" in a long-term average of the stage IV product.
As discussed earlier, the RFCs generate precipitation estimates via different algorithms. For example, Fig. 4c is a long-term average of the Arkansas–Red basin precipitation. The ABRFC uses the P3 algorithm, which is the newer version of the P1 algorithm evaluated in Young et al. (2000). There are no "polygons" in this map, but radar "rings" (at approximately 230 km from the radar) are evident; these result from the compositing performed in the P3 algorithm. The ring borders are also due to hot and cold biases in radar operating strategies.
Figure 4b shows the long-term average of the hourly stage IV. The issues related to automated processing are evident in the California–Nevada RFC (CNRFC) precipitation estimates. In addition, the lack of gauge coverage is clearly an issue, as the radius of influence of each gauge is limited. Finally, quality control of the gauges for the hourly maps is lacking as a result of the automated nature of the hourly RFC processing. Clearly, in the western RFCs the hourly maps should be used with caution.
c. RFC border
Figure 4d shows an example of the QPE at the border of the Missouri, North Central, Ohio, and Lower Mississippi RFCs. Figure 4d is a good example of a main problem in the stage IV dataset. Discontinuities are due not only to radar-to-radar hot and cold biases but also to what can be termed RFC-to-RFC hot and cold biases. RFC borders are evident in the long-term accumulation/average of the QPE, an indication of the differences in processing from RFC to RFC that produce a bias throughout the dataset.
4. Assessment of stage IV
In this section we provide an assessment of the stage IV product. The assessment is intended to put the product in context for its use in comparative studies.
a. Conditional analysis
The distribution of rainfall is approximately lognormal. Therefore, analyzing the data conditionally provides a better way of describing the bias, error, and correlation than unconditional analysis. Evaluating bias, error, and correlation in an unconditional sense does not tell the whole story because over- (under-) estimation in one regime can wash out the under- (over-) estimation in another regime. In addition, owing to the nature of rainfall estimation, biases and errors are a function of precipitation intensity. Figure 5a is the cumulative distribution function (CDF) for the period of record for each of the 12 RFCs. The CDF allows us to describe the distribution of rainfall for each RFC. Figure 5b provides some percentiles of interest (50th, 70th, and 90th) to describe the distribution of rainfall and to provide a comparison from RFC to RFC. The classification of the different regimes as a function of the daily rainfall intensity is rather arbitrary. However, a few classifications exist in the literature (Alpert et al. 2002; Arnone et al. 2013). Alpert et al. (2002) propose a classification in six classes (light, 0–4 mm day−1; light–moderate, 4–16 mm day−1; moderate–heavy, 16–32 mm day−1; heavy, 32–64 mm day−1; heavy–torrential, 64–128 mm day−1; and torrential, >128 mm day−1). More recently, Arnone et al. (2013) provided a simpler classification (light, 0.1 ≤ R < 4 mm day−1; moderate, 4 ≤ R < 20 mm day−1; and heavy, R ≥ 20 mm day−1). We note that the moderate–heavy category corresponds to the wet millimeter days (wmmd) threshold (>17.8 mm day−1 for CONUS), the heavy category corresponds to 50 mm day−1, and the heavy–torrential category corresponds to 100 mm day−1, which are commonly used thresholds (Prat and Nelson 2015). For light rain (50th percentile) there is little difference from RFC to RFC in the daily accumulation corresponding to the percentile. However, from season to season and at the local scale there are significant differences from RFC to RFC.
In the following sections we describe the stage IV QPEs by season and percentile. For heavy and very heavy rainfall (90th and 99th percentiles) there are significant differences from RFC to RFC. For example, in Fig. 5a the range of values is from 8 mm day−1 (CBRFC) to 29 mm day−1 (LMRFC) at the 90th percentile and from 25 mm day−1 (CBRFC) to 73 mm day−1 (LMRFC) at the 99th percentile. The ranges of values for heavy precipitation vary from RFC to RFC when broken down by season. Figure 5 and Table 2 show the range of values for the percentiles of interest (50th, 70th, 90th, and 99th).
In a separate study (Prat and Nelson 2015) we used rain-rate-based thresholds [i.e., 17.8 mm day−1, 2 in. day−1 (50.8 mm day−1), and 4 in. day−1 (101.6 mm day−1)] for the entire CONUS. The goal was to compare the different sensors (rain gauges, stage IV, and satellite) and their ability to capture intense–heavy precipitation regardless of the climatic characteristics of the area. For instance, the number of heavy events decreases significantly moving westward. In this work, we consider a different approach. Our goal is to evaluate the performance of the NCEP stage IV dataset regardless of the climatic characteristics of the RFC domain. Therefore, using a percentile rather than an intensity threshold allows a comparison of stage IV performance from RFC to RFC in a statistically significant way (distributions versus distributions) and minimizes the influence of the climatological characteristics of each RFC. On average, the choice of the 50th percentile corresponds to the light rainfall definition mentioned above [<4 mm day−1; Alpert et al. (2002); Arnone et al. (2013)]. Furthermore, the light–moderate and moderate–heavy thresholds, defined here as the 50th–70th and 70th–90th percentiles, respectively, correspond loosely to the classifications mentioned above (Alpert et al. 2002; Arnone et al. 2013). They represent a compromise between both classifications, ensuring that each category [L, 50th (2 mm day−1 < R < 5 mm day−1); L–M, 70th (3 mm day−1 < R < 11 mm day−1); M–H, 90th (8 mm day−1 < R < 29 mm day−1); and H, 99th (>20–25 mm day−1)] is statistically significant.
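Deriving such percentile-based thresholds from one RFC's daily series is straightforward; in the sketch below, the wet-day filter (R > 0) and the function name are our assumptions.

```python
import numpy as np

def percentile_thresholds(daily_mm, percentiles=(50, 70, 90, 99)):
    """Daily-intensity thresholds (mm/day) at the requested percentiles,
    computed over wet days only, mirroring the L / L-M / M-H / H
    categories used in the text."""
    wet = np.asarray(daily_mm, float)
    wet = wet[wet > 0.0]                  # restrict to wet days
    return dict(zip(percentiles, np.percentile(wet, percentiles)))
```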
1) Light precipitation
Light precipitation is an important and sometimes overlooked aspect of the precipitation regime. Because light precipitation occurs throughout the diurnal cycle, capturing it correctly is particularly important. Furthermore, the challenge in retrieving both light and heavy precipitation (i.e., both ends of the spectrum) is well known.
In this section we examine light precipitation within the CDF for each season. For the purposes of our study we have defined light precipitation as the 50th percentile. This is a subjective choice, but as is evident in the CDF, at the lower percentiles there is little difference among RFCs in the corresponding daily value. This is mainly because so many values fall at the low end of the precipitation regime, owing either to climatology or to the minimum detectable threshold of rainfall defined at each RFC. Figure 6 shows a map of the 50th percentile for the CONUS for each season.
The seasonal distribution of light rainfall shows the challenges involved with detecting precipitation throughout the CONUS. In the western RFCs (NW, CN, and CB), light precipitation is dominant in the winter season with some contribution in spring and fall, but there is very little light precipitation during the summer season. There is certainly a climatological effect, as the western United States sees little precipitation in the summer, but because the western RFCs process QPE differently, processing also contributes to the lack of observed precipitation. This contribution stems from sparse gauge networks that cannot fully capture precipitation and from the fact that the western RFCs do not use radar data in their processing. Other factors, such as evaporation between the cloud base and the surface, could also contribute to the lack of observed precipitation.
In the RFCs that use the DPA radar data (MB, AB, WG, NC, LM, OH, NE, MA, and SE), there are obvious coverage issues from season to season. This is particularly true in the Missouri, Arkansas–Red, and West Gulf RFCs but is evident in others as well. For example, the western edges of these RFCs show a much different climatology from the winter season to summer. The coverage issue arises because the sparse radar network is not spatially sufficient to detect rainfall quantitatively during the cool season. There is also significant light precipitation in the Gulf of Mexico states during the winter and fall, and this area of significant precipitation falls a bit north of the Gulf of Mexico states in the spring (Prat and Nelson 2014). Finally, it is interesting to note that the Florida Peninsula shows less light precipitation in the winter, its dry season.
2) Heavy precipitation
We define heavy precipitation as the 90th percentile at each RFC. The values of heavy precipitation at this percentile range from 6.4 mm day−1 (MRRFC) in the winter to 30.7 mm day−1 (LMRFC) in the spring (Table 2). This range of values emphasizes the difficulty in describing the QPE from season to season and from RFC to RFC. The lower winter values in the Missouri River RFC are likely due to the climatology of winter precipitation (i.e., snow and frozen precipitation), and the large values in the Lower Mississippi RFC are likely due to the climatology of spring precipitation and the convective nature of spring storms. Figure 7 shows the seasonal maps of precipitation for the 90th percentile at each pixel. Of note for heavy precipitation is wintertime precipitation in the Northwest and California–Nevada RFCs; climatologically, this region gets the majority of its precipitation in the winter, with some heavy events in the spring and fall. Heavy precipitation in the summer is largest in the central United States, where, climatologically, large convective storms pass through. Heavy events are concentrated along the Gulf coast during spring and fall and in the mid-Atlantic in the spring.
3) Very heavy precipitation
We define very heavy precipitation as the 99th percentile at each RFC. The values at this percentile range from 20.6 mm day−1 (CRRFC) in the summer to 87.8 mm day−1 (LMRFC) in the fall. Figure 8 shows the seasonal maps of precipitation for the 99th percentile at each pixel. In Fig. 8 the climatology of extreme precipitation is evident. For example, the very heavy precipitation on the Gulf coast in the fall is most likely due to tropical storms and hurricanes; a larger number of tropical storms make landfall in the Gulf coast region during the fall season than in other seasons (Prat and Nelson 2013, 2014). The very heavy events in the central and northern plains happen in the summer season, which agrees with climatology and the patterns of highly convective thunderstorms. In the spring the very heavy events are concentrated in the central plains and the Tennessee and Ohio River valleys as well as along the Gulf coast. In California and the Northwest the very heavy events are concentrated in the winter, but there are also very heavy events in the fall.
b. In situ verification
Verification of the NCEP stage IV precipitation product is difficult for many reasons, including but not limited to the lack of a consistent, long-term, CONUS-wide rain gauge dataset. For the purposes of verifying the stage IV precipitation product over the CONUS, we use the U.S. Climate Reference Network (USCRN). The USCRN was implemented in response to the challenges of siting a climate network so that station histories remain free of changes, and the program aims to create a set of station records that provide a robust multidecadal climate monitoring capability (Diamond et al. 2013). For precipitation, each USCRN station has three independent vibrating-wire weighing transducers that suspend the precipitation-bucket cradle and provide three independent measurements of the depth of precipitation that has fallen into the bucket. Data are quality controlled and are available at the 5-min, hourly, daily, and monthly scales in local standard time. For this study we use USCRN daily precipitation amounts accumulated from the hourly precipitation data, with a conversion from local standard time to UTC to match the stage IV precipitation product. Figure 1 shows the distribution of gauges across the CONUS. While coverage is sparse over certain RFCs, we consider this the only in situ dataset available for CONUS-wide comparisons. Another widely used CONUS-wide rain gauge network, the Global Historical Climatology Network (GHCN; Menne et al. 2012), provides a much denser network at the daily time scale, but an investigation of this network showed that some of its sites are used in some capacity in the processing of the RFC-wide precipitation data sent to NCEP as inputs to stage IV. The specific gauge locations that are used are not known, and it is thus difficult to identify the gauges that would be independent enough for verification.
Thus, for verification purposes we use all available USCRN precipitation observations to compare daily precipitation. The period of record used for comparison varies from gauge to gauge based on installation date: some gauges were installed in 2002 or earlier and have a full record for the study period, while others were installed later (some as late as 2010) and thus have a shorter record.
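The local-standard-time to UTC accumulation described above can be sketched as follows. The 5 h offset (an Eastern-time site) and the 12 UTC-ending 24 h accumulation window are illustrative assumptions; each USCRN station has its own fixed LST offset.

```python
import pandas as pd

# Placeholder hourly gauge totals (mm) indexed in local standard time (LST).
hours_lst = pd.date_range("2010-06-01 00:00", periods=48, freq="h")
hourly_mm = pd.Series(1.0, index=hours_lst)

# Shift the timestamps from LST to UTC (5 h offset assumed for this sketch).
hourly_utc = hourly_mm.copy()
hourly_utc.index = hourly_utc.index + pd.Timedelta(hours=5)

# Accumulate to 24 h totals on a 12 UTC boundary (assumed here to match the
# stage IV daily accumulation window).
daily_utc = hourly_utc.resample("24h", offset="12h").sum()
print(daily_utc)
```

Only the middle bin covers a full 24 hours in this two-day example; the partial first and last bins illustrate why record start/end dates matter when matching gauge accumulations to the gridded product.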
Comparisons of gauge-to-pixel estimates for all of the available daily USCRN data are shown in the scatterplots in Fig. 9. While comparisons of unconditioned data show strong statistics for certain RFCs, we provide the bias, fractional standard error (FSE), and correlation based on certain conditions of the rain rate. All statistics are calculated for rain rates conditioned on the rain-gauge value. The bias is calculated as the ratio of the radar estimate to the gauge estimate:
$$\mathrm{bias} = \frac{\sum_{i=1}^{N} R_i}{\sum_{i=1}^{N} G_i},$$
where $R_i$ and $G_i$ are the $i$th collocated stage IV and gauge daily values and $N$ is the number of pairs meeting the condition.
The FSE provides a measure of the error for various conditions of rain rate, defined as
$$\mathrm{FSE} = \frac{\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(R_i - G_i\right)^2}}{\bar{G}},$$
where $R_i$ and $G_i$ are the collocated stage IV and gauge daily values and $\bar{G}$ is the average gauge value.
The FSE is a normalized root-mean-square error that is normalized by the average gauge value for the given RFC and condition. Additionally, the correlation is defined as the sample Pearson correlation coefficient. Table 3 provides the bias values by RFC for each condition of precipitation for each season, Table 4 provides the FSE values by RFC for each condition of precipitation and each season, and Table 5 provides the sample Pearson correlation coefficient for each condition of precipitation and each season.
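Under the definitions above, the three verification statistics can be sketched minimally as follows; the bias is taken here as the ratio of summed radar to summed gauge amounts over the conditioned pairs, which is an assumption about how the ratio is aggregated.

```python
import numpy as np

def bias(radar, gauge):
    # Ratio of total radar (stage IV) to total gauge precipitation.
    return np.sum(radar) / np.sum(gauge)

def fse(radar, gauge):
    # RMSE normalized by the mean gauge value for the RFC/season/condition.
    radar, gauge = np.asarray(radar, float), np.asarray(gauge, float)
    rmse = np.sqrt(np.mean((radar - gauge) ** 2))
    return rmse / np.mean(gauge)

def correlation(radar, gauge):
    # Sample Pearson correlation coefficient.
    return np.corrcoef(radar, gauge)[0, 1]

g = np.array([1.0, 2.0, 5.0, 10.0])   # hypothetical gauge values (mm)
r = np.array([1.5, 1.8, 6.0, 9.0])    # hypothetical collocated stage IV values
print(bias(r, g), fse(r, g), correlation(r, g))
```

In the analysis these functions would be applied per RFC, per season, and per rain-rate condition, with the condition applied to the gauge values.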
Tables 3–5 (as well as Figs. 11–13, shown later) present the bias, FSE, and correlation for each RFC for four different conditions of rain rates. The light rain rate corresponds to rainfall greater than zero and less than the 50th percentile for the RFC. The light-to-moderate rain rate corresponds to rainfall greater than the 50th percentile and less than the 70th percentile. The moderate-to-heavy rain rate corresponds to rainfall greater than the 70th percentile and less than the 90th percentile. The heavy rain rate corresponds to rainfall greater than the 90th percentile. The percentile values are given in Table 2.
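The four gauge-conditioned classes can be sketched as below. The boundary handling (closed on the left, open on the right) is an assumption, as is computing the percentiles from the wet sample itself rather than taking the fixed values from Table 2.

```python
import numpy as np

def classify(gauge_mm):
    """Assign each daily gauge value to one of the four rain-rate classes."""
    gauge_mm = np.asarray(gauge_mm, dtype=float)
    wet = gauge_mm[gauge_mm > 0]
    p50, p70, p90 = np.percentile(wet, [50, 70, 90])
    labels = np.full(gauge_mm.shape, "dry", dtype=object)
    labels[(gauge_mm > 0) & (gauge_mm < p50)] = "light"
    labels[(gauge_mm >= p50) & (gauge_mm < p70)] = "light-to-moderate"
    labels[(gauge_mm >= p70) & (gauge_mm < p90)] = "moderate-to-heavy"
    labels[gauge_mm >= p90] = "heavy"
    return labels

# Illustrative input: daily totals of 1, 2, ..., 100 mm.
labels = classify(np.arange(1.0, 101.0))
print({c: int((labels == c).sum())
       for c in ("light", "light-to-moderate", "moderate-to-heavy", "heavy")})
```

By construction roughly 50%, 20%, 20%, and 10% of the wet days fall into the four classes, which is why sample sizes shrink toward the heavy end of the distribution.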
1) Bias
Our discussion of bias relates to over- or underestimation by the radar estimate as compared to the gauge measurement. A bias of 2 means the radar estimate is twice the gauge measurement (overestimation), and a bias of 0.5 means the radar estimate is half the gauge measurement (underestimation). In general, the largest underestimations (Fig. 10 and Table 3) exist in the western RFCs. The largest underestimation is for the CNRFC during the summer for light rainfall, at 0.38, and a similarly small bias value exists at the same RFC for light-to-moderate rain in the summer. The large summer underestimation for light rain at the CNRFC may be due in part to limited sampling. Underestimation in the western RFCs arises in general from the use of the Mountain Mapper algorithm for the stage IV product: large underestimation is expected at gauge locations that are not used in the Mountain Mapper estimation (i.e., CRN gauge locations).
The largest biases (stage IV overestimation; see Fig. 10 and Table 3) exist for light rainfall at the MBRFC in summer, at 2.01. There are three possible reasons for a large bias no matter which RFC is being evaluated: 1) the reflectivity–rain-rate relation; 2) rainfall representativeness, which refers to the difference in the sampling volumes of a gauge versus a radar pixel (i.e., submeter scale versus kilometer scale); and 3) possible brightband or hail contamination. In fact, the largest biases are for light precipitation for all RFCs and for all seasons, suggesting the continued difficulty of radar-rainfall estimation in this regime. The largest biases for light-to-moderate rainfall are for the MARFC for all seasons. For moderate-to-heavy and heavy rainfall the biases are greatly reduced for the eastern RFCs (MB, AB, WG, NC, OH, LM, NE, and SE), although some small bias values (stage IV underestimation) are evident for most seasons at these rain rates.
While most RFCs exhibit a large overestimation for light rainfall, the biases are generally reduced with increasing rain rates. The RFCs that tend to perform best with respect to bias are the AB, SE, and NC RFCs, where biases tend to be neutral for light-to-moderate and moderate-to-heavy rain rates. At high rain rates all RFCs tend to underestimate, with the exception of the CNRFC.
2) Fractional standard error
The fractional standard error (Fig. 11 and Table 4) is the RMSE normalized by the average gauge value for the particular RFC, season, and condition. For light precipitation there are large FSEs in the eastern and southern RFCs. The relative FSE is large, but the absolute error is small: an FSE of 100% equals the average gauge value for that RFC and condition, so an FSE of 100% with an average light-precipitation value of 1 mm day−1 corresponds to an RMSE of 1 mm day−1. In this sense the FSE provides a relative measure of RMSE, and an FSE of 50% for heavy rainfall thus suggests that rainfall estimation in this regime is improving. The OHRFC, WGRFC, and SERFC have the largest FSEs for light precipitation. The large FSEs shift westward for larger rain rates: the MBRFC, CBRFC, and CNRFC have large FSEs for light-to-moderate, moderate-to-heavy, and heavy precipitation. It is also of note that FSEs decrease with increasing rain rates.
The smallest FSEs are in the eastern RFCs. For example, for light and light-to-moderate precipitation the smallest FSEs are in the NERFC, except for the SERFC in winter for light precipitation. For moderate-to-heavy and heavy precipitation the smallest FSEs are in the NERFC, LMRFC, SERFC, and ABRFC.
3) Correlation
Our conditional analysis is based on the percentiles of the distribution that correspond to certain types of precipitation, and thus our statistics are computed with bounds on both ends of the distribution. We evaluate our statistics this way because we have defined the various precipitation regimes. The relatively few studies that evaluate precipitation conditionally for the CONUS (Prat and Nelson 2015; Habib et al. 2013; Wu et al. 2012) do so by bounding only the lower end of the distribution. The results (Fig. 12 and Table 5) show very low correlations in the western RFCs for all precipitation regimes and seasons. The smallest correlation is close to zero, at the CNRFC for light precipitation in the summer; the correlation for the MBRFC in winter is also close to zero. These small correlations for light precipitation are likely due to radar error (i.e., not detecting rainfall) or, for the western RFCs, to the verifying gauge (i.e., CRN) lying outside the radius of influence of the gauges used in the gauge-based estimation algorithm. Other points are also of note. There is a small dip in the correlation values at light-to-moderate rain rates, likely due to 1) sampling and 2) the transition between Z–R relations. In Fig. 13 we show the same correlation statistic but with the condition bounded only on the lower end; Fig. 13 is shown for comparison to past studies and reveals how correlation values can differ depending on how the distributions are bounded. We also note that the correlations increase at larger rainfall rates, with the largest correlations in the eastern and southern RFCs. The largest correlation is 0.92, for the NERFC in the spring, but large correlations exist for the SERFC, NERFC, LMRFC, and ABRFC for all seasons for heavy rainfall.
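The effect of the two conditioning conventions can be illustrated with synthetic data: bounding a class on both ends restricts the dynamic range of the gauge values, which typically lowers the correlation relative to bounding only below. The multiplicative error model below is purely illustrative, not the stage IV error structure.

```python
import numpy as np

rng = np.random.default_rng(1)
gauge = rng.gamma(2.0, 5.0, size=5000)                  # synthetic gauge (mm)
radar = gauge * rng.lognormal(0.0, 0.4, size=5000)      # multiplicative error

p70, p90 = np.percentile(gauge, [70, 90])

both_ends = (gauge >= p70) & (gauge < p90)  # class bounded on both ends
lower_only = gauge >= p70                   # class bounded only below

r_both = np.corrcoef(radar[both_ends], gauge[both_ends])[0, 1]
r_lower = np.corrcoef(radar[lower_only], gauge[lower_only])[0, 1]
print(r_both, r_lower)
```

The lower-bound-only sample retains the heavy upper tail, so its gauge variance (and hence its correlation) is larger; this is one reason correlation values from studies with different conditioning conventions are not directly comparable.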
c. Discussion and areas for improvement
In this section we put this study in context with past studies of the stage IV product at the daily scale. There have been relatively few studies of the stage IV product over the CONUS for a long period of record; in fact, we find only three that provide a CONUS-wide evaluation. Prat and Nelson (2015) evaluated several rainfall products over the CONUS for the period 2002–12, Wu et al. (2012) evaluated the MPE product along with NSSL's Multi-Radar Multi-Sensor (MRMS) product, and Hou et al. (2014) attempted to adjust the stage IV product with the Climate Prediction Center's gauge-based product. Of these three, only Prat and Nelson (2015) evaluates the stage IV product over the same long-term period as this study (2002–12); the Hou et al. (2014) study covers 2002–09. Several other studies have evaluated the stage IV product but only for one RFC or one region. Westcott et al. (2008) compared the MPE products to gauges, but they degraded the spatial resolution at the daily scale, and their period of record was 2002–05. Young and Brunsell (2008) evaluated MPE products for a longer period (1998–2004) for the Missouri Basin RFC. Habib et al. (2013) evaluated several of the NWS operational products, including MPE, but for a small basin and only for a 2-yr period (2005–06).
We now compare our results with the three CONUS-wide studies. Prat and Nelson (2015) found a general underestimation by the radar at the annual average scale, which they attributed to the radar missing a growing fraction of rain events with increasing rain-rate threshold: at low rain rates the radar detects more events (hence higher accumulation and overestimation), while at higher rain rates it detects fewer events (hence lower accumulation and underestimation). The Wu et al. (2012) study is most closely related to the current study (a CONUS-wide comparison of daily MPE estimates with rain-gauge observations), and hence we have evaluated the correlation and RMSE values of our study in a manner similar to theirs (Figs. 13 and 14). Wu et al. (2012, their Fig. 10) evaluate correlation and RMSE for warm and cool seasons but do not parse values by RFC. Our Fig. 13 shows a decreasing correlation with increasing rain rate that compares quite well with Wu et al. (2012), and the RMSE values in our Fig. 14 also compare quite well with theirs. Wu et al. (2012, their Fig. 2) also provide correlation values by RFC, lumped for the entire study period; they find similar highs and lows, with the lowest correlations at the western RFCs and the highest at the AB, WG, LM, and SE RFCs. Finally, Hou et al. (2014) provide one map of RMSE that loosely compares with the RMSE values from the current study.
Given the overview of the biases in the stage IV product and the error analysis, we provide five major areas for improving the NCEP stage IV precipitation product:
1) We recommend separating the western RFCs from the rest of the RFCs. Until there is better coverage of the NEXRAD network in the western RFCs, those three RFCs should use a consistent algorithm, meaning the same implementation of the process used to generate precipitation estimates. Similarly, the rest of the RFCs (where NEXRAD coverage is mostly good) should use a consistent algorithm. This will reduce or eliminate biases induced by generating precipitation estimates with different algorithms.
2) The merging of data from adjacent radars and adjacent RFCs needs improvement. An improved method of merging data in overlapping regions will reduce or eliminate this possible cause of spatial discontinuities in precipitation estimates.
3) The estimation of both light and heavy precipitation needs improvement. The existing estimation algorithm at most RFCs uses an optimal estimation technique that does not account for conditional biases in precipitation. An improved method such as conditional bias-penalized kriging (CBPK) has shown improvement in the estimation of heavy precipitation while adjusting for light precipitation at the same time (Seo 2013).
4) Although it is unlikely at this time that additions to the NEXRAD network will be made, we recommend that coverage in the western RFCs be improved. If more NEXRAD sites are not added, then a much denser rain gauge network should be deployed, or satellite QPE should be vastly improved.
5) Given the lack of NEXRAD coverage in the western RFCs, improvement of the estimation algorithm should be accelerated. In addition, satellite QPE could help improve estimates in these RFCs.
In this paper, we provide an overview of the NCEP stage IV precipitation product. The product consists of hourly, 6-hourly, and 24-hourly maps of precipitation at the 4 km × 4 km scale; in this analysis we evaluate only the 24-hourly maps. The maps of precipitation are generated by NCEP using a mosaicking technique that combines data from the 12 RFCs in the CONUS. We have provided an overview of the NWS precipitation processing system that generates quantitative precipitation estimates at the RFCs. The stage IV product is currently the only operational product that provides high-resolution radar-based precipitation estimates over the CONUS and is thus used in many studies for comparison of precipitation products (e.g., satellite QPE). Our findings indicate that the stage IV product could be useful for certain types of studies but should be used with caution in others. We outline the strengths and weaknesses here.
Although we have shown a general underestimation of the stage IV at high rain rates, the FSE is reduced with increasing rain rates and the correlation increases. Thus, the stage IV estimate has shown good performance at high rain rates. The only caution is that the radar has a higher percentage of missed events with increasing rain rate (Prat and Nelson 2015).
The stage IV product is operational and provided in a common data format [gridded binary (GRIB)]. Thus, the product is useful for studies that need high-resolution data spatially (4 km). There is also an ease-of-use factor with the product, as it can be transferred and processed easily (due to size and format).
Quantitatively, for certain locations and conditions, the stage IV product has shown good performance and statistics similar to previous studies. The stage IV product has been shown to have improving performance, as measured by correlation and FSE, at increasing rain rates (section 4). In gauge-sparse areas, and in areas of limited elevation change with adequate NEXRAD site coverage, the product can be useful for estimating convective-type precipitation and thus for comparison with other products (e.g., satellite QPE). We caution that stage IV tends to underestimate precipitation at increasing rain rates, partly as a result of a higher rate of missed events.
The stage IV product has the advantage of being able to better capture convective precipitation as compared to rain-gauge observations. As compared to satellite-based QPEs like TRMM, CMORPH, and GPCP, the stage IV product has a higher spatial resolution (Prat and Nelson 2015), and as an operational product it is bias adjusted in near–real time as compared to a lengthy delay of adjustment for the satellite-based QPE.
The quality control implemented at the RFCs at the hourly scale is automated, and this process often cannot identify bad rain-gauge reports. Bad rain-gauge reports may therefore be merged into the multisensor precipitation estimates. The hourly stage IV precipitation estimates should be used with caution, especially when comparing them to other datasets (e.g., satellite QPE).
Hourly data from the western RFCs (CNRFC, CBRFC, and NWRFC) should be used with caution. The gauge-only algorithm used at these RFCs does not provide enough nonautomated quality control to remove bad gauges. Figure 4b shows the bull’s-eyes that are not flagged in the automated process as an example. In addition, the NWRFC does not provide hourly precipitation estimates to the stage IV process.
Data in regions where coverage from two or more RFCs overlaps should also be used with caution. NCEP details the process for mosaicking data in such overlap regions. An example of a discontinuity due to compositing data in overlapping regions is found along the coast of South Carolina and Georgia.
Because each RFC uses different algorithms to generate QPEs, and because they use different processes to identify bad gauge reports or bad radar estimates, biases exist between RFCs. The best example is at the junction of five RFCs, near where southern Illinois, Missouri, Kentucky, Tennessee, and Arkansas meet: the Missouri, North Central, Ohio, Lower Mississippi, and Arkansas–Red RFCs. The borders of the RFCs can be seen in the long-term average of the QPE (Fig. 3d).
Finally, the underlying issues related to radar-based precipitation estimation are still evident in the stage IV product. This is an important point that should not be lost when using the stage IV product. As many studies have used and will be using the stage IV product because it is a consistent CONUS-wide product, these studies still need to refer to the fact that radar rainfall estimation has inherent problems and that the algorithms used at the RFCs try to reduce these problems but they cannot eliminate them altogether. Inherent biases in radar estimation are due to anomalous propagation, brightband contamination, beam blockage, range-dependent detection of rainfall, lack of radar coverage, and representativeness bias due to the physical nature of rainfall. The biases that exist in the algorithms used by the RFCs are due to the radar merging technique, the automated process for identifying bad gauge reports, and the use of only gauges (by certain RFCs).
Acknowledgments. The second author is supported by NOAA through the Cooperative Institute for Climate and Satellites–North Carolina under Cooperative Agreement NA14NES432003. In addition, the authors would like to express their gratitude to the reviewers for their in-depth reviews.