  • Dai, A., 1999: Recent changes in the diurnal cycle of precipitation over the United States. Geophys. Res. Lett., 26, 341–344.

  • Higgins, R. W., J. E. Janowiak, and Y-P. Yao, 1996: A gridded hourly precipitation data base for the United States (1963–1993). NCEP/Climate Prediction Center ATLAS 1, 47 pp.

  • Kondragunta, C., and K. Shrestha, 2006: Automated real-time operational rain gauge quality controls in NWS hydrologic operations. Preprints, 20th Conf. on Hydrology, Atlanta, GA, Amer. Meteor. Soc., P2.4. [Available online at http://ams.confex.com/ams/pdfpapers/102834.pdf].

  • Kursinski, A. L., and S. L. Mullen, 2008: Spatiotemporal variability of hourly precipitation over the eastern contiguous United States from stage IV multisensor analyses. J. Hydrometeor., 9, 3–21.

  • Lin, Y., and K. E. Mitchell, 2005: The NCEP stage II/IV hourly precipitation analyses: Development and applications. Preprints, 19th Conf. on Hydrology, San Diego, CA, Amer. Meteor. Soc., 1.2. [Available online at http://ams.confex.com/ams/pdfpapers/83847.pdf].

  • NCDC, cited 2003: Data documentation for Data Set 3200 (DSI-3200). [Available online at http://www.ncdc.noaa.gov/oa/documentlibrary/].

  • Nelson, B., D. J. Seo, and D. Kim, 2008: Multi-sensor precipitation reanalysis. Preprints, Int. Symp. on Weather Radar and Hydrology, Grenoble, France, Laboratoire d’étude des Transferts en Hydrologie et Environnement (LTHE), 02-004, 150 pp. [Available online at http://www.wrah-2008.com/PDF/O2-004.pdf].

  • NWS, 2002: Standard hydrometeorological exchange format (SHEF) manual. National Weather Service Manual 10-944. [Available online at http://www.nws.noaa.gov/directives/].

  • Seo, D-J., and J. Breidenbach, 2002: Real-time correction of spatially nonuniform bias in radar rainfall data using gauge measurements. J. Hydrometeor., 3, 93–111.

  • Tollerud, E., R. Collander, Y. Lin, and A. Loughe, 2005: On the performance, impact, and liabilities of automated precipitation gage screening algorithms. Preprints, 21st Conf. on Weather Analysis and Forecasting, Washington, DC, Amer. Meteor. Soc., P1.42. [Available online at http://ams.confex.com/ams/pdfpapers/95173.pdf].

Fig. 1. A schematic diagram of HADS real-time and archival product flows. A real-time precipitation product begins at NWS/OHD and is delivered to end users at an RFC or WFO. This product is also stored at NCEP and NCAR for other applications. The HADS program pushes original-format HADS data to NCDC once a day, where they are then reprocessed. Some RFCs report manually edited precipitation data, which are also archived at NCDC through the SRRS.

Fig. 2. The time series of accumulated precipitation at the Hungry Horse, MT (HGHM8), gauge station during a 7-day period from 0000 UTC 20 May through 2100 UTC 26 May 2006. Apparent small perturbations make true rain events difficult to detect. Such noise has existed since the beginning of the archive (October 1997).

Fig. 3. Distribution of subsampled HADS stations during September 2005. The subsample consists of every seventh station from an alphabetical list of all stations in the CONUS. At least one station must be present in each state, and stations with more than 7 days (168 h) of missing values are deleted.

Fig. 4. Distribution of all HADS stations available in NC and SC (solid dots) and COOP daily rain gauge stations (open circles).

Fig. 5. Two quality metrics comparing repro PP (dark bars) and real-time PP (gray bars) during 2003–05 for the CONUS. (a) Fractional missing values (the smaller, the better). (b) Percentage of top-of-the-hour observations (the larger the percentage, the better the time representation).

Fig. 6. Box plots of the bias ratio (monthly total precipitation comparing HADS to COOP) for the warm seasons of 2003–05. Median values of repro PP (dark boxes) are closer to unity than those of real-time PP.

Fig. 7. Empirical probability function of the gain (repro PP − real-time PP) for all three warm seasons. The function is trimmed between −10 and +11 mm with 0.5-mm class intervals. The distribution is skewed toward positive values; namely, repro PP recovered observation values that real-time PP missed. The mean value of the highest probability bin (0.0–0.5 mm) was 0.254 mm. The dashed line shows the fitted probability density function with a peak value of 0.18 mm.

Fig. 8. (a) Diurnal patterns of frequencies in missing values during warm seasons in the NC–SC domain. Solid circles connected with dashed lines are taken from real-time PP; open circles with solid lines are taken from repro PP. Real-time PP shows peaks of missing values at certain hours of the day, while repro PP reflects more of a uniform distribution in time. The peaks of missing values in 2004 are from May 2004. (b) As in (a) but for precipitation frequencies. Positive PP values are counted as rain events.

Characteristics of Reprocessed Hydrometeorological Automated Data System (HADS) Hourly Precipitation Data

  • 1 NOAA/NESDIS/NCDC, Asheville, North Carolina
  • 2 NOAA/NWS/Office of Hydrologic Development, Silver Spring, Maryland, and University Corporation for Atmospheric Research, Boulder, Colorado

Abstract

The Hydrometeorological Automated Data System (HADS) is a real-time data acquisition, processing, and distribution system operated by the Office of Hydrologic Development (OHD) of NOAA’s National Weather Service (NWS). The initial reprocessing of HADS data from its original format since its inception in July 1996 has been completed at NOAA’s National Climatic Data Center (NCDC). The quality of the reprocessed HADS hourly precipitation data from rain gauges is assessed by two objective metrics: the average fraction of missing values and the percentage of top-of-the-hour observations for a 3-yr period (2003–05). Pairwise comparisons between the reprocessed product and the real-time product are made using representative samples (about 13%) from the 48 contiguous United States. The monthly average of missing values varies from 0.5% to 2% in the reprocessed product and from 1.7% to 10.1% in the real-time product. Except for January 2003, the reprocessed product consistently reduced missing values, by as much as 9.4% in October 2004. Top-of-the-hour observations are available about 85% of the time in the reprocessed product, but only about 50% of the time in the real-time product. This paper discusses real-time product quality issues, additional quality assurance algorithms used in the reprocessing environment, and the design of system-wide performance comparisons. The benefits to users of reprocessing the HADS data include the correction of 4-h observation time errors during 1 July–11 August 2005 and the demonstration of diurnal patterns of precipitation frequencies in regional domains. A Web-based interactive quality assessment tool for reprocessed HADS hourly precipitation data and access to the data are also presented.

Corresponding author address: Dongsoo Kim, NOAA/NESDIS/National Climatic Data Center, 151 Patton Ave., Asheville, NC 28801-5001. Email: dongsoo.kim@noaa.gov

1. Introduction

The Hydrometeorological Automated Data System (HADS) provides a collection of hydrometeorological observations from diverse networks that use Geostationary Operational Environmental Satellite (GOES) data collection platforms (DCPs) for real-time data transmission. The diverse networks that compose HADS include the U.S. Geological Survey (USGS), the U.S. Army Corps of Engineers (USACE) districts, and participants in the Remote Automated Weather Stations (RAWS) program hosted by the U.S. Department of Agriculture’s (USDA’s) Forest Service. Data are transmitted to the HADS program office at the National Weather Service’s Office of Hydrologic Development (NWS/OHD) for processing and archiving. In this paper we focus on one particular class of observations (hourly precipitation) from the HADS dataset and undertake an effort to enhance and improve it, both spatially and temporally. This reprocessing effort is driven by the fact that hourly rain gauge data are needed in order to describe precipitation for finer-scale events, such as diurnal variations of convective storms, heavy rains that trigger debris flow, and verifications of model forecasts, to name a few. For any scientific study, high quality data are necessary. Often, however, missing values render the record incomplete, and therefore the users have to estimate the missing values. The reprocessing effort allows for the recovery of certain missing data points and for the rigorous quality control of the raw data to provide an improved dataset for use in research and climatic applications.

The purposes of reprocessing the HADS data are threefold: 1) to enlarge the hourly hydroclimate database for use in various applications such as multisensor precipitation reanalysis (Nelson et al. 2008); 2) to provide users of real-time data, such as forecasters at NWS Weather Forecast Offices (WFOs) and River Forecast Centers (RFCs), with data quality information for specific gauge stations; and 3) to provide improved-quality hourly precipitation data to the user community. Rain gauge data often come with some measure of ambiguity, and missing values are a source of much of it. Precipitation data are encoded as missing when 1) the gauge was not functioning at the time of a scheduled measurement, 2) data transfer was disrupted at the time of transmission, or 3) the data storage or product generation processes failed temporarily. In addition, the production system may encode a value as missing when it fails a quality threshold, for example, when the hourly precipitation amount is negative. We revisited the issue of missing values in the real-time precipitation data used by the NWS WFOs and RFCs and applied corrective measures by reprocessing the original-format precipitation data and comparing the results with the hourly precipitation products generated in real time.

Near-real-time HADS data are available online for 1 week at the NWS/Office of Hydrologic Development (OHD) Web site (http://www.nws.noaa.gov/oh/hads/). Currently, the original-format precipitation data are transferred to the National Climatic Data Center (NCDC) at the end of the day. Most of the historical data, collected since June 1996, are then stored and available for use at NCDC. Because of the diverse ownership of the networks included in HADS, it is difficult to expect uniform quality in precipitation measurements and sensor maintenance. In addition, the locations of the surface stations are determined by the network owner’s mission requirements. As a result, the spatial density of the gauges is highly inhomogeneous and the number of stations changes over time. On average, about 6200 rain gauges were available in 2007, while only about 2800 were available in 1996.

The HADS program produces hourly precipitation data in real time to support operational hydrologic forecasting at the NWS. For example, the HADS precipitation data are used in quantitative precipitation estimation (QPE) such as multisensor precipitation analysis (Seo and Breidenbach 2002). HADS precipitation data make up at least 70% of the hourly precipitation data used by RFC forecasters. As such, improvements in quality, including the reduction of missing values, contribute directly to the overall improvement of the QPE product at each RFC.

There are two precipitation-related variables in the HADS data: cumulative and incremental precipitation amounts. More than 95% of the gauges have been reporting cumulative precipitation amounts since the reset of the value (coded as PC). Less than 5% of the gauges are reporting incremental precipitation at prespecified time intervals (coded as PP). It is simple to convert PC to PP by subtracting the previous PC value from the current PC value. When the increment is 60 min, the output measures hourly precipitation and is usually measured at the top of the hour. If the gauge reports subhourly PP, the running total of subhourly PP for 1 h also measures hourly precipitation.
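The PC-to-PP conversion described above can be sketched in a few lines. This is a minimal illustration, assuming hourly PC reports held in a Python list with `None` marking a missing report; the function name `pc_to_pp` and the rounding to two decimals are assumptions made for the example, not part of the HADS software.

```python
from typing import List, Optional


def pc_to_pp(pc: List[Optional[float]]) -> List[Optional[float]]:
    """Convert cumulative precipitation (PC) to incremental precipitation (PP).

    pc[i] is the accumulation at hourly step i; None marks a missing report.
    The first step has no predecessor, so its PP is None. Any difference that
    involves a missing PC value is also None.
    """
    if not pc:
        return []
    pp: List[Optional[float]] = [None]
    for prev, curr in zip(pc, pc[1:]):
        if prev is None or curr is None:
            pp.append(None)
        else:
            # Round to suppress floating-point residue (assumed 0.01 resolution).
            pp.append(round(curr - prev, 2))
    return pp


hourly_pc = [1.20, 1.20, 1.45, None, 1.60]
print(pc_to_pp(hourly_pc))  # [None, 0.0, 0.25, None, None]
```

Note that a single missing PC value produces two missing PP values, which is one reason the reprocessing tries to restore PC values before differencing.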

The HADS program produces hourly precipitation data and makes them available to users. This product is defined as “real-time PP,” as it is produced in real time. In the retrospective environment, data are recovered that would have been dropped in the real-time environment. This reprocessed PP output is defined as “repro PP.” In the remainder of the paper, we present the HADS precipitation data flow to help understand the staging places of the data and quality control practices. We discuss the reprocessing steps at NCDC and the analysis approaches with metrics of the fraction of missing values and the percentage of top-of-the-hour observations. We demonstrate the importance of the reprocessing by analyzing the diurnal cycle of the precipitation frequency in a regional domain. Finally, we conclude with recommendations for future study.

2. Data flow, quality assurance, and control practices

a. Data flow

Figure 1 shows a schematic of the HADS precipitation data and product flow in real time and from the archive. The HADS program office at OHD collects data from the DCP owners, produces PP values, and disseminates the data. In the real-time environment (solid lines in Fig. 1), both PC and PP are delivered to users at RFCs, WFOs, and the National Centers for Environmental Prediction (NCEP). NCEP collects PP values from both HADS and non-HADS data [e.g., Automated Surface Observation System (ASOS) hourly precipitation] for assimilation and verification purposes (Lin and Mitchell 2005). Here, a “real-time PP” value is defined as the product generated in an hourly cycle even if the station reports subhourly measurements. A historical archive of these values is available from the National Center for Atmospheric Research’s (NCAR) Earth Observing Laboratory (EOL) Web site (http://data.eol.ucar.edu/codiac/dss/id=21.004). In the archival environment, most NWS operational products (texts, grids, and graphics) are archived at NCDC through the Service Records Retention System (SRRS). Manually edited precipitation data created by RFCs and WFOs are embedded into this data flow. However, not all RFCs and WFOs report manually edited precipitation data; hence, the knowledge of the QC process in operational QPE is not preserved and the product is not reproducible (e.g., Kursinski and Mullen 2008). Another archival flow that began in May 2005 is for original-format PC values to be sent to NCDC. This archival flow is a part of the reprocessing of HADS precipitation data.

b. Quality control and assurance practices

The quality control and assurance (QC/QA) of HADS precipitation data were originally designed for real-time use in order to meet the operational mission of the NWS. The HADS program staff monitors incoming HADS data, updates metadata, and isolates obviously problematic stations. However, the QC/QA of the observed values is left to the end users.

The operational QC process for the hourly precipitation data at the RFCs follows four levels of QC procedures as described in Kondragunta and Shrestha (2006). The first level of the QC process deals with gross errors caused by instrument malfunction and transmission and coding–decoding errors due to format and configuration changes. The second level of the QC procedure checks for outliers outside of threshold values for each season and location. The third level uses neighboring gauge data and independent observations for spatial consistency checks, temporal consistency checks, and multisensor checks. The last level is left to the expert judgment of the forecaster. Screening of problematic data is the most important and time-consuming duty of the forecasters at RFC (J. Bradberry 2005, personal communication).

Gauge precipitation data are often used for the verification of quantitative precipitation forecasts (QPFs) from numerical weather prediction (NWP) models. Tollerud et al. (2005) developed a QC system for HADS precipitation data to verify model-based precipitation forecasts. In their work, the QC system was used to screen out questionable gauge stations. Questionable gauge measurements that violate internal threshold values in the QC system are considered to be gross errors. If the gross errors continue to be present, the gauge is labeled a “repeat offender.” These repeat offenders are entered into the list of rejected stations. Data from rejected stations were not used in the rest of the QC process. Improved verification scores resulted from the use of the quality-controlled data. The above system was developed based on real-time PP data served by NCEP, half of which are not from the top of the hour.

3. Reprocessing

The reprocessing of HADS hourly precipitation data begins with the decoding of original-format HADS data at full resolution as soon as OHD pushes them to NCDC at the close of the day. The decoded cumulative precipitation data are checked for temporal inconsistencies to recover missing values. Then, the detection and correction of spikes and noise in the hourly data complete the reprocessing step. At the beginning of a new month, we repeat the procedure by double-checking the data inventory and the metadata of the previous month.

a. Data preparation

Each month’s HADS data were parsed for two precipitation-related variables, PC and PP, using the NWS’s Standard Hydrometeorological Exchange Format (SHEF) decoding package (NWS 2002). In this process, illegal characters embedded in the SHEF-encoded HADS data were removed. Occasionally, a misplaced digit in the SHEF text caused a decoding failure. In these cases, the misplaced digit was manually corrected and the decoding step rerun. All decoded PC values were saved at reported intervals along with simplified metadata that include the following fields: station name, network owner, latitude, longitude, and measurement interval. In this way, a metadata list was created for each month that excludes stations that do not report precipitation. The inhomogeneity of the network providers means that not all measurements are reported at the same temporal interval. For example, some networks report measurements at 5-, 15-, and 30-min intervals, as well as hourly. The subhourly intervals provide an easy way to report hourly measurements at the top of the hour. However, some stations report only at hourly intervals that represent off-the-top-of-the-hour accumulations (e.g., 15 min past the hour, 30 min past the hour), causing misrepresentations of the observations in hourly precipitation data. We urge caution when using these off-the-top-of-the-hour measurements, and in this paper we separate them from the top-of-the-hour measurements in all analyses. A resulting indicator file shows whether the hourly PP is from the top of the hour or off the top of the hour.

The real-time PP process is set up to provide the latest real-time data to its users. This process, however, does not ensure that the hourly PP is the top-of-the-hour measurement. The issue of off-the-top-of-the-hour PP data arises in retrospective hourly analyses and can be detrimental to specific applications such as hydrologic forecasting or multisensor quantitative precipitation estimation. We have found that some Remote Automated Weather Station (RAWS) gauges were reporting off the top of the hour even though a majority of the RAWS gauges were reporting on the top of the hour. This is not a comprehensive picture, as gauges from other networks report off the top of the hour too.

b. Restoration of missing values

The most frequently observed quality problem was that of missing values during nonprecipitating events. Nonprecipitating events are easily recognized as constant PC values before and after a period of missing values. During the conversion of PC to PP, strings of missing values were checked. If both PC values before and after the missing period were identical, the missing values were replaced with the same PC value, which resulted in a zero PP value. The missing period was not extended any longer than 24 h for fear of stuck gauges. If PC values are different, precipitation is assumed, and values are left as missing even if the difference is as small as 0.25 mm (0.01 in.). Then, observation times are classified into 15-min bins to assure that the derived PP is on the top of the hour. The output of this step is defined as “baseline PP” to distinguish it from real-time PP.
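The restoration rule above can be sketched as follows. This is a minimal illustration, assuming an hourly PC series in a Python list with `None` for missing values; the function name `restore_dry_gaps` and the treatment of gaps at the very start or end of the series (left untouched) are assumptions made for the example.

```python
from typing import List, Optional


def restore_dry_gaps(pc: List[Optional[float]],
                     max_gap_hours: int = 24) -> List[Optional[float]]:
    """Fill runs of missing PC values that are bounded by identical PC values.

    A gap is filled only if the PC values immediately before and after it are
    equal (no precipitation) and the gap is no longer than max_gap_hours.
    Gaps bounded by different PC values are left as missing.
    """
    filled = list(pc)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first non-missing value after the gap
        if start == 0 or end == n:
            continue  # gap touches the edge of the series: nothing to compare
        before, after = filled[start - 1], filled[end]
        if before == after and (end - start) <= max_gap_hours:
            for k in range(start, end):
                filled[k] = before
    return filled


series = [5.0, None, None, 5.0, None, 5.25]
print(restore_dry_gaps(series))  # [5.0, 5.0, 5.0, 5.0, None, 5.25]
```

Differencing the restored series then yields zero PP for the filled hours, while the gap bounded by 5.0 and 5.25 stays missing because precipitation is assumed to have fallen at an unknown time within it.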

c. Spikes and noise control

Spikes and noise are nonphysical events. They have many causes, but the two most common are a lack of system maintenance and exposure to a severe environment. The DCP system includes gauge instruments as well as a datalogger and a transmitter. A malfunction of any or all of these components can cause errors of this kind. The HADS metadata do not include gauge type and system information; therefore, controlling spikes and noise requires detecting such errors in the time series of baseline PP. Such problems were detected by analyzing baseline PP values for regular patterns of negative and positive values of equal size at certain observation times. Then, nonnegativity constraints were imposed on the PP time series. The output of the spikes and noise control algorithm is the reprocessed hourly precipitation (repro PP). Figure 2 exemplifies noise in PC values during 20–26 May 2006 at the gauge station in Hungry Horse, Montana. With no rain from 0000 UTC 21 May through 0500 UTC 25 May 2006, the PC values should form a flat line, but they show wiggles instead. A straightforward derivation of PP values results in a sequence of many −0.01 and +0.01 values. Such noise has existed since the beginning of our archival record and covered the period October 1997–March 2008. Clusters of stations with noisy PC values were found in the northwestern and northeastern United States.
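One simple way to express the idea above in code is shown below. It is only a sketch under stated assumptions, not the algorithm used at NCDC: it treats two adjacent PP values of equal size and opposite sign as noise and zeroes both, then clamps any remaining negative value to zero as a stand-in for the nonnegativity constraint. The function name `suppress_noise` and the tolerance are hypothetical.

```python
from typing import List, Optional


def suppress_noise(pp: List[Optional[float]],
                   tol: float = 1e-9) -> List[Optional[float]]:
    """Zero out adjacent +x/-x pairs, then enforce nonnegativity.

    Assumption: a pair of consecutive PP values that cancel each other
    (for example -0.01 followed by +0.01) is a nonphysical wiggle in PC.
    Remaining negative values are clamped to zero (also an assumption).
    """
    out = list(pp)
    i = 0
    while i < len(out) - 1:
        a, b = out[i], out[i + 1]
        if a is not None and b is not None and a != 0 and abs(a + b) <= tol:
            out[i] = 0.0
            out[i + 1] = 0.0
            i += 2  # both values of the pair are consumed
        else:
            i += 1
    return [None if v is None else max(v, 0.0) for v in out]


baseline_pp = [0.0, -0.01, 0.01, 0.01, -0.01, 0.3, -0.05]
print(suppress_noise(baseline_pp))
```

A production algorithm would need additional safeguards, because a genuine 0.01 accumulation followed by a noise dip of the same size looks identical to this pattern.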

In summary, daily reprocessing steps involve the following:

  • decoding of the SHEF-format PC variable in full frequency, and the creation of metadata;

  • generation of the top-of-the-hour baseline PP with recovery of some missing values; and

  • generation of the repro PP by controlling some spikes and noise in the baseline PP.

On the first day of the month, the previous month’s HADS data are reprocessed to update the monthly metadata and compute each station’s monthly quality flag.

4. Assessment of reprocessed HADS

a. Comparison with real-time PP values

Real-time PP data generated by the HADS program were retrieved from the NCAR EOL site. The two metrics used for the comparison were the percentage of missing values and the percentage of top-of-the-hour measurements during each month of 2003–05. To manage the high volume of data, every seventh station from an alphabetical list of all stations in each of the 48 contiguous United States was subsampled for this assessment. Additionally, stations with more than 7 days of missing values in either repro PP or real-time PP were removed. The HADS program was unable to deliver SHEF-encoded historical HADS data to NCDC for the months of November 2003 and January 2004. December 2003 contained too many missing values in the real-time PP to allow for a fair comparison. Figure 3 shows the distribution of subsampled HADS stations during September 2005. The spatial inhomogeneity is not caused by the subsampling process, but by the network design.
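The subsampling and screening steps described above can be sketched as follows. This is a minimal illustration, assuming station identifiers grouped by state in a dictionary and hourly PP series with `None` for missing values. Starting the selection at the first station of each alphabetical list is an assumption; it has the side effect that every state with at least one station contributes at least one station. All names are hypothetical.

```python
from typing import Dict, List, Optional


def subsample_stations(stations_by_state: Dict[str, List[str]],
                       step: int = 7) -> List[str]:
    """Take every `step`-th station from each state's alphabetical list."""
    selected: List[str] = []
    for state in sorted(stations_by_state):
        ordered = sorted(stations_by_state[state])
        selected.extend(ordered[::step])  # first station is always kept
    return selected


def count_missing(series: List[Optional[float]]) -> int:
    """Number of missing hourly values in a series."""
    return sum(1 for v in series if v is None)


def fraction_missing(series: List[Optional[float]]) -> float:
    """Fraction of missing hourly values (one of the two quality metrics)."""
    return count_missing(series) / len(series)


def keep_station(repro_pp: List[Optional[float]],
                 realtime_pp: List[Optional[float]],
                 max_missing_hours: int = 168) -> bool:
    """Keep a station only if neither product exceeds 7 days of missing data."""
    return (count_missing(repro_pp) <= max_missing_hours
            and count_missing(realtime_pp) <= max_missing_hours)


stations = {"NC": [f"NC{i:02d}" for i in range(15)], "SC": ["SC01", "SC00"]}
print(subsample_stations(stations))  # ['NC00', 'NC07', 'NC14', 'SC00']
```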

b. Comparison with COOP daily precipitation

For a detailed comparison of repro PP and real-time PP, a regional domain (North and South Carolina) during the warm season (April–September) was selected. In this domain, both repro PP and real-time PP were compared with Cooperative Observer Network (COOP) daily precipitation data (NCDC 2003). Figure 4 shows the spatial distributions of HADS and COOP stations; the average nearest distance between HADS and COOP stations is about 11 km. Each time series of HADS hourly precipitation was summed over 24 h according to the COOP’s reported observation time (at the top of the hour). From this process, quality metrics were computed for two daily time series, HADS (repro PP and real-time PP) and COOP, for every month. Any HADS–COOP time series pair was removed if the ratio of the two was greater than 3 or less than ⅓, for fear of gross error in the COOP and/or HADS data. Out of 2408 pairs, 344 were removed by this gross error check. If a missing value was present in the daily COOP data, then the next-nearest COOP station data (within 50 km) were used. The difference in the monthly totals between repro PP and real-time PP was defined as the gain, and the HADS-to-COOP ratio of the monthly totals was referred to as the bias ratio, which is a commonly used measure in QPE. As these statistics are based on the monthly totals, we excluded the missing values from the calculation of the monthly accumulation. The gain, bias ratio, and percentage of missing values are the three quality metrics used in the detailed comparison.
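The monthly metrics defined above reduce to a few arithmetic operations. The sketch below assumes monthly series held in Python lists with `None` for missing values; the function names are hypothetical, and returning `None` for a zero COOP total is an assumption, since the text does not say how dry months were handled.

```python
from typing import List, Optional


def monthly_total(values: List[Optional[float]]) -> float:
    """Monthly accumulation with missing values excluded."""
    return sum(v for v in values if v is not None)


def gain(repro_total: float, realtime_total: float) -> float:
    """Gain: repro PP monthly total minus real-time PP monthly total."""
    return repro_total - realtime_total


def bias_ratio(hads_total: float, coop_total: float) -> Optional[float]:
    """HADS-to-COOP ratio of monthly totals; None if COOP total is zero."""
    if coop_total == 0:
        return None
    return hads_total / coop_total


def passes_gross_error_check(ratio: Optional[float]) -> bool:
    """Reject pairs whose ratio is greater than 3 or less than 1/3."""
    return ratio is not None and (1.0 / 3.0) <= ratio <= 3.0


coop = monthly_total([10.0, 0.0, 40.0, 50.0])
repro = monthly_total([9.0, 0.0, 38.0, 48.0])
realtime = monthly_total([9.0, None, 38.0, 43.0])
print(gain(repro, realtime), bias_ratio(repro, coop))  # 5.0 0.95
```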

c. Patterns of missing values and their implications

In general, rain gauge or electronics malfunctions at the time of measurement and/or during data transmission cause data to be unavailable at the specified observation time. On the other hand, the data provider deletes observed values that fail quality criteria at the processing level. The two causes must be differentiated so that the users are in control of correcting suspected data. We illustrate two examples: HADS station LLDN7 in July 2003 and MCKN7 in August 2003. The original data were reported at 15-min frequencies, so hourly data on the top of the hour are available. Table 1 shows 15-min decoded PC values, real-time PP values, and reprocessed PP values at LLDN7 on 1 July 2003. During the 5-h period, obvious measurement errors occurred. The reprocessed HADS data restored them rather than encoding them as missing values. Oftentimes, such gross errors help in diagnosing the duration of a disturbance. Table 2 is an example of station ROKN7, which was incorrectly set to default zero values instead of having the suspect data encoded as missing values. Even though the repro PP identified the pattern of spikes and corrected them, false zero PC values had appeared as early as October 2001. A history of station quality should help users determine observation validity and help refine the QC algorithm.

The occurrence of missing values is hard to characterize when a gauge instrument malfunctions, but we have observed that a higher frequency of missing values in real-time PP may be attributed to the latency of the data ingestion process to the processing environment at the HADS program office. The recovery of missing values is possible by reprocessing data from the original SHEF-formatted archive. We have analyzed the diurnal cycle of the precipitation frequency (e.g., Dai 1999), one of the hydroclimate variables, for three warm seasons in North and South Carolina. A full-blown analysis of the hydroclimate variables is beyond the scope of this paper.

5. Results

Direct comparisons between repro PP and real-time PP are shown in Fig. 5. The monthly average of the fractional missing values varies from 0.5% to 2% in repro PP, and from 1.7% to 10.1% in real-time PP. Except for January 2003, repro PP consistently reduced the missing values, by as much as 9.4% in October 2004. Overall, the average fraction of missing values in repro PP is about 1.0%, which is equivalent to seven missed observations in 1 month. The improvement in the fractional missing values from real-time PP to repro PP is possible only through reprocessing. The fraction of missing values in repro PP reflects the rate of unrecoverable missing values due to malfunctions of the gauge and in data transmission. The top-of-the-hour observations are also important when comparing with QPE data from other platforms such as radar. On average, the top-of-the-hour observations are available about 85% of the time in repro PP, while in real-time PP they are available about 50% of the time. The rate of off-the-top-of-the-hour observations is higher in real-time PP because the HADS program processes the latest available observations to support real-time hydrologic forecasting. The real-time focus means that the HADS data processing produces the hourly estimates as they become available. Thus, many non-top-of-the-hour data in real-time PP are transmitted to the users. The RFC, as a user, applies a narrow time window around the data: ±2 min on PP values and ±10 min on PC values from the top of the hour (J. Bradberry 2008, personal communication). Practically, half of the real-time PP data will be discarded in the retrospective production of MPE. An advantage of reanalysis is that many more top-of-the-hour values are available (Nelson et al. 2008).
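The RFC time window quoted above (±2 min for PP, ±10 min for PC) can be expressed as a small check on the observation time. This is an illustrative sketch only; the names `RFC_WINDOW_MIN`, `minutes_from_top_of_hour`, and `within_rfc_window` are hypothetical, and treating the window bounds as inclusive is an assumption.

```python
from datetime import datetime

# Window half-widths in minutes, as quoted in the text.
RFC_WINDOW_MIN = {"PP": 2.0, "PC": 10.0}


def minutes_from_top_of_hour(obs_time: datetime) -> float:
    """Distance in minutes from obs_time to the nearest top of the hour."""
    minutes = obs_time.minute + obs_time.second / 60.0
    return min(minutes, 60.0 - minutes)


def within_rfc_window(obs_time: datetime, variable: str) -> bool:
    """True if the observation falls inside the window for 'PP' or 'PC'."""
    return minutes_from_top_of_hour(obs_time) <= RFC_WINDOW_MIN[variable]


t = datetime(2005, 9, 1, 12, 5)
print(within_rfc_window(t, "PP"), within_rfc_window(t, "PC"))  # False True
```

Under such a window, an hourly PP stamped at 15 or 30 min past the hour is rejected outright, which is consistent with the statement that roughly half of the real-time PP data would be discarded.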

Figure 6 shows the bias ratio results for both repro PP and real-time PP for the warm seasons (April–September) of 2003–05 in the Carolinas. A bias ratio close to unity indicates close agreement with the COOP data in the monthly total. The median values of repro PP are closer to unity than those of the real-time PP. Figure 7 is the empirical probability function of the gain (repro PP − real-time PP) for all three warm seasons. The function is trimmed between −10 and +11 mm, with a 0.5-mm interval. The distribution shows a positive skewness; namely, repro PP recovered observation values that real-time PP missed. The mean value of the highest probability bin (0.0–0.5 mm) was 0.254 mm. The dashed line shows the fitted probability density function, whose peak is at 0.18 mm.
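An empirical probability function of the kind shown in Fig. 7 can be built by binning the gains and normalizing the counts. The sketch below assumes that "trimmed" means gains outside the range from −10 to +11 mm are discarded before normalization, which the text does not state explicitly; the function name `empirical_probability` is hypothetical.

```python
from typing import List


def empirical_probability(gains: List[float],
                          lo: float = -10.0,
                          hi: float = 11.0,
                          width: float = 0.5) -> List[float]:
    """Relative frequency of gains in bins of `width` mm over [lo, hi).

    Values outside [lo, hi) are dropped (assumed meaning of 'trimmed').
    Bin k covers [lo + k * width, lo + (k + 1) * width).
    """
    nbins = int(round((hi - lo) / width))
    counts = [0] * nbins
    kept = 0
    for g in gains:
        if lo <= g < hi:
            counts[int((g - lo) // width)] += 1
            kept += 1
    if kept == 0:
        return [0.0] * nbins
    return [c / kept for c in counts]


probs = empirical_probability([0.1, 0.3, 0.4, -0.2, 12.0])
print(len(probs), probs[20], probs[19])  # 42 0.75 0.25
```

With the default settings there are 42 bins, and bin 20 is the 0.0–0.5-mm class that the text identifies as the most probable one.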

Figure 8a shows the frequencies of the missing values in the daily cycle. The missing values in repro PP show a uniform distribution throughout the day over the 3-yr period, but those of the real-time PP increase at certain times of day. A disturbing feature of the real-time PP is the sharp increase in missing values during 1800–2300 UTC (1300–1800 local time) in 2004, when warm-season convective rains were active. Figure 8b shows a sharp drop in rain events in real-time PP relative to repro PP at 2100 UTC during 2004. Note that the increased number of missing values during 1800–2300 UTC causes a misinterpretation of the diurnal precipitation pattern. The secondary maximum in rain events during 1200–1500 UTC in 2004 is attributed to the remnants of Hurricanes Charley, Frances, Ivan, and Jeanne, which passed through the region in the month of September. The shifted pattern during 2005 was a result of the time reference error in real-time PP. The 4-h shift in real-time PP lasted from 1 July through 11 August 2005.
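A diurnal tally of this kind can be sketched in a few lines of Python. The rule that positive PP values count as rain events follows the Fig. 8 caption; the record layout (UTC hour, value) and the function name are assumptions made for illustration.

```python
import math


def diurnal_counts(records):
    """Count missing values and rain events for each UTC hour.

    records: iterable of (utc_hour, value) pairs, where value is None or NaN
    when the hourly PP is missing (hypothetical layout).
    Returns {utc_hour: (missing_count, rain_count)}.
    """
    counts = {}
    for hour, value in records:
        missing, rain = counts.get(hour, (0, 0))
        if value is None or math.isnan(value):
            missing += 1
        elif value > 0:  # positive PP values are counted as rain events
            rain += 1
        counts[hour] = (missing, rain)
    return dict(sorted(counts.items()))
```

Comparing the two counts hour by hour for real-time PP and repro PP makes it apparent when a dip in rain events coincides with a peak in missing values.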

The results in this section have potentially large implications for various applications and analyses. For example, the recovery of the missing values will provide a better dataset for studies of finescale climate signals such as the diurnal pattern of precipitation. Figure 8b shows that the recovery of the missing values can provide a dataset with a more representative diurnal pattern of precipitation. In addition, the recovery of the no-rain events from missing values has implications for direct comparisons of the hourly rain gauge measurements to other rainfall measurements such as those from radar and satellite. Finally, the identification of both the top-of-the-hour and off-the-top-of-the-hour values in the hourly precipitation data can have a significant impact in specific applications such as multisensor precipitation estimation and the modeling of hydrologic processes at fine scales.

6. Conclusions and future research recommendations

The retrospective reprocessing of HADS hourly precipitation data has reduced the average fraction of missing values from 5% in the real-time product to 1% during the assessment period 2003–05 in the conterminous U.S. (CONUS) domain. This is equivalent to a recovery of 29 h of missing values per month. The missing values in the reprocessed product are uniformly distributed across all hours of the day, while those in the real-time product displayed a diurnal pattern. In addition, the reprocessed product improved the availability of the top-of-the-hour observations from 50% in the real-time product to 85%. The improved availability of the top-of-the-hour observations significantly increases the value of the hourly precipitation data in finescale applications, for example, data fusion with other high-frequency QPE methods from radars and satellites. The reprocessed HADS data are expected to be used as an input source to the Climate Prediction Center’s extended-period gridded observations for the detection and diagnostics of precipitation variations and long-term changes (Higgins et al. 1996). Currently, reprocessed HADS hourly data are available from NCDC in a 1-day-delayed mode (see the appendix).
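The conversion between a fractional missing rate and hours per month can be checked with a short Python sketch. A 30-day month (720 h) is assumed here; the function name is hypothetical.

```python
def missing_hours_per_month(fraction_missing, days_in_month=30):
    """Hours of missing hourly values implied by a fractional missing rate."""
    return fraction_missing * days_in_month * 24


# Reduction from 5% (real time) to 1% (reprocessed), assuming a 30-day month.
recovered = missing_hours_per_month(0.05) - missing_hours_per_month(0.01)
print(round(recovered))  # about 29 h per month
```

Under the same assumption, the 1% residual rate corresponds to about 7 h per month.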

For future research, we offer the following recommendations:

  • Preservation of original data is absolutely required in order to diagnose quality problems. Original SHEF-formatted HADS data made it possible not only to improve the quality of the data, but also to determine the origins of quality problems in the hourly precipitation product.

  • A single repository of gauge quality information is necessary in order to improve the quality of the precipitation data. Many RFCs save manual gauge QC results for their service area but do not share them with other communities, and some network owners apply extra QC measures unknown to other users. The gauge quality Web page can serve as a common tool for both end users and network operators.

  • Gauge metadata must be completed in order to assess quality issues. The metadata must include not only geospatial information but also instrument type and maintenance records, in order to understand the history of the quality problems.

  • Reprocessing must use product and algorithm version control to allow well-documented transitions to newer techniques.

Acknowledgments

The authors thank Lawrence Cedrone and the entire NWS/OHD HADS Program staff who have always been responsive and corrected problematic HADS gauge reports. The authors acknowledge Arthur Fotos for programming support of the reprocessed HADS Data Web site. The authors thank Anne Markel, Tom Peterson, Ed Kearns, and Xuangang Yin of NCDC for their careful review and three anonymous reviewers for many suggestions.

REFERENCES

  • Dai, A., 1999: Recent changes in the diurnal cycle of precipitation over the United States. Geophys. Res. Lett., 26, 341–344.

  • Higgins, R. W., J. E. Janowiak, and Y.-P. Yao, 1996: A gridded hourly precipitation data base for the United States (1963–1993). NCEP/Climate Prediction Center ATLAS 1, 47 pp.

  • Kondragunta, C., and K. Shrestha, 2006: Automated real-time operational rain gauge quality controls in NWS hydrologic operations. Preprints, 20th Conf. on Hydrology, Atlanta, GA, Amer. Meteor. Soc., P2.4. [Available online at http://ams.confex.com/ams/pdfpapers/102834.pdf].

  • Kursinski, A. L., and S. L. Mullen, 2008: Spatiotemporal variability of hourly precipitation over the eastern contiguous United States from stage IV multisensor analyses. J. Hydrometeor., 9, 3–21.

  • Lin, Y., and K. E. Mitchell, 2005: The NCEP stage II/IV hourly precipitation analyses: Development and applications. Preprints, 19th Conf. on Hydrology, San Diego, CA, Amer. Meteor. Soc., 1.2. [Available online at http://ams.confex.com/ams/pdfpapers/83847.pdf].

  • NCDC, cited 2003: Data documentation for Data Set 3200 (DSI-3200). [Available online at http://www.ncdc.noaa.gov/oa/documentlibrary/].

  • Nelson, B., D. J. Seo, and D. Kim, 2008: Multi-sensor precipitation reanalysis. Preprints, Int. Symp. on Weather Radar and Hydrology, Grenoble, France, Laboratoire d’étude des Transferts en Hydrologie et Environnement (LTHE), 02-004, 150 pp. [Available online at http://www.wrah-2008.com/PDF/O2-004.pdf].

  • NWS, 2002: Standard hydrometeorological exchange format (SHEF) manual. National Weather Service Manual 10-944. [Available online at http://www.nws.noaa.gov/directives/].

  • Seo, D.-J., and J. Breidenbach, 2002: Real-time correction of spatially nonuniform bias in radar rainfall data using gauge measurements. J. Hydrometeor., 3, 93–111.

  • Tollerud, E., R. Collander, Y. Lin, and A. Loughe, 2005: On the performance, impact, and liabilities of automated precipitation gage screening algorithms. Preprints, 21st Conf. on Weather Analysis and Forecasting, Washington, DC, Amer. Meteor. Soc., P1.42. [Available online at http://ams.confex.com/ams/pdfpapers/95173.pdf].


APPENDIX

Reprocessed HADS Hourly Precipitation Web Site

For the first time since the inception of the HADS program, the hourly precipitation data in HADS have been reprocessed. Reprocessing HADS data has improved the data quality by recovering many missing values and by choosing top-of-the-hour observations when subhourly data were available. Currently, version 1.0 HADS-reprocessed PP products are available for further applications. There were extended periods of missing values when the retrieval of original-format HADS data from OHD’s storage system failed, for example, December 1996, January 1997, August 1997, January 1998, June 1998, May 1999, January–April 2000, July–September 2000, December 2000, January–September 2001, November 2003, and January 2004. As of January 2008, the initial version of the repro PP data has been populated on the Web so that users can assess the quality and download them (http://www.ncdc.noaa.gov/hads/). The first Web site page guides the user to enter the month/year and click on the desired U.S. state. On the next page, the user can choose the desired HADS station from a map or enter the five-letter station name, which leads to a time series page.

Time series page

The lower two panels on the Web page display the relative locations of neighboring HADS stations (lower-left panel) and the relative locations of neighboring daily COOP stations (lower-right panel) within a 1° × 1° box from the target HADS station. The user can view a neighboring station’s time series by clicking on its HADS location, where data can be viewed and/or downloaded.

Monthly statistics of HADS–COOP pair data are viewable by clicking “View Data” below the panel of neighboring COOP stations. The header displays the HADS station name, year, month, latitude, longitude, and number of collocated COOP stations. The 14 columns of each pair are described in Table A1.

Mass analysis page

An extensive user interface page can be found by clicking on the “Mass Analysis” link on the time series page. This page overlays accumulated precipitation with neighboring HADS stations using different colors for up to four stations. The effects of missing values (marked with black dots), variability of rain events as a function of distance and direction, and gross errors can be easily understood.

Storm period page

Users can examine storm periods by clicking the “Storm Period” link on the time series page and selecting the desired storm period by entering the start and end times. This page displays time series of target stations as well as storm totals for all available neighboring HADS stations within a 1° × 1° box.

The Web page is considered experimental until the station quality history and the rescue of missing values are completed. After that process has been completed, initial versions of the reprocessed HADS hourly precipitation data are available (and at higher quality than the real-time data).

Fig. 1.

A schematic diagram of HADS real-time and archival product flows. A real-time precipitation product begins at NWS/OHD and is delivered to end users at an RFC or WFO. This product is also stored at NCEP and NCAR for other applications. The HADS program pushes original-format HADS data to NCDC once a day, where it is then reprocessed. Some RFCs report manually edited precipitation data, and they are also archived at NCDC through the SRRS.

Citation: Weather and Forecasting 24, 5; 10.1175/2009WAF2222227.1

Fig. 2.

The time series of accumulated precipitation at the Hungry Horse, MT (HGHM8), gauge station during a 7-day period from 0000 UTC 20 May through 2100 UTC 26 May 2006. Apparent small perturbations make true rain events difficult to detect. Such noise has existed since the beginning of the archive (October 1997).

Fig. 3.

Distribution of subsampled HADS stations during September 2005. The subsampling was made from every seventh station selected from an alphabetical list of all stations in the CONUS. At least one station must be present in each state and stations with more than seven days (168 h) of missing values are deleted.

Fig. 4.

Distribution of all HADS stations available in NC and SC (solid dots) and COOP daily rain gauge stations (open circles).

Fig. 5.

Two quality metrics comparing repro PP (dark bars) and real-time PP (gray bars) during 2003–05 for the CONUS. (a) Fractional missing values (the smaller the better). (b) Percentage of top-of-the-hour observations (the larger the fraction is, the better the time representation).

Fig. 6.

Box plots of the bias ratio (monthly total precipitation comparing HADS to COOP) for the warm seasons for 2003–05. Median values of repro PP (in the dark color box) are closer to unity than those of real-time PP.

Fig. 7.

Empirical probability function of the gain (repro PP − real-time PP) for all three warm seasons. The function is trimmed between −10 and +11 mm with 0.5-mm class intervals. The distribution shows a skewness toward positive values; namely, repro PP recovered observation values that real-time PP missed. The mean value of the highest probability bin (0.0 to 0.5) was 0.254 mm. The dashed line shows the fitted probability density function with a peak value of 0.18 mm.

Fig. 8.

(a) Diurnal patterns of frequencies in missing values during warm seasons in the NC–SC domain. Solid circles connected with dashed lines are taken from real-time PP; open circles with solid lines are taken from repro PP. Real-time PP shows peaks of missing values at certain hours of the day, while repro PP reflects more of a uniform distribution in time. The peaks of missing values in 2004 are from May 2004. (b) As in (a) but for precipitation frequencies. Positive PP values are counted as rain events.

Table 1.

Decoded HADS data, real-time PP, and repro PP for station LLDN7 on 1 Jul 2003. The real-time PP withheld values of 11.92 and 21.60 for having failed the QC check. We denoted these values as NA.

Table 2.

Decoded HADS data, real-time PP, and repro PP for station ROKN7 on 7 Jun 2004. The real-time PP withheld the values of −0.71, but the values of 0.71 survived as legitimate.

Table A1. Description of columns used in the monthly statistics of HADS and COOP.
