## 1. Introduction

The objective of this study is to develop automated quality control (QC) tools for precipitation, based on the empirical statistical distributions underlying the observations. This paper explores threshold quantifying methods to identify a subset of data consisting of potential outliers in the precipitation observations with the aim of reducing the manual checking workload.

Previous studies have documented various QC tools for use with weather data (Wade 1987; Gandin 1988; Guttman and Quayle 1990; Reek et al. 1992; Meek and Hatfield 1994; Eischeid et al. 1995; Shafer et al. 2000; Hubbard 2001; Hubbard et al. 2005). As a result, there has been good progress in the automated QC of weather indices, especially the daily maximum/minimum air temperature. The difficulty of performing after-the-fact QC of precipitation has led the designers of the Climate Reference Network (CRN) to implement a rain gauge with multiple strain gauges in order to avoid a single point of failure in the measurement process. Radar data have also been used to examine the spatial extent of rainfall and the plausibility of large precipitation events associated with localized storms (Martinez et al. 2004). However, uncertainty in daily precipitation hampers the automation of QC of daily precipitation data.

Two types of error may occur in the QC of weather data. A type I error is the erroneous flagging of good data while a type II error is failing to flag bad data. The relative frequency of these two types of error is a good indicator for evaluating the performance of various QC methods. The purpose of research on QC methods is to produce optimal techniques that flag bad data while minimizing the frequency of type I and type II errors.
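The two error types can be counted directly when the locations of the bad values are known, as in a seeded error dataset. The following is a minimal sketch, not the authors' software; the function name `score_flags` and the index sets are hypothetical.

```python
def score_flags(seeded, flagged):
    """Summarize QC performance against a seeded-error dataset.

    seeded  -- indices of observations that carry a seeded error
    flagged -- indices of observations flagged by the QC test
    """
    seeded, flagged = set(seeded), set(flagged)
    hits = seeded & flagged        # bad data correctly flagged
    type_i = flagged - seeded      # good data flagged (type I errors)
    type_ii = seeded - flagged     # bad data not flagged (type II errors)
    return {
        "fraction_seeded_flagged": len(hits) / len(seeded) if seeded else 0.0,
        "type_i_fraction_of_flags": len(type_i) / len(flagged) if flagged else 0.0,
        "n_type_i": len(type_i),
        "n_type_ii": len(type_ii),
    }


# Hypothetical example: four seeded errors, three flags raised.
result = score_flags(seeded={3, 8, 15, 21}, flagged={8, 15, 30})
print(result)
```

The two fractions returned here correspond to the quantities used later to compare methods: the fraction of seeded errors flagged and the fraction of type I errors among all flags.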

Development of continuous and high-quality climate datasets is essential to populate Web-distributed databases (Hubbard et al. 2004) and to serve as input to Decision Support Systems (e.g., Westphal et al. 2003). Recent developments in QC include new methods to estimate missing data (Hubbard et al. 2005) and to place confidence limits on in situ temperature measurements of the daily maximum and minimum temperatures (You and Hubbard 2006). These studies on temperature have focused on methodology that allows the comparison of the performance of various quality control tests. Using this approach one is able to choose the method that provides the best performance, that is, flags the highest percent of known errant data values. At the same time one is able to estimate the number of type I errors (good data flagged as bad) that will be generated by each of the QC tests. The spatial regression test (SRT) was found to provide the best estimates for temperature at sites across the United States. The SRT method can also be employed to estimate confidence limits for daily maximum or minimum temperature data and to flag as potential outliers those values that do not fall within the confidence limits.

Unfortunately, the search for effective precipitation QC methods has proven more difficult. The SRT method is able to identify many of the errant data values, but the ratio of errant values found to type I errors made is conservatively 1:6. This is not acceptable because it would take excessive manpower to check all the flagged values that are generated in a nationwide network. For example, the cooperative network reports about 4000 precipitation observations on a typical day. With an error rate of 2% and the type I error rate above, several hundred values may be flagged, requiring substantial personnel resources for assessment.
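The workload implied by these figures can be made explicit with a rough calculation. It assumes, for illustration only, that every errant value is found and that each one is accompanied by six type I flags; neither assumption is stated in the text.

$$
N_{\mathrm{errant}} = 0.02 \times 4000 = 80, \qquad
N_{\mathrm{type\,I}} \approx 6 \times 80 = 480, \qquad
N_{\mathrm{flags}} \approx 80 + 480 = 560 .
$$

Under these assumptions several hundred values a day would need manual review, which is consistent with the estimate above.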

In this paper a new precipitation QC method, which consists of using a series of gamma distributions fit for each site, is introduced and its performance is compared to other widely used QC tests. The use of this gamma distribution in a quality assurance procedure for precipitation data is examined and the suitability of using the gamma distribution based on parameters estimated from historical data to represent the distribution of daily precipitation amount is determined. This paper also examines the use of the *Q* test for precipitation. The performance of the gamma distribution approach, *Q*-test approach, and the multiple interval gamma distribution approach is evaluated using a seeded error dataset (Hubbard et al. 2005).

## 2. Methods and data

Hubbard et al. (2005) evaluated several methods to determine their performance in identifying precipitation outliers. These methods are threshold, persistence, and spatial regression tests. In this case the threshold test is not arbitrary because thresholds are associated with the historical distribution of the values through the selected probability. The persistence test checks whether the variance of measurements falls between certain limits that are determined from the population of data taken when sensors are working properly. The SRT estimates the precipitation at the target station and the confidence intervals so that potential outliers can be flagged. More details of these approaches can be found in Hubbard et al. (2005). The performance of a threshold test using the gamma distribution (gamma distribution test) and of a test, the “*Q*-Test” (Kunkel et al. 2005), using a quality (*Q*) metric based on comparisons with neighboring stations is quantified in this paper. Also, a new precipitation QC method is introduced, which consists of a series of gamma distributions fit for each site. These methods will be described below.

### a. Estimation of parameters for distribution of precipitation

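The gamma distribution *G*(*γ*, *β*) is used to represent the distribution of precipitation events. The shape and scale parameters *γ* and *β* can be estimated from the precipitation events following Johnson et al. (1994, 1995) and Evans et al. (2000):

$$\gamma = \left(\frac{\bar{X}}{s}\right)^{2}, \tag{1}$$

$$\beta = \frac{s^{2}}{\bar{X}}, \tag{2}$$

where $\bar{X}$ and *s* are the sample mean and the sample standard deviation, respectively.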
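As a minimal sketch of this estimation step, assuming the estimators are the usual method-of-moments relations (shape = (mean/sd)², scale = sd²/mean), the parameters can be computed from the nonzero events as follows. The function name and the sample values are hypothetical, and the amounts are in arbitrary units.

```python
import statistics


def fit_gamma_moments(events):
    """Return (shape, scale) estimated from nonzero precipitation events."""
    wet = [x for x in events if x > 0]   # left-censor: zero values are excluded
    mean = statistics.fmean(wet)         # sample mean
    sd = statistics.stdev(wet)           # sample standard deviation
    shape = (mean / sd) ** 2             # assumed method-of-moments shape
    scale = sd ** 2 / mean               # assumed method-of-moments scale
    return shape, scale


# Hypothetical daily amounts, including dry days.
events = [0.0, 2.5, 0.0, 10.2, 1.3, 0.0, 25.4, 5.1, 0.8, 14.0]
shape, scale = fit_gamma_moments(events)
print(f"shape={shape:.3f}, scale={scale:.3f}")
```

With these estimators the fitted distribution reproduces the sample mean (shape × scale) and the sample variance (shape × scale²) of the nonzero events by construction.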

### b. Threshold approach for QC of precipitation using gamma distribution

The data for each station in the gamma distribution test include all precipitation events on a daily basis for a year. The parameters for left-censored (0 values excluded) gamma distributions, on a monthly basis, are also calculated, based on the precipitation events for individual months in the historical record. To ascertain the representativeness of the gamma distribution, the precipitation values for the corresponding percentiles *P*: 99%, 99.9%, 99.99%, and 99.999% were computed from the gamma distribution and compared with the precipitation values for given percentiles based on ranking (original data).

The threshold test for an observation is

$$x(j, t) < I(p), \tag{3}$$

where *x*(*j*, *t*) is the observed daily precipitation on day *t* at station *j* and *I*(*p*) is the threshold daily precipitation for a given probability *p* (= *P*/100), calculated using the gamma distribution. A value not meeting this criterion is considered an outlier (the shaded area to the right of the *p* = 0.995 value for the distribution for all precipitation events in Fig. 1). The test is one sided because precipitation is a nonnegative variable.
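The threshold *I*(*p*) is the *p* quantile of the fitted gamma distribution. The sketch below computes it with the standard library only: the gamma CDF is evaluated with the usual power series for the regularized incomplete gamma function and inverted by bisection. The helper names and the parameter values are hypothetical, and a statistics package would normally provide the quantile function directly.

```python
import math


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = total = 1.0 / shape
    n = 1
    while term > 1e-16 * total and n < 100000:
        term *= z / (shape + n)
        total += term
        n += 1
    if not math.isfinite(total):
        return 1.0  # series overflowed: x lies far out in the upper tail
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def gamma_quantile(p, shape, scale):
    """Invert the CDF by bisection to obtain the threshold I(p), 0 < p < 1."""
    lo, hi = 0.0, (shape + 1.0) * scale      # start just above the mean
    while gamma_cdf(hi, shape, scale) < p:   # expand until I(p) is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if gamma_cdf(mid, shape, scale) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def passes_threshold(x, p, shape, scale):
    """One-sided test: the observation passes when x < I(p)."""
    return x < gamma_quantile(p, shape, scale)


# Hypothetical fitted parameters; thresholds for the percentiles in the text.
shape, scale = 0.8, 9.0
for P in (99.0, 99.9, 99.99, 99.999):
    print(P, round(gamma_quantile(P / 100.0, shape, scale), 2))
print(passes_threshold(30.0, 0.99, shape, scale))
```

The same quantile function can be used to compare the fitted percentiles with rank-based percentiles from the original data, as described for the representativeness check.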

### c. Q test for precipitation

Kunkel et al. (2005) defined a quality metric, denoted as *Q* and based on comparisons with neighboring stations, which was used to identify outliers in a historical precipitation dataset. For each station, all nonzero daily precipitation values were ranked in ascending order. The values exceeding the 95th percentile threshold were then selected for further testing to identify those values that were most likely to be invalid.

For each selected value *x*(*j*, *t*), two measures of the quality metric *Q* were calculated. The first measure [Eq. (4)] incorporated the actual daily precipitation amounts, where $Q_{\mathrm{amt}}(j, k, t)$ is the *Q* measure using the precipitation amount *x*(*j*, *t*) for day *t* and the precipitation amount *x*(*k*, *t*) at the neighboring station *k* on day *t*. The second measure [Eq. (5)] was calculated from a daily percentile rank, where $Q_{\mathrm{per}}(j, k, t)$ is the *Q* measure using precipitation percentiles. Here, $R_j(t, m)$ and $R_k(t, m)$ are the percentile ranks for month *m* of the daily precipitation on day *t* for the station *j* being evaluated and the nearest neighbor *k*, respectively. The monthly percentiles were obtained by ranking all historical nonzero daily precipitation values for the month. Low values of *Q* indicate large differences between the target and neighboring stations and suggest the possibility of data errors.

The *Q* estimates in Eqs. (4) and (5) were calculated for up to 50 neighboring stations. The final value $Q_j$ for *x*(*j*, *t*) is the maximum individual *Q* value among the set of $Q_{\mathrm{amt}}$ and $Q_{\mathrm{per}}$. The key aspect of the procedure is that a high *Q* value will be calculated if any single nearest neighbor station has a precipitation value that is seasonably high. Values with very low *Q* only occur when no nearby station has a high precipitation value. Kunkel et al. (2005) indicated that this procedure was effective in selecting invalid values and maximizing the use of personnel resources for manual assessment. This test is structured to minimize type I errors, but not necessarily type II errors. The measurement is identified for further manual review when *Q* is less than a threshold value, for example, 0.50 as suggested by Kunkel et al. (2005).
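The final decision step can be sketched as follows. The individual *Q* measures for each neighbor are taken as given inputs here, because their formulas are defined in Kunkel et al. (2005) and are not reproduced in this sketch; the function names and the sample values are hypothetical.

```python
def final_q(q_amt, q_per):
    """Final Q for a target value: the maximum over all neighbor measures.

    q_amt -- amount-based Q measures, one per neighboring station
    q_per -- percentile-based Q measures, one per neighboring station
    """
    values = list(q_amt) + list(q_per)
    if not values:
        raise ValueError("at least one neighbor measure is required")
    return max(values)


def needs_manual_review(q_amt, q_per, q_threshold=0.50):
    """Flag the measurement when the final Q is below the threshold."""
    return final_q(q_amt, q_per) < q_threshold


# Hypothetical Q measures for three neighboring stations.
q_amt = [0.10, 0.35, 0.20]
q_per = [0.15, 0.62, 0.30]
print(final_q(q_amt, q_per), needs_manual_review(q_amt, q_per))
```

Taking the maximum means that a single supporting neighbor is enough to clear a value, which is why the test favors few type I errors over few type II errors.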

### d. New multiple interval range limit gamma distribution test for precipitation

Analysis has shown that precipitation data at a station can be fit to a gamma distribution, which can then be applied to a threshold test approach as described in section 2b. With this method only the most extreme precipitation events will be flagged as potential outliers so errant data at other points in the distribution are missed.

The multiple intervals gamma distribution (MIGD) method was developed to address these other points along the distribution. It assumes that meteorological conditions that produce a certain range in average precipitation at surrounding stations will produce a predictable range at the target station. Our concept is to develop a family of gamma distributions for the station of interest and to selectively apply the distributions based on specific criteria. The average precipitation for each day is calculated for neighboring stations during a time period (e.g., 30 yr). These values are ranked and placed into *n* bins with an equal number of values in each. The range for the *n* intervals can be obtained from the cumulative probabilities of the neighboring average time series, 0, 1/*n*, 2/*n*, . . . , (*n* − 1)/*n*, 1. For the *i*th interval all corresponding precipitation values at the station of interest are gathered and a gamma distribution is fit to them. This process is repeated for each of the *n* intervals, resulting in a family of gamma curves ($G_i$). The operational QC involves the application of the threshold test where the gamma distribution for a given day is selected from the family of curves based on the average precipitation for the neighboring stations. Each interval can be defined as $(\bar{X}[p(i/n)],\ \bar{X}[p((i+1)/n)]]$, where *p*(*i*/*n*) is the cumulative probability associated with *i*/*n*, *i* = 0 to *n* − 1, and $\bar{X}[p(i/n)]$ is the neighboring stations' average for a given cumulative probability. Figure 1 shows an example of the resulting distribution at the target station for the *i*th interval.

For a given precipitation value *x* at the station of interest, the neighboring stations' average is calculated. If the average precipitation falls in the interval $(\bar{X}[p(i/n)],\ \bar{X}[p((i+1)/n)]]$, then $G_i$ is used to form a test:

$$G_i(1 - p) < x < G_i(p), \tag{6}$$

where *p* is a probability in the range (0.5, 1), and $G_i(p)$ is the precipitation value for the given probability *p* in the gamma distribution associated with the *i*th interval. This equation forms a two-sided test, as illustrated in Fig. 1. Any value that does not satisfy this test will be treated as an outlier for further manual checking. The intervals and the estimation of this method were implemented using the R statistical software (Ihaka and Gentleman 1996).
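The procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' R implementation: each interval's gamma distribution is fit by the method of moments to the nonzero target values, ties and zero neighbor averages are not treated specially, and the two-sided test is applied on the probability scale (1 − *p* < *F*(*x*) < *p*), which is equivalent to comparing a nonzero *x* with the two quantiles. All names and the synthetic data are hypothetical.

```python
import bisect
import math
import random
import statistics


def gamma_cdf(x, shape, scale):
    """Gamma CDF via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = total = 1.0 / shape
    n = 1
    while term > 1e-16 * total and n < 100000:
        term *= z / (shape + n)
        total += term
        n += 1
    if not math.isfinite(total):
        return 1.0  # series overflowed: x lies far out in the upper tail
    return total * math.exp(shape * math.log(z) - z - math.lgamma(shape))


def fit_migd(neighbor_avg, target, n_bins):
    """Fit one gamma distribution per equal-count bin of the neighbor average.

    Returns (edges, params): the n_bins - 1 inner bin edges and one
    (shape, scale) pair per bin. Each bin needs at least two distinct
    nonzero target values, otherwise the moment fit fails.
    """
    order = sorted(range(len(neighbor_avg)), key=lambda d: neighbor_avg[d])
    size = len(order) / n_bins
    edges, params = [], []
    for i in range(n_bins):
        days = order[round(i * size):round((i + 1) * size)]
        wet = [target[d] for d in days if target[d] > 0]
        mean, sd = statistics.fmean(wet), statistics.stdev(wet)
        params.append(((mean / sd) ** 2, sd ** 2 / mean))
        if i < n_bins - 1:
            edges.append(neighbor_avg[days[-1]])  # inclusive upper edge of bin i
    return edges, params


def migd_passes(x, neighbor_mean, edges, params, p=0.99):
    """Two-sided test for a nonzero x using the bin's gamma distribution."""
    i = bisect.bisect_left(edges, neighbor_mean)  # bin (edge[i-1], edge[i]]
    shape, scale = params[i]
    return 1.0 - p < gamma_cdf(x, shape, scale) < p


# Synthetic, hypothetical data: target amounts correlated with the neighbors.
random.seed(1)
neighbor_avg = [random.gammavariate(0.8, 8.0) for _ in range(2000)]
target = [a * random.gammavariate(4.0, 0.25) for a in neighbor_avg]
edges, params = fit_migd(neighbor_avg, target, n_bins=8)
print(migd_passes(x=5.0, neighbor_mean=4.0, edges=edges, params=params))
print(migd_passes(x=80.0, neighbor_mean=0.5, edges=edges, params=params))
```

Because the distribution is chosen from the neighbors' average on the day being tested, a moderate amount reported on a day when the neighbors were nearly dry can be flagged even though it is unremarkable in the station's overall distribution.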

### e. Data

A dataset consisting of original data and seeded errors (Hubbard et al. 2005) was used in this study to evaluate the performance of the different QC approaches for precipitation. Analyses were carried out for six stations (see Table 1) that represent a range of climates. Detailed results for Crete, Nebraska, are presented to illustrate the complete comparison and sensitivity issues for the gamma distribution, *Q*-test, and MIGD methods. The percentiles of precipitation for a specific date for a given station were calculated from all recorded nonzero precipitation events at the station. The record length used in calculating parameters for the *Q* test exceeds 30 yr for each of the six stations.

A seeded error dataset was created and the performance of quality assurance software was evaluated with regard to the number of seeded errors that were flagged. The ratio of errors caught to the total number of seeds by each procedure can be compared across the range of error magnitudes introduced. The data used to create the seeded error dataset were retrieved from the National Climatic Data Center (NCDC) archives and the Applied Climatic Information System (ACIS), and except for a few differences in data values, the sets are identical. They are the data as reported for all the months in 1971–2000 by observers in the Cooperative Observer Program (National Weather Service 1987). The data have been assessed using NCDC procedures and are referred to as “clean” data. Note, however, that clean does not necessarily imply that the data are true values but means instead that the largest outliers have been removed.

To create each seeded error, a random number *f* was selected using a random number generator operating on a uniform distribution with a mean of zero and a range of ±3.5. Here, *f* also represents the standardized anomaly of the seeded error. This number was then multiplied by the standard deviation *s* of the variable in question to obtain the error magnitude $E_x$ for the randomly selected observation *x*:

$$E_x = s f. \tag{7}$$

A new random number *f* is generated when ($E_x$ + *x*) < 0 to avoid any negative seeded precipitation values because they are easily detected with a physical limit test and are not of interest in this study. Thus the seeded error values have a skewed distribution when *f* < 0 but are roughly uniformly distributed when *f* > 0. The selection of 3.5 is arbitrary but does serve to produce a large range of errors.
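The seeding rule can be sketched as a short rejection loop. This is a minimal illustration of the rule as described; the function name, the observation, and the standard deviation are hypothetical.

```python
import random


def seed_error(x, s, rng, f_range=3.5):
    """Return (seeded_value, f) for observation x with standard deviation s."""
    while True:
        f = rng.uniform(-f_range, f_range)  # standardized anomaly of the error
        error = s * f                       # error magnitude for this observation
        if x + error >= 0:                  # redraw when the seeded value is negative
            return x + error, f


# Hypothetical observation and standard deviation, seeded many times.
rng = random.Random(42)
seeded = [seed_error(x=2.0, s=6.0, rng=rng) for _ in range(1000)]
print(min(f for _, f in seeded), max(f for _, f in seeded))
```

The redraw truncates the negative side of the distribution of *f* at −*x*/*s*, which is the source of the skewness noted for *f* < 0.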

## 3. Results

As shown in Fig. 2a, the method using the original gamma distribution for precipitation identifies a low fraction of seeded errors when *p* is relatively high (i.e., fraction flags identified <0.05 for *p* > 0.99). This matches our expectations for the threshold checking of precipitation. A large fraction of the days with seeded errors will not be caught because the value of precipitation with a seeded error is, in many cases, not comparable to extreme values in the distribution. The value of 1 − *p* for a given fraction of seeded errors flagged varies widely among the different stations for high values of *p* (see left side of Fig. 2a) but converges as *p* decreases (see right side of Fig. 2a).

Figure 2b compares the gamma distribution results in Fig. 2a with the results from other tests obtained by Hubbard et al. (2005) for Tucson, Arizona. These other tests assume a normal distribution. Although the SRT method flags a high fraction of the seeded errors, the SRT is not efficient in evaluating the seeded error dataset because the fraction of flagged values becomes asymptotic as the *P* (=100*p*) value increases. Both the threshold and persistence tests generally flag fewer values than the gamma distribution. Thus, Fig. 2b demonstrates that the gamma distribution method is better than the threshold, persistence, and SRT methods in terms of the identified fraction of seeded errors changing with threshold percentile. In the next comparisons only the gamma distribution method and a percentile of 99% (labeled Gamma.99) are shown.

The *Q* test is also applied to identify the seeded errors for different thresholds (see Fig. 3). Only a few seeded errors (always <5%) were identified for each of the six stations because the *Q* test is relevant only to the extreme values of precipitation, in this case greater than the 95th percentile of historical precipitation. Those modified precipitation values lower than the 95th percentile value automatically pass the test. The *Q* test avoids improperly placing flags for high precipitation events when the surrounding stations also have large values of precipitation. However, these results suggest that the *Q* test is only efficient as a general test for discriminating extremes from outliers.

In this study, the MIGD method and all other methods for all six stations used in Hubbard et al. (2005) are presented, but detailed results only at Crete are compared in Figs. 4, 5, 6 and 7. Table 1 provides summary results obtained for all six stations.

Figure 4 presents the fraction of seeded errors flagged and the fraction of type I errors out of all flags for the MIGD, as a function of the number of intervals used or equivalently the number of gamma distributions fit for the site, where the percentile was taken as 99%. The fraction of the seeded errors flagged increases with increasing number of intervals in the MIGD method. The type I error fraction when 8 intervals were employed was the lowest. However, the 10- and 8-interval MIGDs were nearly the same with respect to both the fraction of seeded errors flagged and the fraction of type I errors. Figure 5a shows the fraction of seeded errors flagged as compared to the size of the seeded error (expressed in standardized anomalies and denoted by *f*; see Hubbard et al. 2005) for different intervals at the percentile of 99%. At Crete, the flagged fraction was similar for 5, 6, 8, and 10 intervals. Considering both the type I error fraction and the fraction of identified seeded errors shown in Fig. 4, the use of 8 or 10 intervals is recommended. In Figs. 5b,c and Fig. 7 only the MIGD method with 8 intervals is presented.

Figure 5b shows a comparison of the fraction of the seeded errors flagged for MIGD at percentiles of 99.0%, 99.5%, and 99.9%; the SRT, threshold, and persistence methods (Hubbard et al. 2005); the *Q* test (Kunkel et al. 2005); and the gamma distribution method at percentile of 99%. Overall, the MIGD method recognizes 30%–50% of the seeded errors. This is far better than any of the other tests included here. The threshold, persistence, and *Q* test recognize less than 10% of the errors regardless of the size of error. Gamma (99%) does a little better but only for the largest errors. The SRT does rather well but (not shown here) generates about four type I errors for every errant value correctly identified. This would flag nearly 10% of the data in operational use and because of manpower limitations would be unacceptable. The new multiple-gamma approach works well, finding 70% or more of the errors in the range *f* > 1.5 at Crete and more than 50% of the errors for the same range at Tucson (Table 1). It appears that the effectiveness of the method depends on the precipitation climatology, and thus definite geographical patterns for the country are expected. The number of type I errors produced by the MIGD approach, at say the 99% confidence level, is approximately one for every four errant values correctly identified (see Fig. 5c).

Figure 6 presents the percentiles for actual precipitation (based on rank) and estimated percentile based on the gamma distribution with the estimated parameters using Eqs. (1) and (2) for each interval of the 10-interval MIGD. The estimated precipitation values for the given percentiles fit well with the precipitation observations at the given percentiles. This good fit suggests that the MIGD method is well grounded.

Table 1 lists the number of seeded errors, flagged errors, type I errors, fraction of identified seeded errors, and the equivalent number of observations requiring validation (ORV) for all six stations. The MIGD method works better for stations in the Great Plains such as Crete and the Dickinson Experiment Station, North Dakota, than for stations in the subtropical climate of Florida, such as Key West International Airport and Fort Myers Page Field. One possible explanation is that the precipitation events in the Great Plains possess stronger spatial correlation than those in Florida or that the spatial structure of surrounding stations in Nebraska is more symmetrical than that in Florida. The quadrants wherein the surrounding stations exist in Florida are restricted by the location of the landmass and surrounding water. Convection over the land may be as well organized in Florida as in Nebraska or North Dakota, but in many directions there is only water and no stations. Complex terrain may also lead to difficulties in QC of precipitation at the arid site (Tucson) and the mountainous station (Lake Yellowstone, Wyoming). However, the MIGD method is superior to other methods at these sites considering the fraction of seeded errors identified and the type I errors. For example, for a percentile level of 90%, the MIGD is better than the SRT method on both counts at all six stations.

The ideal QC method will minimize the time required for manual validation. Table 1 also includes the equivalent daily ORV for different methods for the six stations. For illustrative purposes, it is assumed that a state has 400 stations and the number of flagged data entries for the 30 yr of data can be used to calculate the average daily workload based on the equivalent daily observation time series assumed to have roughly 2% bad data like the seeded error dataset. The MIGD method at *P* = 90% is better than the SRT method in both the higher fraction of seeded errors flagged and the lower ORV. If 95% is used for the MIGD method, the fraction of seeded errors flagged is similar to the SRT method but with much lower ORV (less than 10 stations a day for any of the six sites).
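One way to read the ORV calculation is as a scaling of the flag rate at a single station to a network of 400 stations. The sketch below follows that reading, which is an assumption rather than the authors' stated formula; the function name and the flag count are hypothetical.

```python
def daily_orv(n_flags, n_days, n_stations=400):
    """Average number of flagged observations per day across a network.

    n_flags    -- flags raised at one station over the whole record
    n_days     -- number of days in that record
    n_stations -- assumed number of stations in the state network
    """
    flags_per_station_day = n_flags / n_days
    return flags_per_station_day * n_stations


# Hypothetical count: 219 flags in a 30-yr record (leap days ignored).
n_days = 30 * 365
print(daily_orv(n_flags=219, n_days=n_days))
```

In this reading a method that flags 2% of a station's daily values would generate about eight observations a day for manual validation in a 400-station network.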

Figure 7 plots the equivalent daily ORV using the assumption that the average number of flags per station would be the same as at Crete. The number of flagged seeded errors, type I errors, and the number of all flags are also plotted. The MIGD method has substantially fewer type I errors and a smaller number of ORV when the *P* of MIGD takes a value of 95% or higher. At Crete, the use of *P* = 95% identifies a high fraction of seeded errors with a relatively small fraction of type I errors. When *P* for MIGD is 95% or lower, the fraction of type I errors will increase considerably. However, as listed in Table 1, these values vary among stations. Although the sites used in the study represent a cross section of climates, more research is required to assess by region or state the optimum probability levels that should be used in the application of the MIGD method.

Figure 8 shows the percentiles of precipitation calculated using gamma distributions fit to records of different lengths for Crete. Comparison of the calculated percentiles with the percentiles obtained from the observations demonstrates that the gamma distribution can be fitted using short time series, since most of the calculated percentiles are close to the observed percentiles. Two percentile curves fall farther from the main group than the others: the 5-yr curve (1991–95) and the 10-yr curve (1991–2000). We found that the mean and standard deviation of the time period 1991–95 are smaller than those of other periods. Although the time period 1996–2000 is relatively dry with low mean precipitation, the standard deviation is much higher than that of 1991–95; therefore, the percentile curve falls within the main group. The small mean and variability of the time period 1991–95 contribute to the small mean and variability of the time period 1991–2000. There is a deviation between the actual data curve and the fitted curves for *P* less than 0.002. For the MIGD method the implication is that a few more type I errors might be made. As can be seen from Fig. 7 the type I errors are not notably increased at the *P* = 0.001 level, and thus the divergence of the two curves (data and fitted) has little impact.

In the results section we suggest that the use of more intervals for the MIGD method (e.g., 8 or 10) is more efficient than using fewer intervals (e.g., 3) based on the long time series of 30 yr. When a station has a short time series, for example, 5 yr, which is common for the newly established stations or observation networks, the MIGD with fewer intervals is suggested because the number of events in each bin will be too small to use the 8- or 10-interval approach.

The question of how to test a zero value at the target station naturally arises. The zero precipitation was not included in forming the distributions. The zero precipitation was included in the threshold test with the condition that a zero was considered, by default, valid if at least one of the surrounding stations recorded a zero (the default case shown in Table 1). When we exclude the zero precipitation from consideration (see examples at Crete, nonzero precipitation in Table 2), the results are identical to those of the default case. A check of the data indicated that this was because, by chance, no seeded precipitation values of zero were introduced in these examples. When we include the zero precipitation in the test without considering whether the surrounding stations recorded a zero (or zeros), the MIGD method has exactly the same performance for threshold probabilities higher than 99% with or without zero precipitation testing (see examples at Crete, with zero precipitation in Table 2). For a threshold probability of 90%, the number of flagged zero precipitation values is relatively high, which suggests using a higher threshold probability (e.g., >97.5%) in the application of the MIGD method. The identified flags were also calculated for each subclass by the standardized anomalies. It was found (not shown) that the fraction of identified seeded errors increases with the standardized anomalies of the seeded errors for the MIGD and SRT methods. The MIGD method (8 intervals) has outstanding performance and identifies the seeded errors, expressed as standardized anomalies, higher than 0.5 for threshold probabilities higher than 90%.
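The default handling of zeros can be written as a simple pre-check applied before the threshold test. This is a minimal sketch of the rule as stated; the function name and the sample values are hypothetical.

```python
def zero_is_valid_by_default(target_value, neighbor_values):
    """Accept a zero at the target when any surrounding station also reports zero.

    Returns True when the zero is accepted without further testing and
    False when the value should go on to the threshold test.
    """
    return target_value == 0 and any(v == 0 for v in neighbor_values)


# Hypothetical reports from four surrounding stations.
print(zero_is_valid_by_default(0.0, [0.0, 3.2, 1.1, 0.4]))
print(zero_is_valid_by_default(0.0, [5.6, 3.2, 1.1, 0.4]))
```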

## 4. Discussion and conclusions

This study uses the gamma distribution to represent the distribution of precipitation events. For the precipitation regimes represented by the six stations used in this study, the results indicate that the gamma distribution is well suited for deriving more appropriate thresholds for a particular precipitation event. The calculated extreme values provide a good basis for identifying extreme outliers in the precipitation observations. The inclusion of all precipitation events reduces the data requirements for the quantification of extreme events, which generally requires a long time series of observations (e.g., using the Gumbel distribution). Using the approach based on the gamma distribution, a suitable representation of the distribution of precipitation can be obtained with only a few years of observation, as is the case with newly established automatic weather stations, for example, the Climate Reference Network. Further study is required for probability selection in the gamma distribution approach.

The *Q*-test approach serves as a tool to discriminate between extreme precipitation and outliers, and it has proven to minimize the manual examination of precipitation by choice of parameters that identify the most likely outliers (Kunkel et al. 2005). The performance of both the gamma distribution test and the *Q* test is relatively weak with respect to identifying the seeded errors. The *Q* test is different from the gamma distribution method because the *Q* test uses both the historical data and measurements from neighboring stations while the simple implementation of the gamma distribution method only uses the data from the station of interest.

The MIGD method is a more complex implementation of the gamma distribution that uses historical data and measurements from neighboring stations to partition a station’s precipitation values into separate populations. The MIGD method shows real promise and outperforms other QC methods for precipitation. This method identifies more seeded errors and creates fewer type I errors than the other methods. MIGD will be used as an operational tool in identifying the outliers for precipitation in ACIS. However, the fraction of errors identified by the MIGD method varies for different probabilities and among the different stations. Network operators, data managers, and scientists who plan to use MIGD to identify potential precipitation outliers can perform a similar analysis (sort the data into bins and derive the gamma distribution coefficients for each interval) over their geographic region to choose an optimum probability level.

## Acknowledgments

This work was partially supported by the Office of Global Programs, National Oceanic and Atmospheric Administration (NOAA) under Award NA16GP1498. Additional support was provided through NOAA Cooperative Agreement NA17RJ1222. Any opinions, findings, and conclusions are those of the authors and do not necessarily reflect the views of NOAA or the Illinois State Water Survey.

## REFERENCES

Eischeid, J. K., C. B. Baker, T. Karl, and H. F. Diaz, 1995: The quality control of long-term climatological data using objective data analysis. *J. Appl. Meteor.*, **34**, 2787–2795.

Evans, M., N. Hastings, and B. Peacock, 2000: *Statistical Distributions*. 3d ed. John Wiley and Sons, 221 pp.

Gandin, L. S., 1988: Complex quality control of meteorological observations. *Mon. Wea. Rev.*, **116**, 1137–1156.

Guttman, N. V., and R. G. Quayle, 1990: A review of cooperative temperature data validation. *J. Atmos. Oceanic Technol.*, **7**, 334–339.

Hubbard, K. G., 2001: Multiple station quality control procedures. Automated Weather Stations for Applications in Agriculture and Water Resources Management, AGM-3 WMO/TD No. 1074, 133–136.

Hubbard, K. G., A. T. DeGaetano, and K. D. Robbins, 2004: Announcing a Modern Applied Climatic Information System (ACIS). *Bull. Amer. Meteor. Soc.*, **85**, 811–812.

Hubbard, K. G., S. Goddard, W. D. Sorensen, N. Wells, and T. T. Osugi, 2005: Performance of quality assurance procedures for an applied climate information system. *J. Atmos. Oceanic Technol.*, **22**, 105–112.

Ihaka, R., and R. Gentleman, 1996: R: A language for data analysis and graphics. *J. Comput. Graphical Stat.*, **5**, 299–314.

Johnson, N. L., S. Kotz, and N. Balakrishnan, 1994: *Continuous Univariate Distributions*. 2d ed. Vol. 1, John Wiley and Sons, 784 pp.

Johnson, N. L., S. Kotz, and N. Balakrishnan, 1995: *Continuous Univariate Distributions*. 2d ed. Vol. 2, John Wiley and Sons, 752 pp.

Kunkel, K. E., D. R. Easterling, K. Hubbard, K. Redmond, K. Andsager, M. C. Kruk, and M. L. Spinar, 2005: Quality control of pre-1948 cooperative network observer data. *J. Atmos. Oceanic Technol.*, **22**, 1691–1705.

Martinez, J. E., C. A. Fiebrich, and M. A. Shafer, 2004: The value of a quality assurance meteorologist. Preprints, *14th Conf. on Applied Climatology*, Seattle, WA, Amer. Meteor. Soc., CD-ROM, 7.4.

Meek, D. W., and J. L. Hatfield, 1994: Data quality checking for single station meteorological databases. *Agric. For. Meteor.*, **69**, 85–109.

National Weather Service, 1987: Cooperative program management. Weather Service Operations Manual B-17 (revised), National Oceanic and Atmospheric Administration, Silver Spring, MD, 50 pp.

Reek, T., S. R. Doty, and T. W. Owen, 1992: A deterministic approach to the validation of historical daily temperature and precipitation data from the cooperative network. *Bull. Amer. Meteor. Soc.*, **73**, 753–765.

Shafer, M. A., C. A. Fiebrich, D. S. Arndt, S. E. Fredrickson, and T. W. Hughes, 2000: Quality assurance procedures in the Oklahoma mesonetwork. *J. Atmos. Oceanic Technol.*, **17**, 474–494.

Wade, C. G., 1987: A quality control program for surface mesometeorological data. *J. Atmos. Oceanic Technol.*, **4**, 435–453.

Westphal, K. S., R. M. Vogel, P. Kirshen, and S. C. Chapra, 2003: Decision support system for adaptive water supply management. *J. Water Resour. Plann. Manage.*, **129**, 165–177.

You, J., and K. G. Hubbard, 2006: Quality control of weather data during extreme events. *J. Atmos. Oceanic Technol.*, **23**, 184–197.

Table 1. Fraction of seeded errors flagged and type I errors fraction of all flags for six stations.

Table 2. Performance of methods for zero precipitation testing and nonzero precipitation testing at Crete.