Measuring the Performance of Data Validators

N. Guttman, C. Karl, T. Reek, and V. Shuler
National Climatic Data Center, Asheville, NC 28801-2696

The National Climatic Data Center is committed to archiving and disseminating data of high quality. Automated screening has proven very effective in isolating suspect and erroneous values in large meteorological data sets. However, manual review by validators is required to judge the validity of, and to correct, the data that are rejected by the screens. Since the judgment of the validators affects the quality of the data, the efficacy of their actions is of paramount importance.

Techniques have been developed to measure whether data validators make the proper decisions when editing data. Measurement is accomplished by replacing valid data with known errors (so-called “seeds”) and then monitoring the validators' decisions. Procedural details and examples are given.
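The paper's procedural details are not reproduced here, but the seeding step itself is easy to sketch. The following Python fragment is a hypothetical illustration, not the Center's operational code: the record layout, the error magnitude, and the function name are assumptions made for the example.

```python
import random

def seed_errors(records, n_seeds, rng=None):
    """Replace n_seeds randomly chosen valid values with known errors.

    Hypothetical layout: each record is a (station, date, value) tuple.
    The seeded value is offset far enough that the automated screens
    should flag it. Returns the seeded records plus a log mapping each
    seeded position to its true value, for later scoring.
    """
    rng = rng or random.Random()
    seeded = list(records)
    seed_log = {}
    for i in rng.sample(range(len(seeded)), n_seeds):
        station, date, value = seeded[i]
        seeded[i] = (station, date, value + 50.0)  # implausible offset (assumed)
        seed_log[i] = value                        # remember the true value
    return seeded, seed_log
```

The seeded file is then routed through the normal screening and validation stream, and the validators' decisions at the seeded positions are monitored.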

The measurement program has several benefits: (1) validator performance is quantitatively evaluated; (2) limited inferences about data quality can be made; (3) feedback to the validators identifies training requirements and operational procedures that could be improved; and (4) errors of omission as well as of commission are found. It is important to recognize that seeding does not detect errors inserted into the data by validators. Thus, seeding is but one aspect of a comprehensive surveillance mechanism.
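To show how such monitoring yields quantitative measures, a minimal scoring sketch follows, again with assumed names and record layout. In this scheme, a seed restored to its true value counts as a catch; a seed left standing or miscorrected is scored as an error of omission; a change to an unseeded value is scored as an error of commission. Counting commission errors here is possible only because the sketch holds both the seeded input and the validated output; in routine operation, as noted above, seeding alone cannot detect errors that validators insert into unseeded data.

```python
def score_validator(seeded, validated, seed_log):
    """Score a validator's output against the seed log (hypothetical scheme).

    seeded    -- records as given to the validator (seeds inserted)
    validated -- the same records after the validator's edits
    seed_log  -- position -> true value, as produced by seed_errors
    """
    catches = omissions = commissions = 0
    for i, (seed_rec, val_rec) in enumerate(zip(seeded, validated)):
        if i in seed_log:
            if val_rec[2] == seed_log[i]:
                catches += 1        # seed detected and properly corrected
            else:
                omissions += 1      # seed missed or miscorrected
        elif val_rec[2] != seed_rec[2]:
            commissions += 1        # valid, unseeded value was altered
    rate = catches / len(seed_log) if seed_log else 0.0
    return {"catches": catches, "omissions": omissions,
            "commissions": commissions, "detection_rate": rate}
```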
