Recovering “Lost” Data: An Adaptive, Least-Squares Outlier Detection Technique

Kimberly L. Elmore, National Center for Atmospheric Research, Boulder, Colorado

F. Wesley Wilson Jr., National Center for Atmospheric Research, Boulder, Colorado

Michael J. Carpenter, National Center for Atmospheric Research, Boulder, Colorado


Abstract

On occasion, digital data gathered during field projects suffer damage due to hardware problems. If no more than half the data are damaged and if the damaged data are randomly distributed in space or time, there is a high probability that the damage can be isolated and repaired using the algorithm described in this paper. During subsequent analysis, some data from the NCAR CP4 Doppler radar were found to be damaged and initially seemed to be lost. Later, the nature of the problem was found and a general algorithm was developed that identifies outliers, which can then be corrected. This algorithm uses the fact that the second derivative of the damaged data with respect to (in this case) radial distance is relatively small. The algorithm can be applied to any similar data. Such data can be closely approximated by a first-order least-squares regression line, provided the regression line is not applied over too long an interval. This algorithm is especially robust because the length of the regression fit is chosen adaptively from the residuals, such that the slope of the regression line approximates the first radial derivative. The outliers are then marked as candidates for correction, allowing data recovery. This method is not limited to radar data; it may be applied to any data with damage as outlined above. Examples of damaged and corrected data sets are shown, and the limitations of this method are discussed, as are general applications to other data.
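
The abstract gives the idea but not the procedure, so the following Python sketch is only an illustration of the general approach, not the authors' algorithm. It fits a first-order least-squares line to a short segment, drops the worst-fitting point and refits until every remaining residual is within a tolerance, and halves the segment length when more than half of its points end up flagged. The function names and the parameters `tol`, `max_len`, and `min_len` are hypothetical, as is the halving rule used to adapt the fit length.

```python
import numpy as np


def fit_and_flag(x, y, tol):
    """Fit a line; drop the worst point and refit until all residuals <= tol.

    Returns a boolean mask that is True for the dropped (flagged) points.
    """
    keep = np.ones(len(x), dtype=bool)
    while keep.sum() > 2:
        slope, intercept = np.polyfit(x[keep], y[keep], 1)
        resid = np.abs(y - (slope * x + intercept))
        # Only points still in the fit are candidates for removal.
        worst = np.argmax(np.where(keep, resid, -np.inf))
        if resid[worst] <= tol:
            break
        keep[worst] = False
    return ~keep


def flag_outliers(x, y, tol=1.0, max_len=16, min_len=4):
    """Flag outliers segment by segment (all defaults are assumed values)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    flags = np.zeros(len(x), dtype=bool)
    start = 0
    while start < len(x):
        length = min(max_len, len(x) - start)
        while True:
            seg = slice(start, start + length)
            bad = fit_and_flag(x[seg], y[seg], tol)
            # Accept the fit if at most half the segment is flagged,
            # or if the segment cannot be shortened any further.
            if 2 * bad.sum() <= length or length <= min_len:
                break
            length = max(min_len, length // 2)
        flags[seg] = bad
        start += length
    return flags


# Synthetic example: a linear trend with two corrupted samples.
x = np.arange(40.0)
y = 2.0 * x + 1.0
y[7] += 50.0
y[25] -= 30.0
flags = flag_outliers(x, y)
print(np.flatnonzero(flags))
```

Flagged points are only marked as candidates; how they are corrected depends on the nature of the damage, which this sketch does not address.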
