## 1. Introduction

The spatial regression test (SRT) method has been found to be superior to the inverse distance weighting (IDW) method (You et al. 2004, manuscript submitted to *J. Atmos. Oceanic Technol.*, hereafter YHG) when applied to provide estimates for the maximum air temperature (*T*_{max}) and the minimum air temperature (*T*_{min}) in the Applied Climate Information System (ACIS). However, the sensitivity of the performance of both methods to the input parameters has not been evaluated. This paper conducts a sensitivity analysis on the performance of the SRT and IDW methods of estimating missing data. We examine the effect of distance to the surrounding stations and the number of surrounding stations on overall performance.

Quality assurance (QA) procedures have been applied (Guttman and Quayle 1990) to check, (semi)automatically, the validity of weather data from the cooperative climatological stations in the National Climatic Data Center (NCDC). General testing approaches, such as the threshold method and the step change test, were designed for single-station review of data to detect potential outliers (Wade 1987; Meek and Hatfield 1994; Eischeid et al. 1995).

Recently, the use of multiple stations in quality assurance procedures has proven useful; for example, spatial tests compare a station’s data against data from neighboring stations (Wade 1987; Gandin 1988; Eischeid et al. 1995; Hubbard 2001a). Spatial tests use neighboring stations to estimate the measurement at the station of interest. The IDW technique weights the values at surrounding stations according to the inverse of the distance separating the locations (Guttman et al. 1988; Wade 1987), while other statistical approaches seek to provide an unbiased estimate [e.g., multiple regression (Eischeid et al. 1995, 2000) and the bivariate linear regression test (Hubbard et al. 2005)].

Unlike the IDW, the spatial regression test used herein (Hubbard et al. 2005) does not assign the largest weight to the nearest neighbor; instead, it assigns weights according to the root-mean-square error (rmse) between the station of interest and each neighboring station. Research has demonstrated the excellent performance of the spatial regression test in identifying seeded errors (Hubbard et al. 2005). In a separate study, the investigators used the spatial regression test to identify potential outliers during unique weather events. For hurricanes, cold front passages, floods, and droughts, QA failures were largely due to differences in the time of observation, coupled with the ambiguity associated with station position relative to tight gradients of temperature or precipitation.

The spatial regression approach was found superior to the inverse distance approach for the maximum air temperature (*T*_{max}) and the minimum air temperature (*T*_{min}) in a previous study (YHG), with the largest improvements in coastal and mountainous regions. Both methods performed relatively poorly where weather stations were sparsely distributed (YHG). The success of the spatial regression approach is due in part to its implicit ability to resolve the systematic differences caused by the temperature lapse rate with elevation, which the inverse distance weighting method does not account for.

The spatial regression and inverse distance weighting methods provide separate estimates of a station’s data based on surrounding stations. These estimates are critical to the process of identifying suspect data, which is undertaken here to ensure data quality in the ACIS (Hubbard et al. 2004). The estimates are also used to form a continuous dataset by filling in missing values. The spatial regression test has three parameters that the user may adjust to obtain estimates: the station list, the width of the time window, and the time offset of the window. The inverse distance weighting technique has one parameter, the number of stations used in the weighted estimate. In the previous study, five stations were used in the IDW method. The radius of inclusion for the SRT method was set to 50 km, except in areas where no station was located within 50 km; in such cases the radius of inclusion was increased to 150 km. This paper analyzes the sensitivity of the performance of both the spatial regression test and the inverse distance weighting method, for both *T*_{max} and *T*_{min}, to the distance to the reference stations and to the number of reference stations. The performance of each estimation approach was evaluated, and optimum parameters are suggested.

## 2. Data

The tests were carried out over four states, California (CA), Utah (UT), Nevada (NV), and Nebraska (NE), for the year 2000. These four states represent the diverse landforms and topographical features from coastal regions to mountainous regions to plains. The density of stations also varies significantly within these four states. Based on results from a previous study (YHG), we concluded that these states represent the range of conditions over which we wished to test the sensitivity of the methods. California was chosen because the coastal stations there were found to have relatively poor performance for both the SRT method and IDW method compared to the case of the plains stations (YHG). Nevada was selected for relatively poor performance in remote parts of the state due to sparsely distributed stations in mountainous regions. The performance of SRT and IDW was better in Utah, which was also selected for sensitivity analysis because it has mountainous regions, like Nevada, but generally higher station density. Nebraska was chosen as an example for the plains states, with adequate station density, except in the Sand Hills region in the north-central portion of the state.

The data from stations in the four states within the cooperative observer weather data network, a regional automated weather data network (Hubbard 2001b), and other networks were retrieved through the ACIS, a distributed data management system. This study covers estimation of the daily maximum (*T*_{max}) and minimum (*T*_{min}) air temperature for 2000.

## 3. Methods

### a. Spatial regression test

All stations (*M*) within a certain distance of the station of interest are selected, and a linear regression is performed for each station paired with the station of interest, using a time window centered on the datum of interest. For each surrounding station, a regression-based estimate (*x*_{i} = *a*_{i} + *b*_{i} *y*_{i}) is formed, where *y*_{i} is the value at the *i*th surrounding station and *a*_{i} and *b*_{i} are the intercept and slope of the corresponding regression. The weighted estimate (*x*′) is obtained by using the standard error of estimate (*s*_{i}), also known as the rmse, in the weighting process:

$$x'^{2} = \frac{\sum_{i=1}^{N} x_i^{2}\, s_i^{-2}}{\sum_{i=1}^{N} s_i^{-2}}, \tag{1}$$

where *N* is the number of stations used in the estimate (generally restricted to stations with an *R*^{2} greater than 0.5 within a given radius of inclusion, e.g., 50 km). Note that *N* is selected by the user and *N* ≤ *M*. Because Eq. (1) yields *x*′^{2}, care must be taken to preserve the correct sign of *x*′. To account for possible systematic time shifting of observations (this occurs when an observer consistently writes the observation down on the day before or after the actual date of observation), each surrounding station’s data are shifted by ±1 day and the regression is repeated. The time shift (−1, 0, or +1) that yields the lowest standard error of estimate is then used in Eqs. (1) and (2). The weighted standard error of estimate (*s*′) is calculated as

$$s'^{-2} = \frac{1}{N} \sum_{i=1}^{N} s_i^{-2}. \tag{2}$$

Confidence intervals are then formed from *s*′, and we test whether the station value (*x*) falls within them:

$$x' - f s' \le x \le x' + f s', \tag{3}$$

where *f* is a user-selected multiplier. If the relation in Eq. (3) holds, the datum passes the spatial regression test. Increasing *f* decreases the number of potential Type I errors but increases the number of potential Type II errors. Unlike in distance weighting techniques, we can sort the rmse values into ascending order and select the stations that compare most favorably with the station of interest; these may or may not be the closest stations. By repeating the process for different values of *N*, we can observe how the performance of the method changes with *N*.
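The regression, weighting, and confidence-interval steps described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the ACIS implementation: the function names (`fit_line`, `srt`, `passes_srt`) are ours, the ±1-day time-shift search and the *R*^{2} > 0.5 screening are omitted, the neighbor windows are assumed to be already aligned with the target window, the default `f = 3.0` is only illustrative, and the sign of *x*′ is assumed to follow the sign of the inverse-variance weighted mean of the *x*_{i}, which is one possible way to preserve it.

```python
import math
from typing import List, Sequence, Tuple


def fit_line(y: Sequence[float], x: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of x = a + b*y over the window.

    Returns intercept a, slope b, and the standard error of estimate s
    (residual rmse with n - 2 degrees of freedom).
    """
    n = len(x)
    my, mx = sum(y) / n, sum(x) / n
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((v - my) * (u - mx) for v, u in zip(y, x))
    b = sxy / syy
    a = mx - b * my
    sse = sum((u - (a + b * v)) ** 2 for v, u in zip(y, x))
    s = math.sqrt(sse / (n - 2))
    return a, b, max(s, 1e-9)  # floor avoids division by zero for a perfect fit


def srt(target_window: Sequence[float],
        neighbor_windows: Sequence[Sequence[float]],
        neighbor_today: Sequence[float],
        n_best: int) -> Tuple[float, float]:
    """Return (x', s') from the n_best neighbors with the smallest s_i."""
    fits: List[Tuple[float, float]] = []
    for y_win, y_today in zip(neighbor_windows, neighbor_today):
        a, b, s = fit_line(y_win, target_window)
        fits.append((s, a + b * y_today))  # (s_i, regression estimate x_i)
    fits.sort(key=lambda t: t[0])          # ascending rmse: best-fitting first
    best = fits[:n_best]
    inv_var = [s ** -2 for s, _ in best]
    total = sum(inv_var)
    # Eq. (1): x'^2 = sum(x_i^2 s_i^-2) / sum(s_i^-2)
    x_prime = math.sqrt(sum(w * xi ** 2 for w, (_, xi) in zip(inv_var, best)) / total)
    # Assumed sign rule: follow the inverse-variance weighted mean of the x_i
    x_prime = math.copysign(x_prime, sum(w * xi for w, (_, xi) in zip(inv_var, best)))
    # Eq. (2): s'^-2 = (1/N) sum(s_i^-2)
    s_prime = math.sqrt(len(best) / total)
    return x_prime, s_prime


def passes_srt(x: float, x_prime: float, s_prime: float, f: float = 3.0) -> bool:
    """Eq. (3): the datum passes if x' - f s' <= x <= x' + f s'."""
    return x_prime - f * s_prime <= x <= x_prime + f * s_prime
```

Sorting on *s*_{i} before truncating to `n_best` is what distinguishes this selection from distance weighting: the retained stations are the best-fitting ones, not necessarily the closest.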

The spatial regression test has parameter settings (window length, offset, and station list) that are specified for the quality assurance processes. Window length is the number of data pairs used to form a regression. Offset determines where the window is located relative to the datum to be checked. For example, in our analyses an offset of zero centers the window on the datum of interest, so aligning the datum of interest with the beginning of the window would be specified by an offset of −29 for a 60-day window. On the other hand, aligning the datum of interest with the end of the window would be accomplished by setting the offset to 30 for a 60-day window. The station list contains the stations used in estimating the data. It may be sorted or unsorted and may be obtained in different ways, for example, by taking the stations located within a certain distance of the station of interest or within a rectangular neighborhood centered on it. In this study, we determine the neighborhood by specifying the radius of inclusion; all stations (*M*) within the resulting circle are considered for the station list.
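A hypothetical helper for locating the regression window is sketched below. The indexing rule is an assumption chosen only to reproduce the 60-day examples in the text (offset 0 centers the window, −29 puts the datum at the start, +30 at the end); for odd window lengths or other conventions the arithmetic would need adjusting.

```python
from typing import Tuple


def window_bounds(t: int, length: int = 60, offset: int = 0) -> Tuple[int, int]:
    """Inclusive day indices (start, end) of the regression window for datum t.

    Assumed convention: offset 0 centers the window (for 60 days: 29 days
    before t and 30 after); negative offsets move the window later,
    positive offsets move it earlier.
    """
    start = t - (length // 2 - 1) - offset
    return start, start + length - 1
```

For day index 100, `window_bounds(100)` gives `(71, 130)`, `window_bounds(100, 60, -29)` gives `(100, 159)`, and `window_bounds(100, 60, 30)` gives `(41, 100)`.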

### b. Inverse distance weighting method

The IDW estimate is formed as

$$\hat{x} = \frac{\sum_{i=1}^{n} w_i\, x_i}{\sum_{i=1}^{n} w_i}, \qquad w_i = \frac{1}{d_i},$$

where *x̂* is the IDW estimate, *x*_{i} is the measurement at the *i*th surrounding station, and the weight *w*_{i} is the inverse of the distance *d*_{i} between the target station and the *i*th surrounding station. In this study, the number of stations (*n*) was varied to create different sets of estimates, and the performance of each set relative to the original measurements was determined. Many other distance-based weighting schemes have been applied, such as inverse square distance weighting and exponential distance weighting. In this paper we examine the simple IDW method.
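A minimal sketch of the simple IDW estimate from the *n* closest stations follows. `idw_estimate` is a hypothetical name, and each neighbor is assumed to come with a precomputed distance (km) greater than zero.

```python
from typing import Sequence, Tuple


def idw_estimate(neighbors: Sequence[Tuple[float, float]], n: int) -> float:
    """Simple IDW estimate from the n closest neighbors.

    neighbors: (distance_km, value) pairs; weights are w_i = 1 / d_i.
    """
    nearest = sorted(neighbors, key=lambda p: p[0])[:n]  # n closest stations
    weights = [1.0 / d for d, _ in nearest]              # assumes every d > 0
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
```

For example, neighbors at 10, 20, and 40 km reporting 50, 56, and 70°F give 52.0°F with *n* = 2 and about 54.57°F with *n* = 3: each added station shifts the estimate by an amount set only by its distance, regardless of how well it actually tracks the target station.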

### c. Evaluation of estimations

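Estimation performance is evaluated with the Nash–Sutcliffe coefficient (NSC; Nash and Sutcliffe 1970):

$$\mathrm{NSC} = 1 - \frac{\sum_{i=1}^{m} (x_i - \hat{x}_i)^{2}}{\sum_{i=1}^{m} (x_i - \bar{x})^{2}},$$

where *x*_{i} is the measured variable, *x̂*_{i} is the estimated variable, and *x̄* is the arithmetic mean of *x*_{i} for all events *i* = 1 to *m*.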

The calculation of NSC essentially sums the squared deviations of the estimates from the 1:1 line (slope of 1, intercept of 0), normalized by the variance of the observations. If the measured variable is estimated exactly for all observations, NSC equals 1. Low values of NSC indicate large deviations between measured and estimated values. If NSC is negative, the estimates are very poor, and the population average would provide better estimates.
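The NSC behavior described above can be checked with a short sketch (hypothetical helper `nsc`, plain Python, no missing-value handling). Estimating every value by the observed mean gives exactly zero, and anything worse than the mean goes negative.

```python
from typing import Sequence


def nsc(observed: Sequence[float], estimated: Sequence[float]) -> float:
    """Nash-Sutcliffe coefficient: 1 - SSE / (sum of squares about the observed mean)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst
```

With observations `[1, 2, 3]`, perfect estimates give 1.0, estimates of `[2, 2, 2]` (the mean) give 0.0, and reversed estimates `[3, 2, 1]` give −3.0.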

*R*^{2}, the explained variance, has been widely used as an index of agreement, so it is included here also. However, *R*^{2} is not always instructive and, as Willmott (1981) cautions, should not be used alone to assess the accuracy of estimates. In this paper, Willmott’s (1981) index of agreement *D* is also used to assess model performance; it is expressed as

$$D = 1 - \frac{\sum_{i=1}^{m} (\hat{x}_i - x_i)^{2}}{\sum_{i=1}^{m} \left( |\hat{x}_i - \bar{x}| + |x_i - \bar{x}| \right)^{2}}.$$

The *D* index and NSC are more sensitive to systematic model error than are *R* and *R*^{2} and, when coupled with the *R*^{2} statistic, reflect systematic model bias. Values of *D* range from 0.0 for complete disagreement to 1.0 for perfect agreement. Other measures of model performance included in this paper are the systematic (*E*_{s}) and nonsystematic (*E*_{u}) components of the rmse:

$$E_s = \left[ \frac{1}{m} \sum_{i=1}^{m} (P_{ri} - x_i)^{2} \right]^{1/2}, \qquad E_u = \left[ \frac{1}{m} \sum_{i=1}^{m} (\hat{x}_i - P_{ri})^{2} \right]^{1/2},$$

where *P*_{ri} is calculated from the slope (*b*) and intercept (*a*) of the least-squares regression of estimated *x̂* on observed *x* (such that *P*_{ri} = *a* + *b x*_{i}), so that rmse^{2} = *E*_{s}^{2} + *E*_{u}^{2}.
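The sketch below computes *D*, rmse, *E*_{s}, and *E*_{u} following Willmott (1981), using an ordinary least-squares regression of the estimates on the observations to obtain *P*_{ri}. The function name is hypothetical, and inputs are assumed complete and of equal length. Because the regression residuals are orthogonal to the fitted line, rmse^{2} = *E*_{s}^{2} + *E*_{u}^{2}, which the test checks.

```python
import math
from typing import Sequence, Tuple


def willmott_stats(obs: Sequence[float],
                   est: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (D, rmse, E_s, E_u) for observed values obs and estimates est."""
    m = len(obs)
    mo, me = sum(obs) / m, sum(est) / m
    # OLS regression of estimates on observations gives P_ri = a + b * x_i
    b = (sum((o - mo) * (e - me) for o, e in zip(obs, est))
         / sum((o - mo) ** 2 for o in obs))
    a = me - b * mo
    p = [a + b * o for o in obs]
    sse = sum((e - o) ** 2 for o, e in zip(obs, est))
    rmse = math.sqrt(sse / m)
    e_s = math.sqrt(sum((pr - o) ** 2 for pr, o in zip(p, obs)) / m)  # systematic
    e_u = math.sqrt(sum((e - pr) ** 2 for e, pr in zip(est, p)) / m)  # nonsystematic
    # Willmott's D: 1 - SSE / potential error, both taken about the observed mean
    pe = sum((abs(e - mo) + abs(o - mo)) ** 2 for o, e in zip(obs, est))
    return 1.0 - sse / pe, rmse, e_s, e_u
```

A constant bias (every estimate 2°F too high) puts all of the error in *E*_{s} (2.0) and leaves *E*_{u} at zero, which is the kind of systematic error the *D* index and NSC respond to.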

## 4. Results

The estimates obtained with the spatial regression test were evaluated against the measurements for stations within the four states (California, Nevada, Utah, and Nebraska) for the year 2000. The settings used for the sensitivity analysis of the spatial regression test are shown in Table 1. For example, in the radius analysis, the length of the time window and the time offset were set to 60 and 0, respectively, while the radius varied from 16 to 241 km. For each radius, all available stations within the circle were used in the analysis. For the analysis of the number of stations in the SRT method, the stations within 160 km of the station of interest were ranked by the rmse obtained from the regression between measurements at each reference station and measurements at the station of interest. For the IDW method, the stations were sorted by distance to the station of interest. We illustrate the findings with results from individual stations selected from different topographic regions: coastal, plains, mountains, and basins.
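Selecting the stations inside a radius of inclusion and ordering them could be sketched as follows. The paper does not say how distances were computed; the haversine great-circle formula with a 6371-km mean Earth radius is our assumption, as are the function names and the `{station_id: (lat, lon)}` input format. The result is ordered by distance, as used for the IDW method; for the SRT method the candidates would instead be re-ranked by regression rmse.

```python
import math
from typing import Dict, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed)


def great_circle_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def stations_within(target: Tuple[float, float],
                    candidates: Dict[str, Tuple[float, float]],
                    radius_km: float) -> List[Tuple[float, str]]:
    """Candidates inside the radius of inclusion as (distance_km, id), closest first."""
    hits = []
    for sid, (lat, lon) in candidates.items():
        d = great_circle_km(target[0], target[1], lat, lon)
        if d <= radius_km:
            hits.append((d, sid))
    return sorted(hits)
```

One degree of latitude is about 111.2 km under these assumptions, so a station 0.5° due north falls inside an 80-km radius while one 1° north does not.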

The SRT method was initially analyzed by including all stations within the radius of inclusion, which ranged from 16 to 241 km. In this case, the stations were not sorted by rmse, so the performance of the SRT is not optimal. One example is shown in Figs. 1a and 1b. Patterns vary somewhat from station to station; however, when the radius is greater than 80 km, *D*, NSC, *R*^{2}, *E*_{s}, *E*_{u}, and rmse for the SRT method are relatively stable or decline only slightly. When the radius is less than 80 km, the indices at some stations fluctuate considerably.

Examples of the sensitivity of the SRT method to window length (Figs. 1c,d) and to window offset (Figs. 1e,f) are also shown. In general, *D*, NSC, *R*^{2}, *E*_{s}, *E*_{u}, and rmse vary across window lengths from 20 to 150 days (the largest window length evaluated in this study). When more stations are examined, the estimates are relatively stable for window lengths larger than 60 days; for window lengths less than 60 days, the fluctuations of *D*, NSC, *R*^{2}, *E*_{s}, *E*_{u}, and rmse are larger. The performance of the SRT changes only slightly with the window offset. In general, estimates are best when the window is centered on the datum of interest (offset equal to zero). Figures 1e and 1f demonstrate that using a time shift of −1 is only slightly different from using a time shift of +1. We suggest using a window length of at least 60 days and a centered window (offset of zero) for best results.

This study also evaluated the performance of the SRT under a second implementation, in which the number of stations was varied (see Figs. 2 and 3). In this analysis, the reference stations were sorted by the rmse between the daily time series for 2000 at the station of interest and the daily data for 2000 at each reference station. When fewer than 10 stations were used in the estimation, the performance of the SRT method in some cases varied significantly with the number of stations (e.g., Fig. 2). All indices stabilized when 10 or more stations were used to estimate *T*_{max} or *T*_{min}. Figure 3 shows, for all stations in the four states, the proportion of stations (*P*) with a value of NSC, *R*^{2}, or rmse smaller than the value on the *x* axis. The distribution curves of these indices for estimates obtained using 10 stations are close to those obtained using more stations. The differences in rmse at the 90% probability level between 10 and 30 stations are less than 0.5°F for both *T*_{max} and *T*_{min}, which is within the expected observational error (±1°F). In this study, 10 stations are recommended for estimating *T*_{max} and *T*_{min} with the SRT method.
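The proportion curves in Fig. 3 and the comparison of rmse at the 90% probability level amount to an empirical cumulative distribution taken over stations. A hypothetical sketch is given below; the nearest-rank rule in `value_at_proportion` is only one of several percentile conventions, and any rmse values fed to it in an example are illustrative, not results from this study.

```python
import math
from typing import Sequence


def proportion_below(values: Sequence[float], threshold: float) -> float:
    """P: fraction of stations whose statistic is smaller than threshold."""
    return sum(1 for v in values if v < threshold) / len(values)


def value_at_proportion(values: Sequence[float], p: float) -> float:
    """Nearest-rank value that at least a fraction p of the stations do not exceed."""
    ordered = sorted(values)
    k = max(math.ceil(p * len(ordered)) - 1, 0)
    return ordered[k]
```

Evaluating `value_at_proportion(rmse_values, 0.9)` separately for the 10-station and 30-station runs, and differencing the two, reproduces the kind of comparison quoted above.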

The performance of the IDW method was also evaluated to determine its sensitivity to the number of stations included (Figs. 4 and 5). The reference stations were sorted by distance to the station of interest. As the examples show, *D*, NSC, *R*^{2}, *E*_{s}, *E*_{u}, and rmse vary with the number of stations and differ from station to station. No particular number of stations yields stable values of these indices (Fig. 4). Table 2 lists the number of stations falling within circles of radius 32, 80, and 160 km. The Minden, NE, and San Francisco International Airport, CA, stations have better estimates because of higher station density. In contrast, the sparse distribution of stations around Merced, CA, and Partoun, UT, results in relatively poor estimates. The sensitivity to the number of stations also differed between densely and sparsely distributed stations. In general, the distributions of NSC, *R*^{2}, and rmse using five stations are close to those using more stations (see Fig. 5). Thus, using five stations is recommended for the IDW method for both *T*_{max} and *T*_{min}.

The distribution of NSC, *R*^{2}, and rmse for the SRT method and the patterns for single stations demonstrate that 10 stations are reasonable for the estimates for *T*_{max} and *T*_{min} (Figs. 2 and 3). We therefore use 10 stations for the SRT method in the following comparisons to the IDW method.

Figures 6 and 7 compare the distributions of rmse for the SRT and IDW methods for each of the four states. Figure 6 compares the results for the SRT method using 10 stations with the best results obtained using the IDW method (15 stations). The SRT method is superior to the IDW method in each of the four states for both *T*_{max} and *T*_{min}. The distribution varies between states. Nebraska has the lowest rmse for both the SRT and IDW methods for *T*_{max} and *T*_{min}. For the IDW method, 90% of the stations have rmse values below 10°, 8°, 6°, and 6°F for *T*_{max} and 8°, 8°, 4°, and 6°F for *T*_{min} in CA, UT, NE, and NV, respectively. For the SRT method using the 10 best-fit stations, the corresponding values are 3°, 2.5°, 2.5°, and 2.7°F for *T*_{max} and 3°, 3.5°, 2.5°, and 3.5°F for *T*_{min} in CA, UT, NE, and NV, respectively.

Figure 7 compares the best results obtained using the IDW method (15 stations) and the worst results for the SRT method using only one station. The results of the SRT method using only one station are still better than the best results obtained by the IDW method. Thus, the SRT method outperforms the IDW method.

## 5. Discussion and conclusions

The parameters of the SRT method were evaluated in this study. A radius of inclusion of 80 km is recommended for both *T*_{max} and *T*_{min}. For some isolated stations, the radius of inclusion should be larger than 80 km, even up to 200 km, so that enough weather stations are available for the estimation and quality assurance of *T*_{max} and *T*_{min}. The SRT method using a 60-day time window gives relatively good estimates for both *T*_{max} and *T*_{min}; this width is not significantly different from broader windows, although broader windows provide somewhat better estimates. The estimates are better when the window is centered on the datum of interest. However, for real-time data quality assurance, we suggest using the time series of previous records or using the fixed SRT method.

The performance of the SRT method stabilized when 10 or more stations were used, sorted by the least rmse between the time series of the station of interest and those of the reference stations. There were significant changes in the performance of the SRT method when fewer than 10 stations were used. In this study, we find 10 to be the minimum acceptable number of stations for stable SRT estimates.

This study shows that even the worst estimates obtained with the SRT method are superior to the best estimates obtained with the IDW method. Thus, in general, the IDW method is not expected to perform better than the SRT method. These findings go beyond simply confirming that the spatial regression test performs better than the IDW method (YHG; Eischeid et al. 2000). We suggest using the SRT method to provide estimates for missing data and for QA of the observations.

The unequal distribution of weather stations may contaminate the weighting factors of the spatial regression test. For an unequal distribution of input data points, such as a dense network embedded within a larger-scale network, the analyzed field may be biased toward the data within the dense network. For example, coastal stations have neighboring stations only on land and none at sea. Thus, the estimate at a coastal station may be biased toward inland stations, whose climate differs significantly from that of the coastal stations. More analysis is planned to address these effects.

## REFERENCES

Eischeid, J. K., C. B. Baker, T. Karl, and H. F. Diaz, 1995: The quality control of long-term climatological data using objective data analysis. *J. Appl. Meteor.*, **34**, 2787–2795.

Eischeid, J. K., P. A. Pasteris, H. F. Diaz, M. S. Plantico, and N. J. Lott, 2000: Creating a serially complete, national daily time series of temperature and precipitation for the western United States. *J. Appl. Meteor.*, **39**, 1580–1591.

Gandin, L. S., 1988: Complex quality control of meteorological observations. *Mon. Wea. Rev.*, **116**, 1137–1156.

Guttman, N. B., and R. G. Quayle, 1990: A review of cooperative temperature data validation. *J. Atmos. Oceanic Technol.*, **7**, 334–339.

Guttman, N. B., C. Karl, T. Reek, and V. Shuler, 1988: Measuring the performance of data validators. *Bull. Amer. Meteor. Soc.*, **69**, 1448–1452.

Hubbard, K. G., 2001a: Multiple station quality control procedures. *Automated Weather Stations for Applications in Agriculture and Water Resources Management*, K. G. Hubbard and M. V. K. Sivakumar, Eds., Tech. Doc. AGM-3 WMO/TD 1074, High Plains Regional Climate Center, Lincoln, NE, 248 pp.

Hubbard, K. G., 2001b: The Nebraska and High Plains regional experience with automated weather stations. *Automated Weather Stations for Applications in Agriculture and Water Resources Management*, K. G. Hubbard and M. V. K. Sivakumar, Eds., Tech. Doc. AGM-3 WMO/TD 1074, High Plains Regional Climate Center, Lincoln, NE, 248 pp.

Hubbard, K. G., A. T. De Gaetano, and K. D. Robbins, 2004: A modern Applied Climatic Information System (ACIS). *Bull. Amer. Meteor. Soc.*, **85**, 811–812.

Hubbard, K. G., S. Goddard, W. D. Sorensen, N. Wells, and T. T. Osugi, 2005: Performance of quality assurance procedures for an Applied Climate Information System. *J. Atmos. Oceanic Technol.*, **22**, 105–112.

Meek, D. W., and J. L. Hatfield, 1994: Data quality checking for single station meteorological databases. *Agric. For. Meteor.*, **69**, 85–109.

Nash, J. E., and J. V. Sutcliffe, 1970: River flow forecasting through conceptual models. *J. Hydrol.*, **10**, 282–290.

Wade, C. G., 1987: A quality control program for surface mesometeorological data. *J. Atmos. Oceanic Technol.*, **4**, 435–453.

Willmott, C. J., 1981: On the validation of models. *Phys. Geogr.*, **2**, 184–194.

Table 1. Settings for the sensitivity analysis.

Table 2. Number of stations within a distance.