## 1. Introduction

Observations from instruments at the earth's surface generally represent either time-integrated or instantaneous values taken at points in space, while estimates from numerical model output generally represent areal or volume quantities. It cannot be assumed that instrumentation and averaging error is nonexistent in surface measurements. Thus, there is a need to assess the utility of the in situ data and then scale the data to match the scales of model output or remotely sensed observations to produce validated products.

Validation of earth system models is difficult since replication is essentially impossible at the scale of interest. Two approaches have been used with some success. The large-scale field experiment can test parameterizations of the small-scale issues by extensive measurements of such important variables as soil moisture, streamflow, fluxes above the canopy, etc. Although the spatial and temporal scales of data from field programs are much smaller than the larger-scale model resolution, field programs are invaluable for development and validation of physical parameterizations. However, meteorological models also need to be compared to long-term routinely collected meteorological data. Thus, the validation of a climate model can be done for a range of spatial and temporal scales to test both the dynamical and the physical aspects of the model climate (Greene and Morrissey 2000).

Whenever numerical forecast models are validated and compared, verification winds are normally interpolated to individual model grid points. Since the error characteristics of the measured variables (e.g., the winds) are known, and the values are assumed to be more accurate than the model output, interpolation procedures typically focus on scaling from the point measurements to the areal (grid) model estimates. This is particularly true with a high-density instrumentation network like the one used in this study. To be statistically significant, differences between model and verification data must exceed the uncertainty of verification winds due to instrument error, sampling, and interpolation. This paper will describe an approach to examine the uncertainty of interpolated boundary layer winds and illustrate its practical effects on model validation and intercomparison efforts.

As part of a joint model validation project undertaken by the Environmental Verification and Analysis Center at the University of Oklahoma (http://www.evac.ou.edu) and the Battlefield Environment Directorate of the Army Research Laboratory, Battlescale Forecast Model (BFM) surface analyses and forecasts have been validated against surface observations of temperature, pressure, and wind. The BFM is a PC-based mesoscale modeling system designed to produce fine-resolution, short-range forecasts over areas of 500 km × 500 km or less (Henmi et al. 1994; Henmi and Dumais 1998). The BFM incorporates the High Order Turbulence Model for Atmospheric Circulations (Yamada 1981; Yamada and Bunker 1988).

Techniques such as inverse distance-squared weighting or the iterative approach described in Barnes (1964) are commonly used to grid wind fields. However, these methods are not based on the specific spatial correlation of the data to which they are applied. Kriging was chosen for this study because of its reliance on observed spatial correlations of particular datasets (Wackernagel 1994; Isaaks and Srivastava 1989; Breaker et al. 1994). Kriging uses sample values at known locations to estimate the value at unsampled points (Myers 1997). Through an analysis of the spatial variability patterns, a weighted estimate of the unknown quantity is computed. Thus, this technique statistically assesses the usefulness of each particular sample data point. This study uses a cross-validation of kriging estimates against 10-m wind observations. Differences between measured and estimated winds were examined in light of their effect on model validation. An example is given to demonstrate that uncertainties in verification winds can reduce the extent to which forecast errors can be reliably calculated and compared and ultimately lead to an inability to evaluate differences in model forecast accuracy.
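For contrast with kriging, the following is a minimal sketch of inverse distance-squared weighting. The station coordinates, wind values, and function names are hypothetical and are not taken from the study. The point it illustrates is that the weights depend only on geometry, not on any spatial correlation estimated from the data.

```python
import math

def idw_weights(stations, target, power=2.0):
    """Return normalized inverse-distance weights for one target point.

    stations: list of (x, y) coordinates in km (hypothetical layout).
    target:   (x, y) of the point to estimate.
    """
    dists = [math.hypot(x - target[0], y - target[1]) for x, y in stations]
    # A target that coincides with a station takes that station's value.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(stations))]
    raw = [1.0 / d ** power for d in dists]
    total = sum(raw)
    return [r / total for r in raw]

def idw_wind(stations, winds, target):
    """Interpolate (u, v) wind components with the same scalar weights."""
    w = idw_weights(stations, target)
    u = sum(wi * ui for wi, (ui, _) in zip(w, winds))
    v = sum(wi * vi for wi, (_, vi) in zip(w, winds))
    return u, v

stations = [(0.0, 0.0), (30.0, 0.0), (0.0, 40.0)]
winds = [(5.0, 0.0), (3.0, 1.0), (4.0, -2.0)]   # made-up (u, v) in m/s
print(idw_weights(stations, (10.0, 10.0)))
print(idw_wind(stations, winds, (10.0, 10.0)))
```

Because the weights are fixed by distance alone, two datasets with very different spatial variability would receive identical weights on the same network, which is the limitation that motivates kriging here.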

This spatial pattern information is of critical importance in assessing the accuracy of the model output. For example, suppose the error criterion necessary to produce a reliable wind estimate for artillery corrections were 2 m s^{−1}. The results show that, for certain locations, the accuracy of the model cannot be determined to that level because of errors in the surface fields. Since the errors in the model estimates cannot be determined to the level of accuracy required, no meaningful intercomparison of models (or model parameters) can be accomplished. Any tests performed without a clear understanding of the errors associated with the measurement and estimation of the wind field would not be statistically significant and might lead to erroneous conclusions regarding the efficacy of the models tested.

One potential application of this approach is as a new method for evaluating the effects of additional observations that are not yet operationally available but could become so. Of particular interest to the military weather community is the concept of “adaptable observations.” Using the methods described here to produce error estimates, accurate information can be provided as to where to place additional observations to maximize their potential efficiency and to strengthen the army's overall capability not only to improve its models but also to correctly assess their accuracy. As described above, one advantage of kriging is that, in addition to minimizing the error in the estimate, a map of the actual error in the field can be generated. Such a map would illustrate where best to place additional sensors to improve the surface estimates and ultimately reduce the uncertainties in the model output. It would also provide useful information to the modeling community about the amount (and location) of information needed to meet certain error criteria (Greene and Morrissey 1998).

## 2. Methods

In this study, kriging wind estimates were cross-validated against 10-m wind observations from the Oklahoma Mesonet. The Oklahoma Mesonet consists of 111 automated observing stations that continuously monitor numerous important weather and soil parameters (Brock et al. 1995). Standard meteorological variables (e.g., temperature, wind, precipitation) are measured on a 5-min timescale, and the dataset used here included wind measurements taken at 5-min intervals throughout the period 1994–98. Only stations reporting throughout the entire 5-yr period and not located in the Oklahoma were incorporated into the study (total number of stations included was 105). The combination of the dense array of stations (including at least one station in each county in the state) and the high temporal resolution of the mesonet makes it the most comprehensive statewide climatological observing system in the world.

**V**(*i*) and **V**(*j*) are wind vectors at locations *i* and *j*, respectively. Expected values were evaluated over monthly periods to isolate seasonal differences in spatial variability. Variograms for each station and month were modeled by fitting a curve of the form

$$\gamma(x) = c - c\,e^{-kx},$$

where *c* and *k* are constants that control limit and shape, respectively, while *x* represents station separation distance. Each variogram model excluded data from the station to which it would be applied. The function choice was made after rejecting others on the basis of continuity, subjective fit, and theoretical validity. Functions incorporating a nugget effect, or discontinuity at the origin, were rejected due to the nonphysical nature of the discontinuity. Other functions such as the square root of hyperbolic tangent fit the data near the origin very well but the fit was very poor at greater distances. Variations on the square root of distance were rejected on grounds that a variogram should reach a finite limit. A duplicate record was produced for each mesonet site by applying its kriging matrix solution to data from the remaining 104 stations. Statistics were then developed describing the differences between actual and estimated wind records.
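As an illustration of the curve-fitting step, the sketch below fits the limit *c* and shape *k* of an exponential variogram with no nugget to a set of (distance, variogram) pairs. The fitting procedure (closed-form least squares for *c* combined with a golden-section search over *k*), the search bounds, and the synthetic data are assumptions made for this example. The text does not state how the fit was actually performed.

```python
import math

def model(x, c, k):
    """Exponential variogram without a nugget: gamma(x) = c - c*exp(-k*x)."""
    return c - c * math.exp(-k * x)

def best_c(dist, gamma, k):
    """For fixed k the model is linear in c, so least squares has a closed form."""
    f = [1.0 - math.exp(-k * x) for x in dist]
    return sum(g * fi for g, fi in zip(gamma, f)) / sum(fi * fi for fi in f)

def sse(dist, gamma, k):
    """Sum of squared residuals at shape k, using the best limit c for that k."""
    c = best_c(dist, gamma, k)
    return sum((g - model(x, c, k)) ** 2 for x, g in zip(dist, gamma))

def fit_variogram(dist, gamma, k_lo=1e-4, k_hi=1.0, iters=100):
    """Golden-section search over k (assumes one minimum inside [k_lo, k_hi])."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = k_lo, k_hi
    for _ in range(iters):
        p = b - ratio * (b - a)   # left interior point
        q = a + ratio * (b - a)   # right interior point
        if sse(dist, gamma, p) < sse(dist, gamma, q):
            b = q
        else:
            a = p
    k = 0.5 * (a + b)
    return best_c(dist, gamma, k), k

# Synthetic check: sample the model itself with made-up values
# c = 12 (m/s)^2 and k = 0.02 km^-1, then try to recover them.
dist = [10.0 * i for i in range(1, 31)]
gamma = [model(x, 12.0, 0.02) for x in dist]
c_fit, k_fit = fit_variogram(dist, gamma)
print(round(c_fit, 3), round(k_fit, 4))
```

With real station pairs, the variogram values would first be estimated from the wind records for each month, and the data from the target station would be withheld before fitting, as described above.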

The magnitude of interpolation error is viewed as a measure of the uncertainty that would prevail for estimates made at arbitrary locations within the domain specified by the study. The similarity of the distribution of errors calculated for mesonet sites to those calculated for a regular grid or other arbitrary configuration is partly ensured by excluding all data from a given station from its variogram model and its weighted sum estimator. In addition, station-to-station separations within the Oklahoma Mesonet are distributed much like station-to-gridpoint distances would be for grids covering the same domain. One way to test the adequacy of the semivariogram is to examine the results via a cross-validation (Michaelsen 1987). In this approach, the kriging estimate described above is computed from the estimated semivariogram(s), leaving out one point at a time; the sum of the squared errors can then be compared with the kriging variance estimate. This ensures that the appropriate weights are determined for each distance. A similar method (comparing the estimated values to the actual wind vectors) is used in this paper.
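The leave-one-out procedure can be sketched as follows. This example assumes ordinary kriging with a single fixed variogram and applies the same weights to both wind components. The variogram constants, station layout, and wind values are placeholders, and the study's own kriging formulation may differ in detail.

```python
import math

def gamma(x, c=12.0, k=0.02):
    """Assumed variogram model c - c*exp(-k*x); c and k are placeholders."""
    return c - c * math.exp(-k * x)

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for j in range(col, n + 1):
                m[r][j] -= f * m[col][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def kriging_weights(points, target):
    """Ordinary kriging weights; the last unknown is the Lagrange multiplier."""
    n = len(points)
    a = [[gamma(math.dist(p, q)) for q in points] + [1.0] for p in points]
    a.append([1.0] * n + [0.0])
    b = [gamma(math.dist(p, target)) for p in points] + [1.0]
    return solve(a, b)[:n]

def leave_one_out(points, winds):
    """Estimate each station's (u, v) from the others; return error magnitudes."""
    errors = []
    for i, target in enumerate(points):
        others = points[:i] + points[i + 1:]
        obs = winds[:i] + winds[i + 1:]
        w = kriging_weights(others, target)
        u = sum(wi * o[0] for wi, o in zip(w, obs))
        v = sum(wi * o[1] for wi, o in zip(w, obs))
        errors.append(math.hypot(u - winds[i][0], v - winds[i][1]))
    return errors

points = [(0.0, 0.0), (40.0, 0.0), (0.0, 40.0), (40.0, 40.0), (20.0, 20.0)]
winds = [(5.0, 1.0), (4.0, 0.5), (6.0, 1.5), (5.0, 1.0), (5.2, 0.8)]  # made up
print([round(e, 2) for e in leave_one_out(points, winds)])
```

Unlike the study, this sketch reuses one variogram for every station. A closer analogue would refit the variogram for each withheld station before solving for its weights.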

## 3. Results

Figures 1–4 show vector wind error magnitudes over the state of Oklahoma for four months representing the winter, spring, summer, and autumn wind climatologies. The 90th percentile confidence level was chosen as a practical upper bound on expected errors. The larger error magnitudes evident in western Oklahoma may be due to several factors. The larger station separations found in western Oklahoma will lead to greater estimation variance. In addition, an isotropic model of spatial variability is applied to a region that, particularly in the spring and summer, may experience more longitudinal than latitudinal variability in wind vector orientation due to the presence of a dryline.
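A 90th percentile vector error of the kind mapped in these figures could be computed for a single station along the following lines. The paired wind values are invented, and the nearest-rank percentile definition is an assumption, since the text does not say which percentile estimator was used.

```python
import math

def vector_error_magnitudes(observed, estimated):
    """Magnitude of the vector difference for paired (u, v) winds in m/s."""
    return [math.hypot(eu - ou, ev - ov)
            for (ou, ov), (eu, ev) in zip(observed, estimated)]

def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]

# Made-up observed and kriging-estimated winds for one station.
observed = [(5.0, 0.0), (4.0, 1.0), (6.0, -1.0), (3.0, 2.0), (5.0, 1.0),
            (4.5, 0.5), (5.5, 0.0), (6.0, 1.0), (4.0, -1.0), (5.0, 2.0)]
estimated = [(5.0, 1.0), (4.0, 3.0), (6.0, 2.0), (3.0, 6.0), (5.0, 6.0),
             (4.5, 6.5), (5.5, 7.0), (6.0, 9.0), (4.0, 8.0), (5.0, 12.0)]

magnitudes = vector_error_magnitudes(observed, estimated)
p90 = percentile_nearest_rank(magnitudes, 90)
print(magnitudes, p90)
```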

The figures above imply uncertainties that would be associated with gridded verification data and forecast error calculations. This vector wind uncertainty can be visualized by drawing a circular footprint around verification wind vectors with a radius equivalent to the 90th percentile errors shown in Figs. 1–4. Figure 5 gives an example of such a footprint and suggests that there is a 90% likelihood that actual wind vectors will fall within the gray-shaded region. Furthermore, because model vector error is defined as the vector difference between modeled and actual winds, 90% of model error vectors should originate from inside the footprint. It should also be noted that the minimum upper bound on the magnitude of model error (vector **B**) is never less than the radius of the gray region.
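The bound described above follows from the triangle inequality. If the actual wind lies within a radius *r* of the verification wind, the true model error magnitude lies within *r* of the apparent error (model minus verification). The sketch below makes this explicit; the function name, the radius, and the wind values are hypothetical.

```python
import math

def model_error_bounds(model, verification, radius):
    """Bounds on |model - actual| when actual lies within `radius` of verification.

    Follows from the triangle inequality; winds are (u, v) tuples in m/s.
    """
    apparent = math.hypot(model[0] - verification[0], model[1] - verification[1])
    lower = max(0.0, apparent - radius)
    upper = apparent + radius
    return lower, upper

radius = 2.5   # hypothetical 90th percentile verification error, m/s
print(model_error_bounds((6.0, 2.0), (6.0, 2.0), radius))   # model equals verification
print(model_error_bounds((9.0, 6.0), (6.0, 2.0), radius))   # apparent error of 5 m/s
```

Even when the model matches the verification wind exactly, the upper bound equals the footprint radius, which is the minimum upper bound referred to in the text.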

This minimum upper bound has important consequences for the comparison of numerical model output. For example, assume that the maximum speed and direction error thresholds for battlefield wind forecasts are 4 kt in speed and 20°–30° in direction. This implies a maximum allowable vector error magnitude of approximately 8 m s^{−1} for surface winds up to about 30 kt [8 m s^{−1} ≈ 30 kt × sin(30°)]. While it is possible to demonstrate the accuracy of surface wind forecasts within the 8 m s^{−1} limit given the error magnitudes shown in Figs. 1–4, demonstrating the superiority of one model over another within this range becomes very difficult.
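The bracketed estimate can be checked with a short calculation. The function name is hypothetical, and the knot conversion uses the standard definition of one nautical mile (1852 m) per hour.

```python
import math

KT_TO_MS = 1852.0 / 3600.0   # one knot is one nautical mile (1852 m) per hour

def max_vector_error_ms(speed_kt, direction_error_deg):
    """Evaluate speed * sin(direction error), converted from knots to m/s."""
    return speed_kt * math.sin(math.radians(direction_error_deg)) * KT_TO_MS

print(round(max_vector_error_ms(30.0, 30.0), 2))
```

The result is a little under 8 m s^{−1}, consistent with the rounded figure quoted in the text.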

In order to show significant differences between models, the confidence intervals bracketing their error magnitudes cannot overlap. This requires that the uncertainty in verification data not exceed one-third of the magnitude of allowable operational error. Therefore, as illustrated in Fig. 6, to show improvements within the 8-m s^{−1} operational requirement, verification data should not contain errors of more than about 2.6 m s^{−1}. Figures 7–10 show areas where the 2.6 m s^{−1} criterion is met. It is evident that regions where models can be compared within their operational demands are limited and vary seasonally. Maps such as those produced in Figs. 7–10 clearly illustrate the advantage of looking at the wind error characteristics in the manner described in this paper. Those areas not shaded in the figures represent areas where it is impossible, given the current network of stations and meteorological variability, to determine the accuracy of the models to within the required degree of confidence. If those areas were determined to be of strategic importance, then additional resources could be used to place the necessary sensors in those regions. This would reduce the representative distance between stations, and thus improve the overall accuracy in the estimation of the covariance, which in turn would reduce the overall error in the gridded wind field estimates.
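The one-third criterion can be applied to per-site error estimates as sketched below. One possible reading of the criterion, offered here as an interpretation rather than something the text states, is that the better model's error interval spans at least one uncertainty width from zero, and the poorer model's interval (two widths) must fit between that and the allowable limit, for three widths in total. The site names and error values are invented.

```python
def verification_limit(allowable_error):
    """One-third of the allowable operational error, as stated in the text."""
    return allowable_error / 3.0

def intervals_overlap(err_a, err_b, uncertainty):
    """True if [err - u, err + u] for two models share any values."""
    return abs(err_a - err_b) <= 2.0 * uncertainty

def comparable_sites(site_errors, allowable_error):
    """Names of sites whose 90th percentile error is within the limit."""
    limit = verification_limit(allowable_error)
    return [name for name, err in site_errors.items() if err <= limit]

site_errors = {"A": 1.9, "B": 2.4, "C": 3.1, "D": 4.0}   # made-up values, m/s
print(round(verification_limit(8.0), 2))
print(comparable_sites(site_errors, 8.0))
print(intervals_overlap(2.0, 6.0, 2.6), intervals_overlap(2.0, 8.0, 2.6))
```

One-third of 8 m s^{−1} is slightly above the 2.6 m s^{−1} quoted in the text, which appears to be a rounded value.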

## 4. Conclusions

It has been shown that comparisons of model accuracy can be limited by errors in verification data to the extent that meaningful comparisons and significance tests cannot be drawn. In addition, this study illustrates the importance of understanding the seasonal and geographic variability in the suitability of verification data for model intercomparisons. While verification wind uncertainties allow for the calculation of marginally significant forecast improvements given the requirements of a particular application, the areas where this can be accomplished are restricted.

Errors in interpolated verification data are the result of three factors: instrument error, undersampling, and imperfect modeling of spatial variability. Instrument errors cannot be reduced without considerable expense, and to the extent that they are nonzero, they only serve to strengthen the conclusions of this study. Errors due to undersampling might be reduced by enhancing the resolution of the observing network. However, this option is also unrealistic due to cost. The main result of this study is to illustrate that it is crucial to recognize the errors inherent in gridding verification winds when conducting model validation and intercomparison work. Defendable model intercomparison results may rely on proper scheduling of model tests with regard to seasonal wind climatology and choosing instrument networks and variogram functions capable of providing adequately small errors due to sampling and imperfect modeling. Thus, it is important to quantify verification wind uncertainty when stating forecast errors or differences in the accuracy of forecast models.

## REFERENCES

Barnes, S. L., 1964: A technique for maximizing details in numerical weather map analysis. *J. Appl. Meteor.*, **3**, 396–409.

Breaker, L. C., W. H. Gemmill, and D. S. Crosby, 1994: The application of a technique for vector correlation to problems in meteorology and oceanography. *J. Appl. Meteor.*, **33**, 1354–1365.

Brock, F. V., K. C. Crawford, R. L. Elliott, G. W. Cuperus, S. J. Stadler, H. L. Johnson, and M. D. Eilts, 1995: The Oklahoma Mesonet: A technical overview. *J. Atmos. Oceanic Technol.*, **12**, 5–19.

Greene, J. S., and M. L. Morrissey, 1998: Evaluation and validation of simulated and observed climate data. *Climate Prediction for Agricultural and Resource Management*, L. Leslie and R. Munro, Eds., Australian Academy of Sciences, 83–100.

Greene, J. S., and M. L. Morrissey, 2000: Validation and uncertainty analysis of satellite rainfall algorithms. *Prof. Geogr.*, **52**, 247–257.

Henmi, T., and R. Dumais, 1998: Description of the Battlescale Forecast Model. Army Research Laboratory Tech. Rep. 1032, 144 pp.

Henmi, T., M. E. Lee, and T. J. Smith, 1994: Evaluation of the Battlescale Forecast Model. *Proc. of the 1994 Battlefield Atmospherics Conference*, White Sands, NM, Battlefield Environment Directorate, U.S. Army Research Laboratory, 245–253.

Isaaks, E. H., and R. M. Srivastava, 1989: *An Introduction to Applied Geostatistics*. Oxford University Press, 561 pp.

Journel, A. G., and C. H. J. Huijbregts, 1978: *Mining Geostatistics*. Academic Press, 600 pp.

Michaelsen, J., 1987: Cross-validation in statistical climate forecast models. *J. Climate Appl. Meteor.*, **26**, 1589–1600.

Myers, J. C., 1997: *Geostatistical Error Management: Quantifying Uncertainty for Environmental Sampling and Mapping*. Van Nostrand Reinhold Press, 571 pp.

Wackernagel, H., 1994: Cokriging versus kriging in regionalized multivariate data analysis. *Geoderma*, **62**, 83–92.

Yamada, T., 1981: A numerical simulation of nocturnal drainage flow. *J. Meteor. Soc. Japan*, **59**, 108–122.

Yamada, T., and S. Bunker, 1988: Development of a nested grid, second moment turbulence closure model and application to the ASCOT Brush Creek data simulation. *J. Appl. Meteor.*, **27**, 562–578.