## 1. Introduction

One of the goals of the New England High-Resolution Temperature Program (NEHRTP; Stensrud et al. 2006) is to investigate the potential of an ensemble forecast system to provide improved forecasts of near-surface variables during the warm season. To accomplish this goal, a simple postprocessing scheme was developed during the first 2 yr of this project (Stensrud and Yussouf 2003, 2005). This scheme uses the past 12 days of observations and model forecasts to calculate the bias of each individual model for each National Weather Service (NWS) surface station location and forecast time. These biases are then applied to today’s raw model data to correct for persistent errors in the surface energy budget, the representation of the surface superadiabatic layer, and differences between the model and actual terrain heights. Results indicate that the ensemble mean 2-m temperature, 2-m dewpoint temperature, and 10-m wind speed from this bias-corrected ensemble (BCE) are competitive with or better than the model output statistics (MOS) products presently available operationally in the United States (Stensrud and Yussouf 2005). The BCE forecasts also provide reliable probability forecasts that could be very valuable to many end users of weather forecasts. A number of other studies have likewise shown that simple bias adjustments increase the value of short-range ensemble forecasts (Eckel and Mass 2005; Woodcock and Engel 2005).

One limitation of the BCE approach is that it provides forecasts only at NWS surface observation stations across the United States, Canada, and Mexico, because observations are the foundation of the bias correction calculation. Forecasts of surface variables are also needed at other locations, such as towns and cities without an NWS surface site. Given this need, and motivated by the results from the first 2 yr of the project, data collected during the summer of 2004 are used to examine how the BCE approach can be extended to provide bias-corrected forecasts at other locations within the model domain. The BCE approach is computationally very inexpensive compared with the cost of producing the ensemble. An extended BCE approach that provides accurate forecasts of near-surface variables for any city or town within the model domain is therefore potentially a very attractive way to increase and improve the forecast guidance provided to the public.

The ensemble and verification data are described in section 2. Section 3 contains a brief overview of the methods developed for obtaining extended BCE forecasts at locations without observing stations. Section 4 presents the results obtained from the analysis, followed by a final discussion in section 5.

## 2. Data

### a. Ensemble forecasts

The data collection for this experiment started on 1 June and ended on 15 September 2004, yielding a total of 107 forecast days. The short-range ensemble for this experiment consists of 22 members as described in Table 1. The four models that are used to generate the ensemble are the Eta Model (Rogers et al. 1996), the Regional Spectral Model (RSM; Juang et al. 1997), the Rapid Update Cycle model (RUC; Benjamin et al. 2004), and the Weather Research and Forecast model (WRF; Klemp 2004).

Sixteen of the ensemble members are provided by the National Centers for Environmental Prediction (NCEP), with 15 of these members from the operational Short-Range Ensemble Forecast (SREF) system (see McQueen et al. 2005 for more details). The final NCEP member is the 12-km operational Eta Model. Four other members are provided by the Forecast Systems Laboratory (FSL), and another two members are provided by the National Severe Storms Laboratory (NSSL). Of the four FSL members, two are from the RUC model (at 13 and 20 km) and the remaining two are from the WRF model (at 13 and 20 km). The 20-km RUC uses an initial condition created by an optimal interpolation scheme, with boundary conditions provided by the Eta Model forecasts. The 13-km RUC uses the 20-km RUC data for initial and boundary conditions. The initial and boundary condition data for the WRF model are the same as for the RUC forecasts. The final two members from NSSL use the Eta and WRF models. Both of these models use the NCEP Global Forecast System (GFS) for initial and boundary conditions and the Kain–Fritsch (KF) convective parameterization scheme. The start times for the ensemble members vary from 0000 to 1200 UTC (Table 1). Regardless of start time, all the model forecasts are valid starting at 1200 UTC each day and extend out to 48 h. The model forecasts are available over the continental United States every 3 h and are interpolated to the same 40-km grid as the NCEP models.

### b. Surface observations

The surface observations are obtained from the NWS operational data stream. Data from a total of 1892 observing stations spread across the contiguous United States, southern Canada, and northern Mexico (Fig. 1) are used to determine the bias correction for each model, each station location, and each forecast time. These data are also used to verify the forecasts at these observing stations. The surface data receive no quality control beyond the checks performed operationally by the NWS, which may add some error to the bias corrections that are calculated and applied. However, examination of selected days suggests that the surface observations used in this study are reasonably precise and free from large errors.

### c. Verification data

To verify the accuracy of the extended BCE approach at locations without routine NWS surface observations, reliable and high-quality independent verification data from outside the routine NWS observational network are needed. The Oklahoma Mesonet (Brock et al. 1995), which arguably represents the premier meteorological mesonet in the country, is selected for this purpose, and hourly data are obtained via the Meteorological Assimilation Data Ingest System (MADIS; MacDermaid et al. 2005). The Oklahoma Mesonet consisted of 116 stations statewide during the 2004 experimental period (Fig. 2). The mesonet has very detailed and robust quality control algorithms (Shafer et al. 2000) and uniform, high-quality instrumentation.

## 3. Methodology

The BCE method uses the previous 12 complete days of data to calculate the bias of each of the 22 individual models at each individual observing station and each forecast time. A 12-day window is chosen because a quantitative evaluation of window lengths from 2 to 25 days indicates that 12 days is a reasonable choice for the bias correction window length (Stensrud and Yussouf 2005). Of the 107 days of this experiment, the first 12 days are used to provide the first BCE forecasts, leaving a total of 95 days for evaluation. Because of computer problems, one or more of the model runs may be missing on a given day. In that case the ensemble is created from the remaining available members. The accuracy of the extended BCE approach is determined by comparing its results against those of two other bias correction approaches and against the model ensemble mean forecasts without any bias correction.
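The per-station bias calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code:

- The forecasts and observations for one station and one forecast time are assumed to be already paired in NumPy arrays.
- Bias is defined as forecast minus observed, as in section 4.
- A missing model run is assumed to be stored as NaN, so it drops out of both the bias and the ensemble mean.

The function names `bce_bias` and `bce_ensemble_mean` are hypothetical.

```python
import numpy as np

WINDOW_DAYS = 12  # bias-correction window length used in this study


def bce_bias(past_fcst, past_obs, window=WINDOW_DAYS):
    """Mean bias (forecast minus observed) of each model over the window.

    past_fcst: shape (n_models, n_days), forecasts valid at one station and
               one forecast time, oldest day first; NaN marks a missing run.
    past_obs:  shape (n_days,), the matching observations.
    """
    errors = past_fcst[:, -window:] - past_obs[-window:]
    return np.nanmean(errors, axis=1)


def bce_ensemble_mean(today_fcst, bias):
    """Remove each model's bias and average the available members."""
    corrected = today_fcst - bias
    return corrected, np.nanmean(corrected)


# Hypothetical example: three models with constant errors of +1.5, -0.5, and 0 K;
# the third model's run is missing today (NaN).
obs = np.linspace(290.0, 295.5, 12)
past = np.vstack([obs + 1.5, obs - 0.5, obs])
bias = bce_bias(past, obs)
corrected, ens_mean = bce_ensemble_mean(np.array([300.0, 297.0, np.nan]), bias)
print(bias, corrected, ens_mean)
```

Here the missing member simply drops out of the simple average. This mirrors how the ensemble is formed from the remaining available members.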

### a. Extended BCE (EBCE) approach

In the EBCE approach, the 12-day mean bias value $Z_G$ of a variable at any location is given by

$$
Z_G = \frac{\sum_{i=1}^{m} W_{Gi}\, Z_{Gi}}{\sum_{i=1}^{m} W_{Gi}}, \tag{1}
$$

where $Z_{Gi}$ is the 12-day mean bias value at NWS surface observation site $i$. Here $W_{Gi}$ is a weight factor that decreases with the distance $r_i$ between NWS site $i$ and the specified location. Following Cressman (1959), this weight factor is defined as

$$
W_{Gi} = \frac{n^2 - r_i^2}{n^2 + r_i^2}, \tag{2}
$$

where $n$ is the radius of influence (assumed to be 200 km in this study) and $m$ is the number of NWS stations located within the 200-km radius of the specified site (Fig. 3a). The 200-km radius of influence is chosen simply so that any location within the contiguous 48 states has data from at least three NWS stations contributing to the bias approximation. Experiments to determine whether other values of $n$ produce better results have not been done. The model forecasts for the same specified location are bilinearly interpolated from each model in the ensemble using the four closest model grid points (Fig. 3b). Finally, for each individual model, the interpolated 12-day mean bias value calculated from (1) is applied to (i.e., subtracted from) today’s interpolated raw forecast at the specified location. This approach is called the extended BCE (EBCE) forecast. More sophisticated and accurate extension methods are possible, but this very basic approach appears sufficient to illustrate the potential value of the EBCE approach. The ensemble mean of all the available bias-corrected forecasts is compared with observations to evaluate the accuracy of this method.
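The EBCE steps for a single model can be sketched as follows: Eq. (2) for the weights, Eq. (1) for the interpolated bias, bilinear interpolation of the raw forecast, and subtraction of the bias. This is an illustrative sketch, not the operational implementation. It makes three assumptions:

- The distances from the target location to the NWS stations are already computed in kilometers.
- The target's fractional position inside the surrounding model grid cell is known.
- Stations at or beyond the 200-km radius receive zero weight.

All function names are hypothetical. Repeating the calculation for each ensemble member and averaging gives the EBCE ensemble mean.

```python
import numpy as np

RADIUS_KM = 200.0  # radius of influence n used in this study


def cressman_weight(r_km, n_km=RADIUS_KM):
    """Weight of Eq. (2); stations at or beyond the radius get zero weight."""
    if r_km >= n_km:
        return 0.0
    return (n_km**2 - r_km**2) / (n_km**2 + r_km**2)


def interpolate_bias(station_bias, station_dist_km, n_km=RADIUS_KM):
    """Eq. (1): weighted mean of the NWS station biases Z_Gi."""
    w = np.array([cressman_weight(r, n_km) for r in station_dist_km])
    if w.sum() == 0.0:
        raise ValueError("no NWS station within the radius of influence")
    return float(np.dot(w, station_bias) / w.sum())


def bilinear(f00, f10, f01, f11, x, y):
    """Bilinear interpolation inside one grid cell.

    f00, f10, f01, f11 are the values at the cell corners (0,0), (1,0),
    (0,1), (1,1); x and y give the target's fractional position in [0, 1].
    """
    return (f00 * (1 - x) * (1 - y) + f10 * x * (1 - y)
            + f01 * (1 - x) * y + f11 * x * y)


def ebce_forecast(corners, x, y, station_bias, station_dist_km):
    """Bias-corrected forecast of one model at a location without observations."""
    raw = bilinear(*corners, x, y)
    return raw - interpolate_bias(station_bias, station_dist_km)


# Hypothetical example: NWS stations 0, 100, and 250 km from the target.
biases = [1.0, 2.0, 3.0]
dists = [0.0, 100.0, 250.0]
z = interpolate_bias(biases, dists)
e = ebce_forecast((300.0, 300.0, 300.0, 300.0), 0.3, 0.7, biases, dists)
print(z, e)
```

In this example, the station at 250 km lies outside the radius and contributes nothing. The two remaining weights are 1.0 and 0.6, giving an interpolated bias of 2.2/1.6 = 1.375.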

To illustrate the horizontal variations in the 12-day mean bias values, a Cressman weighting scheme is used to interpolate the bias values from the BCE approach at the NWS observation station locations to a grid with 50-km grid spacing. This analysis indicates that the bias values vary smoothly (Fig. 4). Some of this smoothness is due to the objective analysis scheme, but examination of point values generally confirms that the 12-day mean bias values vary slowly from location to location. The same behavior is seen at other forecast times, from other models, and for other variables (not shown). The lack of small-scale structure and irregularity suggests that a representative value of the mean bias can be determined at locations between the NWS stations by simple interpolation. Admittedly, in regions of complex terrain such as the western United States, where NWS surface observations are also sparser (see Fig. 1), the results from this simple interpolation method may be less accurate. Nonisotropic weighting schemes (e.g., Stauffer and Seaman 1994) and terrain height adjustments likely need to be explored in these regions.

### b. Mesonet BCE approach

This approach applies the BCE technique of Stensrud and Yussouf (2005) directly to the Oklahoma Mesonet stations. First, the bias of each individual model at each of the 116 mesonet stations at each forecast hour is calculated from the previous 12 days of observations and the bilinearly interpolated forecast data valid at the station location. Second, these biases are then applied to the present day’s raw model forecast at those same locations and forecast times. The ensemble mean forecast is calculated using a simple average of the bias-corrected ensemble forecast values. This approach is expected to be the most accurate as it does not require any horizontal interpolation of the bias values.

### c. NWS BCE approach

The BCE technique is applied directly to the 53 NWS stations within Oklahoma and across the surrounding border states (Fig. 2). These NWS station locations are independent from the Oklahoma Mesonet stations. This approach is the same as the original BCE, which yields forecasts competitive with or better than MOS (Stensrud and Yussouf 2005), except that it is limited to stations within the Oklahoma region. Again, the ensemble mean forecast is compared with the observations.

### d. Raw ensemble (RE) mesonet approach

The raw (original) forecasts from each of the ensemble members are bilinearly interpolated to the Oklahoma Mesonet locations from the four closest model grid points and the ensemble mean calculated. No bias correction is applied to the forecast variables. This approach shows the accuracy of the raw model forecasts and should be the least accurate of the methods examined as it fails to account for model bias.

Verification results from the EBCE are compared against the results from the mesonet BCE, NWS BCE, and the RE mesonet forecasts to determine the ability of the EBCE to accurately predict near-surface variables. The variables predicted include 2-m temperature, 2-m dewpoint temperature, and 10-m wind speed. Because each forecast cycle starts at 1200 UTC and extends out to 48 h at 3-h intervals, there are 17 separate forecast times for verification.

## 4. Results

The mean error (bias; forecast − observed), mean absolute error (MAE), and the root-mean-square error (rmse) (Wilks 1995) are calculated for each of the four approaches at the various stations and each of the 17 forecast hours for the entire 95-day experimental period. The results shown are calculated from all the stations used within each of the four approaches. The statistical significance of the error values also is determined using a bootstrap technique (Mullen and Buizza 2001), where resamples are randomly selected from the pool of 95 days for each forecast hour and error statistics for each of those resamples are generated. This resampling procedure is repeated 10 000 times to estimate the 90% confidence bounds of the error statistics. If the confidence intervals associated with the EBCE, the mesonet BCE, the NWS BCE, and the RE mesonet do not overlap, then, assuming a normal distribution, the differences are significant at more than the 95% level.
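The error statistics and the day-based bootstrap can be sketched as follows. This is an illustration under three assumptions, not the procedure of Mullen and Buizza (2001) verbatim:

- Days are drawn with replacement.
- The 90% bounds are taken as the 5th and 95th percentiles of the resampled statistic.
- The errors for one forecast hour are stored as one array per day.

All function names are hypothetical. The example uses synthetic random errors, not data from this study.

```python
import numpy as np


def error_stats(err):
    """Mean error (bias), MAE, and rmse of forecast-minus-observed errors."""
    err = np.asarray(err, dtype=float)
    return {"bias": err.mean(),
            "mae": np.abs(err).mean(),
            "rmse": np.sqrt(np.mean(err**2))}


def bootstrap_ci(daily_errors, stat="rmse", n_resamples=10_000, level=0.90, seed=0):
    """Percentile bootstrap interval obtained by resampling whole days.

    daily_errors: list with one 1-D array per day holding the errors at all
                  stations for a single forecast hour.
    """
    rng = np.random.default_rng(seed)
    n_days = len(daily_errors)
    values = np.empty(n_resamples)
    for k in range(n_resamples):
        picks = rng.integers(0, n_days, size=n_days)  # days drawn with replacement
        resample = np.concatenate([daily_errors[i] for i in picks])
        values[k] = error_stats(resample)[stat]
    tail = 50.0 * (1.0 - level)
    lo, hi = np.percentile(values, [tail, 100.0 - tail])
    return lo, hi


def intervals_overlap(a, b):
    """True if two (lo, hi) confidence intervals overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


# Synthetic errors for 95 days at 116 stations (illustration only).
rng = np.random.default_rng(1)
corrected_days = [rng.normal(0.3, 1.5, 116) for _ in range(95)]
raw_days = [rng.normal(1.5, 2.5, 116) for _ in range(95)]
ci_corr = bootstrap_ci(corrected_days, n_resamples=1000)
ci_raw = bootstrap_ci(raw_days, n_resamples=1000)
print(ci_corr, ci_raw, intervals_overlap(ci_corr, ci_raw))
```

Resampling whole days rather than individual station values is the natural reading of "resamples are randomly selected from the pool of 95 days". It keeps the spatial correlation among stations on the same day intact.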

Results indicate that the mean EBCE forecasts for 2-m dewpoint temperature (Fig. 5) and 2-m temperature (Fig. 6) are very competitive with the mean mesonet BCE and NWS BCE forecasts at all times, as shown by the values of MAE and rmse. The EBCE is generally slightly less accurate than the mesonet BCE forecasts. This result is expected, because the mesonet BCE calculates the bias from the observations at the station locations instead of interpolating it from nearby NWS stations, and so should be the most accurate forecast at the mesonet stations. However, the differences are not significant at the 95% level for most forecast times because the confidence bounds overlap. The mean EBCE forecasts also have a cold bias for 2-m dewpoint temperature (Fig. 5c) and a warm bias for 2-m temperature (Fig. 6c). These bias values are fairly consistent with the diurnal heating cycle: they are smallest in the morning near 1200 UTC and largest near 0000 UTC. This agrees with the diurnal cycle of various model temperature biases seen by Zhang and Zheng (2004). Not surprisingly, the errors from the RE mesonet forecasts without bias correction are always larger than those from the corrected forecasts, and these differences are almost always significant, illustrating the value of bias correction.

Results for 10-m wind speed again indicate that the mean EBCE forecasts are slightly less accurate than the mesonet BCE, as shown by the values of MAE and rmse, but are more accurate than the NWS BCE forecast (Fig. 7). The differences between the EBCE and NWS BCE forecasts are statistically significant for several forecast times, a result that is unexpected. These differences are likely not due to the EBCE forecasts being more accurate than the NWS BCE forecasts. Instead, they are attributed to the different reporting procedures of the two networks. The Oklahoma Mesonet reports wind speeds to a resolution of 0.1 m s⁻¹, whereas the NWS reports wind speeds in whole knots. The mean EBCE also has a positive bias for 10-m wind speed, with the largest bias values during the daytime, consistent with the diurnal heating cycle. The three bias correction approaches largely remove the nocturnal overprediction of wind speed, which appears as large positive bias values of the RE mesonet between 12 and 24 h and between 36 and 48 h. Zhang and Zheng (2004) also found this model bias when using several boundary layer parameterization schemes.
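The size of the quantization noise that whole-knot reporting adds to the verifying observations can be gauged with simple arithmetic. The sketch below rounds evenly spread speeds to whole knots and to 0.1 m s⁻¹ and reports the resulting error. It assumes plain round-to-nearest reporting; the function names are hypothetical. It illustrates the magnitude of the effect only and does not reproduce the verification result.

```python
import numpy as np

KNOT_MS = 1852.0 / 3600.0  # 1 kt in m/s (about 0.514)


def whole_knot_report(speed_ms):
    """Round wind speeds (m/s) to whole knots, then convert back to m/s."""
    return np.round(np.asarray(speed_ms, dtype=float) / KNOT_MS) * KNOT_MS


def tenth_ms_report(speed_ms):
    """Round wind speeds to 0.1 m/s, the Oklahoma Mesonet reporting resolution."""
    return np.round(np.asarray(speed_ms, dtype=float), 1)


speeds = np.linspace(0.0, 15.0, 150_001)  # evenly spread "true" speeds
for name, report in [("whole knots", whole_knot_report), ("0.1 m/s", tenth_ms_report)]:
    err = np.abs(report(speeds) - speeds)
    print(f"{name}: max {err.max():.3f} m/s, mean {err.mean():.3f} m/s")
```

For evenly spread speeds, whole-knot rounding errs by about a quarter knot on average (roughly 0.13 m s⁻¹) and by up to half a knot (about 0.26 m s⁻¹). Rounding to 0.1 m s⁻¹ errs by at most 0.05 m s⁻¹.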

The probabilistic information provided by the EBCE is examined using attribute diagrams (Wilks 1995) along with the 90% confidence intervals. An example is the probability of the forecast temperatures or wind speeds exceeding a given value. The probabilities are calculated assuming that each ensemble member forecast is equally likely and are computed over the entire 95 days and all 17 forecast hours. Results indicate that all of the approaches tend to underpredict lower probabilities and overpredict higher probabilities for 2-m dewpoint temperatures greater than or equal to 294 K (70°F) (Fig. 8). The bias correction approaches behave similarly for 2-m temperature using a threshold of 303 K (86°F) (Fig. 9) and for 10-m wind speed using a threshold of 7 m s⁻¹ (Fig. 10). For these two variables, however, the underprediction occurs only for low-probability (<20%) events. Overall, the probabilities from the mesonet BCE, NWS BCE, and EBCE approaches are very similar and tend to have overlapping confidence intervals, indicating that their differences are not significant. In contrast, the probabilities from the RE mesonet approach lie farther from the diagonal line in the attribute diagrams (Figs. 8–10). Indeed, many of the probabilities from the raw ensemble have no skill for both 2-m temperature and 10-m wind speed. The raw ensemble also largely underpredicts the occurrence of 2-m dewpoint temperatures above the threshold value. These comparisons again highlight the improvements seen when using simple bias correction on the ensemble forecasts. No attempt is made to calibrate the probability forecasts as was done by Hamill and Colucci (1998). Stensrud and Yussouf (2005) show that the BCE forecasts can be calibrated using information from rank histograms, which yields more reliable forecasts.
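The equally-likely-member probabilities and the points of an attribute diagram can be sketched as follows. This is a minimal illustration, not the authors' verification code. It assumes that missing members are stored as NaN and excluded from the count, and that forecasts are grouped into ten equal-width probability bins. The function names are hypothetical.

```python
import numpy as np


def exceedance_probability(members, threshold):
    """Fraction of available members at or above the threshold.

    members: shape (n_cases, n_members); NaN marks a missing member.
    Each available member is treated as equally likely.
    """
    m = np.asarray(members, dtype=float)
    valid = ~np.isnan(m)
    hits = np.where(valid, m, -np.inf) >= threshold  # missing members never count
    return hits.sum(axis=1) / valid.sum(axis=1)


def reliability_table(prob, occurred, n_bins=10):
    """Mean forecast probability, observed frequency, and count per bin.

    These are the points plotted on an attribute diagram.
    """
    prob = np.asarray(prob, dtype=float)
    occurred = np.asarray(occurred, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(prob, edges) - 1, 0, n_bins - 1)
    counts = np.bincount(idx, minlength=n_bins)
    mean_prob = np.full(n_bins, np.nan)
    obs_freq = np.full(n_bins, np.nan)
    for b in range(n_bins):
        if counts[b] > 0:
            in_bin = idx == b
            mean_prob[b] = prob[in_bin].mean()
            obs_freq[b] = occurred[in_bin].mean()
    return mean_prob, obs_freq, counts


# Hypothetical 2-m dewpoint members (K) for two cases; threshold 294 K.
members = np.array([[295.0, 293.0, 296.0, 290.0],
                    [295.0, np.nan, 296.0, 290.0]])
probs = exceedance_probability(members, 294.0)
print(probs)
```

Points above the diagonal of an attribute diagram (observed frequency > forecast probability) correspond to underprediction. Points below the diagonal correspond to overprediction.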

To quantify these comparisons further, the Brier score (Brier 1950), a mean square error of the probability forecasts, and a Brier skill score (BSS; Wilks 1995) are calculated for the four approaches. The climatological frequency of the event is determined from the full 95 days of the observational dataset for each selected threshold amount. Smaller values of the Brier score indicate a more skillful forecast, as do larger values of the BSS. Results indicate that the Brier score of the EBCE is slightly larger than that of the mesonet BCE and NWS BCE approaches but lower than that of the RE mesonet for both temperatures and wind speed (Table 2). The BSS of the EBCE approach is smaller than those of the mesonet BCE and NWS BCE, yet all three of these methods are more skillful than the RE mesonet, as expected. The better performance of the mesonet BCE and NWS BCE approaches shows that having observations available to bias correct the model forecasts yields more accurate and skillful forecasts. Nevertheless, the interpolated mean bias values used by the EBCE still yield comparatively accurate and skillful forecast guidance for locations where no surface observations are available, and they are a significant improvement over the raw model forecasts.
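The Brier score and BSS can be sketched in a few lines. This sketch assumes that the reference forecast is a constant forecast of the sample climatological frequency of the event, as described above. With that reference, the reference Brier score reduces to $\bar{o}(1-\bar{o})$. The BSS is undefined if the event never or always occurs in the sample. The function names are hypothetical.

```python
import numpy as np


def brier_score(prob, occurred):
    """Mean squared error of probability forecasts (occurred is 0 or 1)."""
    p = np.asarray(prob, dtype=float)
    o = np.asarray(occurred, dtype=float)
    return np.mean((p - o) ** 2)


def brier_skill_score(prob, occurred):
    """BSS relative to a constant forecast of the sample climatological frequency."""
    o = np.asarray(occurred, dtype=float)
    clim = o.mean()  # event frequency over the whole verification sample
    bs_ref = brier_score(np.full_like(o, clim), o)  # equals clim * (1 - clim)
    return 1.0 - brier_score(prob, o) / bs_ref


p = [0.8, 0.2, 0.6, 0.4]
o = [1, 0, 1, 0]
print(brier_score(p, o), brier_skill_score(p, o))
```

For these four hypothetical forecasts, the Brier score is 0.1 and the climatological reference is 0.25, so the BSS is 0.6.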

## 5. Discussion

This study explores the possibility of using BCE forecasts of 2-m temperature, 2-m dewpoint temperature, and 10-m wind speed (Stensrud and Yussouf 2005), available only at NWS surface observing sites, to obtain forecasts of these near-surface variables at any given location within the model domain. A method is developed that interpolates the BCE 12-day mean bias values valid at NWS station locations to any given location using a Cressman weighting function. The raw model forecasts are bilinearly interpolated to this same location from the four closest grid points. The interpolated 12-day mean bias value is then applied to the bilinearly interpolated forecast value for today’s forecast. This method is tested using observations from the Oklahoma Mesonet as verification data. The EBCE mean forecast rmse and MAE are close to those from the original BCE mean as applied to both mesonet and NWS observations, and the differences in these errors are generally not statistically significant. Statistically significant differences between the EBCE and the mesonet and NWS BCE approaches are seen in the bias, or mean error, with the EBCE biases showing a distinct diurnal cycle in all three predicted variables. However, the probabilistic information provided by the EBCE is very similar to that provided by the mesonet and NWS BCE approaches. All three approaches yield forecasts that are reliable and skillful for the thresholds examined when evaluated using a Brier skill score. Comparisons against the raw ensemble data without any bias correction clearly show the improvements due to simple bias correction, both for predicting the ensemble mean and for providing probabilistic forecast guidance.

Admittedly, these calculations are limited to the summer months, so the performance of the BCE and EBCE approaches during the colder seasons is unknown. However, the results from Woodcock and Engel (2005) suggest that this type of approach may be useful in both the warm and cold seasons of the year. The BCE scheme itself might be improved by calculating the bias from a weighted average of the previous days rather than from a simple 12-day mean. Moreover, the verification data used in this study are limited to the state of Oklahoma. Verifying the EBCE approach over a larger domain is desirable, provided that reliable, high-quality observations similar to those of the Oklahoma Mesonet are available. Over the western United States and other regions of complex terrain, further adjustments to account for errors in the model terrain height fields will be needed. During the daytime, the correlation between model terrain height errors and 2-m temperature biases at NWS stations indicates a linear relationship. This relationship might help reduce the forecast temperature biases in complex terrain. However, no such simple relationship is found at night or for 2-m dewpoint temperature or wind speed, indicating that further study is needed. Myrick et al. (2005) show that short-period weather disturbances can lead to different biases in adjacent valleys. This makes accurate bias correction in complex terrain quite challenging when observational data are sparse.
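The weighted-average bias mentioned above as a possible improvement can be sketched as a generalization of the simple 12-day mean. The linearly increasing weights below are a hypothetical choice for illustration only; no particular weighting is tested in this study. With equal weights, the function reproduces the simple 12-day mean bias. Missing days, assumed to be stored as NaN, are skipped.

```python
import numpy as np


def weighted_bias(daily_errors, weights=None):
    """Weighted mean of past daily forecast-minus-observed errors (oldest first).

    With weights=None every day counts equally, which reproduces the simple
    12-day mean bias; missing days (NaN) are skipped.
    """
    e = np.asarray(daily_errors, dtype=float)
    w = np.ones_like(e) if weights is None else np.asarray(weights, dtype=float)
    valid = ~np.isnan(e)
    return np.sum(w[valid] * e[valid]) / np.sum(w[valid])


def linear_decay_weights(n_days=12):
    """Hypothetical weights: the most recent day counts n_days times the oldest."""
    return np.arange(1, n_days + 1, dtype=float)


# Hypothetical errors (K): the model error grew during the last six days.
errors = [1.0] * 6 + [3.0] * 6
equal = weighted_bias(errors)
recent = weighted_bias(errors, linear_decay_weights())
print(equal, recent)
```

The equal-weight bias is 2.0 K, while the recency-weighted bias (192/78, about 2.46 K) responds faster to the recent change in model error. Any real choice of weights would need the same kind of quantitative evaluation used to select the 12-day window.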

The comparisons suggest that the EBCE approach has the potential to produce accurate, skillful, and useful forecast products at any location within the model domain. Thus, forecasts for towns and cities that do not presently benefit from the availability of postprocessed statistical model output can be provided easily using the raw ensemble forecasts and routine NWS surface observations. Simple techniques such as the BCE and EBCE to bias correct ensemble forecasts should be more fully explored and tested in forecast operations to help provide useful and accurate guidance forecasts of many near-surface variables to the public.

## Acknowledgments

The authors are thankful to Jun Du, Jeff McQueen, Mike Baldwin, Stan Benjamin, Tracy Lorraine Smith, and Tanya Smirnova for providing the output from the forecast models used in this experiment. The authors are also thankful to the Oklahoma Mesonet, Patricia Miller, and Michael Barth for providing the mesonet data. The helpful and constructive comments of two anonymous reviewers are appreciated and gratefully acknowledged. Local computer assistance provided by Doug Kennedy, Steven Fletcher, Brett Morrow, and Karen Cooper is greatly appreciated. Chris Fiebrich is thanked for sharing his knowledge of wind sensors. Partial funding for this research was provided by NOAA/Office of Oceanic and Atmospheric Research under NOAA–University of Oklahoma Cooperative Agreement NA17RJ1227, U.S. Department of Commerce.

## REFERENCES

Benjamin, S. G., G. A. Grell, J. M. Brown, T. G. Smirnova, and R. Bleck, 2004: Mesoscale weather prediction with the RUC hybrid isentropic–terrain-following coordinate model. *Mon. Wea. Rev.*, **132**, 473–494.

Brier, G. W., 1950: Verification of forecasts expressed in terms of probability. *Mon. Wea. Rev.*, **78**, 1–3.

Brock, F. V., K. C. Crawford, R. L. Elliott, G. W. Cuperus, S. J. Stadler, H. L. Johnson, and M. D. Eilts, 1995: The Oklahoma Mesonet: A technical overview. *J. Atmos. Oceanic Technol.*, **12**, 5–19.

Cressman, G. P., 1959: An operational objective analysis system. *Mon. Wea. Rev.*, **87**, 367–374.

Eckel, F. A., and C. F. Mass, 2005: Aspects of effective mesoscale, short-range ensemble forecasting. *Wea. Forecasting*, **20**, 328–350.

Hamill, T. M., and S. J. Colucci, 1998: Evaluation of Eta–RSM ensemble probabilistic precipitation forecasts. *Mon. Wea. Rev.*, **126**, 711–724.

Juang, H-M. H., S-Y. Hong, and M. Kanamitsu, 1997: The NCEP Regional Spectral Model: An update. *Bull. Amer. Meteor. Soc.*, **78**, 2125–2143.

Klemp, J., 2004: Next-generation mesoscale modeling: A technical overview of WRF. Preprints, *20th Conf. on Weather Analysis and Forecasting*, Seattle, WA, Amer. Meteor. Soc., CD-ROM, 11.2.

MacDermaid, C., R. C. Lipschutz, P. Hildreth, R. A. Ryan, A. B. Stanley, M. F. Barth, and P. A. Miller, 2005: Architecture of MADIS data processing and distribution at FSL. Preprints, *21st Int. Conf. on Interactive Information and Processing Systems (IIPS) for Meteorology, Oceanography, and Hydrology*, San Diego, CA, Amer. Meteor. Soc., CD-ROM, P2.39.

McQueen, J., J. Du, B. Zhou, G. Manikin, B. Ferrier, H-Y. Chuang, G. DiMego, and Z. Toth, 2005: Recent upgrades to the NCEP Short Range Ensemble Forecasting System (SREF) and future plans. Preprints, *17th Conf. on Numerical Weather Prediction*, Washington, DC, Amer. Meteor. Soc., 11A.2.

Mullen, S. L., and R. Buizza, 2001: Quantitative precipitation forecasts over the United States by the ECMWF Ensemble Prediction System. *Mon. Wea. Rev.*, **129**, 638–663.

Myrick, D. T., J. D. Horel, and S. M. Lazarus, 2005: Local adjustment of the background error correlation for surface analyses over complex terrain. *Wea. Forecasting*, **20**, 149–160.

Rogers, E., T. Black, D. Deaven, G. Dimego, Q. Zhao, M. Baldwin, N. Junker, and Y. Lin, 1996: Changes to the operational “early” Eta Analysis/Forecast System at the National Centers for Environmental Prediction. *Wea. Forecasting*, **11**, 391–413.

Shafer, M. A., C. A. Fiebrich, D. S. Arndt, S. E. Fredrickson, and T. W. Hughes, 2000: Quality assurance procedures in the Oklahoma Mesonetwork. *J. Atmos. Oceanic Technol.*, **17**, 474–494.

Stauffer, D. R., and N. L. Seaman, 1994: Multiscale four-dimensional data assimilation. *J. Appl. Meteor.*, **33**, 416–434.

Stensrud, D. J., and N. Yussouf, 2003: Short-range ensemble predictions of 2-m temperature and dewpoint temperature over New England. *Mon. Wea. Rev.*, **131**, 2510–2524.

Stensrud, D. J., and N. Yussouf, 2005: Bias-corrected short-range ensemble forecasts of near surface variables. *Meteor. Appl.*, **12**, 217–230.

Stensrud, D. J., and Coauthors, 2006: The New England High-Resolution Temperature Program (NEHRTP). *Bull. Amer. Meteor. Soc.*, **87**, 491–498.

Wilks, D. S., 1995: *Statistical Methods in the Atmospheric Sciences: An Introduction*. Academic Press, 467 pp.

Woodcock, F., and C. Engel, 2005: Operational consensus forecasts. *Wea. Forecasting*, **20**, 101–111.

Zhang, D-L., and W-Z. Zheng, 2004: Diurnal cycles of surface winds and temperatures as simulated by five boundary layer parameterizations. *J. Appl. Meteor.*, **43**, 157–169.

Table 1. Descriptions of the models used to construct the ensemble, the organizations that provided the forecasts, the model grid spacing (km), the model forecast start times (UTC) and forecast length (h), and the number of forecasts provided.

Table 2. Values of the Brier score (BS) and Brier skill score (BSS) of the three variables for the selected thresholds. The EBCE, mesonet BCE, and RE mesonet approaches are applied at the 116 Oklahoma Mesonet locations, while the NWS BCE approach is applied at the 53 NWS station locations within the same region.