  • Allen, D. M., 1971: Mean square error of prediction as a criterion for selecting variables. Technometrics, 13, 469–475.

  • Antolik, M. S., 2000: An overview of the National Weather Service's centralized statistical quantitative precipitation forecasts. J. Hydrol., 239, 306–337.

  • Briggs, P. R., and J. G. Cogley, 1996: Topographic bias in mesoscale precipitation networks. J. Climate, 9, 205–218.

  • Charba, J. P., 1977: Operational system for predicting thunderstorms two to six hours in advance. NOAA Tech. Memo. NWS TDL-64, 24 pp.

  • Chelliah, M., and C. F. Ropelewski, 2000: Reanalyses-based tropospheric temperature estimates: Uncertainties in the context of global climate change detection. J. Climate, 13, 3187–3205.

  • Connelly, B. A., D. T. Braatz, J. B. Halquist, M. M. DeWeese, L. Larson, and J. J. Ingram, 1999: Advanced hydrologic prediction system. J. Geophys. Res., 104 (D16), 19655–19660.

  • Daly, C., R. P. Neilson, and D. L. Phillips, 1994: A statistical-topographic model for mapping climatological precipitation over mountainous terrain. J. Appl. Meteor., 33, 140–158.

  • Day, G. N., 1985: Extended streamflow forecasting using NWSRFS. ASCE J. Water Res. Plann. Manage., 111, 157–170.

  • Dey, C. H., and L. L. Morone, 1985: Evolution of the NMC Global Assimilation System: January 1982–December 1983. Mon. Wea. Rev., 113, 304–318.

  • DiMego, G., 1988: The National Meteorological Center Regional Analysis System. Mon. Wea. Rev., 116, 1137–1156.

  • Eischeid, J. K., P. A. Pasteris, H. F. Diaz, M. S. Plantico, and N. J. Lott, 2000: Creating a serially complete, national daily time series of temperature and precipitation for the western United States. J. Appl. Meteor., 39, 1580–1591.

  • Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211.

  • Hagemann, S., and L. D. Gates, 2001: Validation of the hydrological cycle of ECMWF and NCEP reanalyses using the MPI hydrological discharge model. J. Geophys. Res., 106 (D2), 1503–1510.

  • Hamlet, A. F., and D. P. Lettenmaier, 1999: Columbia River streamflow forecasting based on ENSO and PDO climate signals. ASCE J. Water Res. Plann. Manage., 125, 333–341.

  • Hay, L. E., M. P. Clark, R. L. Wilby, W. J. Gutowski, R. W. Arritt, E. S. Takle, Z. Pan, and G. H. Leavesley, 2002: Use of regional climate model output for hydrologic simulations. J. Hydrometeor., 3, 571–590.

  • Janowiak, J. E., A. Gruber, C. R. Kondragunta, R. E. Livezey, and G. J. Huffman, 1998: A comparison of the NCEP–NCAR reanalysis precipitation and the GPCP rain gauge–satellite combined dataset with observational error considerations. J. Climate, 11, 2960–2979.

  • Jensenius, J. S., Jr., 1992: The use of grid-binary variables as predictors for statistical weather forecasting. Preprints, 12th Conf. on Probability and Statistics in the Atmospheric Sciences, Toronto, ON, Canada, Amer. Meteor. Soc., 225–230.

  • Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471.

  • Kalnay, E., S. J. Lord, and R. D. McPherson, 1998: Maturity of operational numerical weather prediction: Medium range. Bull. Amer. Meteor. Soc., 79, 2753–2769.

  • Kistler, R., and Coauthors, 2001: The NCEP–NCAR 50-year reanalysis: Monthly means CD-ROM and documentation. Bull. Amer. Meteor. Soc., 82, 247–267.

  • Leavesley, G. H., R. W. Lichty, B. M. Troutman, and L. G. Saindon, 1983: Precipitation-runoff modeling system: User's manual. U.S. Geological Survey Water Investment Rep. 83-4238, 207 pp.

  • Leavesley, G. H., P. J. Restrepo, S. L. Markstrom, M. Dixon, and L. G. Stannard, 1996: The modular modeling system—MMS: User's manual. U.S. Geological Survey Open File Rep. 96-151, 142 pp.

  • Mantua, N. J., S. R. Hare, Y. Zhang, J. M. Wallace, and R. C. Francis, 1997: A Pacific Interdecadal Climate Oscillation with impacts on salmon production. Bull. Amer. Meteor. Soc., 78, 1069–1079.

  • Michaelsen, J., 1987: Cross-validation in statistical climate forecast models. J. Climate Appl. Meteor., 26, 1589–1600.

  • Pan, H-L., and W-S. Wu, 1994: Implementing a mass flux convection parameterization package for the NMC medium range forecast model. Preprints, 10th Conf. on Numerical Weather Prediction, Portland, OR, Amer. Meteor. Soc., 96–98.

  • Panofsky, H. A., and G. W. Brier, 1963: Some Applications of Statistics to Meteorology. Mineral Industries Continuing Education, College of Mineral Industries, The Pennsylvania State University, 224 pp.

  • Peppler, R. A., and P. J. Lamb, 1989: Tropospheric static stability and central North American growing season rainfall. Mon. Wea. Rev., 117, 1156–1180.

  • Reek, T., S. R. Doty, and T. W. Owen, 1992: A deterministic approach to the validation of historical daily temperature and precipitation data from the cooperative network. Bull. Amer. Meteor. Soc., 73, 753–762.

  • Reid, P. A., P. D. Jones, O. Brown, C. M. Goodess, and T. D. Davies, 2001: Assessments of the reliability of NCEP circulation data and relationships with surface climate by direct comparisons with station based data. Climate Res., 17, 247–261.

  • Serreze, M. C., and C. M. Hurst, 2000: Representation of mean Arctic precipitation from NCEP–NCAR and ERA reanalyses. J. Climate, 13, 182–201.

  • Trenberth, K. E., and C. J. Guillemot, 1998: Evaluation of the atmospheric moisture and hydrologic cycle in the NCEP/NCAR reanalyses. Climate Dyn., 14, 213–231.

  • Vislocky, R. L., and J. M. Fritsch, 1997: Performance of an advanced MOS system in the 1996–97 National Collegiate Weather Forecasting Contest. Bull. Amer. Meteor. Soc., 78, 2851–2857.

  • Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences: An Introduction. Academic Press, 467 pp.

  • Woollen, J. S., E. Kalnay, L. Gandin, W. Collins, S. Saha, R. Kistler, M. Kanamitsu, and M. Chelliah, 1994: Quality control in the reanalysis system. Bull. Amer. Meteor. Soc., 75, 13–14.

Fig. 1. Systematic biases in NCEP forecasts of precipitation and 2-m air temperature, showing biases for day+0 (top row) and then biases for each subsequent forecast lead time. Precipitation biases are expressed as a percentage of the PRISM climatology, and temperature biases are expressed as a departure from the PRISM climatology (°C).

Fig. 2. Accuracy of the raw NCEP precipitation and 2-m max temperature forecasts, showing forecast skill for day+0 (top row) and then skill for each subsequent forecast lead time. Forecast skill for precipitation forecasts is assessed using Spearman rank correlations, and forecast skill for 2-m max temperature forecasts is assessed using squared Pearson correlations (r²).

Fig. 3. Scatterplots of the observed and modeled MOS-based long-term (1977–98) mean for the four midseason months of Jan (column 1), Apr (column 2), Jul (column 3), and Oct (column 4) for max and min temperature (°C; top two rows), precipitation occurrence (expressed as a percentage of precipitation days; third row), and precipitation amounts (mm day⁻¹; bottom row). Each point illustrates the observed and modeled mean for an individual station.

Fig. 4. Accuracy of the MOS-based precipitation and 2-m max temperature forecasts, showing forecast skill for day+0 (top row) and then skill for each subsequent forecast lead time. Forecast skill for precipitation forecasts is assessed using Spearman rank correlations, and forecast skill for 2-m max temperature forecasts is assessed using squared Pearson correlations (r²).

Fig. 5. Accuracy of the raw NCEP and the MOS-based precipitation and temperature forecasts for the four midseason months of Jan (column 1), Apr (column 2), Jul (column 3), and Oct (column 4). Shown are skill scores for max and min temperature (squared Pearson correlation; top two rows), precipitation occurrence (Kuipers skill score; third row), and precipitation amounts (Spearman rank correlation; bottom row). Raw NCEP predictions are expressed with a dotted line (squares), and the MOS-based predictions are expressed with a solid line (triangles).

Fig. 6. Location of study basins.

Fig. 7. RPSS calculated for each forecast day and month, using (top) MOS-based precipitation and temperature forecasts and (bottom) the climatological ESP approach. See text for further details.

Use of Medium-Range Numerical Weather Prediction Model Output to Produce Forecasts of Streamflow

  • 1 Center for Science and Technology Policy Research, Cooperative Institute for Research in Environmental Sciences, University of Colorado, Boulder, Colorado
  • 2 U.S. Geological Survey, Denver, Colorado

Abstract

This paper examines an archive containing over 40 years of 8-day atmospheric forecasts over the contiguous United States from the NCEP reanalysis project to assess the possibilities for using medium-range numerical weather prediction model output for predictions of streamflow. This analysis shows the biases in the NCEP forecasts to be quite extreme. In many regions, systematic precipitation biases exceed 100% of the mean, with temperature biases exceeding 3°C. In some locations, biases are even higher. The accuracy of NCEP precipitation and 2-m maximum temperature forecasts is computed by interpolating the NCEP model output for each forecast day to the location of each station in the NWS cooperative network and computing the correlation with station observations. Results show that the accuracy of the NCEP forecasts is rather low in many areas of the country. Most apparent is the generally low skill in precipitation forecasts (particularly in July) and low skill in temperature forecasts in the western United States, the eastern seaboard, and the southern tier of states. These results outline a clear need for additional processing of the NCEP Medium-Range Forecast Model (MRF) output before it is used for hydrologic predictions.

Techniques of model output statistics (MOS) are used in this paper to downscale the NCEP forecasts to station locations. Forecasted atmospheric variables (e.g., total column precipitable water, 2-m air temperature) are used as predictors in a forward screening multiple linear regression model to improve forecasts of precipitation and temperature for stations in the National Weather Service cooperative network. This procedure effectively removes all systematic biases in the raw NCEP precipitation and temperature forecasts. MOS guidance also results in substantial improvements in the accuracy of maximum and minimum temperature forecasts throughout the country. For precipitation, forecast improvements were less impressive. MOS guidance increases the accuracy of precipitation forecasts over the northeastern United States, but overall, the accuracy of MOS-based precipitation forecasts is slightly lower than the raw NCEP forecasts.

Four basins in the United States were chosen as case studies to evaluate the value of MRF output for predictions of streamflow. Streamflow forecasts using MRF output were generated for one rainfall-dominated basin (Alapaha River at Statenville, Georgia) and three snowmelt-dominated basins (Animas River at Durango, Colorado; East Fork of the Carson River near Gardnerville, Nevada; and Cle Elum River near Roslyn, Washington). Hydrologic model output forced with measured-station data was used as “truth” to focus attention on the hydrologic effects of errors in the MRF forecasts. Eight-day streamflow forecasts produced using the MOS-corrected MRF output as input (MOS) were compared with those produced using the climatic Ensemble Streamflow Prediction (ESP) technique. MOS-based streamflow forecasts showed increased skill in the snowmelt-dominated river basins, where daily variations in streamflow are strongly forced by temperature. In contrast, the skill of MOS forecasts in the rainfall-dominated basin (the Alapaha River) was equivalent to the skill of the ESP forecasts. Further improvements in streamflow forecasts require more accurate local-scale forecasts of precipitation and temperature, more accurate specification of basin initial conditions, and more accurate model simulations of streamflow.

Corresponding author address: Dr. Martyn P. Clark, Center for Science and Technology Policy Research, University of Colorado, 1333 Grandview Ave., UCB 488, Boulder, CO 80309-0488. Email: clark@vorticity.colorado.edu

1. Introduction

Rapid population growth and economic development, along with changing social demands on freshwater resources, have imposed new challenges on water management in many regions of the United States. Managers must retain as much water as possible in reservoirs to meet the needs of irrigation, hydropower generation, and domestic consumption, while also ensuring an adequate supply of water for recreational uses and meeting stringent water quality standards, regulations for the maintenance of aquatic ecosystems, and the special needs of threatened or endangered species. Reservoir space also must be maintained to protect downstream homes, farms, and businesses from flooding.

Accurate streamflow forecasts can play a key role in optimizing the use of water. Traditionally, hydrologic forecasts in the United States have been made using the climatic Ensemble Streamflow Prediction (ESP) procedure (Day 1985). In this approach, a hydrologic model is driven with observed precipitation and temperature data up to the beginning of the forecast to estimate basin initial conditions. Then precipitation and temperature data for the same dates from every other year in the historical record are used to produce ensemble forecasts of streamflow. For example, an 8-day forecast initialized on 1 January 2004 could use station observations from 2 to 9 January 1950 as inputs for ensemble 1, station observations from 2 to 9 January 1951 as inputs for ensemble 2, and so on through station observations from 2 to 9 January 1999 as inputs for ensemble 50. When these ensembles are run through a hydrologic model, the method provides an ensemble of possible streamflow outcomes given the antecedent conditions (e.g., soil moisture, water equivalent of the accumulated snowpack) at the start of the forecast. Forecast accuracy is therefore dependent on accurate specification of conditions over the basin at the start of the forecast and on the influence of those conditions on the basin hydrologic response. Accuracy also is dependent on the similarity between future weather conditions and the ensembles of historic data from previous years. This approach works well in river systems where substantial lag times are introduced because of storage of water in snowpack or in subsurface and groundwater reservoirs. However, because ESP weights each year in the historical record equally, the approach often yields a wide range of possible outcomes and low probabilistic forecast skill.
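
The construction of the climatic ESP forcing windows described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `esp_forcing_windows`, its arguments, and the handling of 29 February are hypothetical and are not taken from Day (1985) or from any operational code. The sketch returns the historical date window that each ensemble member would draw its station observations from.

```python
from datetime import date, timedelta


def esp_forcing_windows(init_date, first_year, last_year, lead_days=8):
    """Return one (year, start, end) window per ensemble member.

    Each member reuses the calendar dates of the forecast period
    (init_date + 1 day through init_date + lead_days) from a
    different historical year. Hypothetical helper for illustration.
    """
    windows = []
    for year in range(first_year, last_year + 1):
        if year == init_date.year:
            continue  # the forecast year itself is not a historical member
        try:
            anchor = date(year, init_date.month, init_date.day)
        except ValueError:
            # Assumption: a 29 Feb initialization maps to 28 Feb
            # in years that are not leap years.
            anchor = date(year, 2, 28)
        start = anchor + timedelta(days=1)
        end = start + timedelta(days=lead_days - 1)
        windows.append((year, start, end))
    return windows


windows = esp_forcing_windows(date(2004, 1, 1), 1950, 1999)
print(len(windows), windows[0], windows[-1])
```

With the dates used in the example in the text, the first window covers 2 to 9 January 1950 and the last covers 2 to 9 January 1999, giving 50 members. Every member carries equal weight, which is the property that leads to the wide ensemble spread noted above.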

A number of studies have suggested that it is possible to improve the accuracy of probabilistic streamflow forecasts by including in the ESP approach information from meteorological forecasts and climate outlooks [e.g., see the plans for an Advanced Hydrologic Prediction System (AHPS) by the National Weather Service in the United States (Connelly et al. 1999)]. As a first step in this direction, Hamlet and Lettenmaier (1999) modified the ESP approach by restricting ensemble members to years that are similar in terms of the phase of the El Niño–Southern Oscillation (ENSO) and the phase of the Pacific decadal oscillation (PDO). In most cases this provides a set of ensembles that are more tightly clustered than the full ensemble. On shorter time scales, further reductions in ensemble spread may be realized by replacing the ensemble of data from previous years with output from atmospheric forecast models.

This paper explores the utility of atmospheric forecasts for hydrologic predictions on time scales of up to 8 days. This study first evaluates the systematic biases and the accuracy of Medium-Range Forecast Model (MRF) predictions of precipitation and temperature over the contiguous United States and introduces procedures to improve raw MRF output through downscaling. Results are based on the 40+ yr archive of 8-day atmospheric forecasts from the National Centers for Environmental Prediction (NCEP) reanalysis project (described later). As an example application of this approach, this study assesses the hydrologic forecast accuracy obtained when forcing a distributed hydrologic model with the MRF output for four basins across the contiguous United States.

2. The NCEP forecast archive

a. Project overview

The NCEP reanalysis project (Kalnay et al. 1996; Kistler et al. 2001) produced a retroactive 40+ year record of global atmospheric fields and surface fluxes derived from a numerical weather prediction and data assimilation system kept unchanged over the analysis period. Use of a fixed model eliminates pseudoclimate jumps in archived time series associated with frequent upgrades in the operational modeling system used at NCEP and allows an assessment of the accuracy of a Numerical Weather Prediction (NWP) model over a long time period. However, temporal inconsistencies can still be present because of changes through time in the amount, type, and quality of the available assimilation data. The model used for the reanalysis is identical to the Medium-Range Forecast Model implemented operationally at NCEP in January 1995, except that the horizontal resolution is twice as coarse in the reanalysis version. Every 5 days, a single realization of an 8-day atmospheric forecast was run. For the period 1958–98, this provides more than 2500 8-day forecasts that can be compared with observations.

b. NCEP Medium-Range Forecast Model description

The NCEP reanalysis is performed with a T62 model (approximately 1.9° horizontal resolution) with 28 vertical sigma levels and the spectral statistical interpolation (SSI) for assimilation (Kalnay et al. 1996). Assimilation data are formatted into the standard World Meteorological Organization (WMO) Binary Universal Form for the Representation of meteorological data (BUFR) and then evaluated by quality control procedures (Dey and Morone 1985; Woollen et al. 1994; DiMego 1988; Kalnay et al. 1996). Data sources include rawinsonde profiles, surface marine reports from the Comprehensive Ocean–Atmosphere Data Set (COADS), aircraft observations of wind and temperature, synoptic reports of surface pressure over land, vertical temperature profiles from the Television Infrared Observation Satellite (TIROS) Operational Vertical Sounder (TOVS) over the ocean, TOVS temperature soundings over land above 100 hPa, surface wind speeds from the Special Sensor Microwave Imager (SSM/I), and satellite cloud drift winds.

Two types of precipitation are computed: convective and grid scale (dynamic). Convection is based on a simplified Arakawa–Schubert scheme (Pan and Wu 1994) that was found to result in improved prediction of precipitation over the continental United States and the tropics as compared to the previous Kuo parameterization (Kalnay et al. 1996). Dynamic precipitation is parameterized by starting at the top layer and checking for supersaturation. If supersaturated, latent heat is released to adjust the specific humidity and temperature to saturation, with the excess water falling to the next lower layer. If this next layer is supersaturated then adjustment to saturation occurs again, and the amount of precipitation is added to that from the higher layer. However, if the layer is unsaturated, some or all of the precipitation is evaporated. The process continues downward with all precipitation that penetrates to the bottom layer allowed to fall to the surface.
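
The top-down treatment of grid-scale precipitation can be illustrated with a heavily simplified column sketch. It is not the MRF code: the function `grid_scale_precip` is hypothetical, layer moisture is treated as a water amount that can be summed directly across layers, the latent-heat adjustment of temperature is ignored, and falling precipitation is assumed to evaporate until an unsaturated layer reaches saturation (the text says only that "some or all" evaporates).

```python
def grid_scale_precip(q, q_sat):
    """Moisture-only sketch of a top-down saturation adjustment.

    q, q_sat: layer water content and its saturation value, ordered
    from the top layer to the bottom layer, in the same units.
    Returns (surface_precip, adjusted_q).
    """
    falling = 0.0          # precipitation falling out of the layers above
    adjusted = []
    for layer_q, layer_sat in zip(q, q_sat):
        if layer_q > layer_sat:
            # Supersaturated: remove the excess and add it to the
            # precipitation arriving from higher layers.
            falling += layer_q - layer_sat
            adjusted.append(layer_sat)
        else:
            # Unsaturated: evaporate falling precipitation into the
            # layer, up to saturation (assumption of this sketch).
            evaporated = min(falling, layer_sat - layer_q)
            falling -= evaporated
            adjusted.append(layer_q + evaporated)
    # Whatever penetrates the bottom layer reaches the surface.
    return falling, adjusted


precip, profile = grid_scale_precip([6.0, 5.0, 1.0], [4.0, 4.0, 3.0])
print(precip, profile)
```

In this example the two upper layers shed a total of 3 units, of which 2 units evaporate into the dry bottom layer, so 1 unit reaches the surface and total water is conserved.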

3. Station data

This study uses daily precipitation and maximum and minimum temperature data from a network of over 11 000 National Weather Service (NWS) manual cooperative climate observing stations across the contiguous United States. These data were extracted from the National Climatic Data Center (NCDC) Summary of the Day Dataset by J. Eischeid, National Oceanic and Atmospheric Administration (NOAA) Climate Diagnostics Center, Boulder, Colorado (Eischeid et al. 2000). Quality control performed by NCDC includes the procedures described by Reek et al. (1992) that flag questionable data based on checks for (a) outliers, based on extreme values defined for each state; (b) internal consistency among variables (e.g., maximum temperature less than minimum temperature); (c) constant temperature (e.g., 5 or more days with the same temperature are suspect); (d) excessive diurnal temperature range; (e) invalid relations between precipitation, snowfall, and snow depth; and (f) unusual spikes in temperature time series. Records at most of these stations start in 1948 and continue through 1998. We restrict use to only the “best” stations in the Eischeid archive. These are defined as those with less than 10% missing or questionable data during the period 1958–99.
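
Two of the quality control checks listed above, and the "best station" criterion, can be sketched as follows. This is an illustrative sketch only and not the NCDC or Reek et al. (1992) implementation: the function names are hypothetical, missing values are assumed to be stored as `None`, and the other checks (state-based outliers, diurnal range, snowfall relations, spikes) are omitted because their thresholds are not given in the text.

```python
def flag_internal_consistency(tmax, tmin):
    """Check (b): flag days where maximum temperature is below minimum."""
    return [
        hi is not None and lo is not None and hi < lo
        for hi, lo in zip(tmax, tmin)
    ]


def flag_constant_runs(temps, min_run=5):
    """Check (c): flag every day in a run of min_run or more equal values."""
    flags = [False] * len(temps)
    start = 0
    for i in range(1, len(temps) + 1):
        run_ended = i == len(temps) or temps[i] != temps[start]
        if run_ended:
            # Runs of missing values (None) are not treated as suspect here.
            if temps[start] is not None and i - start >= min_run:
                for j in range(start, i):
                    flags[j] = True
            start = i
    return flags


def is_best_station(values, flags, max_bad_fraction=0.10):
    """Keep a station if less than 10% of days are missing or flagged."""
    bad = sum(1 for v, f in zip(values, flags) if v is None or f)
    return bad / len(values) < max_bad_fraction


tmax = [10.0, 12.0, 12.0, 12.0, 12.0, 12.0, 9.0]
tmin = [2.0, 3.0, 13.0, 1.0, 0.0, 4.0, 5.0]
print(flag_internal_consistency(tmax, tmin))
print(flag_constant_runs(tmax))
```

A station record would be screened by combining the flags from all checks before applying `is_best_station` over the analysis period.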

Observation times for stations in the co-op network are mixed. Some co-op observers take measurements in the morning, some observers take measurements in the afternoon, and some observers take measurements in the evening. Specific observation times sometimes vary through time, and are not known for all stations. To address these inconsistencies, the forecast model output was averaged for the three 12-h periods surrounding the day of the observation.

4. Accuracy of the NCEP model

a. Systematic model biases

As a first step in evaluating the accuracy of the NCEP model, the systematic biases in NCEP temperature and precipitation forecasts are examined. Mean biases are evaluated using monthly climatologies of precipitation and temperature derived from the Parameter-elevation Regressions on Independent Slopes Model (PRISM) system (Daly et al. 1994). PRISM uses multiple linear regression techniques to distribute monthly climatologies of precipitation and temperature from a dense network of stations to a 2-km digital elevation model (DEM) over the contiguous United States. The PRISM climatologies are available commercially from Oregon State University's Spatial Climate Analysis Service. In this analysis, the PRISM climatologies were regridded to the NCEP Gaussian grid over the contiguous United States using an average of values from all PRISM grids within each NCEP grid box. The elevation of the resampled-PRISM DEM matches the elevation of the NCEP grid almost perfectly (not shown) and avoids introducing artificial biases associated with differences between the elevation of model grid points and the elevation of individual stations in the NCDC archive (e.g., see Briggs and Cogley 1996). Corresponding climatologies (1961–90) for the NCEP model were computed by averaging the 12-hourly data for each day in the forecast cycle. The 12-h MRF output represents an average precipitation rate and average temperature for the 12 h prior to the forecast time.
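
The regridding step, in which all fine-resolution PRISM cells falling inside a coarse model grid box are averaged, can be sketched as below. The sketch is an assumption-laden illustration: the function `box_average` and its inputs are hypothetical, the mean is unweighted, and grid boxes are described by explicit latitude and longitude edges (which allows the unequal latitude spacing of a Gaussian grid).

```python
from bisect import bisect_right


def box_average(fine_points, lat_edges, lon_edges):
    """Average fine-grid values inside each coarse grid box.

    fine_points: iterable of (lat, lon, value) for the fine grid.
    lat_edges, lon_edges: ascending box edges of the coarse grid.
    Returns a 2D list [lat_box][lon_box]; None marks empty boxes.
    """
    nlat, nlon = len(lat_edges) - 1, len(lon_edges) - 1
    sums = [[0.0] * nlon for _ in range(nlat)]
    counts = [[0] * nlon for _ in range(nlat)]
    for lat, lon, value in fine_points:
        i = bisect_right(lat_edges, lat) - 1   # index of containing lat box
        j = bisect_right(lon_edges, lon) - 1   # index of containing lon box
        if 0 <= i < nlat and 0 <= j < nlon:    # skip points outside the grid
            sums[i][j] += value
            counts[i][j] += 1
    return [
        [sums[i][j] / counts[i][j] if counts[i][j] else None
         for j in range(nlon)]
        for i in range(nlat)
    ]


points = [(40.5, -105.5, 10.0), (41.5, -104.5, 20.0), (41.0, -103.0, 6.0)]
coarse = box_average(points, [40.0, 42.0], [-106.0, -104.0, -102.0])
print(coarse)
```

Applying the same averaging to the fine-resolution DEM gives the resampled elevation that is compared with the model grid elevation in the text.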

Systematic biases in precipitation and 2-m air temperature are summarized for the months of January and July in Fig. 1. The figure shows biases for day+0 (top row) and then for each subsequent forecast lead time. Precipitation biases are expressed as a percentage of the observed (PRISM) mean. On day+0 when the NCEP model atmosphere is strongly constrained by observed data, significant biases are evident in both precipitation and 2-m temperature. Most apparent are the negative temperature biases in the western United States in January and the previously documented positive precipitation biases in the southeastern United States in July (e.g., see Janowiak et al. 1998; Trenberth and Guillemot 1998). The bias characteristics change with forecast lead time (Fig. 1). In the most general sense this reflects the NCEP model “drifting” away from the observed climate toward the model's climate. In January, note the evolution of positive precipitation biases over the western Great Plains, the reduction of the negative day+0 temperature biases over the western United States, and the emergence of positive temperature biases over the northern Great Plains. Of note for July is the disappearance of the positive precipitation biases in the southeastern United States and the strengthening of negative temperature biases in the same region.
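
For reference, the two bias measures described above can be written out explicitly. The symbols are introduced here for illustration and do not appear in the original text; the sign convention (model minus PRISM) is an assumption, chosen so that a positive value means the model is wetter or warmer than the climatology.

```latex
\begin{align}
B_P &= 100 \times
  \frac{\overline{P}_{\mathrm{NCEP}} - \overline{P}_{\mathrm{PRISM}}}
       {\overline{P}_{\mathrm{PRISM}}} \quad (\%), \\
B_T &= \overline{T}_{\mathrm{NCEP}} - \overline{T}_{\mathrm{PRISM}}
  \quad (^{\circ}\mathrm{C}),
\end{align}
```

where the overbars denote monthly climatological means for a given grid box and forecast lead time. As a purely hypothetical example, a grid box with a PRISM mean of 2.0 mm day⁻¹ and a model mean of 4.5 mm day⁻¹ would have a precipitation bias of 100 × (4.5 − 2.0)/2.0 = 125%.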

These biases are presented as an example of substantial biases in state-of-the-art NWP models. Examples of biases in other NWP models and other regions are provided for day+0 output by Chelliah and Ropelewski (2000), Serreze and Hurst (2000), Reid et al. (2001), and Hagemann and Gates (2001). Chelliah and Ropelewski (2000) compared tropospheric temperature from the Microwave Sounding Unit Channel 2 (MSU Ch2) with estimates from the NCEP reanalysis, the European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis, and the National Aeronautics and Space Administration Data Assimilation Office (NASA DAO) reanalysis. NCEP and ECMWF temperatures averaged over a near-global domain (80°N–80°S) were approximately 2°C higher than the MSU Ch2 values, with those from the NASA DAO reanalysis exhibiting positive biases of 3°C. Serreze and Hurst (2000) diagnosed problems with the NCEP and ECMWF reanalysis in simulating monthly Arctic precipitation. They found that both models underestimate precipitation over the Atlantic side of the Arctic. The most significant problem is a large overprediction of summertime convective precipitation over Arctic land areas in the NCEP reanalysis. Hagemann and Gates (2001) used NCEP and ECMWF reanalysis output to drive a hydrologic model for several large river basins throughout the globe. Of particular note was a wintertime cold bias in the ECMWF 2-m temperatures over high latitudes, resulting in a delay in spring streamflow. Excessive summer precipitation over Northern Hemisphere land areas in the NCEP reanalysis resulted in positive streamflow biases. Such biases need to be removed before NWP model output can be used in hydrologic applications.

b. Forecast accuracy

If biases are systematic, the NCEP model may still have considerable skill in forecasting day-to-day variations in precipitation and temperature. This is indicated by the results of Kalnay et al. (1998), who used the same forecast archive that is used in this study and showed that the NCEP model has appreciable skill in forecasting daily variability in 500-hPa height up to forecast lead times of 8 days. The accuracy of NCEP precipitation and 2-m maximum temperature forecasts is computed by interpolating the NCEP model output for each forecast day to the location of each station in the NWS cooperative network (see section 3) and computing an appropriate skill score. The skill of precipitation forecasts is measured by Spearman rank correlations, and the skill of the 2-m maximum temperatures is measured by the explained variance using squared Pearson correlations (i.e., the r² value). Spearman rank correlations are more appropriate than Pearson correlations when normal distributions cannot be assumed (as is the case for daily precipitation). We use the Kuipers skill score (Wilks 1995) to evaluate the accuracy of precipitation occurrence predictions, that is, how well forecasted wet (dry) days match observed wet (dry) days. To avoid the possibility of spuriously high correlations that can result from matching zero-precipitation days, Spearman rank correlations for precipitation are only computed for days when both the station and NCEP model report precipitation. This reduces the number of days for analysis, particularly in dry regions and at longer forecast lead times when precipitation occurrence in the NCEP model is poorly matched with precipitation occurrence in observed records. Skill scores are only computed if there are more than 50 valid days available for analysis. More details on the skill scores are provided in the appendix.
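
The three skill measures described above can be sketched in plain Python. This is an illustrative sketch and not the code used in the study: all function names are hypothetical, a "wet" day is assumed to be any day with precipitation greater than zero (the text does not state a threshold), and ties in the rank correlation are handled with average ranks.

```python
def pearson(x, y):
    """Pearson correlation; returns None if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wet_day_spearman(fcst, obs, min_days=50):
    """Spearman correlation using only days wet in both series."""
    pairs = [(f, o) for f, o in zip(fcst, obs) if f > 0 and o > 0]
    if len(pairs) <= min_days:      # require more than min_days valid days
        return None
    f, o = zip(*pairs)
    return pearson(average_ranks(f), average_ranks(o))


def kuipers_skill_score(fcst, obs):
    """Hit rate minus false alarm rate for wet/dry occurrence."""
    hits = sum(f > 0 and o > 0 for f, o in zip(fcst, obs))
    misses = sum(f <= 0 and o > 0 for f, o in zip(fcst, obs))
    false_alarms = sum(f > 0 and o <= 0 for f, o in zip(fcst, obs))
    correct_neg = sum(f <= 0 and o <= 0 for f, o in zip(fcst, obs))
    if hits + misses == 0 or false_alarms + correct_neg == 0:
        return None                 # no observed wet days or no dry days
    return hits / (hits + misses) - false_alarms / (false_alarms + correct_neg)


def explained_variance(fcst, obs):
    """Squared Pearson correlation, as used for 2-m max temperature."""
    r = pearson(fcst, obs)
    return None if r is None else r ** 2
```

For example, `kuipers_skill_score([1, 1, 0, 0], [1, 0, 1, 0])` evaluates to 0.0 (hit rate and false alarm rate both 0.5), while a forecast that matches observed occurrence on every day scores 1.0. Restricting the rank correlation to days that are wet in both series is what reduces the sample size in dry regions, as noted in the text.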

The intent of comparisons between the NCEP model output and station observations is not to assess the true model skill (which would be done by interpolating the station observations to the NCEP grid, with topographic corrections) but to assess the potential utility of the NCEP MRF output at the local scales important for water resource applications. The goal of this study is to determine if global-scale forecast models contain useful local-scale information. Note however that the Pearson and Spearman correlation statistics are not sensitive to differences in the mean (appendix), so the effects of differences between grid-box and station elevations are reduced.

The accuracy of NCEP precipitation and 2-m maximum temperature forecasts is presented in Fig. 2 for the months of January and July. The skill at each individual station is represented by a colored dot. The NCEP model is shown to capture important aspects of day-to-day variations in precipitation and 2-m maximum temperature. In particular, note the modest skill in January precipitation forecasts over California and the upper Midwest states at the beginning of the forecast cycle and the high skill in January 2-m maximum temperature forecasts (through to day+4) over the eastern half of the United States. In July, the skill of precipitation forecasts is rather low across the entire country, but the 2-m maximum temperature forecasts exhibit high skill over the Pacific Northwest and the Midwest states where there is a high frequency of summertime clear days. Low forecast skill is evident in January for precipitation throughout the Rocky Mountains. The 2-m maximum temperature forecasts in January exhibit low skill over the Rocky Mountains and Appalachians, and in July the 2-m maximum temperature forecasts have lower skill east of the Mississippi River. In all cases precipitation forecasts have much lower skill than the 2-m maximum temperature forecasts.

The generally poor precipitation forecasts limit the use of global-scale NWP model output in river basins where the surface hydrology is dominated by rainfall. In river basins dominated by snowmelt (where the surface hydrology is controlled by variations in temperature), difficulties in providing accurate precipitation forecasts are less important. The case studies presented later in this paper demonstrate that while precipitation forecasts from the NCEP global-scale NWP model are not accurate enough to provide credible predictions of streamflow in river basins where the surface hydrology is dominated by rainfall, the NCEP temperature forecasts do provide useful predictions of streamflow in river basins dominated by snowmelt.

5. Improvement of raw NCEP NWP output

a. Background

Given the large systematic biases in the NCEP model and the poor skill in precipitation and 2-m air temperature forecasts in some regions, it is necessary to use methods that may improve upon the raw forecasts. The technique of model output statistics (MOS; e.g., Glahn and Lowry 1972; Antolik 2000) may be useful for this purpose. MOS downscaling approaches develop empirical relations between gridpoint values of NWP model output (e.g., vertical velocity, total column precipitable water, static stability) and observed data. An advanced MOS system was entered in the 1996–97 National Collegiate Weather Forecasting Contest and finished better than approximately 97% of the human forecasters who entered the contest (Vislocky and Fritsch 1997). The disadvantage of MOS is that the MOS equations must be developed using an archive of forecasts from the same model that is used in the operational setting. The practice at NCEP and other modeling centers is to frequently implement a new improved version of the operational model, meaning that the length of the forecast archive from the operational model may be too short to develop reliable MOS equations.

b. MOS technique

In the MOS technique used in this study, variables included in the NCEP forecast archive were used as predictors in a multiple linear regression approach to forecast precipitation occurrence, precipitation amounts, maximum temperature, and minimum temperature for stations in the National Weather Service cooperative network (Fig. 2). The MOS technique used in this study includes three main steps: preprocessing of the station data, development of the regression equations, and application of the regression equations—including stochastic modeling of the regression residuals to generate ensemble forecasts.

For the first step, the station time series of precipitation are preprocessed. For precipitation occurrence, the daily precipitation data at a given station are converted to a binary time series of 1's (wet days) and 0's (dry days); the regression equation thus predicts the probability of precipitation (see also Antolik 2000). For precipitation amount, the station precipitation data (only wet days) are transformed to a normal distribution using a nonparametric probability transform (e.g., Panofsky and Brier 1963). To do this, we compute the cumulative probability of observed precipitation (based on the ranked time series) and the cumulative probability of a standard normal distribution (mean of zero and standard deviation of one). The cumulative probability of each daily precipitation total in the observed record is matched with the cumulative probability in the standard normal distribution, and the precipitation value is replaced with the corresponding z score. For example, a precipitation value of 16.25 mm may have a cumulative probability of 0.84 and correspond to a z score of 1.0. The ranked daily precipitation data for the dependent sample are saved for a later retransform of the downscaled precipitation predictions. In the retransform, a linear interpolation is generally necessary because the cumulative probability of the downscaled z score lies between the cumulative probabilities of two of the ranked observed values. In rare cases when the cumulative probability of the downscaled z score is smaller (larger) than the lowest (highest) cumulative probability in the ranked observed time series, it is ascribed the lowest (highest) observed value in the dependent time period.
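The transform and retransform described above can be sketched with the Python standard library. This is an illustration under stated assumptions: the plotting position rank/(n + 1) used for the empirical cumulative probability is an assumption (the text does not specify one), and the function names are hypothetical.

```python
from bisect import bisect_left
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def fit_transform(wet_day_precip):
    """Rank the dependent-sample wet-day totals and attach cumulative
    probabilities (assumed plotting position: rank / (n + 1))."""
    ranked = sorted(wet_day_precip)
    n = len(ranked)
    probs = [(i + 1) / (n + 1) for i in range(n)]
    return ranked, probs


def interp_clamped(x, xs, ys):
    """Linear interpolation of ys over sorted xs, clamped at both ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)  # guarantees xs[i - 1] < x <= xs[i]
    w = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + w * (ys[i] - ys[i - 1])


def to_z(precip, ranked, probs):
    """Replace a precipitation total with the matching z score."""
    return STD_NORMAL.inv_cdf(interp_clamped(precip, ranked, probs))


def from_z(z, ranked, probs):
    """Retransform a downscaled z score to a precipitation amount.

    z scores outside the observed probability range receive the lowest
    (highest) observed value, as described in the text.
    """
    return interp_clamped(STD_NORMAL.cdf(z), probs, ranked)
```

For instance, with the dependent sample `[5.0, 1.0, 3.0]` the median value 3.0 maps to a z score of 0, and any very large z score maps back to 5.0, the highest observed total.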

Multiple linear regression with forward selection is used to develop the MOS equations (Antolik 2000). The forward selection procedure first identifies the predictor variable (e.g., total column precipitable water) that explains the most variance of the predictand (e.g., precipitation at a station location). It then searches through the remaining variables and selects the variable that reduces the largest portion of the remaining unexplained variance in combination with the variable already chosen. If the improvement in explained variance exceeds a given threshold (taken here as 1%), the variable is included in the multiple linear regression equation. The remaining variables are examined in the same way until no further improvement is obtained based on the explained-variance threshold. The MOS equations are developed over the period 1958–76 and validated over the period 1977–98, which represent two different climate regimes over the North Pacific Ocean and North America (Mantua et al. 1997). A separate regression equation is developed for each station, each forecast lead time, and each month.
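A compact sketch of forward selection with a 1% explained-variance threshold is given below. It is illustrative only: the gain is computed in-sample (the study bases selection on cross-validated explained variance), the names are hypothetical, and the least squares bookkeeping uses successive orthogonalization, which gives the same gains as refitting the full regression at each step.

```python
def center(v):
    """Subtract the mean (this accounts for the regression intercept)."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def forward_select(predictors, y, min_gain=0.01):
    """Greedy forward selection for multiple linear regression.

    predictors: dict mapping a name to a list of values (same length as y).
    Returns the selected names and the fraction of variance explained.
    """
    y_res = center(y)
    total_ss = dot(y_res, y_res)
    resid = {name: center(x) for name, x in predictors.items()}
    selected, explained = [], 0.0
    while resid:
        # Gain in explained-variance fraction from adding each candidate,
        # given the predictors already in the equation.
        gains = {}
        for name, x in resid.items():
            sxx = dot(x, x)
            gains[name] = 0.0 if sxx < 1e-12 else dot(x, y_res) ** 2 / (sxx * total_ss)
        best = max(gains, key=gains.get)
        if gains[best] <= min_gain:
            break  # no remaining variable clears the threshold
        xb = resid.pop(best)
        sbb = dot(xb, xb)
        beta = dot(xb, y_res) / sbb
        # Remove the chosen predictor's contribution from the predictand
        # residual and from every remaining candidate.
        y_res = [a - beta * b for a, b in zip(y_res, xb)]
        resid = {name: [a - (dot(xb, x) / sbb) * b for a, b in zip(x, xb)]
                 for name, x in resid.items()}
        selected.append(best)
        explained += gains[best]
    return selected, explained
```

Once the variable set is fixed, the regression coefficients themselves would be estimated with an ordinary least squares fit on the selected predictors.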

To provide a fairly complete description of forecasted atmospheric conditions, a large pool of potential predictor variables is tested in the multiple linear regression model (Table 1). Predictor variables from the NCEP archive include geopotential height, temperature, wind, and humidity at five pressure levels (300, 500, 700, 850, and 1000 hPa), various surface flux variables (e.g., downwelling shortwave radiation flux, 24-h accumulated precipitation), and derived variables such as vorticity advection, zonal and meridional moisture fluxes, and stability indices. All predictor variables are taken from grid boxes within a 500-km search radius of the station being modeled and interpolated to the station location using Cressman (inverse distance) interpolation. Grid-binary predictors are also used (Jensenius 1992; Antolik 2000). Gridded fields of the downwelling shortwave radiation flux, the precipitation rate, 850-hPa relative humidity, and the modified-K stability index (Charba 1977; Peppler and Lamb 1989) were compared against threshold values (in this case, tercile values for each quantity, computed separately for each month). For each grid point, each day with a data value above a given threshold was assigned a “1,” and each day below that threshold was assigned a “0.” These gridded binary fields were then interpolated to the station locations in an identical manner to the standard and computed predictors, providing a continuous predictor time series bounded by zero and one. All predictor variables are lagged to account for possible temporal phase errors in the atmospheric forecasts. For example, regression models developed for forecast day+3 include variables at forecast lead times of 48, 60, 72, 84, and 96 h. 
In Colorado (mountain standard time), hour+72 corresponds to 1700 local time 3 days from the start of the forecast for variables reporting a snapshot of atmospheric conditions (e.g., 500-hPa height), and to the period 0500 to 1700 local time 3 days from the start of the forecast for variables reporting 12-h averages (e.g., downwelling shortwave radiation flux).
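The grid-binary predictors can be illustrated with a short sketch showing why interpolating 0/1 fields yields a continuous predictor between zero and one. The Cressman weight function used here, w = (R² − d²)/(R² + d²) for d < R, is the standard form and is assumed to be what is meant by Cressman interpolation; the function names and the great-circle distance calculation are assumptions made for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    a = (sin((p2 - p1) / 2) ** 2
         + cos(p1) * cos(p2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def grid_binary(values, threshold):
    """Assign 1 to grid values above the threshold and 0 otherwise."""
    return [1.0 if v > threshold else 0.0 for v in values]


def cressman_interp(station, grid_points, values, radius_km=500.0):
    """Distance-weighted mean of grid values within the search radius.

    station and grid_points are (lat, lon) tuples in degrees.
    Returns None if no grid point lies inside the radius.
    """
    r2 = radius_km ** 2
    num = den = 0.0
    for (lat, lon), v in zip(grid_points, values):
        d = great_circle_km(station[0], station[1], lat, lon)
        if d < radius_km:
            w = (r2 - d * d) / (r2 + d * d)  # assumed Cressman weight
            num += w * v
            den += w
    return num / den if den > 0 else None
```

A station midway between a grid point flagged 1 and a grid point flagged 0 receives a predictor value of 0.5, so the interpolated series carries information about how close the station is to the region exceeding the threshold.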

Cross-validation procedures are used to avoid overspecification of the regression equation or chance selection of a set of insignificant variables (Michaelsen 1987; Allen 1971). The dependent sample (1958–76) is randomly broken into two periods. For a given combination of variables, the equation is trained on the first period (true-dependent sample) and validated on the second period (pseudoindependent sample). This process is repeated five times for the same combination of variables. The selection of the set of predictor variables is based on the average explained variance from the five pseudoindependent samples. Once the variable set is selected, coefficients in the regression equation are estimated using the entire dependent sample. Typically, between three and eight variables are used in the MOS equations.

A final step in the downscaling procedure is stochastic modeling of the residuals in the multiple linear regression equations to provide an assessment of model uncertainty and permit the generation of probabilistic forecasts. For maximum and minimum temperature, this is achieved by extracting a random number from a standard normal (Gaussian) distribution (mean of zero and standard deviation of one), multiplying the random number by the standard deviation of the regression residuals, and adding this product to the forecast of temperature. For precipitation, we first determine precipitation occurrence. A random number is drawn from a uniform distribution ranging from zero to one. If the random number is lower than the forecasted probability of precipitation occurrence, the day is classified as a precipitation day. Precipitation amounts are only forecasted for precipitation days. After forecasting precipitation amounts, residuals are modeled stochastically using methods identical to those used for maximum and minimum temperature, and then the forecasted (normally distributed) precipitation amounts are transformed back to the original gamma-type distribution of observed precipitation using the nonparametric probability transform techniques described earlier. The stochastic modeling of the regression residuals inflates the variance of precipitation and temperature forecasts, reducing problems of variance underestimation that are typical of regression-based models.
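The ensemble generation step can be sketched as follows. This is a minimal illustration, assuming the regression forecasts and residual standard deviations are already available; the function names are hypothetical, and `back_transform` stands in for whatever routine maps a z score back to a precipitation amount.

```python
import random


def temperature_ensemble(forecast, resid_sd, n_members, rng):
    """Add scaled standard normal noise to the regression forecast."""
    return [forecast + resid_sd * rng.gauss(0.0, 1.0) for _ in range(n_members)]


def precipitation_ensemble(prob_precip, z_forecast, resid_sd, n_members,
                           rng, back_transform):
    """Generate precipitation members in two stages.

    1. Occurrence: the day is wet if a uniform(0, 1) draw is below the
       forecast probability of precipitation.
    2. Amount: on wet days, perturb the normally distributed forecast with
       scaled noise, then map it back to a precipitation amount.
    """
    members = []
    for _ in range(n_members):
        if rng.random() < prob_precip:
            z = z_forecast + resid_sd * rng.gauss(0.0, 1.0)
            members.append(back_transform(z))
        else:
            members.append(0.0)
    return members


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the example is repeatable
    print(temperature_ensemble(12.0, 2.5, 5, rng))
    # Hypothetical back transform, used here only for demonstration.
    print(precipitation_ensemble(0.4, 0.2, 0.8, 5, rng,
                                 lambda z: max(0.1, 5.0 + 2.0 * z)))
```

Because each member receives an independent residual draw, the ensemble spread reflects the unexplained variance of the regression rather than collapsing onto the conditional mean.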

c. Forecast improvements

Output from the MOS system does not contain the large biases evident in the raw NCEP predictions. Figure 3 illustrates the observed and modeled long-term (1977–98) mean for maximum and minimum temperature, precipitation occurrence, and precipitation amounts for the four midseason months of January, April, July, and October. Each point represents the observed and modeled 1977–98 mean for an individual station. The median absolute bias, computed over all stations for individual months, is summarized in Table 2. For all variables, the MOS system reproduces spatial variations in the long-term climatologies (Fig. 3). The median absolute bias in maximum and minimum temperature at individual stations is less than 0.5°C in all months apart from December, and the median absolute bias for precipitation at individual stations is less than 15% of the mean in all months except for July and August (Table 2). The lower scatter between observed and modeled long-term mean temperatures (Fig. 3) occurs because the temporal variations in temperature at a given station are much lower than the spatial variations in long-term mean temperature across the contiguous United States.

The ability of the MOS-based system to reproduce daily variability in precipitation and maximum temperature is presented in Fig. 4 for the months of January and July. The presentation is identical to Fig. 2, where each station in the contiguous United States is represented by a colored dot. Spatial variations in the accuracy of the MOS-based precipitation and maximum temperature predictions are similar to the raw NCEP predictions. Comparing Fig. 4 with Fig. 2, note again the modest skill for January precipitation predictions in California and the upper Midwest, the high skill for January temperature predictions over the eastern United States, and the low skill for July precipitation predictions throughout the country. There are, however, several regions where the MOS-based forecasts are more accurate than the raw NCEP forecasts. Higher forecast accuracy is readily apparent for January precipitation in the northeastern United States and for maximum temperature in January and July over the entire country. The skill of the maximum temperature forecasts over the Rocky Mountains and Appalachians in January, and over the east coast of the United States in July (Fig. 2), improves when applying statistical MOS guidance (Fig. 4).

To bring these results together, Fig. 5 compares the median skill of the raw NCEP and MOS-based precipitation and temperature forecasts for the four midseason months of January, April, July, and October. The median forecast skill is computed using all stations in the cooperative network that had sufficient data to develop MOS equations and compute skill scores (i.e., the median is computed for all stations in Figs. 2 and 4). As in Figs. 2 and 4, the skill of the maximum and minimum temperature forecasts is measured by the explained variance (i.e., the r² value), and the skill of precipitation forecasts is measured by Spearman rank correlations. The Kuipers skill score (Wilks 1995) is used to evaluate the accuracy of precipitation occurrence predictions, that is, how well forecasted wet (dry) days match observed wet (dry) days (see appendix). Raw NCEP forecasts are shown in Fig. 5 as squares, and MOS-based forecasts are shown as triangles.

The MOS-based maximum and minimum temperature forecasts are, in almost all cases, more accurate than the raw NCEP forecasts (top two rows in Fig. 5). This is most apparent at the beginning of the forecast cycle in January (e.g., day+0), where the MOS-based predictions explain approximately 20% more variance than the raw NCEP predictions. For precipitation occurrence, the MOS guidance has substantially greater accuracy than the raw NCEP predictions. This mostly reflects the frequent “drizzle” in global-scale models that is accentuated by interpolating NCEP data from all grid points within a 500-km search radius to point station locations. MOS-based predictions of precipitation amounts (bottom row in Fig. 5) are less accurate than the raw NCEP predictions. While this result is initially surprising, it is most likely due to an inadequate number of precipitation days used to develop the MOS equations in dry regions. Recall that the regression equations for precipitation amounts are developed using the subset of days when there is precipitation at the station. The MOS-based day+0 precipitation predictions in January (Fig. 4) are of much higher skill in California and the northeastern United States than the corresponding raw NCEP predictions (Fig. 2). Work is continuing to determine the minimum sample size necessary to develop stable MOS equations, and the effects of different methods to artificially increase the sample size (e.g., “borrowing” data from adjacent months and nearby stations). The results in Fig. 5 show that improvements obtained from MOS are most pronounced for short forecast lead times. After about 4–5 days, correlations between MOS predictions and station data are of similar magnitude to correlations between raw NCEP output and station data. For these longer lead times, the main benefit of the MOS approach is to correct the systematic biases in the NCEP MRF output.

6. Use of MRF output to produce forecasts of streamflow

Based on the results presented thus far, we assess whether downscaled MRF output (MOS) can be used in a hydrologic model to improve upon the traditional practice of forecasting streamflow using the climatic ESP procedure (Day 1985). Potential increases in forecast skill are assessed when the ESP ensemble inputs are replaced with MOS-based ensemble forecasts of precipitation and temperature. The streamflow forecast experiments are constructed as follows: Basin initial conditions for both the MOS-based and ESP forecasts are estimated by running our hydrologic model with station observations up to the start of each forecast period; the model is then run forward with the forecast ensemble. The forecast ensemble for ESP comprises station observations from matching dates in the historical record [see Day (1985) for more details], and the forecast ensemble from MOS is generated through the regression-based estimates and stochastic modeling of the residuals in the regression equation (see section 5b for more details). In the ESP approach there is essentially zero skill in the inputs; the skill in the forecasts of streamflow is due to specification of the basin conditions at the start of the forecast and the influence of those conditions on the basin hydrologic response.

Hydrologic model simulations for these forecast experiments are performed using the U.S. Geological Survey's Precipitation-Runoff Modeling System (PRMS), which is described in detail in Leavesley et al. (1983), Leavesley et al. (1996), and Hay et al. (2002). The hydrologic simulations generated using station observations (precipitation and maximum and minimum temperature) were used as a measure of “truth” to assess the skill of the hydrologic forecasts. This focuses attention on the hydrologic impact of errors in the inputs, instead of possible errors in the hydrologic model itself. The ESP and MOS-based forecasts use the same initial conditions, which are estimated for each forecast by running PRMS with station observations. The ranked probability skill score (RPSS) is used to assess the skill of the ESP and MOS-based streamflow forecasts (appendix).
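For readers unfamiliar with the RPSS, a generic sketch of the score for ensemble forecasts is given below. It follows the standard textbook form (one minus the ratio of the mean forecast RPS to the mean reference RPS) and may differ in detail from the appendix definition; the category thresholds, the choice of reference ensemble, and the function names are all assumptions made for illustration.

```python
from statistics import mean


def rps(ensemble, observed, thresholds):
    """Ranked probability score for one forecast.

    thresholds are the upper bounds of all categories but the last; the
    score sums squared differences between the cumulative forecast
    probability and the cumulative observed indicator.
    """
    n = len(ensemble)
    score = 0.0
    for t in thresholds:
        p_fcst = sum(1 for member in ensemble if member <= t) / n
        p_obs = 1.0 if observed <= t else 0.0
        score += (p_fcst - p_obs) ** 2
    return score


def rpss(fcst_ensembles, ref_ensembles, observations, thresholds):
    """Skill relative to a reference: 1 is perfect, 0 is no improvement.

    Returns None when the reference is itself perfect (zero mean RPS).
    """
    rps_fcst = mean(rps(e, o, thresholds)
                    for e, o in zip(fcst_ensembles, observations))
    rps_ref = mean(rps(e, o, thresholds)
                   for e, o in zip(ref_ensembles, observations))
    if rps_ref == 0:
        return None
    return 1.0 - rps_fcst / rps_ref
```

In the setting described here, the "observations" would be the streamflow simulated by PRMS with station inputs, since those simulations serve as the measure of truth.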

Results are examined in the following four basins: 1) Animas River at Durango, Colorado (Animas); 2) East Fork of the Carson River near Gardnerville, Nevada (Carson); 3) Cle Elum River near Roslyn, Washington (Cle Elum); and 4) Alapaha River at Statenville, Georgia (Alapaha). The surface hydrology of the first three basins (Animas, Carson, and Cle Elum) is dominated by snowmelt. The Carson and Cle Elum basins are also characterized by frequent rain-on-snow events in the winter months. The Alapaha basin is a low-elevation rainfall-dominated basin. Table 3 lists some of the defining features of each basin, and Fig. 6 shows the location of each.

The probabilistic skill of the 8-day streamflow forecasts produced using statistically downscaled MRF output (MOS) and the climatic ESP technique is presented in Fig. 7. The contour plots show the month along the x axis, the forecast day along the y axis, and the RPSS as the contoured variable. Increases in forecast skill from MOS-based forecasts are most pronounced during the peak snowmelt season in the three western basins (the Animas, the East Fork of the Carson, and the Cle Elum). At this time of the year, daily variations in streamflow are more closely tied to variations in temperature than precipitation, and the high skill in predictions of temperature translates into high skill in predictions of streamflow. The MOS-based forecasts and ESP perform equally well in the rainfall-dominated basin (Alapaha), where skillful predictions of streamflow are hampered by the poor predictions of precipitation. These results suggest that skillful short-term predictions of runoff are possible in snowmelt situations, when knowledge of the accumulated snowpack is available. Further work on a larger set of basins is required to verify this statement.

7. Summary and discussion

This paper examined an archive containing more than 40 years of 8-day atmospheric forecasts from the NCEP reanalysis project to assess the possibilities for using medium-range NWP model output for predictions of streamflow. Systematic biases in the NCEP forecasts are often large. In many regions, precipitation biases are in excess of 100% of the mean, and temperature biases are in excess of 3°C. In some locations, biases are even higher. In addition, the accuracy of the NCEP forecasts is rather low in many areas of the country. Most apparent are the generally low skill in precipitation forecasts (particularly in July) and the low skill in temperature forecasts over the Rocky and Appalachian Mountains in January and over the eastern seaboard in July. These results outline a clear need for additional processing of the NCEP Medium-Range Forecast Model output before it is used for hydrologic predictions.

Techniques of model output statistics (MOS) were used to improve the raw NCEP forecast model output. In our MOS technique, atmospheric variables included in the NCEP forecast archive (e.g., total column precipitable water, 2-m air temperature) were used as predictors in a forward screening multiple linear regression approach to forecast precipitation occurrence, precipitation amounts, maximum temperature, and minimum temperature for over 11 000 stations in the National Weather Service cooperative network. This procedure effectively removes all systematic biases in the raw NCEP precipitation and temperature forecasts. In addition, the MOS guidance results in substantial improvements in the accuracy of maximum and minimum temperature forecasts throughout the country, most apparent in the Rocky and Appalachian Mountains, where the skill of the raw 2-m temperature forecasts was low. MOS guidance also substantially improves predictions of precipitation occurrence. Forecast improvements of precipitation amounts are more modest than for temperature and precipitation occurrence. The MOS guidance did result in increased forecast accuracy over the northeastern United States in January, but overall the accuracy of MOS-based forecasts of precipitation amounts is slightly lower than that of the raw NCEP forecasts. This may be due to an inadequate number of precipitation days to develop stable MOS equations in dry regions. Nevertheless, the raw NCEP precipitation forecasts contain biases and need some statistical correction before they can be used directly in hydrologic forecasting applications.

Statistically downscaled MRF output (NCEP atmospheric forecasts) is found to provide realistic predictions of streamflow in the snowmelt-dominated basins examined in the western United States. Short-term variations in streamflow in snowmelt-dominated river systems are influenced more by variations in temperature than variations in precipitation. If the volume of snowpack is estimated correctly for the winter season, reliable short-term forecasts of streamflow are possible through a good representation of the effects of temperature on the rates of snowmelt (see also Hay et al. 2002). The accuracy of the MOS-based temperature forecasts is in fact much higher than that of the MOS-based precipitation forecasts (Fig. 4), and the predictions of streamflow in snowmelt-dominated river basins (the Animas, East Fork of the Carson, and the Cle Elum) do exhibit greater improvement over ESP than in the rainfall-dominated basin (the Alapaha). The poor forecasts of precipitation limit the use of atmospheric forecast output in river basins where the surface hydrology is dominated by rainfall.

Further improvement in the skill of streamflow forecasts depends not only on the local-scale forecasts of precipitation and temperature, but also on the specification of basin initial conditions and on hydrologic model simulations of streamflow. All three of these issues need more attention. On the atmospheric side, forecast skill at local scales is limited by the coarse horizontal resolution of the MRF (e.g., precipitation occurs on the subgrid scale) and deficiencies in model physics (e.g., summertime precipitation may be poorly represented because of inadequacies in convective parameterizations). It is likely that nesting a series of regional atmospheric models to finer scales may be necessary to adequately resolve the subgrid-scale variations and physical atmospheric processes important for hydrologic modeling. Of course, at longer lead times forecast skill depends on the prediction of large-scale climate features (e.g., the 500-hPa height field). In terms of estimating basin initial conditions, opportunities for forecast improvements are largest in snowmelt-dominated basins where it is possible to estimate the spatial variability of snowpack from satellites. New satellite missions to estimate soil moisture and subsurface storage offer some promise, but satellite estimates of soil moisture are currently only reliable for the top soil layers on nonvegetated surfaces, and estimates of subsurface storage are currently only reliable on large spatial scales. Further work is needed on this topic. Improved model simulations of streamflow are possible through both advances in parameter estimation methodologies and improvements in model structure. Recent work has shown the value of multimodel approaches to improve hydrologic predictions, and these approaches can be implemented operationally through close coordination between different modeling groups. 
Focused attention on improving streamflow forecasts will help water managers optimize the use of water resources and thus satisfy the increasingly competitive demands for water.

Acknowledgments

This work was supported by both the NOAA GAPP Program (Award NA16GP2806) and the NOAA RISA Program (Award NA17RJ1229). The authors are grateful to Mark C. Serreze and Robert L. Wilby for comments on an earlier draft of this manuscript.

REFERENCES

  • Allen, D. M., 1971: Mean square error of prediction as a criterion for selecting variables. Technometrics, 13, 469–475.

  • Antolik, M. S., 2000: An overview of the National Weather Service's centralized statistical quantitative precipitation forecasts. J. Hydrol., 239, 306–337.

  • Briggs, P. R., and J. G. Cogley, 1996: Topographic bias in mesoscale precipitation networks. J. Climate, 9, 205–218.

  • Charba, J. P., 1977: Operational system for predicting thunderstorms two to six hours in advance. NOAA Tech. Memo. NWS TDL-64, 24 pp.

  • Chelliah, M., and C. F. Ropelewski, 2000: Reanalyses-based tropospheric temperature estimates: Uncertainties in the context of global climate change detection. J. Climate, 13, 3187–3205.

  • Connelly, B. A., D. T. Braatz, J. B. Halquist, M. M. DeWeese, L. Larson, and J. J. Ingram, 1999: Advanced hydrologic prediction system. J. Geophys. Res., 104 (D16), 19655–19660.

  • Daly, C., R. P. Neilson, and D. L. Phillips, 1994: A statistical-topographic model for mapping climatological precipitation over mountainous terrain. J. Appl. Meteor., 33, 140–158.

  • Day, G. N., 1985: Extended streamflow forecasting using NWSRFS. ASCE J. Water Res. Plann. Manage., 111, 157–170.

  • Dey, C. H., and L. L. Morone, 1985: Evolution of the NMC Global Assimilation System: January 1982–December 1983. Mon. Wea. Rev., 113, 304–318.

  • DiMego, G., 1988: The National Meteorological Center Regional Analysis System. Mon. Wea. Rev., 116, 1137–1156.

  • Eischeid, J. K., P. A. Pasteris, H. F. Diaz, M. S. Plantico, and N. J. Lott, 2000: Creating a serially complete, national daily time series of temperature and precipitation for the western United States. J. Appl. Meteor., 39, 1580–1591.

  • Glahn, H. R., and D. A. Lowry, 1972: The use of model output statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211.

  • Hagemann, S., and L. D. Gates, 2001: Validation of the hydrological cycle of ECMWF and NCEP reanalyses using the MPI hydrological discharge model. J. Geophys. Res., 106 (D2), 1503–1510.

  • Hamlet, A. F., and D. P. Lettenmaier, 1999: Columbia River streamflow forecasting based on ENSO and PDO climate signals. ASCE J. Water Res. Plann. Manage., 125, 333–341.

  • Hay, L. E., M. P. Clark, R. L. Wilby, W. J. Gutowski, R. W. Arritt, E. S. Takle, Z. Pan, and G. H. Leavesley, 2002: Use of regional climate model output for hydrologic simulations. J. Hydrometeor., 3, 571–590.

  • Janowiak, J. E., A. Gruber, C. R. Kondragunta, R. E. Livezey, and G. J. Huffman, 1998: A comparison of the NCEP–NCAR reanalysis precipitation and the GPCP rain gauge–satellite combined dataset with observational error considerations. J. Climate, 11, 2960–2979.

  • Jensenius, J. S., Jr., 1992: The use of grid-binary variables as predictors for statistical weather forecasting. Preprints, 12th Conf. on Probability and Statistics in the Atmospheric Sciences, Toronto, ON, Canada, Amer. Meteor. Soc., 225–230.

  • Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471.

  • Kalnay, E., S. J. Lord, and R. D. McPherson, 1998: Maturity of operational numerical weather prediction: Medium range. Bull. Amer. Meteor. Soc., 79, 2753–2769.

  • Kistler, R., and Coauthors, 2001: The NCEP–NCAR 50-year reanalysis: Monthly means CD-ROM and documentation. Bull. Amer. Meteor. Soc., 82, 247–267.

  • Leavesley, G. H., R. W. Lichty, B. M. Troutman, and L. G. Saindon, 1983: Precipitation-runoff modeling system: User's manual. U.S. Geological Survey Water Investment Rep. 83-4238, 207 pp.

  • Leavesley, G. H., P. J. Restrepo, S. L. Markstrom, M. Dixon, and L. G. Stannard, 1996: The modular modeling system (MMS): User's manual. U.S. Geological Survey Open File Rep. 96-151, 142 pp.

  • Mantua, N. J., S. R. Hare, Y. Zhang, J. M. Wallace, and R. C. Francis, 1997: A Pacific Interdecadal Climate Oscillation with impacts on salmon production. Bull. Amer. Meteor. Soc., 78, 1069–1079.

  • Michaelsen, J., 1987: Cross-validation in statistical climate forecast models. J. Climate Appl. Meteor., 26, 1589–1600.

  • Pan, H.-L., and W.-S. Wu, 1994: Implementing a mass flux convection parameterization package for the NMC medium range forecast model. Preprints, 10th Conf. on Numerical Weather Prediction, Portland, OR, Amer. Meteor. Soc., 96–98.

  • Panofsky, H. A., and G. W. Brier, 1963: Some Applications of Statistics to Meteorology. Mineral Industries Continuing Education, College of Mineral Industries, The Pennsylvania State University, 224 pp.

  • Peppler, R. A., and P. J. Lamb, 1989: Tropospheric static stability and central North American growing season rainfall. Mon. Wea. Rev., 117, 1156–1180.

  • Reek, T., S. R. Doty, and T. W. Owen, 1992: A deterministic approach to the validation of historical daily temperature and precipitation data from the cooperative network. Bull. Amer. Meteor. Soc., 73, 753–762.

  • Reid, P. A., P. D. Jones, O. Brown, C. M. Goodess, and T. D. Davies, 2001: Assessments of the reliability of NCEP circulation data and relationships with surface climate by direct comparisons with station based data. Climate Res., 17, 247–261.

  • Serreze, M. C., and C. M. Hurst, 2000: Representation of mean Arctic precipitation from NCEP–NCAR and ERA reanalyses. J. Climate, 13, 182–201.

  • Trenberth, K. E., and C. J. Guillemot, 1998: Evaluation of the atmospheric moisture and hydrologic cycle in the NCEP/NCAR reanalyses. Climate Dyn., 14, 213–231.

  • Vislocky, R. L., and J. M. Fritsch, 1997: Performance of an advanced MOS system in the 1996–97 National Collegiate Weather Forecasting Contest. Bull. Amer. Meteor. Soc., 78, 2851–2857.

  • Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences: An Introduction. Academic Press, 467 pp.

  • Woollen, J. S., E. Kalnay, L. Gandin, W. Collins, S. Saha, R. Kistler, M. Kanamitsu, and M. Chelliah, 1994: Quality control in the reanalysis system. Bull. Amer. Meteor. Soc., 75, 1314.


APPENDIX

Measures of Forecast Skill

The Pearson and Spearman correlation coefficients

The equation for the Pearson correlation coefficient is

$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}} \tag{A1} $$

where $x$ is the station data (in this case maximum and minimum temperature observations), $\bar{x}$ is the mean of the station observations, $y$ is the predicted value (in this case raw NCEP output or the MOS-based predictions), $\bar{y}$ is the mean of the predicted values, and the sums run over the $n$ forecast-observation pairs. The equation for the Spearman “rank” correlation coefficient is identical to Eq. (A1), except that the ranks of the values are used in place of the actual values. For both the Pearson and Spearman correlation coefficients, a value of 1.0 represents a perfect forecast.
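
As an illustration, the two coefficients can be sketched in plain Python. This is a minimal sketch and not the code used in the study: it assumes Eq. (A1) is the standard product-moment form, and the function names and the tie-handling rule (tied values share their average rank) are assumptions introduced here, since the text does not say how ties are treated.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between observations x and predictions y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    var_x = sum((xi - x_bar) ** 2 for xi in x)
    var_y = sum((yi - y_bar) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks
    (an assumed convention, not stated in the text)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because the Spearman coefficient depends only on ranks, it is insensitive to monotonic transformations of the data, which is one reason a rank-based measure suits strongly skewed variables such as daily precipitation amounts.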

Kuipers' skill score

Kuipers' skill score (KSS) is used to assess the skill of binary predictions (in this case precipitation occurrence). It is calculated from a 2 × 2 contingency table as

$$ \mathrm{KSS} = \frac{ad - bc}{(a + c)(b + d)} \tag{A2} $$

where $a$ is the number of cases when a wet day was forecast and a wet day was observed, $b$ is the number of cases when a wet day was forecast and a dry day was observed, $c$ is the number of cases when a dry day was forecast and a wet day was observed, and $d$ is the number of cases when a dry day was forecast and a dry day was observed (see also Wilks 1995). As with the Pearson and Spearman correlation coefficients, a value of 1.0 represents a perfect forecast.
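
The contingency-table score described above can be sketched as follows. This is a minimal illustration, assuming Eq. (A2) takes the standard form given in Wilks (1995), (ad − bc) / [(a + c)(b + d)]; the function names are hypothetical and the wet/dry series are assumed to be already coded as booleans.

```python
def contingency_counts(forecast_wet, observed_wet):
    """Count the four cells (a, b, c, d) of the 2 x 2 table from
    paired boolean series (True = wet day, False = dry day)."""
    a = b = c = d = 0
    for f, o in zip(forecast_wet, observed_wet):
        if f and o:
            a += 1  # wet forecast, wet observed
        elif f and not o:
            b += 1  # wet forecast, dry observed
        elif not f and o:
            c += 1  # dry forecast, wet observed
        else:
            d += 1  # dry forecast, dry observed
    return a, b, c, d


def kuipers_skill_score(a, b, c, d):
    """Skill score from the 2 x 2 table, assuming the standard form
    (ad - bc) / ((a + c)(b + d))."""
    wet_observed = a + c
    dry_observed = b + d
    if wet_observed == 0 or dry_observed == 0:
        raise ValueError("score is undefined without both wet and dry observations")
    return (a * d - b * c) / (wet_observed * dry_observed)
```

Written this way, the score equals the hit rate a / (a + c) minus the false-alarm rate b / (b + d), so a forecast that is always wet or always dry scores 0.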

The ranked probability skill score

The ranked probability skill score (RPSS) is used to provide a measure of the probabilistic skill of the ensemble streamflow forecasts. The RPSS is based on the ranked probability score (RPS) computed for each forecast-observation pair:

$$ \mathrm{RPS} = \sum_{m=1}^{J} (Y_m - O_m)^2 \tag{A3} $$

where $Y_m$ is the cumulative probability of the forecast for category $m$, and $O_m$ is the cumulative probability of the observation for category $m$. This is implemented as follows (see also Wilks 1995): First, the observed time series is used to define $J = 10$ possible categories for forecasts of precipitation and temperature (i.e., the minimum value to the 10th percentile, the 10th percentile to the 20th percentile … the 90th percentile to the maximum value). These categories are determined separately for each month and basin. Next, for each forecast-observation pair, the number of ensemble members forecast in each category is determined and their cumulative probabilities are computed. Similarly, the appropriate category for the observation is identified and the observation's cumulative probabilities are computed (i.e., all categories below the observation's position are assigned “0,” and all categories equal to and above the observation's position are assigned “1”). The RPS is then computed as the squared difference between the observed and forecast cumulative probabilities, summed over all categories [Eq. (A3)]. The RPSS is then computed as

$$ \mathrm{RPSS} = 1 - \frac{\overline{\mathrm{RPS}}}{\overline{\mathrm{RPS}}_{\mathrm{rand}}} \tag{A4} $$

where $\overline{\mathrm{RPS}}$ is the mean ranked probability score for all forecast-observation pairs, and $\overline{\mathrm{RPS}}_{\mathrm{rand}}$ is the mean ranked probability score for randomly shuffled forecast-observation pairs.
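
The procedure described above can be sketched for a single month and basin as follows. This is a hedged illustration rather than the study's implementation. It assumes that the RPS is the sum over categories of squared differences between cumulative probabilities and that the RPSS is one minus the ratio of the two mean scores. The percentile method (`statistics.quantiles` defaults), the rule that a value equal to a category edge falls in the upper category, the number of shuffles, and all function names are assumptions introduced here.

```python
import random
from bisect import bisect_right
from statistics import quantiles

N_CATEGORIES = 10  # J in the text


def decile_edges(observed_series):
    """Nine interior cut points (10th to 90th percentiles) of the observations."""
    return quantiles(observed_series, n=N_CATEGORIES)


def cumulative_forecast(ensemble, edges):
    """Cumulative fraction of ensemble members at or below each category."""
    counts = [0] * (len(edges) + 1)
    for member in ensemble:
        counts[bisect_right(edges, member)] += 1
    cumulative, running = [], 0
    for count in counts:
        running += count
        cumulative.append(running / len(ensemble))
    return cumulative


def cumulative_observation(obs, edges):
    """0 for categories below the observation's category, 1 at and above it."""
    k = bisect_right(edges, obs)
    return [0.0 if m < k else 1.0 for m in range(len(edges) + 1)]


def rps(ensemble, obs, edges):
    """Ranked probability score for one forecast-observation pair."""
    y = cumulative_forecast(ensemble, edges)
    o = cumulative_observation(obs, edges)
    return sum((ym - om) ** 2 for ym, om in zip(y, o))


def mean_rps(ensembles, observations, edges):
    pairs = list(zip(ensembles, observations))
    return sum(rps(e, o, edges) for e, o in pairs) / len(pairs)


def rpss(ensembles, observations, edges, n_shuffles=100, seed=0):
    """Skill relative to randomly shuffled forecast-observation pairs.
    n_shuffles and seed are illustrative choices, not values from the text."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_shuffles):
        shuffled = list(observations)
        rng.shuffle(shuffled)
        total += mean_rps(ensembles, shuffled, edges)
    mean_rps_rand = total / n_shuffles
    if mean_rps_rand == 0:
        raise ValueError("reference score is zero; RPSS is undefined")
    return 1.0 - mean_rps(ensembles, observations, edges) / mean_rps_rand
```

In this sketch a perfect forecast gives an RPSS of 1, and values at or below 0 indicate no improvement over the shuffled reference.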

Fig. 1.

Systematic biases in NCEP forecasts of precipitation and 2-m air temperature, showing biases for day+0 (top row) and then biases for each subsequent forecast lead time. Precipitation biases are expressed as a percentage of the PRISM climatology, and temperature biases are expressed as a departure from the PRISM climatology (°C)

Citation: Journal of Hydrometeorology 5, 1; 10.1175/1525-7541(2004)005<0015:UOMNWP>2.0.CO;2

Fig. 2.

Accuracy of the raw NCEP precipitation and 2-m max temperature forecasts, showing forecast skill for day+0 (top row) and then skill for each subsequent forecast lead time. Forecast skill for precipitation forecasts is assessed using Spearman rank correlations, and forecast skill for 2-m max temperature forecasts is assessed using squared Pearson correlations (r²).

Fig. 3.

Scatterplots of the observed and modeled MOS-based long-term (1977–98) mean for the four midseason months of Jan (column 1), Apr (column 2), Jul (column 3), and Oct (column 4) for max and min temperature (°C; top two rows), precipitation occurrence (expressed as a percentage of precipitation days; third row), and precipitation amounts (mm day⁻¹; bottom row). Each point illustrates the observed and modeled mean for an individual station.

Fig. 4.

Accuracy of the MOS-based precipitation and 2-m max temperature forecasts, showing forecast skill for day+0 (top row) and then skill for each subsequent forecast lead time. Forecast skill for precipitation forecasts is assessed using Spearman rank correlations, and forecast skill for 2-m max temperature forecasts is assessed using squared Pearson correlations (r²).

Fig. 5.

Accuracy of the raw NCEP and the MOS-based precipitation and temperature forecasts for the four midseason months of Jan (column 1), Apr (column 2), Jul (column 3), and Oct (column 4). Shown are skill scores for max and min temperature (squared Pearson correlation; top two rows), precipitation occurrence (Kuipers' skill score; third row), and precipitation amounts (Spearman rank correlation; bottom row). Raw NCEP predictions are shown with a dotted line (squares), and the MOS-based predictions are shown with a solid line (triangles).

Fig. 6.

Location of study basins

Fig. 7.

RPSS calculated for each forecast day and month, using (top) MOS-based precipitation and temperature forecasts and (bottom) the climatological ESP approach. See text for further details

Table 1.

Percentage of equations in which a given predictor variable is selected for use in the MOS system and (in parentheses) the percentage of equations in which that variable is the first variable selected, for MOS-based predictions of max temperature (TMAX), min temperature (TMIN), precipitation occurrence (POCC), and precipitation amounts (PRCP). Data are aggregated over all stations, all forecast lead times, and the five lagged time periods. See text for more details. Downwelling shortwave radiation flux is used only for daytime forecast periods (0000–1200 UTC). High-pass-filtered 500- and 1000-hPa height is computed using all time periods (i.e., midpoint minus the mean of all five periods).

Table 2.

Median absolute bias, computed over all stations in the contiguous United States, for the MOS-based predictions of max temperature (TMAX), min temperature (TMIN), precipitation occurrence (POCC), and precipitation amounts (PRCP). Median absolute bias for max and min temperature is expressed in °C. Median absolute bias for precipitation occurrence is expressed as a percentage of days with precipitation, and median absolute bias for precipitation amounts is expressed as a percentage of the long-term precipitation mean.

Table 3.

Study basins
