## 1. Introduction

The Hydrometeorological Prediction Center (HPC) at the National Centers for Environmental Prediction (NCEP) produces a suite of deterministic 6-h quantitative precipitation forecasts (QPFs) out through 3 days. Day 1 forecasts have been produced for over 40 yr, and verification statistics show steady, gradual improvement (Olson et al. 1995; Hoke et al. 2000). A primary role of the HPC is to produce QPF guidance that can be used by National Weather Service (NWS) River Forecast Centers (RFCs) and Weather Forecast Offices (WFOs) to increase the lead time of watches and warnings for floods, flash floods, and major snowfalls. While these forecasts have proven to be useful in their present form, they offer no information concerning the uncertainties of individual forecasts. Many users (including RFCs) have expressed a need for an objective way of assessing the likely success of a particular forecast.

The quality of manual QPF forecasts, especially beyond 12 h, varies considerably from forecast to forecast. This can be related directly to the inherent uncertainty in the model predictions that serve as guidance for HPC forecasts. One possible way to quantify this uncertainty is the use of short-range ensemble forecasts (SREFs). The availability of ensemble forecasts has allowed forecasters to assess the uncertainty of model forecasts, and many early ensemble studies addressed the ability of ensemble forecasts to predict that uncertainty (e.g., Barker 1991; Houtekamer 1993; Buizza 1997; Hamill and Colucci 1998; Whitaker and Loughe 1998; Hou et al. 2001; Toth et al. 2001; Grimit and Mass 2002; Stensrud and Yussouf 2003; Scherrer et al. 2004). Traditionally, this ability has been measured in terms of a linear correlation between ensemble spread and ensemble mean skill. The correlation between spread and skill has been shown to be positive for short-range and mesoscale forecasts, but it is generally less than 0.5 (Barker 1991; Hamill and Colucci 1998; Whitaker and Loughe 1998; Stensrud et al. 1999). More recent studies using improved ensemble datasets report higher correlations, with spread–error correlations peaking near 0.8 but generally remaining lower (Grimit and Mass 2002; Stensrud and Yussouf 2003).

This study is an attempt to quantify objectively the level of confidence that is justified in a particular HPC QPF (HQPF) by relating errors in HQPF to ensemble spread. The hypothesis used in this approach is that the larger the spread of the ensemble forecast, the greater the uncertainty of a particular HQPF. The first step was to find a relationship between the desired quantity, the absolute error (AE) of the HQPF, and a known quantity available from the NCEP Environmental Modeling Center (EMC) SREF.

High correlations between HQPF AE and ensemble QPF spread (SP) are shown to exist and substantiate the use of regression model equations. This approach uses the linear regression of the AE on SP not only to predict the AE but also to derive a confidence interval (CI) for the AE from its distribution about the regression line. Using the regression model equation parameters derived at each horizontal grid point for each season and individual forecast lead time, we predict an AE associated with an individual SP and a 95% CI of the AE. Based on the AE CI forecast and the HQPF itself, we also predict the 95% CI of the HQPF. At the time of writing, real-time CI forecasts are available online for the continental U.S. twice (0000 and 1200 UTC) a day (http://www.hpc.ncep.noaa.gov/qpfci/qpfci.shtml).

The method obtained from this study is efficient and cost effective to develop and implement on an inexpensive computing platform. The end product for operational use is reasonably simple, providing an estimate of the uncertainty in the deterministic HQPF. The basic concept of this approach is the same as the traditional spread–error skill. However, the performance of the previous work was limited to providing only deterministic outputs of the error estimates. Our study advances upon previous work by applying CI statistics, which makes it possible to produce probabilistic error forecasts as well as deterministic error forecasts. This study is also the first attempt to relate model-produced ensemble forecasts with manually derived QPFs, and its operational application eventually could aid in increasing the forecast lead time and accuracy of RFC streamflow model forecasts.

This paper consists of seven sections. In section 2, we describe the datasets and the method for interpolation/remapping of the data used. In section 3, we present results of spread and forecast error relationships obtained at individual U.S. grid points for four seasons. In section 4, we (i) derive an equation to compute a 95% CI for HQPF using linear regression of the AE on SP, (ii) examine the assumptions used in deriving the linear regression equation, and (iii) propose HQPF stratification methodologies to address any problems found in ii. In section 5, we compare the results from the five methodologies proposed in section 4. In section 6, we provide more detailed verification for the method selected from the comparisons in section 5. Last, section 7 summarizes the results and conclusions.

## 2. Datasets

Three datasets, (a) NCEP HQPF, (b) RFC observational data, and (c) NCEP EMC SREF, are analyzed for the period of October 2001–February 2004.

### a. HQPF

Details regarding the manually produced HQPF data and their verification are described in Olson et al. (1995). Briefly, the HQPF forecast process is a continuous assimilation and assessment of observations, analyses, and model output. This process includes 1) analysis of the latest surface data plots, 2) continuing examination of animated satellite imagery, 3) continuous animated display of regional and national composites of radar imagery, and 4) examination and evaluation of the latest model output from the North American Mesoscale Model, Global Forecast System, Rapid Update Cycle, models outside of NCEP (e.g., U.K. Met Office, European Centre for Medium-Range Weather Forecasts, etc.), and any experimental models available. The forecaster integrates all of this information along with experience and knowledge of model biases to provide the best guidance for a given forecast situation.

The comparative verification study of QPFs in the NWS (Charba et al. 2003) shows overall performance of the manual HQPF is better than any among a series of other QPF products: outputs from numerical weather prediction models run centrally at the NCEP, products issued by forecasters at WFOs, and the final modified WFO QPF products prepared by forecasters/hydrologists at RFCs. The performance of these forecasts varies significantly with season, weather regime, and forecast time. For example, the predictability of a synoptic-scale event is usually higher than convective events. Consequently, performance of both model and human-based QPFs is much better during the cool season than during the convective warm season (McDonald and Graziano 2000).

In this research, we use 6-h HQPFs produced on a Lambert conic conformal (LCC) grid (32-km resolution) for the continental United States. They are issued four times (0000, 0600, 1200, and 1800 UTC) per day. However, only data issued at 0000 and 1200 UTC are analyzed to match as many HQPFs as possible with output times from the NCEP operational SREF, which currently runs twice per day at 0900 and 2100 UTC to 63 h. Therefore, we analyze HQPFs for forecast lead times of 0–6, 6–12, 12–18, 18–24, 24–30, 30–36, 36–42, 42–48, 48–54, and 54–60 h.

### b. RFC observational data

For verifying the HQPFs, we use quantitative precipitation estimate (QPE) data obtained from the National Precipitation Verification Unit (NPVU). Details on the QPE produced by the NPVU and its operational implementation can be found in McDonald and Graziano (2001). The NPVU precipitation data are gathered from the NWS RFCs. For the eastern United States, RFC QPE data come from the stage III analysis (Fread et al. 1995; Fulton et al. 1998), in which precipitation analysis involves automated processing of gauge-measured and radar-estimated precipitation and interactive quality control by RFC forecasters. For the western United States, RFC QPE data are obtained from gauge measurements only (Charba et al. 2003), gridded using Mountain Mapper (Henkel and Peterson 1996).

For objective analysis, all RFC QPE data on 4-km resolution Hydrologic Rainfall Analysis Project grids are remapped onto the standard grids of this study (HQPF grid format) using an approximate area average preserving interpolation (AAAPI) technique (see the appendix).

### c. EMC SREF

The NCEP EMC multi-initial condition and multimodel SREF system has been run operationally since May 2001 (Du et al. 2004). At the time of this work, the system had 10 ensemble members: five members from the Eta Model (Rogers et al. 1996) incorporating the Betts–Miller–Janjić convective parameterization scheme (Janjić 1994) and five members from the Regional Spectral Model (RSM) (Juang et al. 1997). Each SREF run consists of one control run plus two initial-condition breeding pair runs for both the Eta and RSM components using the NCEP breeding technique (Toth and Kalnay 1997).

In this study, we analyze the NCEP SREF operational ensemble mean and spread output products on the Advanced Weather Interactive Processing System Grid 212 (LCC, 40-km resolution grid) (Du et al. 2004). The ensemble spread is defined as the standard deviation of ensemble members about the ensemble mean. These products are issued twice a day (0900 and 2100 UTC). To match the time of the HQPF forecasts (i.e., 1200 and 0000 UTC), we use ten 6-h ensemble forecast intervals starting with 3–9 h and ending with 57–63 h. These data are also remapped onto the standard grid format of HQPF using the AAAPI technique.
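For concreteness, the ensemble mean and spread computation described above can be sketched as follows (an illustrative Python snippet with a hypothetical array layout, not the operational NCEP code):

```python
import numpy as np

def ensemble_mean_and_spread(members):
    """Ensemble mean and spread for one 6-h QPF field.

    members: array of shape (n_members, ny, nx), one gridded QPF per
    ensemble member (hypothetical layout).  The spread is the standard
    deviation of the members about the ensemble mean.
    """
    mean = members.mean(axis=0)
    # Standard deviation of the members about the ensemble mean.
    spread = np.sqrt(((members - mean) ** 2).mean(axis=0))
    return mean, spread
```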

Beginning in December 2003, another set of five ensemble members from the Eta Model incorporating the Kain–Fritsch convective parameterization scheme (Kain and Fritsch 1993) was added to the operational NCEP SREF system to make a total of 15 members. This approach adds a multiple convective parameterization dimension along with the previous multi-initial condition and multimodel approach. In this study, the ensemble data used for the period of October 2001–November 2003 are obtained from the 10-member SREF. The ensemble data used for the period of December 2003–February 2004 are obtained from the 15-member SREF. More details regarding the operational implementation of the SREF, new convective parameterizations, and cloud microphysics chosen for ensemble forecasts at NCEP can be found in Du et al. (2004) and Ferrier (2004).

## 3. Spread–forecast error relationships

The goal of this section is to find a parameter available from the SREF that is well correlated with the HQPF AE. To attain this goal, correlations are computed between the HQPF AE and the SREF means and spreads of 500-hPa geopotential height, 700- and 850-hPa relative humidity, 850-hPa temperature, and QPF. The geopotential height, relative humidity, and temperature fields indicate no linear relationship between the HQPF AE and their spreads or means (these results are not shown). On the other hand, the correlation between the AE and ensemble QPF spread (SP) shows a strong linear relationship (Fig. 1). Figure 1 shows Pearson's correlation coefficient (*r*) (Wilks 1995) computed for 0–6-h forecast lead time during winter months. In Fig. 1, the *r* between AE and SP is greater than 0.5 at most U.S. grid points (90.5%) and even *r* values greater than 0.8 are found at many grid points (10.5%). This indicates the SREF can be used to predict the uncertainties of the HQPFs. The correlations shown in this study are greater than those previously reported in work such as Grimit and Mass (2002) and Stensrud and Yussouf (2003). Another noticeable feature in Fig. 1 is the large degree of small-scale variability, reinforcing the need to use high-resolution data for computing correlation statistics at individual grid points.
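The gridpoint statistic used here is Pearson's *r* between the AE and SP time series; a minimal sketch (illustrative only, for a single grid point):

```python
import numpy as np

def pearson_r(ae, sp):
    """Pearson correlation between HQPF absolute errors (ae) and
    ensemble QPF spreads (sp) at a single grid point."""
    ae = np.asarray(ae, dtype=float)
    sp = np.asarray(sp, dtype=float)
    ae_anom = ae - ae.mean()   # anomalies about the time mean
    sp_anom = sp - sp.mean()
    return (ae_anom * sp_anom).sum() / np.sqrt(
        (ae_anom ** 2).sum() * (sp_anom ** 2).sum()
    )
```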

In addition to the correlation for the simple linear regression, we examined the correlations for various other types of regression, such as logarithmic, power, polynomial, and multiple regressions (not shown). The best fit was found for the simple linear regression (some example scatterplots appear in the following sections), and the null hypothesis (*H*_{0}: slope = 0) was rejected at the 0.001 error rate (99.9% confidence level). No statistically significant improvement was found with the multiple regression approach (not shown).

Figure 2 illustrates seasonal variations of the correlation. To provide more meaningful statistics, *r* is computed only at individual grid points having at least 100 values in their time series. If this restriction is not satisfied at a certain grid point, a blank spot appears in the figure. Blank spots also appear if all data are zero during the regression period. Figure 2 shows relatively higher *r* values in winter months, the lowest values in summer months, and intermediate values in spring and fall. The relatively lower *r* values during the summer are likely the result of convective precipitation common to the warm season.

To evaluate quantitatively the spread–error skill as a function of forecast lead time, the *r* associated with each lead time is computed at each horizontal grid point over the United States. For each lead time, the *r* values are averaged over the entire U.S. domain. Figure 3 plots these averaged *r* values versus forecast lead time for each season of 2002 (Fig. 3a) and 2003 (Fig. 3b). It is apparent that the spread–error relationship is better for earlier forecast lead times. The general tendency of the spread–error relationship to decrease with lead time is consistent with other studies (e.g., Barker 1991; Stensrud and Yussouf 2003). In comparing 2002 and 2003, the spread–error relationship improves in 2003, particularly for winter, which shows large improvement over 2002 in both the 2- and 3-day forecasts.

## 4. Confidence interval forecasts using linear regression model

### a. Linear regression of AE on SP

The linear regression model is

*Y* = *b*_{0} + *b*_{1}*X*, (1)

where

*Y* = |*q*_{o} − *q*_{f}| is the AE,

*q*_{o} is the observed precipitation,

*q*_{f} is the HQPF,

*b*_{0} is the intercept,

*b*_{1} is the slope, and

*X* is the SP.

For a given time series of values (*X*, *Y*) at each grid point, a slope and intercept are found such that the sum of squared errors about the regression line is a minimum.
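The least-squares fit of AE on SP can be sketched as follows (illustrative Python for one grid point's time series):

```python
import numpy as np

def fit_ae_on_sp(sp, ae):
    """Least-squares fit of AE (Y) on SP (X): Y = b0 + b1 * X.

    Returns the intercept b0 and slope b1 that minimize the sum of
    squared errors about the regression line.
    """
    x = np.asarray(sp, dtype=float)
    y = np.asarray(ae, dtype=float)
    x_anom = x - x.mean()
    b1 = (x_anom * (y - y.mean())).sum() / (x_anom ** 2).sum()
    b0 = y.mean() - b1 * x.mean()
    return b0, b1
```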

### b. Predicting the confidence interval for AE given SP

Linear regression is used to predict *Y* given a particular *X* that is within, or at least near, the range of *X* values in the data. The distribution of the data about the line of regression allows estimation of a confidence interval (CI) for the predicted *Y* value (*Ŷ*). The general form of the CI for a particular *X* value (e.g., Steel et al. 1997; Wilks 1995) is given by

*Ŷ* ± *t*{MSE[1 + 1/*n* + (*X*_{0} − *X̄*)²/Σ_{k}(*X*_{k} − *X̄*)²]}^{1/2}, (2)

where

*X*_{0} is the SP value for an individual point we are trying to predict,

*t* is the appropriate percentile of the *t* distribution with degrees of freedom equal to the error degrees of freedom (*n* − 2),

*n* is the number of data points in the time series used for the regression,

*X̄* is the mean of the SP values in the time series,

*X*_{k} is a SP value from the time series, and

MSE is the mean squared error of the *Y* about the regression line.

The computational form of the MSE is

MSE = Σ_{k=1}^{n}(*Y*_{k} − *Ŷ*_{k})²/(*n* − 2). (3)

The upper and lower bounds of the CI [the positive and negative roots of Eq. (2), respectively] establish, for a specified confidence level, the expected range of absolute error for the HQPF, given an ensemble spread *X*_{0}.
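A sketch of the AE prediction interval computation (illustrative Python; the *t* percentile is supplied by the caller, e.g., from `scipy.stats.t.ppf(0.975, n - 2)` for a 95% interval):

```python
import numpy as np

def ae_confidence_interval(sp_series, ae_series, x0, t_val):
    """Lower and upper CI bounds for the predicted AE at spread x0.

    sp_series, ae_series: training time series of SP (X) and AE (Y);
    t_val: t percentile with n - 2 degrees of freedom.
    """
    x = np.asarray(sp_series, dtype=float)
    y = np.asarray(ae_series, dtype=float)
    n = x.size
    ssx = ((x - x.mean()) ** 2).sum()    # sum of squared SP deviations
    b1 = ((x - x.mean()) * (y - y.mean())).sum() / ssx
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x0                 # predicted AE
    mse = ((y - (b0 + b1 * x)) ** 2).sum() / (n - 2)
    half_width = t_val * np.sqrt(
        mse * (1.0 + 1.0 / n + (x0 - x.mean()) ** 2 / ssx)
    )
    return y_hat - half_width, y_hat + half_width
```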

The CI for the HQPF itself is obtained by substituting the AE (|*q*_{o} − *q*_{f}|) for its CI, selecting +*t*, and solving for *q*_{o} in Eq. (2):

*q*_{o} = *q*_{f} ± (*b*_{0} + *b*_{1}*X*_{0} + *t*{MSE[1 + 1/*n* + (*X*_{0} − *X̄*)²/Σ_{k}(*X*_{k} − *X̄*)²]}^{1/2}). (4)

Choosing the positive (negative) sign in Eq. (4) yields the maximum (minimum) expected precipitation for the upper (lower) CI bound associated with the *q*_{f}, *X*_{0}, and *t* values. While the level of confidence associated with the *t* value is appropriate for the AE CI estimates, the same degree of confidence may not apply to HQPF CI estimates because the largest possible interval is selected for Eq. (4).
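The HQPF CI construction, forecast amount plus or minus the upper AE bound with impossible negative lower bounds set to zero, can be sketched as (a minimal illustration):

```python
def hqpf_confidence_interval(qf, ae_upper):
    """HQPF CI bounds given the forecast amount qf and the upper
    bound of the AE CI; negative lower bounds are clipped to zero,
    as in the operational displays."""
    lower = max(qf - ae_upper, 0.0)
    upper = qf + ae_upper
    return lower, upper
```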

Construction of the CI assumes that the individual deviations of the observed AE from the true line of fit (i.e., the true errors, *ɛ*_{k}) satisfy four conditions: 1) their mean is zero, 2) any subset (determined independently from the errors themselves) has the same variance as any other, 3) they are uncorrelated, and 4) they are normally distributed. Taking the regression line to be a close approximation to the "true" line of fit, the estimates of *ɛ*_{k} (the deviations of the observed AE from the fit line, i.e., the residuals) are computed at a number of grid points. Examination of the distributions of these estimates (details are not shown) indicates that they satisfy conditions 1, 3, and 4. However, they do not satisfy condition 2: the MSEs (estimates of the variance of *ɛ*_{k}) computed for HQPF subsets categorized on the basis of HQPF amount are not all the same. Categorization of the subsets based on HQPF amount shows that the variance of the deviations increases as the HQPF amount increases (Fig. 4).

Figure 4 shows scatterplots of AE versus SP for subsets based on observed precipitation amount (0 ≤ OBS < 0.01, 0.01 ≤ OBS < 0.1, and 0.1 ≤ OBS < 1.0 in.) and subsets based on HQPF amount (0 ≤ HQPF < 0.01, 0.01 ≤ HQPF < 0.1, and 0.1 ≤ HQPF < 1.0 in.) during the 2001/02 winter (December 2001–February 2002). The scatterplots for observed precipitation subsets in Fig. 4a and those for HQPF subsets in Fig. 4b are at a grid point located in the northwestern United States (NW) at 44.36°N, 115.76°W. Figures 4c and 4d are for a grid point in the southeastern United States (SE) at 35.29°N, 77.26°W. The total number of time series data points used is the same at both sites (approximately 180 at each site). These two points, representative of different climatological regimes, are selected arbitrarily and exhibit certain common features described below.

For observed precipitation subsets shown in Figs. 4a and 4c, the distribution of AE about the line of regression on SP is scattered on both sides of the line and along the entire length of the line for all categories, especially in the case of the NW point (Fig. 4a). This indicates that one regression computed for all data ranges can be used in estimating the AE CI for any subset. Since the observed precipitation is not known at the time of predicting the AE for a forecast, the situation shown in the HQPF-categorized subsets is more important. The HQPF subsets (Figs. 4b and 4d) show different characteristics from the observed precipitation subsets. High precipitation forecasts generally have not only larger SP values but also larger AE values, and vice versa, regardless of the true precipitation amount. This implies that, as Hamill and Colucci (1998) pointed out, the ability to differentiate the potential errors between two forecast events having the same forecast amount but different SP values is limited. In addition, the deviations of AE from the regression line tend to increase as the HQPF amount increases. The subset of 0 ≤ HQPF < 0.01 also tends to cluster below the regression line computed using all data, as shown by the separate regression line (blue) in Figs. 4b and 4d. Thus, if we apply the regression equation parameters obtained using all HQPF data, the CI forecasts will be overestimated (underestimated) for lower (higher) HQPF amounts. A possible remedy is the data stratification discussed in the next section.

### c. Stratification methodology

There are many possible methods for stratifying the HQPF data to create a series of linear regressions of AE on SP so that each segment of this piecewise linear fit has a more uniform distribution of MSE. Five methods, numbered 1–5, are introduced and examined to address the problem of differing MSEs for subsets determined by HQPF amounts. The stratification categories for each method are summarized in Table 1. The HQPF ranges shown in the left column are the ranges of HQPF used to predict the CI, given a particular HQPF. The HQPF ranges shown in the five rightmost columns are those used to compute regression parameters for Eqs. (2) and (4). The different regression model equation parameters (e.g., slope, intercept, mean squared error, number of data points, mean spread, sum of squares for spread) derived using the different HQPF stratification methodologies are applied to the CI computations. For the HQPF ranges in the left column, the CIs are computed by applying two categories of regressions (Dry/All Reg) in method 1, two categories (Dry/Wet Reg) in method 2, three categories (Dry/Light/Moderate-Heavy Reg) in method 3, two categories (Dry/Log-Log Reg) in method 4, and 15 categories in method 5. The shorthand descriptions, chosen subjectively to contrast the methods easily, indicate the HQPF categories used for computing the regression parameters: "Dry" means 0 ≤ HQPF < 0.01, "Wet" means HQPF > 0, "All" means HQPF ≥ 0, "Light" means 0 < HQPF < 0.1, "Moderate-Heavy" means HQPF ≥ 0.1, and "Log-Log" means that type of regression for HQPF > 0.
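The category lookup can be sketched for method 3, whose thresholds are stated above (illustrative Python; amounts in inches):

```python
def method3_category(hqpf):
    """Regression category used by method 3 for a given HQPF amount:
    Dry (0 <= HQPF < 0.01), Light (HQPF < 0.1), or
    Moderate-Heavy (HQPF >= 0.1)."""
    if hqpf < 0.01:
        return "Dry"
    if hqpf < 0.1:
        return "Light"
    return "Moderate-Heavy"
```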

For method 5, the best way to obtain regression parameters is to use data in mutually exclusive intervals. However, it is impossible to get a large enough sample in the high precipitation categories. Therefore, all regression parameters for HQPF > 0 in method 5 are computed using data in 0 < HQPF < threshold as specified in Table 1. In all methods, the data used for the regression are from the same season of the previous year. The corresponding CI forecast results are evaluated in the next section.

## 5. Method comparisons and discussions

### a. The 95% CI forecast comparisons: Probabilistic approaches

Figure 5 shows the evaluations of the 95% CI forecasts for the AE (panel a) and for the HQPF (panel b) for five different stratification methodologies at the NW site during the 2002/03 winter. The same evaluations at the SE site are shown in Figs. 6a and 6b. In panel a of Figs. 5 and 6, the predicted AE, the predicted upper and lower bounds for AE CI, and the observed AE for validation are displayed as functions of SP. In panel b of Figs. 5 and 6, the predicted upper and lower bounds for HQPF CI and the observed precipitation are shown as functions of HQPF. Since the CI forecast methods for HQPF = 0 are all the same in the five methods, only the CI forecasts for HQPF > 0 are compared here. Impossible negative values for the lower bound of the CI are shown in Figs. 5 and 6 to give a sense of the width of the CI, but in practical application these values are set to zero. Although method 5 defines 15 possible categories, only the categories populated with HQPF data during the winter of 2002/03 are displayed for method 5 in panel a of Figs. 5 and 6. The higher categories are not shown because these amounts were not forecast at this specific point during that period.

In Figs. 5 and 6, the CIs for both AE and HQPF are estimated to be broadest in methods 2, 3, and 4, compared to either method 1 or 5. In particular, method 4, log–log regression, exhibits anomalously high (low) values of the upper (lower) boundary of CI forecasts and values of the expected AE that are too low. This is due to the nonlinear effect of the transformation as shown in Figs. 5 and 6, where values are plotted in ordinary coordinates. On this basis, we reject method 4. The best method to select is one having a narrow CI (|Upper_CI − Lower_CI|) while preserving a high hit rate. Hit rate is defined as the fraction of the forecast events for which the observed precipitation falls between the lower and upper boundaries of the CI: Lower_CI ≤ OBS ≤ Upper_CI. The best CI forecasts for both AE and HQPF, satisfying the requirements of narrow CI and high hit rate, are shown in method 5 (see method 5 in Figs. 5 and 6, quantified below). For the subset of 0 < HQPF < 0.1, the narrowest CIs having a reasonably high hit rate are shown in method 5. For higher HQPF ranges, the CIs in method 5 increase accordingly.
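The hit rate defined above can be computed as follows (an illustrative sketch):

```python
def hit_rate(obs, lower_ci, upper_ci):
    """Fraction of forecast events for which the observed precipitation
    falls within the CI: Lower_CI <= OBS <= Upper_CI."""
    hits = sum(1 for o, lo, hi in zip(obs, lower_ci, upper_ci)
               if lo <= o <= hi)
    return hits / len(obs)
```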

Comparisons of mean CI size and hit rate of HQPF CI forecasts for the categorized HQPF ranges for each method and location are summarized in Table 2. As in Figs. 5 and 6, the higher categories not populated are not shown. To choose the best method satisfying the requirements described above, the two narrowest CIs are selected (boldface in Table 2, with ties for first or second narrowest also in boldface) in each HQPF range, and then their hit rates are compared. If the percent increase from the first to second narrowest CI is less (greater) than the corresponding hit rate percent increase, the method having the second (first) narrowest CI is selected as the better method. Compared to method 1, method 5 is found to be better for all HQPF ranges at both the NW and SE sites. The same comparison applied at all grid points selects method 5 at the rate of 90%, with method 1 selected at the remaining 10% of grid points.
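The selection rule, comparing the percent increase in CI width against the percent increase in hit rate between the two narrowest-CI methods, can be sketched as:

```python
def select_method(ci_first, hit_first, ci_second, hit_second):
    """Choose between the narrowest-CI method ("first") and the second
    narrowest ("second"): take the wider CI only if its hit-rate
    percent gain exceeds its CI-width percent penalty."""
    ci_increase = (ci_second - ci_first) / ci_first * 100.0
    hit_increase = (hit_second - hit_first) / hit_first * 100.0
    return "second" if ci_increase < hit_increase else "first"
```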

### b. Comparisons for mean AE and mean absolute residual: Deterministic approaches

The probabilistic CI forecasts require the number of data points for the *t*-value determination, the MSE, the deviation from the mean SP value, and the sum of squared deviations for SP, as well as *b*_{1}, *b*_{0}, and SP_{0} [Eqs. (2) and (4)]. When forecasting just the deterministic error of the HQPF, only *b*_{1}, *b*_{0}, and SP_{0} are needed. This means that the deterministic error forecast is independent of the number of data points, the MSE, etc. To determine the best method for forecasting the HQPF deterministic error, we compare the mean predicted AE and the mean absolute residual in the individual HQPF ranges (Table 3). The mean observed AE [MAE(obs)], the mean predicted AE [MAE(prd)], and the mean absolute residual (MAR) are computed as follows:

MAE(obs) = (1/*N*)Σ_{k=1}^{N}*Y*_{k},

MAE(prd) = (1/*N*)Σ_{k=1}^{N}*Ŷ*_{k}, and

MAR = (1/*N*)Σ_{k=1}^{N}|*Y*_{k} − *Ŷ*_{k}|,

where *Y*_{k} is an observed AE, *Ŷ*_{k} is the corresponding predicted AE, and *N* is the number of data points in the HQPF range being evaluated.

The MAE(prd) closest to MAE(obs) for each HQPF range is shown enclosed by parentheses in Table 3. Counting the number of parenthetically enclosed entries in the MAE(prd) columns of Table 3, the occurrences of MAE(prd) closest to MAE(obs) are found three, one, six, zero, and two times for methods 1, 2, 3, 4, and 5, respectively, in the HQPF ranges. Considering overall HQPF ranges (All HQPF in the table), the best MAE(prd) is found in method 2. For the equal-weight average (the same weighting is given to each HQPF range regardless of number of data in each HQPF range), the best MAE(prd) is found in method 3. On the other hand, the smallest MARs are found one, two, four, one, and two times in methods 1, 2, 3, 4, and 5, respectively [marked with an asterisk (*)]; the second smallest MARs are found one, five, one, one, and four times in methods 1, 2, 3, 4, and 5, respectively [marked with double asterisks (**)]. Both the All HQPF (HQPF ≥ 0) and the equal-weight average show the smallest MAR in method 5. Integrating all of the evaluations, method 5 shows the best performance. In addition, method 3 (the method using regressions obtained from three mutually exclusive data intervals) appears to be another good candidate for the deterministic forecasts of AE.
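The three scores compared in Table 3 can be sketched as (illustrative Python; `ae_obs` and `ae_prd` are the observed and predicted AE values falling in one HQPF range):

```python
import numpy as np

def deterministic_scores(ae_obs, ae_prd):
    """Return MAE(obs), MAE(prd), and MAR for one HQPF range."""
    y = np.asarray(ae_obs, dtype=float)      # observed AE values
    y_hat = np.asarray(ae_prd, dtype=float)  # predicted AE values
    return y.mean(), y_hat.mean(), np.abs(y - y_hat).mean()
```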

## 6. Verification of 95% CI forecasts

More detailed verification for method 5 (15 HQPF categorized regressions) addresses statistical reliability and resolution of the forecasts of CI. Table 4 shows detection percentage (hit rate × 100) of the HQPF 95% CI forecasts and the AE 95% CI forecasts. The detection percentages are computed at individual grid points and averaged over the entire U.S. domain. All results in Table 4 have values near 95% for each season and each lead time, indicating statistical consistency. While the spatial mean values are almost the same for the various seasons, the standard deviation (SD) from the mean shows seasonal variability: the highest SD in summer, the second highest in spring, and the smallest in fall and winter. Experiments using degrees of confidence lower than 95% show good reliability for AE CI, but show excessive hit rates for HQPF CI.

For the 2003/04 winter, hit rates are computed using both the old 10-member and new 15-member SREFs. Little change in the SP characteristics in going to the 15-member SREF results in a negligibly small hit rate difference between the 10- and 15-member SREFs. This suggests that the use of previous season regression parameters obtained from the 10-member SREF is not detrimental to the prediction of CI using the new 15-member SREF. The results presented below for the 2002/03 winter are intended to provide a detailed verification.

To investigate the frequency distribution of the CI size (|Upper_CI − Lower_CI|) and the corresponding hit rate, Fig. 7 plots the absolute frequency distribution (inset) and relative frequency distribution of the HQPF 95% CI forecast and the corresponding hit rate of the CI forecast. The relative frequency distribution of the CI (|Upper_CI − Lower_CI|) indicates that over 90% of the CI forecasts are issued in the range of 0 ≤ CI < 0.2 in. The corresponding hit rates for the CI forecasts tend to be higher in the marginal ranges (0 ≤ CI < 0.2 and CI ≥ 1.8 in.) and lower in the range of 0.2 ≤ CI < 1.8 in.

To compare CI sizes relative to their local climatology, standardized anomalies (*Z*) of the CI are computed by

*Z* = (CI − ⟨CI⟩)/*σ*, (5)

where ⟨CI⟩ is the mean CI and *σ* is the SD of the CI. The standardized anomaly is a useful representation of data even when the distribution of the data is not Gaussian (Wilks 1995). Figure 8 plots the relative frequency of occurrence of the standardized anomalies of HQPF CI forecasts and the corresponding hit rate of the CI forecasts within 10 equally likely intervals. In Fig. 8, the median values of the standardized anomalies are negative in 8 of the 10 bins, with median values in the range of −1 < *Z* ≤ 0. This indicates that approximately 80% of the CIs are smaller than the mean CI, and at least 93% of the CIs with negative *Z* values have deviations (CI − ⟨CI⟩) within one SD of the mean (i.e., *Z* > −1). The remaining two bins have positive *Z* values, indicating that approximately 20% of the CIs are greater than the mean CI, and at least 50% of the CIs with positive *Z* values have *Z* values greater than 1. This asymmetrical distribution is reasonable given that the lower bound of HQPF is zero and most events occur in the near-zero ranges. The corresponding hit rates of the CI forecasts over the 10 bins show a decreasing trend for increasing *Z*, but the hit rate never falls below 0.8. The forecasts make use of differing values of CI for which the hit rates are satisfactory. Qualitatively, this indicates that the CI forecasts do exhibit a considerable degree of resolution.
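The standardized anomaly of the CI sizes can be sketched as (illustrative only):

```python
import numpy as np

def standardized_anomaly(ci_sizes):
    """Z = (CI - mean CI) / SD of CI for a series of CI sizes."""
    ci = np.asarray(ci_sizes, dtype=float)
    return (ci - ci.mean()) / ci.std()
```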

To characterize the reliability of the CI forecasts for various HQPF amounts, Fig. 9 exhibits the relative frequency distribution of HQPF and the corresponding hit rates and under–overforecast rates of the HQPF 95% CI forecasts as a function of HQPF amount. The intervals on the axis for HQPF amount are set to the values conventionally used in HQPF forecasts. Relative frequency distribution of the HQPF shows that over 94% of the HQPFs are issued in the range of 0 ≤ QPF < 0.1 in. Going from lower to higher HQPF ranges, the hit rate decreases to about 0.6, and the overforecast rate (fraction of forecast events for which the observation is less than the lower boundary of the CI) goes up to about 0.4. While the overforecast rate increases with increasing HQPF amount, the underforecast rate (fraction of forecasts for which the observation exceeds the upper boundary of the CI) stays below 0.1 over all HQPF ranges and even decreases to zero for very heavy QPF ranges. The relatively lower rate of underforecasting for all HQPF ranges is desirable because an underforecast fails to detect potential hazards, especially at the higher precipitation amounts.

Figure 10 shows hit and underforecast rates of HQPF 95% CI forecasts versus HQPF ranges for the forecast lead times. In the 0–0.2-in. (1 in. = 25.4 mm) HQPF range, the tendency for the hit rate to decrease with increasing HQPF amounts is nearly the same for all forecast lead times. However, for the moderate and heavy HQPF ranges, the hit rate decreases are greater with increasing HQPF amounts for increasing forecast lead times. On the other hand, underforecast rates indicate relatively steady trends for HQPF amounts and forecast lead times, showing values less than 0.1 over all QPF ranges and forecast lead times. The overforecast rate is one minus the sum of the hit rate and the underforecast rate. Since underforecast rates are relatively steady over the forecast lead times, overforecast rates show opposite trends to the hit rates in the moderate and heavy HQPF ranges (not shown).

## 7. Summary and conclusions

The purpose of this study is to develop a methodology to quantify the uncertainty in manually produced 6-h HQPFs using NCEP SREFs. The first step is to find a relationship between the desired quantity, the AE of the HQPF, and a known quantity available from the ensemble forecasts. The ensemble forecasts appear to predict the uncertainty of HQPFs, as indicated by high correlations between the AE and the SP. On the basis of these high correlations, linear regression model equations for estimating the AE are derived in this study. Using the regression parameters (i.e., slope, intercept, MSE, number of data points, mean SP, etc.) derived at each horizontal grid point for each season and forecast lead time, we predict the AE associated with an individual SP and the 95% CI of the AE. From the AE CI forecast and the HQPF itself, we also predict the 95% CI of the HQPF.
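As a rough sketch, an AE prediction interval of this form can be computed with the standard simple-linear-regression prediction-interval formula (cf. Steel et al. 1997). The variable names, the Sxx term, the zero clipping, and the large-n normal approximation to the t quantile are assumptions here; the paper's exact operational formulation may differ:

```python
import math

def predict_ae_ci(sp, slope, intercept, mse, n, sp_mean, sxx, t_crit=1.96):
    """Point prediction and ~95% prediction interval for the AE given a spread SP.

    Standard simple-linear-regression prediction interval:
        half-width = t * sqrt(MSE * (1 + 1/n + (SP - mean SP)^2 / Sxx)),
    where Sxx is the sum of squared SP deviations in the training data (an
    assumed ingredient here) and t_crit ~ 1.96 approximates the t quantile
    for large n. The lower bound is clipped at zero, since an absolute
    error cannot be negative.
    """
    ae_hat = intercept + slope * sp
    half = t_crit * math.sqrt(mse * (1.0 + 1.0 / n + (sp - sp_mean) ** 2 / sxx))
    return ae_hat, max(0.0, ae_hat - half), ae_hat + half

# Hypothetical parameters for one grid point / season / lead time
ae_hat, ae_lo, ae_hi = predict_ae_ci(sp=1.5, slope=0.5, intercept=0.1,
                                     mse=0.04, n=100, sp_mean=1.0, sxx=50.0)
```

Note that the interval widens as SP moves away from the training-sample mean spread, which matches the intuition that the regression is least trustworthy for unusual spread values.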

The evaluation of regressions for data categorized according to the observed and forecasted precipitation amounts indicates that the MSEs (estimates of the variance of the residuals) for the wet subsets (HQPF ≥ 0.01 in.) are much greater than the MSE for the dry subset (0 ≤ HQPF < 0.01 in.). This suggests that best-fit lines should be computed separately for the individual HQPF subsets. To address this issue, five HQPF stratification methodologies are tested: method 1 (Dry/All Regressions), method 2 (Dry/Wet Regressions), method 3 (Dry/Light/Moderate-Heavy Regressions), method 4 (Dry/Log-Log Regressions), and method 5 (15 QPF Categorized Regressions). The best CI forecasts, satisfying the competing requirements of a narrow CI (|Upper_CI − Lower_CI|) and a high hit rate (the fraction of forecasts for which Lower_CI ≤ OBS ≤ Upper_CI), are obtained with method 5. Using the method 5 regression parameters derived for the 15 categorized HQPF ranges at each horizontal grid point for each season and forecast lead time, real-time CI forecasts for the HQPF and the AE are produced for the continental United States and are now available online twice a day (0000 and 1200 UTC) (http://www.hpc.ncep.noaa.gov/qpfci/qpfci.shtml). Figure 11 gives an example of a 6-h forecast panel valid at 0600 UTC 16 February 2003 as it appeared on the Web site. Prior to this work, only the HQPF in Fig. 11a was available. The new display gives the user compact yet comprehensive information for assessing the uncertainty of the HQPF.
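Schematically, method 5 amounts to a lookup of regression parameters by HQPF category. The boundary values below are hypothetical placeholders (the operational 15-range boundaries are not listed in this section); only the lookup mechanics are illustrated:

```python
import bisect

# Hypothetical category boundaries (in.) for the 15 HQPF ranges of method 5;
# the actual operational boundaries are not listed in this section.
EDGES = [0.01, 0.05, 0.10, 0.15, 0.20, 0.25, 0.35, 0.50,
         0.75, 1.00, 1.25, 1.50, 1.75, 2.00]  # 14 edges -> 15 categories

def hqpf_category(hqpf_in):
    """Index (0-14) of the HQPF range whose regression parameters apply."""
    return bisect.bisect_right(EDGES, hqpf_in)
```

Category 0 corresponds to the dry range (below 0.01 in.), so each forecast grid point simply selects the parameter set for its category before evaluating the regression.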

Verification is performed for a variety of seasons, CI ranges, HQPF categories, and forecast lead times. The overall detection percentage of the HQPF 95% CI forecasts remains nearly constant, close to 95%, across seasons and forecast lead times. The relative frequency distribution of HQPF and the corresponding hit and under- and overforecast rates for the HQPF 95% CI forecasts as a function of HQPF amount are investigated for the winter season. Approximately 98% of HQPFs are issued in the range 0 ≤ HQPF < 0.25 in., with a corresponding hit rate of 0.98. As the HQPF amount increases, the hit rate decreases to about 0.6 and the overforecast rate (the fraction of forecasts for which OBS < Lower_CI) rises to about 0.4. While the overforecast rate increases with increasing HQPF amount, the underforecast rate (the fraction of forecasts for which OBS > Upper_CI) stays below 0.1 over all HQPF ranges and even decreases to zero for the very heavy HQPF ranges. The underforecast rate remains similarly low and steady as forecast lead time increases. This behavior is desirable because an underforecast fails to detect potential hazards, especially at the higher precipitation amounts.

We expect to improve the reliability of the product (especially in high precipitation regimes) by increasing the amount of data used in deriving the relationship between SREF spread and error in HQPF. The present study indicates that it is possible to predict the uncertainties of the QPFs using more recent ensemble forecasts. This is possibly due to improvements in operational ensemble forecasts (e.g., forecast skill, horizontal resolution, and number of ensemble members). The method developed in this study is efficient, yielding a reasonably simple end product for operational use. Its operational application, followed by evaluation and feedback from RFC users, eventually could aid in increasing the lead time and accuracy of RFC streamflow model forecasts.

Future work involves using a combination of methods 5 and 1 by selecting one or the other on the basis of the regression parameters, and calibrating the HQPF CI at lower levels of confidence to improve reliability. Although the December 2003 upgrade to a 15-member SREF was not detrimental to successful use of 2002 winter regression parameters, future work is required to develop a method to handle operational changes to the SREF.

The authors thank Zoltan Toth for his comments, Jun Du for archival of SREF data, Tish Soulliard for providing NPVU data, and John Schaake for his insights and encouragement. We further appreciate the support provided by HPC DTB colleagues (Chris Bailey, Mark Klein, Mike Bodner, Peter Manousos, and Joey Carr). The suggestions of the anonymous reviewers have improved the clarity of this paper. This research was funded by the Advanced Hydrologic Prediction System Program of NOAA/NWS.

## REFERENCES

Accadia, C., S. Mariani, M. Casaioli, and A. Lavagnini, 2003: Sensitivity of precipitation forecast skill scores to bilinear interpolation and a simple nearest-neighbor average method on high-resolution verification grids. *Wea. Forecasting*, **18**, 918–932.

Barker, T. W., 1991: The relationship between spread and forecast error in extended-range forecasts. *J. Climate*, **4**, 733–742.

Buizza, R., 1997: Potential forecast skill of ensemble prediction and spread and skill distributions of the ECMWF Ensemble Prediction System. *Mon. Wea. Rev.*, **125**, 99–119.

Charba, J. P., D. W. Reynolds, B. E. McDonald, and G. M. Carter, 2003: Comparative verification of recent quantitative precipitation forecasts in the National Weather Service: A simple approach for scoring forecast accuracy. *Wea. Forecasting*, **18**, 161–183.

Du, J., and Coauthors, 2004: The NOAA/NWS/NCEP Short-Range Ensemble Forecast (SREF) system: Evaluation of an initial condition vs multiple model physics ensemble approach. Preprints, *16th Conf. on Numerical Weather Prediction*, Seattle, WA, Amer. Meteor. Soc., CD-ROM, 21.3.

Ferrier, B. S., 2004: Modifications of two convective schemes used in the NCEP Eta Model. Preprints, *16th Conf. on Numerical Weather Prediction*, Seattle, WA, Amer. Meteor. Soc., CD-ROM, J4.2.

Fread, D. L., and Coauthors, 1995: Modernization in the National Weather Service river and flood program. *Wea. Forecasting*, **10**, 477–484.

Fulton, R. A., J. P. Breidenbach, D.-J. Seo, D. A. Miller, and T. O'Bannon, 1998: The WSR-88D rainfall algorithm. *Wea. Forecasting*, **13**, 377–395.

Grimit, E. P., and C. F. Mass, 2002: Initial results of a mesoscale short-range ensemble forecasting system over the Pacific Northwest. *Wea. Forecasting*, **17**, 192–205.

Hamill, T. M., and S. J. Colucci, 1998: Evaluation of Eta–RSM ensemble probabilistic precipitation forecasts. *Mon. Wea. Rev.*, **126**, 711–724.

Henkel, A., and C. Peterson, 1996: Can deterministic quantitative precipitation forecasts in mountainous regions be specified in a rapid, climatologically-consistent manner with Mountain Mapper functioning as the tool for mechanical specification, quality control, and verification? Abstracts, *Fifth National Heavy Precipitation Workshop*, State College, PA, NOAA/NWS, 31 pp.

Hoke, J. E., D. W. Reynolds, E. Danaher, and K. C. McCarthy, 2000: The Hydrometeorological Prediction Center—Its future role in quantitative precipitation forecasting. Preprints, *15th Conf. on Hydrology*, Long Beach, CA, Amer. Meteor. Soc., 243–246.

Hou, D., E. Kalnay, and K. K. Droegemeier, 2001: Objective verification of the SAMEX '98 ensemble forecasts. *Mon. Wea. Rev.*, **129**, 73–91.

Houtekamer, P. L., 1993: Global and local skill forecasts. *Mon. Wea. Rev.*, **121**, 1834–1846.

Janjić, Z. I., 1994: The step-mountain Eta coordinate model: Further developments of the convection, viscous sublayer, and turbulence closure schemes. *Mon. Wea. Rev.*, **122**, 927–945.

Juang, H.-M. H., S.-Y. Hong, and M. Kanamitsu, 1997: The NCEP Regional Spectral Model: An update. *Bull. Amer. Meteor. Soc.*, **78**, 2125–2143.

Kain, J., and J. M. Fritsch, 1993: Convective parameterization for mesoscale models: The Kain–Fritsch scheme. *The Representation of Cumulus Convection in Numerical Models of the Atmosphere*, *Meteor. Monogr.*, No. 46, Amer. Meteor. Soc., 165–170.

McDonald, B. E., and T. M. Graziano, 2000: The NWS QPF verification program. Preprints, *15th Conf. on Probability and Statistics in the Atmospheric Sciences*, Asheville, NC, Amer. Meteor. Soc., 61–64.

McDonald, B. E., and T. M. Graziano, 2001: The National Precipitation Verification Unit (NPVU): Operational implementation. Preprints, *Symp. on Precipitation Prediction: Extreme Events and Mitigation*, Albuquerque, NM, Amer. Meteor. Soc., 71–74.

Mesinger, F., 1996: Improvements in quantitative precipitation forecasting with the Eta regional model at the National Centers for Environmental Prediction: The 48-km upgrade. *Bull. Amer. Meteor. Soc.*, **77**, 2637–2649.

Olson, D. A., N. W. Junker, and B. Korty, 1995: Evaluation of 33 years of quantitative precipitation forecasting. *Wea. Forecasting*, **10**, 498–511.

Rogers, E., and Coauthors, 1996: Changes to the operational "early" Eta analysis forecast system at the National Centers for Environmental Prediction. *Wea. Forecasting*, **11**, 391–413.

Scherrer, S. C., C. Appenzeller, P. Eckert, and D. Cattani, 2004: Analysis of the spread–skill relations using the ECMWF Ensemble Prediction System over Europe. *Wea. Forecasting*, **19**, 552–565.

Steel, R. G. D., J. H. Torrie, and D. A. Dickey, 1997: *Principles and Procedures of Statistics: A Biometrical Approach*. 3d ed. McGraw-Hill Series in Probability and Statistics, McGraw-Hill, 666 pp.

Stensrud, D. J., and N. Yussouf, 2003: Short-range ensemble predictions of 2-m temperature and dewpoint temperature over New England. *Mon. Wea. Rev.*, **131**, 2510–2524.

Stensrud, D. J., and Coauthors, 1999: Using ensembles for short-range forecasting. *Mon. Wea. Rev.*, **127**, 433–446.

Toth, Z., and E. Kalnay, 1997: Ensemble forecasting at NCEP and the breeding method. *Mon. Wea. Rev.*, **125**, 3297–3319.

Toth, Z., Y. Zhu, and T. Marchok, 2001: The use of ensembles to identify forecasts with small and large uncertainty. *Wea. Forecasting*, **16**, 463–477.

Whitaker, J. S., and A. F. Loughe, 1998: The relationship between ensemble spread and ensemble mean skill. *Mon. Wea. Rev.*, **126**, 3292–3302.

Wilks, D. S., 1995: *Statistical Methods in the Atmospheric Sciences: An Introduction*. Academic Press, 467 pp.

# APPENDIX

## Approximate Area Average Preserving Interpolation

The approximate area average preserving interpolation (AAAPI) remaps data from one grid to another, automatically accounting for local differences in the resolutions and map projections of the two grids. It is designed to meet the following requirements:

1. approximate preservation of area averages,
2. high computational speed, and
3. portable, simple code that is not tailored to specific input or output grids.

If the input grid is of comparable or coarser resolution than the output grid, the AAAPI uses bilinear interpolation. It can be shown that requirement 1 is met to first-order approximation for a smaller output grid box completely embedded within an input grid box, and bilinear interpolation clearly satisfies requirements 2 and 3. When a smaller output grid box is not completely embedded within a single input grid box, bilinear interpolation is less representative of the area average. Accadia et al. (2003) quantified the effects of bilinear interpolation in terms of changes in skill of interpolated precipitation forecasts. In their case, the input and output grids are of comparable resolution, and many output grid boxes are not completely embedded within an input grid box. When they used a somewhat more complicated and computationally expensive remapping algorithm, dividing each output grid box into 25 subboxes following a method similar to that of Mesinger (1996), the changes in skill were reduced but not eliminated.
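For reference, a minimal sketch of bilinear interpolation within a single input grid cell (generic, not the operational implementation; corner ordering and names are ours):

```python
def bilinear(f00, f10, f01, f11, tx, ty):
    """Bilinear interpolation within one input grid cell.

    f00, f10, f01, f11 are the values at the four cell corners, and
    (tx, ty) in [0, 1] are the fractional coordinates of the output
    point inside the cell.
    """
    bottom = f00 * (1.0 - tx) + f10 * tx   # interpolate along x at ty = 0
    top = f01 * (1.0 - tx) + f11 * tx      # interpolate along x at ty = 1
    return bottom * (1.0 - ty) + top * ty  # interpolate along y
```

The interpolated value is a convex combination of the four corner values, which is why it approximates the area average well only when the output box is embedded within the cell.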

For the case when the input grid is of fine resolution compared to the output grid, the AAAPI uses a different algorithm to establish an area average for the output grid box. A circumscribing circle of radius *D* (in units of input grid coordinates) approximates the output grid box. Any input point whose distance *d* from the center of the output grid box is less than *D* is considered to contribute to the area average for the output box. The weight given to the point is based on an estimate of how much of the area of the input grid box is contributing to the area covered by the output grid box. The weight is less than 1 if (*D* − *d*) ≤ 0.5, decreasing linearly to 0.5 as *d* approaches *D*; otherwise, the weight is 1. If the number of contributing input grid boxes is less than four or the sum of the weights is less than 2, the algorithm reverts to bilinear interpolation.
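The weighting rule described above can be sketched as follows. Grid bookkeeping and the bilinear fallback are simplified, and the function signature is illustrative rather than taken from the operational code:

```python
import math

def aaapi_average(center, points, D):
    """Weighted area average for one output grid box (sketch of the rule above).

    center: (x, y) of the output box center in input grid coordinates.
    points: iterable of (x, y, value) input grid points.
    D:      radius of the circumscribing circle, in input grid units.
    Returns None when fewer than four points contribute or the weight sum
    is below 2, signalling the caller to revert to bilinear interpolation.
    """
    wsum = vsum = 0.0
    count = 0
    for x, y, v in points:
        d = math.hypot(x - center[0], y - center[1])
        if d >= D:
            continue  # outside the circumscribing circle
        # Weight 1 in the interior; within 0.5 of the circle's edge it
        # falls linearly from 1 (at D - d = 0.5) to 0.5 (at d = D).
        w = 1.0 if (D - d) > 0.5 else 0.5 + (D - d)
        wsum += w
        vsum += w * v
        count += 1
    if count < 4 or wsum < 2.0:
        return None
    return vsum / wsum

# Hypothetical 5-point neighborhood around an output box center
pts = [(0, 0, 1.0), (0.5, 0, 2.0), (-0.5, 0, 2.0), (0, 0.5, 2.0), (0, -0.5, 2.0)]
avg = aaapi_average((0.0, 0.0), pts, 2.0)  # all weights 1 here -> 1.8
```

The tapered edge weights approximate the fraction of each input grid box lying inside the circle, which is what makes the result an approximate area average rather than a simple nearest-neighbor mean.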

HQPF stratification categories for five methodologies.

Method comparisons of mean CI size and hit rate of the HQPF 95% CI 0–6-h forecasts for the categorized HQPF ranges at the NW and SE sites. The narrowest two CIs and their associated hit rates are shown in boldface. In the M1 vs M5 comparison column, the CI increase is the percent increase from the smaller to the larger CI, and the hit rate increase is the percent increase from the smaller hit rate (associated with the smaller CI) to the larger hit rate (associated with the larger CI).

Method comparisons for mean predicted AE [MAE(prd)] and mean absolute residual (MAR) computed for the categorized HQPF ranges at all U.S. grid points. The MAE(prd) closest to MAE(obs) is shown enclosed by parentheses. The smallest MAR and the second smallest MAR are denoted with an asterisk (*) and double asterisks (**), respectively.

Detection percentage of the HQPF 95% CI forecasts (HQPF det. %) and the AE 95% CI forecasts (AE det. %) for season and lead time. The values of detection percentage represent U.S. mean ± std dev.