1. Introduction
Quantitative precipitation forecasts (QPFs) are important for applications ranging from flash flood forecasting to long-term agricultural and water resource management. Yet accurately predicting precipitation is one of the most difficult tasks in meteorology (Fritsch et al. 1998). While enhancements to the resolution and physics of numerical weather prediction (NWP) models have significantly increased the accuracy of forecasts of many meteorological variables in recent years, similar improvements in the accuracy of precipitation forecasts have not been attained because of the physical complexity of precipitation processes, and because the small temporal and spatial scales involved in such processes cannot be resolved by the numerical models (Olson et al. 1995).
Limits in available computing power constrain the resolution and physical detail that can be used in a model. One approach for bridging the gap between the model resolution and the desired forecast scale is to downscale the NWP model output by relating it to observations at specific locations of interest, that is, to “localize” NWP model output. The most widely used example of this is Model Output Statistics (MOS; Glahn and Lowry 1972). In the context of MOS, multiple linear regression is used to relate temperature, cloud cover, probability of precipitation, and other variables of interest at specific locations to NWP model output in a historic dataset, and these derived relationships are in turn used to predict the variables from real-time NWP model output.
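As a schematic illustration of the MOS idea (a minimal sketch, not the operational MOS system; the array names and data below are hypothetical), a multiple linear regression is fit on a historical sample of model output and station observations, and the derived relationship is then applied to new model output:

```python
import numpy as np

# Illustrative MOS-style localization: relate an observed variable at a
# station to NWP model output via multiple linear regression.
# X_hist: historical model output (n_cases x n_predictors); y_hist: the
# matching station observations. Both are synthetic placeholders here.
rng = np.random.default_rng(0)
X_hist = rng.normal(size=(500, 4))          # e.g., gridpoint RH, w, thickness, ...
y_hist = X_hist @ np.array([0.5, 1.2, -0.3, 0.8]) + rng.normal(scale=0.1, size=500)

# Fit regression coefficients on the historical (developmental) dataset.
A = np.column_stack([np.ones(len(X_hist)), X_hist])   # intercept column
coef, *_ = np.linalg.lstsq(A, y_hist, rcond=None)

# Apply the derived relationship to real-time model output.
x_new = rng.normal(size=4)
forecast = coef[0] + x_new @ coef[1:]
```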
In addition to linear regression, a number of other data analysis techniques can be used to relate NWP model output to observed variables of interest. They include generalized additive models (Vislocky and Fritsch 1995a) and self-learning algorithms such as abductive machine learning (Abdel-Aal and Elhadidy 1995) and goal-oriented pattern detection (Dumais and Young 1995). Artificial neural networks are another approach that has been applied to the prediction of meteorological variables, including tornadoes (Marzban and Stumpf 1996), severe weather (McCann 1992), and lightning (Frankel et al. 1995).
A number of neural network applications to precipitation forecasting have also been developed. Navone and Ceccatto (1994) used neural networks to predict total rainfall for the Indian monsoon season; Kuligowski and Barros (1998) predicted 0–6-h precipitation amounts based on observed precipitation at nearby gauges during the previous 6 h. Lindner and Krein (1993) used neural networks to predict the maximum daily rainfall in the state of Mississippi based on radiosonde data. An approach that postprocesses NWP model output was taken by Hall (1996), who used neural networks to predict the 24-h probability of precipitation and mean precipitation amount for a gauge network in the Dallas–Fort Worth metroplex based on output from the National Centers for Environmental Prediction (NCEP) Eta Model.
The work presented here uses a backpropagation neural network to predict 6-h precipitation amounts during the 0–24-h time period (i.e., 0–6, 6–12, 12–18, and 18–24 h) for four specific locations in two drainage basins in the middle Atlantic region of the United States, based on nearby gridpoint values from the NCEP Nested Grid Model (NGM). The next section of this paper presents a description of the dataset and the selection of predictor variables, as well as a description of the backpropagation neural network used in this study. Section 3 contains the results and a comparison to linear regression, followed by conclusions and a summary in the final section.
2. Methodology
a. Dataset
Precipitation gauges from two drainage basins were chosen for this study (Fig. 1). The first basin is the Youghiogheny River above Confluence, Pennsylvania, which covers an area of 1029 mi2 (2634 km2) in western Pennsylvania and western Maryland; the second is Swatara Creek above Harper Tavern in southeastern Pennsylvania, which covers an area of 337 mi2 (863 km2). Each basin contains two precipitation gauges from the National Climatic Data Center (NCDC) archive: Confluence and McHenry in the Youghiogheny basin, and Joliett and Lebanon in the Swatara basin.
The predictands for this study were the 6-h precipitation amounts for four lead times: 0–6, 6–12, 12–18, and 18–24 h. Since the NCDC data are hourly, the data for these four gauges were aggregated into 6-h amounts for the 6-h time periods that corresponded to the model output cycle: 0000–0600, 0600–1200, 1200–1800, and 1800–0000 UTC.
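The aggregation itself is a simple binning of the hourly series into the four model-cycle windows. A minimal sketch, assuming a hypothetical hourly gauge series (the dates, values, and column name are placeholders):

```python
import pandas as pd

# Hypothetical hourly gauge record; the NCDC archive is hourly, and the
# predictand is the 6-h total aligned with the model output cycle
# (0000-0600, 0600-1200, 1200-1800, and 1800-0000 UTC).
hourly = pd.Series(
    0.0,
    index=pd.date_range("1992-01-14 00:00", periods=48, freq="h"),
    name="precip_in",
)
hourly.loc["1992-01-14 13:00":"1992-01-14 17:00"] = 0.1  # example rain event

# Sum into 6-h bins anchored at 0000 UTC so bins match the model cycle.
six_hourly = hourly.resample("6h", label="left", closed="left").sum()
```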
The predictors were archived NGM output values for the four grid points closest to each basin at both the beginning and end of each 6-h period of interest (these grid points are represented by the solid squares in Fig. 1). Although the NGM model grid has a resolution of approximately 83 km, the data are archived at the much coarser resolution of approximately 190 km. Three of the NGM forecast lead times corresponded directly with the desired lead times of the neural network forecasts: 6–12, 12–18, and 18–24 h. For the 0–6-h lead time, current NGM output would not be available to the operational forecaster because of the time required to run the model and distribute the output; consequently, variables from the 12–18-h forecast period of the previous NGM run were used instead. The dataset had a length of 5 yr, from December 1987 to November 1992, and was divided into four seasons: winter (December–February), spring (March–May), summer (June–August), and fall (September–November).
Since the two most important large-scale factors in the production of precipitation are moisture and vertical lift, the following variables were taken directly from or derived from the NGM output fields.
1) Moisture
PW: total column precipitable water;
RH: column-average relative humidity, and single-level relative humidity at 1000, 950, 850, 700, and 500 hPa; and
θe: equivalent potential temperature at the same levels as relative humidity.
2) Vertical lift
w: vertical velocity,
u, υ: zonal/meridional velocity (to account for orographic forcing),
DIV: horizontal divergence (related to vertical velocity by the continuity equation),
ADVζ: absolute vorticity advection (related to vertical velocity by the omega equation),
ADVT: thermal advection (omega equation),
ADVTHK: advection of 1000–500-hPa thickness by the wind at each level (omega equation), and
∂θ/∂z, ∂θe/∂z: vertical lapse rates of θ and θe for 1000–950, 950–850, 850–700, and 700–500 hPa (measures of stability).
Other model variables that were either directly extracted or derived from the NGM output included THK (1000–500-hPa thickness), prs (surface pressure), z (geopotential height), θ (potential temperature), and ζ (absolute vorticity). The latter three variables were obtained or derived for the 1000-, 950-, 850-, 700-, and 500-hPa levels. (If the ground surface is above a particular pressure level, values are still computed for that level.) Furthermore, the variables were all considered both at the beginning and end of each 6-h forecast period of interest for the four grid points nearest to the location of interest. The only exception was vorticity advection, for which a mean value was computed for the grid cell formed by four NGM grid points. Overall, this results in a total of 528 possible predictor variables.
The NGM precipitation forecasts were not included in the predictor set because the localized forecasts would later be compared against the NGM’s own precipitation forecasts, and including them might be perceived as an unfair advantage. In operational applications, however, these data could also be included in the predictor set.
b. Variable selection
A predictor set as large as that proposed in the previous section (528 variables) would require a prohibitive amount of computer resources to train the neural network. In addition, the use of too many predictors often results in overfitting, that is, the development of predictor–predictand relationships that do not generalize outside the training data. When overfitting occurs, the resulting equations fit the training data very well, but perform very poorly on independent data. Therefore, the number of predictors should be reduced.
It is difficult to determine which predictors are best to use in the neural network based solely on an understanding of the system physics: the predictors with the strongest physical relationship to the predictand may not actually be the best to use, because the ability of the NWP model to forecast the predictors of interest must also be considered. For instance, a predictor with a very strong relationship to 6-h precipitation amounts may perform relatively poorly in this application if the NWP model does not forecast its value well, while a predictor with a weaker relationship to 6-h precipitation may perform relatively well if its value is predicted accurately by the NWP model.
Consequently, an objective variable screening technique is needed. A widely used method is forward screening regression (Glahn and Lowry 1972). The screening begins by selecting the variable with the highest correlation to the predictand. Every possible combination of this first variable with each of the other variables in the predictor set is then evaluated, and the combination that explains the greatest amount of the variance in the predictand is selected. This process is repeated for every possible combination of the first two variables with each of the remaining ones, and subsequently for each updated set of selected variables. Additional variables are appended in this manner until the desired number of predictors has been obtained, or until the rate of improvement in the explained variance falls below some predetermined threshold.
Performing this type of screening with a backpropagation network in place of regression is impractical, since training a neural network takes far longer than solving a system of regression equations. Consequently, forward screening regression was used to select the predictor variables.
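For concreteness, a minimal sketch of forward screening is given below. It is illustrative only (the function and array names, and the use of Python, are assumptions rather than the authors' implementation): predictors are added greedily according to the explained variance of an ordinary least squares fit.

```python
import numpy as np

def explained_variance(X, y):
    """R^2 of an ordinary least squares fit of y on the columns of X."""
    A = np.column_stack([np.ones(len(X)), X])      # intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

def forward_screen(X, y, n_select):
    """Greedily add the predictor that most increases explained variance."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_select and remaining:
        best = max(remaining,
                   key=lambda j: explained_variance(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```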
The explained variances used to evaluate the forward screening approach are typically computed from the data that were used to develop the regression equations. However, in this work the explained variances were computed from an independent dataset separate from the developmental data. Specifically, the regression equations were developed using 4 of the 5 yr of data for each season, and then the explained variance was evaluated using data from the remaining year. The objective was to assure that the variables selected were suitable for application to independent data and did not just represent a “memorization” of the characteristics of the developmental dataset.
To maximize the amount of available data, a cross-validation scheme (Elsner and Schmertmann 1994; Michaelsen 1987) was used: variable selection was simultaneously performed for all five possible combinations of 4 yr of development data and 1 yr of evaluation data. The resulting explained variances were based on 5 yr of independent data rather than one, which should produce an even more robust final set of predictor variables.
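The cross-validated selection criterion can be sketched as follows, under the simplifying assumption that the five per-year explained variances are averaged (the study pools 5 yr of independent data; `years` is a hypothetical array giving the year of each case, and `candidate` is a list of predictor column indices):

```python
import numpy as np

def cv_explained_variance(X, y, years, candidate):
    """Mean explained variance of a candidate predictor set on held-out
    years, leaving each of the five years out in turn."""
    scores = []
    for held_out in np.unique(years):
        train, test = years != held_out, years == held_out
        A_tr = np.column_stack([np.ones(train.sum()), X[train][:, candidate]])
        coef, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        A_te = np.column_stack([np.ones(test.sum()), X[test][:, candidate]])
        resid = y[test] - A_te @ coef
        scores.append(1.0 - resid.var() / y[test].var())
    return float(np.mean(scores))
```

Substituting this score for the in-sample explained variance in the forward_screen sketch above yields a cross-validated version of the screening.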
These two evaluation approaches were compared directly: one set of predictor variables was chosen using forward screening without cross validation (explained variance computed on the developmental dataset), and another using forward screening with the cross-validation scheme described above. The neural network forecasts based on the cross-validated predictor set had lower errors than those based on the set selected without cross validation.
The optimum number of predictor variables to use is difficult to determine, just as it is not immediately obvious which variables to use. Sensitivity experiments showed that the model generally performed best with 25 predictors, and to minimize the amount of effort required for development, this number of predictors was used for all locations, seasons, and lead times rather than altering it for individual situations. The specific variables contained in the set of 25 varied among the gauges, lead times, and seasons.
Table 1 contains two example sets of predictor variables: one for 6–12-h forecasts at Lebanon during the summer, and the other for 12–18-h forecasts at Confluence during the winter. There are significant differences between these two sets, probably due in part to the differences in season and location. Furthermore, as mentioned previously, the skill of the model at forecasting the value of a given predictor is important and may compensate in part for a relatively weak relationship between that predictor and the predictand in the real atmosphere.
As a final note on variable selection, the considerable computational expense of forward screening could become a burden in an operational environment. However, it is an effective (and necessary) means of accounting for the correlations among the predictor variables, as well as the strength of the relationship between each predictor and the predictand, both of which must be considered when selecting predictors. The robustness of the forward screening approach was verified in experiments in which variables were removed from the predictor set, first one and then five at a time, and the performance of the neural network was reevaluated. The network-generated forecasts were generally less accurate when variables selected early in the screening process were removed than when variables selected later were eliminated. This is consistent with the later variables being strongly correlated with the earlier ones, which reduces their effective contribution to the regression.
c. Backpropagation neural network specifications
A detailed description of backpropagation neural networks is given in Kuligowski and Barros (1998) and in the other references pertaining to neural networks that were mentioned in the introduction. Additional information on neural networks in general can be found in Cheng and Titterington (1994) and Ripley (1994). In brief, a backpropagation neural network is an information processing system that produces an output value (in this case, a forecast of 6-h precipitation amount at a gauge) given one or more inputs (in this case, gridpoint forecast values from the NGM). By repeatedly presenting known input and output values to the neural network, the network can “learn” the relationships between them via a gradient-descent error-reduction algorithm that seeks to reduce the mean squared error with each iteration of training. These learned input–output relationships can then be applied to new input values to produce output values in a forecast setting.
The neural network used in this study was developed by the authors using FORTRAN code in a UNIX environment. The network structure consisted of 25 input nodes, a single output node, and 11 hidden-layer nodes. The selection of 11 hidden-layer nodes was made in accordance with Fletcher and Goss (1993), who found that the ideal number of hidden-layer nodes existed in the range (2√n + m) to (2n + 1), where n is the number of input nodes and m the number of output nodes. Sensitivity studies of the optimum number of hidden-layer nodes within this range were performed by the authors. No appreciable differences in performance were found, so the minimum number of hidden-layer nodes was chosen to minimize computing time.
The sigmoid transfer function used in Kuligowski and Barros (1998) was used in the hidden-layer nodes in this study, with no transfer function applied to the output nodes. The values of the learning constants used were higher than in Kuligowski and Barros (1998), with a value of 0.05 for β (the fraction of the product of the output error and the input or hidden-layer node value that is applied to the change in weight) and 0.005 for α (the fraction of the previous weight change that is applied to the current weight change as a “momentum term”) providing a suitable rate of learning while maintaining a stable training process.
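A minimal numpy sketch of a single training update for this architecture follows, assuming the 25–11–1 structure, sigmoid hidden nodes, a linear output node, and the β and α values quoted above. The weight initialization and data handling are assumptions; the original code was FORTRAN, and this is a schematic reconstruction, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 25, 11                 # input and hidden-layer nodes
beta, alpha = 0.05, 0.005            # learning rate and momentum term

W1 = rng.normal(scale=0.1, size=(n_in, n_hid)); b1 = np.zeros(n_hid)
W2 = rng.normal(scale=0.1, size=(n_hid, 1));    b2 = np.zeros(1)
vW1, vb1, vW2, vb2 = (np.zeros_like(p) for p in (W1, b1, W2, b2))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(x, t):
    """One backpropagation update for a single (input, target) pair."""
    h = sigmoid(x @ W1 + b1)          # hidden layer: sigmoid transfer
    yhat = h @ W2 + b2                # output node: no transfer function
    err = yhat - t                    # forecast error
    # Gradients of the squared error with respect to each weight set.
    gW2 = np.outer(h, err); gb2 = err
    dh = (W2 @ err) * h * (1.0 - h)   # backpropagated hidden-layer signal
    gW1 = np.outer(x, dh);  gb1 = dh
    # New weight change = -beta * gradient + alpha * previous weight change.
    for p, v, g in ((W1, vW1, gW1), (b1, vb1, gb1),
                    (W2, vW2, gW2), (b2, vb2, gb2)):
        v *= alpha
        v -= beta * g
        p += v
    return float(err[0]) ** 2
```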
Cross validation was applied during neural network training, using the same time periods and the same approach as in the forward screening regression. This allowed the neural network to be evaluated on independent data while maximizing the amount of data available for statistical evaluation. Each neural network was trained for 40 000 cycles, and its performance on independent data was evaluated every 1000 cycles; the cycle with the lowest root-mean-squared error (rmse) on independent data was chosen for verification purposes.
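This cycle-selection (checkpointing) procedure can be sketched generically as follows; train_for, evaluate_rmse, and get_weights are hypothetical callables standing in for whatever training and scoring routines are available:

```python
import copy
import numpy as np

def train_with_checkpointing(train_for, evaluate_rmse, get_weights,
                             n_cycles=40_000, check_every=1_000):
    """Train for n_cycles total, score on independent data every
    check_every cycles, and keep the weights with the lowest rmse."""
    best_rmse, best_weights = np.inf, None
    for _ in range(n_cycles // check_every):
        train_for(check_every)          # run the next block of cycles
        rmse = evaluate_rmse()          # rmse on the held-out data
        if rmse < best_rmse:
            best_rmse = rmse
            best_weights = copy.deepcopy(get_weights())
    return best_weights, best_rmse
```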
Because the majority of the gauges in the NCDC dataset are Fischer and Porter gauges, which record precipitation with a resolution of only 0.10 in. (2.5 mm) (Kuligowski 1997), two variants of the neural network model were tested. In the first, the rain gauge data were used with no allowance for measurement resolution. In the second, all positive errors of 0.09 in. (2.3 mm) or less were ignored during training, and the final forecasts were rounded down to simulate the resolution of the Fischer and Porter gauge. The rounding behavior of the gauge itself cannot be reproduced exactly, since it depends on the amount of water in the gauge at the beginning of the 6-h period of interest. However, any approach that trains the network and rounds the final forecasts in a consistent manner should be suitable, since the rounding merely eliminates errors that were ignored during training to account for uncertainty in the gauge readings. Of the two variants, the network with the built-in rounding scheme exhibited better overall performance, so only its results are presented in the next section.
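One plausible implementation of the error tolerance and the rounding step is sketched below. It assumes the reading that a forecast exceeding the observation by no more than 0.09 in. is treated as error free during training; the text does not spell out the exact form, so this is an interpretation, not the authors' code.

```python
import numpy as np

GAUGE_RES = 0.10   # Fischer and Porter gauge resolution (in.)
TOLERANCE = 0.09   # positive errors at or below this are ignored

def training_error(forecast, observed):
    """Squared error, with sub-resolution positive errors zeroed out."""
    err = forecast - observed
    if 0.0 < err <= TOLERANCE:
        err = 0.0                      # gauge could not resolve the difference
    return err ** 2

def round_down(forecast):
    """Round a final forecast down to the gauge resolution."""
    return np.floor(forecast / GAUGE_RES) * GAUGE_RES
```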
3. Results
Figures 2a–c are scatterplots of the forecasts for all four gauges, for all seasons and lead times, produced by the backpropagation neural network, by linear regression, and by bilinearly interpolating the NGM precipitation forecasts to the gauge locations. (Here, the term “forecast” is applied to the original NGM output as well as to the localized NGM output.) Two features of these scatterplots are immediately obvious. The first is a systematic dry bias (the slope of the solid best-fit line is less than that of the dashed 1:1 line); the second is that this dry bias is smaller for the neural network forecasts than for the regression and interpolated NGM forecasts.
Figures 3a–c show three different breakdowns of the rmse of the three forecast approaches from the previous paragraph, plus a consensus forecast consisting of the arithmetic average of the neural network and linear regression. This consensus forecast was generated in accordance with Vislocky and Fritsch (1995b), who used an arithmetic average of the NGM and Limited Fine Mesh model MOS forecasts for several meteorological variables to demonstrate that a consensus of different forecast systems often produces more accurate forecasts than any system alone.
The seasonal breakdown in Fig. 3a shows that the highest rmse values occur during the summer months and the lowest during the winter. This is to be expected, since summertime precipitation in the northeastern United States is primarily driven by convective mechanisms that are poorly resolved by most operational NWP models, while in the wintertime, larger-scale forcings predominate and are more accurately depicted by the models. Postprocessing the NGM data produces forecasts with lower rmse’s than the bilinearly interpolated NGM forecasts in all cases, with lower errors for the linear regression forecasts than for the neural network forecasts, and lower errors still (except in the fall) for the consensus forecasts, which is consistent with Vislocky and Fritsch (1995b).
The distribution of rmse by lead time is shown in Fig. 3b. There is little difference in rmse for the first three time periods, though the similarity between the performance of the 0–6-h forecasts and the 12–18-h forecasts is to be expected since the former are based on the 12–18-h forecasts of the previous NGM model run. There is some increase in rmse for the 18–24-h forecasts, most noticeably for the neural network and unprocessed NGM forecasts. In all cases, the regression forecasts have somewhat lower errors than the neural network forecasts, with consensus having the lowest errors of all (except for the 18–24-h lead time).
The distribution by gauge in Fig. 3c shows a reasonable degree of consistency in performance from one gauge to the next. The Swatara basin forecasts (Joliett and Lebanon) have higher errors than the Youghiogheny basin forecasts (Confluence and McHenry). One possible explanation for this difference is the orographic enhancement of precipitation in the Swatara, which is directly exposed to incoming weather systems. Nevertheless, there are too few gauges in the sample to draw firm conclusions on this matter. Again, the linear regression forecasts have lower errors than the neural network forecasts (except at Lebanon), and consensus forecasts perform the best of all (except at Confluence).
A statistical comparison of the four forecast approaches is given in Table 2. As the plots already showed, all three postprocessing methods reduce the rmse relative to the interpolated NGM precipitation forecasts, but the regression forecasts have a lower rmse than the neural network forecasts, and the consensus forecasts have the lowest rmse of all. A comparison of the slopes of the best-fit lines of forecasts versus observations shows that the neural network forecasts have less systematic bias than the linear regression and consensus forecasts, although the correlation coefficients (r values) indicate that the latter two approaches produce less scatter about the best-fit line. The lower systematic bias of the neural networks is also reflected in the ratio of the variance of the forecasts to that of the observations (explained variance), which shows a narrower distribution of forecast values for the linear regression and consensus forecasts than for the neural network forecasts. A narrower distribution of forecast values can be associated with a systematic dry bias in variables with a fixed lower bound, such as precipitation amount.
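The statistics in Table 2 are straightforward to compute. A minimal sketch, assuming `forecast` and `observed` are numpy arrays of matched forecast–observation pairs:

```python
import numpy as np

def verification_stats(forecast, observed):
    """rmse, best-fit slope and intercept (forecasts regressed on
    observations), correlation, and forecast/observation variance
    ratio, as reported in Table 2."""
    rmse = np.sqrt(np.mean((forecast - observed) ** 2))
    slope, intercept = np.polyfit(observed, forecast, 1)
    r = np.corrcoef(forecast, observed)[0, 1]
    ev = forecast.var() / observed.var()
    return {"rmse": rmse, "slope": slope, "intercept": intercept,
            "r": r, "EV": ev}
```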
Since operational verification of precipitation forecasts is often performed for 24-h amounts rather than 6-h amounts (e.g., Olson et al. 1995), the threat scores for 24-h forecasts, formed by combining the four 6-h forecasts issued at a given time, are shown in Fig. 4b. The results are consistent with the performance of the 6-h forecasts: the regression-based forecasts are superior to the neural network–based forecasts for lighter precipitation events, while for heavier events the neural network–based forecasts are superior. For both time periods, the threat scores of the consensus forecasts generally fall between those of the two component forecasts.
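The threat score used here is the standard critical success index for exceedance of a precipitation threshold; a minimal sketch (array names assumed):

```python
import numpy as np

def threat_score(forecast, observed, threshold):
    """Threat score (critical success index) for a given threshold:
    hits / (hits + misses + false alarms)."""
    f = forecast >= threshold
    o = observed >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    denom = hits + misses + false_alarms
    return hits / denom if denom else np.nan
```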
The threat score values are consistent with the rmse analysis presented in Fig. 5: the backpropagation network forecasts have lower rmse’s than the linear regression forecasts for observed precipitation amounts of 0.20 in. (5.1 mm) or greater in 6 h, and lower rmse’s than consensus for amounts of 0.30 in. (7.6 mm) or greater. For a threshold of 1.00 in. (25.4 mm), the reduction in rmse over regression exceeds 10%.
This comparison demonstrates the need for forecasters and forecast developers to use statistical measures that are tailored to the forecast problem of interest, because different statistics highlight different aspects of the performance of a particular forecast system. In this case, forecasters concerned with overall forecast accuracy may find the regression-based or consensus forecasts to be the most attractive options, while those more concerned with heavy precipitation events would find the neural network–based forecasts more suited to their needs.
Regardless of the method selected, there is considerable value in postprocessing NWP model forecasts to downscale the model output and enhance forecast accuracy. This is illustrated by the improvements in the correlation coefficients between observations and forecasts at all lead times as shown in Figs. 6a,b for the two river basins used in this study. The benefit of detecting some of the small-scale variations in precipitation that are not captured on the scale of the NWP model grid can be illustrated by spatially distributing the point forecasts throughout a basin. Figures 7a–c show spatial distributions of observed and predicted precipitation (12–18-h lead time) for 1200–1800 UTC on 14 January 1992 for the Swatara Creek basin. They were constructed by applying the hypsometric method to the point observations and predictions at Joliett and Lebanon, Pennsylvania (see Fig. 1 for location and comparison to NGM archive grid scale).
Specifically, these distributions were constructed by determining a linear relationship between the observed (or predicted) precipitation amount and the elevation at Joliett (497 m) and Lebanon (137 m). This relationship was then applied to the distribution of area with elevation in the Swatara, the hypsometric curve inferred from the basin’s digital elevation model (DEM), to generate spatial fields of precipitation (observations or forecasts) for the entire basin. The point of this exercise is to illustrate, for a particular case study, how differences in point forecasts propagate into areal estimates of precipitation. The differences in observed precipitation between Joliett (15.2 mm) and Lebanon (5.1 mm) (Fig. 7a) were captured much more clearly by the neural network–based forecasts at Joliett (17.8 mm) and Lebanon (5.1 mm) (Fig. 7b) than by the bilinearly interpolated NGM forecasts at Joliett (3.0 mm) and Lebanon (2.5 mm) (Fig. 7c). This additional detail can in turn significantly enhance the accuracy of the areal QPF used for hydrologic forecasting, as demonstrated by comparing the estimates of observed and forecast mean areal precipitation.
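As a schematic of the hypsometric step for this case, the sketch below fits a line through the two (elevation, precipitation) points from the text and applies it to a stand-in elevation distribution; the DEM values are placeholders, so the resulting mean areal precipitation is illustrative only.

```python
import numpy as np

elev = np.array([497.0, 137.0])      # Joliett and Lebanon elevations (m)
obs = np.array([15.2, 5.1])          # observed 6-h precipitation (mm)

# Linear precipitation-elevation relationship through the two gauges.
slope, intercept = np.polyfit(elev, obs, 1)

# Apply the relationship to the basin's elevation distribution; a real
# application would use the hypsometric curve from the basin DEM.
dem_elevations = np.linspace(100.0, 550.0, 1000)   # placeholder DEM cells
cell_precip = np.clip(slope * dem_elevations + intercept, 0.0, None)
mean_areal_precip = cell_precip.mean()             # areal estimate (mm)
```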
4. Conclusions
Quantitative precipitation forecasting remains a difficult but vitally important task for the hydrometeorological community, and numerous means have been explored for improving the accuracy of these forecasts. Improvements in the accuracy of QPF from NWP models have lagged behind that of other variables, at least in part because many of the processes that affect precipitation production occur on spatial scales smaller than those depicted by current models. Enhancements in model grid resolution have been improving, and are expected to continue to improve, the accuracy of model QPF, but localization of NWP model output to specific points also affords opportunities for improvement, as evidenced by the success of regression-based MOS.
Backpropagation neural networks provide an alternative to linear regression for improving the accuracy of QPF based on NWP model output. The results may vary from region to region depending on the quality of the NWP forecasts and on the availability of historical records of both observations and model output with which to train and evaluate the neural network. For the applications presented here, although the overall accuracy of the forecasts as measured by rmse and correlation coefficients is slightly lower than that obtained using linear regression, the neural network approach provides significant improvements for moderate to high precipitation amounts, which are the most important conditions for operational hydrology.
Acknowledgments
The authors wish to thank Robert L. Vislocky for access to the NGM archives used in this work, and Joseph T. Ostrowski (MARFC) for access to the NCDC precipitation data. This work was supported in part by a NASA Earth Systems Science fellowship and a NASA Space Grant fellowship awarded to the first author, and by the National Science Foundation under Contract CMS 95-01958.
REFERENCES
Abdel-Aal, R. E., and M. A. Elhadidy, 1995: Modeling and forecasting the daily maximum temperature using abductive machine learning. Wea. Forecasting, 10, 310–325.
Cheng, B., and D. M. Titterington, 1994: Neural networks: A review from a statistical perspective. Stat. Sci., 9, 2–54.
Dumais, R. E., and K. C. Young, 1995: Using a self-learning algorithm for single-station quantitative precipitation forecasting in Germany. Wea. Forecasting, 10, 105–113.
Elsner, J. B., and C. P. Schmertmann, 1994: Assessing forecast skill through cross validation. Wea. Forecasting, 9, 619–624.
Fletcher, D., and E. Goss, 1993: Forecasting with neural networks: An application using bankruptcy data. Inf. Manage., 24, 159–167.
Frankel, D. S., J. S. Draper, J. E. Peak, and J. C. McLeod, 1995: Artificial Intelligence Needs Workshop: 4–5 November 1993, Boston, Massachusetts. Bull. Amer. Meteor. Soc., 76, 728–738.
Fritsch, J. M., and Coauthors, 1998: Quantitative precipitation forecasting: Report of the Eighth Prospectus Development Team, U.S. Weather Research Program. Bull. Amer. Meteor. Soc., 79, 285–299.
Glahn, H. R., and D. L. Lowry, 1972: The use of Model Output Statistics (MOS) in objective weather forecasting. J. Appl. Meteor., 11, 1203–1211.
Hall, T., 1996: BRAINMAKER: A new approach to quantitative and probability of precipitation forecasting. NWS Southern Region Tech. Attachment SR/HSD 96-2, Scientific Services Division, NWS, Fort Worth, TX, 7 pp.
Kuligowski, R. J., 1997: An overview of National Weather Service quantitative precipitation estimates. TDL Office Note 97-4, Techniques Development Laboratory, National Weather Service, Silver Spring, MD, 27 pp.
——, and A. P. Barros, 1998: Experiments in short-term precipitation forecasting using artificial neural networks. Mon. Wea. Rev., 126, 470–482.
Lindner, A. J., and A. S. Krein, 1993: A neural network for forecasting heavy precipitation. Preprints, 13th Conf. on Weather Analysis and Forecasting, Vienna, VA, Amer. Meteor. Soc., 612–615.
Marzban, C., and G. J. Stumpf, 1996: A neural network for tornado prediction based on Doppler radar–derived attributes. J. Appl. Meteor., 35, 617–626.
McCann, D. W., 1992: A neural network short-term forecast of significant thunderstorms. Wea. Forecasting, 7, 525–534.
Michaelsen, J., 1987: Cross-validation in statistical climate forecast models. J. Climate Appl. Meteor., 26, 1589–1600.
Olson, D. A., N. W. Junker, and B. Korty, 1995: Evaluation of 33 years of quantitative precipitation forecasting at the NMC. Wea. Forecasting, 10, 498–511.
Ripley, B. D., 1994: Neural networks and related methods for classification. J. Roy. Stat. Soc., Ser. B, 56, 409–456.
Vislocky, R. L., and J. M. Fritsch, 1995a: Generalized additive models versus linear regression in generating probabilistic MOS forecasts of aviation weather parameters. Wea. Forecasting, 10, 669–680.
——, and ——, 1995b: Improved Model Output Statistics forecasts through model consensus. Bull. Amer. Meteor. Soc., 76, 1157–1164.
Table 1. Sample lists of predictor variables for 6–12-h forecasts at Lebanon (Swatara Creek basin) during the summer (JJA), and 12–18-h forecasts at Confluence (Youghiogheny River basin) during the winter (DJF). The variable names corresponding to the symbols are provided in section 2a. The grid points are the four NGM storage grid points closest to the point of interest, numbered counterclockwise starting from the point to the southwest. Bold indicates moisture variables according to the classification in section 2a.
Table 2. Composite evaluation statistics for the three postprocessing forecast techniques and the interpolated NGM forecasts for all four gauges, seasons, and lead times (51 922 forecasts). The statistics provided are the root-mean-squared error, the slope and y-intercept of the best-fit regression line of the forecasts upon the observations, the correlation coefficient (r) between the forecasts and observations, and the explained variance (EV: ratio of the variances of the forecasts and observations).