Introduction
Estimates of precipitation at regional scales are at the heart of hydrological, ecological, and environmental modeling in general, because they serve as input parameters (forcing fields) in spatially distributed models (Entekhabi et al. 1999). At these scales, the utility of precipitation predictions provided by general circulation models is limited because of their coarse spatial resolution. The use of limited area models (LAMs) is gradually emerging as a means for enhancing the accuracy of rainfall predictions at regional scales (Giorgi and Mearns 1991; Kim and Soong 1996; Miller and Kim 1996; Leung et al. 1996; Kim et al. 1998, 2000). Dynamic downscaling using LAMs yields multiple relevant variables, including precipitation, that are physically and dynamically consistent. However, dynamic downscaling is computationally expensive and is not error-free, because of limited spatial resolution and model parameterizations. Statistical interpolation of rainfall based on rain gauge data still provides one of the basic analysis tools for constructing rainfall maps at regional scales (Tabios and Salas 1985; Dirks et al. 1998), even though physical and dynamic consistency of such interpolation predictions is not preserved in general.
In the context of mapping precipitation using rain gauge data, the variable most frequently used for enhancing interpolation, especially over mountainous regions, is terrain elevation (Chua and Bras 1982; Hevesi et al. 1992; Goovaerts 2000). Terrain-derived characteristics, such as slope and aspect, as well as other variables such as latitude, longitude, and distance from the coast, are less frequently accounted for in the mapping of rainfall (Spreen 1947; Burns 1953; Wolfson 1975; Wotling et al. 2000). Hybrid approaches also exist that account for slope orientation when selecting data to construct local regression models between rainfall and elevation (Daly et al. 1994). Although the above variables enhance the predictability of precipitation spatial distribution, they do not account directly for orographically induced rainfall (Smith 1979).
Simple regional models with limited atmospheric physics and dynamics [see Barros and Lettenmaier (1993) for a review] focus on lower-atmosphere state variables that steer storms and dictate the pattern and movement of air masses (Rhea 1978; Alpert 1986; Isakson 1996). Examples of such variables include wind speed and direction, as well as specific humidity integrated over several pressure levels (Pandey et al. 1999). Atmospheric variables typically are available at a very coarse resolution. They only provide a picture of the large-scale state of the atmosphere, which is expected to bear some relevance to observed precipitation at the local scale. The important link of such lower-atmosphere characteristics with precipitation lies in their interaction with local terrain (Alpert and Shafir 1989b; Sinclair 1994; Andrieu et al. 1996).
To the authors' knowledge, apart from some aspects of the work of Herman et al. (1997), no comprehensive method has been previously reported for incorporating the joint effects of atmospheric and terrain variables into the spatial prediction of rainfall using geostatistical techniques. It should be stressed that such atmospheric variables are widely available at coarse resolutions, and their relevance to mapping precipitation at smaller scales is significant, as will be illustrated in what follows. The geostatistical framework presented in this paper incorporates, in a quantitative way, the joint effects of atmospheric and terrain variables into the mapping of rainfall at regional scales. This work is concerned with time-averaged precipitation; the additional factor of temporal variability is not addressed. Enhanced mapping of precipitation in space and time using atmospheric and terrain characteristics will be reported in the near future.
The selection of rainfall predictors based on physical considerations is demonstrated in section 2 for a study region in northern California. In section 3, alternative geostatistical approaches for mapping rainfall are presented. Two general cases are distinguished: spatial interpolation using (a) only rain gauge precipitation data and (b) rain gauge data in addition to low-atmosphere variables and their interaction with terrain. In section 4, a case study is undertaken: seasonal precipitation for the winter of 1981–82 is mapped for a region of northern California, using the various algorithms presented in section 3; the results are compared in terms of cross-validation statistics. Last, in section 5, some conclusions are drawn regarding the applicability of the proposed methodology to rainfall mapping and the potential avenues for future research.
Study area, precipitation, and its predictors
The study domain (Fig. 1a) is a 300 × 360 km2 area of the northern California coastal region, which is characterized by complex terrain and extreme seasonal variation in precipitation. The characteristic length scale of the terrain ranges from approximately 50–100 km in the northern part of the domain (the Coast Range north of San Francisco Bay) down to 10–20 km south of the bay. Annual precipitation varies widely within the region, from 200 mm yr−1 in the Central Valley (east of the Coast Range) to over 1300 mm yr−1 in the Santa Cruz Mountains (north of Monterey Bay). Western slopes of the Coast Range receive 4–5 times more precipitation than the Central Valley during the cold season (November–March). Precipitation in the region generally falls from stratiform clouds caused by orographic lifting of the westerly flow over the western slope of the Coast Range. On occasion, strong convection embedded within the stratiform clouds generates intense local precipitation.
The rainfall dataset used in this study consists of 77 rain gauge precipitation measurements representing the seasonal [November–December–January (NDJ)] average of daily rainfall for 1 November 1981–31 January 1982 at 77 stations over the study area (see Fig. 1a). The crosses attached to certain rain gauges indicate that these gauges are used subsequently in a jackknife procedure (see section 4). The statistics of the available precipitation data are shown in Table 1. Precipitation values range from 1.49 to 14.35 mm with a mean of 5.83 mm and a standard deviation of 3.03 mm. The difference between the median (5.22 mm) and the mean (5.83 mm) values indicates a slightly positively skewed precipitation histogram. The original daily precipitation values constitute a subset of the Cooperative Observer and first-order precipitation stations, obtained from the National Oceanic and Atmospheric Administration; for details see Pandey et al. (1999).
The objective of this study is to map the season-average precipitation on a 300 × 360 grid of cell size 1 km2, using all relevant information available for this region. The decision to map the particular NDJ average of precipitation, instead of an interannual mean, was dictated by the fact that such a map should be used as input to coupled hydrologic models calibrated for that particular time period. Mapping the interannual rainfall average might reveal different structures of dependence between precipitation and its predictors; such an alternative scenario was not investigated in this work.
Elevation as a precipitation predictor
The spatial distribution of precipitation is heavily influenced by temperature, especially by its vertical lapse rate, which dictates the local level (height) and rate of condensation. In the absence of detailed (small-scale) temperature information, we use elevation as its surrogate, at least as a first approximation; this would be true for the case of a spatially constant lapse rate. Because we are interested in the spatial patterns of temperature rather than in its absolute magnitude (see below), this first approximation is adequate for all practical purposes. One alternatively could establish a regression function between coarse-resolution (large scale) temperature and fine-resolution (small scale) elevation information. The fine-resolution terrain then could be transformed into a regression-based temperature map, which would exhibit small-scale variations due to corresponding local terrain variations.
A 1-km-resolution digital elevation model (DEM) is available for this area and is shown in Fig. 1b; elevation values range from −17 to 1668 m, with a median of 126 m and an interquartile range of 398 m. The DEM grid size is 300 × 360 km2 and coincides with the grid at which estimation–interpolation of rainfall will be performed. All subsequently derived rainfall predictors are available at this spatial resolution. The first step, then, is to determine precipitation values at the 77 grid nodes that are closest to the 77 rain gauges using the nearest-neighbor method. The rank order (Spearman) correlation coefficient between collocated precipitation and elevation values is 0.22 (here the term collocated connotes that the pairs of rainfall and elevation data used to compute that correlation are located on the same grid node, possibly after nearest-neighbor interpolation of the rain gauge value). This value implies that, for the particular area and season, elevation is weakly correlated with precipitation, which is expected for relatively low mountains such as those found in the study region. We use the Spearman correlation coefficient instead of the ordinary Pearson correlation coefficient because the former is more robust to outliers. In addition, we interpret the relevance of elevation to precipitation in terms of its correlation with the relative rank of precipitation and less in terms of its correlation with precipitation magnitude. The nonlinear rank-order transformation involved in calculating the Spearman correlation coefficient allows capturing nonlinearities in the elevation–precipitation relationship.
To determine the scale of interaction between precipitation and elevation, the DEM-reported value at each 1-km cell was replaced by the elevation average over gradually enlarged square windows ranging from 3 × 3 km2 to 21 × 21 km2, with an increment of 2 km in each direction. Ten different averaging windows were applied to obtain 10 sets of averaged elevation values derived at the 77 rain gauges. Rank correlation coefficients were then calculated between the 10 sets of collocated precipitation and average elevation values corresponding to each window size. As indicated by the scatterplot of Fig. 1c, correlation increases gradually as the size of the averaging window increases, and a maximum of 0.36 is reached for a 13 × 13-km2 window; for larger window sizes, correlation drops gradually. Window averaging is similar to low-pass filtering with a boxcar window, which smooths elevation spatial variability below a given window size. For the particular region shown in Fig. 1a, elevation exhibits the maximum relevance to precipitation, that is, can better inform its mapping, when small-scale (<13 km) elevation features are suppressed. Henceforth, the term elevation is used for this 13 × 13 km2 window averaged elevation, whose map is shown in Fig. 1d.
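The window-averaging experiment described above can be sketched as follows. This is a minimal illustration, not the authors' code: the DEM, gauge positions, and precipitation values are synthetic stand-ins, the function names `box_average` and `spearman` are hypothetical, and the treatment of grid edges (averaging only cells inside the grid) is an assumption that the text does not specify.

```python
import numpy as np

def box_average(dem, size):
    """Mean elevation over a size x size window (size odd) centred on each cell.

    Near the grid edge only the cells inside the grid are averaged
    (an assumption; the text does not say how edges were treated).
    """
    half = size // 2

    def window_sum(a):
        # Zero-pad, then use a summed-area table to get all window sums at once.
        p = np.pad(a, half, mode="constant")
        s = np.pad(np.cumsum(np.cumsum(p, axis=0), axis=1), ((1, 0), (1, 0)))
        return s[size:, size:] - s[:-size, size:] - s[size:, :-size] + s[:-size, :-size]

    dem = np.asarray(dem, dtype=float)
    return window_sum(dem) / window_sum(np.ones_like(dem))

def rankdata(x):
    """1-based ranks; tied values share their mean rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(len(x))
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, len(x) + 1)
    for v in np.unique(x):
        tie = x == v
        ranks[tie] = ranks[tie].mean()
    return ranks

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return np.corrcoef(rankdata(x), rankdata(y))[0, 1]

# Synthetic stand-ins for the DEM and the 77 gauge values (not the paper's data).
rng = np.random.default_rng(0)
dem = rng.gamma(2.0, 150.0, size=(60, 72))
rows = rng.integers(0, 60, size=77)
cols = rng.integers(0, 72, size=77)
precip = rng.gamma(4.0, 1.5, size=77)

# Ten windows: 3 x 3, 5 x 5, ..., 21 x 21 cells.
for size in range(3, 23, 2):
    elev_at_gauges = box_average(dem, size)[rows, cols]
    print(size, round(spearman(precip, elev_at_gauges), 3))
```

With real data, the window size that maximizes the printed correlation would play the role of the 13 × 13 km2 window retained in the text.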
Atmospheric variables as precipitation predictors
Lower-atmosphere state variables at the study region for the winter of 1981/82 include specific humidity, integrated from 850- to 1000-hPa levels, and the horizontal wind components at the 700-hPa level. More specifically, the time averages of specific humidity and horizontal wind components over the period of interest were retained. Lower-atmosphere state variables are available at a coarse resolution of 2.5° × 2.5° from the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) reanalysis dataset (Kalnay et al. 1996). The reanalysis products provide snapshots of the global atmospheric and surface fields at a uniform resolution every 6 h. NCEP–NCAR reanalysis variables are generated via data assimilation, whereby observational data around the globe are used to drive a global atmospheric model. Details of the data quality control and assimilation procedure are presented in Kalnay et al. (1996). Together with the reanalysis from the European Centre for Medium-range Weather Forecasts, NCEP–NCAR reanalysis variables are regarded as the most reliable (and widely available) representation of the instantaneous state of the global atmosphere.
We retained reported values at nine reanalysis nodes within and near the study region, which are shown in Fig. 2. Season-averaged specific humidity ranges from 14.81 to 16.68 g kg−1 and is relatively smaller in the southeastern part of the study domain (Fig. 2a). Wind speed ranges from 7.02 to 10.90 m s−1 and is relatively higher in the northwestern part of the region (Fig. 2b). Last, wind direction ranges from 86.58° to 110.61° from the north (Fig. 2c), indicating a dominant wind direction from the west. Lower-atmosphere state variables, although known at a very small number of locations, provide a picture of the large-scale state of the lower atmosphere, which is expected to be related to observed precipitation at the local (1 km) scale.

Although the horizontal wind vector v is available at a very coarse resolution from the nine NCEP nodes and interpolation at a 1-km resolution provides very smooth maps of the horizontal wind components over the study region, the vertical wind component due to orographic lifting exhibits small-scale variations due to local terrain. In this paper, such terrain-induced vertical motion, which can also be regarded as the exposure of local terrain to local wind, is accounted for in the regional-scale mapping of rainfall.
Rainfall predictors are subsequently derived from the NCEP–NCAR reanalysis lower-atmosphere state variables and their interaction with local terrain in the following three steps.
- Specific humidity and the horizontal wind components are interpolated (via the inverse-distance-squared method) from the nine NCEP nodes to a 300 × 360 grid with 1-km2 cell size. Inverse-distance interpolation is retained in lieu of other methods of objective analysis because of the very small number of NCEP nodes, which does not allow for reliable inference of structural characteristics, for example, a spatial correlation model for specific humidity. The vertical wind component at any grid cell is calculated using Eq. (1). Let x1(u) and x3(u) denote the interpolated value of specific humidity and the derived vertical wind component, respectively, at any grid cell with (2D) coordinate vector u = (u1, u2), expressed, for example, in degrees of longitude and degrees of latitude, respectively. Let x2(u) denote the elevation at u obtained from the low-pass filtering procedure described in section 2a.
- The resulting values are transformed to follow a uniform distribution. For the case of specific humidity, for example, the transformed value y1(u) at any location u is computed as
  $$y_1(\mathbf{u}) = F_{X_1}[x_1(\mathbf{u})], \qquad (2)$$
  where $F_{X_1}(\,)$ denotes the cumulative histogram of the interpolated specific humidity values. For the case of vertical wind component X3, a zero value corresponds to the median of the distribution; that is, $F_{X_3}(0.0) = 0.5$, which implies that a rank-ordered value y3(u) of greater than 0.5 should be interpreted as uplift. In the more general case, one would first compute the probability corresponding to a zero value of the vertical wind component, $p_3^{(0)} = F_{X_3}(0.0)$, and then interpret as uplift all rank-ordered values greater than $p_3^{(0)}$. Rank-ordered values lower than $p_3^{(0)}$ conversely would correspond to downslope vertical wind movement.
- The interaction terms among the three predictors y1(u), y2(u), and y3(u) are calculated. For example, the interaction between humidity and the vertical wind component at a grid cell u is calculated as y5(u) = y1(u)y3(u). This product term is the rank-ordered amount of moisture uplift due to wind encountering the local terrain slope. The interaction term y7(u) = y1(u)y2(u)y3(u) is the (rank ordered) moisture uplift due to wind encountering the local terrain slope, modulated by the decrease of moisture availability with topographic height. The remaining interaction terms were calculated as y4(u) = y1(u)y2(u) and y6(u) = y2(u)y3(u). All the resulting interaction fields are also transformed to follow a uniform distribution as in Eq. (2).
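The first of the three steps above, inverse-distance-squared interpolation from a handful of coarse nodes to a fine grid, can be sketched as follows. This is a minimal illustration under stated assumptions: the node layout, the node values (chosen only to span the 14.81 to 16.68 g kg−1 range quoted earlier), the coarse 10-km demonstration grid, and the function name `idw2` are all hypothetical.

```python
import numpy as np

def idw2(nodes_xy, nodes_val, targets_xy, eps=1e-12):
    """Inverse-distance-squared weighted average of node values at each target.

    eps guards against division by zero when a target coincides with a node;
    the result there is then (numerically) the node value itself.
    """
    nodes_xy = np.asarray(nodes_xy, dtype=float)
    nodes_val = np.asarray(nodes_val, dtype=float)
    targets_xy = np.asarray(targets_xy, dtype=float)
    # Squared distances, shape (n_targets, n_nodes).
    d2 = ((targets_xy[:, None, :] - nodes_xy[None, :, :]) ** 2).sum(axis=2)
    w = 1.0 / np.maximum(d2, eps)
    return (w * nodes_val).sum(axis=1) / w.sum(axis=1)

# Hypothetical 3 x 3 layout of coarse nodes (km) with made-up humidity values.
nodes = np.array([(x, y) for y in (0.0, 180.0, 360.0) for x in (0.0, 150.0, 300.0)])
humidity = np.array([16.68, 16.2, 15.9, 16.4, 15.8, 15.3, 16.0, 15.2, 14.81])

# Demonstration grid at 10-km spacing (the paper's grid is 1 km).
gx, gy = np.meshgrid(np.arange(0.5, 300.0, 10.0), np.arange(0.5, 360.0, 10.0))
grid = np.column_stack([gx.ravel(), gy.ravel()])
x1 = idw2(nodes, humidity, grid).reshape(gx.shape)
print(x1.shape, x1.min(), x1.max())
```

Because the weights are positive and sum to one, the interpolated field always stays within the range of the node values, which is consistent with the very smooth humidity map described in the text.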
The transformation procedure of Eq. (2) is applied to eliminate the effect of the different histograms of the predictor fields on precipitation mapping, given that all transformed fields lie in the interval [0, 1]. One should interpret the relevance of a predictor value at any grid cell in terms of its relation with the relative rank of precipitation rather than in terms of its correlation with precipitation magnitude. The magnitude of precipitation is dictated by the rain gauge observations themselves.
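The uniform-score transform of Eq. (2) and the interaction terms can be sketched as follows. This is a minimal illustration: the input fields are synthetic stand-ins, the function name `uniform_score` is hypothetical, and the empirical cumulative histogram is taken here as the proportion of grid values less than or equal to the value at hand, which is one of several reasonable conventions.

```python
import numpy as np

def uniform_score(x):
    """Empirical cumulative histogram value F_X[x] of every element of x."""
    x = np.asarray(x, dtype=float)
    flat = x.ravel()
    # Proportion of values in the field that are <= each value.
    cdf = np.searchsorted(np.sort(flat), flat, side="right") / flat.size
    return cdf.reshape(x.shape)

# Synthetic stand-ins for the three predictor fields (not the paper's data).
rng = np.random.default_rng(1)
x1 = rng.normal(15.8, 0.4, size=(30, 36))    # interpolated specific humidity
x2 = rng.gamma(2.0, 150.0, size=(30, 36))    # window-averaged elevation
x3 = rng.normal(0.0, 0.05, size=(30, 36))    # vertical wind component

y1, y2, y3 = uniform_score(x1), uniform_score(x2), uniform_score(x3)

# Probability of a zero vertical wind; larger rank-ordered values mean uplift.
p0 = np.mean(x3 <= 0.0)
uplift = y3 > p0

# Interaction terms, each transformed again to a uniform distribution.
y4 = uniform_score(y1 * y2)         # humidity x elevation
y5 = uniform_score(y1 * y3)         # humidity x vertical wind
y6 = uniform_score(y2 * y3)         # elevation x vertical wind
y7 = uniform_score(y1 * y2 * y3)    # all three
```

With this convention, the cells flagged as uplift are exactly those with a positive vertical wind component, whatever the shape of its histogram.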
Maps of the final rank-transformed values of specific humidity y1(u), elevation y2(u), and vertical wind component y3(u) at any 1-km2 grid cell u are shown in Fig. 3. Note the extremely smooth spatial variation of humidity (Fig. 3a), as opposed to that of elevation (Fig. 3b) and vertical wind component (Fig. 3c). The patterns of spatial variability of the latter are mainly due to the elevation gradient fields, because the interpolated horizontal wind components (not shown) exhibit very smooth spatial variation attributable to the limited spatial resolution of NCEP–NCAR reanalysis. The vertical wind component field (Fig. 3c) exhibits the most complex spatial variability. Relatively large positive values of vertical wind, dark pixels corresponding to stronger uplift, are found on the windward side of the terrain (recall that the prevailing wind direction is from the west).
Maps of the final rank-transformed values of the interactions between humidity and elevation y4(u), humidity and vertical wind y5(u), elevation and vertical wind y6(u), and humidity with elevation and vertical wind y7(u) are shown in Fig. 4. One can appreciate the smooth spatial patterns inherited from humidity and to a lesser extent from elevation (Fig. 4a) and the complex spatial patterns inherited from the vertical wind component (Figs. 4b–d).
Last, we investigate whether the predictor values at the rain gauge stations are a representative sample of their respective populations. This is done by constructing quantile–quantile plots between the distributions of all predictors (over the entire domain) and their respective sample distributions (Fig. 5). A representative sample for, say, elevation would lead to a corresponding quantile–quantile plot aligned with the 45° line. Rain gauges tend to be located in relatively low elevations, although very low and very high elevations are adequately covered (Fig. 5a). Specific humidity and vertical wind, on the other hand, are adequately sampled from the spatial configuration of the available rain gauges (Figs. 5b–c). Similar graphs were constructed for the remaining predictors (not shown), and indicated representative samples for these predictors.
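The quantile–quantile comparison described above can be sketched as follows. This is a minimal illustration with synthetic data; the function name `qq_pairs` and the choice of probability levels are assumptions, not taken from the paper.

```python
import numpy as np

def qq_pairs(population, sample, probs=None):
    """Matching quantiles of the domain-wide field and of the gauge sample."""
    if probs is None:
        probs = np.linspace(0.05, 0.95, 19)
    return np.quantile(population, probs), np.quantile(sample, probs)

# Synthetic elevation field and 77 randomly placed gauges (not the paper's data).
rng = np.random.default_rng(2)
field = rng.gamma(2.0, 150.0, size=(60, 72))
sample = field[rng.integers(0, 60, size=77), rng.integers(0, 72, size=77)]

q_pop, q_smp = qq_pairs(field.ravel(), sample)
# Points close to the 45-degree line (q_smp close to q_pop) suggest a representative sample.
for a, b in zip(q_pop, q_smp):
    print(round(a, 1), round(b, 1))
```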
Correlation between precipitation and its predictors
In this paper, the relevance of the three predictors and their interactions to observed precipitation is quantified in terms of their correlation with the rain gauge data (Table 2, top row). Maximum correlation between a predictor and precipitation is reached for the interaction of humidity with elevation (
From Table 2, one can appreciate the importance of humidity in mapping precipitation, given that its (Pearson) correlation with the rain gauge data is
Next, we investigate whether a significant correlation exists between the predictor variables themselves, a situation termed multicollinearity in a regression context (Draper and Smith 1998). The original predictors (i.e., specific humidity Y1, elevation Y2, and the vertical wind component Y3) constitute nearly independent variables, because the correlation between them is very low:
Mapping precipitation via geostatistics
Consider the task of predicting the unknown precipitation value z(u) at any location u within the study area, that is, the task of constructing a map of precipitation estimates. Atmospheric and terrain variables, as well as their multiplicative effects (their interactions), constitute valuable information for improving precipitation predictability, in addition to the n available rain gauge measurements {z(uα), α = 1, … , n} (here uα denotes the coordinate vector of the αth rain gauge). The objective is (a) to assess the relevance of each predictor to observed precipitation and (b) to account for the most relevant predictors in the spatial interpolation of rainfall.
For simplicity, we assume that u represents the coordinate vector of the central point of any grid cell and that rain gauge data or relevant atmospheric variables (derived or interpolated) are defined on the same quasi-point support. In our case study (see section 4), an unknown precipitation value z(u) or a known elevation value y2(u) is regarded as representative of a 1-km2 grid cell centered at u.
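The discussion that follows refers to a trend component and a residual component of precipitation. As a reading aid, and assuming the standard geostatistical decomposition of a random function (RF) into these two parts, the model can be sketched as below. The subscript alg marks the particular algorithm or interpretation adopted, and the zero-mean, stationary-covariance residual is as stated later in this section; the exact form and numbering of the original equations are not reproduced here.

```latex
Z(\mathbf{u}) = m_{\mathrm{alg}}(\mathbf{u}) + R_{\mathrm{alg}}(\mathbf{u}),
\qquad \mathbf{u} \in D,
\qquad E\{R_{\mathrm{alg}}(\mathbf{u})\} = 0,
\qquad \mathrm{Cov}\{R_{\mathrm{alg}}(\mathbf{u}),\, R_{\mathrm{alg}}(\mathbf{u}+\mathbf{h})\} = C_R(\mathbf{h}).
```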
On one hand, the trend component malg(u) could be identified with the predictions of a physically based deterministic model, and the resulting residual Ralg(u) could be viewed as stochastic spatial variability (Rutherford 1972). On the other hand, a constant trend component malg(u) = m attributes all spatial variability to the residual component Ralg(u), which implies that no prior knowledge exists regarding the average spatial variability of the phenomenon under study. In the optimal case, the trend component should be associated with some physically meaningful component of spatial variability. The residual component should model stochastic variations (not necessarily purely random, i.e., white noise) around that trend. In what follows, the subscript alg is dropped for simplicity.


Before presenting rainfall mapping schemes that account for several auxiliary variables, such as humidity, elevation, and their interactions, we first present the case of mapping rainfall using only rain gauge measurements.
Spatial interpolation using only rain gauge data
Consider the task of estimating the unknown precipitation value z(u) at a location u from n(u) < n surrounding values {z(uα), α = 1, … , n(u)} within a neighborhood W(u) centered at u. In the stationary case, n(u) + 1 pieces of information are available: the known regional mean m, and the n(u) residual values {r(uα) = [z(uα) − m], α = 1, … , n(u)}.
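The two estimators discussed in this subsection, simple kriging (SK) with a known mean and ordinary kriging (OK) with an unknown local mean, can be sketched as follows. This is a minimal illustration under stated assumptions: an exponential covariance is assumed (the model type is not given in this section), its sill and range are simply set to the total variance (9.18 mm2) and the large-scale range (160 km) quoted elsewhere in the paper, all data are used instead of a local neighborhood W(u), and the gauge coordinates and values are made up.

```python
import numpy as np

def cov_exp(h, sill=9.18, practical_range=160.0):
    """Hypothetical exponential covariance model (an assumption)."""
    return sill * np.exp(-3.0 * np.asarray(h, dtype=float) / practical_range)

def _dist(a, b):
    """Euclidean distances between two sets of 2D points."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))

def simple_kriging(xy, z, target, mean, cov=cov_exp):
    """SK: known mean plus a weighted sum of the residuals z - mean."""
    xy, z, target = np.asarray(xy, float), np.asarray(z, float), np.asarray(target, float)
    C = cov(_dist(xy, xy))                        # data-to-data covariances
    c0 = cov(_dist(xy, target[None, :]))[:, 0]    # data-to-target covariances
    lam = np.linalg.solve(C, c0)
    return mean + lam @ (z - mean)

def ordinary_kriging(xy, z, target, cov=cov_exp):
    """OK: weights constrained to sum to one, so no mean value is needed."""
    xy, z, target = np.asarray(xy, float), np.asarray(z, float), np.asarray(target, float)
    n = len(z)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = cov(_dist(xy, xy))
    A[n, n] = 0.0                                 # Lagrange-multiplier block
    b = np.append(cov(_dist(xy, target[None, :]))[:, 0], 1.0)
    lam = np.linalg.solve(A, b)[:n]
    return lam @ z

# Made-up gauge coordinates (km) and seasonal-average values (mm).
xy = np.array([[10.0, 20.0], [80.0, 40.0], [150.0, 200.0], [220.0, 90.0], [40.0, 300.0]])
z = np.array([3.2, 5.8, 9.1, 4.4, 7.0])
print(simple_kriging(xy, z, [100.0, 100.0], mean=z.mean()))
print(ordinary_kriging(xy, z, [100.0, 100.0]))
```

Evaluating either function at a gauge location returns the gauge value itself, which is the exactness property of kriging.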





Kriging (either SK or OK) is an exact interpolator; that is, kriging estimates reproduce observed sample data values at their locations:
We now present two geostatistical alternatives for incorporating several relevant predictors into the mapping of rainfall. In what follows, we do not adopt the geostatistical approach of cokriging (Hevesi et al. 1992), because of space limitations and for the following two reasons: (a) the inference effort required by cokriging increases with the number of predictors and (b) various investigators have reported that cokriging does not improve significantly the accuracy of precipitation estimates when compared (in terms of cross-validation statistics) with the two approaches presented in the next section (Goovaerts 2000).
Spatial interpolation accounting for rainfall predictors
The coefficient vector b is constant over the study region, because the regression function is determined using all n data values; this implies that the function f( ) does not account for local variations in the relationship between precipitation and its predictors. Any difference (spatial variation) between two estimates
A regression-derived precipitation estimate

Note that the decision to adopt a linear versus a nonlinear regression model for establishing the local trend component malg(uα) is one possible (subjective) interpretation of the decomposition in Eq. (4). The use of forward stepwise regression instead of backward elimination for selecting the pool of most relevant predictors (Draper and Smith 1998) is similarly another possible (subjective) interpretation of the general decomposition in Eq. (4). Different decisions evidently result in different sets of predictor variables, which in turn result in a different trend component malg(u) and consequently a different residual component ralg(u) at each location u. In any case, the residual component ralg(u) is considered to be stationary with zero mean, and SK consequently can be used for mapping the residual spatial variation.





An alternative (and more elaborate) procedure could be envisaged, whereby the local trend of precipitation is evaluated with respect to each predictor individually instead of collectively through their linear combination obtained by regression. For the case of KED, this approach would call for the determination of two coefficients for each variable, leading to a total of 2(K + 1) trend coefficients for the case of (K + 1) variables. In this paper, we do not pursue this alternative; we essentially regard the regression-based precipitation predictions as a secondary variable carrying all the available information content of the individual predictors.
The different geostatistical algorithms presented in this section for mapping precipitation are a consequence of the different decompositions of the RF model {Z(u), u ∈ D} given in Eq. (3). To be more specific, their difference lies in the complexity of the spatial trend component m(u) within each neighborhood W(u) centered at the location u where estimation is performed and the complexity of the resulting residual component. For SK, the spatial trend component m(u) is assumed to be constant; that is, m(u) = m, ∀u ∈ D. This assumption is oversimplified, and SK is not considered in what follows. For OK, m(u) is estimated using only nearby precipitation data based on a model of their spatial correlation; that is, m(u) =
Recall that the resulting residual component R(u) is considered to have a constant (zero) mean and a stationary covariance model CR(h) in all forms of kriging presented in this section. As a consequence, it is expected that the introduction of more realism in the spatial trend component, via consideration of relevant predictor variables, will render the assumption of a stationary residual less unrealistic. This implies that SKLM and KED, whose spatial trend components are the most complex, will lead to more realistic (physically less inconsistent) rainfall maps.
Both SKLM and KED can be viewed as corrections to the “first-guess” field derived via an OLS regression. Recall that OLS assumes that the regression residuals are spatially uncorrelated; hence SKLM and KED can be regarded as procedures for reintroducing such a residual spatial correlation to the mapping of precipitation. Both SKLM and KED corrections ensure data exactitude and could be viewed as (intermittent) data assimilation techniques (Rutherford 1972). The only difference is that in the meteorological definition of data assimilation, the first-guess field is a physically based prediction from a dynamic atmospheric model. In our case, the first-guess field is a statistical prediction based on variables that are physically relevant to observed precipitation. By construction, the regression-based estimates are independent of the resulting residuals, whereas this might not be true in the case of data assimilation involving the difference between atmospheric model predictions and observed precipitation.
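The idea of correcting an OLS first guess with simple kriging of its residuals, which is how the text characterizes SKLM, can be sketched as follows. This is a minimal illustration under stated assumptions: the exponential residual covariance and its parameters are hypothetical, all data are used instead of a local neighborhood, the gauge data and predictor values are made up, and the function names are not from the paper. KED is not sketched here.

```python
import numpy as np

def cov_exp(h, sill=1.0, practical_range=40.0):
    """Hypothetical exponential covariance for the residuals (an assumption)."""
    return sill * np.exp(-3.0 * np.asarray(h, dtype=float) / practical_range)

def _dist(a, b):
    """Euclidean distances between two sets of 2D points."""
    return np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=2))

def fit_trend(Y, z):
    """OLS coefficients b = (b0, b1, ..., bK) of z on the predictor columns of Y."""
    X = np.column_stack([np.ones(len(z)), Y])
    b, *_ = np.linalg.lstsq(X, z, rcond=None)
    return b

def regression_plus_sk(xy, Y, z, target_xy, target_Y, cov=cov_exp):
    """First guess from regression, corrected by SK of the zero-mean residuals."""
    xy, Y, z = np.asarray(xy, float), np.asarray(Y, float), np.asarray(z, float)
    target_xy, target_Y = np.asarray(target_xy, float), np.asarray(target_Y, float)
    b = fit_trend(Y, z)
    residuals = z - (b[0] + Y @ b[1:])            # r(u_alpha) at the gauges
    first_guess = b[0] + target_Y @ b[1:]         # regression estimate at the target
    lam = np.linalg.solve(cov(_dist(xy, xy)), cov(_dist(xy, target_xy[None, :]))[:, 0])
    return first_guess + lam @ residuals

# Made-up gauges: coordinates (km), two rank-transformed predictors, precipitation (mm).
xy = np.array([[10.0, 20.0], [80.0, 40.0], [150.0, 200.0],
               [220.0, 90.0], [40.0, 300.0], [260.0, 310.0]])
Y = np.array([[0.2, 0.7], [0.5, 0.4], [0.9, 0.8], [0.3, 0.1], [0.6, 0.9], [0.8, 0.3]])
z = np.array([3.2, 5.8, 9.1, 4.4, 7.0, 6.1])
print(regression_plus_sk(xy, Y, z, [100.0, 100.0], [0.55, 0.5]))
```

Far from any gauge the kriged residual tends to zero and the estimate reverts to the regression first guess; at a gauge the estimate equals the observed value.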
The alternative geostatistical algorithms (apart from SK) presented in this section are subsequently applied for mapping rainfall over the study area shown in Fig. 1a.
Case study
The precipitation dataset shown in Fig. 1a, as well as the predictors derived in section 2 and shown in Figs. 3 and 4, are used for mapping rainfall over a 300 × 360 grid of cell size 1 km2. In what follows, we first present maps of rainfall estimates derived from the different algorithms presented in section 3. These estimates then are compared in terms of their cross-validation statistics in section 4b. All geostatistical analyses were performed using the public-domain “GSLIB” geostatistical software library package (Deutsch and Journel 1998).
Regression models between precipitation and its predictors
The map of precipitation estimates
The map of precipitation estimates

Recall from Table 2 that, although the vertical wind component Y3 is not correlated with the interaction between humidity and elevation Y4(
The map of precipitation estimates
The correlation coefficient between regression-based and observed precipitation values for the regression model in Eq. (19) is 0.72 (see Fig. 7a). Such correlation values between regression-based and rain gauge data are comparable with those obtained from predictions using deterministic regional models with limited inclusion of atmospheric physics and dynamics; for this latter case, such correlation values range between 0.61 and 0.84 (Alpert and Shafir 1989a; Sinclair 1994). The spatial organization of the resulting residual values {r347(uα) = [z(uα) −
The three different maps of Fig. 6 represent three different first-guess fields, that is, three different interpretations of the relative importance of the seven predictor variables available over the study area. We now proceed by incorporating these three different maps, as well as the information brought by the precipitation residuals at the rain gauge stations (and their spatial correlation), into the geostatistical mapping of rainfall.
Geostatistical mapping of rainfall
All geostatistical interpolation procedures presented in the previous section call for a covariance (or, equivalently, a variogram) function that models the spatial continuity (or, equivalently, the spatial variability) of the original sample precipitation z data or that of the residual r values resulting from the particular trend function adopted. Figure 8 depicts the omnidirectional experimental and model variograms of the sample precipitation data (Fig. 8a), as well as those corresponding to the residuals from different trend functions (Figs. 8b–d).
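An omnidirectional experimental variogram of the kind referred to above can be sketched as follows. This is a minimal illustration, not the GSLIB implementation: the lag width, the number of lags, the gauge data, and the function name `experimental_variogram` are all assumptions.

```python
import numpy as np

def experimental_variogram(xy, values, lag, n_lags):
    """Omnidirectional semivariogram: half the mean squared difference per lag class.

    Returns the mean pair distance, the semivariance, and the number of pairs
    for every non-empty lag class [k * lag, (k + 1) * lag), k = 0..n_lags - 1.
    """
    xy = np.asarray(xy, dtype=float)
    v = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(v), k=1)           # every pair counted once
    h = np.sqrt(((xy[i] - xy[j]) ** 2).sum(axis=1))
    half_sq = 0.5 * (v[i] - v[j]) ** 2
    lag_class = np.floor(h / lag).astype(int)
    dist, gamma, pairs = [], [], []
    for k in range(n_lags):
        m = lag_class == k
        if m.any():
            dist.append(h[m].mean())
            gamma.append(half_sq[m].mean())
            pairs.append(int(m.sum()))
    return np.array(dist), np.array(gamma), np.array(pairs)

# Made-up gauge layout and values over a 300 x 360 km domain.
rng = np.random.default_rng(3)
xy = rng.uniform([0.0, 0.0], [300.0, 360.0], size=(77, 2))
z = rng.gamma(4.0, 1.5, size=77)
dist, gamma, pairs = experimental_variogram(xy, z, lag=20.0, n_lags=10)
for d, g, n in zip(dist, gamma, pairs):
    print(round(d, 1), round(g, 2), n)
```

The same function applies unchanged to residual values r(uα) from any of the trend functions, which is how the residual variograms would be obtained.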

The parameters of the variogram models for the original precipitation z data and the residual r values from various trend functions were derived via cross-validation [see Isaaks and Srivastava (1989) and next section] and are tabulated in Table 3. Because a zero nugget effect was adopted for all cases, the sum of the sills of the two variogram structures should be approximately equal to the variance of the respective dataset. In the case of original rain gauge z samples, precipitation spatial variability is decomposed into a small-scale (40 km) process that explains 22% of the total variance (=9.18 mm2), and a large-scale (160 km) process that explains the remaining 78% of the total variance. Similar decompositions, although with different parameters, were adopted for the residual r datasets. In general, as the trend function becomes more complex, thus involving more predictor variables and explaining a larger proportion of the z sample variability, the variance of the corresponding residual r values [sum of
The map of precipitation estimates
The maps of precipitation estimates
The task now is to compare objectively these alternative rainfall maps and to evaluate the improvement (if any) brought by the predictors to the accuracy of the derived map product.
Comparison of alternative mapping procedures
Because all variants of kriging are exact interpolators, no estimation error occurs at rain gauge locations. The different geostatistical interpolation procedures are therefore compared via cross-validation [see, for example, Isaaks and Srivastava (1989)]. Cross-validation amounts to sequentially dropping a single precipitation value from the sample dataset and reestimating its value from the remaining samples using all other available information. At any sample location uα, both the original precipitation value z(uα) and its cross-validation-derived estimate
Table 4 gives selected cross-validation statistics for the interpolation algorithms considered in the previous section. Cross-validation statistics examined are the rmse, as well as the correlation
Ordinary kriging is the least accurate, because it leads to the largest rmse and
From Table 4, one can conclude that, overall, rainfall mapping via SKLM and KED, accounting for the maximum amount of relevant predictors and for spatial correlation of the resulting residuals, yields relatively better cross-validation scores for the particular dataset and study region. Although OK performs worst in terms of cross-validation scores, its performance is not dramatically different from that of the other algorithms considered in this work. Such relatively good cross-validation scores for the case of OK are largely due to the rain gauge density and the large spatial correlation range (160 km) of precipitation.
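The cross-validation procedure behind these scores can be sketched as follows. This is a minimal illustration under stated assumptions: the estimator is passed in as a function, and the inverse-distance estimator shown is only a simple stand-in for the kriging variants compared in this paper. All function names are hypothetical.

```python
import numpy as np

def idw_estimate(train_xy, train_z, target_xy, power=2.0):
    """Inverse-distance weighting; a stand-in here, not a kriging estimator."""
    d = np.linalg.norm(train_xy - target_xy, axis=1)
    if np.any(d == 0):
        return float(train_z[d == 0][0])
    w = 1.0 / d ** power
    return float(np.sum(w * train_z) / np.sum(w))

def leave_one_out(xy, z, estimator):
    """Drop each sample in turn and re-estimate it from the remaining ones."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    est = np.empty_like(z)
    for a in range(len(z)):
        keep = np.arange(len(z)) != a               # all samples except a
        est[a] = estimator(xy[keep], z[keep], xy[a])
    errors = est - z                                # cross-validation errors
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    return est, errors, rmse
```

Swapping `idw_estimate` for any function with the same three arguments lets different interpolators be scored on identical held-out samples.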
To arrive at a more definite conclusion regarding the improvement brought by the lower-atmosphere state variables and terrain information, we compare their relationship with the various cross-validation error datasets that result from the different algorithms adopted. In particular, we investigate whether the entire set of available predictors (Table 2) could account for a portion of the spatial variability of the cross-validation errors. If, for example, a statistically significant regression function can be established between the predictors and the cross-validation errors, this would imply that such errors could be potentially reduced, had those predictors been included in the interpolation procedure. Relations between predictors and cross-validation errors were investigated for all the algorithms presented in the previous section.
The regression characteristics between cross-validation errors corresponding to different mapping algorithms and three precipitation predictors Y3, Y4, and Y7 are shown in Table 5. A relatively high R² indicates that a higher proportion of the spatial variability of cross-validation errors can be accounted for by the three predictors. A relatively high F statistic associated with a small significance p value implies that the regression model between the corresponding cross-validation errors and the precipitation predictors is statistically significant.
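For readers who want to reproduce this kind of diagnostic, a minimal sketch of the R² and overall F statistic for an ordinary least squares regression of cross-validation errors on a set of predictors is given below. The function name and array layout are assumptions for illustration; the p value would follow from the F distribution with (k, n − k − 1) degrees of freedom and is not computed here.

```python
import numpy as np

def regression_r2_f(errors, predictors):
    """OLS of cross-validation errors on predictors, intercept included.

    errors:     (n,) cross-validation errors
    predictors: (n, k) predictor values at the same locations
    Returns R^2 and the overall F statistic with (k, n - k - 1) dof.
    """
    e = np.asarray(errors, dtype=float)
    X = np.asarray(predictors, dtype=float)
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])            # design matrix
    beta, *_ = np.linalg.lstsq(A, e, rcond=None)
    fitted = A @ beta
    ss_res = np.sum((e - fitted) ** 2)
    ss_tot = np.sum((e - e.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return float(r2), float(f_stat)
```

An R² near zero with a small F would indicate that the predictors carry little information left unexploited by the mapping algorithm that produced the errors.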
From Table 5, one can see that statistically significant regression models between cross-validation errors and the three precipitation predictors Y3, Y4, and Y7 can be established for the case of OK, SKLM2, and, to a lesser extent, for the case of SKLM4. For the case of OK, the percentage of variance of cross-validation errors accounted for by the regression on the predictors is 24%, whereas such a proportion drops to 19% in the case of SKLM4. Note that KED and SKLM give similar results in all cases. One could argue that the R² values of Table 5 for the case of OK, SKLM2, and SKLM4 (or, equivalently, for the corresponding KED algorithms) are not very high. They are, however, important in that the three predictors Y3, Y4, and Y7 can explain a nonnegligible proportion of the spatial variability of the cross-validation errors. Such results corroborate the importance of including lower-atmosphere state variables and their interaction with local terrain characteristics into the spatial interpolation of rainfall.
Last, we evaluate the performance of the various algorithms in terms of jackknife scores. To be more specific, we exclude sample precipitation values at the 15 stations marked with crosses in Fig. 1a and perform estimation at these 15 locations. We preferentially exclude the highest precipitation values for the jackknife, because accurate estimation of such values is critical in many hydrologic analyses. Most of the jackknife locations in the northwestern part of the study region are located outside the convex hull of the remaining rain gauges. This means that estimation at these jackknife locations is performed in extrapolation mode, in which case the local trend model malg(u) adopted is of paramount importance. The algorithm that will suffer most from this extrapolation setting is OK, because its local trend model is estimated from nearby data that are located (south)east of the jackknife stations (see Fig. 1a). This adverse setting for OK, however, is compensated by the fact that these highest sample precipitation values are not predicted as well as the other ones from the corresponding regression model (local trend; see Fig. 7a). It also should be noted that, for all the algorithms considered, we build the corresponding regression models and infer the resulting residual variograms using all n = 77 rain gauges. The resulting rmse of jackknife errors is shown in Table 6. Ordinary kriging yields the largest rmse value (4.57), whereas SKLM347 and SKLM4 yield the smallest rmse values (3.42 and 3.69, respectively), thus achieving a 25% and 19% reduction from the OK-based rmse. Again, the importance of including lower-atmosphere state variables and their interaction with local terrain characteristics into the spatial interpolation of rainfall is corroborated.
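The percentage reductions quoted above follow directly from the jackknife rmse values. A short check, with variable and function names chosen here purely for illustration:

```python
def relative_reduction(reference, value):
    """Fractional reduction of `value` relative to `reference`."""
    return (reference - value) / reference

rmse_ok = 4.57                 # ordinary kriging
rmse_three_predictor = 3.42    # SKLM with predictors Y3, Y4, and Y7
rmse_sklm4 = 3.69              # SKLM with predictor Y4 only

print(round(100 * relative_reduction(rmse_ok, rmse_three_predictor)))  # 25
print(round(100 * relative_reduction(rmse_ok, rmse_sklm4)))            # 19
```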
Discussion and conclusions
A geostatistical framework for enhanced analyses of precipitation is presented in this paper. Atmospheric and terrain characteristics, which control the spatial distribution of precipitation at regional scales, are accounted for via alternative forms of kriging. Lower-atmosphere state variables include specific humidity and horizontal wind components, readily available at coarse resolution (2.5° × 2.5°) from the NCEP–NCAR reanalysis products. Their interactions with terrain, both elevation and its local gradients, provide valuable information for mapping the spatial distribution of orographic precipitation. The relevance of this information is first evaluated via a regression model based on collocated precipitation and predictor data. The regression-based precipitation estimates constitute a first-guess field. Spatial interpolation of the residuals from this first-guess field is then performed, and the resulting residual field is added to the regression-based estimates. As an alternative, the first-guess field is locally modified to conform to nearby sample precipitation data, followed by spatial interpolation of the resulting residuals.
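The first of the two procedures summarized above (regression-based first guess, interpolation of residuals, and addition of the two fields) can be sketched as follows. This is a minimal illustration, not the implementation used in this study: the exponential covariance model, its single structure, and all function and parameter names are assumptions made here, whereas the paper adopts two-structure variogram models with parameters given in Table 3.

```python
import numpy as np

def exp_cov(h, sill, corr_range):
    """Exponential covariance with zero nugget (an assumed model)."""
    return sill * np.exp(-3.0 * h / corr_range)

def regression_plus_sk(xy, z, Y, xy0, Y0, sill, corr_range):
    """Regression first guess followed by simple kriging of residuals.

    xy, z, Y : gauge coordinates (n, 2), precipitation (n,), predictors (n, k)
    xy0, Y0  : target coordinates (m, 2) and predictors (m, k)
    """
    xy, z, Y = (np.asarray(a, dtype=float) for a in (xy, z, Y))
    xy0, Y0 = np.asarray(xy0, dtype=float), np.asarray(Y0, dtype=float)

    # 1. Regression-based first-guess field at gauges and targets.
    A = np.column_stack([np.ones(len(z)), Y])
    beta, *_ = np.linalg.lstsq(A, z, rcond=None)
    trend = A @ beta
    trend0 = np.column_stack([np.ones(len(Y0)), Y0]) @ beta

    # 2. Simple kriging of the residuals, treated as zero-mean.
    r = z - trend
    d = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)    # (n, n)
    d0 = np.linalg.norm(xy[:, None, :] - xy0[None, :, :], axis=2)  # (n, m)
    weights = np.linalg.solve(exp_cov(d, sill, corr_range),
                              exp_cov(d0, sill, corr_range))
    r0 = weights.T @ r

    # 3. Add the interpolated residual field to the first guess.
    return trend0 + r0
```

With a zero nugget, evaluating this sketch at the gauge locations themselves returns the observed values, which is the exact-interpolation property that motivates the use of cross-validation and jackknife scores.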
The alternative geostatistical procedures, which differ in the complexity of the first-guess field, are used for mapping time-averaged precipitation from a set of 77 rain gauges over a region of northern California for NDJ of 1981/82. For this particular study area and time period, elevation alone explains 12% of the precipitation spatial variability, whereas the interaction of specific humidity with elevation explains 48% of such variability. Linear regression using the vertical wind component, the interaction of specific humidity with elevation, and the interaction of specific humidity with elevation and vertical wind (the latter being a measure of orographic uplift of air masses modulated by a surrogate of temperature) as precipitation predictors explains 52% of the variance of observed precipitation. Different first-guess fields are constructed via linear regression using a different number of precipitation predictors. The resulting residuals are correlated in space, with ranges varying from 150 km (for the least complex first-guess field) to 90 km (for the most complex first-guess field); the correlation range for the sample rain gauge precipitation is 160 km. The various interpolation algorithms are compared in terms of (a) their respective cross-validation error statistics, (b) the significance of regression models between such cross-validation errors and precipitation predictors, and (c) their jackknife errors. In all cases, interpolation using only rain gauge data (OK) performed worst, while interpolation using the maximum amount of relevant atmospheric and terrain information (SKLM347) resulted in better cross-validation and jackknife scores. The reduction in rmse values from OK obtained via SKLM347 ranged from 9% for cross-validation to 25% for jackknife.
Classical objective analysis schemes ignore important relevant information such as humidity and vertical wind and consequently produce oversmooth representations of the spatial distribution of rainfall; such an adverse effect is intensified when the network of rain gauges is sparse. This paper demonstrates the capability of constructing realistic analyses of precipitation by integrating readily available and physically relevant predictors. Precipitation analyses derived from the proposed schemes can be used as reference for comparison against spatial patterns of precipitation obtained via detailed atmospheric models operating at regional scales.
In conclusion, it should be noted that rainfall mapping within the proposed geostatistical framework could be enhanced by the availability of wind fields at less coarse resolution than those available from the NCEP reanalysis products. Such finer-resolution wind fields could be obtained, for example, from regional-scale atmospheric models with detailed parameterization of physical and dynamical processes and could allow resolution of the local divergence of air masses in complex terrain. This better-resolved vertical wind component could be better correlated with observed rain gauge precipitation. Similar remarks can be made for other relevant variables, such as temperature and humidity, whose availability at regional scales could be critical for improving the final map product.
Acknowledgments
The authors acknowledge the constructive criticism of three anonymous reviewers. This work was supported through funding provided by NASA–RESAC Grant NS-2791 and LBNL Grant LDRD 366139. Work for the Department of Energy was under Contract DE-AC03-76SF00098.
REFERENCES
Alpert, P. 1986. Mesoscale indexing of the distribution of orographic precipitation over high mountains. J. Appl. Meteor 25:532–545.
Alpert, P., and H. Shafir. 1989a. A physical model to complement rainfall normals over complex terrain. J. Hydrol 110:51–62.
Alpert, P., and H. Shafir. 1989b. Meso-γ-scale distribution of orographic precipitation: Numerical study and comparison with precipitation derived from radar measurements. J. Appl. Meteor 28:1105–1116.
Andrieu, H., M. N. French, V. Thauvin, and W. F. Krajewski. 1996. Adaptation and application of a quantitative rainfall forecasting model in a mountainous region. J. Hydrol 184:243–259.
Barros, A. P., and D. P. Lettenmaier. 1993. Dynamic modeling of the spatial distribution of precipitation in remote mountainous areas. Mon. Wea. Rev 121:1195–1214.
Burns, J. I. 1953. Small-scale topographic effects on precipitation in San Dimas experimental forest. Trans. Amer. Geophys. Union 34:761–767.
Chilès, J. P., and P. Delfiner. 1999. Geostatistics: Modeling Spatial Uncertainty. John Wiley and Sons, 695 pp.
Chua, S. H., and R. L. Bras. 1982. Optimal estimators of mean area precipitation in regions of orographic influence. J. Hydrol 57:23–48.
Cressie, N. A. C. 1993. Statistics for Spatial Data. John Wiley and Sons, 900 pp.
Daley, R. 1991. Atmospheric Data Analysis. Cambridge University Press, 457 pp.
Daly, C., R. P. Neilson, and D. L. Phillips. 1994. A statistical–topographic model for mapping climatological precipitation over mountainous terrain. J. Appl. Meteor 33:140–158.
Deutsch, C. V., and A. G. Journel. 1998. GSLIB: Geostatistical Software Library and User's Guide. 2d ed. Oxford University Press, 368 pp.
Dirks, K. N., J. E. Hay, C. D. Stow, and D. Harris. 1998. High-resolution studies of rainfall on Norfolk Island. Part II: Interpolation of rainfall data. J. Hydrol 208:187–193.
Draper, N. R., and H. Smith. 1998. Applied Regression Analysis. John Wiley and Sons, 706 pp.
Entekhabi, D. and Coauthors, 1999. An agenda for land surface hydrology research and a call for the second International Hydrological Decade. Bull. Amer. Meteor. Soc 80:2043–2058.
Giorgi, F., and L. O. Mearns. 1991. Approaches to the simulation of regional climate change: A review. Rev. Geophys 29:191–216.
Goovaerts, P. 1997. Geostatistics for Natural Resources Evaluation. Oxford University Press, 483 pp.
Goovaerts, P. 2000. Geostatistical approaches for incorporating elevation into the spatial interpolation of rainfall. J. Hydrol 228:113–129.
Herman, A., V. B. Kumar, P. A. Arkin, and J. V. Kousky. 1997. Objectively determined 10-day African rainfall estimates created for famine early warning systems. Int. J. Remote Sens 18:2147–2159.
Hevesi, J. A., A. L. Flint, and J. D. Istok. 1992. Precipitation estimation in mountainous terrain using multivariate geostatistics. Part II: Isohyetal maps. J. Appl. Meteor 31:677–688.
Isaaks, E., and R. M. Srivastava. 1989. An Introduction to Applied Geostatistics. Oxford University Press, 561 pp.
Isakson, A. 1996. Rainfall distribution over central and southern Israel induced by large-scale moisture flux. J. Appl. Meteor 35:1063–1075.
Journel, A. G., and C. J. Huijbregts. 1978. Mining Geostatistics. Academic Press, 600 pp.
Kalnay, E. and Coauthors, 1996. The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc 77:437–471.
Kim, J., and S-T. Soong. 1996. Simulation of a precipitation event in the western United States. Regional Impacts of Global Climate Change, S. J. Ghan et al., Eds., Battelle Press, 73–84.
Kim, J., N. L. Miller, A. K. Guetter, and K. P. Georgakakos. 1998. River flow response to precipitation and snow budget in California during the 1994/95 winter. J. Climate 11:2376–2386.
Kim, J., N. L. Miller, J. D. Farrara, and S-Y. Hong. 2000. A seasonal precipitation and stream flow hindcast and prediction study in the western United States during the 1997/98 winter season using a dynamic downscaling system. J. Hydrometeor 1:311–329.
Leung, L. R., M. S. Wigmosta, S. J. Ghan, D. J. Epstein, and L. W. Vail. 1996. Application of a subgrid orographic precipitation/surface hydrology scheme to a mountain watershed. J. Geophys. Res 101:12803–12817.
Miller, N. L., and J. Kim. 1996. Numerical prediction of precipitation and river flow over the Russian River watershed during the January 1995 California storms. Bull. Amer. Meteor. Soc 77:101–105.
Pandey, G. R., D. R. Cayan, and K. P. Georgakakos. 1999. Precipitation structure in the Sierra Nevada of California during winter. J. Geophys. Res 104:12019–12030.
Rhea, J. O. 1978. Orographic precipitation model for hydrometeorologic use. Ph.D. dissertation, Department of Atmospheric Science, Colorado State University, 199 pp.
Rutherford, I. D. 1972. Data assimilation by statistical interpolation of forecast error fields. J. Atmos. Sci 29:809–815.
Sinclair, M. R. 1994. A diagnostic model for estimating orographic precipitation. J. Appl. Meteor 33:1163–1175.
Smith, R. B. 1979. The influence of mountains on the atmosphere. Advances in Geophysics Vol. 21, Academic Press, 87–230.
Spreen, W. C. 1947. A determination of the effect of topography upon precipitation. Trans. Amer. Geophys. Union 28:285–290.
Switzer, P. 1979. Statistical considerations in network design. Water Resour. Res 15:1712–1716.
Tabios, G., and J. Salas. 1985. A comparative analysis of techniques for spatial interpolation of precipitation. Water Resour. Bull 21:365–380.
Thiébaux, H. J. 1997. The power of duality in spatial–temporal estimation. J. Climate 10:567–573.
Wackernagel, H. 1995. Multivariate Geostatistics. Springer-Verlag, 256 pp.
Wolfson, N. 1975. Topographical effects on standard normals of rainfall over Israel. Weather 30:138–144.
Wotling, G., C. Bouvier, J. Danloux, and J-M. Fritsch. 2000. Regionalization of extreme precipitation distribution using the principal components of the topographic environment. J. Hydrol 233:86–101.

Interaction of precipitation with elevation: (a) precipitation average (mm) during 1 Nov 1981–31 Jan 1982 at 77 rain gauges near the northern California coastal region (crosses indicate gauges used for jackknife), (b) digital elevation model of 1-km resolution (elevation values lower than 0 m are colored white, elevation values greater than 1000 m are colored black), (c) rank-order correlation coefficients between 77 collocated precipitation and window-averaged elevation values as a function of aggregation scale, and (d) smoothed elevation map derived via averaging the original elevation values of (b) within a 13 km × 13 km moving window
Citation: Journal of Applied Meteorology 40, 11; 10.1175/1520-0450(2001)040<1855:GMOPFR>2.0.CO;2

Time-average (from 1 Nov 1981 to 31 Jan 1982) of daily lower-atmosphere state variables from NCEP–NCAR reanalysis nodes: (a) specific humidity (g kg−1) integrated from 850- to 1000-hPa levels, (b) wind speed (m s−1) at the 700-hPa level, and (c) wind direction (in degrees clockwise from north) at the 700-hPa level

Maps of ranked predictors for precipitation: (a) specific humidity integrated from 850 to 1000 hPa, (b) average elevation within a 13-km square window, and (c) vertical wind component

Maps of ranked interactions between precipitation predictors: (a) humidity with elevation; (b) humidity with vertical wind component, (c) elevation with vertical wind, and (d) humidity with elevation and vertical wind

Plots to check whether predictor values at rain gauges constitute representative samples from their respective populations. Quantile–quantile plots between distributions of (a) gauge vs DEM reported elevations, (b) interpolated specific humidity at 77 pixels closest to the 77 rain gauges vs specific humidity at all pixels, and (c) vertical wind component at 77 pixels closest to the 77 rain gauges vs vertical wind component at all pixels

Maps of regression-based precipitation estimates (mm) derived using (a) elevation Y2 as a single predictor (REGR-2); (b) the interaction of specific humidity and elevation Y4 as a single predictor (REGR-4); and (c) the vertical wind component Y3, the interaction of specific humidity with elevation Y4, and the interaction of specific humidity with elevation and vertical wind Y7 as predictors (REGR-347)

Characteristics of estimated and residual precipitation values (mm) resulting from regression using the vertical wind component Y3, the interaction of specific humidity with elevation Y4, and the interaction of specific humidity with elevation and vertical wind Y7 as predictors (REGR-347): (a) scatterplot of estimated versus sample precipitation values, (b) location map of precipitation residuals, and (c) normal probability plot of the distribution of precipitation residuals, indicating a quasi-Gaussian distribution

Spatial variability characteristics of observed and residual datasets at the 77 rain gauges. Sample (thick dash–dotted black lines) and model (thick gray lines) variograms for (a) observed precipitation; (b) residuals from regression estimates using elevation Y2 as the only predictor (REGR-2); (c) residuals from regression estimates using the interaction of specific humidity and elevation Y4 as the only predictor (REGR-4); and (d) residuals from regression estimates using vertical wind Y3, the interaction of specific humidity with elevation Y4, and the interaction of specific humidity with elevation and vertical wind Y7 as the vector of predictors (REGR-347)

Kriging-derived precipitation estimates (mm) based on (a) only station data and ordinary kriging (OK); (b) regression using elevation Y2 as the single precipitation predictor, followed by simple kriging of residuals (SKLM-2); (c) regression using the interaction between specific humidity and elevation Y4 as the single precipitation predictor, followed by simple kriging of residuals (SKLM-4); and (d) regression using the vertical wind component Y3, the interaction between specific humidity and elevation Y4, and the interaction among specific humidity, elevation, and the vertical wind component Y7 as three precipitation predictors, followed by simple kriging of residuals (SKLM-347)

Kriging-derived precipitation estimates (mm) based on (a) local deformation of the regression-based trend component
Summary statistics (mm) of the 77 sample precipitation data

Matrix of Pearson correlation coefficient values between sample precipitation and its predictors

Model parameters adopted (via cross validation) for the variograms of the sample precipitation data, and the residuals from the different trend functions. The variogram model specification is γ(i) (|h|) = C(i)(0) −

Statistics of cross-validation errors for different mapping algorithms (subscripts denote the predictors used in the respective regression equations). Rmse denotes the root-mean-square error (mm),

Regression characteristics between cross-validation errors of different mapping algorithms (subscripts denote the predictors used in the respective regression equations), and the three precipitation predictors: vertical wind component Y3, interaction of specific humidity with elevation Y4, and interaction among humidity, elevation, and vertical wind Y7. Here, R² denotes the proportion of the variance of cross-validation errors that is explained by regression using the three predictors Y3, Y4, and Y7. A relatively high F statistic associated with a p value smaller than 0.001 implies that the regression model between the corresponding cross-validation errors and the precipitation predictors is statistically significant

Root-mean-square error (mm) of jackknife errors for different mapping algorithms (subscripts denote the predictors used in the respective regression equations). The numbers in parentheses show the relative changes in the corresponding jackknife statistics from OK
