## 1. Introduction

High-resolution quantitative precipitation estimates (QPE) constitute an important basis for hydrological forecasting (e.g., Young et al. 2000), data assimilation in numerical weather forecasting (e.g., Leuenberger and Rossa 2007), climatological extreme rainfall analysis (e.g., Overeem et al. 2009), and many more applications. There is long-lasting and continued research into QPE methods that combine remote sensing measurements from radar with in situ measurements from conventional rain gauges. Proposed methods range from global and local bias adjustment (e.g., Brandes 1975; Gjertsen et al. 2004; Germann et al. 2006) to formal statistical real-time combination (e.g., Krajewski 1987; Seo and Breidenbach 2002; Sinclair and Pegram 2005; DeGaetano and Wilks 2008; Fuentes et al. 2008).

Methods for real-time radar–gauge combination have been developed within several methodological frameworks, including objective analysis (e.g., Brandes 1975; García-Pintado et al. 2009), interpolation with deterministic weights (e.g., DeGaetano and Wilks 2008; He et al. 2011), interpolation with splines (Schneider and Steinacker 2009), and Bayesian conditioning (Todini 2001). A particularly broad class of combination methods is constructed in the framework of geostatistics, using a stochastic interpolation procedure known as kriging (Gandin 1965; Cressie 1993). After the pioneering work by Krajewski (1987) and Creutin et al. (1988), many different variants of kriging have been proposed for radar–gauge combination, including ordinary kriging, kriging with external drift, cokriging, and indicator kriging (see e.g., Barancourt et al. 1992; Seo 1998; Haberlandt 2007). The exact employment of these methods (or a combination of them) can vary considerably between applications. One technical advantage of the geostatistical concept is that the spatial covariance structure is estimated from the data and, as a consequence, less a priori specifications are necessary, compared for example to deterministic interpolation. Many applications of geostatistical radar–gauge combination have demonstrated that point estimates (best estimates of precipitation) can provide substantial improvement over the corresponding single-sensor estimates (e.g., Haberlandt 2007; Schuurmans et al. 2007; Goudenhoofdt and Delobbe 2009; Velasco-Forero et al. 2009) as well as over less advanced combination methods such as mean field bias correction or range-dependent adjustment (e.g., Goudenhoofdt and Delobbe 2009).

Geostatistical combination methods offer not only point estimates. The underlying stochastic concept delivers a probabilistic precipitation estimate in the form of a probability density function quantifying the uncertainty of the resulting QPE product at each target location. The probabilistic estimates would, in principle, enable the stochastic simulation of QPE ensembles, which in turn would serve to follow observational uncertainties into applications such as hydrological forecasts (e.g., Ahrens and Jaun 2007; Zappa et al. 2008; Germann et al. 2009). So far, the probabilistic statements of geostatistical radar–gauge combination have rarely been evaluated and little is known about the reliability of the resulting uncertainty estimates.

One limitation for using geostatistical radar–gauge combination is that the underlying stochastic concept relies on the Gaussian distribution and on the stationarity of mean and spatial covariance. Precipitation is intermittent (wet and dry regions) and has a skewed distribution; therefore, these theoretical assumptions are rarely satisfied in practice. For example, spatial precipitation covariance is expected to depend on precipitation magnitude, but is modeled with one stationary semivariogram in geostatistics. An established approach to deal with violations of assumptions on distribution and homoscedasticity is data transformation such as the Box–Cox power transform (e.g., Box and Cox 1964). Kriging with antecedent data transformation is commonly termed trans-Gaussian kriging (e.g., Cressie 1993; Schabenberger and Gotway 2005). It has previously been used in applications outside QPE such as for the mapping of hake abundance (Jardim and Ribeiro 2008) or mapping of Pb concentration in soils (Zhang et al. 2009). In the context of geostatistical radar–gauge combination, some studies have used a prescribed data transformation: for example, Schuurmans et al. (2007) use a square root transformation in an application with situations of abundant daily rainfall, and Verworn and Haberlandt (2011) use a log transformation for selected flood cases. Nevertheless, it is not common practice to apply transformation in radar–gauge combination, and, to our knowledge, the effect of transformation as well as the role of transformation choice has not been examined systematically so far.

In this paper, we study the potential and limitations of trans-Gaussian kriging using Box–Cox transformation to make the non-Gaussian nature of radar and gauge data approximately compliant with the theoretical assumptions of kriging. To this end, we undertake experiments with one frequently used geostatistical combination technique (kriging with external drift) and compare results with different transformation settings to those with untransformed data. Apart from checking the compliance with model assumptions, emphasis is placed on the effect of the transformation on the probabilistic estimates, which we evaluate by cross validation. As a test bed we use hourly precipitation measurements from a radar composite and a sparse rain gauge network for the territory of Switzerland.

The paper is organized as follows: Section 2 describes the methods applied, including the transformation of data. Section 3 introduces the test bed and data used. Results are presented in section 4. Section 5 summarizes the findings of the study and gives an outlook to open questions and possible future research.

## 2. Methods

### a. Combination method

Geostatistical methods are based on a stochastic concept where precipitation values at all locations (including those where no observations are available) are perceived as one single realization of a multivariate random variable. In classical geostatistics, this random variable is assumed to have a Gaussian distribution, which is described by two characteristics: the mean value (the deterministic part, also called the drift or trend) and the covariance function, which describes the spatial dependence between the residuals (the random part). The covariance is usually assumed to depend on the lag only (stationarity) and to belong to a parametric class of functions.

This stochastic concept of a multivariate distribution for precipitation delivers a probabilistic estimate (a probability density function) of the target variable at any location in the domain through the conditional distribution given the observations available. By the Gaussian assumption, this probabilistic estimate is again Gaussian. Kriging (the geostatistical prediction method) computes the expected value (the point estimate) and the variance (the measure of prediction uncertainty) of this conditional distribution. When applied to a regular grid of target locations, kriging can produce maps of the point estimate (i.e., an interpolation of observations), maps of quantiles of the probabilistic estimate, maps of the uncertainty estimate, and maps of simulated realizations conditional on the observations (i.e., an ensemble). Thorough descriptions of the principle of geostatistics and the various kriging techniques can be found for example in Cressie (1993), Schabenberger and Gotway (2005), or Diggle and Ribeiro (2007).

*i*,

*j*denoting indices of grid points;

*α*and

*β*are the trend parameters—intercept and radar coefficient, respectively; and

*Z*is a random process modeling the deviations of precipitation from the scaled radar field. Its expectation is assumed to be zero and the covariance is assumed to be the isotropic exponential covariance allowing for an additional nugget. The reason for the choice of this parsimonious covariance is the limited sampling of 75 rain gauge stations only (see section 3a). Most of the time, a large fraction of these report dry conditions, which prevents a reliable estimation of more variogram parameters. Previous studies have found only very minor benefit from using more complex covariance functions, even in more favorable sampling conditions (Haberlandt 2007; Erdin 2009). Model parameters are estimated by the method of restricted maximum likelihood (REML) (Schabenberger and Gotway 2005). This procedure estimates trend and covariance parameters jointly while accounting for the limited sample size. It is not depending on settings for the construction of empirical variograms, because every data pair is considered individually. The kriging is performed with a global neighborhood. The radar–gauge matching is done by nearest neighbor—that is, the radar value at a gauge location is taken from the radar pixel (1 km × 1 km) encompassing the gauge. As compared to matching the mean of several neighboring pixels (such as in Goudenhoofdt and Delobbe 2009), our choice is motivated by the desire to not further increase inconsistencies in spatial sampling between radar and gauges (see Villarini et al. 2008). Sensitivity experiments (not shown) revealed that the radar–gauge correlation is not systematically better/worse if four or nine neighboring pixels were averaged for the matching.

KED assumes the random process *Z* to be Gaussian and stationary—that is, with constant mean, constant variance, and constant covariance structure over the entire domain. (In fact, similar assumptions are made in other geostatistical combination methods.) However, the deviations from the linear radar–gauge relationship are hardly Gaussian, but positively skewed (in particular, precipitation is truncated at zero). Moreover, the variance of these deviations is usually not constant, but increases with the precipitation values.

One approach to overcome these violations of geostatistical model assumptions is trans-Gaussian kriging (e.g., Cressie 1993; Schabenberger and Gotway 2005). The idea is to transform the target variable (and possibly also the covariates) with a monotonic function so that the assumptions of stationarity and Gaussian distribution are better fulfilled. Kriging is then performed on this transformed scale and estimates in the original scale are obtained by subsequent back transformation of the kriging prediction.

*Y*and

*Y** are the original and transformed variables, respectively;

*λ*is the transformation parameter. Values of

*λ*larger than 0 and smaller than 1 are of primary interest for our application because they are suitable to correct for positive skewness and because they yield regular transformations for

*Y*≥ 0 (zero included). Note that the log transformation itself (

*λ*= 0) is not feasible because of its irregularity for dry conditions (i.e., rain gauge and radar measurements of zero). Also note that the Box–Cox transformation does not strictly resolve the issue of a lower bound. For positive values of

*λ*the lower bound for

*Y** is −1/

*λ*.

In our application of trans-Gaussian KED, we apply the Box–Cox transformation to both radar and gauge data, considering that the skewness of both variables contributes to violating the model assumptions. The same value of *λ* (either fixed a priori or case dependent; see below) is used for both variables. Application of KED on the transformed scale results in a probabilistic estimate at each target location, which has the form of a Gaussian with a mean value given by the kriging point estimate and with a variance given by the kriging variance. The probabilistic estimate in original (untransformed) space is then obtained by converting this Gaussian distribution with the inverse Box–Cox transformation. This is accomplished numerically and individually at each location, by back transforming 399 quantiles, equidistant in probability. The point estimate at precipitation scale (the expected value of the back-transformed probabilistic estimate) is then approximated by the mean of these back-transformed quantiles. In some cases, a few of the lowest of the 399 considered quantiles (on transformed scale) are below the threshold −1/*λ*, where no regular back transform exists. In such cases, we set the respective quantiles to zero precipitation (i.e., their probability mass is accumulated at dry conditions).

In this study, we compare applications of KED with four different settings of the Box–Cox transformation parameter *λ* in order to gain insight into the effects of transformation strength: *λ* = 1 corresponds to the classical application without transformation. Here we use it as a reference to compare to applications with transformations. The setting *λ* = 0.5 corresponds to the square root transformation. This is a well-established transformation that has been suggested for geostatistical radar–gauge combination previously (Schuurmans et al. 2007). The third setting, *λ* = 0.1, is a setting close to the log transformation. We choose it to illustrate the consequences of very strong, possibly excessive transformation.

As a fourth transformation setting, we apply a case-dependent transformation parameter *λ* = *λ _{e}*;

*λ*is estimated individually for each hourly accumulation period to account for the variable distributions during different precipitation regimes. There are several criteria that could guide the selection of a case-dependent

_{e}*λ*: Gaussian distribution of residuals, homoscedasticity, or linearity of the radar–gauge relationship. In this study we choose the criterion of Gaussian distribution of residuals, very similar to what is described in Box and Cox (1964). To this end,

*λ*is estimated by maximizing the likelihood of Gaussian residuals in a linear relation between the transformed predictor (radar) and transformed predictand (gauge). For simplicity, the spatial correlation of residuals is not accounted for in the estimation of

_{e}*λ*. Moreover, the range of possible values for

_{e}*λ*is constrained to the range [0.2, 1.5] by imposing a prior density to the maximum likelihood estimation. For this prior density, a beta distribution is selected, with positive density on the interval [0.2, 1.5] (and zero outside). The shape parameters are

_{e}*α*= 1.03 and

*β*= 1.12, producing a relatively steep tail near the lower bound, almost constant density near the center and a more gradual tail at the upper bound. The purpose of the constraint is to avoid values of

*λ*close to zero, because this is associated with large biases in the point estimate as will be discussed below in section 4.

_{e}A systematic application over one year (2008; see later section 3) yields values of *λ _{e}* between 0.20 and 0.53, with a large fraction (76%) between 0.2 and 0.25 and only 13% exceeding 0.3. Fourteen percent of all

*λ*values are found very close to the imposed lower bound of 0.2 (between 0.2 and 0.21). This suggests that, in approximately these 14% of all cases,

_{e}*λ*values would have fallen below this lower bound without the constraint of the prior distribution. Estimates of

_{e}*λ*vary only slightly between seasons; for instance, there is a mean value of 0.235 in summer [June–August (JJA)] compared to 0.247 in winter [December–February (DJF)]. Values of

_{e}*λ*vary more prominently with the structure and spread of precipitation: large values of

_{e}*λ*(>0.3) typically occur in hours with even and widespread precipitation (more than 20 wet gauges, coefficient of variation smaller than 1.5), while small values of

_{e}*λ*(<0.22) are prominent in situations with regionally constrained and variable precipitation (less than 15 wet gauges, coefficient of variation larger than 2). The small seasonal variation suggests that situations for small and for large values of

_{e}*λ*occur in all seasons, supporting our estimation of

_{e}*λ*flexibly for each hour, rather than, for example, on a seasonal basis.

_{e}All computations are done in R (R Development Core Team 2010) and the geostatistical combination method is based on the geoR package (Ribeiro and Diggle 2001).

### b. Evaluation

Leave-one-out cross validation is undertaken for the evaluation of the point and probabilistic estimates. That is, point estimates and probabilistic estimates are computed by KED for each gauge location in turn, without using the measurements at the gauge itself. The estimates are then compared to the measurements.

*i*runs over all evaluation pairs. We prefer MRTE over the frequently used root-mean-square error (RMSE), because the square root transformation of precipitation amounts in MRTE mitigates the dominant influence of errors at large precipitation amounts in RMSE.

To assess the reliability of the probabilistic estimate, we compare each gauge measurement against the probability density function (PDF) of the corresponding cross-validation probabilistic estimate using the idea of probability integral transform (Hoel et al. 1971): In a system that produces fully reliable probabilistic estimates, one would expect that the frequency with which a gauge measurement is smaller than quantile Q_{p} of its pertinent probabilistic estimate is *p*, for whatever choice of *p*. We can therefore use the systematic evaluation, to assess the empirical frequency of measurements in relation to quantiles of the corresponding probabilistic estimates. To this end, we calculate the frequency of observations to fall in between predefined quantiles (say *p*_{2}–*p*_{1}). The ratio of observed versus expected frequencies, as a function of interquantile bins, gives information on the general reliability of the kriging prediction intervals. Our assessment of reliability is similar to the Talagrand diagram or rank histogram employed in ensemble verification (Talagrand et al. 1997; Hamill 2001). The procedure is further illustrated with a numerical example, which we supply later in section 4b.

## 3. Test bed and data

The test bed for our trans-Gaussian radar–gauge combination experiments is hourly precipitation over the domain of Switzerland. In a first step, results for a selected hourly precipitation field will demonstrate the effects of transformation qualitatively as an illustrative example. A quantitative assessment is then undertaken with a systematic application and leave-one-out cross validation over one year (2008). In this section we describe the radar and rain gauge data employed and we introduce the selected example case as well as the procedures of systematic application.

### a. Radar and rain gauge data

We use a radar composite of hourly accumulated precipitation (in mm) covering the entire domain of Switzerland with a grid spacing of 1 km. The composite incorporates measurements from the three C-band radars of MeteoSwiss operational during 2008 (Fig. 1). The hourly composite is derived from an original 5-min composite by simple averaging. These radar data constitute a further development of the QPE radar product described in Germann et al. (2006). The conversion from reflectivity to rain rate is done with a fixed *Z*–*R* relation (*Z* = 316*R*^{1.5}), where empirical parameters were obtained from long-term observations prior to our study year (see Joss et al. 1998). The postprocessing includes several clutter filters, visibility correction relying on geometrical simulation, as well as a vertical profile correction using a real-time mesobeta-scale profile (Germann and Joss 2002). Furthermore, radar measurements are adjusted to the climatology of rain gauge measurements on domainwide and on local scale with seasonally stratified adjustment factors. At the daily time scale, the accuracy of the radar data is characterized by a scatter of 2.3 dB and a false alarm ratio of 15% for detecting daily precipitation larger than 0.3 mm (Germann et al. 2006). In the mountainous parts of the country the accuracy is lower than over the flatland, mainly because of beam shielding by mountains.

As for the gauge data, we use hourly accumulated measurements from the automated rain gauge network of MeteoSwiss (MeteoSwiss 2010). In 2008, the network consisted of 75 tipping buckets, with a typical intergauge distance of 25 km (Fig. 1). The stations are distributed quite evenly over the country, but there is an underrepresentation of high-elevation areas. The rain gauge measurements pass a thorough manual quality control, but they are not corrected for systematic biases (e.g., Neff 1977).

### b. Example case for illustration

Figure 1 shows the radar composite and rain gauge measurements for the selected example case (1800–1900 UTC 10 June 2008). The radar composite shows several related rainfall cells in northeastern Switzerland with hourly accumulations exceeding 15 mm. Meteorologically, the event can be considered as a case of prefrontal convection developing in an environment of still weak pressure gradients. The example case includes features that are typical for summertime precipitation distribution in Switzerland and bring along challenges for radar–gauge combination. These are a large fraction of dry area, high spatial variability at scales merely resolved by the gauge network, and locally high intensities. Forty four out of 74 gauges are dry, 15 gauges record between 0.1 and 0.5 mm, and another 15 more than 0.5 mm. The largest measured rain gauge value is 7.4 mm. The comparison of radar and gauge data suggests that radar overestimates precipitation in this particular hour, especially at locations with moderate to high precipitation. Also, the radar composite implies larger wet areas than supported by the gauges: there are 16 gauge locations with dry gauge and wet radar measurement, but none with wet gauge and dry radar.

### c. Systematic application

For the quantitative assessment of trans-Gaussian geostatistical radar–gauge combination, we apply KED with different settings of the Box–Cox transformation parameter *λ* systematically for the year 2008. The application includes 1438 hourly accumulations, for which at least 10 gauges have registered values of 0.5 mm or more. Hours with fewer than 10 wet gauges are excluded because the robustness problems associated with such limited sampling could blur the effects of transformation, and alternative estimation procedures (e.g., using a prescribed variogram) would have to be considered for such situations anyway.

## 4. Results

This section presents and discusses our analyses on the effect of transformation on geostatistical radar–gauge combination by KED. In the first subsection, we compare applications with and without transformation for the selected example case and provide insight into the effects of transformation on the fulfillment of model assumptions, on the point estimate, and on the probabilistic estimate. In the second subsection, the results of the systematic evaluation over the year 2008 are presented. These quantitative analyses assess the influence of transformation on the radar coefficient, on the point estimate, and on the probabilistic estimate.

### a. Example case

Figure 2 depicts results from applying KED with different settings of the transformation parameter *λ* to the example case (1800–1900 UTC 10 June 2008; see also Fig. 1). The pertinent parameter estimates (radar coefficient and variogram parameters) are listed in Table 1.

Applications of KED with different settings of the transformation parameter *λ* (see Table 1) for the example case 1800–1900 UTC 10 Jun 2008. (left) QQ–normal: normal quantile–quantile plot of the standardized residuals at transformed scale (deviations of gauge values from the trend at the pertinent locations), diagnostic for the assumption of a Gaussian distribution. (middle left) Tukey–Anscombe plot: standardized residuals at transformed scale as a function of pertinent trend value, diagnostic for stationarity of residual distribution. Label 20 in lower Tukey–Anscombe plots indicates point onto which 20 samples are mapped (not labeled in the two upper Tukey–Anscombe plots for lack of space). (middle right) Map sections of the point estimate at back-transformed scale (in mm; upper color legend at the bottom of figure). Colored dots depict observed rain gauge measurements. (right) Map sections of a robust measure of the standard deviation at precipitation scale, defined as half of the difference between the 84% and 16% quantiles of the probabilistic estimate (in mm; lower color legend).

Citation: Journal of Hydrometeorology 13, 4; 10.1175/JHM-D-11-096.1

Applications of KED with different settings of the transformation parameter *λ* (see Table 1) for the example case 1800–1900 UTC 10 Jun 2008. (left) QQ–normal: normal quantile–quantile plot of the standardized residuals at transformed scale (deviations of gauge values from the trend at the pertinent locations), diagnostic for the assumption of a Gaussian distribution. (middle left) Tukey–Anscombe plot: standardized residuals at transformed scale as a function of pertinent trend value, diagnostic for stationarity of residual distribution. Label 20 in lower Tukey–Anscombe plots indicates point onto which 20 samples are mapped (not labeled in the two upper Tukey–Anscombe plots for lack of space). (middle right) Map sections of the point estimate at back-transformed scale (in mm; upper color legend at the bottom of figure). Colored dots depict observed rain gauge measurements. (right) Map sections of a robust measure of the standard deviation at precipitation scale, defined as half of the difference between the 84% and 16% quantiles of the probabilistic estimate (in mm; lower color legend).

Citation: Journal of Hydrometeorology 13, 4; 10.1175/JHM-D-11-096.1

Applications of KED with different settings of the transformation parameter *λ* (see Table 1) for the example case 1800–1900 UTC 10 Jun 2008. (left) QQ–normal: normal quantile–quantile plot of the standardized residuals at transformed scale (deviations of gauge values from the trend at the pertinent locations), diagnostic for the assumption of a Gaussian distribution. (middle left) Tukey–Anscombe plot: standardized residuals at transformed scale as a function of pertinent trend value, diagnostic for stationarity of residual distribution. Label 20 in lower Tukey–Anscombe plots indicates point onto which 20 samples are mapped (not labeled in the two upper Tukey–Anscombe plots for lack of space). (middle right) Map sections of the point estimate at back-transformed scale (in mm; upper color legend at the bottom of figure). Colored dots depict observed rain gauge measurements. (right) Map sections of a robust measure of the standard deviation at precipitation scale, defined as half of the difference between the 84% and 16% quantiles of the probabilistic estimate (in mm; lower color legend).

Citation: Journal of Hydrometeorology 13, 4; 10.1175/JHM-D-11-096.1

Parameter estimates of the application of KED for the example case (1800–1900 UTC 10 Jun 2008) using different settings of transformation parameter *λ*, *β*: radar coefficient [see Eq. (1)], range: effective variogram range, and nugget/sill: ratio of nugget to total sill of variogram. RTW is the ratio of total water amounts calculated from point estimates in cross-validation mode (see section 2b).

We first assess the fulfillment of model assumptions at transformed scale by inspecting pertinent diagnostics—that is, the QQ–normal plot of standardized residuals (Gaussian distribution of residuals; column 1 of Fig. 2) and the Tukey–Anscombe plot (constancy of variance; column 2 of Fig. 2). For untransformed data (*λ* = 1), the residual distribution deviates considerably from Gaussian. In particular, the largest positive residuals contribute to a longer-than-normal upper tail. Furthermore, the residual variance increases with increasing trend value as is evident in the conelike structure of points in the Tukey–Anscombe plot. In contrast, *λ* = 0.5 and *λ* = *λ _{e}* show much better fulfillment of model assumptions: the residuals are closer to a Gaussian distribution (QQ–normal) and the variance remains more or less constant throughout the range of fitted trend values (Tukey–Anscombe). In the application with strong transformation (

*λ*= 0.1), the residual distribution turns to short tailed. Furthermore, both diagnostics show a separation of residuals into three groups: a group with negative residuals, evident in the lower part of the plots, consisting of the dry gauges paired with wet radar pixels. Here, the trend values are larger than the observations (i.e., negative residuals) and there is a linear trend residual relationship. A second group encompasses the 20 dry gauges paired with dry radar pixels. Because of identical trend and residual values, these cases map onto a single point in the Tukey–Anscombe plot (labeled with “20”) and onto a horizontal line in the QQ–normal plot. Finally, all wet radar and gauge measurements constitute a third group, forming a data cloud in the upper-right corner of the Tukey–Anscombe plot. Note that similar, though less pronounced, residual grouping is also evident in the Tukey–Anscombe plots for the other transformation settings.

The example illustrates how model assumptions can be clearly violated in the application of KED with untransformed data as well as with an obviously exaggerated transformation such as that close to the log transformation. In contrast, a data-driven estimate of the transformation (*λ _{e}*), in this case, offers an improved but not perfect compliance with the Gaussian residuals assumption. The

*p*values of two common goodness-of-fit tests (Pearson’s chi-square test and the Kolmogorov–Smirnov test) confirm this impression: even though all tests suggest a rejection of the Gaussian distribution hypothesis because of the 20 coinciding residuals (significance level: 0.05), the largest

*p*value is found for

*λ*, followed by

_{e}*λ*= 0.5, whereas the model without (

*λ*= 1) and that with very strong (

*λ*= 0.1) transformation yield smaller

*p*values. This is not unexpected, as the applied maximum likelihood estimation (MLE) optimizes for the criterion of Gaussian residuals. However, the example also highlights the fact that data transformation cannot remedy all the conflicts with model assumptions in geostatistical radar–gauge combination. The intermittency of precipitation and errors in the wet–dry distinction by the radar are inevitably associated with clustering in the Tukey–Anscombe plot, pointing to nonstationarities in the distribution of residuals.

The third column of Fig. 2 shows maps of the point estimates for the example case, derived from KED with different transformation settings. For *λ* = 1, *λ* = 0.5, and *λ* = *λ _{e}* the precipitation distributions are very similar. They incorporate the finescale precipitation pattern observed by the radar, but precipitation rates are generally smaller—that is, KED corrects for the apparent overestimates by radar (cf. Fig. 1). There are small differences between the three maps. On the one hand, these are due to minor differences in the radar coefficient [

*β*in Eq. (1)] and variogram parameters (range and nugget/sill) (see Table 1). On the other hand, they are caused by differences in the relative influence of single gauge measurements on the stochastic model part: for

*λ*= 1, residuals to the trend are large (and therefore influential) predominantly at gauges with large precipitation amounts, whereas large residuals (at transformed scale) can also be found at gauges with small precipitation amounts when

*λ*decreases.

The point estimate with strong transformation (*λ* = 0.1) differs remarkably from the others in the sense that it is considerably wetter. The comparison with gauge measurements suggests that KED with *λ* = 0.1 overestimates precipitation. This is actually confirmed in the cross validation, which reveals a strong positive bias, with the total water amount being overestimated by a factor of 2.7 (see RTW in Table 1). Note that the overestimate is not because KED gives more credibility to the (too wet) radar measurements. Actually, the radar coefficient for *λ* = 0.1 is lower than that for other transformations (see Table 1). Moreover, the largest values in the point estimate are found in an area with low station density (in the center of the map section), whereas radar measurements are largest in a different part of the depicted area (in the north, see Fig. 1). Insight into the origin of the positive bias with excessive transformation will be given further below.

Figure 2 also shows maps of the uncertainty estimate (last column). Because of the skewed nature of the probabilistic estimate at back-transformed scale, we choose half the difference between the 84% and 16% quantiles as measure of uncertainty (for Gaussian distributions this returns the standard error). For *λ* = 1, the uncertainty estimate is small close to and large away from the rain gauges (i.e., it depends primarily on the station density). This is characteristic of classical untransformed kriging, where kriging variances depend on sample density and configuration only. In addition, we see slightly larger values of uncertainty in regions with large radar values. This is a consequence of the uncertainty in trend parameters, which manifests itself particularly at large radar values (extrapolation). In fundamental contrast to the untransformed case, applications of KED with transformation (*λ* = 0.5, *λ _{e}*, or 0.1) show patterns of uncertainty that resemble the precipitation field itself (i.e., uncertainty is small in dry areas and large in wet areas). The relationship between uncertainty and precipitation amount becomes stronger with decreasing

*λ*.

Considering the skewed nature of precipitation, it is natural to expect large absolute uncertainties for a combined precipitation estimate in regions with intense precipitation and very small uncertainties in regions where gauge and radar measurements reveal dry conditions. The pattern of uncertainty revealed by KED with transformation is therefore clearly more realistic than the virtually even spatial distribution obtained without transformation in this example case. Hence, apart from improving the compliance with model assumptions, transformation can also introduce a desirable qualitative dependence of the uncertainty estimate on precipitation amount.

To better understand the influences of transformation in different parts of the domain, we inspect the full probabilistic precipitation estimate for three selected gauges (indicated in Fig. 1). Figure 3 shows the cumulative distributions of the pertinent probabilistic estimates, obtained from KED in cross-validation mode and using the different transformation settings. Note that station MAG is located in a dry region, station SMA is at the border of a precipitation cell, and station WAE is located near the center of an intense precipitation area (cf. Fig. 1).

Cumulative distribution functions of the probabilistic estimate for the example case (1800–1900 UTC 10 Jun 2008) at the location of three selected rain gauges (see station labels in Fig. 1). Distributions refer to the application of KED in cross-validation mode using different settings of the transformation parameter *λ*; see legend in bottom panel. Dashed vertical lines indicate the pertinent cross-validated point estimates. The solid red vertical lines mark the measured precipitation amount at the gauges (values also denoted in panel titles).

Citation: Journal of Hydrometeorology 13, 4; 10.1175/JHM-D-11-096.1

Cumulative distribution functions of the probabilistic estimate for the example case (1800–1900 UTC 10 Jun 2008) at the location of three selected rain gauges (see station labels in Fig. 1). Distributions refer to the application of KED in cross-validation mode using different settings of the transformation parameter *λ*; see legend in bottom panel. Dashed vertical lines indicate the pertinent cross-validated point estimates. The solid red vertical lines mark the measured precipitation amount at the gauges (values also denoted in panel titles).

Citation: Journal of Hydrometeorology 13, 4; 10.1175/JHM-D-11-096.1

Cumulative distribution functions of the probabilistic estimate for the example case (1800–1900 UTC 10 Jun 2008) at the location of three selected rain gauges (see station labels in Fig. 1). Distributions refer to the application of KED in cross-validation mode using different settings of the transformation parameter *λ*; see legend in bottom panel. Dashed vertical lines indicate the pertinent cross-validated point estimates. The solid red vertical lines mark the measured precipitation amount at the gauges (values also denoted in panel titles).

Citation: Journal of Hydrometeorology 13, 4; 10.1175/JHM-D-11-096.1

For *λ* = 1, the distribution of the probabilistic estimates is Gaussian (by assumption) and, hence, symmetric at all three gauge locations (black lines in Fig. 3). Furthermore, the uncertainty estimate is comparable at all three stations (note the different scales of the panels) because kriging variance depends primarily on station density and configuration (cf. discussion of Fig. 2 above). The three examples foreshadow that the uncertainty estimate by KED with untransformed data is, in general, too large at dry locations and too small at locations with heavy precipitation. In the example of WAE, the observation is considered as extremely unlikely by the probabilistic estimate even though the point estimate (dashed black line) is close to the observation (i.e., 5.4 mm compared to 7.3 mm). This is representative for locations with intense precipitation as will be shown later (section 4b).

For station MAG the probabilistic estimate with *λ* = 1 has nonzero probability density at negative precipitation values, which is physically not meaningful. This occurs at locations where point estimates are low because of the symmetric distribution and excessive variance. If this was amended by truncating the PDF at zero (aggregating the corresponding probability mass at zero), the unbiasedness of kriging estimates would be destroyed and a positive bias to the point estimate would be introduced.

For applications with transformation (*λ* = 0.5, *λ _{e}*, and 0.1; green, blue, and chartreuse lines in Fig. 3) the probabilistic estimate is positively skewed. Also, the uncertainty estimate is smaller at MAG (dry) than at SMA (light precipitation) and smaller at SMA than at WAE (intense precipitation). This is a manifestation of the positive dependence of the uncertainty estimate on the estimated precipitation amount as seen earlier in the discussion of Fig. 2. Both the skewness and the dependence of uncertainty on precipitation amount are desirable characteristics and render the probabilistic estimates with transformation more realistic and plausible than those without. The observed intense precipitation at station WAE is well included in the probabilistic estimates by models with transformed data, in contrast to

*λ*= 1. Furthermore, there are no negative quantiles produced by models with transformed data (by our definition of the back transform; see section 2a). The truncation at zero leads to physically meaningful probabilistic estimates. In contrast to

*λ*= 1, the possibly introduced positive bias by this truncation is negligible when using transformed data because of the very small uncertainty estimates at dry locations.

Comparisons of the different transformations show that the degree of skewness and the dependence of uncertainty on precipitation amount increase with decreasing *λ*: the strongest transformation (*λ* = 0.1) yields the smallest uncertainty estimate at MAG (dry) but the largest at WAE (intense precipitation). Moreover, its probabilistic estimates are more skewed than those from other transformations at all locations. In the case of WAE the excessive skewness and dependence on precipitation amount with *λ* = 0.1 cause an unrealistically large upper tail of the probabilistic estimate (95% quantile is 69 mm, 99% quantile is 248 mm). This has a strong influence on the point estimate (defined as the expected value of the distribution) and leads to a considerable overestimate at this station (14.3 mm compared to 7.3 mm). Similar overestimates are found for other wet gauges and we interpret the positive bias with *λ* = 0.1 in Fig. 2 (and Table 1) as a consequence of the excessive skewness and precipitation dependence. Experiments with *λ* < 0.2 for several other example cases (not shown) resulted in similar positive biases. An inappropriately small transformation parameter (too-strong transformation) hence bears the risk of a positive bias in the point estimates. Actually, in previous experiments (not shown), the likelihood-based estimation of *λ* described in Box and Cox (1964) yielded values of *λ* below 0.2 and obviously biased point estimates in many cases with a large number of dry gauges. For this reason we have introduced the constraint λ ≥ 0.2 with the aid of a prior distribution (cf. section 2).

Figure 3 also illustrates an interesting limitation of data transformation in describing uncertainties. Station SMA is situated in a zone of a strong precipitation gradient (see Fig. 1). The variability of residuals at nearby gauges is considerably larger than in the rest of the domain (not shown). In fact, the cross-validated point estimate at SMA varies considerably between the four transformation settings, reflecting the large and variable residuals in the neighborhood. We would expect that the uncertainty of a rainfall estimate is particularly large at this and adjacent locations, not because rainfall rates are particularly high, but because of the complex pattern and strong gradients (see the radar field in Fig. 1). However, a dependence of uncertainty on precipitation structure, leading to regionally variable residual variance, cannot be represented in the geostatistical framework, which assumes constant variance. The real uncertainty of the point estimate at SMA is therefore likely underestimated in KED with and without transformation. This is a particular violation of the stationarity assumption, which cannot be eliminated by data transformation.

### b. Systematic analysis for 2008

The systematic application of KED encompasses a total of 1438 h of the year 2008, all of which dispose of at least 10 wet gauges. The constraint will avoid serious robustness problems in parameter estimation. Results are compared for *λ* = 1, 0.5, and *λ _{e}*. The transformation with

*λ*= 0.1 is not further analyzed because of the large bias found in many cases with this obviously inappropriate setting (see section 4a);

*λ*= 1 is further pursued as a reference despite its observed deficiencies because of its wide application in current geostatistical radar–gauge combination.

Transformation influences the estimate of the radar coefficient in KED [*β* in Eq. (1)]. This is illustrated in Fig. 4, depicting a scatterplot of estimates of *β* obtained with *λ* = 1 and *λ* = *λ _{e}*. There is a positive correlation between the two estimates, reflecting the mutual adjustment of multiplicative radar errors. But, for

*λ*= 1, estimates of

*β*vary over a wide range, including unrealistic cases such as negative values and very large values (up to 9). For

*λ*=

*λ*, on the contrary, estimates of

_{e}*β*are confined to the range 0–1.2 without unrealistic outliers. Obviously, the radar coefficient is more reliably estimated with transformation. Without transformation, the estimation of

*β*is dominated by those radar–gauge pairs with large absolute differences (i.e., typically a few gauges with the largest precipitation measurements). Transformation with a suitable

*λ*, however, produces a more balanced residual variance across the rain gauge sample and thereby reduces the relative influence of single large observations.

Scatterplot of estimates of the radar coefficient *β* for *λ* = *λ _{e}* (

*y*axis) and

*λ*= 1 (

*x*axis). Results are from the application of KED to 1438 h in 2008. Three estimates of

*β*for

*λ*= 1 are larger than 4 (not on the plot; the largest is 9), of which the corresponding estimates for

*λ*=

*λ*are below 1.2.

_{e}Citation: Journal of Hydrometeorology 13, 4; 10.1175/JHM-D-11-096.1

Scatterplot of estimates of the radar coefficient *β* for *λ* = *λ _{e}* (

*y*axis) and

*λ*= 1 (

*x*axis). Results are from the application of KED to 1438 h in 2008. Three estimates of

*β*for

*λ*= 1 are larger than 4 (not on the plot; the largest is 9), of which the corresponding estimates for

*λ*=

*λ*are below 1.2.

_{e}Citation: Journal of Hydrometeorology 13, 4; 10.1175/JHM-D-11-096.1

Scatterplot of estimates of the radar coefficient *β* for *λ* = *λ _{e}* (

*y*axis) and

*λ*= 1 (

*x*axis). Results are from the application of KED to 1438 h in 2008. Three estimates of

*β*for

*λ*= 1 are larger than 4 (not on the plot; the largest is 9), of which the corresponding estimates for

*λ*=

*λ*are below 1.2.

_{e}Citation: Journal of Hydrometeorology 13, 4; 10.1175/JHM-D-11-096.1

As for the accuracy of the point estimates, Table 2 lists skill measures (as defined in section 2c) of cross-validation experiments with KED and different transformation settings for the entire year (2008) as well as for the summer and winter months separately. The comparison between the seasons shows that the combination has a better performance in the category-based scores (HK and SEEPS) in summer, but larger mean errors and error spread at wet locations (MRTE and SCATTER). This can be explained by the better radar visibility (rainfall systems reach into higher altitudes) and larger precipitation amounts of convective summer storms compared to the stratiform precipitation systems in winter. The seasonal variations are similar for the different transformation settings. When comparing the scores between transformation settings there are only minor differences between them in a season: there is a slight positive bias for *λ* = 1 and *λ* = *λ _{e}* and a small negative bias for

*λ*= 0.5 (column RTW). The delineation between wet and dry areas (HK) as well as the overall accuracy (SEEPS and MRTE) are somewhat improved by transformation, whereas the error spread at wet locations (SCATTER) is slightly deteriorated. These differences between transformations are similar for the year and individual seasons. Altogether, our systematic application shows no clear improvement of point estimates with transformation compared to without, which corroborates the finding of section 4a that point estimates are very similar for the three transformation settings.

Skill measures of systematic cross validation with KED and different transformation settings (λ). RTW (best score RTW = 1), HK (best score HK=1), SCATTER (best score SCATTER = 0 dB), SEEPS (best score SEEPS = 0), and MRTE (best score MRTE = 0). (See section 2b for more detail on the skill measures.) Results are listed for the entire year (2008), as well as for winter (DJF) and summer (JJA) separately.

We now turn to evaluating the reliability of the probabilistic estimate—that is, to examine whether geostatistical radar–gauge combination can deliver not only accurate point estimates but also realistic estimates of uncertainty. To this end, we compare for each hour and rain gauge location the pertinent measurement (a point value) to the probabilistic estimate (a PDF) obtained from KED without using the measurement itself (cross validation). (See section 2b for details of this comparison.) Figure 5 depicts the frequencies with which measurements fall into certain predefined interquantile bins of their pertinent probabilistic estimate. Frequencies are expressed relative to the expected frequency in a fully reliable system. For example, the expected frequency of measurements to fall between the 75% and 90% quantiles is 0.15, and hence the observed frequency of measurements between the quantiles is divided by this number. [Figure 5 is somewhat similar to the Talagrand diagram or rank histogram in ensemble verification (Talagrand et al. 1997; Hamill 2001).] For some of the dry measurements, the association to quantile bins is not unique. This occurs when several of the bin borders (selected quantiles) are zero. In these cases, dry observations are distributed between all consistent bins according to the expected frequency.

Frequency of rain gauge measurements to fall in interquantile bins of the pertinent cross-validated probabilistic estimate (cumulative probability; *x* axis). Frequencies are expressed relative to the expected frequency in a perfectly reliable system (*y* axis; log scale). Bins of the distribution have been defined by the 0.5%, 2.5%, 5%, 10%, 25%, 75%, 90%, 95%, 97.5%, and 99.5% quantiles (dashed vertical lines). Results from a systematic application of KED in 2008 are shown for three transformation settings (*λ* = 1, *λ* = 0.5, and *λ* = *λ _{e}*). Note that the black (

*λ*= 1) and blue (

*λ*=

*λ*

_{e}) line coincide between the 25% and 75% quantile.

Citation: Journal of Hydrometeorology 13, 4; 10.1175/JHM-D-11-096.1

Frequency of rain gauge measurements to fall in interquantile bins of the pertinent cross-validated probabilistic estimate (cumulative probability; *x* axis). Frequencies are expressed relative to the expected frequency in a perfectly reliable system (*y* axis; log scale). Bins of the distribution have been defined by the 0.5%, 2.5%, 5%, 10%, 25%, 75%, 90%, 95%, 97.5%, and 99.5% quantiles (dashed vertical lines). Results from a systematic application of KED in 2008 are shown for three transformation settings (*λ* = 1, *λ* = 0.5, and *λ* = *λ _{e}*). Note that the black (

*λ*= 1) and blue (

*λ*=

*λ*

_{e}) line coincide between the 25% and 75% quantile.

Citation: Journal of Hydrometeorology 13, 4; 10.1175/JHM-D-11-096.1

Frequency of rain gauge measurements to fall in interquantile bins of the pertinent cross-validated probabilistic estimate (cumulative probability; *x* axis). Frequencies are expressed relative to the expected frequency in a perfectly reliable system (*y* axis; log scale). Bins of the distribution have been defined by the 0.5%, 2.5%, 5%, 10%, 25%, 75%, 90%, 95%, 97.5%, and 99.5% quantiles (dashed vertical lines). Results from a systematic application of KED in 2008 are shown for three transformation settings (*λ* = 1, *λ* = 0.5, and *λ* = *λ _{e}*). Note that the black (

*λ*= 1) and blue (

*λ*=

*λ*

_{e}) line coincide between the 25% and 75% quantile.

Citation: Journal of Hydrometeorology 13, 4; 10.1175/JHM-D-11-096.1

In Fig. 5, a perfectly reliable probabilistic estimate would manifest as a straight horizontal line at 1. All three transformation settings, however, produce U-shaped curves—that is, measurements fall too often below the lowest (0.5%) and above the highest (99.5%) quantiles considered. Moreover, measurements are found too often in lower and too seldom in middle and upper-quantile bins. Together this implies that the probabilistic estimates of KED tend to underestimate the true uncertainty for all transformation settings (i.e., the PDFs are generally too narrow).

However, there are marked differences in the degree of unreliability between transformation settings. The transformation with *λ _{e}* produces more accurate frequencies at medium and high quantile bins compared to no transformation (

*λ*= 1). The difference is particularly large for the uppermost bin, where the observed frequency is 2.4 times too large for

*λ*=

*λ*, but 6.0 times too large for

_{e}*λ*= 1 (note the log scale). At the lowermost bins, however, the frequencies are less reliable for

*λ*=

*λ*than for

_{e}*λ*= 1. Further analyses have shown that this aggravation by transformation is coming exclusively from cases with dry gauge measurements. The narrow and skewed probabilistic estimates with transformation (see section 4b) lead to frequent situations, where all quantiles are very close to but not exactly zero. As a result, many dry measurements fall formally outside their pertinent probabilistic estimate but the differences in rain rate are so small that these discrepancies are of no practical relevance.

In summary, the analysis of Fig. 5 indicates that the probabilistic estimates of KED are in general overconfident (i.e., too-narrow PDFs) with all examined transformations. However, the overconfidence is substantially reduced by transformation at the upper end of the distribution. The physically plausible skewness and precipitation dependence introduced to the probabilistic estimate by transformation (cf. section 4a) are hence also quantitatively justified by the strong improvement in reliability.

For many applications, a good representation of intense precipitation amounts is of particular interest. We continue the evaluation of probabilistic estimates specifically for intense precipitation events, which we define as cases when the measurements exceed the thresholds 5, 10, and 20 mm. (Note, that these thresholds are ad hoc and do not imply severity/rareness in an impact sense.) How well are these measurements contained in the pertinent probabilistic estimate? Table 3 lists the percentage of cases when a measured intense precipitation event is larger than the 99.5% quantile of its pertinent cross-validated probabilistic estimate. Such occurrences mean that the actually measured intense precipitation would have been considered as extremely unlikely (*p* = 0.005) by the probabilistic estimate. It is desirable that the frequency of “extreme unlikeliness” is small. The exact number is unknown, because the conditioning to intense events does not permit to compare the percentages to the unconditional expectation (0.5%). From Table 3, we see that the probabilistic estimates without transformation (*λ* = 1) consider more than half of the events exceeding the thresholds 10 and 20 mm as extremely unlikely. For the 5-mm threshold this rate is almost 0.5. This is much too large, as is illustrated with the following argument: if, instead of the kriging-based probabilistic estimate, one would choose the constant climatological PDF (i.e., the empirical PDF of measured precipitation amounts at all gauges in all hours) as probabilistic estimate, then the rate of extreme unlikeliness for 5-mm measurements would be 31% [rate of extreme unlikeliness (0.005) divided by climatological frequency of threshold exceedance (0.0159); see Table 3]. This implies that a very crude uninformed probabilistic estimate (the climatological distribution) would consider intense precipitation less often unlikely than the probabilistic estimate of KED without transformation (44%, see Table 3). But the latter has information on an hour-by-hour basis from the surrounding gauges. Hence, the rate with which KED without transformation considers measurements of intense precipitation as extremely unlikely is unrealistically large. In the case of a square root transform (*λ* = 0.5) the percentages are reduced, but still a considerable fraction of intense precipitation measurements (24%–54%) are considered extremely unlikely. With *λ* = *λ _{e}*, however, the fraction is reduced by an order of magnitude (4%–12%) compared to

*λ*= 1. Hence, the ability of KED to include large precipitation amounts (when they were observed) into the probabilistic estimate clearly benefits from transformation.

Percentage of intense precipitation events (cases when rain gauge measurements exceed predefined thresholds; first column) exceeding the 99.5% quantile of the corresponding cross-validated probabilistic estimate. KED with different transformation settings *λ*. Column “no. of events” lists the number of threshold exceedances and column “climatological frequency” the percentage of threshold exceedances with respect to all measurements (1438 h; 75 gauges).

Figure 6 shows the fraction of observed precipitation events (≥5 mm) considered extremely unlikely by the probabilistic estimate for each station. Without transformation (*λ* = 1), the fraction is very large (mostly 10%–100%) at all gauge locations, whereas for *λ* = *λ _{e}* it is substantially smaller (mostly 0%–10%). There are two exceptions to this general pattern: Station SIO (Valais, southwestern Switzerland) shows a very low fraction of extremely unlikely events for

*λ*= 1. This is due to a sampling problem: only three measurements exceed the threshold. Station GSB (southwestern border of Switzerland) stands out with a very large fraction of extremely unlikely events even with

*λ*=

*λ*. This rain gauge is located on a high-elevation pass, and there are several known measurement problems (shielding, wind exposure, and drifting snow) as well as difficulties in radar observation at this location.

_{e}Fraction of intense precipitation events (rain gauge measurement exceeding 5 mm) considered as extremely unlikely by the probabilistic estimate in a systematic application of KED for 2008 (top) without (*λ* = 1) and (bottom) with transformation (*λ* = *λ _{e}*). “Extreme unlikeliness” means that the rain gauge measurement is larger than the 99.5% quantile of the corresponding cross-validated probabilistic estimate.

Citation: Journal of Hydrometeorology 13, 4; 10.1175/JHM-D-11-096.1

Fraction of intense precipitation events (rain gauge measurement exceeding 5 mm) considered as extremely unlikely by the probabilistic estimate in a systematic application of KED for 2008 (top) without (*λ* = 1) and (bottom) with transformation (*λ* = *λ _{e}*). “Extreme unlikeliness” means that the rain gauge measurement is larger than the 99.5% quantile of the corresponding cross-validated probabilistic estimate.

Citation: Journal of Hydrometeorology 13, 4; 10.1175/JHM-D-11-096.1

Fraction of intense precipitation events (rain gauge measurement exceeding 5 mm) considered as extremely unlikely by the probabilistic estimate in a systematic application of KED for 2008 (top) without (*λ* = 1) and (bottom) with transformation (*λ* = *λ _{e}*). “Extreme unlikeliness” means that the rain gauge measurement is larger than the 99.5% quantile of the corresponding cross-validated probabilistic estimate.

Citation: Journal of Hydrometeorology 13, 4; 10.1175/JHM-D-11-096.1

## 5. Conclusions

We have investigated the potential of trans-Gaussian kriging for the combination of radar and rain gauge data in QPE based on the hypothesis that data transformation could overcome conflicts of the data characteristics with the model assumptions of classical geostatistics. To this end, combination experiments for hourly QPE in Switzerland were undertaken with kriging with external drift and compared between several settings of the Box–Cox transformation, including one with a variable transformation parameter.

Indeed, several improvements were found when KED was applied with a suitable transformation as compared to the classical (untransformed) application: The assumptions of Gaussian residual distribution and stationarity of residual variance are better satisfied. Estimates of the radar coefficient are more robust, in the sense that unrealistic (excessive and negative) estimates encountered in KED without transformation are avoided. Transformation renders the probabilistic estimate into a positively skewed PDF, scaling with precipitation amount, which rectifies the unrealistic structure and dependencies inherent to KED without transformation. Most importantly, transformation improves the reliability of the probabilistic estimate substantially, correcting for the underestimation of uncertainties (overconfidence) of KED without transformation. This is particularly manifest at the upper tail: 44% of the observed intense precipitation events (≥5 mm h^{−1}) are considered extremely unlikely (*p* < 0.5%) by the probabilistic estimate without transformation. The variable transformation reduces this rate to 4%.

The stochastic concept of random Gaussian fields, which builds the theoretical basis of kriging, can also be utilized to simulate realizations of precipitation fields that are conditioned on the observations. An ensemble of such realizations could inform users about the inherent QPE uncertainty. Such an application is described in Ahrens and Jaun (2007). Reliability of the probabilistic estimates is a prerequisite for a realistic description of the QPE uncertainty by such an ensemble. Our results suggest that stochastic simulation of random Gaussian fields without transformation would yield unrealistically small uncertainty ranges, in particular for intense precipitation, and that the reliability of such an ensemble could be improved by a variable transformation.

Only small effects of transformation were found for the point estimates. KED without transformation yields precipitation distributions that are very similar to those from a transformation with exponents between 0.2 and 0.5. However, excessive transformation, close to the log transformation (Box–Cox parameter smaller than 0.2), was found to introduce a positive bias. This is related to an exaggeration of the upper tail, resulting from the excessive skewness and inordinate precipitation dependence of the probabilistic estimate. Care is therefore needed to avoid transformations close to the log. This is in line with Schuurmans et al. (2007), who found the square root transform to be preferable over the log transform in a comparable application of KED for daily QPE. Again in a different context, Ciach et al. (2007) found the variance of the radar to gauge ratio to decrease with precipitation amount, suggesting that a log transformation of radar–gauge differences may be exaggerated for achieving stationarity of variance.

Transformation cannot remedy all the complications arising for the application of geostatistics to precipitation data. For example, in cases with a high number of zeros (intermittency), heteroscedasticity cannot be avoided. Moreover, and possibly related to this, we find some level of unreliability (overconfidence) in the probabilistic estimate with all transformation settings tested. Finally, even though the uncertainty estimate in our application pictures a dependence on precipitation amount, its dependence on spatial structure (e.g., larger uncertainty in areas of strong precipitation gradients) is outside the stochastic concept of trans-Gaussian KED.

Our analysis was focusing on kriging with external drift for hourly QPE in Switzerland. We suppose, however, that the conclusions apply qualitatively also to other geographical regions and time scales and for related geostatistical combination methods because similar conflicts between model assumptions and data characteristics must be expected. In fact, additional experiments (not shown) for the daily time scale and using radar–gauge combination by ordinary kriging of radar errors (Erdin 2009; Goudenhoofdt and Delobbe 2009) in Switzerland are in agreement with the findings of the present study.

What is an appropriate setting of the Box–Cox transformation parameter? As concerns the point estimate, our analyses suggest that its accuracy is not overly sensitive to the choice of transformation parameter, provided excessive transformations are avoided (e.g., by applying a lower bound near 0.2). As concerns the probabilistic estimate, the best reliability was obtained with a case-dependent estimation of the transformation parameter, optimized for the criterion of Gaussian residuals (like in Box and Cox 1964). This criterion is only one of several possibilities and has been chosen in this study primarily for its simplicity. It would be interesting to examine other case-by-case estimates building on alternative criteria, such as homoscedasticity or linearity of the radar–gauge relationship, or a compromise of several criteria.

Combining geostatistics with data transformation is just one way of adjusting to the complex nature of precipitation data in radar–gauge combination. Clearly, the present study also pinpoints to limitations of this approach. These could be overcome by extending trans-Gaussian kriging with an explicit account of intermittency (such as proposed by Seo 1998), by introducing additional terms in the external trend to model dependencies on other factors (topography, radar quality, or aspects of precipitation structure, possibly including interactions), or by adopting fundamentally different concepts of radar rain gauge merging (e.g., Fuentes et al. 2008). Nevertheless, data transformation offers an easily implemented extension of classical geostatistical combination methods with added value when the interest is not merely in QPE itself but also in more realistic and more reliable estimation uncertainties, such as in ensemble QPE. We are currently in the process of a more extensive evaluation and intercomparison of trans-Gaussian KED and related geostatistical combination methods for hourly real-time QPE in Switzerland.

## Acknowledgments

We thank Ioannis Sideris, Urs Germann, Reinhard Schiemann, and Marco Gabella from MeteoSwiss for the supply of radar data and for many valuable discussions on radar–gauge combination. We are also grateful to Werner Stahel (Seminar for Statistics, ETH Zurich) for his contributions during an early phase of this study and for valuable suggestions on the draft manuscript, to Marco Willi for his contributions to the software and an excellent technical support, and to Mark Liniger and Christof Appenzeller for their motivating assistance in the management of the project.

## REFERENCES

Ahrens, B., and Jaun S. , 2007: On evaluation of ensemble precipitation forecast with observation-based ensembles.

,*Adv. Geosci.***10**, 139–144.Barancourt, C., Creutin J. D. , and Rivoirard J. , 1992: A method for delineating and estimating rainfall fields.

,*Water Resour. Res.***28**, 1133–1144.Box, G. E. P., and Cox D. R. , 1964: An analysis of transformations.

,*J. Roy. Stat. Soc.***26A**, 211–252.Brandes, E. A., 1975: Optimizing rainfall estimates with the aid of radar.

,*J. Appl. Meteor.***14**, 1339–1345.Ciach, G., Krajewski W. , and Villarini G. , 2007: Product-error-driven uncertainty model for probabilistic quantitative precipitation estimation with NEXRAD data.

,*J. Hydrometeor.***8**, 1325–1347.Cressie, N. A. C., 1993:

*Statistics for Spatial Data*. Wiley, 900 pp.Creutin, J. D., Delrieu G. , and Lebel T. , 1988: Rain measurement by raingage-radar combination: A geostatistical approach.

,*J. Atmos. Oceanic Technol.***5**, 102–115.DeGaetano, A. T., and Wilks D. S. , 2008: Radar-guided interpolation of climatological precipitation data.

,*Int. J. Climatol.***29**, 185–196, doi:10.1002/joc.1714.Diggle, P. J., and Ribeiro P. J. Jr., 2007:

*Model-Based Geostatistics*. Springer, 228 pp.Erdin, R., 2009: Combining rain gauge and radar measurements of a heavy precipitation event over Switzerland: Comparison of geostatistical methods and investigation of important influencing factors. Veroeffentlichungen der MeteoSchweiz 81, 108 pp.

Fuentes, M., Reich B. , and Lee G. , 2008: Spatial–temporal mesoscale modeling of rainfall intensity using gage and radar data.

,*Ann. Appl. Stat.***2**, 1148–1169.Gandin, L. S., 1965:

*Objective Analysis of Meteorological Fields*. Israel Program for Scientific Translation, 242 pp.García-Pintado, J., Barberá G. G. , Erena M. , and Castillo V. M. , 2009: Rainfall estimation by rain gauge-radar combination: A concurrent multiplicative-additive approach.

,*Water Resour. Res.***45**, W01415, doi:10.1029/2008WR007011.Germann, U., and Joss J. , 2002: Mesobeta profiles to extrapolate radar precipitation measurements above the Alps to the ground level.

,*J. Appl. Meteor.***41**, 542–557.Germann, U., Galli G. , Boscacci M. , and Bolliger M. , 2006: Radar precipitation measurement in a mountainous region.

,*Quart. J. Roy. Meteor. Soc.***132**, 1669–1692.Germann, U., Berenguer M. , Sempere-Torres D. , and Zappa M. , 2009: REAL—Ensemble radar precipitation estimation for hydrology in a mountainous region.

,*Quart. J. Roy. Meteor. Soc.***135**, 445–456.Gjertsen, U., Salek M. , and Michelson D. B. , 2004: Gauge adjustment of radar-based precipitation estimates in Europe.

*Proc. ERAD,*Visby, Sweden, ERAD, 7–11.Goudenhoofdt, E., and Delobbe L. , 2009: Evaluation of radar-gauge merging methods for quantitative precipitation estimates.

,*Hydrol. Earth Syst. Sci.***13**, 195–203.Haberlandt, U., 2007: Geostatistical interpolation of hourly precipitation from rain gauges and radar for a large-scale extreme rainfall event.

,*J. Hydrol.***332**, 144–157.Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts.

,*Mon. Wea. Rev.***129**, 550–560.He, X., Flemming V. , Stisen S. , Sonnenborg T. O. , and Jensen K. H. , 2011: An operational weather radar–based quantitative precipitation estimation and its application in catchment water resources modeling.

,*Vadose Zone J.***10**, 8–24.Hoel, P. G., Port S. C. , and Stone C. J. , 1971:

*Introduction to Probability Theory*. Houghton Mifflin, 272 pp.Jardim, E., and Ribeiro P. , 2008: Geostatistical tools for assessing sampling designs applied to a Portuguese bottom trawl survey field experience.

,*Sci. Mar.***72**, 623–630.Joss, J., and Coauthors, 1998: Operational use of radar for precipitation measurements in Switzerland. Verlag der Fachvereine Hochschulverlag AG, ETH Zürich, 110 pp.

Krajewski, W. F., 1987: Cokriging radar-rainfall and rain gage data.

,*J. Geophys. Res.***92**D8, 9571–9580.Leuenberger, D., and Rossa A. , 2007: Revisiting the latent heat nudging scheme for the rainfall assimilation of a simulated convective storm.

,*Meteor. Atmos. Phys.***98**, 195–215.MeteoSwiss, 2010: SwissMetNet: An observation network for the future. Federal Office of Meteorology and Climatology MeteoSwiss, 7 pp.

Neff, E. L., 1977: How much rain does a rain gage gauge?

,*J. Hydrol.***35**, 213–220.Overeem, A., Buishand T. , and Holleman I. , 2009: Extreme rainfall analysis and estimation of depth-duration-frequency curves using weather radar.

,*Water Resour. Res.***45**, W10424, doi:10.1029/2009WR007869.Peirce, C. S., 1884: The numerical measure of the success of predictions.

,*Science***4**, 453–454.R Development Core Team, cited 2010: The R project for statistical computing. [Available online at http://www.R-project.org.]

Ribeiro, P. J., Jr., and Diggle P. J. , 2001: GeoR: A package for geostatistical analysis.

*R News,*No. 2, R Project for Statistical Computing, Vienna, Austria, 15–18.Rodwell, M. J., Richardson D. S. , Hewson T. D. , and Haiden T. , 2010: A new equitable score for verifying precipitation in numerical weather prediction.

,*Quart. J. Roy. Meteor. Soc.***136**, 1344–1363.Schabenberger, O., and Gotway C. A. , 2005:

*Statistical Methods for Spatial Data Analysis*. Chapman & Hall/CRC, 488 pp.Schneider, S., and Steinacker R. , 2009: Utilisation of radar information to refine precipitation fields by a variational approach.

,*Meteor. Atmos. Phys.***103**, 137–144.Schuurmans, J. M., Bierkens M. F. P. , Pebesma E. J. , and Uijlenhoet R. , 2007: Automatic prediction of high-resolution daily rainfall fields for multiple extents: The potential of operational radar.

,*J. Hydrometeor.***8**, 1204–1224.Seo, D.-J., 1998: Real-time estimation of rainfall fields using radar rainfall and rain gage data.

,*J. Hydrol.***208**, 37–52.Seo, D.-J., and Breidenbach J. P. , 2002: Real-time correction of spatially nonuniform bias in radar rainfall data using rain gauge measurements.

,*J. Hydrometeor.***3**, 93–111.Sinclair, S., and Pegram G. , 2005: Combining radar and rain gauge rainfall estimates using conditional merging.

,*Atmos. Sci. Lett.***6**, 19–22.Talagrand, O., Vautard R. , and Strauss R. , 1997: Evaluation of probabilistic prediction systems.

*Proc. ECMWF Workshop on Predictability,*Shinfield Park, United Kingdom, ECMWF, 1–25.Todini, E., 2001: A Bayesian technique for conditioning radar precipitation estimates to rain-gauge measurements.

,*Hydrol. Earth Syst. Sci.***5**, 187–199.Velasco-Forero, C. A., Sempere-Torres D. , Cassiraga E. F. , and Gomez-Hernandez J. J. , 2009: A non-parametric automatic blending methodology to estimate rainfall fields from rain gauge and radar data.

,*Adv. Water Resour.***32**, 986–1002.Verworn, A., and Haberlandt U. , 2011: Spatial interpolation of hourly rainfall—Effect of additional information, variogram inference and storm properties.

,*Hydrol. Earth Syst. Sci.***15**, 569–584.Villarini, G., Mandapaka P. V. , Krajewski W. F. , and Moore R. J. , 2008: Rainfall and sampling uncertainties: A rain gauge perspective.

,*J. Geophys. Res.***113**, D11102, doi:10.1029/2007JD009214.Young, C., Bradley A. , Krajewski W. , Kruger A. , and Morrissey M. , 2000: Evaluating NEXRAD multisensor precipitation estimates for operational hydrologic forecasting.

,*J. Hydrometeor.***1**, 241–254.Zappa, M., and Coauthors, 2008: MAP D-PHASE: Real-time demonstration of hydrological ensemble prediction systems.

,*Atmos. Sci. Lett.***9**, 80–87.Zhang, C., Tang Y. , Luo L. , and Xu W. , 2009: Outlier identification and visualization for Pb concentrations in urban soils and its implications for identification of potential contaminated land.

,*Environ. Pollut.***157**, 3083–3090.