1. Introduction
Bárdossy and Pegram (2009) use copulas to obtain multivariate probability distributions of precipitation among gauges of a ground station network. By Sklar's theorem (Sklar 1959), unique copulas require continuous marginal distributions for their construction, which poses difficulties when handling mixed binary-continuous processes such as precipitation. A wide range of copula forms is available for modeling purposes. One major drawback, however, is the large number of parameters if parametric marginal distributions are used; these parameters need to be estimated through maximum-likelihood optimization. We will show that the approach proposed here is akin to modeling multivariate dependence using a nonparametric Gaussian copula.
In a similar context, Herr and Krzysztofowicz (2005) proposed a closed-form approach to bivariate precipitation modeling. The mixed binary-continuous precipitation process, observed at two sites, is mapped to standard normal variates using the nonparametric normal quantile transform (NQT). The bivariate dependence is modeled in terms of the mutually conditional probabilities of precipitation (PoP), two marginal and two conditional distributions, and the covariance as parameter, eight elements in total. Assuming a bivariate standard normal dependence is equivalent to using a Gaussian copula model. Despite rigorous verification of the model, the extension from the bivariate to the multivariate case is impractical because of the prohibitive number of conditional PoP combinations and conditional/marginal distributions (see appendix A). For instance, for two predictors and one predictand, 10 parameters and 6 univariate distributions are necessary. For three and more predictors, the number of parameters grows factorially.
The above difficulties in proposing suitable parametric multivariate distribution models favor random sampling. To avoid handling mixed binary-continuous distribution structures for precipitation (and, similarly, intermittent river flows), the variates can be treated as censored. Censored distributions describe random processes in which measurements are cut off beyond a critical threshold, while the number of censored points is known. We note that censoring is different from truncation, as the latter by definition excludes values beyond the truncation threshold. As a result, the mean and variance of a censored and a truncated distribution differ. Areas of application for censored distributions include the life sciences (Sorensen et al. 1998) and system failure analysis (Kalbfleisch and Prentice 1980). The parameters and conditional distributions of a censored distribution can be inferred by means of data augmentation through Markov chain Monte Carlo (MCMC) sampling. In hydrometeorology, censored distributions have been used in different contexts for postprocessing raw precipitation ensemble forecasts (Scheuerer and Hamill 2015), or data series that have been power transformed to approximate a normal distribution. Bárdossy and Plate (1992) applied a simple parametric power transformation to precipitation data, while Frost et al. (2007) and Wang et al. (2011) used modified Box–Cox and Yeo–Johnson power transforms to normalize river discharge data. All reported applications are limited to a small number of predictors due to heavy parameterization.
Here we propose an alternative approach by specifying the predictive density (1) through a mixture of concepts adopted in the previous approaches. The proposal rests on the model conditional processor (MCP) proposed by Todini (2008) for probabilistic flow forecasting. Recently, Reggiani et al. (2019) used the MCP to process monthly precipitation reanalyses in poorly gauged basins in Pakistan. So far the MCP has been applied to the derivation of predictive densities for continuous variates such as river stage and discharge (Coccia and Todini 2011) and monthly average precipitation and temperature (Reggiani et al. 2016). Here we extend the approach to multivariate mixed binary-continuous variates, primarily daily precipitation. The principal advantages of the approach can be summarized as follows:
The processor is parameter-parsimonious and thus applicable to large multivariate problems.
It does not require ad hoc assumptions on the choice of PoP or parametric precipitation depth CDFs, because it works directly with empirical distributions obtained from observed data and uses nonparametric transformations to standard normal variates.
It is extensible to probabilistic forecasting for arbitrarily located ground stations and an arbitrary number of predictors, suggesting applicability to ensemble forecasting.
It is computationally efficient and thus optimized for operational use.
We introduce the structure of the processor and show its properties and performance verification on a spatially limited example; more extended applications will be given in a sequel work. As such, the current work should be envisaged as a proof of concept rather than an operational application. The text is structured as follows: in section 2 we introduce the theory, in section 3 we describe the data and processor application, in section 4 we describe processor execution and results, in section 5 we provide a discussion, and in section 6 we present the conclusions. Implementation details are given in the appendixes.
2. Methods
a. Processor model

Fig. 1. The ERA-Interim weather model grid and analysis window including nine cells evidenced by the shadowed area. The triangles represent 15 observing stations including Bischofszell (BIZ) in the central analysis cell. Observed precipitation has been mapped from points to cell averages by block kriging.
b. Censoring and nonignorable missing data
To fill (or impute) the gap of unknown missing values in each row of the sample, we apply a random sampling technique, which preserves the mean and the variance-covariance structure of the whole sample, including the yet unknown censored values. We note that these moments are different from those of the same but truncated sample, where in contrast to censoring, no values exist beyond the truncation threshold. Depending on the method used, the sampling is known as data augmentation (Tanner and Wong 1987) or imputation (Little and Rubin 2002).
We note that the binary variate
The parameters can be estimated by applying expectation maximization (EM) or the Newton–Raphson method for maximum likelihood estimation (MLE) on (15). Both approaches can become computationally demanding when dealing with complex missingness patterns and large multivariate missing-not-at-random (MNAR) data. An alternative approach is Bayesian imputation.
c. Bayesian imputation
d. MCMC sampling
We use MCMC simulation to generate a large number of sample values from the two distributions in (24) for yc and the parameters θ = (μ, Σ), respectively, and approximate the summaries
After a series of sampling steps, during which the MCMC process loses track of the arbitrarily chosen set of initial parameter values (the burn-in period), the values sampled at each iteration represent draws from the posterior distribution, and the statistics in (26) and (27) can be computed to a degree of approximation that depends on the number of sampled values.
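The mechanics can be illustrated with a deliberately reduced example: a univariate normal sample censored below a threshold, with known unit variance and a flat prior on the mean. The sketch below alternates imputation of the censored values with a draw of the mean from its full conditional; it illustrates the data augmentation idea only, not the nested multivariate sampler of appendix C, and all numerical values are illustrative.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(42)

# Synthetic censored-normal data: true mean 1.0, unit variance; values
# below the threshold c are censored (only their count is known).
c = 0.0
y_full = rng.normal(1.0, 1.0, size=500)
y_obs = y_full[y_full >= c]          # observed part
n_cens = np.sum(y_full < c)          # number of censored points
n = len(y_full)

mu = 0.0                             # arbitrary starting value
draws = []
for it in range(2000):
    # 1) Data augmentation: impute censored values from the normal
    #    truncated above c, conditional on the current mean.
    b = (c - mu) / 1.0               # standardized upper bound
    y_cens = truncnorm.rvs(-np.inf, b, loc=mu, scale=1.0, size=n_cens,
                           random_state=rng)
    y = np.concatenate([y_obs, y_cens])
    # 2) Draw the mean from its full conditional (flat prior, known
    #    unit variance): mu | y ~ N(mean(y), 1/n).
    mu = rng.normal(y.mean(), 1.0 / np.sqrt(n))
    draws.append(mu)

burn_in = 500                        # discard pre-convergence draws
print("posterior mean of mu:", np.mean(draws[burn_in:]))
```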
3. Application
a. Study site
A well-monitored area in the northern part of Switzerland bordering on Lake Constance and depicted in Fig. 1 was selected as study site. The area is served by 15 meteorological stations operated by Meteo Swiss with an hourly recording interval. The figure also shows the grid of the ERA-Interim weather forecasting model at 0.125° × 0.125° spatial resolution. We note that the chosen resolution, corresponding to the Gaussian N640 grid, is finer than that of the native 0.7° × 0.7° N128 Gaussian grid. The regridding was performed with the aid of the Meteorological Archival and Retrieval System (MARS) of ECMWF. We select the square containing station Bischofszell (BIZ) as analysis cell and the 3 × 3 beige-shadowed window centered on the analysis cell as spatial processing region. Analyzed precipitation reforecasts for the nine cells at daily time step are used as predictors to calibrate and validate the processor. We emphasize that we have used the forecast field contained in the ERA-Interim dataset, initialized from analyses at 0000 and 1200 UTC (Berrisford et al. 2011), and not a genuine historical forecast or reforecast. However, the procedure outlined here remains independent of this choice and can be applied to assess the uncertainty of real-time forecasts as well as reanalyses in exactly the same way.
A time window of observations and forecasts covering a continuous period of 36 years, from 1 January 1979 to 31 December 2015, is chosen. The reforecasts are aggregated from 3-hourly to daily time steps. The whole sample includes a continuous series of data with a total number of 13 514 time steps. Precipitation lower than 0.5 mm day−1 is considered a nonevent. In summary, we study an m + 1 = 10-dimensional multivariate problem including 9 predictors and a single predictand of cell-averaged daily observations at analysis cell BIZ.
b. Block kriging
To use predictors provided at the scale of a model cell, precipitation needs to be upscaled from the hourly point measurements at individual stations to daily values at the scale of the analysis cell. For this purpose block kriging is used, a geostatistical method in which the spatial correlation structure of the station records is represented by an empirical semivariogram, which is in turn modeled through parametric functions. The semivariogram is time dependent and thus needs to be refitted periodically. We chose among four different semivariogram models, selected on the basis of optimal weighted least squares fitting (Cressie 1985). The optimal parameters are found by means of the Newton–Raphson method, minimizing the least squares error as cost function. Alternatively, one can consider the variogram parameters as random variables to be optimized by maximum likelihood estimation (MLE) (Todini and Pellegrini 1999).
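As a minimal illustration of the weighted least squares variogram fit, the sketch below fits a spherical model (standing in for the four candidate models) to a hypothetical empirical semivariogram; lag distances, semivariances, and pair counts are invented, and the weighting follows the Cressie (1985) form.

```python
import numpy as np
from scipy.optimize import least_squares

def spherical(h, nugget, sill, vrange):
    """Spherical semivariogram model."""
    h = np.asarray(h, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * h / vrange - 0.5 * (h / vrange) ** 3)
    return np.where(h < vrange, g, sill)

# Hypothetical empirical semivariogram: lag distances (km), semivariances,
# and the number of station pairs per lag bin.
lags = np.array([2., 5., 10., 15., 20., 30.])
gamma_emp = np.array([1.1, 2.0, 3.1, 3.6, 3.9, 4.0])
n_pairs = np.array([40, 85, 120, 90, 60, 30])

def wls_residuals(p):
    # Cressie-type weights: bins with more pairs and smaller model
    # values receive larger weight.
    model = spherical(lags, *p)
    return np.sqrt(n_pairs) * (gamma_emp - model) / model

fit = least_squares(wls_residuals, x0=[0.5, 4.0, 20.0],
                    bounds=([0, 0, 1e-3], np.inf))
nugget, sill, vrange = fit.x
print(f"nugget={nugget:.2f}, sill={sill:.2f}, range={vrange:.1f} km")
```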
c. Normalization of variables
Next we use the NQT in appendix B to map the predictand and predictor variates X and
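A minimal sketch of the NQT of appendix B is given below, under the assumption of Weibull plotting positions i/(n + 1) for the empirical CDF; the function name is ours. In our setting the tied zero values merely fix the Gaussian censoring threshold, below which values are later recovered by imputation.

```python
import numpy as np
from scipy.stats import norm, rankdata

def nqt(x):
    """Map a sample to standard normal scores via its empirical CDF
    (Weibull plotting positions i/(n+1)); ties share the average rank."""
    ranks = rankdata(x)                 # ranks 1..n, averaged for ties
    u = ranks / (len(x) + 1.0)          # empirical nonexceedance probability
    return norm.ppf(u)

# Example: daily precipitation with many zeros. All zeros receive the
# same Gaussian value, which defines the censoring threshold c.
precip = np.array([0., 0., 0., 0.2, 1.3, 0., 4.5, 12.0, 0., 2.2])
z = nqt(precip)
c = z[precip == 0.].max()
print("Gaussian zero-precipitation threshold c =", round(c, 3))
```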
d. Missing data imputation
The NQT-transformed censored series are combined into a joint distribution, which is assumed to be a censored MVND [see (6)] with density in (7). Of course this assumption must be tested; such testing is performed after the censored sample has been extended by imputation using Gibbs sampling. Bayesian imputation leads to an (m + 1)-dimensional complete MVN sample, including the imputed values yc, that fully preserves the parameters μ and Σ of the uncensored parent sample. The Gibbs sampler is first tested on a synthetic 10-dimensional, perfectly MVN-distributed sample generated by random draws from the MVND with known variance-covariance matrix Σ and zero mean μ. The complete synthetic sample is censored using the threshold vector c calculated from the observation and forecast sample of the Swiss study site. By applying Gibbs sampling as described in appendix C, the censored part yc of the sample is retrieved by imputation from the remainder of the parent sample. The mean and variance-covariance matrix of the reconstructed sample (yc, yo) were compared against μ and Σ of the original parent sample prior to censoring. Both turned out to be essentially equal, confirming the correct convergence of the sampler through the imputation process. Figure 2 shows a bivariate Gaussian synthetic sample, which was drawn, censored, and then reconstructed by imputation. When working with a real-world sample of observed and forecasted precipitation treated as censored, the parameters of the uncensored parent sample are unknown and must be iteratively recovered through estimates from the reconstructed sample. A posterior verification by comparison, as in the synthetic case, is then not possible.
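A deliberately simplified, single-pass variant of the synthetic experiment of Fig. 2 is sketched below, with the parent parameters assumed known; rows in which both variates are censored would require the full nested sampler of appendix C and are left untouched here, so the comparison of moments is only indicative.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.7], [0.7, 1.0]])
y = rng.multivariate_normal(mu, Sigma, size=5000)

c = np.array([-0.5, -0.5])             # censoring thresholds per variate
censored = y < c                        # mask of censored entries

y_rec = y.copy()
# Single-pass imputation: rows where exactly one variate is censored are
# redrawn from the univariate conditional normal, truncated above c.
for j, k in [(0, 1), (1, 0)]:
    rows = censored[:, j] & ~censored[:, k]
    cond_mean = mu[j] + Sigma[j, k] / Sigma[k, k] * (y[rows, k] - mu[k])
    cond_sd = np.sqrt(Sigma[j, j] - Sigma[j, k] ** 2 / Sigma[k, k])
    b = (c[j] - cond_mean) / cond_sd    # standardized upper bound
    y_rec[rows, j] = truncnorm.rvs(-np.inf, b, loc=cond_mean,
                                   scale=cond_sd, random_state=rng)

print("parent covariance:\n", np.cov(y.T))
print("reconstructed covariance:\n", np.cov(y_rec.T))
```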

Fig. 2. Bayesian imputation for a synthetic bivariate normal sample, 10³ draws. (top left) Original sample drawn from N2(μ, Σ).
Next we test the Gibbs sampler on the 10 × 13 514 36-yr daily dataset for the study site, comprising 9 predictor series and one series of observations. The standard normal values of the zero-precipitation thresholds are calculated and the Gibbs sampler is applied to impute the fictitious subthreshold data values. This process requires particular attention because of the high covariance among the predictor series (forecasted precipitation from adjacent weather model grid cells in Fig. 1), which leads to poor mixing in the Gibbs sampling (Raftery and Lewis 1992). Figure 3 shows two variates of a synthetic 10-variate normal sample with very high uniform covariances σij = 0.99. This case mimics the covariance structure between forecast series of adjacent predictor cells. The sample was censored and successfully reconstructed by imputation.

Fig. 3. Bayesian imputation for a highly correlated synthetic 10-variate normal sample, 10³ draws. (top left) Variate 1 vs 2 of the original sample drawn from N10(μ, Σ) with μ = 0, σij = 0.99. (top right) Distribution truncated at the Gaussian values estimated from the Swiss data sample. (bottom left) Truncated sample with the red part retrieved by imputation, and (bottom right) a zoom in on the transition zone between parent and imputed sample. High correlation leads to poor mixing in the Gibbs sampling.
Poor mixing is known to cause oscillation of the sampling during imputation, as visible in the trace plots in Fig. 4, and eventually leads to divergence of the iterative process. This difficulty can be overcome by diagonalizing the variance-covariance matrix Σ through principal component analysis (PCA).

Fig. 4. Effects of poor mixing on Gibbs sampling for one variate of a 10-variate synthetic Gaussian sample, σij = 0.99, 10³ draws. (top left) The oscillating trace (a Gaussian variate) of the Gibbs sampler without PCA, and (top right) the stabilization of the sampling due to the PCA transformation, which diagonalizes the variance-covariance matrix (σij = 0.99; i ≠ j). (bottom) Autocorrelation function (ACF) of successive draws: without PCA (left) undesired autocorrelation persists across the iterative sampling, while it decays after a few iterations when adopting PCA.
PCA is a linear transformation and presupposes that the sufficient statistics mean and variance fully describe the sample distribution, a condition that is strictly met by Gaussian data and, as in our case, is obtained by normalizing precipitation data through the NQT. We evaluate the eigenvectors and obtain a diagonal covariance matrix with σii given by the eigenvalues. The eigenvalues represent the variance along the principal components. Principal components with larger associated variances carry high information content, while those with low variances resemble noise. PCA thus allows ranking predictors in terms of their information contribution and, as a consequence, reducing the dimensionality of the problem, a property of significant interest when working with ensemble forecasts. A minimal sketch of this diagonalization step is given below.
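The sketch uses an illustrative compound-symmetric covariance with σij = 0.99, mimicking the adjacent-cell forecasts of Fig. 3; it is not the operational implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
# Highly correlated synthetic predictors, mimicking adjacent model cells:
# unit variances, off-diagonal covariances 0.99.
Sigma = np.full((d, d), 0.99) + 0.01 * np.eye(d)
y = rng.multivariate_normal(np.zeros(d), Sigma, size=5000)

# Eigendecomposition of the sample covariance (PCA).
S = np.cov(y.T)
eigval, eigvec = np.linalg.eigh(S)      # ascending eigenvalues
order = np.argsort(eigval)[::-1]        # sort descending
eigval, eigvec = eigval[order], eigvec[:, order]

# Rotate into principal components: the covariance becomes diagonal,
# so the Gibbs sampler updates nearly independent coordinates.
pc = y @ eigvec
C = np.cov(pc.T)
print("variance explained by PC1: %.1f%%" % (100 * eigval[0] / eigval.sum()))
print("max |off-diagonal| of rotated covariance: %.2e"
      % np.abs(C - np.diag(np.diag(C))).max())
```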
An additional complication arises from the fact that, when performing PCA, one needs to consider the transformation by rotation of the entire sampling sector while mapping the 10-dimensional reference system into principal components. Figure 5 gives an example of this situation: it shows a snapshot of sampling (Kotecha and Djurić 1999) from a bivariate truncated Gaussian distribution in the inner Gibbs sampling cycle (appendix C). With reference to the composite variate

Fig. 5. Slice of the inner Gibbs sampling cycle for the 2D case in which missing values are sampled from the bivariate truncated normal distribution. The reference frame is rotated by the PCA transformation, with eigenvectors on display. The red cloud is an ensemble of sampled fictitious negative precipitation values, of which only one is retained at each inner cycling step. Due to rotation of the reference frame, the straight boundaries of the gray sampling region change accordingly: instead of drawing from the (left) gray square region, one must draw from the (right) gray triangle region, which requires linear inequality constraining.
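The constrained draw sketched in Fig. 5 can be illustrated for the bivariate case: a Gibbs cycle over the univariate conditionals of a bivariate normal, with each coordinate truncated to the interval implied by linear inequality constraints A y ≤ b, in the spirit of Li and Ghosh (2015). The covariance and constraint values below are hypothetical.

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(3)

# Zero-mean bivariate normal subject to linear inequality constraints
# A @ y <= b, the situation arising after the PCA rotation.
Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
A = np.array([[1.0, 1.0], [1.0, -1.0]])   # hypothetical rotated bounds
b = np.array([0.0, 0.5])

y = np.array([-1.0, -1.0])                # feasible starting point
samples = []
for it in range(5000):
    for j in range(2):
        k = 1 - j
        # Conditional N(m, s^2) of y_j given y_k (zero-mean case).
        m = Sigma[j, k] / Sigma[k, k] * y[k]
        s = np.sqrt(Sigma[j, j] - Sigma[j, k] ** 2 / Sigma[k, k])
        # Each constraint row i restricts y_j to one side, depending on
        # the sign of A[i, j]: A[i,j]*y_j <= b[i] - A[i,k]*y_k.
        lo, hi = -np.inf, np.inf
        for i in range(len(b)):
            r = b[i] - A[i, k] * y[k]
            if A[i, j] > 0:
                hi = min(hi, r / A[i, j])
            elif A[i, j] < 0:
                lo = max(lo, r / A[i, j])
        y[j] = truncnorm.rvs((lo - m) / s, (hi - m) / s, loc=m, scale=s,
                             random_state=rng)
    samples.append(y.copy())

samples = np.asarray(samples)[1000:]      # discard burn-in
assert np.all(samples @ A.T <= b + 1e-9)  # all draws satisfy A y <= b
```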
e. Testing
Next we test the validity of the assumption in (6). Choosing to model the dependence structure of the joint normal sample (w, z) as MVN is equivalent to using a Gaussian copula constructed from the corresponding Gaussian marginal distributions F and G. The MVN dependence is one of several possible dependence structures we could have chosen and is of course only an approximation of the true dependence structure of the joint sample, which we nevertheless assume sufficiently close to an MVND. A positive outcome of stringent multivariate normality tests, such as the Mardia test (Mardia 1970), is therefore illusory in our case, not so much because of the distance between the Gaussian copula and the MVND, but because of the presence of outliers, especially in the fringe region of the MVND associated with high- or low-end extreme events. Nevertheless, other tests can be performed to investigate sufficient closeness of the copula to an MVND. The sample tested positively for pairwise linear predictor versus predictor and predictand versus predictor dependence, as shown in the first row of Fig. 6 for an observation–predictor pair (left) and a predictor–predictor pair (right). We also tested the dependence of the residuals on the regressor using the Breusch–Pagan test (Breusch and Pagan 1979), which led to acceptance of the homoscedasticity hypothesis in the majority of cases (second row in Fig. 6).

Fig. 6. Test results of multivariate normality for (left) an observation–predictor and (right) a predictor–predictor pair for selected predictors. (top) Linear interdependence, (middle) the residuals against a selected predictor, and (bottom) the Q–Q plot of the residuals. While linearity of the dependence and homoscedasticity of the residuals are visible, the residuals diverge from the bisection line in the tail regions.
A Shapiro–Wilk test for normality of the residuals was also performed, but led to rejection in some cases, even though the residuals lay visually on the theoretical Gaussian CDF curve. Especially the tail regions of the pairwise Q–Q plots showed divergence from the bisection line (third row in Fig. 6).
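For reference, the two residual tests can be reproduced along the following lines; the data here are a synthetic stand-in for a Gaussian-space predictand–predictor pair, and the scipy/statsmodels calls are standard implementations of the named tests.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import shapiro
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(7)
# Hypothetical Gaussian-space pair: predictor z and predictand w.
z = rng.normal(size=2000)
w = 0.8 * z + 0.6 * rng.normal(size=2000)

# Linear regression of w on z and its residuals.
X = sm.add_constant(z)
res = sm.OLS(w, X).fit()

# Breusch-Pagan: H0 = homoscedastic residuals.
lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(res.resid, X)
print(f"Breusch-Pagan p-value: {lm_pval:.3f}")

# Shapiro-Wilk: H0 = normally distributed residuals.
stat, pval = shapiro(res.resid)
print(f"Shapiro-Wilk p-value: {pval:.3f}")
```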
Finally, we tested the Gibbs sampler on the 10-variate synthetic sample, which was drawn from a genuine MVND and, as expected, passed the Mardia test. After censoring the sample and retrieving the censored part by imputation, we performed another, positive test for multivariate normality of the complete joint sample, confirming that the Gibbs sampler produces perfectly multivariate Gaussian data.
4. Results
a. Processor execution
The 10 × 13 514-point study dataset, comprising 9 forecast series and 1 observation series, is split into a 1 January 1979–31 December 2010 calibration period and a 1 January 2011–31 December 2015 validation period. The processor is set up for the first period and verified over the second, whereby the conditional mean is compared against observations. This corresponds to a practical situation in which a forecasting service uses a processor for the 5-yr period 2011–15 after it has been calibrated on 31 years of daily data and last updated on 31 December 2010. The observed precipitation climatology is Weibull distributed, as visible in Fig. 7. The normalized subzero precipitation data, considered censored negative precipitation, are retrieved by imputation for the calibration period. The sampling yields the posterior variance-covariance matrix Σ and a zero mean vector μ, to be used for processing the validation period. First we apply (9) to compute the conditional mean and variance, the latter constant by virtue of the near-homoscedasticity of the data. The Gaussian predictive density is computed by means of (8) and subsequently mapped back into the original space. The continuous Gaussian process must be transformed into a discontinuous process of type (A1). To this end the dichotomous PoP process is drawn from the Bernoulli distribution, a special case of the binomial distribution, while the precipitation depth is drawn from the inverse Weibull distribution.
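A sketch of this back-transformation for a single day follows, assuming the climatological marginal is the 3-parameter Weibull of Fig. 7 (operationally, the empirical CDF of appendix B is inverted instead); all numerical values are placeholders, not the fitted BIZ parameters.

```python
from scipy.stats import norm, weibull_min

# Hypothetical Gaussian predictive density for one day and the Gaussian
# zero-precipitation threshold c obtained from the NQT.
mu_cond, sd_cond, c = 0.3, 0.55, -0.6

# Predictive probability of a dry day: mass of the predictive density
# below the threshold, used to parameterize the Bernoulli draw.
p_dry = norm.cdf(c, loc=mu_cond, scale=sd_cond)

# Climatological wet-day depth model (3-parameter Weibull as in Fig. 7;
# shape k, scale lam, location theta are illustrative values only).
k, lam, theta = 0.8, 6.0, 0.5
F0 = norm.cdf(c)                      # climatological dry fraction

def predictive_quantile(q):
    """Back-transform the predictive quantile at level q to mm/day."""
    w_q = norm.ppf(q, loc=mu_cond, scale=sd_cond)  # Gaussian-space quantile
    u = norm.cdf(w_q)                 # climatological nonexceedance level
    if u <= F0:
        return 0.0                    # below threshold: no precipitation
    # Renormalize the wet part of the mixed CDF, then invert the Weibull.
    return weibull_min.ppf((u - F0) / (1.0 - F0), k, loc=theta, scale=lam)

print("PoP =", round(1 - p_dry, 2),
      " median depth =", round(predictive_quantile(0.5), 2), "mm/day")
```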

Fig. 7. Empirical and modeled climatic distribution of daily precipitation observed at BIZ, 1979–2015. The lower abscissa and left ordinate axis refer to the CDF, the other axes to the PDF. The data follow a 3-parameter Weibull distribution with k the shape, λ the scale, and θ the location parameter.

Fig. 8. (a) The observed precipitation, the conditional mean, and credible intervals for the Gaussian variables over a selected 4-month period at the central analysis cell BIZ (Fig. 1). The dashed horizontal line indicates the Gaussian zero-precipitation threshold. (b) The same data backtransformed into the original space. We note that the Gaussian PDFs morph into skewed gamma-type PDFs. The four Gaussian PDFs in the middle have part of the curve below the Gaussian zero-precipitation threshold; the area portions below the threshold give the probability of nonoccurrence used to parameterize the Bernoulli sampler for drawing the real-space PoP.
b. Verification metrics
Table 1. Performance indicators, 9 cells vs 1 cell, 1979–2010 [calibration (cal)] vs 2011–15 [validation (val)], for the proposed approach, BMA, and the raw unprocessed forecast at the central cell.
Table 1 is split into two parts, an upper one with indicators calculated in the Gaussian space and a lower one with indicators in the real space. Vertically, the table compares results for the proposed method, BMA, and the raw unprocessed prediction at the central analysis cell. For the proposed method, the Pearson correlation (CORR) between observations and conditional mean, which coincides with the covariance for standard Gaussian variates, computes to 0.84 for 9 predictors, indicating good agreement between the Gaussian observations and the linear model. The value is slightly smaller for the application with a single predictor. Next, we report the coefficient of multiple correlation R2 (coefficient of determination), which can be interpreted as the share of the variance (VAR) of the observations explained solely by the regression model, and the variance of the residuals in (9) (the unexplained variance), equal to 1 − R2. The values corroborate that the linear regression model in (4) is able to explain 71% of the Gaussian variance, while 29% remains random noise. We do not report these values for the verification period, as the processor continues to operate with the parameters (μw|z, Σzz) retrieved for the calibration period. It would be possible to compute these parameters retrospectively for the validation period, but this would require another round of imputation to recover subzero observations and predictions, while the difference with respect to the calibration period would likely be insignificant. The results for BMA are very similar, but slightly worse. The BMA variance averaged over the calibration and validation periods computes to 0.32, thus slightly larger than the variance given by our approach. The signal-to-noise ratio (SNR) is a decision-theoretic measure of the informativeness of the output (Krzysztofowicz 1992) and equals 2.4 for the proposed method and 2.16 for BMA, owing to the latter's slightly higher variance. In the hypothetical case of a totally uninformative forecast, completely uncorrelated with the observations, CORR(w, μw|z) ≈ 0, all variance becomes unexplained and SNR → 0. To the contrary, if the processed forecast is "perfect," CORR(w, μw|z) = 1 and consequently SNR → ∞. This also means that in the case of a noninformative forecasting model, which poorly correlates with observations,
The lower part of the table reports bias (BIAS), mean absolute error (MAE), root-mean-square error (RMSE), and correlation (CORR), metrics that are all estimated in the real variable space. We note that one effect of the Bayesian processor is bias removal; the bias is reduced to well below 0.5 mm day−1 in all cases.
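For transparency, the SNR values reported above are mutually consistent if the SNR is read as the ratio of explained to unexplained variance of the Gaussian predictand; this reading is our inference from the reported numbers, not a formula quoted from Krzysztofowicz (1992):

```latex
\mathrm{SNR} \;=\; \frac{R^2}{1 - R^2} \;=\; \frac{0.71}{1 - 0.71} \;\approx\; 2.4 .
```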
Processor performance is distilled through reliability diagrams (Wilks 1995; Bröcker and Smith 2007), which visualize dichotomous occurrence/nonoccurrence frequencies of an observation against the probability of the corresponding forecast. First we preset precipitation thresholds V = [1, 10, 15] mm day−1 against which dichotomous occurrence/nonoccurrence is verified. Given V, a forecast is considered reliable if precipitation exceeding V occurs with an observed relative frequency consistent with the forecast probability. If oj are the occurrence/nonoccurrence frequencies of daily observations and qi the corresponding admissible probabilities of forecasts exceeding V, the reliability diagram collapses the joint distribution p(oj, qi) by factorization into the conditional distribution p(oj|qi) (calibration distribution) and the forecast distribution p(qi) (refinement distribution) (Murphy and Winkler 1987). The relative frequencies of the observations are plotted against suitably binned forecast probability quantiles. The forecast distribution is visualized as an inset frequency histogram on the same plot. Figure 9 shows the reliability diagrams for the calibration (Fig. 9a) and validation (Fig. 9b) periods for different values of V. An ideally calibrated processor producing perfect forecasts yields a graph with markers lying on the bisection line. Markers below the bisection indicate systematic overforecasting and thus a wet bias, while markers above indicate underforecasting and a dry bias. A forecast biased in either direction is considered unreliable or miscalibrated.
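Computationally, the calibration and refinement distributions reduce to a binning operation, sketched below with our own function names; the threshold exceedance has already been converted to forecast probabilities and 0/1 observations.

```python
import numpy as np

def reliability_curve(q_fcst, o_obs, n_bins=20):
    """Observed relative frequency per forecast-probability bin.

    q_fcst: forecast probabilities of exceeding the threshold V
    o_obs:  dichotomous observations (1 if precip > V, else 0)
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(q_fcst, edges) - 1, 0, n_bins - 1)
    bin_prob, obs_freq, population = [], [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            bin_prob.append(q_fcst[mask].mean())  # refinement distribution
            obs_freq.append(o_obs[mask].mean())   # calibration distribution
            population.append(mask.sum())
    return np.array(bin_prob), np.array(obs_freq), np.array(population)
```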

Fig. 9. Reliability diagrams for (a) calibration and (b) validation, threshold values V = [1, 10, 15] mm day−1, 9 predictors. The insets visualize relative frequencies of the forecast distribution p(qi). With increasing threshold V, calibration deteriorates visibly through departure from the bisection in either direction. In the first panel in (b) the marker at 0% on the ordinate axis means that this bin contains only observation nonoccurrences. Because of rarefaction of observed frequencies in the high precipitation range for validation, the number of bins has been reduced from 20 to 5. The vertical bars indicate consistency intervals. Observed relative frequencies are in nearly all cases within the 5%–95% bounds and thus consistent with reliability.
The vertical consistency interval bars in Fig. 9 are derived as in Bröcker and Smith (2007) by computing variations of the observed relative frequencies over a set of forecasts generated by bootstrap resampling with replacement. The method computes variations of the observed relative frequencies resulting from uncertainties in the quantile bin mean probability and the bin population. Such sampling produces a frequency distribution for each bin, from which the 5%–95% consistency interval is determined. The bin size in Fig. 9b has been enlarged in the high precipitation range (>10 mm day−1) to avoid meaningless observed relative frequencies due to entirely full or empty quantile bins.
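One way to generate such consistency bounds under the reliability hypothesis, following our reading of Bröcker and Smith (2007), is sketched below: forecasts are resampled with replacement and surrogate observations are drawn as Bernoulli trials with the forecast probabilities, so the spread reflects bin population and bin-mean uncertainty only.

```python
import numpy as np

def consistency_intervals(q_fcst, n_bins=20, n_boot=1000, rng=None):
    """5%-95% consistency bounds under the hypothesis of reliability."""
    rng = rng or np.random.default_rng()
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    freqs = np.full((n_boot, n_bins), np.nan)
    for t in range(n_boot):
        # Resample forecasts; simulate observations of a reliable system.
        q = rng.choice(q_fcst, size=len(q_fcst), replace=True)
        o = rng.random(len(q)) < q
        idx = np.clip(np.digitize(q, edges) - 1, 0, n_bins - 1)
        for b in range(n_bins):
            m = idx == b
            if m.any():
                freqs[t, b] = o[m].mean()
    return (np.nanpercentile(freqs, 5, axis=0),
            np.nanpercentile(freqs, 95, axis=0))
```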

Fig. 10. Daily CRPS values for the same selected 4-month period as in Fig. 8; red for the proposed method, blue for BMA.
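In the Gaussian space, the CRPS of a normal predictive density admits the closed form of Gneiting et al. (2005), which a verification script might use as below; scoring the back-transformed mixed binary-continuous distribution in the real space instead requires numerical evaluation or sampling.

```python
import numpy as np
from scipy.stats import norm

def crps_gaussian(mu, sigma, y):
    """Closed-form CRPS of a N(mu, sigma^2) forecast for observation y
    (Gneiting et al. 2005)."""
    z = (y - mu) / sigma
    return sigma * (z * (2.0 * norm.cdf(z) - 1.0)
                    + 2.0 * norm.pdf(z) - 1.0 / np.sqrt(np.pi))

# Example: scoring a single day in the Gaussian space (values illustrative).
print(round(crps_gaussian(0.3, 0.55, 1.1), 3))
```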
5. Discussion
Thus far we have presented the methodology and application of the proposed Bayesian uncertainty processor, which is founded on the model conditional processor (MCP) concept (Todini 2008). In principle there is neither a theoretical nor a procedural limitation on the number of employable predictors, thanks to working with normal variables and a nonparametric structure. Additional predictors can easily be included in the analysis.
The proposed Bayesian processor has been benchmarked against BMA, and the performance indicators in Table 1 lie in the same range, with the BMA average variance over the calibration and validation periods slightly higher. Unlike Sloughter et al. (2007), who derived heuristic conditional densities of precipitation ad hoc as logistic regressions with a power transformation, we first normalized the variates by nonparametric NQT, performed data augmentation, and subsequently applied BMA to the Gaussian data. Todini (2008) and Biondi and Todini (2018) pursued a similar performance comparison of the two approaches, albeit in a different context, confirming strong similarities between BMA and the proposed Bayesian approach in terms of predictive mean, with a slightly higher variance for BMA. We also note that BMA constitutes an approximation of the predictive mean and variance, requiring constrained optimization to determine the weights. Moreover, BMA does not explicitly consider the covariance structure among predictors. Our proposed method instead estimates predictive distributions on an analytical basis by nonparametric mapping of empirical distributions to the Gaussian space, while accounting for the dependence structure among predictors. A certain level of approximation nevertheless remains, due to assuming MVN dependence among the NQT-transformed variates and estimating the covariance structure by MCMC sampling.
Another topic of discussion is our choice of the spatial set of predictors. These are given by grid-based precipitation forecasts in a 9-cell modeling window centered on an analysis cell. This choice is motivated by the necessity to account for the spatial uncertainty of precipitation. The methodology does not limit the extension of the analysis mask to windows larger than the one chosen here. Moreover, the approach can be spatialized by applying a sliding analysis mask over larger regions. Such spatial use of the processor requires estimating the covariance matrix Σzz and the covariance vector Σwz on a cell-by-cell basis for the study region. Of course these quantities need to be reestimated periodically as new observations and forecasts become available. At an operational level such updating, which necessitates Bayesian imputation, can be executed offline without impacting online operations.
We also note that the processor has been calibrated over a single dataset, without slicing it by season or specific pluvial regime. For the MVND to correctly account for extreme events, the multivariate normal regression model should be fitted separately for a low-to-intermediate and a high precipitation range. Such an approach would improve the estimation of extreme precipitation events, which tend to be underrepresented in the current setup. Examples of calibrating the processor by means of multivariate truncated normal distributions to accommodate heteroscedasticity of the data are given in Coccia and Todini (2011) and Reggiani et al. (2016, 2019).
6. Summary and conclusions
The outlined methodology describes a precipitation uncertainty processor able to handle multiple binary-continuous predictors. The output of the processor is a calibrated, debiased probabilistic forecast of binary-continuous precipitation at a single location. Precipitation, an intermittent random process, is treated as a censored variate and turned into a continuous one by recovering the unknown censored values through imputation. Processing of the predictors and the observed values starts with the transformation of the nonparametric marginal distributions into standard normal variates by NQT. The normalization serves several purposes.
First, the joint distribution can be considered with some approximation as MVND, equivalent to a Gaussian copula, which admits closed-form expressions for conditional densities.
Second, the censored Gaussian precipitation values and MVND parameters are retrieved by Bayesian imputation using a nested Gibbs sampler. The use of fully Gaussian distributed data facilitates the monitoring of imputation convergence and verification of results.
Third, Gaussianity of the data supports the application of principal component analysis (PCA) to diagonalize the variance-covariance matrix and analyze data redundancy.
Conditional predictive densities are computed for the normal variables and subsequently back-transformed into the space of origin. The processor has been calibrated and validated for a test site in Switzerland, yielding satisfactory reliability diagrams as well as performance indicators comparable to BMA. The principal strengths of the proposed method can be summarized as follows:
The processor does not rely on ad hoc assumptions about distribution models for precipitation depth, but works directly with empirical CDFs that are subsequently mapped to Gaussian variates by nonparametric transformations. This supports parameter parsimony and avoids the need for parameter optimization.
Being parameter-parsimonious, the processor is computationally efficient and consequently apt for operational use.
The processor is sufficiently generic to handle multiple predictors, encouraging its use in hydrometeorological applications involving similar intermittent random processes, for example, river flows.
The processor is self-calibrating, given that it produces an output (the predictive mean) with the same distributional properties as the retrospective observations.
Moreover, the processor guarantees coherence, as it cannot produce an output with lower informativeness, and thus less economic value, than the use of the climatic distribution.
Standard forecast verification shows that the processor meets quality criteria, such as bias removal, and yields comparable values against BMA in terms of commonly used performance metrics.
Acknowledgments
This research was supported by Deutsche Forschungsgemeinschaft through Grant RE3834/5 "BSCALE" awarded to the first author. We thank Ezio Todini for the BMA likelihood maximization algorithm and Alfred Müller from the University of Siegen for constructive suggestions. Both have supported this work with their discussions. We also acknowledge Meteo Swiss and ECMWF for giving access to the data used in this study. We finally acknowledge three anonymous reviewers for their thorough reading and suggestions that have helped improve the manuscript.
APPENDIX A
Precipitation Process Representation
APPENDIX B
Normal Quantile Transform
APPENDIX C
Posterior Gibbs Sampling
REFERENCES
Alpert, M., and H. Raiffa, 1982: A progress report on the training of probability assessors. Judgment under Uncertainty: Heuristics and Biases, D. Kahneman, P. Slovic, and A. Tversky, Eds., Cambridge University Press, 294–305, https://doi.org/10.1017/CBO9780511809477.022.
Bárdossy, A., and E. Plate, 1992: Space-time model for daily rainfall using atmospheric circulation patterns. Water Resour. Res., 28, 1247–1259, https://doi.org/10.1029/91WR02589.
Bárdossy, A., and G. G. S. Pegram, 2009: Copula based multisite model for daily precipitation simulation. Hydrol. Earth Syst. Sci., 13, 2299–2314, https://doi.org/10.5194/hess-13-2299-2009.
Berrisford, P., and Coauthors, 2011: The ERA-Interim archive version 2.0. Tech. Rep., European Centre for Medium-Range Weather Forecasts, 27 pp., https://www.ecmwf.int/node/8174.
Biondi, D., and E. Todini, 2018: Comparing hydrological postprocessors including ensemble predictions into full predictive probability distribution of stream flow. Water Resour. Res., 54, 9860–9882, https://doi.org/10.1029/2017WR022432.
Breusch, T. S., and A. R. Pagan, 1979: A simple test for heteroskedasticity and random coefficient variation. Econometrica, 47, 1287–1294, https://doi.org/10.2307/1911963.
Bröcker, J., and L. A. Smith, 2007: Increasing the reliability of reliability diagrams. Wea. Forecasting, 22, 651–662, https://doi.org/10.1175/WAF993.1.
Coccia, G., and E. Todini, 2011: Recent developments in predictive uncertainty assessment based on the Model Conditional Processor approach. Hydrol. Earth Syst. Sci., 15, 3253–3274, https://doi.org/10.5194/hess-15-3253-2011.
Cressie, N., 1985: Fitting variogram models by weighted least squares. Math. Geol., 17 (5), 563–586.
Frost, A. J., M. A. Thyer, R. Srikanthan, and G. Kuczera, 2007: A general Bayesian framework for calibrating and evaluating stochastic models of annual multi-site hydrological data. J. Hydrol., 340, 129–148, https://doi.org/10.1016/j.jhydrol.2007.03.023.
Gelman, A., J. B. Carlin, H. S. Stern, D. B. Dunson, A. Vehtari, and D. B. Rubin, 2014: Bayesian Data Analysis. 3rd ed. CRC Press, 639 pp.
Geman, S., and D. Geman, 1984: Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell., PAMI-6, 721–741, https://doi.org/10.1109/TPAMI.1984.4767596.
Gneiting, T., A. E. Raftery, A. H. Westveld, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098–1118, https://doi.org/10.1175/MWR2904.1.
Gupta, V. K., and L. Duckstein, 1975: A stochastic analysis of extreme droughts. Water Resour. Res., 11, 221–228, https://doi.org/10.1029/WR011i002p00221.
Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229, https://doi.org/10.1175/MWR3237.1.
Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434–1447, https://doi.org/10.1175/1520-0493(2004)132<1434:ERIMFS>2.0.CO;2.
Herr, H. D., and R. Krzysztofowicz, 2005: Generic probability distribution of rainfall in space: The bivariate model. J. Hydrol., 306, 234–263, https://doi.org/10.1016/j.jhydrol.2004.09.011.
Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Wea. Forecasting, 15, 559–570, https://doi.org/10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2.
Kalbfleisch, J. D., and R. L. Prentice, 1980: The Statistical Analysis of Failure Time Data. Wiley and Sons, 435 pp.
Katz, R. W., 1977: Precipitation as a chain-dependent process. J. Appl. Meteor., 16, 671–676, https://doi.org/10.1175/1520-0450(1977)016<0671:PAACDP>2.0.CO;2.
Kavvas, M. L., and J. W. Delleur, 1981: A stochastic cluster model of daily rainfall sequences. Water Resour. Res., 17, 1151–1160, https://doi.org/10.1029/WR017i004p01151.
Kelly, K. S., and R. Krzysztofowicz, 2000: Precipitation uncertainty processor for probabilistic river stage forecasting. Water Resour. Res., 36, 2643–2653, https://doi.org/10.1029/2000WR900061.
Kotecha, J. H., and P. E. Djurić, 1999: Gibbs sampling approach for generation of truncated multivariate Gaussian random variables. Proc. 1999 IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP99), Phoenix, AZ, IEEE, 1757–1760, https://doi.org/10.1109/ICASSP.1999.756335.
Krzysztofowicz, R., 1992: Bayesian correlation score: A utilitarian measure of forecast skill. Mon. Wea. Rev., 120, 208–219, https://doi.org/10.1175/1520-0493(1992)120<0208:BCSAUM>2.0.CO;2.
Krzysztofowicz, R., 1999: Bayesian theory of probabilistic forecasting via deterministic hydrologic model. Water Resour. Res., 35, 2739–2750, https://doi.org/10.1029/1999WR900099.
Li, Y., and S. K. Ghosh, 2015: Efficient sampling methods for truncated multivariate normal and student-t distributions subject to linear inequality constraints. J. Stat. Theory Pract., 9, 712–732, https://doi.org/10.1080/15598608.2014.996690.
Little, R. J. A., and D. B. Rubin, 2002: Statistical Analysis with Missing Data. Wiley Interscience, 409 pp.
Mardia, K. V., 1970: Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530, https://doi.org/10.1093/biomet/57.3.519.
Mardia, K. V., J. T. Kent, and J. M. Bibby, 1979: Multivariate Analysis. Probability and Mathematical Statistics. Academic Press, 512 pp.
Matheson, J. E., and R. L. Winkler, 1976: Scoring rules for continuous probability distributions. Manage. Sci., 22, 1087–1096, https://doi.org/10.1287/mnsc.22.10.1087.
Moran, P. A. P., 1970: Simulation and evaluation of complex water systems operation. Water Resour. Res., 6, 1737–1742, https://doi.org/10.1029/WR006i006p01737.
Murphy, A. H., and R. L. Winkler, 1987: A general framework for forecast verification. Mon. Wea. Rev., 115, 1330–1338, https://doi.org/10.1175/1520-0493(1987)115<1330:AGFFFV>2.0.CO;2.
Raftery, A. E., and S. Lewis, 1992: How many iterations in the Gibbs sampler? Bayesian Statistics 4, J. M. Bernardo et al., Eds., Oxford University Press, 763–773.
Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian Model Averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 1155–1174, https://doi.org/10.1175/MWR2906.1.
Reggiani, P., G. Coccia, and B. Mukhopadhyay, 2016: Predictive uncertainty estimation on a precipitation and temperature reanalysis ensemble for Shigar basin, Central Karakoram. Water, 8 (6), 263, https://doi.org/10.3390/w8060263.
Reggiani, P., A. Boyko, T. Rientjes, and A. Khan, 2019: Probabilistic precipitation analysis in the Central Indus River basin. Indus River Basin: Water Security and Sustainability, S. Khan and T. Adams, Eds., Elsevier, 485 pp.
Scheuerer, M., and T. M. Hamill, 2015: Statistical postprocessing of ensemble precipitation forecasts by fitting censored, shifted gamma distributions. Mon. Wea. Rev., 143, 4578–4596, https://doi.org/10.1175/MWR-D-15-0061.1.
Seo, D.-J., S. Perica, E. Welles, and J. Schaake, 2000: Simulation of precipitation fields from probabilistic quantitative precipitation forecast. J. Hydrol., 239, 203–229, https://doi.org/10.1016/S0022-1694(00)00345-0.
Sklar, A., 1959: Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Stat. Univ. Paris, 1, 229–231.
Sloughter, J. M. L., A. E. Raftery, T. Gneiting, and C. Fraley, 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon. Wea. Rev., 135, 3209–3220, https://doi.org/10.1175/MWR3441.1.
Sorensen, D. A., D. Gianola, and I. R. Korsgaard, 1998: Bayesian mixed-effects model analysis of a censored normal distribution with animal breeding applications. Acta Agric. Scand. Sec. A, Animal Sci., 48, 222–229, https://doi.org/10.1080/09064709809362424.
Tanner, M. A., and W. H. Wong, 1987: The calculation of posterior distributions by data augmentation. J. Amer. Stat. Assoc., 82, 528–540, https://doi.org/10.1080/01621459.1987.10478458.
Todini, E., 2008: A model conditional processor to assess predictive uncertainty in flood forecasting. Int. J. River Basin Manage., 6, 123–137, https://doi.org/10.1080/15715124.2008.9635342.
Todini, E., and M. Di Bacco, 1997: A combined Pólya process and mixture distribution approach to rainfall modelling. Hydrol. Earth Syst. Sci., 1, 367–378, https://doi.org/10.5194/hess-1-367-1997.
Todini, E., and F. Pellegrini, 1999: A maximum likelihood estimator for semi-variogram parameters in kriging. GeoENVII—Geostatistics for Environmental Applications, J. Gomez-Hernandez, A. Soares, and R. Froidevaux, Eds., Kluwer Academic Publishers, 187–198.
Todorovic, P., and V. Yevjevich, 1969: Stochastic processes of precipitation. Colorado State University Hydrology Paper 35, 61 pp.
Vrac, M., and P. Naveau, 2007: Stochastic downscaling of precipitations: From dry events to heavy rainfalls. Water Resour. Res., 43, W07402, https://doi.org/10.1029/2006WR005308.
Wang, C. S., D. E. Robertson, and D. Gianola, 2011: Multisite probabilistic forecasting of seasonal flows for streams with zero value occurrences. Water Resour. Res., 47, W02546, https://doi.org/10.1029/2010WR009333.
Waymire, E., and V. K. Gupta, 1981: The mathematical structure of rainfall representations: 1. A review of the stochastic rainfall models. Water Resour. Res., 17, 1261–1272, https://doi.org/10.1029/WR017i005p01261.
Wilks, D. S., 1990: Maximum likelihood estimation for the gamma distribution using data containing zeros. J. Climate, 3, 1495–1501, https://doi.org/10.1175/1520-0442(1990)003<1495:MLEFTG>2.0.CO;2.
Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences: An Introduction. International Geophysics Series, Vol. 59, Elsevier, 467 pp.
Woolhiser, D. A., and G. G. S. Pegram, 1979: Maximum likelihood estimation of Fourier coefficients to describe seasonal variations of parameters in stochastic daily precipitation models. J. Appl. Meteor., 18, 34–42, https://doi.org/10.1175/1520-0450(1979)018<0034:MLEOFC>2.0.CO;2.