## Abstract

Recently, a single-site stochastic precipitation model called “the mixture of generalized chain-dependent processes conditioned on a climate variable” was developed. The model can effectively eliminate overdispersion—that is, underestimation in variance of seasonal precipitation total. In this paper, the single-site model is further developed into a multisite stochastic precipitation model by driving a collection of individual single-site models, but with spatial dependence following a method proposed by D. S. Wilks. Specifically, a computationally effective algorithm for estimating the spatial dependence of precipitation occurrence is developed to replace the construction of the empirical curves in the Wilks method. An effective and straightforward approach for correcting the bias of the spatial correlation of precipitation intensity is also proposed. This model is tested on a small network of sites from a significant hydroelectric power generation region of South Island, New Zealand.

## 1. Introduction

In 2003, the National Institute of Water and Atmospheric Research (NIWA) of New Zealand launched a project called “Climate-Related Risks for Energy Supply and Demand” (Zheng and Thompson 2007). In this project, daily precipitation at four sites in the upper Waitaki catchment, New Zealand, was simulated under several climate scenarios. The outflows from the catchment were simulated using a hydrological catchment model forced by the simulated precipitation. Since the outflows dominate the electricity generation capacity of the hydropower stations near the catchment, the climate-related risks for the hydroenergy supply could be accessed.

An outcome of this project was the development of a single-site stochastic precipitation model called “the mixture of generalized chain-dependent processes conditioned on a climate variable” (Zheng and Katz 2008a; Zheng and Thompson 2007). Unlike other existing stochastic precipitation models, this model can effectively eliminate overdispersion (i.e., underestimation of the variance in seasonal precipitation totals; Katz and Zheng 1999). However, the capacity of hydroelectric power is dominated by the seasonal precipitation total at the catchment scale. Correctly simulating the distribution of the seasonal precipitation total is most important in this project. In this paper, we will demonstrate that, without introducing spatial dependence of daily precipitation, the overdispersion at the catchment scale still exists, despite the fact that overdispersion for every single site is eliminated. In this case, the risk for energy supply related to wet or dry seasons could still be underestimated.

The literature on multisite precipitation models is rather limited (e.g., Wilks 1998; Qian et al. 2002; Thompson et al. 2007; Brissette et al. 2007), with the majority of the work based on the method proposed by Wilks (1998). Wilks’ approach demands significant effort to construct the empirical curves corresponding to all possible station pairs. In the present paper, we propose an alternative algorithm for the estimation of the spatial dependence of precipitation occurrence and intensity. This algorithm is faster and easier to implement.

The paper is arranged as follows. First the proposed estimation method for the spatial dependence of daily precipitation is introduced. Then the results of a simulation study of long-term daily precipitation in the upper Waitaki catchment, New Zealand, using the proposed multisite precipitation model are described. Finally, discussions and conclusions are presented.

## 2. Methodology

In this section, we briefly introduce a multisite daily precipitation model called the mixture of generalized chain-dependent processes conditional on a climate variable at multisites.

### a. Generalized chain-dependent processes conditioned on a climate variable

A “generalized chain-dependent processes conditioned on a climate variable” is a stochastic model for single-site daily precipitation. The model can be summarized as follows [see Zheng and Katz (2008a) for the more detail]:

Let *X _{y}* be a climate variable in year

*y*; for example, a seasonal Interdecadal Pacific Oscillation (IPO) index whose state remains constant in a year

*y*over the time period of

*T*days (i.e., a season). Let

*J*,

_{yt}*t*= 1, … ,

*T*denote the sequence of indicators of daily precipitation occurrence in year

*y*(i.e.,

*J*= 1 indicates a “wet day” and

_{yt}*J*= 0 a “dry day”). It is assumed that this process is a first-order Markov chain whose transition probability is related to

_{yt}*X*by

_{y}where “Pr” is being used as a generic “probability function” (i.e., both for discrete probabilities and for density functions) (Chandler and Wheater 2002).

Let *R* = (*R _{yt}*,

*t*= 1, … ,

*T*) denote the time series of daily precipitation amounts in year

*y*. The “intensity”

*R*> 0 (i.e., days for which

_{yt}*J*= 1) is taken to be conditionally independent and identically distributed. It is further assumed that daily precipitation intensity has a power-transform distribution. That is,

_{yt}where *ϕ*(*x*; *μ*, *σ*) denotes the Gaussian density function with mean *μ* and standard deviation *σ*. The values of *q* = ½, ⅓, or ¼ are commonly employed to account for the high degree of positive skewness in the distribution of daily precipitation amounts (Katz and Zheng 1999).

### b. Mixture of generalized chain-dependent processes conditioned on a climate variable

Let *I _{y}*(

*I*=

_{y}*i*,

*i*= 0, 1) denote a two-state index—usually hidden, unobserved, or only partially observed—that remains constant over the time period of

*T*days. The annual time series of index states is assumed to be identically distributed with respect to year

*y*, with a common distribution

For a given year, the time series of daily precipitation amounts (*R _{yt}*,

*t*= 1, … ,

*T*) is called a mixture of generalized chain-dependent processes conditional on

*X*; if for given

_{y}*I*=

_{y}*i*, it is a generalized chain-dependent process conditioned on

*X*with parameters

_{y}*w*,

_{i}α_{0},

_{i}α_{1},

_{i}β_{0},

_{i}β_{1},

*,*

_{i}μ*, and*

_{i}σ*q*(the left subscript is for the state

*I*=

_{y}*i*). Specifically, similar to Eqs. (1) and (2),

The climate variable *X _{y}* is assumed to be a mixture of two Gaussian random variables with mean

*that depends on*

_{i}m*I*=

_{y}*i*and a common standard deviation

*s.*Specifically,

(Zheng and Katz 2008a). Under this assumption, parameters listed in Table 1 can be estimated using the expectation maximization (EM) algorithm (McLachlan and Krishnan 1997; Zheng and Katz 2008a). Moreover, for simulating daily precipitation, we need to know the probability of *I _{y}* conditioned on

*X*. This can be derived by Bayes’ Theorem and Eqs. (3), (5), and (6),

_{y}Given a parameter set listed in Table 1 and an initial occurrence state *J*_{y0}, we can generate by a Monte Carlo simulation a daily precipitation time series from a mixture of chain-dependent processes conditioned on *X _{y}*. The procedure is as follows (Zheng and Katz 2008a):

(i) Generate statistically independent uniform random variables [over the range (0, 1)]

*U*, and standard Gaussian random variables (_{y}*W*,_{yt}*t*= 1, … ,*T*) and (*Z*;_{yt}*t*= 1, … ,*T*).- (iv) Transform the standard Gaussian variable
*Z*into the general Gaussian variable with mean_{yt}and standard deviation_{Iy}μto get the power-transformed daily precipitation amount_{Iy}σIf is negative, truncate it to zero.

### c. Mixture of generalized chain-dependent processes at multisites

The mixture of generalized chain-dependent processes conditioned on *X _{y}* at multisites [

*R*

_{y1}(

*m*), … ,

*R*(

_{yT}*m*)] can be simply represented by extending notions of mixture of generalized chain-dependent processes to a higher dimension to have additional parameters for every site (listed in Table 1) and to add spatial correlations to the two Gaussian fields. Here [

*R*

_{y1}(

*m*), … ,

*R*(

_{yT}*m*)] can be simulated from a uniform random variable

*U*that is common for all sites, and temporally independent but spatially correlated standard Gaussian fields [

_{y}*W*(

_{yt}*m*),

*t*= 1, … ,

*T*] and [

*Z*(

_{yt}*m*);

*t*= 1, … ,

*T*], following the steps (ii)–(iv) documented in section 2b.

To estimate the spatial correlation between *W _{yt}*(

*m*) and

*W*(

_{yt}*n*) [denoted by

*ω*(

*m*,

*n*)], we represent the modeled correlation between

*J*(

_{yt}*m*) and

*J*(

_{yt}*n*) [denoted by

*C*(

_{y}^{J}*m*,

*n*)] as a function of

*ω*(

*m*,

*n*), then we adjust

*ω*(

*m*,

*n*) to make

*C*(

_{y}^{J}*m*,

*n*) close to the observed correlation, following the idea of Wilks (1998). However, Wilks’ approach demands significant effort to simulate the relation between

*C*(

_{y}^{J}*m*,

*n*) and

*ω*(

*m*,

*n*). Here, we propose an alternative algorithm with much less computational cost.

By conditional probability formula, the modeled correlation can be expressed as

where by Eq. (8) and the fact that *X _{y}* is common for all stations, we have

is the invariant measure of the Markov chain (*J _{yt}*(

*m*),

*t*= 1, … ,

*T*) conditioned on

*I*(

_{y}*m*) =

*i*(Feller 1971), and

_{i,i′}

*π*(

_{y}*m*,

*n*) is the invariance measure of the bivariate Markov chain {[

*J*(

_{yt}*m*),

*J*(

_{yt}*n*)],

*t*= 1, … ,

*T*} conditioned on

*I*(

_{y}*m*) =

*i*and

*I*(

_{y}*j*) =

*j*. The quantity

_{i,i′}

*π*(

_{y}*m*,

*n*) can be uniquely determined by the parameter

*ω*(

*m*,

*n*) [a computationally effective algorithm for deriving

_{ii′}

*π*(

_{y}*m*,

*n*) given

*ω*(

*m*,

*n*) is documented in the appendix]. Therefore,

*ω*(

*m*,

*n*) can be estimated by minimizing the mean square error (over all years) between modeled correlations [Eq. (11)] and observed correlations.

The spatial correlation *ζ*(*m*, *n*) is initially estimated as

where *C _{y}^{r}*(

*m*,

*n*) is the spatial correlation of observed power-transformed precipitation intensity between the site pair

*m*and

*n*in year

*y*. However, because of the power transformation, it is likely that the spatial correlations of simulated precipitation intensity using

*ζ̂*(

*m*,

*n*) are biased (Zheng and Katz 2008b). To correct the bias, the

*ζ̂*(

*m*,

*n*) estimated by Eq. (14) is adjusted by applying a factor that is the ratio of the observed spatial correlation of precipitation intensity to the simulated spatial correlation of precipitation intensity using

*ζ̂*(

*m*,

*n*) estimated by Eq. (17).

## 3. Data and results

The upper Waitaki catchment (Fig. 1) is the one of the most important hydroelectric power generation catchments in New Zealand. It is crucial for New Zealand to understand the variability of the outflow from the three lakes (Lake Pukaki, Lake Tekapo, and Lake Ohau; see Fig. 1) in the upper Waitaki catchment and to estimate this variability for the next two to three decades. To this end, a climate variable is needed that is both predictable and significantly associated with rainfall on a decadal time scale. Fortunately, the IPO may be such a climate variable. The IPO has significant impacts on rainfall and river flows in certain regions of New Zealand. In the west and south of South Island, the negative IPO phase is generally associated with lower rainfall and inflows, and vice versa for the positive IPO phase (Salinger et al. 2001; Zheng and Thompson 2007). An aspect of this NIWA project is to assess how precipitation in the upper Waitaki catchment, on interannual time scales, is linked to the IPO, and in particular the correlation between the IPO and the seasonal rainfall total in the upper Waitaki catchment.

There are only four precipitation stations near the catchment—Mt. Cook, Lake Ohau, Lake Tekapo, and Franz Josef (see Fig. 1 for the locations)—with long record lengths. A common period to all four sites was 1953–2000. A wet day, in the context of this study, occurs when at least 1 mm of precipitation was recorded by the rain gauge; otherwise, the day is treated as dry. Only the summer season (December–February, with the seasonal length *T* = 90) is analyzed in this study, because generally there is sufficient precipitation for other seasons in the upper Waitaki catchment. Power-transform values for *q* of ⅓, ⅖, ½, and are adopted for Lake Tekapo, Lake Ohau, Mt. Cook, and Franz Josef, respectively. These values of *q* were empirically chosen to make the simulated seasonal mean totals similar to the observed (Zheng and Thompson 2007; Zheng and Katz 2008a). This is because the seasonal variability of the precipitation is related to both daily and decadal variability, so correctly simulating the seasonal mean precipitation at each site is most essential. Zheng and Thompson (2007) showed that the selected *q* values can simulate reasonable daily, seasonal, annual, and decadal variability of the precipitation in the upper Waitaki catchment.

The IPO index is provided by the Hadley Centre from the Met Office. It is derived from the 3^{rd} empirical orthogonal function (EOF) pattern of 13-year low-pass-filtered global SST (see Fig. 2 in Zheng and Thompson 2007). There have been three major phases during the twentieth century: positive phases from 1920 to 1945 and 1978–99, and a negative phase between 1946 and 1977. The IPO appears to modulate the frequency and intensity of the El Niño–Southern Oscillation (ENSO) phenomenon: during the last positive phase of the IPO, there were more El Niño episodes and fewer La Niña ones than during the preceding negative IPO phase (e.g., Salinger et al. 2001). The climate variable *X _{y}* is the observed mean IPO index over the austral summer season [December–February (DJF)].

As described in section 2, parameters for each site are estimated by fitting the single-site mixture of generalized chain-dependent processes conditioned on the IPO index to each site’s data. All the estimated parameters are listed in Table 2. The methodology of section 2c is used to estimate the spatial dependence of precipitation occurrence and precipitation intensity. These estimates are listed in Table 3. For each site pair *m* and *n*, *ω̂*(*m*, *n*) can be estimated within 10 s by a standard PC with a 3.2-GHz P4 processor and 2 Gb of RAM. The computational cost for estimation of *ζ̂*(*m*, *n*) is negligible.

Based on the fitted parameters and the observed seasonal IPO index, 100 independent simulations of the DJF daily precipitation over a 47-year period are generated using Eqs. (8)–(10).

The observed and simulated correlations between daily precipitation occurrences at all site pairs are shown in Table 4. It shows that the correlations simulated are virtually identical to the correlations observed. This indicates that for this dataset, the proposed estimation procedure works well.

The observed and simulated correlations between daily precipitation intensity at all site pairs are shown in Table 5. It is clear that, without the factor adjustment for *ζ̂*(*m*, *n*) in Eq. (17), the simulated correlations are all underestimated. However, after *ζ̂*(*m*, *n*) is bias corrected, the remaining bias in the simulated correlations is negligible.

A major advantage of the single-site mixture of generalized chain-dependent processes conditioned on a climate variable over single-site chain-dependent processes is in the better modeling of the interannual and decadal variability (Zheng and Katz 2008a). Zheng and Katz (2008a) showed that the variance of seasonal mean precipitation and the correlation between the seasonal mean precipitation and the climate variable can be well captured by such a mixture model. In this study, we further investigate simulated seasonal regional precipitation totals because they are closely related to the capacity of hydroelectric power generation in each year and are therefore crucial in estimating the climate-related risk of energy supply.

The qqplot of the seasonal regional precipitation totals simulated by the mixture of the generalized chain-dependent processes at multisites against those observed is shown in the left panel of Fig. 2. It shows that the distributions of the simulated and observed seasonal precipitation total are similar since they lie on or very close to the one-to-one line in the figure.

## 4. Discussion and conclusions

In this paper, the mixture of generalized chain-dependent processes conditioned on a climate variable at multisites has been proposed for simulating daily precipitations in order to capture from daily-to-decadal precipitation data, variability at the catchment scale. The proposed model is similar in concept to the Wilks (1998) precipitation model. But in the estimation of spatial dependence of the daily precipitation occurrence and intensity, the construction of the empirical curves used in the Wilks method have been replaced by a computationally effective optimization algorithm.

In this study, the distribution of transformed precipitation intensity is assumed to be power-transformed Gaussian. An advantage of the power-transformed Gaussian assumption is that when the mixture of statistical processes is introduced, the overdispersion (i.e., underestimation of the variance of seasonal precipitation total) can be effectively eliminated. However, this is not necessarily the case for other distributions (Zheng and Katz 2008a), such as the gamma or mixed exponential. In the present paper, we have shown that the spatial dependence of precipitation intensity can be effectively modeled under the power-transformed Gaussian assumption. Moreover, the estimation of the spatial correlation of precipitation intensity is very simple and the bias arising from the power transformation can be easily corrected.

To investigate why the distribution of seasonal regional precipitation total is well simulated by the proposed model, we further consider the spatially independent mixture of generalized chain-dependent processes [i.e., *ω*(*m*, *n*) = *ξ*(*m*, *n*) = 0]. The qqplots of the alternative model is shown in the right panel of Fig. 2. The model tends to overestimate the smaller precipitation totals but underestimate the larger precipitation totals. This suggests that if the spatial correlation is not introduced, it would underestimate the risks to hydroelectric supply associated with wet and dry seasons.

A reason for the spatially independent mixture of generalized chain-dependent processes underestimating the risks associated with wet and dry seasons may be as follows. Since the four sites are close, we expect (Tables 4 and 5) that the spatial dependence of the precipitation between them is positive and significant. If one site has a low precipitation total in any given summer season, then there is a high likelihood, through its spatial dependency, for the other sites to also have low precipitation totals. However, as the model has no spatial dependencies, the probability of the simulated precipitation at all four sites to have above- or below-average precipitation in any single season is actually smaller. Therefore, the likelihood of large or small regional totals in any single season is also smaller.

In conclusion, the stochastic model proposed in the study has simulated the spatial correlations of daily precipitation occurrence and intensity quite well, and has also correctly simulated the distribution of the total. This is particularly vital for regional water budget, and for any risk analysis of hydroelectric generation potential during wet and dry years.

## Acknowledgments

This work was supported by National Program on Key Basic Research Project of China (Grant 2010CB951604), National High-Tech R&D Program of China (Grant 2009AA122104), the National Natural Science Foundation of China General Program (Grant 40975062), and the New Zealand Foundation for Research, Science, and Technology (Contract C01X0302). We thank Dr. P. J. Thomson for useful comments, and the Hadley Centre UKMO for IPO time series.

## REFERENCES

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

### APPENDIX

#### Why ii′πy(m, n) Is Uniquely Determined by ω(m, n)

The probability of [*J _{yt}*(

*m*) =

*j*,

*J*(

_{yt}*n*) =

*j*′] conditioned on [

*J*

_{y(t−1)}(

*m*) =

*k*,

*J*

_{y(t−1)}(

*n*) =

*k*′;

*I*(

_{y}*m*) =

*i*,

*I*(

_{y}*n*) =

*i*′;

*X*] is denoted by

_{y}_{ii′}

*P*

_{kk′jj′}(

*m*,

*n*) and the corresponding transition probability matrix is denoted by 𝗣. Given [

*I*(

_{y}*m*) =

*i*,

*I*(

_{y}*n*) =

*i*′;

*X*], [

_{y}*J*(

_{yt}*m*),

*J*(

_{yt}*n*),

*t*= 1, … ,

*T*] is a bivariate Markov chain. By Eq. (8), its transition probabilities are

where * _{i}P_{kj}*(

*m*) denotes the probability of the event [

*J*(

_{yt}*m*) =

*j*] conditioned on [

*J*

_{y(t−1)}(

*m*) =

*k*;

*I*(

_{y}*m*) =

*i*;

*X*]. The function for calculating multivariate Gaussian probability distribution [i.e., Φ in Eq. (A1)] is available, for example, the function “dmvnorm” in the library “mvtnorm” of the statistical package “R” (http://cran.r-project.org). So

_{y}_{ii′}

*P*

_{kk′ll}(

*m*,

*n*) can be quickly computed.

By the ergodic theory of Markov chains (e.g., Feller 1971), the invariant probability measure of the transition probability matrix 𝗣 is the last row of 𝗔^{T}(𝗔𝗔^{T})^{−1}; where the partitioned matrix 𝗔 = [𝗜 − 𝗣, **1**], 𝗜 is the identity matrix and all elements of column vector **1** are 1. Since 𝗣 is uniquely determined by *ω*(*m*, *n*), so it is an invariant measure. Finally, since _{ii′}*π*(*m*, *n*) is the value of the binary invariant at the binary state *J _{yt}*(

*m*) = 1,

*J*(

_{yt}*n*) = 1, it is also uniquely determined by

*ω*(

*m*,

*n*).

## Footnotes

*Corresponding author address:* Xiaogu Zheng, College of Global Change and Earth System Science, Beijing Normal University, Beijing 100875, China. Email: x.zheng@bnu.edu.cn

This article included in the State of the Science of Precipitation special collection.