1. Introduction
It is well known that when simple stochastic models are fitted to time series of daily precipitation amount, there is a marked tendency to underestimate the observed (or interannual) variance of monthly (or seasonal) total precipitation (Buishand 1978; Wilks 1989). In the statistics literature, this situation in which the observed variance exceeds that for the fitted model is termed “overdispersion” (e.g., Cox 1983). The explanation for the overdispersion phenomenon of precipitation, however, is not agreed upon. Some researchers view this discrepancy as evidence of an inadequate model for high-frequency variation of precipitation (Gregory et al. 1993). Others regard it as attributable to low-frequency variation that these models do not account for and, as such, as constituting a measure of the “potential predictability” of monthly total precipitation on an interannual timescale (Shea and Madden 1990; Shea et al. 1995; Singh and Kripalani 1986). If the first explanation were valid, however, the latter approach would overestimate the degree of potential predictability.
In the present paper, we examine the first explanation, identifying and eliminating the various sources of variance underestimation for one particular class of stochastic model for high-frequency variation of precipitation, known as a chain-dependent process (Katz 1977a). This model involves dividing the precipitation process into two component models, one for its occurrence and another for its intensity (i.e., amount of precipitation conditional on its occurrence). Limited extensions of such models, including higher-order Markov chains for the daily occurrence process and autocorrelation of intensities on consecutive wet days, will be considered. In the literature on stochastic modeling of precipitation, evidence exists in support of making these adjustments, but the specific focus has not been on how well the variance of monthly total precipitation is approximated. So the present approach should provide additional insight concerning the relationship between daily and monthly variation of precipitation.
In section 2, some properties of stochastic models for time series of daily precipitation amount are reviewed. Essential to the present study is the representation of monthly total precipitation as a “random sum,” enabling its variance to be decomposed into two terms, one of which involves the variance of the number of wet days. In section 3, these theoretical results are applied to a time series of daily precipitation amount in January at Chico, California. The same dataset was previously analyzed by Katz and Parlange (1993, 1996), in conjunction with a statistical downscaling study of the effect of large-scale atmospheric circulation on local precipitation. These studies raised the possibility that a more complex model for high-frequency variation of precipitation at Chico might reduce the extent of the overdispersion phenomenon. Through a combination of analytical expressions, computational algorithms, and simulation techniques, the variance of January total precipitation and related statistics are estimated for various extensions of a chain-dependent process. Some technical details are relegated to appendices. Finally, section 4 consists of a discussion.
2. Stochastic models
In this section, the probabilistic representation of total precipitation as a random sum is utilized. Loosely speaking, the basic idea is that monthly total precipitation consists of a sum of precipitation amounts contributed by individual storms. This representation enables the variance of total precipitation to be related to the various components of particular stochastic models for daily precipitation amount. For this reason, the approach has the advantage of applying more generally than just for the simplest form of chain-dependent process for precipitation (Katz 1977a).
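As an illustration of the random sum idea, the following minimal sketch checks the standard variance identity for a random sum, Var[S] = E[N]σ² + Var[N]μ², against a brute-force enumeration. It assumes i.i.d. summands that are independent of the count N. The function names and the toy distributions are hypothetical and are not taken from the paper.

```python
from itertools import product


def random_sum_variance(mean_n, var_n, mu, sigma2):
    """Var[S] = E[N] * sigma^2 + Var[N] * mu^2 for S = Z_1 + ... + Z_N."""
    return mean_n * sigma2 + var_n * mu ** 2


def brute_force_variance(pmf_n, pmf_z):
    """Exact Var[S] obtained by enumerating every outcome (N, Z_1, ..., Z_N)."""
    m1 = m2 = 0.0
    for n, pn in pmf_n.items():
        # product(..., repeat=0) yields one empty tuple, i.e. S = 0 when N = 0
        for combo in product(pmf_z.items(), repeat=n):
            p, s = pn, 0.0
            for z, pz in combo:
                p *= pz
                s += z
            m1 += p * s
            m2 += p * s * s
    return m2 - m1 ** 2


# Toy distributions (illustrative only): N in {0, 1, 2}, Z in {1, 3}
pmf_n = {0: 0.2, 1: 0.5, 2: 0.3}   # E[N] = 1.1, Var[N] = 0.49
pmf_z = {1.0: 0.5, 3.0: 0.5}       # mu = 2, sigma^2 = 1
print(random_sum_variance(1.1, 0.49, 2.0, 1.0), brute_force_variance(pmf_n, pmf_z))
```

The second term shows why an underestimated variance of the number of wet days carries over directly to the variance of the total.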
a. Random sum representation
b. Chain-dependent process
A chain-dependent process (Katz 1977a; Todorovic and Woolhiser 1975) has the desirable feature of requiring only a relatively small number of parameters, while still accounting for the most important statistical features of precipitation time series. Its simple structure enables the analytical determination of many of its properties, including the variance of monthly total precipitation (Katz 1977b). In particular, the random sum representation for total precipitation, introduced in section 2a, applies to this class of stochastic model.
As already defined in section 2a in conjunction with a general random sum, the daily precipitation intensities Zk for a chain-dependent process are assumed i.i.d. Nevertheless, it is convenient to introduce alternative notation, letting {Xt: t = 1, 2, . . .} denote the time series of daily precipitation amount (i.e., Xt assumes both zero and positive values). The equivalent assumption in terms of the intensities Xt > 0 (i.e., on days t for which Jt = 1) is that they are conditionally i.i.d., given the states of the Markov chain model for daily precipitation occurrence. In particular, the intensity mean and variance can be defined in terms of Xt as μ = E(Xt|Jt = 1) and σ² = Var(Xt|Jt = 1). The daily intensity has a positively skewed distribution, taken to be exponential by Todorovic and Woolhiser (1975), gamma by Katz (1977a), and based on a power transformation to normality by Katz and Parlange (1993).
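The structure just described can be sketched as a short simulation: a two-state first-order Markov chain decides occurrence, and on wet days an intensity is drawn independently from a gamma distribution. This is only a sketch under stated assumptions. The function name, the gamma parameterization, and all numerical values are illustrative and are not the fitted values from Tables 1 and 2.

```python
import random


def simulate_month(p01, p11, shape, scale, n_days=31, rng=None):
    """Simulate daily amounts X_t for one month of a chain-dependent process.

    p01, p11     : P(wet today | dry yesterday), P(wet today | wet yesterday)
    shape, scale : gamma parameters of the intensity (hypothetical values)
    """
    rng = rng or random.Random()
    # State of the day before the month, drawn from the stationary
    # wet-day probability pi = p01 / (1 - p11 + p01).
    pi = p01 / (1.0 - p11 + p01)
    wet = rng.random() < pi
    amounts = []
    for _ in range(n_days):
        wet = rng.random() < (p11 if wet else p01)
        # Intensities are i.i.d. given occurrence; dry days contribute zero.
        amounts.append(rng.gammavariate(shape, scale) if wet else 0.0)
    return amounts


january = simulate_month(0.2, 0.6, 0.8, 10.0, rng=random.Random(0))
print(sum(january), sum(x > 0 for x in january))  # monthly total, number of wet days
```

Repeating the call many times gives a sample of monthly totals whose variance can be compared with the observed interannual variance.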
c. Extensions
1) Higher-order Markov chains
Because of the state-space representation of a higher-order Markov chain as a first-order chain with vector states, this model is equivalent to a first-order chain with more than two states. Thus, if the stochastic model for daily precipitation amount described in section 2b is extended to incorporate a higher-order Markov chain, the more general theory of chain-dependent processes can be employed (Katz 1977b). Specifically, the expression (7) for the variance of a sum of a chain-dependent process is a special case of a formula that involves the inverse of a matrix, an approach taken by Klugman and Klugman (1981).
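The state-space device can be made concrete for a second-order, two-state chain: the pair (J_{t-1}, J_t) is treated as a single state of a four-state first-order chain. The sketch below builds the corresponding transition matrix. The function name and the numerical probabilities are hypothetical.

```python
def pair_transition_matrix(p_wet):
    """Embed a two-state second-order chain as a first-order chain on pairs.

    p_wet[(i, j)] = P(J_t = 1 | J_{t-2} = i, J_{t-1} = j).
    States of the embedded chain are the pairs (J_{t-1}, J_t).
    """
    states = [(0, 0), (0, 1), (1, 0), (1, 1)]
    matrix = [[0.0] * 4 for _ in states]
    for r, (i, j) in enumerate(states):
        for c, (j2, k) in enumerate(states):
            if j2 != j:
                continue  # consecutive pairs must agree on the shared day
            matrix[r][c] = p_wet[(i, j)] if k == 1 else 1.0 - p_wet[(i, j)]
    return states, matrix


# Illustrative probabilities only (not the Chico estimates)
p_wet = {(0, 0): 0.15, (1, 0): 0.25, (0, 1): 0.5, (1, 1): 0.6}
states, matrix = pair_transition_matrix(p_wet)
for state, row in zip(states, matrix):
    print(state, row)
```

Half of the entries are structurally zero, which is why the number of free parameters is 4 rather than 12 for this four-state chain.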
Alternatively, the random sum representation (1) can be exploited, again applying the decomposition (2) for the variance of monthly total precipitation. The approximate expression (5) for the variance of the number of wet days, Var[N(T)], is based on a first-order Markov chain and requires modification. Although no simple analog to (5) exists, the exact variance can be calculated via recursive methods. Appendix C gives an algorithm for determining the exact distribution of the number of wet days for a second-order Markov chain (8), a generalization of the method given in Katz (1974). It is anticipated that allowing for higher-order Markov chains would increase the estimated variance of the number of wet days, thus increasing the estimated variance of monthly total precipitation.
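A recursion of this general kind can be sketched as a dynamic program over the joint state (last two days, number of wet days so far). The code below is a generic illustration and should not be read as the algorithm of appendix C, which is not reproduced in this text. The function names, the treatment of the two days preceding the month through `init_pair`, and the test values are assumptions.

```python
PAIRS = [(0, 0), (0, 1), (1, 0), (1, 1)]


def wet_day_pmf(p_wet, init_pair, n_days):
    """Exact pmf of the number of wet days in n_days for a second-order chain.

    p_wet[(i, j)]     = P(J_t = 1 | J_{t-2} = i, J_{t-1} = j)
    init_pair[(i, j)] = probability that the two days before the period are (i, j)
    """
    # state[(i, j)][n] = P(last two days are (i, j) and n wet days so far)
    state = {pair: [init_pair.get(pair, 0.0)] + [0.0] * n_days for pair in PAIRS}
    for _ in range(n_days):
        nxt = {pair: [0.0] * (n_days + 1) for pair in PAIRS}
        for (i, j), probs in state.items():
            q = p_wet[(i, j)]
            for n, p in enumerate(probs):
                if p == 0.0:
                    continue
                nxt[(j, 0)][n] += p * (1.0 - q)   # dry day: count unchanged
                nxt[(j, 1)][n + 1] += p * q       # wet day: count increases by 1
        state = nxt
    return [sum(state[pair][n] for pair in PAIRS) for n in range(n_days + 1)]


def mean_var(pmf):
    """Mean and variance of a pmf on 0, 1, ..., len(pmf) - 1."""
    mean = sum(n * p for n, p in enumerate(pmf))
    return mean, sum(n * n * p for n, p in enumerate(pmf)) - mean ** 2


# Sanity check: with no dependence the count is binomial(31, 0.3)
pmf = wet_day_pmf({pair: 0.3 for pair in PAIRS}, {pair: 0.25 for pair in PAIRS}, 31)
print(mean_var(pmf))
```

A first-order chain is recovered by letting `p_wet[(i, j)]` depend on `j` only, so the same routine covers both orders.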
2) Autocorrelated intensities
Another, less common, extension of a chain-dependent process involves allowing the intensities within a given wet spell to be autocorrelated. Formally, the intensities are assumed to follow a first-order autoregressive [AR(1)] process with autocorrelation coefficient ϕ > 0. This process “randomly” terminates when the end of a wet spell is reached. In Katz and Parlange (1995), the AR(1) process is actually fitted to power-transformed intensities to allow for skewness as well. They used this approach in modeling hourly precipitation amount, a situation in which the autocorrelation of intensities is more apparent. Previous attempts to allow for dependence among intensities have relied on a multistate Markov chain, an approach that requires the estimation of a large number of transition probabilities (Gregory et al. 1993; Haan et al. 1976).
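The "randomly terminating" AR(1) structure can be sketched as follows. Values are generated on a transformed (Gaussian) scale and the process is restarted from its marginal distribution at the first day of each wet spell. The restart rule, the Gaussian innovations, and the function name are assumptions of this sketch, and the back-transformation to positive intensities is deliberately omitted.

```python
import random


def ar1_spell_values(occurrence, mu, sigma, phi, rng=None):
    """Gaussian AR(1) values on wet days, restarted at each new wet spell.

    occurrence : sequence of 0/1 flags (1 = wet day)
    Returns a list with a float on wet days and None on dry days.
    """
    rng = rng or random.Random()
    innov_sd = sigma * (1.0 - phi ** 2) ** 0.5    # keeps the marginal sd at sigma
    values, prev = [], None
    for wet in occurrence:
        if not wet:
            values.append(None)
            prev = None                            # spell ends, memory is lost
        elif prev is None:
            prev = rng.gauss(mu, sigma)            # first day of a wet spell
            values.append(prev)
        else:
            prev = mu + phi * (prev - mu) + rng.gauss(0.0, innov_sd)
            values.append(prev)
    return values


print(ar1_spell_values([1, 1, 0, 1, 1, 1, 0], 0.0, 1.0, 0.15, random.Random(0)))
```

Only one extra parameter, ϕ, is introduced, in contrast to the many transition probabilities needed by a multistate Markov chain.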
3) Nonidentical distributions
4) Combinations of extensions 1), 2), and 3)
Naturally, various combinations of these three types of extensions of a chain-dependent process could be applied simultaneously. As noted previously, the general theory for chain-dependent processes (Katz 1977b) already encompasses both extensions 1) and 3), and consequently their combination (i.e., higher-order Markov chain for occurrences and nonidentically distributed intensities). Although a closed-form expression exists for the variance of monthly total precipitation, it involves large state spaces and matrix inversion (Klugman and Klugman 1981). A theory that encompasses all three extensions has yet to be devised. Nevertheless, in appendix A it is shown how to derive the first- and second-order autocorrelation coefficients of daily precipitation amount for this general situation.
3. Results
A time series of daily precipitation amount in January for 78 yr (during the period 1907–88, with 4 yr eliminated because of missing observations) at Chico, California, is analyzed. Because of its long record, a reasonably reliable estimate of the interannual variance of January total precipitation can be obtained. Chico is situated near the west coast of the United States, a region where large-scale atmospheric circulation patterns have a dominant influence on local weather during the winter season. Thus, the precipitation process is expected to be relatively persistent, making this a stringent test for simple stochastic models. For these data, Katz and Parlange (1993) have already established that an ordinary chain-dependent process has a substantial degree of overdispersion.
a. Fitted models
The fitted models for the daily occurrences and intensities are presented separately. We reiterate that our focus is not on whether more complex models necessarily provide an improved fit (i.e., in terms of parameter estimates that are deemed “statistically significant”), but rather on the ability of these models to estimate the variance of January total precipitation and related statistics.
1) Occurrence process
Table 1 gives the estimated transition probabilities (based on the criterion of approximate maximum likelihood) for two-state Markov chains of orders 1–4 fit to the time series of daily precipitation occurrence in January at Chico. Because of the constraints on the transition probabilities (as noted in section 2b), only the conditional probability of a wet day is listed. To facilitate comparisons and because any lower-order chain can be viewed as a special case of a higher-order chain (as explained in section 2c), the estimates for all orders are presented in a form corresponding to a fourth-order chain. For each row in Table 1, the transition probability estimates would be constant (except for sampling errors) if the time series of precipitation occurrence were actually generated by a first-order chain. Some evidence is present in the table that dry spells exhibit a form of persistence that cannot be modeled by a first-order chain. For instance, the estimated conditional probability of a wet day decreases from about 0.211, given only that the previous day is dry, to about 0.158, given that the last 4 days are dry (these two probabilities would be identical for a first-order chain). The pattern is less clear for wet spells.
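Estimates of this type can be obtained as conditional relative frequencies, as in the minimal sketch below. It assumes that histories never span two different years, so the first `order` days of each January serve only as conditioning days. That assumption and the function name are illustrative and may differ from the procedure actually used for Table 1.

```python
from collections import defaultdict


def estimate_wet_probs(years, order):
    """Estimate P(wet | previous `order` days) by conditional relative frequency.

    years : list of 0/1 sequences, one per year (1 = wet day)
    Returns a dict mapping each observed history tuple to the estimate.
    """
    counts = defaultdict(lambda: [0, 0])           # history -> [n_dry, n_wet]
    for seq in years:
        for t in range(order, len(seq)):
            history = tuple(seq[t - order:t])
            counts[history][seq[t]] += 1
    return {h: n_wet / (n_dry + n_wet) for h, (n_dry, n_wet) in counts.items()}


# Tiny made-up record, for illustration only
print(estimate_wet_probs([[0, 0, 1, 1, 0, 1]], order=1))
```

With `order=4` there are 16 possible histories, which illustrates how quickly the number of parameters grows with the order of the chain.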
For a first-order Markov chain, the transition probability estimates can be converted [via (4)] into the corresponding estimates for the unconditional probability of a wet day and the persistence parameter,
2) Intensity process
Table 2 gives the estimated parameters for the conditional means, conditional standard deviations, and first-order autocorrelation coefficient of daily precipitation intensity in January at Chico. For convenience, these estimates are presented in a general form for nonidentical intensity distributions, recalling that the identically distributed case corresponds to the constraint that μ0 = μ1 and
The estimates of the first-order autocorrelation coefficient ϕ for daily intensity in Table 2 are relatively small positive values, 0.161 or 0.145, depending on whether or not the intensity distribution is assumed identical. More generally, the estimated autocorrelation is quite sensitive to any differences in the intensity distribution depending on the position within the wet spell (first vs second day of wet spell, etc.), a refinement of the model for nonidentical distributions described in section 2c. To circumvent this problem, an alternative method of estimating ϕ is also included in Table 2, predicated upon reproducing the sample first-order autocorrelation coefficient of daily precipitation amount. In the case of identically distributed intensities, this method involves substituting the estimates for μ and σ (along with π, d, and P11 for a first-order Markov chain) into the right-hand side of (9) with lag l = 1, equating this expression with the sample first-order autocorrelation coefficient for daily precipitation amount of 0.279, and then solving for ϕ. In the case of nonidentically distributed intensities, the same approach is taken, but now μi and
b. Overdispersion estimates
Through use of these fitted stochastic models for time series of daily precipitation amount, the variance of monthly total precipitation can be estimated. Because of their diagnostic capability, the variance of the number of wet days and the autocorrelation function of daily precipitation amount are also examined. To obtain estimates of these statistics, either an explicit formula, a computational algorithm, or stochastic simulation (based on the generation of 10000 yr of January daily precipitation amount) is employed. All of these approaches require numerical values for the parameters of the fitted models. We simply substitute the corresponding parameter estimates (given in section 3a), ignoring the sampling errors associated with those estimates. The large number of daily observations of precipitation (i.e., 2418 = 78 × 31) suggests that such uncertainties are relatively small.
1) Number of wet days
In view of the variance decomposition (2), we now focus on how well the observed standard deviation of the number of wet days in January at Chico is matched by the Markov chain model for the daily occurrence of precipitation. Table 3 includes the estimated standard deviation of the number of wet days for Markov chains of order 1–4 whose parameter estimates are given in Table 1. These estimated standard deviations were obtained through the computational algorithm outlined in appendix C.
For a first-order Markov chain, the standard deviation of the number of wet days is estimated as 3.76 days [the approximate expression (5) yields nearly the same value, 3.81], well below the observed value of 4.33 days, or an overdispersion of about 25% in terms of variance (Table 3). As the order of the chain is increased, this estimated standard deviation increases as well, being only slightly below the observed value for a third-order chain and slightly above for a fourth-order chain. This overestimate for a fourth-order chain most likely does not reflect real “underdispersion,” but rather just the sampling error in both the observed and model-estimated standard deviations.
Most importantly, the overdispersion phenomenon with respect to the number of wet days has been completely eliminated through the use of a higher-order chain. One drawback is that the number of transition probabilities required to be estimated increases at a rapid rate as the order is increased (e.g., 16 parameters for a fourth-order chain—see Table 1). So the possibility of overfitting is present. An alternative, not explored here, would be to fit a more parsimonious model for a higher-order chain in which certain constraints are placed among the parameters (Raftery 1985).
2) Autocorrelation
In view of the representations (A1) and (A2) of the variance of monthly total precipitation, we now focus on how well the autocorrelation function of daily precipitation amount is reproduced by the various forms of a stochastic model. The lag l = 1 and 2 day autocorrelation coefficients, calculated through use of (A4) and (A5), are included in Table 3. The ordinary form of the chain-dependent process has a marked tendency to underestimate the first-order autocorrelation coefficient of daily precipitation amount (i.e., 0.128 vs an observed value of 0.279 in Table 3). Permitting either nonidentically distributed or autocorrelated intensities increases this estimate somewhat, with their combination producing a value of 0.218, still well below the observed value. Necessarily, the “inflated” intensity autocorrelations do reproduce the desired value. Increasing the order of the Markov chain has no effect on the first-order autocorrelation coefficient, with the very slight numerical differences being attributable to the manner in which the lower-order probabilities are derived (appendix B).
Likewise, the second-order autocorrelation coefficient is underestimated by the ordinary form of the chain-dependent process (i.e., 0.046 vs 0.113 in Table 3). In this case, increasing the order of the Markov chain from one to two, as well as allowing for nonidentically distributed and autocorrelated intensities, all contribute to increases in this estimate, with their combination producing a value of 0.078. When the intensity autocorrelation is inflated as well, the largest value produced is 0.103 (for a fourth-order chain with identical distributions), still slightly below that observed. It is important to recognize that this approach of inflating the parameter ϕ is not constrained to reproduce the autocorrelation at lags l ≥ 2 days. This deficiency in estimating the autocorrelations was also found by Gregory et al. (1993) for precipitation in the United Kingdom.
Figure 1 shows the autocorrelation function up to lag l = 5 days for a subset of these stochastic models (i.e., Markov chain order restricted to first or fourth), with the lags l = 3, 4, and 5 days being estimated by stochastic simulation. For a first-order chain, it is evident that the higher-order autocorrelations (i.e., lags l ≥ 2) are substantially underestimated, regardless of whether the intensities are identically distributed and whether they are autocorrelated (Figs. 1a,b). On the other hand, not much underestimation is evident for a fourth-order chain, whatever the other model assumptions, provided the apparent increase of the observed fifth-order autocorrelation over the fourth is not regarded as real (Figs. 1c,d).
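For readers wishing to reproduce the observed or simulated autocorrelations, a sample estimate pooled over years might be computed as in the sketch below. It uses one common convention: lagged products are formed only within a year and are normalized by the total sum of squares. The estimator actually used for Table 3 and Fig. 1 is not specified here, so this choice and the function name are assumptions.

```python
def pooled_autocorrelation(years, lag):
    """Lag-`lag` sample autocorrelation of daily amounts, pooled over years.

    years : list of sequences of daily precipitation amount, one per year
    """
    n = sum(len(seq) for seq in years)
    mean = sum(sum(seq) for seq in years) / n
    denom = sum((x - mean) ** 2 for seq in years for x in seq)
    # Pairs (t, t + lag) are taken within a year only, never across years.
    numer = sum((seq[t] - mean) * (seq[t + lag] - mean)
                for seq in years for t in range(len(seq) - lag))
    return numer / denom


# Made-up amounts (mm), for illustration only
print(pooled_autocorrelation([[0.0, 5.2, 12.1, 0.0, 0.0], [3.3, 0.0, 0.0, 8.4, 1.0]], 1))
```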
3) Total precipitation
Table 3 also includes the estimated standard deviation of January total precipitation at Chico for the various forms of stochastic model, and the same numbers are displayed in Fig. 2. For identically distributed intensities without any autocorrelation, the estimates are obtained from (2) for any order Markov chain (using the calculation of the variance of the number of wet days). For a first-order chain with identically distributed, autocorrelated intensities, the estimates are obtained from (A1) and (9) [for nonidentically distributed, uncorrelated intensities from (A1) and (12)]. Otherwise, the estimates are based on stochastic simulation.
As anticipated from Katz and Parlange (1993), the estimated standard deviation for the ordinary form of chain-dependent process is well below that observed, 68.7 mm [the approximate expression (7) yields nearly the same value, 69.2 mm] as compared to 88.6 mm, or an overdispersion of about 40% in terms of variance (Table 3). The highest estimate is 80.6 mm (17% overdispersion) for a fourth-order chain with nonidentically distributed, autocorrelated intensities, or 83.9 mm (10% overdispersion) if inflated autocorrelation is permitted as well. Figure 2 illustrates that increasing the Markov chain order has the greatest effect on the estimated standard deviation, with the intensity autocorrelation having a lesser effect (roughly comparable if autocorrelations are allowed to be inflated), and with nonidentical distributions making the smallest contribution (but recall that inflated autocorrelations may well be a surrogate for a more complex form of nonidentically distributed intensities).
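The percentages quoted in this section are consistent with measuring overdispersion as the fraction of the observed variance not reproduced by the model, that is, one minus the ratio of model variance to observed variance. This reading is inferred from the reported numbers rather than from an explicit definition, and the function name below is hypothetical.

```python
def overdispersion(sd_model, sd_observed):
    """Fraction of the observed variance not reproduced by the model."""
    return 1.0 - (sd_model / sd_observed) ** 2


# Standard deviations quoted in the text (days for wet days, mm for totals)
cases = {
    "wet days, first-order chain": (3.76, 4.33),
    "total, ordinary chain-dependent process": (68.7, 88.6),
    "total, fourth-order, nonidentical, autocorrelated": (80.6, 88.6),
    "total, same with inflated autocorrelation": (83.9, 88.6),
}
for label, (sd_model, sd_obs) in cases.items():
    print(f"{label}: {100 * overdispersion(sd_model, sd_obs):.0f}%")
```

Under this definition the four cases give about 25%, 40%, 17%, and 10%, matching the values stated in the text.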
In any event, it is evident that the extent of overdispersion can be greatly reduced, if not eliminated, through use of a more complex form of stochastic model for high-frequency variation of precipitation. We note that Klugman and Klugman (1981) also found that the estimated standard deviation of seasonal total precipitation at a site in Oregon is sensitive to the assumed form of model. The question remains whether even more complex models than those considered here could completely eliminate this overdispersion.
4. Discussion
It has been established that much of the overdispersion for January total precipitation at Chico, California, could be attributable to an inadequate stochastic model for high-frequency variation of precipitation. A higher-order Markov chain model for daily precipitation occurrence completely eliminates one source of overdispersion, the number of wet days. The allowance for autocorrelated and nonidentically distributed intensities also contributes to this reduction in overdispersion. Although the appropriate form of stochastic model for daily precipitation at other locations might well differ from that for Chico, it is anticipated that the estimated variance of monthly total precipitation would likewise be sensitive to the assumed form of model.
Could the overdispersion for monthly total precipitation be further reduced? Results obtained through relating daily precipitation statistics to large-scale atmospheric circulation shed some light on this question. Katz and Parlange (1993) fit the simplest version of chain-dependent processes conditionally, given an index of large-scale atmospheric circulation, to the same daily precipitation data for Chico in January. When these conditional models are combined into a single overall “induced” model, the overdispersion of January total precipitation is reduced to about 4%, smaller yet than the reductions obtained in the present paper (i.e., 10% or 17% for best models). Of interest is the fact that the induced model completely eliminates the overdispersion in the number of wet days, in agreement with the result obtained here (Katz and Parlange 1996).
The implications of the present work for estimating potential predictability remain to be explored. Although the two approaches are not equivalent, the induced model resembles in some respects a single, more complex stochastic model that could have been directly fitted to the data (Katz and Parlange 1996). Future work will seek ways in which these two approaches could be unified. One possibility would involve so-called hidden Markov models and their generalizations (Guttorp 1995, chap. 2). These models involve a hidden state, like an index of atmospheric circulation but unobserved. The variance in monthly total precipitation associated with the hidden states could perhaps be construed as an estimate of potential predictability.
Acknowledgments
Research was partially supported by NSF Grant DMS-9312686 to the NCAR Geophysical Statistics Project. M. B. Parlange received support from NCAR’s Environmental and Societal Impacts Group and performed a portion of this research at the University of California, Davis.
REFERENCES
Brockwell, P. J., and R. A. Davis, 1991: Time Series: Theory and Methods. 2d ed. Springer-Verlag, 577 pp.
Buishand, T. A., 1978: Some remarks on the use of daily rainfall models. J. Hydrol., 36, 295–308.
Chin, E. H., 1977: Modeling daily precipitation occurrence process with Markov chain. Water Resour. Res., 13, 949–956.
——, and J. F. Miller, 1980: On the conditional distribution of daily precipitation amounts. Mon. Wea. Rev., 108, 1462–1464.
Cox, D. R., 1983: Some remarks on overdispersion. Biometrika, 70, 269–274.
Feller, W., 1968: An Introduction to Probability Theory and Its Applications. Vol. I. 3d ed. Wiley, 509 pp.
Gabriel, K. R., 1959: The distribution of the number of successes in a sequence of dependent trials. Biometrika, 46, 454–460.
Gates, P., and H. Tong, 1976: On Markov chain modeling to some weather data. J. Appl. Meteor., 15, 1145–1151.
Gregory, J. M., T. M. L. Wigley, and P. D. Jones, 1993: Application of Markov models to area-average daily precipitation series and interannual variability in seasonal totals. Climate Dyn., 8, 299–310.
Guttorp, P., 1995: Stochastic Modeling of Scientific Data. Chapman and Hall, 372 pp.
Haan, C. T., D. M. Allen, and J. O. Street, 1976: A Markov chain model of daily rainfall. Water Resour. Res., 12, 443–449.
Katz, R. W., 1974: Computing probabilities associated with the Markov chain model for precipitation. J. Appl. Meteor., 13, 953–954.
——, 1977a: Precipitation as a chain-dependent process. J. Appl. Meteor., 16, 671–676.
——, 1977b: An application of chain-dependent processes to meteorology. J. Appl. Probability, 14, 598–603.
——, and M. B. Parlange, 1993: Effects of an index of atmospheric circulation on stochastic properties of precipitation. Water Resour. Res., 29, 2335–2344.
——, and ——, 1995: Generalizations of chain-dependent processes: Application to hourly precipitation. Water Resour. Res., 31, 1331–1341.
——, and ——, 1996: Mixtures of stochastic processes: Application to statistical downscaling. Climate Res., 7, 185–193.
Klugman, M. R., and S. A. Klugman, 1981: A method for determining change in precipitation data. J. Appl. Meteor., 20, 1506–1509.
Lindgren, B. W., 1968: Statistical Theory. 2d ed. Macmillan, 521 pp.
Raftery, A. E., 1985: A model for higher-order Markov chains. J. Roy. Stat. Soc., Ser. B, 47, 528–539.
Shea, D. J., and R. A. Madden, 1990: Potential for long-range prediction of monthly mean surface temperatures over North America. J. Climate, 3, 1444–1451.
——, N. A. Sontakke, R. A. Madden, and R. W. Katz, 1995: The potential for long-range prediction over India for the southwest monsoon season: An analysis of variance approach. Preprints, Sixth Int. Meeting on Statistical Climatology, Galway, Ireland, University College, 475–477.
Singh, S. V., and R. H. Kripalani, 1986: Potential predictability of lower-tropospheric monsoon circulation and rainfall over India. Mon. Wea. Rev., 114, 758–763.
Todorovic, P., and D. A. Woolhiser, 1975: A stochastic model of n-day precipitation. J. Appl. Meteor., 14, 17–24.
Wilks, D. S., 1989: Conditioning stochastic daily precipitation models on total monthly precipitation. Water Resour. Res., 25, 1429–1439.
APPENDIX A
Autocorrelation Function of Generalized Chain-Dependent Process
APPENDIX B
Relationships among Parameters
Second- versus first-order Markov chain
Nonidentically versus identically distributed intensities
APPENDIX C
Distribution of Number of Wet Days for Second-Order Markov Chain
Table 1. Transition probability estimates for Markov chains of various orders fit to 78 yr of time series of daily precipitation occurrence in January at Chico, California.
Table 2. Parameter estimates for various forms of model fit to daily precipitation intensities in January at Chico, California (787 wet days).
Table 3. Overdispersion estimates and related statistics for January precipitation at Chico, California, based on daily stochastic models (parameter estimates in Tables 1 and 2), along with corresponding observed values.