  • Albert, J. H., and S. Chib, 1993: Bayesian analysis of binary and polychotomous response data. J. Amer. Stat. Assoc., 88, 669–679, doi:10.1080/01621459.1993.10476321.

  • Ambrosino, C., R. E. Chandler, and M. C. Todd, 2014: Rainfall-derived growing season characteristics for agricultural impact assessments in South Africa. Theor. Appl. Climatol., 115, 411–426, doi:10.1007/s00704-013-0896-y.

  • Bellone, E., J. P. Hughes, and P. Guttorp, 2000: A hidden Markov model for downscaling synoptic atmospheric patterns to precipitation amounts. Climate Res., 15, 1–12, doi:10.3354/cr015001.

  • Charles, S. P., B. C. Bates, and J. P. Hughes, 1999: A spatiotemporal model for downscaling precipitation occurrence and amounts. J. Geophys. Res., 104, 31 657–31 669, doi:10.1029/1999JD900119.

  • Charney, J. G., and J. Shukla, 1981: Predictability of monsoons. Monsoon Dynamics, J. Lighthill and R. Pearce, Eds., Cambridge University Press, 99–109.

  • Cox, D. R., 1971: The Analysis of Binary Data. Methuen, 142 pp.

  • DelSole, T., and X. Feng, 2013: The “Shukla–Gutzler” method for estimating potential seasonal predictability. Mon. Wea. Rev., 141, 822–831, doi:10.1175/MWR-D-12-00007.1.

  • Forney, G. D., Jr., 1973: The Viterbi algorithm. Proc. IEEE, 61, 268–278, doi:10.1109/PROC.1973.9030.

  • Furrer, E., and R. Katz, 2007: Generalized linear modeling approach to stochastic weather generators. Climate Res., 34, 129–144, doi:10.3354/cr034129.

  • Ghil, M., and S. Childress, 1987: Topics in Geophysical Fluid Dynamics: Atmospheric Dynamics, Dynamo Theory and Climate Dynamics. Springer-Verlag, 512 pp.

  • Greene, A. M., A. W. Robertson, and S. Kirshner, 2008: Analysis of Indian monsoon daily rainfall on subseasonal to multidecadal time-scales using a hidden Markov model. Quart. J. Roy. Meteor. Soc., 134, 875–887, doi:10.1002/qj.254.

  • Greene, A. M., A. W. Robertson, P. Smyth, and S. Triglia, 2011: Downscaling projections of the Indian monsoon rainfall using a non-homogeneous hidden Markov model. Quart. J. Roy. Meteor. Soc., 137B, 347–359, doi:10.1002/qj.788.

  • Hansen, J. W., A. Challinor, A. Ines, T. Wheeler, and V. Moron, 2006: Translating climate forecasts into agricultural terms: Advances and challenges. Climate Res., 33, 27–41, doi:10.3354/cr033027.

  • Hay, L. E., G. J. McCabe Jr., D. M. Wolock, and M. A. Ayers, 1991: Simulation of precipitation by weather type analysis. Water Resour. Res., 27, 493–501, doi:10.1029/90WR02650.

  • Hughes, J. P., and P. Guttorp, 1994a: A class of stochastic models for relating synoptic atmospheric patterns to regional hydrologic phenomena. Water Resour. Res., 30, 1535–1546, doi:10.1029/93WR02983.

  • Hughes, J. P., and P. Guttorp, 1994b: Incorporating spatial dependence and atmospheric data in a model of precipitation. J. Appl. Meteor., 33, 1503–1515, doi:10.1175/1520-0450(1994)033<1503:ISDAAD>2.0.CO;2.

  • Hughes, J. P., P. Guttorp, and S. P. Charles, 1999: A non-homogeneous hidden Markov model for precipitation occurrence. J. Roy. Stat. Soc., 48C, 15–30, doi:10.1111/1467-9876.00136.

  • Immerzeel, W. W., L. P. H. van Beek, and M. F. P. Bierkens, 2010: Climate change will affect the Asian water towers. Science, 328, 1382–1385, doi:10.1126/science.1183188.

  • Johnson, G., C. Hanson, S. Hardegree, and E. Ballard, 1996: Stochastic weather simulation: Overview and analysis of two commonly used models. J. Appl. Meteor., 35, 1878–1896, doi:10.1175/1520-0450(1996)035<1878:SWSOAA>2.0.CO;2.

  • Joshi, V., and M. Rajeevan, 2006: Trend in precipitation extremes over India. NCC Research Rep. 3/2006, 25 pp.

  • Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471, doi:10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2.

  • Katz, R., and M. Glantz, 1986: Anatomy of a rainfall index. Mon. Wea. Rev., 114, 764–771, doi:10.1175/1520-0493(1986)114<0764:AOARI>2.0.CO;2.

  • Kenabatho, P. K., N. R. McIntyre, R. E. Chandler, and H. S. Wheater, 2012: Stochastic simulation of rainfall in the semi-arid Limpopo basin, Botswana. Int. J. Climatol., 32, 1113–1127, doi:10.1002/joc.2323.

  • Kirshner, S., P. Smyth, and A. W. Robertson, 2004: Conditional Chow–Liu tree structures for modeling discrete-valued vector time series. Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, AUAI Press, 317–324.

  • McCullagh, P., and J. Nelder, 1989: Generalized Linear Models. Chapman and Hall, 532 pp.

  • Moron, V., A. W. Robertson, M. N. Ward, and P. Camberlin, 2007: Spatial coherence of tropical rainfall at the regional scale. J. Climate, 20, 5244–5263, doi:10.1175/2007JCLI1623.1.

  • CPC/NCEP, 1987: CPC global summary of day/month observations, 1979-continuing. Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory, accessed 20 May 2014. [Available online at http://rda.ucar.edu/datasets/ds512.0.]

  • Paap, R., and P. H. Franses, 2000: A dynamic multinomial probit model for brand choice with different long-run and short-run effects of marketing-mix variables. J. Appl. Econ., 15, 717–744, doi:10.1002/jae.580.

  • Palmer, T. N., 1998: Nonlinear dynamics and climate change: Rossby’s legacy. Bull. Amer. Meteor. Soc., 79, 1411–1423, doi:10.1175/1520-0477(1998)079<1411:NDACCR>2.0.CO;2.

  • Robertson, A. W., S. Kirshner, and P. Smyth, 2004: Downscaling of daily rainfall occurrence over northeast Brazil using a hidden Markov model. J. Climate, 17, 4407–4424, doi:10.1175/JCLI-3216.1.

  • Robertson, A. W., S. Kirshner, P. Smyth, S. P. Charles, and B. C. Bates, 2006: Subseasonal-to-interdecadal variability of the Australian monsoon over North Queensland. Quart. J. Roy. Meteor. Soc., 132, 519–542, doi:10.1256/qj.05.75.

  • Robertson, A. W., V. Moron, and Y. Swarinoto, 2009: Seasonal predictability of daily rainfall statistics over Indramayu district, Indonesia. Int. J. Climatol., 29, 1449–1462, doi:10.1002/joc.1816.

  • Schwarz, G., 1978: Estimating the dimension of a model. Ann. Stat., 6, 461–464, doi:10.1214/aos/1176344136.

  • Scott, S., 2002: Bayesian methods for hidden Markov models: Recursive computing in the 21st century. J. Amer. Stat. Assoc., 97, 337–351, doi:10.1198/016214502753479464.

  • Shirley, K. E., D. S. Small, K. G. Lynch, S. A. Maisto, and D. W. Oslin, 2010: Hidden Markov models for alcoholism treatment trial data. Ann. Appl. Stat., 4, 366–395, doi:10.1214/09-AOAS282.

  • Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde, 2002: Bayesian measures of model complexity and fit. J. Roy. Stat. Soc., 64B, 583–639, doi:10.1111/1467-9868.00353.

  • Timbal, B., P. Hope, and S. Charles, 2008: Evaluating the consistency between statistically downscaled and global dynamical model climate change projections. J. Climate, 21, 6052–6059, doi:10.1175/2008JCLI2379.1.

  • Vrac, M., and P. Naveau, 2007: Stochastic downscaling of precipitation: From dry events to heavy rainfalls. Water Resour. Res., 43, W07402, doi:10.1029/2006WR005308; Corrigendum, 44, W05702, doi:10.1029/2008WR007083.

  • Wilks, D., 1999a: Interannual variability and extreme-value characteristics of several stochastic daily precipitation models. Agric. For. Meteor., 93, 153–170, doi:10.1016/S0168-1923(98)00125-7.

  • Wilks, D., 1999b: Multisite downscaling of daily precipitation with a stochastic weather generator. Climate Res., 11, 125–136, doi:10.3354/cr011125.
  • Fig. 3. Station locations for the two regions analyzed in the paper: Punjab and the upper Yangtze basin.

  • Fig. 4. Three summary metrics of the rainfall data. Stations in (a)–(c) Punjab and (d)–(f) the Yangtze basin. Mean rainfall is of comparable magnitude in the two regions, with large spatial gradients in both. Note that season length is reflected in rain probability.

  • Fig. 5. The SAI of observed station rainfall for the Punjab and Yangtze basin networks (dimensionless).

  • Fig. 6. Coefficient values for seasonality and the SAI inputs, for both the (a),(b) Punjab and (c),(d) Yangtze regions, with the mean coefficient values (circles) and 95% probability intervals. The stations are ordered arbitrarily. Note that the seasonality input was not normalized, so that the amplitude values are not directly comparable between the left and right.

  • Fig. 7. Punjab region with SAI as an input and K = 4 states. (a)–(d) Probability of rain; (e)–(h) mean daily intensity. The relative frequencies of the four states are 20%, 18%, 28%, and 34% for states 1–4 (from wet to dry), respectively.

  • Fig. 8. (top) Estimated state sequence and (bottom) seasonality for the Punjab network. For seasonality, the average number of days per calendar date that fall into each state is summed over the 30-yr period.

  • Fig. 9. The Yangtze basin network of stations with SAI as an input and K = 6 states. (a)–(f) Probability of rain; (g)–(l) mean daily intensity. The relative frequencies of the six states are 18%, 18%, 16%, 19%, 15%, and 14% for states 1–6, respectively.

  • Fig. 10. As in Fig. 8, but for the Yangtze basin network.

  • Fig. 11. Observed daily rainfall PDF (black), showing the seasonal cycle for (a) Punjab and (b) the Yangtze basin. The mean fit and 95% confidence bands (gray) for 500 simulated datasets generated from the model are shown.

  • Fig. 12. Histograms of log frequency of precipitation and error bars from 500 simulated datasets from the model.

  • Fig. 13. The 50-yr return levels for annual block maxima, from GEV model fits. There is one box plot for each station: round markers represent observed values, and box plots show the simulated data. The middle horizontal black line is the median; the lower and upper bounds of the box plot are the 25th and 75th percentiles, respectively.

  • Fig. 14. Punjab and the Yangtze basin with observed station-average values (black dots) and box plots for 1000 predicted station-average values from the model. (a),(b) Average annual rainfall amount with no input; (c),(d) average annual rainfall amount with SAI as an input; (e),(f) number of days annually with a rainfall event over 70 mm with SAI as an input; and (g),(h) number of dry days annually with SAI as an input.

  • Fig. 15. RMSE values per year, averaging over all stations, with (y axis) or without (x axis) the SAI input, for (a),(c),(e) Punjab and (b),(d),(f) the Yangtze basin. The values in the top-left corner are the RMSEs averaged over all years. Light gray to dark gray indicates light to heavy rain stations.

  • Fig. 16. As in Fig. 15, but for RMSE values per station, averaging over all years.

  • Fig. 17. NHMM (comparable to Figs. 11 and 16) for (a),(c) Punjab and (b),(d) the Yangtze basin. Station-averaged seasonal cycle from the NHMM (top; details as in Fig. 11). Interannual performance of NHMM vs GLM–HMM per station, averaging over all years, RMSE for the number of dry days with SAI as an input (bottom; details as in Fig. 16). The station-averaged RMSEs for Punjab for the NHMM (GLM–HMM) are 5.91 (5.93), 1.13 (1.11), and 0.84 (0.81) for dry-day counts, average rainfall, and number of 70+ mm days, respectively. For the upper Yangtze the numbers are 8.31 (8.28), 1.17 (1.13), and 0.68 (0.65), respectively.

  • Fig. 18. Climatological average vertically integrated moisture fluxes (kg m−1 s−1; vectors) and pressure “omega” vertical velocity (mb h−1; contours with an interval of 0.5 mb h−1, negative dashed) for JJAS.

  • Fig. 19. Composite anomalies of vertically integrated moisture fluxes (kg m−1 s−1; vectors) and pressure “omega” vertical velocity (mb h−1; contours with an interval of 0.5 mb h−1, negative dashed) for the Punjab states. The magnitude of the largest moisture flux anomaly for each state is given in the top-left corner (kg m−1 s−1). The number of days contained in each composite is given in the case.

  • Fig. 20. As in Fig. 19, but for the upper Yangtze basin states.

A Bayesian Hidden Markov Model of Daily Precipitation over South and East Asia

  • 1 University of California, Irvine, Irvine, California
  • 2 International Research Institute for Climate and Society, Earth Institute at Columbia University, Palisades, New York
  • 3 University of California, Irvine, Irvine, California

Abstract

A Bayesian hidden Markov model (HMM) for climate downscaling of multisite daily precipitation is presented. A generalized linear model (GLM) component allows exogenous variables to directly influence the distributional characteristics of precipitation at each site over time, while the Markovian transitions between discrete states represent seasonality and subseasonal weather variability. Model performance is evaluated for station networks of summer rainfall over the Punjab region in northern India and Pakistan and the upper Yangtze River basin in south-central China. The model captures seasonality and the marginal daily distributions well in both regions. Extremes are reproduced relatively well in the Punjab region, but underestimated for the Yangtze. In terms of interannual variability, the combined GLM–HMM with spatiotemporal averages of observed rainfall as a predictor is shown to exhibit skill (in terms of reduced RMSE) at the station level, particularly for the Punjab region. The skill is largest for dry-day counts, moderate for seasonal rainfall totals, and very small for the number of extreme wet days.

Corresponding author address: Andrew Robertson, IRI, Earth Institute at Columbia University, 230 Monell, 61 Route 9W, P.O. Box 1000, Palisades, NY 10964-8000. E-mail: awr@iri.columbia.edu


1. Introduction

Statistical downscaling is a class of methods used for modeling the impact of regional climate variations and change on daily rainfall at local scale, for example, in agricultural applications of climate forecasts (e.g., Hansen et al. 2006). Hidden Markov models (HMMs) have been applied quite extensively to simulate daily rainfall variability across multiple weather stations, based on rain gauge observations and exogenous meteorological variables (Hay et al. 1991; Hughes and Guttorp 1994a; Charles et al. 1999; Bellone et al. 2000; Robertson et al. 2004; Greene et al. 2008). In these multisite stochastic weather generators based on discrete-state HMMs, each day is assumed to be associated with one of a finite number of hidden states, where the distributional characteristics of the states are estimated from historical data. The state-based nature of the HMM is well suited to representing large-scale weather control on the local rainfall processes, where the control is manifested across a region and influences individual locations according to local surface conditions such as topography and land use. An important goal of climate downscaling research is to better understand this cross-scale linkage, in order to obtain estimates of climate variability and change at local scale that better represent the physical relationships between large and small scales.

In a nonhomogeneous HMM (NHMM), the state transition probabilities are conditioned by one or more exogenous input variables. This formulation combines the Markov chain, to model the weather element as a stochastic process, with the influence of large-scale exogenous meteorological or climatic variables, such as spatially averaged geopotential height fields (Hughes et al. 1999) or general circulation model (GCM) output (Robertson et al. 2004, 2006, 2009). However, the NHMM presents a limitation for downscaling of climate change simulations because the rainfall characteristics of the modeled states may evolve as the climate warms (Timbal et al. 2008; Greene et al. 2011).

Furrer and Katz (2007), Ambrosino et al. (2014), and Kenabatho et al. (2012) have taken an alternative approach to stochastic weather generation, based on a generalized linear model (GLM) that allows state-dependent model parameters to be influenced directly by exogenous variables such as seasonal effects or climate indices. The GLM framework can be viewed as a generalization of classic least squares linear regression modeling (see McCullagh and Nelder 1989), allowing the conditional mean of a variable of interest to be modeled as a function of predictor variables, with noise characteristics that are non-Gaussian, the latter feature being particularly useful for modeling precipitation data.

In this paper we take a combined GLM–HMM approach that uses a GLM to incorporate the impact of exogenous variables (here seasonality and interannual regional climate variability), while exploiting an HMM for the stochastic spatiotemporal variability of daily rainfall across a regional network of stations. Thus, rather than encode the climatic influence on weather in terms of a set of fixed states as in an NHMM, the characteristics of the rainfall states themselves are assumed to be modulated in time by climate. In this way, the model emphasizes the regression between local rainfall parameters and the exogenous controls. The states enable more realistic modeling of spatiotemporal structures than would a stateless model, while allowing for a hierarchical structure of the climatic influence that can, for instance, be made to act on all stations equally. A discussion of the pros and cons of this approach is presented in section 6.

Two contrasting monsoonal regions, each characterized by increasing societal water demand, are taken here as case studies: the Punjab region of northern India and Pakistan and the upper Yangtze River basin in south-central China. The southwest monsoon over northern India is characterized by intense rainfall over a relatively short (July–August) peak season, with pronounced intraseasonal active and break phases and year-to-year variability (Greene et al. 2008). The summer monsoon over western China lasts longer (May–August peak season) with less prominent seasonality. These two regions provide much of the inflow to the Indus River in Pakistan and India and the Yangtze River above the Three Gorges Dam in China, respectively, and are thus of great importance for water resources, agriculture, and hydropower generation for these countries (Immerzeel et al. 2010). For each region, we evaluate the model in terms of its ability to represent seasonality of rainfall and the marginal (i.e., climatological) distributions of daily rainfall. We also evaluate its ability to capture interannual variability at the station level in daily rainfall characteristics, including rainfall frequency and extremes, when forced with regional-scale spatiotemporal averages of observed rainfall as the exogenous variable, and use the HMM’s states to interpret the multiscale nature of the rainfall variability in the two regions.

The paper proceeds as follows. The new model, which combines GLM and HMM approaches, is described in section 2, and the rainfall and other meteorological datasets are described in section 3. In section 4, the inferred rainfall states for Punjab and the upper Yangtze are presented and their meteorological and climatic associations are analyzed. In section 5, we then examine rainfall simulations generated by the model. Conclusions are presented in section 6.

2. Using GLMs with HMMs for precipitation modeling

Let $y_{ts}$ represent the observed precipitation amount on day t at station or site s. The index t refers to both the day of year and the year itself. We will model $y_{ts}$ as a stochastic function of both 1) a hidden Markov state variable $z_t$ and 2) a time series of daily exogenous variables represented by a vector $X_t$.

The HMM approach for modeling daily precipitation amounts is based on 1) a daily discrete hidden state variable $z_t$ taking values $k \in \{1, \ldots, K\}$ (often referred to as a weather state) and 2) state-dependent emission distributions $p(y_{ts} \mid z_t = k)$ that characterize the precipitation distribution at site s when the hidden state variable on day t is in state k. Two key assumptions underlie this model. First, the hidden states are assumed to be first-order Markov, that is, $p(z_t \mid z_{t-1}, \ldots, z_1) = p(z_t \mid z_{t-1})$, where the resulting K × K state-to-state transition matrix can be constant in time (for a homogeneous HMM), or can vary as a function of time (e.g., seasonally) or in response to exogenous variables (for a nonhomogeneous HMM). The second key assumption in an HMM is that the observations on day t, $y_t = (y_{t1}, \ldots, y_{tS})$ for S sites, are conditionally independent of observations and states on other days, conditioned on the hidden state variable $z_t$ on day t. Together, these assumptions lead to the familiar product form for the joint distribution of observations and states over days $1, \ldots, T$:

$$p(y_{1:T}, z_{1:T}) = p(z_1) \prod_{t=2}^{T} p(z_t \mid z_{t-1}) \prod_{t=1}^{T} p(y_t \mid z_t), \tag{1}$$

where $p(z_1)$ represents the initial state distribution (e.g., a uniform distribution).
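
As a minimal sketch of how this product form can be evaluated, the following Python snippet computes the log of the joint probability of one state path and its observations as the sum of an initial-state term, the transition terms, and the per-day emission terms. The function name `hmm_joint_log_prob`, the argument layout, and every number in the toy example are illustrative assumptions, not quantities from this paper.

```python
import math

def hmm_joint_log_prob(states, obs_log_lik, init, trans):
    """Return log p(y_1..y_T, z_1..z_T) for one given state path.

    states      : state indices z_1..z_T (0-based)
    obs_log_lik : obs_log_lik[t][k] = log p(y_t | z_t = k)
    init        : init[k] = p(z_1 = k)
    trans       : trans[i][j] = p(z_t = j | z_{t-1} = i)
    """
    logp = math.log(init[states[0]])                     # initial state term
    for t in range(1, len(states)):                      # first-order Markov transitions
        logp += math.log(trans[states[t - 1]][states[t]])
    for t, k in enumerate(states):                       # emissions, one per day
        logp += obs_log_lik[t][k]
    return logp

# Toy example with K = 2 states and T = 3 days (all values hypothetical).
init = [0.5, 0.5]
trans = [[0.8, 0.2], [0.3, 0.7]]
obs_log_lik = [[math.log(0.9), math.log(0.1)],
               [math.log(0.6), math.log(0.4)],
               [math.log(0.2), math.log(0.8)]]
path = [0, 0, 1]
joint = math.exp(hmm_joint_log_prob(path, obs_log_lik, init, trans))
print(joint)
```

Working in log space avoids numerical underflow when T is in the thousands, as it is for multidecadal daily records.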

In precipitation modeling applications, it is common to make an additional assumption that precipitation amounts at individual sites on day t are conditionally independent of each other given the state value $z_t$, that is, $p(y_t \mid z_t) = \prod_{s=1}^{S} p(y_{ts} \mid z_t)$, where S is the number of sites. This conditional independence assumption does not imply that the precipitation amounts are marginally (unconditionally) independent: the presence of the K discrete weather states ensures that some degree of spatial dependence across sites is captured via the latent states. Indeed, this ability to capture spatial dependence in a relatively parsimonious manner is one of the features that makes HMMs attractive for multisite precipitation simulation at daily time scales. The assumption that precipitation amounts are “locally” (in time) independent given the weather state can in practice lead to underestimation of spatial dependence in simulations from such models; however, if one desires, additional spatial dependence among sites at the daily level can be added to such models (albeit at the cost of additional complexity in parameterization and computation), for example, via autologistic or tree-structured dependence models (Hughes and Guttorp 1994b; Kirshner et al. 2004; Robertson et al. 2004).
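
The difference between conditional and marginal independence can be illustrated with a small simulation. The sketch below assumes a hypothetical two-state Markov chain and two sites whose wet-day probabilities depend only on the current state; the persistence value, the wet-day probabilities, and the helper `pearson` are all made up for illustration and do not come from the data analyzed here.

```python
import random

def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

rng = random.Random(0)
stay = 0.9            # hypothetical probability of remaining in the same state
p_wet = [0.9, 0.1]    # hypothetical wet-day probability in state 0 (wet) and 1 (dry)
n_days = 20000

state = 0
states, site1, site2 = [], [], []
for _ in range(n_days):
    if rng.random() > stay:
        state = 1 - state                      # switch weather state
    states.append(state)
    # Given the state, the two sites are drawn independently.
    site1.append(1 if rng.random() < p_wet[state] else 0)
    site2.append(1 if rng.random() < p_wet[state] else 0)

marginal_corr = pearson(site1, site2)
in_state0 = [(a, b) for a, b, k in zip(site1, site2, states) if k == 0]
conditional_corr = pearson([a for a, _ in in_state0], [b for _, b in in_state0])
print(marginal_corr, conditional_corr)
```

In this toy setting the occurrence series at the two sites are clearly correlated overall, while the correlation within a single state is close to zero, which is the behavior described in the paragraph above.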

The extension of the homogeneous HMM to the nonhomogeneous case, by making the transition probabilities $p(z_t \mid z_{t-1})$ dependent on exogenous variables [as originally proposed by Hughes and Guttorp (1994a)], allows the model’s outputs to be influenced by large-scale spatial phenomena (such as atmospheric circulation; e.g., Hughes et al. 1999) and long-range temporal effects (such as seasonal and interannual variability; e.g., Greene et al. 2011). This approach is limited by the fact that the exogenous variables can only effect changes in the simulated daily precipitation indirectly, namely, by changing the probability of different states being visited by the model. In other words, this approach allows the frequency of occurrence of the different state-dependent emission distributions to vary over time, but it does not allow the underlying distributional characteristics of precipitation to change for each state. This may be particularly limiting for simulation scenarios such as climate change, which may require the flexibility to capture changes in the likelihood of extreme precipitation events over time.

As an alternative, for the single-site case, Furrer and Katz (2007) proposed a GLM approach that allows the distributional characteristics of daily precipitation to be influenced directly by exogenous variables such as seasonal effects or climate indices. In the approach of Furrer and Katz, daily precipitation intensity at an individual site was modeled as having a gamma distribution with mean $m_t$, where $\log m_t = \mu + X_t^{\top}\beta$, with $X_t$ being a vector of time-varying exogenous variables, $\beta$ being a vector of regression weights estimated from the data, μ being an intercept term, and the logarithmic parameterization playing the role of the “link function” in the GLM framework. Kenabatho et al. (2012) use a multistage GLM approach to incorporate multiple stations and spatial correlation.

The combined model we propose in this paper, which we term GLM–HMM, can be viewed as building on both the GLM’s ability to allow exogenous variables to influence precipitation distributional characteristics and the HMM’s ability to simulate multisite daily precipitation characteristics. Figure 1 shows the modeled dependencies between the precipitation variables $y_{ts}$, hidden states z, and exogenous variables X in graphical form, omitting for simplicity details such as relationships between various parameters of the model. The precipitation amount $y_{ts}$ at site s on day t, conditioned on the value of the Markov hidden state variable $z_t = k$, is modeled as a mixture of a point mass at zero and two gamma distributions:

$$p(y_{ts} \mid z_t = k) = \pi_{tsk0}\, \delta_0(y_{ts}) + \sum_{j=1}^{2} \pi_{tskj}\, \mathrm{Gamma}(y_{ts} \mid \alpha_j, \lambda_{skj}), \tag{2}$$

where
  • $\pi_{tskj}$ (j = 0, 1, 2) are the probabilities (mixing weights) for the three mixture components, with $\pi_{tsk0} = \phi_{tsk1}$, $\pi_{tsk1} = \phi_{tsk2} - \phi_{tsk1}$, and $\pi_{tsk2} = 1 - \phi_{tsk2}$, where (as described below) the ϕ values in turn are functions of the exogenous variable $X_{ts}$;
  • $\delta_0(y_{ts})$ is the first mixture component, a delta function at $y_{ts} = 0$ representing zero precipitation;
  • $\mathrm{Gamma}(y_{ts} \mid \alpha_1, \lambda_{sk1})$ and $\mathrm{Gamma}(y_{ts} \mid \alpha_2, \lambda_{sk2})$ are two gamma densities representing mixture components for light (j = 1) and heavy (j = 2) rainfall amounts, with parameters $\alpha_j$ and $\lambda_{skj}$ that depend on the component j and (for λ) on the state value $z_t = k$ and site s.
Fig. 1. A hidden Markov chain provides stochastic transitions between weather states z. The observed rainfall at each site (with three sites shown here for illustration) depends on both the Markov state and the exogenous variables X. The exogenous inputs modify the “wetness” of the states over time, thus providing a climatic control on weather.

Citation: Journal of Hydrometeorology 17, 1; 10.1175/JHM-D-14-0142.1

In this paper we used fixed values for the α parameters, with $\alpha_1 = 1$ for the first gamma component and a fixed $\alpha_2$ for the second. The choice of $\alpha_1 = 1$ means that the first component is an exponential density (since the exponential is a special case of the gamma, with $\alpha = 1$). This choice of an exponential and a gamma with fixed $\alpha_2$ gives the model flexibility to capture amount characteristics while remaining straightforward to work with in terms of estimation (see the appendix for further justification). We also used the Bayesian information criterion (BIC; see Table 1) to evaluate alternative modeling choices in terms of the number of mixture components and various combinations of exponentials and gammas (α parameters chosen to allow for various shapes of the distribution) and found that the choice of one exponential and one gamma component (as used in this paper) generally produced the best results (lowest BIC scores).
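
A minimal sketch of drawing daily amounts from such a three-component mixture (dry, exponential, gamma) is given below. Only the exponential shape of 1 for the light-rain component is taken from the text; the heavy-rain shape of 2.0, the mixing weights, the treatment of λ as a rate parameter, and the function names are hypothetical choices made for illustration.

```python
import math
import random

def gamma_pdf(y, alpha, rate):
    """Gamma density with shape alpha and rate parameter (assumed convention)."""
    if y <= 0.0:
        return 0.0   # zero amounts are handled by the point mass, not the gamma parts
    return math.exp(alpha * math.log(rate) + (alpha - 1.0) * math.log(y)
                    - rate * y - math.lgamma(alpha))

def sample_precip(weights, alphas, rates, rng):
    """Draw one daily amount: 0 with probability weights[0], else gamma component j."""
    j = rng.choices([0, 1, 2], weights=weights)[0]
    if j == 0:
        return 0.0
    # random.gammavariate takes shape and scale, so scale = 1 / rate.
    return rng.gammavariate(alphas[j - 1], 1.0 / rates[j - 1])

weights = [0.6, 0.3, 0.1]   # hypothetical no-rain, light-rain, heavy-rain weights
alphas = [1.0, 2.0]         # shape 1 gives the exponential; 2.0 is a hypothetical value
rates = [0.5, 0.1]          # hypothetical rate parameters for one site and state

rng = random.Random(1)
draws = [sample_precip(weights, alphas, rates, rng) for _ in range(20000)]
dry_fraction = sum(d == 0.0 for d in draws) / len(draws)
mean_amount = sum(draws) / len(draws)
print(dry_fraction, mean_amount)
```

Under these assumed parameters the dry-day fraction should sit near the first mixing weight, and the mean amount near the weighted sum of the component means (shape divided by rate).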

Table 1. BIC scores for models with different numbers of mixture distributions. The minimum BIC score (in boldface) indicates the best model. For each gamma component, the first parameter (α) is fixed and the second (λ) is free to vary. When the first parameter is set to 1, the component is equivalent to an exponential distribution.

Other parameterizations for the mixture components could also be used in principle within the GLM–HMM framework, such as extreme value distributions (e.g., Vrac and Naveau 2007; Johnson et al. 1996).

We allow the mixing weights to vary as a function of exogenous variables, which in turn allows the state-dependent precipitation distributions at each site s to change in response to external influences. In particular, let $X_{ts}$ be a vector of observed exogenous variables for site s on day t. (In practice the exogenous input variables may not be available at a daily scale, but instead at a monthly or annual time scale.) We use a GLM approach to link a linear function of the exogenous variables, $\mu_{sk} + X_{ts}^{\top}\beta_s$, to the mixture weight probabilities, via a probit link function (described below). Variable $\beta_s$ is a vector of estimated site-dependent coefficients, with one site-dependent coefficient for each component of the exogenous variable vector $X_{ts}$, and $\mu_{sk}$ is an estimated intercept term, one for each site s and each state k.

More specifically, we use an ordered multinomial probit GLM (Albert and Chib 1993; Cox 1971; McCullagh and Nelder 1989) that allows the mixture weight probabilities to vary as a function of the exogenous vector $X_{ts}$ at station s at time t. The multinomial probit model operates as follows: with J = 3 mixture components (and mixture probabilities), we define J + 1 = 4 bin boundaries $b_h$ (h = 0, 1, 2, 3), where $b_0 = -\infty$, $b_1 = 0$, and $b_3 = +\infty$ for identifiability. The boundary value $b_2$ is estimated from the data as described in the appendix (we can assume it is fixed in the context of the discussion below). Let $N(\eta_{tsk}, 1)$ be a normal distribution with unit variance and with a mean value $\eta_{tsk}$ that is a linear function of $X_{ts}$ and where $\mu_{sk}$ is an intercept term assuming that day t is in state k. The mixture probabilities are defined as the amount of probability mass under this normal PDF that lies between the different bin boundaries. More specifically, let $\phi_{tsk1} = \Phi(b_1 - \eta_{tsk})$ and $\phi_{tsk2} = \Phi(b_2 - \eta_{tsk})$, where $\Phi(b - \eta_{tsk})$ is the cumulative distribution function (CDF), evaluated at b, for the PDF defined by $N(\eta_{tsk}, 1)$, where the mean function is $\eta_{tsk} = \mu_{sk} + X_{ts}^{\top}\beta_s$. In turn, we have

$$\pi_{tsk0} = \phi_{tsk1}, \qquad \pi_{tsk1} = \phi_{tsk2} - \phi_{tsk1}, \qquad \pi_{tsk2} = 1 - \phi_{tsk2}.$$

Thus, changes in the exogenous variable $X_{ts}$ linearly influence the mean of the normal PDF, which in turn affects the mixture probabilities $\pi_{tskj}$. By using the so-called probit link function, defined as the inverse CDF of the normal distribution, the continuous-valued inner product $X_{ts}^{\top}\beta_s$ can be mapped to a set of multinomial probabilities. In particular, for our problem, the exogenous variable influences precipitation by modulating the weights for the different precipitation components [no rain (j = 0), light rain (j = 1), and heavy rain (j = 2)].

Figure 2 provides a simple illustration of the approach. Figure 2 (top left) shows the normal distribution for some particular value of . The fixed boundaries and are shown as vertical lines, with the resulting and values and the corresponding mixture weights . Figure 2 (top right) shows a similar picture, but where a different exogenous input has caused the inner product to become more negative and to shift the normal PDF to the left. This changes the values of and , in turn changing the mixture weights . For example, with this input , the probability of no rain is now much larger than it was for , and correspondingly the probability of the heavy rain component has decreased significantly.
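The mapping from the linear predictor to the three mixture weights can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in this study: the function name `mixture_weights`, the argument names `eta`, `gamma1`, and `gamma2`, and all numerical values are hypothetical.

```python
from statistics import NormalDist


def mixture_weights(eta, gamma2, gamma1=0.0):
    """Ordered-probit weights for (no rain, light rain, heavy rain).

    eta    : mean of the unit-variance latent normal (intercept + X'beta)
    gamma2 : upper interior bin boundary (estimated in the paper, fixed here)
    gamma1 : lower interior bin boundary, set to 0 for identifiability
    """
    if gamma2 <= gamma1:
        raise ValueError("bin boundaries must be increasing")
    latent = NormalDist(mu=eta, sigma=1.0)
    f1 = latent.cdf(gamma1)  # probability mass below gamma1: no rain
    f2 = latent.cdf(gamma2)  # probability mass below gamma2
    return (f1, f2 - f1, 1.0 - f2)


# Illustrative values only: a wetter and a drier input for the same station.
wet = mixture_weights(eta=0.8, gamma2=1.5)
dry = mixture_weights(eta=-0.8, gamma2=1.5)
```

Shifting `eta` to more negative values moves mass into the no-rain bin and out of the heavy-rain bin, which is the behavior illustrated in Fig. 2.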

Fig. 2.

A simple example illustrating how the mixing weights respond to changes in a 1D exogenous X variable using an ordered multinomial probit model. Shown are the normal (top) PDF and (bottom) CDF for two values of X (left and right). The mixing weights are the areas under the normal PDF, which are given by the values of the CDF at the bin boundaries delineating no rain, light rain, and heavy rain. The probability of heavy rain is increased in the left compared to the right. See text for details.

Citation: Journal of Hydrometeorology 17, 1; 10.1175/JHM-D-14-0142.1

We also allow the transition matrix for the hidden states to change as a function of time of year, to allow for seasonal variation in the frequency of weather states. For example, a relatively dry state (whose probability of zero rain is high for many sites) should have lower probability of occurrence during the rainy season. For modeling of summer seasons we use six different transition matrices, each covering a window of approximately 20 days per summer. We selected 20 days to be sufficiently long to estimate a K × K matrix of transitions, while being short enough to resolve the strong seasonality (especially for Punjab) within the short June–September (JJAS) season.
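One way to assign each day of the 122-day JJAS season to one of the six transition-matrix windows is sketched below. The text does not state exactly how the window boundaries were drawn, so the even split used here (windows of 20 or 21 days) and the function name `window_index` are assumptions.

```python
def window_index(day_of_season, season_length=122, n_windows=6):
    """Map a 0-based day of the season to a 0-based window index.

    The season is split as evenly as integer arithmetic allows, which for
    122 days and 6 windows gives windows of 20 or 21 days.
    """
    if not 0 <= day_of_season < season_length:
        raise ValueError("day lies outside the season")
    return day_of_season * n_windows // season_length
```

Given a list of six K × K matrices, the matrix in force on day d would then be the element at position `window_index(d)`.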

Letting $\Theta$ be the set of all of the unknown parameters in the GLM–HMM described above, we use Bayesian estimation techniques to estimate $\Theta$, conditioned on the observed data (multisite daily time series of observed rainfall and exogenous variables) and parameter priors $p(\Theta)$. In particular we use Markov chain Monte Carlo (MCMC) techniques (specifically, Gibbs sampling) to generate samples in parameter space for the posterior distribution of $\Theta$ and then use these samples in downscaling and prediction. Full details are provided in the appendix. A feature of the Bayesian approach is that it provides a flexible mechanism for handling uncertainty in our parameter estimates, allowing (for example) averaging over possible parameter values to provide a full characterization of uncertainty in precipitation simulations. In addition, missing precipitation observations in the observed record can be handled straightforwardly via the MCMC estimation technique, simply by treating the missing information as unknown random quantities in a manner similar to that of the hidden Markov states and the unknown parameters. In the Gibbs sampler, for each day we first draw one of the three mixture components (no, light, or heavy rain) based on information from the inputs; we then simulate rainfall from that component.
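The two-step draw described in the last sentence above (component first, amount second) can be sketched as follows. This is an illustration only: it assumes gamma-distributed light and heavy components, and the shape and scale values, like the function name `simulate_day`, are hypothetical rather than fitted.

```python
import random


def simulate_day(weights, light=(0.7, 3.0), heavy=(1.2, 15.0), rng=random):
    """Draw one station-day of rainfall (mm) from a three-part mixture.

    weights      : (p_none, p_light, p_heavy), summing to 1
    light, heavy : (shape, scale) of the two gamma components; these numbers
                   are illustrative assumptions, not estimates from the paper
    rng          : the random module or a random.Random instance
    """
    u = rng.random()
    if u < weights[0]:
        return 0.0  # no-rain component
    shape, scale = light if u < weights[0] + weights[1] else heavy
    return rng.gammavariate(shape, scale)
```

Passing a seeded `random.Random` instance as `rng` makes a simulated sequence reproducible.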

3. Rainfall data

Figure 3 shows the weather station locations for the Punjab and upper Yangtze River regions, situated between about 25° and 35°N. Rainfall station data were obtained from the National Centers for Environmental Prediction (NCEP) Climate Prediction Center (CPC) Global Summary of the Day (GSOD) observations for 1980–2010 (CPC/NCEP 1987). We selected the 22 stations in the Punjab region and the 52 over the upper Yangtze that have fewer than 20% missing daily readings for the JJAS season.

Fig. 3.

Station locations for two regions analyzed in the paper: Punjab and the upper Yangtze basin.


The JJAS season was selected to approximately span the summer monsoon in both cases, accounting for the greater part of annual precipitation totals (73% in the Punjab region and 64% in Yangtze). The stations are from approximately 50 to a few hundred kilometers apart, motivating the structure of the GLM–HMM in which station rainfall is modeled as spatially independent, given the large-scale weather state.

Figure 4 shows spatial summaries of empirical statistics for both regions, including mean daily rainfall (Figs. 4a,d), probability of rain (Figs. 4b,e), and mean daily intensity (Figs. 4c,f). Mean daily intensity is defined as the average rainfall only for days it rained (≥0.1 mm day−1), while mean daily rainfall includes all days. Over the Punjab, rainfall probabilities are highest in north-central India, while mean daily intensities are largest in northern Pakistan; the desert region to the west experiences the lowest rainfall amounts. The upper Yangtze exhibits comparable station amounts to the Punjab, with larger values toward the north and east where mean intensities are largest. Rainfall probabilities over the upper Yangtze, by contrast, are larger in the western region. Mean JJAS station rainfall ranges from 0.5 to 6.5 mm day−1 (station average 3.0 mm day−1) for the Punjab region and from 2.1 to 8.8 mm day−1 (station average 4.8 mm day−1) for the Yangtze.
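The three station statistics defined above can be computed as in the following sketch. The 0.1 mm wet-day threshold comes from the text; the function name `rainfall_summary` and the use of `None` to mark missing readings are assumptions made for illustration.

```python
def rainfall_summary(daily_mm, wet_threshold=0.1):
    """Return (mean daily rainfall, probability of rain, mean daily intensity).

    daily_mm holds daily totals in mm; missing readings are None and skipped.
    Mean daily intensity averages only over days at or above the threshold.
    """
    valid = [x for x in daily_mm if x is not None]
    wet = [x for x in valid if x >= wet_threshold]
    mean_all = sum(valid) / len(valid)
    p_rain = len(wet) / len(valid)
    intensity = sum(wet) / len(wet) if wet else 0.0
    return mean_all, p_rain, intensity
```

By construction, mean daily rainfall is approximately the product of the probability of rain and the mean daily intensity, up to the small contribution of days with traces below the threshold.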

Fig. 4.

Three summary metrics of the rainfall data. Stations in (a)–(c) Punjab and (d)–(f) the Yangtze basin. Mean rainfall is of comparable magnitude in the two regions, with large spatial gradients in both. Note that season length is reflected in rain probability.


Exogenous variables

We use a two-dimensional exogenous variable in our experiments. The first of the two variables accounts for seasonality in the rainfall data. This seasonal exogenous variable is defined as the probability of rain on a given day of the year, averaged over all years, and is smoothed using a 10-day sliding window—it is defined separately for each station s.
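A sketch of how such a seasonal covariate could be built for one station is given below. The text specifies a 10-day sliding window but not its alignment or its treatment of the season edges, so the roughly centered window that is truncated at the edges is an assumption, as is the function name `seasonal_rain_probability`.

```python
def seasonal_rain_probability(wet, window=10):
    """Smoothed probability of rain per calendar day for one station.

    wet[y][d] is 1 if it rained on day d of year y, 0 if dry, None if missing.
    Assumes every calendar day has at least one non-missing year.
    """
    n_days = len(wet[0])
    raw = []
    for d in range(n_days):
        obs = [year[d] for year in wet if year[d] is not None]
        raw.append(sum(obs) / len(obs))  # average over all available years
    half = window // 2
    smooth = []
    for d in range(n_days):
        lo = max(0, d - half)
        hi = min(n_days, d - half + window)  # window is truncated at the edges
        smooth.append(sum(raw[lo:hi]) / (hi - lo))
    return smooth
```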

The second of the two exogenous variables allows for interannual variability and is taken here to be the station rainfall averaged first over the season (i.e., 122 days), and second over the spatial network (i.e., 22 stations for the Punjab region and 52 for the Yangtze basin). The station-level seasonal averages are first standardized by subtracting their mean and dividing by their interannual standard deviation, so as to form a so-called standardized anomaly index (SAI; Katz and Glantz 1986). These SAI time series are shown in Fig. 5. This idealized case is used to illustrate the workings of the model and to quantify the ability of the model to downscale the seasonal climate anomalies. On the one hand, it represents the perfect case because the predictor is an average of the station-level predictands. On the other hand, the predictands are daily and station scale, so errors will stem from the downscaling aspect of the model, reflecting intrinsic limitations to downscaling as well as model deficiencies.
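The construction of the SAI from station-level seasonal means can be sketched as follows. The function name `standardized_anomaly_index` is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev


def standardized_anomaly_index(seasonal_means):
    """Return one SAI value per year.

    seasonal_means[s][y] is the JJAS-mean rainfall at station s in year y.
    Each station series is standardized over years, then the standardized
    anomalies are averaged over stations.
    """
    standardized = []
    for series in seasonal_means:
        mu = mean(series)
        sd = pstdev(series)  # population SD: an assumption of this sketch
        standardized.append([(v - mu) / sd for v in series])
    n_years = len(seasonal_means[0])
    return [mean([z[y] for z in standardized]) for y in range(n_years)]
```

Two stations whose anomalies are perfectly in phase give an SAI with unit variance, while stations whose anomalies cancel give an SAI near zero in every year.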

Fig. 5.

The SAI of observed station rainfall for the Punjab and Yangtze basin networks (dimensionless).


The SAI is not highly correlated (cor) with the Niño-3.4 sea surface temperature index of ENSO for either region [cor(SAI-Punjab, Niño-3.4) = 0.24, cor(SAI-Yangtze, Niño-3.4) = 0.14], and the SAI variables for the two regions are relatively uncorrelated [cor(SAI-Punjab, SAI-Yangtze) = 0.10]. The variance (var) of the SAI provides a measure of the spatial coherence of the rainfall field, ranging from var(SAI) = 0 for an uncorrelated field to var(SAI) = 1 for a perfectly correlated field (Moron et al. 2007). The values for the two networks of stations are var(SAI-Punjab) = 0.30 and var(SAI-Yangtze) = 0.17, indicating low spatial coherence of interannual rainfall variation in both cases compared to those found previously for other networks of rainfall stations (Moron et al. 2007). These values indicate intrinsic limitations to downscaling of interannual rainfall variability from these regional aggregates, particularly for the larger Yangtze network.

4. Inferred states and parameters

a. Choosing the number of states

To fit the GLM–HMM, we first selected the number of hidden states for each of the two regions. We fit the model with different numbers of states and then selected the value of K that minimized the BIC. We also experimented with the deviance information criterion (DIC; Spiegelhalter et al. 2002) for model selection but found that it consistently preferred the models with the largest numbers of parameters; this may be because of nonnormality in the parameter posterior distributions, a known issue for DIC. For this reason we opted to use BIC rather than DIC in this work. BIC is defined (for a given K) as the negative log likelihood of the data plus a penalty term for model complexity (Schwarz 1978). We use the approach of Scott (2002), adapted to our model, to compute the log likelihood in a computationally efficient and stable manner using a recursive algorithm. The GLM–HMM for the Punjab region has a minimum BIC score with K = 4. For the Yangtze basin region, the BIC is minimized for K = 7, but the BIC values for K = 6, K = 7, and K = 8 are not significantly different, so we selected K = 6 for parsimony. The larger number of states in the Yangtze model is consistent with the larger number of Yangtze stations and their smaller spatial coherence with respect to interannual anomalies compared to the Punjab.
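A recursive, numerically stable evaluation of an HMM log likelihood can be sketched with the standard forward algorithm in log space. This is a generic illustration and not the specific recursion of Scott (2002) as adapted in this study; the function names and the commonly used scaling of BIC (−2 times the log likelihood plus the number of parameters times the log of the number of observations) are assumptions of the sketch.

```python
import math


def log_sum_exp(values):
    """Stable log(sum(exp(v))) for a list of log-scale values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def hmm_log_likelihood(log_init, log_trans, log_emis):
    """Forward recursion in log space.

    log_init[k]     : log P(z_1 = k)
    log_trans[i][j] : log P(z_t = j | z_{t-1} = i)
    log_emis[t][k]  : log p(y_t | z_t = k)
    """
    n_states = len(log_init)
    alpha = [log_init[k] + log_emis[0][k] for k in range(n_states)]
    for t in range(1, len(log_emis)):
        alpha = [
            log_sum_exp([alpha[i] + log_trans[i][j] for i in range(n_states)])
            + log_emis[t][j]
            for j in range(n_states)
        ]
    return log_sum_exp(alpha)


def bic(log_lik, n_params, n_obs):
    """Standard-form BIC; smaller values are preferred."""
    return -2.0 * log_lik + n_params * math.log(n_obs)
```

Working with log probabilities throughout avoids the underflow that occurs when many small daily likelihoods are multiplied directly.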

In practical applications of model selection, it is common to augment quantitative model selection criteria (such as BIC) with additional assessment of other attributes of the model (e.g., Shirley et al. 2010). In this spirit, we use the BIC to inform our decision but also investigate (below) various aspects of the model and its interpretation.

b. Estimated GLM parameters

As described in the appendix, we use a Bayesian approach to generate samples of the unknown parameters and the hidden states from their posterior distributions conditioned on the data. Each iteration of the sampling algorithm generates a single sample for each of the parameters and states. In the results in this paper, we used N = 2000 iterations (or samples).

Figure 6 shows the mean estimated coefficient values for each station and their 95% probability intervals for the seasonality and SAI inputs. The mean values and 95% intervals are computed from the N = 2000 parameter samples. With one exception, the probability intervals do not include zero (indicated as a horizontal line), indicating that the β coefficients associated with the exogenous inputs of both seasonality and interannual variability are statistically significant from a Bayesian perspective.

Fig. 6.

Coefficient values for seasonality and the SAI inputs, for both the (a),(b) Punjab and (c),(d) Yangtze regions, with the mean coefficient values (circles) and 95% probability intervals. The stations are ordered arbitrarily. Note that the seasonality input was not normalized, so that the amplitude values are not directly comparable between the left and right.


c. Punjab rainfall states

The rainfall states encode the seasonality of station-scale rainfall, as well as the patterns of subseasonal weather variability. After running the sampling algorithm on the GLM–HMM with 30 years of data, for each day t in the data we have N = 2000 different samples of the hidden states. In generating the figures in this and the next subsection, we assign each day t to the most frequently occurring state for that day across the N samples, providing state assignments in a manner similar to that computed via the Viterbi algorithm (Forney 1973) under a maximum likelihood framework as used in previous rainfall modeling studies with HMMs (e.g., Robertson et al. 2004). This maximum a posteriori probability (MAP) estimate of the state sequence is used to gain an interpretation of the states as an important by-product of the HMM, while the full posterior draws are used in analyzing the rainfall simulations in section 5 below.
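The day-by-day assignment to the most frequently sampled state can be sketched as follows. The function name `modal_states` and the list-of-lists layout of the posterior samples are assumptions made for illustration.

```python
from collections import Counter


def modal_states(samples):
    """Return the most frequently sampled state for each day.

    samples[n][t] is the hidden state drawn for day t in posterior sample n.
    Ties are broken in favor of the state encountered first.
    """
    n_days = len(samples[0])
    return [
        Counter(sample[t] for sample in samples).most_common(1)[0][0]
        for t in range(n_days)
    ]
```

Because each day is treated separately, the resulting sequence maximizes the marginal posterior probability day by day, which need not coincide with the single most probable joint sequence returned by a Viterbi-type algorithm.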

Figure 7 shows the probability of rainfall (Figs. 7a–d) and mean daily intensity (Figs. 7e–h) for each of the four states in the model for the Punjab region. The states are loosely ordered from wettest to driest. As found in Greene et al. (2008), who used a four-state HMM for rainfall over a similar region, there is a state that is generally wet at all stations (state 1) and one that is dry at all stations (state 4), together with two intermediate states with a somewhat north–south gradient. There is a general correspondence between high rainfall probability and high mean intensity for the wetter states, as found in Greene et al. (2008).

Fig. 7.

Punjab region with SAI as an input and K = 4 states. (a)–(d) Probability of rain; (e)–(h) mean daily intensity. The relative frequencies of the four states are 20%, 18%, 28%, and 34% for states 1–4 (from wet to dry), respectively.


Figure 8 shows the estimated state sequence, together with its seasonality averaged across the 30 years. Again, each day t is assigned to the state that occurs most frequently for that day across the N samples. The dry state occurs more frequently in June and September at the expense of the wetter states, though not exclusively so, and there are rapid fluctuations in time between the states. These rapid fluctuations are more pronounced than for a similar HMM analysis of northern India by Greene et al. (2008), and seasonality is less prominent. This is consistent with the greater aridity of the Punjab region. As seen in Greene et al. (2008), extended monsoon-break episodes of the dry state occur in some years. There is no obvious trend in the year-to-year occurrence of the states.

Fig. 8.

(top) Estimated state sequence and (bottom) seasonality for the Punjab network. For seasonality, the average number of days per calendar date that fall into each state is summed over the 30-yr period.


d. Upper Yangtze basin rainfall states

Figure 9 shows the same types of plots as in Fig. 7 for the upper Yangtze basin, showing the probability of rainfall (Figs. 9a–f) and mean daily intensity (Figs. 9g–l) for each of the K = 6 states. The states are again loosely ordered from heavy to light rain. As with the Punjab region, there is a general correspondence between high rainfall probability and high mean intensity for the wetter states. Compared to the spatial variation in the states for the Punjab region (Fig. 7), there is considerably more spatial variation within each state over the larger Yangtze basin network. State 3 is characterized by high rainfall probabilities over the western part of the domain with low mean intensities indicative of drizzle, for example.

Fig. 9.

The Yangtze basin network of stations with SAI as an input and K = 6 states. (a)–(f) Probability of rain; (g)–(l) mean daily intensity. The relative frequencies of the six states are 18%, 18%, 16%, 19%, 15%, and 14% for states 1–6, respectively.


Figure 10 shows the estimated state sequence and its seasonality for the Yangtze basin network, analogous to Fig. 8. The sequence is much noisier than for Punjab and average seasonality is less easy to discern. This is partly a function of the total number of states (six vs four for Punjab) and illustrates the larger complexity of the region’s subseasonal rainfall variability. The monsoon in the Yangtze begins earlier in the calendar year than in the Punjab region and is more homogeneous throughout the season. There is again no obvious trend in the year-to-year occurrence frequencies of the states.

Fig. 10.

As in Fig. 8, but for the Yangtze basin network.


5. Rainfall simulations

a. Seasonality

In Fig. 11, we assess the quality of the model in terms of reproducing the seasonal cycle. The black line shows the seasonal cycle of the observed rainfall amount for the Punjab (Fig. 11a) and the Yangtze basin (Fig. 11b), obtained by averaging rainfall amount over all stations for each calendar day over 31 years. The mean simulated amount per day and 95% confidence bands are also shown in gray, as computed from 500 simulated 31-yr datasets (as described in the appendix) conditioned on the observed SAI and seasonal probability of occurrence as exogenous variables. The 500 × 31 simulated daily time series are then averaged over each calendar day (and over stations) to produce the mean and probability bands in Fig. 11. The model captures the seasonality of rainfall amount in both regions, with the simpler monsoon seasonality in the Punjab region more accurately represented. This may be because of greater spatial heterogeneity in seasonality across the larger Yangtze region where some stations have early seasonal peaks, while others may be multimodal or have seasonal peaks later in the season (not shown).

Fig. 11.

Observed daily rainfall PDF (black), showing the seasonal cycle for (a) Punjab and (b) the Yangtze basin. The mean fit and 95% confidence bands (gray) for 500 simulated datasets generated from the model are shown.


b. Comparisons of distributions and extremes

1) Modeling the rainfall distribution

Figure 12 shows the observed rainfall distributions for selected wet and dry stations from each region (results for other stations were found to be generally similar). The observed log frequencies of rainfall (histogram) are compared to 500 simulated datasets of 30 years of data generated via conditional simulation from the model, represented by 95% confidence intervals plotted on the histogram. The model is seen to capture the distributions well in both regions, for both the wetter and drier stations. There is a slight bias in the Yangtze heavy rain station between 30 and 50 mm, where the model is overestimating days in this range, but otherwise the emission distributions are generally a good fit to the observed distributional nature of the data.

Fig. 12.

Histograms of log frequency of precipitation and error bars from 500 simulated datasets from the model.


2) Extreme rainfall events

We investigated the model’s ability to capture extreme behavior in terms of daily amounts. The model was not specifically designed to model extreme events and relies solely on the tail behavior of the gamma distribution to model the extremes (we do not suggest replacing extreme value theory models when the extremes are the primary interest; rather, we evaluate here one of our model’s attributes, namely its ability to capture extreme behavior). The estimation of the parameters of the gamma distribution tends to be dominated by the more frequent low rainfall values rather than the much rarer extremes. We fit a generalized extreme value (GEV) model to the annual maximum rainfall at each station, using the extRemes package in R, and report the return levels of 50-yr return period events (Furrer and Katz 2007). Figure 13 shows summaries of 500 simulations from the model as box plots, one per station. Despite its simplicity, the model captures the extremes relatively well in the Punjab region, although it tends to underestimate them in the upper Yangtze basin.
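Once GEV parameters have been estimated for a station, the 50-yr return level follows from the GEV quantile function. The sketch below is in Python rather than R, covers only this last step (not the fitting itself), and assumes the sign convention in which a positive shape parameter denotes a heavy upper tail; the function names and any parameter values passed to them are hypothetical.

```python
import math


def gev_return_level(mu, sigma, xi, return_period=50.0):
    """Level exceeded on average once per `return_period` years.

    mu, sigma, xi are the location, scale, and shape of a GEV distribution
    fitted to annual maxima (positive xi = heavy upper tail).
    """
    y = -math.log(1.0 - 1.0 / return_period)
    if abs(xi) < 1e-9:  # Gumbel limit as the shape tends to zero
        return mu - sigma * math.log(y)
    return mu + (sigma / xi) * (y ** (-xi) - 1.0)


def annual_maxima(daily_by_year):
    """Largest non-missing daily total in each year (missing = None)."""
    return [max(v for v in year if v is not None) for year in daily_by_year]
```

With only about 30 annual maxima per station, a 50-yr return level is an extrapolation beyond the record length, so the fitted values carry considerable sampling uncertainty.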

Fig. 13.

The 50-yr return levels for annual block max, from GEV model fits. There is one box plot for each station: round markers represent observed values, box plots show the simulated data. The middle horizontal black line is the median, the lower and upper bounds of the box plot are the 25th and 75th percentiles, respectively.


c. Interannual variability

As mentioned above, the exogenous variables in the model include 1) a station-scale, seasonal-cycle component and 2) a regional-scale, seasonal-average variable, the SAI. In this section, we evaluate the interannual predictive skill of the model in response to the SAI variable. In the results below, models are built both with and without the SAI variable (for comparison purposes), and all models include the seasonal-cycle exogenous variable. The evaluation is carried out via sixfold cross validation, where six different consecutive 5-yr blocks of data are held out in turn (except for the year 1980), and the model is fit to the other 26 years of data. For each fitted model we generate 1000 predictive runs for each of the 5 held-out years, conditioned on the SAI value for that year (the appendix provides a more detailed description of how predictive runs are generated).
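The blocked cross-validation design described above can be sketched as follows. The function name `cross_validation_folds` is hypothetical, and the assumption that the six blocks run consecutively from 1981 to 2010 is inferred from the stated block length and training-set size.

```python
def cross_validation_folds(first_year=1980, last_year=2010, block=5):
    """Yield (training_years, held_out_years) pairs.

    The first year is never held out, so each model is fit to all years
    outside the current block, including the first year.
    """
    years = list(range(first_year, last_year + 1))
    candidates = years[1:]  # 1981-2010 with the default arguments
    for start in range(0, len(candidates), block):
        held_out = candidates[start:start + block]
        training = [y for y in years if y not in held_out]
        yield training, held_out


folds = list(cross_validation_folds())
```

Holding out consecutive blocks rather than randomly chosen years keeps each held-out period contiguous in time.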

Figure 14 shows the results of these conditional predictions for the Punjab (Figs. 14a,c,e,g) and Yangtze (Figs. 14b,d,f,h). Various predictive statistics of interest (y axis) are shown per year (x axis) from 1981 onward, where all statistics are computed by spatially averaging the simulated rainfall across all stations. Figures 14a and 14b are shown for calibration purposes and show the annual-average rainfall when predictive simulations are generated without using the SAI input variable. The box plots reflect the variability (from the model) over the 1000 simulated years, where the median (50th percentile) is shown as a horizontal black line and the 25th and 75th percentiles are the lower and upper bounds of the shaded box, respectively. The black dots show the actual observed average rainfall per year. The observed rainfall varies significantly from year to year, while the model simulations (without any interannual input) remain relatively constant in their distribution (as one would expect, since any variation from year to year is entirely due to sampling variability in the simulations).

Fig. 14.

Punjab and the Yangtze basin with observed station-average values (black dots) and box plots for 1000 predicted station-average values from the model. (a),(b) Average annual rainfall amount with no input; (c),(d) average annual rainfall amount with SAI as an input; (e),(f) number of days annually with a rainfall event over 70 mm with SAI as an input; and (g),(h) number of dry days annually with SAI as an input.


Figures 14c–h show the same type of plots, now with the SAI variable included in the model, for three specific metrics of interest (Hughes et al. 1999; Joshi and Rajeevan 2006): mean daily rainfall (averaged over the JJAS season), the count of rainfall days over 70 mm (0.6% of Punjab days and 0.5% of Yangtze days), and the count of dry days (77% of Punjab days and 45% of Yangtze days). For the average rainfall metric (Figs. 14c,d) the model’s simulations fairly accurately bracket the observed rainfall for both the Punjab and the Yangtze, which is anticipated given that the input is the SAI. The model also performs fairly well for the annual number of simulated dry days (Figs. 14g,h), for both regions. For the 70-mm rainfall extremes (Figs. 14e,f), the model captures some of the observed interannual variation over the Punjab but not for the Yangtze.

Figure 15 shows results from the same simulations as scatterplots. Each point in each plot represents the root-mean-square error (RMSE) for a particular year (again for the rainfall averaged over stations), where for each such year the RMSE is computed between the 1000 simulated years and observed statistic for that year. The dots are shaded, with darker shading corresponding to wetter years (according to the respective statistic), and the RMSE values averaged over all years are given in the top left of each panel. The x axis corresponds to not using the SAI variable as an exogenous input and the y axis corresponds to including SAI. In the case of dry-day counts (Figs. 15e,f), including the SAI leads to a significant reduction in RMSE for most years (i.e., many of the yearly points are well below the diagonal), particularly over Punjab. This occurs to a lesser extent for seasonal rainfall amount, and only marginally for heavy rain days, and then for only 6 years over Punjab (cf. Fig. 14). There is no clear correspondence between the size of the RMSE and wetter or drier years (dot shading).
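The per-year RMSE plotted in Fig. 15 compares every simulated value of a statistic with the single observed value for that year. A minimal sketch follows; the function name `rmse_against_observation` is hypothetical.

```python
import math


def rmse_against_observation(simulated, observed):
    """RMSE between an ensemble of simulated values and one observed value.

    simulated : values of a station-averaged statistic for one year,
                one per predictive run
    observed  : the observed value of the same statistic for that year
    """
    squared = [(s - observed) ** 2 for s in simulated]
    return math.sqrt(sum(squared) / len(squared))
```

Computed this way, the RMSE reflects both the bias of the ensemble mean and the spread of the ensemble, so a sharper and better-centered predictive distribution lowers it.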

Fig. 15.

RMSE values per year, averaging over all stations, with (y axis) or without (x axis) the SAI input, for (a),(c),(e) Punjab and (b),(d),(f) the Yangtze basin. The values in the top-left corner are the RMSEs averaged over all years. Light gray to dark gray indicates light to heavy rain stations.


Figure 16 shows analogous RMSE plots, but where each point now represents an individual station and the averaging is performed over years to show the impact of the SAI at each location, thus depicting the model’s downscaling skill. For each station, the rainfall statistic is first computed for each year of the 1000 simulations and the observations; the ensemble mean of the simulations is then computed, and the square of the differences is computed for each year. The mean over years is then taken and the square root plotted in Fig. 16. Again, the largest impact of the SAI is seen for dry-day count, and for the Punjab region, with less impact of the other variables, especially 70-mm rain days.
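The station-level calculation described above differs from the per-year one in that the ensemble is first reduced to its mean. A minimal sketch, with the hypothetical function name `station_rmse`:

```python
import math


def station_rmse(simulated_by_year, observed_by_year):
    """RMSE over years of the ensemble-mean prediction at one station.

    simulated_by_year[y] : ensemble of simulated values of a statistic in year y
    observed_by_year[y]  : observed value of the same statistic in year y
    """
    squared = []
    for sims, obs in zip(simulated_by_year, observed_by_year):
        ensemble_mean = sum(sims) / len(sims)
        squared.append((ensemble_mean - obs) ** 2)
    return math.sqrt(sum(squared) / len(squared))
```

Because the ensemble mean is taken before squaring, this measure does not penalize ensemble spread and isolates the error of the mean prediction.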

Fig. 16.

As in Fig. 15, but for RMSE values per station, averaging over all years.


In summary, Figs. 14–16 demonstrate that the GLM–HMM with SAI of seasonal regionally averaged rainfall does exhibit skill (in terms of reduced RMSE) at the station level and for individual years, but that this skill is largely limited to dry-day counts, though with some skill for seasonal rainfall total over the Punjab. These findings are discussed further in the conclusions section.

d. Comparison to the NHMM

In this subsection we compare the model’s performance with that of an equivalent NHMM, similar to those used by Hughes et al. (1999) and Robertson et al. (2006). The NHMM estimation is Bayesian as in the current case, but with the same two-dimensional exogenous input used to modulate the NHMM’s transition probabilities. For the NHMM, we use the station-average seasonality in place of the station-scale one used in the GLM. The same number of states was used in each case.

Figure 17 shows the NHMM’s simulation of the seasonality (Figs. 17a,b), and interannual performance of dry-day counts (Figs. 17c,d). The seasonal cycle for the GLM–HMM (Fig. 11) and the NHMM (Figs. 17a,b) are quite similar for the Punjab region, but the GLM–HMM clearly outperforms the NHMM over the Yangtze basin. The interannual performance of the two models is very close, with the RMSE of dry-day counts of the two models plotted against each other stationwise in Figs. 17c and 17d. The station averages of the RMSEs are also given in the caption of Fig. 17 for dry-day counts, average rainfall, and number of 70+ mm days. These again show very similar numbers for the two models, though the GLM–HMM has a slight edge (smaller RMSE) overall.

Fig. 17.

NHMM (comparable to Figs. 11 and 16) for (a),(c) Punjab and (b),(d) the Yangtze basin. Station-averaged seasonal cycle from the NHMM (top; details as in Fig. 11). Interannual performance of NHMM vs GLM–HMM per station, averaging over all years, RMSE for the number of dry days with SAI as an input (bottom; details as in Fig. 16). The station-averaged RMSEs for Punjab for the NHMM (GLM–HMM) are 5.91 (5.93), 1.13 (1.11), and 0.84 (0.81) for dry-day counts, average rainfall, and number of 70+ mm days, respectively. For the upper Yangtze the numbers are 8.31 (8.28), 1.17 (1.13), and 0.68 (0.65), respectively.


Table 2 shows the Markov conditional transition probabilities of dry–dry, dry–wet, wet–dry, and wet–wet day transitions at the station level for the NHMM and GLM–HMM. The first row for each region gives the probabilities from the observed data (missing observations are not included). Both the NHMM and GLM–HMM capture these transition probabilities reasonably well. Values for the GLM–HMM are marginally closer to observed than those from the NHMM, although differences are small.
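Conditional occurrence probabilities of this kind can be estimated from a daily series by counting day-to-day transitions and skipping pairs with a missing value. The sketch below is an illustration: the 0.1 mm wet-day threshold is assumed to be the same one used for the rainfall statistics earlier in the paper, and the function name `occurrence_transitions` is hypothetical.

```python
def occurrence_transitions(daily_mm, wet_threshold=0.1):
    """Return P(today's status | yesterday's status) for dry and wet days.

    daily_mm holds consecutive daily totals for one station and one season;
    pairs in which either day is missing (None) are skipped.
    """
    pairs = [("dry", "dry"), ("dry", "wet"), ("wet", "dry"), ("wet", "wet")]
    counts = {pair: 0 for pair in pairs}
    for prev, curr in zip(daily_mm, daily_mm[1:]):
        if prev is None or curr is None:
            continue
        a = "wet" if prev >= wet_threshold else "dry"
        b = "wet" if curr >= wet_threshold else "dry"
        counts[(a, b)] += 1
    probs = {}
    for a in ("dry", "wet"):
        total = counts[(a, "dry")] + counts[(a, "wet")]
        for b in ("dry", "wet"):
            probs[(a, b)] = counts[(a, b)] / total if total else float("nan")
    return probs
```

To pool over stations and years, the counts (rather than the probabilities) would be accumulated season by season so that no transition spans the gap between one September and the following June.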

Table 2.

Markov conditional rainfall occurrence probabilities pooled over stations for the observed data and for two models for Punjab and the Yangtze basin.


e. Meteorological associations with rainfall states

The HMM is designed to represent synoptic-scale subseasonal rainfall variability and seasonality across the rainfall network, with interannual variability and additional seasonality encoded via the GLM. To gain insight into the synoptic-scale rainfall patterns associated with the states, we plot composites of horizontal fluxes of moisture and vertical air motions, averaged over the days assigned to each state. These diagnostics are calculated based on NCEP–NCAR (version 1) reanalyses data (Kalnay et al. 1996) and presented in terms of deviations (anomalies) from the JJAS climatological time averages displayed in Fig. 18. The moisture flux is integrated vertically from the surface to 200 hPa. This seasonal-average climatology exhibits a broad moist monsoon southwesterly flux with maxima over the Arabian Sea and Bay of Bengal, located upstream and to the south of the Punjab and Yangtze regions, respectively, together with a weaker southerly flux from the South China Sea into China. A region of maximum climatological ascent at 500 hPa (negative omega) is located over the Bay of Bengal, with our rainfall networks situated on its northwest and northeast margins.

Fig. 18.

Climatological average vertically integrated moisture fluxes (kg m−1 s−1; vectors) and pressure “omega” vertical velocity (mb h−1; contours with an interval of 0.5 mb h−1, negative dashed) for JJAS.


Figure 19 shows the composite atmospheric anomalies for the Punjab states. State 1 (wettest) shows an enhanced southwesterly moisture flux from the Arabian Sea with a cyclonic anomaly to the north and strong convergence and anomalous ascent over Punjab. The main moisture source is over the Arabian Sea, even though the path of moisture advection crosses the northern Bay of Bengal. State 1 is most frequent from late June to mid-September (Fig. 8) and is representative of active monsoon conditions; Fig. 19a can thus be interpreted as the difference between active monsoon conditions and the JJAS seasonal average, reflecting both seasonality as well as subseasonal phenomena (Greene et al. 2008). State 3 exhibits a rather similar pattern, though weaker, and with less extension to the north, consistent with the more southerly location of the largest rainfall probabilities in Fig. 7. State 2 presents a different pattern indicating a northward displacement of the mean moisture current, while the dry state 4 is almost a mirror image of state 1, with an anticyclonic circulation anomaly. The emergent mechanistic picture is that of a monsoon trough anomaly—either strong (state 1) or weak (state 3)—or a northward displacement of the mean moisture current toward the Himalayan foothills (state 2), consistent with the HMM of Greene et al. (2008) over a region slightly farther south.

Fig. 19.

Composite anomalies of vertically integrated moisture fluxes (kg m−1 s−1; vectors) and pressure “omega” vertical velocity (mb h−1; contours with an interval of 0.5 mb h−1, negative dashed) for the Punjab states. The magnitude of the largest moisture flux anomaly for each state is given in the top-left corner (kg m−1 s−1). The number of days contained in each composite is given in the case.

Figure 20 shows composite atmospheric anomalies for the Yangtze states. State 1, the wettest, is characterized by a northward displacement of the westerly monsoonal moisture current near 25°N and broad anomalous ascent over the Yangtze basin. States 2–4 are characterized by regionally limited Rossby-wave-like circulation anomalies. States 2 and 4 exhibit an anticyclonic wave centered over eastern China with a southerly flux of moisture on its westward flank, in both cases with dipolar vertical motion anomalies and ascent on the northern flank. State 3 is a “drizzle” state with very small rainfall intensity, small ascent yet relatively large northerly moisture fluxes. State 5 is relatively wet in the south, associated with southerly moisture flux. In summary, the state rainfall distributions in Fig. 9 are consistent with the composite moisture flux and vertical velocity anomalies, while Rossby waves play a larger role in explaining the rainfall anomalies in this region, consistent with its location downstream of the Tibetan plateau. Seasonality is weaker and the HMM component of the GLM–HMM thus largely represents this Rossby-wave-like subseasonal variability.

Fig. 20.

As in Fig. 19, but for the upper Yangtze basin states.

6. Summary and conclusions

We have presented a new approach to multisite precipitation downscaling that differs from previous nonhomogeneous HMMs through its incorporation of exogenous variables as direct modifiers of the mixture of the emission distribution at each station, rather than using logistic regression to influence the transition probabilities of the Markov chain. The new model combines a weather state model (via an HMM) to capture spatial station dependencies and a generalized linear model (GLM) to incorporate exogenous variables at the station level (independent of the HMM state) to downscale precipitation. Parameter estimation for the model is performed using a Bayesian framework, which allows, for example, averaging over parameter uncertainty when generating simulations from the model.

Two illustrative applications of the model were presented for regions of monsoonal Asia, the Punjab and upper Yangtze River basin. For each region we used two exogenous variables as input: the seasonal-cycle-dependent likelihood of precipitation occurrence at each station, and a standardized seasonal-average rainfall amount averaged over all stations (the SAI). The first variable provides a simple way of incorporating seasonality at the station level, while the second allows us to test the model’s ability to disaggregate or “downscale” seasonally and regionally averaged rainfall amounts to daily sequences at the station level. These choices are meant as simple illustrations, and the predictors could be chosen separately for each station, and vary daily if desired.

The model performance is evaluated using a large ensemble of cross-validated daily rainfall simulations, in terms of seasonality, daily rainfall distributions, 50-yr return levels, and interannual variability of daily rainfall characteristics. The model captures the marginal daily distributions well in both regions, for both the wetter and drier stations (Fig. 12). Despite its simplicity, the model captures the extremes relatively well in the Punjab region, although it tends to underestimate them in the upper Yangtze basin (Fig. 13). In terms of interannual variability, the GLM–HMM with SAI of seasonally and regionally averaged rainfall as a predictor is shown to exhibit skill (in terms of reduced RMSE) at the station level and for individual years at the regional level, particularly for the Punjab region. The skill is largest for dry-day counts, with less skill in the seasonal rainfall totals, and almost none in the number of extreme wet days (Figs. 14–16).

The model’s GLM component relates the exogenous variables to each station’s rainfall distribution expressed as a mixture of dry-day, light rain, and heavy rain component distributions via a mixture model (Figs. 1, 6). Previous studies have found dry-day counts to be more seasonally predictable than seasonal rainfall totals in monsoonal climates (Robertson et al. 2009), and this is confirmed by our results (Figs. 14–16) for the two summer monsoonal regions considered here, in the sense that interannual variations of dry-day counts at local scale are more closely tied to the regional SAI of seasonal total rainfall than are the station totals themselves (Fig. 16).

The higher model skill in downscaling of rainfall frequency at local scale compared to seasonal rainfall total can be interpreted via the higher spatial coherence of seasonal anomalies of rainfall frequency compared to mean daily intensity (Moron et al. 2007). The spatial coherence given by the variance of the SAI computed from the Punjab network is 0.30, 0.47, and 0.10 for seasonal rainfall total, dry-day count, and 70-mm wet-day count, respectively. The numbers for the Yangtze network are 0.17, 0.38, and 0.09. Both sets of numbers are consistent with the RMSE results in Figs. 15 and 16, highlighting that the spatial coherence of the field dictates the ability of the model to downscale rainfall from the regional average.
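The coherence measure quoted above can be illustrated with a short sketch. It assumes the SAI is the station average of standardized seasonal anomalies (in the spirit of Katz and Glantz 1986), so that its variance approaches 1 when all stations vary in phase and approaches 0 when station anomalies cancel. The function names and toy series are hypothetical.

```python
import statistics

def standardize(series):
    """Express one station's seasonal values as anomalies in units of its own std dev."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    return [(x - mean) / std for x in series]

def sai(stations):
    """Standardized anomaly index: station average of standardized anomalies, per year."""
    z = [standardize(s) for s in stations]
    n_years = len(stations[0])
    return [statistics.fmean([zs[year] for zs in z]) for year in range(n_years)]

def sai_variance(stations):
    """Variance of the SAI over years, used here as a spatial coherence measure."""
    return statistics.pvariance(sai(stations))

# Two toy networks (hypothetical seasonal totals, four years, two stations each).
coherent = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]  # stations vary in phase
opposed = [[1.0, 2.0, 3.0, 4.0], [40.0, 30.0, 20.0, 10.0]]   # stations vary out of phase
```

Here `sai_variance(coherent)` is close to 1 and `sai_variance(opposed)` is close to 0, which brackets the values reported for the two networks.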

The HMM’s states represent spatial patterns of regional-scale rainfall variability (Figs. 7a, 9a) inferred from the daily rainfall data, while the estimated state sequence shows their temporal evolution over the historical period of the data (Figs. 8, 10). Over the Punjab, the heavy rainfall state (Fig. 7a) is associated with a pronounced monsoon trough that draws moisture primarily from the Arabian Sea (Fig. 19). Our results provide additional confirmation of this source region. Over the upper Yangtze, the heaviest rainfall state (Fig. 9a) is associated with an extension to the north and east of the monsoon current emanating over northern South Asia, rather than the Bay of Bengal that dominates in the climatological moisture flux picture (Fig. 20). Other wet states over the Yangtze highlight the role of transient Rossby wave patterns that may be excited downstream of the Tibetan plateau.

The estimated state sequences (Figs. 8, 10) express synoptic-scale variability across time scales from subseasonal to interdecadal, as well as the seasonality of rainfall within the JJAS season. In simulation mode, however, the HMM contains no exogenous variables, and its seasonally dependent Markov transition matrix can only generate variability on the daily time scale. The cases presented here encode interannual variability solely through the regional-scale SAI variable, while subseasonal variability is generated by the HMM. This distinction is used for simplicity here and could be relaxed in future work, both by allowing the exogenous variables to encode subseasonal variability, such as associated with intraseasonal oscillations that contain predictability, as well as allowing a nonhomogeneous HMM component.

One of the motivations for constructing a GLM–HMM in which the exogenous variables act only via the GLM component (and not via modulating state transition as in the NHMM) was to test the validity of this alternative approach. A truly surprising result of this work is just how similar the interannual performances of these two approaches turned out to be for the rainfall statistics considered (Fig. 17). There are physical arguments for both paradigms of interannual climate variability. In the (loosely speaking) “linear” paradigm, a separation of time scales is made between fast “weather noise” and slower climatic influences such as ENSO; observed interannual variability is then assumed to be the sum of a stationary daily stochastic process and interannual forcing (e.g., Charney and Shukla 1981; DelSole and Feng 2013). This directly motivates the GLM–HMM approach. The alternative “nonlinear” paradigm argues that there is no separation of scales and that interannual variability at local scale results from a modulation of nonlinear weather regimes that form the chaotic attractor of weather and climate variability (e.g., Ghil and Childress 1987; Palmer 1998). This physical model has motivated the use of the NHMM in climate studies of daily rainfall sequences (Robertson et al. 2004). There is indeed evidence that drought years over India are associated with more monsoon breaks and that these are well captured by an NHMM (Greene et al. 2008). However, for climate change downscaling, the fixed nature of the NHMM states was found to be a limitation (Greene et al. 2008). Thus, a combined GLM–NHMM is advocated and this is the subject of our current work.

There are a variety of other extensions of the model that could be pursued. For example, the exogenous variables in the GLM–HMM could be station specific in order to encode additional spatial information, unlike in the NHMM approach; as in the case of the NHMM, they can also have higher frequency in time. The poorer results for the upper Yangtze basin may be due to the larger size of the region or to the more complex nature of the circulation patterns there (Fig. 9a). The ability of the GLM–HMM to encode GLM predictors at the station scale provides the potential to include smaller spatial (or temporal) scale information from a high-resolution GCM if the latter are shown to be reliable. Thus, there is considerable flexibility in how the GLM–HMM approach can be used for GCM downscaling. The model could also be extended to include an NHMM in place of the HMM, thus allowing exogenous variables to modulate the state transitions as well. Finally, while gamma distributions are shown to yield quite reasonable 50-yr return levels for the Punjab case, extremes in the upper Yangtze are less well captured, for which heavy-tailed distributions could be included in the mixture model.

In conclusion, the Bayesian GLM–HMM developed and tested here shows promise as a downscaling methodology for station rainfall and has several potential advantages compared to existing non-Bayesian NHMMs, through explicit estimates of parameter uncertainty and the model’s ability to modulate simulated rainfall distributions as a direct function of exogenous predictors. This latter feature may be particularly useful in climate change downscaling studies, where state stationarity assumptions pose an impediment for NHMMs.

Acknowledgments

We are grateful to two anonymous reviewers whose insightful and constructive comments contributed to the revised version of the manuscript. This work was supported by the U.S. Department of Energy Grant DE-SC0006616, as part of the Earth System Models (EaSM) multiagency initiative.

APPENDIX

Bayesian Sampling of the GLM–HMM

In this appendix, we provide mathematical details for constructing the GLM–HMM in a Bayesian framework. Gibbs sampling is the simplest form of MCMC because the full conditional distributions are simple to sample from and do not require tuning. However, Gibbs sampling requires all posterior full conditionals to be well-known distributions. To do this type of sampling for the GLM–HMM, we must introduce two sets of latent parameters: one set for the mixing weights of the emission distributions and one set for the generalized linear model connecting the linear term with the probabilities.

Precipitation amount is modeled with a point mass at zero and a mixture of gamma distributions. We set the first parameter of both gamma distributions, and (thus, the first gamma distribution is an exponential and the second is very similar to an exponential but shifted slightly away from zero to indicate heavier rainfall). A mixture of two exponential distributions (three-parameter distribution) is claimed to be better than a single gamma distribution (two-parameter distribution) when modeling rainfall (Wilks 1999a,b). Using the BIC as a metric, we tested different numbers of mixtures of exponential distributions and found two to be the best for the Punjab region and two or three to be the best for the Yangtze basin (we chose two mixtures for our models, so as not to overparameterize). Thus, the emission distribution for precipitation amount has a mixture of probability density functions as follows:
eq2
where , , and . As described in section 2, the probabilities are related to the exogenous variables and coefficients through a link function : . For sampling reasons, we choose to be a probit link function that has properties that allow for a data augmentation approach (Albert and Chib 1993). In the Bayesian framework, the Gibbs sampling algorithm relies on the probit form of the ordered multinomial; however, the probit is quite similar to the logit link function (Paap and Frances 2000).
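To make the form of the emission distribution concrete, the sketch below draws daily amounts from a point mass at zero plus two gamma components. All numerical values (the probabilities, shapes, and scales) are hypothetical placeholders chosen only for illustration, and the function name `draw_rainfall` is not from the source; in the model itself the three probabilities would come from the GLM through the probit link.

```python
import random

def draw_rainfall(p_dry, p_light, shape_light, scale_light,
                  shape_heavy, scale_heavy, rng):
    """Draw one daily amount from: point mass at 0 + light gamma + heavy gamma.
    The heavy-rain probability is implied as 1 - p_dry - p_light."""
    u = rng.random()
    if u < p_dry:
        return 0.0
    if u < p_dry + p_light:
        return rng.gammavariate(shape_light, scale_light)
    return rng.gammavariate(shape_heavy, scale_heavy)

# Hypothetical parameter values: shape 1 makes each gamma an exponential.
rng = random.Random(0)
draws = [draw_rainfall(0.6, 0.3, 1.0, 2.0, 1.0, 20.0, rng) for _ in range(10000)]
dry_fraction = sum(d == 0.0 for d in draws) / len(draws)
```

With these placeholder values roughly 60% of the simulated days are dry, and the wet-day amounts are split between a light and a heavy component.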
The sampling algorithm employs two sets of latent variables: the categories in the GLM–HMM are ordered no rain, light rain, and heavy rain, respectively. Thus, the latent parameters are the outcome categories of the multinomial distribution of the GLM. A second set of latent variables are introduced for Gibbs sampling of the coefficients where the mean function is linear (Albert and Chib 1993). In our model, is the variable for probability of rainfall occurrence on each calendar day; it is defined for each day of the year, which is the same for each year and station specific. The SAI variable describes annual rain and is the same for all stations. There is an additional variable γ that is used to determine the break points between the normally distributed variables giving rise to the bins of the multinomial distributed variables. Here, the three bins denote no rain, light rain, and heavy rain. The first break point is fixed; it can arbitrarily be set to without loss of generality. For identifiability, the model only has one unknown break point . As a note, if more bins are added, then more unknown break point variables would be needed. Equation (A1) shows the relationship between the latent variables and :
ea1
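The role of the latent normal variable and the break points can be sketched in the style of the Albert and Chib (1993) ordered probit. The sketch assumes a unit-variance latent variable centered on the linear predictor `eta`, a fixed first break point at 0 (a common convention, assumed here because the value is not shown above), and one free break point `gamma`. The function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and std dev 1

def ordered_probit_probs(eta, gamma, first_break=0.0):
    """Probabilities of (no rain, light rain, heavy rain) for linear predictor eta."""
    p_dry = STD_NORMAL.cdf(first_break - eta)
    p_light = STD_NORMAL.cdf(gamma - eta) - p_dry
    p_heavy = 1.0 - STD_NORMAL.cdf(gamma - eta)
    return p_dry, p_light, p_heavy

def draw_latent(category, eta, gamma, rng, first_break=0.0):
    """Data augmentation step: draw the latent normal variable, truncated to the
    interval that corresponds to the observed category (0, 1, or 2)."""
    bounds = [(-math.inf, first_break), (first_break, gamma), (gamma, math.inf)]
    lo, hi = bounds[category]
    a = STD_NORMAL.cdf(lo - eta)
    b = STD_NORMAL.cdf(hi - eta)
    u = a + (b - a) * rng.random()  # uniform on the allowed probability range
    return eta + STD_NORMAL.inv_cdf(u)

probs = ordered_probit_probs(0.0, 1.0)
rng = random.Random(0)
latent = [draw_latent(c, 0.3, 1.0, rng) for c in (0, 1, 2)]
```

Increasing `eta` shifts probability from the dry category toward the heavy category, which is how the exogenous variables modulate the mixture weights in this kind of formulation.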
The GLM–HMM has a conditional independence assumption as the previous day informs the current day through this set of equations:
eq3
where are the transition probabilities and is the hidden state on a given day t. The transition probabilities q can best be described as a matrix where is the kth row of the transition matrix:
eq4
There is one restriction: all rows sum to one, .
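The first-order Markov dependence and the row-sum restriction can be sketched as follows. This illustration uses a single fixed transition matrix `Q` with hypothetical entries, whereas the model described in the main text uses a seasonally dependent matrix; the function names are also hypothetical.

```python
import random

def validate_transition_matrix(Q, tol=1e-9):
    """Check that every row is a probability vector (non-negative, sums to one)."""
    for row in Q:
        if any(q < 0 for q in row) or abs(sum(row) - 1.0) > tol:
            raise ValueError("each row of Q must be non-negative and sum to one")

def simulate_states(Q, first_state, n_days, rng):
    """Simulate a hidden state sequence: each day depends only on the previous day."""
    validate_transition_matrix(Q)
    states = [first_state]
    for _ in range(n_days - 1):
        row = Q[states[-1]]  # row k holds the probabilities of leaving state k
        states.append(rng.choices(range(len(row)), weights=row)[0])
    return states

# Hypothetical two-state matrix with persistent states.
Q = [[0.8, 0.2],
     [0.3, 0.7]]
sequence = simulate_states(Q, 0, 30, random.Random(0))
```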
Additionally, a few parameters need prior distributions; we choose them such that the posterior full conditional distributions are conjugate, which allows for Gibbs sampling. Noninformative or flat priors cause improper posterior distributions in some cases. Our priors were chosen to be somewhat informative (but allow the data to carry most of the weight in the posterior distribution):
eq5
Because , , and are latent variables, they are not given priors; their full conditional posterior distributions will be given to show their dependence on the other parameters of the model.

Algorithm considerations

Here we present the posterior full conditional distributions for all parameters and latent variables. The algorithm can be implemented with a Gibbs sampler. The algorithm is run for iterations. The posterior full conditional distributions are as follows:
eq6

1) MCMC algorithm details

The algorithm implemented here is a blocked Gibbs sampler. The MCMC chain can be started in multiple locations to ensure that convergence is not to a local mode. Two hundred iterations are treated as burn-in and are not used to calculate posterior means; we use 2000 samples for all of the plots. However, most of the parameters’ posterior full conditional distributions become stationary after just a few iterations.

2) Missing data

It can be important to impute missing data in a time series analysis; removal of these points affects the autoregressive nature of the model. The missing precipitation values are treated as unknown parameters and imputed. In the Bayesian framework, each point of missing data is assumed to have a conditional distribution. This creates a straightforward analysis of the temporal dependence since during the algorithm the missing values are imputed and updated. The posterior full conditionals for the parameters of the model and missing values that can be used in an MCMC algorithm are given as follows:
eq7
The known data are fit along with the model parameters in the algorithm. As we learn about the unknown parameters of the distribution, the missing data are also learned. The result is a set of draws from the conditional posterior distribution for each missing data point. This way of modeling missing data accounts for our uncertainty in the true value of the missing data points, whereas other forms of missing data imputations tend to impute the missing data through smoothing or a single point estimate and use that imputed value as the truth without quantifying the uncertainty.
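The imputation idea can be sketched as a step that is repeated in every sweep of the sampler. For brevity the sketch holds the emission parameters and the state sequence fixed, whereas in the full algorithm they are updated each iteration; it also uses exponential wet-day components. The parameter values, the use of `None` to mark missing days, and the function names are all hypothetical.

```python
import random

def draw_amount(params, rng):
    """Draw a daily amount from a state's emission: dry, light, or heavy rain."""
    p_dry, p_light, scale_light, scale_heavy = params
    u = rng.random()
    if u < p_dry:
        return 0.0
    scale = scale_light if u < p_dry + p_light else scale_heavy
    return rng.expovariate(1.0 / scale)  # exponential with mean `scale`

def impute_missing(rain, states, params_by_state, rng):
    """Replace each missing value (None) with a draw conditional on that day's state."""
    return [draw_amount(params_by_state[s], rng) if r is None else r
            for r, s in zip(rain, states)]

rain = [0.0, 5.2, None, 12.1, None]
states = [1, 0, 0, 0, 1]
params_by_state = {0: (0.2, 0.5, 3.0, 25.0),   # wet state
                   1: (0.9, 0.08, 1.0, 10.0)}  # dry state
rng = random.Random(0)
# One imputed series per sweep; the spread across sweeps reflects the uncertainty.
sweeps = [impute_missing(rain, states, params_by_state, rng) for _ in range(500)]
```

Collecting the imputed values at a given day across sweeps gives a sample from that day's conditional distribution rather than a single point estimate.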

3) Replicates

We need to produce replicates (simulated years of data) from the model in order to calculate metrics such as average rainfall per state and RMSE values for assessing model fit. We can do this as one of the steps of the algorithm. First, we calculate a new sequence of hidden states and draw all of the parameters from their posterior distributions, and then we can draw a new set of predictive rainfall . Then these replicates can be used to assess the model fit:
eq8
We can also calculate predictive runs for time periods of the model that are not observed. This is particularly important for cross validation; we may want to hold out a set of observed years and compare them to the predictive distribution. Variable corresponds to the exogenous variables for the years that are held out. We start by drawing a sequence of states for this predictive region, and we calculate the probabilities of no rain, light rain, and heavy rain associated with the held-out years. Finally, we draw the predictive rainfall for the held-out years . These draws of predictive rainfall can be compared to the actual observed amounts for cross validation (this calculates metrics like RMSE values for the model prediction):
eq9
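The cross-validation comparison can be sketched for one seasonal statistic, the dry-day count. The sketch is deliberately simplified: each replicate season uses one posterior draw of a single dry-day probability, standing in for the full sequence of state and rainfall draws described above. The function names, the posterior draws, and the observed counts are hypothetical.

```python
import math
import random

def simulate_dry_days(p_dry, n_days, rng):
    """Simulate one replicate season and return its number of dry days."""
    return sum(rng.random() < p_dry for _ in range(n_days))

def ensemble_mean_dry_days(posterior_p_dry, n_days, rng):
    """One replicate per posterior draw, so parameter uncertainty is averaged over."""
    reps = [simulate_dry_days(p, n_days, rng) for p in posterior_p_dry]
    return sum(reps) / len(reps)

def rmse(predicted, observed):
    """Root-mean-square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed))
                     / len(observed))

rng = random.Random(0)
n_days = 122  # length of a June-September season
# Hypothetical posterior draws of the dry-day probability for two held-out years.
posterior_by_year = [[0.55, 0.60, 0.58, 0.62], [0.70, 0.72, 0.68, 0.71]]
observed_dry_days = [70, 88]  # hypothetical observations
predicted = [ensemble_mean_dry_days(draws, n_days, rng) for draws in posterior_by_year]
score = rmse(predicted, observed_dry_days)
```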

REFERENCES

  • Albert, J. H., and S. Chib, 1993: Bayesian analysis of binary and polychotomous response data. J. Amer. Stat. Assoc., 88, 669–679, doi:10.1080/01621459.1993.10476321.

  • Ambrosino, C., R. E. Chandler, and M. C. Todd, 2014: Rainfall-derived growing season characteristics for agricultural impact assessments in South Africa. Theor. Appl. Climatol., 115, 411–426, doi:10.1007/s00704-013-0896-y.

  • Bellone, E., J. P. Hughes, and P. Guttorp, 2000: A hidden Markov model for downscaling synoptic atmospheric patterns to precipitation amounts. Climate Res., 15, 1–12, doi:10.3354/cr015001.

  • Charles, S. P., B. C. Bates, and J. P. Hughes, 1999: A spatiotemporal model for downscaling precipitation occurrence and amounts. J. Geophys. Res., 104, 31 657–31 669, doi:10.1029/1999JD900119.

  • Charney, J. G., and J. Shukla, 1981: Monsoon dynamics. Predictability of Monsoons, J. Lighthill and R. Pearce, Eds., Cambridge University Press, 99–109.

  • Cox, D. R., 1971: The Analysis of Binary Data. Methuen, 142 pp.

  • DelSole, T., and X. Feng, 2013: The “Shukla–Gutzler” method for estimating potential seasonal predictability. Mon. Wea. Rev., 141, 822–831, doi:10.1175/MWR-D-12-00007.1.

  • Forney, G. D., Jr., 1973: The Viterbi algorithm. Proc. IEEE, 61, 268–278, doi:10.1109/PROC.1973.9030.

  • Furrer, E., and R. Katz, 2007: Generalized linear modeling approach to stochastic weather generators. Climate Res., 34, 129–144, doi:10.3354/cr034129.

  • Ghil, M., and S. Childress, 1987: Topics in Geophysical Fluid Dynamics: Atmospheric Dynamics, Dynamo Theory and Climate Dynamics. Springer-Verlag, 512 pp.

  • Greene, A. M., A. W. Robertson, and S. Kirshner, 2008: Analysis of Indian monsoon daily rainfall on subseasonal to multidecadal time-scales using a hidden Markov model. Quart. J. Roy. Meteor. Soc., 134, 875–887, doi:10.1002/qj.254.

  • Greene, A. M., A. W. Robertson, P. Smyth, and S. Triglia, 2011: Downscaling projections of the Indian monsoon rainfall using a non-homogeneous hidden Markov model. Quart. J. Roy. Meteor. Soc., 137B, 347–359, doi:10.1002/qj.788.

  • Hansen, J. W., A. Challinor, A. Ines, T. Wheeler, and V. Moron, 2006: Translating climate forecasts into agricultural terms: Advances and challenges. Climate Res., 33, 27–41, doi:10.3354/cr033027.

  • Hay, L. E., G. J. McCabe Jr., D. M. Wolock, and M. A. Ayers, 1991: Simulation of precipitation by weather type analysis. Water Resour. Res., 27, 493–501, doi:10.1029/90WR02650.

  • Hughes, J. P., and P. Guttorp, 1994a: A class of stochastic models for relating synoptic atmospheric patterns to regional hydrologic phenomena. Water Resour. Res., 30, 1535–1546, doi:10.1029/93WR02983.

  • Hughes, J. P., and P. Guttorp, 1994b: Incorporating spatial dependence and atmospheric data in a model of precipitation. J. Appl. Meteor., 33, 1503–1515, doi:10.1175/1520-0450(1994)033<1503:ISDAAD>2.0.CO;2.

  • Hughes, J. P., P. Guttorp, and S. P. Charles, 1999: A non-homogeneous hidden Markov model for precipitation occurrence. J. Roy. Stat. Soc., 48C, 15–30, doi:10.1111/1467-9876.00136.

  • Immerzeel, W. W., L. P. H. van Beek, and M. F. P. Bierkens, 2010: Climate change will affect the Asian water towers. Science, 328, 1382–1385, doi:10.1126/science.1183188.

  • Johnson, G., C. Hanson, S. Hardegree, and E. Ballard, 1996: Stochastic weather simulation: Overview and analysis of two commonly used models. J. Appl. Meteor., 35, 1878–1896, doi:10.1175/1520-0450(1996)035<1878:SWSOAA>2.0.CO;2.

  • Joshi, V., and M. Rajeevan, 2006: Trend in precipitation extremes over India. NCC Research Rep. 3/2006, 25 pp.

  • Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–471, doi:10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2.

  • Katz, R., and M. Glantz, 1986: Anatomy of a rainfall index. Mon. Wea. Rev., 114, 764–771, doi:10.1175/1520-0493(1986)114<0764:AOARI>2.0.CO;2.

  • Kenabatho, P. K., N. R. McIntyre, R. E. Chandler, and H. S. Wheater, 2012: Stochastic simulation of rainfall in the semi-arid Limpopo basin, Botswana. Int. J. Climatol., 32, 1113–1127, doi:10.1002/joc.2323.

  • Kirshner, S., P. Smyth, and A. W. Robertson, 2004: Conditional Chow–Liu tree structures for modeling discrete-valued vector time series. Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, AUAI Press, 317–324.

  • McCullagh, P., and J. Nelder, 1989: Generalized Linear Models. Chapman and Hall, 532 pp.

  • Moron, V., A. W. Robertson, M. N. Ward, and P. Camberlin, 2007: Spatial coherence of tropical rainfall at the regional scale. J. Climate, 20, 5244–5263, doi:10.1175/2007JCLI1623.1.

  • CPC/NCEP, 1987: CPC global summary of day/month observations, 1979-continuing. Research Data Archive at the National Center for Atmospheric Research, Computational and Information Systems Laboratory, accessed 20 May 2014. [Available online at http://rda.ucar.edu/datasets/ds512.0.]

  • Paap, R., and P. H. Frances, 2000: A dynamic multinomial probit model for brand choice with different long-run and short-run effects of marketing-mix variables. J. Appl. Econ., 15, 717–744, doi:10.1002/jae.580.

  • Palmer, T. N., 1998: Nonlinear dynamics and climate change: Rossby’s legacy. Bull. Amer. Meteor. Soc., 79, 1411–1423, doi:10.1175/1520-0477(1998)079<1411:NDACCR>2.0.CO;2.

  • Robertson, A. W., S. Kirshner, and P. Smyth, 2004: Downscaling of daily rainfall occurrence over northeast Brazil using a hidden Markov model. J. Climate, 17, 4407–4424, doi:10.1175/JCLI-3216.1.

  • Robertson, A. W., S. Kirshner, P. Smyth, S. P. Charles, and B. C. Bates, 2006: Subseasonal-to-interdecadal variability of the Australian monsoon over North Queensland. Quart. J. Roy. Meteor. Soc., 132, 519–542, doi:10.1256/qj.05.75.

  • Robertson, A. W., V. Moron, and Y. Swarinoto, 2009: Seasonal predictability of daily rainfall statistics over Indramayu district, Indonesia. Int. J. Climatol., 29, 1449–1462, doi:10.1002/joc.1816.

  • Schwarz, G., 1978: Estimating the dimension of a model. Ann. Stat., 6, 461–464, doi:10.1214/aos/1176344136.

  • Scott, S., 2002: Bayesian methods for hidden Markov models: Recursive computing in the 21st century. J. Amer. Stat. Assoc., 97, 337–351, doi:10.1198/016214502753479464.

  • Shirley, K. E., D. S. Small, K. G. Lynch, S. A. Maisto, and D. W. Oslin, 2010: Hidden Markov models for alcoholism treatment trial data. Ann. Appl. Stat., 4, 366–395, doi:10.1214/09-AOAS282.

  • Spiegelhalter, D. J., N. G. Best, B. P. Carlin, and A. van der Linde, 2002: Bayesian measures of model complexity and fit. J. Roy. Stat. Soc., 64B, 583–639, doi:10.1111/1467-9868.00353.

  • Timbal, B., P. Hope, and S. Charles, 2008: Evaluating the consistency between statistically downscaled and global dynamical model climate change projections. J. Climate, 21, 6052–6059, doi:10.1175/2008JCLI2379.1.

  • Vrac, M., and P. Naveau, 2007: Stochastic downscaling of precipitation: From dry events to heavy rainfalls. Water Resour. Res., 43, W07402, doi:10.1029/2006WR005308; Corrigendum, 44, W05702, doi:10.1029/2008WR007083.

  • Wilks, D., 1999a: Interannual variability and extreme-value characteristics of several stochastic daily precipitation models. Agric. For. Meteor., 93, 153–170, doi:10.1016/S0168-1923(98)00125-7.

  • Wilks, D., 1999b: Multisite downscaling of daily precipitation with a stochastic weather generator. Climate Res., 11, 125–136, doi:10.3354/cr011125.