## 1. Introduction

The upper Delaware River Basin System supplies New York City, one of the largest urban water supply systems in the United States. With a cumulative storage capacity of 1.5 × 10^{9} m^{3} from five major reservoirs, the Delaware River basin supplies about 3 × 10^{6} m^{3} day^{−1} to the city of New York. The Delaware River Basin Commission and the New York City Department of Environmental Protection are primarily responsible for managing the releases from the major reservoirs to meet the water demand of the city of New York and to maintain downstream ecosystem services (DRBC 2007). The operating rules of this reservoir system (e.g., the minimum water levels to be maintained in the reservoir at any specified time, or the specification of drought return periods) are based on relatively short historical records of data. The typical record length of the naturalized streamflow data for the major reservoirs on the system is 50–60 years. Given that the drought of record in the basin was in the 1960s, that is, about 50 years ago, extended records of hydrologic variability from paleoproxies such as tree rings could be very useful for assessing the likely return period of this drought for regional water supply planning and drought operation. Recently, impacts on fisheries during the summer low flow period have led to questions concerning the reservoir operating policies that are designed to avert the 1960s drought risk (Kolesar and Serio 2011).

Numerous studies have focused on the use of tree-ring widths for developing proxy climatic and hydrologic series using traditional regression techniques (Stockton and Jacoby 1976; Meko and Graybill 1995). Meko et al. (2007), Woodhouse et al. (2006), and Woodhouse and Lukas (2006a,b) used a stepwise linear regression approach to develop multicentury reconstructed streamflows and investigate the medieval drought in the upper Colorado River basin. Similarly, over the Northeast, Cook and Jacoby (1983) used canonical regression analysis to reconstruct the July–September streamflow for the Potomac River using tree-ring chronologies from nearby sites. Cook and Jacoby (1977) also examined the drought in the Hudson River Valley by reconstructing the Palmer Drought Severity Index (PDSI) using a stepwise regression analysis. Recently, Maxwell et al. (2011) reconstructed the Potomac River streamflow dating back to 950 using a network of tree-ring chronologies from multiple species. Kauffman and Vonck (2011) investigated the frequency and intensity of extreme drought over the lower Delaware River basin, specifically at the mouth of the Delaware River, using a reconstructed PDSI.

These paleoreconstruction methods use a regression model fit to the observed streamflow using tree-ring chronologies as predictors. The streamflow data in the preinstrumental (paleo) period are then obtained by applying the estimated regression coefficients to the paleo-period tree-ring indices. The paleoreconstruction often considers multiple proxies and multiple hydroclimatic records to be reconstructed [e.g., gridded PDSI reconstruction as in Cook et al. (1999) or temperature reconstruction over spatial grids using multiple proxies as in Tingley and Huybers (2010a,b)]. The resulting multivariate regression problem can be high dimensional. Given a finite dataset, a practical question is how best to estimate parameter uncertainty. Further, the records often have varying length, and long gaps in data can also pose estimation problems. In this paper we consider only the continuous common record and do not consider expectation–maximization (EM) or related algorithms (Dempster et al. 1977; Schneider 2001) for gap filling. A successful model also needs to preserve the correlation of streamflow across sites to properly constrain stochastic simulations of multireservoir operation (Gangopadhyay et al. 2009).
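The calibrate-then-apply procedure described above can be sketched in a few lines of Python. This is a minimal illustration with a single predictor and ordinary least squares, not the model developed in this paper; the function names `fit_ols` and `reconstruct`, and all numbers, are hypothetical.

```python
import math

def fit_ols(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b

def reconstruct(paleo_index, a, b):
    """Apply the calibrated coefficients to preinstrumental ring indices
    and back-transform from log space to flow units."""
    return [math.exp(a + b * xi) for xi in paleo_index]

# Hypothetical calibration data (illustrative values only).
ring_index = [0.8, 1.0, 1.2, 0.9, 1.1]   # tree-ring index, instrumental period
log_flow = [1.6, 2.0, 2.4, 1.8, 2.2]     # log of observed seasonal flow

a, b = fit_ols(ring_index, log_flow)       # calibration step
paleo_flow = reconstruct([0.7, 1.3], a, b) # reconstruction step
```

A point estimate of this kind carries no statement of parameter uncertainty, which is the gap the Bayesian treatment is meant to address.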

In this paper, we present a hierarchical Bayesian regression (HBR) model for inferences on the posterior probability distribution of the regression coefficients and streamflow values at multiple locations of interest using recently developed tree-ring chronologies in the upper Delaware River basin. We adopt a multilevel model framework that provides an elegant means of propagating the parameter uncertainties through appropriate conditional distributions. Further, noting that multiple correlated predictors (regional tree-ring chronologies) from different species may inform streamflow reconstruction in a similar way, the hierarchical model provides for partial pooling of this common information. Partial pooling reduces the equivalent number of independent parameters, resulting in lower uncertainty in parameter estimates, and therefore leads to reduced uncertainty in the reconstructed streamflows. The multilevel or partial-pooling approach improves on estimation under full pooling, which ignores the cross-site variations in response, and under no pooling, which estimates independent regressions across the sites. Those two cases are subsets or end points of the model developed, and are compared as such.

Hierarchical Bayesian models have been used previously in the context of climate field reconstruction over spatial grids (Tingley and Huybers 2010a,b) and in reconstructing Northern Hemisphere temperature data using proxy datasets such as tree-ring measurements, pollen indices, and borehole temperatures, among others (Li et al. 2010). Similarly, dynamic Bayesian space–time models have also been used to develop long lead forecasting for tropical Pacific SSTs (Berliner et al. 2000). Several other hydrologic applications have been developed and demonstrated in Lima and Lall (2009, 2010) and Kwon et al. (2008, 2011). Readers can also refer to Wikle (2003), Raftery (1995), Gelman and Hill (2007), and Gelman et al. (2004) for additional information on hierarchical Bayesian model applications.

The paper is organized as follows. A brief description of the streamflow and tree-ring chronology data used in the study is provided in section 2. Section 3 contains a description of the proposed hierarchical Bayesian regression model and section 4 presents the results and analysis from the model. The drought characterization using the reconstructed streamflow data for the region is presented in section 5. Finally, in section 6, the key results are discussed and summarized.

## 2. Data description

### a. Streamflow data

The locations of the five major reservoirs, selected stream gauges, and the tree-ring sites in the upper Delaware River basin (DRB) are shown in Fig. 1. The DRB extends roughly 532 km from the confluence of the East and West Branches in New York to the mouth of the Delaware Bay, encompassing ~34 965 km^{2}, and includes the states of New York, New Jersey, Pennsylvania, and Delaware. In this study, we consider five major reservoirs in the upper DRB that serve as the primary water supply systems for the city of New York. Specifics of the U.S. Geological Survey (USGS) stream gauges on the major tributaries corresponding to inflow into each reservoir are provided in Table 1. For the purpose of this study, we selected the stream gauges (from the USGS National Water Information System) on the major creeks feeding into the reservoirs such that the inflows are not influenced by any upstream diversions or regulations. The drainage area and record length vary across the stations (Table 1). The Schoharie Creek (USGS gauge 1350000) has the longest (98 yr) record. All other stations have data records in the range of 50–64 years.

Table 1. Details of the stream gauges on major tributaries and the corresponding reservoir systems in the Delaware River basin. These reservoirs serve as the primary water supply systems for the city of New York. The table shows the number of years of streamflow record and the drainage area corresponding to each stream gauge.

### b. Tree-ring data

Table 2 shows the details of the seven new and one older collection [hemlock(2), *Tsuga canadensis*, in Mohonk, New York] of tree-ring chronologies developed from forests in the upper DRB. Note the multiple chronologies for each site. The chronology site locations are presented in Fig. 1. Of the seven chronologies, one was used in Pederson et al. (2004) (pitch pine in Mohonk), three were developed and used in Pederson (2005) (*Liriodendron tulipifera* and *Quercus prinus* in Montgomery Place and *Quercus prinus* in Middleburg), and three were used (now updated) in Cook and Jacoby (1977) [hemlock(1), *Tsuga canadensis*, pitch pine, *Pinus rigida*, and *Quercus* subgenus *leucobalanus* in Mohonk]. The hemlock(2) chronology was developed for the North American Drought Atlas (Cook et al. 1999, 2010). The recently updated Mohonk records, except for the *Betula lenta* chronology, are published here for the first time and are available from the International Tree-Ring Databank in their original form.

Table 2. Details of the tree-ring chronology data used in the study. The information regarding the site, species, number of trees per site, and number of cores sampled is given in the table. The species are TSCA (*Tsuga canadensis*), QUSP (*Quercus* subgenus *leucobalanus*), BELE (*Betula lenta*), PIRI (*Pinus rigida*, pitch pine), QUPR (*Quercus prinus*), and LITU (*Liriodendron tulipifera*). The Mohonk records are available from the International Tree-Ring Databank (ITRDB) in their original form.

Recent research indicates that using a larger number of species in tree-ring-based reconstructions could enhance the final reconstruction (Cook and Pederson 2010; Maxwell et al. 2011; Pederson et al. 2013). All chronologies used here have also been shown to be useful for drought index reconstructions by Cook and Pederson (2010). All time series of tree-ring measurements were processed using standard techniques (Stokes and Smiley 1968; Fritts 1976; Cook 1985; Cook and Kairiukstis 1990). Ring width series with growth distortions, rotten sections, or other gaps, including series from the data bank, were filled using the gap-filling option in auto-regressive standardization (ARSTAN) (see Pederson et al. 2004). All series were transformed using the adaptive power transformation and then standardized to conserve as much long-term variation in ring widths as possible while reducing the influence of nonclimatic forces such as changes in competition (Pederson et al. 2004). The “Friedman Super Smoother” was the primary option used to reduce the influence of disturbance in each series (Friedman 1984; Buckley et al. 2010). The Friedman Super Smoother would occasionally cause ring index inflation or deflation at either end of a series. In these few cases, a cubic smoothing spline two-thirds the length of each series was used instead (Cook and Peters 1981). The mean value for each year was calculated using a biweight robust function following standardization (Cook 1985).

### c. Diagnostic analysis and predictor selection

The tree-ring chronologies represent the annual growth cycle of the trees resulting from less dense (inner portion) early-wood formation during the photosynthetically active growing season (late spring and summer) and the more dense (outer portion) late-wood formation during the fall and winter. The rings vary in size each year depending upon the regional climate. Consequently, the tree rings (measured as the width of early wood plus late wood) are wider during years with adequate moisture availability and narrower during drought years. Hence, analogous to streamflow, the growth index is an integrator of moisture and energy availability in the region. This commonality between annual growth index and streamflow enables us to develop predictive models that can be utilized to understand the long-term variability of the climate in the region. The summer season average [June–August (JJA)] streamflow for each of the five major gauges in the upper DRB was identified for reconstruction under the hypothesis that the growing season of the trees, concurrent with the streamflow, may present the best sensitivity across flows and trees. This relatively dry period is also critical for reservoir operations given the fishing and ecological impacts. Preliminary analyses of the seasonality of inflows (Fig. 2a) show that typically 45% of annual inflows occur during March–May and the flows during JJA contribute 20% of the annual inflows. From a water management perspective, developing reconstructed inflows for the summer season is important to assess the frequency and recurrence of severe droughts and to better quantify the operational rule curves for downstream release purposes (Kolesar and Serio 2011). Summer (JJA) is also the growth season when the trees are photosynthetically active and thus most sensitive to moisture limitations and loss through transpiration.

Fig. 2. Diagnostic analysis of streamflows and tree-ring chronologies: (a) boxplot of monthly flows and plot of the mean monthly rainfall for the Schoharie station; (b) correlation coefficient between tree rings and annual average flows; and (c) correlation between tree rings and summer (JJA) season average streamflow at the five selected stations. The horizontal line marks the one-sided 95% significance level for correlation.

Citation: Journal of Climate 26, 12; 10.1175/JCLI-D-11-00675.1

For a preliminary assessment of this hypothesis, we computed the Pearson correlation coefficient between the tree-ring chronologies and 1) the annual average streamflows and 2) the JJA streamflows for the five stations (Fig. 2). The correlation coefficients that are statistically significant at the 95% confidence level for the given sample size are highlighted. One can see that all tree chronologies correlate better with the summer than with the annual streamflow. The tree ring–streamflow correlation with winter/spring flows was also lower than for the summer flows. Regional studies also show that the tree-ring response to regional climate is greatest during the active growing season (summer) (Cook and Jacoby 1977, 1983; Maxwell et al. 2011; Pederson et al. 2013).
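As a rough illustration of this screening step, the sketch below computes a Pearson correlation and an approximate one-sided 95% significance threshold for a given sample size. The threshold uses the Fisher z-transform approximation, which is an assumption made here for illustration; the text does not state how the significance line in Fig. 2 was computed. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def critical_r_one_sided(n, level=0.95):
    """Approximate smallest r that is significant at `level` (one-sided).

    Uses the Fisher z-transform: under the null hypothesis of zero
    correlation, atanh(r) is roughly Normal(0, 1/sqrt(n - 3)).
    """
    z = NormalDist().inv_cdf(level)
    return math.tanh(z / math.sqrt(n - 3))

# Threshold for a 50-yr common record (sample size taken from the text).
r_crit = critical_r_one_sided(50)
```

A chronology whose `pearson_r` with a station's JJA flow exceeds `r_crit` would pass this screen under the stated approximation.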

One can also observe from Fig. 2c that there is a potential opportunity for “grouping” or pooling the relationships across trees and streamflow. For instance, the correlation coefficient between the annual tree-ring growth index of hemlock [MH(1)] and the summer average streamflow of the five stations is in the range 0.35–0.4, indicating that the hemlock relationship to streamflow is similar across the five stations. Similarly, the correlation of the flows of the five stations to the tulip poplar species (MoTP) ranges from 0.6 to 0.75. This suggests that pooling the regression coefficients across stations with respect to a specific tree-ring chronology may be useful, while pooling regression coefficients for a streamflow station across different tree chronologies may not be as effective.

We also investigated the possibility of lagged correlation (e.g., *t*−1, *t*−2, *t*−3) between tree rings and streamflow that could result from longer use of stored energy in the trees from prior growth (Trumbore et al. 2002; Kagawa et al. 2006). However, we did not find any statistically significant relations (results not shown) at these lags. Hence, only the tree-ring chronologies of the current year were used to predict the summer seasonal streamflows.

The Shapiro–Wilk normality test (Royston 1995) applied to the log transformed summer seasonal (JJA) average flows for each station did not reject the null hypothesis of the transformed values being normally distributed at the 5% level (*p* values ranged from 0.05 to 0.27). The Box–Cox transform led to the same conclusion. The log transformed streamflows were modeled as the response variables, but all subsequent model validation results are presented in terms of the real space summer flows.

## 3. Hierarchical Bayesian regression: Methodology

A general multilevel modeling structure that allows pooling of information across stream gauges for regression on available tree-ring chronologies, and considers correlation of model residuals across stations was explored under the hierarchical Bayesian regression framework. We term the general model the “partial pooling model.” Several subsets of this model were explored to develop intuition as to how different end points of this model perform. We consider the following subsets: a no-pooling model, where regression coefficients for each streamflow series are modeled independently, and a full-pooling model, where all streamflow sequences have the same regression coefficient for a specific tree-ring series. In each case, we considered estimation of the full covariance matrix across sites of residuals as well as a diagonal structure that treats them as uncorrelated.
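To build intuition for how the three schemes relate, the following minimal Python sketch shrinks per-station slope estimates toward a common mean under a normal hierarchical model with known variances. It is an illustration only: the function name `partial_pool`, the slope values, and the fixed between-station standard deviation `tau` are assumptions made here, whereas the hierarchical Bayesian model estimates these quantities jointly from the data.

```python
def partial_pool(estimates, se, tau):
    """Shrink per-station slope estimates toward their common mean.

    estimates: no-pooling slope estimate for each station
    se:        standard error of each estimate (must be > 0)
    tau:       assumed between-station standard deviation of the slopes
    """
    # Precision-weighted estimate of the common mean.
    w = [1.0 / (s ** 2 + tau ** 2) for s in se]
    mu = sum(wi * b for wi, b in zip(w, estimates)) / sum(w)

    pooled = []
    for b, s in zip(estimates, se):
        if tau == 0:
            # Full pooling: every station takes the common value.
            pooled.append(mu)
        else:
            # Weight on the station's own estimate grows with tau;
            # a very large tau approaches the no-pooling estimate.
            w_own = (1 / s ** 2) / (1 / s ** 2 + 1 / tau ** 2)
            pooled.append(w_own * b + (1 - w_own) * mu)
    return mu, pooled

# Hypothetical slopes for one chronology at five stations.
slopes = [0.35, 0.40, 0.38, 0.36, 0.41]
se = [0.10] * 5
mu, pooled = partial_pool(slopes, se, tau=0.05)
```

Setting `tau=0` reproduces the full-pooling end point, and a very large `tau` approaches the no-pooling end point, which mirrors the way the two simpler models are described as subsets of the general one.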

### a. The general model (partial pooling)

Equation (1) represents the regression of […] *t*. MVN stands for multivariate normal distribution. The […] *i*. The […] *i*. Since the drainage area of the river basin varies across sites, the

Noting that the correlation of a given tree-ring chronology is very similar across the five stations, we consider a multilevel model that allows for pooling of information across stations for a given tree for estimating the regression slope parameters to reduce the associated uncertainty. The model has a multilevel structure where the model parameters are presumed to be drawn from a common distribution, whose parameters (e.g., the […] *ν*_{0} and *ν*_{1} degrees of freedom. In our applications, the scale matrices […] *ν*_{0} and *ν*_{1} were set to one more than the dimension of the matrix (i.e., the total number of predictors, 8, for […] *p*([…] | *y*) for the partial-pooling case of the complete parameter vector […] *p*([…] *n*_{i} is the number of observations at station *i* available for fitting the model.
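For orientation, a generic partial-pooling regression of the kind described in this section can be written as follows. This is a sketch under stated assumptions, not a recovery of the paper's Eq. (1): the symbols $\boldsymbol{\alpha}$, $\mathbf{B}$, $\boldsymbol{\Sigma}_y$, $\boldsymbol{\Sigma}_\beta$, $\boldsymbol{\Lambda}_0$, and $\boldsymbol{\Lambda}_1$ are introduced here for illustration, and the pairing of $\nu_0$ and $\nu_1$ with particular covariance matrices is assumed.

```latex
\begin{align}
% Likelihood: log summer flows at the S stations in year t,
% given the vector x_t of P tree-ring predictors
\log \mathbf{y}_t &\sim \mathrm{MVN}\!\left(\boldsymbol{\alpha} + \mathbf{B}\,\mathbf{x}_t,\; \boldsymbol{\Sigma}_y\right), \\
% Partial pooling: the slope vector of station i (row i of B)
% is drawn from a common distribution
\boldsymbol{\beta}_i &\sim \mathrm{MVN}\!\left(\boldsymbol{\mu}_{\beta},\; \boldsymbol{\Sigma}_{\beta}\right), \qquad i = 1, \dots, S, \\
% Inverse-Wishart priors on the covariance matrices
\boldsymbol{\Sigma}_y &\sim \text{Inv-Wishart}\!\left(\boldsymbol{\Lambda}_0, \nu_0\right), \qquad
\boldsymbol{\Sigma}_{\beta} \sim \text{Inv-Wishart}\!\left(\boldsymbol{\Lambda}_1, \nu_1\right).
\end{align}
```

In this notation, a no-pooling variant would replace the common prior on $\boldsymbol{\beta}_i$ with independent diffuse priors for each station, a full-pooling variant would set $\boldsymbol{\beta}_i = \boldsymbol{\beta}$ for all $i$, and a diagonal $\boldsymbol{\Sigma}_y$ would correspond to uncorrelated residuals across stations.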

### b. No-pooling model

[…] *ν*_{1} degrees of freedom. The joint posterior likelihood *p*([…] | *y*) for the no-pooling case is given as follows:

### c. Full-pooling model

[…] **β**. The posterior likelihood *p*([…] | *y*) is given as follows:

For each model, the parameters

## 4. Results and analysis

The fit of the three models (no pooling, full pooling, and partial pooling) was compared initially using the Bayesian deviance statistics, that is, the measure of lack of fit

### a. Comparison of regression parameters and the reconstructed flows

The posterior probability distributions of the regression slope parameter vector (i.e., […] *Tsuga canadensis* species) are illustrated in Fig. 3. The posterior distributions of the regression coefficients under two model schemes (with correlated errors and without correlated errors) are shown in separate plots. In each subset, the first column corresponds to the regression coefficient for the full-pooling case, where all stations have the same coefficient. This is followed by the boxplots of the regression coefficients for the five stations under no pooling and the five stations under partial pooling. The results for uncorrelated residuals are presented in Fig. 3a and those for the case where the covariance matrix is modeled are presented in Fig. 3b. The interquartile range (IQR) from the full-pooling, no-pooling, and partial-pooling models for the estimated

Fig. 3. Boxplots of the posterior distribution of the regression coefficients (i.e., […] *Tsuga canadensis*) for all stations: (left) diagonal covariance matrix for residuals and (right) full covariance matrix for residuals. The interquartile range across the stations for each model scheme is shown as a horizontal line (solid for partial pooling, and dashed for no pooling).

Interquartile ranges for the regression coefficients (i.e.,

The reduction in variance of the regression coefficients in turn leads to a reduction in the uncertainty in the streamflow estimates. An examination of the correlation structure resulting from the posterior distribution from the no-pooling, full-pooling, and partial-pooling models of the resulting

A boxplot of the posterior distribution of the mean of the vector of regression coefficients across sites (i.e., *μ*_{β}) […] [… (*Tsuga canadensis*) chronology] have positive coefficients. Given the high correlation in the chronologies of the tree species [0.8 for MH(1) and MH(2)], the coefficients are expected to be negatively correlated. The off-diagonal elements in the estimated

Fig. 4. Boxplots of the *μ*_{β} coefficients from the partial-pooling hierarchical Bayesian regression model. MH(1) and MH(2) are strongly correlated series.

The posterior probability distributions of the reconstructed flows from the no-pooling approach and the partial-pooling approach for the Roundout and Canonsville stations during the period 1754–2000 are compared in Fig. 5. For the sake of brevity, all subsequent results are presented for only the no-pooling and partial-pooling models, with results from the full-pooling model discussed only where appropriate. Predicting or extrapolating data back in time using the tree-ring indices may be associated with a larger uncertainty due to potential extrapolation of the fitted data. From a comparison of the uncertainty bands (5th and 95th percentiles) in Fig. 5, we can see that the partial-pooling HBR approach results in a modest reduction in uncertainty in estimating the posterior distribution of the flows. This can also be seen from Fig. 5c, which shows the boxplot of the band width (i.e., the difference between the 95th percentile and the 5th percentile) for the 247 years from the no-pooling and partial-pooling models. The reduction in uncertainty is similar for other stations.

Fig. 5. JJA reconstructed seasonal average streamflow for the Roundout and Canonsville stations from (a) no-pooling traditional regression and (b) partial-pooling hierarchical Bayesian regression, along with the uncertainty bands that represent the 5th and 95th percentile flows. (c) Boxplots of the width of the uncertainty bands (95th percentile − 5th percentile) for the two models.

The correlation of flow across streamflow stations is also estimated from the posterior distribution of the flows. The observed correlation between stations ranged between 0.92 (for Neversink and Roundout) and 0.74 (for Neversink and Canonsville). The median correlation between Neversink and Roundout estimated from the 1000 posterior draws is 0.89 with the interquartile range between 0.88 and 0.93. Similarly, the median of the correlation between Neversink and Canonsville estimated from the 1000 posterior draws is 0.72 with the interquartile range between 0.65 and 0.79. Results for other stations are similar. In the next section, we present the results from cross-validation over varying calibration periods using performance statistics common to the tree-ring reconstruction literature.

### b. Validation tests under varying calibration periods

We used two performance metrics, reduction of error (RE) and coefficient of efficiency (CE), as measures of model performance to compare the reconstructed posterior mean of the streamflow estimates with the actual streamflow data. These metrics were estimated using the leave-*m*-out cross-validation method. The procedure is carried out by leaving out *m* randomly selected data points from the observational dataset for validation, and the model is developed using the remaining (*n − m*) observations (*n* is the total number of observational data points). This process is repeated several times to obtain an ensemble of validation metrics resulting from each randomly selected model. We used the 50-yr (1950–99) common data period across all streamflow stations for this purpose. The cross-validation approach taken was to draw a sample of size 40 from the 50-yr common record without replacement and fit the Bayesian model on this dataset; predictions were then made on the 10 observations that were left out. This procedure was repeated 50 times to compute the validation statistics. Note that analysts who fit Bayesian models are typically interested in the coverage rates (discussed later), uncertainty levels, and model checking using the posterior draws. However, we include comparisons across the cross-validated statistics here so that traditional tree-ring analysts who use such a procedure are given a benchmark consistent with their approach. The entire Bayesian fitting process is repeated for each sample of size 40.
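The resampling scheme can be sketched as follows. The sizes (50 years, 10 held out, 50 repetitions) come from the text; the function name `leave_m_out_splits` and the fixed seed are illustrative assumptions, and the model-fitting step itself is not shown.

```python
import random

def leave_m_out_splits(n, m, repeats, seed=0):
    """Generate (calibration, validation) index lists for leave-m-out CV.

    Each repetition holds out m of the n years at random, without
    replacement, and calibrates on the remaining n - m years.
    """
    rng = random.Random(seed)
    indices = list(range(n))
    splits = []
    for _ in range(repeats):
        held_out = sorted(rng.sample(indices, m))
        held = set(held_out)
        calibration = [i for i in indices if i not in held]
        splits.append((calibration, held_out))
    return splits

# 50-yr common record, 10 years held out, 50 repetitions.
splits = leave_m_out_splits(n=50, m=10, repeats=50)
```

Each `(calibration, held_out)` pair would then drive one complete refit of the model, with predictions evaluated only on the held-out years.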

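The reduction of error is defined as

$$\mathrm{RE} = 1 - \frac{\sum_{i} (x_i - \hat{x}_i)^2}{\sum_{i} (x_i - \bar{x}_c)^2},$$

where *x*_{i} and *x̂*_{i} are the observed and reconstructed streamflow in year *i* of the validation period and *x̄*_{c} is the mean of the observed streamflow in the calibration period. […] *R*^{2} statistic. RE > 0 indicates that the reconstructed streamflow contains useful information not contained in the calibration period. Similarly, RE < 0 indicates that the reconstructions are poorer than climatology; that is, the reconstructions are not better than the mean flows in the calibration period. The coefficient of efficiency is defined as

$$\mathrm{CE} = 1 - \frac{\sum_{i} (x_i - \hat{x}_i)^2}{\sum_{i} (x_i - \bar{x}_v)^2},$$

where *x*_{i} and *x̂*_{i} are the observed and reconstructed streamflow in year *i* of the validation period, and *x̄*_{v} is the mean of the observed streamflow in the validation period.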

The results for RE and CE performance under cross-validation for both no pooling and partial pooling for each station are shown in Fig. 6. We observe that under most calibration periods, both the no-pooling and partial-pooling methods show RE and CE greater than zero, indicating that the reconstructed streamflows (from both methods) contain useful information not contained in the calibration period. Further, we also observe that, on average, the RE and CE across all the validation periods from the partial-pooling HBR method is comparable to or better than the no-pooling method for the stations. The improved average metrics for the HBR method reflect the reduction in the uncertainties in estimating the model parameter and the resulting flows. A comparison of the average bias and variance of the estimates from both the no-pooling and partial-pooling methods showed that the reduction in average error for the partial-pooling method is primarily due to a lower parameter variance.
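For reference, RE and CE as commonly defined in the tree-ring literature can be computed as in the sketch below: both compare the squared reconstruction error over the validation years with the squared error of a constant benchmark, which is the calibration-period mean for RE and the validation-period mean for CE. The function names are hypothetical, and the definitions are the standard ones rather than formulas taken from this text.

```python
def reduction_of_error(obs, rec, calib_mean):
    """RE: skill relative to predicting the calibration-period mean."""
    sse = sum((o - r) ** 2 for o, r in zip(obs, rec))
    ref = sum((o - calib_mean) ** 2 for o in obs)
    return 1.0 - sse / ref

def coefficient_of_efficiency(obs, rec):
    """CE: skill relative to predicting the validation-period mean."""
    valid_mean = sum(obs) / len(obs)
    sse = sum((o - r) ** 2 for o, r in zip(obs, rec))
    ref = sum((o - valid_mean) ** 2 for o in obs)
    return 1.0 - sse / ref

# Hypothetical held-out observations and reconstructed values.
obs = [1.0, 2.0, 3.0]
perfect = [1.0, 2.0, 3.0]
climatology = [2.0, 2.0, 2.0]

re_perfect = reduction_of_error(obs, perfect, calib_mean=2.5)
ce_climatology = coefficient_of_efficiency(obs, climatology)
```

Because the validation-period mean is the best constant predictor for the validation years, CE can never exceed RE for the same reconstruction, which makes CE the stricter of the two tests.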

Fig. 6. Boxplots of (a) reduction of error (RE) and (b) coefficient of efficiency (CE) for the 50 randomly selected cross-validation cases.

In addition to computing the cross-validated RE and CE, the performance of the posterior probability distribution is assessed by examining the model’s ability to cover the observed flows within a specified credible interval. Here, we estimated the coverage rates (Li et al. 2010) for the 90% credible intervals for the validation periods for both models. For each validation period and each station, we count the number of failures, that is, the number of held-out observations that fall outside the 5th and 95th percentiles of the posterior distribution from the model calibrated on the remaining years. By totaling the failures from all the randomly selected models, we estimated the coverage rate as the percentage of the 500 (50 × 10 years) validation samples that are not failures. The average coverage rate across the stations is approximately 92% for the no-pooling model and 91.5% for the partial-pooling model, indicating the robustness of the fitted Bayesian models.
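The coverage calculation can be sketched as below, assuming that posterior predictive draws are available for every held-out observation. The helper names and the linear-interpolation percentile rule are assumptions for illustration; the coverage rate is taken as the share of observations that fall inside the 5th to 95th percentile band.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def coverage_rate(observed, posterior_draws, lower=5, upper=95):
    """Percentage of observations inside the [lower, upper] percentile band.

    observed[k] is a held-out flow; posterior_draws[k] is the list of
    posterior predictive draws for that same station-year.
    """
    failures = 0
    for obs, draws in zip(observed, posterior_draws):
        s = sorted(draws)
        if obs < percentile(s, lower) or obs > percentile(s, upper):
            failures += 1
    return 100.0 * (1 - failures / len(observed))

# Hypothetical example: identical draws 0..100 for four held-out values.
draws = [float(v) for v in range(101)]
rate = coverage_rate([50.0, 0.0, 100.0, 10.0], [draws] * 4)
```

For a well-calibrated 90% interval, the rate should be close to 90% when enough held-out observations are pooled.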

From the above results, we see that the performance of the partial-pooling HBR method for streamflow reconstructions of the Delaware River is comparable to or better than the no-pooling traditional regression method. In the next section, we use simulations of the reconstructed streamflow for regional drought characterization.

## 5. Drought characterization based on reconstructed streamflows

The drought of record in the region is the one from the early to mid 1960s (Namias 1966, 1967). A similar drought could cause severe stress on regional water resources, given the increased population and services today. Subsequent moderate droughts have led to water restrictions in the region due to reduced reservoir storages (NYCDEP 2011). In this section, we attempt to 1) characterize the duration and severity of the 1960s drought along with its return periods and 2) investigate any changes or trends in the extreme drought events using the reconstructed streamflow from the general partial-pooling model.

### a. Quantifying the duration and severity of droughts

We define a drought as an event during which the streamflow is continuously below a certain level. A schematic representation (based on historical data for the Canonsville station) of the drought statistics is shown in Fig. 7. For a selected threshold (90% of mean observed summer flows here), a drought event is defined as the sequence of years that are under the threshold, with event duration defined as the number of years the flow is continuously below the threshold. The magnitude or the severity is the cumulative deficit over the drought duration, estimated as the area between the threshold and the flow curve. The number of historical drought events and their severity for the Canonsville station is shown in Fig. 7b. The 1960s drought is seen to be the most severe in terms of duration (5 yr) and severity (a cumulative deficit of 132 × 10^{6} m^{3}). Hence, purely based on the historical record, the return period of this drought event is approximately 1 in 54 years.
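The run-based definition above translates directly into code. The sketch below scans a flow series once and reports each drought event as a start index, a duration in years, and a cumulative deficit below the threshold. The function name and the flow values are hypothetical; the threshold of 90% of the mean follows the text.

```python
def drought_events(flows, threshold):
    """Return (start_index, duration, severity) for each run below threshold.

    duration: number of consecutive years with flow below the threshold
    severity: cumulative deficit (threshold - flow) over those years
    """
    events = []
    start, deficit = None, 0.0
    for t, q in enumerate(flows):
        if q < threshold:
            if start is None:          # a new drought begins
                start, deficit = t, 0.0
            deficit += threshold - q
        elif start is not None:        # the current drought ends
            events.append((start, t - start, deficit))
            start = None
    if start is not None:              # series ends while still in drought
        events.append((start, len(flows) - start, deficit))
    return events

# Hypothetical annual summer flows (arbitrary units).
flows = [10.0, 8.0, 7.0, 12.0, 6.0, 11.0]
threshold = 0.9 * sum(flows) / len(flows)
events = drought_events(flows, threshold)
```

With these illustrative values the series contains two events, a 2-yr drought starting in the second year and a 1-yr drought in the fifth year.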

(top) Schematic representation of the duration and severity of drought and (bottom) drought events based on a selected threshold of 90% of the average streamflow in the historical streamflow data record of 50 years for the Canonsville station.

Citation: Journal of Climate 26, 12; 10.1175/JCLI-D-11-00675.1

### b. Quantifying the duration and severity of droughts based on reconstructed flows

We used the posterior probability distribution of the 247-yr-long reconstructed streamflow records, conditioned on tree-ring data, to quantify the duration and severity of droughts greater than or equal to the 1960s drought in the historical record for each station. For example, for the Canonsville station, where the 1960s drought had a duration of 5 yr and a severity of 132 × 10^{6} m^{3}, we identify all events that are at least as severe in terms of both duration and severity in each 247-yr simulation from the posterior probability distribution of the model. The return period of the drought can then be estimated from the number of such events in a 247-yr simulation. One thousand realizations, each of length 247 yr, were generated from the posterior distribution, and the number of events that exceed the duration and severity of the 1960s drought at each streamflow station was counted for each realization. The results from the partial-pooling model for the exceedance attributes of the severity and duration of the 1960s drought are shown in Fig. 8. From Fig. 8a we see that the median return period of the 1960s drought is around 80 yr for the region, with an interquartile range of 50 to 125 yr. However, the return periods for the Neversink and the Roundout stations, which have small drainage areas, are lower than the regional median. To understand the correlated nature of these droughts, Fig. 8b shows the histogram of the number of stations that are simultaneously under drought across all the simulations. Most commonly, all five stations are under drought at the same time. The case of a single station under drought corresponds to Neversink, which experiences droughts of the 1960s severity and duration more frequently than the other stations.

Further, the number of simulations that indicate an exceedance of the 1960s drought for Canonsville over the paleoreconstructed record is illustrated in Fig. 8c. Each bar shows the total number of posterior draws that indicate such a drought and is plotted at the year in which the simulated drought ends. A cluster of these years, as in 1965–70, indicates a high probability of such a drought. The 1960s period is striking in this regard; however, it lies within the fitting sample and is therefore expected to be more prominent. The years 1912–14, the 1850s–60s, 1790–1810, and the 1770s appear to be other periods of interest. Similar observations were also made by Pederson et al. in the Northeast region. The simulations also make it possible to analyze reservoir fill and drain probabilities as a function of drought intermittence and recurrence. Bayesian regime and changepoint analysis models integrated with the reconstruction model could also be employed to inform reservoir and drought management policies.
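The counting procedure described above can be illustrated with a short Python sketch. Several things here are assumptions rather than the authors' method: the return period is taken as the record length divided by the number of qualifying events (the text does not spell out the estimator), the function names are hypothetical, and the ensemble is synthetic AR(1) noise standing in for posterior draws of reconstructed flow.

```python
import random
import statistics
from typing import List, Optional, Tuple


def drought_events(flows, threshold) -> List[Tuple[int, float]]:
    """Return (duration, severity) for each run of years below `threshold`."""
    events, duration, deficit = [], 0, 0.0
    for q in list(flows) + [float("inf")]:   # sentinel closes a trailing run
        if q < threshold:
            duration += 1
            deficit += threshold - q
        elif duration:
            events.append((duration, deficit))
            duration, deficit = 0, 0.0
    return events


def return_period(flows, threshold, min_duration, min_severity) -> Optional[float]:
    """Assumed estimator: record length / number of events at least as long and severe."""
    n = sum(1 for d, s in drought_events(flows, threshold)
            if d >= min_duration and s >= min_severity)
    return len(flows) / n if n else None     # None: no such event in this draw


# Synthetic stand-in for posterior draws (NOT the paper's reconstruction):
# 1000 realizations of 247 yr of AR(1) noise around an arbitrary mean flow.
rng = random.Random(0)


def synthetic_draw(n_years: int = 247, rho: float = 0.3) -> List[float]:
    x, out = 0.0, []
    for _ in range(n_years):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(100.0 + 20.0 * x)
    return out


draws = [synthetic_draw() for _ in range(1000)]
periods = [return_period(d, threshold=90.0, min_duration=3, min_severity=40.0)
           for d in draws]
finite = [p for p in periods if p is not None]
median_rp = statistics.median(finite) if finite else None
```

Draws with no qualifying event have an undefined return period under this estimator and are left out of the median here; how such draws are treated affects the upper tail of the return period distribution.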

(a) Boxplots of the return period of the 1960s drought identified for the five stations from the 1000 simulations of the partial-pooling model; (b) histogram of the number of stations that are simultaneously under drought; and (c) time series marking Canonsville droughts with duration and severity greater than 1960s drought from posterior simulations, marked at the end of each drought.

### c. Trend analysis to detect changes in drought events

The analysis presented thus far examined the probability of occurrence of the 1960s drought using the long-run simulations of reconstructed streamflow for each station. In this section, we assess monotonic trends in the joint drought events from the 1000 posterior draws that exceed the 1960s drought threshold level using the Mann–Kendall nonparametric trend test (Mann 1945; Helsel and Hirsch 1992). The Mann–Kendall test is a rank-based test that is typically used for detecting trends in extremes and makes no assumption about the underlying distribution of the data (Helsel and Hirsch 1992). For each posterior draw of streamflow, we identify the drought events in the 247-yr simulation with a duration and severity greater than the target duration and severity, and apply the Mann–Kendall test for a monotonic increase or decrease in the incidence of these drought events. The incidence is recorded as a binary variable (1 for exceedance, 0 for nonexceedance). The trend statistic (Kendall's tau), summarized as percentiles across the 1000 simulations, ranges from −0.2 to 0.2, suggesting that there is little evidence for a monotonic trend in the incidence of droughts for the target threshold.
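For readers unfamiliar with the test, the following Python sketch shows one common form of the Mann–Kendall statistic applied to a binary incidence series. It is an illustration under stated assumptions, not the authors' implementation: the function name `mann_kendall` is hypothetical, tau is computed as S divided by the number of pairs (tau-a), and the variance uses the standard tie correction, which matters for a 0/1 series where most values are tied.

```python
import math
from typing import Dict, Sequence, Tuple


def mann_kendall(x: Sequence[float]) -> Tuple[int, float, float]:
    """Return (S, tau, Z) for the Mann-Kendall trend test on series `x`."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two observations")

    # S: number of later-minus-earlier pairs that increase, minus those that decrease.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = x[j] - x[i]
            s += (diff > 0) - (diff < 0)

    tau = s / (n * (n - 1) / 2)   # tau-a: S scaled by the number of pairs

    # Variance of S with the correction for groups of tied values.
    counts: Dict[float, int] = {}
    for v in x:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t * (t - 1) * (2 * t + 5) for t in counts.values())
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18

    # Continuity-corrected standard normal score; S = 0 gives Z = 0.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, tau, z


# Hypothetical incidence series: 1 marks a year with a qualifying drought.
incidence = [0, 0, 0, 1, 0, 0, 0, 0]
s, tau, z = mann_kendall(incidence)
```

Applying such a function to the incidence series of each posterior draw, and then taking percentiles of tau across draws, would give a summary of the kind reported above. The pairwise loop is quadratic in the series length, which is acceptable for 247-yr records.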

## 6. Discussion and summary

We deliberately restricted the scope of our hierarchical Bayesian model for streamflow reconstruction to 1) processed tree-ring chronologies as the primary predictors; 2) chronologies of a common length; 3) full record lengths only, that is, no expectation–maximization (Dempster et al. 1977; Schneider 2001) or similar algorithm for imputation of predictors or response variables; 4) no exogenous predictors for streamflow, such as drainage area; and 5) no consideration of the spatial structure of the river drainage network.

We were interested in seeing whether partial pooling through hierarchical Bayesian regression (HBR) offered 1) reductions in uncertainty, given the relatively short time series available in the Delaware basin for a small number of streamflow stations with relatively small drainage areas; 2) insights on how best to pool (or share information about the regression coefficients) across streamflow stations or trees; and 3) an ability to model multivariate correlations across sites and trees. These questions were explored with the Delaware application, with the idea that we would eventually build an HBR model that considers the entire reconstruction process more generally, relaxing the restrictions imposed here. For example, as in Lima and Lall (2010), one can easily extend the model to consider exogenous predictors for the slopes and intercepts and to link different levels together. Intuitively, such an approach is advantageous compared to some traditional applications where, for instance, only the conditional mean of one modeling step (e.g., tree-ring chronology processing) is used as “data” for the next modeling step (e.g., flow reconstruction). Extensions of this model to explore whether there are cyclical patterns (e.g., related to the North Atlantic Oscillation or other low-frequency climate modes) or hidden states may allow further investigation of drought onset and withdrawal as part of system operation. So far, very few such models have been pursued for this application. Integration of models such as the wavelet autoregressive models presented in Kwon et al. (2007) would be of interest in this regard.

A relatively simple HBR model structure was consequently used and justified in the application. We found that partial pooling across stations for a particular tree species, combined with modeling the correlation of residuals across streamflow stations, worked best for this dataset. This is also consistent with the biological and climate intuition of tree-ring specialists and hydrologists. Comparisons with traditional models in our preliminary work (not reported here) showed that the HBR is competitive or superior in terms of the validation statistics typically used by paleoclimate modelers. Under partial pooling, the HBR delivers reduced uncertainty in the reconstruction and improved cross-validation performance statistics, and it provides ways to assess the joint probability distribution of drought severity and duration and its uncertainty. Such information is important both in terms of adding value from long records and in terms of developing precision in the return period estimates. Finally, nonstationarity in drought incidence was explored for the longer record using simulations from the posterior density of the reconstructed flows, and we found no statistically significant evidence of a monotonic trend. The implications for water resource managers are that, at least for now, 1) drought planning and management could use the information from the historical record and the paleoreconstructions for the dry period in each year, 2) the return periods of droughts of different severity and duration can be estimated, including their uncertainty, and 3) synthetic streamflow sequences for summer period inflows into the reservoir system can be developed to explore the risk implications of different drought sequences.

## Acknowledgments

This research was supported through the Consortium on Climate Risk in the Urban Northeast (CCRUN), part of the NOAA RISA program and through the NSF Grant 0934516 titled “Reconstructing Climate from Tree Ring Data.” The authors thank Dr. Martin Tingley for his critical reviews that were very helpful in improving the analysis and presentation in the manuscript. The authors would also like to thank the other two anonymous reviewers whose valuable comments led to significant improvements in the manuscript.

## REFERENCES

Berliner, L. M., C. K. Wikle, and N. Cressie, 2000: Long-lead prediction of Pacific SSTs via Bayesian dynamic modeling. *J. Climate*, **13**, 3953–3968.

Buckley, B. M., and Coauthors, 2010: Climate as a contributing factor in the demise of Angkor, Cambodia. *Proc. Natl. Acad. Sci. USA*, **107**, 6748–6752.

Cook, E. R., 1985: A time series analysis approach to tree-ring standardization. Ph.D. dissertation, University of Arizona, 342 pp.

Cook, E. R., and G. Jacoby, 1977: Tree-ring drought relationships in the Hudson Valley, New York. *Science*, **198**, 399–401.

Cook, E. R., and K. Peters, 1981: The smoothing spline: A new approach to standardizing forest interior ring-width series for dendroclimatic studies. *Tree-Ring Bull.*, **41**, 45–53.

Cook, E. R., and G. Jacoby, 1983: Potomac River streamflow since 1730 as reconstructed by tree rings. *J. Climate Appl. Meteor.*, **22**, 1659–1672.

Cook, E. R., and L. A. Kairiukstis, 1990: *Methods of Dendrochronology: Applications in the Environmental Sciences.* Kluwer Academic, 304 pp.

Cook, E. R., and N. Pederson, 2010: Uncertainty, emergence, and statistics in dendrochronology. *Dendroclimatology: Progress and Prospects,* M. K. Hughes, T. W. Swetnam, and H. F. Diaz, Eds., Vol. 11, *Developments in Paleoenvironmental Research,* Springer Verlag, 77–112.

Cook, E. R., D. Meko, D. Stahle, and M. Cleaveland, 1999: Drought reconstructions for the continental United States. *J. Climate*, **12**, 1145–1162.

Cook, E. R., R. Seager, R. R. Heim Jr., R. S. Vose, C. Herweijer, and C. Woodhouse, 2010: Megadroughts in North America: Placing IPCC projections of hydroclimatic change in a long-term palaeoclimate context. *J. Quat. Sci.*, **25**, 48–61.

Dempster, A. P., N. M. Laird, and D. B. Rubin, 1977: Maximum likelihood from incomplete data via the EM algorithm. *J. Roy. Stat. Soc.*, **39**, 1–38.

DRBC, cited 2007: Flexible flow management program. [Available online at http://water.usgs.gov/osw/odrm/.]

Friedman, J. H., 1984: A variable span smoother. Stanford University SLAC PUB-3477 STAN-LCS 005, 30 pp. [Available online at http://www.slac.stanford.edu/cgi-wrap/getdoc/slac-pub-3477.pdf.]

Fritts, H. C., 1976: *Tree Rings and Climate.* Academic Press, 567 pp.

Gangopadhyay, S., B. L. Harding, B. Rajagopalan, J. J. Lukas, and T. J. Fulp, 2009: A nonparametric approach for paleohydrologic reconstruction of annual streamflow ensembles. *Water Resour. Res.*, **45**, W06417, doi:10.1029/2008WR007201.

Gelman, A., 2005: Prior distribution for variance parameters in hierarchical models. *Bayesian Anal.*, **1**, 1–19.

Gelman, A., and D. B. Rubin, 1992: Inference from iterative simulation using multiple sequences. *Stat. Sci.*, **7**, 457–511.

Gelman, A., and J. Hill, 2007: *Data Analysis Using Regression and Multilevel/Hierarchical Models.* Cambridge University Press, 648 pp.

Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin, 2004: *Bayesian Data Analysis.* Chapman & Hall, 668 pp.

Gilks, W. R., and G. O. Roberts, 1995: Strategies for improving MCMC. *Markov Chain Monte Carlo in Practice: Interdisciplinary Statistics,* W. R. Gilks, S. Richardson, and D. Spiegelhalter, Eds., Chapman & Hall, 89–114.

Helsel, D. R., and R. M. Hirsch, 1992: *Statistical Methods in Water Resources.* Studies in Environmental Science Series, Vol. 49, Elsevier Science, 522 pp.

Kagawa, A., A. Sugimoto, and T. C. Maximov, 2006: 13CO2 pulse-labelling of photoassimilates reveals carbon allocation within and between tree rings. *Plant Cell Environ.*, **29**, 1571–1584.

Kauffman, G. J., and K. J. Vonck, 2011: Frequency and intensity of extreme drought in the Delaware basin, 1600–2002. *Water Resour. Res.*, **47**, W05521, doi:10.1029/2009WR008821.

Kolesar, P., and J. Serio, 2011: Breaking the deadlock: Improving water-release policies on the Delaware River through operations research. *INFORMS Interfaces*, **41**, 18–34.

Kwon, H.-H., U. Lall, and A. F. Khalil, 2007: Stochastic simulation model for nonstationary time series using an autoregressive wavelet decomposition: Applications to rainfall and temperature. *Water Resour. Res.*, **43**, W05407, doi:10.1029/2006WR005258.

Kwon, H.-H., C. Brown, and U. Lall, 2008: Climate informed flood frequency analysis and prediction in Montana using hierarchical Bayesian modeling. *Geophys. Res. Lett.*, **35**, L05404, doi:10.1029/2007GL032220.

Kwon, H.-H., U. Lall, and V. Engel, 2011: Predicting foraging wading bird populations in Everglades National Park from seasonal hydrologic statistics under different management scenarios. *Water Resour. Res.*, **47**, W09510, doi:10.1029/2010WR009552.

Li, B., D. W. Nychka, and C. M. Ammann, 2010: The value of multi-proxy reconstruction of past climate. *J. Amer. Stat. Assoc.*, **105**, 883–911.

Lima, C. H. R., and U. Lall, 2009: Hierarchical Bayesian modeling of multisite daily rainfall occurrence: Rainy season onset, peak and end. *Water Resour. Res.*, **45**, W07422, doi:10.1029/2008WR007485.

Lima, C. H. R., and U. Lall, 2010: Spatial scaling in a changing climate: A hierarchical Bayesian model for non-stationary multi-site annual maximum and monthly streamflow. *J. Hydrol.*, **383**, 307–318, doi:10.1016/j.jhydrol.2009.12.045.

Lorenz, E. N., 1956: Empirical orthogonal functions and statistical weather prediction. MIT Department of Meteorology Statistical Forecasting Scientific Rep. 1, 57 pp.

Lunn, D. J., A. Thomas, N. Best, and D. Spiegelhalter, 2000: WinBUGS - A Bayesian modelling framework: Concepts, structure, and extensibility. *Stat. Comput.*, **10**, 325–337.

Mann, H. B., 1945: Nonparametric tests against trend. *Econometrica*, **13**, 245–259.

Maxwell, R. S., A. E. Hessl, E. R. Cook, and N. Pederson, 2011: A multispecies tree ring reconstruction of Potomac River streamflow (950–2001). *Water Resour. Res.*, **47**, W05512, doi:10.1029/2010WR010019.

Meko, D. M., and D. A. Graybill, 1995: Tree-ring reconstruction of Upper Gila River discharge. *Water Resour. Bull.*, **31**, 605–616.

Meko, D. M., C. A. Woodhouse, C. H. Baisan, T. Knight, J. J. Lukas, M. K. Hughes, and M. W. Salzer, 2007: Medieval drought in the Upper Colorado River basin. *Geophys. Res. Lett.*, **34**, L10705, doi:10.1029/2007GL029988.

Namias, J., 1966: Nature and possible causes of the northeastern United States drought during 1962–1965. *Mon. Wea. Rev.*, **94**, 543–557.

Namias, J., 1967: Further studies of drought over northeastern United States. *Mon. Wea. Rev.*, **95**, 497–508.

NYCDEP, cited 2011: History of drought and water consumption. [Available online at http://www.nyc.gov/html/dep/html/drinking_water/droughthist.shtml.]

Pederson, N., 2005: Climatic sensitivity and growth of southern temperate trees in the Eastern US: Implications for the carbon cycle. Ph.D. dissertation, Columbia University, 186 pp.

Pederson, N., E. R. Cook, G. C. Jacoby, D. M. Peteet, and K. L. Griffin, 2004: The influence of winter temperatures on the annual radial growth of six northern-range-margin tree species. *Dendrochronologia*, **22**, 7–29.

Pederson, N., A. R. Bell, E. R. Cook, U. Lall, N. Devineni, R. Seager, K. Eggelston, and K. J. Vranes, 2013: Is an epic pluvial masking the water insecurity of the Greater New York City region? *J. Climate*, **26**, 1339–1354.

Raftery, A., 1995: Bayesian model selection in social research. *Sociol. Methodol.*, **25**, 111–163.

Royston, P., 1995: Remark AS R94: A remark on algorithm AS 181: The W test for normality. *Appl. Stat.*, **44**, 547–551.

Schneider, T., 2001: Analysis of incomplete climate data: Estimation of mean values and covariance matrices and imputation of missing values. *J. Climate*, **14**, 853–871.

Spiegelhalter, D., A. Thomas, N. Best, and W. Gilks, 1996: BUGS 0.5: Bayesian inference using Gibbs sampling manual (version ii). Medical Research Council Biostatistics Unit Manual, 59 pp.

Stockton, C. W., and G. C. Jacoby, 1976: Long-term surface-water supply and streamflow trends in the Upper Colorado River basin based on tree-ring analyses. National Science Foundation Lake Powell Research Project Bull. 18, 70 pp.

Stokes, M. A., and T. L. Smiley, 1968: *An Introduction to Tree-Ring Dating.* University of Arizona Press, 73 pp.

Tingley, M. P., and P. Huybers, 2010a: A Bayesian algorithm for reconstructing climate anomalies in space and time. Part I: Development and applications to paleoclimate reconstruction problems. *J. Climate*, **23**, 2759–2781.

Tingley, M. P., and P. Huybers, 2010b: A Bayesian algorithm for reconstructing climate anomalies in space and time. Part II: Comparison with the regularized expectation–maximization algorithm. *J. Climate*, **23**, 2782–2800.

Trumbore, S., J. Gaudinski, P. Hanson, and J. Southon, 2002: Quantifying ecosystem–atmosphere carbon exchange with a 14C label. *Eos, Trans. Amer. Geophys. Union*, **83**, 265–268.

Wikle, C. K., 2003: Hierarchical Bayesian models for predicting the spread of ecological processes. *Ecology*, **84**, 1382–1394.

Woodhouse, C. A., and J. Lukas, 2006a: Drought, tree rings and water resource management in Colorado. *Can. Water Resour. J.*, **31**, 1–14, doi:10.4296/cwrj3104297.

Woodhouse, C. A., and J. Lukas, 2006b: Multi-century tree-ring reconstructions of Colorado streamflow for water resource planning. *Climatic Change*, **78**, 293–315, doi:10.1007/s10584-006-9055-0.

Woodhouse, C. A., S. T. Gray, and D. M. Meko, 2006: Updated streamflow reconstructions for the Upper Colorado River basin. *Water Resour. Res.*, **42**, W05415, doi:10.1029/2005WR004455.