1. Introduction
A recent outlook report by the Organization for Economic Cooperation and Development highlights key determinants of future agricultural sector performance (OECD 2008). Developments in world-market prices of inputs to production, trends in commodity prices, technological sector innovations, and national food strategies and trade policies of the big producers are considered to be important. At the same time, the authors highlight the relevance of sufficient water supply capacities for sustaining and increasing agricultural production (see also Rockstrom et al. 2007). They also mention the mounting concern over the central role that irrigation plays in agricultural production given 1) inadequate maintenance and development of surface water storage technologies, 2) the widespread regional depletion of aquifers in many countries, 3) ongoing soil salinization in irrigated lands, 4) significant pollution of surface and subsurface waters, and 5) changes in the regional availability of renewable water due to climate change and variability.
At the global scale, irrigated agriculture accounts for less than 20% of the total cropland area, yields 40% of agricultural output, and consumes nearly 70% of total developed water supply (Shiklomanov 2000; UNDP 2006). Without large-scale improvements in irrigation efficiency, ensuring stable and increasing irrigation by guaranteeing water supplies of sufficient quantity and quality may become progressively more difficult, especially in water-scarce semiarid and arid regions (Fekete et al. 2004; Vorosmarty et al. 2000; Kundzewicz et al. 2008). The challenge is particularly acute in developing and transitional economies such as India and China, where many of the problems listed above are pervasive because of extreme population pressure and persistent rural poverty (Alley et al. 2002; Chao et al. 2008; UNDP 2006). Adequate short-term management and sound long-term planning of water resources are key prerequisites for addressing these issues.
Crop choice and the area set aside for production, in addition to climatological factors and management techniques, determine water use in agriculture. So far, efforts to understand agricultural response to variability in economic and climatic boundary conditions have focused primarily on modeling agricultural yields at different scales (e.g., see the review by Challinor et al. 2009, and references therein). However, accurately forecasting the timing and magnitude of important climate phenomena (e.g., dry days, heat days, monsoons, seasonal floods, and seasonal rainfall) and assessing their influence on agriculture at seasonal to annual scales could help stakeholders and policy makers to better determine overall allocation needs. Such forecasts could also inform on tradeoffs between different supply options and on how these can translate into mutual gains for farmers, government, and the environment alike.
Most importantly, in agroeconomic systems where command and control is absent and where a large number of individual farmers react to varying economic and environmental conditions, this forecast information could contribute to more efficient and sustainable conjunctive management of surface and subsurface reservoirs as a crucial part in improved input-side planning. Resulting increases in the reliability of water supply would then directly reduce the farmers’ production risk and thus translate into tangible gains throughout the agricultural sector and the larger economy (Singh et al. 2007). This is widely considered a key goal in improving the livelihoods of rural populations (Faures and Santini 2008).
With the freshwater allocation problem in focus, our study addresses the issues surrounding input-side planning through 1) illustrating the importance of a complex network of interactions and tradeoffs between surface and subsurface water supplies used for irrigated agriculture in dry regions; 2) the formulation of a flexible, data-driven approach at district scale where a set of hydroclimatological predictors are utilized to model irrigated crop area for two major crops in the region under consideration; and 3) the application of this model as both a diagnostic and possibly predictive tool for modeling farmers’ responses to environmental conditions to improve water resources management and planning.
We focus on the Telangana region and eight of its nine districts in the state of Andhra Pradesh (AP) in India (see map in Fig. 1). This region is chosen for a number of reasons, not the least being the availability of extensive hydroclimatological and agricultural data for at least the past 35 yr. Special attention is paid to the dry-season (Rabi) agriculture because of the heavy reliance on groundwater reserves and other forms of stored and imported water during this period of low rainfall.
The plan of the paper is as follows: The next section introduces agricultural practices and climate and water resources over the Telangana region and its districts. We then present the data and potential hydroclimate predictors used in our case study. The methods section introduces our statistical modeling approach to investigate the influence of hydroclimate predictors and, to a lesser extent, social/technological influences on irrigated area. Our methods include a new application of a machine learning approach to model selection and an ensemble Gaussian process modeling strategy to improve model predictive capability and fully account for uncertainty. Results of the modeling study and discussion follow the methods section. The paper concludes with a discussion and general recommendations.
2. Irrigated agriculture in Telangana, Andhra Pradesh
a. Regional characteristics
Telangana is geographically homogeneous. The mean annual rainfall in the Telangana region is 940 mm, with the southeastern districts Mahabubnagar (MAH) and Nalgonda (NAL) having considerably lower mean values of 713 and 741 mm, respectively. More than 75% of the total precipitation falls during the wet Kharif season from June through September. Temperature ranges from 28° to 37°C in the dry season (Rabi) from November until the end of May and from 21° to 25°C during the southwest monsoon season (Kharif) from June to October (Singh et al. 2007). Droughts are recurring phenomena in the Telangana region. Two major rivers drain into the region: the Godavari River to the north and the Krishna River to the south.
As with most of the semiarid regions in South Asia, Telangana has witnessed a dramatic increase in irrigated agriculture over the last 30 yr, especially in terms of the growth in total area irrigated. The food crop mix in Telangana is dominated by rice, maize, wheat, jowar (sorghum), and bajra (pearl millet). Total irrigation water requirements for these five major crops have increased by more than 50% over the same period. Figure 2 shows the development of sourcewise and total irrigated area for the Telangana region (data sources: World Bank and Center for Monitoring of Indian Economy). The ratio of gross irrigated rice and maize area to total gross irrigated area (GIA) has declined from 90% in the 1970s to approximately 60% in 2004, thus indicating a move toward greater diversity in agricultural production.1 Still, the water-intensive maize and rice crops make up more than 70% of the GIA under food crops alone. Using standard crop coefficients and the assumption of 1.5 m for evapotranspiration, it can be estimated that rice and maize accounted for more than 90% of total irrigation water requirements from 1970 to 2005 and thus may be considered the dominant players.2
The net irrigated area (NIA) from groundwater has expanded from 1970 to 2000 from 800 000 ha to 1 600 000 ha with a recent decline to 1 200 000 ha (see Fig. 2).3 This corresponds to an increase in the fraction of total NIA due to groundwater from 20% to 80% (i.e., NIA from all supply sources). Over the same time period, GIA doubled from 1 million hectares (mio ha) to 2 mio ha with the recent decline to 1.5 mio ha after 2000. Rabi irrigated area (RIA) exhibits a linearly increasing trend with large fluctuations and a total growth from 300 000 to 500 000 ha with a decline to 400 000 ha during the recent droughts.4
Interestingly, per area irrigation water requirements have declined from 1100 to 900 mm over the same period, which corresponds approximately to a 20% reduction. This is due to shifts in the crop mix as well as more efficient overall agricultural production. For paddy rice yields, for example, a linear technology trend is clearly visible from 1968 to 2005 that follows closely the statewide trend in Andhra Pradesh, despite the unsuitability of the climate for paddy rice production in the region. Such increases in per area production outcomes clearly reflect increases in the quality of production factors such as high yielding varieties and increased use of fertilizers as well as better agricultural management. Nonetheless, the expansion of irrigation has all but offset any water savings through efficiency gains and clearly played the key role in the rapid growth of agricultural production in the region (see also Vakulabaharanam 2004).
Project canal and tank irrigated area as a fraction of total NIA declined from 80% to 20%. As a result, in 2004, for example, irrigated area from groundwater wells was more than 3.5 times larger than the area under irrigation from other sources (see Fig. 2). The decline in areas served by minor irrigation tanks is noteworthy and is generally attributed to the lack of maintenance and siltation problems. The decline in areas irrigated by other wells (dug wells and bore-cum-dug wells) can be explained by the growing importance of deep tube wells to allocate water from ever-increasing depth to groundwater due to falling water tables.
b. Causes and consequences of groundwater depletion
As in many other regions in India, allocation decisions in the farm sector in Telangana are greatly distorted because of input subsidies, most importantly the provision of electricity for groundwater pumping at below generation and transmission cost, and market sector subsidies (i.e., minimum support prices). In Andhra Pradesh, for example, agricultural electricity consumption corresponds to approximately 40% of total power consumption. However, it contributes only marginally (4% in 1999–2000) to total power sector revenues (S. P. Tucker 2008, Secretary of Irrigation, Andhra Pradesh Government, personal communication).5 Lacking a customer–provider business relationship between the electricity utility and the farmer, there exists neither the means nor the incentive from the side of the utilities to provide reliable 24/7 power supply.
Farmers too, facing at most a low flat tariff, have no incentive to invest in appropriate and efficient pumps and piping and in fact are encouraged to invest in pumps that help them deal with the low-quality rural power supply, resulting in further inefficient water use and with major consequences to their income (access to water when needed, equipment failure, etc.). On top of that, farmers’ behaviors (e.g., crop choices and area irrigated) suggest that they do not take into account the in situ value of groundwater. That is, the present value of future sacrifices (due to future unavailability of groundwater) associated with the current use of groundwater is considered to be small, if not nil, from their perspective. In fact, the common-property nature of groundwater together with farmers’ risk aversion causes them to have unusually high discount rates, which can explain this (Binswanger 1980).
From the farmers’ perspective, whatever is not consumed today on their part might no longer be available tomorrow because of competitive abstraction in the vicinity of their own plots. So, prior to each agricultural season, the farmers measure the level of groundwater tables with a plummet. Whereas smaller drawdowns mean more water available for dry-season irrigation, higher drawdowns (lower water tables) constrain the size of irrigated perimeters, and the farmers’ experience then lets them translate a certain aquifer filling stage into an acreage that can be irrigated safely. This has resulted in abstraction levels exceeding recharge and a gradual decline in the water table. In many places, water has simply run out, and whatever is recharged by the monsoon is all used up within the same year, as far as pumping depth and capacity allow.
Observations of water table time series in the Telangana region show that districtwide mean drawdown levels (i.e., the depth at which groundwater can be found relative to the surface elevation in the region) have increased twofold to threefold over the last 30 yr, with high spatial variability and locally significantly higher values. To assess the severity of groundwater mining, the Groundwater Department of AP characterizes groundwater developments in 1230 watersheds, called units, with each one covering roughly 300 km². The latest assessment reveals that 118 units (corresponding to 219 mandals)6 are categorized as overexploited, 79 units are categorized as critical (77 mandals), 188 units are categorized as semicritical (175 mandals) and 772 units are categorized as safe units (P. Raj 2008, personal communication; see also Raj et al. 1996; Raj 2004a,b, 2006).7 As a result, the region faces considerable uncertainty with respect to the availability and sustainability of groundwater supplies.8
In summary, farmers and utilities are locked in a suboptimal energy, water, and agricultural production equilibrium (Shah et al. 2003, 2007; Shah 2008). Unfortunately, political circumstances prevent the system from moving to a more mutually beneficial state of nondistortive pricing and better infrastructure. Whenever a sitting government attempts to push through a water pricing reform, it is voted out of power in the next election round by populist parties that hijack the environmental sustainability agenda in favor of a pro-poor, active livelihood support program through subsidies.
What are the consequences of groundwater depletion? Groundwater is a convenient perennial source of water that modulates the variability in surface water supplies. It performs the dual role of increasing the mean and reducing the variability of the total water supply at any given location (Gemma and Tsur 2007). Resource depletion translates into a reduction of buffering capacity and value (Tsur and Graham-Tomasi 1991). In other words, the future ability to mitigate potential climate variability including drought gets undermined by current exploitation strategies; as a result, farmers are again exposed to unpredictable climatic conditions, especially during the dry season. In a monsoon-type climate, this is all the more critical due to the short-duration and high-intensity characteristics of precipitation events.
Apart from this intertemporal inefficiency, falling subsurface water tables also decrease the poor farmers’ ability to access groundwater because of their limited capital means to continuously lower their wells to chase the groundwater to ever-increasing depths. Furthermore and from a societal perspective, the energy required to pump a given amount of water out of the ground is proportional to the depth from which it is pumped. Because energy for pumping is state subsidized, the burden on society grows correspondingly to the extent that overall benefits from agricultural production get eroded away by the total costs from energy supply.
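The proportionality between pumping energy and depth can be made concrete with a minimal sketch based on the standard hydraulic relation E = ρ g V h / η. The overall pump efficiency η and the example volume and depths below are illustrative assumptions and are not values from this study.

```python
# Illustrative sketch: energy needed to lift groundwater scales linearly with depth.
RHO = 1000.0  # density of water, kg m^-3
G = 9.81      # gravitational acceleration, m s^-2


def pumping_energy_kwh(volume_m3: float, depth_m: float, efficiency: float = 0.4) -> float:
    """Electrical energy (kWh) required to lift volume_m3 of water by depth_m.

    The default efficiency is a hypothetical overall pump-set efficiency.
    """
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must lie in (0, 1]")
    joules = RHO * G * volume_m3 * depth_m / efficiency
    return joules / 3.6e6  # 1 kWh = 3.6e6 J


if __name__ == "__main__":
    # Hypothetical example: the same 1000 m^3 lifted from 10 m and from 30 m.
    shallow = pumping_energy_kwh(1000.0, 10.0)
    deep = pumping_energy_kwh(1000.0, 30.0)
    print(f"10 m: {shallow:.1f} kWh; 30 m: {deep:.1f} kWh; ratio: {deep / shallow:.1f}")
```

Under these assumptions, a threefold increase in pumping depth triples the energy required for the same volume of water, and with it the subsidy burden.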
Also, depletion can induce a deterioration of water quality in the aquifer where pollution sources are present (Zammouri et al. 2008). Finally, it often leads to a desiccation of shallow rooting natural vegetation, because the plants lose access to the soil moisture that is critically required for them to survive the dry season (Reddy 2005; Dubash 2002; Bear et al. 1999; Lahm and Bair 2000; Scanlon et al. 2007; Central Groundwater Board 2007; Harvey et al. 2002; Cooper et al. 2006).
In summary, in regions where irrigated agriculture depends on access to groundwater, the depletion of these resources is inefficient from an economic point of view, impacts natural ecosystems in an unsustainable way and thus might also cause irretrievable loss of biodiversity, and raises equity issues because society’s poor are disproportionately affected. It is thus important to acknowledge that there is a limit to the extent that groundwater can be extracted in a sustainable way from shallow aquifers. Especially in the dry regions and in the development context where regulation of subsurface water use is most often absent and where societies crucially depend on access to these common-property resources, falling groundwater tables, with corresponding environmental and societal consequences, are very problematic. They deserve the fullest attention of decision makers and stakeholders alike.
c. What is needed?
Clearly, the governmentally subsidized expansion of irrigated agriculture did not take into account the finiteness of the groundwater resources, nor did it respect any sustainability constraints of the subsurface resources. The largely unseen nature of groundwater led planners and farmers alike to believe that the groundwater era was indeed a golden one, with unbounded quantities of irrigation water suddenly becoming available because of technological advances. In reality, it is the climate boundary condition that ultimately determines the recharge to groundwater (i.e., the percentage of total precipitation that ends up in the subsurface). From a management perspective, long-term average recharge values should thus serve as an upper limit to aggregate extraction activities, which themselves are driven by the total acreage of irrigated area, if the resources are to be preserved.
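The recharge constraint can be illustrated with a back-of-the-envelope water balance: the irrigated area supplied from groundwater is sustainable only if its demand does not exceed mean annual recharge. In the sketch below, the recharge fraction and the recharge area are hypothetical placeholders; the rainfall (940 mm) and per area requirement (900 mm) are the regional figures quoted earlier. Return flows, surface water supplies, and spatial variability are ignored.

```python
def max_sustainable_irrigated_area_ha(
    rainfall_mm: float,
    recharge_fraction: float,
    recharge_area_ha: float,
    irrigation_requirement_mm: float,
) -> float:
    """Largest groundwater-irrigated area whose demand stays within mean recharge.

    Water balance (volumes as depth x area):
        recharge_area * rainfall * recharge_fraction
            >= irrigated_area * irrigation_requirement
    """
    if not 0.0 <= recharge_fraction <= 1.0:
        raise ValueError("recharge_fraction must lie in [0, 1]")
    if irrigation_requirement_mm <= 0.0:
        raise ValueError("irrigation_requirement_mm must be positive")
    recharge_depth_mm = rainfall_mm * recharge_fraction
    return recharge_area_ha * recharge_depth_mm / irrigation_requirement_mm


if __name__ == "__main__":
    area = max_sustainable_irrigated_area_ha(
        rainfall_mm=940.0,                # regional mean annual rainfall
        recharge_fraction=0.10,           # hypothetical: 10% of rainfall recharges the aquifer
        recharge_area_ha=1_000_000.0,     # hypothetical recharge area
        irrigation_requirement_mm=900.0,  # recent per area irrigation requirement
    )
    print(f"Upper limit on groundwater-irrigated area: {area:,.0f} ha")
```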
The complex, heavily decentralized and unmonitored hydropolitical environment under study here seems to preclude effective water-demand management, either through direct regulation (e.g., by withdrawal permits or limits to extraction depth) or indirectly through incentives (e.g., by input-based pricing). For this reason, it appears natural to turn to input-based planning, which concerns the provision of optimal amounts of water (including surface water) and energy at a given time and given location as a function of expected water demand. The task of forecasting irrigation water demand seems a formidable one, especially if one considers that seasonal production outcomes are aggregates of a large number of independent decisions as well as performances. However, all is not lost. As much as the filling stage of a surface reservoir determines the upper limit of the area that can potentially be irrigated, so does the level of groundwater in the shallow aquifers in Telangana determine the potential for agricultural activity. In areas where both surface and groundwater supply options are available, obviously total conjunctive supply will serve as a measure for potential irrigation activity.
For any given dry season, groundwater tables are primarily determined by ex ante recharge. Thus, we argue that some features of the prior-season hydroclimate, such as total wet-season rainfall, the number of wet days, some measures of rainfall intensity, etc., or any combination of a subset of these features are key determinants for the acreage chosen for irrigation in the subsequent agricultural season. Certainly, we are aware that this view of real-world agricultural production is a simplification. However, we posit that a conceptual model along the lines discussed here is a powerful tool to actually forecast total dry-season irrigated area and, with that, total irrigation water demand for this period.
It is important to emphasize that other factors, such as input prices, technological innovation, and differing levels of subsidies, also have an impact on potential agricultural activity, including crop choice, and may mitigate the effects of hydroclimatic variability. Unfortunately, good-quality socioeconomic time-series data on these features are often not available. This is also the case for the present study. Thus, we concentrate exclusively on the effects of hydroclimatic variability on irrigated agriculture. However, we note that, if reliable socioeconomic data become available, the conceptual model presented here can easily be amended to accommodate them.
3. Data and sources
Crop-specific data for food crops are available for each district in India and were compiled from the World Bank (Dinar et al. 1998) and from the Indian Harvest Database from the Center for Monitoring Indian Economy (Center for Monitoring of Indian Economy 2008). As a result, a district-level data record consisting of area cropped, crop production, and crop yield for the major crops from 1956 to 2005 could be established. Data for all districts in Telangana were compiled. The Rangareddi district is not considered further because agricultural production has largely been abandoned there owing to the relentless expansion of Hyderabad and its suburbs.
All agricultural data are for a particular agricultural year that starts in June and ends in May the following year. Seasonal data for Kharif and Rabi in a particular year are available from 1970 to 2005. Other key agricultural data, such as fertilizer type and use, seed type and quality, and data on other production factors, are not available for the time period under consideration.
Daily gridded precipitation data are obtained from Rajeevan et al. (2006) and are available from 1 January 1951 until 31 December 2008 at a resolution of ∼100 km × 100 km. Data on temperature are obtained from the National Centers for Environmental Prediction (NCEP) daily global analyses data and provided by the National Oceanic and Atmospheric Administration (NOAA)/Office of Oceanic and Atmospheric Research (OAR)/Earth System Research Laboratory (ESRL)/Physical Science Division (PSD), Boulder, Colorado, from their Web site (available online at http://www.cdc.noaa.gov/; U.S. Department of Commerce 1994; Randel 1987; Trenberth and Olson 1988a,b) at a resolution of ∼250 km × 250 km. Both precipitation and temperature data are downscaled to the irregularly shaped districts using a simple area-weighted averaging approach. For a given district, we define a dry day as one where the precipitation is less than 0.05 mm day−1. Heat days are defined as days on which temperature exceeds the 25°C threshold. Because of a lack of good-quality, high-resolution, district-level groundwater data, these and a number of additional hydroclimate variables may be considered as proxies in an attempt to capture at least some measure of shallow groundwater recharge (e.g., previous wet-season total rainfall, previous-year wet-season rainfall, intensity, etc.).
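The area-weighted averaging and the dry-day and heat-day definitions can be sketched as follows. This is a minimal illustration only: the overlap areas between grid cells and a district are hypothetical inputs that would come from a GIS overlay, and the text does not specify whether the temperature threshold applies to the daily mean or the daily maximum.

```python
from typing import Sequence

DRY_DAY_THRESHOLD_MM = 0.05  # a dry day has precipitation below this value (mm per day)
HEAT_DAY_THRESHOLD_C = 25.0  # a heat day has temperature above this value (deg C)


def area_weighted_mean(cell_values: Sequence[float], overlap_areas: Sequence[float]) -> float:
    """District value as the mean of grid-cell values weighted by cell-district overlap area."""
    if len(cell_values) != len(overlap_areas):
        raise ValueError("cell_values and overlap_areas must have the same length")
    total_area = sum(overlap_areas)
    if total_area <= 0.0:
        raise ValueError("total overlap area must be positive")
    return sum(v * a for v, a in zip(cell_values, overlap_areas)) / total_area


def count_dry_days(daily_precip_mm: Sequence[float]) -> int:
    """Number of days with district precipitation below the dry-day threshold."""
    return sum(1 for p in daily_precip_mm if p < DRY_DAY_THRESHOLD_MM)


def count_heat_days(daily_temp_c: Sequence[float]) -> int:
    """Number of days with district temperature above the heat-day threshold."""
    return sum(1 for t in daily_temp_c if t > HEAT_DAY_THRESHOLD_C)


if __name__ == "__main__":
    # Hypothetical district overlapping two grid cells, one quarter and three quarters.
    print(area_weighted_mean([10.0, 20.0], [1.0, 3.0]))
    print(count_dry_days([0.0, 0.04, 0.05, 2.0]))
    print(count_heat_days([24.0, 25.0, 26.5]))
```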
Sea surface temperature (SST) data were obtained from the extended reconstructed sea surface temperature dataset [NOAA/National Climatic Data Center (NCDC) Extended Reconstructed Sea Surface Temperature Dataset (ERSST), version 3; available online at http://iridl.ldeo.columbia.edu/; see also Xue et al. 2003; Smith et al. 2008]. Seasonal SST indices are constructed for the central Indian Ocean (CIO; fall), the Arabian Sea (ASI; winter), and the Indian Ocean northwest of Australia (NWAI; winter) following the methodology outlined in Clark et al. (2000). In total, 80 potential predictors (features) are assembled with lead times (relative to the Rabi planting season) ranging from one month to one year.9
4. Modeling approach
Our goal is to model dry-season irrigated area for maize and rice production in an effort to anticipate water requirements and inform policy decisions concerning water use in the Telangana districts. The challenge we face is that agricultural production decisions are complex, aggregate outcomes of a set of stimuli, including economic and climatological factors as well as governmental incentives. For different groups of farmers in different regions and with variable endowments of capital and natural resources, the importance of individual stimuli varies. In other words, even over a relatively homogeneous region such as Telangana, variability in the complex interactions between hydroclimatology, water resource availability, and economic factors may result in considerable heterogeneity at the village/district scale. Therefore, spatially aggregated modeling approaches may well smooth over the underlying variability and ignore information that is important for individual farmers and district planners. A further implication is that, to effectively model agricultural production at smaller scales, unique models may be required at each district.
This finding is potentially daunting because we do not presume to know the appropriate model of the problem but rather face the task of model selection. To do this, we utilize a modeling technique from the machine learning community that can be broadly classified as a supervised learning approach. Simply put, our model should achieve maximum regression accuracy with a minimal set of relevant features (predictors) while at the same time avoiding overfitting. For this purpose, we propose a two-stage modeling approach that consists of a feature subset selection stage (stage 1) and a bootstrap aggregation stage (stage 2).10
In stage 1, a statistical estimator is run on a dataset that is partitioned into an internal training sample and a holdout sample, with different features removed from the list of predictors. The subset of features with the highest evaluation performance is chosen as the final set to be used for the regression. Its performance is then evaluated on an independent test set that was not used during the search (Kohavi and John 1997). This procedure is repeated 10 times with different initialization values in a model step called outer cross-validation to establish feature ranks. For a particular feature, a rank of 30%, for example, means that the particular feature under consideration ends up 3 out of 10 times in the final model selection (a process diagram of our modeling approach is shown in Fig. 3). This procedure ensures that we obtain the set of features with minimum redundancy and maximum relevance for the particular regression problem.
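The stage 1 logic can be sketched in a few lines. The sketch below is a simplified stand-in and not the implementation used in this study: it assumes ordinary least squares as the estimator, a greedy forward search as the wrapper, and random 70/30 training/holdout splits for the repeated runs, and it omits the final evaluation on an independent test set. All function names and the `cutoff` parameter are hypothetical.

```python
import numpy as np


def holdout_mse(X, y, features, train, hold):
    """Fit least squares (with intercept) on the training rows; return MSE on the holdout rows."""
    a_train = np.column_stack([np.ones(len(train))] + [X[train, j] for j in features])
    a_hold = np.column_stack([np.ones(len(hold))] + [X[hold, j] for j in features])
    coef, *_ = np.linalg.lstsq(a_train, y[train], rcond=None)
    return float(np.mean((a_hold @ coef - y[hold]) ** 2))


def forward_select(X, y, train, hold, max_features=3):
    """Greedy wrapper search: repeatedly add the feature that most reduces holdout error."""
    selected = []
    best = holdout_mse(X, y, selected, train, hold)
    while len(selected) < max_features:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        scores = {j: holdout_mse(X, y, selected + [j], train, hold) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break  # no candidate improves the holdout error
        selected.append(j_best)
        best = scores[j_best]
    return selected


def feature_ranks(X, y, n_runs=10, seed=0):
    """Fraction of repeated runs in which each feature ends up in the selected subset."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_runs):
        perm = rng.permutation(n)
        split = int(0.7 * n)
        train, hold = perm[:split], perm[split:]
        for j in forward_select(X, y, train, hold):
            counts[j] += 1
    return counts / n_runs


def retained_features(ranks, cutoff=0.2):
    """Indices of features whose rank reaches the cutoff."""
    return [j for j, r in enumerate(ranks) if r >= cutoff]


if __name__ == "__main__":
    # Synthetic data: 35 "years", 6 candidate features, only feature 0 is informative.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(35, 6))
    y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=35)
    ranks = feature_ranks(X, y)
    print("ranks:", ranks, "retained:", retained_features(ranks))
```

A rank of 0.3 returned by `feature_ranks` corresponds to the 30% example above: the feature was selected in 3 of the 10 runs.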
To improve model performance and accuracy, we use bootstrap aggregation (bagging) in stage 2 (Breiman 1996). The basic idea is straightforward. Bootstrap samples are obtained from the feature subset chosen in stage 1. As a result, we get a family of training sets from which statistical estimator models are developed. These models are then combined by simple model averaging to arrive at a final predictive model. Several studies have shown that bagging can be suitable for unstable regression models such as ours: that is, models that suffer from small training sets and thus are sensitive to small changes in these sets (Chen and Ren 2009; Kirk and Stumpf 2009).
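A minimal bagging sketch is given below. It assumes a least squares model as a placeholder for the statistical estimator and resamples the training rows with replacement; the number of bootstrap samples and all function names are illustrative choices, not those of this study.

```python
import numpy as np


def fit_linear(X, y):
    """Least squares fit with intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predictions of a fitted linear model for the rows of X."""
    return np.column_stack([np.ones(X.shape[0]), X]) @ coef


def bagged_predict(X_train, y_train, X_new, n_boot=200, seed=0):
    """Average the predictions of models fitted to bootstrap resamples of the training set.

    Returns the ensemble mean and the ensemble standard deviation, the latter
    being a rough indicator of how sensitive the model is to the training sample.
    """
    rng = np.random.default_rng(seed)
    n = len(y_train)
    preds = np.empty((n_boot, X_new.shape[0]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        coef = fit_linear(X_train[idx], y_train[idx])
        preds[b] = predict_linear(coef, X_new)
    return preds.mean(axis=0), preds.std(axis=0)


if __name__ == "__main__":
    # Synthetic example with a small training set and noisy observations.
    rng = np.random.default_rng(2)
    X = np.linspace(0.0, 1.0, 30).reshape(-1, 1)
    y = 3.0 * X[:, 0] + 1.0 + 0.2 * rng.normal(size=30)
    mean, spread = bagged_predict(X, y, np.array([[0.5]]))
    print(f"bagged prediction at 0.5: {mean[0]:.2f} (ensemble spread {spread[0]:.2f})")
```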
The modeling approach is summarized in the following list:
Stage 1 (see also Fig. 3):
(i) Split data into training, holdout, and test sets;
(ii) Train model with inner cross-validation and retain selected features;
(iii) Retrain model with outer cross-validation and establish ranks of retained features;
(iv) Delete features from final set of retained features with lowest ranks as established in (iii).
Stage 2:
(i) Model averaging by bootstrap aggregating (bagging).
To model the time-dependent process that generates the data, we use district-level Gaussian process models (Cressie 1993; Abrahamsen 1997; Rasmussen and Williams 2006). The specific details of our models as well as details related to the model implementation will be reported in a separate, more technical paper.11
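For readers unfamiliar with Gaussian process regression, the following generic sketch shows how a predictive mean and variance are obtained (following the standard Cholesky-based formulation in Rasmussen and Williams 2006). The squared-exponential kernel, the zero prior mean, and the fixed hyperparameters are assumptions made for illustration; they are not the district-level models of this study, whose details are not given here.

```python
import numpy as np


def sq_exp_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between the rows of A and the rows of B."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * np.maximum(d2, 0.0) / length_scale**2)


def gp_predict(X_train, y_train, X_new, noise_var=0.1, **kernel_args):
    """Posterior mean and variance of a zero-mean GP at the rows of X_new."""
    K = sq_exp_kernel(X_train, X_train, **kernel_args) + noise_var * np.eye(len(y_train))
    K_s = sq_exp_kernel(X_train, X_new, **kernel_args)
    K_ss = sq_exp_kernel(X_new, X_new, **kernel_args)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)  # clip tiny negative values from round-off


if __name__ == "__main__":
    # Toy one-feature example; in practice y would be centered before fitting.
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = np.array([-0.5, 0.4, 0.6, -0.5])
    X_new = np.array([[1.5], [6.0]])
    mean, var = gp_predict(X, y, X_new, noise_var=0.01)
    for x, m, s in zip(X_new[:, 0], mean, np.sqrt(var)):
        print(f"x = {x:.1f}: mean {m:+.2f}, std {s:.2f}")
```

The predictive variance grows away from the training data, which is the kind of behavior that uncertainty bands around mean predictions are meant to convey.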
Additionally, the sensitivity of the regression models to each of the selected features is evaluated. Two approaches are employed. The first is a global sensitivity analysis for each feature in each district, which accounts for the influence of individual features on the irrigated area as well as the uncertainty associated with that influence. The second approach evaluates the proportion of the total irrigated area variance in each district due to 1) individual features (Si) and 2) nonlinear interactions between a particular feature and all other features within a district (Ti). These two measures are defined as follows: Si is defined as V{E[y|F(i)]}/V(y), that is, the variance of the conditional expectation, V{E[y|F(i)]}, divided by the total variance of the output, V(y), where F(i) is a feature in a particular district. Here, Si is a measure of the portion of variability that is due to a variation in the main effects for each input variable and thus a measure of feature importance (Gramacy and Taddy 2009; Saltelli et al. 2008). Also, Ti is defined as E{V[y|F(∼i)]}/V(y) and measures the residual variability after the variability due to all other features F(∼i) has been accounted for. Note that both Si and Ti are bounded within [0, 1].
The interpretation of these two test values is straightforward. For a particular feature, if Ti is large and Si is small, then interaction effects between feature F(i) and all others within a district are important for the observed response Y (irrigated area in our case). If Ti and Si are both large, then the individual feature and its interaction with other features are important. Thus, a feature that exhibits little direct influence on the irrigated area may be quite important when its interactions with other features within a district are considered.
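The two indices can be computed exactly for a toy model by enumerating a full factorial grid of equally likely feature levels, which makes the interpretation above easy to check. The sketch below is illustrative only: the toy models and the function name are hypothetical, and for real regression models the indices would be estimated by sampling rather than enumeration.

```python
from collections import defaultdict
from itertools import product
from statistics import fmean, pvariance


def sensitivity_indices(model, levels, i):
    """Return (S_i, T_i) for feature i on a full factorial grid of equally likely levels.

    S_i = V{E[y | F(i)]} / V(y)   (main effect of feature i)
    T_i = E{V[y | F(~i)]} / V(y)  (variability left once all other features are fixed)
    """
    grid = list(product(*levels))
    outputs = [model(x) for x in grid]
    total_var = pvariance(outputs)
    if total_var == 0:
        raise ValueError("model output is constant on the grid")

    by_i = defaultdict(list)     # outputs grouped by the value of feature i
    by_rest = defaultdict(list)  # outputs grouped by the values of all other features
    for x, out in zip(grid, outputs):
        by_i[x[i]].append(out)
        by_rest[x[:i] + x[i + 1:]].append(out)

    # Groups are equally sized on a full grid, so unweighted averages are exact.
    s_i = pvariance([fmean(v) for v in by_i.values()]) / total_var
    t_i = fmean([pvariance(v) for v in by_rest.values()]) / total_var
    return s_i, t_i


if __name__ == "__main__":
    levels = [(-1.0, 1.0), (-1.0, 1.0)]
    # Pure interaction: no main effect, yet the feature matters through its partner.
    print("interaction:", sensitivity_indices(lambda x: x[0] * x[1], levels, 0))
    # Purely additive model: main effect and total effect coincide.
    print("additive:   ", sensitivity_indices(lambda x: x[0] + x[1], levels, 0))
```

For the pure interaction model the sketch yields a small Si together with a large Ti, which is the case described above in which a feature with little direct influence is nevertheless important through its interactions.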
5. Results
a. Districtwise model results
The results from the model selection step are shown in Tables 1 and 2. Table 1 lists all features from the total set of 80 features that are retained after the feature selection search [features are subsequently abbreviated by F(1)–F(15); Y denotes the set of observations (i.e., RIA)]. Table 2 lists stage 1 modeling results for individual districts and shows the most relevant features. Features with a ranking of less than 20% were not included in the final models to avoid overfitting. Although the choice of the 20% cutoff is arbitrary, it is consistent with the literature for problem instances with many feature subsets (e.g., 2^80 in our case) and small training sample sizes (for a discussion, see also Kohavi and John 1997). Overfitting is a potential pitfall of the stage 1 approach presented above and is a problem in instances such as ours. The ranking percentiles shown in Table 2 are an indication of the robustness of each feature. Higher percentiles (≥50%) indicate that a particular feature was frequently selected in the outer cross-validation run, and lower percentiles (<50%) indicate that a feature was less frequently selected.
Obviously, the features which are chosen for their predictive potential vary from district to district (see Table 2). Time [F(1)], which serves as a proxy for infrastructure development and improvement in agricultural techniques, appears as a robust feature in all but three districts [Warangal (WAR), Mahabubnagar, and Nalgonda]. With respect to hydroclimatic features, two main groups emerge, those related to heat/dry days [F(3)–F(6) and F(10)–F(11)] during the planting season of interest (i.e., the Rabi dry season) and those associated with monsoon rainfall {e.g., rainfall intensity [F(9)] and previous-year rainfall [F(2), F(7)–F(9), F(13)–F(15)]}. Additionally, Khammam (KAM) and Nalgonda also exhibit influence from Arabian Sea SSTs [F(12)] and upstream (Krishna River) rainfall [F(13)], respectively. Selected hydroclimate-related features vary in robustness from 20% to 60%.
Districtwise modeling results using the features described in Table 2 are shown in Fig. 4. The gray shading surrounding the mean predictions is a measure of model uncertainty; widely (narrowly) spaced confidence bounds indicate greater (lesser) uncertainty. The models perform reasonably well in most of the districts but show the narrowest range of uncertainty in Adilabad (ADI), Mahabubnagar, and Karimnagar (KAR). Various model error statistics are reported in Table 3.
Models which have time [F(1)] plus additional features as predictors tend to perform well [i.e., Adilabad, Khammam, Karimnagar, Nizamabad (NIZ), and Medak (MED)]. The districts that are modeled exclusively with hydroclimate predictors perform less well (e.g., Warangal, Nalgonda). Mahabubnagar is an exception in that present-year and previous-year wet-season rainfalls are able to accurately predict dry-season cropped area. These results suggest that the influence of individual features varies from district to district, but they do not tell us which features dominate or whether interactions are important.
b. Sensitivity analysis
To investigate the differences in model performance from district to district, a sensitivity analysis of the individual features and interaction effects is undertaken. A global sensitivity analysis helps to resolve the variability in total dry-season irrigated area of maize and rice Y by apportioning elements of its variation to the different relevant features. For each district and each of its relevant features (see also Table 2), Figs. 5 and 6 show the marginal conditional response distributions that arise when Y has been marginalized over the complement of this district’s feature subset while letting the specific feature under consideration vary. For example, in Karimnagar, the two most important features identified in stage 1 are time [F(1)] and June–October (JJASO) rainfall intensity [F(9)]. The sensitivity of Y to variation in the feature F(1), after marginalizing over feature F(9), shows a positive relation that indicates the importance of the time trend. Similarly, sensitivity to F(9) shows the positive relation between increases in rainfall intensity and irrigated agricultural dry-season area Y.
Figures 5 and 6 indicate that most of the features have narrow uncertainty bounds but vary substantially in the sign and magnitude of their influence. All districts that have time [F(1)] as a feature, except Adilabad, show high sensitivity and positive slope, indicating that this feature is important in explaining trends in irrigated area. Mahabubnagar shows high sensitivity and a positive response in irrigated area because of both previous-year rainfall and present-year monsoon rains. Other features with high positive influence are upstream rainfall in Nalgonda, rainfall intensity in Karimnagar, and wet days in Adilabad. Although features associated with heat/dry days exhibit negative slope, as one would expect, only Warangal exhibits a high sensitivity to this feature. Many features elicit nearly flat responses, indicating little to no direct influence on irrigated area. However, it would be incorrect at this stage to conclude that these features are unimportant, because we have still not taken into account the role that interactions between features play in determining the outcomes in Y.
To further investigate the role of individual features and their interactions in each district, a global sensitivity analysis is employed (see previous section for more details). Table 4 lists first-order Si and total sensitivity Ti indices for each district and the corresponding features F(i) within. The proportion of total variance explained by individual features Si in each district largely supports the previous findings, with large (small) values indicating greater (lesser) influence. Examination of the total sensitivity indices Ti reveals that interactions between features are indeed important and that features that are seemingly meaningless individually are influential when their interactions with other features are considered.
As can be seen in Table 4, Si ≪ Ti in many instances, which implies that interactions between features in a particular district may be as important as, if not more important than, the features individually. A good example of this effect may be found in the results for Medak, a district whose irrigated area is captured quite well by the model. The features [F(10)–F(11)] related to heat days individually account for 8% and 4%, respectively, of the variation in output and are apparently of little predictive value (see Fig. 6). However, when the interactions between these features, including time [F(1)], are taken into account, the total variation explained jumps to 71% and 66%. Similarly, substantial increases in influence due to interactions are observed even in districts where the temporal trend dominates the response in the irrigated area (e.g., Warangal, Khammam, and Mahabubnagar).
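How a feature can have a negligible first-order index but a substantial total index is easiest to see on a toy problem. The sketch below uses the standard variance-based definitions, S_i = Var(E[Y|X_i])/Var(Y) and T_i = 1 − Var(E[Y|X_~i])/Var(Y), and evaluates them exactly on a small grid of independent, uniformly weighted inputs. The model `toy_model` and its inputs are purely illustrative assumptions; they are not the district models, and the study itself reports posterior means of sampled indices rather than exact values.

```python
from itertools import product
from statistics import fmean, pvariance

def sensitivity_indices(model, grids):
    """Exact first-order (S) and total (T) indices for independent inputs,
    each uniformly distributed over the values in its grid."""
    points = list(product(*grids))
    y = {p: model(*p) for p in points}
    var_y = pvariance(list(y.values()))

    def var_of_conditional_mean(keep):
        # Group outputs by the inputs in `keep`, average within each group,
        # then take the variance of those conditional means.
        groups = {}
        for p, value in y.items():
            key = tuple(p[j] for j in keep)
            groups.setdefault(key, []).append(value)
        return pvariance([fmean(values) for values in groups.values()])

    n = len(grids)
    S = [var_of_conditional_mean([i]) / var_y for i in range(n)]
    T = [1 - var_of_conditional_mean([j for j in range(n) if j != i]) / var_y
         for i in range(n)]
    return S, T

def toy_model(t, h1, h2):
    # Additive trend plus a pure interaction between two other inputs.
    return t + h1 * h2

grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
S, T = sensitivity_indices(toy_model, [grid, grid, grid])
print(S, T)
```

Here the two interacting inputs have first-order indices of zero, yet each carries a total index of one-third, because all of their influence on the output arises through their product.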
It is important to note that care should be taken when interpreting these tests, because some districts that exhibit significant interaction effects also have large uncertainties associated with the modeling of irrigated area (e.g., see Nalgonda in Fig. 4). In these instances, the results of the total sensitivity analyses may not be meaningful. However, in districts where the models perform well, such as Mahabubnagar, Karimnagar, Nizamabad, and Medak, the sensitivity indices are useful for assessing the influence of individual features and their interactions.
6. Discussion
a. General findings
In 5 out of 8 districts, time [F(1)] is a robust feature. It likely indicates a technology trend, because these are the districts that have witnessed the largest expansion in irrigation infrastructure over the time period in question (cf. Table 16 in Vakulabaharanam 2004). In Nizamabad, for example, the good model performance is almost entirely due to the temporal/technology feature and indicates the significance of influences from agricultural sector improvements.
Interestingly, the two districts of Mahabubnagar and Medak, with the lowest percentage of area irrigated from surface water (cf. Fig. 1), are the ones with the highest predictability as shown in Fig. 4. There, tube wells tend to make the dominant contribution to NIA and the individual features and their interactions make significant contributions to the overall variance in irrigated area. More precisely, for Mahabubnagar, the relevant feature set is previous wet-season precipitation and previous-year wet-season precipitation. Figure 5 shows that both features have a significant positive response effect with increases in the corresponding precipitation values. There, as elsewhere, precipitation in the wet season recharges groundwater. In this predominantly groundwater-irrigated district, the amount of previous-season and previous-year precipitation is thus a crucial determinant of the state of groundwater at the beginning of the subsequent dry season. Whereas excessive rains can have a negative overall impact on agriculture in the Kharif season because of a deficit in solar radiation, such a negative outcome can, to a certain extent, be compensated by a better agricultural performance in the subsequent dry season because of favorable groundwater levels.
Although Mahabubnagar and Karimnagar exhibit the positive influence that monsoonal rains have on dry-season irrigated area, the opposite influence (i.e., excessive heat during Rabi and the preceding Kharif season causing a negative impact on Rabi irrigated area) can be observed in Warangal and Medak. Warangal is another predominately groundwater-irrigated district with little alternative supply availability from surface irrigation water (see Fig. 5). Reported maize and rice area in Medak, unlike other districts, is heavily influenced by the number of heat days throughout the dry season. This is not necessarily evidence of a direct temperature–area link but rather may indicate increased physiological heat and water stress of the plants, which might make it necessary for farmers to contract agricultural activity to smaller areas during the season.
The larger uncertainties associated with districts like Adilabad, Nalgonda, and Warangal are a manifestation of the complex aggregate outcomes of individual farm-level decisions in locations where multiple options for irrigation exist and where the vagaries of climate variability are damped because of alternative supply options. As can be seen in Fig. 1, surface canals and dug wells contribute roughly equally to NIA in Nalgonda, whereas all four sources contribute equally in Adilabad, and Warangal is dominated by groundwater sources. These districts may indicate the importance of diversified backup supplies and/or management strategies that can buffer in times of absolute water scarcity due to insufficient rains and little aquifer recharge during the monsoonal season.
Nalgonda, with a considerable fraction of land irrigated from canals fed by surface water from the upstream Krishna catchment, shows interesting results with regard to the sensitivity analysis carried out. Although first-order sensitivity effects are small [F(13): total June–September (JJAS) upstream precipitation in year t in Krishna basin] or even insignificant for individual features [F(4): November–February (NDJF) dry days; F(15): JJASO total rainfall in year t − 1], their total effect due to interaction is substantial [see Fig. 6 and Table 4, where Ti ≫ Si, i = {F(4), F(13), F(15)}]. This can be intuitively understood. If late monsoon season rains during early Rabi all but fail [F(13)] in conjunction with unfavorable groundwater levels due to insufficient Kharif season recharge [F(15)], surface water irrigation from multiyear manmade storage systems such as the Nagarjuna Sagar dam becomes all the more important. In this case, large Ti shows that district-level outcomes Y are uniquely associated with particular nonlinear combinations of features in a way that cannot be adequately described by the first-order sensitivities as shown in Fig. 6 and Table 4.
To summarize, the key outcomes of our modeling approach are as follows: 1) preceding wet-season predictors associated with rainfall and drought with variable lead time are important; 2) these predictors are particularly important for districts that are less dependent on surface water irrigation and more dependent on tube wells, dug wells, and boreholes; and 3) interaction effects are also important, as illustrated by Nalgonda.
b. Policy implications
The results presented here show that districtwise dry-season irrigation water requirements for the two main crops from the consumptive water-use perspective can be anticipated within the bounds of uncertainty. This can inform decision makers in planning for various supply options, including better timing and spatially targeted deliveries of energy to the rural sector for groundwater pumping. Additionally, it can inform conjunctive use management: for example, easily implementable rules such as (i) use surface water for irrigation during wet years and (ii) use groundwater for irrigation in dry years.
Districts where groundwater irrigation dominates increasingly face regional depletion of the limited subsurface storage as average total extraction has exceeded long-term recharge for at least the last 15 yr. In the absence of large- and small-scale surface water storage and canal irrigation systems (that guarantee timely delivery of sufficient amounts of irrigation water) and their targeted development, there is little that can be done in these districts apart from 1) incentivizing a switch to less water-intensive crops (i.e., through economic incentives such as price signals) to ensure the long-term viability of agriculture and 2) developing alternative resources, ranging from small-scale tank irrigation schemes up to large-volume canal supply schemes.
As population and the associated stress on the environment keep growing, our results point toward the importance of conjunctive water resources management in individual districts, because overreliance on one source of irrigation water increases the production risk of farmers in case of supply failure from that particular source (because of, e.g., multiyear drought, competitive regional abstraction, etc.). In this way, decision makers can more easily ensure that future interannual agricultural outcomes are stable at high levels while not reducing the likelihood of achieving future target levels because of inappropriate (i.e., unsustainable) present-day management. This becomes ever more important in the face of projected changes in the mean climate state and climate variability resulting from anthropogenic climate change and the potential for upstream and downstream disputes between Maharashtra and AP.
As more data become available through retrieval of statistical archives and through the collection of additional predictors, our flexible modeling approach can be easily modified to accommodate new and potentially relevant data. Finally, it is important to note that our method can be readily applied in other dryland regions of the world where sufficient data are available to justify such an approach.
7. Conclusions
With increasing population pressure and potentially severe impacts from climate change, semiarid lands increasingly face severe levels of water stress because of overutilization and degradation of the available water resources. We present a robust statistical modeling strategy to better anticipate aggregate water requirements at subregional scales for agricultural production. Our results suggest that, even within a small region that is relatively uniform from the hydroclimatological and geographical perspective, agricultural outcomes are the result of a diverse set of critical determinants. The prevailing characteristics of the hydroclimate are shown to be important, but interdistrict variability in infrastructure, socioeconomic boundary conditions, and complex nonlinear interactions play significant roles in influencing the irrigated area. This shows the importance of the scale at which modeling is carried out.
The results of our study help to assess trade-offs from the allocation of surface water for irrigation as well as energy resources for groundwater pumping. This will increase the management and planning capabilities of government planners, who will face increasing and unprecedented environmental and socioeconomic challenges in the future. Extending our approach to different regions in India and to other drylands around the globe where irrigated agriculture plays a dominant role is an exciting topic for future research.
Acknowledgments
Support from the PepsiCo Foundation and the Cross Cutting Initiative at the Earth Institute, Columbia University, is gratefully acknowledged. We thank Robert B. Gramacy, Andrew Robertson, S. P. Tucker, A. C. Reddy, and Fernando Cela-Diaz for helpful discussions.
REFERENCES
Abrahamsen, P., 1997: A review of Gaussian random fields and correlation functions. Norwegian Computing Center Tech. Rep. 917, 70 pp.
Alley, W. M., Healy R. W., LeBaugh J. W., and Reilly T. E., 2002: Flow and storage in groundwater systems. Science, 296, 1985–1990.
Bandyopadhyay, P., and Mallick S., 2003: Actual evapotranspiration and crop coefficients of wheat (Triticum aestivum) under varying moisture levels of humid tropical canal command area. Agric. Water Manage., 59, 33–47.
Bear, J., Cheng H-D., Soreck S., Ouazar D., and Herrera I., 1999: Seawater Intrusion in Coastal Aquifers—Concepts, Methods and Practices. Kluwer, 625 pp.
Binswanger, H. P., 1980: Attitudes toward risk: Experimental measurement in rural India. Amer. J. Agric. Econ., 62, 395–407.
Breiman, L., 1996: Bagging predictors. Mach. Learn., 24, 123–140.
Center for Monitoring of Indian Economy, 2008: Indian harvest database. Electronic Database.
Central Groundwater Board, 2007: Annual report, 2006–2007. Government of India Ministry of Water Resources Tech. Rep., 48 pp.
Challinor, A. J., Ewert F., Arnold S., Simelton E., and Fraser E., 2009: Crops and climate change: Progress, trends, and challenges in simulating impacts and informing adaptation. J. Exp. Bot., 60, 2775–2789. doi:10.1093/jxb/erp062.
Chao, B. F., Wu Y. H., and Li Y. S., 2008: Impact of artificial reservoir water impoundment on global sea level. Science, 320, 212–214.
Chen, T., and Ren J., 2009: Bagging for Gaussian process regression. Neurocomputing, 72, 1605–1610. doi:10.1016/j.neucom.2008.09.002.
Clark, C., Cole J., and Webster P., 2000: Indian Ocean SST and Indian summer rainfall: Predictive relationships and their decadal variability. J. Climate, 13, 2503–2519.
Cooper, D. J., Sanderson J. S., Stannard D. I., and Groeneveld D. P., 2006: Effects of long-term water table drawdown on evapotranspiration and vegetation in an arid region phreatophyte community. J. Hydrol., 325, 21–34.
Cressie, N., 1993: Statistics for Spatial Data. Wiley-Interscience, 900 pp.
Dinar, A., Mendelsohn R., Evenson R., Parikh J., Sanghi A., Kumar K., McKinsey S., and Lonergen J., 1998: Measuring the impact of climate change on Indian agriculture. World Bank Tech. Rep. 402, 266 pp.
Dubash, N. K., 2002: Tubewell Capitalism: Groundwater Development and Agrarian Change in Gujarat. Oxford University Press, 287 pp.
Faures, J-M., and Santini G., 2008: Water and the rural poor. Food and Agricultural Organisation Tech. Rep., 93 pp.
Fekete, B. M., Vorosmarty J. C., Roads J. O., and Willmott C. J., 2004: Uncertainties in precipitation and their impacts on runoff estimates. J. Climate, 17, 294–304.
Gemma, M., and Tsur Y., 2007: The stabilization value of groundwater and conjunctive water management under uncertainty. Rev. Agric. Econ., 29, 540–548.
Gramacy, R. B., and Taddy M., 2009: Categorical inputs, sensitivity analysis, optimization and importance tempering with tgp version 2, an R package for treed Gaussian process models. University of Cambridge Statistical Laboratory Tech. Rep., 50 pp.
Harvey, C. F., and Coauthors, 2002: Arsenic mobility and groundwater extraction in Bangladesh. Science, 298, 1602–1606.
Kang, S., Gu B., Du T., and Zhang J., 2003: Crop coefficient and ratio of transpiration to evapotranspiration of winter wheat and maize in a semi-humid region. Agric. Water Manage., 59, 239–254.
Kirk, P. D. W., and Stumpf M. P. H., 2009: Gaussian process regression bootstrapping: Exploring the effects of uncertainty in time course data. Bioinformatics, 25, 1300–1306. doi:10.1093/bioinformatics/btp139.
Kohavi, R., and John G. H., 1997: Wrappers for feature subset selection. Artif. Intell., 97, 273–324.
Kundzewicz, Z. W., and Coauthors, 2008: The implications of projected climate change for freshwater resources and their management. Hydrol. Sci. J., 53, 3–10.
Lahm, T. D., and Bair E. S., 2000: Regional depressurization and its impact on the sustainability of freshwater resources in an extensive midcontinent variable-density aquifer. Water Resour. Res., 36, 3167–3177.
OECD, 2008: OECD-FAO agricultural outlook 2008–2017. Organisation of Economic Cooperation and Development and Food and Agricultural Organisation Tech. Rep., 5 pp.
Raj, P., 2004a: Classification and interpretation of piezometer well hydrographs in parts of southeastern peninsular India. Environ. Geol., 46, 808–819.
Raj, P., 2004b: Groundwater resource, 2004–05, Andhra Pradesh. Government of Andhra Pradesh Groundwater Department Tech. Rep., 34 pp.
Raj, P., 2006: Status of ground water in Andhra Pradesh: Availability, use and strategies for management. Government of Andhra Pradesh Groundwater Department Tech. Rep., 45 pp.
Raj, P., Nandulal L., and Soni G., 1996: Nature of aquifer in parts of granitic terrain Mahabubnagar District, Andhra Pradesh. J. Geol. Soc. India, 49, 61–74.
Rajeevan, M., Bhate J., Kale J., and Lal B., 2006: High resolution daily gridded rainfall data for the Indian region: Analysis of break and active monsoon spells. Curr. Sci., 91, 296–306.
Randel, W. J., 1987: Global atmospheric circulation statistics, 1000–1 mb. National Center for Atmospheric Research Tech. Note NCAR/TN-295+STR, 256 pp.
Rasmussen, C. E., and Williams C. K. I., 2006: Gaussian Processes for Machine Learning. MIT Press, 248 pp.
Reddy, V., 2005: Costs of resource depletion externalities: A study of groundwater overexploitation in Andhra Pradesh, India. Environ. Dev. Econ., 10, 533–556. doi:10.1017/S1355770X05002329.
Rockstrom, J., Lannerstad M., and Falkenmark M., 2007: Assessing the water challenge of a new green revolution in developing countries. Proc. Natl. Acad. Sci. USA, 104, 6253–6260.
Saltelli, A., Ratto M., Andres T., Campolongo F., Cariboni J., Gatelli D., Saisana M., and Tarantola S., 2008: Global Sensitivity Analysis: The Primer. Wiley, 292 pp.
Scanlon, B. R., Jolly I., Sophocleous M., and Zhang L., 2007: Global impacts of conversions from natural to agricultural ecosystems on water resources: Quantity versus quality. Water Resour. Res., 43, W03437. doi:10.1029/2006WR005486.
Shah, T., 2008: Taming the Anarchy: Groundwater Governance in South Asia. RFF Press, 310 pp.
Shah, T., Roy A., Qureshi A., and Wang J., 2003: Sustaining Asia’s groundwater boom: An overview of issues and evidence. Nat. Resour. Forum, 27, 130–141.
Shah, T., Scott C., Kishore A., and Sharma A., 2007: Energy-irrigation nexus in South Asia: Improving groundwater conservation and power sector viability. The Agricultural Groundwater Revolution: Opportunities and Threats to Development, CAB International, 211–242.
Shiklomanov, I. A., 2000: Appraisal and assessment of world water resources. Water Int., 25, 11–32.
Singh, K. K., Reddy D. R., Kaushik S., Rathore L. S., Hansen J., and Sreenivas G., 2007: Application of Seasonal Climate Forecasts for Sustainable Agricultural Production in Telangana Subdivision of Andhra Pradesh, India. Springer, 17 pp.
Smith, T. M., Reynolds R. W., Peterson T. C., and Lawrimore J., 2008: Improvements to NOAA’s historical merged land–ocean surface temperature analysis (1880–2006). J. Climate, 21, 2283–2296.
Trenberth, K. E., and Olson J. G., 1988a: Evaluation of NMC global analyses: 1979–87. National Center for Atmospheric Research Tech. Note NCAR/TN-299+STR, 82 pp.
Trenberth, K. E., and Olson J. G., 1988b: Intercomparison of NMC and ECMWF Global Analyses: 1980–1986. National Center for Atmospheric Research Tech. Rep. NCAR/TN-301+STR, 81 pp.
Tsur, Y., and Graham-Tomasi T., 1991: The buffer value of groundwater with stochastic surface water supplies. J. Environ. Econ. Manage., 3, 201–224.
Tyagi, N., Sharma D., and Luthra S., 2000: Determination of evapotranspiration and crop coefficients of rice and sunflower with lysimeter. Agric. Water Manage., 45, 41–54.
UNDP, 2006: Human development report. United Nations Development Programme Rep., 440 pp.
U.S. Department of Commerce, 1994: Packing and identification of NMC grid point data. U.S. Department of Commerce Office Note 84, 41 pp.
Vakulabaharanam, V., 2004: Agricultural growth and irrigation in Telangana: A review of evidence. Economic and Political Weekly, Vol. 39, No. 13, 1421–1426.
Vorosmarty, J. C., Green P., Salisbury J., and Lammers R., 2000: Global water resources: Vulnerability from climate change and population growth. Science, 289, 284–288.
Xue, Y., Smith T. M., and Reynolds R. W., 2003: Interdecadal changes of 30-yr SST normals during 1871–2000. J. Climate, 16, 1495–1510.
Zammouri, M., Siegfried T., El-Fahem T., and Kriaa S., 2008: Salinization of groundwater in the Nefzawa oases region, Tunisia: Results of a regional-scale hydrogeologic approach. Hydrogeol. J., 15, 1357–1375.
Table 1. Final set of features F(i) as determined by the feature selection search, where Y is the dependent variable. Note that annual refers to an agricultural year starting from the month of June in year t and ending in May in year t + 1, where t is calendar year time.
Table 2. Feature ranking after GP model wrapping. Out of the total ranked set (see Table 1), only the 2 or 3 most relevant features are shown. These are then chosen as predictors for the district-level models. Features for all eight districts are shown. Note that the district Rangareddi is not modeled because agricultural production has been largely abandoned there.
Table 3. Districtwise GP model predictive performance on test set (five instances). The following error statistics are tabulated: mean absolute error (mae) and root mean squared error (rmse).
Table 4. For each district and corresponding feature, means of posterior samples of first-order sensitivity indices S and total sensitivity indices T are reported. The features are ordered top–down as reported in Table 2.
GIA is the total area irrigated for all crops, including food and nonfood crops, during an agricultural year [e.g., if a farmer plants 2 (possibly different) crops on a 1-ha plot, the GIA equals 2 ha].
Maize and rice irrigation water requirements have been calculated using typical values for crop coefficients in the region. They are as follows: Kc (wheat) = 0.7, Kc (rice) = 1.14, Kc (jowar) = 0.7, Kc (bajra) = 0.8, and Kc (maize) = 0.91 (Bandyopadhyay and Mallick 2003; Kang et al. 2003; Tyagi et al. 2000).
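As an illustration of how such crop coefficients enter a water requirement estimate, the sketch below assumes the widely used single-crop-coefficient relation ET_c = Kc × ET_0 (crop evapotranspiration equals the crop coefficient times reference evapotranspiration) and converts the resulting depth to a volume over the cropped area. The exact calculation used in the study is not stated here, so the function name, the reference evapotranspiration, and the area are hypothetical; effective rainfall is also ignored.

```python
# Crop coefficients as listed in the note above.
KC = {"wheat": 0.7, "rice": 1.14, "jowar": 0.7, "bajra": 0.8, "maize": 0.91}

def crop_water_requirement_m3(crop, reference_et_mm, area_ha):
    """Seasonal consumptive water use in cubic meters (assumed ET_c = Kc * ET_0)."""
    crop_et_mm = KC[crop] * reference_et_mm
    # 1 mm of water over 1 ha corresponds to 10 cubic meters.
    return crop_et_mm * area_ha * 10.0

# Hypothetical inputs: 500 mm seasonal reference ET over 1000 ha of rice.
print(crop_water_requirement_m3("rice", 500.0, 1000.0))
```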
NIA is the total area that is irrigated at least once from a particular source during the agricultural year [e.g., if a farmer plants 2 (possibly different) irrigated crops on a 1-ha plot, the NIA equals 1 ha].
RIA is by definition reported as NIA.
The imbalance has most likely increased even further after the implementation of full subsidies in the agricultural sector in 2002–03. At the same time, rationed agricultural supply, cross-subsidized tariffs, and growth in nonfarm consumption have limited the contribution of agricultural power supply to state budgetary deficits.
A mandal is the smallest administrative unit in India.
With a recharge to precipitation ratio of 15%, estimates suggest that annual renewable groundwater resources are on the order of 32 km3 in Telangana, of which nowadays 13 km3 are consumptively utilized annually, mainly for irrigation purposes (P. Raj 2008, personal communication). These regionally averaged figures do hide considerable spatial variability of recharge to exploitation ratios in individual aquifers.
It is important to mention that surface water supplies from the region’s two major rivers, the Krishna and Godavari, are threatened as well. The upstream state Maharashtra has significant surface water development projects in both catchments, which, once realized, will significantly reduce downstream runoff, with potentially severe impacts in Telangana.
As only selected predictors are presented in the results section, a complete list of the set of features used for model selection is available from the authors upon request.
Bootstrap aggregation is an ensemble method that uses typical averaging schemes over several models to obtain better predictive performance than could be obtained from any single model of the constituent sample (Breiman 1996).
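The idea can be sketched in a few lines. For brevity, the example swaps in an ordinary least-squares line as the base learner instead of the GP models used in the study; the function names, the data, and the number of resamples are illustrative assumptions.

```python
import random
from statistics import fmean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b

def bagged_predict(xs, ys, x_new, n_models=50, seed=0):
    """Average the predictions of lines fitted to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(xs)
    predictions = []
    while len(predictions) < n_models:
        # Draw n indices with replacement to form one bootstrap resample.
        idx = [rng.randrange(n) for _ in range(n)]
        if len({xs[i] for i in idx}) < 2:
            continue  # degenerate resample: a line cannot be fitted
        a, b = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        predictions.append(a + b * x_new)
    return fmean(predictions)

# Hypothetical noise-free data lying on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2 * x + 1 for x in xs]
print(bagged_predict(xs, ys, 6.0))
```

With noisy data, the individual fits differ from resample to resample, and averaging them reduces the variance of the final prediction.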
More information on the various aspects of the statistical modeling work is also available from the authors upon request.