The majority of the world’s population growth to 2050 is projected to occur in the tropics. Hence, there is a serious need for robust methods for undertaking water resource assessments to underpin the sustainable management of water in tropical regions. This paper describes the largest and most comprehensive assessment of the future impacts of runoff undertaken in a tropical region using conceptual rainfall–runoff models (RRMs). Five conceptual RRMs were calibrated using data from 115 streamflow gauging stations, and model parameters were regionalized using a combination of spatial proximity and catchment similarity. Future rainfall and evapotranspiration projections (denoted here as GCMES) were transformed to catchment-scale variables by empirically scaling (ES) the historical climate series, informed by 15 global climate models (GCMs), to reflect a 1°C increase in global average surface air temperature. Using the best-performing RRM ensemble, approximately half the GCMES used resulted in a spatially averaged increase in mean annual runoff (by up to 29%) and half resulted in a decrease (by up to 26%). However, ~70% of the GCMES resulted in a difference of within ±5% of the historical rainfall (1930–2007). The range in modeled impact on runoff, as estimated by five RRMs (for individual GCMES), was compared to the range in modeled runoff using 15 GCMES (for individual RRMs). For mid- to high runoff metrics, better predictions will come from improved GCMES projections. A new finding of this study is that in the wet–dry tropics, for extremely large runoff events and low flows, improvements are needed in both GCMES and rainfall–runoff modeling.
With the world’s population projected to grow from 6 billion (in 2000) to 9 billion people in 2050 (UNESCO 2009), and the majority of this growth likely to occur in the tropics, particularly sub-Saharan Africa and South Asia, growth in food production will need to mirror population growth (UNESCO 2009). Consequently, there is a need to develop robust methods of performing large-scale water resource assessments to underpin the sustainable management of water in the face of climate change in these tropical regions.
A key challenge when performing robust water resource assessments in the tropics is, in general, the relatively low density of hydrometeorological data. This may be due to economic, political, and/or proprietary reasons. The relatively sparsely inhabited tropical northern Australia (NA) (less than 1 person for every 6 km2) for example, has less than 1 operational rainfall gauge per 2000 km2, less than 1 operational streamflow gauge per 7000 km2 (Petheram et al. 2009b), and less than 10% of the region’s soils have been mapped at a scale finer than 1:100 000 (Petheram and Bristow 2008). While this same argument is difficult to verify across the tropics, the underrepresentation of tropical regions in existing global hydrometeorological datasets is a supporting line of evidence of data paucity (e.g., WMO 1996; Zhang et al. 2001; McMahon et al. 2007; Rudolf et al. 2010; Peterson et al. 2011).
In this paper we seek to test and demonstrate a pragmatic method of making runoff predictions under current and future climates in NA, which is a large tropical area with typically low data densities. This paper focuses on NA because a prolonged drought in southern Australia (1996–2008) and global pressures for food production have led to renewed interest in assessing and developing the water resources of the north (PMC 2007; Camkin et al. 2008).
Hydrological modeling approach
Many hydrological modeling approaches have been used to simulate runoff under future climates in tropical regions of the world. Traditionally, hydrologists have used the climate outputs from global climate models (GCMs) as inputs to catchment hydrological models (CHM) to simulate future runoff (e.g., Schaake 1990; Xu 1999; Chiew and McMahon 2002; Buytaert et al. 2009; Chiew et al. 2009b). Increasingly, global hydrological models (GHM) and land surface schemes are used within a coupled land–atmosphere–ocean GCM to simulate runoff under future climates (e.g., Nijssen et al. 2001; Manabe et al. 2004). GHMs typically simulate water resource impacts based on a less explicit representation of catchment water resources than CHMs. The land surface schemes of these models attempt to account for feedback effects between the land surface and the atmosphere. This, however, makes land surface schemes relatively complex and it is difficult to specify meaningful values for many of their parameters. Further, the time step and spatial resolution in the GHMs tend to be unrealistic (e.g., time step of <1 h and length scale of >100 km). Despite recent advances, GHMs (e.g., Gosling and Arnell 2011) are still largely considered a research tool. While there are no published examples of the results from these models that have predetermined frameworks for adoption by catchment water managers and policy makers, GHMs are useful for informing high-level global-scale policy and decision making [e.g., Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report (AR4) Working Group II (WGII)].
A CHM approach was adopted in this paper because they have a well-established basis in water resource assessments (e.g., Thomas 1981; Servat and Dezetter 1993; Hughes 1995; Xu and Singh 1998; Mwakalila 2003; Chiew et al. 2009b). The scarcity of hydrological data across NA meant that the use of a lumped conceptual rainfall–runoff model parsimoniously matched the available data and model output requirements. More complex alternatives, such as distributed or semidistributed process-based hydrological models, whose parameter values vary in space, are notoriously difficult to parameterize accurately over large areas (Grayson et al. 1992), while less complex approaches, such as regional regression methods (SKM 2009), simple water balance models (Budyko 1974; Zhang et al. 2001), and runoff elasticity approaches (Chiew et al. 2006; Preston and Jones 2008; Petheram et al. 2010), are limited by their (i) general inability to simulate continuous daily time series of runoff necessary for environmental flow assessments and river system models and (ii) susceptibility to potential issues of nonstationarity in hydrological metrics under a changing climate (e.g., Milly et al. 2008; Petrone et al. 2010).
Different CHMs produce different estimates of runoff for the same forcing data. This has been observed by Dibike and Coulibaly (2007), Vaze et al. (2011), and others in temperate regions of the world. Those same studies have, however, demonstrated that for mean annual runoff (MAR) the divergence across CHMs is small (5% ~ 10%) relative to the modeled runoff results using rainfall projections from different GCMs (>40%) for the same downscaling methods. However, flood and low-flow events are generally considered to have greater ecological significance than MAR (e.g., Douglas et al. 2005; Lamontagne et al. 2005; Townsend and Padovan 2005; Webster et al. 2005; Hamilton et al. 2005) and periods of low-flow inevitably limit consumptive use (where continuous supply of water is needed). Petheram et al. (2012), working in the wet–dry tropics of NA, showed that the performance of five rainfall–runoff models (RRMs) was more variable for simulating low flows than when simulating the mean and midflow events.
The objectives for this paper are to
describe methods for estimating large-scale runoff in data-sparse regions of the world,
present broad summary results of runoff under current and future climates across a large tropical region, and
investigate the divergence across GCMs and RRMs for high- and low-flow runoff metrics using one scaling method.
This paper constitutes the first consistent and transparent assessment of runoff across all of NA and is the largest and most comprehensive regional-scale assessment of runoff under current and future climates in a tropical environment using lumped conceptual RRMs. It also provides a framework for modeling runoff from the wet–dry tropics in other large, data-sparse areas, such as Brazil, Mexico, Central America, tropical Africa, southern India, Sri Lanka, and parts of Indonesia, Malaysia, and the Philippines (see Peel et al. 2007).
The study region is described in Section 2. Section 3 describes the method by which streamflow gauging stations were selected, how conceptual RRMs were calibrated, and a pragmatic approach to regionalizing runoff in ungauged catchments. The method by which future catchment-scale climate variables were obtained is also described. In section 4 the runoff results under the historical and future climate are presented and the relative divergence in runoff across GCMs and RRMs assessed. Section 5 discusses why GCMs are so variable in the tropics and why conceptual rainfall–runoff models exhibit so much variability when simulating low flows and extreme runoff. This is followed by concluding remarks in section 6.
2. Study region
The study region includes the northernmost parts of Western Australia (WA), Northern Territory (NT), and Queensland (QLD), and encompasses Australia’s Timor Sea, Gulf of Carpentaria, and the most northern section of the Northeast Coast drainage divisions (Fig. 1), which is an area >1 250 000 km2. The NA landscape is generally of low relief by world standards (Bartle Frere is the highest peak at 1622 m), though outcropping bedrock ranges cause areas of regionally rugged topography. According to the updated Köppen–Geiger climate classification (Peel et al. 2007), the study region is classified as being predominantly tropical savanna (Köppen Aw) or subtropical steppe (Köppen BSh). The dominant vegetation is tropical savannas, and a narrow band of tropical rain forest occurs along the northeast coast and easternmost margin of the Gulf of Carpentaria drainage division (Köppen zone Am in Fig. 1). Extensive grazing is the predominant land use across the study area, and major industries to the NA economy include (in Australian dollars) intensive irrigated agriculture (about $160 million per year), commercial fishing (>$160 million per year), beef production (about $1 billion per year), tourism ($2.8 billion per year), and mining (>$9 billion per year) (Northern Australia Land and Water Taskforce 2009). Population centers are few and widely spaced, with a total population of only about 200 000 people (<1% of Australia’s total).
The Köppen–Geiger zones of NA are characterized as having highly seasonal, summer-dominated rainfall, high air temperatures, and high evaporation rates. At an annual scale, a large proportion of the study area is considered to be water limited (Li et al. 2009). Mean annual rainfall varies across the study area by more than an order of magnitude, from about 400 mm yr−1 south of the Gulf of Carpentaria, to over 4000 mm yr−1 on the steep coastal escarpments in the northeast coast drainage division. More than 90% of rain falls between November and March (Li et al. 2009) as the intertropical convergence zone extends south and passes over the northern extent of the continent. Rainfall is primarily generated by local and organized convection, tropical cyclones, or tropical depressions, which result in rainfall events of high intensity on a global scale (Jackson 1988). The wet season was defined here as from 1 November to 31 April. Orographic uplift on the Northeast Coast division results in high rainfall totals during the wet season and initiates some additional dry season rainfall for this region (Fig. 2).
Interannual variability of mean annual rainfall and runoff is higher than in other parts of the world of the same climate type (Petheram et al. 2008). All rivers considered here are externally draining and are “gaining systems” on an annual scale, because rainfall increases coastward. Many NA rivers have a base flow index (BFI) <0.3 and also have negative (but not significant) autocorrelation of annual flows (Petheram et al. 2008). These characteristics, together with the generally steep shape of runoff exceedance curves, suggest that most of NA has limited hydrologic storage capacity. Exceptions are the rivers set in the early to middle Paleozoic carbonate rocks in the NT and western QLD, Cretaceous sandstones in the NT, and Quaternary sands of northern Cape York (QLD), which exhibit perennial flow (Petheram and Bristow 2008; CSIRO 2009a,b).
a. Selection of gauging stations and preparation of streamflow data
Daily discharge and stage height data were obtained from state and territory government agencies for the 24-h period to 0900 LST (coinciding with the standard reporting period for rainfall). Only streamflow data that were quality coded by the jurisdictions as “satisfactory” or “good” were accepted for analysis.
The majority of gauges in NA commenced operation after 1970 (Petheram et al. 2009b), and hence 1970–present became the default calibration period. Only gauges with at least 10 yr of satisfactory unimpaired monthly discharge data were included (based on work by Yapo et al. 1996, Boughton 2007, and Kim et al. 2009). Yapo et al. (1996) state that having too few streamflow data against which to calibrate a model increases the sensitivity of model performance to the selection of the dataset. Because of the ephemeral nature of streamflow in many NA rivers, an additional condition imposed was that stations were required to have a minimum length of 4 yr of recorded flow during the wet season. Four exceptions were made to these criteria in areas with particularly low density of suitable gauging stations. Here it was deemed advisable to have some data rather than no data to avoid very large distances between donor and target catchments in the RRM regionalization, which is detrimental to model performance (Petheram et al. 2012).
Other gauges rejected were those where local knowledge indicated that considerable groundwater bypassed the gauge or flowed into the surface water catchment, where there was substantial distributary flow into or out of the upstream catchment, or if high or low flows were impeded by a regulatory structure estimated to capture–store water greater than ~10% of the catchment area.
With QLD streamflow gauging stations, the lack of detailed quality code descriptions accompanying each discharge measurement meant that an additional criterion was needed to ensure only “high-quality” stations were included. For this reason, QLD stations were only included if less than 60% of their total streamflow was at a height greater than the maximum gauged stage. This threshold was based on a compromise between using only high-quality stations and spatial density. For stations in the NT, rating curves for the majority of stations were inspected and those with poorly constrained rating curves were rejected. While WA rating curve data were not available, assessments of the quality of low- and high-flow ratings were provided by experienced local hydrographers. From ~450 possible stream gauging stations, the final database comprised 115 high-quality unimpaired gauges across the 1 250 000 km2 of NA.
b. Rainfall–runoff model calibration
Five lumped conceptual RRMs were used to simulate runoff across NA. In order of decreasing complexity, these models were Sacramento (Burnash et al. 1973); Soil Moisture Accounting and Routing Model (SMARG; Goswami et al. 2002); Identification of Unit Hydrographs and Component Flows from Rainfall, Evaporation, and Streamflow Data (IHACRES) Classic (Croke et al. 2006); Simplified Hydrolog (SIMHYD; Chiew et al. 2002); and Australian Water Balance Model (AWBM; Boughton 2004).
The rainfall–runoff modeling utilized gridded daily rainfall and areal potential evapotranspiration (APET) data at 0.05° resolution (approximately 5 km × 5 km), generated from the gridded Silo climate data (Jeffrey et al. 2001; www.longpaddock.qld.gov.au/silo/). The Silo surfaces were derived by spatial interpolation of rain gauge and climate observations using a trivariate (3D) smoothing spline with latitude, longitude, and elevation as the independent variables. APET was computed from the Silo daily climate surfaces using Morton’s wet environment evapotranspiration algorithm (Morton 1983). Based upon availability of historical climate data (Li et al. 2009), a reference historical climate period was defined as 1 September 1930 to 31 August 2007.
Each RRM was calibrated against observed streamflow data from unimpaired catchments with the same parameter values used for all grid cells within a catchment with a 1-yr warm-up period. Calibration was carried out by minimizing the sum of square errors between the modeled and observed values of daily streamflow [Eq. (1)]. To ensure low flows were well modeled, the approach of Petheram et al. (2012) was adopted. Here six calibration runs were undertaken for each model, using values of 1.25, 1, 0.75, 0.5, 0.25, and 0.05 for λ in Eq. (1). Values of λ less than 1 generally reduce the relative influence of high flows in the objective function. The large number of catchments meant that an automated calibration procedure was required. Calibration was automated using two methods applied sequentially: 1) shuffled complex evolution (Duan et al. 1993) and 2) the Rosenbrock method (Rosenbrock 1960). A log-bias constraint (Viney et al. 2009) was employed to ensure that the total flow volumes were well modeled:
where MODi and OBSi are the modeled and observed daily streamflows of day i. Model performance was reported in terms of the Nash–Sutcliffe efficiency (NSE) (Nash and Sutcliffe 1970). NSE values range between one and negative infinity, where a value of one indicates the model is a perfect predictor of the observed values and a NSE less than zero is indicative of the model being a poorer predictor than simply using an average of the observed values. For each model and for each catchment the “best all-around” calibrated parameter set (denoted by Ω) was selected from the six calibrated sets based upon the highest weighted NSE (NSEW) value as computed by Eq. (2). As there is usually a trade-off between simulating high- and low-flow events (Lee et al. 2005), the weightings of the NSE terms in Eq. (2) were devised after inspection of the calibration results for a selection of catchments for all five models, where the objective was to improve the performance of simulating low flows to a greater extent than the loss of performance in simulating the high flows:
In Eq. (2), the subscripts D, Ddry, Dfdc, M, Mdry, Mfdc, and LHfdc correspond to daily flow, daily dry season flow, daily flow duration curve, monthly flow, monthly dry season flow, monthly flow duration curve, and lower half of the daily flow duration curve (i.e., compares observed and predicted between the 50 and 100 percentiles of daily flow). In prediction mode, this method was found by Petheram et al. (2012) to result in an improvement to low-flow simulations, and there was no reduction in performance when simulating the high flows.
c. Regionalization of model parameters and runoff simulation
The NA landscape was subdivided into 603 subcatchments (average area ~2000 km2) based on landscape physiography, catchment area, location of streamflow gauging stations, environmental and cultural assets, and river regulatory structures. Subcatchment outlets are shown in Fig. 3. Version 3 of the GEODATA 9-s flow direction grid, derived from version 3 of the GEODATA 9-s digital elevation model (Hutchinson et al. 2008), was used to define catchment boundaries for each subcatchment.
Transposing calibrated parameter sets between up- and downstream catchments was found to be the best method of assigning parameter sets for NA (Petheram et al. 2012). However, this was not possible for most parts of NA because of the sparseness of streamflow gauges. For these catchments, Petheram et al. (2012) recommended using spatial proximity to regionalize “intact” model parameter sets. However, adjacent or nearby catchments need not always behave similarly, particularly if there are distinct differences between neighboring catchments in land use, geology, and/or climate. Hence, spatial proximity was augmented by knowledge of the local climate and geology. Particularly challenging areas to parameterize because of the absence of suitable gauges were the Cape Leveque coast and the coastal floodplains of the Gulf of Carpentaria (Fig. 1). The Cape Leveque coast anecdotally has very little runoff and few drainage lines. For this area, intact parameter sets were transposed from low-yielding sandy catchments in the Ord–Bonaparte region, ~700 km away (Fig. 3). Although there are a number of gauging stations on the coastal plains of the Gulf of Carpentaria, many of these stations capture streamflow from beyond their “natural catchment” during flood events. The challenge was to identify a station, with a well-defined rating curve, that only recorded flow originating within its natural boundaries. This was achieved through inspection of hydrographs, rainfall–runoff modeling, and high-resolution satellite imagery. The outcome was that much of the coastal floodplain draining to Gulf of Carpentaria was modeled using the parameters from a single gauge (918001). The resulting donor–target catchment mapping relationships for all catchments are shown in Fig. 3. Most parameter sets were donated from geographically near regions; the few exceptions are noted above.
In the Daly (NT) and the Fitzroy (WA) River catchments (Fig. 1), residual flow calibrations were undertaken at one middle- and one lower-reach streamflow gauge. Here, simulated runoff values from one or more upstream gauges were subtracted from the observed time series at a downstream gauge to produce a residual flow time series notionally representing runoff generated between the up- and downstream gauges. The RRMs were then calibrated to this “residual flow” by minimizing the objective function in Eq. (1) at a monthly time step. The use of a monthly time step overcame the need to account for lag and attenuation processes between up- and downstream gauges. Residual flow calibrations were found to give superior runoff simulations to methods using regionalized parameter sets based on an “informed” spatial proximity method for these two river catchments. This approach was only applicable for the Daly and Fitzroy Rivers because they were large (>50 000 km2) and had suitable gauges in series along their main channels. These catchments also receive a significant portion of low flow through groundwater discharge; hence, residual flow provides a better calibration than regionalization from catchments without this component.
To produce a comprehensive runoff surface covering the entire project area, areas between the downstream-most subcatchment and the coast were assigned to 18 “coastal subcatchments” for rainfall–runoff modeling purposes (Petheram et al. 2009b) based on the length of coastline, the diversity of climate, and the locations of suitable gauges for parameter donation.
d. Developing future climate data
GCMs are an important tool for simulating global and regional climate. However, GCMs provide information at a resolution that is too coarse to be used directly in catchment-scale hydrological modeling. For example, rainfall is generated too often and at too low intensity (Stephens et al. 2010). This is particularly important in tropical regions where even high-resolution coupled climate models simulate tropical cyclones to be weaker and larger in spatial extent than those observed (Solomon et al. 2007). Hence, an intermediate step of transforming the broad-scale GCMs outputs to catchment-scale variables is generally performed. Two fundamental approaches exist for downscaling large-scale GCM output to a finer spatial resolution: dynamic downscaling and statistical downscaling (Fowler et al. 2007 and references therein). Dynamic downscaling is computationally very intensive, however, and there are limited archived future daily GCM simulations from which to downscale, while sophisticated statistical downscaling methods (e.g., regression models, weather typing schemes, and weather generators) are laborious. This limits their applicability to large regional-scale hydrological assessments. Recent studies also suggest that no one downscaling method is superior across a range of hydrological metrics (Mitchell 2003; Whetton et al. 2005; Prudhomme and Davies 2009; Chiew et al. 2010; Segui et al. 2010), and in a comparative assessment of scaling methods in an area dominated by large-scale storm events, Salathé (2003) found simple scaling methods to be effective in simulating hydrological systems. For these reasons a simple scaling technique—the empirical scaling method (Chiew et al. 2009b)—was employed for this study. Although similar to pattern scaling (Whetton et al. 2000; Todd et al. 2011), the empirical scaling method used here takes into account changes in daily distribution of rainfall as well as changes in seasonal means. The GCM selection process is outlined in the next section and is followed by a brief summary of the empirical scaling method.
1) GCM selection
A commonly held premise of hydrological prediction is that models that are better able to simulate the past are more likely to accurately simulate the future. In an Australia-wide assessment of rainfall simulations using 23 GCMs, Chiew et al. (2009a) found that there was no clear difference in future rainfall projections across NA between the better and poorer-performing GCMs and that the use of weights to favor the better GCMs gave similar rainfall results to modeling using all the 23 GCMs. The use of paleo observations to determine which, if any, GCMs have the proper sensitivity in tropical regions is also of little value because tropical oceanic and terrestrial paleo-climate proxies are reported to conflict (Rind 2008).
To assess the uncertainty and simulate the range of future runoff predictions, future climate projections from a large range of archived GCM simulations were downloaded from the Program for Climate Model Diagnosis and Intercomparison (PCMDI) website (http://www-pcmdi.llnl.gov). Of the 23 GCMs examined by Chiew et al. (2009a), 15 had readily available daily rainfall data. These 15 GCMs were selected for use in this study and are described in Table 1.
2) Scaling method
The empirical scaling method employed here utilized the output from the 15 GCMs listed in Table 1 to scale the 77 yr of historical daily rainfall and APET sequences to construct the 15- by 77-yr future daily rainfall and APET sequences. The method comprised two broad steps: the first considered changes in the mean “seasonal” values of rainfall and APET, and the second step considered changes to the daily distribution of rainfall within each “season.” A brief overview of the method is provided below. For a comprehensive description of the empirical scaling method of this study and the validity of assumptions, readers are directed to Li et al. (2009) and Chiew et al. (2009b).
The first step estimates the seasonal scaling factors for four 3-month blocks (December–February, March–May, June–August, and September–November) as first described by Mitchell (2003). Using the archived monthly simulations for each of the 15 global climate models, the simulated rainfall was plotted against simulated global average surface air temperature for each season and each GCM grid point. A percentage change in rainfall per degree of global warming relative to the GCM simulated baseline was computed by fitting a linear regression. The results presented in this paper are for a 1° global warming scenario, which is roughly in line with the Solomon et al. (2007) projected median warming by ~2030 relative to ~1990.
The second step accounts for changes to the distribution of daily rainfall. This is potentially important because many GCMs indicate that future extreme high rainfall is likely to be more intense, even in regions where a decrease in mean seasonal or annual rainfall is expected (Fig. 4). Daily scaling factors were computed for different rainfall percentile ranges (i.e., 0%–2%, 2.5%–7.5%, 7.5%–12.5%, and so forth, until the observed grid cell rainfall is less than 1 mm) by comparing daily rainfall simulations from the 15 GCMs for a single Special Report on Emissions Scenarios (SRES) A1B run for two 20-yr time periods: 2046–65 and 1981–2000. Daily scaling factors were then obtained for all rainfall percentiles by interpolating between the values for each of the rainfall percentile ranges. To ensure compatibility with the seasonal scaling factors, the changes for the different rainfall percentiles were expressed as a percent change per degree of global warming. The above daily scaling factors were then used to scale the 77-yr historical rainfall sequences from each 0.05° grid cell that fell within the GCM grid cell. The entire series was then scaled so that the future mean rainfalls in the four seasons were the same as those estimated using the seasonal scaling factors. This step was necessary because the daily scaling factors were available only for two time periods from a single modeling run, whereas the seasonal scaling factors were determined using a large number of data points from several ensemble runs for the archived GCMs’ continuous monthly simulations over more than 100 yr. This process was repeated for each GCM, each season, and for each GCM grid cell. Using an empirical scaling method to transform broad-scale GCM outputs to catchment-scale variables is herein denoted GCMES.
Fig. 4 shows the number of GCMES projecting a decrease (or increase) in future mean annual rainfall and different percentiles of daily rainfall. Although no GCMs adequately resolve tropical cyclones (Solomon et al. 2007)—in part because of coarse model resolution—there is a strong convergence of GCMs projecting an increase in the GCMES of 1% daily rainfall. Theoretically, it is asserted that a warming climate will result in higher sea surface temperatures and an increase in the atmospheric water content (as dictated by the Clausius–Clapeyron relationship), both of which point to an intensification of the hydrological cycle (Huntington 2006; Pall et al. 2007). As a result, there has been much speculation that tropical cyclone intensity will increase with a warmer climate but not necessarily their frequency (Gualdi et al. 2008; Bender et al. 2010). This assertion is supported by some recent observational studies of rainfall variation in the western Pacific (Ikema et al. 2010).
3) Comparison of rainfall–runoff model variability and GCM variability
There are three primary sources of variability in modeling runoff under future climates: (i) uncertainty in GCM future projections, (ii) the method of downscaling, and (iii) the hydrological model. In this paper, we compare the relative uncertainty in two of these sources: GCM future projections and hydrological models. The uncertainty due to the downscaling method is not assessed here; this has been assessed elsewhere (e.g., Prudhomme and Davies 2009; Chiew et al. 2010).
For this comparison, runoff was simulated for each of the 603 subcatchments using five RRMs for 15 different future climate projections (GCMES); a total of 45 225 RRM runs were performed (i.e., 603 × 5 × 15). For each GCMES, the maximum range in runoff simulated by the five RRMs was computed for each subcatchment. These results were then compared to the divergence in modeled runoff using 15 GCMES—that is, for each RRM the maximum range in simulated runoff from using 15 different GCMES was computed for each subcatchment.
A comparison was undertaken for each of the runoff metrics: MAR, mean annual dry season runoff (MARD), coefficient of variation of annual runoff (Cv), 0.1 percentile daily runoff (P0.1), 1st percentile daily runoff (P1), 10th percentile daily runoff (P10), 30th percentile daily runoff (P30), 50th percentile daily runoff (P50), and 80th percentile daily runoff (P80).
In the retrospective and prospective assessments (sections 4a and b, respectively) we report the results of runoff for the IHACRES Classic–Sacramento (ISa) multimodel ensemble. This multimodel was found to be the best-performing all-around model for NA (Petheram et al. 2012). In section 4c we consider all five RRMs listed in section 3b.
a. Retrospective assessment
Figure 5 shows the proportion of catchments exceeding a certain NSE and bias (i.e., total simulated runoff minus total observed runoff, expressed as a percentage of the total observed runoff) when locally calibrated (calibration mode) and when the models were used to predict runoff in ungauged catchments (prediction mode). Model performance for the five individual RRMs were very similar to the results presented by Petheram et al. (2012). Potential sources of error in the modeling include the use of erroneous streamflow and rainfall data, donor and target catchments with dissimilar hydrological responses, calibration bias, and the use of daily rainfall data in catchments with subdaily rainfall–runoff responses.
Under the historical climate, predicted runoff varied from <20 mm yr−1 in the southern parts of the Flinders River catchment to >2000 mm yr−1 in the Mossman River catchment (Fig. 6). Broadly, the spatial distribution of MAR is similar to that of mean annual rainfall.
The majority of MAR coefficient values fell within the range 0.05–0.4, and the spatial continuity of MAR coefficient values and relationship with rainfall suggest that the runoff estimates are relatively robust (Fig. 7a). In Fig. 7b the calibration catchment with the very high runoff coefficient (0.89; Mossman River, Fig. 1) had an observed streamflow that exceeded the volume of rain falling on the catchment. However, it is not possible for the multimodel ensemble to generate more runoff than rainfall. It is likely that the interpolation method used to generate the Silo rainfall surface underpredicted the orographically generated rainfall along the steep, sparsely gauged coastal escarpments within the Mossman and neighboring catchments. These catchments, however, constitute <1% of the study area.
To place the modeled runoff coefficients within a global context, a curve of hypothetical MAR coefficients was computed using the two-parameter water balance model of Zhang et al. (2001) and plotted in Fig. 7b. Zhang et al. (2001) fitted their model (a rational function) to 240 catchments worldwide, although few (<20%) of the catchments were from tropical regions and none were from tropical savanna environments. The water balance model of Zhang et al. (2001) is plotted for a tree-dominated landscape, and the fact that this plots below many of the calibrated catchment runoff coefficients suggests that runoff coefficients in NA may be higher, for the same broad vegetation type and the same mean annual rainfall, than the global average.
Figure 8 illustrates the spatial distribution of key hydrological metrics under the historical climate. MARD was <5 mm season−1 across the majority of NA. The exceptions were along the northern coast drainage division and in those subcatchments set within a carbonate environment. For this reason, the P50 and P80 had a lower correlation with rainfall than the P1 and P10, respectively.
b. Prospective assessment
To assess the future impact of climate on runoff, the future climate projections were used to force the IHACRES Classic–Sacramento multimodel ensemble. Future climate projections were downscaled from the 15 GCMs using the empirical scaling method (as described in section 3d), and each used 77 yr of daily rainfall and APET data. In ungauged catchments, the donor–target catchment relationships established in section 3c were used. The results are presented in Fig. 9.
In Fig. 10, the results from Fig. 9 were summarized into wet, median, and dry scenarios by selecting the 2nd wettest (~10th percentile), 8th wettest (50th percentile), and 14th wettest (~90th percentile) future runoff projection for each 0.05° grid cell. This is expressed as a percentage change (%) in MAR in Fig. 10 (left column) and as an absolute change in MAR (mm yr−1) in Fig. 10 (right column). Preston and Jones (2008) also estimated the percentage change in MAR across the Australian continent per 1°C of global warming, but using an ensemble of 12 global and regional climate models and a simple runoff elasticity approach. The mean percentage runoff sensitivity from the 12 climate models for NA presented by Preston and Jones (2008) was broadly similar to the median percentage change in future runoff presented in Fig. 10 (i.e., ±5%). It should be noted, however, that the runoff elasticity relationships used by Preston and Jones (2008) were applied to runoff data from the NLWRA (2006), which was based on few stations in NA (<10). While the broad-scale runoff sensitivities to a 1°C of global warming of the two studies are similar, absolute runoff values as computed here and reported in the National Land and Water Resources Audit (NLWRA) can be quite different at the catchment scale.
The percentage change in MAR, mean annual rainfall, and mean annual APET spatially averaged across the three drainage divisions are plotted in Fig. 11. GCMES are ranked by increasing MAR. Ranked percentage change MAR was very strongly correlated to ranked percentage change mean annual rainfall (Spearman ranking correlation coefficient of 0.86–0.97 across the three divisions). Percentage change mean-annual APET had a strong inverse correlation to MAR (−0.63 to −0.76), but was also inversely correlated to mean annual rainfall (−0.57 to −0.71). Where there was a small divergence in the pattern of percentage change in MAR, this was primarily attributed to changes in rainfall occurring in catchments with different rainfall–runoff elasticity responses (i.e., due to different runoff coefficients; Chiew 2006) and differences between the GCMES in terms of mean annual APET and daily rainfall percentiles (Fig. 4).
Figure 11 shows that all GCMES data indicate a future increase in APET. In this study, we used Morton’s wet environment evapotranspiration algorithm to compute APET (section 3b), which is a function of actual vapor pressure, net radiation, and air temperature. This formulation of APET was used because of a general absence of wind speed data across NA for the entire 1930–2007 period (McVicar et al. 2008; this gridded dataset only begins in 1975) and there is no reliable indication from the GCMs of how wind speed will change into the future. However, it has been argued that the most robust methods of computing APET are those that are fully physically based (e.g., Wallace 1995; Donohue et al. 2010), with many observational studies reporting recent decline in atmospheric evaporative demand, which has been attributed in part to a widespread decline in recent wind speeds (see McVicar et al. 2011 and references therein). Nevertheless, numerous studies have shown that lumped RRMs are relatively insensitive to the APET formulation (e.g., Andreassian et al. 2004; Oudin et al. 2005) and hence APET formulation is considered here to have little effect on the percentage change in modeled runoff.
c. Assessment of induced variability
Figure 12 shows the range of results for various runoff metrics for the 603 subcatchments across NA for (i) each RRM informed by projections from the 15 GCMES and (ii) each GCMES runoff simulated by the five RRMs. The y axis plots the absolute range. Very similar results were attained when just the calibration catchments were analyzed.
The results show that, for MAR, the divergence across the 15 GCMES was considerably larger than the divergence across the five RRMs (Fig. 12). This is in agreement with observations made by others in temperate regions of the world, as discussed earlier, and this result is not surprising given the similarity in RRM bias as shown in Fig. 5b. Further to this observation, however, we found that this result holds for the mid- to high-flow metrics (i.e., P30 through to about P1). For the “extremely” high runoff events (i.e., P0.1), however, the divergence across the RRMs was similar to the divergence across the GCMES. Further, for the low-flow metrics (MARD, P50, and P80), the divergence across the 15 GCMES data was smaller than the divergence across the five RRMs. To our knowledge, this has not been previously reported in the scientific literature. The divergence across the GCMES data and RRMs was similar at about the P40 metric. Hence, for averages and mid- to high-runoff metrics, better predictions will come from improved GCM projections. For extreme events and low flows, improvements (e.g., to model structure and parameterization) are needed in both GCMs and rainfall–runoff modeling.
The results presented in Fig. 12 are for the full range of GCMES data and RRMs. However, the results were similar when the range in GCMES data and rainfall–runoff values were taken to be between the 10th and 90th percentile values. In section 5a, we discuss why there is such divergence across the 15 GCMES data for MAR and the mid- to high-flow metrics, and in section 5b we discuss why there is such divergence across the five RRMs for the low-flow metrics.
a. Why are GCMs so highly variable in the tropics?
In modeling MAR across NA, the greatest uncertainty in future predictions comes from the very large range of future rainfall projections from the different GCMES. Over northwestern Australia a small majority of GCMES project a weak rainfall increase, but there is no clear consensus over the eastern half of NA (Fig. 4). Approximately 70% of the GCMES, however, project future mean annual rainfall within ±5% of the historical value (i.e., negligible modeled trend) (Fig. 11) and it is possible for such small trends to be generated by internal variability modeled by the GCMs (Cai et al. 2010, 2011). In that scenario, because internal variability is independent across models, the resultant trends will be different from one model to another, which seems to be consistent with the results presented here. Hence, one could argue that the consensus result is that mean annual rainfall will not change into the future. Clearly there are some deficiencies in the GCMES and we are not sure how this may impact on trends in mean annual rainfall.
Over northwestern Australia, the GCMES projected mean annual rainfalls were considerably less than the wet recent past (1996–2007) (Li et al. 2009). This arises from the fact that the major convection in the present-day climate model is incorrectly situated over the far western Pacific. Hence, under global warming, the main convection weakens, leading to the reduction (Shi et al. 2008; Cai et al. 2011). This reduction in rainfall is not the result of a decrease in Northern Hemisphere aerosols in the models as previously thought.
Over the eastern half of NA, the lack of consensus in future trends is, in part, because given climate change it is not clear how the phase, intensity, and amplitude of the El Niño–Southern Oscillation (ENSO) will change (Collins et al. 2010). While ESNO is considered to be the primary source of global climate variability over the 2–7-yr time scale, it has its largest influence on weather patterns and climate variability over the tropical Pacific and the continental landmasses on either side [i.e., coastal regions of South America, Southeast Asia, and north and eastern Australia (Philander 1989)]. In these areas, rainfall teleconnections with ENSO are not well simulated and there is a large spread among models (Cai et al. 2009, 2011).
More broadly, and across the tropics as a whole, Rind (2008) attributes the wide range of GCM responses to the models’ cloud cover changes as the climate warms. Tropical warming in GCMs is strongly governed by the response of low and high tropical cloud cover to increased greenhouse gases, and different GCMs use different cloud parameterization schemes. Rind (2008) argues that a consequence of having a large uncertainty in low-latitude (i.e., tropics) and high-latitude climate sensitivity (primarily due to the complexities of the cryosphere and related feedbacks) is that there are large uncertainties in latitudinal temperature gradients, which affect the confidence in projections of atmospheric dynamics and hydrological response to a warming climate over the entire globe, not just the latitudinal extremes.
Although large improvements have occurred between early GCMs and the current versions (which include, e.g., coupled dynamic ocean models without flux corrections, incorporation of cloud liquid water routines, finer horizontal and vertical resolutions, and a much greater number of models), the degree of uncertainty in tropical sensitivity remains unchanged (Rind 2008). Climate scientists do, however, have an improved qualitative understanding of why the models are producing different sensitivities and a better understanding of the uncertainty that results from not knowing these sensitivities. Future improvements to GCMs’ performance in tropical regions will come from improved simulation of tropical atmospheric convection and tropical Indo–Pacific circulations in the mean climate, from which model climate change will occur. This is linked to a proper representation of oceanic sea surface temperature in the model equatorial Pacific. A key source of improvement should come from an ability to simulate the ENSO pattern, periodicity, and intensity, and the associated feedback processes that govern these properties (Collins et al. 2010). To see if the lack of consensus in rainfall holds into the future, we will have to wait until the new generation of models is submitted to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change.
Although it is likely that improvements to GCMs will result from the above advances, the range of plausible rainfall projections is likely to remain large. This realization has spurred a variety of opinions on the utility of GCMs for adaptation-type studies (Kundzewicz and Stakhiv 2010) and methods for managing potentially divergent output from multiple GCMs. Anagnostopoulos et al. (2010) question whether climate is at all predictable deterministically, and advocate a paradigm shift to stochastic methods and moving from trying to reduce the uncertainty to quantifying the uncertainty. Nevertheless, studies like ours and others that use projections from a large range of GCMs provide a useful indication of the envelope of potential climate change on various hydrological metrics (Preston and Jones 2008; Gosling et al. 2010). In the future, improved hydroclimatic projections, with reliable probabilistic quantification of uncertainties, will help improve the risk-based water planning and management decision making process.
b. Why do conceptual rainfall–runoff models exhibit so much variability when simulating low flows and extreme runoff?
Although the RRMs used here are only sensitive to changes in rainfall and to a lesser extent APET, for the low-flow metrics the differences between individual RRMs under the historical climate was large relative to the changes that occurred in each RRM under the 15 GCMES future climates. The relative divergence (i.e., the divergence expressed as a percentage of the historical value for each of the 603 subcatchments) across the 15 GCMES, however, is larger than the relative divergence across the five RRMs, for all metrics examined.
The reasons lumped conceptual RRMs produce diverging “absolute” results for low-flow metrics under the historical climate as well as the future climate are likely to be twofold: (i) lumped, conceptual RRMs have built-in structural limitations that cannot easily be overcome when it comes to simulating processes responsible for low-flow generation (in particular groundwater interaction with a river system); and (ii) regionalizing model parameters on the basis of spatial proximity does not adequately capture the spatial variability inherent in landscape processes responsible for low-flow generation, particularly in regions with few streamflow gauging stations (see Petheram et al. 2012). In the case of the physical similarity regionalization methods, which use “lumped” landscape attributes, it is likely that we cannot measure, map, or capture in a single term the controlling landscape attributes necessary to represent low-flow hydrology. A more specific calibration against low-flow metrics than was undertaken here is unlikely to yield better results than achieved here under prediction mode. For example, Petheram et al. (2012) showed that despite the use of optimization algorithms that improved the simulation of low flows under calibration mode, there was only a small gain in prediction mode.
Clearly, a spatially distributed CHM that explicitly simulate recharge and groundwater flow processes would be the most appropriate choice of model. However, data are rarely available to parameterize these models regionally. In using lumped conceptual RRMs calibrated against historical data to simulate runoff under future climates, we are essentially ignoring the feedback effects between land surface and climate (e.g., effects of CO2 fertilization on plant water use and catchment yield). We do not consider this to be an unreasonable proposition, given the results of previous studies (e.g., Gosling et al. 2011), the potential for misrepresentation and the uncertainty of the net effect of global warming, and increased CO2 concentrations on plant water use at the catchment scale (Körner 2006). Inclusion of these processes in the future, however, is only likely to increase the variability induced by the CHM.
c. Implications for water management in northern Australia
Water managers in NA are facing increasing demands for water during the dry season from pastoralists, irrigators, and forestry and mining companies, and with runoff in many catchments during this period <0.1 mm day−1, competition is likely to increase in accordance with development. In the NT, for example, in areas where no water allocation plan has been declared, and there is insufficient evidence to show otherwise, no more than 20% of instantaneous streamflow and no more than 20% of aquifer recharge can be allocated. Hence, during the dry season, water managers may be required to manage catchment runoff to within 0.02 mm day−1 or less. The analysis presented in section 4c, however, highlights the challenges of using RRM to inform low-flow-related management decisions in the wet–dry tropics, whether under current or future climates.
Despite these limitations, the general absence of hydrological data across NA has meant that the runoff results and calibrated model parameters from this study have been widely used—for example: (i) to assess the reliability of storages and security of water supply under future climate and development scenarios (Petheram et al. 2009a), (ii) to assess changes in flow regimes under future climate and development scenarios with respect to aquatic ecosystems (McJannet et al. 2009), (iii) to compute fluxes of nutrients and carbon to the ocean, and (iv) by engineering and agricultural consultants advising on irrigation expansion and mining developments in NA. The level of confidence these users will have in the modeling results will vary depending upon which aspect of the flow regime are most relevant to their study, whether their catchments are gauged or ungauged, and whether it is the absolute or relative values that are of greatest importance.
6. Concluding remarks
This paper describes the largest and most comprehensive assessment of projected future climate impacts of runoff undertaken in the tropics globally, using conceptual RRMs. For the best-performing multimodel ensemble (ISa), under the historical climate (1930–2007), simulations of runoff across the project area ranged from <20 mm yr−1 in the southernmost parts of the study area (Flinders River catchment) to >2000 mm yr−1 in a northeastern catchment (Mossman River). Using ISa, approximately half the global climate models (empirically scaled to transform broad global climate model outputs to catchment-scale variables-designated GCMES) used here resulted in a spatially averaged increase in MAR (by up to 29%) and half resulted in a decrease (by up to 26%), both relative to the 1930–2007 average. However, approximately 70% of the GCMES differed by less than ±5% from the historical rainfall (1930–2007), and it is possible for such small trends to be generated by internal variability modeled by the GCMs.
The range in the modeled impact on the MAR as estimated by five RRMs (for individual GCMES) was considerably less than the range of modeled MAR results using 15 GCMES (for individual RRM). Further, we found that this result holds for the mid- to high-flow metrics (1st through to 30th percentile daily runoff), but that the degree of divergence between the RRMs and the GCMES were similar for the 0.1 and 40th percentile daily runoff. Three reasons for the wide spread of rainfall values from the GCMES were considered: (i) inconsistencies in the modeled location of convection over the Pacific; (ii) lack of consensus on how the phase, mode, and amplitude of the ENSO will change under a future warmer climate; and (iii) different cloud parameterization schemes.
For the low-flow metrics (MARD, 50th, and 80th percentile daily runoff) the divergence across the 15 GCMES-based runoff projections was smaller than the divergence across the five RRMs—a finding that, to our knowledge, has not been previously reported. We attribute this to (i) conceptual RRMs having built-in structural limitations to simulating low flows and (ii) model parameter regionalization methods not capturing the spatial variability in processes that represent low-flow processes.
This work indicates that, for average and mid- to high-flow metrics, improvements in future flow predictions will come from improved GCM projections. For extremely large runoff events and low flows, however, improvements are needed in both GCMs and rainfall–runoff modeling applications.
This study was part of the Northern Australia Sustainable Yields (NASY) project (http://www.csiro.au/partnerships/NASY.html)—a 1-yr project funded by Australia’s National Water Commission. The authors are grateful to the WA Department of Water (DoW); NT Department of Natural Resources, Environment, the Arts and Sport (NTRETAS); and the QLD Department of Environment and Resource Management (DERM) for collecting and supplying streamflow data. Leith Bowyer (DoW), Vince Manley, and Phil Kerr (DERM) are thanked for their assistance in identifying suitable streamflow gauging stations. The authors would also like to thank Dr. David Post, Dr. Steve Charles (both CSIRO Land and Water), and three anonymous reviewers for comments on drafts of this paper. The first author was supported by the Water in Northern Australia stream of the CSIRO Water for Healthy Country Flagship.