A forecasting lead time of 5–10 days is desired to increase the flood response and preparedness for large river basins. Large uncertainty in observed and forecasted rainfall appears to be a key bottleneck in providing reliable flood forecasting. Significant efforts continue to be devoted to developing mechanistic hydrological models and statistical and satellite-driven methods to increase the forecasting lead time without exploring the functional utility of these complicated methods. This paper examines the utility of a data-based modeling framework with requisite simplicity that identifies key variables and processes and develops ways to track their evolution and performance. Findings suggest that models with requisite simplicity—relying on flow persistence, aggregated upstream rainfall, and travel time—can provide reliable flood forecasts comparable to relatively more complicated methods for up to 10 days lead time for the Ganges, Brahmaputra, and upper Meghna (GBM) gauging locations inside Bangladesh. Forecasting accuracy improves further by including weather-model-generated forecasted rainfall into the forecasting scheme. The use of water level in the model provides equally good forecasting accuracy for these rivers. The findings of the study also suggest that large-scale rainfall patterns captured by the satellites or weather models and their “predictive ability” of future rainfall are useful in a data-driven model to obtain skillful flood forecasts up to 10 days for the GBM basins. Ease of operationalization and reliable forecasting accuracy of the proposed framework is of particular importance for large rivers, where access to upstream gauge-measured rainfall and flow data are limited, and detailed modeling approaches are operationally prohibitive and functionally ineffective.
Flooding poses a severe constraint on socioeconomic development in flood-prone areas across the world. On average, river flooding affects 21 million people and $96 billion in gross domestic product (GDP) worldwide each year, with developing countries seeing more of their GDPs exposed to flood risks than the developed world (WRI 2015). South Asia is identified as one of the hardest-hit areas, with upward of 9.5 million people affected by annual floods. India and Bangladesh share the top two positions on the list of flood-prone nations, with Pakistan coming in at fifth and Afghanistan at ninth (WRI 2015).
A forecasting lead time of ~5–10 days is desired to increase the flood response and preparedness (ADPC 2002; Webster and Hoyos 2004; CEGIS 2006) in flood-prone regions across the world, including Bangladesh. The major limitation in providing short-range (3–5 days) to midrange (7–10 days) flood forecasts is the large uncertainty in precipitation forecast data (Clark and Hay 2004; Pappenberger et al. 2005; Cloke and Pappenberger 2009; Charba and Samplatsky 2011; Dravitzki and McGregor 2011). The accuracy of precipitation forecasts from weather models is currently limited to 1–5 days globally (Clark and Hay 2004; Cloke and Pappenberger 2009; Pappenberger and Buizza 2009; Bauer et al. 2015), much shorter than the desired forecast lead time. Consequently, lumped, semidistributed, and distributed hydrological models and a range of numerical, statistical, and satellite-driven methods (e.g., Jasper et al. 2002; de Roo et al. 2003; Bartholmes and Todini 2005; Siccardi et al. 2005; Webster et al. 2010; Hopson and Webster 2010; Yucel et al. 2015) continue to be developed and refined but have yet to produce operationally credible flood forecasts with a 5–10-day lead time. At the same time, the comparative utility of these computationally intensive and complicated methods with respect to simple numerical approaches has not been fully explored.
Here, we examine the current flood forecasting methods for the Ganges, Brahmaputra, and Meghna (GBM) river basin system (Fig. 1). The GBM river basin system in South Asia, with a combined drainage area of approximately 1.7 million km², is the third-largest freshwater outlet to the ocean (Chowdhury and Ward 2004). The three river basins of this system feature unique geomorphology and flow regimes, and their tributaries flow through different ecological, social, economic, and political realms (Biswas 2008). We propose a data-based streamflow and water level (WL) forecasting scheme, named the Requisitely Simple (ReqSim) flood forecasting model, that uses the notion of requisite simplicity. Requisite simplicity provides a framework that discards some details while maintaining the conceptual clarity and scientific precision of a complex system (Stirzaker et al. 2010) that includes many interactions, processes, and feedbacks, such as the rainfall–runoff and flood forecasting models considered here. Requisite simplicity may be achieved by taking a closer look at the dominant processes of a complex system, reducing the system to its essential components, and identifying the emergent properties of the system (Ward 2005; Stirzaker et al. 2010; Cilliers et al. 2013). A data-based model aims to learn from the data how the system works by establishing a relationship between a series of inputs and a series of outputs (Beven 2012).
At present, the Flood Forecasting and Warning Center (FFWC) of the Bangladesh Water Development Board (BWDB) generates and disseminates three types of flood forecasts: 1) short-range (1–5 days) deterministic WL forecasts at 40 river points, 2) medium-range (1–10 days) probabilistic WL forecasts at 35 river points, and 3) Jason-2 altimetry-based WL forecasts at 13 river points (www.ffwc.gov.bd). FFWC requires upstream flow information for the three major transboundary rivers between Bangladesh and India (i.e., the Ganges, Brahmaputra, and upper Meghna Rivers) to generate effective flood forecasts in Bangladesh. This requirement poses a fundamental difficulty because gauge-measured upstream streamflow and WL data are of limited availability. As a result, forecasting the flow of these transboundary rivers is essential for FFWC. It appears that if the upstream inflows of these three major rivers are predicted with high accuracy, FFWC’s current operational flood forecasting model (i.e., a one-dimensional river model) can provide more reliable forecasts within Bangladesh (Jakobsen et al. 2005). Thus, improving the GBM streamflow or WL forecasts at Hardinge Bridge (HB) on the Ganges, Bahadurabad on the Brahmaputra, and Amalshid on the upper Meghna (UM) or Barak River is still considered a major challenge for the existing operational forecasting capability of FFWC and for the overall flood disaster management plan in Bangladesh (Hopson and Webster 2010; Hossain et al. 2014a; Hossain and Bhuiyan 2016).
b. Attempts to simplify and requisite simplicity
Two considerations are needed to understand the limits of flood forecasting accuracy: a catchment condition that focuses on the state of the catchment and an atmospheric condition that examines the predictability of precipitation inputs. Not surprisingly, precipitation, a key input to a flood forecasting model, is generally considered to be the largest source of uncertainty for medium- to long-range flood forecasting (Pappenberger et al. 2005; Bauer et al. 2015; Wu et al. 2017). Flood forecasting models, whether using observed or forecasted rainfall, are also limited in their ability to capture local characteristics of the rainfall–runoff processes because we do not know the soil and geological properties of catchments at the scale needed to model the relevant dynamics (Marks and Bates 2000; Pappenberger et al. 2005). In fact, the fundamental problems in applying physically based models to flood forecasting have long been recognized (Beven 1989; Wood et al. 2011; Beven 2012): the scale mismatch between model equations and the heterogeneity of rainfall- and runoff-generating mechanisms; practical constraints on solution methodologies; and uncertainties associated with parameter estimation, model calibration, and validation.
A flood forecasting system usually includes a large number of nonlinear relationships among rainfall and runoff processes with feedback. To generate a perfect (or nearly perfect) model of such a system, one has to model everything (or nearly everything). Yet, as Lorenz (1963) aptly pointed out, in a nonlinear system with feedback, an approximately accurate representation of the present does not guarantee an accurate forecast of the future because of sensitivity to the initial conditions. Consequently, to develop a reliable and robust flood forecasting model, we have to reduce the complexity of processes, interactions, and feedback by simplifying the model structure. Currently, we do not have a generally accepted criterion for deciding what constitutes a simple model in the context of modeling complexity and functional utility.
We adapt an approach proposed by Ward (2005), in the context of system design, to illustrate the relationship between modeling complexity and functional utility (Fig. 2). We begin at the lower left quadrant in region A. Here the system is simple: cause–effect relationships are known, modeling is guided by fundamental principles, and mathematically tractable solutions are usually found. As we introduce more realism to our model, we move toward region B. During this move, new variables, processes, and dynamics are introduced. An increased level of modeling complexity—from region A to B—thus usually leads to better functional utility (e.g., forecasting accuracy). At this stage, we may have a functional flood forecasting model for a given purpose. Further increasing the level of model complexity, however, may not lead to better forecasting accuracy.
Over the last several decades, with an advanced understanding of atmospheric physics, deployment of a global upper-atmospheric observation network, and increased computational power, our ability to provide skilled short-term (1–5 days) precipitation forecasts has improved. However, Lorenz (1963), in his pioneering work on chaos, showed that nonlinear systems are only predictable for a finite time owing to their sensitive dependence on initial conditions. This puts a fundamental limit on our ability to provide accurate medium-range (5–10 days) precipitation forecasts. For example, a careful examination of over 40 years of 8-day atmospheric forecasts over the contiguous United States (e.g., Clark and Hay 2004; Cloke and Pappenberger 2009) or the European Centre for Medium-Range Weather Forecasts (ECMWF)-generated precipitation forecasts over the Danube basin (Pappenberger and Buizza 2009) suggests a limit on the predictability of precipitation with very low to modest skill out to 4–5 days. Given that the antecedent conditions likely dominate the runoff over lead times shorter than the time of concentration of the basin (Voisin et al. 2011), moderately skillful numerical precipitation forecasts at such short lead times may not lead to better forecasting accuracy. However, there is a perception that increased forecasting accuracy is achievable by simply increasing the space–time resolution and physical parameterization of numerical models. This perception may lead one on a journey from region B to C, in which model complexity increases without any appreciable change in functional utility.
While in region B, further progress may come not from adding more modeling complexity, but from simplification. Here, we argue that simplification may be achieved as we move from region B to D by taking a closer look at the dominant processes for large river basins and reducing the model to its essential components. The proposed requisite simplicity—to paraphrase Einstein, “simple but not simpler”—is achieved by identifying the key components of the rainfall–runoff process and developing ways to track their evolution for our ReqSim flood forecasting models described in section 2a.
c. Premise and structure of this paper
Building on the notion of requisite simplicity, we present a linear regression-based ReqSim flood forecasting model for the Ganges, Brahmaputra, and Meghna (Barak) basins at their most downstream gauging locations—HB, Bahadurabad, and Amalshid, respectively—and report the forecasting results of up to 10 days lead time over 2007–15. In Jiang et al. (2016), we have presented a proof of concept of this scheme with limited results from the Ganges River for 2002–07. This work thus aims to introduce streamflow and water level forecasts of three major rivers of Bangladesh (i.e., the Ganges, Brahmaputra, and Meghna) with an accuracy comparable to currently available operational flood forecasting techniques by using a “simpler” ReqSim model so that the flood forecasting agencies and other government and nongovernment stakeholder organizations of the country could become interested in adapting this technique and applying it in their disaster operation activities.
Section 2 explains the development of the ReqSim forecasting framework. Section 3 presents the results of the model with different combinations of streamflow or WL and basin rainfall. Section 4 discusses the key findings of the paper, applicability, and potential limitations of our flood forecasting model.
d. Flood forecasting efforts in Bangladesh
Bangladesh is the most downstream riparian country of the GBM river system. Flooding is an annually recurring event in this region, with approximately one-fifth of the area of Bangladesh inundated by flood water every year and as much as two-thirds inundated during extreme events (Mirza et al. 2001). The annual flood becomes catastrophic if a peak discharge of the Ganges and Brahmaputra Rivers occurs simultaneously, like in 1988, 1998, and 2007. The flooding conditions may also worsen if a high tide in the Bay of Bengal coincides with high river discharge.
The limited data availability from upstream basin areas in India puts a fundamental limitation on the ability of Bangladesh to produce and disseminate skilled flood forecasts of 5–10 days lead time. Over 90% of the GBM drainage area lies outside Bangladesh and generates about 80% of the flood season (i.e., June–September) flow inside Bangladesh (Palash et al. 2014). To overcome this limitation imposed by upstream data availability, the FFWC has applied a numerical one-dimensional hydrodynamic model (river forecast model) since 1992 (Jakobsen et al. 2005). There are 38 upstream boundary condition points in the FFWC river forecast model, of which the HB (Ganges), Bahadurabad (Brahmaputra), and Amalshid (UM–Barak) locations are the most important. FFWC opted for a subjective approach to estimate the possible WL change at these three boundary points over the next 72 h by observing radar, numerical weather model, or satellite-derived rainfall patterns upstream in the GBM basins. Since 2014, FFWC, with technical help from the Institute of Water Modeling (IWM), based in Dhaka, Bangladesh, has replaced its subjective boundary preparation with numerical GBM basin model-generated flow forecasts for up to a 5-day lead time (http://www.iwmffws.info/index.php).
Since 2003, the FFWC has also used the probabilistic forecasts at HB (Ganges) and Bahadurabad (Brahmaputra) for a 10-day lead time developed by the Climate Forecast Application Network (CFAN; www.cfanclimate.com) in its river model to simulate WL forecasts inside Bangladesh. CFAN provides the FFWC with forecasts of 51 ensemble members on a daily basis, and these are widely regarded as very successful in enhancing flood forecasting lead time and accuracy in the country. However, FFWC uses only three sets of flow forecasts from the ensemble (i.e., the ensemble mean, and the mean plus and minus one standard deviation) to run its river model and generate daily forecasts. It appears that applying all ensemble forecasts in FFWC’s river model is not feasible because of the model’s high computational time and the difficulties in disseminating ensemble forecasts for operational purposes.
Hossain et al. (2014a,b) employed Jason-2 satellite altimetry estimates upstream on the Ganges and Brahmaputra in a hydrodynamic river model and translated the satellite altimetry to downstream WL inside Bangladesh. Since 2014, the FFWC has been testing Jason-2–based WL forecasts at major river locations of Bangladesh for up to 8 days lead time, including HB (Ganges) and Bahadurabad (Brahmaputra), and these forecasts show great potential for Bangladesh. The performance of the CFAN and Jason-2 forecasts is discussed and compared with our proposed method in section 3c.
Satellite-derived river width coupled with a regression technique was introduced by Hirpa et al. (2013) to forecast the Ganges and Brahmaputra for up to 15 days lead time. Using a satellite-derived flow (SDF) method, the Nash–Sutcliffe efficiency (NSE) coefficient of the Ganges daily forecasts is 0.80 at 1-day lead time and declines to 0.52 at 15-day lead time. For the Brahmaputra, the corresponding values are 0.8 and 0.56, respectively. Adding flow persistence to SDF (i.e., SDF+PERS) improves the root-mean-square error (RMSE) for the Ganges by 41% from 5315 m³ s⁻¹ and for the Brahmaputra by 37% from 8190 m³ s⁻¹ for a 15-day lead time. They also applied an autoregressive moving average (ARMA) model, which provides better forecasts than SDF+PERS up to 10 days for the Ganges and 3 days for the Brahmaputra River. However, it is difficult to interpret their results for the flood season (June–October) because they evaluated their 1997–2010 results for the entire year, which is expected to yield better performance statistics owing to very low flow and less variability during nonmonsoon months (November–May). Akhtar et al. (2009) developed an artificial neural network (ANN) for Ganges flow forecasting for up to 10 days lead time. Their results showed that the value of the sum of lagged rainfall as an input becomes noticeable after 7 days lead time. Biancamaria et al. (2011) discussed the utilization of TOPEX/Poseidon measurements in the Ganges and Brahmaputra and reported WL anomalies at HB and Bahadurabad with an RMSE of 0.40 m for 5 days lead time and 0.6–0.8 m for 10 days lead time during monsoons in 2000–05.
e. Data-based modeling
The data-based model is inductive in letting the data suggest an appropriate model structure (Beven 2012). In some cases, a data-driven model can offer a mechanistic interpretation of a system and lead to further insights that a physically based model or theoretical reasoning might fail to reveal (Young 2003; Beven 2012). Young and Beven (1994) have shown that, with a good physical understanding of the underlying system and the observations, a data-based mechanistic modeling approach can provide sufficient and reasonable explanations of the system behavior. Thus, complex nonlinear natural processes can potentially be decomposed fairly easily into several serial, parallel, or feedback connections of simple processes, each of which can be considered as a first-order conservation equation (such as a representation of rainfall–runoff processes). Young and Beven, however, warned against overdependence on the model and urged that the associated variable and parameter uncertainties should be carefully evaluated and that an adaptive mechanism should be used to train the model with multiple time series datasets for a robust model equation. Young (2002) introduced a data-based mechanistic flood forecasting model based on a recursive estimation of nonlinear, stochastic, and transfer-function equations of rainfall–flow time series data. This model was mostly successful, not only in characterizing the rainfall–flow dynamics of the catchment, but also for interpreting essential aspects of a basin’s hydrology. However, using an example from a practical application of the model, Young suggested potential limitations of the model in terms of failing to explain several aspects of rainfall–runoff processes.
Smith et al. (2014) introduced a data-based mechanistic model with a parsimonious representation of catchment dynamics that could generate reliable flash flood forecasts for a small Alpine catchment. They analyzed historical observed flow and radar rainfall data, identified analogs from the predictors, and applied those in precipitation forecasts to simulate flow forecasts. The UK Environment Agency has adopted a data-based real-time flood forecast modeling approach developed by the Delft Flood Early Warning System (Leedal et al. 2013). The modeling approach uses a network of nodes representing upstream subcatchments, identifies input nonlinearity and output transfer functions, and applies Kalman-filter hyperparameters to generate flow forecasts. Shahzad and Plate (2014) applied a data-based modeling approach, such as a regression model, a linear rainfall–runoff model, or a combination of those two, in the Mekong River basin’s flood forecasts with the integration of rainfall data from upstream gauging stations and station-to-station flow travel time. Their results show that the regression model works fairly well for 1–2-day forecasts while the linear rainfall–runoff or the combined model can be useful for forecasting up to 5 days. Data-based mechanistic flood forecasting models were also applied in the United States and Honduras (Basha et al. 2008) by applying a multiregression approach based on rainfall, air temperature, and river stage observations and a stage–discharge relationship to generate discharge forecasts.
f. Hydrology of the GBM river system
The hydroclimatology of the GBM system is diverse and nonhomogeneous but similar in seasonal patterns of rainfall occurrence and streamflow generation (Rasid and Paul 1987). For example, the southwest monsoon originating from the Bay of Bengal brings more than 70% of average annual rainfall during June–September over the GBM. The western Ganges receives relatively less annual rainfall than the rest, about 760–1020 mm, while the middle and east receive 1020–1520 mm and 1520–2540 mm, respectively (FAO 2016). The Brahmaputra basin shows a large north–south gradient in annual precipitation. The upper Brahmaputra in the Tibetan Plateau receives 300 mm of annual rain, whereas it is 1200 mm in the east of the lower Brahmaputra and 6000 mm in the south over the Meghalaya Plateau (IWM 2013; Bajracharya et al. 2015). The annual rainfall in the western Meghna basin ranges from 2150 mm over the northeast haor region of Bangladesh to 6000 mm over the southern foothills of the Meghalaya in India. The UM (Barak) receives the highest annual rainfall, from 1700 mm in the east to 3000 mm in the west. Because of the seasonal hydroclimatology, there is a strong difference in average annual and peak discharges of these rivers. Table 1 shows the key physical and hydrologic features of the GBM basins while Fig. 3 shows the seasonality in the hydrology of these three river basins, with large variations between the monsoon and nonmonsoon months’ rainfall and river flow.
2. Proposed ReqSim flood forecasting model
a. Modeling framework: Requisite simplicity
Three rivers in the GBM basins show significant day-to-day persistence. Rahman et al. (2004), Akhtar et al. (2009), and Jiang et al. (2016) developed a simple flood forecasting method for the Ganges using flow persistence as the key mechanism for prediction. In this paper, motivated by the idea of requisite simplicity, we revisited the utility of flow persistence to develop a linear flood-forecasting scheme for the GBM basins. An autocorrelation function (ACF) quantifies the flow persistence by estimating the correlation function between times $t$ and $t+k$ and is computed as
$$\rho(k) = \frac{E\left[(Q_t - \mu)(Q_{t+k} - \mu)\right]}{\sigma^2}, \tag{1}$$
where $Q_t$ and $Q_{t+k}$ are streamflow data at time $t$ and $t+k$ separated by lag $k$, and $\mu$ and $\sigma$ are the mean and standard deviation of streamflow $Q$. We considered the river flow as second-order stationary with constant mean and variance. During the monsoon season of the GBM basins (June–September), the ACF of the daily Ganges, Brahmaputra, and UM (Barak) flow shows a strong persistence up to several days (Fig. 4), providing a rationale to explore the potential of a “persistence”-based linear model. We extended the persistence model further by introducing the domain-average past and forecasted rainfall of the upstream basin areas and their flow travel lag times (i.e., time required for runoff to travel from the upstream domain to the downstream forecast location) into the forecasting scheme.
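A minimal sketch of how the sample ACF of a daily flow series could be computed is given below, assuming a second-order stationary series as stated above. The function name `autocorrelation` and the flow values are hypothetical and serve only as an illustration; they are not data from this study.

```python
def autocorrelation(flow, lag):
    """Sample autocorrelation of a flow series at a given lag (in time steps)."""
    n = len(flow)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(flow)")
    mean = sum(flow) / n
    # Variance of the whole series (constant mean and variance are assumed).
    var = sum((q - mean) ** 2 for q in flow) / n
    if var == 0:
        raise ValueError("flow series is constant; autocorrelation is undefined")
    # Average lagged cross-product of deviations from the mean.
    cov = sum((flow[i] - mean) * (flow[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


# Hypothetical daily flows (m^3/s) for illustration only.
daily_flow = [30500, 31200, 32800, 34100, 35000, 35400, 35100, 34300, 33000, 31900]
acf_by_lag = {k: autocorrelation(daily_flow, k) for k in range(4)}
```

For a persistent series such as this one, the ACF equals 1 at lag 0 and decays as the lag grows; a slow decay over several days is what motivates a persistence-based predictor.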
We therefore present three regression-based ReqSim models in this paper: 1) flow persistence (QQ), 2) flow persistence with observed rainfall (QQ+ObsR), and 3) flow persistence with observed and forecasted rainfall (QQ+ObsR+ForeR). All three models utilize flow persistence, denoted by QQ, which can be based on either the streamflow or the WL at the forecast location. The QQ+ObsR model uses observed upstream basin rainfall, while the QQ+ObsR+ForeR model adds forecasted rainfall along with past observed rainfall to its regression. From this point onward, we refer to observed rainfall as “ObsR” and forecasted rainfall as “ForeR.”
1) QQ model
The strong persistence of daily streamflow up to several days provides the opportunity of using a persistence-based QQ model. A simple QQ model, therefore, is
$$Q_{t+n} = a_1 Q_t + a_2 Q_{t-1} + c, \tag{2}$$
where $Q_{t+n}$ is the forecasted streamflow or WL of $n$ days lead time; $Q_t$ and $Q_{t-1}$ are the observed streamflow or WL on forecast day $t$ and previous day $t-1$, respectively; $a_1$ and $a_2$ are model coefficients related to persistence; and $c$ is the regression intercept.
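The persistence regression described above can be sketched as an ordinary least-squares fit of the flow at a given lead time on the forecast-day flow, the previous-day flow, and an intercept. The sketch below is an assumption-laden illustration, not the fitting code used in this study: the function names are hypothetical, the normal equations are solved with a small hand-written Gaussian elimination to keep the example self-contained, and the series is synthetic.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_qq(flow, lead):
    """Least-squares fit of flow[t + lead] on flow[t], flow[t - 1], and an intercept."""
    days = range(1, len(flow) - lead)
    rows = [[flow[t], flow[t - 1], 1.0] for t in days]
    targets = [flow[t + lead] for t in days]
    # Normal equations: (X^T X) coef = X^T y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return solve_linear_system(xtx, xty)


def forecast_qq(coef, flow_today, flow_yesterday):
    """Apply fitted coefficients: two persistence terms plus the intercept."""
    return coef[0] * flow_today + coef[1] * flow_yesterday + coef[2]


# Synthetic series built from known coefficients, only to check that the fit recovers them.
series = [4.0, 6.0]
for _ in range(10):
    series.append(1.2 * series[-1] - 0.5 * series[-2] + 3.0)
coef = fit_qq(series, lead=1)
next_day = forecast_qq(coef, series[-1], series[-2])
```

In practice a separate coefficient set would be fitted for each lead time, and a QR-based solver would be numerically safer than the normal equations for flows of order 10⁴ m³ s⁻¹.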
2) QQ+ObsR model
A detailed mechanistic rainfall–runoff transformation model is desirable to convert upstream rainfall to downstream flow conditions. Given the size of the basin, the lack of detailed data from upstream regions, and the difficulty of calibrating and validating model parameters for such a model, we opted to use a simple linear transformation of rainfall to runoff. Dividing the entire basin into four large domains and determining their corresponding flow travel times to the downstream forecast location (see section 2b) allowed us to calculate the daily space-aggregated (i.e., averaging domainwide rainfall) and then time-aggregated (i.e., averaging daily domain rainfall from minimum to maximum flow travel time) rainfall by using Eq. (4). For example, our isochrone analysis shows that runoff from the most upstream domain of the Ganges (domain I), located in the western Ganges, takes 19–25 days to arrive at HB in Bangladesh (Fig. 5). In other words, rainfall that occurred in the Ganges domain I 19–25 days earlier contributes to the current Ganges flow at HB. Similarly, runoff from domains II, III, and IV of the Ganges may take 14–18 days, 8–13 days, and 0–7 days, respectively, to arrive at HB. Therefore, we averaged the past daily rainfall of domain I (19–25 days), domain II (14–18 days), domain III (8–13 days), and domain IV (0–7 days) to get the space–time-aggregated rain signals $R_{\mathrm{I}}$, $R_{\mathrm{II}}$, $R_{\mathrm{III}}$, and $R_{\mathrm{IV}}$, respectively, which can be considered linearly correlated with the current HB flow. A similar approach has been applied to the Brahmaputra and UM (Barak) basins for calculating their space–time-aggregated domain rainfall.
It is important to note that when space-aggregated daily rainfall is compared against downstream daily streamflow of the basin, it is hard to establish any direct “linear” relationship between the two due to noise in the daily rainfall (Fig. 6, left). But if we compare space–time-aggregated domain rainfall with the application of the average flow travel time lag of each domain, then the rainfall becomes more correlated to the downstream streamflow (Fig. 6, right). Hence, a space–time-aggregating process helps to establish a near-linear correlation between upstream domain rain and downstream streamflow (i.e., high upstream rain to high downstream streamflow or vice versa). The regression structure of the ReqSim model utilizes the same linear correlation in its parameter estimation and prediction. Figure 7a shows how downstream flow of the Ganges on the day of the forecast responds to past rain signals of four upstream domains, while Fig. 7b illustrates the predicted response of downstream flow at a 10-day forecast horizon given observed rainfall within three upstream domains. The fourth, or most downstream, domain does not contribute to the 10-day forecast because its time of concentration is shorter than the forecast window.
Because of a relatively large lag time between the upstream rainfall and corresponding downstream streamflow in a large river basin like the Ganges, upstream rainfall has the potential to be a good predictor of downstream flow (Akhtar et al. 2009). This observation provides the rationale of incorporating upstream observed rain (ObsR) information into the flow persistence or QQ model to develop the flow persistence with observed rainfall (QQ+ObsR) model. The structure of the QQ+ObsR model is
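$$Q_{t+n} = a_1 Q_t + a_2 Q_{t-1} + b_{\mathrm{I}} R_{\mathrm{I}} + b_{\mathrm{II}} R_{\mathrm{II}} + b_{\mathrm{III}} R_{\mathrm{III}} + b_{\mathrm{IV}} R_{\mathrm{IV}} + c, \tag{3}$$
$$R_i = \frac{1}{N_i} \sum_{j = t - T_{\max,i} + n}^{t - T_{\min,i} + n} P_i(j), \tag{4}$$
where $Q_{t+n}$, $n$, $Q_t$, $Q_{t-1}$, $a_1$, $a_2$, and $c$ are as defined for the QQ model; $R_{\mathrm{I}}$, $R_{\mathrm{II}}$, $R_{\mathrm{III}}$, and $R_{\mathrm{IV}}$ are lagged space–time-aggregated domain rainfall of domains I, II, III, and IV, respectively (Figs. 5–7); and $b_{\mathrm{I}}$, $b_{\mathrm{II}}$, $b_{\mathrm{III}}$, and $b_{\mathrm{IV}}$ are model coefficients related to the upstream rainfall of each of the four domains. In Eq. (4), $P_i(j)$ is the domain-averaged daily rainfall of domain $i$ on day $j$ and $N_i$ is the number of days in the averaging window. Parameters $T_{\max,i}$ and $T_{\min,i}$ are the maximum and minimum flow travel times (in days) for domain $i$ (I–IV), while $t$ is the forecast day (i.e., day 0), and $n$ is the forecast lead time (1–10 days). If $t - T_{\max,i} + n$ and/or $t - T_{\min,i} + n > t$, then only the observed rainfall data up to the forecast day are considered in the QQ+ObsR model. For example, let us assume we want to generate Ganges forecasts for a 10-day lead time, so that $n = 10$ and forecast day $t = 0$. Calculating the rainfall for domain III of the basin (flow travel time of 8–13 days) would require rainfall from day (0 − 13 + 10) to day (0 − 8 + 10), that is, from 3 days of past rainfall to 2 days of future rainfall. Since the QQ+ObsR model does not consider forecasted rain in its regression, the domain III rainfall calculation includes rain from the past 3 days only; that is, no future rain is considered.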
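The window arithmetic in the worked example above can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, days are expressed as offsets from the forecast day (day 0), the forecast day itself is assumed to count as observed, and the rainfall values are invented for the example. The Ganges travel times are those reported in the text.

```python
def observed_rain_window(forecast_day, lead, t_min, t_max):
    """Inclusive (first_day, last_day) of observed rain for one domain, or None."""
    first = forecast_day - t_max + lead
    last = forecast_day - t_min + lead
    last = min(last, forecast_day)  # observed rain is only available up to the forecast day
    return (first, last) if first <= last else None


def aggregate_rain(daily_rain, window):
    """Mean of domain-averaged daily rain over an inclusive window of days."""
    first, last = window
    values = [daily_rain[day] for day in range(first, last + 1)]
    return sum(values) / len(values)


# Ganges flow travel times in days, per domain: (minimum, maximum).
GANGES_TRAVEL_DAYS = {"I": (19, 25), "II": (14, 18), "III": (8, 13), "IV": (0, 7)}

# Windows for a 10-day lead time issued on day 0.
windows = {
    domain: observed_rain_window(0, 10, t_min, t_max)
    for domain, (t_min, t_max) in GANGES_TRAVEL_DAYS.items()
}

# Hypothetical domain III daily rainfall (mm), keyed by day offset.
domain_iii_rain = {-3: 12.0, -2: 8.0, -1: 0.0, 0: 4.0}
rain_iii = aggregate_rain(domain_iii_rain, windows["III"])
```

Under these assumptions the domain III window is truncated at the forecast day, and domain IV has no usable observed rain at a 10-day lead time because its whole window lies in the future.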
3) QQ+ObsR+ForeR model
The structure of the QQ+ObsR+ForeR model is similar to the QQ+ObsR model; the only difference is it uses ForeR in its regression equation along with the ObsR. For example, if $t - T_{\max,i} + n$ and/or $t - T_{\min,i} + n > t$, where $t$ is the forecast day (i.e., 0 day), $n$ is the forecast lead time, and $T_{\max,i}$ and $T_{\min,i}$ are the maximum and minimum flow travel times of domain $i$, then forecasted rain of lead time $m$ is considered in the QQ+ObsR+ForeR model for domain $i$, providing that $t - T_{\max,i} + n$ and $t - T_{\min,i} + n \le t + m$. Figure 7c demonstrates the 10-day (i.e., $n = 10$) Ganges forecasting process by using 6 days of forecasted rain (i.e., $m = 6$) applied in the QQ+ObsR+ForeR model.
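One way to picture how a domain's rainfall window is divided between observed and forecasted rain is sketched below. The function name is hypothetical, days are offsets from the forecast day (day 0), and the handling of days beyond the available forecast horizon (they are simply dropped here) is an assumption, since the text does not spell out that case.

```python
def split_rain_days(forecast_day, lead, t_min, t_max, forecast_horizon):
    """Split one domain's rainfall window into observed days and forecasted days."""
    first = forecast_day - t_max + lead
    last = forecast_day - t_min + lead
    window = range(first, last + 1)
    # Days up to and including the forecast day come from observed rain.
    observed = [d for d in window if d <= forecast_day]
    # Later days come from forecasted rain, limited to the available horizon
    # (assumption: days beyond the horizon are dropped).
    forecasted = [d for d in window if forecast_day < d <= forecast_day + forecast_horizon]
    return observed, forecasted


# Ganges, 10-day lead time with 6 days of forecasted rain.
domain_iii = split_rain_days(0, 10, t_min=8, t_max=13, forecast_horizon=6)
domain_iv = split_rain_days(0, 10, t_min=0, t_max=7, forecast_horizon=6)
```

With these inputs, domain III draws on both observed and forecasted days, whereas domain IV relies on forecasted rain alone.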
b. Upstream basin domains and flow travel time
We applied the spatial hydrological analyst (SHA) of ArcGIS and spatially distributed unit hydrograph (SDUH) concept (Maidment 1993) to estimate the flow travel time map or isochrones. An isochrone is a contour joining points in the watershed separated by the same travel time from the outlet (Roy and Thomas 2017). The SHA first determines the flow direction, flow accumulation, flow path, and slope by using an eight-direction pour-point algorithm (Narayan et al. 2013) and calculates the initial flow travel time by using the mean velocity of each flow path derived from the flow path slope and land-use features like curve number and roughness coefficient (i.e., Manning’s n). The SDUH method then establishes the excess rainfall in each raster cell of the watershed, develops a time area histogram, and calculates unit hydrograph ordinates, which are nothing but the incremental area divided by the representative time interval (Roy and Thomas 2017). The SHA operation then updates the initial flow travel time by using unit hydrograph ordinates, revises flow travel through each cell along the flow path, and finally calculates the travel time of surface water flow from each raster cell in the watershed to the basin outlet.
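The time-area step of the SDUH concept, in which each ordinate is the incremental area divided by the representative time interval, can be sketched as below. This is a simplified illustration and not the ArcGIS workflow used in the study: the function names and travel times are hypothetical, and a uniform cell area of 0.0081 km² (a 90 m × 90 m cell) is assumed.

```python
def time_area_histogram(travel_times_days, cell_area_km2, interval_days):
    """Incremental area (km^2) draining to the outlet within each time interval."""
    n_bins = int(max(travel_times_days) // interval_days) + 1
    areas = [0.0] * n_bins
    for travel_time in travel_times_days:
        # Each raster cell contributes its area to the interval holding its travel time.
        areas[int(travel_time // interval_days)] += cell_area_km2
    return areas


def unit_hydrograph_ordinates(incremental_areas_km2, interval_days):
    """Ordinates as incremental area divided by the time interval (km^2 per day)."""
    return [area / interval_days for area in incremental_areas_km2]


# Hypothetical travel times (days) from six raster cells to the outlet.
cell_travel_times = [0.5, 1.2, 1.8, 2.1, 2.9, 3.5]
areas = time_area_histogram(cell_travel_times, cell_area_km2=0.0081, interval_days=2.0)
ordinates = unit_hydrograph_ordinates(areas, interval_days=2.0)
```

Multiplying each ordinate by a unit depth of excess rainfall would give the corresponding discharge contribution for that interval.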
Figure 8 shows the separated isochrone zones in one of the subbasins of the Ganges basin and schematized illustration of the SDUH method. The SHA operation employs the Shuttle Radar Topography Mission (SRTM)-generated digital elevation model (DEM) of 90-m resolution (Jarvis et al. 2008) and global mosaics of the standard (MODIS) land cover (MCD12Q1) of 5′ resolution (Friedl et al. 2010). We divided the entire basin into four large domains by merging smaller subbasins and calculated maximum and minimum flow travel times using isochrones. The calculated flow travel time of the Ganges domains I–IV are 19–25, 14–18, 8–13, and 0–7 days, respectively. For the Brahmaputra, they are 15–25, 8–14, 4–7, and 0–3 days, and for the UM (Barak) they are 11–15, 7–10, 4–6, and 0–3 days, respectively (Fig. 5).
c. Datasets used in the study
We used historical records of daily rated discharge (streamflow) and measured WL data of HB (Ganges), Bahadurabad (Brahmaputra), and Amalshid (UM–Barak) in our model. We collected these data from FFWC over 1998–2015. FFWC calculates rated discharge by applying measured WL-discharge rating equations (Hopson and Webster 2010). We also considered two sets of gridded rainfall data: first, the near-real-time observed rainfall data of Tropical Rainfall Measuring Mission (TRMM) 3B42 V7 with 0.25° resolution (https://pmm.nasa.gov/data-access/downloads/trmm) over 1998–2015, and second, 1–6-day forecasted rainfall for 2014–15 generated by the Weather Research and Forecasting (WRF) Model and collected from IWM, Bangladesh.
d. Model parameters
In the ReqSim QQ+ObsR and QQ+ObsR+ForeR models, we have two persistence coefficients that correspond to the forecast day’s and previous day’s streamflow or WL variable, and four rainfall coefficients corresponding to the four upstream domain rainfalls, along with the intercept coefficient. In this section, we provide a discussion on the parameters of the QQ+ObsR+ForeR model only, the values of which we found from regression fitting for the calibration period (1998–2006) and kept unchanged for the validation period (2007–15).
Figure 9 shows how the influence of rain coefficients grows with forecast lead time while the influence of persistence coefficients first reaches its peak at 3–4 days lead time and then goes down with further lead time increases for all three river basins. For the Ganges, the influence of the first three upstream or remote domains’ (i.e., domains I, II, and III) rainfall grows almost consistently with forecast lead time, suggesting that with increasing forecast lead time the downstream forecasted streamflow becomes more responsive to aggregated domain rainfall. Domain IV rainfall, which is the nearest domain to forecast location HB, has almost no influence on the Ganges forecasts. For the Brahmaputra, the influence of the nearest two domains (i.e., domains IV and III) from the forecast location Bahadurabad remains consistent from 1 to 10 days lead time, whereas the influence of remote domains II and I becomes noticeable from 5 days lead time. For the UM or Barak, domain separation is different from the other two river basins. Domains IV and III of the UM are two independent watersheds, and the flows of these two domains join just upstream of the UM forecast location at Amalshid. Therefore, the influence of these domains’ rainfall on downstream forecasts does not follow a sequential pattern as is seen for the Ganges or the Brahmaputra basin. The influence of the domain IV rainfall of UM reaches its maximum at 3 days lead time, then reduces slightly at 4 days and remains consistent for the remaining lead time. The noticeable influence of UM domain III rain, on the other hand, starts after 3 days lead time and reaches its peak at 6 days, reduces slightly at 7 days, and remains consistent up to 10 days lead time.
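To make the parameter structure concrete, the following Python sketch fits a linear model with two persistence terms, one rainfall term per upstream domain, and an intercept by ordinary least squares. The exact regression form is an assumption made for illustration: the target is q[t+lead], the predictors are q[t], q[t-1], and each domain's rainfall lagged by a user-supplied number of days. The function names, argument names, and lag convention are hypothetical and may differ from the model equations of the study.

```python
import numpy as np

def fit_qq_obsr(q, rain, lead, rain_lags):
    """Fit q[t+lead] ~ a1*q[t] + a2*q[t-1] + sum_i b_i*rain[t-lag_i, i] + c.

    Assumed form for illustration; returns [a1, a2, b_1..b_n, c].
    """
    q = np.asarray(q, dtype=float)
    rain = np.asarray(rain, dtype=float)       # shape (n_days, n_domains)
    start = max(1, max(rain_lags))             # first day with all predictors
    t = np.arange(start, len(q) - lead)        # forecast issue days
    cols = [q[t], q[t - 1]]                    # persistence terms
    cols += [rain[t - lag, i] for i, lag in enumerate(rain_lags)]
    cols.append(np.ones(len(t)))               # intercept
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, q[t + lead], rcond=None)
    return coef

def predict_qq_obsr(coef, q, rain, t, rain_lags):
    """Forecast issued on day t using coefficients from fit_qq_obsr."""
    q = np.asarray(q, dtype=float)
    rain = np.asarray(rain, dtype=float)
    x = [q[t], q[t - 1]]
    x += [rain[t - lag, i] for i, lag in enumerate(rain_lags)]
    x.append(1.0)
    return float(np.dot(coef, x))
```

In this sketch a separate coefficient set would be fitted for each forecast lead time on the calibration years and then applied unchanged to the validation years.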
e. Model performance evaluation
The model performance was evaluated using standard statistical model evaluation techniques applied in Moriasi et al. (2007), such as mean error (ME), mean absolute error (MAE), RMSE, coefficient of determination R2, and NSE. To find more on these techniques, please see Gupta et al. (1999), Singh et al. (2005), and Moriasi et al. (2007).
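For reference, these statistics can be computed as in the short Python sketch below. Two conventions are assumptions here rather than statements from the text: the error is taken as forecast minus observation, and R2 is computed as the squared Pearson correlation between observed and forecasted values. The function name is hypothetical.

```python
import numpy as np

def evaluate_forecast(obs, sim):
    """Return ME, MAE, RMSE, R2, and NSE for paired observations and forecasts."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    err = sim - obs                                   # assumed sign convention
    r = np.corrcoef(obs, sim)[0, 1]                   # Pearson correlation
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    return {
        "ME": float(err.mean()),
        "MAE": float(np.abs(err).mean()),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "R2": float(r ** 2),
        "NSE": float(nse),
    }

# A forecast with a constant bias of +1 keeps R2 at 1 but lowers NSE.
scores = evaluate_forecast([1, 2, 3, 4], [2, 3, 4, 5])
```

The example shows why R2 and NSE are reported together: R2 is insensitive to a constant bias, whereas NSE penalizes it.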
a. Forecasts findings
We used GBM streamflow or WL data (for flow or water level forecasting, respectively) along with TRMM rainfall data over 1998–2015 in our ReqSim models. The results discussed here are based on forecasts during flood season (July–October) of the validation period (2007–15) only, unless the forecast type and period is mentioned specifically.
The ReqSim flow persistence (QQ) model and the flow persistence plus observed rain (QQ+ObsR) model provide almost identical results up to 5-day Ganges forecasts at the HB location (Tables 2, 3; Fig. 10), suggesting that inclusion of ObsR in the QQ model does not appreciably change the forecasting accuracy up to 5 days lead time for this basin. For example, NSE only improves from 0.88 to 0.92 for 5-day Ganges flow forecasts (Table 2). Similarly, improvement of the QQ+ObsR model over QQ is small up to 3 days lead time for the Brahmaputra at Bahadurabad and UM (Barak) at Amalshid. The NSE improves from 0.90 to 0.92 for the Brahmaputra and from 0.79 to 0.83 for the UM up to 3 days forecast lead time.
Enhancement of forecasting accuracy with the addition of upstream rainfall is noticeable beyond 5 days for the Ganges and 3 days for the Brahmaputra and UM (Barak) forecasts. The NSEs of the Ganges QQ model for 7- and 10-day forecasts are 0.78 and 0.62, which improve to 0.87 and 0.79 when ObsR is used in the QQ+ObsR model. For the Brahmaputra, the NSEs of the QQ model for 5-, 7-, and 10-day flow forecasts are 0.74, 0.56, and 0.32, respectively. The accuracy improves considerably when ObsR is included in the regression; the resulting NSE values are 0.83, 0.74, and 0.53, respectively. Adding domain rainfall into the UM or Barak QQ+ObsR model does not improve the forecast performance as much as it does for the other two bigger basins. The NSE value increases only from 0.62 to 0.67 for 5-day, from 0.51 to 0.55 for 7-day, and from 0.38 to 0.41 for 10-day UM flow forecasts.
In the process of adding ForeR into the ReqSim model, we first investigated the potential benefit of using ForeR at different lead times. In doing so, we considered TRMM’s past ObsR as ForeR with a notion of using a “perfect forecast” rainfall. We will refer to this rainfall as ForeRP, with the ReqSim model name QQ+ObsR+ForeRP. This analysis provides a quantitative measure of the utility of using ForeR to enhance the forecasting accuracy of the GBM. For example, the left panels of Fig. 10 clearly show that the improvement in 10-day flow forecast accuracy is marginal beyond the use of 5–6 days of ForeR for the Ganges, 6–7 days for the Brahmaputra, and 8 days of ForeR for the UM (Barak) forecasts. These findings also suggest that the upstream basin rainfall of the last 4–5 days and 3–4 days may have little effect, respectively, on today's HB (Ganges) and Bahadurabad (Brahmaputra) flow. For Amalshid (UM) flow, this may be true for the last 2 days of rainfall. Therefore, we maintained 4 days of lag for the Ganges and Brahmaputra and 2 days of lag for the UM between the applied forecasted rain’s maximum lead time and target streamflow or WL forecast lead time in the QQ+ObsR+ForeRP model. For example, we utilized 6, 3, and 1 days of ForeR for generating 10-, 7-, and 5-day Ganges and Brahmaputra forecasts, respectively, in the model, whereas we used 8, 5, and 3 days of ForeR for the UM (Barak) forecasts. The right panels of Fig. 10 appropriately show that adding ObsR alone to the QQ model improves the Ganges and Brahmaputra forecasts appreciably. After adding ForeR to the QQ+ObsR model, the forecast performance continues to improve further. But the relative improvement of QQ+ObsR+ForeRP model over QQ+ObsR model for the Brahmaputra is significantly higher than that of Ganges. On the contrary, noticeable improvement occurs for the UM (Barak) forecasts only after adding ForeR to the QQ+ObsR model.
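The lag rule described above reduces to simple arithmetic, sketched here in Python. The lag values (4 days for the Ganges and Brahmaputra, 2 days for the UM) come from the text; the function name, the dictionary, and the clamp at zero for short lead times are assumptions added for illustration.

```python
# Lag in days between the maximum ForeR lead time and the target forecast
# lead time, as stated in the text.
LAG_DAYS = {"Ganges": 4, "Brahmaputra": 4, "UM": 2}

def forer_days(basin, forecast_lead_days):
    """Days of forecasted rainfall used for a given flood forecast lead time.

    Clamping at zero (no ForeR for leads shorter than the lag) is an
    assumption, not a rule stated in the paper.
    """
    return max(forecast_lead_days - LAG_DAYS[basin], 0)

ganges = [forer_days("Ganges", lead) for lead in (10, 7, 5)]
um = [forer_days("UM", lead) for lead in (10, 7, 5)]
```

This reproduces the 6, 3, and 1 days of ForeR quoted for the Ganges and Brahmaputra and the 8, 5, and 3 days quoted for the UM (Barak).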
It appears that smaller and flashier river basins with shorter flow travel times (and hence, shorter persistence) are likely to benefit more from using forecasted rainfall than the basins with longer persistence in our linear model. The reason behind this is directly related to the geophysical features of the river basin, its rainfall–runoff dynamics, river morphology, and slopes. The slope of the Brahmaputra River starting from domain II is considerably higher than that of the Ganges River, and it is true even for the most downstream river reaches. For example, the Brahmaputra River slope at Bahadurabad (i.e., 7.5 cm km−1) is 1.5 times higher than the Ganges at HB (i.e., 5 cm km−1; Sarker et al. 2003). On the other hand, domain IV of the UM (Barak) is extremely flashy with a high river slope compared to domain III, and our analysis shows that these two domains are independent watersheds. Overall, the slope and related dynamics in rainfall–runoff of these rivers are appropriately captured by the basin’s domain travel time (section 2b). Both the domain rainfall amount and flow travel time dictate how sensitive a particular domain’s rain will be to downstream river hydrologic conditions. As flood forecast lead time increases, the ReqSim linear model receives responses from domain III and II rain signals in predicting downstream floods for the Brahmaputra or UM more than the Ganges. This is the most plausible reason why the gains from including forecasted rainfall were higher in the Brahmaputra or UM than in the Ganges river basin in our model.
It is clear from both statistical (Tables 2, 3) and graphical evaluation (Figs. 11–13) that the performance of our ReqSim persistence with ObsR (QQ+ObsR) and persistence with observed and forecasted rainfall (QQ+ObsR+ForeRP) models is encouraging for the Ganges and Brahmaputra up to 10 days lead time. Given the nature of services and information disseminated by the flood forecasting agencies to the people, the 10-day flow forecasts with accuracy indicated by R2 of 0.86 and NSE of 0.81 for the Ganges and R2 of 0.71 and NSE of 0.69 for the Brahmaputra are expected to be operationally valuable in the FFWC’s current flood forecasting activities and can help minimize the impacts of floods. The forecasting skill for the UM (Barak) basin at Amalshid is limited to 5–7 days lead time with the application of our linear scheme. The R2 and NSE values are 0.74 and 0.73 for 5-day and 0.66 and 0.64 for 7-day Amalshid flow forecasts, respectively. The 10-day flow forecasts are somewhat limited, with R2 and NSE values of 0.57 and 0.54, respectively.
We have also tested our ReqSim models by applying WL data instead of streamflow into the model structure and found even better forecasting accuracy (Tables 3 and 4). A probable reason for this could be related to the uncertainty introduced by the streamflow derivation from the rating curve using WL measurements. Using WL directly in the linear models may correlate to the rainfall events slightly better than the rated discharge and provide higher forecasting accuracy. Overall, WL forecast accuracy with MAE and RMSE less than 0.35 and 0.45 m, respectively, for 7-day lead time and 0.40 and 0.5 m, respectively, for 10-days lead time for large rivers like the Ganges and Brahmaputra could be regarded as a significant achievement, particularly considering the volume of water these rivers bring into Bangladesh during the flood season. We also tested streamflow forecasts by converting to WL using FFWC’s rating equations and found almost identical forecasting accuracy compared with WL forecasts obtained independently.
b. Incorporating forecasted rainfall
To simulate a more realistic application scenario, we used WRF Model outputs as a source of ForeR data for the GBM basins over the 2014–15 period in our model. We will refer to this forecasted rain as ForeRW, with the ReqSim model name QQ+ObsR+ForeRW. This version of our model provides an estimate of the operational forecasting accuracy that our model could achieve for the GBM basin with the use of WRF forecasted rainfall. The WRF Model (https://www.mmm.ucar.edu/weather-research-and-forecasting-model) is a mesoscale numerical weather prediction model customized and downscaled for South Asia and is run every day at IWM for generating weather forecasts with 1–6 days lead time. Both statistical (Table 4) and graphical (Fig. 14) evaluation of QQ+ObsR+ForeR model results with the use of TRMM’s perfect forecast rain and WRF’s forecast rain reveal very similar Ganges and Brahmaputra forecasts for the flood season in 2014 and 2015. These comparisons suggest that our model with the use of forecasted rainfall from WRF (i.e., QQ+ObsR+ForeRW) could provide results comparable to those we found with our use of a perfect forecast rainfall (i.e., QQ+ObsR+ForeRP) for 2007–13.
A valid question one may ask: How does the ReqSim model provide such an encouraging forecasting accuracy given the limited accuracy of the near-real-time rainfall product from TRMM (i.e., ForeRP) or WRF forecasted product (i.e., ForeRW)? Our assessment is that, for flood forecasting in large basins like the GBM, absolute accuracy of rainfall is less important than the ability of TRMM or WRF rainfall products to capture large-scale rainfall patterns as an input to the ReqSim model. In general, despite limited accuracy in estimating rainfall magnitude, numerical weather forecasts or satellite estimates of rainfall could produce considerably more accurate spatial coverage of large-scale rain scenarios such as rain and no-rain or low rain and heavy rain (Islam et al. 2010; Bajracharya et al. 2015). Altogether, the findings presented above provide more supporting evidence that the ReqSim model may qualify as an encouraging flood forecasting tool for operational purposes.
c. Comparison between proposed ReqSim and existing operational products
We compared results of our ReqSim model with current operational methods such as Jason-2 WL and CFAN streamflow forecasting schemes for Bangladesh flood forecasting. The Jason-2 WL forecasts are available in Hossain et al. (2014a,b), whereas we collected CFAN forecasts (Webster et al. 2010) for the 2004–13 monsoons from FFWC (which is the end user of this forecast product), not from the original primary referenced source mentioned earlier. We considered the ensemble mean of CFAN’s 51 daily probabilistic forecasts at HB (Ganges) and Bahadurabad (Brahmaputra) for the comparison. Webster et al. (2010) did the same to present CFAN’s forecast performances in their paper.
For the Ganges basin, the R2 of CFAN forecasts at 5, 7, and 10 days lead time are 0.79, 0.67, and 0.57, respectively. In comparison, our ReqSim QQ+ObsR+ForeRP model provides improved streamflow forecasts with R2 of 0.89, 0.86, and 0.81 for respective lead times. For the Brahmaputra basin, our QQ+ObsR+ForeRP model provides forecasts with R2 of 0.79, 0.71, and 0.69 for 5, 7, and 10 days lead time, which are comparable to, though slightly lower than, CFAN forecasts with R2 of 0.81, 0.78, and 0.72 for the respective lead times. Figure 15 shows a graphical comparison between the results of our ReqSim model and CFAN ensemble mean forecasts for the 2007 monsoon. Our evaluation of CFAN forecasts closely matches the results published in Webster et al. (2010), with minor differences due to differences in the evaluation period.
The 20-day (1–20 August 2012) performance of Jason-2 WL forecasts shows that the ME and RMSE at HB (Ganges) are −0.43 and 0.47 m and at Bahadurabad (Brahmaputra) are −0.2 and 0.70 m, respectively, for 5 days lead time (Hossain et al. 2014a). Our ReqSim QQ+ObsR model without considering ForeR gives close or improved results even for 10 days lead time. For instance, the ME and RMSE of the 10-day Ganges WL forecasts are −0.33 and 0.52 m, respectively, during the same period. The corresponding errors for the 10-day Brahmaputra WL forecasts are −0.19 and 0.29 m. Hossain et al. (2014b) reported the performance of Jason-2–based WL forecasts for the 2013 monsoon (from 1 June to 9 September) by using the correlation coefficient r and RMSE metrics. For instance, the r of 8-day WL forecasts at HB (Ganges) and Bahadurabad (Brahmaputra) are 0.95 and 0.72 and RMSEs are 1.23 and 0.94 m, respectively. Our QQ+ObsR model provides 10-day Ganges and Brahmaputra r as 0.97 and 0.78, and RMSEs as 0.57 and 0.67 m, respectively. After adding ForeR to the model, 10-day forecasts improve slightly for the Ganges (e.g., r of 0.98 and RMSE of 0.54 m) but significantly for the Brahmaputra basin (e.g., r of 0.87 and RMSE of 0.53 m).
The comparisons presented above suggest that our linear ReqSim model can provide forecasting accuracy comparable to the existing operational techniques such as CFAN and Jason-2 forecasts at both HB (Ganges) and Bahadurabad (Brahmaputra) locations.
This paper explores the utility of using a flood forecasting modeling framework with requisite simplicity that identifies key variables and processes of basin hydrology and develops ways to track their evolution and performance. Findings suggest that models with requisite simplicity—relying on flow persistence, upstream aggregated rainfall, and basin travel time—can provide flood forecasts comparable to relatively more complicated methods for up to 10 days lead time. Three different linear Requisitely Simple (ReqSim) models are considered: 1) a flow-persistence-based QQ model that uses lagged streamflow or water level (WL) in its regression; 2) a flow persistence and observed rainfall (ObsR)-based QQ+ObsR model that uses lagged streamflow or WL and four upstream domains’ rainfall lagged with their average flow travel time; and 3) a flow persistence and observed and forecasted rain (ForeR)-based QQ+ObsR+ForeR model that incorporates upstream domains’ ForeR along with past ObsR.
The GBM basins show strong persistence in their daily streamflow inside Bangladesh with lags up to several days (Fig. 4). For large river basins like these, the current and previous day’s flow or WLs contain significant memory in an aggregated form (i.e., basin response time) that is found to be useful in a simple forecasting model, as demonstrated with our findings. However, a persistence model alone cannot provide robust forecasting accuracy beyond 4–5 days lead time for the Ganges, 3 days for the Brahmaputra, and 2 days for the upper Meghna (Barak) River (Fig. 10). The contribution of adding upstream ObsR to a persistence model—in a very simple way by incorporating four large domain-average rainfalls into the model—appears to significantly enhance forecasting lead time.
Including ForeR in our linear model hardly improves on the performance of the model that considers ObsR alone within 8 days lead time for the Ganges, 7 days for the Brahmaputra, and 4 days for the Meghna River. The noticeable improvements appear beyond these lead times, making the ReqSim QQ+ObsR+ForeR model attractive for the medium-range (7–10 days) forecasts for these three rivers. Our findings suggest that the upstream basin rainfall of the last 3–4 days may have minimal effect on today’s Hardinge Bridge (Ganges) and Bahadurabad (Brahmaputra) flow, while the same appears to hold for the last 2 days of rainfall for Amalshid (UM) flow. Consistent with these findings, incorporating ForeR in the model indicates that there is an upper limit on the ForeR lead time that is useful for enhancing 10-day flood forecasting accuracy. For instance, 6–7 days of ForeR is found adequate for 10-day Ganges and Brahmaputra forecasts while 8 days of ForeR is necessary for the UM (Barak) (Fig. 10, left). This is an important finding because it demonstrates that we may not need ForeR’s lead time to be equal to the flood forecast’s target lead time to produce skillful forecasts for large (e.g., the Ganges and Brahmaputra) and medium (e.g., the upper Meghna) river basins. It also appears that ForeR with a longer lead time may be useful for flashy river basins like the Brahmaputra and UM (Barak) (Fig. 10, right).
We have explored the performance of our models with WL data and obtained even better forecasting accuracy compared to flow forecasts. This indicates a key strength of our modeling framework, that is, it can be used for those gauging locations where flow data are not available continuously. Although we have mainly discussed our model’s performance for streamflow forecasts, a quantitative assessment of WL forecast performance is discussed briefly in the Results section and shown in Tables 3 and 4. Besides that, Fig. 16 shows a long-term WL forecast performance during the flood season (July–October) that shows a consistent performance for both the calibration (1998–2006) and validation (2007–15) periods. Consistent performance of our model over a long period may suggest that there has not been a significant shift in the monsoon’s hydrologic response processes between 1998–2006 and 2007–15 in the GBM river basins.
Findings from this study also reveal that large-scale weather systems (for example, the rainfall pattern over the GBM basins) captured in satellite estimates (i.e., TRMM observations) and weather models (e.g., WRF forecasts) are useful in a data-driven model to obtain reasonably accurate GBM forecasts for up to 10 days lead time without employing any complicated processing techniques. This is of particular importance where availability of and access to gauge-measured data from upstream basin areas are limited and detailed hydrological modeling is considerably expensive, resource intensive, and operationally prohibitive.
Our proposed model, however, may not be directly transferrable to a river basin that is heavily controlled by upstream regulators where a near-linear relationship between upstream rain and downstream flow may not hold. The Ganges river basin is heavily obstructed by a number of upstream regulators, including the Farakka Barrage. However, the Farakka Barrage reportedly operates from the postmonsoon to dry seasons (November–May) and remains open during flood season. This allows the Ganges to flow almost freely during flood season, and a near-linear relationship between upstream rain and downstream flow may still hold, justifying our use of a linear scheme for the Ganges during flood season. On the other hand, the Brahmaputra and upper Meghna (Barak) are not overly regulated, and the proposed linear schemes appear to work well.
To apply our method, one needs observed WL or streamflow data at the forecast location and upstream basin rainfall from precipitation measuring satellites. Daily or subdaily rainfall data from 1998 to near–real time is easily available from TRMM (https://pmm.nasa.gov/TRMM) for tropical regions of the world. Follow-on missions such as the Global Precipitation Measurement (GPM) Core Observatory provide even better accuracy, latency, and coverage of precipitation from across the planet (https://pmm.nasa.gov/GPM). The switch from TRMM to GPM will involve minor changes in real-time data processing activities (i.e., downloading and reading data), while updating the model training period and estimation of parameters will remain similar. At the same time, the ReqSim model can be easily calibrated until the previous year monsoon to get updated parameter values and generate forecasts for the current year, which in turn is expected to provide further improved results because of using updated parameter values fitted for the most recent hydroclimatology of the basin.
Therefore, if the forecast location shows flow persistence at least for a few days, and the observed rainfall (ObsR) of upstream basin areas is available in near–real time, it is possible to apply our ReqSim QQ and QQ+ObsR models for forecasting floods. To apply our ReqSim QQ+ObsR+ForeR model, one needs forecasted rain (ForeR) data that may not be available for basins across the world. But if available, the use of ForeR in the model is similar to applying ObsR. The basin domain delineation and estimation of flow travel time are important steps in our model development. These can easily be done by using publicly available GIS tools and global datasets mentioned in section 2b. Finally, the total runtime of our ReqSim model is very small: we estimate that the full daily cycle, from downloading data to running the model to generating and disseminating forecasts to the national forecasting agency, would require a few minutes using the easily available computational capabilities of a laptop computer.
We expect the simplicity of our model structure and use of easily available data will allow a wide-scale adoption of our modeling framework for the GBM and other large river basins around the world. Our results show comparable forecasting accuracy with respect to existing hydrologic or hybrid models (Webster et al. 2010; Hopson and Webster 2010) or satellite altimetry-based WL prediction (Hossain et al. 2014a,b) methods with varying degrees of complexity, and we consider our current study as a complementary one to the rich collection of existing research outputs. More importantly, our proposed framework is easy to implement and can be customized to work for any large river basin around the world with relatively less effort and resource requirements. We hope the notion of requisite simplicity, that is, examining the trade-off between modeling complexity and functional utility, will be used as a guiding principle as we invest more resources to enhance flood forecasting accuracy of large rivers for effective preparedness and response.
This work was supported, in part, by two grants from the U.S. National Science Foundation (RCN-SEES 1140163 and NSF-IGERT 0966093). We are also indebted to the Flood Forecasting and Warning Centre (FFWC) of Bangladesh Water Development Board (BWDB) and Institute of Water Modelling (IWM), Dhaka, Bangladesh, for data and information sharing about the GBM river basins and Bangladesh floods.