Abstract

The identification of the rainfall–runoff relationship is a significant precondition for surface–atmosphere process research and operational flood forecasting, especially in inadequately monitored basins. Based on an information diffusion model (IDM) improved by a genetic algorithm, a new algorithm (GIDM) is established for interpolating and forecasting monthly discharge time series; the input variables are the rainfall and runoff values observed during the previous time period. The genetic operators are carefully designed to avoid premature convergence and “local optima” problems while searching for the optimal window width (a parameter of the IDM). In combination with fuzzy inference, the effectiveness of the GIDM is validated using long-term observations. Conventional IDMs are also included for comparison. Twelve gauging stations on the Yellow River and Yangtze River are examined, and the results show that the new method can simulate the observations more accurately than traditional IDMs, using only 50% or 33.33% of the total data for training. The low density of observations and the difficulties in information extraction are key problems for hydrometeorological research. Therefore, the GIDM may be a valuable tool for improving water management and providing acceptable input data for hydrological models when available measurements are insufficient.

1. Introduction

Sparse and incomplete observations cannot meet the needs of hydrological modeling of physical processes. In addition, the rainfall–runoff relationship is one of the most complex hydrologic phenomena to comprehend because of the tremendous spatial and temporal variability of watershed characteristics and precipitation patterns (Sedki et al. 2009). Therefore, filling the existing observational data gaps and establishing an acceptable model for rainfall–runoff forecasting is a crucial precondition for hydrometeorological research and operational flood forecasting, especially in undermonitored river basins. Tremendous efforts have been made over the last few decades to recover missing data and to improve hydrological predictions.

Most missing-data recovery methods, such as kriging interpolation, polynomial interpolation, optimal interpolation, Kalman filtering, the successive corrections method, fractal interpolation, and phase space reconstruction prediction, have been widely applied to hydrologic and oceanographic interpolation. However, these methods may not achieve acceptable results when the known data make up less than 60% of the total (Wang et al. 2008).

Many models have been proposed to estimate the relationship between rainfall and runoff accurately, and many have produced good results. These models can be broadly divided into three groups: regression-based methods, physical models, and artificial neural network (ANN) methods. The first group, which includes autoregressive moving-average models, has been widely used for reservoir design and optimization (Carlson et al. 1970; Chen and Rao 2002; Komorník et al. 2006; Salas 1993). However, these methods generally assume that the observations follow a normal distribution or require the form of the equation to be assumed in advance. Therefore, it is very difficult to obtain a reasonable result for a small sample without any information about the shape of the population.

In physical models (Sorooshian et al. 1993; Todini 1996; Whigham and Crapper 2001), equations of mathematical physics are used to describe the relationships in physical systems. However, the parameters of these models must be estimated by minimizing objective functions, which generally leads to sets of unrealistic parameters that absorb both data measurement errors and errors in the structure of the model itself; parameter observability conditions cannot always be guaranteed either.

In recent years, the ANN technique has been of particular interest in operational hydrology. It is capable of simulating a nonlinear system that is hard to describe using traditional physical modeling. ANNs and improved ANNs have been applied in many fields of hydrology and water resource research (Alvisi et al. 2006; Cheng et al. 2005; Keskin et al. 2006; Muttil and Chau 2006; Tao et al. 2008; Zounemat-Kermani and Teshnehlab 2008). However, ANNs generally require sufficient samples to set optimal connection weights and thresholds, which may become challenging when data information is incomplete.

Aiming at shortcomings of the existing interpolation methods and hydrologic models mentioned above, the information diffusion model (IDM) is introduced into this paper. IDM is an effective method of dealing with the small-sample issue; it can capture complex nonlinear relationships without detailed knowledge of the physical processes (Huang 1997, 2001). Based on an IDM, an incomplete dataset can be regarded as a piece of fuzzy information; through some diffusion methods, some additional information can be extracted by spreading the observations. The diffusion coefficient [simple window width (SWW)] can be easily determined according to nearby criteria (Huang 1997) with incomplete data. Greater reliability for risk assessment and pattern recognition can be achieved using this method (Feng and Huang 2008; Huang 2002; Li et al. 2012). However, the IDM equipped with SWW (SIDM) is unable to precisely analyze meteorological, oceanic, and hydrological data that follow an asymmetrical and abnormal distribution. To solve this issue, under the principle of least-mean-squared errors, Xinzhou et al. (2003) proposed the optimal window width (OWW)-based IDM (OIDM), which displays a better performance for estimating the nonnormal population than SIDM. OWW uses the mean value of observations to iteratively compute an approximate result instead of taking all observations into account in one step, which may result in “local optima” and inconsistencies. Genetic algorithms (GAs) are implemented based on the ideas of natural genetics and biological evolution; the algorithm works with a number of solution sets over the search domain rather than a single one so that local optima can effectively be avoided (Goldberg and Holland 1988; Hong et al. 2013b). Hence, this paper presents a new method that obtains global optima diffusion coefficients by using a GA to interpolate and forecast monthly discharge time series.

To the best of our knowledge, no study has been reported in the hydrological literature that has used the IDM for hydrological modeling. Therefore, to facilitate our discussion, the principle of information diffusion and the algorithms to calculate the SWW and OWW are explained in section 2; in the same section, the new method to obtain the diffusion coefficients based on the GA is also discussed. Interpolation and prediction of river runoff using the IDM coupled with fuzzy inference are examined in section 3. To substantiate the new method, a step-by-step implementation of IDMs for monthly river runoff interpolation and forecasting at real gauging stations is presented in section 4. Finally, in section 5, some conclusions are presented and future work is proposed.

2. IDM and the window width

a. Principles of information diffusion and SWW

Information diffusion refers to making an affirmation: when a knowledge sample is given, it can be used to compute a relationship between population and sample. Let be a given sample (where curly brackets indicate a series of values) and let the universe of discourse be . If and only if is incomplete, there must be a reasonable information diffusion function , , which can accurately estimate the real relation . This is called the principle of information diffusion (Huang 1997). Let be n independent identically distributed observations drawn from a population with density . Suppose is a Borel measurable function in :

 
formula

is called an information diffusion estimator about , where is the diffusion function and is the window width or diffusion coefficient (Huang 1997). According to molecule diffusion theory and nearby criteria, Huang (1997) obtained the normal diffusion function as

 
formula

and the (SWW) as :

 
formula
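
The estimator and the simple window width can be illustrated with a minimal Python sketch. It assumes the usual normal-diffusion form, in which the estimate at a point is the average of Gaussian diffusion functions centered on the observations and scaled by the window width, and it uses the piecewise SWW coefficients as they are commonly quoted from Huang (1997). The function names (`normal_diffusion`, `simple_window_width`, `diffusion_estimate`) are illustrative, and the coefficients should be checked against the original reference before use.

```python
import math

# SWW coefficients for small samples, as commonly quoted from Huang (1997).
# Assumption: verify against the original reference before relying on them.
_SWW_COEFF = {5: 0.8146, 6: 0.5690, 7: 0.4560, 8: 0.3860, 9: 0.3362, 10: 0.2986}


def normal_diffusion(z):
    """Standard normal diffusion function evaluated at z."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def simple_window_width(sample):
    """Simple window width from the sample size n and the range b - a."""
    n = len(sample)
    if n < 5:
        raise ValueError("this sketch covers n >= 5 only")
    spread = max(sample) - min(sample)
    if n in _SWW_COEFF:
        return _SWW_COEFF[n] * spread
    return 2.6851 * spread / (n - 1)


def diffusion_estimate(x, sample, h):
    """Information diffusion estimate of the density at x with window width h."""
    n = len(sample)
    return sum(normal_diffusion((x - xi) / h) for xi in sample) / (n * h)
```

With a given sample, `simple_window_width(sample)` supplies the value of `h` passed to `diffusion_estimate`; a larger `h` spreads each observation over a wider neighborhood.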

b. Optimal window width

The SWW-based information diffusion method maximizes the amount of useful data extracted from the sample, thus improving the accuracy of system recognition and natural disaster risk evaluation (Huang 2001; Palm 2007). However, the method is invalid when the population from which observations are drawn does not follow a normal distribution. To obtain a more accurate information diffusion estimator for the abnormal relationship, based on the principle of least-mean-squared error, Xinzhou et al. (2003) proposed an iterative method to obtain the optimal diffusion coefficient (OWW), which can be expressed by

 
formula

where is the initial iterative value; and denote the ordinal number of records and iterations, respectively; and , , and return the largest, smallest, and mean elements in , respectively. When is determined, the iterative computations end with the OWW:

 
formula

The IDM in conjunction with the OWW, that is, the OIDM, applies to research data that follow both normal and abnormal distributions and can estimate a real relationship more accurately than a traditional one (Xinzhou et al. 2003). The OWW is obtained using the mean value of observations for iterative computation instead of including all observations in one step; however, the local optima problem (Goldberg and Holland 1988) may emerge. To avoid the problem, GAs are employed.

c. Searching the general optimal window width using the GA

GAs are established based on the ideas of natural genetics and biological evolution; the algorithm works with a number of solution sets over the search domain so that local optima can effectively be avoided (Goldberg and Holland 1988). Hence, this paper uses a GA to search for globally optimal diffusion coefficients. The combination of the GA and IDM for window width searching consists of three major phases. In the first phase, the GA initializes a population composed of random codes from the search domain (Xinzhou et al. 2003), where b and a are the maximum and minimum values of the samples, respectively. Because binary-encoded GAs require too many variables to solve such optimization problems, this paper selects a real-coded GA, in which each chromosome is encoded with real numbers rather than binary ones. The second phase is the evaluation of the fitness of all chromosomes. According to Xinzhou et al. (2003), the window width can be obtained by

 
formula

where is the information diffusion estimator and denotes different records from sample . Motivated by second-order schemes (Flajolet and Sedgewick 1995),

 
formula

Then

 
formula

which is the criterion for determining the window width. Thus, different records have their own window widths. To take all records into account at the same time and search for a single globally optimal window width, we select

 
formula

as the fitness value function. The third phase applies the evolutionary operations of the GA, namely selection, crossover, and mutation, according to the fitness of each chromosome, as discussed in Goldberg and Holland (1988) and Hong et al. (2013a). The evolution stops when the fitness is smaller than a predefined value. Finally, the improved window width (IWW) is taken from the chromosome with the lowest fitness value.
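
The three phases can be sketched as a small real-coded GA in Python. This is an illustration under stated assumptions, not the operators used in the paper: the `fitness` callable stands in for the fitness value function defined above, and binary tournament selection, arithmetic crossover, Gaussian mutation, and one-member elitism are generic choices. The name `real_coded_ga` and all parameter defaults are hypothetical.

```python
import random


def real_coded_ga(fitness, lower, upper, pop_size=40, generations=100,
                  mutation_rate=0.2, tol=None, seed=0):
    """Minimise `fitness` over [lower, upper] with a simple real-coded GA.

    Illustrative operators (assumed, not the paper's): binary tournament
    selection, arithmetic crossover, Gaussian mutation, one-member elitism.
    """
    rng = random.Random(seed)
    # Phase 1: random real-valued chromosomes drawn from the search domain.
    population = [rng.uniform(lower, upper) for _ in range(pop_size)]
    best = min(population, key=fitness)  # Phase 2: evaluate fitness

    def pick():
        # Binary tournament: the fitter of two random chromosomes wins.
        a, b = rng.choice(population), rng.choice(population)
        return a if fitness(a) <= fitness(b) else b

    for _ in range(generations):
        if tol is not None and fitness(best) < tol:
            break  # stop once the fitness is below the predefined value
        # Phase 3: selection, crossover, and mutation.
        children = [best]  # elitism keeps the best chromosome found so far
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            w = rng.random()
            child = w * p1 + (1.0 - w) * p2  # arithmetic crossover
            if rng.random() < mutation_rate:
                child += rng.gauss(0.0, 0.1 * (upper - lower))
            children.append(min(max(child, lower), upper))  # stay in domain
        population = children
        best = min(population, key=fitness)
    return best
```

Working with a population rather than a single iterate is what lets the search escape a poor starting point; the elitism step ensures the best window width found so far is never lost between generations.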

After a brief overview of IDM and improved IDM techniques is presented, the procedure for interpolating and predicting river discharge based on the IDM is described in the next section.

3. IDM for runoff estimation using fuzzy inference

The IDM coupled with fuzzy inference is an approach that processes samples using a set numerical method (Huang 2002). Let be a training set of observations on ( is the real line), where input denotes the index of records sorted by chronological order or precipitation data, is the river discharge, and curly brackets indicate a series of values.

Let be the domain of and be the range of . The element of will be denoted by , the same for by , and curly brackets indicate a series of values. Let

 
formula

The Cartesian product is called the illustrating space, and and are fuzzy sets of and , respectively. Recalling Eqs. (1) and (2) to deal with their membership functions, the following can be obtained:

 
formula

and

 
formula

An illustrating point is given by , and and are window widths that can be obtained by and based on the algorithms discussed in section 2. The information gain of is

 
formula

Then we have

 
formula

which consists of the information matrix (Huang 2002)

 
formula

According to the theory of factor space (Pei-Zhuang 1990), a fuzzy relation matrix ,

 
formula

can be obtained from an information matrix by using

 
formula

To calculate the output fuzzy set , we use to denote the input fuzzy set,

 
formula

Using the fuzzy inference formula

 
formula

where operator “” denotes the maximum − minimum fuzzy composition rule,

 
formula

where ; thus, we can obtain

 
formula

Finally, the gravity center of the fuzzy set is generated as the output:

 
formula

In general, we use the given sample to construct a relationship between the river discharge and its antecedent values or its meteorological influencing factor in the following form:

 
formula

where is an input vector consisting of and is the flow in the next period or the lack of measurement. Thus, the value of river runoff occurring in a particular moment can be estimated by IDM with the help of fuzzy inference.
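
The chain from training sample to estimated runoff can be sketched in Python as follows. The sketch assumes one common form of the information-matrix approach: unnormalized Gaussian diffusion onto the illustrating points, an information matrix summed over the records, a fuzzy relation matrix obtained by dividing each column by its maximum, max–min composition, and a plain centroid for the output. These choices, and the names `gauss_membership`, `fuzzy_relation`, and `infer`, are assumptions for illustration and may differ in detail from Eqs. (10)–(22).

```python
import math


def gauss_membership(value, point, h):
    """Unnormalised normal diffusion of `value` onto an illustrating point."""
    return math.exp(-((point - value) ** 2) / (2.0 * h * h))


def fuzzy_relation(xs, ys, U, V, hx, hy):
    """Build the information matrix Q, then normalise each column to get R."""
    Q = [[sum(gauss_membership(x, u, hx) * gauss_membership(y, v, hy)
              for x, y in zip(xs, ys)) for v in V] for u in U]
    col_max = [max(Q[j][k] for j in range(len(U))) for k in range(len(V))]
    return [[Q[j][k] / col_max[k] if col_max[k] > 0 else 0.0
             for k in range(len(V))] for j in range(len(U))]


def infer(x0, R, U, V, hx):
    """Max-min composition of the input fuzzy set with R, then centroid."""
    A = [gauss_membership(x0, u, hx) for u in U]
    B = [max(min(A[j], R[j][k]) for j in range(len(U)))
         for k in range(len(V))]
    total = sum(B)
    if total == 0.0:
        raise ValueError("input lies too far from every illustrating point")
    return sum(v * b for v, b in zip(V, B)) / total
```

The window widths `hx` and `hy` are the quantities supplied by the SWW, OWW, or GA search of section 2; everything else in the pipeline is identical across the three IDM variants, which is why the comparison isolates the effect of the window width.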

4. Case study

To investigate the effectiveness of the proposed GIDM for runoff prediction and interpolation, two groups of experiments have been carried out. In section 4b, the GIDM is compared with the SIDM and OIDM for reconstructing and forecasting monthly discharge, using the same sparse observed data at Lijin station and neighboring stations. In section 4c, further validation at other gauging stations is discussed.

a. Study area and data

The Tangnaihai, Lanzhou, Zhengzhou, Huayuankou, Jinan, and Lijin stations on the Yellow River, along with the Yibin, Zhutuo, Wanzhou, Yichang, Wuhu, and Datong stations on the Yangtze River, have been selected for this study (see Fig. 1). The Yangtze River, the longest river in China with a total length of 6380 km and a drainage basin of 1.8 × 10⁶ km², covers 20% of China’s land area. The surface runoff of the Yangtze River is about 9.616 × 10¹¹ m³, which accounts for 36% of the total runoff in China. The Yellow River is equally vital in China’s hydrological cycle, with a mainstream length of 5464 km and an area of 752 443 km². It originates from the Tibetan Plateau and flows eastward, crossing six Chinese provinces and two autonomous regions on its course to Bo Hai. The Lijin station is the farthest downstream on the Yellow River and is the master regulation station for river discharge and sediment. Because of its importance, Lijin is selected for a detailed evaluation of how well the GIDM simulates changes in runoff, with the other stations as auxiliary evaluation sites. The monthly runoff data from January 1951 to December 2010 are published by the Yellow River Conservancy Commission (YRCC) and in the Bulletins of Chinese River Sediment compiled by the Ministry of Water Resources. Precipitation data from January 1981 to December 2010 are collected from the China Meteorological Administration.

Fig. 1.

Location of study sites. R1 refers to the Yellow River. R2 refers to the Yangtze River. The numbers 1, 2, 3, 4, 5, and 6 refer to Lijin, Lanzhou, Huayuankou, Zhutuo, Yichang, and Datong stations, respectively. The numbers 7, 8, 9, 10, 11, and 12 refer to Jinan, Zhengzhou, Tangnaihai, Yibin, Wanzhou, and Wuhu stations, respectively.

b. Experiments estimating monthly runoff time series at Lijin

1) Experiment 1: Interpolating runoff time series using 50% of the total data

A real example of the monthly runoff data (×10⁸ m³) from January 1951 to December 1965, taken at Lijin station, is presented to illustrate the step-by-step implementation of different IDMs.

  • Step 1. Let records measured from January 1951 to December 1965 be .

  • Step 2. Based on the Monte Carlo method, 50% of the observation data are pseudorandomly selected as the input data (see Table 1), and the remaining 50% are treated as missing data.

  • Step 3. Calculate the window width of IDM according to the input and the algorithm described in section 2. The values of SWW, OWW, and IWW are listed in Table 2.

  • Step 4. In the information diffusion technique, the selection of appropriate illustrating points is crucial for successful implementation because it provides the basic information about the system. Through a statistical analysis of the data series, the illustrating space can be well established. The input data are analyzed with respect to their distribution in Fig. 2. The more elements the spaced container has, the more illustrating points should be installed into it. Therefore, the illustrating space (where curly brackets indicate a series of values) is designed as 
    formula
  • Step 5. Calculate the final output using Eqs. (11)–(22).
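
The pseudorandom selection in step 2 can be sketched with a seeded draw in Python. The paper does not specify the sampler, so the use of `random.sample`, the function name `split_observations`, and the seed are assumptions for illustration.

```python
import random


def split_observations(series, fraction, seed=0):
    """Pseudorandomly keep `fraction` of the records as known input.

    Returns (known, missing) as lists of (index, value) pairs, both in
    chronological order. The sampler and seed are illustrative choices.
    """
    rng = random.Random(seed)
    n_known = round(len(series) * fraction)
    keep = set(rng.sample(range(len(series)), n_known))
    known = [(i, v) for i, v in enumerate(series) if i in keep]
    missing = [(i, v) for i, v in enumerate(series) if i not in keep]
    return known, missing
```

For the 180 monthly records from January 1951 to December 1965, a fraction of 0.5 leaves 90 records as input and 90 to be reconstructed; fixing the seed makes the same split available to all three IDMs.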

Table 1.

The input data of experiment 1.

Table 2.

The corresponding diffusion coefficients of experiment 1.

Fig. 2.

Distribution of input data in experiment 1.

To measure the consistency of SIDM, OIDM, and GIDM, the monthly runoff for the years 1966–80, 1981–95, and 1996–2010 is examined following the same procedure. Figure 3 displays the aggregated time series of observed and interpolated runoff. It can be observed that the GIDM outperforms the other two models. For example, in Fig. 3a, the SIDM and OIDM clearly undersimulate the runoff in July 1951, June 1955, November 1958, October 1961, and July 1964 and oversimulate it in March 1956, January 1957, and October 1965, whereas the GIDM agrees well with the observations in these months. Although some discrepancies exist between the observed data and the GIDM simulation (e.g., from June to November 1953), the general tendency is acceptable, considering the limited number of training samples.

Fig. 3.

Comparison of observed runoff (solid blue line) and interpolated runoff forced by SIDM (solid black line), OIDM (solid red line), and GIDM (solid green line) at gauge Lijin from (a) January 1951 to December 1965, (b) January 1966 to December 1980, (c) January 1981 to December 1995, and (d) January 1996 to December 2010 using 50% of the data.

The root-mean-square error (RMSE; Wang et al. 2009; Nayak et al. 2004), the coefficient of correlation R (Wang et al. 2009), the Nash–Sutcliffe efficiency coefficient E (Nash and Sutcliffe 1970), and the mean absolute percentage error (MAPE; Hu et al. 2001) are employed as objective functions to calibrate the model. Table 3 shows the RMSE, R, E, and MAPE values for different models. It is clear from Table 3 that GIDM performs better than the traditional SIDM and OIDM. For example, in the years 1966–80, considering a high value of 145.2000 × 10⁸ m³ and a very low value of 0.4692 × 10⁸ m³ at the Lijin gauging station, the GIDM, with an RMSE value of 20.4493 × 10⁸ m³, performed the interpolation satisfactorily. Moreover, the GIDM obtained the best R, E, and MAPE statistics of 0.8580, 0.7159, and 28.69, respectively. Coefficient R evaluates the linear correlation between the observed and computed flow, E evaluates the capability of the model in predicting flow values deviating from the mean, and MAPE measures the mean absolute percentage error of the forecast. Therefore, according to the values in Table 3, it can be concluded that the GIDM is robust and consistent.
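
The four statistics can be computed as in the following Python sketch, which uses their standard textbook definitions; the exact formulations in the cited references may differ in detail (for example, in whether MAPE is expressed in percent), and the function names are illustrative.

```python
import math


def rmse(obs, sim):
    """Root-mean-square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def correlation(obs, sim):
    """Pearson coefficient of correlation R."""
    mo, ms = sum(obs) / len(obs), sum(sim) / len(sim)
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov / math.sqrt(var_o * var_s)


def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency E: 1 is perfect, 0 matches the mean."""
    mo = sum(obs) / len(obs)
    return 1.0 - (sum((o - s) ** 2 for o, s in zip(obs, sim))
                  / sum((o - mo) ** 2 for o in obs))


def mape(obs, sim):
    """Mean absolute percentage error in percent; obs must be nonzero."""
    return 100.0 * sum(abs((o - s) / o) for o, s in zip(obs, sim)) / len(obs)
```

Lower RMSE and MAPE and higher R and E indicate a better fit, which is the sense in which the best values are identified in the tables.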

Table 3.

The RMSE, R, E, and MAPE values for different models (50% data, where boldface font indicates the best performance).


2) Experiment 2: Interpolating runoff time series using 33.33% of the total data

Following experiment 1, only 33.33% of the monthly discharge data are selected as the input in step 2, with the remaining 66.67% used for testing. Figure 4 plots the observed and reconstructed discharges for the different models. The GIDM still correlates well with the recorded discharges, although there are some slight oversimulations and undersimulations. Table 4 compares the models in terms of the various performance statistics. For example, in the years 1981–95, the GIDM reduced the RMSE and MAPE of the SIDM interpolation by about 23.92% and 16.14%, respectively, and improved R and E by approximately 20.23% and 40.26%, respectively. Compared with the OIDM, the RMSE and MAPE values obtained by the GIDM decreased by 16.04% and 30.24%, while the R and E values increased by 10.75% and 20.59%. Overall, the GIDM achieves better accuracy across the different evaluation measures. In addition, as sparser datasets are used for training, all model simulations gradually deteriorate, because the samples contain less information about the river runoff.

Fig. 4.

As in Fig. 3, but for 33.33% of the data.

Table 4.

The RMSE, R, E, and MAPE values for different models (33.33% data, where boldface font indicates the best performance).


3) Experiment 3: 12- and 24-month lead time forecasting

A new framework is proposed using IDM to investigate the relationship between upstream rainfall and predicted discharges at Lijin station. Assuming that the discharges at Lijin from January to December 2010 are ungauged, the only observations we have are the broken time series of monthly flow data from January 1981 to December 2009 measured at Lijin station and broken precipitation data at neighboring Jinan (see Fig. 1) station from January 1981 to December 2010.

  • Step 1. Let records from January 1981 to December 2009 be denoted by , where and are referred to as the rainfall and flow data and curly brackets indicate a series of values, respectively.

  • Step 2. Based on the Monte Carlo method, 33.33% of the observation data (see Table 5) are pseudorandomly selected for training the window widths, with the remaining 66.67% as missing data.

  • Step 3. Calculate the window widths. The values of SWW, OWW, and IWW are listed in Table 6.

  • Step 4. The corresponding illustrating space (where curly brackets indicate a series of values) with respect to the input distribution (see Fig. 5) is designed as 
    formula
  • Step 5. Calculate the final predicted values using Eqs. (17)–(22). Taking GIDM as an example, first, calculate the fuzzy relationship matrix using Eq. (17) (see Table 7):

    • Then calculate for the amount of precipitation at Jinan in June 2010, . By Eq. (18), it can be obtained that 
      formula
    • Second, we use Eq. (20) and to calculate : 
      formula
    • Finally, by Eq. (22), the runoff at Lijin in June 2010 is calculated: 
      formula

Following the steps discussed above, the results of SIDM, OIDM, and GIDM for runoff forecasting at Lijin can be obtained (see Fig. 6). Figure 6 shows that the variation of runoff at Lijin is closely related to changes in upstream precipitation. Some slight differences between the rainfall and runoff tendencies suggest that the river flow responds not only to rainfall but also to other physical factors, such as evaporation and soil moisture, and to intensive human activities. In addition, Fig. 6 indicates a good match between the model output and observed runoff, especially in peak discharge forecasting using GIDM, which means the new method may be used as an operational flood forecasting tool.

Table 5.

The input data of experiment 3.

Table 6.

The corresponding diffusion coefficients of experiment 3.

Fig. 5.

Distribution of input data in experiment 3.

Fig. 6.

Comparison between predicted and measured flow values at Lijin and rainfall data at Jinan (33.33% data).

Table 7.

Step 5 table.


To validate the effectiveness of SIDM, OIDM, and GIDM for runoff forecasting, a 24-month lead time prediction has also been made for the period from January 2009 to December 2010. The above-mentioned experiment is repeated 30 times using different rainfall and runoff data from Zhengzhou (see Fig. 1) and Lijin, respectively, for training. Table 8 shows the average performance of the different IDMs, and Fig. 7 presents the monthly hydrograph of observed and simulated river runoff for Lijin in the first of these runs. The results confirm that the variation of runoff at Lijin is sensitive to changes in precipitation at upstream stations and that the GIDM performed better than the other two models: the GIDM improved the performance of traditional models by about 0.6%–33% in terms of different evaluation criteria.

Table 8.

Forecasting performance indices of models for Lijin on average for the period from January 2009 to December 2010 (33.33% data, 30 times, where boldface font indicates the best performance).

Fig. 7.

As in Fig. 6, but for rainfall data at Zhengzhou.

c. More interpolation and forecasting experiments at other stations

Following experiments 2 and 3, the three IDMs have been used to interpolate monthly river discharges and to estimate the rainfall–runoff relationship at five other gauges. In particular, the runoff curves from Huayuankou and Yichang are given (see Figs. 8, 9) because these stations lie on the middle reaches of two different main rivers in China and differ in physiographical factors, such as catchment area and underlying surface. Although there are some obvious underestimations in Fig. 8a, the results indicate that the GIDM provides better interpolation and prediction performance than traditional IDMs. Moreover, the analysis in Figs. 8b and 9 supports the consistency of the new method.

Fig. 8.

Comparison of observed runoff and interpolated runoff forced by SIDM, OIDM, and GIDM at gauges (a) Huayuankou (from January 1951 to December 1970) and (b) Yichang (from January 1991 to December 2010) for 33.33% of the data.

Fig. 9.

Comparison of observed and simulated runoff at gauges (a) Yichang (from January to December 2010) and (b) Huayuankou (from January 2009 to December 2010) for prediction using different IDMs (33.33% data).

Estimates of the variations of discharge at Lanzhou, Zhutuo, and Datong stations are shown in Tables 9 and 10. According to the values in Tables 9 and 10, the GIDM obtained the best RMSE, R, E, and MAPE statistics for the interpolations and predictions at Lanzhou, Zhutuo, and Datong. In summary, the GIDM shows considerable promise for the interpolation and prediction of river runoff from incomplete information.

Table 9.

Interpolating performance indices of models for different stations (33.33% data, where boldface font indicates the best performance).

Table 10.

Forecasting performance indices of models for different stations (33.33% data, where boldface font indicates the best performance).


5. Conclusions

A new algorithm for reconstructing and forecasting river discharges with incomplete data has been proposed in this paper. The algorithm is designed to improve the diffusion coefficient of the IDM so that more information can be extracted from sparse data. Conventional IDMs are also included for comparison. The monthly runoff data from Lijin, Lanzhou, Huayuankou, Zhutuo, Yichang, and Datong gauging stations and upstream rainfall data are employed to train and validate the different IDMs. According to the results obtained, the potential of the new method can be summarized as follows:

  1. The GIDM is appropriate for estimating the relationship between monthly runoff and rainfall upstream with scanty data, which is crucial for flood prevention and water management in unmeasured basins.

  2. With sparse observations, the GIDM is an operational tool for long-term interpolation of river runoff, which may provide acceptable input data for hydrologic modeling of physical processes; traditional time series approaches have to use much more information to obtain an acceptable result.

  3. On average, the GIDM can improve traditional IDM interpolation and prediction by about 10%–40% in terms of the different performance criteria.

Although the new IDM is adequate for modeling the runoff time series, its results are still not acceptable when samples are too sparse to permit effective simulation and forecasting. Future research should therefore focus on establishing a more efficient diffusion function, on estimating the relationship between runoff and other meteorological factors, and on reducing the computational time needed to search for the optimal parameters of IDMs, so as to improve the accuracy of hydrological simulation and to achieve better operation and management of the various engineering systems.

Acknowledgments

The authors are very grateful to Dr. Joe Turk and the anonymous reviewers for their valuable comments and constructive suggestions, which helped us significantly in improving the quality of the paper. This research was supported by the Chinese National Natural Science Fund (Grants 41375002, 41075045, 51190091, and 41071018) and the Chinese National Natural Science Fund of Jiangsu Province (BK2011123), the Program for New Century Excellent Talents in University (NCET-12-0262), the China Doctoral Program of Higher Education (20120091110026), the Qing Lan Project, the Elite Young Teachers Program, and the Excellent Disciplines Leaders in Midlife-Youth Program of Nanjing University.

REFERENCES

Alvisi, S., G. Mascellani, M. Franchini, and A. Bardossy, 2006: Water level forecasting through fuzzy logic and artificial neural network approaches. Hydrol. Earth Syst. Sci., 10, 1–17.
Carlson, R. F., A. MacCormick, and D. G. Watts, 1970: Application of linear random models to four annual streamflow series. Water Resour. Res., 6, 1070–1078.
Chen, H.-L., and A. R. Rao, 2002: Testing hydrologic time series for stationarity. J. Hydrol. Eng., 7, 129–136.
Cheng, C.-T., J.-Y. Lin, Y.-G. Sun, and K. Chau, 2005: Long-term prediction of discharges in Manwan hydropower using adaptive-network-based fuzzy inference systems models. Advances in Natural Computation, L. Wang, K. Chen, and Y. S. Ong, Eds., Lecture Notes in Computer Science, Vol. 3612, Springer, 1152–1161.
Feng, L., and C. Huang, 2008: A risk assessment model of water shortage based on information diffusion technology and its application in analyzing carrying capacity of water resources. Water Resour. Manage., 22, 621–633.
Flajolet, P., and R. Sedgewick, 1995: Mellin transforms and asymptotics: Finite differences and Rice's integrals. Theor. Comp. Sci., 144, 101–124.
Goldberg, D. E., and J. H. Holland, 1988: Genetic algorithms and machine learning. Mach. Learn., 3, 95–99.
Hong, M., R. Zhang, J. X. Li, J. J. Ge, and K. F. Liu, 2013a: Inversion of the western Pacific subtropical high dynamic model and analysis of dynamic characteristics for its abnormality. Nonlinear Processes Geophys., 20, 131–142.
Hong, M., R. Zhang, H. Z. Wang, J. J. Ge, and A. D. Pan, 2013b: Bifurcations in a low-order nonlinear model of tropical Pacific sea surface temperatures derived from observational data. Chaos: Interdiscip. J. Nonlinear Sci., 23, 023104.
Hu, T. S., K. C. Lam, and S. T. Ng, 2001: River flow time series prediction with a range-dependent neural network. Hydrol. Sci. J., 46, 729–745.
Huang, C., 1997: Principle of information diffusion. Fuzzy Sets Syst., 91, 69–90.
Huang, C., 2001: Information matrix and application. Int. J. Gen. Syst., 30, 603–622.
Huang, C., 2002: Information diffusion techniques and small-sample problem. Int. J. Inf. Technol. Decis. Making, 1, 229–249.
Keskin, M. E., D. Taylan, and O. Terzi, 2006: Adaptive neural-based fuzzy inference system (ANFIS) approach for modelling hydrological time series. Hydrol. Sci. J., 51, 588–598.
Komorník, J., M. Komorníková, R. Mesiar, D. Szökeová, and J. Szolgay, 2006: Comparison of forecasting performance of nonlinear models of hydrological time series. Phys. Chem. Earth, 31, 1127–1145.
Li, Q., J. Zhou, D. Liu, and X. Jiang, 2012: Research on flood risk analysis and evaluation method based on variable fuzzy sets and information diffusion. Saf. Sci., 50, 1275–1283.
Muttil, N., and K.-W. Chau, 2006: Neural network and genetic programming for modelling coastal algal blooms. Int. J. Environ. Pollut., 28, 223–238.
Nash, J. E., and J. V. Sutcliffe, 1970: River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol., 10, 282–290.
Nayak, P. C., K. P. Sudheer, D. M. Rangan, and K. S. Ramasastri, 2004: A neuro-fuzzy computing technique for modeling hydrological time series. J. Hydrol., 291, 52–66.
Palm, R., 2007: Multiple-step-ahead prediction in control systems with Gaussian process models and TS-fuzzy models. Eng. Appl. Artif. Intell., 20, 1023–1035.
Pei-Zhuang, W., 1990: A factor spaces approach to knowledge representation. Fuzzy Sets Syst., 36, 113–124.
Salas, J. D., 1993: Analysis and modeling of hydrologic time series. Handb. Hydrol., 19, 1–72.
Sedki, A., D. Ouazar, and E. El Mazoudi, 2009: Evolving neural network using real coded genetic algorithm for daily rainfall–runoff forecasting. Expert. Syst. Appl., 36, 4523–4527.
Sorooshian, S., Q. Duan, and V. K. Gupta, 1993: Calibration of rainfall–runoff models: Application of global optimization to the Sacramento Soil Moisture Accounting Model. Water Resour. Res., 29, 1185–1194.
Tao, W., Y. Kailin, and G. Yongxin, 2008: Application of artificial neural networks to forecasting ice conditions of the Yellow River in the Inner Mongolia Reach. J. Hydrol. Eng., 13, 811–816.
Todini, E., 1996: The ARNO rainfall–runoff model. J. Hydrol., 175, 339–382.
Wang, H.-Z., R. Zhang, W. Liu, G.-H. Wang, and B.-G. Jin, 2008: Improved interpolation method based on singular spectrum analysis iteration and its application to missing data recovery. Appl. Math. Mech., 29, 1351–1361.
Wang, W. C., K. W. Chau, C. T. Cheng, and L. Qiu, 2009: A comparison of performance of several artificial intelligence methods for forecasting monthly discharge time series. J. Hydrol., 374, 294–306.
Whigham, P., and P. Crapper, 2001: Modelling rainfall–runoff using genetic programming. Math. Comput. Modell., 33, 707–721.
Xinzhou, W., Y. Yangsheng, and T. Yongjing, 2003: The theory of optimal information diffusion estimation and its application. Geospat. Inf., 1, 10–17.
Zounemat-Kermani, M., and M. Teshnehlab, 2008: Using adaptive neuro-fuzzy inference system for hydrological time series prediction. Appl. Soft Comput., 8, 928–936.