Streamflow Hydrograph Classification Using Functional Data Analysis

Camille Ternynck Institute Center for Water and Environment, Masdar Institute of Science and Technology, Abu Dhabi, United Arab Emirates, and Laboratoire LEM, Maison de la Recherche, Domaine Universitaire du Pont de Bois, University of Lille, Villeneuve-d’Ascq, France

,
Mohamed Ali Ben Alaya Eau Terre Environnement, Institut National de la Recherche Scientifique, Quebec, Quebec, Canada

,
Fateh Chebana Eau Terre Environnement, Institut National de la Recherche Scientifique, Quebec, Quebec, Canada

,
Sophie Dabo-Niang Laboratoire LEM, Maison de la Recherche, Domaine Universitaire du Pont de Bois, and Modal Team INRIA, University of Lille, Villeneuve-d’Ascq, France

, and
Taha B. M. J. Ouarda Institute Center for Water and Environment, Masdar Institute of Science and Technology, Abu Dhabi, United Arab Emirates, and Eau Terre Environnement, Institut National de la Recherche Scientifique, Quebec, Quebec, Canada


Abstract

Classification of streamflow hydrographs plays an important role in a large number of hydrological and hydraulic studies. For instance, it allows decisions to be made regarding the implementation of hydraulic structures and characterization of different flood types, leading to a better understanding of extreme flow behavior. The employed hydrograph classification methods are generally based on a finite number of hydrograph characteristics and do not include all the available information contained in a discharge time series. In this paper, two statistical techniques from the theory of functional data classification are adapted and applied for the analysis of flood hydrographs. Functional classification directly employs all data of a discharge time series and thus contains all available information on shape, peak, and timing. This potentially allows a better understanding and treatment of floods as well as other hydrological phenomena. The considered functional methodology is applied to streamflow datasets from the province of Quebec, Canada. It is shown that classes obtained using functional approaches have merit and can lead to better representation than those obtained using a multidimensional hierarchical classification method. The considered methodology has the advantage of using all of the information contained in the hydrograph, thus reducing the subjectivity that is inherent in multidimensional analysis of the type and number of characteristics to be used and consequently diminishing the associated uncertainty.

Corresponding author address: Camille Ternynck, Institute Center for Water and Environment, Masdar Institute of Science and Technology, P.O. Box 54224, Abu Dhabi, United Arab Emirates. E-mail: ternynck.camille@gmail.com


1. Introduction

The hydrograph as a graphical representation of the temporal variation of flow is the main source of information to study flow behavior. The information provided by the hydrograph is essential to determine the severity and frequency of extreme hydrological events, especially floods and droughts. The stream hydrograph is an integration of spatial and temporal variations in water input, storage, and transfer processes within a catchment. Thus, hydrographs may present different hydrological regimes for a given watershed (e.g., Hannah et al. 2000). Therefore, hydrographs of a given watershed may not be similar from year to year. Classifying hydrographs into homogeneous classes is of interest to identify and understand the different regimes, to characterize groups, to separate events, and to detect possible changes. In addition, hydrograph classification is essential to characterize the impacts of climate disturbances on hydrological regimes (e.g., Kingston et al. 2011). Hydrograph classification is then very important, particularly where changes in the frequency and/or in the intensity of various forms of extreme weather events could occur. The classification of hydrographs can allow characterizing different hydrological regimes, which leads to a better understanding and specific treatment of the behavior of extreme flows and the associated water resource activities (e.g., Harris et al. 2000).

From a water resources management point of view, hydroelectric utilities are interested in classifying hydrographs based on their shape and eventually linking the shape to a risk measure (e.g., a return period). Previous efforts to derive a rational classification procedure were limited mainly by technique availability. Yue et al. (2002), for instance, considered the two-parameter beta probability density function to represent the shape of hydrographs and used two shape variables (shape mean and shape variance) to classify flood hydrographs. This approach, although simplistic, was useful for the classification of hydrographs for practical purposes in the province of Quebec, Canada. The hydrological community needs hydrograph classification methods that can provide a full representation of the hydrograph and a full use of all the information contained within.

In terms of methods, hierarchical classification (HC) is the most commonly used technique for hydrograph classification. Hannah et al. (2000) proposed a multidimensional technique to classify diurnal discharge hydrographs from glacier basins separately according to their shape and magnitude. Their procedure involves two separate classifications of the hydrographs that have been combined. The aim of the first classification is to derive a set of distinct diurnal hydrograph shape classes using the HC approach based on principal component analysis (PCA). The second classification is based on four magnitude indices: the mean, minimum, maximum, and variance of monthly observations. This method was adapted by Harris et al. (2000) to riparian systems on four British rivers, where flow regimes are defined by monthly mean flow series. Bower et al. (2004) used this same method to develop a regime classification to identify spatial and temporal patterns in intra-annual hydroclimatological response as well as an index to assess river flow regime climatic sensitivity. Assani and Tardif (2005) do not use an HC approach but proposed 11 hydrological variables based on monthly discharge data considered with a PCA to identify three significant components used to characterize hydrological regimes in Quebec. Recently, Belmar et al. (2011) proposed a hydrological classification using a β-flexible clustering technique based on weighted PCA scores. The latter are obtained using 73 hydrological indices describing natural flow regimes in Segura River basin, Spain. These indices include, for instance, measures of drought duration, as well as flow magnitude, central tendency, and dispersion.

In the hydrological literature, including the above-mentioned studies, a hydrograph is generally characterized by a limited number of characteristics. However, since the hydrograph represents the variation of flow over a period of time (Yue et al. 2002), a flood cannot be characterized only by a finite, even large, number of characteristics, but instead by its entire hydrograph as a curve. We propose an illustration that is, for simplicity, based on the main flood features. Figure 1a (left) shows two hydrograph types characterized by the same volume and different peaks and durations. In Fig. 1a (right), the two hydrograph types present the same peak, volume, and duration. The only difference between them is that the second occurs with a lag time from the first. Multidimensional classification taking into account only the peak, volume, and duration can detect the differences between the two hydrograph types of the first example but is unable to do so for the second example. This remark holds whatever the finite number of hydrograph features considered, because the hydrograph, as a continuous function, cannot be fully represented by any limited number of its features. Figure 1b illustrates another situation in which two hydrographs can correspond to the same peak value, duration, and volume and hence would not be differentiable through a classical multidimensional classification. However, these two hydrographs correspond to two completely different behaviors. Hydrograph I corresponds to a steep rising limb, which would lead to a large volume of water entering the reservoir in a short period of time. This does not give the operator enough time to evacuate the excess water (because of the limited capacity of the spillway) and could lead to dam overtopping and serious safety consequences. On the other hand, hydrograph II presents a slow rising limb, giving ample opportunity to spill excess water and reduce the risk level. A classification of the hydrographs based on their shape is hence important. This example represents a simple illustration of the importance of the shape of the hydrograph. It would have been possible to add an additional variable representing the time to peak. However, other considerations may require other variables, making the process difficult and highly dimensional, especially in the case of multimodal hydrographs, for example. A general classification procedure is needed and can be provided by the functional framework.

Fig. 1.

(a) Examples of flood hydrographs characterized by different flood types. Variables V, Q, and D denote the volume, the peak, and the duration, respectively. (b) Examples of flood hydrographs characterized by different behaviors.

Citation: Journal of Hydrometeorology 17, 1; 10.1175/JHM-D-14-0200.1

The examples above illustrate that the multidimensional approach depends on the indices used to characterize the phenomenon and that not taking into account some indices (e.g., Julian date) can influence the multidimensional classification results. On the other hand, when a large number of features, such as 73 indices (Belmar et al. 2011), is used, a large quantity of information can be extracted and the hydrograph can be almost fully represented. However, other drawbacks occur, such as the increase of dimensionality, redundancy, and subjectivity. When the number of variables to include increases, the number of choices and possible subsets of variables increases as well. Some variable selection techniques (see, e.g., Fraiman et al. 2008; Andrews and McNicholas 2014) can be used but are often computationally intensive and are based on the iterative use of hypothesis testing that induces errors at each step. In addition, some variables, such as the volume, are not directly available and require extraction from the raw data, which can increase uncertainty because of the lack of accuracy in their computation. A number of the considered variables or indices are also usually taken at monthly or annual time scales, which limits the ability to capture the temporal variability of the hydrological phenomenon. The above considerations have negative impacts on hydrograph classification, especially in terms of information loss, and they amount to a substantial simplification of the overall hydrological phenomenon.

Recently, Chebana et al. (2012) introduced the functional data analysis (FDA) statistical framework to the hydrological context. It is important to mention that in hydrology, the term “functional” is mainly used to refer, for example, to the function of a catchment as its input–output conversion. Here, the term FDA is employed to refer to a statistical framework based on functional data. Specifically, Chebana et al. (2012) focused on exploratory analysis as well as outlier detection of hydrographs. They showed that the FDA framework is adapted to the hydrological context with a number of advantages. Indeed, the functional framework is more general, flexible, and representative of the real hydrological phenomena than multidimensional analysis. In fact, the former treats the whole hydrograph as a functional observation (function or curve), taking into account the maximum of the available information, and constitutes a natural extension of multidimensional approaches. Hence, a classification approach based on the whole hydrograph of an annual discharge time series as a single observed function could lead to more representative classes. It is relevant to mention that Pappenberger and Beven (2004) proposed a hydrograph classification approach using the multidimensional HC method and based on graphical visualizations of hydrographs. Ganora et al. (2009) considered the flow duration as a curve for regionalization purposes. However, even though these studies indicate the need to consider the whole hydrograph in the classification process, they do not have a functional statistical foundation.

In the functional framework, a variety of hydrographs could be covered and all their features would be included without necessarily increasing the uncertainty and subjectivity. Active research is targeting the development of adapted statistical methods to analyze functional data. A number of classical approaches are extended to the functional context (e.g., Ramsay and Silverman 2005; Dabo-Niang et al. 2006; Cadre and Paris 2012; Fischer 2010). This is also the case for classification methods. The functional versions are often extensions of the classical classification ones, in particular the HC and the k-means methods. The aim of the present paper is to introduce the FDA framework for classifying streamflow hydrographs by considering discharge time series as continuous curves.

The theoretical background of functional k-means classification methods is presented in section 2, in its general form. In section 3, these methods are adapted to the hydrological context and applied to the daily streamflows of the Romaine River station. The detailed results are given in section 4. For more general results, functional methods are also applied to 13 other stations in the province of Quebec and the results are briefly presented. A comparison with a multidimensional classification method is also given in section 4. A discussion of all results is carried out in section 5 and conclusions are reported in section 6.

2. Functional k-means methods

The purpose of this section is to present some recently developed k-means procedures for classifying functional data. Let $x_i = (x_i(t_1), \ldots, x_i(t_T))$, $i = 1, \ldots, n$, be a set of n discrete observations, where each $t_j$ is the jth record time point from a given continuous time subset $\mathcal{T}$, which includes the set $\{t_1, \ldots, t_T\}$. For instance, an observation $x_i$ could be a daily flow or temperature series within a given ith year with T = 365. Note that the statistical object of FDA is a function (curve). However, the curves are not completely observed; instead, only discrete measurements of the curves are available. Then, a first step is to prepare the data to be used in an FDA context. For a fixed observation $x_i$, each set of measurements is converted to functional data and denoted by $\chi_i(t)$, $t \in \mathcal{T} \subset \mathbb{R}^+$, where $\mathbb{R}^+$ is the set of all positive real values, by using a smoothing (or interpolation) technique [see, e.g., Ramsay and Silverman (2005) and section 3b].

Generally, in any classification framework, data inside each class should be as similar as possible but different from those in other classes. Note that classification can be supervised or unsupervised (e.g., Hartigan 1975). In the first one, the number of classes is known in advance or chosen according to the study constraints. Otherwise, unsupervised approaches are considered. The present work focuses on these latter ones and more particularly on functional k-means techniques. Note that a number of classification approaches (e.g., the hierarchical algorithm) are also available in the functional literature (see, e.g., Dabo-Niang et al. 2007).

An appropriate classification should lead to homogeneous classes and heterogeneity between classes, thus avoiding unnecessary classes. Consequently, the obtained number of classes k is important (e.g., Milligan and Cooper 1985). Some classification methods do not automatically determine k. Techniques are developed in the literature to overcome this difficulty. One of the techniques consists of selecting k that optimizes a given class homogeneity index (e.g., Krzanowski and Lai 1988). In the application section below, for the k-means classification, an initial and arbitrary choice of k is taken at the beginning of the procedure that is not necessarily the final choice. Note that extensive literature exists for the initialization of the k-means algorithm (e.g., Khan and Ahmad 2004). In addition, the size of the obtained classes could be a concern, according to the classification aims. For instance, if the purpose is inference, such as modeling or estimation, then large size classes are necessary for reliable results. On the contrary, an exploratory or descriptive analysis does not require any size constraints and small classes could be of interest.

The set of the curves $\{\chi_1, \ldots, \chi_n\}$ is denoted S. The following presented approaches should lead to a splitting of S into some k distinct representative and interpretable classes $S_1, \ldots, S_k$. Two k-means classification methods for functional data are presented. In classical analysis, the widely used k-means classification consists of partitioning observations into k classes by minimizing a certain quantity, named distortion in this context (similar to a distance). Given k, the preliminary number of classes, the general k-means algorithm is the following (e.g., Hartigan 1975):

  1. Choose k initial centers $c_1, \ldots, c_k$.

  2. Identify for each observation $\chi_i$ the nearest center $c_l$. Each center defines a class l. The proximity between $\chi_i$ and $c_l$ is measured by a distortion measure $d(\chi_i, c_l)$.

  3. Define new centers: for each l, $c_l$ is the mean of the observations of the new class l.

  4. If the composition of each class is the same as in the previous pass through step 2, then stop the algorithm and save the obtained classes.

  5. Otherwise, go to step 2.
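
The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: curves are represented as equally sampled lists of values, the initial centers are passed in explicitly, and the names `k_means`, `squared_l2`, `quadratic_bias`, and `mean_curve` are hypothetical. The distortion is a plain function argument, so either a squared distance or a discrete analogue of the quadratic bias can be plugged in.

```python
def squared_l2(x, y):
    """Squared Euclidean distance between two sampled curves."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

def quadratic_bias(x, y):
    """Discrete analogue of the quadratic bias: squared sum of differences."""
    return sum(a - b for a, b in zip(x, y)) ** 2

def mean_curve(curves):
    """Pointwise mean of a list of equally sampled curves."""
    return [sum(vals) / len(curves) for vals in zip(*curves)]

def k_means(curves, initial_centers, distortion=squared_l2, max_iter=100):
    centers = [list(c) for c in initial_centers]           # step 1
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each curve to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda l: distortion(x, centers[l]))
            for x in curves
        ]
        # Step 3: recompute each center as the mean of its class.
        for l in range(len(centers)):
            members = [x for x, lab in zip(curves, new_labels) if lab == l]
            if members:                    # an empty class keeps its old center
                centers[l] = mean_curve(members)
        # Steps 4 and 5: stop once the classes no longer change.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers
```

As with any k-means procedure, the result depends on the initial centers, so several starting choices would normally be tried. Note also that, in this discrete form, `quadratic_bias` compares two curves only through the sum of their pointwise differences.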

This algorithm requires an adaptation to the functional setting. The main difficulty consists of dealing with the infinite dimensionality of the data curves. Note that a functional random variable takes values in an infinite dimensional space (such as a functional space, e.g., the space of continuous functions on a given interval; see Ferraty and Vieu 2006). First, as presented in the following, the classical k-means algorithm can be used with Bregman divergences as the distortion measure (see Fischer 2010). Second, the projection approaches proposed by Abraham et al. (2003) and James and Sugar (2003) are another adaptation. They consist of projecting the curves on a basis and obtaining classes by applying the k-means algorithm to the coefficients of the projection. The k-means method developed in Auder and Fischer (2012) is also presented in this section, where the infinite dimension is reduced by considering only the first p coefficients of the projection on a basis and then performing classification in $\mathbb{R}^p$.

a. Bregman divergence

In the general k-means algorithm, a distortion measure $d(\cdot, \cdot)$, between an observation and the center of a class, is needed. The main purpose of the k-means classification is to allocate each observation to a class l by minimizing the mean of the distances between each observation and its nearest class center, that is,
$$ \frac{1}{n} \sum_{i=1}^{n} \sum_{l=1}^{k} z_{il} \, d(\chi_i, c_l), $$
where $z_{il} = 1$ if $\chi_i$ belongs to class l, otherwise $z_{il} = 0$.

In $\mathbb{R}^d$, different well-known distances are used as distortion measures such as the Euclidean distance. However, in the context of curves classification, it is necessary to consider a notion of distance adapted to high dimensions. To this end, Banerjee et al. (2005) showed that the k-means algorithm should be generalized by replacing the classical distance by the Bregman divergence.

The Bregman divergence was introduced by Bregman (1967) in the multidimensional context. Most of the widely used distortion measures, such as the Euclidean distance, are particular cases of Bregman divergences. The Bregman divergences are generalized to the functional context and are defined by
$$ d_\phi(x, y) = \phi(x) - \phi(y) - D_y\phi(x - y), $$
where ϕ, x, and y are functions; $\phi : \mathcal{C} \to \mathbb{R}$, where $\mathcal{C}$ is a convex set; and $D_y\phi$ denotes the differential of the function ϕ at y. According to the choice of the function ϕ, the functional Bregman divergence may be, for instance, the squared distance, the quadratic bias, and the classical and generalized Kullback–Leibler divergences (e.g., Fischer 2010). The divergence choice depends on the data and on the type of partition required. In particular, the quadratic bias, used in the application section, is defined by
$$ d_\phi(x, y) = \left( \int_I \big( x(t) - y(t) \big) \, d\mu(t) \right)^2, \tag{1} $$
where $I$ is an interval in $\mathbb{R}$ and μ is a finite positive measure.
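
As a hedged worked check, the quadratic bias can be recovered from the general Bregman form $\phi(x) - \phi(y) - D_y\phi(x - y)$ by choosing $\phi(x) = \big(\int_I x \, d\mu\big)^2$. This choice of ϕ is an assumption made here for illustration (it is the standard "squared bias" generator, not a formula stated in the text above):

```latex
\begin{align*}
\phi(x) &= \Big(\int_I x \, d\mu\Big)^2, \qquad
D_y\phi(h) = 2\Big(\int_I y \, d\mu\Big)\Big(\int_I h \, d\mu\Big), \\
d_\phi(x, y) &= \phi(x) - \phi(y) - D_y\phi(x - y) \\
&= \Big(\int_I x \, d\mu\Big)^2 - \Big(\int_I y \, d\mu\Big)^2
   - 2\Big(\int_I y \, d\mu\Big)\Big(\int_I x \, d\mu - \int_I y \, d\mu\Big) \\
&= \Big(\int_I x \, d\mu - \int_I y \, d\mu\Big)^2
 = \Big(\int_I (x - y) \, d\mu\Big)^2 .
\end{align*}
```

Under this reading, the divergence depends on x and y only through the difference of their integrals with respect to μ.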

A similar idea is employed in Chebana and Ouarda (2011) to measure errors between multivariate quantile curves applied to hydrological variables.

b. Projection-based curve clustering

Auder and Fischer (2012) proposed an approach for curve classification by reducing the infinite dimension based on projecting the curves onto a finite lower-dimensional space. Then, k-means classification is performed on the first p coefficients of the corresponding basis projection.

Since the k obtained classes may depend on the basis choice, several projection bases are used in practice, such as Fourier, Haar, and functional PCA. Another basis, the best-entropy basis, was proposed by Auder and Fischer (2012).

Given the centers $c_1, \ldots, c_k$ of the k classes, the goal of this method of classification is to find the basis minimizing the following distortion:
$$ \frac{1}{n} \sum_{i=1}^{n} \min_{l = 1, \ldots, k} \left\| \Pi_p(\chi_i) - c_l \right\|^2, \tag{2} $$
where $\Pi_p$ is the orthogonal projection on $\mathbb{R}^p$. To this end, according to different values of the projection dimension p, the k-means algorithm is implemented on the projected coefficients resulting from the above basis. The selected classification is the one where the projection dimension p and the basis minimize distortion (2).
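
The projection step and the distortion it feeds can be sketched as follows. This is an illustration only, under clearly stated assumptions: an orthonormal discrete cosine basis is swapped in for the Fourier, Haar, functional PCA, or best-entropy bases named above, curves are equally sampled lists, and the function names `cosine_basis`, `project`, and `distortion` are hypothetical.

```python
import math

def cosine_basis(T, p):
    """First p vectors of an orthonormal discrete cosine basis on T points."""
    basis = []
    for m in range(p):
        scale = math.sqrt((1.0 if m == 0 else 2.0) / T)
        basis.append([scale * math.cos(math.pi * (j + 0.5) * m / T)
                      for j in range(T)])
    return basis

def project(curve, basis):
    """Coefficients of the orthogonal projection of a sampled curve."""
    return [sum(b * x for b, x in zip(vec, curve)) for vec in basis]

def distortion(curves, centers, basis):
    """Mean squared distance from each projected curve to its nearest center."""
    total = 0.0
    for curve in curves:
        coef = project(curve, basis)
        total += min(sum((a - c) ** 2 for a, c in zip(coef, center))
                     for center in centers)
    return total / len(curves)
```

In a fuller version, the k-means algorithm would be run on the coefficient vectors for each candidate basis and each p, and the combination with the smallest distortion would be retained.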

c. Silhouette index

In the application section below, the functional approaches presented above are applied to real world data and a comparison with a multidimensional approach (see section 2e) is carried out. For a quantitative comparison of the different classification approaches, the silhouette index is considered (see Kaufman and Rousseeuw 1990, chapter 2). It is presented briefly here.

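For a given classification, and for each object $x_i$ of a dataset $\{x_1, \ldots, x_n\}$, the corresponding silhouette index $s(x_i)$ is defined as
$$ s(x_i) = \frac{b(x_i) - a(x_i)}{\max\{a(x_i), b(x_i)\}}, $$
where, for a given $x_i$ belonging to a class A and a distance $d$,
$$ a(x_i) = \frac{1}{|A| - 1} \sum_{x_j \in A,\, j \neq i} d(x_i, x_j), \qquad b(x_i) = \min_{C \neq A} \frac{1}{|C|} \sum_{x_j \in C} d(x_i, x_j), $$
for all classes C different from A, from the same classification.
Note that Kaufman and Rousseeuw (1990) proposed to use the Euclidean distance for $d$. For each object $x_i$, $s(x_i)$ is between −1 and 1. An observation $x_i$ is considered well classified when $s(x_i)$ is large. Consequently, for each classification, the mean $\bar{s}$ of the $s(x_i)$ is evaluated and the best classification (with compact and well-separated classes) corresponds to the largest $\bar{s}$. In the application section, the silhouette index is computed on the raw data. The following distances were considered for the distance $d$: Euclidean, maximum, Manhattan, Canberra, Minkowski (p = 3), and quadratic bias, which are defined below. For two vectors $u = (u_1, \ldots, u_T)$ and $v = (v_1, \ldots, v_T)$, the considered distances are defined by
$$ \begin{aligned}
d_{\mathrm{Euclidean}}(u, v) &= \Big( \sum_{j=1}^{T} (u_j - v_j)^2 \Big)^{1/2}, &
d_{\mathrm{maximum}}(u, v) &= \max_{1 \le j \le T} |u_j - v_j|, \\
d_{\mathrm{Manhattan}}(u, v) &= \sum_{j=1}^{T} |u_j - v_j|, &
d_{\mathrm{Canberra}}(u, v) &= \sum_{j=1}^{T} \frac{|u_j - v_j|}{|u_j| + |v_j|}, \\
d_{\mathrm{Minkowski}}(u, v) &= \Big( \sum_{j=1}^{T} |u_j - v_j|^p \Big)^{1/p}, &
d_{\mathrm{quadratic\ bias}}(u, v) &= \Big( \sum_{j=1}^{T} (u_j - v_j) \Big)^2 .
\end{aligned} $$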
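
A minimal sketch of the silhouette computation follows, with the distance passed in as a function so that any of the distances listed above can be used. The function names are hypothetical, and the handling of singleton classes (score 0) follows the usual Kaufman and Rousseeuw convention, stated here as an assumption about how the study treated them.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):
    return sum(abs(a - b) for a, b in zip(u, v))

def maximum(u, v):
    return max(abs(a - b) for a, b in zip(u, v))

def silhouette(data, labels, dist=euclidean):
    """Return the list of silhouette values and their mean for a partition."""
    scores = []
    for i, x in enumerate(data):
        by_class = {}
        for j, y in enumerate(data):
            if j != i:
                by_class.setdefault(labels[j], []).append(dist(x, y))
        own = by_class.pop(labels[i], None)
        if own is None or not by_class:    # singleton class or a single class
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                               # within-class average
        b = min(sum(d) / len(d) for d in by_class.values())   # nearest other class
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return scores, sum(scores) / len(scores)
```

Comparing the mean returned for several candidate partitions of the same data gives the quantitative comparison described above.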

Note also that the silhouette criterion is used to compare the results obtained by the different methods. However, it is not claimed that this criterion is optimal. Indeed, as indicated in Ferraty and Vieu (2006), when an unsupervised classification is performed, the user does not know how to validate the obtained partition. Only additional information collected after the analysis can confirm or refute the results. In this study, the goal of the unsupervised classification is to obtain descriptive information rather than claim that the obtained classification is the true one.

d. Location parameters for functional variables

In the application section, figures illustrating centrality curves, namely, mean, mode, and median curves, are presented. These location measures summarize the data and aim to provide a representative element of the sample. In the FDA context, for a set $S$ of curves, we define the mean curve as $\bar{\chi}(t) = \frac{1}{|S|} \sum_{\chi_i \in S} \chi_i(t)$, $t \in \mathcal{T}$, where $|S|$ denotes the number of elements in the set S (see Ramsay and Silverman 2005). An alternative to the sample mean is the functional median that is based on the statistical notion of depth function (see, e.g., Fraiman and Muniz 2001; Febrero et al. 2008). The median curve is the deepest function in the sample S. It maximizes the depth function $D$ and is estimated by $\hat{\chi}_{\mathrm{med}} = \arg\max_{\chi \in S} D(\chi)$, where $\arg\max_{a \in A} g(a)$ stands for the element in the set A that maximizes the function g. In the application section, the following concept of functional depth, proposed by Fraiman and Muniz (2001), is applied. For every $t \in \mathcal{T}$, let $F_{n,t}$ be the empirical distribution of the sample $\chi_1(t), \ldots, \chi_n(t)$ and let $Z_i(t)$ denote the univariate depth of the data $\chi_i(t)$ in this sample, given by
$$ Z_i(t) = 1 - \left| \frac{1}{2} - F_{n,t}\big(\chi_i(t)\big) \right| . $$
Then, define for $i = 1, \ldots, n$,
$$ I_i = \int_{\mathcal{T}} Z_i(t) \, dt, $$
and rank the observations $\chi_i$ according to the values of $I_i$. Thus, the functional median will coincide with the function $\chi_i$ for which $I_i$ is the maximum. Another central characteristic is the modal curve defined as the curve most densely surrounded by the rest of the curves of the dataset. An estimator of the modal curve can be obtained by $\arg\max_{\chi \in S} \hat{f}(\chi)$, where $\hat{f}$ is an estimate of the density f. The following estimator of f, proposed in Cuevas et al. (2006), is considered in the application section. Given a kernel function $K$ and a fixed bandwidth parameter h, $\hat{f}_h$ is defined as
$$ \hat{f}_h(\chi) = \frac{1}{n} \sum_{i=1}^{n} K\!\left( \frac{\| \chi - \chi_i \|}{h} \right) . $$
In this situation, the kernel mode estimator is defined by $\hat{\chi}_{\mathrm{mode}} = \arg\max_{\chi \in S} \hat{f}_h(\chi)$. The reader is also referred to Chebana et al. (2012) for more details about the location parameters for functional variables. In the application section, the R package rainbow [see also Hyndman and Shang (2010)] was used to display the centrality curves of each obtained class.
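
The three location curves can be sketched for equally sampled curves as shown below. This is an illustrative sketch rather than the rainbow implementation: the integral of the univariate depth is approximated by a sum over time points with unit step, the kernel is taken to be Gaussian and the norm Euclidean (both assumptions), and all function names are hypothetical.

```python
import math

def mean_curve(curves):
    """Pointwise mean of the sampled curves."""
    return [sum(vals) / len(curves) for vals in zip(*curves)]

def integrated_depth(curves):
    """Integrated univariate depth of each curve (sum over time points)."""
    n = len(curves)
    depths = [0.0] * n
    for vals in zip(*curves):                          # one time point at a time
        for i, v in enumerate(vals):
            ecdf = sum(1 for w in vals if w <= v) / n  # empirical distribution
            depths[i] += 1.0 - abs(0.5 - ecdf)
    return depths

def median_curve(curves):
    """Deepest curve of the sample."""
    depths = integrated_depth(curves)
    return curves[max(range(len(curves)), key=depths.__getitem__)]

def modal_curve(curves, h):
    """Curve maximizing a kernel density estimate (Gaussian kernel assumed)."""
    def norm(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    def density(x):
        return sum(math.exp(-0.5 * (norm(x, y) / h) ** 2)
                   for y in curves) / len(curves)
    return max(curves, key=density)
```

On a sample containing one unusually large curve, the mean is pulled toward that curve while the median and modal curves remain actual members of the bulk of the sample.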

e. Multidimensional method

For the numerical applications, the results are compared with a multidimensional classification method based on 25 hydrological variables that describe three of the five characteristics of hydrologic types suggested by Richter et al. (1996), namely, magnitude, duration, and rate of change. The 25 indices are as follows: monthly discharges (6 variables), monthly maximum and minimum discharges (12 variables), monthly discharge ratios (5 variables), duration of event (1 variable), and the Julian date of the maximum flow (1 variable). First, a PCA is performed to isolate the first main principal components that sufficiently explain the variance in the data described by the 25 variables. Then, an ascendant HC is applied on scores of these components to identify the different classes.
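
To make the contrast with the functional approach concrete, the sketch below computes a subset of such indices from one spring event of 184 daily values: the six monthly means, the six monthly maxima, the six monthly minima, and the position of the peak. It is an illustration under assumptions, not the authors' code: the discharge ratios and the event duration are omitted because their exact definitions are not given above, the peak position is counted from 1 March rather than as a Julian date, and the function name is hypothetical. The PCA and hierarchical classification steps are not shown.

```python
MONTH_LENGTHS = [31, 30, 31, 30, 31, 31]   # March to August

def monthly_indices(flows):
    """Monthly mean, max, and min discharges plus the day of the peak."""
    if len(flows) != sum(MONTH_LENGTHS):
        raise ValueError("expected 184 daily values (1 March to 31 August)")
    means, maxima, minima = [], [], []
    start = 0
    for length in MONTH_LENGTHS:
        month = flows[start:start + length]
        means.append(sum(month) / length)
        maxima.append(max(month))
        minima.append(min(month))
        start += length
    # Day of the maximum flow, counted from 1 (1 March).
    peak_day = max(range(len(flows)), key=flows.__getitem__) + 1
    return means + maxima + minima + [peak_day]
```

Each event is thus reduced to a short vector of numbers, which is the kind of finite summary that the functional methods avoid.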

3. Application in hydrology

The methods presented in section 2 are adapted to hydrological discharge time series in section 3a, whereas section 3b gives details about the smoothing step. The methods are applied to the case study data from the province of Quebec described in section 3c. The aim of the case study is to illustrate the functional framework with two submethods and provide a comparison between frameworks (functional and classical) rather than select a specific method within a given framework. The comparison is not limited to the results and the performance but also includes considerations that are related to practical issues, information use, subjectivity reduction, and additional insight generation. All these elements are important and specific to hydrological data.

a. Adaptation to discharge time series

We consider flow series recorded for a given station. These data can be recorded at different time scales such as hourly, daily, or monthly. In the following, we focus on daily flow data assumed to be available for n hydrological events. These hydrological events can be, for instance, floods and are denoted by $x_i = (x_i(t_1), \ldots, x_i(t_T))$, $i = 1, \ldots, n$, where T represents the number of days in the period of the year covering the hydrological event, such as T = 365 for the whole year or T = 184 for spring floods. The flow value $x_i(t_j)$ is measured at day j for the ith event. Usually, these discrete observations have the same size T. These observations are converted into smooth functions $\chi_i(t)$ on a continuous period in the interval $[t_1, t_T]$ (see section 3b for details on the smoothing procedure). The obtained function $\chi_i$ constitutes a hydrograph of one event. In this paper, for a given hydrological year, we only consider the spring high-flow event.
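
As a small sketch of this data preparation, the following Python extracts one spring event from a daily record. The window from 1 March to 31 August matches the spring period used in this paper; the mapping from dates to flows and the function names are assumptions made for illustration.

```python
from datetime import date, timedelta

def spring_window(year):
    """Dates from 1 March to 31 August of a given year, inclusive."""
    start, end = date(year, 3, 1), date(year, 8, 31)
    return [start + timedelta(days=d) for d in range((end - start).days + 1)]

def spring_event(daily_flows, year):
    """Pick the spring high-flow values out of a {date: flow} mapping."""
    return [daily_flows[day] for day in spring_window(year)]
```

Because the window starts after February, every year yields the same number of values, T = 184, whether or not it is a leap year.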

b. Smoothing

The first task when studying functional data is to convert the observations into smooth functions $\chi_i(t)$ on a continuous period in the interval $[t_1, t_T]$. When the data series are of good quality and with a long enough record, one can simply interpolate the measurements to obtain the curves. Otherwise, smoothing can be required. However, even in the first case, smoothing can be necessary depending on the objective of the study (e.g., Ramsay and Silverman 2005). The reader is referred to Chebana et al. (2012) for more details concerning this issue in the hydrological context. Ramsay and Silverman (2005) presented two main basis systems for building functions. The first one, the Fourier basis system, is the usual choice for periodic data, while the B-spline basis system is rather used for nonperiodic data [see also Graves et al. (2009)]. Note that in Chebana et al. (2012) daily streamflows were smoothed using Fourier basis functions to obtain annual streamflow curves. However, since the present application considers only spring high-flow events, the curves do not cover the entire year and the Fourier basis appears less suited to smooth these data. Therefore, in this study, data smoothing based on B-spline approximations seems more suitable. Furthermore, the two functional k-means methods used in this study (based on Bregman divergences and based on the projected curves) use a cubic spline interpolation of the given data points. Interpolation is a technique that has aims similar to those of smoothing but where the estimated curves pass through all the given points, and its purpose is to estimate the values of the curves at any position between the known points.
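
The interpolation idea can be illustrated with a short natural cubic spline, written here in plain Python. This is a sketch under assumptions: the "natural" boundary condition (zero second derivative at both ends) is chosen for simplicity and is not stated in the text, and the function name is hypothetical; in practice a library routine would normally be used.

```python
import bisect

def natural_cubic_spline(xs, ys):
    """Return a function interpolating (xs, ys) with a natural cubic spline."""
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = (3 * (ys[i + 1] - ys[i]) / h[i]
                    - 3 * (ys[i] - ys[i - 1]) / h[i - 1])
    # Forward sweep of the tridiagonal system for the quadratic coefficients.
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution gives the coefficients of each cubic piece.
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def curve(t):
        # Locate the piece containing t, clamping to the first or last piece.
        j = min(max(bisect.bisect_right(xs, t) - 1, 0), n - 1)
        dt = t - xs[j]
        return ys[j] + b[j] * dt + c[j] * dt ** 2 + d[j] * dt ** 3
    return curve
```

The returned function passes through every measured point and can be evaluated at any time between two daily measurements, for example `natural_cubic_spline(days, flows)(12.5)`.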

c. Case study

The data series consist of daily flows (m³ s⁻¹) from the Romaine River station, with reference number 02VC001. The area of the drainage basin is 13 000 km². The focus in this study is on spring high-flow events occurring between 1 March and 31 August, that is, over T = 184 days. Indeed, in the province of Quebec, the largest streamflow events are mainly caused by snowmelt during the spring season and can continue until the end of summer. Data are available from 1961 to 2000, which gives n = 40 years of observations. The ith observation denotes the daily flow measurements for the ith year, which are converted to a smooth curve on a continuous interval. This is performed through the B-spline basis system. As indicated in section 2, S represents the whole set of flow curves. Figure 2 presents all the smooth curves obtained for the period studied. These curves have several different shapes; therefore, it is appropriate to classify them. Note that we consider unsupervised classification, since the number of classes k is usually unknown in advance. The following two paragraphs outline and justify some of the choices made when applying the presented methods.

Fig. 2.

Curves, obtained by data interpolation, corresponding to the studied period from 1 Mar to 31 Aug for the Romaine River station.

Citation: Journal of Hydrometeorology 17, 1; 10.1175/JHM-D-14-0200.1

1) Functional k-means using Bregman divergences (denoted KMB).

We tested the four Bregman divergences mentioned above. The quadratic bias (1) was chosen because it leads to the most reasonable classes. Note that a Bregman divergence that equals the measure gives the same classification as the quadratic bias for k = 2, but the classification in three classes resulted in a class composed of only one element. For the k-means approaches, the initial values of k were chosen arbitrarily, and different values were tested before making the final choice.
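The structure of a k-means iteration with a Bregman divergence can be sketched as follows. The sketch relies on the result of Banerjee et al. (2005) that, for any Bregman divergence, the class center minimizing the within-class divergence is the arithmetic mean of the class members, so only the assignment step depends on the divergence. The two divergences shown (squared Euclidean distance and generalized I-divergence) are standard examples chosen for illustration and are not claimed to coincide with the four divergences tested in this study; the function names and the discretized sample curves are hypothetical.

```python
import math


def squared_euclidean(x, c):
    """Bregman divergence generated by the squared norm."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def generalized_i_divergence(x, c):
    """Bregman divergence generated by sum(a * log a); needs positive values."""
    return sum(a * math.log(a / b) - a + b for a, b in zip(x, c))


def bregman_kmeans(curves, init_centers, divergence, n_iter=50):
    """Lloyd-type iterations with a pluggable Bregman divergence."""
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each curve joins the class with the closest center.
        new_labels = [min(range(len(centers)),
                          key=lambda j: divergence(x, centers[j]))
                      for x in curves]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: the optimal center is the mean of the class members.
        for j in range(len(centers)):
            members = [x for x, lab in zip(curves, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical discretized hydrographs (three time steps each).
toy_curves = [[10.0, 50.0, 20.0], [12.0, 55.0, 22.0],
              [30.0, 120.0, 60.0], [28.0, 110.0, 65.0]]
toy_labels, toy_centers = bregman_kmeans(
    toy_curves, [toy_curves[0], toy_curves[2]], squared_euclidean)
print(toy_labels, toy_centers)
```

Swapping `squared_euclidean` for another divergence changes which curves are grouped together without changing the update rule, which is why the choice of divergence can alter the resulting classes.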

2) Functional k-means based on projected curves (denoted KMP).

The k-means algorithm was executed on the projected coefficients for the four bases listed in the second section, namely, best-entropy, Haar, Fourier, and functional PCA basis, with different values of the projection dimension p. The selected classification is the one obtained with projection dimension p and the basis minimizing the distortion given in (2). In our case study, projections based on functional PCA with p = 18 were considered for a classification in two and three classes since this configuration minimizes the distortions.
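The selection procedure described above can be sketched as a loop over candidate projection dimensions. The sketch below uses the Haar basis (one of the four bases listed) because it is simple to construct without external libraries; a functional PCA basis would be handled in the same way once its vectors are available. The empirical distortion is taken here as the mean squared distance between each curve and its nearest reconstructed center, which is an assumption about the form of (2). All function names and the sample curves are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def haar_basis(n):
    """Orthonormal Haar basis of R^n (n a power of two), coarse to fine."""
    basis = [[1.0 / n ** 0.5] * n]
    width = n
    while width > 1:
        half = width // 2
        for start in range(0, n, width):
            v = [0.0] * n
            for t in range(start, start + half):
                v[t] = 1.0 / width ** 0.5
            for t in range(start + half, start + width):
                v[t] = -1.0 / width ** 0.5
            basis.append(v)
        width = half
    return basis


def kmeans(points, centers, n_iter=100):
    """Standard k-means (squared Euclidean distance) on coefficient vectors."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(n_iter):
        new = [min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
               for x in points]
        if new == labels:
            break
        labels = new
        for j in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def projected_kmeans(curves, basis, p, init_idx):
    """Cluster the first p basis coefficients; return (distortion, labels)."""
    coefs = [[dot(x, b) for b in basis[:p]] for x in curves]
    labels, centers = kmeans(coefs, [coefs[i] for i in init_idx])
    # Map each center back to curve space before measuring the distortion.
    recon = [[sum(c * b[t] for c, b in zip(center, basis))
              for t in range(len(curves[0]))] for center in centers]
    distortion = sum(min(sq_dist(x, r) for r in recon)
                     for x in curves) / len(curves)
    return distortion, labels


# Hypothetical curves sampled at four time steps.
curves = [[1.0, 1.0, 5.0, 5.0], [1.0, 1.0, 6.0, 6.0],
          [5.0, 5.0, 1.0, 1.0], [6.0, 6.0, 1.0, 1.0]]
basis = haar_basis(4)
results = {p: projected_kmeans(curves, basis, p, init_idx=(0, 3))
           for p in range(1, 5)}
best_p = min(results, key=lambda p: results[p][0])
print(best_p, results[best_p])
```

In this toy case a single coefficient only captures the mean level of each curve and mixes early and late peaks, whereas adding the second coefficient separates them and lowers the distortion. The same comparison, repeated over several bases, is what the selection of the basis and of p amounts to.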

For more general results, 13 other stations are also considered. They represent pristine basins and were selected as part of the reference hydrometric basin network (RHBN) to help provide an understanding of the physical processes within and account for the impact of climate change across the province of Quebec (Ouarda et al. 1999). These stations are listed in Table 1. The results obtained from the different functional approaches are presented in the following section.

Table 1.

List of stations.

4. Results

The functional approaches are applied to the data presented previously and results are given in section 4a. Results for the Romaine River station, Quebec, are presented in detail. Furthermore, the main results for 13 other stations, across the province of Quebec, are briefly presented. For comparison purposes, a multidimensional approach is also applied to the data and the results are presented in section 4b.

a. Functional classification results

First, the functional k-means method with Bregman divergence is performed on the Romaine River station. The corresponding algorithm is implemented on the set of curves S. The number of k = 2 classes is initially considered and the algorithm leads to the two classes denoted by and , with sizes 35 and 5, respectively. The compositions of these classes are presented in Fig. 3 in terms of occurrence years. Figure 3 presents all the obtained classes by each of the considered methods in terms of time occurrence. For instance, for the method, the year 1961 belongs to the class while the year 1970 belongs to . The sizes of these classes are not of the same order of magnitude since no constraint was imposed in this sense. Therefore, class of five curves could be seen as a class of unusual curves. Figure 4a represents all the curves of the set S of each class with different colors according to class: black curves for and gray curves for . The centrality curves, as defined in section 2d, namely, the mean, median, and modal of each class, are also given in Figs. 4b–d. We note that the mean curves do not properly reflect the structure of the studied curves, since they are not based on observations, unlike the median and modal curves.

Fig. 3.

Composition of classes obtained for the Romaine River station, according to the different methods: KMB, KMP, and M.

Fig. 4.

Curves and centrality curves of each of the two and three classes using the classification by KMB for the Romaine River station: (a),(e) all curves; (b),(f) mean curves; (c),(g) median curves; and (d),(h) modal curves.

The number of classes is then increased to k = 3. The obtained three classes, denoted , , and , are of sizes 12, 4, and 24, respectively, and the corresponding compositions are presented in Fig. 3. Figure 4f illustrates the corresponding three distinct mean curves. In terms of mean, class represents the curves with low peak and short duration whereas class represents the curves with higher peak and longer duration. However, class corresponds to curves with intermediate features between those of and . Figures 4g and 4h show that median and modal curves corresponding to class are smaller than for and .

Second, the functional k-means method with projection-based curve is implemented on the data from the Romaine River station. The initial number of classes is taken to be k = 2. For each basis and different values of the dimension projection p, the empirical distortion given in (2) is computed for the Romaine River station and presented in Table 2. According to Table 2, the distortion is minimal for a projection onto p = 18 functional PCA basis; then, for the classification, we consider projections onto p = 18 functional PCA basis. The size of both obtained classes, denoted and , is 20 and their composition is shown in Fig. 3. Class contains most years before 1981 whereas mainly contains years after 1981. Figure 5b indicates that class is clearly characterized by a mean curve higher than in . This figure shows also a time lag between these mean curves. The spring high-flow event seems to occur one month earlier in class . The feature related to the time lag is also valid for the modal and median curves. Consequently, the spring high-flow event generally becomes less important and occurs earlier from the year 1981.

Table 2.

Distortions for the Romaine River station, corresponding to the classification in two classes. Boldface indicates smallest value.

Fig. 5.

Centrality curves for the two and three classes obtained using the projection-based curve clustering method for the Romaine River station: (a),(e) all curves; (b),(f) mean curves; (c),(g) median curves; and (d),(h) modal curves.

A classification in k = 3 classes, , , and , is also applied by this method. According to Table 3, the distortion is minimal for a projection onto p = 18 functional PCA basis. Consequently, for the classification in three classes, projections onto the p = 18 functional PCA basis are also considered. The composition of these classes of respective sizes 13, 16, and 11 is shown in Fig. 3. The latter shows that mainly represents the years prior to 1975, while mainly represents the years after 1986 and represents the years between 1975 and 1986. According to Figs. 5f–h, class seems to correspond to earlier and smaller hydrographs and corresponds to higher and later hydrographs. Class is the intermediary class between and . Spring streamflow events seem to be evolving toward smaller streamflows that occur earlier.

Table 3.

Distortions for the Romaine River station, corresponding to the classification in three classes. Boldface indicates smallest value.

Finally, these functional classification methods are applied to the 13 other RHBN stations. The results are generally similar to those of the Romaine River station. Because of space limitations, Fig. 6 presents only the mean curves of the classes corresponding to the best silhouette criterion (see the following section for more details).

Fig. 6.

Mean curves obtained using the method providing the best silhouette criterion for each station.

b. Comparison between multidimensional and functional results

The multidimensional classification results (see section 2e) are given in detail for the Romaine River station and are summarized for the RHBN stations. First, for the Romaine River station, a PCA is performed to isolate the first five principal components that explain 81% of the variance in the data described by the 25 variables. Then, an ascendant HC was applied on scores of these five components to identify the different classes. The dendrogram, represented on Fig. 7a, indicates that a classification in two classes is appropriate. Figure 3 illustrates the composition of the two obtained classes and . Class contains 6 years, and the second class contains 34 years. According to Figs. 7b–d, class gathers hydrographs with smallest peak and volume and early start date. On the other hand, class is characterized by highest peak and volume and late start date.
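The ascendant (agglomerative) step of the multidimensional approach can be sketched as below. This is a simplified illustration under several assumptions: the PCA step is omitted and the clustering is run directly on standardized features, average linkage is used (the linkage used in the study is not stated here), and the three features per year (peak, volume, start day) and their values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Center and scale each column so that no variable dominates."""
    cols = list(zip(*rows))
    mu = [mean(c) for c in cols]
    sd = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(r, mu, sd)] for r in rows]


def euclid(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def agglomerative(points, n_clusters):
    """Merge the two closest classes (average linkage) until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        total = sum(euclid(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        _, a, b = min((linkage(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        clusters[a].extend(clusters.pop(b))
    labels = [0] * len(points)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


# Hypothetical yearly features: peak (m3/s), volume (km3), start day of year.
features = [[900.0, 5.0, 120.0], [950.0, 5.2, 125.0],
            [400.0, 2.0, 95.0], [420.0, 2.1, 90.0]]
hc_labels = agglomerative(standardize(features), n_clusters=2)
print(hc_labels)
```

The sketch makes the dependence on the chosen variables explicit: replacing or dropping a column of `features` changes the distances and therefore possibly the classes.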

Fig. 7.

Dendrogram and centrality curves corresponding to the multidimensional ascendant HC, for the Romaine River station: (a) dendrogram, (b) mean curves, (c) median curves, and (d) modal curves.

The silhouette index (introduced in section 2c) is computed on the raw data in order to compare the different classification results obtained using functional and multidimensional approaches. Tables 4 and 5 present silhouette indices corresponding to the different methods. More precisely, Table 4 gives silhouette indices obtained on the Romaine River considering different distances (defined in section 2c) while Table 5 gives silhouette indices obtained on all the studied stations but only considering the Euclidean distance. According to Table 4, the obtained indices are similar, whatever the distance used, except for considering the quadratic bias. Therefore, in the following, we focus on the silhouette indices based on the Euclidean distance. Accordingly, the best classifications for the Romaine River station would be the two and three classes obtained by the KMP method . The corresponding are the highest (0.2748 and 0.2620, respectively). These functional classifications outperform classes obtained by the multidimensional approach M for which is 0.2293. The worst classification results for the Romaine River station are those produced by the KMB approach with = 0.0841 and = 0.1076.
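For reference, the mean silhouette index used in this comparison can be computed as in the following sketch, which follows the usual definition of Kaufman and Rousseeuw (1990): for each observation, a is the mean distance to the other members of its class, b is the smallest mean distance to another class, and the silhouette is (b − a)/max(a, b). The function name, the default Euclidean distance, and the sample points are assumptions for illustration. In line with the convention used in Table 5, the sketch returns no value when a class contains a single observation.

```python
def euclid(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


def mean_silhouette(points, labels, dist=euclid):
    """Mean silhouette index, or None when it cannot be computed."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    # Not computable with fewer than two classes or with a one-element class.
    if len(groups) < 2 or any(len(g) < 2 for g in groups.values()):
        return None
    scores = []
    for i, lab in enumerate(labels):
        own = groups[lab]
        # a: mean distance to the other members of the same class.
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other class.
        b = min(sum(dist(points[i], points[j]) for j in g) / len(g)
                for other, g in groups.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical one-dimensional observations forming two well-separated classes.
sample_points = [[0.0], [1.0], [10.0], [11.0]]
print(mean_silhouette(sample_points, [0, 0, 1, 1]))
```

Values close to 1 indicate compact, well-separated classes, whereas values near 0 indicate overlapping classes. Passing a different `dist` function allows the index to be evaluated under other distances.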

Table 4.

Mean of the silhouette indices , according to different distances, for the Romaine River station. Boldface indicates highest value for each station. and are functional k means with Bregman divergence in two and three classes, respectively; and are functional k means with projection-based curve in two and three classes, respectively; and M is multidimensional hierarchical classification.

Table 5.

Classification results based on the mean of the silhouette indices for the Romaine River and RHBN stations. Boldface indicates highest value for each station. An em dash indicates that the silhouette index cannot be computed (e.g., class with one observation). and are functional k means with Bregman divergence in two and three classes, respectively; and are functional k means with projection-based curve in two and three classes, respectively; and M is multidimensional hierarchical classification.

The multidimensional approach M is also applied to the RHBN stations. The silhouette indices corresponding to the classes obtained by the functional and M methods are given in Table 5. According to this criterion, the M method often gives better results than the KMB method. However, the KMP method leads to better results for all stations except Metabetchouane, where the M method is the best and the index cannot be evaluated for the KMP method. In some cases, the silhouette index also cannot be calculated for the classes obtained using the KMB method because some classes contain a single element. In terms of the silhouette criterion, the KMB method gives the worst results, probably because the proximity measure used (quadratic bias) mainly takes the volume into account. Overall, the conclusions obtained for the RHBN stations are generally similar to those obtained for the Romaine River station.

5. Discussion

For the Romaine River station, the comparison of the functional classification methods with the classical approach (i.e., M) indicates that the first class is included in class . Furthermore, the classification obtained with the M method only takes the dates into account and does not consider the peak height. In fact, class groups years with early events and irregular curves while class includes the 34 other curves without regard for any particular feature, whether the date (e.g., years 1985 and 1994), the height (years 1972 and 1997), or the speed (years 1978 and 1982; see Fig. 8). In contrast, the classification by the KMP method takes into account the starting and ending dates, the peak height, the hydrograph duration, and the shape. Indeed, the KMP method separates the previously cited years into two different classes (see Fig. 8), namely, and . More precisely, class is characterized by a high speed of hydrograph rise, an early start date, and a lower peak, whereas class is characterized by a lower speed of hydrograph rise, a later start date, and a higher peak. A partition into three classes with the KMP method is possible thanks to a gain in homogeneity, but a partition into three classes with the M method seems unreasonable (Fig. 7a).

Fig. 8.

Some curves of class according to class or .

For the Romaine River and the RHBN stations, according to the silhouette criterion (Table 5), the M method performs better than the functional k-means method KMB. This indicates that numerical compression of discharge time series by using appropriate indices or variables can preserve most of the information contained in the data. Indeed, hydrological time series usually show strong internal dependencies and high autocorrelations, which allow a better numerical compression (see Weijs et al. 2013). Therefore, by adequately choosing the indices, a large part of the information can be preserved. On the other hand, given a list of available indices or variables, the choice of which to include in the multidimensional classification is important but subjective, and the number of possible subsets of variables can be large. Thus, the choice of indices directly influences the classification results. In contrast, a functional classification method has the advantage of being automatic, requiring no choice of indices, and it can therefore improve results. This is the case for the KMP approach, which performs better than the M approach in terms of the silhouette criterion. Although some subjective choices remain with functional methods, such as the choice of divergence, they are secondary and less fundamental than variable selection in the multidimensional context. In the latter, the result is directly and significantly related to the way the series and the variables to include are extracted. Other choices are common to both settings, for instance, the initial centers of the k-means algorithm.

Table 6 summarizes results obtained for the Romaine River station. From this table, we can notice that classes , , and contain hydrographs that start late and have a high volume and peak. On the other hand, classes , , and contain hydrographs that start early and have a low peak and volume. In the same way, the result from the 13 RHBN stations indicates that there are two main hydrograph types in Quebec (see Fig. 6). A first hydrograph type is characterized by a large volume, large peak, and late start date, and a second type is characterized by a low volume, low peak, and early start date. In Quebec, high spring runoff is caused by the melting of large quantities of snow. Indeed, during the summer, liquid precipitation contributes directly to surface runoff since precipitation accelerates the snowmelt process. Consequently, when snowmelt occurs late in the summer, it combines with heavy rainfall to lead to extreme events.

Table 6.

Main features of the obtained classes.

From Fig. 3 and Table 6, the appearance frequency of the curves in classes and for the Romaine River station seems to have changed in recent years (approximately since 1981). Indeed, the frequency of late hydrographs has decreased whereas hydrographs that start early and that are characterized by a low peak and volume have become more frequent. This behavior could be explained by climate variability, although the series is too short to confirm this. Indeed, climate has a significant influence on rainfall, river streamflow, and snowmelt. The variability and trends in the climate are influenced by large-scale oceanic and atmospheric oscillations, known as teleconnections (e.g., Hurrell and Van Loon 1997; Rogers 1997). Among the best-known atmospheric oscillations, one can mention El Niño–Southern Oscillation (ENSO). ENSO was shown to have a strong impact on hydroclimatic variables in a number of regions throughout the globe (e.g., Cullen et al. 2002; Nazemosadat et al. 2006; Modarres and Ouarda 2013; Ouachani et al. 2013). These oscillations determine the large-scale atmospheric circulation and can affect the watershed hydrological regime for a given year. Figure 9 shows that the period preceding 1981 was dominated by low phases of ENSO, while the period after 1981 was dominated by high phases of ENSO. This seems to be related to the two classes identified by methods (see Fig. 3). Indeed, events classified before 1981 seem to be mostly characterized by large peaks, large volumes, and late starts, while events classified after 1981 are characterized by low peaks, low volumes, and early starts. Thus, the application developed in this work can be extended by characterizing the obtained classes using climatic oscillation indices and other climatic factors.

Fig. 9.

ENSO time series over the studied period.

6. Summary and concluding remarks

The purpose of the present paper is the classification of streamflow hydrographs using the FDA framework. Two functional classification methods are considered, namely, the k-means method with Bregman divergences (KMB) and the k-means method based on projected curves (KMP). These functional classification methods are presented and adapted to streamflows. Although this work covers the classification of streamflow hydrographs, the presented methodology is general and can therefore be applied to other hydrological events, for example, to the classification of drought curves, storms, and rainfall.

Applications are carried out for hydrological stations from the province of Quebec (Canada), including the Romaine River and 13 other RHBN stations. Functional approaches are compared with a multidimensional method based on 25 extracted hydrological variables that describe the magnitude, duration, and rate of change of a hydrograph. For the Romaine River station, an appropriate functional classification is obtained with the projection-based k-means method in two and three classes. In terms of the silhouette criterion, classification using this method gives better results than classification obtained using the multidimensional method. An advantage of functional approaches is that they cover the whole spring streamflow event rather than only some of its features. For all the RHBN stations considered, the different classification results allow for the identification of two main spring streamflow types. The first is characterized by large volume, large peak, and late start date, and the second is characterized by low volume, low peak, and early start date.

The presented method is applied to streamflow hydrographs in each station separately. Consequently, the different obtained spring streamflow types characterize only the temporal variability of the spring streamflow events at a given station. Since the stream hydrograph is an integration of spatial and temporal variations in water input, the presented method could be adapted and extended in order to account for spatial variability in a regional classification context.

A natural extension to the work presented herein consists of linking the shape classification obtained by FDA to a risk measure (return period for instance). The idea is to derive hydrograph shapes that correspond to different return periods, thus identifying (in a quantitative manner) the types of hydrographs that correspond to highly recurrent events on one side and extremely rare events on the other. This hydrograph classification is very useful for design and management purposes. This is in fact an extension of the concept of flood quantile, except that the focus is no longer on quantiles corresponding to a single variable (e.g., peak) or based on the joint distribution of two or more variables (e.g., peak, volume, and duration), but on quantiles representing the whole hydrograph. Future efforts can also focus on adapting the distance measure to the classification objective (risk analysis, total inflow quantification, etc.) and to the data record length and quality. This can only be achieved through the application of the functional classification method to a number of case studies.

To conclude, functional methods allow for coverage of all the information contained in discharge time series. In fact, classical approaches have dimensionality-related difficulties when the number of hydrograph characteristics increases. These difficulties can be related to computation issues, algorithm definition, error estimation, and subjectivity in the evaluation. This issue is important when one is interested in classifying the whole hydrograph rather than only some of its features.

Acknowledgments

Financial support for this study was graciously provided by the Natural Sciences and Engineering Research Council (NSERC) of Canada and by the Nord-Pas de Calais Regional Council, France. The authors thank the International Relations Ministry of Quebec (Ministère des Relations Internationales, de la Francophonie, et du Commerce Extérieur du Québec) for its financial contribution to this France–Quebec cooperation project. The authors are grateful to Benjamin Auder and Aurélie Fischer for sharing the R codes related to the projection-based curves clustering. The authors wish to thank the editor, Professor Christa D. Peters-Lidard, as well as Professor Nataliya Le Vine and one anonymous reviewer for their useful comments, which led to considerable improvements in the paper.

REFERENCES

  • Abraham, C., Cornillon P. A. , Matzner-Løber E. , and Molinari N. , 2003: Unsupervised curve clustering using B-splines. Scand. J. Stat., 30, 581595, doi:10.1111/1467-9469.00350.

    • Search Google Scholar
    • Export Citation
  • Andrews, J. L., and McNicholas P. D. , 2014: Variable selection for clustering and classification. J. Classif., 31, 136153, doi:10.1007/s00357-013-9139-2.

    • Search Google Scholar
    • Export Citation
  • Assani, A. A., and Tardif S. , 2005: Classification, caractérisation et facteurs de variabilité spatiale des régimes hydrologiques naturels au Québec (Canada): Approche éco-géographique. Rev. Sci. Eau, 18 (2), 247266, doi:10.7202/705559ar.

    • Search Google Scholar
    • Export Citation
  • Auder, B., and Fischer A. , 2012: Projection-based curve clustering. J. Stat. Comput. Simul., 82, 11451168, doi:10.1080/00949655.2011.572882.

    • Search Google Scholar
    • Export Citation
  • Banerjee, A., Merugu S. , Dhillon I. S. , and Ghosh J. , 2005: Clustering with Bregman divergences. J. Mach. Learn. Res., 6, 17051749.

  • Belmar, O., Velasco J. , and Martinez-Capel F. , 2011: Hydrological classification of natural flow regimes to support environmental flow assessments in intensively regulated Mediterranean rivers, Segura River basin (Spain). Environ. Manage., 47, 9921004, doi:10.1007/s00267-011-9661-0.

    • Search Google Scholar
    • Export Citation
  • Bower, D., Hannah D. M. , and McGregor G. R. , 2004: Techniques for assessing the climatic sensitivity of river flow regimes. Hydrol. Processes, 18, 25152543, doi:10.1002/hyp.1479.

    • Search Google Scholar
    • Export Citation
  • Bregman, L. M., 1967: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys., 7, 200217, doi:10.1016/0041-5553(67)90040-7.

    • Search Google Scholar
    • Export Citation
  • Cadre, B., and Paris Q. , 2012: On Hölder fields clustering. Test, 21, 301316, doi:10.1007/s11749-011-0244-4.

  • Chebana, F., and Ouarda T. B. M. J. , 2011: Multivariate quantiles in hydrological frequency analysis. Environmetrics, 22, 6378, doi:10.1002/env.1027.

    • Search Google Scholar
    • Export Citation
  • Chebana, F., Dabo-Niang S. , and Ouarda T. B. M. J. , 2012: Exploratory functional flood frequency analysis and outlier detection. Water Resour.Res., 48, W04514, doi:10.1029/2011WR011040.

  • Cuevas, A., Febrero M. , and Fraiman R. , 2006: On the use of the bootstrap for estimating functions with functional data. Comput. Stat. Data Anal., 51, 10631074, doi:10.1016/j.csda.2005.10.012.

    • Search Google Scholar
    • Export Citation
  • Cullen, H. M., Kaplan A. , Arkin P. A. , and deMenocal P. B. , 2002: Impact of the North Atlantic Oscillation on Middle Eastern climate and streamflow. Climatic Change, 55, 315338, doi:10.1023/A:1020518305517.

    • Search Google Scholar
    • Export Citation
  • Dabo-Niang, S., Ferraty F. , and Vieu P. , 2006: Mode estimation for functional random variable and its application for curves classification. Far East J. Theor. Stat., 18 (1), 93119.

    • Search Google Scholar
    • Export Citation
  • Dabo-Niang, S., Ferraty F. , and Vieu P. , 2007: On the using of modal curves for radar waveforms classification. Comput. Stat. Data Anal., 51, 48784890, doi:10.1016/j.csda.2006.07.012.

    • Search Google Scholar
    • Export Citation
  • Febrero, M., Galeano P. , and Gonzalez-Manteiga W. , 2008: Outlier detection in functional data by depth measures, with application to identify abnormal NOx levels. Environmetrics, 19, 331345, doi:10.1002/env.878.

    • Search Google Scholar
    • Export Citation
  • Ferraty, F., and Vieu P. , 2006: Nonparametric Functional Data Analysis: Theory and Practice. Springer-Verlag, 260 pp.

  • Fischer, A., 2010: Quantization and clustering with Bregman divergences. J. Multivariate Anal., 101, 22072221, doi:10.1016/j.jmva.2010.05.008.

    • Search Google Scholar
    • Export Citation
  • Fraiman, R., and Muniz G. , 2001: Trimmed means for functional data. Test, 10, 419440, doi:10.1007/BF02595706.

  • Fraiman, R., Justel A. , and Svarc M. , 2008: Selection of variables for cluster analysis and classification rules. J. Amer. Stat. Assoc., 103, 12941303, doi:10.1198/016214508000000544.

    • Search Google Scholar
    • Export Citation
  • Ganora, D., Claps P. , Laio F. , and Viglione A. , 2009: An approach to estimate nonparametric flow duration curves in ungauged basins. Water Resour. Res., 45, W10418, doi:10.1029/2008WR007472.

    • Search Google Scholar
    • Export Citation
  • Graves, S., Hooker G. , and Ramsay J. , 2009: Functional Data Analysis with R and MATLAB. Springer, 202 pp.

  • Hannah, D. M., Smith B. P. G. , Gurnell A. M. , and McGregor G. R. , 2000: An approach to hydrograph classification. Hydrol. Processes, 14, 317338, doi:10.1002/(SICI)1099-1085(20000215)14:2<317::AID-HYP929>3.0.CO;2-T.

    • Search Google Scholar
    • Export Citation
  • Harris, N. M., Gurnell A. M. , Hannah D. M. , and Petts G. E. , 2000: Classification of river regimes: A context for hydroecology. Hydrol. Processes, 14, 28312848, doi:10.1002/1099-1085(200011/12)14:16/17<2831::AID-HYP122>3.0.CO;2-O.

    • Search Google Scholar
    • Export Citation
  • Hartigan, J. A., 1975: Cluster Algorithms. Wiley, 351 pp.

  • Hurrell, J. W., and Van Loon H. , 1997: Decadal variations in climate associated with the North Atlantic Oscillation. Climatic Change, 36, 301326, doi:10.1023/A:1005314315270.

    • Search Google Scholar
    • Export Citation
  • Hyndman, R. J., and Shang H. L. , 2010: Rainbow plots, bagplots, and boxplots for functional data. J. Comput. Graph. Stat., 19, 2945, doi:10.1198/jcgs.2009.08158.

    • Search Google Scholar
    • Export Citation
  • James, G. M., and Sugar C. A. , 2003: Clustering for sparsely sampled functional data. J. Amer. Stat. Assoc., 98, 397408, doi:10.1198/016214503000189.

    • Search Google Scholar
    • Export Citation
  • Kaufman, L., and Rousseeuw P. , 1990: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, 342 pp.

  • Khan, S. S., and Ahmad A. , 2004: Cluster center initialization algorithm for K-means clustering. Pattern Recognit. Lett., 25, 12931302, doi:10.1016/j.patrec.2004.04.007.

    • Search Google Scholar
    • Export Citation
  • Kingston, D. G., Thompson J. R. , and Kite G. , 2011: Uncertainty in climate change projections of discharge for the Mekong River basin. Hydrol. Earth Syst. Sci., 15, 14591471, doi:10.5194/hess-15-1459-2011.

    • Search Google Scholar
    • Export Citation
  • Krzanowski, W. J., and Lai Y. T. , 1988: A criterion for determining the number of groups in a data set using sum-of-squares clustering. Biometrics, 44, 2334, doi:10.2307/2531893.

    • Search Google Scholar
    • Export Citation
  • Milligan, G. W., and Cooper M. C. , 1985: An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50, 159179, doi:10.1007/BF02294245.

    • Search Google Scholar
    • Export Citation
  • Modarres, R., and Ouarda T. B. M. J. , 2013: Testing and modelling the volatility change in ENSO. Atmos.–Ocean, 51, 561570, doi:10.1080/07055900.2013.843054.

    • Search Google Scholar
    • Export Citation
  • Nazemosadat, M. J., Samani N. , Barry D. A. , and Molaii Niko M. , 2006: ENSO forcing on climate change in Iran: Precipitation analysis. Indian J. Sci. Technol., 30 (B4), 555565.

    • Search Google Scholar
    • Export Citation
  • Ouachani, R., Bargaoui Z. , and Ouarda T. B. M. J. , 2013: Power of teleconnection patterns on precipitation and streamflow variability of upper Medjerda basin. Int. J. Climatol., 33, 5876, doi:10.1002/joc.3407.

    • Search Google Scholar
    • Export Citation
  • Ouarda, T. B. M. J., Rasmussen P. F. , Cantin J. F. , Bobée B. , Laurence R. , and Hoang V. D. , 1999: Identification d’un réseau hydrométrique pour le suivi des modifications climatiques dans la province de Québec. Rev. Sci. Eau, 12, 425448, doi:10.7202/705359ar.

  • Pappenberger, F., and K. J. Beven, 2004: Functional classification and evaluation of hydrographs based on multicomponent mapping (Mx). Int. J. River Basin Manage., 2, 89–100, doi:10.1080/15715124.2004.9635224.

  • Ramsay, J. O., and B. W. Silverman, 2005: Functional Data Analysis. 2nd ed. Springer, 428 pp.

  • Richter, B. D., J. V. Baumgartner, J. Powell, and D. P. Braun, 1996: A method for assessing hydrologic alteration within ecosystems. Conserv. Biol., 10, 1163–1174, doi:10.1046/j.1523-1739.1996.10041163.x.

  • Rogers, J. C., 1997: North Atlantic storm track variability and its association to the North Atlantic Oscillation and climate variability of northern Europe. J. Climate, 10, 1635–1647, doi:10.1175/1520-0442(1997)010<1635:NASTVA>2.0.CO;2.

  • Weijs, S. V., N. van de Giesen, and M. B. Parlange, 2013: Data compression to define information content of hydrological time series. Hydrol. Earth Syst. Sci., 17, 3171–3187, doi:10.5194/hess-17-3171-2013.

  • Yue, S., T. B. M. J. Ouarda, B. Bobée, P. Legendre, and P. Bruneau, 2002: Approach for describing statistical properties of flood hydrograph. J. Hydrol. Eng., 7, 147–153, doi:10.1061/(ASCE)1084-0699(2002)7:2(147).

Save
  • Abraham, C., Cornillon P. A. , Matzner-Løber E. , and Molinari N. , 2003: Unsupervised curve clustering using B-splines. Scand. J. Stat., 30, 581595, doi:10.1111/1467-9469.00350.

    • Search Google Scholar
    • Export Citation
  • Andrews, J. L., and McNicholas P. D. , 2014: Variable selection for clustering and classification. J. Classif., 31, 136153, doi:10.1007/s00357-013-9139-2.

    • Search Google Scholar
    • Export Citation
  • Assani, A. A., and Tardif S. , 2005: Classification, caractérisation et facteurs de variabilité spatiale des régimes hydrologiques naturels au Québec (Canada): Approche éco-géographique. Rev. Sci. Eau, 18 (2), 247266, doi:10.7202/705559ar.

    • Search Google Scholar
    • Export Citation
  • Auder, B., and Fischer A. , 2012: Projection-based curve clustering. J. Stat. Comput. Simul., 82, 11451168, doi:10.1080/00949655.2011.572882.

    • Search Google Scholar
    • Export Citation
  • Banerjee, A., Merugu S. , Dhillon I. S. , and Ghosh J. , 2005: Clustering with Bregman divergences. J. Mach. Learn. Res., 6, 17051749.

  • Belmar, O., Velasco J. , and Martinez-Capel F. , 2011: Hydrological classification of natural flow regimes to support environmental flow assessments in intensively regulated Mediterranean rivers, Segura River basin (Spain). Environ. Manage., 47, 9921004, doi:10.1007/s00267-011-9661-0.

    • Search Google Scholar
    • Export Citation
  • Bower, D., Hannah D. M. , and McGregor G. R. , 2004: Techniques for assessing the climatic sensitivity of river flow regimes. Hydrol. Processes, 18, 25152543, doi:10.1002/hyp.1479.

    • Search Google Scholar
    • Export Citation
  • Bregman, L. M., 1967: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Comput. Math. Math. Phys., 7, 200217, doi:10.1016/0041-5553(67)90040-7.

    • Search Google Scholar
    • Export Citation
  • Cadre, B., and Paris Q. , 2012: On Hölder fields clustering. Test, 21, 301316, doi:10.1007/s11749-011-0244-4.

  • Chebana, F., and Ouarda T. B. M. J. , 2011: Multivariate quantiles in hydrological frequency analysis. Environmetrics, 22, 6378, doi:10.1002/env.1027.

    • Search Google Scholar
    • Export Citation
  • Chebana, F., Dabo-Niang S. , and Ouarda T. B. M. J. , 2012: Exploratory functional flood frequency analysis and outlier detection. Water Resour.Res., 48, W04514, doi:10.1029/2011WR011040.

  • Cuevas, A., Febrero M. , and Fraiman R. , 2006: On the use of the bootstrap for estimating functions with functional data. Comput. Stat. Data Anal., 51, 10631074, doi:10.1016/j.csda.2005.10.012.

    • Search Google Scholar
    • Export Citation
  • Cullen, H. M., Kaplan A. , Arkin P. A. , and deMenocal P. B. , 2002: Impact of the North Atlantic Oscillation on Middle Eastern climate and streamflow. Climatic Change, 55, 315338, doi:10.1023/A:1020518305517.

    • Search Google Scholar
    • Export Citation
  • Dabo-Niang, S., Ferraty F. , and Vieu P. , 2006: Mode estimation for functional random variable and its application for curves classification. Far East J. Theor. Stat., 18 (1), 93119.

    • Search Google Scholar
    • Export Citation
  • Dabo-Niang, S., Ferraty F. , and Vieu P. , 2007: On the using of modal curves for radar waveforms classification. Comput. Stat. Data Anal., 51, 48784890, doi:10.1016/j.csda.2006.07.012.

    • Search Google Scholar
    • Export Citation
  • Febrero, M., Galeano P. , and Gonzalez-Manteiga W. , 2008: Outlier detection in functional data by depth measures, with application to identify abnormal NOx levels. Environmetrics, 19, 331345, doi:10.1002/env.878.

    • Search Google Scholar
    • Export Citation
  • Ferraty, F., and Vieu P. , 2006: Nonparametric Functional Data Analysis: Theory and Practice. Springer-Verlag, 260 pp.

  • Fischer, A., 2010: Quantization and clustering with Bregman divergences. J. Multivariate Anal., 101, 22072221, doi:10.1016/j.jmva.2010.05.008.

    • Search Google Scholar
    • Export Citation
  • Fraiman, R., and Muniz G. , 2001: Trimmed means for functional data. Test, 10, 419440, doi:10.1007/BF02595706.

  • Fraiman, R., Justel A. , and Svarc M. , 2008: Selection of variables for cluster analysis and classification rules. J. Amer. Stat. Assoc., 103, 12941303, doi:10.1198/016214508000000544.

    • Search Google Scholar
    • Export Citation
  • Ganora, D., Claps P. , Laio F. , and Viglione A. , 2009: An approach to estimate nonparametric flow duration curves in ungauged basins. Water Resour. Res., 45, W10418, doi:10.1029/2008WR007472.

    • Search Google Scholar
    • Export Citation
  • Graves, S., Hooker G. , and Ramsay J. , 2009: Functional Data Analysis with R and MATLAB. Springer, 202 pp.

  • Hannah, D. M., Smith B. P. G. , Gurnell A. M. , and McGregor G. R. , 2000: An approach to hydrograph classification. Hydrol. Processes, 14, 317338, doi:10.1002/(SICI)1099-1085(20000215)14:2<317::AID-HYP929>3.0.CO;2-T.

    • Search Google Scholar
    • Export Citation
  • Harris, N. M., Gurnell A. M. , Hannah D. M. , and Petts G. E. , 2000: Classification of river regimes: A context for hydroecology. Hydrol. Processes, 14, 28312848, doi:10.1002/1099-1085(200011/12)14:16/17<2831::AID-HYP122>3.0.CO;2-O.

    • Search Google Scholar
    • Export Citation
  • Hartigan, J. A., 1975: Cluster Algorithms. Wiley, 351 pp.

  • Hurrell, J. W., and Van Loon H. , 1997: Decadal variations in climate associated with the North Atlantic Oscillation. Climatic Change, 36, 301326, doi:10.1023/A:1005314315270.

    • Search Google Scholar
    • Export Citation
  • Hyndman, R. J., and Shang H. L. , 2010: Rainbow plots, bagplots, and boxplots for functional data. J. Comput. Graph. Stat., 19, 2945, doi:10.1198/jcgs.2009.08158.

    • Search Google Scholar
    • Export Citation
  • James, G. M., and Sugar C. A. , 2003: Clustering for sparsely sampled functional data. J. Amer. Stat. Assoc., 98, 397408, doi:10.1198/016214503000189.

    • Search Google Scholar
    • Export Citation
  • Kaufman, L., and Rousseeuw P. , 1990: Finding Groups in Data: An Introduction to Cluster Analysis. Wiley, 342 pp.

  • Khan, S. S., and Ahmad A. , 2004: Cluster center initialization algorithm for K-means clustering. Pattern Recognit. Lett., 25, 12931302, doi:10.1016/j.patrec.2004.04.007.

    • Search Google Scholar
    • Export Citation
  • Kingston, D. G., Thompson J. R. , and Kite G. , 2011: Uncertainty in climate change projections of discharge for the Mekong River basin. Hydrol. Earth Syst. Sci., 15, 14591471, doi:10.5194/hess-15-1459-2011.

    • Search Google Scholar
    • Export Citation
  • Krzanowski, W. J., and Lai Y. T. , 1988: A criterion for determining the number of groups in a data set using sum-of-squares clustering. Biometrics, 44, 2334, doi:10.2307/2531893.

    • Search Google Scholar
    • Export Citation
  • Milligan, G. W., and Cooper M. C. , 1985: An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50, 159179, doi:10.1007/BF02294245.

    • Search Google Scholar
    • Export Citation
  • Modarres, R., and Ouarda T. B. M. J. , 2013: Testing and modelling the volatility change in ENSO. Atmos.–Ocean, 51, 561570, doi:10.1080/07055900.2013.843054.

    • Search Google Scholar
    • Export Citation
  • Nazemosadat, M. J., Samani N. , Barry D. A. , and Molaii Niko M. , 2006: ENSO forcing on climate change in Iran: Precipitation analysis. Indian J. Sci. Technol., 30 (B4), 555565.

    • Search Google Scholar
    • Export Citation
  • Ouachani, R., Bargaoui Z. , and Ouarda T. B. M. J. , 2013: Power of teleconnection patterns on precipitation and streamflow variability of upper Medjerda basin. Int. J. Climatol., 33, 5876, doi:10.1002/joc.3407.

    • Search Google Scholar
    • Export Citation
  • Ouarda, T. B. M. J., Rasmussen P. F. , Cantin J. F. , Bobée B. , Laurence R. , and Hoang V. D. , 1999: Identification d’un réseau hydrométrique pour le suivi des modifications climatiques dans la province de Québec. Rev. Sci. Eau, 12, 425448, doi:10.7202/705359ar.

    • Search Google Scholar
    • Export Citation
  • Pappenberger, F., and Beven K. J. , 2004: Functional classification and evaluation of hydrographs based on multicomponent mapping (Mx). Int. J. River Basin Manage., 2, 89100, doi:10.1080/15715124.2004.9635224.

    • Search Google Scholar
    • Export Citation
  • Ramsay, J. O., and Silverman B. W. , 2005: Functional Data Analysis. 2nd ed. Springer, 428 pp.

  • Richter, B. D., Baumgartner J. V. , Powell J. , and Braun D. P. , 1996: A method for assessing hydrologic alteration within ecosystems. Conserv. Biol., 10, 11631174, doi:10.1046/j.1523-1739.1996.10041163.x.

    • Search Google Scholar
    • Export Citation
  • Rogers, J. C., 1997: North Atlantic storm track variability and its association to the North Atlantic Oscillation and climate variability of northern Europe. J. Climate, 10, 16351647, doi:10.1175/1520-0442(1997)010<1635:NASTVA>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Weijs, S. V., van de Giesen N. , and Parlange M. B. , 2013: Data compression to define information content of hydrological time series. Hydrol. Earth Syst. Sci., 17, 31713187, doi:10.5194/hess-17-3171-2013.

    • Search Google Scholar
    • Export Citation
  • Yue, S., Ouarda T. B. M. J. , Bobée B. , Legendre P. , and Bruneau P. , 2002: Approach for describing statistical properties of flood hydrograph. J. Hydrol. Eng., 7, 147153, doi:10.1061/(ASCE)1084-0699(2002)7:2(147).

    • Search Google Scholar
    • Export Citation
  • Fig. 1.

    (a) Examples of flood hydrographs characterized by different flood types. Variables V, Q, and D denote the volume, the peak, and the duration, respectively. (b) Examples of flood hydrographs characterized by different behaviors.

  • Fig. 2.

    Curves, obtained by data interpolation, corresponding to the studied period from 1 Mar to 31 Aug for the Romaine River station.

  • Fig. 3.

    Composition of classes obtained for the Romaine River station, according to the different methods: KMB, KMP, and M.

  • Fig. 4.

    Curves and centrality curves of each of the two and three classes using the classification by KMB for the Romaine River station: (a),(e) all curves; (b),(f) mean curves; (c),(g) median curves; and (d),(h) modal curves.

  • Fig. 5.

    Centrality curves for the two and three classes obtained using the projection-based curve clustering method for the Romaine River station: (a),(e) all curves; (b),(f) mean curves; (c),(g) median curves; and (d),(h) modal curves.

  • Fig. 6.

    Mean curves obtained using the method providing the best silhouette criterion for each station.

  • Fig. 7.

    Dendrogram and centrality curves corresponding to the multidimensional ascendant HC, for the Romaine River station: (a) dendrogram, (b) mean curves, (c) median curves, and (d) modal curves.

  • Fig. 8.

    Some curves of class according to class or .

  • Fig. 9.

    ENSO time series over the studied period.
