Notwithstanding the rich record of hydrometric observations compiled by the U.S. Geological Survey (USGS) across the contiguous United States (CONUS), flood event catalogs are sparse and incomplete. Available databases or inventories are mostly survey- or report-based, impact oriented, or limited to flash floods. These data do not represent the full range of flood events occurring in CONUS in terms of geographical locations, severity, triggering weather, or basin morphometry. This study describes a comprehensive dataset consisting of more than half a million flood events extracted from 6,301 USGS flow records and radar-rainfall fields from 2002 to 2013, using the characteristic point method. The database features event duration; first- (mass center) and second- (spreading) order moments of both precipitation and flow, flow peak and percentile, event runoff coefficient, base flow, and information on the basin geomorphology. It can support flood modeling, geomorphological and geophysical impact studies, and instantaneous unit hydrograph and risk analyses, among other investigations. Preliminary data analysis conducted in this study shows that the spatial pattern of flood events affected by snowmelt correlates well with the mean annual snowfall accumulation pattern across CONUS, the basin morphometry affects the number of flood events and peak flows, and the concentration time and spreadness of the flood events can be related to the precipitation first- and second-order moments.
Data from thousands of U.S. Geological Survey (USGS) stream gauges and radar-rainfall estimates over the United States for a 12-yr period are used to retrieve over a half-million flood events and document their spatiotemporal precipitation and flow characteristics.
Flood events that appear as overflow from water bodies represent hydrological responses of basins to precipitation accumulation from storms. A comprehensive database of flood events is vital for studying this hydrological behavior at catchment scale and for analyzing the occurrence and impact of hydrological hazards, yet one is not available. Survey- and report-based flood catalogs are limited in terms of the number of recorded events (Adhikari et al. 2010; Calianno et al. 2013; Diakakis et al. 2012; Du et al. 2015; Gourley et al. 2010; Santos et al. 2015) and could be impractical in regions exhibiting less frequent weather hazards. Threshold-based approaches (Gourley et al. 2013) are restricted to gauging locations with available flood thresholds. Such thresholds are difficult to define across basins of different sizes because the term “overflow” varies with time and location. Existing databases, such as the Emergency Disasters Database (EM-DAT; www.emdat.be), the International Flood Network (IFNET; www.internationalfloodnetwork.org), the impact-categorized (United States) flash flood reports (Calianno et al. 2013), the European flash floods (Gaume et al. 2009), and a U.S. unified flash flood database (http://blog.nssl.noaa.gov/flash/database/) (Gourley et al. 2013) have primarily focused on flash floods or major floods noticeable by their impact. Consequently, records in these databases represent a subset of the different flood events that can cause hydrological hazards. Other recent flood databases have only recorded flow time series and/or annual peak values without identifying the flood events (Hall et al. 2015). Furthermore, although the timing and location of floods are available in some of the existing databases, the triggering precipitation characteristics are seldom archived.
In this study, we report the development of a new comprehensive database of flood events over the contiguous United States (CONUS), identified from precipitation and flow records using the characteristic points method (CPM) (Mei and Anagnostou 2015). Besides being fully automatic because of its physical basis, CPM requires only flow and rainfall time series and does not depend on user-defined thresholds or calibration. Furthermore, using the available information on triggering basin-average precipitation, we have computed multiple descriptors (summarized in Table 1) for each event, including runoff coefficient, base flow index, and first- and second-order moments of both precipitation and flow (Zoccatelli et al. 2011)—parameters that do not exist in current flooding catalogs. These descriptors broaden the applicability of this database to varying flood studies, including hydrological modeling (Jayakrishnan et al. 2005; Park and Markus 2014; Shen et al. 2016a), flood risk analyses (Apel et al. 2009), and geomorphological and geophysical impact analyses (Costa 1987; Xu et al. 2004).
DATA AND METHODOLOGY.
We used CPM as the kernel identifier of the flood events. In this study we introduced minor modifications to the method to improve the significance of events and the completeness of associated precipitation. The two input datasets to the CPM were the U.S. Geological Survey (USGS) stream flows (IF) and the National Stage IV multisensor precipitation analyses (Stage IV) products (Klazura and Imy 1993). We used multiscale flow direction (FDR) and accumulation (FAC) maps (Lehner et al. 2006; Wu et al. 2011) to segment basin regions and calculate the spatial moments of precipitation. The computational steps (also depicted in Fig. 1) were as follows:
1) The USGS flow time series, at intervals from 1 min to 1 h, was shifted from the local time zone to coordinated universal time (UTC) and then aggregated to hourly values to match the Stage IV–based basin-average precipitation time series.
2) The flow time series from step 1 was input to the CPM to perform base flow separation and then to identify flood events by matching the necessary characteristics of an event. Identified events with a peak value below the 80th percentile of the entire flow series were considered insignificant and filtered out.
3) Stage IV precipitation fields were used to generate basin-average precipitation time series. The basin region was segmented using the traditional watershed algorithm, which requires an FDR map and the location of the basin outlet. To balance computational cost and accuracy, FDR maps of variable resolution (30ʹ, 1/16°, or 1/8°) were selected based on the drainage area. For this study, we selected a coarser resolution when the basin had more than 1,000 grid cells at a given resolution. Before the segmentation, gauge locations were snapped to the river network. We searched for the grid cell whose FAC (which equals the drainage area) matched the drainage area reported with the USGS gauge information, within three pixels of the gauge location.
4) The original rainfall association module of the CPM (Mei and Anagnostou 2015) was modified to improve the accuracy of the start time of triggering precipitation. Specifically, when CPM could not find a sufficient precipitation amount to associate with a given flood event, we tripled the value of the drainage area–derived searching period [LSP; Eq. (6) in Mei and Anagnostou (2015)] to identify the triggering precipitation. The start time of precipitation was defined as the latest time, tpb, in the basin-average precipitation time series for which the precipitation accumulation between tpb and the start time of the flood, tfb, is at least twice the total basin outflow accumulation of the flood event.
5) Based on the event precipitation and flow time series, we calculated multiple descriptors of the flood event, listed in Table 1.
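The start-time rule for triggering precipitation described in the steps above can be illustrated with a minimal Python sketch. It is not the CPM implementation: the function name, the hourly list representation, the expression of both precipitation and outflow as basin-average depths in mm, and the choice to include the hour tfb in the accumulation are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def find_precip_start(
    precip_mm: Sequence[float],
    t_fb: int,
    outflow_mm: float,
    search_hours: int,
    factor: float = 2.0,
) -> Optional[int]:
    """Return the latest hourly index t_pb such that the precipitation
    accumulated over [t_pb, t_fb] is at least `factor` times the event
    outflow, looking back at most `search_hours` hours.

    Hypothetical helper: the inclusive window and the mm units are
    assumptions, not details taken from the CPM.
    """
    target = factor * outflow_mm
    accumulated = 0.0
    earliest = max(0, t_fb - search_hours)
    # Walk backward from the flood start; the first index that reaches
    # the target is the latest admissible start time.
    for t in range(t_fb, earliest - 1, -1):
        accumulated += precip_mm[t]
        if accumulated >= target:
            return t
    return None  # not enough precipitation inside the search window


hourly_precip = [0.0, 5.0, 0.0, 10.0, 2.0, 0.0]
start = find_precip_start(hourly_precip, t_fb=5, outflow_mm=5.0, search_hours=5)
```

When the function returns `None`, a caller could repeat the search with `search_hours` tripled, mirroring the modification described in step 4.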
We extracted 542,092 flood events from January 2002 to August 2013, applying the above procedure to flow records from 6,301 USGS hydrometric stations in the CONUS area. We discarded 762 stations whose records were incomplete or contained back flow, or whose locations we could not snap to streams on the geographical map. Users can mine this database using different criteria, for example, peak value exceeding the 95th percentile of the peak flow record, event duration, or drainage area range. Figure 2 provides an example of extracted flood events at USGS gauge 03007800, while Fig. 3 gives the overall distributions of selected flood characteristics. In Fig. 3a, the number of events is shown to decrease exponentially as a function of the runoff coefficient and to have a nearly parabolic distribution against the base flow index, with the maximal occurrence at 0.42. Figure 3b shows the duration distribution of triggering precipitation, with a median of 212.7 h. Precipitation lasting less than 6 h triggered 15,218 events, which are thus classified as flash floods (www.srh.noaa.gov/mrx/hydro/flooddef.php). Limited by the spatiotemporal resolution (4 km, hourly) of the Stage IV data, very short duration flash floods (lasting a few hours) associated with small-scale basins are not represented well by the spatial precipitation moments in this database.
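As an illustration of this kind of mining, the following Python sketch filters event records by peak percentile, duration, and drainage area. The record layout and the field names (`peak_percentile`, `duration_h`, `drainage_area_km2`) are hypothetical and are not the actual schema of the database, and the three sample records are invented for the example.

```python
from typing import Dict, List, Optional, Tuple

Event = Dict[str, float]  # hypothetical record layout, not the database schema


def select_events(
    events: List[Event],
    min_peak_percentile: Optional[float] = None,
    max_duration_h: Optional[float] = None,
    area_range_km2: Optional[Tuple[float, float]] = None,
) -> List[Event]:
    """Keep events that satisfy every criterion that is not None."""
    selected = []
    for ev in events:
        # Peak must strictly exceed the requested percentile.
        if min_peak_percentile is not None and ev["peak_percentile"] <= min_peak_percentile:
            continue
        if max_duration_h is not None and ev["duration_h"] > max_duration_h:
            continue
        if area_range_km2 is not None:
            low, high = area_range_km2
            if not low <= ev["drainage_area_km2"] <= high:
                continue
        selected.append(ev)
    return selected


# Invented sample records for demonstration only.
sample = [
    {"peak_percentile": 99.0, "duration_h": 40.0, "drainage_area_km2": 120.0},
    {"peak_percentile": 90.0, "duration_h": 12.0, "drainage_area_km2": 80.0},
    {"peak_percentile": 97.5, "duration_h": 300.0, "drainage_area_km2": 5000.0},
]
large_peaks = select_events(sample, min_peak_percentile=95.0)
mid_basins = select_events(sample, min_peak_percentile=95.0, area_range_km2=(100.0, 1000.0))
```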
Categorization of events facilitates different levels of flood studies. We categorized all events in our database into three classes by evaluating the runoff coefficient R, and the lag between the start time of flow and that of triggering precipitation, tlag, of each event. If the matched precipitation provided enough fast flow (i.e., R < 1), and the causal relationship between triggering precipitation and flood event held (i.e., tlag > 0), we labeled the event as category 3; if only the first condition was satisfied, we labeled it as category 2; if neither condition held, we labeled it as category 1. The timing error in category 2 and both the volume and timing errors in category 1 came from snowmelt contribution and/or data error. For studies at quantitative, qualitative, and basic levels, we correspondingly recommend using events of only category 3, categories 2 and 3, and all categories, as demonstrated below.
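The categorization rule can be written as a short Python sketch. The function and variable names are hypothetical, and tlag is assumed to be positive when the triggering precipitation starts before the flow. The text does not state how an event with R ≥ 1 but tlag > 0 is labeled; the sketch assigns it to category 1 as an assumption.

```python
from typing import Tuple


def categorize_event(runoff_coefficient: float, t_lag_h: float) -> int:
    """Assign the event category (3, 2, or 1) from R and tlag."""
    volume_ok = runoff_coefficient < 1.0  # matched precipitation covers the fast flow
    timing_ok = t_lag_h > 0.0             # precipitation starts before the flood
    if volume_ok and timing_ok:
        return 3
    if volume_ok:
        return 2
    # Assumption: any event failing the volume condition goes to category 1,
    # including the case R >= 1 with tlag > 0, which the text does not cover.
    return 1


def categories_for_study(level: str) -> Tuple[int, ...]:
    """Recommended categories for each study level named in the text."""
    recommended = {
        "quantitative": (3,),
        "qualitative": (2, 3),
        "basic": (1, 2, 3),
    }
    return recommended[level]
```

For example, `categorize_event(0.4, 6.0)` yields category 3, while an event with R = 1.3 falls into category 1 regardless of its lag.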
In Fig. 4a, rivers where melting snow makes a significant contribution to floods are identified by evaluating the R value of events in category 1. An R greater than 1.2 indicates a shortage of at least 20% in the associated precipitation. The shortage may arise because the snowmelt contribution was not considered in the CPM or because of error in the precipitation data. To moderate this ambiguity, we highlighted the gauges that had such events in at least 1 of every 5 years of the available data record. Comparing with Fig. 4b, we note that the spatial pattern of USGS gauge locations with snowmelt-affected events agrees with that of the annual snowfall. It should be noted that the snowfall locations and snowmelt-affected flows are expected to exhibit spatial and temporal lags, depending on the basin sizes and river lengths. A characteristic example is snowmelt-affected stream gauge records in the coastal areas of California and the central and southern plains that exhibit spatial lags relative to the snowfall locations.
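One possible reading of this screening rule is sketched below in Python. The record fields (`year`, `category`, `runoff_coefficient`) and the function name are hypothetical, and interpreting "one in 5 years" as "the number of distinct years containing such an event is at least one-fifth of the record length" is an assumption.

```python
from typing import Dict, List


def is_snowmelt_affected(
    events: List[Dict[str, float]],
    record_years: int,
    r_threshold: float = 1.2,
    years_per_hit: int = 5,
) -> bool:
    """Flag a gauge whose category 1 events with R > r_threshold occur in
    at least one of every `years_per_hit` years of record (assumed reading)."""
    hit_years = {
        ev["year"]
        for ev in events
        if ev["category"] == 1 and ev["runoff_coefficient"] > r_threshold
    }
    # Integer comparison avoids floating-point issues with the 1/5 ratio.
    return len(hit_years) * years_per_hit >= record_years


# Invented records for a gauge with a 10-yr record.
gauge_events = [
    {"year": 2003, "category": 1, "runoff_coefficient": 1.5},
    {"year": 2003, "category": 1, "runoff_coefficient": 1.3},
    {"year": 2008, "category": 1, "runoff_coefficient": 2.0},
    {"year": 2009, "category": 3, "runoff_coefficient": 0.4},
]
flagged = is_snowmelt_affected(gauge_events, record_years=10)
```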
Figure 5 analyzes the dependence of flood event characteristics on basin morphometry. As stated by Costa (1987), the number of flood events decreases markedly as a function of drainage area A and drainage density Dd, and flood peak tends to increase as a function of elongation ratio Re. Figures 5a and 5b show the histogram of the annual number of events with respect to A and Dd, respectively, using events in all categories with a peak value greater than the 90th percentile of the time series. Although the mean annual count of events varies among gauges of similar A because of different climate conditions, A is negatively correlated with the mean annual count of flood events. We have not, however, observed a declining trend of this count with Dd in Fig. 5b. A possible explanation is that A dominates Dd in reducing the probability of flood events. To verify this, Fig. 5c illustrates the count dependence on both factors. For the drainage area bin between 25 and 300 km2, we observe a nearly monotonically declining trend of count with Dd. Other bins do not exhibit such clear trends, since they have a limited dynamic range of Dd values. For similar Dd, the count decreases markedly with drainage area. Note that basins with a total channel length shorter than 1 km or a drainage area less than 24 km2 are not included in Figs. 5b and 5c, owing to resolution limitations in the 1-km geomorphological maps. To evaluate the flood peak dependence on the elongation ratio among basins of varying sizes, the peak flow rate in m3 s−1 was normalized by the drainage area and converted to mm h−1. Figure 5d exhibits the peak flow rate dependence on mean precipitation and Re. At the same level of precipitation, peak flow increases monotonically with Re. Generally, the peak flow rate should increase with mean precipitation, as shown by most parts of Fig. 5d. The column for the highest Pmean bin (31–313 mm h−1) is not shown in Fig. 5d because too few events reach this precipitation level to be statistically representative.
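The area normalization of peak flow amounts to a unit conversion, sketched below in Python with a hypothetical function name. Dividing a discharge in m3 s−1 by an area in km2 gives 10−6 m s−1, and multiplying by 1,000 mm m−1 and 3,600 s h−1 gives a factor of 3.6.

```python
def peak_flow_mm_per_h(peak_m3_per_s: float, drainage_area_km2: float) -> float:
    """Convert a peak discharge (m3/s) to an area-normalized rate (mm/h)."""
    if drainage_area_km2 <= 0.0:
        raise ValueError("drainage area must be positive")
    # (m3/s) / (km2 * 1e6 m2/km2) * 1000 mm/m * 3600 s/h = 3.6 * Q / A
    return 3.6 * peak_m3_per_s / drainage_area_km2


# A 100 m3/s peak over a 360 km2 basin corresponds to about 1 mm/h.
normalized_peak = peak_flow_mm_per_h(100.0, 360.0)
```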
We used events from category 3, where snowmelt contribution and data error are relatively small, to validate the predictability of flood concentration time and spreadness from the spatial moments of precipitation. Considering that precipitation spatial moments were computed from the 4-km Stage IV data, we ruled out basins smaller than 100 km2 for the subsequent analysis to maintain the accuracy of precipitation spatial moments. According to the geomorphological instantaneous unit hydrograph (GIUH) theory (Rigon et al. 2016; Zoccatelli et al. 2011), the centroid and spreadness of a flood event can be predicted by precipitation moments using:
where ν is the effective traveling velocity of a water parcel and other variables are formulated in Table 1; E〈·〉, Var〈·〉, and Cov〈·,·〉 stand for the expectation, variance, and covariance, respectively, of a random variable or variables. Variables TQ, TR, and LS denote the flow concentration time, runoff generation time, and travel distance, respectively. Variables used in Table 1 are described as follows: p(t, Aʹ) represents the precipitation field, with t denoting the time and Aʹ denoting the location; A and tlag stand for drainage area and time lag between the start time of a flood event and the triggering precipitation. Variables τq and τp are the duration of a flood event and its triggering precipitation, respectively. For a given gauged basin, the event-dependent velocity ν is obtained by solving Eq. (1) using training events. The solved ν is then fit by Eq. (3) to include dependences on the mean precipitation and spreadness:
where a and b are basin-specific coefficients that depend on geomorphological and geophysical characteristics, and Pmean is defined in the fifth row of Table 1. Equation (3) indicates that heavier precipitation and narrower spreadness generate a higher energy-gradient line of flow that results in greater water traveling velocity. The predicted E〈TQ〉 and Var〈TQ〉 against observations for all gauged basins are plotted in Figs. 6a–d. We note a minor underestimation, but otherwise a strong agreement, for the E〈TQ〉 predictions; the performance of Var〈TQ〉 predictions is worse, particularly in basins exceeding 1,000 km2. The agreement is also depicted in the density plots of the Pearson correlation coefficients between predicted and observed values shown in Figs. 6e and 6f and in the normalized root-mean-square difference (NRMSD) shown in Figs. 6g and 6h. Overall, the precipitation spatial moments predict the flood concentration time well and correlate with the flood spreadness. It is noted, however, that the simplifying assumptions, which equate total precipitation with direct runoff (ignoring interception, evapotranspiration, and infiltration) and neglect velocity differences between surface flow and interflow and between hillslope and channel, can contribute to error in predicting the flood spreadness (Rigon et al. 2016; Zoccatelli et al. 2011).
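The velocity estimation and moment prediction can be illustrated with a minimal Python sketch. It assumes that the two moment relations take the form commonly used in the cited GIUH literature, E〈TQ〉 = E〈TR〉 + E〈LS〉/ν and Var〈TQ〉 = Var〈TR〉 + Var〈LS〉/ν² + 2Cov〈TR, LS〉/ν. The function names and the sample numbers are hypothetical, and the fit of ν to mean precipitation and spreadness in Eq. (3) is not reproduced here.

```python
from typing import Tuple


def solve_velocity(e_tq: float, e_tr: float, e_ls: float) -> float:
    """Solve the assumed first-moment relation E<TQ> = E<TR> + E<LS>/v for v."""
    travel_time = e_tq - e_tr
    if travel_time <= 0.0:
        raise ValueError("flow centroid must lag the runoff generation centroid")
    return e_ls / travel_time


def predict_flow_moments(
    velocity: float,
    e_tr: float,
    var_tr: float,
    e_ls: float,
    var_ls: float,
    cov_tr_ls: float,
) -> Tuple[float, float]:
    """Predict E<TQ> and Var<TQ> from precipitation moments (assumed forms)."""
    e_tq = e_tr + e_ls / velocity
    var_tq = var_tr + var_ls / velocity**2 + 2.0 * cov_tr_ls / velocity
    return e_tq, var_tq


# Invented training event: times in h, distances in km, so v is in km/h.
v = solve_velocity(e_tq=30.0, e_tr=10.0, e_ls=40.0)
e_tq_pred, var_tq_pred = predict_flow_moments(
    v, e_tr=10.0, var_tr=16.0, e_ls=40.0, var_ls=36.0, cov_tr_ls=4.0
)
```

In this toy case the solved velocity is 2 km h−1, and feeding it back into the assumed relations recovers the 30-h centroid used to derive it.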
This article described a newly derived flood events database for the CONUS area. This database, containing the most flood events and descriptors, is more comprehensive than currently available flood event datasets. A unique aspect of it is the association of the flood events with the triggering precipitation characteristics. The correlation of flood event concentration time and spreadness with the precipitation spatial moments and the initial evaluation of the influence of melting snow on floods consolidate the quality of the database and demonstrate its potential for supporting GIUH applications and flood vulnerability investigations, among many other studies. The article also showed the dependence of the number of flood events and flow peak values on geomorphological characteristics. The confirmation and refinement of existing dependences reveal the possibility of discovering and evaluating more elaborate, multivariate statistical relationships between flood characteristics and basin geomorphological factors.
A limitation of this database primarily comes from the use of Stage IV precipitation data available at hourly intervals and 4-km spatial resolution. Therefore, short-duration (1–4 h) and localized flood events that do not exhibit the complete set of flood characteristics defined in the CPM are not identifiable. Furthermore, the precipitation spatial moments in small watersheds (areas < 100 km2) are less accurate owing to the spatial resolution (∼16 km2) of the precipitation dataset.
This database, which is available to the research community (http://ucwater.engr.uconn.edu/fedb/), can support a number of flood modeling and vulnerability analysis studies. We also expect it to be used jointly with distributed basin morphometric datasets (Shen et al. 2016b) to extend the skills mentioned above to ungauged basins (e.g., predicting the a and b parameters by geomorphological and geophysical features) or with infrastructure and socioeconomic datasets to assess social impacts of floods. We expect to update the database annually over the CONUS area based on newly released USGS streamflow and Stage IV precipitation records. Furthermore, extension of this database to earlier years, incorporation of finer-resolution precipitation analysis, and extension of its coverage globally based on Earth observation datasets are among our future research directions.
The study was supported by the Connecticut Institute for Resilience and Climate Adaptation (CIRCA). The USGS instantaneous flow data from before October 2007 were shared by Dr. Zachery Flamig at the University of Oklahoma via http://flash.ou.edu/USGS/, and the records from after October 2007 were downloaded via http://waterdata.usgs.gov/nwis, the USGS National Water Information System (NWISWeb). The National Stage IV QPE product was downloaded via www.emc.ncep.noaa.gov/mmb/ylin/pcpanl/stage4/, hosted by the National Centers for Environmental Prediction (NCEP), NOAA. This paper was edited by Dr. Lisa Ferraro Parmelee, manager of LFP Editorial Enterprises LLC. The dataset can be downloaded from http://ucwater.engr.uconn.edu/fedb.