Data from thousands of U.S. Geological Survey (USGS) stream gauges and radar-rainfall estimates over the United States for a 12-yr period are used to retrieve over a half-million flood events and document their spatiotemporal precipitation and flow characteristics.
Flood events that appear as overflow from water bodies represent hydrological responses of basins to precipitation accumulation from storms. A comprehensive database of flood events is vital for studying this hydrological behavior at catchment scale and for analyzing the occurrence and impact of hydrological hazards, yet one is not available. Survey- and report-based flood catalogs are limited in terms of the number of recorded events (Adhikari et al. 2010; Calianno et al. 2013; Diakakis et al. 2012; Du et al. 2015; Gourley et al. 2010; Santos et al. 2015) and could be impractical in regions exhibiting less frequent weather hazards. Threshold-based approaches (Gourley et al. 2013) are restricted to gauging locations with available flood thresholds. Such thresholds are difficult to define across basins of different sizes because the term “overflow” varies with time and location. Existing databases, such as the Emergency Disasters Database (EM-DAT; www.emdat.be), the International Flood Network (IFNET; www.internationalfloodnetwork.org), the impact-categorized (United States) flash flood reports (Calianno et al. 2013), the European flash floods (Gaume et al. 2009), and a U.S. unified flash flood database (http://blog.nssl.noaa.gov/flash/database/) (Gourley et al. 2013) have primarily focused on flash floods or major floods noticeable by their impact. Consequently, records in these databases represent a subset of the different flood events that can cause hydrological hazards. Other recent flood databases have only recorded flow time series and/or annual peak values without identifying the flood events (Hall et al. 2015). Furthermore, although the timing and location of floods are available in some of the existing databases, the triggering precipitation characteristics are seldom archived.
In this study, we report the development of a new comprehensive database of flood events over the contiguous United States (CONUS), identified from precipitation and flow records using the characteristic points method (CPM) (Mei and Anagnostou 2015). Besides being fully automatic because of its physical basis, CPM requires only flow and rainfall time series and does not depend on user-defined thresholds or calibration. Furthermore, using the available information on triggering basin-average precipitation, we have computed multiple descriptors (summarized in Table 1) for each event, including runoff coefficient, base flow index, and first- and second-order moments of both precipitation and flow (Zoccatelli et al. 2011)—parameters that do not exist in current flooding catalogs. These descriptors broaden the applicability of this database to varying flood studies, including hydrological modeling (Jayakrishnan et al. 2005; Park and Markus 2014; Shen et al. 2016a), flood risk analyses (Apel et al. 2009), and geomorphological and geophysical impact analyses (Costa 1987; Xu et al. 2004).
Table 1. Descriptors of a flood event record. Here “yyyy” stands for the four digits of the year, “mmm” for the three-letter abbreviation of the month, “dd” for the two digits of the day, and “hh” for the two-digit hour.


DATA AND METHODOLOGY.
We used the CPM as the core algorithm for identifying flood events, with minor modifications introduced in this study to improve the significance of the identified events and the completeness of the associated precipitation. The two input datasets to the CPM were the U.S. Geological Survey (USGS) instantaneous streamflow (IF) records and the National Stage IV multisensor precipitation analysis (Stage IV) products (Klazura and Imy 1993). We used multiscale flow direction (FDR) and accumulation (FAC) maps (Lehner et al. 2006; Wu et al. 2011) to segment basin regions and calculate the spatial moments of precipitation. The computational steps (also depicted in Fig. 1) were as follows:
1) The USGS flow time series, recorded at intervals from 1 min to 1 h, was shifted from the local time zone to coordinated universal time (UTC) and then accumulated to an hourly step to match the Stage IV–based basin-average precipitation time series.
2) The flow time series from step 1 was input to the CPM, which performed the base flow separation and then identified flood events by matching the necessary characteristics of an event. Identified events with a peak value below the 80th percentile of the entire flow series were considered insignificant and filtered out.
3) Stage IV precipitation fields were used to generate basin-average precipitation time series. The basin region was segmented using the traditional watershed algorithm, which requires an FDR map and the location of the basin outlet. To balance computation and accuracy, FDR maps of variable resolution (30″, 1/16°, or 1/8°) were selected based on the drainage area. For this study, we moved to a coarser resolution whenever the basin contained more than 1,000 grid cells at the finer resolution. Before the segmentation, gauge locations were snapped to the river network: within three pixels of the gauge location, we searched for the grid cell whose FAC value (which equals the drainage area) matched the drainage area reported by USGS for the gauge.
4) The original rainfall association module of the CPM (Mei and Anagnostou 2015) was modified to improve the accuracy of the start time of triggering precipitation. Specifically, when the CPM could not find a sufficient precipitation amount to associate with a given flood event, we tripled the value of the drainage area–derived searching period [LSP; Eq. (6) in Mei and Anagnostou (2015)] to identify the triggering precipitation. The start time of precipitation was defined as the latest time, tpb, in the basin-average precipitation time series for which the precipitation accumulation between tpb and the start time of the flood, tfb, is at least twice the total basin outflow accumulation of the flood event.
5) Based on the event precipitation and flow time series, we calculated multiple descriptors of the flood event, listed in Table 1.
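The peak-significance filter and the search for the start of triggering precipitation described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the CPM code: the function names, the nearest-rank percentile estimator, the (start, end) event layout, and the half-open accumulation window [tpb, tfb) are all assumptions made for the example, and precipitation and outflow are assumed to be expressed in the same depth units (mm over the basin).

```python
import math


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile; the estimator is an assumption, not taken from the source."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100.0))
    return ordered[rank - 1]


def keep_significant_events(flow, events, pct=80.0):
    """Keep events whose peak is not below the pct-th percentile of the whole flow series.

    `events` holds hypothetical (start, end) index pairs into `flow`, end exclusive.
    """
    threshold = percentile_nearest_rank(flow, pct)
    return [(s, e) for (s, e) in events if max(flow[s:e]) >= threshold]


def find_precip_start(precip_mm, t_fb, outflow_mm, max_lookback_h):
    """Return the latest hourly index t_pb such that the precipitation summed over
    [t_pb, t_fb) is at least twice the event outflow, or None if the search
    window is exhausted first. The half-open window is an assumption.
    """
    target = 2.0 * outflow_mm
    total = 0.0
    earliest = max(0, t_fb - max_lookback_h)
    for t in range(t_fb - 1, earliest - 1, -1):  # walk backward from the flood start
        total += precip_mm[t]
        if total >= target:
            return t
    return None


flow = [1, 1, 2, 1, 1, 9, 3, 1, 4, 5]      # hourly flow, arbitrary units
events = [(1, 4), (4, 8)]                  # two candidate events
significant = keep_significant_events(flow, events)

precip = [0.0, 5.0, 0.0, 3.0, 2.0, 0.0]    # basin-average precipitation, mm per hour
t_pb = find_precip_start(precip, t_fb=5, outflow_mm=2.0, max_lookback_h=5)
```

In this sketch, tripling the searching period when too little precipitation is found would correspond to a second call with three times the original `max_lookback_h`.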

Fig. 1. Flowchart of flood events extraction for each USGS gauge data record.
Citation: Bulletin of the American Meteorological Society 98, 7; 10.1175/BAMS-D-16-0125.1

RESULTS.
We extracted 542,092 flood events from January 2002 to August 2013 by applying the above procedure to flow records from 6,301 USGS hydrometric stations in the CONUS area. We discarded 762 stations whose records were incomplete or contained backflow, or whose locations we could not snap to streams on the geographical map. Users can mine this database using different criteria, for example, a peak value exceeding the 95th percentile of the peak flow record, event duration, or a range of drainage areas. Figure 2 provides an example of flood events extracted at USGS gauge 03007800, while Fig. 3 gives the overall distributions of selected flood characteristics. In Fig. 3a, the number of events decreases exponentially with the runoff coefficient and has a nearly parabolic distribution against the base flow index, with the maximal occurrence at 0.42. Figure 3b shows the duration distribution of triggering precipitation, with a median of 212.7 h. Precipitation lasting less than 6 h triggered 15,218 events, which are thus classified as flash floods (www.srh.noaa.gov/mrx/hydro/flooddef.php). Because of the limited spatiotemporal resolution (4 km, hourly) of the Stage IV data, very short duration flash floods (lasting a few hours) in small basins are not well represented by the spatial precipitation moments in this database.
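As a hedged illustration of how such criteria might be applied, the sketch below filters a small list of event records in Python. The record layout and the field names (`peak_percentile`, `precip_duration_h`, `drainage_area_km2`) are hypothetical and do not reflect the actual file format of the database; the three sample records are invented for the example.

```python
def select_events(events, min_peak_percentile=None,
                  max_precip_duration_h=None, area_range_km2=None):
    """Return the events that satisfy every criterion that is not None."""
    selected = []
    for ev in events:
        # Peak must strictly exceed the requested percentile of the peak flow record.
        if min_peak_percentile is not None and ev["peak_percentile"] <= min_peak_percentile:
            continue
        # Triggering precipitation must last less than the given number of hours.
        if max_precip_duration_h is not None and ev["precip_duration_h"] >= max_precip_duration_h:
            continue
        # Drainage area must fall inside the closed interval [low, high].
        if area_range_km2 is not None:
            low, high = area_range_km2
            if not (low <= ev["drainage_area_km2"] <= high):
                continue
        selected.append(ev)
    return selected


# Invented sample records with hypothetical field names.
sample = [
    {"id": "e1", "peak_percentile": 97.0, "precip_duration_h": 4.0, "drainage_area_km2": 120.0},
    {"id": "e2", "peak_percentile": 91.0, "precip_duration_h": 30.0, "drainage_area_km2": 800.0},
    {"id": "e3", "peak_percentile": 99.0, "precip_duration_h": 5.0, "drainage_area_km2": 2500.0},
]

# Events triggered by precipitation shorter than 6 h (the flash flood criterion in the text).
flash_ids = [ev["id"] for ev in select_events(sample, max_precip_duration_h=6.0)]

# Large peaks in small-to-medium basins.
big_small_ids = [ev["id"] for ev in select_events(
    sample, min_peak_percentile=95.0, area_range_km2=(25.0, 300.0))]
```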

Fig. 2. Extracted flood hydrographs on the flow measurements of USGS gauge 03007800; PA(t) is the basin-averaged precipitation defined in Table 1.

Fig. 3. Distribution of flood event characteristics from the database: (a) base flow index and runoff coefficient and (b) duration of triggering precipitation.
Categorization of events facilitates different levels of flood studies. We categorized all events in our database into three classes by evaluating the runoff coefficient R, and the lag between the start time of flow and that of triggering precipitation, tlag, of each event. If the matched precipitation provided enough fast flow (i.e., R < 1), and the causal relationship between triggering precipitation and flood event held (i.e., tlag > 0), we labeled the event as category 3; if only the first condition was satisfied, we labeled it as category 2; if neither condition held, we labeled it as category 1. The timing error in category 2 and both the volume and timing errors in category 1 came from snowmelt contribution and/or data error. For studies at quantitative, qualitative, and basic levels, we correspondingly recommend using events of only category 3, categories 2 and 3, and all categories, as demonstrated below.
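The categorization rule can be written compactly, as in the following Python sketch. The function names are hypothetical, and one case is not spelled out in the text: an event with R ≥ 1 but tlag > 0. The sketch assumes such events fall in category 1 together with those that fail both conditions.

```python
def categorize_event(runoff_coeff, t_lag_h):
    """Assign a category (3, 2, or 1) from the runoff coefficient R and the lag t_lag.

    t_lag_h is the start time of flow minus the start time of triggering
    precipitation, in hours, so a positive value means precipitation came first.
    """
    volume_ok = runoff_coeff < 1.0   # matched precipitation can supply the fast flow
    timing_ok = t_lag_h > 0.0        # precipitation starts before the flood
    if volume_ok and timing_ok:
        return 3
    if volume_ok:
        return 2
    # Assumption: any event with R >= 1 is placed in category 1, whatever its lag.
    return 1


def categories_for_study(level):
    """Categories recommended in the text for each level of study."""
    recommended = {
        "quantitative": {3},
        "qualitative": {2, 3},
        "basic": {1, 2, 3},
    }
    return recommended[level]


cats = [categorize_event(0.4, 6.0), categorize_event(0.4, -2.0), categorize_event(1.3, -2.0)]
```

A quantitative study would then keep only the events whose category is in `categories_for_study("quantitative")`.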
In Fig. 4a, rivers where melting snow makes a significant contribution to floods are identified by evaluating the R value of events in category 1. An R value greater than 1.2 indicates a shortage of at least 20% in the associated precipitation. The shortage may arise because the snowmelt contribution was not considered in the CPM or because of error in the precipitation data. To moderate this ambiguity, we highlighted the gauges that had such events in at least one in five years of the available data record. Comparing with Fig. 4b, we note that the spatial pattern of USGS gauge locations with snowmelt-affected events agrees with that of the annual snowfall. It should be noted that the snowfall locations and snowmelt-affected flows are expected to exhibit spatial and temporal lags, depending on the basin sizes and river lengths. A characteristic example is the snowmelt-affected stream gauge records in the coastal areas of California and the central and southern plains, which exhibit spatial lags relative to the snowfall locations.
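One possible reading of this screening rule is sketched below in Python. The interpretation is an assumption: a gauge is flagged when the number of distinct years containing at least one category 1 event with R > 1.2 is at least one-fifth of the number of years in its record. The tuple layout, the function name, and the sample values are hypothetical.

```python
def snowmelt_suspect_gauges(events, record_years, r_threshold=1.2, one_in_n_years=5):
    """Flag gauges with category 1 events of R > r_threshold in at least one in n years.

    events: iterable of (gauge_id, year, category, runoff_coeff) tuples (hypothetical layout).
    record_years: dict mapping gauge_id to the number of years in its record.
    """
    years_hit = {}
    for gauge_id, year, category, runoff_coeff in events:
        if category == 1 and runoff_coeff > r_threshold:
            years_hit.setdefault(gauge_id, set()).add(year)
    # Integer comparison avoids floating-point issues with the one-in-n fraction.
    return sorted(
        gauge_id for gauge_id, years in years_hit.items()
        if len(years) * one_in_n_years >= record_years[gauge_id]
    )


# Invented sample: gauge "A" has qualifying events in 2 of 10 years, gauge "B" in 1 of 10.
sample_events = [
    ("A", 2003, 1, 1.5),
    ("A", 2003, 1, 1.3),   # same year, counted once
    ("A", 2007, 1, 1.25),
    ("A", 2009, 3, 0.4),   # category 3, ignored
    ("B", 2005, 1, 1.4),
    ("B", 2006, 1, 1.1),   # R below the threshold, ignored
]
flagged = snowmelt_suspect_gauges(sample_events, {"A": 10, "B": 10})
```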

Fig. 4. Basins with floods to which melting snow contributed: (a) gauges of flood events with snowmelt contributions inferred from the derived database and (b) average annual snowfall [source: National Oceanic and Atmospheric Administration (NOAA)/National Climatic Data Center (NCDC)].
Figure 5 analyzes the dependence of flood event characteristics on basin morphometry. As stated by Costa (1987), the number of flood events decreases markedly with increasing drainage area A and drainage density Dd, and the flood peak tends to increase with the elongation ratio Re. Figures 5a and 5b show histograms of the annual number of events with respect to A and Dd, respectively, using events in all categories with a peak value greater than the 90th percentile of the time series. Although the mean annual count of events varies among gauges of similar A because of differing climate conditions, A is negatively correlated with the mean annual count of flood events. We did not, however, observe a declining trend of this count with Dd in Fig. 5b. A possible explanation is that A dominates Dd in reducing the probability of flood events. To verify this, Fig. 5c illustrates the dependence of the count on both factors. For the drainage area bin between 25 and 300 km², we observe a nearly monotonic decline of the count with Dd. Other bins do not exhibit such clear trends because they cover a limited range of Dd values. For similar Dd, the count decreases markedly with drainage area. Note that basins with a total channel length shorter than 1 km or a drainage area less than 24 km² are not included in Figs. 5b and 5c, owing to resolution limitations of the 1-km geomorphological maps. To evaluate the dependence of flood peak on the elongation ratio among basins of varying sizes, the peak flow rate in m³ s⁻¹ was normalized by the drainage area and converted to mm h⁻¹. Figure 5d shows the dependence of the peak flow rate on mean precipitation and Re. At the same level of precipitation, peak flow increases monotonically with Re. Generally, the peak flow rate should increase with mean precipitation, as shown by most parts of Fig. 5d. The column for the highest Pmean bin (31–313 mm h⁻¹) is not shown in Fig. 5d because the events at this precipitation level are too few to be statistically representative.
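The normalization of peak flow by drainage area is a plain unit conversion, sketched here in Python under the assumption that the area is given in km². Multiplying a flow in m³ s⁻¹ by 3,600 s h⁻¹ gives m³ h⁻¹; dividing by the area in m² (km² times 10⁶) gives m h⁻¹; multiplying by 1,000 gives mm h⁻¹, so the overall factor is 3.6 divided by the area in km². The function name is hypothetical.

```python
def peak_flow_mm_per_h(peak_m3_per_s, drainage_area_km2):
    """Convert a peak flow rate in m^3/s to an area-normalized depth rate in mm/h."""
    if drainage_area_km2 <= 0:
        raise ValueError("drainage area must be positive")
    volume_per_hour_m3 = peak_m3_per_s * 3600.0   # m^3 per hour
    area_m2 = drainage_area_km2 * 1.0e6           # km^2 to m^2
    return volume_per_hour_m3 * 1000.0 / area_m2  # m per hour to mm per hour


# A 100 m^3/s peak over a 360 km^2 basin corresponds to 1 mm/h.
normalized_peak = peak_flow_mm_per_h(100.0, 360.0)
```

Expressed this way, peaks from basins of very different sizes share the units of the mean precipitation rate, which is what allows them to be compared on the same axes.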

Fig. 5. Dependence of flood event characteristics on geomorphological factors: annual average number of events vs (a) drainage area and (b) drainage density, (c) annual number of events vs both drainage area and drainage density, and (d) normalized peak flow vs elongation ratio and normalized precipitation. In (a) and (b), the 25th and 75th percentiles and mean values are outlined, while in (c) and (d) the mean values of gauges/events are rendered to each bin.

Fig. 6. Predictability of (left) centroid and (right) spreadness of flood events from spatial moments of precipitation: (a),(b) two-dimensional intensity maps (number of events) of predicted and observed flood centroid and spreadness hourly grid values in basins with drainage area below 1,000 km²; (c),(d) as in (a),(b), but for basins with drainage area greater than 1,000 km²; (e),(f) density plots of the Pearson correlation coefficients of predicted vs observed flood centroid and spreadness values; (g),(h) as in (e),(f), but for NRMSD.
SUMMARY.
This article described a newly derived flood events database for the CONUS area. With more flood events and more descriptors per event, this database is more comprehensive than currently available flood event datasets. A unique aspect is the association of the flood events with the characteristics of their triggering precipitation. The correlation of flood event concentration time and spreadness with the precipitation spatial moments, together with a first evaluation of the influence of melting snow on floods, supports the quality of the database and demonstrates its potential for geomorphological instantaneous unit hydrograph (GIUH) applications and flood vulnerability investigations, among many other studies. The article also showed the dependence of the number of flood events and flow peak values on geomorphological characteristics. The confirmation and refinement of existing dependences reveal the possibility of discovering and evaluating more elaborate, multivariate statistical relationships between flood characteristics and basin geomorphological factors.
The main limitation of this database stems from the use of Stage IV precipitation data, which are available at hourly intervals and 4-km spatial resolution. As a result, short-duration (1–4 h) and localized flood events that do not exhibit the complete set of flood characteristics defined in the CPM are not identifiable. Furthermore, the precipitation spatial moments in small watersheds (areas < 100 km²) are less accurate owing to the spatial resolution (∼16 km²) of the precipitation dataset.
This database, which is available to the research community (http://ucwater.engr.uconn.edu/fedb/), can support a number of flood modeling and vulnerability analysis studies. We also expect it to be used jointly with distributed basin morphometric datasets (Shen et al. 2016b) to extend the skills mentioned above to ungauged basins (e.g., predicting the a and b parameters by geomorphological and geophysical features) or with infrastructure and socioeconomic datasets to assess social impacts of floods. We expect to update the database annually over the CONUS area based on newly released USGS streamflow and Stage IV precipitation records. Furthermore, extension of this database to earlier years, incorporation of finer-resolution precipitation analysis, and extension of its coverage globally based on Earth observation datasets are among our future research directions.
ACKNOWLEDGMENTS
The study was supported by the Connecticut Institute for Resilience and Climate Adaptation (CIRCA). The USGS instantaneous flow data from before October 2007 were shared by Dr. Zachery Flamig at the University of Oklahoma via http://flash.ou.edu/USGS/, and the records from after October 2007 were downloaded via http://waterdata.usgs.gov/nwis, the USGS National Water Information System (NWISWeb). The National Stage IV QPE product was downloaded via www.emc.ncep.noaa.gov/mmb/ylin/pcpanl/stage4/, hosted by the National Centers for Environmental Prediction (NCEP), NOAA. This paper was edited by Dr. Lisa Ferraro Parmelee, manager of LFP Editorial Enterprises LLC. The dataset can be downloaded from http://ucwater.engr.uconn.edu/fedb.
REFERENCES
Adhikari, P., Y. Hong, K. R. Douglas, D. B. Kirschbaum, J. Gourley, R. Adler, and G. R. Brakenridge, 2010: A digitized global flood inventory (1998–2008): Compilation and preliminary results. Nat. Hazards, 55, 405–422, doi:10.1007/s11069-010-9537-2.
Apel, H., G. Aronica, H. Kreibich, and A. Thieken, 2009: Flood risk analyses—How detailed do we need to be? Nat. Hazards, 49, 79–98, doi:10.1007/s11069-008-9277-8.
Calianno, M., I. Ruin, and J. J. Gourley, 2013: Supplementing flash flood reports with impact classifications. J. Hydrol., 477, 1–16, doi:10.1016/j.jhydrol.2012.09.036.
Costa, J. E., 1987: Hydraulics and basin morphometry of the largest flash floods in the conterminous United States. J. Hydrol., 93, 313–338, doi:10.1016/0022-1694(87)90102-8.
Diakakis, M., S. Mavroulis, and G. Deligiannakis, 2012: Floods in Greece, a statistical and spatial approach. Nat. Hazards, 62, 485–500, doi:10.1007/s11069-012-0090-z.
Du, S., H. Gu, J. Wen, K. Chen, and A. Van Rompaey, 2015: Detecting flood variations in Shanghai over 1949–2009 with Mann-Kendall tests and a newspaper-based database. Water, 7, 1808–1824, doi:10.3390/w7051808.
Gaume, E., and Coauthors, 2009: A compilation of data on European flash floods. J. Hydrol., 367, 70–78, doi:10.1016/j.jhydrol.2008.12.028.
Gourley, J. J., J. M. Erlingis, T. M. Smith, K. L. Ortega, and Y. Hong, 2010: Remote collection and analysis of witness reports on flash floods. J. Hydrol., 394, 53–62, doi:10.1016/j.jhydrol.2010.05.042.
Gourley, J. J., and Coauthors, 2013: A unified flash flood database across the United States. Bull. Amer. Meteor. Soc., 94, 799–805, doi:10.1175/BAMS-D-12-00198.1.
Hall, J., and Coauthors, 2015: A European flood database: Facilitating comprehensive flood research beyond administrative boundaries. Proc. Int. Assoc. Hydrol. Sci., 370, 89–95, doi:10.5194/piahs-370-89-2015.
Jayakrishnan, R., R. Srinivasan, C. Santhi, and J. Arnold, 2005: Advances in the application of the SWAT model for water resources management. Hydrol. Processes, 19, 749–762, doi:10.1002/hyp.5624.
Klazura, G. E., and D. A. Imy, 1993: A description of the initial set of analysis products available from the NEXRAD WSR-88D system. Bull. Amer. Meteor. Soc., 74, 1293–1311, doi:10.1175/1520-0477(1993)074<1293:ADOTIS>2.0.CO;2.
Lehner, B., K. Verdin, and A. Jarvis, 2006: HydroSHEDS technical documentation, version 1.0. U.S. World Wildlife Fund, 27 pp. [Available online at https://hydrosheds.cr.usgs.gov/HydroSHEDS_TechDoc_v10.doc.]
Mei, Y., and E. N. Anagnostou, 2015: A hydrograph separation method based on information from rainfall and runoff records. J. Hydrol., 523, 636–649, doi:10.1016/j.jhydrol.2015.01.083.
Park, D., and M. Markus, 2014: Analysis of a changing hydrologic flood regime using the Variable Infiltration Capacity model. J. Hydrol., 515, 267–280, doi:10.1016/j.jhydrol.2014.05.004.
Rigon, R., M. Bancheri, G. Formetta, and A. de Lavenne, 2016: The geomorphological unit hydrograph from a historical-critical perspective. Earth Surf. Processes Landforms, 41, 27–37, doi:10.1002/esp.3855.
Santos, M., J. Santos, and M. Fragoso, 2015: Historical damaging flood records for 1871–2011 in Northern Portugal and underlying atmospheric forcings. J. Hydrol., 530, 591–603, doi:10.1016/j.jhydrol.2015.10.011.
Shen, X., Y. Hong, K. Zhang, H. Li, and Z. Hao, 2016a: Refining a distributed linear reservoir routing method to improve performance of the CREST model. J. Hydrol. Eng., 22, doi:10.1061/(ASCE)HE.1943-5584.0001442.
Shen, X., H. J. Vergara, E. I. Nikolopoulos, E. N. Anagnostou, Y. Hong, Z. Hao, K. Zhang, and K. Mao, 2016b: GDBC: A tool for generating global-scale distributed basin morphometry. Environ. Modell. Software, 83, 212–223, doi:10.1016/j.envsoft.2016.05.012.
Wu, H., J. S. Kimball, N. Mantua, and J. Stanford, 2011: Automated upscaling of river networks for macroscale hydrological modeling. Water Resour. Res., 47, W03517, doi:10.1029/2009WR008871.
Xu, Y.-G., B. He, S.-L. Chung, M. A. Menzies, and F. A. Frey, 2004: Geologic, geochemical, and geophysical consequences of plume involvement in the Emeishan flood-basalt province. Geology, 32, 917–920, doi:10.1130/G20602.1.
Zoccatelli, D., M. Borga, A. Viglione, G. Chirico, and G. Blöschl, 2011: Spatial moments of catchment rainfall: Rainfall spatial organisation, basin morphology, and flood response. Hydrol. Earth Syst. Sci., 15, 3767–3783, doi:10.5194/hess-15-3767-2011.