Although flash flooding is one of the deadliest and costliest weather-related natural hazards worldwide, the individual datasets that characterize it in the United States are hampered by limited documentation and can be difficult to access. This study is the first of its kind to assemble, reprocess, describe, and disseminate a georeferenced U.S. database providing a long-term, detailed characterization of flash flooding in terms of spatiotemporal behavior and specificity of impacts. The database is composed of three primary sources: 1) the entire archive of automated discharge observations from the U.S. Geological Survey, reprocessed to describe individual flooding events; 2) flash-flooding reports collected by the National Weather Service from 2006 to the present; and 3) witness reports obtained directly from the public in the Severe Hazards Analysis and Verification Experiment during the summers of 2008–10. Each observational data source has limitations; a major asset of the unified flash flood database is that it collates relevant information from a variety of sources and makes it readily available to the community in common formats. It is anticipated that this database will be used for many diverse purposes, such as evaluating tools to predict flash flooding, characterizing seasonal and regional trends, and improving understanding of dominant flood-producing processes. We envision that this community database effort will attract and encompass future datasets.
Flash floods are rapid surface-water responses, over normally dry land, to intense rainfall or to a sudden release of water from a dam break or ice jam; they have significant impacts on transportation, infrastructure, and human safety. Although hydrometeorologists have significantly advanced our understanding of the synoptic and mesoscale environments conducive to intense rainfall in recent decades, the tools forecasters use to predict the hydrologic response and the location, timing, and magnitude of the societal impact have not progressed commensurately. Predictability is determined by the development and refinement of a physical system model and depends strongly on the available data. Here, the key data are a unified observational dataset of flash floods, which holds great potential for advancing the science and predictability of flash flooding. This article serves two purposes: 1) to announce a recently assembled U.S.-wide database on flash flooding, available in multiple common formats to reach a wide range of users, and 2) to encourage others to submit additional, related datasets that can be incorporated into the database.
There is no single source of information that holistically describes flash flooding in the United States. Perhaps the diverse and discontinuous nature of flash-flooding impacts is what makes them difficult to observe and then catalogue in a consistent database. Flash floods differ from other weather-related hazards (e.g., tornadoes, hail) in that their impacts are strongly controlled by surface properties, infrastructure, and social exposure factors. Stream gauges operated and maintained by the U.S. Geological Survey (USGS) benefit from automation and high temporal resolution, producing long-term, continuous records at each gauge site. These instruments require electrical power and communications, road access for regular instrument maintenance, and manual measurements to empirically establish a rating curve (i.e., the relationship between the measured stage and the desired discharge). The costs associated with these requirements (on the order of $10,000 per year per gauge site) mean that automated streamflow measurements are relatively uncommon in the small basins where flash floods occur.
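Rating curves are often approximated by a power law. The sketch below assumes that form, Q = a(h − h0)^b, with a known stage of zero flow h0, and fits a and b to paired manual measurements by least squares in log space. The function names, the power-law form, and the sample values are illustrative assumptions, not the USGS rating procedure (operational ratings may be segmented or shifted over time).

```python
import math
from typing import Sequence, Tuple


def fit_rating_curve(stages_m: Sequence[float], discharges_cms: Sequence[float],
                     h0_m: float) -> Tuple[float, float]:
    """Fit a and b in the assumed power law Q = a * (h - h0) ** b.

    Ordinary least squares on log(Q) versus log(h - h0); h0 (the stage of
    zero flow) is assumed known rather than estimated.
    """
    xs = [math.log(h - h0_m) for h in stages_m]
    ys = [math.log(q) for q in discharges_cms]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log space = exponent b
    a = math.exp(y_mean - b * x_mean)   # intercept in log space = log(a)
    return a, b


def stage_to_discharge(stage_m: float, a: float, b: float, h0_m: float) -> float:
    """Discharge (m^3/s) implied by the fitted rating for a given stage (m)."""
    depth = stage_m - h0_m
    return a * depth ** b if depth > 0 else 0.0  # no flow at or below h0


# Usage with hypothetical, noise-free measurements (stage in m, discharge in m^3/s)
stages = [0.8, 1.2, 1.9, 2.6, 3.4]
discharges = [12.0 * (h - 0.3) ** 1.6 for h in stages]
a, b = fit_rating_curve(stages, discharges, h0_m=0.3)
print(round(a, 3), round(b, 3))  # prints 12.0 1.6
```

With real field measurements the fit would carry scatter, and h0 would itself need to be estimated or surveyed.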
Forecasters at local National Weather Service (NWS) offices throughout the United States routinely collect reports of flash flooding from trained spotters, local authorities, and emergency management officials within their areas of responsibility. These reports are archived in the NWS Storm Events Database, which is essential for evaluating and improving operational forecast products and procedures. Details contained in the reports, such as information about the meteorological environment, a dollar estimate of the damages, and the number of fatalities, have also yielded very useful information for the research community. Limitations of the database include 1) subjective event locations defined by often imprecise bounding polygons; 2) times that often refer to the meteorological event rather than to the flooding impacts; 3) dependence on a person witnessing the event; and 4) little or no information about the site's societal exposure or antecedent conditions.
The third database considered in this study comes from the Severe Hazards Analysis and Verification Experiment (SHAVE), which the National Severe Storms Laboratory (NSSL) conducted across the United States during the summers of 2008–10. Student callers obtained details on the specific type of impact, magnitude, and frequency of flash flooding directly from members of the public who responded to a telephone-based questionnaire. This georeferenced, survey-based approach overcomes some of the imprecision noted with the NWS Storm Events Database, but it relies on reports from the general public, which can be unreliable. Also, NSSL researchers designed the SHAVE data collection methodology to obtain high-resolution, storm-targeted reports, so the dataset does not encompass all events occurring at a given time.
Assembling the three aforementioned datasets into a unified, consistent database does not remove the limitations inherent in each one. However, combining the high-resolution details from SHAVE, the broad spatial coverage and event narratives of the NWS storm reports, and the automated streamflow measurements from the USGS provides a more complete depiction of flash flooding across the United States. The database is freely available to the public at www.nssl.noaa.gov/projects/flash/database.php, and we provide it in three different formats for users who may be interested in quick-and-easy plots, detailed spatial investigations, or statistical analysis of the raw data.
DATA SOURCES AND PROCESSING STEPS.
USGS discharge measurements.
We obtained the USGS archive of instantaneous streamflow data from 10,106 gauges, with records dating from July 1927 through September 2010, in a MySQL database format. For most stations, the USGS instantaneous data begin in the mid-1980s, with data intervals commonly ranging from 5 to 60 min. The public can access this archive directly at http://ida.water.usgs.gov/ida. Although applications such as calibrating and evaluating hydrologic models for water budget studies and water resources management use the time series directly, flood studies require additional processing to determine when flooding events occurred and the magnitude of flooding. NWS offices have defined stage heights associated with stream conditions (bankfull; action; and minor, moderate, and major flooding) for 3,490 stream gauge locations. These thresholds were defined in coordination with the local emergency management and stakeholder community and are based on impacts to lives and/or property. The bankfull stage frequently coincides with the minor flood stage (commonly referred to simply as the flood stage). In more rural areas, however, the flood stage may exceed the bankfull stage because little infrastructure is located close to the streams.
We converted the USGS database into an “event-based” database for flood studies as follows. For each station, we identified all events that exceeded the predefined action stage, which is most often the lowest stage threshold defined by the NWS. According to NWS Directive 10–950 (Hydrologic Services Program Definitions and General Terminology), action stage is “the stage which when reached by a rising stream, lake, or reservoir represents the level where the NWS or a partner/user needs to take some type of mitigation action in preparation for possible significant hydrologic activity” (www.nws.noaa.gov/directives/sym/pd01009050curr.pdf). In total, 98,668 events exceeded action stage at 2,948 of the gauges in the USGS archive (see Fig. 1); 665 of these gauges have catchment areas smaller than 250 km². For each event, we provide the following information: USGS gauge ID, latitude (decimal degrees), longitude (decimal degrees), start time (UTC) at which the flow exceeded the action stage threshold, end time (UTC) when the flow dropped below the threshold, peakflow magnitude (m³ s⁻¹), peak time (UTC) at which the peakflow occurred, and the difference between start time and peak time (in hours).
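The event definition above can be sketched as a single pass over a gauge's time series. This is a minimal illustration, not the processing code used for the database. It assumes the action stage has already been converted to a flow (the station metadata include computed flows for action stage), that discharges arrive in ft³ s⁻¹ as USGS reports them, and that the record has no gaps; the names `FloodEvent`, `extract_events`, and `CFS_TO_CMS` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Sequence, Tuple

CFS_TO_CMS = 0.028316846592  # exact: 1 ft^3 = 0.028316846592 m^3


@dataclass
class FloodEvent:
    start: datetime          # first sample above the action threshold (UTC)
    end: Optional[datetime]  # first sample back at or below it; None if the record ends mid-event
    peak_time: datetime      # time of the largest discharge during the event
    peak_cms: float          # peak discharge (m^3/s)

    @property
    def rise_time_h(self) -> float:
        """Flooding rise time: hours between start time and peak time."""
        return (self.peak_time - self.start).total_seconds() / 3600.0


def extract_events(series: Sequence[Tuple[datetime, float]],
                   action_flow_cms: float) -> List[FloodEvent]:
    """Split a time-ordered series of (UTC time, discharge in ft^3/s) into exceedance events."""
    events: List[FloodEvent] = []
    current: Optional[FloodEvent] = None
    for t, q_cfs in series:
        q = q_cfs * CFS_TO_CMS
        if q > action_flow_cms:
            if current is None:                  # threshold crossed: open a new event
                current = FloodEvent(start=t, end=None, peak_time=t, peak_cms=q)
            elif q > current.peak_cms:           # still above: track the running peak
                current.peak_time, current.peak_cms = t, q
        elif current is not None:                # dropped back: close the event
            current.end = t
            events.append(current)
            current = None
    if current is not None:                      # record ended while above threshold
        events.append(current)
    return events


# Usage with a hypothetical 15-min record and an action-stage flow of 30 m^3/s
t0 = datetime(2010, 6, 1, tzinfo=timezone.utc)
flows_cfs = [100, 500, 2000, 4000, 3000, 900, 100]
series = [(t0 + timedelta(minutes=15 * i), q) for i, q in enumerate(flows_cfs)]
for ev in extract_events(series, action_flow_cms=30.0):
    print(ev.start.isoformat(), ev.end.isoformat(), round(ev.peak_cms, 1), ev.rise_time_h)
# prints 2010-06-01T00:30:00+00:00 2010-06-01T01:15:00+00:00 113.3 0.25
```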
This latter variable, referred to hereafter as the flooding rise time, is a proxy for the time to rise and is plotted against basin catchment area in Fig. 2 for all 98,668 events combined. There is a clear relationship between event “flashiness,” as measured by this proxy, and catchment area. The spread, represented by the gray-shaded interquantile areas, results from differing antecedent conditions, variable basin geomorphologies that affect responses to rainfall, and events in which rainfall fell near the basin outlet. While there is no formal method to separate flash floods from river floods, we recommend that users apply a threshold to the flooding rise time rather than to the catchment area. Figure 2 shows that a 6-h threshold corresponds to a median basin catchment area of approximately 250 km²; we therefore recommend 6 h as the flooding rise time threshold for flash flood events.
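Applying the recommended threshold is a simple filter on the rise-time field. The sketch below is illustrative only: it treats rise times at or below 6 h as flash floods (whether the boundary itself is inclusive is an assumption), and the `Event` record and sample values are hypothetical rather than taken from the database files.

```python
import statistics
from typing import List, NamedTuple, Tuple

FLASH_RISE_TIME_H = 6.0  # threshold recommended in the text


class Event(NamedTuple):
    gauge_id: str
    catchment_km2: float
    rise_time_h: float  # hours from start time to peak time


def split_by_rise_time(events: List[Event],
                       threshold_h: float = FLASH_RISE_TIME_H) -> Tuple[List[Event], List[Event]]:
    """Return (flash, slower): events peaking within threshold_h of onset versus the rest."""
    flash = [e for e in events if e.rise_time_h <= threshold_h]
    slower = [e for e in events if e.rise_time_h > threshold_h]
    return flash, slower


# Usage with hypothetical events
events = [Event("A", 40.0, 1.5), Event("B", 120.0, 6.0),
          Event("C", 900.0, 30.0), Event("D", 300.0, 7.25)]
flash, slower = split_by_rise_time(events)
print([e.gauge_id for e in flash], statistics.median(e.catchment_km2 for e in flash))
# prints ['A', 'B'] 80.0
```

Filtering on rise time rather than catchment area keeps fast-responding events in larger basins and excludes slow responses in small ones, which is the point of the recommendation.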
Along with the events dataset, we supply metadata for each station containing static information: USGS station ID, latitude (decimal degrees), longitude (decimal degrees), hydrologic unit code (HUC), agency, degree of regulation, gauge name, drainage area (km²), contributing drainage area (km²), computed flows (m³ s⁻¹) for recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 yr, and computed flows (m³ s⁻¹) corresponding to the action, minor, moderate, and major flood stages. The USGS previously computed the flows for recurrence intervals (return periods) of 2–500 yr using a Log-Pearson Type III distribution for stations with at least 10 yr of record and no significant changes in the record due to urbanization, diversion, or regulation. The degree of regulation field comes directly from the USGS peakflow metadata and takes the values “Yes,” “No,” or “Undefined.”
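For readers unfamiliar with the Log-Pearson Type III method, the sketch below shows the basic idea: fit the mean, standard deviation, and skew of the log10 annual peak flows, then scale by a frequency factor for the chosen return period. It is a simplified illustration, not the USGS computation. It approximates the frequency factor with the Wilson-Hilferty formula, uses only the station skew, and omits Bulletin 17B-style refinements such as weighting with a regional skew and screening low outliers. The function name and sample peaks are hypothetical.

```python
import math
import statistics
from statistics import NormalDist
from typing import Sequence


def lp3_quantile(annual_peaks_cms: Sequence[float], return_period_yr: float) -> float:
    """Flow (m^3/s) for a return period from a Log-Pearson Type III fit.

    Method of moments on log10 annual peaks; needs at least three peaks.
    """
    logs = [math.log10(q) for q in annual_peaks_cms]
    n = len(logs)
    mean = statistics.fmean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation (n - 1 denominator)
    # Sample skew coefficient of the logs, with the usual small-sample adjustment
    g = n * sum((x - mean) ** 3 for x in logs) / ((n - 1) * (n - 2) * sd ** 3)
    # Standard normal deviate for the annual non-exceedance probability 1 - 1/T
    z = NormalDist().inv_cdf(1.0 - 1.0 / return_period_yr)
    if abs(g) < 1e-6:
        k = z  # zero skew: the distribution reduces to log-normal
    else:
        # Wilson-Hilferty approximation of the Pearson Type III frequency factor
        k = (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)
    return 10.0 ** (mean + k * sd)


# Usage with a hypothetical record of annual peak flows (m^3/s)
peaks = [120, 150, 180, 200, 260, 300, 410, 520, 800, 1500]
for T in (2, 10, 100):
    print(T, round(lp3_quantile(peaks, T)))
```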
Each station's event data and metadata are grouped by first-level, two-digit hydrologic unit code (HUC), which represents a regional basin scale. We provide the processed USGS flood event database for the HUC basins in the United States and the Caribbean in three formats: 1) comma-delimited text file, 2) GIS shapefile, and 3) KMZ file for Google Earth. KMZ files provide quick-and-easy displays, while the shapefiles enable more in-depth spatial analysis with GIS software. The comma-delimited files can be read by many commonly available statistical software packages, and some users may prefer to read the text files directly with their own code and scripts. Data for Alabama, North Dakota, South Dakota, and Wyoming were not available when we obtained the database; station data for these states will be added in future updates to the database, which will be made annually.
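Users working from the comma-delimited files can reproduce the regional grouping themselves, since the first two digits of a HUC identify its region. The sketch below assumes a station file with hypothetical `site_no` and `huc` columns holding 8-digit HUCs; the actual column names in the distributed files may differ, so consult the supplied metadata.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def group_by_huc2(csv_file: TextIO, huc_column: str = "huc") -> Dict[str, List[dict]]:
    """Group station rows by the two-digit, first-level (regional) prefix of their HUC."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.DictReader(csv_file):
        # Assumes 8-digit HUCs; zfill restores a leading zero dropped by spreadsheet software
        huc = row[huc_column].strip().zfill(8)
        groups[huc[:2]].append(row)
    return dict(groups)


# Usage with a hypothetical in-memory file (column names are illustrative)
sample = io.StringIO("site_no,huc\nA,01010001\nB,1020004\nC,12090205\n")
groups = group_by_huc2(sample)
print({region: [r["site_no"] for r in rows] for region, rows in sorted(groups.items())})
# prints {'01': ['A', 'B'], '12': ['C']}
```

Treating HUCs as strings rather than integers matters here: reading them as numbers silently drops the leading zero of regions 01 through 09.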
NWS storm reports.
We obtained all NWS reports of flash flooding from 1 October 2006 to 31 December 2011 from the database managed by the NWS Performance Branch. Prior to 1 October 2007, the NWS recorded storm reports by county: any instance of flash flooding yielded a recorded event for the county. Because counties are delineated primarily by political rather than physical boundaries, and their sizes vary considerably across the United States, this procedure lacks the spatial resolution needed for meaningful hydrological interpretation. NWS forecasters now report the locations of impacted regions using bounding polygons defined by as many as eight vertices. Because there was a transitional period in implementing the new procedures, it is not uncommon for polygons to have the same shape as a county or to be recorded as a point, especially in 2006–20. In addition to reports stored nominally as a single county-wide point prior to 1 October 2007, some event locations were recorded with only two vertices. In the latter case, we converted the two points into a single polygon feature by treating them as opposite corners (i.e., the diagonal) of a bounding rectangle.
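The handling of point, two-vertex, and multi-vertex reports described above can be sketched as follows. This is an illustration under stated assumptions, not the conversion code used for the database: vertices are (latitude, longitude) pairs in decimal degrees, the rectangle is aligned with lines of latitude and longitude, and the name `report_geometry` is hypothetical.

```python
from typing import List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def report_geometry(vertices: List[LatLon]) -> Tuple[str, List[LatLon]]:
    """Classify an NWS report's vertex list as a point or a closed polygon ring."""
    if not vertices:
        raise ValueError("report has no vertices")
    if len(vertices) == 1:
        return "point", list(vertices)
    if len(vertices) == 2:
        # Treat the two vertices as the diagonal (opposite corners) of a lat/lon-aligned rectangle
        (lat1, lon1), (lat2, lon2) = vertices
        ring = [(lat1, lon1), (lat1, lon2), (lat2, lon2), (lat2, lon1)]
    else:
        ring = list(vertices)
    if ring[0] != ring[-1]:
        ring.append(ring[0])  # close the ring so first and last vertices coincide
    return "polygon", ring


print(report_geometry([(35.0, -97.5), (35.2, -97.3)]))
# prints ('polygon', [(35.0, -97.5), (35.0, -97.3), (35.2, -97.3), (35.2, -97.5), (35.0, -97.5)])
```

GIS writers typically expect (x, y) = (longitude, latitude) order and may impose a ring orientation, so a real export step would swap coordinates and check winding before writing a shapefile.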
According to the NWS Storm Data Directive, a recorded flash flood must have posed a potential threat to life or property and must include a report of moving water deeper than 0.15 m (6 in.) or standing water deeper than 0.91 m (3 ft). Typical situations that meet these criteria include rivers and streams out of their banks, evacuations, rescues, road closures, and floodwaters in an above-ground residence (i.e., not a basement). Each report from the NWS database contains a unique ID, the three-letter identifier of the reporting NWS Weather Forecast Office (WFO), beginning and ending times of the event (UTC), state, county, NWS region, direct/indirect fatalities and injuries (if applicable), a dollar estimate of property and crop damage (if applicable), details about the event including its cause (e.g., heavy rain), source of report (e.g., law enforcement), event and episode narratives, and vertex coordinates in decimal degrees of latitude and longitude as well as the range (miles) and azimuth (e.g., NE) from the nearest city. We used the entries in the vertex coordinate fields to create individual polygons for display and analysis in Google Earth and GIS software. During this conversion, we found that the event and episode narratives, which can be several sentences long, often exceeded the maximum number of characters allowed in a field in the shapefile and KMZ formats. The full narratives are preserved in the comma-delimited format.
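The depth criteria combine with a logical "or," which the sketch below makes explicit. It is an illustrative reading of the criteria as summarized in the paragraph above, not an NWS tool; the function name and inputs are hypothetical, and the thresholds use the exact imperial values (6 in. and 3 ft) rather than the rounded metric figures.

```python
from typing import Optional

M_PER_INCH = 0.0254
M_PER_FOOT = 0.3048
MOVING_WATER_MIN_M = 6 * M_PER_INCH    # about 0.15 m (6 in.)
STANDING_WATER_MIN_M = 3 * M_PER_FOOT  # about 0.91 m (3 ft)


def meets_flash_flood_criteria(threat_to_life_or_property: bool,
                               moving_depth_m: Optional[float] = None,
                               standing_depth_m: Optional[float] = None) -> bool:
    """Illustrative check: a threat plus deep moving water OR deep standing water."""
    if not threat_to_life_or_property:
        return False
    moving_ok = moving_depth_m is not None and moving_depth_m > MOVING_WATER_MIN_M
    standing_ok = standing_depth_m is not None and standing_depth_m > STANDING_WATER_MIN_M
    return moving_ok or standing_ok


print(meets_flash_flood_criteria(True, moving_depth_m=0.3))    # True
print(meets_flash_flood_criteria(True, standing_depth_m=0.5))  # False
```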
Figure 3 shows the locations of all 19,419 flash flood reports contained in the NWS database. The limitations inherent in this database include poor precision and accuracy in both the timing and the spatial extent of flash flooding. Often, the timing of the meteorological event is recorded as the flash flood timing. Regarding spatial extent, it is not clear that bounding polygons are appropriate for delineating impacted regions: unlike rainfall amounts, flash-flood impacts are often spatially discontinuous and may be associated with difficult-to-contour features such as road networks. Consistent with the processing of the USGS dataset, we segregated the NWS flash-flood reports into regional, two-digit HUC basins, and we provide separate files for point-based and polygon reports. The NWS flash-flood reports are provided in the same file formats as the USGS dataset (i.e., comma-delimited text, GIS shapefile, and KMZ).
SHAVE questionnaire responses.
During the summers of 2008–10, NSSL employed 5–6 undergraduate meteorology students to collect unique, very high-resolution details on flash-flooding impacts. The experiment designers displayed WSR-88D-based warning products and NWS flash-flood warnings in Google Earth to guide the SHAVE callers to locations where flash flooding may have just occurred. Students initiated calls if rainfall exceeded flash-flood guidance, if there was an NWS warning or advisory, or if a survey response for a different hazard (e.g., hail) suggested that flash flooding was a problem. Callers then employed a purposeful sampling strategy to refine the spatial extent of flash flooding. They used a georeferenced telephone database to call members of the public and administer a questionnaire designed to obtain details about flash-flooding impacts, including the depth and movement of floodwaters, lateral extent of water out of the stream, incidence of rescues and evacuations, start and end times of impacts, respondent-estimated frequency of the event, and types of impacts. The callers also recorded detailed comments to assess the uncertainty and validity of the reports and to capture anecdotal responses that did not readily fit any of the survey questions. The SHAVE dataset was postprocessed to better classify the impact types and to add geographical attributes to each report, including land use, local terrain slope, contributing drainage area, compound topographic index (which relates to the dominant runoff process), and population density. Future SHAVE efforts planned for 2013 include the collection of social science data on human behavior before and during flash-flooding events.
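The three call triggers described above amount to a simple "any of" rule, sketched below. This is a hypothetical encoding for illustration only: the `CellStatus` fields, the use of millimeters, and the comparison of radar-estimated rainfall with flash-flood guidance over a matching duration are assumptions, not a description of the software the SHAVE callers used.

```python
from dataclasses import dataclass


@dataclass
class CellStatus:
    rain_mm: float                       # radar-estimated rainfall over the guidance duration
    ffg_mm: float                        # flash-flood guidance (mm) for the same duration
    nws_warning_or_advisory: bool        # an NWS warning or advisory is in effect
    other_hazard_reports_flooding: bool  # e.g., a hail survey respondent mentioned flooding


def should_initiate_calls(s: CellStatus) -> bool:
    """Any one of the three triggers described in the text starts a round of calls."""
    exceeds_guidance = s.rain_mm > s.ffg_mm
    return exceeds_guidance or s.nws_warning_or_advisory or s.other_hazard_reports_flooding


print(should_initiate_calls(CellStatus(55.0, 40.0, False, False)))  # True: rain exceeds guidance
print(should_initiate_calls(CellStatus(20.0, 40.0, False, False)))  # False: no trigger met
```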
Figure 4 shows the spatial distribution of all 9,369 reports collected during SHAVE. SHAVE operated only when undergraduate students were available full time, during the warm season from early May through the end of August in 2008–10. The SHAVE database differs from the NWS storm reports in that it is storm-targeted and point-based: the NWS reports are meant to encompass all flash-flooding events across the United States from 2006 to 2011, whereas the SHAVE reports cover specific storms. Because it includes reports of “no flooding,” the SHAVE dataset allows false alarms (i.e., forecasts of flooding with no observed event) to be assessed; in fact, this class comprises 73% of the total reports. Human reports, especially those from the untrained public, are subject to uncertainty due to perceptions and occasional embellishments and must be used with caution. Users are encouraged to consult the supplied metadata for additional information about each field. Consistent with the other databases, we provide the SHAVE data in comma-delimited text, KMZ, and GIS shapefile formats for each of the regional HUC basins.
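Because SHAVE records "no flooding" as well as flooding, its reports can populate a full 2 × 2 contingency table when paired with forecasts or warnings. The sketch below computes standard verification scores from such pairs. It is a minimal illustration under an assumed pairing (each report already matched to a yes/no forecast or warning); the function name and sample data are hypothetical. It reports both the false alarm ratio (FAR), the fraction of forecasts with no observed flooding, and the false alarm rate in the stricter sense (probability of false detection, POFD), which needs the "no flooding" reports.

```python
from typing import Dict, Iterable, Tuple


def verification_scores(pairs: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Contingency-table scores from (flooding_forecast_or_warned, flooding_observed) pairs."""
    hits = misses = false_alarms = correct_negatives = 0
    for forecast, observed in pairs:
        if forecast and observed:
            hits += 1
        elif forecast:
            false_alarms += 1       # forecast/warned, but respondent reported no flooding
        elif observed:
            misses += 1
        else:
            correct_negatives += 1  # no forecast and "no flooding" reported

    def ratio(num: int, den: int) -> float:
        return num / den if den else float("nan")

    return {
        "POD": ratio(hits, hits + misses),                              # probability of detection
        "FAR": ratio(false_alarms, hits + false_alarms),                # false alarm ratio
        "POFD": ratio(false_alarms, false_alarms + correct_negatives),  # probability of false detection
        "CSI": ratio(hits, hits + misses + false_alarms),               # critical success index
    }


# Usage with hypothetical matched reports
pairs = [(True, True), (True, True), (True, False), (False, True),
         (False, False), (False, False), (False, False)]
print(verification_scores(pairs))
```

Without the "no flooding" class, the correct-negative cell would be empty and POFD could not be computed at all.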
Other candidate data sources.
An additional goal of this article is to reach out to other agencies, universities, and companies that maintain datasets related to the observation of flash flooding. For example, active and passive microwave and optical sensors positioned on the ground and in space (e.g., the 37-GHz channel onboard the Tropical Rainfall Measuring Mission, the Moderate Resolution Imaging Spectroradiometer onboard the Terra and Aqua satellites, and the Advanced Spaceborne Thermal Emission and Reflection Radiometer onboard Terra) have shown potential for monitoring inundated areas near streams and for providing reasonable estimates of surface water fluxes and depths (see, e.g., http://oas.gsfc.nasa.gov/floodmap). Local and regional networks of cameras or ultrasonic sensors mounted near bridges and tunnels can be very useful for detecting flash floods. Myriad networks of stream gauges and rain gauges are also operated by cities, counties, districts, water boards, bureaus, private companies, and tribes, and many of them are used by NWS offices. Observers in the Community Collaborative Rain, Hail and Snow Network (CoCoRaHS), for instance, are encouraged to report flooding along with their rainfall reports. In 2011, the Iowa Flood Center at The University of Iowa developed and deployed more than 100 fully automated, autonomous stream-stage sensors on local bridges. NWS forecasters use these data in real time, and we plan to add the data to the U.S. flash-flood database.
Postevent surveys serve as the primary basis for a flash-flood database already built for Europe. Although the NWS Storm Events Database generally contains surveyed events, further details from field investigations would be useful for estimating peakflows. Insurance agencies also hold a wealth of information on flash-flood impacts that could be made available. Thus far, social science data such as human exposure, behavior, perceived risks, and responses constitute a significant missing component of the unified flash-flood database. Once social science data are included in SHAVE (a planned activity for 2013), we will be able to gain a more comprehensive understanding of flash-flood impacts through an end-to-end analysis of the physical and societal components. Social media have also shown potential for volunteer reporting by the public, including the submission of georeferenced photographs of flooded lands. Finally, incorporating long-term (decadal) archives of gridded rainfall datasets from WSR-88D-based products will be essential for understanding the role of the causative rainfall in the quickness and intensity of basin responses. We encourage those who maintain datasets like the ones described here to consider making them part of the U.S. unified flash-flood database for freely accessible community use.
This article describes the data sources and processing steps used to create a unified database of flash-flood observations across the United States that is now available to the community at www.nssl.noaa.gov/projects/flash/database.php. The database comprises streamflow observations maintained by the USGS, storm reports collected by the NWS from trained personnel, and public survey responses to a questionnaire developed for the SHAVE experiment. We have rigorously postprocessed all datasets for consistency in data formats, time formats, geographic projection, and units. The database will be updated once per year to include recent USGS streamflow data, NWS storm reports, and SHAVE reports.
We expect that the announcement of this unified database will lead to the inclusion of additional datasets relevant to flash-flood observation, particularly those that provide social science data. The overall goal is to provide a comprehensive observational database on flash-flood impacts, which will enhance the research community's understanding of the social, physical, and economic effects of flash flooding.
Funding was provided by the NOAA/Office of Oceanic and Atmospheric Research under NOAA-University of Oklahoma Cooperative Agreement NA17RJ1227, the U.S. Department of Commerce, and the Advanced Hydrologic Prediction Service of the NOAA/National Weather Service. WFK acknowledges the support of the Iowa Flood Center.