Precipitation Proxies for Flash Flooding: A Seven-Year Analysis over the Contiguous United States

Eric P. James (NOAA/Global Systems Laboratory, Boulder, Colorado, and Department of Atmospheric Science, Colorado State University, Fort Collins, Colorado; https://orcid.org/0000-0002-6507-4997) and Russ S. Schumacher (Department of Atmospheric Science, Colorado State University, Fort Collins, Colorado)

Open access

Abstract

Flash flooding remains a challenging prediction problem, which is exacerbated by the lack of a universally accepted definition of the phenomenon. In this article, we extend prior analysis to examine the correspondence of various combinations of quantitative precipitation estimates (QPEs) and precipitation thresholds to observed occurrences of flash floods, additionally considering short-term quantitative precipitation forecasts from a convection-allowing model. Consistent with previous studies, there is large variability between QPE datasets in the frequency of “heavy” precipitation events. There is also large regional variability in the best thresholds for correspondence with reported flash floods. In general, flash flood guidance (FFG) exceedances provide the best correspondence with observed flash floods, although the best correspondence is often found for exceedances of ratios of FFG above or below unity. In the interior western United States, NOAA Atlas 14 derived recurrence interval thresholds (for the southwestern United States) and static thresholds (for the northern and central Rockies) provide better correspondence. The 6-h QPE provides better correspondence with observed flash floods than 1-h QPE in all regions except the West Coast and southwestern United States. Exceedances of precipitation thresholds in forecasts from the operational High-Resolution Rapid Refresh (HRRR) generally do not correspond with observed flash flood events as well as QPE datasets, but they outperform QPE datasets in some regions of complex terrain and sparse observational coverage such as the southwestern United States. These results can provide context for forecasters seeking to identify potential flash flood events based on QPE or forecast-based exceedances of precipitation thresholds.

Significance Statement

Flash floods result from heavy rainfall, but it is difficult to know exactly how much rain will cause a flash flood in a particular location. Furthermore, different precipitation datasets can show very different amounts of precipitation, even from the same storm. This study examines how well different precipitation datasets and model forecasts, used by forecasters to warn the public of flash flooding, represent heavy rainfall leading to flash flooding around the United States. We found that different datasets have dramatically different numbers of heavy rainfall events and that high-resolution model forecasts of heavy rain correspond with observed flash flood events about as well as precipitation datasets based on rain gauge and radar in some regions of the country with few observations.

© 2024 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Eric James, eric.james@noaa.gov

1. Introduction

Flash flooding remains a difficult prediction problem and one of high societal importance, with flash-flood-related losses projected to increase with population growth (e.g., Downton et al. 2005) and climate change (Prein et al. 2017). One of the key challenges with flash flood forecasting is the lack of a universally accepted definition of the phenomenon. The National Weather Service (NWS) defines a flash flood as “a rapid rise in water levels, along rivers, creeks, normally dry washes, arroyos, or even normally dry land areas, generally occurring within 6 h of the causative rainfall or other event” (NWS 2023). Even beyond the formal definition, the likelihood of flash flooding resulting from a given intensity and duration of heavy rain depends strongly on the hydrologic response, which varies dramatically across regions and in time. Because of these complications, it is helpful for forecasters to have a quick way to see what magnitude of rainfall accumulation or rate is climatologically anomalous or would cause a flood response given other hydrologic factors. One way to sift through the available information is to filter out events that are climatologically or hydrologically unlikely to cause flooding. A given rainfall accumulation over a given duration can have vastly different impacts depending on location and time (i.e., antecedent conditions, land surface type, vegetation, and topography). In this regard, it is important to have accurate, high-spatial- and high-temporal-resolution estimates of the precipitation threshold (the amount of precipitation over a given time period) beyond which flash flooding may occur. From a climatological perspective, NOAA Atlas 14 (Bonnin et al. 2006; Perica et al. 2011, 2013a,b, 2015, 2018) is intended to reflect the average amount of precipitation that corresponds to a given recurrence interval, highlighting statistically “rarer” precipitation events which could produce flash flooding. Flash flood guidance (FFG), on the other hand, is intended to reflect hydrologic capacity given the soil information and antecedent conditions. These thresholds are available down to the 1-h temporal scale, but there is some indication that precipitation accumulations at even finer temporal scales (e.g., 15 min) could be important for some types of flooding events (e.g., landslides; Kean et al. 2019).

Ultimately, treatment of the hydrological factors influencing the probability of flash flooding is appropriately handled only with advanced hydrologic models. The Flooded Locations and Simulated Hydrographs (FLASH) system (Gourley et al. 2017) now provides gridded comparisons of radar-based quantitative precipitation estimates (QPEs) with recurrence intervals and FFG, as well as an ensemble hydrologic model which provides high-resolution and frequently updated streamflow predictions. Recent work has also coupled the FLASH system with an experimental short-range ensemble forecast system (Yussouf et al. 2020; Martinaitis et al. 2023), enabling improved meteorological forcing and therefore improved streamflow forecasts. However, these novel applications are restricted to shorter lead times (3–6 h), and there remains a need for comparison of longer-range convection-allowing model (CAM) forecasts with precipitation thresholds of interest.

In response to the somewhat ambiguous nature of flash flood events, and because of issues with the flash flood report (FFR) dataset, the Weather Prediction Center (WPC) has developed a dataset, known as the Unified Flooding Verification System (UFVS), which combines FFRs with “proxy” flood events derived from gridded comparisons of QPE versus several thresholds (Erickson et al. 2019). This dataset builds upon earlier efforts to create a flash flood dataset which merges several data sources (Gourley et al. 2013), and is used to examine the performance of WPC’s operational excessive rainfall outlooks (EROs; Erickson et al. 2021), as well as other potential forecast guidance products such as the Colorado State University random forest (RF) systems (Herman and Schumacher 2018a; Schumacher et al. 2021).

Previous work has evaluated how well QPE exceedances of precipitation thresholds potentially relevant to the onset of flash flooding correspond with reported flash flood events. Herman and Schumacher (2018b, HS18 hereafter) compared several common QPE datasets against fixed precipitation thresholds, average recurrence intervals (ARIs) from NOAA Atlas 14 and other sources, and FFG, examining their correspondence with both FFRs and NWS-issued flash flood warnings during a 2.5-yr period. They found that the best correspondence over the CONUS as a whole was for 2.5 in. of precipitation in 24 h, with regionally varying results for ARI exceedances. Gourley and Vergara (2021, GV21 hereafter) carried out a similar analysis using a more recent version of the Multi-Radar Multi-Sensor (MRMS) product, finding the best agreement with FFRs at shorter accumulation periods and much higher fixed thresholds, as well as better performance for more sophisticated approaches such as ARI and FFG comparisons. Schumacher and Herman (2021) demonstrated that most of the differences between HS18 and GV21 were due to more frequent temporal sampling by GV21. They highlighted two separate categories of this type of analysis: one for real-time warning operations, using low-latency and frequently updated QPE datasets (e.g., 2-min MRMS QPE) with overlapping accumulation periods, and another for longer-range forecasting applications, which uses higher-latency, less frequently updated QPE or quantitative precipitation forecasts (QPFs).

The purpose of this study is to extend the analysis of HS18 to a longer time period (7 vs 2.5 years) and to include, in the same analysis context, forecasts from a state-of-the-art convection-allowing modeling system. Comparing model QPF with various QPE products in this framework provides guidance for forecasters seeking to use gridded model-based threshold exceedances in their forecasting operations. A comprehensive intercomparison of the population of heavy rainfall events in different QPE products is also important for obtaining a dataset of events for training machine learning prediction systems (e.g., Hill and Schumacher 2021). Although it would be instructive to include running-accumulation QPEs to quantify agreement over overlapping time periods (as done by GV21), we focus on nonoverlapping 1- and 6-h periods in order to include QPE datasets such as stage IV and to facilitate comparison with HS18; thus, our results will be more applicable to longer-range forecasting than to real-time warning operations. As demonstrated by Schumacher and Herman (2021), this is expected to reduce the relative number of exceedances, especially at the higher precipitation thresholds.

In the following section, the datasets used in the analysis are described. Section 3 outlines the methodology used for the analysis. Section 4 presents results, section 5 discusses operational considerations, and section 6 provides a discussion and conclusions.

2. Datasets

As described by HS18, there are large uncertainties associated with defining the occurrence of a flash flood event. They propose a simple contingency table framework for evaluating correspondence between QPE exceedances of different thresholds and FFRs, keeping in mind all the uncertainties associated with FFRs. We adopt this framework herein to examine the frequency of these so-called “proxy” flash flood events in both QPE and QPF. Table 1 lists the datasets evaluated in this study, along with the time periods included and data availability. Note that we do not evaluate differences between versions of QPE datasets or the High-Resolution Rapid Refresh (HRRR) model; the version dates are provided as a reference because they could have some impact on our results. The remainder of this section describes the datasets included in this study.

Table 1.

Datasets included in this study, with the corresponding analysis period. Percent complete indicates the fraction of times in which the CONUS grid is at least 90% spatially complete. See the text for more description on treatment for FFG. CCPA is Climatology-Calibrated Precipitation Analysis, MRMS is Multi-Radar Multi-Sensor, and HRRR is High-Resolution Rapid Refresh.

a. Flash flood reports

In this study, we verify against preliminary FFRs obtained from the Iowa Environmental Mesonet. As documented in prior studies (e.g., Calianno et al. 2013; Clark et al. 2014; HS18), FFRs are subject to significant reporting biases related to population density and time of day, as well as biases related to NWS Weather Forecast Office (WFO) reporting procedures. Regular flooding reports may also at times reflect flooding due to short-duration excessive rainfall; however, including flood reports would introduce many events where the flooding is temporally separated from the causative rainfall event. HS18 additionally compared against NWS flash flood warnings (FFWs) but demonstrated similar results when comparing against either FFRs or FFWs. As a result of this, and to facilitate comparison with HS18, we focus only on FFRs.

Figure 1 shows a map of the spatial distribution of FFRs during the 7-yr period of record included in this study. Consistent with HS18’s 2.5-yr analysis, and with the 20-yr analysis of Ahmadalipour and Moradkhani (2019), FFRs are more common in the south-central and east-central United States, with a secondary maximum in the southwestern United States (along the lower Colorado River valley). In Texas, the influence of population and/or of urbanized areas more prone to flash flooding is evident in the concentrations of FFRs around Houston, Austin, and Dallas–Fort Worth, as well as near smaller cities such as Corpus Christi and Midland–Odessa. Similar urban effects are evident elsewhere around the United States. The scarcity of FFRs in the Pacific Northwest may be related to the tendency for the local WFOs to record flood reports instead of FFRs; however, most rainfall events in that region do occur on longer time scales than those associated with flash floods. WFOs in other regions, such as Lower Michigan, also tend to classify nearly all reports as “flood” rather than “flash flood,” which may further complicate our analysis and results in those areas.

Fig. 1.

Number of FFRs received during 2015–21, on a 60 km × 60 km grid.

Citation: Journal of Hydrometeorology 25, 9; 10.1175/JHM-D-23-0203.1

The climatological frequency of “true” flash flood events is likely somewhat higher than reflected by the FFR dataset, which would likely benefit from a bias correction procedure similar to that developed by Potvin et al. (2019) for the tornado report dataset. However, such a procedure can only adjust climatological frequencies; it cannot introduce FFRs for individual unreported events. As a result, despite its deficiencies, we proceed with using FFRs as the “ground truth” data for our analysis.

b. Precipitation thresholds

Defining a precipitation threshold beyond which flash flooding occurs is a hydrologic problem. In this section, we describe the various approaches to defining a precipitation threshold in order to analyze correspondence with flash flood events.

1) Fixed precipitation thresholds

The simplest approach to defining a precipitation threshold is to use a fixed threshold. In this paper, we use several fixed thresholds in conjunction with different accumulation periods. We select thresholds following HS18 but exclude thresholds exceeding 127 mm for the sake of space.

2) Average recurrence intervals

As described by HS18, the use of ARIs is more complicated. Since the studies of Herman and Schumacher (2016, 2018b), NOAA Atlas 14 (Bonnin et al. 2006; Perica et al. 2011, 2013a,b, 2015) has been updated for Texas (Perica et al. 2018) but still does not include the Pacific Northwest. For this reason, we use the approach of HS18 (described in their appendix B) to estimate recurrence intervals in the Pacific Northwest. The ARIs are constructed from rain gauge observations with long records, using spatial statistics to estimate frequencies in regions of sparse observations.
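The frequency analysis underlying such recurrence-interval estimates can be illustrated with a simple point fit. The sketch below fits a Gumbel distribution to a hypothetical annual-maximum precipitation series using the method of moments; note that NOAA Atlas 14 uses regionalized frequency analysis with more flexible distributions, so this is illustrative only.

```python
import numpy as np

def gumbel_ari_depths(annual_maxima, return_periods=(2, 5, 10, 25, 50, 100)):
    """Estimate precipitation depths for given ARIs (years) by fitting a
    Gumbel distribution to an annual-maximum series via method of moments.
    Illustrative only: operational ARI products use regionalized
    frequency analysis rather than this simple point fit."""
    mean = np.mean(annual_maxima)
    std = np.std(annual_maxima, ddof=1)
    beta = std * np.sqrt(6.0) / np.pi          # Gumbel scale parameter
    mu = mean - 0.5772 * beta                  # Gumbel location parameter
    # Depth with exceedance probability 1/T per year:
    return {T: mu - beta * np.log(-np.log(1.0 - 1.0 / T)) for T in return_periods}

# Hypothetical annual-maximum 1-h depths (mm) at one gauge
depths = gumbel_ari_depths([30.0, 35.0, 40.0, 45.0, 50.0])
```

By construction the estimated depths increase monotonically with return period, mirroring the behavior of the ARI grids in Fig. 2.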

Figure 2 shows the resulting ARIs for the 1- and 3-h accumulation periods; these maps may be compared with HS18 Fig. 1. By definition, ARI values increase monotonically with increasing rarity. There is a spatial pattern with higher values in the southeastern United States and lower values to the north, and especially in the interior western United States. Comparing Fig. 2g with HS18’s Fig. 1g illustrates the changes over Texas associated with the Atlas 14 update there, with more physical detail evident in the revised results (Fig. 2g).

Fig. 2.

ARIs, derived primarily from NOAA Atlas 14, for (left) 1- and (right) 3-h durations (mm). Shown are (a),(b) 1-, (c),(d) 2-, (e),(f) 5-, (g),(h) 10-, (i),(j) 25-, (k),(l) 50-, and (m),(n) 100-yr ARIs. Additional details on the derivation of the ARIs are provided in the text.

3) Flash flood guidance

FFG describes the fields produced by River Forecast Centers (RFCs), through various approaches, to provide guidance on probable amounts of precipitation over a given period of time required for the onset of bankfull conditions on streams (Clark et al. 2014). The FFG construction methodology varies regionally around the United States, leading to regionally varying performance characteristics. Figure 3 shows the median FFG value for the 7-yr analysis period, as well as the 10th percentile, and the equivalent ARI recurrence interval; the figure may be compared with HS18’s Fig. 2. Consistent spatial patterns emerge between Fig. 3 and HS18’s Fig. 2, despite the difference in the analysis period (7 years here vs 2.5 years in HS18). Median 6-h FFG values range from less than 25 mm in the Pacific Northwest to greater than 150 mm in portions of the southern United States. As shown by HS18, dramatic differences in FFG emerge across RFC boundaries. In particular, the Northwest RFC produces FFG that varies only slightly across accumulation interval, while the California Nevada and Colorado Basin RFCs’ FFG increases dramatically from 1- to 6-h accumulation (Figs. 3a–c). The same pattern is seen for the higher-risk 10th percentile FFGs (Figs. 3g–i). Finally, comparing the median and 10th percentile FFG values reveals that FFG is essentially constant in time in the western United States (e.g., Figs. 3d–f,j–l); this is consistent with the use of the flash flood potential index in these regions, which is based on gridded physiographic information rather than soil moisture estimates (e.g., Clark et al. 2014).

Fig. 3.

(a)–(f) Median and (g)–(l) 10th percentile FFG estimates over the 7-yr period of record, showing the actual threshold estimates in (a)–(c) and (g)–(i), as well as the equivalent ARIs in (d)–(f) and (j)–(l). Shown are (left) 1-, (middle) 3-, and (right) 6-h FFG values.

Clark et al. (2014) analyzed FFG performance by comparing stage IV QPE exceedances of FFG against FFRs, finding that the critical success index (CSI) peaked near 0.2 in the eastern CONUS. It is important to note that some of the low skill evident in the western CONUS in their analysis is likely due to shortfalls in the stage IV QPE (e.g., Nelson et al. 2016). They also found significant skill dependence on the dataset of flash flood events used for verification.
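The skill measures used in such evaluations come from a simple 2 × 2 contingency table relating threshold exceedances to reports. A minimal sketch, with hypothetical boolean verification grids, of the probability of detection (POD), false alarm ratio (FAR), and CSI:

```python
import numpy as np

def contingency_scores(exceed, observed):
    """Compute POD, FAR, and CSI from boolean grids.

    exceed   : True where QPE exceeds the threshold (e.g., FFG)
    observed : True where a flash flood report occurred
    """
    hits = np.sum(exceed & observed)
    misses = np.sum(~exceed & observed)
    false_alarms = np.sum(exceed & ~observed)
    pod = hits / (hits + misses) if (hits + misses) else np.nan
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else np.nan
    csi = hits / (hits + misses + false_alarms) if (hits + misses + false_alarms) else np.nan
    return pod, far, csi

# Toy example: 3 hits, 1 miss, 2 false alarms
exceed = np.array([1, 1, 1, 0, 1, 1, 0, 0], dtype=bool)
observed = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
pod, far, csi = contingency_scores(exceed, observed)
# pod = 3/4, far = 2/5, csi = 3/6
```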

For FFG in this study, special treatment was carried out to allow extension of the analysis back to 2015. Prior to July 2017, all FFG grids valid at 0600 UTC were missing data for six RFCs covering the western, northern, and central CONUS, and FFG grids at 0000 and 1800 UTC were missing data for the three western RFCs. This was handled by using FFG values from the most recent valid time that provided values for the point in question, as long as it occurred during the previous 24 h. This allows us to achieve 99.1% data coverage for the 2015–21 period (see Table 1).
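The backfill procedure described above can be sketched as follows, assuming (for illustration) that the FFG grids are stacked into a 6-hourly time series with missing values encoded as NaN, so that a 24-h lookback corresponds to four time steps:

```python
import numpy as np

def backfill_ffg(ffg_stack, max_lookback=4):
    """Fill missing (NaN) FFG values with the most recent valid value
    at each point, looking back at most max_lookback time steps
    (4 six-hourly steps = 24 h).

    ffg_stack : (ntimes, ny, nx) array of 6-hourly FFG grids
    """
    filled = ffg_stack.copy()
    for t in range(1, filled.shape[0]):
        for back in range(1, min(max_lookback, t) + 1):
            missing = np.isnan(filled[t])
            if not missing.any():
                break  # grid at time t is now complete
            filled[t][missing] = ffg_stack[t - back][missing]
    return filled
```

Points with no valid FFG during the previous 24 h remain missing, consistent with the 99.1% (rather than 100%) data coverage reported in Table 1.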

c. Quantitative precipitation estimates

In this section, we describe the various QPE datasets used in the analysis, building upon the findings of HS18.

1) Stage IV

Stage IV is the RFC-produced precipitation analysis constructed from radar-based QPE and bias-corrected based on rain gauge observations (Nelson et al. 2016). There are well-known stage IV quality control (QC) issues, including discontinuities along RFC boundaries, and radar artifacts (Stevenson and Schumacher 2014; Nelson et al. 2016). The quality of stage IV estimates varies by RFC, and also through time, largely dependent upon the availability of radar and gauge observations. However, the stage IV products are widely used in precipitation retrieval evaluation as well as model verification.

Stage IV QPE has a latency that mostly precludes its use in generating any real-time flash flood products; however, we include it here to facilitate comparison with HS18.

2) CCPA

Because of the aforementioned weaknesses of stage IV, particularly for the population of heavy to extreme precipitation events, another dataset has been developed which uses a simple linear regression model to adjust stage IV toward the daily Climate Prediction Center (CPC) global gauge analysis. This dataset, referred to as the Climatology-Calibrated Precipitation Analysis (CCPA; Hou et al. 2014), shifts the distribution of QPE toward that of the CPC dataset but retains the small-scale structure of precipitation events. HS18 documented that CCPA also tends to mute extreme values found in stage IV. CCPA likewise has a high latency and is not generally used in real-time warning operations.
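The flavor of this calibration can be sketched with a single least-squares fit; the operational CCPA (Hou et al. 2014) fits regressions per grid box against the CPC gauge analysis, so the uniform fit below is an illustrative simplification:

```python
import numpy as np

def calibrate_qpe(stage4_daily, cpc_daily):
    """Fit y = a*x + b relating stage IV daily totals (x) to CPC gauge
    analysis totals (y) over a training sample, then adjust stage IV.
    Illustrative simplification of the CCPA approach: one regression
    is fit here rather than one per grid box."""
    a, b = np.polyfit(stage4_daily.ravel(), cpc_daily.ravel(), 1)
    adjusted = np.clip(a * stage4_daily + b, 0.0, None)  # no negative precip
    return a, b, adjusted
```

Because the adjustment is applied to the stage IV field itself, the small-scale spatial structure of precipitation events is preserved while the amplitude distribution is shifted toward the gauge analysis.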

3) MRMS radar-only QPE

The National Severe Storms Laboratory (NSSL) MRMS project aims to use data from ground-based radar and other sources to create a variety of user-focused analysis products, including QPE. The MRMS QPE, formerly known as the National Mosaic and multisensor QPE (NMQ; Zhang et al. 2011), has undergone extensive development over the past decade (e.g., Zhang et al. 2016; Qi et al. 2016). MRMS was implemented operationally in 2014.

The original radar-based MRMS QPE is described by Zhang et al. (2011). The QPE features four Z–R relationships, applied on a pixel-by-pixel basis. Since its inception, improved QC measures have been applied, including the use of polarimetric radar observations and a vertical profile of reflectivity correction for brightband contamination (Zhang et al. 2016). Tang et al. (2020) describe more recent QC developments for radar-based QPE. In addition, Zhang et al. (2020) describe a dual-polarization radar synthetic QPE which has since been implemented as part of the operational MRMS radar-only QPE (GV21).
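A Z–R relationship converts radar reflectivity Z (mm⁶ m⁻³) to rain rate R (mm h⁻¹) via Z = aR^b. The sketch below uses commonly cited coefficient pairs for illustration; the specific relationships and the pixel-by-pixel selection logic in MRMS are as described by Zhang et al. (2011) and may differ:

```python
import numpy as np

# Commonly cited Z-R coefficient pairs (Z = a * R**b); illustrative only.
ZR_COEFFS = {
    "stratiform": (200.0, 1.6),   # Marshall-Palmer
    "convective": (300.0, 1.4),
    "tropical":   (250.0, 1.2),
}

def rain_rate_from_dbz(dbz, regime="convective"):
    """Invert Z = a * R**b to obtain rain rate R (mm/h) from reflectivity (dBZ)."""
    a, b = ZR_COEFFS[regime]
    z_linear = 10.0 ** (dbz / 10.0)   # dBZ -> Z in mm^6 m^-3
    return (z_linear / a) ** (1.0 / b)
```

For example, 40 dBZ under the convective pair yields roughly 12 mm h⁻¹, while the same reflectivity under a different pair yields a different rate, which is why regime selection matters for QPE.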

Gauge-corrected MRMS QPE is also available but only for more recent years; thus, we do not include it in our analysis.

d. Quantitative precipitation forecasts

The HRRR is an hourly updating convection-allowing model run operationally since September 2014, using community-supported data assimilation and model software (Dowell et al. 2022, hereafter D22; James et al. 2022). The HRRR produces hourly QPF, which has been evaluated against both QPE datasets and rain gauge observations in certain regions and for limited time periods (e.g., Ikeda et al. 2013; Bytheway and Kummerow 2015; Bytheway et al. 2017; Dougherty et al. 2021; English et al. 2021). A comprehensive evaluation of HRRR QPF, including how it has changed between HRRR versions, is beyond the scope of this study, but work is underway to document this in a peer-reviewed article.

The HRRR initialization procedure is described in detail by D22 (their section 3) and consists of several steps. Radar reflectivity data are assimilated via latent heating applied in four 15-min windows during a 1-h “preforecast” for each HRRR simulation (Weygandt et al. 2022). Following the radar data assimilation, conventional observations are assimilated using an approach that varies by HRRR version (D22). The assimilation also includes a nonvariational stratiform cloud hydrometeor analysis (Benjamin et al. 2021), which allows for realistic analysis and short-term prediction of cloud cover. Because radar data are used for initialization, short-range HRRR forecasts (lead times less than 6 h) exhibit some dependence on radar observations (Weygandt et al. 2022).

3. Methodology

Analysis of model QPF exceedances of various thresholds, in the context of flash flood prediction, has been done for several years as part of the annual Flash Flood and Intense Rainfall (FFaIR) experiment (e.g., Barthold et al. 2015). In this section, we describe the methodology employed here to examine the correspondence of QPE/QPF threshold exceedances with FFRs, following the general approach of HS18.

In contrast to HS18, who evaluated on the ∼4-km Hydrologic Rainfall Analysis Project (HRAP) grid, comparison here is done on the 3-km HRRR grid because we had access to stage IV QPE already interpolated to the HRRR grid. QPE products are interpolated to the 3-km HRRR grid using the National Centers for Environmental Prediction (NCEP) ipolates library (https://www.nco.ncep.noaa.gov/pmb/docs/libs/iplib/ipolates.html). We used neighborhood budget interpolation, preserving precipitation maxima; we tested the sensitivity to using standard budget interpolation, as well as to preserving maxima versus averaging, and found minimal sensitivity in the results (not shown). FFRs are assigned to the closest HRRR grid point and then projected onto nearby HRRR grid points using a 40-km radius of influence, as in HS18. Both the point QPE/QPF exceedances and the projected FFRs are then upscaled to a 60-km grid for evaluation (similar to the 0.5° grid used by HS18). Contingency table statistics are then calculated relating the QPE/QPF exceedances to the occurrence of flash flood reports.
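The report-projection and upscaling steps can be sketched as follows, assuming (for illustration) an idealized flat 3-km grid rather than the actual HRRR map projection:

```python
import numpy as np

def project_reports(ny, nx, report_ij, radius_km=40.0, dx_km=3.0):
    """Flag all 3-km grid points within radius_km of any report location,
    given report locations as (i, j) grid indices."""
    yy, xx = np.mgrid[0:ny, 0:nx]
    flagged = np.zeros((ny, nx), dtype=bool)
    for (i, j) in report_ij:
        dist_km = np.hypot(yy - i, xx - j) * dx_km
        flagged |= dist_km <= radius_km
    return flagged

def upscale_any(field, block=20):
    """Upscale a boolean 3-km field to ~60-km blocks (20 x 20 points),
    flagging a coarse cell if any fine point within it is True."""
    ny, nx = field.shape
    ny2, nx2 = ny // block, nx // block
    trimmed = field[:ny2 * block, :nx2 * block]
    return trimmed.reshape(ny2, block, nx2, block).any(axis=(1, 3))
```

The same `upscale_any` step would be applied to the gridded QPE/QPF exceedance fields, after which the two coarse boolean fields feed directly into the contingency table.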

To remove clearly erroneous QPE values, we follow the approach of Herman and Schumacher (2016; their appendix). This approach exploits the fact that ARI threshold exceedances should occur with a specified climatological frequency. Time series of QPE (for 1-, 6-, or 24-h durations) at each grid point were used to assess time-lagged correlations and thus to estimate the approximate number of independent events in each time series; from this, the statistical likelihood of observing different numbers of ARI exceedances can be assessed using the binomial distribution. We remove QPE values exceeding the 99.99th percentile for any of the ARIs shown in Fig. 2. While this is admittedly a simple QC approach, and somewhat dependent on the accuracy of the ARI estimates, it is applied to all QPE datasets and thus should not affect our conclusions. The model QPF was not subject to this QC.
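The binomial plausibility test can be sketched as follows: given n approximately independent events and a per-event exceedance probability p implied by the ARI, any grid point with more exceedances than the 99.99th percentile of Binomial(n, p) is treated as suspect. This is a minimal sketch of that idea, not the exact implementation of Herman and Schumacher (2016):

```python
from math import comb

def max_plausible_exceedances(n_events, p_exceed, q=0.9999):
    """Smallest k such that P(X <= k) >= q for X ~ Binomial(n, p).
    QPE time series with more ARI exceedances than this would be
    flagged as statistically implausible."""
    cdf = 0.0
    for k in range(n_events + 1):
        cdf += comb(n_events, k) * p_exceed**k * (1.0 - p_exceed)**(n_events - k)
        if cdf >= q:
            return k
    return n_events
```

For a rare threshold (e.g., one expected exceedance over the record), the plausible count is only a handful, so runaway radar artifacts that exceed an ARI dozens of times stand out clearly.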

4. Results

a. CONUS-wide results

In this section, we summarize our results in terms of CONUS-wide performance. We begin with “heat maps,” showing the frequency of exceedance of various thresholds; these spatial patterns can be compared with Fig. 1, which shows the frequency of FFRs during the period.

1) Heat maps

Figure 4 shows exceedance counts of a single QPE dataset (6-h CCPA) against various fixed, ARI, and FFG ratios. Fixed thresholds, as the simplest formulation, exhibit the well-known climatology of heavy precipitation across the CONUS. The high precipitation thresholds are mostly confined to the Gulf Coast region with tongues of higher probability of exceeding, for example, 76.2 mm (6 h)−1, extending northward along the Atlantic coast and into eastern Oklahoma (Fig. 4g). CCPA estimates exceeding 50.8 mm (6 h)−1 occur occasionally along the Pacific coast and in the Sierra Nevada and the Cascade Range and also in the Sonoran Desert of Arizona (Fig. 4a). The ARI thresholds accomplish their purpose by somewhat normalizing frequency across the CONUS (e.g., Fig. 4b). However, maxima and minima are still evident due to departures from climatology during this period, errors in CCPA, and/or biases in the ARIs. Exceedances of FFG (Fig. 4f) exhibit a somewhat different spatial pattern due to the intended physical variability of the thresholds required for flash flooding, as well as artificial regional differences in the FFG methodology. In general, FFG exceedances are more frequent, relatively, in the Midwest and Appalachians, and less frequent in Florida/southern Georgia and the Nebraska Sandhills, than exceedances of 63.5 mm (6 h)−1 (Figs. 4d,f). There are no 6-h FFG exceedances during the 7-yr period in much of the southwestern CONUS (particularly Nevada, Utah, and Arizona; Fig. 4f).
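The exceedance counts behind such heat maps reduce to a per-gridpoint tally over the record; the sketch below illustrates this with hypothetical arrays, where the threshold may equally be a fixed value, a 2D ARI depth grid, or a scaled FFG grid:

```python
import numpy as np

def exceedance_counts(qpe_stack, threshold):
    """Count, at each grid point, the number of nonoverlapping accumulation
    periods in which QPE meets or exceeds a threshold.

    qpe_stack : (ntimes, ny, nx) array of, e.g., 6-h accumulations (mm)
    threshold : scalar (fixed), 2D ARI depth grid, or e.g. 1.5 * ffg_grid
    """
    return np.sum(qpe_stack >= threshold, axis=0)
```

Dividing these counts by the number of periods converts them to exceedance frequencies, which is what allows datasets with different record lengths to be compared.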

Fig. 4.

Exceedance counts of 6-h CCPA during 2015–21. Shown are (left) fixed thresholds, (middle) ARI thresholds, and (right) FFG ratio thresholds. Panels are shown specifically as follows: (a) 50.8, (d) 63.5, (g) 76.2, (j) 88.9, and (m) 101.6 mm (6 h)−1; (b) 1-, (e) 2-, (h) 5-, (k) 10-, and (n) 25-yr ARI thresholds for the 6-h duration; and (c) 0.75, (f) 1.0, (i) 1.5, (l) 2.0, and (o) 2.5 FFGs for the 6-h duration. A total of 2519 days are included in the analysis.

Citation: Journal of Hydrometeorology 25, 9; 10.1175/JHM-D-23-0203.1

Figure 5 compares 6-h heat maps for different QPE/QPF datasets against representative fixed, ARI, and FFG thresholds. Striking differences emerge among the datasets in this analysis. The stage IV high bias in the New Mexico through Montana Front Range area discussed by HS18 is evident (Figs. 5a–c), particularly in the ARI exceedances (Fig. 5b), compared with the CCPA exceedances (Figs. 5d–f), which may be expected to be close to reality in the eastern United States because of the climatological correction. Yet the radar-only MRMS product shows even more ARI exceedances in this region than stage IV (Fig. 5h), and in fact, the radar-only MRMS product has much more precipitation than the other datasets across most of the CONUS (Figs. 5g–i). This may be partially attributable to the higher resolution of MRMS (1-km grid spacing) in comparison with the other QPE products. The counts of HRRR-forecast 76.2 mm (6 h)−1 precipitation events appear spatially similar to those from the QPE datasets, although the HRRR predicts more events than are captured by CCPA or stage IV in the eastern United States, more closely matching the number of radar-only MRMS events (Figs. 5g,j). However, the HRRR predicts fewer 76.2 mm (6 h)−1 events than indicated by MRMS over the western United States, with the exception of the Sierra Nevada and the Cascade Range (Figs. 5g,j), and it has dramatically fewer ARI exceedances than MRMS in the western United States (cf. Figs. 5h,k). The pattern and magnitude of FFG exceedances are similar among the datasets, indicating that FFG variability outweighs the importance of QPE/QPF differences (Fig. 5, right column). This suggests that the correspondence of the different QPE/QPF datasets with FFRs under FFG thresholds will depend more on the FFG ratio used than on the precipitation dataset. HS18 evaluated QPE datasets against an FFG ratio of 1 only, while GV21 evaluated MRMS only, so it is difficult to determine whether this is consistent with prior studies.

Fig. 5.

Exceedance counts of 6-h QPE and QPF during 2015–21. Shown are exceedances of (left) 76.2 mm (6 h)−1, (middle) 6-h 5-yr ARI, and (right) 1.5 6-h FFG. Products shown are (a)–(c) stage IV, (d)–(f) CCPA, (g)–(i) radar-only MRMS, and (j)–(l) 0–6-h HRRR QPF. A total of 2204 days are included in the analysis.


For 1-h accumulations, results are broadly consistent with those seen for the 6-h duration (Fig. 6). In the western United States, pronounced circles of higher frequency of MRMS exceedances of the 5-yr ARI are seen around each WSR-88D (Fig. 6e); such radar artifacts are not as evident for the 6-h duration (Fig. 5h). We again see the relative high bias of MRMS compared to stage IV (Figs. 6a,d). For the 1-h duration, we see relatively more events in the southwestern United States, potentially reflecting the prevalence of short-duration extreme rainfall events in this region (left column of Fig. 6 vs Fig. 5). The pattern of 1-h FFG exceedances (right column of Fig. 6) is somewhat different from the pattern of 6-h FFG exceedances (right column of Fig. 5), with 1-h exceedances appearing more uniformly distributed across the southern United States in the QPE datasets, including in the desert southwest. The 1-h FFG exceedances seem to be in better agreement with the spatial pattern of flash flood reports than the 6-h FFG exceedances (cf. Fig. 1).

Fig. 6.

Exceedance counts of 1-h QPE and QPF during 2015–21. Shown are exceedances of (left) 50.8 mm (1 h)−1, (middle) 1-h 5-yr ARI, and (right) 1.5 1-h FFG. Products shown are (a)–(c) stage IV, (d)–(f) radar-only MRMS, and (g)–(i) 0–1-h HRRR. A total of 1227 days are included in the analysis.


2) Correspondence metrics

Figure 7 shows the equitable threat score (ETS) for the dataset/threshold combinations shown in Fig. 5, illustrating how the correspondence changes with dataset and precipitation threshold. ETS is calculated in a contingency table framework, with FFRs serving as the observed events. ETS is formulated similarly to CSI but is referenced against the number of hits expected from a random set of events, such that positive values indicate better correspondence than a random forecast and negative values indicate worse. Greater skill is evident in the east for all thresholds (Fig. 7). Note that ETS cannot be calculated at grid points where forecast events never occur; these appear as white grid points in Fig. 7. For example, stage IV never exceeded the 76.2 mm (6 h)−1 threshold during 2015–21 in many places in the northwestern United States (Fig. 7a). Overall, the static 76.2 mm (6 h)−1 threshold corresponds best with FFRs in the southern United States for stage IV and CCPA (Figs. 7a,d), while the 1.5 FFG threshold appears to have the best correspondence in the northern United States (Figs. 7c,f). For the southwestern United States, MRMS and HRRR exceedances of ARIs appear to have the best correspondence with FFRs (Figs. 7h,k).
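For reference, the ETS computation can be written compactly; the function below is a standard contingency-table formulation, and the example counts are hypothetical rather than taken from our results:

```python
def ets(hits, misses, false_alarms, correct_negatives):
    """Equitable threat score from a 2 x 2 contingency table.

    Exceedances play the role of forecasts and FFRs the role of
    observed events; hits expected by chance are subtracted, so
    ETS > 0 indicates better correspondence than a random forecast.
    """
    n = hits + misses + false_alarms + correct_negatives
    # Hits expected from a random forecast with the same marginal frequencies.
    hits_random = (hits + misses) * (hits + false_alarms) / n
    return (hits - hits_random) / (hits + misses + false_alarms - hits_random)

# Perfect correspondence scores exactly 1; chance-level scores 0.
print(ets(50, 0, 0, 2150))  # → 1.0
```

Note that ETS is undefined when exceedances never occur at a grid point (the denominator collapses), which is why such points appear white in Fig. 7.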

Fig. 7.

Maps showing ETS for QPE/QPF exceedances of thresholds versus observed FFRs during 2015–21. Shown are exceedances of (left) 76.2 mm (6 h)−1, (middle) 6-h 5-yr ARI, and (right) 1.5 6-h FFG. Shown are (a)–(c) stage IV, (d)–(f) CCPA, (g)–(i) radar-only MRMS, and (j)–(l) 0–6-h HRRR QPF. A total of 2204 days are included in the analysis.


Interestingly, HRRR exceedances of the 5-yr ARI (Fig. 7k) have higher ETS in the southwestern United States than stage IV or CCPA exceedances of the 5-yr ARI (cf. Figs. 7b,e). ARI thresholds appear to provide the best correspondence in the northwestern United States for all datasets (middle column of Fig. 7).

Figure 8 shows CONUS-wide ETS results for each dataset and threshold. These results may be compared directly with Fig. 15 of HS18 and with Figs. 2–4 of GV21 (keeping in mind the more frequent temporal sampling by GV21). For the 6-h duration, ETS maximizes at 50.8–63.5 mm (6 h)−1 for fixed thresholds (Fig. 8a), at the 2–5-yr ARI (Fig. 8c), and at an FFG ratio of 1–1.5 (Fig. 8e), with stage IV providing slightly higher ETS than MRMS; these results agree well with HS18, although we find the highest ETS for stage IV exceedances of ARI thresholds and for MRMS or HRRR exceedances of FFG thresholds (in contrast to HS18’s finding of highest ETS for fixed-threshold comparisons). For the 1-h duration, the highest ETSs are seen for FFG exceedances for every dataset (Fig. 8f); this contrasts with HS18, who found higher ETS for fixed thresholds at the 1-h duration (see their Figs. 15a–c,g). For fixed and ARI thresholds, we generally find higher scores for the 6-h duration (Figs. 8a,c) than for the 1-h duration (Figs. 8b,d), in agreement with both HS18 and GV21. In terms of ETS, stage IV emerges with the highest CONUS-wide score for all types of thresholds and for both 1- and 6-h durations, in general agreement with HS18. The highest ETS for stage IV tends to occur at lower thresholds than for MRMS for all threshold types and both durations; because QPE is generally higher in MRMS, its heavier events match the frequency of FFRs more closely than its lighter events do.

Fig. 8.

ETS (multiplied by 100) by dataset and threshold for (left) 6- and (right) 1-h durations. Shown are (a),(b) fixed, (c),(d) ARI, and (e),(f) ratios of FFG thresholds. Dataset/threshold combinations are color coded by ETS, with higher ETS being shaded darker green. Results are for the 2015–21 period, with 1199 days included in the analysis. Verification is against FFRs, and ETS (multiplied by 100) ranges from 0 to 100, with higher scores indicating better correspondence.


To visualize the correspondence between QPE/QPF exceedances and observed FFRs in a more holistic fashion, Fig. 9 shows performance diagrams (Roebber 2009) for all of CONUS for the 6-h (left column) and 1-h (right column) durations. Points associated with a single QPE/QPF dataset are colored alike, and precipitation threshold types are grouped into the same panel. Before discussing the differences between the datasets, some general characteristics of the performance diagrams are worth noting. All results for a single dataset and threshold type trace a curve from the upper-left portion of the diagram [high probability of detection (POD), but also high false alarm ratio (FAR), for relatively light thresholds] to the lower-middle portion (low POD, with FAR varying by dataset, for rare thresholds like the 100-yr ARI). Fixed and ARI thresholds exhibit a generally similar appearance, distinguished by the slope of their performance diagram curves: there is minimal distinction between the various QPE/QPF datasets at the lowest precipitation thresholds [in terms of POD and success ratio (SR)] but greater distinction at the highest thresholds (in terms of SR). The different slopes represent the datasets’ differing climatologies of heavy precipitation amounts. For example, the HRRR QPFs have a higher FAR (lower SR) than CCPA at the heaviest precipitation threshold [127 mm (6 h)−1; Fig. 9a, lowest points on the purple and blue curves], indicating that the HRRR predicts many more 127 mm (6 h)−1 precipitation events than are seen in CCPA. The high FAR in the HRRR also manifests as a lower CSI at these heavier thresholds. The FFG curves behave differently because those thresholds are generally not static in time.
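The quantities plotted in a performance diagram all derive from the same contingency table; a minimal sketch with made-up counts is:

```python
def perf_coords(hits, misses, false_alarms):
    """Coordinates used in a Roebber (2009) performance diagram.

    Correct negatives are not needed for POD, SR, CSI, or frequency bias.
    """
    pod = hits / (hits + misses)                    # probability of detection
    sr = hits / (hits + false_alarms)               # success ratio = 1 - FAR
    csi = hits / (hits + misses + false_alarms)     # critical success index
    bias = (hits + false_alarms) / (hits + misses)  # frequency bias
    return pod, sr, csi, bias

# Illustrative counts: 20 hits, 30 missed FFRs, 30 false alarms.
pod, sr, csi, bias = perf_coords(20, 30, 30)
print(pod, sr, csi, bias)  # → 0.4 0.4 0.25 1.0
```

Because POD and SR fix both CSI and bias, a single point in the diagram encodes all four scores, which is what makes the diagram useful for comparing dataset/threshold combinations at a glance.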

Fig. 9.

Performance diagrams evaluating the degree of correspondence between QPE/QPF exceedances of (a),(b) fixed thresholds, (c),(d) ARIs, and (e),(f) FFG, and observed FFRs. The evaluation period is February 2015–December 2021 (1199 days included) for (left) 6-h durations and (right) 1-h durations. Thresholds shown are, from upper left to lower right of each panel, 25.4, 38.1, 50.8, 63.5, 76.2, 88.9, 101.6, 114.3, and 127 mm (6 h)−1; 1-, 2-, 5-, 10-, 25-, 50-, and 100-yr ARIs; and 0.25, 0.5, 0.75, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, and 5 × FFG. Curved lines from upper left to lower right in each panel correspond to 100 × CSI, while the dashed lines correspond to frequency bias.


CCPA (blue curves) exhibits the greatest CSI overall (closest to top right) for all three types of thresholds for the 6-h duration (Fig. 9, left column). Stage IV QPE (red curves) is just slightly lower in CSI but has an evident shift toward higher-frequency bias (more frequent exceedances; Fig. 9, left column). MRMS corresponds to FFRs comparably to stage IV/CCPA for fixed thresholds but with a lower SR at the high precipitation thresholds (Fig. 9a), indicating more frequent heavy precipitation events in the MRMS dataset. In terms of FFG exceedances, MRMS corresponds almost as well as stage IV and CCPA to FFRs (Fig. 9e).

For the 1-h QPE and QPF results (right column of Fig. 9), we see the same relative correspondence of the stage IV and MRMS QPE for fixed threshold exceedances (Fig. 9b). For the 1-h ARI exceedances (Fig. 9d), we see less decrease in FAR with increasing threshold than was seen for the 6-h duration (Fig. 9c); this stems from the relatively more frequent false alarms at the rare ARIs (100-yr ARI) for the 1-h duration compared to the 6-h duration. HRRR QPF exceedances of ARIs correspond better with FFRs than MRMS exceedances of ARIs at the 1-h duration (Fig. 9d). In general, HRRR QPF exceedances do not correspond to FFRs as well as the QPE datasets, which is not a surprising result given that it is a forecast rather than an observational estimate.

b. Regional correspondence variations

To extend our analysis, we evaluated the degree of correspondence between exceedances and FFRs in the eight CONUS regions shown in Fig. 10. For the sake of space, we only show performance diagrams for three of the CONUS regions, although results for all regions are summarized later in this section.

Fig. 10.

Map of CONUS showing the eight regions used for correspondence evaluation. Dark gray lines are NWS county warning area boundaries, and blue lines are RFC boundaries.


Figure 11 shows performance diagrams for the southwest (SW) region (shown in Fig. 10). In this region, we see a much more pronounced difference between stage IV and MRMS for all thresholds and both durations (Fig. 11), with MRMS comparisons exhibiting a much higher frequency bias than seen in the CONUS results (cf. Fig. 9). Comparing frequency bias between stage IV and MRMS for the 25.4 mm (1 h)−1 threshold (Fig. 11b, uppermost points on the red and green curves), it is seen that MRMS contains ∼5 times as many exceedances as stage IV and ∼5 times as many exceedances as FFRs. HRRR QPF exceedances of this threshold, on the other hand, have a frequency bias near 1 when compared against FFR occurrences. In general, for the southwestern CONUS, we see that HRRR QPF is competitive with the QPE datasets in terms of correspondence with FFRs. In fact, HRRR 1-h QPF exceeding the 2-yr ARI has the highest CSI of any comparison for this region (Fig. 11d). These results are in agreement with previous studies documenting that our ability to model precipitation in sparsely observed mountainous regions is overtaking the capabilities of our observations (Lundquist et al. 2019). These results can provide context for forecasters interpreting QPE datasets and CAM QPF in the southwestern United States.

Fig. 11.

As in Fig. 9, but for the southwestern CONUS, and showing (a),(b) fixed, (c),(d) ARI, and (e),(f) FFG thresholds for (left) 6- and (right) 1-h durations.


Figure 12 shows performance diagrams for the northeast (NE) region. This region is characterized by some of the highest overall correspondence between QPE exceedances and FFRs. In particular, the CSI of both 6- and 1-h exceedances of all three types of thresholds approaches 0.25 (Fig. 12). Compared with the CONUS-wide results (Fig. 9), this relatively higher CSI is due to a shift toward higher success ratio (lower false alarm ratio) in the NE region. The relative dearth of false alarm events could be due to the region’s relatively high population density, meaning that most flash flood events are reported. We also see that FFG comparisons do not provide much advantage over simpler thresholds in this region (Figs. 12e,f).

Fig. 12.

As in Fig. 9, but for the northeastern CONUS, and showing (a),(b) fixed, (c),(d) ARI, and (e),(f) FFG thresholds for (left) 6- and (right) 1-h durations.


Figure 13 shows performance diagrams for the southeast (SE) region. Comparing Figs. 12 and 13, there are many more false alarm events (cases in which a QPE exceedance indicates an event but no FFR is received), and thus a much lower success ratio, in the SE than in the NE. This indicates that a flash flood is less likely to accompany a given threshold exceedance in the SE than in the NE. We also see that the highest FFG thresholds considered here (Figs. 13e,f) have a higher CSI than seen at the CONUS scale (Fig. 9), related mostly to a shift toward higher POD. This indicates that events in this region can produce 5 times the FFG value over a 1- or 6-h period. The highest CSI for the 5 FFG threshold is seen for the 6-h stage IV QPE (CSI of 0.17). The relative frequency of 5 FFG events in this region is an indicator of the occurrence of deep tropical moisture, which contributes to excessive rainfall. Tropical cyclones play a role in the statistics for the SE region, although it is difficult to distinguish their signature from that of other forms of convection in regimes with deep tropical moisture.

Fig. 13.

As in Fig. 9, but for the southeastern CONUS, and showing (a),(b) fixed, (c),(d) ARI, and (e),(f) FFG thresholds for (left) 6- and (right) 1-h durations.


To summarize our quantitative comparison between QPE/QPF exceedances of various thresholds and FFRs, Fig. 14 shows the best-corresponding thresholds for stage IV, MRMS, and HRRR QPF. For this evaluation, a threshold is considered optimal when it has the highest ETS while maintaining a frequency bias between 0.5 and 2 (i.e., between half and twice as many exceedance events as FFRs). Thresholds are colored green according to ETS, with darker colors associated with higher ETS, following the convention of HS18. For the CONUS as a whole, FFG exceedances emerge with the best correspondence to FFRs for all three datasets for the 1-h duration, and for MRMS and HRRR for the 6-h duration, which is encouraging given the additional information provided by FFG. For stage IV exceedances, the 2-yr ARI threshold has a slightly higher ETS than any FFG threshold.
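The selection rule, highest ETS subject to a frequency bias between 0.5 and 2, can be sketched as follows; the candidate thresholds and scores are entirely illustrative, not values from this study:

```python
# Each candidate: (threshold label, ets, frequency_bias) -- hypothetical values.
candidates = [
    ("0.75 FFG", 0.08, 3.1),
    ("1.0 FFG",  0.12, 1.6),
    ("1.5 FFG",  0.11, 0.9),
    ("2-yr ARI", 0.10, 0.6),
]

# Discard thresholds whose bias falls outside [0.5, 2], then take the
# highest-ETS survivor, mirroring the rule used to build Fig. 14.
admissible = [c for c in candidates if 0.5 <= c[2] <= 2.0]
best = max(admissible, key=lambda c: c[1])
print(best[0])  # → 1.0 FFG
```

The bias constraint matters: without it, a very low threshold with an inflated event count could win on ETS alone while grossly overpredicting flash floods.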

Fig. 14.

As in Fig. 8, but comparing the highest ETS thresholds for stage IV, radar-only MRMS, and HRRR for each region with a frequency bias falling between 0.5 and 2 for (a) 6- and (b) 1-h durations. The numbers and shading correspond to ETS multiplied by 100.


Regionally, correspondence with FFRs is greatest in the eastern half of the CONUS (Fig. 14), in agreement with Fig. 7. The lowest correspondence is seen for the Pacific coast (PCST) and Rockies (ROCK) regions, largely due to the relative infrequency of FFRs in these regions (cf. Fig. 1). Some interesting patterns emerge regionally in terms of the optimal 6-h thresholds to use for correspondence with FFRs (Fig. 14a). FFG comparisons become inferior to fixed and ARI thresholds in parts of the central and western United States for the 6-h duration, with FFGs not corresponding best for any QPE or QPF dataset in the SW or ROCK regions. ARIs emerge as the best thresholds to use for all datasets in the SW region. Fixed 6-h thresholds find utility for several regions, despite their simple formulation; they have the best correspondence to FFRs for QPE (stage IV or MRMS) exceedances in the southern Great Plains (SGP), for MRMS exceedances in ROCK, and for HRRR QPF exceedances in PCST. Stage IV 6-h exceedances emerge with the top correspondence in all regions but SW and ROCK (where MRMS exceedances have better correspondence with FFRs).

Results for the 1-h duration are similar but with some interesting differences (Fig. 14b). FFG comparisons emerge as the best threshold more often at this short duration, with fixed thresholds finding utility only in the ROCK region. This likely reflects the stronger influence of hydrologic conditions on short-duration flooding; this was also found by GV21 for the southeastern CONUS considering FFG ratios of 1–2 (see their Fig. 4d). ARI comparisons are the best threshold to use only in the SW region, for stage IV exceedances in the NE, and for HRRR exceedances in the PCST. Again, stage IV emerges as the top dataset in all regions but SW and ROCK. MRMS exceedances have the best correspondence with FFRs in the ROCK region, and HRRR 0–1-h QPF exceedances correspond best in the SW region (consistent with results shown in Fig. 11). HRRR 0–1-h QPF exceedances also correspond better with FFRs than stage IV QPE exceedances for both the 1- and 6-h durations in the ROCK region.

5. Considerations for flash flood forecasting decisions

The results shown above describe the population of heavy rainfall events for the 1- and 6-h durations among a number of different datasets. Most of these datasets have too much latency to be useful in real-time flash flood warning decisions; thus, the MRMS QPE is widely used in operations for making these decisions, with situational awareness enhanced by considering running accumulations starting and ending every 2 min. The original purpose of this research was to identify proxies that are relevant to CAM forecast output, but the results may also have implications for real-time flash flood forecast decisions, which are discussed below.
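A running accumulation of this kind amounts to a sliding-window sum over the 2-min MRMS bins. The sketch below uses a synthetic series and a shortened window (a real 1-h window spans 30 two-minute bins) to show why running windows can capture rainfall peaks that nonoverlapping windows split:

```python
import numpy as np

# Synthetic 2-min accumulations (mm); values are illustrative only.
two_min = np.array([0.0, 0.5, 1.2, 2.0, 1.1, 0.3, 0.0, 0.0])

# Running accumulation: a sliding-window sum ending every 2 min.
window = 4
running = np.convolve(two_min, np.ones(window), mode="valid")

# Nonoverlapping windows can split a burst of rain across two periods,
# so their maximum understates the true short-duration peak.
fixed = two_min.reshape(-1, window).sum(axis=1)
print(running.max(), fixed.max())  # running peak exceeds the fixed-window peak
```

This is the mechanism behind GV21's larger event counts relative to the nonoverlapping accumulations used here.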

Beginning again with the CONUS-wide results (Fig. 9), we find that 6-h MRMS QPE exceedances exhibit higher CSI than the 1-h MRMS QPE exceedances (cf. green curves between left and right panels in Fig. 9). This result agrees with the findings of HS18 (their Fig. 15) and of GV21 (their Figs. 2–4), indicating that it is a robust result not dependent on the choice to consider (or exclude) overlapping time windows. This suggests that MRMS QPEs with accumulation intervals longer than 1 h should not be discounted for real-time flash flood situational awareness. We note that this result does not hold in the southwestern region (Fig. 11; cf. Figs. 2–4 of GV21), where 1-h MRMS QPE exceedances generally have the highest degree of correspondence with FFRs.

Our finding that, averaged over the entire CONUS, FFG comparisons provide the best correspondence with FFRs for MRMS QPE agrees with GV21. HS18 did not come to this conclusion, but they limited their evaluation to ratios of 100% of FFG, restricting its utility in this framework. Both in this study and in GV21, comparison of MRMS QPE with FFG ratios of greater than 100% provides better correspondence with FFRs. This result, reflected also in some regional evaluations (e.g., Lincoln and Marquardt 2023), underscores the operational importance of considering FFG ratios other than 100% and highlights the potential for FFG to be improved in the future.

A number of studies have evaluated the correspondence of QPE exceedances of various thresholds to observed flash flood indicators in certain regions (e.g., Lincoln and Thomason 2018; Hammond 2018; Lincoln and Marquardt 2023). However, many of these studies used flash flood indicators ranging in severity from minor creek overflows to catastrophic flash flooding. Because this study uses FFRs as the dataset of observed flash floods, quantitative comparison with these earlier studies is challenging.

Lincoln and Thomason (2018) examined correspondences in the lower Mississippi Valley region, which is a portion of our SE region (Fig. 10). They focused on the 3-h duration but found that the 2-yr ARI captured ∼90% of FFRs. For the 1-h MRMS QPE, we find that the 2-yr ARI threshold leads to a POD of ∼40% (Fig. 13d); this large difference is attributable to their use of running accumulation periods, as well as their small sample size (24 flash flood events from 2012 to 2016).

Lincoln and Marquardt (2023) evaluated correspondence in the western Great Lakes region, which is a portion of our Midwest (MDWST) region (Fig. 10). They describe their operational procedure to simultaneously view four MRMS/FLASH products for flash flood warning decision-making. They recommend a fixed 1-h threshold of 2 in. (50.8 mm) (1 h)−1 (see their Table 15), which agrees well with our best fixed 1-h threshold of 38.1–50.8 mm (1 h)−1 for MRMS QPE in the MDWST (not shown). Their recommended ARI threshold (8 years) and FFG threshold (145%) also agree reasonably well with our findings.

GV21 included the southeast and southwest regions in their evaluation (see their Figs. 2–4), providing an opportunity to compare further with our results. For the southwest, they find the highest ETS using a fixed threshold of 2 in. (50.8 mm) (1 h)−1 or 2.5 in. (63.5 mm) (6 h)−1, which agrees reasonably well with our 50.8 mm (1 h)−1 or 50.8 mm (6 h)−1 (Figs. 11a,b). They find the highest ETS for the 100-yr 1-h ARI and the 50-yr 6-h ARI, somewhat higher than our optimal 50-yr 1-h ARI (Fig. 11d) and 10–25-yr 6-h ARI (Fig. 11c). Finally, in terms of optimal FFG ratios, they find the highest ETS at 200% of the 1-h FFG and 75% of the 6-h FFG, which agrees well with our findings of 150%–200% of the 1-h FFG (Fig. 11f) and 50% of the 6-h FFG (Fig. 11e).

For the southeast region, GV21 found the highest ETS using a fixed threshold of 3 in. (76.2 mm) (1 h)−1 and 5 in. (127 mm) (6 h)−1, agreeing relatively well with our optimal thresholds of 63.5 mm (1 h)−1 and 88.9–101.6 mm (6 h)−1. For optimal ARI thresholds, they found a 1-h 30-yr ARI or a 6-h 15-yr ARI; we found the best CSI at the 1-h 10-yr ARI and the 6-h 5-yr ARI. In terms of optimal FFG ratios, they found the highest ETS at 150% of 1-h FFG and 500% of 6-h FFG. We found optimal CSI at 100%–150% of 1-h FFG and 200% of 6-h FFG.

Given GV21’s use of running accumulation periods, among other differences in methodology, it is noteworthy that our results related to FFG agree as well as they do. This provides some confidence in the use of the thresholds provided here for flash flood situational awareness, although users should err on the high side (closer to GV21) for warning decisions due to the running accumulation period.

6. Discussion and conclusions

The correspondence of the QPF/QPE dataset exceedances of precipitation thresholds with occurrences of flash floods is a complicated relationship. There are many reasons why we would not expect perfect correspondence, even with a somewhat sophisticated threshold such as FFG. However, the framework introduced by HS18 provides a way of quantitatively evaluating QPE datasets and thresholds for their relative value in flash flood analysis and forecasting because they are the tools available to operational forecasters. In this article, we have extended the analysis of HS18 to a longer time period and included, in the same framework, QPF from a state-of-the-art CAM.

A key finding from this study, consistent with previous work, is that dramatic uncertainties persist in QPE, particularly in sparsely observed regions of the United States. The major differences in the population of heavy precipitation events between QPE datasets are concerning given that these datasets are routinely used for many purposes, including model evaluation. As an example, Fig. 5 shows that, even in a relatively well-observed region like central South Carolina, 6-h CCPA contains 20–30 events exceeding 76.2 mm (6 h)−1 during the 7-yr period (Fig. 5d), whereas MRMS contains more than 70 (Fig. 5g). The more frequent occurrence of heavy precipitation in MRMS also manifests as higher thresholds providing the best correspondence with FFRs in the framework of this study. These uncertainties pose major challenges for both the research and operational communities, although it is important to note that our framework cannot determine which QPE dataset is more accurate.

In agreement with HS18, we find that the skill of correspondences generally is highest in the eastern United States, with lower skill in the west. We also find the same recurring deficiencies and biases reported by HS18, including the high bias of stage IV in the interior western United States, and the dependence of 1-h MRMS QPE upon proximity to radars. Consistent with HS18, we find that MRMS generally outperforms stage IV and CCPA in terms of FFR correspondence in the western United States for the 6-h duration but with a much greater frequency of events. Stage IV exceedances have the highest correspondence with FFRs in the eastern United States. We find that, at the 1-h duration, stage IV exceedances have the best correspondence for almost every region. Exceedances for the 6-h duration have better correspondence with FFRs in all regions except the SW and PCST, where 1-h durations have higher correspondence.

In terms of thresholds, FFG is the best threshold for correspondence with FFRs for most dataset/region combinations and for both 1- and 6-h durations (Fig. 14). This is true for the CONUS as a whole as well, in contrast to HS18’s finding that fixed thresholds provided the best correspondence for the CONUS; the difference stems from our consideration of ratios of FFG above and below unity. The correspondence of FFG exceedances with FFRs is encouraging, demonstrating the value of the dynamic FFG as a threshold for flash flooding onset. There are, however, some interesting exceptions to this result. In the ROCK region, FFGs do not provide the best correspondence for any dataset. For the 1-h duration, fixed thresholds provide the best correspondence for all datasets, with the specific threshold ranging from 25.4 mm (1 h)−1 for HRRR QPF to 50.8 mm (1 h)−1 for MRMS QPE (also illustrating the high frequency of exceedances in MRMS 1-h QPE). At the 6-h duration, for the ROCK region, relatively high ARI thresholds of 25 years (stage IV) and 50 years (HRRR) provide the best correspondence. Overall, the ROCK region features the second-lowest correspondence between exceedances and FFRs.

The lowest correspondence is seen for the PCST region. This region is noteworthy for the relatively high thresholds that provide the best correspondence with FFRs, including twice FFG for stage IV and MRMS 1-h QPE, the 100-yr ARI for 1-h HRRR QPF (Fig. 14b), and a fixed threshold of 101.6 mm (6 h)−1 for HRRR 0–6-h QPF. The need for very high precipitation thresholds to obtain optimal (although still very poor) correspondence with FFRs stems from the rarity of FFRs in this region (Fig. 1); the requirement for a frequency bias between 0.5 and 2 in Fig. 14 necessitates using an extremely trimmed-down set of exceedances for any of the datasets shown here.

HRRR forecasts are evaluated here in the same framework as the QPE datasets, and as expected, HRRR QPF exceedances generally have inferior correspondence with FFRs at the CONUS scale for the 1- and 6-h durations. FFG is the best threshold against which to compare HRRR forecasts, for both 6- and 1-h QPF. However, in certain poorly observed regions like the SW, HRRR exceedances correspond better with FFRs than any QPE exceedance for the 1-h duration (Fig. 14b). This reflects the relative skill of the HRRR in predicting short-duration excessive rainfall events, set against the relative lack of radar observations in this region. These results argue for considering model QPF when determining a best estimate of QPE in regions of complex terrain and/or sparse observations.

As noted by HS18, it is important to acknowledge the problems associated with the FFR dataset. FFRs have an inherent low bias in rural regions and during the night and are also subject to reporting differences between WFOs (with variability in what is considered a flash flood). It is likely that the true number of flash flood events is somewhat higher than that indicated by the FFR dataset, indicating that QPE/threshold comparisons featuring a frequency bias above unity may actually be superior to those with a bias of unity.

Another important issue to note is the limitation of our analysis to nonoverlapping hourly 1-h QPEs and 6-h QPEs between synoptic times; as demonstrated by GV21 and Schumacher and Herman (2021), this has the effect of reducing the number of events. Extending the MRMS analysis to include running accumulations ending at off-hour times would be informative, but comparison with the other datasets would not then be possible.

This study (as well as HS18) has highlighted the regionally varying relationships between QPE/QPF and flash flood events. These variations are somewhat analogous to the varying updraft helicity (UH) thresholds used in predicting severe weather (Loken et al. 2020) and are an important consideration in the use of any QPE-based dataset for training a machine learning system to predict flash flooding (e.g., Hill and Schumacher 2021; Schumacher et al. 2021). Machine learning flash flood prediction systems should be trained on QPE datasets and thresholds appropriate to their application: flash flood warning decision support requires high-temporal-resolution QPE and running accumulation periods, while longer-range forecasting systems could use coarser temporal resolution QPE datasets. Work also is underway to evaluate probabilistic QPFs from the High-Resolution Ensemble Forecast (HREF; Roberts et al. 2020) system in this framework; these results will be reported in a subsequent manuscript.

Our results highlight that flash flood forecasting is an inherently probabilistic problem. Uncertainties are present in both the forcing (QPF) and the response (hydrology, or the threshold for flooding) components of the prediction problem, and state-of-the-art flash flood prediction systems need to approach the forecast from this perspective. The use of the probabilistic FLASH system (Gourley et al. 2017) in combination with ensemble forecasts from the Warn-on-Forecast System (WoFS; Stensrud et al. 2009, 2013) is one such example, tested recently at the Hydrometeorology Testbed (Martinaitis et al. 2023). The use of convection-allowing ensemble systems, in combination with increasingly advanced hydrologic modeling, will continue to advance the skill of probabilistic flash flood forecasts in the coming years.
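In this probabilistic framing, the simplest derived forecast quantity is the fraction of ensemble members whose QPF exceeds a threshold such as FFG at a grid point. The member values and FFG value below are synthetic:

```python
import numpy as np

# Synthetic ensemble 6-h QPF at one grid point (mm); values are illustrative
members = np.array([20.0, 35.0, 55.0, 60.0, 80.0, 42.0, 51.0, 90.0, 30.0, 65.0])
ffg_6h = 50.0  # illustrative 6-h flash flood guidance value (mm)

# Probability of FFG exceedance = fraction of members above the threshold
p_exceed = (members > ffg_6h).mean()
print(f"P(QPF > FFG) = {p_exceed:.1f}")
```

Operational systems refine this raw relative frequency with neighborhood aggregation and calibration, and the hydrologic threshold itself can be treated as uncertain, but the exceedance fraction is the basic building block.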

Acknowledgments.

This work, completed as part of the first author’s PhD dissertation, was supported by the NOAA Disaster Related Appropriations supplemental project IFAA-1A-1-1a and by the NOAA Joint Technology Transfer Initiative Grant NA21OAR4590187. The authors gratefully acknowledge Mike Erickson (CIRES/WPC) for his assistance in obtaining flash flood guidance data.

Data availability statement.

FFRs were obtained from the Iowa Environmental Mesonet. Gridded ARIs, with estimates in the northwestern United States where NOAA Atlas 14 is unavailable, were provided by Greg Herman. QPE, QPF, and FFG datasets were obtained from the NOAA High Performance Storage System (HPSS). All data used in this study are posted on Dryad (https://doi.org/10.5061/dryad.gxd2547rt).

REFERENCES

  • Ahmadalipour, A., and H. Moradkhani, 2019: A data-driven analysis of flash flood hazard, fatalities, and damages over the CONUS during 1996–2017. J. Hydrol., 578, 124106, https://doi.org/10.1016/j.jhydrol.2019.124106.

  • Barthold, F. E., T. E. Workoff, B. A. Cosgrove, J. J. Gourley, D. R. Novak, and K. M. Mahoney, 2015: Improving flash flood forecasts: The HMT-WPC flood and intense rainfall experiment. Bull. Amer. Meteor. Soc., 96, 1859–1866, https://doi.org/10.1175/BAMS-D-14-00201.1.

  • Benjamin, S. G., and Coauthors, 2021: Stratiform cloud-hydrometeor assimilation for HRRR and RAP model short-range weather prediction. Mon. Wea. Rev., 149, 2673–2694, https://doi.org/10.1175/MWR-D-20-0319.1.

  • Bonnin, G. M., D. Martin, B. Lin, T. Parzybok, M. Yekta, and D. Riley, 2006: Version 3.0: Delaware, District of Columbia, Illinois, Indiana, Kentucky, Maryland, New Jersey, North Carolina, Ohio, Pennsylvania, South Carolina, Tennessee, Virginia, West Virginia. Vol. 2, Precipitation-Frequency Atlas of the United States, NOAA Atlas 14, 295 pp., https://www.weather.gov/media/owp/oh/hdsc/docs/Atlas14_Volume2.pdf.

  • Bytheway, J. L., and C. D. Kummerow, 2015: Toward an object-based assessment of high-resolution forecasts of long-lived convective precipitation in the central U.S. J. Adv. Model. Earth Syst., 7, 1248–1264, https://doi.org/10.1002/2015MS000497.

  • Bytheway, J. L., C. D. Kummerow, and C. Alexander, 2017: A features-based assessment of warm season precipitation forecasts from the HRRR model over three years of development. Wea. Forecasting, 32, 1841–1856, https://doi.org/10.1175/WAF-D-17-0050.1.

  • Calianno, M., I. Ruin, and J. J. Gourley, 2013: Supplementing flash flood reports with impact classifications. J. Hydrol., 477, 1–16, https://doi.org/10.1016/j.jhydrol.2012.09.036.

  • Clark, R. A., J. J. Gourley, Z. L. Flamig, Y. Hong, and E. Clark, 2014: CONUS-wide evaluation of National Weather Service flash flood guidance products. Wea. Forecasting, 29, 377–392, https://doi.org/10.1175/WAF-D-12-00124.1.

  • Dougherty, K. J., J. D. Horel, and J. E. Nachamkin, 2021: Forecast skill for California heavy precipitation periods from the High-Resolution Rapid Refresh model and the Coupled Ocean–Atmosphere Mesoscale Prediction System. Wea. Forecasting, 36, 2275–2288, https://doi.org/10.1175/WAF-D-20-0182.1.

  • Dowell, D. C., and Coauthors, 2022: The High-Resolution Rapid Refresh (HRRR): An hourly-updating convection-allowing forecast model. Part I: Motivation and system description. Wea. Forecasting, 37, 1371–1395, https://doi.org/10.1175/WAF-D-21-0151.1.

  • Downton, M. W., J. Z. Barnard Miller, and R. A. Pielke Jr., 2005: Reanalysis of U.S. National Weather Service flood loss database. Nat. Hazards Rev., 6, 13–22, https://doi.org/10.1061/(ASCE)1527-6988(2005)6:1(13).

  • English, J. M., D. D. Turner, T. I. Alcott, W. R. Moninger, J. L. Bytheway, R. Cifelli, and M. Marquis, 2021: Evaluating operational and experimental HRRR model forecasts of atmospheric river events in California. Wea. Forecasting, 36, 1925–1944, https://doi.org/10.1175/WAF-D-21-0081.1.

  • Erickson, M. J., J. S. Kastman, B. Albright, S. Perfater, J. A. Nelson, R. S. Schumacher, and G. R. Herman, 2019: Verification results from the 2017 HMT-WPC flash flood and intense rainfall experiment. J. Appl. Meteor. Climatol., 58, 2591–2604, https://doi.org/10.1175/JAMC-D-19-0097.1.

  • Erickson, M. J., B. Albright, and J. A. Nelson, 2021: Verifying and redefining the Weather Prediction Center’s excessive rainfall outlook forecast product. Wea. Forecasting, 36, 325–340, https://doi.org/10.1175/WAF-D-20-0020.1.

  • Gourley, J. J., and H. Vergara, 2021: Comments on “Flash flood verification: Pondering precipitation proxies.” J. Hydrometeor., 22, 739–747, https://doi.org/10.1175/JHM-D-20-0215.1.

  • Gourley, J. J., and Coauthors, 2013: A unified flash flood database across the United States. Bull. Amer. Meteor. Soc., 94, 799–805, https://doi.org/10.1175/BAMS-D-12-00198.1.

  • Gourley, J. J., and Coauthors, 2017: The FLASH project: Improving the tools for flash flood monitoring and prediction across the United States. Bull. Amer. Meteor. Soc., 98, 361–372, https://doi.org/10.1175/BAMS-D-15-00247.1.

  • Hammond, N., 2018: A comparison between 2016 flash flood observations and rainfall ARIs across the north-central United States. Preprints, 32nd Conf. on Hydrology, Austin, TX, Amer. Meteor. Soc., 42, ams.confex.com/ams/98Annual/webprogram/Paper326494.html.

  • Herman, G. R., and R. S. Schumacher, 2016: Extreme precipitation in models: An evaluation. Wea. Forecasting, 31, 1853–1879, https://doi.org/10.1175/WAF-D-16-0093.1.

  • Herman, G. R., and R. S. Schumacher, 2018a: Money doesn’t grow on trees, but forecasts do: Forecasting excessive precipitation with random forests. Mon. Wea. Rev., 146, 1571–1600, https://doi.org/10.1175/MWR-D-17-0250.1.

  • Herman, G. R., and R. S. Schumacher, 2018b: Flash flood verification: Pondering precipitation proxies. J. Hydrometeor., 19, 1753–1776, https://doi.org/10.1175/JHM-D-18-0092.1.

  • Hill, A. J., and R. S. Schumacher, 2021: Forecasting excessive rainfall with random forests and a deterministic convection-allowing model. Wea. Forecasting, 36, 1693–1711, https://doi.org/10.1175/WAF-D-21-0026.1.

  • Hou, D., and Coauthors, 2014: Climatology-calibrated precipitation analysis at fine scales: Statistical adjustment of stage IV toward CPC gauge-based analysis. J. Hydrometeor., 15, 2542–2557, https://doi.org/10.1175/JHM-D-11-0140.1.

  • Ikeda, K., M. Steiner, J. Pinto, and C. Alexander, 2013: Evaluation of cold-season precipitation forecasts generated by the hourly updating High-Resolution Rapid Refresh model. Wea. Forecasting, 28, 921–939, https://doi.org/10.1175/WAF-D-12-00085.1.

  • James, E. P., and Coauthors, 2022: The High-Resolution Rapid Refresh (HRRR): An hourly updating convection-allowing forecast model. Part II: Forecast performance. Wea. Forecasting, 37, 1397–1417, https://doi.org/10.1175/WAF-D-21-0130.1.

  • Kean, J. W., and Coauthors, 2019: Inundation, flow dynamics, and damage in the 9 January 2018 Montecito debris-flow event, California, USA: Opportunities and challenges for post-wildfire risk assessment. Geosphere, 15, 1140–1163, https://doi.org/10.1130/GES02048.1.

  • Lincoln, W. S., and R. F. L. Thomason, 2018: A preliminary look at using rainfall average recurrence intervals to characterize flash flood events for real-time warning forecasting. J. Oper. Meteor., 6, 13–22, https://doi.org/10.15191/nwajom.2018.0602.

  • Lincoln, W. S., and N. W. S. Marquardt, 2023: MRMS and FLASH thresholds for assessing flash flood potential in realtime for the western Great Lakes. Central Region Technical Attachment 23-01, 31 pp., https://www.weather.gov/media/crh/publications/TA/TA_2301.pdf.

  • Loken, E. D., A. J. Clark, and C. D. Karstens, 2020: Generating probabilistic next-day severe weather forecasts from convection-allowing ensembles using random forests. Wea. Forecasting, 35, 1601–1631, https://doi.org/10.1175/WAF-D-19-0258.1.

  • Lundquist, J., M. Hughes, E. Gutmann, and S. Kapnick, 2019: Our skill in modeling mountain rain and snow is bypassing the skill of our observational networks. Bull. Amer. Meteor. Soc., 100, 2473–2490, https://doi.org/10.1175/BAMS-D-19-0001.1.

  • Martinaitis, S. M., and Coauthors, 2023: A path towards short-term probabilistic flash flood prediction. Bull. Amer. Meteor. Soc., 104, E585–E605, https://doi.org/10.1175/BAMS-D-22-0026.1.

  • Nelson, B. R., O. P. Prat, D.-J. Seo, and E. Habib, 2016: Assessment and implications of NCEP stage IV quantitative precipitation estimates for product intercomparisons. Wea. Forecasting, 31, 371–394, https://doi.org/10.1175/WAF-D-14-00112.1.

  • NWS, 2023: Colorado flood safety and wildfire awareness week: Flash floods and warnings. NWS, accessed 7 September 2023, https://www.weather.gov/pub/FSWPW4flashfloodswednesday.

  • Perica, S., and Coauthors, 2011: California. Vol. 6, version 2.0, Precipitation-Frequency Atlas of the United States, NOAA Atlas 14, 233 pp., https://www.weather.gov/media/owp/oh/hdsc/docs/Atlas14_Volume6.pdf.

  • Perica, S., and Coauthors, 2013a: Midwestern States (Colorado, Iowa, Kansas, Michigan, Minnesota, Missouri, Nebraska, North Dakota, Oklahoma, South Dakota, Wisconsin). Vol. 8, version 2.0, Precipitation-Frequency Atlas of the United States, NOAA Atlas 14, 289 pp., https://www.weather.gov/media/owp/oh/hdsc/docs/Atlas14_Volume8.pdf.

  • Perica, S., and Coauthors, 2013b: Southeastern States (Alabama, Arkansas, Florida, Georgia, Louisiana, Mississippi). Vol. 9, version 2.0, Precipitation-Frequency Atlas of the United States, NOAA Atlas 14, 163 pp., https://www.weather.gov/media/owp/oh/hdsc/docs/Atlas14_Volume9.pdf.

  • Perica, S., S. Pavlovic, M. St. Laurent, C. Trypaluk, D. Unruh, D. Martin, and O. Wilhite, 2015: Northeastern States (Connecticut, Maine, Massachusetts, New Hampshire, New York, Rhode Island, Vermont). Vol. 10, version 2.0, Precipitation-Frequency Atlas of the United States, NOAA Atlas 14, 265 pp., https://www.weather.gov/media/owp/oh/hdsc/docs/Atlas14_Volume10.pdf.

  • Perica, S., S. Pavlovic, M. St. Laurent, C. Trypaluk, D. Unruh, and O. Wilhite, 2018: Texas. Vol. 11, version 2.0, Precipitation-Frequency Atlas of the United States, NOAA Atlas 14, 283 pp., https://www.weather.gov/media/owp/oh/hdsc/docs/Atlas14_Volume11.pdf.

  • Potvin, C. K., C. Broyles, P. S. Skinner, H. E. Brooks, and E. Rasmussen, 2019: A Bayesian hierarchical modeling framework for correcting reporting bias in the U.S. tornado database. Wea. Forecasting, 34, 15–30, https://doi.org/10.1175/WAF-D-18-0137.1.

  • Prein, A. F., C. Liu, K. Ikeda, S. B. Trier, R. M. Rasmussen, G. J. Holland, and M. P. Clark, 2017: Increased rainfall volume from future convective storms in the US. Nat. Climate Change, 7, 880–884, https://doi.org/10.1038/s41558-017-0007-7.

  • Qi, Y., S. Martinaitis, J. Zhang, and S. Cocks, 2016: A real-time automated quality control of hourly rain gauge data based on multiple sensors in MRMS system. J. Hydrometeor., 17, 1675–1691, https://doi.org/10.1175/JHM-D-15-0188.1.

  • Roberts, B., B. T. Gallo, I. L. Jirak, A. J. Clark, D. C. Dowell, X. Wang, and Y. Wang, 2020: What does a convection-allowing ensemble of opportunity buy us in forecasting thunderstorms? Wea. Forecasting, 35, 2293–2316, https://doi.org/10.1175/WAF-D-20-0069.1.

  • Roebber, P. J., 2009: Visualizing multiple measures of forecast quality. Wea. Forecasting, 24, 601–608, https://doi.org/10.1175/2008WAF2222159.1.

  • Schumacher, R. S., and G. R. Herman, 2021: Reply to “Comments on ‘Flash flood verification: Pondering precipitation proxies.’” J. Hydrometeor., 22, 749–752, https://doi.org/10.1175/JHM-D-20-0275.1.

  • Schumacher, R. S., A. J. Hill, M. Klein, J. A. Nelson, M. J. Erickson, S. M. Trojniak, and G. R. Herman, 2021: From random forests to flood forecasts: A research to operations success story. Bull. Amer. Meteor. Soc., 102, E1742–E1755, https://doi.org/10.1175/BAMS-D-20-0186.1.

  • Stensrud, D. J., and Coauthors, 2009: Convective-scale warn-on-forecast system: A vision for 2020. Bull. Amer. Meteor. Soc., 90, 1487–1500, https://doi.org/10.1175/2009BAMS2795.1.

  • Stensrud, D. J., and Coauthors, 2013: Progress and challenges with warn-on-forecast. Atmos. Res., 123, 2–16, https://doi.org/10.1016/j.atmosres.2012.04.004.

  • Stevenson, S. N., and R. S. Schumacher, 2014: A 10-year survey of extreme rainfall events in the central and eastern United States using gridded multisensor precipitation analyses. Mon. Wea. Rev., 142, 3147–3162, https://doi.org/10.1175/MWR-D-13-00345.1.

  • Tang, L., J. Zhang, M. Simpson, A. Arthur, H. Grams, Y. Wang, and C. Langston, 2020: Updates on the radar data quality control in the MRMS quantitative precipitation estimation system. J. Atmos. Oceanic Technol., 37, 1521–1537, https://doi.org/10.1175/JTECH-D-19-0165.1.

  • Weygandt, S. S., S. G. Benjamin, M. Hu, C. R. Alexander, T. G. Smirnova, and E. P. James, 2022: Radar reflectivity-based model initialization using specified latent heating (radar-LHI) within a diabatic digital filter or pre-forecast integration. Wea. Forecasting, 37, 1419–1434, https://doi.org/10.1175/WAF-D-21-0142.1.

  • Yussouf, N., K. A. Wilson, S. M. Martinaitis, H. Vergara, P. L. Heinselman, and J. J. Gourley, 2020: The coupling of NSSL warn-on-forecast and FLASH systems for probabilistic flash flood prediction. J. Hydrometeor., 21, 123–141, https://doi.org/10.1175/JHM-D-19-0131.1.

  • Zhang, J., and Coauthors, 2011: National mosaic and multi-sensor QPE (NMQ) system: Description, results, and future plans. Bull. Amer. Meteor. Soc., 92, 1321–1338, https://doi.org/10.1175/2011BAMS-D-11-00047.1.

  • Zhang, J., and Coauthors, 2016: Multi-Radar Multi-Sensor (MRMS) quantitative precipitation estimation: Initial operating capabilities. Bull. Amer. Meteor. Soc., 97, 621–638, https://doi.org/10.1175/BAMS-D-14-00174.1.

  • Zhang, J., L. Tang, S. Cocks, P. Zhang, A. Ryzhkov, K. Howard, C. Langston, and B. Kaney, 2020: A dual-polarization radar synthetic QPE for operations. J. Hydrometeor., 21, 2507–2521, https://doi.org/10.1175/JHM-D-19-0194.1.
  • Fig. 1.

    Number of FFRs received during 2015–21, on a 60 km × 60 km grid.

  • Fig. 2.

ARIs, derived primarily from NOAA Atlas 14, for (left) 1- and (right) 3-h durations (mm). Shown are (a),(b) 1-, (c),(d) 2-, (e),(f) 5-, (g),(h) 10-, (i),(j) 25-, (k),(l) 50-, and (m),(n) 100-yr ARIs. Additional details on the derivation of the ARIs are provided in the text.

  • Fig. 3.

    (a)–(f) Median and (g)–(l) 10th percentile FFG estimates over the 7-yr period of record, showing the actual threshold estimates in (a)–(c) and (g)–(i), as well as the equivalent ARIs in (d)–(f) and (j)–(l). Shown are (left) 1-, (middle) 3-, and (right) 6-h FFG values.

  • Fig. 4.

Exceedance counts of 6-h CCPA during 2015–21. Shown are (left) fixed thresholds, (middle) ARI thresholds, and (right) FFG ratio thresholds. Specifically, the panels show (a) 50.8, (d) 63.5, (g) 76.2, (j) 88.9, and (m) 101.6 mm (6 h)−1; (b) 1-, (e) 2-, (h) 5-, (k) 10-, and (n) 25-yr ARI thresholds for the 6-h duration; and (c) 0.75, (f) 1.0, (i) 1.5, (l) 2.0, and (o) 2.5 × FFG for the 6-h duration. A total of 2519 days are included in the analysis.
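The exceedance counts mapped in Figs. 4–6 can be sketched in a few lines of NumPy. This is an illustrative sketch only (the arrays below are synthetic stand-ins, not the paper's data or code): for each grid cell, count the days on which 6-h QPE meets or exceeds a fixed threshold, a gridded ARI threshold, or a ratio of the daily FFG.

```python
import numpy as np

# Synthetic stand-ins: daily 6-h QPE (mm) on an (ndays, ny, nx) grid,
# a static 5-yr ARI threshold grid (mm), and daily FFG grids (mm).
rng = np.random.default_rng(0)
qpe = rng.gamma(shape=0.5, scale=20.0, size=(100, 4, 5))  # stand-in QPE
ari_5yr = np.full((4, 5), 70.0)                           # stand-in 5-yr ARI
ffg = np.full((100, 4, 5), 50.0)                          # stand-in FFG

# Per-cell counts of days exceeding each style of threshold
fixed_count = (qpe >= 76.2).sum(axis=0)     # fixed 76.2 mm (6 h)^-1
ari_count = (qpe >= ari_5yr).sum(axis=0)    # ARI grid (broadcast over days)
ffg_count = (qpe >= 1.5 * ffg).sum(axis=0)  # 1.5 x FFG ratio threshold

print(fixed_count.shape)  # → (4, 5)
```

The same comparison pattern covers all three threshold families; only the right-hand side of the comparison changes (a scalar, a static grid, or a scaled daily grid).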

  • Fig. 5.

Exceedance counts of 6-h QPE and QPF during 2015–21. Shown are exceedances of (left) 76.2 mm (6 h)−1, (middle) the 6-h 5-yr ARI, and (right) 1.5 × 6-h FFG. Products shown are (a)–(c) stage IV, (d)–(f) CCPA, (g)–(i) radar-only MRMS, and (j)–(l) 0–6-h HRRR QPF. A total of 2204 days are included in the analysis.

  • Fig. 6.

Exceedance counts of 1-h QPE and QPF during 2015–21. Shown are exceedances of (left) 50.8 mm (1 h)−1, (middle) the 1-h 5-yr ARI, and (right) 1.5 × 1-h FFG. Products shown are (a)–(c) stage IV, (d)–(f) radar-only MRMS, and (g)–(i) 0–1-h HRRR. A total of 1227 days are included in the analysis.

  • Fig. 7.

Maps showing ETS for QPE/QPF exceedances of thresholds versus observed FFRs during 2015–21. Shown are exceedances of (left) 76.2 mm (6 h)−1, (middle) the 6-h 5-yr ARI, and (right) 1.5 × 6-h FFG. Products shown are (a)–(c) stage IV, (d)–(f) CCPA, (g)–(i) radar-only MRMS, and (j)–(l) 0–6-h HRRR QPF. A total of 2204 days are included in the analysis.

  • Fig. 8.

ETS (multiplied by 100) by dataset and threshold for (left) 6- and (right) 1-h durations. Shown are (a),(b) fixed thresholds, (c),(d) ARI thresholds, and (e),(f) ratios of FFG. Dataset/threshold combinations are color coded by ETS, with higher ETS shaded darker green. Results are for the 2015–21 period, with 1199 days included in the analysis. Verification is against FFRs; ETS (multiplied by 100) ranges from 0 to 100, with higher scores indicating better correspondence.
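The ETS tabulated in Fig. 8 is the standard equitable threat score (Gilbert skill score) computed from a 2 × 2 contingency table of threshold exceedances versus FFRs. A minimal sketch of the standard formula follows; the contingency counts in the example are illustrative, not values from the paper.

```python
def ets(hits, misses, false_alarms, correct_negatives):
    """Equitable threat score from a 2x2 contingency table.

    Like CSI, but discounts hits expected by chance given the
    marginal totals of forecasts and observations.
    """
    total = hits + misses + false_alarms + correct_negatives
    hits_random = (hits + misses) * (hits + false_alarms) / total
    return (hits - hits_random) / (hits + misses + false_alarms - hits_random)

# Illustrative counts: 40 hits, 20 misses, 30 false alarms, 910 correct negatives
print(round(100 * ets(40, 20, 30, 910), 1))  # → 41.7
```

Because of the chance correction, ETS rewards exceedance/FFR correspondence beyond what the base rate alone would produce, which is why it is used to compare threshold families across datasets.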

  • Fig. 9.

    Performance diagrams evaluating the degree of correspondence between QPE/QPF exceedances of (a),(b) fixed thresholds, (c),(d) ARIs, and (e),(f) FFG, and observed FFRs. The evaluation period is February 2015–December 2021 (1199 days included) for (left) 6-h durations and (right) 1-h durations. Thresholds shown are, from upper left to lower right of each panel, 25.4, 38.1, 50.8, 63.5, 76.2, 88.9, 101.6, 114.3, and 127 mm (6 h)−1; 1-, 2-, 5-, 10-, 25-, 50-, and 100-yr ARIs; and 0.25, 0.5, 0.75, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, and 5 × FFG. Curved lines from upper left to lower right in each panel correspond to 100 × CSI, while the dashed lines correspond to frequency bias.
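Each point on the performance diagrams in Figs. 9 and 11–13 is determined by the same 2 × 2 contingency table: success ratio on the x axis, probability of detection on the y axis, with CSI contours and frequency-bias lines overlaid. A minimal sketch of these standard definitions (the example counts are illustrative, not from the paper):

```python
def performance_diagram_coords(hits, misses, false_alarms):
    """Standard performance-diagram measures from a 2x2 contingency table."""
    pod = hits / (hits + misses)                    # probability of detection (y axis)
    sr = hits / (hits + false_alarms)               # success ratio (x axis)
    csi = hits / (hits + misses + false_alarms)     # critical success index
    bias = (hits + false_alarms) / (hits + misses)  # frequency bias
    return pod, sr, csi, bias

pod, sr, csi, bias = performance_diagram_coords(40, 20, 30)
print(round(pod, 3), round(sr, 3), round(csi, 3), round(bias, 3))
# → 0.667 0.571 0.444 1.167
```

The curved CSI contours follow from the identity 1/CSI = 1/POD + 1/SR − 1, and the dashed bias lines are rays of constant POD/SR, so a point's position encodes all four measures at once.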

  • Fig. 10.

    Map of CONUS showing the eight regions used for correspondence evaluation. Dark gray lines are NWS county warning area boundaries, and blue lines are RFC boundaries.

  • Fig. 11.

    As in Fig. 9, but for the southwestern CONUS, and showing (a),(b) fixed, (c),(d) ARI, and (e),(f) FFG thresholds for (left) 6- and (right) 1-h durations.

  • Fig. 12.

    As in Fig. 9, but for the northeastern CONUS, and showing (a),(b) fixed, (c),(d) ARI, and (e),(f) FFG thresholds for (left) 6- and (right) 1-h durations.

  • Fig. 13.

    As in Fig. 9, but for the southeastern CONUS, and showing (a),(b) fixed, (c),(d) ARI, and (e),(f) FFG thresholds for (left) 6- and (right) 1-h durations.