An extreme precipitation categorization scheme, used to temporally and spatially visualize and track the multiscale variability of extreme precipitation climatology, is applied over the continental United States. The scheme groups 3-day precipitation totals exceeding 100 mm into one of five precipitation categories, or “P-Cats.” To demonstrate the categorization scheme and assess its observational uncertainty across a range of precipitation measurement approaches, we compare the climatology of P-Cats defined using in situ station data from the Global Historical Climatology Network-Daily (GHCN-D); satellite-derived data from the Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA); gridded station data from the Parameter-Elevation Regression on Independent Slopes Model (PRISM); global reanalysis from the Modern-Era Retrospective Analysis for Research and Applications, version 2; and regional reanalysis from the North American Regional Reanalysis. While all datasets capture the principal spatial patterns of P-Cat climatology, results show considerable variability across the suite in frequency, spatial extent, and magnitude. Higher-resolution datasets, PRISM and TMPA, most closely resemble GHCN-D and capture a greater frequency of high-end P-Cats relative to the lower-resolution products. When all datasets are rescaled to a common coarser grid, differences persist with datasets originally constructed at a high resolution maintaining a higher frequency and magnitude of P-Cats. Results imply that dataset choice matters when applying the P-Cat scheme to track extreme precipitation over space and time. Potential future applications of the P-Cat scheme include providing a target for climate model evaluation and a basis for characterizing future change in extreme precipitation as projected by climate model simulations.
Extreme precipitation is associated with a multitude of societal and environmental impacts across the United States. Often accompanying severe weather events, including hurricanes, snowstorms, and atmospheric rivers, extreme precipitation poses a threat to property, agriculture, infrastructure, and human life while also playing a key role in the water budget (Kunkel et al. 2013). According to the 2017 National Climate Assessment (NCA) Climate Science Special Report, climate change is projected to alter the frequency, severity, and seasonality of extreme precipitation across the United States (Easterling et al. 2017). Climate change mitigation policies and adaptation initiatives are greatly influenced by societal vulnerabilities to climate impacts like those associated with extreme precipitation. Therefore, a comprehensive understanding of extreme precipitation, together with an intuitive way to track and project its change across space and time at impacts-relevant scales, is critical.
Climate model projections of future change in global precipitation generally follow the Clausius–Clapeyron relationship, which projects the atmosphere’s water-holding capacity to increase exponentially with temperature at roughly 7% per °C of warming (Allen and Ingram 2002; Trenberth et al. 2003; Pall et al. 2007). Consistent with these expectations, a number of studies have suggested that anthropogenic climate warming may have contributed to an increase in the probability and severity of recent notable heavy precipitation events over the United States, such as the September 2013 event in Colorado (Pall et al. 2017), the rainfall from Hurricane Harvey (Risser and Wehner 2017), and the August 2016 Louisiana event (Wang et al. 2016). However, the sign and magnitude of observed changes in extreme precipitation are not always immediately apparent from observational analysis at local through regional scales. This is due in part to the character of extreme precipitation varying considerably over space and time, making it difficult to detect an anthropogenic signal above natural variability (Easterling et al. 2000; O’Gorman and Schneider 2009). Furthermore, understanding observed and projected changes in the frequency and intensity of key mechanisms associated with extreme precipitation, such as tropical cyclones and atmospheric rivers, is still an area of active research (e.g., Knight and Davis 2009; Prat and Nelson 2013; Gao et al. 2015; Behrangi et al. 2016; Mahoney et al. 2016; Lamjiri et al. 2017).
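To make the scaling rate concrete, the following minimal sketch compounds a constant rate of 7% per °C over a given warming. The function name and the assumption of a constant compound rate are illustrative only and are not taken from the cited studies.

```python
def cc_scaling_factor(delta_t_c: float, rate_per_c: float = 0.07) -> float:
    """Multiplicative change in water-holding capacity for a warming of
    delta_t_c (in °C), assuming a constant compound rate (illustrative)."""
    return (1.0 + rate_per_c) ** delta_t_c


if __name__ == "__main__":
    # Compounding 7% per °C gives factors of about 1.07, 1.145, and 1.225.
    for warming in (1, 2, 3):
        print(warming, round(cc_scaling_factor(warming), 3))
```

Under this assumption, 3°C of warming corresponds to an increase in water-holding capacity of roughly 22%, slightly more than the 21% that a linear extrapolation would suggest.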
Several extreme precipitation indices have been developed and applied to a diverse set of datasets using a range of methods to examine changes in extreme precipitation over space and time (Zhang et al. 2011, and references therein). One example is a set of extreme indices developed by the Expert Team on Climate Change Detection and Indices (ETCCDI) as part of the World Climate Research Programme Project on Climate Variability and Predictability (Frich et al. 2002; Alexander et al. 2006). These indices are designed to address a broad range of global climate information needs ranging from the frequency of precipitation threshold exceedances to the maximum length of wet spells. Specific to the United States, precipitation extremes have been monitored using the U.S. Climate Extremes Index (Gleason et al. 2008) in addition to the U.S. Environmental Protection Agency’s climate indicator for annual heavy precipitation aggregated over the continental United States (CONUS; U.S. EPA 2016). While concise and useful, these monitoring approaches provide a great deal of climate information at broad global and national scales, but less information at local to regional scales. The regional variability in extreme precipitation can be large across a single climate region (e.g., the Northwest or Southeast); therefore, it is important that monitoring addresses the need for regional relevance while also providing a similarly high level of intuitive interpretability.
The ability to detect, analyze, and track changes in extreme precipitation is also heavily dependent on the reliability of observations, and a number of precipitation climatology and dataset intercomparison studies conducted at global and regional scales have highlighted differences among observational products (e.g., Adler et al. 2001; Guirguis and Avissar 2008). In situ station data are commonly accepted as a primary source and often used as a reference relative to other products. However, station observations are spatially heterogeneous and may be temporally inconsistent, creating observational gaps (Kidd et al. 2017). Satellite-based precipitation measurements, on the other hand, are spatially seamless regardless of in situ gauge density or quality; however, these datasets exhibit bias resulting from instrumental and algorithmic error (Sapiano and Arkin 2009; Chen et al. 2013; Behrangi et al. 2014a; Tan et al. 2016). Similarly, bias can be introduced to analysis products through data assimilation and model errors (Bukovsky and Karoly 2007; Bosilovich et al. 2008; Reichle et al. 2017), to gridded in situ products through spatial interpolation (Daly 2006), and simply from spatial resolution (Herold et al. 2017). Additionally, the high spatial and temporal variability characterizing precipitation extremes has been shown to result in exceedingly low agreement among a range of global precipitation measurement products (Donat et al. 2013). Because the dataset one uses has been shown to matter, it is critical to understand and, where possible, constrain observational uncertainty when monitoring and tracking precipitation extremes.
Here we present a climatology of an extreme precipitation categorization scheme as an intuitive way to interpret extreme precipitation climatology, variability, and change over space and time and evaluate its observational uncertainty across a range of datasets. The application of this scheme is motivated by the need for an intuitive, pointwise climate indicator for extreme precipitation that can be provided clearly at scales relevant to societal and environmental impacts. The broad and diverse range of extreme precipitation impacts makes the regional information provided by the indicator suitable for a wide range of interests, including scientists and practitioners, concerned with heavy precipitation climatology, variability, and change at local through CONUS levels. The approach, which is analogous to the familiar Saffir–Simpson hurricane intensity scale, assigns categories from one to five to extreme 3-day precipitation totals at each data point (grid cell or rain gauge). However, unlike the Saffir–Simpson scale, this approach is not designed to rank an individual storm event, but rather provide information at climate scales for pointwise magnitudes of heavy 3-day precipitation totals while being extensible across datasets, time, and space. This approach, adapted from the “R-Cat” categorization scheme first presented in Ralph and Dettinger (2012), can then be stratified by season, geographic subregion, or time period, while change in extreme event categories can be monitored across multiple spatial and temporal scales. By examining the observational uncertainty of this scheme, this study highlights both the utility of the approach as a means to depict the climatology of extreme precipitation, as well as what considerations should be made when choosing a reference dataset.
We apply the extreme precipitation categorization scheme to five datasets, each constructed using a different approach and all provided at a relatively high spatial resolution. All datasets used for the intercomparison are summarized in Table 1 and described in more detail below.
a. TRMM 3B42V7
Satellite-derived precipitation data are from NASA’s Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA) 3B42V7 product (Huffman et al. 2007; Huffman and Bolvin 2015). Prior to its decommissioning in 2015, TMPA was NASA’s flagship precipitation measurement product (Liu et al. 2012). TMPA is provided with a 3-hourly temporal and 0.25° latitude–longitude spatial resolution, globally from 50°N to 50°S latitude from 1998 to 2015. TMPA measurements are produced using microwave-calibrated infrared (IR) estimates from multiple geostationary Earth-orbiting and low-Earth-orbiting satellites (Huffman et al. 2007). The final precipitation estimates contain microwave-derived measurements and calibrated thermal IR-derived estimates. The spatial domain accounts for the tendency of microwave and IR estimates to lose skill at higher latitudes (Huffman et al. 2010). The 3B42V7 product incorporates monthly in situ gauge observations from the Global Precipitation Climatology Center and the Climate Assessment and Monitoring System for bias adjustment.
As a part of the Global Precipitation Measurement (GPM) mission, the Integrated Multisatellite Retrievals for GPM (IMERG) product was developed as an extension of TMPA after its decommissioning. IMERG data are provided at 0.1° latitude–longitude resolution every half hour between 60°N and 60°S latitude (Hou et al. 2014; Liu 2016). The GPM core observatory has a higher orbital inclination than TRMM (65° versus 35°), providing more extensive latitudinal coverage (Huffman et al. 2017). Additionally, more advanced instrumentation, including a higher-frequency radar, offers improved sensitivity to light precipitation as well as to snow and ice, allowing multiple phases of precipitation to be captured. IMERG integrates algorithms from TMPA, the Climate Prediction Center morphing technique, and Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks. As of the writing of this paper, IMERG extends from April 2014 to the present, but will be retro-processed to overlap the TRMM era. IMERG and TMPA are freely available via the GES DISC.
The Parameter-Elevation Regressions on Independent Slopes Model (PRISM) uses point data and a digital elevation model (DEM) to generate gridded precipitation data (Daly et al. 1994). We utilize the daily PRISM product, offered on a 0.04° latitude–longitude grid over the CONUS. The PRISM technique attempts to account for physiographic effects such as coastal proximity and orography using the linear regression between gauge measurements and the elevation of the gauge taken from a DEM (Daly et al. 1994, 2002, 2008). The gauge measurements used for interpolation were supplied by various sources including the U.S. National Weather Service Cooperative Observer Network and the Natural Resources Conservation Service daily snowpack telemetry gauges. Station network density relates to population density (Daly et al. 2007). The PRISM product is freely available from Oregon State University’s PRISM Climate Group portal.
The Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2), atmospheric reanalysis product provides 3-hourly precipitation estimates generated on a 0.5° × 0.625° latitude–longitude grid. MERRA-2 is the latest multiyear reanalysis product produced by NASA’s Global Modeling and Assimilation Office using the Goddard Earth Observing System version 5 (Molod et al. 2015; Gelaro et al. 2017; Reichle et al. 2017). This product corrects model generated precipitation estimates with observations, showing marked improvements upon its predecessor MERRA (Rienecker et al. 2011; Reichle et al. 2017). The method for merging observed precipitation into MERRA-2 assimilates aerosols and integrates MERRA-Land reanalysis for correction (Reichle et al. 2017). Estimates are further merged with precipitation generated by the MERRA-2 atmospheric general circulation model weighted according to latitude. MERRA-2 is freely available via the GES DISC.
The North American Regional Reanalysis (NARR) is based on the regional Eta model and its 3D variational data assimilation system, initialized from lateral boundary conditions provided by the National Centers for Environmental Information (NCEI; Mesinger et al. 2006). It is freely available through the National Oceanic and Atmospheric Administration’s Earth System Research Laboratory. This product is provided at a 3-hourly temporal resolution and a 32-km spatial resolution (Lin et al. 1999). Precipitation gauge observations are used to adjust atmospheric moisture and energy field estimates to improve model-derived precipitation fields.
In situ daily observations are from the NCEI Global Historical Climatology Network–Daily (GHCN-D) product (Menne et al. 2012). This dataset contains comprehensive in situ climatic data that have undergone extensive quality control procedures to limit internal, spatial, and temporal inconsistencies (Durre et al. 2010). For this study, only gauges reporting at least 90% of days over the period of 1998–2015 are included. The data are frequently updated and can be obtained freely via the web from NCEI.
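The 90% reporting requirement can be illustrated with a minimal sketch. The data layout (a dictionary mapping dates to daily totals, with missing days absent or set to None) and the function names are assumptions made for illustration; they do not reflect the GHCN-D file format or the processing code used in this study.

```python
from datetime import date, timedelta


def reporting_fraction(observations, start, end):
    """Fraction of days in [start, end] with a non-missing daily value.

    observations: dict mapping datetime.date -> precipitation (mm);
    missing days are either absent or stored as None (hypothetical layout).
    """
    n_days = (end - start).days + 1
    n_reported = sum(
        1
        for offset in range(n_days)
        if observations.get(start + timedelta(days=offset)) is not None
    )
    return n_reported / n_days


def keep_station(observations, start=date(1998, 1, 1),
                 end=date(2015, 12, 31), min_fraction=0.90):
    """True if the station reports on at least min_fraction of days."""
    return reporting_fraction(observations, start, end) >= min_fraction
```

A station would be retained when `keep_station` returns True for its record over 1998–2015.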
a. Extreme precipitation categorization scheme
Extreme precipitation totals are grouped into five categories, or “P-Cats,” according to their overall accumulated 3-day total. P-Cats are defined as follows using even 100-mm thresholds as intuitive bounds on each category. A 3-day total between 100 and 199 mm is assigned to P-Cat 1, between 200 and 299 mm to P-Cat 2, between 300 and 399 mm to P-Cat 3, between 400 and 499 mm to P-Cat 4, and 500 mm or greater to P-Cat 5 (Fig. 1). Three-day totals are defined as the sum of the accumulated precipitation for that day and the two preceding days such that if a P-Cat 4 is recorded on 4 January at a given location, the precipitation accumulated over 2–4 January totaled between 400 and 499 mm. This window is then moved forward by one day each time step so that the 3-day total for each day includes the sum of that day and the previous two. The P-Cat approach is a slightly modified version of the rainfall category or “R-Cat” approach introduced by Ralph and Dettinger (2012). The R-Cat scheme is used operationally by the Scripps Institution of Oceanography Center for Western Weather and Water Extremes (http://cw3e.ucsd.edu/) to categorize discrete rainfall events associated with atmospheric river (AR) landfalls over California. Here we use the term “P-Cat” to clarify that this scheme is not only geared toward rainfall, hence the more general “precipitation.” While similar to the R-Cat scale, the P-Cat approach offers an intuitive way to interpret and visualize extreme precipitation climatology across the CONUS, applied as an indicator of climate change and variability. Our P-Cats 2–5 are the same as R-Cats 1–4; however, we introduce a lower category to capture a wider geography of extreme precipitation and a greater diversity of the associated meteorological mechanisms. Multiday totals have been suggested as highly relevant to regional hydrologic impacts including flooding and landslides (Ralph and Dettinger 2012).
Furthermore, Ralph and Dettinger (2012) indicate that the 3-day window provides the best representation of major storms, with 2-day totals missing storms and 4-day periods revealing negligible differences to 3-day periods.
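The categorization described above can be sketched in a few lines. This is a minimal illustration rather than the code used in this study: the function names are hypothetical, each category is treated as a half-open 100-mm bin (for example, 100 mm up to but not including 200 mm for P-Cat 1) with totals of 500 mm or more assigned to P-Cat 5, which is an assumption about boundary handling, and missing daily values are not handled.

```python
def three_day_totals(daily_mm):
    """Trailing 3-day sums: entry i is day i plus the two preceding days.

    The first two entries are None because a full window is unavailable.
    """
    totals = []
    for i in range(len(daily_mm)):
        if i < 2:
            totals.append(None)
        else:
            totals.append(daily_mm[i - 2] + daily_mm[i - 1] + daily_mm[i])
    return totals


def p_cat(total_mm):
    """Return 0 (no P-Cat) for totals below 100 mm, otherwise 1 through 5."""
    if total_mm is None or total_mm < 100:
        return 0
    return min(int(total_mm // 100), 5)


if __name__ == "__main__":
    daily = [0, 120, 150, 140, 10, 0, 0]          # synthetic daily totals (mm)
    cats = [p_cat(t) for t in three_day_totals(daily)]
    print(cats)  # [0, 0, 2, 4, 3, 1, 0]
```

In the synthetic series, a single multiday storm produces P-Cats on four consecutive days because overlapping windows share the same heavy daily totals, which is why the scheme describes pointwise 3-day magnitudes rather than discrete storm events.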
We note that in using a fixed threshold we are capturing the most extreme 3-day totals defined relative to the CONUS, rather than relative to the climatology of the grid point or station where the P-Cat occurs. As such, some dry portions of the CONUS do not observe P-Cat events during the time period of our analysis while other wetter places experience relatively frequent P-Cats. While this can be viewed as a caveat, the set of fixed thresholds provides an intuitive way to view extreme precipitation climatology and track change in the magnitude of extreme precipitation over space and time. Furthermore, while not applied in this study, variants on the P-Cat approach could be developed that are regionally specific or customized for different datasets. In that sense, the threshold approach can also carry potential for novel climate model evaluation of extreme precipitation and assessment of projections of future changes.
b. Dataset comparison
To assess the effect of observational uncertainty on using the P-Cat approach we compare the magnitude and frequency of P-Cats across a five-dataset suite. Magnitude is assessed by comparing the maximum observed P-Cat at each data point while frequency is examined both through total P-Cat occurrence as well as the average number of P-Cats observed per year or season. Dataset comparisons are performed and summarized over the CONUS as well as over the seven multistate defined NCA regions (Fig. 2; Easterling et al. 2017). All comparison analyses are performed at the annual and seasonal scales with winter defined as December–February (DJF), spring as March–May (MAM), summer as June–August (JJA), and fall as September–November (SON). Comparisons are performed over the period of maximum overlap across all datasets, 1998–2015. Additionally, IMERG is compared with TMPA for the years of overlap (2014–15). In all analyses involving GHCN-D, the station data are used only for qualitative comparison to what can be considered ground truth.
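The two summary quantities, maximum P-Cat and mean P-Cats per year, can be sketched for a single data point as follows. The sketch assumes that a per-day sequence of P-Cat values (with 0 meaning no P-Cat) and a matching sequence of calendar years are already available; the function names are hypothetical.

```python
def max_p_cat(daily_cats):
    """Highest P-Cat recorded at a data point (0 if none occurred)."""
    return max(daily_cats, default=0)


def mean_p_cats_per_year(daily_cats, years):
    """Average number of days per year whose 3-day total reaches P-Cat 1+.

    daily_cats and years are parallel sequences, one entry per day.
    Years with no P-Cat days still count toward the average.
    """
    counts = {}
    for cat, year in zip(daily_cats, years):
        counts.setdefault(year, 0)
        if cat >= 1:
            counts[year] += 1
    if not counts:
        raise ValueError("no data supplied")
    return sum(counts.values()) / len(counts)


if __name__ == "__main__":
    cats = [0, 1, 2, 0, 4, 0]
    yrs = [1998, 1998, 1998, 1999, 1999, 1999]
    print(max_p_cat(cats), mean_p_cats_per_year(cats, yrs))  # 4 1.5
```

Seasonal frequencies would follow the same pattern, with the grouping key changed from year to season-year.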
Results for all the datasets are presented both on their native grid and on a common grid to assess the effect of spatial scale on P-Cat frequency and magnitude. Gridded datasets were rescaled, prior to assigning P-Cats, to a common 0.5° × 0.625° grid over the CONUS. This resolution matches that of the coarsest-resolution product included in the study, MERRA-2. To rescale each gridded product, the first-order conservative remapping technique introduced in Jones (1999) was used. Conservative remapping acts to maintain the areal average (Chen and Knutson 2008), unlike alternative methods such as bilinear, bicubic, or distance-weighted interpolation, and has been used in a number of studies (e.g., Nikulin et al. 2012; Kalognomou et al. 2013; Diaconescu et al. 2015). The spatial correspondence between the patterns of the regridded results is quantitatively summarized using Taylor diagrams, in terms of the centered root-mean-square difference (CRMSD), standard deviation, and correlation coefficient (Taylor 2001). To construct a Taylor diagram, one dataset must be chosen as the reference against which dataset similarities and differences are measured. In all Taylor diagrams here, PRISM is used as the reference dataset, chosen because it is the only gridded dataset based primarily on gauge data; however, this is not to say that PRISM is without bias.
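The three statistics summarized on a Taylor diagram can be computed as in the following minimal sketch. It assumes two fields flattened to equal-length sequences on the common grid, with only cells valid in both retained, and it applies no area weighting; both simplifications are assumptions made for illustration, and the function name is hypothetical.

```python
import math


def taylor_statistics(test, reference):
    """Return (std_test, std_ref, correlation, crmsd) for two fields.

    Uses population (divide-by-N) statistics, unweighted by grid-cell area.
    """
    if len(test) != len(reference) or len(test) == 0:
        raise ValueError("fields must be non-empty and of equal length")
    n = len(test)
    mean_t = sum(test) / n
    mean_r = sum(reference) / n
    dev_t = [x - mean_t for x in test]
    dev_r = [x - mean_r for x in reference]
    std_t = math.sqrt(sum(d * d for d in dev_t) / n)
    std_r = math.sqrt(sum(d * d for d in dev_r) / n)
    if std_t == 0 or std_r == 0:
        raise ValueError("correlation is undefined for a constant field")
    corr = sum(a * b for a, b in zip(dev_t, dev_r)) / (n * std_t * std_r)
    # Centered RMS difference: means are removed before differencing.
    crmsd = math.sqrt(sum((a - b) ** 2 for a, b in zip(dev_t, dev_r)) / n)
    return std_t, std_r, corr, crmsd
```

These quantities satisfy CRMSD² = σ_test² + σ_ref² − 2 σ_test σ_ref R, the law-of-cosines relationship that allows all three to be displayed on a single diagram (Taylor 2001).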
a. Annual precipitation climatology
As a first-order comparison of dataset precipitation climatology, annual mean precipitation is shown for each dataset on its native grid in Fig. 3. All datasets show similar general climatology patterns; however, using GHCN-D as a reference (Fig. 3a), considerable differences across the datasets emerge. First-order differences relate to the representation of the effect of topography on precipitation, with the high-resolution PRISM (Fig. 3b) best resembling GHCN-D over the mountainous West and the lowest-resolution MERRA-2 (Fig. 3e) showing the least detail. TMPA also has a notable dry bias relative to GHCN-D across the mountains of the Northwest despite its relatively high spatial resolution (Fig. 3c), likely due to limitations in the ability of TMPA to measure snowfall (Bharti and Singh 2015). NARR (Fig. 3d) has a broad dry bias over much of the Southeast compared with GHCN-D and the other three datasets. MERRA-2 is too coarse to resolve most details of individual mountain ranges; however, it does show some qualitative similarities with GHCN-D over the coastal Northwest and northern Rocky Mountains.
b. Maximum P-Cats
The maximum recorded P-Cats are presented for the full year (Fig. 4), for SON (Fig. 5), and for DJF (Fig. 6). Fall and winter are chosen for seasonal analysis because they are concurrent with the most widespread occurrence of heavy precipitation, spanning two primary meteorological mechanisms consistent with the findings in Kunkel et al. (2012): atmospheric rivers often associated with extratropical cyclones in the West in both seasons (Neiman et al. 2008a,b; Ralph and Dettinger 2011, 2012) and tropical systems in the Southeast in the fall (Knight and Davis 2009; Knutson et al. 2010; Kunkel et al. 2010). Results are summarized across seasons and subregions using Taylor diagrams in Fig. 7.
The spatial distribution of the maximum observed P-Cats in GHCN-D (Fig. 4a, analogous to Fig. 3 from Ralph and Dettinger 2012) generally resembles the precipitation climatology in Fig. 3, with the highest P-Cats coinciding with the highest annual rainfall. This is supported in the West by the prevalence of high-end P-Cats across the coastal mountain ranges, the Sierra Nevada and Cascade ranges, and the Transverse Ranges of Southern California. High-end P-Cats are also more prevalent in the Southeast stretching from Texas eastward to the Carolinas. The maximum P-Cats recorded during this period are generally lower across the Great Plains, the desert Southwest, and the interior western rain shadows.
All datasets capture the general pattern of relatively high P-Cats in the western mountains and Southeast, and low P-Cats over the Great Plains and Southwest. However, considerable differences are apparent in P-Cat extent and magnitude. For example, PRISM shows the most widespread P-Cats 4 and 5, likely due at least in part to its having the finest grid resolution and being constructed using gauge data. PRISM also shows a multitude of high-end P-Cats over the Southeast that the other datasets do not capture, possibly indicative of localized convective precipitation that can be captured by the relatively dense gauge network used to construct PRISM here. TMPA (Fig. 4c) also captures a greater occurrence of high-end extremes compared to NARR and MERRA-2 (Figs. 4d,e).
While regridding reduces some of the P-Cat magnitudes through spatial smoothing, some differences persist (right column of Fig. 4; i.e., regridded to MERRA-2 resolution). In the case that high resolution is necessary for capturing processes leading to extreme precipitation (Herold et al. 2017), such as localized convection, then it is possible that a high-resolution dataset will maintain some high-end totals compared with the coarser products. Potentially illustrative of this effect, PRISM maintains a relatively high number of P-Cats 2–4 after regridding (Fig. 4f). The same effect is apparent for TMPA over the Southeast and Northwest. In addition to spatial resolution, other factors may also be important in determining the level of agreement after interpolation, including differences in the ability of the analysis products to accurately capture land–atmosphere interaction or potential bias and overestimation in PRISM (Mesinger et al. 2006; Bharti and Singh 2015; Molod et al. 2015).
The Taylor diagrams in Figs. 7a and 7b summarize the dataset correspondence for the CONUS annually and seasonally and NCA subregions annually, respectively. At the seasonal scale (Fig. 7a), NARR and MERRA-2 show a lower spatial standard deviation across all seasons with TMPA generally exceeding PRISM. TMPA also has a greater spread in pattern correlation resulting in larger CRMSD values compared with NARR and MERRA-2, especially for DJF and MAM. Both NARR and MERRA-2 cluster closely at the CONUS scale across the seasonal cycle. Less spread is apparent at the subregion scale (Fig. 7b) with all datasets revealing similar spatial variance and correlation relative to PRISM.
In SON, the highest observed P-Cats captured by GHCN-D (Fig. 5a) are over the Pacific Northwest, central Texas, and the Gulf and Atlantic Coasts of the Southeast. P-Cats 1 and 2 are common throughout the higher elevations of the West and across the Midwest through the Northeast. Several examples of southwest to northeast oriented bands of P-Cat 2s are apparent in the central United States. For example, one band extends from northern Illinois to southeastern Michigan providing a useful baseline for comparing the details of the other datasets. In many cases, very high-end P-Cats can readily be traced to the contributing storm. For example, the high values over eastern North Carolina are the result of Hurricane Floyd that made landfall in September of 1999, resulting in catastrophic societal impacts (Easterling et al. 2000). The similarities between Figs. 4 and 5 over the Southeast indicate that most of the highest recorded P-Cats occur during SON.
Consistent with our previous findings, PRISM captures the greatest magnitude and spatial extent of high-end totals (Fig. 5b), sharing the most qualitative similarities with the GHCN-D results, including the collocation of the southwest to northeast oriented bands of P-Cat 2s across the Midwest and topographic enhancement in the West. These features are generally captured in the other datasets, although with lower magnitudes. In some cases, regional-scale details are not similar across the suite, especially in the case of the high-end P-Cats over the Southeast, where MERRA-2 and NARR show varying degrees of dissimilarity with the other datasets. As in Figs. 3 and 4, there is a close relationship between spatial resolution and P-Cat magnitude; however, even considering a systematic resolution-related bias, some fundamental differences persist.
After spatial interpolation, PRISM and TMPA maintain high-end totals over Washington and North Carolina (Figs. 5f,g). MERRA-2 and NARR generally show systematically lower P-Cat magnitudes relative to the regridded PRISM and TMPA, providing further evidence of factors other than resolution being influential on dataset agreement (Figs. 5e,h). In Fig. 7c dataset spread is small between MERRA-2 and NARR, especially across the variance ratio, while TMPA tends to exceed PRISM’s spatial variance in most subregions. Note that we omit results for Great Plains North because of its very low number of grid cells with P-Cats.
In DJF (Fig. 6), the overall spatial coverage of stations recording P-Cats is lower than SON, especially across the central United States. GHCN-D shows the most extreme precipitation occurring along the western mountains stretching from northern Washington to southern California and across the southern Midwest and Southeast (Fig. 6a). This is evidence that the intense precipitation from North Pacific extratropical cyclones is maximized by the orographic enhancement of landfalling atmospheric rivers (e.g., Neiman et al. 2008a,b; Guan et al. 2010, 2013; Ralph and Dettinger 2012). Across the eastern half of the CONUS, high-end P-Cats are the result of strong midlatitude cyclones that strengthen along the strong temperature gradients formed by southward excursions of Arctic air masses.
In agreement with GHCN-D, PRISM shows many of the high-end totals that occur across the West (Fig. 6b). TMPA’s limitations at capturing snowfall are apparent with considerable underestimation of the magnitude of P-Cats along the Sierra Nevada and Cascades (Fig. 6c). These results are consistent with Behrangi et al. (2014a), emphasizing the inherent challenges associated with measuring precipitation in remote regions, where station data are sparse, orography and finescale processes are key, and precipitation type limits the utility of TMPA retrievals. Substantial differences in the magnitude of P-Cats captured by NARR and MERRA-2 (Figs. 6d,e) suggest that grid resolution may inhibit the ability of a dataset to capture the impact of localized phenomena, although both datasets capture the broad patterns of P-Cats across the West and Southeast.
While regridding reduces the overall magnitude of P-Cat intensity in PRISM and TMPA, both datasets continue to show more P-Cats 2 and 3. Over the Southeast, resolution does not appear as important at capturing high-end P-Cats, which is consistent with the typical synoptic-scale storms that result in extreme precipitation here in winter. Additionally, this provides evidence that differences across datasets are also driven by dataset construction, and not solely a result of grid resolution. The Taylor diagram in Fig. 7d shows that TMPA exhibits a higher variance relative to PRISM over the Southeast and roughly the same in the Northwest, with all other datasets and subregions showing a slightly lower spatial variance than PRISM and pattern correlation coefficients between 0.9 and 0.99.
c. P-Cat frequency
As for the comparison of P-Cat magnitude in the above section, P-Cat frequency, computed as P-Cats per year or season, is compared across the entire year (Fig. 8), for SON (Fig. 9), and for DJF (Fig. 10). Differences across the data suite are also presented as biases, with reference to PRISM. Results are further summarized using Taylor diagrams in Fig. S1 in the online supplemental material.
The highest annual frequency of P-Cats in GHCN-D (Fig. 8a) generally corresponds spatially to the highest magnitude P-Cats in Fig. 4a. These areas include the Southeast and the mountains of the Pacific Northwest and California where annual P-Cat frequency exceeds 20. For reference, if a station shows an average frequency of 20 P-Cats per year, this would mean that on average 20 days of every year are part of a 3-day precipitation total that exceeds 100 mm. In such cases, heavy precipitation is relatively common and the simple occurrence of a P-Cat may not necessarily be considered highly extreme in a local climatological context. In contrast, a large swath of the eastern half of the domain experiences between 2 and 8 P-Cats annually, while P-Cats are infrequent across the High Plains and the inland West. P-Cats 1 and 2 make up the vast majority of P-Cats per year CONUS-wide with some areas of the West and Southeast recording as many as two high-end P-Cats per year (not shown).
All datasets capture similar principal spatial patterns of annual P-Cat frequency. Qualitatively, PRISM (Fig. 8e) most closely resembles GHCN-D, even capturing many of the small-scale features in areas of complex terrain and regional variations in the Southeast. TMPA (Fig. 8f) shows notable positive frequency bias across the eastern half of the CONUS and over the valleys of the coastal Northwest with lower frequencies across the western mountains, compared with PRISM. NARR and MERRA-2 both share similarities, with systematically lower P-Cat occurrence compared with PRISM after regridding. NARR shows a greater frequency of P-Cats across the Sierra Nevada compared with MERRA-2; however, both datasets show considerable negative frequency biases across most of the West.
During SON (Fig. 9), GHCN-D shows the highest frequency of P-Cat occurrence in the Northwest and Southeast with values exceeding 10 P-Cats per season along the coasts of Washington and Oregon and between 2 and 4 in southeast Texas and southwest Louisiana (Fig. 9a). This indicates that at least 4 days per fall are part of a 100-mm or greater 3-day precipitation total on average in these places. There are many commonalities between the frequency map in Fig. 9a and the maximum P-Cat map in Fig. 5a, with many of the regions that experience high values of one also experiencing high values of the other. However, this is not always the case in some parts of the South and along the Atlantic Coast of Florida where P-Cats are common, but rarely exceed P-Cat 2.
Consistent with expectations based on the above results, the observation-based TMPA and PRISM (Figs. 9b,e) share the most similarities with GHCN-D. PRISM captures the overall spatial patterns and frequency magnitudes, and it also resolves small-scale features such as higher frequencies over southeastern Texas. Over the Northwest, as in other analyses, TMPA’s limitation in capturing frozen precipitation likely contributes to its negative biases over the mountains (Behrangi et al. 2014a); however, it shows a weak positive frequency bias across the lower elevations of the coastal Northwest. NARR and MERRA-2 resemble each other, with a systematic low bias in frequency across the CONUS (Figs. 8f–h).
During winter (Fig. 10a), the P-Cat frequencies are highest across the mountains of Washington, Oregon, and California with elevated P-Cat frequencies also occurring in the higher elevations of Idaho, Utah, and Arizona. In contrast, the other area of high P-Cat occurrence is a broad swath of the South and southern Midwest where Gulf of Mexico moisture fuels heavy precipitation associated with midlatitude cyclones. PRISM (Fig. 10b) captures the mountain ranges across the West and the general pattern in the East (Fig. 10e). However, it underestimates the isolated high frequency P-Cats that GHCN-D captures over the higher terrain of Idaho and Utah. TMPA (Fig. 10b) resembles both PRISM and GHCN-D, but with substantial high frequency biases over the lower elevations of the West Coast and throughout the Southeast (Fig. 10f). A physical explanation for this widespread bias in TMPA is unclear as it is not consistent with findings from other seasons or at the annual scale. TMPA also shows negative biases along the immediate Pacific coast, suggesting frozen precipitation is not the only contributor to underestimation in the West. NARR and MERRA-2 are quite similar with overall negative frequency biases across the CONUS with the exception of some western valleys.
d. Annual P-Cat occurrence
Figures 11–13 show spatially aggregated P-Cat frequencies over time. Here we only show results for annual frequency at the CONUS scale, for DJF over the Northwest, and for SON over the Southeast to capture the regions and corresponding seasons where high-end P-Cats are most common. In each figure, the left column shows the number of P-Cats per category on the native grid of each dataset, while the right column represents the datasets interpolated to the MERRA-2 grid. This means that, all else being equal, the coarser-resolution datasets will have a lower frequency of P-Cat occurrence prior to regridding, simply because there are more data points in the high-resolution cases. In this sense, the left column is intended for qualitative comparison while the right column compares datasets with an equal number of data points.
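The effect of grid resolution on P-Cat counts can be illustrated with a toy example. This sketch uses simple block averaging as a stand-in for the interpolation to the MERRA-2 grid; the regridding method, the grid values, and the function names here are assumptions made only for illustration.

```python
def block_average(grid, factor):
    """Coarsen a 2D grid by averaging non-overlapping factor x factor blocks.

    Assumes both grid dimensions are divisible by `factor`.
    """
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for i in range(0, rows, factor):
        row = []
        for j in range(0, cols, factor):
            block = [grid[a][b]
                     for a in range(i, i + factor)
                     for b in range(j, j + factor)]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse

def count_exceedances(grid, threshold_mm=100.0):
    """Number of grid cells whose 3-day total exceeds the threshold."""
    return sum(1 for row in grid for value in row if value > threshold_mm)

# Hypothetical 3-day totals (mm) on a fine 4 x 4 grid
fine = [
    [150.0, 120.0, 10.0, 0.0],
    [110.0, 105.0, 5.0, 0.0],
    [20.0, 30.0, 160.0, 20.0],
    [10.0, 0.0, 30.0, 10.0],
]
coarse = block_average(fine, 2)

print(count_exceedances(fine), count_exceedances(coarse))
print(max(map(max, fine)), max(map(max, coarse)))
```

The fine grid has more cells above the threshold partly because it has more cells overall, and the isolated 160-mm maximum is averaged below 100 mm on the coarse grid. Both effects lower the count and the peak magnitude after coarsening.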
For most years the full range of P-Cats occurs somewhere over the CONUS according to GHCN-D (Fig. 11a). There is no apparent systematic trend in the frequency of any P-Cat occurrence across the CONUS and the datasets. Comparing each dataset to GHCN-D, datasets tend to show a similar evolution of interannual variability. For example, the year 2000 shows a relative minimum in P-Cat 2s in all datasets. Consistent with results from Figs. 4–10, high-end P-Cats are most common in PRISM (Fig. 11b) while they are uncommon in NARR and MERRA-2 (Figs. 11d,e). When compared on a common grid, P-Cat 1 frequencies are more comparable across the suite. PRISM (Fig. 11f) maintains a number of P-Cats 3 and 4 after regridding. The coefficients of variation for each P-Cat time series, computed as the standard deviation of each dataset’s annual frequency divided by its mean, are recorded in Table 2. All datasets show a greater year-to-year variability in higher-end P-Cats relative to lower-end P-Cats. For example, GHCN-D has a coefficient of variation for the annual frequency of P-Cat 5 that greatly exceeds that of P-Cat 1.
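As a minimal sketch of the coefficient of variation described above, the following computes the standard deviation of an annual frequency series divided by its mean. The use of the population standard deviation is an assumption (the text does not say whether the sample or population form was used), and the annual counts are invented for illustration.

```python
import statistics

def coefficient_of_variation(annual_counts):
    """Standard deviation of annual P-Cat counts divided by their mean.

    Uses the population standard deviation (an assumption).
    """
    mean = statistics.fmean(annual_counts)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(annual_counts) / mean

# Hypothetical annual counts for a common and a rare category
pcat1_counts = [100, 110, 90, 105, 95]
pcat5_counts = [0, 2, 0, 1, 2]

cv_pcat1 = coefficient_of_variation(pcat1_counts)
cv_pcat5 = coefficient_of_variation(pcat5_counts)
print(round(cv_pcat1, 3), round(cv_pcat5, 3))
```

Because rare categories have small means, a change of one or two events per year produces a large relative spread, which is consistent with the larger coefficients of variation reported for higher-end P-Cats.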
During SON over the Southeast (Fig. 12), GHCN-D shows a high number of P-Cats 4 and 5 occurring during 1998 and 1999 (Fig. 12a) with considerable interannual variability throughout the record. PRISM (Fig. 12b) continues to show the greatest number of high-end P-Cats compared with the other datasets. TMPA also captures higher-end P-Cats in the early part of the record (Fig. 12c), including 1999. NARR and MERRA-2 (Figs. 12d,e) show primarily P-Cats 1 and 2, with MERRA-2 showing some P-Cat 3s in 1998 and 1999 suggesting that it realistically represents the high-end totals captured in the finer resolution datasets but with diminished magnitude. This reduced magnitude of extremes likely results in part from the coarser reanalysis resolution, but differences may also stem from the dataset generating algorithms. When compared on a common grid, dataset agreement is much stronger, although NARR stands out as having the lowest P-Cat occurrence, and datasets capture similar interannual variability. The coefficient of variation results continue to show greater variability among the most extreme P-Cats across the five-dataset suite (Table 3).
Resolving topography is important for capturing P-Cats in DJF in the Northwest subregion (Fig. 13). GHCN-D and PRISM (Figs. 13a,b) show the most qualitative agreement, including in interannual variability, with NARR also sharing commonalities in year-to-year fluctuations (Fig. 13d). When compared on common grids, overall magnitudes of P-Cat 1 are in reasonable agreement across the suite; however, interannual variability is still somewhat different in TMPA (Fig. 13g) compared with PRISM and NARR (Figs. 13f,h). These results further suggest using caution when measuring and monitoring extreme precipitation across areas of complex terrain where orographic effects on precipitation are key and extremes are often associated with frozen precipitation. The datasets' annual P-Cat frequency results for DJF in the Northwest continue to show greater variability as the P-Cat category increases (Table 4).
e. Comparison of individual storms
The primary aim of implementing the P-Cat scheme in this study is to track and describe extreme precipitation climatology at the grid point scale. As another way to intercompare the five datasets, and to further demonstrate the reliability of the P-Cat approach in capturing extreme precipitation, we also show the P-Cat values associated with individual historically impactful storms. Figure 14 shows four examples while additional examples are provided in Figs. S2–S4. Examples were chosen to capture a wide range of storm types occurring across a diverse range of geographic areas. The top row shows the P-Cat values associated with the landfall of Hurricane Floyd in September of 1999. Note that the P-Cat values are based on the 3-day rainfall totals ending on the date specified to the right of each row. All datasets except NARR capture high-end P-Cats (3–5) over a similar region, whereas NARR shows relatively modest P-Cats. This indicates that all datasets with the exception of NARR are capable of capturing the magnitude of rainfall associated with this intense tropical system.
In the second row from the top, P-Cats from a notable atmospheric river event from November 2006 are shown with generally good qualitative agreement across the datasets despite the coarser datasets showing overall lower P-Cat magnitudes. There is some indication, however, that PRISM overestimates the magnitude of rainfall over the northern Oregon Coast Range and the Olympic Mountains. The third row from the top compares P-Cats for an intense winter storm that occurred during December 2015. All datasets capture the swath of P-Cats 1 and 2 extending from northeast Texas into central Illinois indicating reasonable qualitative agreement in the magnitude and extent of heavy precipitation from this powerful winter storm. Last, the bottom row shows a comparison for a strong mesoscale convective system that occurred in September 2004. While all datasets can capture the convective precipitation to the extent that it surpasses the P-Cat 1 threshold, TMPA shows a relatively larger area of P-Cats compared with the other datasets suggesting some difference in how it captures convective precipitation in this event. We also note that the P-Cat approach is capable of visualizing the propagation of weather events that produce heavy precipitation totals. Figure S5 shows 5-day evolutions of P-Cats for two hurricanes and one winter storm. The P-Cats show the temporal, spatial, and intensity evolution of the storms highlighting the efficacy of the 3-day total approach to capture observed heavy precipitation events.
f. IMERG intercomparison
Considering the potential benefits of using remote sensing to continuously monitor and track extreme precipitation over time, we compare IMERG data to its predecessor, TMPA, in Fig. 15. IMERG has only been online for a short time, prohibiting a comprehensive climatology intercomparison. We therefore leverage the overlap period available as of the writing of this paper (April 2014–December 2015), using the maximum observed P-Cats as well as total observed P-Cat frequency for comparison. Over this 2-yr period, there is some indication that IMERG captures more small-scale features and better represents extremes over the mountainous West (Fig. 15c). These results are likely attributable, at least in part, to IMERG’s higher spatial resolution, but may also be due to improvements in the ability of GPM sensors to measure snow (Hou et al. 2014). This qualitatively brings IMERG closer to GHCN-D, with exceptions. For example, IMERG does a poorer job of capturing the band of P-Cat 2s stretching from northeast Texas through Missouri compared with TMPA and overestimates P-Cat magnitude over eastern Tennessee and northern Alabama. P-Cat frequencies reveal similarities between TMPA and IMERG (Fig. 15d).
5. Summary and conclusions
Here we present a climatology of a categorization scheme for monitoring and tracking change in extreme precipitation over space and time and assess its observational uncertainty. The approach assigns a category between one and five to 3-day storm totals (Fig. 1). Intended as a way to track extreme precipitation as a climate indicator, this scheme provides a platform for monitoring change in extreme precipitation across scales, datasets, time, and geography. However, precipitation observation products are all associated with some degree of bias, making it important to understand and attempt to constrain observational uncertainty when analyzing extremes. To demonstrate the utility of the P-Cat scheme as a way to track extreme precipitation events in time and space and to highlight the importance of understanding observational uncertainty, we apply the 3-day total categorization as a basis for a dataset intercomparison across four gridded products, spanning a range of construction methodologies, and in situ station data.
All gridded datasets capture the principal spatial patterns of mean annual precipitation climatology, with higher resolution datasets capturing more orographic features than the lower resolution datasets (Fig. 3). Focusing on extremes, the magnitude (Figs. 4–7) and frequency of P-Cats (Figs. 8–10) are assessed using the P-Cat scheme as a metric for intercomparison. In general, the higher resolution datasets more closely resemble gauge data across the CONUS and seasons. Specifically, PRISM shares many detailed commonalities with station data while the next highest resolution dataset, TMPA, is also similar overall. The NARR and MERRA-2 reanalyses show systematically lower magnitude and frequency of P-Cats across the CONUS and seasonal cycle. TMPA shows systematically lower P-Cat magnitudes and frequencies across the mountains of the West during fall and winter when a large portion of precipitation falls as snow, consistent with known limitations of TMPA in capturing frozen precipitation.
When all datasets are interpolated to a common coarser grid, differences persist but are reduced. In particular, the datasets that were originally constructed at the highest spatial resolution often maintain the highest magnitude of P-Cats, even after coarsening of the gridded data. This feature could result from a number of factors; however, one likely contributor is the fact that a dataset constructed originally at fine resolution is able to capture extreme events that simply could not be resolved at coarser grids (Herold et al. 2017). This may be particularly acute in areas of complex topography where, for example, PRISM is able to resolve local high magnitude events that the other datasets are simply not capable of capturing. Other factors could include other underlying biases in the dataset stemming from factors such as spatial and temporal heterogeneity in gauges (Kidd et al. 2017), sensor sensitivity to precipitation type (Behrangi et al. 2012, 2014b) or methods of retrieving precipitation from individual sensors in satellites (Kummerow et al. 2011), interpolation methods or misrepresentation due to the sparseness of the observing network (Min et al. 2011), or general deficiencies and model limitations simulating precipitation amounts in reanalysis (Kharin et al. 2013). The annual occurrence of P-Cats shows similar differences across the suite, with a general positive relationship between grid resolution and the number of P-Cats (Figs. 11–13). Preliminary assessment of IMERG, the follow-on satellite product to TMPA, suggests some potential improvements over TMPA in capturing frozen precipitation and finescale extremes (Fig. 15). Ultimately, results suggest satellite data show promise in capturing the overall patterns of heavy precipitation climatology, which could lead to improved monitoring in regions of sparse ground observations.
We acknowledge some assumptions and limitations in the use of the P-Cat scheme as a climate indicator for extreme precipitation. First, the use of fixed thresholds for the entire CONUS is intended to highlight the heaviest precipitation across the domain in an intuitive way. As such, some drier regions do not record P-Cats as defined in this study, even though smaller totals may be considered impactful relative to the local climatology. The synoptic scale of measurement also captures totals at a temporal scale often associated with impacts such as flooding and landslides (Ralph and Dettinger 2012) but does not distinguish between shorter and longer duration totals. This may be relevant for lower-end P-Cats that could result from short duration extreme convective events. It is also possible that a single storm may be counted more than once due to the moving 3-day window used to construct the P-Cat. Finally, while we include the five datasets here in an effort to capture a range of measurement methods while focusing on high resolution products, this analysis could be extended to other observations.
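The multiple-counting caveat can be made concrete with a small sketch. It assumes one P-Cat per 3-day window ending on each day, and it merges runs of consecutive flagged days into single events purely as a hypothetical post-processing step that is not part of the scheme as applied in this study. The function names and the rainfall series are invented.

```python
def flagged_window_ends(daily_mm, threshold_mm=100.0):
    """Indices of days that end a 3-day total exceeding the threshold."""
    return [end for end in range(2, len(daily_mm))
            if sum(daily_mm[end - 2:end + 1]) > threshold_mm]

def count_events(window_ends):
    """Merge runs of consecutive flagged days into single events (hypothetical step)."""
    events = 0
    previous = None
    for end in window_ends:
        if previous is None or end != previous + 1:
            events += 1
        previous = end
    return events

# One hypothetical storm: a single day with 120 mm of rain
daily = [0, 0, 120, 0, 0, 0, 0]

ends = flagged_window_ends(daily)
print(len(ends), count_events(ends))
```

A single 120-mm day falls inside three successive windows, so it registers as three P-Cats under a moving window but as one event once adjacent windows are merged.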
Overall, the P-Cat characterization scheme applied here offers several opportunities for future research and applications. By comparing P-Cat climatology in climate model simulations of the historical climate to observations, this scheme could provide a novel target for climate model evaluation. As a further extension of the P-Cat approach for dataset intercomparison, P-Cat thresholds could be customized to a dataset’s grid resolution to account for the inherently lower magnitude of extremes captured at coarser versus finer resolutions, although this would make the scheme somewhat less intuitive because a P-Cat 1 would differ between datasets. The P-Cat scheme could also be used for assessing future projections of changes in extreme precipitation in climate models. Last, the P-Cat approach is easily extensible to other regions, facilitating temporal and spatial tracking and monitoring of extremes, dataset intercomparison, model evaluation, and future change assessment.
This work was carried out, in part, at the Jet Propulsion Laboratory, California Institute of Technology, and at Portland State University, under a contract from the National Aeronautics and Space Administration (NASA). Support for this project was provided by the NASA Indicators for the National Climate Assessment (NCA) Program under NASA award NNX16AG60G. Partial support for Emily Slinskey was provided by a JPL student summer internship. We thank Judah Detzer, Heejun Chang, and Andrew Martin for their help with this project. Additionally, we thank Ali Behrangi for his contributions with satellite observation information.
Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/JHM-D-18-0148.s1.