Extreme short-duration rainfall can cause devastating flooding that puts lives, infrastructure, and natural ecosystems at risk. It is therefore essential to understand how this type of extreme rainfall will change in a warmer world. A significant barrier to answering this question is the lack of sub-daily rainfall data available at the global scale. To address this, a global sub-daily rainfall dataset based on gauged observations has been collated. The dataset is highly variable in its spatial coverage, record length, completeness and, in its raw form, quality. This presents significant difficulties for many types of analyses. The dataset currently comprises 23 687 gauges with an average record length of 13 years. Apart from a few exceptions, the earliest records begin in the 1950s. The Global Sub-Daily Rainfall Dataset (GSDR) has wide applications, including improving our understanding of the nature and drivers of sub-daily rainfall extremes, improving and validating high-resolution climate models, and developing a high-resolution gridded sub-daily rainfall dataset of indices.
One of the most important questions in climate change research is how the intensity, frequency, and duration of extreme rainfall will change with global warming. This question must be approached in several ways, as extreme rainfall occurs over different spatial and temporal scales and has multiple drivers, and needs to be answered on a global scale. Recent work has focused on analyzing global-scale trends in time series of land-based precipitation extremes that occur on daily time scales. For example, Westra et al. (2013) showed that close to two-thirds of stations across the world displayed increasing trends in annual maximum rainfall while Groisman et al. (2005) found an increasing probability of intense precipitation events (e.g., the frequency of very heavy precipitation or the upper 0.3% of daily precipitation events) for many extratropical regions. Other work has characterized global daily rainfall extremes via a series of indices that have provided useful information for climate modelers and hydrologists (e.g., Frich et al. 2002; Alexander et al. 2006; Donat et al. 2013a). While observed long-term (>40 yr) globally consistent daily rainfall datasets do not yet exist, the work on indices has facilitated the study of long-term changes of rainfall extremes using good-quality station data covering large parts of the world (Donat et al. 2013b).
Research is now turning to the sub-daily scale (1–6 h) to further our understanding of the nature and drivers of intense rainfall as sub-daily precipitation extremes cause flash floods and can trigger landslides, which result in damage to infrastructure, lives, homes, and ecosystems (Georgakakos 1986; Marchi et al. 2010; Archer and Fowler 2018; Barbero et al. 2019). Such extremes are relatively poorly understood; we do not fully understand the processes that cause extreme precipitation or its inherent intermittency properties (Trenberth et al. 2017) or variability under the current climate. An increasing number of regional studies have explored the relationship between sub-daily rainfall extremes and coincident temperature [e.g., Hardwick Jones et al. (2010) for Australia; see commentary by Lenderink and Fowler (2017)]. These have found that hourly extremes may scale at a higher rate than that expected (and observed) for daily extremes, that is, higher than Clausius–Clapeyron scaling [~6.5% (°C)⁻¹] [for the Netherlands: Lenderink et al. (2017) and Lenderink and van Meijgaard (2008); for the Netherlands and Hong Kong: Lenderink et al. (2011); for Austria: Formayer and Fritz (2017)]. Studies have also used longer records to look for trends or changes in hourly rainfall but these have tended to be over relatively small scales with the exception of some national-scale studies (e.g., Sen Roy 2009; Westra and Sisson 2011; Barbero et al. 2017; Guerreiro et al. 2018; Sen Roy and Rouault 2013). Previous studies have used different methodologies and have shown inconsistent changes, although most point to a general increase in intensity [Westra et al. 2014; see Hartfield et al. (2017) for a graphical summary].
However, high-resolution modeling studies have shown us that it is unlikely that extreme hourly precipitation intensities can simply be extrapolated from scaling relationships associated with warming due to the influence of atmospheric moisture, dynamical feedback to increased latent heat release, and changes in atmospheric circulation on larger scales (Lenderink and Fowler 2017; Chan et al. 2016; Bao et al. 2017; Prein et al. 2017; Wang et al. 2017; Barbero et al. 2018).
State-of-the-art research on extreme precipitation therefore currently uses either quasi-global/continental-scale data at a daily time step or regional (country)-scale sub-daily data. Widely used daily datasets include the E-OBS gauge dataset for Europe (Klein Tank et al. 2002), a dataset of climate variables that contains 10 584 gauges and is updated regularly (http://www.ecad.eu/). The NOAA Global Historical Climatology Network (GHCN)-Daily dataset is popular and large, with over 100 000 stations (Menne et al. 2016). The Global Precipitation Climatology Centre (GPCC) have a near-real-time gridded daily precipitation product using over 7000 rain gauge stations (Schamm et al. 2014). The Asian Precipitation–Highly-Resolved Observational Data Integration Toward Evaluation (APHRODITE) daily gridded precipitation dataset uses around 12 000 gauges; however, this project has now ended and the dataset is not updated (Yatagai et al. 2012).
Other quasi-global data products exist that are not based on gauged observations, including daily satellite datasets like Tropical Rainfall Measuring Mission (TRMM)/TRMM Multisatellite Precipitation Analysis (TMPA; Huffman et al. 2007) and their higher-resolution replacement, Global Precipitation Measurement (GPM), which records precipitation and other variables every 3 h (Hou et al. 2014). Multi-Source Weighted-Ensemble Precipitation (MSWEP; Beck et al. 2017a) is a 3-h 0.25° global gridded precipitation dataset from 1979 to 2014, based on merged gauged, satellite, and reanalysis data products. Some quasi-global satellite precipitation datasets developed recently provide measurements at the 3-hourly scale and even at hourly and half-hourly scales, but have short record lengths, often starting later than 1998, including TRMM (Huffman et al. 2007; Trenberth et al. 2017), Precipitation Estimation from Remotely Sensed Information Using Artificial Neural Networks (PERSIANN; Hsu et al. 1997), Climate Prediction Center morphing technique (CMORPH; Joyce et al. 2004), Global Satellite Mapping of Precipitation (GSMaP; Kubota et al. 2007), and GPM (Hou et al. 2014). Radar and merged rainfall measurements are good supplements for gauge observations; however, they are measuring different things. Gauges measure the weight or volume of rainfall directly whereas satellite and radar infer rainfall rates based on the interaction of signals with hydrometeors. These indirect measurements then depend on algorithms to convert them to precipitation rates and are subject to a range of uncertainties (see Beck et al. 2017b; Michaelides et al. 2009; Krajewski et al. 2010; Thorndahl et al. 2017). These data products are limited in their usefulness by systematic biases and are yet to be fully validated by sub-daily observations, as no global sub-daily gauge dataset exists. In particular, these datasets need to be validated for precipitation extremes.
The largest current dataset of sub-daily rainfall gauges is the Integrated Surface Database (ISD; Smith et al. 2011). The database includes over 35 000 stations worldwide, with over 14 000 “active” stations updated daily. The ISD includes numerous meteorological parameters including precipitation amounts for various time periods. However, the actual rainfall data contained within it is very limited. Only ~8000 stations report hourly rainfall and many of these are extremely short records with large amounts of missing data (as we demonstrate in section 4) and have not yet been subject to quality control or tests of homogeneity. Although many countries collect such data, most do not (see Table A1 in appendix A). There is no single repository for sub-daily rainfall data and, until now, there has been no concerted effort to create such a database. Thorne et al. (2017) call for a comprehensive global set of data holdings that integrate across essential climate variables and time scales and outline the steps that need to be taken to make this happen. Zhang et al. (2017) state that progress in understanding the changes in sub-daily rainfall extremes has been limited due to the lack of availability of sub-daily rainfall data, and call for efforts to be made to create a global sub-daily rainfall dataset, which would have wide applications in hydrology and for the validation of the emerging generation of very high-resolution convection-permitting climate models and remote sensing data. Further, coupled with model simulations it would facilitate improved understanding of how an important component of the global climate system will respond (and is already responding) to atmospheric warming, and whether there are dangerous or important thresholds in terms of changes to precipitation extremes.
To address this need, we have identified and collated sub-daily rainfall data from across the globe to form the Global Sub-Daily Rainfall Dataset (GSDR) as part of the INTENSE project (Blenkinsop et al. 2018; https://research.ncl.ac.uk/intense/), in conjunction with the World Climate Research Programme (WCRP)’s Grand Challenge on Weather and Climate Extremes (https://www.wcrp-climate.org/grand-challenges/gc-extreme-events) and the Global Water and Energy Exchanges Project (GEWEX) Science Questions (https://www.gewex.org/about/science/gewex-science-questions). The “Intelligent use of climate models for adaptation to non-stationary hydrological extremes” (INTENSE) project is a European Research Council–funded project to lead a community effort into the collection and analysis of sub-daily precipitation data, building on the ISD and model outputs. This paper outlines the gauge data we have collected so far for the GSDR, providing details of spatial coverage, record duration, and completeness in an ongoing process to form the first comprehensive global sub-daily rainfall dataset.
2. Data collection
a. Data availability
While many international efforts have already struggled to make long-term daily rainfall records widely accessible, the situation for sub-daily data is even more challenging (Zhang et al. 2011; Zwiers et al. 2013; Alexander 2016). As such, this work represents the cooperation and support of over 100 meteorological offices, environmental agencies, and researchers. The ISD (Smith et al. 2011) forms the foundation of this dataset and through collaboration we were able to collect additional data free of charge (for academic research purposes) from the countries listed in Table A2 in appendix A. Data were typically obtained from the National Hydrological and Meteorological Services (NHMSs), but sometimes from their environment agency. Some data were also provided by research groups who have field campaigns in catchments and were willing to share their data. However, because of the requirements of license agreements some of the raw data are currently not available outside of the project partners (see Table A2). We aim to demonstrate the value of this dataset and encourage data owners to feed into a freely available version through ongoing work in the INTENSE project.
Through our data collection efforts we have found that sub-daily rainfall is often available in more recent years, given the advancement of rain gauges and electronic recording devices/telemetry. Short records of sub-daily rainfall are available from many countries, but longer records, particularly useful for the assessment of trends and variability, are much harder to access.
Data collection is still ongoing, and we have identified additional sub-daily rainfall datasets for Spain, the Philippines, New Zealand, a few stations in Kenya, Tuvalu, the Caribbean, South Africa, Colombia, Fiji, Israel, India, Denmark, Slovenia, Iran, Bangladesh, Russia, Hungary, Czechia, China, Uruguay, Vanuatu, Hong Kong, Mexico, Poland, and Vietnam. Additional data from across the world have also recently become available from the U.S. Air Force. Work is ongoing to collect all of these data and add them to the database. Data collection is, however, a very time-consuming exercise, and the dataset presented in this paper represents the efforts of a very small team building a network of contacts; more rainfall data are certainly available than are described here. Data policy remains a large constraint on developing this dataset further. While many countries are moving toward an open data policy, many still restrict access to data or charge very large sums of money for access. These policies are understandable but hinder scientific progress on answering global-scale questions.
b. Data formats
As data were collected from many different sources, it is unsurprising that the datasets obtained were in different formats. Data were mainly provided as ASCII files (.txt or .csv), but sometimes as a database (Microsoft Access) or in netCDF format. Each of the national datasets was also submitted in a different structure (matrix of days and hours, time series) or files were often split by month, year, or some other time aggregation. Sometimes all stations were included in one file or each station might be a separate file. The data were also accessed in several different ways. Data holders would either send the information directly, or provide a link to a web interface, FTP, or WSDL service. This highlights the need for consistent standards and formats across national agencies to facilitate easier collaboration for global scale analyses, as well as the necessity for international initiatives, such as the Copernicus Climate Change Service (https://climate.copernicus.eu), to archive and maintain such datasets, as called for by Thorne et al. (2017). For consistency, we converted all the data to the same format before use, which records the data at a 1-h time step.
Processing the data revealed several differences between sources. First, data were obtained at different time steps: typically 1 h but also 1 min, 5 min, 10 min, 15 min, 30 min, 3 h, and 6 h. Almost all data were provided at 1-h or finer resolution and so only these data are presented here, although 6-h data were also collected for Bermuda (1 station), Brazil (297 stations), and Canada (72 stations). Data at 3- and 6-h resolution are also available from the ISD but the quality is highly variable (for 3-h data there are 2130 stations with more than one wet hour in the record and for 6-h data there are 5675 stations fulfilling the same criterion). Furthermore, we are aware of changes in measurement precision for some countries that create inhomogeneities in the time series [e.g., the United States (Barbero et al. 2017) and United Kingdom (Kendon et al. 2018)]. Second, some formats differentiated between zero rainfall and no data, while others did not, which can lead to ambiguity about whether the gauge was working or not at a particular time. Third, data were provided with varying levels of quality control information, some with very detailed metadata of up to 20 quality-control codes while others had none.
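As an illustration of the first two points, the sketch below sums sub-hourly values into 1-h totals while keeping "no data" distinct from zero rainfall. It is a minimal example under stated assumptions, not the conversion procedure used for GSDR: the function name `to_hourly`, the use of `None` for missing values, the convention that a timestamp marks the start of its interval, and the rule that an hour is missing unless every sub-interval is present are all hypothetical choices.

```python
from collections import defaultdict
from datetime import datetime

MISSING = None  # assumption: None marks "no data", distinct from 0.0 (a dry interval)


def to_hourly(records, step_minutes):
    """Sum sub-hourly (timestamp, mm) pairs into hourly totals.

    Assumptions: timestamps mark the start of each interval, step_minutes
    divides 60, and an hour is reported as MISSING unless every expected
    sub-interval is present and valid.
    """
    expected = 60 // step_minutes
    buckets = defaultdict(list)
    for ts, value in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)

    hourly = {}
    for hour, values in sorted(buckets.items()):
        if len(values) < expected or any(v is MISSING for v in values):
            hourly[hour] = MISSING  # gauge status unknown for part of the hour
        else:
            hourly[hour] = round(sum(values), 3)
    return hourly


# Example: one complete wet hour of 15-min data
example = [
    (datetime(2000, 1, 1, 0, 0), 0.2),
    (datetime(2000, 1, 1, 0, 15), 0.0),
    (datetime(2000, 1, 1, 0, 30), 0.4),
    (datetime(2000, 1, 1, 0, 45), 0.0),
]
print(to_hourly(example, 15))
```

A stricter or looser completeness rule (for example, tolerating one missing sub-interval) would change how many hours survive aggregation, so the rule should be chosen to suit the analysis.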
A particular characteristic to note is the precision of measurement as this has an impact on the analysis of the data. Typically this was 0.1 or 0.2 mm from tipping-bucket rain gauge (TBR) records. However, reported resolutions range from 0.001 mm (from interpolated pluviograph records in Australia), to 0.1 in. (2.54 mm) in the United States. Such differences in resolution create problems when comparing rainfall statistics between countries, for example, when comparing wet or dry hours globally or for fitting extreme value distributions. However, it is possible to overcome this limitation (to a certain extent) by converting the data of the finer resolution to a coarser and common resolution following previous studies (Groisman et al. 2012).
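One simple way to bring a fine-resolution series to a coarser common resolution is to emulate a larger measurement increment: accumulate the fine-resolution amounts and report only whole multiples of the coarse increment, carrying the remainder forward. The sketch below illustrates this idea only; it is not necessarily the procedure of Groisman et al. (2012), and the function name `coarsen`, the 0.01-mm working unit, and the choice to discard the carried remainder across a gap are assumptions.

```python
def coarsen(series_mm, coarse_mm, unit_mm=0.01):
    """Re-express hourly totals as whole multiples of coarse_mm.

    Works in integer multiples of unit_mm to avoid floating-point drift.
    None marks a missing hour (assumption); the carried remainder is reset
    after a gap because the gauge state is unknown.
    """
    coarse_units = round(coarse_mm / unit_mm)
    carry = 0
    out = []
    for value in series_mm:
        if value is None:
            out.append(None)
            carry = 0
            continue
        carry += round(value / unit_mm)
        increments, carry = divmod(carry, coarse_units)
        out.append(increments * coarse_mm)
    return out


# 0.1-mm data re-expressed at 0.2-mm resolution
print(coarsen([0.1, 0.1, 0.3], 0.2))
```

Note that coarsening shifts small amounts later in time and turns some wet hours into dry ones, which is exactly why wet-hour counts are not comparable between gauges of different resolution.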
3. Dataset characteristics
a. Number and distribution of gauges
At the time of writing, hourly data have been collected for 23 687 stations, with 15 331 of these stations from non-ISD sources. This is almost double the number of stations with rainfall data available in the ISD. These gauges cover 200 territories, 38 of which were collected by this project (territories are defined by the International Organization for Standardization alpha-2 codes). A total of 452 of these stations are coincident, located within 100 m of each other: 134 of these are potentially duplicate gauges from the ISD dataset while the remainder seem to be genuinely collocated gauges. Gauge density is highly variable: Singapore has the highest density network of 33 stations over an area of 563 km², and Switzerland and the United Kingdom also have very high network densities.
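A check for coincident gauges of the kind described above can be sketched with a great-circle distance test. This is an illustrative example rather than the method used to obtain the figures quoted: the function names, the spherical-Earth radius, and the brute-force pairwise comparison are assumptions, and a spatial index would be preferable for tens of thousands of stations.

```python
from itertools import combinations
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def coincident_pairs(stations, max_dist_m=100.0):
    """Return ID pairs of stations lying within max_dist_m of each other.

    stations: iterable of (station_id, latitude, longitude) tuples.
    """
    return [
        (a[0], b[0])
        for a, b in combinations(stations, 2)
        if haversine_m(a[1], a[2], b[1], b[2]) <= max_dist_m
    ]


# Hypothetical stations: A and B are roughly 56 m apart, C is over 1 km away
stations = [
    ("A", 54.9800, -1.6100),
    ("B", 54.9805, -1.6100),
    ("C", 54.9900, -1.6100),
]
print(coincident_pairs(stations))
```

Distance alone cannot separate duplicates from genuinely collocated gauges; comparing the overlapping parts of the two time series would be needed for that.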
b. Length of records, gauges per year
Table 1 shows that 22% of stations have records longer than 30 years. These longer records are suitable for looking at changes in rainfall over time, such as trend analysis and the influence of natural variability. However, shorter records (e.g., 56% have records longer than 10 years) are still useful for other analyses, such as the assessment of sub-daily precipitation climatology, including extremes and seasonal and diurnal variability, and for applications including the validation of remotely sensed rainfall products. Figure 1 shows the number of gauges recording hourly rainfall for each year; the earliest record begins in 1911 and is located in Hobart, Tasmania. Very few gauges have records longer than 60 years (Table 1). The U.S. gauges commence in 1950 and form one of the most complete datasets. The majority of other records begin after 1990. It should be noted that the record lengths discussed here are those of available digitized data. It is possible that longer datasets exist but only as paper records. Initiatives such as International Atmospheric Circulation Reconstructions over the Earth (ACRE) aim to rescue these data to expand and extend existing datasets (Allan et al. 2011). Some national records show an increase in network density over the years followed by a decline in more recent years. An example is the United Kingdom, where resources are being spent on radar measurements instead. Figure 1 similarly shows a global decrease of gauges in recent years, which may arise partly from a data collection artifact since we have not yet requested updates from datasets collected at the beginning of the project in 2015.
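Summaries of this kind (gauges active per year, and the share of records exceeding a given length) can be derived from the start and end years of each record. The sketch below is illustrative only: the function names and the `(station_id, start_year, end_year)` tuple layout are assumptions, and it measures the calendar span of each record without accounting for gaps.

```python
from collections import Counter


def gauges_per_year(records):
    """Count how many gauges have a record spanning each calendar year.

    records: iterable of (station_id, start_year, end_year), years inclusive.
    """
    counts = Counter()
    for _station, start, end in records:
        counts.update(range(start, end + 1))
    return dict(sorted(counts.items()))


def share_longer_than(records, years):
    """Fraction of records whose calendar span exceeds the given number of years."""
    records = list(records)
    longer = sum(1 for _station, start, end in records if end - start + 1 > years)
    return longer / len(records)


# Hypothetical records for three gauges
records = [("a", 1950, 1990), ("b", 1985, 1995), ("c", 1990, 2015)]
print(gauges_per_year(records)[1990])
print(share_longer_than(records, 30))
```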
c. Completeness of records
Records can be long (Fig. 2) but some contain a large percentage of missing data at an hourly time step (see Table 2 and Fig. 3). This again affects the usability of the data. Table 1 and Fig. 2 show the real record length [record length × (1 − fraction of missing data)] of the gauges in GSDR. Approximately 7% of stations have complete records and ~39% have records with less than 10% missing data, while almost a quarter (~23%) of stations have over 90% missing data, making them practically unusable. Approximately 17% of stations have real record lengths of over 30 years, making these potentially the most useful for a range of analyses and applications. Figure 4 shows that the United States, Japan, and Australia have the greatest number of stations available with >30 years of data.
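The real record length defined above is a one-line calculation once the fraction of missing hours is known. The following minimal sketch shows it; the function names are hypothetical, and the use of −999 as the missing-value flag follows the file description given for GSDR.

```python
MISSING_FLAG = -999.0  # missing-value flag used in the GSDR time series


def missing_fraction(hourly_values):
    """Fraction of hourly values equal to the missing-value flag."""
    if not hourly_values:
        raise ValueError("empty record")
    n_missing = sum(1 for v in hourly_values if v == MISSING_FLAG)
    return n_missing / len(hourly_values)


def real_record_length(record_length_years, fraction_missing):
    """Real record length = record length x (1 - fraction of missing data)."""
    return record_length_years * (1.0 - fraction_missing)


# A 40-yr record in which one hour in four is missing has a real length of 30 yr
fraction = missing_fraction([0.0, 0.2, -999.0, 1.4])
print(real_record_length(40, fraction))
```

Because the measure is an aggregate, two records with the same real length may differ greatly in usability: missing data concentrated in one long gap affect trend analysis differently from gaps scattered throughout the record.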
d. Format and availability of GSDR
GSDR is stored in a flat file system. Each gauge is stored as an individual text file in a compressed folder organized by country. The files contain station metadata including station ID, country, original station details, origin of the data, latitude, longitude, elevation, record start and end dates, the number of hours in the record, and the percentage of missing data, as well as the original time step, time zone, and units of the data. The rainfall data are then recorded as a complete time series from the recorded start date with missing values included as −999. Some of the dataset is currently only accessible to the INTENSE team and project partners but some is open access (see Table A2).
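A reader for a station file of this general shape might look like the sketch below. The exact file layout is not specified here, so every detail of the example is an assumption: the "Key: value" header lines, the field names such as `Start datetime`, the `YYYYMMDDHH` date format, and one hourly value per line are hypothetical. Only the −999 missing-value flag and the idea of a complete hourly series from the start date are taken from the description above.

```python
import io
from datetime import datetime, timedelta

MISSING_FLAG = -999.0  # missing-value flag described for GSDR files


def read_station(handle):
    """Parse a station file into (metadata dict, list of (timestamp, value)).

    Assumed layout (hypothetical): "Key: value" header lines followed by one
    hourly rainfall value per line. Missing values are returned as None.
    """
    meta, values = {}, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if ":" in line:
            key, _, val = line.partition(":")
            meta[key.strip()] = val.strip()
        else:
            v = float(line)
            values.append(None if v == MISSING_FLAG else v)

    # Rebuild timestamps from the start date at a fixed 1-h step
    start = datetime.strptime(meta["Start datetime"], "%Y%m%d%H")
    series = [(start + timedelta(hours=i), v) for i, v in enumerate(values)]
    return meta, series


# Hypothetical file content, used here in place of a real GSDR file
sample = """Station ID: XX_0001
Country: Exampleland
Latitude: 54.98
Longitude: -1.61
Start datetime: 2000010100
0.0
0.2
-999
1.4
"""
meta, series = read_station(io.StringIO(sample))
print(meta["Station ID"], len(series))
```

Storing the series without explicit timestamps keeps files small, but it makes the start date and time zone in the metadata essential for correct alignment between stations.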
To address one of the objectives of the WCRP Grand Challenge on Extremes, we have compiled a global sub-daily precipitation dataset and describe the hourly data in this paper. This dataset is highly variable in global coverage, record length, real record length, and the extent to which it has been assessed for quality. The data quality and quantity should match that required by the analysis or application being undertaken. Short, incomplete records may still have value for some applications (e.g., to validate satellite or radar observations) and may be used for some types of analyses (e.g., to determine the diurnal cycle of rainfall) or may be pooled for temperature scaling and extreme value analysis. Long records are, however, essential to allow the detection of changes in rainfall extremes (e.g., Kendon et al. 2018). Future work will build on the methods applied to U.K. hourly rainfall data (Blenkinsop et al. 2017; Lewis et al. 2018) to develop a standard methodology for quality controlling this Global Sub-Daily Rainfall (GSDR) dataset from multiple sources and for different climatic regimes to ensure the data are of high quality. The code for this will be made available to help ensure minimum standards of quality can be provided across data providers. Data collection is ongoing and further contributions to the dataset are very welcome.
The dataset presented here provides a platform for future development by the larger scientific community and policy makers. In particular, it should be supported and maintained by a global organization, preferably an NHMS, with efforts made to ensure that data licenses make the raw data themselves available to researchers in the future, to further scientific understanding. This would align with the goals outlined in Thorne et al. (2017) to harmonize surface meteorological holdings across essential climate variables and time scales, and in time be open and free from usage restrictions. Work toward such a goal is underway within the Copernicus Climate Change Service framework (https://climate.copernicus.eu/global-land-and-marine-observations-database). INTENSE will endeavor to provide as much support with this as possible and is taking steps to find a suitable organization to continue the work of this project. This highlights a wider problem that needs to be addressed with regard to the maintenance of datasets that are developed with project-specific funding. In the meantime, future work from the INTENSE project will produce indices of extreme sub-daily rainfall, similar to those already available at daily time scales, such as the Expert Team on Climate Change Detection and Indices (ETCCDI) Climate Change Indices (Zhang et al. 2011; Donat et al. 2013a). These will be made freely available to the academic community through the CLIMDEX (www.climdex.org) platform. INTENSE is also working closely with the convection-permitting model community to provide a set of climate model relevant evaluation metrics at sub-daily scales.
The GSDR provides a new, invaluable resource to Earth scientists, as expanding the availability of global sub-daily precipitation data will improve our capacity to address significant research questions associated with variability and trends in intense rainfall and its associated impacts. Furthermore, coupled with information derived from the new generation of convection-permitting climate models (e.g., Kendon et al. 2014, 2017), such data provide the potential to increase our understanding of how large-scale dynamics interact with local-scale thermodynamics (Pfahl et al. 2017) as drivers of intense rainfall in a changing climate.
5. Data availability
The subset of GSDR that can be made freely available (marked as “open” in Table A2) will shortly be hosted by the Global Precipitation Climatology Centre at Deutscher Wetterdienst and available through the Copernicus Climate Change Service Climate Data Store. Until then, the data can be obtained from the authors.
The INTENSE project is funded through the European Research Council (Grant ERC-2013-CoG-617329) and funds EL, HJF, SB, X-FL, and SG. HJF is also funded by the Wolfson Foundation and the Royal Society as a Royal Society Wolfson Research Merit Award (WM140025) holder. LA is supported by Australian Research Council Centre of Excellence Grant CE17010023 and Discovery Project Grant DP160103439. RJHD was supported by the Met Office Hadley Centre Climate Programme funded by BEIS and Defra. A huge amount of thanks is owed to the many people who have helped identify and provide data for this paper, particularly those outlined in Tables A1 and A2 in appendix A.
Table A1 summarizes the sources and availability of data from countries that were contacted by the INTENSE project. Table A2 focuses on those countries where sub-daily rainfall data were available and collected by the INTENSE project.
Figures B1–B4 indicate the real record length for stations in Australia, the United States, Europe, and Southeast Asia, respectively. Figures B5–B8 show the percentage of missing data for the stations in these regions.