The Rainy and Dry Seasons (RADS) dataset, a new compilation of precipitation statistics available to the public, is described. The dataset contains the dates of onset and demise of the rainy season (one date per year), the duration of the rainy and dry seasons, and the accumulated precipitation during the rainy and dry seasons. The methodology for detecting the characteristics of the rainy season is based solely on precipitation data. RADS was developed from multiple global gridded daily precipitation datasets [Tropical Rainfall Measuring Mission (TRMM), 1998–2015; Climate Prediction Center Unified Gauge-Based Analysis of Global Daily Precipitation (CPC_UNI), 1979–present; and Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2), 1980–present] and therefore shares the spatial resolution, temporal range, and limitations of the original precipitation datasets. This is the first free public dataset of the characteristics of the rainy and dry seasons created using a consistent methodology across the globe, including all major monsoonal regions. We expect that the RADS dataset will contribute to our understanding of the sources of variability of the timing of rainy seasons (on local to regional scales) and monsoons (on large scales) and their impacts on water resource management and other aspects of geosciences and human activities.
A global gridded dataset containing the characteristics of the rainy and dry seasons, including onset and demise dates, duration, and accumulated precipitation during the wet and dry seasons, is described.
The characteristics (onset and demise dates, duration, and accumulated precipitation) of the wet and dry seasons have important implications for several sectors of society. On one hand, late rainy season onset can have negative impacts on agriculture, leading to yield loss. On the other hand, late rainy season demise can lead to postharvest loss due to the infection of crops by mold, threatening human health. Moreover, variations in the accumulated precipitation during the rainy and dry seasons have direct implications for sectors related to water resource management, such as agriculture (global food security), energy generation (water and energy), human health, and property loss (landslides, floods, waterborne diseases, and forest fires).
There is extensive literature on how to define the timing of the rainy season (e.g., Kousky 1988; Liebmann and Marengo 2001, hereafter LM01; Wang and LinHo 2002; Cook and Buckley 2009; Asharaf et al. 2012) and monsoons (Webster and Yang 1992; Higgins et al. 1997; Goswami et al. 1999; Fasullo and Webster 2003; Gan et al. 2004; Goswami and Xavier 2005; Joseph et al. 2006; da Silva and Carvalho 2007; Garcia and Kayano 2009; Tong et al. 2009; Moron et al. 2010; Moron and Robertson 2014; Carvalho et al. 2016). While the timing of monsoons can be defined based on changes in wind or transport of moisture, not all regions that experience well-defined rainy seasons are influenced by monsoons. In addition, some midlatitude regions (e.g., the Great Plains in the United States) experience significant precipitation seasonality even though the annual cycle of precipitation in these regions is less sharp than the annual precipitation cycle in monsoonal regions. The simplest way to define the onset (or demise) of the wet season is to find the first (or last) wet day with precipitation above a given threshold (Nicholls et al. 1982; Lau and Yang 1997; Marengo et al. 2001; Li and Fu 2004; Nieto-Ferreira and Rickenbach 2011), while also considering the persistence of precipitation and the occurrence of dry spells (or wet spells). This method is well synchronized with the planting in rain-fed agriculture (Stern et al. 1981; Sivakumar 1988; Tadross et al. 2005; Marteau et al. 2011). One problem with using rigid thresholds of precipitation is that they are not useful outside the region of interest. Several methods are based on applying some form of low-pass filtering to rainfall time series as an alternative to using rigid thresholds (LM01; Wang and LinHo 2002; Camberlin and Diop 2003; Cook and Buckley 2009). The LM01 method (or a variation of it) has been widely applied to study several monsoonal regions (Liebmann et al. 2007; Bombardi and Carvalho 2008, 2009; Liebmann et al. 2012; Diaconescu et al. 2015; Bombardi et al. 2015, 2016; Dunning et al. 2016; Noska and Misra 2016; Alves et al. 2017).
Despite the importance of the characteristics of the wet and dry seasons to several sectors of society, there are no free, public, global datasets of these variables. Therefore, the objective of this article is to present a global gridded dataset containing the onset and demise dates of the wet season (one date per year) as well as the accumulated precipitation and the duration of the wet and dry seasons. The methodology for detection of the characteristics of the rainy season is based solely on precipitation data. The dataset was developed from multiple gridded daily precipitation products and therefore shares the spatial resolution, temporal range, and limitations of the original gridded precipitation product. The Rainy and Dry Seasons (RADS) dataset is the first free public dataset of the characteristics of the rainy and dry seasons created using a consistent methodology across the globe, including all major monsoonal regions (more than half of the world’s population lives in monsoonal regions).
We expect that this dataset will support research related to several aspects of water resource management, such as the impacts of the wet and dry seasons on crop management, forest fires, and runoff and streamflow analysis and modeling. This dataset will also support weather and climate research that is not directly related to water resource management. For example, it is a common practice in global-scale climatological studies to divide analyses into seasons. While the climate of mid- to high latitudes is well characterized by four seasons linked to calendar months, most tropical regions only truly experience two seasons: the wet season and the dry season. Therefore, analyses focusing on autumn or spring in the tropics might become contaminated by improperly combining wet and dry precipitation regimes. We expect this dataset will encourage researchers to consider separating their analyses in the tropics into wet and dry seasons, without having to devise an independent method of calculating these dates themselves. Moreover, we expect that researchers in weather, climate, and related fields will apply their expertise to this dataset and improve our understanding of the sources of variability of the characteristics of the rainy and dry seasons.
METHODOLOGY AND DATA.
This dataset was created using the method developed by LM01, with some slight modifications based on Bombardi et al. (2017, hereafter B017). This method uses only precipitation data and calculates the timing of the rainy (or dry) season at the local scale (e.g., station or grid point) based on accumulated precipitation anomalies S from Eq. (1):
$$S(d) = \sum_{i=t_0}^{d} \left( P_i - \bar{P} \right), \qquad (1)$$
where $P_i$ is the daily precipitation rate on day i, $\bar{P}$ is the long-term mean annual precipitation rate in mm day−1, d is the day up to which the anomalies are accumulated, and t0 ($t_0$ in Eq. 1) is the starting date for the calculations. If we want to calculate the onset date of the rainy (or demise date of the dry) season, we start the calculation of S well within the dry season. Therefore, S (black line in Fig. 1a) will initially assume negative values. Once the rainy season starts, there will be an inflection in S. The inflection point is considered the onset date of the rainy season. For regions that experience well-defined rainy seasons, the date of inflection of the S curve does not depend on t0, as long as t0 is reasonably well defined (not shown).
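The first-step calculation can be illustrated with a minimal Python sketch that uses only the standard library. It accumulates daily anomalies from t0 as in Eq. (1) and takes the day of the minimum of S as the onset, which is how the text describes the straightforward case. The function names and the synthetic series are illustrative assumptions and are not taken from the RADS code.

```python
from itertools import accumulate


def accumulated_anomaly(precip, mean_rate):
    """Running sum of (P_i - mean_rate); the first element corresponds to t0."""
    return list(accumulate(p - mean_rate for p in precip))


def onset_index(precip, mean_rate):
    """Index (days after t0) at which the accumulated anomaly S is lowest."""
    s = accumulated_anomaly(precip, mean_rate)
    return s.index(min(s))


# Synthetic example: 10 dry days followed by 10 wet days (mm per day).
daily = [0.0] * 10 + [10.0] * 10
print(onset_index(daily, mean_rate=5.0))
```

In this toy series S decreases through the dry days and increases once the wet days begin, so the minimum falls at the transition between the two regimes.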
To create a global gridded dataset, we made five simple adaptations to the LM01 method. The first modification is the choice of t0 in Eq. (1). Because the rainy season varies substantially from region to region (e.g., B017), t0 has to change from grid point to grid point. Previous work (Bombardi and Carvalho 2009) defined these dates as simply the minimum and maximum dates of the mean annual cycle at each grid point. However, the minimum of the mean annual cycle can occur right after the rainy season, leading to problems such as the detection of false onset dates. Therefore, we define t0 as the date of the minimum of the first harmonic of the mean annual cycle of precipitation at each grid point. This small modification also ensures that the calculation of the onset and demise dates starts half a year from the peak date of the rainy season.
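As a hedged illustration of this choice of t0, the sketch below fits the first harmonic of a mean annual cycle by direct projection onto one sine and one cosine and returns the day on which that harmonic is lowest. The function names and the idealized 360-day climatology are assumptions made for the example; a real climatology would have 365 or 366 daily values.

```python
import math


def first_harmonic(clim):
    """Mean plus the first annual harmonic of a daily climatology."""
    n = len(clim)
    mean = sum(clim) / n
    w = 2.0 * math.pi / n
    a = 2.0 / n * sum(v * math.cos(w * k) for k, v in enumerate(clim))
    b = 2.0 / n * sum(v * math.sin(w * k) for k, v in enumerate(clim))
    return [mean + a * math.cos(w * k) + b * math.sin(w * k) for k in range(n)]


def start_day(clim):
    """Index of the minimum of the first harmonic (candidate t0)."""
    harmonic = first_harmonic(clim)
    return harmonic.index(min(harmonic))


# Idealized cycle: annual peak on day 200 plus a weaker semiannual component.
n_days = 360
clim = [5.0
        + 3.0 * math.cos(2.0 * math.pi * (k - 200) / n_days)
        + 1.0 * math.cos(4.0 * math.pi * k / n_days)
        for k in range(n_days)]
print(start_day(clim))
```

Because a single harmonic has its minimum exactly half a period from its maximum, the returned day lies half a year from the peak of the fitted annual cycle, regardless of secondary wiggles in the raw climatology.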
In previous work (e.g., B017), the demise of the rainy (or the onset date of the dry) season was defined using a similar approach as described above. B017 defined t0 as the maximum of the first harmonic of the mean annual cycle and started the calculation well within the wet season. However, since precipitation is a discrete and highly variable quantity, changes in precipitation intensity during the rainy season can sometimes lead to false demise dates. Therefore, the second modification refers to the calculation of demise dates. In this work, we define the demise date of the rainy season almost exactly how we define the onset date, the only difference being that we calculate demise dates retrospectively (gray curve in Fig. 1a). That is, we start the calculation of S at date t0 (well within the dry season) and move back in time.
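The retrospective calculation can be sketched as follows, under the assumption that the input series ends at t0 (within the dry season) and that the demise is taken as the day on which the backward-accumulated anomaly is lowest. The function name and the synthetic series are hypothetical.

```python
from itertools import accumulate


def demise_index(precip, mean_rate):
    """Accumulate anomalies backward from the last element (t0) and return
    the forward-time index of the minimum of that backward curve."""
    backward = list(accumulate(p - mean_rate for p in reversed(precip)))
    j = backward.index(min(backward))
    return len(precip) - 1 - j


# Ten wet days, then a dry spell interrupted by one strong late rain event.
daily = [10.0] * 10 + [0.0] * 4 + [12.0] + [0.0] * 5
print(demise_index(daily, mean_rate=5.0))
```

In this toy case the isolated late rain event raises the backward curve only briefly, so the minimum still falls at the end of the sustained wet period.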
The third modification is a two-step process in the calculation of the onset or demise dates of the rainy season. As we mentioned before, the LM01 method is based on finding the inflection point in the time series of accumulated precipitation anomalies. For most years, especially in monsoonal regions, finding the inflection point is as straightforward as finding the date of minimum accumulated precipitation anomalies. Therefore, the first step consists of calculating the timing of the rainy season using the original LM01 method (Fig. 1a). However, in some years, the rainy season evolves so gradually (Fig. 1b) that the dates of minimum or maximum in the time series of accumulated precipitation anomalies are ambiguous. For such years, the LM01 algorithm tends either to fail to detect the onset or demise dates or to detect them too early or too late in comparison to the climatology. Therefore, after calculating the onset and demise dates, we check for outliers. Outliers are defined as cases when the onset or demise dates are below or above 1.5 times the interquartile range. Then we perform the second step, which consists of recalculating the timing of the rainy season for the cases detected as outliers. For these few cases, we determine the timing of the rainy season based on the methodology described in B017. That means that we detect the dates of onset or demise by smoothing S and taking the first derivative (with respect to time) of the smoothed S curve. The first day when the derivative changes sign and persists for 3 days is considered the onset (or demise) date of the rainy season (Fig. 1b). The S curve is smoothed using a 1–2–1 filter passed 50 times. The smoothing in this second pass is necessary to avoid detecting false onset (or demise) dates. Finally, we check again for outliers as dates above or below 3 times (less strict than the first check) the interquartile range. If both methods fail, the data point is considered missing data. We also mask regions where the algorithm fails to detect the onset or demise dates more than 33% of the time. Note that in this work the demise dates were calculated retrospectively.
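The pieces of this two-step procedure can be sketched in Python as below. Several details are assumptions made for the example rather than statements about the RADS code: the end points of the series are held fixed during smoothing, the sign change is taken as negative to positive slope, and the outlier fences are placed at the first and third quartiles minus and plus a multiple of the interquartile range.

```python
import statistics


def smooth_121(s, passes=50):
    """Apply a 1-2-1 filter repeatedly; end points are held fixed (assumption)."""
    out = list(s)
    if len(out) < 3:
        return out
    for _ in range(passes):
        inner = [0.25 * out[i - 1] + 0.5 * out[i] + 0.25 * out[i + 1]
                 for i in range(1, len(out) - 1)]
        out = [out[0]] + inner + [out[-1]]
    return out


def first_persistent_upturn(s, persist=3):
    """First index where the slope of s turns positive and stays positive
    for `persist` consecutive days; None if no such day exists."""
    slope = [b - a for a, b in zip(s, s[1:])]
    for k in range(1, len(slope) - persist + 1):
        if slope[k - 1] < 0 and all(x > 0 for x in slope[k:k + persist]):
            return k
    return None


def iqr_outliers(dates, factor=1.5):
    """Flag dates outside [Q1 - factor*IQR, Q3 + factor*IQR] (assumed fences)."""
    q1, _, q3 = statistics.quantiles(dates, n=4)
    spread = q3 - q1
    low, high = q1 - factor * spread, q3 + factor * spread
    return [d < low or d > high for d in dates]


# Step 1 result for nine hypothetical years (days since t0); one is suspicious.
onsets = [300, 302, 305, 298, 301, 303, 299, 304, 360]
print(iqr_outliers(onsets, factor=1.5))

# Step 2 for a flagged year: smooth S, then look for a persistent sign change.
s_curve = [float(abs(k - 29)) for k in range(60)]
print(first_persistent_upturn(smooth_121(s_curve)))
```

A year that is still flagged with `factor=3` after the second step would be stored as missing, following the description above.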
The fourth modification consists of masking regions characterized by two or three rainy seasons per year. We achieve that by using the explained variances of the first three harmonics of the mean annual cycle of precipitation. If the second or third harmonic explains the same or more variance than the first harmonic, this indicates that the region experiences a pronounced bimodal or trimodal precipitation regime. Therefore, that region is masked. For calculations of the onset and demise dates in regions with more than one rainy season per year, see Dunning et al. (2016) and Seregina et al. (2019). This method performs well for regions that experience a single well-defined rainy and dry season per year. Regions adjacent to the masked regions might still be characterized by multiple rainy seasons a year. The dates in these regions might have little meaning. It is also important to mention that if a region lacks a clear wet season, it does not indicate that the region does not experience a rainy season at all. It instead means that the region experiences a complex precipitation regime. One could mask regions where the accumulated precipitation during the rainy season is below a certain threshold. However, that criterion might inadvertently exclude regions that experience well-defined wet seasons. In this work we chose not to use rigid thresholds, so it is up to the user to decide what is an acceptable level of uncertainty. The data repository contains a file (called “percentage”) with the average percentage of annual precipitation that falls within the rainy season, which can be used for this purpose.
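A minimal sketch of the harmonic test described above is given here. It computes the fraction of the variance of the mean annual cycle explained by each of the first three harmonics and flags the grid point when the second or third harmonic explains at least as much as the first. The function names, the handling of a perfectly flat cycle, and the idealized climatologies are assumptions for illustration.

```python
import math


def harmonic_variance_fractions(clim, n_harmonics=3):
    """Fraction of the variance of clim explained by harmonics 1..n_harmonics."""
    n = len(clim)
    mean = sum(clim) / n
    total = sum((v - mean) ** 2 for v in clim) / n
    if total == 0.0:
        return [0.0] * n_harmonics  # flat cycle: no harmonic explains anything
    fractions = []
    for m in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * m / n
        a = 2.0 / n * sum(v * math.cos(w * k) for k, v in enumerate(clim))
        b = 2.0 / n * sum(v * math.sin(w * k) for k, v in enumerate(clim))
        fractions.append(0.5 * (a * a + b * b) / total)
    return fractions


def is_multimodal(clim):
    """True when harmonic 2 or 3 explains at least as much as harmonic 1."""
    f1, f2, f3 = harmonic_variance_fractions(clim)
    return f2 >= f1 or f3 >= f1


n_days = 360
unimodal = [5.0 + 3.0 * math.cos(2.0 * math.pi * k / n_days) for k in range(n_days)]
bimodal = [5.0 + math.cos(2.0 * math.pi * k / n_days)
           + 2.0 * math.cos(4.0 * math.pi * k / n_days) for k in range(n_days)]
print(is_multimodal(unimodal), is_multimodal(bimodal))
```

With this convention a perfectly flat annual cycle is also flagged, since no harmonic dominates; whether that is desirable depends on the application.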
The fifth and final modification refers to boundary conditions. The last year of historical records poses a problem for regions where the wet (or dry) season occurs between years. To solve this problem, we apply the methodology called “minimum roughness” developed by Mann (2004). This methodology consists of applying a boundary constraint to the time series of accumulated precipitation anomalies that approximates the “minimum roughness” boundary constraint. This constraint consists of simply mirroring the time series of accumulated precipitation anomalies at the end of the time series on both x and y axes. In other words, we simply add extra data points to the end of the time series that mirror the end of the original time series (Fig. 2). This extension of the time series is only used to ensure that the smoothing of the S curve is correct at the boundaries. The effect of this boundary constraint is to create an inflection between the end of the time series and the added data points. However, we do not allow the algorithm to detect the onset (or demise) date of the rainy season beyond the second-to-last day of the historical record. Therefore, even though the LM01 method is based on finding inflections in time series, we do not allow the algorithm to detect the onset (or demise) date as the inflection point created by the boundary constraint.
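The mirroring on both axes amounts to a point reflection of the tail of the series about its final value. A minimal sketch, with a hypothetical function name and padding length, is:

```python
def pad_minimum_roughness(s, n_pad):
    """Extend s by reflecting its last n_pad points about the final point,
    in both time (x) and value (y)."""
    if n_pad >= len(s):
        raise ValueError("n_pad must be smaller than the length of the series")
    last = s[-1]
    mirrored = [2.0 * last - s[-1 - j] for j in range(1, n_pad + 1)]
    return list(s) + mirrored


s_curve = [0.0, -1.0, -3.0, -6.0]
print(pad_minimum_roughness(s_curve, n_pad=2))
```

The padded values continue the series with the day-to-day changes of the original tail in reverse order. In line with the text, a detection routine using such padding would restrict its search to days no later than the second-to-last day of the original record.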
For the rainy season of 1979/80, the onset and demise dates defined by the original LM01 method are exactly the same as the dates defined in the first step of our methodology. In this case, the second step of our methodology is not triggered since the dates are well within the normal range of dates and the dates are defined as shown in Fig. 1a. For the rainy season of 2015/16, both the original LM01 method and the first step in our methodology fail, since the demise date is defined as the first day in curve S and the onset as the last day in curve S. In this case, the second step of our methodology is triggered and the dates are defined as shown in Fig. 1b.
Once the timing of the rainy season is defined, we can also define the duration of the wet and dry seasons and the accumulated precipitation during the wet and dry seasons, which are included as part of our RADS product. Table 1 summarizes the variables included in the RADS dataset, where each variable is available for each year in the record of the input precipitation dataset.
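As a hedged example of how the derived quantities follow from the dates, the sketch below computes durations and accumulated precipitation from a daily series and three indices. The convention that the wet season runs from the onset day up to (but not including) the demise day, and the dry season from the demise day up to the next onset, is an assumption made for the example; the function name and dictionary keys are hypothetical and do not correspond to RADS variable names.

```python
def season_summary(precip, onset, demise, next_onset):
    """Durations (days) and totals for one wet season and the dry season
    that follows it. Requires onset < demise < next_onset."""
    if not onset < demise < next_onset:
        raise ValueError("expected onset < demise < next_onset")
    wet = precip[onset:demise]
    dry = precip[demise:next_onset]
    return {
        "wet_duration": len(wet),
        "wet_total": sum(wet),
        "dry_duration": len(dry),
        "dry_total": sum(dry),
    }


daily = [0, 0, 5, 10, 8, 0, 1, 0, 6, 9]
print(season_summary(daily, onset=2, demise=5, next_onset=8))
```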
Rather than creating a single dataset for the characteristics of the rainy and dry seasons, several datasets were created using several gridded precipitation datasets as input. For this pilot project, RADS datasets were created from gridded precipitation data from three different sources: the Climate Prediction Center Unified Gauge-Based Analysis of Global Daily Precipitation (CPC_UNI; global coverage at 0.5° spatial resolution; 1979–present; Xie et al. 2007; Chen et al. 2008), precipitation estimates from the Tropical Rainfall Measuring Mission (TRMM; 3B42; 50°S–50°N coverage at 0.25° spatial resolution; 1998–2015; Huffman et al. 2007), and adjusted precipitation from the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2; global coverage at 0.5° × 0.625° spatial resolution; 1980–present; Reichle et al. 2017). The TRMM dataset has relatively high spatial resolution, but shorter temporal coverage in comparison to CPC_UNI and MERRA-2. In addition, TRMM observations consist of 3-hourly sampling that might not be representative of daily precipitation totals. Precipitation values from TRMM and CPC_UNI in regions with low gauge coverage are associated with higher uncertainty. The precipitation from MERRA-2 is essentially model generated, but the precipitation is corrected (Reichle et al. 2017) using observations from CPC_UNI [and the Global Precipitation Climatology Project (Adler et al. 2003; Huffman et al. 2009) in Africa]. The characteristics of the wet and dry seasons, therefore, inherit the grid spacing, temporal coverage, and limitations of the original datasets.
RAINY AND DRY SEASON STATISTICS.
Global maps of the median onset and demise dates of the rainy season are very similar to the results of B017 (Figs. 3a,b), reflecting the seasonal evolution of convection and precipitation. The interannual variance of the timing of the rainy season provides useful information about the spatial distribution of monsoons as well as the accuracy of the RADS datasets (Figs. 3c,d). These quantities calculated using TRMM and MERRA-2 are included as online supplemental material (Figs. ES1, ES2; https://doi.org/10.1175/BAMS-D-18-0177.2). Core monsoonal regions show the lowest interquartile range of onset and demise dates of the rainy season. In addition, the interquartile range of the timing of the rainy season increases away from monsoonal regions, which also indicates an increased uncertainty in the onset and demise dates. Away from monsoonal regions, the rainy season becomes less pronounced. Users should use caution when interpreting results calculated using RADS in mid- to high latitudes.
Figure 4 shows the mean annual cycle of CPC_UNI precipitation in two monsoonal regions (India and South America) and two nonmonsoonal regions (Russia and the Great Plains). Although these values were averaged over relatively large regions, the method represents well the timing of the rainy season in the two monsoonal regions, with very similar values for median and interquartile range among all the precipitation datasets (Figs. 4a,b). The mean and interannual variability of onset and demise dates in our product are consistent with the literature. Table 2 shows a comparison of the statistics of our product and some key references for India and South America.
The onset of the Indian monsoon is evident in the mean annual cycle, with a sharp increase in precipitation in June (Fig. 4a). The demise of the Indian monsoon as well as the onset and demise of the South American monsoon are not as evident as the onset of the Indian monsoon (Figs. 4a,b). In nonmonsoonal regions the method still captures the rainy period, although the rainy season is more ambiguous than in monsoonal regions and there are larger differences across precipitation datasets (Figs. 4c,d).
An interesting aspect of this dataset is that it allows us to visualize the total precipitation that occurs during the rainy season (or during the dry season) over both hemispheres in a single map (Fig. 5). Although the maps of accumulated precipitation during the rainy season resemble maps of mean annual precipitation, they also highlight core monsoonal regions. In contrast, when we visualize the accumulated precipitation during the dry season (Figs. 5b,d), the precipitation in core monsoonal regions is suppressed.
As shown before, the characteristics of the rainy and dry seasons can vary depending on the precipitation dataset used in the calculation. However, the characteristics of the rainy season calculated with different precipitation datasets are still very well correlated (Figs. 6, 7), with correlation coefficients above 0.98 (not shown) among all the datasets (for time series spatially averaged over the regions shown in Fig. 3). This is true even in the central Russia region, which shows the largest root-mean-square error (RMSE) in comparison to other regions (Fig. 6c). The values of accumulated precipitation calculated using TRMM show the largest differences from the other two datasets (Fig. 7). These differences are likely because the TRMM product provides eight precipitation estimates (mm h−1) each day (due to being combined with other microwave imagers and adjusted geostationary infrared data). In addition, MERRA-2 precipitation is adjusted based on CPC_UNI precipitation over most of the land (except Africa and high latitudes), which explains why the results calculated using CPC_UNI and MERRA-2 are closer.
COMPARISONS WITH THE ORIGINAL LM01 METHOD.
The LM01 method has proven to be a reliable method for detecting the timing of the rainy season. The modifications applied in this work were designed to strengthen the method and to allow its application consistently around the world. In this section, we present a comparison between the original LM01 method and our “new” adapted version. First, we evaluate the percentage of undefined onset and demise dates using each methodology. Undefined dates are cases in which the LM01 method is not able to define onset or demise (e.g., Fig. 1b) or cases in which the dates are considered outliers (1.5 times the interquartile range considering the LM01 statistics). In most regions, the original LM01 definition of onset fails about 10%–30% of the time (Fig. 8a) and the demise is undefined 5%–20% of the time (Fig. 8b). The new method decreases the number of undefined onset dates by 10%–30% (Fig. 8c), whereas the number of undefined demise dates shows both decreases and increases (Fig. 8d). These seemingly contradictory results can be further explained by Fig. 9.
There is a high level of agreement between both definitions of onset dates, with high correlations (Fig. 9a) especially in monsoonal regions, and biases smaller than 5 days in absolute value in most regions (Fig. 9c). These results show that the new methodology improves the original LM01 definition by decreasing the number of undefined onset dates while maintaining a high level of agreement between defined dates. In contrast, the demise dates are not well correlated (Fig. 9b) and there are large biases between the two definitions of demise dates (Fig. 9d). Nonetheless, the new methodology also improves the definition of demise dates. Note that the regions with strong negative biases (Fig. 9d) coincide with the regions with low undefined demise dates (Fig. 8b) and the regions of increased undefined demise dates (Fig. 8d). The original LM01 method defines demise dates as the maximum in the time series of accumulated precipitation anomalies starting from the dry season. Since the demise of the rainy season is usually less sharp than the onset, strong precipitation events after the end of the rainy season can trick the method into defining the demise of the rainy season too late. In fact, a visual inspection of several locations revealed that the LM01 method defines the demise unrealistically late for several years (not shown). Our new method calculates the demise date retrospectively and, therefore, is less sensitive to anomalous rainy events at the time of the demise of the rainy season. Therefore, the new method improves on the original method by yielding an earlier mean demise date and by excluding demise dates defined unrealistically late.
SUMMARY AND DISTRIBUTION.
We present the first dataset of the characteristics of the rainy and dry seasons (RADS), calculated at the local scale (grid point) using a systematic approach across the globe. The characteristics of the rainy and dry seasons (timing, duration, and accumulated precipitation) are calculated based solely on precipitation data. The dataset was calculated for multiple precipitation datasets (TRMM, CPC_UNI, and MERRA-2) and therefore shares the spatial resolution, temporal range, and limitations of the original precipitation datasets. The characteristics of the rainy and dry seasons are consistent across precipitation datasets. However, there is more agreement among the datasets in monsoonal regions than in nonmonsoonal regions.
It is important to mention that not all regions can be simply defined by a rainy season and a dry season. For example, some regions show two peaks in their seasonal precipitation variations (e.g., over tropical Congo), while others show small seasonal variations with abundant rainfall all year-round (e.g., over the northeastern United States). For such cases, using a single rainy season and dry season to describe the climate is not the best way to study it. The user can use the variable called “perc” that represents the average percentage of the annual precipitation that falls within the rainy season.
Potential applications for this dataset include research focused on improving our understanding of the sources of variability of the timing of the rainy season (on local to regional scales) and monsoons (on large scales). Several studies have focused on understanding the large-scale sources of variability of the timing of monsoons. However, fewer studies have investigated local to regional sources of variability of the timing of monsoons. This dataset can also be applied to research based on the impacts of the rainy and dry seasons on other aspects of geosciences and human activities. Examples include the relationships between the characteristics of the wet (or dry) season and forest fires, availability of water resources, household water security, landslides, rain-fed agriculture, and health impacts such as waterborne diseases or postharvest mold.
The RADS datasets are publicly hosted at the Department of Atmospheric, Oceanic, and Earth Science at George Mason University (ftp://cola.gmu.edu/RADS). The file format is Network Common Data Form (netCDF). More information about RADS and links to datasets generated for each specific precipitation product can be found online (https://climatology.tamu.edu/research/Rainy-and-Dry-Season-RADS.html). We will update the dataset yearly as long as new precipitation data from the source precipitation datasets are available. We will also include RADS datasets calculated using other precipitation data in the future, pending permission of the precipitation data providers. The codes for calculating the characteristics of the rainy and dry seasons are also available online (https://github.com/rjbombardi/onset_demise_rainy_season).
Some regions experience the onset or demise of the rainy season around the end of the year. That means that, for example, the onset of the rainy season of a given year can occur at the beginning of that year and the onset of the subsequent rainy season can occur at the end of that same year, rendering two onset dates of two different rainy seasons during the same year. Because this is a global dataset, it is hard to keep the year of the onset and demise consistent with the reference years in the output file for all grid points. Therefore, the user should always check the file containing the onset and demise year of the rainy (or dry) season in all the grid points in a region of interest. For this same reason, one should be careful when calculating statistics of the onset and demise dates. The average of Julian days 365 and 2 should be Julian day 1 and not Julian day 183.5. The easiest approach is to calculate the average (or median) onset or demise date using circular statistics and then calculate anomalies relative to it. Subsequent statistical analyses can then be applied to the anomalies with no further constraints. It is worth mentioning that the wet season duration and accumulated precipitation for a given year always refer to the year in which that rainy season started (onset year). Likewise, the dry season duration and accumulated precipitation refer to the year in which the wet season ended (or the dry season started; demise year).
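The circular-statistics step can be illustrated with a short Python sketch. Days of the year are mapped to angles on a circle, averaged as unit vectors, and mapped back; anomalies are then the signed shortest distance around the circle. The function names and the fixed 365-day year are assumptions for the example (leap years are ignored).

```python
import math


def circular_mean_day(days, year_length=365):
    """Circular mean of day-of-year values, returned in [0, year_length)."""
    angles = [2.0 * math.pi * d / year_length for d in days]
    x = sum(math.cos(a) for a in angles)
    y = sum(math.sin(a) for a in angles)
    mean_angle = math.atan2(y, x)
    return (mean_angle * year_length / (2.0 * math.pi)) % year_length


def circular_anomaly(day, mean_day, year_length=365):
    """Signed difference day - mean_day, wrapped to half a year either side."""
    half = year_length / 2.0
    return (day - mean_day + half) % year_length - half


mean_day = circular_mean_day([365, 2])
print(round(mean_day, 6))
print(round(circular_anomaly(365, mean_day), 6), round(circular_anomaly(2, mean_day), 6))
```

For the two dates used in the text, the circular mean is day 1 and the anomalies are one day either side of it, whereas an arithmetic mean would give day 183.5. The circular mean is undefined when the dates are spread evenly around the year, which is one more reason to treat regions without a clear seasonal cycle with caution.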
We thank NOAA Climate Prediction Center for making available the CPC Unified Gauge-Based Analysis of Global Daily Precipitation and the National Aeronautics and Space Administration for making available the TRMM analysis and MERRA-2. We thank Dr. Vincent Moron for his comments regarding the calculation of the timing of the rainy season. In addition, Rodrigo Bombardi would like to thank his wife, Dr. Heather Burte, for improving the grammar and style of this work and for discussions of statistics.
A supplement to this article is available online (10.1175/BAMS-D-18-0177.2).