A new approach to define heavy and extreme rainfall events based on cluster analysis and area-average rainfall series is presented. The annual frequency of the heavy and extreme rainfall events is obtained for the southeastern and southern Brazil regions. In the 1960–2004 period, 510 (98) and 466 (77) heavy (extreme) rainfall events are identified in the two regions. Monthly distributions of the events closely follow the monthly climatological rainfall in the two regions. In both regions, annual heavy and extreme rainfall event frequencies present increasing trends in the 45-yr period. However, only in southern Brazil is the trend statistically significant. Although longer time series are necessary to ensure the existence of long-term trends, the positive trends are somewhat alarming since they indicate that climate changes, in terms of rainfall regimes, are possibly under way in Brazil.
1. Introduction
In recent times, heavy rainfall events have received increasing attention from the public and from scientists owing to their large social and economic consequences. Two regions of Brazil, the southeastern and southern regions, have more than 80% of their populations residing in urban areas that grew without planning and were built around rivers. These areas are susceptible to flooding, partly because of terrain irregularity and surface impermeability and mainly because of occasional heavy rainfall. Climate change may contribute substantially to an increase in the number of heavy rainfall events (e.g., Huntingford et al. 2003; Buonomo et al. 2007; Sansom and Renwick 2007) as well as in their intensity (Allan and Soden 2008).
Southeastern Brazil (Fig. 1) has its rainy season between the middle of October and the end of March, with the remaining months forming a drier season in which July is the driest month (Satyamurty et al. 2007). One important meteorological feature of the rainy season of southeastern Brazil is the South Atlantic convergence zone (SACZ), which produces large amounts of rainfall during its active period. This feature has drawn the attention of many precipitation climatology studies in this part of Brazil (e.g., Carvalho et al. 2002; Ferreira et al. 2003). Heavy rainfall episodes also occur in the dry season, mainly because of cold fronts that successfully penetrate northeastward into central Brazil from southern Chile and Argentina. Since heavy rainfall is rare during this period, such situations may cause damage as large as that registered in the rainy season.
In contrast, southern Brazil (Fig. 1) does not have a well-defined rainy season. Its rainfall can be considered well distributed over the year (Rao and Hada 1990). Cold fronts (Andrade 2007) and mesoscale convective systems (Velasco and Fritsch 1987; Anabor et al. 2008) are the most important rain-producing weather systems. Thus, heavy rainfall events can occur throughout the year in this region (Teixeira and Satyamurty 2007).
Highly irregular topography is an important geographic feature of these regions. The most prominent topographic features are the Serra do Mar, a coastal mountain range extending from the state of Santa Catarina (SC) (see Fig. 1) to the state of Rio de Janeiro (RJ), and the Serra da Mantiqueira, another mountain range that separates the states of São Paulo (SP) and Minas Gerais (MG). The interaction of the atmospheric circulation with these mountain ranges (mainly the Serra do Mar) can trigger rainfall intensification processes (Blanco and Massambani 2000) that can cause high daily rainfall totals.
Heavy rainfall is a subjective term and its definition varies significantly. It is often defined in terms of the rainfall at a single station and sometimes in terms of an average over a preestablished area. In some studies, if the rainfall in one day exceeds a certain percentage (e.g., 20%) of the seasonal rainfall (mainly for the rainy season), the event is considered a heavy rainfall event (Harnack et al. 1999; Junker et al. 1999; Carvalho et al. 2002). Other studies have used a fixed rainfall threshold (Teixeira and Satyamurty 2007) on the order of 50 mm day⁻¹ to define heavy rainfall.
When a fixed rainfall threshold is used, some problems may arise. (i) Selection of heavy rainfall events using single-station rainfall values may result in a sample containing highly localized heavy rainfall events. Such events may be a product of only local forcing, such as boundary layer instabilities, and become unsuitable for synoptic system studies. (ii) Selection based on the area-average rainfall (often made using an isohyet analysis) is not suitable in regions with complex terrain, and such a methodology may create a sample that excludes some heavy rainfall events over those areas. Selection of the events based on seasonal rainfall has another drawback, namely, misleading samples of heavy rainfall events, because the criterion does not take into account the intraseasonal oscillations in the rainfall regime. This problem may be reduced when a criterion based on monthly rainfall is used.
In this study, a different approach is presented for the selection of heavy rainfall events, one that takes into account the extreme character of rainfall in southeastern and southern Brazil as well as its spatial variability. The method can, in principle, be applied to other areas as well. Heavy rainfall episodes were selected for the 1960–2004 period using the new approach, and long-term trends in their annual frequency and a possible relationship with the El Niño–Southern Oscillation (ENSO) phenomenon were examined.
2. Spatial rainfall analysis
Daily rainfall data from Agência Nacional das Águas (ANA, Brazilian Water Agency) for the 1960–2004 period were used in this work. The ANA rain gauge network has 796 stations in southeastern Brazil and 476 stations in southern Brazil. However, not all stations have complete rainfall time series, and only those stations with less than 10% missing data were considered for this study. This restriction reduced the number of stations included in the study to 202 (about 25%) in southeastern Brazil and 109 (about 23%) in southern Brazil. Unlike Karl and Knight (1998), no procedure was used to fill in missing data, since filling in missing daily data can introduce bias and errors in the analysis of these series. Figure 2 presents the stations used in this study for the two regions.
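The station screening described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names (missing_fraction, select_stations), the dictionary layout, and the use of None or NaN to mark a missing day are hypothetical and are not taken from the ANA data format.

```python
import math

def missing_fraction(series):
    """Return the fraction of days without a valid observation.

    Assumption: a missing day is stored as None or as a float NaN.
    """
    if not series:
        return 1.0
    missing = sum(
        1 for v in series
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(series)

def select_stations(records, max_missing=0.10):
    """Keep only stations with strictly less than max_missing missing data.

    records maps a (hypothetical) station identifier to its daily series.
    """
    return {
        station: series
        for station, series in records.items()
        if missing_fraction(series) < max_missing
    }
```

A station with exactly 10% missing days is rejected here, following the "less than 10%" wording of the text.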
Unfortunately, parts of these two regions are not well covered by the stations. The sparsest coverage is in the western and central parts of SP, in southeastern Brazil. Other parts, like southern Rio Grande do Sul (RS) and western Paraná (PR), also have a sparse network. As will be seen later, the spatial behavior of rainfall is important for the present work. However, a few areas with sparse data coverage will not invalidate the methodology and the results.
To properly analyze the extreme characteristics of rainfall in these two regions, a cluster analysis was performed with the available rain gauge stations. Cluster analysis is based on the idea of separating stations into homogeneous groups whose identities and characteristics are not known a priori (Wilks 2006). This methodology agrees with Groisman et al. (2005), who recommend that the characteristics of heavy rainfall be averaged over spatially homogeneous regions in order to obtain statistically significant estimates.
Hierarchical clustering is a step-by-step process in which objects (rain gauge station time series) are grouped into clusters. The rain gauge stations are merged by comparing some measure of similarity, often specified by a distance measure. Here, Euclidean distance was used. To form the groups, a rule or decision criterion is needed. The Ward method was used for this purpose; it is a least squares approach that minimizes information loss at each union of objects.
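To make the merging rule concrete, the following is a minimal, unoptimized sketch of agglomerative clustering with Ward's criterion, written for illustration only. It is not the software used in the study. The function names (ward_linkage, cut_tree) are hypothetical, and each "point" stands for one station's vector of values. At each stage the pair of clusters whose union gives the smallest increase in the within-cluster sum of squared Euclidean distances is merged.

```python
def ward_linkage(points):
    """Naive Ward clustering; returns merges as (id_a, id_b, cost, new_size).

    Original points have ids 0..n-1; each merge creates the next free id.
    cost is the increase in the within-cluster sum of squares.
    """
    clusters = {i: (list(p), 1) for i, p in enumerate(points)}  # id -> (centroid, size)
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        ids = sorted(clusters)
        for i in range(len(ids)):
            for j in range(i + 1, len(ids)):
                (ca, na), (cb, nb) = clusters[ids[i]], clusters[ids[j]]
                d2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
                cost = na * nb / (na + nb) * d2  # Ward's merging cost
                if best is None or cost < best[0]:
                    best = (cost, ids[i], ids[j])
        cost, a, b = best
        (ca, na), (cb, nb) = clusters.pop(a), clusters.pop(b)
        centroid = [(na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)]
        clusters[next_id] = (centroid, na + nb)
        merges.append((a, b, cost, na + nb))
        next_id += 1
    return merges

def cut_tree(merges, n_points, n_clusters):
    """Replay the first n_points - n_clusters merges; return one label per point."""
    members = {i: [i] for i in range(n_points)}
    next_id = n_points
    for a, b, _cost, _size in merges[: n_points - n_clusters]:
        members[next_id] = members.pop(a) + members.pop(b)
        next_id += 1
    labels = [0] * n_points
    for label, cluster_id in enumerate(sorted(members)):
        for i in members[cluster_id]:
            labels[i] = label
    return labels
```

The pairwise search makes this sketch cubic in the number of stations, which is acceptable for a few hundred stations but is not how production libraries implement the method.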
There is no unique method to stop the process of merging objects, and the choice of the number of groups can be quite subjective. It was decided to use the distances between the merged groups as a function of the stage of the analysis to determine the number of clusters. Using this “stopping rule,” the hierarchical cluster analysis can be halted at a point just before the distances between the merged clusters become very large. In addition, climatological information along with the topography of the regions was used to inspect the results and to help decide the final number of clusters. Groisman et al. (2005) also used several sources of information to define their climate regions.
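One possible way to formalize such a stopping rule is to look for the largest jump between successive merge distances. The sketch below does only that; it is an assumption made for illustration, since the choice in this study also relied on climatological and topographic judgment, which no automatic rule reproduces. The function name is hypothetical.

```python
def clusters_before_largest_jump(merge_distances):
    """Suggest a number of clusters from the sequence of merge distances.

    merge_distances[i] is the distance at merge stage i, so n objects give
    n - 1 stages. The analysis is halted just before the largest increase.
    """
    if len(merge_distances) < 2:
        raise ValueError("need at least two merge stages")
    n_objects = len(merge_distances) + 1
    jumps = [
        merge_distances[i + 1] - merge_distances[i]
        for i in range(len(merge_distances) - 1)
    ]
    # Index of the last stage performed before the largest jump.
    last_stage = max(range(len(jumps)), key=jumps.__getitem__)
    # After last_stage + 1 merges, n_objects - (last_stage + 1) clusters remain.
    return n_objects - (last_stage + 1)
```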
As mentioned before and as can be seen from Fig. 2, the rain gauge stations are located in regions with heterogeneous terrain. Consequently, a cluster analysis applied directly to rainfall data may produce artificial clusters. This was avoided by first centering (obtaining the anomalies) and then scaling (dividing by the standard deviation) each time series. The standardized time series were used in the cluster analysis.
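The centering and scaling step amounts to the transformation below. This is a minimal sketch: the function name is hypothetical, missing days are assumed to be stored as None, and the sample standard deviation (divisor n - 1) is an assumption, since the text does not say which estimator was used.

```python
import math

def standardize(series):
    """Center a series on its mean and scale it by its standard deviation.

    Missing values (None) are ignored in the statistics and kept as None.
    """
    valid = [v for v in series if v is not None]
    if len(valid) < 2:
        raise ValueError("need at least two valid values")
    mean = sum(valid) / len(valid)
    variance = sum((v - mean) ** 2 for v in valid) / (len(valid) - 1)
    std = math.sqrt(variance)
    if std == 0.0:
        raise ValueError("series has zero variance")
    return [None if v is None else (v - mean) / std for v in series]
```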
Figure 2 also shows the groups resulting from the cluster analysis. Five groups were obtained for southeastern Brazil and four for southern Brazil. Two peculiar groups are 1) the one covering the metropolitan area of Belo Horizonte city (group 5 of southeastern Brazil) and 2) the one covering the Itajaí River valley (group 2 of southern Brazil). The former is the third-largest metropolitan area of Brazil (after São Paulo and Rio de Janeiro), and the Itajaí River valley is well known in Brazil for its frequent floods. It is also important to note that group 3 of southeastern Brazil contains the highest parts of the Serra do Mar mountain range, a feature that could explain why it forms a separate group.
Recently, in November 2008, heavy rainfall events in the Itajaí River valley caused large floods and mudslides that left thousands of people homeless and caused hundreds of deaths. The metropolitan area of Belo Horizonte city (capital of the state of MG) also suffered from floods in December 2008, when 10 people died during a heavy rainfall event that left more than 200 000 houses without electricity.
It is important to emphasize the stability of the cluster analysis results. Because long time series were used to obtain the station groups, it is worthwhile to break these series into a number of shorter series and perform the cluster analysis on each of them. The time series were divided into 15-yr segments, giving three subperiods within the 45-yr period. The results of the cluster analysis for these subperiods (not shown) revealed little or no change in comparison with the results for the entire period. The minor changes observed were related to the displacement of a few stations from one group to another. Thus, the results of the cluster analysis for the 45-yr period can be considered robust and stable.
3. Heavy rainfall definition
As discussed previously, the selection of heavy rainfall events based on single-station rainfall or preestablished area-average values is not suitable for areas with complex terrain. In such places, the rainfall distribution presents strong spatial variability, with some places having very high daily rainfall records. Thus, the information given by the groups obtained from cluster analysis was used to construct a definition for heavy rainfall. A time series of mean precipitation was created for each group as the simple arithmetic mean of all stations belonging to the group. In this way, five mean time series for the subregions of the southeastern region and four for the subregions of the southern region were constructed.
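The group-mean series can be sketched as follows. The function name is hypothetical, and the handling of gaps is an assumption: on each day the mean is taken over the stations that reported, and a day with no reports stays missing. The text does not state how days with missing stations were treated.

```python
def group_mean_series(station_series):
    """Arithmetic mean, day by day, over all stations of one group.

    station_series is a list of equal-length daily series; None marks a
    missing observation and is skipped (assumption).
    """
    n_days = len(station_series[0])
    if any(len(s) != n_days for s in station_series):
        raise ValueError("all station series must have the same length")
    means = []
    for day in range(n_days):
        values = [s[day] for s in station_series if s[day] is not None]
        means.append(sum(values) / len(values) if values else None)
    return means
```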
From these time series, the 99% and 99.9% quantiles for each group were obtained for each month of the year. These climatological quantiles are shown in Fig. 3. Karl and Knight (1998) also used monthly percentiles to evaluate trends in rainfall in the United States.
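A minimal sketch of the monthly climatological quantiles is given below. Two choices in it are assumptions rather than details taken from the text: the quantile uses linear interpolation between order statistics (the rule that NumPy applies by default), and all days with data, including dry days, enter the sample. The function names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    data = sorted(values)
    pos = q * (len(data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (pos - lo) * (data[hi] - data[lo])

def monthly_quantiles(dates, rainfall, q):
    """Pool all years by calendar month and return {month: quantile}."""
    by_month = defaultdict(list)
    for day, value in zip(dates, rainfall):
        if value is not None:
            by_month[day.month].append(value)
    return {month: quantile(vals, q) for month, vals in sorted(by_month.items())}

# Usage with a synthetic two-month series (illustrative values only).
dates = [date(2000, 1, 1) + timedelta(days=i) for i in range(60)]
rain = [float(i) for i in range(31)] + [5.0] * 29
heavy_thresholds = monthly_quantiles(dates, rain, 0.99)
```

With a 45-yr record, each calendar month pools roughly 1350 to 1400 daily values, so the 99.9% quantile is determined by only the one or two largest values of that month.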
Some interesting aspects of the rainfall regimes of these two regions were well captured by 99% and 99.9% quantiles. All groups in southeastern Brazil, except group 3, have substantially lower values for winter months, in comparison with the rainy season. The group 3 behavior in winter months can be attributed to two meteorological conditions. During the winter months, a southward-displaced and stronger subtropical jet acts as a barrier to the meridional movement of cold fronts (Andrade 2007) and dry air masses dominate a large area in central Brazil causing prolonged warm periods (Satyamurty et al. 2007). In such conditions, cold fronts move zonally eastward and affect only the coastal parts of southeastern Brazil. Even when cold fronts are not able to produce large rainfall anomalies, the circulation of the postfrontal high pressure center advects relatively warm and wet air from the Atlantic Ocean over the continent. The interaction of this circulation with the Serra do Mar Range can enhance precipitation in the group 3 region, occasionally producing high amounts of rainfall (Blanco and Massambani 2000).
For southern Brazil, Teixeira and Satyamurty (2007) showed that the circulation associated with the South Atlantic subtropical high pressure center is a second source of moisture for heavy rainfall, besides the moisture transport from the southern Amazon. It seems that post-cold-front high pressure centers can have similar importance for heavy rainfall events in southeastern Brazil. In addition, almost all groups present two small peaks in the climatological 99% quantiles, one in autumn (May) and the other in spring (October). Mesoscale convective complexes form more frequently over northeast Argentina and southern Brazil in these transition seasons (Velasco and Fritsch 1987) and contribute to the high extreme daily rainfall values in the station groups of southern Brazil.
Another point that deserves to be stressed is the high monthly variability in the 99% and 99.9% quantiles, especially in the southern Brazil groups. This monthly variation suggests that identifying heavy and extreme rainfall with parameters obtained for an entire season, referred to as a seasonal criterion, is inappropriate, even though seasonal criteria are common in heavy rainfall studies. The climatological 99% quantiles for southern Brazil in March and in May differ by more than 10 mm, a difference that could matter considerably if a seasonal criterion were used to identify the heavy rainfall events. The main difference between the variation of the 99% and 99.9% quantiles is a small upward shift of the 99.9% quantiles; the monthly variation is nearly the same for the two quantiles throughout the year. Even with this variability, no preferred season for heavy rainfall events can be identified in southern Brazil, unlike in southeastern Brazil.
Finally, all events with mean daily rainfall in excess of the climatological 99% quantile of the month of occurrence for a given station group were considered “heavy,” and those with mean daily rainfall higher than the 99.9% quantile were considered “extreme.” Of course, some events produced widespread rainfall exceeding the 99% and 99.9% quantiles in more than one station group. In these situations, the event was selected only once. Also, if an event produced heavy rainfall on two or more consecutive days, only the first date was selected. Persistence of rainfall events is a different subject and is outside the scope of this work.
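The selection rules above can be sketched as follows. The function name and data layout are hypothetical, and one reading of the text is assumed: a day counts once even if several groups exceed their thresholds, and a run of consecutive exceedance days is collapsed to its first date regardless of which group exceeded on each day.

```python
from datetime import date, timedelta

def select_events(dates, group_series, thresholds):
    """Return the dates of selected events.

    dates:        sorted list of datetime.date, one per day
    group_series: {group: list of group-mean daily rainfall (None = missing)}
    thresholds:   {group: {month: climatological quantile for that month}}
    """
    exceedance_days = [
        day
        for i, day in enumerate(dates)
        if any(
            series[i] is not None and series[i] > thresholds[group][day.month]
            for group, series in group_series.items()
        )
    ]
    events = []
    previous = None
    for day in exceedance_days:
        # Keep only the first date of each run of consecutive days.
        if previous is None or day - previous > timedelta(days=1):
            events.append(day)
        previous = day
    return events
```

Passing the 99% quantiles as thresholds yields the heavy events, and passing the 99.9% quantiles yields the extreme ones.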
4. Annual frequency of heavy rainfall events
Using the information shown in Fig. 3, 510 heavy rainfall events and 98 extreme rainfall events were identified in southeastern Brazil and 466 heavy rainfall events and 77 extreme rainfall events were selected in southern Brazil. Figure 4 presents the annual frequencies of heavy rainfall in these two regions.
Some important features can be seen from the frequencies shown in Fig. 4. In southeastern Brazil, a relatively long period with high frequencies of heavy and extreme rainfall is observed between 1976 and 1983 (Figs. 4a and 4b). In southern Brazil, such a long period is not observed; however, two high peaks, in both heavy and extreme rainfall event frequencies, one in 1983 and the other in 1998, can be seen. Here, a qualitative association between ENSO and heavy and extreme rainfall events in southern Brazil is evident, as the peaks coincide with the two strongest ENSO events, in 1982–83 and 1997–98 (Wolter and Timlin 1998). The same is not true for southeastern Brazil, since heavy and extreme rainfall event frequencies did not present prominent peaks in the 1997–98 period (Figs. 4c and 4d). These statements are also valid when the frequencies of heavy rainfall events in both southeastern and southern Brazil are analyzed for each individual group of stations (not shown). For extreme events, the frequencies for the groups in southern Brazil did not present this qualitative relationship with ENSO (not shown).
The heavy rainfall event frequencies, shown in Figs. 4a and 4c, present positive trends in both regions, which are more evident in southern Brazil. For the extreme events, trends are not so evident (Figs. 4b and 4d). To test for the existence of such trends in the frequencies of heavy and extreme rainfall events, time series techniques should be applied. A common approach is to use regression analysis to test the significance of the trend coefficient. This approach has serious drawbacks, especially when applied to meteorological time series that present some correlation structure. Regression analysis does not take correlation structures into account and, therefore, is not appropriate for trend testing in such series (Woodward and Gray 1993).
An alternative is to use a time series approach that takes into account the autocorrelation structures existing in the heavy and extreme rainfall event frequency time series. Autoregressive-moving-average (ARMA) models have been used for this purpose (e.g., Woodward and Gray 1993).
An ARMA model can be written as

$$\phi(B)\, y_t = \theta(B)\, \varepsilon_t ,$$

in which $\phi(B)$ and $\theta(B)$ are polynomials that involve autoregressive (AR) parameters $\phi$ ($-1 < \phi < 1$) and moving-average (MA) parameters $\theta$ ($-1 < \theta < 1$). Here $B$ refers to the backshift operator, defined as $B^k y_t = y_{t-k}$, and $\varepsilon_t$ represents errors in the time series that are free from serial correlation. The number of autoregressive terms, $p$, and the number of moving-average terms, $q$, in the polynomials, given by

$$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p , \qquad \theta(B) = 1 - \theta_1 B - \theta_2 B^2 - \cdots - \theta_q B^q ,$$

determine the ARMA model order. The value of $\phi_k$ determines how strongly the observation $y_{t-k}$ affects $y_t$ (autoregressive component). Thus, in an AR(1) model, for example, an observation is influenced by the previous observation; in an AR(2) model the observations are influenced by the last two observations, and so on. The value of $\theta_k$ determines how strongly the error $\varepsilon_{t-k}$ from $k$ time steps earlier affects $y_t$ (moving-average component); roughly speaking, the MA order $q$ sets for how many time steps a past disturbance continues to affect future observations. For example, in an MA(1) model with $\theta_1 = 0.5$, the error $\varepsilon_{t-1}$ affects $y_t$ but no later observation. MA models are harder to interpret than AR models, and this explanation should be taken only as a very simple visualization of such models. ARMA models combine AR and MA models, giving more complex models.
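As a worked illustration, the lowest-order cases can be written out explicitly. The expansion below assumes the Box-Jenkins sign convention, in which the moving-average terms enter with a minus sign; some software (R's arima function among them) writes the same terms with a plus sign, so the sign of a fitted $\theta$ depends on the convention in use.

```latex
\begin{align*}
\text{AR}(1):     \quad & y_t = \phi_1 y_{t-1} + \varepsilon_t \\
\text{MA}(1):     \quad & y_t = \varepsilon_t - \theta_1 \varepsilon_{t-1} \\
\text{ARMA}(1,1): \quad & y_t = \phi_1 y_{t-1} + \varepsilon_t - \theta_1 \varepsilon_{t-1} \\[4pt]
\text{AR}(1)\ \text{by repeated substitution}: \quad
                        & y_t = \sum_{j=0}^{\infty} \phi_1^{\,j}\, \varepsilon_{t-j},
                          \qquad |\phi_1| < 1
\end{align*}
```

The last line shows why the two components behave differently: in an AR(1) series a disturbance decays geometrically but never vanishes completely, whereas in an MA(1) series it is absent after one time step.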
The order of an ARMA model is a crucial point in the fitting procedure, but there is no automatic and deterministic way to choose it. Here, the Akaike information criterion (AIC) was used to select the order of the model. The AIC statistic does not by itself determine the best order of a model; rather, it reflects the model's residual uncertainty (Wilks 2006). Therefore, the lower the AIC, the better the fit. Several ARMA models were thus fitted to the time series of event frequencies, and the model with the lowest AIC was selected. In addition, autocorrelation functions (ACFs) and partial autocorrelation functions (PACFs) were computed from the residuals of the models with the lowest AIC values to check for any significant cutoff. These analyses were performed with R software (http://www.r-project.org), a free, open-source package commonly used in statistical studies.
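The selection step can be illustrated with the usual definition of the criterion, AIC = 2k - 2 ln L, where k is the number of estimated parameters and L the maximized likelihood. The sketch below assumes that the log-likelihood of each candidate model has already been obtained from a fitting routine. The function names are hypothetical, and the numbers in the usage example are made up for illustration; they are not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def best_order(candidates):
    """Return the (p, q) order with the lowest AIC.

    candidates maps (p, q) to (maximized log-likelihood, number of parameters).
    """
    return min(candidates, key=lambda order: aic(*candidates[order]))

# Hypothetical fits of a trend model with ARMA(p, q) errors. The parameter
# count assumes intercept, slope, and error variance plus p + q coefficients.
fits = {
    (0, 0): (-100.0, 3),
    (1, 0): (-99.5, 4),
    (1, 1): (-99.4, 5),
}
selected = best_order(fits)
```

In this made-up case the extra AR and MA terms improve the likelihood too little to offset the penalty of 2 per parameter, so the order (0, 0) is retained.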
The significance of the trends obtained from this approach is tested using the ratio between the estimated trend parameter and its standard error, as in Woodward and Gray (1993). When the null hypothesis (trend equal to zero) is true, this ratio follows approximately a Student’s t distribution with N − 2 degrees of freedom, where N is the number of observations in the time series. Here N is equal to 45. Thus, when this ratio is higher than 2.021, the trend estimate is statistically significant at the 95% level.
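For the simplest case, a linear trend with uncorrelated errors (p = 0 and q = 0), the ratio can be computed directly by ordinary least squares, as in the sketch below. The function name is hypothetical. For models with nonzero ARMA order the standard error must instead come from the fitted time series model, which this sketch does not attempt.

```python
import math

def trend_t_ratio(y):
    """Least squares linear trend of y against time steps 0..N-1.

    Returns (slope, standard error of the slope, slope / standard error).
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least three observations")
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    rss = sum((v - intercept - slope * t) ** 2 for t, v in enumerate(y))
    std_err = math.sqrt(rss / (n - 2) / sxx)  # N - 2 degrees of freedom
    ratio = slope / std_err if std_err > 0 else math.inf
    return slope, std_err, ratio

# Synthetic 45-value series: a trend of 0.2 events per year plus an
# alternating disturbance (illustrative values only).
counts = [10 + 0.2 * t + (1 if t % 2 == 0 else -1) for t in range(45)]
slope, std_err, ratio = trend_t_ratio(counts)
significant = abs(ratio) > 2.021  # threshold quoted in the text
```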
Tables 1 and 2 show the orders of the ARMA models fitted to the time series of heavy and extreme rainfall event frequencies. It can be seen that a linear fit (regression analysis, i.e., an ARMA model with p = 0 and q = 0) is sufficient to represent most of the event frequency time series, as indicated by the AIC values (not shown).
In southeastern Brazil there is no significant trend in the frequencies of heavy and extreme rainfall events, except for the region delimited by group 2, but there is a small positive trend (about one additional event every 20 years). Somewhat higher positive trends were found in the entire southern Brazil region and in the regions delimited by groups 1 and 4. As in southeastern Brazil, trends in the frequency of extreme rainfall events are very low and do not seem to be of practical importance. Trends of one additional event every 10 years in groups 1 and 4, and almost one additional event every 5 years for all of southern Brazil, were found. Figure 5 shows the frequencies of heavy rainfall events in these two groups of southern Brazil.
These figures may not be impressive at first inspection, but they can be important when long-term analysis is the objective. A trend of one additional event every 5 years amounts to a total of nine additional heavy rainfall events over the entire 45-yr period. The methodology used here may be too strict in selecting heavy and extreme rainfall events, since it requires that the extreme quantile of the mean daily rainfall over a number of stations be exceeded. However, these small but significant trends may be a signal that climate changes are already altering rainfall behavior in these Brazilian regions. The reasons for such trends are outside the scope of this study but deserve special attention in future work.
Since ANA applies quality control to their data, which also includes consistency analysis, the trends in frequency of heavy and extreme rainfall events for both regions could hardly be attributed to long-term errors in rainfall data. Another point that reinforces these results is that an areal average, and not information from single stations, is used. If some erroneous rainfall data still survived the ANA quality control, these errors do not strongly influence 99% climatological quantiles obtained from area-averaged time series.
Positive trends in extreme precipitation events have also been found in other places (Frich et al. 2002; Sansom and Renwick 2007). Climate forecasting studies (Huntingford et al. 2003; Buonomo et al. 2007) suggest that these trends will continue in the future and that they could be a consequence of human influence on climate. Although it is not possible to state that the significant positive trend in southern Brazil is, in fact, a consequence of some process of natural or human-induced climate change, it is an alarming result, and an investigation of the climate dynamics involved should be performed to clarify the issue.
5. Summary and conclusions
In this work, heavy and extreme rainfall events that occurred during the 1960–2004 period in southeastern and southern Brazil regions were investigated. Daily rainfall was first analyzed through cluster analysis to obtain station groups in which rainfall had similar behavior. For each group of stations, the extreme aspects of their mean rainfall time series were used to define heavy and extreme rainfall events. Climatological monthly 99% and 99.9% quantiles were used to select heavy and extreme rainfall events, respectively, in both regions.
A total of 510 (98) and 466 (77) heavy (extreme) rainfall events were identified in southeastern and southern Brazil, respectively. These events presented positive linear trends over the 45-yr period, especially in southern Brazil, where the trends were statistically significant. One interesting observation is that, although the rainfall regimes are different and not all trends are statistically significant, the trends in heavy rainfall events for both regions as a whole are positive.
Positive trends shown in this work, although not very high, are an alarming result since they show that climate change processes may be under way. However, it is not possible to affirm that these results are really a consequence of a human-induced climate change, and deeper investigations are necessary in both regions, where trends for extreme events had much lower values than those for heavy rainfall events.
The annual frequencies presented here also indicate that ENSO modulates the number of heavy rainfall events, and even the number of extreme events, in southern Brazil, as shown in Fig. 5. There is no indication of modulation of the southeastern Brazil events by ENSO, since there are high frequencies in both 1975–76 and 1982–83, years characterized by La Niña and El Niño episodes, respectively (Figs. 4a and 4b). In addition, the intense 1997–98 El Niño episode appears to have had no influence on the number of events (Fig. 4a). In contrast, high frequencies of heavy rainfall events in southern Brazil (Fig. 4c) appear to be related to El Niño episodes: 1972, 1982–83, 1986–87, 1991–92, 1997–98, and 2002, for example. This is not surprising, since the influence of El Niño on rainfall in this part of the world is well known (Ropelewski and Halpert 1989; Rao and Hada 1990; Grimm et al. 2000).
The heavy and extreme rainfall definitions used in this study are useful and flexible enough to be applied in any region, permitting wide cross comparisons between studies for different regions. The cluster analysis provides areal information collectively for those stations at which rainfall behaves in a similar way. The creation of homogeneous areas and the use of an area-mean rainfall time series avoid selecting events that are exclusively local. The extreme quantiles of the station groups provide the criterion for a statistical selection of heavy and extreme events at any time of the year, bearing in mind that these kinds of events may occur in any season.
Another important aspect of this definition is the rain gauge network used, specifically the number of stations and the completeness of the time series. Keeping the number of stations constant throughout the analyzed period avoids some problems that may arise when additional stations are used in different subperiods, such as an artificial increase in the number of heavy rainfall events with time. Obviously, the more complete the time series, the more reliable the results obtained from them.
It is worth mentioning that the low values of the significant positive trends found in the frequency of heavy and extreme rainfall events in these two regions may be a drawback of this methodology, since its selection criterion could be very restrictive, leading to a lower number of events identified. Nevertheless, it may be preferable to have fewer events with true impact over some area than a large sample of very localized extreme events that may lead to unreliable conclusions.
Another point that deserves further investigation is the cause of the large number of heavy and extreme rainfall events in southeastern Brazil between the end of the 1970s and the beginning of the 1980s. This issue will be addressed in future work.
Acknowledgments. This research was supported by the National Council for Scientific and Technological Development (CNPq). The second author also thanks FAPEAM, Manaus, for support during 2007–09. The authors also thank the anonymous reviewers for their comments and suggestions, which helped to improve the quality of this work.