## Abstract

The occurrence of rainfalls of high magnitude constitutes a primary natural hazard in many parts of the world, and the elaboration of maps showing the hazard of extreme rainfalls has great theoretical and practical interest. In this work a procedure based on extreme value analysis and spatial interpolation techniques is described. The result is a probability model in which the distribution parameters vary smoothly in space. This methodology is applied to the middle Ebro Valley (Spain), a climatically complex area with great contrasts because of the relief and exposure to different air masses. The database consists of 43 daily precipitation series from 1950 to 2000. Because rainfall tends to occur highly clustered in time in the area, a declustering process was applied to the data, and the series of daily cluster maxima were used hereinafter. The mean excess plot and error minimizing were used to find an optimum threshold value to retain the highest records (peaks-over-threshold approach), and a Poisson–generalized Pareto model was fitted to the resulting series. The at-site parameter estimates (location, scale, and shape) were regressed upon a set of location and relief variables, enabling the construction of a spatially explicit probability model. The advantages of this method to obtain maps of extreme precipitation hazard are discussed in depth.

## 1. Introduction

The analysis of extreme events is currently one of the lead research topics in climatology because of the potentially dangerous character of the phenomena (Obasi 1994; Bruce 1994). An increase in the interest of extreme events has been detected in the last years within the climatic change paradigm. One of the most accepted hypotheses is that of a future increase of extreme events resulting from the increase of climatic variability (Katz and Brown 1992; Groisman et al. 1999). In different regions a clear growth of climate extremes has been reported (Karl et al. 1995; Groisman et al. 2004), which has had great negative impacts on society and the environment (Kunkel et al. 1999; Easterling et al. 2000).

Extreme precipitation events constitute a primary natural hazard because they are in the origin of degradation processes like severe erosion, landslide triggering, or flash floods, which can have regionally devastating power and pose a severe hazard to lives and property. Mapping the hazard of extreme precipitation allows the spatial distribution of this climatic feature to be assessed, but also permits the estimation of hazard at locations where no climatic record exists. Climatic hazard maps, in general, can also be useful as a part of decision support systems, especially in the fields of regional planning and environmental management. The main objective of this paper is to describe a method to obtain extreme rainfall hazard maps.

The problem of mapping the spatial distribution of extreme precipitation involves the need to translate the point information registered at different climatic stations in a region to a spatially continuous variable. Also, different variables have been used to describe the extreme precipitation. Thus, Prudhomme (1999) and Prudhomme and Reed (1999) used the median of the annual maximum daily precipitation. Lorente and Beguería (2002) used the median of the annual maximum precipitation accumulated in 1, 3, 5, and 7 days. One might object that a median value is not the most adequate variable to express extreme events. Attempts using absolute maxima, however, have had very little success (García Ruiz et al. 2000).

Instead of using a single statistic like the median of the annual maxima to describe the occurrence of extreme rainfalls in a region, the extreme value theory provides a much more complete analysis of the statistical distribution of extreme rainfall events, allowing for the construction of magnitude–frequency curves. Derived statistics like quantile estimates (average magnitude of an event of given return period) have been widely used to express the degree of hazard related to extreme precipitation at a given location. Combined with spatial interpolation techniques, they have also been used to assess the spatial distribution of the hazard of extreme rainfalls. For example, Gajic-Capka (1991) and Lana et al. (1995) mapped quantile estimations obtained by fitting a Gumbel model to series of annual maxima by means of local interpolation methods. Beguería and Lorente (1999) used ordinary regression against relief parameters to provide estimations of the 100-yr daily maximum rainfall also estimated by the Gumbel model at several points in the study area. Weisse and Bois (2001) compared kriging and ordinary regression against topography to model 10- and 100-yr rainfall estimates for rainfall duration of 1–24 h. The use of the extreme value theory in these examples, however, was reduced to at-site (independent) calculation of the extreme quantiles. The existence of a spatial structure in the distribution of the model parameters was not addressed. In addition, if a map was wanted corresponding to a different return period or hazard level, new at-site estimations and interpolation were needed.

In this paper we explore the construction of a spatially continuous probability model in which the parameters of a generalized Pareto (GP) distribution are allowed to vary spatially. The parameters are estimated locally by common at-site methods from the climatological records in the study area, and then are distributed using spatial interpolation techniques. Once the spatial models of the distribution parameters are known, different maps of probabilities, quantiles, and return periods can be derived according to the needs of the user, without undertaking new spatial interpolations. Moreover, relative to using a set of unrelated at-site probability models, the analysis of the spatial distribution of the model parameters results in a much more robust regional probability model. The spatial distribution of the parameters can also be interesting from a theoretical point of view.

## 2. Study area and elaboration of the database

We have tested our methodology in a Mediterranean area in which extreme precipitation is frequent and causes important social, economic, and environmental damage (White et al. 1997; García-Ruiz et al. 2000; Lasanta 2003). The study has been carried out in the middle Ebro Valley (in the northeast of Iberian Peninsula) (Fig. 1).

Figure 2 shows the spatial distribution of 1) the mean annual precipitation, 2) the average annual daily maxima, and 3) the ratio of the annual maximum event to the total annual rainfall in percentage for the period of 1951–2000. The study area covers 22 973 km^{2}. Relief isolates the valley, largely impeding the maritime influence, resulting in a continental-like climate. These features similarly determine the complexity of its climate, with its principal feature being its aridity (Cuadrat 1991; Creus and Ferraz 1995; Creus 2001).

The annual precipitation oscillates between 300 and 450 mm in the valley bottom, and more than 800 mm in the northern and southern mountain areas. A high interannual variability is observed as a consequence of the alternation of dominant atmospheric patterns. There are years in which the precipitation greatly exceeds the mean value; in other years, the values are less than one-third of the mean, and long drought periods are particularly frequent (Vicente-Serrano and Beguería-Portugués 2003).

Extreme precipitation events are frequent in Mediterranean regions. In the Iberian Peninsula the most extreme events are recorded in the Mediterranean coastland areas (Romero et al. 1998) as a consequence of the higher influence of Mediterranean convective cellules that affect importantly these areas (Camarasa 1993; Llasat and Puigcerver 1994; Millán et al. 1995; Llasat 2001). This causes precipitation to be concentrated in a small number of days and annual maxima events represent a high percentage of the total annual precipitation (De Luis et al. 1996; Martín-Vide 2004). This influence can be seen in the study area, where the highest irregularity (map 3) is found on the southeast, close to the Mediterranean Sea.

The original database consisted of 380 series of daily precipitation with different lengths. Because of the frequent changes of position of the observatories within the same locality we created new series merging the data of the observatories located in the same town (<10 km of distance). We then selected the series with less than 15% of daily data lost (Karl et al. 1995), and with a common record period covering the range from 1951 to 2000. We adopted this strict criterion to ensure that all of the series were sufficiently long to provide reliable estimates of the probability of extreme events (Jones 1997), and that they covered the same record period to avoid variability in parameter estimation resulting from interanual climatic cycles. This led to a final database of 43 observatories (see location in Fig. 1).

The reduction of the spatial coverage of the weather stations can introduce some limitations to the analysis of climatic variables. The use of a high density of weather stations facilitates the mapping process independently of the interpolation method selected (Weisse and Bois 2002; Vicente-Serrano et al. 2003). Nevertheless, long climatologic time series are not frequent in the Iberian Peninsula because of frequent changes in the location of observatories. Also, the series can greatly differ in the period of record.

The length of the dataset is an important aspect in the analysis of climatological variables, but in the case of extreme value analysis it becomes critical, because the samples are reduced to only the highest values in the range of the variable (Jones 1997). The problem is even higher under Mediterranean climate, which is characterized by high interannual variability of precipitation. In Spain significant differences in the annual averages have been found between decades (Esteban-Parra et al. 1998; Rodríguez-Puebla et al. 1998; Rodríguez et al. 1999), which implies the need of long series to provide robust estimations. Significant temporal variability and trends in extreme events have also been found in different areas of the Iberian Peninsula (García et al. 1995; González-Hidalgo et al. 2003).

The adequate length of series needed to obtain reliable predictions about the frequency of extreme events is subject to debate. Benson (1960) indicated that the 50-yr return interval estimations, by means of annual maxima following a Gumbel distribution with a 25% error, require 39 yr of data. Porth et al. (2001) analyzed the adequate sample size for return interval estimation and found that 20 yr of data provide estimates with a 20% rate of error, and in order to obtain the return periods with less than 20% error 25 or more years of data were necessary.

Considering the previous discussion, we decided to maintain a large temporal extent of the database (50 yr), despite the reduction of spatial coverage. Although the methods used to analyze the probability of the extreme events do not require that there be a complete series, we decided to fill in the lost data to be certain that all of the observatories recorded the same precipitation events, to avoid the possibility that the lack of a high event in only one observatory could affect the frequency distribution of the variable in that location.

The process of completing missing daily data is highly problematic. It is common practice to fill the gaps by means of statistical techniques. Karl et al. (1995), Karl and Knight (1998), and Brunetti et al. (2001, 2002) filled missing values by generating random rainfall amounts based on the probability distributions of the variables studied. The goal of this procedure was not giving a realistic estimate of the unknown daily values, but to obtain data series of equal length without changing the probability distributions of rainfall amounts. To fill the gaps, we followed the procedure described by Romero et al. (1998) to create a complete daily precipitation dataset in the Mediterranean coastland of the Iberian Peninsula. They used the weighted average of the records at the nearest stations to fill the gaps in the weather station under study. The spatial criterion that they used to select the auxiliary weather stations had a distance limit of 60 km. In our case we selected only the weather stations around 15 km of the station to be completed. The high density of weather stations allowed records to be completed with a high reliability. Only on the rare occasions (<5% of total missing data) when no data were available at the circle of 15 km did we consider a wider radius of 30. A total of 47 317 daily data were filled, with an average of 1100 data in each weather station (5.7% of the data) and a range from 31 (0.16%) to 2773 (14.88%) records.

The quality control and homogeneity of daily climatological records is very complex. There are no standard tools in the scientific literature to test the homogeneity of daily datasets, as is the case for monthly datasets. For this reason, different authors have addressed the problem by first aggregating the data from daily to monthly records, and then applying the homogeneity tests to the aggregated series (Manton et al. 2001; Brunetti et al. 2002). This has been the method followed in this study.

Thus, we tested the monthly aggregated series against inhomogeneities using the standard normal homogeneity test developed by Alexandersson (1986) and Alexandersson and Moberg (1997). For this purpose the AnClim software was used (Štìpánek 2004). Because we did not have the metadata of the weather stations we decided to apply a test of relative homogeneity. The relative homogeneity method compares the temporal evolution of the series under study with the temporal evolution of a reference series, which records the evolution of the neighbor stations. The procedure developed by Peterson and Easterling (1994) was followed to create each reference series, consisting of using the average of the five highest correlated series with the series to check (using the first difference series).

We only found significant shifts in five monthly series, which do not coincide with changes in the reconstruction of the series. The inhomogeneities were neither important nor detected in other months on the same observatory. Moreover, we did not have the metadata of the stations to be sure that the shifts coincide with changes in the location of the observatories. The shifts were recorded in small population centers or depopulated areas in which urban effect does not exist. For this reason we considered that there were no significant inhomogeneities in the dataset used. However, some inhomogeneities may remain in the data at a daily scale, which can affect the distribution of extremes in the dataset and might have some impact on the accuracy of our results.

## 3. Methods

### a. Hazard estimation by the extreme value theory

The extreme value theory deals with the analysis and estimation of the tails of random variables, adjusting different distribution functions to the highest or lowest observations in a given dataset (see Hershfield 1973; Smith 1990 and 2003; Reiss and Thomas 2001). This allows for the estimation of the probability of events greater than those observed during the period of record. The mostly used approaches for this are the series of maxima and the partial duration (PD) series.

The exceedances or PD series approach has been used for some time (see, i.e., Todorovic and Zelenhasic 1970; Davison and Smith 1990). It is based on censoring the original sample at a certain threshold value *x*_{0}, taking only the exceedances over that threshold: *y _{i}* =

*x*

_{i}, ∀

*x*

_{i}>

*x*

_{0}. In contrast with the series of series of maxima, like annual maxima, the PD approach allows more cases to be included in the sample, resulting in a much more accurate estimation of the parameters (Madsen et al. 1997). As demonstrated by Pickands (1975), the series generated by exceedances over a threshold tend to converge to a GP distribution, provided that the threshold value

*x̃*

_{0}is high enough. In the form commonly applied to PD series, the GP distribution is described by a shape parameter

*κ̃*and a scale parameter

*α̃*, and has the following cumulative distribution function:

It contains the exponential distribution as a special case when *κ̃* = 0 (second expression), a form that was used extensively in the first applications of the PD scheme. For *κ̃* < 0 the distribution is long tailed, and for *κ̃* > 0 it becomes upper bounded with endpoint at −*α̃*/*κ̃*. These two forms of the GP distribution are sometimes referred in the literature as the Fréchet and beta distributions, respectively.

The probability of an exceedance *Y* is frequently expressed by its return period *T*, which is the expected time between two consecutive occurrences of the event. Under the assumption of a homogeneous Poisson process, the return period is the inverse of the probability of exceedance; expressed in years, it is equal to

where *λ* is a frequency parameter equaling the average number of occurrences of *Y* per year in the original sample. Expressed in the original scale, this gives

The return period is only an expected, or most probable, value The probability that an event of magnitude *X* (expressed in the original scale) will occur at least once in a period of *t* years is

A common problem when analyzing natural data series is the likely presence of serial dependence or persistence in the process, resulting from the rainfall events that tend to occur clustered in groups. This is especially true for Mediterranean climates, where long periods without rainfall are followed by events that last several days (Martín-Vide 1989; Llasat and Puigcerver 1994; Martín-Vide and Llasat 2000). This constitutes a major problem, because persistence affects the premise of homogeneous random process. For this reason we adopted a cluster approach, considering that the series of clusters of rainfall (consecutive days with precipitation higher than 0 mm) adapt better to a random process (Beguería 2005). This is a common approach in the analysis of hydrological extremes (see, e.g., Cunnane 1979), and has been used with other environmental variables as well (e.g., Smith 1989). Thus, series of cluster maxima (maximum daily intensities in the clusters) were constructed for each station, and the remaining analysis was performed upon this dataset.

The following procedure was followed to perform the PD–GP analysis. First, a threshold value *x̃*_{0} was needed for extracting the exceedance series at every station. This is one of the most important issues in PD series modeling because the threshold level controls the size of the data sample. An important property of the GP model is that the shape parameter remains constant independent of the threshold (threshold stability), so the final predictions would be equivalent whichever the threshold value. Generally, a low threshold is to be preferred to a high one, because it maximizes the amount of data used for estimation. However, if the threshold is too low, the asymptotic property would not apply, and the GP model would not fit the data correctly.

An appropriate tool to assess the validity of the GP assumption is the mean excess plot, which is a plot of the average mean excess over a threshold against the value of the threshold. If the process follows a GP distribution at a given threshold value *x̃*_{0}, then the mean excess plot should appear to be approximately linear from this value on. Considering this, we selected the highest 1200 cluster maxima values at every station, corresponding approximately to the 93.5th centile of the original data series (*λ* = 23.7 events per year). This can be considered a sufficiently low threshold level for a starting point. Mean excess plots were constructed using increasing threshold values to evaluate the adequacy of the model to the data. In all cases the series generated were found to fit the GP model adequately at the lower threshold values.

Figure 3 shows, for example, the mean excess plot for the station of La Sotonera. As can be seen, the dots follow a straight line from the lowest value of the threshold, which equals 7 mm. Additionally, we tested the convenience of the threshold by means of the probability plot–weighted correlation coefficient (PPWCC; see below and also shown in Fig. 3). As can be seen, the process exhibits the highest values of this statistic around the selected threshold.

We estimated the GP parameters using the method of probability-weighted moments (Hosking and Wallis 1987),

and

where the caret designates that the parameter is estimated from the sample; *β _{r}* is the order

*r*probability-weighted moment, obtained as (unbiased estimator; Landwehr et al. 1979)

For assessing the goodness of fit of the GP models to the data, probability plots (PP) were constructed, which compare the observed exceedances *Y* and the expected (GP predicted) ones *Ŷ*. This are obtained by solving Eq. (1) for *y _{j}*:

where EmpCDF(*j*) is the empirical cumulative distribution of *y _{j}*, obtained using the plotting position formula of Landwehr et al. (1979):

where *j* is the position of *y _{j}* in the series of exceedances sorted in ascending order. The values of

*A*= −0.15 and

*B*= 0 were used, as recommended by Landwehr et al. (1979) for similar long-tailed distributions. The probability plot is a graphical technique for assessing if the dataset follows a given distribution function (see Chambers et al. 1983).

A probability plot–weighted correlation coefficient statistic was computed as a measure of goodness-of-fit (based on Filliben 1975):

where *y* and are the weighted averages of *y* and *ŷ*, respectively. Because the number of small exceedances is much greater than the number of high exceedances, a weighting function *ω _{j}* has been used in the calculation of the statistic;

*ω*is proportional to the probability of exceedance, thus giving more importance to the highest, less-frequent observations in the sample:

_{j}Additionally two error statistics, the probability plot–weighted mean bias error (PPWMBE) and the probability plot–weighted root-mean-square error (PPWRMSE), were computed to compare the results of the at-site GP models with the results from the regional model (see below). The PPWMBE represents the average error, and the PPWRMSE represents the standard deviation or errors. The prefix for probability plot PP stands, again, to reflect the fact that the comparison is made using two estimators of probability, and not directly between real and estimated values, as is common. The expressions for these statistics are described in Table 1.

### b. Hazard mapping using spatial interpolation methods

The results of the previous section are a series of at-site GP models from which it is possible to obtain return periods, probabilities of extreme events, etc., only at places where a climatic station is located. The purpose of this section is to extend this analysis to the rest of the study area, allowing for the elaboration of maps showing the spatial distribution of the above-mentioned extreme statistics. As explained above, our approach has been to develop a spatial probability model whose parameters are dependent on the location of the point of interest. The local parameters *α̃*, *κ̃* and *x̃*_{0}, obtained by the procedure outlined above, were thus the dependent climatic variables to model. With respect to the frequency parameter *λ* used in Eqs. (3) and (4), we observed that it did not present significant variability between stations, so we used a fixed value corresponding to the average of the at-site values.

There are different methods to predict the distribution of spatial variables: global, local, and geostatistical techniques (Burrough and McDonnell 1998). It has been shown that in mountainous areas and in regions with complex atmospheric influences, such as the middle Ebro Valley, the local interpolators and geostatistical methods, which do not use external variables, do not show the real spatial variability of climatic variables at different scales, and yield higher prediction errors (Daly et al. 2002; Weisse and Bois 2002; Vicente-Serrano et al. 2003). In fact, these methods usually provide bad results if the sample network is not dense (Dirks et al. 1998).

On the other hand, regression is a global approach based on empirical relationships between the variable of interest and other spatial variables, and tend to produce maps with a lower degree of generalization. But also, by applying smoothing filters of different sizes to the independent variables it is possible to capture the variability of the target variable at different spatial scales. Regression-based techniques adapt to almost any space and usually generate adequate maps (Goodale et al. 1998; Vogt et al. 1997; Ninyerola et al. 2000). We thus selected a multiple regression scheme based upon location parameters (powers of the coordinates of the point) and other spatially distributed independent variables. The value of a climatic variable at nonsampled points is predicted by the following transference function:

where *z* is the predicted value at the point (*x*), *b*_{0}, . . . *, b _{n}* are the regression coefficients and

*P*

_{1}, . . . ,

*P*are the values of the different independent variables at point

_{n}*x*. A forward stepwise procedure with probability to enter the set at 0.01 was used to select only the significant variables.

A list of the independent variables introduced into the model is shown in Table 2. The independent variables were generated from a digital elevation model (DEM) at a resolution of 1 km using the MiraMon geographic information system (GIS) software. Latitude and longitude (km) were calculated in universal transverse Mercator (UTM) system 30°N coordinates. They were used to fit first- and second-order trend surfaces to the data, which can account for the broad global spatial trends. Elevation (km) could be extracted directly from the DEM and introduced to the regression model, considering the well-known effect of elevation on precipitation. Slope and relief (difference in height between the point of interest and the highest point in a circle of 2.5 or 25 km) were directly derived from the DEM. In a different way, they both give information about the topography energy in the surroundings of a point, yielding high values in mountain areas and lower values on flatter areas, independent of their elevation. Also, the annual potential incoming solar radiation was estimated from the DEM and included in the model (Pons 1997). Low-pass filters with radii of 2.5 and 25 km were applied to elevation, slope, and incoming solar radiation maps in order to measure the influence of these variables at different spatial resolutions.

The normality of each variable was tested by the chi-squared test; *x̃*_{0} showed a normal distribution (*p* = 0.70), and this was also the case for *κ̃* (*p* = 0.98). In the case of the *α̃* parameter a logarithmic transformation was needed to attain normality (*p* = 0.95).

Because of the likely presence of correlation between several of the independent variables, a conservative value of 0.01 was set for a variable to enter the stepwise procedure, as recommended by Hair et al. (1998). However, because colinearity problems can arise even with correlations as low as 0.3, a condition index test [Statistical Package for Social Sciences (SPSS) software, version 12] was applied to the regression models. This test flags the models for which the coefficient uncertainty is too large because of multicolinearity, and recommends the use of a model with less number of parameters.

The spatial database was then sampled at the locations corresponding to the climatic stations. The regression analysis was performed upon this database using SPSS, version 12, software, and the final maps showing the distribution of the three parameters were obtained using ArcView, version 3.2, GIS.

We used a jackknife cross-validation method to validate the maps of GP parameters. A recursive procedure removed one station at a time and calculated the regression model with the remnant points, storing the predicted values of the parameters for the removed station. This procedure was repeated until all of the stations were evaluated, and the predicted values of the parameters were compared with the known values obtained before. The error statistics described in Table 3 (Willmot 1981) were used to compare both results.

Because of the global nature of the regression-based techniques, some local features can still not be represented in the model. For this reason, maps of the residuals (observed minus predicted values) of the parameters were visually inspected for spatial structure of the errors. A certain degree of spatial correlation was found in the case of the location and scale parameters, and a random pattern in the case of the shape parameter. For this reason, correction maps were constructed only for the two first parameters.

A local adaptive method, splines with tension (*ϕ* = 400), was selected for the interpolation of the errors (Mitasova and Mitas 1993). The final maps of *x̃*_{0} and *α̃* were obtained by adding the regression estimates and the maps of interpolated errors.

## 4. Results

### a. At-site extreme value analysis

The results of the at-site hazard estimation are shown in Fig. 4. For every station, the maximum daily rainfall is related to its average return period. Note the different shapes of the curves resulting from the use of local (not regionally averaged) *κ̃* estimates.

Goodness-of-fit and error statistics for the at-site GP models are given in Table 4. The mean bias error shows the very small skew of the predictions to the right (note that the scale is the same as that of the original variable: millimeters of exceedance above the threshold). The probability plot error and goodness-of-fit statistics can also be considered very good. The average PPWCC is 0.972.

Two probability plots are given as examples in Fig. 5, relating the observed exceedances with the theoretical values computed using Eq. (6). The plots show one of the stations with best agreement (La Sotonera, PPWCC = 0.999) and one with the worst (Escatrón, PPWCC = 0.922). A certain overestimation of the highest values is seen in the second case, but in general the level of agreement can be considered very good. Very similar plots were found in the rest of stations.

### b. Maps of the GP parameters

The standardized beta coefficients of the multiple regression of the GP parameters are shown in Table 5, along with the goodness-of-fit (*R ^{2}*) and error statistics for the final models. Only the significant variables are shown. The three regressions were significant at a 95% confidence level. The

*R*

^{2}values obtained can be considered good, especially for the origin (

*x̃*

_{0}) and scale (

*α̃*) parameters. The regression of the shape parameter (

*κ̂*) obtained a lower

*R*

^{2}, as expected from the greater uncertainty involved in its estimation, but it still was significant. Error statistics for the regression models are also shown in Table 6. For

*x̃*

_{0}, the mean absolute error was 0.03, a small value when compared with the range of the variable (0.1–7.6). The agreement between observed and predicted values is very high (0.99). The same occurs with

*α*, which has a mean error of 0.12 for a variable range of 6.6–19.5. For

*κ̃*the relative error is higher because of the great uncertainty of this parameter (0.06, range from −0.175 to −0.025).

Figure 6 shows the maps of the different parameters. The value of *x̃*_{0} is related with the latitude, showing a gradual increase to the north (map 6.1). Also, *x̃*_{0} shows relation with the relief, but at a broad scale (10-km average). The spatial distribution of *x̃*_{0} mimics quite well the general distribution of the precipitation in the area.

The distribution of *α̃* is slightly different, as shown in map 6.2; *α̃* is negatively related with the longitude and the distance to the Cantabrian Sea, capturing the gradual Mediterranean influence on climate toward the east. The consequence for the probability distribution of the extreme events is an increase in its variance, resulting in a more irregular distribution of the events. At midrange scale *α* is also related with the elevation and relief (2.5 km).

In contrast with the high spatial detail of the previous parameters, *κ̃* presents a coarser spatial distribution, because it is only related to the latitude and longitude. The result is a gradient from the north toward the southwest. The lowest values of *κ̃* resulting in greater importance of the right tail of the distribution and hence of the extreme events, are found along the southwest limit of the study area.

Because the spatial regression of the parameters introduces a new source of uncertainty to the estimations, new validation statistics were calculated for the resulting models. Table 6 reports the error and goodness-of-fit statistics obtained by comparing the empirical exceedance probability *Y* with the theoretical one obtained using the standard precipitation index (SPF) method *Ỹ*. The results from the validation were very similar to the ones obtained with at-site-estimated parameters (Fig. 5), which supports that the SPF model is a reliable method for estimating the degree of hazard of extreme rainfalls.

### c. Spatial distribution of extreme precipitation hazard

With the parameter maps in Fig. 6 the estimation of return periods of extreme events and quantiles becomes possible at sites without climatic data records. The following figures are examples of this: Fig. 8 shows the distribution of the expected return period of a 100- mm event, Fig. 9 shows the maximum daily expected precipitation for different return periods, and Fig. 10 shows the probability that an event of 100 mm yr^{−1} occurs within different time periods. Other maps can be of course obtained for different requirements.

In general, the greatest hazard or heaviest precipitation is expected in the foothills of the Pyrenees and the Iberian Range, in the north and southeast of the study area, respectively, coinciding with the greatest elevation and relief energy. But not only does the relief explain the spatial distribution of the hazard of extreme precipitation, the atmospheric influences are also important. The southeast regions are highly affected by the Mediterranean perturbations that cause precipitation events of high intensity, mainly during the autumn, because of the instability caused by the contrast between the hot sea surface and the cold air masses that arrive in this latitude (Millán et al. 1995; Serra et al. 1996). These perturbations become weaker as they ascend the Ebro Valley (Creus and Ferraz 1995), and this fact can explain the reduction of the extreme precipitation hazard to the west.

Also, the higher magnitude of precipitation expected in the northern mountains is explained by the influence of the perturbations associated with the polar front that arrives from the west. The effect of these fronts on intense precipitation is greatly increased by the mountain ranges that oppose the west flows, creating a great contrast with the center of the valley (Creus and Ferraz 1995; Ruiz 1982). Moreover, the mountains located to the north are affected by southwest flows associated to the negative phase of the North Atlantic Oscillation in winter (Martín-Vide and Fernández 2001). These are regional flows that affect the whole of the Iberian Peninsula and are reactivated in the northern mountains because of vertical movements of the air masses (Esteban et al. 2002; Vicente-Serrano 2004).

The central areas of the valley are isolated from these air masses because of the mountains located in the north and south. The aridity is high in this area and extreme droughts are highly frequent (Vicente-Serrano and Beguería-Portugués 2003). Nevertheless, the low hills located in the center of the valley (<700 m MSL) modulate the spatial hazard of extreme precipitation events because of their role in the convective movements, which cause high precipitation during summer months in the center of the valley (Cuadrat 1999).

## 5. Discussion

The methodology presented in this paper has some points in common with the regional approaches to extreme flood estimation (see Cunnane 1988). Regional approaches were designed to maximize the information contained in different data series within a homogeneous region, with the result of providing a more robust estimation of the distribution parameters. This applies specially to the shape parameter (*κ̃*), which is subject to a great uncertainty because high-order moments are involved in its estimation. For this reason a common average value of *κ* is used in most regional approaches, while the other two parameters can be estimated locally or averaged too. Regional methods have been extensively used for the estimation of extreme floods, but some examples exist of their use to model extreme rainfall (Alila 2000; Fowler and Kilsby 2002); and even some regionalization methods have been proposed for this variable (Cong et al. 1993; DeGaetano 1998).

In this paper we have obtained statistically significant spatial models of the three parameters of the GP model, including the shape parameter, which represents an improvement over considering a simple average value. At least in our study area we have demonstrated that spatial interpolation techniques can be used to obtain smooth, continuous spatial representations of the probability model parameters, resulting in a robust regional model. Because the regional model contains the spatial distribution of the parameters of the probability distribution, this allows the theory of extreme value analysis to be applied also to ungauged locations. This is not the case for most of the regional approaches in the literature, where one or more parameters need to be estimated locally from a real data series.

We used a standard multiple regression model to explain the spatial distribution of the probability distribution parameters, which allowed for the inclusion of explanatory variables likely to be related with extreme rainfalls. For example, many authors have highlighted the relationship between precipitation intensity and elevation and other topographic features (Konrad 1996; Prudhomme 1999; Prudhomme and Reed 1999; Lorente and Beguería 2002); but also the geographic location, the exposition to wind flows (Basist et al. 1994; Daly et al. 2002), or the proximity to hot seas (Llasat and Puigcerver 1994) can be determining factors that explain the spatial distribution of extreme precipitation.

Several comparative studies have determined the superiority of regression-based techniques in the case of variables with a complex spatial distribution and a normally not very dense sampling network, as is normally the case of climatic variables. Analyzing specifically extreme precipitation data, Weisse and Bois (2002) concluded that geostatistical methods (kriging) performed better than regression models only when the gauging network was dense enough. Similar results have been published for other climatic variables (Dirks et al. 1998; Vicente-Serrano et al. 2003). It must be noticed, however, that other spatial interpolation techniques like geostatistics (kriging) have also great potential for capturing the variability of a spatial variable, and its use should not be avoided a priori when analyzing other datasets.

The results from the cross validation showed good agreement with the estimates obtained by using standard at-site techniques at the gauged location, supporting the validity of the proposed method to estimate the hazard of extreme rainfalls also at ungauged locations. This is true, at least for locations showing similar characteristics in terms of the predicted variables to the set of climatic stations used for creating the spatial models. At points where one or more of these variables are far outside the range of the observations, however, it is necessary to extrapolate the results of the spatial models, and the results have to be considered as subject to higher uncertainty. This represents an important problem for any study addressing the spatial distribution of a climatic variable, because the stations rarely present a random spatial distribution, as would be desirable in an ideal case. In this study we have considered one year as the temporal unit of study, because it is common practice in the analysis of heavy rainfalls for the purpose of hazard assessment. No seasonal effects or different synoptic situations have been considered, because they present a complete cycle within the period of 1 yr, and thus do not introduce differences between different years. The explained methodology, however, can also be applied to datasets where the events have been classified according to the synoptic situation, or to the season of the year. This approach would allow for the development of different maps accounting for the spatial distribution of extreme rainfalls for every season or meteorological situation.

## 6. Conclusions

In this paper we address the problem of mapping the hazard of extreme precipitation, linking the theory of extreme value analysis and spatial interpolation techniques. We have showed that it is possible to obtain a probability model in which the distribution parameters vary spatially, yielding a robust regional extreme value model. As a product, maps showing the hazard of extreme precipitation at different recurrence times can be developed, or single values can be derived for nongauged sites.

Starting from daily rainfall records we have used partial duration series of rainfall cluster maxima, fitted to a general Pareto distribution, to obtain at-site estimates of the model parameters. The at-site magnitude–frequency curves have been validated against empirical frequency estimations using probability plot error statistics.

We have addressed the spatial distribution of the probability model parameters using georegression techniques, including location and other spatially independent variables as predictors, obtaining significant and well-fitted models. The residuals of two of the parameters—location and scale—have been incorporated to the maps by local interpolation using splines. A jackknife validation scheme has been used to provide goodness-of-fit and error statistics, showing very good results.

## Acknowledgments

The authors acknowledge financial support from the following projects: PIRIHEROS (REN2003-08678/HID); PROHISEM (REN2001-2268-C02-01/HID); BSO2002-02743; REN2002-01023-CLI and REN2003-07453, funded by Ministerio de Ciencia y Tecnología (Spain) and UE-FEDER, and “Programa de grupos de investigación consolidados” (BOA 147 of 18-12-2002), funded by Aragón Government. Research of the authors was supported by postdoctoral fellowships by the Ministerio de Educación, Cultura y Deporte (Spain).

## REFERENCES

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

## Footnotes

* Current affiliation: Instituto Pirenaico de Ecología, CSIC—Campus de Aula Dei, Zaragoza, Spain, and Unit for Landscape Modelling, University of Cambridge, Cambridge, United Kingdom

*Corresponding author address:* Dr. Santiago Beguería, UCEL, Department of Physical Geography, Utrecht University, Heidelberlaan 2, P.O. Box 80115, 3584 CS Utrecht, Netherlands. Email: s.begueria@geog.uu.nl