## Abstract

Long-term global gridded datasets of observed precipitation are essential for the analysis of the global water and energy cycle, its variability, and possible changes. Several institutions provide those datasets. In 2005 the Global Precipitation Climatology Centre (GPCC) published the so-called Variability Analysis of Surface Climate Observations (VASClimO) dataset. This dataset is especially designed for the investigation of temporal change and variability. To date, however, the GPCC has not published how this dataset has been produced. This paper aims to fill this gap. It provides detailed information on how stations are selected and how data are quality controlled and interpolated. The dataset is based only on station records covering at least 90% of the period 1951–2000. The time series of 9343 stations were used. However, these stations are distributed very inhomogeneously around the globe; 4094 of these stations are within Germany and France. The VASClimO dataset is interpolated from relative deviations of observed monthly precipitation, leading to considerably lower interpolation errors than direct interpolation or the interpolation of absolute deviations. The retransformation from interpolated relative deviations to precipitation is done with local long-term averages of precipitation interpolated from data of the Food and Agriculture Organization of the United Nations. The VASClimO dataset has been interpolated with a method that is based on local station correlations (LSC) that is introduced here. It is compared with ordinary kriging and three versions of Shepard’s method. LSC outperforms these methods, especially with respect to the spatial maxima of interpolation errors.

## 1. Introduction

Several organizations collect global rain gauge observations. Among them are the Global Precipitation Climatology Centre (GPCC) of the German Weather Service, the Global Historical Climatology Network (GHCN) in the United States, the Climatic Research Unit (CRU) of the University of East Anglia in Norwich (United Kingdom), and the Food and Agriculture Organization of the United Nations (FAO). While the first three of these organizations publish global datasets of interpolated monthly rain gauge observations, the latter provides original station data and the interpolation software New_LocClim (FAO 2005), allowing users to choose an interpolation method and settings that best fit their purpose. However, FAO only provides a limited number of monthly precipitation time series (13 568) but long-term monthly means for 27 909 stations.

The GPCC offers four global datasets of time series of gridded monthly precipitation interpolated from global rain gauge observations: The First Guess Product, the Monitoring Product, the Full-Data Product, and the Variability Analysis of Surface Climate Observations (VASClimO) dataset (see http://gpcc.dwd.de). The latter was published by the GPCC in 2005 (Beck et al. 2005; Rudolf and Schneider 2005). As clarified by the GPCC (Becker et al. 2012), it was interpolated by the author of this paper. The VASClimO dataset is especially designed for investigations of global precipitation variability and change. In this article the author provides background information about the generation of this dataset that up to now has not been provided to the scientific community.

The result of the interpolation of observed time series depends not only on the interpolation method used. Different station selections can be used to optimize the result for various purposes. All available station data might be used if the most detailed result for each individual time step is desired. If, however, temporal homogeneity is desired, it might be better to remove stations with only short time series from the interpolation. The number of missing values accepted to use a station is a matter of judgement. If one is too demanding, only a few stations are left for the interpolation, and important information might be lost. For the VASClimO dataset, station records are used if they cover at least 540 out of the 600 months (90%) of the observation period from 1951 to 2000. The GPCC provided these station data to the author. They build the basis for the analyses of this paper and are discussed in section 2. An enhanced quality control of the data is performed by the author and discussed in section 3. This includes visual checks, an outlier test, and an analysis of spatial representativeness based on the correlation between time series of neighboring stations.

Precipitation follows a skewed distribution and shows high spatial variability. Therefore, interpolation results and quality might depend strongly on the interpolation method used. The GPCC interpolates all their datasets by variants of Shepard’s method (Shepard 1968; Becker et al. 2011, 2013). Beck et al. (2005) and Rudolf and Schneider (2005) published that the VASClimO dataset has been interpolated by ordinary kriging (Krige 1951). Both methods, Shepard’s method and ordinary kriging, are briefly discussed in section 4. In the same section, another method that is based on individual station correlation length is introduced. This latter method is called local station correlation method (LSC) by the author of this paper. The VASClimO dataset has been interpolated with this method.

The interpolation methods are applied not only to the station data directly but also to transformed variables, like absolute and relative deviations from the long-term mean. Because of the large spatial variability of precipitation, it might be expected that the interpolation of those deviations leads to better results. Nine different transformations are compared in section 5.

Long-term averages of precipitation are available at many more locations than time series. This information can be used when transformations are applied that include the mean. The use of this additional information within the interpolation strategy increases the representativeness of the results.

Time series that cover less than 90% of the interpolation period are excluded from the interpolation to avoid artificial inhomogeneities in their vicinity. However, these time series still contain valuable information. The most obvious are some moments like the local mean precipitation, its annual cycle, and the variance. They also contain information about the explained variance by the observations at neighboring stations. This information can be used for the interpolation. Section 4 describes this approach for the generation of the VASClimO dataset. The quality of several interpolation strategies is discussed in section 6. Section 7 describes the generation of the VASClimO dataset based on the results of the previous sections. A final discussion of the results is provided in section 8.

## 2. Database

The GPCC provided three station datasets to the author, covering the period 1951–2000. These are

5506 globally distributed time series of quality-controlled and homogeneity tested observed monthly precipitation from the database of the GPCC,

3210 additional observed German monthly precipitation time series, and

627 additional observed French monthly precipitation time series.

The first dataset already contains 122 French and 135 German time series. The average fraction of missing data per calendar month (January–December) is between 2.1% and 2.4%. These datasets allow for the comparison of local dense station networks and a global coarse network. Combining all three datasets to one global dataset, containing 9343 stations, allows for the analysis of a global dataset with very inhomogeneous spatial station density. These four datasets are named , , , and from here on.

Two additional datasets, published by the FAO (FAO 2001), are used in this paper. These are 1) observed monthly precipitation time series of 13 568 stations and 2) long-term monthly mean precipitation of 27 909 stations.

While the French and German datasets are quality controlled by the respective meteorological services, the global dataset from the GPCC is merged within the GPCC from various sources. Therefore the GPCC has done a comprehensive quality control of the dataset , including outlier and homogeneity tests. For details see Beck et al. (2005) and Schneider et al. (2011, 2014).

Aside from the synoptic observational data (SYNOP) and monthly aggregated data (CLIMAT) transferred regularly via the Global Telecommunication System of the World Meteorological Organization (WMO), there are time series submitted by a variety of national and regional agencies. Furthermore, CRU, GHCN, and FAO submitted their databases to the GPCC; 85% of the observations in the dataset are taken from the data collections of CRU and FAO (Beck et al. 2005). A large fraction of the data from different sources comes from the same stations. Observation periods for individual stations, however, can be very different depending on the time the national services submitted their data to the various agencies collecting the data.

Merging all these data bears the risk of erroneous attribution, and this in turn can produce temporal inhomogeneities. The GPCC (Schneider et al. 2014) bases the merging process on station metadata like station coordinates, name, elevation, and WMO number, if available. They themselves use observations only sporadically to determine whether data from different sources belong to the same station.

Within the VASClimO project, C. Beck and the author of this paper compared different measures of data agreement between station observations to be added to the GPCC database and those already in the database. Percentage of full agreement, deviations not exceeding 1 mm, and average and maximum differences were extracted to allow an informed decision as to whether a time series is partly in line with one already in the database. A typical error in station records is the confusion of no precipitation and missing values. Therefore, zero precipitation and missing value codes were also compared.

This strategy has two major advantages compared to the semiautomatic method used in the GPCC (Schneider et al. 2014). First, it avoids the problem of data being attributed to an incorrect station in the database based on good agreement of the metadata. Second, it avoids the automatic generation of a new station within the database if no agreement with the metadata of any station within the GPCC database is found. However, this methodology was applied to GHCN and FAO data only.

For all stations available in the GPCC database that contain data from different sources for overlapping periods, Beck (2004) tested various measures of agreement. Beck et al. (2005) summarized some of the results. While the data in the GPCC database originating from FAO, CRU, and GHCN differ from each other by more than ±1 mm month^{−1} only in about 10% of the cases, the monthly precipitation data generated by the GPCC from SYNOP observations deviate by more than ±1 mm from those of other sources in about 60% of the cases. A major result of Beck (2004) is that data from CRU, FAO, and national and regional data sources differ on average (mean absolute difference) by less than 3 mm month^{−1} while monthly data generated from SYNOP observations by the GPCC differ on average by 19.2 mm month^{−1} from CRU data and 18.5 mm month^{−1} from FAO data. This shows that the way monthly data are aggregated from individual observations strongly influences the resulting monthly precipitation sum. Monthly precipitation sums aggregated from individual observations by the GPCC differ considerably from the results of national agencies.

About 55% of the station data in are merged from different sources. This corresponds to 93% of the station records outside Germany and France. Therefore, homogeneity tests of the monthly time series are crucial to detect inhomogeneities resulting from the merging within the GPCC. Beck et al. (2005) applied relative homogeneity tests to the dataset in the frame of the VASClimO project. However, relative homogeneity tests cannot be performed on all station records. Only regions with high station density allow for the generation of reference time series necessary for the application of relative homogeneity tests. Therefore 12.9% of the stations in dataset were not tested for homogeneity (Beck et al. 2007), which corresponds to 21.5% of the global dataset . These stations are included in the dataset to avoid spatial gaps in data-sparse areas. Beck et al. (2007) homogenized 31% of the stations of .

Standard outlier tests including observations of neighboring stations were performed. From the nearly 5.5 million observations, about 6000 exceeding 4 times the interquartile range of the respective time series and of the reference value generated from the homogeneity test were replaced by missing values by Beck et al. (2007). Since both the homogeneity and the outlier test were only applied to less than 80% of the dataset , the author performed further quality tests described in the following section.

## 3. Enhanced quality control

Three tests are applied to ensure highest possible quality of the data used for the generation of the VASClimO dataset. First, all 5506 time series of the global dataset are visually checked. The second test is an outlier test: For all 9343 time series and each calendar month, the probability that the highest observed value is not from the same distribution than all other values of the time series is estimated. Third, the temporal correlation between neighboring stations is calculated to identify stations with data not in line with regional precipitation observations.

### a. Visual check

All time series of the dataset are visually checked for obvious inhomogeneities since the relative homogeneity test used by Beck et al. (2007) could not be applied to all time series. Obvious inhomogeneities are found in 13 station records. One example is shown in Fig. 1. The first 37 years of observations and the last 13 years either do not reflect the precipitation at the same location or are provided in different units.

The visual inspection also reveals stations for which data of several consecutive years are exactly the same. An example is shown in Fig. 2, where the first 6 years contain identical observations. Time series like these are not identified by the GPCC and therefore not excluded from the GPCC products.

### b. Outlier test

The GPCC performs outlier tests (Schneider et al. 2014). Their strategy is to flag a certain percentage of the most extreme values (measured as exceedance of empirical percentiles) within datasets they obtain. The flagged data are visually inspected (Hechler 1999) to decide whether these are extreme observations or outliers. According to Schneider et al. (2014), the threshold was changed from 4% until 2009 to 2% afterward. The application of this strategy to 12 calendar months and 9343 time series of 50 years would have led to an expected number of 224 232 visual inspections and subjective decisions in the dataset . This high number results since 4% of the values fulfill the criteria regardless of whether there are outliers. Since this strategy leads to far too many cases to handle, the GPCC did not apply outlier tests to large data deliveries like the ones they obtained from CRU, FAO, and GHCN as well as large datasets from national agencies. The flagged values are visually checked by a “trained expert” (Schneider et al. 2014); undoubtedly, reliability of the visual checks highly depends on the expertise and experience of the person in charge.

Therefore an objective procedure is used here. It flags extreme values if they are not in line with the rest of the time series. For each calendar month and each station the null hypothesis is that the highest observed value is in line with all the other observations of this calendar month at the respective station. It is assumed that all observations are from the same probability distribution *F*. If *F* is known, the probability that the highest value of *N* observations exceeds is

To get an approximation of *F*, a Weibull distribution is fit to the observations of each calendar month and station leaving out the largest observed value since this is the one to be tested based on the distribution of the other observations. The Weibull distribution is very general and fits monthly precipitation data well (see, e.g., Legates 1991; Trömel and Schönwiese 2007). Skewness can be negative, zero, or positive, depending on the parameters of the distribution. Both, narrow-tailed and fat-tailed distributions can be described by a Weibull distribution. The plotting position is used, where *N* is the number of observations and *m* is the order number of the *m*th lowest observation. This attributes the median of the distribution of exceedance probabilities to each observation (see, e.g., Hyndman and Fan 1996).

A Kolmogorov–Smirnov test (see, e.g., Wilks 2006) is applied to measure the deviation of the observed distribution from the best-fit Weibull distribution. If the Weibull distribution does not deviate significantly from the distribution of the observations, it can be used to estimate the probability that a single observation is at least as high as the highest observation , that is, . Equation (1) then provides the probability that the highest observation of a time series of length *N* at a station and a calendar month is no outlier. The result of the Kolmogorov–Smirnov test measures the quality of the test strategy.

This test is applied to 9343 stations and each calendar month, that is, 112 116 times. In 2618 cases the test was not applicable because fewer than 15 values differed from zero. The Kolmogorov–Smirnov test indicated deviations on the 99% level in only 21 cases (by calendar month: 5, 2, 1, 0, 1, 2, 3, 2, 1, 0, 1, and 3); 15 of these cases are time series with only low integer values, that is, precipitation never higher than 10 mm. Even if these time series were from a Weibull distribution, the observation accuracy of integers made the Weibull distribution a bad fit, as can be seen in Fig. 3. In the residual six cases precipitation is in the hundreds of millimeters with only one occurring zero each. Without the zeros these cases can be expressed by the Weibull distribution. It cannot be excluded that the observation of zero precipitation in these cases, in fact reflects missing observations interpreted incorrectly as zero precipitation.

For the other 109 477 cases the application of the outlier test is justified. Equation (1) can be used to flag those maxima () for which the rest of the observations suggest that they should be lower with a desired probability . Table 1 shows results for three cases of and individual calendar months. Choosing 99%, the number of flagged outliers per calendar month is between 366 and 1031. These numbers exceed the expected exceedances for 9343 time series, that is, 93 per calendar month, by a factor of 4–11. Therefore the dataset contains several thousand precipitation maxima that are not in line with the other observations of the same station and calendar month.

These values may not be erroneous since highest observed precipitation may not result from the same probability distribution as the rest of the observations. Typical phenomena that lead to extraordinarily high rain amounts are atmospheric rivers (Zhu and Newell 1994) and the Vb cyclone tracks (van Bebber 1891) in Europe. The rare Vb cyclones move from the Mediterranean Sea northward, passing the Alps on the east, and bring extraordinarily high rain amounts to parts of Germany, Poland, and the Czech Republic. A Vb cyclone is responsible for the highest measured precipitation in Dresden, Germany, observed on 1 August 2002.

The strategy discussed here reduced the number of questionable observations from 224 232 (in line with the GPCC strategy in 2005) to 7728, that is, by a factor of 29. In other words, 28 out of 29 values that the GPCC strategy would flag as possible outliers are actually in line with the residual observations and do not need visual inspection.

Here, only the 799 values that do not fit to the rest of the distribution with a probability of 99.999% are visually checked. While some of these values might be explicable by rare weather phenomena, others are more likely to be erroneous. Figure 4 shows a typical example of such an outlier that is likely to be a result of a factor-of-10 error; that is, the value of 499 mm month^{−1} might in reality mean 49.9 mm month^{−1}. In compliance with the orders of the GPCC, none of these values are altered or excluded from further analysis or from any of the GPCC products.

The strategy applied here has three major advantages. First, it does not flag values as potential outliers data that are in fact in line with the distribution of the time series. Second, potential outliers can be ordered by the estimated probability that they actually are outliers. This allows one to deal with the most significant cases first. Third, thresholds can be chosen to group potential outliers. All values above a threshold can become the subject of further research. Those values that can be identified as nonerroneous by further evidence (e.g., reports) can be analyzed to identify specific weather phenomena that produced extreme precipitation.

### c. Local correlation length

Another method to investigate the quality of a set of precipitation time series is a correlation analysis between neighboring stations. Correlation between time series of neighboring stations is high if station density is higher than spatial variability of precipitation. In this case there must be a particular reason if a station shows considerably lower correlation to its neighboring stations. Possible reasons are 1) the station observations reflect very local conditions not representative for a larger area, 2) the station is mislocated in the database, and 3) the time series contains erroneous data.

A simple way of attributing a correlation length *λ* is by assuming an exponential decay of the explained variance with distance *d* as

This equation can be applied to each station and calendar month to estimate a station- and calendar-month-specific correlation length.

Each station *i* has neighboring stations. For each pair the distance is known and correlation *ρ* can be estimated from the time series of observations. Averaging over all *N* neighboring stations yields an estimate of the correlation length. However, the error variance of this estimate becomes huge in the case where correlation with at least one neighboring station is close to zero or unity. This can be shown by error propagation. The estimate of is

The error of this estimate as function of a neighboring station is

And since

the uncertainty of diverges to infinity whenever at least one neighboring station has a correlation close to zero or close to unity. A far more robust estimate of the local correlation length follows from minimizing the sum of squared differences between the logarithm of observed temporal correlations between neighboring stations and the ones modeled from Eq. (2), , as

Minimizing with respect to leads to

This equation is used to attribute a correlation length to all stations and calendar months.

The correlation length offers much information. If a station is mislocated—for example, because of swapping latitude and longitude—it should have nearly no correlation even to close stations. In this case the correlation length is small, and the station can be identified as erroneous. If a station has no other stations in its vicinity, it usually has a large correlation length, meaning that it is the only local source of information and therefore very important. This helps to identify regions with pronounced undersampling based on observed regional precipitation variability.

For the estimation of the correlation lengths of the 9343 stations for which 50 years of data are available, the time series of the 13 568 stations provided by the FAO are also used. Although these time series are too short to be used for the interpolation they contribute valuable information on local spatiotemporal variability of precipitation. The only condition applied was that they have at least 15 years of observations in the period from 1951 to 2000.

Results for January and July are provided in Fig. 5. No stations with considerably lower correlation length than neighboring stations were found, indicating that the GPCC quality control with respect to station location is satisfying.

To evaluate the regional station density, the locally explained variance by the closest station according to Eq. (2) is calculated for each calendar month and each grid point of a global land surface grid with 0.5° resolution. Figure 6 shows the global maps of for January and July. Aside the large spatial variability, Fig. 6 reflects the pronounced annual cycle of . This annual cycle needs to be considered when spatial station density and precipitation sampling error is evaluated.

The calculation of the locally explained variance allows general remarks of the representativeness of observed monthly precipitation for the global land surface excluding Antarctica and Greenland (together they cover 10.85% of the global land surface). Figure 7 shows the empirical cumulative density function of the explained variance by the closest station for January and July. Only in about 10% of the area is the explained variance by the closest station less than 30%. For about 20% of the area, the explained variance is higher than 95% in January and higher than 89% in July. The local correlation lengths have been used for the interpolation of the VASClimO dataset as described in the following section.

## 4. Interpolation

The results of interpolation depend on the interpolation method used. According to Becker et al. (2013) the GPCC so far has used two variants of Shepard’s method for the interpolation of their datasets. They published in Beck et al. (2005) and Rudolf and Schneider (2005) that ordinary kriging is used for the interpolation of the VASClimO dataset. In fact the VASClimO dataset is interpolated by a method based on the local correlation length discussed in the previous section. The three methods are now discussed.

Shepard’s method is based on inverse-distance weighted averaging (IDWA). It overcomes some of the shortcomings of IDWA without being too computationally expensive. Shepard introduced a scale that depends solely on the average station density. A radius *R* is defined so that, on average for each grid point, seven stations are not farther than this radius from the grid point. If local station density is high, the radius is reduced to ensure that data of not more than nine stations are used. In regions with low station density, the radius is locally enlarged to ensure that observations of at least three stations are used. The use of a limited number of stations for each grid point is not only more efficient than using all stations, it also avoids the problem of interpolation results being affected by observations taken far away.

As with IDWA, weights are inversely proportional to a power of the distance between a station and a grid point. This ensures that closer stations get higher weights. Shepard fixed the exponent to 2 for stations close to grid points (and forced the weights to zero at distance *R*). However, the inverse of the squared distance becomes arbitrarily high whenever a station is very close to a grid point. To avoid numerical overflow errors, Shepard defined an *ε* environment around each grid point. If stations are closer than *ε* to a grid point, the arithmetic mean of the observations of these stations is used as an estimate of the value at the grid point. For the interpolation of global precipitation data to grids with 0.5° grid size the GPCC originally used an epsilon environment of 5-km radius (Rudolf et al. 1992). This leads to the effect that in data-dense areas some stations automatically get ignored and information is lost.

Although the GPCC published in 2011 that they still use this approach (Becker et al. 2011), in 2013 they published (Becker et al. 2013) that they had altered the methodology in 1994. Whenever at least one station is closer than 5 km to a grid point, they use the arithmetic mean of all stations closer than 0.25° to the grid point as an estimate for the precipitation at the grid point. In Becker et al. (2013) they claim they are doing this to avoid ignoring too many stations while at the same time avoiding the “double use of stations.” Note that if the GPCC uses Shepard’s method, they use on average seven stations per grid point. Their global grid has more than 70 000 grid points. Assuming that they use on average 30 000 stations, this results in the fact that on average each station is used 16 times. The double use of stations is inevitable as long as the number of grid points exceeds the number of stations, which is generally the case in interpolation problems. However, if the GPCC really uses their version of Shepard’s method, they ignore stations within about (≈21.5%) of the area in data-dense regions [for details, see Becker et al. (2013), their Fig. 10]. Becker et al. (2013) state that they exclude a maximum of 21.5% of the stations. In fact, all stations within a grid cell can be located within the 21.5% of the area and get ignored by this approach.

The author compares three variants of *ε* environments in Shepard’s method. These are 1) averaging all station observations closer than 5 km to a grid point, 2) averaging all observations closer than 0.25° to a grid point if at least one station is closer than 5 km to it, and 3) reducing the epsilon environment to a radius of 5 m, which is still large enough to avoid overflow errors given that station coordinates are provided with an accuracy not higher than 0.001° and no station coincides with any grid point. While variants 1 and 2 are used by the GPCC, variant 3 is the cleanest use of Shepard’s method.

One particular disadvantage of IDWA is that it produces extremely steep gradients in the vicinity of stations. To avoid this, Shepard forced the gradients at station locations to a mean gradient and relaxed this condition with distance to station. While this improvement ensures smooth interpolated surfaces without spikes and unrealistically steep gradients, it can overshoot slightly out of the range of observations. This effect, however, is small and usually negligible. The GPCC (Rudolf et al. 1992) misinterpreted this small undesired effect as a deliberate act to estimate unobserved minima and maxima (see also their Fig. 4.2). Shepard’s method applies a gradient correction around station locations and not around grid points as written in Schneider et al. (2011).

Also, ordinary kriging introduces a scale. While the scale length of Shepard’s method is solely a function of the station density, the length scale of kriging is a function of the spatial variability of the variable as observed at stations. Kriging is based on the variogram that measures the mean squared difference of observations at different locations as a function of the distance between the locations. The empirical variogram is the one calculated from the observations. For large distances the variogram converges against a constant (the sill) if the variable under consideration is spatially stationary. A function is usually fit to the variogram, which reflects its increase with distance. A simple variogram function is with the sill *s*, the distance *d*, and the range *r*. This range is the length scale of kriging. It depends on the spatial variability of the variable as observed with given station density. Figure 8 shows empirical variograms and best-fit variogram functions.

Ordinary kriging is a best linear unbiased estimator if the variable is spatially stationary. This is not the case for precipitation on a global scale. With increasing distance to a location, chances increase that stations are in a different climate zone. Even stations that are very close to each other might observe different precipitation climates, for example, if they are in mountainous area. There, precipitation can be much higher on the windward side of mountain ranges than on the lee side, which can easily create very different precipitation climates and anticorrelations of neighboring stations. In other words, global precipitation violates the conditions necessary to ensure that kriging is the optimal method to interpolate it.

The third method applied here uses the correlation length calculated for each individual station based on temporal Pearson correlation with observations at neighboring stations (even those that have data only for periods too short to be included into the interpolation of a homogeneous product). The rest of the strategy follows ordinary kriging, that is, a system of linear equations for the weights of neighboring stations, and a Lagrange multiplier is solved. As compared with ordinary kriging, this third method uses the local length scale instead of a global one. This automatically dampens the influence of station observations not in line with neighboring observations. Only in their direct vicinity are these stations the most influential. As mentioned previously, this method is called LSC.

The length scales of Shepard’s method for the four datasets under investigation are 232 km for , 44 km for , 16 km for , and 178 km for . The length scale of Shepard’s method taken from the global datasets is an order of magnitude higher than that for the national datasets. All grid points within Germany have more than nine stations closer than Shepard’s length scale estimated from the global dataset . If Shepard’s method is applied globally, it uses data from the nine closest stations in Germany. Since there are 3345 stations but fewer than 200 grid points in Germany, at least one-third of the German stations get automatically ignored. This is a major disadvantage of Shepard’s method if station density is very inhomogeneous.

The length scales taken from the variograms depend on both station density and length scale of the variable under consideration, which in the case of monthly precipitation depends on the calendar month. For each station 20 neighboring stations are used to calculate the range of the variogram. Multiplying 50 years of observations and 20 neighboring stations results in 1000 × *N* points per calendar month for the estimation of the variogram, where *N* is the number of stations in the dataset. Before fitting the variogram function, this number is reduced by binning distances into 10-km bins and averaging the variogram values per bin. The resulting ranges of the variogram functions for the four datasets are provided in Table 2.

Interestingly, the ranges estimated using all data are slightly higher than those of the original dataset. This is counterintuitive since the ranges for and are much lower than for the global datasets. However, not only are the ranges lower for and , but the sill, that is, the magnitude of spatial variability, is also low. If the dataset is combined with and , the average increase of the variogram with distance is reduced for small distances while the sill (determined by large distances) is not affected. The exponential variogram function reacts with a slightly higher range. This shows the shortcoming of the application of a single and simple global variogram function to a spatially instationary and inhomogeneously sampled variable like precipitation. Figure 8 shows the variograms of the datasets and for September.

The variogram is a global measure, that is, it takes all available data into account to obtain a single range representative for all stations. The correlation length, discussed in section 3, provides a local station-specific length scale. Figure 5 shows the pronounced spatial variability that is ignored by ordinary kriging. Figure 9 depicts the empirical cumulative density function of the correlation lengths based on 20 neighboring stations and the dataset for four calendar months. It shows the wide range of variability of the correlation length. While ordinary kriging with a single variogram function is a global method, the use of the local correlation function is the most localized method.

## 5. Variable transformations

Does it make a difference whether precipitation is interpolated directly or first transformed, then interpolated and then transformed back? To investigate this question, nine different transformations of precipitation are interpolated. Table 3 provides a list of the transformations. The first transformation is just the neutral transformation; that is, the original data are interpolated. The second, third, and fourth transformation take the long-term mean observation into account, while the fifth, eighth, and ninth include the standard deviation as well. Transformations 6–9 apply the logarithm to decrease the skewness of the variable. In these cases a constant is added to the monthly precipitation to avoid as the transformation of zero precipitation. Two different constants are used: 0.5 and 1. The use of 1 as a constant transforms precipitation from a nonnegative variable to a nonnegative variable. The value 0.5 is chosen to see the impact of a smaller constant.

The application of these transformations to the different datasets and for different interpolation methods reveals that transformations 3 and 4 lead to best results in the case of global precipitation. This is discussed in detail in the next section.

## 6. Comparison of interpolation strategies

Five different interpolation methods (three variants of Shepard’s method, ordinary kriging, and LSC) and nine transformations can be compared. This adds up to 45 interpolation strategies. These strategies are applied to the four datasets available for this study, leading to 180 cases. All interpolations are performed in spherical coordinates. The formulation of Shepard’s method in spherical coordinates is introduced by Willmott et al. (1985).

A jackknife approach is applied to measure and compare the quality: Each station is left out of the interpolation in turn while the other stations are used to interpolate to the location of this station. This leads to time series of interpolated precipitation at each station being compared with the observed precipitation. Three error measures are calculated per station: 1) the mean error (the bias), 2) the root-mean-squared error (RMSE), and 3) the mean relative error (average of the ratio of absolute of interpolation error and observation).

The different interpolation strategies lead to a wide range of interpolation errors. Since all transformations that include the logarithm lead to some cases with extremely high errors, they are excluded from the further discussion, although they perform well on average. Transformations 2–5 generally show better results than direct interpolation and are further investigated. Transformations 3 and 4 show identical results. Therefore, transformation 4 is excluded from further discussion as well.

The comparison with respect to interpolation methods is done in two steps. First, the three versions of Shepard’s method are compared. In the second step the best of these three versions is compared to kriging and LSC. The different versions of Shepard’s method would all lead to exactly the same results if no station was closer than 5 km to any grid point. Therefore, the jackknife error for each station with no stations closer than 5 km to it, is unaffected. Differences between the three versions should become most important in areas of high station density. Table 4 shows some parameters of the distribution of the RMSE for the dataset .

Obviously, the recently published strategy of the GPCC—that is, averaging all stations closer than 0.25° to a grid point if one station is closer than 5 km—behaves worse with respect to all parameters than the use of a small *ε* environment as suggested by Shepard. Differences are most pronounced in the third quartile and maximum RMSE in the case of the interpolation of untransformed precipitation, where the GPCC version exceeds the error of the use of m by 4.1% (14.28 vs 13.72 mm) and 8.9% (140.8 vs 129.3 mm), respectively. This shows that it is important not to alter Shepard’s method unnecessarily.

The results for bias and relative error are not shown. They show qualitatively the same behavior. The average bias of all transformations is very small (usually below 1 mm month^{−1}). Average relative errors are 13.3% in the case of transformation 3 and 15.2% in the case of transformation 1. Maximum relative errors, however, reach 30.9% and 65.2%, respectively. This shows that the use of relative deviations outperforms direct interpolation by more than a factor of 2 in the worst cases. This effect is also visible in the other datasets.

Table 4 shows that the transformation of the observations has a strong impact on the quality of the interpolation. While the interpolation of absolute deviations already leads to smaller errors than direct interpolation, the interpolation of relative deviations shows further significant improvement. The mean RMSE in the case of m is 15.9% higher in the case of direct interpolation (12.62 mm) than in the case of interpolation of relative deviations (10.89 mm). Using also the standard deviations of precipitation observations (transformation 5) leads only to small further improvements for most of the parameters. The maximum RMSE, however, increases in this case.

While the average bias is always below 1 mm month^{−1}, the station-specific bias ranges from −54.9 to 52.6 mm month^{−1} in the case of no transformation, from −1.7 to 2.1 mm month^{−1} in the case of transformation 2, and from −1.5 to 1.3 mm month^{−1} in the case of transformation 3. This means that the use of absolute deviations reduces the bias considerably. Relative deviations further reduce it slightly. The major effect of using relative deviations instead of absolute deviations is the reduction of error variance as expressed by the RMSE and the maximum relative errors. Among the interpolation strategies that include Shepard’s method, the interpolation of relative deviations with a small *ε* environment as suggested by Shepard performs best. This combination clearly outperforms the GPCC strategy, which, according to Becker et al. (2013), changed in 2008 from interpolating observations directly to the use of absolute deviations.

In the next step, the three interpolation methods (ordinary kriging, Shepard’s method with m, and LSC) are compared. The influence of the interpolation method is generally rather small. For the dataset , the maximum RMSE reduces from 125.8 mm month^{−1} in the case of Shepard’s method to 124.3 mm month^{−1} in the case of ordinary kriging and 124.12 mm month^{−1} in the case of LSC. The ratio of these values is typical for all error measures. While ordinary kriging and LSC interpolation errors differ by less than 1%, these methods outperform Shepard’s method by up to some percent. This result is seen in all four datasets and for most of the parameters of the error distributions. Larger differences, however, are seen for the maximum relative error. Table 5 shows the results for the dataset . The table shows that Shepard’s method performs best if precipitation is interpolated directly, while it performs worst if any of the transformations is applied. LSC and ordinary kriging outperform Shepard’s method by factors of 6 and 4, respectively, with respect to the maximum relative error if relative deviations are interpolated.

As a result of this discussion and the numbers provided in Table 5, the author decided to produce a global gridded precipitation dataset based on the dataset interpolating relative deviations with LSC. The generation of the interpolated dataset is discussed in the following section.

## 7. Generation of the VASClimO dataset

The global gridded dataset is generated on the grid used by the GPCC. It has a mesh width of 0.5° latitude and longitude and covers the global land area. Antarctica and Greenland are excluded because of a lack of data. Precipitation is interpolated to each of the four corner points of a grid cell. Results of the four corner points are then averaged to get a gridcell average. This strategy is taken over from the GPCC. To perform the back transformation from relative deviations, estimates of mean precipitation at grid-corner points are needed. Mean precipitation at the grid points is interpolated from the combination of the monthly long-term means of the 50-yr time series of and the long-term means provided by FAO. The use of these 27 909 station averages increases representativeness considerably since only 5249 stations of the dataset are outside Germany and France. However, no quality control is applied to the long-term means provided by FAO. It is ensured neither that they are located correctly nor that the observations are realistic. Furthermore, for most of these stations the duration over which precipitation has been averaged to obtain the long-term mean is unknown.

How does this influence the mean climatology? WMO suggests an observation period of 30 years for reliable estimates of the mean. This can be quantified. According to the central-limit theorem, the distribution of the mean converges toward a normal distribution with a standard deviation of , where *σ* is the standard deviation of the variable itself and *N* is the number of realizations from which the mean is estimated. The uncertainty of an estimated mean () can therefore be written as

where and are the estimates of standard deviation and mean of the mean, respectively, and *γ* is the coefficient of variation (standard deviation over average) of the observations. Equation (8) can be used to get the number of observations necessary to ensure a certain accuracy of the mean or, vice versa, the accuracy of the mean if a certain number of observations is available. The uncertainty in the long-term mean depends linearly on the coefficient of variation but goes with . For 30 years of monthly observations, the uncertainty in the estimated mean is about 18% of the coefficient of variation or 1.8% if but nearly 55% if . For monthly precipitation of most of the global land surface, *γ* is between 0.1 and 3. It exceeds 3 for very dry areas and calendar months and goes down to 0.1 only for very wet regions and calendar months. Maps of *γ*, estimated from the dataset of this paper are calculated by the author and provided on the website of the GPCC (http://gpcc.dwd.de). The impact of a change of *N* on the uncertainty can easily be deduced from Eq. (8) to be

Note that *η* is independent of the coefficient of variation. If only 10 years of observations are used, the uncertainty in the mean rises by 73%. If 50 years are used, it decreases by 23%. The uncertainty due to observation-period lengths differing from 30 years is therefore below a factor of 2. Given the large spatial variability of precipitation, it is assumed that this uncertainty is smaller than the error that would be made if the 27 909 long-term means of the FAO were not used. The long-term means are interpolated using ordinary kriging.

In 2005 the GPCC published the final dataset based on the dataset and interpolated by LSC under the name VASClimO product on their website (available at http://gpcc.dwd.de). It is unchanged since 2005 although the GPCC still announces on that website that “the gridded dataset will be updated in periods of 3 to 4 years.”

## 8. Discussion and conclusions

The database of the GPCC contains time series of monthly station observations from a variety of sources. The data are merged on the basis of station metadata. Beck (2004) found large inconsistencies in the database of the GPCC between data provided from different organizations. The merging of the FAO and GHCN databases with that of the GPCC was done in the VASClimO project. Only for these data are all station observations compared. This minimized the risk of attributing data to a wrong station within the GPCC database or, alternatively, erroneously creating new database stations. This strategy was not adopted by the GPCC.

Within the GPCC, outlier tests are only applied to small datasets since an unnecessarily high number of observations (1 in 25 until 2009; 1 in 50 thereafter) are flagged as possible outliers to be visually checked. However, all data used for the VASClimO dataset were checked for outliers in the sense that only extremes not in line with the other observations of a time series got flagged. This strategy reduces unnecessary work considerably and is therefore applicable also to very large datasets. According to Schneider et al. (2014), the GPCC did not adopt this strategy.

The VASClimO dataset is based on time series of 9343 stations; 4094 (43.8%) of these stations are within France and Germany and cover only 0.68% of the total area represented by the dataset. The original global dataset () contained observed monthly precipitation time series from 124 countries and territories (like Antarctica, Greenland, British Indian Ocean Territory, etc.). This means that for about one-third of the 193 U.N. membership states no long time series were available. The highest station density was in Belgium, with 14 stations covering a relatively small country resulting in an area of 2324 km^{2} per station. France had the third highest station density with 4422 km^{2} per station, Germany ranked 19 with 9396 km^{2} per station. Using all available time series (dataset ) brings Germany to rank 1 with 109 km^{2} per station and France to rank 2 with 725 km^{2} per station and leaves the order of the other countries and territories unchanged. Outside France and Germany, a station represents 24 114 km^{2} on average. This author opposes the statement of the German Weather Service (DWD 2010) that it was necessary to use the German and the French datasets to fill a gap that would result otherwise. France had already the third highest station density. Five out of six countries that contributed data contributed a lower station density than Germany.

Contrary to Rudolf and Schneider (2005), Beck et al. (2005, 2007), and Schneider et al. (2011), the VASClimO dataset is not interpolated by ordinary kriging and not based on quality-controlled long-term averages of the GPCC. The spatial pattern of average precipitation is estimated based on uncontrolled station averages provided by FAO. Spatial representativeness might have been better if the long-term means of the GPCC could have been used by the author.

As of this writing, the GPCC has always used Shepard’s method for the interpolation of monthly precipitation data to global land surface grids. The method was installed at the GPCC by D. Legates in 1991 (Becker et al. 2012). Becker et al. (2013) published that the GPCC altered the method already in 1994. It is shown in this paper that the reason they provide for altering Shepard’s method (avoiding double use of stations although the number of stations is lower than the number of interpolation points) is irrational. Results are worse than by applying Shepard’s method in its original version.

Ordinary kriging and LSC outperform Shepard’s method. While average errors are only slightly smaller, maximum relative errors are decreased by up to a factor of 6. For the VASClimO dataset relative deviations of precipitation have been interpolated using LSC since this combination of data transformation and interpolation method provided the best results. In contrast, Becker et al. (2013) published that the GPCC now interpolates absolute deviations as a result of tests performed by Beck et al. (2005), although these authors clearly state that relative deviations perform better.

The VASClimO dataset was produced to provide the best temporal homogeneity for the analysis of globally observed temporal precipitation variability. Some authors, however, analyzed features of the dataset that are the least reliable. An example is Kottek et al. (2006), who used the local mean precipitation, which is mainly driven by the 27 909 uncontrolled FAO station averages, to recalculate the global Köppen climatology.

Global precipitation observations are important for understanding the global hydrological cycle, its variability, and possible changes. With tens of thousands of station records of observed monthly mean precipitation, the GPCC holds a treasure of information. The data are of high value to both science and society.

## Acknowledgments

The author thanks Dr. Christoph Beck for providing the three datasets of observed monthly precipitation data to him. Special thanks go to two German scientists for reading a previous version of the manuscript. Both prefer to stay anonymous. The author is also very grateful for the extremely helpful comments by the three anonymous reviewers, and he thanks several former German colleagues who motivated him to write this paper.

## REFERENCES

*Proc. Second Workshop of the International Precipitation Working Group (IPWG),*Monterey, CA, CNR-ISAC,

*Proc. 23rd National Conf. ACM,*New York, NY, ACM,

*Statistical Methods in the Atmospheric Sciences.*Elsevier, 627 pp.