## Abstract

This paper presents a method to detect and correct biases in observational mean sea level pressure (MSLP) data. It was developed within the Mesoscale Alpine Climate Dataset [MESOCLIM; i.e., 3-hourly Vienna Enhanced Resolution Analysis (VERA) analyses of MSLP, potential temperature, and equivalent potential temperature for a 3000 km × 3000 km area centered over the Alps during 1971–2005] project. There are many possible reasons for a change in a measurement site’s performance, for example, a change in the instrumentation, a slight relocation of the site, or a different way of processing the data (pressure reduction). To estimate these artificial influences in the data, deviations were calculated for each reporting station at each point in time, using a piecewise functional fitting approach based on a variational algorithm. This algorithm minimizes first- and second-order spatial derivatives using the tested station’s neighboring stations and, in turn, their neighbors. The resulting time series of deviations for each station were then tested with a “standard normal homogeneity test” to detect changes in the mean deviation. With the knowledge of these “break points,” bias-correction estimates were calculated for each station. These correction estimates are constant between the detected break points because the method does not detect changes in trend slopes. Applying these correction estimates yields smoother fields and a more homogeneous distribution of trends.

## 1. Introduction

Since more long-term observations become available every year, quality control and bias correction are becoming increasingly important. On the one hand, long observation time series are used to detect climate changes; on the other hand, they can be used to validate climate model simulations (Feng et al. 2004). Therefore, robust and easily applicable techniques to control and correct meteorological observations have to be developed to meet the data standards suggested by the World Meteorological Organization (WMO 2008).

While some data centers still consider human inspection an important part of quality control (Shafer et al. 2000), this paper focuses on an automated method. It can be applied to large numbers of time series, provided the spatial distance between the observation sites is reasonably small compared to the characteristic scale of variations of the measured field.

Finding break points and removing artificial shifts in time series are often referred to as homogenization. Various methods to detect and correct artificial breaks in time series have been developed using metadata (Luers and Eskridge 1995; Eskridge et al. 2003) and analysis of time series (Haimberger 2007). The procedure presented here estimates break times and bias corrections from the observation time series themselves and therefore does not need metadata.

To create a 3-hourly mean sea level pressure (MSLP) analysis dataset for a 3000 km × 3000 km domain centered over the Alps, the observational input data had to be quality controlled and bias corrected. A partially biased MSLP time series would decrease the temporal consistency of the analyses, and therefore such records have to be identified. For this purpose a bias-correction scheme and a two-step quality-control procedure were developed.

There are several circumstances that can cause a systematic error persisting over an interval longer than a few months (e.g., a change in instrumentation, a modification of the measurement site, or a different pressure reduction). The presented approach aims to detect and correct such systematic inhomogeneities.

Prior to the bias estimation, the observations were compared to the nearest grid points of the 40-yr European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40). By rejecting observations whose difference from the reanalysis exceeded a certain threshold, the influence of gross errors was reduced.

The bias estimation was done with the help of an automated quality-control algorithm described by Steinacker (2000). This variational algorithm calculates deviations by minimizing first- and second-order spatial derivatives. These deviations were then tested for inhomogeneities using a standard normal homogeneity test (SNHT) similar to the test used by Haimberger (2007) for the Radiosonde Observation Correction using Reanalysis (RAOBCORE). With information about the breaks in these time series and the magnitude of the deviations between the breaks, a bias correction was calculated. After this bias correction, the corrected observations were checked with the quality-control algorithm again to provide a field smooth enough to be resolved by the Vienna Enhanced Resolution Analysis (VERA) tool (Steinacker et al. 1997; Bica et al. 2007).

In section 2, the origin and characteristics of the dataset that underwent the quality control are described. The methodological approach to detect breaks in the data series is given in section 3, followed by a description of a method to detect gross errors in section 4. The operational implementation of the quality control and results are discussed in section 5, and conclusions and an outlook are presented in section 6.

## 2. The MESOCLIM dataset

The Mesoscale Alpine Climate Dataset (MESOCLIM) covers the period 1971–2005 and contains reports from over 4000 stations, ships, platforms, and buoys in Europe and the surrounding area. The data were merged from five different data archives:

- Deutscher Wetterdienst (DWD): station data from Germany.
- Meteorological Archival and Retrieval System (MARS) operational data: station data from all over the domain, but only from 1996 to 2005.
- MARS ERA-40 data: station data from all over the domain for the whole period.
- Zentralanstalt für Meteorologie und Geodynamik (ZAMG): station data from Austria for the whole period.
- MeteoSwiss: Swiss station data for the whole period.

Various problems occurred during the merging process, and considerable effort was put into cleaning and controlling the input data. Because station observation data were sometimes available in more than one data archive, the archives had to be ranked by the reliability of their data. Additionally, procedures to identify duplicate data entries had to be developed. Because of several political changes in Europe, WMO station numbers changed in some countries (East Germany, Croatia, Serbia, Slovenia, and Bosnia–Herzegovina), complicating the search for redundant data. In the ERA-40 archive, for example, data from East Germany before 1989 were labeled with the WMO identifiers valid at that time (09 000–09 999), while the DWD archive listed these stations with the now valid numbers (10 000–10 999). The insufficient precision of the geographical location data was another factor that made it hard to detect stations with redundant reports originating from the different datasets. Observations found in more than one dataset were taken preferably from the MARS archive, as it is believed to be the most reliable.

## 3. Bias correction: Theory and experiments

The presented procedure was developed to detect artificial breaks in MSLP time series and to correct the biases between two break points. Reasons for such errors in long-term pressure time series include the introduction of new measurement instruments or a slight change of the instrument’s height. Such break points are, however, hard to detect if only the station’s own time series is examined.

Therefore, the procedure tests time series of deviations calculated with an automated quality-control algorithm. This algorithm compares each measurement with an interpolated value; the difference between the two is defined as a deviation. To derive an interpolated value at a measurement site, an analysis field is fitted by minimizing its first- and second-order derivatives, using the observations of the surrounding stations.
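
As an illustration, the deviation of a single station can be sketched as the difference between its observation and a value interpolated from its neighbors. The sketch below (Python; function name and neighbor count are illustrative) fits a local quadratic surface by least squares, which is only a simplification of the variational minimization of first- and second-order derivatives described by Steinacker (2000):

```python
import numpy as np

def deviation_at_station(x, y, p, idx, n_neighbors=8):
    """Simplified sketch: estimate the deviation of station `idx` by
    fitting a quadratic surface to its nearest neighbors and comparing
    the fitted value with the observation. The actual algorithm
    minimizes derivatives variationally over the whole station set."""
    # distances from station idx to all other stations
    d = np.hypot(x - x[idx], y - y[idx])
    d[idx] = np.inf                      # exclude the tested station itself
    nb = np.argsort(d)[:n_neighbors]     # nearest neighbors
    # quadratic surface p ~ a + b*dx + c*dy + d*dx^2 + e*dx*dy + f*dy^2,
    # in coordinates centered on the tested station
    dx, dy = x[nb] - x[idx], y[nb] - y[idx]
    A = np.column_stack([np.ones_like(dx), dx, dy, dx**2, dx * dy, dy**2])
    coef, *_ = np.linalg.lstsq(A, p[nb], rcond=None)
    interpolated = coef[0]               # surface value at the station
    return p[idx] - interpolated         # deviation = observation - interpolation
```

For a smooth field the fitted surface reproduces the station value, so an artificial offset at one station shows up almost entirely in its deviation.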

The output of the automated quality-control algorithm was checked for inhomogeneities that could be caused by artificial shifts in the station records with an SNHT (Alexandersson and Moberg 1997). The variant of the SNHT used here was developed by Haimberger (2007) for testing analysis feedback data of radiosonde temperatures and is defined as

*T _{k}* = (*N*/2) [(*μ*_{1k} − *μ*)² + (*μ*_{2k} − *μ*)²] / *σ _{k}*²,

where *N*/2 denotes the length of the subsamples; *μ*_{1k} and *μ*_{2k} are the means of the deviations in subsamples 1 and 2, respectively; while *μ* and *σ _{k}* represent mean and standard deviation, respectively, over both intervals. This test uses fixed subsample lengths.
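
A minimal sketch of this statistic, assuming a plain moving-window implementation with fixed subsample length *N*/2 (the operational code may differ in detail):

```python
import numpy as np

def snht(dev, half):
    """SNHT with fixed subsample length `half` (= N/2). For each
    candidate time k, the means of the `half` values before and after k
    are compared with the mean and standard deviation over the combined
    window; large T_k indicates a shift in the mean deviation."""
    dev = np.asarray(dev, dtype=float)
    n = len(dev)
    t = np.full(n, np.nan)
    for k in range(half, n - half):
        left = dev[k - half:k]            # subsample 1
        right = dev[k:k + half]           # subsample 2
        both = dev[k - half:k + half]
        mu, sigma = both.mean(), both.std()
        if sigma == 0:
            continue
        t[k] = half * ((left.mean() - mu) ** 2 +
                       (right.mean() - mu) ** 2) / sigma ** 2
    return t
```

The break time is then estimated as the position of the maximum of the test statistic.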

To investigate the test’s capability to detect breaks in the deviation time series, an experiment with synthetic time series was carried out. To obtain realistic synthetic data, 200 deviation time series, obtained by applying the automated quality-control algorithm to real data, were subjected to a Fourier analysis. In this way, 200 sets of Fourier coefficients were calculated and used to estimate their distributions. To derive synthetic time series, 3000 sets of Fourier coefficients were sampled from these distributions. Frequencies below 0.25 yr^{−1} (i.e., wavelengths longer than 4 yr) were omitted, which amounts to a high-pass filter. This approach was chosen to create time series with realistic diurnal and yearly cycles, but without trends or steps. A comparison of random synthetic time series with real deviation time series showed that this technique yields realistic results.
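
The generation of such synthetic series can be sketched as follows. The amplitude distribution (exponential with given means) and its parameters are illustrative assumptions, not the distributions actually estimated from the 200 real deviation series:

```python
import numpy as np

def synthetic_deviation_series(n_steps, mean_amp, rng):
    """Sketch of the synthetic-series generation: sample Fourier
    amplitudes (assumed exponential, random phases) and keep only
    frequencies of at least 0.25 yr^-1, i.e. a high-pass filter that
    removes trends and step-like features. `mean_amp` maps frequency
    index -> mean amplitude and is an illustrative stand-in for the
    distributions estimated from real deviation series."""
    steps_per_year = 2920                        # 3-hourly data
    t = np.arange(n_steps)
    series = np.zeros(n_steps)
    for j, amp_mean in mean_amp.items():
        freq = j / n_steps                       # cycles per time step
        if freq * steps_per_year < 0.25:         # high-pass: drop < 0.25 yr^-1
            continue
        amp = rng.exponential(amp_mean)
        phase = rng.uniform(0.0, 2.0 * np.pi)
        series += amp * np.cos(2.0 * np.pi * freq * t + phase)
    return series
```

Because the zero and near-zero frequencies are filtered out, any trend or step later found by the SNHT in these series must be artificial.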

Every time series was modified with a break of size 0.25σ, 0.5σ, 1σ, 2σ, or 3*σ* at a random time, resulting, together with the 3000 unmodified series, in a total of 18 000 time series. Two setups of the SNHT were tested: one with a subsample length of 250, which equals 31.25 days since the time step is 3 h, and one with a subsample length of 2920, which corresponds to 1 yr.
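
Inserting an artificial break of a given size at a random time can be sketched as (function name is illustrative):

```python
import numpy as np

def insert_break(series, size_in_sigma, rng):
    """Add a single step change of `size_in_sigma` standard deviations
    of the series at a random time, as done for the synthetic
    break-detection experiment. Returns the modified series and the
    break position."""
    out = series.astype(float).copy()
    k = int(rng.integers(1, len(out)))          # random break time
    out[k:] += size_in_sigma * series.std()     # step of the requested size
    return out, k
```

Applying this for each of the five break sizes to each of the 3000 homogeneous series produces the full experimental sample.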

Figure 1 shows an example of such a simulated time series and the corresponding SNHT statistics. Both setups of the test detect the break, differing only in magnitude. Distributions of the maximum SNHT values are shown in Figs. 2 and 3 for the different sample lengths. To estimate a reasonable threshold for the *T _{k}* values to indicate a break, the maximum *T _{k}* values for homogeneous time series have to be examined. For the short sample length, these maximum *T _{k}* values are smaller than 24 in 95% and smaller than 29.5 in 99% of all homogeneous time series, while the longer sampling interval yields higher levels of 137.5 and 194.5 in 95% and 99% of all cases, respectively. To indicate a break, a *T _{k}* threshold higher than the maximum values of the SNHT statistics calculated for homogeneous time series has to be chosen.

To estimate the size of the breaks detectable with thresholds of this magnitude, the distribution of *T _{k}* values for synthetic time series with artificial breaks has to be examined. Using the short-interval test, the maximum *T _{k}* values reached for time series with a break of 2*σ* are smaller than 44 in 1% and smaller than 52.7 in 5% of all cases, while these numbers are 537 and 629.6 for the long intervals, respectively. These 1% and 5% values of the *T _{k}* distributions are shown in Table 1 for breaks of different sizes.

In addition to the experiment described above, the SNHT’s capability to estimate the correct time of the break was investigated. Figure 4 shows that both tests detect the time of the break quite well, with mean absolute errors of 4.2 and 2.8 h (at 3-h resolution) for long and short intervals, respectively. If more than one break occurs within one subsample length, the accuracy of the time detection decreases as a result of less distinctive *T _{k}* maxima that might appear in the vicinity of the break points.

The results suggest that both tests are capable of detecting breaks ≥2*σ*, while smaller breaks can be lost in the noise as a result of the high significance thresholds. Because of the much shorter computing time and the higher accuracy of break time detection when using short intervals, this version of the test was chosen.

## 4. Checking the data for gross errors with the help of ERA-40 data

The ERA-40 MSLP analyses, complemented with analysis data from the operational ECMWF forecast, provide a field smooth enough to be used for the elimination of gross errors. Therefore, the observational data were compared to the nearest ERA-40 grid point of a 2.5° × 2.5° grid, and observations were rejected if the difference exceeded 9.5 hPa. This threshold was chosen because gross errors of 10 hPa were quite common when reports were still written by hand. The maximum percentage of stations rejected throughout the whole period was 17.45%, encountered on 2 February 1972, when 137 of 785 observing stations were omitted. Roughly 90% of these stations were located in the former Soviet Union, and their data were extracted from the ERA archive. Their offset was approximately 100 hPa, indicating a systematic error on that day. On average, only 0.5% of the reporting stations were rejected, and on only 75 of 102 272 occasions were more than 5% ignored.
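
On a regular 2.5° × 2.5° grid, the nearest-grid-point comparison reduces to rounding the station coordinates to the grid spacing. A sketch (function name and array layout are assumptions, not the operational implementation):

```python
import numpy as np

def gross_error_check(lon, lat, mslp_obs, grid_lon, grid_lat, grid_mslp,
                      threshold=9.5):
    """Reject observations deviating more than `threshold` hPa from the
    nearest grid point of a regular 2.5-degree analysis grid. Returns a
    boolean mask (True = keep the observation). `grid_mslp` is assumed
    to be indexed as [lat, lon]."""
    # nearest grid point = station coordinate rounded to the grid spacing
    i = np.rint((lat - grid_lat[0]) / 2.5).astype(int)
    j = np.rint((lon - grid_lon[0]) / 2.5).astype(int)
    diff = np.abs(mslp_obs - grid_mslp[i, j])
    return diff <= threshold
```

A coarse reanalysis field cannot resolve mesoscale detail, which is why only gross errors, not small biases, are filtered at this stage.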

## 5. Operational implementation and results

The variational algorithm that calculates deviations for all stations that have passed the gross error check was carried out at each time step to obtain deviation time series for every station. These time series were then tested with the short-interval setup of the SNHT. To take missing values into account, the test statistic was only calculated when both intervals provided at least 80% of the possible records. Because a deviation time series can be influenced by a neighboring station, especially if the neighboring time series contains a break, a restrictive threshold of 100 was set for the SNHT values to indicate a break. With this threshold the test is still capable of finding more than 95% of the breaks greater than 3*σ*, since the 5% level of the test statistics for breaks of this size was 107.6. The large margin to the 99% level of maximum *T _{k}* values for homogeneous time series (i.e., 29.5) was chosen to make sure that no breaks are indicated in homogeneous time series.
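
The availability gating can be sketched as an SNHT variant that skips candidate times with too many missing records (here encoded as NaN); a simplified illustration, not the operational code:

```python
import numpy as np

def snht_with_gaps(dev, half=250, min_frac=0.8):
    """SNHT with fixed subsample length `half` that computes the
    statistic only where both subsamples contain at least `min_frac`
    of the possible records; missing values are NaN."""
    dev = np.asarray(dev, dtype=float)
    n = len(dev)
    t = np.full(n, np.nan)
    for k in range(half, n - half):
        left, right = dev[k - half:k], dev[k:k + half]
        # gate: require at least min_frac valid records on both sides
        if (np.isfinite(left).mean() < min_frac or
                np.isfinite(right).mean() < min_frac):
            continue
        both = np.concatenate([left, right])
        mu, sigma = np.nanmean(both), np.nanstd(both)
        if not sigma > 0:
            continue
        t[k] = half * ((np.nanmean(left) - mu) ** 2 +
                       (np.nanmean(right) - mu) ** 2) / sigma ** 2
    return t
```

Candidate times failing the 80% rule simply remain NaN in the output, so they can never exceed the break threshold of 100.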

If the test statistic exceeded this threshold, its maximum within a 6-month interval was considered a break point. This is necessary because the test statistic sometimes exhibits a few local maxima within 1 or 2 months before and after the main maximum, which indicates the break in the time series. In the top panel of Fig. 5 the deviation time series of station 11022 (Retz, Austria) is shown in black, and the black vertical lines indicate detected breaks. To correct these inhomogeneities, the median of the deviations between two break points was used as the correction estimate. If no break points were found in a time series, the median of the deviations was still applied as a constant bias correction, but only if it was larger than 0.1 hPa and more than 6 months of data were available.
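
The piecewise-constant correction can be sketched as follows (the 6-month minimum length is expressed as 1460 three-hourly steps; the function name is illustrative):

```python
import numpy as np

def bias_correction(dev, breaks, min_len=1460, min_bias=0.1):
    """Piecewise-constant correction estimate: between consecutive
    break points the median deviation is used. For a series without
    breaks, the overall median is applied only if it exceeds
    `min_bias` hPa and more than `min_len` records (6 months of
    3-hourly data) are available. The corrected observation is
    observation minus correction."""
    dev = np.asarray(dev, dtype=float)
    corr = np.zeros_like(dev)
    if not breaks:
        med = np.nanmedian(dev)
        if abs(med) > min_bias and np.isfinite(dev).sum() > min_len:
            corr[:] = med
        return corr
    edges = [0] + sorted(breaks) + [len(dev)]
    for a, b in zip(edges[:-1], edges[1:]):
        corr[a:b] = np.nanmedian(dev[a:b])   # one constant per segment
    return corr
```

Using the median rather than the mean keeps the estimate robust against remaining outliers in the deviation series.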

The middle and bottom panels of Fig. 5 show the uncorrected and corrected MSLP time series of the same station. In this case the applied bias correction changes the trend of the time series from −0.1 to −0.01 hPa yr^{−1}. Considering that the long-term trend of MSLP should be close to zero, the correction seems to improve the time series. The distribution of MSLP trends from uncorrected and corrected time series with a minimum length of 20 yr is shown in Fig. 6. Overall, the corrections reduce the magnitude of the individual trends. A smoothing of the field caused by the correction can also be seen when comparing analyses calculated with VERA from raw and corrected data, as shown in Figs. 7 and 8.

## 6. Conclusions and outlook

The proposed methodology to detect gross errors and biases allows for meaningful mesoscale analyses of MSLP. For the method to be applied successfully, the station density must be sufficiently high, which implies a significant spatial autocorrelation. Note that not all meteorological parameters show the same spatial autocorrelation: whereas for surface temperature the results for the fairly dense European synoptic network are also meaningful, the application of the methodology to precipitation data is less useful, especially during periods of convective precipitation, since the scale of convective precipitation is much smaller than that of other parameters. Gridded mesoscale analyses of quality-controlled MSLP, surface temperature, specific humidity, and surface wind speed will soon become publicly available.

## Acknowledgments

The authors thank the Austrian Science Fund (FWF) for supporting the project.