  • Bayes, T., 1763: An essay towards solving a problem in the doctrine of chances. Philos. Trans. Roy. Soc., 53, 330–418.

  • Behrens, W. V., 1929: Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen. Landwirtschaftliche Jahrbücher, 68, 807–837.

  • Bovensmann, H., J. P. Burrows, M. Buchwitz, J. Frerick, S. Noël, V. V. Rozanov, K. V. Chance, and A. H. P. Goede, 1999: SCIAMACHY–Mission objectives and measurement modes. J. Atmos. Sci., 56, 127–150.

  • Bretthorst, G. L., 1993: On the difference in means. Physics and Probability, Cambridge University Press, 177–194.

  • Burrows, J. P., and Coauthors, 1999: The Global Ozone Monitoring Experiment (GOME): Mission concept and first scientific results. J. Atmos. Sci., 56, 151–175.

  • de Laplace, P. S., 1812: Théorie Analytique des Probabilités. Courcier Imprimeur, 506 pp.

  • Dose, V., and A. Menzel, 2004: Bayesian analysis of climate change impacts in phenology. Global Change Biol., 10, 259–272.

  • Edelson, R. A., and J. H. Krolik, 1988: The discrete correlation function: A new method for analyzing unevenly sampled variability data. Astrophys. J., 333, 646–659.

  • Elliott, W. P., and D. J. Gaffen, 1991: On the utility of radiosonde humidity archives for climate studies. Bull. Amer. Meteor. Soc., 72, 1507–1520.

  • Fisher, R. A., 1937: The comparison of samples with possibly unequal variances. Ann. Eugen., 9, 174–180.

  • Garand, L., C. Grassotti, J. Halle, and G. L. Klein, 1992: On differences in radiosonde humidity: Reporting practices and their implications for numerical weather prediction and remote sensing. Bull. Amer. Meteor. Soc., 73, 1417–1423.

  • Gilks, W. R., S. Richardson, and D. Spiegelhalter, 1995: Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, 512 pp.

  • Jaynes, E. T., and L. G. Bretthorst, 2003: Probability Theory: The Logic of Science: Principles and Elementary Applications. Vol. 1. Cambridge University Press, 758 pp.

  • Jeffreys, H., 1939: Theory of Probability. 3rd ed. Oxford University Press, 472 pp.

  • Kahn, B. H., A. Gettelman, E. J. Fetzer, A. Eldering, and C. K. Liang, 2009: Cloudy and clear-sky relative humidity in the upper troposphere observed by the A-train. J. Geophys. Res., 114, D00H02, doi:10.1029/2009JD011738.

  • Keeling, C. D., S. C. Piper, R. B. Bacastow, M. Wahlen, T. P. Whorf, M. Heimann, and H. A. Meijer, 2001: Exchanges of atmospheric CO2 and 13CO2 with the terrestrial biosphere and oceans from 1978 to 2000. Observations and carbon cycle implications. A History of Atmospheric CO2 and Its Effects on Plants, Animals, and Ecosystems: I. Global Aspects, J. R. Ehleringer et al., Eds., Springer, 83–113.

  • Lally, V. E., 1985: Upper air in situ observing systems. Handbook of Applied Meteorology, John Wiley & Sons, Inc., 352–360.

  • Lanzante, J. R., 2005: A cautionary note on the use of error bars. J. Climate, 18, 3699–3703.

  • Lee, T. C. K., F. W. Zwiers, G. C. Hegerl, X. Zhang, and M. Tsao, 2005: A Bayesian climate change detection and attribution assessment. J. Climate, 18, 2429–2440.

  • Liu, J. S., 2003: Monte Carlo Strategies in Scientific Computing. Springer, 360 pp.

  • Mieruch, S., 2010: Identification and statistical analysis of global water vapour trends based on satellite data. Ph.D. thesis.

  • Mieruch, S., S. Noël, H. Bovensmann, and J. P. Burrows, 2008: Analysis of global water vapour trends from satellite measurements in the visible spectral range. Atmos. Chem. Phys., 8, 491–504.

  • Moreno, E., F. Bertolino, and W. Racugno, 1999: Default Bayesian analysis of the Behrens-Fisher problem. J. Stat. Plann. Infer., 81, 323–333.

  • Noël, S., M. Buchwitz, and J. P. Burrows, 2004: First retrieval of global water vapour column amounts from SCIAMACHY measurements. Atmos. Chem. Phys., 4, 111–125.

  • Perneger, T. V., 1998: What’s wrong with Bonferroni adjustments. BMJ, 316, 1236–1238.

  • Robert, C. P., and G. Casella, 2005: Monte Carlo Statistical Methods. Springer, 536 pp.

  • Satterthwaite, F. E., 1946: An approximate distribution of estimates of variance components. Biom. Bull., 2, 110–114, doi:10.2307/3002019.

  • Schlittgen, R., and B. H. J. Streitberg, 1997: Zeitreihenanalyse. Oldenbourg.

  • Schulz, J., and Coauthors, 2009: Operational climate monitoring from space: The EUMETSAT Satellite Application Facility on Climate Monitoring (CM-SAF). Atmos. Chem. Phys., 9, 1687–1709.

  • Sivia, D. S., and J. Skilling, 2006: Data Analysis: A Bayesian Tutorial. Oxford University Press, 208 pp.

  • Sohn, B. J., and R. Bennartz, 2008: Contribution of water vapor to observational estimates of longwave cloud radiative forcing. J. Geophys. Res., 113, D20107, doi:10.1029/2008JD010053.

  • ter Braak, C., 2006: A Markov Chain Monte Carlo version of the genetic algorithm differential evolution: Easy Bayesian computing for real parameter spaces. Stat. Comput., 16, 239–249.

  • Vaisala, 1989: RS 80 Radiosondes. Upper-Air Systems product information. Vaisala Inc. Reference R0422-2, 16 pp.

  • Weatherhead, E. C., and Coauthors, 1998: Factors affecting the detection of trends: Statistical considerations and applications to environmental data. J. Geophys. Res., 103, 17 149–17 161.

  • Welch, B. L., 1947: The generalization of “Student’s” problem when several different population variances are involved. Biometrika, 34, 28–35.

  • Westfall, P. H., W. O. Johnson, and J. M. Utts, 1997: A Bayesian perspective on the Bonferroni adjustment. Biometrika, 84, 419–427.

    Sensitivity analysis of a change in the trend prior. As an example, we have chosen P(A|D1, D2, I) = 0.7 and P(B|D1, D2, I) = 0.3, depicted as black and gray points. The x axis represents the change of the prior probability P(ω|I) = 5 in percent. For instance, a change of 10% corresponds to an increase of the prior to P(ω|I) = 5.5. The embedded small figure shows the results for larger changes of the prior.

    One hundred eighty-seven test results between pairs of water vapor trends from GOME–SCIAMACHY and radiosonde data are plotted against the respective trend differences and against the trend differences divided by the error of the differences: (a),(b) using the Welch test, (c),(d) applying the exact Bayesian method, and (e),(f) using the approximate Bayesian approach.

    Exact Bayesian method applied to trends using the Bonferroni correction.

    Examples of GOME–SCIAMACHY and radiosonde water vapor time series at (a),(b) Nottingham, England; (c),(d) Albany Airport, Australia; (e),(f) Meiningen, Germany; and Minqin, China.

    Global water-vapor total column trends from GOME–SCIAMACHY are coded in gray scales. The 187 radiosonde water vapor trends are embedded into the figure as circles, where the same color bar is used for filling. White-bordered circles depict Bayesian probabilities of agreement between satellite and radiosonde trends >0.76, gray borders indicate probabilities >0.5 and ≤0.76, whereas black-bordered circles show probabilities of agreement ≤0.5. Note that the Himalayas and Andes regions are excluded from the analysis because water vapor over such high-elevation terrain cannot be retrieved by the AMC–DOAS method.

    GOME–SCIAMACHY water vapor trends in the range of ±0.03 g cm−2 yr−1 at the Arabian Peninsula (10°–30°N, 35°–60°E) with embedded radiosonde trends from 1996 to 2007. A different color scale than in Fig. 5 is used in this blowup.

A New Method for the Comparison of Trend Data with an Application to Water Vapor

  • 1 Institute of Environmental Physics (IUP), University of Bremen, Bremen, Germany
  • 2 Satellite Application Facility on Climate Monitoring (CMSAF), German Weather Service, Offenbach, Germany
  • 3 European Organisation for the Exploitation of Meteorological Satellites, Darmstadt, Germany

Abstract

Global total column water vapor trends have been derived from both the Global Ozone Monitoring Experiment (GOME) and the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY) satellite data and from globally distributed radiosonde measurements, archived and quality controlled by the Deutscher Wetterdienst (DWD).

The control of the atmospheric water vapor amount by the hydrological cycle plays an important role in determining surface temperature and its response to the increase in the man-made greenhouse effect. As a result of its strong infrared absorption, water vapor is the most important naturally occurring greenhouse gas. Without water vapor, the earth's surface temperature would be about 20 K lower, making the evolution of life, as we know it, impossible. The monitoring of water vapor and its evolution in time is therefore of utmost importance for our understanding of global climate change. Comparisons of trends derived from independent water vapor measurements from satellites and radiosondes facilitate the assessment of the significance of the observed changes in water vapor.

In this manuscript, the authors have compared observed water vapor change and trends, derived from independent instruments, and assessed the statistical significance of their differences. This study deals with an example of the Behrens–Fisher problem, namely, the comparison of samples with different means and different standard deviations, applied to trends from time series.

First, the Behrens–Fisher problem for the derivation of the consolidated change and trends is solved using standard (frequentist) hypothesis testing by performing the Welch test. Second, a Bayesian model selection is applied to solve the Behrens–Fisher problem by integrating the posterior probabilities numerically using the Differential Evolution Markov Chain (DEMC) algorithm. Additionally, an analytical approximate solution of the Bayesian posterior probabilities is derived by means of a quadratic Taylor series expansion, which can be applied in a computationally efficient manner to large datasets. The two statistical methods used in the study yield similar results for the comparison of the water vapor changes and trends from the different measurements, indicating consolidated and consistent behavior.

Corresponding author address: Sebastian Mieruch, University of Bremen, P.O. Box 330440, Bremen, Germany. E-mail: sebastian.mieruch@iup.physik.uni-bremen.de

1. Introduction

The estimation of robust parameters that describe change from time series of data, together with their statistical significance, is an important aspect of modern climate research. Sophisticated and appropriate statistical methods are needed for this task. Data for climate studies are collected within large international projects designed for time spans of several decades, for example, the Global Climate Observing System (GCOS). Such data include both in situ measurements, such as radiosondes, and remote sensing measurements from spaceborne instruments. This effort has been generating long-term time series of diverse quantities (Schulz et al. 2009). The most famous climate time series ever recorded is probably the CO2 time series at the Mauna Loa Observatory in Hawaii, initiated by the late C. D. Keeling (Keeling et al. 2001). Long-term time series provide extremely useful information about the change of the respective quantities in the form of trends. The separation of the daily, weekly, monthly, seasonal, yearly, decadal, and multidecadal changes, which are often called trends, is challenging, and the significance or knowledge of these trends depends on the length of the data, the instrumental noise, the natural variability and noise, and also on autocorrelations in the noise. Additionally, any changes in instruments, calibration, and the location of measurement sites decrease the knowledge and significance of the derived trends (Weatherhead et al. 1998). To test the robustness of the estimated errors in the derived trends, and to quantitatively establish the representativeness of the trends, it is not sufficient to calculate only the trends and their errors; it is also necessary to compare trends from independent instruments having overlapping measurements.

In this manuscript, we compare trends of total water vapor columns obtained from different sources. One dataset combines measurements from the Global Ozone Monitoring Experiment (GOME; Burrows et al. 1999), onboard ERS-2 since 1995, and the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY; Bovensmann et al. 1999), onboard ENVISAT since 2002. In addition, water vapor trends from the globally distributed radiosonde network stations have been used, archived by the Satellite Application Facility on Climate Monitoring (CMSAF), which is hosted by the Deutscher Wetterdienst (DWD). These data are well suited for comparison and for the assessment of the significance of trends in water vapor because of their independence. However, the datasets differ in sampling times, spatial resolution, etc. The question therefore arises of how best to compare these data and the trends derived from them.

The appropriate comparison of data or quantities such as means and standard deviations from different measurement techniques is a common problem in a variety of scientific disciplines. One possible solution has a long tradition, going back at least to 1929, when Behrens (1929) proposed a solution to the problem of accurately determining the difference of means in data from different sources, assuming that the standard deviations of the different datasets are unequal and unknown. Fisher (1937) found a solution to the problem using fiducial inference. The Behrens–Fisher problem has also been solved by Jeffreys (1939) and more recently, for example, by Bretthorst (1993) and Moreno et al. (1999), who have used Bayesian probability analysis. Currently, examples of the Behrens–Fisher problem are mostly solved by using the frequentist approach, namely, the Welch test (Welch 1947), which is an adaptation of Student’s t test. Unfortunately, the Behrens–Fisher problem is often incorrectly simplified in such a way that a difference of given quantities is stated as significant if the respective error bars do not overlap, as discussed by Lanzante (2005).

In this study the Behrens–Fisher problem associated with the derivation of trends in water vapor from different time series having different, and possibly unknown, standard deviations has been investigated. The objective of our analysis is twofold. First, methods for trend comparisons in the sense of hypothesis testing under the frequentist and Bayesian framework have been developed. Second, the methods have been applied to water vapor data in a demonstration study, which shows that valuable meteorological information is gained successfully by applying the methods developed in this manuscript.

The manuscript is structured as follows. In section 2, we introduce the databases used, that is, the water vapor measurements from satellites and ground stations. In section 3, we briefly describe the two relevant schools of statistics—“the frequentist” (standard) approach and the Bayesian concepts—and explain their underlying philosophy. The statistical methods used are discussed in section 4, including the trend calculation, the Welch test for trends, the Bayesian model selection applied to trends, a sensitivity study regarding prior information, and the derivation of an approximation of the Bayesian method. The three approaches are applied to the GOME–SCIAMACHY and radiosonde water vapor time series and compared in section 5. Finally, the conclusions are given in section 6.

2. The water vapor measurements

a. GOME–SCIAMACHY

The global total water vapor column amounts used in the present study have been retrieved by the Air Mass Corrected Differential Optical Absorption Spectroscopy approach (AMC–DOAS) (Noël et al. 2004) from observations of the upwelling radiation at the top of the atmosphere in the visible range, measured by the GOME instrument onboard ERS-2, launched in April 1995, and the SCIAMACHY spectrometer onboard ENVISAT, launched in March 2002. These satellites fly in sun-synchronous orbits in descending node at a height of about 785 km. GOME crosses the equator at 1030 local time and has a spatial resolution of typically 40 km × 320 km, with global coverage achieved after about three days. SCIAMACHY crosses the equator at 1000 local time; its spatial resolution is somewhat better, typically 30 km × 60 km, whereas global coverage is achieved within six days. Since the instruments depend on sunlight, no information on water vapor is retrieved at night, and no data are obtained during the polar night. Scenes with too much cloud cover and regions with extreme elevation (e.g., Himalayas) are excluded from the analysis (Noël et al. 2004). The GOME–SCIAMACHY dataset is gridded on a spatial 0.5° × 0.5° lattice and then aggregated to monthly means. This yields a cloud-cleared monthly mean water vapor climatology between 1000 and 1030 local time.

b. Radiosondes

Radiosondes are devices carried on small balloons to heights around 35 km. Typically the parameters pressure, altitude, geographical position, temperature, relative humidity, and wind speed are recorded and sent to a ground station. Comprehensive information on different sensors, designs, calibration, etc. can be found, for example, in Elliott and Gaffen (1991), Garand et al. (1992), Lally (1985), and Vaisala (1989).

Under the framework of the World Meteorological Organization, a network of globally distributed radiosonde stations performs regular observations, which have been quality controlled and archived at the DWD. In this study, 187 high-quality time series (from more than 900 stations in total) are considered. The measured water vapor profiles have been integrated up to 100 hPa to yield the total column amounts. Typically the radiosondes are launched daily at 0600, 1200, 1800, and 0000 local time. These data have been averaged to derive monthly means, which are comparable with those determined from the satellite data. The time span is selected to match the satellite time range, and the comparison is performed for data from January 1996 to December 2007.

There are several differences between the two datasets. One of the most important differences is horizontal spatial resolution, which is relatively low, that is, poor (0.5° × 0.5°) for the satellite data and high (point measurements) for the radiosonde observations. Further differences in the water vapor data are expected because of the AMC–DOAS data being cloud cleared, whereas the radiosonde data contain all-sky measurements. The difference between clear-sky and all-sky relative humidity has been explored, for example, by Kahn et al. (2009) for the upper troposphere, where a large impact has been observed. Since we are dealing with the total column water vapor, these effects in the upper troposphere have only a minor contribution. However, Sohn and Bennartz (2008) have investigated the clear-sky bias for total column water vapor and found a bias of about 0.2 g cm−2 for zonal means between clear-sky and all-sky data. Additional biases can be expected due to diurnal variations of water vapor, as the satellite measurements in principle present a morning climatology (equator crossing at about 1000 local time), whereas the radiosondes sample the complete day. To account for such biases we use individual offsets for each dataset in the regression procedure in section 4a.

In spite of the differences in sampling there are important similarities in both datasets, for example, the temporal resolution being monthly. The averaging to monthly means typically smears out small-scale fluctuations and the individual offsets account for biases, which makes a comparison possible between the two datasets.

3. The two schools of statistics

The frequentist approach to statistics was developed primarily by Fisher, Neyman, and Pearson, among others, at the beginning of the twentieth century. Its underlying philosophy is the interpretation of the probability of an event as the limit of its relative frequency over a large number of trials. A major component of how statistics is used in environmental science is hypothesis testing, which is used under the framework of induction to make decisions using experimental data. The basic concept of hypothesis testing is to set up a null hypothesis H0, which is assumed to be true, and an alternative hypothesis H1, which is the complementary event of H0. Then the probability of exceeding the observed value of a test statistic (under H0) is inferred. The null hypothesis is typically rejected if this probability is below a significance level of, for example, α = 0.05. Such a rejection is taken as support for the alternative hypothesis.

Bayesian statistics was developed by Bayes (1763) and de Laplace (1812). It is much older than the frequentist approach but was largely forgotten until Jeffreys (1939) rediscovered the ideas of Bayes and de Laplace. The Bayesian concepts underwent a renaissance in the late twentieth century, in part as a result of the increase in computational power. More recent contributions to the Bayesian development include, for example, Jaynes and Bretthorst (2003). An advantage of the Bayesian formalism is that it is based completely on probability theory (Jaynes and Bretthorst 2003), whereas frequentist statistics is rather a compilation of a large number of tests and methods. Hypothesis testing can also be accomplished within the Bayesian framework. However, Bayesian hypothesis testing is better described as a model selection procedure, that is, inferring which model or hypothesis has the higher probability of explaining certain data or phenomena.

The major differences between frequentist and Bayesian statistics are as follows.

  • Philosophical difference: The deep philosophical difference is that the parameters are fixed (but unknown) in the former and have some randomness in the form of a prior (or degree of belief) distribution in the latter. Data are used by Bayesians to update the prior knowledge, expressed as a prior distribution, resulting in a posterior distribution that expresses the relative evidence for the parameter values given the data and the prior knowledge. In contrast, frequentists calculate statistics from the data to estimate the parameters and calculate the distribution of these statistics over (hypothetical) other datasets generated under the same model and assumed fixed parameter values. This can be elucidated by conditional probabilities. A conditional probability is the probability of an event X given the occurrence of another event Y and is denoted as P(X|Y). In the frequentist approach X could be data or a statistic derived from the data and Y a hypothesis, for example, a model with a particular parameter value that is assumed to have generated the data. In hypothesis testing frequentists then calculate the exceedance probability P(X > X0|Y), where X0 is the (test) statistic calculated from the data at hand. Bayesians can give the reverse, P(Y|X), the probability of the hypothesis given the data. The frequentist probability requires the notion of other (hypothetical) datasets generated from the same model and parameters, whereas the Bayesian probability conditions on the particular data at hand.
  • Prior information: Bayesian methods utilize prior information about the truth of a hypothesis or parameter range, which reflects the knowledge (or ignorance) before the data have been analyzed. This enables a new quality of statistical inference, for example, the investigation of the evidence for human-induced climate change, as performed by Lee et al. (2005). In frequentist statistics such prior information does not exist.
The foundation of Bayesian statistics is Bayes' theorem, which can be formulated as
$$P(Y|X, I) = \frac{P(X|Y, I)\,P(Y|I)}{P(X|I)}, \tag{1}$$
where X and Y are propositions, and I denotes the relevant background information. The I is often omitted, but it has to be kept in mind that no absolute probabilities exist without certain background assumptions or information. The P(Y|X, I) is called the posterior probability; P(X|Y, I) is the likelihood; P(Y|I) is the prior probability; and P(X|I) has formerly been called the marginal likelihood, for which Sivia and Skilling (2006) have introduced the term “evidence.”
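
As an illustration of Eq. (1), the following minimal sketch evaluates the posterior probability of a binary proposition Y, with the evidence P(X|I) obtained by summing over Y and its complement. The function name `posterior` and the numerical values are hypothetical and chosen only for illustration; they are not taken from the analysis in this manuscript.

```python
def posterior(prior_y, like_x_given_y, like_x_given_not_y):
    """Return P(Y|X, I) for a binary proposition Y via Bayes' theorem."""
    # Evidence P(X|I): the likelihood marginalized over Y and its complement.
    evidence = like_x_given_y * prior_y + like_x_given_not_y * (1.0 - prior_y)
    return like_x_given_y * prior_y / evidence


# Hypothetical numbers: equal prior weight, data favoring Y by 0.7 to 0.3.
print(round(posterior(prior_y=0.5, like_x_given_y=0.7, like_x_given_not_y=0.3), 3))  # 0.7
```

With equal priors the posterior simply reflects the ratio of the likelihoods; an unequal prior shifts the result accordingly.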

4. Methods

a. Estimating trends from time series

As one quality criterion, a time series has to include at least two-thirds of all (144) monthly data points considered for the comparison, because time series with large data gaps are not representative. The disadvantage of the two-thirds criterion is that only 187 radiosonde time series fulfill this requirement, whereas the satellite data fulfill the criterion in all 908 cases.
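
The two-thirds criterion can be sketched as a simple coverage check. The encoding of missing months as `None` or NaN and the function name `passes_coverage` are assumptions of this sketch, not properties of the archived data.

```python
import math


def passes_coverage(series, num=2, den=3):
    """True if at least num/den of the monthly values are present.

    Missing months are assumed to be encoded as None or NaN.
    """
    valid = sum(1 for v in series if v is not None and not math.isnan(v))
    # Integer comparison avoids rounding issues in the fraction 2/3.
    return den * valid >= num * len(series)


# For 144 months (January 1996 to December 2007), 96 valid values are required.
example = [1.0] * 96 + [None] * 48
print(passes_coverage(example))
```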

Global GOME–SCIAMACHY water vapor trends have been calculated for the time span from 1996 to 2006 in Mieruch et al. (2008), where the methods have been adopted and slightly expanded from Weatherhead et al. (1998). We have extended the trend analysis, which now includes the time from 1996 to 2007.

In the following, the water vapor trend estimation is briefly discussed. For a more detailed description we refer to Mieruch et al. (2008) and Weatherhead et al. (1998).

An individual GOME–SCIAMACHY time series (single grid point) can be described by the trend model shown in Eq. (2):
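$$Y_{1t} = \mu_1 C_{1t} + S_{1t} + \omega_1 X_{1t} + \delta U_t + N_{1t}, \qquad t = 1, \ldots, T_1, \tag{2}$$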
where Y1t contains the monthly mean water vapor data, μ1 is a constant to be estimated, and C1t equals unity for all t and is needed when assessing autocorrelations; S1t is the seasonal component, ω1 represents the trend, and X1t contains the time. The subscript “1” is used for the satellite data, whereas we will use the subscript “2” for the radiosonde trend model.
To account for the change from the GOME to the SCIAMACHY data, that is, the change of spatial resolution and the time of measurement, we use a level shift (as suggested by Weatherhead et al. 1998) of fitted magnitude δ at and after time t = T0 (1 < T0 < T1), where T0 = 85 represents the intersection of GOME and SCIAMACHY data in January 2003. Here Ut describes a step function:
$$U_t = \begin{cases} 0, & t < T_0 \\ 1, & t \geq T_0. \end{cases} \tag{3}$$

The noise $N_{1t}$ is modeled as an autoregressive process of order one [AR(1)], that is, $N_{1t} = \phi_1 N_{1,t-1} + \varepsilon_{1t}$ (Schlittgen and Streitberg 1997), to account for autocorrelations in the data, where $\varepsilon_{1t}$ is an independent random variable with zero mean and variance $\sigma_1^2$. The magnitude of the autocorrelation $\phi_1$ is restricted to $-1 < \phi_1 < 1$ and is estimated using the discrete autocorrelation function (Edelson and Krolik 1988), which can account for gaps in the data.
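
To make the AR(1) noise model concrete, the sketch below simulates such a process and estimates the autocorrelation from adjacent pairs of valid values, so that gaps are skipped. This simple gap-aware lag-1 estimator is a stand-in chosen for brevity; it is not the discrete correlation function of Edelson and Krolik (1988). The function names and the encoding of gaps as `None` are assumptions.

```python
import random


def simulate_ar1(n, phi, sigma, seed=0):
    """Generate n samples of N_t = phi * N_{t-1} + eps_t, eps_t ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    noise, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, sigma)
        noise.append(prev)
    return noise


def lag1_autocorr(x):
    """Lag-1 autocorrelation using only adjacent pairs in which both values exist."""
    valid = [v for v in x if v is not None]
    mean = sum(valid) / len(valid)
    var = sum((v - mean) ** 2 for v in valid) / len(valid)
    pairs = [(a - mean) * (b - mean)
             for a, b in zip(x[:-1], x[1:]) if a is not None and b is not None]
    return sum(pairs) / len(pairs) / var


noise = simulate_ar1(20000, phi=0.6, sigma=1.0, seed=1)
gappy = [None if i % 10 == 0 else v for i, v in enumerate(noise)]
print(lag1_autocorr(noise), lag1_autocorr(gappy))
```

Both estimates should lie close to the value of phi used in the simulation, with and without the artificial gaps.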

In the publications of Mieruch et al. (2008) and Weatherhead et al. (1998) the seasonal component is described as a Fourier series and subtracted from the measurements to yield deseasonalized data. For simplicity anomalies are calculated by subtracting the seasonal means from 1996 to 2007 from the monthly mean water vapor time series in this manuscript. In Mieruch (2010) the trends are shown to be invariant with respect to the choice of the two methods for deseasonalizing. Accordingly, we add the respective overall means to the data. The magnitude of the level shift δ and the trend ω1 are invariant under the calculation of anomalies. Thus, the deseasonalized GOME–SCIAMACHY water vapor is now modeled by
$$Y_{1t} = \mu_1 C_{1t} + \omega_1 X_{1t} + \delta U_t + N_{1t}. \tag{4}$$
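
The anomaly calculation described above (subtracting the mean of each calendar month and adding back the overall mean) can be sketched as follows. The sketch assumes a series that starts in January, gaps encoded as `None`, and at least one valid value per calendar month; the function name `deseasonalize` is hypothetical.

```python
def deseasonalize(monthly):
    """Subtract calendar-month means from a monthly series and add back the overall mean."""
    # Mean of each calendar month (January to December) over all years, ignoring gaps.
    clim = []
    for m in range(12):
        vals = [v for v in monthly[m::12] if v is not None]
        clim.append(sum(vals) / len(vals))
    valid = [v for v in monthly if v is not None]
    overall = sum(valid) / len(valid)
    return [None if v is None else v - clim[i % 12] + overall
            for i, v in enumerate(monthly)]


# Two years of a purely seasonal signal collapse to the overall mean.
season = [0, 1, 2, 3, 4, 5, 5, 4, 3, 2, 1, 0]
print(deseasonalize([10.0 + season[i % 12] for i in range(24)])[:3])
```
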
As shown by Mieruch et al. (2008) the autocorrelated noise N1t is transformed to white noise ε1t using the AR(1). The autocorrelations are accommodated by the variables in Eq. (4) (now indicated with the asterisk superscript), thus we observe
$$Y^*_{1t} = \mu_1 C^*_{1t} + \omega_1 X^*_{1t} + \delta U^*_t + \varepsilon_{1t}. \tag{5}$$
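
A minimal sketch of the transformed regression is given below, assuming that the asterisk variables are obtained as Z*_t = Z_t − ϕ Z_{t−1} for each variable Z, that the series is gap free, that X_t is the time in years, and that ϕ has been estimated beforehand. These choices and the helper names `ols` and `fit_trend` are assumptions of this sketch; the procedure actually used is described in Mieruch et al. (2008).

```python
def ols(columns, y):
    """Least squares fit of y on the regressor columns.

    Returns (coefficients, standard errors, noise standard deviation).
    """
    k, n = len(columns), len(y)
    # Normal equations A^T A b = A^T y, augmented with the identity to obtain (A^T A)^-1.
    m = [[sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for i in range(k):  # Gauss-Jordan elimination with partial pivoting
        p = max(range(i, k), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        piv = m[i][i]
        m[i] = [v / piv for v in m[i]]
        for r in range(k):
            if r != i:
                f = m[r][i]
                m[r] = [a - f * b for a, b in zip(m[r], m[i])]
    inv = [row[k:] for row in m]
    aty = [sum(a * b for a, b in zip(col, y)) for col in columns]
    coef = [sum(inv[i][j] * aty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(c * col[t] for c, col in zip(coef, columns))
             for t, yi in enumerate(y)]
    sigma = (sum(r * r for r in resid) / (n - k)) ** 0.5
    stderr = [sigma * inv[i][i] ** 0.5 for i in range(k)]
    return coef, stderr, sigma


def fit_trend(y, phi, t0=None):
    """Fit mu, omega (per year) and, if t0 is given, the level shift delta."""
    n = len(y)
    months = range(1, n + 1)
    cols = [[1.0] * n, [t / 12.0 for t in months]]              # C_t and X_t
    if t0 is not None:
        cols.append([1.0 if t >= t0 else 0.0 for t in months])  # step function U_t

    def star(z):
        # AR(1) transform Z*_t = Z_t - phi * Z_{t-1}
        return [z[t] - phi * z[t - 1] for t in range(1, n)]

    return ols([star(c) for c in cols], star(y))


# Noise-free synthetic series with known parameters (hypothetical values).
y = [2.0 + 0.03 * (t / 12.0) + (0.1 if t >= 85 else 0.0) for t in range(1, 145)]
coef, stderr, sigma = fit_trend(y, phi=0.3, t0=85)
print(coef)
```

Calling `fit_trend` without `t0` drops the level shift, which corresponds to a model without a change of instrument.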

Comparing the trend results derived here with those shown in Mieruch et al. (2008) results in small differences, indicating that the expansion of the data to 2007 influences the trends very slightly.

The regression parameters $\hat{\mu}_1$, $\hat{\omega}_1$, and $\hat{\delta}$ have been estimated by a least squares approach. Furthermore, the standard errors of the regression parameters $\sigma_{\hat{\mu}_1}$, $\sigma_{\hat{\omega}_1}$, and $\sigma_{\hat{\delta}}$ and the standard deviation of the noise σ1 have been determined within the regression procedure. Note that the standard error refers to the standard deviation divided by the square root of the sample size. Since we are interested in trends, only the error of the trend and the standard deviation of the noise are needed for the hypothesis testing. The trend model for the radiosonde measurements $Y^*_{2t}$, where autocorrelations have also been considered, is
e6
where no level shift is used. In the same way as above, the regression parameters (the offset μ2 and the trend ω2), their standard errors, and the standard deviation of the noise σ2 have been estimated. For the approximation of the Bayesian method, shown in section 4e, the standard deviation of the noise from the pooled data with a single trend is needed, which can be obtained by applying the least squares regression to the pooled data:
e7
with
e8
To pool the data individual offsets are used in the regression model, being implemented as μp1 and μp2 in Eq. (7).
Solving the least squares regression, the standard deviation of the noise
e9
and
e10
under a single trend is determined. The li are the respective lengths of the two time series, reduced by the number of fitted parameters.
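For illustration, a least squares fit of an offset and a linear trend, with the standard deviation of the noise and the standard error of the trend, can be sketched in a few lines of Python. The sketch covers only the two-parameter case without level shift and assumes that the data have already been transformed to remove the autocorrelation; the function and key names are hypothetical.

```python
import math

def fit_trend(t, y):
    """Least squares fit of y = mu + omega * t; returns estimates and errors."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    omega = sxy / sxx                      # trend (slope)
    mu = y_mean - omega * t_mean           # offset (intercept)
    ssr = sum((yi - mu - omega * ti) ** 2 for ti, yi in zip(t, y))
    dof = n - 2                            # length reduced by two fitted parameters
    sigma = math.sqrt(ssr / dof)           # standard deviation of the noise
    omega_err = sigma / math.sqrt(sxx)     # standard error of the trend
    return {"mu": mu, "omega": omega, "sigma": sigma,
            "omega_err": omega_err, "dof": dof}

fit = fit_trend([0, 1, 2, 3], [1.0, 1.5, 2.5, 3.0])
print(fit)
```

Only `omega`, `omega_err`, `sigma`, and `dof` are needed for the trend comparisons that follow.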

b. Welch test applied to trends

The Welch test (Welch 1947) is the most commonly used approach to solve the Behrens–Fisher problem, that is, the estimation of the probability of equal means with different unknown standard deviations from time series. It represents an unpaired t test; thus we assume the independence of satellite and radiosonde measurements. This is a reasonable assumption given the differences between the datasets, which have been introduced in section 2. In the following, the Welch test is applied to trends from time series. The null hypothesis H0: d = ω1 − ω2 = 0 postulates that the difference of the two trends is equal to zero, whereas the alternative hypothesis is H1: d = ω1 − ω2 ≠ 0. The standard error of the difference d is obtained as
$$\sigma_d = \sqrt{\sigma_{\omega_1}^2 + \sigma_{\omega_2}^2} \qquad (11)$$
where σωi are the respective standard errors of the trends with i = 1, 2.
The t statistic is then given by
e12
Accordingly the t distribution with
e13
degrees of freedom [Eq. (13) is called the Welch–Satterthwaite equation (Satterthwaite 1946)] has to be integrated from t0 to ∞. The result has to be multiplied by a factor of 2 because no prior information on the sign of d exists, which requires a two-tailed test. This yields the exceedance probability P(t > t0|H0), where t0 is the result of Eq. (12) calculated from the data. The integrals of the t distribution are available as built-in functions in several high-level programming languages such as Octave (http://www.gnu.org/software/octave/).
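The procedure can be sketched in plain Python as follows. The sketch assumes the usual forms of the statistic, t0 = (ω1 − ω2)/σd, and of the Welch–Satterthwaite equation, with the degrees of freedom of each regression passed in by the caller; these forms, the function names, and the Simpson integration (used here instead of a library routine) are assumptions of this illustration rather than a transcription of Eqs. (12) and (13).

```python
import math

def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))

def two_tailed_p(t0, nu, steps=2000):
    """P(|t| > |t0|) via Simpson integration of the density over [0, |t0|]."""
    t0 = abs(t0)
    if t0 == 0.0:
        return 1.0
    h = t0 / steps                      # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(t0, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)

def welch_trend_test(omega1, err1, dof1, omega2, err2, dof2):
    """Return (t0, nu, p) for the difference of two trends."""
    sigma_d = math.sqrt(err1 ** 2 + err2 ** 2)
    t0 = (omega1 - omega2) / sigma_d
    nu = (err1 ** 2 + err2 ** 2) ** 2 / (err1 ** 4 / dof1 + err2 ** 4 / dof2)
    return t0, nu, two_tailed_p(t0, nu)

t0, nu, p = welch_trend_test(0.02, 0.01, 100, 0.01, 0.01, 100)
print(round(t0, 3), round(nu, 1), round(p, 3))
```

For a normalized trend difference of 0.14 and about 200 degrees of freedom, `two_tailed_p` returns a value close to 0.89, in line with the Nottingham example discussed in section 5.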

c. Bayesian model intercomparison

In the following, a Bayesian method to compare trends in time series is presented. The Bayesian model selection for the difference of trends is based on the works of Bretthorst (1993) and Sivia and Skilling (2006) who estimate the difference of means and standard deviations between two sets of data. For this study, the methods have been extended to compare trends.

We set up two hypotheses:

  • A: Both sets have a common (unknown) trend ω;
  • B: The two datasets have individual (unknown) trends ω1 and ω2.

Note that the magnitudes of the trends do not matter. Hypothesis A corresponds to Eq. (7), while hypothesis B corresponds to Eqs. (5) and (6).

The posterior probability of the hypothesis A, given the respective data using the Bayes theorem, is estimated:
$$P(A \mid D_1, D_2, I) = \frac{P(D_1, D_2 \mid A, I)\, P(A \mid I)}{P(D_1, D_2 \mid I)} \qquad (14)$$
where D1 and D2 represent the two datasets, and I describes the relevant background information.
Since the absolute magnitudes of the parameters p1 = (μp1, μp2, ωp, δp, σp1, σp2) from Eq. (7) are irrelevant, we can use the marginalization rule (cf. Sivia and Skilling 2006 and Bretthorst 1993) and integrate
e15
Assuming logical independence of the prior probabilities of the hypothesis A and the parameters p1:
e16
e17
e18
In the same way as in Eq. (15) the posterior for hypothesis B is derived:
e19
with p2 = (μ1, μ2, ω1, ω2, δ, σ1, σ2) from Eqs. (5) and (6) and
e20
e21
e22
The denominator P(D1, D2|I) is the evidence (cf. section 3), in this case:
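$$P(D_1, D_2 \mid I) = P(D_1, D_2 \mid A, I)\, P(A \mid I) + P(D_1, D_2 \mid B, I)\, P(B \mid I) \qquad (23)$$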
Here P(A|I) is the prior probability for hypothesis A. As there is no reason to prefer either this hypothesis or the alternative P(B|I), we assign both with the probability 0.5; thus they cancel out in the ratios given in Eqs. (15) and (19).
The prior probabilities P(p1|I) and P(p2|I) in (15) and (19) do not have to be integrated because they are independent from the parameters themselves and are realized by choosing them as bounded priors in the form of fully normalized uniform distributions:
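$$P(p_i \mid I) = \begin{cases} \dfrac{1}{p_{i,\max} - p_{i,\min}}, & p_{i,\min} \le p_i \le p_{i,\max} \\ 0, & \text{otherwise} \end{cases} \qquad (24)$$
That is, it is assumed that all pi in the interval [pi min, pi max] have the same probability. All prior probabilities, except the trend priors, occur in the numerator and denominator of Eqs. (15) and (19), respectively; thus they cancel out. The priors of the pooled trend and the separate trends are chosen as P(ω|I) := P(ωp|I) = P(ω1|I) = P(ω2|I) with
$$P(\omega \mid I) = \frac{1}{\omega_{\max} - \omega_{\min}} = \frac{1}{\Delta\omega} \qquad (25)$$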

This prior information provides the probabilistic analysis and interpretation of the results. If the range of possible trends is increased, larger differences of trends are probable and vice versa. A sensitivity analysis on the trend priors is given in section 4d. Fortunately, the trend study of Mieruch et al. (2008) provides beneficial information on the range of the trends. The boundaries for the three trend priors in Eq. (25) are chosen as ωmin = −0.1 and ωmax = +0.1 g cm−2 yr−1. This trend range comprises more than 99.9% of all water vapor trends for the time span 1996 to 2006 and is a lower bound. Any decrease of the trend range would result in a truncation of the probability space. Finally, another trend prior cancels out in each of the ratios of Eqs. (15) and (19).

The only remaining quantities are the two likelihood functions, where the two independent datasets D1 and D2 from radiosonde and satellite measurements are assumed to be independent on a noise basis as well:
e26
and
e27
A Gaussian likelihood is assumed such that the residuals ε1t, ε2t, and εpt are normally distributed. The D1 comprise l1 independent measurements {D1t}, and the D2 comprise l2 independent measurements {D2t}, leading to
e28
and
e29
where the asterisks have been dropped for convenience, but the transformed data remain addressed (cf. section 4a).
Owing to the equality of the priors and the normalization, the analysis simplifies to
e30
and
e31
The final posterior probabilities (30) and (31) constitute highly complex functions comprising products of more than 200 Gaussians, which have to be integrated over six and seven dimensions, respectively. Such multidimensional, complex probability density distributions have extremely small peaks and are exceedingly steep, comparable to “needles in a haystack,” as stated by Liu (2003). This means that standard quadrature, and even standard Monte Carlo, integration algorithms are not sufficient to solve these integrals. Therefore a Markov Chain Monte Carlo (MCMC) method is used for integration [comprehensive information can be found, e.g., in Gilks et al. (1995) and Robert and Casella (2005)], where the Differential Evolution Markov Chain (DEMC) algorithm, explained by ter Braak (2006), has been implemented. Two factors essentially determine the precision of the method: the burn-in phase (bip), that is, the time the algorithm needs to converge to the target distribution, and the number of samples (nos). These parameters are adjusted to achieve a precision of ~0.01 with bip = 10⁵ and nos = 10⁵.
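To give an impression of how a DEMC sampler works, the following Python sketch implements the basic proposal of ter Braak (2006), in which each chain is moved by a scaled difference of two other randomly chosen chains and the move is accepted with the Metropolis ratio. It is an illustration only: the target is a toy two-dimensional Gaussian rather than the posteriors (30) and (31), the scaling γ = 2.38/√(2d) is the commonly quoted default, and the number of chains, starting range, and run lengths are arbitrary assumptions that are much smaller than the settings used in this study. The sketch also does not show how the integrals are evaluated from the samples.

```python
import math
import random

def demc(log_target, dim, n_chains=8, burn_in=2000, n_samples=2000, seed=1):
    """Differential Evolution Markov Chain sampler (illustrative sketch)."""
    rng = random.Random(seed)
    gamma = 2.38 / math.sqrt(2 * dim)          # jump scaling (assumed default)
    chains = [[rng.uniform(-5.0, 5.0) for _ in range(dim)]
              for _ in range(n_chains)]
    logp = [log_target(x) for x in chains]
    samples = []
    for step in range(burn_in + n_samples):
        for i in range(n_chains):
            others = [j for j in range(n_chains) if j != i]
            r1, r2 = rng.sample(others, 2)      # two distinct other chains
            proposal = [chains[i][k]
                        + gamma * (chains[r1][k] - chains[r2][k])
                        + rng.gauss(0.0, 1e-4)  # small random perturbation
                        for k in range(dim)]
            logp_new = log_target(proposal)
            # Metropolis acceptance with probability min(1, ratio)
            if rng.random() < math.exp(min(0.0, logp_new - logp[i])):
                chains[i], logp[i] = proposal, logp_new
        if step >= burn_in:                     # keep samples after burn-in only
            samples.extend(list(x) for x in chains)
    return samples

def toy_log_target(x):
    """Log density (up to a constant) of a Gaussian: mean (1, -2), sd (0.5, 2)."""
    return -0.5 * ((x[0] - 1.0) / 0.5) ** 2 - 0.5 * ((x[1] + 2.0) / 2.0) ** 2

samples = demc(toy_log_target, dim=2)
mean0 = sum(s[0] for s in samples) / len(samples)
mean1 = sum(s[1] for s in samples) / len(samples)
print(round(mean0, 2), round(mean1, 2))
```

The sample means should lie close to the means of the toy target, which is a simple check that the chains have converged after the burn-in phase.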

d. Sensitivity analysis

An important step in Bayesian analysis is the choice of prior information. As shown above, we have chosen the prior for the trend parameter as a fully normalized uniform distribution in the range from ωmin = −0.1 to ωmax = +0.1 g cm−2 yr−1; hence the prior range is Δω = 0.2 and P(ω|I) = 1/Δω = 5, which acts as a penalty on hypothesis B. This circumstance is known as Ockham’s Razor, a principle that recommends the selection of an accurate theory or model having the fewest assumptions and postulates when multiple competing theories are equal in describing respective phenomena. Ockham’s Razor is naturally implemented in the Bayesian concept in such a way that a theory is penalized for every additional parameter automatically.

We can qualitatively derive the Ockham factor, which is also shown in Sivia and Skilling (2006) and Dose and Menzel (2004). If model B is the more complex hypothesis and model A is the simpler one, there is one more dimension to integrate over, denoted as ω2, in the case of model B. This contribution to the integral is proportional to the width of the probability density function P(B|D1, D2, I) in this direction (denoted as δω2). With P(ω2|I) = 1/Δω2, we see that the Ockham factor is ≈ δω2/Δω2. This ratio is typically smaller than unity and thus penalizes model B for its additional parameter.

On the one hand, prior information is a great advantage of Bayesian methods; however, on the other hand, the prior information represents a certain degree of subjectivity influencing the results. A sensitivity analysis with respect to the trend prior has been undertaken. Figure 1 depicts the results, where we have chosen an exemplary situation in which the probability for hypothesis A yields P(A|D1, D2, I) = 0.7 and the probability for hypothesis B yields P(B|D1, D2, I) = 0.3 using our chosen trend prior. These results are depicted in Fig. 1 as black and gray points. The x axis in Fig. 1 represents a change of the prior probability P(ω|I) = 5 in percent. For instance, a change of 10% corresponds to an increase of the prior to P(ω|I) = 5.5 and accordingly to a decrease of the trend prior range of ωmin = −0.09 and ωmax = +0.09 g cm−2 yr−1. As can be seen from Fig. 1, P(A|D1, D2, I) decreases with increasing P(ω|I) and decreasing Δω, and vice versa for P(B|D1, D2, I). As mentioned in section 4c, the choice of our trend range constitutes a lower bound; thus in principle only an increase of the trend range would make sense, avoiding any truncation of the probability space. This would decrease P(ω|I) and, as can be seen in Fig. 1, increase P(A|D1, D2, I)—namely, the probability of a common trend. However, in a meaningful range of deviations from our prior, within ±20%, the results are quite insensitive. For unrealistically large deviations of the prior from our choice, large changes of the results are observed. This is shown in the small embedded figure of Fig. 1. In conclusion, a reasonable prior has been selected, and our results are insensitive to changes of this prior—at least in a range of ±20%.
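The dependence on the trend prior can be made concrete with a short calculation. The sketch below assumes that the prior enters only as the factor P(ω|I) multiplying the likelihood of hypothesis B, so that the ratio P(A|D1, D2, I)/P(B|D1, D2, I) scales with 1/P(ω|I), and that the likelihood integrals themselves do not change when the prior range is varied. The function name is hypothetical.

```python
def posterior_common_trend(p_a_ref, prior_change_percent):
    """P(A|D1,D2,I) after changing P(omega|I) by the given percentage.

    Assumes P(A)/P(B) is inversely proportional to P(omega|I).
    """
    ratio_ref = p_a_ref / (1.0 - p_a_ref)          # P(A)/P(B) at the reference prior
    ratio = ratio_ref / (1.0 + prior_change_percent / 100.0)
    return ratio / (1.0 + ratio)

for change in (-20.0, 0.0, 10.0, 20.0):
    print(change, round(posterior_common_trend(0.7, change), 3))
```

Under these assumptions an increase of the prior by 10% lowers a reference value of P(A|D1, D2, I) = 0.7 to about 0.68, which agrees with the weak sensitivity within ±20% described above.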

Fig. 1.

Sensitivity analysis of a change in the trend prior. As an example, we have chosen P(A|D1, D2, I) = 0.7 and P(B|D1, D2, I) = 0.3, depicted as black and gray points. The x axis represents a change of the prior probability P(ω|I) = 5 in %. For instance, a change of 10% corresponds to an increase of the prior to P(ω|I) = 5.5. The embedded small figure shows the results for larger changes of the prior.

Citation: Journal of Climate 24, 12; 10.1175/2011JCLI3669.1

e. Analytical approximation

DEMC is a sophisticated and powerful algorithm that goes beyond what is implemented in standard computational programming languages or packages. The disadvantage of DEMC is the need for large computational power. Sivia and Skilling (2006) have derived an approximation for a Bayesian method, which compares means and standard deviations of data. This approximation is adapted to the method for trend comparison shown in the following.

Using a quadratic Taylor series expansion of the logarithmic likelihood function in (26) we find
e32
where LA = loge[P(D1|A, μp1, ωp, δp, σp1, I)P(D2|A, μp2, ωp, σp2, I)] with a maximum at the least squares estimates of the parameters. These estimates are determined by setting the first partial derivatives to zero, ∂LA/∂p1 = 0, which amounts to solving a set of linear equations in a least squares sense. Thus, we can use the parameters estimated in section 4a.

The second term in Eq. (32) contains the vector of parameter deviations from the maximum, which is shown explicitly in the appendix. The entries of the 6 × 6 matrix in Eq. (32) are derived from the second partial derivatives of LA evaluated at the maximum, which is also shown in the appendix.

The approximated likelihood of hypothesis A, obtained by exponentiating LA, is
e33
e34
The first exponential in Eq. (34) is a constant and the second is a six-dimensional Gaussian. Sivia and Skilling (2006) integrate an M-dimensional Gaussian by
e35
thus Eq. (34) becomes
e36
This analysis yields
e37
The alternative hypothesis B, stating that the time series have individual trends ω1 and ω2, is now derived. The procedure is identical to the previous derivations, using the quadratic Taylor series expansion of the logarithmic likelihood function Eq. (27)
e38
The corresponding quantities, including the matrix KB, are given in the appendix. Accordingly, we find
e39
The posterior probabilities have to be normalized. Because of
e40
with the evidence
e41
this yields
e42
and
e43
After normalization all terms occurring in both (42) and (43) cancel out in the ratios, thus we have
e44
and
e45

The posterior probability of hypothesis B is also proportional to the prior probability of the trends, which is chosen in the same way as in Eq. (25). Eqs. (42) and (43) are analytical functions, which can quite easily be computed in contrast to Eqs. (30) and (31) that can only be solved numerically.
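The key step of the approximation is the closed-form integral of a multidimensional Gaussian. As a small numerical check, the Python sketch below compares the standard result for M = 2, namely (2π)^(M/2)/√det K for a positive definite matrix K, with a brute-force midpoint sum on a grid. The matrix entries are arbitrary illustrative values, not those of the matrices in the appendix, and the sign convention of K is an assumption of this sketch.

```python
import math

def gaussian_integral_2d(k11, k12, k22):
    """Closed form of the integral of exp(-0.5 x^T K x) over the plane (M = 2)."""
    det = k11 * k22 - k12 * k12
    return (2.0 * math.pi) / math.sqrt(det)

def grid_integral_2d(k11, k12, k22, half_width=8.0, n=400):
    """Midpoint-rule approximation of the same integral on a square grid."""
    h = 2.0 * half_width / n
    total = 0.0
    for i in range(n):
        x = -half_width + (i + 0.5) * h
        for j in range(n):
            y = -half_width + (j + 0.5) * h
            total += math.exp(-0.5 * (k11 * x * x + 2.0 * k12 * x * y + k22 * y * y))
    return total * h * h

closed = gaussian_integral_2d(2.0, 0.5, 1.0)
numeric = grid_integral_2d(2.0, 0.5, 1.0)
print(round(closed, 4), round(numeric, 4))
```

Because the logarithmic likelihood is approximated by a quadratic form, the six- and seven-dimensional integrals reduce to determinants of this kind, which explains why the approximation is so much cheaper than the MCMC integration.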

5. Results

The comparison methods for trends in time series described above, that is, the Welch test and the Bayesian model selection, have been applied to measured trends from satellite and radiosonde monthly mean water vapor data. The trends have been calculated using the methods described in section 4a. For the comparison a quality criterion is required, that is, both time series have to contain at least two-thirds of the monthly mean measurements over the time span from January 1996 to December 2007, that is, at least 96 data points from the maximal 144. This constraint assures that the trends are representative for the period investigated and less susceptible to possible outliers.

Figure 2a depicts the results of the Welch test. The probabilities P(t > t0|H0), with null hypothesis H0: d = ω1 − ω2 = 0 and the difference of the trends d, for the 187 trend pairs are plotted versus d. High probabilities are observed for small trend differences, while lower probabilities are found for large trend differences, as expected. Figure 2b shows P(t > t0|H0) plotted versus the trend differences normalized to the error of the difference. From the definition of the Welch test it is clear that P(t > t0|H0) is totally determined by (ω1 − ω2)/σd, where ω1 is the GOME–SCIAMACHY trend and ω2 is the radiosonde trend. In the sense of the frequentist interpretation, the null hypothesis for a single test would be rejected if P(t > t0|H0) < α = 0.05, which would apply in 20 of 187 cases (~10%). Hence, in about 90% of the tests we cannot reject the null hypothesis. However, we are dealing with 187 independent tests, so a multiple testing correction may be more appropriate. A common approach used in multiple comparisons is the Bonferroni correction (see, e.g., Perneger 1998), which decreases the significance level α. The reason for this correction is that the number of significant test results expected to occur by chance when performing n tests is ≤ nα. The Bonferroni correction would yield a significance level of α/n = 0.05/187 ≈ 2.7 × 10−4. Using the Bonferroni correction we would test the general null hypothesis that all null hypotheses are true simultaneously, which has to be rejected if one or more p values are smaller than the corrected significance level α/n. Since no P(t > t0|H0) from the 187 tests is smaller than α/n, the general null hypothesis cannot be rejected. The Bonferroni correction is quite conservative, which has been seriously criticized, for example, by Perneger (1998). A discussion on the usefulness of the Bonferroni correction is considered valuable, but the general conclusions of this investigation are similar with and without the Bonferroni correction. Without the multiple testing correction the null hypothesis is not rejected in about 90% of all cases. Using the Bonferroni correction the general null hypothesis for the complete dataset is also not rejected.
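The arithmetic of the Bonferroni correction is easily reproduced. The sketch below uses α = 0.05 and n = 187 from the text and, purely as an illustration, the four Welch probabilities quoted in this section for the example stations; the function names are hypothetical.

```python
def bonferroni_level(alpha, n_tests):
    """Corrected significance level alpha / n."""
    return alpha / n_tests

def general_null_rejected(p_values, alpha, n_tests):
    """True if at least one p value falls below the corrected level."""
    level = bonferroni_level(alpha, n_tests)
    return any(p < level for p in p_values)

alpha, n_tests = 0.05, 187
example_p = [0.89, 0.002, 0.19, 0.01]   # Nottingham, Albany, Meiningen, Minqin

corrected = bonferroni_level(alpha, n_tests)
single_rejections = sum(p < alpha for p in example_p)
print(corrected, single_rejections, general_null_rejected(example_p, alpha, n_tests))
```

Two of the four example tests are rejected individually at α = 0.05, but none falls below the corrected level of about 2.7 × 10−4.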

Fig. 2.

The 187 test results between pairs of water vapor trends from GOME–SCIAMACHY and radiosonde data are plotted against the respective trend differences and against the trend differences divided by the error of the differences: (a),(b) using the Welch test, (c),(d) applying the exact Bayesian method, and (e),(f) performing the approximated Bayesian approach.


The 187 probabilities of a common trend P(A|D1, D2, I) from the exact Bayesian model selection (for each trend pair) are plotted versus the difference of the trends in Fig. 2c and versus the trend difference normalized to the error of the difference (ω1ω2)/σd in Fig. 2d. Additionally, the results from the approximation of the Bayesian method are shown in Figs. 2e and 2f. High probabilities for a common trend are found for small trend differences, whereas the probability is low for large trend deviations as in the case of the Welch test. The approximation slightly overestimates the exact probabilities and the mean relative difference is O(10%), but the general results from the exact method and the approximation are very similar, thus the use of the approximation is recommended for monthly mean water-vapor trend comparison if sophisticated algorithms like DEMC are not available or large datasets have to be analyzed in a short period of time. To judge the probabilities of different hypotheses, Jeffreys (1939) introduced the scale presented in Table 1. Here P(A) and P(B) denote the respective probabilities of the hypotheses, where a value of log10[P(A)/P(B)] = 1 means that hypothesis A is 10 times more probable than hypothesis B.

Table 1.

Judgment of evidence against hypothesis B according to Jeffreys (1939).


Regarding the Jeffreys scale (Table 1), the evidence against hypothesis B is substantial if the logarithm of the so-called Bayes factor, which is P(D1, D2|A, I)/[P(D1, D2|B, I) · P(ω|I)] here, is larger than 0.5 and smaller than 1, which corresponds to 0.76 < P(A|D1, D2, I) < 0.91; correspondingly, the evidence against hypothesis A is substantial if 0.09 < P(A|D1, D2, I) < 0.24. Hypothesis A is preferred substantially in 49 cases and hypothesis B in 9 cases, using the exact method. When the approximation is used, A is preferred in 114 cases and B in 5 cases. The evidence against B is strong to decisive if P(A|D1, D2, I) > 0.91, which is true in zero cases for the exact solution and true in 10 cases for the approximation. Strong to decisive evidence is drawn against A if P(A|D1, D2, I) < 0.09, which is observed 3 times in the exact case and 2 times in the case of the approximation. The rigorous application of the Bayesian model selection would prefer hypothesis A if P(A|D1, D2, I) > 0.5, which is true in 153 cases of 187, that is, 82% for the exact method and in 165 cases for the approximation. Interpreting the observed patterns in Figs. 2c–f, distinct clusters of data points are found between probabilities of 0.7 and 0.9. These are mostly classified as substantially supporting hypothesis A of a common trend. Since this is true for the exact Bayesian method and the approximation, generally similar conclusions are drawn. Nevertheless, since large differences between the exact and the approximation method have been observed for single time series, significant conclusions can only be drawn at the ensemble level. Furthermore, as for the Welch test, the argument of multiple testing also concerns the Bayesian model selection. Westfall et al. (1997) (and citations therein) suggest a perspective of a Bayesian Bonferroni correction, which acts on the prior information. Accordingly, our trend prior P(ω|I) would be lowered to the value that corresponds to an enlarged trend range of ωmin = −0.5 and ωmax = +0.5 g cm−2 yr−1, increasing P(A|D1, D2, I) decisively and hence supporting hypothesis A in general, which is shown in Fig. 3.
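The correspondence between the logarithm of the Bayes factor and the posterior probability used above follows from P(A|D1, D2, I) = BF/(1 + BF) when both hypotheses have equal prior probability. A minimal Python sketch of this mapping and of the resulting classification is given below; the category labels paraphrase the wording of this section, and the label for the intermediate range is an assumption of the sketch.

```python
def posterior_from_log10_bayes_factor(log10_bf):
    """P(A|D1,D2,I) for equal hypothesis priors, given log10 of the Bayes factor."""
    bf = 10.0 ** log10_bf
    return bf / (1.0 + bf)

def classify(p_a):
    """Classify a posterior probability of hypothesis A on the Jeffreys scale."""
    lower = posterior_from_log10_bayes_factor(0.5)   # about 0.76
    upper = posterior_from_log10_bayes_factor(1.0)   # about 0.91
    if p_a > upper:
        return "strong to decisive evidence against B"
    if p_a > lower:
        return "substantial evidence against B"
    if p_a < 1.0 - upper:
        return "strong to decisive evidence against A"
    if p_a < 1.0 - lower:
        return "substantial evidence against A"
    return "no substantial preference"

for p_a in (0.82, 0.02, 0.43):   # exact Bayesian results quoted for three stations
    print(p_a, classify(p_a))
```

The three example values are the exact Bayesian probabilities reported in this section for Nottingham, Albany Airport, and Minqin.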

Fig. 3.

Exact Bayesian method applied to trends using the Bonferroni correction.


In the following, examples of GOME–SCIAMACHY and radiosonde water vapor time series are analyzed. Figure 4a shows the deseasonalized GOME–SCIAMACHY and radiosonde monthly mean water vapor columns together with their linear trends from Nottingham, England. For visual presentation the GOME–SCIAMACHY level shift has been removed. The human visual system is quite sophisticated in the identification of diverse patterns and also in comparing trends. From Fig. 4a it is clear that the trend difference is small and, indeed, the trends are nearly equal with ω1 − ω2 = 0.001 g cm−2 yr−1 and (ω1 − ω2)/σd = 0.14 (cf. Fig. 2). The Welch test gives a probability of P(t > t0|H0) = 0.89, so the null hypothesis cannot be rejected.

Fig. 4.

Examples of GOME–SCIAMACHY and radiosonde water vapor time series at (a),(b) Nottingham, England; (c),(d) Albany Airport, Australia; (e),(f) Meiningen, Germany; and (g),(h) Minqin, China.


The Bayesian hypothesis B is visualized schematically in Fig. 4a by modeling the data with two trends. Hypothesis A is illustratively shown in Fig. 4b by pooling the data and applying a single trend. For visual presentation the offsets of GOME–SCIAMACHY and radiosonde data have been removed. From the Bayesian point of view hypothesis A is substantially preferred with P(A|D1, D2, I) = 0.82. The approximation method gives Papprox(A|D1, D2, I) = 0.90. Hence, for small trend differences, both the frequentist and Bayesian concept reveal quite large probabilities for the respective tests.

Low probabilities are found, for example, at Albany Airport in Australia. The time series are shown in Figs. 4c and 4d. Visual inspection clearly classifies the trends as different. The trend difference is ω1 − ω2 = 0.04 g cm−2 yr−1 and after normalization it is (ω1 − ω2)/σd = 3.2. The Welch test gives a probability of P(t > t0|H0) = 0.002 (rejection of the null hypothesis). The exact Bayesian method finds P(A|D1, D2, I) = 0.02 and the approximation yields Papprox(A|D1, D2, I) = 0.04 (preferring B). Thus, low probabilities are found for large trend differences by both statistical methods. Different probabilities for the frequentist test and the Bayesian method, as can be seen from Fig. 2, are found in the range between small and large trend differences. As an example, a pair of water vapor time series from Meiningen, Germany, is chosen with a trend difference of ω1 − ω2 = 0.014 g cm−2 yr−1 and a normalized trend difference of (ω1 − ω2)/σd = 1.3. The probability derived from the Welch test is P(t > t0|H0) = 0.19, which is small, but not small enough to reject the null hypothesis. The exact Bayesian method gives P(A|D1, D2, I) = 0.89, and the approximation is Papprox(A|D1, D2, I) = 0.75. Here it again has to be mentioned that both methods (Welch test and Bayes) yield different probabilities and are both correct under the respective frameworks of the frequentist philosophy and the Bayesian concept. The exact probabilities of the frequentist and Bayesian method differ; however, the conclusions are nevertheless similar.

The GOME–SCIAMACHY trends are plotted in Fig. 5, where the 187 radiosonde trends have been embedded into the figure indicated by black, gray, and white bordered circles. The circles of radiosonde trends are filled with the color for the magnitude of the respective trends according to the color bar used for the GOME–SCIAMACHY data as well. The borders of the circles indicate the Bayesian posterior probabilities P(A|D1, D2, I) at a specific geolocation. A black border indicates a probability ≤0.5, which means that hypothesis B is preferred. Seven of the total 34 black bordered circles are covered by the other circles and cannot be seen in the figure. A gray-bordered circle represents probabilities >0.5 and ≤0.76 where hypothesis A is favored (104 circles), and a white border indicates that hypothesis A is substantially preferred with Bayesian probabilities >0.76 (49 circles).

Fig. 5.

Global water vapor total column trends from GOME–SCIAMACHY are coded in gray scales. The 187 radiosonde water vapor trends are embedded into the figure as circles, where the same color bar is used for filling. White-bordered circles depict Bayesian probabilities of agreement between satellite and radiosonde trends >0.76, gray borders indicate probabilities >0.5 and ≤0.76, whereas black-bordered circles show probabilities of agreement ≤0.5. Note that the Himalayas and Andes regions are excluded from the analysis because of their high elevation, for which water vapor cannot be retrieved by the AMC–DOAS method.


One reason for discrepancies between satellite and radiosonde trends is data gaps in the radiosonde data. This has been observed, for example, at Minqin, China, shown in Figs. 4g and 4h. Radiosonde data are often missing in summer, especially in 2006 and 2007 when high water vapor was observed by SCIAMACHY. The Welch test gives P(t > t0|H0) = 0.01, which implies that the null hypothesis should be rejected. The exact Bayesian method gives P(A|D1, D2, I) = 0.43, hence preferring hypothesis B. Again, the individual probabilities of both methods are different, but the conclusions are similar.

As mentioned in section 2b one possible important reason for discrepancies between observed trends from satellite and radiosonde water vapor data is the different resolution of the two instruments. Radiosondes can capture local events, whereas the satellite measurement is an average over a large area. This will be shown in the following using an example from the west coast of Saudi Arabia. A zoom into this region is depicted in Fig. 6. Note that in Fig. 6 the color scale used for the GOME–SCIAMACHY and radiosonde trends is different from the one used in Fig. 5. Here a positive water vapor trend is observed with a radiosonde measurement located exactly at the city of Jeddah. The satellite trends in the near vicinity of the town are enhanced as well but are not as strong as the very localized radiosonde trend. However, it seems possible that changes in the total water vapor column, observed via satellite, can be attributed to human activities with a high probability, identifying Jeddah as a source of water vapor. Further, increasing water vapor is observed exactly at the city of Asmara in Eritrea as well (cf. Fig. 6), which shows that urban areas, and hence anthropogenic influence on water vapor changes, can be detected using satellite observations. The satellite measurements of the positive trends over Jeddah are rather smeared out over a larger region. This is a likely explanation for the relatively low probability for hypothesis A between the observed trends, which is indicated by the gray-bordered circle.

Fig. 6.

GOME–SCIAMACHY water vapor trends in the range of ±0.03 g cm−2 yr−1 at the Arabian Peninsula (10°–30°N, 35°–60°E) with embedded radiosonde trends from 1996 to 2007. A different color scale than in Fig. 5 is used in this blowup.


6. Conclusions

In this manuscript, we have solved the Behrens–Fisher problem (same means, unequal unknown standard deviations) for the analysis of trends from temporal series for water vapor, derived by different methods using two schools of statistics, the frequentist (standard) approach and the Bayesian concepts. We have applied both methods to global water vapor datasets observed by satellite (GOME–SCIAMACHY) and radiosonde instruments.

To utilize and assess the value of frequentist statistics, the widely used Welch test has been applied to address the Behrens–Fisher problem. The individual null hypothesis H0: d = ω1 − ω2 = 0, stating that the difference of the trends is equal to zero, could only be rejected in 10% of the 187 trend comparisons. Additionally we have applied multiple testing by using the Bonferroni correction to test the general null hypothesis stating that all single null hypotheses are true, which could not be rejected. These results increase our confidence, in a climatological sense, that satellites and radiosondes measure the true changes of total column water vapor, assuming no long-term calibration issues for the radiosondes, negligible changes in cloud cover, and an adequate combination of the GOME and SCIAMACHY data.

Concerning the Bayesian model selection we estimated the probabilities for the hypotheses

  • A: Both sets have a common (unknown) trend ω;
  • B: The two datasets have individual (unknown) trends ω1 and ω2.
This is achieved using prior information and the complete data under consideration. Today, the computational power of standard computers is sufficient to operate, for example, the genetic algorithm DEMC to estimate the Bayesian posteriors. In the Bayesian framework, a cluster of about 50 trend differences between probabilities of 0.7 and 0.9 is obtained for hypothesis A (cf. Fig. 2). This provides evidence supporting hypothesis A, namely, that the observed trends are common for the two datasets. Overall, the analysis of the datasets for the two hypotheses of having either common or individual trends shows that the common trend is preferred in 153 cases of 187 (82%) and that generally hypothesis A has a larger probability than hypothesis B. We attribute the cases where the results are poorer to either an inadequate calibration of the long-term datasets from the radiosondes or local-scale changes, which are smoothed in the remote sounding dataset with its lower spatial resolution. In addition, the use of a Bonferroni-like correction in a Bayesian sense results in a clear preference of hypothesis A, thus hinting at a common trend in satellite and radiosonde measurements.

In conclusion, both approaches, the frequentist Welch test and the Bayesian model selection, yield similar results when used to compare water vapor trends from the two independent measurement sets on an ensemble basis, including or neglecting multiple testing. However, differences can occur when using the two tests to compare trends at a given location. The frequentist method is recommended, provided it is sufficient to work with a given null hypothesis and thereby infer the probability of a parameter and associated significance test. If this is not the case, then we recommend the Bayesian approach, where it is possible to derive probabilities of competing hypotheses. From a climatological viewpoint, evidence has been found that trends in time series from both observing systems, radiosondes and satellites, are real, significant, and not statistical artifacts. The size of the trends is not discussed further because the time series are short and therefore the trends observed do not yet identify a climate signal.

Acknowledgments

SCIAMACHY is a national contribution to the ESA ENVISAT project, funded by Germany, the Netherlands, and Belgium. SCIAMACHY data have been provided by ESA. Radiosonde data have been provided by DWD. This work has been funded in part by the University and state of Bremen, the German Aerospace Center (DLR), by the EU (6th and 7th framework ACCENT and ACCENT plus projects), and by EUMETSAT in the framework of the Climate Monitoring (CM-SAF) part of the Satellite Application Facilities Network. We thank D. S. Sivia and G. L. Bretthorst for advice concerning prior probabilities. We also thank C. ter Braak for discussion on the DEMC algorithm. We acknowledge fruitful discussions with A. C. Davison and J. A. Freund during the 11th IMSC in Edinburgh. We thank Kirsten Schnülle for proofreading the manuscript. We gratefully acknowledge the critical comments of three anonymous referees, which have improved the manuscript significantly.

APPENDIX

Analytical Approximation—The Matrices

Regarding section 4e, the quadratic Taylor series expansion of the logarithmic likelihood function Eq. (26) yields
ea1
with
ea2
and
ea3
which is
ea4
Analogously the quadratic Taylor series expansion of the logarithmic likelihood function Eq. (27) yields
ea5
with
ea6
and
ea7
which is
ea8

REFERENCES

  • Bayes, T., 1763: An essay towards solving a problem in the doctrine of chances. Philos. Trans. Roy. Soc., 53, 330418.

  • Behrens, W. V., 1929: Ein Beitrag zur Fehlerberechnung bei wenigen Beobachtungen. Landwirtschaftliche Jahrbücher, 68, 807–837.

  • Bovensmann, H., J. P. Burrows, M. Buchwitz, J. Frerick, S. Noël, V. V. Rozanov, K. V. Chance, and A. H. P. Goede, 1999: SCIAMACHY–Mission objectives and measurement modes. J. Atmos. Sci., 56, 127–150.

  • Bretthorst, G. L., 1993: On the difference in means. Physics and Probability, Cambridge University Press, 177–194.

  • Burrows, J. P., and Coauthors, 1999: The Global Ozone Monitoring Experiment (GOME): Mission concept and first scientific results. J. Atmos. Sci., 56, 151–175.

  • de Laplace, P. S., 1812: Théorie Analytique des Probabilités. Courcier Imprimeur, 506 pp.

  • Dose, V., and A. Menzel, 2004: Bayesian analysis of climate change impacts in phenology. Global Change Biol., 10, 259–272.

  • Edelson, R. A., and J. H. Krolik, 1988: The discrete correlation function: A new method for analyzing unevenly sampled variability data. Astrophys. J., 333, 646–659.

  • Elliott, W. P., and D. J. Gaffen, 1991: On the utility of radiosonde humidity archives for climate studies. Bull. Amer. Meteor. Soc., 72, 1507–1520.

  • Fisher, R. A., 1937: The comparison of samples with possibly unequal variances. Ann. Eugen., 9, 174–180.

  • Garand, L., C. Grassotti, J. Halle, and G. L. Klein, 1992: On differences in radiosonde humidity—Reporting practices and their implications for numerical weather prediction and remote sensing. Bull. Amer. Meteor. Soc., 73, 1417–1423.

  • Gilks, W. R., S. Richardson, and D. Spiegelhalter, 1995: Markov Chain Monte Carlo in Practice. Chapman & Hall/CRC, 512 pp.

  • Jaynes, E. T., and G. L. Bretthorst, 2003: Probability Theory: The Logic of Science: Principles and Elementary Applications. Vol. 1. Cambridge University Press, 758 pp.

  • Jeffreys, H., 1939: Theory of Probability. 3rd ed. Oxford University Press, 472 pp.

  • Kahn, B. H., A. Gettelman, E. J. Fetzer, A. Eldering, and C. K. Liang, 2009: Cloudy and clear-sky relative humidity in the upper troposphere observed by the A-train. J. Geophys. Res., 114, D00H02, doi:10.1029/2009JD011738.

  • Keeling, C. D., S. C. Piper, R. B. Bacastow, M. Wahlen, T. P. Whorf, M. Heimann, and H. A. Meijer, 2001: Exchanges of atmospheric CO2 and 13CO2 with the terrestrial biosphere and oceans from 1978 to 2000. Observations and carbon cycle implications. A History of Atmospheric CO2 and Its Effects on Plants, Animals, and Ecosystems: I. Global Aspects, J. R. Ehleringer et al., Eds., Springer, 83–113.

  • Lally, V. E., 1985: Upper air in situ observing systems. Handbook of Applied Meteorology, John Wiley & Sons, Inc., 352–360.

  • Lanzante, J. R., 2005: A cautionary note on the use of error bars. J. Climate, 18, 3699–3703.

  • Lee, T. C. K., F. W. Zwiers, G. C. Hegerl, X. Zhang, and M. Tsao, 2005: A Bayesian climate change detection and attribution assessment. J. Climate, 18, 2429–2440.

  • Liu, J. S., 2003: Monte Carlo Strategies in Scientific Computing. Springer, 360 pp.

  • Mieruch, S., 2010: Identification and statistical analysis of global water vapour trends based on satellite data. Ph.D. thesis.

  • Mieruch, S., S. Noël, H. Bovensmann, and J. P. Burrows, 2008: Analysis of global water vapour trends from satellite measurements in the visible spectral range. Atmos. Chem. Phys., 8, 491–504.

  • Moreno, E., F. Bertolino, and W. Racugno, 1999: Default Bayesian analysis of the Behrens-Fisher problem. J. Stat. Plann. Infer., 81, 323–333.

  • Noël, S., M. Buchwitz, and J. P. Burrows, 2004: First retrieval of global water vapour column amounts from SCIAMACHY measurements. Atmos. Chem. Phys., 4, 111–125.

  • Perneger, T. V., 1998: What’s wrong with Bonferroni adjustments. BMJ, 316, 1236–1238.

  • Robert, C. P., and G. Casella, 2005: Monte Carlo Statistical Methods. Springer, 536 pp.

  • Satterthwaite, F. E., 1946: An approximate distribution of estimates of variance components. Biom. Bull., 2, 110–114, doi:10.2307/3002019.

  • Schlittgen, R., and B. H. J. Streitberg, 1997: Zeitreihenanalyse. Oldenbourg.

  • Schulz, J., and Coauthors, 2009: Operational climate monitoring from space: The EUMETSAT Satellite Application Facility on Climate Monitoring (CM-SAF). Atmos. Chem. Phys., 9, 1687–1709.

  • Sivia, D. S., and J. Skilling, 2006: Data Analysis: A Bayesian Tutorial. Oxford University Press, 208 pp.

  • Sohn, B. J., and R. Bennartz, 2008: Contribution of water vapor to observational estimates of longwave cloud radiative forcing. J. Geophys. Res., 113, D20107, doi:10.1029/2008JD010053.

  • ter Braak, C., 2006: A Markov Chain Monte Carlo version of the genetic algorithm differential evolution: Easy Bayesian computing for real parameter spaces. Stat. Comput., 16, 239–249.

  • Vaisala, 1989: RS 80 Radiosondes. Upper-Air Systems product information. Vaisala Inc. Reference R0422-2, 16 pp.

  • Weatherhead, E. C., and Coauthors, 1998: Factors affecting the detection of trends: Statistical considerations and applications to environmental data. J. Geophys. Res., 103, 17 149–17 161.

  • Welch, B. L., 1947: The generalization of “Student’s” problem when several different population variances are involved. Biometrika, 34, 28–35.

  • Westfall, P. H., W. O. Johnson, and J. M. Utts, 1997: A Bayesian perspective on the Bonferroni adjustment. Biometrika, 84, 419–427.
