A Bayes Factor Model for Detecting Artificial Discontinuities via Pairwise Comparisons

Jun Zhang, CICS-NC, North Carolina State University, Raleigh, and Statistical and Applied Mathematical Sciences Institute, Research Triangle Park, North Carolina

Wei Zheng, Sanofi-Aventis, Boston, Massachusetts

Matthew J. Menne, NOAA/National Climatic Data Center, Asheville, North Carolina

Abstract

In this paper, the authors present a Bayes factor model for detecting undocumented artificial discontinuities in a network of temperature series. First, they generate multiple difference series for each station with the pairwise comparison approach. Next, they treat the detection problem as a Bayesian model selection problem and use Bayes factors to calculate the posterior probabilities of the discontinuities and estimate their locations in time and space. The model can be applied to large climate networks and realistic temperature series with missing data. The effectiveness of the model is illustrated with two realistic large-scale simulations and four sensitivity analyses. Results from applying the algorithm to observed monthly temperature data from the conterminous United States are also briefly discussed in the context of what is currently known about the nature of biases in the U.S. surface temperature record.

Corresponding author address: Jun Zhang, NOAA/National Climatic Data Center, 151 Patton Ave., Asheville, NC 28801. E-mail: zhang.jun1@gmail.com


1. Introduction

It is well known that temperature series may contain unknown artificial discontinuities (Peterson et al. 1998). Such discontinuities are typically caused by station moves, instrument changes, and/or microclimate changes surrounding a station. If left undetected, these artificial signals can bias attempts to estimate true climate signals (Menne et al. 2009). Consequently, many algorithms have been developed to detect the discontinuities. A list of representative publications includes Alexandersson (1986), Vincent (1998), Lund and Reeves (2002), Caussinus and Mestre (2004), Della-Marta and Wanner (2006), Lund et al. (2007), Reeves et al. (2007), Wang et al. (2007), Wang (2008a,b), Menne and Williams (2009), Hannart and Naveau (2009), Beaulieu et al. (2010), and Lu et al. (2010).

Caussinus and Mestre (2004) and Menne and Williams (2009) adopt a pairwise comparison approach that is argued to have advantages in terms of avoiding the detection of true climate signals and utilizing difference series to increase signal-to-noise ratios (SNR) and improve hit rates (HRs). In Menne and Williams (2009), a semihierarchical splitting algorithm is applied to each difference series to identify all potential discontinuities and a rule-based algorithm is used to automatically assign discontinuities to the corresponding stations. However, for a specific target discontinuity, the estimated locations based on different target–neighbor difference series may not agree with each other. Menne and Williams (2009) solve the location uncertainty issue empirically. Hannart and Naveau (2009) and Beaulieu et al. (2010) address the location uncertainty issue from a Bayesian perspective. Hannart and Naveau (2009) propose a method based on Bayesian decision theory. Their method identifies subsequences containing a unique discontinuity by minimizing average posterior cost functions recursively. Beaulieu et al. (2010) develop a framework based on Bayesian normal homogeneity test (BNHT) and apply BNHT recursively on the series to detect multiple discontinuities. However, such uncertainty can also be addressed from a Bayesian model selection perspective.

Here, we describe a Bayes factor model selection procedure for the automatic detection of temperature series changepoints using pairwise comparisons. In the procedure, the Bayes factor or the evidence of discontinuities at each time step is first computed via a sliding sample window. Then, after the Bayes factors are obtained, we identify potential discontinuities by comparing the Bayes factors with an appropriate threshold and calculate the posterior probabilities of the discontinuities for each time step. Finally, we obtain the estimated locations for the discontinuities by computing the posterior mean for each location. In section 2, we describe the details of the Bayes factor model. In section 3, we discuss how to select model parameters. Some results based on simulations and real observations are discussed in section 4. Also, sensitivity analyses with respect to different model parameters are presented in section 4. The conclusions are in section 5.
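The three-step procedure can be sketched in code. The sketch below is illustrative only: the per-window score is a simplistic stand-in for the closed-form Bayes factor developed in section 2, and all names are hypothetical.

```python
# Illustrative skeleton of the detection procedure: slide a window over a
# difference series, score each candidate month, and keep months whose
# score exceeds a threshold. The scoring function is a placeholder for
# the Bayes factor of Eq. (9); it is NOT the authors' implementation.
from statistics import mean

def two_log_bf(window_left, window_right):
    # Placeholder evidence score: grows with the shift size and sample size,
    # qualitatively like 2*log_e(BF) for a mean-shift alternative.
    gap = mean(window_right) - mean(window_left)
    return gap * gap * len(window_left)

def scan(series, half_window=30, threshold=4.0):
    """Steps 1-2: compute the score at each month and keep those above threshold."""
    flagged = []
    for t in range(half_window, len(series) - half_window):
        score = two_log_bf(series[t - half_window:t], series[t:t + half_window])
        if score > threshold:
            flagged.append((t, score))
    return flagged

series = [0.0] * 60 + [1.0] * 60   # one clean step at month 60
hits = scan(series)
print(hits[0])                      # the step is flagged; the peak score is at t = 60
```

In the actual model the score is the approximate 2 loge(BF) combined across all target–neighbor difference series, and step 3 converts the flagged window into a posterior mean location, as described in section 2e.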

2. Description of the Bayes factor model

a. Difference series

Before we describe the details of the model, we first define the difference series following Menne and Williams (2009). Suppose T(t) is the monthly temperature anomaly at a station,

$$T(t) = C_T(t) + J_T(t) + \varepsilon_T(t), \qquad (1)$$

where t is the monthly index, CT(t) is the climate signal, JT(t) is the artificial changepoint signal, and εT(t) is the noise. Here, T(t) has a series of correlated neighbors Nj(t), where j = 1, … , n. The difference series can be expressed as

$$\Delta T_j(t) = T(t) - N_j(t) = [C_T(t) - C_j(t)] + [J_T(t) - J_j(t)] + [\varepsilon_T(t) - \varepsilon_j(t)]. \qquad (2)$$

Because of the high correlation between T(t) and Nj(t), εT(t) − εj(t) typically has a smaller variance than does εT(t) or εj(t). Here, JT(t) − Jj(t) contains the changepoint signals from either T(t) or its neighbor Nj(t). Because of multidecadal variations and trends, CT and Cj are not stationary in time. Rather, it is assumed that the high spatial correlation inherent in temperature fields means that CT and Cj are approximately equal. As discussed below, a rather narrow moving time window is used to identify local discontinuities; therefore, low-frequency “creeping” inhomogeneities are not likely to be efficiently identified by the Bayes factor approach.
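A minimal sketch of Eq. (2) in Python; the series, names, and toy numbers are illustrative, not from the paper.

```python
# Sketch of Eq. (2): building a target-neighbor difference series.
# Series are plain lists of monthly anomalies; None marks missing months.

def difference_series(target, neighbor):
    """Elementwise target minus neighbor, propagating missing values."""
    return [t - n if t is not None and n is not None else None
            for t, n in zip(target, neighbor)]

# A step of +1.0 at index 3 in the target survives in the difference series,
# while the shared climate signal (here a common trend) cancels.
climate = [0.1 * i for i in range(6)]
target = [c + (1.0 if i >= 3 else 0.0) for i, c in enumerate(climate)]
neighbor = list(climate)

delta = difference_series(target, neighbor)
print(delta)  # the common trend cancels; only the step remains
```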

b. Bayes factors

In this case, we want to pick a good model for the difference series. As in other applications, Bayes factors are useful tools for selecting a “winner” among competing models. Suppose, for example, that we have M0 and M1, where M0 means that there are no changepoints in ΔTj(t) and M1 means that there is a changepoint at month t in ΔTj(t); we can compute the posterior probability for each model using Bayes’ theorem,

$$P(M_i \mid Y) = \frac{P(Y \mid M_i)\,P(M_i)}{\sum_{k=0}^{1} P(Y \mid M_k)\,P(M_k)}, \qquad (3)$$

where i = 0, 1 and Y is the observation. We obtain the posterior odds by

$$\frac{P(M_1 \mid Y)}{P(M_0 \mid Y)} = \frac{P(Y \mid M_1)}{P(Y \mid M_0)} \times \frac{P(M_1)}{P(M_0)}. \qquad (4)$$

The Bayes factor is defined as

$$\mathrm{BF}_{10} = \frac{P(Y \mid M_1)}{P(Y \mid M_0)}. \qquad (5)$$

To obtain P(Y | Mi), we integrate out the parameters,

$$P(Y \mid M_i) = \int P(Y \mid \omega_i, M_i)\,\psi(\omega_i \mid M_i)\,d\omega_i, \qquad (6)$$

where P(Y | ωi, Mi) is the probability density function with parameter ωi under Mi and ψ(ωi | Mi) is the prior density for ωi under Mi. After obtaining the Bayes factor and assuming prior odds equal to 1, we often use the value of 2 loge(BF10) to evaluate the evidence against M0 (Kass and Raftery 1995). For example, when the value of 2 loge(BF10) is between 2 and 6, there is positive evidence against M0 (Kass and Raftery 1995). The definition in (5) can be extended to cases with more than two models. More details on Bayes factors and Bayesian model selection can be found in Kass and Raftery (1995) and MacKay (2003).
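For reference, the Kass and Raftery (1995) interpretation scale for 2 loge(BF10) can be encoded as a small helper; the category labels paraphrase their table.

```python
# Sketch of the Kass and Raftery (1995) evidence scale for 2*log_e(BF_10).
import math

def evidence_against_m0(bf10):
    score = 2.0 * math.log(bf10)
    if score < 0:
        return "supports M0"
    if score < 2:
        return "not worth more than a bare mention"
    if score < 6:
        return "positive"
    if score < 10:
        return "strong"
    return "very strong"

print(evidence_against_m0(10.0))  # 2*ln(10) is about 4.6, i.e. "positive"
```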

c. Bayes factors for one difference series

Suppose for a short time window τ = {s, s + 1, … , s + l}, where s, s + 1, … , s + l are consecutive time indices in a monthly temperature anomaly series, we have a set of hypotheses H = {Mt: t = 0, s, s + 1, … , s + l}, where Mt = {at month t in ΔTj(t) there is a discontinuity} and M0 = {no discontinuities}. To limit the number of potential hypotheses, we assume that there is at most one discontinuity in τ. According to Menne et al. (2009), the average distance between two detected discontinuities for the U.S. Historical Climatology Network (USHCN) monthly temperature data version 2 is about 180–240 months. Although the actual frequency of the discontinuities must be higher, we expect that most time windows will contain at most one discontinuity, especially when l ≪ 180.

For each Mt, we write the competing models as

$$M_0: \quad \Delta T_j(i) = \mu + \varepsilon(i), \qquad i \in \tau, \qquad (7)$$

and

$$M_t: \quad \Delta T_j(i) = \begin{cases} \mu_{1,t} + \varepsilon(i), & i < t \\ \mu_{2,t} + \varepsilon(i), & i \ge t \end{cases} \qquad i \in \tau, \qquad (8)$$

where μ1,t is the mean before month t and μ2,t is the mean after month t. With the above assumptions, we can compare Ms, … , Ms+l with M0 and produce the Bayes factors BFs0, … , BF(s+l)0. Then, after obtaining BFs0, … , BF(s+l)0, we can compute the posterior probabilities for all models.
To calculate the probabilities, we apply the Bayesian two-sample t-test framework proposed by Gönen et al. (2005). Suppose the x1,j’s are observations before t and the x2,j’s are observations after t, for Mt with t = s, … , s + l; we assume the observations are from two normal distributions, N(μ1,t, σ²) and N(μ2,t, σ²), and the prior distribution of the standardized mean difference (μ1,t − μ2,t)/σ is N(μ0, σ0²). For M0, we assume the observations are from one normal distribution N(μ, σ²), with the priors on (μ, σ²) as in Gönen et al. (2005). After we observe the dataset Y for τ, we compute the Bayes factor for Mt by integrating out all parameters. Gönen et al. (2005) obtain a closed form for the Bayes factor,

$$\mathrm{BF}_{t0} = \frac{\Upsilon_\nu\!\left(\zeta \mid \sqrt{n_p}\,\mu_0,\; 1 + n_p\,\sigma_0^2\right)}{\Upsilon_\nu(\zeta \mid 0,\, 1)}, \qquad (9)$$

where ζ is the usual two-sample t statistic; μ0 and σ0² are the prior mean and prior variance of (μ1,t − μ2,t)/σ; np = (1/n1 + 1/n2)⁻¹ is the pooled sample size; ν = n1 + n2 − 2; and ϒν(· | ξ, κ) is the noncentral t density with location parameter ξ, scale parameter κ^(1/2), and ν degrees of freedom. It is possible to use different priors, such as a Cauchy prior (Rouder et al. 2009). However, we have found via simulations that, while we can achieve similar results with normal and Cauchy priors, numerical integration is required with the latter, which increases computational cost, so we use normal priors in our calculations.
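A self-contained numerical sketch of this closed-form Bayes factor. The quadrature-based noncentral t density and all helper names are ours, not from Gönen et al. (2005), and the default prior variance anticipates the 0.3696 value chosen in section 3.

```python
# Sketch of the Gonen et al. (2005) two-sample Bayes factor of Eq. (9).
# The noncentral t density is evaluated by simple trapezoid quadrature of
# its standard integral representation, so no external libraries are needed;
# the quadrature settings are illustrative.
import math

def _nct_pdf(x, df, nc, steps=4000):
    """Density of the noncentral t distribution (df degrees, noncentrality nc)."""
    m = nc * x / math.sqrt(df + x * x)
    log_pref = (0.5 * df * math.log(df)
                - df * nc * nc / (2.0 * (x * x + df))
                - 0.5 * math.log(math.pi)
                - math.lgamma(0.5 * df)
                - 0.5 * (df - 1.0) * math.log(2.0)
                - 0.5 * (df + 1.0) * math.log(x * x + df))
    upper = 0.5 * (m + math.sqrt(m * m + 4.0 * df)) + 12.0  # integrand support
    h = upper / steps
    total = 0.0
    for i in range(1, steps):  # trapezoid rule over y > 0; endpoints vanish
        y = i * h
        total += math.exp(df * math.log(y) - 0.5 * (y - m) ** 2)
    return math.exp(log_pref) * total * h

def _scaled_t_pdf(x, df, location, scale_sq):
    """Density of sqrt(scale_sq) times a noncentral t with nc = location/sqrt(scale_sq)."""
    s = math.sqrt(scale_sq)
    return _nct_pdf(x / s, df, location / s) / s

def bf10(zeta, n1, n2, mu0=0.0, var0=0.3696):
    """Bayes factor 'changepoint at t' vs 'none', from the two-sample t statistic."""
    df = n1 + n2 - 2
    n_p = 1.0 / (1.0 / n1 + 1.0 / n2)  # pooled sample size
    num = _scaled_t_pdf(zeta, df, math.sqrt(n_p) * mu0, 1.0 + n_p * var0)
    den = _scaled_t_pdf(zeta, df, 0.0, 1.0)
    return num / den

# At zeta = 0 the simpler model wins (BF < 1); evidence grows with |zeta|.
print(bf10(0.0, 30, 30), bf10(5.0, 30, 30))
```

With μ0 = 0, the Bayes factor at ζ = 0 reduces to (1 + npσ0²)^(−1/2), which gives a quick sanity check on the quadrature.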

d. Bayes factors for K difference series

Suppose now that we observe multiple sets {Y1, Y2, … , YK} of observations from K difference series for a time window τ; we want to compute the Bayes factor for the set of target–neighbor differences. Because the Yi’s are not independent, we cannot combine them in one t statistic. Nevertheless, theoretically the Bayes factor for a model Mt can be obtained with

$$\mathrm{BF}_{t0} = \frac{P(Y_1, Y_2, \ldots, Y_K \mid M_t)}{P(Y_1, Y_2, \ldots, Y_K \mid M_0)}. \qquad (10)$$

However, it is difficult to define a parametric model for P(Y1, Y2, … , YK | Mt) and to integrate out the parameters directly, so in this case we apply the Bayes factor model of a single series on each difference series and use the following formulas to approximate the Bayes factor for K difference series:

$$\log_e(\mathrm{BF}_{t0}) \approx \frac{1}{K} \sum_{i=1}^{K} \log_e(\mathrm{BF}_{t0,i}), \qquad (11)$$

$$\mathrm{BF}_{t0} \approx \left( \prod_{i=1}^{K} \mathrm{BF}_{t0,i} \right)^{1/K}, \qquad (12)$$

where the BFt0,i’s are computed with Eq. (9). The rationale behind this approximation is that loge(BFt0,i) can be viewed as the weight of evidence from each dataset Yi and the mean of the log Bayes factors can be viewed as the average weight of evidence from the dataset {Y1, Y2, … , YK} (Good 1985). For real applications, the median instead of the mean is recommended to mitigate the impact of outliers.
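The averaging of log Bayes factors above amounts to a geometric mean of the per-neighbor Bayes factors. A minimal sketch with made-up values, also showing the median variant recommended for real data:

```python
# Combining per-neighbor Bayes factors by averaging log Bayes factors
# (a geometric mean), with a median variant that resists outliers.
import math
from statistics import median

def combine_bf(bfs, use_median=False):
    logs = [math.log(b) for b in bfs]
    avg = median(logs) if use_median else sum(logs) / len(logs)
    return math.exp(avg)

neighbor_bfs = [8.0, 6.0, 7.0, 900.0]  # one outlier neighbor
print(combine_bf(neighbor_bfs))                    # pulled up by the outlier
print(combine_bf(neighbor_bfs, use_median=True))   # robust to the outlier
```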

e. Estimate break locations

After we obtain the approximate Bayes factor at each time point of the set of difference series, we notice that the Bayes factors increase whenever we approach a potential discontinuous point at the target. By selecting an appropriate threshold, a time window τm = {sm, sm + 1, … , sm + lm} with 2 loge(BF) above the threshold will be identified to form the model set Hm. Depending on the magnitude of the threshold and the size of the discontinuities, we may have several τm’s for a station with multiple discontinuities. Each τm contains a potential discontinuity, and the number of τm’s corresponds to the number of potential discontinuities. Since we have the approximate Bayes factors BFi0 for each Mi in Hm, we compute the posterior probability of Mi using the formula in Kass and Raftery (1995),

$$P(M_i \mid Y_1, Y_2, \ldots, Y_K) = \frac{c_i\,\mathrm{BF}_{i0}}{\sum_{j} c_j\,\mathrm{BF}_{j0}}, \qquad (13)$$

where cj = P(Mj)/P(M0) is the prior odds for model Mj, the sum runs over j ∈ {0, sm, … , sm + lm}, and BF00 = c0 = 1. If we define A = {there is a discontinuity in τm}, then P(A | Y1, Y2, … , YK) > 0.5 indicates there is a discontinuity in the time window. For time windows with P(A | Y1, Y2, … , YK) > 0.5, we estimate the expected location E(Lm | Y1, Y2, … , YK, A) of the discontinuity. We compute the probability of each location when there is a discontinuity by

$$P(A \mid Y_1, \ldots, Y_K) = \sum_{t=s_m}^{s_m+l_m} P(M_t \mid Y_1, \ldots, Y_K) = 1 - P(M_0 \mid Y_1, \ldots, Y_K), \qquad (14)$$

$$P(L_m = t \mid Y_1, \ldots, Y_K, A) = P(M_t \mid Y_1, \ldots, Y_K, A) \qquad (15)$$

$$= \frac{P(M_t \mid Y_1, \ldots, Y_K)}{P(A \mid Y_1, \ldots, Y_K)} \qquad (16)$$

$$= \frac{c_t\,\mathrm{BF}_{t0}}{\sum_{i=s_m}^{s_m+l_m} c_i\,\mathrm{BF}_{i0}}, \qquad t = s_m, \ldots, s_m + l_m. \qquad (17)$$

For time window τm, the posterior mean of the location of the discontinuity is

$$E(L_m \mid Y_1, \ldots, Y_K, A) = \sum_{t=s_m}^{s_m+l_m} t\,P(L_m = t \mid Y_1, \ldots, Y_K, A) \qquad (18)$$

$$= \frac{\sum_{t=s_m}^{s_m+l_m} t\,c_t\,\mathrm{BF}_{t0}}{\sum_{i=s_m}^{s_m+l_m} c_i\,\mathrm{BF}_{i0}}. \qquad (19)$$

We round E(Lm | Y1, … , YK, A) to the closest integer and obtain the final estimate of the location of the discontinuity. We can compute the variance of the location of the discontinuity with

$$\mathrm{var}(L_m \mid Y_1, \ldots, Y_K, A) = \sum_{t=s_m}^{s_m+l_m} t^2\,P(L_m = t \mid Y_1, \ldots, Y_K, A) - \left[E(L_m \mid Y_1, \ldots, Y_K, A)\right]^2. \qquad (20)$$
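The posterior-probability and location calculations above can be sketched as follows; the Bayes factor values are made up, and the flat prior odds follow the noninformative default.

```python
# Posterior model probabilities from Bayes factors within one flagged
# window, the probability of a break, and the posterior mean/variance of
# its location. Prior odds default to 1 for every break model.
def posterior_summary(bf, prior_odds=None):
    """bf maps month index t -> BF_t0; BF_00 = 1 for the no-break model."""
    c = prior_odds if prior_odds is not None else {t: 1.0 for t in bf}
    denom = 1.0 + sum(c[t] * bf[t] for t in bf)      # the 1 is the M0 term
    post = {t: c[t] * bf[t] / denom for t in bf}     # P(M_t | Y)
    p_break = sum(post.values())                     # P(A | Y) = 1 - P(M0 | Y)
    loc = {t: post[t] / p_break for t in post}       # P(L = t | Y, A)
    mean = sum(t * p for t, p in loc.items())        # posterior mean location
    var = sum(t * t * p for t, p in loc.items()) - mean ** 2
    return p_break, mean, var

bfs = {100: 2.0, 101: 20.0, 102: 6.0}  # made-up Bayes factors in a window
p, m, v = posterior_summary(bfs)
print(round(p, 3), round(m, 2), round(v, 3))
```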
The plot of 2 loge(BF) for a station based on simulated data is shown in Fig. 1 (details of the simulations are provided in section 4). In Fig. 1, there are three time windows with 2 loge(BF) above the threshold. The threshold value is 4, and the indexing refers to months beginning with January 1900 (i = 1) and going through December 1999 (i = 1200). The break around month 850 in the difference series comes from the neighbor series, and we notice that 2 loge(BF) for this break is not above the threshold. The estimated locations for the three detected discontinuities are 562, 933, and 1007, and the estimated variances are 47.6, 7.5, and 14.7. For the first detected break, the true location is 570; for the second, the true location is 932; for the third, two true breaks are located at 1002 and 1011. We notice that all true breaks are within two estimated standard deviations of the estimated locations and that the distance between the estimated location and the true location is approximately proportional to the estimated standard deviation. Thus, the value of the estimated variance could be used to measure the relative accuracy of the estimated location and to select temperature series when the accuracy of the time location is important.
Fig. 1. The 2 loge(BF) plot for simulated station USC00026486 from the “clustering and sign bias C20C1” simulation (Williams et al. 2012): (top) temperature anomalies, (middle) a difference series between USC00026486 and one of its neighbors, and (bottom) 2 loge(BF) for station USC00026486.

Citation: Journal of Climate 25, 24; 10.1175/JCLI-D-12-00052.1

3. Selection of parameters

Since the prior distribution has the form N(μ0, σ0²), we need to specify values for the prior mean μ0 and prior variance σ0². We set the prior mean μ0 to zero because we do not know whether the discontinuities will be positive or negative. A reasonable guess about the prior variance is that 90% of the discontinuities have a standardized size less than 1 (the accuracy of this guess does not significantly impact the results; we discuss the sensitivity with respect to the prior variance in the results section). So the prior variance can be determined by

$$P\!\left( \left| \frac{\Delta\mu}{\sigma} \right| < 1 \right) = 0.9, \qquad (21)$$

where Δμ = μ1 − μ2, and the prior variance is equal to 0.3696 since

$$\sigma_0 = \frac{1}{z_{0.95}} = \frac{1}{1.6449} \approx 0.6080, \qquad \sigma_0^2 \approx 0.3696. \qquad (22)$$
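Under the assumption above, the prior variance follows from the 95th percentile of the standard normal; a quick check in Python:

```python
# Choosing the prior variance so that 90% of standardized break sizes fall
# in (-1, 1) under the N(0, sigma0^2) prior: 1 = z_0.95 * sigma0.
from statistics import NormalDist

z95 = NormalDist().inv_cdf(0.95)   # about 1.6449
sigma0_sq = (1.0 / z95) ** 2
print(round(sigma0_sq, 4))         # 0.3696, the value used in the paper
```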
We also need to determine a reasonable sample window size n: that is, how many observations from each side of the potential break point will be included in the t statistic. Including too many observations will increase undesired biases (i.e., the window may encompass more than one break), and including too few will lead to large uncertainty. We select the sample window size through a series of sensitivity analyses. As discussed further in the results section, it seems that if we let the window size n equal 30 months (each side of the potential break), we achieve good results on the simulated datasets. For the prior odds P(Mt)/P(M0), the noninformative prior odds P(Mt)/P(M0) = 1 is typically used in the calculation, although other choices are possible, as we will see in the results section.

To find the potential time windows τm, a threshold for 2 loge(BF) needs to be specified. We follow the recommendation in Kass and Raftery (1995) and use the moderately positive evidence level of 4 as the threshold for 2 loge(BF). This threshold seems to work well for various simulations. Finally, since we could potentially have hundreds of neighbors, we need to set an upper limit on the number of neighbors included in the computation. To avoid including neighbors with very low correlation, we must also set a lowest correlation limit. Based on our simulations, the performance of the algorithm is not very sensitive to these two parameters. Therefore, we use 40 as the upper limit for the number of neighbors and 0.5 as the lowest correlation limit for the neighbors, as in Menne and Williams (2009).

4. Results

a. Results using simulated and real-world observations

We used two large simulated datasets to evaluate the effectiveness of our algorithm. Simulated datasets have been used to benchmark algorithms in the homogenization of radiosonde records (e.g., Titchner et al. 2009); paleoclimate reconstructions (e.g., Mann et al. 2005); and, more recently, surface temperature records (Venema et al. 2012; Williams et al. 2012). Here we used two of the simulated datasets described in Williams et al. (2012).

Each of these synthetic datasets contained about 7700 station series, and each station has a maximum record length of 100 yr; however, many of the stations have much shorter records and are characterized by missing periods of varying length. The simulated temperature series are based on climate model output and contain correlated errors. As described in detail in Williams et al. (2012), the missing data patterns mimic the data record and geographic distribution of stations in the U.S. Cooperative Observer Network. The simulations contain only step function discontinuities, which are arguably the most prevalent type of artificial discontinuity. We have two simulated datasets, which we call simulations 1 and 2. In Williams et al. (2012), simulation 1 is referred to as “clustering and sign bias C20C1,” and simulation 2 is referred to as “many small breaks with sign bias.” For simulation 1, there are on average 7 breaks per series. The breaks are not randomly spaced through time but rather are “clustered,” with most stations having a break within 30 yr of 1945 and during the 1980s to reflect the changes that occurred in the real-world network (Menne et al. 2009). For simulation 2, there are on average 10 breaks per series. Also, for both simulations, there is a sign bias to reflect what is known about the errors in the USHCN (Menne et al. 2009), which means that the imposed errors do not have a frequency distribution that is symmetric about zero. Rather, there is a preference for positive errors in simulation 1 and negative errors in simulation 2. Overall, simulation 1 contains a mixture of large, medium, and small discontinuities and resembles the type of errors thought to be present in USHCN temperature data. Simulation 2 has predominantly small breaks and represents a very challenging situation. In addition, for simulation 2, breaks are very close to each other, which makes detection even more challenging. More details of the two simulations can be found in Table 1.

Table 1. The characteristics of the two simulations as in Williams et al. (2012).

We used the parameter settings described in section 3 to identify breaks in the two datasets. The result for simulation 1 is listed in Table 2. Our algorithm detects 84.50% of the true large discontinuities. For detected large discontinuities, the false detection rate (FDR) is only 1.11%. The overall hit rate is 47.11%, and the false detection rate is 11.82%. The result for simulation 2 is listed in Table 3. We detected 11.17% of the total breaks. The false detection rate is 9.55%. The median of the estimated SNR for simulation 1 is 0.78 and for simulation 2 is 0.19 (Table 4). Although the overall hit rates and false detection rates may not be impressive at first glance, we should recognize that the simulated temperature series are quite realistic and many of the imposed breaks are small (near zero). Thus, the hit rates and false detection rates achieved by our algorithm are reasonable and comparable to results using the Menne and Williams (2009) pairwise homogenization algorithm (PHA; Tables 2, 3). In particular, for simulation 2, “many small breaks with sign bias,” most of the breaks are smaller than 0.5°C, and the median of the estimated SNR for small discontinuities is only 0.16. Thus, any algorithm will likely have difficulty boosting the hit rate without increasing the false detection rate. The computations for the examples were carried out in the R language on a 2.66-GHz CPU. Computation time is roughly 2.7 s per station, or 5.8 h for a 7700-station network.

Table 2. Results obtained in simulation 1, “clustering and sign bias C20C1.”

Table 3. Results obtained in simulation 2, “many small breaks with sign bias.”
Table 4. Median of the estimated SNR. The sizes are as follows: large is δ ≥ 1.0°C, medium is 0.5°C ≤ δ < 1.0°C, and small is δ < 0.5°C, where δ is the size of a discontinuity.

To further evaluate the efficiency of the Bayes algorithm, a simple adjustment factor was calculated for each of the break dates identified. For each detected break, multiple adjustments were first calculated using a 30-month window (each side of the break) on each difference series, and the median of the adjustments was used as the final adjustment factor. These adjustments were then applied to the 1218 simulated series that are corollaries to the real USHCN station temperature series (Menne et al. 2009). A conterminous U.S. (CONUS) average time series was then computed as in Williams et al. (2012) using the 1218 adjusted series as well as the raw input unadjusted series (i.e., with errors) and the underlying series with no seeded errors. As shown in Fig. 2, applying the Bayes factor adjustments moves the CONUS average trends closer to their true “homogeneous” values. In the case of simulation 1, the adjusted trends are smaller than the raw input, indicating that the adjustments are accounting for the input data errors, which have a positive sign bias. In simulation 2, the errors have a negative bias, and the adjusted trends are therefore larger than the raw input. Not surprisingly, the adjustments do not move the CONUS average trend too far; rather, they do not move it far enough. This is an indication that the adjustments are incomplete rather than overly aggressive, especially in the case of simulation 2, where the detection rate is relatively low. As discussed in Williams et al. (2012), PHA-based adjustments behave similarly (results using the operational configuration of the PHA, version 52i, are also shown in Fig. 2a). Notably, the Bayes factor algorithm moves the trend nearly as far as the operational PHA algorithm in simulation 2 but not in simulation 1. Because the detection rates are comparable between the two algorithms, the reason for the differential adjustments may be related to the way in which the Bayes factor adjustments are calculated (i.e., using a very limited time window) compared to the PHA and/or the fact that the Bayes factor algorithm, unlike the PHA, does not currently use metadata as a prior. As mentioned in the conclusions, refining the adjustment method and exploiting available metadata are both logical options for future Bayes factor algorithm improvement.
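The simple adjustment described above can be sketched as follows; the series, names, and break location are illustrative.

```python
# Sketch of the simple adjustment: for each detected break, estimate a
# shift on each target-neighbor difference series from 30-month windows on
# either side of the break, then take the median across neighbors.
from statistics import mean, median

def adjustment(diff_series_list, break_idx, half_window=30):
    shifts = []
    for d in diff_series_list:
        before = d[max(0, break_idx - half_window):break_idx]
        after = d[break_idx:break_idx + half_window]
        if before and after:
            shifts.append(mean(after) - mean(before))
    return median(shifts)

# Three neighbors all see roughly a +0.8 step at month 120.
diffs = [[0.0] * 120 + [0.8] * 120,
         [0.1] * 120 + [0.9] * 120,
         [-0.05] * 120 + [0.78] * 120]
print(adjustment(diffs, 120))
```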

Fig. 2. (top) Annual average CONUS temperature series calculated using the USHCN monthly temperature series from the simulation-1 dataset. Spatial averages based on adjustments calculated from the Bayes factor algorithm are shown in black and those from the Menne and Williams (2009) PHA in orange. CONUS averages for the nonhomogenized (raw) input values with the seeded errors are shown in red. Averages based on the true data series without errors are shown in green. (bottom) As in (top), but for simulation 2.


The Bayes factor algorithm was also applied to mean monthly maximum and mean monthly minimum temperature series from the full 7000+ stations in the U.S. Cooperative Observer Program network with the parameters described in section 3. As above, a CONUS-wide average was computed from the 1218-station USHCN subset of stations using both the raw input series and the adjusted series. The time series and trend values for maximum and minimum temperatures are shown in Fig. 3. As in the case of the PHA adjustments (also shown), adjusted maximum temperature trends based on the Bayes factor algorithm are larger than the raw, unadjusted trends. This is consistent with the present understanding that maximum temperatures in the United States contain pervasive negative biases, especially since 1950. These biases are primarily related to changes in the time of observation and a widespread change from liquid-in-glass thermometers to electronic thermistors (see Menne et al. 2009; Williams et al. 2012). For minimum temperatures, there are conflicting biases in the USHCN temperature measurements, with a negative time of observation bias dominating since 1950 and a positive bias associated with the change to electronic thermistors occurring largely in the mid-1980s. The Bayes factor adjustments on minimum temperature trends are also broadly consistent with this understanding.

Fig. 3. As in Fig. 2, but for real-world monthly mean (top) maximum and (bottom) minimum temperatures.


b. Evaluation of parameter sensitivity

For simulation 1, we randomly selected 5% of the stations and performed the sensitivity analyses of the HRs and FDRs with respect to prior variances, prior odds, threshold values of 2 loge(BF), and sample window sizes. Figure 4a shows the sensitivity analysis of the HRs and FDRs with respect to the prior variance. We observe that the HRs and FDRs are not overly sensitive to the choice of the prior variance unless the prior variance is unreasonably small. This means that the selection of the prior variance is not a concern for the model. Figure 4b contains the sensitivity analysis of the HRs and FDRs with respect to the log10(prior odds) of P(Mt)/P(M0). The HRs and FDRs are not very sensitive to the choice of prior odds when the log10(prior odds) is greater than −2. The sensitivity analysis of the HRs and FDRs with respect to the sample window size is shown in Fig. 4c. Very large or very small sample window sizes will lower the HRs. Also, very large sample window sizes cause the FDRs to increase. This is perhaps caused by including nearby discontinuities in the sample window. Figure 4d shows the sensitivity analysis of the HRs and FDRs with respect to the threshold values of 2 loge(BF). The HRs and FDRs are sensitive to the threshold value. Fortunately, the FDRs decrease more rapidly than the HRs when the threshold value increases.

Fig. 4. The sensitivity analysis of HRs and FDRs with respect to (a) the prior variance, (b) the log10(prior odds), (c) the sample window size, and (d) the threshold value of 2 loge(BF).


From Eq. (9), we know that 2 loge(BF) is a function of the SNR, the sample window size n, and the prior variance σ0² when the prior mean μ0 is equal to 0. From Fig. 5, we know that 2 loge(BF) is not sensitive to the change of the prior variance σ0², and we can follow the procedure in section 3 to select a prior variance. For breaks with relatively large SNR values, Fig. 5 shows that 2 loge(BF) is sensitive to the change of the sample window size n. However, a large sample window may include nearby discontinuities. The choice of the sample window size n depends on the prior information about the density of the discontinuities and the level of the SNR. To apply the model to real observations, we could start from a relatively small window and gradually increase the size of the window until the HRs decrease. From Fig. 5, we also notice that increasing the threshold for 2 loge(BF) will effectively eliminate false detections with small SNR values and lower the FDRs. Choosing different prior odds will also affect the FDRs. In the next paragraph, we discuss the choice of prior odds.

Fig. 5. (a) The 2 loge(BF) as a function of the sample window size and the SNR value when the prior variance equals 0.3696 (each curve has the same SNR value), (b) 2 loge(BF) as a function of the sample window size and the SNR value when the prior variance equals 5.3696, (c) 2 loge(BF) as a function of the sample window size and the SNR value when the prior variance equals 10.3696, and (d) the box plot of estimated SNR values for each size category in simulation 1.


If we use flat priors, that is, cj = c for all j ∈ {sm, … , sm + lm}, then from Eq. (13) we know

$$P(A \mid Y_1, Y_2, \ldots, Y_K) = \frac{c \sum_{t=s_m}^{s_m+l_m} \mathrm{BF}_{t0}}{1 + c \sum_{t=s_m}^{s_m+l_m} \mathrm{BF}_{t0}}, \qquad (23)$$

where A = {there is a discontinuity in τm}. Because the condition for having a break is P(A | Y1, Y2, … , YK) > 0.5, from Eq. (23) we know

$$c \sum_{t=s_m}^{s_m+l_m} \mathrm{BF}_{t0} > 1. \qquad (24)$$

Since the window τm contains at least one point with 2 loge(BF) above the threshold T,

$$\sum_{t=s_m}^{s_m+l_m} \mathrm{BF}_{t0} > e^{T/2}, \qquad (25)$$

the sufficient and necessary condition for P(A | Y1, Y2, … , YK) > 0.5 is

$$c > \left( \sum_{t=s_m}^{s_m+l_m} \mathrm{BF}_{t0} \right)^{-1}. \qquad (26)$$

For the usual noninformative prior c = 1, Eq. (26) is always valid. If we want to achieve the maximum hit rate for a certain threshold of 2 loge(BF), then we should use the noninformative prior. If we want to lower the FDR and the threshold for 2 loge(BF) is T, we can choose a smaller c with

$$c = \frac{1}{\bar{l}\, e^{(T+d)/2}}, \qquad (27)$$

where d > 0 and l̄ is the estimated average time window length.
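The effect of the flat prior odds c on the window-level break probability above can be sketched as follows; the aggregate Bayes factor value is made up.

```python
# Sketch of Eq. (23): with flat prior odds c for every break model in the
# window, P(A | Y) = c*S / (1 + c*S), where S is the sum of the Bayes
# factors. Shrinking c below 1 demands stronger evidence before a window
# is declared to contain a break.
def p_break(bf_sum, c=1.0):
    return c * bf_sum / (1.0 + c * bf_sum)

s = 3.0                       # weak aggregate evidence in a window
print(p_break(s))             # noninformative prior: flagged (> 0.5)
print(p_break(s, c=0.1))      # conservative prior: not flagged
```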

5. Conclusions

Detecting artificial discontinuities in real temperature series usually carries uncertainties. For example, a large fraction of the breaks in most surface temperature networks are probably undocumented, and we may have more than one plausible location for a discontinuity. Because the “true” climate signal in these series is unknown, the best that we can hope to do is to quantify the uncertainty, and one way to do this is to approach the changepoint detection problem in multiple different ways (Thorne et al. 2011). With the model in this paper, we can quantify the uncertainty of the location and estimate the most likely location in a probabilistic manner. We have shown in the examples that the proposed model achieves reasonable results with simulated, large-scale, realistically noisy temperature series. The results of the sensitivity analyses also provide evidence that the model is useful for real applications. In the future, we plan to use the available metadata as a prior in the Bayes factor algorithm as well as test alternative ways to calculate adjustments for the identified breaks. These future algorithm enhancements will allow for a more comprehensive comparison to other homogenization algorithms and help quantify the structural uncertainty associated with surface temperature homogenization.

Acknowledgments

We are grateful to Dr. Peter Thorne for his assistance with the data preparation and to Dr. Murray Clayton for his comments on our model. The comments by three anonymous reviewers also greatly improved the manuscript.

REFERENCES

  • Alexandersson, H., 1986: A homogeneity test applied to precipitation data. J. Climatol., 6, 661–675.

  • Beaulieu, C., T. Ouarda, and O. Seidou, 2010: A Bayesian normal homogeneity test for the detection of artificial discontinuities in climatic series. Int. J. Climatol., 30, 2342–2357.

  • Caussinus, H., and O. Mestre, 2004: Detection and correction of artificial shifts in climate series. J. Roy. Stat. Soc., 53C, 405–425.

  • Della-Marta, P., and H. Wanner, 2006: A method of homogenizing the extremes and mean of daily temperature measurements. J. Climate, 19, 4179–4197.

  • Gönen, M., W. Johnson, Y. Lu, and P. Westfall, 2005: The Bayesian two-sample t test. Amer. Stat., 59, 252–257.

  • Good, I., 1985: Weight of evidence: A brief survey. Bayesian Statistics, J. Bernardo et al., Eds., Elsevier, 249–269.

  • Hannart, A., and P. Naveau, 2009: Bayesian multiple change points and segmentation: Application to homogenization of climatic series. Water Resour. Res., 45, W10444, doi:10.1029/2008WR007689.

  • K-1 Model Developers, 2004: K-1 coupled GCM (MIROC) description. K-1 Tech. Rep. 1, 39 pp.

  • Kass, R., and A. Raftery, 1995: Bayes factors. J. Amer. Stat. Assoc., 90, 773–795.

  • Lu, Q., R. Lund, and T. Lee, 2010: An MDL approach to the climate segmentation problem. Ann. Appl. Stat., 4, 299–319.

  • Lund, R., and J. Reeves, 2002: Detection of undocumented changepoints: A revision of the two-phase regression model. J. Climate, 15, 2547–2554.

  • Lund, R., X. Wang, Q. Lu, J. Reeves, C. Gallagher, and Y. Feng, 2007: Changepoint detection in periodic and autocorrelated time series. J. Climate, 20, 5178–5190.

  • MacKay, D. J. C., 2003: Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 628 pp.

  • Mann, M., S. Rutherford, E. Wahl, and C. Ammann, 2005: Testing the fidelity of methods used in proxy-based reconstructions of past climate. J. Climate, 18, 4097–4107.

  • Menne, M. J., and C. N. Williams, 2009: Homogenization of temperature series via pairwise comparisons. J. Climate, 22, 1700–1717.

  • Menne, M. J., C. N. Williams, and R. S. Vose, 2009: The U.S. Historical Climatology Network monthly temperature data, version 2. Bull. Amer. Meteor. Soc., 90, 993–1007.

  • Peterson, T. C., and Coauthors, 1998: Homogeneity adjustment of in situ atmospheric climate data: A review. Int. J. Climatol., 18, 1493–1517.

  • Reeves, J., J. Chen, X. L. Wang, R. Lund, and Q. Q. Lu, 2007: Comparison of techniques for detection of discontinuities in temperature series. J. Appl. Meteor. Climatol., 46, 900–914.

  • Rouder, J. N., P. L. Speckman, D. Sun, R. D. Morey, and G. Iverson, 2009: Bayesian t tests for accepting and rejecting the null hypothesis. Psychon. Bull. Rev., 16, 225–237.

  • Thorne, P. W., and Coauthors, 2011: Guiding the creation of a comprehensive surface temperature resource for twenty-first-century climate science. Bull. Amer. Meteor. Soc., 92, ES40–ES47.

  • Titchner, H., P. W. Thorne, M. P. McCarthy, S. F. B. Tett, L. Haimberger, and D. E. Parker, 2009: Critically assessing tropospheric temperature trends from radiosondes using realistic validation experiments. J. Climate, 22, 465–485.

  • Venema, V. K. C., and Coauthors, 2012: Benchmarking monthly homogenization algorithms. Clim. Past, 8, 89–115, doi:10.5194/cp-8-89-2012.

  • Vincent, L., 1998: A technique for the identification of inhomogeneities in Canadian temperature series. J. Climate, 11, 1094–1104.

  • Wang, X. L., 2008a: Accounting for autocorrelation in detecting mean shifts in climate data series using the penalized maximal t or F test. J. Appl. Meteor. Climatol., 47, 2423–2444.

  • Wang, X. L., 2008b: Penalized maximal F test for detecting undocumented mean shift without trend change. J. Atmos. Oceanic Technol., 25, 368–384.

  • Wang, X. L., Q. H. Wen, and Y. Wu, 2007: Penalized maximal t test for detecting undocumented mean change in climate data series. J. Appl. Meteor. Climatol., 46, 916–931.

  • Washington, W., and Coauthors, 2000: Parallel climate model (PCM) control and transient simulations. Climate Dyn., 16, 755–774.

  • Williams, C. N., M. J. Menne, and P. W. Thorne, 2012: Benchmarking the performance of pairwise homogenization of surface temperatures in the United States. J. Geophys. Res., 117, D05116, doi:10.1029/2011JD016761.
1

We calculate the mean monthly temperature for each month and obtain the monthly temperature anomaly by subtracting the corresponding monthly-mean temperature from the actual monthly temperature.
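This anomaly computation can be sketched in a few lines; this is a minimal illustration (with a hypothetical function name), not the authors' code, and it handles missing or unevenly sampled months by grouping on a calendar-month index:

```python
import numpy as np


def monthly_anomalies(series, months):
    """Subtract from each observation the long-term mean of all observations
    sharing the same calendar month.  `months` gives the calendar month
    (1-12) of each value, so partial years and gaps are handled; NaNs mark
    missing values and are ignored when forming the monthly means."""
    series = np.asarray(series, dtype=float)
    months = np.asarray(months)
    anomalies = np.empty_like(series)
    for m in np.unique(months):
        mask = months == m
        anomalies[mask] = series[mask] - np.nanmean(series[mask])
    return anomalies


# Two Januaries (mean 11.0) and two Julys (mean 21.0):
temps = [10.0, 20.0, 12.0, 22.0]
print(monthly_anomalies(temps, [1, 7, 1, 7]))  # anomalies -1, -1, +1, +1
```

Removing the mean seasonal cycle in this way is what makes the pairwise difference series approximately stationary apart from the discontinuities of interest.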

2

We focus on time windows with 2 loge(BF) greater than zero since such windows contain evidence that favors discontinuities.

3

The interstation correlation is estimated from the first difference series.
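The point of differencing first is that step changes and slow trends contaminate a correlation computed on the raw series but contribute to only a single term of the differenced series. A minimal sketch of this estimate (hypothetical function name; NaN handling is an assumption about how gaps would be treated):

```python
import numpy as np


def interstation_correlation(x, y):
    """Estimate the correlation between two station series from their
    first-difference series, which damps the influence of step changes
    and slow trends on the estimate.  NaNs mark missing months."""
    dx = np.diff(np.asarray(x, dtype=float))
    dy = np.diff(np.asarray(y, dtype=float))
    ok = ~(np.isnan(dx) | np.isnan(dy))  # drop differences touching a gap
    return np.corrcoef(dx[ok], dy[ok])[0, 1]


# A shared signal, with a +2 step seeded into station y from the 4th value
# on; the step perturbs only one term of the differenced series.
x = [10.0, 11.5, 10.8, 12.0, 11.2, 12.5]
y = [10.1, 11.6, 10.7, 14.1, 13.3, 14.6]
r = interstation_correlation(x, y)  # remains strongly positive
```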

4

We classify the discontinuities into three categories in terms of their actual sizes to help readers understand the performance of the model. The three categories are defined as follows: large, δ ≥ 1.0°C; medium, 0.5°C ≤ δ < 1.0°C; and small, δ < 0.5°C, where δ is the size of a discontinuity.
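These cut points translate directly into a small classifier (an illustration only; the function name is hypothetical and the absolute value reflects the assumption that a break's sign is irrelevant to its size category):

```python
def size_category(delta):
    """Classify a discontinuity by absolute size in degrees Celsius,
    using the large/medium/small cut points from the text."""
    d = abs(delta)
    if d >= 1.0:
        return "large"
    if d >= 0.5:
        return "medium"
    return "small"


print([size_category(d) for d in (1.3, -0.7, 0.2)])
# ['large', 'medium', 'small']
```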

5

If the SNR is known, then we can replace the t statistic ζ in Eq. (9) with .

  • Fig. 1.

The 2 loge(BF) plot for simulated station USC00026486 from the “clustering and sign bias C20C1” simulation (Williams et al. 2012). (top) Temperature anomalies, (middle) a difference series between USC00026486 and one of its neighbors, and (bottom) 2 loge(BF) for station USC00026486.

  • Fig. 2.

(top) Annual average CONUS temperature series calculated using the USHCN monthly temperature series from the simulation-1 dataset. Spatial averages based on adjustments calculated from the Bayes factor algorithm are shown in black, and those based on the Menne and Williams (2009) PHA in orange. CONUS averages for the nonhomogenized (raw) input values with the seeded errors are shown in red, and averages based on the true data series without errors are shown in green. (bottom) As in (top), but for simulation 2.

  • Fig. 3.

As in Fig. 2, but for real-world monthly-mean (top) maximum and (bottom) minimum temperatures.

  • Fig. 4.

The sensitivity analysis of HRs and FDRs with respect to (a) the prior variance, (b) the log10(prior odds), (c) the sample window size, and (d) the threshold value of 2 loge(BF).

  • Fig. 5.

(a) The 2 loge(BF) as a function of the sample window size and SNR value when the prior variance equals 0.3696 (each curve has the same SNR value), (b) as in (a) but with prior variance 5.3696, (c) as in (a) but with prior variance 10.3696, and (d) the box plot of values for each size category in simulation 1.
