Attributes of Several Methods for Detecting Discontinuities in Mean Temperature Series

Arthur T. DeGaetano, Northeast Regional Climate Center, Department of Earth and Atmospheric Sciences, Cornell University, Ithaca, New York


Abstract

Simulated annual temperature series are used to compare seven homogenization procedures. The two that employ likelihood ratio tests routinely outperform other methods in their ability to identify modest (0.33°C; 0.6 standard deviation anomaly) shifts in the mean. The percentage of imposed shifts that are detected by these methods is similar to that based on tests that rely on a priori metadata information concerning the position of potential shifts. These methods, along with a two-phase regression approach, are also best at identifying and placing multiple shifts within a single time series. Although the regression procedure is better able to detect multiple breaks that are separated by relatively short time intervals, in its published form it suffers from a higher-than-expected Type I error rate. This was also found to be a problem with a metadata-based procedure currently in operational use. The likelihood tests are strongly influenced by the presence of trends in the difference series and short (<20 yr) series length.

The ability of a given procedure to detect a discontinuity is predominantly influenced by the magnitude of the discontinuity relative to the standard deviation of the data series being evaluated. Data series length, correlation between the test series and its associated reference series, and test series autocorrelation also influence test performance. These features were not considered in previous homogenization method comparisons.

Discontinuities with magnitudes less than 0.6 times the standard deviation of the time series represent the lower limit for homogenization. Based on the most effective homogenization techniques, 10% of the 1.25 standard deviation discontinuities are likely to remain in climatic data series, unless reference station correlations are exceptional or quality station metadata are available.

Corresponding author address: Dr. Art DeGaetano, Cornell University, 1119 Bradfield Hall, Ithaca, NY 14853. Email: atd2@cornell.edu

1. Introduction

The homogeneity of the climate record continues to receive considerable attention. Time series are commonly contaminated by nonclimatic discontinuities that result from station relocations (Karl and Williams 1987), instrument transitions (Quayle et al. 1991), observation time changes (DeGaetano 1999), and station-specific trends related to environmental changes in the proximity of the observation site (e.g., Kalnay and Cai 2003). Numerous methodologies have been developed to detect and adjust these inhomogeneities (Peterson et al. 1998). These methods are increasingly being relied upon to detect shifts in the means of relatively short climatological time series as these data are used in financial transactions and business decisions (Banks 2002).

Despite the large number of homogenization methodologies, few studies have evaluated the performance of the different techniques side by side. Ducré-Robitaille et al. (2003, hereafter DR03) offer the most comprehensive comparison of techniques. The overall performance of eight tests was evaluated using three types of simulated data series. Homogeneous series of 100 values, representing standardized annual temperature anomalies (zero mean and unit variance), were generated using an AR(1) model with the autocorrelation set at 0.1. Single-break series were fabricated by imposing 0.25° to 2.0°C shifts in the homogeneous series. Steps were imposed at several positions in the series, with the earliest at position (year) 5 and the latest at the midpoint of the series (year 50). Multiple-step series had steps of random magnitude inserted at random positions, with at least 10 yr between consecutive breaks. Homogeneous random normal reference series were also generated and served as a benchmark for each test. The correlation between these reference series and the series being tested (i.e., the candidate series) was typically near 0.8. The methods tested were generally comparable in performance, with the tests proposed by Alexandersson (1986) and Vincent (1998) giving slightly more favorable results.
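For concreteness, the DR03 simulation design can be sketched in a few lines of Python. This is a hedged illustration, not DR03's code: the function name, the seed, and the scaling of the innovations (chosen so the series keeps unit variance) are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_candidate(n=100, phi=0.1, break_year=None, delta=0.0):
    """Standardized annual anomaly series from an AR(1) model with
    lag-1 autocorrelation phi (zero mean, unit variance), optionally
    with a step of size delta imposed from break_year onward."""
    e = rng.normal(size=n)
    z = np.empty(n)
    z[0] = e[0]
    for t in range(1, n):
        # sqrt(1 - phi**2) scaling keeps the marginal variance at 1
        z[t] = phi * z[t - 1] + np.sqrt(1.0 - phi**2) * e[t]
    if break_year is not None:
        z[break_year:] += delta
    return z

# A DR03-style single-break series: a 0.5 standard deviation shift at year 50
series = simulate_candidate(n=100, phi=0.1, break_year=50, delta=0.5)
```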

Easterling and Peterson (1992, 1995) also provide a comparison of homogenization techniques. For a series with a single discontinuity, Easterling and Peterson (1992) found that likelihood ratio tests such as those proposed by Potter (1981) and Alexandersson (1986) were best able to identify relatively small anomalies (e.g., 0.5 standard deviations from the mean). For larger (2.0 standard deviation) discontinuities, all methods exhibited similar performance. Easterling and Peterson (1995) compared their new two-phase regression approach to the Alexandersson (1986) test. When only one 0.5 standard deviation (σ) discontinuity was imposed at the midpoint of a 100-value simulated data series, Alexandersson's test outperformed Easterling and Peterson's approach. However, the regression-based procedure was superior at detecting two discontinuities with alternating signs (−1σ at year 45 and +1σ at year 55). The methods performed similarly when three or four discontinuities were introduced to the series. When a fifth discontinuity was added, Easterling and Peterson's approach correctly identified more of the discontinuities.

Using the work of DR03 as a basis, the present study provides a more rigorous evaluation of homogenization techniques. DR03 do not account for differences in station variance and between-station correlation structure (i.e., the correlation between the time series of the candidate and reference stations). In this study, simulated reference series with different variance and correlation attributes are evaluated. This allows the test routines to be evaluated for regions with different degrees of spatial heterogeneity and in areas with different station densities. It also provides insight into the use of these tests on monthly and submonthly temperature series, as well as time series of data other than surface air temperature. The effects of series length, autocorrelation, and nonstationarity are also addressed: three parameters that DR03 held constant.

2. Homogenization methods

Seven homogenization techniques are compared. Five represent techniques that do not require a priori knowledge about the potential positions of the discontinuities. Such metadata are necessary for the other two techniques. DR03 did not investigate metadata-based approaches, instead modifying the Karl and Williams (1987) technique to allow the statistical selection of breaks.

Each procedure has been described previously in the literature; thus, for brevity, the algorithms are not detailed here. Nonetheless, Table 1 enumerates the procedures and gives an appropriate reference for each. Summaries of these approaches are also given in DR03, Peterson et al. (1998), and WMO (2003).

Slight modifications from the published versions were incorporated into two of the procedures. Lund and Reeves (2002) discuss a deficiency with two-phase regression (TPR) model approaches. They argue that serial dependence between the F values for adjacent potential change points as well as an unrealistic constraint that the two regression lines meet at the change point compromise the assumption that the null distribution follows an F distribution with 3 numerator and n−4 denominator degrees of freedom (F3,n−4), where n refers to the number of years in the time series. Through Monte Carlo simulations they show that the use of F3,n−4 leads to an overestimation of the true number of discontinuities. Their simulated F percentiles were used to assess the statistical significance of the change points identified in the TPR procedure.
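The essence of the modified TPR test can be sketched as follows. This is a simplified illustration under stated assumptions, not the published implementation: the F statistic takes the F(3, n−4) form quoted above, the two phases are fit as unconstrained lines, and a Monte Carlo percentile stands in for Lund and Reeves's simulated critical values.

```python
import numpy as np

rng = np.random.default_rng(1)

def sse_line(y, t):
    """Sum of squared errors of an ordinary least squares line y ~ a + b*t."""
    A = np.column_stack([np.ones_like(t), t])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def max_f_statistic(y, min_seg=5):
    """Maximum two-phase-regression F statistic over candidate changepoints,
    using the F(3, n-4) form referenced in the text."""
    n = len(y)
    t = np.arange(n, dtype=float)
    sse0 = sse_line(y, t)                 # single-line (null) fit
    best_f, best_c = 0.0, None
    for c in range(min_seg, n - min_seg):
        # two independent line fits, before and after the candidate point
        sse2 = sse_line(y[:c], t[:c]) + sse_line(y[c:], t[c:])
        f = ((sse0 - sse2) / 3.0) / (sse2 / (n - 4))
        if f > best_f:
            best_f, best_c = f, c
    return best_f, best_c

def simulated_critical_value(n, alpha=0.05, n_sims=1000):
    """Monte Carlo (1 - alpha) percentile of the max-F statistic under a
    homogeneous white-noise null, in the spirit of Lund and Reeves (2002)."""
    sims = [max_f_statistic(rng.normal(size=n))[0] for _ in range(n_sims)]
    return float(np.quantile(sims, 1.0 - alpha))

# A series is flagged as inhomogeneous when its max-F exceeds the simulated
# percentile rather than a tabled F(3, n-4) quantile.
```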

The Allen and DeGaetano (2000) procedure [the nonparametric metadata-based test (NMETA)] was developed specifically for use with annual extreme temperature exceedance series. Homogeneity adjustment of such series requires modifying the daily temperature threshold that defines an extreme, rather than adjusting the mean of the annual exceedance series. Iterative adjustments were applied and tested to obtain the appropriate threshold adjustment. Here, an analogous procedure was adopted by iteratively increasing (or decreasing) the annual averages by 0.1°C. In practice, a more direct method of determining the magnitude of the discontinuity could be implemented if this procedure proves superior in detecting the presence of inhomogeneities. The iterative adjustment has positive implications when this procedure is used with trended difference series.
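A rough illustration of the iterative idea described above follows. This is a hedged sketch only: `is_homogeneous` is a hypothetical stand-in for the NMETA test itself, and adjusting the post-break segment is our simplification of the published detrending and resampling machinery.

```python
import numpy as np

def iterative_offset(series, break_year, is_homogeneous,
                     step=0.1, max_iter=30):
    """Nudge the post-break segment in 0.1 degC increments until the
    supplied homogeneity test accepts the series (or max_iter is hit).
    Returns the accumulated adjustment in degC."""
    trial = np.asarray(series, dtype=float).copy()
    # Move the later segment toward the earlier one
    direction = -1.0 if trial[break_year:].mean() > trial[:break_year].mean() else 1.0
    total = 0.0
    for _ in range(max_iter):
        if is_homogeneous(trial, break_year):
            return total
        trial[break_year:] += direction * step
        total += direction * step
    return total
```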

3. Simulated data series

a. Homogeneous series

The simulated series, used as the basis for comparing homogenization methodologies, were rooted in actual climatological candidate and reference series groupings. Hereafter the candidate series refers to the data being tested for discontinuities, while the reference series is constructed from four adjacent sites with relatively high correlation. Table 2 summarizes the relevant statistics associated with five site groupings, chosen to represent a range of U.S. climate conditions. The original station data are 2-m air temperatures computed from the National Climatic Data Center (NCDC) TD-3200 daily data archive. Reference station data were taken from the same source and chosen to minimize the geographic distance from the candidate station while maintaining a between-station correlation of at least 0.70. Subsequently the site groupings (station networks) are referred to by their state postal code given parenthetically in Table 2.

The simulated series were generated using a multivariate normal model following a procedure outlined in Wilks (1999). Homogeneous sets of candidate and reference series were generated as

z_t = [Φ]z_t−1 + [B]e_t.     (1)

Here z_t is a vector of five stationary standardized (zero mean and unit variance) variables, one representing the candidate station and the remaining four the reference sites; [B] is a 5 × 5 matrix calculated from the correlations among the stations, and e_t is a vector of independent Gaussian random numbers. Since the lag-1 autocorrelations of the detrended observed annual mean temperature series (i.e., the residuals of a linear least squares fit) were not significant, the autocorrelation term [Φ]z_t−1 was omitted, reducing Eq. (1) to z_t = [B]e_t.
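A minimal sketch of Eq. (1) without the autocorrelation term, using a Cholesky factor as one valid choice of [B]. The correlation matrix below is illustrative only; the values actually used come from the station networks in Table 2.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 5x5 correlation matrix: candidate station first,
# the four reference stations after it (illustrative values).
R = np.array([
    [1.00, 0.85, 0.82, 0.80, 0.78],
    [0.85, 1.00, 0.84, 0.81, 0.79],
    [0.82, 0.84, 1.00, 0.83, 0.80],
    [0.80, 0.81, 0.83, 1.00, 0.82],
    [0.78, 0.79, 0.80, 0.82, 1.00],
])

B = np.linalg.cholesky(R)          # one choice of [B] with B @ B.T == R
n_years = 50
E = rng.normal(size=(n_years, 5))  # independent Gaussian e_t vectors
Z = E @ B.T                        # rows are z_t; corr(Z) approximates R
```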

This method is different from those used in the earlier assessments in that it preserves the observed between-station correlations. In some cases, the distribution of stations precludes between-station correlations of 0.8 or higher. Thus, the selection of an optimal homogenization method should consider its resilience to the correlation structure of the candidate and reference stations.

b. Artificial inhomogeneities

Step discontinuities were introduced into the individual z_t series given by Eq. (1) using the model

z*_tj = z_tj + δI,     (2)

where z*_tj is the simulated data value for year t at station j that is part of a time series with an inhomogeneity of magnitude δ at year i. Thus, I = 0 for t < i and I = 1 for t ≥ i. Five values of δ were evaluated, ranging from 0.11° to 0.55°C in 0.11°C increments (0.2°–1.0°F). This range represents a more stringent set of tests (smaller discontinuities) than was evaluated by DR03. In general, these discontinuities range from 0.15 to 1.5 times the typical candidate series σ. Each discontinuity was imposed at position (year) 6, 12, or 25 of a 50-yr simulated time series.

Multiple-step discontinuities were introduced in a similar manner. The ability of a homogenization procedure to detect multiple breaks is likely related to the time interval between consecutive breaks and the sign and magnitude of each discontinuity. Thus, specific conditions were imposed on series with multiple breaks, akin to the approach used by Easterling and Peterson (1995). These included two discontinuities characterized by breaks of equal magnitude and the same (and separately opposite) sign separated by short (6 yr), long (38 yr), and moderate (22 yr) time intervals. Simulated series with three and four discontinuities were also constructed with a range of discontinuity separations, step magnitudes, and signs.
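Eq. (2) and the multiple-break designs reduce to a few lines of code. A minimal sketch (the break positions and magnitudes below are one of the illustrative two-break patterns described above, not an exhaustive list):

```python
import numpy as np

def impose_steps(z, breaks):
    """Apply Eq. (2) for one or more step discontinuities: each delta is
    added to every value from its break year onward (I = 1 for t >= i)."""
    z_star = np.asarray(z, dtype=float).copy()
    for i, delta in breaks:
        z_star[i:] += delta
    return z_star

# Example: two 0.55 degC breaks of opposite sign separated by a moderate
# 22-yr interval, one of the two-break patterns described above.
z = np.zeros(50)                       # stand-in for a simulated series
z_star = impose_steps(z, [(12, 0.55), (34, -0.55)])
```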

c. Nonstationary series

In practice many of the existing data homogenization techniques are applied to nonstationary series. This situation arises in cases where a trend at the candidate station is not reflected in the reference series, perhaps due to a transition in land use around the candidate site. Allen and DeGaetano (2000) encountered nonstationary difference (candidate – reference) series in 17% of the homogeneity tests they applied.

Trends were introduced to homogeneous series generated by Eq. (1) using the model

z*_tj = z_tj + bt,     (3)

where b is the imposed slope (°C yr−1). Slopes of 0.001, 0.0025, and 0.005°C yr−1 were evaluated. Such trends are typical of those found by Allen and DeGaetano (2000), as well as of reported urbanization and land use trends (Kalnay and Cai 2003; Vose et al. 2004).

4. Results

a. Homogeneous series

Table 3 shows the percentage of the 1000 homogeneous time series in which at least one false discontinuity was detected. For the parametric metadata-based test (PMETA) and NMETA it was assumed that metadata indicated a physical change at year 25, but the change did not produce a climatologically significant discontinuity in the data record.

It is not surprising that the values in Table 3 are typically near 5%, as the 95% confidence interval was used as the common basis for discontinuity identification. The results are in general agreement with those reported by DR03 with three exceptions. The probability of Type I errors is markedly lower for the Bayesian test (BAYE), TPR, and PMETA.

The reduction in Type I errors using TPR is related to the Lund and Reeves (2002) modification. Without this change, the probability of Type I errors for TPR increases to near 20%, which is still considerably lower than the 41.3% value reported by DR03. It appears that the Type I error rate for TPR is influenced by the length of the simulated time series. Replacing the 50-yr series with 100-yr series, as evaluated by DR03, results in Type I error rates that approach the 41% value cited.

Conversely, BAYE is stringent in terms of limiting Type I errors. Inhomogeneities are incorrectly flagged in only about 1.5% of the cases, as opposed to the approximate 7% Type I error rate reported by DR03. The Bayesian procedure used in the present study follows that outlined by Perreault et al. (1999) and is able to replicate their results.

PMETA is associated with the highest Type I error rates. This was also the case in DR03, who reported false step changes in over 50% of the homogeneous series they tested using a modified version of this technique. The Type I error rate in the present analysis is considerably lower than that reported in DR03. This apparently is related to DR03's modification of the procedure to identify the most probable position of a break, rather than relying on metadata to preselect potential break positions. They also applied the Wilcoxon rank sum test as opposed to the Student's t test used by Karl and Williams (1987).

The above discrepancies in Type I error rates preclude an unbiased comparison of the methods in terms of their ability to accurately detect imposed discontinuities. In subsequent analyses, the significance requirements of the offending procedures were altered to provide a standard average Type I error rate of 5.0% ± 1.4%. In the case of TPR, the secondary t test (used to declare trended time series as inhomogeneous) was run at the α = 0.01 level to achieve this result, as opposed to the α = 0.05 level in Easterling and Peterson (1995). Likewise, the α level used in PMETA was reduced from 0.05 to 0.01. In BAYE, the Type I error rates were increased by reducing the prior probability for a change assumed by Perreault et al. (1999) from p = 0.5 to p = 0.25.

Figure 1 shows the magnitude and position of the falsely identified inhomogeneities for the nonmetadata procedures. In a separate analysis, there was no consistent relationship between the Type I error rate and between-series correlation or candidate station σ. Thus, the Florida station network is used as a representative example. As in DR03, both the number and magnitude of Type I errors increase toward both ends of the series. This is a function of the size of the subset of the overall series that is available for testing. This tendency is not as well defined for the two regression-based results [multiple linear regression (MLR) and TPR]. Also, as noted in DR03, the majority of falsely identified inhomogeneities are associated with offsets of 0.4°C or less.

b. Single-step series

The ability of each procedure to detect the position and magnitude of a single inhomogeneity varied with the position and size of the shift as well as the set of stations used to simulate the candidate and reference series. Figure 2 illustrates the combined effect of time series variability and candidate–reference station correlation. Four methods are highlighted given the similarity between the results for 1) Potter's method (POTT) and the standard normal homogeneity test (SNHT), 2) MLR and TPR, and 3) PMETA and NMETA. In all cases, the ratio of the magnitude of the imposed inhomogeneity to the standard deviation of the candidate series (hereafter σ anomaly) is the dominant factor affecting the proportion of imposed discontinuities that are detected. Caussinus and Mestre (2004) report a similar finding. While Fig. 2 allows the influence of σ anomaly and candidate–reference series correlation, r, to be assessed separately, these values could be collapsed into a single variable, Q, representing the ratio of σ anomaly to (1 − r²)^1/2. For a given Q, the proportion of imposed discontinuities that are detected is nearly constant.
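As a worked example of collapsing the two predictors: since Q = (σ anomaly)/(1 − r²)^1/2, a 0.6σ anomaly paired with r = 0.8 gives Q = 0.6/(1 − 0.64)^1/2 = 0.6/0.6 = 1.0, the same Q, and hence roughly the same detection proportion, as a 1.0σ anomaly paired with an uncorrelated (r = 0) reference series.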

The ability to identify greater than 90% of the discontinuities (within ±2 yr of the imposed position) is associated with a relatively wide range of correlations and σ anomalies in PMETA and NMETA. Using PMETA, all but 10% of the 1σ anomaly discontinuities can be identified with candidate–reference series correlations as low as 0.65 (Fig. 2d). Only POTT and SNHT are able to identify this percentage of 1σ anomaly discontinuities (Fig. 2a). However, this can only be achieved for the highest (0.95) between-station correlations.

The two regression-based techniques (TPR and MLR) consistently detect fewer than 80% of the 0.55°C (approximately 1σ anomaly) breaks. This pattern of weaker performance also extends to a more modest 0.33°C (0.6 σ anomaly) shift in the mean. The Bayesian approach is similar to SNHT and POTT in its ability to detect (and correctly position) a single break. In terms of station location, all tests exhibit superior performance for series that simulate the temperature climatology and reference station characteristics of NY. Collectively, the test procedures identify the fewest discontinuities for the FL and MN station networks, owing to their low between-station correlation and high standard deviation, respectively.

No method is able to reliably detect small (0.11°C, <0.4σ anomaly) shifts in the mean (Fig. 2). NMETA and PMETA identify the highest (although still less than 50% in most cases) percentage of these small breaks.

In all procedures the ability to identify a break increases with the length of the shortest homogeneous segment of the series (Fig. 3). Using POTT (and SNHT), there is a sharp decline in the number of identified discontinuities in series shorter than 21 yr (Fig. 3a). This influence diminishes as series length increases such that test performance is essentially unaffected by series length when more than 50 yr are available. The effect of series length is similar using MLR and BAYE (Figs. 3b,c) as well as TPR (not shown). However, the performance of these procedures does not decline as markedly for series with less than 21 data points.

In Fig. 3d, the performance of PMETA is sensitive to series length across the range of lengths tested, for σ anomalies less than 0.7. The influence of series length diminishes in more-than-50-yr series for σ anomalies near 1.0. For larger 1.67 σ anomalies, little change in the proportion of identified discontinuities is noted for series lengths >21 yr. NMETA behaves similarly.

The results presented in Fig. 3 also apply to the position of the break within a series. When the break was positioned near the beginning of the 50-yr series such that only a 5-yr segment existed before the break, all methods experience a decline in their ability to detect the break. Averaged over the five stations highlighted in Fig. 2, the proportion of 0.11° and 0.55°C discontinuities that are detected changes by less than 0.05 when these steps are imposed at year 12 as opposed to year 25 of a 50-yr series. This difference increases, reaching almost 0.15 for the regression-based procedures when the step is introduced at year 6 rather than year 25. When the magnitude of the break is modest (e.g., 0.6 σ anomaly), the influence of break position is more important. The proportion of breaks detected by POTT, SNHT, BAYE, PMETA, and NMETA decreases by 0.20–0.30 depending upon whether the step is imposed at year 25 or year 6 of a 50-yr series.

Figures 2 and 3 do not provide information as to the skill of the procedures at identifying the correct magnitude of the imposed break, nor do they provide information on the positions of incorrectly identified breaks. Using a 0.33°C break positioned at year 12 as a representative example, Fig. 4 compares the procedures using simulated data for WA. The results for the other stations are similar. All of the methods tend to overestimate the magnitude of the offsets (Fig. 4a). The median offset identified by PMETA, however, is essentially equal to the imposed 0.33°C shift. Most methods display a similar degree of variation in the magnitude of the offset, with the smallest interquartile ranges associated with SNHT and POTT. In approximately 5% of the cases, the regression-based approaches (MLR and TPR) identify an offset that is of opposite sign to the imposed offset. This also occurs to a lesser degree in the NMETA approach. Presumably, these cases contain spurious trends prior to and/or after the imposed step.

For all methods, the median position of the inhomogeneity corresponds to the position at which the break was introduced (Fig. 4b). The interquartile range for the MLR procedure is relatively large for the Washington station network, but one of the smallest at the New York site (not shown). Some techniques and station networks exhibit skewed distributions of break positions. For NY, the upper quartile break position is equivalent to the median using TPR and BAYE, whereas the SNHT method places a higher percentage of breaks after the correct position.

Although computed, box plots for the other break positions and magnitudes are not shown for brevity. For 0.11°C breaks, the interquartile ranges of break positions encompass as many as 20 yr in all procedures. Conversely, the largest (0.55°C) breaks typically had interquartile ranges of positions limited to the single break year. The magnitude of the largest breaks was also well specified by all procedures. There was a slight tendency for the variability of the step magnitudes to increase as the break position moved closer to the beginning (or end) of the time series. This was apparent for both imposed 0.55° and 0.33°C breaks.

c. Trended series

As trended difference series are not uncommon in the observed record, it is fruitful to quantify how this attribute affects each of the homogeneity tests. For series without an imposed step change, the number of falsely detected inhomogeneities increases with the magnitude of the trend (Table 4). The effect of a relatively small 0.001°C yr−1 trend is minimal, with Type I error rates in the range from 4.5% to 8.0%. The existence of a modest 0.0025°C yr−1 trend more than doubles the number of Type I errors associated with most of the procedures. The two regression-based procedures (TPR and MLR) are notable exceptions, with relatively small increases in Type I errors. Likewise, Type I errors for NMETA remain below 10%. This procedure specifically accounts for trended difference series.

For the largest 0.005°C yr−1 trend, only the regression-based procedures provide reasonable Type I error rates. The New York group is particularly vulnerable, with Type I error rates exceeding 50% for SNHT and POTT. Interaction between the slope magnitude and the NY candidate station's relatively low standard deviation apparently influences the error rate.

Figure 5 illustrates the effect of positive trends on a series with a 0.33°C step discontinuity imposed at year 12. At all stations (NY is shown as a representative example) and for all test methods, the effect of the positive slope is to overadjust the discontinuity. If a negative slope had been imposed, this discontinuity would be underestimated. The interquartile ranges of the adjustments show little change with slope magnitude, except in NMETA. The nonregression-based methods (with the exception of NMETA) are affected the most, particularly for the 0.005°C yr−1 trend. Here the overadjustment is approximately 0.1°C, about half the increase introduced by the trend in the portion of the series after the imposed break.

Of the methods compared, MLR and NMETA are the only two that specifically account for trended difference series. The overadjustment for MLR is among the smallest. However, since this method is prone to overestimation in the absence of a trend, the adjustment biases are comparable to those for the other methods. The median adjustments for NMETA deviate the least from the imposed 0.33°C discontinuity in the presence of a trend. The relatively favorable performance of TPR appears to be an artifact of the standardization of its Type I error rate. To achieve the desired rate, the probability of rejecting the null hypothesis of homogeneity in the secondary t test was reduced. The intent of this test in the original procedure was to deem nonstationary difference series inhomogeneous. However, the secondary test was responsible for the higher-than-expected Type I error rates when applied to stationary series (Table 3).

There are two other features in Fig. 5. First, as was the case with the stationary series, MLR, and to a lesser extent TPR, are associated with a number of large outliers. These offsets are of the opposite sign of the imposed discontinuity. Adjustments based on these values would increase (rather than mitigate) the effect of the imposed break. While this occurs in less than 5% of the cases, such errors are particularly troublesome. Second, the offsets given by NMETA display a unique response to trend magnitude. Rather than experiencing a simple translation of the median and quartile adjustments, the effect of the increasingly steeper trends is to broaden the range of offsets, while keeping the median offset relatively close to the imposed 0.33°C discontinuity. This increase in variability is related to the detrending procedure used by Allen and DeGaetano (2000), which reflects both the upper and lower bounds of the 95% regression confidence interval, as well as the iterative testing of a range of possible adjustments.

The imposed slopes also affect the distribution of the break positions. In general, the 75th percentile expands to between years 13 and 15 (as opposed to year 12 or 13 in the stationary case) when a 0.005°C yr−1 trend is imposed (not shown). In all cases, the median and 25th percentile break years remain unchanged in the trended cases.

d. Multiple discontinuities

Table 5 compares the methods based on their ability to identify the correct number of inhomogeneities when more than one is introduced. SNHT is superior in identifying the correct number of breaks when two shifts in the mean are imposed on the series. When the interval between two discontinuities is relatively short, the TPR method identifies a greater percentage of the discontinuities. As more discontinuities are introduced, TPR, POTT, and SNHT identify the greatest percentage of breaks. TPR performance is best when sequential breaks are of opposite sign.

With regard to the placement of the breaks, when a method is able to identify the existence of one or multiple breaks, the placement of the break is generally accurate with little variation in position (Fig. 6). TPR tends to spread the position of the break over a two-or-more-position window, while the position histograms for POTT and SNHT exhibit more distinct, nearly single-position peaks. In TPR, breaks are detected with almost equal likelihood in both the correct year and one year earlier (e.g., Fig. 6c). SNHT and POTT appear to be more likely to identify multiple breaks that occur near the beginning and end of the time series, relative to breaks within the intervening portion of the record, particularly when the discontinuities alternate in sign (Figs. 6b and 6c).

MLR is characterized by spurious breaks. These tend to be positioned near the beginning and end of the time series (Fig. 6d) or near the midpoint of sequential breaks (Fig. 6b). BAYE generally captures the position of the imposed breaks, but it has difficulty identifying the existence of multiple breaks, as indicated by its low percentages of identified breaks. MLR also exhibits this tendency.

Figure 7 shows box plots of the magnitude of the detected breaks from series with multiple discontinuities. In all cases the analyzed breaks are either +0.55 or –0.55°C. For sequential breaks with alternating sign (Figs. 7a–c), TPR is appealing in that the detected discontinuities exhibit relatively little bias and modest spread. In general the biases associated with one discontinuity are compensated by those associated with the other discontinuities. This is a characteristic of all the procedures.

For sequential breaks of the same sign (Figs. 7d–f), TPR appears to be an attractive choice as well. It is associated with relatively small biases and considerably lower variability than the other methods. It should be pointed out that the seemingly high variability (and large offsets) associated with the other methods is an artifact of the methods detecting fewer than the total number of imposed discontinuities. As a result, the magnitudes of the detected discontinuities are inflated by the presence of the undetected inhomogeneity. This is particularly apparent in Figs. 7d–f. In contrast, the narrow (single line) BAYE and MLR box plots correspond to cases where few (or no) discontinuities were detected.

Figure 3d provides a means for comparing the PMETA and NMETA procedures with the others in terms of identifying multiple discontinuities in a series. When multiple potential breaks are indicated by the metadata, the data series must be divided into subseries, each containing only one of the potential discontinuities, and separate PMETA or NMETA tests must be conducted on each subseries individually. Thus, the ability to detect multiple breaks in a series is related to the procedures' capacity to identify single breaks in several relatively short sequential time series.

If PMETA were applied to the series depicted in Figs. 6c and 6f, four separate tests would be conducted on four 20-yr subseries (encompassing years 0–20, 10–30, 20–40, and 30–50), each with a potential discontinuity introduced after the tenth value. This is analogous to the 21-yr candidate series depicted in Fig. 3. Based on the subseries, POTT, BAYE, and PMETA each identify about 70% of the 1.0σ anomaly discontinuities (Fig. 3). When the entire 50-yr series was analyzed (Fig. 6f), the performance of POTT decreases slightly, with more than 60% of the discontinuities being identified. BAYE shows a marked decline in performance. This is due to the procedure's strong tendency to combine sequential discontinuities, thus overestimating their magnitude (Fig. 7f) and misplacing their position (Fig. 6f). This is a significant problem in both MLR and BAYE when sequential discontinuities are of the same magnitude but opposite sign (Figs. 6c and 7c).
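A minimal sketch of the subseries construction described above (the function name and the fixed half-window are our own choices; the windows match the 20-yr example for breaks documented at years 10, 20, 30, and 40):

```python
def metadata_subseries(series, documented_breaks, half_window=10):
    """Split a record into one window per documented break so that each
    window contains exactly one potential discontinuity, located
    half_window values in (clipped at the record's ends)."""
    n = len(series)
    out = []
    for b in documented_breaks:
        lo, hi = max(0, b - half_window), min(n, b + half_window)
        out.append((series[lo:hi], b - lo))  # (subseries, break position)
    return out

# For breaks documented at years 10, 20, 30, and 40 of a 50-yr record,
# this yields the four 20-yr windows (years 0-20, 10-30, 20-40, 30-50)
# described above, each with the potential break after its tenth value.
subs = metadata_subseries(list(range(50)), [10, 20, 30, 40])
```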

5. Discussion

Based on Fig. 2 and several notable differences between the present results and those presented by DR03, it is apparent that several station-specific attributes influence the performance of homogenization procedures. For a fixed candidate–reference series correlation, the proportion of 0.33° and 0.55°C discontinuities that were identified (p) generally decreases logarithmically with decreasing σ anomaly. In the regression-based procedures (TPR and MLR), p decreases exponentially with decreasing σ anomaly for high (0.95–0.80) fixed correlations. This is an indication that p has yet to reach a plateau within the range of σ anomalies examined (Fig. 2b). If larger σ anomalies were considered, the relationship between p and σ anomaly would tend toward logarithmic for these methods as well.

For a fixed σ anomaly >1 (i.e., discontinuity magnitude > series standard deviation), p generally decreases logarithmically with decreasing candidate–reference series correlation (r). For σ anomalies <1, this relationship becomes exponential. Candidate series length also tends to be logarithmically related to p. Across the range of series lengths from 11 to 100 yr, p increases logarithmically using MLR, TPR, BAYE, PMETA, and NMETA. For SNHT and POTT this logarithmic increase is limited to series lengths greater than 21 yr.

The percentage of discontinuities that are identified is also a function of candidate series lag-1 autocorrelation, r1. For a fixed σ anomaly and r = 0.8, POTT and BAYE detect fewer 0.55°C discontinuities with increasing autocorrelation in the range of −0.2 to 0.2 (Figs. 8a and 8c). A similar tendency is noted for SNHT (not shown). This dependency is linear, with a greater rate of decrease for the smaller σ anomalies.

For MLR (Fig. 8b), and similarly TPR (not shown), discontinuity detection is maximized for series with no serial correlation. Particularly for the MLR method, the percentage of discontinuities that are identified decreases as the absolute value of the lag-1 autocorrelation increases. Thus, this relationship is best characterized as quadratic.

PMETA is relatively resistant to differences in lag-1 autocorrelation (Fig. 8d). The relatively subtle decline in p (particularly for σ anomalies <1) is linear as was the case for POTT and BAYE. Likewise, autocorrelation only weakly influences NMETA.

There is no consistent relationship between the Type I error rates and between-series correlation or candidate series standard deviation. Although there is a consistent increase in Type I errors with increasing series length for most methods, the difference in Type I error rate is small (less than 3%) across the range of series lengths evaluated. Lag-1 autocorrelation, however, does influence the Type I error rates of all procedures, with higher (positive) autocorrelation leading to more Type I errors. This effect is most pronounced in the MLR, POTT, and SNHT methods. For the MLR method, the percentage of series with Type I errors increases from 5% (for autocorrelation near 0) to 25% in series with a lag-1 autocorrelation equal to 0.2. For POTT and SNHT, the largest Type I error rates are near 20% when lag-1 autocorrelation equals 0.2.

In combination, Figs. 2 and 3, along with Fig. 8, provide a means to explain the differences between the present results and those given by DR03. Between 5% and 35% (the latter in the case of MLR) more single 0.55°C artificial steps were identified in Fig. 2 than were reported by DR03 using SNHT, BAYE, MLR, and TPR. These differences arise solely from the statistical properties of the series analyzed. For series with properties similar to those used by DR03 (i.e., r = 0.8, σ anomaly = 1.0, and r1 = 0.1), the number of identified 0.55°C steps is within 10% of those reported by DR03 (Fig. 8), with the present results indicating fewer detected discontinuities.

Accounting for the influence of series length (DR03 use 100-yr series as opposed to 50-yr intervals) increases the percentage of correctly identified discontinuities and brings the results of the present study to within 5% of those given by DR03. TPR is a notable exception owing to the adjustment needed to standardize the Type I error rate. As the test was made less aggressive (the probability of falsely rejecting the null hypothesis of homogeneity was lowered), it also detects about 10% fewer of the imposed discontinuities than reported by DR03.

6. Summary and conclusions

a. Individual methods

It is difficult to declare one of the seven common data homogenization routines tested a panacea for detecting and adjusting discontinuities in annual average temperature series. Each is associated with its own set of strengths and weaknesses. In this regard, it is possible to offer a number of generalizations describing each methodology.

1) SNHT and POTT

Pros:

  • Aside from PMETA and NMETA, identified the greatest number of imposed single discontinuities within 20-or-more-year series

  • Among the most favorable when applied to series with multiple discontinuities

  • Type I error rates were as expected based on the selected α level

  • Computationally inexpensive and statistically the least complex

Cons:

  • Performance declines markedly when discontinuities are separated by fewer than 10 yr

  • Not resilient to the presence of trends in the difference series

2) TPR

Pros:

  • Ability to detect multiple breaks, particularly when sequential breaks are close in time or of opposite sign

  • Fairly resilient to nonstationary difference series, when Type I error rate is standardized

Cons:

  • Less apt (compared to SNHT and POTT) to identify breaks in series with a single imposed discontinuity

  • Even when modified as per Lund and Reeves (2002), the number of Type I errors was greater than expected

  • Offset magnitudes are more variable than those given by other procedures for single discontinuities

  • Tends to specify the position of the break or breaks with less precision than the other methods

3) MLR

Pros:

  • Resilient to nonstationary difference series

Cons:

  • Difficulty in identifying single and multiple breaks except when magnitude approaches 1.67 σ anomalies

  • Distributions of offset magnitudes and break positions are most variable

4) BAYE

Pros:

  • Performance is comparable to POTT and SNHT for the large (>1 σ anomalies) single step changes

Cons:

  • Combines sequential discontinuities in series with multiple breaks

5) PMETA and NMETA

Pros:

  • Detect the highest percentage of imposed single breaks

  • Comparable performance to POTT and SNHT in series with multiple discontinuities

  • NMETA is resistant to trended difference series

Cons:

  • Without modification, PMETA is plagued by a high Type I error rate

  • PMETA is not resilient to the presence of trends

  • Require the incorporation of accurate and complete station metadata

  • Development of reference series is cumbersome

  • NMETA is computationally expensive due to resampling

b. General findings

Using 50-yr simulated time series with no serial correlation, small breaks (σ anomalies ≤ 0.4) are poorly detected by all homogenization methods given the inherent variability of annual temperature series at stations in the contiguous United States. Here, σ anomaly refers to the ratio of the magnitude of the imposed discontinuity to the standard deviation of the candidate series. In general, 0.60σ anomaly breaks represent a practical lower limit of detectability for the methods (and time series lengths) evaluated. Between 70% and 90% of these breaks, which typically represent temperature steps of 0.33°C at continental U.S. stations, are detected, provided that the candidate–reference series correlation exceeds 0.80. TPR and MLR are exceptions, as they consistently detect fewer than half of these discontinuities. Time series of more than 11 yr are necessary for the successful application of nonmetadata-based methods.

A number of the methods evaluated have been used in practice to create reference temperature time series. Typically the choice of method has followed national boundaries. For instance, the U.S. Historical Climatology Network employs PMETA. Vincent and Gullet (1999) use MLR to create a temperature climatology for Canada. POTT has been used in Australia (Plummer et al. 1995), and Norwegian meteorological records are homogenized using SNHT (Peterson et al. 1998). Data homogeneity in the Global Historical Climate Network is addressed using TPR. The global dataset of Jones and Moberg (2003) merges station data that have been homogenized using most of the evaluated techniques. This differs from earlier versions of this dataset in which a more consistent but arduous station-by-station homogenization was conducted (Jones et al. 1985).

Given the differences in method performance, inhomogeneities have been adjusted to varying degrees in each of these datasets. For instance, inhomogeneities of a fixed magnitude (in terms of σ anomaly) may go unadjusted in datasets employing MLR, while similar discontinuities are adjusted in datasets homogenized using SNHT. Alternatively, it is likely that datasets homogenized by TPR and PMETA contain records that are overhomogenized, given the greater-than-expected Type I error rates that were identified. Collectively, it is unlikely that this overadjustment has introduced a systematic bias, given that the Type I errors are equally likely to represent positive and negative discontinuities.

In trended difference series, there is a systematic tendency for all methods (except NMETA) to alias a portion of the trend onto the estimated magnitude of the step change. In most cases, about half of the underlying trend is falsely included as part of the step change. Such a tendency is a fundamental problem with homogenization, particularly when the goal is to assess trends in the homogenized records. Future efforts in the development of homogenization techniques should consider incorporating procedures that specifically account for the presence of trended difference series.

It is possible that the influence of between-station correlation, autocorrelation, and time series variance has introduced consistent spatial variations in the degree of homogenization to global and regional datasets. Larger unadjusted discontinuities are likely at stations representing climates with high lag-1 autocorrelation or high year-to-year variability. Likewise, station records from climates characterized by relatively high spatial variability or low station density (and hence lower candidate reference series correlation) are likely to contain unadjusted discontinuities. The overall effect of these spatial biases is most problematic in cases where a systematic discontinuity has been introduced to the record, perhaps through a change in instrumentation or when station relocations introduce a consistent change in station environment. An assessment of these biases is an anticipated extension of this work.

Acknowledgments

This work was supported by the National Oceanic and Atmospheric Administration, National Climatic Data Center, under Contract EA133E-02-CN-0033. Thanks are also extended to Dan Wilks for his help in generating the simulated data series. The comments of two anonymous reviewers enhanced the revised manuscript.

REFERENCES

  • Alexandersson, H., 1986: A homogeneity test applied to precipitation data. J. Climatol., 6, 661–675.
  • Allen, R. J., and A. T. DeGaetano, 2000: A method to adjust long-term temperature extreme series for nonclimatic inhomogeneities. J. Climate, 13, 3680–3695.
  • Banks, E., 2002: Weather Risk Management: Markets, Products and Applications. Palgrave, 366 pp.
  • Caussinus, H., and O. Mestre, 2004: Detection and correction of artificial shifts in climate series. Appl. Stat., 53, 405–425.
  • DeGaetano, A. T., 1999: A method to infer observation time based on day-to-day temperature variations. J. Climate, 12, 3443–3456.
  • Ducré-Robitaille, J. F., L. A. Vincent, and G. Boulet, 2003: Comparison of techniques for detection of discontinuities in temperature series. Int. J. Climatol., 23, 1087–1101.
  • Easterling, D. R., and T. C. Peterson, 1992: Techniques for detecting and adjusting for artificial discontinuities in climatological time series: A review. Preprints, Fifth Int. Meeting on Statistical Climatology, Toronto, ON, Canada, Steering Committee for International Meetings on Statistical Climatology, J28–J32.
  • Easterling, D. R., and T. C. Peterson, 1995: A new method for detecting undocumented discontinuities in climatological time series. Int. J. Climatol., 15, 369–377.
  • Jones, P. D., and A. Moberg, 2003: Hemispheric and large-scale surface air temperature variations: An extensive revision and an update to 2001. J. Climate, 16, 206–223.
  • Jones, P. D., and Coauthors, 1985: A grid point temperature data set for the Northern Hemisphere. Tech. Rep. TR022, Carbon Dioxide Research Division, U.S. Department of Energy, 251 pp.
  • Kalnay, E., and M. Cai, 2003: Impact of urbanization and land-use change on climate. Nature, 423, 528–531.
  • Karl, T. R., and C. W. Williams, 1987: An approach to adjusting climatological time series for discontinuous inhomogeneities. J. Climate Appl. Meteor., 26, 1744–1763.
  • Lund, R., and J. Reeves, 2002: Detection of undocumented changepoints: A revision of the two-phase regression model. J. Climate, 15, 2547–2554.
  • Perreault, L., M. Haché, M. Slivitzky, and B. Bobée, 1999: Detection of changes in precipitation and runoff over eastern Canada and U.S. using a Bayesian approach. Stochastic Environ. Res. Risk Assess., 13, 201–216.
  • Perreault, L., J. Bernier, B. Bobée, and E. Parent, 2000: Bayesian change-point analysis in hydrometeorological time series. Part 2. Comparison of change-point models and forecasting. J. Hydrol., 235, 242–263.
  • Peterson, T. C., and Coauthors, 1998: Homogeneity adjustments of in situ atmospheric climate data: A review. Int. J. Climatol., 18, 1493–1517.
  • Plummer, N., Z. Lin, and S. Torok, 1995: Trends in diurnal temperature range over Australia since 1951. Atmos. Res., 37, 79–86.
  • Potter, K. W., 1981: Illustration of a new test for detecting a shift in mean in precipitation series. Mon. Wea. Rev., 109, 2040–2045.
  • Quayle, R. G., D. R. Easterling, T. R. Karl, and P. Y. Hughes, 1991: Effects of recent thermometer changes in the Cooperative Station Network. Bull. Amer. Meteor. Soc., 72, 1718–1723.
  • Vincent, L. A., 1998: A technique for the identification of inhomogeneities in Canadian temperature series. J. Climate, 11, 1094–1104.
  • Vincent, L. A., and D. W. Gullet, 1999: Canadian historical and homogeneous temperature datasets for climate change analyses. Int. J. Climatol., 19, 1375–1388.
  • Vose, R. S., T. R. Karl, D. R. Easterling, C. N. Williams, and M. J. Menne, 2004: Climate (communication arising): Impact of land-use change on climate. Nature, 427, 213–214.
  • Wilks, D. S., 1999: Simultaneous stochastic simulation of daily precipitation, temperature and solar radiation at multiple sites in complex terrain. Agric. For. Meteor., 96, 85–101.
  • WMO, 2003: Guidelines on climate metadata and homogenization. WCDMP-No. 53, WMO-TD No. 1186, World Meteorological Organization, Geneva, Switzerland, 51 pp.

Fig. 1. Scatterplots showing the position and magnitude of falsely identified inhomogeneities for five methodologies.

Fig. 2. Proportion of single-step discontinuities correctly identified as a function of average candidate–reference series correlation and σ anomaly using (a) POTT, (b) MLR, (c) BAYE, and (d) PMETA. The vertical axis is not drawn to scale. The statistical characteristics of the stations in Table 2 are indicated on each panel based on a 0.55°C discontinuity.

Fig. 3. As in Fig. 2 but for the proportion of single-step discontinuities correctly identified as a function of candidate series length and σ anomaly. A candidate–reference series correlation of 0.80 and a 0.55°C discontinuity, imposed at the midpoint of each series, are used in all cases.

Fig. 4. Box plots showing the distribution of identified (a) offset magnitudes (°C) and (b) offset years as a function of methodology. In all cases the methods are applied to a series with a 0.33°C offset imposed at year 12 (denoted by the dashed lines) based on the WA station network.

Fig. 5. Box plots of the mean offset given by each homogenization procedure for stationary (leftmost box plot in each group) and nonstationary series with a 0.33°C discontinuity imposed at year 12 using the NY station network. Results for 0.001°, 0.0025°, and 0.005°C yr−1 trends are represented by the second through fourth (from left) box plots in each group.

Fig. 6. Position of the detected 0.55°C inhomogeneities in series with breaks imposed at (a),(d) years 12 and 38; (b),(e) years 12, 25, and 38; and (c),(f) years 10, 20, 30, and 40, as indicated by the dashed lines. In (a)–(c) sequential breaks are of opposite sign, while in (d)–(f) sequential breaks have the same sign.

Fig. 7. Box plots of mean offset given by each homogenization procedure for each break in series with multiple breaks. Panel letters correspond to the break positions and patterns indicated in Fig. 6.

Fig. 8. Proportion of single-step discontinuities correctly identified as a function of average candidate series lag-1 autocorrelation and σ anomaly using (a) POTT, (b) MLR, (c) BAYE, and (d) PMETA. A candidate–reference series correlation of 0.80 is used in all cases. The vertical axis is not drawn to scale. The statistical characteristics of the time series evaluated by DR03 are indicated by the square in each panel.

Table 1. Summary of homogenization procedures included in the comparison.

Table 2. Standard deviation of annual temperature at the candidate stations (Can) and correlation (r) characteristics of the network of reference sites (Ref) associated with each candidate station. Stations are identified by their National Climatic Data Center Cooperative Observer Network identifier. Data periods and lag-1 autocorrelations of the detrended candidate series are also given.

Table 3. Percentage of 50-yr time series with at least one falsely detected step inhomogeneity when applied to 1000 homogeneous series, by method. Bold values indicate the highest proportion of Type I errors for each method.

Table 4. Percentage of 50-yr time series with at least one falsely detected step inhomogeneity, by method, when applied to 1000 nonstationary homogeneous candidate station series. Unless otherwise noted, series are based on the characteristics of the New York station network. Simulation locations refer to those described in Table 2.

Table 5. Percentage of multiple-step inhomogeneities identified, by method, when applied to 1000 stationary 50-yr candidate station series with 0.55°C breaks imposed at the indicated positions (years). Two break patterns are tested: one in which sequential breaks are of the same sign (Inc) and the second in which sequential breaks are of opposite sign (Alt). Values for 0.33°C discontinuities are given in parentheses. All tests are based on the New York station network. Bold values indicate the highest identification rates.