A Novel Approach for the Detection of Inhomogeneities Affecting Climate Time Series

Andrea Toreti Department of Geography, Climatology, Climate Dynamics and Climate Change, Justus-Liebig University of Giessen, Giessen, Germany

Franz G. Kuglitsch Institute of Geography, and Oeschger Centre for Climate Change Research, University of Bern, Bern, Switzerland

Elena Xoplaki Institute of Geography, University of Bern, Bern, Switzerland, and Department of Geography, Climatology, Climate Dynamics and Climate Change, Justus-Liebig University of Giessen, Giessen, Germany

Jürg Luterbacher Department of Geography, Climatology, Climate Dynamics and Climate Change, Justus-Liebig University of Giessen, Giessen, Germany


Abstract

Sudden changes caused by nonclimatic factors (inhomogeneities) usually affect instrumental time series of climate variables. To perform robust climate analyses based on observations, a proper identification of such changes is necessary. Here, an approach (named the “GAHMDI” method, after its components and purpose) that is based on a genetic algorithm and hidden Markov models is proposed for detection of inhomogeneities caused by changes in the mean and variance. Simulated series and a case study (winter precipitation from a weather station located in Milan, Italy) are set up to compare GAHMDI with existing methodologies and to highlight its features. For the identification of a single changepoint, GAHMDI performs similarly to other methods (e.g., standard normal homogeneity test). However, for the identification of multiple inhomogeneities and changes in variance, GAHMDI returns better results than three widespread methods by avoiding overdetection. For future applications and research in the homogenization of climate datasets (temperature and precipitation) the use of GAHMDI is encouraged, preferably in combination with another detection procedure (e.g., the method of Caussinus and Mestre) when metadata are not available. Since GAHMDI is developed in the generic context of time series segmentation, it can be applied to series of generic variables—for instance, those related to economics, biology, and informatics.

Corresponding author address: Andrea Toreti, University of Giessen, Senckenbergstr. 1, Giessen, Germany 35390. E-mail: andrea.toreti@geogr.uni-giessen.de

1. Introduction

Long instrumental time series of climate variables are often affected by abrupt changes caused by climatic and/or nonclimatic factors. In a time series context, the identification of abrupt changes is equivalent to the subdivision of a series into segments characterized by homogeneous statistical features (e.g., mean and higher-order moments). Time series segmentation, or changepoint detection, is a broad topic involving several application fields, such as information systems (Tartakovsky et al. 2006), neurology (Robinson et al. 2010), genetics (Tai et al. 2010), and economics (Koop and Potter 2009). Changepoint problems have also received increasing attention in geophysical research (e.g., Reeves et al. 2007). Many studies have recognized that abrupt changes affect climate records, like the North Pacific Ocean mean sea level pressure (Trenberth 1990), stratospheric temperature (Pawson et al. 1998), and surface air temperature (e.g., Miranda and Tomé 2009; Toreti and Desiato 2008). Besides these climate-related changes, nonclimatic factors (e.g., relocation of the weather station, changes of instrumentation) usually cause sudden changes as well (Kuglitsch et al. 2009, and references therein). Therefore, the identification and attribution of abrupt changes in time series are essential tasks for an accurate analysis of climate and climate change (e.g., Randall et al. 2007; Jansen et al. 2007).

Various methods, characterized by different features, are generally applicable—for example, Bayesian methods (Perreault et al. 2000; Ray and Tsay 2002), likelihood ratio tests (Kim and Siegmund 1989), dynamic algorithms (Bai and Perron 1998), optimal least squares segmentation, and hidden Markov models (Hubert 1997; Kehagias 2004; Kehagias et al. 2005; Kehagias and Fortin 2006). Some methods have also been developed for specific tasks such as the homogenization of climate data (e.g., Alexandersson and Moberg 1997; Easterling and Peterson 1995; Lund et al. 2007; Wang et al. 2007; Lu et al. 2010).

In focusing on changepoints attributable to nonclimatic factors (i.e., inhomogeneities or break points), identification and correction (whenever possible) are ideally performed after standard quality control procedures (e.g., Moberg et al. 2006). For instance, trustworthy trend assessments and analyses of extreme events rely on high-quality data, not influenced by nonclimatic factors (Toreti et al. 2010a). The aim of homogenization techniques (i.e., procedures combining detection and correction) is the removal (or at least the reduction) of the nonclimatic signal affecting time series under investigation [for an overview, the reader is referred to Aguilar et al. (2003)]. The detection step aims to identify times at which the series suddenly changes behavior because of local factors not ascribable to the climate system. During the analyses of sets of annual and seasonal mean temperature and total precipitation data (Kuglitsch et al. 2009; Toreti et al. 2009, 2010b), several methods were applied for the detection of break points: sequential standard normal homogeneity test (SNHT; Alexandersson and Moberg 1997), RHtest (Wang et al. 2007; Wang 2008a,b), and the method of Caussinus and Mestre (2004; CauMe). All these methods have limitations and drawbacks in regard to underestimation or overestimation of multiple changepoints and/or incorrect break point location times. Trying to overcome these shortcomings, we present a more general time series segmentation method based on hidden Markov models (HMM) and a genetic algorithm (GA). The method, called genetic algorithm hidden Markov models for detection of inhomogeneities (GAHMDI), is explained and implemented with some restrictive hypotheses (e.g., the process cannot come back to a previous condition), usually satisfied in the homogenization context, that can be relaxed for other applications.

Section 2 focuses on the proposed methodology. Section 3 compares the performance of GAHMDI with the CauMe, SNHT, and RHtest segmenters using simulated series and a case study on winter precipitation recorded in northern Italy from 1950 to 2006. This comparative analysis is not exhaustive: other recent homogenization procedures (e.g., Lu et al. 2010) have not been included, because simulations are computationally expensive and a complete comparison of GAHMDI with all available methods was not feasible within the frame of this research. Section 4 provides conclusions and an outlook. Appendixes A and B give further details on the method and the simulation of the series.

2. Method

Let $\{X_t\}_{t=1,\dots,N}$ be a discrete time process, for instance annual mean temperature from a weather station, affected by $K - 1$ changepoints ($K$ is not known) located at unknown times $\{\tau_1, \dots, \tau_{K-1}\}$ (e.g., Lu et al. 2010). Thus, the process is characterized by $K$ homogeneous segments given by $\tau_{j-1} < t \le \tau_j$ for $j = 1, \dots, K$ with $\tau_0 = 0$ and $\tau_K = N$. The aim of detection procedures is the identification of $K$ and $\{\tau_1, \dots, \tau_{K-1}\}$ and the estimation of the statistical features describing the homogeneous segments, for example the mean and variance. GAHMDI puts this problem in a hidden Markov model framework (section 2a) and provides an estimated segmentation when $K$ is fixed a priori. To avoid convergence to a local maximum, GAHMDI performs an initial estimation by using a GA (section 2b). Finally, GAHMDI is applied with $K \in \{1, \dots, K_{\max}\}$ for some preset $K_{\max}$, and the optimal number of segments (changepoints) is chosen by minimizing a penalized likelihood objective function of the form −log(likelihood) + penalty (section 2c).
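The segment definition above can be illustrated with a minimal Python sketch. The helper names `segment_labels` and `segment_stats` are hypothetical (they do not come from the paper), the toy data are made up, and the use of the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def segment_labels(changepoints, n):
    """Label t = 1..n with its segment index j, where tau_{j-1} < t <= tau_j."""
    bounds = [0] + sorted(changepoints) + [n]   # tau_0 = 0, ..., tau_K = n
    labels = []
    for j in range(1, len(bounds)):
        labels.extend([j] * (bounds[j] - bounds[j - 1]))
    return labels

def segment_stats(x, labels):
    """Mean and (population) standard deviation of each homogeneous segment."""
    stats = {}
    for j in sorted(set(labels)):
        values = [v for v, s in zip(x, labels) if s == j]
        stats[j] = (mean(values), pstdev(values))
    return stats

# Toy series with one changepoint at tau_1 = 3, so K = 2 segments.
x = [0.0, 0.2, -0.2, 1.0, 1.2, 0.8, 1.0]
labels = segment_labels([3], len(x))
stats = segment_stats(x, labels)
```

Here `labels` plays the role of the state sequence used in the following subsections: each observation carries the index of the segment it belongs to.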

a. Hidden Markov models

HMMs are broadly applied, especially in speech recognition; however, few authors have applied them to segmentation problems (e.g., Kehagias 2004). In this frame, $\{X_t\}_{t=1,\dots,N}$ depends on an unobservable process $\{S_t\}_{t=1,\dots,N}$ (the state process) that takes values in $\{1, 2, \dots, K\}$. In the homogenization context the state process could be considered as the set of nonclimatic factors influencing the measured variables (e.g., the position of the weather station, the observing practices). The state process is a Markov chain, characterized by a transition matrix $\mathbf{P}$, whose elements are $p_{i,j} = P(S_{t+1} = j \mid S_t = i)$ where $1 \le i, j \le K$, and an initial state distribution $\pi = (\pi_1, \dots, \pi_K)$ where $\pi_i = P(S_1 = i)$. The $X_t$ are conditionally independent with a distribution (given $S_t$) that is Gaussian with mean $\mu_{S_t}$ and variance $\sigma^2_{S_t}$. Moreover, we can suppose that $p_{i,j} = 0$ for all $j < i$; $\pi = (1, 0, \dots, 0)$; and $p_{i,i} > 0$ and $p_{i,i+1} = 1 - p_{i,i}$ for all $i = 1, \dots, K - 1$, with $p_{K,K} = 1$. The above described model is known as a left-to-right HMM [for a detailed description/review the reader is referred to Rabiner (1989), MacDonald and Zucchini (1997), and Cappé et al. (2005)]. These conditions are reasonable in a homogenization context, although, from a theoretical point of view, exceptions could arise. For instance, the assumptions on the transition matrix and the initial state distribution imply that the system cannot come back to a previous state and that the first observation belongs to the first state. Since geophysical systems can (in principle) come back to previous states, this constraint can be easily relaxed for general purposes. Furthermore, the Gaussian condition is not essential and, with the appropriate changes in the likelihood function, other distributions can be considered.
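As a rough illustration of the left-to-right structure, the sketch below builds a transition matrix from the diagonal probabilities and draws one state path. The function names and the numerical values are hypothetical; treating the last state as absorbing follows the remark in section 2b that the last diagonal element equals 1.

```python
import random

def left_to_right_matrix(stay_probs):
    """Build the K x K transition matrix from p_{i,i}, i = 1..K-1 (p_{K,K} = 1)."""
    K = len(stay_probs) + 1
    P = [[0.0] * K for _ in range(K)]
    for i, p in enumerate(stay_probs):
        P[i][i] = p              # remain in the current state
        P[i][i + 1] = 1.0 - p    # move to the next state; no other move is allowed
    P[K - 1][K - 1] = 1.0        # the last state cannot be left
    return P

def simulate_states(P, n, rng):
    """Draw S_1..S_n with S_1 = 1; states are reported as 1..K."""
    path, state = [1], 0         # `state` is the 0-based row index into P
    for _ in range(n - 1):
        if state < len(P) - 1 and rng.random() >= P[state][state]:
            state += 1
        path.append(state + 1)
    return path

P = left_to_right_matrix([0.9, 0.8])            # made-up values, K = 3
path = simulate_states(P, 50, random.Random(0))
```

Every simulated path starts in state 1 and never decreases, which is exactly the constraint that the process cannot come back to a previous condition.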

The set of HMM parameters that need to be estimated is $\lambda = (\mathbf{P}, \mu_1, \dots, \mu_K, \sigma_1, \dots, \sigma_K)$, together with the optimal state sequence $(s_1, \dots, s_N)$ that represents the segmentation of $\{X_t\}$. Suppose that $K$ and an initial guess at the other parameters in $\lambda$ are known. The model likelihood of these parameters is
$$L(\lambda) = \sum_{s_1, \dots, s_N} \prod_{t=1}^{N} p_{s_{t-1}, s_t}\, \frac{1}{\sqrt{2\pi}\,\sigma_{s_t}} \exp\left[-\frac{(x_t - \mu_{s_t})^2}{2\sigma_{s_t}^2}\right], \tag{1}$$
with $s_0 = 0$ and $p_{0,1} = 1$, where the summation is over all state sequences that have $K$ different states from $t = 1, 2, \dots, N$. As pointed out by Juang and Rabiner (1991), this likelihood is difficult to evaluate directly. Thus, we have applied an expectation–maximization (EM) procedure that does not maximize the likelihood directly, but uses a surrogate function [for a complete description the reader is referred to Welch (2003) and Cappé et al. (2005)]. For a fixed $K$, the EM algorithm optimizes the likelihood to find the best estimates of $\mathbf{P}$ and $\mu_i$, $\sigma_i$ for $1 \le i \le K$. This is done through the so-called Baum–Welch algorithm (Rabiner 1989; Welch 2003), which meshes well with the EM approach. The Baum–Welch procedure gives a likelihood-optimal value of $\lambda$, but does not give an estimated state sequence. To obtain $(\hat{s}_1, \dots, \hat{s}_N)$, we employ another algorithm known as the Viterbi algorithm (Forney 1973; Rabiner 1989; Viterbi 2006). The Viterbi algorithm maximizes the posterior likelihood of the state sequence given the observation $\{X_t\}_{t=1,\dots,N}$ and $\hat{\mathbf{P}}$, $\hat{\mu}_i$, $\hat{\sigma}_i$ for $i = 1, \dots, K$ (e.g., Juang and Rabiner 1991). The Viterbi algorithm is a dynamic procedure that consists of four steps based on $\delta_t(i) = \max_{s_1, \dots, s_{t-1}} L_2(s_1, \dots, s_{t-1}, s_t = i \mid x_1, \dots, x_t)$, where $(x_1, \dots, x_t)$ is the observation vector from 1 to $t$. The term $\delta_t(i)$ maximizes the conditional likelihood ($L_2$) of the state sequence up to time $t$ and ending in a state equal to $i$. Since at time 1 the state is equal to 1, the initialization is given by
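$$\delta_1(1) = \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left[-\frac{(x_1 - \mu_1)^2}{2\sigma_1^2}\right] \tag{2}$$
and $\psi_1(1) = 1$, $\delta_1(i) = 0$, and $\psi_1(i) = 0$ for all $i = 2, \dots, K$ ($\psi$ is a storage variable). Then, the recursion, for $t = 2, \dots, N$ and $j = 1, \dots, K$, is provided by
$$\delta_t(j) = \max_{1 \le i \le K} \left[\delta_{t-1}(i)\, p_{i,j}\right] \frac{1}{\sqrt{2\pi}\,\sigma_j} \exp\left[-\frac{(x_t - \mu_j)^2}{2\sigma_j^2}\right], \tag{3}$$
$$\psi_t(j) = \arg\max_{1 \le i \le K} \left[\delta_{t-1}(i)\, p_{i,j}\right]. \tag{4}$$
Here $\psi$ retains the value for which the maximum of (3) is achieved; that is, it retains the state at time $t - 1$ of the sequence $s_1, \dots, s_{t-1}$ that maximizes $L_2$ with $s_t = j$. Equation (3) is based on the fact (Cappé et al. 2005) that $L_2$ of a state sequence until time $t$ is equal (except for a constant term that does not depend on the state sequence) to $L_2$ of the state sequence until $t - 1$ multiplied by some terms depending only on the states at time $t - 1$ and $t$. The third step, the termination, is given by $\max_{1 \le i \le K} \delta_N(i)$ and $\hat{s}_N = \arg\max_{1 \le i \le K} \delta_N(i)$. Thus, the last value of the state sequence has been estimated. Finally, the complete optimal state sequence is obtained with the so-called backtracking: $\hat{s}_t = \psi_{t+1}(\hat{s}_{t+1})$, $t = N - 1, N - 2, \dots, 1$.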
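The four Viterbi steps (initialization, recursion, termination, backtracking) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: it works with logarithms to avoid numerical underflow (an assumption, since the text describes the product form), and the function names are hypothetical.

```python
import math

NEG_INF = float("-inf")

def log_gauss(x, mu, sigma):
    """Log density of a Gaussian with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - (x - mu) ** 2 / (2 * sigma ** 2)

def safe_log(p):
    """log(p), with log(0) mapped to -inf (forbidden transitions)."""
    return math.log(p) if p > 0 else NEG_INF

def viterbi_left_to_right(x, P, mu, sigma):
    """Most likely state path (states 1..K) given parameter estimates."""
    n, K = len(x), len(mu)
    delta = [[NEG_INF] * K for _ in range(n)]
    psi = [[0] * K for _ in range(n)]
    # Initialization: the first observation belongs to the first state.
    delta[0][0] = log_gauss(x[0], mu[0], sigma[0])
    # Recursion: best predecessor i for each state j at each time t.
    for t in range(1, n):
        for j in range(K):
            best_i, best = 0, NEG_INF
            for i in range(K):
                cand = delta[t - 1][i] + safe_log(P[i][j])
                if cand > best:
                    best_i, best = i, cand
            delta[t][j] = best + log_gauss(x[t], mu[j], sigma[j])
            psi[t][j] = best_i
    # Termination: most likely final state.
    s = [0] * n
    s[-1] = max(range(K), key=lambda i: delta[-1][i])
    # Backtracking through the stored predecessors.
    for t in range(n - 2, -1, -1):
        s[t] = psi[t + 1][s[t + 1]]
    return [k + 1 for k in s]
```

For a toy series that jumps from about 0 to about 5 after the third value, with made-up parameters `P = [[0.8, 0.2], [0.0, 1.0]]`, `mu = [0.0, 5.0]`, and `sigma = [1.0, 1.0]`, the returned path switches from state 1 to state 2 at the fourth observation.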

b. Initial estimation

Suppose the number of states K is fixed. The Baum–Welch algorithm depends on an initial segmentation estimate that can be obtained by partitioning the observations, that is, giving a first guess of the state sequence. This state sequence is usually random (Kehagias 2004) or provided by partitioning around medoids (Fridlyand et al. 2004; Swami and Jain 2006). However, this can lead to local maxima. To get global maxima, GAHMDI uses a GA to estimate the initial state sequence. GAs can be defined either as a family of computational models inspired by evolution (Whitley 1994) or as robust techniques for optimization based on the laws of natural selection and genetics. They were introduced by Holland (1992) and several authors applied GAs for the estimation of HMM parameters, usually hybridizing them (e.g., Kwong et al. 2001; Won et al. 2004). Jann (2006) and Li and Lund (2012) directly use GAs to solve climate homogenization problems.

The main elements of a GA are a set of possible solutions, called the population, and an evaluation function. Through genetic processes, these algorithms reach an optimal solution in terms of the evaluation function. The first step of a GA is the creation of the initial population (randomly generated). The individuals of the population are encoded as chromosomes and represent possible solutions to the optimization problem. A fitness value is assigned to each chromosome by using the evaluation function. Before undertaking the genetic operations, an intermediate population must be created by selecting individuals from the initial population. For this task, methods like stochastic universal sampling (Baker 1987) or tournament selection (e.g., Blickle and Thiele 1995) have been developed. The latter is widespread and well known for its efficiency. It organizes tournaments between two randomly chosen individuals, and the winners (in terms of the fitness values) become members of the intermediate population. The next step involves genetic processes, that is, crossover and mutation. The aim is the evolution of the population toward a new generation of individuals. Single-point crossover involves two chromosomes that are split at the same point and recombined to produce offspring. For instance, parents $a = (a_1, \dots, a_n)$ and $b = (b_1, \dots, b_n)$, crossed over at point $h$, give offspring $c = (a_1, \dots, a_h, b_{h+1}, \dots, b_n)$ and $d = (b_1, \dots, b_h, a_{h+1}, \dots, a_n)$. The crossover point is randomly chosen and the crossover operation is performed with a fixed probability $p_c$. The other genetic operation is called mutation; it involves only one chromosome and produces small changes in its structure. Let $a = (1, 1, 1, 1, 2, 2, 2)$ be a chromosome. A mutation of $a$, for instance, is the chromosome $a_m = (1, 1, 1, 2, 2, 2, 2)$. Mutation is also applied with a fixed probability $p_m$. Finally, the fitness value of each individual of the new generation is calculated. To guarantee the survival of the best solutions belonging to the previous population, a process called elitism is performed. Elitism replaces the worst solutions of the new generation with the best solutions of the previous one (only if the fitness values of the former are lower). The procedure is iterated until convergence.
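The genetic operators described above can be sketched in a few lines of Python. The crossover and the binary tournament follow the text directly; the mutation operator (moving one segment boundary by one step) is only one possible choice that reproduces the example in the text, and all function names are hypothetical.

```python
import random

def single_point_crossover(a, b, h):
    """Split both parents after position h and swap the tails."""
    return a[:h] + b[h:], b[:h] + a[h:]

def tournament_selection(population, fitness, rng):
    """Binary tournaments: the fitter of two random individuals is selected."""
    selected = []
    for _ in range(len(population)):
        i, j = rng.sample(range(len(population)), 2)
        winner = i if fitness[i] >= fitness[j] else j
        selected.append(population[winner])
    return selected

def shift_boundary_mutation(chrom, rng):
    """Move one randomly chosen segment boundary by one time step (assumed operator)."""
    boundaries = [t for t in range(len(chrom) - 1) if chrom[t] != chrom[t + 1]]
    if not boundaries:
        return list(chrom)
    t = rng.choice(boundaries)
    mutated = list(chrom)
    if rng.random() < 0.5:
        mutated[t] = chrom[t + 1]      # boundary moves one step earlier
    else:
        mutated[t + 1] = chrom[t]      # boundary moves one step later
    return mutated

a = [1, 1, 1, 1, 2, 2, 2]
b = [1, 1, 2, 2, 2, 2, 2]
c, d = single_point_crossover(a, b, 3)
```

With these parents and $h = 3$, the first offspring is a valid state sequence, whereas the second one returns from state 2 to state 1. This shows why offspring have to be checked against the HMM structure before they enter the new generation.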

In GAHMDI a single chromosome represents a state sequence; for instance, $a = (1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4)$ is a four-state HMM with 12 observations. Crossover and mutation are applied ensuring that the HMM structure is preserved, that is, avoiding the birth of chromosomes like $(1, 1, 1, 1, 1, 1, 1, 3, 3, 3, 4, 4, 4)$ that skip a state. Solutions where any segment is less than four steps long are rejected. The evaluation function is derived by following the approach of Kehagias (2004), that is, using a simplified HMM. For each chromosome the parameters of the associated HMM ($\mu_k$ and $\sigma_k$) are estimated by the mean and the standard deviation, respectively, of all the observations belonging to the state $k$ (from 1 to $K$), where the segmentation is described by the chromosome under evaluation. Moreover, the transition matrix has all elements on the diagonal (except the last one, $p_{K,K} = 1$) equal to the number of time steps without change in the state sequence divided by the total number of time steps. Therefore, the evaluation function of the GA is the joint likelihood of the state sequence and the observations:
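$$L(x_1, \dots, x_N, s_1, \dots, s_N \mid \lambda) = \prod_{t=1}^{N} p_{s_{t-1}, s_t}\, \frac{1}{\sqrt{2\pi}\,\sigma_{s_t}} \exp\left[-\frac{(x_t - \mu_{s_t})^2}{2\sigma_{s_t}^2}\right], \tag{5}$$
with $p_{0,1} = 1$. Taking logs and imposing the restriction $p_{i,i} = p$ for all $i \le K - 1$, we obtain
$$\log L = (N - K - |C_K| + 1)\log p + (K - 1)\log(1 - p) - \frac{N}{2}\log(2\pi) - \sum_{t=1}^{N}\left[\log \sigma_{s_t} + \frac{(x_t - \mu_{s_t})^2}{2\sigma_{s_t}^2}\right], \tag{6}$$
where $p$ is the first element on the main diagonal of $\mathbf{P}$, $C_K = \{s_t \text{ such that } s_t = K\}$, and $|S|$ denotes the cardinality of the set $S$.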
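A minimal Python sketch of the chromosome check and of the log evaluation function may help to fix ideas. Several details are assumptions: $p$ is estimated here as $(N - K)/(N - 1)$ (the number of unchanged transitions over the number of transitions; the text could also be read as dividing by $N$), the population standard deviation is used, and the function names are hypothetical.

```python
import math

def is_valid_chromosome(chrom, K, min_len=4):
    """Left-to-right structure, no skipped state, and no segment shorter than min_len."""
    if not chrom or chrom[0] != 1 or chrom[-1] != K:
        return False
    if any(b - a not in (0, 1) for a, b in zip(chrom, chrom[1:])):
        return False
    return all(chrom.count(k) >= min_len for k in range(1, K + 1))

def log_fitness(x, chrom, K):
    """Log joint likelihood of observations and state sequence (simplified HMM).

    Assumes len(x) > 1 and non-constant segments (sigma > 0).
    """
    n = len(x)
    size_last = chrom.count(K)            # |C_K|
    p = (n - K) / (n - 1)                 # assumed estimate of the common p_{i,i}
    n_stay = n - K - size_last + 1        # unchanged transitions in states 1..K-1
    value = -0.5 * n * math.log(2 * math.pi)
    if n_stay > 0:
        value += n_stay * math.log(p)
    if K > 1:
        value += (K - 1) * math.log(1 - p)
    for k in range(1, K + 1):
        obs = [v for v, s in zip(x, chrom) if s == k]
        mu = sum(obs) / len(obs)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in obs) / len(obs))
        value -= sum(math.log(sigma) + (v - mu) ** 2 / (2 * sigma ** 2) for v in obs)
    return value
```

On a toy series such as `[0, 1, 0, 1, 10, 11, 10, 11]`, the chromosome that places the changepoint after the fourth value obtains a higher value than one that places it after the second value, so the GA would prefer it.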

c. How many states?

In the previous sections the number of hidden states $K$ has been assumed to be known, but actually it is not. For $K = 1, \dots, K_{\max}$ we apply the model described in the previous subsections, stopping the procedure when the best state solution, associated with $K_f$ states, has at least one state whose time duration is less than four steps. This is a reasonable minimum length for a homogeneous subperiod of a series (DeGaetano 2006; Toreti et al. 2010a). The choice of the best number of states falls in the framework of model selection. Here, several methods are available, for example, penalized likelihood criteria [such as the Akaike information criterion (AIC; Akaike 1973) and the Bayesian information criterion (BIC; Schwarz 1978)], the penalized marginal likelihood criterion (PML; Gassiat 2002), cross-validated likelihood criteria (Celeux and Durand 2008), and the minimum distance estimator (MacKay 2002). However, as pointed out by Cappé et al. (2005) and Chambaz et al. (2009), the order estimation in an HMM is a very difficult problem. Recently, approaches based on the minimum description length (MDL) principle, developed in the context of information theory, have been proposed (e.g., Davis et al. 2006; Chambaz et al. 2009; Lu et al. 2010). The basic idea of MDL is to view model selection as a data compression problem (for a complete description the reader is referred to Grünwald 2007 and references therein); that is, in terms of achieving the shortest code length that describes the data (where a code can be thought of as a mapping between a sample space and the set of all strings of finite length composed of symbols in $\{0, 1\}$). There are different versions of the MDL principle, but the application of the complex ones (e.g., normalized maximum likelihood) to HMM order selection is not straightforward and presents some open issues (P. D. Grünwald and T. Roos 2011, personal communications). 
Therefore, a simplified two-part MDL has been implemented (for a detailed description of the two-part MDL, the reader is referred to Davis et al. 2006 and Lu et al. 2010). In brief, the code length (to be minimized) of the vector composed of the observations and the state sequence, $y = (x, s)$, associated with a model $\mathcal{M}$ (a $K$-state HMM) belonging to a class (the class of left-to-right HMMs), is given by
$$\mathrm{CL}(y) = \mathrm{CL}(\mathcal{M}) + \mathrm{CL}(y \mid \mathcal{M}), \tag{7}$$
where the first term is the code length of the chosen model and the second one is the code length of the data when described by using that model (Grünwald 2007). By applying the encoding rules summarized by Lee (2001), the first term is equal to $K\log(N) + 2^{-1}(K - 1)\log(N)$, while the second term is equal to the negative of $\log[L_1(\lambda)]$ (see appendix A). Finally, a multiresponse permutation procedure (MRPP; Mielke et al. 1981) has also been implemented to provide further support for the selected model order.
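The order selection can be sketched as a simple comparison of code lengths. In the Python example below the function names are hypothetical, natural logarithms are assumed, and the log-likelihood values are made up purely for illustration.

```python
import math

def mdl_code_length(K, n, log_l1):
    """Two-part code length: model part plus negative log-likelihood of the data."""
    model_part = K * math.log(n) + 0.5 * (K - 1) * math.log(n)
    return model_part - log_l1

def select_order(n, log_l1_by_K):
    """Return the number of states K with the shortest code length."""
    return min(log_l1_by_K, key=lambda K: mdl_code_length(K, n, log_l1_by_K[K]))

# Made-up log-likelihoods for K = 1, 2, 3 on a series of length 100.
candidates = {1: -150.0, 2: -140.0, 3: -139.0}
best_K = select_order(100, candidates)
```

In this toy example the gain in likelihood from two to three states is too small to pay for the longer model description, so the two-state model is retained.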

d. Additional remarks

GAHMDI has been presented as a tool for the detection of inhomogeneities in climate time series. Since detection procedures are usually not applied to the series to be tested (i.e., the candidate; see section 3) the independence assumption can be considered fulfilled. Some authors have developed procedures taking into account autocorrelation (e.g., Lund et al. 2007; Lu et al. 2010). The other two main hypotheses, namely the Gaussian distribution and the constraint on the state sequence, can be relaxed. The same does not hold, however, for the independence condition. A full development of GAHMDI for a generic dependent process is beyond the aim of this paper, but it is worthwhile to provide some important elements for autoregressive processes. In the context of these processes all parameters or a subset of them can be regime dependent (e.g., Frühwirth-Schnatter 2006). We focus on a specific linear model investigated by Ephraim and Roberts (2005):
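$$X_t = \sum_{i=1}^{p} a_i(S_t)\, X_{t-i} + \sigma(S_t)\, W_t, \tag{8}$$
where $\{W_t\}$ are independent and Gaussian with zero mean and unit variance (here $p$ denotes the autoregressive order); the autoregressive coefficients $a_i(\cdot)$ and $\sigma(\cdot)$ (grouped in the variable $\theta$) have to be estimated along with $\mathbf{P}$ and $K$. In this case the complete likelihood is
$$L(x_1, \dots, x_N, s_1, \dots, s_N \mid \mathbf{x}_0, \lambda) = \prod_{t=1}^{N} p_{s_{t-1}, s_t}\, d(x_t \mid \mathbf{x}_{t-1}, s_t; \theta), \tag{9}$$
where $\mathbf{x}_0$ and $\mathbf{x}_{t-1}$ denote the given initial conditions $(x_{-p+1}, \dots, x_0)$ and $(x_{t-p}, \dots, x_{t-1})$, respectively. Finally, $d(\cdot)$ is the observation conditional density. The parameters are estimated with a stable forward–backward recursion (see Ephraim and Roberts 2005 for details). This model can be further generalized to nonlinear settings (e.g., Xie et al. 2008).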

3. Results: Detection of inhomogeneities

The homogenization of climate time series (the candidate) relies on the detection of an unknown number of changepoints. This is usually done by comparing the candidate series to a set of well-correlated neighboring series belonging to the same climatic area (e.g., Peterson and Easterling 1994; Aguilar et al. 2003; Caussinus and Mestre 2004; Menne and Williams 2005; Kuglitsch et al. 2009). To remove the climate signal and to detect only artificial break points, changepoint methods are usually applied to the standardized difference series (candidate minus reference) of annual/seasonal values (in case of temperature) or to the log series of ratios (in case of precipitation; e.g., Toreti et al. 2009). The detection process is run r times, where r is the cardinality of the reference set. Kuglitsch et al. (2009) suggest retaining break points confirmed by three or more reference series within two consecutive years. In section 3a GAHMDI is tested on a simulated sample and its performance is compared to CauMe, SNHT, and RHtest segmenters. In section 3b the application of the method to a total winter precipitation series (from a weather station located in northern Italy) is shown.
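The two series on which the detection is run can be sketched as follows. The function names are hypothetical, and standardizing the difference after subtracting the reference (rather than standardizing each series first) is an assumption, since the text does not spell out the order of the operations.

```python
import math
from statistics import mean, pstdev

def standardized_difference(candidate, reference):
    """Candidate minus reference, centered and scaled (used for temperature)."""
    diff = [c - r for c, r in zip(candidate, reference)]
    m, s = mean(diff), pstdev(diff)
    return [(d - m) / s for d in diff]

def log_ratio(candidate, reference):
    """Log of candidate over reference (used for precipitation, values > 0)."""
    return [math.log(c / r) for c, r in zip(candidate, reference)]

# Made-up values for illustration only.
z = standardized_difference([10.0, 11.0, 12.5, 13.0], [9.5, 10.0, 10.5, 11.0])
q = log_ratio([2.0, 4.0], [1.0, 4.0])
```

Because both series share the regional climate signal, it largely cancels in `z` and `q`, and what remains is dominated by local, nonclimatic effects.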

a. Simulated series

Two well-correlated series (a candidate and a reference) of 100 independent values are generated 1000 times following the multivariate method of Wilks (1999, 2005) and using a Gaussian distribution for both series (see appendix B for details). To get additional information on GAHMDI’s behavior, artificial signals are added to the simulated candidate series (which is otherwise not affected by changepoints). Since it is not possible to cover all combinations of changepoints in terms of number, magnitude, and location, five specific cases are investigated (see Table 1). They are common in the homogenization of a real dataset and give the opportunity to test our procedure in a simple and effective way, looking at the number of detections and the contemporaneous identification of more than one changepoint. This last feature is important, because long time series are usually affected by more than one inhomogeneity, and unidentified break points could induce an erroneous correction of the candidate series. Following classical notation, $\sigma$ denotes the standard deviation of the candidate. In the first three cases, the mean of the candidate series is changed after the 25th (50th, 75th) value, adding a constant signal [i.e., $\mu_a(t) = \mu_a$] of variable magnitude (from $0.1\sigma$ to $1.5\sigma$, in steps of $0.1\sigma$). In the fourth case, a random signal $N(\mu_a, 0.2\sigma)$ is added after the 70th value, with $\mu_a \in \{0.1\sigma, 0.2\sigma, \dots, 1.5\sigma\}$. In the last case, three changepoints (associated with three changes of the mean) are added to the candidate series at the 20th, 50th, and 85th value; the constant signals are of magnitude equal to $0.8\sigma$, $-0.5\sigma$, and $0.7\sigma$, respectively. Since SNHT is designed to identify only one inhomogeneity, it has to be applied in a sequential way (e.g., Alexandersson and Moberg 1997); that is, after each detected changepoint, the series is split into subperiods and the test is reapplied to each of them, until no further inhomogeneities are detected or the subperiods become too short. Finally, in this subsection a detected changepoint is considered correct with a leeway of ±1.
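The way artificial signals are added and detections are scored can be sketched as follows. The function names are hypothetical, and the scoring rule simply encodes the ±1 leeway stated above.

```python
def add_shift(series, after, magnitude):
    """Add a constant signal to every value following the `after`-th one (1-based)."""
    return [v + magnitude if t > after else v
            for t, v in enumerate(series, start=1)]

def is_detected(true_cp, detected, leeway=1):
    """A true changepoint counts as found if a detection lies within the leeway."""
    return any(abs(d - true_cp) <= leeway for d in detected)

def all_detected(true_cps, detected, leeway=1):
    """Contemporaneous identification of every true changepoint."""
    return all(is_detected(cp, detected, leeway) for cp in true_cps)

# A flat toy series with a shift of 0.5 after the 25th value.
shifted = add_shift([0.0] * 100, 25, 0.5)
```

For the three-changepoint case, `all_detected([20, 50, 85], detected)` returns `True` only when each of the three break points has a detection within one time step.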

Table 1.

Simulated case studies; cp and as denote changepoint and artificial signal, respectively.

Figure 1 shows GAHMDI’s performance in the first four cases. The behavior of our method is approximately identical to the behavior of CauMe and SNHT, whereas RHtest seems to underdetect single break points. The results are similar for break points located at the beginning (25th value), the middle (50th value), and the end (75th value) of the series. Moreover, the addition of an artificial random signal does not affect the detections of the four methods. When a stepwise function with three changepoints is added (Fig. 2), GAHMDI performs better than CauMe and RHtest in the contemporaneous identification of the changepoint vector (20, 50, 85): 229 times against 57 (CauMe) and 148 (RHtest). SNHT has the highest number of correct identifications (408 times), but it often detects more than three break points. Summarizing, the proposed method shows a behavior similar to widespread tests (CauMe and SNHT) for single changepoints, but it is better in the detection of multiple changepoints (i.e., more than two segments). Furthermore, GAHMDI is able to identify break points due to changes in variance, whereas methods designed to identify only changes in the mean (e.g., CauMe) show a drastic performance reduction. Indeed, when the variance of the candidate series is changed by 0.5 after the 50th value, GAHMDI detects this break point 77 times against 12 for CauMe, 10 for RHtest, and 4 for SNHT (not shown in the figures). Notice that this evaluation of the method (performed on a simulated dataset) does not cover all possible cases (e.g., different sizes, locations and number of inhomogeneities). However, the five simulated situations are rather common (e.g., Kuglitsch et al. 2009), so the results provide a proper description.

Fig. 1.

Number of correct detections of a single break point as a function of the shift (expressed in terms of $\sigma$) for GAHMDI (black line), CauMe (gray line), RHtest (dashed gray line), and SNHT (dashed black line). A constant inhomogeneity is added after the (a) 25th, (b) 50th, and (c) 75th value. (d) A random signal $N(\mu_a, \sigma_a)$ is added after the 70th value.

Citation: Journal of Applied Meteorology and Climatology 51, 2; 10.1175/JAMC-D-10-05033.1

Fig. 2.

Identification of three break points (located at the 20th, 50th, and 85th values) added to the simulated series. The number of correct detections of each break point is plotted at 20, 50, and 85 on the x axis. The label 3 on the x axis represents the contemporaneous identification of the three break points. Black circles are associated with GAHMDI, gray circles with CauMe, gray stars with RHtest, and black stars with SNHT.

b. Case study

Besides results from simulated series, a detection of inhomogeneities was performed on a winter (December to February) precipitation series that has been derived from daily observations (over the period 1950–2006) at the weather station of Milan (northern Italy; see Fig. 3). To make the comparison of GAHMDI with the other three methods straightforward, only one reference series is used. As shown in Fig. 3, GAHMDI flags one break point in 1991. CauMe also flags a single break point, although it is located in 1989, whereas SNHT and RHtest each detect two inhomogeneities, in (1991, 1996) and (1980, 1990), respectively. Therefore, only the break point located around 1991 is identified by all four methods. These results, although not confirmed by metadata, point out the differences between the applied methods and show the need for multiple detection procedures in the homogenization of real series.

Fig. 3.

Winter precipitation sums (black line) from the weather station of Milan, Italy (red dot in the small panel). Vertical dashed lines identify the break points detected by GAHMDI (blue line), CauMe (red line), SNHT (green lines), and RHtest (gray lines).

4. Conclusions

Time series segmentation is a complex task with potential applications in many research fields (e.g., climate change, economics, finance, biology, music, and informatics). Abrupt changes (characterizing transitions from one state to another) affect several physical systems. A statistical description, through changepoints, of abrupt responses (e.g., in the climate system) to external forcings helps to improve the understanding of the mechanisms behind those phenomena. In this context, an approach based on a genetic algorithm and hidden Markov models was proposed. GAHMDI has been developed to be immediately applied in the homogenization field. However, its flexibility allows it to be adapted easily to different fields and initial assumptions. GAHMDI improves the reliability of the solution by using a GA-based initial estimate to avoid convergence to local optima. Furthermore, application of the MDL principle permits us to choose the number of states in an objective way, although, as pointed out by several authors, the order selection of an HMM is very difficult and still an active field of research. The method was explained theoretically and its applicability in climate homogenization was demonstrated. GAHMDI’s behavior was investigated by using a simulated dataset and compared with the method developed by Caussinus and Mestre (2004), the standard normal homogeneity test (Alexandersson and Moberg 1997), and the RHtest (Wang et al. 2007; Wang 2008a,b). GAHMDI performs better than the other three methods in the contemporaneous detection of multiple changepoints. In addition, it takes into account changes in variance. An application of GAHMDI to an observed series of winter precipitation recorded at Milan (northern Italy) demonstrates the practical utility of the method. The described evaluation is not exhaustive. Extensive tests based on simulated series, covering a complete set of cases and involving additional methods, are computationally very expensive and could be performed within a dedicated project.

In future research, GAHMDI will be extended to handle autocorrelated data and to detect changepoints in daily climate time series.

Acknowledgments

We are grateful to Dr. D. Harte (SRA) for the R-package HiddenMarkov and helpful discussions. We thank Dr. P. D. Grünwald (CWI and Leiden University) and Dr. T. Roos (University of Helsinki) for useful and interesting discussions and W. Perconti (ISPRA) for his support during the simulation process. The comments and suggestions of two anonymous referees and Dr. R. Lund (Clemson University) improved the quality and the readability of the manuscript. This research was funded by the EU/FP6 integrated project CIRCE (Climate Change and Impact Research: the Mediterranean Environment; http://www.circeproject.eu/; Contract 036961) and the EU/FP7 project ACQWA (Assessing Climate Impacts on the Quantity and Quality of Water; http://www.acqwa.ch/; Grant 212250). The method is implemented entirely in R.

APPENDIX A

Likelihood L1

[Eq. (A1)]
with $p_{01} = 1$. Taking logs gives
[Eq. (A2)]
where $C_k = \{s_t \text{ such that } s_t = k\}$.

APPENDIX B

Simulated Series

The simulation is based on the following equations:
[Eq. (B1)]
where the output matrix and the random-input matrix are both 2 × 100 and T denotes the matrix transpose. The rows of the output matrix are the simulated candidate and reference series, while the rows of the random-input matrix are two vectors of independent Gaussian random variables. The equation also involves a 2 × 2 matrix of correlations, whose square root can be calculated using the spectral decomposition of the correlation matrix, that is, from its eigenvector matrix and the diagonal matrix $\Lambda^{1/2}$ whose elements are the square roots of the eigenvalues.
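The simulation recipe described in this appendix can be illustrated with a minimal Python sketch. It is not the authors' R code: the function names, the correlation value 0.8, the use of standard normal inputs, and the choice of the symmetric square root are all assumptions made for illustration. For a 2 × 2 correlation matrix with off-diagonal element r, the spectral decomposition is available in closed form (eigenvalues 1 + r and 1 − r), so no linear algebra library is needed.

```python
import math
import random


def sqrt_correlation_matrix(r):
    """Symmetric square root of [[1, r], [r, 1]] from its spectral decomposition."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    # Eigenvalues are 1 + r and 1 - r, with orthonormal eigenvectors
    # (1, 1)/sqrt(2) and (1, -1)/sqrt(2). Multiplying out E Lambda^(1/2) E^T
    # gives a matrix with a on the diagonal and b off the diagonal.
    s_plus = math.sqrt(1.0 + r)
    s_minus = math.sqrt(1.0 - r)
    a = 0.5 * (s_plus + s_minus)
    b = 0.5 * (s_plus - s_minus)
    return [[a, b], [b, a]]


def simulate_pair(r, n=100, seed=None):
    """Return a 2 x n list: two Gaussian series with target correlation r."""
    rng = random.Random(seed)
    root = sqrt_correlation_matrix(r)
    # Two rows of independent standard normal variables (assumed N(0, 1)).
    z = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(2)]
    # Each output row is a linear mix of the two independent rows.
    return [
        [root[i][0] * z[0][t] + root[i][1] * z[1][t] for t in range(n)]
        for i in range(2)
    ]


def sample_correlation(u, v):
    """Pearson correlation of two equally long sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


if __name__ == "__main__":
    # Hypothetical setting: 100 values per series, correlation 0.8.
    candidate, reference = simulate_pair(r=0.8, n=100, seed=42)
    print(round(sample_correlation(candidate, reference), 3))
```

With only 100 values per series the sample correlation scatters around the target value, so individual simulated pairs will not reproduce it exactly.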

REFERENCES

  • Aguilar, E., I. Auer, M. Brunet, T. C. Peterson, and J. Wieringa, 2003: Guidance on metadata and homogenization. WMO TD 1186, 53 pp.

  • Akaike, H., 1973: Information theory and an extension of the maximum likelihood principle. Proceedings of the Second International Symposium on Information Theory, B. N. Petrov and F. Csáki, Eds., Akadémiai Kiadó, 267–281.

  • Alexandersson, H., and A. Moberg, 1997: Homogenization of Swedish temperature data. Part I: Homogeneity test for linear trends. Int. J. Climatol., 17, 25–34.

  • Bai, J., and P. Perron, 1998: Estimating and testing linear models with multiple structural changes. Econometrica, 66, 47–78.

  • Baker, J., 1987: Reducing bias and inefficiency in the selection algorithm. Proc. Second Int. Conf. on Genetic Algorithms and Their Application, Cambridge, MA, Massachusetts Institute of Technology, 14–21.

  • Blickle, T., and L. Thiele, 1995: A mathematical analysis of tournament selection. Proc. Sixth Int. Conf. on Genetic Algorithms, Pittsburgh, PA, University of Pittsburgh, 9–16.

  • Cappé, O., E. Moulines, and T. Rydén, 2005: Inference in Hidden Markov Models. Springer, 672 pp.

  • Caussinus, H., and O. Mestre, 2004: Detection and correction of artificial shifts in climate series. Appl. Stat., 53, 405–425.

  • Celeux, G., and J. B. Durand, 2008: Selecting hidden Markov model state number with cross-validated likelihood. Comput. Stat., 23, 541–564.

  • Chambaz, A., A. Garivier, and E. Gassiat, 2009: A minimum description length approach to hidden Markov models with Poisson and Gaussian emissions. Application to order identification. J. Stat. Plann. Inference, 139, 962–977.

  • Davis, R. A., T. C. M. Lee, and G. A. Rodriguez-Yam, 2006: Structural break estimation for nonstationary time series models. J. Amer. Stat. Assoc., 101, 223–239.

  • DeGaetano, A. T., 2006: Attributes of several methods for detecting discontinuities in mean temperature series. J. Climate, 19, 838–853.

  • Easterling, D. R., and T. C. Peterson, 1995: A new method for detecting undocumented discontinuities in climatological time series. Int. J. Climatol., 15, 369–377.

  • Ephraim, Y., and W. J. J. Roberts, 2005: Revisiting autoregressive hidden Markov modeling of speech signals. IEEE Signal Process. Lett., 12, 166–169.

  • Forney, G. D., 1973: The Viterbi algorithm. Proc. IEEE, 61, 268–278.

  • Fridlyand, J., A. M. Snijders, D. Pinkel, D. G. Albertson, and A. N. Jain, 2004: Hidden Markov models approach to the analysis of array CGH data. J. Multivariate Anal., 90, 132–153.

  • Frühwirth-Schnatter, S., 2006: Finite Mixture and Markov Switching Models. Springer, 492 pp.

  • Gassiat, E., 2002: Likelihood ratio inequalities with applications to various mixtures. Ann. Inst. Henri Poincaré, 38, 897–906.

  • Grünwald, P. D., 2007: The Minimum Description Length Principle. MIT Press, 703 pp.

  • Holland, J. H., 1992: Adaptation in Natural and Artificial Systems. MIT Press, 211 pp.

  • Hubert, P., 1997: Change points in meteorological time analysis. Application of Time Series Analysis in Astronomy and Meteorology, T. Subba Rao, M. B. Priestley, and O. Lessi, Eds., Chapman and Hall, 399–412.

  • Jann, A., 2006: Genetic algorithms: Towards their use in the homogenization of climatological records. Croatian Meteor. J., 41, 3–19.

  • Jansen, E., and Coauthors, 2007: Paleoclimate. Climate Change 2007: The Physical Science Basis, S. Solomon et al., Eds., Cambridge University Press, 433–497.

  • Juang, B. H., and L. R. Rabiner, 1991: Hidden Markov models for speech recognition. Technometrics, 33, 251–272.

  • Kehagias, A., 2004: A hidden Markov model segmentation procedure for hydrological and environmental time series. Stochastic Environ. Res. Risk Assess., 18, 117–130.

  • Kehagias, A., and V. Fortin, 2006: Time series segmentation with shifting means hidden Markov models. Nonlinear Processes Geophys., 13, 1–14.

  • Kehagias, A., E. Nidelkou, and V. Petridis, 2005: A dynamic programming segmentation procedure for hydrological and environmental time series. Stochastic Environ. Res. Risk Assess., 20, 77–94.

  • Kim, H. J., and D. Siegmund, 1989: The likelihood ratio test for a change point in simple linear regression. Biometrika, 76, 409–423.

  • Koop, G., and S. M. Potter, 2009: Prior elicitation in multiple change-point models. Int. Econ. Rev., 50, 751–772.

  • Kuglitsch, F. G., A. Toreti, E. Xoplaki, P. M. Della-Marta, J. Luterbacher, and H. Wanner, 2009: Homogenization of daily maximum temperature series in the Mediterranean. J. Geophys. Res., 114, D15108, doi:10.1029/2008JD011606.

  • Kwong, S., C. W. Chau, K. F. Man, and K. S. Tang, 2001: Optimisation of HMM topology and its model parameters by genetic algorithms. Pattern Recognit., 34, 509–522.

  • Lee, T. C. M., 2001: An introduction to coding theory and the two-part minimum description length principle. Int. Stat. Rev., 69, 169–183.

  • Li, S., and R. Lund, 2012: Multiple changepoint detection via genetic algorithms. J. Climate, 25, 674–686.

  • Lu, Q., R. Lund, and T. C. M. Lee, 2010: An MDL approach to the climate segmentation problem. Ann. Appl. Stat., 4, 299–319.

  • Lund, R., X. L. Wang, Q. Lu, J. Reeves, C. Gallagher, and Y. Feng, 2007: Changepoint detection in periodic and autocorrelated time series. J. Climate, 20, 5178–5190.

  • MacDonald, I. L., and W. Zucchini, 1997: Hidden Markov and Other Models for Discrete-Valued Time Series. Chapman and Hall, 256 pp.

  • MacKay, R. J., 2002: Estimating the order of a hidden Markov model. Can. J. Stat., 30, 573–589.

  • Menne, M. J., and C. N. Williams, 2005: Detection of undocumented changepoints using multiple test statistics and composite reference series. J. Climate, 18, 4271–4286.

  • Mielke, P. W., K. J. Berry, and G. W. Brier, 1981: Application of multi-response permutation procedures for examining seasonal changes in monthly mean sea level pressure patterns. Mon. Wea. Rev., 109, 120–126.

  • Miranda, P. M. A., and A. R. Tomé, 2009: Spatial structure of the evolution of surface temperature (1951–2004). Climatic Change, 93, 269–284.

  • Moberg, A., and Coauthors, 2006: Indices for daily temperature and precipitation extremes in Europe analyzed for the period 1901–2000. J. Geophys. Res., 111, D22106, doi:10.1029/2006JD007103.

  • Pawson, S., K. Labitzke, and S. Leder, 1998: Stepwise changes in stratospheric temperature. Geophys. Res. Lett., 25, 2157–2160.

  • Perreault, L., J. Bernier, B. Bobee, and E. Parent, 2000: Bayesian change-point analysis in hydrometeorological time series. Part 1. The normal model revisited. J. Hydrol., 235, 221–241.

  • Peterson, T. C., and D. R. Easterling, 1994: Creation of homogeneous composite climatological reference series. Int. J. Climatol., 14, 671–679.

  • Rabiner, L. R., 1989: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77, 257–286.

  • Randall, D. A., and Coauthors, 2007: Climate models and their evaluation. Climate Change 2007: The Physical Science Basis, S. Solomon et al., Eds., Cambridge University Press, 589–662.

  • Ray, B. K., and R. S. Tsay, 2002: Bayesian method for change-point detection in long-range dependent processes. J. Time Ser. Anal., 23, 687–705.

  • Reeves, J., J. Chen, X. L. Wang, R. Lund, and Q. Lu, 2007: A review and comparison of changepoint detection techniques for climate data. J. Appl. Meteor. Climatol., 46, 900–915.

  • Robinson, L. F., T. D. Wager, and M. A. Lindquist, 2010: Change point estimation in multi-subject fMRI studies. NeuroImage, 49, 1581–1592.

  • Schwarz, G., 1978: Estimating the dimension of a model. Ann. Stat., 6, 461–464.

  • Swami, D. K., and R. C. Jain, 2006: PAMC: Partitioning around medoids for classification. Inf. Technol. J., 5, 1102–1105.

  • Tai, Y. C., M. N. Kvale, and J. S. Witte, 2010: Segmentation and estimation for SNP microarrays: A Bayesian multiple change-point approach. Biometrics, 66, 675–683.

  • Tartakovsky, A. G., B. L. Rozovskii, R. Blažek, and H. Kim, 2006: Detection of intrusions in information systems by sequential change-point methods. Stat. Methodol., 3, 252–293.

  • Toreti, A., and F. Desiato, 2008: Temperature trend over Italy from 1961 to 2004. Theor. Appl. Climatol., 91, 51–58.

  • Toreti, A., G. Fioravanti, W. Perconti, and F. Desiato, 2009: Annual and seasonal precipitation over Italy from 1961 to 2006. Int. J. Climatol., 29, 1976–1987.

  • Toreti, A., F. G. Kuglitsch, E. Xoplaki, J. Luterbacher, and H. Wanner, 2010a: A novel method for the homogenization of daily temperature series and its relevance for climate change analysis. J. Climate, 23, 5325–5331.

  • Toreti, A., E. Xoplaki, D. Maraun, F. G. Kuglitsch, H. Wanner, and J. Luterbacher, 2010b: Characterisation of extreme winter precipitation in Mediterranean coastal sites and associated anomalous atmospheric circulation patterns. Nat. Hazards Earth Syst. Sci., 10, 1037–1050.

  • Trenberth, K. E., 1990: Recent observed interdecadal climate changes in the Northern Hemisphere. Bull. Amer. Meteor. Soc., 71, 988–993.

  • Viterbi, A. J., 2006: A personal history of the Viterbi algorithm. IEEE Signal Process. Mag., 120, 120–122.

  • Wang, X. L., 2008a: Accounting for autocorrelation in detecting mean shifts in climate data series using the penalized maximal t or F test. J. Appl. Meteor. Climatol., 47, 2423–2444.

  • Wang, X. L., 2008b: Penalized maximal F test for detecting undocumented mean shift without trend change. J. Atmos. Oceanic Technol., 25, 368–384.

  • Wang, X. L., Q. H. Wen, and Y. Wu, 2007: Penalized maximal t test for detecting undocumented mean change in climate data series. J. Appl. Meteor. Climatol., 46, 916–931.

  • Welch, L. R., 2003: Hidden Markov models and the Baum-Welch algorithm. IEEE Inf. Theory Soc. Newsl., 53, 10–13.

  • Whitley, D., 1994: A genetic algorithm tutorial. Stat. Comput., 4, 65–85.

  • Wilks, D. S., 1999: Simultaneous stochastic simulation of daily precipitation, temperature and solar radiation at multiple sites in complex terrain. Agric. For. Meteor., 96, 85–101.

  • Wilks, D. S., 2005: Statistical Methods in the Atmospheric Sciences. Academic Press, 648 pp.

  • Won, K. J., A. Prügel-Bennett, and A. Krogh, 2004: Training HMM structure with genetic algorithm for biological sequence analysis. Bioinformatics, 20, 3613–3619.

  • Xie, Y., J. Yu, and B. Ranneby, 2008: A general autoregressive model with Markov switching: Estimation and consistency. Math. Methods Stat., 17, 228–240.
Save
  • Aguilar, E., I. Auer, M. Brunet, T. C. Peterson, and J. Wieringa, 2003: Guidance on metadata and homogenization. WMO TD 1186, 53 pp.

  • Akaike, H., 1973: Information theory and an extension of the maximum likelihood principle. Proceedings of the Second International Symposium on Information Theory, B. N. Petrov and F. Csádki, Eds., Akadémiai Kiadó, 267–281.

    • Search Google Scholar
    • Export Citation
  • Alexandersson, H., and A. Moberg, 1997: Homogenization of Swedish temperature data. Part I: Homogeneity test for linear trends. Int. J. Climatol., 17, 2534.

    • Search Google Scholar
    • Export Citation
  • Bai, J., and P. Perron, 1998: Estimating and testing linear models with multiple structural changes. Econometrica, 66, 4778.

  • Baker, J., 1987: Reducing bias and inefficiency in the selection algorithm. Proc. Second Int. Conf. on Genetic Algorithms and Their Application, Cambridge, MA, Massachusetts Institute of Technology, 14–21.

    • Search Google Scholar
    • Export Citation
  • Blickle, T., and L. Thiele, 1995: A mathematical analysis of tournament selection. Proc. Sixth Int. Conf. on Genetic Algorithms, Pittsburgh, PA, University of Pittsburgh, 9–16.

    • Search Google Scholar
    • Export Citation
  • Cappé, O., E. Moulines, and T. Rydén, 2005: Inference in Hidden Markov Models. Springer, 672 pp.

  • Caussinus, H., and O. Mestre, 2004: Detection and correction of artificial shifts in climate series. Appl. Stat., 53, 405425.

  • Celeux, G., and J. B. Durand, 2008: Selecting hidden Markov model state number with cross-validated likelihood. Comput. Stat., 23, 541564.

    • Search Google Scholar
    • Export Citation
  • Chambaz, A., A. Garivier, and E. Gassiat, 2009: A minimum description length approach to hidden Markov models with Poisson and Gaussian emissions. Application to order identification. J. Stat. Plann. Inference, 139, 962977.

    • Search Google Scholar
    • Export Citation
  • Davis, R. A., T. C. M. Lee, and G. A. Rodriguez-Yam, 2006: Structural break estimation for nonstationary time series models. J. Amer. Stat. Assoc., 101, 223239.

    • Search Google Scholar
    • Export Citation
  • DeGaetano, A. T., 2006: Attributes of several methods for detecting discontinuities in mean temperature series. J. Climate, 19, 838853.

    • Search Google Scholar
    • Export Citation
  • Easterling, D. R., and T. C. Peterson, 1995: A new method for detecting undocumented discontinuities in climatological time series. Int. J. Climatol., 15, 369377.

    • Search Google Scholar
    • Export Citation
  • Ephraim, Y., and W. J. J. Roberts, 2005: Revisiting autoregressive hidden Markov modeling of speech signals. IEEE Signal Process. Lett., 12, 166169.

    • Search Google Scholar
    • Export Citation
  • Forney, G. D., 1973: The Viterbi algorithm. Proc. IEEE, 61, 268278.

  • Fridlyand, J., A. M. Snijders, D. Pinkel, D. G. Albertson, and A. N. Jain, 2004: Hidden Markov models approach to the analysis of array CGH data. J. Multivariate Anal., 90, 132153.

    • Search Google Scholar
    • Export Citation
  • Frühwirth-Schnatter, S., 2006: Finite Mixture and Markov Switching Models. Springer, 492 pp.

  • Gassiat, E., 2002: Likelihood ratio inequalities with applications to various mixtures. Ann. Inst. Henri Poincaré, 38, 897906.

  • Grünwald, P. D., 2007: The Minimum Description Length Principle. MIT Press, 703 pp.

  • Holland, J. H., 1992: Adaptation in Natural and Artificial Systems. MIT Press, 211 pp.

  • Hubert, P., 1997: Change points in meteorological time analysis. Application of Time Series Analysis in Astronomy and Meteorology, T. Subba Rao, M. B. Priestly, and O. Lessi, Eds., Chapman and Hall, 399–412.

    • Search Google Scholar
    • Export Citation
  • Jann, A., 2006: Genetic algorithms: Towards their use in the homogenization of climatological records. Croatian Meteor. J., 41, 319.

  • Jansen, E., and Coauthors, 2007: Paleoclimate. Climate Change 2007: The Physical Science Basis, S. Solomon et al., Eds., Cambridge University Press, 433–497.

    • Search Google Scholar
    • Export Citation
  • Juang, B. H., and L. R. Rabiner, 1991: Hidden Markov models for speech recognition. Technometrics, 33, 251272.

  • Kehagias, A., 2004: A hidden Markov model segmentation procedure for hydrological and environmental time series. Stochastic Environ. Res. Risk Assess., 18, 117130.

    • Search Google Scholar
    • Export Citation
  • Kehagias, A., and V. Fortin, 2006: Time series segmentation with shifting means hidden Markov models. Nonlinear Processes Geophys., 13, 114.

    • Search Google Scholar
    • Export Citation
  • Kehagias, A., E. Nidelkou, and V. Petridis, 2005: A dynamic programming segmentation procedure for hydrological and environmental time series. Stochastic Environ. Res. Risk Assess., 20, 7794.

    • Search Google Scholar
    • Export Citation
  • Kim, H. J., and D. Siegmund, 1989: The likelihood ratio test for a change point in simple linear regression. Biometrika, 76, 409423.

  • Koop, G., and S. M. Potter, 2009: Prior elicitation in multiple change-point models. Int. Econ. Rev., 50, 751772.

  • Kuglitsch, F. G., A. Toreti, E. Xoplaki, P. M. Della-Marta, J. Luterbacher, and H. Wanner, 2009: Homogenization of daily maximum temperature series in the Mediterranean. J. Geophys. Res., 114, D15108, doi:10.1029/2008JD011606.

    • Search Google Scholar
    • Export Citation
  • Kwong, S., C. W. Chau, K. F. Man, and K. S. Tang, 2001: Optimisation of HMM topology and its model parameters by genetic algorithms. Pattern Recognit., 34, 509522.

    • Search Google Scholar
    • Export Citation
  • Lee, T. C. M., 2001: An introduction to coding theory and the two-part minimum description length principle. Int. Stat. Rev., 69, 169183.

    • Search Google Scholar
    • Export Citation
  • Li, S., and R. Lund, 2012: Multiple changepoint detection via genetic algorithms. J. Climate, 25, 674686.

  • Lu, Q., R. Lund, and T. C. M. Lee, 2010: An MDL approach to the climate segmentation problem. Ann. Appl. Stat., 4, 299319.

  • Lund, R., X. L. Wang, Q. Lu, J. Reeves, C. Gallagher, and Y. Feng, 2007: Changepoint detection in periodic and autocorrelated time series. J. Climate, 20, 51785190.

    • Search Google Scholar
    • Export Citation
  • MacDonald, I. L., and W. Zucchini, 1997: Hidden Markov and Other Models for Discrete-Valued Time Series. Chapman and Hall, 256 pp.

  • MacKay, R. J., 2002: Estimating the order of a hidden Markov model. Can. J. Stat., 30, 573589.

  • Menne, M. J., and C. N. Williams, 2005: Detection of undocumented changepoints using multiple test statistics and composite reference series. J. Climate, 18, 42714286.

    • Search Google Scholar
    • Export Citation
  • Mielke, P. W., K. J. Berry, and G. W. Brier, 1981: Application of multi-response permutation procedures for examining seasonal changes in monthly mean sea level pressure patterns. Mon. Wea. Rev., 109, 120126.

    • Search Google Scholar
    • Export Citation
  • Miranda, P. M. A., and A. R. Tomé, 2009: Spatial structure of the evolution of surface temperature (1951–2004). Climatic Change, 93, 269284.

    • Search Google Scholar
    • Export Citation
  • Moberg, A., and Coauthors, 2006: Indices for daily temperature and precipitation extremes in Europe analyzed for the period 1901–2000. J. Geophys. Res., 111, D22106, doi:10.1029/2006JD007103.

    • Search Google Scholar
    • Export Citation
  • Pawson, S., K. Labitzke, and S. Leder, 1998: Stepwise changes in stratospheric temperature. Geophys. Res. Lett., 25, 21572160.

  • Perreault, L., J. Bernier, B. Bobee, and E. Parent, 2000: Bayesian change-point analysis in hydrometeorological time series. Part 1. The normal model revisited. J. Hydrol., 235, 221241.

    • Search Google Scholar
    • Export Citation
  • Peterson, T. C., and D. R. Easterling, 1994: Creation of homogeneous composite climatological reference series. Int. J. Climatol., 14, 671679.

    • Search Google Scholar
    • Export Citation
  • Rabiner, L. R., 1989: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE, 77, 257286.

  • Randall, D. A., and Coauthors, 2007: Climate models and their evaluation. Climate Change 2007: The Physical Science Basis, S. Solomon et al., Eds., Cambridge University Press, 589–662.

    • Search Google Scholar
    • Export Citation
  • Ray, B. K., and R. S. Tsay, 2002: Bayesian method for change-point detection in long-range dependent processes. J. Time Ser. Anal., 23, 687705.

    • Search Google Scholar
    • Export Citation
  • Reeves, J., J. Chen, X. L. Wang, R. Lund, and Q. Lu, 2007: A review and comparison of changepoint detection techniques for climate data. J. Appl. Meteor. Climatol., 46, 900915.

    • Search Google Scholar
    • Export Citation
  • Robinson, L. F., T. D. Wager, and M. A. Lindquist, 2010: Change point estimation in multi-subject fMRI studies. NeuroImage, 49, 15811592.

    • Search Google Scholar
    • Export Citation
  • Schwarz, G., 1978: Estimating the dimension of a model. Ann. Stat., 6, 461464.

  • Swami, D. K., and R. C. Jain, 2006: PAMC: Partitioning around medoids for classification. Inf. Technol. J., 5, 11021105.

  • Tai, Y. C., M. N. Kvale, and J. S. Witte, 2010: Segmentation and estimation for SNP microarrays: A Bayesian multiple change-point approach. Biometrics, 66, 675683.

    • Search Google Scholar
    • Export Citation
  • Tartakovsky, A. G., B. L. Rozovskii, R. Blažek, and H. Kim, 2006: Detection of intrusions in information systems by sequential change-point methods. Stat. Methodol., 3, 252293.

    • Search Google Scholar
    • Export Citation
  • Toreti, A., and F. Desiato, 2008: Temperature trend over Italy from 1961 to 2004. Theor. Appl. Climatol., 91, 5158.

  • Toreti, A., G. Fioravanti, W. Perconti, and F. Desiato, 2009: Annual and seasonal precipitation over Italy from 1961 to 2006. Int. J. Climatol., 29, 19761987.

    • Search Google Scholar
    • Export Citation
  • Toreti, A., F. G. Kuglitsch, E. Xoplaki, J. Luterbacher, and H. Wanner, 2010a: A novel method for the homogenization of daily temperature series and its relevance for climate change analysis. J. Climate, 23, 53255331.

    • Search Google Scholar
    • Export Citation
  • Toreti, A., E. Xoplaki, D. Maraun, F. G. Kuglitsch, H. Wanner, and J. Luterbacher, 2010b: Characterisation of extreme winter precipitation in Mediterranean coastal sites and associated anomalous atmospheric circulation patterns. Nat. Hazards Earth Syst., 10, 10371050.

    • Search Google Scholar
    • Export Citation
  • Trenberth, K. E., 1990: Recent observed interdecadal climate changes in the Northern Hemisphere. Bull. Amer. Meteor. Soc., 71, 988993.

    • Search Google Scholar
    • Export Citation
  • Viterbi, A. J., 2006: A personal history of the Viterbi algorithm. IEEE Signal Process. Mag., 120, 120122.

  • Wang, X. L., 2008a: Accounting for autocorrelation in detecting mean shifts in climate data series using the penalized maximal t or F test. J. Appl. Meteor. Climatol., 47, 24232444.

    • Search Google Scholar
    • Export Citation
  • Wang, X. L., 2008b: Penalized maximal F test for detecting undocumented mean shift without trend change. J. Atmos. Oceanic Technol., 25, 368384.

    • Search Google Scholar
    • Export Citation
  • Wang, X. L., Q. H. Wen, and Y. Wu, 2007: Penalized maximal t test for detecting undocumented mean change in climate data series. J. Appl. Meteor. Climatol., 46, 916931.

    • Search Google Scholar
    • Export Citation
  • Welch, L. R., 2003: Hidden Markov models and the Baum-Welch algorithm. IEEE Inf. Theory Soc. Newsl., 53, 1013.

  • Whitley, D., 1994: A genetic algorithm tutorial. Stat. Comput., 4, 6585.

  • Wilks, D. S., 1999: Simultaneous stochastic simulation of daily precipitation, temperature and solar radiation at multiple sites in complex terrain. Agric. For. Meteor., 96, 85101.

    • Search Google Scholar
    • Export Citation
  • Wilks, D. S., 2005: Statistical Methods in the Atmospheric Sciences. Academic Press, 648 pp.

  • Won, K. J., A. Prügel-Bennett, and A. Krogh, 2004: Training HMM structure with genetic algorithm for biological sequence analysis. Bioinformatics, 20, 36133619.

    • Search Google Scholar
    • Export Citation
  • Xie, Y., J. Yu, and B. Ranneby, 2008: A general autoregressive model with Markov switching: Estimation and consistency. Math. Methods Stat., 17, 228240.

    • Search Google Scholar
    • Export Citation