## Abstract

This study integrates a Box–Cox power transformation procedure into a common trend two-phase regression-model-based test (the extended version of the penalized maximal *F* test, or “PMFred,” algorithm) for detecting changepoints to make the test applicable to non-Gaussian data series, such as nonzero daily precipitation amounts or wind speeds. The detection-power aspects of the transformed method (transPMFred) are assessed by a simulation study that shows that this new algorithm is much better than the corresponding untransformed method for non-Gaussian data; the transformation procedure can increase the hit rate by up to ∼70%. Examples of application of this new transPMFred algorithm to detect shifts in real daily precipitation series are provided using nonzero daily precipitation series recorded at a few stations across Canada that represent very different precipitation regimes. The detected changepoints are in good agreement with documented times of changes for all of the example series. This study clarifies that it is essential for homogenization of daily precipitation data series to test the nonzero precipitation amount series and the frequency series of precipitation occurrence (or nonoccurrence) separately. The new transPMFred can be used to test the series of nonzero daily precipitation (which are non-Gaussian and positive), and the existing PMFred algorithm can be used to test the frequency series. A software package for using the transPMFred algorithm to detect shifts in nonzero daily precipitation amounts has been developed and made freely available online, along with a quantile-matching (QM) algorithm for adjusting shifts in nonzero daily precipitation series, which is applicable to all positive data. A similar QM algorithm has also been developed for adjusting Gaussian data such as temperatures.
It is noticed that frequency discontinuities are often inevitable because of changes in the measuring precision of precipitation, and that they could complicate the detection of shifts in nonzero daily precipitation data series and void any attempt to homogenize the series. In this case, one must account for all frequency discontinuities before attempting to adjust the measured amounts. This study also proposes approaches to account for detected frequency discontinuities, for example, to fill in the missed measurements of small precipitation or the missed reports of trace precipitation. It stresses the importance of testing the homogeneity of the frequency series of reported zero precipitation and of various small precipitation events, along with testing the series of daily precipitation amounts that are larger than a small threshold value, varying the threshold over a set of small values that reflect changes in measuring precision over time.

## 1. Introduction

Climate data series usually contain artificial shifts due to inevitable changes in observing instrument (or observer), location, environment, and observing practices/procedures taking place during the period of data collection. Data discontinuities also arise from the continuously evolving technology of climate monitoring. It is important to detect artificial changepoints in climate data series, because these artificial changes could considerably bias the results of climate trends, variability, and extremes analysis.

Many changepoint detection methods have been developed (Wang et al. 2007; Wang 2003, 2008a,b; Vincent 1998; Alexandersson 1986; among many others). However, most of the commonly used methods (including those listed above) assume that the data are normally distributed, and those that do not need the normality assumption, such as nonparametric methods (see Reeves et al. 2007) or empirical likelihood-based methods, are at best comparable with a method based on the normality assumption when the data can be transformed to approximate a normal distribution well. However, the normality assumption is often invalid for daily precipitation, which is one of the most important climate variables. Apart from its departure from normality, daily precipitation is not a continuous variable. Therefore, the most common approach to modeling daily precipitation has been to model the occurrence (or nonoccurrence) process and the distribution of the nonzero amounts separately (Woolhiser 1992; Katz and Parlange 1993, 1996; Wang and Cho 1997; among others).

For homogenization of daily precipitation data series, it is essential to model the occurrence process and the nonzero amounts separately. Otherwise, estimates of adjustments needed for nonzero amounts would be biased; in particular, adding an adjustment value to all days, including days of zero precipitation, will make all days of reported zero precipitation disappear from the series or will result in negative daily precipitation amounts. Changes in measuring device usually would not introduce any sudden change in the reported zero precipitation, unless they are accompanied by a change in the measuring precision (which could affect the frequency of small amounts measured and hence the frequency of reported precipitation occurrence or nonoccurrence). One should not change the zeros in a daily precipitation series, unless there are corresponding reports of trace occurrence or there is a change in the measuring precision. In the latter cases, the amount to be added to some of the days of reported zero precipitation should be very small, not exceeding the smallest measured amount (i.e., the highest measuring precision during the period of data record).

In a data homogenization context, the precipitation occurrence process can be represented by the frequency of precipitation occurrence (or nonoccurrence). For series of monthly or annual relative frequency (i.e., count divided by the total number of observations in the month or year) or a logistic transformation of the count series (Wang 2006), normality is not a big concern. Some existing methods, such as the penalized maximal *F* (PMF) test and its extended version PMFred (which accounts for the first-order autocorrelation; Wang 2008a,b), can be used to test the homogeneity of the frequency series. For the nonzero daily amounts series, however, a more data-adaptive transformation is necessary (more details are given in section 3 below).
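This separation can be illustrated concretely. The sketch below splits a daily series into a monthly relative-frequency series of precipitation occurrence and a series of the nonzero amounts; it is a minimal illustration, and the function name is ours rather than part of any RHtests package:

```python
# Sketch: splitting a daily precipitation record into the two series that
# are tested separately, namely a monthly relative-frequency series of
# precipitation occurrence and the series of nonzero daily amounts.
# The function name is illustrative, not part of the RHtests packages.

def split_precip_series(daily_amounts, days_per_month=30):
    """Return (monthly_rel_freq, nonzero_amounts) from a daily series."""
    nonzero_amounts = [a for a in daily_amounts if a > 0.0]
    monthly_rel_freq = []
    for start in range(0, len(daily_amounts), days_per_month):
        month = daily_amounts[start:start + days_per_month]
        wet_days = sum(1 for a in month if a > 0.0)
        monthly_rel_freq.append(wet_days / len(month))
    return monthly_rel_freq, nonzero_amounts

# usage: 60 days with rain every third day
demo = [1.2 if i % 3 == 0 else 0.0 for i in range(60)]
freq, amounts = split_precip_series(demo)
```

The frequency series (or a logistic transformation of the counts) would then go to PMFred, and the nonzero-amounts series to the transformed test developed below.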

The main objective of this study is to develop a method for detecting changepoints in series of nonzero daily precipitation amounts, which are positive and typically non-Gaussian. The focus here is on integrating a Box–Cox power transformation procedure into the PMFred algorithm, a common trend two-phase regression-model-based test for detecting changepoints (Wang 2003, 2008a,b). However, with minor modification, the transformation approach is also applicable to other two-phase regression-model settings, as detailed in section 2 below, and to other positive data (such as nonzero wind speeds). This study also proposes methods for adjustment of shifts in nonzero daily precipitation and other series of positive values, one of which can also be used for Gaussian data with slight modification (see section 5 below).

The paper is arranged as follows. Section 2 briefly reviews the maximal *F* tests for changepoint detection. Section 3 describes the new changepoint detection procedure with details. Section 4 reports the results of the detection power assessment. Section 5 describes the newly proposed methods for adjustment of shifts. Section 6 gives examples of application of the new algorithm to real daily precipitation data series from different precipitation regimes across Canada. We complete this article with some concluding remarks in section 7.

## 2. The maximal *F* tests for changepoint detection

Two maximal *F* tests for changepoint detection have been developed: the test of Lund and Reeves (2002) for a mean shift that may be accompanied by a trend change (TPR4 test) and the test of Wang (2003) for a mean shift that is not accompanied by a trend change (TPR3 test). Here, TPR indicates two-phase regression. We briefly review these tests in this section.

Let {*X _{i}*, *i* = 1, … , *N*} denote a data series observed at times *t*_{1} < · · · < *t _{i}* < · · · < *t _{N}*. Assuming that there is at most one changepoint in this time series, the following two-phase regression model has been developed to test if there is a sudden change at time *t _{c}*:

*X _{i}* = *μ*_{1} + *β*_{1}*t _{i}* + *ε _{i}* for *i* = 1, … , *c*, and *X _{i}* = *μ*_{2} + *β*_{2}*t _{i}* + *ε _{i}* for *i* = *c* + 1, … , *N*, (1)

where the errors {*ε _{i}*} are usually assumed to be identically and independently distributed (IID) normal random variables with 0 mean and common variance *σ*^{2} (Lund and Reeves 2002; Wang 2003). Denote ** θ** = (*μ*_{1}, *μ*_{2}, *β*_{1}, *β*_{2}) and the expectation of *X _{i}* by *E*(*X _{i}*) = *μ*(*t _{i}*, *c*, ** θ**); that is,

*μ*(*t _{i}*, *c*, ** θ**) = *μ*_{1} + *β*_{1}*t _{i}* for *i* ≤ *c*, and *μ*(*t _{i}*, *c*, ** θ**) = *μ*_{2} + *β*_{2}*t _{i}* for *i* > *c*. (2)

If (*μ*_{1}, *β*_{1}) ≠ (*μ*_{2}, *β*_{2}) for any *c* ∈ {*N*_{min}, *N*_{min} + 1, … , *N* − *N*_{min}} (*N*_{min} is the preset minimum segment length; the series can be split further only if its length *N* ≥ 2*N*_{min}), there exists a sudden change at time *t _{c}* (between *t _{c}* and *t*_{c+1}), in which case *t _{c}* is called a changepoint (and *c* is the changepoint parameter) and

*X _{i}* ∼ *N*(*μ*(*t _{i}*, *c*, ** θ**), *σ*^{2}),

where *N*(*μ*, *σ*^{2}) denotes a normal distribution with mean *μ* and variance *σ*^{2}.

Because of incomplete or inaccurate metadata, the changepoint time *t _{c}* is often unknown. Two maximal *F* tests have been developed to estimate the changepoint time *t _{c}* (or the parameter *c*) and to test for its statistical significance. These are to test the full model (1) against the null model,

*X _{i}* = *μ* + *βt _{i}* + *ε _{i}* for *i* = 1, … , *N*, (3)

using the following *F*_{max} statistic:

*F*_{max} = max_{*N*_{min} ≤ *c* ≤ *N* − *N*_{min}} *F _{c}*, with *F _{c}* = {[*S*_{0}^{2} − *S*^{2}(*c*)]/*κ _{d}*}/[*S*^{2}(*c*)/(*N* − *κ*)],

where *S*_{0}^{2} denotes the sum of residual squares of the null model (3) fit and *S*^{2}(*c*) denotes the sum of residual squares of the full model (1) fit with a changepoint at time *t _{c}*; in addition, *κ* is the number of free regression parameters under the full model (1) and *κ _{d}* is equal to the difference in the number of free regression parameters between the full and null models (Seber 1977; Lund and Reeves 2002). Note that the distribution function of *F*_{max} is complicated and unknown; hence, the *F*_{max} threshold values are estimated through Monte Carlo simulations (Lund and Reeves 2002; Wang 2003). As was shown in Wang (2008a), these empirical threshold values not only depend on *N*, *κ _{d}*, and *κ*, but they also depend on the minimum segment length *N*_{min}. Here, *N*_{min} ≥ 2 because there are two regression parameters (*μ*, *β*) in the null model (3). We set *N*_{min} = 10 in this study, so that we can use the *F*_{max} critical values given in Wang (2008a) and in the “RHtestsV3” software package (Wang and Feng 2010) and its previous version, which took several months of CPU time to obtain. Thus, there are at least 10 data values available for estimating the two regression parameters (*μ*, *β*).

The integer values *κ _{d}* and *κ* depend on the model settings. For the setting in which *μ*_{1} ≠ *μ*_{2} and/or *β*_{1} ≠ *β*_{2} (i.e., a mean shift that may be accompanied by a trend change), *κ _{d}* = 2 and *κ* = 4, and the related maximal *F* test (Lund and Reeves 2002) is referred to as the TPR4 test. Different constraints on the regression parameters may be imposed to meet various environmental and application conditions. For the cases in which *μ*_{1} ≠ *μ*_{2} but *β*_{1} = *β*_{2} = *β* (a mean shift without an accompanying trend change), *κ _{d}* = 1 and *κ* = 3, and the corresponding maximal *F* test (Wang 2003) is referred to as the TPR3 test. The latter tests the null model (3) against the following full model:

*X _{i}* = *μ*_{1} + *βt _{i}* + *ε _{i}* for *i* = 1, … , *c*, and *X _{i}* = *μ*_{2} + *βt _{i}* + *ε _{i}* for *i* = *c* + 1, … , *N*. (4)

For the case in which *β*_{1} ≠ *β*_{2} and *μ*_{2} = *μ*_{1} + (*β*_{1} − *β*_{2})*t _{c}* (i.e., the regression function is continuous at the changepoint *c*; Solow 1987), *κ _{d}* = 1 and *κ* = 3; in addition, the corresponding maximal *F* test is to test for a sudden trend change without an accompanying mean shift, that is, to test the full model

*X _{i}* = *μ*_{1} + *β*_{1}*t _{i}* + *ε _{i}* for *i* = 1, … , *c*, and *X _{i}* = *μ*_{1} + (*β*_{1} − *β*_{2})*t _{c}* + *β*_{2}*t _{i}* + *ε _{i}* for *i* = *c* + 1, … , *N*, (5)

against the null model (3). The empirical *F*_{max} percentiles of Wang (2003) are applicable in this case, because the number of free regression parameters in the full model is also 3 in this setting of Solow (1987). Readers are referred to Reeves et al. (2007) for a comprehensive review and comparison of these maximal *F* tests.

Note that if *c* is fixed and known (no need to estimate *c* statistically), the standard *F* test should be used to test for the statistical significance of the known/documented changepoint at time *t _{c}*; that is, to compare the *F _{c}* statistic with its critical value from *F*_{κd,N−κ}, a central *F* distribution with *κ _{d}* and (*N* − *κ*) degrees of freedom (e.g., Wang 2008a). This is a much easier case, which is not the focus of this study, although a similar search for the best data transformation (see section 3) can be performed while accounting for the known changepoint *c*.

The above changepoint detection methods require the following core assumptions: 1) independent normality: the errors {*ε _{i}*} are IID normally distributed; 2) constant variance: {*X _{i}*} have the common variance *σ*^{2} across the time period examined; 3) single changepoint: {*X _{i}*} experience at most one changepoint over the time period examined; and 4) piecewise linearity: the expectation of *X _{i}* is linear in time *t _{i}*, respectively, before and after the putative changepoint *c*. Violation of these model assumptions can severely compromise the validity and efficiency of the detection procedures and can even cause them to break down entirely. As many researchers and practitioners have recognized, these assumptions are often not met in climate applications.

It has been noticed that, as a result of the effect of unequal sample sizes, the distribution of false-alarm rate (FAR) is W shaped for the TPR3 test and U shaped for the TPR4 test (Wang 2008b). To even out the uneven FAR distribution of the TPR3 test, Wang (2008b) has proposed the PMF test, a penalized version of the TPR3 test. [In a similar way, a penalized version of the TPR4 test is being developed (X. L. Wang 2010, unpublished manuscript).] Wang (2008a) further extended this PMF test to account for the first-order autocorrelation in the series being tested, proposing the PMFred algorithm, which also includes a stepwise testing algorithm for detecting multiple changepoints in a series (note that the stepwise algorithm does not test the null hypothesis of no changepoint against the alternative of one or more changepoints; rather, it repeats the test of the no-changepoint null hypothesis against the alternative of at most one changepoint in a segment of the series, with the segments being determined by the results of the preceding tests; see Wang 2008a).

In the next section, we propose to integrate a data transformation procedure into the PMFred algorithm (Wang 2008a) to diminish the effects of departure from normality on the test results. Note that this data transformation procedure is also applicable to other positive data such as nonzero wind speeds and to other two-phase regression-model-based tests, such as the TPR4 test or the maximal *F* test for data of structure as described in Solow (1987). The transformation procedure is also applicable to monthly total precipitation data series, but zero amounts need not be treated separately in this case. However, a simple logarithmic transformation is often sufficient for monthly total precipitation series, as is recommended in the RHtestsV3 user manual (Wang and Feng 2010).

## 3. The transPMFred algorithm

Let {*Y _{i}*, *i* = 1, … , *N*} be a climate data series observed at times *t _{i}* (*t*_{1} < · · · < *t _{N}*). The main idea of the proposed changepoint detection procedure is to first seek an appropriate transformation, say *h*(·), such that the transformed data series {*h*(*Y _{i}*)} comes near to meeting the core model assumptions, namely, the assumptions of normality, constant variance, single changepoint, and piecewise linearity. The changepoint detection is then performed on the transformed data.

Selection of a transformation obviously needs to be based on the data, because different climate data series examined may have different distributions, and especially distribution shapes, and hence require different transformations so that the transformed data nearly satisfy the normality assumption. Following common modeling practice and techniques, a transformation is to be selected from a family that is indexed by a scalar parameter, say *h*(*y*; *λ*), where *h* is a specific function and *λ* is called a transformation parameter. The transformation family is expected to possess the following three properties:

1) For changepoint detection, the transformation must guarantee that none of the information on changepoints contained in the data is altered or modified by the transformation; for each fixed *λ*, *h*(*y*; *λ*) must be a one-to-one function in *y*.

2) The transformation family is easy to handle mathematically; to be specific, the transformation does not bring in intractable computational difficulties. Common requirements include differentiability of *h* with respect to *y* for each fixed *λ*.

3) The transformation family is rich and flexible enough to allow a selection to attain a good approximation to normality.

Because nonzero daily precipitation data are positive and typically highly skewed, we recommend using the Box–Cox power transformation (Box and Cox 1964), which is defined as follows: for a response *Y _{i}* > 0,

*h*(*Y _{i}*; *λ*) = (*Y _{i}^{λ}* − 1)/*λ* for *λ* ≠ 0, and *h*(*Y _{i}*; *λ*) = log*Y _{i}* for *λ* = 0.

It is seen that the power transformation family possesses properties 1–3. Given a positive data series {*Y _{i}*}, the transformed variable *X _{i}* = *h*(*Y _{i}*; *λ*) in the common trend two-phase regression model (4) is expected to have the probability density function

*f*(*x*) = (2*πσ*^{2})^{−1/2} exp{−[*x* − *μ*(*t _{i}*, *c*, ** θ**)]^{2}/(2*σ*^{2})} (6)

for *i* = 1, 2, … , *N*, where ** θ** = (*μ*_{1}, *μ*_{2}, *β*, *σ*^{2}), as defined before. Noting that the transformation *x* = *h*(*y*; *λ*) has Jacobian *dx*/*dy* = *y*^{λ−1}, in terms of the original scale, *Y _{i}* is expected to have the probability density function

*f*(*y*) = (2*πσ*^{2})^{−1/2} exp{−[*h*(*y*; *λ*) − *μ*(*t _{i}*, *c*, ** θ**)]^{2}/(2*σ*^{2})} *y*^{λ−1}. (7)
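The transformation itself is a one-liner in either branch; a minimal sketch (the function name is illustrative):

```python
# A direct transcription of the Box-Cox power transformation used here,
# valid for positive responses y > 0 (function name illustrative).
import math

def box_cox(y, lam):
    """h(y; lambda) = (y**lam - 1)/lam for lam != 0, log(y) for lam == 0."""
    if y <= 0:
        raise ValueError("the Box-Cox transformation requires y > 0")
    if lam == 0.0:
        return math.log(y)
    return (y ** lam - 1.0) / lam
```

For every fixed *λ* the map is strictly increasing in *y* (property 1), and it is continuous in *λ* because (*y*^{λ} − 1)/*λ* → log *y* as *λ* → 0.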

For a given transformation parameter *λ*, let *S*^{2}(*c*|*λ*) denote the sum of residual squares of the least squares fit to the transformed responses *X _{i}* = *h*(*Y _{i}*; *λ*) with a putative changepoint at time *t* = *t _{c}*. Let *P*(*c*) denote the penalty function proposed in Wang (2008b) for the TPR3 setting, and let “*ĉ*|*λ*” denote the most probable changepoint in the transformed data series {*X _{i}*} = {*h*(*Y _{i}*; *λ*)}. Then, the PMF test statistic can be expressed as

PF_{max}(*λ*) = *P*(*ĉ*|*λ*)*F*(*ĉ*|*λ*) = max_{*N*_{min} ≤ *c* ≤ *N* − *N*_{min}} *P*(*c*)*F*(*c*|*λ*).

The profile log-likelihood function for *λ*, up to a constant, can be defined as

*l*(*ĉ*|*λ*) = −(*N*/2) log*S*^{2}(*ĉ*|*λ*) + (*λ* − 1) Σ_{i=1}^{N} log*Y _{i}*.

The number of free parameters in a multiphase common trend regression model with *M _{c}* mean shifts is *κ* = (2 + *M _{c}*); thus, *κ* = 3 for the single-mean-shift full model (4). This log-likelihood function has been maximized over all admissible values of *c* (*c* = *N*_{min}, *N*_{min} + 1, … , *N* − *N*_{min}), given the transformation parameter *λ*.
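Combining the least squares fit on the transformed scale with the Jacobian term (*λ* − 1)Σ log*Y _{i}* gives the profile log-likelihood for a trial *λ*. The sketch below drops additive constants and omits the penalty function of the PMF test; all names are illustrative:

```python
# Sketch of the profile log-likelihood for a trial lambda: least squares
# fit of a common trend plus mean shift on the transformed scale, plus the
# Jacobian term (lambda - 1) * sum(log Y_i). Additive constants are
# dropped and the PMF penalty is omitted; names are illustrative.
import numpy as np

def transform(y, lam):
    return np.log(y) if lam == 0.0 else (y ** lam - 1.0) / lam

def profile_loglik(y, c, lam):
    """l(c | lambda) up to a constant, for a mean shift at position c."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    x = transform(y, lam)
    t = np.arange(1, n + 1, dtype=float)
    A = np.column_stack([np.ones(n), (t > c).astype(float), t])
    s2 = np.sum((x - A @ np.linalg.lstsq(A, x, rcond=None)[0]) ** 2)
    return -0.5 * n * np.log(s2) + (lam - 1.0) * np.sum(np.log(y))

# usage: lognormal data with a shift at i = 100 should favor lambda = 0
rng = np.random.default_rng(1)
z = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(1.5, 1.0, 100)])
y_demo = np.exp(z)
ll_log = profile_loglik(y_demo, 100, 0.0)
ll_raw = profile_loglik(y_demo, 100, 1.0)
```

For the lognormal demonstration series, the log transform (*λ* = 0) yields a larger profile log-likelihood than no transformation (*λ* = 1), as expected.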

For any admissible changepoint parameter *c*, the pseudo–maximum likelihood estimate *λ̂* for the transformation parameter *λ* is such that

*l*(*ĉ*|*λ̂*) = max_{*λ* ∈ [*a*, *b*]} *l*(*ĉ*|*λ*),

where *a* and *b* are two specific numbers. In applications, *a* = −1 and *b* = 1 are commonly recommended and are found to be sufficient for the maximizer *λ̂* to occur within the interval [*a*, *b*].

In view of modeling, the proposed changepoint detection procedure can be considered to be a changepoint detection procedure in the model (7) with *λ* being treated as a model parameter. In other words, for the observed responses, there is a *λ* such that the transformed data {*X _{i}*} = {*h*(*Y _{i}*; *λ*)} approximately meet the assumptions of normality, constant variance, single changepoint, and piecewise linearity, so that the changepoint detection can be performed on the transformed data series {*X _{i}*}.

It has been a common practice in applications to reexpress the observed data on a new measurement scale in order to employ a standard statistical inference procedure, a practice that dates back at least to 1964, when Box and Cox (1964) was published (Atkinson 1985). Chen and Loh (1992) and Chen (1995) show that the transformed tests are usually more efficient and have greater testing power within the context of large samples (such as daily data). A drawback of the transformed analysis is that it causes difficulties in interpreting the conclusions in the new scale with *λ̂* [perhaps we may feel comfortable only with *λ̂* = −1, −½, 0, or ½ to interpret the results in terms of *X _{i}* = *h*(*Y _{i}*; *λ̂*)]. Fortunately, this drawback does not show up in the present problem of changepoint detection, because we are concerned only with the question of whether a changepoint occurs in the climate data during the time period examined.

In applications, the estimate *λ̂* is usually approximated by an exhaustive search algorithm. For convenience, a practical algorithm to search for the best *λ* value is outlined below. Let *a* = *λ*_{1} < *λ*_{2} < · · · < *λ _{J}* = *b* be a grid of *λ* over the interval [*a*, *b*], with increment *δ*; that is, *λ _{j}* = *a* + (*j* − 1)*δ* for *j* = 1, 2, … , *J*, where *J* = [(*b* − *a*)/*δ* + 1]. To save computation time, we carry out four rounds of searching over the interval [−1.0, 1.0] in this study. In the first round, we search over three trial values of *λ*, {−1.0, 0.0, 1.0} (i.e., set *a* = −1.0, *b* = 1.0, and *δ* = 1.0), obtaining the first estimate of *λ*, say *λ̂*^{o}. We refine this estimate by setting *a*^{o} = *λ̂*^{o} − *δ* and *b*^{o} = *λ̂*^{o} + *δ* (and if *a*^{o} < *a*, we reset *a*^{o} = *a*; if *b*^{o} > *b*, we reset *b*^{o} = *b*) and *δ*^{o} = 0.5, to search over the *J*^{o} = [(*b*^{o} − *a*^{o})/*δ*^{o} + 1] trial values of *λ*: *λ _{j}* = *a*^{o} + (*j* − 1)*δ*^{o} for *j* = 1, 2, … , *J*^{o}. For example, if *λ̂*^{o} = 0.0, the second set of *J*^{o} = 5 trial values of *λ* is {−1.0, −0.5, 0.0, 0.5, 1.0} (here only −0.5 and 0.5 are new trial values, for which the calculations need to be done). In a similar way, we carry out two more rounds of searching, with *δ*^{o} = 0.2 and *δ*^{o} = 0.1, respectively. The resulting final estimate of *λ* is of one-decimal-place precision; the calculations are repeated up to 11 times (instead of the 21 times required by a direct search over the 21 trial values −1.0, −0.9, … , 0.9, 1.0). If desired, further rounds of searching can be performed to refine the estimate further (e.g., from 0.1 to 0.01 precision, which would double the calculation time). Our experiments indicate that only the second decimal place of the test statistic could be affected by the *λ* precision change from 0.1 to 0.01, which would only affect detection of marginally significant changepoints. A *λ* value of one-decimal-place precision is often sufficient. Thus, we decided to perform only four rounds of searching here and in the related software package RHtests_dlyPrcp, providing *λ* estimates of one-decimal-place precision.
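The four-round coarse-to-fine search can be sketched as follows. Here `score(lam)` stands for the profile log-likelihood of a trial *λ*; the usage example substitutes a simple quadratic with a known maximizer, and all names are illustrative:

```python
# Sketch of the four-round coarse-to-fine search for the best lambda:
# delta = 1.0, 0.5, 0.2, 0.1 over [a, b] = [-1, 1], each round searching
# a grid of width +/- (previous delta) around the current best value and
# reusing evaluations already made. 'score' stands for the profile
# log-likelihood l(c_hat | lambda); all names are illustrative.

def search_lambda(score, a=-1.0, b=1.0, deltas=(1.0, 0.5, 0.2, 0.1)):
    """Return (best_lambda, number_of_evaluations)."""
    cache = {}
    def ev(lam):
        if lam not in cache:          # evaluate each trial value only once
            cache[lam] = score(lam)
        return cache[lam]
    best, prev_delta = None, None
    for delta in deltas:
        lo = a if best is None else max(a, best - prev_delta)
        hi = b if best is None else min(b, best + prev_delta)
        n_grid = int(round((hi - lo) / delta)) + 1
        grid = [round(lo + j * delta, 10) for j in range(n_grid)]
        best = max(grid, key=ev)
        prev_delta = delta
    return best, len(cache)

# usage: a quadratic stand-in peaking at lambda = 0.3
best_lam, n_evals = search_lambda(lambda lam: -(lam - 0.3) ** 2)
```

With the quadratic stand-in peaking at 0.3, the search returns 0.3 after exactly 11 evaluations, matching the count given above.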

The smaller the *λ* value is, the shorter is the upper tail of the distribution; data of a negative *λ* have a shorter upper tail than those of a positive *λ*. We noticed that for Canadian nonzero (i.e., >0) daily precipitation data the best *λ* values range from −0.2 to 0.2, with −0.2 for the high Arctic stations (Alert and Resolute in Nunavut, Canada), −0.1 for central Canada, 0.0 for the region from the Great Lakes area eastward to the east coast, and 0.2 for the southwest-coastal stations (e.g., Port Hardy, British Columbia). For 11 series of nonzero daily precipitation data from stations in different regions of China, the best *λ* values range from −0.1 to 0.1, with *λ* = 0.0 for 8 of the 11 series. We also noticed that the best *λ* value is 0.1 for two series from Turkey, 0.0 for one series from Jordan, and 0.1 and 0.2 for two series from Congo (a very wet area), respectively. (We are not able to estimate the *λ* value range for other areas because we do not have access to the daily precipitation data series.) From these results we speculate that negative *λ* values are typical for regions where precipitation is dominantly light, such as the Arctic and subarctic regions where more than one-third of the daily amounts are less than 1 mm and the daily amount rarely exceeds 25 mm (for the high Arctic station Alert, one-half of the daily amounts are less than 1 mm and the daily amount rarely exceeds 20 mm). However, when the series of daily precipitation greater than 3 mm (at The Pas, Manitoba, Canada) was tested, the best *λ* value was −0.6 (see section 6 for the reasons for testing such a series). These results indicate that it is necessary to search for the best *λ* value over the interval [−1.0, 1.0], allowing the comparison with the case of no transformation [i.e., *λ* = 1.0, in which case the results of testing the transformed series {*X _{i}*} = {*h*(*Y _{i}*; 1)} = {*Y _{i}* − 1} will be the same as testing the original series {*Y _{i}*}, because {*X _{i}*} and {*Y _{i}*} share exactly the same shape of distribution]. Note that we also include the most common case, *λ* = 0.0, up front in our first round of searching for the best *λ*.

Next, we outline a practical algorithm to perform the transformed changepoint detection procedure. Here, we focus on data from the TPR3 setting (i.e., common trend series: *β*_{1} = *β*_{2} = *β*). We describe the procedure for series of at most one changepoint and of multiple changepoints, subsequently.

For common trend series of at most one changepoint (mean shift), the detection procedure is as follows: In step A, for each trial transformation parameter *λ _{j}* (1 ≤ *j* ≤ *J*), the transformed series {*X _{i}^{j}*} = {*h*(*Y _{i}*; *λ _{j}*)} is obtained:

*X _{i}^{j}* = (*Y _{i}^{λj}* − 1)/*λ _{j}* for *λ _{j}* ≠ 0, and *X _{i}^{j}* = log*Y _{i}* for *λ _{j}* = 0.

In step B, the PMFred algorithm (Wang 2008a) is used to find the most probable changepoint in the transformed series {*X _{i}^{j}*} for each *λ _{j}*. This step involves computing *S*_{0}^{2}(*λ _{j}*) and *S*^{2}(*c*|*λ _{j}*) for each admissible *c* ∈ {*N*_{min}, *N*_{min} + 1, … , *N* − *N*_{min}} and finding *ĉ _{j}* such that

*P*(*ĉ _{j}*)*F*(*ĉ _{j}*|*λ _{j}*) = max_{*N*_{min} ≤ *c* ≤ *N* − *N*_{min}} *P*(*c*)*F*(*c*|*λ _{j}*).

It also involves computing the log-likelihood function

*l*(*ĉ _{j}*|*λ _{j}*) = −(*N*/2) log*S*^{2}(*ĉ _{j}*|*λ _{j}*) + (*λ _{j}* − 1) Σ_{i=1}^{N} log*Y _{i}*.

In step C, the profile log-likelihood function *l*(*ĉ _{j}*|*λ _{j}*) is maximized over all *j* ∈ {1, 2, … , *J*}, to find *λ̂* (and *ĉ*) such that

*l*(*ĉ _{m}*|*λ _{m}*) = max_{1 ≤ *j* ≤ *J*} *l*(*ĉ _{j}*|*λ _{j}*),

where *m* is a fixed integer between 1 and *J* (inclusive). Now *λ̂* = *λ _{m}* represents the best transformation among all of the trial transformations (*λ*_{1}, *λ*_{2}, … , *λ _{J}*), and *ĉ* = *ĉ _{m}* is the most probable changepoint in both the transformed series {*X _{i}^{m}*} = {*h*(*Y _{i}*; *λ _{m}*)} and the original series {*Y _{i}*}. Last, in step D, the PF_{max} value

PF_{max} = *P*(*ĉ*)*F*(*ĉ*|*λ _{m}*)

is compared with its critical value corresponding to the nominal significance level, to determine whether *ĉ* is a statistically significant changepoint. As noted before, some critical values of the PF_{max} for the TPR3 setting are given in Wang (2008a), with the lag-1 autocorrelation of the series being taken into account.
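Steps A–C can be sketched as a single driver. Because *F*(*c*) increases as *S*^{2}(*c*) decreases (for a fixed *S*_{0}^{2}), the sketch locates the most probable changepoint by minimizing the residual sum of squares; the penalty factor *P*(*c*), the autocorrelation handling, and the step-D significance test are omitted, so this is a simplified sketch rather than the RHtests_dlyPrcp implementation:

```python
# Sketch of steps A-C for the single-changepoint case: transform with each
# trial lambda (step A), locate the most probable changepoint by minimizing
# S^2(c | lambda) on the transformed series (step B, without the penalty
# P(c)), and keep the lambda with the largest profile log-likelihood
# (step C). Step D (significance) is omitted; names are illustrative.
import numpy as np

def transform(y, lam):
    return np.log(y) if lam == 0.0 else (y ** lam - 1.0) / lam

def best_lambda_and_changepoint(y, lambdas, n_min=10):
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(1, n + 1, dtype=float)
    log_jac = np.sum(np.log(y))
    best = (-np.inf, None, None)               # (loglik, lambda, c_hat)
    for lam in lambdas:                        # step A
        x = transform(y, lam)
        s_best, c_best = np.inf, None
        for c in range(n_min, n - n_min + 1):  # step B
            A = np.column_stack([np.ones(n), (t > c).astype(float), t])
            s2 = np.sum((x - A @ np.linalg.lstsq(A, x, rcond=None)[0]) ** 2)
            if s2 < s_best:
                s_best, c_best = s2, c
        ll = -0.5 * n * np.log(s_best) + (lam - 1.0) * log_jac
        if ll > best[0]:                       # step C
            best = (ll, lam, c_best)
    return best[1], best[2]

# usage: lognormal series with a large shift in log space after i = 150
rng = np.random.default_rng(3)
z = np.concatenate([rng.normal(0.0, 1.0, 150), rng.normal(4.0, 1.0, 150)])
lam_hat, c_hat = best_lambda_and_changepoint(np.exp(z), [-0.5, 0.0, 0.5, 1.0])
```

For the strongly lognormal demonstration series, the log transform is selected and the detected changepoint lands at the inserted shift.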

For common trend series that may contain more than one mean shift (multiple changepoints), the detection procedure is as follows: In step 1, for each trial transformation parameter *λ _{j}* (1 ≤ *j* ≤ *J*), the transformed series {*X _{i}^{j}*} = {*h*(*Y _{i}*; *λ _{j}*)} is obtained. In step 2, the PMFred algorithm (Wang 2008a) is used to identify all significant changepoints in the transformed series {*X _{i}^{j}*} for each *λ _{j}* (note that the PMFred algorithm includes a stepwise algorithm for detecting multiple changepoints in a single series). Suppose that, for the transformed series {*X _{i}^{j}*}, *M _{c}^{j}* ≥ 0 changepoints, denoted ** Ĉ**_{j}, were identified to be statistically significant at the nominal level; then, *κ _{j}* = *N* − (2 + *M _{c}^{j}*). We compute *S*^{2}(** Ĉ**_{j}|*λ _{j}*), the sum of residual squares of the least squares fit to the transformed series {*X _{i}^{j}*} = {*h*(*Y _{i}*; *λ _{j}*)} with the *M _{c}^{j}* changepoints, and the log-likelihood function

*l*(** Ĉ**_{j}|*λ _{j}*) = −(*N*/2) log*S*^{2}(** Ĉ**_{j}|*λ _{j}*) + (*λ _{j}* − 1) Σ_{i=1}^{N} log*Y _{i}*.

_{c}^{j}In step 3, the profile log-likelihood function *l*(* Ĉ_{j}*|

*λ̂*) is maximized over all

_{j}*j*∈ {1, 2, … ,

*J*}, to find

*λ̂*(and

**) such that**

*Ĉ*where *m* is a fixed integer between 1 and *J* (inclusive). Now *λ̂* = *λ _{m}* represents the best transformation among all of the trial transformations (

*λ*

_{1},

*λ*

_{2}, … ,

*λ*), and are statistically significant changepoints in both the transformed series {

_{J}*X*} = {

_{i}^{m}*h*(

*Y*;

_{i}*λ*)} and the original series {

_{m}*Y*}. Last, in step 4, the identified changepoints and their significance corresponding to the best transformation

_{i}*λ*, and the best transformed series {

_{m}*X*} = {

_{i}^{m}*h*(

*Y*;

_{i}*λ*)}, are saved. The identified changepoints along with plots of the original and transformed series and all available metadata are analyzed to determine whether they are artificial changepoints of physical significance and whether they should be adjusted. This procedure is the same with or without the above transformation procedure; it has been detailed in Wang and Feng (2010) and Wang (2008a). However, when one or more of the identified changepoints is determined to be insignificant and hence is deleted from the list of changepoints, the best

_{m}*λ*value for the series with the new list of changepoints being accounted for is sought; then, the statistical significance of the retained changepoints is reestimated using the new transformed series (with the new best

*λ*value). The search for the new best

*λ*value is conducted in the same way as before. This procedure is repeated until all of the retained changepoints are determined to be significant. We noticed that even adding a few changepoints to the list of changepoints usually does not change the best

*λ*value. This is because all major changepoints that would notably bias the best

*λ*estimate were accounted for in the earlier stage of searching for the best value.

With a slight modification, the above procedure is applicable to data series of the TPR4 setting (i.e., *μ*_{1} ≠ *μ*_{2} and/or *β*_{1} ≠ *β*_{2}). One only needs to replace the PMF test in the PMFred algorithm with the corresponding test statistic for the TPR4-based test in the above procedure when handling data of the TPR4 structure.

Note that, in the procedure described above, the *λ* value that corresponds to the maximal profile log likelihood is simply chosen and is not compared with any threshold value; the position and significance of changepoints are determined by applying the PMFred test to one single transformed series, which does not involve any comparison of the test statistic from different *λ* values. Therefore, the critical values of the test statistic remain the same as those for the PMFred test. However, if we were to search for a most probable changepoint among all of the trial values of *λ* and to repeat such a search for the next most probable changepoint until no more significant changepoints can be found, the critical values most likely would be different from those of the PMFred algorithm, because the maximization of the test statistic for determining the most probable changepoint position involves different trial values of *λ*. In this case, one must reestimate the critical values. It is our plan to try the latter approach in a separate study.

## 4. Assessment of detection power aspects

In this section, we describe simulation studies to compare the detection power aspects of the new transPMFred algorithm with the corresponding untransformed version, the PMFred algorithm. Readers are referred to Reeves et al. (2007) for a detailed review and comparison of eight data homogeneity tests, including the TPR3, the TPR4, and the standard normal homogeneity (SNH) test (Alexandersson 1986), and to Wang (2008b) for a comparison of the PMF test with the TPR3 test. An extension of the PMF to the PMFred, along with a comparison of the PMFred with the PMF and TPR3 tests can also be found in Wang (2008a). The same study also presents an extension of the penalized maximal *t* (PMT) test to the PMTred, along with a comparison of the PMTred with the PMT and SNH tests; a comparison of the PMT test with the SNH test can also be found in Wang et al. (2007). However, the SNH, PMT, and PMTred tests, which assume zero trend in the series being tested and thus are for use with a reference series, are not suitable for daily precipitation series, because the extremely high spatial variability and noncontinuity of daily precipitation make it unrealistic to find a suitable reference series.

Note that an algorithm should be applied to truly homogeneous series to estimate its false-alarm rate and to series with known shifts (i.e., both the exact location and size of each shift are known) to estimate its hit rate (both false-alarm and hit rates are defined later in this section). A popular approach for conducting such a detection-power assessment is to generate homogeneous surrogate series and to apply the algorithm being assessed/compared to these series before and after inserting one or more changepoints into them. This is called a simulation study and is what we do in this section.

To generate surrogate daily precipitation series that are homogeneous and yet mimic real daily precipitation data series, we use the so-called block-bootstrapping technique in this study. Considering the length of the typical synoptic scale, we choose a block size of 5 days, which should help to preserve the first-order autocorrelation of the daily precipitation process in the homogeneous surrogate series. To be specific, we randomly take blocks of 5-day data records, including zero values, from a homogeneous real daily precipitation series to generate *M* = 1000 surrogate daily precipitation series, each of which contains 900 nonzero values [i.e., we repeat the block-bootstrapping procedure until the surrogate series contains 900 nonzero values each, so that the sample size is the same for all of the surrogate series, which is important for the intended comparison of detection power over different *λ* values, because detection power increases with an increase in sample size (see Wang 2008a)]. Further, we add the linear trend estimated from the homogeneous real daily precipitation series to each of these surrogate series. We apply the transPMFred algorithm to each of these homogeneous surrogate series to estimate the false-alarm rate. Here, a false alarm is counted when the test detects a significant changepoint in an actually homogeneous surrogate series and a hit is counted when the test identifies a significant changepoint, *ĉ* ∈ [*K* − 10, *K* + 10], for a time series that has a true changepoint at time *K* (thus, the hit rate is a strict measure of accurate detection power).
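The block-bootstrapping step can be sketched as follows. This is a minimal illustration under our own naming; the toy "homogeneous record" below stands in for a real homogeneous daily precipitation series:

```python
import numpy as np

def block_bootstrap_precip(daily, n_nonzero=900, block=5, rng=None):
    """Resample 5-day blocks (zeros included) from a homogeneous daily
    precipitation record until the surrogate contains exactly
    `n_nonzero` nonzero values; return the nonzero values only."""
    rng = np.random.default_rng(rng)
    daily = np.asarray(daily, dtype=float)
    n_blocks = len(daily) // block
    out, count = [], 0
    while count < n_nonzero:
        b = rng.integers(n_blocks)               # pick a random 5-day block
        for v in daily[b * block:(b + 1) * block]:
            out.append(v)
            if v > 0:
                count += 1
            if count == n_nonzero:
                break
    surrogate = np.array(out)
    return surrogate[surrogate > 0]

rng = np.random.default_rng(0)
# Toy homogeneous record: ~40% wet days with exponential amounts.
wet = rng.random(3650) < 0.4
daily = np.where(wet, rng.exponential(5.0, 3650), 0.0)
s = block_bootstrap_precip(daily, n_nonzero=900, block=5, rng=1)
```

Keeping zeros inside the resampled blocks preserves the wet/dry sequencing, and hence (approximately) the first-order autocorrelation, of the nonzero amounts.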

Also, we insert, at randomly selected positions, one or two mean shifts of different magnitudes and signs to each of the surrogate series and then apply the transPMFred algorithm to each of these *M* series to estimate the hit rate. The size of the inserted shift, expressed in units of the standard deviation *σ* of the homogeneous surrogate series, varies from 0.5*σ* to 2.5*σ* (see Table 1 and Fig. 1). Considering that the hit rate for the one-shift cases would be more comparable with that for the two-shift cases (more details below) if they share the same sample size on average, or equivalently the same "mean" segment length *N _{s}* = *N*/(*M _{c}* + 1), we use only the first 600 nonzero values in each of these *M* series to assess the hit rate for one-shift (*M _{c}* = 1) cases and the false-alarm rate, that is, *N* = 600 for the one-shift cases, and we use *N* = 900 for the two-shift cases, so that *N _{s}* = *N*/(*M _{c}* + 1) = 300 for both the one- and two-shift cases presented in Table 1. The lower bound (with subscript *L*) of the rates in Table 1 is estimated using the upper bound of the first-order autocorrelation estimate for the series being tested, and the upper bound (with subscript *U*) is estimated using the lower bound of the autocorrelation estimate (these together represent the 95% uncertainty range). The latter is used in the RHtests_dlyPrcp package to also output changepoints that are identified to be within the 95% uncertainty range, for further analysis subjectively or along with metadata [see Wang and Feng (2010) and Wang (2008a) for more details].

We repeat the above simulation study using five homogeneous real daily precipitation data series of different *λ* values (−0.2, −0.1, 0.0, 0.1, and 0.2; see Table 1; they are from Alert, Coral Harbour, and Dawson in Canada and two stations in Congo). For comparison, the PMFred algorithm is also used in place of the transPMFred algorithm. The results are summarized in Table 1 (and some are shown in Figs. 1a and 1b). Note that the values in Fig. 1 and Table 1 are only rough estimates, because each of them was estimated from only *M* = 1000 simulations (even so, we need to carry out a total of 180 000 simulations for the 180 values in Table 1, which is very CPU-consuming when *N* = 600 or 900).
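The hit-rate and false-alarm-rate bookkeeping can be sketched as below. The detector here is a simple stand-in (a maximal standardized mean-difference scan with an arbitrary threshold), not the PMFred test; only the counting rules (a false alarm on a homogeneous series, a hit when *ĉ* ∈ [*K* − 10, *K* + 10]) follow the text:

```python
import numpy as np

def detect_single_shift(x, crit=10.0):
    """Stand-in detector (NOT the PMFred test): scan candidate split
    points for the largest standardized mean difference and report the
    position if the statistic exceeds `crit`; otherwise return None."""
    x = np.asarray(x, float)
    n = len(x)
    best, best_c = 0.0, None
    for c in range(30, n - 30):
        d = abs(x[:c].mean() - x[c:].mean()) / x.std(ddof=1)
        stat = d * np.sqrt(c * (n - c) / n)
        if stat > best:
            best, best_c = stat, c
    return best_c if best > crit else None

def hit_and_false_alarm(make_series, K, shift, M=200, rng=0):
    """Estimate the false-alarm rate (detections on homogeneous series)
    and the hit rate (detections within [K - 10, K + 10] after a mean
    shift of `shift` standard deviations is inserted at K)."""
    rng = np.random.default_rng(rng)
    hits = alarms = 0
    for _ in range(M):
        x = make_series(rng)
        if detect_single_shift(x) is not None:
            alarms += 1                      # false alarm: series is homogeneous
        y = x.copy()
        y[K:] += shift * x.std(ddof=1)       # insert a known mean shift at K
        c = detect_single_shift(y)
        if c is not None and K - 10 <= c <= K + 10:
            hits += 1                        # hit: detected close to the truth
    return hits / M, alarms / M
```

In the actual assessment the surrogate generator would be the block bootstrap described above and the detector would be transPMFred (or PMFred).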

In general, the false-alarm rates of the transPMFred algorithm are around the nominal level of 5% (viz., this nominal level is within the 95% uncertainty range; see Table 1); the hit rates are satisfactory for shifts of moderate to large size and are more uncertain to estimate for small shifts (e.g., of size 0.5*σ*; see Table 1). In terms of hit rate, as shown in Fig. 1a (see also Table 1), the transPMFred algorithm clearly outperforms the PMFred algorithm. In other words, the transformation procedure significantly improves the detection power for various nonnormal data (of *λ* values ranging from −0.2 to 0.2); when *N* = 600, the increases in hit rate are up to 71%, 54%, 40%, and 25% for shifts of size 0.5*σ*, 1.0*σ*, 1.5*σ*, and 2.5*σ*, respectively (Table 1, top half; one-shift cases). The improvement is also greater for data with negative *λ* values (see Fig. 1a; also cf. top and bottom halves of Table 1). Importantly, as shown in Fig. 1b, the hit rates of the transPMFred algorithm are roughly the same (up to estimation error) across different *λ* values, showing more variation only when the shifts are small (e.g., 0.5*σ*; see also top half of Table 1).

As shown in Table 1, the hit rates are slightly lower in the two-shift cases than in the corresponding one-shift cases, which is due to the possibility of smaller sample sizes in the two-shift cases. Note that the length of the segment that contains one shift could be shorter than 600 in the two-shift cases (e.g., if the two shifts occur at the 250th and 400th points, respectively, the sample size for identifying the first shift is only 400), but it is always 600 in the one-shift cases (although the mean segment length is kept the same by using *N* = 600 and *N* = 900 for the one- and two-shift cases, respectively, as described above). Figure 1b also shows that the larger the shift is, the higher is the hit rate, as would be expected.

To check whether the detection power of the transPMFred is evenly distributed over changepoint position *K*, we also carry out the following simulations. Here, we use only the *M* surrogate series of *λ* = −0.1 and reduce the sample size from 600 to 500 (i.e., *N* = 500), just to save some CPU time. For each of the nine selected *K* ∈ {50, 100, 150, 200, 250, 300, 350, 400, 450}, we insert a mean shift in each of the homogeneous surrogate series, setting the shift size to 0.5 and 1.0 standard deviations of the series, respectively. Then, we apply the transPMFred algorithm to each of these surrogate series (each containing one shift), separately, to estimate the hit rates as a function of *K*. The results, shown in Fig. 1c, indicate that the detection power of the transPMFred algorithm is roughly the same (up to estimation error) across different *K* values, showing again more estimation uncertainty for small shifts (e.g., of size 0.5*σ*). Again, these hit rates are only rough estimates because each of them was obtained from only 1000 simulations.

## 5. Methods for adjustment of shifts

After identifying all of the significant changepoints, we wish to adjust the daily precipitation data series to diminish all significant artificial shifts (i.e., to homogenize the daily precipitation series). In this section, we propose two methods for estimating adjustments of artificial shifts identified in the nonzero daily precipitation series. The quantile-matching algorithm below can be used for other nonnegative data such as wind speeds and, with slight modification (see the different lower boundary conditions below), for Gaussian data as well.

First of all, one can derive the shift sizes from the "fitted mean response" of the nonzero daily precipitation series, defined as

*Ŷ _{i}^{m}* = *h*^{−1}(*X̂ _{i}^{m}*; *λ̂*),     (8)

where *X̂ _{i}^{m}* denotes the multiphase regression fit to the transformed series {*X _{i}* = *h*(*Y _{i}*; *λ̂*)} with the estimated best transformation parameter *λ̂*, and *h*^{−1}(*X̂ _{i}^{m}*; *λ̂*) denotes the inverse Box–Cox (IBC) transformation of *X̂ _{i}^{m}* with *λ̂*. The adjustments derived this way (i.e., from *Ŷ _{i}^{m}*) are referred to as IBC adjustments. Note that applying such IBC adjustments could result in nonpositive daily precipitation amounts, unless the series is adjusted to the highest segment (i.e., the segment of highest mean value). To avoid nonpositive daily precipitation amounts and to keep the number of "rainy" days unchanged, one can replace each nonpositive value in the adjusted precipitation series with the smallest measured amount of daily precipitation. Such replacement is implemented in our RHtests_dlyPrcp software package. (Note that in this study the term rainy days is used to refer to precipitating days, which include days of any form of precipitation, liquid or solid, such as snowfall, as is popularly used in the literature.) An example series of IBC adjustments is shown in Fig. 2a, and the resulting IBC-adjusted precipitation series is shown in Fig. 3e (see section 6 below for more details). Note that the same single IBC adjustment value is applied to all data in the same segment, except for the cases in which nonpositive values are replaced with the smallest measured amount. In other words, an IBC adjustment is basically a mean adjustment, making little change to the shape of the distribution of the data.
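A minimal sketch of the IBC adjustment idea follows. For simplicity, per-segment means on the transformed scale stand in for the full multiphase regression fit (which also includes a common trend), all segments are adjusted to the last one, and the function names are ours:

```python
import numpy as np

def boxcox(y, lam):
    return np.log(y) if lam == 0.0 else (y**lam - 1.0) / lam

def inv_boxcox(x, lam):
    return np.exp(x) if lam == 0.0 else (lam * x + 1.0) ** (1.0 / lam)

def ibc_adjust(y, breaks, lam, base=-1):
    """Apply one IBC adjustment value per segment, adjusting every
    segment to the base segment (the last one by default).
    Simplification: per-segment means on the transformed scale stand
    in for the full multiphase regression fit."""
    y = np.asarray(y, float)
    x = boxcox(y, lam)
    segs = np.split(np.arange(len(y)), breaks)  # breaks = segment start indices
    # inverse Box-Cox of the fitted (segment-mean) response, one value per segment
    yhat = [float(inv_boxcox(x[s].mean(), lam)) for s in segs]
    adj = np.empty(len(y))
    for s, m in zip(segs, yhat):
        adj[s] = yhat[base] - m                 # single adjustment per segment
    out = y + adj
    # keep amounts positive and the number of rainy days unchanged
    out[out <= 0] = y[y > 0].min()
    return out
```

The final replacement step mirrors the rule in the text: any nonpositive adjusted amount is set to the smallest measured positive amount.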

As an alternative, we propose the following quantile-matching (QM) algorithm, in which the estimated precipitation "trend" component is preserved. Let {*X _{i}*^{tr} = *β̂t*′ _{i}} (*i* = 1, 2, … , *N*) denote the estimated linear trend component of the transformed series {*X _{i}* = *h*(*Y _{i}*; *λ̂*)}, and let {*Y _{i}*^{tr}} be the IBC-transformed version of {*X _{i}*^{tr}}, that is, *Y _{i}*^{tr} = *h*^{−1}(*X _{i}*^{tr}; *λ̂*). The series {*Y _{i}*^{tr}} represents the monotone trend component in the fitted mean response of the nonzero daily precipitation series [i.e., in *Ŷ _{i}^{m}* as defined in (8); see the magenta trend lines in Figs. 3a and 3b], which we will use as an approximation to the trend component of the nonzero daily precipitation series. Then, the nonzero daily precipitation series is "detrended" as follows:

*Y _{i}*^{dtr} = *Y _{i}* − *Y _{i}*^{tr} + *c*,

where the constant *c* is included just to prevent nonpositive values from occurring in the detrended series. The use of quotation marks around the word detrended on its first use is to emphasize that this detrending procedure removes only the trend component in *Ŷ _{i}^{m}*, which is what we want to preserve here; the quotation marks will be omitted hereinafter. The detrended series {*Y _{i}*^{dtr}} is then used to estimate the probability distribution function (PDF) for each segment of the data series, which is in turn used to estimate the QM adjustments needed to make the data series homogeneous. The procedure is detailed next.
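The detrending step just described can be sketched as follows; the specific choice of the constant (the maximum of the trend component) is our assumption, made only to guarantee positivity:

```python
import numpy as np

def detrend_preserving_positivity(y, y_tr):
    """Remove the (back-transformed) trend component y_tr from y,
    adding a constant so that the detrended series stays positive.
    Sketch: c = max(y_tr) is one simple choice, since it guarantees
    y - y_tr + c >= y > 0 for positive y."""
    y, y_tr = np.asarray(y, float), np.asarray(y_tr, float)
    c = y_tr.max()
    return y - y_tr + c
```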

In consideration of possible climatic change in the distribution, we allow choices of using either all of the data in a segment or up to a chosen number (≥5) of years of data before (or after) the changepoint to estimate the PDF for the segment in question. However, we do not recommend the use of less than 30 yr of daily precipitation data (or 15 yr of continuous data such as daily temperatures) to estimate a PDF when there are more data available. This is because we noticed that an estimate of PDF (and hence the QM adjustments) using less than 30 yr of daily precipitation data often contains large sampling uncertainty; the error due to sampling uncertainty is often much larger than that due to the assumption of the same distribution over the period of the data record after a monotone trend has been preserved (as is done in this study). As a result of the large sampling uncertainty, the shape of the distribution could be overadjusted (e.g., the large variations in the QM adjustments for different quantiles derived with up to 10 yr of data, shown in the lower-right panel of Fig. 4, could lead to an overadjustment in the distribution shape; see section 6 for more details about the Amos example series). We recommend the use of all available data in a segment to estimate the PDF if the assumption of the same distribution throughout the period of the data record is not obviously violated (viewing the data time series plot would help to determine this subjectively).

For each changepoint, there are two chosen parts of the series for use in estimating the two PDFs, respectively, before and after the changepoint, as is schematically shown in Fig. 5. The data in each of these two chosen parts of the detrended series {*Y _{i}*^{dtr}} are sorted in ascending order and then divided into *M _{q}* ascending categories "equally" (to the extent possible) to estimate the PDFs at *M _{q}* cumulative frequencies that are separated by the increment 1/*M _{q}*. For each *l* ∈ {1, 2, … , *M _{q}*}, let *F*(*l*) = (*l* − 0.5)/*M _{q}* denote the median empirical cumulative frequency (ECF) of the *l*th category data, whose ECF falls in the interval ((*l* − 1)/*M _{q}*, *l*/*M _{q}*]. Also, let *F*(0) = *F*(1) − 1/*M _{q}* and *F*(*M _{q}* + 1) = *F*(*M _{q}*) + 1/*M _{q}*, which are to be used for imposing the lower and upper boundary conditions in the spline interpolation below. Let *P _{b}*(*k*, *l*) and *P _{a}*(*k*, *l*) denote the means of the *l*th category data in the chosen part of the series immediately before and after the *k*th changepoint, respectively (see Fig. 5). The difference in the *l*th category mean between the two segments separated by the *k*th changepoint is derived as *P _{a}*(*k*, *l*) − *P _{b}*(*k*, *l*). Note that this is the category mean of the (*k* + 1)th segment minus that of the *k*th segment (*k* = 1, 2, … , *M _{c}*), and that *P _{a}*(*k*, *l*) = *P _{b}*(*k* + 1, *l*) if the whole segment of data is chosen for use to estimate the PDF.
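The category construction can be sketched as below (our helper name; `np.array_split` does the "equal to the extent possible" division):

```python
import numpy as np

def category_stats(y_part, Mq):
    """Sort one chosen part of the detrended series, split it into Mq
    near-equal ascending categories, and return the median empirical
    cumulative frequencies F(l) = (l - 0.5)/Mq and the category means."""
    srt = np.sort(np.asarray(y_part, float))
    cats = np.array_split(srt, Mq)               # "equal" to the extent possible
    F = (np.arange(1, Mq + 1) - 0.5) / Mq        # median ECF of each category
    P = np.array([c.mean() for c in cats])       # category means
    return F, P
```

Computing these statistics on the parts immediately before and after a changepoint gives the *P _{b}*(*k*, *l*) and *P _{a}*(*k*, *l*) of the text.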

Let *s _{o}* denote the segment to which the other segments are to be adjusted (also referred to as the base segment). For each segment *s* ∈ {1, 2, … , (*M _{c}* + 1)}, the difference in the *l*th category mean between the *s*th and *s _{o}*th segments can be derived by accumulating the intersegment differences over the changepoints that lie between the two segments:

*A*(*s*, *l*) = ∑_{*k*=*s*}^{*s _{o}*−1}[*P _{a}*(*k*, *l*) − *P _{b}*(*k*, *l*)] for *s* < *s _{o}*,  and  *A*(*s*, *l*) = −∑_{*k*=*s _{o}*}^{*s*−1}[*P _{a}*(*k*, *l*) − *P _{b}*(*k*, *l*)] for *s* > *s _{o}*.

Also, let the lower boundary value *A*(*s*, 0) = 0 for positive data, but *A*(*s*, 0) = *A*(*s*, 1) for Gaussian data; in addition, let the upper boundary value *A*(*s*, *M _{q}* + 1) = *A*(*s*, *M _{q}*) [these boundary conditions keep the lowest and highest (50/*M _{q}*)% of data bounded, so as not to let them depart too much from the mean of the QM adjustments for the respective category]. Thus, for each segment *s*, there are (*M _{q}* + 2) data points, (*F*(*l*), *A*(*s*, *l*)) for *l* = 0, 1, … , *M _{q}* + 1. As shown in Fig. 4, a natural cubic spline is then fitted to these (*M _{q}* + 2) data points for each segment *s* [except the base segment *s _{o}*, for which *A*(*s _{o}*, *l*) ≡ 0]. The QM adjustments needed to homogenize the series {*Y _{i}*} will be derived from these fitted splines, as described next.
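The boundary handling and natural-spline fit can be sketched with SciPy (the function name `qm_spline` and the toy inputs are ours):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def qm_spline(F, A, positive_data=True):
    """Fit a natural cubic spline through the (Mq + 2) points
    (F(l), A(s, l)), l = 0..Mq+1, after imposing the boundary values:
    A(s, 0) = 0 for positive data (or A(s, 1) for Gaussian data) and
    A(s, Mq + 1) = A(s, Mq)."""
    F, A = np.asarray(F, float), np.asarray(A, float)
    Mq = len(F)
    F_ext = np.concatenate([[F[0] - 1.0 / Mq], F, [F[-1] + 1.0 / Mq]])
    lo = 0.0 if positive_data else A[0]
    A_ext = np.concatenate([[lo], A, [A[-1]]])
    return CubicSpline(F_ext, A_ext, bc_type='natural')

F = (np.arange(1, 9) - 0.5) / 8     # Mq = 8 median ECFs
A = np.linspace(0.1, 0.8, 8)        # toy category-mean differences A(s, l)
spl = qm_spline(F, A)
```

The fitted spline gives the QM adjustment for the segment as a continuous function of cumulative frequency.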

_{i}Let * _{s}*(

*i*) denote the ECF of the

*i*th datum in segment

*s*of the series {

*Y*

_{i}^{dtr}}. From the fitted spline for segment

*s*, we can look up the intersegment difference (i.e., the

*y*-axis value, which is the difference between segments

*s*and

*s*) that corresponds to the cumulative frequency

_{o}*(*

_{s}*i*). This difference, say (

*s*,

*i*), is the amount that will be added to the

*i*th datum in segment

*s*of the series {

*Y*}, to adjust it to segment

_{i}*s*. This spline interpolation is carried out for each value in each segment that needs to be adjusted (i.e., all except the base segment). The resulting differences (

_{o}*s*,

*i*) for

*i*= 1, 2, … ,

*N*are referred to as the QM adjustments for segment

*s*; the corresponding fitted spline (Fig. 4) represents these QM adjustments as a function of cumulative frequency. In a similar way, nonpositive values could appear in the adjusted precipitation amounts using these QM adjustments. Again, these nonpositive values are replaced with the smallest measured precipitation amount (or the smallest value in the original positive data, in general). An example series of QM adjustments is shown in Fig. 2b, and the resulting QM-adjusted precipitation series is shown in Fig. 3f (see section 6 below for more details).

Note that, because of the spline fit and interpolation described above, for any *M _{q}* ≥ 2, the resulting adjustment may vary from one datum to another; even when

*M*= 2, not only are two different adjustment values applied to the values in a segment, but each and every datum in a segment has its own adjustment that corresponds to its ECF.
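The per-datum lookup can be sketched as below (our naming; the spline argument would be the fitted QM-adjustment spline for the segment):

```python
import numpy as np
from scipy.interpolate import CubicSpline

def qm_adjust_segment(y_seg, spline, smallest_positive):
    """Adjust every datum of one segment toward the base segment: look
    up, on the fitted spline, the adjustment at the datum's empirical
    cumulative frequency, add it, and replace any resulting nonpositive
    amount with the smallest measured positive amount."""
    y_seg = np.asarray(y_seg, float)
    n = len(y_seg)
    ranks = y_seg.argsort().argsort()        # 0-based rank of each datum
    ecf = (ranks + 0.5) / n                  # its empirical cumulative frequency
    out = y_seg + spline(ecf)                # one adjustment per datum
    out[out <= 0] = smallest_positive
    return out
```

This makes explicit why even *M _{q}* = 2 yields a different adjustment for each datum: the adjustment is interpolated at that datum's own ECF.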

The following aspects should be considered when choosing the number *M _{q}* for use in deriving the QM adjustments. First, *M _{q}* shall be determined so that the shortest segment in the series has enough data in each of the *M _{q}* categories to estimate the category means. Second, the larger the *M _{q}* value is, the larger is the resulting adjustment to the shape of the distribution, and vice versa; the shape of the distribution is not adjusted at all when *M _{q}* = 1 (one single adjustment value is applied to all data in the same segment in this case, which is the usual mean adjustment). Also, the larger the *M _{q}* value is, the fewer data are available for estimating the PDF and the mean intersegment difference for each category, and therefore the larger is the sampling uncertainty in the estimate of the PDFs and thus the QM adjustments (see Fig. 4). Given these considerations, in our software package we allow users either to set the *M _{q}* value to any integer between 1 and 20 inclusive or to set *M _{q}* = 0 to let the codes choose an *M _{q}* value (a chosen *M _{q}* value results from either method). Further, to ensure that even the shortest segment (say, of length *N*_{srt}) has enough data to estimate the mean of each category, our codes will automatically replace a chosen *M _{q}* value with the integer max(*N*_{srt}/20, 1) if the chosen *M _{q}* value is larger than the integer *N*_{srt}/20 or smaller than 1. This ensures that there are at least 20 daily data in each category for estimating the category mean and that the minimum value for *M _{q}* is 1 (viz., setting *M _{q}* = 1 when there are not enough data in the shortest segment for estimating QM adjustments, and thus applying one single adjustment value to all data in the same segment). (For monthly or annual data series, the above *N*_{srt}/20 is replaced by *N*_{srt}/5 so that there are at least 5 monthly or annual data in each category for estimating the category mean.) In our software packages (RHtestsV3 and RHtests_dlyPrcp), *M _{q}* is also automatically reset to 20 if it is larger than 20, to avoid large sampling uncertainty in the estimate of QM adjustments (and thus overadjustments to the shape of the distribution).
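The guard rails on *M _{q}* described above can be condensed into a small helper (our naming):

```python
def choose_mq(mq_requested, n_shortest, daily=True):
    """Guard rails on a requested Mq: at least 20 daily values (5
    monthly/annual values) per category in the shortest segment,
    Mq at least 1, and Mq capped at 20."""
    per_cat = 20 if daily else 5
    cap = n_shortest // per_cat              # integer N_srt/20 (or N_srt/5)
    mq = mq_requested
    if mq > cap or mq < 1:                   # includes mq = 0, "let the codes choose"
        mq = max(cap, 1)
    return min(mq, 20)                       # reset to 20 if larger than 20
```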

Note that all QM algorithms [e.g., Della-Marta and Wanner (2006), Trewin and Trevitt (1996), and the one described above] try to line up the adjustments by empirical frequencies, implicitly assuming that the frequency series are homogeneous. Thus, QM algorithms would work well for continuous variables such as air temperature or pressure; they should also work well for noncontinuous variables (such as daily precipitation) in the absence of frequency discontinuity. However, when there exists a discontinuity in the frequency of measurements of a noncontinuous variable (such as daily precipitation or wind speed), extra caution has to be exercised when using a QM algorithm. One must address all frequency discontinuities before estimating and applying any QM adjustments; otherwise, the QM adjustments could be largely biased by the frequency discontinuities and thus could be problematic, making the data depart more from the truth (see section 6 below for examples of, and possible solutions to address, frequency discontinuity).

Although the above description of the QM adjustment method is specific to nonzero daily precipitation series, this adjustment method can be used to homogenize any other climate variable. For data series that do not need the Box–Cox transformation, the detrending procedure is also more straightforward: for example, the linear trend component as estimated by the multiphase regression model fit to the deseasonalized base series in the PMFred (or PMTred) algorithm (Wang 2008a) can be removed before the data are used to estimate the PDF and the QM adjustments. This QM adjustment algorithm has also been added to the RHtestsV3 software package (Wang and Feng 2010) as an alternative to the mean adjustments; this is particularly important for adjusting daily climate data series. Note that, as a result of applying the QM adjustments with *M _{q}* ≥ 2, the whole distribution of the data, and not only the mean, could be adjusted. When the PMFred or PMTred algorithm is used without data transformation, the detection of shifts is based on the mean only, which means that any shift that occurs in the variance or a higher-order statistic without an accompanying significant shift in the mean may go undetected. Detection of such shifts is outside the scope of this study, although the algorithm can deal with seasonality in the variance (see the last paragraph of section 6).

## 6. Application to real daily precipitation data series

In this section, we present an application of the new transPMFred algorithm to detect changepoints in nonzero daily precipitation amounts (mm) recorded at six Canadian stations in different precipitation regimes. The series for London [Ontario, joining of three stations whose climate identifiers (ID) are 6144505, 6144481, and 6144475] for the period from 1 March 1883 to 31 December 2006, and the series for St. John’s A (Newfoundland; climate ID 8403506) for the period from 1 January 1942 to 31 October 2008 were found to be homogeneous at the 95% confidence level (*λ̂* = 0.0 for both series) and hence will not be discussed further. For the other four example series, the names, climate IDs, periods of data tested, and their best *λ̂* values are listed in Tables 2 and 3 (column 1). Except for the Amos series, these long series were formed by joining of two nearby stations’ data (the two IDs listed in Tables 2 and 3).

Note that these data do not include adjustments for trace precipitation (more about trace precipitation at the end of this section), nor adjustments for the joining of stations. However, observations of snowfall were converted into their water equivalents using the adjustment factor map of Mekis and Brown (2010) and rainfall data had been adjusted for wetting losses and gauge undercatch using the methods of Mekis and Hogg (1999) but with adjustments being updated to account for new data/information (E. Mekis 2009, personal communication). After these adjustments, the smallest nonzero daily precipitation amount in these series is 0.20 mm.

As mentioned earlier, we propose to test the series of precipitation occurrence (or nonoccurrence, i.e., zero precipitation) frequency and the series of nonzero daily precipitation amounts separately using different tests (viz., the PMFred for the former and the transPMFred for the latter). Therefore, we collect all >0 daily precipitation amounts to form a series of nonzero daily precipitation (a pooled precipitation series), to which we apply the transPMFred algorithm, taking into account the effects of "lag 1" autocorrelation in the series being tested (which is the pooled precipitation series, in which two consecutive data points do not necessarily correspond to two consecutive calendar days; thus, the lag-1 autocorrelation is likely smaller than the autocorrelation calculated from consecutive days). We record the time index *t _{i}* at which the *i*th nonzero daily precipitation in the series occurs. In other words, the *i*th nonzero daily precipitation in the series occurred on the *t _{i}*th day in the period of record. The *i* here is also referred to as the index number of the nonzero daily precipitation day in the series. In our computing codes, the time index *t*′ _{i} = *t _{i}*/365.25 is used (rather than the index number *i* or *t _{i}*) in the regression analysis to estimate the monotonic trend component in the precipitation series. Thus, the linear trend (*β̂*) of the transformed precipitation series is expressed in "units per year."
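Building the pooled nonzero series and its time index can be sketched as:

```python
import numpy as np

def pooled_nonzero_series(daily):
    """Pool all > 0 daily amounts and record, for each, the time index
    t'_i = t_i / 365.25, where t_i is the (1-based) day number in the
    record; with this index, a fitted linear trend of the transformed
    series is expressed in units per year."""
    daily = np.asarray(daily, float)
    t = np.nonzero(daily > 0)[0] + 1         # day-in-record t_i of each wet day
    return daily[daily > 0], t / 365.25
```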

Wang (2008a) defines two types of changepoints: Type-1 changepoints are those identified by applying a maximal *F*-type test (or maximal *t*-type tests when a reference series is used) and are statistically significant at the nominal level even without metadata support. Type-0 changepoints are those whose exact times of change are documented/known (therefore, there is no need to detect them using a statistical test), for which one only needs to determine their statistical significance using the regular *F* test (or the *t* test when a reference series is used). Although focusing on type-1 changepoints would address major discontinuities, one should look at both types of changepoints for purposes of data homogenization when metadata are available. The FindUD.dlyPrcp function in the RHtests_dlyPrcp software package can be used to narrow down the necessary metadata investigation if it has not been done [if you already know the times of changes that might cause an artificial shift in the data series, you can simply add them in the list of changepoints to test their statistical significance; see Wang and Feng (2010) for more details]. Here, we focus on the detection of type-1 changepoints to see whether the new transPMFred algorithm is able to detect real changepoints if there are no metadata available.

The results of applying the transPMFred algorithm to the four example series of >0 daily precipitation amounts are summarized in Tables 2 and 3. Two significant type-1 changepoints were identified for each of the three series listed in Table 2, which have very different distributions (*λ̂* = 0.2, 0.0, and –0.2 for Amos, Dawson, and Alert, respectively), while three significant type-1 changepoints were identified for the series of >0 daily precipitation amounts at The Pas (Table 3), whose *λ̂* = −0.1. All of these statistically identified times of change are in good agreement with the times of change indicated in the related metadata, implying the success of the transPMFred algorithm in detecting shifts in nonzero daily precipitation data of different climate regimes (reflected by different *λ* values, which represent different distributions). The original and transformed series of >0 daily precipitation amounts recorded at The Pas are shown in Figs. 3a and 3c, respectively, along with the estimated trend component and the identified changepoints.

After identifying all significant changepoints, we now wish to adjust the precipitation data series to diminish all significant artificial shifts (i.e., to homogenize the precipitation data series). However, homogenization of daily precipitation series is a very challenging task. Next, we use the series at The Pas to showcase the challenge, proposing some practical solutions.

Since daily precipitation is not a continuous variable (it is a mixed discrete/continuous variable), discontinuities could exist in both the frequency and the amount of precipitation measured. Because of changes in the unit (e.g., from imperial to metric), in measuring precision (gauge gradation scale), in observing practices, and so on, discontinuities often exist in the frequency of measured small precipitation and hence in the frequency of reported precipitation occurrence (or nonoccurrence). For example, our metadata indicate that there were changes in the measuring precision of both rainfall and snowfall, which introduced a discontinuity in the frequency of small daily precipitation amount measured at The Pas, as shown in the top panel of Fig. 6. That is, there was not any value ≤0.3 (dashed line) or in the interval (0.4, 0.5] (dotted line) in this daily precipitation series before 1977; there were many fewer values in the interval (0.3, 0.4] before 1946 (solid line). In other words, there are two obvious discontinuities in the frequency of measured small amounts, in 1945–46 and 1976–77, respectively. As noted in Table 3, the 1945–46 discontinuity is related to the joining of two stations that used different gauge types, rim heights, and observation frequencies and the 1976–77 discontinuity is related to changes in gauge type and rim height and also in the snowfall-measuring precision, which changed from 0.1 in. (about 2.5 mm) to 2 mm (the corresponding change in the precision of the snow water equivalent would be from 0.25 to 0.20 mm if the snowfall to water equivalent ratio is 10:1). There are also fewer values in the interval (0.5, 1.0] before 1939, a phenomenon that is, however, not necessarily a frequency discontinuity—it could be a manifestation of a shift in the measured amounts.

The results of applying the transPMFred to the series of >0.4-mm daily precipitation (Table 3, bottom part) also suggest that the 1945–46 and 1976–77 discontinuities are mainly discontinuities in the frequency of small measured amounts (≤0.4 mm), because they are not found in the series from which all ≤0.4-mm amounts were excluded (i.e., the series of >0.4-mm daily precipitation), and that the 1938–39 changepoint is not necessarily/entirely a frequency discontinuity, because it is also found in this series of >0.4-mm daily precipitation. In general, one can test the series of >*P*_{thr} daily precipitation amounts, varying *P*_{thr} over a set of small values that reflect changes in the measuring precision (such as 0.3, 0.4, 0.5, …), to find out whether a detected discontinuity is a frequency discontinuity; we recommend doing so. This is because a frequency discontinuity that is due to an increase in measuring precision, say from *δ*_{1} to *δ*_{2} mm (*δ*_{2} < *δ*_{1}), should not exist in the series of ≥*δ*_{1} daily precipitation amounts, namely, the series in which the measured amounts that are below *δ*_{1} are excluded.
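The threshold-varying check, together with a yearly relative-frequency series of small amounts, can be sketched as below (our helper names; the 0.3-mm default threshold follows the text):

```python
import numpy as np

def threshold_series(daily, p_thr):
    """Amounts > p_thr: a frequency discontinuity caused by a precision
    change below p_thr should vanish in this series."""
    daily = np.asarray(daily, float)
    return daily[daily > p_thr]

def small_amount_rel_freq(daily, years, thr=0.3):
    """Yearly relative frequency of nonzero amounts <= thr (e.g. 0.3 mm);
    this series can itself be tested for frequency discontinuities."""
    daily, years = np.asarray(daily, float), np.asarray(years)
    out = {}
    for yr in np.unique(years):
        sel = (years == yr) & (daily > 0)
        out[int(yr)] = float((daily[sel] <= thr).mean()) if sel.any() else np.nan
    return out
```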

The presence of frequency discontinuity could complicate the detection of other shifts. For example, the changepoint in June 1933, which is associated with changes in gauge type/rim height and in observing frequency, was only identified in the series of >0.4-mm daily precipitation and not in the series of >0 daily precipitation (Table 3).

In particular, in the presence of frequency discontinuity, the QM adjustments (and even the IBC adjustments) will not work properly; they will not be good physically even if they can make the series homogeneous. For example, for the series of > 0 daily precipitation at The Pas, which is affected by the frequency discontinuities in 1945–46 and 1976–77, the QM adjustments (with *M _{q}* = 4, 8, and 16) fail to homogenize it: the 1945–46 and 1976–77 changepoints always remain in the QM-adjusted series. The IBC adjustments also fail to homogenize this series of >0 daily precipitation. In this case, even if the QM or IBC adjustments can make the series homogeneous, they are not good at all physically. For example, as a result of the increase in measuring precision during 1976–77, 6.12245% of the >0 daily precipitation amounts during the period of 1977–2005 are ≤0.3 mm, but the lowest 6.12245% of the >0 daily precipitation in the early period (1910–76) are in the range of 0.31–0.51 mm (none of them are ≤0.3 mm), which would not belong to the lowest 6.12245% if there were no missed measurements of ≤0.3-mm daily precipitation in the early period. In this case, adjusting the 1910–76 segment of data to the 1977–2008 segment using a QM algorithm will alter the lowest 6.12245% of the data in the period of 1910–76, whereas these data of 1910–76, ranging from 0.31 to 0.51 mm, were most likely correct measurements and should not be altered in any way. The error arises from the wrong estimate of the cumulative frequency (CF) for the data of 0.31–0.51 mm—mistaking them as lower-CF-category data because the true lower-CF-category data are missing. Such a frequency mismatch will affect the adjustments for all frequency categories.

In reality, in the presence of frequency discontinuity, any adjustments derived from the measured precipitation amounts are unsound, regardless of how they are estimated [by QM, IBC, or the ratio-based method of Alexandersson (1986)], because the problem lies in the failure to report the small amounts, not in the amounts measured. One must account for the frequency discontinuities before estimating and applying any adjustments to the measured amounts; otherwise, correct measurements could be mistakenly adjusted to make up for the missed measurements of small amounts; that is, a frequency mismatch will occur and will void the adjustments.

It would be best if one could fill in the missed measurements of small daily precipitation, for example, by assigning the smallest measured amount (i.e., the smallest amount ever measured during the period of the data record) to the days for which precipitation occurrence was reported with zero amount (i.e., the amount was too small to be measured at the time and hence was reported as zero). Of course, metadata (including flags in the data record, such as the flag used to indicate trace precipitation occurrence) and/or observations of other climate variables (such as current weather reports, cloud observations, etc.) are needed to identify the days of reported precipitation occurrence with zero amount recorded.

If the above best approach is not possible, one can also do the following to fill in the missed measurements of small amounts: (a) use the PMFred algorithm to identify shifts in the monthly relative frequency series of the nonzero but small (≤0.3 mm) precipitation events and estimate the shift size Δ_{f} (e.g., Δ_{f} = 6% for a jump from 0.5% to 6.5%, on average, in the relative frequency); (b) estimate the number of missed measurements of small amounts for each month in the early period as *M*_{msg} = Δ_{f} × *N*_{obs}, where *N*_{obs} is the total number of daily precipitation observations in the month; and (c) randomly pick *M*_{msg} days out of the reported zero precipitation days in the month and then assign the smallest measured amount to each of these *M*_{msg} days and indicate with a flag that these are estimated values. It is obvious that there is no guarantee that small precipitation did occur on all or some of these *M*_{msg} days. Therefore, warning of this uncertainty should be given to data users. With this uncertainty, the discontinuity in the frequency of measured small precipitation is diminished (adjusted). Therefore, the resulting daily precipitation series is more suitable for analyzing trends in the frequency of rainy days that are defined as days of >0 precipitation (a popular definition of a rainy day). Also, without the complication by frequency discontinuity, other shifts are more likely to be detected by the transPMFred algorithm, and more proper adjustments for other shifts detected in this series can be estimated using a QM algorithm. Correctly measured values will no longer be mistakenly adjusted to make up the missed measurements of small amounts; for example, the correctly measured values of 0.31–0.51 mm in the period of 1910–76 at The Pas will not be mistakenly adjusted to make up the missed measurements of ≤0.3-mm amounts in this period.
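Steps (a)–(c) can be sketched as follows. This is a minimal illustration with made-up values for the shift size Δ_{f} and for the smallest measured amount; the PMFred estimation of Δ_{f} itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical one-month daily record (mm); zeros are dry days or days
# whose small amounts went unmeasured.
month = np.where(rng.random(30) < 0.6, 0.0,
                 np.round(rng.gamma(shape=0.8, scale=5.0, size=30), 1))

# Step (a): shift size in the monthly relative frequency of small
# (<= 0.3 mm) events, assumed already estimated with PMFred.
delta_f = 0.06              # e.g., a jump from 0.5% to 6.5% on average

# Step (b): number of missed small measurements, M_msg = delta_f * N_obs.
n_obs = len(month)
m_msg = int(round(delta_f * n_obs))

# Step (c): randomly pick M_msg reported-zero days, assign them the
# smallest measured amount, and flag them as estimated values.
smallest_measured = 0.2     # assumed smallest amount ever measured
zero_days = np.flatnonzero(month == 0.0)
filled = rng.choice(zero_days, size=min(m_msg, len(zero_days)), replace=False)
month_adj = month.copy()
month_adj[filled] = smallest_measured
flags = np.zeros(n_obs, dtype=bool)
flags[filled] = True        # warn users: estimates, not measurements
```

The flag array carries the uncertainty warning discussed in the text: the filled-in days may or may not have actually received small precipitation.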

For all climate variables in general, one can also use simultaneous observations of other climate variables (predictors) that are closely related to the variable to be homogenized (predictand) to fill in the missed measurements. A statistical relationship between the predictand and the predictor(s) can be established using data from a period of more reliable observations for both the predictand and predictor(s). This relationship is then used, along with the reliable observations of the predictor(s) in the period in which observations of the predictand are missed, to estimate the missed observations of the predictand. Along this line, Dai et al. (2011) present an example of estimating missed measurements of dewpoint depression (DPD), a humidity variable derived from relative humidity measurements by hygrometers. Noting that the early hygrometers were considered to be unreliable or were not able to report on humidity under cold or dry conditions, they use a statistical relationship between DPD and vapor pressure and air temperature under cold (or dry) conditions to estimate the corresponding missed measurements of humidity (and hence DPD) under cold (or dry) conditions before their attempt to homogenize the DPD data series using the QM adjustment algorithm for nonnegative data proposed in this study.
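The predictor-based infilling idea can be sketched as follows. The linear relationship, the variable names, and the synthetic data are illustrative assumptions, not the Dai et al. (2011) formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Reliable period: simultaneous observations of predictor and predictand.
x_reliable = rng.normal(0.0, 5.0, size=2000)                      # predictor
y_reliable = 2.0 + 0.8 * x_reliable + rng.normal(0.0, 1.0, 2000)  # predictand

# Establish the statistical relationship over the reliable period.
slope, intercept = np.polyfit(x_reliable, y_reliable, deg=1)

# Use reliable predictor observations from the period in which the
# predictand observations are missed to estimate the missed values.
x_missing = rng.normal(0.0, 5.0, size=100)
y_estimated = intercept + slope * x_missing
```

In practice the relationship would be fit only under the relevant conditions (e.g., cold or dry conditions in the DPD example) rather than over all data as in this sketch.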

One can also go around frequency discontinuities in daily precipitation data by testing the series of >*P*_{thr} daily precipitation, varying *P*_{thr} over a set of small values that reflect changes in the measuring precision (e.g., 0.3, 0.4, 0.5, …) and then homogenizing the series of >*P*_{thr} daily precipitation data that was found to be free of frequency discontinuity. For example, the series of >0.4-mm daily precipitation at The Pas, which contains two shifts (see Table 3, bottom part), can be homogenized by the QM adjustments (with *M _{q}* = 4 or 8) or by the IBC adjustments (Figs. 3e and 3f). However, all of the ≤*P*_{thr} values in the series are left unadjusted, and thus the frequency discontinuities remain in the series of >0 daily precipitation, affecting the values between 0 and *P*_{thr} inclusive. In this case, warning of such frequency discontinuity effects and advice to use only the homogenized series of >*P*_{thr} daily precipitation should be given to data users. This is the drawback of this “go around” approach. Some applications fortunately do not need to include small daily precipitation amounts. For example, one may define rainy days as those of >0.5-mm daily precipitation for analyzing trends in the frequency of rainy days, which is a commonly used climate index.
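Extracting the >*P*_{thr} series for a set of thresholds can be sketched as below; the helper name and the sample values are hypothetical.

```python
import numpy as np

def threshold_series(prcp, p_thr):
    """Return the series of daily precipitation amounts > p_thr (mm),
    for homogeneity testing with a transPMFred-like method."""
    prcp = np.asarray(prcp, dtype=float)
    return prcp[prcp > p_thr]

# Vary the threshold over a set of small values reflecting changes in
# measuring precision over time; each thresholded series can then be
# tested for homogeneity and a discontinuity-free one homogenized.
daily = np.array([0.0, 0.2, 0.0, 0.5, 1.3, 0.0, 0.4, 2.1])
for p_thr in (0.0, 0.3, 0.4, 0.5):
    subset = threshold_series(daily, p_thr)
    print(p_thr, len(subset))
```

Note that the values in (0, *P*_{thr}] are simply excluded, which is why the resulting homogenized product applies only to the >*P*_{thr} series.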

Another issue relevant to precipitation data homogeneity (and accuracy) is trace precipitation, which is particularly important for regions of high northern latitudes (such as Canada, Alaska, Siberia, and northern Europe), where trace and light precipitation prevail. In Canada, a daily precipitation of zero is recorded with a T flag when precipitation occurred on that day but the amount was too small to be measured and hence was reported as zero. Obviously, an increase in measuring precision could make a previously unmeasurable amount measurable and thus decrease the frequency of reporting trace precipitation. Changes in the practice of trace reporting that took place during the period of the data record could also cause discontinuities in the frequency series of reported trace occurrence. In other words, discontinuity is inevitable in the frequency series of reported trace occurrence; adding a trace amount to days of reported trace occurrence could introduce a discontinuity into the series of daily precipitation.

As mentioned before, the PMFred algorithm can be used to test the homogeneity of relative frequency series. For example, as shown in the bottom panel of Fig. 6, two type-1 changepoints (1945–46 and 1955–56) were identified by the PMFred algorithm from the series of annual relative frequency of reported trace precipitation occurrence at The Pas. In this case, we can do the following to obtain the adjusted annual totals of trace precipitation: (a) correct the daily precipitation data for trace amounts over the most recent period (after 1955 in the example) by using the method of Mekis and Hogg (1999), (b) calculate the mean trace rate by dividing the total trace amount by the total count of T flags over the most recent period so that this mean trace rate is in units of millimeters per T flag and so that it is an average over the most recent period, (c) adjust the annual relative frequencies of T flag for the early periods (1946–55 and 1910–45) to the most recent period using the multiphase regression model fit of the PMFred algorithm to this series, (d) calculate the adjusted annual count of T flags for each year in the early periods by multiplying the adjusted annual relative frequency by the corresponding count of precipitation observations, and (e) calculate the adjusted annual trace amount for each year in the early periods by multiplying the adjusted annual count of T flags by the mean trace rate derived in step b and add the annual trace amount to the annual total precipitation to adjust it for the discontinuities in the frequency of reported trace occurrence. In a similar way, the PMFred algorithm can be used to check the homogeneity of the series of monthly relative frequency of trace precipitation occurrence; then adjustments to monthly total trace precipitation amounts can be derived. 
A different mean trace rate can be derived for each calendar month separately, to account for any annual cycle in this quantity (or for trace rainfall and trace snowfall separately, if desired).
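Steps (b)–(e) can be sketched with illustrative numbers. All input values below are made up; real values come from the T-flagged record and from the PMFred regression-model fit to the relative-frequency series.

```python
# Step (b): mean trace rate over the most recent homogeneous period,
# in millimeters per T flag.
recent_trace_total = 18.0     # total trace amount, most recent period (mm)
recent_t_flag_count = 240     # total count of T flags, same period
mean_trace_rate = recent_trace_total / recent_t_flag_count

# Step (c): annual relative frequency of T flags for one early year,
# adjusted to the most recent period (assumed regression-fit output).
adj_rel_freq = 0.035
n_obs_year = 365              # precipitation observations in that year

# Step (d): adjusted annual count of T flags for that year.
adj_t_flags = adj_rel_freq * n_obs_year

# Step (e): adjusted annual trace amount, added to the annual total
# to adjust it for the trace-frequency discontinuities.
annual_total = 420.0          # measured annual total precipitation (mm)
adj_trace_amount = adj_t_flags * mean_trace_rate
adjusted_annual_total = annual_total + adj_trace_amount
```

The same arithmetic applies per calendar month when a monthly mean trace rate is used, as suggested in the text.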

We also point out that, for the example series from Canada, the seasonal cycle is mainly in the frequency of precipitation occurrence and is negligible in the daily precipitation amounts, so that we can pool all nonzero daily precipitation amounts together for the test. When the seasonal cycle is not negligible in the daily precipitation amounts, one can pool all nonzero daily precipitation amounts in a season together and test the pooled precipitation series for each season separately (note that season is a general term here; it can be defined as a calendar month). This way, the seasonal cycle is accounted for by allowing a different transformation for different seasons, namely, different distributions of precipitation amounts and hence different variances in different seasons. The trend is also allowed to be different in different seasons. A disadvantage of this approach is that the same shift could be identified at slightly different times in the series for different seasons because of sampling variability and estimation error. Further analysis is needed to summarize the results of the tests, just as in the case of testing monthly mean temperature series for each season (or each of the 12 calendar months) separately.

## 7. Concluding remarks

We have integrated a Box–Cox power transformation procedure into the PMFred algorithm (which is based on the common trend two-phase regression model) to make it applicable for detecting changepoints in non-Gaussian data series, such as nonzero daily precipitation amounts, wind speeds, or dewpoint depression data (which are all nonnegative and non-Gaussian). The transformation procedure is also applicable to the two-phase regression model that allows the trend to change at the time of mean shift (Lund and Reeves 2002). We have also assessed the detection-power aspects of the new transPMFred algorithm by conducting a simulation study, the results of which show that the transPMFred algorithm is much better than the corresponding untransformed method, PMFred, for data of different non-Gaussian distributions (reflected by the different *λ* values ranging from –0.2 to 0.2). The transformation procedure can increase the hit rate by as much as 70% (see section 4).
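As a minimal sketch of the transformation step, the snippet below applies SciPy's maximum-likelihood Box–Cox estimate to synthetic skewed data; in the transPMFred algorithm the transformation is embedded in the changepoint test itself rather than applied once up front.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Nonzero daily precipitation amounts are positive and strongly skewed;
# a shifted gamma sample stands in for them here.
amounts = rng.gamma(shape=0.7, scale=6.0, size=3000) + 0.1

# Box-Cox power transformation toward normality; scipy estimates the
# power parameter lambda by maximum likelihood.
transformed, lam = stats.boxcox(amounts)

print("estimated lambda:", lam)
print("skewness before/after:", stats.skew(amounts), stats.skew(transformed))
```

The transformed series is far less skewed than the raw amounts, which is what makes a Gaussian-based changepoint test applicable to it.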

A software package called RHtests_dlyPrcp, which contains a set of functions for use in detection and adjustment of shifts in nonzero daily precipitation amounts, has also been developed and has been made available online (http://cccma.seos.uvic.ca/ETCCDMI/software.shtml). This includes a quantile-matching algorithm for adjusting shifts in nonzero daily precipitation series, which is applicable to other nonnegative data such as wind speed and dewpoint depression. We have developed a function, QMadj.nonNegativeDLY, for conducting QM adjustments on nonnegative daily data and another function, QMadj.GaussianDLY, for conducting QM adjustments on Gaussian daily data such as daily temperatures. (The major difference between these two functions is in the lower boundary condition, as described earlier in section 5.)
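The general quantile-matching idea with a zero lower boundary can be sketched as follows. This is an illustration of the concept only, not the QMadj.nonNegativeDLY implementation; the function name and the number of frequency categories are assumptions.

```python
import numpy as np

def qm_adjust_nonnegative(segment, target, n_quantiles=16):
    """Map each value in `segment` onto the `target` distribution at the
    same cumulative frequency, using n_quantiles frequency categories,
    then enforce the zero lower bound required for nonnegative data."""
    probs = np.linspace(0.0, 1.0, n_quantiles + 1)
    q_seg = np.quantile(segment, probs)
    q_tgt = np.quantile(target, probs)
    # Adjustment = target quantile minus segment quantile, interpolated
    # linearly between the estimated quantiles.
    adjustment = np.interp(segment, q_seg, q_tgt - q_seg)
    # Lower boundary condition (the key difference from a Gaussian
    # version for data such as temperatures): no negative adjusted values.
    return np.maximum(segment + adjustment, 0.0)
```

A Gaussian counterpart would simply omit the final clipping step, which mirrors the boundary-condition difference between the two functions described in the text.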

Noting that frequency discontinuities are often inevitable because of changes in the measuring precision and that they could complicate the detection of shifts in nonzero daily precipitation data series and void any attempt to homogenize the series, we recommend the use of transPMFred to test the series of daily precipitation amounts that are larger than a threshold value, varying the threshold over a set of small values that reflect changes in the measuring precision over time, to gain some insight into the characteristics of the discontinuities detected statistically. In the meantime, one should also test the homogeneity of the frequency series of reported zero and various small precipitation events, which can be done with the PMFred algorithm (the untransformed version). When a frequency discontinuity is present and is left unaccounted for, adjustments derived from the measured daily amounts could make the data deviate further from the truth, regardless of the method used to derive them [e.g., QM, IBC, or the ratio-based method of Alexandersson (1986)]. In this case, one must account for all frequency discontinuities before attempting to adjust the measured amounts. We have also proposed approaches to account for detected frequency discontinuities, for example, to fill in the missed measurements of small precipitation or the missed reports of trace precipitation (section 6). It must be stressed that caution has to be exercised when homogenizing daily precipitation data series so as not to overlook possible complications from frequency discontinuities.

The common trend assumption in the transPMFred (and PMFred) algorithm is not a problem for Canadian precipitation data series, which are mostly less than 100 yr in length and mostly have no significant low-frequency variations; however, it may not be valid for other precipitation data series. The common approach to accounting for low-frequency variations has been to use a reference series that has the same linear trend and periodic components (including annual cycle and low-frequency variations). However, the extremely high spatial variability and noncontinuity of daily precipitation make it unrealistic to find a suitable reference series for use in homogenization of daily precipitation series. A new method that allows nonlinear trends (consisting of periodic components and a linear trend) in the time series being tested, and thus does not need the common trend assumption, is being developed and will be reported upon in a separate study; we will then be able to drop the common trend assumption from the PMFred and transPMFred algorithms. The related software package will be updated accordingly.

A common variance throughout the period of data record is also assumed in the SNH, PMT, and PMF tests and thus in the PMFred and transPMFred algorithms. However, the common variance assumption may not be valid for climate variables in a changing climate; variance seasonality often exists in climate variables. As briefly discussed at the end of section 6, one can apply the test to daily precipitation series in each season separately, thus allowing the trend, the distribution, and hence the variance to be different in different seasons of the year. As an alternative, the QM adjustment algorithms can easily be modified to allow different trends in the series of data of different frequency categories, as was done in Dai et al. (2011), so that the distribution is allowed to change gradually over time. However, splitting one data series into several (say *M* > 1) series significantly decreases the sample size (from *N* to *N*/*M* for estimating the linear trend in each season or each of the *M _{q}* frequency categories), increasing the sampling uncertainty and hence the uncertainty in the estimates of the trends and the adjustments. One should seek a balance between sampling uncertainty (or estimation uncertainty) and the limitations of the assumptions. Note that the common variance assumption is also relaxed by allowing the use of a chosen part of a segment (instead of the whole segment) to estimate the PDF and hence the QM adjustments for use in homogenizing the series, as detailed earlier in section 5. This could also increase the estimation uncertainty.

## Acknowledgments

The authors thank Lucie Vincent for her helpful internal review comments on an earlier version of this manuscript and Hui Wan for her help in preparing the 11 Chinese data series. The Climate Research Division of the Atmospheric Science and Technology Directorate of Environment Canada is acknowledged for supporting this research through a Grant and Contributions (KM040-05-0016-IP) to York University. The three anonymous reviewers are also acknowledged for their helpful comments on an earlier version of this paper.


## Footnotes

*Corresponding author address:* Dr. Xiaolan L. Wang, Climate Research Division, ASTD, STB, Environment Canada, 4905 Dufferin St., Toronto, ON M3H 5T4, Canada. Email: xiaolan.wang@ec.gc.ca