## Abstract

This paper looks at the analysis of U.K. monthly rainfall data from a long-term persistence viewpoint. Different modeling approaches are considered, taking into account the strong dependence and the seasonality in the data. The results indicate that the most appropriate model is the one that presents cyclical long-run dependence with the order of integration being positive though small, and the cycles having a periodicity of about a year.

## 1. Introduction

Modeling hydrological and climatological time series is central to the planning, operation, and decision making concerning water resources and to the investigation of climatic fluctuations. It is also important for data generation, forecasting, estimating missing data, and extending data records (Delleur et al. 1976; Salas 1993; Hipel and McLeod 1994).

Rainfall data are difficult to model given that they are not perfectly linear, are non-Gaussian, present seasonality in some cases, and possess a degree of dependence across time. Throughout this paper we present a variety of model specifications that take into account all the above characteristics, focusing mainly on long-memory or long-term persistence processes, so called because they display a high degree of association between observations that are distant in time. These processes were introduced in the hydrological community by Hurst (1951), who heuristically detected the presence of long-range dependence in the well-known series of annual minima of the Nile River. Following Hurst’s work, extensive research has been carried out to detect long-term persistence or long memory in hydrological data (Hipel and McLeod 1994; Montanari et al. 1996; Montanari and Rosso 1997; Pelletier and Turcotte 1997; Corduas and Piccolo 2006; Koutsoyiannis and Montanari 2007; Gil-Alana 2009).

In the context of rainfall data, long-memory models are well suited to drought hazard assessment. Bras and Rodriguez-Iturbe (1985) initiated the discussion of long-range dependence in the context of droughts. They showed that standard autoregressive integrated moving average (ARIMA) models often underestimate the frequency of historical droughts. The use of fractional models was also advocated by Booy and Lye (1989) for flood frequency analysis. Montanari et al. (1996) fitted an autoregressive fractionally integrated moving average (ARFIMA) model to the monthly and daily inflows of Lake Maggiore, Italy, finding that the ARFIMA models provide a much better fit than the traditional ARIMA models. Other papers dealing with long memory in rainfall data are Burlando et al. (1996), Brath et al. (2001), Brockwell and Chan (2006), Kantelhardt et al. (2006), and Gil-Alana (2004, 2009). On the other hand, several authors have found significant trends in rainfall data. Thus, for example, Camuffo (1984) used one of the longest precipitation records available in Italy to detect the possible presence of tendencies or cyclical patterns in its behavior and found evidence of wavy trends, not always in phase with the frequency trend. Montanari et al. (1996) analyzed monthly and annual rainfall data in several locations in Italy and found decreasing trends in the precipitation amounts observed in the last century. Using long-range dependence techniques, these authors conclude, however, that some of the trends were statistically insignificant.

This paper innovates with respect to other articles on long-memory models for rainfall data in the use of a cyclical approach based on the Gegenbauer processes. We show that the U.K. monthly rainfall data can be well described in terms of a cyclical I(*d*) model with a positive though small differencing parameter *d*. From a physical perspective, the development of rainfall systems depends on the local environmental conditions and, at synoptic scales, can be traced back to earlier weather systems. Nevertheless, though long memory has been verified in many environmental time series (e.g., Bunde et al. 2003a,b), its evidence in rainfall data is weak (Pelletier and Turcotte 1997).

In this paper we focus on the U.K. monthly rainfall data and estimate different long-range dependence models based on their degrees of seasonal and nonseasonal persistence. The possibility of linear trends is also investigated. The results indicate that U.K. rainfall data do not contain linear trends but do present a persistent cyclical pattern with a length of approximately one year, with the effect of shocks disappearing relatively fast. The outline of the article is as follows: Section 2 presents the statistical framework. Section 3 describes the data and the models employed. Section 4 displays the empirical results, and section 5 concludes the article.

## 2. The statistical framework

For the purpose of the present work, we define an I(0) process as a covariance stationary process with spectral density function *f*(*λ*) that is positive and finite at any frequency. Alternatively, it can be defined in the time domain as a process such that the infinite sum of the autocovariances is finite. This includes a wide range of model specifications such as the white-noise case, the stationary autoregression (AR), moving average (MA), and stationary autoregressive moving average (ARMA) models. Usually, the I(0) condition is a prerequisite for statistical inference in time series analysis.

Within the I(0) context, the ARMA model is commonly employed in time series analysis, and in particular, the AR(1) model given by

$$u_t = \rho u_{t-1} + \upsilon_t, \qquad t = 1, 2, \ldots, \tag{1}$$

with $|\rho| < 1$, where $\rho$ is the AR coefficient and $\upsilon_t$ is white noise. This model has been widely employed in the hydrological and climatological community because of its relation with the stochastic first-order differential equation. In the case of seasonal data, (1) can be replaced by the seasonal AR(1) process:

$$u_t = \rho u_{t-s} + \upsilon_t, \qquad t = 1, 2, \ldots, \tag{2}$$

where $s$ indicates the number of time periods per year (e.g., $s = 4$ with quarterly data, and $s = 12$ with monthly data). A strong limitation of this approach is that the coefficient $\rho$ in (2) is fixed across seasons (months), implying that the rainfall amount for a certain month depends solely on the rainfall amount of the same season of the previous year.
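As a rough illustration of the two I(0) specifications above, the sketch below simulates $u_t = \rho u_{t-k} + \upsilon_t$ with Gaussian white noise, using lag $k = 1$ for the AR(1) case and $k = 12$ for the monthly seasonal case. The function names, the burn-in length, and the seed are illustrative assumptions and are not taken from the paper.

```python
import random


def simulate_ar(rho, lag, n, seed=0, burn_in=200):
    """Simulate u_t = rho * u_{t-lag} + v_t with standard Gaussian noise v_t.

    lag=1 gives the AR(1) model; lag=12 gives a monthly seasonal AR(1).
    The burn-in observations are discarded so the start-up zeros do not matter.
    """
    if abs(rho) >= 1:
        raise ValueError("stationarity requires |rho| < 1")
    rng = random.Random(seed)
    total = n + burn_in
    u = [0.0] * total
    for t in range(total):
        past = u[t - lag] if t >= lag else 0.0
        u[t] = rho * past + rng.gauss(0.0, 1.0)
    return u[burn_in:]


def sample_autocorr(x, k):
    """Sample autocorrelation of x at lag k."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    ck = sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n))
    return ck / c0


if __name__ == "__main__":
    seasonal = simulate_ar(rho=0.27, lag=12, n=5000, seed=1)
    # Dependence shows up at lag 12 (same month, previous year), not at lag 1.
    print(round(sample_autocorr(seasonal, 12), 3))
    print(round(sample_autocorr(seasonal, 1), 3))
```

In the seasonal case the autocorrelations are nonzero only at multiples of 12, which is the restriction criticized in the text: a given month is linked only to the same month in earlier years.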

On the other hand, the series might be nonstationary, in the sense that the mean, the variance, or the autocovariance may change across time. In this context we can take $\rho$ in (1) equal to 1, and $u_t$ is said to be integrated of order 1 [and denoted by $u_t \sim$ I(1)]. In such a case, the first differences $(1 - L)u_t = u_t - u_{t-1}$ are stationary I(0), and statistical inference must rely on the first-differenced process. In other words, $u_t$ is said to be I(1) if

$$(1 - L) u_t = \upsilon_t, \qquad t = 1, 2, \ldots, \tag{3}$$

where $L$ is the lag operator ($L u_t = u_{t-1}$) and $\upsilon_t$ is I(0) as defined above. If $\upsilon_t$ is ARMA($p$, $q$), then $u_t$ is said to be an ARIMA($p$, 1, $q$) model.^{1}

Similarly, for the seasonal case, we can take seasonal first differences on $u_t$, such that

$$(1 - L^{s}) u_t = \upsilon_t, \qquad t = 1, 2, \ldots, \tag{4}$$

and $\upsilon_t$ is I(0). There exist various test statistics for testing models like (3) (i.e., unit roots; e.g., Dickey and Fuller 1979; Phillips and Perron 1988; Elliot et al. 1996; Ng and Perron 2001) and (4) (seasonal unit roots; Dickey et al. 1984; Hylleberg et al. 1990; Beaulieu and Miron 1993) that have been widely employed in the time series literature. These methods, however, focus on the I(0) (stationary) and I(1) (nonstationary) cases and do not take into account other more flexible alternatives such as the fractional ones that will be examined in this work.

The above models have been extended in recent years to the fractional case. The number of differences required to render a series stationary I(0) may not necessarily be an integer value (usually 1) but a fractional one. In this context, $u_t$ is said to be I($d$) if

$$(1 - L)^{d} u_t = \upsilon_t, \qquad t = 1, 2, \ldots, \tag{5}$$

where $d$ can be any real value, $u_t = 0$ for $t \le 0$, and $\upsilon_t$ is again I(0). Note that the polynomial on the left-hand side of (5) can be expanded, for all real $d$, as

$$(1 - L)^{d} = \sum_{j=0}^{\infty} \binom{d}{j} (-1)^{j} L^{j} = 1 - dL + \frac{d(d-1)}{2} L^{2} - \cdots.$$

Thus, if $d$ in (5) is an integer value, $u_t$ will be a function of a finite number of past observations, while if $d$ is a noninteger, $u_t$ depends upon values of the time series in the distant past, and the higher the value of $d$, the higher the level of dependence between the observations.^{2}

Examples of applications of I($d$) models of the form given in (5) to hydrological data are the papers of Montanari et al. (1996) and Montanari and Rosso (1997), and other applications of these models to meteorological and climatological data can be found in Bloomfield (1992), Smith (1993), Lewis and Ray (1997), Maraun et al. (2004), and Gil-Alana (2003, 2005).
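The binomial expansion of $(1 - L)^{d}$ can be made concrete with a short numerical sketch. The coefficients $\pi_j$ in $(1 - L)^{d} = \sum_j \pi_j L^{j}$ satisfy $\pi_0 = 1$ and $\pi_j = \pi_{j-1}(j - 1 - d)/j$, which follows directly from the binomial coefficients. The function names below are illustrative, and the filter assumes $u_t = 0$ for $t \le 0$ as in (5).

```python
def frac_diff_weights(d, n_terms):
    """Coefficients pi_j of (1 - L)^d = sum_j pi_j L^j, for j = 0..n_terms-1."""
    weights = [1.0]
    for j in range(1, n_terms):
        weights.append(weights[-1] * (j - 1 - d) / j)
    return weights[:n_terms]


def frac_diff(u, d):
    """Apply (1 - L)^d to the series u, treating pre-sample values as zero."""
    w = frac_diff_weights(d, len(u))
    return [sum(w[j] * u[t - j] for j in range(t + 1)) for t in range(len(u))]


if __name__ == "__main__":
    # Integer d: only a finite number of lags carry weight (first differences).
    print(frac_diff_weights(1.0, 5))
    # Fractional d: every lag carries a slowly decaying, nonzero weight.
    print(frac_diff_weights(0.5, 4))   # [1.0, -0.5, -0.125, -0.0625]
```

With $d = 1$ all weights beyond the first lag vanish, whereas for a noninteger $d$ the weights decay hyperbolically and never reach zero, which is the dependence on the distant past described above.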

If $d > 0$ in (5), $u_t$ displays long-range dependence (LRD). We can provide two definitions of LRD, one in the time domain and the other in the frequency domain. The time domain definition of LRD states that given a covariance stationary process $\{u_t,\ t = 0, \pm 1, \ldots\}$, with an autocovariance function $E[(u_t - Eu_t)(u_{t-j} - Eu_t)] = \gamma_j$, $u_t$ displays LRD if

$$\lim_{T \to \infty} \sum_{j=-T}^{T} |\gamma_j|$$

is infinite. A frequency domain definition is as follows. Suppose that $u_t$ has an absolutely continuous spectral distribution, so that it has a spectral density function, denoted by $f(\lambda)$, and defined as

$$f(\lambda) = \frac{1}{2\pi} \sum_{j=-\infty}^{\infty} \gamma_j \cos(\lambda j), \qquad -\pi < \lambda \le \pi.$$

Then, $u_t$ displays LRD if the spectral density function has a pole at some frequency $\lambda^{*}$ in the interval $[0, \pi]$, that is,

$$f(\lambda) \to \infty \quad \text{as } \lambda \to \lambda^{*}, \qquad \lambda^{*} \in [0, \pi].$$

Most of the empirical literature has concentrated on the case where the singularity or pole in the spectrum occurs at the zero frequency ($\lambda^{*} = 0$). In fact, the I($d$) model, defined as in (5), is characterized by a spectral density function that is unbounded at the origin.^{3} However, there might be situations where the singularity or pole in the spectrum takes place at other frequencies. This is the case of the seasonal fractional process, which is basically an extension of model (5) to

$$(1 - L^{s})^{d} u_t = \upsilon_t, \qquad t = 1, 2, \ldots, \tag{7}$$

where $d$ may again be any real value, and its binomial expansion provides that, for all real $d$,

$$(1 - L^{s})^{d} = \sum_{j=0}^{\infty} \binom{d}{j} (-1)^{j} L^{sj} = 1 - dL^{s} + \frac{d(d-1)}{2} L^{2s} - \cdots.$$

Thus, if $d$ in (7) is noninteger, $u_t$ depends on past values at lags that are multiples of $s$. In this case, the spectral density function is unbounded at the zero frequency and also at the $s - 1$ seasonal frequencies. Models of this form have been examined by, among others, Porter-Hudak (1990), Ray (1993), Sutcliffe (1994), and Gil-Alana and Robinson (2001).

Another specification in the context of LRD models is the one where the spectrum contains a single pole at a frequency away from zero. In this case, the process still displays the property of LRD but the autocorrelations present a cyclical structure that decays very slowly. This is the case of the Gegenbauer processes defined as

$$(1 - 2\cos w_r\, L + L^{2})^{d} u_t = \upsilon_t, \qquad t = 1, 2, \ldots, \tag{7a}$$

where $w_r$ and $d$ are real values, and $\upsilon_t$ is I(0). For practical purposes we define $w_r = 2\pi r/T$, with $r = T/c$, and thus $c$ will indicate the number of time periods per cycle, while $r$ refers to the frequency that presents a pole or singularity in the spectrum of $u_t$ ($\lambda^{*}$). Note that if $r = 0$ (or $c = 1$), the fractional polynomial in (7a) becomes $(1 - L)^{2d}$, which is the polynomial associated with the common case of fractional integration at the long-run or zero frequency.

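Gray et al. (1989, 1994) showed that the polynomial in (7a) can be expressed in terms of the Gegenbauer polynomial, such that, denoting $\mu = \cos w_r$, for all $d \neq 0$,

$$(1 - 2\mu L + L^{2})^{-d} = \sum_{j=0}^{\infty} C_{j,d}(\mu) L^{j},$$

where $C_{j,d}(\mu)$ are orthogonal Gegenbauer polynomial coefficients recursively defined as

$$C_{0,d}(\mu) = 1, \qquad C_{1,d}(\mu) = 2\mu d,$$

$$C_{j,d}(\mu) = 2\mu \left( \frac{d - 1}{j} + 1 \right) C_{j-1,d}(\mu) - \left( \frac{2(d - 1)}{j} + 1 \right) C_{j-2,d}(\mu), \qquad j = 2, 3, \ldots$$

[see, for instance, Magnus et al. (1966) and Rainville (1960) for further details on Gegenbauer polynomials]. Gray et al. (1989) showed that $u_t$ in (7a) is (covariance) stationary if $d < 0.5$ for $|\mu = \cos w_r| < 1$ and if $d < 0.25$ for $|\mu| = 1$.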
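The Gegenbauer expansion can be checked numerically with a minimal sketch of the standard three-term recursion, $C_{0,d} = 1$, $C_{1,d} = 2\mu d$, and $C_{j,d} = 2\mu\left(\frac{d-1}{j} + 1\right)C_{j-1,d} - \left(\frac{2(d-1)}{j} + 1\right)C_{j-2,d}$. The function name and the parameter values in the usage lines are illustrative assumptions; the value $d = 0.014$ is only borrowed from the order of magnitude reported later in the paper.

```python
import math


def gegenbauer_coeffs(d, mu, n_terms):
    """C_{j,d}(mu) in (1 - 2*mu*L + L^2)^(-d) = sum_j C_{j,d}(mu) L^j."""
    coeffs = [1.0, 2.0 * mu * d]
    for j in range(2, n_terms):
        a = 2.0 * mu * ((d - 1.0) / j + 1.0)
        b = 2.0 * (d - 1.0) / j + 1.0
        coeffs.append(a * coeffs[j - 1] - b * coeffs[j - 2])
    return coeffs[:n_terms]


if __name__ == "__main__":
    # Monthly data with one cycle per year: w = 2*pi/12, mu = cos(w).
    mu = math.cos(2.0 * math.pi / 12.0)
    for j, c in enumerate(gegenbauer_coeffs(d=0.014, mu=mu, n_terms=25)):
        print(j, round(c, 5))
```

Two special cases give a quick sanity check on the recursion. With $\mu = 1$ the coefficients coincide with those of $(1 - L)^{-2d}$, matching the remark that the cyclical polynomial collapses to ordinary fractional integration at the zero frequency. With $d = 1$ they reduce to $\sin((j+1)w)/\sin w$, which shows the cyclical pattern of period $2\pi/w$.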

This type of process described in (7a) was introduced by Andel (1986) and subsequently analyzed by Gray et al. (1989, 1994), Chung (1996a,b), Gil-Alana (2001), and Dalla and Hidalgo (2005) among many others.

All the above models display LRD: they are characterized by a spectral density that is unbounded at one (or more) frequencies in the interval [0, *π*]. They will all be employed in the following sections when analyzing the U.K. monthly rainfall data.

## 3. Data and models

The data employed in this article correspond to the monthly U.K. rainfall (mm) data, areal series, from January 1914 to December 2008. They were obtained from the Met Office website (http://www.metoffice.gov.uk).

Figure 1 displays the time series plot along with its corresponding sample correlogram and periodogram. The data have an appearance of stationarity though with some type of seasonal (cyclical) pattern, which is also observed through the correlogram and the periodogram. In fact, the correlogram displays values with a cyclical structure that is decaying very slowly, which may be consistent, among many other specifications, with some of the LRD models described in section 2. Moreover, the periodogram presents a large peak at a frequency away from zero, giving further support in favor of the cyclical I($d$) model described by (7a). Although the periodogram is not a consistent estimator of the spectral density function, it is an asymptotically unbiased estimator of it and, evaluated at the discrete Fourier frequencies $\lambda_j = 2\pi j/T$, $j = 1, 2, \ldots, T/2$, can give us an indication about the length of the cycles. In our case, we observe that the highest peak takes place at $\lambda_{95}$ (i.e., $j = 95$), which implies that the cycles have a periodicity of $T/95 = 1140/95 = 12$ periods (months) per cycle.
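The step from the periodogram peak to the cycle length can be sketched as follows. The code evaluates the periodogram at the Fourier frequencies $\lambda_j = 2\pi j/T$ and converts the index of the largest ordinate into a period $T/j$. The series used here is synthetic (a 12-period cosine plus Gaussian noise with arbitrary amplitude and variance); it is not the Met Office data, and the function names are illustrative.

```python
import cmath
import math
import random


def periodogram(x):
    """Periodogram ordinates at lambda_j = 2*pi*j/T for j = 1, ..., T // 2."""
    T = len(x)
    mean = sum(x) / T
    values = []
    for j in range(1, T // 2 + 1):
        lam = 2.0 * math.pi * j / T
        dft = sum((v - mean) * cmath.exp(-1j * lam * t) for t, v in enumerate(x))
        values.append(abs(dft) ** 2 / (2.0 * math.pi * T))
    return values


def dominant_cycle(x):
    """Return (j, T / j) for the Fourier frequency with the largest ordinate."""
    ordinates = periodogram(x)
    j_peak = max(range(len(ordinates)), key=ordinates.__getitem__) + 1
    return j_peak, len(x) / j_peak


if __name__ == "__main__":
    # Synthetic stand-in for 95 years of monthly data (T = 1140).
    rng = random.Random(0)
    T = 1140
    series = [80.0 + 30.0 * math.cos(2.0 * math.pi * t / 12.0) + rng.gauss(0.0, 20.0)
              for t in range(T)]
    j_peak, period = dominant_cycle(series)
    print(j_peak, period)
```

For a sample of $T = 1140$ observations with an annual cycle, the peak falls at $j = 95$, which reproduces the arithmetic $1140/95 = 12$ in the text. The direct sum is quadratic in $T$; an FFT would be the usual choice for longer series.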

In what follows, we assume that $y_t$ is the time series we observe (U.K. monthly rainfall data). Initially we suppose that it contains deterministic terms such as an intercept and/or a linear time trend,^{4}

$$y_t = \beta_0 + \beta_1 t + x_t, \qquad t = 1, 2, \ldots, \tag{8}$$

and, based on the above evidence, we examine the following models for the detrended series:

$$x_t = \rho x_{t-12} + \varepsilon_t, \tag{8a}$$

$$(1 - L)^{d} x_t = u_t, \qquad u_t = \rho u_{t-12} + \varepsilon_t, \tag{8b}$$

$$(1 - L^{12})^{d} x_t = u_t, \qquad u_t = \rho u_{t-1} + \varepsilon_t, \tag{8c}$$

$$(1 - 2\cos w_r\, L + L^{2})^{d} x_t = \varepsilon_t, \tag{8d}$$

$$(1 - 2\cos w_r\, L + L^{2})^{d} x_t = u_t, \qquad u_t = \rho u_{t-12} + \varepsilon_t, \tag{8d$'$}$$

in all cases with white noise $\varepsilon_t$. Note that model (8a) imposes an I(0) structure for the detrended time series $x_t$; model (8b) is more general, allowing for fractional integration or I($d$), and including model (8a) as a particular case when $d = 0$. In this context, the parameter $d$ describes the long-run persistence of the series while $\rho$ measures the short-run seasonal behavior. Model (8c) also uses fractional integration, though now $d$ describes the long-run seasonal evolution of the series, while $\rho$ measures the short-run dynamics.^{5}

Finally, in models (8d) and (8d′) the parameter $d$ indicates the cyclical long-range persistence, and the short-run dynamics are described in terms of a seasonal AR process in (8d′) to incorporate potential periodicities. Nonseasonal AR disturbances were also considered in this model, but they were discarded since no statistically significant coefficients were found.

The inclusion of intercepts and linear trends may appear unrealistic in the context of the present research. However, these terms disappear when interacting with the LRD polynomials. Note that the model $y_t = \beta_0 + \beta_1 t + x_t$, $\rho(L; d) x_t = u_t$, where $\rho(L; d)$ is the LRD polynomial, can be expressed as $\tilde{y}_t = \beta_0 \tilde{1}_t + \beta_1 \tilde{t}_t + u_t$, where $\tilde{y}_t = \rho(L; d) y_t$; $\tilde{1}_t = \rho(L; d) 1$ and $\tilde{t}_t = \rho(L; d) t$; and the time trend asymptotically disappears for $d > 0$. In any case, the presence of deterministic trends in rainfall data has been observed in some regions (e.g., Camuffo 1984; Montanari et al. 1996), which could be due to increasing moisture content in the atmosphere associated with anthropogenic global warming and/or increasing rainfall from severe weather systems. The significance of these deterministic terms will be examined by means of the $t$ values, noting that the residuals in the differenced processes are supposed to be well behaved I(0).^{6}

The LRD models (8b)–(8d′) were estimated using the Whittle function in the frequency domain (see Dahlhaus 1989) and also using a very general testing procedure derived by Robinson (1994) that permits us to consider each of the above specifications as particular cases of interest. This method is based on the Lagrange multiplier (LM) principle and tests the null hypothesis *H*_{0}: *d* = *d*_{0} in (8b)–(8d′) for any real value *d*_{0}, thus encompassing stationary (*d*_{0} < 0.5) and nonstationary (*d*_{0} ≥ 0.5) hypotheses. Moreover, this method is the most efficient one in the context of fractional integration under the assumption of Gaussianity of the error term.^{7} The functional form of this method can be found in any of the numerous empirical applications based on this approach (Gil-Alana and Robinson 1997, 2001; Gil-Alana 2001).
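To give a feel for frequency-domain estimation, the sketch below implements a simple discrete Whittle-type estimator of $d$ for the cyclical model with white noise errors, as in (8d). It is an illustration under stated assumptions, not the procedure used in the paper: it relies on the spectral shape $g(\lambda; d) = (2|\cos\lambda - \cos w|)^{-2d}$ implied by the Gegenbauer polynomial, concentrates out the innovation variance, drops the Fourier frequency that coincides with the pole, and minimizes the objective by grid search over $-0.49 \le d \le 0.49$. Function names, the grid step, and the simulated input are all hypothetical.

```python
import cmath
import math
import random


def periodogram(x):
    """Periodogram ordinates at lambda_j = 2*pi*j/T for j = 1, ..., T // 2."""
    T = len(x)
    mean = sum(x) / T
    values = []
    for j in range(1, T // 2 + 1):
        lam = 2.0 * math.pi * j / T
        dft = sum((v - mean) * cmath.exp(-1j * lam * t) for t, v in enumerate(x))
        values.append(abs(dft) ** 2 / (2.0 * math.pi * T))
    return values


def whittle_cyclical_d(x, cycle_length, grid_step=0.005):
    """Grid-search Whittle-type estimate of d in (1 - 2cos(w)L + L^2)^d x_t = e_t."""
    T = len(x)
    cos_w = math.cos(2.0 * math.pi / cycle_length)
    pairs = []  # (periodogram ordinate, log of 2|cos(lambda_j) - cos(w)|)
    for j, i_j in enumerate(periodogram(x), start=1):
        gap = 2.0 * abs(math.cos(2.0 * math.pi * j / T) - cos_w)
        if gap > 1e-8:  # skip the frequency sitting on the spectral pole
            pairs.append((i_j, math.log(gap)))
    mean_log_gap = sum(lg for _, lg in pairs) / len(pairs)
    best_d, best_q = 0.0, math.inf
    for k in range(int(round(0.98 / grid_step)) + 1):
        d = -0.49 + k * grid_step
        # I_j / g_j = I_j * gap^(2d); the mean of log g_j equals -2d * mean_log_gap.
        scale = sum(i_j * math.exp(2.0 * d * lg) for i_j, lg in pairs) / len(pairs)
        q = math.log(scale) - 2.0 * d * mean_log_gap
        if q < best_q:
            best_d, best_q = d, q
    return best_d


if __name__ == "__main__":
    rng = random.Random(3)
    white_noise = [rng.gauss(0.0, 1.0) for _ in range(600)]
    print(whittle_cyclical_d(white_noise, cycle_length=12))  # should be close to 0
```

A grid search is used only to keep the example dependency-free; a numerical optimizer would normally replace it. The sketch does not cover the deterministic terms in (8) or the AR disturbances in (8b), (8c), and (8d′), and it does not implement the Robinson (1994) LM test.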

## 4. Empirical results

We start by estimating model (8a), the simple seasonal AR(1) process. We display in Table 1 the estimates for the three cases of no deterministic terms [i.e., *β*_{0} = *β*_{1} = 0 a priori in (8)], an intercept (*β*_{0} unknown and *β*_{1} = 0), and an intercept with a linear time trend (*β*_{0} and *β*_{1} unknown). In the latter case, the trend coefficient appears statistically insignificantly different from zero, while the intercept is significant. In the three cases, the AR coefficient is found to be around 0.27. As earlier mentioned, this model seems unrealistic since it imposes the same value *ρ* across months.^{8}

Next, we allow for the possibility of fractional integration at the zero frequency and consider model (8b) as a potential specification. The results—again for the three cases of no regressors, an intercept, and an intercept with a linear time trend—are displayed in Table 2. We see here that the estimated value of *d* is significantly positive in the three cases, therefore ruling out model (8a), which would correspond to the case of *d* = 0. If no regressors are considered, the estimated value of *d* is equal to 0.138, and including an intercept and/or a linear time trend, the value is slightly smaller (0.076), though still statistically significant at the 5% level. The *t* values of the deterministic terms indicate that a time trend is not required, while the intercept appears statistically significant. The seasonal AR coefficient is about 0.25 for the three cases. This specification is fairly realistic, with the current value being a function of all its past history, and higher weights are given to the values in the same month as the current one in previous years.^{9}

Next, we consider model (8c), that is, the long-run behavior is now described through the seasonal fractional differencing parameter, *d*, while the short-run dynamic is described by means of a nonseasonal AR(1) process. The results for this case are reported in Table 3. We observe that the seasonal differencing parameter is slightly negative in the three cases (about −0.004) and the I(0) hypothesis cannot be rejected at conventional statistical levels. Here, the estimated AR(1) coefficient is about 0.22. As in the previous case, the time trend seems not to be required while the intercept is significantly different from zero. Note that in this model the current value of the series is also a function of all its past history though the autocorrelations are now decaying hyperbolically with respect to the seasonal monthly structure, and exponentially faster for the remaining values.

In Tables 4 and 5, we consider an alternative specification given by (8d) and (8d′). Here we suppose that the disturbances are white noise (in Table 4) and seasonal AR(1) (in Table 5). According to these two models, the U.K. monthly rainfall data follow a cyclical I(*d*) process with a periodicity of 12 months for the two cases of uncorrelated and AR(1) errors.

The results are similar in the two cases. The cyclical order of integration *d* is positive though small, the time trends are insignificant, while the intercepts are statistically different from zero. If seasonal autoregressions are allowed, the coefficient is found to be about 0.24. Noting that model (8d) is nested in (8d′) we performed a likelihood ratio (LR) test on these two specifications, testing the null of *ρ* = 0 in (8d′). The result of the test statistic (1.391) indicates that the model with white noise disturbances [i.e., (8d)] is preferred.

According to the above specifications, and noting that model (8a) is rejected in favor of (8b), and (8d) is preferred to (8d′), we consider the following three potential models to describe the persistent behavior of the series:

and

Remember that model M1 refers to the case of fractional integration or I(*d*) behavior (at the long-run or zero frequency) with seasonal AR disturbances, and model M2 refers to seasonal fractional integration with nonseasonal AR disturbances, while model M3 is the one related to cyclical fractional integration of the form based on the Gegenbauer processes. We examined the residuals of the three selected models based on diagnostic tests of serial correlation and functional form, and the results strongly support the cyclical I(*d*) model M3.

Finally, we compare the above models in terms of their forecasting performance by means of an in-sample forecasting experiment. Standard measures of forecast accuracy are the following: the mean absolute percentage error (MAPE), the mean-square error (MSE), the root-mean-square error (RMSE), the root-mean-percentage-square error (RMPSE), and mean absolute deviation (MAD). Some of these measures were implemented and, though not reported because of limited space, model M3 produced the best results in the majority of the cases. On the other hand, it may be argued that the above measures are purely descriptive devices. There exist now statistical tests for comparing different forecasting models. One of these tests, widely employed in the time series literature, is the asymptotic test for a zero expected loss differential of Diebold and Mariano (1995).^{10} However, for a given covariance stationary sample realization, $(d_t)_{t=T+h,\ldots,T+n}$, Harvey et al. (1997) noted that the Diebold–Mariano test statistic could be seriously oversized as the lead time $h$ increases. These authors provided a modified Diebold–Mariano test statistic given by

$$M\text{-}DM = DM \sqrt{\frac{n + 1 - 2h + h(h-1)/n}{n}},$$

where DM is the original Diebold–Mariano statistic.
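The small-sample correction of Harvey et al. (1997) is easy to compute once the loss differentials are available. The sketch below assumes a series of loss differentials (for example, squared errors of one model minus those of another), estimates the long-run variance with autocovariances up to lag $h - 1$, and rescales the Diebold–Mariano statistic by $\sqrt{[n + 1 - 2h + h(h-1)/n]/n}$. Function and variable names are illustrative; in particular, `loss_diff` stands for the sequence written $d_t$ in the text.

```python
import math


def diebold_mariano(loss_diff, h):
    """DM statistic: mean loss differential over its estimated standard error.

    The long-run variance uses sample autocovariances up to lag h - 1
    (rectangular window), which can be negative in small samples; in that
    case math.sqrt raises ValueError.
    """
    n = len(loss_diff)
    mean = sum(loss_diff) / n

    def autocov(k):
        return sum((loss_diff[t] - mean) * (loss_diff[t - k] - mean)
                   for t in range(k, n)) / n

    long_run_var = autocov(0) + 2.0 * sum(autocov(k) for k in range(1, h))
    return mean / math.sqrt(long_run_var / n)


def modified_dm(loss_diff, h):
    """Harvey-Leybourne-Newbold modified DM statistic for h-step forecasts."""
    n = len(loss_diff)
    factor = math.sqrt((n + 1 - 2 * h + h * (h - 1) / n) / n)
    return factor * diebold_mariano(loss_diff, h)


if __name__ == "__main__":
    # Hypothetical loss differentials, for illustration only.
    example = [0.8, -0.2, 1.1, 0.4, 0.9, -0.1, 0.7, 0.3, 1.2, 0.5, 0.6, 0.2]
    print(modified_dm(example, h=1))
```

Harvey et al. (1997) also suggest comparing the modified statistic with Student-$t$ critical values with $n - 1$ degrees of freedom rather than with the standard normal.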

Based on the *M* − DM test, we make pairwise comparisons between the three models using the RMSE and the MAPE as the criteria employed, and for *h* = 12, 24, 36, and 48 in a 48-period (in sample) horizon. We report in Table 6 the values of the *M*−DM statistic for the case of the RMSE and *h* = 48 clearly observing the superiority of the model M3 over the other two. Similar results were obtained with other criteria and (lead) time horizons.

## 5. Concluding comments

In this paper we have examined long-range persistence in the U.K. monthly rainfall data by using a variety of model specifications to describe the long-run dependence in the data. Specifically, we used the following three models: 1) a fractionally integrated or I(*d*) model with seasonal AR disturbances, 2) a seasonally fractionally integrated model with nonseasonal AR disturbances, and 3) a cyclically fractionally integrated model. The latter model is based on the Gegenbauer processes and assumes a cyclical structure in the autocorrelations that decays slowly to zero. It is also characterized by a spectral density function that is unbounded at a nonzero frequency. Note that all these models imply long-run persistence, which is a commonly observed feature in hydrological and meteorological data. Also, the specifications employed in this work nest standard models widely employed in modeling rainfall data, including stationary and nonstationary models, with and without deterministic trends, with short-memory and long-memory dependencies, and, in the latter case, allowing the pole in the spectrum to occur at zero and nonzero frequencies. Our results clearly reject the hypothesis of short memory or I(0) stationarity in favor of long-memory models. Also, diagnostic tests carried out on the residuals along with an in-sample forecasting experiment support the view that the cyclical I(*d*) model is the most appropriate specification for this series. This result is consistent with the peak observed in the periodogram of the data at the *T*/12 frequency, suggesting cycles of approximately one year in length. The estimated order of integration is found to be positive though small, implying that the autocorrelations cyclically decay relatively fast (though hyperbolically) to zero.

With respect to the deterministic components (in particular, the intercepts that are found to be statistically significant) it may be argued that they do not have a physical meaning in the context of rainfall data. However, as we argued earlier in this work, they disappear in the long run as the fractional differencing parameter is found to be strictly higher than 0. Moreover, we also conducted the analysis based on the mean-subtracted series and, though not reported, the estimated LRD coefficients were practically the same as those obtained with the original data. In particular, with respect to model M3 the estimated value of *d* was found to be 0.017, slightly higher than the one reported in this work (0.014). The lack of linear trends in the data, at least for the U.K. case examined in this work, is another distinguishing feature of this paper, especially noting that linear trends are common in precipitation in many regions (see, e.g., Castañeda and Barros 1994, for Argentina; Trömel and Schönwiese 2008, for Germany; Yazdani et al. 2011, for Iran). Our results strongly support the view that there are no linear trends in the U.K. rainfall data. Nonlinear trends of the form employed by authors such as Karl et al. (2000) for global temperatures and Minetti et al. (2003) for Argentina and Chile will be examined in future papers.

The forecasting experiment conducted in this work also indicates that the cyclically fractionally integrated model outperforms noncyclical long-memory models, while these latter models are also found to be superior to others based on integer degrees of differencing. How our cyclical I(*d*) model performs in out-of-sample contexts is a matter that will be investigated in future papers.

As a final remark, it is of interest to show whether the same result holds in other monthly rainfall data, noting that if that were the case, those specifications based on fractional (or integer) degrees of integration at the long-run or zero frequency would produce invalid results based on biased estimates of the fractional differencing parameter. Work in this direction is now in progress.

## Acknowledgments

The author gratefully acknowledges financial support from the Ministerio de Ciencia y Tecnología (ECO2008-03035 ECON Y FINANZAS, Spain) and from a PIUNA Project at the University of Navarra. Comments from two anonymous referees and the editor of the journal are gratefully acknowledged.

## Footnotes

^{1} See Salas et al. (1980) for a complete review of these models and their characteristics.

^{2} Though not explicitly mentioned, $u_t$ in (5) admits an infinite AR representation, implying that $u_t$ can be expressed in terms of all its past history.

^{3} These processes were introduced by Granger (1980, 1981) and Hosking (1981), and they were justified in terms of the aggregation of individual heterogeneous AR(1) processes by Robinson (1978) and Granger (1980).

^{4} Seasonal dummy variables were not considered given the lack of systematic patterns in the monthly observations (see Fig. 2).

^{6} Some authors eliminate trends in the data by using techniques such as wavelets (Arneodo et al. 1996) or detrended fluctuation analysis (DFA; Bunde et al. 2003a). See also Yue et al. (2010) for applications of these techniques in precipitation.

^{7} Note, however, that Gaussianity is not a requirement in this procedure with a moment condition of only order 2 required.

^{8} Fitting AR(1) models to each month separately, the estimated parameters were −0.03, −0.05, 0.27, 0.03, 0.08, 0.06, 0.12, −0.07, 0.05, 0.08, −0.00, and 0.10, clearly showing a distinct pattern across months.

^{9} As mentioned earlier, though long-range persistence exists in temperature, its evidence is weak in rainfall data.

^{10} An alternative approach is the bootstrap-based test of Ashley (1998), though this method is computationally more intensive.