1. Introduction
Precipitation information is crucial for understanding the complex interactions within the hydrological cycle, assessing the availability of water resources in space and time (see, e.g., Langousis and Kaleris 2014; Mamalakis et al. 2017), and making informed decisions for water management (see, e.g., Adler et al. 2000), the design of hydraulic infrastructure, and water balance assessments (see, e.g., Viola et al. 2017; Caracciolo et al. 2017).
In this context, one of the first attempts to understand the statistical character of annual precipitation was conducted by Markovic (1965), who used annual rainfall totals (ART) from 2506 stations in the western United States and southwestern Canada with at least 30 years of recordings, to identify the distribution model that best fits the observed frequencies. The study concluded that among the normal, lognormal, and Gamma distribution models, the lognormal probability density function produced acceptable results, especially in the case when the annual precipitation was positively skewed. Following the early work of Markovic (1965), several other studies have focused on identifying the best distribution model that fits the observed frequencies of annual precipitation in different areas around the globe, including India (e.g., Mooley et al. 1981; Rodell et al. 2009), south Florida (e.g., Sculley 1986), Costa Rica (e.g., Waylen et al. 1996), Saudi Arabia (e.g., Abdullah and Al-Mazroui 1998), Israel (e.g., Ben-Gai et al. 1998), Jordan (e.g., Dahamsheh and Aksoy 2007), Portugal (e.g., De Lima et al. 2010), Nigeria (e.g., Ogungbenro and Morakinyo 2014), and Iran (e.g., Vaheddoost and Aksoy 2017). The aforementioned studies concluded that the type of the distribution model that best fits the empirical frequencies of ART varies spatially, but in some cases may be well approximated by a normal distribution; see, for example, Mooley et al. (1981), Waylen et al. (1996), Abdullah and Al-Mazroui (1998), De Lima et al. (2010), and Ogungbenro and Morakinyo (2014).
The theoretical basis for the normality assumption of ART arises from the central limit theorem (CLT); see, for example, Parzen (1960), Fisz (1963), Feller (1968), Benjamin and Cornell (1970), and Papoulis (1990). According to this theorem, if Xi (i = 1, …, n) are independent copies of a random variable X with mean m and finite variance σ², then asymptotically as n → ∞ the standardized sum Zn = (X1 + … + Xn − n·m)/(σ√n) converges in distribution to a standard normal variable.
Given the increased importance of the statistical modeling of ART, this study aims at developing a nonparametric procedure to classify ART samples into two complementary groups: approximately Gaussian distributed samples (G) and non-Gaussian distributed samples (NG), based on the marginal statistics of daily rainfall. To do so, we combine 1) goodness-of-fit metrics to conclude on the approximate convergence of the empirical distribution of annual rainfall totals to a normal shape, and classify ART samples into the G and NG complementary groups; 2) logistic regression analysis to identify the statistics of daily rainfall that are most descriptive of the G/NG classification; and 3) a random-search algorithm to determine a set of constraints that allows classification of ART samples based on the marginal statistics of daily rain rates. The analysis is conducted using 3007 time series of daily rainfall from the NOAA/NCDC Global Historical Climatology Network (GHCN) database, with global coverage. This global coverage also allows us to study how large-scale climatic features, such as those embedded in the Köppen–Geiger climatic classification (Kottek et al. 2006), affect the Gaussianity of annual rainfall totals.
The suggested nonparametric procedure for assessing the accuracy of the normality assumption for annual rainfall totals is based on the marginal statistics of daily rainfall. It is simple to apply and has minimal data length requirements, making it a useful tool for practitioners and hydrologists who operate in data-poor regions.
The paper is organized as follows. Section 2 provides the necessary information on the origin and processing of the rainfall data used. Section 3 exemplifies the use of three goodness-of-fit statistical metrics to assess the approximate validity of the normality assumption for annual rainfall totals, study its linkage to rainfall marginal statistics, and classify ART samples into two complementary groups: G and NG. In section 4, we combine a random-search algorithm with a proper test-based objective function to develop, and study the performance of, a nonparametric procedure to assess the accuracy of the normality assumption for annual rainfall totals, based on the marginal statistics of daily rain rates. Conclusions and future research directions are presented in section 5.
2. Dataset and case study
In the analysis that follows we use daily rainfall data from the NOAA/NCDC GHCN rainfall database (https://www.ncdc.noaa.gov/ghcn-daily-description; see Menne et al. 2012). NOAA/NCDC GHCN contains daily time series from 90 230 stations with global coverage. To ensure the statistical significance of the obtained results, and similar to previous studies (see, e.g., Easterling et al. 1997; Papalexiou et al. 2018; Papalexiou and Montanari 2019), we use a total of 3007 stations with percentages of missing data below 5%, yearly completeness above 98%, and more than 30 years of recordings. The density map of the analyzed stations is shown in Fig. 1, while Fig. 2 illustrates the percentage of stations exceeding different length requirements. From Fig. 1, one sees that Africa and South America are the least represented regions, while North America, Europe, India, China, and west Australia exhibit denser station networks.
Fig. 1. Spatial density of the 3007 NOAA/NCDC rainfall stations considered in the analysis; see section 2 for details. The size of the grid boxes is 5° × 5°.
Citation: Journal of Applied Meteorology and Climatology 60, 4; 10.1175/JAMC-D-20-0060.1
Fig. 2. Percentage of the considered NOAA/NCDC stations exceeding different record-length requirements.
3. Use of goodness-of-fit statistical metrics to classify ART samples according to their approximate normality
Statistical tests are based on the acceptance/rejection of a null hypothesis H0, based on a statistical metric T, referred to as the test statistic; see, for example, Fisher (1925), Neyman and Pearson (1933), Lindquist (1940), and more recently Benjamin and Cornell (1970), and Papoulis (1990). Along these lines, section 3a presents the adopted goodness-of-fit statistical metrics and section 3b discusses their application to the NOAA/NCDC rainfall dataset.
a. Methodological aspects
In what follows, we use three test statistics, namely, Lilliefors’s version of the Kolmogorov–Smirnov statistic (KSL), the Anderson–Darling statistic (AD), and the Cramer–Von Mises statistic (CVM), as implemented by Öner and Kocakoç (2017) in MATLAB for the Gaussian case, to classify rainfall samples according to the approximate normality of annual rainfall totals.
The p value of the sample can be obtained from Table 1 in Dallal and Wilkinson (1986) as a function of Dn and n. It follows from Eq. (1) that Dn is suited to quantify linear deviations between the empirical and theoretical CDFs in the frequency domain and, therefore, the test is not very accurate in detecting deviations in the upper and lower tails of the empirical distributions; see also discussion below.
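To make the three metrics concrete, the following is a minimal Python sketch of the textbook forms of the KSL, CVM, and AD statistics for a Gaussian null hypothesis whose mean and standard deviation are estimated from the sample. It is not the MATLAB implementation cited above, the function name `normality_statistics` is hypothetical, and it returns only the statistics: p values or critical values would still have to come from published tables or approximations such as Dallal and Wilkinson (1986).

```python
import math
from statistics import NormalDist, mean, stdev

def normality_statistics(sample):
    """Return (D_n, W2, A2) for a Gaussian null with estimated mean and std."""
    x = sorted(sample)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    eps = 1e-12  # keeps the logarithms in the AD sum finite
    z = [min(max(fitted.cdf(v), eps), 1.0 - eps) for v in x]

    # KS/Lilliefors: largest gap between the empirical step CDF and the fitted CDF.
    d_n = max(max((i + 1) / n - z[i], z[i] - i / n) for i in range(n))
    # Cramer-von Mises: sum of squared gaps, all observations weighted equally.
    w2 = 1.0 / (12 * n) + sum((z[i] - (2 * i + 1) / (2 * n)) ** 2 for i in range(n))
    # Anderson-Darling: the log terms give extra weight to both tails.
    a2 = -n - sum((2 * i + 1) * (math.log(z[i]) + math.log(1.0 - z[n - 1 - i]))
                  for i in range(n)) / n
    return d_n, w2, a2

# Illustrative samples (not rainfall data): normal quantiles and their exponentials.
n = 50
normal_like = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
skewed = [math.exp(v) for v in normal_like]
print("near-normal:", normality_statistics(normal_like))
print("skewed:     ", normality_statistics(skewed))
```

All three statistics should come out larger for the skewed sample. Because the AD statistic weights the tails through the logarithmic terms, it tends to react more strongly to tail departures than the maximum-gap statistic Dn.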
Existence of interannual dependencies in ART samples may bias the estimation of the variance from finite samples and, thus, influence goodness-of-fit testing. To avoid such issues, the standard deviation of ART samples has been obtained using the procedure suggested by Koutsoyiannis (2003) for simultaneous estimation of an unbiased standard deviation σH and Hurst coefficient H in the presence of interannual temporal dependencies; see, for example, Hurst (1951), Mandelbrot and Wallis (1969), Montanari et al. (1997), and Langousis and Koutsoyiannis (2006). The Hurst coefficient H quantifies the magnitude of interannual dependencies (also referred to as long-term persistence; see, e.g., Lettenmaier and Burges 1977; Salas et al. 1979; Montanari et al. 1997; Razavi and Vogel 2018) in the time series of annual rainfall totals, and varies between 0 and 1, with H = 0.5 indicating linear independence, and H > 0.5 (<0.5) indicating positive (negative) correlations in the corresponding time series.
Setting ck(H) = 1 as a first approximation, H and σH can be estimated simultaneously by linearly regressing ln s(k) versus ln k. To improve accuracy, the regression is repeated after calculating ck(H) using the estimates of H and σH from the first trial. The procedure is repeated until convergence.
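The first pass of this regression can be sketched in Python as follows. The sketch assumes that s(k) is the sample standard deviation of the series aggregated (summed) over non-overlapping windows of k values, so that the slope of ln s(k) against ln k estimates H and the intercept estimates ln σH. It covers only the initial step with ck(H) = 1; the iterative correction through ck(H) is omitted because its expression is not reproduced here. The name `hurst_first_pass` and the default range of scales are hypothetical.

```python
import math
import random
from statistics import stdev

def hurst_first_pass(series, max_scale=None):
    """First-pass (c_k = 1) estimates of H and sigma_H from ln s(k) versus ln k."""
    n = len(series)
    max_scale = max_scale or n // 10  # assumed cap: keep at least 10 blocks per scale
    log_k, log_s = [], []
    for k in range(1, max_scale + 1):
        # Sums over non-overlapping windows of k consecutive values.
        blocks = [sum(series[j:j + k]) for j in range(0, n - k + 1, k)]
        log_k.append(math.log(k))
        log_s.append(math.log(stdev(blocks)))
    # Ordinary least squares of ln s(k) on ln k.
    mk = sum(log_k) / len(log_k)
    ms = sum(log_s) / len(log_s)
    slope = (sum((a - mk) * (b - ms) for a, b in zip(log_k, log_s))
             / sum((a - mk) ** 2 for a in log_k))
    intercept = ms - slope * mk
    return slope, math.exp(intercept)  # (H, sigma_H)

# Synthetic white noise, for which H should be close to 0.5.
random.seed(1)
white = [random.gauss(0.0, 1.0) for _ in range(2000)]
h_est, sigma_est = hurst_first_pass(white, max_scale=20)
print(h_est, sigma_est)
```

With only a few decades of annual totals the number of usable scales is small, which is one reason the first-pass estimate is refined iteratively in the procedure described above.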
b. Application to the dataset
Daily rainfall time series were aggregated at an annual level using the calendar year convention; that is, ART samples were obtained by summing rainfall depths from January to December. Use of hydrological years (i.e., from October to September), instead of calendar years, was also tested, but the results obtained were identical to those obtained using the calendar year convention.
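The aggregation step can be illustrated with a short Python sketch. The input layout (pairs of date and rainfall depth in mm) and the function name `annual_totals` are assumptions made for illustration, as is the choice to label a hydrological year by the calendar year in which it ends.

```python
from collections import defaultdict
from datetime import date

def annual_totals(records, start_month=1):
    """Sum daily depths (mm) per year; start_month=10 gives October-September years."""
    totals = defaultdict(float)
    for day, depth_mm in records:
        # With start_month > 1, a year is labeled by the calendar year in which it ends.
        year = day.year + 1 if start_month > 1 and day.month >= start_month else day.year
        totals[year] += depth_mm
    return dict(sorted(totals.items()))

records = [(date(2001, 1, 5), 12.0), (date(2001, 12, 31), 3.5), (date(2002, 1, 1), 7.0)]
calendar = annual_totals(records)                      # January-December
hydrological = annual_totals(records, start_month=10)  # October-September
print(calendar)
print(hydrological)
```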
Figure 3 shows the histogram of the Hurst coefficients estimated from the 3007 selected time series of annual rainfall totals. One sees that the distribution is approximately symmetric around 0.55, indicating (on average) small positive correlations in annual rainfall totals.
Fig. 3. Histogram of Hurst coefficient estimates for the 3007 NOAA/NCDC rainfall stations used in the analysis; see the main text for details.
With regard to classification of the 3007 ART samples into G and NG groups, KSL normality test results show that 2386 out of 3007 samples can be considered as Gaussian distributed at the 5% significance level, while CVM and AD tests show that 2218 and 2096 samples, respectively, fall into the G group. Evidently, AD is the strictest and most conservative of all normality tests, as it is more sensitive to observations in the upper and lower distribution tails.
For the three normality tests considered, Fig. 4 shows the local fractions of stations that fall within group G (i.e., exhibit annual rainfall totals that are approximately Gaussian distributed). Apart from the KSL test, which results in fractions of Gaussian distributed stations that are close to 80%, both the CVM and AD tests reveal increased fractions of normally distributed ART samples in eastern and northern China and North America, and reduced fractions of normally distributed ART samples in India and eastern Australia, indicating that approximate Gaussianity of annual rainfall totals is strongly linked to local climatic conditions.
Fig. 4. Global maps with spatial resolution 5° × 5°, illustrating the local fractions of stations that belong to group G, according to the three normality tests considered [(a) KSL, (b) AD, and (c) CVM], at the 5% significance level. Percentages denote the fraction of ART samples belonging to group G.
Along these lines, and in light of the obtained results, we examined how the normality patterns revealed by the KSL, CVM, and AD tests link to the five climatic types of the Köppen–Geiger climatic classification (Kottek et al. 2006): equatorial (A), arid (B), warm temperate (C), continental (D), and polar (E); see Fig. 5a. Figure 5b shows the fraction of stations with approximately Gaussian distributed ART, on the basis of the three normality tests considered, at the 5% significance level. One sees that KSL is the least conservative test in assessing the normality of ART samples (see discussion above), while CVM and AD results are comparable in terms of percentages of Gaussian distributed samples, with AD being slightly stricter (i.e., leading to lower percentages) due to the increased weight imposed on observations at the upper or lower tails of the corresponding distributions. In addition, continental climate (D) exhibits the highest fraction of Gaussian distributed ART samples (i.e., AD 74.45% and CVM 79.28%), followed by warm temperate (C; AD 72.80% and CVM 76.27%), equatorial (A; AD 68.83% and CVM 72.33%), and polar (E; AD 62.96% and CVM 74.07%) climates. Arid climate (B) displays the lowest fraction of Gaussian distributed ART samples (AD 60.29% and CVM 65.52%). This is in accordance with what is statistically expected, as daily rainfall time series in arid regions exhibit lower fractions of wet days and highly skewed distributions of positive rainfall rates (due to the increased frequency of low rainfall intensities) and, consequently, reduced convergence rates to the normal shape when aggregated at an annual level; see the introduction.
Fig. 5. (a) Global map illustrating the Köppen–Geiger climate classification, featuring five distinct climate types: equatorial (A), arid (B), warm temperate (C), continental (D), and polar (E). Black dots denote the 3007 NOAA/NCDC stations used in this study, and the total number of stations belonging to each class is shown in the legend. (b) Bar chart illustrating, for each climate type, the total number (horizontal text) and corresponding percentage (vertical text) of ART samples exhibiting group G behavior, on the basis of the three normality tests considered (KSL, AD, and CVM) at the 5% significance level.
4. Nonparametric procedure to assess the accuracy of the normality assumption for annual rainfall totals, based on the marginal statistics of daily rainfall
As noted in the introduction and discussed in the previous section, approximate convergence of the distribution of annual rainfall totals to a normal shape is strictly linked to the seasonal character and marginal statistics of daily rainfall rates. In what follows, we develop a nonparametric procedure to assess the approximate normality of ART samples based on the marginal statistics of daily rainfall.
a. Methodological developments
To describe basic features of daily rainfall we use the fraction of wet days fwd (here, days with rainfall accumulation above 0.1 mm); the mean value mwd, standard deviation σwd, and skewness coefficient skwd (i.e., the ratio of the third central moment to the cube of the standard deviation) of rainfall in wet days; and the precipitation concentration index (PCI) (see, e.g., Oliver 1980; Michiels et al. 1992; De Luis et al. 1997; Cannarozzo et al. 2006). If the latter index is below 10, monthly rainfall seasonality can be considered negligible, whereas higher values indicate substantial interseasonal variations.
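These statistics can be computed as in the Python sketch below. It assumes population (1/n) moments for the wet-day standard deviation and skewness, and the commonly used form of Oliver's (1980) index, PCI = 100 Σ p_i² / (Σ p_i)², where p_i are the 12 monthly totals. Both choices are assumptions about details not spelled out above, and the function names are hypothetical.

```python
from statistics import mean, pstdev

WET_THRESHOLD_MM = 0.1  # wet-day threshold stated in the text

def wet_day_statistics(daily_mm):
    """Return (f_wd, m_wd, sigma_wd, sk_wd) for a list of daily depths in mm."""
    wet = [d for d in daily_mm if d > WET_THRESHOLD_MM]
    f_wd = len(wet) / len(daily_mm)
    m_wd = mean(wet)
    sigma_wd = pstdev(wet)  # population standard deviation (assumed convention)
    third_moment = sum((d - m_wd) ** 3 for d in wet) / len(wet)
    sk_wd = third_moment / sigma_wd ** 3
    return f_wd, m_wd, sigma_wd, sk_wd

def precipitation_concentration_index(monthly_mm):
    """PCI from 12 monthly totals: 100 * sum(p_i^2) / (sum(p_i))^2."""
    total = sum(monthly_mm)
    return 100.0 * sum(p * p for p in monthly_mm) / total ** 2

stats = wet_day_statistics([0.0, 0.0, 0.05, 1.0, 2.0, 3.0])
pci_uniform = precipitation_concentration_index([50.0] * 12)
pci_single_month = precipitation_concentration_index([0.0] * 11 + [120.0])
print(stats, pci_uniform, pci_single_month)
```

Under this form of the index, perfectly uniform monthly rainfall gives PCI = 100/12 ≈ 8.3, consistent with the statement that values below 10 indicate negligible seasonality, while rainfall concentrated in a single month gives PCI = 100.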
To quantify the effects of local climate in determining the approximate convergence of annual rainfall totals to a normal distribution, we use logistic regression analysis (see, e.g., Bliss 1934; Berkson 1944; Augustin et al. 2008; Van Steenbergen and Willems 2013) to identify the most influential daily rainfall statistics in grouping the analyzed ART samples into the G and NG subsets. The model adopted here is the logit model, fitted using the method of maximum likelihood (see, e.g., McCullagh and Nelder 1990).
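As an illustration of the logit model, the Python sketch below fits the coefficients by maximizing the Bernoulli log-likelihood with plain gradient ascent. This is a simplified stand-in for standard maximum-likelihood routines; it does not compute the p values discussed in the results, and the function name `fit_logit`, the step size, and the toy data are assumptions made for illustration.

```python
import math

def fit_logit(X, y, lr=0.1, iters=5000):
    """Maximum-likelihood logit fit by gradient ascent.

    X: list of predictor rows (ideally standardized); y: list of 0/1 labels.
    Returns [b0, b1, ...], with b0 the intercept.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for row, label in zip(X, y):
            eta = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
            resid = label - 1.0 / (1.0 + math.exp(-eta))  # y minus fitted probability
            grad[0] += resid
            for j, v in enumerate(row):
                grad[j + 1] += resid * v
        # Step along the average gradient of the log-likelihood.
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

# Toy data: one predictor, with label 1 (e.g., group G) more frequent at larger values.
X = [[-2.0], [-1.0], [0.0], [1.0], [2.0], [-2.0], [-1.0], [0.0], [1.0], [2.0]]
y = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1]
b0, b1 = fit_logit(X, y)
print(b0, b1)
```

In this toy example the fitted slope is positive and the intercept is essentially zero because the data are symmetric about the origin.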
In what follows, we combine a random-search algorithm with a proper test-based objective function to conclude on a set of constraints that allows classification of ART samples based on the marginal statistics of daily rain rates.
To avoid convergence issues, in this study, we determine the optimal set of constraints through random search. Specifically, 1) we consider 10⁶ threshold level combinations by uniformly sampling the corresponding regressor variables within their observed ranges, 2) for each threshold combination, we use the AD classification into G and NG groups of the previous section to estimate the empirical probabilities P[A ∩ T] and P[Ac ∩ Tc], and 3) we select as optimal the set of threshold combinations that maximizes the objective function in Eq. (5).
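The three steps can be sketched in Python as follows. Two assumptions are made because the corresponding definitions are not reproduced here: event T is taken to mean that a station satisfies both constraints (fwd at or above its threshold and skwd at or below its threshold), and the objective is taken to be P[A ∩ T] + P[Ac ∩ Tc], that is, the fraction of stations on which the constraints agree with the AD-based grouping. The function name and the synthetic stations are hypothetical, and far fewer draws are used than in the study.

```python
import random

def random_search(stations, n_draws=1000, seed=0):
    """stations: list of (f_wd, sk_wd, is_gaussian) tuples.

    Returns (best_score, f_threshold, sk_threshold).
    """
    rng = random.Random(seed)
    f_vals = [s[0] for s in stations]
    sk_vals = [s[1] for s in stations]
    best = (-1.0, None, None)
    for _ in range(n_draws):
        # Step 1: sample thresholds uniformly within the observed ranges.
        t_f = rng.uniform(min(f_vals), max(f_vals))
        t_sk = rng.uniform(min(sk_vals), max(sk_vals))
        # Step 2: count stations where T agrees with the reference label.
        hits = 0
        for f_wd, sk_wd, is_gaussian in stations:
            in_t = f_wd >= t_f and sk_wd <= t_sk  # event T (assumed definition)
            hits += (in_t == is_gaussian)
        score = hits / len(stations)  # P[A and T] + P[Ac and Tc] (assumed objective)
        # Step 3: keep the best combination seen so far.
        if score > best[0]:
            best = (score, t_f, t_sk)
    return best

# Synthetic stations on a grid; "Gaussian" only for many wet days and low skewness.
stations = [(f, sk, f >= 0.25 and sk <= 4)
            for f in (0.05, 0.15, 0.25, 0.35) for sk in (2, 4, 6, 8)]
best_score, best_f, best_sk = random_search(stations)
print(best_score, best_f, best_sk)
```

Random search needs no gradients and cannot stall on the flat, piecewise-constant objective that results from counting classifications.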
b. Results
Table 1 summarizes the results of the logistic regression in terms of p values (i.e., for the null hypothesis that a certain predictor is not influential), and variance inflation factors (VIFs; an index to quantify multicollinearities between the regressors; see, e.g., Song and Kroll 2011), for the classification based on the Anderson–Darling test statistic (the most conservative one) at the 5% significance level, and for three selected sets of predictor variables (Set I: fwd and skwd; Set II: fwd, skwd, and σwd; Set III: fwd, PCI, and skwd). The p values of the regression coefficients are estimated by evaluating the exceedance probability levels of the Student’s t variables associated with the explanatory variables (see, e.g., McCullagh and Nelder 1990). Lower p values indicate higher significance of the corresponding predictors in the regression, whereas VIF values close to unity indicate approximate linear independence of the predictors (see, e.g., Chatterjee and Price 1991; O’Brien 2007; Song and Kroll 2011). One sees that Set I, which consists of two predictor variables, namely, the fraction of wet days fwd and the skewness coefficient of rainfall in wet days skwd, is the best performing one, with the two regressors exhibiting approximate linear independence.
Table 1. Results of the logistic regression analysis in terms of p values and VIFs for the classification of the considered NOAA/NCDC rainfall stations into G and NG subsets, as based on the AD test statistic at the 5% significance level, and for three selected sets of predictor variables (Set I: fwd and skwd; Set II: fwd, skwd, and σwd; Set III: fwd, PCI, and skwd); see the main text for details.
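For the special case of two regressors, the VIF reduces to 1/(1 − r²), where r is the Pearson correlation between them, so values near unity correspond to nearly uncorrelated predictors. A minimal Python sketch of this special case follows; the function names and the numbers are illustrative and are not taken from Table 1.

```python
import math

def pearson_r(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(a, b):
    """VIF shared by both regressors in a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(a, b)
    return 1.0 / (1.0 - r * r)

vif_uncorrelated = vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1])
vif_collinear = vif_two_predictors([1, 2, 3, 4], [1, 2, 3, 5])
print(vif_uncorrelated, vif_collinear)
```

With three or more regressors, each VIF is instead obtained from the coefficient of determination of a regression of that predictor on all the others.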
Figure 6 illustrates histograms of the fraction of wet days fwd, and the skewness coefficient skwd of rainfall in wet days for the 3007 daily rainfall series analyzed, and Fig. 7 shows how the selected optimal thresholds of the aforementioned marginal statistics vary with the level of significance α of the AD normality test in the range 2%–15%. As expected, when the level of significance α increases, the optimal threshold for the fraction of wet days fwd increases, whereas the opposite holds for the daily skewness coefficient. More precisely, the threshold for the fraction of wet days fwd is approximately constant and equal to 0.1 up to α ≈ 7% and increases to 0.12 at α = 15%. With regard to skwd, it is approximately constant and equal to 5.92 up to α ≈ 10% and decreases to 5.7 at α = 15%.
Fig. 6. Histograms of (a) the fraction of wet days fwd and (b) the skewness coefficient skwd for the 3007 NOAA/NCDC daily rainfall time series analyzed.
Fig. 7. Dependence of the optimal thresholds of (a) fwd and (b) skwd of rainfall on wet days on the level of significance α of the AD test, for the 3007 NOAA/NCDC daily rainfall time series analyzed. Green or red areas highlight the domains of fwd and skwd where ART samples can respectively be considered to be Gaussian or non-Gaussian distributed on the basis of the proposed procedure.
Figure 8 compares the conditional probabilities P[A|T] and P[Ac|Tc] with the marginal probabilities P[A] and P[Ac] = 1 − P[A] as a function of the level of significance α used for the AD test. One sees that irrespective of the significance level α of the AD test, both conditional probabilities exhibit higher values relative to their respective marginals, indicating the significant information value of the nonparametric procedure.
Fig. 8. Comparison of the conditional probabilities P[A|T] and P[Ac|Tc], with the marginal probabilities P[A] and P[Ac] = 1 − P[A], as a function of the level of significance α used for the AD test, for the case in which the two most influential predictor variables (i.e., Set I: fwd, skwd) are used to constrain classification to the G and NG groups; see the main text for details.
Figures 9–12 show similar plots to Figs. 7 and 8, but for the case in which three predictor variables are used (i.e., Set II: fwd, skwd, and σwd; Set III: fwd, PCI, and skwd; see Table 1). One sees that the results obtained using three regressors are virtually identical to those obtained using solely fwd and skwd as independent variables, thus verifying their significance in determining the approximate convergence of the distribution of annual rainfall totals to a normal shape.
Fig. 9. Dependence of the optimal thresholds of (a) fwd, along with (b) standard deviation σwd and (c) skwd of rainfall in wet days, on the level of significance α of the AD test for the 3007 NOAA/NCDC daily rainfall time series analyzed. Green or red areas highlight the domains of fwd, σwd, and skwd where ART samples can respectively be considered to be Gaussian or non-Gaussian distributed on the basis of the proposed procedure.
Fig. 10. Comparison of the conditional probabilities P[A|T] and P[Ac|Tc], with the marginal probabilities P[A] and P[Ac] = 1 − P[A], as a function of the level of significance α used for the AD test, for the case in which three influential predictor variables (i.e., Set II: fwd, σwd, skwd) are used to constrain classification to the G and NG groups; see the main text for details.
Fig. 11. Dependence of the optimal thresholds of (a) the fraction of wet days fwd, along with (b) precipitation concentration index PCI and (c) skwd of rainfall in wet days, on the level of significance α of the AD test for the 3007 NOAA/NCDC daily rainfall time series analyzed. Green or red areas highlight the domains of fwd, PCI, and skwd where ART samples can respectively be considered to be Gaussian or non-Gaussian distributed on the basis of the proposed procedure.
Fig. 12. Comparison of the conditional probabilities P[A|T] and P[Ac|Tc], with the marginal probabilities P[A] and P[Ac] = 1 − P[A], as a function of the level of significance α used for the AD test, for the case in which three influential predictor variables (i.e., Set III: fwd, PCI, skwd) are used to constrain classification to the G and NG groups; see the main text for details.
To validate the developed nonparametric normality test, we implemented a jackknife approach as follows: 1) we extracted one station out of the 3007 considered, and used the remaining 3006 to calibrate the nonparametric procedure; 2) we used the fraction fwd of wet days and the skewness coefficient skwd of positive rainfall rates of the extracted station to assess whether its ART sample can be considered Gaussian distributed according to the calibrated nonparametric test; 3) we applied the AD test to the ART sample of the extracted station and evaluated the outcome of step 2; 4) we repeated steps 1–3 for all 3007 stations of the NOAA/NCDC dataset. The power of the test (i.e., the probability of rejection of the null hypothesis of Gaussianity given that it is false) was found to be 80%, 75%, and 65%, for corresponding levels of significance of the associated AD test equal to 2%, 5%, and 10%, respectively, supporting the effectiveness of the developed procedure in assessing Gaussianity of ART samples based on the marginal statistics of daily rain rates.
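The leave-one-out logic of steps 1–4 can be sketched in Python as below. To keep the example short, the calibration step is replaced by a deliberately simplified stand-in that uses a single threshold on fwd, chosen among the observed values; this is not the calibrated two-variable procedure of the study. The boolean label plays the role of the AD-based reference outcome, and all names and data are hypothetical.

```python
def calibrate(train):
    """Stand-in calibration: the f_wd threshold (among observed values) with top accuracy."""
    def accuracy(t):
        return sum((f >= t) == g for f, g in train) / len(train)
    return max((f for f, _ in train), key=accuracy)

def leave_one_out(stations):
    """stations: list of (f_wd, is_gaussian). Returns (power, sensitivity)."""
    ng_total = ng_flagged = g_total = g_kept = 0
    for i, (f_wd, is_gaussian) in enumerate(stations):
        # Step 1: calibrate on all stations except the extracted one.
        threshold = calibrate(stations[:i] + stations[i + 1:])
        # Step 2: classify the extracted station from its daily statistics.
        predicted_gaussian = f_wd >= threshold
        # Step 3: compare with the reference (AD-based) label.
        if is_gaussian:
            g_total += 1
            g_kept += predicted_gaussian
        else:
            ng_total += 1
            ng_flagged += not predicted_gaussian
    # Power: share of non-Gaussian stations that were flagged as non-Gaussian.
    return ng_flagged / ng_total, g_kept / g_total

stations = [(0.05, False), (0.08, False), (0.10, False), (0.12, False),
            (0.25, True), (0.30, True), (0.35, True), (0.40, True)]
power, sensitivity = leave_one_out(stations)
print(power, sensitivity)
```

In this toy dataset every non-Gaussian station is flagged (power of 1.0), while the Gaussian station closest to the boundary is misclassified once it is withheld from calibration (sensitivity of 0.75).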
5. Conclusions and discussion
Testing the normality assumption of annual rainfall totals requires long time series of annual data, often not available in data-poor regions. The present study aimed to overcome this issue, by developing a nonparametric procedure to assess the accuracy of the normality assumption for ART based on the marginal statistics of daily rainfall, which can be effectively estimated from a few years of daily rainfall measurements.
This was done by combining 1) goodness-of-fit metrics to conclude on the approximate convergence of the empirical distribution of annual rainfall totals to a normal shape, and classify ART samples into two complementary groups, 2) logistic regression analysis to identify the statistics of daily rainfall that are most descriptive of ART convergence to a normal shape, and 3) a random-search algorithm to conclude on a set of constraints that maximizes the likelihood that the identification procedure produces accurate outcomes. The analysis was conducted using 3007 time series of daily rainfall rates obtained from the NOAA/NCDC GHCN database, with global coverage, which also allowed us to study how approximate Gaussianity of annual rainfall totals is affected by large-scale climatic features, such as those embedded in the Köppen–Geiger climatic classification (Kottek et al. 2006). We found that continental climate (D) exhibits the highest fraction of Gaussian distributed ART samples [i.e., 74.45%, Anderson–Darling (AD) test at the α = 5% significance level], followed by warm temperate (C, 72.80%), equatorial (A, 68.83%), polar (E, 62.96%), and arid (B, 60.29%) climates (see Fig. 5). The analysis also showed that the AD statistical test is the most conservative one in determining approximate Gaussianity of ART samples [followed by the Cramer–Von Mises (CVM) and Lilliefors’s version of the Kolmogorov–Smirnov (KSL) tests], and that the fraction of wet days fwd and the skewness coefficient skwd of rainfall in wet days suffice to determine approximate convergence of ART to the normal shape.
Under this setting, Figs. 7 and 8 can serve as a powerful tool in determining the probability that ART samples in data-poor regions exhibit approximately Gaussian behavior, based on the marginal statistics of daily rainfall. The procedure can be outlined as follows: One uses Fig. 7 to determine whether both fwd and skwd sample estimates fall inside the domain of Gaussian ART behavior (i.e., green areas). If this is the case, one uses the P[A|T] curve in Fig. 8 to calculate the probability that the sample is Gaussian distributed at the corresponding significance level α. Otherwise, one uses the P[Ac|Tc] curve in Fig. 8 to obtain the probability that the sample deviates significantly from the normal shape at the corresponding significance level α. For example, according to Figs. 7 and 8, daily rainfall time series with fraction of wet days fwd < 0.1 and daily skewness coefficient of positive rain rates skwd > 5.92 deviate significantly from the normal shape according to the Anderson–Darling statistical test at the 5% significance level.
One limitation of the developed approach is that, while the NOAA/NCDC dataset is the largest currently available, it still does not homogeneously cover all regions and climatic zones around the globe. We tried to overcome this issue by not imposing any regional or climatological constraints, and by focusing on establishing a linkage between the marginal statistics of daily rainfall and the deviations of the empirical distribution of ART samples from the normal shape, irrespective of regional aspects. Since the marginal statistics of daily rainfall are descriptive of the climatology of regions (see section 3b), the developed nonparametric test should implicitly account for the local rainfall climatology, irrespective of coverage issues.
An important note to be made here is that, as happens to be the case for all statistical tests, failure to reject the null hypothesis of Gaussianity does not necessarily mean that the null hypothesis is true but, rather, that there is insufficient evidence to reject it. In this context we tried to assess the power of the developed testing procedure in terms of Type-II errors using a jackknife approach. We found that the power of the test (i.e., the probability of rejection of the null hypothesis of Gaussianity given that it is false) is 80%, 75%, and 65%, for corresponding levels of significance 2%, 5%, and 10%, respectively, ensuring the effectiveness of the developed procedure in assessing Gaussianity of ART samples from a few years of daily rainfall measurements.
Future research may be directed toward 1) identifying representative distribution models for annual rainfall totals in different climatic regions; 2) exploring the time scales of averaging over which rainfall totals become approximately normal at a given level of significance, as a function of the regional rainfall climatology; and 3) extending the developed method to assess the extremal behavior of rain rates at different temporal resolutions on the basis of the marginal statistics of daily, monthly, and annual rainfall accumulations. Such an effort may result in a powerful approach toward hydrologic risk estimation, particularly suited for engineering applications in data-poor regions.
Acknowledgments
The authors acknowledge the insightful comments and suggestions of two anonymous reviewers and the editor, which significantly helped them to improve the quality of the presented work.
REFERENCES
Abdullah, M. A., and M. A. Al-Mazroui, 1998: Climatological study of the southwestern region of Saudi Arabia. I. Rainfall analysis. Climate Res., 9, 213–223, https://doi.org/10.3354/cr009213.
Adler, R. F., G. J. Huffman, D. T. Bolvin, S. Curtis, and E. J. Nelkin, 2000: Tropical rainfall distributions determined using TRMM combined with other satellite and rain gauge information. J. Appl. Meteor., 39, 2007–2023, https://doi.org/10.1175/1520-0450(2001)040<2007:TRDDUT>2.0.CO;2.
Anderson, T. W., and D. A. Darling, 1952: Asymptotic theory of certain “goodness of fit” criteria based on stochastic processes. Ann. Math. Stat., 23, 193–212, https://doi.org/10.1214/aoms/1177729437.
Augustin, N. H., L. Beevers, and W. T. Sloan, 2008: Predicting river flows for future climates using an autoregressive multinomial logit model. Water Resour. Res., 44, W07403, https://doi.org/10.1029/2006WR005127.
Ben-Gai, T., A. Bitan, A. Manes, P. Alpert, and S. Rubin, 1998: Spatial and temporal changes in rainfall frequency distribution patterns in Israel. Theor. Appl. Climatol., 61, 177–190, https://doi.org/10.1007/s007040050062.
Benjamin, J. R., and C. A. Cornell, 1970: Probability, Statistics, and Decision for Civil Engineers. McGraw-Hill, 684 pp.
Berkson, J., 1944: Application of the logistic function to bio-assay. J. Amer. Stat. Assoc., 39, 357–365.
Bliss, C. I., 1934: The method of probits. Science, 79, 38–39, https://doi.org/10.1126/science.79.2037.38.
Cannarozzo, M., L. V. Noto, and F. Viola, 2006: Spatial distribution of rainfall trends in Sicily (1921–2000). Phys. Chem. Earth Parts ABC, 31, 1201–1211, https://doi.org/10.1016/j.pce.2006.03.022.
Caracciolo, D., R. Deidda, and F. Viola, 2017: Analytical estimation of annual runoff distribution in ungauged seasonally dry basins based on a first order Taylor expansion of the Fu’s equation. Adv. Water Resour., 109, 320–332, https://doi.org/10.1016/j.advwatres.2017.09.019.
Chatterjee, S., and B. Price, 1991: Regression Analysis by Example. John Wiley and Sons, 304 pp.
Cramér, H., 1928: On the composition of elementary errors. Scand. Actuar. J., 1928, 13–74, https://doi.org/10.1080/03461238.1928.10416862.
Dahamsheh, A., and H. Aksoy, 2007: Structural characteristics of annual precipitation data in Jordan. Theor. Appl. Climatol., 88, 201–212, https://doi.org/10.1007/s00704-006-0247-3.
Dallal, G. E., and L. Wilkinson, 1986: An analytic approximation to the distribution of Lilliefors’s test statistic for normality. Amer. Stat., 40, 294–296, https://doi.org/10.2307/2684607.
De Lima, M. I. P., S. C. P. Carvalho, and J. L. M. P. De Lima, 2010: Investigating annual and monthly trends in precipitation structure: An overview across Portugal. Nat. Hazards Earth Syst. Sci., 10, 2429–2440, https://doi.org/10.5194/nhess-10-2429-2010.
De Luis, M., J. C. González-Hidalgo, J. Raventós, J. R. Sánchez, and J. Cortina, 1997: Distribución espacial de la concentración y agresividad de la lluvia en el territorio de la Comunidad Valenciana. Cuat. Geomorfol., 11, 33–44.
Easterling, D. R., and Coauthors, 1997: Maximum and minimum temperature trends for the globe. Science, 277, 364–367, https://doi.org/10.1126/science.277.5324.364.
Farrell, P. J., and K. Rogers-Stewart, 2006: Comprehensive study of tests for normality and symmetry: Extending the Spiegelhalter test. J. Stat. Comput. Simul., 76, 803–816, https://doi.org/10.1080/10629360500109023.
Feller, W., 1968: An Introduction to Probability Theory and Its Applications. Vol. I, John Wiley and Sons, 509 pp.
Fisher, R. A., 1925: Statistical Methods for Research Workers. Oliver and Boyd, 245 pp.
Fisz, M., 1963: Probability Theory and Mathematical Statistics. Polish Scientific Publishers, 677 pp.
Foufoula-Georgiou, E., and D. P. Lettenmaier, 1986: Continuous-time versus discrete-time point process models for rainfall occurrence series. Water Resour. Res., 22, 531–542, https://doi.org/10.1029/WR022i004p00531.
Hurst, H. E., 1951: Long-term storage capacity of reservoirs. Trans. Amer. Soc. Civ. Eng., 116, 770–799, https://doi.org/10.1061/TACEAT.0006518.
Katz, R. W., 1993: Towards a statistical paradigm for climate change. Climate Res., 2, 167–175, https://doi.org/10.3354/cr002167.
Kottek, M., J. Grieser, C. Beck, B. Rudolf, and F. Rubel, 2006: World map of the Köppen-Geiger climate classification updated. Meteor. Z., 15, 259–263, https://doi.org/10.1127/0941-2948/2006/0130.
Koutsoyiannis, D., 2003: Climate change, the Hurst phenomenon, and hydrological statistics. Hydrol. Sci. J., 48, 3–24, https://doi.org/10.1623/hysj.48.1.3.43481.
Koutsoyiannis, D., and A. Langousis, 2011: Precipitation. Treatise on Water Science, P. Wilderer, Ed., Elsevier, 27–77.
Laio, F., 2004: Cramer–von Mises and Anderson–Darling goodness of fit tests for extreme value distributions with unknown parameters. Water Resour. Res., 40, W09308, https://doi.org/10.1029/2004WR003204.
Langousis, A., and D. Koutsoyiannis, 2006: A stochastic methodology for generation of seasonal time series reproducing overyear scaling behaviour. J. Hydrol., 322, 138–154, https://doi.org/10.1016/j.jhydrol.2005.02.037.
Langousis, A., and V. Kaleris, 2014: Statistical framework to simulate daily rainfall series conditional on upper-air predictor variables. Water Resour. Res., 50, 3907–3932, https://doi.org/10.1002/2013WR014936.
Langousis, A., A. Mamalakis, M. Puliga, and R. Deidda, 2016: Threshold detection for the generalized Pareto distribution: Review of representative methods and application to the NOAA NCDC daily rainfall database. Water Resour. Res., 52, 2659–2681, https://doi.org/10.1002/2015WR018502.
Le Cam, L., 1961: A stochastic description of precipitation. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, 165–186.
Lettenmaier, D. P., and S. J. Burges, 1977: An operational approach to preserving skew in hydrologic models of long-term persistence. Water Resour. Res., 13, 281–290, https://doi.org/10.1029/WR013i002p00281.
Lilliefors, H. W., 1967: On the Kolmogorov-Smirnov test for normality with mean and variance unknown. J. Amer. Stat. Assoc., 62, 399–402, https://doi.org/10.1080/01621459.1967.10482916.
Lindquist, E. F., 1940: Statistical Analysis in Educational Research. Houghton Mifflin, 266 pp.
Mamalakis, A., A. Langousis, R. Deidda, and M. Marrocu, 2017: A parametric approach for simultaneous bias correction and high-resolution downscaling of climate model rainfall. Water Resour. Res., 53, 2149–2170, https://doi.org/10.1002/2016WR019578.
Mandelbrot, B. B., and J. R. Wallis, 1969: Some long-run properties of geophysical records. Water Resour. Res., 5, 321–340, https://doi.org/10.1029/WR005i002p00321.
Markovic, R. D., 1965: Probability functions of best fit to distributions of annual precipitation and runoff. Colorado State University Hydrology Papers 8, 34 pp.
McCullagh, P., and J. A. Nelder, 1990: Generalized Linear Models. 2nd ed. Chapman & Hall/CRC Press, 526 pp.
Menne, M. J., I. Durre, R. S. Vose, B. E. Gleason, and T. G. Houston, 2012: An overview of the Global Historical Climatology Network-Daily database. J. Atmos. Oceanic Technol., 29, 897–910, https://doi.org/10.1175/JTECH-D-11-00103.1.
Michiels, P., D. Gabriels, and R. Hartmann, 1992: Using the seasonal and temporal precipitation concentration index for characterizing the monthly rainfall distribution in Spain. Catena, 19, 43–58, https://doi.org/10.1016/0341-8162(92)90016-5.
Montanari, A., R. Rosso, and M. S. Taqqu, 1997: Fractionally differenced ARIMA models applied to hydrologic time series: Identification, estimation, and simulation. Water Resour. Res., 33, 1035–1044, https://doi.org/10.1029/97WR00043.
Mooley, D. A., B. Parthasarathy, N. A. Sontakke, and A. A. Munot, 1981: Annual rain-water over India, its variability and impact on the economy. J. Climatol., 1, 167–186, https://doi.org/10.1002/joc.3370010206.
Neyman, J., and E. S. Pearson, 1933: On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. Roy. Soc. London, 231A, 289–337, https://doi.org/10.1098/rsta.1933.0009.
O’Brien, R. M., 2007: A caution regarding rules of thumb for variance inflation factors. Qual. Quant., 41, 673–690, https://doi.org/10.1007/s11135-006-9018-6.
Ogungbenro, S. B., and T. E. Morakinyo, 2014: Rainfall distribution and change detection across climatic zones in Nigeria. Wea. Climate Extremes, 5–6, 1–6, https://doi.org/10.1016/j.wace.2014.10.002.
Oliver, J. E., 1980: Monthly precipitation distribution: A comparative index. Prof. Geogr., 32, 300–309, https://doi.org/10.1111/j.0033-0124.1980.00300.x.
Öner, M., and I. D. Kocakoç, 2017: JMASM 49: A compilation of some popular goodness of fit tests for normal distribution: Their algorithms and MATLAB codes (MATLAB). J. Mod. Appl. Stat. Methods, 16, 547–575, https://doi.org/10.22237/jmasm/1509496200.
Onof, C., R. Chandler, A. Kakou, P. Northrop, H. Wheater, and V. Isham, 2000: Rainfall modelling using Poisson-cluster processes: A review of developments. Stochastic Environ. Res. Risk Assess., 14, 384–411, https://doi.org/10.1007/s004770000043.
Papalexiou, S. M., and A. Montanari, 2019: Global and regional increase of precipitation extremes under global warming. Water Resour. Res., 55, 4901–4914, https://doi.org/10.1029/2018WR024067.
Papalexiou, S. M., A. AghaKouchak, K. E. Trenberth, and E. Foufoula-Georgiou, 2018: Global, regional, and megacity trends in the highest temperature of the year: Diagnostics and evidence for accelerating trends. Earth’s Future, 6, 71–79, https://doi.org/10.1002/2017EF000709.
Papoulis, A., 1990: Probability & Statistics. Vol. 2, Prentice-Hall, 512 pp.
Parzen, E., 1960: Modern Probability Theory and Its Applications. John Wiley and Sons, 464 pp.
Razavi, S., and R. Vogel, 2018: Prewhitening of hydroclimatic time series? Implications for inferred change and variability across time scales. J. Hydrol., 557, 109–115, https://doi.org/10.1016/j.jhydrol.2017.11.053.
Rodell, M., I. Velicogna, and J. S. Famiglietti, 2009: Satellite-based estimates of groundwater depletion in India. Nature, 460, 999–1002, https://doi.org/10.1038/nature08238.
Salas, J. D., D. C. Boes, V. Yevjevich, and G. G. S. Pegram, 1979: Hurst phenomenon as a pre-asymptotic behavior. J. Hydrol., 44, 1–15, https://doi.org/10.1016/0022-1694(79)90143-4.
Sculley, S. P., 1986: Frequency analysis of SFWMD rainfall. Water Resources Division, Resource Planning Department, South Florida Water Management District Tech. Publ. 86-6, 178 pp.
Song, P., and C. Kroll, 2011: The impact of multicollinearity on small sample hydrologic regional regression. World Environmental and Water Resources Congress 2011: Bearing Knowledge for Sustainability, Palm Springs, CA, ASCE, 3713–3722, https://doi.org/10.1061/41173(414)389.
Stephens, M. A., 1986: Tests based on EDF statistics. Goodness-of-Fit Techniques, R. B. D’Agostino and M. A. Stephens, Eds., Marcel Dekker, 97–194.
Vaheddoost, B., and H. Aksoy, 2017: Structural characteristics of annual precipitation in Lake Urmia basin. Theor. Appl. Climatol., 128, 919–932, https://doi.org/10.1007/s00704-016-1748-3.
Van Steenbergen, N., and P. Willems, 2013: Increasing river flood preparedness by real-time warning based on wetness state conditions. J. Hydrol., 489, 227–237, https://doi.org/10.1016/j.jhydrol.2013.03.015.
Veneziano, D., and A. Langousis, 2010: Scaling and fractals in hydrology. Advances in Data-Based Approaches for Hydrologic Modeling and Forecasting, World Scientific, 107–243.
Viola, F., D. Caracciolo, A. Forestieri, D. Pumo, and L. V. Noto, 2017: Annual runoff assessment in arid and semiarid Mediterranean watersheds under the Budyko’s framework. Hydrol. Processes, 31, 1876–1888, https://doi.org/10.1002/hyp.11145.
Von Mises, R., 1928: Wahrscheinlichkeit, Statistik und Wahrheit. Springer-Verlag, 192 pp.
Waylen, P. R., M. E. Quesada, and C. N. Caviedes, 1996: Temporal and spatial variability of annual precipitation in Costa Rica and the Southern Oscillation. Int. J. Climatol., 16, 173–193, https://doi.org/10.1002/(SICI)1097-0088(199602)16:2<173::AID-JOC12>3.0.CO;2-R.
Waymire, E., and V. K. Gupta, 1981a: The mathematical structure of rainfall representations: 1. A review of the stochastic rainfall models. Water Resour. Res., 17, 1261–1272, https://doi.org/10.1029/WR017i005p01261.
Waymire, E., and V. K. Gupta, 1981b: The mathematical structure of rainfall representations: 2. A review of the theory of point processes. Water Resour. Res., 17, 1273–1285, https://doi.org/10.1029/WR017i005p01273.
Waymire, E., and V. K. Gupta, 1981c: The mathematical structure of rainfall representations: 3. Some applications of the point process theory to rainfall processes. Water Resour. Res., 17, 1287–1294, https://doi.org/10.1029/WR017i005p01287.
Wilks, D. S., 1990: Maximum likelihood estimation for the gamma distribution using data containing zeros. J. Climate, 3, 1495–1501, https://doi.org/10.1175/1520-0442(1990)003<1495:MLEFTG>2.0.CO;2.
Yoo, C., and E. Ha, 2007: Effect of zero measurements on the spatial correlation structure of rainfall. Stochastic Environ. Res. Risk Assess., 21, 287–297, https://doi.org/10.1007/s00477-006-0064-3.