## 1. Introduction

Climate impact studies on a regional scale typically use simulation results from general circulation models (GCMs) to assess past and future precipitation trends. GCMs account for various internal and external atmospheric forcings, including scenarios for socioeconomic development, greenhouse gas emissions, and population change. However, the spatial scale of the climate output is too large for regional studies of the water resources response (Maurer et al. 2007). The horizontal resolution of current state-of-the-art GCMs is roughly a few hundred kilometers. Thus, GCMs perform reasonably well at larger spatial scales (on the order of 10^{4} km^{2}) but poorly at finer spatial and temporal scales, especially for precipitation, the variable of greatest interest to hydrologists (Ghosh and Mujumdar 2008). Hence, the effect of large-scale feature changes on regional surface climate cannot be resolved in the current generation of GCMs, which introduces the need for downscaling techniques.

Many downscaling models have been developed in the past few decades, all of which have strengths and weaknesses. Several research papers offer discussions of the various methodologies, most notably the Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report (Solomon et al. 2007; Wigley 2004). Statistical downscaling, the main emphasis of this paper, derives a statistical or empirical relationship between the large-scale climate features simulated by the GCM (predictors) and the finescale climate variables (predictands) for the region. The statistical downscaling method involves three implicit assumptions (Wilby and Wigley 1997; Von Storch et al. 2000): 1) the predictors are variables of relevance and are realistically modeled by the GCM, 2) the predictors employed fully represent the climate change signal, and 3) the relationship is valid under altered climate conditions. Recent studies (Hwang et al. 2011; Hwang 2011; Dezmain 2013) indicate the limited efforts to develop statistical downscaling models for precipitation in Florida and the issues involved in doing so. Moreover, the monthly precipitation totals derived from downscaling are critical for water budget analysis and supply planning in the region, which is dominated by highly variable rainfall characteristics. Therefore, statistical downscaling of monthly precipitation throughout Florida is the main focus of this study.

Fowler et al. (2007) classified sophisticated statistical downscaling methods into three groups: regression models, weather typing schemes, and weather generators. Khan et al. (2006) compared three downscaling models—namely, artificial neural networks (ANNs), the statistical downscaling model (SDSM), and the Long Ashton Research Station Weather Generator (LARS-WG)—in terms of various uncertainty attributes exhibited in their downscaling results for daily precipitation and daily maximum and minimum temperature. Their comparison indicated that no single model performed best for all the attributes and that, for downscaling daily precipitation, ANN model errors were significant at the 95% confidence level for all months of the year, whereas SDSM and LARS-WG model errors were significant for only a few months. Further, they showed that SDSM and LARS-WG better reproduced the means and variances of downscaled precipitation and temperature, while ANN performed poorly. Haylock et al. (2006), however, compared statistical downscaling models [canonical correlation analysis (CCA), SDSM, and ANN] and dynamical downscaling models [Hadley Centre Regional Climate Model, version 3 (HadRM3) and Climate High Resolution Model (CHRM)] in downscaling seven seasonal indices of heavy precipitation for two station networks in northeastern and southeastern England. They showed that ANN-based models were best at modeling the interannual variability of the indices and that the skill of the downscaling models is highest in the winter season.

Goyal and Ojha (2010) applied several linear regression-based downscaling models, such as direct, forward, backward, and stepwise regression, for downscaling mean monthly precipitation in the arid Pichola watershed, India. They concluded that the direct regression-based downscaling model yielded the best performance among the regression techniques for that region. Raje and Mujumdar (2011) compared three downscaling models—conditional random field (CRF), *k*-nearest neighbor (KNN), and support vector machine (SVM)—for downscaling point-scale daily precipitation in the Punjab region, India, at six locations, for the monsoon season only. They indicated that CRF and KNN performed marginally better than SVM. Tripathi et al. (2006) developed several SVM-based downscaling models to downscale monthly precipitation at various meteorological subdivisions (MSDs) in India and showed that the SVM-based downscaling model performed better than a conventional ANN-based downscaling model.

As this article focuses on statistical downscaling of monthly precipitation, only regression model–based methods are selected and used in this study. Generally, the term transfer function is used to describe methods that directly quantify a relationship between a set of large-scale predictors and a predictand. Although many downscaling models have been developed in the past decade, no single model has been found to perform well over all regions and time scales. Thus, evaluations of different models are critical to understanding the applicability of existing models. In addition to the above studies, comparisons of different statistical downscaling models for precipitation have been conducted in many countries at various spatial and temporal scales (Frías et al. 2006; Tryhorn and DeGaetano 2011; Frost et al. 2011). However, it remains difficult to directly compare the skill of different downscaling models because of the range of hydrological variables that have been assessed in the literature in both space and time domains, the large number of predictors used, and the different evaluation metrics proposed for assessing model performance. Hence, based on the recommendations in the literature cited above, the following methods are chosen for this study: multiple linear regression with a seasonal component, stepwise regression, and support vector machine, along with a newly introduced positive coefficient regression, are used to evaluate monthly precipitation downscaling results in Florida. The main advantage of positive coefficient regression is that it yields nonnegative rainfall estimates, but the underlying assumption is that the set of predictors must also be nonnegative.

In general, while employing transfer functions in downscaling, models are calibrated based either on National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) reanalysis data or on GCM-based data. The former approach is widely used because a single model can be developed, validated, and then applied to all GCMs, whereas in the latter case a separate model must be developed for each GCM. In the current study, models are developed based on NCEP–NCAR data. One essential task in NCEP–NCAR-based models is the integration of NCEP–NCAR and GCM grid-level data for the future projections. Past studies (Ghosh and Mujumdar 2008; Raje and Mujumdar 2011) have used a spatial interpolation technique for such integration. In the current study, a multiple linear regression (MLR) technique is proposed and evaluated.
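The idea of such a regression-based grid link can be sketched as follows: regress each NCEP–NCAR grid-point series on the GCM grid-point series over the common historical period, then apply the fitted coefficients to future GCM output. This is an illustrative least-squares mapping, not the paper's exact formulation; the function names and data layout are assumptions.

```python
import numpy as np

def fit_grid_link(gcm_hist, ncep_hist):
    """Least-squares coefficients mapping GCM grid values to one NCEP grid point.

    gcm_hist:  (T, p) array, p GCM grid points over T time steps
    ncep_hist: (T,)   array, the target NCEP-NCAR grid-point series
    Returns a (p + 1,) coefficient vector including an intercept.
    """
    A = np.column_stack([np.ones(len(gcm_hist)), gcm_hist])
    coef, *_ = np.linalg.lstsq(A, ncep_hist, rcond=None)
    return coef

def apply_grid_link(coef, gcm_future):
    """Project future GCM grid values onto the NCEP grid point."""
    A = np.column_stack([np.ones(len(gcm_future)), gcm_future])
    return A @ coef
```

One such regression would be fitted per NCEP–NCAR grid point and predictor variable, calibrated on the overlap period and applied to the scenario runs.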

The main focus of the paper is to develop and evaluate different models for downscaling monthly station-scale precipitation in the state of Florida. The contents of this paper are organized as follows: An introduction to the five downscaling models along with their input variables is presented first, followed by details of the case study region and the data used. A comparison of model performances, model rankings, and future projections is then presented. Finally, the major findings of the study are discussed in the conclusions section.

## 2. Downscaling models

The fuzzy *c*-means clustering technique has been adopted in the current study to identify the circulation patterns in the atmospheric variables. Fuzzy clustering is used to classify the principal components into clusters or classes and to assign the membership values of the classes to the various data points. Only the principal components accounting for 98% of the variance are chosen as input to the fuzzy clustering technique. The main parameters required for the fuzzy clustering algorithm are the number of clusters and the fuzzification parameter. These parameters are determined from cluster validity indices such as the fuzziness performance index (FPI) and the normalized classification entropy (NCE) (Roubens 1982). FPI estimates the degree of fuzziness generated by a specified number of classes and is given by

$$\mathrm{FPI} = 1 - \frac{cF - 1}{c - 1}, \quad (1)$$

where

$$F = \frac{1}{T} \sum_{t=1}^{T} \sum_{i=1}^{c} \mu_{i,t}^{2}, \quad (2)$$

$\mu_{i,t}$ is the membership in cluster $i$ of the data in time $t$, $c$ refers to the number of clusters, and $T$ is the total number of time steps. NCE estimates the degree of disorganization created by a specified number of classes and is given as

$$\mathrm{NCE} = \frac{H}{\log c}, \quad (3)$$

where

$$H = -\frac{1}{T} \sum_{t=1}^{T} \sum_{i=1}^{c} \mu_{i,t} \log \mu_{i,t}. \quad (4)$$

The optimum number of classes/clusters is obtained by minimizing the FPI and NCE measures provided by Equations (1) and (3).
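Both validity indices can be computed directly from the membership matrix. The sketch below uses the standard definitions of FPI and NCE (Roubens 1982); the array layout (clusters × time steps) and function names are assumptions for illustration.

```python
import numpy as np

def fpi(U):
    """Fuzziness performance index for a membership matrix U of shape (c, T)."""
    c, T = U.shape
    F = np.sum(U ** 2) / T                 # partition coefficient
    return 1.0 - (c * F - 1.0) / (c - 1.0)

def nce(U):
    """Normalized classification entropy for a membership matrix U of shape (c, T)."""
    c, T = U.shape
    # Clip to avoid log(0); entries equal to 0 contribute nothing to the entropy.
    H = -np.sum(U * np.log(np.clip(U, 1e-12, None))) / T
    return H / np.log(c)
```

Both indices are 0 for a crisp (hard) partition and approach 1 as memberships become uniform, so scanning candidate cluster counts and picking the minimum of both reproduces the selection rule described above.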

In the notation used below, the subscript $j$ denotes the $j$th location (i.e., grid point), the subscript $i$ denotes the $i$th time interval, and $p$ refers to the total number of GCM grids. The following sections provide details of the different downscaling models developed in the current study. A comprehensive overview of all the techniques used in the statistical downscaling models developed in the current study is provided in Figure 3.

### 2.1. Multiple linear regression

Multiple linear regression estimates a predictand $Y$ given a set of $p$ predictor variables. In the current study, the predictors comprise the principal components of the large-scale atmospheric variables and the membership values obtained from the fuzzy *c*-means clustering method. The following equation involving the seasonal components is used for regression analysis (Ghosh and Mujumdar 2006):

$$Y_t = \beta + \sum_{i=1}^{m} \gamma_i \,\mu_{i,t} + \sum_{k=1}^{n} \rho_k \,\mathrm{pc}_{k,t},$$

where pc is the principal components; $\mu$ represents the membership values in each cluster; $t$ is the serial number of the data point; $Y$ is the observed precipitation; and $\beta$, $\gamma$, and $\rho$ are the coefficients of the regression equation. The sum of the membership values for each data point in all clusters is always equal to 1. To avoid redundancy, $i$ is varied from 1 to $m$, where $m$ represents 1 less than the number of clusters, and $k$ varies from 1 to $n$, where $n$ represents the number of principal components chosen.
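Fitting such a membership-plus-PC regression reduces to ordinary least squares on a stacked design matrix. The sketch below is an illustrative implementation under that assumption; the function names and synthetic data are not from the paper.

```python
import numpy as np

def build_design(mu, pcs):
    """Design matrix columns: intercept, m cluster memberships, n principal components.

    mu:  (T, m) memberships (one cluster dropped, since memberships sum to 1)
    pcs: (T, n) principal components
    """
    T = mu.shape[0]
    return np.column_stack([np.ones(T), mu, pcs])

def fit_mlr(mu, pcs, y):
    """Ordinary least squares fit of the regression coefficients."""
    X = build_design(mu, pcs)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict_mlr(coef, mu, pcs):
    """Downscaled precipitation estimate for given memberships and PCs."""
    return build_design(mu, pcs) @ coef
```

Dropping one cluster's membership column mirrors the redundancy argument in the text: because memberships sum to 1, retaining all of them would make the design matrix collinear with the intercept.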

### 2.2. Positive coefficient regression

In positive coefficient regression, the observed precipitation values $\mathbf{Y}$ are approximated by the matrix of principal components and fuzzy clustering membership values (input) $\mathbf{X}$, weighted by a nonnegative coefficient vector $\boldsymbol{\tau}$, in such a way that the estimated values are close to the observed values. The requirement is fulfilled by considering the following nonnegative least squares (NNLS) problem:

$$\min_{\boldsymbol{\tau}\,\geq\,0}\; \lVert \mathbf{X}\boldsymbol{\tau} - \mathbf{Y} \rVert_{2}^{2}.$$

The NNLS problem is solved in the current study using the standard algorithm provided by Lawson and Hanson (1974).
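SciPy exposes the Lawson–Hanson algorithm as `scipy.optimize.nnls`, so the PCR fit can be sketched as follows; the tiny matrices here are stand-ins for the actual predictor data.

```python
import numpy as np
from scipy.optimize import nnls

# X: (T, q) nonnegative predictor matrix (e.g., principal components shifted or
# scaled to be nonnegative, plus membership values); y: observed precipitation.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([0.5, 1.5, 2.5])

tau, residual = nnls(X, y)   # Lawson-Hanson active-set algorithm
y_hat = X @ tau              # nonnegative estimates whenever X >= 0
```

Because both `X` and `tau` are nonnegative, the fitted values are guaranteed nonnegative, which is exactly the property motivating PCR for rainfall.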

### 2.3. Support vector machine

In the least squares support vector machine (LS-SVM) regression formulation adopted here (van Gestel et al. 2004), the following cost function is minimized:

$$\min_{w,b,e}\; \psi(w,e) = \frac{1}{2} w^{\mathrm{T}} w + C \sum_{t=1}^{T} e_t^{2} \quad \text{subject to} \quad y_t = w^{\mathrm{T}} \varphi(x_t) + b + e_t,$$

where $\psi$ is the cost function; $w$ is the parameters of the regression; $C$ is a positive real constant, which serves as a penalty parameter for large model errors; $x$ represents the input to the model (a vector of $k$ predictor variables); $y$ represents the observed data; $e_t$ is the model error at time step $t$; and $\varphi(\cdot)$ is the mapping to a high-dimensional feature space. With the radial basis function kernel

$$K(x, x_t) = \exp\!\left(-\frac{\lVert x - x_t \rVert^{2}}{2\sigma^{2}}\right),$$

the resulting regression function takes the form

$$y(x) = \sum_{t=1}^{T} \alpha_t K(x, x_t) + b,$$

where $\alpha_t$ are the Lagrange multipliers and $b$ is the bias term. The performance of the model depends on the kernel width $\sigma$ and the penalty parameter $C$. The linear correlation coefficient, or $R$ value, obtained is used as an index to assess the performance of the model for fixing these parameters. In this study, a simplex procedure (van Gestel et al. 2004) is used to find the optimum range for the parameters $C$ and $\sigma$ giving the highest $R$ values for testing. The range of values used in this study was $C$ = 1–500 and $\sigma$ = 1–100.
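A rough analogue of this tuning procedure can be sketched with an off-the-shelf SVR. Note the substitutions: this uses epsilon-insensitive SVR and a grid search rather than LS-SVM with a simplex search, and the data are synthetic stand-ins, so it is an approximation of the approach rather than the paper's implementation.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(120, 3))                  # stand-in standardized predictors
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=120)

# scikit-learn's RBF kernel is exp(-gamma * ||x - x'||^2), so a kernel
# width sigma corresponds to gamma = 1 / (2 * sigma**2).
param_grid = {"C": [1, 10, 100, 500],
              "gamma": [1.0 / (2.0 * s ** 2) for s in (1.0, 5.0, 10.0)]}
search = GridSearchCV(SVR(kernel="rbf", epsilon=0.01), param_grid, cv=3)
search.fit(X, y)

r = np.corrcoef(search.predict(X), y)[0, 1]    # R value as the skill index
```

The correlation coefficient `r` plays the same role as the R value in the text: it scores each (C, σ) combination so the best-performing pair can be retained.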

### 2.4. Stepwise regression

SR belongs to a class of regression models in which the choice of predictor variables is carried out by an automatic procedure (Draper and Smith 1981), with the goal of choosing a small subset from a larger set of predictor variables that results in a regression model that is simple yet has good predictive ability. The selection of predictor variables using SR is described for use in statistical downscaling by Huth (1999), Linderson et al. (2003), and Goyal and Ojha (2010).
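A minimal forward stepwise routine (one common variant; the paper does not specify which stepwise scheme is used, so the selection criterion here is an assumption) might look like:

```python
import numpy as np

def forward_stepwise(X, y, max_vars=5, tol=1e-4):
    """Greedy forward selection: repeatedly add the predictor that most reduces
    the residual sum of squares (RSS), stopping when the relative improvement
    falls below tol or max_vars predictors have been selected."""
    T, p = X.shape
    selected = []
    rss_prev = float(np.sum((y - y.mean()) ** 2))
    while len(selected) < max_vars:
        best_j, best_rss = None, rss_prev
        for j in range(p):
            if j in selected:
                continue
            A = np.column_stack([np.ones(T), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        if best_j is None or rss_prev - best_rss < tol * rss_prev:
            break
        selected.append(best_j)
        rss_prev = best_rss
    return selected
```

Operational stepwise procedures typically use F-tests or information criteria instead of a raw RSS threshold, but the greedy add-one-at-a-time structure is the same.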

### 2.5. Bias-correction spatial disaggregation approach

The BCSD method (Wood et al. 2004), used as a statistical downscaling approach for development of downscaled climate projections for the entire United States, is also evaluated in the current study. The BCSD-based projections feature spatially downscaled translations of 112 projections from the World Climate Research Programme (WCRP) phase 3 of the Coupled Model Intercomparison Project (CMIP3), collectively produced by 16 models simulating three emissions paths [B1 (low), A1B (middle), and A2 (high); Brekke et al. 2009]. The BCSD method has been shown to provide downscaling capabilities comparable to other statistical and dynamical models, especially in the context of hydrologic impact evaluations (Wood et al. 2004). The bias-correction part of the BCSD procedure (Maurer et al. 2007; Teegavarapu 2012) used for generation of fine-resolution temperature and precipitation datasets is illustrated in Figure 4a. The bias-correction procedure is the first step and uses a quantile mapping technique (Wood et al. 2004). The observations are spatially aggregated from ⅛° to 2°. The coarser scale for correction is chosen specifically for these data, although there is no restriction on the scale that can be selected. The GCM simulations for the twentieth and twenty-first centuries are also spatially interpolated to conform to the aggregated scale. Common temporal data of observations and GCM simulations for the twentieth century are used for bias correction. Quantile maps are then generated for any specific variable of interest (e.g., temperature and precipitation), and these maps are then used to adjust the twentieth- and twenty-first-century GCM simulations. Before application of bias correction to the twenty-first-century projected values for temperature, the twenty-first-century GCM trend is identified and removed from the GCM twenty-first-century dataset and is then added back to the adjusted GCM dataset (Maurer 2007).
The bias-correction methodology assumes that GCM biases have the same structure during the twentieth- and twenty-first-century simulations. The second step in the BCSD methodology is the spatial disaggregation procedure, which is illustrated in Figure 4b. The adjusted twenty-first-century GCM data and observed values of the same spatial resolution are used to obtain spatial-correction factors. These factors at a coarser resolution are then interpolated to the resolution at which downscaled data are required. The downscaled data are available through an interactive website for downloading for a specific grid (of ⅛° resolution) over the entire United States. Wood et al. (2004) indicate that the BCSD method is competitive with other downscaling models.
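The quantile-mapping step at the heart of the bias correction can be sketched with empirical CDFs. This is a simplified illustration of the mapping described above; operational BCSD adds details such as removing and restoring the twenty-first-century temperature trend.

```python
import numpy as np

def quantile_map(gcm_values, gcm_hist, obs_hist):
    """Map each GCM value to the observed value at the same empirical quantile.

    gcm_hist and obs_hist are the overlapping historical (twentieth-century)
    series used to build the quantile map.
    """
    gcm_sorted = np.sort(gcm_hist)
    obs_sorted = np.sort(obs_hist)
    # Empirical non-exceedance probability of each value under the GCM climatology.
    q = np.searchsorted(gcm_sorted, gcm_values, side="right") / len(gcm_sorted)
    q = np.clip(q, 1.0 / len(obs_sorted), 1.0)
    return np.quantile(obs_sorted, q)
```

For example, a GCM value at the median of the GCM climatology is replaced by the median of the observed climatology, removing a systematic multiplicative or additive bias quantile by quantile.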

## 3. Study domain and data

The state of Florida covers 65 755 square miles with a population of approximately 19 million. Its geography is marked by a coastline, vast areas of water (17.9% of the state is covered with water), and the threat of hurricanes (http://www.census.gov/). Much of the state is at or near sea level. The most recent Köppen–Geiger climate classification scheme (Kottek et al. 2006) for Florida identifies four distinct classes with high rainfall variability. The majority of the state is classified as warm temperate, fully humid, and hot summer. However, three small regions in the southeastern part of Florida influenced by frequent hurricane landfalls are defined by equatorial fully humid, equatorial monsoonal, and equatorial winter dry classes. The climate varies from subtropical in the north to tropical in the south with two seasons (viz., wet and dry) and each season lasts for 6 months.

The dry season lasts from November to April, followed by a wet season from May to October. Several studies have indicated the relationship between the precipitation in Florida and El Niño–Southern Oscillation (ENSO) (Goly and Teegavarapu 2014) and Atlantic multidecadal oscillation (AMO) (Goly and Teegavarapu 2012; Teegavarapu et al. 2013). In the current study, monthly precipitation is downscaled at 18 rain gauge sites in Florida. The locations of these 18 stations are shown in Figure 2. Predictor variables for statistical downscaling models (Wilby et al. 1999; Wetterhall et al. 2005) are selected appropriately such that these variables are 1) reliably simulated by GCMs, 2) readily available from archives of GCM outputs, and 3) strongly correlated with the surface variables of interest. Precipitation is linked to airmass transport and atmospheric water content and thus can be related to atmospheric circulation or pressure patterns and wind velocities (Hughes and Guttorp 1994; Wetterhall et al. 2005), geopotential height (Cannon and Whitfield 2002), specific humidity (Charles et al. 1999), and temperature (Buishand and Brandsma 1999). Following the available literature and performing preliminary correlation analysis at one of the sites, large-scale atmospheric predictors of mean sea level pressure, geopotential height at 500 hPa, specific humidity at 850 hPa, surface air temperature, surface *U* wind (zonal), and surface *V* wind (meridional) were considered in the current study. Predictor data in the form of gridded climate variables over the area 25°–32.5°N and 80°–87.50°W are obtained from the NCEP–NCAR reanalysis data (Kalnay et al. 1996) for the years 1948–2010. Reanalysis data (gridded data with a resolution of 2.5°) are outputs from a high-resolution atmospheric model that have been run using data assimilated from surface observation stations, upper-air stations, and satellite-observing platforms. 
Use of these data as a proxy for observed climate variables is standard practice in the recent literature (Raje and Mujumdar 2011). Point-scale (i.e., rain gauge) monthly precipitation data at 18 different locations (noted in Table 1) for the years 1948–2010 were obtained from the U.S. Historical Climatology Network (USHCN) and used as a predictand for training the models.

List and locations of USHCN rain gauge stations.

The USHCN, a high-quality dataset containing monthly records of basic meteorological variables, was developed over the years by the National Oceanic and Atmospheric Administration (NOAA)/National Climatic Data Center (NCDC) to assist in the detection of regional climate change. In the current study there are no missing data, and the period of record is the same for all stations (i.e., 1948–2010). USHCN stations were chosen using a number of criteria, including length of record, percent of missing data, number of station moves and other station changes that may affect data homogeneity, and resulting network spatial coverage. Distributions of observed mean monthly precipitation at the 18 stations, grouped into the continental and peninsular regions, are shown in Figures 5a,b. The stations in continental Florida show a uniform/bimodal distribution of monthly precipitation data, whereas the stations in the peninsular region show a clear unimodal distribution of monthly precipitation data.

A functional relationship is developed in the downscaling models between large-scale atmospheric variables and point-scale precipitation data, with NCEP–NCAR reanalysis data as predictors and USHCN precipitation data as predictands at each station. Using this relationship, the GCM data are then employed as predictors to model future precipitation. The GCM used for the analysis is the Canadian Centre for Climate Modelling and Analysis (CCCma) Coupled Global Climate Model, version 3 (CGCM3). The Special Report on Emissions Scenarios (SRES) scenario A1B is selected for the estimation of monthly rainfall at the different rain gauge stations. The spatial resolution of the GCM grid is 2.81° × 2.81°. The CGCM3 A1B run represents a 720-ppm stabilization experiment. Although GCM runs are available at time scales shorter than 1 month, there is little confidence in those GCM outputs (Ghosh and Mujumdar 2006). The CGCM3 A1B scenario data are obtained over the area 23.72°–34.88°N and 78.75°–90°W, covering all the NCEP–NCAR reanalysis grid points, for the period 2001 to 2099. Similarly, CGCM3 twentieth-century run data are obtained for the period 1961–90. Moreover, BCSD projections for the CGCM3 A1B scenario are obtained for the time period 1950–2099 (BCSD 2011).

## 4. Performance measures

Several error and performance measures are used to evaluate the downscaling models. In the notation used for these measures, the subscript $t$ denotes the $t$th time step, $T$ is the total number of time steps, and the subscript $o$ identifies the observed precipitation data.
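As a generic illustration of such measures, the sketch below computes a few widely used error and skill statistics for a pair of observed and downscaled series; the chosen set and function name are assumptions and need not match the paper's exact list.

```python
import numpy as np

def error_measures(obs, sim):
    """Common error and skill measures for observed (obs) and simulated (sim) series."""
    err = sim - obs
    return {
        "rmse": float(np.sqrt(np.mean(err ** 2))),          # root-mean-square error
        "mae":  float(np.mean(np.abs(err))),                # mean absolute error
        "corr": float(np.corrcoef(obs, sim)[0, 1]),         # linear correlation (R)
        "nse":  float(1.0 - np.sum(err ** 2)
                      / np.sum((obs - obs.mean()) ** 2)),   # Nash-Sutcliffe efficiency
    }
```

Error measures (RMSE, MAE) are minimized by a good model, while skill measures (correlation, NSE) are maximized, which matters when measures are combined into a single ranking.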

## 5. Results and analysis

Downscaling models described in the previous sections were developed using six predictor variables—namely, mean sea level pressure, geopotential height at 500 hPa, specific humidity at 850 hPa, surface air temperature, surface *U* wind (zonal), and surface *V* wind (meridional)—at 16 NCEP–NCAR grid points, giving a dimensionality of 96 (i.e., 16 × 6); the standardized values of these variables are used as the potential predictors. All downscaling models used in this study are listed in Table 2. The BCSD method (M1) provides precipitation that is downscaled to the ⅛° spatial resolution, whereas all other methods developed in this study downscale to a point scale (i.e., at a rain gauge site or location). Comparison of BCSD with the other methods is reasonable, as the correlations between the observed ⅛° grid and the point-scale observed data were in the range of 0.9247–0.9957; however, a strict comparison of model M1 with the other models is not appropriate because the spatial resolutions differ, so the performance of M1 is included using the different performance measures as an independent assessment. The models using different techniques or combinations of techniques are referred to as M1, M2, M3, M4, and M5 for convenience. The first 10 principal components (PCs) explained 98% of the variance when PCA was used. Fuzzy clustering was then applied to obtain the clusters, with the values of FPI and NCE attaining their minima for three clusters and a fuzzification parameter of 1.4. NCEP–NCAR data for 44 years (1948–91) are chosen for calibrating the models, and the remaining data from 1992 to 2010 are used for validation. In addition, 10 years (2001–10) of GCM data are used for testing.
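The dimensionality-reduction step (retaining PCs up to 98% explained variance) can be sketched as follows, with randomly generated stand-in data in place of the 96 standardized predictor series.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# Stand-in for the (T x 96) standardized predictor field: a handful of
# dominant modes plus low-amplitude noise.
low_rank = rng.normal(size=(500, 8)) @ rng.normal(size=(8, 96))
X = low_rank + 0.05 * rng.normal(size=(500, 96))

# Passing a float to n_components keeps the smallest number of PCs whose
# cumulative explained variance ratio reaches 98%.
pca = PCA(n_components=0.98)
pcs = pca.fit_transform(X)
```

The retained `pcs` array is what feeds the fuzzy clustering and regression steps described in section 2.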

List of statistical downscaling models evaluated in this study.

The five models were tested for the period 1992–2010 for reproduction of monthly rainfall statistics. Figures 6 and 7 show comparisons of observed and computed (model derived) statistics of monthly rainfall at the 18 different locations during the validation period. The results show that the SR model (M4) consistently underestimates the mean monthly precipitation at all the locations because of negative values; the inability of this linear regression model to capture the relationship between predictors and predictand is evident. All other downscaling models have captured the observed monthly mean very well, but the standard deviation is consistently underestimated by all the models. Moreover, there is high variability in precipitation in Florida, as shown by the distribution of mean monthly precipitation in each month at all stations. Most of the stations in continental Florida have low variability compared to the stations in peninsular Florida. The stations located in the continental and peninsular regions are listed in Table 1. This variability can be attributed to the large number of hurricane landfalls in southeastern Florida. Several performance measures are used to evaluate the models. Some of the models produced negative precipitation values; these values are converted to zero to facilitate the computation of the performance measures. All the performance measures are equally weighted: the models are first ranked ordinally for each performance measure (obtaining individual model rankings for each measure), the ranks for each model are then summed across all the performance measures, and the models are re-ranked based on these totals to obtain the overall ranks at each station. Table 3 shows the performance measure values at station 3 (randomly chosen) during the validation period 1992–2010.
The M5 model performs better than all other models when all performance measures are considered. Table 4 shows the ranking of the models based on the performance measures given in Table 3: the column "total" is obtained by summing the ranks for each model, and the new rankings based on the total are placed in the column "rank." In general, models M5 and M2 consistently outperform all the other models, including BCSD (M1), which is bias corrected. The model rankings for all the stations during the validation period are shown in Table 5.
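The equal-weight rank-aggregation step can be sketched as follows. This is illustrative: it assumes lower scores are better for every measure (skill scores would be negated first) and breaks ties by model order rather than assigning shared ranks.

```python
import numpy as np

def overall_ranks(scores):
    """Equal-weight ranking: rank models per measure, sum ranks, re-rank totals.

    scores: (n_models, n_measures) array where lower is better for every measure.
    Returns the final rank (1 = best) of each model.
    """
    # Ordinal rank of each model within each performance measure.
    per_measure = scores.argsort(axis=0).argsort(axis=0) + 1
    # Sum ranks across measures, then re-rank the totals.
    totals = per_measure.sum(axis=1)
    return totals.argsort().argsort() + 1
```

Applying this per station, as in Tables 4 and 5, yields one overall rank per model at each location.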

Performance measures during validation period 1992–2010 at station 3.

Ranking of models during the validation period 1992–2010 at station 3.

Model ranks during the validation period 1992–2010. (The best method across each station is represented by a boldface value.)

In the weighted ranking procedure, each performance measure is assigned a weight, and the weighted measures are summed to obtain an overall performance measure for each downscaling model $i$.

List of different sets of weights for performance measures.

Tables 7–9 list the weighted and overall performance measures based on the three sets of weights listed in Table 6. As in the previous ranking procedure with equal weights, the model with the lowest overall performance measure is considered the best model. Based on the results from all three sets of weights, model M5 has the lowest overall performance measure, followed by model M2. The overall performance measures of M5 vary from 0.432 (set 1) to 0.518 (set 2). Model M5 performs better than all other models when both the error measures and the distribution-specific performance measures are considered.

Weighted performance measures for different models based on weights defined by set 1 (Table 6) during the validation period 1992–2010 for station 3.

Weighted performance measures for different models based on weights defined by set 2 (Table 6) during the validation period 1992–2010 for station 3.

Weighted performance measures for different models based on weights defined by set 3 (Table 6) during the validation period 1992–2010 for station 3.

Figure 8 shows a box plot comparison of the model-computed and observed monthly rainfall for four locations. Models M1, M2, M3, and M5 perform reasonably well in reproducing the median and interquartile ranges of the data at most of the locations. The model M4 results show a very low median rainfall, whereas all the other models reproduce the spread of the data well, with few exceptions, at most of the stations. In this context, the choice and applicability of a downscaling model would appear to depend on the rainfall characteristics of a region. Figure 9 shows a comparison of the CDFs for observed and model-computed rainfall at four locations. The M1, M3, and M5 models generally provide the closest fits to the CDF of observed rainfall in independent testing at all locations, while the odd behavior of the M4 model is due to the replacement of negative downscaled precipitation with zeros. As discussed earlier, there is no unanimous choice of model across the stations, but at most of the stations the M5 model outperforms the other models when the ranks, interquartile ranges, summary statistics, and CDF plots are considered together. Figures 10 and 11 show the seasonal correlations between model-computed and observed monthly precipitation at all stations. Florida experiences seasons that differ from most of the United States: rather than the four seasons of winter, spring, summer, and fall, Florida exhibits a distinct wet (warm) season and dry (cooler) season. The dry season is from November through April, and the wet season is from May through October. As seen in Figures 10 and 11, the performance of the M5 model during the wet and dry seasons is superior to that of the other models during the validation period; notably, the maximum correlation of M1 is less than the minimum correlation of M5.
The ranges of the correlation coefficient during the wet season are (−0.098 to 0.373) for M1, (0.508–0.763) for M2, (0.338–0.718) for M3, (0.551–0.729) for M4, and (0.504–0.745) for M5, and the ranges during dry season are (−0.057 to 0.219) for M1, (0.199–0.630) for M2, (0.069–0.571) for M3, (0.258–0.587) for M4, and (0.357–0.654) for M5. Based on the validation results the choice of the downscaling model is region specific, but in general it has been noticed that M5 captures the monthly precipitation very well at most of the stations.

Once the downscaling models are calibrated and validated, the next step is to use these models to downscale the control scenario simulated by the GCM. One major task that is not generally discussed in the literature is the link between GCM variables at the GCM grid level and the NCEP–NCAR grid level. Generally, spatial interpolation is performed to obtain the GCM variables at the NCEP–NCAR grid level, but in this study the MLR method is assessed along with spatial interpolation (SI). MLR is introduced as a replacement for the SI method under the assumption that the relationship between the NCEP and GCM grid variables is stationary over time. The MLR equation is calibrated on the time period 1961–90 and evaluated during the GCM testing period 2001–10. As for the validation period, the GCM testing period 2001–10 is also evaluated with the performance measures and ranked; in this case, each model is evaluated with both spatial interpolation and MLR. Table 10 shows the rankings of the models during the GCM testing period at all stations. At 12 out of 18 stations, the MLR method performs better than the traditional spatial interpolation and BCSD. M5 ranks as the best model at 10 out of 18 stations (7 with MLR and 3 with spatial interpolation), M3 ranks best at 4 out of 18 stations (all with MLR), M2 ranks best at 1 out of 18 stations (with MLR), and M1 ranks best at 3 out of 18 stations. Tables 11 and 12 list the individual model performances at station 3 during the GCM testing period; it is evident that the results improve when the MLR method is used to link the GCM variables from the GCM grid level to the NCEP grid level.

Model ranks during the GCM testing period 2001–10. (The best method across each station is represented by a boldface value.)

Performance measures during GCM testing period 2001–10 at station 3.

Ranking of models during the GCM testing period 2001–10 at station 3. (The best model across each performance measure is represented by a boldface value, and the overall rankings for the models are in italics.)

The model based on the support vector regression (M5) technique is used for modeling monthly rainfall in Florida for 2001–2100 under SRES A1B, with the basic assumption that the predictor–predictand relationship will not change in the future. Box plots for the different months are plotted for the periods 2001–25, 2026–50, 2051–75, and 2076–2100 (Figure 12), and the cumulative distributions of the rainfall series for the four time periods are shown in Figure 13. Figures 12 and 13 show a decreasing trend in rainfall, especially during the wet period (May–September). Conversely, the months of November and December show an increasing trend. A possible decrease in rainfall during the wet period may suggest a decrease in hurricane activity, based on the SRES A1B scenario of CGCM3. The rainfall decrease in May–September together with the increase in November and December may be indicative of a shift in the wet season in the region.

## 6. Summary and conclusions

Multiple linear regression (MLR) has been introduced in this study as an alternative to spatial interpolation for linking the GCM variables between the GCM and NCEP grids. The proposed method improved the performance measures at 12 out of the 18 stations compared to those from spatial interpolation, suggesting that using the MLR method for the GCM–NCEP link offers improvements in downscaling. The CGCM3 SRES A1B projections based on support vector machine (SVM) with principal component analysis (PCA) and fuzzy clustering downscaling indicate a possible transition in the wet season and a decrease in hurricane activity. All the models developed and evaluated in the current study underestimated the variance, which is not a surprise considering the already acknowledged limitations of statistical downscaling methods. Several reasons can be offered for the failure of the models to replicate the observed variance of precipitation data in the case study region, including 1) the high spatial and temporal variability of storm events and precipitation extremes driven by localized convective rainfall and hurricane landfalls in specific areas of the study region and 2) a possibly inappropriate set of predictors used for establishing the predictor–predictand relationships. In the case of Florida, even though the terrain is flat, precipitation (especially summer precipitation) is mostly due to convective and localized storms. These storms introduce high spatial and temporal variability in the precipitation patterns, influenced by the tropical nature of the climate, large water bodies, and wetlands. The continental climate in the north and the peninsular climate in the south heavily influence the rainfall patterns in Florida. These precipitation variations are not well captured by several GCMs evaluated for this region. It is also difficult to replicate point-scale (i.e., rain gauge) precipitation variability, which is attempted in the current study.
Results from this study also indicate that methodologies based on multiple predictor–predictand relationships, using classification schemes (e.g., fuzzy clustering) to group the data, performed better than methods that did not use any clustering. The positive coefficient regression (PCR) approach, a variant of multiple linear regression, may not always perform well when the predictor–predictand functional relationship is nonlinear; nevertheless, it was found to be superior to stepwise regression (SR) and MLR in providing better precipitation estimates in this study.
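The PCR approach can be sketched as least squares with the coefficients constrained to be nonnegative, which rules out negative contributions to precipitation. The sketch below uses the Lawson–Hanson NNLS solver (the algorithm from Lawson and Hanson 1974, cited in the references) on synthetic data; the predictor set and coefficient values are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(2)
# Illustrative predictors (e.g., principal components of large-scale fields)
# and a synthetic monthly precipitation anomaly series.
X = rng.normal(size=(300, 5))
beta_true = np.array([1.5, 0.0, 0.8, 0.0, 0.3])       # nonnegative by design
y = X @ beta_true + rng.normal(scale=0.2, size=300)

# Positive coefficient regression: minimize ||X @ beta - y|| subject to
# beta >= 0 (Lawson-Hanson nonnegative least squares).
beta, residual_norm = nnls(X, y)
```

The nonnegativity constraint is why PCR avoids the negative precipitation estimates that unconstrained MLR and SR can produce.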

In conclusion, the study reported in this paper evaluated five statistical downscaling models that use techniques such as MLR, PCR, SR, and SVM for downscaling monthly precipitation from GCM output to the local station (i.e., rain gauge) scale. The models also use PCA for dimensionality reduction and the fuzzy *c*-means (FCM) clustering method for partitioning the predictor–predictand datasets into clusters. The models are used to downscale precipitation at 18 rain gauge stations in Florida; the monthly precipitation is downscaled from simulations of the CGCM3 model under the IPCC A1B scenario. The performance of all the models is compared, using error and performance measures, with the existing BCSD downscaled products widely used in the United States. The best model is chosen based on its performance during the validation and GCM testing periods. The performance measures indicate that no single model performs best at all sites; however, at most of the sites the model using SVM with PCA and FCM is ranked best based on several performance measures. The regression models performed poorly compared to the SVM-based model, with some of them producing many negative precipitation values. The SVM-based model also performed better than the BCSD model, which already employs bias correction and uses large-scale numerically simulated precipitation as the predictor for fine-scale precipitation. Use of regression in lieu of spatial interpolation to relate GCM-based variables to NCEP–NCAR variables at two different spatial resolutions improved the performance of all models. The models evaluated in the study are able to preserve the monthly mean but not the variance. The models fail to downscale the highly variable wet-season rainfall extremes in the case study region, which is dominated by subtropical climatic conditions, localized convective storms, and hurricane landfalls generating variable long-duration rainfall extremes along the storm paths.
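The core of the best-performing model (SVM with PCA) can be sketched as a pipeline that standardizes the predictors, projects them onto a few principal components, and fits a support vector regression on the retained components. The data below are synthetic with a built-in low-rank structure, and the paper's model additionally partitions the data with fuzzy *c*-means and fits a separate relationship per cluster; this sketch keeps only the PCA + SVR core.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(3)
# Synthetic stand-in for large-scale predictors: 20 correlated fields built
# from 3 latent modes, plus monthly precipitation at one gauge (illustrative).
n = 480                                               # e.g., 40 yr x 12 months
latent = rng.normal(size=(n, 3))                      # large-scale "modes"
X = latent @ rng.normal(size=(3, 20)) + 0.1 * rng.normal(size=(n, 20))
y = np.maximum(0.0, 100 + 60 * latent[:, 0] + 25 * latent[:, 1]
               + rng.normal(scale=10, size=n))        # nonnegative rainfall

# PCA keeps the few components carrying most predictor variance; SVR then
# learns the (possibly nonlinear) predictor-predictand relationship.
model = make_pipeline(StandardScaler(), PCA(n_components=3),
                      SVR(kernel="rbf", C=100.0, epsilon=1.0))
train, test = slice(0, 360), slice(360, None)
model.fit(X[train], y[train])
pred = model.predict(X[test])
corr = np.corrcoef(pred, y[test])[0, 1]
```

Because SVR is a kernel method, it cannot produce the structurally negative estimates that plagued the linear regression variants, which is consistent with its stronger ranking at most sites.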
Results of statistical downscaling can be further improved by optimal selection of the predictor variables, because this selection significantly affects the performance of the downscaling model.

## Acknowledgements

The authors thank the Indo-U.S. Science and Technology Forum (IUSSTF) for providing a Research Internship in Science and Engineering (RISE) award to Dr. Aneesh Goly, and Dr. Pradeep P. Mujumdar of the Indian Institute of Science, Bangalore, India, for providing valuable guidance during this study. The authors also sincerely thank the two anonymous reviewers, whose objective and constructive criticism helped improve the paper.

## References

BCSD, cited 2011: Bias correction and statistically downscaled data. [Available online at http://gdo-dcp.ucllnl.org/downscaled_cmip3_projections/dcpInterface.html#Projections:%20Subset%20Request.]

Brekke, L. D., J. E. Kiang, J. R. Olsen, R. S. Pulwarty, D. A. Raff, D. P. Turnipseed, R. S. Webb, and K. D. White, 2009: Climate change and water resources management: A federal perspective. U.S. Geological Survey Circular 1331, 76 pp.

Buishand, T. A., and T. Brandsma, 1999: The dependence of precipitation on temperature at Florence and Livorno. *Climate Res.*, **12**, 53–63, doi:10.3354/cr012053.

Cannon, A. J., and P. H. Whitfield, 2002: Downscaling recent streamflow conditions in British Columbia, Canada using ensemble neural network models. *J. Hydrol.*, **259**, 136–151, doi:10.1016/S0022-1694(01)00581-9.

Charles, S. P., B. C. Bates, and J. P. Hughes, 1999: A spatio-temporal model for downscaling precipitation occurrence and amounts. *J. Geophys. Res.*, **104**, 31 657–31 669, doi:10.1029/1999JD900119.

Dezmain, C., 2013: Evaluation of future design rainfall extremes and characteristics using multiple-model and multiple-scenario climate change models. M.S. thesis, Department of Civil, Environmental and Geomatics Engineering, Florida Atlantic University, 191 pp.

Draper, N., and H. Smith, 1981: *Applied Regression Analysis*. 2nd ed. John Wiley & Sons, 709 pp.

Fodor, I. K., 2002: A survey of dimension reduction techniques. Lawrence Livermore National Laboratory Center for Applied Scientific Computing Tech. Rep., 18 pp. [Available online at http://computation.llnl.gov/casc/sapphire/pubs/148494.pdf.]

Fowler, H. J., S. Blenkinsop, and C. Tebaldi, 2007: Linking climate change modelling to impacts studies: Recent advances in downscaling techniques for hydrological modelling. *Int. J. Climatol.*, **27**, 1547–1578, doi:10.1002/joc.1556.

Frías, M. D., E. Zorita, J. Fernández, and C. Rodríguez-Puebla, 2006: Testing statistical downscaling methods in simulated climates. *Geophys. Res. Lett.*, **33**, L19807, doi:10.1029/2006GL027453.

Frost, A. J., and Coauthors, 2011: A comparison of multi-site daily rainfall downscaling techniques under Australian conditions. *J. Hydrol.*, **408**, 1–18, doi:10.1016/j.jhydrol.2011.06.021.

Ghosh, S., and P. P. Mujumdar, 2006: Future rainfall scenario over Orissa with GCM projections by statistical downscaling. *Curr. Sci.*, **90**, 396–404.

Ghosh, S., and P. P. Mujumdar, 2008: Statistical downscaling of GCM simulations to streamflow using relevance vector machine. *Adv. Water Resour.*, **31**, 132–146, doi:10.1016/j.advwatres.2007.07.005.

Goly, A., and R. S. V. Teegavarapu, 2012: Influence of teleconnections on spatial and temporal variability of extreme precipitation events in Florida. *Proc. World Environmental and Water Resources Congress*, Albuquerque, NM, ASCE Environmental and Water Resources Institute, 1899–1908.

Goly, A., and R. S. V. Teegavarapu, 2014: Individual and coupled influences of AMO and ENSO on regional precipitation characteristics and extremes. *Water Resour. Res.*, **50**, 4686–4709, doi:10.1002/2013WR014540.

Goyal, M. R., and C. S. P. Ojha, 2010: Evaluation of various linear regression methods for downscaling of mean monthly precipitation in arid Pichola watershed. *Nat. Resour.*, **1**, 11–18, doi:10.4236/nr.2010.11002.

Haylock, M. R., G. C. Cawley, C. Harpham, R. L. Wilby, and C. M. Goodess, 2006: Downscaling heavy precipitation over the United Kingdom: A comparison of dynamical and statistical methods and their future scenarios. *Int. J. Climatol.*, **26**, 1397–1415, doi:10.1002/joc.1318.

Hughes, J. P., and P. Guttorp, 1994: A class of stochastic models for relating synoptic atmospheric patterns to regional hydrologic phenomena. *Water Resour. Res.*, **30**, 1535–1546, doi:10.1029/93WR02983.

Huth, R., 1999: Statistical downscaling in central Europe: Evaluation of methods and potential predictors. *Climate Res.*, **13**, 91–101, doi:10.3354/cr013091.

Hwang, S., 2011: Dynamical and statistical downscaling of climate information and its hydrologic implications over west-central Florida. Ph.D. dissertation, University of Florida, 191 pp.

Hwang, S., W. D. Graham, J. S. Geurink, and A. Adams, 2011: Hydrologic importance of spatial variability in statistically downscaled precipitation predictions from global circulation models for west-central Florida. *2011 AGU Fall Meeting*, San Francisco, CA, Amer. Geophys. Union, Abstract H33D-1342.

Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. *Bull. Amer. Meteor. Soc.*, **77**, 437–471, doi:10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2.

Khan, M. S., P. Coulibaly, and Y. Dibike, 2006: Uncertainty analysis of statistical downscaling methods using Canadian global climate model predictors. *J. Hydrol.*, **319**, 357–382, doi:10.1016/j.jhydrol.2005.06.035.

Kottek, M., J. Grieser, C. Beck, B. Rudolf, and F. Rubel, 2006: World map of Köppen-Geiger climate classification updated. *Meteor. Z.*, **15**, 259–263, doi:10.1127/0941-2948/2006/0130.

Lawson, C. L., and R. J. Hanson, 1974: *Solving Least Squares Problems*. Prentice-Hall, 337 pp.

Linderson, M. L., C. Achberger, and D. Chen, 2003: Statistical downscaling and scenario construction of precipitation in Scania, southern Sweden. *Nord. Hydrol.*, **35**, 261–278.

Maurer, E. P., 2007: Uncertainty in hydrologic impacts of climate change in the Sierra Nevada, California under two emissions scenarios. *Climatic Change*, **82**, 309–325, doi:10.1007/s10584-006-9180-9.

Maurer, E. P., L. Brekke, T. Pruitt, and P. B. Duffy, 2007: Fine-resolution climate projections enhance regional climate change impact studies. *Eos, Trans. Amer. Geophys. Union*, **88**, 504, doi:10.1029/2007EO470006.

Raje, D., and P. P. Mujumdar, 2011: A comparison of three methods for downscaling daily precipitation in the Punjab region. *Hydrol. Processes*, **25**, 3575–3589, doi:10.1002/hyp.8083.

Roubens, M., 1982: Fuzzy clustering algorithms and their cluster validity. *Eur. J. Oper. Res.*, **10**, 294–301, doi:10.1016/0377-2217(82)90228-4.

Solomon, S., D. Qin, M. Manning, Z. Chen, M. Marquis, K. B. Averyt, M. Tignor, and H. L. Miller, Eds., 2007: *Climate Change 2007: The Physical Science Basis*. Cambridge University Press, 996 pp.

Suykens, J. A. K., 2001: Nonlinear modeling and support vector machines. *Proc. 18th IEEE Instrumentation and Measurement Technology Conf.*, Budapest, Hungary, IEEE, 287–294, doi:10.1109/IMTC.2001.928828.

Teegavarapu, R. S. V., 2012: *Floods in a Changing Climate: Extreme Precipitation*. Cambridge University Press, 285 pp.

Teegavarapu, R. S. V., T. Meskele, and C. Pathak, 2012: Geo-spatial grid-based transformation of multi-sensor precipitation using spatial interpolation methods. *Comput. Geosci.*, **40**, 28–39, doi:10.1016/j.cageo.2011.07.004.

Teegavarapu, R. S. V., A. Goly, and J. Obeysekera, 2013: Influences of Atlantic multidecadal oscillation phases on spatial and temporal variability of regional precipitation extremes. *J. Hydrol.*, **495**, 74–93, doi:10.1016/j.jhydrol.2013.05.003.

Tripathi, S., V. V. Srinivas, and R. S. Nanjundiah, 2006: Downscaling of precipitation for climate change scenarios: A support vector machine approach. *J. Hydrol.*, **330**, 621–640, doi:10.1016/j.jhydrol.2006.04.030.

Tryhorn, L., and A. DeGaetano, 2011: A comparison of techniques for downscaling extreme precipitation over the northeastern United States. *Int. J. Climatol.*, **31**, 1975–1989, doi:10.1002/joc.2208.

van der Maaten, L. J. P., E. O. Postma, and H. J. van den Herik, 2009: Dimensionality reduction: A comparative review. Tilburg University Tilburg Centre for Creative Computing Tech. Rep. 2009-005, 36 pp.

van Gestel, T., J. A. K. Suykens, B. Baesens, S. Viaene, J. Vanthienen, G. Dedene, B. D. Moor, and J. Vandewalle, 2004: Benchmarking least squares support vector machine classifiers. *Mach. Learn.*, **54**, 5–32, doi:10.1023/B:MACH.0000008082.80494.e0.

Vapnik, V. N., 1995: *The Nature of Statistical Learning Theory*. Springer-Verlag, 188 pp.

Von Storch, H., H. Langenberg, and F. Feser, 2000: A spectral nudging technique for dynamical downscaling purposes. *Mon. Wea. Rev.*, **128**, 3664–3673, doi:10.1175/1520-0493(2000)128<3664:ASNTFD>2.0.CO;2.

Wetterhall, F., S. Halldin, and C. Xu, 2005: Statistical precipitation downscaling in central Sweden with the analogue method. *J. Hydrol.*, **306**, 174–190, doi:10.1016/j.jhydrol.2004.09.008.

Wigley, T. M. L., 2004: Input needs for downscaling of climate data. California Energy Commission Public Interest Energy Research Program Discussion Paper 500-04-027, 36 pp.

Wilby, R. L., and T. M. L. Wigley, 1997: Downscaling general circulation model output: A review of methods and limitations. *Prog. Phys. Geogr.*, **21**, 530–548, doi:10.1177/030913339702100403.

Wilby, R. L., L. E. Hay, and G. H. Leavesley, 1999: A comparison of downscaled and raw GCM output: Implications for climate change scenarios in the San Juan River Basin, Colorado. *J. Hydrol.*, **225**, 67–91, doi:10.1016/S0022-1694(99)00136-5.

Wilby, R. L., S. P. Charles, E. Zorita, B. Timbal, P. Whetton, and L. O. Mearns, 2004: Guidelines for use of climate scenarios developed from statistical downscaling methods. Intergovernmental Panel on Climate Change Supporting Material, 27 pp.

Wood, A. W., L. R. Leung, V. Sridhar, and D. P. Lettenmaier, 2004: Hydrologic implications of dynamical and statistical approaches to downscaling climate model outputs. *Climatic Change*, **62**, 189–216, doi:10.1023/B:CLIM.0000013685.99609.9e.