## 1. Introduction

Seasonal climate variability has a great impact on many sectors, including agriculture, water resources, human health, ecosystems, transportation, and infrastructure. If seasonal climate conditions can be predicted a few months in advance, there is potential to reduce the damage caused by climate variability or to take advantage of the benefits it brings. Two aspects make the seasonal climate predictable: parts of the geophysical system, such as the oceans and the land, evolve much more slowly than the atmosphere, and the useful information in long-lead seasonal forecasts can be extracted by averaging over large scales in time and space, so it does not have to match the quality of a weather forecast (Troccoli 2010).

Because of the predictability of seasonal climate variability, dynamical models and statistical models have been developed for seasonal forecasting. Weather and climate forecast centers such as the National Centers for Environmental Prediction (NCEP) and the European Centre for Medium-Range Weather Forecasts (ECMWF) make dynamical predictions using fully coupled ocean–atmosphere general circulation models (GCMs). These models have demonstrated skill in forecasting sea surface temperatures (SSTs) at global and continental scales (Troccoli et al. 2008), but they cannot consistently predict land surface variables such as precipitation rate (*P*) and 2-m air temperature (T2M) beyond a 1-month lead in all regions, even though these forecasts provide direct information for agricultural management, water resources and irrigation planning, flood and drought forecasting, etc. (e.g., Lavers et al. 2009; Saha et al. 2014; Weisheimer et al. 2009; Yuan et al. 2011). In addition, GCMs are typically configured at effective resolutions of 100–300 km, which are too coarse to represent local features. As an alternative to dynamical models, statistical models have often used the observed antecedent SSTs in the tropical oceans as predictors of local *P* and T2M (Troccoli et al. 2008). Although such statistical models are black boxes and their applicability is location dependent, they can provide comparable or even higher skill in some regions and are much less expensive and more straightforward than dynamical models.

While GCM *P* and T2M forecasts have coarse resolution and suffer from systematic errors, several techniques can be used to compensate for these limitations. Statistical or dynamical downscaling procedures can be employed to produce local-scale *P* and T2M forecasts from a GCM (Fowler et al. 2007). Statistical downscaling employs the statistical relationship between the GCM output and the local observed weather, while dynamical downscaling uses GCM output to provide initial and boundary conditions for a regional climate model (RCM) to produce local-scale forecasts. Dynamical downscaling is computationally intensive, the errors from both the GCM and the RCM propagate into the downscaled predictions, and an additional statistical postprocessing step is typically needed to correct systematic errors (e.g., Hwang et al. 2011; Yoon et al. 2012). Compared to dynamical downscaling, statistical downscaling is computationally efficient and straightforward to apply, and statistical downscaling methods inherently correct the model bias.

Statistical downscaling methods can be classified into three categories: perfect prognosis (PP), model output statistics (MOS), and weather generators (WGs) (Maraun et al. 2010). PP approaches calibrate the statistical model (often regression-related methods) using the observed large-scale field (predictor) and observed local-scale weather (predictand) and then use the GCM forecast field as the predictor in the calibrated statistical model to produce the local-scale forecast (Maraun et al. 2010). The statistical models commonly used for PP include linear models, generalized linear and additive models, vector generalized linear models, weather type–based downscaling, nonlinear regression, and analog methods (e.g., Haylock et al. 2006; Juneng et al. 2010; Kang et al. 2007; Tian and Martinez 2012a,b; Yates et al. 2003). MOS approaches produce the downscaled field by directly calibrating the spatially disaggregated or dynamically downscaled GCM output against observations. Many MOS approaches have been applied to downscale GCMs (e.g., Abatzoglou and Brown 2012; Wood et al. 2004; Yoon et al. 2012; Tian et al. 2014) or to compare atmospheric models with fully coupled models (Landman et al. 2012; Ndiaye et al. 2011). WGs are statistical models that generate random sequences of local-scale weather whose temporal and spatial characteristics resemble the observed weather. For downscaling purposes, WGs need to be used in conjunction with MOS or PP methods (e.g., Feddersen and Andersen 2005). Other statistical downscaling techniques, such as the Bayesian merging technique, have also been applied to seasonal forecasts (e.g., Luo and Wood 2008; Luo et al. 2007; Yuan et al. 2013). The major limitation of statistical downscaling is that it requires a long series of retrospective forecasts (reforecasts or hindcasts) and observations to derive the statistical relationship.

Another technique to improve GCM seasonal forecast skill is through multimodel ensembles (MMEs). The MME forecasts combine multiple GCMs and often have higher skill than any individual model (e.g., Juneng et al. 2010; Kar et al. 2012; Krishnamurti et al. 2000; Lavers et al. 2009; Yun et al. 2003). The benefits of MMEs are a result of the increased ensemble size that generates the full spectrum of possible forecasts by including models with different physics, numerics, and initial conditions (Hagedorn et al. 2005). Several algorithms have been developed to combine MME forecasts to produce probabilistic or deterministic forecasts. These methods vary from the simplest combination of unweighted ensembles or the ensemble mean of different models (e.g., Hagedorn et al. 2005; Landman and Beraki 2012; Mason and Mimmack 2002; Peng et al. 2002; Tippett and Barnston 2008) to more complex weighted ensembles using multiple linear regression, maximum likelihood, or Bayesian techniques (e.g., Bohn et al. 2010; Doblas-Reyes et al. 2005; Duan et al. 2007; Kar et al. 2012; Kharin and Zwiers 2002; Raftery et al. 2005). A number of studies have shown that the latter approaches do not always have higher skill than the simple unweighted ensemble approach, especially when the sample size is relatively small (e.g., Doblas-Reyes et al. 2005; Landman and Beraki 2012).

There are strong connections between large-scale oceanic–atmospheric oscillations and local land surface variables in many regions of the globe. A number of studies have employed such semi-empirical relationships to predict long-lead land surface variables such as precipitation and streamflow at several locations using different data-driven statistical models (e.g., Grantz et al. 2005; Juneng et al. 2010; Kalra and Ahmad 2012; Regonda et al. 2006). The most commonly used oceanic–atmospheric oscillations are based on SST anomalies in the tropical Pacific Ocean associated with El Niño–Southern Oscillation (ENSO). ENSO has been shown to have a significant influence on precipitation and streamflow in the southeastern United States (SEUS) (Johnson et al. 2013; Schmidt et al. 2001; Sun and Furbish 1997). Many GCMs have shown high skill for predicting tropical SST at long lead times (leads) (e.g., Lavers et al. 2009; Saha et al. 2014; Weisheimer et al. 2009), and such information is attractive for use with PP downscaling approaches to predict local land surface variables.

This raises the question of whether seasonal *P* and T2M at the local scale are predictable using GCMs’ tropical SST predictions. In other words, is there added value in predicting local-scale *P* and T2M from GCMs’ tropical SST predictions, relative to directly downscaling these land surface fields from GCMs, across locations, seasons, leads, and MME schemes? It is therefore necessary to conduct a comprehensive assessment of local-scale *P* and T2M predictions from multiple climate models using direct methods (MOS, downscaling the GCMs’ *P* and T2M) and indirect methods (PP, prognosis from the SST prediction) for all lead times and seasons over a region. The investigation of these local forecasts can provide valuable information for decision making in water and natural resources, agriculture, infrastructure planning, and other sectors at the regional scale.

This research was conducted over three states (Alabama, Georgia, and Florida) in the SEUS. The paper is organized as follows: Section 2 describes the dataset and methodology used in this study. Section 3 presents results. Discussion and conclusions are given in section 4.

## 2. Data and methods

### a. The datasets

The newly developed hindcast dataset from the North American Multimodel Ensemble (NMME) system was used in this work. The NMME is a forecasting system that currently includes GCMs from the National Oceanic and Atmospheric Administration (NOAA)/NCEP and Geophysical Fluid Dynamics Laboratory (GFDL), the International Research Institute for Climate and Society (IRI), the National Center for Atmospheric Research (NCAR), and the National Aeronautics and Space Administration (NASA). The hindcast dataset of the NMME is archived at IRI. The *P* and T2M from nine NMME models and the SST from seven NMME models were used in this study. Table 1 shows the basic information of the hindcast archives from nine models of the NMME. All model outputs are monthly forecasts at 1.0° × 1.0° resolution (approximately 100 km; Fig. 1) with at least a 7-month lead time, covering a 29-yr period (1982–2010 or 1981–2009). All monthly forecasts of *P*, T2M, and SST were converted to seasonal means by calculating 3-month moving averages. Since the hindcasts from GFDL CM2.1-aer04, GMAO, and GMAO-062012 had missing data, this paper mainly focused on comparing the remaining six models for all seasons and leads over the SEUS.

Table 1. Basic information of the NMME hindcast archives used in this study.

Observed monthly SST in the Niño-3.4 region (5°S–5°N, 170°–120°W) for January 1982–December 2010 was obtained from the NOAA/Climate Prediction Center (available at http://www.cpc.ncep.noaa.gov/data/indices/). Monthly values were converted to 3-month seasonal means. The *P* and T2M from the National Land Data Assimilation System phase 2 (NLDAS-2) forcing dataset (Xia et al. 2012a,b) were used as a surrogate for long-term observations for forecast verification and downscaling (as described in sections 2b and 2c). The NLDAS-2 is available at 0.125° × 0.125° resolution (approximately 12 km; Fig. 1), with an hourly time step, over North America from 1979 to the present. The NLDAS-2 forcing dataset was developed for driving land surface models and was mostly derived from the North American Regional Reanalysis (Mesinger et al. 2006) and observations. Seasonal values of *P* and T2M from January 1982 to December 2010 were used in this study. To calculate seasonal *P*, hourly values were aggregated into daily values, averaged into monthly values, and converted to 3-month moving averages. T2M was calculated from daily maximum and minimum temperature (Tmax and Tmin), which were extracted from hourly values, averaged into monthly values, and converted to 3-month moving averages.
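As a concrete illustration, the conversion from monthly values to 3-month seasonal means described above can be sketched as a centered moving average. The series below is a synthetic placeholder, not actual NLDAS-2 or NMME data:

```python
import numpy as np

def seasonal_means(monthly, window=3):
    """3-month moving average of a monthly time series.

    monthly : 1-D array of monthly values (e.g., P, T2M, or SST).
    Returns an array of length len(monthly) - window + 1, where element
    k is the mean of months k..k+window-1 (e.g., DJF, JFM, FMA, ...).
    """
    kernel = np.ones(window) / window
    return np.convolve(monthly, kernel, mode="valid")

# Hypothetical 29 years of monthly values (1982-2010)
rng = np.random.default_rng(0)
monthly = rng.normal(size=29 * 12)
seasons = seasonal_means(monthly)
print(seasons.shape)  # (346,) = 348 months - 2
```

Each seasonal value overlaps its neighbors by two months, matching the 3-month moving-average convention used for all variables in the study.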

### b. The MME schemes

There are a number of algorithms to combine multiple GCMs. This work used two simple MME schemes to combine models of the NMME system and generate both deterministic and probabilistic forecasts. The first approach pooled all forecast members of the NMME models into a “super ensemble” (SuperEns), assigning equal weight to every member forecast; the probabilistic forecast was generated from all forecast members, and the deterministic forecast was calculated by averaging all members. The second approach assigned equal weight to each model’s ensemble mean to create a “mean ensemble” (MeanEns); the probabilistic and deterministic forecasts were generated from the ensemble of model ensemble means and by averaging the ensemble means of all models, respectively.
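The difference between the two schemes can be sketched as follows. The model names and member counts here are hypothetical, not the actual NMME configuration; the point is that SuperEns weights each *member* equally while MeanEns weights each *model* equally:

```python
import numpy as np

# Hypothetical hindcasts: model name -> array of shape (n_members, n_years).
# Member counts differ between models, which is what separates the schemes.
rng = np.random.default_rng(1)
models = {"A": rng.normal(size=(24, 29)),   # e.g., a large-ensemble model
          "B": rng.normal(size=(6, 29)),
          "C": rng.normal(size=(10, 29))}

# SuperEns: pool all members with equal weight per member.
super_members = np.vstack(list(models.values()))   # (40, 29) pooled ensemble
super_det = super_members.mean(axis=0)             # deterministic forecast

# MeanEns: equal weight per model -> ensemble of model ensemble means.
model_means = np.vstack([m.mean(axis=0) for m in models.values()])  # (3, 29)
mean_det = model_means.mean(axis=0)                # deterministic forecast
```

In this sketch, SuperEns implicitly gives model "A" a weight of 24/40 in the deterministic forecast, whereas MeanEns gives every model a weight of 1/3; this distinction drives the SuperEns–MeanEns skill differences discussed in section 3d.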

### c. The PP downscaling method

The PP downscaling method establishes statistical models using large-scale and local-scale observed data and applies these models to the GCM output. Nonparametric locally weighted polynomial regression (LWPR) models were fitted for each member of the NMME models. For the regression models, the predictor was the spatial average of Niño-3.4 SSTs and the predictand was *P* or T2M. The regression models were first trained using the observed Niño-3.4 SST and the NLDAS-2 *P* or T2M for each season and each grid point over the SEUS; these trained models were then applied to the quantile mapping (see section 2d) bias-corrected SST from each of the NMME forecast members to predict *P* or T2M for all seasons and leads at all grid points over the SEUS. Before fitting the model, the *P* data were square root transformed to remove some of the skewness (e.g., Dettinger et al. 2004; Hidalgo et al. 2008). The LWPR models and downscaling procedures are briefly described below.

The LWPR model can be written as

$$Y = f(x) + \varepsilon,$$

where *x* is the predictor variable (in this case SST), *Y* is the response variable (in this case *P* or T2M), the function *f* can be linear or nonlinear, and *ε* is the residual, which is assumed to follow a normal distribution with zero mean and standard deviation estimated from the data. The LWPR method fits the function *f* locally to estimate the response variable *Y*. In this procedure, the value of the function *f* at any point *x* is obtained by fitting a polynomial of order *p* to a small number of neighbors *K* of *x* using a weighted least squares approach with the observed data (in this case observed SST and *P* or T2M), and then the model residuals are determined. The predicted mean value of the response variable is estimated by evaluating the fitted polynomial at new values of the predictor variable (in this case the bias-corrected SST forecast from the NMME). To include the uncertainties of the regression model and generate an ensemble forecast for each member, 10 deviates were stochastically generated from the normal distribution of *ε*. The ensemble for each forecast member was then generated by adding these ten random deviates to the predicted mean value of the response variable. This approach has shown good results with a small sample size (Singhrattna et al. 2005). In addition to the stochastic generator, the nonparametric *K* nearest neighbor (KNN) approach could be used to resample the residuals and generate an ensemble forecast. The benefit of the KNN approach is that there is no normality assumption for the residuals, but this approach only works well with a large sample size (Grantz et al. 2005; Prairie et al. 2005; Singhrattna et al. 2005) and thus was not tested in this study. The local regression and likelihood toolbox (LOCFIT) package (Loader 1999) was used in the fitting process; the optimal values of the neighborhood size *K* and the polynomial order *p* (usually 1 or 2) were obtained by minimizing the generalized cross-validation (GCV) score function

$$\mathrm{GCV} = \frac{N \sum_{i=1}^{N} e_i^2}{(N - m)^2},$$

where *e_i* is the error (the difference between the model estimate and the observation), *N* is the number of data points, and *m* is the degrees of freedom of the local fit, which provides a generalization of the number of parameters in the local regression model. The detailed definition and calculation of *m* can be found in Loader (1999).
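The study used the LOCFIT package for the fitting; the sketch below is an illustrative re-implementation of the core idea rather than the authors' code. It fits a local polynomial with tricube weights (as in LOESS/LOCFIT) and builds the 10-member stochastic ensemble from the residual distribution. The SST and *P* series are synthetic, and the neighborhood size is fixed rather than selected by GCV, as a simplification:

```python
import numpy as np

def lwpr_fit(x, y, x0, k, p=1):
    """Locally weighted polynomial regression evaluated at a point x0.

    Fits a degree-p polynomial to the k nearest neighbors of x0 by
    weighted least squares with tricube weights and returns the fitted
    value at x0.
    """
    d = np.abs(x - x0)
    idx = np.argsort(d)[:k]               # k nearest neighbors of x0
    h = d[idx].max() or 1.0               # neighborhood bandwidth
    w = (1.0 - (d[idx] / h) ** 3) ** 3    # tricube weights in [0, 1]
    X = np.vander(x[idx] - x0, p + 1)     # columns: (x-x0)^p, ..., 1
    sw = np.sqrt(w)                       # weighted least squares via sqrt(w)
    beta = np.linalg.lstsq(sw[:, None] * X, sw * y[idx], rcond=None)[0]
    return beta[-1]                       # intercept = fitted value at x0

# Synthetic training data standing in for observed Nino-3.4 SST and
# (sqrt-transformed) NLDAS-2 P at one grid point for one season.
rng = np.random.default_rng(2)
sst_obs = rng.normal(size=29)
p_obs = 0.5 * sst_obs + rng.normal(scale=0.3, size=29)

# Residual standard deviation from the in-sample local fits.
fits = np.array([lwpr_fit(sst_obs, p_obs, xi, k=15) for xi in sst_obs])
sigma = np.std(p_obs - fits, ddof=1)

# Predict from a (hypothetical) bias-corrected SST forecast and add 10
# normal deviates to form the stochastic ensemble for one forecast member.
sst_fcst = 0.8
ensemble = lwpr_fit(sst_obs, p_obs, sst_fcst, k=15) + rng.normal(scale=sigma, size=10)
```

A full implementation would also square the ensemble values to undo the square root transform of *P* and choose *K* and *p* by minimizing the GCV score defined above.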

### d. The MOS downscaling methods

The MOS method calibrates the model output against observations. In this study, an interpolation-based MOS downscaling method, spatial disaggregation with bias correction (SDBC), was applied to each member and each lead of the NMME forecasts. The SDBC method was modified from the widely used bias correction and spatial disaggregation (BCSD) method (e.g., Christensen et al. 2004; Maurer et al. 2010; Salathe et al. 2007; Wood et al. 2002, 2004; Yoon et al. 2012).

The first step of the SDBC method is to interpolate the anomalies of the NMME *P* and T2M forecasts to the resolution of the NLDAS-2 (0.125° × 0.125°) using the inverse distance weighting (IDW) technique. The anomalies were the difference between the model forecast and the climatology of the model forecast for each season. The IDW technique first computes distances from the interpolating target grid point to all other grid points in the dataset and then calculates weights based on the distances. The *P* or T2M anomalies at the interpolating target grid points are the weighted averages of the *P* or T2M anomalies at the NMME grid points. The weighting function is

$$Z_o = \frac{\sum_{i=1}^{n} Z_i / d_i^p}{\sum_{i=1}^{n} 1 / d_i^p},$$

where *Z_i* is the anomaly at NMME grid point *i*, *d_i* is the distance between the interpolating target grid point (*X_o*, *Y_o*) and the NMME grid point (*X_i*, *Y_i*), *p* is the power to which the distance is raised (*p* = 2 was used in this work), and *n* is the number of grid points. The second step is to use the quantile mapping technique (Panofsky and Brier 1958; Wood et al. 2002) with the anomalies of the NLDAS-2 to bias correct the anomalies of the interpolated model output. The quantile mapping technique includes three steps: 1) creating cumulative distribution functions (CDFs) of the anomalies of the NLDAS-2 forcing data for each grid point, 2) creating CDFs of the NMME anomalies for each grid point and lead time, and 3) applying the quantile mapping, which preserves the probability of exceedance of each NMME anomaly but corrects it to the value with the same probability of exceedance in the NLDAS-2 anomalies. Thus, the bias-corrected data at time *i* and grid point *j* were calculated as

$$\tilde{x}_{i,j} = F_{\mathrm{obs}}^{-1}\left[F_{\mathrm{sim}}\left(x_{i,j}\right)\right],$$

where *F*(*u*) and *F*^{−1}(*u*) denote a CDF of the data and its inverse, and the subscripts sim and obs indicate the downscaled NMME anomalies and the NLDAS-2 anomalies, respectively. Thus, both the mean and the variance of the forecast anomalies were corrected in probability space. The third step of the SDBC method is to produce the forecast by adding the climatology of the NLDAS-2 to the bias-corrected interpolated anomalies of the NMME.

### e. Forecast verification

The deterministic forecasts were evaluated using the mean squared error skill score (MSESS). The mean squared error of the forecast (MSE_f) and the mean squared error of the climatology (MSE_c) were calculated for each grid point for each season and lead. The MSESS is then calculated as

$$\mathrm{MSESS} = 1 - \frac{\mathrm{MSE}_f}{\mathrm{MSE}_c}.$$

MSESS ranges from −∞ to 1.0, with values of 0 indicating the forecast has the same skill as climatology, negative values indicating the forecast has worse skill than climatology, and 1.0 indicating a perfect forecast.

The probabilistic forecasts were evaluated using the Brier skill score (BSS),

$$\mathrm{BSS} = 1 - \frac{\mathrm{BS}_f}{\mathrm{BS}_c},$$

where BS_f is the Brier score of the forecast and BS_c is the Brier score of climatology (the NLDAS-2 seasonal values in the forecast target season). The BS_f and BS_c were calculated as

$$\mathrm{BS}_f = \frac{1}{n}\sum_{i=1}^{n}\left(p_i^f - I_i^o\right)^2, \qquad \mathrm{BS}_c = \frac{1}{n}\sum_{i=1}^{n}\left(p_i^c - I_i^o\right)^2,$$

where *n* is the number of forecasts and observations of a dichotomous event, *p_i^f* is the forecasted probability of the event using the forecasts, and *p_i^c* is the forecasted probability of the event using climatology (which is always 33.3%); *I_i^o* = 1 if the event occurred and *I_i^o* = 0 if the event did not occur. The BSS ranges between −∞ and 1.0; values of 1 indicate perfect skill and values of 0 indicate skill equivalent to climatology.
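The two skill scores follow directly from their definitions and can be sketched as:

```python
import numpy as np

def msess(fcst, obs):
    """Mean squared error skill score against the observed climatology."""
    mse_f = np.mean((fcst - obs) ** 2)            # MSE of the forecast
    mse_c = np.mean((obs.mean() - obs) ** 2)      # MSE of climatology
    return 1.0 - mse_f / mse_c

def bss(p_fcst, occurred, p_clim=1.0 / 3.0):
    """Brier skill score for a tercile event; the climatological forecast
    probability is always 33.3%."""
    bs_f = np.mean((p_fcst - occurred) ** 2)      # Brier score of forecast
    bs_c = np.mean((p_clim - occurred) ** 2)      # Brier score of climatology
    return 1.0 - bs_f / bs_c

# A perfect deterministic forecast gives MSESS = 1; forecasting the
# climatological mean every time gives MSESS = 0.
obs = np.array([1.0, 2.0, 3.0, 4.0])
print(msess(obs.copy(), obs))                     # 1.0
print(msess(np.full(4, obs.mean()), obs))         # 0.0
```

The same endpoint behavior holds for the BSS: forecasting the event with probability 1 when it occurs and 0 otherwise gives BSS = 1, while always forecasting 33.3% gives BSS = 0.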

Since the forecasts need to be tested on a sample of new data that were not used for training, a cross-validation procedure was conducted for all the downscaling methods used in this study. We tested both leave-one-out and leave-three-out cross-validation procedures for the bias correction of SST. For leave-one-out (leave-three-out) cross validation, we left the target season (the target season plus its two neighboring seasons) out when creating the CDFs of the anomalies and calculating the NLDAS-2 climatology. A Student’s *t* test comparing the squared errors of the NMME ensemble forecasts from the two procedures found no significant difference between them (*p* value > 0.4 for all seasons and leads). In addition, the interannual autocorrelation in the data was small and insignificant. Therefore, the leave-one-out cross validation was justified and was used for all downscaling methods in this study.
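The cross-validation comparison can be sketched as follows. This is a simplified stand-in for the actual procedure: the "leave-three-out" scheme here drops the target index plus its two temporal neighbors, the held-out forecast is just the training-sample climatology, the data are synthetic, and the paired *t* statistic is computed by hand rather than with a statistics package:

```python
import numpy as np

def loo_indices(n, leave=1):
    """Yield (train, test) indices: leave the target out (leave=1) or
    the target plus its two neighbors (leave=3)."""
    for i in range(n):
        if leave == 1:
            drop = {i}
        else:
            drop = {max(i - 1, 0), i, min(i + 1, n - 1)}
        train = np.array([j for j in range(n) if j not in drop])
        yield train, i

def paired_t(a, b):
    """t statistic for paired differences (tested against a mean of 0)."""
    d = a - b
    return d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))

# Synthetic 29-yr series; cross-validated squared errors of the
# training-sample climatology under each scheme.
rng = np.random.default_rng(4)
y = rng.normal(size=29)
err1 = np.array([(y[i] - y[tr].mean()) ** 2 for tr, i in loo_indices(29, 1)])
err3 = np.array([(y[i] - y[tr].mean()) ** 2 for tr, i in loo_indices(29, 3)])
t_stat = paired_t(err1, err3)
```

If |t_stat| is small (as the study found, with *p* > 0.4), the cheaper leave-one-out scheme is justified.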

## 3. Results

In the following sections, we define skill scores below 0 as “no skill,” between 0 and 0.2 as “limited skill,” between 0.2 and 0.4 as “noticeable skill,” and above 0.4 as “high skill.” The results for near-normal probabilistic forecasts are not shown because all near-normal probabilistic forecasts showed no skill, as many previous studies have found (e.g., van den Dool and Toth 1991).

### a. Model performance on forecasting Niño-3.4 SST

Since ENSO has a significant impact on seasonal climate variability in the SEUS, we first evaluated the skill of the quantile mapping bias-corrected, spatially averaged SST in the Niño-3.4 region for each of the NMME models. Figure 2 shows the Niño-3.4 SST forecast skill for six NMME models as a function of season and lead time. All models showed high skill in predicting the SST for most leads. Although CFSv1 is the predecessor of CFSv2, CFSv1 was more skillful than CFSv2 in the fall and winter seasons, and CFSv2 showed lower skill at far leads when the target seasons were in spring. This result is consistent with Barnston and Tippett (2012), who explained that the low skill was mainly caused by a discontinuity in the initial condition climatology of the CFSv2, a discontinuity that is most prominent in the tropical Pacific region (Kumar et al. 2012; Xue et al. 2011).

### b. Overall mean forecasting skill

The overall mean forecasting skill was calculated by averaging the skill scores over space and seasons. Tables 2 and 3 summarize the overall mean skill scores of the *P* and T2M forecasts, respectively, for each downscaling method, model, and MME scheme for the first forecast season (0-month lead). On average, the SDBC and LWPR methods showed limited or no skill for most of the *P* and T2M forecasts.

Table 2. Overall mean precipitation forecast skill for the NMME models at lead 0. The highest skill scores for each model and each category are in boldface.

Table 3. As in Table 2, but for temperature.

### c. Performance of single models

In this section, we show results for the deterministic forecasts and the above- and below-normal probabilistic forecasts from the single NMME models.

Figures 3–6 show the spatial patterns of deterministic and probabilistic forecast skill for the *P* and T2M forecasts from six NMME models downscaled by SDBC and LWPR at 0-month lead for winter [December–February (DJF)] and summer [June–August (JJA)]. For the *P* forecasts downscaled by the SDBC method in winter, all six models showed noticeable or high skill over much of the SEUS. However, in summer, only CFSv1 and CFSv2 showed limited skill and the other four models showed no skill (Fig. 3). The LWPR method showed noticeable or high skill in DJF and limited or no skill in JJA (Fig. 4). For the T2M forecasts downscaled by the SDBC method in winter, only the CFSv2 and the GFDL models showed limited to high skill and the other four models showed no skill or limited skill over the region; in summer, only CFSv1 and CFSv2 showed noticeable or high skill for most of the region and the other models mostly showed no skill (Fig. 5). The LWPR method for the T2M forecasts mostly showed no skill or limited skill (Fig. 6).

The *P* and T2M forecast skill as a function of season and lead was calculated by averaging the skill scores over space. Figures 7–10 show the deterministic and probabilistic forecast skill for *P* and T2M from six NMME models downscaled by SDBC and LWPR as a function of target season and lead time. In Fig. 7, we compared the models’ *P* forecasts downscaled by the SDBC method. There was no skill when the forecast target was in the summer seasons; however, for deterministic forecasts, five models showed limited or noticeable skill at all leads for the winter seasons (the exception was CCSM3, which showed no skill), and for probabilistic forecasts, only CFSv1 and CFSv2 showed limited skill for the winter seasons. Similar to the SDBC method, most of the *P* forecasts downscaled by the LWPR showed limited skill in winter and no skill in summer (Fig. 8). Figures 9 and 10 provide the T2M forecast skill as a function of target season and lead time for the SDBC and LWPR methods, respectively. For the SDBC method, only the CFSv1 and CFSv2 models showed more than limited skill at near leads over all seasons; the other four models mostly showed no skill at all leads throughout the year (Fig. 9). The LWPR method mostly showed no skill even at near leads (Fig. 10).

To compare the skill of the LWPR and SDBC methods for each season and lead time, we calculated the mean difference between the skill scores of the LWPR and SDBC over all grid points (the LWPR skill scores minus the SDBC skill scores). Figure 11 shows that the LWPR method improved most models’ *P* forecast skill (except for the CFSv1 and CFSv2 models) relative to the SDBC method only for the summer seasons. Figure 12 shows that, for most leads and seasons, the LWPR method improved most models’ T2M forecast skill (except for the CFSv2 model) relative to the SDBC method. Since these comparisons were based on a very large sample size (3385 grid points), the 95% confidence intervals were extremely small (around ±0.005), and these improvements were considered statistically significant.

### d. Performance of ensemble forecasts

In this section, we report the deterministic and probabilistic forecast skill for the two MME forecasts (MeanEns and SuperEns) and compare these results with the single models.

The *P* and T2M forecast skill across the region is shown for winter (DJF) and summer (JJA) at 0-month lead (Figs. 13 and 14). For the *P* forecasts, the SDBC SuperEns and the LWPR SuperEns and MeanEns showed noticeable or high skill in winter but limited skill in summer; the SDBC MeanEns showed less skill than the other three methods in both seasons (Fig. 13). For the T2M forecasts, the SDBC SuperEns showed high or noticeable skill in DJF and JJA over the region; the SDBC MeanEns showed less skill than the SDBC SuperEns but higher skill than the LWPR SuperEns and MeanEns (Fig. 14). The *P* and T2M forecast skill at all leads was calculated by averaging the skill scores over space. For the *P* forecasts, the SDBC SuperEns and the LWPR SuperEns and MeanEns mostly showed limited skill in the cool seasons even at far leads but less than limited skill in the warm seasons, while the SDBC MeanEns mainly showed no skill at all leads in all seasons (Fig. 15). For the T2M forecasts, the SDBC MeanEns showed no skill after lead 1; the LWPR SuperEns and MeanEns showed limited or no skill in all seasons (Fig. 16). For both the *P* and T2M forecasts, the SDBC SuperEns always performed better than the SDBC MeanEns. This was because the models with more forecast members, such as CFSv2, showed higher skill, and the MeanEns assigned less weight than the SuperEns to these skillful models.

To compare the SuperEns with the single model forecasts for each season and lead time, we calculated the mean difference between the skill scores of the SuperEns and every single model over all grid points (SuperEns skill scores minus single model skill scores). Because of limited space, we only compared the forecasts downscaled by the SDBC method. Figures 17 and 18 show that the SuperEns *P* and T2M had skill similar to the CFSv2 *P* and T2M but higher skill than the other models. As explained in section 3c, these improvements were considered statistically significant.

## 4. Discussion and conclusions

We used an MOS approach (SDBC) and a PP approach (LWPR) to statistically downscale the NMME models and two MME schemes (SuperEns and MeanEns) for *P* and T2M forecasts. The downscaled probabilistic and deterministic forecast skill was assessed over the SEUS for each of the NMME models and model ensembles for all leads and seasons. For the SDBC method, the *P* forecasts of all NMME models and ensembles showed skill for the winter seasons but no skill for the summer seasons, whereas only the T2M forecasts of the CFSv2 and SuperEns showed skill throughout the year over the SEUS. For the LWPR method, the *P* forecasts showed positive skill in the winter seasons but limited or no skill in the summer seasons; the T2M forecasts mostly showed no skill in all seasons. To the authors’ knowledge, this work is the first to apply LWPR for statistical downscaling. While the skill was not high, in many cases the LWPR method showed significantly higher skill than the SDBC method. This result implies that, in some cases, local *P* can be better predicted indirectly from SSTs than by directly downscaling and bias correcting the model *P* field, whereas the T2M field could be better predicted by directly downscaling and bias correcting the model output. While the SuperEns showed higher skill than the MeanEns and skill similar to the best single models for the SDBC method, the SuperEns and MeanEns showed skill similar to each other and to the best single models for the LWPR method. In summary, the forecast skill varies among models, ensemble schemes, variables, seasons, and locations. The skillful downscaled forecasts could be useful in agriculture, water resources, human health, ecosystems, transportation, and infrastructure, and the downscaling framework could be applied in other parts of the world.

The PP methods used in this study are based on the relationship between the Niño-3.4 SST and the local *P* and T2M in the SEUS and, in many cases, showed higher skill than the MOS methods, which are based on bias correction of the spatially disaggregated raw model output. This indicates the GCMs were better able to simulate large-scale ocean variables than local-scale *P* and T2M, as explained by Juneng et al. (2010). The PP methods successfully enhanced the forecast skill by statistically relating the large-scale climate signal of ENSO to the local *P* and T2M in the SEUS. To improve the *P* forecast using the MOS procedure, some predictable low-level circulation fields, such as geopotential height fields, moisture fields at standard pressure levels, and thickness fields, could be considered as predictors of the *P* field (e.g., Landman et al. 2012). Since the NMME archive does not include low-level circulation fields, we did not consider them in this study, but they should be explored in future work.

The use of the PP method in this study was justified for two reasons. First, the local climate in the SEUS has strong connections with the ENSO signal between October and March, with El Niño (La Niña) years tending to be cooler (warmer) and wetter (drier) during these months (Ropelewski and Halpert 1986). Second, the Niño-3.4 SST, as a primary indicator of ENSO, was highly predictable by the NMME models, particularly in the winter seasons. However, none of the statistically downscaled forecasts showed even modest skill in the summer seasons. The poor performance of the MOS was likely due to the inability of the GCMs to simulate the strong local convective climate of the SEUS summer. Since ENSO is only significantly correlated with the local climate in the SEUS in the cool seasons, it is expected that the PP method would not do well in the summer seasons. Other than ENSO, there are other large-scale SST-based variables associated with the climate in the SEUS, including the Pacific decadal oscillation and the Atlantic multidecadal oscillation (e.g., Johnson et al. 2013; Martinez and Jones 2011; Martinez et al. 2009). The forecast skill of the PP method in the summer seasons could be further improved if the historical forecasts of the NMME system were long enough to represent these longer-term phenomena.

The MME forecasts in this study were based on combining all forecast members (SuperEns) or assigning equal weights to each model (MeanEns). For the SDBC method the SuperEns outperformed the MeanEns, whereas the differences between the two ensemble methods were very small when the downscaling method was LWPR. This was because for the SDBC method the models with higher skill typically contained more forecast members, while for the LWPR method the models showed similar skill. Although the skill of the SuperEns was generally higher than that of most single models, it did not outperform the best single model, likely because of the relatively poor performance of some of the NMME models. If the historical forecasts were long enough to provide a sufficient sample to robustly assign less weight to truly bad forecast members and more weight to truly good ones, weighted MME schemes (e.g., multiple linear regression, maximum likelihood, and Bayesian techniques) could be considered to further improve forecast skill. In addition, it is notable that the skill of the SDBC and LWPR methods was complementary in some cases, so forecast skill could potentially be improved by combining forecasts downscaled by the two methods.
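The contrast between the two unweighted combinations can be sketched in a few lines. The ensemble sizes and values below are hypothetical; the point is that SuperEns implicitly weights each model by its member count, while MeanEns weights all models equally.

```python
import numpy as np

# Hypothetical downscaled forecasts from three models with unequal
# ensemble sizes: each array is (members, years) at one grid cell.
rng = np.random.default_rng(1)
models = [rng.normal(0.0, 1.0, (m, 20)) for m in (24, 10, 6)]

# SuperEns: pool every member into one super-ensemble, so a model with
# 24 members contributes 24/40 of the combined mean.
super_ens = np.concatenate(models, axis=0).mean(axis=0)

# MeanEns: average the per-model ensemble means, so each model carries
# weight 1/3 regardless of its ensemble size.
mean_ens = np.mean([m.mean(axis=0) for m in models], axis=0)
```

A weighted scheme would replace the fixed 1/3 weights in MeanEns with weights estimated from hindcast skill, which is why it needs a long enough hindcast sample to be robust.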

Because of space limitations, this research evaluated the probabilistic forecasts only with the BSS, which measures overall probabilistic forecast accuracy, without a complete evaluation of the forecast systems' discrimination, reliability, sharpness, and other attributes. Such an evaluation could provide useful forecast information for end users and could be considered in future work.
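For reference, the BSS for a single binary event against a climatological reference forecast reduces to a few lines. The forecast probabilities and outcomes below are made up for illustration; in a tercile-based evaluation like the paper's, the climatological probability of each category is 1/3.

```python
import numpy as np

def brier_skill_score(p_fcst, obs, p_clim):
    """BSS = 1 - BS / BS_ref, where BS is the mean squared difference
    between forecast probabilities and binary outcomes, and BS_ref is
    the Brier score of the climatological reference forecast."""
    bs = np.mean((p_fcst - obs) ** 2)
    bs_ref = np.mean((p_clim - obs) ** 2)
    return 1.0 - bs / bs_ref

# Hypothetical forecast probabilities for the above-normal tercile and
# the corresponding binary observed outcomes over five years.
p_fcst = np.array([0.6, 0.2, 0.8, 0.4, 0.1])
obs = np.array([1, 0, 1, 0, 0])

# BSS > 0 indicates skill relative to climatology; 1 is a perfect score.
bss = brier_skill_score(p_fcst, obs, 1.0 / 3.0)
```

Attributes such as reliability and discrimination would instead require decomposing the Brier score or building reliability and ROC diagrams, which is the fuller evaluation deferred to future work.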

## Acknowledgments

This research was supported by NOAA-RISA Project NA12OAR4310130. The authors thank Seungwoo Chang of the University of Florida and Drs. Alison Adams and Tirusew Asefa of Tampa Bay Water for their useful discussions. The authors acknowledge Ying Zhang of University of Florida Research Computing (http://researchcomputing.ufl.edu) for providing computational support that contributed to the research results reported in this publication. The authors also thank three anonymous reviewers for their useful comments. The NMME hindcast dataset was provided by the International Research Institute for Climate and Society (IRI) at Columbia University. The NLDAS-2 forcing data used in this effort were acquired as part of the activities of NASA's Science Mission Directorate and are archived and distributed by the Goddard Earth Sciences (GES) Data and Information Services Center (DISC).

## REFERENCES

Abatzoglou, J. T., and T. J. Brown, 2012: A comparison of statistical downscaling methods suited for wildfire applications. *Int. J. Climatol.*, **32**, 772–780, doi:10.1002/joc.2312.

Barnston, A. G., and M. K. Tippett, 2012: A comparison of skill of CFSv1 and CFSv2 hindcasts of Niño3.4 SST. *Science and Technology Infusion Climate Bulletin*, NOAA/National Weather Service, Silver Spring, MD, 18–28. [Available online at http://www.nws.noaa.gov/ost/climate/STIP/37CDPW/37cdpw-tbarnston.pdf.]

Bohn, T. J., M. Y. Sonessa, and D. P. Lettenmaier, 2010: Seasonal hydrologic forecasting: Do multimodel ensemble averages always yield improvements in forecast skill? *J. Hydrometeor.*, **11**, 1358–1372, doi:10.1175/2010JHM1267.1.

Christensen, N., A. Wood, N. Voisin, D. Lettenmaier, and R. Palmer, 2004: The effects of climate change on the hydrology and water resources of the Colorado River basin. *Climatic Change*, **62**, 337–363, doi:10.1023/B:CLIM.0000013684.13621.1f.

Delworth, T. L., and Coauthors, 2006: GFDL's CM2 global coupled climate models. Part I: Formulation and simulation characteristics. *J. Climate*, **19**, 643–674, doi:10.1175/JCLI3629.1.

Dettinger, M., D. Cayan, M. Meyer, and A. Jeton, 2004: Simulated hydrologic responses to climate variations and change in the Merced, Carson, and American River basins, Sierra Nevada, California, 1900–2099. *Climatic Change*, **62**, 283–317, doi:10.1023/B:CLIM.0000013683.13346.4f.

DeWitt, D. G., 2005: Retrospective forecasts of interannual sea surface temperature anomalies from 1982 to present using a directly coupled atmosphere–ocean general circulation model. *Mon. Wea. Rev.*, **133**, 2972–2995, doi:10.1175/MWR3016.1.

Doblas-Reyes, F. J., R. Hagedorn, and T. N. Palmer, 2005: The rationale behind the success of multi-model ensembles in seasonal forecasting—II. Calibration and combination. *Tellus*, **57A**, 234–252, doi:10.1111/j.1600-0870.2005.00104.x.

Duan, Q., N. K. Ajami, X. Gao, and S. Sorooshian, 2007: Multi-model ensemble hydrologic prediction using Bayesian model averaging. *Adv. Water Resour.*, **30**, 1371–1386, doi:10.1016/j.advwatres.2006.11.014.

Feddersen, H., and U. Andersen, 2005: A method for statistical downscaling of seasonal ensemble predictions. *Tellus*, **57A**, 398–408, doi:10.1111/j.1600-0870.2005.00102.x.

Fowler, H. J., S. Blenkinsop, and C. Tebaldi, 2007: Linking climate change modelling to impacts studies: Recent advances in downscaling techniques for hydrological modelling. *Int. J. Climatol.*, **27**, 1547–1578, doi:10.1002/joc.1556.

Grantz, K., B. Rajagopalan, M. Clark, and E. Zagona, 2005: A technique for incorporating large-scale climate information in basin-scale ensemble streamflow forecasts. *Water Resour. Res.*, **41**, W10410, doi:10.1029/2004WR003467.

Hagedorn, R., F. J. Doblas-Reyes, and T. N. Palmer, 2005: The rationale behind the success of multi-model ensembles in seasonal forecasting—I. Basic concept. *Tellus*, **57A**, 219–233, doi:10.1111/j.1600-0870.2005.00103.x.

Haylock, M. R., G. C. Cawley, C. Harpham, R. L. Wilby, and C. M. Goodess, 2006: Downscaling heavy precipitation over the United Kingdom: A comparison of dynamical and statistical methods and their future scenarios. *Int. J. Climatol.*, **26**, 1397–1415, doi:10.1002/joc.1318.

Hidalgo, H. G., M. D. Dettinger, and D. R. Cayan, 2008: Downscaling with constructed analogues: Daily precipitation and temperature fields over the United States. California Energy Commission PIER Final Project Rep. CEC-500-2007-123, 62 pp.

Hwang, S., and W. D. Graham, 2013: Development and comparative evaluation of a stochastic analog method to downscale daily GCM precipitation. *Hydrol. Earth Syst. Sci.*, **17**, 4481–4502, doi:10.5194/hess-17-4481-2013.

Hwang, S., W. D. Graham, J. L. Hernández, C. Martinez, J. W. Jones, and A. Adams, 2011: Quantitative spatiotemporal evaluation of dynamically downscaled MM5 precipitation predictions over the Tampa Bay region, Florida. *J. Hydrometeor.*, **12**, 1447–1464, doi:10.1175/2011JHM1309.1.

Johnson, N. T., C. J. Martinez, G. A. Kiker, and S. Leitman, 2013: Pacific and Atlantic sea surface temperature influences on streamflow in the Apalachicola–Chattahoochee–Flint river basin. *J. Hydrol.*, **489**, 160–179, doi:10.1016/j.jhydrol.2013.03.005.

Juneng, L., F. T. Tangang, H. Kang, W.-J. Lee, and Y. K. Seng, 2010: Statistical downscaling forecasts for winter monsoon precipitation in Malaysia using multimodel output variables. *J. Climate*, **23**, 17–27, doi:10.1175/2009JCLI2873.1.

Kalra, A., and S. Ahmad, 2012: Estimating annual precipitation for the Colorado River basin using oceanic-atmospheric oscillations. *Water Resour. Res.*, **48**, W06527, doi:10.1029/2011WR010667.

Kang, H., K.-H. An, C.-K. Park, A. L. S. Solis, and K. Stitthichivapak, 2007: Multimodel output statistical downscaling prediction of precipitation in the Philippines and Thailand. *Geophys. Res. Lett.*, **34**, L15710, doi:10.1029/2007GL030730.

Kar, S. C., N. Acharya, U. C. Mohanty, and M. A. Kulkarni, 2012: Skill of monthly rainfall forecasts over India using multi-model ensemble schemes. *Int. J. Climatol.*, **32**, 1271–1286, doi:10.1002/joc.2334.

Kharin, V. V., and F. W. Zwiers, 2002: Climate predictions with multimodel ensembles. *J. Climate*, **15**, 793–799, doi:10.1175/1520-0442(2002)015<0793:CPWME>2.0.CO;2.

Kirtman, B. P., and D. Min, 2009: Multimodel ensemble ENSO prediction with CCSM and CFS. *Mon. Wea. Rev.*, **137**, 2908–2930, doi:10.1175/2009MWR2672.1.

Krishnamurti, T. N., C. M. Kishtawal, Z. Zhang, T. LaRow, D. Bachiochi, E. Williford, S. Gadgil, and S. Surendran, 2000: Multimodel ensemble forecasts for weather and seasonal climate. *J. Climate*, **13**, 4196–4216, doi:10.1175/1520-0442(2000)013<4196:MEFFWA>2.0.CO;2.

Kumar, A., M. Chen, L. Zhang, W. Wang, Y. Xue, C. Wen, L. Marx, and B. Huang, 2012: An analysis of the nonstationarity in the bias of sea surface temperature forecasts for the NCEP Climate Forecast System (CFS) version 2. *Mon. Wea. Rev.*, **140**, 3003–3016, doi:10.1175/MWR-D-11-00335.1.

Lall, U., Y. I. Moon, H. H. Kwon, and K. Bosworth, 2006: Locally weighted polynomial regression: Parameter choice and application to forecasts of the Great Salt Lake. *Water Resour. Res.*, **42**, W05422, doi:10.1029/2004WR003782.

Landman, W. A., and A. Beraki, 2012: Multi-model forecast skill for mid-summer rainfall over southern Africa. *Int. J. Climatol.*, **32**, 303–314, doi:10.1002/joc.2273.

Landman, W. A., D. DeWitt, D. Lee, A. Beraki, and D. Lötter, 2012: Seasonal rainfall prediction skill over South Africa: One- versus two-tiered forecasting systems. *Wea. Forecasting*, **27**, 489–501, doi:10.1175/WAF-D-11-00078.1.

Lavers, D., L. Luo, and E. F. Wood, 2009: A multiple model assessment of seasonal climate forecast skill for applications. *Geophys. Res. Lett.*, **36**, L23711, doi:10.1029/2009GL041365.

Loader, C., 1999: *Statistics and Computing: Local Regression and Likelihood.* Springer, 308 pp.

Luo, L., and E. F. Wood, 2008: Use of Bayesian merging techniques in a multimodel seasonal hydrologic ensemble prediction system for the eastern United States. *J. Hydrometeor.*, **9**, 866–884, doi:10.1175/2008JHM980.1.

Luo, L., E. F. Wood, and M. Pan, 2007: Bayesian merging of multiple climate model forecasts for seasonal hydrological predictions. *J. Geophys. Res.*, **112**, D10102, doi:10.1029/2006JD007655.

Maraun, D., and Coauthors, 2010: Precipitation downscaling under climate change: Recent developments to bridge the gap between dynamical models and the end user. *Rev. Geophys.*, **48**, RG3003, doi:10.1029/2009RG000314.

Martinez, C. J., and J. W. Jones, 2011: Atlantic and Pacific sea surface temperatures and corn yields in the southeastern USA: Lagged relationships and forecast model development. *Int. J. Climatol.*, **31**, 592–604, doi:10.1002/joc.2082.

Martinez, C. J., G. A. Baigorria, and J. W. Jones, 2009: Use of climate indices to predict corn yields in southeast USA. *Int. J. Climatol.*, **29**, 1680–1691, doi:10.1002/joc.1817.

Mason, S. J., and G. M. Mimmack, 2002: Comparison of some statistical methods of probabilistic forecasting of ENSO. *J. Climate*, **15**, 8–29, doi:10.1175/1520-0442(2002)015<0008:COSSMO>2.0.CO;2.

Maurer, E., H. Hidalgo, T. Das, M. Dettinger, and D. Cayan, 2010: The utility of daily large-scale climate data in the assessment of climate change impacts on daily streamflow in California. *Hydrol. Earth Syst. Sci.*, **14**, 1125–1138, doi:10.5194/hess-14-1125-2010.

McCabe, G. J., and M. D. Dettinger, 2002: Primary modes and predictability of year-to-year snowpack variations in the western United States from teleconnections with Pacific Ocean climate. *J. Hydrometeor.*, **3**, 13–25, doi:10.1175/1525-7541(2002)003<0013:PMAPOY>2.0.CO;2.

Mesinger, F., and Coauthors, 2006: North American Regional Reanalysis. *Bull. Amer. Meteor. Soc.*, **87**, 343–360, doi:10.1175/BAMS-87-3-343.

Ndiaye, O., M. N. Ward, and W. M. Thiaw, 2011: Predictability of seasonal Sahel rainfall using GCMs and lead-time improvements through the use of a coupled model. *J. Climate*, **24**, 1931–1949, doi:10.1175/2010JCLI3557.1.

Panofsky, H. A., and G. W. Brier, 1958: *Some Applications of Statistics to Meteorology.* The Pennsylvania State University, 224 pp.

Peng, P., A. Kumar, H. van den Dool, and A. G. Barnston, 2002: An analysis of multimodel ensemble predictions for seasonal climate anomalies. *J. Geophys. Res.*, **107**, 4710, doi:10.1029/2002JD002712.

Piechota, T., F. Chiew, J. Dracup, and T. McMahon, 2001: Development of exceedance probability streamflow forecast. *J. Hydrol. Eng.*, **6**, 20–28, doi:10.1061/(ASCE)1084-0699(2001)6:1(20).

Prairie, J., B. Rajagopalan, T. Fulp, and E. Zagona, 2005: Statistical nonparametric model for natural salt estimation. *J. Environ. Eng.*, **131**, 130–138, doi:10.1061/(ASCE)0733-9372(2005)131:1(130).

Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. *Mon. Wea. Rev.*, **133**, 1155–1174, doi:10.1175/MWR2906.1.

Regonda, S. K., B. Rajagopalan, M. Clark, and E. Zagona, 2006: A multimodel ensemble forecast framework: Application to spring seasonal flows in the Gunnison River basin. *Water Resour. Res.*, **42**, W09404, doi:10.1029/2005WR004653.

Rienecker, M. M., and Coauthors, 2008: The GEOS-5 Data Assimilation System—Documentation of versions 5.0.1, 5.1.0, and 5.2.0. NASA Tech. Rep. NASA/TM-2008-104606, Vol. 27, 118 pp.

Ropelewski, C. F., and M. S. Halpert, 1986: North American precipitation and temperature patterns associated with the El Niño/Southern Oscillation (ENSO). *Mon. Wea. Rev.*, **114**, 2352–2362, doi:10.1175/1520-0493(1986)114<2352:NAPATP>2.0.CO;2.

Saha, S., and Coauthors, 2006: The NCEP Climate Forecast System. *J. Climate*, **19**, 3483–3517, doi:10.1175/JCLI3812.1.

Saha, S., and Coauthors, 2014: The NCEP Climate Forecast System version 2. *J. Climate*, **27**, 2185–2208, doi:10.1175/JCLI-D-12-00823.1.

Salathe, E. P., P. W. Mote, and M. W. Wiley, 2007: Review of scenario selection and downscaling methods for the assessment of climate change impacts on hydrology in the United States Pacific Northwest. *Int. J. Climatol.*, **27**, 1611–1621, doi:10.1002/joc.1540.

Schmidt, N., E. K. Lipp, J. B. Rose, and M. E. Luther, 2001: ENSO influences on seasonal rainfall and river discharge in Florida. *J. Climate*, **14**, 615–628, doi:10.1175/1520-0442(2001)014<0615:EIOSRA>2.0.CO;2.

Singhrattna, N., B. Rajagopalan, M. Clark, and K. Krishna Kumar, 2005: Seasonal forecasting of Thailand summer monsoon rainfall. *Int. J. Climatol.*, **25**, 649–664, doi:10.1002/joc.1144.

Sun, H., and D. J. Furbish, 1997: Annual precipitation and river discharges in Florida in response to El Niño- and La Niña-sea surface temperature anomalies. *J. Hydrol.*, **199**, 74–87, doi:10.1016/S0022-1694(96)03303-3.

Tian, D., and C. J. Martinez, 2012a: Comparison of two analog-based downscaling methods for regional reference evapotranspiration forecasts. *J. Hydrol.*, **475**, 350–364, doi:10.1016/j.jhydrol.2012.10.009.

Tian, D., and C. J. Martinez, 2012b: Forecasting reference evapotranspiration using retrospective forecast analogs in the southeastern United States. *J. Hydrometeor.*, **13**, 1874–1892, doi:10.1175/JHM-D-12-037.1.

Tian, D., C. J. Martinez, and W. D. Graham, 2014: Seasonal prediction of regional reference evapotranspiration (ETo) based on Climate Forecast System version 2 (CFSv2). *J. Hydrometeor.*, **15**, 1166–1188, doi:10.1175/JHM-D-13-087.1.

Tippett, M. K., and A. G. Barnston, 2008: Skill of multimodel ENSO probability forecasts. *Mon. Wea. Rev.*, **136**, 3933–3946, doi:10.1175/2008MWR2431.1.

Troccoli, A., 2010: Seasonal climate forecasting. *Meteor. Appl.*, **17**, 251–268, doi:10.1002/met.184.

Troccoli, A., M. Harrison, D. L. T. Anderson, and S. J. Mason, 2008: *Seasonal Climate: Forecasting and Managing Risk.* Earth and Environmental Studies, Vol. 82, Springer Verlag, 467 pp.

van den Dool, H. M., and Z. Toth, 1991: Why do forecasts for "near normal" often fail? *Wea. Forecasting*, **6**, 76–85, doi:10.1175/1520-0434(1991)006<0076:WDFFNO>2.0.CO;2.

Weisheimer, A., and Coauthors, 2009: ENSEMBLES: A new multi-model ensemble for seasonal-to-annual predictions—Skill and progress beyond DEMETER in forecasting tropical Pacific SSTs. *Geophys. Res. Lett.*, **36**, L21711, doi:10.1029/2009GL040896.

Wilks, D. S., 2011: *Statistical Methods in the Atmospheric Sciences.* 3rd ed. Elsevier, 676 pp.

Wood, A. W., E. P. Maurer, A. Kumar, and D. P. Lettenmaier, 2002: Long-range experimental hydrologic forecasting for the eastern United States. *J. Geophys. Res.*, **107**, 4429, doi:10.1029/2001JD000659.

Wood, A. W., L. R. Leung, V. Sridhar, and D. P. Lettenmaier, 2004: Hydrologic implications of dynamical and statistical approaches to downscaling climate model outputs. *Climatic Change*, **62**, 189–216, doi:10.1023/B:CLIM.0000013685.99609.9e.

Xia, Y., and Coauthors, 2012a: Continental-scale water and energy flux analysis and validation for the North American Land Data Assimilation System project phase 2 (NLDAS-2): 1. Intercomparison and application of model products. *J. Geophys. Res.*, **117**, D03109, doi:10.1029/2011JD016048.

Xia, Y., and Coauthors, 2012b: Continental-scale water and energy flux analysis and validation for North American Land Data Assimilation System project phase 2 (NLDAS-2): 2. Validation of model-simulated streamflow. *J. Geophys. Res.*, **117**, D03110, doi:10.1029/2011JD016051.

Xue, Y., B. Huang, Z.-Z. Hu, A. Kumar, C. Wen, D. Behringer, and S. Nadiga, 2011: An assessment of oceanic variability in the NCEP climate forecast system reanalysis. *Climate Dyn.*, **37**, 2511–2539, doi:10.1007/s00382-010-0954-4.

Yates, D., S. Gangopadhyay, B. Rajagopalan, and K. Strzepek, 2003: A technique for generating regional climate scenarios using a nearest-neighbor algorithm. *Water Resour. Res.*, **39**, 1199, doi:10.1029/2002WR001769.

Yoon, J.-H., K. Mo, and E. F. Wood, 2012: Dynamic-model-based seasonal prediction of meteorological drought over the contiguous United States. *J. Hydrometeor.*, **13**, 463–482, doi:10.1175/JHM-D-11-038.1.

Yuan, X., E. F. Wood, L. Luo, and M. Pan, 2011: A first look at Climate Forecast System version 2 (CFSv2) for hydrological seasonal prediction. *Geophys. Res. Lett.*, **38**, L13402, doi:10.1029/2011GL047792.

Yuan, X., E. F. Wood, J. K. Roundy, and M. Pan, 2013: CFSv2-based seasonal hydroclimatic forecasts over conterminous United States. *J. Climate*, **26**, 4828–4847, doi:10.1175/JCLI-D-12-00683.1.

Yun, W. T., L. Stefanova, and T. N. Krishnamurti, 2003: Improvement of the multimodel superensemble technique for seasonal forecasts. *J. Climate*, **16**, 3834–3840, doi:10.1175/1520-0442(2003)016<3834:IOTMST>2.0.CO;2.