1. Introduction
The accurate estimation of evapotranspiration (ET) is needed for determining agricultural water demand, estimating reservoir losses, and driving hydrologic simulation models. In typical hydrological and agricultural practice, evapotranspiration is calculated from reference evapotranspiration (ET0), the evapotranspiration from a well-watered reference surface. To provide a common, globally valid standardized method for estimating ET0, the FAO-56 Penman–Monteith (PM) equation (Allen et al. 1998) was adopted by the Food and Agriculture Organization (FAO) of the United Nations. While the physically based PM equation has been shown to estimate ET0 accurately (Chiew et al. 1995; Garcia et al. 2004; López-Urrea et al. 2006; Yoder et al. 2005), it requires a large amount of meteorological data that are not available in many regions.
Forecast output from numerical weather prediction models (NWPMs) and global climate models (GCMs) is potentially useful for ET0 forecasting. However, local application usually requires a finer resolution than is currently available from most coarse-scale NWPM and GCM output (Fowler et al. 2007). Downscaling techniques, either dynamical or statistical, address this problem. Dynamical downscaling nests a regional climate model (RCM) in an NWPM or GCM to produce spatially complete fields of climate variables, thus preserving some spatial correlation as well as physically plausible relationships between variables (Maurer and Hidalgo 2008). However, dynamical downscaling suffers from biases introduced by the driving model and from high computational demand (Abatzoglou and Brown 2012; Hwang et al. 2011; Plummer et al. 2006). Statistical downscaling methods develop empirical mathematical relationships between output from NWPM–GCMs and local climate observations (Barsugli et al. 2009). The advantages of statistical downscaling are its computational efficiency and its ability to be applied across multiple models to develop ensembles for scenario building (Abatzoglou and Brown 2012). Because of these advantages, extensive research has been conducted on statistical downscaling for a variety of purposes in recent years. For example, Maurer and Hidalgo (2008) downscaled reanalysis precipitation data over the western United States using both constructed analogs (CA) and the bias correction and spatial downscaling (BCSD) method of Wood et al. (2004). Another study compared three statistical downscaling methods (BCSD, CA, and a hybrid of the two [bias correction and constructed analogs (BCCA)]) for downscaling reanalysis data and used the downscaled data to drive hydrologic models (Maurer et al. 2010). Abatzoglou and Brown (2012) compared two statistical downscaling methods, BCSD and multivariate adaptive constructed analogs (MACA), to downscale reanalysis data for wildfire applications in the western United States.
Several studies have been conducted on ET0 forecasts in recent years. Cai et al. (2007, 2009) developed and applied ET0 forecasts using weather forecast messages produced by the China Meteorological Administration. Several studies have focused on the use of artificial neural network (ANN) or other empirical models to simulate or forecast ET0 (Chattopadhyay et al. 2009; Dai et al. 2009; Kumar et al. 2011; Landeras et al. 2009; Ozkan et al. 2011; Pal and Deswal 2009; Wang et al. 2011); a recent review of ANN modeling of ET0 can be found in Kumar et al. (2011). Comparatively few studies have dynamically or statistically downscaled ET0 forecasts from NWPMs or GCMs. Ishak et al. (2010) used the fifth-generation Pennsylvania State University–National Center for Atmospheric Research Mesoscale Model (MM5) to dynamically downscale 40-yr European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40) data in the Brue catchment in southwest England and found ET0 to be overestimated by 27%–46%. However, Ishak et al. (2010) noted clear patterns in the downscaled weather variables that could be used to correct the bias in the results. Silva et al. (2010) found that statistically corrected MM5 estimates of ET0 improved results compared to raw model output in the Maipo basin in Chile. Direct statistical downscaling methods (i.e., methods that downscale directly from coarse-scale NWPM or GCM output) that specifically address the needs of ET0 forecasting appear to be generally lacking to date.
Archives of NWPM reforecasts have recently been made available for diagnosing model bias and improving forecasts, including the reforecast dataset generated with the National Centers for Environmental Prediction's (NCEP's) Global Forecast System (GFS; Hamill et al. 2006) and that from the ECMWF (Hamill et al. 2008; Hagedorn et al. 2008). A series of articles has introduced the use of the GFS reforecast dataset, including week-2 forecasts (Hamill et al. 2004; Whitaker et al. 2006), short-range precipitation forecasts (Hamill and Whitaker 2006; Hamill et al. 2006, 2008), forecasts of geopotential heights and temperature (Hagedorn et al. 2008; Hamill and Whitaker 2007; Wilks and Hamill 2007), and streamflow predictions (Clark and Hay 2004; Muluye 2011; Werner et al. 2005). The GFS reforecast dataset includes over 31 years of 1–15-day, 15-member forecasts for multiple variables at T62 (approximately 200 km) resolution (Hamill et al. 2006). Hamill and Whitaker (2006) demonstrated that a forecast analog technique could produce simple, skillful probabilistic forecasts at high spatial resolution using the GFS reforecast archive. The analog technique has been found to perform, in general, as well as or better than other statistical downscaling methods (Timbal and McAvaney 2001; Wilby et al. 2004; Zorita and von Storch 1999). The GFS reforecast archive contains forecasts of wind speed, temperature, and relative humidity, which are important inputs to the PM equation, and thus may be useful for generating probabilistic ET0 forecasts. However, the archive does not include incoming solar radiation or maximum and minimum temperature (forecast variables are archived at 12-h intervals), which are also important inputs to the PM equation.
In this work, we employ the GFS reforecast dataset to generate 1–15-day probabilistic daily ET0 forecasts and downscale the forecasts using a forecast analog technique over the states of Alabama, Florida, Georgia, North Carolina, and South Carolina in the southeastern United States. Sections 2 and 3 provide the data and methodology used in this work. The description of results and discussion are presented in section 4. Concluding remarks are given in section 5.
2. Data


Since the GFS reforecast archive does not include solar radiation output and includes variables at a relatively coarse temporal resolution (12 h), this work also employed the NCEP–DOE Reanalysis 2 (R2) dataset (Kanamitsu et al. 2002). The R2 dataset includes variables such as incoming solar radiation (Rs) and daily maximum and minimum temperature and is available from 1979 to present at T62 resolution. Daily climatological mean values of the R2 variables were calculated using a running ±30-day window.
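As an illustration of this climatology step, a minimal sketch is given below; the array layout, variable names, and the use of day-of-year indexing are assumptions made for illustration rather than the processing code actually used.

```python
import numpy as np

def running_daily_climatology(values, doy, half_window=30):
    """Daily climatological mean using a running +/- half_window day window.

    values : 1-D array of daily values (e.g., R2 incoming solar radiation)
    doy    : 1-D array of day-of-year (1-366) for each element of `values`
    Returns an array of length 366 holding the climatological mean for each
    calendar day, computed over all years in the series.
    """
    clim = np.full(366, np.nan)
    for d in range(1, 367):
        # circular day-of-year distance so the window wraps the year boundary
        dist = np.abs(doy - d)
        dist = np.minimum(dist, 366 - dist)
        in_window = dist <= half_window
        if in_window.any():
            clim[d - 1] = values[in_window].mean()
    return clim
```

The climatological value assigned to a given forecast date is then simply the entry of this array corresponding to that date's day of year.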
As the long-term continuous observed meteorological data needed for the estimation of ET0 are generally not available, the North American Regional Reanalysis (NARR) dataset (Mesinger et al. 2006) was used for forecast verification. The NARR dataset contains all of the variables required to estimate ET0 (as described below) and is available at an approximately 32-km grid resolution. While the NARR dataset may contain biases (e.g., Markovic et al. 2009; Vivoni et al. 2008; Zhu and Lettenmaier 2007), ET0 calculated from the NARR data was taken as a surrogate for long-term observations in this work.
3. Methods
a. ET0 calculation methods
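The PM-based methods summarized in Table 1 are all built on the FAO-56 Penman–Monteith equation of Allen et al. (1998), referred to below as Eq. (2), which for daily time steps can be written as

$$\mathrm{ET}_0 = \frac{0.408\,\Delta\,(R_n - G) + \gamma\,\dfrac{900}{T + 273}\,u_2\,(e_s - e_a)}{\Delta + \gamma\,(1 + 0.34\,u_2)}, \quad (2)$$

where ET0 is the reference evapotranspiration (mm day−1), Rn is net radiation at the reference surface (MJ m−2 day−1), G is the soil heat flux density (MJ m−2 day−1; taken as zero at the daily time step), T is the mean daily air temperature at 2-m height (°C), u2 is the wind speed at 2-m height (m s−1), es and ea are the saturation and actual vapor pressures (kPa), Δ is the slope of the saturation vapor pressure curve (kPa °C−1), and γ is the psychrometric constant (kPa °C−1). The calculation of the individual terms is given in appendix A. The Thorn method instead uses the temperature-based equation of Thornthwaite (1948), which requires only mean air temperature.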










b. Forecast analog method
The ET0 forecasts were produced using a moving spatial window forecast analog approach, as described by Hamill and Whitaker (2006) and Hamill et al. (2006). The moving window approach uses a limited number of GFS reforecast grid points, which increases the likelihood of finding skillful natural analogs (van den Dool 1994). At each lead time, the forecasted ET0, calculated with one of the approaches in Table 1, was compared against the historical reforecast archive to find the subset of analog forecasts that were most similar (based on root-mean-square error) within the limited spatial region. Once the dates of the nearest analogs were chosen, the corresponding fine-resolution estimates for those dates were obtained from the 32-km-grid ET0 values computed from the NARR. For this work, forecast analogs were chosen from a subset of nine grid points (Fig. 1), which was used to determine the finescale analogs within the interior of the domain (Fig. 1). This process was then repeated across the region of interest. Following Hamill and Whitaker (2006), analog forecasts were selected within a ±45-day window around the date of the forecast, and the best 75 analogs were chosen to construct the forecast ensemble. A cross-validation procedure was employed in which dates from the current year were excluded from the list of potential analogs.
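A minimal sketch of this selection step is given below, assuming the coarse-grid forecasted ET0 and the archived reforecast ET0 at the nine GFS grid points have already been computed for the lead time of interest; the array names, shapes, and calendar handling are illustrative assumptions rather than the exact implementation.

```python
import numpy as np

def select_analogs(fcst_patch, archive_patches, archive_doy, archive_year,
                   target_doy, target_year, n_analogs=75, day_window=45):
    """Pick the dates of the best forecast analogs for one moving window.

    fcst_patch      : (9,) forecasted ET0 at the nine coarse GFS grid points
    archive_patches : (N, 9) archived reforecast ET0 at the same points/lead
    archive_doy     : (N,) day of year of each archived reforecast
    archive_year    : (N,) year of each archived reforecast
    Returns indices of the n_analogs reforecasts with the lowest RMSE within
    +/- day_window days of the target date, excluding the target year
    (cross validation).
    """
    # circular day-of-year distance so the window spans the year boundary
    ddoy = np.abs(archive_doy - target_doy)
    ddoy = np.minimum(ddoy, 366 - ddoy)
    candidates = (ddoy <= day_window) & (archive_year != target_year)

    rmse = np.sqrt(np.mean((archive_patches - fcst_patch) ** 2, axis=1))
    rmse[~candidates] = np.inf          # exclude dates outside the window
    return np.argsort(rmse)[:n_analogs]  # dates of the closest analogs
```

The NARR ET0 fields on the dates returned by such a routine then form the downscaled ensemble for the fine-resolution points inside that window (Fig. 1).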
Table 1. Summary of methods used to find forecast analogs.



Fig. 1. Example of a subset of nine grid points covering the Tampa Bay area. The small points denote where NARR data are available, the large black points denote where the GFS reforecast and Reanalysis 2 data are available, and the small gray points denote where the analog patterns are applied, as selected from the surrounding large black points.
Since the GFS reforecast archive does not include solar radiation output and only includes variables at a 12-h temporal resolution, this work employed the R2 dataset in the selection of forecast analogs, either by substituting long-term climatological mean daily values or by bias correcting the GFS reforecasts (Table 1). The rationale for substituting climatological mean values from the R2 dataset is that it provides a reasonable (though imperfect) estimate of the parameter in question. In doing so, it lessens that parameter's importance in the selection of forecast analogs (since all of the potential analogs within the ±45-day window will have been calculated using the same or similar values), and the analog selection becomes weighted toward the other terms in Eq. (2).
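For the bias-corrected approach, the sketch below illustrates one simple possibility, a climatology-based adjustment of a GFS reforecast variable toward the corresponding R2 daily climatology; it is meant only to convey the idea and is not necessarily the exact correction procedure used for PM_BC.

```python
import numpy as np

def climatology_bias_correct(gfs_fcst, gfs_clim, r2_clim, multiplicative=False):
    """Adjust a GFS reforecast value using R2 as the reference climatology.

    gfs_fcst : forecasted value(s) for a given day of year and lead time
    gfs_clim : GFS reforecast climatological mean for that day of year
    r2_clim  : R2 climatological mean for the same day of year
    An additive shift is used by default (reasonable for temperature); a
    multiplicative scaling may be preferable for positive-definite
    quantities such as wind speed or solar radiation.
    """
    if multiplicative:
        return gfs_fcst * (r2_clim / np.maximum(gfs_clim, 1e-6))
    return gfs_fcst + (r2_clim - gfs_clim)
```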



c. Evaluation procedure
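The skill scores referred to below are the linear error in probability space (LEPS) skill score (Potts et al. 1996) for the overall forecasts and the Brier skill score (BSS) for the categorical (tercile and extreme) forecasts; the BSS takes its standard form (e.g., Wilks 2006),

$$\mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N} (p_i - o_i)^2, \qquad \mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{\mathrm{clim}}},$$

where p_i is the forecast probability of the event for forecast i, o_i equals 1 if the event was observed (in the NARR ET0) and 0 otherwise, and BS_clim is the Brier score of the climatological forecast. Positive BSS indicates skill relative to climatology.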














The resolution and reliability of the categorical forecasts were evaluated using relative operating characteristic (ROC) diagrams (e.g., Sobash et al. 2011; Wilks 2006) and reliability diagrams (e.g., Wilks 2006), respectively. The ROC diagram compares hit rates to false alarm rates at different forecast probability levels and measures how well the probabilistic forecast discriminates between events and nonevents. An ROC curve that lies along the 1:1 line indicates no skill, while a curve bowed far toward the upper-left corner indicates high skill. The reliability diagram indicates the degree to which forecast probabilities match observed frequencies. An overall measure of the reliability of the forecasts is the deviation of the reliability curve from the diagonal: for a perfectly reliable forecast system the curve lies along the diagonal, and curves below (above) the diagonal indicate overforecasting (underforecasting). The nearer the curve is to horizontal, the less resolution in the forecast.
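As an illustration of how the points of an ROC curve can be derived from the analog ensemble, a minimal sketch follows; the probability and observation arrays and the probability thresholds are illustrative assumptions.

```python
import numpy as np

def roc_points(prob_fcst, event_obs, thresholds=np.linspace(0.0, 1.0, 11)):
    """Hit rate and false alarm rate at a set of forecast probability levels.

    prob_fcst : (N,) forecast probabilities of the event (e.g., the fraction
                of the 75 analog members exceeding the upper-tercile value)
    event_obs : (N,) 1 if the event occurred in the verification (NARR ET0)
    Returns arrays of (false alarm rate, hit rate), one pair per threshold,
    which trace the ROC curve when plotted against each other.
    """
    event_obs = event_obs.astype(bool)
    hit_rates, false_alarm_rates = [], []
    for t in thresholds:
        warned = prob_fcst >= t
        hits = np.sum(warned & event_obs)
        misses = np.sum(~warned & event_obs)
        false_alarms = np.sum(warned & ~event_obs)
        correct_rejections = np.sum(~warned & ~event_obs)
        hit_rates.append(hits / max(hits + misses, 1))
        false_alarm_rates.append(
            false_alarms / max(false_alarms + correct_rejections, 1))
    return np.array(false_alarm_rates), np.array(hit_rates)
```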
4. Results and discussion
Tables 2 and 3 show the overall mean LEPS skill scores and BSSs for lead days 1 and 5 for all ET0 methods used to find forecast analogs. Overall, PM_RH and PM_RHRs, which used 700-mb RH data, were more skillful than the other methods. For lead day 1, PM_RHRs had the highest skill for the overall results (based on the LEPS skill score), lower extremes, lower terciles, and upper terciles, while Thorn and PM_Rs had the highest skill for the middle terciles and upper extremes, respectively (Table 2). Among the five methods that did not use GFS reforecast RH data, Thorn, which used the Thornthwaite equation with only mean temperature, had the highest skill for the middle terciles (Table 2); PM_Rs, which used the PM equation with climatological mean Rs from the R2 dataset combined with temperature and wind speed from the GFS reforecast archive (Table 1), showed the highest forecast skill for the overall results, lower extremes, lower terciles, upper terciles, and upper extremes. For lead day 5, the overall skill (based on the LEPS skill score) of PM_RH, Thorn, PM_Rs, and PM_RHRs was approximately 8.0, outperforming PM_GFS, PM_RsT, and PM_BC (Table 3). The lower extreme and middle tercile forecasts showed no skill at lead day 5 for any of the seven methods, while some forecasts were skillful for the lower tercile, upper tercile, and upper extreme categories, with PM_Rs and PM_RHRs showing the highest skill for the lower and upper terciles and PM_Rs and PM_RsT showing the highest skill for the upper extremes.
Table 2. The overall average LEPS skill score and BSS for lead day 1. The LEPS skill score and BSS evaluate the overall skill and the categorical skill, respectively; the five categories represent <10%, <⅓, ⅓–⅔, >⅔, and >90%. The highest scores in each skill category are highlighted in bold.

Table 3. As in Table 2, but for lead day 5.
a. Evaluation of reference evapotranspiration methods in time
Figures 2–4 show the skill of the ET0 forecasts, by method, for lead days 1 and 5 in terms of the LEPS skill score (Fig. 2), the BSS of forecasted extreme values (Fig. 3), and the BSS of forecasted terciles (Fig. 4). Overall, ET0 forecasts from PM_RH and PM_RHRs were more skillful than those from the methods that did not use GFS reforecast 700-mb RH data. According to the LEPS skill score (Fig. 2), the PM_RH and PM_RHRs lead day 1 forecasts showed similar patterns of skill and were generally more skillful than the other methods in cold months, with PM_BC showing slightly higher skill in warmer months (May–August). The LEPS skill score at lead day 5 shows that skill was higher in cold months than in warm months for all seven methods; PM_RHRs still performed best in cold months, while PM_GFS and PM_BC were jointly the most skillful in warm months. In terms of the BSS for lower extreme forecasts (Fig. 3), PM_RHRs was the most skillful at lead days 1 and 5 when the BSS was above zero. For upper extreme forecasts, PM_Rs and PM_RsT generally showed the highest skill during the cold months, while PM_BC showed the greatest skill in July and August at both lead day 1 and lead day 5. In terms of the BSS for tercile forecasts (Fig. 4), PM_RHRs showed the greatest skill at both lead day 1 and lead day 5 for the lower tercile forecasts, with PM_BC showing slightly higher skill in June and July. For the middle tercile forecasts, the BSSs of all seven methods over the year were within a range of −0.1 to 0.1, and no single method was consistently best across months. For the upper tercile forecasts, PM_RHRs had the highest skill in cold months, while PM_BC and PM_Rs showed the highest skill in the other months.

Fig. 2. Comparison of the LEPS skill score for the seven methods as a function of time of the year: (a) LEPS skill score at lead day 1 and (b) LEPS skill score at lead day 5.

Fig. 3. Comparison of the BSS of the (top) lower and (bottom) upper extreme forecasts for the seven methods as a function of time of the year: BSS at lead day (a) 1 and (b) 5.

Fig. 4. As in Fig. 3, but for tercile forecasts and with the addition of the middle tercile BSS.
While the GFS reforecast RH data used here were at 700 mb [and not at 2-m height as typically required for Eq. (2)], they nevertheless played an important role in improving the forecast skill in the humid southeastern United States. In addition, using the external R2 Rs climatology was found to improve skill in cooler months, while the PM_BC approach produced higher skill in warmer months. In the PM equation, Δ and u2 were calculated using readily available output from the GFS reforecast archive, while ea, es, and Rn require, or can be estimated from (in the case of Rn), Tmax and Tmin. Here, Tmax and Tmin were estimated from the available 12-h output of the GFS reforecast archive, and daily RHmean was also determined from 12-h output and at 700 mb, which arguably makes these the most uncertain terms in Eq. (2). Comparison of PM_GFS, Thorn, PM_Rs, PM_RsT, and PM_BC suggests that the Thornthwaite equation, using only mean temperature data, achieved skill similar to the PM equation driven by Tmean, u2, and approximate (12-h) GFS reforecast Tmin and Tmax (PM_GFS), indicating that the PM equation alone did not improve forecast skill without RHmean. This result also suggests that all of the methods, whether based on the PM or the Thornthwaite equation, were skillful relative to climatology, albeit less so than when GFS reforecast RH data were included. Comparison of PM_RH (which did not use the R2 Rs climatology) with PM_RHRs (which did) suggests that the addition of Rs contributed only slightly to the improvement in skill and that the majority of the improvement was due to the inclusion of GFS reforecast RH. Comparison of PM_GFS, PM_Rs, PM_RsT, and PM_BC suggests that bias correcting all of the GFS reforecast variables with R2 (PM_BC) can improve the forecast skill above that obtained by replacing those variables with R2 climatology (PM_RsT). For the ET0 calculated by PM_RHRs, it is important to note that this method used the R2 climatology of Rs, so the selection of candidate analogs was likely weighted more toward the GFS reforecast Tmean, RH, and u2. As the comparison of PM_GFS, PM_RH, and Thorn suggests, u2 was likely less important than Tmean and RH in the selection of analogs.
In Figs. 5 and 6 the BSSs of PM_RH and PM_RHRs are replotted to show the relative skill among the five categories. For both methods, the upper and lower terciles showed the greatest skill at both lead day 1 and lead day 5, the middle tercile showed the least skill at lead day 1, and the extreme forecasts showed the least skill at lead day 5. All forecasts in the five categories were skillful over all 12 months at lead day 1 except the middle tercile forecasts in June, August, and September. At lead day 5, only the upper tercile, lower tercile, and lower extreme forecasts were skillful, and only from December to July; there was no skill in the other categories and months.

Fig. 5. Comparison of the categorical forecasts in time for PM_RH at lead day (a) 1 and (b) 5.

Fig. 6. As in Fig. 5, but for PM_RHRs.
Figures 7 and 8 show the ROC and reliability diagrams for the five categorical forecasts for PM_GFS, PM_RH, and PM_RHRs at lead days 1 and 5. The ROC diagrams in Fig. 7 show that the lower tercile forecasts had the greatest resolution for all methods and both lead days, followed by the upper tercile, upper extreme, lower extreme, and middle tercile forecasts, respectively. The resolution of the three methods was very similar, and similar results were found for the lead day 1 and lead day 5 forecasts. In Fig. 8, there were few differences in reliability among the three methods and the two lead days. All methods showed some overforecasting bias, particularly at low probabilities and for the upper tercile, lower tercile, and upper extreme forecasts; high probabilities for the upper extremes were overforecasted for all three methods. Although the LEPS skill score and BSS showed that PM_RH and PM_RHRs were more skillful than PM_GFS and that skill at lead day 1 was greater than at lead day 5, the ROC and reliability diagrams show that the resolution and reliability of the three methods and two lead days were similar. While the BSS indicated no skill in several instances, the corresponding ROC and reliability diagrams showed positive skill. One possible explanation is that climatological ET0 may have significant variability across locations and seasons, which can produce artificially high skill, as explained in Hamill and Juras (2006).

Fig. 7. ROC diagrams: (a) PM_GFS lead day 1, (b) PM_GFS lead day 5, (c) PM_RH lead day 1, (d) PM_RH lead day 5, (e) PM_RHRs lead day 1, and (f) PM_RHRs lead day 5.

Fig. 8. As in Fig. 7, but for reliability diagrams.
Figure 9 shows the mean monthly and annual ET0 forecasts for lead days 1 and 5 using PM_RHRs, along with the 10th and 90th percentiles of the forecasts and observations. The figure indicates that the mean forecasted monthly ET0 matched the mean NARR monthly ET0 well, with slight underprediction in May–September and overprediction in November–January at both lead days. Similarly, the 10th percentile of the monthly ET0 was slightly overpredicted in warm months and underpredicted in cool months; conversely, the 90th percentile forecasts underpredicted during warmer months and overpredicted in cooler months. Figure 9 also indicates that the annual variation of the ET0 forecasts generally followed the observations at both lead days.

Fig. 9. Comparison of the (top) monthly and (bottom) annual ET0 over October 1979–September 2009 for NARR ET0 and the PM_RHRs ensemble mean ET0 analogs: lead day (a) 1 and (b) 5. The gray zones represent the 10th to 90th percentile of NARR ET0, reflecting the spatial variation of the monthly and annual ET0 over the five states.
b. Evaluation of reference evapotranspiration methods in space
Based on the LEPS skill score, the PM_RHRs forecasts showed greater overall skill than PM_RH at both lead day 1 and lead day 5 (Fig. 10). The forecast skill at both lead days was highest in the west and northeast. Forecast skill was lowest over Florida, near the coast, and in the more mountainous region where North Carolina, South Carolina, and Georgia meet. This may be due to the inability of the coarse-scale analogs to accurately match local-scale phenomena related to topographic effects and the influence of the sea breeze over the Florida peninsula (e.g., Marshall et al. 2004; Misra et al. 2011).

Fig. 10. The average LEPS skill score of (top) PM_RH and (bottom) PM_RHRs across the five states for lead day (left) 1 and (right) 5.
Figure 11 shows the BSS of the categorical forecasts for PM_RH and PM_RHRs. For the upper extreme forecasts, PM_RHRs showed greater skill than PM_RH; the greatest skill was found in North Carolina, South Carolina, and Alabama and in northern Georgia, while the least skill was found mostly in coastal areas. The lower extreme forecasts were less skillful than the upper extreme forecasts but showed a similar spatial pattern. The upper and lower tercile forecasts showed spatial patterns similar to those of the upper and lower extreme forecasts, with PM_RHRs showing greater skill than PM_RH over most of the area. PM_RHRs showed slightly greater skill than PM_RH for the lead day 1 middle tercile forecasts, and the forecast skill was comparatively homogeneous in space.

Fig. 11. The average BSS of (columns one and three) PM_RH and (columns two and four) PM_RHRs for (top to bottom) the five categorical forecasts across the five states.
c. Evaluation of reference evapotranspiration methods by forecast lead day
Figures 12 and 13 show the BSS and LEPS skill score as a function of month and lead day for PM_RH and PM_RHRs, respectively. Overall, for PM_RH, the forecasts were more skillful in cooler months when the skill scores were above zero. The LEPS skill score shows that the overall forecasts were skillful for the first nine lead days. For the lower extreme and lower tercile forecasts, the BSS was mostly above zero before lead day 5, and the first half of the year remained skillful at longer lead times than the second half. For the upper extreme forecasts, the warmer months showed more skill at longer lead times than the cooler months. For the upper tercile forecasts, lead day 7 was still skillful in December, January, and July, although the skill was modest. For the middle tercile forecasts, the BSS was greater than zero from lead day 1 to lead day 3 for most months; beyond lead day 3 the forecasts showed no skill in any month. PM_RHRs showed similar patterns, except that it had more skillful lead days in some months (Fig. 13).

Fig. 12. BSS of the PM_RH for the five categorical forecasts as a function of time and the lead time of the forecast: (a) lower extreme forecast, (b) lower tercile forecast, (c) middle tercile forecast, (d) upper tercile forecast, (e) upper extreme forecast, and (f) overall forecast.

Fig. 13. As in Fig. 12, but for PM_RHRs.
5. Summary and concluding remarks
A forecast analog technique was successfully used to downscale 1–15-day, approximately 200-km-resolution ET0 forecasts derived from the GFS reforecast archive and R2 climatology, using seven methods, over the states of Alabama, Georgia, Florida, North Carolina, and South Carolina in the southeastern United States. The 32-km-resolution ET0 calculated from the NARR dataset with the PM equation was used to evaluate the forecast analogs. The skill of both tercile and extreme (10th and 90th percentile) forecasts was evaluated. The ET0 forecast methods that included GFS reforecast 700-mb RH data in the PM equation (PM_RH and PM_RHRs) showed greater skill than the methods that did not use RH. The inclusion of R2 solar radiation data with the GFS reforecast data (PM_Rs, PM_RHRs, and PM_RsT) provided a modest increase in forecast skill. The forecasts using both GFS reforecast 700-mb RH data and R2 solar radiation (PM_RHRs) produced slightly greater skill than forecasts using GFS reforecast 700-mb RH data alone (PM_RH). Bias correction of all the GFS reforecast variables with R2 (PM_BC) improved the forecast skill compared to substituting several variables with R2 climatology (PM_RsT). While all five categorical forecasts were skillful, the skill of the upper and lower tercile forecasts was greater than that of the lower and upper extreme forecasts and the middle tercile forecasts. Most of the forecasts were skillful within the first five lead days.
Forecasting ET0 using GFS reforecasts is advantageous in many respects. Previously applied ANN models for evapotranspiration forecasts are black-box models, and an ANN model developed for one location cannot be implemented at another without local training (Kumar et al. 2011). In contrast, calculating ET0 from NWPM–GCM output preserves the physical relationships between variables as well as the spatial correlation of the output. Compared to the work of Cai et al. (2007), who used daily weather forecast messages to forecast ET0 deterministically at eight stations over China, downscaling NWPM–GCM forecasts is arguably more objective and provides forecasts at a finer resolution with broader data availability. There is no evidence that statistically downscaled forecasts are more skillful than dynamically downscaled forecasts (Abatzoglou and Brown 2012), but statistical methods have the advantage of requiring significantly fewer computational resources.
The advantage of analog selection based on calculated values of ET0 rather than finding analogs for each variable individually is that the PM equation appropriately weights the input variables according to their importance. The relative importance likely changes at different times of year and is captured by the analog approach used. Physically plausible relationships between variables and correlation between variables are also preserved. The disadvantage of this approach is that analogs are found based on the magnitude of ET0 and not the relative contribution of advective or radiative forcing. For example, high ET0 analog days may be found where wind and relative humidity played a larger contributing role compared to incoming solar radiation and vice versa.
This work showed that a forecast analog approach using the Penman–Monteith equation with several approximated terms could successfully downscale ET0 forecasts. However, the need to approximate several terms in the Penman–Monteith equation was likely a limiting factor in forecast skill and underscores the importance of archiving the relevant variables in future, next-generation reforecast datasets. Future work evaluating ET0 forecasts generated from retrospective forecast datasets should include comparison with the direct model output of operational models in order to clearly demonstrate the value of statistical postprocessing relative to direct model output.
Acknowledgments
This research was supported by NOAA's Climate Program Office SARP-Water program, Project NA10OAR4310171. The GFS reforecast data, R2 data, and NARR data were provided by the NOAA/OAR/ESRL PSD, Boulder, Colorado, from their website at http://www.esrl.noaa.gov/psd/. The authors thank the reviewers for their helpful comments.
APPENDIX A
Calculation of the Parameters of the Penman–Monteith Equation
The terms of the Penman–Monteith equation are calculated or estimated following the recommendations of Allen et al. (1998):


T is air temperature (°C);




P is atmospheric pressure (kPa) and
z is elevation above sea level (m);




T is air temperature (°C);










Solar radiation Rs (MJ m−2 day−1) is estimated from the daily temperature range as

$$R_s = k_{Rs}\,\sqrt{T_{\max} - T_{\min}}\; R_a, \quad (\mathrm{A8})$$

where
Tmax is maximum air temperature (°C),
Tmin is minimum air temperature (°C), and
kRs is the adjustment coefficient [0.175 (°C−0.5)].


Extraterrestrial radiation Ra (MJ m−2 day−1) is computed as

$$R_a = \frac{24(60)}{\pi}\, G_{sc}\, d_r \left[\omega_s \sin\varphi \sin\delta + \cos\varphi \cos\delta \sin\omega_s\right], \quad (\mathrm{A9})$$

where
Gsc is the solar constant (0.0820 MJ m−2 min−1),
dr is the inverse relative distance Earth–Sun [Eq. (A10)],
ωs is the sunset hour angle (rad) [Eq. (A11)],
φ is latitude (rad), and
δ is the solar declination (rad) [Eq. (A12)]:

$$d_r = 1 + 0.033 \cos\!\left(\frac{2\pi}{365}J\right), \quad (\mathrm{A10})$$

$$\omega_s = \arccos[-\tan\varphi \tan\delta], \quad (\mathrm{A11})$$

$$\delta = 0.409 \sin\!\left(\frac{2\pi}{365}J - 1.39\right), \quad (\mathrm{A12})$$








where J is the day of the year (Julian day).


Net outgoing longwave radiation Rnl (MJ m−2 day−1) is estimated as

$$R_{nl} = \sigma\left[\frac{T_{\max,K}^4 + T_{\min,K}^4}{2}\right]\left(0.34 - 0.14\sqrt{e_a}\right)\left(1.35\frac{R_s}{R_{so}} - 0.35\right),$$

where
σ is the Stefan–Boltzmann constant (4.903 × 10−9 MJ K−4 m−2 day−1),
Tmax,K is the maximum absolute temperature during the 24-h period (K),
Tmin,K is the minimum absolute temperature during the 24-h period (K),
ea is actual vapor pressure (kPa),
Rs is solar radiation [see Eq. (A8)], and
Rso is the calculated clear-sky radiation (MJ m−2 day−1) [Eq. (A14)]:


$$R_{so} = (0.75 + 2 \times 10^{-5} z)\, R_a, \quad (\mathrm{A14})$$

where
z is station elevation above sea level (m) and
Ra is extraterrestrial radiation [Eq. (A9)].




Wind speed measured at a height other than 2 m above the ground is adjusted to the standard 2-m height using the logarithmic wind profile relationship (Allen et al. 1998):

$$u_2 = u_z\, \frac{4.87}{\ln(67.8\, z - 5.42)},$$

where
uz is the measured wind speed at height z (m s−1) and
z is the height of the measurement above the ground surface (m).
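As a compact illustration of the appendix calculations, the sketch below assembles the terms above into a daily FAO-56 ET0 estimate following Allen et al. (1998); the function signature and the choice of inputs are illustrative assumptions, and the soil heat flux G is taken as zero at the daily time step.

```python
import numpy as np

def fao56_et0_daily(tmax, tmin, rh_mean, u2, rs, z, lat_deg, doy):
    """Daily FAO-56 Penman-Monteith ET0 (mm/day) following Allen et al. (1998).

    tmax, tmin : daily max/min air temperature (deg C)
    rh_mean    : daily mean relative humidity (%)
    u2         : wind speed at 2 m (m/s)
    rs         : incoming solar radiation (MJ m-2 day-1)
    z          : elevation above sea level (m)
    lat_deg    : latitude (degrees)
    doy        : day of year (1-366)
    """
    tmean = 0.5 * (tmax + tmin)

    # Atmospheric pressure and psychrometric constant
    p = 101.3 * ((293.0 - 0.0065 * z) / 293.0) ** 5.26
    gamma = 0.000665 * p

    # Saturation/actual vapor pressure and slope of the vapor pressure curve
    def esat(t):
        return 0.6108 * np.exp(17.27 * t / (t + 237.3))
    es = 0.5 * (esat(tmax) + esat(tmin))
    ea = rh_mean / 100.0 * es
    delta = 4098.0 * esat(tmean) / (tmean + 237.3) ** 2

    # Extraterrestrial and clear-sky radiation [Eqs. (A9)-(A12), (A14)]
    lat = np.radians(lat_deg)
    dr = 1.0 + 0.033 * np.cos(2.0 * np.pi * doy / 365.0)
    dec = 0.409 * np.sin(2.0 * np.pi * doy / 365.0 - 1.39)
    ws = np.arccos(-np.tan(lat) * np.tan(dec))
    ra = (24.0 * 60.0 / np.pi) * 0.0820 * dr * (
        ws * np.sin(lat) * np.sin(dec) + np.cos(lat) * np.cos(dec) * np.sin(ws))
    rso = (0.75 + 2e-5 * z) * ra

    # Net shortwave (albedo 0.23) and net longwave radiation
    rns = (1.0 - 0.23) * rs
    sigma = 4.903e-9
    rnl = sigma * 0.5 * ((tmax + 273.16) ** 4 + (tmin + 273.16) ** 4) * \
        (0.34 - 0.14 * np.sqrt(ea)) * (1.35 * np.minimum(rs / rso, 1.0) - 0.35)
    rn = rns - rnl

    # FAO-56 Penman-Monteith equation with soil heat flux G = 0
    num = 0.408 * delta * rn + gamma * (900.0 / (tmean + 273.0)) * u2 * (es - ea)
    return num / (delta + gamma * (1.0 + 0.34 * u2))
```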
REFERENCES
Abatzoglou, J. T., and Brown T. J. , 2012: A comparison of statistical downscaling methods suited for wildfire applications. Int. J. Climatol., 32, 772–780.
Allen, R. G., Pereira L. S. , Raes D. , and Smith M. , 1998: Crop evapotranspiration: Guidelines for computing crop water requirements. FAO Irrigation and Drainage Paper 56, 300 pp. [Available online at http://www.fao.org/docrep/X0490E/X0490E00.htm.]
Barsugli, J., Anderson C. , Smith J. , and Vogel J. , 2009: Options for improving climate modeling to assist water utility planning for climate change. Water Utility Climate Alliance White Paper, 146 pp.
Cai, J., Liu Y. , Lei T. , and Pereira L. S. , 2007: Estimating reference evapotranspiration with the FAO Penman–Monteith equation using daily weather forecast messages. Agric. For. Meteor., 145, 22–35.
Cai, J., Liu Y. , Xu D. , Paredes P. , and Pereira L. , 2009: Simulation of the soil water balance of wheat using daily weather forecast messages to estimate the reference evapotranspiration. Hydrol. Earth Syst. Sci., 13, 1045–1059.
Chattopadhyay, S., Jain R. , and Chattopadhyay G. , 2009: Estimating potential evapotranspiration from limited weather data over Gangetic West Bengal, India: A neurocomputing approach. Meteor. Appl., 16, 403–411.
Chiew, F. H. S., Kamaladasa N. N. , Malano H. M. , and McMahon T. A. , 1995: Penman-Monteith, FAO-24 reference crop evapotranspiration and class-A pan data in Australia. Agric. Water Manage., 28, 9–21.
Clark, M. P., and Hay L. E. , 2004: Use of medium-range numerical weather prediction model output to produce forecasts of streamflow. J. Hydrometeor., 5, 15–32.
Dai, X., Shi H. , Li Y. , Ouyang Z. , and Huo Z. , 2009: Artificial neural network models for estimating regional reference evapotranspiration based on climate factors. Hydrol. Processes, 23, 442–450.
Fowler, H., Blenkinsop S. , and Tebaldi C. , 2007: Linking climate change modelling to impacts studies: Recent advances in downscaling techniques for hydrological modelling. Int. J. Climatol., 27, 1547–1578.
Garcia, M., Raes D. , Allen R. , and Herbas C. , 2004: Dynamics of reference evapotranspiration in the Bolivian highlands (Altiplano). Agric. For. Meteor., 125, 67–82.
Hagedorn, R., Hamill T. M. , and Whitaker J. S. , 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part I: Two-meter temperatures. Mon. Wea. Rev., 136, 2608–2619.
Hamill, T. M., and Juras J. , 2006: Measuring forecast skill: Is it real skill or is it the varying climatology? Quart. J. Roy. Meteor. Soc., 132, 2905–2923.
Hamill, T. M., and Whitaker J. S. , 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229.
Hamill, T. M., and Whitaker J. S. , 2007: Ensemble calibration of 500-hPa geopotential height and 850-hPa and 2-m temperatures using reforecasts. Mon. Wea. Rev., 135, 3273–3280.
Hamill, T. M., Whitaker J. S. , and Wei X. , 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434–1447.
Hamill, T. M., Whitaker J. S. , and Mullen S. L. , 2006: Reforecasts: An important dataset for improving weather predictions. Bull. Amer. Meteor. Soc., 87, 33–46.
Hamill, T. M., Hagedorn R. , and Whitaker J. S. , 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part II: Precipitation. Mon. Wea. Rev., 136, 2620–2632.
Hwang, S., Graham W. , Hernández J. L. , Martinez C. , Jones J. W. , and Adams A. , 2011: Quantitative spatiotemporal evaluation of dynamically downscaled MM5 precipitation predictions over the Tampa Bay region, Florida. J. Hydrometeor., 12, 1447–1464.
Ines, A. V. M., and Hansen J. W. , 2006: Bias correction of daily GCM rainfall for crop simulation studies. Agric. For. Meteor., 138, 44–53.
Ishak, A. M., Bray M. , Remesan R. , and Han D. , 2010: Estimating reference evapotranspiration using numerical weather modelling. Hydrol. Processes, 24, 3490–3509.
Kanamitsu, M., Ebisuzaki W. , Woollen J. , Yang S. K. , Hnilo J. , Fiorino M. , and Potter G. , 2002: NCEP–DOE AMIP-II Reanalysis (R-2). Bull. Amer. Meteor. Soc., 83, 1631–1644.
Kumar, M., Raghuwanshi N. S. , and Singh R. , 2011: Artificial neural networks approach in evapotranspiration modeling: A review. Irrig. Sci., 29, 11–25.
Landeras, G., Ortiz-Barredo A. , and López J. J. , 2009: Forecasting weekly evapotranspiration with ARIMA and artificial neural network models. J. Irrig. Drain. Eng., 135, 323–334.
López-Urrea, R., Martín de Santa Olalla F. , Fabeiro C. , and Moratalla A. , 2006: Testing evapotranspiration equations using lysimeter observations in a semiarid climate. Agric. Water Manage., 85, 15–26.
Markovic, M., Jones C. G. , Winger K. , and Paquin D. , 2009: The surface radiation budget over North America: Gridded data assessment and evaluation of regional climate models. Int. J. Climatol., 29, 2226–2240.
Marshall, C. H., Pielke R. A. , Steyaert L. T. , and Willard D. A. , 2004: The impact of anthropogenic land-cover change on the Florida peninsula sea breezes and warm season sensible weather. Mon. Wea. Rev., 132, 28–52.
Maurer, E., and Hidalgo H. , 2008: Utility of daily vs. monthly large-scale climate data: An intercomparison of two statistical downscaling methods. Hydrol. Earth Syst. Sci., 12, 551–563.
Maurer, E., Hidalgo H. , Das T. , Dettinger M. , and Cayan D. , 2010: The utility of daily large-scale climate data in the assessment of climate change impacts on daily streamflow in California. Hydrol. Earth Syst. Sci., 14, 1125–1138.
Mesinger, F., and Coauthors, 2006: North American Regional Reanalysis. Bull. Amer. Meteor. Soc., 87, 343–360.
Misra, V., Moeller L. , Stefanova L. , Chan S. , O’Brien J. J. , Smith T. J. III, and Plant N. , 2011: The influence of the Atlantic warm pool on the Florida panhandle sea breeze. J. Geophys. Res., 116, D00Q06, doi:10.1029/2010JD015367.
Muluye, G. Y., 2011: Implications of medium-range numerical weather model output in hydrologic applications: Assessment of skill and economic value. J. Hydrol., 400, 448–464.
Ozkan, C., Kisi O. , and Akay B. , 2011: Neural networks with artificial bee colony algorithm for modeling daily reference evapotranspiration. Irrig. Sci., 29, 431–441.
Pal, M., and Deswal S. , 2009: M5 model tree based modelling of reference evapotranspiration. Hydrol. Processes, 23, 1437–1443.
Panofsky, H. A., and Brier G. W. , 1958: Some Applications of Statistics to Meteorology. The Pennsylvania State University, 224 pp.
Plummer, D. A., and Coauthors, 2006: Climate and climate change over North America as simulated by the Canadian RCM. J. Climate, 19, 3112–3132.
Potts, J., Folland C. , Jolliffe I. , and Sexton D. , 1996: Revised “LEPS” scores for assessing climate model simulations and long-range forecasts. J. Climate, 9, 34–53.
Silva, D., Meza F. J. , and Varas E. , 2010: Estimating reference evapotranspiration (ET0) using numerical weather forecast data in central Chile. J. Hydrol., 382, 64–71.
Sobash, R. A., Kain J. S. , Bright D. R. , Dean A. R. , Coniglio M. C. , and Weiss S. J. , 2011: Probabilistic forecast guidance for severe thunderstorms based on the identification of extreme phenomena in convection-allowing model forecasts. Wea. Forecasting, 26, 714–728.
Thornthwaite, C. W., 1948: An approach toward a rational classification of climate. Geogr. Rev., 38, 55–94.
Timbal, B., and McAvaney B. , 2001: An analogue-based method to downscale surface air temperature: Application for Australia. Climate Dyn., 17, 947–963.
van den Dool, H. M., 1994: Searching for analogues, how long must we wait? Tellus, 46A, 314–324.
Vicente-Serrano, S. M., Begueria S. , and Lopez-Moreno J. I. , 2010: A multiscalar drought index sensitive to global warming: The standardized precipitation evapotranspiration index. J. Climate, 23, 1696–1718.
Vivoni, E. R., Moreno H. A. , Mascaro G. , Rodriguez J. C. , Watts C. J. , Garatuza-Payan J. , and Scott R. L. , 2008: Observed relation between evapotranspiration and soil moisture in the North American monsoon season. Geophys. Res. Lett., 35, L22403, doi:10.1029/2008GL036001.
Wang, Y.-M., Traore S. , Kerh T. , and Leu J.-M. , 2011: Modelling reference evapotranspiration using feed forward backpropagation algorithm in arid regions of Africa. Irrig. Drain., 60, 404–417.
Werner, K., Brandon D. , Clark M. , and Gangopadhyay S. , 2005: Incorporating medium-range numerical weather model output into the ensemble streamflow prediction system of the National Weather Service. J. Hydrometeor., 6, 101–114.
Whitaker, J. S., Wei X. , and Vitart F. , 2006: Improving week-2 forecasts with multimodel reforecast ensembles. Mon. Wea. Rev., 134, 2279–2284.
Wilby, R., Charles S. , Zorita E. , Timbal B. , Whetton P. , and Mearns L. , 2004: Guidelines for use of climate scenarios developed from statistical downscaling methods. Intergovernmental Panel on Climate Change, 27 pp. [Available online at http://www.narccap.ucar.edu/doc/tgica-guidance-2004.pdf.]
Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. Academic Press, 467 pp.
Wilks, D. S., and Hamill T. M. , 2007: Comparison of ensemble-MOS methods using GFS reforecasts. Mon. Wea. Rev., 135, 2379–2390.
Wood, A. W., Leung L. R. , Sridhar V. , and Lettenmaier D. P. , 2004: Hydrologic implications of dynamical and statistical approaches to downscaling climate model outputs. Climatic Change, 62, 189–216.
Yoder, R. E., Odhiambo L. O. , and Wright W. C. , 2005: Evaluation of methods for estimating daily reference crop evapotranspiration at a site in the humid Southeast United States. Appl. Eng. Agric., 21, 197–202.
Zhang, H., and Casey T. , 2000: Verification of categorical probability forecasts. Wea. Forecasting, 15, 80–89.
Zhu, C., and Lettenmaier D. P. , 2007: Long-term climate and derived hydrology and energy flux data for Mexico: 1925–2004. J. Climate, 20, 1936–1946.
Zorita, E., and von Storch H. , 1999: The analog method as a simple statistical downscaling technique: Comparison with more complicated methods. J. Climate, 12, 2474–2489.