Postprocessing of Ensemble Weather Forecasts Using a Stochastic Weather Generator

Jie Chen, Department of Construction Engineering, École de technologie supérieure, Université du Québec, Montreal, Quebec, Canada

François P. Brissette, Department of Construction Engineering, École de technologie supérieure, Université du Québec, Montreal, Quebec, Canada

Zhi Li, College of Natural Resources and Environment, Northwest A & F University, Yangling, Shaanxi, China


Abstract

This study proposes a new statistical method for postprocessing ensemble weather forecasts using a stochastic weather generator. Key parameters of the weather generator were linked to the ensemble forecast means for both precipitation and temperature, allowing the generation of an infinite number of daily time series that are fully coherent with the ensemble weather forecast. This method was verified by postprocessing reforecast datasets derived from the Global Forecast System (GFS) for forecast leads ranging between 1 and 7 days over two Canadian watersheds in the Province of Quebec. The calibration of the ensemble weather forecasts was based on a cross-validation approach that leaves one year out for validation and uses the remaining years for training the model. The proposed method was compared with a simple bias correction method for ensemble precipitation and temperature forecasts using a set of deterministic and probabilistic metrics. The results show underdispersion and biases in the raw GFS ensemble weather forecasts, indicating that they were poorly calibrated. The proposed method significantly increased the predictive power of ensemble weather forecasts for forecast leads ranging between 1 and 7 days, and was consistently better than the bias correction method. The ability to generate discrete, autocorrelated daily time series makes the postprocessed ensemble weather forecasts straightforward to use in forecasting models commonly applied in hydrology and agriculture. This study further indicates that calibrating ensemble forecasts out to one week is reasonable for precipitation, and that for temperature the calibration could reasonably be extended by another week.

Corresponding author address: Jie Chen, Department of Construction Engineering, École de technologie supérieure, Université du Québec, 1100 rue Notre-Dame Ouest, Montreal QC H3C 1K3, Canada. E-mail: jie.chen.1@ens.etsmtl.ca


1. Introduction

Ensemble weather forecasts offer great potential benefits for water resource management, as they provide useful information for analyzing the uncertainty of predicted variables (Boucher et al. 2011). The advantages of ensemble weather forecasts over deterministic forecasts have been observed in several studies, even at locations where the spatial resolution of the ensemble forecasts was much coarser (Bertotti et al. 2011; Boucher et al. 2011). However, raw ensemble forecasts are generally biased and tend to be underdispersed (Buizza 1997; Hamill and Colucci 1997; Eckel and Walters 1998; Toth et al. 2001; Pellerin et al. 2003), thus limiting the predictive power of their probability density functions (PDFs). For example, Hamill and Colucci (1997) verified the Eta and regional spectral model (Eta-RSM) predictions of short-range (24 h) 850-mb temperature, 500-mb geopotential height, and 24-h total precipitation amounts using rank histograms. The nonuniform rank histograms indicated that the assumption of identical error statistics for each member did not hold. Buizza et al. (2005) compared three ensemble prediction systems [the European Centre for Medium-Range Weather Forecasts (ECMWF), the Meteorological Service of Canada (MSC), and the National Centers for Environmental Prediction (NCEP)] in forecasting the 500-hPa geopotential height over the Northern Hemisphere and found that none of them was able to capture all sources of forecast uncertainty. In addition, both spread-error correlations and underdispersion were detected. Therefore, some form of postprocessing is required before ensemble forecasts can be incorporated into the decision-making process, so that the predictive distributions are reliable and properly reflect the real-world uncertainty (Hamill and Colucci 1998; Richardson 2001; Boucher et al. 2011; Cui et al. 2012).

During the last two decades, a number of postprocessing methods have been proposed and implemented to address the bias and underdispersion of ensemble weather forecasts. These include rank histogram techniques (Hamill and Colucci 1998; Eckel and Walters 1998; Wilks 2006), ensemble dressing (Roulston and Smith 2003; Wang and Bishop 2005; Wilks and Hamill 2007; Brocker and Smith 2008), Bayesian model averaging (BMA; Raftery et al. 2005; Vrugt et al. 2006; Wilson et al. 2007; Sloughter et al. 2007; Soltanzadeh et al. 2011), logistic regression (Hamill et al. 2006; Wilks and Hamill 2007; Hamill et al. 2008), analog techniques (Hamill et al. 2006; Hamill and Whitaker 2007), and nonhomogeneous Gaussian regression (NGR; Gneiting et al. 2005; Wilks and Hamill 2007; Hagedorn et al. 2008). Among these methods, logistic regression has most often been used to calibrate both precipitation and temperature, while BMA and NGR have usually been used to calibrate temperature (Raftery et al. 2005; Hagedorn et al. 2008; Hamill et al. 2008). More recently, studies have also extended BMA to the postprocessing of precipitation (Sloughter et al. 2007; Schmeits and Kok 2010).

Hamill et al. (2004) used a logistic regression method to improve the medium-range precipitation and temperature forecast skill using retrospective forecasts. The ensemble mean and ensemble mean anomaly were used as predictors for precipitation and temperature, respectively. The results showed that the logistic regression-based probability forecasts (using retrospective forecasts) were much more skillful and reliable than the operational NCEP forecast. Raftery et al. (2005) proposed using the BMA method to calibrate the ensemble forecasts of temperature and found that the calibrated predictive PDFs were much better than those of the raw forecast.

Wilks (2006) compared eight ensemble model output statistics (MOS) methods for the statistical postprocessing of ensemble forecasts in the idealized Lorenz '96 setting. The eight methods fall into four categories: 1) early, ad hoc approaches (direct model output, rank-histogram recalibration, and multiple implementations of single-integration MOS equations), 2) the ensemble dressing approach, 3) regression methods (logistic regression and NGR), and 4) Bayesian methods (forecast assimilation and BMA). This is probably the most thorough study to date in terms of the number of MOS methods included for the postprocessing of ensemble forecasts. The three best-performing methods were found to be logistic regression, NGR, and ensemble dressing. Wilks and Hamill (2007) further compared these three methods for postprocessing daily temperature, and medium-range (6–10 and 8–14 days) temperature and precipitation forecasts. The results showed that there was no single best method across the daily and medium-range applications. For example, the logistic regression method yielded the best Brier score (BS) for central forecast quantiles, while the NGR forecasts displayed slightly greater accuracy for probability forecasts of the more extreme daily precipitation quantiles. Hagedorn et al. (2008) and Hamill et al. (2008) conducted parallel studies that used NGR and logistic regression for postprocessing temperature and precipitation, respectively, using the ECMWF and Global Forecast System (GFS) ensemble reforecasts. The skill and reliability of the ECMWF and GFS ensemble temperature and precipitation forecasts were largely improved when using the NGR and logistic regression methods, respectively. These studies also emphasized the benefits of using ensemble retrospective forecasts (reforecasts).

Other studies, such as Wilson et al. (2007) and Soltanzadeh et al. (2011), showed that BMA is also able to improve the skill and reliability of ensemble forecasts. However, most studies using BMA focused on the calibration of temperature rather than precipitation, because the original BMA developed by Raftery et al. (2005) was only suitable for variables whose predictive PDFs are approximately normal. To use it for the calibration of precipitation, Sloughter et al. (2007) extended BMA by modeling the predictive PDF corresponding to an ensemble member as a mixture of a discrete event at zero and a gamma distribution. The extended BMA yielded calibrated and sharp predictive distributions for 48-h precipitation forecasts. It even outperformed logistic regression at estimating the probability of high precipitation events, because it gives a full predictive PDF rather than separate forecast probability equations for different predictand thresholds. Similarly, Wilks (2009) extended logistic regression to provide full PDF forecasts. The main advantage of the extended logistic regression is that the forecasted probabilities are mutually consistent: the cumulative probability for a small predictand threshold cannot be larger than that for a larger threshold (Wilks 2009). Building on these studies, Schmeits and Kok (2010) compared the raw ensemble output, a modified BMA, and the extended logistic regression for postprocessing ECMWF ensemble precipitation reforecasts. The results showed that, even though the raw ensemble precipitation forecasts were relatively well calibrated, their skill could be significantly improved by the modified BMA and extended logistic regression methods; the difference in skill between the two, however, was not significant.

Even though a number of methods have been proposed for postprocessing ensemble weather forecasts, most of them aim at finding the underlying probability distribution of the forecasted variables. For some practical applications, however, such as ensemble streamflow prediction, several sets of discrete, autocorrelated time series spanning several days are needed to drive the impact models (e.g., hydrological models), and there is no simple way to go from the underlying distribution to a discrete, autocorrelated time series that is fully consistent with it. This study presents a new method for postprocessing ensemble weather forecasts using a stochastic weather generator. The ensemble mean precipitation and temperature anomalies are used as predictors for the calibration of precipitation and temperature, respectively. A great number of ensemble members can be produced using the stochastic weather generator, with a gamma distribution for generating precipitation amounts and a normal distribution for generating temperature. A simple bias correction (BC) method is used as a benchmark to demonstrate the performance of the proposed method [i.e., the generator-based postprocessing (GPP) method]. The GPP ensemble forecasts were compared with the BC and raw GFS ensemble forecasts over two Canadian watersheds in Quebec, Canada, using a set of deterministic and probabilistic metrics. The ultimate goal of this study is to provide reliable ensemble weather forecasts for ensemble streamflow forecasting; therefore, watershed-averaged precipitation and temperature are used instead of traditional station meteorological data.

2. Study area and dataset

a. Study area

The ultimate goal of this project is to provide and evaluate ensemble streamflow forecasting. It is with this goal in mind that we chose to focus on watershed-averaged meteorological data rather than station data. Accordingly, this study is conducted over two Canadian catchments located in the Province of Quebec (Fig. 1). Two different catchments (Peribonka and Yamaska) were selected to evaluate the impact of basin characteristics on ensemble weather forecasts. The Peribonka and Yamaska catchments are composed of several tributaries draining basins of approximately 27 000 and 4843 km², respectively, in southeastern and southern Quebec. The southern parts of the Peribonka and Yamaska catchments, known as the Chute-du-Diable (CDD) and the Yamaska (YAM) watersheds, respectively, are used in this study. The two watersheds differ in size (9700 vs 3330 km²) and location (the CDD watershed is located in central Quebec and the smaller YAM watershed in southern Quebec). Additional details on both watersheds are presented below.

Fig. 1. Location map of the two catchments.

1) Chute-du-Diable (CDD)

The CDD watershed (48.5°–50.2°N, 70.5°–71.5°W) is located in central Quebec. With a mostly forested surface area of 9700 km², it is a subbasin of the Peribonka River watershed. The basin is part of the northern Quebec subarctic region, characterized by wide daily and annual temperature ranges, heavy wintertime snowfall, and pronounced rainfall and/or snowmelt peaks in the spring (April–June; Coulibaly 2003). The average annual precipitation in the area is 962 mm, of which about 36% falls as snow. The average annual maximum and minimum temperatures (Tmax and Tmin) between 1979 and 2003 were 5.49° and −5.85°C, respectively. The CDD watershed contains a large hydropower reservoir managed by Rio Tinto Alcan for hydroelectric power generation. River flows are regulated by two upstream reservoirs. Snow plays a crucial role in watershed management, with 35% of the total yearly discharge occurring during the spring flood. The mean annual discharge of the CDD watershed is 211 m³ s⁻¹, with a daily maximum registered flood of 1666 m³ s⁻¹. Snowmelt peak discharge usually occurs in May and averages about 1200 m³ s⁻¹.

2) Yamaska (YAM)

The YAM watershed (45.1°–46.1°N, 72.2°–73.1°W) is composed of a number of tributaries draining a basin of approximately 4843 km² in southern Quebec; the southern part of the YAM basin, with an area of 3330 km², is used in this study. The average annual precipitation in the area is 1175 mm, of which about 23% falls as snow. The average annual Tmin and Tmax, at 0.56° and 10.83°C, respectively, between 1979 and 2003, are both above freezing. The mean annual discharge of the YAM River is 61 m³ s⁻¹, with a daily maximum registered flood of 881 m³ s⁻¹. Snowmelt peak discharge usually occurs in April and averages about 495 m³ s⁻¹.

b. Dataset

The dataset consists of observed and ensemble-forecasted daily total precipitation and mean temperature. The observed daily precipitation and temperature over the two watersheds were taken from the National Land and Water Information Service (www.agr.gc.ca/nlwis-snite) dataset covering the period 1979–2003. This dataset was created by interpolating station data to a 10-km grid using a thin-plate smoothing spline surface-fitting method (Hutchinson et al. 2009). All grid points within a watershed were averaged to represent the observed time series.

Ensemble forecasts (daily total precipitation and mean temperature) on a global 2.5° grid were taken from the GFS reforecast dataset (http://www.esrl.noaa.gov/psd/forecasts/reforecast/; Hamill et al. 2006). Several previous studies (e.g., Hamill et al. 2004, 2006; Hamill and Whitaker 2006, 2007; Whitaker et al. 2006; Wilks and Hamill 2007) have demonstrated the benefit of calibrating probabilistic forecasts using ensemble reforecast datasets. Forecasts were made with the GFS for each day since 1979, with a 15-member ensemble run out to 15 days. Since little skill is retained for precipitation after 1 week, only lead times of 1–7 days are used in this study, over the 1979–2003 time frame. Two grid boxes were selected and averaged for the CDD watershed, and only one grid box was selected for the YAM watershed.

3. Methodology

a. Stochastic weather generator

A stochastic weather generator is a computer model that can produce climate time series of arbitrary length with statistical properties similar to those of the observed data (Richardson 1981; Nicks and Gander 1994; Semenov and Barrow 2002; Chen et al. 2010, 2012). The generation of precipitation and temperature usually forms the two main components of a weather generator. Precipitation is most often generated using a two-component model: one for precipitation occurrence and the other for the wet-day precipitation amount.

The precipitation occurrence is usually generated using a Markov chain of some order based on transition probabilities. Alternatively, the precipitation occurrence can be generated from an unconditional precipitation probability if the precipitation model considers only the wet- and dry-day probabilities rather than the wet- and dry-spell structures. In that case, if a random number drawn from a uniform distribution for one day is less than the unconditional precipitation probability, a wet day is predicted. Since the weather generator is used in this study to synthesize the wet and dry states of ensemble members for a given day rather than to generate a continuous time series of precipitation occurrence, only the second method is used. For a predicted wet day, stochastic weather generators usually produce the precipitation amount using a parametric probability distribution (e.g., the exponential or gamma distribution). The two-parameter gamma distribution is the most widely used distribution for simulating wet-day precipitation. Temperature is usually generated using a two-parameter (mean and standard deviation) normal distribution. In this study, the gamma and normal distributions are used to generate the ensemble members of precipitation and temperature, respectively, for a given day. Similarly to stochastic weather generators such as the Weather Generator (WGEN; Richardson 1981) and the Weather generator of the École de Technologie Supérieure (WeaGETS; Chen et al. 2012), the auto- and cross correlation of and between Tmax and Tmin are preserved using a first-order linear autoregressive model. The detailed methodology is presented below.
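To make these mechanics concrete, the following minimal Python sketch (all parameter values are illustrative and not taken from the study) generates one day's worth of members: occurrence from an unconditional wet-day probability, amounts from a two-parameter gamma distribution, and temperature from a normal distribution.

```python
import numpy as np

rng = np.random.default_rng(42)

def generate_day_members(p_wet, shape, scale, t_mean, t_std, n_members=1000):
    """One day's members: a member is wet when a uniform draw falls at or
    below the unconditional wet-day probability; wet members draw amounts
    from a gamma distribution; temperature is drawn from a normal."""
    wet = rng.uniform(size=n_members) <= p_wet
    precip = np.zeros(n_members)
    precip[wet] = rng.gamma(shape, scale, size=wet.sum())
    temp = t_mean + t_std * rng.standard_normal(n_members)
    return precip, temp

# Illustrative values only: 40% chance of rain, gamma(0.8, 6 mm), T ~ N(12, 2.5)
precip, temp = generate_day_members(0.4, 0.8, 6.0, 12.0, 2.5)
```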

b. Generator-based postprocessing (GPP) method

The GFS ensemble forecasts are postprocessed using the GPP method. The observed daily precipitation and temperature are used as predictands, and the forecasted ensemble mean precipitation and temperature anomalies are used as predictors, respectively. The evaluation of the GPP method is based on a cross-validation approach (Wilks 2005) to ensure the independence of the training and evaluation data. Given 25 years of available forecasts, when making forecasts for a particular year, the remaining 24 years were used as training data.
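The leave-one-year-out loop can be sketched as follows; `calibrate`, `evaluate`, and `data_by_year` are hypothetical placeholders for the training, scoring, and data-access steps, not names from the study.

```python
years = list(range(1979, 2004))  # the 25 available reforecast years

def cross_validate(calibrate, evaluate, data_by_year):
    """Leave one year out for validation; train on the remaining 24 years."""
    scores = {}
    for test_year in years:
        train = [data_by_year[y] for y in years if y != test_year]
        model = calibrate(train)
        scores[test_year] = evaluate(model, data_by_year[test_year])
    return scores
```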

1) Postprocessing for precipitation

The calibration of precipitation is based on four seasons: winter [January–March (JFM)], spring [April–June (AMJ)], summer [July–September (JAS)], and autumn [October–December (OND)]. The methodology for the precipitation calibration rests on the hypothesis that a relationship must exist between the mean of the ensemble forecast and both the probability of precipitation occurrence and the wet-day precipitation amount: the larger the mean of the ensemble forecast, the more likely that rainfall will occur, and the more likely that a large precipitation amount will be registered. For each season and lead day, the ensemble precipitation is calibrated in the following three steps (a code sketch follows the list).

  1. The ensemble mean precipitation is first calculated from the 15-member ensemble precipitation forecasts. The calculated ensemble mean precipitation for each lead day in the given season is then classified into several classes based on wet-day precipitation amounts. The number of classes depends on the training sample. A maximum of 10 classes, with wet-day precipitation amounts of 0–1, 1–2, 2–3, 3–4, 4–5, 5–7, 7–10, 10–15, 15–25, and ≥25 mm, is used in this study. If the largest class contains fewer than 30 precipitation events, the last two classes are merged, and so on. The probabilities of observed precipitation occurrence and the observed mean wet-day precipitation amount corresponding to each class of forecasted precipitation are then calculated. The observed wet-day precipitation events in each class are fitted with a gamma distribution. For example, for the first class, all of the observed wet-day precipitation amounts that correspond to an ensemble mean precipitation between 0 and 1 mm are pooled and fitted with a gamma distribution.

  2. The second step involves establishing relationships between the forecasted precipitation classes and both the probability of observed precipitation occurrence and the observed mean wet-day precipitation amount. Figure 2 presents the probabilities of observed precipitation occurrence and the mean wet-day precipitation amounts as functions of the forecasted precipitation classes for summer precipitation at 1 and 3 lead days over the two selected watersheds (solid lines in Fig. 2). The results clearly show the relationship between the mean of the ensemble forecast and the observed probability of precipitation occurrence (left-hand side), and between that same mean and the observed mean precipitation amount (right-hand side). For a large ensemble mean, the observed precipitation occurrence is nearly 100% for the larger basin. For a 7-day lead time (not shown), both relationships are close to a horizontal line, indicating that the ensemble precipitation forecast has little relevance at that lead time. The variability observed in the graphs is due to limited sample sizes. Accordingly, the lines were smoothed using a second-order polynomial (dashed lines in Fig. 2).

  3. In the third step, the relationships (smoothed functions) between the probability of observed precipitation occurrence and the forecasted precipitation class are used directly to determine the probability of precipitation occurrence for a given day. For any given day in the evaluation period, a forecasted precipitation class is first determined according to the ensemble mean precipitation for that day. For example, if the ensemble mean precipitation is 0.5 mm for a given day, it is classified into the first class (between 0 and 1 mm). The corresponding probability of observed precipitation occurrence (e.g., 40% for the YAM basin) is then used as the precipitation probability for this day. Then 1000 random numbers drawn from a uniform distribution are generated to represent 1000 members for this day. If a random number is less than or equal to the corresponding probability of observed precipitation occurrence (e.g., 40%), the corresponding member is predicted to be wet; otherwise, it is predicted to be dry. Finally, if a member is deemed wet, the fitted gamma function in the corresponding class is used to generate the precipitation amount. Overall, 1000 members are generated for any given day. A large number of members is used to best approximate the expected behavior of the weather generator; a small sample could result in biases due to the random nature of the stochastic process. The proposed postprocessing approach does not directly take into account the autocorrelation of precipitation occurrence. During the period covered by the ensemble weather forecast, the probability of precipitation is directly given by the forecast for each lead day, which preserves the coherence of the ensemble forecast. As such, the autocorrelation of precipitation occurrence is directly governed by the forecast: if the forecast is wet for several days, all 1000 members will carry this information stochastically and all sequences will be dominated by wet days. As long as the forecasts have skill, using the probability of precipitation occurrence given by the forecasts is highly preferable to using the mean probabilities used to generate the occurrence series in a pure stochastic mode. Similarly to most stochastic weather generators, the proposed method does not account for the possible autocorrelation of precipitation amounts.
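A condensed sketch of the training and generation steps is given below, assuming a simple wet-day cutoff (`WET_THRESHOLD`, a value the text does not specify) and SciPy's maximum likelihood gamma fit; the class-merging rule (fewer than 30 events) and the polynomial smoothing of step 2 are omitted for brevity.

```python
import numpy as np
from scipy import stats

EDGES = np.array([0, 1, 2, 3, 4, 5, 7, 10, 15, 25, np.inf])  # class bounds (mm)
WET_THRESHOLD = 0.1  # assumed wet-day cutoff (mm); not stated in the paper

def train_classes(ens_mean, obs):
    """Per forecast class: observed occurrence probability and a gamma fit
    to the observed wet-day amounts falling in that class."""
    cls = np.digitize(ens_mean, EDGES) - 1          # class index per training day
    params = {}
    for k in range(len(EDGES) - 1):
        obs_k = obs[cls == k]
        if obs_k.size == 0:
            continue                                 # merging rule omitted here
        wet = obs_k[obs_k >= WET_THRESHOLD]
        p_wet = wet.size / obs_k.size
        if wet.size >= 2:
            shape, _, scale = stats.gamma.fit(wet, floc=0)
        else:
            shape, scale = 1.0, 1.0                  # fallback for sparse classes
        params[k] = (p_wet, shape, scale)
    return params

def forecast_day(ens_mean_today, params, n_members=1000, rng=None):
    """Step 3: look up the day's class, then generate occurrence and amounts."""
    if rng is None:
        rng = np.random.default_rng()
    k = int(np.digitize(ens_mean_today, EDGES)) - 1
    p_wet, shape, scale = params[k]
    wet = rng.uniform(size=n_members) <= p_wet
    members = np.zeros(n_members)
    members[wet] = rng.gamma(shape, scale, size=wet.sum())
    return members
```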

Fig. 2. The relationships between forecasted summer (JAS) precipitation classes and the probability of observed summer precipitation occurrence and mean wet-day precipitation amounts for 1 and 3 lead days over the (a),(b) CDD and (c),(d) YAM watersheds.

2) Postprocessing for temperature

The postprocessing for temperature is performed on a daily basis. The calibration of ensemble temperature forecasts comprises two stages. The first stage consists of the BC of the ensemble mean temperature using a linear regression method. The second stage adds the ensemble spread using a weather generator–based method. For each evaluation year and lead day, the ensemble temperature forecast BC follows three specific steps (a code sketch follows the list):

  1. Similarly to precipitation, the ensemble mean temperature (24 yr × 365 days) is first calculated from the 15-member ensemble temperature forecasts (24 yr × 365 days × 15 members). The mean observed daily temperature (1 yr × 365 days) is also calculated from the 24-yr daily time series (24 yr × 365 days). The temperature anomalies (24 yr × 365 days) of both the observed and forecasted data are then determined by subtracting the mean observed daily temperature (1 yr × 365 days) from the observed temperature (24 yr × 365 days) and from the ensemble mean temperature (24 yr × 365 days), respectively.

  2. Linear equations are fitted between observed and forecasted temperature anomalies using a 31-day window centered on the day of interest. For example, when fitting the linear equation for 16 January, the temperature anomalies from 1 January to 31 January over 24 yr are pooled. The use of a 31-day window ensures there will be enough data points to fit a reliable equation. This process is conducted for each day to obtain 365 equations, which can be used to correct the bias of ensemble mean temperature anomaly for an entire year.

  3. The fitted linear equations from step 2 are used to correct the daily ensemble mean temperature anomaly for each validation year. Finally, the bias-corrected ensemble mean temperature is obtained by adding the mean observed temperature to the bias-corrected temperature anomalies.
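The bias-correction stage thus reduces to 365 windowed linear regressions on anomalies. A minimal sketch, assuming (n_years × 365) anomaly arrays and letting the 31-day window wrap around the ends of the year:

```python
import numpy as np

def fit_daily_corrections(fcst_anom, obs_anom, window=31):
    """fcst_anom, obs_anom: (n_years, 365) anomaly arrays.
    Returns 365 (slope, intercept) pairs from y = a*x + b, each fitted on a
    31-day window centered on the day of interest, pooled over all years."""
    half = window // 2
    coeffs = np.empty((365, 2))
    for d in range(365):
        idx = np.arange(d - half, d + half + 1) % 365   # wrap at year ends
        x = fcst_anom[:, idx].ravel()
        y = obs_anom[:, idx].ravel()
        coeffs[d] = np.polyfit(x, y, 1)                  # [slope a, intercept b]
    return coeffs

def correct(fcst_anom_day, d, coeffs, obs_daily_mean):
    """Correct one day's ensemble mean anomaly, then restore the climatology."""
    a, b = coeffs[d]
    return a * fcst_anom_day + b + obs_daily_mean[d]
```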

A scatterplot of the ensemble mean temperature before and after BC is plotted against the corresponding observed temperature for 1 lead day over the two selected watersheds (Fig. 3). In this case, all 25 years of raw and corrected mean forecasts are pooled together rather than separated by 31-day windows. Only a slight bias is observed for the raw GFS ensemble mean temperature for both watersheds, as displayed in Figs. 3a and 3c, where the linear regression line deviates slightly from the 1:1 line. This bias is removed by the linear regression method, as shown in Figs. 3b and 3d, where the linear regression and 1:1 lines overlap.

Fig. 3. The relationships between observed temperature and (left) GFS and (right) GPP ensemble mean temperature for 1 lead day over the (a),(b) CDD and (c),(d) YAM watersheds [linear regression (LR)].

After the BC of the ensemble mean temperature, the ensemble spread is added using a stochastic weather generator–based method. The ensemble temperature of any given day is assumed to follow a two-parameter (mean and standard deviation) normal distribution. The bias-corrected ensemble mean temperature is used as the mean of the normal distribution. The standard deviation for each season (the same standard deviation is used for every day in a given season) is obtained using an optimization algorithm that minimizes the root-mean-square error (RMSE) of the rank histogram bins. Specifically, the optimization involves four steps. 1) A number of standard deviation values (ensemble spreads) are preset for each season; in this study, they range between 0.5° and 5°C with an interval of 0.05°C. 2) The ensemble temperature for every day in the season is calculated by multiplying each standard deviation by a normally distributed random number and adding the bias-corrected ensemble mean temperature. This step is repeated for all preset standard deviation values to obtain a set of candidate temperature ensembles. 3) Rank histograms are constructed for all candidate ensembles. 4) The RMSEs of the rank histogram bins are calculated for all histograms, and the standard deviation corresponding to the lowest RMSE is selected as the optimized value for the season. These four steps are repeated for all four seasons to obtain four optimized standard deviations for postprocessing the entire year. The standard deviation is optimized at the seasonal scale to ensure that there are enough samples to construct reliable rank histograms.
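This grid search is easy to reproduce; the sketch below assumes bias-corrected daily ensemble mean forecasts and matching observations for one season, and measures flatness as the RMSE of the bin counts about the count expected for a uniform histogram.

```python
import numpy as np

def rank_histogram(ens, obs):
    """ens: (n_days, n_members); obs: (n_days,).
    Bin counts of the rank of each observation within its ensemble."""
    ranks = np.sum(ens < obs[:, None], axis=1)            # rank 0 .. n_members
    return np.bincount(ranks, minlength=ens.shape[1] + 1)

def optimize_spread(mean_fcst, obs, n_members=15, rng=None):
    """Pick the seasonal standard deviation whose rank histogram is flattest."""
    if rng is None:
        rng = np.random.default_rng(0)
    flat = mean_fcst.size / (n_members + 1)               # uniform bin count
    best_sd, best_rmse = None, np.inf
    for sd in np.arange(0.5, 5.0 + 1e-9, 0.05):           # preset spreads (deg C)
        noise = rng.standard_normal((mean_fcst.size, n_members))
        ens = mean_fcst[:, None] + sd * noise
        counts = rank_histogram(ens, obs)
        rmse = np.sqrt(np.mean((counts - flat) ** 2))
        if rmse < best_rmse:
            best_sd, best_rmse = sd, rmse
    return best_sd
```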

For any given day, the postprocessed ensemble temperature is found by multiplying the optimized standard deviation for the season by normally distributed random numbers (1000 in this study, one per member) and adding the bias-corrected ensemble mean temperature for that day. However, the ensemble temperature generated this way lacks an autocorrelation structure. For hydrological studies, autocorrelated time series of Tmax and Tmin are usually needed to run hydrological models. Applying a technique similar to that used in weather generators, the observed auto- and cross correlation of and between Tmax and Tmin can be preserved using a first-order linear autoregressive model. With this model, the Tmax and Tmin ensembles over several lead days are generated at the same time, rather than day by day and variable by variable.

The ensemble mean Tmax and Tmin are first obtained by adding the mean observed Tmax and Tmin to the bias-corrected temperature anomalies (obtained from step 2), respectively, for all lead days. The residual series of Tmax and Tmin with the desired auto- and cross correlation are then generated using a first-order linear autoregressive model:

$$\chi_i(j) = A\,\chi_{i-1}(j) + B\,\epsilon_i(j), \qquad (1)$$

where $\chi_i(j)$ is a (2 × 1) vector for lead day $i$ whose elements are the residuals of the generated Tmax ($j = 1$) and Tmin ($j = 2$), and $\epsilon_i(j)$ is a (2 × 1) vector of independent random components that are normally distributed with a mean of zero and a variance of unity. Here $A$ and $B$ are (2 × 2) matrices whose elements are defined such that the new sequences have the desired auto- and cross-correlation coefficients. The $A$ and $B$ matrices are determined by

$$A = M_1 M_0^{-1}, \qquad (2)$$

$$B B^{\mathrm{T}} = M_0 - M_1 M_0^{-1} M_1^{\mathrm{T}}, \qquad (3)$$

where the superscripts −1 and T denote the inverse and transpose of the matrix, respectively, and $M_0$ and $M_1$ are the lag-0 and lag-1 covariance matrices calculated from the observed time series for each season. With Eq. (1), a number of residual series over all lead days are generated to represent the ensemble members. Finally, the ensemble Tmax and Tmin over several days are found by multiplying the optimized standard deviation for the season by the generated residual series and adding the bias-corrected ensemble mean Tmax and Tmin.
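Equations (1)–(3) take only a few lines to implement. The sketch below estimates $M_0$ and $M_1$ from standardized (zero-mean, unit-variance) observed residual series and takes $B$ as a Cholesky factor; any matrix satisfying Eq. (3) would do, and positive definiteness of the right-hand side is assumed, as it is for well-behaved data.

```python
import numpy as np

def ar1_matrices(tmax_res, tmin_res):
    """A and B from the observed lag-0 and lag-1 covariance matrices.
    Inputs are standardized residual series (zero mean, unit variance)."""
    x = np.vstack([tmax_res, tmin_res])                 # shape (2, n_days)
    m0 = np.cov(x)                                      # lag-0 covariance
    m1 = (x[:, 1:] @ x[:, :-1].T) / (x.shape[1] - 1)    # lag-1 covariance
    a = m1 @ np.linalg.inv(m0)                          # Eq. (2)
    bbt = m0 - m1 @ np.linalg.inv(m0) @ m1.T            # Eq. (3), B @ B.T
    b = np.linalg.cholesky(bbt)
    return a, b

def generate_residuals(a, b, n_lead, n_members, rng=None):
    """chi_i = A chi_{i-1} + B eps_i, vectorized over ensemble members."""
    if rng is None:
        rng = np.random.default_rng(0)
    chi = np.zeros((n_members, 2))
    out = np.empty((n_lead, n_members, 2))
    for i in range(n_lead):
        eps = rng.standard_normal((n_members, 2))
        chi = chi @ a.T + eps @ b.T
        out[i] = chi
    return out
```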

The first-order linear autoregressive model has been tested extensively in several studies (e.g., Richardson 1981; Chen et al. 2011, 2012) and has shown good performance at preserving the desired auto- and cross correlation of and between Tmax and Tmin. Since the main goal of this paper is to present the postprocessing method, only results involving mean temperature are presented.

c. Bias correction (BC) method

A simple BC method is used as a benchmark to demonstrate the advantages of the proposed GPP method. The BC step for temperature is similar to that of the GPP method. Linear equations of the form y = ax + b (where a and b are two estimated coefficients) are fitted between observed and forecasted temperature anomalies using a 31-day window centered on the day of interest. The fitted linear equations are then used to correct the daily ensemble temperature anomaly for all 15 members. This step assumes that all ensemble members have the same bias. The variance optimization stage of the GPP method is not applied to the BC method; as such, the comparison can be expected to outline the advantages of the GPP method over the simpler direct bias correction.

A bias correction procedure is also applied to the ensemble precipitation forecast. Linear equations of the form y = ax (where a is the estimated coefficient) are fitted between the observed and forecasted mean precipitation using a 31-day window centered on the day of interest. It differs from the temperature correction in that the linear equation for precipitation is fitted using mean precipitation values and not the daily values. This results in a more reliable estimation of the linear dependence between observed and forecasted values. Moreover, since the distribution of the daily precipitation is highly skewed, a fourth root transformation was applied to precipitation values prior to fitting the linear equations. Similarly to the temperature, for a given day, the same linear equation is used for all ensemble members.
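A sketch of the precipitation version, assuming daily mean series: the slope of y = ax (no intercept) is fitted by least squares on fourth-root transformed values, and the same correction is applied to every member before back-transforming.

```python
import numpy as np

def fit_precip_slope(fcst_mean, obs_mean):
    """Least-squares slope through the origin on fourth-root transforms."""
    x = fcst_mean ** 0.25
    y = obs_mean ** 0.25
    return np.sum(x * y) / np.sum(x * x)

def correct_members(members, a):
    """Apply the same slope to all ensemble members, then back-transform."""
    return (a * members ** 0.25) ** 4
```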

d. Verification of the postprocessing method

Rank histograms permit a quick examination of the quality of ensemble weather forecasts. Consistent biases in an ensemble weather forecast result in a sloped rank histogram, and a lack of variability (underdispersion) is revealed as a U-shaped (concave) population of the ranks (Hamill 2001). Thus, the rank histogram is first used to evaluate the ensemble precipitation and temperature forecasts. However, a uniform rank histogram is a necessary but not sufficient criterion for determining the reliability of an ensemble forecast system (Hamill 2001). Moreover, rank histograms do not evaluate some characteristics, such as resolution. Other verification metrics are thus necessary for testing the predictive power of an ensemble weather forecast. In this study, the GFS, BC, and GPP ensemble precipitation and temperature forecasts are verified using the Ensemble Verification System (EVS) developed by Brown et al. (2010). The selected verification metrics include two deterministic metrics for verifying the ensemble mean, and two probabilistic metrics for verifying the distribution. The continuous ranked probability skill score (CRPSS) and the Brier skill score (BSS) are also used to verify the skill of the ensemble forecasts relative to climatology.

The two deterministic metrics are the mean absolute error (MAE) and the RMSE. The MAE measures the mean absolute difference between the ensemble mean forecast and the corresponding observation, and the RMSE is the square root of the average squared error of the ensemble mean forecast. The two probabilistic metrics are the BS and the reliability diagram. The BS measures the average squared error of a probability forecast; it is analogous to the mean squared error of a deterministic forecast and can be decomposed into three components: reliability, resolution, and uncertainty. A reliability diagram measures the accuracy with which a discrete event is forecast by an ensemble forecast system. The BS and reliability diagram verify only discrete events in the continuous forecast distributions; thus, one or more thresholds have to be defined to represent cutoff values from which discrete events are computed. Six thresholds, corresponding to the probability of precipitation and temperature exceeding the 10% (lower decile), 33% (lower tercile), 50% (median), 67% (upper tercile), 90% (upper decile), and 95% (95th percentile) climatological quantiles, are used in this study. Details of these metrics can be found in Brown et al. (2010), Demargne et al. (2010), and the EVS user manual (Brown 2012) (http://amazon.nws.noaa.gov/ohd/evs/evs.html).
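Of these metrics, the ensemble form of the continuous ranked probability score has a convenient closed form, CRPS = E|X − y| − ½E|X − X′|, where X and X′ are independent members and y is the observation; a small sketch follows, with the skill score defined against a climatological reference.

```python
import numpy as np

def crps_ensemble(ens, y):
    """Empirical CRPS for one forecast: E|X - y| - 0.5 E|X - X'|."""
    term1 = np.mean(np.abs(ens - y))
    term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2

def crpss(crps_forecast, crps_climatology):
    """Skill relative to climatology; positive means better than climatology."""
    return 1.0 - crps_forecast / crps_climatology
```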

4. Results

Figure 4 presents the rank histograms of the GFS and GPP ensemble precipitation forecasts for 1 lead day over the two watersheds. Only wet-day precipitation is used to produce the rank histograms. To allow a proper comparison with the raw ensemble forecasts, only 15 members are generated using the GPP method in this case. The results show that the distributions of the raw GFS ensemble forecasts are highly nonuniform, with a marked tendency for the distribution to be most populated at the extreme ranks, forming U-shaped rank histograms (Figs. 4a,c). This indicates that the raw GFS forecasts are considerably underdispersive for both watersheds. Wet biases are observed for the CDD watershed and dry biases for the YAM watershed. After calibration with the GPP method, however, the rank histograms are much flatter for both watersheds (Figs. 4b,d), even though only 15 members were generated in this case. Using more members would result in even more uniform rank histograms.

Fig. 4. Rank histograms of the (left) GFS and (right) GPP ensemble precipitation forecasts for 1 lead day over the (a),(b) CDD and (c),(d) YAM watersheds.

Figure 5 shows the rank histograms of ensemble temperature forecasts before and after calibration for 1 lead day over the two watersheds. Similarly to precipitation, to allow for a fair comparison with the raw ensemble forecasts, only 15 members are generated for the GPP ensemble forecasts in this case. The results show that the distribution of raw GFS ensemble forecasts is highly nonuniform (U shaped) for temperature. There is a marked tendency for the distribution to be most populated at the extreme ranks, indicating the underdispersion and cold bias of the raw forecasts over the two watersheds. However, rank histograms of calibrated ensemble forecasts tend to be uniform for both watersheds.

Fig. 5. Rank histograms of the (left) GFS and (right) GPP ensemble temperature forecasts for 1 lead day over the (a),(b) CDD and (c),(d) YAM watersheds.

Figures 6 and 7 show the quality of the ensemble mean forecasts before and after postprocessing in terms of the MAE and RMSE, respectively, for both precipitation and temperature over both watersheds. Both statistics are computed using all forecast–observation pairs (25 yr × 365 days). Overall, the GFS ensemble mean forecasts display large errors for both precipitation and temperature at lead times of 1–7 days. The GPP method consistently improves the quality of the ensemble mean forecasts for all leads. In terms of the MAE, the BC method displays more benefit than the GPP method for precipitation over both watersheds; this is expected, since the BC method specifically targets the bias of the GFS forecast. In terms of the RMSE, however, the GPP method consistently performs better than the BC method for precipitation. Since the BC and GPP methods share the same step for removing the bias of the ensemble mean temperature, the MAE and RMSE of the forecast temperature are the same for both.

Fig. 6. MAE of GFS, GPP, and BC ensemble mean (left) precipitation and (right) temperature forecasts for 1–7 lead days over the (a),(b) CDD and (c),(d) YAM watersheds.

Fig. 7. RMSE of GFS, GPP, and BC ensemble mean (left) precipitation and (right) temperature forecasts for 1–7 lead days over the (a),(b) CDD and (c),(d) YAM watersheds.

As displayed in Figs. 6a and 6c, the quality of the raw ensemble mean forecasts decreases slightly with increasing lead time for precipitation in terms of the MAE. The RMSE of the raw ensemble mean precipitation forecasts, however, tends to decrease with increasing lead time (Figs. 7a,c). After postprocessing, the forecast quality decreases slightly with increasing lead time. For the ensemble mean temperature, there is a progressive decline in forecast quality with increasing lead time in terms of both the MAE and the RMSE.

Moreover, the quality of the ensemble mean forecast at the CDD watershed is consistently better than that at the YAM watershed for precipitation, suggesting that watershed size plays an important role. This likely indicates that the numerical weather forecast system is better at representing precipitation events over a larger area, since the representation of convective events is very difficult considering the horizontal resolution of the computational grid. In this work, the observed precipitation is watershed averaged, and as such, convective precipitation extremes are smoothed over the larger basin. The same extremes would play a more important role in a smaller watershed.

The skill of the ensemble forecasts relative to unskilled climatology is assessed using the mean continuous ranked probability skill score (MCRPSS; Fig. 8). The GFS ensemble precipitation forecasts show negative skill relative to climatology over both watersheds. Their skill consistently increases with forecast lead, which is caused primarily by the lack of spread (greater sharpness) in shorter-lead ensemble forecasts and the larger spread in longer-lead ensemble forecasts. Even though the BC method improves the ensemble precipitation forecast to a certain extent, the skill remains negative for all 7 lead days. The GPP method considerably increases the skill of the ensemble forecast for both watersheds and is consistently better than the BC method. The skill of the GPP forecast decreases with increasing lead time and is close to zero at the 7-day lead, indicating that the ensemble weather forecast has reached its predictability limit. Thus, the calibration of ensemble precipitation forecasts over a period of 7 lead days is appropriate in this study.

Fig. 8. MCRPSS of GFS, GPP, and BC ensemble (left) precipitation and (right) temperature forecasts for 1–7 lead days over the (a),(b) CDD and (c),(d) YAM watersheds.

The GFS ensemble temperature forecasts are much more skillful than the precipitation forecasts for all leads over both watersheds. Even though the GFS ensemble temperature forecasts are skillful for the period up to 1 week, they can be further improved by both the GPP and BC methods. The GPP ensemble temperature forecasts are consistently better than the BC ones for all 7 lead days and both watersheds, indicating that the benefits of the GPP method come not only from the BC stage, but also from the variance optimization stage, with the BC stage playing a slightly more important role in improving the raw forecasts. Moreover, the skill of the ensemble temperature forecasts (before and after postprocessing) consistently decreases with increasing lead time for both watersheds.

For probabilistic metrics computed for discrete events, such as the BS, BSS, and reliability diagrams used in this study, a number of thresholds have been defined. As mentioned earlier, six thresholds were used in this study. Since similar patterns are obtained, only the results with the threshold exceeding the median are presented for illustration for all four metrics.

Figures 9 and 10 show the BS of the GFS, GPP, and BC ensemble precipitation and temperature forecasts for both watersheds, with leads ranging between 1 and 7 days. The reliability, resolution, and uncertainty components of the BS, and the BSS, which measures the performance of an ensemble weather forecast relative to climatology, are also presented. The reliability term of the BS measures how close the forecast probabilities are to the true probabilities, with smaller values indicating a better forecast system. The resolution term measures how much the predicted probabilities differ from the climatological average and therefore contribute valuable information, so a larger resolution value indicates a better forecast. By definition, the uncertainty term of the BS is always equal to 0.25 [0.5 × (1 − 0.5)] when the median is used as the threshold.
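The Murphy decomposition used here, BS = REL − RES + UNC, can be computed directly from binned forecast probabilities; a sketch follows, where `p` holds forecast probabilities (for an ensemble, the fraction of members exceeding the event threshold) and `o` holds binary outcomes. With the median threshold, `o.mean()` is 0.5 and the uncertainty term is 0.25, as noted above.

```python
import numpy as np

def brier_decomposition(p, o, n_bins=11):
    """Murphy decomposition of the Brier score: BS = REL - RES + UNC."""
    bs = np.mean((p - o) ** 2)
    obar = o.mean()
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    which = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    rel = res = 0.0
    for k in range(n_bins):
        m = which == k
        n_k = m.sum()
        if n_k == 0:
            continue
        rel += n_k * (p[m].mean() - o[m].mean()) ** 2   # reliability term
        res += n_k * (o[m].mean() - obar) ** 2          # resolution term
    n = len(p)
    return bs, rel / n, res / n, obar * (1.0 - obar)    # BS, REL, RES, UNC
```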

Fig. 9. BS and its decomposed components [Brier score uncertainty (BSunc), Brier score reliability (BSrel), and Brier score resolution (BSres)] of (left) GFS, (middle) GPP, and (right) BC ensemble precipitation forecasts for 1–7 lead days over the (a)–(c) CDD and (d)–(f) YAM watersheds. Probability exceeding 50% (median) is used as the threshold. The BSS of the ensemble forecasts relative to the climatology is also presented.

Fig. 10. As in Fig. 9, but for ensemble temperature forecasts.

In terms of the BS, the ensemble forecasts are less accurate overall for precipitation (Fig. 9) and reasonably accurate for temperature (Fig. 10) for both watersheds. The BC method performs slightly better for the temperature forecasts, but shows no improvement for the ensemble precipitation forecasts. The GPP method, by contrast, consistently increases the accuracy for both precipitation and temperature for all 7 lead days and both watersheds, with a consistent increase in the resolution component of the BS. In addition, the reliability component of the BS is improved for the ensemble precipitation forecasts at all lead times. For the ensemble temperature forecasts, the reliability component is slightly degraded at all lead times, because the raw ensemble forecasts are already very reliable for temperature (mean reliability component of 0.005 for the CDD watershed and 0.003 for the YAM watershed). The moderate decrease in the BS is due to a relatively large increase in the resolution component combined with this slight deterioration of the reliability component.

In terms of the BSS, the skill of the GFS ensemble precipitation forecast is negative for all 7 lead days. The BC method yields small improvements for the precipitation forecasts, but only for the first few lead days; it then becomes progressively worse than the GFS forecasts at the remaining leads. The GPP method considerably improves the skill of the ensemble forecast for both watersheds and is consistently better than the BC method. The skill of the GPP ensemble forecast decreases with increasing lead time, with the BSS close to zero at 7 lead days, further indicating that the ensemble weather forecast retains some skill for a period of up to 1 week for precipitation. The BSS shows high skill for the GFS ensemble temperature forecasts at all lead times and for both watersheds. The BC method slightly improves the skill of the ensemble temperature forecast, but at the expense of the resolution. The GPP ensemble forecast, however, consistently exceeds the skill of the GFS and BC ensemble forecasts. In particular, the BSS of the GPP ensemble forecast at 7 lead days is greater than that of the GFS forecast at 1 lead day for the CDD watershed.

The reliability diagram (Hartmann et al. 2002) is a graph of the observed frequency of an event plotted against its forecast probability. It provides information on the reliability, resolution, skill, and sharpness of a forecast system. Figure 11 presents the reliability diagrams of the ensemble precipitation and temperature forecasts before and after postprocessing, for a probability threshold exceeding the median, at 1 and 3 lead days over both watersheds. The underdispersion of the raw ensemble precipitation forecast (Fig. 4) shows up in the reliability diagrams (Figs. 11a,c) for both 1 and 3 lead days over both watersheds as reliability-curve slopes smaller than 1. This indicates that the GFS ensemble precipitation forecasts are poorly calibrated, with limited skill and resolution. The probability forecasts derived from the raw ensembles are overconfident, as revealed by the sharpness (the relative frequency of use of each probability bin). The BC method yields little improvement for the ensemble precipitation forecasts, essentially because it does not account for the spread. The GPP method dramatically improves the reliability of the ensemble precipitation forecasts for both 1 and 3 lead days, although the sharpness is somewhat lessened. Postprocessing thus improves the reliability at the expense of sharpness.
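The points of a reliability diagram come from the same probability binning as the BS decomposition above; a short sketch, again with `p` as forecast probabilities and `o` as binary outcomes, returning the bin counts that describe sharpness alongside each point.

```python
import numpy as np

def reliability_points(p, o, n_bins=10):
    """(mean forecast probability, observed frequency, bin count) per bin.
    The first two coordinates trace the reliability curve; the counts give
    the sharpness (relative frequency of use of each probability bin)."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    which = np.clip(np.digitize(p, edges) - 1, 0, n_bins - 1)
    points = []
    for k in range(n_bins):
        m = which == k
        if m.any():
            points.append((p[m].mean(), o[m].mean(), int(m.sum())))
    return points
```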

Fig. 11. Reliability diagrams of GFS, GPP, and BC ensemble precipitation and temperature forecasts for 1 and 3 lead days over the (a),(b) CDD and (c),(d) YAM watersheds. The probability of exceeding 50% (median) is used as the threshold.

The cold biases of the raw temperature forecast (Fig. 5) are also reflected in the reliability diagrams (Figs. 11b,d) for both 1 and 3 lead days over both watersheds, as displayed by the underforecasting. The ensemble temperature forecasts calibrated using the weather generator–based method are much more reliable than the GFS and BC forecasts for both 1 and 3 lead days over both watersheds, as indicated by reliability curves that lie very close to the 1:1 reference line. As with precipitation, the improvement in reliability comes with a slight decline in sharpness. The better performance of the GPP method over the BC method is a clear indication that a significant part of the performance is derived from the variance optimization stage.

5. Discussion and conclusions

Ensemble weather forecasts generally suffer from bias and tend to be underdispersive, which limits their predictive power. Several methods, such as logistic regression, BMA, and ensemble dressing, have been proposed for postprocessing these ensemble forecasts. These methods are relatively complex to set up and generally aim at estimating the underlying predictive PDFs. However, a series of point values is often more convenient for practical applications such as ensemble streamflow forecasting, which needs discrete, autocorrelated time series over several days in order to run hydrological models. These discrete, autocorrelated time series of precipitation and temperature need to be physically constrained; for example, temperature changes from one day to the next, and the probability of precipitation occurrence, are not random. Even if a method adequately reconstructs the underlying PDF, there is no simple way to go from the underlying distributions to generating several time series fully consistent with those distributions.

The GPP method presented in this study is significantly simpler to implement than most existing methods, and it can readily generate an infinite number of discrete, autocorrelated time series over the forecasting horizon. The auto- and cross correlation of and between Tmax and Tmin are specifically taken into account by this method. Moreover, the GPP method specifically accounts for the precipitation occurrence biases of ensemble forecasts. Precipitation amounts are modeled using a parametric distribution, an assumption that allows extreme values outside the range of the observed data to be simulated. A gamma distribution was used in this study; other distributions with a heavy tail (e.g., the mixed exponential distribution and the hybrid exponential and generalized Pareto distribution) could also be used to better represent the ensemble spread (C. Li et al. 2012; Z. Li et al. 2013). This is one of the major advantages of this method over analog techniques. Even though the Hamill et al. (2006) approach also constructed a distribution from the observations associated with the analogs, the inclusion of dry events makes it impossible to fit the parametric distributions. Furthermore, the performance of an analog technique depends strongly on the length of the reforecast period, especially for extreme precipitation forecasts: the rarer the events, the more difficult it is to find appropriate forecast analogs (Hamill et al. 2006). The GPP method avoids this problem, since the extremes of the ensemble forecasts are grouped into the last class of the ensemble mean forecast.

The ensemble mean precipitation and mean temperature anomalies are used as predictors for postprocessing precipitation and temperature, respectively. Since the spatial resolution of ensemble weather forecasts (model scale) is usually lower than the resolution required by real applications (e.g., the watershed scale for hydrological studies), the postprocessing also acts as a downscaling method.

A simple BC method was used as a benchmark to demonstrate the performance of the GPP method over two Canadian watersheds located in the Province of Quebec. Ensemble weather forecasts were taken from the GFS retrospective forecast dataset. Much longer time series are available from ensemble reforecasts than from operational forecasts, allowing the postprocessing method to be calibrated properly. Previous studies have convincingly shown that postprocessing using reforecasts achieves substantial improvements in skill and reliability (Hagedorn et al. 2008; Hamill et al. 2008).

The GFS and GPP ensemble weather forecasts were preliminarily tested using rank histograms. Similarly to previous studies (Hamill and Colucci 1997; Hagedorn et al. 2008), the GFS forecasts were found to be biased and underdispersed, as illustrated by the excess populations of the extreme ranks. This underdispersion was more pronounced at the shorter forecast leads than for longer forecast leads (results not shown). Uniform rank histograms could be achieved for both precipitation and temperature when postprocessed using the GPP method.

The performance of the GFS, GPP, and BC ensemble weather forecasts was further verified using both deterministic and probabilistic metrics. The deterministic metrics (MAE and RMSE) showed large errors in the GFS ensemble mean forecasts for both precipitation and temperature at all 7 lead days over both watersheds. The GPP method was able to consistently improve the quality of the ensemble mean forecasts. The skill of the ensemble weather forecasts relative to climatology was measured using the MCRPSS. The raw forecast had negative to near-zero skill at all forecast leads for precipitation. The GPP method substantially improved the skill of the ensemble precipitation forecasts, with the MCRPSS being positive for all 7 lead days. Even though relatively good skill was observed for the raw ensemble temperature forecasts, they also benefited from the postprocessing method. The performance of the GPP method was consistently better than that of the BC method.

Probabilistic metrics computed for discrete events, including the BS, BSS, and reliability diagrams, were further used to verify the overall performance (accuracy, skill, reliability, resolution, and sharpness) of the ensemble weather forecasts. Overall, the GPP method consistently improved the accuracy of the ensemble forecasts for both precipitation and temperature over both watersheds, and it consistently outperformed the BC method. The GFS ensemble forecasts showed negative skill for precipitation for all 7 lead days, indicating that the underdispersed GFS forecasts were even worse than climatology for precipitation. With the GPP method, however, positive skill was achieved for a period of up to 7 lead days. The GPP method also improved the skill of the ensemble temperature forecasts, even though these generally showed reasonable skill before postprocessing. The underdispersion of the GFS ensemble precipitation forecasts was reflected in the reliability diagrams, indicating that the GFS precipitation forecasts were poorly calibrated and showed little skill and resolution. The GPP method markedly improved their reliability and resolution for all leads over both watersheds, although the sharpness was somewhat diminished. This is consistent with other studies (e.g., Hamill et al. 2008) in which reliability was improved at the expense of sharpness. The reliability diagrams showed cold biases for the GFS ensemble temperature; after postprocessing, the reliability curves were very close to the 1:1 perfect line.

Overall, even though the GFS ensemble forecasts are biased and tend to be underdispersed, their performance was considerably improved by the proposed GPP method. Predictably, the performance of the GPP method decreased with increasing lead time. For the GFS ensemble reforecasts and the selected basins, 7 days was the maximum useful lead time for precipitation; for temperature, postprocessing over a longer period may be possible. The use of the BC method for temperature made it possible to separate the gains of the GPP method into contributions from the bias correction and variance optimization stages. The better performance of the GPP method clearly demonstrates the importance of the variance optimization stage, even though the bias correction accounts for the largest part of the performance gain.

Owing to paper-length restrictions, the only results shown for the probabilistic metrics (BS, BSS, and reliability diagrams) are those using the median value as the exceedance threshold. Higher thresholds are also of interest for many real-world applications, so the ensemble weather forecasts were additionally tested at higher percentile thresholds (67%, 90%, and 95%). While the results for the higher thresholds differ slightly from those obtained with the median, the patterns were very similar. Specifically, the skill of the GPP forecasts decreased slightly with increasing threshold, because the raw GFS forecasts perform worse at higher thresholds. However, the degree of improvement obtained from the GPP method increased with the threshold.

The excellent performance of the postprocessing scheme may be due in part to the use of basin-averaged meteorological time series. While still smaller than the scale of the numerical model, the basin scale (9700 and 3330 km²) is closer to it and may allow a better representation of precipitation than the more commonly used station scale. The method performed very well with only one predictor (the ensemble mean for precipitation and the ensemble mean anomaly for temperature). No attempt was made to use additional predictors; in particular, the ensemble standard deviation may yield additional improvements.

To sample the full distribution of the weather generator, a large number of ensemble members must be generated with the proposed method; short time series could be biased by the random nature of the stochastic process. Thus, a 1000-member ensemble was generated in this study. For hydrological studies, however, running such a large ensemble through a fully distributed model may be time consuming. As indicated by the rank histograms in Figs. 4 and 5, an ensemble with only 15 members was nevertheless better than the raw GFS forecast. Therefore, depending on the research purpose and the complexity of the hydrological model, an ensemble with fewer members may also be acceptable.
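
To make the ensemble-size point concrete, the sketch below draws an arbitrary number of synthetic daily precipitation members from generator parameters assumed to be already conditioned on the forecast. It is deliberately simplified: the actual GPP generator links the occurrence and amount parameters to the ensemble-mean forecast and handles temperature as well, so this should be read as a schematic of the sampling step only. Subsampling 15 of the 1000 rows of the returned array mimics the smaller ensemble discussed above.

import numpy as np

def generate_members(p_wet, gamma_shape, gamma_scale, n_members=1000, seed=0):
    """Draw synthetic daily precipitation members from forecast-conditioned
    generator parameters (illustrative sketch, not the full GPP generator).

    p_wet:       (n_days,) forecast-conditioned wet-day probabilities
    gamma_shape, gamma_scale: (n_days,) wet-day gamma parameters
    Returns an (n_members, n_days) array of daily precipitation.
    """
    rng = np.random.default_rng(seed)
    n_days = len(p_wet)
    wet = rng.random((n_members, n_days)) < p_wet          # occurrence
    amounts = rng.gamma(gamma_shape, gamma_scale,
                        size=(n_members, n_days))          # wet-day amounts
    return np.where(wet, amounts, 0.0)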

In the real climate system, precipitation and temperature are correlated: mean temperature is generally cooler on wet days. The proposed GPP method, however, generated precipitation and temperature independently, which may alter this correlation to a certain extent. To investigate this, the precipitation–temperature correlations were calculated for the observed and forecast datasets using all 25-yr daily time series. The correlation coefficients for the GFS, GPP, and BC forecasts were obtained by averaging the coefficients over all 7 lead days and all ensemble members. The correlation coefficients are 0.189, 0.363, 0.161, and 0.227 for the observed data and the GFS, GPP, and BC forecasts, respectively, over the CDD watershed; they are 0.119, 0.239, 0.088, and 0.069, respectively, over the YAM watershed. These results indicate that the GFS forecasts overestimated the precipitation–temperature correlation, whereas the GPP method slightly underestimated it. This is as expected, since the ensemble precipitation and temperature were generated independently. It should be noted, however, that any bias correction or postprocessing method can be expected to alter the precipitation–temperature correlation unless it is specifically taken into account.
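
The averaging described above can be expressed compactly; the array layout assumed below (lead × member × day) is an illustrative choice, not a statement of how the data were actually stored.

import numpy as np

def mean_pt_correlation(precip, temp):
    """Average Pearson correlation between daily precipitation and
    temperature across all ensemble members and lead times.

    precip, temp: (n_leads, n_members, n_days) forecast arrays.
    """
    coefs = []
    for lead in range(precip.shape[0]):
        for m in range(precip.shape[1]):
            # Per-member, per-lead correlation of the two daily series.
            coefs.append(np.corrcoef(precip[lead, m], temp[lead, m])[0, 1])
    return float(np.mean(coefs))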

The goal of this work was to provide a postprocessing method that improves ensemble weather forecasts for ensemble streamflow forecasting in the Province of Quebec. The proposed method was tested on two watersheds; for broader use, it should be tested on additional datasets from different climate zones. In addition, the performance of a postprocessing method may partly depend on the skill of the raw forecasts, so it may also be necessary to test the proposed method on other ensemble weather forecast products such as the ECMWF reforecasts. Notably, during the course of this work, a newer version of the GFS reforecast dataset became available, showing improved skill over the one used in this study (Hamill et al. 2013). It would be interesting to know how the GPP method performs on this newer dataset. More comprehensive verification, including evaluation of the proposed method at different locations and with other ensemble weather forecast products, is therefore recommended for future studies.

Acknowledgments

This work is part of a project funded by the Projet d’Adaptation aux Changements Climatiques (PACC26) of the Province of Quebec, Canada. The authors thank the Centre d’Expertise Hydrique du Québec (CEHQ) and the Ouranos Consortium on Regional Climatology and Adaptation for their contributions to this project. The authors wish to thank Dr. Vincent Fortin of Environment Canada for his insights on ensemble weather forecasts, and Dr. James D. Brown of the NOAA/National Weather Service, Office of Hydrologic Development, for assisting us in the use of the Ensemble Verification System (EVS) and for providing insightful comments as we prepared this paper. We also thank the Earth System Research Laboratory, Physical Sciences Division for providing reforecast products.

REFERENCES

Bertotti, L., J. R. Bidlot, R. Buizza, L. Cavaleri, and M. Janousek, 2011: Deterministic and ensemble-based prediction of Adriatic Sea sirocco storms leading to ‘acqua alta’ in Venice. Quart. J. Roy. Meteor. Soc., 137 (659), 1446–1466.

Boucher, M. A., F. Anctil, L. Perreault, and D. Tremblay, 2011: A comparison between ensemble and deterministic hydrological forecasts in an operational context. Adv. Geosci., 29, 85–94, doi:10.5194/adgeo-29-85-2011.

Bröcker, J., and L. A. Smith, 2008: From ensemble forecasts to predictive distribution functions. Tellus, 60A, 663–678.

Brown, J. D., 2012: Ensemble Verification System (EVS), version 4.0. User’s manual, 107 pp.

Brown, J. D., J. Demargne, D. J. Seo, and Y. Liu, 2010: The Ensemble Verification System (EVS): A software tool for verifying ensemble forecasts of hydrometeorological and hydrologic variables at discrete locations. Environ. Modell. Software, 25, 854–872.

Buizza, R., 1997: Potential forecast skill of ensemble prediction and spread and skill distributions of the ECMWF ensemble prediction system. Mon. Wea. Rev., 125, 99–119.

Buizza, R., P. L. Houtekamer, Z. Toth, G. Pellerin, M. Wei, and Y. Zhu, 2005: A comparison of the ECMWF, MSC, and NCEP Global Ensemble Prediction Systems. Mon. Wea. Rev., 133, 1076–1097.

Chen, J., F. P. Brissette, and R. Leconte, 2010: A daily stochastic weather generator for preserving low-frequency of climate variability. J. Hydrol., 388, 480–490.

Chen, J., F. P. Brissette, and R. Leconte, 2011: Assessment and improvement of stochastic weather generators in simulating maximum and minimum temperatures. Trans. ASABE, 54 (5), 1627–1637.

Chen, J., F. P. Brissette, R. Leconte, and A. Caron, 2012: A versatile weather generator for daily precipitation and temperature. Trans. ASABE, 55 (3), 895–906.

Coulibaly, P., 2003: Impact of meteorological predictions on real-time spring flow forecasting. Hydrol. Processes, 17 (18), 3791–3801.

Cui, B., Z. Toth, Y. Zhu, and D. Hou, 2012: Bias correction for global ensemble forecast. Wea. Forecasting, 27, 396–410.

Demargne, J., J. Brown, Y. Liu, D. J. Seo, L. Wu, Z. Toth, and Y. Zhu, 2010: Diagnostic verification of hydrometeorological and hydrologic ensembles. Atmos. Sci. Lett., 11, 114–122.

Eckel, F. A., and M. K. Walters, 1998: Calibrated probabilistic quantitative precipitation forecasts based on the MRF ensemble. Wea. Forecasting, 13, 1132–1147.

Gneiting, T., A. E. Raftery, A. H. Westveld III, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098–1118.

Hagedorn, R., T. Hamill, and J. S. Whitaker, 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part I: Two-meter temperatures. Mon. Wea. Rev., 136, 2608–2619.

Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Wea. Rev., 129, 550–560.

Hamill, T. M., and S. J. Colucci, 1997: Verification of Eta–RSM short-range ensemble forecasts. Mon. Wea. Rev., 125, 1312–1327.

Hamill, T. M., and S. J. Colucci, 1998: Evaluation of Eta–RSM ensemble probabilistic precipitation forecasts. Mon. Wea. Rev., 126, 711–724.

Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229.

Hamill, T. M., and J. S. Whitaker, 2007: Ensemble calibration of 500-hPa geopotential height and 850-hPa and 2-m temperatures using reforecasts. Mon. Wea. Rev., 135, 3273–3280.

Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434–1447.

Hamill, T. M., J. S. Whitaker, and S. L. Mullen, 2006: Reforecasts: An important dataset for improving weather predictions. Bull. Amer. Meteor. Soc., 87, 33–46.

Hamill, T. M., R. Hagedorn, and J. S. Whitaker, 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part II: Precipitation. Mon. Wea. Rev., 136, 2620–2632.

Hamill, T. M., G. T. Bates, J. S. Whitaker, D. R. Murray, M. Fiorino, T. J. Galarneau Jr., Y. J. Zhu, and W. Lapenta, 2013: NOAA’s second-generation global medium-range ensemble reforecast dataset. Bull. Amer. Meteor. Soc., 94, 1553–1565.

Hartmann, H. C., T. C. Pagano, S. Sorooshian, and R. Bales, 2002: Confidence builders: Evaluating seasonal climate forecasts from user perspectives. Bull. Amer. Meteor. Soc., 83, 683–698.

Hutchinson, M. F., D. W. McKenney, K. Lawrence, J. H. Pedlar, R. F. Hopkinson, E. Milewska, and P. Papadopol, 2009: Development and testing of Canada-wide interpolated spatial models of daily minimum–maximum temperature and precipitation for 1961–2003. J. Appl. Meteor. Climatol., 48, 725–741.

Li, C., V. P. Singh, and A. K. Mishra, 2012: Simulation of the entire range of daily precipitation using a hybrid probability distribution. Water Resour. Res., 48, W03521, doi:10.1029/2011WR011446.

Li, Z., F. Brissette, and J. Chen, 2013: Finding the most appropriate precipitation probability distribution for stochastic weather generation and hydrological modelling in Nordic watersheds. Hydrol. Processes, 27, 3718–3729, doi:10.1002/hyp.9499.

Nicks, A. D., and G. A. Gander, 1994: CLIGEN: A weather generator for climate inputs to water resource and other models. Proc. Fifth Int. Conf. on Computers in Agriculture, St. Joseph, MI, American Society of Agricultural Engineers, 394.

Pellerin, G., L. Lefaivre, P. Houtekamer, and C. Girard, 2003: Increasing the horizontal resolution of ensemble forecasts at CMC. Nonlinear Processes Geophys., 10, 463–468.

Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 1155–1174.

Richardson, C. W., 1981: Stochastic simulation of daily precipitation, temperature, and solar radiation. Water Resour. Res., 17, 182–190.

Richardson, D. S., 2001: Measures of skill and value of ensemble prediction systems, their interrelationship and the effect of sample size. Quart. J. Roy. Meteor. Soc., 127, 2473–2489.

Roulston, M. S., and L. A. Smith, 2003: Combining dynamical and statistical ensembles. Tellus, 55A, 16–30.

Schmeits, M. J., and K. J. Kok, 2010: A comparison between raw ensemble output, (modified) Bayesian model averaging, and extended logistic regression using ECMWF ensemble precipitation reforecasts. Mon. Wea. Rev., 138, 4199–4211.

Semenov, M. A., and E. M. Barrow, 2002: LARS-WG: A stochastic weather generator for use in climate impact studies. User manual, 28 pp. [Available online at www.rothamsted.ac.uk/mas-models/download/LARS-WG-Manual.pdf.]

Sloughter, J. M., A. E. Raftery, T. Gneiting, and C. Fraley, 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon. Wea. Rev., 135, 3209–3220.

Soltanzadeh, I., M. Azadi, and G. A. Vakili, 2011: Using Bayesian Model Averaging (BMA) to calibrate probabilistic surface temperature forecasts over Iran. Ann. Geophys., 29, 1295–1303.

Toth, Z., Y. Zhu, and T. Marchok, 2001: The use of ensembles to identify forecasts with small and large uncertainty. Wea. Forecasting, 16, 463–477.

Vrugt, J. A., M. P. Clark, C. G. H. Diks, Q. Duan, and B. A. Robinson, 2006: Multi-objective calibration of forecast ensembles using Bayesian model averaging. Geophys. Res. Lett., 33, L19817, doi:10.1029/2006GL027126.

Wang, X., and C. Bishop, 2005: Improvement of ensemble reliability with a new dressing kernel. Quart. J. Roy. Meteor. Soc., 131, 965–986.

Whitaker, J. S., X. Wei, and F. Vitart, 2006: Improving week-2 forecasts with multimodel reforecast ensembles. Mon. Wea. Rev., 134, 2279–2284.

Wilks, D. S., 2005: Statistical Methods in the Atmospheric Sciences. 3rd ed. Academic Press, 467 pp.

Wilks, D. S., 2006: Comparison of ensemble-MOS methods in the Lorenz ’96 setting. Meteor. Appl., 13, 243–256.

Wilks, D. S., 2009: Extending logistic regression to provide full-probability-distribution MOS forecasts. Meteor. Appl., 16, 361–368.

Wilks, D. S., and T. M. Hamill, 2007: Comparison of ensemble-MOS methods using GFS reforecasts. Mon. Wea. Rev., 135, 2379–2390.

Wilson, L. J., S. Beauregard, A. E. Raftery, and R. Verret, 2007: Calibrated surface temperature forecasts from the Canadian ensemble prediction system using Bayesian Model Averaging. Mon. Wea. Rev., 135, 1364–1385.
    • Export Citation
Fig. 1. Location map of the two catchments.

Fig. 2. The relationships between forecasted summer (JAS) precipitation classes and the probability of observed summer precipitation occurrence and mean wet-day precipitation amounts for 1 and 3 lead days over the (a),(b) CDD and (c),(d) YAM watersheds.

Fig. 3. The relationships between observed temperature and (left) GFS and (right) GPP ensemble mean temperature for 1 lead day over the (a),(b) CDD and (c),(d) YAM watersheds [linear regression (LR)].

Fig. 4. Rank histograms of the (left) GFS and (right) GPP ensemble precipitation forecasts for 1 lead day over the (a),(b) CDD and (c),(d) YAM watersheds.

Fig. 5. Rank histograms of the (left) GFS and (right) GPP ensemble temperature forecasts for 1 lead day over the (a),(b) CDD and (c),(d) YAM watersheds.

Fig. 6. MAE of GFS, GPP, and BC ensemble mean (left) precipitation and (right) temperature forecasts for 1–7 lead days over the (a),(b) CDD and (c),(d) YAM watersheds.

Fig. 7. RMSE of GFS, GPP, and BC ensemble mean (left) precipitation and (right) temperature forecasts for 1–7 lead days over the (a),(b) CDD and (c),(d) YAM watersheds.

Fig. 8. MCRPSS of GFS, GPP, and BC ensemble (left) precipitation and (right) temperature forecasts for 1–7 lead days over the (a),(b) CDD and (c),(d) YAM watersheds.

Fig. 9. BS and its decomposed components [Brier score uncertainty (BSunc), Brier score reliability (BSrel), and Brier score resolution (BSres)] of (left) GFS, (middle) GPP, and (right) BC ensemble precipitation forecasts for 1–7 lead days over the (a)–(c) CDD and (d)–(f) YAM watersheds. The probability of exceeding the 50% (median) value is used as the threshold. The BSS of the ensemble forecasts relative to the climatology is also presented.

Fig. 10. As in Fig. 9, but for ensemble temperature forecasts.

Fig. 11. Reliability diagrams of GFS, GPP, and BC ensemble precipitation and temperature forecasts for 1 and 3 lead days over the (a),(b) CDD and (c),(d) YAM watersheds. The probability of exceeding the 50% (median) value is used as the threshold.
