A Stratified Sampling Approach for Improved Sampling from a Calibrated Ensemble Forecast Distribution

Yiming Hu College of Hydrology and Water Resources, Hohai University, Nanjing, China
UNESCO-IHE Institute for Water Education, Delft, Netherlands

Maurice J. Schmeits Royal Netherlands Meteorological Institute, De Bilt, Netherlands

Schalk Jan van Andel UNESCO-IHE Institute for Water Education, Delft, Netherlands

Jan S. Verkade Deltares, Delft, Netherlands

Min Xu Deltares, Delft, Netherlands

Dimitri P. Solomatine UNESCO-IHE Institute for Water Education, Delft, Netherlands
Water Resources Section, Delft University of Technology, Delft, Netherlands

Zhongmin Liang National Cooperative Innovation Centre for Water Safety and Hydro-Science, and State Key Laboratory of Hydrology-Water Resources and Hydraulic Engineering, Hohai University, Nanjing, China


Abstract

Before the Schaake shuffle or empirical copula coupling (ECC) can be used to reconstruct the dependence structure of postprocessed ensemble meteorological forecasts, discrete samples must be drawn from each postprocessed continuous probability density function (pdf); this sampling step is the focus of this paper. In addition to the equidistance quantiles (EQ) and independent random (IR) sampling methods in common use, the stratified sampling (SS) method is proposed. The performance of the three sampling methods is compared using calibrated GFS ensemble precipitation reforecasts over the Xixian basin in China. The ensemble reforecasts are first calibrated using heteroscedastic extended logistic regression (HELR), and the three sampling methods are then used to sample the calibrated pdfs with a varying number of discrete samples. Finally, the effect of the sampling method on the reconstruction of ensemble members with a preserved spatial dependence structure is analyzed by using EQ, IR, and SS in ECC to reconstruct postprocessed ensemble members for four stations in the Xixian basin. There are three main results. 1) The HELR model significantly improves on the raw ensemble forecast; it clearly improves the mean and dispersion of the predictive distribution. 2) Compared to EQ and IR, SS covers the tails of the calibrated pdfs better, and a better dispersion of the calibrated ensemble forecasts is obtained. In terms of probabilistic verification metrics such as the ranked probability skill score (RPSS), SS is slightly better than EQ and clearly better than IR, while in terms of the deterministic verification metric, root-mean-square error, EQ is slightly better than SS. 3) ECC-SS, ECC-EQ, and ECC-IR all calibrate the raw ensemble forecast, but ECC-SS shows a better dispersion than ECC-EQ and ECC-IR in this study.

Corresponding author address: Dr. Yiming Hu, College of Hydrology and Water Resources, Hohai University, No. 1 Xikang Road, Nanjing City, Nanjing 210098, China. E-mail: hymkyan@163.com

1. Introduction

The advantages of ensemble forecasting have been widely recognized, as it takes into account uncertainty and provides a probabilistic forecast, for example, by running a forecast model several times with different initial conditions and often also with differences in model physics or configurations to produce a range of possible outcomes. However, because the perturbed initial conditions do not fully reflect the uncertainty of initial conditions and the forecasting models do not perfectly represent the real physics, the ensemble forecast is often subject to bias and underdispersion (Scherrer et al. 2004; Hamill and Colucci 1997; Messner et al. 2014a; Wilks 2015). Therefore, statistical postprocessing methods have been developed to calibrate the raw ensemble forecasts, such as Gaussian regression (Gneiting et al. 2005; Hagedorn et al. 2008; Kann et al. 2009; Williams et al. 2014), (modified) Bayesian model averaging (Raftery et al. 2005; Sloughter et al. 2007; Hamill 2007; Schmeits and Kok 2010), logistic regression (Hamill et al. 2004, 2008; Wilks and Hamill 2007), ensemble dressing (Roulston and Smith 2003), extended logistic regression (Wilks 2009; Schmeits and Kok 2010), and heteroscedastic extended logistic regression (Messner et al. 2014a,b).

The above-mentioned methods apply only to univariate predictands and cannot account for the dependence structure between different variables, stations, and lead times, which can have a significant impact on, for example, hydrological applications (Clark et al. 2004; Wilks 2015). To reconstruct the dependence structure from a postprocessed predictive probability distribution, methods such as the Schaake shuffle (Clark et al. 2004) and empirical copula coupling (ECC; Schefzik et al. 2013) have been proposed. In the Schaake shuffle, the dependence structure is reconstructed from the historical climatology, while ECC relies on the raw forecast ensembles. In both methods, a necessary and critical step is to draw discrete samples from each continuous postprocessed predictive distribution. It is this sampling process that we address in this paper. At present, the equidistance quantiles (EQ) and independent random (IR) sampling methods are commonly used (Schefzik et al. 2013; Wilks 2015). The studies by Schefzik et al. (2013) and Wilks (2015) showed that EQ sampling is preferable to IR sampling.

In this paper, we propose an alternative sampling method, stratified sampling (SS), for drawing discrete samples from each continuous postprocessed predictive distribution. The SS method is able to sample the tails of the calibrated probability density function (pdf) better than EQ. It has been widely used in other applications (Claggett et al. 2010; Wallenius et al. 2011; Noble et al. 2012; Padilla et al. 2014; Ding and Lee 2014). The performance of SS in representing a postprocessed forecast probability distribution and in ECC is assessed and compared with that of EQ and IR sampling. The methods are applied to GFS ensemble reforecasts of precipitation (Hamill et al. 2013) over the Xixian subbasin of the Huai River catchment in China.

2. Methods used

a. Statistical postprocessing with HELR

The raw ensemble forecasts are first postprocessed with heteroscedastic extended logistic regression (HELR). This method was recently shown to have a good performance in calibrating ensemble precipitation forecast pdfs (Messner et al. 2014b). The HELR method uses the ensemble spread as a scale parameter to adjust the dispersion of the predictive distribution. The HELR function can be expressed as (Messner et al. 2014a)
$$p = P(y \le q) = \frac{\exp\{[g(q) - \mu]/\sigma\}}{1 + \exp\{[g(q) - \mu]/\sigma\}}, \tag{1}$$
where q is the quantile; p is the nonexceedance probability; y is the variable of interest; g indicates a function; μ is the location parameter; and σ is the scale parameter, which can be obtained by
$$\mu = X^{T}\gamma, \qquad \sigma = \exp(H^{T}\delta), \tag{2}$$
where X and H are predictor vectors, H includes ensemble spread, and $\gamma$ and $\delta$ are coefficient vectors. The exponential function in Eq. (2) is to ensure positive values of $\sigma$. If $\sigma = 1$, HELR reduces to extended logistic regression (ELR; Messner et al. 2014a).

Wilks (2009), Schmeits and Kok (2010), and Roulin and Vannitsem (2012) showed that the function $g(q) = a\sqrt{q}$ in Eq. (1), where a is a coefficient, leads to well-calibrated results for precipitation forecasts. In section 4a, information about the predictor vectors X and H in μ and σ is given.

b. Sampling methods

After postprocessing the raw ensemble forecasts using HELR, the continuous postprocessed predictive distributions are obtained. If we want to input the calibrated ensemble forecasts to drive the hydrological model for ensemble streamflow prediction, a necessary and key step is to obtain discrete samples from the continuous postprocessed predictive distributions using a sampling method. At present, the EQ and IR sampling methods are commonly used (Schefzik et al. 2013; Wilks 2015). In this study, the SS method (Ding and Lee 2014) is also used.

For a given postprocessed predictive distribution F, the EQ approach is expressed by (Schefzik et al. 2013)
$$x_i = F^{-1}\left(\frac{i}{n+1}\right), \quad i = 1, \ldots, n, \tag{3}$$
where n is the sample number and $F^{-1}$ is the inverse of the distribution F.
Independent random sampling is expressed by (Schefzik et al. 2013)
$$x_i = F^{-1}(u_i), \quad i = 1, \ldots, n, \tag{4}$$
where $u_1, \ldots, u_n$ are independent standard uniform random numbers.
The SS method can be carried out in the following way to draw discrete samples (size n) from distribution F (Ding and Lee 2014). The first step is to divide the range (0, 1] into n disjoint intervals, that is, $(0, 1/n], (1/n, 2/n], \ldots, ((n-1)/n, 1]$, which makes the probability of a given event occurring in each interval equal to $1/n$; the second step is to sample a random number from each interval to generate samples $v_1, \ldots, v_n$. For example, $v_1$ is selected at random from the first interval $(0, 1/n]$, and $v_2$ is selected at random from the second interval $(1/n, 2/n]$. Then, in the same way as in Eq. (4), the samples can be obtained:
$$x_i = F^{-1}(v_i), \quad i = 1, \ldots, n. \tag{5}$$
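The three sampling schemes can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names (`eq_levels`, `ir_levels`, `ss_levels`, `sample`) are hypothetical, and the sketch assumes that EQ uses the probability levels i/(n + 1), that IR uses independent standard uniform levels, and that SS draws one uniform level inside each of n equal-width strata of (0, 1].

```python
import numpy as np

def eq_levels(n):
    """Equidistant probability levels i/(n+1), i = 1..n (fixed, no randomness)."""
    return np.arange(1, n + 1) / (n + 1)

def ir_levels(n, rng):
    """n independent standard uniform probability levels."""
    return rng.uniform(0.0, 1.0, size=n)

def ss_levels(n, rng):
    """One uniform level in each stratum ((i-1)/n, i/n], i = 1..n."""
    # rng.uniform returns values in [0, 1), so 1 - u lies in (0, 1].
    offsets = 1.0 - rng.uniform(0.0, 1.0, size=n)
    return (np.arange(n) + offsets) / n

def sample(quantile_fn, levels):
    """Map probability levels to values through the inverse cdf F^{-1}."""
    # Note: a level of exactly 1 (possible but very unlikely in SS) maps to
    # the upper end of the support, which may be infinite for some F.
    return quantile_fn(levels)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    logistic_quantile = lambda p: np.log(p / (1.0 - p))  # hypothetical F^{-1}
    for name, levels in [("EQ", eq_levels(11)),
                         ("IR", ir_levels(11, rng)),
                         ("SS", ss_levels(11, rng))]:
        values = sample(logistic_quantile, levels)
        print(name, "level range:", levels.min(), levels.max(),
              "value range:", values.min(), values.max())
```

The standard logistic quantile function in the usage part is only a stand-in for a calibrated predictive distribution; any vectorized inverse cdf can be passed to `sample`.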

The differences among the SS, EQ, and IR approaches can be demonstrated by Fig. 1, where the number of samples sampled from the standard uniform distribution is n = 11 and the number of repeated implementations is 1000. The quantiles are ranked from smallest to largest, one for each implementation. It can be seen that for EQ, the quantile series is fixed [Eq. (3)]. Compared to IR, SS is more representative for the standard uniform distribution because the quantile series of SS always evenly cover the range from 0 to 1, while for IR, the range of samples sometimes is very narrow, such as [0.5, 0.8], failing to properly represent the range from 0 to 1.

Fig. 1. The characteristics of EQ, IR, and SS. Each time, 11 samples are drawn from a standard uniform distribution and then ranked from smallest to largest. The implementation is repeated 1000 times. For the EQ approach, the 11 samples are fixed at $1/12, 2/12, \ldots, 11/12$ in all 1000 implementations.

Citation: Journal of Hydrometeorology 17, 9; 10.1175/JHM-D-15-0205.1

In theory, compared to the EQ method, the SS method better samples the tails of the continuous distribution. From Eq. (3), it is clear that EQ cannot sample the intervals $(0, 1/(n+1))$ and $(n/(n+1), 1)$. SS, in contrast, covers these two intervals better by drawing samples from them. Therefore, compared to the EQ method, the SS method can increase the spread of the ensemble forecasts sampled from the calibrated continuous distribution.
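As a small worked check of this argument (our own arithmetic, not taken from the paper), consider n = 11, assume that EQ uses the probability levels i/(n + 1), and let $v_1$ denote the SS draw from the lowest stratum $(0, 1/n]$:

$$
\text{EQ: lowest level} = \frac{1}{n+1} = \frac{1}{12} \approx 0.083,
\qquad
\text{SS: } E[v_1] = \frac{1}{2n} = \frac{1}{22} \approx 0.045,
$$

$$
P\left(v_1 < \frac{1}{n+1}\right) = \frac{1/(n+1)}{1/n} = \frac{n}{n+1} = \frac{11}{12} \approx 0.92 .
$$

Under these assumptions, the lowest SS level falls below the lowest EQ level in about 92% of the implementations, and by symmetry the same holds for the highest level in the upper tail.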

c. ECC method

The ECC method (Schefzik et al. 2013) is applied here to preserve the space–time-dependent structure of raw ensemble precipitation forecasts when reconstructing the postprocessed ensemble forecast. The first step of ECC is, similarly to standard postprocessing, to apply statistical techniques to postprocess univariate variables and obtain calibrated predictive distributions for each time and location. The second step is to sample discrete values from each postprocessed predictive distribution to obtain postprocessed ensemble forecasts for each time and location. The third step is to rearrange the sampled values in the rank order structure of the original raw ensemble to restore the space–time-dependent structure. In general, ECC can also be used to reconstruct the correlation structure between different variables.
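The third step, the reordering, can be sketched as follows. This is a minimal illustration under stated assumptions and not the authors' implementation: the function name `ecc_reorder` is hypothetical, each row is one margin (a station, lead time, or variable), each column is one ensemble member, and ties in the raw ensemble (frequent for zero precipitation) are broken by member order for simplicity, whereas other tie-breaking rules, such as random ones, are possible.

```python
import numpy as np

def ecc_reorder(raw_ensemble, calibrated_samples):
    """Rearrange calibrated samples to follow the rank order of the raw ensemble.

    Both inputs have shape (n_margins, n_members). For each margin, the member
    that had the k-th smallest raw value receives the k-th smallest sample.
    """
    raw = np.asarray(raw_ensemble, dtype=float)
    cal_sorted = np.sort(np.asarray(calibrated_samples, dtype=float), axis=-1)
    # Rank of each raw member within its margin (0 = smallest);
    # the stable sort breaks ties by member order.
    order = np.argsort(raw, axis=-1, kind="stable")
    ranks = np.argsort(order, axis=-1, kind="stable")
    return np.take_along_axis(cal_sorted, ranks, axis=-1)

if __name__ == "__main__":
    raw = [[3.0, 1.0, 2.0], [10.0, 30.0, 20.0]]   # two stations, three members
    cal = [[0.5, 0.1, 0.9], [7.0, 5.0, 6.0]]      # samples from calibrated pdfs
    print(ecc_reorder(raw, cal))
```

The calibrated samples can come from any of the three sampling methods; only their sorted values matter, because the member-to-rank assignment is taken entirely from the raw ensemble.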

3. Data and experiment setup

a. Study area and data

The data used in this study cover the Xixian subbasin of the Huai River catchment in eastern China. The data include 24-h accumulated precipitation measurements from 0000 to 0000 UTC at 17 precipitation stations and 24-h accumulated precipitation reforecasts from 0000 to 0000 UTC in 11 model grid boxes with a Gaussian ~0.5° resolution from the GFS 11-member ensemble reforecast (Fig. 2; Hamill et al. 2013). The period from 1 May to 1 October, which covers the rainy season, in the years 2006–09 is considered. Area-averaged forecast precipitation is calculated as a weighted average of the precipitation at the 11 forecast model grid points for each ensemble member, and area-averaged measured precipitation is calculated as a weighted average of the data from the 17 precipitation stations.

Fig. 2. The location of the stations in the Xixian basin of the Huai River catchment in eastern China.

b. Experiment setup

To assess the performance of the HELR method for calibrating the raw ensemble precipitation forecasts, the root-mean-square error (RMSE; appendix, section a), the fraction of measured values falling within the forecast range, the ranked probability skill score (RPSS; appendix, section c), and the Brier skill score (BSS; appendix, section d) are used. The following forecast lead times are evaluated: 0–1 day, 1–2 days, 2–3 days, 3–4 days, 4–5 days, 5–6 days, 6–7 days, and 7–8 days. Sixfold cross validation is employed; that is, for a given lead time, five subsamples of the available data are used to train the HELR postprocessing model and the one remaining subsample is used to verify the predictive distribution, with sixfold permutation of these subsamples.
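The sixfold cross-validation loop can be sketched as below. How the subsamples were formed is not stated here, so this sketch assumes six contiguous blocks of forecast cases; the function name `sixfold_splits` is hypothetical.

```python
import numpy as np

def sixfold_splits(n_cases, n_folds=6):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    # Assumption: folds are contiguous blocks of (almost) equal size.
    folds = np.array_split(np.arange(n_cases), n_folds)
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != k]
        )
        yield train_idx, test_idx

if __name__ == "__main__":
    for train_idx, test_idx in sixfold_splits(20):
        # Fit the postprocessing model on train_idx, verify on test_idx.
        print(len(train_idx), "training cases,", len(test_idx), "test cases")
```

Every forecast case is verified exactly once, with a model that was trained without it.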

After postprocessing the raw ensemble forecasts using HELR, the continuous postprocessed predictive distributions are obtained. Discrete samples from the calibrated predictive distributions are taken to construct the postprocessed ensemble members with size n using the EQ, IR, and SS methods. The sample size n from a given predictive distribution is set to M times the number of raw ensemble members, where M is any positive integer; here $M = 1, 2, \ldots, 50$. In this study, we use precipitation reforecast data from the 11-member GFS ensemble (Hamill et al. 2013), so that $n = 11M$, that is, $n = 11, 22, \ldots, 550$. The implementation is repeated 1000 times. The RMSE, the ratio of measured values falling within the forecast range (Rat; appendix, section b), and the RPSS are used to assess and compare the performance of the sampled postprocessed forecasts.

As a final part of this study, the effect of the sampling method on the reconstruction of ensemble members with preserved time–space-dependent structure is analyzed by using EQ, IR, and SS in ECC for reconstructing postprocessed ensemble members for four locations. Here, accumulated forecast precipitation for the Huanggang, Dingyuandian, Wulidian, and Xiaocaodian stations in the Xixian basin (Fig. 2) for 0–1- and 4–5-day lead times are used. The ensemble precipitation reforecasts from GFS are bilinearly interpolated from neighboring grid points to the station locations (Messner et al. 2014a,b).

The above-mentioned sixfold cross validation is also used here. Next, SS, EQ, and IR sampling are used to sample discrete values from the postprocessed marginal predictive distributions. Finally, the ECC method is used to construct the dependent structure, and the multivariate rank histogram (appendix, section e) is employed to assess the performance of ECC-SS, ECC-EQ, and ECC-IR.

4. Results and analysis

a. HELR performance for postprocessing ensemble precipitation forecasts

First, we have verified the various ELR and HELR models of Messner et al. (2014a) using the area-averaged precipitation in the Xixian basin (section 3a). The HELR model variant in which $g(q) = a\sqrt{q}$, $\mu = b + cM$, and $\sigma = \exp(d + hS)$ in Eq. (1), that is,
$$P(y \le q) = \frac{\exp\{[a\sqrt{q} - (b + cM)]/\exp(d + hS)\}}{1 + \exp\{[a\sqrt{q} - (b + cM)]/\exp(d + hS)\}}, \tag{6}$$
appeared to give the highest RPSS for most lead times, although the differences between the models were rather small. In Eq. (6), M and S represent the mean and standard deviation of square-root-transformed ensemble precipitation forecasts, respectively; and a, b, c, d, and h are function coefficients. A square-root transformation has been applied because it yields better forecasts for a positively skewed predictand like precipitation (e.g., Wilks 2011). The thresholds used for the area-averaged precipitation, with the corresponding percentiles in parentheses, are 2 mm (63%), 5 mm (79%), 8 mm (85%), 10 mm (88%), 15 mm (92%), and 20 mm (94%).
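To make the model form concrete, the sketch below evaluates a logistic predictive cdf with a square-root link and inverts it to obtain quantiles, which is what the sampling methods need. It is an illustration under assumptions, not the fitted model: the function names `helr_cdf` and `helr_quantile` are hypothetical, the coefficient values are invented, the sign convention $(a\sqrt{q} - \mu)/\sigma$ and $a > 0$ are assumed, and the fitting itself (done in the study with the crch R package) is not shown.

```python
import numpy as np

def helr_cdf(q, ens_mean_sqrt, ens_sd_sqrt, a, b, c, d, h):
    """Nonexceedance probability P(y <= q) for q >= 0 (assumed sign convention)."""
    mu = b + c * ens_mean_sqrt              # location from the ensemble mean
    sigma = np.exp(d + h * ens_sd_sqrt)     # positive scale from the spread
    z = (a * np.sqrt(q) - mu) / sigma
    return 1.0 / (1.0 + np.exp(-z))

def helr_quantile(p, ens_mean_sqrt, ens_sd_sqrt, a, b, c, d, h):
    """Inverse of helr_cdf for 0 < p < 1; levels below P(y <= 0) map to 0 mm."""
    mu = b + c * ens_mean_sqrt
    sigma = np.exp(d + h * ens_sd_sqrt)
    root = (mu + sigma * np.log(p / (1.0 - p))) / a   # equals sqrt(q) when > 0
    return np.maximum(root, 0.0) ** 2

if __name__ == "__main__":
    coef = dict(a=1.0, b=0.5, c=0.8, d=0.0, h=0.3)    # invented values
    levels = np.arange(1, 12) / 12                    # EQ levels for n = 11
    print(helr_quantile(levels, ens_mean_sqrt=2.0, ens_sd_sqrt=1.0, **coef))
```

Clipping the root at zero reflects the point mass at zero precipitation: every probability level below P(y ≤ 0) yields a sampled value of 0 mm.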

The RMSE is used to assess the performance of the HELR model [Eq. (6)] for the calibrated ensemble mean forecast (CEMF). Figure 3 shows that for the first 4 days, CEMF is better than the raw ensemble mean forecast (REMF). For the fifth and sixth days, CEMF is slightly better than or similar to REMF, but for the seventh and eighth days, CEMF is worse than REMF. Overall, as the lead time increases, the calibration capacity of HELR for the ensemble mean forecast decreases.

Fig. 3. The RMSE of the raw ensemble mean forecast (REMF) and the HELR calibrated ensemble mean forecast (CEMF).

Figure 4 shows the fraction of measured values falling within the 17%–83% and 8%–92% forecast range of the raw ensemble forecasts and calibrated forecasts obtained by HELR, respectively. For the raw ensemble forecasts, the forecast range of 17%–83% is determined using the second-smallest and the tenth-smallest ensemble member, and the 8%–92% forecast range is determined using the smallest and the largest ensemble member for a given forecast time. Compared to the raw forecast, the fraction of measured values falling within the forecast range clearly increases for the calibrated forecast for all lead times. For the 17%–83% forecast range, the underdispersion has been corrected by postprocessing, leading to a good coverage of about 70%. For the 8%–92% forecast range, the underdispersion has also been corrected by postprocessing, but now leading to a slight overdispersion with a coverage of about 92%. This indicates that the calibrated forecast of HELR has clearly improved the dispersion of the predictive distribution.
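The coverage of a range spanned by two ranked ensemble members can be computed as in the sketch below. It is a minimal illustration, not the verification code used in the study: the function name `coverage_fraction` is hypothetical, and counting observations that lie exactly on a bound as inside the range is an assumption.

```python
import numpy as np

def coverage_fraction(ensembles, observations, k=1):
    """Fraction of observations between the k-th smallest and k-th largest member.

    ensembles has shape (N, n), observations has shape (N,). For n = 11,
    k = 1 gives the 8%-92% range and k = 2 the 17%-83% range.
    """
    ens = np.sort(np.asarray(ensembles, dtype=float), axis=1)
    obs = np.asarray(observations, dtype=float)
    lower = ens[:, k - 1]
    upper = ens[:, ens.shape[1] - k]
    # Assumption: bounds are inclusive.
    return float(np.mean((obs >= lower) & (obs <= upper)))

if __name__ == "__main__":
    ens = np.tile(np.arange(11.0), (2, 1))   # two cases, members 0..10 mm
    print(coverage_fraction(ens, [0.0, 10.5], k=1))
    print(coverage_fraction(ens, [5.0, 9.0], k=2))
```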

Fig. 4. The fraction of measured values falling within the 17%–83% and 8%–92% forecast ranges of the raw ensemble forecast and the calibrated forecast obtained by HELR at eight different lead times.

The RPSS results are presented in Fig. 5. It is clear that the calibrated forecast provided by the HELR model shows a significant improvement over the raw forecast for all lead times. As the lead time increases, the RPSS values decrease for both the raw forecast and the HELR model, which indicates a decrease in predictability and a reduced calibration capacity of HELR.

Fig. 5. The RPSS with the climatological forecast as a reference for the raw ensemble forecast (R) and the HELR model (HR) at eight different lead times. The lower and upper limits of the box are the 25% and 75% quartiles, respectively, and the line inside the box is the median of 300 bootstrap samples (appendix, section c). The whiskers extend to the most extreme RPSS values lying within 1.5 times the length of the box from the box, and the plus signs indicate RPSS values outside the whiskers.

The Brier skill score with the climatological forecast as a reference is computed for the thresholds 2 mm (63%), 10 mm (88%), and 20 mm (94%). For the three thresholds, the BSS of HELR is larger than that of the raw forecast (Fig. 6). This indicates again that the HELR model has a capacity to calibrate the raw ensemble forecast.
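For reference, a Brier skill score against climatology can be computed as in the following sketch. It uses the standard textbook definition rather than the study's own code, and it rests on several assumptions: the event is defined as precipitation strictly above the threshold, the climatological probability is the observed event frequency in the same sample, and the function names are hypothetical.

```python
import numpy as np

def brier_score(prob_event, event_occurred):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    p = np.asarray(prob_event, dtype=float)
    o = np.asarray(event_occurred, dtype=float)
    return float(np.mean((p - o) ** 2))

def ensemble_exceedance_prob(ensembles, threshold):
    """Fraction of members above the threshold, per forecast case."""
    return np.mean(np.asarray(ensembles, dtype=float) > threshold, axis=1)

def brier_skill_score(prob_event, observations, threshold):
    """BSS = 1 - BS / BS_clim; undefined if the event never or always occurs."""
    o = (np.asarray(observations, dtype=float) > threshold).astype(float)
    bs = brier_score(prob_event, o)
    # Assumption: climatology is the sample frequency of the event.
    bs_clim = brier_score(np.full(o.shape, o.mean()), o)
    return 1.0 - bs / bs_clim

if __name__ == "__main__":
    obs = [0.0, 25.0, 30.0, 1.0]
    ens = [[0.0, 1.0, 2.0], [18.0, 22.0, 30.0], [25.0, 28.0, 40.0], [0.0, 0.0, 3.0]]
    probs = ensemble_exceedance_prob(ens, threshold=20.0)
    print(brier_skill_score(probs, obs, threshold=20.0))
```

A score of 1 corresponds to a perfect forecast and a score of 0 to a forecast that is no better than the climatological reference.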

Fig. 6. The BSS of the raw forecast and HELR model for thresholds 2, 10, and 20 mm at eight different lead times.

b. Comparison among EQ, IR, and SS

After obtaining the HELR pdf of the area-averaged precipitation for a given sample size n in each of 1000 implementations, one value of RMSE (appendix, section a), Rat (appendix, section b), and RPSS (appendix, section c) can be obtained for SS, EQ, and IR, respectively. So, the total number of RMSE, Rat, and RPSS values is 1000 for each sampling approach. The average value of RMSE, Rat, and RPSS can be calculated by
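$$\mathrm{MRMSE} = \frac{1}{1000}\sum_{k=1}^{1000}\mathrm{RMSE}_k, \tag{7}$$
$$\mathrm{MRat} = \frac{1}{1000}\sum_{k=1}^{1000}\mathrm{Rat}_k, \tag{8}$$
$$\mathrm{MRPSS} = \frac{1}{1000}\sum_{k=1}^{1000}\mathrm{RPSS}_k, \tag{9}$$
where the subscript k denotes the kth implementation.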

Figure 7 shows that for 0–1- and 4–5-day lead times the mean RMSE over the 1000 implementations (MRMSE; computed using the ensemble mean of the calibrated ensemble forecasts sampled from the HELR pdfs) provided by the SS and EQ methods is less than that of IR for all sample sizes, while the MRMSE of SS is (slightly) larger than that of EQ, especially for small sample sizes. It is very interesting to notice that with the increase of sample size the MRMSE of EQ becomes larger, which means that the performance of EQ is reduced. Compared to EQ and IR, the MRMSE of SS converges faster and tends to be steady from sample size n = 55 for the two lead times. Overall, with the increase of sample size, the differences among SS, EQ, and IR vanish. For the six other lead times, the results are similar.

Fig. 7. The MRMSE for different numbers of ensemble members sampled from calibrated pdfs of HELR using EQ, IR, and SS.

Figure 8 presents the results of the mean Rat over the 1000 implementations (MRat), that is, the mean percentage of observations falling within the calibrated ensemble forecast range, sampled from the HELR pdfs using SS, EQ, and IR. It can be seen that the MRat provided by the SS method is larger than that of EQ and IR, especially if the sample size is less than 110. This means that compared to EQ and IR, SS can obtain a larger dispersion of calibrated ensemble forecasts and better samples the tails of the HELR pdfs. However, a slight overdispersion is also noticeable, which is caused by HELR itself (Fig. 4), rather than by the sampling methods. The HELR ratio, defined here as the ratio of observations falling between the 0.001 and 0.999 quantiles of the HELR pdf, equals 1, and SS converges faster than EQ and IR. For the six other lead times, the results are similar.

Fig. 8. The MRat of 1000 implementations, which shows the mean percentage of measured values falling within the ensemble forecast range sampled from the HELR pdfs using EQ, IR, and SS.

The verification results of SS, EQ, and IR in terms of the mean RPSS over the 1000 implementations (MRPSS; for the thresholds mentioned in the appendix, section c) are presented in Fig. 9. It is clear that both the EQ and SS approaches are better than IR, while SS is slightly better than EQ for lower values of n. However, all these MRPSS values are smaller than that of HELR, which might be caused by the transfer from a continuous predictive distribution to a discrete ensemble forecast. The MRPSS values of SS and EQ converge fast and tend to be steady from sample size n = 55.

Fig. 9. The MRPSS of 1000 implementations for different numbers of ensemble members, sampled from HELR pdfs using EQ, IR, and SS.

c. Application of EQ, IR, and SS in ECC

Now HELR is first applied to postprocess raw ensemble precipitation forecasts for the Huanggang, Dingyuandian, Wulidian, and Xiaocaodian stations to obtain continuous calibrated pdfs for each of these stations. Then, SS, EQ, and IR are used to sample discrete samples (with sample size n = 11) from each HELR pdf, denoted by S-SS, S-EQ, and S-IR, respectively. The ranked probability skill scores (appendix, section c) with the climatological forecast as a reference for the raw ensemble and the calibrated pdfs of HELR, S-SS, S-EQ, and S-IR are calculated for the four stations separately (Fig. 10). It is clear that the calibrated forecast of HELR shows a significant improvement over the raw forecast for all four stations. However, it is also evident that the postprocessed continuous predictive distribution has a higher skill than S-SS, S-EQ, and S-IR with 11 discrete ensemble members sampled from the HELR pdfs. The decrease in skill is caused by the transfer from a continuous predictive distribution to a discrete 11-member ensemble forecast. The S-SS is generally better than S-EQ and S-IR, and S-EQ is better than S-IR. This difference in skill between the sampling methods means that the choice of the sampling method has a considerable effect on preserving the calibration of ensemble forecasts obtained by the postprocessed continuous predictive distribution.

Fig. 10. The RPSS with the climatological forecast as a reference for the raw ensemble, the calibrated forecast of HELR, S-SS, S-EQ, and S-IR (with n = 11) for four stations (columns) and for (top) 0–1- and (bottom) 4–5-day lead times.

To investigate the skill differences for relatively extreme precipitation between the three sampling methods, the BSS with the climatological forecast as a reference is computed for the high threshold of 20 mm for the four stations and for 0–1- and 4–5-day lead times (Fig. 11). The percentiles corresponding to the threshold of 20 mm for the four stations are between 93% and 95%. For the threshold of 20 mm, the median value of the BSS of HELR is generally larger than those of the raw forecast, S-SS, S-EQ, and S-IR. This indicates that the decrease in BSS for the 20-mm threshold is caused again by the transfer from a continuous predictive distribution to a discrete 11-member ensemble forecast. Overall, S-SS is better than S-EQ and S-IR, and S-EQ is better than S-IR. This result also confirms that SS can better sample the upper tails of the HELR pdfs than the EQ and IR sampling methods can.

Fig. 11. The BSS for the threshold of 20 mm with the climatological forecast as a reference for the raw ensemble (R), the calibrated forecast of HELR (HR), S-SS, S-EQ, and S-IR (with n = 11) for four stations (columns) and for (top) 0–1- and (bottom) 4–5-day lead times.

Based on the S-SS, S-EQ, and S-IR samples of the four stations, the ECC method is used to reconstruct the dependent structure between the forecasts for the four stations, denoted by ECC-SS, ECC-EQ, and ECC-IR, respectively. Multivariate rank histograms (Gneiting et al. 2008; Schefzik et al. 2013; appendix, section e) are compared for the raw ensemble forecasts, ECC-SS, ECC-EQ, and ECC-IR. Figure 12 presents these multivariate rank histograms for 0–1- and 4–5-day accumulated precipitation. The discrepancy measure values (Gneiting et al. 2008; appendix, section f) are calculated for GFS, ECC-SS, ECC-EQ, and ECC-IR. At the lead time of 0–1 days, these values are 0.609, 0.1194, 0.2291, and 0.1845, respectively, and at the lead time of 4–5 days, they are 0.3495, 0.0991, 0.1727, and 0.2093, respectively. It can be concluded that the underdispersion of the raw ensemble forecasts has been clearly diminished by the three approaches. In this study, ECC-SS is better than ECC-EQ and ECC-IR, as the multivariate rank histogram provided by ECC-SS is more uniform than that of ECC-EQ and ECC-IR. ECC-EQ and ECC-IR still show underdispersion to a certain extent.

Fig. 12. Multivariate rank histogram of raw ensemble forecasts, ECC-SS, ECC-EQ, and ECC-IR for 0–1- (four upper panels) and 4–5-day (four lower panels) accumulated precipitation.

5. Summary and conclusions

In this paper, we have proposed to use stratified sampling (SS) for better sampling from a continuous predictive distribution, and a comparison between SS, equidistance quantiles (EQ), and independent random (IR) sampling was made based on 0–1 day, 1–2 days, 2–3 days, 3–4 days, 4–5 days, 5–6 days, 6–7 days, and 7–8 days 24-h precipitation from GFS ensemble reforecasts (Hamill et al. 2013) over part of the Huai River basin (the Xixian subbasin) in China. Heteroscedastic extended logistic regression (HELR; Messner et al. 2014a) was employed to calibrate the ensemble precipitation reforecasts and provided the predictive distributions to sample from. The results show that in terms of RMSE, dispersion of the predictive distribution, ranked probability skill score (RPSS), and Brier skill score (BSS), HELR significantly improves raw area-averaged ensemble precipitation reforecasts, although with the increase of lead times, the calibration capacity of HELR for ensemble mean and probabilistic forecasts is reduced.

For the SS, EQ, and IR approaches used to sample discrete samples from the probability density functions (pdfs) of HELR, and for different sample sizes n = 11, 22, 33, …, 528, 539, 550, the results show that, compared to EQ and IR, SS can obtain a larger dispersion of calibrated ensemble forecasts and better samples the tails of the HELR pdfs. In terms of RMSE, EQ is slightly better than SS, while IR is the worst-performing sampling method. However, with the increase of sample size, the difference of RMSE among SS, EQ, and IR vanishes. In terms of RPSS, SS is slightly better than EQ and clearly better than IR. Overall, SS is better than EQ and IR, especially for small size samples sampled from the pdfs of HELR.

The application of HELR, sampling, and ECC on ensemble precipitation forecasts for four observation stations in the Xixian basin shows that in terms of ranked probability skill score, the HELR and the discrete samples from the pdfs of HELR using SS, EQ, and IR (denoted by S-SS, S-EQ, and S-IR, respectively) all have a significant improvement over the raw ensemble forecasts for the 0–1-day lead time, while only S-SS and S-EQ show a significant improvement for the 4–5-day lead time. Besides, the BSS values for the threshold of 20 mm show that, compared to EQ and IR, SS can better sample the upper tails of the HELR pdfs. However, a decrease in skill of S-SS, S-EQ, and S-IR with respect to the HELR method is caused by the transfer from the postprocessed continuous predictive distribution to the discrete ensemble forecasts. The selection of the sampling method has a considerable effect on preserving the calibration of ensemble forecasts obtained by the postprocessed continuous predictive distribution. The multivariate rank histograms indicate that the ECC-SS approach is better than ECC-EQ and ECC-IR in preserving the space dependence structure in this study. While the multivariate rank histograms of ECC-EQ and ECC-IR show underdispersion to a certain extent, the multivariate rank histogram of ECC-SS is more uniform, probably because the SS approach better samples the tails of the calibrated pdf. Although we did not show that in this study, it is important to note that ECC-SS is also capable of reconstructing the dependent structure between different lead times and variables, like ECC-EQ and ECC-IR.

The SS approach can also be used for other postprocessing methods, ensemble weather forecasts, or locations, as long as the postprocessed result is a continuous pdf and the discrete samples are sampled from that pdf. However, the differences in skill between SS and EQ are expected to be small if the calibrated pdf has flat tails.

Finally, it is worth noting that, although postprocessing methods can improve the raw ensemble precipitation forecasts and reconstruct the dependence structure between multivariate meteorological forecasts, their effect on the total predictive uncertainty of ensemble streamflow forecasts may be limited because of the errors linked to hydrological modeling (Zalachori et al. 2012). To obtain more skillful probabilistic streamflow forecasts, a postprocessing method for the ensemble streamflow forecasts themselves is (also) needed.

Acknowledgments

This study is supported by the National Natural Science Foundation of China (Grant 51179046); the Special Funds for Public Industry Research Projects of the China Ministry of Water Resources (Grant 201301066); the Fundamental Research Funds for the Central Universities (2015B05514); the Graduate Research and Innovation Program for Ordinary University of Jiangsu Province, China (CXZZ13_0248); the Ministry of Infrastructure and the Environment of the Netherlands; and the China Scholarship Council. We thank Jakob Messner and Achim Zeileis for providing the crch R package for (heteroscedastic) extended logistic regression, which is available online (https://cran.r-project.org/). We are also grateful to all anonymous reviewers for their helpful comments, which helped us to improve the quality of the article.

APPENDIX

Verification Metrics

a. RMSE

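Let $f_{i,j}$ ($i = 1, \ldots, n$; $j = 1, \ldots, N$) and $o_j$ ($j = 1, \ldots, N$) denote ensemble forecasts and observations, respectively, where n is the number of ensemble members and N is the length of the series of forecast–observation pairs. The RMSE can be calculated by
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(\bar{f}_j - o_j\right)^2}, \tag{A1}$$
where $\bar{f}_j = \frac{1}{n}\sum_{i=1}^{n} f_{i,j}$ is the ensemble mean forecast at time j. The smaller the RMSE value, the better the ensemble mean forecast. When the RMSE equals zero, the forecast model is perfect.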
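A minimal sketch of this calculation is given below. It assumes the forecasts are stored as an array with one row per forecast time and one column per ensemble member; the function name and array layout are illustrative choices, not part of the original study.

```python
import numpy as np

def ensemble_mean_rmse(forecasts, observations):
    """RMSE of the ensemble mean.

    forecasts: shape (N, n), N forecast times and n ensemble members.
    observations: shape (N,).
    """
    f = np.asarray(forecasts, dtype=float)
    o = np.asarray(observations, dtype=float)
    ens_mean = f.mean(axis=1)                 # ensemble mean at each time
    return float(np.sqrt(np.mean((ens_mean - o) ** 2)))

# Ensemble means are 1 and 5, errors are -1 and 2.
print(ensemble_mean_rmse([[0.0, 2.0], [4.0, 6.0]], [2.0, 3.0]))
```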

b. Ratio of measured values falling within the prediction range

The ratio of measured values falling within the prediction range is used to assess the dispersion of ensemble forecasts. The larger the ratio, the more observations are within the range of the ensemble forecasts. It is calculated by
$$R = \frac{m}{N} \times 100\%, \tag{A2}$$
$$m = \sum_{j=1}^{N} I_j, \tag{A3}$$
$$I_j = \begin{cases} 1, & f_j^{\mathrm{low}} \le o_j \le f_j^{\mathrm{up}} \\ 0, & \text{otherwise}, \end{cases} \tag{A4}$$
where m is the number of observations falling within the prediction range, and $f_j^{\mathrm{low}}$ and $f_j^{\mathrm{up}}$ are the lower and upper bounds of the prediction range at time j. For a given target distribution, the expected ratio for an adequate sample from the target distribution relies on the distribution itself and can be approximately estimated by calculating the ratio of the observations falling within the range between the smallest and the largest sampled ensemble member.
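The ratio can be sketched in a few lines. The example below takes the prediction range at each time to be the interval between the smallest and the largest ensemble member, which is one of the ranges mentioned above; the function name and the choice to return a percentage are assumptions of this sketch.

```python
import numpy as np

def coverage_ratio(forecasts, observations):
    """Percentage of observations inside the [min, max] range of the ensemble.

    forecasts: shape (N, n); observations: shape (N,).
    """
    f = np.asarray(forecasts, dtype=float)
    o = np.asarray(observations, dtype=float)
    lower = f.min(axis=1)                      # smallest member at each time
    upper = f.max(axis=1)                      # largest member at each time
    inside = (o >= lower) & (o <= upper)       # indicator for each time
    return float(100.0 * inside.mean())

fcst = [[1.0, 3.0], [2.0, 4.0], [5.0, 6.0]]
obs = [2.0, 5.0, 5.5]
print(coverage_ratio(fcst, obs))  # two of the three observations are covered
```

For a range defined by forecast quantiles instead, `lower` and `upper` would be replaced by the corresponding quantiles of each row.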

c. RPSS

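The RPSS is a skill score that is widely used for assessing the performance of ensemble forecasts, and it measures the skill of multicategory probabilistic forecasts relative to a reference forecast. The RPSS can be expressed as (e.g., Weigel et al. 2007)
$$\mathrm{RPSS} = 1 - \frac{\langle \mathrm{RPS} \rangle}{\langle \mathrm{RPS}_{Cl} \rangle}, \tag{A5}$$
$$\mathrm{RPS} = \sum_{k=1}^{N} \left(Y_k - O_k\right)^2, \tag{A6}$$
where N is the number of forecast categories; $Y_k$ and $O_k$ denote the kth component of the cumulative forecast and observation vectors $\mathbf{Y}$ and $\mathbf{O}$, respectively; $Y_k = \sum_{i=1}^{k} y_i$, with $y_i$ being the probabilistic forecast for the event to happen in category i; $O_k = \sum_{i=1}^{k} o_i$, with $o_i = 1$ if the observation is in category i and $o_i = 0$ if the observation is in category $j \ne i$; $\langle \cdot \rangle$ denotes the average over the forecast–observation pairs; and $\mathrm{RPS}_{Cl}$ is the ranked probability score (RPS) of a climatological forecast. The higher the RPSS value, the better the model performance. The model is perfect if RPSS equals one.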

The RPS in each subsample is calculated for the fitted model, trained using the five remaining subsamples. The thresholds are set to 2 mm (63% for area-averaged precipitation), 5 mm (79%), 8 mm (85%), 10 mm (88%), 15 mm (92%), and 20 mm (94%). For each given projection time, 300 bootstrap samples of RPS are used to estimate the sampling distribution of the average RPS of the raw, calibrated, and climatological forecasts; the sampling distribution of the RPSS then follows from Eq. (A5).
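The RPS, RPSS, and bootstrap steps described above can be sketched as follows. The sketch assumes an ensemble forecast whose cumulative category probabilities are the fractions of members at or below each threshold, and a climatological reference estimated from the same observation sample; both choices, the function names, and the toy data are assumptions made for illustration only.

```python
import numpy as np

def cumulative_probs(forecasts, thresholds):
    """Fraction of members <= each threshold; forecasts (N, n) -> result (N, K)."""
    f = np.asarray(forecasts, dtype=float)
    t = np.asarray(thresholds, dtype=float)
    return (f[:, :, None] <= t).mean(axis=1)

def cumulative_obs(observations, thresholds):
    """Cumulative observation vectors (1 if obs <= threshold), shape (N, K)."""
    o = np.asarray(observations, dtype=float)
    t = np.asarray(thresholds, dtype=float)
    return (o[:, None] <= t).astype(float)

def rps(cum_fcst, cum_obs):
    """Ranked probability score of each forecast-observation pair."""
    return np.sum((cum_fcst - cum_obs) ** 2, axis=1)

def rpss(rps_fcst, rps_ref):
    """Skill score from the average RPS of forecast and reference."""
    return float(1.0 - rps_fcst.mean() / rps_ref.mean())

def bootstrap_rpss(rps_fcst, rps_ref, n_boot=300, seed=0):
    """Resample pairs with replacement and recompute the RPSS each time."""
    rng = np.random.default_rng(seed)
    n_pairs = len(rps_fcst)
    idx = rng.integers(0, n_pairs, size=(n_boot, n_pairs))
    return 1.0 - rps_fcst[idx].mean(axis=1) / rps_ref[idx].mean(axis=1)

thresholds = [2.0, 5.0, 8.0, 10.0, 15.0, 20.0]   # mm, as listed in the text
obs = np.array([1.0, 6.0, 12.0, 0.0])            # toy observations
fcst = np.array([[0.5, 1.5, 3.0], [4.0, 6.5, 9.0],
                 [7.0, 11.0, 16.0], [0.0, 0.0, 2.5]])  # toy 3-member ensembles
O = cumulative_obs(obs, thresholds)
rps_f = rps(cumulative_probs(fcst, thresholds), O)
rps_cl = rps(np.broadcast_to(O.mean(axis=0), O.shape), O)  # sample climatology
print(rpss(rps_f, rps_cl))
print(np.percentile(bootstrap_rpss(rps_f, rps_cl), [25, 50, 75]))
```

The cumulative component for the open-ended last category equals one for both forecast and observation, so it contributes nothing to the sum and is omitted here.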

d. BSS

The Brier score (BS) indicates the mean-squared error of the probability forecasts with regard to the observations. The score averages the squared differences between pairs of forecast probabilities and the subsequent binary observations (e.g., Wilks 2011):
$$\mathrm{BS} = \frac{1}{N}\sum_{k=1}^{N}\left(p_k - I_k\right)^2, \tag{A7}$$
where N is the number of forecast–observation pairs; $p_k$ is the forecast probability for the event $\{o_k \le q\}$ (e.g., for the raw ensemble $p_k$ is the ratio of ensemble members ≤q at time k); q is the quantile (or threshold) of interest; and $I_k$ is an indicator function, with I = 1 if the event occurs and I = 0 if not. The smaller the BS value, the better the model performance.
The BSS with the climatological forecast as a reference is calculated by
$$\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_{Cl}}, \tag{A8}$$
where $\mathrm{BS}_{Cl}$ is the BS of a climatological forecast. The higher the BSS value, the better the model performance.
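A short sketch of the BS and BSS for an ensemble forecast follows. The forecast probability is taken as the fraction of members at or below the threshold, as described above for the raw ensemble; using the observed event frequency of the same sample as the climatological forecast is an assumption of this sketch, as are the function names.

```python
import numpy as np

def brier_score(prob, event):
    """Mean squared difference between forecast probabilities and binary outcomes."""
    p = np.asarray(prob, dtype=float)
    i = np.asarray(event, dtype=float)
    return float(np.mean((p - i) ** 2))

def bss_ensemble(forecasts, observations, q):
    """BSS of an ensemble for the event {obs <= q} against sample climatology."""
    f = np.asarray(forecasts, dtype=float)      # shape (N, n)
    o = np.asarray(observations, dtype=float)   # shape (N,)
    prob = (f <= q).mean(axis=1)                # fraction of members <= q
    event = (o <= q).astype(float)              # 1 if the event occurred
    clim = np.full_like(event, event.mean())    # constant climatological probability
    return 1.0 - brier_score(prob, event) / brier_score(clim, event)

fcst = [[1.0, 1.0], [30.0, 30.0], [1.0, 30.0]]
obs = [1.0, 30.0, 1.0]
print(bss_ensemble(fcst, obs, q=20.0))
```

The score is undefined when the event occurs in every case or in none, because the climatological BS is then zero.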

e. Multivariate rank histogram

To assess the preservation of the dependent structure obtained by ECC-EQ, ECC-IR, and ECC-SS, the multivariate rank histogram is used, which can be described as follows (Gneiting et al. 2008; Schefzik et al. 2013).

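Let $\mathbf{X}_j = (\mathbf{x}_{1,j}, \ldots, \mathbf{x}_{n,j})$ and $\mathbf{x}_{0,j}$ denote the ensemble forecast and the corresponding verifying observation, respectively, and d is the number of variables of matrix $\mathbf{X}_j$. The multivariate rank histogram can be obtained as follows:

  1. For a given time j, use Eq. (A9) to find the prerank $\rho_{i,j}$ of $\mathbf{x}_{i,j}$ ($i = 0, 1, \ldots, n$) among the observation and the ensemble member forecasts. Each prerank is an integer between 1 and n + 1:
    $$\rho_{i,j} = \sum_{k=0}^{n} \mathbb{1}\left(\mathbf{x}_{k,j} \preceq \mathbf{x}_{i,j}\right), \tag{A9}$$
    $$\mathbf{x}_{k,j} \preceq \mathbf{x}_{i,j} \iff x_{k,j}^{(l)} \le x_{i,j}^{(l)} \quad \text{for all } l = 1, \ldots, d. \tag{A10}$$
  2. Calculate the multivariate rank $r_j$ for the given time j as follows:
    $$s_j^{<} = \sum_{i=0}^{n} \mathbb{1}\left(\rho_{i,j} < \rho_{0,j}\right), \qquad s_j^{=} = \sum_{i=0}^{n} \mathbb{1}\left(\rho_{i,j} = \rho_{0,j}\right). \tag{A11}$$
    The multivariate rank $r_j$ is chosen from a discrete uniform distribution in the set $\{s_j^{<} + 1, \ldots, s_j^{<} + s_j^{=}\}$. It is an integer between 1 and n + 1.
  3. Based on $r_j$ ($j = 1, \ldots, N$), the multivariate rank histogram can be obtained; it is a plot of the empirical frequency of the multivariate ranks. Underdispersed ensembles tend to produce U-shaped histograms, overdispersed ensembles produce hump-shaped histograms, and biased ensembles produce skewed rank histograms.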
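The three steps above can be sketched for a single forecast time as follows, following the prerank definition of Gneiting et al. (2008). The function name and array layout are illustrative assumptions; repeating the call over all times and counting the returned ranks gives the histogram.

```python
import numpy as np

def multivariate_rank(ensemble, observation, rng):
    """Multivariate rank of the observation among n ensemble members.

    ensemble: shape (n, d); observation: shape (d,). Returns an integer in 1..n+1.
    """
    x = np.vstack([observation, ensemble]).astype(float)   # row 0 is the observation
    # leq[k, i] is True when x[k] <= x[i] in every one of the d components
    leq = np.all(x[:, None, :] <= x[None, :, :], axis=2)
    preranks = leq.sum(axis=0)                 # each prerank lies in 1..n+1
    s_less = int(np.sum(preranks < preranks[0]))
    s_equal = int(np.sum(preranks == preranks[0]))   # counts the observation itself
    # resolve ties at random within {s_less + 1, ..., s_less + s_equal}
    return s_less + int(rng.integers(1, s_equal + 1))

rng = np.random.default_rng(0)
members = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
print(multivariate_rank(members, np.array([0.0, 0.0]), rng))  # observation below all members
print(multivariate_rank(members, np.array([4.0, 4.0]), rng))  # observation above all members
```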

f. Discrepancy measure

To quantify the deviation from uniformity in a rank histogram, the discrepancy measure is used and can be expressed as (Gneiting et al. 2008)
$$\Delta = \sum_{i=1}^{m}\left|f_i - \frac{1}{m}\right|, \tag{A12}$$
where $f_i$ is the observed relative frequency of rank i and m is the number of ranks. The smaller the Δ value is, the flatter the rank histogram will be.
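As a small illustration, the discrepancy can be computed from a list of ranks as sketched below. The function name is hypothetical, and the ranks are assumed to be integers between 1 and m.

```python
import numpy as np

def discrepancy(ranks, m):
    """Sum of absolute deviations of the rank frequencies from the uniform value 1/m."""
    counts = np.bincount(np.asarray(ranks, dtype=int) - 1, minlength=m)
    freq = counts / counts.sum()               # observed relative frequency of each rank
    return float(np.abs(freq - 1.0 / m).sum())

print(discrepancy([1, 2, 3, 4], m=4))   # perfectly flat histogram
print(discrepancy([1, 1, 1, 1], m=4))   # all mass in a single rank
```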

REFERENCES

  • Claggett, P. R., J. A. Okay, and S. V. Stehman, 2010: Monitoring regional riparian forest cover change using stratified sampling and multiresolution imagery. J. Amer. Water Resour. Assoc., 46, 334–343, doi:10.1111/j.1752-1688.2010.00424.x.
  • Clark, M., S. Gangopadhyay, L. Hay, B. Rajagopalan, and R. Wilby, 2004: The Schaake shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeor., 5, 243–262, doi:10.1175/1525-7541(2004)005<0243:TSSAMF>2.0.CO;2.
  • Ding, C. G., and H. Y. Lee, 2014: An accurate confidence interval for the mean tourist expenditure under stratified random sampling. Curr. Issues Tour., 17, 674–678, doi:10.1080/13683500.2013.857296.
  • Gneiting, T., A. E. Raftery, A. H. Westveld III, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098–1118, doi:10.1175/MWR2904.1.
  • Gneiting, T., L. I. Stanberry, E. P. Grimit, L. Held, and N. A. Johnson, 2008: Assessing probabilistic forecasts of multivariate quantities, with applications to ensemble predictions of surface winds. TEST, 17, 211–235, doi:10.1007/s11749-008-0114-x.
  • Hagedorn, R., T. M. Hamill, and J. S. Whitaker, 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part I: Two-meter temperatures. Mon. Wea. Rev., 136, 2608–2619, doi:10.1175/2007MWR2410.1.
  • Hamill, T. M., 2007: Comments on “Calibrated surface temperature forecasts from the Canadian ensemble prediction system using Bayesian model averaging.” Mon. Wea. Rev., 135, 4226–4230, doi:10.1175/2007MWR1963.1.
  • Hamill, T. M., and S. J. Colucci, 1997: Verification of Eta–RSM short-range ensemble forecasts. Mon. Wea. Rev., 125, 1312–1327, doi:10.1175/1520-0493(1997)125<1312:VOERSR>2.0.CO;2.
  • Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434–1447, doi:10.1175/1520-0493(2004)132<1434:ERIMFS>2.0.CO;2.
  • Hamill, T. M., R. Hagedorn, and J. S. Whitaker, 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part II: Precipitation. Mon. Wea. Rev., 136, 2620–2632, doi:10.1175/2007MWR2411.1.
  • Hamill, T. M., G. T. Bates, J. S. Whitaker, D. R. Murray, M. Fiorino, T. J. Galarneau, Y. Zhu, and W. Lapenta, 2013: NOAA’s second-generation global medium-range ensemble forecast dataset. Bull. Amer. Meteor. Soc., 94, 1553–1565, doi:10.1175/BAMS-D-12-00014.1.
  • Kann, A., C. Wittmann, Y. Wang, and X. Ma, 2009: Calibrating 2-m temperature of limited-area ensemble forecasts using high-resolution analysis. Mon. Wea. Rev., 137, 3373–3387, doi:10.1175/2009MWR2793.1.
  • Messner, J. W., G. J. Mayr, A. Zeileis, and D. S. Wilks, 2014a: Heteroscedastic extended logistic regression for post-processing of ensemble guidance. Mon. Wea. Rev., 142, 448–456, doi:10.1175/MWR-D-13-00271.1.
  • Messner, J. W., G. J. Mayr, and A. Zeileis, 2014b: Extending extended logistic regression: Extended versus separate versus ordered versus censored. Mon. Wea. Rev., 142, 3003–3014, doi:10.1175/MWR-D-13-00355.1.
  • Noble, W., G. Naylor, N. Bhullar, and M. A. Akeroyd, 2012: Self-assessed hearing abilities in middle- and older-age adults: A stratified sampling approach. Int. J. Audiol., 51, 174–180, doi:10.3109/14992027.2011.621899.
  • Padilla, M., S. V. Stehman, and E. Chuvieco, 2014: Validation of the 2008 MODIS-MCD45 global burned area product using stratified random sampling. Remote Sens. Environ., 144, 187–196, doi:10.1016/j.rse.2014.01.008.
  • Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 1155–1174, doi:10.1175/MWR2906.1.
  • Roulin, E., and S. Vannitsem, 2012: Postprocessing of ensemble precipitation predictions with extended logistic regression based on hindcasts. Mon. Wea. Rev., 140, 874–888, doi:10.1175/MWR-D-11-00062.1.
  • Roulston, M. S., and L. A. Smith, 2003: Combining dynamical and statistical ensembles. Tellus, 55A, 16–30, doi:10.1034/j.1600-0870.2003.201378.x.
  • Schefzik, R., T. L. Thorarinsdottir, and T. Gneiting, 2013: Uncertainty quantification in complex simulation models using ensemble copula coupling. Stat. Sci., 28, 616–640, doi:10.1214/13-STS443.
  • Scherrer, S. C., C. Appenzeller, P. Eckert, and D. Cattani, 2004: Analysis of the spread–skill relations using the ECMWF ensemble prediction system over Europe. Wea. Forecasting, 19, 552–565, doi:10.1175/1520-0434(2004)019<0552:AOTSRU>2.0.CO;2.
  • Schmeits, M. J., and K. Kok, 2010: A comparison between raw ensemble output, (modified) Bayesian model averaging, and extended logistic regression using ECMWF ensemble precipitation reforecasts. Mon. Wea. Rev., 138, 4199–4211, doi:10.1175/2010MWR3285.1.
  • Sloughter, J. M., A. E. Raftery, T. Gneiting, and C. Fraley, 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon. Wea. Rev., 135, 3209–3220, doi:10.1175/MWR3441.1.
  • Wallenius, K., R. M. Niemi, and H. Rita, 2011: Using stratified sampling based on pre-characterisation of samples in soil microbiological studies. Appl. Soil Ecol., 51, 111–113, doi:10.1016/j.apsoil.2011.09.006.
  • Weigel, A. P., M. A. Liniger, and C. Appenzeller, 2007: The discrete Brier and ranked probability skill scores. Mon. Wea. Rev., 135, 118–124, doi:10.1175/MWR3280.1.
  • Wilks, D. S., 2009: Extending logistic regression to provide full probability distribution MOS forecasts. Meteor. Appl., 16, 361–368, doi:10.1002/met.134.
  • Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. International Geophysics Series, Vol. 100, Academic Press, 704 pp.
  • Wilks, D. S., 2015: Multivariate ensemble model output statistics using empirical copula. Quart. J. Roy. Meteor. Soc., 141, 945–952, doi:10.1002/qj.2414.
  • Wilks, D. S., and T. M. Hamill, 2007: Comparison of ensemble-MOS methods using GFS reforecasts. Mon. Wea. Rev., 135, 2379–2390, doi:10.1175/MWR3402.1.
  • Williams, R. M., C. A. T. Ferro, and F. Kwasniok, 2014: A comparison of ensemble post-processing methods for extreme events. Quart. J. Roy. Meteor. Soc., 140, 1112–1120, doi:10.1002/qj.2198.
  • Zalachori, I., M. H. Ramos, R. Garçon, T. Mathevet, and J. Gailhard, 2012: Statistical processing of forecasts for hydrological ensemble prediction: A comparative study of different bias correction strategies. Adv. Sci. Res., 8, 135–141, doi:10.5194/asr-8-135-2012.
Save
  • Claggett, P. R., Okay J. A. , and Stehman S. V. , 2010: Monitoring regional riparian forest cover change using stratified sampling and multiresolution imagery. J. Amer. Water Resour. Assoc., 46, 334343, doi:10.1111/j.1752-1688.2010.00424.x.

    • Search Google Scholar
    • Export Citation
  • Clark, M., Gangopadhyay S. , Hay L. , Rajagopalan B. , and Wilby R. , 2004: The Schaake shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeor., 5, 243262, doi:10.1175/1525-7541(2004)005<0243:TSSAMF>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Ding, C. G., and Lee H. Y. , 2014: An accurate confidence interval for the mean tourist expenditure under stratified random sampling. Curr. Issues Tour., 17, 674678, doi:10.1080/13683500.2013.857296.

    • Search Google Scholar
    • Export Citation
  • Gneiting, T., Raftery A. E. , Westveld A. H. III, and Goldman T. , 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 10981118, doi:10.1175/MWR2904.1.

    • Search Google Scholar
    • Export Citation
  • Gneiting, T., Stanberry L. I. , Grimit E. P. , Held L. , and Johnson N. A. , 2008: Assessing probabilistic forecasts of multivariate quantities, with applications to ensemble predictions of surface winds. TEST, 17, 211235, doi:10.1007/s11749-008-0114-x.

    • Search Google Scholar
    • Export Citation
  • Hagedorn, R., Hamill T. M. , and Whitaker J. S. , 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part I: Two-meter temperatures. Mon. Wea. Rev., 136, 26082619, doi:10.1175/2007MWR2410.1.

    • Search Google Scholar
    • Export Citation
  • Hamill, T. M., 2007: Comments on “Calibrated surface temperature forecasts from the Canadian ensemble prediction system using Bayesian model averaging.” Mon. Wea. Rev., 135, 42264230, doi:10.1175/2007MWR1963.1.

    • Search Google Scholar
    • Export Citation
  • Hamill, T. M., and Colucci S. J. , 1997: Verification of Eta–RSM short-range ensemble forecasts. Mon. Wea. Rev., 125, 13121327, doi:10.1175/1520-0493(1997)125<1312:VOERSR>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Hamill, T. M., Whitaker J. S. , and Wei X. , 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 14341447, doi:10.1175/1520-0493(2004)132<1434:ERIMFS>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Hamill, T. M., Hagedorn R. , and Whitaker J. S. , 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part II: Precipitation. Mon. Wea. Rev., 136, 26202632, doi:10.1175/2007MWR2411.1.

    • Search Google Scholar
    • Export Citation
  • Hamill, T. M., Bates G. T. , Whitaker J. S. , Murray D. R. , Fiorino M. , Galarneau T. J. , Zhu Y. , and Lapenta W. , 2013: NOAA’s second-generation global medium-range ensemble forecast dataset. Bull. Amer. Meteor. Soc., 94, 15531565, doi:10.1175/BAMS-D-12-00014.1.

    • Search Google Scholar
    • Export Citation
  • Kann, A., Wittmann C. , Wang Y. , and Ma X. , 2009: Calibrating 2-m temperature of limited-area ensemble forecasts using high-resolution analysis. Mon. Wea. Rev., 137, 33733387, doi:10.1175/2009MWR2793.1.

    • Search Google Scholar
    • Export Citation
  • Messner, J. W., Mary G. J. , Zeileis A. , and Wilks D. S. , 2014a: Heteroscedastic extended logistic regression for post-processing of ensemble guidance. Mon. Wea. Rev., 142, 448456, doi:10.1175/MWR-D-13-00271.1.

    • Search Google Scholar
    • Export Citation
  • Messner, J. W., Mary G. J. , and Zeileis A. , 2014b: Extending extended logistic regression: Extended versus separate versus ordered versus censored. Mon. Wea. Rev., 142, 30033014, doi:10.1175/MWR-D-13-00355.1.

    • Search Google Scholar
    • Export Citation
  • Noble, W., Naylor G. , Bhullar N. , and Akeroyd M. A. , 2012: Self-assessed hearing abilities in middle- and older-age adults: A stratified sampling approach. Int. J. Audiol., 51, 174180, doi:10.3109/14992027.2011.621899.

    • Search Google Scholar
    • Export Citation
  • Padilla, M., Stehmanb S. V. , and Chuviec E. , 2014: Validation of the 2008 MODIS-MCD45 global burned area product using stratified random sampling. Remote Sens. Environ., 144, 187196, doi:10.1016/j.rse.2014.01.008.

    • Search Google Scholar
    • Export Citation
  • Raftery, A. E., Gneiting T. , Balabdaoui F. , and Polakowski M. , 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 11551174, doi:10.1175/MWR2906.1.

    • Search Google Scholar
    • Export Citation
  • Roulin, E., and Vannitsem S. , 2012: Postprocessing of ensemble precipitation predictions with extended logistic regression based on hindcasts. Mon. Wea. Rev., 140, 874888, doi:10.1175/MWR-D-11-00062.1.

    • Search Google Scholar
    • Export Citation
  • Roulston, M. S., and Smith L. A. , 2003: Combining dynamical and statistical ensembles. Tellus, 55A, 1630, doi:10.1034/j.1600-0870.2003.201378.x.

    • Search Google Scholar
    • Export Citation
  • Schefzik, R., Thorarinsdottir T. L. , and Gneiting T. , 2013: Uncertainty quantification in complex simulation models using ensemble copula coupling. Stat. Sci., 28, 616640, doi:10.1214/13-STS443.

    • Search Google Scholar
    • Export Citation
  • Scherrer, S. C., Appenzeller C. , Eckert P. , and Cattani D. , 2004: Analysis of the spread–skill relations using the ECMWF ensemble prediction system over Europe. Wea. Forecasting, 19, 552565, doi:10.1175/1520-0434(2004)019<0552:AOTSRU>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Schmeits, M. J., and Kok K. , 2010: A comparison between raw ensemble output, (modified) Bayesian model averaging, and extended logistic regression using ECMWF ensemble precipitation reforecasts. Mon. Wea. Rev., 138, 41994211, doi:10.1175/2010MWR3285.1.

    • Search Google Scholar
    • Export Citation
  • Sloughter, J. M., Raftery A. E. , Gneiting T. , and Fraley C. , 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon. Wea. Rev., 135, 32093220, doi:10.1175/MWR3441.1.

    • Search Google Scholar
    • Export Citation
  • Wallenius, K., Niemi R. M. , and Rita H. , 2011: Using stratified sampling based on pre-characterisation of samples in soil microbiological studies. Appl. Soil Ecol., 51, 111113, doi:10.1016/j.apsoil.2011.09.006.

    • Search Google Scholar
    • Export Citation
  • Weigel, A. P., Liniger M. A. , and Appenzeller C. , 2007: The discrete Brier and ranked probability skill scores. Mon. Wea. Rev., 135, 118124, doi:10.1175/MWR3280.1.

    • Search Google Scholar
    • Export Citation
  • Wilks, D. S., 2009: Extending logistic regression to provide full probability distribution MOS forecasts. Meteor. Appl., 16, 361368, doi:10.1002/met.134.

    • Search Google Scholar
    • Export Citation
  • Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. International Geophysics Series, Vol. 100, Academic Press, 704 pp.

    • Search Google Scholar
    • Export Citation
  • Wilks, D. S., 2015: Multivariate ensemble model output statistics using empirical copula. Quart. J. Roy. Meteor. Soc., 141, 945952, doi:10.1002/qj.2414.

    • Search Google Scholar
    • Export Citation
  • Wilks, D. S., and Hamill T. M. , 2007: Comparison of ensemble-MOS methods using GFS reforecasts. Mon. Wea. Rev., 135, 23792390, doi:10.1175/MWR3402.1.

    • Search Google Scholar
    • Export Citation
  • Williams, R. M., Ferro C. A. T. , and Kwasniok F. , 2014: A comparison of ensemble post-processing methods for extreme events. Quart. J. Roy. Meteor. Soc., 140, 11121120, doi:10.1002/qj.2198.

    • Search Google Scholar
    • Export Citation
  • Zalachori, I., Ramos M. H. , Garçon R. , Mathevet T. , and Gailhard J. , 2012: Statistical processing of forecasts for hydrological ensemble prediction: A comparative study of different bias correction strategies. Adv. Sci. Res., 8, 135141, doi:10.5194/asr-8-135-2012.

    • Search Google Scholar
    • Export Citation
  • Fig. 1.

    The characteristics of EQ, IR, and SS. Each time, 11 samples are sampled from a standard uniform distribution and then ranked from smallest to largest. The number of repeated implementations is 1000 times. For the EQ approach, the 11 samples are fixed as for 1000 times.

  • Fig. 2.

    The location of the stations in the Xixian basin of the Huai River catchment in eastern China.

  • Fig. 3.

    The RMSE of raw and HELR CEMF.

  • Fig. 4.

    The fraction of measured values falling within the 17%–83% and 8%–92% forecast ranges of the raw ensemble forecast and the calibrated forecast obtained by HELR at eight different lead times.

  • Fig. 5.

    The RPSS with the climatological forecast as a reference for the raw ensemble forecast (R) and the HELR model (HR) at eight different lead times. The lower and upper limits of the box are the 25% and 75% quantiles, respectively, and the line inside the box is the median of 300 bootstrap samples (appendix, section c). The whiskers extend to the most extreme RPSS values lying within 1.5 times the box length from the box, and the plus signs indicate RPSS values outside the whiskers.

  • Fig. 6.

    The BSS of the raw forecast and HELR model for thresholds 2, 10, and 20 mm at eight different lead times.

  • Fig. 7.

    The for different numbers of ensemble members sampled from calibrated pdfs of HELR using EQ, IR, and SS.

  • Fig. 8.

    The of 1000 implementations, which shows the mean percentage of measured values falling within the ensemble forecast range sampled from the HELR pdfs using EQ, IR, and SS.

  • Fig. 9.

    The of 1000 implementations for different numbers of ensemble members, sampled from HELR pdfs using EQ, IR, and SS.

  • Fig. 10.

    The RPSS with the climatological forecast as a reference for the raw ensemble, the calibrated forecast of HELR, S-SS, S-EQ, and S-IR (with n = 11) for four stations (columns) and for (top) 0–1- and (bottom) 4–5-day lead times.

  • Fig. 11.

    The BSS for the threshold of 20 mm with the climatological forecast as a reference for the raw ensemble (R), the calibrated forecast of HELR (HR), S-SS, S-EQ, and S-IR (with n = 11) for four stations (columns) and for (top) 0–1- and (bottom) 4–5-day lead times.

  • Fig. 12.

    Multivariate rank histogram of raw ensemble forecasts, ECC-SS, ECC-EQ, and ECC-IR for 0–1- (four upper panels) and 4–5-day (four lower panels) accumulated precipitation.
