1. Introduction
The advantages of ensemble forecasting have been widely recognized: by running a forecast model several times with different initial conditions, and often also with differences in model physics or configurations, a range of possible outcomes is produced, so that the forecast accounts for uncertainty and is probabilistic. However, because the perturbed initial conditions do not fully reflect the uncertainty of the initial conditions and the forecast models do not perfectly represent the real physics, ensemble forecasts are often subject to bias and underdispersion (Scherrer et al. 2004; Hamill and Colucci 1997; Messner et al. 2014a; Wilks 2015). Therefore, statistical postprocessing methods have been developed to calibrate the raw ensemble forecasts, such as Gaussian regression (Gneiting et al. 2005; Hagedorn et al. 2008; Kann et al. 2009; Williams et al. 2014), (modified) Bayesian model averaging (Raftery et al. 2005; Sloughter et al. 2007; Hamill 2007; Schmeits and Kok 2010), logistic regression (Hamill et al. 2004, 2008; Wilks and Hamill 2007), ensemble dressing (Roulston and Smith 2003), extended logistic regression (Wilks 2009; Schmeits and Kok 2010), and heteroscedastic extended logistic regression (Messner et al. 2014a,b).
The above-mentioned methods apply only to univariate predictands and cannot account for the dependence structure between different variables, stations, and lead times, which can have a significant impact on, for example, hydrological applications (Clark et al. 2004; Wilks 2015). To reconstruct the dependence structure from a postprocessed predictive probability distribution, methods such as the Schaake shuffle (Clark et al. 2004) and ensemble copula coupling (ECC; Schefzik et al. 2013) have been proposed. In the Schaake shuffle, the dependence structure is reconstructed from the historical climatology, while ECC is based on the raw forecast ensemble. In both methods, a necessary and critical step is to draw discrete samples from each continuous postprocessed predictive distribution. It is this sampling step that we address in this paper. At present, the equidistant quantiles (EQ) and independent random (IR) sampling methods are commonly used (Schefzik et al. 2013; Wilks 2015). Schefzik et al. (2013) and Wilks (2015) showed that EQ sampling is preferred to IR sampling.
In this paper, we propose an alternative sampling method, stratified sampling (SS), for drawing discrete samples from each continuous postprocessed predictive distribution. The SS method samples the tails of the calibrated probability density function (pdf) better than EQ does, and it has been widely used in other applications (Claggett et al. 2010; Wallenius et al. 2011; Noble et al. 2012; Padilla et al. 2014; Ding and Lee 2014). The performance of SS in representing a postprocessed forecast probability distribution and in its application within ECC is assessed and compared with that of EQ and IR sampling. The methods are applied to GFS ensemble reforecasts of precipitation (Hamill et al. 2013) over the Xixian subbasin of the Huai River catchment in China.
2. Methods used
a. Statistical postprocessing with HELR
Wilks (2009), Schmeits and Kok (2010), and Roulin and Vannitsem (2012) showed that the function
b. Sampling methods
After postprocessing the raw ensemble forecasts with HELR, continuous postprocessed predictive distributions are obtained. If the calibrated ensemble forecasts are to drive a hydrological model for ensemble streamflow prediction, a necessary and key step is to draw discrete samples from the continuous postprocessed predictive distributions using a sampling method. At present, the EQ and IR sampling methods are commonly used (Schefzik et al. 2013; Wilks 2015). In this study, the SS method (Ding and Lee 2014) is used as well.
The differences among the SS, EQ, and IR approaches are illustrated in Fig. 1, where n = 11 samples are drawn from the standard uniform distribution in each of 1000 repeated implementations, and the quantiles are ranked from smallest to largest within each implementation. For EQ, the quantile series is fixed [Eq. (3)]. Compared to IR, SS is more representative of the standard uniform distribution because the SS quantile series always evenly covers the range from 0 to 1, while the range of IR samples is sometimes very narrow, such as [0.5, 0.8], and then fails to properly represent the range from 0 to 1.
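The three ways of drawing n quantile levels can be sketched as follows. This is a minimal sketch: the exact form of Eq. (3) is not reproduced here, so the common conventions are assumed, namely levels i/(n + 1) for EQ and one uniform draw per stratum [(i − 1)/n, i/n) for SS; the discrete forecast samples then follow by applying the inverse CDF of the postprocessed distribution to these levels.

```python
import numpy as np

def eq_levels(n):
    """EQ: fixed, equidistant quantile levels i/(n + 1), i = 1..n (assumed form)."""
    return np.arange(1, n + 1) / (n + 1)

def ir_levels(n, rng):
    """IR: n independent draws from the standard uniform distribution, sorted."""
    return np.sort(rng.uniform(size=n))

def ss_levels(n, rng):
    """SS: one uniform draw per stratum [(i - 1)/n, i/n), i = 1..n (assumed form)."""
    return (np.arange(n) + rng.uniform(size=n)) / n
```

By construction, every SS implementation places exactly one level in each of the n strata, which is why the SS quantile series always covers the full range from 0 to 1, unlike IR.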
In theory, compared to the EQ method, the SS method better samples the tails of the continuous distribution. From Eq. (3), it is clear that EQ does not have the ability to sample the intervals
c. ECC method
The ECC method (Schefzik et al. 2013) is applied here to preserve the space–time-dependent structure of raw ensemble precipitation forecasts when reconstructing the postprocessed ensemble forecast. The first step of ECC is, similarly to standard postprocessing, to apply statistical techniques to postprocess univariate variables and obtain calibrated predictive distributions for each time and location. The second step is to sample discrete values from each postprocessed predictive distribution to obtain postprocessed ensemble forecasts for each time and location. The third step is to rearrange the sampled values in the rank order structure of the original raw ensemble to restore the space–time-dependent structure. In general, ECC can also be used to reconstruct the correlation structure between different variables.
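The third step, the rank-order rearrangement, can be sketched for a single margin (one variable at one location and lead time) as follows; `ecc_reorder` is a hypothetical helper name, and the raw-ensemble ranks are assumed to be free of ties:

```python
import numpy as np

def ecc_reorder(raw_members, calibrated_samples):
    """Rearrange the calibrated samples into the rank order of the raw
    ensemble members for one margin (one location and lead time)."""
    # rank (0-based) of each raw member within the raw ensemble
    ranks = np.argsort(np.argsort(raw_members))
    # assign the k-th smallest calibrated sample to the member of rank k
    return np.sort(calibrated_samples)[ranks]
```

Applying this margin by margin transfers the rank structure of the raw ensemble, and hence its space-time dependence, to the calibrated samples.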
3. Data and experiment setup
a. Study area and data
The data used in this study cover the Xixian subbasin of the Huai River catchment in eastern China. They include 24-h accumulated precipitation measurements from 0000 to 0000 UTC at 17 precipitation stations and 24-h accumulated precipitation reforecasts from 0000 to 0000 UTC in 11 model grid boxes at a Gaussian ~0.5° resolution from the GFS 11-member ensemble reforecast (Fig. 2; Hamill et al. 2013). The period from 1 May to 1 October, which covers the rainy season, of the years 2006–09 is considered. Area-averaged forecast precipitation is calculated by weighted averaging of the precipitation at the 11 forecast model grid points for each ensemble member, and area-averaged measured precipitation by weighted averaging of the data of the 17 precipitation stations.
b. Experiment setup
To assess the performance of the HELR method for calibrating the raw ensemble precipitation forecasts, the root-mean-square error (RMSE; appendix, section a), the fraction of measured values falling within the forecast range, the ranked probability skill score (RPSS; appendix, section c), and the Brier skill score (BSS; appendix, section d) are used. Forecast lead times of 0–1, 1–2, 2–3, 3–4, 4–5, 5–6, 6–7, and 7–8 days are evaluated. Sixfold cross validation is employed; that is, for a given lead time, five subsamples of the available data are used to train the HELR postprocessing model and the remaining subsample is used to verify the predictive distribution, rotating through all six permutations of these subsamples.
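The sixfold cross validation can be sketched as follows. This is a minimal sketch: `sixfold_splits` is a hypothetical helper, and contiguous folds are assumed, whereas the actual partitioning of the 2006–09 data is not specified here.

```python
import numpy as np

def sixfold_splits(n_cases, k=6):
    """Yield (train, test) index pairs: each fold serves once as the
    verification subsample while the other k - 1 folds train the model."""
    folds = np.array_split(np.arange(n_cases), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]
```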
After postprocessing the raw ensemble forecasts using HELR, the continuous postprocessed predictive distributions are obtained. Discrete samples from the calibrated predictive distributions are taken to construct the postprocessed ensemble members with size n using the EQ, IR, and SS methods. The sample size n from a given predictive distribution is set as
As a final part of this study, the effect of the sampling method on the reconstruction of ensemble members with preserved time–space dependence structure is analyzed by using EQ, IR, and SS within ECC to reconstruct postprocessed ensemble members for four locations. Here, accumulated forecast precipitation for the Huanggang, Dingyuandian, Wulidian, and Xiaocaodian stations in the Xixian basin (Fig. 2) for 0–1- and 4–5-day lead times is used. The ensemble precipitation reforecasts from GFS are bilinearly interpolated from the neighboring grid points to the station locations (Messner et al. 2014a,b).
The above-mentioned sixfold cross validation is also used here. Next, SS, EQ, and IR sampling are used to draw discrete values from the postprocessed marginal predictive distributions. Finally, the ECC method is used to reconstruct the dependence structure, and the multivariate rank histogram (appendix, section e) is employed to assess the performance of ECC-SS, ECC-EQ, and ECC-IR.
4. Results and analysis
a. HELR performance for postprocessing ensemble precipitation forecasts
The RMSE is used to assess the performance of the HELR model [Eq. (6)] via the calibrated ensemble mean forecast (CEMF). Figure 3 shows that for the first 4 days, CEMF is better than the raw ensemble mean forecast (REMF). For the fifth and sixth days, CEMF is slightly better than or similar to REMF, but for the seventh and eighth days, CEMF is worse than REMF. Overall, as lead time increases, the calibration capacity of HELR for the ensemble mean forecast is reduced.
Figure 4 shows the fraction of measured values falling within the 17%–83% and 8%–92% forecast range of the raw ensemble forecasts and calibrated forecasts obtained by HELR, respectively. For the raw ensemble forecasts, the forecast range of 17%–83% is determined using the second-smallest and the tenth-smallest ensemble member, and the 8%–92% forecast range is determined using the smallest and the largest ensemble member for a given forecast time. Compared to the raw forecast, the fraction of measured values falling within the forecast range clearly increases for the calibrated forecast for all lead times. For the 17%–83% forecast range, the underdispersion has been corrected by postprocessing, leading to a good coverage of about 70%. For the 8%–92% forecast range, the underdispersion has also been corrected by postprocessing, but now leading to a slight overdispersion with a coverage of about 92%. This indicates that the calibrated forecast of HELR has clearly improved the dispersion of the predictive distribution.
The RPSS results are presented in Fig. 5. The calibrated forecast provided by the HELR model shows a significant improvement over the raw forecast for all lead times. As lead time increases, the RPSS values decrease for both the raw forecast and the HELR model, which indicates a decrease in predictability and a reduced calibration capacity of HELR.
The Brier skill score with the climatological forecast as a reference is computed for the thresholds 2 mm (63%), 10 mm (88%), and 20 mm (94%). For the three thresholds, the BSS of HELR is larger than that of the raw forecast (Fig. 6). This indicates again that the HELR model has a capacity to calibrate the raw ensemble forecast.
b. Comparison among EQ, IR, and SS
Figure 7 shows that for 0–1- and 4–5-day lead times the
Figure 8 presents the results of
The verification results of SS, EQ, and IR in terms of
c. Application of EQ, IR, and SS in ECC
First, HELR is applied to postprocess the raw ensemble precipitation forecasts for the Huanggang, Dingyuandian, Wulidian, and Xiaocaodian stations to obtain a continuous calibrated pdf for each of these stations. Then, SS, EQ, and IR are used to draw discrete samples (with sample size n = 11) from each HELR pdf, denoted by S-SS, S-EQ, and S-IR, respectively. The ranked probability skill scores (appendix, section c) with the climatological forecast as a reference are calculated for the raw ensemble and for the calibrated pdfs of HELR, S-SS, S-EQ, and S-IR for the four stations separately (Fig. 10). The calibrated forecast of HELR shows a significant improvement over the raw forecast for all four stations. However, the postprocessed continuous predictive distribution has higher skill than S-SS, S-EQ, and S-IR with their 11 discrete ensemble members sampled from the HELR pdfs; this decrease in skill is caused by the transfer from a continuous predictive distribution to a discrete 11-member ensemble forecast. S-SS is generally better than S-EQ and S-IR, and S-EQ is better than S-IR. This difference in skill between the sampling methods means that the choice of sampling method has a considerable effect on preserving the calibration obtained by the postprocessed continuous predictive distribution.
To investigate the skill differences for relatively extreme precipitation between the three sampling methods, the BSS with the climatological forecast as a reference is computed for the high threshold of 20 mm for the four stations and for 0–1- and 4–5-day lead times (Fig. 11). The percentiles corresponding to the threshold of 20 mm for the four stations are between 93% and 95%. For this threshold, the median BSS of HELR is generally larger than those of the raw forecast, S-SS, S-EQ, and S-IR, indicating that the decrease in BSS for the 20-mm threshold is again caused by the transfer from a continuous predictive distribution to a discrete 11-member ensemble forecast. Overall, S-SS is better than S-EQ and S-IR, and S-EQ is better than S-IR. This result also confirms that SS can better sample the upper tails of the HELR pdfs than the EQ and IR sampling methods can.
Based on the S-SS, S-EQ, and S-IR samples for the four stations, the ECC method is used to reconstruct the dependence structure between the forecasts for the four stations, denoted by ECC-SS, ECC-EQ, and ECC-IR, respectively. Multivariate rank histograms (Gneiting et al. 2008; Schefzik et al. 2013; appendix, section e) are compared for the raw ensemble forecasts, ECC-SS, ECC-EQ, and ECC-IR. Figure 12 presents these multivariate rank histograms for 0–1- and 4–5-day accumulated precipitation. The discrepancy measure values (Gneiting et al. 2008; appendix, section f) calculated for GFS, ECC-SS, ECC-EQ, and ECC-IR are 0.609, 0.1194, 0.2291, and 0.1845, respectively, at the 0–1-day lead time and 0.3495, 0.0991, 0.1727, and 0.2093, respectively, at the 4–5-day lead time. The underdispersion of the raw ensemble forecasts is clearly diminished by all three approaches. In this study, ECC-SS is better than ECC-EQ and ECC-IR, as its multivariate rank histogram is more uniform; ECC-EQ and ECC-IR still show underdispersion to a certain extent.
5. Summary and conclusions
In this paper, we have proposed to use stratified sampling (SS) for better sampling from a continuous predictive distribution, and SS was compared with equidistant quantiles (EQ) and independent random (IR) sampling based on 24-h precipitation from GFS ensemble reforecasts (Hamill et al. 2013) at lead times of 0–1 to 7–8 days over part of the Huai River basin (the Xixian subbasin) in China. Heteroscedastic extended logistic regression (HELR; Messner et al. 2014a) was employed to calibrate the ensemble precipitation reforecasts and provided the predictive distributions to sample from. In terms of RMSE, dispersion of the predictive distribution, ranked probability skill score (RPSS), and Brier skill score (BSS), HELR significantly improves the raw area-averaged ensemble precipitation reforecasts, although its calibration capacity for ensemble mean and probabilistic forecasts is reduced as lead time increases.
For the SS, EQ, and IR approaches used to sample discrete samples from the probability density functions (pdfs) of HELR, and for different sample sizes n = 11, 22, 33, …, 528, 539, 550, the
The application of HELR, sampling, and ECC to ensemble precipitation forecasts for four observation stations in the Xixian basin shows that, in terms of the ranked probability skill score, HELR and the discrete samples from its pdfs drawn using SS, EQ, and IR (denoted by S-SS, S-EQ, and S-IR, respectively) all show a significant improvement over the raw ensemble forecasts for the 0–1-day lead time, while only S-SS and S-EQ show a significant improvement for the 4–5-day lead time. In addition, the BSS values for the threshold of 20 mm show that, compared to EQ and IR, SS better samples the upper tails of the HELR pdfs. However, a decrease in skill of S-SS, S-EQ, and S-IR with respect to HELR is caused by the transfer from the postprocessed continuous predictive distribution to the discrete ensemble forecasts; the choice of sampling method thus has a considerable effect on preserving the calibration obtained by the postprocessed continuous predictive distribution. The multivariate rank histograms indicate that, in this study, ECC-SS preserves the spatial dependence structure better than ECC-EQ and ECC-IR. While the multivariate rank histograms of ECC-EQ and ECC-IR show underdispersion to a certain extent, the multivariate rank histogram of ECC-SS is more uniform, probably because SS better samples the tails of the calibrated pdf. Although not shown in this study, ECC-SS is also capable of reconstructing the dependence structure between different lead times and variables, like ECC-EQ and ECC-IR.
The SS approach can also be used for other postprocessing methods, ensemble weather forecasts, or locations, as long as the postprocessed result is a continuous pdf and the discrete samples are sampled from that pdf. However, the differences in skill between SS and EQ are expected to be small if the calibrated pdf has flat tails.
Finally, it is worth noting that, although raw ensemble precipitation forecasts can be improved and the dependence structure between multivariate meteorological forecasts can also be reconstructed by postprocessing methods, the effect on the total predictive uncertainty of ensemble streamflow forecasts may be limited because of the errors linked to hydrological modeling (Zalachori et al. 2012). To obtain more skillful probabilistic streamflow forecasts, a postprocessing method for ensemble streamflow forecasts is (also) needed.
Acknowledgments
This study is supported by the National Natural Science Foundation of China (Grant 51179046); Special Funds for Public Industry Research Projects of the China Ministry of Water Resources (Grant 201301066); the Fundamental Research Funds for the Central Universities (2015B05514); the Graduate Research and Innovation Program for Ordinary University of Jiangsu Province, China (CXZZ13_0248); the Ministry of Infrastructure and the Environment of the Netherlands; and the China Scholarship Council. Jakob Messner and Achim Zeileis are thanked for providing the crch R package for (heteroscedastic) extended logistic regression, available online (https://cran.r-project.org/). We are also grateful to the anonymous reviewers for their helpful comments, which helped us improve the quality of the article.
APPENDIX
Verification Metrics
a. RMSE
b. Ratio of measured values falling within the prediction range
c. RPSS
The RPS in each subsample is calculated for the fitted model, trained using five remaining subsamples. The thresholds are set to 2 mm (63% for area-averaged precipitation), 5 mm (79%), 8 mm (85%), 10 mm (88%), 15 mm (92%), and 20 mm (94%). For each given projection time, 300 bootstrap samples of RPS are used to estimate the sampling distribution of average
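The bootstrap step can be sketched as follows, assuming the per-forecast RPS values for a given lead time are available as an array (`bootstrap_means` is a hypothetical helper name):

```python
import numpy as np

def bootstrap_means(scores, n_boot=300, seed=0):
    """Resample the per-forecast scores with replacement n_boot times
    and return the bootstrap distribution of the average score."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    # one row of resampled indices per bootstrap replicate
    idx = rng.integers(0, scores.size, size=(n_boot, scores.size))
    return scores[idx].mean(axis=1)
```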
d. BSS
e. Multivariate rank histogram
To assess the preservation of the dependent structure obtained by ECC-EQ, ECC-IR, and ECC-SS, the multivariate rank histogram is used, which can be described as follows (Gneiting et al. 2008; Schefzik et al. 2013).
Let the observation vector and the n ensemble member forecast vectors be given for each time j. The procedure is as follows:
- For a given time j, use Eq. (A9) to find the prerank of each of these vectors among the observation and the ensemble member forecasts. Each prerank is an integer between 1 and n + 1.
- Calculate the multivariate rank for the given time j as follows: the multivariate rank is chosen from a discrete uniform distribution on the set of positions consistent with the observation's prerank, so that ties are resolved at random. It is an integer between 1 and n + 1.
Based on the multivariate ranks for all times, the multivariate rank histogram can be obtained; it is a plot of the empirical frequency of the multivariate ranks. Underdispersed ensembles tend to produce U-shaped histograms, overdispersed ensembles produce hump-shaped histograms, and biased ensembles produce skewed rank histograms.
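The prerank and rank computation can be sketched as follows (a minimal sketch following the standard construction of Gneiting et al. 2008, with the prerank of a vector taken as the number of pooled vectors that are less than or equal to it in every component, and ties resolved uniformly at random):

```python
import numpy as np

def multivariate_rank(obs, ensemble, seed=0):
    """Multivariate rank of the observation among n ensemble members."""
    rng = np.random.default_rng(seed)
    pool = np.vstack([obs, ensemble])                  # row 0: observation
    # prerank: how many pooled vectors are <= this one in every component
    pre = np.array([np.all(pool <= v, axis=1).sum() for v in pool])
    below = int((pre < pre[0]).sum())                  # strictly smaller preranks
    ties = int((pre == pre[0]).sum())                  # includes the observation
    return below + 1 + int(rng.integers(0, ties))      # integer in 1..n + 1
```

Collecting these ranks over all verification times and plotting their empirical frequencies yields the multivariate rank histogram.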
f. Discrepancy measure
REFERENCES
Claggett, P. R., Okay J. A. , and Stehman S. V. , 2010: Monitoring regional riparian forest cover change using stratified sampling and multiresolution imagery. J. Amer. Water Resour. Assoc., 46, 334–343, doi:10.1111/j.1752-1688.2010.00424.x.
Clark, M., Gangopadhyay S. , Hay L. , Rajagopalan B. , and Wilby R. , 2004: The Schaake shuffle: A method for reconstructing space–time variability in forecasted precipitation and temperature fields. J. Hydrometeor., 5, 243–262, doi:10.1175/1525-7541(2004)005<0243:TSSAMF>2.0.CO;2.
Ding, C. G., and Lee H. Y. , 2014: An accurate confidence interval for the mean tourist expenditure under stratified random sampling. Curr. Issues Tour., 17, 674–678, doi:10.1080/13683500.2013.857296.
Gneiting, T., Raftery A. E. , Westveld A. H. III, and Goldman T. , 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098–1118, doi:10.1175/MWR2904.1.
Gneiting, T., Stanberry L. I. , Grimit E. P. , Held L. , and Johnson N. A. , 2008: Assessing probabilistic forecasts of multivariate quantities, with applications to ensemble predictions of surface winds. TEST, 17, 211–235, doi:10.1007/s11749-008-0114-x.
Hagedorn, R., Hamill T. M. , and Whitaker J. S. , 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part I: Two-meter temperatures. Mon. Wea. Rev., 136, 2608–2619, doi:10.1175/2007MWR2410.1.
Hamill, T. M., 2007: Comments on “Calibrated surface temperature forecasts from the Canadian ensemble prediction system using Bayesian model averaging.” Mon. Wea. Rev., 135, 4226–4230, doi:10.1175/2007MWR1963.1.
Hamill, T. M., and Colucci S. J. , 1997: Verification of Eta–RSM short-range ensemble forecasts. Mon. Wea. Rev., 125, 1312–1327, doi:10.1175/1520-0493(1997)125<1312:VOERSR>2.0.CO;2.
Hamill, T. M., Whitaker J. S. , and Wei X. , 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434–1447, doi:10.1175/1520-0493(2004)132<1434:ERIMFS>2.0.CO;2.
Hamill, T. M., Hagedorn R. , and Whitaker J. S. , 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part II: Precipitation. Mon. Wea. Rev., 136, 2620–2632, doi:10.1175/2007MWR2411.1.
Hamill, T. M., Bates G. T. , Whitaker J. S. , Murray D. R. , Fiorino M. , Galarneau T. J. , Zhu Y. , and Lapenta W. , 2013: NOAA’s second-generation global medium-range ensemble forecast dataset. Bull. Amer. Meteor. Soc., 94, 1553–1565, doi:10.1175/BAMS-D-12-00014.1.
Kann, A., Wittmann C. , Wang Y. , and Ma X. , 2009: Calibrating 2-m temperature of limited-area ensemble forecasts using high-resolution analysis. Mon. Wea. Rev., 137, 3373–3387, doi:10.1175/2009MWR2793.1.
Messner, J. W., Mary G. J. , Zeileis A. , and Wilks D. S. , 2014a: Heteroscedastic extended logistic regression for post-processing of ensemble guidance. Mon. Wea. Rev., 142, 448–456, doi:10.1175/MWR-D-13-00271.1.
Messner, J. W., Mary G. J. , and Zeileis A. , 2014b: Extending extended logistic regression: Extended versus separate versus ordered versus censored. Mon. Wea. Rev., 142, 3003–3014, doi:10.1175/MWR-D-13-00355.1.
Noble, W., Naylor G. , Bhullar N. , and Akeroyd M. A. , 2012: Self-assessed hearing abilities in middle- and older-age adults: A stratified sampling approach. Int. J. Audiol., 51, 174–180, doi:10.3109/14992027.2011.621899.
Padilla, M., Stehman S. V. , and Chuvieco E. , 2014: Validation of the 2008 MODIS-MCD45 global burned area product using stratified random sampling. Remote Sens. Environ., 144, 187–196, doi:10.1016/j.rse.2014.01.008.
Raftery, A. E., Gneiting T. , Balabdaoui F. , and Polakowski M. , 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 1155–1174, doi:10.1175/MWR2906.1.
Roulin, E., and Vannitsem S. , 2012: Postprocessing of ensemble precipitation predictions with extended logistic regression based on hindcasts. Mon. Wea. Rev., 140, 874–888, doi:10.1175/MWR-D-11-00062.1.
Roulston, M. S., and Smith L. A. , 2003: Combining dynamical and statistical ensembles. Tellus, 55A, 16–30, doi:10.1034/j.1600-0870.2003.201378.x.
Schefzik, R., Thorarinsdottir T. L. , and Gneiting T. , 2013: Uncertainty quantification in complex simulation models using ensemble copula coupling. Stat. Sci., 28, 616–640, doi:10.1214/13-STS443.
Scherrer, S. C., Appenzeller C. , Eckert P. , and Cattani D. , 2004: Analysis of the spread–skill relations using the ECMWF ensemble prediction system over Europe. Wea. Forecasting, 19, 552–565, doi:10.1175/1520-0434(2004)019<0552:AOTSRU>2.0.CO;2.
Schmeits, M. J., and Kok K. , 2010: A comparison between raw ensemble output, (modified) Bayesian model averaging, and extended logistic regression using ECMWF ensemble precipitation reforecasts. Mon. Wea. Rev., 138, 4199–4211, doi:10.1175/2010MWR3285.1.
Sloughter, J. M., Raftery A. E. , Gneiting T. , and Fraley C. , 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon. Wea. Rev., 135, 3209–3220, doi:10.1175/MWR3441.1.
Wallenius, K., Niemi R. M. , and Rita H. , 2011: Using stratified sampling based on pre-characterisation of samples in soil microbiological studies. Appl. Soil Ecol., 51, 111–113, doi:10.1016/j.apsoil.2011.09.006.
Weigel, A. P., Liniger M. A. , and Appenzeller C. , 2007: The discrete Brier and ranked probability skill scores. Mon. Wea. Rev., 135, 118–124, doi:10.1175/MWR3280.1.
Wilks, D. S., 2009: Extending logistic regression to provide full probability distribution MOS forecasts. Meteor. Appl., 16, 361–368, doi:10.1002/met.134.
Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. International Geophysics Series, Vol. 100, Academic Press, 704 pp.
Wilks, D. S., 2015: Multivariate ensemble model output statistics using empirical copula. Quart. J. Roy. Meteor. Soc., 141, 945–952, doi:10.1002/qj.2414.
Wilks, D. S., and Hamill T. M. , 2007: Comparison of ensemble-MOS methods using GFS reforecasts. Mon. Wea. Rev., 135, 2379–2390, doi:10.1175/MWR3402.1.
Williams, R. M., Ferro C. A. T. , and Kwasniok F. , 2014: A comparison of ensemble post-processing methods for extreme events. Quart. J. Roy. Meteor. Soc., 140, 1112–1120, doi:10.1002/qj.2198.
Zalachori, I., Ramos M. H. , Garçon R. , Mathevet T. , and Gailhard J. , 2012: Statistical processing of forecasts for hydrological ensemble prediction: A comparative study of different bias correction strategies. Adv. Sci. Res., 8, 135–141, doi:10.5194/asr-8-135-2012.