Verification of TIGGE Multimodel and ECMWF Reforecast-Calibrated Probabilistic Precipitation Forecasts over the Contiguous United States*

Thomas M. Hamill, NOAA/Earth System Research Laboratory/Physical Sciences Division, Boulder, Colorado

Abstract

Probabilistic quantitative precipitation forecasts (PQPFs) were generated from The Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE) database from July to October 2010 using data from Europe (ECMWF), the United Kingdom [Met Office (UKMO)], the United States (NCEP), and Canada [Canadian Meteorological Centre (CMC)]. Forecasts of 24-h accumulated precipitation were evaluated at 1° grid spacing within the contiguous United States against analysis data based on gauges and bias-corrected radar data.

PQPFs from ECMWF’s ensembles generally had the highest skill of the raw ensemble forecasts, followed by CMC. Those of UKMO and NCEP were less skillful. PQPFs from CMC forecasts were the most reliable but the least sharp, and PQPFs from NCEP and UKMO ensembles were the least reliable but sharper.

Multimodel PQPFs were more reliable and skillful than individual ensemble prediction system forecasts. The improvement was larger for heavier precipitation events [e.g., >10 mm (24 h)−1] than for smaller events [e.g., >1 mm (24 h)−1].

ECMWF ensembles were statistically postprocessed using extended logistic regression and the five-member weekly reforecasts for the June–November period of 2002–09, the period where precipitation analyses were also available. Multimodel ensembles were also postprocessed using logistic regression and the last 30 days of prior forecasts and analyses. The reforecast-calibrated ECMWF PQPFs were much more skillful and reliable for the heavier precipitation events than ECMWF raw forecasts but much less sharp. Raw multimodel PQPFs were generally more skillful than reforecast-calibrated ECMWF PQPFs for the light precipitation events but had about the same skill for the higher-precipitation events; also, they were sharper but somewhat less reliable than ECMWF reforecast-based PQPFs. Postprocessed multimodel PQPFs did not provide as much improvement to the raw multimodel PQPF as the reforecast-based processing did to the ECMWF forecast.

The evidence presented here suggests that all operational centers, even ECMWF, would benefit from the open, real-time sharing of precipitation forecast data and the use of reforecasts.

Supplemental information related to this paper is available at the Journals Online website: http://dx.doi.org/10.1175/MWR-D-11-00220.s1.

Corresponding author address: Dr. Thomas M. Hamill, NOAA/ESRL, Physical Sciences Division, R/PSD 1, 325 Broadway, Boulder, CO 80305-3328. E-mail: tom.hamill@noaa.gov

1. Introduction

An ongoing challenge with short- and medium-range ensemble prediction systems (EPSs) is how to generate probabilistic forecasts that account for the system errors in the ensemble. System errors include sampling error due to the finite ensemble size; errors introduced by model imperfections such as grid truncation and the use of deterministic parameterizations (Houtekamer and Mitchell 2005); and assimilation system and observation imperfections. There are many methods for treating system error, ranging from introducing stochastic aspects into the ensemble prediction system (Buizza et al. 1999; Shutts 2005; Berner et al. 2009; Palmer et al. 2009; Charron et al. 2010), to using multiple parameterizations (Charron et al. 2010; Berner et al. 2011) or multiple models (Bougeault et al. 2010), to statistical postprocessing.

Two methods that will be explored and contrasted here are multimodel combination and statistical postprocessing. The underlying hypothesis of multimodel ensembles (Krishnamurti et al. 2000; Wandishin et al. 2001; Mylne et al. 2002; Doblas-Reyes et al. 2005; Hagedorn et al. 2005; Weigel et al. 2008; Candille 2009; Johnson and Swinbank 2009; Bougeault et al. 2010; Iversen et al. 2011) is that the many differences between constituent EPSs will cause them to generate ensemble forecasts with quasi-independent systematic errors, so the combination may provide a more accurate estimate of the forecast uncertainty. Practically, also, these are ensembles of opportunity. If all centers are willing to share rather than sell their forecast data, the additional ensemble members can be used for only the cost of data transmittal and storage, so they may provide an inexpensive way to improve forecast skill. However, there are some potential disadvantages of multimodel ensembles. Developing an accurate, stable weather prediction system is costly, so multimodel ensembles are likely to be less useful when formed from immature systems. System outages may prevent routine access to other centers’ ensembles. One or another of the models is likely to have been changed recently, making it difficult to understand the multimodel system error characteristics. Also, the hypothesis of quasi-independent errors may not always hold. Practically, each operational center is interested in providing a high-quality product without depending on another center’s data. When one center develops a method that improves the forecast significantly, it may be adopted at other operational centers. The resulting similarity could produce some collinearity of errors and decreased collective usefulness (Lorenz et al. 2011).

Another method for addressing system error is through statistical postprocessing. Discrepancies between time series of past forecasts from a fixed model and the verifying observations/analyses can be used to modify the real-time forecasts. For some variables such as short-range forecasts of surface temperature, a short time series may be sufficient (Stensrud and Yussouf 2003; Yussouf and Stensrud 2007; Hagedorn et al. 2008). For others such as heavy precipitation and longer-lead forecasts, using a long time series of reforecasts has been shown to dramatically improve the reliability and skill of the probabilistic forecasts (Hamill et al. 2004, 2006; Hamill and Whitaker 2007; Wilks and Hamill 2007; Hamill et al. 2008). A drawback of using reforecasts is that a forecast time series spanning many years or even decades may be necessary to produce a sufficiently large sample to adjust for systematic errors in rare-event forecasts. Since forecast models are frequently updated, which may change the systematic error characteristics, either a forecast model must be frozen once a reforecast dataset has been generated or a new reforecast dataset must be generated every time the modeling system changes significantly. Hence, reforecasting can be computationally expensive and can restrict the ability of a forecast center to upgrade its system rapidly. Recently, statistical postprocessing methods have been the subject of much investigation (Gneiting et al. 2005; Raftery et al. 2005; Sloughter et al. 2007; Wilson et al. 2007; Vannitsem and Nicolis 2008; Glahn et al. 2009; Bao et al. 2010).

To date, however, there have been no systematic comparisons of multimodel and reforecast-calibrated probabilistic quantitative precipitation forecasts (PQPFs) verified over a large enough area and a long enough period of time to confidently assess the relative strengths and weaknesses of these two approaches. This study attempts to provide such a comparison for this high-impact forecast parameter. Using The Observing System Research and Predictability Experiment (THORPEX) Interactive Grand Global Ensemble (TIGGE) forecast data from the National Centers for Environmental Prediction (NCEP), the Canadian Meteorological Centre (CMC), the Met Office (UKMO), and the European Centre for Medium-Range Weather Forecasts (ECMWF), multimodel ensemble 24-h accumulated probabilistic forecasts of precipitation were generated and then compared against ECMWF forecasts that were statistically adjusted using their reforecast dataset. The comparison was performed over the contiguous United States (CONUS) during the period of July–October 2010. Statistical adjustments were also attempted with multimodel forecasts, trained on the previous 30 days of forecasts and analyses.

Below, section 2 describes the datasets used in this experiment, the verification methodology, and the statistical postprocessing method. Section 3 provides results, and section 4 some conclusions.

2. Datasets and methods

a. Analysis data used

A recently created precipitation dataset, NCEP’s Climatology-Calibrated Precipitation Analysis (CCPA), was used for verification. CCPA attempts to combine the relative advantages of the 4-km, hourly NCEP stage-IV precipitation analysis (Lin and Mitchell 2005) and the daily, 0.25° NCEP Climate Prediction Center (CPC) Unified Precipitation Analysis (Higgins et al. 1996). The former is based on gauge and radar data, the latter solely on gauge data. A disadvantage of the stage-IV product is that it may inherit some of the biases inherent in estimating rainfall from radars. A disadvantage of the CPC product is that there are areas of the United States that are only sparsely covered by gauge data. CCPA regressed the stage-IV analysis (the predictor) to the CPC analysis (the predictand), thereby reducing bias with respect to the in situ observations. In several of the driest locations in the western United States, the CCPA analysis was set to missing because the regression was singular or untrustworthy where neither analysis product recorded precipitation. In such cases, CCPA for this study was simply replaced with the stage-IV analysis. For our purposes, we used CCPAs that were also upscaled to 1° and accumulated over a 24-h period in a manner that preserved total precipitation, similar to the “remapping” procedure described in Accadia et al. (2003). CCPAs were available from 2002 to present, a shorter period than the ECMWF reforecasts, thus limiting the amount of training data that could be used in the statistical postprocessing.
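To make the upscaling step concrete, a minimal Python sketch of precipitation-conserving remapping is given below. It is an illustration under assumed array layouts (the function and variable names are hypothetical), not the operational CCPA procedure:

```python
import numpy as np

def upscale_conserving(fine_precip, fine_area, coarse_index, n_coarse):
    """Area-weighted upscaling of 24-h accumulations that preserves
    total precipitation, in the spirit of Accadia et al. (2003).

    fine_precip  : (n_fine,) accumulations on the ~4-km analysis grid
    fine_area    : (n_fine,) areas of the fine grid cells
    coarse_index : (n_fine,) index of the 1-deg box containing each cell
    """
    water = np.bincount(coarse_index, fine_precip * fine_area, n_coarse)
    area = np.bincount(coarse_index, fine_area, n_coarse)
    return water / area  # mean depth per 1-deg box; total water conserved
```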

b. Forecast and reforecast model data

For this experiment, 20 perturbed member forecasts of 24-h accumulated precipitation were extracted from the UKMO, CMC, NCEP, and ECMWF ensemble systems archived in the TIGGE database at ECMWF. Probabilities were calculated directly from the ensemble relative frequency, referred to as “raw” probabilities henceforth. The forecast period was from July to October 2010; only 0000 UTC initial time forecasts were extracted to allow comparison with postprocessed forecasts using ECMWF’s reforecasts, which were generated only from 0000 UTC initial conditions. Daily forecasts of 24-h accumulated precipitation were examined from +1- to +5-day leads. Regardless of the original model resolution, all centers’ forecasts were bilinearly interpolated to a 1° latitude–longitude grid covering CONUS using ECMWF’s TIGGE portal software. ECMWF’s interpolation procedure set the amount to zero if there was no precipitation at the nearest neighboring point and the interpolated value was less than 0.05 mm. No control forecasts were included—just the forecasts from the perturbed initial conditions. Other forecast centers’ contributions to the TIGGE archive were not used here for various reasons, such as the unavailability of 0000 UTC ensemble forecasts from the Japan Meteorological Agency. For size consistency and to facilitate skill comparisons, only the first 20 of the full 50 ECMWF member forecasts were used in the generation of the multimodel ensemble, though the 50-member ECMWF forecasts were evaluated for skill and reliability. More detailed descriptions of the configurations of these four ensemble systems are provided in appendix A.
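As a concrete illustration, the two operations described above, the ensemble relative frequency and the near-zero thresholding applied during interpolation, might look as follows (a sketch with illustrative names, not the actual TIGGE portal implementation):

```python
import numpy as np

def raw_pqpf(ens, threshold):
    """Raw PQPF: fraction of members exceeding the event threshold.

    ens : (n_members, ny, nx) interpolated 24-h accumulations (mm)
    """
    return (ens > threshold).mean(axis=0)

def zero_small_interpolated(interp, nearest):
    """Mimic the interpolation rule described above: zero the amount
    where the nearest source point is dry and the value is < 0.05 mm."""
    out = interp.copy()
    out[(nearest == 0.0) & (interp < 0.05)] = 0.0
    return out
```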

When calibrating ECMWF data with reforecasts, the five-member weekly reforecast precipitation data were extracted from ECMWF’s weekly reforecast archive (Hagedorn 2008) and similarly interpolated to the 1° grid. The control reforecasts were initialized from the ECMWF Interim reanalysis (ERA-Interim; Dee et al. 2011), which used version Cy31r2 of the ECMWF Integrated Forecast System (IFS) in the data assimilation process. The 2010 real-time ensemble forecasts and the reforecasts were then run using IFS model version Cy36r2 (more detail is provided in appendix A).

The four perturbed initial conditions for the reforecasts were generated with a combination of their singular-vector approach (Molteni et al. 1996; Barkmeijer et al. 1999) and their “ensembles of data assimilations” (EDA; Isaksen et al. 2010), which used a cycled, reduced-resolution, four-dimensional variational data assimilation (4D-Var) and perturbed observations. However, for initialization of the reforecasts, ECMWF used the EDA perturbations from 2010 and applied them to the 2002–09 data rather than running the EDA during the reforecast period. Running the EDA for dates in the past would have been computationally expensive, but not doing so may have given the reforecast perturbations less flow-dependent character, possibly making them somewhat statistically inconsistent with ECMWF’s real-time ensemble forecasts.

Since precipitation analysis data were only available for the period from 2002 forward, the training data were limited to the reforecasts for the period from June to November 2002–09, or 8 yr. To limit the possible deleterious effects of seasonal biases in the forecast model, only the reforecast data for the month of interest and the months before and after were used. For example, when calibrating July forecasts, June–August reforecast data were used. With reforecasts generated once per week, this typically meant there were ~13 once-weekly samples × 8 yr = 104 samples. This was in many cases an insufficient sample size, so data from other grid points were used to increase the training sample size (appendix B).

c. Statistical postprocessing methodology

The extended logistic regression (ELR) approach of Wilks (2009) was used here, a procedure that permitted the development of a single regression equation that was suitable for predicting the probabilities of exceeding any precipitation amount. The probability was estimated with a function of the form
$$P(y \le T \mid \mathbf{x}) = \frac{\exp[f(\mathbf{x})]}{1 + \exp[f(\mathbf{x})]} \qquad (1)$$
where $f(\mathbf{x})$ was a linear function of the predictor variables. In this case, the predictors were (i) the ensemble-mean forecast $\bar{x}$ raised to the 0.4 power, (ii) the product of (i) and the ensemble variance $\sigma^2$ raised to the 0.4 power, and (iii) the precipitation event threshold T raised to the 0.4 power. The linear function was thus
$$f(\mathbf{x}) = \hat{\beta}_0 + \hat{\beta}_1\,\bar{x}^{0.4} + \hat{\beta}_2\,\bar{x}^{0.4}(\sigma^2)^{0.4} + \hat{\beta}_3\,T^{0.4} \qquad (2)$$
The choice of these predictors was arrived at through some trial and error. The power transformation of the predictors helped make the input data somewhat more normally distributed. The probabilistic forecast skill was only mildly dependent on the inclusion/exclusion of the predictor with the product of the transformed mean and variance. The skill was also only slightly dependent on the power used in the transform, with 0.4 approximately optimal. Previous values of power transformations in the literature have ranged from ½ in Hamill and Whitaker (2006) and Schmeits and Kok (2010) to ⅓ in Sloughter et al. (2007) to ¼ in Hamill et al. (2008) and Roulin and Vannitsem (2012). The use of the product of the ensemble mean and variance follows Wilks and Hamill (2007). The additional predictor incorporating T permitted the single regression equation to be used to predict probabilities across the range of possible amounts. A disadvantage of this ELR approach [as opposed to approaches such as the analog approach discussed in Hamill and Whitaker (2006)] was that this algorithm was not able to correct for possible position biases in forecast features.
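To make the regression concrete, the sketch below fits a single ELR of the form of Eqs. (1) and (2). It assumes scikit-learn’s LogisticRegression as the maximum-likelihood fitter; the paper does not specify the fitting software, and the variable names are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

POWER = 0.4  # power transformation used in Eq. (2)

def fit_elr(ens_mean, ens_var, analyzed, thresholds):
    """Fit one extended logistic regression across all thresholds.

    ens_mean, ens_var, analyzed : 1-D training arrays [mm (24 h)^-1]
    thresholds : event amounts T pooled into the single equation
    """
    X, y = [], []
    for T in thresholds:
        x1 = ens_mean ** POWER
        x2 = x1 * ens_var ** POWER
        x3 = np.full_like(x1, T ** POWER)
        X.append(np.column_stack([x1, x2, x3]))
        y.append((analyzed <= T).astype(int))  # binary response: y <= T
    # a large C approximates an unregularized maximum-likelihood fit
    return LogisticRegression(C=1.0e6).fit(np.vstack(X), np.concatenate(y))

def prob_exceed(model, ens_mean, ens_var, T):
    """P(precipitation > T) from the fitted ELR, as 1 - P(y <= T)."""
    x1 = np.atleast_1d(ens_mean) ** POWER
    X = np.column_stack([x1, x1 * np.atleast_1d(ens_var) ** POWER,
                         np.full_like(x1, T ** POWER)])
    return 1.0 - model.predict_proba(X)[:, 1]
```

Because one set of coefficients serves all thresholds, the same fitted equation can be evaluated at many amounts to build the forecast CDF needed later in the CRPS calculation.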

ELR was applied both to calibrating real-time multimodel forecasts and to calibrating ECMWF forecasts alone using the weekly reforecasts. It was found that forecast skill increased if some method was applied to increase the modest training sample sizes. A discussion of how sample sizes were augmented using data from other nearby forecast grid points is provided in appendix B.

Roulin and Vannitsem (2012) noted that since the ECMWF reforecast ensemble size (5 members) was smaller than the operational ensemble size (50 members, or in the case here, 20 members selected from the 50), regression coefficients trained on the smaller ensemble may differ somewhat from those that would be obtained with the larger ensemble. Hence, when the coefficients are used to correct the larger real-time ensemble, they may produce somewhat biased probabilistic forecasts. They adjusted the values of the five-member ensemble training data to better estimate the values that would be obtained with the larger real-time ensemble. An analogous approach was tried here but did not improve the forecast skill, so the results discussed below omit this adjustment.

d. Verification methods

The primary verification methods used here were Brier skill scores (BSSs), continuous ranked probability skill scores (CRPSSs), and reliability diagrams (Wilks 2006). The BSS and CRPSS as conventionally calculated (see section 7.4.2 of Wilks 2006) can exaggerate forecast skill, attributing skill to variations in climatological event probabilities. Thus, the procedures suggested in Hamill and Juras (2006) were used here to avoid this.

To calculate the BSS, the score was calculated separately for subsets of points that had more uniform climatological probabilities, and the overall BSS was the average of the skill scores over these subsets. The specific procedure was as follows: using the 1° precipitation analysis data from 2002 to 2009, for each month the climatological probability of a given precipitation event was estimated from the observed frequency. For a given precipitation event such as >1 mm (24 h)−1, the $n_s$ grid points within CONUS were sorted from lowest to highest event probability. The sorted points were then divided into k = 6 classes, with the lowest bin containing the ~$n_s/6$ grid points with the lowest event probabilities, the highest bin containing the ~$n_s/6$ points with the highest probabilities, and so on (Fig. 1). Let $\mathbf{B}_{f1}$ denote a matrix of Brier scores for forecast model f1, whose ith column $\mathbf{b}_{f1,i}$ was an $n_d$-dimensional ($n_d$ = 123, the number of case days here) vector of average Brier scores for the points in the ith class. An element of this vector thus provided the average Brier score for all of the grid points in the ith class on a particular day; the samples were weighted by the cosine of their latitude to account for differences in gridbox size. The average over the 123 case days produced a vector $\overline{\mathbf{b}}_{f1}$, with ith element $\overline{b}_{f1}(i)$. Similarly, for climatology there was an array of Brier scores $\mathbf{B}_{c}$ and a vector of their averages over the 123 days, $\overline{\mathbf{b}}_{c}$. Following Hamill and Juras [2006, their Eq. (9)], the overall BSS for model f1 was then calculated as

$$\mathrm{BSS}_{f1} = \frac{1}{k}\sum_{i=1}^{k}\left[1 - \frac{\overline{b}_{f1}(i)}{\overline{b}_{c}(i)}\right] \qquad (3)$$
The boundaries between the classes were calculated independently for each event, so a given grid point may have been assigned to different classes when evaluating, say, the 1- and 10-mm BSSs.
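In code, the class-stratified BSS of Eq. (3) might be computed as in this sketch (the array layout is an assumption):

```python
import numpy as np

def stratified_bss(bs_fcst, bs_clim, class_id, k=6):
    """BSS averaged over climatological classes, following Eq. (3).

    bs_fcst, bs_clim : (n_days, n_points) daily, cos(latitude)-weighted
                       Brier scores for the forecast and climatology
    class_id         : (n_points,) class index 0..k-1 per grid point
    """
    skill = []
    for i in range(k):
        pts = class_id == i
        skill.append(1.0 - bs_fcst[:, pts].mean() / bs_clim[:, pts].mean())
    return np.mean(skill)  # average of the per-class skill scores
```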

Fig. 1. Illustration of the process for determining precipitation classes used in the calculation of the BSS. (a) Climatological probability of >1 mm (24 h)−1 precipitation as determined from stage-IV data for September 2002–09. (b) Climatological class assigned to each grid point for the 1 mm (24 h)−1 event in September.

BSS confidence intervals were estimated using the paired block bootstrap approach of Hamill (1999). The input data to the bootstrap consisted of the arrays of Brier scores $\mathbf{B}_{f1}$ and $\mathbf{B}_{f2}$ for two competing models, f1 and f2, as well as $\mathbf{B}_{c}$ for climatology. Let $\mathbf{b}_{f1,d}$ be the vector of forecast scores on the dth case day, and similarly $\mathbf{b}_{f2,d}$ the vector for forecast model f2. The daily differences in Brier scores, $\mathbf{b}_{f1,d} - \mathbf{b}_{f2,d}$, were determined to be approximately statistically independent of $\mathbf{b}_{f1,d+1} - \mathbf{b}_{f2,d+1}$, with 1-day lagged rank correlations ranging from 0.08 for 1-day forecasts to 0.21 for 5-day forecasts. Thus, the data were judged amenable to a paired resampling strategy, with these distinct vector blocks of data for each day. The following process was then repeated 10 000 times. For each of the 123 days, a random uniform number between 0 and 1 was generated. If the number was greater than 0.5, $\mathbf{b}_{f1,d}$ was selected for inclusion in sample 1 (s1) and $\mathbf{b}_{f2,d}$ for inclusion in sample 2 (s2), and vice versa if the number was ≤0.5. The vectors of average Brier scores for samples s1 and s2 were then calculated, the BSSs for the two samples were generated via Eq. (3), and the difference between the two BSSs was noted. The confidence intervals are the 5th and 95th percentiles of the distribution of these BSS differences generated through the 10 000 iterations.

These block bootstrap confidence intervals should be regarded as approximations. An assumption underlying this process is that there were 123 independent data samples. However, and were slightly correlated as discussed above, especially for the longer-lead forecasts, which will contribute a slight overestimate of the effective sample size and thus underestimate of the confidence interval. On the other hand, data from grid points across CONUS were aggregated in this procedure and thereafter considered a single block. In reality there may be far more than one independent sample spanning CONUS, thus leading to an underestimate of sample size and consequent overestimate of the confidence interval in this approach. Note also that for simplicity of presentation, the skill diagrams will show only one set of confidence intervals (e.g., between NCEP and ECMWF forecasts). Slightly smaller confidence intervals could be expected were they computed using ECMWF and CMC forecasts, given their more similar skills.
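A sketch of the paired block bootstrap described above, with one block per case day (array names are illustrative):

```python
import numpy as np

def bootstrap_bss_diff(bs_f1, bs_f2, bs_clim, n_iter=10000, seed=0):
    """5th/95th percentiles of BSS(f1) - BSS(f2), paired by case day.

    bs_f1, bs_f2, bs_clim : (n_days, k) daily average Brier scores
                            in each of the k climatological classes
    """
    rng = np.random.default_rng(seed)
    n_days = bs_f1.shape[0]
    clim = bs_clim.mean(axis=0)          # per-class climatological score
    diffs = np.empty(n_iter)
    for it in range(n_iter):
        swap = rng.random(n_days) > 0.5  # coin flip for each day
        s1 = np.where(swap[:, None], bs_f1, bs_f2)
        s2 = np.where(swap[:, None], bs_f2, bs_f1)
        bss1 = np.mean(1.0 - s1.mean(axis=0) / clim)  # Eq. (3)
        bss2 = np.mean(1.0 - s2.mean(axis=0) / clim)
        diffs[it] = bss1 - bss2
    return np.percentile(diffs, [5, 95])
```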

To make sure that the CRPSS did not excessively reflect the skill of the climatologically wet grid points, an alternative to the standard method of CRPSS calculation was followed here. This method is analogous to how the CRPSS would be computed if the forecasts were expressed as probabilities of exceedance of various climatological quantiles. Examples of such a forecast product expressed in quantiles are NCEP/CPC’s 6–10-day and 8–14-day forecasts (e.g., http://www.cpc.ncep.noaa.gov/products/predictions/610day/), which provide probabilities of below-normal/near-normal/above-normal temperature and precipitation (i.e., probabilities relative to the ⅓ and ⅔ climatological terciles). In this alternative method of calculation, the continuous ranked probability score (CRPS) at a given grid point was not computed by integrating differences between observed and forecast cumulative distribution functions (CDFs) over a range of precipitation values (the standard method). Instead, the differences between observed and forecast CDFs were integrated over the percentiles of the CDF, which were determined separately for each model grid point and each month. Specifically, given $n_d$ case days, for each of the $s = 1, \ldots, n_d \times n_s$ samples, let $\mathbf{q}_s = [q_{s,1}, \ldots, q_{s,20}]$ be the 20-dimensional vector of the precipitation quantiles associated with the 2.5th, 7.5th, … , 97.5th percentiles of the climatological CDF for that point and that month. The average forecast $\mathrm{CRPS}_f$ was determined by integrating in steps of 5%:

$$\mathrm{CRPS}_f = \frac{\sum_{s=1}^{n_d \times n_s} \cos(\varphi_s) \sum_{j=1}^{20} 0.05\,\left[F_s^f(q_{s,j}) - F_s^o(q_{s,j})\right]^2}{\sum_{s=1}^{n_d \times n_s} \cos(\varphi_s)} \qquad (4)$$

where $F_s^f(q_{s,j})$ represents the forecast’s CDF for the sth sample evaluated at the jth quantile, $F_s^o(q_{s,j})$ represents the same but for the observed (analyzed) state, and $\varphi_s$ is the latitude of the grid box, the cosine factor accounting for latitudinal variations in gridbox size. For raw ensemble guidance, $F_s^f$ was computed directly from the ensemble relative frequency. For example, if 5 of 20 members had values exceeding $q_{s,j}$, then $F_s^f(q_{s,j}) = 0.75$. For postprocessed forecasts, $F_s^f$ was determined by the numerical integration of Eqs. (1) and (2). For the observed CDF, the analyzed state was assumed perfect; that is, no analysis errors were incorporated, so the analyzed CDF was a Heaviside function: 0 at quantiles less than the analyzed value and 1 at quantiles greater than or equal to the analyzed value. The CRPS of the climatological forecast, $\mathrm{CRPS}_c$, was calculated as in Eq. (4), but substituting the climatological CDF for the forecast CDF. Finally, the overall skill score was calculated as $\mathrm{CRPSS} = 1 - \mathrm{CRPS}_f/\mathrm{CRPS}_c$. As with the BSS, a paired block bootstrap approach was used to estimate the confidence intervals.
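A compact sketch of this quantile-based CRPS calculation of Eq. (4); the array shapes are assumptions:

```python
import numpy as np

def crps_quantile(ens, analyzed, clim_q, lat_deg):
    """CRPS integrated over climatological quantiles, as in Eq. (4).

    ens      : (n_samples, n_members) ensemble forecasts
    analyzed : (n_samples,) verifying analyses
    clim_q   : (n_samples, 20) quantiles for the 2.5th, 7.5th, ...,
               97.5th climatological percentiles
    lat_deg  : (n_samples,) grid-box latitudes (degrees)
    """
    # forecast CDF from ensemble relative frequency at each quantile
    f_cdf = (ens[:, None, :] <= clim_q[:, :, None]).mean(axis=2)
    # analyzed CDF: Heaviside step at the analyzed amount
    o_cdf = (analyzed[:, None] <= clim_q).astype(float)
    crps = 0.05 * ((f_cdf - o_cdf) ** 2).sum(axis=1)
    w = np.cos(np.radians(lat_deg))
    return (w * crps).sum() / w.sum()
```

The climatological CRPS follows by replacing f_cdf with the climatological CDF values at the same quantiles (0.025, 0.075, …, 0.975), and CRPSS = 1 − CRPS_f/CRPS_c.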

Two other common verification statistics were also used: root-mean-square (RMS) error and bias (the average forecast divided by the average analyzed amount).
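In code, these two scores reduce to one-liners (a sketch):

```python
import numpy as np

def rmse(fcst_mean, analyzed):
    """Root-mean-square error of the ensemble-mean forecast."""
    return np.sqrt(np.mean((fcst_mean - analyzed) ** 2))

def bias(fcst_mean, analyzed):
    """Average forecast divided by average analyzed amount (1 = unbiased)."""
    return fcst_mean.mean() / analyzed.mean()
```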

3. Results

a. Properties of forecasts from the individual centers

Before considering the multimodel and ECMWF reforecast-calibrated forecast properties, let us consider the verification of PQPFs from the individual centers. Figure 2 shows >1 mm (24 h)−1 and >10 mm (24 h)−1 event BSSs and CRPSSs. ECMWF generally produced the most skillful raw precipitation PQPFs. Depending on the metric, either NCEP or UKMO produced the least skillful forecasts.


Fig. 2. BSSs of various forecasts for the (a) >1 mm (24 h)−1 event and (b) >10 mm (24 h)−1 event, and (c) CRPSSs, all as a function of forecast lead time. Error bars denote confidence intervals, the 5th and 95th percentiles of a paired block bootstrap between ECMWF and NCEP forecasts.

Interestingly, though UKMO forecasts generally appeared to be more skillful than NCEP forecasts for BSS, they appeared to be consistently worse for CRPSS. This was a consequence of the CRPSS verification algorithm as implemented here, which attempted to equally weight the CRPSS at all grid points, irrespective of whether the climatological event probability was extremely high or extremely low. The conventionally calculated CRPSS is dominated by the performance of the forecasts in the climatologically wet areas (Hamill and Juras 2006). There is inherently greater climatological variance of precipitation for the wet regions, and associated with this there are generally much larger CRPS values than in dry regions. Consequently, when evaluated over many grid points, the conventionally calculated CRPS and hence the CRPSS are dominated by the performance at the wetter points. Figure 3 shows maps of the day +3 CRPSS scores (see the supplemental material for CRPSS maps for the other lead times). The UKMO forecasts had negative skill in the extremely dry regions of the western United States. The RMS errors of the ensemble-mean forecasts in the dry regions of all the models were very small and relatively similar (Fig. 4a; for other lead times, see the supplemental material). However, the UKMO forecasts exhibited a large moist bias in the climatologically dry regions (Fig. 4b), which resulted in a very large overforecast of probabilities and poor skill for those points. This was apparently due to a drizzle overforecast bias in that version of the UKMO’s forecast model (D. Barker 2011, personal communication).


Fig. 3. Maps of average CRPSS for day +3 forecasts for (a) ECMWF, (b) NCEP, (c) UKMO, and (d) CMC.


Fig. 4. (a) RMS errors and (b) bias for day +3 forecasts, each as a function of the climatological probability of >1 mm (24 h)−1 precipitation. Light gray bars in (a) denote the relative frequency of each climatological probability.

Figure 4b also illustrates some other interesting characteristics of the ensemble systems. NCEP overforecasted rainfall for the grid points and dates where the climatological probability was already quite high. CMC forecasts were also biased, exhibiting a moist bias at the lowest climatological probabilities but dry biases for most of the rest of the larger climatological probabilities. ECMWF forecasts were the least biased, with a moderate overforecast bias at the low climatological probabilities.

Figure 5 provides reliability diagrams of day +3 forecasts of the >10 mm (24 h)−1 event. Other reliability diagrams for other lead times and for the >1 mm (24 h)−1 event are available in the supplemental material. CMC forecasts were generally the most reliable, though they were not as sharp as the ECMWF forecasts and hence had a lower BSS. UKMO and NCEP forecasts were much less reliable, though NCEP forecasts were slightly sharper than the others. ECMWF 50-member forecasts were slightly more reliable and skillful than their 20-member subset.


Fig. 5. Reliability diagrams for day +3 forecasts for the >10 mm (24 h)−1 event: (a) ECMWF, (b) NCEP, (c) CMC, and (d) UKMO. The dark line on each is the 20-member reliability curve. The lighter gray line in (a) is the reliability for the full 50-member ensemble. The inset histogram bars show the relative frequency of usage of each probability bin. The black lines on the inset are the relative frequency of usage for the climatological distribution across all the sample points. The gray dots on the inset histogram of (a) are the relative frequency of usage for the full 50-member ECMWF ensemble.

In subjective analyses of individual forecasts, it appeared that several of the forecast models had subtle systematic northward biases in the northern central United States. Figure 6 shows the 10-mm observed contour and the 0.5 probability contour for the >10 mm (24 h)−1 event from the day +3 ECMWF forecasts. Here, the 25 cases with the largest areal coverage of observed precipitation between 105° and 80°W longitude and 35° and 50°N latitude were chosen. Similar plots for the other forecast models are included in the supplemental material.


Fig. 6. Analyzed >10 mm (24 h)−1 precipitation boundary (black line) and area exceeding 10 mm (gray shading) for the 25 cases with the largest areal coverage of greater than 10 mm in the upper Midwest. Red lines indicate the 0.5 probability contour from the ECMWF ensemble for the day +3 forecasts of >10 mm (24 h)−1 precipitation.

b. Properties of multimodel and statistically postprocessed forecasts

Before considering verification scores, consider first two actual forecast cases, presented in Figs. 7 and 8, showing probabilities from the 20-member ensembles and from the 80-member multimodel ensemble. The first case, covering the 24-h period ending 0000 UTC 21 July 2010, illustrates that sometimes the forecast models could be overly similar to each other. Here all the forecast precipitation areas were significantly north of the observed area. A multimodel forecast would not be expected to provide much benefit in such a situation. Figure 8 shows the same type of plot, but for the 24-h period ending 0000 UTC 7 August 2010. Here the multimodel forecast provided some improvement. On this day the CMC and UKMO areas of high probabilities were too far north and the NCEP area too far south, but the higher probabilities in the multimodel forecasts were more coincident with the analyzed regions exceeding 10 mm. Most of the area with greater than 10 mm in the analysis was covered by nonzero multimodel probabilities. More generally, when there was some diversity of positions in the multimodel forecasts, this often allowed the forecast to avoid being inappropriately sharp.


Fig. 7. (a) Analyzed precipitation for the 24-h period ending 0000 UTC 21 Jul 2010; the 10 mm (24 h)−1 contour is denoted by the thick black line. (b) Probability of >10 mm (24 h)−1 precipitation for the day +3 forecast from the ECMWF ensemble for the same period. The analyzed 10-mm contour from (a) is repeated. (c)–(f) As in (b), but for NCEP, CMC, UKMO, and the multimodel combination, respectively.


Fig. 8. As in Fig. 7, but for the 24-h period ending 0000 UTC 11 Aug 2010.

Figure 9 provides BSSs and CRPSSs for the multimodel and postprocessed forecasts. For the light precipitation forecasts [>1.0 mm (24 h)−1; Fig. 9a], the multimodel forecasts improved the skill by approximately +1 day relative to ECMWF at the earliest lead times; a +2-day multimodel forecast could now be made as skillfully as a +1-day ECMWF forecast. The improvement in skill was a more modest ~+0.3 days at the longer forecast lead times. The calibrated multimodel forecast product improved skill over the basic multimodel forecast by a tiny amount at day +1 but degraded the skill after day +3. This is consistent with previous results; at the longer lead times, the growth of errors makes it more difficult to differentiate the model bias from the chaotically induced errors with short training datasets (Hamill et al. 2004). The improvement from reforecast-based postprocessing to the raw ECMWF system was much smaller than the improvement from the single to multimodel approach and was even slightly negative at the day +5 lead. Reasons for the less impressive performance of reforecast calibration than in previous studies will be discussed at the end of this section.


Fig. 9. BSSs of various forecasts for the (a) >1 mm (24 h)−1 event and (b) >10 mm (24 h)−1 event, and (c) CRPSSs, all as a function of forecast lead time. “Multi-model/cal” refers to multimodel forecasts calibrated using ELR. “ECMWF/reforecast” refers to ECMWF forecasts calibrated using ELR and the reforecast dataset. Error bars denote confidence intervals, the 5th and 95th percentiles of a paired block bootstrap between ECMWF and NCEP forecasts.

More impressive increases in skill were evident for the >10 mm (24 h)−1 event. Both the reforecast-based calibration and the multimodel approach increased forecast skill by an equivalent of up to +2 days of additional lead time. Again, the calibration of the multimodel forecasts provided modest improvement at the early leads and degradation at the longer leads relative to the unprocessed multimodel approach.

Measured in CRPSS, the multimodel forecasts produced the most skillful forecasts, exceeding the skill of reforecast-calibrated ECMWF forecasts by a small amount. Consider now where the forecasts were improved or degraded by the various approaches. Figure 10 provides maps of the day +3 CRPSS; maps for other lead times are in the supplemental material. The patterns of multimodel skill are rather similar to those of the most skillful ensemble system, ECMWF (Fig. 3a). The reforecast-calibrated ECMWF forecasts appear to have increased the skill most notably in the driest regions of the western United States.


Fig. 10. Maps of CRPSS for day +3 forecasts for (a) the multimodel, (b) the multimodel with ELR calibration, and (c) ECMWF with ELR calibration using reforecasts.

Figure 11 shows day +3 >10 mm (24 h)−1 event reliability diagrams for the multimodel, calibrated multimodel, and reforecast-calibrated ECMWF PQPFs. The raw multimodel PQPFs were slightly more reliable than any of the PQPFs from the individual centers (Fig. 5) and retained a slight overforecast bias at the higher probabilities. The improvements in reliability were more substantial than for the >1 mm (24 h)−1 event (see diagrams in the supplemental material). The reforecast-calibrated PQPFs exhibited a slight underforecast bias and were not as sharp as those from the multimodel forecasts. Was this due to some inhomogeneity between the 2002–09 training data and the 2010 real-time forecasts? Figure 12 shows that there were fewer large forecast busts in 2010 than there were in 2002 or 2006. When the regression analysis from the 2002–09 data was applied to correct the 2010 forecasts, the assumption was that the 2010 forecasts would be equally unskillful. In fact they were better, and as a consequence the postprocessed forecasts were less sharp than they could have been. Though it was not attempted here, it might be possible to apply ad hoc corrections to the training data to improve the regression analysis. Perhaps a slight adjustment of the training data ensemble mean toward the analyzed data would make its accuracy more closely resemble that of the 2010 data, sharpening and making the ELR forecasts more reliable and skillful.


Fig. 11. As in Fig. 5, reliability diagrams for day +3 forecasts for the >10 mm (24 h)−1 event: (a) the multimodel, (b) the multimodel with ELR calibration, and (c) ECMWF with ELR calibration using reforecasts.


Fig. 12. Histogram of the absolute errors of day +3 ensemble-mean precipitation forecasts for the 2002 and 2006 reforecasts and for the 2010 20-member real-time ensemble.

Figure 13 shows the multimodel areal coverage of the 0.5 probability contours for the >10 mm (24 h)−1 event for selected cases; these should be compared with Fig. 6 for the ECMWF-only PQPFs. Figure 14 shows the same areal coverage, but for the reforecast-calibrated ECMWF PQPFs. The areal coverage was only slightly smaller for the multimodel PQPFs than it was for the ECMWF PQPFs, illustrating that the multimodel forecasts did not lose a tremendous amount of sharpness (coverage of greater than 0.5 probability being a proxy for sharpness here). In comparison, the reforecast-calibrated PQPFs in Fig. 14 show a marked decrease in the areal coverage; many grid points with probability p > 0.5 in the raw ECMWF PQPF had p < 0.5 after calibration. Figures 15 and 16 show, for the cases plotted in Figs. 7 and 8, a bit more detail on what happened with typical multimodel and reforecast-calibrated PQPFs. The multimodel forecasts retained their sharpness, but not always desirably so. For example, in Fig. 15, the multimodel forecasts retained relatively high probabilities in eastern Iowa and northern Illinois, whereas the analyzed area was displaced farther south. The reforecast-calibrated PQPFs decreased the areal coverage of high probabilities, appropriately so in this case, reducing the false alarms. However, as seen in the inspection of Figs. 13 and 14, there were many cases when the sharpness retained in the multimodel forecasts was desirable.


Fig. 13. As in Fig. 6, but for multimodel forecasts.


Fig. 14. As in Fig. 6, but for reforecast-calibrated ECMWF forecasts.


Fig. 15. (a) Analyzed precipitation for the 24-h period ending 0000 UTC 21 Jul 2010; the 10-mm contour is denoted by the thick black line. (b) Probability of >10 mm (24 h)−1 precipitation for the day +3 forecast from the ECMWF ensemble for the same period. (c) As in (b), but for the multimodel ensemble. (d) As in (b), but for the reforecast-calibrated ECMWF ensemble.


Fig. 16. As in Fig. 15, but for the 24-h period ending 0000 UTC 8 Aug 2010.

The results exhibited here with reforecast calibration were not as impressive as they have been in previous studies (e.g., Hamill and Whitaker 2006; Hamill et al. 2008). There are at least four reasons for this. First, the training data were not as accurate as the real-time data in this application (Fig. 12), and this inhomogeneity degraded the regression analysis. This may have been due to less accurate initial conditions (ERA-Interim for the reforecast, operational 4D-Var for the real-time forecasts) and because the reforecast ensemble was initialized with perturbations that were constructed with approximations different from those in the real-time forecasts (section 2b). The second reason is that gratifying improvements have been made to models and EPSs so that they produce more skillful and reliable forecasts than they did even in the recent past; it is tougher to improve upon ECMWF’s 2010 model output than its 2005 model output. The third reason is that even with the use of ECMWF’s reforecasts, there really was a limited training dataset in this study, here because of the unavailability of precipitation analyses prior to 2002 and the unavailability of reforecast data more frequent than once per week. The fourth reason is that in prior studies, the ensemble forecasts (at coarse resolution) were evaluated against analysis data at finer resolution, so that the reforecast calibration process was also producing a statistical downscaling. This point is worth keeping in mind when considering the relative merits of reforecast calibration versus multimodel approaches. If the desired output is forecast data at the grid scale, multimodels may have substantial appeal. If the desired output is point data or high-resolution gridded data, the statistical downscaling is more straightforward when reforecasts are used.

Overall, the impressive skill improvements provide evidence of the merit of both the multimodel ensemble and reforecast approaches. Should other forecast centers share precipitation ensemble data, large gains in probabilistic precipitation forecast skill are possible for little more than the cost of data transmission and storage. Alternatively, should any one center produce and utilize reforecasts, it can improve its own forecasts significantly, assuming a comparably long time series of observations or analyses is available. The improvement noted here with reforecasts may also have been modest because the training data were limited by the short time series of analyses, dating back only to 2002; only around 40% of the available reforecast data were used.

4. Conclusions

This article examined probabilistic multimodel weather forecasts of precipitation over CONUS and the relative advantages and disadvantages of these forecasts when compared to statistically postprocessed ECMWF forecasts. Twenty-member forecasts were extracted from the ECMWF, NCEP, UKMO, and CMC global ensemble systems at 1° resolution between July and October 2010. Daily 24-h accumulated probabilistic precipitation forecasts were generated from the resulting 80-member ensemble for lead times from +1 to +5 days and compared to gridded precipitation analyses. Two statistically postprocessed products were also evaluated, the first being multimodel forecasts that were adjusted using extended logistic regression and trained on the previous 30 days of forecasts and analyses. The second was ECMWF forecasts statistically adjusted using reforecast and analysis data for the period 2002–09, the period when both reforecasts and analyses were available.

Considering first the skill of forecasts from the individual EPSs, ECMWF forecasts generally were the most skillful in terms of Brier skill scores and the continuous ranked probability skill score. CMC forecasts were the most reliable but the least sharp, while NCEP and UKMO forecasts were more sharp but less reliable.

Multimodel probabilistic forecast products were substantially more skillful than the best of the individual centers’ probabilistic forecasts. The improvement was equivalent to an extra +0.5 to +1 day of forecast lead time for light precipitation events and as much as +2 days for heavier precipitation events. The reforecast-calibrated ECMWF forecasts exhibited more skill and reliability improvement for the >10 mm (24 h)−1 event than for the >1 mm (24 h)−1 event. Relative to the multimodel forecasts, the reforecast-calibrated skill was similar for the >10 mm (24 h)−1 event, but the reforecast-calibrated forecast was more reliable while the multimodel forecast was sharper.

The results exhibited here with reforecast calibration were not as impressive as they have been in previous studies. There were at least four reasons for the lessened improvement of reforecast calibration here. First, the reforecast training data were shown to be less accurate than the real-time data in this application. Second, gratifying improvements have been made to models and EPSs in the last few years; it is tougher to improve upon ECMWF’s 2010 model output than its 2005 model output. Third, a limited training dataset was available for this study. Fourth, prior studies were performed at higher resolution and produced a statistical downscaling that the coarser raw forecasts could not accomplish.

I was pleasantly surprised by the magnitude of skill improvements demonstrated here from multimodel ensembles, improvements that were larger than those seen with 2-m temperatures (Hagedorn et al. 2012). From my own experience, however, I recommend some caution against broadly generalizing these results to any multimodel ensemble system. This study examined a combination of data from four mature EPSs based on mature models and assimilation systems. Each center’s system has been refined through the collective efforts of hundreds if not thousands of person-years of research and development. A combination of less developed EPSs may not provide nearly the same gratifying result.

Nonetheless, these results demonstrate the potential value of multimodel ensembles. THORPEX, organized by the World Meteorological Organization, has promoted the concept of a multimodel-based “Global Interactive Forecast System” (Bougeault et al. 2010), whereby the operational centers share data to facilitate the production of multimodel products for high-impact weather events. This study provides additional evidence for the validity and potential benefits of such a system. Currently, several centers have restrictive data policies; full access to their data is reserved for paying customers, and those customers cannot thereafter share the data they purchased. In the United States and Canada, by contrast, the data are effectively free, since the research, development, and production were funded by public taxpayer funds. Perhaps the approach embraced in these two countries will be followed by other centers worldwide, for the mutual benefit of all.

Finally, can we all have “the best of both worlds”? That is, will NWP centers agree to both share their ensemble data freely and internationally in real time, and produce reforecast datasets so that each model can be calibrated to remove systematic errors prior to their combination? There is evidence that such approaches will provide a substantial benefit. The climate community is working on sharing multimodel information and hindcasts to facilitate the error correction for intraseasonal and seasonal forecasts. For weather and weather-to-climate applications, there have also been successful demonstrations of multimodel calibrated forecasts (Vislocky and Fritsch 1995; Whitaker et al. 2006). The National Oceanic and Atmospheric Administration (NOAA) is currently developing a new reforecast dataset for its global ensemble prediction system, and I hope that other centers will be inspired to do so as well.

Acknowledgments

TIGGE data were obtained from ECMWF’s TIGGE data portal. I thank ECMWF for the development of this portal software and for the archival of this immense dataset. Florian Pappenberger of ECMWF was very helpful in extracting and preprocessing the reforecasts. Yan Luo of NCEP/EMC was helpful in obtaining the CCPA data. Tom Galarneau of NCAR/MMM is thanked for providing an informal review. Publication of this study was funded by NOAA THORPEX. Three anonymous reviewers are thanked for their helpful recommendations to improve this article.

APPENDIX A

Description of Modeling Systems Used

Here are additional details on the forecast models and ensemble systems used in this study.

a. NCEP

NCEP used the Global Forecast System (GFS) model in their ensemble system at T190L28 resolution. A lengthier description of the physical packages used in this model was given in Hamill et al. (2011). A description of the GFS model is available from the NCEP Environmental Modeling Center (EMC), with changes as of 2003 included (see online at www.emc.ncep.noaa.gov/gmb/moorthi/gam.html).

The control initial condition around which the perturbed initial conditions were centered was produced by the T382 gridpoint statistical interpolation (GSI) analysis (Kleist et al. 2009) at T384L64 resolution. Perturbed initial conditions were generated with the ensemble transform with the rescaling technique of Wei et al. (2008). Stochastic perturbations were included, following Hou et al. (2008). More details on changes to the NCEP ensemble system can be found online (see http://www.emc.ncep.noaa.gov/gmb/yzhu/html/ENS_IMP.html).

b. Canadian Meteorological Centre

The CMC EPS used the Global Environmental Multiscale Model (GEM), a primitive equation model with a terrain-following pressure vertical coordinate. Further documentation on the GEM model can be found online (see http://collaboration.cmc.ec.gc.ca/science/rpn/gef_html_public/DOCUMENTATION/GENERAL/general.html) and in Charron et al. (2010). The CMC ensemble system used a horizontal computational grid of 400 × 200 grid points, or approximately 0.9°, and 28 vertical levels. The ensemble Kalman filter (EnKF) initial conditions were used, following Charron et al. (2010) and Houtekamer et al. (2009; and references therein). The 20 forecast ensemble members used a variety of perturbed physics: changing gravity wave drag parameters, land surface process type, condensation scheme type, convection scheme type, shallow convection scheme type, mixing-length formulation, and turbulent vertical diffusion parameter. More details on these are provided online (see http://www.weatheroffice.gc.ca/ensemble/verifs/model_e.html).

c. ECMWF

The ECMWF EPS used the ECMWF Integrated Forecast System (IFS) model, version Cy36r2, for both the real-time forecasts and the reforecasts. The model resolution was T639L62 in both cases (details on the IFS are provided at www.ecmwf.int/research/ifsdocs/). The changes to the ensemble stochastic treatments in the 8 September 2009 implementation are described in Palmer et al. (2009). The ensemble was initialized with a combination of initial-time and evolved total-energy singular vectors (Buizza and Palmer 1995; Molteni et al. 1996; Barkmeijer et al. 1998, 1999; Leutbecher 2005) and utilized stochastic perturbations to physical tendencies. An overview of the ensemble system was provided in Buizza et al. (2007; and references therein). For consistency with the analyses of other EPSs, only the first 20 perturbed members were used here.

d. UKMO

The Met Office ensemble system was the Met Office Global and Regional Ensemble Prediction System (MOGREPS). The forecasts used here came from its global component, which was described in Bowler et al. (2008, 2009). The global system was run at a resolution of 0.83° longitude and 0.55° latitude on a regular latitude–longitude grid, with 70 vertical levels (Tennant et al. 2011). Initial-condition perturbations were generated from an implementation of the ensemble transform Kalman filter (Hunt et al. 2006; Bowler et al. 2009). The mean initial state was generated from the UKMO 4D-Var system (Rawlins et al. 2007). The model included a parameterization of one type of model uncertainty via its stochastic kinetic-energy backscatter scheme, following Shutts (2005) and Tennant et al. (2011).

APPENDIX B

Methodology to Increase Training Sample Size

This appendix discusses the method used to augment the training sample size used in the regression analyses. Suppose when calibrating multimodel ensembles using the past 30 days of forecasts and analyses, only data from the grid point of interest were used. This would provide, of course, only 30 training samples. Older forecasts could be used, but precipitation biases are often seasonally dependent, so the older data may degrade the results despite augmenting the sample size. Also, with such a multimodel ensemble, the farther back into the past one seeks training data, the more likely it is that at least one of the models will have had a major upgrade and concomitant change in systematic error characteristics.

Despite ECMWF providing a multidecadal reforecast, in practice the sample sizes were too small here, too. When using the 2002–09 weekly, five-member ECMWF reforecasts (including reforecast dates ±6 weeks around the week of interest), this provided a total of 13 weeks × 8 yr = 104 samples. In both cases these were relatively small samples to estimate four regression parameters, and, especially for rare events such as heavy precipitation, experience has shown that larger training samples improved the regression analysis.

Hence, following the general philosophy demonstrated and discussed in Hamill et al. (2008) and inspired by the regionalization used in some model output statistics algorithms (Lowry and Glahn 1976), the training dataset for a particular grid point was augmented by finding 25 other grid points that had relatively similar climatological analyzed CDFs. Consider a particular location $\lambda_0$ at which we seek to augment the sample size, and another location $\lambda_s$ under consideration as a source of supplemental training data. Differences between the analyzed cumulative probabilities at $\lambda_0$ and $\lambda_s$ were measured at the 1, 2.5, 5, 10, 25, and 50 mm (24 h)−1 amounts and then weighted by the respective factors [1, 2.5, 5, 10, 25, 50]. That is, a cumulative probability difference of 0.1 at 1 mm and a difference of 0.1/50 at 50 mm were judged to have the same weighted difference (this approach is admittedly somewhat arbitrary, and testing found that the overall calibration results were relatively insensitive to the details of this assumption). The maximum weighted difference over the possible precipitation amounts was then noted for this $\lambda_s$. Having evaluated the maximum of the weighted differences at all grid points less than eight grid points distant from the grid point of interest $\lambda_0$, the 25 grid points with the smallest weighted differences were identified, and the training sample for $\lambda_0$ was augmented by the forecast–analysis pairs at these locations. This approach increased the sample size, but the forecast bias might have been different at the supplemental locations, so the augmentation was not an unalloyed benefit. For more discussion of this, see Hamill et al. (2008, their section 3a).
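A sketch of this supplemental-location search follows, with the grid bookkeeping simplified to a precomputed distance array (names are hypothetical):

```python
import numpy as np

AMOUNTS = np.array([1.0, 2.5, 5.0, 10.0, 25.0, 50.0])  # mm (24 h)^-1

def supplemental_points(clim_cdf, grid_dist, p0, n_extra=25, max_dist=8):
    """Find the grid points whose climatological CDFs best match point p0.

    clim_cdf  : (n_points, 6) P(analysis <= amount) at AMOUNTS
    grid_dist : (n_points,) distance from p0, in grid points
    """
    # weight CDF differences so that a 0.1 difference at 1 mm counts
    # the same as a 0.1/50 difference at 50 mm
    weighted = np.abs(clim_cdf - clim_cdf[p0]) * AMOUNTS
    score = weighted.max(axis=1)              # worst weighted difference
    nearby = np.where((grid_dist < max_dist) & (grid_dist > 0))[0]
    return nearby[np.argsort(score[nearby])][:n_extra]
```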

REFERENCES

  • Accadia, C., S. Mariani, M. Casaioli, A. Lavagnini, and A. Speranza, 2003: Sensitivity of precipitation forecast skill scores to bilinear interpolation and a simple nearest-neighbor average method on high-resolution verification grids. Wea. Forecasting, 18, 918–932.
  • Bao, L., T. Gneiting, E. P. Grimit, P. Guttorp, and A. E. Raftery, 2010: Bias correction and Bayesian model averaging for ensemble forecasts of surface wind direction. Mon. Wea. Rev., 138, 1811–1821.
  • Barkmeijer, J., F. Bouttier, and M. Van Gijzen, 1998: Singular vectors and estimates of the analysis-error covariance metric. Quart. J. Roy. Meteor. Soc., 124, 1695–1713.
  • Barkmeijer, J., R. Buizza, and T. N. Palmer, 1999: 3D-Var Hessian singular vectors and their potential use in the ECMWF ensemble prediction system. Quart. J. Roy. Meteor. Soc., 125, 2333–2351.
  • Berner, J., G. J. Shutts, M. Leutbecher, and T. N. Palmer, 2009: A spectral stochastic kinetic energy backscatter scheme and its impact on flow-dependent predictability in the ECMWF ensemble prediction system. J. Atmos. Sci., 66, 603–626.
  • Berner, J., S.-Y. Ha, J. P. Hacker, A. Fournier, and C. Snyder, 2011: Model uncertainty in a mesoscale ensemble prediction system: Stochastic versus multiphysics representations. Mon. Wea. Rev., 139, 1972–1995.
  • Bougeault, P., and Coauthors, 2010: The THORPEX Interactive Grand Global Ensemble. Bull. Amer. Meteor. Soc., 91, 1059–1072.
  • Bowler, N. E., A. Arribas, K. R. Mylne, K. B. Robertson, and S. E. Beare, 2008: The MOGREPS short-range ensemble prediction system. Quart. J. Roy. Meteor. Soc., 134, 703–722.
  • Bowler, N. E., A. Arribas, S. E. Beare, K. R. Mylne, and G. J. Shutts, 2009: The local ETKF and SKEB: Upgrades to the MOGREPS short-range ensemble prediction system. Quart. J. Roy. Meteor. Soc., 135, 767–776.
  • Buizza, R., and T. N. Palmer, 1995: The singular-vector structure of the atmospheric global circulation. J. Atmos. Sci., 52, 1434–1456.
  • Buizza, R., M. Miller, and T. N. Palmer, 1999: Stochastic representation of model uncertainties in the ECMWF ensemble prediction system. Quart. J. Roy. Meteor. Soc., 125, 2887–2908.
  • Buizza, R., J.-R. Bidlot, N. Wedi, M. Fuentes, M. Hamrud, G. Holt, and F. Vitart, 2007: The new ECMWF VAREPS (Variable Resolution Ensemble Prediction System). Quart. J. Roy. Meteor. Soc., 133, 681–695.
  • Candille, G., 2009: The multiensemble approach: The NAEFS example. Mon. Wea. Rev., 137, 1655–1665.
  • Charron, M., G. Pellerin, L. Spacek, P. L. Houtekamer, N. Gagnon, H. L. Mitchell, and L. Michelin, 2010: Toward random sampling of model error in the Canadian ensemble prediction system. Mon. Wea. Rev., 138, 1877–1901.
  • Dee, D. P., and Coauthors, 2011: The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Quart. J. Roy. Meteor. Soc., 137, 553–597.
  • Doblas-Reyes, F. J., R. Hagedorn, and T. N. Palmer, 2005: The rationale behind the success of multi-model ensembles in seasonal forecasting – II. Calibration and combination. Tellus, 57A, 234–252.
  • Glahn, B., M. Peroutka, J. Wiedenfeld, J. Wagner, G. Zylstra, B. Schuknecht, and B. Jackson, 2009: MOS uncertainty estimates in an ensemble framework. Mon. Wea. Rev., 137, 246–268.
  • Gneiting, T., A. E. Raftery, A. H. Westveld, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098–1118.
  • Hagedorn, R., 2008: Using the ECMWF reforecast data set to calibrate EPS reforecasts. ECMWF Newsletter, No. 117, ECMWF, Reading, United Kingdom, 8–13.
  • Hagedorn, R., F. J. Doblas-Reyes, and T. N. Palmer, 2005: The rationale behind the success of multi-model ensembles in seasonal forecasting – I. Basic concept. Tellus, 57A, 219–233.
  • Hagedorn, R., T. M. Hamill, and J. S. Whitaker, 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part I: Two-meter temperatures. Mon. Wea. Rev., 136, 2608–2619.
  • Hagedorn, R., R. Buizza, T. M. Hamill, M. Leutbecher, and T. N. Palmer, 2012: Comparing TIGGE multi-model forecasts with reforecast-calibrated ECMWF ensemble forecasts. Quart. J. Roy. Meteor. Soc., in press.
  • Hamill, T. M., 1999: Hypothesis tests for evaluating numerical precipitation forecasts. Wea. Forecasting, 14, 155–167.
  • Hamill, T. M., and J. Juras, 2006: Measuring forecast skill: Is it real skill or is it the varying climatology? Quart. J. Roy. Meteor. Soc., 132, 2905–2923.
  • Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: Theory and application. Mon. Wea. Rev., 134, 3209–3229.
  • Hamill, T. M., and J. S. Whitaker, 2007: Ensemble calibration of 500-hPa geopotential height and 850-hPa and 2-m temperatures using reforecasts. Mon. Wea. Rev., 135, 3273–3280.
  • Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble reforecasting: Improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434–1447.
  • Hamill, T. M., J. S. Whitaker, and S. L. Mullen, 2006: Reforecasts: An important dataset for improving weather predictions. Bull. Amer. Meteor. Soc., 87, 33–46.
  • Hamill, T. M., R. Hagedorn, and J. S. Whitaker, 2008: Probabilistic forecast calibration using ECMWF and GFS ensemble reforecasts. Part II: Precipitation. Mon. Wea. Rev., 136, 2620–2632.
  • Hamill, T. M., J. S. Whitaker, M. Fiorino, and S. G. Benjamin, 2011: Global ensemble predictions of 2009’s tropical cyclones initialized with an ensemble Kalman filter. Mon. Wea. Rev., 139, 668–688.
  • Higgins, R. W., J. E. Janowiak, and Y.-P. Yao, 1996: A gridded hourly precipitation data base for the United States (1963–1993). NCEP/Climate Prediction Center Atlas 1, U.S. Department of Commerce, NOAA/NWS, 47 pp.
  • Hou, D., Z. Toth, Y. Zhu, and W. Yang, 2008: Impact of a stochastic perturbation scheme on NCEP Global Ensemble Forecast System. Proc. 19th Conf. on Probability and Statistics, New Orleans, LA, Amer. Meteor. Soc., 1.1. [Available online at http://ams.confex.com/ams/pdfpapers/134165.pdf.]
  • Houtekamer, P. L., and H. L. Mitchell, 2005: Ensemble Kalman filtering. Quart. J. Roy. Meteor. Soc., 131, 3269–3289.
  • Houtekamer, P. L., H. L. Mitchell, and X. Deng, 2009: Model error representation in an operational ensemble Kalman filter. Mon. Wea. Rev., 137, 2126–2143.
  • Hunt, B., E. Kostelich, and I. Szunyogh, 2006: Efficient data assimilation for spatiotemporal chaos: A local ensemble transform Kalman filter. Physica D, 230, 112–126.
  • Isaksen, L., M. Bonavita, R. Buizza, M. Fisher, J. Haseler, M. Leutbecher, and L. Raynaud, 2010: Ensemble of data assimilations at ECMWF. ECMWF Tech. Memo. 636, 46 pp.
  • Iversen, T., A. Deckmyn, C. Santos, K. A. I. Sattler, J. B. Bremnes, H. Feddersen, and I.-L. Frogner, 2011: Evaluation of ‘GLAMEPS’—A proposed multimodel EPS for short range forecasting. Tellus, 63A, 513–530.
  • Johnson, C., and R. Swinbank, 2009: Medium-range multimodel ensemble combination and calibration. Quart. J. Roy. Meteor. Soc., 135, 777–794.
  • Kleist, D. T., D. F. Parrish, J. C. Derber, R. Treadon, W.-S. Wu, and S. Lord, 2009: Introduction of the GSI into the NCEP Global Data Assimilation System. Wea. Forecasting, 24, 1691–1705.
  • Krishnamurti, T. N., C. M. Kishtawal, Z. Zhang, T. LaRow, D. Bachiochi, E. Williford, S. Gadgil, and S. Surendran, 2000: Multimodel ensemble forecasts for weather and seasonal climate. J. Climate, 13, 4196–4216.
  • Leutbecher, M., 2005: On ensemble prediction using singular vectors started from forecasts. Mon. Wea. Rev., 133, 3038–3046.
  • Lin, Y., and K. E. Mitchell, 2005: The NCEP stage II/IV hourly precipitation analyses: Development and applications. Preprints, 19th Conf. on Hydrology, San Diego, CA, Amer. Meteor. Soc., 1.2. [Available online at http://ams.confex.com/ams/Annual2005/webprogram/Paper83847.html.]
  • Lorenz, J., H. Rauhut, F. Schweitzer, and D. Helbing, 2011: How social influence can undermine the wisdom of crowd effect. Proc. Natl. Acad. Sci. USA, 108, 920–925.
  • Lowry, D. A., and H. R. Glahn, 1976: An operational model for forecasting probability of precipitation—PEATMOS PoP. Mon. Wea. Rev., 104, 221–232.
  • Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF Ensemble Prediction System: Methodology and validation. Quart. J. Roy. Meteor. Soc., 122, 73–119.
  • Mylne, K. R., R. E. Evans, and R. T. Clark, 2002: Multi-model multi-analysis ensembles in quasi-operational medium-range forecasting. Quart. J. Roy. Meteor. Soc., 128, 361–384.
  • Palmer, T. N., R. Buizza, F. J. Doblas-Reyes, T. Jung, M. Leutbecher, G. J. Shutts, M. Steinheimer, and A. Weisheimer, 2009: Stochastic parametrization and model uncertainty. ECMWF Tech. Memo. 598, 42 pp.
  • Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 1155–1174.
  • Rawlins, F., S. P. Ballard, K. J. Bovis, A. M. Clayton, D. Li, G. W. Inverarity, A. C. Lorenc, and T. J. Payne, 2007: The Met Office global four-dimensional variational data assimilation scheme. Quart. J. Roy. Meteor. Soc., 133, 347–362.
  • Roulin, E., and S. Vannitsem, 2012: Postprocessing of ensemble precipitation predictions with extended logistic regression based on hindcasts. Mon. Wea. Rev., 140, 874–888.
  • Schmeits, M. J., and K. J. Kok, 2010: A comparison between raw ensemble output, (modified) Bayesian model averaging, and extended logistic regression using ECMWF ensemble precipitation reforecasts. Mon. Wea. Rev., 138, 4199–4211.
  • Shutts, G., 2005: A kinetic energy backscatter algorithm for use in ensemble prediction systems. Quart. J. Roy. Meteor. Soc., 131, 3079–3102.
  • Sloughter, J. M. L., A. E. Raftery, T. Gneiting, and C. Fraley, 2007: Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Mon. Wea. Rev., 135, 3209–3220.
  • Stensrud, D. J., and N. Yussouf, 2003: Short-range ensemble predictions of 2-m temperature and dewpoint temperature over New England. Mon. Wea. Rev., 131, 2510–2524.
  • Tennant, W. J., G. J. Shutts, A. Arribas, and S. A. Thompson, 2011: Using a stochastic kinetic energy backscatter scheme to improve MOGREPS probabilistic forecast skill. Mon. Wea. Rev., 139, 1190–1206.
  • Vannitsem, S., and C. Nicolis, 2008: Dynamical properties of model output statistics forecasts. Mon. Wea. Rev., 136, 405–419.
  • Vislocky, R. L., and J. M. Fritsch, 1995: Improved model output statistics forecasts through model consensus. Bull. Amer. Meteor. Soc., 76, 1157–1164.
  • Wandishin, M. S., S. L. Mullen, D. J. Stensrud, and H. E. Brooks, 2001: Evaluation of a short-range multimodel ensemble system. Mon. Wea. Rev., 129, 729–747.
  • Wei, M., Z. Toth, R. Wobus, and Y. Zhu, 2008: Initial perturbations based on the ensemble transform (ET) technique in the NCEP global operational forecast system. Tellus, 60A, 62–79.
  • Weigel, A. P., M. A. Liniger, and C. Appenzeller, 2008: Can multi-model combination really enhance the prediction skill of probabilistic ensemble forecasts? Quart. J. Roy. Meteor. Soc., 134, 241–260.
  • Whitaker, J. S., X. Wei, and F. Vitart, 2006: Improving week-2 forecasts with multimodel reforecast ensembles. Mon. Wea. Rev., 134, 2279–2284.
  • Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. Academic Press, 627 pp.
  • Wilks, D. S., 2009: Extending logistic regression to provide full-probability-distribution MOS forecasts. Meteor. Appl., 16, 361–368.
  • Wilks, D. S., and T. M. Hamill, 2007: Comparison of ensemble-MOS methods using GFS reforecasts. Mon. Wea. Rev., 135, 2379–2390.
  • Wilson, L. J., S. Beauregard, A. E. Raftery, and R. Verret, 2007: Calibrated surface temperature forecasts from the Canadian ensemble prediction system using Bayesian model averaging. Mon. Wea. Rev., 135, 1364–1385.
  • Yussouf, N., and D. J. Stensrud, 2007: Bias-corrected short-range ensemble forecasts of near-surface variables during the 2005/06 cool season. Wea. Forecasting, 22, 1274–1286.
