## 1. Introduction

Ensemble forecasts of precipitation provide information on the predictability^{1} by a numerical model given the sources of error considered in the generation of the ensemble. When averaged over many cases, the ensemble mean (ENM), or the arithmetic average of gridded precipitation fields from the ensemble members, is more skillful than any of the members as verified against observations. A major reason is that the averaging operation filters out features that the ensemble members fail to agree on (Warner 2011, p. 254), which tend to be more poorly predicted than the features on which the ensemble members agree.^{2} A similar approach to increase forecasting skill is used in nowcasting. For radar-based nowcasts, the loss of predictive skill with forecast time is scale dependent, with scales smaller than 10 km losing predictability after about 1 h. As a consequence, in some nowcasting systems, nonpredictable scales are increasingly filtered out with advancing lead time (Germann and Zawadzki 2003, 2004; Seed 2003) to maintain the skill for longer time periods.

In model precipitation forecasts, the dependence of forecasting skill on spatial scale has also been documented, with the small scales being less predictable (Demaria et al. 2011; Gilleland et al. 2009; Roberts and Lean 2008). It is therefore reasonable to postulate that the filtering of unpredictable modeled features through ensemble averaging would likewise occur at small scales. In this paper, we will analyze the statistical properties of the modeled precipitation fields by comparing the ENM and the individual ensemble members of a Storm-Scale Ensemble Forecasting (SSEF) system to determine the exact range of scales affected by the averaging process. Information on the scales at which unpredictable features are removed from the ensemble members provides insight on the scale dependence of predictability by numerical models. This analysis is independent of, but complementary to the evaluation of model output against observations.

In general, the ENM depicts only the most certain components of the forecasts. As a result, for precipitation, the ENM forecasts would not predict the highest intensities associated with the nonpredictable small scales. The information on intensities is contained in another useful statistic of the ensemble, namely the mean probability density function (PDF) of rainfall values. For every point in space and for every forecast time, the ensemble gives a PDF of rainfall intensities. Ebert (2001) proposed to combine the PDF from the ensemble members with the ENM by recalibrating the intensities in the ENM in an empirical manner, similar to the calibration of radar precipitation fields with the PDF of rain rates from gauges (Calheiros and Zawadzki 1987). This probability matching (PM) procedure^{3} leads to a new ensemble mean, the probability matched mean (PMM), which appears more realistic and can have better skill as evaluated by traditional metrics [e.g., the equitable threat score (ETS); Wilks 1995]. The latter point raises intriguing questions: Since the ENM was already the best forecast an ensemble prediction system could produce, what is the nature of the improvement of the PMM? What is the effect of PM on the filtering achieved through averaging? Is there a relation between the two? These questions are of interest, especially as the PM procedure has been used extensively for the correction of the precipitation forecasts from the ensemble mean in the context of model evaluation studies (Berenguer et al. 2012; Chien and Jou 2004; Clark et al. 2012; Gallus 2010; Snook et al. 2012). We will show here that although the PMM has the same PDF of rainfall intensities as the ensemble members, the spectral structure of its precipitation fields differs from that of the ensemble members. It is this difference that, in some cases, gives the PMM better skill in terms of traditional scores.

The paper is organized as follows. Section 2 offers a brief description of the data. The effect of ensemble averaging through the investigation of the statistical properties of precipitation fields from the ENM is presented in section 3. Section 4 examines the effects of the PM procedure on both the statistical properties of rainfall fields and on quantitative precipitation forecast (QPF) skill, while section 5 presents the conclusions.

## 2. Data

This paper uses the same forecast and verification dataset as Berenguer et al. (2012).

Precipitation forecasts of hourly rainfall accumulations from an SSEF system are analyzed and evaluated with respect to radar-derived precipitation estimates. The SSEF system was developed at the Center for Analysis and Prediction of Storms (CAPS) and was run during the 2008 National Oceanic and Atmospheric Administration (NOAA) Hazardous Weather Testbed (HWT) Spring Experiment (Xue et al. 2008; Kong et al. 2008). It uses the Advanced Research version of the Weather Research and Forecasting Model (ARW-WRF; Skamarock et al. 2008), version 2.2, and consists of 10 members with different physical schemes, data assimilation options, and perturbed initial and lateral boundary conditions (IC/LBC). Here, 30-h forecasts on a 4-km grid were initialized at 0000 UTC almost every day from April to June 2008. The background initial conditions (ICs) are interpolated from the North American Mesoscale Model (NAM; Janjic 2003) 12-km analysis, and IC perturbations for perturbed members are obtained from the operational Short-Range Ensemble Forecast (SREF; Du et al. 2006) system of the National Centers for Environmental Prediction (NCEP) Environmental Modeling Center (EMC). Convective-scale observational information is introduced into the ICs of nine of the members by assimilating level-II radial velocity and reflectivity data from individual Weather Surveillance Radar-1988 Dopplers (WSR-88Ds) and data from surface station networks. Radar data are treated with the CAPS processing package, which includes quality control and the averaging of radar observations from their native coordinates onto the 4-km model grid (superobbing). The data are then assimilated with a three-dimensional variational data assimilation (3D-Var) cloud analysis system (Gao et al. 2004; Brewster et al. 2005; Hu et al. 2006a,b) within the Advanced Regional Prediction System (ARPS; Xue et al. 2003).
Two of the members (control members C0 and CN) do not have SREF-based IC/LBC perturbations, have identical model configurations, and use interpolated NAM analyses as background for ICs. However, convective-scale observations from radar and surface stations are assimilated only within CN.

In this article, we analyze deterministic forecasts from the radar-data assimilating ensemble members, from the ENM, and from the PMM. The PMM is generated as proposed by Ebert (2001) by imposing the frequency distribution of rainfall intensities from the nine ensemble members on the ENM fields; the probability-matching algorithm was applied over the domain of Fig. 1. In this way, over the domain on which it is computed, the PMM has the distribution of rainfall intensities predicted by the nine ensemble members. This procedure assumes that the correct location of rainfall is well depicted by the ENM and that the ensemble members give the correct frequency distribution of rainfall intensities. In reality, this assumption does not hold, as the ensemble members have a high-intensity bias (Berenguer et al. 2012), which is common for 4-km grid-spacing convection-allowing models (Clark et al. 2009). Therefore, the resulting PMM will also have a high-intensity bias relative to observations. This bias could be eliminated through postprocessing as done by Clark et al. (2009), but as the scope of this paper is the exploration of the statistical properties of precipitation fields from the ensemble, it is preferable to avoid corrections that would modify these statistical properties.
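As a concrete illustration, the PM reassignment can be sketched as follows. This is a minimal, hypothetical rendering of Ebert's (2001) idea (the function name and the subsampling of the pooled values are ours), not the CAPS implementation:

```python
import numpy as np

def probability_matched_mean(members, enm):
    """Keep the spatial pattern of the ensemble mean (ENM) but impose the
    pooled distribution of rainfall intensities from the ensemble members.

    members: (n_members, ny, nx) hourly rainfall fields
    enm:     (ny, nx) ensemble-mean field, e.g. members.mean(axis=0)
    """
    n = members.shape[0]
    # Pool and rank all member values, keeping every n-th value so the
    # pooled sample has exactly as many values as the ENM field.
    pooled = np.sort(members.ravel())[::-1][::n]
    # Rank the ENM grid points from wettest to driest; the wettest point
    # receives the largest pooled intensity, and so on down the ranking.
    order = np.argsort(enm.ravel())[::-1]
    flat = np.empty(enm.size)
    flat[order] = pooled
    return flat.reshape(enm.shape)
```

By construction, the result has the members' pooled intensity distribution (subsampled) and the ENM's rainfall placement, which is exactly the pair of assumptions discussed above.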

The evaluation is done with respect to rainfall maps derived from the U.S. radar reflectivity mosaics at 2.5-km altitude generated by the National Severe Storms Laboratory (NSSL; Zhang et al. 2005). A threshold of 15 dB*Z* is used to discriminate between raining and nonraining areas for all computations, and maps of reflectivity *Z* are converted into rain rate *R* according to *Z* = 300*R*^{1.5}. Instantaneous rainfall intensity maps every 15 min have been averaged to obtain maps of hourly accumulated rainfall. An example of hourly rainfall maps corresponding to the radar observations, the nine ensemble members, the ENM, and the PMM is shown in Fig. 1.
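The reflectivity-to-rain-rate conversion above can be written compactly; a minimal sketch (the function name is ours):

```python
import numpy as np

def dbz_to_rainrate(dbz, a=300.0, b=1.5, threshold_dbz=15.0):
    """Convert reflectivity (dBZ) to rain rate R (mm/h) via Z = a * R**b,
    zeroing grid points below the rain/no-rain threshold."""
    z = 10.0 ** (np.asarray(dbz) / 10.0)   # dBZ -> linear Z (mm^6 m^-3)
    r = (z / a) ** (1.0 / b)               # invert Z = a * R**b
    return np.where(np.asarray(dbz) >= threshold_dbz, r, 0.0)
```

With these constants, 15 dBZ maps to roughly 0.22 mm h^{−1}, consistent with the 0.2 mm h^{−1} equivalence used in the appendix.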

## 3. The filtering effect of ensemble averaging of precipitation fields

### a. Quantifying the statistical properties of precipitation fields from the ensemble

Figure 2 shows the average power spectra of the hourly precipitation fields from the ensemble members and from the ENM for forecast hours 3, 10, and 20 (Figs. 2a–c), with the power plotted on a logarithmic *y* axis. To better illustrate the difference in power spectra as a function of scale, we compute and plot the power ratio with respect to the ENM (Figs. 2d–f). For a particular member *i*, the power ratio at scale *S* is defined as PR_{i}(*S*) = *P*_{i}(*S*)/*P*_{ENM}(*S*), where *P*_{i}(*S*) is the power of member *i* at that scale, and *P*_{ENM}(*S*) is the power of the ENM. The ratio is approximately constant for scales smaller than a cutoff scale *S*_{0} (determined subjectively—black star in Figs. 2d–f), and then it decreases following a power law. The cutoff scale *S*_{0} increases with forecast lead time *t*, being on average around 30 km for *t* = 3 h, around 80 km for *t* = 10 h, and around 200 km for *t* = 20 h. The meaning of this behavior is explored next.
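The scale-by-scale power ratio can be computed from a two-dimensional FFT of the rainfall fields; the sketch below is our assumption about the estimator (isotropic binning of a 2D FFT power spectrum), not necessarily the exact method used for Fig. 2:

```python
import numpy as np

def radial_power_spectrum(field, nbins=16):
    """Radially averaged 2D power spectrum of a gridded field."""
    ny, nx = field.shape
    power = np.abs(np.fft.fftshift(np.fft.fft2(field))) ** 2
    ky = np.fft.fftshift(np.fft.fftfreq(ny))
    kx = np.fft.fftshift(np.fft.fftfreq(nx))
    kr = np.hypot(*np.meshgrid(ky, kx, indexing="ij"))  # radial wavenumber
    edges = np.linspace(0.0, kr.max(), nbins + 1)
    idx = np.digitize(kr.ravel(), edges) - 1            # bin of each cell
    return np.array([power.ravel()[idx == i].mean() if np.any(idx == i)
                     else np.nan for i in range(nbins)])

def power_ratio(member, enm, nbins=16):
    """Per-scale ratio of a member's power to the ENM's power."""
    return radial_power_spectrum(member, nbins) / radial_power_spectrum(enm, nbins)
```

A field identical to the reference up to a constant factor gives a flat ratio equal to that factor at every scale, which is the diagnostic used throughout this section.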

### b. The significance of the power spectra differences between the ENM and the ensemble members

By definition, the integral below the power spectrum curve is a measure of the total variance of the field, while the slope of the power spectrum indicates how the total variance is distributed across scales. Therefore, two identical fields will have both the same total variance and the same slope, and hence a power ratio of 1 at all scales. Reducing the total variance of the field of Fig. 3a by a constant factor, as in Fig. 3d, will change the amplitude of the power spectrum but not its slope; in this case, the power ratio would be different from 1 but constant with scale (black and green lines are parallel in Fig. 3e). When a smoother (i.e., a Haar-wavelet low-pass filter with a cutoff scale of 128 km) is applied to the field of Fig. 3a (Fig. 3c), the resulting power spectrum has the same amplitude at the unaffected large scales, but a different slope, as the variability at scales smaller than the smoother cutoff is gradually eliminated (black and blue lines in Fig. 3e). Now, by comparing the power spectrum of the ENM in Fig. 3 (purple line) with the power spectra of the other three fields, we infer that for this particular set of ensemble forecasts, the following holds:

- For scales larger than a cutoff scale *S*_{0}, the averaging acts as a smoother, giving more relative importance to larger scales and gradually removing variability at smaller scales.
- For scales smaller than *S*_{0}, the averaging only affects the total variance at those scales, leaving the distribution of variance with scale unchanged.

Furthermore, the value of the ratio for scales smaller than *S*_{0} is around 9, the number of ensemble members. Averaging *N* decorrelated rainfall fields *R*_{i} with similar variances and statistical properties would result in a mean field with a power spectrum having the same slope as *R*_{i}, but 1/*N* of the variance. This implies that the constant power ratio for scales smaller than *S*_{0} is due to a complete decorrelation between the ensemble members at these scales.
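The 1/*N* scaling can be checked numerically with synthetic fields; white noise stands in here for the decorrelated members (a sketch, not model data):

```python
import numpy as np

# Synthetic check of the 1/N argument: average N uncorrelated fields with
# equal variance and compare a member's variance with the mean's variance.
rng = np.random.default_rng(1)
n = 9                                       # "ensemble size"
members = rng.normal(size=(n, 256, 256))    # decorrelated synthetic fields
enm = members.mean(axis=0)                  # "ensemble mean"

# Var(mean of N uncorrelated fields) = Var(member)/N, so the member/ENM
# power ratio is ~N at every scale, with the spectral slope unchanged.
ratio = members[0].var() / enm.var()
```

For nine uncorrelated fields the ratio comes out close to 9, matching the constant power ratio observed below *S*_{0}.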

### c. Implications for mesoscale predictability by NWP ensembles

We have shown so far that the ENM fields do not have the same statistical properties as the rainfall fields corresponding to the ensemble members, because of the filtering effect of ensemble averaging. For scales larger than a cutoff scale *S*_{0}, the averaging acts as a smoother, giving more importance to large scales and removing variance at smaller scales. However, for scales smaller than *S*_{0}, the scale distribution of variance is no longer affected because of the lack of agreement between the members. According to these results, for the precipitation forecasts from this SSEF system, the ensemble members become decorrelated after 3 h for scales smaller than 30 km, after 10 h for scales smaller than 80 km, and after 20 h for scales smaller than 200 km. Our conclusions are based solely on the analysis of the ensemble, and not on any comparison with observations. By evaluating precipitation forecasts with respect to observations, Clark et al. (2011) found a similar SSEF system to be underdispersive (i.e., to have insufficient spread), especially for lead times of 6–18 h. They suggested that the underdispersion might be caused by the lack of appropriate small-scale perturbations, which are the fastest growing, since the current perturbations are obtained from a coarse-resolution ensemble. It would be interesting to relate their conclusions to our results, mainly to determine whether the complete decorrelation of the ensemble members at scales smaller than *S*_{0} signifies that the ensemble has sufficient spread at these scales; but a characterization of ensemble spread as a function of spatial scale is not attempted herein. However, we can state that the perturbations used in this SSEF are sufficient to cause the decorrelation of the ensemble members at small scales. This ensemble has both mixed-physics and IC/LBC perturbations, thus making it impossible to separate the effect of each perturbation strategy on the correlation between the members.
In addition, while increasing the number of ensemble members should generally have a positive effect on ensemble spread, it is not clear that having more decorrelated members would add any helpful information at scales smaller than *S*_{0}.

An interesting question is to relate the filtering effect of ensemble averaging to the actual skill of the ensemble in predicting rainfall. We have shown that the ensemble members are fully decorrelated for scales smaller than *S*_{0}, while they are increasingly correlated with increasing scale. Does this imply that for scales smaller than *S*_{0} the ensemble has no skill in forecasting precipitation? This question is explored in the next section.

## 4. Implications for the PMM forecasts

### a. The statistical properties of precipitation fields from the PMM

In the previous section we showed that for precipitation forecasts, ensemble averaging acts as a filter whose characteristics depend on the differences between the ensemble members. This filtering results in the ENM fields having lower rainfall values than the individual members. The high intensities are generally associated with small scales, which are poorly predicted and hence removed through averaging. The PM correction consists of replacing the PDF of rainfall values from the ENM with the PDF of the ensemble. However, the ENM PDF does not contain complete information about the distribution of variance with scale, as the distribution is quite different between the ENM and the members. Therefore, there is no reason to believe that matching the PDF of the ENM fields to that of the ensemble members will correct the distribution of variance with scale or provide the proper distribution of intensity in space. Furthermore, if there is little agreement between the members on the location of high rainfall intensities, but more agreement about the medium rainfall intensities, then the PM could wrongly redistribute the highest values of the PDF. However, a thorough inspection of the rainfall maps and a comparison of the rainfall values between the ensemble members and the PMM have shown that this does not occur often for our dataset (not shown).

To investigate the effect of the PM on the statistical structure of the rainfall fields from the ENM, Fig. 4 shows the average power spectra (Figs. 4a–c) and the average power ratios with respect to the PMM (Figs. 4d–f) for the ensemble members for forecast hours 3, 10, and 20. Figures 4a–c show that the power spectra of the PMM resemble more closely the power spectra of the members as forecast lead time increases (the blue line is closer to the black line with increasing lead time). Again, the power ratio plots better emphasize the differences in the distribution of variance with scale between the ensemble members and the PMM (Figs. 4d–f). For scales smaller than *S*_{0}, the ratio decreases with forecast lead time, but the ensemble members generally have more variability than the PMM at these scales. However, the probability matching does seem to affect the variance more at the smallest resolvable scales. For scales larger than *S*_{0}, the ratio decreases with scale, becoming less than 1 for scales larger than about 300 km. Therefore, the small-scale precipitation features, which are usually poorly predicted by any of the members, are less evident in the PMM fields—the PMM has less variance than the members at those scales. Conversely, features at scales larger than 300 km, which are generally better predicted by all members, are emphasized in the PMM fields (i.e., the power ratios with respect to the PMM are smaller than 1). The effect of these differences in the statistical structure of precipitation between the PMM and the members on QPF skill is explored next.

### b. Intercomparison of QPF skill between the PMM and the ensemble members

It was shown by Berenguer et al. (2012) that, on average for this dataset, the PMM showed better skill than three of the ensemble members when evaluated in terms of the critical success index (CSI; Wilks 1995) and of the correlation coefficient (Zawadzki 1973). This improvement in skill could be due to the smoothing present in the PMM. Here, to investigate the effect of the statistical properties of the PMM fields on forecasting skill, we further compare the PMM to the nine radar-data-assimilating members using several verification metrics: the ETS (Wilks 1995), the correlation coefficient, and the normalized root-mean-square error (NRMSE). The ETS is based on contingency tables and is quite sensitive to the structure of the field, such that smoother fields usually have better ETS. Here, the ETS is computed for three thresholds (0.2, 1, and 2 mm h^{−1}), accumulated over the 24 cases, and presented as a function of forecast lead time (Figs. 5a–c) for the ensemble members (gray lines), the ENM (red line), and the PMM (blue line). The statistical significance of the differences in the scores between the PMM and both the ensemble members and the ENM was determined using the resampling methodology of Hamill (1999). Consistent with previous results (Chien and Jou 2004), the PMM has significantly better (at a 95% confidence level) ETS than all the ensemble members for thresholds of 0.2 and 1 mm h^{−1}, while for a threshold of 2 mm h^{−1} statistical significance is not reached at some of the times because of the small sample size. The PMM has better ETS than the ENM only for the 0.2 mm h^{−1} threshold, which might be due to the ENM having the largest coverage bias for this threshold.
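The ETS itself is the standard contingency-table score; a minimal sketch for a single threshold (not the verification code used in the study):

```python
import numpy as np

def ets(forecast, observed, threshold):
    """Equitable threat score for exceedances of `threshold`."""
    f = forecast >= threshold
    o = observed >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    total = f.size
    # Hits expected by random chance, given forecast and observed coverage.
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denom = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denom if denom > 0 else 0.0
```

A perfect forecast scores 1, and a forecast no better than chance scores 0; because the score is computed on thresholded fields, smoothing away poorly placed small-scale maxima can raise it, which is the sensitivity discussed above.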

To compare the QPF skill of the precipitation forecasts at different scales, Figs. 5d–i present the correlation coefficient and the NRMSE averaged over the entire dataset and as a function of spatial scale for lead times of 3, 10, and 20 h.

The NRMSE is defined as NRMSE = ⟨(*F* − *O*)^{2}⟩^{1/2}/⟨*O*^{2}⟩^{1/2}, where *F* and *O* represent the forecast and observed 2D precipitation maps, respectively, and the angle brackets denote averaging over the verification domain. The perfect value of this score is 0, while values of 1 and more represent a completely skill-less forecast.^{4} The correlation coefficient was calculated here without subtracting the mean, as in Germann and Zawadzki (2002), and a perfect forecast would have a correlation coefficient of 1. Both the NRMSE and the correlation coefficient are presented as a function of spatial scale, being computed for precipitation fields that have been bandpass filtered using the Haar wavelet transform as in Turner et al. (2004), and can be considered what Gilleland et al. (2009) classify as scale-separation metrics. These metrics are complementary: the NRMSE drastically penalizes high-intensity errors, while the correlation coefficient favors fields with similar patterns, regardless of the variability within the pattern. In terms of the correlation coefficient, the PMM performs significantly better than most members at all scales at short forecast lead times (less than 6 h). With increasing forecast time, the better performance of the PMM is maintained only at scales larger than 100 km. Conversely, the ENM always has a better correlation coefficient than the PMM. Regarding the NRMSE results, the PMM is better than the ensemble members at small scales. With increasing spatial scale and forecast lead time, the differences in scores become less statistically significant, while the ENM always outperforms the PMM. As seen in the previous subsection, the difference in the statistical properties of precipitation between the PMM and the ensemble members is manifested mostly at small scales, while at scales larger than *S*_{0} the power ratios with respect to the PMM tend toward 1. It appears that when the PMM has better scores than the ensemble members, it is because the PMM fields have less variance at unpredictable scales than the ensemble members. In other words, the PMM has better skill than the individual members because it does not contain information at small scales, and not because the information at those scales is more correct. A similarly skillful forecast could be obtained by removing information at the less predictable scales from the control member CN using a low-pass filter. However, because of the intermittent nature of precipitation accumulation fields, carrying out such an experiment is difficult in practice. An attempt at undertaking such an experiment is presented in the appendix.
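The two scale-separation metrics can be sketched as follows. The NRMSE normalization shown here (by the observed root-mean-square value) is our assumption, chosen to reproduce the stated properties of the score (0 for a perfect forecast, about 1 for a zero forecast), and the correlation omits mean subtraction as in Germann and Zawadzki (2002):

```python
import numpy as np

def nrmse(f, o):
    """RMS error normalized by the observed RMS value (assumed form):
    0 for a perfect forecast, 1 for forecasting zero everywhere."""
    return np.sqrt(np.mean((f - o) ** 2)) / np.sqrt(np.mean(o ** 2))

def corr_no_mean(f, o):
    """Correlation computed without subtracting the mean
    (Germann and Zawadzki 2002); 1 for a perfect forecast."""
    return np.sum(f * o) / np.sqrt(np.sum(f ** 2) * np.sum(o ** 2))
```

Applied to bandpass-filtered fields, these give the per-scale curves of Figs. 5d–i.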

## 5. Conclusions

The analysis presented herein characterized the filtering effect of ensemble averaging of precipitation forecasts. Through the spectral analysis of ENM precipitation fields from the SSEF system run during the 2008 NOAA HWT Spring Experiment, we found that for scales larger than a cutoff scale *S*_{0}, the ENM fields behaved as a smoothed version of the fields corresponding to the ensemble members. At scales smaller than *S*_{0}, the total amount of variance for the ENM fields was much smaller than for the ensemble members, while the distribution of variance with scale was equivalent between the ensemble members and the ENM. Also, *S*_{0} was found to vary with forecast time, being on average about 30 km at *t* = 3 h, 80 km at *t* = 10 h, and 200 km at *t* = 20 h, without varying much from case to case. By comparing the power spectrum of the ENM with the power spectra of modified CN fields, we have suggested that the spatial distribution of variance with scale for scales smaller than *S*_{0} for the ENM fields is a consequence of the complete decorrelation of the ensemble members at scales smaller than *S*_{0}. Adding more decorrelated ensemble members would result in a complete removal of information at scales smaller than *S*_{0} in the ENM fields. Future research efforts should focus on determining the relative contribution of different sources of errors to the decorrelation of the ensemble members, and on improving the depiction of forecast uncertainty in high-resolution ensembles.

The paper also discussed the consequences of this finding for the PM correction of the ENM. Since being proposed by Ebert (2001), the probability-matching correction of the ensemble mean of precipitation forecasts has been frequently used to obtain a more successful alternative for deterministic ensemble QPF. This procedure produces forecasts that look more realistic than the ENM forecasts, and that have been reported to give better verification results than forecasts from the individual members. However, the reason for this better performance has not been clearly stated before, and it is the authors' concern that the success of the PMM might be misinterpreted and overstated. For this reason, it was the authors' intention to show quantitatively that the PMM performs better than the ensemble members because, like the ENM, it filters details at scales unpredictable by the individual members. While the probability-matching correction results in a PMM forecast with a PDF of rainfall values as correct as the PDF of the ensemble members, the distribution of variance with scale for the PMM rainfall fields is not equivalent to that of the members. Moreover, it is this difference in the distribution of variance with scale (i.e., less variability at small scales for the PMM fields than for the ensemble members) that causes the PMM forecasts to be more successful. However, the spatial distribution of high-intensity values in the PMM is not more correct than for the ensemble members. The better ETS at low thresholds for the PMM forecasts than for the forecasts from the individual members is due not to a more correct representation of the small scales, but to the filtering of information at those scales. It is well accepted that ensemble averaging acts as a nonlinear filter, and filtering of any kind affects the variance, and hence the values, of the original field. This is the reason why a field with the statistical structure of the ENM cannot have the same PDF as the original members. Despite the lack of a theoretical basis for the probability-matching correction, the PMM could be useful in some cases as a visually pleasing, global representation of the ensemble forecasts. However, it should be used with caution in quantitative applications. In addition, our quantitative evaluation with respect to radar-derived rainfall has shown that the PMM outperforms the ENM only in terms of ETS for a threshold of 0.2 mm h^{−1}. Therefore, in our case, the PMM does not seem any more suitable than the ENM for forecasting intense rainfall correctly, even though the PMM fields show high rainfall intensities, while the ENM fields do not.

In this paper, we have only investigated deterministic forecasts from the ensemble, highlighting their shortcomings and advising that they should be used with caution. However, more useful information could be extracted from the ensemble by producing probabilistic forecasts. For example, such forecasts could be used instead of the PMM for providing heavy-rain warnings. Furthermore, the findings of our study could help in generating appropriate probabilistic forecasts. However, both the generation and the evaluation of probabilistic forecasts are deemed too extensive to be addressed here, and are left for future endeavors. We do, however, speculate that given the decorrelation between the members at scales smaller than *S*_{0}, probabilistic forecasts that include these scales would be misleading. At the same time, filtering all information at scales smaller than *S*_{0} would make it impossible to provide probabilistic guidance for scales less than 100 km more than 10 h in advance. This result is in good agreement with many studies of the predictability of precipitation at convective scales (Hohenegger and Schar 2007a,b; Radhakrishna et al. 2012; Zhang et al. 2002, 2006a,b).

Finally, we emphasize that the methodology used to produce Fig. 2 offers a precise, unambiguous definition of a predictability scale *S*_{0}. As shown here, *S*_{0} clearly separates the scales over which the ensemble members are totally decorrelated from the scales over which they are only partially decorrelated. Identifying the sources of errors that cause the complete decorrelation of the ensemble members is crucial, as it is unlikely that additional members would be more correlated, and as having more decorrelated members would not provide any additional meaningful information at scales smaller than *S*_{0}. Thus, this analysis is relevant for the characterization of the scale dependence of the mesoscale predictability of precipitation systems by NWP.

## ACKNOWLEDGMENTS

We are greatly indebted to Ming Xue and Fanyou Kong from CAPS for providing us the ensemble precipitation forecasts. The CAPS SSEF forecasts were produced mainly under the support of a grant from the NOAA CSTAR program, and the 2008 ensemble forecasts were produced at the Pittsburgh Supercomputer Center. Kevin Thomas, Jidong Gao, Keith Brewster, and Yunheng Wang of CAPS made significant contributions to the forecasting efforts. M. Surcel acknowledges the support received from the Fonds de Recherche du Québec–Nature et Technologies (FRQNT) in the form of a graduate scholarship. This work was also funded by the Natural Science and Engineering Research Council of Canada (NSERC) and Hydro-Quebec through the IRC program. Dr. Aitor Atencia is acknowledged for fruitful discussion on the meaning of the variability of the power spectrum ratio with scale. The paper greatly benefited from peer review, and the authors are especially thankful to Adam Clark for his thorough comments and corrections.

# APPENDIX

## The Effect of Modifying the Power Spectrum of a Precipitation Field on QPF Skill

We show here that modifying the power spectrum of a precipitation field from CN such that the distribution of variance with scale is the same as for the PMM results in a more skillful modified CN forecast.

Modifying the power spectrum of a precipitation field from CN with that of a field corresponding to the PMM is done similarly to Radhakrishna et al. (2013). The two precipitation fields are decomposed in the Fourier domain by applying a two-dimensional fast Fourier transform (FFT), resulting in a 2D amplitude and a 2D phase field for each of CN and the PMM. Then, the amplitude field corresponding to CN is replaced with the amplitude field corresponding to the PMM, while the phase field is kept constant, and the inverse FFT is applied to obtain a modified CN field. This modified CN field has the same power spectrum as the PMM field, but the same spatial distribution of precipitation as the original CN field. While theoretically this procedure is valid, in practice, because of the intermittency of precipitation fields, it results in an unrealistic modified CN field when applied to hourly rainfall accumulation fields. Radhakrishna et al. (2013) had applied this methodology to logarithmic reflectivity fields, which are smoother and follow a Gaussian distribution. Therefore, in Fig. A1 we show an example of the effect of modifying the power spectrum of a simulated logarithmic reflectivity field from CN on QPF skill. Simulated reflectivity maps at every hour were available for all the ensemble members as part of the 2008 Spring Experiment dataset. The ENM for the reflectivity forecasts was obtained by first converting the reflectivity fields from all the members into rainfall rate according to a standard *Z–R* relationship, averaging, and converting the mean field back into logarithmic reflectivity units. The PM procedure was applied on the logarithmic reflectivity fields. A Hanning window is applied to the fields prior to applying the FFT in order to comply with the periodicity assumption of the Fourier transform. Even for the logarithmic reflectivity fields, this procedure has as an artifact the loss of total variance in the modified CN field. Therefore, for a fair comparison, a modified PMM field is evaluated.
The modified PMM field is constructed by scaling the power spectrum of the original PMM field by the amount of variance lost in the modified CN field. Figures A1a–e show logarithmic reflectivity maps corresponding to the radar observations, CN, the modified CN, the PMM, and the modified PMM at 1800 UTC 7 May 2008 (18-h lead time). The power ratios with respect to the PMM are shown in Fig. A1f. For the simulated reflectivity fields, the power ratio between CN and the PMM at small scales is in fact smaller than 1, which is consistent with the noisy appearance of the maps (cf. the light blue and the purple lines). However, for scales on the order of 100 km the PMM has more variance than CN, as in the case of hourly rainfall accumulation maps (Fig. 4). The loss of power in the modified CN fields is also indicated by the difference between the blue and light blue lines, but the modified CN field has the same distribution of variance with scale as the PMM. To show the effect of this power spectrum adjustment on QPF skill, the ETS as a function of lead time for a 15-dB*Z* (equivalent to 0.2 mm h^{−1}) threshold is shown in Fig. A1g. The loss of power does not affect the ETS, as indicated by the differences between the scores of the PMM and of the modified PMM (light blue and orange lines). However, adjusting the power spectrum of the CN fields to match that of the PMM increases the ETS, as shown by the difference between the blue and the other lines.
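The amplitude-substitution step can be sketched as follows (a minimal illustration of the procedure described above; the Hanning windowing and the variance rescaling of the modified PMM are omitted):

```python
import numpy as np

def swap_amplitude(cn, pmm):
    """Give the CN field the PMM's Fourier amplitudes while keeping CN's
    phases, i.e., the PMM's distribution of variance with scale but CN's
    spatial placement of features."""
    phase_cn = np.angle(np.fft.fft2(cn))    # 2D phase of CN
    amp_pmm = np.abs(np.fft.fft2(pmm))      # 2D amplitude of PMM
    modified = np.fft.ifft2(amp_pmm * np.exp(1j * phase_cn))
    # For real inputs the recombined spectrum is Hermitian, so any
    # imaginary residue is round-off only.
    return np.real(modified)
```

Swapping a field's amplitudes with its own leaves it unchanged, and the output's power spectrum equals the PMM's by construction, which is the property exploited in Figs. A1–A2.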

We have repeated this procedure for every available forecast, and the ETS cumulated over all cases as a function of lead time for a 15-dB*Z* threshold is shown in Fig. A2 for the CN, the PMM, the modified CN, and the modified PMM. It should be noted that in some cases the CN was in fact more skillful than the PMM, and those cases were also included in the average. From this figure, we can conclude that the better skill of the PMM for logarithmic reflectivity fields is due to a filtering of unpredictable features at scales between 30 km and a few hundred kilometers (meso-*β* scales). Simply modifying the power spectrum of the control member CN to account for this filtering results in an improvement in QPF skill.
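The ETS used in these comparisons is the standard contingency-table score for threshold exceedance. A self-contained sketch (not the authors' verification code) for a single forecast/observation pair is:

```python
import numpy as np

def ets(forecast, observed, threshold):
    """Equitable threat score at an exceedance threshold (e.g., 15 dBZ).

    ETS = (hits - hits_random) / (hits + misses + false_alarms - hits_random),
    where hits_random is the number of hits expected by chance.
    """
    f = forecast >= threshold
    o = observed >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    n = f.size
    hits_random = (hits + misses) * (hits + false_alarms) / n
    denom = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denom if denom != 0 else np.nan
```

A perfect, nondegenerate forecast gives ETS = 1, and a forecast with no skill beyond chance gives ETS = 0; cumulating the score over many cases, as in Fig. A2, pools the contingency-table counts before computing the ratio.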

## REFERENCES

Berenguer, M., M. Surcel, I. Zawadzki, M. Xue, and F. Kong, 2012: The diurnal cycle of precipitation from continental radar mosaics and numerical weather prediction models. Part II: Intercomparison among numerical models and with nowcasting. *Mon. Wea. Rev.*, **140**, 2689–2705.

Brewster, K., M. Hu, M. Xue, and J. Gao, 2005: Efficient assimilation of radar data at high resolution for short-range numerical weather prediction. Preprints, *WWRP Int. Symp. on Nowcasting and Very Short Range Forecasting*, Whistler, BC, Canada, WMO, 3.06. [Available online at http://twister.ou.edu/papers/BrewsterWWRP_Nowcasting.pdf.]

Calheiros, R. V., and I. Zawadzki, 1987: Reflectivity rain rate relationships for radar hydrology in Brazil. *J. Climate Appl. Meteor.*, **26**, 118–132.

Chien, F.-C., and B. J.-D. Jou, 2004: MM5 ensemble mean precipitation forecasts in the Taiwan area for three early summer convective (mei-yu) seasons. *Wea. Forecasting*, **19**, 735–750.

Clark, A. J., and Coauthors, 2011: Probabilistic precipitation forecast skill as a function of ensemble size and spatial scale in a convection-allowing ensemble. *Mon. Wea. Rev.*, **139**, 1410–1418.

Clark, A. J., and Coauthors, 2012: An overview of the 2010 Hazardous Weather Testbed experimental forecast program spring experiment. *Bull. Amer. Meteor. Soc.*, **93**, 55–74.

Clark, A. J., W. A. Gallus, M. Xue, and F. Y. Kong, 2009: A comparison of precipitation forecast skill between small convection-allowing and large convection-parameterizing ensembles. *Wea. Forecasting*, **24**, 1121–1140.

Demaria, E. M. C., D. A. Rodriguez, E. E. Ebert, P. Salio, F. Su, and J. B. Valdes, 2011: Evaluation of mesoscale convective systems in South America using multiple satellite products and an object-based approach. *J. Geophys. Res.*, **116**, D08103, doi:10.1029/2010JD015157.

Denis, B., J. Cote, and R. Laprise, 2002: Spectral decomposition of two-dimensional atmospheric fields on limited-area domains using the discrete cosine transform (DCT). *Mon. Wea. Rev.*, **130**, 1812–1829.

Du, J., J. McQueen, G. DiMego, Z. Toth, D. Jovic, B. Zhou, and H. Chuang, 2006: New dimension of NCEP Short-Range Ensemble Forecasting (SREF) system: Inclusion of WRF members. Preprints, *WMO Expert Team Meeting on Ensemble Prediction Systems*, Exeter, United Kingdom, WMO, 5 pp. [Available online at http://www.emc.ncep.noaa.gov/mmb/SREF/WMO06_full.pdf.]

Ebert, E. E., 2001: Ability of a poor man’s ensemble to predict the probability and distribution of precipitation. *Mon. Wea. Rev.*, **129**, 2461–2480.

Gallus, W. A., 2010: Application of object-based verification techniques to ensemble precipitation forecasts. *Wea. Forecasting*, **25**, 144–158.

Gao, J., M. Xue, K. Brewster, and K. K. Droegemeier, 2004: A three-dimensional variational data analysis method with recursive filter for Doppler radars. *J. Atmos. Oceanic Technol.*, **21**, 457–469.

Germann, U., and I. Zawadzki, 2002: Scale-dependence of the predictability of precipitation from continental radar images. Part I: Description of the methodology. *Mon. Wea. Rev.*, **130**, 2859–2873.

Germann, U., and I. Zawadzki, 2003: Predictability of precipitation as a function of scale from large-scale radar composites. Preprints, *31st Conf. on Radar Meteorology*, Seattle, WA, Amer. Meteor. Soc., 4B.8. [Available online at https://ams.confex.com/ams/32BC31R5C/techprogram/paper_64495.htm.]

Germann, U., and I. Zawadzki, 2004: Scale dependence of the predictability of precipitation from continental radar images. Part II: Probability forecasts. *J. Appl. Meteor.*, **43**, 74–89.

Gilleland, E., D. Ahijevych, B. G. Brown, B. Casati, and E. E. Ebert, 2009: Intercomparison of spatial forecast verification methods. *Wea. Forecasting*, **24**, 1416–1430.

Hamill, T. M., 1999: Hypothesis tests for evaluating numerical precipitation forecasts. *Wea. Forecasting*, **14**, 155–167.

Hohenegger, C., and C. Schar, 2007a: Atmospheric predictability at synoptic versus cloud-resolving scales. *Bull. Amer. Meteor. Soc.*, **88**, 1783–1793.

Hohenegger, C., and C. Schar, 2007b: Predictability and error growth dynamics in cloud-resolving models. *J. Atmos. Sci.*, **64**, 4467–4478.

Hu, M., M. Xue, and K. Brewster, 2006a: 3DVAR and cloud analysis with WSR-88D level-II data for the prediction of Fort Worth tornadic thunderstorms. Part I: Cloud analysis and its impact. *Mon. Wea. Rev.*, **134**, 675–698.

Hu, M., M. Xue, J. Gao, and K. Brewster, 2006b: 3DVAR and cloud analysis with WSR-88D level-II data for the prediction of Fort Worth tornadic thunderstorms. Part II: Impact of radial velocity analysis via 3DVAR. *Mon. Wea. Rev.*, **134**, 699–721.

Janjic, Z. I., 2003: A nonhydrostatic model based on a new approach. *Meteor. Atmos. Phys.*, **82**, 271–285.

Kong, F., and Coauthors, 2008: Real-time storm-scale ensemble forecast 2008 spring experiment. Preprints, *24th Conf. on Severe Local Storms*, Savannah, GA, Amer. Meteor. Soc., 12.3. [Available online at https://ams.confex.com/ams/24SLS/techprogram/paper_141827.htm.]

Lorenz, E. N., 1969: Predictability of a flow which possesses many scales of motion. *Tellus*, **21**, 289–307.

Lorenz, E. N., 1996: Predictability—A problem partly solved. *Proc. Seminar on Predictability*, Vol. I, Reading, United Kingdom, ECMWF, 1–19.

Radhakrishna, B., I. Zawadzki, and F. Fabry, 2012: Predictability of precipitation from continental radar images. Part V: Growth and decay. *J. Atmos. Sci.*, **69**, 3336–3349.

Radhakrishna, B., I. Zawadzki, and F. Fabry, 2013: Postprocessing model-predicted rainfall fields in the spectral domain using phase information from radar observations. *J. Atmos. Sci.*, **70**, 1145–1159.

Roberts, N. M., and H. W. Lean, 2008: Scale-selective verification of rainfall accumulations from high-resolution forecasts of convective events. *Mon. Wea. Rev.*, **136**, 78–97.

Seed, A. W., 2003: A dynamic and spatial scaling approach to advection forecasting. *J. Appl. Meteor.*, **42**, 381–388.

Skamarock, W. C., and Coauthors, 2008: A description of the Advanced Research WRF version 2. NCAR Tech. Note NCAR/TN-475+STR, 113 pp. [Available online at http://www.mmm.ucar.edu/wrf/users/docs/arw_v3.pdf.]

Snook, N., M. Xue, and Y. Jung, 2012: Ensemble probabilistic forecasts of a tornadic mesoscale convective system from ensemble Kalman filter analyses using WSR-88D and CASA radar data. *Mon. Wea. Rev.*, **140**, 2126–2146.

Turner, B. J., I. Zawadzki, and U. Germann, 2004: Predictability of precipitation from continental radar images. Part III: Operational nowcasting implementation (MAPLE). *J. Appl. Meteor.*, **43**, 231–248.

Warner, T. T., 2011: *Numerical Weather and Climate Prediction.* Cambridge University Press, 526 pp.

Weusthoff, T., D. Leuenberger, C. Keil, and G. C. Craig, 2011: Best member selection for convective-scale ensembles. *Meteor. Z.*, **20**, 153–164.

Wilks, D. S., 1995: *Statistical Methods in the Atmospheric Sciences: An Introduction.* International Geophysics Series, Vol. 59, Academic Press, 467 pp.

Xue, M., D.-H. Wang, J.-D. Gao, K. Brewster, and K. K. Droegemeier, 2003: The Advanced Regional Prediction System (ARPS), storm-scale numerical weather prediction and data assimilation. *Meteor. Atmos. Phys.*, **82**, 139–170.

Xue, M., and Coauthors, 2008: CAPS realtime storm-scale ensemble and high-resolution forecasts as part of the NOAA Hazardous Weather Testbed 2008 Spring Experiment. Preprints, *24th Conf. on Severe Local Storms*, Savannah, GA, Amer. Meteor. Soc., 12.2. [Available online at https://ams.confex.com/ams/24SLS/techprogram/paper_142036.htm.]

Zawadzki, I., 1973: Statistical properties of precipitation patterns. *J. Appl. Meteor.*, **12**, 459–472.

Zhang, F. Q., C. Snyder, and R. Rotunno, 2002: Mesoscale predictability of the “surprise” snowstorm of 24–25 January 2000. *Mon. Wea. Rev.*, **130**, 1617–1632.

Zhang, F. Q., A. M. Odins, and J. W. Nielsen-Gammon, 2006a: Mesoscale predictability of an extreme warm-season precipitation event. *Wea. Forecasting*, **21**, 149–166.

Zhang, F. Q., N. Bei, C. C. Epifanio, R. Rotunno, and C. Snyder, 2006b: A multistage error-growth conceptual model for mesoscale predictability. *Bull. Amer. Meteor. Soc.*, **87**, 287–288.

Zhang, J., K. Howard, and J. J. Gourley, 2005: Constructing three-dimensional multiple-radar reflectivity mosaics: Examples of convective storms and stratiform rain echoes. *J. Atmos. Oceanic Technol.*, **22**, 30–42.

^{1}

In this paper, predictability will refer to practical predictability as defined by Lorenz (1996), as this study focuses on precipitation forecasts from an ensemble that is based on an imperfect model and that includes perturbations to both the initial and boundary conditions and the model physics.

^{2}

Ensemble members can be more skillful than the ensemble mean for individual cases, but this behavior is not systematic, as shown by Weusthoff et al. (2011), so when averaged over many cases, the ensemble mean is best.

^{3}

The probability matching procedure for the ENM correction was developed by Ebert (2001) and is clearly described therein. Briefly, for a given rainfall map, the rainfall values (including zeroes) from all *n* ensemble members are pooled and sorted in increasing order, keeping only every *n*th value. Similarly, the rainfall values from the ENM are sorted in increasing order, but the spatial location of each point is stored along with its value. Then, each ranked value in the ENM field is replaced with the correspondingly ranked value from the ensemble PDF, starting with the largest value. As the ENM has a high coverage bias with respect to the individual members, many of the smallest values in the ENM field are replaced by 0s, thus eliminating the bias.
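The steps of this footnote can be sketched compactly in NumPy. This is a minimal illustration of the Ebert (2001) procedure as described above, with a hypothetical function name:

```python
import numpy as np

def probability_match(enm, members):
    """Probability matching of the ensemble mean (Ebert 2001).

    enm     : 2D ensemble-mean rain field with m grid points.
    members : array of shape (n, ny, nx) holding the n member fields.

    The ENM's ranked values are replaced by the ranked values of the
    pooled ensemble distribution, preserving the ENM's spatial pattern.
    """
    n = members.shape[0]
    # Pool and sort all n*m member values; keep every nth value (m values).
    pooled = np.sort(members.ravel())[n - 1 :: n]
    # Rank the ENM values while remembering their spatial locations.
    order = np.argsort(enm.ravel())
    # Replace each ranked ENM value with the equally ranked pooled value.
    matched = np.empty(enm.size)
    matched[order] = pooled
    return matched.reshape(enm.shape)
```

The output keeps the rain/no-rain pattern ranking of the ENM but takes its value distribution from the ensemble, so the smallest ENM values are mapped onto the pooled zeroes, removing the coverage bias.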

^{4}

NRMSE can have values larger than 1 if the fields have both positive and negative values, which is the case for bandpass-filtered precipitation fields.