## 1. Introduction

Uncertainty in the forecast from a numerical weather prediction (NWP) model arises from errors in the observations initializing the model and from imperfections in the model itself. This uncertainty in the NWP forecast can be modeled by a distribution specifying the probability that the verifying observation will differ from the predicted value because of the aforementioned errors. An ensemble prediction system (EPS) samples this distribution by integrating the NWP model from perturbed initial conditions as well as introducing perturbations into the model. [Discussions of the strategies for obtaining sufficiently “representative” such samples can be found in Buizza et al. (2005), Buizza and Palmer (1995), Molteni et al. (1996), Lefaivre et al. (1997), Pellerin et al. (2003), and Toth and Kalnay (1993, 1997).]

Attempts are sometimes made to improve upon deterministic forecasts by recourse to the mean of the ensemble member forecasts, which filters out noise in the NWP solutions entering into the perturbed members randomly, noise that therefore largely cancels out on taking the mean over the ensemble (Palmer 1993; Wilks 2006). The dispersion of the ensemble sample affords an objective measure of the uncertainty in the forecast, be it the ensemble mean or the corresponding deterministic solution (the *spread*–*skill* relationship) (Kalnay and Dalcher 1987; Buizza 1997; Whitaker and Loughe 1998; Toth et al. 2001).

But this employs only the first two moments of the distribution. Moreover, any potential advantage enjoyed by the ensemble mean over deterministic forecasts is posited to disappear when a regime change in the atmospheric state space is encountered (Palmer 1993; Wilks 2006). EPS output can be more fully exploited by computing probabilistic forecasts (Anderson 1996; Toth et al. 1998; Atger 1999a; Toth et al. 2001). The simplest way of producing these, which is tantamount to using the *empirical* cumulative distribution function^{1} (cdf) (Dalgaard 2002; Rohatgi 1976) for the sample constituted by the member forecasts, consists of determining the fraction of members forecasting the pertinent event (Toth et al. 1998; Richardson 2001; Mullen and Buizza 2001; Ebert 2001). EPSs can suffer from their own systematic biases, as reflected in rank histograms that are skewed as well as overdispersive or, more commonly, underdispersive (Feddersen and Andersen 2005; Buizza et al. 2005; Hamill and Colucci 1997). Numerous approaches have been implemented or proposed to correct these problems, yielding calibrated ensemble forecasts (Zhu et al. 1996; Hamill and Colucci 1997, 1998; Eckel and Walters 1998; Krzysztofowicz 2002; Atger 2003; Coelho et al. 2004; Raftery et al. 2005; Fortin et al. 2006; Wilson et al. 2007).

Yet even the probabilistic forecasts from a perfectly calibrated EPS can yield unrealistically large differences in the probabilities of proximal events, which is symptomatic of the granularity in discrete distributions describing continuous random variables. This granularity becomes increasingly apparent as ensemble size decreases (Richardson 2001). The Canadian EPS, comprising 17 members in its current configuration, produces probabilistic forecasts to a resolution no better than 1/17 ≈ 0.059.
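For concreteness, the fraction-of-members forecast and its 1/17 quantization can be sketched as follows (the member values below are invented for illustration, not real EPS output):

```python
# Empirical probabilistic forecast: the fraction of ensemble members
# forecasting the event (here, exceedance of a threshold).
def empirical_prob_exceed(members, threshold):
    """P(X > threshold) under the empirical cdf of the member forecasts."""
    return sum(1 for m in members if m > threshold) / len(members)

# Seventeen illustrative member QPFs (mm).
members = [0.0, 0.0, 0.2, 0.5, 0.5, 1.1, 1.3, 2.0, 2.4,
           3.0, 3.1, 4.2, 5.0, 6.6, 8.1, 9.9, 12.4]

p = empirical_prob_exceed(members, 5.0)
# p can take only the 18 values 0/17, 1/17, ..., 17/17 -- a resolution of
# about 0.059, no matter how close the threshold sits to the member values.
```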

Of particular interest in the evaluation of precipitation forecasts of the Canadian EPS in Peel and Wilson (2008, hereafter PW) was the ability of the system to accurately forecast significant events, whence a focus on the uppermost decile of the distribution. This interest in extreme events motivated an investigation into the construction of continuous fits to the ensemble sample in order to better model the tails of the distributions. Such fits have already been investigated. Hamill and Colucci (1997, 1998) fit Gumbel distributions to ensembles of 24-h precipitation forecasts [obtained from 10 integrations of the Eta Model (Black 1994; Rogers et al. 1995) and five integrations of the Regional Spectral Model (Juang and Kanamitsu 1994)] in order to model the tails of the sample distributions. Ensemble samples were also fit to gamma distributions in Hamill and Colucci (1998).

Gamma probability density functions (pdfs) do afford a broad spectrum of shapes, figuring prominently in precipitation climatologies (Thom 1958; Wilks 2006; von Storch and Zwiers 1995), but are unimodal. Possible bifurcations in the solutions of models for the highly nonlinear atmosphere could translate into multimodal distributions of ensemble forecasts (Atger 1999b; Stephenson and Doblas-Reyes 1994; Wilks 2002). Multinormal pdf’s were fit to 500-hPa geopotential height forecasts from the European Centre for Medium-Range Weather Forecasts (ECMWF) EPS in Stephenson and Doblas-Reyes (1994), which investigated the nonnormality of the ensemble sample.

Wilks (2002) fit multivariate Gaussian mixture models to temperature, wind, and cloud-cover forecasts from the ECMWF EPS. Depending on the value of a chi-squared statistic computed from the ensemble sample, a single Gaussian density, or a linear combination of two such densities, was fit to the ensemble. Utilization of Gaussian pdf’s (defined on the entire real line ℝ) in the construction of probabilistic models for forecast temperature is straightforward. Wind speed, whose support is confined to the nonnegative real axis, was mapped onto ℝ by applying the logarithm. The support for the cloud-cover fraction is [0, 1], which was mapped onto ℝ by the function consisting of the composition of the inverse Gaussian cdf with a Beta cdf fitted to the ensemble forecasts. Analogous treatment for forecasts of precipitation accumulation was proposed but not carried out. The mixture models were fitted to simulated ensembles of varying sizes and verified against corresponding analyses interpolated to five English cities. For example, it was found that the accuracy of the smoothed probability models obtained from 20-member ensembles forecasting the second percentile of wind chill forecasts (derived from joint distributions for temperature and wind speed) was competitive with that of the raw models obtained from 51 members. The difference in performance between the raw and smoothed models diminished with increasing ensemble size.

Wilks’s models are reminiscent of Gaussian kernel density estimators (KDEs), which are linear combinations of Gaussian pdf’s. Kernel density estimation is a *nonparametric* method of approximating the pdf of a random variable from a sample of its realizations (Silverman 1986; Wand and Jones 1995)—nonparametric in that no functional dependence (e.g., Gaussian, gamma, beta, etc.) is assumed for the pdf, whose form is determined solely by the sample data points and their associated kernels. Gaussian kernels are not well suited to modeling pdf’s for precipitation amount, whose support is nonnegative and whose pdf is often singular at the origin. The gamma kernels prescribed in Chen (2000) seemed to be better candidates for modeling pdf’s of precipitation forecasts, but the central challenge in constructing any KDE lies in finding the ideal smoothing bandwidth, which is generally not as straightforward for gamma kernels as it is for Gaussians. This paper reports our findings in attempting to model ensemble precipitation forecasts using gamma KDEs, including the search for the best bandwidth.

The paper is organized as follows. Section 2 is a brief description of the Canadian EPS, including specification of the verification sample. Section 3 supplies an overview of kernel density estimation, touching on theoretical and computational considerations peculiar to the gamma kernel, in particular the judicious selection of the smoothing bandwidth. An experiment constructing KDE models on samples from known distributions is outlined in section 4, along with the tabulation of the results. The outcome of the verification against real data is summarized in section 5. Finally, the results are discussed and some conclusions drawn in section 6.

## 2. The Canadian ensemble prediction system

Prior to the summer of 2007 the Spectral Finite Element (SEF) model (Ritchie and Beaudoin 1994) was used to integrate the control for the Canadian EPS. Eight perturbed analyses were obtained from the control by adding random perturbations generated from known error distributions of pertinent observations, using the eigenvectors of the vertical covariance for vertically correlated measurements such as upper-air soundings. The mean of these eight perturbed analyses was then subtracted from the higher-resolution 3D variational analysis (Gauthier et al. 1999) used operationally for the deterministic NWP model, and varying fractions of this difference were added to the perturbed analyses to generate a total of 16 analyses with which to initialize the integration of the perturbed members. Half of these perturbed members were integrated with the SEF model, while the other eight were obtained from the Global Environmental Multiscale (GEM) model (Côté et al. 1998), which produced the operational deterministic forecasts at the Canadian Meteorological Center (CMC). In addition to two different dynamical NWP models, different parameterization schemes accounting for such subgrid-scale phenomena as deep and shallow convection were employed. Perturbations to boundary fields, including sea surface temperature, albedo, and roughness length, were also applied (Houtekamer et al. 1996; Pellerin et al. 2003).

The Canadian EPS underwent no changes between the beginning of August 2001 and the end of November 2004. This pause in its evolution was exploited to secure for analysis a sample of forecasts from what was deemed a tolerably stationary forecast system. (Further details on the choice of this verification sample can be found in PW.) Probabilistic forecasts resulting from KDEs fitted to the EPS forecasts in this sample were compared against the raw empirical model obtained as the fraction of ensemble members forecasting the event. These events consisted of threshold exceedances, the thresholds defined by long-term climate percentiles of 90%–99% in the case of the verification sample aggregated over the 36 stations plotted (see Fig. 1), or physical thresholds ranging from 1 to 5 mm for verification at individual stations.

The observational record was obtained by summing the 6-hourly precipitation amounts in the synoptic reports. Screening and processing of the observational data is discussed at greater length in PW. The member forecasts were interpolated to the locations of the stations, probability models fit to the resulting 17-element samples, and probabilities of threshold exceedance computed. The long-term climatology spanned 1 January 1972 through the end of July 2001, providing a record of almost 30 yr. Percentiles were determined for each station for a warm season bounded by the Julian days of 121 and 300 inclusive, and a cool season comprising the remainder of the year. This procedure was followed to avoid the distortions, discussed in Hamill and Juras (2006), in the metrics used to compare probabilistic models.

## 3. Modeling the PDF

### a. Kernel density estimation

Centered histograms, defined by

*f̂*(*x*; *h*) = (1/*n*) Σ_{i=1}^{n} *u*_{xi}(*x*; *h*),  (1)

where *h* is the smoothing bandwidth, *n* is the sample size, and *u*_{xi}(*x*; *h*) is the unit step function^{2} of width *h* centered at the sample point *x*_{i}, are plotted in Fig. 3 for the sample {0, 1, 4, 4, 5, 9, 9, 10, 10, 10, 10, 13, 13, 13}. Use of such functions in KDEs was first suggested in Fix and Hodges (1951) (Silverman and Jones 1989). While Rosenblatt (1956) proposed other functions in place of the step functions constituting centered histograms, their replacement with smooth functions was first actually effected in Parzen (1962).
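As a concrete sketch of the centered-histogram construction (using the sample plotted in Fig. 3, and assuming the step has unit area, i.e., height 1/*h*, so that the estimate integrates to one):

```python
def u_step(x, xi, h):
    """Unit-area step of width h centered at xi (height 1/h inside)."""
    return 1.0 / h if abs(x - xi) <= h / 2.0 else 0.0

def centered_histogram(x, sample, h):
    """Density estimate: the average of the steps centered at the sample points."""
    return sum(u_step(x, xi, h) for xi in sample) / len(sample)

sample = [0, 1, 4, 4, 5, 9, 9, 10, 10, 10, 10, 13, 13, 13]
# Four of the 14 members sit at 10, so with h = 1.0 the estimate there is 4/14.
d = centered_histogram(10.0, sample, 1.0)
```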

Kernel density estimation is in widespread service modeling pdf’s (Silverman 1986; Wand and Jones 1995), including applications to meteorology (Wilks 2006; Brooks et al. 2003). Approximation to the pdf is achieved through the linear superposition of functions referred to as *kernels,* which are almost invariably pdf’s, and which are associated with each sample point. Gaussians are the most popular choice of kernel, but density estimators using symmetric kernels forecast nonnegligible probabilities of negative precipitation, and suffer from boundary bias near the origin (Silverman 1986; Wand and Jones 1995; Simonoff 1996).

### b. Gamma kernels

Gamma kernel density estimators (Chen 2000) are defined by

*f̂*_{Γ}(*x*; *h*) = (1/*n*) Σ_{i=1}^{n} *f*_{γ}(*x*; *x*_{i}/*h* + 1, *h*),  (2)

where *f*_{γ}(*x*; *α*, *β*) is the gamma density function with shape *α* and scale *β* (Rohatgi 1976; Larson 1982; Thom 1958), the *x*_{i} are the sample data points, and *h* is the smoothing bandwidth. In Fig. 3, gamma kernel density estimates have been superimposed onto the centered histograms of the same bandwidths. Note that the gamma kernels disperse the probability distribution much more widely over the support of the random variable, in contrast to the centered histograms, which concentrate the distributions about the sample points. Smoothing resulting from increasing the bandwidth is evident upon comparison of the KDE in Fig. 3a, for which the bandwidth is 0.4, with the KDE plotted in Fig. 3b for the bandwidth 1.0. In particular, the mode in the interval (4, 5) for the KDE in Fig. 3a has been replaced with a slight shoulder in the KDE with the larger bandwidth plotted in Fig. 3b. The concept of a KDE as a smoothed, centered histogram is manifest in Fig. 4a, which shows the KDE obtained from a bandwidth of 0.01, with a sharp mode at each point of the sample, contrasted against the much smoother representation in Fig. 4b resulting for a bandwidth of 1.0.

Strictly speaking, gamma kernels are not kernels at all, because their functional dependence upon the dummy variable and the sample points is asymmetric. The kernels become increasingly asymmetric and narrower as the data point, which coincides with the mode of the associated kernel, approaches the origin. Because the kernels are themselves pdf’s and therefore normalized to unity, the narrowing widths of the kernels approaching the origin dictate a compensatory increase in the height of the kernel at its mode. Kernels corresponding to a data point at the origin are gamma pdf’s with a shape of 1, monotonically decreasing on [0, ∞) with a spike at the origin that is inversely proportional to the smoothing bandwidth. As is the case with any kernel estimator, increasing the bandwidth increases the width of the kernels, which therefore exert a more global influence on the KDE, resulting in a smoother approximation to the pdf.
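A minimal sketch of a gamma KDE consistent with the properties just described — each kernel takes shape *x*_{i}/*h* + 1 and scale *h* (Chen 2000), so its mode falls on the data point; function names are ours:

```python
import numpy as np
from scipy.stats import gamma

def gamma_kde_pdf(x, sample, h):
    """Gamma KDE: average of gamma pdf's with shape xi/h + 1 and scale h,
    so each kernel's mode coincides with its data point xi."""
    return sum(gamma.pdf(x, a=xi / h + 1.0, scale=h) for xi in sample) / len(sample)

sample = [0.0, 1.0, 4.0, 4.0, 5.0]
h = 0.5

# A data point at the origin contributes a shape-1 (exponential) kernel,
# whose spike at the origin is 1/h.
spike = gamma_kde_pdf(0.0, [0.0], h)

# Each kernel has mean (xi/h + 1)*h = xi + h, so the KDE mean is the
# sample mean plus the bandwidth; check by numerical integration.
xs = np.linspace(0.0, 60.0, 600001)
dx = xs[1] - xs[0]
kde_mean = float(np.sum(xs * gamma_kde_pdf(xs, sample, h)) * dx)
```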

### c. Bandwidth selection

The normal-scale bandwidth (Silverman 1986)

*h*_{opt} = (4/3)^{1/5}*σn*^{−1/5} ≈ 1.06*σn*^{−1/5},  (4)

where *σ* is the standard deviation of the population and *n* is the sample size, minimizes the asymptotic mean integrated square error in the Gaussian KDE, provided that the distribution being approximated is not too highly skewed. But distributions of quantitative precipitation forecasts (QPFs) typically exhibit marked positive skewness, in which case *h*_{opt} will be too large (Silverman 1986; Wand and Jones 1995).

Two cross-validation approaches to bandwidth selection were examined. *Likelihood cross validation* proceeds by removing a single point, *x*_{i}, from the original sample of EPS forecasts and constructing the gamma density estimator *f̂*_{−i} on the points remaining in what is termed the *jackknife sample* (Efron and Tibshirani 1994):

*f̂*_{−i}(*x*; *h*) = [1/(*n* − 1)] Σ_{j≠i} *f*_{γ}(*x*; *x*_{j}/*h* + 1, *h*).  (5)

Evaluating *f̂*_{−i} at the independent data point *x*_{i} gives a likelihood *f̂*_{−i}(*x*_{i}), or log-likelihood log *f̂*_{−i}(*x*_{i}). Treating each sample point in turn as independent and averaging over their contributions to the log-likelihood yields the *score function* CV(*h*) proposed in Duin (1976):

CV(*h*) = (1/*n*) Σ_{i=1}^{n} log *f̂*_{−i}(*x*_{i}),  (6)

the likelihood cross-validation bandwidth bw_{llcv} being the value of *h* maximizing CV(*h*).

The second approach is *least squares cross validation*. This method seeks to minimize the integrated square error ∫(*f̂* − *f*)^{2} in the estimator *f̂*, which is shown to be equivalent (Silverman 1986) to minimizing the function *M*_{0}(*h*) defined by

*M*_{0}(*h*) = ∫*f̂*^{2} − (2/*n*) Σ_{i=1}^{n} *f̂*_{−i}(*x*_{i}),  (7)

the minimizing value of *h* being the least squares cross-validation bandwidth bw_{lscv}.

To effect the optimizations required for the above cross-validation techniques, we had recourse to the module *fmin* in the *Numerical Methods and Software* (NMS) package of routines in Kahaner et al. (1989), maintained at the Information Technology Laboratory (ITL) repository of the National Institute of Standards and Technology. The algorithm combines successive parabolic interpolation with a golden section search, and is a slight modification of an algorithm published in Brent (1973). In addition to the function whose optimal value is sought, the algorithm requires two inputs: a tolerance specifying the smallest separation between points in the function domain treated as distinct, and the interval over which the optimal value is sought. (We used [bw_{0}/20, 5bw_{0}].)
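A sketch of the likelihood cross-validation score CV(*h*) and its bounded optimization over [bw_{0}/20, 5bw_{0}]; we substitute SciPy's bounded scalar minimizer (which likewise combines parabolic interpolation with golden-section steps) for the NMS *fmin* routine, and the sample values are invented:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import gamma

def gamma_kde_pdf(x, sample, h):
    """Gamma KDE with kernel shape xi/h + 1 and scale h (mode at xi)."""
    return sum(gamma.pdf(x, a=xi / h + 1.0, scale=h) for xi in sample) / len(sample)

def cv_score(h, sample):
    """CV(h): mean leave-one-out (jackknife) log-likelihood, per Duin (1976)."""
    n = len(sample)
    return sum(np.log(gamma_kde_pdf(xi, sample[:i] + sample[i + 1:], h))
               for i, xi in enumerate(sample)) / n

sample = [0.5, 1.1, 1.3, 2.0, 2.4, 3.0, 3.1, 4.2, 5.0]
bw0 = 1.06 * np.std(sample, ddof=1) * len(sample) ** (-0.2)  # normal-scale value
opt = minimize_scalar(lambda h: -cv_score(h, sample),
                      bounds=(bw0 / 20.0, 5.0 * bw0), method="bounded")
bw_llcv = opt.x  # likelihood cross-validation bandwidth
```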

From the density plots in Fig. 5a of the bandwidths selected to fit the 24-h precipitation accumulations for a three day lead time, it is apparent that the bandwidths obtained from the cross-validation algorithms tend to be significantly smaller than the normal-scale estimate from (4), the least squares cross-validation value clearly being the smallest, on average, of the three. In Fig. 5b the asymmetry in the density plot for the difference bw_{lscv} − bw_{llcv} corroborates the fact that on average the least squares bandwidth estimate is smaller than that obtained from the likelihood cross-validation algorithm.

Distributions of the bandwidths from various algorithms are also compared in the density plots of Fig. 6 for lead times of 1 and 10 days. The normal-scale estimate (4), which determines the upper bound of the cross-validation bandwidths, is proportional to *σn*^{−1/5}. Here, *σ*, the standard deviation of the theoretical distribution, is approximated by the standard deviation of the member forecasts, which increases with increasing lead time as the individual member forecasts diverge. Lengthening the lead time magnifies the separation between the normal-scale and cross-validation bandwidths.

Suspecting that bandwidths obtained from the least squares cross-validation algorithm were too large, Chen (2000) compared KDEs constructed with these bandwidths against KDEs using bandwidths of half the least squares cross-validation value, and the smaller-bandwidth KDEs did indeed appear to better model the distribution. This inspired us to examine KDEs using bandwidths obtained by recourse to the relatively cheap expedient of dividing the optimal Gaussian value bw_{0} by 5, 10, and 20. Distributions of bw_{0}/5 and bw_{0}/10 are compared with those from the other algorithms in Fig. 7.

In the limit of vanishing bandwidth, the distribution function of the KDE converges to the empirical model:

lim_{h→0} *F̂*_{Γ}(*x*; *h*) = *F*_{emp}(*x*),  (8)

where *F̂*_{Γ}(*x*; *h*) ≡ ∫_{0}^{x} *f̂*_{Γ}(*x*′; *h*) *dx*′ is the cumulative distribution of the KDE while *F*_{emp} is the empirical cdf. [In practice the limit in (8) can at best be approximated by lim_{h→ε} *F̂*_{Γ}(*x*; *h*), where *ε* is machine epsilon for the processor performing the computations.] The analog of (8) for empirical pdf’s is confounded by the fact that they are essentially a sum of Dirac delta functions at the sample points, which also precludes plotting them. Because the pdf generally offers a more intuitive depiction of the distribution, pdf’s for the KDE model with bandwidth bw_{0}/20 have been plotted as proxies for the empirical pdf. We denote this model KDE_{bw0/20}; the models obtained with bandwidths bw_{0}/10, bw_{0}/5, and bw_{0} are labeled analogously. Models obtained using the bandwidths generated by the likelihood and least squares cross-validation algorithms are denoted KDE_{llcv} and KDE_{lscv}, respectively.
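The convergence in (8) can be checked numerically: the cdf of the gamma KDE is the average of the kernels' gamma cdfs, and shrinking the bandwidth drives it toward the empirical cdf (a sketch; the sample and tolerances are illustrative):

```python
from scipy.stats import gamma

def gamma_kde_cdf(x, sample, h):
    """Cdf of the gamma KDE: the average of the kernel gamma cdf's
    (kernel shape xi/h + 1, scale h)."""
    return sum(gamma.cdf(x, a=xi / h + 1.0, scale=h) for xi in sample) / len(sample)

def empirical_cdf(x, sample):
    """Fraction of sample points at or below x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

sample = [1.0, 4.0, 4.0, 5.0, 9.0, 10.0, 13.0]
x = 7.0  # evaluation point away from the sample points
gaps = [abs(gamma_kde_cdf(x, sample, h) - empirical_cdf(x, sample))
        for h in (1.0, 0.1, 0.01)]
# The gap shrinks with the bandwidth; at h = 0.01 the two cdfs already
# agree very closely at this point.
```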

### d. Examples

In Figs. 8 and 9 are plotted pdf’s obtained from gamma KDE’s with bandwidths bw_{0}/20, bw_{0}/5, and bw_{0} as well as those obtained from the cross-validation algorithms. In Fig. 8a the EPS member QPF’s range between 0.5 and 2.5 mm. The distribution from which the data in this example have been sampled can reasonably be expected to have a large positive derivative and positive curvature near the origin, resulting in an upward shift of the gamma KDE’s, a shift which increases with increasing bandwidth, per (3). Moving to the right along the abscissa, upon surpassing the first inflection point of *f*, the second term in (3) produces a negative contribution which grows in importance with increasing distance from the origin. Once the magnitude of the second term in (3) exceeds that of the first, the relative magnitude of the strongly versus weakly smoothed estimators (e.g., KDE_{bw0} versus KDE_{bw0/20}) is reversed. So, for example, looking along the abscissa of Fig. 8a, for QPF in the neighborhood of 1 mm the pdf corresponding to the KDE obtained with the largest smoothing bandwidth, bw_{0}, has the smallest magnitude. On the other hand the pdf for the KDE with the smallest bandwidth, bw_{0}/20, has the largest magnitude. Eventually the curvature of *f* becomes positive again and the second term in (3) overwhelms the first because of the presence of the factor *x*, and KDE_{bw0} again dominates the pdf’s of the other models.

The net result of this is that, as expected, the greatest smoothing of the distribution is effected by the KDE with the largest bandwidth. The relationship of the gamma KDE models to the distribution being approximated is quite similar to that of Gaussian kernels except for the asymmetry arising from the variation with position along the abscissa of the gamma kernels, particularly in the vicinity of the origin (Figs. 3 and 4). One consequence of this asymmetry is that the center of mass of the pdf’s associated with the KDEs is shifted farther to the right with increasing bandwidth. Indeed, because the mean of the gamma distribution is the product of its shape and scale parameters (Thom 1958), it follows that the mean of the kernel density estimator defined in (2) is the sum of the smoothing bandwidth and the sample mean.

Not surprisingly, the KDE with bandwidth bw_{0}/20, typically the smallest of the bandwidths considered here, generally has the noisiest pdf. For the data sample modeled in Fig. 8a, the pdf obtained with this bandwidth has three modes, while the other models smooth the data into a unimodal distribution. In Fig. 8b all of the models have a sharp mode near the center of the cluster at or below 0.5 mm. In addition, KDE_{bw0/20} has smaller modes located at each of the three data points lying beyond 0.5 mm, suggesting some overfitting inasmuch as individual data points do not generally warrant their own modes. From a meteorological perspective it is difficult to believe that the EPS can accurately ascribe a much greater probability for 2.76 mm of precipitation (corresponding to the mode at the outermost point in the ensemble sample, labeled A in Fig. 8b) than for ∼2.25 mm of precipitation (forecast in the trough labeled B).

Forecasts for a significant East Coast storm are plotted in Fig. 9. At a lead time of 1 day all of the EPS member precipitation forecasts are congregated near 60 mm, corresponding to a pronounced mode in the KDEs, all of which are unimodal (save for a barely perceptible secondary mode in KDE_{bw0/20} around 40 mm). Because all of the EPS sample points are so large, little asymmetry is manifest in the kernels, which tend to resemble Gaussian densities as the shape parameter (essentially proportional to the QPF) increases (Thom 1958), while the contribution of the smoothing bandwidth to the mean of *f̂*_{Γ} becomes increasingly overshadowed by the average of the member forecasts. The modes of the models are thus nearly indistinguishable. Eighty-eight millimeters of precipitation was actually observed, which is well into the tail of KDE_{bw0/20}, whereas a considerable amount of the mass in the distribution of KDE_{bw0} remains to the right of this value.

EPS member forecasts start to diverge with increasing lead times, resulting in a greater spread of the ensemble sample, as can be seen for forecasts of the same event at a lead time of 2 days in Fig. 9b. With still greater lead times the forecasts start to separate into clusters, perhaps reflecting the uncertainty in the future trajectory of a low or a convective cell. Two or possibly three clusters are in evidence for the forecasts with 3-day lead time in Fig. 9c: near the origin, 40 mm, and a “cluster” of two forecasts just below 20 mm. KDEs obtained with the cross-validation bandwidths, as well as KDE_{bw0/20}, have three modes corresponding to these clusters, while KDE_{bw0/5} declines to assign a mode to the middle cluster, and KDE_{bw0} only admits a mode near the origin. This same behavior occurs at a lead time of 5 days. There is no clear evidence of overfitting in any of these models, but the assignment by three of the models of zero probability for precipitation between 20 and 30 mm (Fig. 9d) certainly constitutes a bold forecast, particularly for a lead time of 5 days.

Smoothing imposed upon a sample introduces bias, as expressed in (3) and discussed in more general terms in Wand and Jones (1995) and Silverman (1986). This bias is best seen in cumulative distribution functions such as those plotted in Fig. 10a for EPS precipitation forecasts at St. John’s, Newfoundland. KDE_{bw0} has a marked overforecasting bias for QPFs ≳ 6.3 (=10^{0.75}) mm. The other estimators model the sample much more closely; the cdf of KDE_{bw0/20} (not shown) is largely indistinguishable from the empirical cdf save for the rounding of the corners at the jump points. The price for this close agreement is noisy pdf’s, as in, for example, KDE_{bw0/20} in Fig. 10b.

### e. Adjustment for nil precipitation forecasts

The gamma kernels in (2) have shapes *α* ≥ 1 and are, therefore, finite at the origin and continuous on the positive real axis. Thus, the associated distribution functions vanish at the origin, exacerbating any overforecasting bias that might be present for smaller precipitation amounts. To take such situations into account, we make the following adjustment to the probability models as represented by the cumulative distribution function *F̂*_{Γ}:

*F̂*(*x*; *h*) = *n*_{0}/*n* + (1 − *n*_{0}/*n*)*F̂*_{Γ}(*x*; *h*),  (9)

where *n*_{0} is the number of ensemble members forecasting nil precipitation. The pdf corresponding to (9) is

*f̂*(*x*; *h*) = (*n*_{0}/*n*)*δ*^{+}(*x*) + (1 − *n*_{0}/*n*)*f̂*_{Γ}(*x*; *h*),  (10)

where *δ*^{+}(*x*) is the Dirac delta function restricted to the positive real axis,^{3} centered on the origin, with the following properties: it vanishes for *x* > 0 and integrates to unity over [0, ∞). Any concentration of probability mass in *f* at the origin is thereby faithfully reproduced by the first term of (10); that is, the probability assigned to nil precipitation is *n*_{0}/*n*. There remains the modeling of the continuous part *f*_{1} of the true pdf, which is achieved as before by constructing the estimator *f̂*_{Γ} using only the nonzero member forecasts.

In the event that none of the members forecast any precipitation, the distribution functions of all of the probability models reduce to the degenerate case of the Heaviside function with a step at the origin, the corresponding pdf’s all collapse to *δ*^{+}(*x*), and the probability of a nonzero precipitation amount vanishes. If only one of the members is forecasting precipitation, then *f̂*_{−i} in (5) is ill-defined and the cross-validation techniques are no longer applicable. However, a dataset with this structure does lend itself to fitting with a gamma distribution, the shape and scale parameters of which can be determined by the method of moments (Rohatgi 1976; Larson 1982; Thom 1958). Denoting the single nonzero forecast by *x*_{nz}, the method of moments yields *α* = 1 and *β* = *x*_{nz}. Shortcomings in the method of moments estimators notwithstanding (Thom 1958), computation of the maximum likelihood estimators (Rohatgi 1976; Larson 1982; Thom 1958) to the gamma distribution parameters is stymied by the presence of zero values in the dataset because the gamma distribution is singular at the origin for shapes less than 1. While these difficulties have been circumvented in Wilks (1990), the values obtained from the method of moments for the gamma parameters are appealing inasmuch as the magnitude of the nonzero precipitation forecast constitutes the simplest measure of the dispersion of the distribution, which is closely related to the scale parameter. The value obtained for the shape is the smallest possible for a gamma density that does not have a singularity at the origin, the latter being accounted for in the first term of (10). Using the smallest possible shape parameter subject to this constraint mutes the influence of the nonzero data point without discarding it outright.
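The adjusted distribution function, including the degenerate and single-nonzero-member cases just described, can be sketched as follows (function names are ours; the gamma KDE kernel follows section 3b):

```python
from scipy.stats import gamma

def adjusted_cdf(x, members, h):
    """Cdf with point mass n0/n at the origin for nil-precipitation members:
    - all members zero: Heaviside step at the origin;
    - one nonzero member x_nz: gamma with shape 1, scale x_nz (method of moments);
    - otherwise: gamma KDE on the nonzero members, weighted by (1 - n0/n)."""
    n = len(members)
    nonzero = [m for m in members if m > 0.0]
    n0 = n - len(nonzero)
    w0 = n0 / n                      # probability mass assigned to zero
    if not nonzero:
        return 1.0 if x >= 0.0 else 0.0
    if len(nonzero) == 1:
        cont = gamma.cdf(x, a=1.0, scale=nonzero[0])
    else:
        cont = sum(gamma.cdf(x, a=xi / h + 1.0, scale=h)
                   for xi in nonzero) / len(nonzero)
    return w0 + (1.0 - w0) * cont

members = [0.0, 0.0, 0.0, 0.0, 1.2]       # one member forecasts precipitation
p_nil = adjusted_cdf(0.0, members, h=0.5)  # P(no precipitation) = 4/5
```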

## 4. An experiment with known distributions: Evaluation against idealized data

To evaluate the KDE models against known distributions, four test densities were constructed as mixtures,

*f*(*x*) = Σ_{k=1}^{4} *a*_{k}*g*_{k}(*x*),

of four fixed component densities *g*_{k}, with coefficient vectors (*a*_{1}, *a*_{2}, *a*_{3}, *a*_{4}) ∈ {(1, 0, 0, 0), (0, 0, 1, 0), (1/20, 0, 0, 19/20), (0, ¼, 0, ¾)}, comprising two unimodal distributions, one with the mode at the origin, and two bimodal distributions, one again having a mode at the origin (Fig. 13). Samples were drawn from each of these distributions, and KDEs were constructed on these samples using the bandwidths discussed above and compared, along with the empirical model, against the (known) underlying distribution.
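Sampling from such a mixture can be sketched as follows; we take the components to be gamma densities with placeholder parameters (the actual component densities are those specified in the paper, not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(42)

# Placeholder component (shape, scale) pairs -- illustrative only.
components = [(0.5, 1.0), (4.0, 1.0), (0.7, 2.0), (9.0, 1.5)]

def sample_mixture(weights, size):
    """Draw from a mixture of gamma densities with the given weights."""
    idx = rng.choice(len(components), size=size, p=weights)
    return np.array([rng.gamma(shape=components[k][0], scale=components[k][1])
                     for k in idx])

# One of the coefficient vectors from the experiment: (1/20, 0, 0, 19/20).
s = sample_mixture([1/20, 0.0, 0.0, 19/20], size=17)
```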

There remains the choice of metric to measure how closely the model densities (*f̂*) approximate the known distribution (*f*). The mean (over repeated samples from the underlying distribution) of the integrated square error ≡ ∫_{0}^{∞}(*f* − *f̂*)^{2} (MISE) was the first criterion used in this context and remains a popular choice (Rosenblatt 1956; Silverman 1986; Wand and Jones 1995). Devroye and Györfi (1985) extol the virtues of the mean of the integrated absolute error ≡ ∫_{0}^{∞}|*f* − *f̂*| (MIAE), which is the only *L*_{p} distance invariant under scale transformations but is not as easy to work with as the MISE in formulating algorithms to minimize error in the estimators (Wand and Jones 1995). The Kullback–Leibler divergence, *D*_{KL} ≡ ∫_{0}^{∞} *f̂* log(*f̂*/*f*) (Kullback 1968; Scott and Sain 2005), while not a metric per se,^{4} is tantamount to the expected log-likelihood. The metrics mentioned so far require the pdf, which is not available for the empirical model. A statistic that can be computed for this model is the Kolmogorov–Smirnov distance, *D*_{KS} ≡ sup_{x∈[0,∞)}|*F*(*x*) − *F̂*(*x*)| (Rohatgi 1976; Wilks 2006), representing the *L*_{∞} metric on the space of the distribution functions. The square of the *L*_{2} norm on this space, *D*_{2} ≡ ∫_{0}^{∞}(*F̂* − *F*)^{2}, is a generalization of the continuous ranked probability score (CRPS; Wilks 2006) and can also be applied to the empirical model.
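The two cdf-based measures, *D*_{KS} and *D*_{2}, can be approximated on a grid (a sketch; the toy exponential cdfs and tolerances are ours):

```python
import numpy as np

def d_ks(F, F_hat, xs):
    """Kolmogorov-Smirnov distance: sup |F - F_hat| over the grid xs."""
    return np.max(np.abs(F(xs) - F_hat(xs)))

def d_2(F, F_hat, xs):
    """Squared L2 distance between cdfs (a generalization of the CRPS),
    approximated with the trapezoidal rule on the uniform grid xs."""
    diff2 = (F(xs) - F_hat(xs)) ** 2
    dx = xs[1] - xs[0]
    return float(np.sum((diff2[:-1] + diff2[1:]) * 0.5 * dx))

# Toy check: exponential cdf against a scaled variant.
F = lambda x: 1.0 - np.exp(-x)          # Exp(1)
G = lambda x: 1.0 - np.exp(-x / 2.0)    # Exp(mean 2)
xs = np.linspace(0.0, 50.0, 500001)
# Analytically, sup|F - G| = 1/4 (at x = 2 ln 2) and the squared L2
# distance is 1 - 4/3 + 1/2 = 1/6.
```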

Results of the experiment are summarized in Tables 1–4. In almost all cases, the models found to be closest to the known distribution were KDE_{bw0/5} or KDE_{bw0/10}; the MISE in KDE_{llcv} was slightly smaller than that of KDE_{bw0/5} in modeling *f*_{3} (Table 3). Performance of the cross-validation KDEs was generally good, although KDE_{lscv} was inferior to KDE_{llcv}, even for the MISE. While bw_{lscv} is constructed to minimize the MISE in the KDE, there is no guarantee that it always will, because the optimization is only attained in the asymptotic limit of infinite sample size. Moreover, *M*_{0} [cf. (7)] can have more than one local minimum, resulting in the generation of a suboptimal bandwidth by the least squares algorithm. [Plotting *M*_{0} (Sheather 2004; Bowman and Azzalini 1997) each time a bandwidth is needed is unsuited to an automated forecast system, and impracticable on a verification sample comprising tens of thousands of cases.] Even in cases where the above problem does not arise, the least squares bandwidth has been found to be much too small (Sheather 2004), its overall performance dubbed “somewhat disappointing” (Wand and Jones 1995; Hall and Marron 1987; Park and Marron 1990) despite still being recommended (Sheather 2004).

Generally, KDE_{bw0} performed the worst of all the models by all measures except the MIAE, for which KDE_{bw0/20} yielded poorer results for all of the distributions save *f*_{1} (Table 1). In the Kolmogorov–Smirnov metric the empirical model usually lags well behind all of the other models except KDE_{bw0}, although even KDE_{bw0} outperformed the empirical model in modeling *f*_{4} (Table 4). Because *D*_{KS} assesses the largest pointwise difference between two distributions, it is particularly sensitive to the granularity of the empirical models. In the *D*_{2} metric, which weighs the separation of two distributions globally over their entire support (the support of a distribution being the smallest closed set whose complement carries zero probability), the difference in the performance of the empirical model relative to KDE_{bw0/5} or KDE_{bw0/10} is much less pronounced. This does not hold for the experiment on *f*_{1}, which is pathological inasmuch as we are attempting to fit finite models to a distribution that is singular at the origin. In consequence the mass of *f*_{1} is highly concentrated at the origin, resulting in a very light tail. The fitted models are therefore penalized for their comparatively heavy tails, particularly in the MISE, which would be the most sensitive to this. (There is also a term in *f*_{3} with a singularity at the origin, but the other term is centered well away from the origin and its contribution to the overall distribution is weighted much more heavily.)

## 5. Results of verification against real data

Verification against real data was effected using the Brier skill score (BSS) relative to the sample climatology, along with the Brier components *reliability* (BS_{rel}) and *resolution* (BS_{res}) (Stanski et al. 1990; Buizza et al. 2005; Wilks 2006),^{5} the areas under the receiver operating characteristic (ROC) curves (Mason 1982; Harvey et al. 1992; Stanski et al. 1990), and attributes diagrams (Stanski et al. 1990; Wilks 2006; Murphy 1986).^{6} Examples of the latter are plotted in Fig. 14 for warm season forecasts of the probability of threshold exceedance (the threshold being the 90th percentile of the long-term station climates). The 17-member forecasts afford 18 empirical probabilistic forecasts (0/17–17/17). The number of bins in the attributes diagrams must therefore be a divisor of 18 to facilitate comparison of the empirical forecasts against those from the continuous KDEs. The sample aggregated over 36 stations was large enough to use nine bins without introducing excessive noise.
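The reliability and resolution terms follow the standard partition of the Brier score into BS = BS_{rel} − BS_{res} + uncertainty. A sketch of the binned decomposition (the bin edges and toy forecast/outcome data are illustrative; the identity is exact when all forecasts within a bin are identical):

```python
import numpy as np

def brier_decomposition(p, o, bins):
    """Partition of the Brier score into reliability, resolution, and
    uncertainty, binning the forecast probabilities p against binary
    outcomes o."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    n = p.size
    obar = o.mean()
    # Assign each forecast to a probability bin.
    idx = np.clip(np.digitize(p, bins) - 1, 0, len(bins) - 2)
    rel = res = 0.0
    for k in range(len(bins) - 1):
        m = idx == k
        nk = m.sum()
        if nk:
            pk, ok = p[m].mean(), o[m].mean()
            rel += nk * (pk - ok) ** 2   # forecast vs observed frequency
            res += nk * (ok - obar) ** 2  # bin frequency vs climatology
    return rel / n, res / n, obar * (1.0 - obar)

# Toy data: three forecast values, four cases each.
p = [0.1] * 4 + [0.5] * 4 + [0.9] * 4
o = [0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0]
rel, res, unc = brier_decomposition(p, o, bins=np.linspace(0.0, 1.0, 4))
bs = np.mean((np.asarray(p) - np.asarray(o)) ** 2)
```

Smaller reliability and larger resolution are better; the BSS relative to climatology is then (res − rel)/unc.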

Histograms of the populations of the forecast bins (insets in Figs. 14a and 14b) reveal that the KDEs reduce the population of the lowest bin, the greatest reduction being achieved with the largest bandwidth, bw_{0}. The heavier tails associated with the larger-bandwidth KDEs (Figs. 8–10) confer greater probabilities of threshold exceedance, pushing some of the forecasts out of the leftmost bin. Forecasts culled from this bin are perforce displaced into higher-probability bins, so the relative populations of those bins among the different models must eventually reverse. It can be seen from the histograms in Fig. 14 that the population of the second forecast bin is greatest for the KDE_{bw0} forecasts. The increase in the population of the second bin of the cross-validation forecasts corrects a slight underforecasting bias in the raw EPS probability forecasts at a lead time of 1 day. By 2 days this underforecasting bias of the raw EPS has been replaced by a slight overforecasting bias, which is exacerbated by the KDEs because of the increased population of this bin in these models.

Nevertheless, the performance of the KDE models is quite close to that of the raw EPS forecasts, particularly for the bandwidths generated by the cross-validation algorithms. Their behavior is intermediate between that of KDE_{bw0} with the largest bandwidth, and the empirical model, which is tantamount to a KDE in the limit of vanishing bandwidth. The models exhibit skill throughout the range of forecast probabilities, which is reflected in the plot of BSS versus lead time (Fig. 15a). Particularly interesting is the performance of KDE_{bw0/5}, which closely follows the raw EPS forecasts, especially for the first 5 days, during which the EPS forecasts demonstrate positive skill. Here, KDE_{bw0} lags well behind the rest, its performance deteriorating rapidly with increasing lead time.

The breakdown of the BSS into its contributions from the reliability and resolution of the forecasts (Fig. 15b) makes clear that the poor showing of the KDE_{bw0} forecasts is attributable to their poor reliability. Diminishing bandwidth in the KDEs reduces the bias introduced into the density estimators, as reflected in Fig. 15b by better (smaller) reliability. At a lead time of 1 day, the KDEs have a slightly better resolution than does the empirical model, after which the resolutions are virtually indistinguishable, as are the areas under the ROC curves (not shown).

The performance of cool season forecasts at a single station (St. John’s) for a lead time of 1 day is displayed in Fig. 16. Because a single station is analyzed, physical thresholds (in this case 4 mm) can be used, but smaller sample sizes dictated the use of only six bins. An underforecasting bias evident in the raw EPS forecasts for probabilities <1/2 is corrected by the KDEs (Fig. 16a). The plot of BSS as a function of lead time in Fig. 16b, however, shows that the only significant difference among the models is the more rapid deterioration of KDE_{bw0} with increasing lead time.

By definition of the cumulative distribution function *F*: [0, ∞) → [0, 1] corresponding to a random variable *X*, if *F*(*x*) = *y*, then the fraction *y* of the population has a value not exceeding *x.* For example, if *F*(4) = 0.75 for a distribution function *F* of the daily precipitation at some station, then 75% of the daily precipitation amounts observed at that station are 4 mm or less. We can therefore gauge the calibration of the models by comparing the value of the forecast distribution *F* against the frequency with which the corroborating observations actually fell below the corresponding inverse image under *F*. For each percentile *y* in the upper decile (i.e., probabilities ranging from 0.90 to 0.99) the inverse image of *y* under *F*, *F*^{−1}(*y*), was determined for each of the KDEs and compared against the matching observation. In the case of the empirical cdf, *F*_{emp}, *F*^{−1}_{emp}(*y*) was set to the maximum member forecast for *y* > 16/17 ≈ 0.941. For 0.90 ≤ *y* ≤ 16/17 (note that 15/17 ≈ 0.882 < 0.90), *F*^{−1}_{emp}(*y*) was set to the second largest member forecast (which could coincide with the maximum member forecast).
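The inversion rule above can be stated compactly: *F*^{−1}_{emp}(*y*) is the *k*th order statistic for the smallest *k* with *k*/*n* ≥ *y*. A sketch (the function name and the 17-member sample are illustrative):

```python
import numpy as np

def empirical_inverse_cdf(members, y):
    """F_emp^{-1}(y): the k-th order statistic of the n member
    forecasts, for the smallest k with k/n >= y."""
    xs = np.sort(np.asarray(members, float))
    n = xs.size
    k = max(1, int(np.ceil(y * n)))  # smallest k with k/n >= y
    return xs[min(k, n) - 1]

members = np.arange(1.0, 18.0)  # a hypothetical 17-member forecast
print(empirical_inverse_cdf(members, 0.95))  # 17.0: the maximum member
print(empirical_inverse_cdf(members, 0.92))  # 16.0: the second largest
```

With 17 members, any *y* in (15/17, 16/17] maps to the second largest member and any *y* > 16/17 to the largest, reproducing the two cases described in the text.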

The results of this comparison are plotted in Fig. 17. In Fig. 17a the raw EPS cool season forecasts are seen to have the best calibration at the 90th percentile, ≈92.1% of the observations being bounded by the 90th percentile from the model, representing an overforecasting bias. The discrete nature of these forecasts constrains them to the same value over the probability interval [0.90, 0.94], however, because *F*^{−1}_{emp}(*y*) does not change over this interval. Hence by the 94th percentile, only 92.1% of the observations fell below the predicted percentile, thereby constituting an underforecasting bias. Crossing the threshold of 16/17 to the 95th percentile, the empirical model is quite well calibrated, with ∼94.5% of the observations constrained by the forecast quantile, but the calibration again deteriorates on the interval [0.95, 0.99], so that by the 99th percentile the raw EPS evinces a significant underforecasting bias.

The continuous models resulting from kernel density estimation are free from the above constraint on the empirical cdf. While the forecasts from KDE_{bw0/5} suffer from a somewhat higher overforecasting bias than the raw forecasts for the 90th percentile (Fig. 17a), by the 94th forecast percentile the calibration of the KDE has improved considerably. At the 95th percentile the calibration of KDE_{bw0/5} is perfect, and its superiority to the empirical model only increases with increasing forecast percentiles. Hence by the 99th percentile, the calibration of KDE_{bw0/5} is superior to that of the empirical model by ∼2.5%. The overforecasting bias of KDE_{bw0} is reflected in the appearance of its forecasts at the top of the plots in Fig. 17, showing good calibration in the cool season for forecast percentiles of 99%. The price of this calibration at the upper end of the probability domain is a more severe overforecasting bias at the left boundary of the upper decile. As usual, the performance of the cross-validation models in the verification was intermediate between the two extremes.

Analogous results were obtained for warm season forecasts (Fig. 17b)—the salient difference being the downward translation of all of the curves by ≲2%. The slopes of these curves offer another means of comparing models; the closer the slope is to unity, the better the forecasts (normally after some sort of calibration akin to vertical translation in the kind of plots displayed in Fig. 17). On this basis, KDE_{bw0/5} again appears to be superior to the others (although it is only marginally superior to the cross-validation models). On the other hand, the relatively large bandwidths in the optimal Gaussian KDE, with the concomitant oversmoothing of the pdf’s, make these forecasts the least sensitive to the value of the cdf, whence the smallest slope and the smallest opportunity for improvement after calibration.

## 6. Summary and conclusions

The aim of this work was the fitting of continuous pdf’s to precipitation forecasts of the Canadian EPS by recourse to gamma KDEs, with a view toward better estimation of the probabilities, especially in the tails. Several different methods were used to obtain the smoothing bandwidths of the KDEs, and their performance, along with that of the empirical probabilistic model, was tested both on actual data from the Canadian EPS and on idealized data constructed from known distributions.

Gamma KDE forecasts demonstrated skill comparable to empirical probabilistic forecasts in the verification against real data. Gamma KDE forecasts can correct underforecasting biases in intermediate forecast probabilities. If, on the other hand, the EPS is already well calibrated, then the gamma KDEs tend to introduce an overforecasting bias in the intermediate forecast bins, or to exacerbate such a bias where it already exists. The KDEs constructed on idealized data generally outperformed the empirical models. Granularity in the latter resulted in a particularly poor showing in the Kolmogorov–Smirnov metric, whereas their performance approached that of the best of the KDEs in the *D*_{2} norm, a generalization of the CRPS, itself an extension of the Brier score. This suggests that the metrics used in the verification against real data may cast the empirical model in a more flattering light than other norms would.

KDEs using bandwidths from the cross-validation algorithms were outperformed by KDEs using bandwidths obtained by resorting to the simple expedient of taking various fractions of the optimal Gaussian bandwidth, specifically bw_{0}/5 and bw_{0}/10. One approach to the optimization of the KDEs might therefore consist of computing scores for varying fractions of bw_{0} on a training set of verified EPS forecasts, settling on that fraction giving the best results in the metric appropriate to the problem at hand. Optimal bandwidths could be determined in this manner for different seasonal strata, lead times, thresholds, and, provided the threshold is not too extreme, locations. Richardson (2000) points out that the benefit derived from an EPS forecast depends crucially on the cost–loss ratio of the user. As this cost–loss ratio diminishes, the forecast must be able to predict increasingly extreme events in order to be of value. The plots in Fig. 17 suggest how the smoothing bandwidth could be tuned to supply optimal calibration at the probability threshold of the user’s choosing, thereby maximizing the utility of the ensemble forecasts.
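The tuning procedure suggested above amounts to a one-parameter grid search over fractions of bw_{0}. A schematic sketch (the `score_fn` callback, the candidate fractions, and the toy quadratic score are placeholders for a real verification score computed on the training set):

```python
import numpy as np

def tune_bandwidth_fraction(score_fn, fractions=(1.0, 0.5, 0.2, 0.1, 0.05)):
    """Return the fraction c of bw0 that minimizes a verification score
    on a training set of verified EPS forecasts; score_fn(c) should
    score every training case with bandwidth c * bw0 and return the
    mean score (e.g. the Brier score for the threshold of interest)."""
    scores = {c: score_fn(c) for c in fractions}
    return min(scores, key=scores.get)

# Toy stand-in for a real verification score, minimized at c = 0.2:
best = tune_bandwidth_fraction(lambda c: (c - 0.2) ** 2)
```

In practice the search would be repeated per stratum (season, lead time, threshold, and, for non-extreme thresholds, location), retaining one fraction per stratum.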

Increasing dispersion in the ensemble member forecasts translates into larger smoothing bandwidths, particularly bw_{0}, which is proportional to the standard deviation of the ensemble sample. The concomitant increase in the smoothing exacerbates the bias. Taking the forecast lead time into account when determining the optimal smoothing bandwidths could mitigate this deterioration in the relative quality of the KDE models with increasing lead time. However, the waning confidence in the ensemble forecasts with increasing lead time also contributes to this degradation in the quality of the KDE forecasts. At shorter lead times the confidence of the EPS manifests itself through fewer forecasts in the intermediate probability bins, whence a proclivity for an underforecasting bias in these bins, which the gamma KDEs reduce. As the EPS becomes less sure of itself with increasing lead time, the populations of the intermediate bins swell, resulting in an overforecasting bias, which the KDEs only aggravate.
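The proportionality of bw_{0} to the ensemble standard deviation is what links spread to smoothing. As a point of reference, a sketch assuming the standard Silverman (1986) rule-of-thumb constant (the exact form of the paper's bw_{0} is not restated here, so the constant 1.06 is an assumption):

```python
import numpy as np

def bw_optimal_gaussian(sample):
    """Rule-of-thumb 'optimal Gaussian' bandwidth,
    bw0 = 1.06 * sigma * n**(-1/5): directly proportional to the
    ensemble standard deviation, so bw0 grows with ensemble spread
    (and hence with lead time)."""
    s = np.asarray(sample, float)
    return 1.06 * s.std(ddof=1) * s.size ** (-0.2)

bw0 = bw_optimal_gaussian(np.arange(1.0, 18.0))  # a hypothetical 17-member sample
```

Because the ensemble size *n* is fixed, only the spread term varies from case to case, which is why wider ensembles at long lead times yield heavier smoothing and a larger bias.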

The possibilities for the functional form of the kernels themselves have not been exhausted. While gamma densities with modes at the sample data points are intuitively appealing, densities whose means coincide with the data points have some attractive properties of their own, and were also explored in Chen (2000). The use of density estimators constructed from inverse Gaussian and reciprocal inverse Gaussian probability density functions to represent random variables whose support is [0, ∞) has also been investigated (Scaillet 2004).
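The two gamma parameterizations differ only in the kernel shape parameter. A sketch of both (the shape formulas are our reading of the mode- and mean-centered constructions and should be checked against Chen 2000; the sample is illustrative):

```python
import numpy as np
from scipy import stats

def gamma_kde(data, bw, x, center="mode"):
    """Gamma KDE on [0, inf): one gamma kernel of scale bw per data
    point.  center="mode" places the kernel mode at the data point
    (shape x_i/bw + 1); center="mean" places the kernel mean there
    (shape x_i/bw, as in Chen 2000), which requires x_i > 0."""
    data = np.asarray(data, float)
    shapes = data / bw + (1.0 if center == "mode" else 0.0)
    # Average the kernel densities over the sample.
    return np.mean([stats.gamma.pdf(x, a=a, scale=bw) for a in shapes], axis=0)

x = np.linspace(0.0, 40.0, 4001)
pdf = gamma_kde([0, 1, 4, 4, 5, 9, 10, 11, 13], bw=1.0, x=x)
```

The mode-centered form handles zero precipitation gracefully (the kernel for a zero datum degenerates to an exponential density), whereas the mean-centered form needs special treatment of zeros.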

There are, of course, corrections that could be applied directly to the empirical model. Linear or spline interpolation would reduce the granularity of the empirical model manifest in Fig. 17. Its tail could be modeled beyond the maximum member forecast, either linearly, with some prescription for truncation, or by resorting to a parameterized continuous form that vanishes sufficiently rapidly at infinity. Nevertheless, gamma KDEs do afford a direct, simple, intuitive approach to the construction of faithful, credible realizations of EPS forecast distributions. The smoothing bandwidth supplies one free parameter, which can be tuned to whatever verification samples might be available, subject to the requirements of the intended recipients, in order to provide the most useful forecasts. Though by no means the only method with which to model EPS outputs, gamma KDEs do seem to merit consideration as one possible approach to the generation of probabilistic predictions from ensemble forecasts of QPF, or indeed of any meteorological variable whose support spans the nonnegative reals.
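The interpolation correction mentioned here is straightforward. A sketch using linear interpolation between the order statistics (the *k*/(*n* + 1) plotting positions and the function name are illustrative choices, not the paper's prescription):

```python
import numpy as np

def interpolated_empirical_cdf(members, x):
    """Empirical cdf smoothed by linear interpolation between the
    sorted member forecasts; below the smallest member the cdf is set
    to 0, and above the largest it is held at n/(n+1), where a tail
    model would take over."""
    xs = np.sort(np.asarray(members, float))
    n = xs.size
    probs = np.arange(1, n + 1) / (n + 1.0)
    return np.interp(x, xs, probs, left=0.0, right=probs[-1])

members = np.arange(1.0, 18.0)  # a hypothetical 17-member forecast
vals = interpolated_empirical_cdf(members, np.linspace(0.0, 20.0, 201))
```

The interpolated cdf removes the flat steps responsible for the plateaus in Fig. 17-style plots, while leaving the tail beyond the maximum member to whatever parameterized form is chosen.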

## Acknowledgments

Part of the observational record was extracted from a database constructed by Gérard Croteau at the Canadian Meteorological Centre. Some of the Canadian EPS forecast data was supplied by Gérard Pellerin at the Canadian Meteorological Centre. ROC areas were computed using code supplied by Gérard Pellerin, adapted from the software of D. D. Dorfman and E. Alf, based on a program listed in Swets and Pickett (1982). The authors also thank Dr. William Burrows, Dr. Jim Hansen, two anonymous reviewers, and the editor Dr. Da-Lin Zhang for their helpful comments and suggestions, along with Dr. Gilbert Brunet for suggesting this line of research.

## REFERENCES

Anderson, J. L., 1996: A method for producing and evaluating probabilistic forecasts from ensemble model integrations. *J. Climate*, **9**, 1518–1530.

Atger, F., 1999a: The skill of ensemble prediction systems. *Mon. Wea. Rev.*, **127**, 1941–1953.

Atger, F., 1999b: Tubing: An alternative to clustering for the classification of ensemble forecasts. *Wea. Forecasting*, **14**, 741–757.

Atger, F., 2003: Spatial and interannual variability of the reliability of ensemble-based probabilistic forecasts: Consequences for calibrations. *Mon. Wea. Rev.*, **131**, 1509–1523.

Black, T. L., 1994: The new NMC mesoscale Eta Model: Description and forecast examples. *Wea. Forecasting*, **9**, 265–278.

Bowman, A. W., and Azzalini, A., 1997: *Applied Smoothing Techniques for Data Analysis*. Oxford University Press, 193 pp.

Brent, R., 1973: *Algorithms for Minimization without Derivatives*. Prentice-Hall, 195 pp.

Brooks, H. E., Doswell, C. A., and Kay, M. P., 2003: Climatological estimates of local daily tornado probability for the United States. *Wea. Forecasting*, **18**, 626–640.

Buizza, R., 1997: Potential forecast skill of ensemble prediction and spread and skill distributions of the ECMWF Ensemble Prediction System. *Mon. Wea. Rev.*, **125**, 99–119.

Buizza, R., and Palmer, T. N., 1995: The singular-vector structure of the atmospheric global circulation. *J. Atmos. Sci.*, **52**, 1434–1456.

Buizza, R., Houtekamer, P. L., Toth, Z., Pellerin, G., Wei, M., and Zhu, Y., 2005: A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems. *Mon. Wea. Rev.*, **133**, 1076–1097.

Chen, S. X., 2000: Probability density function estimation using gamma kernels. *Ann. Inst. Stat. Math.*, **52**, 471–480.

Coelho, C. A. S., Pezzulli, S., Balmaseda, M., Doblas-Reyes, F. J., and Stephenson, D. B., 2004: Forecast calibration and combination: A simple Bayesian approach for ENSO. *J. Climate*, **17**, 1504–1516.

Côté, J. S., Gravel, S., Méthot, A., Patoine, A., Roch, M., and Staniforth, A., 1998: The operational CMC/MRB Global Environmental Multiscale (GEM) model. Part I: Design considerations and formulation. *Mon. Wea. Rev.*, **126**, 1373–1395.

Dalgaard, P., 2002: *Introductory Statistics with R*. Springer, 267 pp.

Devroye, L., and Györfi, L., 1985: *Nonparametric Density Estimation—The L1 View*. Wiley, 356 pp.

Duin, R. P. W., 1976: On the choice of smoothing parameters for Parzen estimators of probability density functions. *IEEE Trans. Comput.*, **C-25**, 1175–1179.

Ebert, E., 2001: Ability of a poor man’s ensemble to predict the probability and distribution of precipitation. *Mon. Wea. Rev.*, **129**, 2461–2480.

Eckel, F. A., and Walters, M. K., 1998: Calibrated probabilistic quantitative precipitation forecasts based on the MRF ensemble. *Wea. Forecasting*, **13**, 1132–1147.

Efron, B., and Tibshirani, R. J., 1994: *An Introduction to the Bootstrap*. Chapman and Hall/CRC, 436 pp.

Feddersen, H., and Andersen, U., 2005: A method for statistical downscaling of seasonal ensemble predictions. *Tellus*, **57A**, 398–408.

Fix, E., and Hodges, J. L., Jr., 1951: Discriminatory analysis, nonparametric estimation: Consistency properties. USAF School of Aviation Medicine Tech. Rep. 4, Project 21-49-004, Randolph Field, TX, 21 pp.

Fortin, V., Favre, A.-C., and Said, M., 2006: Probabilistic forecasting from ensemble prediction systems: Improving upon the best-member method by using a different weight and dressing kernel for each member. *Quart. J. Roy. Meteor. Soc.*, **132**, 1349–1369.

Gauthier, P., Charette, C., Fillion, L., Koclas, P., and Laroche, S., 1999: Implementation of a 3D variational data assimilation system at the Canadian Meteorological Centre. Part 1: The global analysis. *Atmos.–Ocean*, **37**, 103–156.

Hall, P., and Marron, J. S., 1987: Extent to which least squares cross validation minimizes integrated squared error in nonparametric density estimation. *Probab. Theory Relat. Fields*, **74**, 567–581.

Hamill, T. M., and Colucci, S. J., 1997: Verification of Eta–RSM short-range forecasts. *Mon. Wea. Rev.*, **125**, 1312–1327.

Hamill, T. M., and Colucci, S. J., 1998: Evaluation of Eta–RSM ensemble probabilistic precipitation forecasts. *Mon. Wea. Rev.*, **126**, 711–724.

Hamill, T. M., and Juras, J., 2006: Measuring forecast skill: Is it real skill or is it the varying climatology? *Quart. J. Roy. Meteor. Soc.*, **132**, 2905–2923.

Harvey, L. O., Jr., Hammond, K. R., Lusk, C. M., and Mross, E. F., 1992: The application of signal detection theory to weather forecasting behavior. *Mon. Wea. Rev.*, **120**, 863–883.

Heitler, W., 1984: *The Quantum Theory of Radiation*. Dover, 340 pp.

Houtekamer, P. L., Lefaivre, L., Derome, J., Ritchie, H., and Mitchell, H. L., 1996: A system simulation approach to ensemble prediction. *Mon. Wea. Rev.*, **125**, 2416–2426.

Jolliffe, I. T., and Stephenson, D. B., 2003: *Forecast Verification—A Practitioner’s Guide in Atmospheric Science*. Wiley, 240 pp.

Juang, H.-M., and Kanamitsu, M., 1994: The NMC nested regional spectral model. *Mon. Wea. Rev.*, **122**, 3–26.

Kahaner, D., Moler, C., and Nash, S., 1989: *Numerical Methods and Software*. Prentice Hall, 495 pp.

Kalnay, E., and Dalcher, A., 1987: Forecasting forecast skill. *Mon. Wea. Rev.*, **115**, 349–356.

Krzysztofowicz, R., 2002: Bayesian processor of output: A new technique for probabilistic weather forecasting. Preprints, *19th Conf. on Weather Analysis and Forecasting*, San Antonio, TX, Amer. Meteor. Soc., 391–394.

Kullback, S., 1968: *Information Theory and Statistics*. Dover, 399 pp.

Larson, H. J., 1982: *Introduction to Probability Theory and Statistical Inference*. Wiley, 637 pp.

Lefaivre, L., Houtekamer, P. L., Bergeron, A., and Verret, R., 1997: The CMC Ensemble Prediction System. *Proc. Sixth Workshop on Meteorological Operational Systems*, Reading, United Kingdom, ECMWF, 31–44.

Mason, I., 1982: A model for assessment of weather forecasts. *Aust. Meteor. Mag.*, **30**, 291–303.

Molteni, F., Buizza, R., Palmer, T. N., and Petroliagis, T., 1996: The ECMWF Ensemble Prediction System: Methodology and validation. *Quart. J. Roy. Meteor. Soc.*, **122**, 73–119.

Mullen, S. L., and Buizza, R., 2001: Quantitative precipitation forecasts over the United States by the ECMWF Ensemble Prediction System. *Mon. Wea. Rev.*, **129**, 638–663.

Murphy, A. H., 1986: The attributes diagram: A geometric framework for assessing the quality of probability forecasts. *Int. J. Forecasting*, **2**, 285–293.

Palmer, T. N., 1993: Extended-range atmospheric prediction and the Lorenz method. *Bull. Amer. Meteor. Soc.*, **74**, 49–65.

Park, B., and Marron, J., 1990: Comparison of data-driven bandwidth selectors. *J. Amer. Stat. Assoc.*, **85**, 66–72.

Parzen, E., 1960: *Modern Probability Theory and Its Applications*. Wiley, 464 pp.

Parzen, E., 1962: On the estimation of a probability density function and the mode. *Ann. Math. Stat.*, **33**, 1065–1076.

Peel, S., and Wilson, L. J., 2008: A diagnostic verification of the precipitation forecasts produced by the Canadian Ensemble Prediction System. *Wea. Forecasting*, **23**, 596–616.

Pellerin, G., Lefaivre, L., Houtekamer, P., and Girard, C., 2003: Increasing the horizontal resolution of ensemble forecasts at CMC. *Nonlinear Processes Geophys.*, **10**, 463–468.

Raftery, A. E., Gneiting, T., Balabdaoui, F., and Polakowski, M., 2005: Using Bayesian model averaging to calibrate forecast ensembles. *Mon. Wea. Rev.*, **133**, 1155–1174.

Richardson, D. S., 2000: Skill and relative economic value of the ECMWF Ensemble Prediction System. *Quart. J. Roy. Meteor. Soc.*, **126**, 649–667.

Richardson, D. S., 2001: Measures of skill and value of ensemble prediction systems, their interrelationship and the effect of ensemble size. *Quart. J. Roy. Meteor. Soc.*, **127**, 2473–2489.

Ritchie, H., and Beaudoin, C., 1994: Approximations and sensitivity experiments with a baroclinic semi-Lagrangian spectral model. *Mon. Wea. Rev.*, **122**, 2391–2399.

Rogers, E., Deaven, D. G., and DiMego, G. J., 1995: The regional analysis system for the operational “early” Eta Model: Original 80-km configuration and recent changes. *Wea. Forecasting*, **10**, 810–825.

Rohatgi, V. K., 1976: *An Introduction to Probability Theory and Mathematical Statistics*. Wiley, 684 pp.

Rosenblatt, M., 1956: Remarks on some nonparametric estimates of a density function. *Ann. Math. Stat.*, **27**, 832–835.

Scaillet, O., 2004: Density estimation using inverse and reciprocal inverse Gaussian kernels. *J. Nonparametric Stat.*, **16**, 217–226.

Scott, D. W., and Sain, S. R., 2005: Multi-dimensional density estimation. *Data Mining and Computational Statistics*, C. R. Rao and E. J. Wegman, Eds., *Handbook of Statistics*, Vol. 23, Elsevier, 229–258.

Sheather, S. J., 2004: Density estimation. *Stat. Sci.*, **19**, 588–597.

Silverman, B. W., 1986: *Density Estimation for Statistics and Data Analysis*. Chapman and Hall/CRC, 175 pp.

Silverman, B. W., and Jones, M. C., 1989: An important contribution to nonparametric discriminant analysis and density estimation—Commentary on Fix and Hodges (1951). *Int. Stat. Rev.*, **57**, 233–247.

Simonoff, J. S., 1996: *Smoothing Methods in Statistics*. Springer, 338 pp.

Stanski, H. R., Wilson, L. J., and Burrows, W. R., 1990: Survey of common verification methods in meteorology. World Weather Watch Tech. Rep. 8, WMO, Geneva, Switzerland, 114 pp.

Stephenson, D. B., and Doblas-Reyes, F. J., 1994: Statistical methods for interpreting Monte Carlo ensemble forecasts. *Tellus*, **52A**, 300–322.

Swets, J. A., and Pickett, R. M., 1982: *Evaluation of Diagnostic Systems: Methods from Signal Detection Theory*. Academic Press, 253 pp.

Thom, H. C. S., 1958: A note on the gamma distribution. *Mon. Wea. Rev.*, **86**, 117–122.

Toth, Z., and Kalnay, E., 1993: Ensemble forecasting at NMC: The generation of perturbations. *Bull. Amer. Meteor. Soc.*, **74**, 2317–2330.

Toth, Z., and Kalnay, E., 1997: Ensemble forecasting at NMC and the breeding method. *Mon. Wea. Rev.*, **125**, 3297–3319.

Toth, Z., Zhu, Y., Marchok, T., Tracton, S., and Kalnay, E., 1998: Verification of the NCEP global ensemble forecasts. Preprints, *12th Conf. on Numerical Weather Prediction*, Phoenix, AZ, Amer. Meteor. Soc., 286–289.

Toth, Z., Zhu, Y., and Marchok, T., 2001: The use of ensembles to identify forecasts with small and large uncertainty. *Wea. Forecasting*, **16**, 463–477.

von Storch, H., and Zwiers, F. W., 1995: *Statistical Analysis in Climate Research*. Academic Press, 484 pp.

Wand, M. P., and Jones, M. C., 1995: *Kernel Smoothing*. Chapman and Hall/CRC, 212 pp.

Wheeden, R. L., and Zygmund, A., 1977: *Measure and Integral: An Introduction to Real Analysis*. Marcel Dekker, 274 pp.

Whitaker, J. S., and Loughe, A. F., 1998: The relationship between ensemble spread and ensemble mean skill. *Mon. Wea. Rev.*, **126**, 3292–3302.

Wilks, D. S., 1990: Maximum likelihood estimation for the gamma distribution using data containing zeros. *J. Climate*, **3**, 1495–1501.

Wilks, D. S., 2002: Smoothing forecast ensembles with fitted probability distributions. *Quart. J. Roy. Meteor. Soc.*, **128**, 2821–2836.

Wilks, D. S., 2006: *Statistical Methods in the Atmospheric Sciences*. 2nd ed. Elsevier Academic, 627 pp.

Wilson, L. J., Beauregard, S., Raftery, A. E., and Verret, R., 2007: Calibrated surface temperature forecasts from the Canadian Ensemble Prediction System using Bayesian model averaging. *Mon. Wea. Rev.*, **135**, 1364–1385.

Zhu, Y., Iyenger, G., Toth, Z., Tracton, S., and Marchok, T., 1996: Objective evaluation of the NCEP Global Ensemble Forecasting System. Preprints, *15th Conf. on Weather Analysis and Forecasting*, Norfolk, VA, Amer. Meteor. Soc., J79–J82.

Histograms for the sample {.2, .3, .3, .4, .4, .4, .4, .6, .6, .6, .6, .7, .7, .7, .7, .7, .7, .7, .8, .8, .8, .9, .9, .9, .9, .9, .9, 1.1, 1.1, 1.2, 1.2, 1.2, 1.2, 1.3, 1.3, 1.3, 1.4, 1.7, 1.7, 1.7, 1.8, 1.8, 2.1, 2.1}, with bins defined by (a) the intervals {(*i*/4, (*i* + 1)/4], 0 ≤ *i* ≤ 8}; (b) bins as in (a) but translated to the right by 0.1; and (c) bins as in (b) but double the width.

Citation: Weather and Forecasting 23, 4; 10.1175/2007WAF2007023.1

Centered histograms and gamma KDEs for the sample {0, 1, 4, 4, 5, 9, 9, 10, 10, 10, 10, 11, 13, 13, 13}, for smoothing bandwidths of (a) 0.4 and (b) 1.0. The KDE obtains from the average of the gamma kernels (solid lines), centered on the data points in the sample and weighted by the multiplicities of these data points in the sample.

Gamma KDEs for the sample in Fig. 3 for smoothing bandwidths of (a) 0.01 and (b) 1.0. As the data points approach the origin the associated kernels become increasingly asymmetric and narrower, the latter demanding a concomitant increase in their modes. The much larger bandwidth used in the example of (b) results in much greater smoothing.

Distribution, for lead times of 3 days, of (a) least squares and likelihood cross-validation bandwidths and the optimal Gaussian bandwidth and (b) the difference between the least squares and likelihood cross-validation bandwidth. The negative mode in the density plotted in (b), as well as its asymmetry, confirms the fact that, on average, the likelihood cross-validation bandwidth exceeds that obtained from the least-squares cross-validation selection algorithm.

Distributions of least squares and likelihood cross-validation bandwidths and the optimal Gaussian bandwidth for lead times of 1 vs 10 days, forecasting 1-day accumulations.

Distributions of smoothing bandwidths bw_{0}, bw_{0}/5, and bw_{0}/10, and the cross-validation bandwidths. The distributions of bw_{0}/5, bw_{0}/10, and the least squares cross-validation bandwidth cluster near the origin, while the optimal Gaussian bandwidth is much larger than any of the others. The bandwidths produced by the likelihood cross-validation algorithm are intermediate between these two extremes.

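To make the bandwidths being compared concrete, here is a minimal sketch (not the paper's implementation) contrasting the optimal Gaussian rule-of-thumb bandwidth with one chosen by leave-one-out likelihood cross-validation; the gamma test sample and the search grid are illustrative assumptions.

```python
import numpy as np

def loo_log_likelihood(data, bw):
    """Leave-one-out log likelihood of a Gaussian KDE with bandwidth bw."""
    n = len(data)
    z = (data[:, None] - data[None, :]) / bw
    k = np.exp(-0.5 * z ** 2) / (bw * np.sqrt(2 * np.pi))
    np.fill_diagonal(k, 0.0)              # each point is left out of its own estimate
    dens = k.sum(axis=1) / (n - 1)
    return np.log(dens).sum()

rng = np.random.default_rng(0)
sample = rng.gamma(3.0, 1.0, size=17)     # synthetic 17-member "ensemble"

# Optimal Gaussian (Silverman rule-of-thumb) bandwidth
bw_gauss = 1.06 * sample.std(ddof=1) * len(sample) ** (-0.2)

# Likelihood cross-validation bandwidth via a simple grid search
grid = np.linspace(0.05, 3.0, 200)
bw_llcv = grid[np.argmax([loo_log_likelihood(sample, b) for b in grid])]
```

A least-squares cross-validation selector would replace the log-likelihood objective with an estimate of the integrated squared error, but the grid-search structure is the same.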
Pdf’s of EPS forecasts at Greenwood, Nova Scotia, of the precipitation amount for the 1-day period ending at (a) 0000 UTC 12 Feb 2004 and (b) 0000 UTC 15 Feb 2004. It is difficult to see why precipitation of ∼2.8 mm [under the local maximum labeled “A” in (b)] is so much more likely than an amount of ∼2.25 mm [under the local minimum labeled “B” in (b)].

Impact of increasing lead time on the pdf’s: for a lead time of 1 day all of the member forecasts cluster fairly closely together around 70 mm. As the lead time grows the member forecasts disperse into clusters, with a concomitant proliferation of modes in the KDEs for the smallest bandwidths.

(a) Cdf and (b) pdf for a precipitation event at St. John’s. The KDE obtained with the optimal Gaussian fits the raw EPS forecast much worse than the KDEs obtained with the other bandwidths.

Distribution of occurrences of zeros among EPS member forecasts.

Illustration of the adjustment in probability model to accommodate member forecasts of nil precipitation. In this case 5 of the 17 EPS members forecast nil precipitation, yielding a value for the raw cdf at the origin of 5/17 = 0.294. The vertical origin of the KDE models has been translated upward by this amount, computing the smoothing bandwidths on the subsample of nonzero member forecasts.

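The adjustment for nil forecasts amounts to a mixed discrete-continuous model: a point mass at the origin equal to the fraction of zero members, plus a KDE over the nonzero members scaled by the remaining probability. A hedged sketch, with hypothetical member values and bandwidth rather than the Greenwood data:

```python
import numpy as np
from math import erf, sqrt

def mixed_cdf(x, members, bw):
    """Cdf with a point mass at zero plus a Gaussian KDE over the nonzero
    member forecasts; the bandwidth bw is assumed chosen elsewhere
    (on the nonzero subsample, as in the caption above)."""
    members = np.asarray(members, dtype=float)
    p0 = np.mean(members == 0.0)                  # raw cdf value at the origin
    nonzero = members[members > 0.0]
    kde_cdf = np.mean([0.5 * (1.0 + erf((x - m) / (bw * sqrt(2.0))))
                       for m in nonzero])
    return p0 + (1.0 - p0) * kde_cdf

# Hypothetical 17-member forecast: 5 zeros give a cdf of at least 5/17 at x = 0
members = [0, 0, 0, 0, 0,
           1.5, 2.0, 2.2, 3.1, 4.0, 4.4, 5.0, 5.3, 6.1, 7.0, 8.2, 9.5]
f0 = mixed_cdf(0.0, members, bw=1.0)
```

The "translated upward" vertical origin in the figure corresponds to the `p0` term here.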
The pdf’s of the distributions *f*_{1}(*x*) = *f*_{γ}(*x*; 1/2, 1), *f*_{2}(*x*) = *f*_{γ}(*x*; 3, 1), *f*_{3}(*x*) = 1/20 *f*_{γ}(*x*; 1/2, 1) + 19/20 *f*_{γ}(*x*; 7, 1), and *f*_{4}(*x*) = 1/4 *f*_{γ}(*x*; 2, 1) + 3/4 *f*_{γ}(*x*; 7, 1). Here, *f*_{1} and *f*_{2} are unimodal, the mode of the former being located at the origin, while *f*_{3} and *f*_{4} are bimodal, with the principal mode of *f*_{3} located at the origin. Samples of 17 data points were repeatedly selected randomly from these distributions, KDEs fitted thereto, and the resulting KDE and empirical models compared against the known distribution.
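The simulation described in this caption can be approximated as follows; scipy's `gaussian_kde` with its default (Scott) bandwidth stands in for the paper's estimators, and the discretized L1 distance on a grid is an illustrative choice of distance, not necessarily the one used in the tables.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def f4(x):
    """True density f4(x) = 1/4 f_gamma(x; 2, 1) + 3/4 f_gamma(x; 7, 1)."""
    return 0.25 * stats.gamma.pdf(x, 2) + 0.75 * stats.gamma.pdf(x, 7)

def sample_f4(n):
    """Draw n points from the two-component gamma mixture."""
    comp = rng.random(n) < 0.25
    return np.where(comp, rng.gamma(2.0, 1.0, n), rng.gamma(7.0, 1.0, n))

grid = np.linspace(0.0, 20.0, 400)
dx = grid[1] - grid[0]
dists = []
for _ in range(200):                       # repeated 17-point samples, as in the text
    s = sample_f4(17)
    kde = stats.gaussian_kde(s)            # scipy's default rule-of-thumb bandwidth
    dists.append(np.abs(kde(grid) - f4(grid)).sum() * dx)
mean_l1 = float(np.mean(dists))
```

The other three densities can be swapped in by changing the mixture weights and shape parameters in `f4` and `sample_f4`.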

Attributes diagrams for warm season forecasts of 24-hour precipitation: (a) 1- and (b) 2-day lead times. Histograms (9 bins) indicating the distributions of the forecasts from KDE_{bw0}, KDE_{llcv}, KDE_{lscv}, and the empirical model are inset in the upper left-hand corner. The unit-slope line indicates perfect reliability. Forecasts on the line bisecting the unit-slope line and the solid horizontal line (sample climatology) have no skill (Wilks 2006). The horizontal dashed line shows the long-term climatology.

(a) BSS and (b) components for warm season forecasts as a function of lead time. The 95% confidence intervals obtained from percentile bootstrap estimates (Efron and Tibshirani 1994) are plotted around each data point. Because BSS is proportional to the resolution minus the reliability, larger resolution and smaller reliability correspond to greater skill.

(a) Attributes diagrams (as in Fig. 14 but with only 6 bins) and (b) BSSs (as in Fig. 15a) of cool season forecasts of 24-hour precipitation exceeding 4 mm at St. John’s. The 95% confidence intervals obtained from percentile bootstrap estimates are plotted around each data point.

Calibration of (a) cool and (b) warm season model forecasts, as a function of percentile. The 95% confidence intervals, obtained from percentile bootstrap estimates, are plotted around each data point. Perfectly calibrated forecasts lie on the unit-slope line.

Distances between estimators and true density: *f*_{1}(*x*) = *f*_{γ}(*x*; 1/2, 1).

Distances between estimators and true density: *f*_{2}(*x*) = *f*_{γ}(*x*; 3, 1).

Distances between estimators and true density: *f*_{3}(*x*) = 1/20*f*_{γ}(*x*; 1/2, 1) + 19/20*f*_{γ}(*x*; 7, 1).

Distances between estimators and true density: *f*_{4}(*x*) = 1/4*f*_{γ}(*x*; 2, 1) + 3/4*f*_{γ}(*x*; 7, 1).

^{1} Also termed the *discrete* (Parzen 1960) or *sample* distribution function (Rohatgi 1976).

^{2} Also known as the *indicator function* (Bowman and Azzalini 1997).

^{3} Because the support of the random variable is confined to [0, ∞), we cannot employ the conventional *δ* function *δ*(*x*), because this would introduce a factor of 1/2 in (11) (Heitler 1984). Here, *δ*^{+} can be defined in terms of *δ* by *δ*^{+} ≡ 2*δ*|_{[0,∞)}.

^{4} It is neither symmetric nor transitive.

^{5} Here, BSS = (BS_{res} − BS_{rel})/BS_{unc}, where BS_{unc} is the variance of the observations.
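Footnote 5's skill score can be made concrete with a short sketch of the Murphy decomposition of the Brier score; the 6-bin scheme and the synthetic, perfectly calibrated forecasts below are assumptions for illustration only.

```python
import numpy as np

def brier_decomposition(p, o, bins=6):
    """Murphy decomposition of the Brier score into reliability,
    resolution, and uncertainty, for probability forecasts p of
    binary outcomes o, using equally spaced probability bins."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    obar = o.mean()
    unc = obar * (1.0 - obar)                    # variance of the observations
    edges = np.linspace(0.0, 1.0, bins + 1)
    idx = np.clip(np.digitize(p, edges) - 1, 0, bins - 1)
    rel = res = 0.0
    for k in range(bins):
        m = idx == k
        if m.any():
            pk, ok, nk = p[m].mean(), o[m].mean(), m.sum()
            rel += nk * (pk - ok) ** 2           # reliability: forecast vs observed freq
            res += nk * (ok - obar) ** 2         # resolution: bins vs climatology
    rel /= len(p)
    res /= len(p)
    return rel, res, unc

rng = np.random.default_rng(2)
p = rng.random(1000)
o = (rng.random(1000) < p).astype(float)         # calibrated synthetic forecasts
rel, res, unc = brier_decomposition(p, o)
bss = (res - rel) / unc                          # footnote 5's definition of BSS
```

Since res ≤ unc and rel ≥ 0, the BSS so defined is bounded above by 1, attained only by perfectly reliable forecasts with maximal resolution.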