## Abstract

A bivariate ensemble model output statistics (EMOS) technique for the postprocessing of ensemble forecasts of two-dimensional wind vectors is proposed, where the postprocessed probabilistic forecast takes the form of a bivariate normal probability density function. The postprocessed means and variances of the wind vector components are linearly bias-corrected versions of the ensemble means and ensemble variances, respectively, and the conditional correlation between the wind components is represented by a trigonometric function of the ensemble mean wind direction. In a case study on 48-h forecasts of wind vectors over the North American Pacific Northwest with the University of Washington Mesoscale Ensemble, the bivariate EMOS density forecasts were calibrated and sharp, and showed considerable improvement over the raw ensemble and reference forecasts, including ensemble copula coupling.

## 1. Introduction

The past two decades have seen a change of paradigms in weather forecasting, in that ensemble prediction systems have been developed and implemented operationally (Leutbecher and Palmer 2008). Ensemble systems seek to reflect and quantify sources of uncertainty in numerical weather forecasts, such as imperfections in initial conditions and incomplete mathematical representations of the atmosphere. Despite the ubiquitous positive spread–skill relationship (Whitaker and Loughe 1998; Grimit and Mass 2002), ensemble forecasts tend to be biased, and typically they are underdispersed (Hamill and Colucci 1997), in that the ensemble spread is too small to be realistic. Furthermore, differing spatial resolutions of the forecast grid and the observation network may need to be reconciled.

To address these shortcomings, various techniques for the statistical postprocessing of ensemble model output have been developed (Wilks and Hamill 2007), including ensemble model output statistics (EMOS) or nonhomogeneous Gaussian regression (Gneiting et al. 2005; Thorarinsdottir and Gneiting 2010). The EMOS technique transforms a raw ensemble forecast into a predictive probability density function, and simultaneously corrects for biases and dispersion errors. EMOS methods have been developed for temperature and surface pressure (Gneiting et al. 2005; Hagedorn et al. 2008; Kann et al. 2009), where the predictive density is normal and the method is often referred to as nonhomogeneous Gaussian regression, for quantitative precipitation (Wilks 2009) and for wind speed (Thorarinsdottir and Gneiting 2010; Thorarinsdottir and Johnson 2012). In all these implementations, the predictive density applies to a univariate weather quantity.

In this current paper, we propose and develop an EMOS technique for a bivariate weather quantity, namely surface wind vectors, comprising both zonal and meridional components or, in an alternative but mathematically equivalent representation, wind speed and wind direction. Probabilistic forecasts of wind conditions are critical in a wide range of applications, including air traffic control, ship routing, recreational and competitive sailing, and wind energy, where their societal and monetary value is huge (Marquis et al. 2011). Until very recently, wind speed and wind direction have been addressed individually in statistical postprocessing without taking dependencies into account (Bao et al. 2010; Sloughter et al. 2010; Thorarinsdottir and Gneiting 2010; Thorarinsdottir and Johnson 2012). However, in many of the aforementioned applications it is important to honor the full information about the bivariate structure of the future wind vector that is provided by the ensemble. Thus, our EMOS postprocessed forecasts take the form of elliptically symmetric bivariate normal densities, as illustrated in Fig. 1 in an application to the University of Washington Mesoscale Ensemble (UWME; Eckel and Mass 2005). A description of the UWME is given in section 3a.

The remainder of the paper is organized as follows. In section 2 we provide the details of the bivariate EMOS technique. A case study is presented in section 3, where we consider 48-h ahead forecasts of wind vectors over the North American Pacific Northwest in 2008 based on the eight-member UWME. The paper concludes in section 4, where we hint at future developments and discuss the similarities and differences between our EMOS technique, the BMA approach of Sloughter (2009) and Sloughter et al. (2012), ensemble copula coupling (ECC; Schefzik 2011), and the postprocessing method proposed by Pinson (2012), all of which are directed at the bivariate postprocessing of ensemble forecasts of wind vectors. The appendix describes our verification methods.

## 2. Ensemble model output statistics for wind vectors

A wind vector is determined by wind speed and wind direction, or by its zonal (west–east) and meridional (north–south) components, which we denote by *u* and *υ*, respectively. We now develop an ensemble model output statistics (EMOS) method for wind vectors, where the postprocessed probabilistic forecast takes the form of a bivariate probability density function. The method is tailored to ensembles with relatively few members, such as the eight-member University of Washington Mesoscale Ensemble (UWME; Eckel and Mass 2005), and we illustrate it using forecast and observation data from this ensemble system.

### a. Bivariate normal distribution

Our EMOS postprocessed forecast takes the form of an elliptically symmetric, bivariate normal probability density function for the wind vector (*u*, *υ*), with the parameters of this distribution being specified in terms of the ensemble forecast. The analytic form of the bivariate normal probability density function is

The task now is to specify the five parameters in (1), namely the mean values, *μ _{u}* and

*μ*, of the wind vector components

_{υ}*u*and

*υ*; the corresponding marginal variances, and , respectively; and the correlation coefficient,

*ρ*, between the wind components, in their dependence on the ensemble forecast.

_{uυ}Bivariate normal density forecasts for wind vectors have also been proposed by Gneiting et al. (2008), though in very crude form. The ingenious method of Pinson (2012) estimates a dilation and translation of an ensemble forecast of wind vectors based on bivariate normal densities, and our approach is very similar in its treatment of the mean and variance parameters. However, major differences between the approach of Pinson (2012) and our method lie in the form of the postprocessed forecast, which is a probability density function in our case, rather than a dilated and translated ensemble, and in the explicit modeling of the correlation coefficient *ρ _{uυ}* in our method. For a more detailed comparison and recommendations for a judicious choice of the most appropriate method, given any particular ensemble system and task at hand, we refer to section 4.

In the description that follows, we consider a general ensemble with *m* members, and denote the individual wind vector forecasts by (*u*_{1}, *υ*_{1}), … , (*u _{m}*,

*υ*), respectively.

_{m}### b. Means

In our standard implementation of the bivariate EMOS technique, the means *μ _{u}* and

*μ*are bias-corrected versions of the respective ensemble means, in that

_{υ}where and . The bias correction parameters *a _{u}*,

*b*,

_{u}*a*, and

_{υ}*b*are estimated from training data, in ways described in section 2e below. In a slightly more ambitious implementation, the mean components

_{υ}*μ*and

_{u}*μ*are affine functions of the individual ensemble member forecasts, namely

_{υ}where the regression parameters *a _{u}*,

*b*

_{u}_{,1}, … ,

*b*

_{u}_{,m},

*a*, and

_{υ}*b*

_{υ}_{,1}, … ,

*b*

_{υ}_{,m}are estimated from training data. This general version applies to ensembles with nonexchangeable members only and reduces to the standard version when and . In our experiences with the UWME, which has nonexchangeable members, the general version gave only very slightly improved predictive performance, and so we report results for the standard implementation (2) only.

### c. Variances

We specify the marginal variances and of the bivariate normal density forecast (1) as affine functions of the respective ensemble variances, in that

where and . The dispersion correction parameters *c _{u}*,

*d*,

_{u}*c*, and

_{υ}*d*are estimated from training data, as described in section 2e below. To guarantee the nonnegativity of the variances, we constrain the parameters to be nonnegative, using the technique described by Thorarinsdottir and Gneiting (2010).

_{υ}### d. Correlation coefficient

The key characteristic and major innovation in our work is the explicit modeling of the correlation coefficient *ρ _{uυ}* of the postprocessed bivariate normal density forecast (1).

To motivate our specification of *ρ _{uυ}*, we consider ensemble forecast and observation data from the UWME in calendar year 2007, comprising a total of 23 250 forecasts cases at 79 meteorological stations in the Pacific Northwest. Figure 2 distinguishes nine sectors for (

*u*,

*υ*) wind vector forecasts and observations. Figure 3 shows wind vector observations in 2007, conditionally on the raw ensemble mean wind vector falling into any given sector. The conditional distributions are partly elliptically contoured, particularly in the first sector, and partly skewed, with an orientation that depends strongly on the sector, and they reflect the discretized nature of wind observations, as discussed in more detail in section 3.

The left-hand panel in Fig. 4 plots the correlation coefficient between the wind components in the scatterplots in Fig. 3 as a function of the wind directions that correspond to the centers of sectors 2–9. The panel demonstrates that the ensemble mean wind direction ought to have a profound influence on the correlation coefficient *ρ _{uυ}* in the postprocessed EMOS density forecast (1). Thus, we model the correlation coefficient

*ρ*as a trigonometric function of the ensemble mean wind direction

_{uυ}*θ*, measured in degrees, in that

where the parameters *r*, *s*, *k*, and *ϕ* are estimated from training data, in ways described in section 2e below. The coefficients *r* and *s* concern the overall magnitude of the correlation coefficient and need to satisfy |*r*| + |*s*| ≤ 1. The parameter *k* corresponds to the number of periods of the trigonometric function, which we constrain to be either *k* = 1, 2, or 3, and the direction *ϕ* encodes phase information.

We apply the correlation model (4) to forecasts with ensemble mean vectors in all nine wind sectors, including the first sector. Alternatively, if the ensemble mean wind vector falls into the central first sector, one can take *ρ _{uυ}* to be equal to the empirical correlation coefficient in the corresponding scatterplot. In our experiences with the UWME, the two approaches resulted in nearly identical predictive performance.

### e. Estimation

Our EMOS postprocessed density forecast for a wind vector takes the form of the bivariate normal probability density (1), where the relationships of *μ _{u}*,

*μ*, , and

_{υ}*ρ*to the raw ensemble forecast are specified in (2), (3), and (4). It remains to estimate the parameters that govern these equations, and we address this task in three phases.

_{uυ}Before describing the phases of our estimation scheme, it is worth noting that we follow Thorarinsdottir and Gneiting (2010) and consider two distinct variants, which we call the Regional EMOS and the Local EMOS method, respectively. In the Regional EMOS method, only one set of parameters is estimated and used to produce forecasts over the entire ensemble domain, such as the Pacific Northwest for the UWME. Training sets thus comprise data from all stations. In contrast, the Local EMOS method uses training data only from the station at hand, and thus obtains a distinct set of parameters for each station.

We now turn to the first phase of our estimation scheme, in which we fit the correlation model (4). We do this offline, once and for all, based on historic forecast and observation data. Specifically, for the UWME, we use data from calendar year 2007 to form the conditional scatterplots in Fig. 3 and compute and plot the corresponding empirical correlation coefficients, as illustrated in Fig. 4. We then choose a suitable value for the number of cycles *k*, and fit the remaining parameters, *r*, *s*, and *ϕ*, of the correlation model (4) by a weighted nonlinear least squares technique, using the R function nonlinear least squares (NLS) with the weights being proportional to the number of observations in the sectors (R Development Core Team 2011). The use of a weighted least squares technique is critical, particularly for the Local EMOS technique, as local wind patterns may result in very few observations being available in any given sector. This first phase of the estimation was done once and for all using historic data from 2007, and the fitted correlation model was applied throughout the calendar year 2008, which we took as our test period.

The left-hand panel in Fig. 4 shows the fitted correlation model for the Regional EMOS method, where we chose *k* = 2 and obtained weighted least squares estimates of *r* = 0.20, *s* = −0.15, and *ϕ* = −61.9, respectively. As noted, the Regional EMOS method uses these parameters throughout the UWME domain and throughout the test period in the calendar year 2008. The right-hand panel shows the Local EMOS correlation model at Seattle–Tacoma (Sea-Tac) Airport, where we took *k* = 1 and obtained weighted least squares estimates of *r* = 0.24, *s* = 0.07, and *ϕ* = 70.5, respectively. The fitted Local EMOS correlation models at the remaining stations in the UWME domain are provided and illustrated in the appendix of Schuhen (2011).

In contrast to the first phase of our estimation scheme, the second and third stages proceed online, that is, they use rolling training periods consisting of data from the recent past. The Regional EMOS method uses all available data from the Pacific Northwest from the last *n* days available prior to the forecast being made. Missing data are simply omitted from the training set. For the Local EMOS method, the training period comprises data from the station at hand from the *n* most recent days where forecasts and observations were available. In either case, we talk of a sliding *n*-day training period.

In the second phase of our estimation scheme, the parameters *a _{u}*,

*b*,

_{u}*a*, and

_{υ}*b*in the specification (2) of the mean vector are estimated from the training data by standard linear least squares regression. In the third phase, the parameters

_{υ}*c*,

_{u}*d*,

_{u}*c*, and

_{υ}*d*in the specification (3) of the marginal variances are estimated on the same rolling training period by the maximum likelihood technique, with all other parameters being held fixed. In other words, we maximize the logarithm of the likelihood function:

_{υ}as a function of the parameters *c _{u}*,

*d*,

_{u}*c*, and

_{υ}*d*in the variance model (3), which are all constrained to be nonnegative. The sum in the log likelihood function extends over all locations

_{υ}*x*and times

*t*for which there are data in the training set. Any single term of the form

*f*

^{(x,t)}(

*c*,

_{u}*d*,

_{u}*c*,

_{υ}*d*) refers to the bivariate normal density (1) evaluated at the verifying values

_{υ}*u*=

*u*

^{(x,t)}and

*υ*=

*υ*

^{(x,t)}, with

*μ*,

_{u}*μ*, and

_{υ}*ρ*set at the numerical values implied by the mean model (2) and the correlation model (4), based on the parameter estimates from the first two phases of our estimation scheme, and putting and , where and denote the ensemble variances at location

_{uυ}*x*and valid time

*t*, respectively. The optimization is performed numerically with the optim function in R, using the Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm with initial values provided by the previous day’s estimates. The BFGS algorithm is a gradient optimization method of quasi-Newton type where the Hessian matrix is updated using rank-one updates (Bertsekas 1999).

The interpretation of the right-hand side of (5) as a log likelihood function is valid only if the forecast errors are independent between times and locations. While this is usually not the case, an alternative interpretation as a mean logarithmic score permits us to view the estimates as optimum score estimates, which are tailored to the estimation of forecasting models (Gneiting et al. 2005).

### f. Example

Figure 5 and Table 1 illustrate the postprocessed Local and Regional EMOS density forecasts of the surface wind vector at Sea-Tac Airport, valid at 0000 UTC 20 October 2008, at a prediction horizon of 48 h. Thus, the forecast concerns the same valid time and the same prediction horizon as the Local EMOS forecasts illustrated in Fig. 1, where the station at Sea-Tac Airport is located in the southeast Puget Sound area. The postprocessed density forecasts correct for the biases and underdispersion in the raw UWME. The Local EMOS forecast retains the positive correlation structure in the raw ensemble and is sharper than the Regional EMOS forecast.

## 3. Case study: Forecasting surface wind vectors over the Pacific Northwest

We now consider the out-of-sample predictive performance of our two-dimensional EMOS technique in a case study for wind vector forecasts over the North American Pacific Northwest in 2008, based on the UWME (Eckel and Mass 2005). We compare to the raw ensemble forecast and various reference techniques, such as ensemble copula coupling, and assess the performance of the bivariate EMOS technique when only forecasts of wind speed are desired.

### a. University of Washington Mesoscale Ensemble

In our case study, the test set consists of forecasts of surface (10 m) wind vectors based on the UWME with a valid date in calendar year 2008, at a prediction horizon of 48 h. The UWME is an eight-member multianalysis ensemble then based on the fifth-generation Pennsylvania State University–National Center for Atmospheric Research Mesoscale Model (MM5) with initial and lateral boundary conditions obtained from operational centers around the world. The forecasts were made on a 12-km grid over the Pacific Northwest region of western North America, including the U.S. states of Washington, Oregon, and Idaho, as well as the southern part of the Canadian province of British Columbia. The forecasts were bilinearly interpolated from the four surrounding grid points to the observation locations and rotated to match the true direction at each station.

Surface wind vector observations were provided by the weather observation stations in the Automated Surface Observing System network (National Weather Service 1998). The vector wind quantity studied here is horizontal instantaneous surface wind, where “instantaneous” means that the wind was measured and averaged over the last 2 min before the valid time at 0000 UTC. The wind vector observations were recorded as wind speed and wind direction, where wind speed was rounded to the nearest whole knot (kt), where 1 kt = 0.5144 m s^{−1}, while values below 2 kt were recorded as zero. The observations are thus discretized, as is easily recognizable in Fig. 3. Quality control procedures as described in Baars (2005) were applied to the entire dataset, removing dates and locations with any missing forecasts or observations.

For calendar year 2008, a total of 19 282 pairs of ensemble forecasts and observations were available on 291 distinct days and at 79 distinct observation locations. Additional data from the years 2006 and 2007 were used to provide an appropriate rolling training period for all days in 2008, for the first phase estimation of the correlation model (4), and to establish the optimal length of the rolling training period. Further information about the UWME, now using the Weather Research and Forecasting (WRF) mesoscale model, as well as real-time forecasts and observations, can be found online (http://www.atmos.washington.edu/~ens/uwme.cgi).

### b. Training periods for Regional and Local EMOS

As noted, we distinguish between Regional and Local EMOS forecasts. The Regional EMOS method uses all available training data to estimate a single set of parameters that is used throughout the Pacific Northwest domain. The same set of parameters can thus also be used to postprocess the forecast at each location on the model grid to obtain postprocessed forecasts over the entire region. The Local EMOS technique uses training data from the station at hand only to obtain a distinct set of parameters at each station. Thus, the method applies at observation stations only, and cannot be used directly on the model grid.

To determine a suitable value of the length *n* of the rolling training period for phases 2 and 3 of our estimation scheme, as described in section 2, we considered experiments with wind vector forecasts based on the UWME in calendar year 2007. In these experiments, phase 1 of the estimation scheme used (here in sample) data from 2007, while phases 2 and 3 were done on a rolling training period of length *n* days. The predictive performance was evaluated using the mean energy score and the bivariate mean absolute error, as described in the appendix. For the Regional EMOS method, these metrics differed by less than half a percent as the length *n* of the rolling training period varied between 25 and 60. Figure 6 summarizes the results for the Local EMOS method, where we follow Thorarinsdottir and Gneiting (2010) and consider the observational locations in Washington State. Based on these results, our case study in calendar 2008 uses a rolling training period of length *n* = 30 days for the Regional EMOS method, and of length *n* = 40 days for the Local EMOS technique. However, training periods of any length between *n* = 40 and *n* = 60 days work well, and the predictive performance of the EMOS methods is insensitive to this choice.

The computational costs of postprocessing are negligible, and a real-time implementation is readily feasible. Using the aforementioned training periods, the postprocessing takes less than a second per day for the Regional EMOS method, and less than 0.03 s day^{−1} and station for the Local EMOS technique, on a standard laptop computer.

### c. Reference forecasts

We compare the postprocessed Regional and Local EMOS forecasts to the raw UWME forecast, as well as to postprocessed reference forecasts, as described now. For all reference forecasts, we distinguish regional and local methods, using rolling training periods of the most recent available *n* = 30 and *n* = 40 days, respectively.

A natural reference standard is the Independent EMOS technique, which applies the standard EMOS or heterogeneous Gaussian regression technique of Gneiting et al. (2005) to each of the vector wind components *u* and *υ* individually, and then combines them under the assumption of independence. This results in a postprocessed bivariate normal density forecast of the form (1) with the critical correlation parameter *ρ _{uυ}* constrained to be zero, but with the means and variances for

*u*and

*υ*being essentially identical to those in the bivariate EMOS method, except for minor differences due to the slightly differing estimation schemes. In particular, the bivariate EMOS method and the Independent EMOS technique yield essentially identical deterministic, bivariate mean and median wind vector forecasts, even though the respective bivariate predictive densities may differ substantially.

The ECC method, originally hinted at by Bremnes (2007) and Krzysztofowicz and Toth (2008) and recently investigated and developed by Schefzik (2011), is a tool for restoring a raw ensemble’s flow-dependent rank dependence structure in individually postprocessed predictive distributions. Here, we describe and apply the method in the context of the Independent EMOS technique and a raw ensemble with *m* members. At any given location and prediction horizon, let and denote samples from the individually postprocessed univariate EMOS distributions. As the postprocessed distributions for the wind vector components are univariate Gaussian, these are ordered random samples drawn from a univariate normal density. Let *u*_{1}, … , *u _{m}* and

*υ*

_{1}, … ,

*υ*denote the raw ensemble values for the respective wind components, and let

_{m}*τ*and

_{u}*τ*be permutations of the numbers 1, … ,

_{υ}*m*such that

with any ties resolved at random. The ECC ensemble then consists of the *m* wind vectors:

Thus, the ECC ensemble inherits and honors the raw ensemble’s rank dependence structure. For example, if the first raw ensemble member shows the second lowest *u* component and the third highest *υ* component among the raw ensemble members, then the same property holds true for the ECC ensemble.

Schefzik (2011) provides a detailed discussion of the ECC technique and exposes its ties to copulas (Schölzel and Friederichs 2008). The wind vector postprocessing technique recently proposed by Pinson (2012) can be interpreted as a particularly attractive variant, where the values are constructed as a translation and dilation of the raw ensemble values *u*_{1}, … , *u _{m}*, and as a translation and dilation of

*υ*

_{1}, … ,

*υ*, respectively, rather than being drawn at random. Consequently, the postprocessed ensemble forecast retains both the raw ensemble’s bivariate Spearman rank correlation and its (standard) Pearson product moment correlation structure.

_{m}Finally, we consider an error dressing ensemble as proposed by Gneiting et al. (2008) in the spirit of the work of Roulston and Smith (2003), where we dress the UWME mean forecast with 35 error vectors from the corresponding training set.

### d. Results over the Pacific Northwest

We now give verification results aggregated over our test period, the calendar year 2008, and the Pacific Northwest domain of the UWME. Details about the verification tools, which include the multivariate rank histogram, as proposed by Gneiting et al. (2008) for calibration checks, and proper scoring rules (Gneiting and Raftery 2007), are given in the appendix. All scores used are negatively oriented, that is, the smaller the better.

Figure 7 shows multivariate rank histograms for the raw ensemble forecast and regional and local versions of the Independent EMOS, ECC, error dressing, and (bivariate) EMOS techniques. The raw ensemble forecast shows a U-shaped rank histogram, thus indicating underdispersion. In contrast, the Independent EMOS forecasts are overdispersed, as they neglect to take dependencies between the wind vector components into account, which can be corrected for with the ECC technique. In addition to the ECC method, the error dressing and (bivariate) EMOS techniques show uniform rank histograms, as expected from a calibrated forecast.

Table 2 provides numerical summary measures of the predictive performance, with the reliability index Δ for the multivariate rank histogram confirming the visual impression in Fig. 7. The energy score is a direct analog of the continuous ranked probability score for univariate quantities that provides an overall assessment of the quality of a probabilistic forecast, addressing both calibration and sharpness. The bivariate absolute error generalizes the absolute error and assesses deterministic forecast skill.

Not surprisingly, the local approaches outperform the regional approaches, both in terms of the mean energy score and the bivariate mean absolute error. Also, the postprocessed Independent EMOS forecasts show lower scores than the raw forecast. The Independent EMOS and the (bivariate) EMOS forecasts have nearly identical bivariate mean absolute error, at a substantially lower value than for the other types of forecasts, including the ECC approach. This effect can be attributed to a discretization effect, in that the ECC technique turns the Independent EMOS density forecast into a discrete ensemble forecast. However, the ECC approach improves on the Independent EMOS forecast in terms of the energy score, as it honors the raw ensemble’s flow-dependent bivariate dependence structure. The error dressing technique also shows good probabilistic forecast skill, as evidenced by a low energy score, even though it is unable to match the scores of the (bivariate) EMOS method, which performs the best. When compared to the raw ensemble, the postprocessed Local EMOS forecast reduces the mean energy score from 2.47 to 1.87 m s^{−1}.

In Fig. 8, the mean energy score of the raw ensemble forecast at each station is compared to the corresponding scores for both Local EMOS and Regional EMOS. The EMOS postprocessing consistently improves the mean score, whether the parameter estimation is performed locally or regionally. The mean energy score for the raw ensemble is particularly high at five stations. The four highest scores are at stations where large parts of the data for 2008 are missing and where wind speeds were frequently high, such as at Port Hardy Airport on the northeastern coast of Vancouver Island in British Columbia. With large parts of the data missing, the 40-day training set for the Local EMOS method may consist of data from distinct seasons and/or weather regimes, which makes parameter estimation difficult. The fifth highest score for the raw ensemble forecast is at the location Sparwood in British Columbia. While most of the data are available for this station, it is located high in the Canadian Rocky Mountains at an elevation of 1137 m. Here, the raw ensemble score of 4.06 improves to 2.54 m s^{−1} under Regional EMOS. Local EMOS can adapt the postprocessing parameters to the location-specific setting, which results in a score of 1.50 m s^{−1}.

The results in Table 2 may similarly be stratified by seasons. Such a stratification reveals that both the mean energy score and the bivariate mean absolute error for the raw ensemble forecast are substantially lower in the autumn than in the rest of the year. The largest improvement in the predictive performance of Local EMOS over Regional EMOS is obtained during the summer months.

### e. Results at Sea-Tac Airport

Next we consider forecasts at the observation station at Sea-Tac Airport, with summary measures of the predictive performance in calendar year 2008 being provided in Table 3. The results mimic those for the Pacific Northwest. The Local EMOS forecast shows the highest probabilistic forecast skill, as quantified by the energy score, which decreases to 1.94 m s^{−1}, as compared to 2.25 m s^{−1} for the raw ensemble.

For an illustration and explanation of how and to what extent the Local EMOS technique succeeds in improving the raw ensemble forecast at Sea-Tac Airport, consider the display in Fig. 9, which is a bivariate variant of the marginal calibration diagram proposed by Gneiting et al. (2007). Essentially, a marginal calibration diagram compares the observed climatology to the climatology incurred by the forecasts. In our display we plot, for each day for which data are available, the observed wind vector, perturbed very slightly in order to honor the undiscretized distribution and improve readability, a randomly chosen member of the UWME raw ensemble, and a wind vector sampled from the postprocessed, bivariate normal Local EMOS density forecast. Again we observe that the raw ensemble is underdispersive and fails to predict extreme wind vectors. The postprocessed Local EMOS forecast corrects for the underdispersion, thus leading to substantially improved probabilistic forecast skill, as evidenced by the energy score. However, the raw ensemble does not show any recognizable biases, and so the improvement in deterministic forecast skill is minor.

### f. Results for wind speed

In addition to producing calibrated and sharp forecasts of wind vectors, the bivariate EMOS method can be used to predict wind speeds, and so can be compared to the technique of Thorarinsdottir and Gneiting (2010), which is custom tailored to this task. As wind speed is a nonnegative quantity, this method employs truncated normal predictive distributions, using an estimation scheme that is based on optimum score estimation, both in regional and local versions, where we use rolling training periods comprising the most recent *n* = 30 and *n* = 40 days available, respectively. In what follows, we refer to the method of Thorarinsdottir and Gneiting (2010) as wind speed EMOS.

To generate probabilistic forecasts of wind speed from the bivariate EMOS forecast, we sample 100 wind vectors from the bivariate predictive distribution, and compute the Euclidean norm of each vector, thereby obtaining a discrete forecast ensemble of size *m* = 100 for wind speed. Table 4 compares the predictive performance of this approach to that of the wind speed EMOS technique. As noted, the wind speed observations are strongly discrete, with wind speeds below 2 kt recorded as zero, which applies to about 14% of the observations in the test period. From the perspective of wind vectors, these observations tend to fall into the center of the respective predictive distribution, so the effect of the discretization is weak. From the perspective of wind speeds, an observed value of zero is right at the boundary of the climatological range, so the effect is nonnegligible. To account for it, we replace every observation of zero by a number drawn uniformly and at random between 0 and 2 kt, or between 0 and 1.03 m s^{−1}, when computing the corresponding continuous ranked probability score or absolute error. The bivariate EMOS technique, particularly in its local version, nearly matches the predictive performance of the specialized wind speed EMOS technique of Thorarinsdottir and Gneiting (2010).

## 4. Discussion

In this paper, we have proposed a bivariate EMOS approach to the statistical postprocessing of ensemble forecasts of wind vectors that results in bivariate normal density forecasts. In experiments with 48-h-ahead forecasts of surface wind vectors over the Pacific Northwest, based on the University of Washington Mesoscale Ensemble (UWME), the postprocessed EMOS density forecast proved to bias correct and calibrate the raw ensemble forecast, therefore resulting in strongly improved deterministic and probabilistic predictive skill. When compared to the raw ensemble forecast and aggregated over calendar year 2008 and the Pacific Northwest, the Local EMOS technique reduced the bivariate mean absolute error by 13% and the bivariate mean energy score by 24%.

Our bivariate EMOS model has a total of 12 parameters, and it is useful to break the parameter estimation into phases. The correlation structure between the raw ensemble mean and the observations remains close to constant over time, and so we fit the trigonometric function (4) offline. In contrast, the mean and variance parameters in (2) and (3) may vary substantially over time due to seasonal changes in the forecast bias and spread–skill relationship. These parameters are thus estimated using a rolling training period, in procedures that have been shown to work well for other postprocessing methods such as Bayesian model averaging (BMA; Raftery et al. 2005).

There are several directions into which our bivariate EMOS method could be developed. In phases 2 and 3 of the estimation scheme, the exponential forgetting approach as in Pinson (2012) could be implemented, where the parameter estimates are updated in a computationally efficient, adaptive way. Furthermore, a geostatistical approach such as that of Kleiber et al. (2011) in the context of BMA could be developed in order to spread the Local EMOS estimates, which are currently available at observation locations only, over the model grid. Alternatively, if an analysis is used to fit the EMOS model, as was done by Pinson (2012), training data are available on the analysis grid, thereby allowing for a gridded Local EMOS approach.

A closer look at the scatterplots in Fig. 3 suggests that the conditional distribution of the observed wind vector, given the ensemble forecast, tends to be skewed. Thus, our bivariate EMOS approach, which is currently based on bivariate normal densities, could be extended to allow for skewed distributions, such as bivariate skew-normal or skew-*T* densities, similar to the wind vector time series methods proposed by Hering and Genton (2010). However, any such approach would be considerably more complex, and as it is more difficult to estimate a more complex predictive model, it is not clear whether or not such an extension would result in improved forecast performance.

Sloughter (2009) and Sloughter et al. (2012) proposed a bivariate version of the BMA technique for postprocessing ensemble forecasts of wind vectors. Like our EMOS method, the BMA approach results in a bivariate forecast density. However, the BMA forecast density is a finite mixture of bivariate, power-transformed normal densities and thus can be multimodal, as opposed to the EMOS forecast density, which is necessarily unimodal and elliptically symmetric. Thus, the basic setting of the EMOS approach is more parsimonious, thereby allowing for the key innovation in our work, namely, the explicit modeling of the correlation between the *u* and *υ* wind vector components, conditionally on the direction of the ensemble mean vector.

In contrast to the BMA and EMOS methods, the ensemble copula coupling (ECC) technique (Schefzik 2011) and the postprocessing approach proposed by Pinson (2012) generate discrete forecast ensembles that are constrained to have the same number of members, and the same bivariate Spearman rank correlation coefficients, as the raw ensemble. Therefore, these methods are particularly well adapted to large ensembles, where there is no pronounced need for the transition from a discrete forecast ensemble to a density forecast, nor any need for statistical correction of the conditional correlation structure between the wind vector components. Accordingly, both Pinson (2012) and Schefzik (2011) tailored their methods to the 50-member European Centre for Medium-Range Weather Forecasts (ECMWF) ensemble, while our work considered the much smaller eight-member UWME, where the wind vector ensemble forecasts for any given location and valid time may show unreliable, physically unrealistic empirical correlation coefficients. This effect is caused by the small ensemble size and corroborates the need for a parametric correlation model.

In this paper, we considered probabilistic forecasts of wind vectors for a single location and valid time. While we modeled the bivariate correlation structure between the wind vector components, we considered the postprocessed bivariate density forecasts at each site individually, without modeling the dependence structure between locations. To address the latter, the methods developed in our paper could be combined with the spatial statistical techniques introduced by Gel et al. (2004). A common advantage of the method proposed by Pinson (2012) and the ECC technique (Schefzik 2011) lies in their immediate extension to the calibration of spatiotemporal trajectories, in that the raw ensemble’s bivariate rank dependence structure is inherited by the postprocessed ensemble, subject to the aforementioned caveats.

## Acknowledgments

The authors are grateful to Jeff Baars, Tom M. Hamill, Clifford F. Mass, Adrian E. Raftery, and J. McLean Sloughter for discussions and for providing data. Tom Hamill communicated to us the idea that underlies ensemble copula coupling (ECC), well before we noticed the independent discussions in the work of Bremnes (2007) and Krzysztofowicz and Toth (2008), and well before the term ECC was coined in Schefzik (2011).

### APPENDIX

#### Verification Methods

In verifying probabilistic forecasts of a multivariate weather quantity, we use techniques introduced, studied, and used by Gneiting et al. (2008), Gneiting (2011), and Pinson and Hagedorn (2012).

To assess the calibration of probabilistic forecasts of wind vectors, we use the multivariate rank histogram, which is a natural, direct generalization of the verification rank histogram or Talagrand diagram for a univariate quantity (Anderson 1996; Hamill and Colucci 1997; Talagrand et al. 1997) and can be interpreted analogously, in ways described by Hamill (2001). In particular, U-shaped multivariate rank histograms correspond to underdispersed ensembles, while inverse U- or hump-shaped histograms indicate overdispersed ensembles. For a calibrated ensemble, we expect a uniform rank histogram. However, a uniform rank histogram is a necessary but not a sufficient criterion for a calibrated forecast (Hamill 2001; Gneiting et al. 2007). See also the discussion in Kolczynski et al. (2011) and Marzban et al. (2011), as well as the follow-up paper by Wilks (2011a).

For an ensemble with *m* members, the multivariate verification rank is a possibly randomized number between 1 and *m* + 1, and we refer to Gneiting et al. (2008) for the technical details in its construction. To quantify the departure of the rank histogram from uniformity, we use the discrepancy or reliability index Δ proposed by Delle Monache et al. (2006), given by

where *f _{i}* is the observed relative frequency of verification rank

*i*= 1, … ,

*m*+ 1. The UWME raw ensemble and ensemble copula coupling (ECC) techniques result in a discrete forecast ensemble with

*m*= 8 members, from which the construction of the multivariate rank histogram is straightforward. The error dressing techniques provide discrete forecast ensembles with

*m*= 35 members, and we bin the corresponding 36 ranks in the multivariate rank histogram into nine groups, comprising ranks 1–4, … , 33–36, respectively, to facilitate the comparison. For the Independent EMOS and bivariate EMOS techniques, we draw a simple random sample of size

*m*= 8 from the bivariate normal predictive distribution, and then compute the multivariate rank histogram.

To assess the overall quality of probabilistic forecasts, considering both calibration and sharpness, we use proper scoring rules (Gneiting and Raftery 2007; Wilks 2011b). For a univariate weather quantity, such as wind speed, the proper continuous ranked probability score (CRPS) is defined as

where *P* is the predictive distribution, here taking the form of a cumulative distribution function, *X* and *X*′ are independent random variables with cumulative distribution function *P*, and *y* is the verifying value (Gneiting and Raftery 2007). The term *I*(*x* ≥ *y*) denotes an indicator function, equal to 1 if *x* ≥ *y*, and equal to 0 otherwise, and is the expectation operator. The absolute error is defined as

where med_{P} is a median of the probability distribution *P* (Gneiting 2011; Pinson and Hagedorn 2012).

The energy score was introduced by Gneiting and Raftery (2007) and Gneiting et al. (2008) as a direct generalization of the CRPS (A2) in the evaluation of probabilistic forecasts of multivariate quantities. As we are interested in wind vectors, we restrict the discussion to bivariate weather quantities. The energy score then is defined as

where ‖·‖ denotes the Euclidean norm in , *P* is the predictive distribution, *X* and *X*′ are independent random vectors with distribution *P*, and *y* ∈ is the verifying wind vector. For an ensemble forecast, the predictive distribution *P*_{ens} has point mass at the member forecasts *x*_{1}, … , *x _{m}* ∈ , and the energy score can be evaluated as

We use this formula to compute the energy score for the UWME raw ensemble, as well as the ECC and error dressing techniques. For the Independent EMOS and bivariate EMOS techniques, we draw a simple random sample *x*_{1},… , *x _{k}* ∈ from the corresponding predictive density, and replace the exact energy score (A4) by the computationally efficient approximation:

where we use a sample of size *k* = 10 000. Similar approximations apply to the CRPS in (A2).

The natural generalization of the absolute error (A3) is the bivariate absolute error

where bmed_{P} denotes the bivariate or spatial median of the probability distribution *P*, defined as

where *X* is a random vector with distribution *P* (Vardi and Zhang 2000; Gneiting 2011). For an elliptically symmetric distribution, such as a bivariate normal distribution, the bivariate median and the mean vector coincide. For other types of bivariate distributions, such as an ensemble forecast with point mass at the member forecasts *x*_{1}, … , *x _{m}* ∈ , the bivariate median is generally different from the corresponding mean vector, and typically it needs to be determined numerically. For doing this we use the algorithm described by Vardi and Zhang (2000) and implemented in the R package ICSNP.

In practice, forecasting methods are assessed by averaging scores over a test period, resulting in the mean continuous ranked probability score (CRPS), mean absolute error (MAE), mean energy score (ES), and bivariate mean absolute error (bMAE), respectively. All these quantities are negatively oriented, that is, the smaller the better, and we report their values in the unit of meters per second.

## REFERENCES

_{I}-median and associated data depth