1. Introduction
In recent years, new verification methods have been developed for verifying spatial forecasts of precipitation (e.g., the ICP project at www.ral.ucar.edu/projects/icp; Brown et al. 2012; Ebert 2008; Rossa et al. 2008; Gilleland et al. 2009, 2010; Gilleland 2013a). These spatial verification metrics try to address the shortcomings of the traditionally used nonspatial precipitation verification metrics that suffer from several problems.
A common problem encountered in many forecasts is an offset in the predicted position of a weather event relative to where it actually occurred. When using a nonspatial verification metric, such a forecast incurs a “double penalty” since it is penalized for predicting an event where it did not occur, and again for failing to predict the event where it did actually occur. Another problem is that nonspatial scores do not distinguish between a “near miss” and much poorer forecasts. These problems tend to provide results that are inconsistent with a subjective evaluation of the forecast made by a human analyst. The need to reconcile subjective evaluations of forecast quality with objective measures of forecast accuracy has been the main motivation in researching and developing diagnostic methods for spatial verification that more adequately reflect their worth (Brown et al. 2012).
Although efforts to develop spatial verification metrics for precipitation have been under way for some time, very little improvement has been made regarding wind verification methods. This shortcoming was recognized and addressed by the ongoing Mesoscale spatial forecast Verification Intercomparison over Complex Terrain (MesoVICT) project (Dorninger et al. 2013) that, besides focusing on precipitation verification in complex terrain and ensemble forecasts, lists progress in wind verification as one of its primary goals.
Horizontal wind verification poses a specific problem since wind and precipitation fields are fundamentally different: the former is a vector field while the latter is a scalar field. The problem is usually avoided by converting the wind field into a scalar field by separately verifying the wind speed and wind direction or separately verifying the two wind components, making it more difficult to interpret the results.
At higher altitudes, horizontal wind is usually a relatively smooth variable (with the exception of wind near areas of strong convection or strong wind shear), which changes only gradually while, near the surface, the wind speed and direction might alter dramatically at smaller scales in response to changes in topography. The topography in a numerical model is usually smoothed because of the limitations of the model’s resolution. This smoothed topography can result in different wind patterns in which the winds might have a different speed or direction or be displaced due to the incorrect position of the topographic slopes. The spatial displacement of wind patterns also commonly happens at all altitudes in global models at synoptic scales with large forecast times (e.g., 9 days) since the synoptic-scale features, such as cyclones and frontal systems, might be significantly displaced. Thus, the spatial displacement error of wind is a common occurrence, and a double-penalty problem will arise, much like in the case of precipitation.
The fractions skill score (FSS; Roberts and Lean 2008; Roberts 2008) is a popular spatial verification metric used for verifying precipitation. It uses the concept of a neighborhood to alleviate the requirement that the event should be forecasted at exactly the correct position. The FSS can provide a truthful assessment of displacement errors and forecast skill and also enables forecasts of different resolutions to be compared against a common spatial truth (e.g., radar rainfall analyses) in such a way that high-resolution forecasts are not penalized for representativeness errors that arise from the double-penalty problem (Mittermaier and Roberts 2010; Mittermaier et al. 2013). The score can also be calculated very efficiently at low computational cost (Faggian et al. 2015). On the other hand, the score value can be influenced by the bias in the forecast, the orientation of the displacement, and the existence of a nearby domain border (Mittermaier and Roberts 2010; Skok 2015, 2016; Skok and Roberts 2016). The score, as originally defined, can analyze only scalar variables. Here an attempt is made to extend the score to analyze wind vector fields in a meaningful way. The new score is calculated and analyzed for cases in the MesoVICT database as well as an idealized setup.
Section 2 provides a definition of the new score and an evaluation of the score for a simple idealized case. Section 3 focuses on an analysis of the MesoVICT near-surface wind over the greater Alpine region, while section 4 provides the conclusions.
2. The new score
a. Definition of the 

Since the score is based on the idea of the FSS, the new score is named the wind fractions skill score, and denoted as
Next, the field is decomposed into a number of binary fields according to classes. The binary fields can only have values of 0 or 1. The value of 0 represents locations where a certain wind class is not present, while the value of 1 represents locations where a certain wind class is present. For each input wind vector field, there are













The












It is important to highlight that, contrary to the original FSS, the
It is worth highlighting that the
b. A demonstration of the new score’s spatial aspect
Traditional nonspatial grid-scale metrics only compare values at the same locations. This presents a problem since a common problem in many forecasts is an offset in the predicted position of a weather event relative to where it actually occurred. As previously mentioned, nonspatial scores do not distinguish between a near miss and much poorer forecasts. However, since the









The values of the two scores are calculated for an idealized case shown in Fig. 1. The idealized setup comprises a domain consisting of

A demonstration of the spatial aspect of the new score using an idealized experiment. (a),(b) The layouts of the observed and forecasted wind fields. (c) The dependence of the score value on the displacement of the veered wind classes in the forecasted field. The
Citation: Monthly Weather Review 146, 1; 10.1175/MWR-D-16-0471.1
Figure 1c shows how the values of
At the same time, it is important to note that, while the
3. Analysis of the MesoVICT surface wind
a. Description of the data
The behavior of the new score is tested on the MesoVICT dataset. MesoVICT (Dorninger et al. 2013) is a community project aimed at the development and analysis of verification methods. One of its primary aims is the development of verification methods for wind forecasts. The project also provides a community test bed where common datasets are available.
The MesoVICT test bed dataset consists of several cases. The dataset includes gridded data for surface wind obtained via the Vienna Enhanced Resolution Analysis (VERA) as well as forecasts provided by different models. The data have hourly resolution and are available on an 8-km regular grid in the Alps and surrounding area (Fig. 2). The analysis is performed for six MesoVICT cases (Table 1) on a rectangular domain, consisting of

The MesoVICT domain used for the
Citation: Monthly Weather Review 146, 1; 10.1175/MWR-D-16-0471.1
Description of the MesoVICT cases used for the

The VERA surface wind was obtained with use of the VERA scheme (Steinacker et al. 2000) that performs an interpolation of sparsely and irregularly distributed observations to a regular grid in mountainous terrain. The algorithm is based on a thin-plate spline approach and the analysis is independent of any model data. The resolution of the VERA analysis surface wind data is 8 km. Data from two models are usually available: forecasts from the 2.2-km Swiss model COSMO2 (denoted as CO2_00 and CO2_06 for setups initialized at 0000 and 0600 UTC) provided by MeteoSwiss, and forecasts from the 2.5-km Canadian high-resolution model Global Environmental Multiscale Limited-Area Model (GEMLAM) (denoted as CMH_06) provided by Environment Canada. The wind fields of the selected models and the VERA analysis are provided as ASCII files, with the model fields interpolated onto the VERA analysis grid. More information about the cases, models, and VERA analysis is available in Dorninger et al. (2013).
b. Definition of wind classes and calculation of the score value
To calculate the

(a) Wind rose for VERA analysis wind averaged over all cases and the whole domain. (b) The definition of the five basic wind classes. (c) The class frequency according to the basic class definition averaged over all the MesoVICT cases.
Citation: Monthly Weather Review 146, 1; 10.1175/MWR-D-16-0471.1
The simplest natural choice for differentiation according to wind direction is the four cardinal winds (American Meteorological Society 2017). It may also make sense to define a special wind class to represent calm/low wind, irrespective of the wind direction, since some types of widely used anemometers have problems estimating the wind speed or direction at low wind speeds. For example, medium-size cup anemometers require a minimum wind speed of around 1 m s−1 to work properly (Harrison 2014), which can be used as a threshold for the wind class.
We thus define five basic wind classes, as shown in Fig. 3b. Any wind with a speed below 1 m s−1 is defined as the calm/low wind class (regardless of the wind’s direction). If the wind speed exceeds the 1 m s−1 threshold, the wind is classified into one of the four additional direction-dependent classes (northerly, southerly, easterly, and westerly). Each direction-dependent class spans an azimuthal range of 90°. Figure 3c shows the frequency of wind classes averaged over all the MesoVICT cases. Although some differences do exist, the frequencies are quite evenly spread between different classes (e.g., there is no single class that would totally dominate other classes or some class that would be almost totally absent).
The use of classes presented in Fig. 3b will result in a score evaluating the spatial matching of the areas of the five basic wind classes. While the use of the five basic classes is reasonable in terms of surface wind in mountainous terrain, the classes might be defined differently for other cases. For example, when verifying upper-level jet streams, the direction of the wind might not be considered important and the focus could be solely on the magnitude of the wind speed. In this case, the classes could be defined only according to wind speed and the score would evaluate the spatial matching of the areas of different wind speeds only.
We advise the user to carefully consider how to define the wind classes. As stated before, the class definitions should cover the whole phase space of possible wind values (i.e., to make sure every wind vector can be assigned a wind class)—this will ensure the score value can always be calculated (i.e., no division with zero can occur). Moreover, it makes little sense choosing a definition with an overly large number of classes since this does not reflect how a subjective analysis by a human analyst would be done. A human analyst would not subjectively decompose a wind field into 100 classes before visually estimating a spatial match of the corresponding class regions. It also does not make sense to define classes in such a way that a single class would totally dominate over all the other classes in the dataset (that a single class would cover almost the whole domain in both fields). If this were the case, the score value would always tend to be near-perfect since the overlap of the dominating class would be nearly perfect over the whole domain. When deciding on how to define the wind classes, we advise the user to always analyze the wind rose corresponding to the dataset. Once the class definition is chosen, the frequency of all classes should be checked in order to avoid a single class being overly dominant. To avoid discrimination of certain forecasts, all forecasts should always be verified over the same domain using an identical grid and the same class definitions.
Figures 4–6 demonstrate the process of calculating the score value. Figure 4 shows the original surface wind vector fields for a selected example and the resulting wind class index fields. The index fields are obtained using the five basic classes from Fig. 3b. Figure 5 shows binary wind class fields for the same example. The binary fields are used to calculate the fractions needed by Eq. (1). These fractions can have values between 0 and 1 and represent a portion of the nonzero area inside a certain neighborhood. Figure 6 shows the resulting

An example of conversion of the wind fields into wind class index fields. Shown is the MesoVICT case 3 at 0600 UTC 25 Sep 2007 using the CO2_00 model forecast: (a),(b) the two input near-surface wind vector fields and (c),(d) the resulting wind class index fields according to the five basic wind classes from Fig. 3b.
Citation: Monthly Weather Review 146, 1; 10.1175/MWR-D-16-0471.1

An example of binary wind class fields that are used to calculate the fractions. The figure represents the same case as in Fig. 4.
Citation: Monthly Weather Review 146, 1; 10.1175/MWR-D-16-0471.1

The
Citation: Monthly Weather Review 146, 1; 10.1175/MWR-D-16-0471.1
c. The behavior of the new score in selected examples
To better understand the behavior of the score, it is helpful to examine some specially selected examples. From the MesoVICT dataset, a set of four interesting examples are selected for analysis. For each example in the set, the observational and the forecast wind class index fields are derived (Figs. 7a–d) and the resulting

Analysis of selected 1-h cases from the MesoVICT data. The cases are selected from having the best/worst
Citation: Monthly Weather Review 146, 1; 10.1175/MWR-D-16-0471.1
Example A represents an event with the smallest
Example B represents the opposite of example A, an event with the largest
Example C is chosen similarly to example A (an event with the smallest
The last example, D, shows an event with the lowest asymptotic value. Here a situation with a squall line ahead of a cold front led to widespread convective events in large parts of the area with strong and gusty winds observed in the vicinity of the convective cells. Similarly to some previous examples, the model has problems forecasting the correct position of the convective cells. Moreover, because of the wrong timing of the passage of the weather front, the convection starts earlier in the model, resulting in a squall line and overall stronger winds with the analysis containing only a few isolated convective cells with weak winds in the surrounding regions. In the VERA analysis, yellow (calm/low wind) covers the majority of the domain. In the model, almost no yellow is present with green, red, and blue covering areas of comparable size. This discrepancy represents a significant bias of the forecast, which in turn causes a very low
d. The case-averaged score value
The

The case-averaged
Citation: Monthly Weather Review 146, 1; 10.1175/MWR-D-16-0471.1
The test shows that, when comparing the CHM_06 with the other two models, the distributions of the
e. Analysis of sensitivity to class definitions
In the previous sections, the five basic wind classes defined in Fig. 3b are used. It is reasonable to assume the score value will depend on the chosen thresholds that define the classes, which is especially problematic for the wind direction thresholds. It makes sense to select the direction thresholds so that the direction-dependent classes span azimuthal ranges of equal size, although the exact location of the thresholds seems quite arbitrary. The selection of direction thresholds holds the potential to influence the score quite strongly. For example, there might be a case in which the prevailing wind direction would be very close to the threshold value used to distinguish between north and east wind. In this case, a small change in the wind direction in the forecast could dramatically change the
To test the sensitivity of the

(a) Azimuthal rotation of the five basic wind classes. Angle
Citation: Monthly Weather Review 146, 1; 10.1175/MWR-D-16-0471.1
Luckily, this potentially problematic effect can be minimized by calculating the
This approach is tested over all 896 possible MesoVICT dataset comparisons between the analysis and the model. For each comparison and at each neighborhood size, the difference between the maximal and minimal
In addition to the sensitivity to class rotation, there is another kind of sensitivity that is linked to the number of classes. In the case of Fig. 3b, five wind classes consist of a calm/low wind class and four additional classes according to the wind direction. Yet there is no inherent reason the classes have to be defined in this way. The classes could easily be made more sensitive to the wind direction and an additional four classes representing the northwest, northeast, southwest, and southeast directions included, with each class spanning only a 45° azimuth range (Fig. 10a). In this case, there would be a total of nine classes. Alternatively, additional classes could also be included according to wind speed, for example, 4 additional classes for “strong” winds as shown in Fig. 10b (e.g., the original easterly wind class is split into a “strong easterly wind” class and “medium easterly wind” class), or both modifications could be applied at the same time, resulting in 17 classes (Fig. 10c). When including additional classes, it is reasonable to expect that the overall

Alternative wind class definitions.
Citation: Monthly Weather Review 146, 1; 10.1175/MWR-D-16-0471.1
The impact of using alternative definitions for classes is tested on the MesoVICT cases. Figure 11 shows the analysis for cases 4 and 5, in which values for CMH_06 and CO_00 models are shown. Cases 4 and 5 are shown because they give very clear results when using the original class definition (the difference in model performance is either statistically significant or not, at all neighborhood sizes). When the number of classes increases, the

The impact of alternative wind class definitions on the
Citation: Monthly Weather Review 146, 1; 10.1175/MWR-D-16-0471.1
4. Discussion and conclusions
A new wind verification method is presented and analyzed. The new score is based on the idea of the fractions skill score. First, the original wind vector fields are decomposed into a number of binary fields using predefined wind classes. Second, the fractions are calculated, and the
Analysis of the MesoVICT cases and the idealized setup is performed to identify some properties of the new score. The new score avoids the problems of the nonspatial verification metrics (the double-penalty problem and the failure to distinguish between a near miss and much poorer forecasts) and can distinguish the forecasts even when the spatial displacement of wind patterns is large. Moreover, the time-averaged
The
It is worth noting that the new score can also be used with variables other than wind. Equations (1) and (2) are, in fact, general and represent a general multiclass version of the original FSS. The classes could also be defined for other variables. For example, instead of using a single precipitation class (as in the original FSS), two classes of precipitation could be used simultaneously: one representing medium-intensity precipitation and the other representing high-intensity precipitation. Alternatively, for atmospheric pressure, different classes could be defined, representing areas with low and high atmospheric pressure.
The authors wish to thank the three anonymous reviewers for a thorough examination of the manuscript and many valuable suggestions, which made the revised version of the manuscript much better than the original. The authors acknowledge the financial support from the Slovenian Research Agency (Research Core Funding P1-0188).
REFERENCES
American Meteorological Society, 2017: “Cardinal winds.” Glossary of Meteorology, http://glossary.ametsoc.org/wiki/Cardinal_winds.
Brown, B. G., E. Gilleland, and E. E. Ebert, 2012: Forecasts of spatial fields. Forecast Verification: A Practitioner's Guide in Atmospheric Science, 2nd ed. I. T. Jolliffe and D. B. Stephenson, Eds., Wiley, 95–117, https://doi.org/10.1002/9781119960003.ch6.
Dorninger, M., M. P. Mittermaier, E. Gilleland, E. E. Ebert, B. G. Brown, and L. J. Wilson, 2013: MesoVICT: Mesoscale Verification Inter-Comparison over Complex Terrain. NCAR Tech. Note NCAR/TN-505+STR, 23 pp., https://doi.org/10.5065/D6416V21.
Ebert, E. E., 2008: Fuzzy verification of high-resolution gridded forecasts: A review and proposed framework. Meteor. Appl., 15, 51–64, https://doi.org/10.1002/met.25.
Faggian, N., B. Roux, P. Steinle, and B. Ebert, 2015: Fast calculation of the fractions skill score. Mausam, 66, 457–466, http://metnet.imd.gov.in/mausamdocs/166310_F.pdf.
Gilleland, E., 2013a: Testing competing precipitation forecasts accurately and efficiently: The spatial prediction comparison test. Mon. Wea. Rev., 141, 340–355, https://doi.org/10.1175/MWR-D-12-00155.1.
Gilleland, E., 2013b: Two-dimensional kernel smoothing: Using the R package “smoothie.” NCAR Tech. Note NCAR/TN-502+STR, 17 pp., https://doi.org/10.5065/D61834G2.
Gilleland, E., D. Ahijevych, B. G. Brown, B. Casati, and E. E. Ebert, 2009: Intercomparison of spatial forecast verification methods. Wea. Forecasting, 24, 1416–1430, https://doi.org/10.1175/2009WAF2222269.1.
Gilleland, E., D. Ahijevych, B. G. Brown, and E. E. Ebert, 2010: Verifying forecasts spatially. Bull. Amer. Meteor. Soc., 91, 1365–1376, https://doi.org/10.1175/2010BAMS2819.1.
Harrison, G., 2014: Meteorological Measurements and Instrumentation. Wiley, 278 pp.
Mittermaier, M., and N. Roberts, 2010: Intercomparison of spatial forecast verification methods: Identifying skillful spatial scales using the fractions skill score. Wea. Forecasting, 25, 343–354, https://doi.org/10.1175/2009WAF2222260.1.
Mittermaier, M., N. Roberts, and S. A. Thompson, 2013: A long-term assessment of precipitation forecast skill using the fractions skill score. Meteor. Appl., 20, 176–186, https://doi.org/10.1002/met.296.
Roberts, N., 2008: Assessing the spatial and temporal variation in the skill of precipitation forecasts from an NWP model. Meteor. Appl., 15, 163–169, https://doi.org/10.1002/met.57.
Roberts, N., and H. W. Lean, 2008: Scale-selective verification of rainfall accumulations from high-resolution forecasts of convective events. Mon. Wea. Rev., 136, 78–97, https://doi.org/10.1175/2007MWR2123.1.
Rossa, A., P. Nurmi, and E. E. Ebert, 2008: Overview of methods for the verification of quantitative precipitation forecasts. Precipitation: Advances in Measurement, Estimation and Prediction, S. Michaelides, Ed., Springer, 419–452, https://doi.org/10.1007/978-3-540-77655-0_16.
Skok, G., 2015: Analysis of fraction skill score properties for a displaced rainband in a rectangular domain. Meteor. Appl., 22, 477–484, https://doi.org/10.1002/met.1478.
Skok, G., 2016: Analysis of fraction skill score properties for a displaced rainy grid point in a rectangular domain. Atmos. Res., 169, 556–565, https://doi.org/10.1016/j.atmosres.2015.04.012.
Skok, G., and N. Roberts, 2016: Analysis of fractions skill score properties for random precipitation fields and ECMWF forecasts. Quart. J. Roy. Meteor. Soc., 142, 2599–2610, https://doi.org/10.1002/qj.2849.
Steinacker, R., C. Häberli, and W. Pöttschacher, 2000: A transparent method for the analysis and quality evaluation of irregularly distributed and noisy observational data. Mon. Wea. Rev., 128, 2303–2316, https://doi.org/10.1175/1520-0493(2000)128<2303:ATMFTA>2.0.CO;2.
Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. International Geophysics Series, Vol. 100, Academic Press, 648 pp.
The R software source code for calculating the