MesoVICT focuses on the application, assessment, and enhancement of the capability of spatial verification methods for deterministic and ensemble forecasts of precipitation, wind, and temperature over complex terrain, and includes the assessment of observation uncertainty.
As numerical weather prediction (NWP) models began to increase considerably in resolution, it became clear that traditional gridpoint-by-gridpoint verification methods did not provide sufficient diagnostic information about forecast performance for some users (e.g., Mass et al. 2002). Double penalties arising from spatial displacement (or timing) errors, together with the more rapid growth of small-scale errors, often result in poorer performance scores for higher-resolution forecasts than for their coarser counterparts, even when subjective evaluation would judge the higher-resolution models to be better. Subsequently, a host of new verification methods was developed in rapid succession (Ebert and McBride 2000; Harris et al. 2001; Casati et al. 2004; Nachamkin 2004; Davis et al. 2006; Keil and Craig 2007; Roberts and Lean 2008; Marzban et al. 2009; Gilleland et al. 2010b), which we will refer to as spatial methods. There are still gaps in our understanding when it comes to interpreting what these new spatial methods tell us; gaining an in-depth understanding of forecast performance depends on grasping the full meaning of the verification results. Furthermore, the investment required to implement a new spatial method is relatively high compared to traditional verification methods, making it important to have criteria for deciding which methods would best suit a particular user's needs. Therefore, a spatial methods meta-verification, or intercomparison, project was created to try to answer some of these questions.
The first spatial verification methods intercomparison project (ICP; Gilleland et al. 2009, 2010a) was initiated in 2007 with the aim of better understanding the rapidly increasing literature concerning new spatial verification methods, and several questions were addressed:
How does each method inform about forecast performance overall?
Which aspects of forecast error does each method best identify (e.g., location error, scale dependence of the skill, etc.)?
Which methods yield identical information to each other and which methods provide complementary information?
The aim of the ICP was to analyze the behavior of the existing spatial verification methods and to catalog them in a structured and systematic way, with the final goal of providing guidance to users on which methods are best suited for which purpose. The project focused on prescribed errors for idealized cases and on quantitative precipitation forecasts, for which the ability of the verification methods to diagnose the known error was assessed (Ahijevych et al. 2009). For this first intercomparison, complex terrain was considered too problematic, and therefore a test dataset from a region with relatively flat terrain was chosen, namely, the Great Plains of the United States. Moreover, the verification methods were tested on nine selected case studies from the 2005 National Severe Storms Laboratory (NSSL) and Storm Prediction Center (SPC) Spring Experiment, using Stage II precipitation analyses as the verification fields, and the results were compared to human subjective assessments. The focus at the time was on the existing operational NWP capability: convection-permitting 4–7-km grids and the challenges these grid spacings presented for verifying precipitation forecasts.
The ICP was highly informative not only for end users, but also for the verification method developers, and it led to several improvements in the verification approaches (e.g., Keil and Craig 2009). Results for specific methods from the first intercomparison of spatial verification methods can be found in the Weather and Forecasting ICP special collection (https://journals.ametsoc.org/topic/verification_icp; Brill and Mesinger 2009; Davis et al. 2009; Ebert 2009; Ebert and Gallus 2009; Keil and Craig 2009; Marzban et al. 2009; Nachamkin 2009; Wernli et al. 2009; Casati 2010; Gilleland et al. 2010b; Lack et al. 2010; Lakshmanan and Kain 2010; Mittermaier and Roberts 2010).
The ICP helped to promote the mainstream adoption of spatial verification approaches. Since the ICP special collection, a large number of papers have been published making use of the newer spatial techniques. For example, neighborhood methods [particularly the fractions skill score (FSS)] have been employed by Weusthoff et al. (2010), Schaffer et al. (2011), Sobash et al. (2011), Duc et al. (2013), Mittermaier et al. (2013), and Skok and Roberts (2016). Scale separation methods have been used by De Sales and Xue (2011) and Liu et al. (2011), and field deformation approaches by Nan et al. (2010). Feature-based techniques have been used by Demaria et al. (2011), Hartung et al. (2011), Johnson et al. (2011), Gorgas and Dorninger (2012b), Wapler et al. (2012), Crocker and Mittermaier (2013), Mittermaier and Bullock (2013), Weniger and Friederichs (2016), and Mittermaier et al. (2016). The verification of forecasts at observing locations has been extended in papers by Theis et al. (2005), Ben Bouallègue and Theis (2014), Mittermaier (2014), and Mittermaier and Csima (2017), the last of these comparing deterministic and ensemble forecasts at the kilometer scale. A comprehensive list of papers can be found at the spatial verification intercomparison website (www.ral.ucar.edu/projects/icp/references.html).
Operational NWP now includes kilometer-scale deterministic and ensemble systems, in which the details of the terrain are felt more explicitly by the models. Small-scale detail is not confined to precipitation but extends to other spatial fields as well. Moreover, as NWP evolves and models become more accurate, analysis and observation uncertainty related to measurements and recording procedures, temporal and spatial sampling, gridding procedures, and other observation errors has an evolving (and sometimes increasingly important) impact on the verification results and ought to be taken into account in verification practice.
A second phase of the ICP, called the Mesoscale Verification Intercomparison over Complex Terrain (MesoVICT; www.ral.ucar.edu/projects/icp), was established in 2014, where the concept of the spatial methods intercomparison is extended to incorporate recent trends in modeling and verification science. The focus of MesoVICT is the application, assessment, and enhancement of the capability of spatial verification methods to evaluate deterministic and ensemble forecasts over complex terrain at near-convection-resolving and convection-permitting resolution. Test cases include additional variables beyond precipitation, such as wind and temperature. The cases represent interesting meteorological events that develop over time rather than single snapshots, in a region of complex terrain over the Alps where extensive observation data were collected during the Mesoscale Alpine Program Forecast Demonstration Project (MAP-FDP; Rotach et al. 2009). Synoptic observations from a very dense observation network provided the input for the generation of model-independent analyses that are used as verifying fields.
This article introduces the rationale behind MesoVICT, gives an overview, updates the classification of spatial verification techniques, describes the MesoVICT datasets, and gives a simple example of the approach used to evaluate the verification methods. The results of the MesoVICT project will be reported in an American Meteorological Society (AMS) special collection in Monthly Weather Review and Weather and Forecasting.
MesoVICT is a nonfunded, open, collaborative project. Participation in the project is still possible and welcomed. Researchers, including students, who are interested in participating are encouraged to contact one of the authors or visit the project homepage at https://ral.ucar.edu/projects/icp/.
MesoVICT STRUCTURE.
Building from the first intercomparison.
The major results of the first spatial verification methods intercomparison project are summarized in Gilleland et al. (2009, 2010a) and Ahijevych et al. (2009). Through that project, the spatial verification approaches were categorized into four classes, to which a fifth class, distance measures for binary images, is added here:
Neighborhood methods relax the requirement of an exact forecast location and define neighborhoods of increasing size within which forecasts and observations are matched, an approach that can also be applied in time. This is equivalent to applying a low-pass spatial filter. The treatment of the data within the neighborhood defines the verification strategy, ranging from simple upscaling to the use of probabilistic and ensemble verification approaches that assess the forecast probability density function within the observation neighborhood. Also included in this class are single-observation–neighborhood-forecast methods, which compare the occurrence of the event in the forecast neighborhood with the observed occurrence of the event.
Scale-separation techniques use single-band spatial filters to decompose the forecast and observed fields into scale components and then evaluate traditional verification scores separately for each component. These methods enable assessment of the scale dependence of bias, error, and skill, as well as evaluation of the forecast versus observed scale structure, that is, the scales at which the forecast can replicate observed structures and the scales at which it performs well.
Feature-based (object based) methods first identify and isolate features in the forecast and observation domains by applying a threshold (e.g., rainfall accumulations ≥ 10 mm) and then assess different feature attributes (e.g., location, extent, intensity) for the paired forecast–observation features.
Field-deformation techniques use a vector (displacement) field to morph the forecast field toward the observed field (up to an optimal fit); a scalar (amplitude) field is then applied in order to correct intensity errors. These techniques assess the displacement and intensity errors evaluated over the whole field.
Distance measures for binary images assess the distance between forecast and observation fields by evaluating the geographical distances between all the grid points exceeding a selected threshold. These techniques can be considered a hybrid between field-deformation and feature-based techniques.
Distance measures for binary images were developed in image processing for edge detection and pattern recognition and include Pratt’s figure of merit (FoM; Pratt 1978); the Fréchet distance (Alt and Godau 1995; Eiter and Mannila 1994); the Hausdorff metric and its derivatives, the modified and partial Hausdorff distances (Dubuisson and Jain 1994); the mean error distance (MED; Peli and Malah 1982); and the Baddeley delta metric (Baddeley 1992a,b). These distance measures are sensitive to the difference in shape and extent of objects and assess the distance/displacement between forecast and observation features. Several studies have exploited these metrics for spatial verification of precipitation forecasts (Schwedler and Baldwin 2011; Gilleland 2011, 2017; Gilleland et al. 2008; Venugopal et al. 2005; Zhu et al. 2011) and sea ice prediction (Hebert et al. 2015; Heinrichs et al. 2006; Dukhovskoy et al. 2015).
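As a concrete illustration of this class, the sketch below computes the MED between thresholded forecast and observed precipitation fields using a Euclidean distance transform. The threshold, grid spacing, and toy fields are illustrative assumptions, not part of any participating method's implementation.

```python
"""Minimal sketch of the mean error distance (MED) between two
thresholded (binary) precipitation fields; threshold, grid spacing,
and the toy fields are illustrative assumptions only."""
import numpy as np
from scipy.ndimage import distance_transform_edt

def mean_error_distance(obs, fcst, threshold=10.0, dx_km=8.0):
    """MED of the observed event set relative to the forecast one.

    For every grid point exceeding the threshold in obs, the distance to
    the nearest exceeding grid point in fcst is found and then averaged.
    The measure is asymmetric: MED(obs, fcst) != MED(fcst, obs).
    """
    obs_bin = obs >= threshold
    fcst_bin = fcst >= threshold
    if not obs_bin.any() or not fcst_bin.any():
        return np.nan  # undefined if either field contains no event
    # distance_transform_edt gives, at each grid point, the distance to
    # the nearest zero of its argument, i.e., to the nearest forecast
    # event point once the forecast mask is inverted.
    dist_to_fcst = distance_transform_edt(~fcst_bin)
    return dist_to_fcst[obs_bin].mean() * dx_km  # assumes a square grid

# Toy example: two displaced rain blobs on a 100 x 100 grid.
obs = np.zeros((100, 100)); obs[40:60, 40:60] = 15.0
fcst = np.zeros((100, 100)); fcst[45:65, 50:70] = 15.0
print(mean_error_distance(obs, fcst), mean_error_distance(fcst, obs))
```

Because the MED is asymmetric, both orderings are typically examined.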
In the first intercomparison classification, the neighborhood and scale-separation methods were referred to as filtering methods because they use spatial filtering, and the field-deformation and feature-based techniques were grouped as displacement techniques because they explicitly assess displacements. However, these classes represent a fairly general categorization of the methods, so that two methods in the same class can still be very different; in fact, some methods straddle more than one category, and others simply do not fit very well in any category. For example, the structure–amplitude–location (SAL) method introduced by Wernli et al. (2009) identifies features in the fields but does not analyze them individually; rather, it yields summary scores based on all features across space. Often, techniques from one category might be used for optimizing a method in another category: for example, a low-pass smoother can be adopted within a feature-based approach to reduce small-scale noise (and hence help to identify the objects). Similarly, scale separation and smoothing are applied in some field-deformation approaches: as an example, Keil and Craig's (2009) displacement and amplitude score (DAS) uses a spatial filter prior to defining the deformation vector field, and the image warping approach presented by Gilleland et al. (2010b) uses smoothing to help find the best-fitting warp, but the final summary is based on the original, raw amplitudes without any smoothing.
Therefore, a new penta-classification (Fig. 1) is proposed here, which aims to highlight the similarities and overlaps between classes. The proximity of the techniques in the penta-classification diagram indicates which methods are more closely related. For example, scale-separation approaches are close to neighborhood methods because both rely on spatial filtering (a single-band versus a low-pass filter, respectively). Field-deformation methods overlap with the scale-separation approaches because many field-deformation algorithms perform single-band spatial filtering prior to the field morphing. Neighborhood methods are close to feature-based methods because smoothing (i.e., a low-pass filter) is often applied to better identify features (e.g., from high-resolution models). Finally, distance metrics bridge feature-based and field-deformation methods: they are related to feature-based methods because both assess distances between features, but their algorithms more closely resemble those of field-deformation approaches because they measure the distances between all gridpoint pairs (with no prior identification and matching of forecast and observed features).
Objectives.
The first spatial verification intercomparison project addressed the following research questions:
How does the method inform about performance at different scales?
Does the method inform about the spatial structure error?
How does the method inform about location error?
Can the method inform about timing error?
Does the method inform about intensity error and distribution differences?
Does the method provide information about hits, misses, false alarms, and correct negatives?
Does the method do anything that is counterintuitive?
Does the method have tunable parameters, and how sensitive are the results to their specific user-chosen values?
Can the results be aggregated across multiple cases?
Are the results accompanied by confidence intervals or statistical significance?
A table of attributes was produced as a reference and guide by Gilleland et al. (2009) in order to help users identify which methods provide useful information to answer each of those questions.
For MesoVICT, new challenges and research questions have been identified:
What is the ability of the method to verify forecasts of variables other than precipitation (e.g., wind)?
How can the method be adapted to evaluate ensemble forecasts?
Does the method show unusual behavior in complex terrain, and how should results be interpreted given the challenges of forecasting in complex terrain?
What is the sensitivity of existing spatial verification methods to their own specific tuning parameters, the domain size, interpolation, and regridding? The aim of this assessment is to provide guidance on the best practices.
Can the method be used fairly to compare the performance of high-resolution and coarser-resolution forecasts?
Can the method account, or be adapted to account, for analysis or observation uncertainty?
The MesoVICT outcomes should benefit both verification users and verification method developers. The aim is to refresh the guidance for end users on the best use of spatial verification approaches, while the analysis will also potentially identify shortcomings in existing methods. MesoVICT therefore encourages the scientific community to further develop and improve existing spatial verification methods.
Experiment design.
MesoVICT is a verification methods (meta-verification) intercomparison rather than a model intercomparison. The focus of MesoVICT is, therefore, to analyze and document the behavior of the spatial verification methods. Participants in the intercomparison are expected to test their chosen spatial verification methods on a set of selected case studies. To harmonize the intercomparison and facilitate participation, the selected NWP forecasts and the verifying gridded analyses have been interpolated onto a common grid; these, together with the station observations, can be downloaded in ASCII format from the MesoVICT website (www.ral.ucar.edu/projects/icp/). The MesoVICT data and a simple example are described in the “MesoVICT data” and “MesoVICT case and example application” sections.
The intercomparison is structured with a view to maximizing participation (hence including a wide spectrum of preexisting and new spatial verification methods) while ensuring that participants analyze the same data and focus on addressing the same scientific questions, which is crucial if the intercomparison is to succeed. Figure 2 provides a schematic of the experimental design (Dorninger et al. 2013). All participants in the intercomparison are requested to complete the analysis of the core experiment (case 1) for inclusion in the intercomparison final review. There are six identified cases (described in the “MesoVICT case and example application” section and Table 5), of which the analysis of case 1 is the absolute minimum. The core of the intercomparison focuses on the assessment of deterministic precipitation forecasts over complex terrain. Both gridded and point observations are provided, depending on what a particular method requires.
Following Dorninger et al. (2013), the core experiment provides the foundation on which the other tiers are gradually built up tier by tier. Beyond the core, participants can choose how many additional tiers to evaluate and perhaps make other contributions. The tiers represent a progression in terms of forecast type, parameter, and choice of a verification analysis or observation, exploring a range of challenges:
Tier 1 explores the spatial verification of deterministic wind forecasts, as well as ensemble forecasts of precipitation and wind, against control analysis and point observations.
Tier 2a considers the use of an ensemble of analyses as the verification dataset for quantifying analysis uncertainty in a deterministic forecast context.
Tier 2b considers the use of an ensemble of analyses as the verification dataset for quantifying analysis uncertainty in an ensemble forecast context.
Tier 3 is the user-defined tier where method-specific sensitivities can be explored, model reruns can be assessed alongside the common dataset, or other parameters can be explored.
One of the key aims of the first spatial verification method intercomparison was to understand whether the spatial verification approaches provide more intuitive results that are in better agreement with a human’s subjective verification. Objective results of the different spatial verification methods were therefore compared to a subjective assessment (Ahijevych et al. 2009). The experience gained during this process revealed just how challenging subjective assessment can be, as there was considerable disagreement between the assessors, depending on how each ranked the forecast attributes from the most to the least important. We do not propose to repeat exactly the same methodology in MesoVICT, but instead to construct a ranking of cases for each of the participating methods and compare how the different verification methods rank the same set of cases. This process could result in a grouping of methods that assess similar attributes. Because each case study spans a time period of a few days, hourly scores can be used to track aspects of forecast performance over the case study time span. Furthermore, aggregation of the forecast performance metrics over the time period of the case study will provide the summary score for the ranking of the six case studies, which represent very different but typical synoptic settings in the Alpine region. Participants are encouraged to also provide inference information for these aggregated scores (e.g., confidence intervals or p values) to ensure that the ranking of the case studies is accompanied by statistical significance information.
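As a simple illustration of the kind of inference information requested, the sketch below computes a percentile bootstrap confidence interval for a score aggregated over the hourly values of one case. The hourly values are hypothetical, and serial correlation between hourly scores would in practice argue for a block bootstrap rather than this plain version.

```python
"""Sketch of a percentile bootstrap interval for a score aggregated
over the hourly values of a single MesoVICT case; the input values
are hypothetical."""
import numpy as np

def bootstrap_ci(hourly_scores, n_boot=10000, alpha=0.05, seed=0):
    """Mean of the hourly scores and its (1 - alpha) bootstrap interval."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(hourly_scores, dtype=float)
    n = scores.size
    # Resample hours with replacement and recompute the aggregate score.
    idx = rng.integers(0, n, size=(n_boot, n))
    boot_means = scores[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return scores.mean(), (lo, hi)

# Hypothetical hourly MAE values (mm) for part of a case period:
hourly_mae = np.array([1.2, 0.8, 1.5, 2.1, 1.9, 1.1, 0.9, 1.4])
agg, (lo, hi) = bootstrap_ci(hourly_mae)
print(f"aggregated MAE = {agg:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```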
MesoVICT DATA.
MesoVICT takes advantage of the huge data collection effort within the framework of two World Weather Research Programme (WWRP) Forecast Demonstration Projects (FDPs), namely, the Mesoscale Alpine Program (MAP)–Demonstration of Probabilistic Hydrological and Atmospheric Simulation of Flood Events in the Alpine Region (D-PHASE) project (Rotach et al. 2009) and the Convective and Orographically-Induced Precipitation Study (COPS; Wulfmeyer et al. 2008) over central Europe in 2007. This data collection covers observations, Vienna Enhanced Resolution Analyses (VERA), and deterministic and ensemble model forecasts for at least June–November 2007. All of these data are stored at the World Data Centre for Climate (WDCC) of the Deutsches Klimarechenzentrum (DKRZ) in Hamburg, Germany, and are freely available (https://cera-www.dkrz.de/WDCC/ui/cerasearch/). Observational data and VERA analyses are stored in netCDF format and model forecasts are stored in gridded binary 1 (GRIB1) format. To facilitate ease of access to the data for MesoVICT participants, data for the selected case studies are provided in ASCII format (“MesoVICT case and example application” section) at the National Center for Atmospheric Research (NCAR) website and can be downloaded from www.ral.ucar.edu/projects/icp.
NWP models.
During the D-PHASE Operations Period (DOP) from June to November 2007, a total of 23 atmospheric deterministic NWP models were run in an operational mode, many at a horizontal grid spacing of a few kilometers (convection-permitting models). Moreover, seven regional atmospheric ensemble modeling systems of intermediate resolution were run with up to 24 members. Some basic model specifications of the participating models, including their lower-resolution driving models, are given by Arpagaus et al. (2009). Not all of the high-resolution model domains cover the entire Alpine region, which makes a comprehensive model comparison difficult, as shown by Dorninger and Gorgas (2013). Therefore, three models have been selected for MesoVICT: the Swiss Consortium for Small-Scale Modeling model COSMO-2, the Canadian Global Environmental Multiscale Limited Area Model (GEM-LAM), and the COSMO Limited Area Ensemble Prediction System (COSMO-LEPS) run by the Hydro-Meteo-Climate Regional Service of Emilia-Romagna (ARPA-SIMC), Italy. Their basic model specifications are listed in Table 1 and their limited-area domains are illustrated in Fig. 3a. References and links to model documentation are provided in Table 2.
Basic model specifications of deterministic and limited area ensemble models participating in MesoVICT. MAP D-PHASE model runs are shaded. All other model runs represent reruns with current model versions.
References and documentation links of MesoVICT models [D-PHASE models (shaded) as well as model reruns].
Recognizing the opportunity to analyze the advances in NWP systems within the framework of MesoVICT, several institutions provided reruns of the case studies using their most recent state-of-the-art NWP systems. The reruns that have been completed to support projects in tier 3 are listed in Table 1, and a selection of their mainly innermost domains is shown in Fig. 3b. Again, references and links to model documentation can be found in Table 2.
VERA analysis.
Because the gridded analysis and its uncertainty are key components of MesoVICT, it is important to briefly explain how the analysis is generated. To simplify the intercomparison of verification methods requiring gridded analyses, the fields of the selected models were interpolated onto the 8-km VERA grid using an inverse-distance method. The method used for remapping the NWP fields is based on the Cressman interpolation scheme (Cressman 1959). The interpolation radius is adapted to the model resolution so that at least nine surrounding grid points contribute to each interpolation; for the interpolation of a 2.2-km grid onto the 8-km grid, a radius of 4 km was chosen. The selection of 8 km for the analysis grid is a compromise between the available observation density (∼16 km) and the model grid resolution (kilometer scale).
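The sketch below illustrates a Cressman-type remapping of a fine-grid field onto coarser target points. The coordinates, the 4-km radius, and the use of a k-d tree for the neighbor search are illustrative assumptions; the operational remapping of the MesoVICT fields may differ in detail.

```python
"""Simplified sketch of Cressman-type remapping of a fine model field
onto coarser target points (e.g., the 8-km VERA grid); coordinates,
radius, and the neighbor search are illustrative assumptions."""
import numpy as np
from scipy.spatial import cKDTree

def cressman_remap(src_xy, src_vals, tgt_xy, radius_km=4.0):
    """Weights follow Cressman (1959): w = (R^2 - r^2) / (R^2 + r^2)
    for all source points within radius R of each target point."""
    tree = cKDTree(src_xy)
    out = np.full(len(tgt_xy), np.nan)
    for i, p in enumerate(tgt_xy):
        idx = tree.query_ball_point(p, r=radius_km)
        if not idx:
            continue  # no source points within the radius
        r2 = np.sum((src_xy[idx] - p) ** 2, axis=1)
        w = (radius_km**2 - r2) / (radius_km**2 + r2)
        if w.sum() > 0.0:
            out[i] = np.sum(w * src_vals[idx]) / w.sum()
    return out

# Toy usage: points of a 2.2-km grid remapped to one 8-km grid point.
sx, sy = np.meshgrid(np.arange(0, 20, 2.2), np.arange(0, 20, 2.2))
src_xy = np.column_stack([sx.ravel(), sy.ravel()])
src_vals = np.hypot(sx, sy).ravel()        # dummy field values
print(cressman_remap(src_xy, src_vals, np.array([[8.0, 8.0]])))
```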
The VERA scheme (Steinacker et al. 2000) has been developed to provide the best possible model-independent analysis fields in complex terrain. Sparsely and irregularly distributed observations are interpolated to a regular grid using a thin-plate spline algorithm. Upstream of the analysis, VERA applies a comprehensive data quality control scheme in order to exclude erroneous data from the analysis procedure (Steinacker et al. 2011). The analysis scheme does not make use of any NWP-model information as background fields, which makes it very suitable for model verification purposes.
For downscaling the analysis beyond the resolution given by the observation network, the so-called fingerprint method has been developed (Steinacker et al. 2006; Bica et al. 2007). The irregular spacing and sparse density of station observations with respect to topography (i.e., in valleys and basins, on mountaintops, on passes and slopes) may result in a rather rough analysis field. On the one hand, conventional analysis systems cannot sufficiently resolve small-scale structures caused by topography, which are then treated as noise and smoothed out. On the other hand, mountainous topography can produce small-scale structures of considerable amplitude. Two different physical processes can be identified as the cause of the modification of the atmosphere in complex terrain: thermal effects due to differential heating or cooling of the atmosphere over mountains (e.g., a thermal high or thermal low over the Alps; Bica et al. 2007) and dynamical effects (e.g., blocking and lee-side effects). These features can be modeled far below the scales resolved by the observation network, provided a very high-resolution topographic dataset is available. Such modeled thermal and dynamic fingerprints are used to downscale the observation data locally by a mean square method, with weighting factors computed individually for every analysis. The fingerprints have much in common with empirical orthogonal functions (EOFs), but they are determined physically rather than statistically.
VERA is a two-dimensional analysis system for surface parameters. Hourly analysis fields have been produced for mean sea level pressure, surface potential and equivalent potential temperature, near-surface wind, and accumulated precipitation as default parameters. Surface mixing ratio and moisture flux divergence are computed from the default output in a postprocessing step. Three fingerprints have been implemented: one thermal and two dynamic fingerprints (one for east–west, the other for north–south flow patterns), which enhance the information for pressure- and temperature-related parameters in data-sparse regions. Although wind and precipitation analyses are not supported by fingerprints, Dorninger et al. (2008) showed that the quality of the analysis is not diminished as long as there is sufficient coverage of observation stations.
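To convey the basic idea of the fingerprint weighting, the highly simplified sketch below fits precomputed fingerprint patterns to the station deviations by least squares and adds the weighted patterns to a coarse analysis. All arrays, shapes, and the fitting details are illustrative assumptions; the reader is referred to Steinacker et al. (2006) and Bica et al. (2007) for the actual scheme.

```python
"""Highly simplified sketch of the fingerprint idea: topographically
induced patterns (here taken as given arrays) are fitted to station
deviations and used to add subgrid structure to a coarse analysis.
All inputs and the fitting details are illustrative assumptions."""
import numpy as np

def fingerprint_downscale(coarse_at_stn, obs_at_stn,
                          fp_at_stn, fp_on_grid, coarse_on_grid):
    """coarse_at_stn : (n_stn,) coarse analysis sampled at stations
    obs_at_stn    : (n_stn,) station observations
    fp_at_stn     : (n_stn, n_fp) fingerprint patterns at stations
    fp_on_grid    : (n_grid, n_fp) fingerprint patterns on the grid
    coarse_on_grid: (n_grid,) coarse analysis on the analysis grid"""
    residual = obs_at_stn - coarse_at_stn
    # One set of least squares weights per analysis time.
    weights, *_ = np.linalg.lstsq(fp_at_stn, residual, rcond=None)
    return coarse_on_grid + fp_on_grid @ weights

# Tiny synthetic usage: 2 fingerprints, 5 stations, 10 grid points.
rng = np.random.default_rng(1)
fp_stn, fp_grid = rng.normal(size=(5, 2)), rng.normal(size=(10, 2))
print(fingerprint_downscale(rng.normal(size=5), rng.normal(size=5),
                            fp_stn, fp_grid, rng.normal(size=10)))
```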
The ensemble of VERA analyses is generated using the observation error ensemble approach of Gorgas and Dorninger (2012a). In this approach, error estimates at the observation locations, derived as residuals from VERA’s quality control scheme (Steinacker et al. 2011), are used to draw Gaussian perturbations that are added to the observations; the perturbed observations are then reanalyzed using scale-dependent weightings of the perturbations. For details, refer to Gorgas and Dorninger (2012a).
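A minimal sketch of this observation error ensemble idea is given below; the analysis call is only a placeholder for the actual VERA step, and the scale-dependent weighting of the perturbations used in the real scheme is omitted.

```python
"""Sketch of an observation error ensemble: each member adds Gaussian
noise, scaled by the quality-control error estimates, to the
observations and reanalyzes them.  The analysis function is a
placeholder and the scale-dependent weighting is omitted."""
import numpy as np

def analysis_ensemble(obs, error_estimates, run_analysis,
                      n_members=50, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        # Gaussian perturbation per station, scaled by its error estimate.
        perturbed_obs = obs + rng.normal(0.0, error_estimates)
        members.append(run_analysis(perturbed_obs))
    return members

# Placeholder usage: identity "analysis" on five stations, three members.
obs = np.array([2.0, 1.5, 0.0, 3.2, 1.1])
errs = np.array([0.2, 0.1, 0.3, 0.2, 0.1])
members = analysis_ensemble(obs, errs, run_analysis=lambda x: x, n_members=3)
```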
VERA is applicable to arbitrary grid resolutions and analysis domains. For MesoVICT it covers the larger D-PHASE domain with horizontal grid resolution of 8 km. The domain of the VERA ensemble extends only over the larger Alpine region for computational reasons (Fig. 3).
Station network.
For verification methods that make use of the observations at sites rather than on a grid, data from a very dense observation network in and around the Alps are provided for the MesoVICT community. In a joint activity of MAP D-PHASE and COPS, a unified D-PHASE–COPS (JDC) dataset of surface observations over central Europe was established (Dorninger et al. 2009; Gorgas et al. 2009). These products include data provided via the Global Telecommunication System (GTS) of the World Meteorological Organization (WMO) as well as from other networks for the whole of 2007, including the COPS measurement period (June–August 2007) and the DOP of D-PHASE (June–November 2007). A list of all data providers is given in Table 3 (Dorninger et al. 2013). The Department of Meteorology and Geophysics of the University of Vienna took over the responsibility for this data-collection activity.
National and regional institutions providing data for the JDC dataset.
The JDC dataset consisted of reports from more than 12,000 stations over Europe, corresponding to a mean station distance of approximately 16 km. Not all stations measure all parameters; an overview of all data included in the JDC dataset can be found in Table 4 (Dorninger et al. 2013). The accumulation periods of non-GTS precipitation data varied among the contributing weather services. To create a homogeneous precipitation dataset with the highest possible station density, accumulation periods shorter than 1 h were summed to 1-, 3-, 6-, 12-, and 24-h periods.
Data provided by national and regional networks, where N defines the type of observation network with C = climatology network, S = synop network, S* = COSMO network in Italy, R = pure precipitation network. Parameters: dd = wind direction, ff = wind speed, T = 2-m temperature, Td = 2-m dewpoint temperature, Tn/Tx = min/max temperature, p = air pressure at surface, psl = air pressure at sea level, Fxx = gust speed, rr = accumulated precipitation, N = cloud cover, VV = horizontal visibility.
MesoVICT CASE AND EXAMPLE APPLICATION.
Selected cases.
A set of six synoptic cases has been selected, covering a wide range of meteorological phenomena in and around the Alps, for example, widespread convective events, organized convection along squall lines, cyclogenesis with heavy precipitation leading to severe flooding, and cold-front interactions with the Alpine barrier (Table 5). For a detailed description of the synoptic situation of the different cases, the reader is referred to Dorninger et al. (2013).
In addition to these NWP case studies, a new set of idealized synthetic cases is proposed within MesoVICT (available at www.ral.ucar.edu/projects/icp). Synthetic cases primarily aim to represent simplified and individual forecast errors (e.g., a displacement or an extent error) and in the first ICP proved to be very informative on the basic diagnostic capabilities of the spatial verification methods.
Only a few studies have addressed some of the scientific questions listed in the “Objectives” subsection, and MesoVICT is the first coordinated effort to answer them for several verification methods. Therefore, MesoVICT participants are asked to begin their studies by running the core case (Fig. 2). The core case (20–22 June 2007) is characterized by strong convective events due to unstable warm and moist air masses advected into the Alpine region on 20 June 2007. The following day intense convective events occurred again ahead of a cold front with strong westerly winds. The resulting spotty rain field for a 12-h accumulation period is shown in Fig. 4a together with the cold front approaching from the northwest in terms of equivalent potential temperature in Fig. 4b.
In the following we present a simple example to show how the scientific questions can be addressed. We use the mean absolute error (MAE) and continuous ranked probability score (CRPS) as verification measures and as comparators for the spatial verification methods that will be evaluated in the MesoVICT project.
Figure 5 shows the spatial distribution of the MAE for precipitation for the whole period of the core case. Large spatial variability is evident, partly related to the shape of the Alpine barrier (over Austria and parts of Switzerland). The time series of the MAE and CRPS are presented in Fig. 6. They show a pronounced maximum at around 0300 UTC 21 June 2007, when the cold front and its associated rainband (not shown) impinge on the Alps.
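For reference, the gridpoint-wise MAE map of Fig. 5 and the hourly domain-mean series of Fig. 6 can be obtained with a few lines of code; the array shapes below are illustrative, and the CRPS counterpart follows the ensemble form sketched later in this section.

```python
"""Sketch of the gridpoint-wise MAE map and the hourly domain-mean MAE
time series; forecast and analysis arrays of shape (time, y, x) are
assumed, with illustrative dimensions."""
import numpy as np

def mae_map(fcst, anal):
    """MAE at every grid point, averaged over the time dimension."""
    return np.abs(fcst - anal).mean(axis=0)

def mae_series(fcst, anal):
    """Domain-mean MAE at every time step (hourly time series)."""
    return np.abs(fcst - anal).mean(axis=(1, 2))

# Hypothetical shapes: 72 hourly fields on a 100 x 120 grid.
fcst = np.random.rand(72, 100, 120)
anal = np.random.rand(72, 100, 120)
print(mae_map(fcst, anal).shape, mae_series(fcst, anal).shape)
```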
What is the ability of the method to verify forecasts of variables other than precipitation (e.g., wind)?
In theory the MAE and CRPS can be applied to any scalar variable, though for quantities that vary by orders of magnitude (e.g., precipitation or cloud-base height), it is probably wise to transform the variable before verifying it.
How can the method be adapted to evaluate ensemble forecasts?
Adapting the MAE to ensemble forecasts is rather straightforward, since the CRPS generalizes the MAE and reduces to it in the limiting (deterministic) case (Hersbach 2000).
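A minimal sketch of the ensemble CRPS in its kernel form, together with its reduction to the absolute error (and hence, averaged over cases, to the MAE) for a single-member forecast, is given below; the ensemble values are illustrative.

```python
"""Sketch of the ensemble CRPS, CRPS = E|X - y| - 0.5 E|X - X'|,
and of its single-member (deterministic) limit; values are illustrative."""
import numpy as np

def crps_ensemble(members, obs):
    """Kernel (energy) form of the CRPS for an ensemble sample."""
    members = np.asarray(members, dtype=float)
    term1 = np.abs(members - obs).mean()
    term2 = 0.5 * np.abs(members[:, None] - members[None, :]).mean()
    return term1 - term2

obs = 3.0
print(crps_ensemble([1.0, 2.5, 4.0, 3.5], obs))   # ensemble CRPS
print(crps_ensemble([2.0], obs), abs(2.0 - obs))  # single member: CRPS == AE
```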
Does the method show unusual behavior in complex terrain, and how should results be interpreted given the challenges of forecasting in complex terrain?
A key question for spatial verification methods is whether and how the method can account for meteorological features tied to the mountain ranges (e.g., increasing precipitation rates with altitude, windward/leeward effects for precipitation, foehn effects, cold-air pooling in valleys and basins, and valley wind systems versus synoptic winds at higher altitudes). The behavior of the MAE and CRPS in complex terrain depends on how well the model represents the actual orography. For surface variables, deviations of the model orography from the real orography may lead to larger errors in complex terrain, especially for variables such as temperature. Precipitation and/or cloud may also be displaced relative to the underlying orography. Model resolution determines the level of detail contained in the model orography (coarser models have a smoother orography, with potentially larger deviations between actual and model heights). Additionally, domain-average statistics, which are often needed for spatial verification methods, may mix data over flat and complex terrain, diluting the signal; this also has to be taken into account.
What is the sensitivity of existing spatial verification methods to their own specific tuning parameters, the domain size, interpolation, and regridding?
The MAE and CRPS are not spatial methods per se, but they can be applied on a gridpoint-by-gridpoint basis to create spatial maps of forecast accuracy (Fig. 5). They could also be applied to upscaled or smoothed fields to show how sensitive the results are to interpolation/regridding and displacement.
Can the method be used fairly to compare the performance of high-resolution and coarser-resolution forecasts?
If the MAE and/or CRPS are applied to forecasts originating on different grids, then it is recommended to interpolate both to a high-resolution grid (but not higher than the real resolution of the analysis grid) so that no information is lost from either forecast. It is also recommended to interpolate both forecasts to station locations (or use nearest grid point for precipitation) and verify using station observations. The results can be plotted on maps to examine the spatial structure. Ideally, one could track the information content on different scales and compare directly, scale by scale (an approach used by many spatial verification methods).
Can the method account, or be adapted to account, for analysis/observation uncertainty?
The MAE and CRPS can be adapted. Figures 7 and 8 show the uncertainty of the CRPS for wind speed forecasts at the grid point corresponding to Vienna. Each ensemble member of the VERA ensemble (Gorgas and Dorninger 2012a) has been used to calculate the CRPS. Figure 7 shows the uncertainty of the analysis, indicated by the spread of the different ensemble members, which results in substantial differences in the CRPS. The time series shown in Fig. 8 for the MesoVICT core case indicates large variability in the uncertainty of the CRPS. This variation may be connected to the diurnal cycle of the wind speed (lower wind speeds during the night resulting in lower variance of the CRPS) and/or to high variance of the wind observations during the frontal passage (approximately 1200–1500 UTC 21 June 2007 for Vienna).
SUMMARY, RECENT MesoVICT ACTIVITY, AND NEXT STEPS.
The initial spatial forecast verification intercomparison (ICP) was initiated in response to the rapid development of new methods for verifying high-resolution forecasts. The ICP clarified much about how the newly proposed methods described aspects of forecast error, which ones provided diagnostic guidance (and what sort), and which methods might yield similar kinds of information. However, the ICP cases focused only on verifying deterministic forecasts of precipitation over the central United States with characteristically flat terrain. A great deal remains to be learned about the information content and behavior of spatial verification methods in other forecast contexts.
This second phase of the project, the Mesoscale Verification Intercomparison over Complex Terrain (MesoVICT), was initiated in 2014 with the aim of advancing knowledge of the various methods to determine how well they inform about forecast performance over complex terrain, for additional variables (e.g., wind and temperature), in the presence of modeling uncertainty (represented by ensemble NWP) and observation and analysis uncertainty (represented by an ensemble of analyses). Point observations have also been provided as verification data. Instead of single snapshots in time, MesoVICT cases evolve over a few days and hence are more realistic for a forecaster but also more complicated in terms of analyzing model performance. To help ensure that all methods are analyzed on the same set of cases, a core case has been identified that all participants are expected to analyze. Beyond this core case, a tiered evaluation framework is provided to allow more advanced studies to be conducted within the project.
The project started with an initial planning meeting in September 2013 at the 13th European Meteorological Society (EMS) and 11th European Conference on Applications of Meteorology (ECAM) annual conference in Reading, United Kingdom. This was followed by a kickoff meeting (First MesoVICT Workshop) held in Vienna in October 2014. Since then, MesoVICT meetings and presentations have taken place at other conferences on several occasions, including at each EMS annual meeting. A full list is given in Table 6, along with current plans for future meetings. Some first investigations using MesoVICT data have already been published (Geiß 2015; Gilleland 2017; Kloiber 2017; Skok and Hladnik 2018), and more are expected within the next few years. A closing meeting is planned for 2020, and the results of the MesoVICT project will be reported in an American Meteorological Society special collection in Monthly Weather Review and Weather and Forecasting.
MesoVICT activities and presentations at respective conferences.
ACKNOWLEDGMENTS
MesoVICT is organized by the World Meteorological Organization (WMO) Joint Working Group on Forecast Verification Research (JWGFVR) and is an activity of the World Weather Research Programme (WWRP) High Impact Weather (HIWeather) Evaluation task team. NCAR is sponsored by the U.S. National Science Foundation. Additional support for Brown and Gilleland was provided from the National Science Foundation (NSF) through Earth System Modeling (EaSM) Grant AGS-1243030. Work at NCAR was also supported in part by the Air Force 557th Weather Wing. Simon Kloiber (University of Vienna) provided Figs. 7 and 8.
REFERENCES
Ahijevych, D., E. Gilleland, B. Brown, and E. Ebert, 2009: Application of spatial forecast verification methods to gridded precipitation forecasts. Wea. Forecasting, 24, 1485–1497, https://doi.org/10.1175/2009WAF2222298.1.
Alt, H., and M. Godau, 1995: Computing the Fréchet distance between two polygonal curves. Int. J. Comput. Geom. Appl., 5, 75–91, https://doi.org/10.1142/S0218195995000064.
Ament, F., and M. Arpagaus, 2009: Dphase_cosmoch2: COSMO model forecasts (2.2 km) run by MeteoSwiss for the MAP D-PHASE project. World Data Center for Climate, accessed 11 July 2013, https://doi.org/10.1594/WDCC/dphase_cosmoch2.
Arpagaus, M., and Coauthors, 2009: MAP D-PHASE: Demonstrating forecast capabilities for flood events in the Alpine region. Veröffentlichungen MeteoSchweiz 78, 79 pp., www.meteoswiss.admin.ch/content/dam/meteoswiss/en/Ungebundene-Seiten/Publikationen/Scientific-Reports/doc/veroeff78.pdf.
Baddeley, A. J., 1992a: Errors in binary images and an Lp version of the Hausdorff metric. Nieuw Arch. Wiskunde, 10, 157–183.
Baddeley, A., 1992b: An error metric for binary images. Robust Computer Vision: Quality of Visions Algorithms, W. Förstner and S. Ruwiedel, Eds., Wichmann-Verlag, 59–78.
Baldauf, M., A. Seifert, J. Förstner, D. Majewski, M. Raschendorfer, and T. Reinhardt, 2011: Operational convective-scale numerical weather prediction with the COSMO model: Description and sensitivities. Mon. Wea. Rev., 139, 3887–3905, https://doi.org/10.1175/MWR-D-10-05013.1.
Ben Bouallègue, Z., and S. E. Theis, 2014: Spatial techniques applied to precipitation ensemble forecasts: From verification results to probabilistic products. Meteor. Appl., 21, 922–929, https://doi.org/10.1002/met.1435.
Bica, B., and Coauthors, 2007: Thermally and dynamically induced pressure features over complex terrain from high-resolution analyses. J. Appl. Meteor. Climatol., 46, 50–65, https://doi.org/10.1175/JAM2418.1.
Brill, K. F., and F. Mesinger, 2009: Applying a general analytic method for assessing bias sensitivity to bias-adjusted threat and equitable threat scores. Wea. Forecasting, 24, 1748–1754, https://doi.org/10.1175/2009WAF2222272.1.
Buehner, M., and Coauthors, 2015: Implementation of deterministic weather forecasting systems based on ensemble variational data assimilation at Environment Canada. Part I: The global system. Mon. Wea. Rev., 143, 2532–2559, https://doi.org/10.1175/MWR-D-14-00354.1.
Caron, J.-F., T. Milewski, M. Buehner, L. Fillion, M. Reszka, S. Macpherson, and J. St-James, 2015: Implementation of deterministic weather forecasting systems based on ensemble variational data assimilation at Environment Canada. Part II: The regional system. Mon. Wea. Rev., 143, 2560–2580, https://doi.org/10.1175/MWR-D-14-00353.1.
Casaioli, M., F. Catini, R. Inghilesi, P. Lanucara, P. Malguzzi, S. Mariani, and A. Orasi, 2014: An operational forecasting system for the meteorological and marine conditions in Mediterranean regional and coastal areas. Adv. Sci. Res., 11, 11–23, https://doi.org/10.5194/asr-11-11-2014.
Casati, B., 2010: New developments of the intensity-scale technique within the Spatial Verification Methods Intercomparison Project. Wea. Forecasting, 25, 113–143, https://doi.org/10.1175/2009WAF2222257.1.
Casati, B., G. Ross, and D. Stephenson, 2004: A new intensity-scale approach for the verification of spatial precipitation forecasts. Meteor. Appl., 11, 141–154, https://doi.org/10.1017/S1350482704001239.
Côté, J., S. Gravel, A. Méthot, A. Patoine, M. Roch, and A. Staniforth, 1998: The operational CMC–MRB Global Environmental Multiscale (GEM) model: Part I. Design considerations and formulation. Mon. Wea. Rev., 126, 1373–1395, https://doi.org/10.1175/1520-0493(1998)126<1373:TOCMGE>2.0.CO;2.
Cressman, G. P., 1959: An operational objective analysis system. Mon. Wea. Rev., 87, 367–374, https://doi.org/10.1175/1520-0493(1959)087<0367:AOOAS>2.0.CO;2.
Crocker, R., and M. P. Mittermaier, 2013: Exploratory use of a satellite cloud mask to verify NWP models. Meteor. Appl., 20, 197–205, https://doi.org/10.1002/met.1384.
Davies, T., M. J. P. Cullen, A. J. Malcolm, M. H. Mawson, A. Staniforth, A. A. White, and N. Wood, 2005: A new dynamical core for the Met Office’s global and regional modelling of the atmosphere. Quart. J. Roy. Meteor. Soc., 131, 1759–1782, https://doi.org/10.1256/qj.04.101.
Davis, C., B. Brown, and R. Bullock, 2006: Object-based verification of precipitation forecasts. Part I: Methods and application to mesoscale rain areas. Mon. Wea. Rev., 134, 1772–1784, https://doi.org/10.1175/MWR3145.1.
Davis, C., B. Brown, R. Bullock, and J. Halley-Gotway, 2009: The method for object-based diagnostic evaluation (MODE) applied to numerical forecasts from the 2005 NSSL/SPC spring program. Wea. Forecasting, 24, 1252–1267, https://doi.org/10.1175/2009WAF2222241.1.
Demaria, E. M. C., D. A. Rodriguez, E. E. Ebert, P. Salio, F. Su, and J. B. Valdes, 2011: Evaluation of mesoscale convective systems in South America using multiple satellite products and an object-based approach. J. Geophys. Res., 116, D08103, https://doi.org/10.1029/2010JD015157.
De Sales, F., and Y. Xue, 2011: Assessing the dynamic-downscaling ability over South America using the intensity-scale verification technique. Int. J. Climatol., 31, 1205–1221, https://doi.org/10.1002/joc.2139.
Dorninger, M., and T. Gorgas, 2013: Comparison of NWP-model chains by using novel verification methods. Meteor. Z., 22, 373–393, https://doi.org/10.1127/0941-2948/2013/0488.
Dorninger, M., S. Schneider, and R. Steinacker, 2008: On the interpolation of precipitation data over complex terrain. Meteor. Atmos. Phys., 101, 175–189, https://doi.org/10.1007/s00703-008-0287-6.
Dorninger, M., T. Gorgas, T. Schwitalla, M. Arpagaus, M. Rotach, and V. Wulfmeyer, 2009: Joint D-PHASE-COPS data set (JDC data set). Tech. Doc., 8 pp., https://ral.ucar.edu/projects/icp/Data/JDC/Description_of_JDC_data.pdf.
Dorninger, M., M. P. Mittermaier, E. Gilleland, E. E. Ebert, B. G. Brown, and L. J. Wilson, 2013: MesoVICT: Mesoscale Verification Intercomparison over Complex Terrain. NCAR Tech. Note NCAR/TN-505+STR, 23 pp., https://doi.org/10.5065/D6416V21.
Dubuisson, M.-P., and A. K. Jain, 1994: A modified Hausdorff distance for object matching. Proc. 12th IAPR Int. Conf. on Pattern Recognition, Jerusalem, Israel, IEEE, 566–568.
Duc, L., K. Saito, and H. Seko, 2013: Spatial-temporal fractions verification for high-resolution ensemble forecasts. Tellus, 65A, 18171, https://doi.org/10.3402/tellusa.v65i0.18171.
Dukhovskoy, D. S., J. Ubnoske, E. Blanchard-Wrigglesworth, H. R. Hiester, and A. Proshutinsky, 2015: Skill metrics for evaluation and comparison of sea ice models. J. Geophys. Res. Oceans, 120, 5910–5931, https://doi.org/10.1002/2015JC010989.
Ebert, E. E., 2009: Neighborhood verification: A strategy for rewarding close forecasts. Wea. Forecasting, 24, 1498–1510, https://doi.org/10.1175/2009WAF2222251.1.
Ebert, E. E., and J. L. McBride, 2000: Verification of precipitation in weather systems: Determination of systematic errors. J. Hydrol., 239, 179–202, https://doi.org/10.1016/S0022-1694(00)00343-7.
Ebert, E. E., and W. A. Gallus Jr., 2009: Toward better understanding of the contiguous rain area (CRA) method for spatial forecast verification. Wea. Forecasting, 24, 1401–1415, https://doi.org/10.1175/2009WAF2222252.1.
Eiter, T., and H. Mannila, 1994: Computing discrete Fréchet distance. Christian Doppler Laboratory Tech. Rep. CD-TR 94/64, Vienna University of Technology, 8 pp., www.kr.tuwien.ac.at/staff/eiter/et-archive/cdtr9464.pdf.
Ferretti, R., and Coauthors, 2014: Overview of the first HyMeX Special Observation Period over Italy: Observations and model results. Hydrol. Earth Syst. Sci., 18, 1953–1977, https://doi.org/10.5194/hess-18-1953-2014.
Geiß, S., 2015: Comparison of spatial verification methods. Bachelor thesis, Meteorological Institute Munich, Ludwig-Maximilians-University Munich, 55 pp.
Gilleland, E., 2011: Spatial forecast verification: Baddeley’s delta metric applied to ICP test cases. Wea. Forecasting, 26, 409–415, https://doi.org/10.1175/WAF-D-10-05061.1.
Gilleland, E., 2017: A new characterization in the spatial verification framework for false alarms, misses, and overall patterns. Wea. Forecasting, 32, 187–198, https://doi.org/10.1175/WAF-D-16-0134.1.
Gilleland, E., T. C. M. Lee, J. H. Gotway, R. G. Bullock, and B. G. Brown, 2008: Computationally efficient spatial forecast verification using Baddeley’s delta image metric. Mon. Wea. Rev., 136, 1747–1757, https://doi.org/10.1175/2007MWR2274.1.
Gilleland, E., D. A. Ahijevych, B. G. Brown, B. Casati, and E. E. Ebert, 2009: Intercomparison of spatial forecast verification methods. Wea. Forecasting, 24, 1416–1430, https://doi.org/10.1175/2009WAF2222269.1.
Gilleland, E., D. A. Ahijevych, B. G. Brown, and E. E. Ebert, 2010a: Verifying forecasts spatially. Bull. Amer. Meteor. Soc., 91, 1365–1373, https://doi.org/10.1175/2010BAMS2819.1.
Gilleland, E., J. Lindström, and F. Lindgren, 2010b: Analyzing the image warp forecast verification method on precipitation fields from the ICP. Wea. Forecasting, 25, 1249–1262, https://doi.org/10.1175/2010WAF2222365.1.
Girard, C., and Coauthors, 2014: Staggered vertical discretization of the Canadian environmental multiscale (GEM) model using a coordinate of the log-hydrostatic-pressure type. Mon. Wea. Rev., 142, 1183–1196, https://doi.org/10.1175/MWR-D-13-00255.1.
Gorgas, T., and M. Dorninger, 2012a: Concepts for a pattern-oriented analysis ensemble based on observational uncertainties. Quart. J. Roy. Meteor. Soc., 138, 769–784, https://doi.org/10.1002/qj.949.
Gorgas, T., and M. Dorninger, 2012b: Quantifying verification uncertainty by reference data variation. Meteor. Z., 21, 259–277, https://doi.org/10.1127/0941-2948/2012/0325.
Gorgas, T., M. Dorninger, and R. Steinacker, 2009: High resolution analyses based on the D-PHASE & COPS GTS and non-GTS data set. Ann. Meteor., 44, 94–95.
Harris, D., E. Foufoula-Georgiou, K. K. Droegemeier, and J. J. Levit, 2001: Multiscale statistical properties of a high-resolution precipitation forecast. J. Hydrometeor., 2, 406–418, https://doi.org/10.1175/1525-7541(2001)002<0406:MSPOAH>2.0.CO;2.
Hartung, D. C., J. A. Otkin, R. A. Petersen, D. D. Turner, and W. F. Feltz, 2011: Assimilation of surface-based boundary layer profiler observations during a cool-season weather event using an observing system simulation experiment. Part II: Forecast assessment. Mon. Wea. Rev., 139, 2327–2346, https://doi.org/10.1175/2011MWR3623.1.
Hebert, D. A., R. A. Allard, E. J. Metzger, P. G. Posey, R. H. Preller, A. J. Wallcraft, M. W. Phelps, and O. M. Smedstad, 2015: Short-term sea ice forecasting: An assessment of ice concentration and ice drift forecasts using the U.S. Navy’s Arctic Cap Nowcast/Forecast System. J. Geophys. Res. Oceans, 120, 8327–8345, https://doi.org/10.1002/2015JC011283.
Heinrichs, J. F., D. J. Cavalieri, and T. Markus, 2006: Assessment of the AMSR-E sea ice concentration product at the ice edge using RADARSAT-1 and MODIS imagery. IEEE Trans. Geosci. Remote Sens., 44, 3070–3080, https://doi.org/10.1109/TGRS.2006.880622.
Hersbach, H., 2000: Decomposition of the continuous ranked probability score for ensemble prediction systems. Wea. Forecasting, 15, 559–570, https://doi.org/10.1175/1520-0434(2000)015<0559:DOTCRP>2.0.CO;2.
Johnson, A., X. Wang, F. Kong, and M. Xue, 2011: Hierarchical cluster analysis of a convection-allowing ensemble during the Hazardous Weather Testbed 2009 Spring Experiment. Part I: Development of the object-oriented cluster analysis method for precipitation fields. Mon. Wea. Rev., 139, 3673–3693, https://doi.org/10.1175/MWR-D-11-00015.1.
Keil, C., and G. C. Craig, 2007: A displacement-based error measure applied in a regional ensemble forecasting system. Mon. Wea. Rev., 135, 3248–3259, https://doi.org/10.1175/MWR3457.1.
Keil, C., and G. C. Craig, 2009: A displacement and amplitude score employing an optical flow technique. Wea. Forecasting, 24, 1297–1308, https://doi.org/10.1175/2009WAF2222247.1.
Kloiber, S., 2017: Verification in complex terrain with ensemble-analysis. M.S. thesis, Dept. of Meteorology and Geophysics, University of Vienna, 74 pp.
Lack, S. A., G. L. Limpert, and N. I. Fox, 2010: An object-oriented multiscale verification scheme. Wea. Forecasting, 25, 79–92, https://doi.org/10.1175/2009WAF2222245.1.
Lakshmanan, V., and J. S. Kain, 2010: A Gaussian mixture model approach to forecast verification. Wea. Forecasting, 25, 908–920, https://doi.org/10.1175/2010WAF2222355.1.
Liu, Y., J. Brown, J. Demargne, and D.-J. Seo, 2011: A wavelet-based approach to assessing timing errors in hydrologic predictions. J. Hydrol., 397, 210–224, https://doi.org/10.1016/j.jhydrol.2010.11.040.
Mailhot, J., S. Bélair, B. Bilodeau, Y. Delage, L. Fillion, L. Garand, C. Girard, and A. Tremblay, 1998: Scientific description of RPN Physics Library, version 3.6. Recherche en prévision numérique, 188 pp. [Available from RPN, 2121 Trans-Canada Highway, Dorval, PQ H9P 1J3, Canada.]
Mariani, S., M. Casaioli, E. Coraci, and P. Malguzzi, 2015a: A new high-resolution BOLAM-MOLOCH suite for the SIMM forecasting system: Assessment over two HyMeX intense observation periods. Nat. Hazards Earth Syst. Sci., 15, 1–24, https://doi.org/10.5194/nhess-15-1-2015.
Mariani, S., M. Casaioli, A. Lanciani, S. Flavoni, and C. Accadia, 2015b: QPF performance of the updated SIMM forecasting system using reforecasts. Meteor. Appl., 22, 256–272, https://doi.org/10.1002/met.1453.
Marsigli, C., A. Montani, and T. Paccagnella, 2008: A spatial verification method applied to the evaluation of high-resolution ensemble forecasts. Meteor. Appl., 15, 125–143, https://doi.org/10.1002/met.65.
Marzban, C., S. Sandgathe, H. Lyons, and N. Lederer, 2009: Three spatial verification techniques: Cluster analysis, variogram, and optical flow. Wea. Forecasting, 24, 1457–1471, https://doi.org/10.1175/2009WAF2222261.1.
Mass, C. F., D. Ovens, K. Westrick, and B. A. Colle, 2002: Does increasing horizontal resolution produce more skillful forecasts? Bull. Amer. Meteor. Soc., 83, 407–430, https://doi.org/10.1175/1520-0477(2002)083<0407:DIHRPM>2.3.CO;2.
McTaggart-Cowan, R., 2009: Dphase_cmcgemh: Regional GEM model high resolution forecast run by CMC for the MAP D-PHASE project. World Data Center for Climate, accessed 11 July 2013, https://doi.org/10.1594/WDCC/dphase_cmcgemh.
Milbrandt, J. A., S. Bélair, M. Faucher, M. Vallée, M. Carrera, and A. Glazer, 2016: The Pan-Canadian High Resolution (2.5 km) Deterministic Prediction System. Wea. Forecasting, 31, 1791–1816, https://doi.org/10.1175/WAF-D-16-0035.1.
Mittermaier, M. P., 2014: A strategy for verifying near-convection-resolving forecasts at observing sites. Wea. Forecasting, 29, 185–204, https://doi.org/10.1175/WAF-D-12-00075.1.
Mittermaier, M. P., and N. Roberts, 2010: Intercomparison of spatial forecast verification methods: Identifying skillful spatial scales using the fractions skill score. Wea. Forecasting, 25, 343–354, https://doi.org/10.1175/2009WAF2222260.1.
Mittermaier, M. P., and R. Bullock, 2013: Using MODE to explore the spatial and temporal characteristics of cloud cover forecasts from high-resolution NWP models. Meteor. Appl., 20, 187–196, https://doi.org/10.1002/met.1393.
Mittermaier, M. P., and G. Csima, 2017: Ensemble versus deterministic performance at the kilometer scale. Wea. Forecasting, 32, 1697–1709, https://doi.org/10.1175/WAF-D-16-0164.1.
Mittermaier, M. P., N. Roberts, and S. A. Thompson, 2013: A long-term assessment of precipitation forecast skill using the fractions skill score. Meteor. Appl., 20, 176–186, https://doi.org/10.1002/met.296.
Mittermaier, M. P., R. North, A. Semple, and R. Bullock, 2016: Feature-based diagnostic evaluation of global NWP forecasts. Mon. Wea. Rev., 144, 3871–3893, https://doi.org/10.1175/MWR-D-15-0167.1.
Montani, A., C. Marsigli, and T. Paccagnella, 2009: Dphase_cleps: COSMO-LEPS forecasts run by ARPA-SIM for the MAP D-PHASE project. World Data Center for Climate, accessed 11 July 2013, https://doi.org/10.1594/WDCC/dphase_cleps.
Montani, A., D. Cesari, C. Marsigli, and T. Paccagnella, 2011: Seven years of activity in the field of mesoscale ensemble forecasting by the COSMO-LEPS system: Main achievements and open challenges. Tellus, 63A, 605–624, https://doi.org/10.1111/j.1600-0870.2010.00499.x.
Montani, A., D. Alferov, E. Astakhova, C. Marsigli, and T. Paccagnella, 2014: Ensemble forecasting for Sochi-2014 Olympics: The COSMO-based ensemble prediction systems. COSMO Newsletter, No. 14, Consortium for Small-Scale Modeling, Offenbach, Germany, 88–94, www.cosmo-model.org/content/model/documentation/newsLetters/newsLetter16/cnl16_06.pdf.
Nachamkin, J. E., 2004: Mesoscale verification using meteorological composites. Mon. Wea. Rev., 132, 941–955, https://doi.org/10.1175/1520-0493(2004)132<0941:MVUMC>2.0.CO;2.
Nachamkin, J. E., 2009: Application of the composite method to the Spatial Forecast Verification Methods Intercomparison dataset. Wea. Forecasting, 24, 1390–1400, https://doi.org/10.1175/2009WAF2222225.1.
Nan, Z., S. Wang, X. Liang, T. Adams, W. Teng, and Y. Liang, 2010: Analysis of spatial similarities between NEXRAD and NLDAS precipitation data products. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., 3, 371–385, https://doi.org/10.1109/JSTARS.2010.2048418.
Peli, T., and D. Malah, 1982: A study of edge detection algorithms. Comput. Graphics Image Process., 20, 1–21, https://doi.org/10.1016/0146-664X(82)90070-3.
Pratt, W. K., 1978: Digital Image Processing. Wiley, 750 pp.
Roberts, N. M., and H. W. Lean, 2008: Scale-selective verification of rainfall accumulations from high-resolution forecasts of convective events. Mon. Wea. Rev., 136, 78–97, https://doi.org/10.1175/2007MWR2123.1.
Rotach, M. W., and Coauthors, 2009: MAP D-PHASE: Real-time demonstration of weather forecast quality in the alpine region. Bull. Amer. Meteor. Soc., 90, 1321–1336, https://doi.org/10.1175/2009BAMS2776.1.
Schaffer, C. J., W. A. Gallus Jr., and M. Segal, 2011: Improving probabilistic ensemble forecasts of convection through the application of QPF–POP relationships. Wea. Forecasting, 26, 319–336, https://doi.org/10.1175/2010WAF2222447.1.
Schwedler, B. R. J., and M. E. Baldwin, 2011: Diagnosing the sensitivity of binary image measures to bias, location, and event frequency within a forecast verification framework. Wea. Forecasting, 26, 1032–1044, https://doi.org/10.1175/WAF-D-11-00032.1.
Skamarock, W. C., and Coauthors, 2008: A description of the Advanced Research WRF version 3. NCAR Tech. Note NCAR/TN-475+STR, 113 pp., https://doi.org/10.5065/D68S4MVH.
Skok, G., and N. Roberts, 2016: Analysis of fractions skill score properties for random precipitation fields and ECMWF forecasts. Quart. J. Roy. Meteor. Soc., 142, 2599–2610, https://doi.org/10.1002/qj.2849.
Skok, G., and V. Hladnik, 2018: Verification of gridded wind forecasts in complex alpine terrain: A new wind verification methodology based on the neighborhood approach. Mon. Wea. Rev., 146, 63–75, https://doi.org/10.1175/MWR-D-16-0471.1.
Sobash, R. A., J. S. Kain, D. R. Bright, A. R. Dean, M. C. Coniglio, and S. J. Weiss, 2011: Probabilistic forecast guidance for severe thunderstorms based on the identification of extreme phenomena in convection-allowing model forecasts. Wea. Forecasting, 26, 714–728, https://doi.org/10.1175/WAF-D-10-05046.1.
Steinacker, R., C. Häberli, and W. Pöttschacher, 2000: A transparent method for the analysis and quality evaluation of irregularly distributed and noisy observational data. Mon. Wea. Rev., 128, 2303–2316, https://doi.org/10.1175/1520-0493(2000)128<2303:ATMFTA>2.0.CO;2.
Steinacker, R., and Coauthors, 2006: A mesoscale data analysis and downscaling method over complex terrain. Mon. Wea. Rev., 134, 2758–2771, https://doi.org/10.1175/MWR3196.1.
Steinacker, R., D. Mayer, and A. Steiner, 2011: Data quality control based on self-consistency. Mon. Wea. Rev., 139, 3974–3991, https://doi.org/10.1175/MWR-D-10-05024.1.
Steppeler, J., G. Doms, U. Schättler, H. Bitzer, A. Gassmann, and U. Damrath, 2003: Meso-gamma scale forecasts using the nonhydrostatic model LM. Meteor. Atmos. Phys., 82, 75–96, https://doi.org/10.1007/s00703-001-0592-9.
Tang, Y., H. W. Lean, and J. Bornemann, 2013: The benefits of the Met Office variable resolution NWP model for forecasting convection. Meteor. Appl., 20, 417–426, https://doi.org/10.1002/met.1300.
Taylor, K. E., R. J. Stouffer, and G. A. Meehl, 2012: An overview of CMIP5 and the experiment design. Bull. Amer. Meteor. Soc., 93, 485–498, https://doi.org/10.1175/BAMS-D-11-00094.1.
Theis, S. E., A. Hense, and U. Damrath, 2005: Probabilistic precipitation forecasts from a deterministic model: A pragmatic approach. Meteor. Appl., 12, 257–268, https://doi.org/10.1017/S1350482705001763.
Venugopal, V., S. Basu, and E. Foufoula-Georgiou, 2005: A new metric for comparing precipitation patterns with an application to ensemble forecasts. J. Geophys. Res., 110, D08111, https://doi.org/10.1029/2004JD005395.
Wapler, K., M. Goeber, and S. Trepte, 2012: Comparative verification of different nowcasting systems to support optimisation of thunderstorm warnings. Adv. Sci. Res., 8, 121–127, https://doi.org/10.5194/asr-8-121-2012.
Weniger, M., and P. Friederichs, 2016: Using the SAL technique for spatial verification of cloud processes: A sensitivity analysis. J. Appl. Meteor. Climatol., 55, 2091–2108, https://doi.org/10.1175/JAMC-D-15-0311.1.
Wernli, H., C. Hofmann, and M. Zimmer, 2009: Spatial Forecast Verification Methods Intercomparison Project: Application of the SAL technique. Wea. Forecasting, 24, 1472–1484, https://doi.org/10.1175/2009WAF2222271.1.
Weusthoff, T., F. Ament, M. Arpagaus, and M. W. Rotach, 2010: Assessing the benefits of convection-permitting models by neighborhood verification: Examples from MAP D-PHASE. Mon. Wea. Rev., 138, 3418–3433, https://doi.org/10.1175/2010MWR3380.1.
Wood, N., and Coauthors, 2014: An inherently mass-conserving semi-implicit semi-Lagrangian discretization of the deep-atmosphere global non-hydrostatic equations. Quart. J. Roy. Meteor. Soc., 140, 1505–1520, https://doi.org/10.1002/qj.2235.
Wulfmeyer, V., and Coauthors, 2008: The Convective and Orographically Induced Precipitation Study: A research and development project of the World Weather Research Program for improving quantitative precipitation forecasting in low-mountain regions. Bull. Amer. Meteor. Soc., 89, 1477–1486, https://doi.org/10.1175/2008BAMS2367.1.
Zhu, M., V. Lakshmanan, P. Zhang, Y. Hong, K. Cheng, and S. Chen, 2011: Spatial verification using a true metric. Atmos. Res., 102, 408–419, https://doi.org/10.1016/j.atmosres.2011.09.004.