The medium-range ensemble (ENS) from the European Centre for Medium-Range Weather Forecasts (ECMWF) Integrated Forecasting System (IFS) is used to create two new products intended to face the challenges of winter precipitation-type forecasting. The products are a map product that shows which precipitation type is most likely whenever the probability of precipitation is >50% (also including information on lower probability outcomes) and a meteogram product that shows the temporal evolution of the instantaneous precipitation-type probabilities for a specific location, classified into three categories of precipitation rate. A minimum precipitation rate, set according to type, is also used to distinguish dry from precipitating conditions, in order to try to enforce a zero frequency bias for all precipitation types. The new products differ from other ECMWF products in three important respects: first, the input variable is discretized, rather than continuous; second, the postprocessing increases the output information content; and, third, the map-based product condenses information into a more accessible format. The verification of both products was developed using four months’ worth of 3-hourly observations of present weather from manual surface synoptic observations (SYNOPs) in Europe during the 2016/17 winter period. This verification shows that the IFS is highly skillful when forecasting rain and snow, but only moderately skillful for freezing rain and rain and snow mixed, while the ability to predict the occurrence of ice pellets is negligible. Typical outputs are also illustrated via a freezing-rain case study, showing interesting changes with lead time.
One of the greatest difficulties facing forecasters during the winter season is the accurate identification of precipitation type at ground level (Ralph et al. 2005). Certain types of precipitation can be a threat to human health and public safety and can disrupt travel and commerce, seriously affecting the economy (Ralph et al. 2005; Reeves et al. 2016). Freezing rain (FZRA) is particularly hazardous due to its ice-loading effects on power wires, and because it can make travel extremely dangerous. In the most severe cases with heavy and prolonged freezing precipitation, the consequences can be catastrophic, with collapsed power lines causing prolonged power outages, with travel networks of many types completely paralyzed, and with major long-term damage to infrastructure and vegetation (DeGaetano 2000; Chang et al. 2007; Call 2010). Accurate predictions from weather forecast models of timing (onset and duration), intensity, spatial extent, and phase (i.e., precipitation type) are crucial for decision-making and can help minimize the potential impacts (Branick 1997; Ikeda et al. 2013; Grout et al. 2012; Ikeda et al. 2017). Nevertheless, although it is self-evident that correct predictions of precipitation type are vitally important, only limited attention has been paid to wintertime precipitation-type forecasting in Europe.
There are numerous sources of uncertainty in precipitation-type forecasts; in particular, mixed phases [FZRA, ice pellets (IP), and rain and snow mixed (RASN)] are not well predicted (Wandishin et al. 2005; Reeves et al. 2014; Elmore et al. 2015; Ikeda et al. 2017) and continue to pose a substantial forecast challenge for numerical weather prediction (NWP) models. The thermodynamic structures of the atmosphere in IP and FZRA situations are so similar that small errors in the predicted thicknesses of an elevated melting layer and/or a near-surface subzero layer can result in an incorrect prediction of precipitation type at ground level (Reeves et al. 2014, 2016). Precipitation rate also plays an important role in the correct determination of precipitation type, because melting, which is an integral feature of IP/FZRA situations, will absorb latent heat from the atmosphere and thereby cool and modify the thermodynamic structure in proportion to precipitation rate. Furthermore, precipitation rate is influenced by snowflake type, density, and the degree of riming, as well as by interaction with other particles during passage through different atmospheric layers with different temperature and moisture profiles (Sankaré and Thériault 2016; Reeves et al. 2016; Ikeda et al. 2017). Temporal variability is an added complication since rain (RA) and snow (SN) are in general longer-lived phenomena than FZRA, IP, or RASN. Various authors (e.g., Reeves 2016) have also highlighted the existence of strong biases in precipitation-type forecasts, especially for FZRA and IP (Manikin et al. 2004; Schuur et al. 2012; Reeves et al. 2014), but also with RA and SN. The causes of these systematic errors can in principle be diagnosed using observational data (Reeves et al. 2014).
Forbes et al. (2014) compared two freezing-rain case studies between the last version of the cloud and precipitation parameterizations in Integrated Forecasting System (IFS) cycle 41r1 (2015) and the version in IFS cycle 36r4 (2010). In the newer version, the freezing-rain processes for elevated warm layers are modified. They include a more representative time scale for the refreezing of raindrops that depends on the temperature and crucially on whether the snow particles have completely melted or not (Zerr 1997). Forbes et al. (2014) found large errors in the previous depiction of precipitation type, specifically an inability to predict the extent of the freezing-rain event, whereas the model with the new physics is able to predict freezing rain that is in general agreement with the observations. The present study is an extension of Forbes et al. (2014), who showed the advantages of using ensemble forecasts of precipitation type and a capacity to detect potential freezing-rain areas even with low precipitation rate thresholds.
The correct choice of observations is another important aspect of precipitation-type verification but few authors have paid attention to this topic (Huntemann et al. 2014; Reeves 2016). When the true surface temperature is near to 0°C, small errors in forecasts can have a large impact on the precipitation-type forecast (Reeves 2016). Similarly, height differences between the model and the true orography at observation sites can be another source of uncertainty in the verification process. The main surface observations used by national meteorological services in Europe are surface synoptic observations (SYNOPs), of both automatic and manual types. Manual SYNOP observations are generated by trained observers and are generally accurate with regard to the determination of precipitation type. However, automatic SYNOP station precipitation-type reports are often erroneous, with mixed precipitation types often misdiagnosed (Elmore et al. 2015).
Regarding NWP, sophisticated microphysical parameterization schemes are widely used in high-resolution regional forecast models, which should help with precipitation-type prediction, but even with such complex algorithms, correctly predicting what phase of precipitation ends up at the ground remains a challenging task (Ikeda et al. 2013). The study by Thériault et al. (2010) demonstrated that precipitation type at the ground is highly sensitive to temperature profile variations as small as ±0.5°C, meaning that predictive difficulties are particularly acute within snow–rain transition regions. Other factors such as proximity to water bodies, terrain height variations, or the precipitation rate are very important as well (Stewart 1985; Bernstein 2000; Robbins and Cortinas 2002; Minder and Durran 2011). Some authors consider these uncertainties to be difficult to reduce, but they can potentially be quantified by the use of ensemble forecasts (Cortinas et al. 2002; Manikin et al. 2004; Wandishin et al. 2005; Shafer and Rudack 2014; Scheuerer et al. 2017). Brooks et al. (1996), Wandishin et al. (2005), and Reeves (2016) point out that a more desirable approach to increasing the accuracy of precipitation-type forecasts for mixed precipitation events is to use ensemble prediction to provide probabilistic forecasts of precipitation type. Naturally, this provides the forecaster with a broader perspective on the likelihood of occurrence of different mixed phases during (potential) FZRA episodes. Wandishin et al. (2005) published the first study to investigate extensively the potential use of ensembles for forecasting precipitation types during the winter period. They used 10 ensemble members and examined 0–48-h lead times, showing that ensemble forecasts have the capacity to be of substantial value to potential users and that skill increased with the number of members, especially for mixed-phase precipitation forecasts.
ECMWF IFS ensemble forecasts (ENS) have been operational for over 25 years and currently comprise 1 control and 50 perturbed forecasts running out to 15 days, twice per day. Instantaneous surface precipitation type is one of the ENS output variables, and this takes one of six different values: RA, SN, wet snow (WSN), RASN, FZRA, and IP. In addition, one can compute the total instantaneous precipitation rate at the surface by summing together convective/stratiform precipitation rate values valid at a particular time (rather than the typical averaged or accumulated over a period), and this can be combined with the precipitation-type diagnostic. In the IFS model the melting and refreezing parameterizations must be formulated in terms of its prognostic variables for precipitation: rain and snow. Precipitation type is diagnosed from the ratio of rain and snow at the surface and the profile of precipitation and temperature above. Whether the precipitation refreezes or not in the cold air below also affects the temperature profile, resulting in colder temperatures in the layer if the particles remain as supercooled water. The latent heat of fusion is instead transferred to the surface with the rain freezing on impact, and this leads to a relative warming of the surface and near-surface temperatures. More information about the algorithms can be found in Forbes et al. (2014).
As part of ECMWF’s contribution to the Enhancing Emergency Management and Response to Extreme Weather and Climate Events (ANYWHERE) project, two new products have been developed in this study based on ENS forecasts of precipitation type: a map showing the most probable precipitation type (PREFptype) and a meteogram showing the probability of precipitation type (PROBptype) for a user-selected site. These new products highlight the advantages of using ENS forecasts to infer precipitation type, especially in more challenging situations where there is a risk of SN or FZRA. We have also developed a methodology for verifying precipitation-type forecasts using SYNOP observations. The instantaneous precipitation rate variable, which can acquire minuscule values, is also used concurrently and in a new way to define cutoffs between precipitating and dry conditions. In using this rate variable, we aimed to minimize, for each precipitation type, any frequency biases relative to SYNOP reports. In turn this approach should reduce the numbers of misses and/or false alarms. The methods and datasets used in the creation and validation of these new products are explained in section 2. Sections 3 and 4 describe the verification procedure and results for the meteogram and map products, respectively. Section 5 describes a FZRA case study using both products. Concluding remarks are provided in section 6.
2. Datasets and products
a. Forecasts and products
At ECMWF, the precipitation-type output variable is provided by the IFS deterministic/high-resolution (HRES) and ENS runs. This variable describes the type of precipitation falling at the surface at each forecast time step (and denotes dry if the total precipitation rate is zero). RA, SN, FZRA, and IP have been considered in other studies related to precipitation-type forecasts (Wandishin et al. 2005; Reeves et al. 2014; Elmore et al. 2015). However, RASN (partly melted falling snow) has not been considered in these studies. The version of IFS used for this study was cycle 43r1 (operational at ECMWF from November 2016 to July 2017) with 1 control and 50 perturbed forecasts (the ENS). For the verification of the new ECMWF precipitation-type products we used the 0000 UTC base time and evaluated up to a lead time of 168 h. In this ENS configuration, the spatial resolution is approximately 18 km while the temporal resolution of the output available internally at ECMWF is 1 hourly from T + 0 to 90 h, 3 hourly from T + 90 to 144 h, and 6 hourly from T + 144 h onward. For the verification we split the 7 days into 24-h periods and focused on the domain 33°–71°N, 11°W–35°E. HRES precipitation-type forecasts have given (and continue to give) impactful information for the users for winter precipitation forecasting (Forbes et al. 2014). However, the new tools described in this paper go further and provide information even when the probabilities of occurrence are very low, and they add more detailed information specific to each precipitation type.
The PREFptype maps (Figs. 1a,b) show, in color, which of the six precipitation types is most likely whenever the probability of any type of precipitation is >50%. Shading darkness is then used to denote what the probability (of the type denoted by the color) actually is, in three probability ranges: up to 50%, 50%–70%, and more than 70%. To expand on this, particularly for longer leads when probabilities above 50% become increasingly rare, we also use two gray shades to denote when the probability of any type of precipitation is 10%–30% or 30%–50%. This overall structure has been carefully designed to try to extract, compress, and display as much information as possible from the input ensemble data, in a meaningful format, taking into account the requirements of what might be a typical user. Figures 1a and 1b show example forecast output for a winter storm in the northeastern United States, valid at 1200 UTC 14 March 2017, with base times of 0000 UTC 14 March and 0000 UTC 11 March, respectively. Both PREFptype maps show a similar structure but note, first, that FZRA and IP were the most probable types in some regions in the shortest-range forecast (Fig. 1a) but not 3 days in advance (Fig. 1b), and second that for the longer lead time (Fig. 1b) we see more gray areas, which is typical. Another design aspect for the user to appreciate is that whenever the lightest shade of a given color (except gray) appears on a map, the user immediately knows that more than one precipitation type has been predicted at that time, which can serve as an initial alarm bell for “uncertainty.” In particular, one can see examples of this in Fig. 1b. We envisage that users might first view this product and display an animation over a range of lead times, and then focus in on a particular location/event using the meteogram product (Figs. 1c,d), which provides much more detailed probabilistic information.
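As an illustration only, the map logic described above can be sketched for a single grid point as follows. The integer type codes, the `DRY` marker, the shade labels, and the handling of values that fall exactly on a class boundary are assumptions made for this sketch and are not taken from the operational product; the type probability is assumed to be computed relative to all ensemble members.

```python
import numpy as np

TYPES = ["RA", "SN", "WSN", "RASN", "FZRA", "IP"]  # the six ENS precipitation types
DRY = -1  # hypothetical code for a member with no precipitation


def pref_ptype(member_types):
    """Return (type, shade) for one grid point.

    member_types: one integer per ensemble member, either an index
    into TYPES or DRY. Boundary handling is an assumption.
    """
    member_types = np.asarray(member_types)
    n_members = member_types.size
    wet = member_types != DRY
    p_any = wet.sum() / n_members  # probability of any precipitation

    if p_any > 0.5:
        # Color: most frequent type among the precipitating members.
        counts = np.bincount(member_types[wet], minlength=len(TYPES))
        best = int(counts.argmax())
        p_best = counts[best] / n_members
        if p_best > 0.7:
            shade = "dark"
        elif p_best > 0.5:
            shade = "medium"
        else:
            shade = "light"  # implies that other types were also predicted
        return TYPES[best], shade

    # Gray shades: precipitation possible but not likely.
    if p_any >= 0.3:
        return None, "gray 30-50%"
    if p_any >= 0.1:
        return None, "gray 10-30%"
    return None, "blank"


# Example with 51 members: 30 SN, 10 RA, 11 dry.
print(pref_ptype([1] * 30 + [0] * 10 + [DRY] * 11))
```

In this sketch the lightest color shade can only occur when the leading type has a probability of at most 50% while the probability of any precipitation exceeds 50%, which is why it signals that more than one type was predicted.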
The PROBptype or meteogram product itself (Figs. 1c,d) depicts the temporal evolution of probabilities for a specific location in bar chart format. Here, the shading provides much more detail regarding the probabilities for different precipitation types and also includes information pertaining to the instantaneous total precipitation rate (another IFS variable), which can be key for determining the severity of impacts of, for example, potential freezing-rain or snowfall events. We define three different categories of precipitation type depending on the precipitation rate, from Rmin to 0.2 mm h−1 (low intensity, where Rmin is the minimum permissible rate for each precipitation type), from 0.2 to 1 mm h−1 (moderate intensity), and greater than 1 mm h−1 (high intensity). One clear example of the advantages of this product can be seen in the New York meteograms; the first one is valid from 14 to 18 March 2017 (Fig. 1c) and the second one from 11 to 15 March 2017 (Fig. 1d). Although both meteograms forecasted relatively heavy snowfall (intensities greater than 1 mm h−1) in New York City on 14 March, the most recent prediction (Fig. 1c) shows higher probabilities (close to 100%) of heavy snowfall during several hours and includes some probabilities of freezing rain. However, 3 days in advance of the event, probabilities of snow were mixed with wet snow and even with some rain, but provided no evidence of FZRA.
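The probabilities behind one meteogram bar can be sketched as below, for a single precipitation type at a single time step. This is a minimal sketch rather than the operational code: the function name, the argument layout, and the treatment of rates that fall exactly on 0.2 or 1 mm h−1 are assumptions.

```python
import numpy as np


def rate_category_probs(rates, is_type, rmin):
    """Probability of one precipitation type in three intensity classes.

    rates:   instantaneous total precipitation rate per member (mm/h)
    is_type: True where the member forecasts the type of interest
    rmin:    minimum permissible rate for this type (mm/h)
    Edge handling (>=, <=) is an assumption of this sketch.
    """
    rates = np.asarray(rates, dtype=float)
    is_type = np.asarray(is_type, dtype=bool)
    n_members = rates.size

    low = is_type & (rates >= rmin) & (rates < 0.2)
    moderate = is_type & (rates >= 0.2) & (rates <= 1.0)
    high = is_type & (rates > 1.0)

    # Members below rmin are treated as dry and add to no class.
    return {
        "low": low.sum() / n_members,
        "moderate": moderate.sum() / n_members,
        "high": high.sum() / n_members,
    }


# Four hypothetical members, all forecasting the same type.
probs = rate_category_probs([0.01, 0.1, 0.5, 2.0], [True] * 4, rmin=0.05)
print(probs)
```

Stacking the three class probabilities for every type at every time step would give the bar chart layout described above.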
The constant Rmin was defined on the basis of a 4-month verification training period that utilized present weather reports from SYNOP observations. For each type of precipitation the objective was to come as close as possible to removing any biases at all lead times beyond day 1 (as far as the wintertime SYNOP observations were concerned). The procedure adopted was iterative, wherein five values of the precipitation rate threshold were applied in order to calculate the bias for each precipitation type: 0.02, 0.05, 0.07, 0.1, and 0.12 mm h−1. For our computations, we considered each ensemble member as a binary forecast result (occurrence/nonoccurrence), obtaining 51 forecasts eight times per day (3 hourly), giving a total of 408 forecasts per day in each location. Each precipitation type was studied and verified probabilistically for the PROBptype product, while the data from PREFptype were considered to constitute binary observations (occurrence or nonoccurrence) in the verification.
During the design of the products, we received continuous feedback from volunteer forecasters, adding extra features that they requested, as well as removing information that was not relevant for them. Also training courses were used to show how the products could/should be used and to get further feedback regarding how the products were actually being interpreted. In these various ways the products have been optimized for potential users, and at the same time those users have been and will continue to be trained in their usage.
The verification of the two new products was performed exclusively using 3-hourly observations of present weather from manned SYNOP stations in Europe. Despite the high density of automatic SYNOP stations in Europe, we refrained from using them in this study because the present weather sensing and coding can be unreliable, particularly in mixed-phase precipitation scenarios (Elmore et al. 2015). The period analyzed ran from 15 October 2016 to 15 February 2017 (4 months over winter). The aim here was to assess the most recent ECMWF model cycle running over a winter period (cycle 43r1). Although both products were originally created with 1-h time resolution, only 3-hourly verification was possible as a result of the absence of more frequent SYNOP manual observations in ECMWF archives. One of the difficulties encountered in this validation was the inconsistent frequency of present weather observations for the different stations in the archives [an issue also noted by Carriére et al. (2000)]. The total number of stations used in this study is 1050 (Fig. 2). However, not all stations are open 24 h a day, so the nominal maximum frequency of SYNOP observations providing a current weather group during the study period varied with time of the day. The original present weather reports were classified into one of five different categories: RA, SN, RASN, FZRA and IP (Table 1). WSN was not considered separately because of the lack of direct observations for its verification; instead, the WSN forecasts were classified as SN. In future work one could conceivably use measurements of 2-m temperature (as in the IFS code) or visibility (see Ludlam 1980) as the basis for separating SN from WSN, but here we retain the simpler approach. Some types of present weather observations were easy to classify, but the classification of mixed-phase precipitation was not so straightforward. For example, how large a fraction of water do you need before RASN becomes RA or, for that matter, SN? 
Also, not all SYNOP present weather reports had an exclusive classification, meaning that the same observation could correspond to two or three different categories (e.g., “rain or drizzle and/or snow,” with codes 68 and 69, are included in both categories—RA and RASN). Slight freezing drizzle (code 56) was considered “no precipitation”; however, slight continuous not freezing drizzle (code 51) was included in the RA category (Table 1). This decision was made after many tests during the verification process in an attempt to avoid including extra bias due to the wrong classification of the observations (not shown). As we described in section 1, one important source of uncertainty is the height difference between the model and observations, which is critical in near-freezing temperatures. For this reason, SYNOP stations with an altitude difference of more than 200 m relative to the closest ENS point were removed from the verification.
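The two screening rules above can be illustrated with a short sketch. Only the four present weather codes named in the text are mapped here; the full mapping is in Table 1 and is deliberately not reconstructed. The names `PARTIAL_CODE_MAP`, `categories_for`, and `keep_station` are hypothetical.

```python
# Partial mapping from SYNOP present weather code to verification
# categories, limited to the codes discussed in the text.
PARTIAL_CODE_MAP = {
    51: {"RA"},          # slight continuous drizzle, not freezing
    56: set(),           # slight freezing drizzle: treated as no precipitation
    68: {"RA", "RASN"},  # non-exclusive report, counted in both categories
    69: {"RA", "RASN"},
}


def categories_for(code):
    """Return the set of categories for a code, or None if not listed here."""
    return PARTIAL_CODE_MAP.get(code)


def keep_station(station_alt_m, model_alt_m, max_diff_m=200.0):
    """Keep a station unless its altitude differs by more than max_diff_m
    from that of the closest ENS grid point."""
    return abs(station_alt_m - model_alt_m) <= max_diff_m


print(categories_for(68))       # a report counted in two categories
print(keep_station(350, 100))   # 250 m difference, so the station is removed
```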
3. Verification of ENS precipitation type (meteogram product)
a. Reducing systematic bias
Here, we first describe the methodology for rate-related frequency bias correction for the precipitation-type variable; that is, we define Rmin for each precipitation type. The target of this procedure was to make the total frequency of occurrence of each precipitation type, within forecasts, over all the observation sites, equal to the observed frequency of occurrence at those sites (i.e., frequency bias = 1). We examined multiple lead times, anticipating that there could be some lead-time-based drift in frequencies, though not expecting this would be a major factor. Figure 3a shows the frequency bias calculated over our 4-month wintertime verification training period, as a function of different precipitation rate thresholds (0.02, 0.05, 0.07, 0.1, and 0.12 mm h−1) at 24–48-h lead times. This day-2 lead was initially used as a focal point instead of day 1 to avoid any model spinup effects and also in recognition of the fact that ECMWF’s primary goal is not short-range forecasting. Naturally, frequency bias diminishes as the precipitation rate threshold increases. All precipitation types, except IP, present a positive frequency bias with a precipitation rate threshold of 0.02 mm h−1, which suggests that this limit is too low as it leads to overprediction. Indeed, the bias reaches almost 2 for RA. A value of Rmin = 0.05 mm h−1 seems to be the most suitable limit for the SN and FZRA forecasts, giving a bias close to 1. However, for RASN and RA we need a higher rate and set Rmin = 0.1 mm h−1 for RASN and 0.12 mm h−1 for RA. So, from a bias-reduction perspective it is clearly beneficial to apply different precipitation rate thresholds for each precipitation type. IP exhibits a different behavior compared to the other precipitation types: whatever the precipitation rate threshold, a large underestimation occurs.
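The threshold search can be sketched as follows for one precipitation type. This is a sketch under stated assumptions rather than the verification code used in the study: the array layout and function names are hypothetical, each member is counted as one binary forecast, and each observation is counted once per member so that an unbiased ensemble gives a ratio of 1.

```python
import numpy as np


def frequency_bias(rates, is_type, observed, threshold):
    """Frequency bias of one type for a given rate threshold.

    rates:    (n_cases, n_members) instantaneous rates in mm/h
    is_type:  (n_cases, n_members) True where a member forecasts the type
    observed: (n_cases,) True where the SYNOP report is of this type
    Undefined (division by zero) if the type was never observed.
    """
    rates = np.asarray(rates, dtype=float)
    is_type = np.asarray(is_type, dtype=bool)
    observed = np.asarray(observed, dtype=bool)
    forecast_yes = (is_type & (rates >= threshold)).sum()
    observed_yes = observed.sum() * rates.shape[1]
    return forecast_yes / observed_yes


def pick_rmin(rates, is_type, observed,
              thresholds=(0.02, 0.05, 0.07, 0.1, 0.12)):
    """Return the candidate threshold whose bias is closest to 1."""
    biases = {t: frequency_bias(rates, is_type, observed, t)
              for t in thresholds}
    best = min(biases, key=lambda t: abs(biases[t] - 1.0))
    return best, biases


# Tiny synthetic example: 4 cases, 2 members, type observed in 2 cases.
rates = [[0.03, 0.06], [0.08, 0.20], [0.01, 0.01], [0.50, 0.11]]
is_type = np.ones((4, 2), dtype=bool)
observed = [True, True, False, False]
print(pick_rmin(rates, is_type, observed))
```

In the synthetic example the bias falls monotonically as the threshold rises, which mirrors the behavior described for Fig. 3a; the numbers themselves are invented for illustration.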
The frequency biases for ENS at 96–120-h lead times (Fig. 3b) are quite similar to those at 24–48-h lead times (Fig. 3a), suggesting that the bias is not in general a function of lead time (beyond day 1), and so we apply the same Rmin values across the lead-time range. Although only two lead-time ranges are shown here, we did in fact analyze seven sets (from 0–24 to 144–168 h), and very similar results were obtained for all of them. Equally, we actually found no evidence of spinup effects on day 1, which from both a user and a modeler’s perspectives is encouraging (not shown).
To evaluate the advantages of postprocessing using Rmin in probabilistic forecasts of precipitation type, reliability diagrams for each precipitation type with different thresholds were constructed. Reliability diagrams (Murphy and Winkler 1977; Wilks 1995) compare the forecast probabilities against the frequency of an event occurrence and therefore measure how closely the forecast probabilities of an event correspond to the actual chance of the event occurring. In this section, two different lead times for evaluating performance are again considered (24–48 and 96–120 h), with two different precipitation rate thresholds (see Fig. 4). The black solid diagonal line represents perfect reliability. RA forecasts are reasonably reliable for both lead times and Rmin settings (Fig. 4a), though the larger Rmin (blue) gives better results throughout. The SN forecasts are also reasonably reliable (Fig. 4b), but if the larger Rmin setting was used, and probabilities were low (10%–30% say), too many events would be missed. So results for both RA and SN are consistent with the recommendations from the previous section regarding Rmin. The FZRA (Fig. 4c) and RASN (Fig. 4d) forecasts are not good but show some limited skill, though the sample size seems insufficient to highlight the benefits of the recommended Rmin values. Also of note is the fact that high probability forecasts of either FZRA or RASN, though rarely occurring, are generally far too confident. This is a typical characteristic of reliability diagrams for parameters that are generally not well predicted. Finally, Fig. 4e shows that probabilistic forecasts for IP cannot be relied upon.
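For readers unfamiliar with reliability diagrams, the points on such a curve can be computed as in the sketch below. The choice of 10 equal-width probability bins and the function name are assumptions; the binning used for Fig. 4 is not stated in the text.

```python
import numpy as np


def reliability_curve(probs, observed, n_bins=10):
    """Mean forecast probability and observed frequency per probability bin.

    probs:    forecast probabilities in [0, 1], one per forecast case
    observed: True where the event occurred
    Empty bins are skipped. Returns (mean_prob, obs_freq, counts).
    """
    probs = np.asarray(probs, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Interior edges only, so that a probability of 1.0 lands in the last bin.
    idx = np.digitize(probs, edges[1:-1])

    mean_prob, obs_freq, counts = [], [], []
    for b in range(n_bins):
        sel = idx == b
        if sel.any():
            mean_prob.append(probs[sel].mean())
            obs_freq.append(observed[sel].mean())
            counts.append(int(sel.sum()))
    return np.array(mean_prob), np.array(obs_freq), np.array(counts)


# A perfectly reliable forecast lies on the diagonal: obs_freq == mean_prob.
mp, of, n = reliability_curve([0.05, 0.05, 0.95, 0.95],
                              [False, False, True, True])
print(mp, of, n)
```

The bin counts are returned alongside the curve because, as noted for FZRA and RASN, points based on very few high-probability forecasts should be interpreted with caution.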
Clearly, sample size affects the above results (Table 1). Frequencies of IP and FZRA are very low compared with, for example, RA. On the other hand, from the perspective of severe winter weather prediction, it is somewhat encouraging that in spite of this FZRA forecasts, at least, do have some reliability at day 2. This is probably because there is more spatial continuity/extent in FZRA during an FZRA event than there would be for IP during an IP event. Examining the model output, one tends to see that IP conditions are predicted to occur in narrow bands (i.e., having a relatively one-dimensional structure), which makes predictive accuracy very vulnerable to slight lateral displacement errors (and also explains the very low frequencies of higher probability forecasts). FZRA zones are typically more two-dimensional.
One disadvantage of postprocessing in general is that there is always a need to recalibrate each time a related significant change is made in a new model cycle. However, at ECMWF the experimental runs covering many winter months are always carried out in advance of the release of a new cycle, and a verification tool has been automated to recalibrate the products in case the biases change significantly. This will allow the recalibration of Rmin to also be done in advance, and, as illustrated above, four months’ worth of reruns should be sufficient for this purpose. It is highly probable that the bias varies seasonally (e.g., we would expect a larger rate threshold for RA in summer); however, this new tool is mainly intended for use in the winter season, so we have prioritized the correction for this period of the year.
b. ROC curves
The relative (or receiver) operating characteristic (ROC) diagram (Mason 1982) is widely used to evaluate the quality of probabilistic forecasts (Stanski et al. 1989; Buizza and Palmer 1998; Mason and Graham 1999). It plots the hit rate H against the false alarm rate F, for different probability thresholds. The main diagonal corresponds to random forecasts (H = F), and the area under the ROC curve (AUC; Hanley and McNeil 1982) is taken as a measure of skill, with values between 0.5 (random forecast) and 1 (perfect forecast). For the verification of each precipitation type, we first apply the Rmin filter for each precipitation type, as discussed above. Following the filtering, the verification is performed without taking into account the intensity of the precipitation, only the probabilities of occurrence. Although ROC curves do not of themselves provide any measure of reliability, in our case we have tried to maximize reliability using the Rmin thresholds. However, the use of this approach does not mean that we will get perfect reliability at every probability threshold.
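The construction of a ROC curve and its AUC can be sketched as follows. The 2% probability thresholds follow the spacing used for Fig. 5, while the function names and the trapezoidal integration (with the end points (0, 0) and (1, 1) added) are assumptions of this sketch.

```python
import numpy as np


def roc_points(probs, observed, thresholds):
    """False alarm rate F and hit rate H for each probability threshold.

    Requires at least one observed event and one nonevent;
    otherwise H or F is undefined.
    """
    probs = np.asarray(probs, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    F, H = [], []
    for t in thresholds:
        warn = probs >= t
        hits = (warn & observed).sum()
        misses = (~warn & observed).sum()
        false_alarms = (warn & ~observed).sum()
        correct_negatives = (~warn & ~observed).sum()
        H.append(hits / (hits + misses))
        F.append(false_alarms / (false_alarms + correct_negatives))
    return np.array(F), np.array(H)


def auc(F, H):
    """Area under the ROC curve by the trapezoidal rule."""
    f = np.concatenate(([0.0], F, [1.0]))
    h = np.concatenate(([0.0], H, [1.0]))
    order = np.lexsort((h, f))  # sort by F, then by H
    f, h = f[order], h[order]
    return float(np.sum(np.diff(f) * (h[1:] + h[:-1]) / 2.0))


thresholds = np.arange(0.02, 1.0001, 0.02)
F, H = roc_points([0.9, 0.8, 0.1, 0.0], [True, True, False, False], thresholds)
print(auc(F, H))
```

A forecast that gives the same probability to every case produces points only at (0, 0) and (1, 1), so its AUC is 0.5, which corresponds to the random-forecast diagonal.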
ROC curves for each precipitation type and at seven different lead times are shown in Fig. 5, wherein probability thresholds were assigned at 2% intervals (although labels were added only at 10% intervals, for the shortest-range forecasts; shown in red). The AUC score for each category and each time step is indicated in the bottom-right corner of each diagram. The ROC curves for RA (Fig. 5a) and for SN (Fig. 5b) seem quite similar; however, RA is a bit more skillfully predicted than SN. The differences between the RA and SN ROC plots are clearer in the AUC values (gray boxes). AUC values for RA precipitation (Fig. 5a) are between 0.90 at 0–24-h lead time and 0.81 at 144–168-h lead time. In the SN case (Fig. 5b), these values range from 0.87 to 0.83 at the same lead times. From days 5 to 7 the RA AUC is slightly lower than the SN AUC, as a result of the fact that F is greater for RA than for SN at these lead times. As in the analysis of the reliability (Fig. 4), the ROC curves for FZRA (Fig. 5c) and RASN (Fig. 5d) are quite similar, probably because of their lower frequencies in the study sample. For FZRA and RASN the first day exhibits slightly less skill in the ROC curves when compared with the second day. One hypothesis could be that spinup in precipitation processes is to blame, although with just this information we cannot be sure, and indeed the frequency bias adjustment procedure described above suggested that spinup was not a major issue. The AUC index for FZRA varies between 0.72 at 0–24 h and 0.59 at 144–168 h, indicating slight skill, especially at earlier lead times. RASN (Fig. 5d) shows similar overall skill to FZRA; however, it is slightly worse than FZRA at shorter lead times and better at longer lead times. In fact, the skill in RASN forecasts using this metric does not vary much with lead time. Finally, the ability to predict IP is almost negligible, with all curves close to the diagonal.
Wandishin et al. (2005) published one of the first studies documenting the generation and verification of a precipitation-type short-range ensemble forecast product for the winter season using temperature vertical profile forecasting. That study shows much better results in the AUC, with values around 0.95 for RA, 0.86 for FZRA, 0.96 for SN, and 0.80 considering IP. This significant difference in the results is because they only considered sites at which precipitation was both observed and forecast by at least one ensemble member. In the present study, we wanted to evaluate the precipitation-type forecast, including the no-occurrence case, while in their study the problem of precipitation type was separated from the occurrence/nonoccurrence.
c. Relative economic value
Previous subsections contained several verification indices that provide the user with information about the usefulness of the PROBptype. The precipitation-type forecast value itself can be evaluated using a standard cost–loss model (Richardson 2000; Wilks 2001) or with more complex methodologies that incorporate the uncertainty in forecast probability derived from an ensemble, as Allen and Eckel (2012) propose. In this paper we only use the simpler first approach. The basic premise of the cost–loss problem is that a decision-maker is faced with the uncertain prospect of some kind of weather event. The user will be able to protect against the effects of this event, which incurs a cost, while the opposite scenario—occurrence of the event without protective action—results in a loss to the user. The protection cost occurs whenever the final decision is to protect, whether or not the weather event occurs. However if no protective action is taken and the event does not occur, the economic impact is 0 (Wilks 2001). The users will have available the probability of occurrence p of this weather event for decision-making. If protection is chosen, the cost will be incurred (with a probability of 1), but the loss will be 0. If protection is not chosen, loss will be suffered with probability p. Therefore, the optimal time to pay for protective action is when the probability of occurrence of the event is more than the user’s cost–loss ratio. So, the “relative value” of a forecast system is defined as the reduction in expenditure that it would lead to divided by the reduction that would be achieved by using a perfect forecast.
The benefit of the probabilistic approach is demonstrated in the cost–loss model by the flexibility it adds to the choice of a decision-making strategy: for a low cost–loss ratio application, it is a good choice to take action even for low forecast probabilities. On the other hand, for large cost–loss ratio applications, costs can be reduced by taking action only for forecast probabilities close to 100%. The envelope of the value added by all possible strategies (one for each threshold value of the probability) is shown by the full curve on the relative value diagram (see Fig. 6 for examples). Two parameters can summarize the information provided by cost–loss value plots: the maximum relative value and the width of the value curve. The second parameter provides information about the range of users for whom the forecast system would provide positive value (the wider the curve, the more users will obtain benefit from the product, assuming there is a somewhat even spread of users’ cost–loss ratios).
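The envelope and its two summary parameters can be sketched as follows. This is a minimal Python illustration under stated assumptions: the hit rates and false alarm rates per probability threshold are invented for the example, the relative value uses the commonly used per-unit-loss formulation, and the width is taken simply as the span of cost–loss ratios on the sampled grid where the envelope is positive.

```python
def value_envelope(hit_rates, false_alarm_rates, base_rate, ratios):
    """Best relative value over all probability thresholds, per cost-loss ratio."""
    s = base_rate
    envelope = []
    for alpha in ratios:
        e_climate = min(alpha, s)   # cheaper of always / never protecting
        e_perfect = s * alpha       # protect only when the event occurs
        best = max(
            (e_climate - (f * (1 - s) * alpha + h * s * alpha + (1 - h) * s))
            / (e_climate - e_perfect)
            for h, f in zip(hit_rates, false_alarm_rates)
        )
        envelope.append(best)
    return envelope


# Hypothetical (H, F) pairs for three probability thresholds, low to high.
hit_rates = [0.95, 0.80, 0.50]
false_alarm_rates = [0.40, 0.10, 0.02]
ratios = [i / 100 for i in range(1, 100)]   # cost-loss ratios 0.01 ... 0.99

envelope = value_envelope(hit_rates, false_alarm_rates, 0.3, ratios)
max_value = max(envelope)                   # maximum relative value
useful = [a for a, v in zip(ratios, envelope) if v > 0]
width = max(useful) - min(useful)           # width of the value curve
```

In this sketch the low threshold gives the best value for users with small cost–loss ratios and the high threshold for users with large ones, which is the flexibility described above.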
We show the cost–loss ratio curves for four precipitation types: RA (Fig. 6a), SN (Fig. 6b), FZRA (Fig. 6c), and RASN (Fig. 6d), at six different lead-time ranges, from 0–24 to 144–168 h. These relative values have been calculated with a sample climatology of four winter months, as is the case for the rest of the analysis in this paper. IP cases have been removed from this study due to the marginal utility of their cost–loss results for users; however, RASN was kept due to the close connection with SN events. The relative values for RA (Fig. 6a) and SN (Fig. 6b) show quite similar shapes in the plots although there are some differences, especially in their behavior at different lead times. At 0–24-h lead time, the maximum relative value is between 0.6 and 0.7 for RA and slightly higher for SN. While the maximum relative value decreases with increasing lead time for RA, for SN it remains quite similar up to T + 72 h. Beyond T + 72 h, it is reduced more slowly for SN than for RA. This is similar to the behavior we saw in AUC for these precipitation types (Figs. 5a,b). At the same time, for SN, for lead times from 24 to 96 h, the width of the value curve is greater than it is for RA, with cost–loss values extending up to about 0.8, meaning that this product’s SN forecasts would be useful for a greater range of users than would its RA forecasts. FZRA and RASN predictions (Figs. 6c,d, with reduced scale ranges compared to Figs. 6a,b) show smaller relative values for the first lead-time range and cease to have value for users beyond 48 and 24 h, respectively. This verification was developed by comparing the exact time and location of the observations in relation to the forecasts, so predictions only slightly displaced in space or time count as incorrect (which can often happen for short-lived events and for FZRA cases, as an example in section 5 will show). Also, FZRA events are rare, implying that the user should be ready to take action on the basis of forecasts of low probability of occurrence.
Although this study has not compared the relative value for the ECMWF ENS with the relative value of using just a single deterministic forecast (e.g., from HRES here), Richardson (2000) demonstrated that the added value of large ensembles, like the ECMWF ENS, is particularly important for users with low cost–loss ratios and for rarer events, because of the ensemble’s ability to sample the tails of probability space.
4. Verification of favored precipitation type (map product)
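The PREFptype was verified as a dichotomous (yes–no) forecast, and it was applied only for colored areas (well-defined precipitation type). Performance diagrams (Roebber 2009) can relate four verification indices in the same plot: H, success ratio (SR), frequency bias (FB), and critical success index (CSI; also known as threat score). This diagram is similar to a Taylor diagram (Taylor 2001) but is useful for dichotomous (yes–no) forecasts. Based on a 2 × 2 contingency table (Table 2), with its entries written here as hits, false alarms, and misses, these scores are defined as
$$H = \frac{\text{hits}}{\text{hits} + \text{misses}}, \qquad SR = 1 - FAR = \frac{\text{hits}}{\text{hits} + \text{false alarms}},$$
$$FB = \frac{\text{hits} + \text{false alarms}}{\text{hits} + \text{misses}}, \qquad CSI = \frac{\text{hits}}{\text{hits} + \text{false alarms} + \text{misses}},$$
where the false alarm ratio (FAR) is
$$FAR = \frac{\text{false alarms}}{\text{hits} + \text{false alarms}}.$$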
These indices are mathematically related, and the geometrical representation in a single diagram allows accuracy, bias, reliability, and skill to be simultaneously visualized. Figure 7 is a performance diagram showing results for all precipitation types at each lead time. Dashed lines represent bias scores with labels on the outward extension of the line, and labeled solid contours are for the CSI. Green dots correspond to RA, blue dots to SN, red to FZRA, turquoise to RASN, and orange indicates IP. The different dot sizes represent different lead times, so the smaller the point, the longer the lead time. In the original conceptualization of this diagram, a perfect forecast would lie in the top-right corner; however, this is a postprocessed product where we obtained the PREFptype, thereby eliminating the possibilities of other precipitation types, so the verification results are in effect portrayed for the product itself, and not specifically for the precipitation-type variable in the ENS. For RA and SN (Fig. 7), the earliest lead times are clustered toward the center of the diagram, close to the bias = 1 line, especially RA at 0–24-h lead time (the biggest green dot) with maximum H between 0.5 and 0.6 (and with a similar result for SR). For the same precipitation types, values of CSI between 0.3 and 0.4 are observed, decreasing to 0.1 as we move on to day-6 forecasts. As seen in the probability of precipitation-type ROC curves (Fig. 5), the skill levels for FZRA and RASN on the PREFptype product are low, but there is still some predictability. In this case, the H is lower (values not higher than 0.2) than with the probability product (more than 0.4 in Fig. 5c). Finally, the forecast skill of this product is minimal for RASN and completely negligible for IP.
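The four indices shown on a performance diagram follow directly from the counts in a 2 × 2 contingency table. The short Python sketch below uses the standard definitions (Roebber 2009); the counts are hypothetical and chosen only to illustrate the relationships between the scores.

```python
def performance_scores(hits, false_alarms, misses):
    """Scores displayed on a performance diagram, from contingency-table counts."""
    h = hits / (hits + misses)                    # hit rate
    sr = hits / (hits + false_alarms)             # success ratio = 1 - FAR
    fb = (hits + false_alarms) / (hits + misses)  # frequency bias
    csi = hits / (hits + false_alarms + misses)   # critical success index
    return {"H": h, "SR": sr, "FB": fb, "CSI": csi}


# Hypothetical counts for one precipitation type and lead-time range.
scores = performance_scores(hits=55, false_alarms=45, misses=45)
```

Two identities explain why a single diagram can display all four quantities: FB = H/SR and 1/CSI = 1/SR + 1/H − 1, so each point in the (SR, H) plane fixes the bias and the CSI.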
Because FZRA, RASN, and IP are usually transient mixed precipitation phases, identification on the PREFptype will tend to correspond, on average, to lower probabilities than one sees on average for RA and SN on that product. For similar reasons, as we go to longer lead times, the frequency with which one sees these types on the map product decreases very rapidly (note that in Fig. 7 the nominal bias for this product decreases more rapidly with lead time for mixed-phase types than it does for RA and SN). Although this verification diagram does not incorporate the verification metrics that are most strongly affected by base rate dependency or the no-occurrence cases, such as percent correct (PC) and F, it still uses H, SR, and CSI, which are all potentially affected by it. When an event becomes rarer, these quantities or indices tend toward 0, because the entries in the contingency table tend to zero at different rates. One way to address this issue is to compute the symmetric extremal dependency index (SEDI), which has many beneficial properties that are not present in most other verification measures used with rare events (Ferro and Stephenson 2011). The SEDI is based on the F and H indices. The score is defined as
$$SEDI = \frac{\ln F - \ln H - \ln(1 - F) + \ln(1 - H)}{\ln F + \ln H + \ln(1 - F) + \ln(1 - H)}.$$
Table 3 shows the SEDI index values for the three rare events studied in this paper (FZRA, IP, and RASN) at different lead times. This index gives a better idea of the skill for these three precipitation types in the PREFptype product than does the performance diagram, where the values of the indices were a bit too small to be usefully compared at different lead times. FZRA shows better skill at shorter lead times with a maximum value of 0.61, which progressively decreases with lead time, reaching zero at 120–144 and 144–168 h. RASN is more stable over lead times, always keeping low values that vary from 0.38 at 0–24 h to 0.19 at 144–168 h. Finally, ice pellet forecasts have no skill, confirming earlier results from the PROBptype product verification.
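For reference, the SEDI of Ferro and Stephenson (2011) can be computed from H and F in a few lines. The Python sketch below is illustrative only; the input values are hypothetical and are not taken from Table 3.

```python
import math


def sedi(hit_rate, false_alarm_rate):
    """Symmetric extremal dependency index from hit rate H and false alarm rate F.

    Requires 0 < H < 1 and 0 < F < 1; the logarithms are undefined otherwise.
    """
    h, f = hit_rate, false_alarm_rate
    numerator = math.log(f) - math.log(h) - math.log(1 - f) + math.log(1 - h)
    denominator = math.log(f) + math.log(h) + math.log(1 - f) + math.log(1 - h)
    return numerator / denominator


# Hypothetical rare-event case: low hit rate but very low false alarm rate.
rare_event_score = sedi(0.2, 0.001)
```

A forecast with H equal to F scores 0, so the index stays interpretable for rare events even when H, SR, and CSI are all small.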
We now summarize the key verification results for the PREFptype product. On days 1 and 2, if the PREFptype is showing SN or RA falling at a given time, there is a 50%–60% chance that observations will show the same, while for FZRA the maximum chance is only about 15% on day 1 and 5%–10% on day 2. This reiterates that the correct prediction of falling precipitation, even at day 1, is challenging, and for FZRA it is particularly difficult. This is true for the model and indeed for human forecasters too, because of the narrow range of vertical atmospheric profiles required for FZRA to be a possibility. Physically, one might expect many of the forecast failures for SN and RA to be related to showery situations. However, for FZRA this is unlikely to be a contributor, and the primary explanations will probably be the finely balanced nature of the related synoptic situation, and also the fact that the IFS model in its current form cannot represent freezing drizzle from supercooled water clouds (see Forbes et al. 2014). By design, the PREFptype also tends to “underrepresent” instances of specific precipitation types at longer leads, because it requires the probability of (any) precipitation falling to be greater than 50%, and as the ENS becomes more dispersed with lead time, probabilities in general terms tend to migrate toward the (model) climatological probability, which for most parts of Europe is well below 50% (we saw an example of this type of behavior in Fig. 1, albeit for another continent). Nonetheless, we believe that the PREFptype is a useful and compact resource for forecasters, provided there is an understanding of its method of construction. It can be made even more valuable when used in conjunction with the meteogram products. In the near future ECMWF users will be able to click on the PREFptype map product via a web interface and immediately view the corresponding PROBptype meteogram product for anywhere in the world.
5. Freezing rain in Finland: A case study
From the night of 27 February to noon on 28 February 2017, the southeastern part of Finland suffered from a heavy freezing drizzle and FZRA event that caused numerous problems on the roads and affected the electrical power grid. After a period of snowfall the day before, a low center situated in the North Sea transported warmer and moister air from the south, primarily in the 900–700-hPa layer, which created perfect conditions for FZRA occurrence (Fig. 8).
Figure 9a shows the PREFptype product generated by ENS model output from a base time at 0000 UTC 27 February 2017 and valid 24 h later. An extensive area of FZRA is observed on the PREFptype product, situated in the southeastern part of Finland, matching fairly well with the observations from the nearest SYNOP stations. Examining the forecast from the same base time but for a 33-h lead (Fig. 9b), the signal of FZRA continues, and the probabilities in the center of the affected area are between 50% and 70%. The rest of the observations also match quite well with the PREFptype. For forecasts from the 0000 UTC 26 February 2017 base time (more than 48 h before the FZRA event started) with a 54-h lead time (Fig. 9c), a smaller but still clear signal of FZRA is observed on the map, with probabilities below 50% but with some specific points indicating probabilities between 50% and 70% in the far west of Russia. Finally, in 78-h forecasts from 0000 UTC 25 February 2017 (Fig. 9d), the FZRA signal is not so clear, as it is concentrated in Russia, but there are one or two points of IP, which needs a very similar atmospheric structure to that required for FZRA formation.
Figure 10 shows the nearest radiosonde ascent in the area (pink star in Fig. 9a), situated in Jokioinen Ilmala (60.81°N, 23.50°E). Even though this station was not exactly in the position where FZRA was observed, it was the nearest available and can reflect the general atmospheric conditions in the zone. Evidently, the lower troposphere is characterized by a very shallow layer of cold and wet air around 0°C near the ground with somewhat warmer moist air above it up to 850 hPa. Above that a saturated layer below 0°C is observed up to 550 hPa. This profile, with the presence of both an elevated warmer layer and a layer of subfreezing air adjacent to the surface, in principle creates the perfect conditions for FZRA formation. However, if the warm layer were not sufficiently deep to melt all the snow, ice pellets could result instead.
According to reports, one of the most affected places in the region was the town of Mikkeli (black star in Fig. 9a). Meteograms of the PROBptype product for this location are shown in Fig. 11; these run out to 7-day leads and start from four consecutive 0000 UTC base times. Figure 11a is for 24 h before the FZRA event started. High probabilities of FZRA and IP together are indicated during the first half of 28 February, up to more than 70%, of which 50% or so is for FZRA. The domination of FZRA over other precipitation types was seen as well in the PREFptype map (Figs. 9a,b). The transition from SN (the day before) to FZRA, and then RA (during the second half of 28 February) is not unusual, being typical of warm fronts. The meteogram also indicates that for many of the ENS members that predicted FZRA the associated precipitation rate was between 0.2 and 1 mm h−1. The next two meteograms with base times of 0000 UTC 26 February (Fig. 11b) and 25 February 2017 (Fig. 11c) show lower probabilities of FZRA (not higher than 25%), but on the other hand there is a relatively consistent signal for the period when FZRA was observed. Finally, the meteogram initialized 4 days before the episode (Fig. 11d) does not show FZRA, but we have two ensemble members that show IP as the precipitation type at some moments during 28 February, indicative of the possibility that there could be an elevated warm layer in the model output. While two ensemble members forecasting IP (only 4% probability) would not be enough for decision-makers to act, it is nonetheless a small signal that alerts the user to at least pay more attention to the forecasts of the following days. Moreover, for this longer-lead forecast the time step between bars is 3 h (compared to 1 h for the later data times), so the user would also need to be alert to the possibility of a short-lived weather event not being fully captured.
This case study has shown how these new products based on the precipitation-type variable from ENS were able to forewarn of a severe FZRA episode 3 days in advance, with the clearer signal appearing in the meteogram product, which can help in decision-making for local or regional warnings. As stated above, we would recommend that users start with the PREFptype map product, then for possible events investigate in more detail what the actual probabilities and rates are using the meteogram product. Unsurprisingly, the probabilities of FZRA decrease with lead time in the meteograms, but they would still have provided early warning that an FZRA event could happen. Users could conceivably act upon this information, taking protective action, especially if there were supporting evidence from other sources. However, recall also that Fig. 6c suggested that overall there is no economic value in FZRA forecasts for lead times beyond day 2 (when using the ENS in isolation).
This paper describes two new probabilistic products based on the instantaneous precipitation-type variable in ECMWF ENS forecasts. Together they provide a new forecast tool for decision-makers related to high-impact weather and exploit probabilistic forecasts for this purpose, which is almost certainly better than just taking a deterministic viewpoint. The PREFptype map product shows which of the six precipitation types (RA, SN, RASN, FZRA, IP, and WSN) is most probable whenever the probability of some precipitation is >50%. This product is classified in three different ranges of probabilities: up to 50% (low probability), from 50% to 70% (moderate probability), and higher than 70% (high probability). As a complementary product, the PROBptype meteogram product represents the temporal evolution of precipitation-type probabilities for a specific location, also incorporating important additional information regarding the precipitation rate. The instantaneous total precipitation rate is shown in three different categories: from a minimum Rmin to 0.2 mm h−1 (low intensity), from 0.2 to 1 mm h−1 (medium intensity), and greater than 1 mm h−1 (high intensity), providing an indication of the potential severity of SN or FZRA events. So if we consider the two products together, from a user perspective, the map product can first deliver a useful initial ENS-based overview of a given weather situation, while the meteogram product would then allow the user to drill down, for a given site, to see all the probabilities, their temporal evolution, and vital additional information concerning precipitation rates. In this way the user will be able to make better-informed decisions regarding severe winter weather, like FZRA. This is true even if the actions were only to amount to putting standby measures in place well in advance of a possible (low probability) event.
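The logic of the PREFptype map product at a single grid point can be sketched in a few lines of Python. This is a simplified illustration of the description above, not the operational implementation: the member list is hypothetical, the "DRY" label for nonprecipitating members is an assumption, ties between types are broken arbitrarily, and the treatment of a probability of exactly 50% is also an assumption.

```python
from collections import Counter


def preferred_ptype(member_types):
    """Most probable precipitation type at one grid point, or None.

    member_types holds one label per ensemble member: 'DRY' (assumed label)
    or a type code such as 'RA', 'SN', 'RASN', 'FZRA', 'IP', 'WSN'.
    """
    n = len(member_types)
    wet = [t for t in member_types if t != "DRY"]
    if len(wet) / n <= 0.5:
        return None                      # probability of precipitation not > 50%
    ptype, count = Counter(wet).most_common(1)[0]
    prob = count / n                     # probability of the favored type
    if prob > 0.7:
        band = "high"
    elif prob >= 0.5:                    # boundary handling is an assumption
        band = "moderate"
    else:
        band = "low"
    return ptype, prob, band


# Hypothetical 50-member ensemble at one grid point and one time.
members = ["FZRA"] * 22 + ["IP"] * 10 + ["SN"] * 5 + ["DRY"] * 13
result = preferred_ptype(members)
```

In this example 74% of members are precipitating, so a type is displayed, but the favored type (FZRA) has a probability of only 44% and therefore falls in the low-probability band. This is the kind of situation in which the meteogram product adds the missing detail.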
In creating the new products, a new methodology for reducing the systematic model bias and defining the minimum precipitation rate for each precipitation type was established. For this we used the instantaneous precipitation rate variable, applying a different precipitation rate threshold for each precipitation type (to separate dry from precipitating conditions) to try to enforce a zero frequency bias, relative to manual SYNOP present weather observations. This led to the thresholds (Rmin) being 0.12 mm h−1 for RA, 0.1 mm h−1 for RASN, and 0.05 mm h−1 for SN, FZRA, and IP, which are now being used as semipermanent filters for the final products, to help reduce misses and false alarms. When model physics (or other) changes impact the precipitation rate and precipitation type, we expect to have to recalibrate the results. Reliability diagrams showed that the main positive impact of applying this technique was in practice to reduce the overestimation of RA and to reduce the underestimation of SN. For the rest of the precipitation types the benefits are not so clear-cut, but there is no evidence of any degradation.
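The way the type-dependent thresholds act as a filter can be illustrated with a short Python sketch. The Rmin values are those quoted above; everything else is an assumption made for the example, including the "DRY" label, the use of "greater than or equal to" at the threshold, and the member data. No threshold for WSN is given in the text, so that type is not included here.

```python
# Minimum precipitation rates (mm per hour) quoted in the text.
R_MIN = {"RA": 0.12, "RASN": 0.10, "SN": 0.05, "FZRA": 0.05, "IP": 0.05}


def filter_member(ptype, rate):
    """Keep the member's type only if its rate reaches the type-specific Rmin."""
    if ptype == "DRY":
        return "DRY"
    # Treating rate == Rmin as precipitating is an assumption of this sketch.
    return ptype if rate >= R_MIN[ptype] else "DRY"


def type_probabilities(members):
    """Probability of each type from (ptype, rate) pairs, one per member."""
    n = len(members)
    counts = {}
    for ptype, rate in members:
        label = filter_member(ptype, rate)
        if label != "DRY":
            counts[label] = counts.get(label, 0) + 1
    return {label: count / n for label, count in counts.items()}


# Hypothetical four-member example: weak RA and weak FZRA are filtered out.
members = [("RA", 0.10), ("RA", 0.50), ("SN", 0.06), ("FZRA", 0.04)]
probs = type_probabilities(members)
```

Because the RA threshold is higher than the SN threshold, the same light rate can be discarded as RA but retained as SN, which is consistent with the stated effect of reducing RA overestimation and SN underestimation.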
A complete 4-month verification of winter weather precipitation-type probabilities was developed for both products, at seven different 24-h lead time ranges from 0–24 to 144–168 h. Three-hourly manual SYNOP observations were used for the verification. Also, SYNOP observations were previously tested to reduce the uncertainty arising from incorrect classifications of the reported present weather. ROC curves and AUC values show that, for RA and SN, the levels of forecast skill are quite similar, although the RA forecasts have somewhat greater skill. For FZRA and RASN, the reliability is relatively lower and, similarly, the ROC curve skill is also quite low (though not negligible, at least for shorter lead times), probably due in part to limited occurrences during the study period. A common feature for all the precipitation types is, unsurprisingly, that the uncertainty increases as lead time increases. Finally, the ability to predict IP is almost negligible, as we expected from physical considerations and by referring to other studies.
A cost–loss value analysis also indicates that the RA and SN probabilities can be useful for decision-making for a broad range of users, while FZRA and RASN show reduced relative values for the first lead time ranges and nonexistent relative value for longer leads. However, in practice, as was shown with the freezing-rain case study, the true value beyond day 2 may in fact be underestimated with this metric if one takes into account the effects of small displacements in space or time, which are more prevalent at longer leads. The IP relative value was 0 for all the lead times, so it did not present any utility for users. One question that arises is whether IP should remain as a separate category in the verification or be merged with SN events. Alternatively, IP could be merged with FZRA because, although IP itself does not present any important risk, the atmospheric conditions conducive to IP are very similar to those that are conducive to FZRA. As of now, there is no clear-cut answer here. Similar considerations could be applied for RASN, merging this category with SN.
Verification of the PREFptype map product was developed using a performance diagram, which is a very useful tool for simultaneously representing different parameters and verification indices. This verification brought similar results, but they were classified in dichotomous terms—occurrence or nonoccurrence. As was seen in the ROC curves, the skill for FZRA and RASN as represented on the PREFptype is not good, but there is some predictability. RA and SN have the best forecast skill, but that decreases considerably with lead time, while IP forecasts had no skill.
Finally, the FZRA case study described in section 5 showed how the new products from ENS, when used together, could forewarn of a severe FZRA episode 3 days in advance, but with a clearer signal in the meteogram product, which can help in decision-making for local or regional warnings. While the probabilities of FZRA decreased for longer lead times, as we expected, they still provide some useful information for the users.
A limitation of our approach is that we are working with instantaneous parameters of the model, so at longer lead times one may miss events because they may fall in between the model time steps displayed. This is an area to consider for future work, but the output we do provide does not claim to portray any more than what is happening at given snapshots in time, and a typical user should readily appreciate any innate limitations of this. Additional future work will include an assessment of the new precipitation-type products in other regions that suffer from weather hazards like heavy snowfall or freezing rain (particularly North America), since the output is global. Within the framework of the ANYWHERE project, the relative economic value for different regions in Europe will be evaluated to provide a tailored guide to the value of these products for users.
This work has been supported by the European Horizon 2020 research project ANYWHERE (EC-HORIZON2020-PR700099-ANYWHERE). The authors also wish to thank the Hungarian Meteorological Service (OMSZ), particularly Istvan Ihasz, who provided the original idea for the meteogram product, and Zied Ben Bouallegue from ECMWF for his valuable advice regarding the relative economic value subsection.