An in-flight icing index from the literature is implemented in the Met Office forecasting system. Comparison of hindcasts of cloud fraction with ground-based remote sensing observations of liquid and ice cloud fraction is used to inform a reformulation of part of the index. Satellite-retrieved icing potential is then used to quantitatively assess the reliability and skill of the new index and to compare its performance to the current operational one. Having shown that the new index has substantially better reliability, ways of presenting separate icing likelihood and severity are explored using a case study.
Ice nuclei must be present in order for each liquid water droplet in a cloud to freeze as the temperature falls below 0°C. In the absence of sufficient ice nuclei, liquid cloud droplets can still be in a liquid state at temperatures between 0° and around −40°C (Rogers and Yau 1989). If an aircraft flies through such a cloud, the supercooled droplets can quickly freeze onto the surface of the wings and airframe (Bruford 1997). This accumulation of ice can seriously affect the handling of the aircraft, which can ultimately lead to a disastrous loss of control (Cole and Sand 1991; Lankford 2000, 2001; Politovich 2003).
Meteorological factors, such as air temperature, humidity, and the presence and size of supercooled cloud particles, are important in determining the probability and severity of an in-flight icing event. In addition to some automated forecasts derived from numerical weather prediction (NWP) model output, operational meteorologists at the Met Office create a number of products which consider the threat of in-cloud icing. The main group of products consists of significant weather charts for aviation purposes, which are issued at 6-hourly intervals. One of the main significant weather charts covers a large part of northwestern Europe and considers forecast weather below 10 000 ft (3050 m). These charts are produced as part of the Met Office’s contractual obligation to the Civil Aviation Authority in the United Kingdom and are known as “LOLAN” charts.
Meanwhile, the Met Office is one of only two World Area Forecast Centers (WAFC) that provide weather forecasting services to meet Annex 3 to the Convention on International Civil Aviation of the International Civil Aviation Organization (ICAO 2018). As part of this commitment, significant weather charts are produced globally at 6-hourly intervals for forecast weather above 10 000 ft (3050 m, Fig. 1a). These also include a consideration of the threat of in-cloud icing.
More specific and detailed guidance on the threat of in-cloud icing is provided by Met Office operational meteorologists, mainly at British military bases. This output is often focused on specific airframes or more localized areas than the more generic LOLAN chart. Meanwhile, significant meteorological advisories (SIGMETs) for U.K. airspace, which comprises the London, Scottish, and Shanwick Flight Information Regions (Fig. 1b) and extends over part of the eastern Atlantic Ocean, are issued for a number of weather hazards, including severe in-cloud icing. Best practice at the Met Office tends to be to issue SIGMETs with lead times of approximately 1 hour. Valid for no more than 4 h at a time, the issuing of SIGMETs is based on a combination of NWP model output, operational meteorologist input, and pilot reports.
Techniques used by Met Office operational meteorologists to identify areas conducive to in-cloud icing are heavily weighted toward empirical thinking. Meanwhile, the current Met Office operational icing index, which is calculated automatically from NWP model output, is rarely used by forecasters due to its tendency to overpredict. The empirical considerations used by forecasters include an appreciation of synoptic situations known to be conducive to the potential for in-cloud icing. These include marine stratiform cloud, moist unstable airmasses, and areas on and ahead of warm and occluded fronts. Cold frontal precipitation bands, particularly when line convection is present, and regions of nimbostratus and cumulonimbus are all also associated with the risk of severe in-cloud icing. Otherwise, in-cloud icing is considered concentrated in cloudy environments, mainly with air temperatures between 0° and −20°C. Other considerations which have been identified as present in known in-cloud icing events, such as isothermal cloud layers and presence of some element of wind shear in cloud, are also borne in mind.
Human forecasters can undoubtedly add local knowledge and compensate for known model biases, before issuing an aviation forecast. As a result, a purely model-derived icing-risk product is not our goal. However, a skillful and trusted automatically generated icing index would be useful for operational meteorologists at the Met Office to consult as they prepare their forecasts.
When considering an icing threat, both currently and in the next few hours, NWP-derived indices can be combined with satellite or ground-based radar observations in order to provide a consistent estimate of the locations of icing risks in the “nowcast” range (e.g., Tafferner et al. 2003; Le Bot 2004; Bernstein et al. 2005; Kalinka et al. 2017).
The focus here is on longer lead times (from T + 12 to T + 36), where the NWP output is not combined with observations. To implement such a model-derived icing product, its quality needs to be measured, and choices relating to how it is calculated need to be assessed. This quantitative assessment is necessary not only to show that the new index is skillful and worth consulting while preparing a forecast, but also to justify subsequent changes and improvements to the index.
One way of verifying icing forecasts is to use pilot reports (PIREPs). However, there are a number of issues relating to location errors, sparsity of null reports, and avoidance of at-risk areas, which mean that statistical evaluations of icing algorithms using PIREPs must include a careful consideration of these sampling biases (Brown et al. 1997; Kalinka et al. 2017). Additionally, there are nonmeteorological factors that can modulate the severity of an icing event. These include the type of aircraft, whether it has de-icing equipment, the time spent in cloud, and its flight history. The latter is important because the same aircraft, flying through the same cloud, is more vulnerable to icing if it is descending from altitude with a cold fuselage than if it is ascending from the relatively warmer surface.
Minnis et al. (2005) use satellite remote sensing to infer the presence of conditions conducive to icing. Their methods have been implemented at the Met Office using data from the Meteosat Second Generation (MSG; Schmetz et al. 2002) satellite over the Greenwich Meridian (Francis 2007). As a result, an alternative way of evaluating forecast icing indices is to verify against satellite-based icing potential (Bowyer and Gill 2019).
In this paper, an icing index adapted from Belo-Pereira (2015) is implemented in the Met Office forecast system and used to produce hindcasts of icing risk based on NWP model output out to T + 36. Satellite-based verification on a 0.25°, 0.5°, and 1.25° grid shows that the new index is more reliable than the one currently produced operationally. To complement the passive top-down evaluation from satellite, ground-based active remote sensing is used to evaluate one of the elements that make up the icing index and address an apparent frequency bias. This leads to a further refinement of how the new index is calculated, which, when compared to the current operational index, still leads to both improved reliability and improved ability to distinguish between icing and nonicing events, when evaluated using the satellite data.
Following discussion with aviation forecasters, it was found that for automated icing products to be useful as initial guidance, they need to include a separate quantification of likelihood and severity. Having shown that the new icing index is statistically more reliable than the current operational one, the new index is used to quantify likelihood. The in-cloud condensate amount in the supercooled liquid water temperature range is then used to quantify severity. A simple 3 × 3 color matrix is introduced, which quickly summarizes the probability and severity elements of the NWP-derived index.
The model simulations, icing index calculations, satellite-based evaluation techniques, and ground-based data used for comparison are described in section 2. Results of the satellite-based and ground-based evaluations are presented in sections 3a and 3b, while a case study from some recent icing events in section 3c illustrates some of the key points about the performance of the control and new schemes. A discussion appears in section 4, where the 3 × 3 color matrix is also introduced, and conclusions are drawn in section 5.
a. Model simulations
All hindcasts were performed with the Met Office Unified Model (MetUM), using the Global Atmosphere 6 configuration (GA6; Walters et al. 2017). All simulations were initialized from Met Office global operational analyses and were run using a resolution known as N768, which consists of a regular latitude–longitude grid with 2 × N points of longitude and 1.5 × N points of latitude, leading to a longitude spacing of 0.23° and a grid size of 26 km on the equator. All simulations were initialized at 0000 UTC and run for 36 h.
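The quoted resolution can be checked directly from the N-grid definition above. A minimal sketch, assuming a spherical Earth with an equatorial circumference of about 40 075 km:

```python
# Illustrative check of the N768 grid geometry: an "N" grid has 2*N points
# of longitude and 1.5*N points of latitude.
N = 768
n_lon = 2 * N            # 1536 points of longitude
n_lat = int(1.5 * N)     # 1152 points of latitude

dlon_deg = 360.0 / n_lon                 # longitude spacing in degrees
earth_circumference_km = 40075.0         # equatorial circumference (approx.)
dx_km = earth_circumference_km / n_lon   # grid size at the equator

print(f"{dlon_deg:.3f} deg, {dx_km:.1f} km")
```

This recovers the spacing of roughly 0.23° and a grid size of about 26 km at the equator quoted in the text.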
For evaluation against satellite observations, 365 simulations were carried out covering 1 March 2016–28 February 2017, and the comparisons were done at 1200 UTC using model data valid at T + 36. For comparison against ground-based radar and lidar observations, 30 simulations were initialized at 0000 UTC each day from 31 March to the end of April 2007 and hourly model data with a lead time from T + 12 to T + 36 were concatenated together. Finally, for the icing event case study occurring around noon on 26 January 2017, the simulation was started a day and a half earlier, at 0000 UTC 25 January 2017.
b. Icing indices
The Met Office icing index that is currently provided operationally to the World Area Forecast System (WAFS) is based on Schultz and Politovich (1992). If the model temperature is between 0° and −20°C, and there is some cloud present (i.e., cloud fraction C is greater than zero), then the icing index is set to the value of the relative humidity. This index is the control.
The new icing indices being tested in the Met Office system are based on the simplified forecast icing potential (SFIP) described by Belo-Pereira (2015), itself based on the work of McDonough et al. (2004), Bernstein et al. (2005), and Wolff et al. (2009). The SFIP is defined as
The SFIP is based on a combination of membership functions derived by Bernstein et al. (2005), which quantify how occurrences of icing in pilot reports vary depending on meteorological conditions ϕ at the same time and place, as estimated from a database of NWP model output. The temperature membership function used here differs slightly from the one used by Bernstein et al. (2005) and Wolff et al. (2009); for simplicity it does not differentiate between convective and stratiform cloud conditions. Additionally, in contrast to Belo-Pereira (2015), a different temperature membership function is not used for the lowest flight levels. Here, the temperature membership function is defined as
with , , , and . This is shown in Fig. 2a.
The relative humidity (RH) membership function, which unlike Belo-Pereira (2015) is again the same irrespective of the flight level, is qualitatively similar to that used by the other authors. Here it is specifically defined as
with and (Fig. 2b). Next, the vertical velocity w membership function is defined as
with , , and m s−1 (Fig. 2c). Finally, the liquid water content (LWC) membership function is defined as
The last things to define are the relative weights given to each of the membership functions. Following Belo-Pereira (2015), , , and , and any negative values of SFIP are reset to zero. Both the control and SFIP indices can be calculated on a three-dimensional grid, using data from a numerical weather prediction model.
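To make the overall construction concrete, the following sketch assembles an SFIP-style index from piecewise-linear membership functions. It is an illustration only: every breakpoint and weight below is a placeholder assumption rather than the values in the original equations and Fig. 2, except that the temperature function is zero outside the −28° to +1°C range and the RH function rises from zero to one over RH = 0.6–0.95, both of which are stated in the text. The temperature-gated structure is likewise only one plausible form, chosen so that the temperature function zeroes SFIP outside its range, as the text describes.

```python
# Illustrative sketch only: the true membership functions and weights are in
# the original equations and Fig. 2. All breakpoints and weights below are
# placeholders, except that f_temp is zero outside -28..+1 degC and f_rh
# rises from 0 to 1 over RH = 0.6-0.95, as stated in the text.

def ramp(x, lo, hi):
    """Piecewise-linear rise from 0 at x = lo to 1 at x = hi."""
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)

def f_temp(t_degc):
    # Trapezoid that is zero outside [-28, 1] degC; the interior breakpoints
    # (-12 and -4) are purely illustrative.
    return ramp(t_degc, -28.0, -12.0) * (1.0 - ramp(t_degc, -4.0, 1.0))

def f_rh(rh):
    return ramp(rh, 0.6, 0.95)          # RH range quoted in the text

def f_w(w_ms):
    return ramp(w_ms, 0.0, 0.5)         # breakpoints are illustrative

def f_lwc(lwc_kgkg):
    return ramp(lwc_kgkg, 0.0, 2e-4)    # breakpoint is illustrative

def sfip(t_degc, rh, w_ms, lwc_kgkg, weights=(1 / 3, 1 / 3, 1 / 3)):
    """One plausible structure: a temperature-gated weighted sum, consistent
    with the statement that the temperature membership function zeroes SFIP
    outside -28..+1 degC; negative values are reset to zero as in
    Belo-Pereira (2015)."""
    a, b, c = weights
    s = f_temp(t_degc) * (a * f_rh(rh) + b * f_w(w_ms) + c * f_lwc(lwc_kgkg))
    return max(s, 0.0)

print(sfip(-10.0, 0.97, 0.3, 3e-4))
```

With the placeholder weights, a supercooled, nearly saturated, gently ascending cloudy point scores close to one, while any point outside the temperature range scores exactly zero.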
There are questions regarding how well NWP models capture the physical processes that determine the partitioning of condensate into the liquid or ice phase (e.g., Kalesse et al. 2016; Furtado and Field 2017). To avoid an SFIP index performing badly simply as a result of an incorrect phase partition, despite getting the temperature, humidity, and total condensate correct, a slightly different SFIP index was considered, in which the condensate membership function took the sum of the LWC and ice water content (IWC) as its input, rather than the LWC alone.
It is worth noting that the RH membership function, which rises from zero to one over an RH range of 0.6–0.95, looks very similar to the parameterization of cloud cover developed by Sundqvist et al. (1989). Given the efforts to improve cloud cover forecasts in NWP models (e.g., Tiedtke 1993; Wilson et al. 2008; Forbes and Ahlgrimm 2014; Furtado et al. 2016), this prompts one to ask whether using a model’s cloud fraction C would be a viable alternative to the RH membership function when calculating SFIP. Indeed, the model’s cloud fraction quantifies the probability of encountering cloud within a grid box, and is used by the model for microphysical and radiative transfer calculations. It therefore makes sense to test the inclusion of cloud fraction as part of an icing index.
This leads to several permutations of SFIP indices. The condensate membership function can be calculated from just the liquid (L) or from the liquid and ice (LI). Additionally, the relative humidity membership function (RH) can be replaced with either the liquid cloud fraction (LCF) or the bulk cloud fraction (BCF, the cloud fraction irrespective of whether the condensate is in the liquid or ice phase). These will be referred to as SFIP-L-RH, SFIP-L-LCF, and SFIP-LI-BCF, and scores quantifying their performance are summarized in Tables 1 and 2.
c. Satellite-based evaluation
Minnis et al. (2005) describe how an icing potential can be calculated from satellite-retrieved cloud-top temperature, liquid water path, and effective radius. Their algorithm has been implemented to work on MSG data (Francis 2007), which cover the region 60°N–60°S, 60°W–60°E. The comparison presented here uses data valid at 1200 UTC each day for 12 months from 1 March 2016.
The satellite icing potential is available on a 0.25° grid, while the model forecasts are on a similar grid (N768, ≃26 km). To reduce double-penalty effects (e.g., Mason 2003), both datasets have been regridded to 1.25°. The impact of the regridding on the apparent skill of the icing indices is shown in Tables 1 and 2 and discussed in section 4.
The satellite retrieval will not be able to determine the icing potential if there is optically thick ice cloud located above lower-level clouds. In this case, the satellite icing potential reports missing data, and those locations are excluded from the model fields prior to carrying out any comparisons. The 5 × 5 regridding is calculated as the mean of the pixels that do have valid data, and the masking of the model fields to account for missing observations is only carried out after the regridding.
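The regridding and masking steps above can be sketched as follows, assuming NaN marks missing satellite pixels (the array shapes and random fields are purely illustrative):

```python
import numpy as np

# Sketch of the 5x5 block regridding described above: each coarse-cell mean
# is taken over valid pixels only (missing satellite pixels are NaN), and
# the model field is masked only after regridding, wherever the regridded
# observation has no valid data.

def block_mean_valid(field, block=5):
    ny, nx = field.shape
    coarse = field.reshape(ny // block, block, nx // block, block)
    coarse = coarse.transpose(0, 2, 1, 3).reshape(ny // block, nx // block, -1)
    return np.nanmean(coarse, axis=-1)  # mean of valid pixels; NaN if none

obs = np.random.rand(10, 10)
obs[:5, :5] = np.nan                    # a fully obscured 5x5 cell
model = np.random.rand(10, 10)

obs_coarse = block_mean_valid(obs)
model_coarse = block_mean_valid(model)
model_coarse[np.isnan(obs_coarse)] = np.nan  # mask model after regridding
```

A coarse cell with any valid pixels keeps the mean of those pixels; a fully obscured cell propagates as missing into both the observation and model fields.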
The satellite icing potential is a two-dimensional (2D) product, while the model-derived icing indices are three-dimensional (3D). For the two to be compared, the model-derived indices need to be collapsed in the vertical in some way.
One option is simply to take the vertical maximum of the model icing index, and this is done here. It could be argued, however, that when passive satellite sensors detect a moderate icing risk near cloud top, there could be a region of higher risk located lower down that goes undetected. In that case, taking the maximum value in each model column would not be representative of what the satellite is seeing.
Here a pragmatic filtering technique is used, similar to the one developed by Bowyer and Gill (2019), for turning the 3D model icing indices into a 2D field ready for comparison against the satellite icing potential. For each model column, one starts at the top and searches down through the icing index field until a certain threshold value is found. Any values smaller than the threshold are ignored. If the threshold is encountered, that becomes the value of the 2D field at that location, and any icing values occurring below, whether smaller or larger, are ignored. If the threshold is never encountered, the value of the 2D field is set to zero. As the goal of this study is to compare several icing indices against each other, one must ensure that the choice of threshold does not favor one index. For each of the indices studied here, the evaluation is performed four times: once with the vertical maximum, and once each with the filtering threshold set to 0.05, 0.1, and 0.2.
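The column search just described can be sketched in a few lines (the function name and example values are illustrative):

```python
# Sketch of the top-down filtering used to collapse a 3D icing index to 2D:
# search each column from the top; the first value at or above the threshold
# becomes the 2D value, and everything below it is ignored. If no value
# reaches the threshold, the column maps to zero.

def filter_column(column_top_down, threshold):
    """column_top_down: icing-index values ordered from model top to surface."""
    for value in column_top_down:
        if value >= threshold:
            return value
    return 0.0

col = [0.0, 0.02, 0.15, 0.6, 0.0]   # top -> surface
print(filter_column(col, 0.1))      # 0.15: first value at/above the threshold
print(filter_column(col, 0.7))      # 0.0: threshold never reached
```

Note how the example column has a vertical maximum of 0.6, yet the filtered value is 0.15: the moderate value near the top stops the search before the larger value lower down is seen, which is exactly the behavior described in the text.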
Additionally, when using the satellite icing potential to evaluate the model products, one needs to decide on the minimum value of satellite icing potential that is assumed to indicate that an event occurred. Because of uncertainty as to the best choice of threshold, three different thresholds are used in this work. The first is at 0.05 (as inspection of satellite icing potential histograms—not shown—indicates that this discriminates between all the zero and nonzero values). The other two thresholds are set at 0.4 and 0.7, which Minnis et al. (2005) state are the thresholds between low-and-medium and medium-and-high probabilities for the satellite icing potential. Combined with the 4 ways of converting from 3D to 2D fields, this leads to 12 different ways of evaluating the model icing index using the satellite observations.
Although all the icing indices and the satellite icing potential are dimensionless quantities with a range from zero to one, this does not mean that they are numerically equivalent. So as well as exploring three different choices for the definition of an event in the observations, comparisons are also made by considering a range of thresholds (from 0 to 1 in 0.01 steps) for defining an event in the model.
Hit rate and false-alarm rate are calculated at each of these thresholds, allowing the creation of receiver operating characteristic (ROC) curves (e.g., Wilks 2006). The area under the curve (AUC) can be used as a measure of the discriminatory skill of the forecast, which can further be summarized using the ROC skill score, RSS = 2 × AUC − 1.
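A minimal sketch of this evaluation follows, sweeping the 0-to-1-in-0.01-steps model thresholds described above, integrating the ROC curve with the trapezoidal rule, and summarizing with RSS = 2 × AUC − 1 (the function name and example data are illustrative):

```python
# Sketch of the ROC evaluation: at each model threshold, compute hit rate
# (hits / (hits + misses)) and false-alarm rate (false alarms /
# (false alarms + correct negatives)), then integrate the resulting curve.

def roc_skill_score(forecast, observed_event):
    points = []
    for i in range(101):                      # thresholds 0.00, 0.01, ..., 1.00
        thr = i / 100.0
        hits = misses = fas = cns = 0
        for f, o in zip(forecast, observed_event):
            predicted = f >= thr
            if predicted and o:
                hits += 1
            elif (not predicted) and o:
                misses += 1
            elif predicted and not o:
                fas += 1
            else:
                cns += 1
        hr = hits / (hits + misses) if hits + misses else 0.0
        far = fas / (fas + cns) if fas + cns else 0.0
        points.append((far, hr))
    points.sort()
    auc = sum((x2 - x1) * (y1 + y2) / 2.0      # trapezoidal rule
              for (x1, y1), (x2, y2) in zip(points, points[1:]))
    return 2.0 * auc - 1.0

# A forecast that separates events from non-events perfectly scores RSS = 1:
perfect = roc_skill_score([0.9, 0.8, 0.1, 0.2], [True, True, False, False])
```

A forecast with no discrimination sits on the diagonal (AUC = 0.5, RSS = 0), and one that systematically confuses events and non-events scores negatively.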
To check whether the model icing indices can usefully be interpreted as probabilities, reliability diagrams are used (e.g., Wilks 2006). These show, for any model predicted value, the observed frequency of an event. If the index is a reliable predictor of probability then the data should fall along the diagonal. When interpreting reliability diagrams there is a question as to how far data can be off the diagonal and yet still be consistent with a reliable prediction. To assess this, consistency bars have been added around the diagonal following the bootstrap resampling method of Bröcker and Smith (2007).
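The binning step behind a reliability diagram can be sketched as follows; the consistency bars themselves would come from bootstrap resampling of the same forecast/observation pairs, following Bröcker and Smith (2007). The bin count and synthetic data are assumptions for illustration:

```python
import random

# Sketch of the reliability-diagram calculation: bin the forecast index,
# then compute the observed event frequency within each bin. For a reliable
# forecast, each bin frequency lies near the bin centre (the diagonal).

def observed_frequency(pairs, n_bins=10):
    freq = []
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        in_bin = [obs for fcst, obs in pairs if lo <= fcst < hi]
        freq.append(sum(in_bin) / len(in_bin) if in_bin else None)
    return freq

# Synthetic forecasts that are reliable by construction: events occur with
# exactly the forecast probability.
random.seed(0)
pairs = [(p, random.random() < p)
         for p in (random.random() for _ in range(5000))]
freq = observed_frequency(pairs)
```

With these synthetic data the curve hugs the diagonal; a flat curve instead would indicate poor resolution, as described for the control index below.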
d. Comparison against ground-based remote sensing observations
Although the satellite-based retrieval provides data over a large spatial domain, it is unable to detect any supercooled liquid cloud that may be located below thick ice cloud. Ground-based remote sensing observations from radars and lidars at three sites operated by the Atmospheric Radiation Measurement (ARM) program (Stokes and Schwartz 1994) are therefore used to provide a complementary evaluation of one element of the SFIP icing index. The data used are from the Southern Great Plains (SGP), United States (36.606°N, 262.515°E); Darwin, Australia (12.425°S, 130.892°E); and Murgtal, Germany (48.545°N, 8.405°E). Liquid and ice water contents are inferred from the radar and lidar data as described by Illingworth et al. (2007).

For comparison with the model, an observed profile of cloud fraction is derived from the height-resolved high-frequency observations above each site. For the height range corresponding to each model level, this is done by calculating the fraction of observations that contain liquid or ice cloud within a one-hour period and, following Illingworth et al. (2007), assuming that the time-averaged cloud cover is a proxy for the spatially averaged cloud cover. This provides a separate profile of observed liquid and ice cloud fraction each hour. April 2007 was chosen for validation as it was a period when observational data were available from these three sites with very different weather conditions.

Hogan et al. (2001) discussed the need to filter model fields to remove the thin high-level cirrus clouds that the ground-based instruments would not be able to detect, which ensures a fairer comparison between model and observations. The filtering method, described by Brooks (2005) and Morcrette et al. (2012), uses the minimum detectable IWC as a function of height for each radar and assumes an analytical shape for the subgrid variation in IWC.
The filtered model ice cloud fraction consists of that portion of the model IWC that would be expected to be detectable by the ground-based radar.
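The time-fraction step above is simple enough to sketch directly; the sampling interval and counts below are assumptions for illustration:

```python
# Sketch of turning high-frequency column observations into an hourly cloud
# fraction profile: for each model-level height range, the cloud fraction is
# the fraction of observation times within the hour that contain cloud,
# using time-averaged cover as a proxy for spatially averaged cover.

def hourly_cloud_fraction(detections):
    """detections: per-time booleans (cloud present at this level this hour)."""
    return sum(detections) / len(detections)

# e.g. 120 thirty-second samples in an hour, 30 of them cloudy:
frac = hourly_cloud_fraction([True] * 30 + [False] * 90)
print(frac)  # 0.25
```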
a. Icing index evaluation using satellite observations
As an illustration of the data being used for evaluation, Fig. 3 shows the satellite icing potential at 1200 UTC 1 March 2016. The data are on a 0.25° grid in Fig. 3a and have been regridded to 1.25° in Fig. 3b.
Meanwhile, illustrative two-dimensional icing index fields are shown for the same time in Fig. 4. Figures 4a–d are the control operational icing index based on temperature and relative humidity when cloud is present. Figures 4e–h are SFIP-L-RH (simplified forecast icing potential calculated using temperature, vertical velocity, liquid condensate amount, and relative humidity). Different ways of converting from a three-dimensional to a two-dimensional field are shown in each column: Figs. 4a and 4e use the vertical maximum. Other columns use filtering with a threshold of 0.05 (Figs. 4b,f), 0.1 (Figs. 4c,g) and 0.2 (Figs. 4d,h).
The effect of the search algorithm is to remove the small values of the 2D icing index field that are present when using a vertical maximum. It also reduces the number of large values found with the vertical maximum algorithm, as these are typically located below regions of moderate icing index that are numerically large enough to stop the search algorithm from looking farther down.
Figure 5 shows 12 reliability diagrams for the control icing index, based simply on relative humidity when cloud is present between 0° and −20°C. A total of 10 out of the 12 reliability diagrams have Brier skill scores (BSSs) that are negative, indicating that the index is worse than a prediction based simply on climatology. It is worth noting that, generally, the control icing index has poor “reliability” in the probabilistic forecast sense, which means that the data points are far from the diagonal. The control index also has poor “resolution,” as the line through the data is rather flat: events occur with very similar frequencies in reality irrespective of the numerical value of the model-produced index. These findings are consistent with the anecdotal feedback from forecasters, who state that they do not use the control index as part of any decision-making process as it is not found to be useful.
Figure 6 shows reliability diagrams for SFIP-L-RH. The exact values of the BSS depend on the choice of threshold for defining an event in the observations and the manner in which the 3D model data are converted to a 2D field. Using a threshold of 0.4 for deciding that an event occurred in the observations leads to the most reliable evaluation of the SFIP-L-RH index, while the other two thresholds lead to an apparent under- or overprediction by the model. Additionally, there is a large range of model indices where the data fall within the range of the consistency bars, indicating that the data are not inconsistent with being reliable. This means that SFIP-L-RH could usefully be interpreted as a probability.
For brevity, as Fig. 5 shows that the control index has such poor reliability, its ROC curves are not shown. As a summary, though, the RSS for the control using the satellite threshold of 0.4 appears in Table 1. A series of ROC curves for SFIP-L-RH, which is notably more reliable than the control index, are shown in Fig. 7. The RSS for other ways of defining SFIP, evaluated using the satellite threshold of 0.4, also appear in Table 1. This table also includes results from evaluations carried out on 0.5° and 0.25° grids. Results for the BSS are shown in Table 2.
b. Evaluation of cloud forecasts using ground-based remote sensing observations
As an example of the data used for evaluation in this section, Fig. 8 shows time–height cross sections of cloud fraction over the SGP site in April 2007 for both the observations and the model.
Averaged over the month, profiles of cloud fraction in the observations and the model are shown in Fig. 9 for each of the three sites. These profiles are calculated considering liquid and frozen cloud fraction separately. In the left-hand column (Figs. 9a–c), the cloud fraction profile is calculated as a simple mean over the month. In the right-hand column (Figs. 9d–f), the profiles have been calculated as the frequency of occurrence (FOO) of cloud fraction greater than 2%.
By defining cloud occurrence events as occasions when the cloud fraction is greater than 2%, one can calculate the number of hits a, false alarms b, misses c, and correct negatives d. The frequency bias can then be found from B = (a + b)/(a + c). Focusing on the temperature range from 0° to −30°C, the frequency biases are shown in Fig. 9. In this case a hit indicates that the model has correctly predicted the occurrence of liquid cloud. While still focusing on the same temperature range, the frequency biases in Fig. 10 are calculated irrespective of cloud phase, which leads to a reduction in the frequency biases.
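The contingency-table calculation above can be sketched as follows (the function name and example values are illustrative):

```python
# Sketch of the frequency-bias calculation: events are "cloud fraction > 2%",
# counted as hits (a), false alarms (b), misses (c), and correct negatives
# (d); the frequency bias is B = (a + b) / (a + c).

def frequency_bias(model_cf, obs_cf, threshold=0.02):
    a = b = c = d = 0
    for m, o in zip(model_cf, obs_cf):
        model_event, obs_event = m > threshold, o > threshold
        if model_event and obs_event:
            a += 1
        elif model_event:
            b += 1
        elif obs_event:
            c += 1
        else:
            d += 1
    return (a + b) / (a + c)

print(frequency_bias([0.5, 0.1, 0.0, 0.0], [0.4, 0.0, 0.3, 0.0]))  # 1.0
```

A bias above one indicates the model predicts cloud occurrence more often than observed, and below one less often; B = 1 means the frequencies match, regardless of whether the individual events line up.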
c. Case study: 26 January 2017
On 26 January 2017, a number of incidents of in-flight ice accretion were reported to aviation forecasters at the Met Office. The time and locations of these events are shown in Table 3.
An example of a LOLAN chart, for weather-related aviation hazards below 10 000 ft (3050 m), issued at 0300 UTC and valid from 0800 to 1700 UTC 26 January 2017 is shown in Fig. 11, while an explanation of the abbreviations used on the chart can be found in Lankford (2001). Three incidents occurred below 3000 ft (910 m) within Region C where, according to the LOLAN chart, the 0°C isotherm was forecast to be between the surface and 3500 ft (1070 m) over land. In this region the forecast was for between 5 and 8 oktas of stratus or stratocumulus, with bases between 1000 and 2000 ft (305–610 m) and cloud tops between 2500 and 3500 ft (760–1070 m). Moderate icing was forecast for these clouds.
A WAFC chart, for weather-related aviation hazards above 10 000 ft (3050 m), valid at 0600 UTC 26 January 2017, is shown in Fig. 12 (unfortunately the 1200 UTC chart was not archived). The large expanse of cloud from Iberia to the west of Ireland and extending east into Scandinavia is marked as having the potential for moderate icing from below 10 000 to 18 000 ft (3050–5490 m).
A map of the satellite icing potential at 1200 UTC 26 January 2017, zoomed in over northwest Europe and the United Kingdom, is shown in Fig. 13a. Regions where the icing potential retrieval was impossible due to overlying ice cloud are masked in white. The control and SFIP-LI-BCF model indices are shown in Figs. 13b and 13c, filtered using a threshold of 0.1. The vertical maximum of SFIP-LI-BCF is shown in Fig. 13d. No masking has been applied to the model fields shown in Fig. 13, although the quantitative evaluation of Brier and ROC skill scores shown earlier was calculated only where the satellite icing potential had a valid retrieval. In each panel, the locations of the three reported icing incidents are shown by asterisks.
A different way of visualizing the model data from this day is to plot time–height cross sections at the locations where icing events occurred. These are shown in Fig. 14 for both the control index and SFIP-LI-BCF.
The work in this paper was motivated, in part, by the fact that the current Met Office operational icing index, which is calculated automatically from NWP model output, has such a tendency to overforecast that it is not seen as useful by many aviation forecasters and is hence rarely consulted. This impression is readily confirmed by Fig. 5, which showed that the control index has no reliability and has consistently very poor (often negative) Brier skill scores, irrespective of how the model index is compared to the satellite icing potential.
By comparison, the SFIP-L-RH index shows a large improvement in terms of reliability (Fig. 6). It does depend on exactly how the evaluation against satellite observations is carried out, but there are a number of permutations where SFIP-L-RH has positive Brier skill scores and where the data fall within the Bröcker and Smith (2007) 5th–95th percentile confidence interval, suggesting that the data are not inconsistent with being reliable.
It could be argued that the inclusion of the RH membership function in the calculation of SFIP is just a proxy for cloud fraction. After all, the shape of the RH membership function is reminiscent of the parameterization of cloud cover as a function of relative humidity developed by Sundqvist et al. (1989). Unfortunately, comparison of SFIP-L-LCF to SFIP-L-RH, in Tables 1 and 2, shows that replacing the relative humidity membership function with the liquid cloud fraction does not generally improve the Brier or ROC skill scores.
However, comparison of model profiles of liquid and ice cloud fraction against ground-based remote sensing observations from the ARM sites shows why this is the case. Although the mean model cloud fraction is quite well simulated in the temperature range from 0° to −30°C, it is the frequency of occurrence of nonzero cloud cover in this temperature range that is of most relevance to an icing index. In terms of liquid cloud fraction at subzero temperatures, notable frequency biases are seen over the three ARM sites (Fig. 9). Assessing the frequency of occurrence of model cloud, ignoring the phase obtained from the model, leads to reduced frequency biases in the temperature range from 0° to −30°C (Fig. 10). This information can be pulled through into the definition of SFIP: the SFIP is then calculated using the sum of LWC and IWC as the input to the condensate membership function, and the relative humidity membership function is replaced with the model bulk cloud fraction (that is, the cloud fraction irrespective of whether the condensate is liquid or ice). It is worth noting that the temperature membership function still ensures that SFIP will be zero outside the range from −28° to +1°C. Tables 1 and 2 show that SFIP-LI-BCF maintains the improved reliability of SFIP-L-RH relative to the control (in terms of BSS), while also beating the control in terms of RSS on a 1.25° grid.
The results discussed so far are exemplified by the case study of the icing events reported on 26 January 2017. Three occurrences of significant ice accretion were reported between 2000 and 2500 ft (around 610–760 m) over central England that day. The locations of these are shown overlaid on the satellite icing potential (Fig. 13a) and over the filtered control and SFIP-LI-BCF indices (Fig. 13b and c) and vertical maximum of SFIP-LI-BCF (Fig. 13d).
As is typical of the control index, vast spatial regions have large values (Fig. 13b), with limited areas having nonzero but small values (as would be expected from an index that replicates the numerical value of the relative humidity when cloud is present). This would be even worse if the vertical maximum of the control index were used (not shown). The new SFIP-LI-BCF index, by contrast, has more small values (Fig. 13c), even when using the vertical maximum (Fig. 13d), and subjectively seems to better replicate the extent and shape of the region of higher satellite icing potential seen over central England and the North Sea.
For the three locations where pilots reported in-flight icing to aviation meteorologists on 26 January, time–height cross sections of the icing index have been produced (Fig. 14). These start at 0000 UTC on the day before the reports and thus correspond to T + 36 h forecasts for the incidents concerned. The control index correctly flags that there is a risk of icing, but as soon as nonzero values are produced the index is frequently quite high, providing little indication of where conditions are more or less severe (Figs. 14a–c). The new SFIP-LI-BCF index, by contrast, shows a more focused shallow region of increased risk near cloud top, with lower risk below. It is indeed near cloud top that all three reported events occurred. Feedback from operational aviation meteorologists indicates that figures such as Figs. 14d–f, if produced with sufficient lead time, can provide useful additional guidance when it comes to issuing a forecast.
With the SFIP index being able to take any real value between 0 and 1, an additional request from forecasters was for guidance on the quantitative interpretation of the index. Clearly, larger numbers meant more risk, but how should the numbers be interpreted and what were the critical values? In this regard, the possibility of SFIP indices being calculated to many decimal places, leading to a multitude of different shades when displayed on a color scale, was not conducive to conveying a clear message.
Discussions with operational aviation meteorologists have highlighted a need for any automated model-derived products to provide a quantitative indication of both the likelihood of an icing event and the severity of the risk. Since SFIP-LI-BCF has been shown to have some reliability in the statistical sense (that is, observed events occur with the frequency that the forecast index suggests, across a range of index values), it would seem sensible to interpret SFIP-LI-BCF as the likelihood of icing conditions. Although the verification in section 3a showed the best results when the satellite icing potential was compared to the model index that had been collapsed to 2D by filtering, when displayed as a map for forecasting purposes it is better for the 2D likelihood to be calculated from the vertical maximum of SFIP-LI-BCF, as this highlights the maximum values that an aircraft may fly through as it goes from the surface to cruising altitude, or vice versa. For simplicity, SFIP-LI-BCF values larger than zero but less than 0.33 are interpreted as "low" likelihood, values between 0.33 and 0.66 as "medium" likelihood, and values larger than 0.66 as "high" likelihood.
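This mapping from a model column to a displayed likelihood category can be sketched as follows. The thresholds are those chosen in the text; the handling of values falling exactly on 0.33 or 0.66, and the "none" category for zero columns, are assumptions made for the example.

```python
def icing_likelihood(sfip_column):
    """Map the vertical maximum of SFIP-LI-BCF in a model column to a
    likelihood category for display (thresholds of 0.33 and 0.66 from the
    text; the treatment of the exact boundary values is an assumption)."""
    # Vertical maximum: the largest value an aircraft could encounter while
    # climbing from the surface to cruising altitude, or descending.
    peak = max(sfip_column)
    if peak <= 0.0:
        return "none"
    if peak < 0.33:
        return "low"
    if peak <= 0.66:
        return "medium"
    return "high"
```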
To estimate the severity of icing, full global 3D fields of liquid and ice condensate amount are considered. These are divided by the bulk cloud fraction to find the in-cloud condensate and are then multiplied by the temperature membership function from −28° to +1°C to focus on the temperature range of interest. Probability distribution functions (PDFs) of this in-cloud condensate are constructed and, somewhat arbitrarily, low severity is defined as the lowest 70 centiles of the PDF, medium severity as the 70th–90th centiles, and high severity as the top 10 centiles. For the UM using GA6, using global data for the whole of March and August 2016, this leads to thresholds between low and medium and between medium and high located at 1.3 and , respectively. This calibration step, which is model specific and can be done using a climatology, should be revisited whenever the operational model is upgraded.
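The centile-based calibration step can be sketched as follows. This is a minimal illustration: the 70th and 90th centiles follow the text, while the restriction of the sample to strictly positive values and the treatment of the boundary values are assumptions; the numerical thresholds quoted in the text for GA6 are not reproduced here.

```python
import numpy as np

def severity_thresholds(in_cloud_condensate):
    """Compute the low/medium and medium/high severity thresholds as the
    70th and 90th centiles of a sample of in-cloud condensate (a sketch of
    the model-specific climatological calibration described in the text)."""
    sample = np.asarray(in_cloud_condensate, dtype=float)
    sample = sample[sample > 0.0]  # only cloudy, supercooled points contribute
    return np.percentile(sample, [70.0, 90.0])

def severity_category(q_incloud, thresholds):
    """Assign low/medium/high severity given the two centile thresholds."""
    lo_med, med_hi = thresholds
    if q_incloud >= med_hi:
        return "high"
    if q_incloud >= lo_med:
        return "medium"
    return "low"
```

Because the thresholds are quantiles of the model's own climatology, a model upgrade simply means recomputing `severity_thresholds` on the new climatology, consistent with the revisiting of the calibration described in the text.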
One advantage of defining severity in this way comes when icing indices produced by different NWP systems are compared. Even though different models may have different cloud fraction and cloud condensate biases, if the thresholds are calculated for each system independently, then high severity will have the same interpretation: it means the conditions are expected to be in the worst 10% that the model is capable of producing.
Having defined low, medium, and high severity in this way, there is no suggestion that these severities will match the definitions of "trace," "light," "moderate," or "severe" that appear in icing PIREPS. However, it is well known that the severity of an icing event, although heavily influenced by the local meteorology, is also strongly dependent on the type of aircraft, its flight history, and the length of time spent in the supercooled conditions, so a direct mapping is not desirable.
The approach here is pragmatic and designed to highlight the regions where, meteorologically, the conditions are at their worst. One way of aligning the model-predicted severities with those that appear in PIREPS would be to measure the relative abundance of trace/light, moderate, and severe icing events in PIREPS and then find the values in the model in-cloud condensate PDF that split it into three categories occurring with the same relative frequencies. The model severity would still be phrased as low, medium, and high, but the thresholds between them would be constructed such that low severity occurs with the same frequency as trace and light in the PIREPS, medium with the same frequency as moderate, and high with the same frequency as severe. Alternatively, the thresholds could be chosen as a way to calibrate one NWP center's icing severity predictions against another center's predictions. Both of these are areas for future work.
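The frequency-matching idea, proposed above as future work, amounts to cutting the model PDF at the cumulative frequencies of the PIREP categories. The sketch below is hypothetical: the function name, the dictionary of report counts, and the grouping of trace with light are all illustrative choices, not an implemented method.

```python
import numpy as np

def pirep_matched_thresholds(model_condensate, pirep_counts):
    """Hypothetical calibration against PIREP category frequencies: choose
    in-cloud condensate thresholds so that the model's low/medium/high
    severities occur with the same relative frequency as the
    (trace+light)/moderate/severe report categories.
    `pirep_counts` is an illustrative dict of raw report counts."""
    n_low = pirep_counts["trace"] + pirep_counts["light"]
    n_med = pirep_counts["moderate"]
    n_high = pirep_counts["severe"]
    total = n_low + n_med + n_high
    # Cumulative relative frequencies define the centiles at which to cut
    # the model in-cloud condensate PDF.
    c1 = 100.0 * n_low / total
    c2 = 100.0 * (n_low + n_med) / total
    return np.percentile(np.asarray(model_condensate, dtype=float), [c1, c2])
```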
With these caveats regarding the definition of low, medium, and high likelihood and low, medium, and high severity, attention is now focused on the aesthetics of how the information could be displayed in order to be most useful. Figure 15 shows a combination of icing likelihood and icing severity using a simple 3 × 3 color matrix. The map shows that the three reported incidents all occurred near regions of both high likelihood and high severity. Meanwhile, the time–height plots indicate regions of high likelihood of icing near the observed events. In terms of severity, the observed events occurred close to regions forecast, with a lead-time of 30–36 h, to have in-cloud condensate amount in the top decile of the multimonth global climatology.
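The 3 × 3 display combines the two categorical quantities into a single cell per grid point. A minimal sketch of the lookup is given below; the color names in the palette are purely illustrative and are not those used in Fig. 15.

```python
def combined_icing_cell(likelihood, severity):
    """Look up the cell of a simple 3x3 likelihood-severity matrix, as in
    the combined display described in the text. The palette entries are
    illustrative placeholders, not the operational color scheme."""
    order = {"low": 0, "medium": 1, "high": 2}
    # Rows: likelihood; columns: severity. Danger increases toward the
    # bottom-right corner (high likelihood, high severity).
    palette = [
        ["pale yellow", "yellow", "amber"],
        ["yellow", "amber", "orange"],
        ["amber", "orange", "red"],
    ]
    return palette[order[likelihood]][order[severity]]
```

Restricting the map to nine discrete cells addresses the forecasters' concern, noted earlier, that an index displayed to many decimal places produces a multitude of shades and no clear message.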
Although the assessment of the different indices in this case study is rather subjective, it is consistent with what was found quantitatively in section 3a: the control index tends to overpredict the risk of icing and often produces high values, leaving little scope for differentiating between levels of risk. The SFIP-LI-BCF, by comparison, produces values that span the full range from 0 to 1, and high values are focused on smaller regions both horizontally and vertically.
A final point for discussion is the difference in the quantitative assessment of skill depending on whether the forecasts are evaluated on a 0.25°, 0.5°, or 1.25° grid. Table 1 shows that RSS is systematically lower when assessed on the finer grids, meaning that the forecast is less capable of predicting the correct location of events on a 0.25° scale than on a 1.25° scale. This is evidence of the double-penalty effect: when regridding to 1.25°, the forecast fields are effectively not penalized for displacement errors on scales up to 1.25°. Table 2 shows that when the control index is evaluated on a 0.25° or 0.5° grid, the reliability is very poor, with negative BSS, as previously discussed for the 1.25° grid.
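The effect of regridding can be illustrated with a simple block average, which is one common way of coarse-graining (the exact regridding method used in the verification is not specified here, so this is an assumption). A forecast feature displaced from the observed one by a single fine-grid cell is doubly penalized at 0.25° (a miss plus a false alarm) but matches exactly after averaging to the coarse cell.

```python
import numpy as np

def block_average(field, factor):
    """Coarse-grain a 2D field by averaging factor x factor blocks, e.g.
    factor = 5 takes a 0.25-degree grid to 1.25 degrees. Displacement
    errors smaller than the coarse cell vanish after averaging."""
    ny, nx = field.shape
    assert ny % factor == 0 and nx % factor == 0
    return field.reshape(ny // factor, factor,
                         nx // factor, factor).mean(axis=(1, 3))
```

For example, a forecast event at fine-grid cell (2, 1) and an observed event at (2, 2) disagree at every fine-grid point, yet produce identical 1.25° fields.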
When comparing the performance of the different indices on the different grids using BSS, the SFIP-based indices all have some reliability, irrespective of the grid, whereas the control index never does. Meanwhile, although the discriminatory skill (based on RSS) of the SFIP indices is not as good as that of the control on the 0.25° grid, this reduction is less apparent at 0.5° and hardly present at 1.25° (where the RSS for the control and SFIP indices is similar). If the indices had only been evaluated on the 0.25° grid, one would conclude that an improvement in BSS was only possible at the expense of RSS. However, on the 1.25° grid the SFIP indices maintain RSS while vastly improving BSS. Although this has required regridding to 1.25°, this is the grid on which operational icing indices are currently shown.
5. Summary and Conclusions
An icing index from the literature has been implemented in the Met Office forecasting system. Ground-based remote sensing observations are used to evaluate model hindcasts of cloud cover in the supercooled liquid water temperature range. The model is shown to overestimate the frequency of occurrence of liquid cloud. In the temperature range from 0° to −30°C, if one ignores the phase of the cloud predicted by the model, but just assesses the frequency of occurrence of cloud in general, the biases are reduced. Satellite-based retrievals of icing potential are used to quantitatively assess the skill of the model icing hindcasts. It is shown that the current icing index performs poorly, in line with anecdotal evidence that it is unreliable and not very useful. The new icing index, by comparison, shows evidence of reliability. When one element of the index, the relative humidity membership function, is replaced with the model-predicted bulk cloud fraction, it can improve reliability when assessed using passive satellite data on a 0.25° grid, while greatly reducing cloud occurrence frequency biases when compared to ground-based active sensing. Finally, to help with the rapid interpretation of the index in an operational context, a new color scale is proposed which quantifies both likelihood and severity in a traceable way. The SFIP-LI-BCF index is now being made available operationally in real time to aviation meteorologists at the Met Office.
There are several areas for potential future work: (i) the inclusion of information from an ensemble-based prediction system, (ii) the calibration of the severity component against pilot reports or severity indices produced by other NWP centers, (iii) making use of more recent satellite icing products (e.g., Smith et al. 2012), (iv) the inclusion of freezing rain, and (v) the assessment of the index, currently only evaluated in the global forecasting system, in convection-permitting regional modeling configurations.
Thanks are due to the United Kingdom's Civil Aviation Authority (CAA) for funding CM, KB, and RB. This work has benefitted from discussions with Bob Lunnon, Helen Wells, Piers Buchanan, Teil Howard, Peter Francis, Katie O'Boyle, and Ruth Steele. Some of the observational data were obtained from the Atmospheric Radiation Measurement (ARM) Climate Research Facility, a U.S. Department of Energy Office of Science user facility sponsored by the Office of Biological and Environmental Research.