An Iterative Approach toward Development of Ensemble Visualization Techniques for High-Impact Winter Weather Hazards. Part II: Product Evaluation

Jacob T. Radford, Department of Marine, Earth, and Atmospheric Sciences, North Carolina State University, Raleigh, North Carolina, and Cooperative Institute for Research in the Atmosphere, Colorado State University, Fort Collins, Colorado;

Gary M. Lackmann, Department of Marine, Earth, and Atmospheric Sciences, North Carolina State University, Raleigh, North Carolina;

Jean Goodwin, Department of Communication, North Carolina State University, Raleigh, North Carolina;

James Correia Jr., National Oceanic and Atmospheric Administration/National Weather Service/National Centers for Environmental Prediction/Weather Prediction Center, College Park, Maryland, and Cooperative Institute for Research in Environmental Sciences, University of Colorado Boulder, Boulder, Colorado;

Kirstin Harnos, National Oceanic and Atmospheric Administration/National Weather Service/National Centers for Environmental Prediction/Weather Prediction Center, College Park, Maryland


Abstract

We developed five prototype convection-allowing model ensemble visualization products with the goal of improving depictions of the timing of winter weather hazards. These products are interactive, web-based plots visualizing probabilistic onset times and durations of intense snowfall rates, probabilities of heavy snow at rush hour, periods of heightened impacts, and mesoscale snowband probabilities. Prototypes were evaluated in three experimental groups coordinated by the Weather Prediction Center (WPC) Hydrometeorological Testbed (HMT), with a total of 53 National Weather Service (NWS) forecasters. Forecasters were asked to complete a simple forecast exercise for a snowfall event, with a control group using the Storm Prediction Center’s (SPC) High-Resolution Ensemble Forecast (HREF) system viewer, and an experimental group using both the HREF viewer and the five experimental graphics. Forecast accuracy was similar between the groups, but the experimental group exhibited smaller mean absolute error for snowfall duration forecasts. In responses to a series of Likert-scale questions, participants responded favorably to all of the products and indicated that they would use them in operational forecasts and in communicating information to core partners. Forecasters also felt that the new products improved their comprehension of ensemble spread and reduced the time required to complete the forecasting exercise. Follow-up plenary discussions reiterated that there is a high demand for ensemble products of this type, though a number of potential improvements, such as greater customizability, were suggested. Ultimately, we demonstrated that social science methods can be effectively employed in the atmospheric sciences to yield improved visualization products.

© 2023 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Jacob Radford, jacob.radford@noaa.gov


Convection-allowing numerical weather prediction model (CAM) ensembles offer tremendous potential in forecasting a variety of hazardous weather threats, including severe weather, flash flooding, mesoscale snowbands, and other mesoscale features (Mass et al. 2002; Roebber et al. 2004; Smith et al. 2008; James et al. 2022). However, the sheer volume of model output necessitates innovative strategies to visualize the data in ways that are intuitive and actionable for operational weather forecasters and their customers. The effective communication of uncertainty information was a key component of an American Meteorological Society (AMS) strategic implementation plan in 2011 (Hirschberg et al. 2011) and remains an important goal toward improving decision-support services (NWS 2019, p. 10).

Effective visualization and communication of forecast uncertainty information has been shown to result in better societal outcomes. For example, Keith and Leyton (2007) found that an airline could save $50 million per year by applying probabilistic forecasts, while Steiner et al. (2008) showed potential reduction in flight delays and costs with probabilistic information. A number of studies have focused on meteorological ensemble visualization development and mathematical summarization of multidimensional datasets. Rautenhaus et al. (2018) present a review of these studies and visualization methods, including stamp maps (commonly referred to as “postage stamps”), spaghetti plots, probabilistic heat maps, ensemble mean and standard deviation, and ensemble clustering. Similarly, Wang et al. (2019) review ensemble visualization strategies from a broader perspective, including research within computer-science- and physics-based disciplines. These works have made valuable contributions to the field of uncertainty visualization, but work remains to be done in evaluating the extensibility of these products to CAMs, the feasibility of operational implementation of these products, and the utility of these products from the perspectives of operational forecasters.

Novak et al. (2008) found that forecasters desire high-resolution ensemble guidance, but there was a lack of postprocessing products to facilitate uncertainty communication. Evans et al. (2014) followed up on this study and used a forecasting exercise to test specifically how forecasters were using convection-allowing ensemble output, finding that the output was generally used to identify a “most likely” scenario, rather than interrogating the individual solutions. In the realm of severe weather, Schwartz et al. (2015) demonstrated operational application of visualization techniques, such as ensemble and neighborhood probabilities of updraft helicity, hail, and winds, to NCAR’s experimental real-time ensemble prediction system. Roberts et al. (2019) summarized several CAM visualization techniques tailored to severe weather, including neighborhood (maximum) ensemble probabilities, paintball plots, and postage stamps. NOAA’s Hazardous Weather Testbed Spring Forecasting Experiment (SFE; Clark et al. 2012) has also facilitated research into operational forecaster perspectives on probabilistic tools. For example, in a survey of 62 forecasters at the 2017 SFE, Wilson et al. (2019) found that many forecasters were misinterpreting or unable to explain probabilistic output from a storm-scale ensemble system and advocated for greater professional development in this area. It is clear that work remains to be done in the areas of uncertainty training, product development, and communication.

Demuth et al. (2020) have also made progress in this discipline, using social science research, including participant observation and interviews, to evaluate operational forecaster perspectives and needs on the current state of ensemble visualization and specifically on three novel visualization products. They found that forecasters desired products that visualize event timing and potential impact, and they sought greater transparency in the development of probabilistic products. This is consistent with the U.S. National Weather Service’s recent emphasis on impact-based decision support services (IDSS), an initiative to emphasize and communicate potential impacts to end users (NWS 2019). Demuth et al. (2020) concluded that “robustly incorporating … social science research allowed us to look more broadly and listen more deeply in ways that reveal critical mismatches, gaps, and potential solutions.” This “systems” approach to development, in which researchers and forecasters, and perhaps decision-makers, work together to improve the forecast communication pipeline, is another emphasis of the American Meteorological Society’s Weather and Climate Enterprise Strategic Implementation Plan for Generating and Communicating Forecast Uncertainty (Hirschberg et al. 2011). This is a similar concept to Abras et al.’s (2004) “user-centered design,” advocating for active user involvement in the design process. In order for new tools to be incorporated into operations, they must be credible, salient, and legitimate in the eyes of forecasters, attributes that are enhanced by iteration and open dialogue (Cash et al. 2003; White et al. 2010).

In Part I of this research (Radford et al. 2023), nine prototype ensemble visualization products were developed based upon input from operational forecasters to improve communication of winter weather forecast impacts and uncertainty. Follow-up discussion with the winter Forecasting a Continuum of Environmental Threats (FACETs; Rothfusz et al. 2018) team helped us to select five of these prototypes for continued development and evaluation. These five products included visualizations of probabilistic snowfall onset time [Fig. 1a; based on the Demuth et al. (2020) onset time graphic with the addition of earliest, most likely, and latest potential onset] and duration [Fig. 1b; based on the Demuth et al. (2020) duration graphic with the addition of shortest, most likely, and longest potential duration], the probability of snowfall during rush hour (Fig. 1c), a mesoscale snowband probability heat map using the Radford et al. (2019) detection algorithm (Fig. 1d), and a combination meteogram emphasizing periods with the highest potential impacts (Fig. 1e). These products are web based and interactive, offering the capability to zoom, pan, toggle percentiles, and sample values. They were applied in real time to output from the High-Resolution Ensemble Forecast (HREF) system. Now we seek to measure the utility of these products from the perspective of forecasters. We evaluate our products both qualitatively and quantitatively to determine whether the new visualizations improve forecast accuracy, reduce task workload, improve conceptualizations of forecast uncertainty, and would be incorporated into an operational workflow.
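To make the percentile-based onset and duration fields concrete, the sketch below illustrates one way such fields could be derived from hourly ensemble snowfall-rate output. It is an illustration only: the array names, shapes, and the treatment of members that never reach the rate threshold are our assumptions, not a description of the operational code behind the products.

```python
import numpy as np

def onset_and_duration_percentiles(rates, threshold, percentiles=(10, 50, 90)):
    """Illustrative per-gridpoint onset/duration percentiles across ensemble members.

    rates: hourly liquid equivalent snowfall rates, shape (n_members, n_hours, ny, nx),
           in the same units as `threshold` (e.g., inches per hour).
    Returns (onset_pcts, duration_pcts), each shaped (len(percentiles), ny, nx).
    """
    exceeds = rates >= threshold

    # Onset: first forecast hour each member meets the threshold (NaN if it never does).
    onset = np.argmax(exceeds, axis=1).astype(float)
    never = ~exceeds.any(axis=1)
    onset[never] = np.nan

    # Duration: number of hours each member meets the threshold (0 if it never does).
    duration = exceeds.sum(axis=1).astype(float)

    # Percentiles across members; members with no event are ignored for onset.
    onset_pcts = np.nanpercentile(onset, percentiles, axis=0)
    duration_pcts = np.percentile(duration, percentiles, axis=0)
    return onset_pcts, duration_pcts

# Example with synthetic data: 10 members, 48 forecast hours, 0.10 in. per hour threshold.
rates = np.random.gamma(shape=0.5, scale=0.05, size=(10, 48, 20, 30))
earliest, most_likely, latest = onset_and_duration_percentiles(rates, 0.10)[0]
```

Mapping the 10th, 50th, and 90th percentiles of onset to "earliest," "most likely," and "latest" (and likewise for duration) reproduces the three views exposed by the percentile toggle described above.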

Fig. 1. Five prototype visualizations from Radford et al. (2023), including (a) probabilistic snowfall rate onset time, (b) probabilistic snowfall rate duration, (c) probability of snowfall during rush hour, (d) mesoscale snowband probability heat map, and (e) combination meteogram.

Methods

We evaluated our visualizations with a total of 53 NWS forecasters during the winter of 2021/22 through three separate experiments coordinated by the WPC Hydrometeorological Testbed (HMT) designed to quantify forecast accuracy, forecaster confidence, and forecaster workload with and without the new products. In other words, we sought to determine whether there was quantifiable value in these new products beyond that of commonly utilized existing tools. Each of these experiments followed the same procedure: participants were given a brief introduction to each of the new products and to one of three high-impact snowfall events, then were randomly assigned to either the control or experimental group and asked to perform a mock forecasting exercise for their event using a Google Form. The control groups (total n = 26) were asked to complete the exercise using only the SPC HREF viewer web page (Roberts et al. 2019). The experimental groups (total n = 27) were asked to complete the exercise using both the SPC’s HREF viewer web page and the web page containing our new products (sample figures provided at www.visweather.com/bams2023). The forecasting exercise primarily concerned snowfall event timing and associated uncertainties for three different locations impacted by each event. The exercise was intended to encourage active engagement with the new products to stimulate discussion and feedback, with a secondary goal of comparing task accuracy.

The three forecasting cases were all recent heavy snow events, chosen because they met a few criteria to ensure our results are robust to different scenarios. First, each of the events occurred in a different region of the United States. Event 1 occurred on 17–18 December 2020 across New York and Connecticut. Event 2 occurred on 15–16 January 2021 in Nebraska and Iowa. Event 3 occurred on 14–15 February 2021 in Texas and Mississippi. In addition, each event had at least moderate potential for mesoscale banding and forecast liquid equivalent snowfall rates (LESRs) exceeding 0.10 in. h−1 (1 in. = 2.54 cm) according to the Multi-Radar Multi-Sensor (MRMS) dataset (Zhang et al. 2016). The three locations within each event were chosen to provide a spectrum of different environments with a range of different precipitation intensities and durations. For event 1, these locations were Albany and Syracuse, New York, and Bridgeport, Connecticut. For event 2, the locations were Lincoln, Nebraska, Concordia, Kansas, and Des Moines, Iowa. For event 3, the locations were Austin and Longview, Texas, and Greenville, Mississippi. Event 1 provided an HREF initialization at a short lead time, event 2 at a long lead time, and event 3 at an intermediate lead time. Events and locations were also chosen such that precipitation type would not be of primary concern. The domains and locations for each event are shown in Fig. 2.

Fig. 2. The domains and locations used for each of the three forecasting exercises.

Identifying a precipitation dataset to verify forecast accuracy was difficult, given known challenges of producing quantitative precipitation estimates (QPE) for snowfall events, such as stuck gauges, gauge undercatch, radar beam overshooting, and variation in snow crystal type (Martinaitis et al. 2015; Hanft et al. 2019). The National Operational Hydrologic Remote Sensing Center’s (NOHRSC; Carroll et al. 2001) National Snow Analyses would have been ideal, but the finest temporal resolution of this product is 6 h, which is insufficient for evaluating hourly CAM output. The MRMS dataset produces hourly QPE based on quality-controlled radar reflectivity, with gauge data from the Hydrometeorological Automated Data System (HADS) and NWP data from the High-Resolution Rapid Refresh (HRRR; Benjamin et al. 2016) helping to fill in areas of poor radar coverage. While this is the best hourly QPE dataset available to our knowledge, we acknowledge that radar-based QPEs have large uncertainties, particularly for heavy snowfall events (Hanft et al. 2019), and that this is a relatively large potential source of error when validating forecast accuracy. Observed characteristics of the three snowfall events at each location according to MRMS, including onset time and duration at two LESR thresholds, are presented in Table 1.

Table 1. Characteristics of the three heavy snowfall events for each of the three locations, including HREF initialization time, MRMS observed onset time of 0.05- and 0.10-in. LESR, and MRMS observed duration of 0.05- and 0.10-in. LESR. “DNA” indicates that we did not ask participants about this particular rate, while “N/A” indicates that the threshold was never met at the given location.

Forecasters were asked to predict these same attributes for the three locations in their assigned event, after which mean absolute errors (MAEs) were calculated for the control and experimental groups and compared. Given the small sample sizes and large uncertainties associated with the MRMS verification dataset, this comparison is intended only to give an initial impression of the groups’ performances and rule out the possibility of large negative influences associated with the experimental products. We note also that the exercise questions are specific to the new products and thus may be viewed as biased in their favor. However, our hypothesis is that the questions posed (“when will heavy snow start?” and “how long will heavy snow last?”) are relevant to forecasters and may be difficult to answer using existing tools, a motivation for research and development. Additionally, our goal is for these products to be used in tandem with existing tools rather than as replacements.

Following completion of the exercise, the experimental group (n = 27) was asked a series of Likert-scale (Likert 1932) questions to self-assess on a scale from 1 (strongly disagree) to 5 (strongly agree) whether the products increased forecaster comprehension of event uncertainties, reduced their task workload, would be incorporated into their operational workflow, or would be beneficial in communication with core partners. A free-response question at the end of the survey asked participants for any additional comments or feedback on either the products themselves or the exercise. Both the control group and experimental group reconvened after the exercise for a plenary discussion on the utility of the new products and ideas for further development, which was audio recorded and transcribed. This procedure yielded two quantitative datasets (forecast accuracy and Likert scale) and one qualitative dataset (free responses and discussion). The qualitative data were categorized by product and then analyzed using an inductive approach to identify common themes (Braun and Clarke 2006). We then relate these themes to the quantitative Likert-scale responses.

Results

Forecast accuracy.

Our evaluation sessions first asked participants to perform a brief forecasting exercise for one of three heavy snowfall events using either the SPC HREF viewer or the SPC HREF viewer and our experimental visualizations. Participants were asked to predict the most likely onset time and duration of various snowfall rates at different locations, with responses verified against MRMS data for the same period. It was not necessarily expected that there would be a substantial difference in forecast accuracy between the control and experimental groups as both groups had access to the same underlying HREF data and were capable of reaching the same conclusions. However, it was important that the experimental group forecasts were at least as accurate as the control group, indicating that the new products did not negatively influence forecast quality, either through inaccurate data representations or user confusion.

Two factors went into evaluating snowfall onset forecasts. First, did the forecasters predict that the specified snowfall rate would occur at any time for a given location? Second, given that a forecaster predicted the snowfall rate would occur at the location, how far off was their predicted onset time from the actual occurrence? Results are reported in Table 2. For event 1, 95% of responses across all locations correctly forecast the occurrence of 0.10-in. LESRs in both the control and experimental groups. For event 2, 70% of responses in the control group correctly forecast the occurrence of 0.10-in. LESRs, similar to the 67% in the experimental group. However, in event 3, only 41% of control group respondents correctly predicted the occurrence of 0.10-in. LESRs compared to 70% for the experimental group.
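The two-part verification can be summarized in a few lines of code. The following sketch computes, for a single location, the occurrence hit fraction and the onset MAE conditioned on a forecast of occurrence; the function name, inputs, and handling of non-occurrence forecasts are our assumptions, shown only to make the metrics explicit.

```python
import numpy as np

def verify_onset(forecast_onsets, observed_onset):
    """Two-part onset verification for one location.

    forecast_onsets: forecast onset hours, with None meaning the forecaster
        predicted the rate threshold would never be reached.
    observed_onset: observed (MRMS) onset hour, or None if never observed.
    Returns (fraction_correct_occurrence, onset_mae_hours).
    """
    obs_occurred = observed_onset is not None

    # Part 1: fraction of forecasters who correctly called occurrence/non-occurrence.
    correct = [(f is not None) == obs_occurred for f in forecast_onsets]
    fraction_correct = float(np.mean(correct))

    # Part 2: onset MAE, using only forecasters who predicted occurrence.
    if not obs_occurred:
        return fraction_correct, None
    errors = [abs(f - observed_onset) for f in forecast_onsets if f is not None]
    mae = float(np.mean(errors)) if errors else None
    return fraction_correct, mae

# Example: four forecasters, threshold rate observed to begin at forecast hour 15.
print(verify_onset([14, 15, None, 18], observed_onset=15))  # (0.75, 1.33...)
```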

Table 2. The percentage of forecasters in each group that correctly predicted whether or not 0.10 in. h−1 LESRs would be observed at any time during the forecast period, and the mean absolute errors for forecast onset times of 0.10 in. h−1 LESRs for each event and location. Cells marked “N/O” had no observation exceeding 0.10 in. h−1 liquid equivalent, while cells marked “N/F” had no forecasters predict occurrence of 0.10 in. h−1 LESRs. The number of forecasters included in each MAE calculation is given below the MAE. Some participants were excluded from these calculations because they incorrectly predicted that the snowfall threshold would not be exceeded at any time.

For events 1 and 2, control and experimental 0.10-in. LESR onset time errors (Table 2) were very similar, with MAEs of 1.57 and 2.05 h for the two control groups and 1.57 and 1.90 h for the two experimental groups. For event 3, the control MAE was 0.91 h, less than the experimental MAE of 1.81 h. Further investigation showed that this difference was primarily the result of forecasts for location B in event 3. At location B, no control forecasters predicted the occurrence of 0.10-in. LESRs, while four experimental forecasters correctly predicted its occurrence. However, the four experimental participants who made onset time forecasts for this location had relatively high errors. Thus, error for the experimental group was inflated by their forecasts for location B. Excluding location B from the calculation of MAEs results in an MAE of 0.91 h for the control group and a more comparable 1.18 h MAE for the experimental group. Both groups performed slightly worse than forecasts based on the HREF median onset times. Given the similar performance of the control and experimental groups for events 1 and 2, and the similar performance on event 3 when accounting for the disparity in correct responses for location B, we do not believe there to be a discernible difference in onset time forecast skill.

Differences in forecast skill for event duration were more easily quantified than for event onset. The MAEs for the duration of 0.10 in. h−1 LESRs for all locations (Table 3) across the three control groups were 4.05 h (n = 7), 4.23 h (n = 10), and 0.85 h (n = 9), compared to MAEs of 2.52 h (n = 7), 2.40 h (n = 10), and 0.57 h (n = 10) for the experimental groups. This is an average error reduction of 38% for the experimental groups, consistent across the three events. Beginning with group 2, we also asked forecasters to predict the duration of 0.05 in. h−1 LESRs, since the relatively short duration of 0.10 in. h−1 LESRs produced little variance among responses. For this lower intensity threshold, MAEs for duration forecasts for the two control groups were 3.23 h (n = 10) and 2.78 h (n = 9), compared to 3.40 h (n = 10) and 1.57 h (n = 10) for the experimental groups. On average, this is a 20% reduction in error for the lower 0.05 in. h−1 liquid equivalent threshold, but the improvement was solely due to much better performance by the experimental group in event 3. The control group performed slightly worse than forecasts based on the HREF median, while the experimental group performed slightly better.

Table 3. Mean absolute errors for forecast durations of 0.10 in. h−1 LESRs for each event and location.

We end this section by highlighting the caveat that our sample sizes are too small and uncertainties too high to make any robust conclusions about forecast accuracy. However, we did not see any evidence to indicate that the experimental groups were hindered by our products on any of the forecast tasks and may in fact have benefited from their application on duration forecasts. In addition, this was the first exposure forecasters had to the new graphics. Further training and practice could yield benefits to snowfall forecast accuracy.

Forecaster feedback.

The second portion of the forecaster survey provided forecasters in the experimental groups with a series of statements in the Likert-scale format, asking participants whether they strongly disagreed (1), disagreed (2), were neutral (3), agreed (4), or strongly agreed (5) with the statement. For the onset, duration, and rush hour probability products, forecasters were asked whether the product reduced the amount of time they felt the forecasting exercise required, whether the product improved their comprehension of ensemble spread, whether they would incorporate the product into their operational workflow, and whether they would communicate the information from the product to their core partners.

The Likert-scale distributions were our attempt to quantify the utility of the new products and the transition potential of the products into NWS operations. This is just one piece of the puzzle, the other being qualitative, open-ended feedback. Participants had two opportunities to provide their qualitative assessments of the products. First, an optional question at the end of the forecasting exercise asked for paragraph-style feedback, such as whether any of the products stood out, their suggestions for improvement, or their own ideas for visualization strategies. Following completion of the exercise, the control groups and experimental groups reconvened to debrief. Though the researchers occasionally guided participants with questions in this session, forecasters were quick to ask for clarifications, share their opinions, and offer suggestions for improvements or additional capabilities. Key themes from the qualitative feedback were identified using an inductive analysis (Braun and Clarke 2006). In the analysis below, we marry the quantitative Likert-scale feedback with the qualitative, discussion-based feedback. The combination meteogram was not included in the Likert-scale analysis, as it did not correspond directly to one of the forecast tasks in the mock exercise, but we have included qualitative feedback on the product.

Figure 3 shows the distribution of Likert-scale responses to questions on the snowfall onset time product. Overall response to the product was very positive, with forecasters indicating that they would use the product in operations (mean = 4.26) and to communicate information to core partners (mean = 4.48). Forecasters also indicated that they felt they had a better grasp of the HREF onset forecast spread (mean = 4.17) due to the product, presumably as a result of the capability to view the 10th-, 50th-, and 90th-percentile onset times. However, some forecasters commented that the product did not appear to convey much uncertainty information. We suspect that this is partially a consequence of the color mapping, which binned onset times into 3-h windows. The HREF spread between the 10th, 50th, and 90th percentiles often fell within the same color bin, resulting in minimal changes to the plot when toggling between the percentiles. Increasing the temporal resolution of the color map to hourly rather than 3-hourly would be an improvement, but this introduces a new challenge for the user: distinguishing between very similar colors. A compromise that we have since implemented (suggested by participants) is to retain the 3-hourly binned color bar but add the capability for users to hover over a point to get text readouts of the onset times at hourly intervals.
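As an illustration of that compromise, the sketch below builds a heat map with 3-hourly color bins but hourly hover readouts. The article does not specify the plotting library used for the web products, so the use of Plotly here is an assumption, and the onset data and labels are synthetic placeholders.

```python
import numpy as np
import plotly.graph_objects as go

# Placeholder onset field: 2-D array of forecast hours (ny, nx).
ny, nx = 60, 80
onset_fhr = np.random.uniform(0, 24, size=(ny, nx))

# Hourly text readouts for hover, even though fill colors are binned to 3 h.
hover_text = np.floor(onset_fhr).astype(int).astype(str)

# Discrete colorscale: one color per 3-h bin over forecast hours 0-24.
colors = ["#313695", "#4575b4", "#74add1", "#abd9e9",
          "#fee090", "#fdae61", "#f46d43", "#d73027"]
edges = np.arange(0, 27, 3) / 24.0
colorscale = []
for i, color in enumerate(colors):
    colorscale += [[edges[i], color], [edges[i + 1], color]]

fig = go.Figure(go.Heatmap(
    z=onset_fhr, zmin=0, zmax=24,
    colorscale=colorscale,
    customdata=hover_text,
    hovertemplate="Onset: forecast hour %{customdata}<extra></extra>",
))
fig.write_html("onset_hover_demo.html")
```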

Fig. 3. (a) A sample of the snowfall rate onset graphic on 17 Dec 2020 and (b) Likert-scale responses to questions on the snowfall onset time product. The dotted line is the divergence point between negative/neutral responses and positive responses.

There was less agreement that the product saved the forecasters time for the forecast task (mean = 3.85). Forecasters commented that they had difficulty translating the contour colors to the color bar (the onset product used the Python “jet” color map), particularly for blues and greens. Balancing data resolution against keeping the colors distinct enough to tell apart is a common visualization challenge. As noted above, a hover feature has since been added that gives forecasters a second avenue to retrieve precise values, while the color bar still provides a more general sense of onset times. The prototype onset time product also listed onset times in terms of forecast hour from the HREF initialization time rather than the valid UTC time, requiring forecasters to calculate the valid time from the initialization time and forecast hour, a time sink for some forecasters. A dynamic color bar that converts the forecast hour to UTC time has since been adopted for this reason. Finally, there were questions about how the onset time is determined. For example, how is onset time calculated when members predict different precipitation types? And is the onset time conditioned on a prediction of precipitation above a particular threshold? Quotes pertaining to the onset time and duration graphics are shown together in Table 4, as they were often grouped by forecasters for feedback.
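The conversion that forecasters were doing by hand is straightforward to automate. A minimal sketch of generating valid-time labels from an initialization time follows; the function name, label format, and the initialization time itself are illustrative assumptions rather than the labeling used in the operational product.

```python
from datetime import datetime, timedelta, timezone

def valid_time_labels(init_time, forecast_hours):
    """Convert forecast hours to valid UTC time labels for a dynamic color bar."""
    return [(init_time + timedelta(hours=int(fh))).strftime("%HZ %d %b")
            for fh in forecast_hours]

# Arbitrary example: a 1200 UTC initialization, labels every 3 h.
init = datetime(2020, 12, 16, 12, tzinfo=timezone.utc)
print(valid_time_labels(init, range(0, 13, 3)))
# ['12Z 16 Dec', '15Z 16 Dec', '18Z 16 Dec', '21Z 16 Dec', '00Z 17 Dec']
```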

Table 4. Forecaster quotes pertaining to the onset time and duration products.

The Likert-scale divergence plot for the snowfall duration product (Fig. 4) is generally similar to that for the onset time product. Forecasters again agreed that they would use the product in operations (mean = 4.41) and would communicate the information to their core partners (mean = 4.37). They also agreed that the product increased their grasp of the spread of HREF duration forecasts (mean = 4.21). Interestingly, there was much stronger agreement here that the duration product reduced the amount of time spent on the forecast exercise (mean = 4.30) compared to the onset product. We propose a few potential factors that likely contribute to this difference: First, existing tools require forecasters to examine every individual output time to sum the total precipitation duration, while the duration product only requires examination of one graphic. Second, visualizing a variable like duration, which has a minimum and maximum value and a standard unit of hours, may be more intuitive for forecasters than visualizing a time. Third, it may have been easier for forecasters to distinguish values on the duration plot, which used the Python “viridis” color map, though multiple forecasters still noted that they found this distinction difficult. Finally, the duration product did not require users to perform the extra calculation to convert from forecast hour to valid UTC time. In other words, the duration graphic was much more effective in encoding information that could be accurately decoded by forecasters. This is perhaps a valuable example of how visualization choices influence task efficacy, despite being predicated upon the same underlying data. For the duration and onset graphics, we incorporated a queryable rate threshold for the second and third groups, as suggested by Demuth et al. (2020). This addition seemed to be highly valued by forecasters, especially given the very limited rate thresholds offered in the SPC HREF viewer. One forecaster stated, “One thing that annoys me about the SPC HREF page is that when you’re doing the hourly precip … you either have 1/100th of an inch or one inch—no range in there, and it drives me crazy.”

Fig. 4. (a) A sample of the snowfall rate duration graphic on 17 Dec 2020 and (b) Likert-scale responses to questions on the snowfall duration product. The dotted line is the divergence point between negative/neutral responses and positive responses.

The Likert-scale responses for the rush hour probability plot are shown in Fig. 5. This product had the greatest level of agreement among forecasters that they would use the product in operations (mean = 4.48) and communicate the information to core partners (mean = 4.51). There was also slightly more agreement here that the rush hour product improved their grasp of the HREF forecast uncertainty associated with snowfall between the hours of 0700 and 0900 or 1600 and 1800 local time (mean = 4.41). This is not a surprising result, since the rush hour plot is presented in a more compact probabilistic heat map format rather than with a toggleable percentile slider as in the onset and duration products. Over half of the forecasters strongly agreed that the rush hour probability product reduced the time required for the forecasting exercise, roughly double the number for the onset and duration products. Overwhelmingly, the most common feedback received for the rush hour product was a desire for greater customizability, because rush hour varies considerably by location. At a minimum, the rush hour period needs to be adjustable, and ideally forecasters would be able to calculate the probability of rate exceedance between any two times of their choosing, not just rush hour. With this adjustment, most forecasters felt that the rush hour product would be an efficient way to identify and communicate potential impacts to partners. As a more substantial modification, we discuss implementing this feature in the “Additional development” section. On the other hand, one forecaster wrote, “I am not sure [the rush hour products] are necessary since forecasters can look at the duration and onset timing product and figure out if the heavy snow overlaps the rush hour.” This is certainly true and may be preferable to some users, but overall response to the time reduction question indicates that an explicit rush hour product is viewed as a more efficient means to identify commuting hazards. Quotes pertaining to the rush hour graphic are shown in Table 5.
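Conceptually, the rush hour probability is just the fraction of ensemble members that meet the rate threshold at some point inside the commute window. A rough sketch is given below; the array names, the use of a single time zone, and the window convention are our assumptions rather than the product's exact implementation.

```python
import numpy as np

def rush_hour_probability(rates, local_hours, threshold, window=(7, 9)):
    """Fraction of ensemble members meeting `threshold` at any hour in the window.

    rates: hourly liquid equivalent snowfall rates, shape (n_members, n_hours, ny, nx).
    local_hours: local clock hour (0-23) for each forecast step; a single time
        zone is assumed for simplicity.
    window: (start_hour, end_hour) of the commute period, end exclusive.
    """
    local_hours = np.asarray(local_hours)
    in_window = (local_hours >= window[0]) & (local_hours < window[1])
    member_hits = (rates[:, in_window] >= threshold).any(axis=1)
    return member_hits.mean(axis=0)  # (ny, nx) probability field

# Example: morning rush hour probability of reaching 0.10 in. per hour.
rates = np.random.gamma(0.5, 0.05, size=(10, 48, 20, 30))
local_hours = (np.arange(48) + 7) % 24   # arbitrary mapping of steps to local time
prob = rush_hour_probability(rates, local_hours, 0.10)
```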

Fig. 5. (a) A sample of the rush hour snowfall rate intersection graphic on 17 Dec 2020 and (b) Likert-scale responses to questions on the rush hour probability product. The dotted line is the divergence point between negative/neutral responses and positive responses.

Table 5. Forecaster quotes pertaining to the rush hour product.

The last product evaluated with the Likert scale was the snowband probability heat map, visualizing heightened risk of mesoscale snowband development. Recall that this product was not a component of the “event timing” product package as described in Radford et al. (2023), was less explicitly influenced by forecaster input, and did not respond directly to IDSS needs. These facts are generally reflected in the Likert scale (Fig. 6) and written responses. While a relatively high proportion of forecasters indicated that they would use the product in operations (mean = 4.22), multiple forecasters noted that they do not blindly trust snowband graphics and instead prefer to emphasize snowband environmental conditions. Others noted that while the text description of the product did cite a journal article describing the snowband detection algorithm, more direct details in or adjacent to the graphic were needed to establish trust. Forecasters were much more skeptical about communicating this product to core partners (mean = 3.78) for two reasons. First, they would not communicate a product they do not trust or understand fully. Second, forecasters indicated that the presence of banding is, in some ways, a scientific curiosity that partners, such as emergency managers, do not need to know about. Agreement that the product improved uncertainty comprehension (mean = 3.96) and reduced task time (mean = 3.96) was also lower than for the other products. Quotes pertaining to the snowband graphic are shown in Table 6.

Fig. 6. (a) A sample of the snowband probability graphic on 17 Dec 2020 and (b) Likert-scale responses to questions on the snowband probability heat map product. The dotted line is the divergence point between negative/neutral responses and positive responses.

Table 6. Forecaster quotes pertaining to the snowband probability product.

The combination meteogram was not evaluated with the Likert scale but was still the subject of much forecaster feedback. Unlike the other temporal products, the meteogram elicited strongly diverging opinions. Feedback ranged from “the combination meteogram needs more maturing to quickly glean information from it” and “meteograms are a great concept but the actual visualization could be much improved,” to “I was particularly impressed by the combination meteogram, which synthesized a lot of forecast data into one easy to understand product” and “I actually found myself using [the combination meteogram] quite a bit … it drew my attention right away to those hours that I needed to focus on … I would say that it was really helpful for me to look at it that way.” Suggestions for improvements to the meteogram included wider bars to make hovering easier and including more statistics (such as minimum and maximum rates or distributions) in the text bubble that appears when hovering over a bar. One interesting finding related to the meteogram was that different forecasters seemed to take different pathways to perform the forecast exercise. While many forecasters chose to primarily use the snowfall onset and duration products, some others relied heavily upon the combination meteogram as a “one-stop shop” for both onset and duration questions.

More holistic commentary on the products is shown in Table 7, with four additional themes emerging. First, participants largely agreed that new products do not need to be implemented into internal viewers such as AWIPS to be useful, with forecasters frequently turning to cloud-based solutions. For example, 45% of participants use the SPC’s HREF viewer daily and 77% use it daily or weekly. Second, addressing precipitation type and snow-to-liquid ratios (SLRs) was a common suggestion for future work. Multiple forecasters wished to see these products applied to freezing rain [a similar result was found by Demuth et al. (2020)] and to incorporate SLR algorithms other than a fixed 10-to-1 ratio, which was generally viewed as insufficient given snow density’s direct relevance to impacts. Third, forecasters appreciated the incorporation of uncertainty information in the form of the 10th, 50th, and 90th percentiles for the onset and duration products as an additional hedge for complex events that they can communicate to partners, again in keeping with Demuth et al. (2020), where the addition of probabilistic information to onset and duration was one of the most pressing needs. Fourth, as discussed previously, adding a mouse-over feature to sample values in all products was viewed as a virtual necessity to facilitate easier mental mapping of colors to values. Overall, forecasters made it abundantly clear that there is high demand for these types of temporal visualizations within the NWS. Said one forecaster, “This stuff is right on the money … we always get caught up in how much snow is going to fall … and what we hear from our partners is they want to know what time it’s going to start. Spot on with your visualizations, great stuff.” Said another, “These experimental graphics would be a huge asset in operations. Leading up to a big event they would help with forecasting, but I can see them being a huge asset in IDSS.” Several participants asked if the visualizations were being produced in real time and if they could be shared with colleagues and partners.

Table 7. Forecaster quotes that did not pertain to a specific product.

Additional development.

Product development is an ongoing process and does not stop simply because our “final” products were well-received and we reached the last stage of our study. For example, all three evaluation groups engaged in discussion about how the rush hour product might be improved. Many forecasters expressed that while they found the product useful, rush hour varies substantially from city to city, and a customizable product could account for that. Additionally, there may be periods other than rush hour that would benefit from this type of visualization, such as a special event. Based on this feedback we developed a new “interval exceedance” product (Fig. 7a). Interval exceedances identify the HREF probability of exceeding a snowfall rate threshold between any two forecast hours, defined by the user. This type of customizability was also referenced in Demuth et al. (2020) as a potential avenue for future development. This allows forecasters to customize the tool to display the high snowfall rate probabilities for their own rush hour, or any other period of interest. Based on this same “interval” concept, a second new product was developed, termed “interval durations” (Fig. 7b), which allows forecasters to view the proportion of time that a snowfall rate is exceeded between any two forecast hours. These products are highly customizable, and forecasters reacted very positively to the concept, but the benefits come at a substantially increased computational cost. That said, we have demonstrated that using real-time data is feasible, particularly if users are willing to forego a bit of temporal resolution. Since these products were developed following the evaluation groups, they have not been evaluated in a robust sense, and this may be an area of future work.
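A rough sketch of both interval computations is shown below; the array names and shapes are assumed, and the "proportion of time" is taken here as an ensemble mean, which is one of several reasonable choices rather than a documented detail of the products.

```python
import numpy as np

def interval_exceedance(rates, threshold, start_fh, end_fh):
    """Probability that `threshold` is met at any hour in [start_fh, end_fh]."""
    window = rates[:, start_fh:end_fh + 1] >= threshold  # (members, hours, ny, nx)
    return window.any(axis=1).mean(axis=0)               # fraction of members

def interval_duration(rates, threshold, start_fh, end_fh):
    """Ensemble-mean proportion of the interval with rates at or above `threshold`."""
    window = rates[:, start_fh:end_fh + 1] >= threshold
    return window.sum(axis=1).mean(axis=0) / window.shape[1]

# Example: a user-chosen window from forecast hour 18 to 30 at 0.10 in. per hour.
rates = np.random.gamma(0.5, 0.05, size=(10, 48, 20, 30))
prob = interval_exceedance(rates, 0.10, 18, 30)
frac = interval_duration(rates, 0.10, 18, 30)
```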

Fig. 7. (a) The interval exceedance plot, in which users choose a start time, end time, and threshold, and the plot displays the probability of exceedance for the custom period (www.visweather.com/bams2023, Part II 6a); (b) the interval durations plot, in which users choose a start time, end time, and threshold, and the plot displays the proportion of time exceeding the threshold (www.visweather.com/bams2023, Part II 6b). Both are shown for 17 Dec 2020.

Conclusions

Three experiments were conducted in which forecasters were presented with new probabilistic winter weather visualizations, asked to perform a simple forecasting exercise for one of three heavy snowfall events, and then asked for feedback on the new visualizations in follow-up plenary discussions. Half of the forecasters in each group used only the SPC HREF viewer to complete the forecasting exercise, while half were able to access the SPC HREF viewer as well as our prototype graphics. The two groups performed comparably in most respects. The group with access to the experimental graphics exhibited smaller mean absolute error when forecasting durations of heavy snowfall for all three events, but the sample size is very small for quantitative verification and uncertainties are high given the quality of observational snowfall data.

Forecaster feedback on the snowfall onset and duration products was similar, with almost all forecasters indicating that they would use the products operationally and communicate the information to their core partners. In addition, forecasters responded that they had a better comprehension of the HREF uncertainty in both onset and duration from the products, an advantage over the Demuth et al. (2020) version, which displayed only the mean onset. While forecasters felt the duration product reduced time spent on the forecasting task, this was less true of the onset product, perhaps due to limitations of existing visualization techniques for evaluating event duration, the unfamiliar nature of plotting a time, the choice of color map, the format of the date and time, or a combination of these factors.

The rush hour probability plot, which visualizes the probability of exceeding a chosen snowfall rate during the morning or afternoon commute, received the most favorable Likert-scale responses of all of the products. Eighty-five percent of forecasters agreed or strongly agreed that they would use the product in operations and 93% agreed or strongly agreed that they would communicate the information to their core partners. However, in follow-up discussion, forecasters were adamant that the period defining rush hour needs to be customizable, due to different commute durations in different locations. For this reason, we proposed the “interval exceedance” and “interval duration” plots (Fig. 7). The interval exceedance plot visualizes the probability of exceeding an intensity threshold between any two (user-specified) forecast hours, while the interval duration plot visualizes the percentage of time an intensity threshold is exceeded between two (user-specified) forecast hours.

We reiterate that, like many testbed research activities, our results are limited by a small sample size (n = 53). This is especially true of quantitative results but also extends to the qualitative findings. For example, this sample is likely not fully representative of all winter weather forecasters and regions and even within this small sample we at times received contradictory feedback. Further increasing the customizability for user preference (such as custom color bars or intervals) could be one avenue to appeal to a broader scope of users, though it is unclear if there is an upper limit to how customizable a product can be before it contradicts visualization best practices or becomes overwhelming for users. Continued communication with the winter weather forecast community on the topic of probabilistic winter weather visualization strategies will be invaluable toward affirming and extending our results.

There is ample opportunity for future research, including evaluation of our other prototype products, evaluation of the interval exceedance and interval durations products, and extension of our methods to other meteorological phenomena, such as flash flooding, tropical systems, or heat waves. We primarily focused on two-dimensional data visualizations, but point-based one-dimensional (temporal) representations, such as the combination meteogram, have tremendous untapped potential due to greater dimensional freedom. Last, the preliminary focus group highlighted a need for visualizations not just of precipitation fields for IDSS, but also of diagnostics, such as frontogenesis and stability. Future work may also focus on the conditions of product uptake and attempt to quantify the factors that influence whether a forecaster will incorporate a new product into their operational workflow, including credibility, salience, and legitimacy. Visualization development is never complete. Each time we met with forecasters we brainstormed new strategies to improve the design and functionality to better meet their needs. However, we hope that these products can serve as a benchmark for future refinements and innovations, as it is clear that there is vigorous demand for the types of products that we have presented. Ultimately, our results demonstrate the value of social science research methods in improving visualization development and highlight the value of testbed activities in the NWS. The protection and expansion of forecaster bandwidth for these types of activities would further benefit future product development and evaluation.

Acknowledgments.

Support for this research was provided by NOAA Collaborative Science, Technology, and Applied Research (CSTAR) Grant NA19NWS4680001, awarded to North Carolina State University. We thank the NOAA NWS Weather Prediction Center HMT for organizing and hosting our experimental groups and we express our gratitude to the dozens of National Weather Service forecasters who took the time to participate and provide invaluable insights into NWP visualization. We would also like to thank the Storm Prediction Center, and Dr. Brett Roberts, specifically, for making an archive of HREF system data available and for the outstanding HREF viewer website (with archived images). We thank Jeff Waldstreicher for valuable suggestions during this process and for facilitating contact and discussion with other NWS groups working on similar problems.

Data availability statement.

Archived HREF data are available through the Storm Prediction Center: https://data.nssl.noaa.gov/thredds/catalog/FRDD/HREF.html; while real-time HREF data are available through NOAA NOMADS: https://nomads.ncep.noaa.gov/. The Storm Prediction Center HREF viewer and archived images can be found at www.spc.noaa.gov/exper/href/. MRMS data can be accessed through the Iowa Environmental Mesonet at https://mesonet.agron.iastate.edu/archive/. Individual forecast exercise and Likert responses, the deidentified transcripts from the first two plenary discussions, and plenary discussion notes are stored on a shared drive and are available upon request.

References

  • Abras, C., D. Maloney-Krichmar, and J. Preece, 2004: User-centered design. Encyclopedia of Human-Computer Interaction, W. Bainbridge, Ed., Vol. 37, Sage Publications, 445456.

    • Search Google Scholar
    • Export Citation
  • Benjamin, S. G., and Coauthors, 2016: A North American hourly assimilation and model forecast cycle: The Rapid Refresh. Mon. Wea. Rev., 144, 16691694, https://doi.org/10.1175/MWR-D-15-0242.1.

    • Search Google Scholar
    • Export Citation
  • Braun, V., and V. Clarke, 2006: Using thematic analysis in psychology. Qual. Res. Psychol., 3, 77101, https://doi.org/10.1191/1478088706qp063oa.

    • Search Google Scholar
    • Export Citation
  • Carroll, T., D. Cline, G. Fall, A. Nilsson, L. Li, and A. Rost, 2001: NOHRSC operations and the simulation of snow cover properties for the coterminous US. Proc. 69th Annual Meeting of the Western Snow Conf., Sun Valley, ID, WSC, www.nohrsc2.noaa.gov/technology/pdf/wsc2001.pdf.

  • Cash, D. W., W. C. Clark, F. Alcock, N. M. Dickson, N. Eckley, D. H. Guston, J. Jäger, and R. B. Mitchell, 2003: Knowledge systems for sustainable development. Proc. Natl. Acad. Sci. USA, 100, 80868091, https://doi.org/10.1073/pnas.1231332100.

    • Search Google Scholar
    • Export Citation
  • Clark, A. J., and Coauthors, 2012: An overview of the 2010 Hazardous Weather Testbed experimental forecast program spring experiment. Bull. Amer. Meteor. Soc., 93, 5574, https://doi.org/10.1175/BAMS-D-11-00040.1.

    • Search Google Scholar
    • Export Citation
  • Demuth, J. L., and Coauthors, 2020: Recommendations for developing useful and usable convection-allowing model ensemble information for NWS forecasters. Wea. Forecasting, 35, 13811406, https://doi.org/10.1175/WAF-D-19-0108.1.

    • Search Google Scholar
    • Export Citation
  • Evans, C., D. F. Van Dyke, and T. Lericos, 2014: How do forecasters utilize output from a convection-permitting ensemble forecast system? Case study of a high-impact precipitation event. Wea. Forecasting, 29, 466486, https://doi.org/10.1175/WAF-D-13-00064.1.

    • Search Google Scholar
    • Export Citation
  • Hanft, W., J. Zhang, and S. M. Martinaitis, 2019: A detailed evaluation of the MRMS snow QPE. Proc. 33rd Conf. on Hydrology, Phoenix, AZ, Amer. Meteor. Soc., 872, https://ams.confex.com/ams/2019Annual/meetingapp.cgi/Paper/351555.

  • Hirschberg, P. A., and Coauthors, 2011: A weather and climate enterprise strategic implementation plan for generating and communicating forecast uncertainty information. Bull. Amer. Meteor. Soc., 92, 16511666, https://doi.org/10.1175/BAMS-D-11-00073.1.

    • Search Google Scholar
    • Export Citation
  • James, E. P., and Coauthors, 2022: The High-Resolution Rapid Refresh (HRRR): An hourly updating convection permitting forecast model. Part II: Forecast performance. Wea. Forecasting, 37, 13971417, https://doi.org/10.1175/WAF-D-21-0130.1.

    • Search Google Scholar
    • Export Citation
  • Keith, R., and S. M. Leyton, 2007: An experiment to measure the value of statistical probability forecasts for airports. Wea. Forecasting, 22, 928935, https://doi.org/10.1175/WAF988.1.

    • Search Google Scholar
    • Export Citation
  • Likert, R., 1932: A technique for the measurement of attitudes. Arch. Psychol., 140, 155.

  • Martinaitis, S. M., S. B. Cocks, Y. Qi, B. T. Kaney, J. Zhang, and K. Howard, 2015: Understanding winter precipitation impacts on automated gauge observations within a real-time system. J. Hydrometeor., 16, 23452363, https://doi.org/10.1175/JHM-D-15-0020.1.

    • Search Google Scholar
    • Export Citation
  • Mass, C. F., D. Ovens, K. Westrick, and B. A. Colle, 2002: Does increasing horizontal resolution produce more skillful forecasts? Bull. Amer. Meteor. Soc., 83, 407430, https://doi.org/10.1175/1520-0477(2002)083<0407:DIHRPM>2.3.CO;2.

    • Search Google Scholar
    • Export Citation
  • Novak, D. R., D. R. Bright, and M. J. Brennan, 2008: Operational forecaster uncertainty needs and future roles. Wea. Forecasting, 23, 10691084, https://doi.org/10.1175/2008WAF2222142.1.

    • Search Google Scholar
    • Export Citation
  • NWS, 2019: Building a weather-ready nation: 2019–2022 strategic plan. NOAA/NWS Rep., 28 pp., www.weather.gov/media/wrn/NWS_Weather-Ready-Nation_Strategic_Plan_2019-2022.pdf.

  • Radford, J. T., G. M. Lackmann, and M. A. Baxter, 2019: An evaluation of snowband predictability in the High-Resolution Rapid Refresh. Wea. Forecasting, 34, 14771494, https://doi.org/10.1175/WAF-D-19-0089.1.

    • Search Google Scholar
    • Export Citation
  • Radford, J. T., G. M. Lackmann, J. Goodwin, J. Correia Jr., and K. Harnos, 2023: An iterative approach toward development of ensemble visualization techniques for high-impact winter weather hazards. Part I: Product development. Bull. Amer. Meteor. Soc., https://doi.org/10.1175/BAMS-D-22-0192.1, in press.

    • Search Google Scholar
    • Export Citation
  • Rautenhaus, M., M. Böttinger, S. Siemen, R. Hoffman, R. M. Kirby, M. Mirzargar, N. Röber, and R. Westermann, 2018: Visualization in meteorology—A survey of techniques and tools for data analysis tasks. IEEE Trans. Visualization Comput. Graphics, 24, 32683296, https://doi.org/10.1109/TVCG.2017.2779501.

    • Search Google Scholar
    • Export Citation
  • Roberts, B., I. J. Jirak, A. J. Clark, S. J. Weiss, and J. S. Kain, 2019: Postprocessing and visualization techniques for convection-allowing ensembles. Bull. Amer. Meteor. Soc., 100, 12451258, https://doi.org/10.1175/BAMS-D-18-0041.1.

    • Search Google Scholar
    • Export Citation
  • Roebber, P. J., D. M. Schultz, B. A. Colle, and D. J. Stensrud, 2004: Toward improved prediction: High-resolution and ensemble modeling systems in operations. Wea. Forecasting, 19, 936949, https://doi.org/10.1175/1520-0434(2004)019<0936:TIPHAE>2.0.CO;2.

    • Search Google Scholar
    • Export Citation
  • Rothfusz, L. P., R. Schneider, D. Novak, K. Klockow-McClain, A. E. Gerard, C. Karstens, G. J. Stumpf, and T. M. Smith, 2018: FACETs: A proposed next-generation paradigm for high-impact weather forecasting. Bull. Amer. Meteor. Soc., 99, 20252043, https://doi.org/10.1175/BAMS-D-16-0100.1.

    • Search Google Scholar
    • Export Citation
  • Schwartz, C. S., G. S. Romine, R. A. Sobash, K. R. Fossell, and M. L. Weisman, 2015: NCAR’s experimental real-time convection-allowing ensemble prediction system. Wea. Forecasting, 30, 16451654, https://doi.org/10.1175/WAF-D-15-0103.1.

  • Smith, T. L., S. G. Benjamin, J. M. Brown, S. Weygandt, T. Smirnova, and B. Schwartz, 2008: Convection forecasts from the hourly updated, 3-km High Resolution Rapid Refresh (HRRR) model. 24th Conf. on Severe Local Storms, Savannah, GA, Amer. Meteor. Soc., 11.1, https://ams.confex.com/ams/pdfpapers/142055.pdf.

  • Steiner, M., C. K. Mueller, G. Davidson, and J. A. Krozel, 2008: Integration of probabilistic weather information with air traffic management decision support tools: A conceptual vision for the future. 13th Conf. on Aviation, Range and Aerospace Meteorology, New Orleans, LA, Amer. Meteor. Soc., 4.1, https://ams.confex.com/ams/88Annual/techprogram/paper_128471.htm.

  • Wang, J., S. Hazarika, C. Li, and H.-W. Shen, 2019: Visualization and visual analysis of ensemble data: A survey. IEEE Trans. Visualization Comput. Graphics, 25, 2853–2872, https://doi.org/10.1109/TVCG.2018.2853721.

  • White, D. D., A. Wutich, K. L. Larson, P. Gober, T. Lant, and C. Senneville, 2010: Credibility, salience, and legitimacy of boundary objects: Water managers’ assessment of a simulation model in an immersive decision theater. Sci. Public Policy, 37, 219–232, https://doi.org/10.3152/030234210X497726.

  • Wilson, K. A., P. L. Heinselman, P. S. Skinner, J. J. Choate, and K. E. Klockow-McClain, 2019: Meteorologists’ interpretations of storm-scale ensemble-based forecast guidance. Wea. Climate Soc., 11, 337–354, https://doi.org/10.1175/WCAS-D-18-0084.1.

  • Zhang, J., and Coauthors, 2016: Multi-Radar Multi-Sensor (MRMS) quantitative precipitation estimation: Initial operating capabilities. Bull. Amer. Meteor. Soc., 97, 621–638, https://doi.org/10.1175/BAMS-D-14-00174.1.

  • Fig. 1. Five prototype visualizations from Radford et al. (2023), including (a) probabilistic snowfall rate onset time, (b) probabilistic snowfall rate duration, (c) probability of snowfall during rush hour, (d) mesoscale snowband probability heat map, and (e) combination meteogram.

  • Fig. 2. The domains and locations used for each of the three forecasting exercises.

  • Fig. 3. (a) A sample of the snowfall rate onset graphic on 17 Dec 2020 and (b) Likert-scale responses to questions on the snowfall onset time product. The dotted line is the divergence point between negative/neutral responses and positive responses.

  • Fig. 4. (a) A sample of the snowfall rate duration graphic on 17 Dec 2020 and (b) Likert-scale responses to questions on the snowfall duration product. The dotted line is the divergence point between negative/neutral responses and positive responses.

  • Fig. 5. (a) A sample of the rush hour snowfall rate intersection graphic on 17 Dec 2020 and (b) Likert-scale responses to questions on the rush hour probability product. The dotted line is the divergence point between negative/neutral responses and positive responses.

  • Fig. 6. (a) A sample of the snowband probability graphic on 17 Dec 2020 and (b) Likert-scale responses to questions on the snowband probability heat map product. The dotted line is the divergence point between negative/neutral responses and positive responses.
  • Fig. 7. (a) The interval exceedance plot, in which users choose a start time, end time, and threshold, and the plot displays the probability of exceedance for the custom period (www.visweather.com/bams2023, Part II 6a); (b) the interval durations plot, in which users choose a start time, end time, and threshold, and the plot displays the proportion of time exceeding the threshold (www.visweather.com/bams2023, Part II 6b). Both are shown for 17 Dec 2020. (A minimal computation sketch follows this figure list.)

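The interval products in Fig. 7 reduce ensemble snowfall-rate forecasts over a user-chosen window to two summaries: the probability that the rate threshold is met at some point in the window, and the proportion of the window spent at or above the threshold. The sketch below illustrates one way such summaries could be computed; it is not the authors' implementation, and the `rates` array, its (members × hours) layout, and the 1 in. per hour example threshold are assumptions made only for illustration.

```python
import numpy as np

# Illustrative ensemble of hourly snowfall rates (in. per hour) at a single
# grid point: 10 members x 24 forecast hours. A real product would read
# ensemble model output rather than synthetic values.
rng = np.random.default_rng(42)
rates = rng.gamma(shape=1.5, scale=0.4, size=(10, 24))

def interval_exceedance(rates, start, end, threshold):
    """Fraction of members with any hour in [start, end) at or above the
    threshold (cf. the interval exceedance plot, Fig. 7a)."""
    window = rates[:, start:end]
    return float((window >= threshold).any(axis=1).mean())

def interval_duration_fraction(rates, start, end, threshold):
    """Member-mean proportion of hours in [start, end) at or above the
    threshold (cf. the interval durations plot, Fig. 7b)."""
    window = rates[:, start:end]
    return float((window >= threshold).mean(axis=1).mean())

# Example: a user selects forecast hours 6-18 and a 1 in. per hour threshold.
print(interval_exceedance(rates, 6, 18, 1.0))
print(interval_duration_fraction(rates, 6, 18, 1.0))
```

In an interactive version, the start time, end time, and threshold would come from the user's selections, and the two quantities would be evaluated at every grid point to produce the plan-view maps shown in Fig. 7.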