1. Introduction and background
Heavy snowstorms can have severe societal and economic impacts, including disrupting travel, damaging property, and forcing schools and businesses to close (e.g., Rooney 1967; Robinson 1989; Changnon 1999; Eisenberg and Warner 2005; Changnon 2007; Changnon et al. 2008; Cerruti and Decker 2011). Accurately predicting snow amounts can help moderate these impacts by ensuring that resources are allocated appropriately and that proper preventative measures are taken. This has spurred the development and implementation of high-resolution, deterministic forecast models to predict snow amounts [e.g., the older North American Mesoscale Forecast System (NAM) and the newer High-Resolution Rapid Refresh (HRRR)].
However, initial condition errors can grow with time, resulting in limits to predictability (Lorenz 1993; Bishop et al. 2001; Ancell and Hakim 2007; Torn and Hakim 2008; Zheng et al. 2013; Ota et al. 2013; Greybush et al. 2017). To help account for this uncertainty in model forecasts, operational global and regional ensembles were developed [e.g., the 0.5° Global Ensemble Forecast System (GEFS) and the 16-km Short-Range Ensemble Forecast (SREF) system, respectively]. Nevertheless, the coarse resolution of many global ensembles, as well as the cumulus parameterizations used in such ensembles (including the SREF), can introduce practical model forecast errors (Zhang et al. 2007; Charles and Colle 2009; Colle and Charles 2011; Greybush et al. 2017).
Reducing model grid spacing could lead to improved snowfall forecasts. Forecasts described as "convection-allowing" (grid spacing of 3 or 4 km) or "cloud-allowing" (grid spacing of 1 or 2 km) generally do not use a cumulus parameterization, but instead explicitly simulate clouds and precipitation using model microphysical schemes. For instance, Zhang et al. (2002) used a mesoscale model with 3.3-km grid spacing to investigate the possible sources of forecast error for the 24–25 January 2000 snowstorm along the East Coast of the United States. They found that the storm could have been well forecast in real time with conventional data if smaller grid spacing and explicit microphysics had been used, rather than the coarser grid spacing employed in the operational forecasts of that time. They also showed that the detailed mesoscale distribution of precipitation in the 24- or 36-h forecast was significantly altered by even small changes in the initial conditions (e.g., removing one sounding from the analysis). Their experiments also revealed that forecast differences can arise from the rapid growth of errors at scales below 500 km, in association with moist processes. More recently, Greybush et al. (2017) noted more accurate simulation of storm tracks and better positioning of snowbands in 3-km retrospective ensemble forecasts of the January 2015 and January 2016 East Coast storms.
It is natural to expect that convection-allowing forecasts might improve snow forecasts, given their success in improving forecasts of severe weather and precipitation (e.g., Clark et al. 2009, 2011; Schwartz et al. 2009; Coniglio et al. 2010; Clark et al. 2012; Karstens et al. 2015), including precipitation morphology (Snively and Gallus 2014; Iyer et al. 2016). Convection-allowing forecast models have also been shown to produce more accurate quantitative precipitation forecasts than the (12-km) NAM with parameterized convection (Schwartz et al. 2009; Clark et al. 2010). Small-membership convection-allowing ensemble forecasts were found to be more skillful than large-membership convection-parameterizing ensembles (Clark et al. 2009; Gallus 2010), emphasizing that realistically simulating precipitation in coarse-resolution models remains a very challenging problem. Furthermore, Schwartz et al. (2017) showed that probabilistic forecasts of springtime convection with 1-km grid spacing (i.e., "cloud-resolving") were more accurate than those with 3-km grid spacing. They also found that better forecasts of heavy rain were associated with more accurate placement of mesoscale convective systems. More recently, a multiyear comparison between 1- and 3-km forecasts indicated that the simulated hourly precipitation climatology from the 1-km forecasts better matched the observed climatology (Schwartz and Sobash 2019).
It is an open question whether these findings can be extrapolated to forecasts of snow and liquid equivalent accumulations in East Coast snowstorms, in which convection is usually weaker than in springtime convective storms. The question is highly pertinent because there is still a strong reliance on convection-parameterizing models and ensembles for forecasting heavy snowfall. Since computing resources are limited, it is important to know how much additional forecast quality (e.g., Tracton 2008) would be gained by reducing grid spacing from convection-parameterizing to convection-allowing and/or cloud-allowing scales. It is also recognized that model forecast errors grow more quickly as grid spacing decreases (Weygandt and Seaman 1994; Roebber et al. 2004; Dyer and Zarzar 2016), leading more quickly to forecast displacement errors.
This paper examines how varying the grid spacing of the Weather Research and Forecasting (WRF) Model (Skamarock et al. 2008) from 12 to 4 to 1.3 km affects mid-Atlantic and East Coast winter storm forecasts at next-day (24–48 h) and day-after (48–72 h) lead times. The emphasis is on the New York City metropolitan area (hereafter NYCMA). The practice of numerical weather prediction (NWP) is moving toward running higher-resolution operational ensemble forecasts. To evaluate the potential improvement in snow forecasts, a set of ensemble forecasts (appendix A) with GEFS boundary conditions was also produced for five disruptive East Coast storms that occurred during 2015–18. Comparisons were also (briefly) made between the WRF and GEFS forecasts for the same set of storms. Section 2 presents the methods of the study, and section 3 discusses the results. Section 4 summarizes the findings and points to possible future research directions.
2. Methods
a. Model setup
This study produced two different sets of forecasts. For the winter of 2013–14, 19 sets of deterministic forecasts (Table 1) were produced at lead times of 24–48 h (next day) and 48–72 h (day after) using the WRF Model with nested 12-, 4-, and 1.3-km grid spacing domains. Case-study ensemble forecasts were also made for five disruptive storms from the winters of 2015–18 (a brief description of each storm is given in appendix A). Note that the 12-km grid spacing is comparable to that of the operational GFS, but smaller than that of either the SREF (16 km) or the 20-km comparison runs of Greybush et al. (2017). Both the WRF 12-km forecasts produced here and those of Greybush et al. (2017) used a cumulus parameterization on these grids.
List of WRF wintertime deterministic forecasts. All forecasts started at 1200 UTC and continued for 72 h. "Daily snow amounts >" refers to a visual census of snow accumulation maps (greater than the amount in inches shown) within the area of the 1.3-km forecast domain on "day 2" of the forecast. The X means that measurable precipitation was not present in a majority of the areas within the 1.3-km domain.
Within the literature, there is some ambiguity in the terminology for the 4- and 1.3-km forecasts. Forecasts with 4-km grid spacing are referred to as both convection-allowing and convection-permitting (e.g., Kain et al. 2006; Fierro et al. 2012; Schwartz et al. 2015). Forecasts with 1-km grid spacing are referred to as convection-resolving (Fierro et al. 2012), cloud-permitting (Snively and Gallus 2014; McMillen and Steenburgh 2015; Willison et al. 2013), near-convection-allowing (Mittermaier and Csima 2017), and cloud-resolving (Song and Zhang 2018). Tsuboki (2008) defines cloud-resolving simulations as those produced on grids of less than 1 km, but Bryan et al. (2003) convincingly argue that cloud-resolving models require grid spacing fine enough to resolve turbulent flow, O(100) m. Here, we refer to the 4-km forecasts as convection-allowing, meaning that the forecast model is able to simulate realistic facsimiles of convective/mesoscale structures. By analogy with Clark et al. (2010), forecasts made using a grid spacing of 1.3 km are here referred to as "cloud-allowing," indicating that simulations with this grid spacing can provide valuable information to operational forecasters (Bryan et al. 2003).
The deterministic forecasts were initialized at 1200 UTC and used GFS (0.5°) analysis data for the initial and lateral boundary conditions. Each storm in Table 1 had >1/2 in. of observed liquid precipitation during the next forecast day within a majority of the area approximately bounded by 36°–44°N and 70°–79°W. Within this set, there were days with apparently contiguous areas of snow on snowfall maps prepared from interpolated station data (described below). Of the 19 simulated storms, 16, 12, 6, and 4 had maximum snow amounts of at least 1, 4, 8, and 12 in. (approximately 2.5, 10, 20, and 30 cm), respectively.
For the study of the five disruptive storms, the WRF and GEFS ensemble forecasts each had 10 members, which may be sufficient to give reasonably skillful forecasts (e.g., Clark et al. 2011). For a clean comparison, each WRF ensemble member used initial and lateral boundary conditions from the same-numbered GEFS member. Results from the same-numbered GEFS forecasts are also briefly presented.
With one exception (discussed below), forecasts were made using WRF Model version 3.9.1 (Skamarock et al. 2008). The area of analysis in the deterministic forecasts was specified as 36.5°–44.5°N and 69°–79°W, within the area covered by the 1.3-km grid spacing domain (Fig. 1). The entire forecast area (Fig. 1) consisted of a 12-km outer domain with 403 × 403 grid elements, a nested 4-km grid with 756 × 756 grid elements, and a further nested 1.3-km grid with 863 × 863 grid elements. The first day was considered "spinup," so the forecasts for that day were not used; the analysis used the second- and third-day forecasts. The 4- and 1.3-km domains were two-way nested, as is standard procedure in operational forecast simulations. This allows feedback from the finest (cloud/mesoscale) scales to the synoptic scale.
Grid domains used in the deterministic and ensemble forecasts. The outside grid has 12-km grid spacing, while domain “d02” (white border) has 4-km grid spacing. The innermost domain “d03” (red border) has 1.3-km grid spacing.
A 60-s time step was used on the 12-km domain, with time steps 3 and 9 times smaller on the 4- and 1.3-km domains (20 and about 6.7 s, respectively). The "aerosol-aware" bulk microphysics scheme of Thompson et al. (2017) was used, initialized with the monthly climatological aerosol values available from the WRF download site and described in Thompson and Eidhammer (2014). The scheme calculates the drop concentration during the simulation from the available aerosol (CCN) concentration, and it simulates changes in water vapor, cloud water, rainwater, ice, snow, and graupel mass content. The Thompson scheme has been shown to provide accurate snow forecasts in synoptic (Liu et al. 2011) and lake-effect (McMillen and Steenburgh 2015) snow situations, as well as in representing mixed-phase ice physics (Thompson et al. 2017). The modified Tiedtke scheme (Tiedtke 1989; Zhang et al. 2011) was used to parameterize convection on the 12-km grid. The radiation package of Morcrette et al. (2008) was used for shortwave and longwave radiative transfer on all grids. The National Centers for Environmental Prediction–Oregon State University–Air Force–Hydrologic Research Laboratory Land Surface Model (NOAH LSM) (Chen and Dudhia 2001a,b) with multiparameterization options (NOAH-MP; Niu et al. 2011) was used to simulate surface fluxes (see also Smirnova et al. 1997, 2000). The boundary layer scheme was the eddy-diffusivity mass-flux, quasi-normal scale elimination (QNSE) scheme of Sukoriansky et al. (2005). The vertical grid had 31 layers and was stretched, which increased the relative number of layers in the planetary boundary layer compared to upper levels.
The NOAH LSM calculates snow depth at the surface from the liquid equivalent precipitation rate and the fraction of frozen precipitation (FOFP) (Ek et al. 2003; Tewari et al. 2004). The LSM classifies all of the precipitation as snow when the FOFP is >0.5. This paper does not examine the sensitivity of precipitation type to changes in grid spacing.
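To make this classification rule concrete, the short Python sketch below illustrates it under simplifying assumptions that are ours rather than the Noah LSM's (a fixed 10:1 snow-to-liquid depth ratio and a single accumulation step); the function name and arguments are hypothetical.

import numpy as np

def snow_depth_increment(precip_mm, fofp, snow_ratio=10.0):
    """Illustrative version of the rule: all precipitation in a step is
    counted as snow when the fraction of frozen precipitation (FOFP)
    exceeds 0.5. snow_ratio is an assumed 10:1 depth-to-liquid conversion,
    not the density-dependent conversion used in the Noah LSM."""
    precip_mm = np.asarray(precip_mm, dtype=float)
    fofp = np.asarray(fofp, dtype=float)
    swe_mm = np.where(fofp > 0.5, precip_mm, 0.0)  # liquid equivalent counted as snow
    return swe_mm * snow_ratio                     # new snow depth (mm)

# Example: 2 mm of liquid equivalent with FOFP = 0.8 -> 20 mm of new snow depth
print(snow_depth_increment(2.0, 0.8))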
Preliminary forecasts of some of the disruptive snowstorms were originally made using WRF version 3.6.1, with 36-, 12-, 4-, and 1.3-km forecast grids (not shown). For those preliminary experiments, the Kain–Fritsch cumulus parameterization scheme (Kain 2004) was used instead of the Tiedtke scheme (Tiedtke 1989; Zhang et al. 2011). For the March 2017 disruptive storm, the preliminary forecasts produced storm tracks much closer to the observed tracks (not shown) than the forecasts made with WRF version 3.9.1. Forecast snowfall amounts were also much closer to observed amounts, and the heaviest forecast snow was positioned more closely to the observed area of heaviest snow over central New York State. Hence, for that storm, the version 3.6.1 ensemble simulations on the 12-, 4-, and 1.3-km grids were used in the analysis in place of the set obtained with WRF version 3.9.1.
For comparison with GEFS precipitation, the WRF data were interpolated to the GEFS grid using the NCAR Command Language (NCL) "rcm2rgrid_Wrap" function.
b. Observations
Daily snowfall accumulations from about 1600 Cooperative Weather Observer locations were obtained from the National Centers for Environmental Information (NCEI) via ncdc.noaa.gov. The data points, shown in Fig. 2, were interpolated to the WRF 12-km grid using NCL's "obj_anal_ic" function (http://www.ncl.ucar.edu/Document/Functions/Built-in/obj_anal_ic.shtml) in three steps, with successive radii of influence of 0.8°, 0.4°, and 0.2°. This type of successive correction allows the analysis to capture large-scale "features" while retaining local detail. For instance, the finest radius of influence was on the order of about half the width of Long Island, a geographic feature that is important to resolve in the forecasts. (A smaller radius of influence was tried on the first iteration, but it did not produce contiguous fields.)
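The successive-correction idea can be sketched as follows in Python. This illustrates the general approach (shrinking radii of influence, with each pass correcting the analysis toward nearby observations) and is not a line-for-line translation of NCL's obj_anal_ic; the Cressman-type weighting and the constant first guess are our assumptions.

import numpy as np

def successive_correction(obs_lon, obs_lat, obs_val, grid_lon, grid_lat,
                          radii_deg=(0.8, 0.4, 0.2)):
    """Cressman-type successive correction with shrinking radii of influence.
    obs_* are 1-D arrays of observation locations/values; grid_lon and
    grid_lat are 1-D coordinate vectors of the target grid."""
    obs_lon = np.asarray(obs_lon, dtype=float)
    obs_lat = np.asarray(obs_lat, dtype=float)
    obs_val = np.asarray(obs_val, dtype=float)
    grid_lon = np.asarray(grid_lon, dtype=float)
    grid_lat = np.asarray(grid_lat, dtype=float)
    glon, glat = np.meshgrid(grid_lon, grid_lat)        # (ny, nx)
    analysis = np.full(glon.shape, obs_val.mean())      # constant first guess
    # Nearest grid point for each observation (used to evaluate the analysis)
    j = np.abs(grid_lat[:, None] - obs_lat[None, :]).argmin(axis=0)
    i = np.abs(grid_lon[:, None] - obs_lon[None, :]).argmin(axis=0)
    for r in radii_deg:
        innovation = obs_val - analysis[j, i]            # obs minus current analysis
        # Distance (deg) from each grid point to each observation
        dist = np.hypot(glon[..., None] - obs_lon, glat[..., None] - obs_lat)
        w = np.clip((r**2 - dist**2) / (r**2 + dist**2), 0.0, None)  # Cressman weights
        wsum = w.sum(axis=-1)
        corr = (w * innovation).sum(axis=-1) / np.maximum(wsum, 1e-12)
        analysis += np.where(wsum > 0, corr, 0.0)        # correct only where obs influence
    return analysis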
The black squares show the locations of Cooperative Weather Observer stations reporting snow amounts.
The most comprehensive high-resolution dataset for precipitation amounts (or, when frozen, liquid equivalent precipitation amounts) is the "Precipitation NCEP/EMC 4 KM Gridded Data (GRIB) Stage IV Data" (Lin 2011). These data were obtained from http://data.eol.ucar.edu/codiac/dss/id=21.093 and interpolated to the 12-km forecast grid. Since the data were already at high resolution (4 km), they were interpolated to the WRF grid using a single radius of influence of 0.12° (about the scale of the WRF 12-km grid). The Stage IV analysis is derived from the multisensor hourly/6-hourly "Stage III" analyses (on local 4-km polar-stereographic grids) produced by the 12 River Forecast Centers (RFCs). The gridded dataset is potentially very useful for evaluating forecast accuracy. However, during snow/ice events, the amounts recorded in the dataset are possibly lower (Brennan and Lackmann 2005) than if accurate measurements of liquid equivalent precipitation were available from all stations. Problems in estimating precipitation amounts during winter storms from radar and gauge observations arise because some gauges lack the equipment needed to melt and measure frozen precipitation, so those stations report either zero or very low precipitation amounts (A. Allgood 2016, personal communication). No corrections to the data based on an analysis of reporting stations were attempted.
c. Forecast evaluation
We chose to focus the analysis on 24-h precipitation amounts for the deterministic forecasts and on storm total precipitation amounts for the disruptive storm forecasts. The results were aggregated over coastal areas (where many cities are located, including New York City), as well as over mountainous areas such as the Appalachians, the Catskills of New York, and the Green and White Mountains of Vermont and New Hampshire, within a bounding box of 36.5°–44.5°N and 69°–79°W.
The deterministic forecasts were analyzed using the following evaluation metrics: critical success index (CSI), probability of detection (POD), false alarm ratio (FAR), bias, and success ratio (SR). A summary of the meaning of these terms is presented in appendix B. To facilitate interpretation, the metrics were displayed on a "performance diagram" (Roebber 2009). Points closest to the upper right-hand corner of the diagram are more skillful, since POD, bias, SR, and CSI are all optimized at a value of 1. Forecast points lying near the bias = 1 line are preferred over points with the same CSI but larger bias. Increasing the POD while holding bias steady moves forecast points at a 45° angle within the diagram, in a direction indicating higher forecast accuracy. Exceedance thresholds were set at intervals of 0.25 in. (6.35 mm), up to 2.5 in. day−1 (63.5 mm day−1). A neighborhood approach was used, and neighborhood radii of 12, 24, 36, and 48 km were tested, following the method outlined in Clark et al. (2010), including calculating the significance of the results at the 5% level. The neighborhood approach is appropriate for calculating forecast accuracy when there are typical errors in the placement of precipitation "objects." Storm track errors were also expected, as four-dimensional data assimilation (FDDA) nudging of the model fields was not used.
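For reference, the basic performance-diagram quantities follow directly from a 2 × 2 contingency table of threshold exceedances. The Python sketch below shows a simple gridpoint (non-neighborhood) version; extending it to the neighborhood approach of Clark et al. (2010) would require counting a hit whenever an exceedance occurs within the specified radius. The function and variable names are ours.

import numpy as np

def contingency_scores(forecast, observed, threshold):
    """POD, success ratio (SR = 1 - FAR), frequency bias, and CSI computed
    from threshold exceedances on matching forecast/observed grids."""
    f = np.asarray(forecast) >= threshold
    o = np.asarray(observed) >= threshold
    hits = np.sum(f & o)
    misses = np.sum(~f & o)
    false_alarms = np.sum(f & ~o)
    pod = hits / (hits + misses)
    sr = hits / (hits + false_alarms)                 # SR = 1 - FAR
    bias = (hits + false_alarms) / (hits + misses)
    csi = hits / (hits + misses + false_alarms)
    return {"POD": pod, "SR": sr, "bias": bias, "CSI": csi}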
An important aspect of forecasting is predicting the location of the heaviest precipitation, and we wished to know how the GEFS (qualitatively) compared to the WRF forecasts (and the WRF forecasts to each other) in predicting such areas. To perform this evaluation, the Stage IV data and the WRF 12- and 4-km ensemble data were interpolated to the 0.5° GEFS grid. For each grid point, we calculated the fraction of storms/forecasts in which storm total precipitation exceeded 25.4 mm (roughly 10 in. of snow at a typical 10:1 snow-to-liquid ratio). The fractions were calculated from the 5 storms in the Stage IV data and from the 50 member forecasts (5 storms × 10 members) for each WRF ensemble. The data were masked out over the ocean, with the exception of the immediate coastline, within the areal extent of the radar reflectivity returns used in the Stage IV calculations.
Ensemble forecasts were evaluated using five different metrics on the 12-km grid (described in Charles and Colle 2009): 1) the Brier score, to evaluate the magnitude of the forecast probability errors; 2) rank histograms, which test how well the ensemble spread represents the true variability of the observations; 3) the rank probability score, a common measure for evaluating probability forecasts of multiple categories; 4) reliability diagrams, to compare the observed frequencies to the forecast probabilities; and 5) discrimination diagrams, to compare the ability of the forecasts to discriminate between events and nonevents. Note that the 1.3- and 4-km ensemble forecast data were first interpolated to the 12-km grid (using NCL's conservative "ESMF_regrid" function) before the forecast statistics were calculated. Considering the need to interpolate the snow observations from relatively coarse point data, it was appropriate to interpolate all data to the 12-km grid. Brier score differences were evaluated for significance using a bootstrap approach (Efron and Tibshirani 1994). The coding was based on the NCL program "bootstrap_diff_1.ncl," with random resampling (with replacement) of 30% of the latitude × longitude Brier score values in each of 1000 iterations. Significance was determined at the 0.05 confidence level, requiring that the distribution of bootstrapped differences not encompass zero.
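A minimal Python sketch of the Brier score and the bootstrap comparison described above follows. It assumes the probability and observation fields have already been placed on a common grid; the resampling fraction, iteration count, and function names are illustrative choices and are not taken from the authors' NCL code.

import numpy as np

rng = np.random.default_rng(0)

def brier_score(prob, event):
    """Brier score: mean squared error of probability forecasts, where
    event is a 0/1 field indicating whether the threshold was exceeded."""
    p = np.asarray(prob, dtype=float).ravel()
    o = np.asarray(event, dtype=float).ravel()
    return np.mean((p - o) ** 2)

def bootstrap_difference(sq_err_a, sq_err_b, n_iter=1000, frac=0.3, alpha=0.05):
    """Bootstrap test on the difference in mean squared probability error
    between two ensembles (matching arrays of gridpoint squared errors).
    Each iteration resamples, with replacement, a fraction of the gridpoint
    errors and recomputes the difference; the difference is deemed significant
    if the (alpha/2, 1 - alpha/2) interval of resampled differences excludes zero."""
    a = np.asarray(sq_err_a, dtype=float).ravel()
    b = np.asarray(sq_err_b, dtype=float).ravel()
    n = max(1, int(frac * a.size))
    diffs = np.empty(n_iter)
    for k in range(n_iter):
        idx = rng.integers(0, a.size, size=n)       # resample with replacement
        diffs[k] = a[idx].mean() - b[idx].mean()
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    significant = (lo > 0) or (hi < 0)
    return significant, (lo, hi)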
To calculate the ensemble scores, it was necessary to have corresponding gridpoint probabilities based on the observations. The probabilities that observed precipitation amounts exceeded specified threshold values were calculated using the approach outlined in Sobash et al. (2011). First, the observations were interpolated to the model grid to create a field of exceedances (ones) and nonexceedances (zeroes) for each threshold being tested. Second, a Gaussian smoother was used to calculate probabilities at each point (Sobash et al. 2016; Loken et al. 2017). The spatial smoothing parameter was chosen to be 24 km, a value that subjectively produced smooth observed probability fields without losing the detail shown in the interpolated observations. The ensemble forecast probabilities were calculated from the gridded forecast data themselves, without applying a Gaussian smoother (e.g., Sobash et al. 2016; Loken et al. 2017).
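A minimal Python sketch of both probability fields, assuming all data are already on the common 12-km grid (the function names, and the use of scipy's Gaussian filter as the smoother, are our assumptions):

import numpy as np
from scipy.ndimage import gaussian_filter

def observed_probabilities(obs_grid, threshold, sigma_km=24.0, dx_km=12.0):
    """Observation-based probabilities (after Sobash et al. 2011): mark
    gridpoint exceedances as 1/0, then apply a Gaussian smoother whose
    length scale is expressed in grid units (24 km on the 12-km grid)."""
    binary = (np.asarray(obs_grid, dtype=float) >= threshold).astype(float)
    return gaussian_filter(binary, sigma=sigma_km / dx_km)

def ensemble_probabilities(member_grids, threshold):
    """Ensemble exceedance probability: the fraction of members exceeding
    the threshold at each grid point (no smoothing, as described in the text)."""
    members = np.asarray(member_grids, dtype=float)    # shape (n_members, ny, nx)
    return (members >= threshold).mean(axis=0)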
3. Results
a. Deterministic forecast evaluation
The forecast data from each simulation were analyzed in 24-h increments, corresponding to forecast hours 24–48 (next-day forecasts) and 48–72 (day-after forecasts). For the day-after period, CSI values for the convection-allowing and cloud-allowing forecasts were not better than those for the forecasts with a convective parameterization. Hence, the discussion below is limited to the next-day forecasts, for which the CSI values of the convection-allowing forecasts differed significantly from those of the forecasts with parameterized convection.
The performance diagrams shown in Fig. 3 demonstrate advantages of using convection-allowing (4 km) and cloud-allowing (1.3 km) grids with explicit microphysics over grids with parameterized convection (12 km). Results are shown for neighborhood radii of 12 and 48 km. For a neighborhood of 12 km (Fig. 3a), the 12-km forecasts had higher POD than the other forecasts at the plotted threshold values, but the 1.3- and 4-km forecasts had lower bias (closer to 1). The calculated SR values were similar. Hence, CSI values at each exceedance threshold were not that different from one another. Increasing the neighborhood radius from 12 to 48 km had very little impact on the bias of the 12-km forecasts, indicating that their bias was not simply a matter of position error. However, for corresponding thresholds in Fig. 3a versus Fig. 3b, the POD of the explicit-microphysics forecasts increased, their bias values moved very close to 1, and their SR values moved closer to 1. Hence, as the neighborhood radius increased to 48 km, differences between the CSI values of the convection-parameterizing and convection-allowing forecasts became more apparent.
Performance diagrams for probability of detection (POD) vs success ratio (SR). The dashed lines are lines of equal bias, and the thin solid lines are lines of equal critical success index (CSI). Neighborhood skill statistics were calculated within a (a) 12- and (b) 48-km radius of each grid square from a comparison of WRF 24-h liquid equivalent precipitation amounts vs Stage IV observations during winter 2013–14. Results from next-day forecasts with 1.3-, 4-, and 12-km grid spacing are shown (red, green, and blue, respectively), where different marker styles denote different thresholds. Accuracy improves as the neighborhood radius increases when points fall closer to the bias = 1 diagonal while POD increases and FAR decreases (SR increases). The CSI values of the 4-km forecasts were significantly higher than those of the 12-km forecasts for thresholds of 0.75–2.0 in.
CSI values for the 4-km forecasts were significantly better (at the 0.05 confidence level) than those of the 12-km forecasts for threshold values of 0.75, 1, 1.5, and 2 in. for a neighborhood radius of 48 km. The differences between the 1.3- and 4-km forecasts were not significant, although the 1.3-km forecasts appear to have lower bias than the 4- and 12-km forecasts for the 48-km neighborhood. Increasing the neighborhood radius to 96 km did not appreciably alter these conclusions: the 12-km forecasts still had a bias greater than 1, while the 1.3- and 4-km forecasts had a bias less than 1 for the higher precipitation thresholds, indicating that the neighborhood radius was larger than the spatial scale of the higher-threshold precipitation structures.
The snow accumulation forecasts from the convection-allowing and cloud-allowing grids were also more accurate than those from the forecasts with parameterized convection. CSI values for both the 1.3- and 4-km snow forecasts were significantly better than those of the 12-km WRF forecasts for threshold values of 8 and 12 in. of snow (not shown). Although snowstorms were frequent during the winter of 2013–14, these storms were not as large or disruptive as those discussed in the following section, where the ensemble forecast results are described.
Additional simulations were produced at 12-km grid spacing but with the convective parameterization turned off. The main impact (not shown) was to reduce the bias for thresholds of 1.75, 2, and 2.5 in. compared to the 12-km forecasts with both explicit microphysics and the cumulus parameterization active (described above). However, on average, >90% of the precipitation on the 12-km grid over land came from resolved clouds, and thresholds from 0.25 to 1.5 in. were mostly unaffected by turning off the cumulus parameterization. This suggests that most of the model forecast bias on the 12-km grid was due to resolved (gridscale) precipitation, whether from synoptic or weakly convective clouds.
This result runs counter to expectations. For instance, one might expect an inverse relationship between grid spacing and maximum precipitation amounts (since simulated vertical velocity also increases as grid spacing decreases). However, the occurrence of slantwise convection depends on the intensity of a thermally direct ageostrophic circulation associated with frontogenetic forcing (e.g., Moore and Blakley 1988), and such forcing depends on horizontal temperature gradients rather than on vertical instability. Hence, all else being equal, gridscale precipitation should be proportional to grid volume. In fact, Connelly and Colle (2019) found that grid spacing of 2 km or finer was needed to realistically reproduce the aspect ratio of most snowbands, so the simulated aspect ratios can be assumed to depend on grid spacing. Moreover, for strongly convective storms, Weisman et al. (1997) showed that forecasts with grid spacings of 8 and 12 km produced delayed but more robust convection than convection-allowing grids.
b. Comparison with the GEFS
Figure 4 shows that, compared to the GEFS, both the 4- and 12-km WRF ensembles better predicted the general location of the area where heavy precipitation (>25.4 mm liquid equivalent) occurred most often. In the observations, 4 out of 5 storms exceeded 25.4 mm in the NYCMA (Fig. 4a), while 9 out of 10 of the WRF 12- and 4-km ensemble forecasts exceeded this amount there. Both the WRF 12-km (Fig. 4c) and WRF 4-km (Fig. 4d) ensembles showed the highest fraction of forecasts exceeding 25.4 mm extending from southern New Jersey northward along the coast. The pattern of forecast fractions in each was more spatially consistent with the fraction of observations mapped in Fig. 4a than was the GEFS fraction mapped in Fig. 4b. The GEFS underestimated the likelihood of exceeding 25.4 mm in the NYCMA, with 7 out of 10 forecasts instead exceeding 25.4 mm over Delaware and Maryland (Fig. 4b).
(a) The number of observed case study events exceeding 25 mm of liquid equivalent precipitation (out of 5) expressed as a fraction, and (b)–(d) the number of ensemble members forecasting > 25 mm of precipitation out of 50 total forecasts (expressed as a fraction) from the GEFS in (b), WRF 12-km ensemble in (c), and WRF 4-km ensembles in (d). The data were interpolated to the GEFS 0.5° grid and masked out beyond the immediate coastline.
The WRF 4-km ensemble forecasts also showed important advantages over the 12-km ensemble forecasts. The WRF 4-km ensemble produced a comparatively higher fraction of forecasts exceeding 25 mm over Long Island (Fig. 4d), consistent with the spatial pattern of the observations. It also showed a higher fraction of forecasts exceeding this amount over the mountains of West Virginia, where upslope flow might have contributed to higher precipitation amounts. In contrast, the 12-km ensemble (Fig. 4c) produced a comparatively higher fraction of forecasts with greater than 25 mm of precipitation over central and northern New Jersey, farther west than the area over eastern New Jersey where this amount was exceeded most frequently in the observations (Fig. 4a). The 12-km ensemble also produced a comparatively lower fraction of forecasts exceeding 25.4 mm over the NYCMA than shown in Fig. 4a (observations) or Fig. 4d (4-km ensemble), and the localized maximum over the West Virginia mountains was not present. Hence, moving from parameterized convection to the convection-allowing grid better reproduced the spatial distribution of the fraction of forecasts exceeding 25 mm.
c. Comparison of WPC and WRF ensemble disruptive storm forecasts
Based on guidance from the Weather Prediction Center (WPC; Fig. 5a) and analysis of other available forecast models, the January 2015 storm was forecast by the National Weather Service (in Upton, New York) to bring heavy snow to the NYCMA and Boston. In fact, while heavy snow did fall in Boston, less than 10 in. fell (Fig. 5b) in New York City's Central Park (Greybush et al. 2017). As similarly implied by the WPC forecast, the WRF 12-km ensemble (Fig. 5e) forecast very heavy snow amounts in southeastern New York, including the NYCMA. In comparison, both the convection-allowing 4-km (Fig. 5d) and cloud-allowing 1.3-km (Fig. 5c) ensemble forecasts were qualitatively better than the WPC (and NWS) forecasts.
(a) Weather Prediction Center probabilities of "Day 2" snowfall amounts exceeding the threshold shown for 27–28 Jan 2015. Blue line: at least 10% chance; green line: at least 40% chance; red line: at least 70% chance. (b) Interpolated observed snow accumulation and (c)–(e) mean ensemble forecast snow accumulations during 26–28 Jan 2015.
Compared to the 12-km ensemble, both the 4-km (Fig. 5d) and 1.3-km (Fig. 5c) WRF ensembles better positioned the area of heaviest snow, which fell mostly over eastern Massachusetts and parts of Long Island. Still, the westernmost border of the forecast color shading indicating >12 in. of snow over Massachusetts was farther west than the observed position of this demarcation line (Fig. 5b). The WRF 4-km ensemble was the best at positioning the area of snow >12 in. over Massachusetts.
In contrast, the January 2016 storm was incorrectly forecast to miss the NYCMA to the south (Fig. 6a), while in fact 27.6 in. fell (Fig. 6b) in Central Park (Greybush et al. 2017). For this case, all of the WRF ensemble forecasts (Figs. 6c–e) predicted heavy snow in the NYCMA. It is noteworthy that the WRF 1.3-km ensemble forecast (Fig. 6c) more correctly limited forecast snow amounts in southern Connecticut compared to the WRF 4-km ensemble (Fig. 6d), and was somewhat better than the WRF 12-km forecasts (Fig. 6e) in the same location. The WRF 1.3- and 4-km ensembles also improved the simulation of >30 in. of snow in the higher terrain of northern Virginia and eastern West Virginia.
As in Fig. 5, but for (a) WPC probabilities for 23–24 Jan 2016, (b) observed snow accumulation, and (c)–(e) mean forecast ensemble snow accumulation during 22–24 Jan 2016.
The WPC forecast a 70% chance of at least 8 in. of snow (Fig. 7a) in the NYCMA on 14 March 2017. For that same storm, the NWS in Upton, New York, forecast generally 18–24 in. (not shown). In fact, the heaviest snow fell in central New York State (Fig. 7b), with accumulations of around 6–9 in. in the NYCMA. The WRF 1.3-km (Fig. 7c) and 4-km (Fig. 7d) ensemble forecasts better predicted the (lower) snow amounts in the NYCMA, southeastern Pennsylvania, and New Jersey than the WRF 12-km ensemble forecast (Fig. 7e), which was similar to the WPC forecast. The 1.3- and 4-km forecasts also performed better (produced lower snow amounts) than the 12-km forecast over the high terrain in western Maryland, northern Virginia, and northeastern West Virginia (Figs. 7b–e).
As in Fig. 5, but for (a) WPC probabilities for 14–15 Mar 2017, (b) observed snow accumulation, and (c)–(e) mean forecast ensemble snow accumulation from 13 to 15 Mar 2017.
On 7 March 2018, the WPC forecast a greater than 40% probability that at least 8 in. of snow would fall in the NYCMA (Fig. 8a). For that storm, the NWS in Upton, New York, forecast 8–12 in. in New York City, 12–18 in. in Newark (to the immediate southwest of the NYCMA), and 12–18 in. in White Plains (to its immediate north). In fact, the heaviest snow fell well northwest and north of the NYCMA: the city received only 2.9 in., Newark Airport 4.4 in., and White Plains 5.8 in. (Fig. 8b). All of the WRF ensembles somewhat underpredicted the spatial extent of the heaviest snow over northwestern New Jersey and northeastern Pennsylvania (12–21 in.), with the WRF 4- and 1.3-km ensembles forecasting maximum amounts of 12–15 in. (Figs. 8c,d) over a relatively smaller area. The WRF 12-km ensemble was worse, predicting just 6–9 in. (Fig. 8e), perhaps because it was less sensitive to topographic forcing. Over the immediate NYCMA, the WRF 1.3- and 4-km forecasts predicted 1–3 in., whereas 3–6 in. was observed. The 12-km WRF ensemble, in contrast, forecast no snow at all in the NYCMA (Fig. 8e).
As in Fig. 5, but for (a) WPC probabilities for 7–8 Mar 2018, (b) observed snow accumulation, and (c)–(e) mean forecast ensemble snow accumulation from 6 to 8 Mar 2018.
The 20–22 March 2018 storm had notable forecast errors in both the WPC/NWS forecasts and the WRF forecasts. The WPC forecast (Fig. 9a) showed a >70% chance of at least 8 in. of snow stretching from southwest of New York City northeastward into Connecticut. The local forecast office, meanwhile, forecast 12–18 in. in the NYCMA (not shown). While the NYC area did receive 6–9 in. of snow, the heaviest snow near the city actually fell over central Long Island, where 12–15 in. was observed but where mixing of snow with rain had been forecast to lower snow amounts. The WRF 1.3- and 4-km ensemble forecasts (Figs. 9c,d) for this storm predicted less than the observed snow amounts, and relying on these WRF forecasts for predicting snow amounts would have led to a forecast "bust." The 12-km WRF ensemble forecast snow amounts were closer to the observed amounts in the NYCMA. However, the WRF 12-km ensemble also forecast 3–6 and 6–9 in. of snow across southeastern New York and Connecticut, respectively (Fig. 9e), where in fact much less or even no snow was observed. The overprediction of snow was associated with an overforecast of liquid equivalent precipitation (cf. Figs. 10a and 10d). The WRF 1.3- and 4-km forecasts more accurately predicted precipitation amounts across southeastern New York and Connecticut (Figs. 10b,c). Interestingly, both of these ensembles predicted substantial precipitation amounts in northern Maryland and southern Pennsylvania, where the heaviest snow of the storm was actually observed and topographic influences are present. The fact that liquid precipitation was the predominant forecast type in areas where snow was observed suggests that the higher-resolution models "suffered" more from poor forecasting of precipitation type than of precipitation amount.
As in Fig. 5, but for (a) WPC probabilities for 21–22 Mar 2018, (b) observed snow accumulation, and (c)–(e) mean forecast ensemble snow accumulation during 20–22 Mar 2018.
(a) Stage IV observations from 20 to 22 Mar 2018, and WRF mean ensemble precipitation (including liquid equivalent) from the (b) 1.3-, (c) 4-, and (d) 12-km forecasts.
While the forecasts described above produced a wide range of precipitation amounts and captured important details of the spatial distribution of precipitation, they were not able to represent the true variability (or range) of observed snow accumulations within the set of case studies tested. Figure 11 shows rank histograms for all three ensembles, produced by ranking the forecasts at each grid point and then counting the rank interval in which the observation fell. The U-shaped histograms are typically associated with ensemble forecasts that are underdispersive and/or have conditional ensemble bias (e.g., Hamill 2001; Wilks 2011). As can be seen, between 30% and 40% of the observations were larger than the highest forecast amounts, while 20%–30% were lower than the lowest. More observations exceeded the maximum forecast amounts in the 1.3-km ensemble than in the other ensembles, while a greater number of observations fell below the lowest forecast member in the 12-km ensemble. The reader should keep in mind, though, that the underdispersiveness reflects in part the underestimation of the extremely heavy snow amounts over central New York during the March 2017 storm, and that none of the models predicted the more than 2 ft of snow that fell in southeastern Pennsylvania during the January 2016 storm.
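A minimal Python sketch of the rank-histogram calculation, assuming the member forecasts and observations are already on a common grid and flattened to one dimension (the function name and input layout are ours):

import numpy as np

def rank_histogram(member_values, obs_values):
    """Rank histogram: for each point, count how many ensemble members fall
    below the observation (rank 0 = observation below all members; rank
    n_members = observation above all members), then tally the ranks."""
    members = np.asarray(member_values, dtype=float)   # shape (n_members, npoints)
    obs = np.asarray(obs_values, dtype=float)          # shape (npoints,)
    ranks = np.sum(members < obs[None, :], axis=0)     # 0 .. n_members
    counts = np.bincount(ranks, minlength=members.shape[0] + 1)
    return counts / counts.sum()                       # relative frequency per interval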
Rank histograms for the (a) WRF 12-km ensemble, (b) WRF 4-km ensemble, and (c) WRF 1.3-km ensemble for all five disruptive storms. A flatter histogram indicates that the ensemble at each grid element represents the true variability (uncertainty) of the observations. Between 30% and 40% of the observations exceeded the highest forecast amounts.
Overall, Brier scores for snow and precipitation (liquid equivalent where frozen) were generally lower (better) for the cloud-allowing and convection-allowing forecasts than for those with parameterized convection (Tables 2 and 3). Based on the bootstrap testing, the 1.3- and 4-km forecast probabilities were significantly better than the 12-km forecasts for many threshold values for both snow and precipitation. Yet there were also case study days (e.g., the January 2016 storm) on which the 1.3-km forecasts were significantly better than both the 12- and 4-km ensemble forecasts for liquid precipitation amounts. Based on the decomposition of the Brier score into reliability and resolution (not shown), improved reliability in the 1.3- and 4-km ensembles relative to the 12-km ensemble was the main reason for their lower Brier scores. This means that the convection-allowing and cloud-allowing forecasts, compared to the forecasts with parameterized convection, produced gridscale probabilities of snow exceeding the specified thresholds that were closer to the percentage of events that actually occurred. However, the relatively small resolution term indicates that none of the forecast ensembles consistently provided conditional probabilities that differed much (on average) from the "climatological" average (calculated from the 5 forecast days). The uncertainty term ranged from about 0.025 to 0.25, with smaller values for the larger thresholds, which have lower climatological frequencies and more correct null forecasts. Indeed, Brier scores generally decreased (i.e., improved) as threshold values increased.
Brier score values for the dates, thresholds, and grid spacings shown, for storm total forecast amounts for each of the five disruptive storms. The Brier score measures the magnitude of the forecast probability errors. An "x" in front of the Brier score in the 12-km column indicates that neither the 4- nor the 1.3-km forecasts were significantly better (or worse) than the 12-km forecasts at the 0.05 confidence level. If there is no "x" in the 12-km column but an "x" in one of the other columns, then the marked forecast was not significantly better than the 12-km forecast at that threshold, while the unmarked forecast was.
As in Table 2, but for total precipitation (including liquid equivalent). The “y” indicates that the 1.3-km forecasts were significantly better than the 4-km forecasts.
The rank probability scores (RPS; Fig. 12) suggest that the higher-resolution WRF 1.3- and 4-km ensemble forecasts were best overall at predicting the likelihood that observed snow (Fig. 12a) and liquid equivalent (Fig. 12b) amounts would exceed the specified thresholds. For the January 2015 and March 2017 storms, both the 1.3- and 4-km ensembles had dramatically lower RPS than the 12-km ensemble for both snow and liquid equivalent amounts, with the 1.3-km ensemble having the lowest RPS. The smallest RPS of any of the forecast experiments was obtained for the January 2016 storm. For that storm, the differences among the forecasts were small, and it was not possible from the RPS values for snow and liquid precipitation to claim that one forecast was better than another: the 12-km ensemble had a lower RPS for snow amounts, but the 1.3-km ensemble had a lower RPS for liquid equivalent amounts. For the disruptive storm of 6–8 March 2018, both higher-resolution ensembles had lower RPS values for snow and liquid precipitation than the 12-km ensemble, although the differences were not as large as in the other case studies mentioned above; in this case the 1.3-km forecast was best. For the 20–22 March 2018 storm, the 12-km ensemble had a lower RPS for snow but a higher (worse) value than the 1.3-km ensemble for liquid equivalent amounts (the 4-km ensemble was slightly worse than the others). The 12-km ensemble had a lower RPS for snow accumulations because its tendency to overpredict precipitation amounts worked in its favor. Over all days, the average RPS values for the 12-, 4-, and 1.3-km forecasts were, respectively, 0.16, 0.14, and 0.15 for snow and 0.48, 0.41, and 0.39 for precipitation, suggesting a small comparative advantage, at least for precipitation, in moving from the convection-allowing to the cloud-allowing grid.
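For a single grid point and a set of ordered thresholds, a minimal Python sketch of the rank probability score follows, starting from ensemble exceedance probabilities as in Tables 2 and 3; the normalization by the number of thresholds is one common convention and is our choice here.

import numpy as np

def rank_probability_score(prob_exceed, obs_value, thresholds):
    """RPS at one grid point: mean squared difference between the cumulative
    forecast probability and the cumulative (0/1) observed occurrence over
    ordered threshold categories."""
    thresholds = np.asarray(thresholds, dtype=float)
    p_exceed = np.asarray(prob_exceed, dtype=float)   # P(amount >= each threshold)
    cdf_forecast = 1.0 - p_exceed                     # P(amount < each threshold)
    cdf_obs = (obs_value < thresholds).astype(float)  # becomes 1 once the obs falls below
    return np.mean((cdf_forecast - cdf_obs) ** 2)

# Example: thresholds of 6, 12, 18, 24 in.; ensemble exceedance probabilities; obs of 10 in.
print(rank_probability_score([0.9, 0.4, 0.1, 0.0], 10.0, [6, 12, 18, 24]))  # 0.045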
Rank probability scores for (a) snow and (b) liquid equivalent precipitation for the 1.3- (green), 4- (red), and 12-km (blue) ensemble forecasts for the set of five disruptive storms. The average RPS values for the 12-, 4-, and 1.3-km forecasts were, respectively, 0.16, 0.14, and 0.15 for snow and 0.48, 0.41, and 0.39 for precipitation. The rank probability score measures how well the ensembles predicted that the observations would exceed the threshold values shown in Tables 2 and 3; lower scores indicate more accurate probability forecasts.
The 1.3-km ensemble forecasts most closely predicted the observed frequencies (Fig. 13), with the reliability curves from the 1.3-km (12-km) forecasts generally falling closest to (farthest from) the line of perfect reliability. The added benefit of using smaller grid spacing to forecast the probability of at least 6, 12, 18, and 24 in. of snow is apparent, keeping in mind that data points above the no-skill line and below the diagonal line (perfect reliability) contribute positively to the Brier skill score when a climatological forecast is used as the reference (e.g., Wilks 1995).
Reliability diagrams for snow exceedance threshold values of (a) 6, (b) 12, (c) 18, and (d) 24 in. from the five disruptive snowstorms for the 1.3-, 4-, and 12-km ensembles. Reliability diagrams measure how well the predicted probabilities correspond to their observed frequencies.
For thresholds of 6 and 12 in., many segments of the reliability curves for the 12-km forecasts fell below the no-skill line, contributing negatively to the Brier skill score (Figs. 13a,b). The 1.3-km forecasts were more reliable than the 4-km forecasts for both exceedance values, with the latter lacking skill in some probability ranges. The 1.3-km ensemble also produced roughly equal numbers of forecasts within all probability ranges, compared to the 4- and 12-km ensembles, indicating greater ensemble spread among the 1.3-km forecasts (Figs. 14a,b). The 4- and 12-km ensembles produced a relatively large number of forecasts with probabilities of 90%–100%, indicative of comparative bias or overcertainty.
The frequency of forecasts that occurred in each probability bin for the threshold values shown. The graphs show a measure of sharpness, or the ability of the forecasts to produce extreme values. Only nonzero forecast probabilities were plotted.
For forecast snow amounts exceeding 18 and 24 in., the 12-km forecasts again had lower reliability and skill than the 1.3- and 4-km forecasts (Figs. 13c,d), with reliability curves not much above the no-skill line. Overall, fewer forecasts within the 1.3- and 4-km ensembles than within the 12-km ensemble had high probabilities of exceeding these two thresholds (Figs. 14c,d). In comparison, forecasts made with the 12-km grid were overcertain that these thresholds would be exceeded, consistent with forecast bias. For a threshold of 24 in., a greater number of the data points from the 1.3-km (and 4-km) ensemble fell above the diagonal line (Fig. 13d), indicating an underforecasting bias. Correspondingly, the distribution of 1.3- and 4-km forecasts across the probability ranges in Fig. 14d was strongly skewed toward the lower probabilities.
The discrimination diagrams in Fig. 15 show that the 1.3-km forecasts were best able to distinguish between the occurrence and nonoccurrence of heavy snow and/or precipitation. The threshold values chosen to define heavy snow depended on the case study and ranged from 6 to 24 in. A threshold of 30 mm of liquid (or liquid equivalent) precipitation was chosen for the 20–22 March 2018 storm, since most areas within the domain received liquid rather than frozen precipitation. The best forecasts had the largest separation between the distribution of forecast probabilities at points where the threshold amount was exceeded and the distribution at points where it was not. The summary scores, representing the difference between the means of the two distributions, were highest for the 1.3-km ensemble forecasts, and the 4-km forecasts had higher summary scores than the 12-km ensembles. Four of the five 1.3-km forecasts had positive summary scores, meaning that, on average, higher probabilities were assigned where the threshold was exceeded than where it was not, while three of the 12-km forecasts had negative summary scores.
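A minimal Python sketch of the summary score as described above (the difference between the mean forecast probability where the event was observed and where it was not); the function name is ours.

import numpy as np

def discrimination_summary(prob_forecast, event_obs):
    """Difference between the mean forecast probability at points where the
    threshold was exceeded and the mean at points where it was not. Larger
    (positive) values indicate better discrimination."""
    p = np.asarray(prob_forecast, dtype=float).ravel()
    o = np.asarray(event_obs, dtype=bool).ravel()
    return p[o].mean() - p[~o].mean()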
Discrimination diagrams for "heavy snow" from the first four case study experiments, and for "heavy precipitation" (liquid equivalent) from the last, for the 1.3-, 4-, and 12-km forecast ensembles. Threshold exceedance values and summary scores are shown in parentheses. Summary scores are calculated as the difference between the mean values of the two distributions; larger differences indicate a greater ability of the ensembles to discriminate between observations exceeding and not exceeding the threshold values.
4. Discussion and conclusions
Various evaluation metrics were used to demonstrate the potential value of using cloud-allowing (1.3 km) and convection-allowing (4 km) model grids, relative to forecasts with parameterized convection (12-km grid), for forecasting total winter storm precipitation amounts. The higher-resolution forecasts depicted precipitation maxima more accurately both along the coastal corridor of East Coast cities, including the NYCMA, and within mountainous areas farther inland, where the magnitude of upslope flow was affected by differences in resolved topography among the model domains.
Hence, the recommendation is to use cloud-allowing ensemble forecasts for predicting East Coast winter storms, although the use of convection-allowing forecasts would be a first and necessary step toward improving winter storm forecasts.
With a neighborhood radius of 48 km, the 1.3- and 4-km deterministic forecasts became essentially unbiased, with simultaneous increases in detection and reductions in false alarms. In contrast, the bias in the forecasts made with parameterized convection was systematic and was not reduced within the range of neighborhood radii tested. As displayed on the performance diagram, both the 1.3- and 4-km deterministic forecasts showed higher "performance" than the forecasts made with parameterized convection on the 12-km grid: CSI values were generally higher and biases were lower.
Both the 1.3- and 4-km ensemble forecasts were more reliable than the 12-km ensemble forecasts, and the cloud-allowing 1.3-km forecasts were generally more reliable than the convection-allowing 4-km forecasts. For instance, reliability diagrams showed that forecasts with 1.3-km grid spacing had better agreement between forecast probability and mean observed frequency for snow amounts >6, 12, and 18 in. than both the 4- and 12-km ensemble forecasts. Both the 1.3- and 4-km ensembles produced higher summary scores on discrimination diagrams than the 12-km forecasts, and the WRF 1.3-km ensemble forecasts were better able to discriminate the occurrence of heavy snow (or precipitation) than forecasts made with the 4-km grid. Analysis of the precipitation Brier scores suggested that, at least in some of the case studies, the cloud-allowing forecasts were more accurate than the convection-allowing forecasts.
For purposes of analysis, the data were interpolated to the 12-km grid of the WRF outermost domain, the grid common to all ensembles. A comparison of model output on the native 1.3-km grid with the regridded 1.3-km data revealed that some very localized detail was "lost," but large differences remained between the regridded 1.3-km fields and the 12-km precipitation fields. Hence, we suggest that the salient conclusions of this study do not depend on the choice of grid for the interpolation.
Regarding the interpolation of the snow data, the interpolation was done using successively smaller radii of influence to maximize the retention of smaller-scale detail. Using a smaller outer interpolation radius led to blank areas within the interpolated data. Nevertheless, the presented maps of observed snowfall do show local variability, both within the NYCMA and in association with enhanced snowfall rates over mountainous areas, and localized forecast errors could be identified in all of the forecasts.
The sample of disruptive storms studied here was relatively small (though larger than in previous studies), and the rank histograms revealed that the forecasts were underdispersive. A larger sample of forecasts is probably necessary to reach firm conclusions about ensemble variance and to refine the added advantages of using higher-resolution forecasts. Such a sample could be created in a "winter experiment" analogous to the numerous springtime experimental forecast studies.
A lack of topographic resolution can also affect the simulation of surface temperatures, leading to errors in forecasting precipitation type. Yet errors in forecasting precipitation type occurred even with the high-resolution models used here for the 20–22 March 2018 case study. In fact, as of this writing, the NWS in Upton, New York, overforecast snow amounts in the NYCMA during the 16–17 December 2019 storm, when a heavy but wet snow fell across the area. Hence, forecasting precipitation type remains a challenging problem in situations where temperatures are very close to freezing.
Here, the forecast ensembles were based on boundary conditions from the GEFS ensemble members; mixed-model-physics ensembles might also be considered to further improve forecasts. However, Gallus et al. (2019) found that using such mixed physics packages, some of which introduced model biases, did not improve the skill of the ensemble configuration they tested. (The reader is referred to appendix C and Figs. C1 and C3 for further discussion and figures concerning the possible impact of varying the cumulus parameterization and grid configuration in winter storm ensemble forecasts.)
Regarding the possible role of convection in winter storm development, it is interesting to note a more recent NWS (Upton, New York) forecast discussion issued just prior to the snowstorm of 3–4 March 2019 (not simulated here): "…convection allows for latent heat release that will cause subsequent downstream ridging and upstream troughing. The overall effect is to increase baroclinicity with noticeable decreasing wavelength height patterns aloft." A comparison of supplemental WRF ensemble forecasts with explicit microphysics turned on against ensemble forecasts with microphysics turned off showed that diabatic heating associated with microphysical processes deepened the trough and amplified the ridge relative to the simulations without microphysics (see Fig. C3). The impact of heating perturbations associated with moist processes is discussed in detail in Zhang et al. (2007). In fact, a tendency in the GEFS to produce heavier precipitation amounts and higher cloud tops (not shown) over Maryland and off its coast could have affected the location of the strongest baroclinicity in those forecasts, leading on average to a more southern storm track. Mahoney and Lackmann (2006) documented the possible importance of unresolved or "missed" precipitation for subsequent storm evolution, which might also have contributed to the forecast error. A detailed analysis of the comparative accuracy of global forecasts can be found in Korfe and Colle (2018).
Should we expect these differences in forecasting the location of heavy precipitation to persist in the near future? Perhaps not: the GFS was recently changed from a spectral dynamical core to the Finite-Volume Cubed-Sphere (FV3) dynamical core (https://www.weather.gov/news/fv3), and there have been other upgrades to the model physics packages as well (similar improvements are being implemented in the GEFS), so such discrepancies in apparent storm tracks may not remain. Still, sensitivity to initial conditions (chaos) will likely continue to challenge winter weather forecasting, and the grid spacing of the GFS and other operational global models (including ensembles) apparently remains too coarse to simulate the microphysical processes and feedbacks required for more accurate forecasting of snow in East Coast storms.
With the addition of convection-allowing (e.g., HRRRE) or possibly even cloud-resolving forecast ensembles to the operational suite of models, we encourage National Weather Service efforts to further highlight probability forecasts, even on their standard forecast pages (i.e., "local forecasts"). At the time of this writing, however, this information is still not regularly conveyed to the public (Rothfusz et al. 2018), for example, in the discussions preceding potentially disruptive storms in the NYCMA. Perhaps there is a reluctance to convey probabilistic information to the public, who (like an aspiring forecaster's mother) may just want to "know" whether it will snow or not and how much. In fact, forecast snow amounts are for the most part presented by both the NWS and private meteorological firms as a range (e.g., 12–18 in., or even 12–24 in.), implying that there is a 100% chance that snow amounts will fall within this range. It would be interesting to investigate why this continues to be the case and whether probabilistic snow forecasts depicting a range of possible snow accumulations would be accepted and acted on positively by industry and the public.
Acknowledgments
The WRF ensemble data were produced on the high-performance computing (HPC) cluster of Advanced Clustering, computers of Weather It Is, and the HPC of The Hebrew University of Jerusalem, Givat Ram, Department of Earth Sciences. Precipitation data were obtained from the NCAR/UCAR Earth Observing Laboratory (http://data.eol.ucar.edu/codiac/dss/id=21.093). Snowfall data were obtained from the National Centers for Environmental Information (NCEI) via ncdc.noaa.gov, consisting of station data from about 1600 U.S. locations, including AWOS and ASOS stations. GFS and GEFS data were obtained from NCEP NOMADS/FTPPRD. The cloud-top brightness temperatures (not shown) were obtained from https://www.ncei.noaa.gov/data/geostationary-ir-channel-brightness-temperature-gridsat-b1/access/2017/. Although not presented in this paper, Mark Klein of NOAA’s NWS WPC office provided historical files of the WPC probability forecasts. We do show, however, WPC snow probability forecasts from WPC’s Winter Weather Forecasts page (product archives; https://origin.wpc.ncep.noaa.gov/archives/web_pages/wpc_arch/get_wpc_archives.php). We also appreciate forecast maps provided by NWS forecasters, and discussions with forecasters Chad Gimmestad and Paul Schlatter (Boulder, CO), as well as other forecasters from the NWS who helped in developing this analysis. We thank Matthew Pyle of the EMC Mesoscale Modeling Branch for information on the new HREF. We thank the reviewers for their very constructive comments and are indebted to Paul Roebber for his help with the “performance diagrams.” The NCL program developed as part of this collaboration is available from the first author.
Data availability statement. The analysis was done using observations and WRF output files. Selected variables from the WRF output files are available on request.
APPENDIX A
Description of Disruptive Storms
Five disruptive snowstorms were studied here.
26–28 January 2015. The initial conditions (0000 UTC 26 January) and lateral boundary conditions were from GEFS 1° (retrospective) data obtained from the NOAA/National Centers for Environmental Information (https://www.ncdc.noaa.gov/data-access). The simulation ended at 0000 UTC 28 January. In the New York City Metropolitan Area (NYCMA), snow fell mostly the day after the start of the forecast simulations, from about 0600 UTC 27 January until 0000 UTC 28 January.
22–24 January 2016. The simulation was started at 0000 UTC 22 January. It ended at 1200 UTC 24 January. The period of heaviest snow in the NYCMA was from about 0600 UTC 23 January until about 0600 UTC 24 January. These simulations used GEFS 0.5° data (downloaded during the operational GEFS forecasts).
13–15 March 2017. The simulation was started at 0000 UTC 13 March. It ended at 1200 UTC 15 March. The period of heaviest snow was from about 0600 UTC 14 March until about 1200 UTC 15 March (the heaviest snow in the New York City area fell during the local daytime on 14 March). These simulations also used GEFS 0.5° data (downloaded during the operational GEFS forecasts).
6–8 March 2018, again using GEFS 0.5° data. The simulation was for 54 h, ending 1800 UTC 8 March. Most of the snow fell from 1200 UTC 7 March to about 0600 UTC 8 March, and was heaviest in central New York State.
20–22 March 2018, with GEFS 0.5° data. The simulation was for 54 h, ending 1800 UTC 22 March. The snow fell predominantly from 1200 UTC 21 March to about 0600 UTC 22 March, and was heaviest in southeastern Pennsylvania.
Snowfall observations were measured over 24-h periods (from 1900 to 1900 local EST). Forecasts were started with the intent to allow at least 6 h of spinup time before observed snowfall (within the nearest 24-h measurement period) began within the higher-resolution forecast grids.
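For illustration only, the following is a minimal sketch of this spinup check, using times taken from the January 2016 case described above; the onset time used here is an approximate, assumed value.

```python
# Minimal sketch: verify that a forecast start allows at least 6 h of spinup
# before observed snowfall begins within the nearest 24-h measurement period
# (1900-1900 EST, i.e., 0000-0000 UTC).
from datetime import datetime, timedelta

init_time = datetime(2016, 1, 22, 0)    # forecast initialization: 0000 UTC 22 Jan 2016
snow_onset = datetime(2016, 1, 23, 6)   # approximate onset in the NYCMA: 0600 UTC 23 Jan

spinup = snow_onset - init_time
print(f"Spinup before snowfall: {spinup.total_seconds() / 3600:.0f} h")
assert spinup >= timedelta(hours=6), "fewer than 6 h of spinup before snowfall"
```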
APPENDIX B
Summary of Forecast Evaluation Terms
Various methods for implementing forecast evaluation approaches are described in, for example, Wilks (1995, 2006), Weigel et al. (2007), Roebber (2009), and Charles and Colle (2009). There are also numerous web pages that provide additional guidance for evaluating forecasts. For instance, below are descriptions from the forecast evaluation page (https://www.cawcr.gov.au/projects/verification/) associated with the “Seventh International Verification Methods Workshop.”
Threat score or critical success index: “How well did the forecast “yes” events correspond to the observed “yes” events?” (Range 0–1; perfect score is 1.)
Probability of detection: “What fraction of the observed “yes” events were correctly forecast?” (Range 0–1; perfect score is 1.)
False alarm ratio: “What fraction of the predicted “yes” events actually did not occur (i.e., were false alarms)?” (Range 0–1; perfect score is 0.)
Bias score: “How did the forecast frequency of “yes” events compare to the observed frequency of “yes” events?” (Range from 0 to ∞; perfect score is 1.)
Brier score: “What is the magnitude of the probability forecast errors?” (Range 0–1; perfect score is 0.)
Rank histogram: “How well does the ensemble spread of the forecast represent the true variability (uncertainty) of the observations?”
Rank probability score: “How well did the probability forecast predict the category that the observation fell into?”—a measure of the difference between the distribution of forecasts and the distribution of observations, for specified forecast categories.
Reliability diagrams: “How well do the predicted probabilities of an event correspond to their observed frequencies?” Reliability diagrams display the observed relative frequency (in %) of the observations, graphed against the forecast probabilities from the ensembles. When forecast probabilities are consistently too high (a positive bias), the graphed curve falls below the line of perfect reliability (or perfect skill). If the forecasts are strongly biased, the graphed curve falls below the no-skill line (which lies halfway between perfect reliability and climatology).
Discrimination diagram: “What is the ability of the forecast to discriminate between events and non-events?” It plots the likelihood of each forecast probability when the event occurred and when it did not occur. To compare forecasts, a summary score is calculated for each forecast, which is the absolute value of the difference between the mean values of each distribution within the diagram.
Other web sites and web page–based documents also detail how to calculate these scores; a minimal sketch of several of the calculations is given below.
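To make the definitions above concrete, the following sketch shows how the contingency-table scores, the Brier score, and a reliability curve might be computed with NumPy. It is not the verification code used in this study, and the example counts and probabilities are hypothetical placeholders.

```python
import numpy as np

def contingency_scores(hits, misses, false_alarms):
    """Threat score (CSI), probability of detection, false alarm ratio, and bias."""
    csi = hits / (hits + misses + false_alarms)       # range 0-1, perfect = 1
    pod = hits / (hits + misses)                      # range 0-1, perfect = 1
    far = false_alarms / (hits + false_alarms)        # range 0-1, perfect = 0
    bias = (hits + false_alarms) / (hits + misses)    # range 0-inf, perfect = 1
    return csi, pod, far, bias

def brier_score(prob_forecasts, observed):
    """Mean squared error of probability forecasts; observed is 1 (event) or 0 (no event)."""
    p = np.asarray(prob_forecasts, dtype=float)
    o = np.asarray(observed, dtype=float)
    return np.mean((p - o) ** 2)                      # range 0-1, perfect = 0

def reliability_curve(prob_forecasts, observed, n_bins=10):
    """Observed relative frequency within each forecast-probability bin."""
    p = np.asarray(prob_forecasts, dtype=float)
    o = np.asarray(observed, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    centers, obs_freq = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (p >= lo) & (p <= hi) if hi == 1.0 else (p >= lo) & (p < hi)
        if in_bin.any():
            centers.append(p[in_bin].mean())          # mean forecast probability in bin
            obs_freq.append(o[in_bin].mean())         # observed relative frequency in bin
    return np.array(centers), np.array(obs_freq)

# Hypothetical example values (not results from this study):
hits, misses, false_alarms = 9, 3, 4
print("CSI, POD, FAR, bias:", contingency_scores(hits, misses, false_alarms))

probs = np.array([0.9, 0.7, 0.2, 0.8, 0.1, 0.6, 0.4, 0.95, 0.3, 0.5])
obs = np.array([1, 1, 0, 1, 0, 0, 1, 1, 0, 1])
print("Brier score:", brier_score(probs, obs))
print("Reliability curve:", reliability_curve(probs, obs, n_bins=5))
```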
APPENDIX C
Varying Convective Parameterization and Domain Setup
Although not the emphasis of this research, supplemental simulations were done substituting one cumulus parameterization for another on either the 12-km grid (by itself) or the outer 12-km grid with nested grids. For instance, when using the Kain–Fritsch (K-F) cumulus parameterization instead of the modified Tiedtke scheme (M-T) to simulate the January 2016 storm, the forecast snow amounts increased on both the 4-km (Fig. C1c) and 12-km (Fig. C1e) grids compared to the ensemble forecasts with M-T (Figs. C1b,d), especially in the NYCMA. While the forecast amounts shown in Fig. C1e with only the 12-km forecast grid were much larger than observed (Fig. C1a), forecast snow amounts in the NYCMA were closer to observed amounts on the 4-km grid when K-F was substituted on the outer grid. Further comparisons were made between forecasts substituting K-F for M-T on the 12-km forecast domain. While it was not possible to recommend one cumulus parameterization over another based on the rank probability scores, the representation of microphysical processes (or lack thereof) strongly affects storm development (Fig. C2).
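As an illustration of how such substitutions can be configured, the following is a minimal sketch (not the configuration used here) that edits a WRF namelist with the third-party f90nml Python package. The cumulus and microphysics option numbers are assumptions and should be checked against the WRF users' guide for the version in use.

```python
# Minimal sketch: swap the cumulus scheme on the outer 12-km domain only,
# leaving the 4- and 1.3-km nests without a cumulus parameterization, and
# (optionally) turn microphysics off for the sensitivity runs discussed above.
# Assumed option numbers (verify against the installed WRF version):
#   cu_physics = 1 -> Kain-Fritsch (K-F); cu_physics = 6 -> (modified) Tiedtke (M-T);
#   mp_physics = 0 -> microphysics turned off.
import f90nml  # third-party Fortran-namelist parser

nml = f90nml.read("namelist.input")

# Cumulus parameterization per domain (outer 12-km grid, 4-km nest, 1.3-km nest).
nml["physics"]["cu_physics"] = [1, 0, 0]   # K-F on the outer grid only

# Uncomment for the microphysics-off sensitivity experiments:
# nml["physics"]["mp_physics"] = [0, 0, 0]

nml.write("namelist.input.kf")             # write the modified namelist to a new file
```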
Fig. C1. (a) Observed and (b)–(e) ensemble mean forecast snow amounts for the January 2016 disruptive storm. The M-T cumulus parameterization was used on the 12-km grid in (b) and (d), and the K-F cumulus parameterization was used in (c) and (e).
Fig. C2. Height differences between the NAM analysis data and (a) the WRF 12-km ensemble mean heights with microphysics on and (b) the WRF 12-km forecast ensemble mean heights with microphysics turned off.
Above, it was noted that forecasts made with a previous version of WRF (3.6.1) and a larger outer domain with 36-km grid spacing more realistically simulated the spatial distribution of snowfall during the March 2017 storm than those made with WRF 3.9.1. Figures C3a–C3d show observed and forecast snowfall from the January 2015 disruptive winter storm using the same set of grids, including the 36-km outer grid. Contrary to the results previously shown, the best forecast of the spatial distribution of snow amounts was obtained with the 1.3-km grid spacing. Overall, all of these forecasts produced snow amounts in the NYCMA closer to the observed amounts than those obtained with version 3.9.1 and the 12-km outer grid.
Fig. C3. (a) Observed snow accumulation from the January 2015 disruptive storm. (b)–(d) The ensemble mean forecast accumulation from the 1.3-, 4-, and 12-km ensembles, respectively. The WRF version was 3.6.1, and the domain used was the same as that used to simulate the March 2017 storm.
REFERENCES
Ancell, B., and G. J. Hakim, 2007: Comparing adjoint- and ensemble-sensitivity analysis with applications to observation targeting. Mon. Wea. Rev., 135, 4117–4134, https://doi.org/10.1175/2007MWR1904.1.
Baxter, M. A., and P. N. Schumacher, 2017: Distribution of single-banded snowfall in central U.S. cyclones. Wea. Forecasting, 32, 533–554, https://doi.org/10.1175/WAF-D-16-0154.1.
Bishop, C. H., B. J. Etherton, and S. J. Majumdar, 2001: Adaptive sampling with the ensemble transform Kalman filter. Part I: Theoretical aspects. Mon. Wea. Rev., 129, 420–436, https://doi.org/10.1175/1520-0493(2001)129<0420:ASWTET>2.0.CO;2.
Brennan, M. J., and G. M. Lackmann, 2005: The influence of incipient latent heat release on the precipitation distribution of the 24–25 January 2000 U.S. East Coast cyclone. Mon. Wea. Rev., 133, 1913–1937, https://doi.org/10.1175/MWR2959.1.
Bryan, G. H., J. C. Wyngaard, and J. M. Fritsch, 2003: Resolution requirements for the simulation of deep moist convection. Mon. Wea. Rev., 131, 2394–2416, https://doi.org/10.1175/1520-0493(2003)131<2394:RRFTSO>2.0.CO;2.
Cerruti, B. J., and S. G. Decker, 2011: The local winter storm scale: A measure of the intrinsic ability of winter storms to disrupt society. Bull. Amer. Meteor. Soc., 92, 721–737, https://doi.org/10.1175/2010BAMS3191.1.
Changnon, S. A., 1999: Impacts of 1997/98 El Niño–generated weather in the United States. Bull. Amer. Meteor. Soc., 80, 1819–1827, https://doi.org/10.1175/1520-0477(1999)080<1819:IOENOG>2.0.CO;2.
Changnon, S. A., 2007: Catastrophic winter storms: An escalating problem. Climatic Change, 84, 131–139, https://doi.org/10.1007/s10584-007-9289-5.
Changnon, S. A., D. Changnon, T. R. Karl, and T. G. Houston, 2008: Snowstorms across the Nation: An Atlas about Storms and Their Damages. National Climatic Data Center, 96 pp.
Charles, M. E., and B. A. Colle, 2009: Verification of extratropical cyclones within the NCEP operational models. Part I: Analysis errors and short-term NAM and GFS forecasts. Wea. Forecasting, 24, 1173–1190, https://doi.org/10.1175/WAF2222169.1.
Chen, F., and J. Dudhia, 2001a: Coupling an advanced land surface–hydrology model with the Penn State–NCAR MM5 modeling system. Part I: Model implementation and sensitivity. Mon. Wea. Rev., 129, 569–585, https://doi.org/10.1175/1520-0493(2001)129<0569:CAALSH>2.0.CO;2.
Chen, F., and J. Dudhia, 2001b: Coupling an advanced land surface–hydrology model with the Penn State–NCAR MM5 modeling system. Part II: Preliminary model validation. Mon. Wea. Rev., 129, 587–604, https://doi.org/10.1175/1520-0493(2001)129<0587:CAALSH>2.0.CO;2.
Clark, A. J., W. A. Gallus Jr., M. Xue, and F. Kong, 2009: A comparison of precipitation forecast skill between small convection-permitting and large convection-parameterizing ensembles. Wea. Forecasting, 24, 1121–1140, https://doi.org/10.1175/2009WAF2222222.1.
Clark, A. J., W. A. Gallus Jr., and M. L. Weisman, 2010: Neighborhood-based verification of precipitation forecasts from convection-allowing NCAR WRF Model simulations and the operational NAM. Wea. Forecasting, 25, 1495–1509, https://doi.org/10.1175/2010WAF2222404.1.
Clark, A. J., and Coauthors, 2011: Probabilistic precipitation forecast skill as a function of ensemble size and spatial scale in a convection-allowing ensemble. Mon. Wea. Rev., 139, 1410–1418, https://doi.org/10.1175/2010MWR3624.1.
Clark, A. J., and Coauthors, 2012: An overview of the 2010 Hazardous Weather Testbed experimental forecast program spring experiment. Bull. Amer. Meteor. Soc., 93, 55–74, https://doi.org/10.1175/BAMS-D-11-00040.1.
Colle, B. A., and M. E. Charles, 2011: Spatial distribution and evolution of extratropical cyclone errors over North America and its adjacent oceans in the NCEP global forecast system model. Wea. Forecasting, 26, 129–149, https://doi.org/10.1175/2010WAF2222422.1.
Coniglio, M. C., K. L. Elmore, J. S. Kain, S. J. Weiss, M. Xue, and M. L. Weisman, 2010: Evaluation of WRF model output for severe weather forecasting from the 2008 NOAA Hazardous Weather Testbed spring experiment. Wea. Forecasting, 25, 408–427, https://doi.org/10.1175/2009WAF2222258.1.
Connelly, R., and B. A. Colle, 2019: Validation of snow multibands in the comma head of an extratropical cyclone using a 40-member ensemble. Wea. Forecasting, 34, 1343–1363, https://doi.org/10.1175/WAF-D-18-0182.1.
Dyer, J., and C. Zarzar, 2016: Defining the influence of horizontal grid spacing on ensemble uncertainty within a regional modeling framework. Wea. Forecasting, 31, 1997–2017, https://doi.org/10.1175/WAF-D-16-0030.1.
Efron, B., and R. Tibshirani, 1994: An Introduction to the Bootstrap. CRC Press, 456 pp.
Eisenberg, D., and K. E. Warner, 2005: Effects of snowfalls on motor vehicle collisions, injuries, and fatalities. Amer. J. Public Health, 95, 120–124, https://doi.org/10.2105/AJPH.2004.048926.
Ek, M. B., K. E. Mitchell, Y. Lin, E. Rogers, P. Grunmann, V. Koren, G. Gayno, and J. D. Tarpley, 2003: Implementation of Noah land surface model advances in the National Centers for Environmental Prediction operational mesoscale Eta model. J. Geophys. Res., 108, 8851, https://doi.org/10.1029/2002JD003296.
Fierro, A. O., E. R. Mansell, C. L. Ziegler, and D. R. MacGorman, 2012: Application of a lightning data assimilation technique in the WRF-ARW Model at cloud-resolving scales for the tornado outbreak of 24 May 2011. Mon. Wea. Rev., 140, 2609–2627, https://doi.org/10.1175/MWR-D-11-00299.1.
Gallus, W. A., 2010: Application of object-based verification techniques to ensemble precipitation forecasts. Wea. Forecasting, 25, 144–158, https://doi.org/10.1175/2009WAF2222274.1.
Gallus, W. A., J. Wolff, J. A. Gotway, M. Harrold, L. Blank, and J. Bleck, 2019: The impact of using mixed physics in the Community Leveraged Unified Ensemble. Wea. Forecasting, 34, 849–867, https://doi.org/10.1175/WAF-D-18-0197.1.
Greybush, S. J., S. Saslo, and R. Grumm, 2017: Assessing the ensemble predictability of precipitation forecasts for the January 2015 and 2016 East Coast winter storms. Wea. Forecasting, 32, 1057–1078, https://doi.org/10.1175/WAF-D-16-0153.1.
Hamill, T. M., 2001: Interpretation of rank histograms for verifying ensemble forecasts. Mon. Wea. Rev., 129, 550–560, https://doi.org/10.1175/1520-0493(2001)129<0550:IORHFV>2.0.CO;2.
Iyer, E. R., A. J. Clark, M. Xue, and F. Kong, 2016: A comparison of 36–60-h precipitation forecasts from convection-allowing and convection-parameterizing ensembles. Wea. Forecasting, 31, 647–661, https://doi.org/10.1175/WAF-D-15-0143.1.
Kain, J. S., 2004: The Kain–Fritsch convective parameterization: An update. J. Appl. Meteor., 43, 170–181, https://doi.org/10.1175/1520-0450(2004)043<0170:TKCPAU>2.0.CO;2.
Kain, J. S., S. J. Weiss, J. J. Levit, M. E. Baldwin, and D. R. Bright, 2006: Examination of convection-allowing configurations of the WRF Model for the prediction of severe convective weather: The SPC/NSSL Spring Program 2004. Wea. Forecasting, 21, 167–181, https://doi.org/10.1175/WAF906.1.
Karstens, C. D., and Coauthors, 2015: Evaluation of a probabilistic forecasting methodology for severe convective weather in the 2014 Hazardous Weather Testbed. Wea. Forecasting, 30, 1551–1570, https://doi.org/10.1175/WAF-D-14-00163.1.
Korfe, N. G., and B. A. Colle, 2018: Evaluation of cool-season extratropical cyclones in a multimodel ensemble for eastern North America and the western Atlantic Ocean. Wea. Forecasting, 33, 109–127, https://doi.org/10.1175/WAF-D-17-0036.1.
Lin, Y., 2011: GCIP/EOP surface: Precipitation NCEP/EMC 4km Gridded Data (GRIB) Stage IV Data, version 1.0. UCAR/NCAR–Earth Observing Laboratory, accessed 9 August 2019, https://doi.org/10.5065/D6PG1QDD.
Liu, C., K. Ikeda, G. Thompson, R. Rasmussen, and J. Dudhia, 2011: High-resolution simulations of wintertime precipitation in the Colorado headwaters region: Sensitivity to physics parameterizations. Mon. Wea. Rev., 139, 3533–3553, https://doi.org/10.1175/MWR-D-11-00009.1.
Loken, E. D., A. J. Clark, M. Xue, and F. Kong, 2017: Comparison of next-day probabilistic severe weather forecasts from coarse- and fine-resolution CAMs and a convection-allowing ensemble. Wea. Forecasting, 32, 1403–1421, https://doi.org/10.1175/WAF-D-16-0200.1.
Lorenz, E. N., 1993: The Essence of Chaos. University of Washington Press, 227 pp.
Mahoney, K. M., and G. M. Lackmann, 2006: The sensitivity of numerical forecasts to convective parameterization: A case study of the 17 February 2004 East Coast cyclone. Wea. Forecasting, 21, 465–488, https://doi.org/10.1175/WAF937.1.
McMillen, J. D., and W. J. Steenburgh, 2015: Impact of microphysics parameterizations on simulations of the 27 October 2010 Great Salt Lake–effect snowstorm. Wea. Forecasting, 30, 136–152, https://doi.org/10.1175/WAF-D-14-00060.1.
Mittermaier, M. P., and G. Csima, 2017: Ensemble versus deterministic performance at the kilometer scale. Wea. Forecasting, 32, 1697–1709, https://doi.org/10.1175/WAF-D-16-0164.1.
Moore, J. T., and P. D. Blakley, 1988: The role of frontogenetical forcing and conditional symmetric instability in the Midwest snowstorm of 30–31 January 1982. Mon. Wea. Rev., 116, 2155–2171, https://doi.org/10.1175/1520-0493(1988)116<2155:TROFFA>2.0.CO;2.
Morcrette, J.-J., H. W. Barker, J. N. S. Cole, M. J. Iacono, and R. Pincus, 2008: Impact of a new radiation package, McRad, in the ECMWF integrated forecast system. Mon. Wea. Rev., 136, 4773–4798, https://doi.org/10.1175/2008MWR2363.1.
Niu, G.-Y., and Coauthors, 2011: The community NOAH land surface model with multiparameterization options (NOAH-MP): 1. Model description and evaluation with local-scale measurements. J. Geophys. Res., 116, D12109, https://doi.org/10.1029/2010JD015139.
Novak, D. R., L. F. Bosart, D. Keyser, and J. S. Waldstreicher, 2004: An observational study of cold season–banded precipitation in northeast U.S. cyclones. Wea. Forecasting, 19, 993–1010, https://doi.org/10.1175/815.1.
Novak, D. R., B. A. Colle, and A. R. Aiyyer, 2010: Evolution of mesoscale precipitation band environments within the comma head of Northeast U.S. cyclones. Mon. Wea. Rev., 138, 2354–2374, https://doi.org/10.1175/2010MWR3219.1.
Ota, Y., J. C. Derber, E. Kalnay, and T. Miyoshi, 2013: Ensemble-based observation impact estimates using the NCEP GFS. Tellus, 65A, 20038, https://doi.org/10.3402/tellusa.v65i0.20038.
Robinson, P. J., 1989: The influence of weather on flight operations at the Atlanta Hartsfield International Airport. Wea. Forecasting, 4, 461–468, https://doi.org/10.1175/1520-0434(1989)004<0461:TIOWOF>2.0.CO;2.
Roebber, P., D. Schultz, B. Colle, and D. Stensrud, 2004: Toward improved prediction: High-resolution and ensemble modeling systems in operations. Wea. Forecasting, 19, 936–949, https://doi.org/10.1175/1520-0434(2004)019<0936:TIPHAE>2.0.CO;2.
Roebber, P. J., 2009: Visualizing multiple measures of forecast quality. Wea. Forecasting, 24, 601–608, https://doi.org/10.1175/2008WAF2222159.1.
Rooney, J. F., 1967: Urban snow hazard in the United States—Appraisal of disruption. Geogr. Rev., 57, 538–559, https://doi.org/10.2307/212932.
Rothfusz, L. P., R. Schneider, D. Novak, K. Klockow-McClain, A. E. Gerard, C. Karstens, G. J. Stumpf, and T. M. Smith, 2018: FACETs: A proposed next-generation paradigm for high-impact weather forecasting. Bull. Amer. Meteor. Soc., 99, 2025–2043, https://doi.org/10.1175/BAMS-D-16-0100.1.
Schumacher, R. S., D. M. Schultz, and J. A. Knox, 2010: Convective snowbands downstream of the Rocky Mountains in an environment with conditional, dry-symmetric, and inertial instabilities. Mon. Wea. Rev., 138, 4416–4438, https://doi.org/10.1175/2010MWR3334.1.
Schwartz, C. S., and R. A. Sobash, 2019: Revisiting sensitivity to horizontal grid spacing in convection-allowing models over the central and eastern United States. Mon. Wea. Rev., 147, 4411–4435, https://doi.org/10.1175/MWR-D-19-0115.1.
Schwartz, C. S., and Coauthors, 2009: Next-day convection-allowing WRF Model guidance: A second look at 2-km versus 4-km grid spacing. Mon. Wea. Rev., 137, 3351–3372, https://doi.org/10.1175/2009MWR2924.1.
Schwartz, C. S., G. S. Romine, R. A. Sobash, K. R. Fossell, and M. L. Weisman, 2015: NCAR’s experimental real-time convection-allowing ensemble prediction system. Wea. Forecasting, 30, 1645–1654, https://doi.org/10.1175/WAF-D-15-0103.1.
Schwartz, C. S., G. S. Romine, K. R. Fossell, R. A. Sobash, and M. L. Weisman, 2017: Toward 1-km ensemble forecasts over large domains. Mon. Wea. Rev., 145, 2943–2969, https://doi.org/10.1175/MWR-D-16-0410.1.
Skamarock, W. C., and Coauthors, 2008: A description of the Advanced Research WRF version 3. NCAR Tech. Note NCAR/TN-475+STR, 113 pp., https://doi.org/10.5065/D68S4MVH.
Smirnova, T. G., J. M. Brown, and S. G. Benjamin, 1997: Performance of different soil model configurations in simulating ground surface temperature and surface fluxes. Mon. Wea. Rev., 125, 1870–1884, https://doi.org/10.1175/1520-0493(1997)125<1870:PODSMC>2.0.CO;2.
Smirnova, T. G., J. M. Brown, and D. Kim, 2000: Parameterization of cold-season processes in the MAPS land-surface scheme. J. Geophys. Res., 105, 4077–4086, https://doi.org/10.1029/1999JD901047.
Snively, D. V., and W. A. Gallus Jr., 2014: Prediction of convective morphology in near-cloud-permitting WRF Model simulations. Wea. Forecasting, 29, 130–149, https://doi.org/10.1175/WAF-D-13-00047.1.
Sobash, R. A., J. S. Kain, D. R. Bright, A. R. Dean, M. C. Coniglio, and S. J. Weiss, 2011: Probabilistic forecast guidance for severe thunderstorms based on the identification of extreme phenomena in convection-allowing model forecasts. Wea. Forecasting, 26, 714–728, https://doi.org/10.1175/WAF-D-10-05046.1.
Sobash, R. A., C. S. Schwartz, G. S. Romine, K. R. Fossell, and M. L. Weisman, 2016: Severe weather prediction using storm surrogates from an ensemble forecasting system. Wea. Forecasting, 31, 255–271, https://doi.org/10.1175/WAF-D-15-0138.1.
Song, F., and G. J. Zhang, 2018: Understanding and improving the scale dependence of trigger functions for convective parameterization using cloud-resolving model data. J. Climate, 31, 7385–7399, https://doi.org/10.1175/JCLI-D-17-0660.1.
Sukoriansky, S., B. Galperin, and V. Perov, 2005: Application of a new spectral theory of stably stratified turbulence to the atmospheric boundary layer over sea ice. Bound.-Layer Meteor., 117, 231–257, https://doi.org/10.1007/s10546-004-6848-4.
Tewari, M., and Coauthors, 2004: Implementation and verification of the unified Noah land surface model in the WRF model. 20th Conf. on Weather Analysis and Forecasting/16th Conf. on Numerical Weather Prediction, Seattle, WA, Amer. Meteor. Soc., 14.2a, https://ams.confex.com/ams/84Annual/techprogram/paper_69061.htm.
Thompson, G., and T. Eidhammer, 2014: A study of aerosol impacts on clouds and precipitation development in a large winter cyclone. J. Atmos. Sci., 71, 3636–3658, https://doi.org/10.1175/JAS-D-13-0305.1.
Thompson, G., M. K. Politovich, and R. M. Rasmussen, 2017: A numerical weather model’s ability to predict characteristics of aircraft icing environments. Wea. Forecasting, 32, 207–221, https://doi.org/10.1175/WAF-D-16-0125.1.
Tiedtke, M., 1989: The effect of penetrative cumulus convection on the large-scale flow in a general circulation model. Beitr. Phys. Atmos., 57, 216–239.
Torn, R. D., and G. J. Hakim, 2008: Ensemble-based sensitivity analysis. Mon. Wea. Rev., 136, 663–677, https://doi.org/10.1175/2007MWR2132.1.
Tracton, M. S., 2008: Must surprise snowstorms be a surprise? Synoptic-Dynamic Meteorology and Weather Analysis and Forecasting: A Tribute to Fred Sanders, Meteor. Monogr., No. 55, Amer. Meteor. Soc., 251–268, https://doi.org/10.1175/0065-9401-33.55.251.
Tsuboki, K., 2008: High-resolution simulations of high-impact weather systems using the cloud-resolving model on the Earth simulator. High Resolution Numerical Modelling of the Atmosphere and Ocean, K. Hamilton and W. Ohfuchi, Eds., Springer, 141–155.
Weigel, A. P., M. A. Liniger, and C. Appenzeller, 2007: The discrete Brier and ranked probability skill scores. Mon. Wea. Rev., 135, 118–124, https://doi.org/10.1175/MWR3280.1.
Weisman, M. L., W. C. Skamarock, and J. B. Klemp, 1997: The resolution dependence of explicitly modeled convective systems. Mon. Wea. Rev., 125, 527–548, https://doi.org/10.1175/1520-0493(1997)125<0527:TRDOEM>2.0.CO;2.
Weygandt, S. S., and N. L. Seaman, 1994: Quantification of predictive skill for mesoscale and synoptic-scale meteorological features as a function of horizontal grid resolution. Mon. Wea. Rev., 122, 57–71, https://doi.org/10.1175/1520-0493(1994)122<0057:QOPSFM>2.0.CO;2.
Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences: An Introduction. International Geophysics Series, Vol. 59, Elsevier, 467 pp.
Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed. International Geophysics Series, Vol. 100, Academic Press, 648 pp.
Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. International Geophysics Series, Vol. 100, Academic Press, 704 pp.
Willison, J., W. A. Robinson, and G. M. Lackmann, 2013: The importance of resolving mesoscale latent heating in the North Atlantic storm track. J. Atmos. Sci., 70, 2234–2250, https://doi.org/10.1175/JAS-D-12-0226.1.
Zhang, C., Y. Wang, and K. Hamilton, 2011: Improved representation of boundary layer clouds over the southeast Pacific in ARW-WRF using a modified Tiedtke cumulus parameterization scheme. Mon. Wea. Rev., 139, 3489–3513, https://doi.org/10.1175/MWR-D-10-05091.1.
Zhang, F., C. Snyder, and R. Rotunno, 2002: Mesoscale predictability of the “surprise” snowstorm of 24–25 January 2000. Mon. Wea. Rev., 130, 1617–1632, https://doi.org/10.1175/1520-0493(2002)130<1617:MPOTSS>2.0.CO;2.
Zhang, F., N. Bei, R. Rotunno, C. Snyder, and C. C. Epifanio, 2007: Mesoscale predictability of moist baroclinic waves: Cloud-resolving experiments and multistage error growth dynamics. J. Atmos. Sci., 64, 3579–3594, https://doi.org/10.1175/JAS4028.1.
Zheng, M., E. K. M. Chang, and B. A. Colle, 2013: Ensemble sensitivity tools for assessing extratropical cyclone intensity and track predictability. Wea. Forecasting, 28, 1133–1156, https://doi.org/10.1175/WAF-D-12-00132.1.
See Novak et al. (2004, 2010), Schumacher et al. (2010), and Baxter and Schumacher (2017) for a review of processes that cause snow banding. Snowbands have been observed to form in the northwest and northeastern quadrants of low pressure areas, depending on the synoptic forcing (Baxter and Schumacher 2017).
A 4-km ensemble simulation was conducted with 41 vertical layers for the 6–8 March storm. The additional levels were added automatically by the WRF Preprocessing System (WPS) simply by reducing the spacing between levels (within the staggered grid). The spatial fields of precipitation obtained for liquid precipitation exceedance values of 0.1, 0.25, 0.5, and 1 in. were similar to those obtained in the corresponding forecast with 31 vertical layers (not shown here).
Note that a radius of 48 km is approximately the north–south distance of Long Island, New York, as well as the distance from New York City to its northern suburbs.
Forecasts and observations are always provided to the public in inches. Since this is a paper about forecasting snow amounts, relevant discussion will be provided in inches. However, the evaluation of ensemble precipitation forecasts is done in millimeters.