This study details the creation of a gridded snowfall dataset for North America, with focus on the quality of the interpolated product. Daily snowfall amounts from National Weather Service Cooperative Observer Program stations and Meteorological Service of Canada surface stations are interpolated to 1° by 1° grids from 1900 to 2009 and examined for data record length and quality. The interpolation is validated spatially and temporally through the use of stratified sampling and k-fold cross-validation analyses. Interpolation errors average around 0.5 cm and range from less than 0.01 to greater than 2.5 cm. For most locations, this is within the measurement sensitivity. Grid cells with large variations in elevation experience higher errors and should be used with caution. A new gridded snowfall climatology is presented based on in situ observations that capture seasonal and interannual variability in monthly snowfall over most of the North American land area from 1949 to 2009. The Community Collaborative Rain, Hail and Snow Network is used as an independent set of point data that is compared to the gridded product. Errors are mainly in the form of the gridded data underestimating snowfall compared to the point data. The spatial extent, temporal length, and resolution of the dataset are unprecedented with regard to observational snowfall products and will present new opportunities for examining snowfall across North America.
Long-term, high-quality observations (Peterson and Vose 1997; Klein Tank et al. 2002; Austin and Colman 2008; among others) are an integral part of documenting changing climatic conditions. They are also a valuable tool for validating many types of remotely sensed data and can extend our knowledge of climate conditions to those preceding the satellite era. Compiling a database of long-term snowfall data is somewhat challenging, since snowfall data are manually collected by observers, who are in a consistent network of trained observers through space and time. There are a limited number of such snowfall datasets available, and independently most of them are either lacking in temporal or spatial extent—and in some cases, quality.
Among the commonly used datasets for snowfall analysis, limitations often involve the spatial coverage of reporting stations over the desired period of record in the study region. Because studies generally conduct quality-control procedures to ensure the best possible selection of stations, the number of useful stations is often significantly reduced from all those with data. Among the more recent snowfall studies (Leathers et al. 1993; Groisman and Easterling 1994; Ellis and Leathers 1996; Leathers and Ellis 1996; Hartley 1996; Hartley and Keables 1998; Serreze et al. 1998; Smith and O’Brien 2001; Bradbury et al. 2002; Burnett et al. 2003; Changnon et al. 2006; Kunkel et al. 2007, 2009c), Kunkel et al. (2007) uses the largest number of United States stations, 1119. In a Canadian study, Groisman and Easterling (1994) use 1107 Canadian stations. However, even with such large datasets, the spatial coverage during the earlier period of the record is still sparse.
Aside from a paucity of reporting stations early in the twentieth century, previous snowfall studies have also required continuous data from a station over the entire study period, which limits the number of usable stations. The longest period of record found in any of the studies cited above is Kunkel et al. (2007), which used data from the period 1900–2004. Kunkel et al.’s long period of record could only be used when the number of stations in their study was reduced from 1119 to 233, and large gaps in spatial coverage were accepted. Station availability from earlier periods is still an issue for a composite/interpolated dataset; however, through interpolation of the data, stations can be included that are not continuous and may only be available for a specific period. Users must be aware that spatial consistency of stations contributing to the interpolation is not maintained and may impact earlier interpolated values at some locations.
The final factor affecting the selection of an appropriate dataset is temporal resolution. Snowfall data typically have daily to seasonal or annual temporal coverage. Many studies use monthly snowfall data. This temporal resolution is useful when identifying general trends in snowfall amounts, but a higher-resolution dataset is preferred when examining the frequency of snowfall events, changes in the number of extreme snowfall events, or synoptic patterns associated with snowfall. This higher resolution is ideal to identify changes in the short-term weather conditions that influence and produce snowfall. Daily data also allow for the computation of snowfall intensity, event size distributions, and snowstorm statistics.
Among snowfall data, heretofore a long-term gridded dataset for North America has not been available. A gridded dataset not only provides the opportunity to include all stations present during the period of record but also provides estimated values for areas with nonexistent observations that may be useful in applications where estimates are preferred to no data at all. The dataset presented in this study is a 1° by 1° North American gridded snowfall product with a daily resolution and a period of record that extends from 1900 to 2009. An earlier version was developed in conjunction with a snow depth dataset in Dyer and Mote (2006) (Kluver 2007).
Gridding precipitation is not a novel practice in climatology. There are several precipitation datasets with higher spatial resolutions than the one presented here that cover similar or larger geographical areas. Datasets that produce precipitation grids (among other nonsnow variables) include Willmott and Matsuura (2012), which extends from 1900 to 2010 at a 0.5° by 0.5° global resolution; Hijmans et al. (2005), which uses a 1-km grid for global landmasses from 1950 to 2000; the Parameter-Elevation Regressions on Independent Slopes Model (PRISM; Daly et al. 2008), which covers the conterminous United States with time series from 1895 to present at 800-m resolution; and Livneh et al. (2013), which uses interpolated precipitation to a 1/16th° spatial resolution over the conterminous United States from 1915 to 2011. Other datasets create interpolated fields of snow variables, such as the Finnish Meteorological Institute’s GlobSnow project (Hancock et al. 2013; Metsämäki et al. 2015), which produces a Northern Hemisphere satellite–derived snow water equivalent and fractional snow extent at a 0.01° grid resolution.
Reanalysis products such as the North American Regional Reanalysis (NARR) [post-2003 updates called the Regional Climate Data Assimilation System (R-CDAS); Mesinger et al. 2006] and the Modern-Era Retrospective Analysis for Research and Applications (MERRA; Rienecker et al. 2011) produce simulated snowfall data from 1979 to present at a fine spatial resolution (0.3° resolution for NARR/R-CDAS and ½° latitude by ⅔° longitude resolution globally for MERRA). These are products of model output based on assimilated input data that have changed over time and are therefore vulnerable to biases (Bengtsson et al. 2004; Rapaic et al. 2015). For example, NARR output experienced a detectable breakpoint in 2004 data corresponding to the ceased assimilation of Canadian gauge precipitation observations (Eum et al. 2014). Neither of the two reanalysis products listed above currently includes snowfall measurements in the assimilated input [Regional Data Assimilation System for NARR and Goddard Earth Observing System Data Assimilation System (GEOS DAS) for MERRA]. The NOAA National Weather Service’s National Operational Hydrologic Remote Sensing Center (NOHRSC) Snow Data Assimilation System (SNODAS) also assimilates multiple input snow variables for the coterminous United States to a 1-km spatial grid from 2003 to present (Carroll et al. 2001). However, the documentation explicitly states that “these data are not suitable for snowfall events or totals for specific regions” (NOHRSC 2004).
The dataset described in this manuscript is unlike the products discussed above because it is solely based on snowfall observations. It has a large spatial extent (North America) and a long period of record with a high temporal resolution (daily data from 1900 to 2009). The gridcell resolution is not as fine as some other interpolated products, as inputs of snowfall data alone, on a higher-resolution grid, would create many purely interpolated gridcell values. This dataset’s characteristics of being based on observations and having both a large spatial breadth and temporal longevity make for a unique product.
Section 2 describes the station data used to create the gridded product, the gridding methodology, and the steps taken to verify the interpolation method and to quantify its error. Monthly and seasonal snowfall climatologies and a spatial distribution of frequency of moderate-sized snowstorms are given in section 3. In section 4, a comparison with data from the Community Collaborative Rain, Hail and Snow Network is detailed. A summary of the dataset creation and validation is presented in section 5.
2. Data and gridding methodology
a. Data and interpolation methodology
The station data used in the grid interpolation are from U.S. National Weather Service Cooperative Observer Program (COOP) station network (National Climatic Data Center 1981) and daily surface observations from the Meteorological Service of Canada (Environment and Climate Change Canada 2012), which were quality controlled using criteria from Robinson (1988). Snowfall events entered as “trace” were treated as 0.0 cm. Input stations available during any part of the period of record are plotted, as well as the grid locations, in Fig. 1.
In Fig. 2 the number of reporting stations available each year is plotted, as well as the percentage of 1° by 1° grid cells collocated with the reporting stations. The number of reporting stations used in the interpolation varies widely from the early to later part of the record. Before 1948 there are never more than 2500 stations available, meaning less than 40% of the grid cells in the gridded snowfall product contain any reporting stations within the grid. A large increase in the number of stations coincides with the full digitization of the COOP data beginning with the year 1948; following this, the number of reporting stations increases until a peak of approximately 9500 stations in 1965. However, even at its largest extent, fewer than 50% of the grid cells in North America have reporting stations within them. The number of reporting stations declines into the twenty-first century to around 6000 stations at the end of 2009.
It should also be noted that daily snowfall measurement practices have not been consistent among all of the station data (NWS 1989; MSC 2013; NWS 2013). Potential differences, which lead to inhomogeneities, include the use of snowboards and rulers versus snow gauges, time of day, and frequency of measurements; the reporting of trace events or mixed precipitation, and other instructions to observers (Groisman and Easterling 1994; Kunkel et al. 2007; Mekis and Vincent 2011). For example, Canadian observers are instructed that when snow melts before an observation is made, they may use a 10:1 snow water equivalent ratio to estimate snowfall (MSC 2015). Although not part of the official observation practices in the United States since 1941 (USDA Weather Bureau 1941), some observers still have a tendency to erroneously record snow-to-liquid values of 10:1 (Kunkel et al. 2007; Baxter et al. 2005). This leads to undocumented discrepancies among stations within the United States and between the two countries. These differences may influence totals made by one observer compared to another, both simultaneously across a region and at an individual station through time.
Station data are interpolated to a 1° by 1° grid using the Spheremap spatial interpolation program, developed by Willmott et al. (1984). The grids extend from 25° to 71°N latitude and from 53° to 168°W longitude. The period of record is from 1900 to 2009 with a daily temporal resolution. Inverse distance weighting is used in Spheremap to interpolate the data on a spherical surface and then project the data onto a Carteasian plane. This is a desirable feature when interpolating over a sizeable geographical area that includes a large range in latitude. Since the station density varies with space and time, a variable search radius is used. The search radius changes to accept a maximum of the nearest 25 stations within 100 km. If there are fewer than five stations within 100 km of the grid center, then only those stations are used. If there are no stations within a 100-km radius, then the search radius becomes equal to the nearest observation.
Wagner et al. (2012) found that regression-based interpolation methods that include covariates (such as elevation) outperformed standard inverse-distance-weighted interpolation in a study of rainfall in a monsoonal environment. However, these more complex methodologies can be computationally intensive, which becomes a barrier when interpolating over a large spatial and temporal domain on a daily time scale.
As mentioned above, if there are no reporting stations within a grid cell, then the interpolation algorithm will extend the search radius to include the nearest station, some of which may be a significant distance from the grid cell of interest. This approach results in interpolated values for a grid cell based on data that are not located within the cell and are therefore an estimate purely based on adjacent cells’ data. Figure 1 shows the location of stations and grid cells. It can be seen that this interpolation method will result in interpolation values estimated from adjacent cells’ data in parts of northern Canada. Station density is examined by inspecting the number of stations contributing to the interpolation and the number of seasons each grid cell has stations located within it (and therefore it is not purely interpolated data). In Fig. 3, for each grid cell, the number of snow seasons (1 July–30 June) containing station data within the grid is shown. While large parts of Canada have no years with stations contributing to individual grid cells, the conterminous United States has several areas where the entire period of record has reporting stations within individual cells.
b. Interpolation validation
To estimate the accuracy of the interpolated product, the interpolated data are compared to observed values. A fairly standard method of verification for interpolated data is cross validation (hold one or more data points out of the interpolation process and use them to verify the interpolated value; Elsner and Schmertmann 1994; Wilks 2006). If the dataset is sufficiently small, then authors are able to perform cross validation on each station (called a jackknife analysis). An example of this is Moral (2010), who examined precipitation distributions in Spain. When datasets are large, a normal jackknife analysis can become computationally unreasonable and a k-fold cross validation is used. In such a case, Li and Shao (2010) validated several kriging methods for Australian precipitation with a hundredfold cross validation (1% of the stations omitted randomly, so the jackknife is only conducted 100 times) and a tenfold cross validation (10% of the stations omitted randomly, and the jackknife conducted 10 times). However, unlike previous validation studies, the large dataset in this study also has variations in station density with time.
For the purpose of completely sampling the spatial field during validation, a proportionate sampling technique is used. The study region is completely partitioned into strata based on latitude–longitude grids. The number of stations selected from each strata is computed as a percentage of the total number of stations within. A stratified random sample provides a more evenly distributed spatial sample, since a simple random sample would likely oversample in the conterminous United States, where far more stations exist.
Strata are based on 10° by 10° windows, shown in Fig. 4. This size is required in order for enough stations to exist for selection in most strata. Here 5% of the stations are sampled, resulting in 230 randomly selected stations. Proportionate allocation is used, where 5% of each strata’s population is sampled as the random samples, resulting in 5% of the total population being selected. Because the station density is also changing throughout time, stations used for validation purposes need to have data for as much of the period of record as possible. For this reason the station data are subsetted into those stations with at least 42 years (3500 stations) of data existing in the latter three-quarters of the period of record. It is from these 3500 stations that the k-fold cross validation is conducted. Five different random sample sets are collected from these strata. The first sample of stations is given as an example in Fig. 4.
Error throughout the period of record must also be considered temporally. The large amount of daily data prohibits performing a cross validation for each date even with a 5% sample of stations. Therefore, 10% of the dates are randomly sampled from 1900 to 2009 for November–April to use as cross-validation dates.
After spatial and temporal random selection processes are performed, the cross validation on 10% of the days is conducted five times (for the five different stratified random selections of stations). The error is examined temporally with graphs of mean absolute error (MAE) (an example given in Fig. 5) and spatially with time-averaged MAE for each station (Fig. 6).
The MAE time series (Fig. 5) shows the number of stations used in the interpolation as a whole along with the error as estimated from the cross-validation analysis. The error is highest during the early part of the record, when there are fewer stations used in the interpolation. For the entire period of record, error averages approximately 0.5 cm of snowfall. This is deemed an acceptable error considering the accuracy of the measurements is to the nearest tenth of an inch (0.25 cm; NWS 2013) for National Weather Service stations and 0.2 cm for Meteorological Service of Canada stations (MSC 2015).
In Fig. 6 the MAE averaged over all the randomly selected cross-validation days is plotted for each of the five random samples. Errors range from less than 0.01 cm in the southeastern United States to higher values downwind of the Great Lakes and the Intermountain West with areas of 1.0–2.5-cm error. A few stations experience errors on the order of 5.0 cm. One possible explanation includes missing high-elevation values as inputs into the interpolation or smoothing of the high-elevation values because of lower-elevation stations used in the interpolation. It should be noted that several of the regions that experience the largest error values also have average annual snowfall amounts of greater than 100.0 cm.
A histogram is included in Fig. 6, showing the percentage of errors that fall within each error category. For 89.1% of the stations errors are less than 1.0 cm. Of the remaining station errors, 9.1% are between 1.0 and 2.5 cm, and only 1.8% of the errors are greater than or equal to 2.5 cm.
To examine the explanatory power of elevation in Fig. 6, each of the error results from the jackknife analyses are compared to the station’s elevation value through Pearson’s product-moment correlation. The correlation coefficient is 0.22 (p = 0.000). A linear least squares regression and adjusted R2 value is calculated for the error versus elevation. The slope of the line is very small (0.004 068, although statistically significant from zero), and the adjusted coefficient of determination of the linear model (R2 value) is 0.045 59, indicating that elevation explains only 4.6% of the variation in error. When the data are limited to errors greater than or equal to 1.0 cm, then the correlation coefficient is 0.28 (p = 0.001) and the adjusted R2 value increases to 0.06887. So, when limiting the analysis to areas with higher error rates, elevation still only accounts for 6.9% of the error.
In contrast, for most of the Northern Canadian stations that experience a large amount of purely interpolated data, the absolute errors are rarely more than 1.0 cm. This may be a function of not having many stations for both the interpolation and validation, so the true nature of the snowfall variable is not captured (i.e., poor representativeness). High-elevation areas also experience poor representativeness through a low-elevation bias, which will contribute to error in these locations even if the interpolation error is relatively small.
To identify grid cells with the potential for poor representativeness, a mask is prepared (Fig. 7). Grid cells containing purely interpolated data are plotted in dark gray. A 1° by 1° digital elevation model (DEM) (SIO 1971) is used to compare each average gridcell elevation to the concomitant station elevations used in the interpolation. The average difference between the gridcell elevation and station elevations is computed. Stations are grayed out in Fig. 7 if the average station difference is greater than ±500 m (in almost all cases, the stations are lower than the DEM value). This figure and a file of individual gridcell values are available along with the dataset for user evaluation, as different errors maybe acceptable for different applications.
c. Trend analysis
Nonparametric trend analysis is preferred when analyzing precipitation data, which is not normally distributed (Juras 1994; Legates 1991; Wilks 2006). Using Kendall’s tau test (Mann 1945), the significance of a monotonic trend is determined at a 95% confidence level. The Sen slope is calculated to estimate the slope of the regression (Kendall 1955; Sen 1968). Other North American precipitation trend studies that use Kendall’s tau test and the Sen slope include Groleau et al. (2007), Jefferson (2011), Mekis and Vincent (2011), and De Martino et al. (2013).
3. Snowfall climatology
This climatology is limited to data after 1948 due to the large increase in station availability after that time. Monthly distributions of mean snowfall are produced from the gridded data. Average total seasonal snowfall is examined, as well as the minimum and maximum total seasonal snowfall amounts in the period of record. Trends are also presented for total seasonal snowfall amounts. For the seasonal analysis, a snow season is defined as 1 July–30 June.
a. Monthly climatology
Through examination of the monthly mean snowfall values computed for each grid cell (Fig. 8), it can be seen that the annual cycle of North American snowfall is well documented by this dataset. With the exception of northern Canada, throughout the summer months there are few grid cells with any snowfall; therefore, Fig. 8 shows mean monthly snowfall for all months except June–September. In autumn (October and November) consistent snowfall moves south from Canada: first, in the area of the Rocky Mountains; then, throughout the western United States; then, into the northeastern United States and the Great Lakes region. In the winter months (December–February), high mean snowfall values are seen in southern Alaska, the Intermountain West, eastern Canada, and the Great Lakes region. During spring (March–May), the mean snowfall values decrease to near-zero values by early summer. The southern extent of grids containing snowfall values above zero moves north as snowfall becomes more sporadic with the last areas to stop receiving measurable mean snowfall being the Rocky Mountains and eastern Canada.
Three areas in particular stand out as large snowfall accumulation regions across North America. First, mean snowfall values are among the largest in high-elevation regions, such as the Rocky Mountains, the Cascades, and the Sierra Nevada. Considering maximum monthly values, October–April have the highest snowfall values at high-elevation locations (not shown). The second region with large snowfall values is the maritime provinces of Canada. Consistently widespread and large mean monthly values are found from November through March. Third, the Great Lakes region shows evidence of lake-effect snow enhancement from November through March.
These monthly climatology maps exemplify the ability of the dataset to reasonably capture the climatology of North American snowfall when constrained to the latter half of the period of record. The monthly and seasonal statistics are visually realistic as far as spatial and temporal behavior of snowfall. However, it should be noted that this dataset is at too large a spatial resolution to identify high-elevation snowfall patterns, which change dramatically over small distances.
b. Seasonal climatology
Mean seasonal snowfall for 1949–2009 is shown in Fig. 9a. This is the average of all the total seasonal snowfall values for each grid cell. The highest values occur in high-elevation mountainous regions of the Rockies, the Cascades, and the Sierra Nevada; southeastern Alaska; the Great Lakes region; and the Canadian maritime provinces. Values are as high as 435.7 cm (171.5 in.) along the Montana–Wyoming border, near Yellowstone National Park.
The mean seasonal map is consistent with the monthly maps discussed in the previous section. High-elevation regions (including southeastern Alaska) have the largest snowfall values in almost all months. The Great Lakes and the Canadian maritime provinces have the highest snowfall totals in the fall and spring.
Figure 9b displays the maximum seasonal snowfall values for each grid cell over the period from 1949 to 2009. The grid cell with the largest seasonal snowfall is located in southeastern Alaska with a value of 1368.1 cm (538.6 in.), and it is the grid cell that includes Thompson Pass, Alaska, which has received as much as 974.1 in. in a season (1952/53). Other regions that consistently have the largest maximum values in the monthly maps and therefore the highest seasonal values are southeastern Alaska, the Intermountain West, the Great Lakes region, and the Canadian maritime provinces.
The geographical regions that are predominant in the maximum seasonal snowfall map are also notable in Fig. 9c, which shows the minimum seasonal snowfall over the period 1949–2009. The highest minimum values occur in the Rocky Mountains and in the Quebec and Labrador provinces. Along the southern tier and West Coast of the United States, there are many grid cells with zero snowfall during the minimum season.
Trends in total seasonal snowfall amount are computed over the periods 1900–2009 and 1949–2009. The resulting grid trend estimates are displayed in Fig. 10 if the Mann–Kendall test detects a monotonic trend that is statistically significant (at a 95% confidence level).
Trends from 1900 to 2009 are plotted in Fig. 10a. The southeastern United States, large parts of the western United States, and the northern islands of Nunavut, Canada, have experienced small decreasing trends in snowfall from 1900 to 2009. The largest negative trend is −1.7 cm yr−1 and is located in the southern tip of the Alaska Panhandle, just east of Ketchikan, Alaska. Most of the Great Lakes region, upper Midwest, Canadian Prairies, western Canada, and Alaska are experiencing increases in snowfall. The largest positive trend is 3.1 cm yr−1 in the northern part of the Alaska Panhandle. Caution is recommended when looking at the 1900–2009 trends in Fig. 10a, since the number of stations contributing to the interpolation drastically increases beginning in 1948, as discussed in previous sections.
In Fig. 10b the 1949–2009 trends indicate a decrease of seasonal total snowfall (cm yr–1) in most of the western United States and southwestern Canada. The Ohio River valley also experiences decreases in seasonal snowfall, although at rates mostly between 0.0 and −0.5 cm yr−1. The largest trend is −5.8 cm yr−1, which occurs in the northern Cascades in Washington, encompassing the Okanogan and Wenatchee National Forests and areas on the leeward side to the east. Areas of homogeneous positive trends are centered in Nunavut, Northwest Territories, northwestern Alaska, and northern Quebec. The highest positive trend is 2.7 cm yr−1 in an area straddling the Boundary Range in the Alaska Panhandle and northwestern British Columbia.
Annual snowfall trends produced by the gridded product are corroborated by previous literature. The decreasing snowfall trends in the Pacific Northwest, and western and mid-Atlantic United States have been recently documented by a quality-controlled in situ dataset (Kunkel et al. 2009b,c). In a dataset of adjusted daily precipitation for Canada, it was found that there are primarily decreasing annual snowfall trends in British Columbia, Alberta, and Saskatchewan, as well as at some locations in southern Manitoba, Ontario, and Quebec (Mekis and Vincent 2011). These same areas are identified in the gridded product (Fig. 10b) as having statistically significant negative trends in snowfall.
The Great Lakes region displays several grid cells with increasing trends, which have been documented in other studies, such as Leathers et al. (1993), Ellis and Leathers (1996), Burnett et al. (2003), and Kunkel et al. (2009a). In Canada, Groisman and Easterling (1994) found increasing trends in zonal snowfall north of 55°N, and Mekis and Vincent (2011) identified the largest increasing trends in annual snowfall to occur in northern Nunavut. Both of these correspond to an area of large positive trends exhibited in the gridded product in Fig. 10b.
c. Snowfall probabilities
In addition to average seasonal snowfall values and trends, the frequency of moderate-sized snowstorms is also presented. Changnon et al. (2006) computed a national-scale snowstorm climatology for the United States based on 268 stations over the period 1901–2001. Moderate-sized snowfall events were defined as 2-day snowfall events equal to or exceeding 15.2 cm. The frequency of these events over the period of record was computed, and it was found that along the ephemeral snow line, the frequency of moderate-sized events is less than one per year. The frequency of moderate-sized events increases in the high-elevation areas of the Intermountain West, the Great Lakes, Appalachian Mountains, and in the Northeast. Some of these locations exceed 10 moderate-sized snowstorms per year.
The criteria utilized by Changnon et al. for moderate-sized snowstorm events are applied to the gridded data presented in this study. The frequency of moderate-sized events as determined by the period of record from 1900 to 2009 is plotted in Fig. 11. Average patterns for North American snowstorms corroborate the results from Changnon et al., showing the lowest frequencies along the southern states, with many grid cells experiencing fewer than 1 snowstorm per year. The only grid cells without a 15.2-cm snowfall event during the period of record are located in Florida, Texas, California, and Arizona. East of the Rocky Mountains, the frequency of snowstorms increases with latitude. Values of more than 10 snowstorms per year are found to the lee of the Great Lakes, eastern Alaska, along the Rocky Mountains and the Cascades/Sierra Nevada, and in the Quebec and Labrador provinces. These results highlight the gridded product’s ability to capture snowstorm statistics using the daily temporal resolution.
Trends in the frequency of moderate-sized snowstorm events are calculated from 1900 to 2009 and are displayed in Fig. 12. Areas around Ungava Bay and western British Columbia are experiencing declines in the frequency of events. The largest decreasing trend is −0.08 moderate-sized snowfall events per year and occurs near Ketchikan, which is the same grid cell with the largest negative trend in 1900–2009 seasonal total snowfall. The majority of the rest of North America is experiencing increases in the frequency of moderate-sized snowfall events. The largest positive trend is 0.21 snowfall events per year and occurs in the northern part of the Alaska Panhandle, near Skagway, which is also the same grid cell with the largest positive trend in 1900–2009 seasonal total snowfall.
4. Comparison with CoCoRaHS data
Since the two long-term, spatially extensive snowfall observation networks in North America are included in this gridded product, any independent dataset that could be used for validation is limited to either temporally short or spatially small extents. Even so, validating the gridded data against completely independent data permits some useful quality assessment. The Community Collaborative Rain, Hail and Snow Network (CoCoRaHS) is selected due to its daily resolution and quantity of stations in the contiguous United States. CoCoRaHS is a network of volunteer observers who record precipitation data (Cifelli et al. 2005), and the data have been used as precipitation ground truth observations in previous studies (Kelly et al. 2012; Schwartz 2014). Snowfall data are collected as part of this network using a snowboard and a ruler, in the same manner as the datasets used as input for the gridded product described in this paper. For comparison with the daily gridded snowfall data, daily CoCoRaHS data are used for the period 1 January 2007–31 December 2008, with their locations plotted in Fig. 13. Since this is a relatively new network, during 2007 and 2008 there were some U.S. states and Canada provinces that had yet to join the network.
Spatial and temporal average error (AE) between the gridded snowfall data and the CoCoRaHS point data are computed. Instead of interpolating the CoCoRaHS data to a grid, a point comparison is done so as not to reproduce any error that might develop due to the interpolation itself. For each CoCoRaHS station’s daily data, the corresponding grid cell is identified and an AE is computed. After all available CoCoRaHS daily records are used to compute an AE with the respective grids, these differences are averaged for either each grid cell to get a spatial distribution of error or each day to get a temporal distribution of error.
The spatial distribution of AE is plotted in Fig. 14. The majority of grid cells with CoCoRaHS stations falling within them have a negative average error, indicating that the gridded snowfall value is lower than the CoCoRaHS observation. The average of all the grid cell’s errors is −2.0 cm (0.79 in.). The largest errors (all <−2.5 cm) occur at grid cells throughout the United States, with greater concentrations in high-elevation regions of Colorado, Wyoming, Washington, and within the Appalachian Mountains. There are also higher errors in the lake-effect snow regions in the lee of the Great Lakes. The smallest errors occur in the central Great Plains states, where there is little elevational change.
An underestimation of snowfall by the interpolated data may be due to the interpolation method or the location of the COOP and CoCoRaHS station data. In areas with large spatial variability in snowfall totals, interpolating the data will result in a de-emphasis of relatively few extreme values within a grid cell. The COOP network is designed as a quasi-homogeneous spatial sample, while the CoCoRaHS network is composed of a more random contingent of volunteer observers. This may result in more CoCoRaHS observers contributing in locations that have heavy amounts of snowfall, since station density is not regulated.
The time series of AE is also given in Fig. 14. The AE values are plotted by day of analysis, where the first day corresponds to 1 January 2007 and the last day of analysis is 31 December 2008. The average error for the whole period is −1.4 cm (0.55 in.). The second half of the 2006/07 snow season has the largest AE values, with 3 days that have AE values lower than −10.0 cm and an average error from 1 January 2007 to 30 June 2007 of −1.9 cm (0.76 in.). The 2007/08 snow season (1 July–30 June) has much lower AE values, with the average for the season of −1.3 cm (0.52 in.). Overall, the 2006/07 snow season was less active than the 2007/08 season (Robinson 2008, 2009). The large errors in January 2007 correspond to a snowfall event in the central Rockies at the beginning of the year and lake-effect snow in the Northeast around the second week of the year. The second largest AE values occur during April 2007, which saw snowfall in the central and New England regions of the United States, which set several records for April 1-day snowfall totals.
A spatially and temporally extensive gridded snowfall dataset is created through interpolating United States and Canadian station data. The gridded product is interpolated to a 1° by 1° resolution with daily data over the period 1900–2009. However, because of the increase in station data available for the interpolation after the late 1940s, it is cautioned to consider only the second half of the period of record when conducting broad spatial analyses across North America.
Both monthly and seasonal climatologies are computed using the period 1949–2009. Climatological maps show the gridded dataset’s ability to capture the spatial signal of seasonal snowfall variability and range. An analysis of the frequency of moderate-sized snowstorm events shows that the high temporal resolution of this dataset is able to capture approximately the same snowstorm return rates as previously documented (Changnon et al. 2006).
Through a k-fold cross-validation analysis using a random selection of stations and days, interpolation errors are estimated to be within the measurement criterion of to the nearest 0.1 in. An independent dataset (CoCoRaHS) consisting of daily point measurements over a 2-yr period is also used for validation. It is found that the gridded data consistently underestimate the amount of snowfall measured by the CoCoRaHS stations, with an average error of 1.4 cm (0.55 in.). These errors are likely a function of station locations or interpolated data smoothing out the higher-frequency signals through averaging of multiple station values.
The gridded snowfall dataset is available through the Rutgers University Global Snow Laboratory website (http://snowcover.org).
The authors would like to acknowledge Cort Willmott for the advice on interpolation validation.