Results from the generation of a multidecadal gridded climatic dataset for 57 yr (1950–2006) of daily and monthly precipitation (PTotal), maximum temperature (Tmax), and minimum temperature (Tmin) are presented for the important agricultural and forest products state of Wisconsin. A total of 176 climate stations were used in the final gridded dataset that was constructed at 8-km (5.0′) latitude–longitude resolution using an automated inverse distance weighting interpolation. Accuracy statistics for the interpolated data were based on a rigorous validation step using 104 first- and second-order climate observation stations withheld in the production of the gridded dataset. The mean absolute errors (MAE) for daily minimum and maximum temperatures averaged 1.51° and 1.31°C, respectively. Daily precipitation errors were also reasonable, ranging from −0.04 to 0.08 mm, on average, across all climate divisions in the state with an overall statewide MAE of 1.37 mm day−1. Correlation analysis suggested a high degree of explained variation for daily temperature (R2 ≥ 0.97) and a moderate degree for daily precipitation (R2 = 0.66), whereby the realism improved considerably for monthly precipitation accumulation totals (R2 = 0.87). Precipitation had the best interpolation accuracy during the winter months, related to large-scale, synoptic weather systems, and accuracy was at a minimum in the wetter summer months when more precipitation originates from local-to-regional-scale convective forcing. Overall the grids showed coherent spatial patterns in temperature and precipitation that were expected for this region, such as the latitudinal gradient in temperature and longitudinal gradient in precipitation across the state. The grids will prove useful for a variety of regional-scale research and ecosystem modeling studies.
An increasingly prognostic understanding of the key terrestrial–atmospheric feedback mechanisms has been gained through the development and proliferation of ecosystem process models, which utilize climatic inputs to drive plant physiological processes (Churkina and Running 1998; Kucharik et al. 2000; Thornton et al. 2002; Turner et al. 2006). With this increased process-based understanding of biospheric responses to climate change and variability, there is a rapidly rising demand for quality, high-resolution gridded climatological datasets that provide detailed information on the variability of temperature and precipitation at regional scales. These data enable the spatially explicit investigation of complex near-surface–atmosphere interactions over a larger, continuous region than the original climate station data permit.
Spatial interpolation of climatic information further facilitates basic research and numerous applications such as validation of climate models (Widmann and Bretherton 2000), monitoring or detecting and assessing potential impacts of regional climate change (Lobell et al. 2006; Zhang et al. 2000), risk assessment (Kaplan and New 2006; New 2002), and the impact of human activities on regional environments and ecosystem services, which is important for local policy decisions and natural resource management (Cooter et al. 2000). For example, the use of gridded climate data for the study of managed systems has increased in recent years with the ever important and expanding need to assess the impacts of historic and recent climate change on observed agricultural crop yields (e.g., Kucharik and Serbin 2008). Together with satellite observations, gridded meteorological variables can also provide important information on the dynamics of land surface processes (e.g., Hong et al. 2007; Zhang et al. 2004).
However, the availability of high-resolution meteorological data has been problematic, mainly owing to the difficulties of extrapolation of data from sparse observation networks to a regular grid over very broad regions and often complex terrain. Spatial interpolation of daily climate patterns also presents greater complexities than annual, long-term, or even monthly means. Interpolating daily data requires that the model captures multifaceted patterns in climate related to weather fronts, land cover, large bodies of water, and often elevation (Daly et al. 2002). Operational considerations, such as efficient daily model parameterization and optimization, have prohibited the development of daily gridded temperature and precipitation datasets.
Therefore, many existing datasets (e.g., Kittel et al. 2004; McKenney et al. 2006; New et al. 2002; Thornton et al. 1997) may not be suitable for a variety of regional-scale applications, such as crop monitoring, risk and climate change assessment due to the spatial scale, time step (i.e., monthly, annuals, or normals), or the use of stochastic methods for daily weather generation. Furthermore, the temporal extent of high-resolution meteorological data may not be sufficient for long-term analyses (e.g., Thornton et al. 1997).
This paper describes the methodology used to generate a high-resolution daily and monthly multivariable (i.e., temperature and precipitation) gridded historical climatic database for the period 1950–2006, covering the important forestry and agricultural state of Wisconsin, located in the upper Midwest region of the continental United States. We then present a summary of observed weather patterns in Wisconsin and a detailed accuracy assessment of the climate grids using stations withheld from the interpolation process. A summary of the potential uses and limitations of the data is then presented.
2. Data and methodology
a. Study region
The physiography of Wisconsin is characterized by generally minor topographic variations, with gently rolling landscapes. Elevation varies from a minimum along the shore of Lake Michigan to a peak of 595 m above sea level in Price County. Apart from the driftless area, Wisconsin is mostly covered by glacial drift (about 80%) and northern portions are underlain by pre-Cambrian bedrock (Curtis 1959; Dopp 1913). Climate is humid-continental (Moran and Hopkins 2002) with cold winters (mean January temperature from 1950 to 2006 was −9.5°C) and mild to humid summers (mean July temperature from 1950 to 2006 was 21.1°C), moderated by the Great Lakes. Total annual precipitation averaged 808 mm (±165 mm) across Wisconsin. A few medium to large population centers are found within Wisconsin (e.g., cities of Milwaukee, Madison, and Green Bay) while the remaining land comprises smaller cities, towns, and tribal lands, with farmlands and national and state forests composing ∼45% and ∼45.3% of the land area, respectively.
b. Climate data
Time series of daily climate observations of maximum temperature (Tmax), minimum temperature (Tmin), and total precipitation (PTotal) from the cooperative observer (COOP) station network for the years 1950–2006 were obtained directly from the National Climatic Data Center Web site (http://www.ncdc.noaa.gov/oa/ncdc.html/). The COOP stations used were distributed relatively evenly across Wisconsin (Fig. 1a) with a slightly lower station density toward the north. While the research objective was to produce a dataset for Wisconsin, we also chose stations from Illinois, Iowa, Michigan, and Minnesota that were within 70 km of the Wisconsin state boundary to mitigate edge effects during interpolation (Fig. 1a). Stations that did not have at least 53 yr of data recorded (1950–2006) were removed to avoid synthetic bias through the addition of stations during interpolation. The retained Wisconsin stations amounted to approximately 56% (144/315) of the potential station data. Several stations in the COOP network only provided precipitation and thus there were more daily precipitation observations than temperature observations in each climate division (CD) (Table 1). The final data record was composed of a maximum of 133 Tmax and Tmin stations and 176 PTotal COOP observation stations within Wisconsin and neighboring states (Fig. 1a). Reported station elevations ranged from approximately 179 to 541 m. The average first-order (i.e., first nearest neighbor) distance was 21.2 km (from 3.2 to 65.4 km) and 25.0 km (from 4.3 to 65.4 km) for precipitation and temperature stations, respectively.
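The first-order nearest neighbor distance reported above can be illustrated with a short sketch. It assumes great-circle (haversine) distances on a spherical earth of radius 6371 km; the text does not state how the distances were computed, and the function names `haversine_km` and `nearest_neighbor_km` are hypothetical.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two latitude/longitude points (degrees).

    Assumption: spherical earth with a mean radius of 6371 km.
    """
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def nearest_neighbor_km(stations):
    """For each (lat, lon) station, return the distance to its closest other station."""
    return [
        min(haversine_km(*a, *b) for j, b in enumerate(stations) if j != i)
        for i, a in enumerate(stations)
    ]

# Three made-up station locations, for illustration only.
distances = nearest_neighbor_km([(0.0, 0.0), (0.0, 1.0), (0.0, 3.0)])
mean_distance = sum(distances) / len(distances)
```

Averaging the per-station values, as in `mean_distance`, gives a network-wide summary comparable in form to the 21.2 and 25.0 km figures quoted in the text.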
c. Preprocessing and quality control
Several data quality and consistency checks were performed on the primary station list (i.e., those with ≥53 yr of generally contiguous data) prior to further data processing steps. The primary station list was filtered separately for temperature and precipitation observations. Values of precipitation less than zero or flagged as erroneous values were replaced with a missing data flag value (i.e., −9999). In addition, values of Tmin > Tmax and values of Tmax or Tmin less than −50°C or greater than 55°C (i.e., outside historical bounds) were also replaced with the flag value. These steps were intended to screen out implausible values due to observer or data entry error, as well as misinterpretation of written data fields.
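The screening rules above can be sketched as follows. This is a minimal illustration rather than the processing code used for the dataset: the function names are hypothetical, and because the text does not say which value is replaced when Tmin > Tmax, the sketch assumes both are flagged.

```python
MISSING = -9999.0  # missing-data flag value given in the text

def screen_precip(p):
    """Replace negative precipitation with the missing-data flag."""
    return MISSING if p < 0 else p

def screen_temps(tmax, tmin, lo=-50.0, hi=55.0):
    """Return (tmax, tmin) with implausible values replaced by the flag.

    Values outside [lo, hi] degrees C are flagged individually.
    Assumption: when Tmin > Tmax, both values are flagged.
    """
    tmax_ok = lo <= tmax <= hi
    tmin_ok = lo <= tmin <= hi
    if tmax_ok and tmin_ok and tmin > tmax:
        return MISSING, MISSING
    return (tmax if tmax_ok else MISSING, tmin if tmin_ok else MISSING)
```

A value that already carries the flag falls outside the bounds and therefore stays flagged, so the checks can be rerun safely on partially screened data.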
Finally, we assessed the homogeneity of each primary station prior to further processing steps. We evaluated station history metadata to account for errors and discontinuities due to the relocation of stations throughout the record (Easterling et al. 1996; Peterson et al. 1998). If a station was found to change geographical position and this change was not large (<10 km), we retained the station in the dataset and corrected the coordinates to reflect the most current position; the occurrence of station relocations was less than 2% (3 out of 176). Thus all stations in the dataset maintained one location for the entire record. In addition, the moves we could account for occurred in the early part of the record (<1960) and thus should not greatly influence results obtained from trend analysis, such as relocations from urban to rural stations (Hansen et al. 2001).
d. Filling missing data
Estimates for missing data were generated with the multiple imputation (MI) procedure in the statistical program SAS (version 9; see http://support.sas.com/documentation/onlinedoc/91pdf/index.html). The MI procedure is a Monte Carlo technique in which missing values are replaced or “imputed” with several plausible values generated by stochastic modeling of the observed data variability (Levy and Lemeshow 1999; Rubin 1987; Schafer 1997). The imputed datasets are complete, with observed nonmissing data remaining unchanged, while the original missing observations are replaced with new values. This procedure produces data that can then be used with normal parametric statistics (Levy and Lemeshow 1999). Multiple imputation has been utilized in a range of disciplines such as medical research (Barnard and Meng 1999), public and occupational health (Emenius et al. 2003; Zhou et al. 2001), and more recently for environmental and global change sciences (Hanson et al. 2007; Hui et al. 2004). More detail on the multiple imputation technique for estimation of missing data can be found in Rubin (1987) and Schafer (1997), as well as in Hui et al. (2004) for environmental monitoring and modeling purposes. Fewer than 1% of daily temperature observations and fewer than 1.5% of daily precipitation observations were missing or flagged. The MI procedure was only used for brief periods of missing data (<1 month) and imputed values were held within historical bounds. A final set of consistency checks was run on the filled datasets to ensure that the estimates did not violate obvious constraints associated with recording maximum and minimum temperatures, such as those described in the previous section.
e. Gridding interpolation
The interpolation of daily climate data, from the irregularly spaced station locations to the nodes of a regularly spaced 8-km grid, was accomplished using the inverse distance weighting (IDW) spatial interpolation algorithm. While other methods were initially explored (e.g., kriging, thin plate splines), the high station density and low topographic complexity of Wisconsin yielded comparably high-quality results using the less complex IDW interpolator. Further, the complexity of accurately modeling the daily covariance between observation stations and the reduction in variance in the interpolated data field over flatter topography (Shen et al. 2001) restricted the utility of both kriging and splines, respectively, in this study.
The IDW algorithm determines unknown cell values using a linear-weighted combination of sample points within a specific neighborhood (Bolstad 2002; Nalder and Wein 1998); in this analysis we used the 12 nearest stations, which is common (e.g., Jarvis and Stuart 2001). Inverse distance weighting interpolation explicitly implements the assumption of spatial autocorrelation, or objects that are closer together are more similar in character than those that are farther apart. Furthermore, IDW is an exact interpolator, whereby the interpolated surface passes through all points whose values are known (i.e., IDW honors the observed data points) and as such, the maximum and minimum values in each interpolated surface can only occur at the observed locations. Given this criterion, exact interpolation techniques tend to dampen extreme values at unsampled locations, as is the case with IDW, but preserve the natural variability (i.e., roughness) in the data, which is important for preserving the spatial patterns at a regional scale.
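The final IDW grids were produced at 5′ (8 km) latitude–longitude resolution using an automated procedure programmed using the object-oriented language ArcObjects in the Environmental Systems Research Institute (ESRI) geographical information system software ArcGIS (version 9.2) following Eq. (1), in which the sums run over the N nearest stations (N = 12 in this analysis):

$$Z_j = \frac{\sum_{i=1}^{N} Z_i / d_{ij}^{\,n}}{\sum_{i=1}^{N} 1 / d_{ij}^{\,n}} \qquad (1)$$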
where Zj is the estimated value for an unknown point at location j, dij is the distance from known point i to unknown point j, Zi is the observed value for known point i, and n is the power parameter, controlling the significance of surrounding points. With higher n values, more emphasis is placed on nearby stations while a smaller n creates a smoother surface (less detail), with more emphasis (i.e., higher weighting) placed on more distant stations. A power of two (i.e., the weighting function varies with the inverse square of the distance) is commonly used with IDW (Bolstad 2002; Jarvis and Stuart 2001; Nalder and Wein 1998). Once IDW was chosen, we analyzed a subset of data to determine the optimum n to be used with the automated gridding of temperature and precipitation; we used data for all four of Wisconsin’s meteorological seasons. The criterion for choosing the optimal n was the value that best minimized the overall mean bias errors (see validation section), for an entire year. We chose a value of n equal to 1.1 for Tmax and Tmin and 2.0 for precipitation (PTotal) to preserve the broad patterns in temperature and local variation (i.e., spatial detail) in precipitation events.
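The weighting scheme described above can be sketched in a few lines of Python. This is an illustration of the general IDW technique, not the ArcObjects procedure used to build the grids. It assumes stations and grid nodes are given in planar coordinates (e.g., km), so that straight-line distances apply; the function name `idw` and its arguments are hypothetical.

```python
import math

def idw(target, stations, power=2.0, k=12):
    """Inverse distance weighted estimate at `target` = (x, y).

    `stations` is a list of (x, y, value) tuples; only the `k` nearest are used.
    Assumption: planar coordinates, so Euclidean distance is appropriate.
    """
    # Rank stations by distance to the target and keep the k nearest.
    ranked = sorted(
        (math.hypot(x - target[0], y - target[1]), z) for x, y, z in stations
    )[:k]
    # IDW is an exact interpolator: a station located at the target
    # returns its own observed value (this also avoids division by zero).
    if ranked[0][0] == 0.0:
        return ranked[0][1]
    weights = [d ** -power for d, _ in ranked]
    return sum(w * z for w, (_, z) in zip(weights, ranked)) / sum(weights)

# Two made-up stations; compare the powers chosen for temperature and precipitation.
stations = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0)]
smooth = idw((2.0, 0.0), stations, power=1.1)  # more weight on the distant station
sharp = idw((2.0, 0.0), stations, power=2.0)   # closer to the nearby station's value
```

In this toy case `sharp` lies nearer to the value of the closest station than `smooth` does, which is the behavior the text describes for larger and smaller n.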
f. Methodology of product validation
To evaluate the spatial coherence and overall accuracy of the interpolated climate surfaces, observation stations initially withheld from the development of the dataset were used to perform an independent validation. There were 104 withheld or validation stations available with sufficient observational record to be used in the validation, for the 1950–2006 period. Several stations had variable records (e.g., 5–49 yr), but nonetheless provide an extremely useful test of our output climate grids; stations varied by climate division with a minimum of 9 to a maximum of 21. Furthermore, the number of stations and distribution (Fig. 1a) are comparable to or better than other studies using withheld stations for validation (e.g., Price et al. 2000; Vicente-Serrano et al. 2003). The geographic locations for each station were used to extract an interpolated value from each gridcell centroid for each climate surface (Tmax, Tmin, and PTotal), and the extracted values were organized into a consistent time series for comparison with the observed values at daily and monthly time steps. The performance of the IDW interpolated surfaces was then evaluated with the mean error (ME) and mean absolute error (MAE) following Eqs. (2) and (3):

$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n} \left( y_i' - y_i \right) \qquad (2)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i' - y_i \right| \qquad (3)$$
where yi is the observed value at the validation station, yi′ is the predicted value for the grid cell encompassing the station, and n is the total number of points. The ME provides an assessment of the trend in residuals or bias, either producing generally higher (i.e., overprediction) or lower (i.e., underprediction) values with respect to observations. The MAE is an absolute measure of the deviation of the predicted value (i.e., cell value) from the observed value at each validation station, ignoring its sign and thereby providing an indicator of the overall performance of the interpolator. In general, high MAEs indicate poor interpolation performance, while low MAEs suggest high confidence in the gridded values, such that the interpolated values reproduce the observations well (Daly 2006; Willmott and Matsuura 2006). We avoid using the root-mean-square error (RMSE) as this statistic generally inflates, often nonmonotonically, the mean errors and thus provides an overly ambiguous measure of predicted surface accuracy, especially when error variance is large (Willmott and Matsuura 2005, 2006). We instead provide the standard deviation of signed errors (i.e., MEs) to evaluate the spread in the distribution of errors. The evaluation of the climate surfaces allowed the assessment of 1) the realism and reasonableness of the spatially interpolated values and 2) the accuracy of the gridded values for unknown (i.e., validation) locations as the interpolation is essentially a prediction of values at locations for which physical data do not exist. Unless noted otherwise, all statistical tests were considered significant at the α = 0.05 level.
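As a small illustration of these validation statistics, the sketch below computes the ME, the MAE, and the standard deviation of the signed errors for paired observed and predicted values. The function name `validation_stats` is hypothetical, errors are taken as predicted minus observed, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not specify the form used.

```python
import math

def validation_stats(observed, predicted):
    """Return (ME, MAE, SD of signed errors) for paired observed/predicted values.

    Errors are predicted minus observed, so a positive ME means overprediction.
    Assumption: the SD uses the sample (n - 1) denominator.
    """
    errors = [p - o for o, p in zip(observed, predicted)]
    n = len(errors)
    me = sum(errors) / n
    mae = sum(abs(e) for e in errors) / n
    sd = math.sqrt(sum((e - me) ** 2 for e in errors) / (n - 1))
    return me, mae, sd

# Made-up values: errors of +1, 0, and -1 cancel in the ME but not in the MAE.
me, mae, sd = validation_stats([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

The example shows why both statistics are reported: compensating over- and underpredictions can yield an ME near zero even when the typical error magnitude (MAE) is not small.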
3. Results
a. Observed climate patterns
A summary of the observed patterns in climate across Wisconsin, derived from the final primary station observation dataset, is shown in Table 1. In general, average Tmin and Tmax steadily increased from the northwest to the southeast, with CDs 1 and 2 having the coolest and CDs 8 and 9 having the warmest observed temperatures. For CD 6, Lake Michigan decreases the average annual maximum temperature, averaging 1.4°C cooler than surrounding CDs while Tmin is 1.3°C warmer than other CDs within the same latitudinal band (i.e., CDs 4 and 5). Mean annual air temperatures (MATs) ranged from a minimum of 5.12°C to a maximum of 8.25°C, for CDs 2 and 9, respectively, and averaged 6.8°C for the entire state.
Precipitation totals were generally higher in the southern (CDs 7–9) than northern CDs (Table 1). Extreme high-precipitation events were moderately similar across the state with generally higher values in the south-central to southeast climate divisions (CDs 5–9). The distribution of events was dominated by days with no measurable precipitation (i.e., 0 mm), followed by precipitation events ≤5 mm day−1 composing 9% of the observed record (Fig. 2). As shown in Fig. 3, statewide observed monthly precipitation follows a simple seasonal cycle and is highest in the summer (June–August) and at a minimum in the winter (December–February). For a given month, the interannual variability in total precipitation can be 42%–64% over the record (1950–2006). The maximum and minimum observed statewide annual rainfall was 972.6 mm (±137.9 mm) and 532.8 mm (±102.3 mm) in 1951 and 1976, respectively.
b. Interpolation results
The substantial number of daily grids generated here (64 509 in total) made it impossible to illustrate the daily sequences of climate grids over the entire climate record. Instead we provide examples as seasonal means representing the World Meteorological Organization (WMO) 30-yr normal period of 1971–2000 in Fig. 4. For winter and summer Tmax and Tmin the spatial patterns exhibit the expected decreasing average temperature with increasing latitude, with slightly warmer and cooler temperatures near Lake Michigan in the winter and summer, respectively (Fig. 4). Patterns of gridded precipitation (PTotal) clearly indicate that the summer months are spatially the wettest (mean gridded precipitation of 314 mm for June–August) in Wisconsin with higher total accumulation in the western half of the state, while the winter months are the driest (mean gridded precipitation of 96 mm for December–February). The greatest accumulation of winter precipitation was located in the Lake Superior snowbelt and in the southeast, potentially attributed to lake effect snow accumulation but also correlated with warmer temperatures that increase the ability of air to hold more moisture. During the summer months the south-central and southwest portions of the state are warmest, with daytime high temperatures averaging about 28°C and nighttime low temperatures between 14° and 15°C. During the summer months, the spatial coherence of the Tmax grids highlights the influence of Lake Michigan on Wisconsin’s climate, with cooler temperatures closest to the lake front, increasing steadily inland (Fig. 4).
c. Validation of climate grids
The full available record for all primary stations used in the generation of the daily (and monthly) gridded climate surfaces, between 1950 and 2006, consists of over 2.1 million daily Tmin and Tmax values and over 2.5 million precipitation values (Table 1). The summary statistics of the mean predicted values, mean error or bias error and mean absolute errors, for daily predicted versus withheld station observed values are shown in Table 2.
Generally, we find that the mean interpolated values (both spatially and temporally) of temperature closely mirror the observed values (Table 2) with generally small MEs and MAEs for all of Wisconsin. Excluding CDs 8 and 9, average minimum temperature bias is positive and significantly different (paired t test, p < 0.0001) from zero (i.e., no bias), while Tmax MEs are generally negative and significant (paired t test, p ≤ 0.025), with generally smaller standard deviation of interpolation bias relative to Tmin. Correlation analysis illustrates the overall high degree of explained variance between observed and interpolated values (R2 = 0.97 for Tmin and R2 = 0.98 for Tmax) over the majority of the observed temperature range (Fig. 5). Largely, the daily gridded Tmin values had higher residuals (i.e., ME) and larger MAEs than Tmax as the IDW interpolator generally predicted Tmax more accurately than Tmin (Table 1; Fig. 5).
While the errors are generally minimal (Table 2), individual days can have comparatively large errors. Examination of the pattern in the prediction bias (i.e., ME) demonstrates that there is an underestimation of the maximum values and overestimation of minimum values by the interpolated temperature grids (Fig. 6). There is also modest differentiation in error between CDs. For example, Tmin bias for CD 9 is relatively flat (i.e., near zero) with a peak underestimation of ∼5°C, while the remaining CDs average 10% bias for Tmin < −30°C; CD 7 has the largest bias (14%). Excluding CDs 1 and 7, CDs have relatively similar error patterns for Tmax, where the former average a 9% underestimation of high temperatures (>35°C). However, the majority (99%) of observed Tmin values fell between −30° and 20°C and 98% of the values for Tmax ranged from −20° to 30°C (Fig. 2), which compose the range where MEs show minimal deviation from 0 (i.e., predicted − observed).
Predicted annual PTotal was within 2% (∼16 mm) of the observed values for each CD (Table 2). Daily MEs and MAEs are small, with the PTotal MAE ranging from a minimum of 0.68 mm to a maximum of 1.71 mm for CDs 4 and 8, respectively. The MEs for PTotal were generally about 0.1 mm or less and generally had higher standard deviations (i.e., error variances) than temperature (Table 2), owing to the commonly larger distribution of errors. For example, the ME standard deviation was 50% larger for CD 8 than CD 4, where the former receives only about 22 mm more precipitation than the latter, annually.
Figure 2b presents the frequency distribution of observed and predicted PTotal (i.e., > 0 mm) at the validation stations, and shows a moderate but consistent underprediction of observed event frequency in the upper range (∼25 to 60 mm day−1) and a slight overprediction of event frequency ≤15 mm day−1. This highlights the difficulty of mapping precipitation accurately at daily time steps due to the generally patterned nature of precipitation events (i.e., spotty across large regions), resulting in the occurrence of small amounts (generally <2 mm) of predicted precipitation in regions where none was observed. For example, the predicted occurrence of days with no precipitation was about 14% less than that observed at the validation stations, while events < 2 mm were overpredicted by ∼57%.
Correlation analysis between daily observed and interpolated PTotal [interpolated = 0.67(observed) + 0.74, R2 = 0.66, RMSE = 3.23, p < 0.0001] is lower than what we found for temperature, with a higher offset, but still highly significant. Examining the interpolated and observed monthly accumulation totals we find that correlation increased substantially (Fig. 7), indicating that the errors associated with an abundance of predicted low PTotal events (i.e., < 2 mm day−1) do not strongly affect longer accumulation periods (i.e., monthly totals).
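The comparison of daily and monthly agreement can be reproduced in outline as follows. This is a minimal sketch, assuming daily values are summed by calendar month and that R2 is the squared Pearson correlation between observed and interpolated series; the function names `monthly_totals` and `r_squared` are hypothetical and the sample values are invented.

```python
from collections import defaultdict
from datetime import date

def monthly_totals(days, values):
    """Sum daily values into {(year, month): total}."""
    totals = defaultdict(float)
    for day, value in zip(days, values):
        totals[(day.year, day.month)] += value
    return dict(totals)

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Invented daily precipitation (mm) spanning two months.
days = [date(2000, 1, 1), date(2000, 1, 2), date(2000, 2, 1)]
observed = monthly_totals(days, [1.0, 2.0, 4.0])
```

Applying `r_squared` first to the paired daily series and then to the paired monthly totals gives the two levels of agreement that the text contrasts.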
The daily and monthly PTotal residuals (Fig. 8) highlight the tendency to underestimate accumulation totals >12 mm day−1 and about 100 mm month−1 for the daily and monthly PTotal grids, respectively. To understand the effect daily biases had on the overall accuracy, we examined the mean (1950–2006) frequency of observed daily precipitation values (Fig. 9). There was an average of 113 precipitation events per grid cell, annually, over the period of record (i.e., 1950–2006) and 82% of the observed total accumulation, on days with rain, was composed of precipitation events of 10 mm or less (Fig. 9). Within this range of daily PTotal, the average ME bias is ≤−2.5 mm, thus a maximum of a 25% error. For the monthly data, the majority (86%) of monthly accumulation falls between 0 and 115 mm month−1. Within this range, there is close agreement between predicted and observed values with the error averaging −4.88 mm (4%). This illustrates that the overall effect these biases have on annual totals is small and thus results in only a slight overprediction in annual totals by CD (Table 2).
d. Seasonal patterns in error
Finally, we examined the data for seasonality in errors (Figs. 10, 11). Results for Tmax show that summer months, with the lowest diurnal variation, have the best gridded accuracy, while spring and autumn months with greater daily range in Tmax have a decreased accuracy (Fig. 10); CD 4 has the greatest MEs and the largest variation in monthly Tmax. Mean absolute errors for Tmax are generally less, by CD, relative to the errors in Tmin, a situation that reflects the results from the regression analysis (Fig. 5). For Tmin, the spread in MEs (Fig. 10a) is larger than that for Tmax (Fig. 10c) with CDs 3 and 9 having the largest seasonal biases; the cumulative seasonal ME was 0.23°C. The mean bias for Tmin increased slightly across Wisconsin from May to August, while the MAEs were largest in the winter (Fig. 10b). For both Tmin and Tmax the winter months were more prone to excessive errors than the summer months, with standard deviations of the MEs about 30% higher from December to February.
Seasonal patterns were significantly more apparent in the diagnostics of the gridded PTotal (Fig. 11). Summer months (i.e., June–August) show greater error in the average daily precipitation with a slightly positive bias, relative to the drier (Fig. 4) autumn and winter months (October–March) across the state. The MAEs for daily PTotal ranged from ∼0.5 to 3 mm during the year and for monthly accumulation totals we found a range in MAEs from ∼15 mm in the winter and spring to 25 mm in the summer (data not shown). While in absolute terms the errors are small (Table 2), they do constitute a highly variable percentage of the daily precipitation totals given the seasonal winter dry and summer wet climate of Wisconsin (see Fig. 2). For example, in the winter months, MEs were about 4.4% of the daily precipitation statewide, while in the wettest months the MEs average up to 35%, peaking at 36% in July across Wisconsin. The regional differences between CDs illustrate the variation in interpolation accuracy and highlight the large spatial differences in total precipitation accumulation, with larger errors in CDs receiving greater accumulation (CDs 7–9; Table 1).
Through a rigorous and concerted effort, daily and monthly grids of minimum and maximum temperature as well as precipitation at 8-km latitude–longitude resolution have been produced for the state of Wisconsin for the period 1950–2006. These grids have already been used to examine the impacts of recent climates on crop yields in Wisconsin (Kucharik and Serbin 2008) and preliminary studies validating global climate model output. This dataset presents a comprehensive, multidecadal, spatiotemporally complete database that is useful for regional climate analysis, risk assessment, ecosystem modeling, and management and planning purposes. This dataset was produced with a much higher station density and spatiotemporal resolution and longer data record than was feasible in many previous gridded databases (e.g., Kittel et al. 2004; McKenney et al. 2006; Thornton et al. 1997).
The errors exhibited in the interpolated climate grids display the intrinsic weather patterns reflecting the seasonal atmospheric processes found in the region, in addition to the inherent errors associated with the cooperative observer station network. These include observer error, differences in equipment calibration and error, observation inhomogeneities (from urbanization, land use bias, etc.), time of observation bias, and others (Hansen et al. 2001; Peterson et al. 1998). The accuracies of our temperature and precipitation grids are bound by both the chosen spatial interpolation (e.g., parameterization, algorithm) and input data quality.
Geography was also an important consideration in interpreting our gridded data. For example, CD 4 had a consistently larger seasonal bias (underestimation) in Tmax relative to the other CDs, while CD 5 had the smallest MAE, which was significantly lower than the average (Fig. 10d). Temperatures were generally warmer in the summer and cooler in the winter on the western edge of the state (Fig. 4), with generally larger diurnal temperature ranges (DTRs) (Moran and Hopkins 2002). These larger DTRs were not due, however, to the specific impacts of topography. While elevation often influences climate patterns (e.g., Daly et al. 2007; Hasenauer et al. 2003; Vicente-Serrano et al. 2003), topography was not a significant factor in the region for climate mapping (Xia 2008; You et al. 2008). Furthermore, while proximity to large water bodies can also influence interpolation results (Daly et al. 2002), the 30-yr climatology across the state (Fig. 4) shows the expected influence of the Great Lakes.
Prediction biases (i.e., MEs) for temperature were generally larger (and positive, indicating overestimation) for Tmin than the corresponding errors for Tmax. Similarly, the MAEs and the average variation [i.e., standard deviations (SDs)] were higher for Tmin (Table 2). This result has been observed previously (e.g., Bolstad et al. 1998; Stahl et al. 2006; Thornton et al. 1997) and is likely due to the number of factors that make interpolation of nighttime (minimum) temperatures more complex, several of which can occur at very small scales (e.g., <1 km). For example, thermal inversions can be more influential in minimum temperature mapping (e.g., Bolstad et al. 1998; Daly et al. 2007, 2002) and the extent of cloud cover (Dai et al. 1999) can increase the spatial variation in nighttime temperatures, resulting in a larger disparity between predicted and observed values at validation stations. While the influence of urbanization (i.e., urban heat island effect) was not directly accounted for in this study, about 90% of the stations were located in rural settings, so urbanization is likely to be only a minor influence on the mapped climate patterns.
With some exceptions (e.g., Vicente-Serrano et al. 2003), the mapping of precipitation totals is generally more difficult than that of the corresponding maximum and minimum temperatures (e.g., Thornton et al. 1997; Daly et al. 2007), especially for daily values, because temperature is an intrinsically smoother variable, whereas precipitation is generally more heterogeneous across broad regions, often depending on season. Here daily precipitation was significantly more challenging to model than temperature because of several issues. For example, Ensor and Robeson (2008) found that gridding of daily precipitation, particularly with an algorithm that includes a smoothing parameter, can have a large impact on the statistical properties of the resulting precipitation field. As is the case with this study, gridding often results in a higher proportion of days with precipitation, but with those days having less precipitation.
The heterogeneity of precipitation can result in a large PTotal gradient across Wisconsin (Fig. 4), owing to the often highly localized precipitation events, prevailing weather, and lake effects (Moran and Hopkins 2002), which can be difficult to adequately predict spatially. For example, a high proportion of precipitation falling from May to September comes from convective (thermal) forcing, often associated with nighttime mesoscale convective complexes, and large frontal systems producing short correlation lengths in the PTotal field. For the remainder of the year, precipitation is associated with large-scale synoptic features with the formation of precipitation occurring at high levels within the atmosphere. When summed to monthly data, the precipitation results improve and confidence in the data is increased. This is a consistent issue in the mapping of daily precipitation (e.g., Daly et al. 2007; Fekete et al. 2004; Thornton et al. 1997) and warrants further study into methods to minimize this effect of interpolation (e.g., Hewitson and Crane 2005); there are likely more datasets with this issue that are not adequately reported in the literature. Finally, the difficulties in measuring solid precipitation (i.e., snow and ice) accurately through collection and appropriate conversion to liquid water equivalent (LWE) can influence interpolation results, producing an underreporting of precipitation during the winter months.
a. Comparison with other gridded climatic datasets
There is a general deficiency of historical daily climate grids with spatial and temporal coverage similar to that of the present study. However, more attention has been given in recent years to the application of interpolation techniques for the development of daily gridded meteorological data (e.g., Di Luzio et al. 2008; Hasenauer et al. 2003; Thornton et al. 1997). The validation of these emerging datasets has generally followed three techniques: the use of iterative cross validation (Kittel et al. 2004; Stahl et al. 2006; Thornton et al. 1997), a subsample of withheld station data (Bolstad et al. 1998; Price et al. 2000; Vicente-Serrano et al. 2003), or a combination of both (Hasenauer et al. 2003; McKenney et al. 2006). Here we present a comparison with other datasets reporting representative independent station validation statistics.
On average, validation results illustrated that the output accuracy of the gridded data is high (Table 2) and we find that the spatial patterns in temperature and precipitation are realistic (Fig. 4). The correlation between observed and predicted temperatures was found to be quite good (Fig. 5) and comparable to results from Thornton et al. (1997). While IDW is generally deficient in mountainous regions (e.g., Daly et al. 2003), it has been shown to provide comparable results to more complex spatial interpolation algorithms in areas with flatter topographic characteristics (e.g., Nalder and Wein 1998; Shen et al. 2001; You et al. 2008). Jarvis and Stuart (2001) and Nalder and Wein (1998) showed that IDW compared well with more complex algorithms in regional applications, with and without appropriate consideration of guiding variables. Stahl et al. (2006) included elevation as a covariate in several of the 12 spatial algorithms tested in their interpolation of daily temperature over British Columbia, Canada. They reported a range of MAEs from 1.22° to 1.59°C for maximum temperature and 1.55° to 1.99°C for minimum temperatures. Hasenauer et al. (2003) observed cross-validation errors slightly better than those of the present study using “DAYMET” (Thornton et al. 1997), reporting an MAE of 1.17° and 1.01°C for Tmin and Tmax, respectively. However, our mean bias errors compared favorably with their independent station validation results: they reported MEs of −0.3° and 0.1°C, versus our 0.23° and −0.03°C (Table 2), for Tmin and Tmax, respectively. The study by Bolstad et al. (1998) observed bias values ranging from −0.05° to 0.21°C for temperature using kriging, regression, and lapse rate–corrected interpolations, and DeGaetano and Belcher (2007) observed MAEs of 1.14° and 1.43°C and MEs of −0.062° and −0.015°C for maximum and minimum temperatures, respectively, using IDW adjusted for elevation.
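For readers unfamiliar with IDW, the following minimal sketch shows the general form of an inverse distance weighted estimate at a single grid point. It is not the automated procedure used to build the dataset: the function name, the distance exponent (`power=2.0`), the number of neighbors (`n_neighbors=4`), the use of simple distances in degrees, and the station values are all assumptions chosen for illustration.

```python
import math


def idw_estimate(target, stations, power=2.0, n_neighbors=4):
    """Inverse distance weighted estimate at `target` = (lat, lon).

    `stations` is a list of (lat, lon, value) tuples. Distances are
    computed in degrees for simplicity (an assumption; a real
    application would likely use great-circle or projected distances).
    """
    if not stations:
        raise ValueError("at least one station is required")
    ranked = []
    for lat, lon, value in stations:
        d = math.hypot(lat - target[0], lon - target[1])
        if d == 0.0:
            return value  # grid point coincides with a station
        ranked.append((d, value))
    ranked.sort(key=lambda pair: pair[0])
    nearest = ranked[:n_neighbors]
    # Weight each neighbor by the inverse of its distance raised to `power`.
    weights = [1.0 / d ** power for d, _ in nearest]
    weighted_sum = sum(w * v for w, (_, v) in zip(weights, nearest))
    return weighted_sum / sum(weights)


# Hypothetical stations: (latitude, longitude, Tmax in deg C).
stations = [(44.0, -90.0, 10.0), (44.0, -88.0, 20.0), (46.0, -89.0, 14.0)]
estimate = idw_estimate((44.0, -89.0), stations)
print(f"IDW estimate: {estimate:.2f} deg C")
```

Because the estimate is a weighted average, it always falls between the lowest and highest neighboring station values, which is the source of the smoothing of extremes typical of interpolated grids.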
We observed seasonal patterns in prediction accuracy for the temperature and precipitation grids (Figs. 10, 11), a result that is consistent across areas of differing terrain and station density and across different interpolation techniques and assumptions (e.g., Daly et al. 2007; Gyalistras 2003). This variation in error is likely associated with several factors, such as dominant seasonal weather patterns. For example, Stahl et al. (2006) observed significant seasonal variation in validation errors for all spatial interpolation techniques tested. In addition, DeGaetano and Belcher (2007) observed increasing MAEs for minimum temperature with increasing snow depth. The MAEs for Tmin in this study were generally highest for all CDs during the months with the greatest probability of snow cover. We also observed low mean absolute errors for precipitation during the winter and higher MAEs in the spring and summer months, a finding consistent with others (e.g., Nalder and Wein 1998).
Despite the difficulties of gridding daily precipitation events, we found that the overall performance of the IDW grids is comparable to previous daily (Thornton et al. 1997; Daly et al. 2007) and monthly (Price et al. 2000; McKenney et al. 2006) gridded climate datasets. For example, Shen et al. (2001) observed interpolation accuracies similar to those of this study using an IDW algorithm, and Xia et al. (2001) observed an MAE only marginally better than that reported here using a thin-plate-spline interpolation (1.17 versus 1.37 mm day−1). Furthermore, the results of this study are highly comparable to Hasenauer et al. (2003) for both cross-validation and withheld station validation statistics using DAYMET (Thornton et al. 1997) to interpolate both temperature and precipitation over complex terrain in Austria; however, the latter study had a significantly larger elevation gradient. Nevertheless, Hasenauer et al. (2003) used a 1-km-resolution digital elevation model (DEM) to account for topography in their interpolation.
By examining a longer temporal period (e.g., monthly data) we found the accuracy and realism of the PTotal grids increased (Fig. 7), suggesting that the propagation of error is minimal, which is consistent with Thornton et al. (1997). A suggested remedy for the daily PTotal is to use a desired threshold of minimum precipitation (e.g., <1 mm) when using the data to drive water balance calculations in process models and hydrological applications (e.g., estimating runoff and levels in catchments). We also recommend the use of monthly data for the analysis of some climatological trends. Overall, however, the precipitation grids (Fig. 4) correctly generated the winter dry and summer wet seasonal pattern of PTotal (Fig. 3), the west to east gradient of precipitation that is common for the winter and summer months, and the mesoscale pattern of high snowfall accumulation in the Lake Superior snowbelt in the far north (Moran and Hopkins 2002).
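One possible reading of the suggested threshold is sketched below: gridded daily values below a minimum amount are treated as dry days before the series is passed to a water balance calculation. The function names, the 1-mm default, the choice to set sub-threshold days to zero, and the sample values are assumptions made for illustration only.

```python
def apply_wet_day_threshold(daily_precip_mm, threshold_mm=1.0):
    """Set daily totals below `threshold_mm` to zero (treated as dry days)."""
    return [p if p >= threshold_mm else 0.0 for p in daily_precip_mm]


def count_wet_days(daily_precip_mm):
    """Count days with any nonzero precipitation."""
    return sum(1 for p in daily_precip_mm if p > 0.0)


# Hypothetical gridded daily PTotal (mm) for one cell over six days.
gridded = [0.2, 0.0, 3.5, 0.4, 12.0, 0.8]
filtered = apply_wet_day_threshold(gridded)

print("wet days before:", count_wet_days(gridded))
print("wet days after: ", count_wet_days(filtered))
print("total before (mm):", round(sum(gridded), 1))
print("total after (mm): ", round(sum(filtered), 1))
```

Note that zeroing the light-precipitation days reduces the accumulated total slightly, so users for whom the accumulated amount matters more than the wet-day frequency may prefer the unfiltered monthly sums.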
b. Limitations and potential uses of the data
As with any spatial interpolation algorithm used to generate gridded climate data, a given grid cell will likely contain a degree of “smoothing” of the data extremes, particularly where there were no observed data. Thus, record events of Tmax, Tmin, or PTotal for a given day will not be adequately represented in the gridded data, and the grids should not be used to identify such events. Similarly, the use of these data for legal purposes (e.g., trials and litigation) is not recommended, and those seeking information on the climate of a particular day in a specific location should always consult original station data or a climate expert.
Despite these limitations, regional interpolated climatic grids of daily and monthly temperatures and precipitation are useful for various purposes. As with previous datasets for which predicted values are based on observational records (Thornton et al. 1997; Rawlins and Willmott 2003; McKenney et al. 2006), our dataset represents historical information and variability that can be used to derive the occurrence and general trends of key events such as the last and first frosts, as well as daily statistics such as accumulated growing-degree days (AGDD). This gridded dataset provides a high-resolution alternative to coarser-scale data for regional-scale analyses such as risk assessment and input to ecological process models. The methodology presented is sufficiently portable, in that the methods can be used to derive climate databases for other regions where a dense network of COOP stations exists, with or without increased algorithm complexity depending on the region of interest, topographic characteristics, and other key factors controlling gridded accuracy (Daly 2006).
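As a simple illustration of the derived statistics mentioned above, the sketch below computes AGDD and the index of the last frost day from daily Tmax and Tmin series for one grid cell. The function names, the simple averaging method, the 10°C base temperature, the 0°C frost threshold, and the sample values are assumptions; appropriate choices depend on the crop or ecosystem of interest.

```python
def accumulated_gdd(tmax_c, tmin_c, base_c=10.0):
    """Running total of growing-degree days using the simple average method.

    Daily GDD = max(0, (Tmax + Tmin) / 2 - base). The base temperature
    is an assumption and varies by application.
    """
    total = 0.0
    running = []
    for hi, lo in zip(tmax_c, tmin_c):
        total += max(0.0, (hi + lo) / 2.0 - base_c)
        running.append(total)
    return running


def last_frost_index(tmin_c, frost_c=0.0):
    """Index of the last day with Tmin at or below `frost_c`, or None.

    To find the last spring frost, pass only the spring portion of the
    series (e.g., days before midsummer).
    """
    last = None
    for i, lo in enumerate(tmin_c):
        if lo <= frost_c:
            last = i
    return last


# Hypothetical daily values (deg C) for one grid cell in spring.
tmax = [12.0, 18.0, 22.0, 9.0, 25.0]
tmin = [-1.0, 4.0, 10.0, -2.0, 13.0]

agdd = accumulated_gdd(tmax, tmin)
frost_day = last_frost_index(tmin)
print("AGDD:", agdd)
print("last frost index:", frost_day)
```

Because interpolation smooths daily extremes, frost dates derived from the grids are better suited to describing general spatial patterns and trends than to identifying the conditions at a specific location on a specific day.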
5. Summary and conclusions
The societal importance of Wisconsin and other key forestry and agricultural states will continue to increase as the global population rises and an emerging market for biofuels develops in the next few decades. As we become increasingly reliant on the goods and services that are provided by our ecosystems in the Midwest, changes in mean climate and the frequency of extreme events may result in increased variability in ecosystem productivity across key forestry and agricultural regions, potentially compromising food and fiber supplies, and bioenergy feedstocks (Kucharik and Serbin 2008; Lobell et al. 2006; Scheller and Mladenoff 2005). Detailed assessments of the historical influence of climate on such things as forest productivity, water quality, and changes to hydrological systems, as well as crop production and yields, stand to be highly beneficial for the development of adaptive management and future planning purposes (Kucharik 2006). To facilitate these types of studies, high-resolution climate datasets for management and modeling purposes are increasingly desired, and development of such datasets will help society better understand how previous climate change has impacted ecosystem functioning and could help to develop adaptive strategies to combat the undesired consequences of continued climate shifts. We hope that our scientific colleagues, fellow resource managers, and policymakers will make use of the new dataset presented here in pursuit of their own research objectives.
This project was funded by the Wisconsin Focus on Energy Environmental Research Program. This work was also supported in part by the Wisconsin Initiative on Climate Change Impacts (WICCI), with the help of Pete Nowak and Lewis Gilbert. The authors thank Drs. Ed Hopkins and John Young of the UW—Madison Department of Atmospheric and Oceanic Sciences and Wisconsin State Climatology Office for their helpful suggestions and expert review in the preparation of this manuscript. We are also very grateful to Scott Gebhardt for his early contributions to this effort. We also extend our gratitude to three anonymous reviewers for providing constructive comments that helped improve this manuscript.
Corresponding author address: Shawn P. Serbin, University of Wisconsin—Madison, 1630 Linden Dr., Madison, WI 53706. Email: email@example.com