Gridded precipitation and temperature products are inherently uncertain because of myriad factors, including interpolation from a sparse observation network, measurement representativeness, and measurement errors. Generally uncertainty is not explicitly accounted for in gridded products of precipitation or temperature; if it is represented, it is often included in an ad hoc manner. A lack of quantitative uncertainty estimates for hydrometeorological forcing fields limits the application of advanced data assimilation systems and other tools in land surface and hydrologic modeling. This study develops a gridded, observation-based ensemble of precipitation and temperature at a daily increment for the period 1980–2012 for the conterminous United States, northern Mexico, and southern Canada. This allows for the estimation of precipitation and temperature uncertainty in hydrologic modeling and data assimilation through the use of the ensemble variance. Statistical verification of the ensemble indicates that it has generally good reliability and discrimination of events of various magnitudes but has a slight wet bias for high threshold events (>50 mm). The ensemble mean is similar to other widely used hydrometeorological datasets but with some important differences. The ensemble product produces a more realistic occurrence of precipitation statistics (wet day fraction), which impacts the empirical derivation of other fields used in land surface and hydrologic modeling. In terms of applications, skill in simulations of streamflow in 671 headwater basins is similar to other coarse-resolution datasets. This is the first version, and future work will address temporal correlation of precipitation anomalies, inclusion of other data streams, and examination of topographic lapse rate choices.
Historical quantitative meteorological datasets for use in land surface and hydrologic modeling are available from a multitude of sources for various temporal and spatial extents and spatial resolutions (Huffman et al. 2001, 2009; Maurer et al. 2002; Lin and Mitchell 2005; Hamlet and Lettenmaier 2005; Mesinger et al. 2006; Daly et al. 2008; Haylock et al. 2008; Saha et al. 2010; Abatzoglou 2013; Xia et al. 2012; Livneh et al. 2013; Thornton et al. 2014). These products provide estimates of many different fields ranging from precipitation only (e.g., Lin and Mitchell 2005) to full meteorological fields and surface energy terms (e.g., Xia et al. 2012).
These quantitative precipitation and temperature estimates are inherently uncertain. Gauge-based estimates lack the ability to capture the true spatial variability of precipitation due to low station density, particularly in complex terrain, and often suffer from gauge undercatch (Yang et al. 1998) or other sampling errors (Clark and Slater 2006). Ground-based precipitation radar estimates suffer from sampling uncertainty as a result of terrain blocking, curvature of the earth, beam spreading, and uncertainties in the rain rate–reflectivity relationship (Austin 1987; Westrick et al. 1999; Young et al. 1999; Dinku et al. 2002; Schiemann et al. 2011). Even improved estimates from polarimetric radars are subject to sometimes large uncertainty (Ryzhkov et al. 2003; Cifelli et al. 2011). Estimates of air temperature from station networks have similar causes of uncertainty as gauge-based precipitation estimates; they lack sufficient station density to resolve the spatial variability of temperature, particularly in complex terrain (e.g., Stahl et al. 2006).
In general, the aforementioned datasets are deterministic and do not provide uncertainty estimates or ensemble output for any of the fields estimated. There are some notable exceptions, especially in the remote sensing precipitation estimation community where retrievals are inherently highly uncertain. Kirstetter et al. (2015) have developed a new probabilistic precipitation methodology for ground-based radar, and many studies have modeled or evaluated errors in satellite rainfall estimates (Morrissey and Green 1998; Krajewski et al. 2000; Huffman et al. 2001, 2009; Bauer et al. 2002; Hossain and Anagnostou 2006; Bellerby 2007; Tian and Peters-Lidard 2007; AghaKouchak et al. 2009; Habib et al. 2009; Tian et al. 2009, 2013; Sapiano and Arkin 2009; Shen et al. 2010; Kirstetter et al. 2013a,b; Maggioni et al. 2014; Gebregiorgis and Hossain 2015). In one specific example, Krajewski et al. (2000) discuss the uncertainty estimates subsequently used by Huffman et al. (2001, 2009) and note that they are developed in a somewhat ad hoc manner. Additionally, ensemble rainfall products have been developed over the last decade for satellite and ground scanning radar precipitation estimates (Bellerby and Sun 2005; Teo and Grimes 2007; AghaKouchak et al. 2010; Bellerby 2013; Nogueira and Barros 2013; Alemohammad 2014; Greatrex et al. 2014; Skinner et al. 2015). Nogueira and Barros (2013) use stage IV data to generate 1-km downscaled precipitation ensembles through modified fractal Brownian surfaces (e.g., Bindlish and Barros 1996) in an effort to provide high-resolution information in an ensemble product. However, this product is constrained to the stage IV gridbox mean and does not truly provide an uncertainty estimate. Also, Alemohammad (2014) develops a Bayesian ensemble approach for a single satellite sensor with realistic uncertainty estimates.
Uncertainties in hydrologic model forcing are a major source of hydrologic simulation error (Kavetski et al. 2003; Nijssen and Lettenmaier 2004; Renard et al. 2010; Skinner et al. 2015), and proper uncertainty specification is necessary for robust ensemble prediction. Additionally, the lack of robust uncertainty estimates (if any at all) for observed fields limits the successful application of data assimilation in hydrology (Moradkhani et al. 2005; Clark et al. 2006). Ideally, all quantitative meteorological datasets, not just highly uncertain remotely sensed precipitation retrievals, should incorporate uncertainty estimates as an integrated and formalized component of dataset generation; doing so will begin to improve our understanding and characterization of forcing uncertainties.
Here we extend the work of Clark and Slater (2006) to develop a daily, station-based, ensemble dataset of precipitation and temperature at ⅛° resolution for the contiguous United States (CONUS), northern Mexico, and southern Canada for the period 1980–2012. The dataset is initially intended for use in forcing land surface and hydrologic models and hydrologic data assimilation studies but could also be used for atmospheric model validation studies. The paper is organized as follows. Section 2 summarizes the probabilistic interpolation system, which uses the methods of Clark and Slater (2006) with minor modifications. Next, general dataset performance, comparisons to several widely used deterministic datasets, and dataset availability are described in sections 3–6. Finally, section 7 provides concluding remarks and avenues for future improvements.
2. Probabilistic interpolation
Station data from the Global Historical Climatology Network-Daily dataset (GHCN-Daily; Menne et al. 2012a,b) and the U.S. Natural Resources Conservation Service (NRCS) Snowpack Telemetry (SNOTEL) observation network were compiled for this analysis. The GHCN-Daily data were supplied by the U.S. National Climatic Data Center (NCDC) and originate from a variety of networks within the United States including the Cooperative Observer (COOP) network; the Community Collaborative Rain, Hail, and Snow (CoCoRaHS) network; and the various automated airport weather stations [e.g., Automated Surface Observing System (ASOS)]. Observations from other countries are also in the GHCN-Daily, with observations from Canada and Mexico included in this study (Fig. 1).
a. Spatial interpolation
A serially complete station dataset was used to generate a ⅛° (~12 km) resolution, gridded daily ensemble of precipitation (Prcpens), daily mean temperature Tmean, and diurnal temperature range Trange on all nonwater pixels in Fig. 1, which is the phase 2 North American Land Data Assimilation System (NLDAS-2; Xia et al. 2009, 2012) domain and the grid used here. It covers all river basins in the CONUS (including contributing areas in Canada and Mexico). The Trange is defined as the difference between the daily maximum temperature Tmax and the daily minimum temperature Tmin. Details of the generation of the serially complete station dataset can be found in the appendix. After all processing, over 12 000 unique stations with precipitation observations, temperature observations, or both were included in the analysis. In total, 12 153 stations provided precipitation observations and 8953 stations provided temperature observations. The Trange was used in this work under the assumption that Tmean and Trange would generate more reliable station-derived lapse rates than Tmax and Tmin, and to avoid cases in which Tmax < Tmin. It is straightforward to recover Tmax and Tmin from Tmean and Trange. The general methodology of Clark and Slater (2006) was followed, with the following steps.
- Distance-dependent weights were computed at each grid cell for all stations within a given radius following Clark and Slater (2006):

w_i = [1 − (d_i/MAXD)^3]^3,   (1)

where w_i is the weight for station i used to populate the diagonal elements of the diagonal weight matrix W in the local regression equations defined in Eqs. (3) and (6), d_i is the distance of station i to the grid point being considered, and MAXD is a specified maximum distance. Stations with distances larger than MAXD received zero weight. In this work, we limited the number of stations considered at each grid point to the 30 closest stations and computed the weights with MAXD set to 100 km if all 30 stations were within 100 km. If there were fewer than 30 stations within 100 km, then MAXD was reset to 1 km larger than the distance to the thirtieth station and the weights were recomputed with the new MAXD for that specific grid point. Roughly 90% of the grid points in the CONUS have 30 stations within a 100-km radius. These choices result in W always being a 30 × 30 matrix with no zero-weighted stations.
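The weighting step above can be sketched in a few lines of code. This is a minimal illustration, assuming the cubic Shepard-type weight function of Clark and Slater (2006); the function and variable names are ours:

```python
import numpy as np

def station_weights(dists, max_stations=30, maxd=100.0):
    """Distance-dependent weights for the closest stations.

    dists: 1D array of station distances (km) from the target grid point.
    Returns (weights, indices) for the `max_stations` closest stations.
    """
    idx = np.argsort(dists)[:max_stations]  # 30 closest stations
    d = dists[idx]
    # If the most distant retained station lies beyond MAXD, extend MAXD
    # to 1 km past it so that no retained station receives zero weight.
    if d[-1] >= maxd:
        maxd = d[-1] + 1.0
    w = (1.0 - (d / maxd) ** 3) ** 3  # cubic Shepard-type weights
    return w, idx
```

Weights decay monotonically from the nearest station to the edge of the search radius, and the MAXD reset guarantees a fully populated 30 × 30 weight matrix everywhere.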
- Multivariate, locally weighted logistic regression was used to estimate the probability of precipitation (PoP) at each grid point using the vector of precipitation yes/no occurrence y_o from the closest stations, the locally varying weights W, and spatial attributes (initially latitude, longitude, and elevation, but any number of attributes can be incorporated):

PoP = 1/[1 + exp(−x_g β)],   (2)

where x_g is the row vector of spatial attributes for the target grid cell and β is the column vector of regression coefficients, determined iteratively following Loader (1999):

β = (X^T W V X)^(−1) X^T W V [X β + V^(−1)(y_o − p)],   (3)

where X is the design matrix of spatial attributes from the nearest stations, y_o is the vector of precipitation occurrence (zero or one), V is the 30 × 30 variance matrix with diagonal elements p_i(1 − p_i), and p is the vector of estimated PoP at each station. The vector β is initialized as a column vector of ones.
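The locally weighted logistic regression can be sketched with a standard iteratively reweighted least-squares (IRLS) update; the names and iteration count below are illustrative, not from the original code:

```python
import numpy as np

def local_logistic_pop(X, y, w, x_g, n_iter=20):
    """Locally weighted logistic regression for probability of precipitation.

    X: (n, p) design matrix of station attributes (leading column of ones);
    y: (n,) 0/1 precipitation occurrence; w: (n,) distance weights;
    x_g: (p,) attributes of the target grid cell.
    """
    W = np.diag(w)
    beta = np.ones(X.shape[1])              # initialized as a vector of ones
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))  # current PoP at the stations
        v = np.clip(p * (1.0 - p), 1e-8, None)
        V = np.diag(v)                       # variance matrix
        z = X @ beta + (y - p) / v           # IRLS working response
        beta = np.linalg.solve(X.T @ W @ V @ X, X.T @ W @ V @ z)
    return 1.0 / (1.0 + np.exp(-x_g @ beta))
```

Each grid cell solves its own small (30-station) problem, so the regression coefficients, and hence the PoP surface, vary smoothly in space.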
- Multivariate, locally weighted linear regression was used to determine the estimate of transformed precipitation at each grid point (PCP) using transformed nonzero station precipitation ŷ, the locally varying weights, and spatial attributes. In this paper, the transformation of nonzero precipitation for each time step at each grid point was performed using a simple power-law transformation:

ŷ = y^(1/α),   (4)

where y is the vector of original quantities (nonzero precipitation), ŷ is the transformed vector, and α is the power-law transformation coefficient, set to 4.0 after sensitivity testing. Then the PCP at a given grid cell is given as

PCP = x_g β,   (5)

where β is the vector of regression coefficients estimated by

β = (X^T W X)^(−1) X^T W ŷ,   (6)

where X is the design matrix of station attributes, W is the weight matrix, and ŷ is the vector of transformed precipitation amounts.
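The transform-and-regress step might look like the following sketch. We assume the fourth-root forward transform (so amounts are recovered by raising to the power α); function names are illustrative:

```python
import numpy as np

ALPHA = 4.0  # power-law transformation coefficient from the text

def transform(prcp):
    """Power-law (fourth-root, assuming alpha = 4) transform of nonzero precipitation."""
    return np.asarray(prcp) ** (1.0 / ALPHA)

def local_linear_pcp(X, y_obs, w, x_g):
    """Locally weighted linear regression for the transformed amount PCP.

    X: (n, p) station design matrix; y_obs: (n,) nonzero station precipitation;
    w: (n,) distance weights; x_g: (p,) grid-cell attributes.
    """
    W = np.diag(w)
    y_t = transform(y_obs)
    # weighted least squares: beta = (X^T W X)^-1 X^T W y_t
    beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y_t)
    return x_g @ beta  # transformed-space estimate PCP
```

For example, if every nearby station reports 16 mm, the regression returns PCP = 2.0 in transformed space, and raising to the fourth power recovers 16 mm.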
As noted above, we have limited consideration to the 30 nearest stations. Therefore, all vectors and matrices entering the regressions (the design matrices, weight matrix, and station observation vectors) have a maximum dimension of 30. At any given grid cell, the locally weighted logistic and linear regressions are applied using the unique vectors and matrices for the 30 closest stations to that cell.
- Uncertainty estimates E for the precipitation amounts were determined at each grid point by summing the regression residuals. This effectively combines spatial representativeness and individual station errors. From Clark and Slater (2006), E at a grid point is computed as

E = {[Σ_i w_i (ŷ_i − x_i β)^2] / Σ_i w_i}^(1/2),   (7)

where ŷ_i is the transformed observation at station i and x_i β is the corresponding regression estimate.
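A sketch of the residual-based uncertainty estimate, under the assumption that E is the square root of the weighted mean squared regression residual in transformed space:

```python
import numpy as np

def regression_uncertainty(X, y_t, w, beta):
    """Weighted residual-based uncertainty E at a grid point.

    X: (n, p) station design matrix; y_t: (n,) transformed station values;
    w: (n,) distance weights; beta: (p,) fitted regression coefficients.
    """
    resid = y_t - X @ beta                       # regression residuals
    return np.sqrt(np.sum(w * resid ** 2) / np.sum(w))
```

A perfect local fit gives E = 0, while large, heavily weighted residuals near the grid point inflate E, which widens the ensemble spread there.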
Two other modifications to the original Clark and Slater (2006) methodology were made for this paper. Clark and Slater (2006) computed climatological cumulative distribution functions (CDFs) of monthly precipitation for each station and then transformed the data into normal space before the regression steps using those CDFs. In this work, this step was dropped and only the power-law transformation was used.
Additionally, this work added topographic slope in the west–east and south–north directions to the regression as a first step toward introducing an automated method to account for windward and leeward slope precipitation differences. Smoothed topography using a 2D Gaussian low-pass filter with a size of 180 km and a standard deviation of 24 km was used to compute the two slope components. This was done to remove the high-frequency noise associated with the full-resolution topography and provide a closer match to precipitation enhancement along mountain ranges, with broad enhanced and decreased precipitation regions on the windward and leeward sides. No consideration for slope directions outside of the cardinal directions was given. For example, mountains with slopes of southwest–northeast direction will have positive slopes in west–east and south–north, and the regression can weight each component independently. Finally, for areas with little topographic slope, the regression equations used only latitude, longitude, and elevation as predictors. The change in sign of slope across mountain ranges allowed the locally weighted regression to account for relationships along slopes and to weight stations with dissimilar slopes properly, similar to the Parameter-Elevation Regressions on Independent Slopes (PRISM) method (Daly et al. 2008).
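The slope predictors can be illustrated with a separable Gaussian low-pass filter followed by centered differences. The grid spacing, array orientation, and function names here are assumptions for illustration only:

```python
import numpy as np

def smoothed_slopes(elev, dx_km=12.0, sigma_km=24.0, size_km=180.0):
    """West-east and south-north slopes of low-pass-filtered topography.

    A sketch: separable 2D Gaussian filter (window ~180 km, standard
    deviation ~24 km on a ~12 km grid), then centered differences.
    `elev` is a 2D array with rows ordered south to north and columns
    west to east (an assumed orientation).
    """
    half = int(size_km / dx_km) // 2
    x = np.arange(-half, half + 1) * dx_km
    k = np.exp(-0.5 * (x / sigma_km) ** 2)
    k /= k.sum()                                  # normalized Gaussian kernel
    # separable convolution: smooth columns, then rows
    sm = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 0, elev)
    sm = np.apply_along_axis(lambda v: np.convolve(v, k, mode="same"), 1, sm)
    dzdy, dzdx = np.gradient(sm, dx_km)           # south-north, west-east
    return dzdx, dzdy
```

Smoothing first removes the high-frequency terrain noise, so the resulting slope fields change sign broadly across mountain ranges rather than at every ridge and valley, which is what lets the regression separate windward from leeward stations.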
The differences in ensemble performance for precipitation between this version and the original stem from those changes in the methodology. Moving away from a climatologically based inverse-quantile transformation removed the assumption of a stationary climate when transforming observed precipitation and slightly improved computational efficiency. Additionally, limiting the regression to consider the 30 closest stations and recomputing weights where necessary slightly changed the character of the locally weighted regression. The move to a time-step-based, power-law transformation prior to the regression steps rather than building climatologically based CDFs and performing an inverse-quantile transform means that the transformed precipitation distributions are less Gaussian in some cases.
b. Ensemble generation
Each ensemble member was generated by using spatially correlated random fields (SCRFs) sampled from the standard normal distribution (Clark and Slater 2006). The SCRFs are generated using a conditional simulation approach (Johnson 1987). This method generates random numbers progressively for points on the grid, conditioning each new random number on the previously generated values. It imposes a spatial correlation structure through a spatial correlation length:

c_ij = c_0 exp(−d_ij / c_len),   (8)

where c_ij is the correlation between the ith and jth grid points, the initial correlation value c_0 and spatial correlation length c_len are estimated from the station data themselves, and d_ij is the distance (km) between the ith and jth grid points. A nested grid approach following Fang and Tacher (2003) was implemented by Clark and Slater (2006) and is also used here to increase the computational efficiency of the Johnson (1987) method.
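The exponentially decaying correlation model is a one-line function; the initial-correlation and correlation-length values below are placeholders, since the real values are estimated from the station data by season:

```python
import numpy as np

def grid_correlation(d_km, c0=1.0, clen_km=150.0):
    """Exponentially decaying spatial correlation between two grid points.

    d_km: separation distance (km). c0 and clen_km stand in for the
    initial correlation and spatial correlation length estimated from
    the station data (the values here are illustrative only).
    """
    return c0 * np.exp(-np.asarray(d_km) / clen_km)
```

Correlation is c0 at zero separation and decays by a factor of e over one correlation length, so distant grid points receive nearly independent random deviates.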
An SCRF was generated independently for each day for each of the estimated variables: precipitation, mean temperature, and diurnal temperature range. Precipitation occurrence at each grid point was determined using the regression PoP and the cumulative probability (CP) of the precipitation SCRF. Precipitation occurred at a grid point if CP was greater than 1 − PoP (step 2), so that precipitation occurs with probability PoP. If precipitation was deemed to occur, the precipitation amount was determined by first rescaling CP:

CS = [CP − (1 − PoP)]/PoP,   (9)

where CS is the rescaled CP of the SCRF. Then, the corresponding standard normal deviate of CS was found (RN) and the transformed precipitation amount was determined as

Y = PCP + RN E,   (10)

where Y is the power-law transformed precipitation amount for the current ensemble member at the current grid point, RN is the gridpoint normal deviate derived from the precipitation SCRF, PCP is the regression-estimated transformed precipitation gridpoint amount (step 3), and E is the gridpoint uncertainty estimate [step 4, Eq. (7)]. Finally, the actual precipitation amount (Prcpens) for the current ensemble member and grid point is

Prcpens = Y^α.   (11)
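The occurrence, rescaling, and back-transform steps above can be sketched for a single grid point and ensemble member. Note the occurrence test is written as CP > 1 − PoP, which makes precipitation occur with probability PoP; the inverse normal CDF is a simple bisection stand-in, and the clipping of negative transformed amounts to zero is our safeguard:

```python
import numpy as np
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def norm_ppf(p, lo=-10.0, hi=10.0):
    """Inverse standard normal CDF by bisection (illustrative stand-in)."""
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        if norm_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def ensemble_precip(scrf_value, pop, pcp, e, alpha=4.0):
    """Gridpoint precipitation for one ensemble member.

    scrf_value: standard-normal SCRF deviate at the grid point;
    pop, pcp, e: regression PoP, transformed amount, and uncertainty.
    Returns 0 when no precipitation occurs.
    """
    cp = norm_cdf(scrf_value)
    if cp <= 1.0 - pop:                 # occurrence with probability PoP
        return 0.0
    cs = (cp - (1.0 - pop)) / pop       # rescaled cumulative probability
    rn = norm_ppf(cs)                   # corresponding standard normal deviate
    # back-transform, assuming the fourth-root forward transform
    return max(pcp + rn * e, 0.0) ** alpha
```

Because one spatially coherent SCRF drives both occurrence and amount, neighboring grid points in a member wet up and dry out together rather than independently.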
Similarly, the mean temperatures and diurnal temperature ranges were modified by their respective temperature error terms and SCRFs (directly, without the power-law transformation or rescaling of the SCRF):

T_ens = T + RN_T E_T,   (12)

where T_ens is either Tmean or Trange for the current ensemble member at the current grid point, T is the regression gridpoint temperature, RN_T is the gridpoint normal deviate for the SCRF of Tmean or Trange, and E_T is the gridpoint uncertainty estimate for either temperature variable.
Spatial correlation lengths used in the generation of the SCRFs are determined from the station data for each meteorological season separately [December–February (DJF; winter), March–May (spring), June–August (JJA; summer), and September–November (fall)]. Temporal autocorrelation, determined from the station data, is included in the SCRFs for temperature using the serial temperature autocorrelation with a 1-day lag via

RN_t = ρ RN_(t−1) + (1 − ρ^2)^(1/2) RN′_t,   (13)

where t is the current time step, ρ is the lag-1 temperature autocorrelation, and RN′_t is the new, unconditioned SCRF for the current time step. The precipitation SCRF is conditioned using Eq. (13) by setting ρ to the mean observed cross correlation between precipitation and the diurnal temperature range and by using the current Trange SCRF in place of RN_(t−1). As a result, days with large diurnal temperature ranges are less likely to have precipitation, which is observed in the data since nonprecipitation days generally tend to have less cloud cover. This does not necessarily hold in all seasons or for all precipitation events and is an area that should be considered for future research. Figure 2 shows example SCRFs for Trange for two time steps and for precipitation for one. Since temperature has a high autocorrelation, the current and previous Trange SCRFs have high spatial similarity. Precipitation and Trange are negatively correlated, but only weakly; thus, in general, where the Trange SCRF is positive the precipitation SCRF tends to be negative, but with less spatial similarity.
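The lag-1 conditioning is a standard autoregressive blend of two standard-normal fields; a minimal sketch:

```python
import numpy as np

def condition_scrf(prev_field, new_field, rho):
    """Blend a previously generated SCRF with a fresh one using correlation rho.

    Both inputs are standard-normal fields. The output remains standard
    normal because rho**2 + (1 - rho**2) = 1, so variance is preserved
    while the desired lag-1 (or cross) correlation is imposed.
    """
    return rho * prev_field + np.sqrt(1.0 - rho ** 2) * new_field
```

For temperature, prev_field is yesterday's SCRF and rho is the lag-1 autocorrelation; for precipitation, prev_field is the current Trange SCRF and rho is the (negative) precipitation–Trange cross correlation.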
An important result of using SCRFs to generate gridpoint values is that this allows for gridpoint events at locations without observations that are more extreme (e.g., more precipitation) than any of the observations (plus the topographic correction). This is an important distinction from any interpolation-based schemes, in which gridpoint events are limited to the maximum observed value (plus topographic correction). This should improve the representation of extreme events in the ensemble (Gervais et al. 2014; Gutmann et al. 2014).
3. Ensemble characteristics
Precipitation statistics using leave-one-out cross validation were computed over 1980–2012 for roughly half (about 5500) of the 12 000+ stations to assess the behavior and characteristics of the ensemble. The results are summarized using reliability and discrimination diagrams, which are efficient summaries of probabilistic (ensemble) skill (Wilks 2006). Reliability diagrams display the calibration function of the ensemble: a line plot of the conditional probability of an event (here, precipitation above a threshold) as a function of the estimated probability (Wilks 2006), the ensemble probability in this case. A perfectly reliable ensemble has estimated probabilities that match the observed conditional frequencies for every event threshold (the calibration function follows the 1–1 line). A dry bias (underestimation of events) places the calibration function above the 1–1 line, while a wet bias (overestimation of events) places it below the 1–1 line.
Discrimination diagrams display the distributions of the conditional likelihood of the ensemble estimated probability given events and given nonevents. Good discrimination between events and nonevents is seen through a large separation of the two likelihood distributions. In the limit of perfect discrimination, the nonevent likelihood distribution is maximized at zero probability and the event likelihood distribution is maximized at a probability of 1 (Wilks 2006).
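The calibration function behind a reliability diagram can be computed by binning the ensemble probabilities; a minimal sketch (the bin count and names are arbitrary choices):

```python
import numpy as np

def calibration_function(prob_fcst, occurred, n_bins=10):
    """Empirical calibration function for a reliability diagram.

    prob_fcst: ensemble-derived event probabilities in [0, 1];
    occurred: 0/1 observed event indicator.
    Returns bin centers and the conditional observed frequency per bin.
    """
    prob_fcst = np.asarray(prob_fcst)
    occurred = np.asarray(occurred)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    obs_freq = np.full(n_bins, np.nan)
    idx = np.clip(np.digitize(prob_fcst, edges) - 1, 0, n_bins - 1)
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            obs_freq[b] = occurred[mask].mean()  # conditional event frequency
    return centers, obs_freq
```

A reliable ensemble yields observed frequencies that track the bin centers (the 1–1 line); points below the line in a bin indicate a wet bias at that probability.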
a. Precipitation statistics
Discrimination diagrams (Wilks 2006; Figs. 3a–d) indicate that the ensemble estimates a high probability of event occurrence when there is an observed event greater than 0 mm, and has good discrimination between events and nonevents, seen through the large separation of the event and nonevent probability distributions. For larger precipitation event thresholds, the ensemble is able to assign high probabilities of occurrence, except for the highest probabilities of occurrence for events >50 mm (Fig. 3d), and displays slightly less overlap in the two distributions for the highest threshold compared to Clark and Slater (2006).
Reliability diagrams (Wilks 2006; Figs. 3e–h) show that the ensemble has good reliability for all precipitation events (threshold >0 mm, Fig. 3e). As the precipitation event threshold is increased (Figs. 3f–h), a very slight wet bias at low probabilities of occurrence develops along with a slight dry bias for higher probabilities (>0.5), resulting in a slightly underconfident ensemble for probabilities between ~0.2 and 0.8. For the larger events, a slight wet bias is again present with a now-reduced dry bias for the higher probabilities of occurrence (Fig. 3g), with a slight wet bias at the highest probability. At the 50-mm threshold a persistent slight wet bias is present along with increased sampling uncertainty (Fig. 3h). This performance is quite similar to Clark and Slater (2006) for the lowest threshold and in general comparable at the higher event thresholds, except for the consistent small dry bias for larger thresholds and higher probabilities of occurrence. Clark and Slater (2006) note good calibration for all precipitation events with some deviations (both wet and dry biases) as the precipitation threshold is increased.
Figure 4 gives the difference between the ensemble and observations for a random subset of stations across CONUS from the leave-one-out cross-validation station set. The ensemble tends to slightly overestimate precipitation in the western United States and slightly underestimate it in the southeastern United States (Fig. 4a). The ensemble also slightly overestimates the standard deviation of the station time series across the western United States with less of a tendency to underestimate station standard deviation across the eastern and southeastern United States (Fig. 4b).
b. Spatial example
Monthly precipitation for June 1993 for two ensemble members is shown in Figs. 5a and 5b. Overall, the spatial patterns are very consistent between members, with little to no precipitation in the southwestern United States and two areas of higher precipitation totals along the U.S. Gulf of Mexico coastline and the central United States. A more detailed examination of the two members in the high accumulation zones shows that ensemble member 75 (Fig. 5b) has more precipitation in parts of the central United States, while ensemble member 11 (Fig. 5a) has generally higher accumulations along the coastline. Comparisons to the ensemble mean field (Fig. 5c) highlight the more discrete nature of the individual members with areas of deviations from the ensemble mean. Finally, the ensemble standard deviation highlights areas of higher or lower uncertainty (Fig. 5d). The two high-precipitation areas generally have similar monthly accumulations (~200–300 mm), but the Gulf Coast region has standard deviations as large as 70–80 mm (~30%), while the central United States generally has ensemble standard deviations of 25–50 mm (~10%–20%). This uncertainty information available through the ensemble will enable land data assimilation schemes to more properly account for the forcing data uncertainty in time and space.
4. Comparisons to deterministic forcing datasets
Ensemble mean spatial output is compared to three widely used deterministic forcing datasets: NLDAS-2 (Xia et al. 2012), Daymet (Thornton et al. 2014), and Maurer et al. (2002, 2013; hereafter termed “Maurer”). These deterministic datasets all make different methodological choices in their dataset development. A few of the important choices are summarized here and in Table 1. The NLDAS precipitation is generated by disaggregating the Climate Prediction Center (CPC) daily ⅛° precipitation analysis using CPC hourly gauge data, U.S. radar data, satellite precipitation products, and North American Regional Reanalysis (NARR) precipitation data. Note that the CPC ⅛° daily precipitation analysis uses PRISM (Daly et al. 1994, 2008) to account for topographic effects. Daymet is produced at daily time steps using observational data and only uses the observations themselves to estimate the precipitation and temperature topographic corrections (Thornton et al. 1997, 2014) and includes SNOTEL data. Finally, the Maurer dataset uses bilinear interpolation between observations, gridcell precipitation is scaled by the PRISM 1961–90 climate mean, and temperature is adjusted for elevation using a constant temperature lapse rate of −6.5 K km−1. All three datasets use all available observations for a given day; they do not create a static, serially complete station dataset before any spatial interpolation is performed.
As noted above, the ensemble uses GHCN-Daily and SNOTEL observations, determines the precipitation and temperature lapse rates each day from the station data, and uses a static serially complete station network to create the gridded fields (Table 1). The various methodological choices each dataset makes will introduce differences in the respective final products. We stress the term differences here, as the “truth” is unknown, especially in the sparsely observed complex terrain of the western United States. However, all three datasets have been widely used to force land surface or hydrologic models and have been shown to allow for realistic hydrological and land surface simulations.
a. Precipitation across datasets
The mean daily precipitation fields across the four datasets are very similar, with all products capturing the increased precipitation in the southeastern United States, arid regions farther west, and areas of enhanced precipitation corresponding to mountainous regions in the western United States and the Pacific Northwest (Fig. 6). Differences between the four products are generally small with a few noteworthy regions of consistently positive or negative differences, almost exclusively confined to regions with complex terrain. In general, the ensemble mean has more precipitation over most of the Intermountain West than Maurer (Fig. 7a), except in select locations along the U.S. West Coast. The ensemble mean is also slightly less than Maurer throughout most of the southeastern United States. NLDAS has similar or less precipitation than Maurer over most of the United States, with only a few regions having positive differences (Fig. 7b), while Daymet generally has more precipitation than Maurer (Fig. 7c) or than the ensemble mean (Fig. 7e). However, the ensemble product may still have too much lee-side precipitation as compared to the other products, even with the inclusion of the topographic slope in the regression.
Figure 8a shows the frequency of precipitation differences between Maurer and the other three datasets. Daymet is wettest compared to Maurer, while NLDAS and Maurer are most similar because of their similar datasets and PRISM corrections. The ensemble minus Maurer differences have the most spread, which could be expected considering the ensemble is the most distinctly different dataset (Table 1). Overall, the ensemble differences mostly lie within the other interdataset differences (Fig. 8b), with the largest differences occurring in complex terrain where the largest uncertainty resides (Fig. 7).
The differences across the datasets arise from the inclusion of different stations, topographic correction differences, and rescaling to climate normals (Table 1). Maurer and Daymet do not include station filling as a preprocessing step as done in the ensemble, choosing to use only available station data for each day (with Maurer never including SNOTEL), while NLDAS uses the daily ⅛° gridded CPC analysis, again from available data only. Maurer and NLDAS use PRISM topographic precipitation corrections while Daymet and the ensemble develop their own topographic corrections from the data. Additionally, the Maurer dataset is scaled to match the PRISM (Daly et al. 1994) climate mean state for the period 1961–90 while the other datasets are not (Table 1).
b. Focused comparisons
For the rest of the paper we focus only on the Maurer dataset since it is very widely used and is developed at the same grid resolution as the ensemble product from similar input datasets.
1) Seasonal precipitation
Segregating precipitation differences by season (Fig. 9) highlights larger differences across seasons than in the annual mean (Fig. 7a vs Fig. 9). In the Intermountain West, the absolute differences are largest during the high-precipitation winter season and smallest during the dry summer season. The ensemble is generally slightly drier than Maurer during the summer season across the eastern two-thirds of the United States, while it is similar or slightly wetter during the winter season. Figure 9 highlights that the ensemble product gives similar estimates to Maurer during large-scale winter precipitation over the eastern United States, but slightly lower mean precipitation rates than Maurer during the summer when precipitation is more intermittent and small scale. A general slight underestimate of precipitation was indicated in Fig. 4, and Fig. 9 highlights that a majority of the underestimation in the ensemble likely occurs during the summer. This likely stems from the use of a regression algorithm, while Maurer effectively uses bilinear interpolation [synergraphic mapping system (SYMAP) algorithm (Shepard 1968, 1984)]. Highly localized events may be underpredicted to a greater extent when using regression versus interpolation. Future improvements of the ensemble product will work to address this potential issue.
2) Wet day fraction
Wet day fraction can be a substantial source of uncertainty in hydrologic model forcing data when the other meteorological variables (i.e., humidity and radiative fluxes) are derived from precipitation and temperature data with an empirical algorithm such as the mountain microclimate simulation model (MTCLIM; Hungerford et al. 1989; Bohn et al. 2013). In MTCLIM, shortwave radiation estimation can be affected greatly by wet day fraction because shortwave radiation is reduced by 25% on a rain day as compared to a no-rain day.
The long-term wet day fraction (WF) for the Maurer data and each ensemble member is determined by summing all days with nonzero precipitation and dividing by the total number of days. For the ensemble, the mean over the 100 ensemble members is taken to produce an ensemble mean WF, which is compared to the WF from the Maurer dataset. Both datasets show higher WF in the northeastern United States as well as in the high terrain of the western United States and the Pacific Northwest (Figs. 10a,b). The ensemble system generally has a lower WF than Maurer (Fig. 10c). This is expected because the interpolation used by Maurer increases the number of days with precipitation at grid points between stations whenever either surrounding station has nonzero precipitation. However, locally the ensemble system has a higher WF in mountainous regions (e.g., the Colorado Rocky Mountains) because of the inclusion of SNOTEL station data and the use of SCRFs in the spatial interpolation, neither of which was used in construction of the Maurer dataset (Fig. 10c). The overall lower WF in the ensemble system matches the findings of Gutmann et al. (2014), who argued that gridded precipitation products such as Maurer often have WF values that are too high at grid points between stations.
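The WF computation itself is a simple threshold count; a sketch assuming arrays ordered (time, y, x) for one member and (member, time, y, x) for the ensemble:

```python
import numpy as np

def wet_day_fraction(prcp, threshold=0.0):
    """Long-term wet-day fraction per grid cell.

    prcp: array shaped (time, ny, nx) of daily precipitation; a day is
    "wet" when precipitation exceeds `threshold` (0 mm in the text).
    """
    return (np.asarray(prcp) > threshold).mean(axis=0)

def ensemble_mean_wf(prcp_ens):
    """Mean WF over members; prcp_ens shaped (member, time, ny, nx)."""
    return np.mean([wet_day_fraction(m) for m in prcp_ens], axis=0)
```

Averaging WF over the 100 realizations is also what smooths out the station-by-station "speckle" visible in the single deterministic analysis.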
Seasonal WF differences show that the ensemble generally has a lower summer WF than Maurer, while winter WF differences are generally smaller, except in the Intermountain West (Figs. 10d,e). The ensemble consistently has a higher WF than Maurer in the northeastern United States (Figs. 10c–e). This is likely attributable to the differences in the input station data. Also, the Maurer dataset has a noticeably noisier WF “speckle pattern” than the ensemble product (Figs. 10a,b), with the effect of the individual stations and their radius of influence more readily apparent because of the use of the synergraphic mapping system algorithm (Shepard 1968, 1984) in the Maurer dataset. The ensemble mean WF is smoother because of the averaging of the 100 realizations.
Annual mean temperature differences between the two datasets are generally minimal across the CONUS, except in regions of complex terrain, particularly the high terrain of the western United States. In these regions, the ensemble dataset has higher mean temperatures at high elevations and slightly lower mean temperatures at lower elevations than the Maurer dataset (Figs. 10f–h). This dichotomy is even more prevalent in DJF (Fig. 10i) than in JJA (Fig. 10j) or in the annual mean, with some mountain ranges having a DJF mean temperature up to 5 K higher than in the Maurer dataset. In most of the Intermountain West, the ensemble mean annual temperature is generally 1–3 K higher than the Maurer mean annual temperature.
This agrees with Mizukami et al. (2014), who found that the Maurer dataset is consistently colder than NLDAS (which also uses a constant temperature lapse rate but incorporates temperature data from NARR, a moderately high-resolution reanalysis product; Table 1), especially at higher elevations. The temperature differences in mountainous regions are due to the different treatments of the temperature lapse rate. Again, Maurer assumes a constant temperature lapse rate, whereas in the ensemble the lapse rate is determined from the input data and allowed to vary by day and grid cell (Table 1). This indicates that the −6.5 K km−1 lapse rate is generally too steep in the winter but more realistic in the summer (e.g., Stahl et al. 2006). Oyler et al. (2015) have shown there are temperature biases in SNOTEL data that can be traced to sensor changes in the network, especially in DJF, even after basic quality control, which may be enhancing the difference between the two methods in the Intermountain West. Since we have not homogenized the input datasets, this spurious trend (and others) will be present in the ensemble output; thus, it is not recommended to use this dataset for climate trend analysis.
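For reference, the constant-lapse-rate adjustment assumed by datasets such as Maurer can be written as a one-line function (names and units are illustrative); the ensemble instead estimates the lapse rate from the station data for each day and grid cell:

```python
def adjust_temperature(t_station, z_station_m, z_grid_m, lapse_k_per_m=-6.5e-3):
    """Extrapolate a station temperature (deg C) to a grid cell
    elevation (m) with a fixed lapse rate of -6.5 K km-1. A sketch of
    the constant-lapse-rate assumption discussed in the text, not any
    dataset's actual code."""
    return t_station + lapse_k_per_m * (z_grid_m - z_station_m)
```

With this fixed rate, a grid cell 1000 m above a 10°C station is assigned 3.5°C year-round, whereas observed winter lapse rates are often shallower, which is consistent with the cold high-elevation bias discussed above.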
5. Example applications
a. Performance in a hydrologic modeling system
To provide an additional comparison to deterministic forcing datasets, the ensemble mean forcing was run through a hydrologic modeling system consisting of the SNOW-17 temperature index snow model (Anderson 1973) and the Sacramento soil moisture accounting (SAC-SMA) conceptual hydrologic model (Burnash et al. 1973). The hydrologic modeling system was optimized using the shuffled complex evolution (SCE) global optimization routine (Duan et al. 1992) for 671 basins (Newman et al. 2014, 2015a). Figure 11 gives the results of the calibration and validation periods for the Daymet, NLDAS, Maurer, and ensemble mean forcing in terms of the Nash–Sutcliffe efficiency (NSE; Nash and Sutcliffe 1970). It can be seen that the CDFs for the 1-km dataset (Daymet) have consistently higher NSE values, while the various 12-km resolution datasets (NLDAS, Maurer, and ensemble) have essentially indistinguishable calibration and validation period performance over this large basin set. A note of caution is that the ensemble mean dataset has a very high wet day fraction. This has little consequence in the conceptual model used here, but it does impact energy balance models such as VIC (see Mizukami et al. 2015). Calibration using ensembles such as this can build on methods described by Skinner et al. (2015).
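The NSE metric used to compare the forcing datasets in Fig. 11 is straightforward to compute; a minimal sketch in Python:

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency (Nash and Sutcliffe 1970).
    1.0 indicates a perfect simulation; 0.0 indicates skill equal to
    using the observed mean as the prediction."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - np.mean(obs)) ** 2)
```

Here `obs` and `sim` would be the observed and simulated daily streamflow series for one basin over the calibration or validation period.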
b. Example basin simulation
An ensemble forcing dataset is particularly useful for estimating forcing uncertainties as discussed previously, but it is also useful for estimating model state uncertainties by providing many equally plausible spinup periods for a land surface or hydrologic model. Here we provide a very brief example application (Fig. 12) of the ensemble dataset to the Crystal River in central Colorado using a very simple temperature index snow model (SNOW-17) and conceptual hydrologic model (SAC-SMA). Basin mean forcing data generation methodology and model parameters were taken from Newman et al. (2014, 2015a). The ensemble spread nearly always envelopes the three deterministic datasets for daily mean temperature (Fig. 12a); however, the ensemble mean climatology is slightly warmer than all three deterministic datasets. Occasionally, the Maurer dataset lies below the coldest ensemble member, particularly in winter, because of the use of the constant temperature lapse rate. While the ensemble characterizes uncertainty given its methodological choices, a different set of methodological choices (e.g., constant temperature lapse rate as in Maurer) can create estimates that lie outside the range of the ensemble. Further examination of uncertainty in the context of methodological choices is outside the scope of this initial ensemble dataset.
For modeled snow water equivalent (SWE), the deterministic datasets are again nearly always enveloped within the ensemble spread. In this case, the comparison is less straightforward because the temperature index model parameters were optimized using daily streamflow calculated with the Daymet forcing dataset (Fig. 12b). The model includes a frozen precipitation gauge correction factor (~2 in this case) that amplifies precipitation differences. For reference, the ensemble 1 April SWE variability is roughly 10% of the ensemble 1 April climatology in this example.
6. Dataset availability
The final dataset is freely available online (http://dx.doi.org/10.5065/D6TH8JR2; see http://ral.ucar.edu/projects/hap/flowpredict/subpages/pqpe.php for more information; Newman et al. 2015b). The 100-member ensemble for the ⅛° grid (Fig. 1) includes daily precipitation, Tmean, and Trange and is provided in climate and forecast (CF) compliant netCDF format. It spans the entire NLDAS-2 grid and covers the time period from 1 January 1980 through 31 December 2012.
7. Summary

This paper addresses the need for uncertainty estimates in observationally derived gridded precipitation and temperature datasets by generating a station-based, ensemble precipitation and temperature dataset on a ⅛° grid over the CONUS, with coverage extended into northern Mexico and southern Canada to include all watersheds draining into the CONUS. The ensemble has good reliability for low-probability thresholds but is slightly underconfident and has a small dry bias at high probabilities for nearly all event thresholds (Fig. 3). Discrimination between event and nonevent likelihood distributions is generally very good, and the ensemble is able to assign nonzero probabilities to large precipitation events (Fig. 3). Figure 4 indicates that future improvements to the topographic corrections in the western United States may be needed, as well as further investigation into the impact of the number of stations included in the locally weighted regression and of noise estimates across the eastern United States, where a majority of the precipitation is localized and intense. Comparisons to three widely used deterministic products [Maurer et al. (2002), NLDAS-2 (Xia et al. 2012), and Daymet (Thornton et al. 2014)] find that the general spatial patterns and magnitudes of precipitation are similar across all the datasets and interdataset differences are comparable (Figs. 6–8).
Focused comparisons of seasonal precipitation (Fig. 9), WF, and temperature with the Maurer dataset show some key differences. Since the ensemble temperature lapse rates are derived from the station data, this dataset has higher Tmean values than Maurer et al. (2002), especially at high elevations during winter (DJF; Fig. 10). The higher temperatures may also be due to the inclusion of SNOTEL observations, which may have artificially increased temperatures over the last 20 years because of sensor changes (Oyler et al. 2015). Additionally, precipitation is not scaled by the PRISM climatology, and when combined with the SNOTEL data, this leads to differences in the higher terrain of the western United States. Improved WF fields are produced in the ensemble framework, which has consequences for the empirical derivation of other fields used in land surface and hydrologic modeling, such as downward shortwave radiation (Hungerford et al. 1989; Bohn et al. 2013; Mizukami et al. 2014). The WF field is smoother and generally has fewer days of precipitation throughout the domain, except in a few locations in the eastern United States and Intermountain West (Fig. 10).
We suggest two main avenues of research for these types of products. First, there is substantial scope for additional methodological improvements, such as examination of the underestimation of summer precipitation in the southeastern United States, improvement in topographic corrections, improvement of the representation of the temporal autocorrelation for precipitation, and improvement of the cross correlation between temperature and precipitation. Specifically, precipitation persistence is applied to the anomaly field (precipitation amount), while a more suitable scheme may be to use the temporal persistence in precipitation occurrence. This could be modeled using a Markov chain (Gabriel and Neumann 1962).
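A first-order, two-state Markov chain of the kind suggested by Gabriel and Neumann (1962) could model occurrence persistence; a minimal sketch, where the transition probabilities and function names are assumptions for illustration:

```python
import numpy as np

def simulate_occurrence(n_days, p01, p11, seed=None, start_wet=False):
    """Two-state first-order Markov chain for wet/dry occurrence.
    p01 = P(wet today | dry yesterday); p11 = P(wet today | wet
    yesterday). Returns a boolean array of simulated wet days."""
    rng = np.random.default_rng(seed)
    wet = np.empty(n_days, dtype=bool)
    prev_wet = start_wet
    for i in range(n_days):
        p_wet = p11 if prev_wet else p01
        prev_wet = rng.random() < p_wet
        wet[i] = prev_wet
    return wet
```

In such a chain the long-run wet day fraction converges to p01 / (p01 + 1 − p11), so occurrence persistence and WF could be controlled jointly, which a persistence model applied only to the amount anomalies cannot do.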
The second avenue consists of leveraging additional data sources in a data fusion framework to create more complete and robust ensemble datasets. With proper processing, radar, satellite, and even atmospheric model precipitation products can be incorporated in this type of ensemble system. Recent radar rainfall work has shown vast improvements and the ability to better quantify uncertainty at each step and produce merged radar–rain gauge analyses on smaller scales (e.g., Schiemann et al. 2011). Finally, more sparse observations of wind and radiation could be combined with short-term deterministic or ensemble model wind and radiation data to produce ensembles containing all necessary fields for energy-balance land surface modeling in a more dynamically consistent manner than currently possible.
Acknowledgments. This work is funded by the Bureau of Reclamation, U.S. Department of the Interior; the U.S. Army Corps of Engineers Climate Preparedness and Resilience Programs; and the National Science Foundation. The authors thank Kyoko Ikeda for her help with various GHCN-Daily processing steps. The NLDAS-2 data used in this study were acquired as part of the mission of NASA’s Earth Science Division and archived and distributed by the Goddard Earth Sciences (GES) Data and Information Services Center (DISC).
APPENDIX

Station Data Processing
a. Quality control
Extensive quality-control procedures are performed on the GHCN-Daily data (Durre et al. 2008, 2010). These procedures include spatial consistency checks, multiday temperature streak checks, climatological outlier checks, and others described in detail in Durre et al. (2008). However, potential errors with the data still exist, including systematic precipitation gauge undercatch during snowfall (Goodison et al. 1998; Yang et al. 1998) or changes in station equipment impacting temperature readings (Menne et al. 2009, 2010). Additionally, unlike the U.S. Historical Climate Network, GHCN-Daily data are not homogenized to account for station movement or urbanization issues (e.g., Menne and Williams 2005). Homogenization was not performed in this work as this dataset is not intended for climate trend analysis.
NRCS SNOTEL data used in this dataset did not undergo any prior quality control, so the general methods of Serreze et al. (1999) were applied. For temperature, these are as follows: 1) a gross error check, where Tmax > 45°C or Tmin < −45°C values are flagged as missing; 2) a streak check, where runs of two or more days with exactly the same Tmax or Tmin are flagged as missing; and 3) a standard deviation check, where Tmax or Tmin values outside three standard deviations of the station mean climatology for that date are flagged as missing. For precipitation, 1) daily values >250 mm are flagged as missing and 2) winter season (October–March) precipitation totals are corrected using a regional bulk undercatch ratio [see the appendix of Serreze et al. (1999) for more details]. Figure A1 presents an example SNOTEL site with the gauge undercatch correction applied (Fig. A1a) and another with various temperature issues (Fig. A1b). As with the GHCN-Daily data, no homogenization is performed on the SNOTEL data. Recent work (Oyler et al. 2015) highlights the dangers of using nonhomogenized time series for climate trend analysis.
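The temperature and precipitation checks above can be sketched in Python as follows; the function names, the vectorized run detection, and the per-day climatology inputs are assumptions of this illustration, not the exact Serreze et al. (1999) implementation:

```python
import numpy as np

def qc_snotel_temperature(t, clim_mean, clim_std, upper=45.0, lower=-45.0):
    """Apply the three temperature checks to a daily Tmax or Tmin
    series. t, clim_mean, clim_std are 1D arrays aligned by day, where
    clim_mean/clim_std hold the station climatology for each date."""
    t = np.asarray(t, dtype=float).copy()
    # 1) gross error check: physically implausible values
    t[(t > upper) | (t < lower)] = np.nan
    # 2) streak check: runs of two or more days with the exact same value
    same_as_prev = np.zeros(t.size, dtype=bool)
    same_as_prev[1:] = t[1:] == t[:-1]
    run = same_as_prev.copy()
    run[:-1] |= same_as_prev[1:]  # flag both members of each run
    t[run] = np.nan
    # 3) values outside three standard deviations of that date's climatology
    t[np.abs(t - clim_mean) > 3.0 * clim_std] = np.nan
    return t

def qc_snotel_precip(p, max_daily=250.0):
    """Flag daily totals above 250 mm as missing; the regional bulk
    undercatch correction is omitted from this sketch."""
    p = np.asarray(p, dtype=float).copy()
    p[p > max_daily] = np.nan
    return p
```

Flagged values become NaN here, standing in for the missing-value flag used in the actual processing.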
b. Serially complete time series
As was discussed in section 4, elevation lapse rates for precipitation and temperature are determined directly from the available station data for each time step. This requires the station data to be consistent in time and be serially complete to avoid large changes in derived lapse rates as stations are added or removed or report missing data for a given event. To meet this requirement, a serially complete station dataset was developed from the quality-controlled station data described above.
Initial station screening consisted of rejecting stations with less than 10 years of valid data during 1980–2012, with the remaining stations selected as target stations to be made serially complete (Eischeid et al. 2000). For a given target station, two data-filling methods are used. First, for missing data gaps of one or two consecutive days in Tmax or Tmin, simple linear interpolation is used to fill missing values. Second, for data gaps larger than 2 days in temperature and for all missing values in precipitation, a quantile matching method is used (Panofsky and Brier 1968). This process consists of the following steps for temperature and precipitation.
The closest 30 stations with at least 10 years of overlap with each target station are found.
For each day of the year (DOY), Spearman rank correlations (Lehmann and D’Abrera 1998) of the overlap stations with the target station are computed using a 31-day window for the valid overlap data.
The overlap stations are ranked from highest to lowest rank correlation for the current DOY.
Missing values at the target station for the current DOY are filled from the highest ranking overlap station with corresponding valid data based on quantile matching:
CDFs for the target and overlap station are computed for the 31-day window around the current DOY.
The cumulative probability of the overlap station value is computed from the overlap station CDF.
That cumulative probability is used in the target station CDF to determine the fill value for the target station.
If there are no overlap stations with valid data for a given missing value at a target station, then the target station is rejected. Using rank correlations and CDF matching removes the step of segregating stations by their reporting time used in Eischeid et al. (2000) because it accounts for the differences in the respective station distributions (Panofsky and Brier 1968; Wood et al. 2002). The methodology employed here produces consistent time series across the filled and nonfilled periods (Fig. A2). However, this is also a limitation of the method, in that direct CDF matching assumes the CDFs are stationary from the overlap to the missing record periods (Teegavarapu 2014; Fig. A3). Overall, 12 124 stations for precipitation and 9374 stations for temperature are included in the serially complete dataset, with 24.4% (25.6%) of the total precipitation (temperature) data filled and 59% (55%) of the precipitation (temperature) stations containing less than 25% filled data (Fig. A4).
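The CDF-matching fill described above can be sketched for a single missing day as follows; the use of empirical CDFs with linear quantile interpolation, and all names, are assumptions of this sketch:

```python
import numpy as np

def quantile_match_fill(overlap_value, overlap_window, target_window):
    """Fill one missing target-station value from an overlap station
    (Panofsky and Brier 1968). overlap_window and target_window hold
    the valid values from the 31-day window around the current DOY at
    the overlap and target stations, respectively."""
    overlap_sorted = np.sort(np.asarray(overlap_window, dtype=float))
    target_sorted = np.sort(np.asarray(target_window, dtype=float))
    # cumulative probability of the overlap value in the overlap CDF
    p = (np.searchsorted(overlap_sorted, overlap_value, side="right")
         / overlap_sorted.size)
    # invert the target station's empirical CDF at that probability
    return float(np.quantile(target_sorted, np.clip(p, 0.0, 1.0)))
```

Because the fill passes through the target station's own distribution, a systematic offset between the stations (e.g., an elevation difference) is removed automatically, which is why segregating stations by reporting time is unnecessary.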
The National Center for Atmospheric Research is sponsored by the National Science Foundation.