General circulation models (GCMs) are essential for projecting future climate; however, despite the rapid advances in their ability to simulate the climate system at increasing spatial resolution, GCMs cannot capture the local and regional weather dynamics necessary for climate impacts assessments. Temperature and precipitation, for which dense observational records are available, can be bias corrected and downscaled, but many climate impacts models require a larger set of variables such as relative humidity, cloud cover, wind speed and direction, and solar radiation. To address this need, we develop and demonstrate an analog-based approach, which we call a “weather estimator.” The weather estimator employs a highly generalizable structure, utilizing temperature and precipitation from previously downscaled GCMs to select analogs from a reanalysis product, resulting in a complete daily gridded dataset. The resulting dataset, constructed from the selected analogs, contains weather variables needed for impacts modeling that are physically, spatially, and temporally consistent. This approach relies on the weather variables’ correlation with temperature and precipitation, and our correlation analysis indicates that the weather estimator should best estimate evaporation, relative humidity, and cloud cover and do less well in estimating pressure and wind speed and direction. In addition, while the weather estimator has several user-defined parameters, a sensitivity analysis shows that the method is robust to small variations in important model parameters. The weather estimator recreates the historical distributions of relative humidity, pressure, evaporation, shortwave radiation, cloud cover, and wind speed well and outperforms a multiple linear regression estimator across all predictands.
Climate change will impact socioecological systems (Staudinger et al. 2012), and evaluating local climate impacts requires regional climate data at fine spatial and temporal resolutions that match the modeled processes. While general circulation models (GCMs) provide projections of an extensive set of variables at spatial scales of ~100 km, these scales are far too coarse to fulfill the needs of a range of impacts models (Hansen et al. 2006; Ingram et al. 2002). To address this issue, coarse-scale variables can be transformed into finer-scale variables through the process of downscaling. However, most downscaled products only provide precipitation and temperature, whereas impacts models often need a broader suite of variables such as humidity, cloud cover, wind speed and direction, and solar radiation. Historically, these variables have not been the focus of downscaling approaches, partly because observations of these weather variables are not as extensive. While regional climate models (RCMs) can be used to produce this suite of downscaled metrics (Giorgi et al. 2009; Mearns et al. 2009; van der Linden and Mitchell 2009), RCMs are nontrivial to implement, requiring specialized expertise, extensive model parameterization, and high-performance computing resources. Statistical downscaling is an appealing alternative, and the relative pros and cons of dynamical versus statistical downscaling are summarized in Fowler et al. (2007). In this paper, we adopt a statistical downscaling approach, mainly for its computational efficiency and flexibility, developing an analog-based method that systematically produces a full suite of gridded meteorological data that have not been traditionally available.
Statistical downscaling methods are generally defined as techniques that relate large-scale variables (predictors) to smaller-scale variables (predictands). This general definition gives statistical downscaling the advantage of being extremely flexible, although this has led to a proliferation of approaches that can be difficult to neatly categorize (Rummukainen 1997; Maraun et al. 2010; Vaittinada Ayar et al. 2016). Vaittinada Ayar et al. (2016) break statistical downscaling methods into four categories: model output statistics (MOS), transfer functions (TFs), stochastic weather generators (WGs), and weather typing (WT)-based methods. The last three approaches, referred to as “perfect prognosis” downscaling, require temporal synchronicity between the predictor and predictand datasets for training, while the MOS approach works directly on model outputs, relating distributional characteristics between the predictors and predictands without calibration (Maraun et al. 2010).
MOS downscaling, which has a long history in numerical weather forecasting (Wilks 2006), relates modeled large-scale predictors to observed local-scale predictands. MOS techniques relate distributional characteristics between the predictors and predictands, and the main MOS methods are outlined in Maraun et al. (2010). For instance, bias correction with spatial disaggregation (BCSD; Wood et al. 2004) is a MOS method using quantile mapping that has been applied extensively in impact assessments in the United States.
TFs are often mathematical functions used to relate large-scale to local-scale observations. For example, Vaittinada Ayar et al. (2016) use generalized additive models as a representative TF method in their downscaling intercomparison project and Wilby et al. (2002) developed a multiple regression-based tool that has been widely applied (e.g., Ahmed et al. 2013). These TF methods are simple to implement but can underestimate variance.
WGs are statistical models that simulate realistic sequences of weather variables based on parameters derived from observed climate (Wilks and Wilby 1999). Comprehensive reviews of WGs can be found in Wilks (2010, 2012). WGs are commonly used for hydrologic, environmental management, and agricultural applications (Wilks 2002). However, significant challenges arise when applying stochastic WGs to climate change impacts assessments, especially for multisite or two-dimensional applications such as creating a gridded data product. While multisite WGs span a range of sophistication and structures, typical limitations include the inability to reproduce nonstationarity in future projections, spatial covariance across sites, covariance between variables, and temporal persistence of variables (Steinschneider and Brown 2013; Srikanthan and Pegram 2009).
Last, WT-based approaches involve the identification of large-scale circulation patterns that can be related to phenomena at the local scale. These methods are appealing but require careful choice of the predictor variable(s) (Jézéquel et al. 2018; Maraun et al. 2010). Analogs are a particular WT method whereby similar states of the atmosphere can be used to inform the generation of historical weather data or climate projections, typically at the daily time scale. A common use of analogs in statistical downscaling is to develop a set of one or more predictors (e.g., temperature, precipitation, geopotential heights, surface pressure) from a spatially coarse dataset that can be used to select one or a combination of analogs from a spatially fine dataset (Abatzoglou and Brown 2012; Hidalgo et al. 2008; Raynaud et al. 2017; Zorita and von Storch 1999). Analog approaches are often used to downscale temperature and precipitation (Abatzoglou and Brown 2012; Hidalgo et al. 2008; Maurer et al. 2010; Pierce et al. 2014), but have also been used to downscale wind, humidity, and evapotranspiration (Abatzoglou and Brown 2012; Martín et al. 2014; Pierce and Cayan 2016; Tian and Martinez 2012), as well as to develop meteorological reconstructions from sparse data (e.g., Schenk and Zorita 2012; Fettweis et al. 2013; Yiou et al. 2013). Statistical downscaling approaches can also be hybrids; for example, analogs can be used to design WGs (Yiou 2014). Analog approaches have the advantage that they can preserve the daily sequences of the GCM (Pierce et al. 2014), which can be relevant for impacts modeling, and they can also provide a broad suite of gridded daily weather variables that have not been made readily available for use by impacts models.
As mentioned previously, most of the focus of these statistical downscaling methods has been on precipitation and temperature, especially in terms of available gridded products. For instance, precipitation and temperature data that have been downscaled to ⅛° resolution across the continental United States using BCSD and several different analog approaches can be directly downloaded from the data repositories of phases 3 and 5 of the Coupled Model Intercomparison Project (CMIP3; CMIP5) (available at http://gdo-dcp.ucllnl.org; Brekke et al. 2013). These precipitation and temperature data can provide an excellent starting point for meeting the needs of the impacts modeling community as they are readily accessible. However, there is a need for a general method that leverages these readily accessible, downscaled temperature and precipitation data to provide the full suite of meteorological data needed for impacts assessment.
In this paper, we develop and demonstrate an analog-based approach, which we call a “weather estimator,” that is practical, straightforward, and flexible. The weather estimator utilizes temperature and precipitation from previously downscaled GCMs (Maurer et al. 2010; Winter et al. 2016) to systematically select analogs from a reanalysis product, creating a complete daily gridded climate dataset containing a broad suite of weather variables needed for impacts modeling. This approach allows impacts modelers to create a complete daily gridded climate dataset from a paired GCM and reanalysis product; specifically, any GCM product containing temperature and precipitation and any reanalysis product that has a relatively complete set of weather variables with realistic covariance across space and variables. The weather estimator is encapsulated in an R package (https://www.r-project.org; accessed 12 August 2017) named “weatherAnalogs,” which is freely available to the wider community as open-source software.
2. Data and methods
a. Study area
The weather estimator is demonstrated over the Lake Champlain basin (Fig. 1), which includes western Vermont, northeastern New York State, and southern Quebec, Canada. The Green Mountains (running north–south through central Vermont) and a portion of the Adirondack Mountains in New York are the main topographic features within the watershed. Elevation ranges from 30 m above sea level to 1340 m above sea level. This area is of particular interest for climate change impacts modeling because of the nutrient loading, primarily from agricultural runoff, that has caused intense blooms of cyanobacteria for many decades and has become more prominent in the last 20 years (Facey et al. 2012; Isles et al. 2015).
b. Climate data
The weather estimator has the flexibility to be applied across a variety of regions and driven by a range of predictor and analog datasets; we describe here the data used for the application to the Lake Champlain basin. For the predictor dataset, we first downloaded bias-correction constructed analogs ⅛° GCM temperature and precipitation data (Brekke et al. 2013) from the CMIP5 (Taylor et al. 2012) repository. We selected four GCM ensemble members (MIROC-ESM-CHEM, MRI-CGCM3, NorESM1-M, and IPSL-CM5A-MR) forced with representative concentration pathway 8.5 (Moss et al. 2010) with the objective of producing a bounding set of potential outcomes. Second, because of the complex topography of the Lake Champlain region, we used the elevation adjustment approach of Winter et al. (2016) to further downscale the data to 30 arc s (1/120°, or ~800 m). This resulted in a dataset of daily precipitation and temperature spanning from 1950 to 2099 that is hereinafter referred to as bias corrected, downscaled, and elevation-adjusted (BCDE). We note that choosing more physically relevant predictors would likely increase the accuracy of our analogs. However, in this manuscript we focus instead on how well key impacts-relevant variables can be predicted with the common constraint of having only temperature and precipitation as predictors.
For the analog dataset, we selected the North American Regional Reanalysis (NARR; Mesinger et al. 2006) because of its range of years available (1979–2014), coherence across space, time, and weather variables, availability of precipitation (a variable that is not typically assimilated), and adequate spatial resolution (~32 km) for our downstream impacts models. NARR is a reanalysis product that combines the National Centers for Environmental Prediction Eta atmospheric model and Regional Data Assimilation System to produce a dynamically consistent atmospheric and land surface hydrology dataset for North America (Mesinger et al. 2006). We used NARR monolevel daily means as the pool of potential analogs for the weather estimator. The set of surface and near-surface variables in the NARR monolevel dataset (NOAA/OAR/ESRL PSD 2019) includes a large number of common weather variables needed for climate impacts modeling. This study focuses on temperature (air.2m), precipitation (apcp), atmospheric pressure (prmsl), relative humidity (rhum.2m), cloud cover (tcdc), evaporation (evap), shortwave radiation flux (dswrf), and U- and V-wind speeds (uwnd.10m and vwnd.10m) because these weather variables are commonly required inputs for climate impacts models. The weather estimator could be used to estimate any weather variable in the NARR dataset with the caveat that the accuracy of the estimation will be limited by NARR’s ability to capture that weather variable and the weather variable’s correlation with the predictors.
While this study used GCM-based data with a resolution of 30 arc s for the predictor dataset and 32-km reanalysis data for the analog dataset because of their availability, a predictor dataset at any resolution finer than or near the resolution of the analog dataset is sufficient for the weather estimator. The difference in resolution is managed through the use of a set of tie points (described in the method below) to compare temperature and precipitation between the predictor and analog datasets and find the nearest analog.
The main purpose of the weather estimator is to find the analog in the predictand dataset (NARR) that is most like each data point in the predictor (BCDE) dataset. The weather estimator accomplishes this through the following main steps as illustrated in Fig. 2 and explained in detail below: 1) preprocess BCDE and NARR datasets; then, for each BCDE data point, 2) select a sample of temperature and precipitation grid cells, the tie points, from BCDE along with the corresponding NARR grid cells for all days within a temporal window, 3) standardize the temperature and precipitation values selected in step 2, 4) rank potential analogs by calculating the pairwise distances between the standardized BCDE and NARR temperature and precipitation values, and 5) select the nearest NARR analog. The R package can be used to generate a time series of weather variables at a single location or a gridded product over a two-dimensional study area. The more sophisticated two-dimensional case is used for the discussion below.
Before selecting the analog, there are several preprocessing steps. First, we average the daily maximum and minimum temperatures from BCDE simulations to estimate the daily average temperature, which is the temperature variable present in the NARR dataset.
Second, we detrend BCDE temperatures to prevent poor temperature matches to the pool of potential analogs because of future increases in projected temperatures. Projected warming of as much as 9°C by the end of the century (Fig. 3) leads to daily average temperatures that are rare or nonexistent in the historical record.
The temperature detrending adjustment is of the form
where y (e.g., 2015) and m (1–12) are the year and month of the date being detrended and slopeΔT and interceptΔT are the slope and y intercept of the temperature trend line determined by the linear best fit [standard error (std err) = 0.2585, coefficient of determination R² = 0.9791, significance level p < 0.001] of the mean annual temperature increase (Fig. 3) from the historical mean annual temperature (1979–2014) across the BCDE simulations used in this study. The S(m) scaling function is used to dampen the detrending in the cooler winter months, when the projected future temperature increases are more severe. The 0.25 multiplier in the scaling function bounds S(m) between 0 (winter) and 0.5 (summer) and was derived empirically by comparing the BCDE monthly temperature averages for 2090–99 to the NARR historical period (1979–2014). Detrending is applied starting in 2015 because this is the boundary between the historical NARR reanalysis data and projected BCDE simulations. The constants in these equations are specific to the GCMs, analysis time period, and study area used in a specific application and should be determined on a case-by-case basis.
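The first two preprocessing steps (averaging Tmax and Tmin, then detrending) can be sketched in Python with NumPy. Because the detrending equation itself is not reproduced in this text, the sketch assumes a hypothetical form in which a month-dependent weight multiplies the fitted trend line, T_detr = T − w(m)·(slope·y + intercept), applied only from 2015 onward. The names `month_weight`, `slope`, and `intercept` are placeholders to be fitted per application; this is an illustration, not the weatherAnalogs code.

```python
import numpy as np

DETREND_START_YEAR = 2015  # boundary between NARR (historical) and BCDE projections


def daily_mean_temperature(tmax, tmin):
    """Approximate daily mean temperature as the average of Tmax and Tmin."""
    return (np.asarray(tmax, dtype=float) + np.asarray(tmin, dtype=float)) / 2.0


def detrend_temperature(temp, year, month, slope, intercept, month_weight):
    """Remove an assumed linear warming trend from projected temperatures.

    Hypothetical form: T_detr = T - month_weight(m) * (slope * y + intercept),
    applied only for y >= 2015. month_weight is a user-supplied stand-in for the
    month-dependent scaling described in the text, not its exact equation.
    """
    temp = np.asarray(temp, dtype=float)
    year = np.asarray(year)
    month = np.asarray(month)
    trend = slope * year + intercept  # warming read off the fitted trend line
    weight = np.vectorize(month_weight, otypes=[float])(month)
    # Historical years (< 2015) are left untouched.
    adjustment = np.where(year >= DETREND_START_YEAR, weight * trend, 0.0)
    return temp - adjustment
```

The detrended values would be used only for analog matching; the original BCDE temperatures are kept for the output dataset.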
The detrended temperature is only used to select the NARR analogs. The final estimated weather dataset consists of the projected temperature and precipitation from BCDE and all other weather variables from the NARR analogs, preserving the projected temperature and precipitation trends from the GCM. The necessity of detrending temperature to find a suitable analog will impose some stationarity on predicted variables. Specifically, any trend in a predicted variable correlated with a temperature trend will be neglected. While this is a compromise, it both ensures a large pool of potential analogs and retains the seasonality of predicted variables. For some predicted variables, we expect the implications of this decision to be low given the relatively small or uncertain projected changes (e.g., wind speed, relative humidity), while other predicted variables will likely be affected to a more significant degree (e.g., evaporation). Therefore, temperature detrending should be applied with caution.
Third, we transform precipitation by taking the quadratic root of both BCDE and NARR precipitation values:
where Ptrans is the transformed precipitation and P is the original precipitation. Using the raw precipitation values introduces a negative precipitation bias in the selection of the historical analog because of 1) the substantial right skew of the P distribution and 2) the selection of the nearest analog based on Euclidean distance. Because of these two conditions, for any given BCDE daily precipitation value, the nearest analog NARR precipitation value has a higher probability of being to the left (less precipitation) on the distribution than to the right (more precipitation). This tendency leads to a dry bias. Other root transforms could be used to reduce the skewness to varying degrees (Tukey 1977; Jeong et al. 2012), but we found that the quadratic root was the most effective at reducing dry bias.
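The effect of a root transform on skewness can be illustrated with a short NumPy sketch. Because the exact root meant by “quadratic root” is not pinned down by the surrounding text, the exponent is left as a parameter `k` (k = 2 is a square root, k = 4 a fourth root); the function names are hypothetical, and the gamma-distributed sample merely stands in for right-skewed daily precipitation.

```python
import numpy as np


def root_transform(precip, k):
    """Apply a k-th root transform to daily precipitation to reduce right skew.

    Negative inputs (e.g., numerical noise) are clipped to zero first.
    The same transform must be applied to both predictor and analog datasets.
    """
    p = np.clip(np.asarray(precip, dtype=float), 0.0, None)
    return p ** (1.0 / k)


def sample_skewness(x):
    """Moment-based sample skewness (no small-sample correction)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return (d ** 3).mean() / (d ** 2).mean() ** 1.5


# Synthetic right-skewed "precipitation" for illustration only.
rng = np.random.default_rng(0)
sample = rng.gamma(shape=0.5, scale=8.0, size=10_000)
print(sample_skewness(sample), sample_skewness(root_transform(sample, 4)))
```

A less skewed distribution means a nearest-neighbor search is about as likely to land on a wetter value as a drier one, which is the mechanism behind the reduced dry bias described above.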
The last step in preprocessing is the calculation of the long-term averaged monthly means and standard deviations for temperature and precipitation over the entire NARR dataset. These values are used to standardize temperature and precipitation from the NARR dataset as well as the precipitation and detrended temperature from the BCDE dataset before the Euclidean distance metric is applied. The values of temperature in degrees Celsius are typically higher than the values of precipitation in millimeters per day. This results in a disproportionately large influence of temperature on the Euclidean distance metric used to find the nearest historical NARR analog. Calculating the Euclidean distance using values standardized by the mean and standard deviation eliminates this bias, equally weighting temperature and precipitation for the distance metric [see Eqs. (4)–(6)]. Other approaches, such as quantile mapping, may provide alternative methods for addressing increasing temperatures, skew in the precipitation data, and mismatched ranges of values for temperature and precipitation. However, these alternatives would need to be evaluated to identify any potential limitations or errors introduced by the approach.
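The monthly standardization can be sketched as follows, assuming the statistics are pooled over all days of a calendar month and all grid cells of the analog (NARR) pool; whether the original method pools across grid cells or keeps per-cell statistics is an assumption here, and the function names are hypothetical. The same NARR-derived mean and standard deviation would be applied to the detrended BCDE temperature and to the root-transformed precipitation of both datasets.

```python
import numpy as np


def monthly_climatology(values, months):
    """Long-term mean and standard deviation for each calendar month (1-12).

    values: array of shape (n_days, ...) from the analog pool.
    months: array of shape (n_days,) giving each day's month.
    Statistics are pooled over all days in the month and all grid cells.
    """
    values = np.asarray(values, dtype=float)
    months = np.asarray(months)
    mean, std = {}, {}
    for m in range(1, 13):
        sel = values[months == m]
        if sel.size:
            mean[m] = sel.mean()
            std[m] = sel.std()
    return mean, std


def standardize(x, month, mean, std):
    """Convert values to z-scores using the analog-pool statistics for `month`."""
    return (np.asarray(x, dtype=float) - mean[month]) / std[month]
```

After standardization, a one-standard-deviation difference in temperature and in precipitation contributes equally to the Euclidean distance, regardless of their original units.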
2) Selecting the analog
Once the preprocessing is complete, there are four primary steps to selecting an analog for each day. First, a random sample of temperature and precipitation grid cells from BCDE (hereafter referred to as tie points) is selected, together with the geographically corresponding NARR grid cells. To ensure that tie points are not spatially clustered, a coarser grid is superimposed on the BCDE grid and a single tie point is selected from within each of the superimposed grid cells. For this study, we divided the study area in Fig. 1 (red box) into a coarse 2 × 3 tie point grid and, from each grid cell of that 2 × 3 grid, randomly selected a single tie point from the BCDE grid. This choice of 6 tie points is based on our sensitivity analysis described in the results section. Using 6 tie points balances computational efficiency (fewer points) against a good overall match between the BCDE predictor grid and the chosen analog (more points). The tie points can be randomly selected on a daily basis, as in this study, or selected once for the entire estimation time period. In addition, the tie points could be deterministically selected if a priori knowledge is available to guide tie point selection, such as specific locations of interest for the associated impact studies.
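Stratified random selection of tie points can be sketched by splitting the fine predictor grid into equal blocks of a coarse overlay grid and drawing one cell per block. This sketch works in grid-index space and ignores masked cells (e.g., outside the basin); the lookup of the geographically matching NARR cell is not shown, and `select_tie_points` is a hypothetical name.

```python
import numpy as np


def select_tie_points(n_rows, n_cols, coarse_shape, rng):
    """Pick one random fine-grid cell inside each block of a coarse overlay grid.

    n_rows, n_cols: size of the fine predictor grid (must be >= coarse_shape).
    coarse_shape: (rows, cols) of the overlay, e.g. (2, 3) for 6 tie points.
    rng: a numpy.random.Generator, so selections are reproducible per seed.
    Returns a list of (row, col) indices into the fine grid.
    """
    row_edges = np.linspace(0, n_rows, coarse_shape[0] + 1).astype(int)
    col_edges = np.linspace(0, n_cols, coarse_shape[1] + 1).astype(int)
    points = []
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            # integers() excludes the upper bound, so the cell stays in this block.
            points.append((int(rng.integers(r0, r1)), int(rng.integers(c0, c1))))
    return points


# New tie points for each target day, as in the daily-random configuration.
rng = np.random.default_rng(42)
print(select_tie_points(100, 150, (2, 3), rng))
```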
Second, temperature and precipitation values are standardized for each tie point for both the target date of the BCDE simulation and all potential historical NARR analogs. As described above, the standardization parameters used for each target date are those calculated for the month m of the target date during preprocessing and are based on the entire NARR dataset:
Third, the standardized temperature and precipitation are used to calculate the distances between the BCDE target date and each potential NARR historical analog over the set of tie points. Only historical analogs within a user-defined window around the calendar day of the BCDE target date are considered. This places a seasonal constraint on analog selection so that, for instance, the selection of an autumn analog for a spring target date can be avoided. We use a window size of 61 days (±30 days from the target date) for our analysis based on the results of the sensitivity analysis described in the results section. Weighted Euclidean distance between T and P of the tie point grid cells is used as the distance metric:
where i is the index over the standardized tie points and wT and wP are the user-defined relative weights for temperature and precipitation. We set wT and wP to 1.0 for this study, but there could be climate impacts assessment applications where it is more important to capture weather variables more consistent with either temperature or precipitation.
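The seasonal window and the weighted distance can be sketched together. The distance below assumes the form d = sqrt(Σᵢ [w_T (ΔT′ᵢ)² + w_P (ΔP′ᵢ)²]) over standardized tie-point values, consistent with the definitions of i, w_T, and w_P above; the window treats the calendar as a 365-day circle so that late December and early January are neighbors (leap days are handled only approximately). Function names are hypothetical.

```python
import numpy as np


def in_seasonal_window(target_doy, candidate_doy, half_width=30, year_length=365):
    """True where a candidate's day of year is within +/- half_width days of the
    target's, wrapping around the year end (20 Dec and 10 Jan are 21 days apart)."""
    diff = np.abs(np.asarray(candidate_doy) - target_doy) % year_length
    return np.minimum(diff, year_length - diff) <= half_width


def weighted_distance(t_target, p_target, t_cand, p_cand, w_t=1.0, w_p=1.0):
    """Weighted Euclidean distance over tie points.

    t_target, p_target: standardized target values at the tie points, shape (n_tie,).
    t_cand, p_cand: standardized candidate values, shape (n_candidates, n_tie).
    Returns one distance per candidate.
    """
    dt = np.asarray(t_cand) - np.asarray(t_target)
    dp = np.asarray(p_cand) - np.asarray(p_target)
    return np.sqrt(np.sum(w_t * dt ** 2 + w_p * dp ** 2, axis=-1))
```

With w_t = w_p = 1.0, as in this study, both predictors carry equal weight; increasing one weight would favor analogs that match that predictor more closely.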
Fourth, we select the potential analog that has the minimum distance from the BCDE target data point, as defined by Eq. (6), as the nearest analog. Then, the full set of weather variables across the entire study region from the selected historical NARR analog is applied to the date being estimated, with the exception of temperature and precipitation. Temperature and precipitation are copied from the original BCDE data to guarantee that the projected climate trends in temperature and precipitation from the GCM are maintained in the output time series of weather variables.
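The final step, choosing the minimum-distance candidate and assembling the output day, might look like the sketch below. It assumes the analog fields are stored as a dictionary of arrays indexed by pool day, with hypothetical keys "temperature" and "precipitation"; any regridding between the ~32-km analog grid and the finer predictor grid is not shown.

```python
import numpy as np


def build_estimate(distances, candidate_ids, analog_fields, bcde_temp, bcde_precip):
    """Assemble one estimated day from the nearest analog.

    distances: shape (n_candidates,), one distance per candidate analog day.
    candidate_ids: analog-pool day indices aligned with `distances`.
    analog_fields: dict of variable name -> array (n_pool_days, ny, nx).
    bcde_temp, bcde_precip: original (not detrended, not transformed) predictor grids.
    """
    best = candidate_ids[int(np.argmin(distances))]
    # Every non-predictor variable is taken from the same analog day, which keeps
    # the variables physically and spatially consistent with one another.
    day = {name: field[best] for name, field in analog_fields.items()
           if name not in ("temperature", "precipitation")}
    # Temperature and precipitation come from the predictor so GCM trends are kept.
    day["temperature"] = bcde_temp
    day["precipitation"] = bcde_precip
    return best, day
```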
3. Results and discussion
We performed four analyses to assess the performance of the weather estimator. First, the relationships between temperature and precipitation and the estimated weather variables over NARR (1979–2014) are explored. Second, the sensitivity of the algorithm to different tie points and time windows is tested. The parameter values used in these analyses are shown in Table 1. Third, a historical cross validation was performed to assess the ability of the weather estimator to recreate a known historical climate distribution; and finally, the historical climate estimated by the analog-based weather estimator was compared to a more traditional climate estimation method, multiple linear regression.
a. Relationships between estimated weather variables and temperature and precipitation
The relationships between the estimated weather variables and temperature and precipitation have substantial implications for the accuracy of the weather estimator. To elucidate these relationships, we compared the distributions of each estimated weather variable across temperature and precipitation concurrently using a partial distribution matrix built with 7 temperature bins and 10 precipitation bins (Figs. 4 and 5). Each matrix element is a histogram of the estimated weather variable data sampled from 15 days before to 15 days after a target date over NARR (1979–2014) within the intersection of each temperature and precipitation bin. This analysis uses a smaller window (±15 days) than the weather estimator itself (±30 days) to ensure approximate stationarity. For brevity, only rows containing more than 3500 data points across the entire row are shown. For comparison, each partial distribution matrix contains over 100 000 data points for any given date ±15 days. To ensure that each histogram contains the same number of data points, the precipitation and temperature ranges were divided into 10 quantiles, calculated with the NARR data over the entire study region, with the exception that the first precipitation bin includes the lowest 40% of all precipitation values, that is, the largest set of lowest deciles that contain only zero-precipitation days.
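Building the elements of a partial distribution matrix can be sketched with `numpy.quantile` and `numpy.digitize`. The sketch assumes the ±15-day sampling has already been applied to the inputs; duplicated decile edges (e.g., several deciles equal to zero precipitation) are collapsed with `numpy.unique` so that tied deciles share a bin, which only approximates the binning described above. Function names are hypothetical.

```python
import numpy as np


def decile_edges(x):
    """Interior decile boundaries (10%, ..., 90%) of x, with duplicate edges collapsed."""
    return np.unique(np.quantile(np.asarray(x, dtype=float), np.linspace(0.1, 0.9, 9)))


def partial_distribution_cells(variable, temp, precip, t_edges, p_edges):
    """Group a weather variable by (temperature bin, precipitation bin).

    Returns a dict mapping (t_bin, p_bin) -> 1-D array of variable values;
    each array is the sample behind one histogram (matrix element).
    """
    variable = np.asarray(variable, dtype=float)
    t_bin = np.digitize(np.asarray(temp, dtype=float), t_edges)
    p_bin = np.digitize(np.asarray(precip, dtype=float), p_edges)
    cells = {}
    for key in set(zip(t_bin.tolist(), p_bin.tolist())):
        mask = (t_bin == key[0]) & (p_bin == key[1])
        cells[key] = variable[mask]
    return cells
```

Rows with too few points (fewer than 3500 in this study) could then be dropped by summing the cell sizes for each temperature bin.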
Changes in the histograms between adjacent elements in the matrix show that there is some relationship between the estimated weather variable and temperature, precipitation, or temperature and precipitation. Specifically, changes in the histogram matrix along columns, rows, and diagonally demonstrate an influence of precipitation, temperature, and temperature and precipitation combined on the estimated weather variable in the matrix, respectively. The larger the difference between adjacent histograms, the stronger the relationship between the estimated weather variable and temperature and precipitation.
Relative humidity histograms shift to the right and narrow as precipitation increases across all temperature bins (Fig. 4). In addition, there is a more dramatic shift to the right as temperature decreases across most precipitation bins. These changes in the relative humidity distribution show that relative humidity is closely tied to both temperature and precipitation. Most relationships between the estimated weather variables and temperature and precipitation are much more nuanced. For instance, atmospheric pressure histograms shift to the left between the first (little to no precipitation) and second (more significant precipitation) precipitation columns, but then are relatively similar when comparing across the remaining precipitation bins. This reflects the general expectation that low pressure is associated with rainy weather while high pressure is associated with drier weather.
The partial distribution matrices for the estimated weather variable V wind for two different seasons, winter (1 February) and summer (1 August), demonstrate that the relationships between temperature and precipitation and the estimated weather variables can change by season (Fig. 5). In the summer (lower matrix), the V-wind distributions shift left as the temperature cools, indicating a shift from light southerly winds to stronger northerly winds. The distributions also flatten as the temperature cools. These effects appear to lessen as precipitation increases. This left shift and flattening of the histograms is less prominent in the winter (upper matrix). This indicates that the relationships between temperature and precipitation and V wind are stronger in the summer months than in the winter months.
To quantify the relationships between the estimated weather variables and temperature and precipitation, the differences in the histograms across temperature and precipitation bins were calculated using the Perkins skill score (Perkins et al. 2007), or Sscore. The Sscore is an intuitive measure of the overlap between two histograms, with an Sscore close to zero denoting a poor match (nonoverlapping histograms) and an Sscore near one denoting a near-perfect match (overlapping histograms). This measure is well suited for assessing daily temperature and precipitation data and is a more rigorous standard than assessing statistical moments such as mean and variance. We calculated the Sscore between all pairs of elements in the 7 × 10 matrix where both histograms contained more than 500 data points to avoid biasing the Sscore toward outliers. We then grouped each pair by the distance between the elements using the Chebyshev metric (Deza and Deza 2009), where a one-bin shift in any direction (temperature, precipitation, or temperature and precipitation together) counted as a distance of 1. Last, the average Sscore was calculated across each distance for each of the estimated weather variables (Fig. 6).
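The Sscore of Perkins et al. (2007) is the sum, over common bins, of the smaller of the two normalized frequencies. A minimal sketch of the score and of the averaging by Chebyshev distance follows; the choice of histogram bins is left to the caller (the edges should span both samples), and the matrix elements are assumed to be stored as a dictionary keyed by (temperature bin, precipitation bin). Function names are hypothetical.

```python
from collections import defaultdict

import numpy as np


def perkins_sscore(a, b, bins):
    """Sum over bins of min(normalized freq of a, normalized freq of b).

    1.0 means identical histograms; 0.0 means no overlap.
    """
    fa, _ = np.histogram(a, bins=bins)
    fb, _ = np.histogram(b, bins=bins)
    fa = fa / fa.sum()
    fb = fb / fb.sum()
    return float(np.minimum(fa, fb).sum())


def mean_sscore_by_distance(cells, bins, min_count=500):
    """Average Sscore over element pairs, grouped by Chebyshev distance.

    cells: dict (t_bin, p_bin) -> values for one partial distribution matrix.
    Elements with min_count values or fewer are excluded from all pairs.
    """
    keys = [k for k, v in cells.items() if len(v) > min_count]
    by_distance = defaultdict(list)
    for i, k1 in enumerate(keys):
        for k2 in keys[i + 1:]:
            # Chebyshev distance: a diagonal one-bin shift also counts as 1.
            d = max(abs(k1[0] - k2[0]), abs(k1[1] - k2[1]))
            by_distance[d].append(perkins_sscore(cells[k1], cells[k2], bins))
    return {d: float(np.mean(s)) for d, s in sorted(by_distance.items())}
```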
Perkins et al. (2007) tested the sensitivity of the Sscore by randomly sampling 75% of a full probability distribution to generate 100 partial probability distributions. The lowest Sscore found between a partial and the full probability distribution was 0.97; therefore, Perkins et al. (2007) used this threshold (i.e., Sscore > 0.97) to identify two probability distributions as indistinguishable. Consistent with the analysis and discussion in Perkins et al. (2007), we set substantially lower thresholds to indicate significant (<0.8) and very significant (<0.6) differences between histograms. A drop in Sscore with increasing element distance indicates a relationship between the value of the estimated weather variable and the values of temperature, precipitation, or both.
The Sscore drops below the 0.8 threshold within a distance of one or two elements and nears or falls below the 0.6 threshold within a four-cell distance for five of the seven estimated weather variables. In addition, for all estimated weather variables, the Sscore drops consistently as element distance increases up through a distance of 6, the maximum distance across a single temperature row. This shows there is some predictive power of temperature and precipitation for all estimated weather variables. Based on these results, the weather estimator is expected to produce the best daily values for evaporation, relative humidity, and cloud cover. Conversely, temperature and precipitation had the least predictive power for pressure even though we expected a strong correlation between changes in temperature and precipitation and pressure because of the link between pressure, convergence, and precipitation. The results of this analysis should be considered specific to this region and might not be applicable to different geographies. Hence, relationships between the estimated weather variables and temperature and precipitation should be examined before applying this weather estimator to other study regions.
b. Sensitivity analysis
Each parameter listed in Table 1 influences which historical analog is selected by the weather estimator. This sensitivity analysis evaluates the effect of two of those parameters, the number of tie points and the size of the time window, by comparing the differences between the target temperature and precipitation from BCDE and the selected analog temperature and precipitation from NARR. Table 2 lists the six scenarios used to examine sensitivity across 4, 6, 12, and 20 tie points and time windows of ±15, ±30, and ±45 days. The number of tie points chosen balances a robust representation of the domain to maintain the significance of temperature and precipitation matches between BCDE and NARR (more tie points) with computational efficiency (fewer tie points). Varying time windows explores the trade-off between a small time window (±15 days), which could result in too few potential analogs to ensure a good match, and a large time window (±45 days), which could result in the selection of an analog that is seasonally inconsistent with the target date.
Ten randomly seeded simulations were performed for each of the six scenarios across the four BCDE ensemble members (1979–2014). Both the mean and the standard deviation of the temperature and precipitation biases decrease more slowly beyond about six tie points, making six tie points a good compromise between bias reduction and computation time (Fig. 7). A similar conclusion can be drawn for a window size of ±30 days. The modest standard deviations of the temperature and precipitation biases across the tested parameter values show that the weather estimator is robust to a suboptimal choice of these parameters.
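One way to organize such a seeded experiment is to run each scenario with several random seeds, record for each run the mean absolute difference between analog and target values, and then report the mean and standard deviation of that statistic across seeds. In the sketch below, `run_scenario` is a hypothetical callable standing in for a full weather-estimator run, and summarizing biases as mean absolute differences is an assumption about how the biases in Fig. 7 were aggregated.

```python
import numpy as np


def summarize_sensitivity(run_scenario, scenarios, n_seeds=10):
    """Summarize analog-minus-target biases for each parameter scenario.

    run_scenario : hypothetical callable (n_tie, window, rng) -> array of
                   analog-minus-target differences from one simulation
    scenarios    : iterable of (n_tie, window) pairs, e.g. (6, 30)
    Returns {(n_tie, window): (mean, sd)}, where mean and sd are taken across
    seeds of each run's mean absolute difference (requires n_seeds >= 2).
    """
    summary = {}
    for n_tie, window in scenarios:
        per_seed = np.empty(n_seeds)
        for seed in range(n_seeds):
            rng = np.random.default_rng(seed)  # one reproducible seed per simulation
            diffs = np.asarray(run_scenario(n_tie, window, rng), dtype=float)
            per_seed[seed] = np.mean(np.abs(diffs))
        summary[(n_tie, window)] = (float(per_seed.mean()), float(per_seed.std(ddof=1)))
    return summary
```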
c. Historical cross validation
The ability of the weather estimator to recreate known historical climate distributions was assessed with a historical cross-validation experiment. The NARR dataset (1979–2014) was chosen for this cross validation because historical values of the predictands are available for comparison. The historical time series of each estimated weather variable was generated one year at a time, with the year being estimated removed from the set of potential analog matches so that a date could not be matched to itself. For example, to estimate the historical series for 1982, the NARR observations from 1979 to 1981 and from 1983 to 2014 were compared with each day of 1982. After a historical estimate had been created for each year, the yearly estimates were concatenated to build the full historical estimate from 1979 to 2014. Four tie points, randomly selected each day using a 2 × 2 tie-point grid superimposed over the NARR grid, were used to compare the predictor grids with the potential analog matches, and a ±30-day window was used to constrain the potential matches; both choices are consistent with the results of the sensitivity analysis. A smaller tie-point grid was used here because the NARR grid is smaller than the BCDE grid.
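The leave-one-year-out design and the daily tie-point draw can be sketched as follows. Here the 2 × 2 tie-point grid is interpreted as dividing the field into four blocks and drawing one random cell from each block; that interpretation, and the helper names `leave_one_year_out` and `stratified_tie_points`, are assumptions for illustration.

```python
import numpy as np


def leave_one_year_out(years):
    """Yield (target_year, pool_mask) pairs; the mask drops every day of the
    target year from the analog pool so a date cannot be matched to itself."""
    years = np.asarray(years)
    for target in np.unique(years):
        yield int(target), years != target


def stratified_tie_points(ny, nx, grid=(2, 2), rng=None):
    """Draw one random (row, col) cell inside each block of a coarse grid
    superimposed on an ny-by-nx field (assumes ny >= grid[0], nx >= grid[1])."""
    rng = np.random.default_rng() if rng is None else rng
    row_edges = np.linspace(0, ny, grid[0] + 1).astype(int)
    col_edges = np.linspace(0, nx, grid[1] + 1).astype(int)
    points = []
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            # integers(low, high) samples from [low, high), i.e. inside the block
            points.append((int(rng.integers(r0, r1)), int(rng.integers(c0, c1))))
    return points
```

In use, each held-out year would be estimated from the pool days where the mask is `True`, with a fresh set of tie points drawn for every target day; the yearly results are then concatenated in order.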
The weather estimator closely recreates the historical distributions of all of the predictands, with Perkins Sscores consistently above the 0.8 threshold (see previous discussion) when compared year by year against the historical NARR distributions (Fig. 8). These results support our initial analysis of the historical relationships between temperature and precipitation and the predictands and show that temperature and precipitation do have some predictive power for the predictands.
d. Comparison with MLR
A similar historical cross-validation experiment using a multiple linear regression (MLR; Jeong et al. 2012) estimator was performed to compare our weather estimator with a more established method. For each year, historical estimates of the predictands were constructed by fitting, for each month, a linear regression between the predictors (temperature and precipitation) and each predictand using the other years in the NARR dataset, yielding monthly linear coefficient (β parameter) vectors. For example, to estimate the historical predictand series for 1982, the NARR observations from 1979 to 1981 and from 1983 to 2014 were divided by month into 12 datasets, which were used to derive 12 β parameter vectors. These monthly β parameter vectors were then applied to the temperature and precipitation for each day of 1982 to generate the historical estimate for that year. After a historical estimate had been created for each year, the yearly estimates were concatenated to build the full historical estimate from 1979 to 2014. The quadratic root of precipitation was used in the predictor dataset to match the preprocessing of our weather estimator, analogous to the approach of Jeong et al. (2012), who used the third root of precipitation as a predictor. These root transforms reduce the skewness of the precipitation distribution, making it closer to normal (Tukey 1977) and thus improving the ability of the multiple linear regression to use precipitation as a predictor.
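The monthly MLR baseline can be sketched for a single grid cell as follows. The sketch fits an ordinary least squares model with an intercept, temperature, and a root-transformed precipitation term for each calendar month, holding out one year at a time. It reads “quadratic root” as the square root (`root=2.0`); passing `root=3.0` gives the cube-root transform of Jeong et al. (2012). The function names are hypothetical, precipitation is assumed to be nonnegative, and any additional predictors the original MLR setup may use are omitted.

```python
import numpy as np


def _design(temp, precip, root):
    """Design matrix [1, T, P**(1/root)]; precipitation assumed nonnegative."""
    temp = np.asarray(temp, dtype=float)
    precip = np.asarray(precip, dtype=float)
    return np.column_stack([np.ones_like(temp), temp, np.power(precip, 1.0 / root)])


def fit_monthly_betas(temp, precip, predictand, months, root=2.0):
    """Fit one OLS beta vector per calendar month present in the data."""
    X = _design(temp, precip, root)
    y = np.asarray(predictand, dtype=float)
    months = np.asarray(months)
    betas = {}
    for m in np.unique(months):
        sel = months == m
        betas[int(m)] = np.linalg.lstsq(X[sel], y[sel], rcond=None)[0]
    return betas


def mlr_cross_validate(years, months, temp, precip, predictand, root=2.0):
    """Leave-one-year-out estimate: fit monthly betas on all other years,
    then predict each day of the held-out year from its own T and P.
    Days whose month is absent from the training years stay NaN."""
    years, months = np.asarray(years), np.asarray(months)
    temp = np.asarray(temp, dtype=float)
    precip = np.asarray(precip, dtype=float)
    y = np.asarray(predictand, dtype=float)
    X = _design(temp, precip, root)
    estimate = np.full(y.shape, np.nan)
    for yr in np.unique(years):
        train = years != yr
        betas = fit_monthly_betas(temp[train], precip[train], y[train], months[train], root)
        for m, beta in betas.items():
            sel = (years == yr) & (months == m)
            estimate[sel] = X[sel] @ beta
    return estimate
```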
The year-by-year distributions of the predictands from the multiple linear regression estimator are less similar to the observed NARR data than those from our analog-based weather estimator, as measured by the Perkins Sscore, especially for the two wind variables (Fig. 8). However, the linear regression estimator comes close to the performance of our analog weather estimator for relative humidity, evaporation, and shortwave radiation. While the multiple linear regression method is computationally much faster than our analog-based method, especially after the one-time calculation of the β parameter vectors, it does not recreate the yearly distributions of the predictands as well as our analog weather estimator.
Climate data at fine spatial and temporal resolutions have become essential for socioecological research and for applications in land management, conservation policy, and planning. Because of their coarse spatial and temporal resolution, GCM products have the advantage of filtering out some of the unpredictable noise associated with weather events and local-scale features, but they are too coarse and do not provide a comprehensive enough set of weather variables to meet the needs of socioecological studies (Hansen et al. 2006; Ingram et al. 2002). This gap motivated the development of the weather estimator.
Our weather estimator has several strengths. It can produce a full suite of weather variables at a relatively high spatial resolution, has low data requirements, is computationally efficient, and provides weather data that are consistent across space and variables. The weather estimator can determine appropriate historical analogs consistent with the future climate simulated by GCMs; by drawing tie points randomly each day, it can construct a large number of nonidentical simulated series that are useful for uncertainty analysis (Beck 1987); and it is generalizable to a new study region, provided that a high-quality reanalysis or gridded meteorological dataset (e.g., NARR, Daymet, North American Land Data Assimilation System) is available for that region. In addition, the analysis of the relationships between the estimated weather variables and temperature and precipitation shows that temperature and precipitation do have some predictive power for a wide range of other weather variables and can be used to find reasonable historical analogs for future projections. The sensitivity analysis shows that the weather estimator is robust to reasonable deviations from the optimal tie point and time window parameters, and the historical cross validation demonstrates that the weather estimator recreates the historical yearly distributions of the predictands well and outperforms a multiple linear regression model on the same task. Last, the weather estimator has already been used to generate weather variables for the lake hydrodynamic and water quality modeling component of an integrated assessment model (Zia et al. 2016) and is readily available in the “weatherAnalogs” R package (https://www.r-project.org; accessed 12 August 2017), making it a valuable community resource for the ongoing study of climate impacts.
The weather estimator also has limitations. First and foremost, its accuracy is constrained by the correlation between each estimated weather variable and temperature and precipitation. In addition, any limitations of the input data will be reflected in the estimated variables. For example, GCM projections have difficulty simulating short-term extreme events, so estimated weather derived from GCM projections will likewise lack these extremes. Second, the size and diversity of the pool of potential analogs affect the ability of the weather estimator to find analogs that closely match the target temperature and precipitation. Therefore, the weather estimator requires a sufficiently large analog dataset to find suitable analog matches.
The weather estimator can create a complete daily gridded climate dataset consisting of weather variables such as humidity, cloud cover, wind speed and direction, and solar radiation using the temperature and precipitation projections of a GCM and an analog dataset. For impacts assessments that rely on the spatial and temporal structure of weather variables, the weather estimator is a practical and robust tool to explore the effects of climate scenarios.
This material is based upon work supported by the National Science Foundation under Grants EPS-1101317 and OIA-1556770. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation. Brian Beckage acknowledges additional support from the USDA National Institute of Food and Agriculture through project accessions 1009564 and 1014484. Erin Towler is affiliated with the National Center for Atmospheric Research, which is sponsored by the National Science Foundation.