Most of the currently employed procedures for bias correction and statistical downscaling primarily consider a univariate approach by developing a statistical relationship between large-scale precipitation/temperature with the local-scale precipitation/temperature, ignoring the interdependency between the two variables. In this study, a multivariate approach, asynchronous canonical correlation analysis (ACCA), is proposed and applied to global climate model (GCM) historic simulations and hindcasts from phase 5 of the Coupled Model Intercomparison Project (CMIP5) to downscale monthly precipitation and temperature over the conterminous United States. ACCA is first applied to the CNRM-CM5 GCM historical simulations for the period 1950–99 and compared with the bias-corrected dataset based on quantile mapping from the Bureau of Reclamation. ACCA is also applied to CNRM-CM5 hindcasts and compared with univariate asynchronous regression (ASR), which applies regular regression to sorted GCM and observed variables. ACCA performs better than ASR and quantile mapping in preserving the cross correlation at grid points where the observed cross correlations are significant while reducing fractional biases in mean and standard deviation. Results also show that preservation of cross correlation increases the bias in standard deviation slightly, but estimates observed precipitation and temperature with increased likelihood, particularly for months exhibiting significant cross correlation. ACCA also better estimates the joint likelihood of observed precipitation and temperature under hindcasts since hindcasts estimate the observed variability in precipitation better. Implications of preserving cross correlations across climate variables for projecting runoff and other land surface fluxes are also discussed.
Global climate models (GCMs) have shown potential in simulating/projecting climate variables over historical and future time periods (Hawkins and Sutton 2009; Taylor et al. 2012). These projections are widely used to quantify future changes and associated uncertainties in hydrological responses induced by climate changes (Lettenmaier et al. 1999; Hay et al. 2000; Hanson and Dettinger 2005; Mejia et al. 2012; Singh and Sankarasubramanian 2014). However, the GCM grid resolutions from phase 5 of the Coupled Model Intercomparison Project (CMIP5) are too coarse for application in hydrologic models, thereby requiring downscaling. Dynamic downscaling and statistical downscaling are two commonly employed types of downscaling, with the former using regional climate models to downscale GCM projections (Giorgi et al. 2001; Hay and Clark 2003; Leung et al. 2003) and the latter using statistical models to obtain projections at a finer scale (Wood et al. 2004; Wilby et al. 2002; Gangopadhyay et al. 2004; Fowler et al. 2007; Maurer and Hidalgo 2008). The challenges in dynamic downscaling primarily arise from the difficulty in setting up the model for the spatial domain of interest and calibrating to the observed variables and because of the need of serious computational time (Wood et al. 2004; Maurer and Hidalgo 2008). Given the ease of model fitting and reduced computational cost, statistical downscaling is commonly used in climate application studies (Wilby et al. 2002; Gangopadhyay et al. 2004; Li et al. 2014; Mazrooei et al. 2015).
The primary assumption behind statistical downscaling is that the statistical relationship estimated between large-scale GCM projections and the local-scale data does not change with respect to time. Different types of statistical downscaling techniques have been employed, ranging from simpler regression methods (Huth 1999), semiparametric methods (Stoner et al. 2013; Sankarasubramanian and Lall 2003), and nonparametric techniques (Gangopadhyay et al. 2004) to complex weather generators (Wilks and Wilby 1999). Wood et al. (2004) provided the concept of bias correction and statistical downscaling (BCSD) based on quantile mapping. Recently, the Bureau of Reclamation (BOR; Bureau of Reclamation 2013) has developed monthly precipitation and surface temperature at ⅛° over the conterminous United States (CONUS) for projections from CMIP5 and phase 3 of the Coupled Model Intercomparison Project (CMIP3) using the BCSD concept. In this two-step process, bias correction of GCM projections is first performed based on univariate quantile mapping on the coarse GCM scale, and then spatial interpolation on the bias-corrected variable is carried out to obtain finer-resolution (⅛°) precipitation and temperature. In essence, BCSD provides bias-corrected climate projections by relating the nonexceedance probabilities of the model data with the nonexceedance probabilities of the observed data. Asynchronous piecewise regression models, primarily quantile mapping or regular regression on sorted (i.e., ascending/descending) data, also have shown their capability to downscale climate data at the local scale (Stoner et al. 2013). He et al. (2012) further extended the idea of quantile mapping for bivariate asynchronous measurements by proposing a notion of bivariate ranks and positions. However, extending the approach beyond bivariate techniques seems to be challenging because of the prescribed ranking methods. BCSD methods have also been modified to estimate joint densities of multiple variables for preserving cross correlation (Zhang and Georgakakos 2012; Mehrotra and Sharma 2016). But, both of the above approaches have not been evaluated for different seasons and over large regions. Another recently developed multivariate downscaling approach, Multivariate Adaptive Constructed Analogs (MACA), produces spatially disaggregated daily time series of multiple predictor variables (Abatzoglou and Brown 2012). MACA-downscaled products include temperature, precipitation, humidity, and wind variables, totaling 10 in number. MACA had shown satisfactory performance in exhibiting value-added information in tracking fire danger indices and periods of extreme fire dangers. MACA corrects bias in the GCM simulations, adjusts epoch for no analog situations under future climate scenarios, and identifies the patterns between the GCM and the observed fields using the constructed analogs (CA) algorithm (Hidalgo et al. 2008; Abatzoglou and Brown 2012). However, a rigorous comparison of the existing multivariate techniques in preserving the likelihood of observed moments of the data over a large spatial scale has not been carried out.
Most of the downscaling techniques are calibrated or develop downscaled products using historical simulations of 50 years. Climate projections from CMIP5 substantially differ from its earlier version (CMIP3) with increased spatial resolution, and it also encompasses experiments that focus on decadal variability with hindcasts and future projections obtained by initializing GCMs with observed ocean states (Taylor et al. 2012). There is a growing scientific consensus that at decadal time scales (10–30 years) the choice of the scenario for greenhouse gas emissions contributes little to the uncertainties in climate scenarios generated by different GCMs (Hawkins and Sutton 2009; Meehl et al. 2009; Taylor et al. 2012). Studies have shown that initializing the GCMs with observed SSTs provides better decadal predictions of surface temperature, SST, and other ocean circulation features (Keenlyside et al. 2008; Smith et al. 2010). Recently, Goddard et al. (2013) compared the ability of both schemes, initialized and uninitialized GCM projections, in developing decadal hindcasts, and they found that initialized decadal hindcasts provide better skill in predicting the observed temperature in comparison to the skill of uninitialized hindcasts in predicting observed temperature. CMIP5 hindcast runs were initialized with the observed ocean state at different times, with the archives having ten 10-yr hindcasts (1961–70, 1966–75, …, 2006–10) initialized every 5 years from 1960 and two 30-yr hindcasts initialized in 1960 (1961–90) and 1980 (1981–2010). However, spatial resolutions of these CMIP5 hindcasts are still coarser (around 1.4° × 1.4°) than needed for many climate application studies, thereby requiring downscaled products. To our knowledge, downscaled products for CMIP5 hindcasts are not available as of now. Given the importance of decadal hindcasts and projections for planning activity, this study also considers multivariate downscaling of hindcasts and compares the performance of downscaled products from both historical runs and hindcasts from CMIP5.
The objective of the current research is to propose and evaluate statistical downscaling techniques that preserve cross-correlation structure across variables. The proposed multivariate downscaling technique is developed by modifying the traditional canonical correlation analysis (CCA) method (Hotelling 1936) for relating variables that are not observed at the same time (i.e., asynchronous). CCA was also used in earlier studies for spatial downscaling of climate forecasts (Tippett et al. 2003; Oh and Sankarasubramanian 2012; Sinha and Sankarasubramanian 2013). We evaluate the proposed methodology by downscaling monthly precipitation and temperature from CMIP5 over the CONUS, and the performance of the proposed technique is compared with the existing univariate BCSD products. The proposed downscaling technique is also applied to both hindcasts and historical simulations from CMIP5 to understand how the technique preserves the observed cross correlation between precipitation and temperature over the CONUS. We compare three downscaled products [BOR’s BCSD, asynchronous canonical correlation analysis (ACCA), and MACA] in estimating the joint likelihood of observed precipitation and temperature.
The manuscript is organized as follows. Section 2 provides the details on the observed climate data and CMIP5 simulations and hindcasts over the CONUS. Following that, we propose the ACCA methodology for preserving multivariate correlation structure in climate change projections. Then, we evaluate the performance of the proposed technique and compare ACCA with other approaches. Finally, we summarize the findings of the study along with the discussion.
Monthly precipitation (pr) and average surface (2 m) temperature (tas) data from the Centre National de Recherches Météorologiques Coupled Global Climate Model, version 5 (CNRM-CM5), were used for downscaling. CMIP5 includes twentieth-century historical runs and decadal hindcast runs (Taylor et al. 2012). Experiments forced with representative concentration pathway (RCP) scenarios (van Vuuren et al. 2011) provide future projections. Decadal runs with GCMs initialized in 2005 provide the future projections for the period 2006–34 under different RCPs. We chose CNRM because of its reasonable ensemble size (five members) under historical runs and hindcasts (nine members). CNRM-CM5 was developed jointly by CNRM–Groupe d’Etude de l’Atmosphère Météorologique (GAME) and CERFACS to contribute toward CMIP5 multimodel ensemble (Table 1). The spatial resolution of the model is ~1.4°, which is regridded to ~1.0° using bilinear interpolation for the purpose of further downscaling.
For the purpose of evaluating the proposed multivariate downscaling approach, we consider the period 1950–99 from the CNRM-CM5 historical run, having five ensemble members. The historical runs are primarily used to compare with the existing univariate approach since BOR-downscaled products are primarily available for historical runs. We have used the 1960 decadal run (1961–90) and the 1980 decadal run (1981–2000) for our analysis. The 1960 decadal run (1980 decadal run) was initialized in 1960 (1980) with observed SST conditions in 1960 (1980). Both experiments span 30 years, and each has nine ensemble members. Hindcast runs are also regridded to have the same spatial resolution as historical runs. We consider hindcasts for evaluating multivariate downscaling products from hindcast and historical runs.
To compare our multivariate downscaling approach, we obtain the BOR-downscaled CMIP5 products. BOR has developed bias-corrected twentieth-century historical runs (Bureau of Reclamation 2013; Maurer et al. 2007). Since the bias correction approach is quantile mapping, it provides us the baseline for comparing the proposed multivariate downscaling approach. The BOR database also provides observed pr and tas over the CONUS for the period from January 1950 to December 2010 at a spatial resolution of ~1°. Though we have used the monthly dataset for this current study, the daily BCSD forcing dataset is also available from the BOR website (Wood et al. 2004).
The performance of our multivariate downscaling approach and the univariate approach is compared with the observed gridded variables of pr and tas from BOR. Downscaled products of CNRM-CM5 historical runs using MACA (version 2) are obtained from http://maca.northwestknowledge.net/. Since the spatial resolution of the MACA dataset is ⅞°, we average MACA products (monthly precipitation and monthly maximum and minimum temperature) to 1° using the Climate Data Operator (CDO) software. Monthly maximum and minimum temperature are averaged to obtain the monthly average temperature for CNRM-CM5 at 1°.
3. Motivation and ACCA methodology
To understand the need to preserve the cross-correlation structure in downscaling, we first present how the cross correlation between observed precipitation and observed temperature has changed over the CONUS between the two periods 1950–74 and 1975–99 (Fig. 1). Grid points exhibiting significant cross correlation for the 1950–74 (1975–99) period are shown with a cross (dot), while grid points showing significant cross correlation in both periods are shown with a triangle. Significance in cross correlation is computed at the 95% confidence level based on . For 25 years of data, grid points having cross correlation greater or lesser than ±0.42 are considered as significant.
From Fig. 1, the Northeast, the Ohio Valley, and Florida exhibit significant correlation in January only for the earlier period, and the cross-correlation strength decreases, being not significant over the latter period. The change in cross correlation for the latter period over the Northeast and the Ohio Valley is primarily due to the increase in mean monthly temperature and decrease in mean monthly precipitation during the latter period (see Fig. S1 in the supplemental material). Significant cross correlation exists in January for both periods over Montana. Significant cross correlations exist for both periods over California and over the Rockies during April, whereas significant cross correlation prevails during the latter (earlier) period over North Carolina and Virginia (coastal Northwest) in April. Insignificant cross correlation in the latter period over the coastal Northwest also appears to be due to the increase in mean monthly temperature over the latter period (Fig. S1).
In July, cross correlation is significant during both periods over the eastern and central Sunbelt regions, the coastal Northwest, and the northern Great Plains. Cross correlation in July also appears to be significant over the latter (earlier) period over the upper Midwest (Arizona and the Northeast). From Fig. S1, there is a marked increase in temperature for July in the latter period over the eastern United States. In October, significant correlation exists in both periods over Colorado. On the other hand, significant cross correlation existing in the earlier period over the west and Florida disappears in the latter period in October. From Fig. S1, it is clear that the mean monthly temperature has decreased in the latter period over the entire United States, except for the West Coast. Further, mean monthly precipitation has also increased over the Southeast, Tennessee Valley, and upper Midwest (Fig. S1). Some of the coastal regions (Northwest, Northern California, and Louisiana) also exhibit a significant correlation during the latter period. To summarize, the number of grid points exhibiting significant cross correlation has changed from 208, 145, 323, and 143 to 114, 170, 357, and 117 for January, April, July, and October, respectively, over the latter period. The changing patterns of monthly precipitation and temperature change the cross correlation between precipitation and temperature over the two time periods that we considered. The physical reasons causing these changing patterns, which are not investigated in this study, could be the natural variabilities of the climate that cause low-frequency signals (Devineni and Sankarasubramanian 2010 a,b) or the warming temperature trend (Karl et al. 2012). Thus, any downscaling method that we employ needs to accommodate the changing cross-correlation structure between the variables. Hence, we propose and evaluate multivariate downscaling methodology that preserves the correlation structure across multiple predictands and predictors.
b. Multivariate approach
CCA is a regression-based method. In climate studies, CCA has been applied to identify significant drivers associated with forecast skills (Barnett and Preisendorfer 1987) and to correct systematic errors in GCM simulations (Tippett et al. 2003). In this study, we modified CCA for multivariate downscaling.
Given X being the multiple variables from the GCM runs and Y being the corresponding observed variables over n years at a given grid point, our interest is to develop an empirical relationship between X and Y so that it could be applied for future projections (Stoner et al. 2013). For a given month, dimensions of X and Y will be (n × p), where p is the number of variables to be downscaled, and X and Y matrices are constructed for calibrating the proposed downscaling scheme. Let Z denote the validation set of GCM projections with a dimension of Z being (n′ × p), where n′ is the number of years considered for downscaling. Given X, Y, and Z as asynchronous data with observations and GCM responses having no time correspondence, we modify the traditional CCA for downscaling GCM projections.
Asynchronous Canonical Correlation Analysis
The steps in performing ACCA downscaling are summarized in Fig. 2:
Models are developed for each grid individually. As mentioned before, dimensions of X and Y are (n × p), and the dimension of Z becomes (n′ × p).
- Multivariate sorting of X and Y is based on joint probability. Given that climate change projections do not possess any time correspondence with the observed variables, it is common to employ asynchronous regressions for downscaling climate change projections (Stoner et al. 2013). Asynchronous regression is typically developed by sorted X and sorted Y in ascending or descending order. Univariate sorting is based on the quantiles of the fitted distribution (Koenker and Bassett 1978) or quantile mapping (Wood et al. 2004). Multivariate sorting can be done by sophisticated statistical methods such as bivariate ranks and positions (He et al. 2012). We propose a simpler, yet efficient, way by extending the concept of univariate sorting for multivariate sorting in which the calibration datasets X and Y were sorted based on their joint probability of occurrence. Given and being the respective means of X and Y and and being the covariance matrices of X and Y, we assume that X and Y follow the bivariate normal distribution: and , the higher the ranks are that should be assigned to the multivariate vector at each time step.
Joint probabilities are transformed to log space before comparison. Thus, both X and Y are sorted separately based on their joint probability of occurrence. These sorted X and Y correspond to each other asynchronously; hence, we call the procedure ACCA. Parameters X, Y, and Z were fitted in the log space to avoid estimating negative values in the downscaled variables. Hence, we considered the temperature in Kelvin scale for developing the ACCA.
- A canonical correlation model is then developed using X and Y. Canonical correlation, a parametric approach, is considered as the apex of regression-based modeling (Barnett and Preisendorfer 1987) by regressing various predictors and predictands in their reduced dimension. CCA defines an optimum linear combination of the prediction dataset that has the potential to explain the total variance in the predictands. CCA linearly rotates predictors X and predictands Y in such a way that their rotated components are orthogonal to each other, maximizing the explained variance on rotated Y using rotated X. Parameter U is the rotated component of sorted X and V is the rotated component of sorted Y. Parameters U and V can be calculated from Eqs. (3) and (4), respectively:
- Both eigenvectors and canonical correlations from calibration are used in validation to rotate future projections Z to obtain rotated GCM projections for developing downscaled product. Future GCM projection Z was first rotated to using eigenvector A: , canonical correlations from calibration were used: components are brought back to original space to get the downscaled variable using eigenvector B from calibration:
Finally, the downscaled variables are back transformed from the log space by taking the exponential. We built our model by using “cancorr” function in MATLAB release 2012a.
Bivariate sorting based on the joint occurrence does not change the actual cross correlation of the GCM or the observed dataset. It only changes the index of the dataset in a bivariate manner. All the reported cross correlation on evaluating the ACCA is after back transforming the ACCA-estimated precipitation and temperature into the original plane (i.e., taking exponentials on the ACCA-estimated values).
c. ACCA model development and validation (historical and hindcast)
ACCA was applied to both historical runs and hindcast experiments of CNRM-CM5. The proposed framework considered the entire period from January 1950 to December 1999 from raw historical GCM runs for model development. Downscaled bivariate historical GCM projections were compared with BOR’s univariate downscaled products of precipitation and temperature. BOR developed the bias-corrected downscaled products based on quantile mapping by considering the entire dataset over the period 1950–99. Hence, we compared the performance of ACCA in a similar way by developing the model using the entire dataset 1950–99. This analysis provides the baseline comparison of the proposed technique with the existing downscaled products.
The multivariate downscaling technique was validated for hindcast runs based on leave-5%-out cross validation. Under leave-5%-out cross validation, 1980 decadal experimental run results were used for model development by leaving out the year of prediction as well as an additional 5% of the remaining data (i.e., both X and Y). Then, using the model developed under leave-5%-out cross validation, we obtained the downscaled pr and tas for the left-out years. To compare results obtained from multivariate downscaling of hindcast runs, we developed separate univariate asynchronous regression. Univariate asynchronous regression (ASR), suggested by O’Brien et al. (2001), first sorts both the GCM variable and the observed data based on their probability of exceedance of the univariate variable and then performs the linear regression between the GCM variable and the observed data to get the regression model parameters. Model parameters thus obtained from ACCA were applied on the future GCM projections to get the downscaled projections of GCM data. The performance of the proposed regression technique, ACCA, was compared with the univariate ASR in downscaling.
d. Performance evaluation metric: Median fractional bias
It is important to note that the downscaled precipitation and temperature are developed for each ensemble member from CNRM-CM5. Downscaled ensemble members were used to calculate the fractional bias on the statistic of interest λ from GCM projections. Fractional bias for λ is defined as follows:
where θ (θ′) represents the observed (downscaled) statistic for a given ensemble member l. We considered the bias in estimating two observed statistics—standard deviation of pr and cross correlation between pr and tas—for evaluating the performance of the proposed multivariate technique. We calculated the median of fractional bias using all the ensembles available under historical/hindcast run at a given grid m. We prefer the median to be as close to zero as possible to indicate that the statistics of the downscaled variables preserve the observed statistic θ. We have divided medians of fractional biases into 10 categories at an interval of 0.25 for quantifying the spatial variability across the CONUS.
a. Comparing the performance of BOR products with ACCA historical runs
Historical runs from CNRM-CM5 consist of five ensemble members that lead to five values of the fractional biases for each statistic for a given grid point. The median of fractional bias at each grid point is calculated over these five members. Figure 3 shows median fractional biases in estimating the observed cross correlation between pr and tas from the historical ensemble runs. Four months from January to October are plotted from Fig. 3 (top) to Fig. 3 (bottom) while estimates from ACCA and quantile mapping estimates from the BOR are plotted in Fig. 3 (left) and Fig. 3 (right). Given a data length of 50 years over 1950–99 for the historical runs, cross correlations above (below) 0.29 (−0.29) are significant at the 95% confidence interval. Only those grid points for which the observed cross correlation between pr and tas is significant are marked with a black dot.
In January, ACCA maintains a small median fractional bias within ±0.25, whereas the BOR approach has the median biases between −0.25 and −0.49 in the northern region. Over other regions, ACCA underestimates the cross correlation with medians from 0 to −0.24. In April, a greater number of grid points over the central United States exhibit significant cross correlation compared to January. Under those regions, ACCA has the median fractional bias between 0 and −0.24. Similarly, the fractional biases over the western grid points are also low and negative for the multivariate technique. BOR’s univariate quantile mapping approach exhibits median fractional bias more than −1, indicating the inability to preserve the observed cross-correlation structure.
As shown in Fig. 1, many grid points exhibit the significant cross correlation during July compared to the other three months. In July, ACCA performs better in preserving the cross correlation with median fractional biases for cross correlation between −0.24 and 0.24 (Fig. 3). BOR’s quantile mapping performs better in July in comparison to other months, with median fractional bias around 0.25–0.49, but still worse than ACCA. In October, significant cross correlation between pr and tas shifts to the Northwest and the Southwest. ACCA overestimates (underestimates) the observed cross correlation between pr and tas in the Northwest (Southwest) with the median fractional biases between 0 and 0.24 (0 and −0.24).
Details of the performance of two approaches, ACCA and BOR products, in preserving observed monthly cross correlation between pr and tas for the historical runs are summarized in Table 2. The analysis considered grid points that exhibit statistically significant cross correlation. The values in the table provide the fraction of total grid points that has the median fractional bias within the range between −0.24 and 0.24. The relative performance of one model over the other model is specified in the parentheses. For example, during January, median fractional bias under ACCA is from 0.24 to −0.24 over 92% of the significant grids, and ACCA is better than BOR in 70% of the grid points exhibiting significant cross correlation. Considering all months, ACCA presents the median fractional bias in cross correlation between −0.24 and 0.24 in 90% of the grid points where the observed cross correlation is statistically significant. On the other hand, univariate downscaling from the BOR products shows a higher bias, with only 0%–20% of grid points having a low fractional bias (in the range from −0.24 to 0.24) during winter and spring months, but improves to around 55% of grid points in other months. Overall, regarding preserving the cross correlation, ACCA performs better (percent shown in parentheses in Table 2) over 70% grids than the univariate method. ACCA performs better than the quantile mapping products from BOR based on the number of grid points (percent of grid points shown without parentheses in Table 2) having the lowest fractional bias in cross correlation over all the months.
Median fractional bias in estimating the observed standard deviation for the two approaches under the historical runs is summarized in Table 3 over 12 months. Given the BOR’s univariate ASR approach being quantile mapping, it shows a negligible bias in standard deviation. The bias is almost completely removed from the GCM runs. For this analysis, we consider all the grid points over the CONUS. During January, ACCA exhibits the median fractional bias in standard deviation from −0.24 to 0.24 over 60% of the grid points. Results for July are very much the same as January for the ACCA. In July, ACCA shows a low median fractional bias over 56% of all grid points. However, during April and October, the amount decreases to 39% and 22%, respectively. Overall, the ACCA approach overestimates the observed standard deviation for most grid points across the country. BOR’s univariate performs better since it was developed to ensure the observed mean and standard deviation in precipitation and temperature over all the grid points. We are not reporting the median fractional biases in mean monthly precipitation and temperature since both approaches preserve the observed mean for the historical period, as regression methods are expected to maintain the long-term mean for the calibration dataset. However, the biases in estimating the mean and standard deviation are supposed to increase under the validation. Given that we have compared the performance of ACCA with BOR products under historical runs, we next compare the performance of ACCA with the univariate downscaling, ASR, for downscaling and bias-correcting hindcasts under the cross validation.
b. Performance comparison of ACCA and ASR for hindcasts
Since hindcasts typically consist of 10-yr/30-yr runs, we compare the performance of ACCA and ASR based on the leave-5%-out cross validation. Hindcast experiments from CNRM-CM5 consist of nine ensemble members, so the medians of fractional bias are calculated over nine members. Median fractional biases for cross correlation between pr and tas are plotted for three approaches and four months in Fig. 4. Similar to the historical runs, black dots are pointed at the center of grid points where the observed cross correlations between pr and tas are significant for the period 1981–2010. Statistical significance related to 30 years of monthly data is within ±0.37. Results for all the months are summarized in Table 4.
During January, ACCA keeps the median fractional bias for cross correlation within the small range (between −0.24 and 0.24) at most of the grid points. The univariate ASR approach underestimates the cross correlation at most of the grid points. From Table 4, ACCA preserves the cross correlation within the low range at 83% of significant grid points whereas univariate ASR approach performs better in 35% of the grid points. ACCA has the lowest cross correlation during January in 79% of grid points that exhibit statistically significant cross correlation. ACCA also underestimates cross correlations in April over the western United States. Univariate ASR underestimates significantly over the central and eastern United States. Univariate performance improves considerably during the summer, with median fractional bias in the low range over 76% of grid points during June and 64% of grid points during July. ACCA can preserve the significant cross correlations over the central United States with the median fractional bias value between −0.24 and 0.24. However, ACCA underestimates the cross correlation over the western United States. Considering all the months, ACCA performs better than the univariate ASR over 60%–70% grid points. Further, ACCA can limit the bias within the desired low range over 80% of grid points.
About preserving the standard deviation, the univariate approach resulted in slight biases in estimating the standard deviation of precipitation. Results for ACCA and ASR in preserving the observed standard deviation are summarized in Table 5. ACCA overestimates the standard deviation in precipitation during winter months. ACCA preserves the standard deviation for at least 40% grid points with the bias being within the desired range. Comparing ACCA performance with ASR, in general, ASR performs better regarding capturing the standard deviation in precipitation. Overall, ACCA preserves the cross correlation better, whereas univariate preserves the standard deviation better, and both schemes exhibit equal efficiency in estimating the mean. We address this trade-off by calculating the joint likelihood of pr and tas from both schemes in section 4d.
c. Initialized and uninitialized CMIP5 run: A comparison
One objective of this study is to compare the performance of the univariate and the multivariate downscaling techniques under the initialized hindcast runs and the uninitialized historical runs in preserving the observed moments. Nonexceedance probabilities [cumulative distribution function (CDF)] of cross correlations from the raw GCMs and the downscaled GCM runs under hindcasts and historical runs are shown for four months (Fig. 5). Observed, historical runs, and hindcast runs are indicated as red, blue, and green lines, respectively. CDFs of nine ensemble members of the hindcast runs and five ensemble members of the historical runs are plotted for grid points exhibiting significant cross correlation. Raw hindcast runs and raw historical runs underestimate the observed cross-correlation distribution in January and April (Fig. 5, left). For January and April, univariate bias correction when applied on the historical runs did not show any improvements from the raw cross correlation estimated by the GCMs (Fig. 5, center). Das Bhowmik (2016) also confirmed analytically that the cross correlation estimated by the univariate regression methods (e.g., ASR) simply reproduces the cross correlation estimated by the GCMs. However, the CDF of cross correlation of bias-corrected historical runs using ACCA better estimates the observed cross-correlation CDF. Further, ACCA estimates of cross correlation for the hindcast runs show lesser bias compared to the raw hindcast runs (Fig. 5, right). The improved performance of ACCA under hindcasts and historical runs is due to its ability to develop multiple-regression relationships by relating multiple predictands with multiple predictors within the canonical regression framework. Thus, the CDF of bias-corrected hindcasts and historical runs using multivariate downscaling better preserves the observed cross-correlation CDF.
d. Estimation of joint likelihood of observed precipitation and temperature
The proposed technique, ACCA, estimates the observed cross correlation better than ASR, but estimates observed standard deviation of both precipitation and temperature with increased bias. The increased bias in estimating one of the moments (i.e., standard deviation) could be interpreted as a trade-off between preserving standard deviation in precipitation and preserving cross correlation between pr and tas in selecting the right downscaling approach for climate application studies. To quantify this trade-off, we evaluate both ACCA and ASR on their ability to estimate the joint probability of downscaled monthly pr and tas over the historical (1950–99) and hindcast period (1981–2010) using the observed moments of precipitation and temperature by assuming the bivariate lognormal distribution (Sankarasubramanian and Lall 2003). Similarly, we also estimate the joint probability of estimating the observed monthly pr and tas considering the observed moments of pr and tas. For each grid point, the likelihood ratio for ACCA/ASR is calculated by dividing the joint probability of downscaled pr and tas of ACCA/ASR estimated using observed moments with the joint probability of observed pr and tas. Comparing the two likelihood ratios from ACCA and ASR, the model (i.e., ACCA/ASR) that has likelihood ratio closer to 1 is considered as the best performing approach, since the joint probability estimated from that model is closer to the joint probability of the observed estimated using the observed moments of the data (step-by-step methodology in the supplemental material).
Figures 6a and 6c provide the fraction of total grid points where the likelihood ratio of ACCA is better than the ASR likelihood ratio for the historical and the hindcast runs, respectively. The same has been calculated for grid points with statistically significant cross correlations, and the results are shown in Figs. 6b and 6d for the historical and the hindcast runs, respectively. Historical experiments with five ensemble members and hindcast experiments with nine ensemble members are considered. One potential reason for the improved performance of hindcast in comparison to the historical runs based on the likelihood ratio could be due to their improved performance in predicting the variability in observed temperature as the models are initialized with observed SSTs (in this case 1960) prior to the period of developing projections. Studies have shown improved ability of decadal hindcasts in projecting temperature that has been attributed to both the initialized SSTs and to forcing the model with observed greenhouse gas concentration (Keenlyside et al. 2008; Smith et al. 2010; Goddard et al. 2013).
In three months, January, April, and October from the historical runs, shown in Fig. 6a, ACCA performs better over one-quarter of all grid points since the likelihood ratio is close to 1, particularly for grid points exhibiting significant cross correlation. During July, in which almost half of all grid points have witnessed statistically significant cross correlation, the likelihood ratio of ACCA is better than the likelihood ratio of ASR, being better in more than half of all grid points. The performance of ACCA is 5%–10% better when only significant cross-correlation grid points are considered instead of considering all grid points. During January, April, and October, ACCA is better than ASR over 30% of grid points with significant cross correlation, and during July, ACCA exhibits better joint likelihood ratio compared to the ASR likelihood over 60% of grid points. On the other hand, the ACCA approach using the hindcast runs performs better than ASR over 80% of all grid points, almost for all the months and for all ensemble members, by keeping the likelihood of pr and tas closer to the observed likelihood. However, the ACCA performance is only slightly enhanced when significant cross-correlation grid points are considered for the hindcast runs. This result confirms that preserving cross correlation by ACCA results in a better estimation of the joint probability of pr and tas during those months where the dependency between pr and tas is statistically significant. ACCA performance is better at estimating the joint likelihood between pr and tas when the initialized runs are considered.
We also compared the likelihood estimation of ACCA with MACA products. MACA adjusts epoch and trends in the climate projections and then applies quantile mapping for bias correction. Following that, MACA applies CA (Hidalgo et al. 2008) to develop daily downscaled CMIP5 products. Version 2 of the MACA dataset is considered at a monthly time scale (http://maca.northwestknowledge.net/index.php). We performed a bilinear interpolation in CDO to bring MACA and ACCA products to the same resolution. Figures 7a and 7b show the fraction of grid points over which the performance of ACCA is better than that of MACA in estimating the joint likelihood of observed pr and tas for all grid points and for significant grid points, respectively, over the CONUS. The performance of ACCA is consistently better over all four months considered. During April and July, ACCA performs better than MACA over 80% of the significant grid points. ACCA also performs better than MACA over 70% of the significant grid points during January and October. Thus, the ability of ACCA in estimating the joint likelihood of pr and tas is better than both MACA and ASR. The main reason for MACA’s inability in estimating pr and tas cross correlation is primarily due to the fact that it uses quantile mapping for bias correction. Further, at the daily time scale, CA is applied jointly on the temperature variables, but the analogs for precipitation are identified separately (Abatzoglou and Brown 2012). These steps allow MACA to retain the coherence between the temperature variables but compromises with the coherence between precipitation and temperature. The primary information from this comparison of ACCA with ASR and MACA shows that estimation of cross correlation is very critical in preserving the joint likelihood of observed precipitation and temperature.
5. Discussion and concluding remarks
Currently employed univariate downscaling approaches, such as quantile mapping and ASR, have limited potential in preserving the cross correlation between multiple variables. Analyses show that cross correlation between precipitation and temperature is changing over the CONUS. Grid points with significant cross correlation are decreasing in magnitude, resulting in insignificant cross correlation during winter months. On the other hand, mainly during summer and fall months, cross correlation over the Midwest remained significant with limited/no change in magnitude over the central United States. Given such spatiotemporal variability in the dependency between precipitation and temperature, downscaling and bias correction techniques need to consider the cross correlation between precipitation and temperature along with the mean and standard deviation of climate attributes.
Hence, we proposed and evaluated a multivariate downscaling technique, ACCA, for preserving the cross correlation between precipitation and temperature. Results of multivariate downscaling were compared with the univariate downscaling—quantile mapping, ASR, and MACA. As a baseline, we compared the performance of ACCA with the univariate downscaled historical runs from the BOR and MACA. The multivariate downscaling approach, ACCA, performs better than the univariate downscaling techniques, ASR, or quantile mapping, in preserving the cross correlation. Both methods, univariate approaches and ACCA, showed slight bias in estimating the standard deviation of temperature; hence, those results are not presented. However, this improved estimation of the cross correlation comes with increased bias in estimating the standard deviation of precipitation by ACCA. Hence, we performed a likelihood analysis to understand how ASR, MACA, and ACCA preserve the joint likelihood of observed precipitation and temperature using the bias-corrected mean, standard deviation, and cross correlation from each technique. We find that ASR performs better in estimating the joint likelihood of observed precipitation and temperature during months when fewer grid points exhibit statistically significant cross correlation under historical runs. However, ACCA performs better than ASR in months (e.g., July) when most grid points exhibit the statistically significant cross correlation under historical runs. ACCA also exhibits better joint likelihood ratio than MACA products from CMIP5 downscaled historical runs over all months.
We also examined the skill of initialized CMIP5 runs with initialized SSTs (hindcasts) compared to uninitialized CMIP5 runs regarding estimating cross correlation between precipitation and temperature. For raw GCM runs, there was little or no skill in estimating cross correlation for both hindcast and historical runs, as their respective CDFs depart significantly from the observed CDF of cross correlation. Multivariate bias correction of historical and hindcast runs using ACCA has shown significant improvement in reducing the bias in the estimation of the cross correlation. Multivariate bias correction of the historical runs also exhibited significant improvement in estimating the cross correlation compared to the estimates obtained from ASR. On the other hand, downscaled cross correlation of historical runs using ASR did not result in any improvements in reducing the error in estimation of the observed CDF of the cross correlation. Comparing the performance of hindcasts and historical runs in estimating the joint likelihood of observed precipitation and temperature, we find that ACCA performs much better than ASR in all months for bias-correcting hindcasts. This partly arises from the improved ability of hindcasts in capturing the observed variability in precipitation.
The proposed multivariate scheme, ACCA, is applied to GCM and observed datasets having the same spatial resolution (1° × 1°). In principle, ACCA can also be used for downscaling. For instance, the ACCA algorithm could be employed for downscaling GCM fields to another spatial scale (e.g., ¼° or watershed scale) to obtain downscaled monthly precipitation and temperature. Thus, the proposed technique could be used for both bias correction and spatial downscaling. The proposed algorithm can also be extended to downscale variables at daily time scale. One approach would be to fit ACCA models between the observed and GCM daily fields by considering a window of a predetermined number of days. We intend to consider this as part of our ongoing work on developing downscaled products using ACCA.
Preserving the cross correlation between precipitation and temperature is also necessary for estimating land surface states such as soil moisture and runoff using downscaled GCM projections. Simulation of streamflow in large basins where many grid points exhibit significant cross correlation exist could benefit from ACCA, particularly in predicting low flows. For instance, if the significant negative cross correlation in the summer is overestimated by the downscaled products (i.e., negative values are closer toward zero cross correlation), it will result in less dependence between precipitation and temperature, which could potentially result in increased temperature after precipitation events. Such increased temperature exhibited in the bias-corrected fields could potentially induce more evapotranspiration, resulting in reduced runoff and soil moisture. On the other hand, if the significant positive correlation between precipitation and temperature during the winter is underestimated, cross correlation between the projected soil moisture and streamflow will also be biased, which will, in turn, impact the cross correlations between the snow storage and runoff/snowmelt generation in the spring. Since human-induced climate change will lead to further warming (Hawkins and Sutton 2009; Hansen et al. 2010; Goddard et al. 2013), cross correlation between pr and tas will also likely change. Thus, preserving the cross correlation in the downscaled variables is important for developing hindcasts of land surface fluxes and streamflow projections under the near-term climate change (Seo et al. 2016). Our future effort will focus on the importance of preserving cross correlation in climate forcings for developing streamflow projections under the near-term climate change.
This research is supported in part by the National Science Foundation Grant CBET-1204368. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. We also thank the two anonymous reviewers and the editor, Dr. Andy Wood, for their comments and detailed reviews, which led to substantial improvements in the quality of the manuscript.
Supplemental information related to this paper is available at the Journals Online website: http://dx.doi.org/10.1175/JHM-D-16-0160.s1.