A novel multivariate quantile-matching nesting bias correction approach is developed to remove systematic biases in general circulation model (GCM) outputs over multiple time scales. This is a significant advancement over typical quantile-matching alternatives available for bias correction, as they implicitly assume that correction of individual variable attributes will lead to correction of dependence biases between multiple variables. Furthermore, existing approaches perform bias correction at a given time scale (e.g., daily), whereas applications often require biases to be addressed at more than one time scale (such as annual in the case of most water resources planning projects). The proposed approach addresses all these issues, and additionally attempts to correct for lag-1 dependence (and cross-dependence) attributes across multiple time scales. The approach is called multivariate recursive quantile nesting bias correction (MRQNBC). The fidelity of the approach is demonstrated by applying it to a vector of CSIRO Mk3 GCM atmospheric variables and comparing the results with the commonly used quantile-matching approach. Following this, the implications of the approach in hydrology- and water resources–related applications are demonstrated by feeding the bias-corrected data to a rainfall downscaling model and comparing the downscaled rainfall attributes for current and future climate. The proposed approach is shown to represent the variability and persistence related attributes better and can thus be expected to have important consequences for the simulation of occurrence and intensity of extreme events such as floods and droughts in downscaled simulations, of importance in various climate impact assessment applications.
Regional and catchment scales studies dealing with the assessment of impacts of climate change on hydrology are of great interest to water resources planners and managers in formulating decision making and mitigation policies. General circulation models (GCMs) are often used to obtain climate information necessary to run the catchment models used in these studies. However, these GCMs have limited capability to capture catchment‐scale or regional‐scale climate variations that are of direct relevance for assessment of potential impacts of climate change (e.g., IPCC 2007; Wood et al. 2004). Thus, there is a mismatch between the capability of current GCMs and preferred information at the catchment scales. Dynamical or statistical downscaling approaches are routinely applied to transfer the coarse-scale GCM information to the fine scale required in regional and catchment scales studies. Furthermore, inadequate knowledge of key physical processes (e.g., cloud physics) and their simplification in climate models often leads to systematic errors in the output of GCMs and/or regional climate models (RCMs) (Li et al. 2010). Therefore, postprocessing of raw outputs from GCMs or RCMs is normally considered an essential step before their use in downscaling or hydrological modeling studies (Maurer and Hidalgo 2008).
Several statistical bias correction approaches have been developed and proposed in the past. The underlying idea behind these approaches is to identify the possible biases between observed and GCM simulated variables for the current climate and correct both current and future climate GCM simulations for these biases under the assumption that they do not change over time. It is expected that such a postprocessor should correct more than one aspect (e.g., mean or variability) at multiple time scales (daily, monthly, and annual) of the raw GCM simulation. Daily or monthly standardization forms the most basic bias correction and is used routinely to correct for systematic biases in the mean and variances of GCM simulations (Wilby et al. 2004). Other bias correction approaches use quantile matching, correction factors, and transfer functions (e.g., Arnell and Reynard 1996; Chen et al. 2013; Chiew and McMahon 2002; Teutschbein and Seibert 2013; Mpelasoka and Chiew 2009; Ines and Hansen 2006; Li et al. 2010; Piani et al. 2010; Wood et al. 2004), as well as correction across multiple time scales (e.g., Johnson and Sharma 2012; Mehrotra and Sharma 2012).
The quantile matching or distribution mapping bias correction approach has been used extensively in climate change impact assessment studies (e.g., Cayan et al. 2008; Li et al. 2010; Teutschbein and Seibert 2013; Maurer and Hidalgo 2008). The approach corrects for the biases in the entire frequency distribution of the target variables. A variation of quantile matching, termed equidistant quantile matching (EQM), has been proposed by Li et al. (2010).
Commonly used quantile-matching approach deals with the biases at a selected single time scale (day, month, or year) and ignores the biases in persistence attributes. The nesting bias correction (NBC) approach (Johnson and Sharma 2012; Mehrotra and Sharma 2012) offers the advantage of postprocessing GCM simulations for biases in mean, standard deviation, and lag-1 persistence at multiple time scales. While all of the approaches above are applied to multiple variables of relevance in a single study (such as precipitation and potential evapotranspiration in a hydrological assessment, to a collection of atmospheric variables relevant for use in statistical or dynamical downscaling), the assumption that is made in such applications is that the joint dependence attributes of these variables does not change between observations and GCM simulations, an assumption that is flawed more often than not.
Given the importance of correcting for any mismatch in dependence attributes between simulations and observations, Piani and Haerter (2012) applied a univariate bias correction to precipitation time series conditional on a few discrete classes (bins) of the bias corrected time series of temperature. They reported significant improvements in the dependence structure between the two variables even with a relatively small number of temperature classes. In addition to traditional quantile matching–based bias correction approaches, copula-based methods have also been applied to consider the joint and/or spatial dependence across variables and grids (Mao et al. 2015; Vrac and Friederichs 2015) with realistic reproduction of observed spatial and temporal fields. Recently Mehrotra and Sharma (2015, hereafter MS2015) proposed a multivariate extension of the nesting bias correction approach to reproduce the observed dependence across meteorological variables in bias-corrected time series. The approach can be used to simultaneously correct for the biases in the distributional and auto- and cross-dependence attributes in a multivariate time series across multiple time scales of interest. This approach corrects for differences in the parameters of an assumed multivariate autoregressive first-order model defining the observations and simulations, and hence claims to correct for distributional biases as per the differences in the parameters (mean, standard deviation, and correlation) estimated.
While the parametric bias correction alternatives discussed above have obvious advantages, the apparent advantage of quantile-matching approach lies in its ability to correct for the biases in the entire frequency distribution of the variable including the extremes. On the other hand, in addition to correcting for the biases in the persistence related attributes, the nesting logic is capable of correcting for the biases simultaneously at multiple time scales. Following this, we adopt and combine the idea of equidistant quantile matching from the method proposed by Li et al. (2010), of persistence-based nesting bias correction from Johnson and Sharma (2012), and of the recursive logic from Mehrotra and Sharma (2012) and introduce recursive nesting at multiple time scales under a quantile framework. Additionally, we also adopt and propose a multivariate extension of this combination following the procedure detailed in MS2015. The resulting statistical bias correction approach is then referred to as multivariate recursive quantile-matching nesting bias correction (MRQNBC).
Please note the significant differences between the approaches presented here and discussed in MS2015. Both approaches work on totally different rationales. The bias correction approach of MS2015 corrects for the biases in means and standard deviations using these statistics of the observed and raw data (parametric) whereas equidistant quantile matching, adopted in the present study, does so using the entire empirical distribution of observed and raw series (nonparametric). Similarly, MS2015 corrects each value of the raw series using observed means and standard deviations (constant parameters) whereas EQM operates in the ranked space and corrects the distribution of raw series using the differences of observed and raw values at each quantile (variable scaling). As mentioned earlier, EQM offers an added advantage of correcting for the biases at the upper and lower tails of the distribution (extremes), which are of importance in any climate change impact assessment study. The bias correction framework presented here primarily follows on the equidistant quantile-matching bias correction approach of Li et al. (2010) and extends the approach to deal with the biases at multiple time scales, multiple variables, and low-order persistence attributes. The logic and the discussions related to the multivariate part of the approach presented here may look similar to what was presented in MS2015.
The reliability and/or robustness of any bias correction approach used to correct raw GCM outputs can be effectively tested by employing the bias-corrected time series either as boundary conditions or as an input to a downscaling or catchment modeling model and evaluating results. The output of these models depends significantly on the quality of the large scale climate model simulations that have been used as boundary conditions and/or input to the model used. It is recognized that a properly bias-corrected GCM or RCM output used to obtain downscaling/hydrological simulations will provide more accurate and realistic assessment of the changes in the hydrological cycle (Hagemann et al. 2011; Sharma et al. 2007). Keeping in mind these aspects, this study first investigates the impact of various quantile matching–based bias correction alternatives on the raw GCM output when compared to observations and thereafter examines and evaluates the influence of these bias-corrected GCM predictors on the downscaled rainfall. However, alternatives do exist to apply bias correction on the raw GCM or RCM rainfall directly with focus on the removal of biases in rainfall frequency, intensity, and extremes (e.g., Cannon et al. 2015; Ines et al. 2011; Grillakis et al. 2013; Bárdossy and Pegram 2011; Bárdossy and Pegram 2012; Pegram and Bárdossy 2013). The approach proposed here is quite generic and should be viewed as an extension to the existing quantile-matching univariate bias correction approach, which can be applied to all continuous variables.
The remainder of this paper is structured as follows. Section 2 presents an overview of the bias correction model proposed and of the downscaling model used in the study. Section 3 provides a description of the atmospheric variables considered and section 4 presents the results and discussions. Finally, the conclusions drawn from the study are presented in section 5.
The MRQNBC model structure builds on distribution matching, time nesting, multivariate modeling, and recursion. The discussion related to the multivariate modeling draws heavily on MS2015 and a summary of the essential material is reproduced here. In the following, we represent distribution attributes by the empirical frequency distribution of the data, and persistence attributes by the lag-0 and lag-1 auto- and cross-correlation coefficients, all modeled at four predefined bias correction time scales—daily, monthly, quarterly, and annual. The statistical attributes and time scales selected are arbitrary, and the approach presented here could accommodate more generic representations of statistical attributes, as well as time scales (Johnson and Sharma 2012; Mehrotra and Sharma 2012, 2015). We first describe the basic structure of the univariate standard equidistant quantile matching and of the full multivariate model that incorporates cross and lag dependence attributes. This is followed by the discussion on the nesting across multiple time scales, recursions, and some simplifications in the model structure. Finally, we provide a short description of the downscaling model used. Please note that the EQM and the multivariate autoregressive (AR) modeling (in the context of bias correction) are discussed at length in Li et al. (2010) and MS2015, respectively, and a few important aspects are reproduced here for the sake of completeness.
In the discussions hereafter, subscript t refers to the day t in the time series and M denotes the number of variables. Also, we refer to three types of datasets; observed (reanalysis) and GCM current and future climate daily time series of these M variables. For simplicity, the subscript for variables is ignored. Let vectors , respectively, represent the observed (reanalysis) and modeled (GCM simulated) raw daily time series of M variables.
a. Univariate equidistant quantile matching
The univariate bias correction approach of Li et al. (2010) is a quantile-based matching approach that works on the cumulative distribution functions of observed (reanalysis), training (GCM current climate), and testing (GCM future climate) datasets at a single defined time step. The approach assumes that, for a given quantile, the difference between the GCM and observed value during the training period also applies to the future testing period. It basically implies that although the adjustment function remains the same for the current and future climate, the difference between the CDFs for the current and future periods is also taken into account. After applying EQM, the GCM bias-corrected time series is denoted as . For daily time series, the approach includes the following steps:
Fit empirical cumulative distribution functions (CDFs) to the observed () as well as GCM () simulated data for current and future climate .
For a given value of future climate GCM simulations , calculate the cumulative probability [FGCM, ]. Obtain the values from observed CDF and GCM current climate CDF for this cumulative probability. Calculate the difference of observed and GCM current climate values at this cumulative probability.
Obtain the corresponding value for this cumulative probability from the future climate CDF. Apply the difference to the value for the future climate to obtain the bias-corrected value.
Repeat the same procedure for every point for the future projection time series  to obtain the bias‐corrected future projection time series .
b. Multivariate autoregressive bias correction model
We modify the EQM bias-corrected time series of individual variables using a multivariate transformation so that the resulting series match the observed lag-1 and lag-0 auto and cross correlations. As discussed in MS2015, this multivariate transformation is based on a standard multivariate autoregressive model. We apply EQM and multivariate modeling at all predefined time scales, namely daily, monthly, quarterly, and annual. To have appropriate representation of seasonality in the bias correction procedure, two types of models, nonperiodic (daily and annual) and periodic (monthly and seasonal), are considered. The nonperiodic model is based on the multivariate first-order AR [AR(1)] model with constant parameters while the periodic model follows on the multivariate AR(1) model with periodic parameters (Salas et al. 1980). The four quarters considered in the analysis presented here are autumn [April–June (AMJ)], winter [July–September (JAS)], spring [October–December (OND)], and summer [January–March (JFM)] and are mentioned hereafter as seasons. Please note that the seasons defined here are simply represent three months’ aggregation/averaging and do not follow the definitions of seasons commonly used in Australia. This variation is adopted to simplify the time scale aggregation from month to season and from season to year (MS2015).
As mentioned before, the EQM bias-corrected time series of individual variables is denoted . The observed and GCM standardized time series (after applying univariate EQM to each individual time series) with zero mean and unit variance are denoted as and , respectively. The following sections present the expressions for multivariate periodic and nonperiodic (constant) autoregressive models of order one for the observed and modeled time series. These formulations are discussed in details in Salas et al. (1980) and MS2015 and are reproduced here for the sake of completeness only.
1) Multivariate AR(1) model with constant parameters
A multivariate first-order autoregressive model (MAR; with constant parameters) to model the observed and GCM multivariate time series can be expressed as (Salas et al. 1980)
where and are the coefficient matrices of the observed time series . These matrices contain the lag-1 and lag-0 cross-correlation information as explained later. Similarly, and are the corresponding coefficients matrices of the standardized and EQM corrected GCM time series , which is a vector of mutually independent normal random variates.
As mentioned in MS2015, the multivariate bias correction is implemented by removing the lag-0 and lag-1 auto- and cross-correlations ( and ) from the standardized GCM time series and applying the observed lag-0 and lag-1 auto- and cross-correlations ( and ) and creating a modified time series, . Rearranging the terms of Eq. (1b) and simplifying for leads to the following:
where is a standardized vector of M variables formulated after taking out the lag-1 and lag-0 dependence structures from series. We use this standardized vector along with observed lag-1 and lag-0 attributes ( and ) to obtain a modified series as follows:
Equation (3) presents the structure of a multivariate model that maintains the observed lag-0 and lag-1 auto- and cross-dependence structures in the bias-corrected time series. Adding back observed means and standard deviations provides a bias-corrected time series of GCM variable(s) that has the desired means, standard deviations, distributions, lag-1 correlations, and lag-1 and lag-0 cross-dependence attributes (MS2015).
Matrices and or and are derived from the lag-0 and lag-1 cross-correlation matrices of the observed and modeled time series (Matalas 1967):
where and are the lag-0 and lag-1 cross-correlation matrices, respectively. Elements of matrix or are found by singular value decomposition of or and of and (corresponding to variables i and j) are obtained from the observed/EQM simulated set of standardized time series as
where N is total number of observations.
2) Multivariate AR(1) model with periodic parameters
The equations defining the parameters of the periodic multivariate first-order AR model (PMAR) are similar to the constant parameter case with the only difference that now these are derived separately for each period to allow for periodicity. Let the vectors respectively, represent the observed (reanalysis) and modeled (GCM simulated) and EQM corrected periodic time series of M variables. The subscript t refers to the year and τ to a specific period in the year. The standardized periodic time series with zero mean and unit variance is denoted as . Following MS2015, a periodic model that maintains the observed lag-1 serial and cross dependence in the modeled series can be formulated as
Expressions for the periodic parameters are obtained following Salas et al. (1980) as
c. Nesting of bias correction models of multiple time scales and recursion
Nesting approach follows the procedure as discussed in Johnson and Sharma (2012) and MS2015. We apply the bias correction models at predefined daily, monthly, seasonal, and annual time scales. The correction starts with daily raw time series and progressively extends to other time scales. Distributional biases are corrected separately for each variable, while dependence/persistence biases are corrected jointly. Starting at daily level, first EQM is applied to all variables individually and then MAR is applied. Daily bias-corrected time series of all variables are aggregated and averaged to form the monthly time series. Univariate EQM and PMAR are applied at a monthly level. The monthly bias-corrected series is aggregated and averaged to form seasonal time series and the univariate EQM and PMAR are repeated. Finally, the seasonal time series is aggregated and averaged to form annual time series and EQM and MAR are applied on the annual time series.
The outcome of bias correction steps performed at multiple time scales is incorporated by forming a single weighting factor for each day and modifying the daily bias-corrected time series as follows (Srikanthan and Pegram 2009):
where is the monthly corrected value, the aggregated–averaged monthly value formed from the bias-corrected daily values, the seasonal corrected value, the aggregated–averaged quarterly value formed from the bias corrected monthly values, the yearly corrected value and the aggregated/averaged yearly value formed from the bias corrected seasonal values. In Eq. (8), subscript t stands for day, j for month, s for season, and i for year.
Mehrotra and Sharma (2012) noted that the nesting bias correction procedure cannot effectively removes the biases at all time scales and suggested repeating the whole bias correction procedure multiple times in order to minimize the biases in various statistical attributes at all time scales. Essentially, the whole bias correction procedure including nesting is repeated again on the bias correct series. They recommended 3–5 iterations when applying the nesting bias correction to remove low-frequency biases in GCM simulated variables. Following this, we also include repetition of the multivariate bias correction procedure in the approach proposed.
d. Other variants of MRQNBC model
The multivariate nesting bias correction model usually require a large number of parameters in order to preserve distribution and time dependence attributes at multiple time scales and across multiple variables (MS2015). It is possible to simplify the model structure if some of these parameters are considered as not important. Following MS2015, we discuss here a few simplifications in the general structure of MRQNBC. In the first case, we ignore lag-1 cross correlations and consider and as diagonal matrices (contemporaneous model), which allows considering univariate parameter estimation procedures for individual variable (Salas et al. 1980). In this case, the elements of and corresponding to variables i and j are expressed as shown below:
This model is referred hereafter as MRQNBC-l. Another simplification is obtained by ignoring all lag-1 correlations and modeling only cross dependence. This leads to the following expressions for Eqs. (3) and (6):
This model is referred to as MRQNBC-nl. Further simplifications in the model structure can be obtained by dropping the cross correlations altogether and reducing the multivariate model to fully detached individual univariate models. This model is referred hereafter as recursive quantile nesting bias correction (RQNBC). This variation considers biases in distributional and lag-1 dependence attributes and is very similar to the recursive nested bias correction (RNBC) model of Mehrotra and Sharma (2012). If we assume lag-1 auto-dependence as unimportant, we have a univariate model that considers nesting of quantile mapping at all predefined time scales and repeats the procedure multiple times (recursive scheme). We call it the recursive nesting equidistant quantile matching (RNEQM) model. Finally, in the most simplified case, the univariate quantile matching is carried out only at single time scale, for example daily, with single iteration. This model is a standard EQM model proposed by Li et al. (2010).
A stepwise procedure explaining the general implementation of the multivariate nesting quantile approach at multiple time scales is presented in the appendix.
e. MMM-KDE downscaling model
The daily downscaling model used here is a multisite modified Markov model (MMM)–kernel density estimation (KDE) downscaling model and is described in details in Mehrotra and Sharma (2010) and Mehrotra et al. (2013). The model consists of two parts: 1) downscaling of rainfall occurrences (whether rain or no rain) and 2) simulation of rainfall amounts on days simulated as wet in the first part. The rainfall occurrence downscaling model is an extension of the stochastic generation modified Markov model (Mehrotra and Sharma 2007), also including the use of exogenous atmospheric predictors as the conditioning variables in the rainfall occurrence model. At site rainfall amounts on wet days are simulated conditional on the atmospheric variables using a kernel density estimation–based approach (referred to as KDE) that also allows proper representation of temporal dependence attributes. Spatial correlation of rainfall occurrence and amounts is maintained by making use of random numbers that are spatially correlated yet serially independent in nature (Wilks 1998).
3. Data used
To illustrate the applicability of the MRQNBC logic, we focus on the daily times series of quite a few large-scale atmospheric variables commonly used in statistical and dynamical downscaling. These include geopotential height, relative humidity, specific humidity, air temperature, dewpoint depression (difference of the air temperature from the dewpoint temperature; Charles et al. 1999), equivalent potential temperature (Evans et al. 2004), and mean sea level pressure. Earlier studies have found that atmospheric variables at 500, 700, and 850 hPa largely influence the rainfall over the Sydney region (Mehrotra et al. 2004; Mehrotra and Sharma 2010). Following this, six atmospheric variables at three pressure levels, 500, 700 and 850 hPa, and mean sea level pressure and their gradients and thickness, totaling 19 variables, are considered for further analysis. Daily time series of these variables over Sydney, Australia, from 1979 to 2010 are obtained from the National Centers for Environmental Prediction (NCEP) Reanalysis-2 data provided by the NOAA CIRES Climate Diagnostics Center, Boulder, Colorado (from their website at http://www.cdc.noaa.gov/). Likewise, daily output of CSIRO Mk3.0 A2 GCM for these variables for the same time period and also for future climate 2069–99 is obtained from the Atmospheric Research Division of the CSIRO, Australia. A 32-yr continuous record (1979–2010) of daily rainfall at 30 stations around Sydney is also obtained. Figure 1 presents the location of selected rain gauge stations along with reanalysis and GCM data grid points. Predictor identification is performed separately for each season and for rainfall occurrence and amount processes. A nonparametric partial dependence–based predictor identification analysis (Sharma and Mehrotra 2014) conducted on the area averaged daily rainfall (as response) and reanalysis data (as predictors) identified six atmospheric variables as significant in driving the rainfall occurrence and amount mechanisms for the four seasons over the study region. Table 1 provides a listing of these variables while Table 2 lists predictors identified as significant for each season and for occurrence and amount downscaling applications. NCAR–NCEP reanalysis data (Kalnay et al. 1996) represents the observations.
4. Application of models and results discussion
We conduct the whole analysis in three parts. In the first part, we bias-correct the raw GCM daily time series of the six identified variables using three univariate approximations of MRQNBC and compare the results. These include 1) equidistant quantile mapping at daily time scale (EQM); 2) EQM with nesting at monthly, quarterly, and annual time scales and repeating the procedure with three iterations (RNEQM); and finally 3) adding a lag-1 correlation bias correction structure in EQM with nesting at each time scale and repeating the bias correction procedure three times (RQNBC). In the second part, we consider MRQNBC and its two multivariate approximations, MRQNBC-l and MRQNBC-nl, to bias-correct the raw GCM daily time series and, compare and evaluate the results. To have an unbiased comparison of the performance of these models, results presented and discussed next are obtained in cross-validation. The available data length of 32 yr is divided into four blocks of 8 yr. The model is calibrated using the three blocks and is validated on the omitted fourth block. The whole analysis is repeated four times using different blocks. Finally, in the last part of the analysis, we use EQM and MRQNBC-l bias-corrected daily time series of identified predictors as an input to our MMM rainfall downscaling model to obtain daily time series of current and future climate downscaled rainfall at 30 rain gauges and compare the results. As NCEP reanalysis variables form the observed dataset, in the discussions that follow it is also mentioned as observed data. Similarly, a season is also sometimes used to express a quarter.
a. Application of univariate bias correction models
Figure 2 presents scatterplots of means, standard deviations (SDs), and a few dependence-related statistics of reanalysis, raw, EQM, RNEQM, and RQNBC bias-corrected GCM atmospheric variables (in cross validation) at multiple time scales. For the sake of simplicity of presentation, means and standard deviation statistics of all variables are scaled to have reasonable comparing values (i.e., to lie between 0 and 100). The dots in these plots represent variables with different symbols shown for daily, monthly, seasonal, and annual time scales. The first column presents a comparison of the statistics of raw GCM and observed data. It can be seen that the raw data exhibit large biases at all time scales for all variables and all the statistics considered. Results of EQM presented in the second column provide a good fit to the means at all time scales (first row, second column). They also match the daily SDs quite well (a statistic related to distribution), albeit with some nominal biases at higher time scales (second row, column 2). As the model does not correct for the biases in lag-0 and lag-1 auto- and cross-correlations at all time scales, they are more or less of the same order of magnitude as the raw GCM data (rows 3–5, columns 1 and 2). In addition to daily time scale, applying EQM at monthly, seasonal, and annual time scales and repeating the procedure 3 times (RNEQM), provides a superior fit to the SDs at all time scales (row 2, third column). However, similar to EQM, the nesting does not improve the correlation-related biases in the GCM simulations (rows 3–5, column 3). Finally, the last column of Fig. 2 presents these statistics obtained by also having an autocorrelation structure in the RNEQM bias correction model (RQNBC). In addition to means and SDs, now biases in the lag-1 autocorrelation statistic at all time scales are also reduced (row 3, last column). Being univariate models, none of the variants considered above consider the cross-dependence structure across variables in the simulations (last two rows of all columns).
We now compare and evaluate the empirical distributions of raw and EQM, RNEQM, and RQNBC bias-corrected GCM results with observations at daily, monthly, quarterly, and annual time scales. Owing to the large number of plots and the fact that they convey more or less similar information, we present here the results of only one atmospheric variable, namely meridional wind (VWIND) at 500-hPa pressure level (Fig. 3). In these plots, the horizontal axis presents the exceedance probability and vertical axis contains the actual values. The raw GCM data exhibit significant biases and the nature of the bias varies with the variables (not shown here), time scales, and probability levels (first column). As can be seen from the second column of the figure, the EQM corrects for the distributional biases at daily time scale only (first row). In comparison to raw results, improvements are seen at higher time scales as well; however, some biases still exist, more specifically at the tails of the distributions. Other two alternatives that consider nesting of EQM at monthly, quarterly, and annual time scales (RNEQM) and addition of lag dependence structure (RQNBC) further reduce the biases at other time scales (last two columns).
b. Application of multivariate bias correction models
As a second part of the analysis, we compare the performance of three multivariate models proposed, MRQNBC and its two variants, and present a few results in Fig. 4. Notations and other details in this figure are similar to Fig. 2. MRQNBC and its variants successfully correct for the biases in means and SDs (distributional biases) at all time scales considered (first two rows). The full version of MRQNBC matches well the observed and lag-1 cross-dependence attributes at all time scales (first column, last three rows). MRQNBC-l (ignoring lag-1 cross dependence, center column) shows some biases in the lag-1 cross dependence more specifically at annual time scale (Fig. 4, bottom center). Results of the further simplified version of the model are presented in the last column where all lag-1 dependence is ignored (MRQNBC-nl). As expected, now the model shows some biases for modeled lag-1 autocorrelation as well (third row of the last column).
Similar to Fig. 3, Fig. 5 presents the empirical distributions of MRQNBC family of models simulated results at daily, monthly, seasonal, and annual time scales for VWIND at 500-hPa pressure level. As all these models are structured to include nesting of quantile matching at multiple time scales with the difference of only in cross-dependence attributes across variables, the bias-corrected time series of these models show similar results with good match of data distribution with the observations at all time scales (all plots of Fig. 5).
c. Application of a rainfall downscaling model
Last, as a final step of the analysis, we present and compare a few important statistics of observed and downscaled daily rainfall simulations for current (1979–2008) and future (2069–99, centered around 2085) climate. These downscaled rainfall simulations are derived using reanalysis and GCM bias-corrected daily time series of GCM large-scale predictors obtained by applying standard univariate EQM and multivariate MRQNBC-l models, and feeding them as an input to MMM-KDE daily rainfall downscaling model. In this comparison, rather than using the full model we prefer to use a simplified contemporaneous bias correction model as the results presented earlier showed that ignoring lag-1 cross dependence has little impact on the overall quality of the results. In addition, for the current climate we also include the downscaled rainfall results obtained using raw GCM predictors. In all the results that follow, the statistics reported are ascertained by generating 100 realizations of the downscaled rainfall from the model. In addition to commonly used statistics such as means and standard deviations, we also evaluate few low-frequency and extreme rainfall attributes such as year-to-year rainfall dependence attributes and number of instances in a year when daily rainfall is greater than a prespecified threshold. In the following discussions, the seasons are defined by grouping the three consecutive calendar months—spring is September–November (SON), summer is December–February (DJF), autumn is March–May (MAM), and winter is June–August (JJA).
1) Rainfall results for the current climate (1979–2008)
Figures 6 and 7 present a few scatter and distribution plots of observed and current climate downscaled rainfall obtained using reanalysis, raw, EQM, and MRQNBC-l bias-corrected time series of GCM large-scale predictors. The results presented in the first column are obtained using reanalysis data, which are also used to calibrate the downscaling model and as such form the calibration results. The results presented in the other columns are obtained by using the calibrated downscaling model driven by raw and bias-corrected GCM data and as such form validation results. The dots shown in the scatterplots of these figures refer to the 30 rain gauge stations considered in the study. In this figure, first two rows present the scatterplots of observed (horizontal axis) and models simulated (vertical axis) seasonal and annual mean rainfall at 30 stations. Rainfall simulations from reanalysis and other two bias correction alternatives provide overall good match of this statistic with observations. Results obtained using raw GCM data show gross underestimation of these statistics. The third and fourth rows of the figure present scatterplots of seasonal and annual SDs of observed and downscaled rainfall. Reanalysis and MRQNBC-l results provide some undersimulation of seasonal standard deviation, although a better unbiased simulation of annual SDs in comparison to EQM. Raw results show consistent under simulation of seasonal and annual SDs at all locations. Figure 7 presents a few statistics of daily rainfall extremes. In the figure, the first and second rows present the scatterplots of the 5th percentile of daily maximum rainfall at each station on seasonal and annual basis while the third and fourth rows present the scatterplots of number of days in a year when daily rainfall exceeds the 5th percentile of the observed rainfall value at each station on seasonal and annual basis. Again, downscaled rainfall simulations from reanalysis and MRQNBC-l datasets better reproduce these statistics in comparison to EQM. Downscaled rainfall obtained using raw and EQM corrected GCM predictors show under simulation of these extreme statistics. Finally, the last two rows of the figure present the statistics of the average number of wet and dry spells of length >7 days and >18 days in a year (representing longer duration spells), respectively. Although no significant differences in results across bias correction models are noted, MRQNBC-l slightly oversimulates wet spells at a majority of stations but performs slightly better than EQM for dry spells. As MRQNBC-l considers lag-1 dependence at daily as well as higher time scales, it is expected to provide better results for these statistics in comparison to EQM. Simulations using raw GCM data show significant underestimation of wet spells and oversimulation of dry spells.
Distribution plots of a few selected statistics of observed and model simulated rainfall are shown in Fig. 8. The first two rows of this figure present the distribution of observed and downscaled annual rainfall totals and daily maximum rainfall at a representative rain gauge station, Lithgow. In these plots the 5th and 95th percentiles and median values of the model simulated rainfall are shown as continuous lines while the observed values are superimposed as circles. Reanalysis and MRQNBC-l data-driven results capture better the observed year-to-year variations in the simulations including high and lows. The third and fourth rows of Fig. 8 present the year-to-year distribution of the area averaged number of wet days and rainfall amounts obtained using the reanalysis and GCM bias-corrected datasets. The daily area averaged time series over the study area is formed by taking the average of daily rainfall at the 30 stations considered in the analysis. The reanalysis data-driven results are quite successful in reproducing the observed year-to-year variability in the area averaged downscaled simulations with the performance of MRQNBC-l being somewhat better in comparison to EQM. Simulations obtained using raw GCM predictors do not exhibit any significant year-to-year variability in any of the rainfall attributes. Finally, the last row presents plots of standard deviation of total rainfall versus time scale in years for 1–6-yr rainfall totals (capturing the years’ variability). Reanalysis and MRQNBC- l datasets reproduce these statistics better than the EQM. It seems that the simulations from the EQM do not adequately capture the observed extreme wet and dry years in the simulations and it gets worse in case of raw GCM data-derived simulations.
2) Rainfall results for the future climate
It is important to note that the results presented hereafter show the changes in the rainfall statistics in the future with respect to the current climate (median value of 100 realizations) and therefore any assessment with respect to the observations needs to be examined in the light of the results presented in Figs. 6 and 7. In the scatterplots, as current climate statistics are plotted on the horizontal axis, symbols above the continuous diagonal line show an increase in the value of that statistic in the future. The future climate time window spans over the 30-yr time window 2069–99 and presents the changes in the rainfall around 2085.
Similar to the current climate results, Figs. 9 and 10 present scatter and distribution plots of the current and future climates downscaled rainfall. The first two rows of Fig. 9 present the scatterplots of current and future climate seasonal and annual rainfall at individual rain gauge station as simulated by EQM and MRQNBC-l bias-corrected datasets. No substantial changes in annual rainfall by either of the datasets are predicted. Simulations from MRQNBC-l predict seasonal and annual rainfall to be more variable whereas results of the EQM show winter and autumn rainfall to be less variable with no changes at annual time scale at majority of locations. Changes in the 5th percentile of seasonal and annual daily maximum rainfall are shown in the top two rows of Fig. 10. EQM results project a slight increase in summer and decrease in winter and autumn in the daily rainfall extreme at majority of locations. Results of MRQNBC-l, although largely similar to EQM, project changes at fewer locations with no changes in winter. Results from both the models project no changes in the daily rainfall extremes at the annual time scale. Changes in the occurrences of these daily rainfall extreme events (when daily rainfall is ≥5th percentile rainfall) are examined in the third and fourth rows of Fig. 10. Results of EQM suggest an increase in the occurrence of extreme events in summer and decrease in other seasons and at annual time scale at a majority of locations. The results of MRQNBC-l are similar, but with lesser magnitude. Models predict no changes in the occurrences of longer period wet spells (>7 days) (Fig. 10, 5th row) and a tendency of increased occurrence of extreme dry spells (>18 days) (Fig. 10, bottom row) at all locations.
The changes in the over the annual distributions of a few important statistics are presented in Fig. 11. No appreciable year-to-year changes in the annual rainfall distribution and daily maximum rainfall at a representative rain gauge station, Lithgow, are noted (first and second rows), although MRQNBC-l results suggest a slight increase in the daily maximum and annual rainfall totals. On an area averaged basis, both models predict increased occurrences of years with low rainfall (rows 3 and 4). MRQNBC-l predicts more year-to-year variability (with an increase in the standard deviation) while EQM suggests no changes with respect to the current climate (Fig. 11, last row).
Finally, Table 3 summarizes the results discussed so far, by presenting the percent changes in quite a few important rainfall statistics over the study area on a seasonal and annual basis. In general, MRQNBC-l results present a less extreme scenario in comparison to EQM. The results of EQM project a very dry winter while those of MRQNBC-l project a moderate decrease in the rainfall in the season. Both models predict a nominal decrease in the annual number of wet days and rainfall and a nominal increase in the summer rainfall over the study area in 2085. The number of days with extreme rainfall and its intensity are projected to increase in summer and decrease in other seasons. The number of wet spells of varying durations is expected to decrease with extended occurrences of longer period dry spells.
Univariate bias correction approaches such as EQM are routinely used to correct for the systematic biases in lower-order moments and in the frequency distribution of individual time series usually at daily or monthly time scales. These approaches, while quite effective in removing the biases in the individual time series at a predefined time scale, still exhibit significant biases at other time scales and also in persistence-related attributes. In addition, downscaling/hydrological applications often require multiple predictor variables as input to these models, and use of a univariate approach applied to each variable in isolation may lead to a biased representation of their joint dependence in comparison to what is observed. The proposed multivariate recursive quantile nesting bias correction (MRQNBC) procedure can be used to correct for variability and persistence biases in multiple variables on a range of time scales. As a first part of the paper, we show that the raw GCM data exhibit significant biases in various statistical attributes at multiple time scales and nesting and persistence modifications in the standard univariate equidistant quantile matching (EQM) model successfully reproduce the observed distribution attributes in bias-corrected simulations at multiple time scales. As a second part of the paper, we extend the univariate nesting model to a multivariate setting and show that the multivariate bias correction procedure with three iterative steps can significantly reduce the biases in mean, variability, and auto- and cross-dependence-related attributes in GCM simulations.
The MRQNBC or similar bias correction approaches make use of continuous time series and work well where data do not contain many zeros. Although not a focus of the study, if MRQBNC is used to bias-correct rainfall, more specifically daily rainfall, which usually contains many zeros, the results may not look good. In these circumstances, the structure of the model may require some modifications providing specific treatment for dry periods and the associated dependence structure (e.g., Pegram and Bárdossy 2013).
The approach as presented in the paper deals with multiple variables at a given point or grid location. As it is based on multivariate modeling, it can be applied in space as well without any change in the basic model structure (MS2015; Salas et al. 1980). If it is applied across multiple grid points or point locations, the model can be viewed as a univariate model with the number of variables now replaced with the number of locations/grid points.
It is well known that systematic biases in contributing variables when simulated using a model can lead to erroneous representations of the response. Hence, as a final part of the research, we used univariate and multivariate bias correction approaches to correct atmospheric variables that were then used as inputs to a rainfall downscaling model. Results indicate that the use of multivariate bias-corrected GCM data shows improvements in comparison to standard univariate bias correction approach for the current climate. As the MRQNBC approach takes into account the complex interactions among the variables, it is expected to better capture the behavior of natural processes that contribute to the variability of the rainfall occurrence process. In addition, as MRQNBC also corrects for the biases at seasonal and annual time scales, the bias-corrected atmospheric predictors provide a better representation of long-term variability in the downscaled simulations.
Implementing the Multivariate Recursive Nesting Quantile Bias Correction (MRQNBC) Approach
The following explains in brief the stepwise procedure required for the general implementation of the multivariate nesting quantile approach at selected daily, monthly, seasonal, and annual time scales.
Iterate over daily, monthly, seasonal, and annual time scales. Recall that the vectors and respectively, represent the observed (reanalysis) and modeled (GCM simulated) raw daily time series of M variables at time step t.
Calculate the empirical CDFs of and and apply univariate EQM to get the quantile-corrected daily time series, .
Standardize univariate quantile corrected daily time series, (obtained from step 2) using mean and standard deviations of individual time series and obtain time series, .
Calculate the lag-0 and lag-1 cross-correlation matrices for and and use these to bias-correct as in section 2b and obtain the bias-corrected daily time series .
Reconstruct the full-variance corrected time series at the relevant time scales by multiplying by the individual time series standard deviation and adding the mean. The corrected time series at each time scale, denoted , , in the text, form the numerators in Eq. (8).
Aggregate the daily series and to the next higher time scale: monthly, seasonal, or annual. The aggregated values at higher time scales, denoted , , and in the text, form the denominators in Eq. (8).
Repeat steps 2–5 for monthly, seasonal, and annual time scales.
Apply Eq. (8) to get .
Repeat steps 2–8 three times to apply the recursive scheme. As noted in Mehrotra and Sharma (2012) repeating the whole procedure at least three times further improves the results.
Apply the above procedure separately for current and future climates GCM data. Observed (reanalysis) and current climate GCM datasets used to formulate the empirical distributions and covariance matrices remain the same for both current and future climate periods. When dealing with current climate, the bias correction procedure is usually applied on the same raw GCM time series used to derive the parameters, whereas for future climate it belongs to different time period not used in the parameter estimation procedure. Interpolation and/or extrapolation is required if data length for observed, GCM current, and GCM future climate periods is different.