## Abstract

A time-scale decomposition (TSD) approach to statistically downscale summer rainfall over North China is described. It uses two distinct downscaling models corresponding to the interannual and interdecadal rainfall variability, respectively. The two models were developed with an objective downscaling scheme that 1) identifies potential predictors through correlation analysis between rainfall and the considered climatic variables over the global domain and 2) selects the “optimal” predictors from the identified potential predictors via cross-validation-based stepwise regression. The downscaling model for the interannual rainfall variability is linked to El Niño–Southern Oscillation and the 850-hPa meridional wind over East China, while the model for the interdecadal rainfall variability is related to the sea level pressure over the southwest Indian Ocean. The downscaled total rainfall was obtained by summing the downscaled interannual and interdecadal components. The results show that the TSD approach has good skill in predicting the observed rainfall, with a correlation coefficient of 0.82 in the independent validation period. The authors further apply the model to obtain downscaled rainfall projections from three climate models for the present climate and for the future under the A1B emission scenario. The downscaled values represent the observations more closely than the raw climate model simulations do in the present climate; for the near future, the climate models simulate a slight decrease in rainfall, while the downscaled values tend to be slightly higher than the present state.

## 1. Introduction

In the past decades, the global climate has undergone rapid changes, as documented by observations on every continent (Solomon et al. 2007). Projections of future climate (e.g., rainfall) and its associated impacts on the environment and society (e.g., runoff and water storage) have attracted growing attention worldwide. However, uncertainties in projected rainfall changes for later this century plague estimates of impacts on future runoff and water storage (Milly et al. 2008). In particular, it is difficult to interpret changes in variables simulated at a resolution of 100–200 km in terms of the changes to be expected at the much smaller catchment scale. Water resource managers, who need to make long-term decisions about future infrastructure (e.g., new reservoirs, pipelines, and drainage), increasingly demand more reliable estimates of these changes. North China (NC; 110°–122°E, 35°–40°N) has already been severely affected by declining rainfall, reduced runoff, and water shortages. Water shortage and related environmental problems have become the most significant factors limiting sustainable development in this important region of China (Xia et al. 2007).

NC is located at the northern margin of the East Asian subtropical monsoon region and receives the bulk of its annual rainfall during the summer half-year (i.e., May–October). Summer (July–August) rainfall over NC is affected by both teleconnected large-scale signals and regional signals. Among the teleconnected signals, El Niño–Southern Oscillation (ENSO) has been reported to be associated with NC summer rainfall (Huang and Wu 1989; Lu 2005; Wang et al. 2000; Wu and Li 2008), and the North Atlantic Oscillation (NAO) provides another source of predictability for NC summer rainfall (Wu et al. 2009, 2011). In addition, regional signals, such as components of the East Asian summer monsoon (EASM) system (Huang et al. 2008; Li and Zeng 2002; Yang and Sun 2003) and the mid-high-latitude circulation over Eurasia (Wang et al. 2008; Zhao and Song 1999), also influence NC summer rainfall. Lu (2002, 2003) reported that NC summer rainfall exhibits pronounced and distinct variability at the interannual and interdecadal time scales. The strong high-frequency variability results in severe floods or droughts in NC (Huang et al. 2006), while the low-frequency variability shows a pronounced drying trend during the past half-century, which has stimulated great interest in the underlying causes of the multidecadal drought over NC (Ding et al. 2009; Li et al. 2010; Li et al. 2003; Sun 1999; Zhou et al. 2009a). This highly complex variability complicates both the seasonal prediction and the long-term projection of NC summer rainfall, which are important for disaster prevention and mitigation and for decision making.

It is well known that general circulation models (GCMs) are a good tool for projecting the large-scale, long-term mean future climate; however, the skillful spatial scale of most current climate models is at least 2000–4000 km (Grotch and MacCracken 1991), far coarser than required for regional precipitation prediction, which is sensitive to subgrid processes. Physical parameterization schemes are critical for precipitation projection, and their limitations in current climate models are also responsible for the large uncertainties in rainfall simulations, even in ensemble forecasts (Whetton et al. 2005).

Many approaches have been developed to address the uncertainties accompanying future rainfall projections, including assessing the performance of individual models as a guide to the reliability of their predicted changes (Maxino et al. 2008; Perkins et al. 2007; Smith and Chandler 2009; Wu and Li 2009). Statistical downscaling is another method that can potentially assist in the assessment of climate models. A simple test for a model is that it should not only provide an accurate estimate of regional rainfall but also simulate the observed relationship between regional rainfall and other key variables, for example, sea level pressure (SLP). If these criteria are satisfied, the simulated changes in rainfall are more likely to be reliable. Statistical downscaling can not only indicate whether such a relationship holds; it can also potentially provide alternative estimates of rainfall changes if the model-simulated changes in the key variables are believed to be more reliable than the rainfall estimates themselves (Benestad 2001).

Statistical downscaling establishes an empirical relationship between large-scale climate anomalies and local climate fluctuations based on historical data. There are numerous ways to develop statistical downscaling models (Fowler et al. 2007), but it is important to note that any statistical downscaling approach assumes that the derived historical relationship also holds in the future (Wilby 1997).

Among various statistical downscaling models, multiple linear regression models that use gridcell values of atmospheric variables as predictors of surface temperature and precipitation are popular because of their simplicity and explicit physical meaning (Benestad 2001; Wilby 1998). More complex techniques use the principal components (PCs) of pressure or geopotential height fields (Hanssen-Bauer and Førland 1998; Kidson and Thompson 1998; Li and Smith 2009), and more sophisticated methods include canonical correlation analysis (Busuioc et al. 2001; Karl et al. 1990; Von Storch et al. 1993), singular value decomposition (Zhu et al. 2008), and partial least squares regression (Bergant and Kajfež-Bogataj 2005).

There is no doubt that the choice of predictors and their associated domains plays a key role in statistical downscaling. A number of sensitivity studies have indicated that this choice is critical for future projections (Benestad 2001; Frias et al. 2006; Schmidli et al. 2007). The commonly used predictors are circulation parameters that GCMs can simulate credibly, including SLP, geopotential heights, and horizontal winds at various levels. The importance of the predictor domain has been recognized (Benestad 2001; Wilby and Wigley 2000), but how to choose it has not been systematically addressed in existing studies. The common approach is to subjectively select a fixed domain that encompasses the target location of the predictand (Oshima et al. 2002; Timbal et al. 2003) or to select the best of several trial domains that surround the target location with contrasting positions and spatial extents (Benestad 2001, 2002). Benestad (2004) first proposed a quantitative rule for determining the spatial extent of the domain surrounding the target position: it examines the map of correlation between the climatic parameter at the target position and the surrounding areas and places the domain boundary where the correlation drops to zero. Nevertheless, this rule only considers the effects of local and nearby systems and misses remote predictive signals that exert their influence via teleconnections. Teleconnections can be explained by the propagation of planetary waves on the sphere (Hoskins and Karoly 1981), so preceding or concurrent teleconnection signals are useful for statistical prediction. It is therefore natural to identify potential predictors over the global domain.

The aim of this work is to build a statistical downscaling model for NC summer rainfall that selects potential predictors objectively over the global domain. Given that rainfall variability has significantly distinct components at the interannual and interdecadal time scales, we develop a time-scale decomposition (TSD) approach that obtains the downscaled rainfall totals by combining two distinct downscaling models, one for the interannual and one for the interdecadal rainfall variability.

The paper is organized as follows. Section 2 introduces the data used in this work. Section 3 describes the proposed TSD approach to statistically downscaling NC summer rainfall. The results from the two statistical downscaling models calibrated for the interannual and interdecadal rainfall variability, and their combination for total rainfall, are presented in section 4. Key results from applying the downscaling model to climate change simulations are described in section 5. Finally, section 6 provides a summary and discussion.

## 2. Data

Observed rainfall data were derived from the 160-station monthly rainfall dataset for China provided by the China Meteorological Administration for the period 1951–2008. July and August (JA) constitute the primary rainy season over NC, and the predictand is the total JA rainfall averaged over the 15 gauge stations (Fig. 1a) within the region 110°–122°E, 35°–40°N.

Atmospheric data, including SLP, 500-hPa geopotential height (Z500), and 850-hPa meridional wind (V850), were extracted from the National Centers for Environmental Prediction (NCEP)–National Center for Atmospheric Research (NCAR) reanalysis dataset on a 2.5° × 2.5° grid (http://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis.html). SST data were taken from the Hadley Centre Sea Ice and Sea Surface Temperature dataset, version 1 (HadISST1), on a 1° × 1° grid (http://hadobs.metoffice.com/hadisst/). Several well-known climate indices are employed as candidate predictors. The southern annular mode index (SAMI) is defined as the difference in normalized monthly zonal-mean SLP between 40° and 70°S (Nan and Li 2003), and the northern annular mode index (NAMI) as the corresponding difference between 35° and 65°N (Li and Wang 2003a); both are available online (http://web.lasg.ac.cn/staff/ljp/dataset.html). The North Atlantic Oscillation index (NAOI) is defined similarly to the NAMI but regionally over the North Atlantic sector from 80°W to 30°E (Li and Wang 2003b) and is also available online (http://web.lasg.ac.cn/staff/ljp/dataset.html). The Niño-3 index is used to represent ENSO and is available online (http://www.cpc.noaa.gov/data/indices). The Pacific decadal oscillation index (PDOI) is derived as the leading PC of monthly SST anomalies in the North Pacific Ocean poleward of 20°N (Zhang et al. 1997) and is available online (http://jisao.washington.edu/pdo/PDO.latest).

The GCM data were derived from three GCMs [Commonwealth Scientific and Industrial Research Organisation Mark version 3.5 (CSIRO Mk3.5), Centre National de Recherches Météorologiques Coupled Global Climate Model, version 3 (CNRM-CM3), and Max Planck Institute (MPI) ECHAM5] selected from the 21 GCMs (Table 1) participating in the World Climate Research Programme’s (WCRP’s) Coupled Model Intercomparison Project phase 3 (CMIP3) on the basis of their ability to simulate the predictors in the downscaling model. Outputs from the twentieth-century simulation (20c3m) and from the climate change experiment under the A1B emission scenario of the Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report (AR4) are used; they are available online (http://www-pcmdi.llnl.gov/). Because these GCMs have different horizontal resolutions, the raw GCM outputs were bilinearly interpolated onto the same 2.5° × 2.5° grid as the NCEP reanalysis data.

## 3. Methods

Spectral analysis shows two primary peaks, at periods of 2–3 and 12–15 years, in NC summer rainfall during 1951–2008 (Fig. 1b), indicating strong interannual and interdecadal variability (Fig. 1c). A previous study (Lu 2003) indicated that NC summer rainfall has distinct relationships with circulation anomalies at the interdecadal and interannual time scales and that the interdecadal variation does not modify the interannual variation or its physical mechanism. This finding motivates us to build a TSD approach that downscales NC summer rainfall by identifying the forcing factors linked to the interannual and interdecadal variability with separate statistical downscaling models.
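Spectral peaks of this kind can be located with a simple periodogram. The sketch below is a minimal illustration, assuming a raw FFT periodogram of the mean-removed annual series; the text does not state which spectral estimator was used, and `periodogram` and `dominant_periods` are hypothetical names.

```python
import numpy as np

def periodogram(x, dt=1.0):
    """Raw periodogram of a mean-removed series sampled every dt years."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    power = np.abs(np.fft.rfft(x)) ** 2 / len(x)
    freq = np.fft.rfftfreq(len(x), d=dt)  # cycles per year
    return freq[1:], power[1:]            # drop the zero-frequency (mean) term

def dominant_periods(x, k=2, dt=1.0):
    """Periods (in years) of the k strongest spectral peaks, strongest first."""
    freq, power = periodogram(x, dt)
    order = np.argsort(power)[::-1][:k]
    return 1.0 / freq[order]

# Synthetic 58-yr series with a 58/23 (about 2.5-yr) and a 58/4 = 14.5-yr cycle
t = np.arange(58)
x = np.sin(2 * np.pi * 23 * t / 58) + 0.5 * np.sin(2 * np.pi * 4 * t / 58)
print(dominant_periods(x))  # about 2.52 and 14.5 years
```

A raw periodogram is noisy for a 58-yr record; a smoothed or multitaper estimate with significance testing against a red-noise background would be the usual refinement.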

The main stages in establishing and validating the TSD model are shown in Fig. 2. Assume that the observed rainfall series $R(t)$ can be decomposed into an interannual component $R_A(t)$ and an interdecadal component $R_D(t)$:

$$R(t) = R_A(t) + R_D(t). \tag{1}$$

To establish and test the TSD approach, the whole study period 1951–2008 (*N* = 58) was separated into the calibration period 1951–90 (*n* = 40) and the independent validation period 1991–2008.

To calibrate models for the interannual and interdecadal rainfall variability, the observed rainfall and individual predictors over 1951–90 are decomposed by Fourier filtering into interannual (periods shorter than 7 years) and interdecadal (periods longer than 7 years) components. A correlation-based cross-validation stepwise regression (C_CVSR) downscaling scheme documented in our previous paper (Guo et al. 2011, manuscript submitted to *J. Geophys. Res.*) is used to build the interannual model (IAM) and the interdecadal model (IDM), which describe the relationship between rainfall and the associated predictors at the interannual and interdecadal time scales, respectively. Summing the predicted interannual and interdecadal components gives the predicted rainfall totals over the training period 1951–90. The C_CVSR downscaling scheme consists of two stages: 1) identification of potential predictors over the global domain through correlation analysis with rainfall and 2) selection of the “optimal” predictors from the potential predictor set to formulate the regression equations with a cross-validation-based stepwise regression (CVSR) approach. See appendix A for details of the CVSR approach.
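The interannual/interdecadal split described above can be sketched as an ideal Fourier filter with a 7-yr cutoff. This is a minimal sketch assuming annual sampling and assigning the series mean to the interdecadal component; the text does not specify these details, and `fourier_split` is a hypothetical name.

```python
import numpy as np

def fourier_split(x, cutoff_years=7.0, dt=1.0):
    """Split an annual series into interannual (period < cutoff) and
    interdecadal (period > cutoff, plus the mean) parts with an ideal FFT filter."""
    x = np.asarray(x, dtype=float)
    coeffs = np.fft.rfft(x)
    freq = np.fft.rfftfreq(len(x), d=dt)
    low = freq < 1.0 / cutoff_years              # long periods and the zero frequency
    x_dec = np.fft.irfft(np.where(low, coeffs, 0.0), n=len(x))
    x_ann = x - x_dec                            # the remainder is the interannual part
    return x_ann, x_dec

# Example: a 40-yr series (like 1951-90) mixing a mean, a 20-yr cycle, and a short cycle
t = np.arange(40)
slow = np.sin(2 * np.pi * 2 * t / 40)            # period 20 yr
fast = np.sin(2 * np.pi * 17 * t / 40)           # period about 2.35 yr
ann, dec = fourier_split(300.0 + slow + fast)
```

With a finite record, an ideal filter treats the series as periodic, so the filtered components near the series ends depend on the record length used.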

To validate the skill of the TSD approach, the predictors selected for the IAM and IDM over the training period 1951–90 are decomposed into interannual and interdecadal components by Fourier filtering over the whole period 1951–2008 (*N* = 58) and substituted into the respective forecast equations (the IAM and IDM) to calculate the downscaled interannual and interdecadal rainfall components over the validation period 1991–2008. Summing the two predicted components gives the predicted rainfall totals over 1991–2008, which indicate the true predictive skill of the TSD approach. We quantify the prediction uncertainty with the bootstrap approach (Stine 1985): confidence intervals for the prediction are derived from the spread of 1000 bootstrap samples drawn randomly with replacement. See appendix B for details of the bootstrap approach.
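A bootstrap confidence interval of the kind described above can be sketched as follows. This is a minimal illustration assuming a case (pairs) bootstrap that resamples calibration years with replacement, refits an ordinary least squares regression, and takes percentile intervals; appendix B may differ in detail, and `bootstrap_interval` is a hypothetical name.

```python
import numpy as np

def bootstrap_interval(X, y, x_new, n_boot=1000, levels=(50, 95), seed=0):
    """Percentile bootstrap intervals for an OLS prediction at x_new.

    X: (n, p) predictors, y: (n,) predictand, x_new: (p,) predictor values.
    Returns the mean bootstrap prediction and {level: (lower, upper)}."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = len(y)
    X = np.asarray(X, dtype=float).reshape(n, -1)
    A = np.column_stack([np.ones(n), X])
    a_new = np.concatenate([[1.0], np.atleast_1d(x_new)])
    preds = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample years with replacement
        coef, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        preds[b] = a_new @ coef
    intervals = {}
    for level in levels:
        lo, hi = np.percentile(preds, [50 - level / 2, 50 + level / 2])
        intervals[level] = (lo, hi)
    return preds.mean(), intervals

# Synthetic example: true prediction at x_new = 1 is 2 + 3 * 1 = 5
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 1))
y = 2.0 + 3.0 * X[:, 0] + 0.5 * rng.normal(size=40)
center, ci = bootstrap_interval(X, y, x_new=[1.0])
```

Resampling whole years keeps each predictor/predictand pair together; a residual bootstrap would be an alternative when the predictors are treated as fixed.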

## 4. Downscaling NC summer rainfall

In this section, we use the C_CVSR downscaling scheme to establish separate models relating large-scale predictors to NC summer rainfall at the interannual and interdecadal time scales.

### a. Calibrating the IAM

The interannual correlations between the well-known climate indices and NC summer rainfall are shown in Table 2. The interannual components of the June NAOI (NAO_{A}) and Niño-3 index (Niño3_{A}) are significantly correlated with rainfall; thus, these two indices are taken as candidate predictors for modeling the interannual rainfall variability. To seek other possible predictors over the global domain, the interannual correlations between the detrended SLP, V850, Z500, and SST fields and rainfall during 1951–90 are calculated (Fig. 3). Previous studies indicated that interannual rainfall variability is associated with circulation systems including the low-level meridional wind over East China (Huang et al. 1999), the mid-high-latitude circulation over Eurasia (Zhao and Song 1999), the Mascarene high and Australian high (Xue 2005), and the Somali Jet (Wang and Xun 2003); indeed, high correlation coefficients appear over these areas, as indicated by the rectangles in Fig. 3. Potential predictors for the interannual rainfall variability are calculated by averaging the field values over the areas within the marked rectangles where the correlation coefficients exceed 0.4 (significant at the 0.01 level); their details are listed in Table 3. Each of these nine potential predictors is significantly correlated with NC summer rainfall at the 0.01 level.
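The predictor-identification stage can be sketched as a grid-point correlation map followed by an area average over the strongly correlated cells inside a chosen rectangle. This is a minimal sketch assuming linear detrending, an unweighted average (no cos-latitude weighting), and hypothetical function names; the rectangles themselves are chosen by inspection in the text. For n = 40, the 0.4 threshold is roughly the two-sided 1% critical value of a correlation coefficient.

```python
import numpy as np

def detrend(x):
    """Remove a least-squares linear trend from a 1D series."""
    t = np.arange(len(x))
    return x - np.polyval(np.polyfit(t, x, 1), t)

def correlation_map(field, y):
    """Pearson correlation between detrended y (time,) and every grid cell
    of a detrended field (time, lat, lon)."""
    f = np.apply_along_axis(detrend, 0, np.asarray(field, dtype=float))
    yd = detrend(np.asarray(y, dtype=float))
    fa = f - f.mean(axis=0)
    ya = yd - yd.mean()
    cov = np.tensordot(ya, fa, axes=(0, 0))
    return cov / np.sqrt((ya ** 2).sum() * (fa ** 2).sum(axis=0))

def box_index(field, r, box_mask, r_min=0.4):
    """Average the field over cells inside box_mask (lat, lon) with |r| > r_min.
    Assumes the selected cells share one sign of correlation."""
    cells = box_mask & (np.abs(r) > r_min)
    return np.asarray(field, dtype=float)[:, cells].mean(axis=1)

# Synthetic example: one grid cell tracks the predictand closely
rng = np.random.default_rng(0)
y = rng.normal(size=40)
field = rng.normal(size=(40, 3, 4))
field[:, 0, 0] = 2.0 * y + 0.01 * rng.normal(size=40)
r = correlation_map(field, y)
```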

Figure 4a shows the complete CVSR screening procedure used to calibrate the IAM. The root-mean-square error (RMSE) between the observed and cross-validation estimated rainfall (CV_RMSE) measures the predictive performance of the potential predictors at each step. Because the well-known teleconnection indices represent large-scale signals and have explicit physical meaning, the significantly correlated indices (NAO_{A} and Niño3_{A}) are considered first in the CVSR procedure. At step 1, Niño3_{A} is selected because it yields the smaller CV_RMSE value (49.9 mm). At step 2, the CV_RMSE decreases after adding NAO_{A}, but this decrease in quadratic error is not statistically significant in terms of either the mean or the variance: the *t*- and *F*-test values are 0.33 and 1.1, less than the critical values of 1.4 and 1.6 at the 0.15 level. Thus, among the well-known indices, only Niño3_{A} enters the regression equation. The additional potential predictors are then considered in the following steps. At step 3, including the 850-hPa meridional wind over East China (Table 3) yields a statistically significant reduction in CV_RMSE, to 39.3 mm (*t*- and *F*-test values of 1.5 and 1.62, exceeding the critical values); thus, this predictor is selected as the second predictor. At step 4, including a further potential predictor reduces the CV_RMSE to 33.2 mm; however, this reduction in quadratic error is not statistically significant, so the CVSR screening procedure terminates.
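The screening logic described above can be sketched as forward stepwise selection driven by leave-one-out CV_RMSE, with candidate groups screened in order (well-known indices first, then the other potential predictors). This is a sketch of our reading of the text, not the exact formulation of appendix A: the significance test is assumed to be a one-sample *t* statistic on the reduction in squared errors together with an *F* ratio of error variances, the critical values (1.4 and 1.6, quoted above for the IAM) are passed in rather than derived, and all names, including the synthetic predictors, are hypothetical.

```python
import numpy as np

def loo_errors(X, y):
    """Leave-one-out prediction errors of an OLS fit with intercept.
    X has shape (n, p); p = 0 gives the climatology (mean-only) forecast."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    errs = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        errs[i] = y[i] - A[i] @ coef
    return errs

def cv_rmse(errs):
    return float(np.sqrt(np.mean(errs ** 2)))

def significant_gain(e_old, e_new, t_crit, f_crit):
    """Assumed test: the t statistic of the mean reduction in squared errors
    AND the F ratio of error variances must both exceed their critical values."""
    d = e_old ** 2 - e_new ** 2
    t = d.mean() / (d.std(ddof=1) / np.sqrt(len(d)))
    f = e_old.var(ddof=1) / e_new.var(ddof=1)
    return bool(t > t_crit and f > f_crit)

def cvsr(candidates, y, groups, t_crit=1.4, f_crit=1.6):
    """Forward stepwise selection by leave-one-out CV_RMSE.
    candidates: dict name -> (n,) array; groups: lists of names screened in order.
    Within a group, stop at the first non-significant step and move to the next group."""
    y = np.asarray(y, dtype=float)
    selected = []
    X = np.empty((len(y), 0))
    e_cur = loo_errors(X, y)
    for group in groups:
        remaining = [name for name in group if name not in selected]
        while remaining:
            trials = {name: loo_errors(np.column_stack([X, candidates[name]]), y)
                      for name in remaining}
            best = min(trials, key=lambda name: cv_rmse(trials[name]))
            if not significant_gain(e_cur, trials[best], t_crit, f_crit):
                break
            selected.append(best)
            X = np.column_stack([X, candidates[best]])
            e_cur = trials[best]
            remaining.remove(best)
    return selected

# Synthetic example: "nao" is pure noise, "nino" and "wind" carry signal
rng = np.random.default_rng(0)
nino, nao, wind = rng.normal(size=(3, 40))
y = -3.0 * nino + 1.0 * wind + 0.3 * rng.normal(size=40)
chosen = cvsr({"nino": nino, "nao": nao, "wind": wind}, y,
              groups=[["nino", "nao"], ["wind"]])
```

Screening the indices as a separate first group mirrors the priority given to the well-known indices in the text; a single pooled group would instead let any potential predictor enter first.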

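As a result of the CVSR screening procedure, Niño3_{A} and the interannual component of the JA 850-hPa meridional wind over East China are selected as the two predictors of the regression equation; both regression coefficients are significant at the 0.05 level. The IAM takes the form

$$\hat{R}_A(t) = a_0 + a_1 x_1(t) + a_2 x_2(t), \tag{2}$$

where $\hat{R}_A(t)$ is the downscaled interannual component of rainfall in the *t*th year over 1951–90; $x_1(t)$ and $x_2(t)$ are the *t*th observed values of the normalized Niño3_{A} index and the normalized East China 850-hPa meridional wind predictor, respectively; and $a_0$, $a_1$, and $a_2$ are the regression coefficients fitted over 1951–90.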

Figure 5a shows the observed interannual rainfall variation and that downscaled with the IAM [Eq. (2)]. The IAM represents the observations relatively accurately, even in the independent validation period. Table 4 summarizes this skill in terms of the correlation coefficient, the RMSE, and the ratio of the RMSE to the climatological rainfall (base period 1951–2008) between the downscaled and observed values. The correlation coefficient and RMSE are 0.76 and 34.2 mm (11.1%) in the training period and 0.71 and 42.8 mm (13.9%) in the independent validation period.
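The three skill measures in Table 4 can be computed as below; `skill` is a hypothetical name. As a consistency check, the 1951–2008 climatology implied by the subperiod means quoted in section 4c is (40 × 320.5 + 18 × 282.8)/58 = 308.8 mm, and RMSEs of 34.2 and 42.8 mm are then 11.1% and 13.9% of it, matching the values above.

```python
import numpy as np

def skill(obs, pred, climatology):
    """Correlation, RMSE, and RMSE as a percentage of the climatological mean."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    r = np.corrcoef(obs, pred)[0, 1]
    rmse = np.sqrt(np.mean((pred - obs) ** 2))
    return r, rmse, 100.0 * rmse / climatology

# Climatology for 1951-2008 from the two subperiod means (40 and 18 years)
clim = (40 * 320.5 + 18 * 282.8) / 58
print(round(clim, 1), round(100 * 34.2 / clim, 1), round(100 * 42.8 / clim, 1))
# prints: 308.8 11.1 13.9
```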

A physically meaningful downscaling model should have physically interpretable relationships between its predictors and rainfall. We therefore explore the possible physical linkages between the interannual rainfall variation and the two predictors (Niño3_{A} and the East China 850-hPa meridional wind predictor) using data from the whole period 1951–2008.

The first predictor, Niño3_{A}, is the interannual component of the June Niño-3 index, representing the interannual variation in June SST over the central-eastern tropical Pacific. When there is anomalous warming there (i.e., a positive Niño3_{A} anomaly), large-scale anomalous cooling appears over the western tropical Pacific, and this El Niño–like anomaly pattern can persist throughout JA (Fig. 6a). As an atmospheric Rossby wave response to the large-scale cooling in the western tropical Pacific, an anomalous meridional tripole pattern is induced in the lower to middle troposphere over the western Pacific (Fig. 6b), analogous to the Pacific–Japan or East Asia–Pacific teleconnection pattern. Figures 6c,d show the horizontal and meridional circulation responses: an anomalously strong western Pacific subtropical high (WPSH) is located at about 25°N, and anomalous northeasterlies at its southern boundary meet northeastward cross-equatorial flows, giving rise to anomalous convergence and ascent; at its northwestern boundary, anomalous northward flows meet the southward flows induced by the cyclonic anomaly over the North Pacific and Northeast Asia, leading to anomalous convergence and ascent at about 32°N. This anomalous circulation structure, which is consistent with previous studies (Huang and Wu 1989; Lu 2005; Nitta 1987), places NC under the influence of cold, dry flows descending from Northeast Asia and suppresses precipitation over NC. Conversely, when there is anomalous cooling over the central-eastern tropical Pacific in June (i.e., a negative Niño3_{A} anomaly), the circulation described above reverses, favoring a wet summer over NC.

The second predictor, the interannual component of the JA 850-hPa meridional wind over East China, is a regional predictor. Figure 7 shows the interannual correlations between the detrended negative of this predictor and the geopotential height at 850, 500, and 200 hPa. Associated with the predictor, a quasi-barotropic geopotential height anomaly appears, with an anticyclonic anomaly over central Asia and Mongolia and a cyclonic anomaly over the northwestern Pacific, corresponding to anomalous northeasterlies over East China. These anomalous northeasterlies prevent warm, moist air from being transported to NC, leading to a dry summer. Thus, NC summer rainfall is closely associated with the low-level meridional wind at the interannual time scale, which modulates the transport of warm, humid air from the South China Sea and the western Pacific. This result is consistent with the previous study by Huang et al. (1999). However, the underlying driver of the quasi-barotropic pressure anomaly in the lower, middle, and upper troposphere is unclear and deserves further investigation.

### b. Calibrating the IDM

The procedure for calibrating the IDM is the same as that for the IAM. The June SAMI and PDOI show significant correlations with rainfall at the interdecadal time scale (Table 2); thus, their interdecadal components (SAM_{D} and PDO_{D}) are considered candidate predictors for the IDM. Figure 8 shows the interdecadal correlations between the detrended SLP, V850, Z500, and SST fields and rainfall over the training period 1951–90; strongly correlated areas are denoted by rectangles. Within the rectangles, the areas where the correlation coefficients exceed 0.8 in absolute value are averaged to form four potential predictors, whose details are listed in Table 3. These four potential predictors are strongly associated with the interdecadal rainfall variation, with correlation coefficients ranging from −0.90 to −0.95, significant at the 0.01 level after adjusting the degrees of freedom.
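Strongly smoothed interdecadal series have far fewer independent samples than years, which is why the degrees of freedom must be adjusted before testing these correlations. The text does not state which adjustment was used; the sketch below assumes a common approximation based on the lag-1 autocorrelations r1 and r2 of the two series, N_eff = N(1 − r1 r2)/(1 + r1 r2), and the function names are hypothetical.

```python
import numpy as np

def lag1_autocorr(x):
    """Lag-1 autocorrelation of a 1D series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.sum(x[:-1] * x[1:]) / np.sum(x * x)

def effective_n(n, r1, r2):
    """Effective sample size for correlating two series whose lag-1
    autocorrelations are r1 and r2 (assumed approximation)."""
    return n * (1.0 - r1 * r2) / (1.0 + r1 * r2)

def corr_t_stat(r, n_eff):
    """t statistic of a correlation r with n_eff - 2 degrees of freedom."""
    return r * np.sqrt((n_eff - 2.0) / (1.0 - r ** 2))

# Example: two smooth series with lag-1 autocorrelation 0.9 over 40 years
n_eff = effective_n(40, 0.9, 0.9)   # 40 * 0.19 / 1.81, about 4.2
```

The resulting *t* statistic would then be compared with the critical value for n_eff − 2 degrees of freedom rather than n − 2.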

Figure 4b shows the CVSR screening procedure for calibrating the IDM. The significantly correlated well-known indices (SAM_{D} and PDO_{D}) are considered first. At step 1, SAM_{D} is selected because of its smaller CV_RMSE value. Adding PDO_{D} at step 2 yields no significant reduction in CV_RMSE (*t*- and *F*-test values of 1.2 and 1.3, respectively, less than the critical values of 1.46 and 2.0 at the 0.15 level). The additional potential predictors are then considered in the subsequent steps. At step 3, adding the June SLP over the southwestern Indian Ocean to SAM_{D} yields a statistically significant reduction in CV_RMSE (*t*- and *F*-test values of 2.4 and 2.05); thus, this SLP predictor is selected as the second predictor. At step 4, including a further potential predictor reduces the CV_RMSE, but the reduction is not statistically significant, so the CVSR screening procedure terminates. At this point, SAM_{D} and the SLP predictor are in the regression equation. However, the regression coefficient of SAM_{D} is not significant at the 0.05 level, so SAM_{D} is excluded. Finally, only the SLP predictor is retained, and the IDM takes the form

$$\hat{R}_D(t) = b_0 + b_1 z(t), \tag{3}$$

where $\hat{R}_D(t)$ is the downscaled interdecadal component of rainfall in the *t*th year over 1951–90, $z(t)$ is the *t*th observed value of the normalized interdecadal component of June SLP over the southwestern Indian Ocean, and $b_0$ and $b_1$ are the regression coefficients fitted over 1951–90.
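The final pruning step, dropping a selected predictor whose regression coefficient is not significant, can be sketched with ordinary least squares *t* statistics. This is a minimal sketch with hypothetical names; the critical value is passed in (for about 37 degrees of freedom, the two-sided 5% value is about 2.03), and for interdecadal series it may need to account for reduced effective degrees of freedom.

```python
import numpy as np

def ols_t_stats(X, y):
    """OLS fit with intercept; returns coefficients and their t statistics."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (n - p - 1)           # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)          # coefficient covariance
    return coef, coef / np.sqrt(np.diag(cov))

def prune(X, y, names, t_crit=2.03):
    """Repeatedly drop the predictor with the smallest |t| while it is below t_crit."""
    names = list(names)
    X = np.asarray(X, dtype=float).reshape(len(y), -1)
    while names:
        _, t = ols_t_stats(X, y)
        j = int(np.argmin(np.abs(t[1:])))          # skip the intercept
        if abs(t[1 + j]) >= t_crit:
            break
        names.pop(j)
        X = np.delete(X, j, axis=1)
    return names

# Synthetic example: only the first column carries signal
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
y = 1.0 + 4.0 * X[:, 0] + 0.3 * rng.normal(size=40)
kept = prune(X, y, ["a", "b"])
```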

Figure 5b shows the observed interdecadal rainfall variation and that downscaled with Eq. (3). Table 4 lists the correlation coefficients, RMSE, and ratio of RMSE to the climatological rainfall: 0.95 and 16.2 mm (5.2%) in the training period and 0.84 and 23.1 mm (7.5%) in the test period, indicating relatively high skill in predicting the interdecadal variability with the southwestern Indian Ocean SLP predictor.

We now interpret why this predictor [i.e., the interdecadal component of June SLP over the southwestern Indian Ocean (IO)] is associated with the interdecadal rainfall variation. Over the past 50 years, the predictor has shown a pronounced increasing trend (Fig. 9d). Figure 9a shows the interdecadal correlations between the detrended predictor and the June surface temperature field, revealing a positive association between the predictor and the surface temperature over the western tropical Pacific and eastern tropical IO (i.e., the warm pool). Since the tropical IO has warmed anomalously during the past decades, as indicated by previous studies (Ding et al. 2010; Du and Xie 2008; Li et al. 2008; Zhou et al. 2009a,b), this warming is likely to have triggered the multidecadal increase in SLP over the southwestern IO via an anomalous zonal circulation. Moreover, because the ocean is a slowly varying medium, the warm anomaly over the warm pool may be responsible for the persistence of the SLP anomaly and for the anomalous circulation over East Asia throughout the following JA.

Figure 9b presents the June surface circulation associated with a positive anomaly of the predictor. Associated with the SLP increase over the southwestern IO, anomalous northward cross-equatorial flows appear at about 50°–70°E. Meanwhile, the anomalous warming over the warm pool favors anomalous northerlies over East China, as reported by previous studies (Li et al. 2010; Zhou et al. 2009b). The anomalous northerlies meet the enhanced cross-equatorial southerlies, intensifying convergence in the intertropical convergence zone over the western Pacific. The enhanced convergence and ascent strengthen the meridional circulation, which is clearly seen in a latitude–height section averaged over 100°–140°E (Fig. 9c). This anomalous meridional circulation places NC under anomalous descent and a moisture deficit, leading to a dry summer. Figure 9d shows the normalized time series and linear trends of the predictor and of the interdecadal rainfall component, clearly indicating their out-of-phase relationship.

### c. Downscaled total summer rainfall over NC

The downscaled total rainfall is obtained straightforwardly by summing the downscaled values from the IAM and IDM. Figure 5c compares the downscaled and observed NC summer rainfall. Because the IAM and IDM are derived from data over the training period 1951–90, the predicted values after 1990 indicate the true predictive skill. In general, the performance of the TSD approach in the training period is maintained during the subsequent validation period. The observed climatological rainfall is 320.5 mm during 1951–90 and 282.8 mm during 1991–2008, and the downscaled results reproduce these accurately, with 320.4 and 280.4 mm, respectively. Table 4 summarizes the downscaling skill quantitatively. The correlation coefficients between the downscaled and observed rainfall are 0.83 for the training period and 0.82 for the test period, both significant at the 0.01 level. The RMSE (and its ratio to the climatological rainfall) is 39.5 mm (12.8%) in the training period and 45.8 mm (14.8%) in the test period. These results indicate that the TSD approach performs well in downscaling NC summer rainfall.

We have also compared the downscaling skill of the TSD approach with that of a single model (without time-scale decomposition) based on the C_CVSR downscaling scheme of Guo et al. (2011, manuscript submitted to *J. Geophys. Res.*). The TSD approach has better skill, with a higher correlation (0.82 versus 0.59) and a lower RMSE (45.8 versus 60.8 mm) between the downscaled and observed rainfall in the test period 1991–2008. This improvement demonstrates the advantage of the TSD model.

## 5. Application to climate change simulations

We apply the downscaling models to predictors derived from GCM simulations of both the present-day and future climate. Before doing so, we evaluate the GCM simulations of the predictors used in the IAM and IDM and select the best-performing GCMs. Of the 21 CMIP3 GCMs examined, only three (CSIRO Mk3.5, CNRM-CM3, and MPI ECHAM5) are finally selected, on the basis that they simulate the long-term mean values and linear trends of the predictors well. Table 5 lists the simulated values of all predictors for the present-day climate and for the future climate under the A1B emission scenario from the three selected GCMs and their ensemble mean. With the GCM-generated predictors, rainfall is estimated for the present-day climate (1951–99) and the near-future climate (2010–24 and 2035–49) under the A1B emission scenario.
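The GCM screening criterion (long-term mean and linear trend of each predictor) can be sketched as two simple diagnostics compared against the reanalysis. The thresholds below, a mean bias within half a reanalysis standard deviation and a trend of the same sign within 50% of the reanalysis trend, are hypothetical placeholders: the text does not state the thresholds used to retain CSIRO Mk3.5, CNRM-CM3, and MPI ECHAM5.

```python
import numpy as np

def mean_and_trend(x):
    """Long-term mean and least-squares linear trend (units per year)."""
    x = np.asarray(x, dtype=float)
    slope = np.polyfit(np.arange(len(x)), x, 1)[0]
    return x.mean(), slope

def passes_screen(gcm, reanalysis, mean_tol=0.5, trend_tol=0.5):
    """Hypothetical rule: |mean bias| <= mean_tol * reanalysis std, and the trend
    has the same sign with relative error <= trend_tol."""
    m_g, s_g = mean_and_trend(gcm)
    m_r, s_r = mean_and_trend(reanalysis)
    mean_ok = abs(m_g - m_r) <= mean_tol * np.std(reanalysis)
    trend_ok = np.sign(s_g) == np.sign(s_r) and abs(s_g - s_r) <= trend_tol * abs(s_r)
    return bool(mean_ok and trend_ok)

# Synthetic reanalysis predictor: an SLP-like series with an upward trend
ref = 1010.0 + 0.05 * np.arange(50)
```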

To downscale the GCM outputs to NC summer rainfall, we update the IAM and IDM using observed rainfall and NCEP data from 1951 to 2008. The GCM-generated predictors are then substituted into the updated forecast equations. Figure 10 compares the observed, raw GCM-simulated, and downscaled long-term mean rainfall for 1951–99, 2010–24, and 2035–49. Each downscaled future value is accompanied by 50% and 95% confidence intervals (horizontal lines in Fig. 10), which indicate the uncertainty of the downscaling model as estimated by the bootstrap approach.

For the present-day climate (1951–99), all of the raw GCMs clearly underestimate the rainfall, while the downscaled values slightly overestimate it. In all cases, the downscaled values have a smaller percentage error than the raw GCM simulations: CSIRO Mk3.5 (+3% cf. −12%), CNRM-CM3 (+14% cf. −25%), MPI ECHAM5 (+17% cf. −21%), and the ensemble mean (+11% cf. −19%).

For the future, the raw simulations from CNRM-CM3, MPI ECHAM5, and the ensemble mean project a slow decrease in rainfall until 2010–24, followed by a slight increase (approaching the present-day state) until 2035–49, whereas CSIRO Mk3.5 projects the opposite. The downscaled values, however, differ from the raw GCM projections. Except for the values downscaled from MPI ECHAM5, which indicate a continuous decrease in rainfall, the downscaled values indicate a slight increase until 2010–24, followed by a slight decrease (still wetter than the present-day state) until 2035–49. These results suggest that large changes relative to the present-day state, such as severe long-term droughts or floods, are less likely over the next 40 years under the A1B emission scenario. The wetting projected by the majority of downscaled values is consistent with the projection from a regional climate model (Gao et al. 2008), which lends more confidence to this projection.

## 6. Summary and discussion

In this paper, we have proposed a TSD approach to downscale NC summer rainfall by modeling its interannual and interdecadal variability with the IAM and IDM, respectively. The IAM links the interannual components of the June Niño-3 index and the JA meridional wind over East China to the interannual rainfall variability, while the IDM links the interdecadal component of SLP over the southwestern IO to the interdecadal rainfall variability. Both models show good skill in downscaling their respective components of NC summer rainfall variability, and the downscaled total rainfall is obtained by summing the two downscaled components. The results indicate that the TSD approach has relatively high predictive skill for NC summer rainfall.
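The decompose, model, and recombine structure of the TSD approach can be sketched as below. The filter that separates the interdecadal from the interannual component is an assumption here (a centered running mean of hypothetical length 9 yr, with edge padding) and is not necessarily the one used in this paper; each component also uses a single predictor and an in-sample least squares fit for brevity, whereas the IAM in the paper has two predictors. All function names are hypothetical.

```python
import numpy as np

def split_time_scales(series, window=9):
    """Split a series into (interannual, interdecadal) parts.

    Assumed filter: the interdecadal part is a centered running mean of odd
    length `window` (ends padded by repeating the edge values); the
    interannual part is the remainder, so the two always sum to the input.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    x = np.asarray(series, dtype=float)
    half = window // 2
    padded = np.pad(x, half, mode="edge")
    low = np.convolve(padded, np.ones(window) / window, mode="valid")
    return x - low, low

def fit_line(x, y):
    """Ordinary least squares fit y = a + b*x; returns array [a, b]."""
    A = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def tsd_downscale(rain, ia_predictor, id_predictor, window=9):
    """In-sample TSD estimate: IAM-like fit on interannual parts plus IDM-like fit on interdecadal parts."""
    rain_ia, rain_id = split_time_scales(rain, window)
    x_ia, _ = split_time_scales(ia_predictor, window)
    _, x_id = split_time_scales(id_predictor, window)
    a = fit_line(x_ia, rain_ia)  # stand-in for the IAM
    b = fit_line(x_id, rain_id)  # stand-in for the IDM
    # Downscaled total = downscaled interannual + downscaled interdecadal component.
    return (a[0] + a[1] * x_ia) + (b[0] + b[1] * x_id)
```

Because the two filtered parts sum exactly to the original series, adding the two component predictions gives a total-rainfall estimate on the original scale.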

We have also applied the downscaling model to GCM-generated predictors and estimated the long-term rainfall conditions for both the present-day (1951–99) and the near-future climates (2010–24 and 2035–49) under the A1B emission scenario. For the present-day climate, the downscaled values showed smaller percentage errors than the raw GCM simulations in all cases. This indicates that the downscaled predictions represent the present-day climate more reliably, which suggests that they may also better represent the future climate. For the future, a majority of downscaled values indicated a slight increase in rainfall, unlike the raw GCM projections. The results also indicated that large changes relative to the present-day state, such as severe long-term droughts or floods, would be less likely over the next 40 years under the A1B emission scenario.

We note that the downscaling models in this study were calibrated with NCEP-1 reanalysis data. Because the NCEP-1 data may contain systematic errors before 1970 (Greatbatch and Rong 2006), the reliability of the proposed downscaling models should be verified with other reanalysis data. We therefore repeated the analysis using 40-yr European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40) data (1958–2002) extended with NCEP-1 reanalysis data (2003–08). For the IAM, the results based on the two datasets were almost identical; for the IDM, the predictor region selected from the ERA-40 data was somewhat smaller, and the downscaling skill was slightly lower. Overall, the downscaling models calibrated from the two reanalysis datasets showed broadly similar skill, supporting the reliability of our downscaled results.

It should be kept in mind that the reliability of downscaled future projections depends strongly on the GCMs' simulations of the predictors, so it is important to evaluate these simulations. An evaluation of the 21 CMIP3 GCMs for the predictors used in the IAM and IDM revealed large errors in the long-term-mean values, linear trends, and interannual variability. Improving GCM simulations is therefore important for statistical downscaling to yield reliable projections.

It should be noted that the downscaling method in this paper represents only the rainfall changes linked to changes in circulation. Previous studies have indicated the need to include humidity-related parameters when projecting future rainfall changes, because moisture would change markedly in response to future changes in radiative forcing (Benestad 2001; Charles et al. 1999; Crane and Hewitson 1998; Karl et al. 1990; Spak et al. 2007; Von Storch et al. 1993). In the present analysis, some humidity changes may be captured through changes in the circulation field, insofar as these affect the direction of moisture transport; however, large-scale humidity changes associated with global warming are not accounted for. This latter effect is difficult to incorporate here because GCMs cannot supply reliable simulations of the humidity-related predictor selected by the correlation analysis.

As with other statistical downscaling models, the underlying stationarity assumption may be questionable. Previous studies have emphasized the importance of assessing whether the statistical relationships remain robust in the future (Paul et al. 2008; Wilby and Wigley 1997). Downscaling models built on principal climate modes can test whether those modes persist under changed climate conditions; however, such a test cannot be performed for our model because its predictors are identified by correlation analysis. Nevertheless, the relationships established in this paper are physically interpretable, which strengthens our confidence in the downscaled future results.

Finally, the downscaled rainfall projections should be interpreted with caution because they inevitably contain some uncertainty. Consistent projections from additional types of downscaling models or from regional climate models would make them more reliable. Future work will produce projections under other emission scenarios, such as A2 and B1, to assess rainfall conditions under different emission scenarios.

## Acknowledgments

We acknowledge the international modeling groups for providing their data for analysis, the Program for Climate Model Diagnosis and Intercomparison (PCMDI) for collecting and archiving the model data, and the World Climate Research Programme’s (WCRP’s) Coupled Model Intercomparison Project for organizing the model data analysis activity. We thank two anonymous reviewers for their comments. This work is jointly supported by the 973 Program (2010CB950400), NSFC Key Project (41030961), and the Australia-China Bilateral Climate Change Partnerships Program of the Australian Department of Climate Change. Yun Li was also supported by the Indian Ocean Climate Initiative Project of the Western Australian State Government.

### APPENDIX A

#### Cross-Validation-Based Stepwise Regression Approach

The CVSR approach is a “forward” stepwise screening procedure that selects the “optimal” predictors from the potential predictor set. It employs leave-one-out cross validation to select robust predictors and to reduce the chance of selecting spurious ones. The root-mean-square error between the observations and the cross-validated estimates (CV_RMSE) is the criterion used to evaluate the performance of each potential predictor.
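The CV_RMSE criterion can be computed by refitting the regression once per left-out year. A minimal NumPy sketch, assuming ordinary least squares with an intercept; `cv_rmse` is a hypothetical helper name.

```python
import numpy as np

def cv_rmse(X, y):
    """Leave-one-out cross-validated RMSE of an OLS regression with intercept."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D input as a single predictor
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    errors = np.empty(n)
    for t in range(n):
        keep = np.arange(n) != t  # all observations except the t-th one
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        errors[t] = y[t] - A[t] @ coef  # error of the cross-validated estimate for year t
    return np.sqrt(np.mean(errors ** 2))
```

For OLS, each leave-one-out error is at least as large in magnitude as the corresponding in-sample residual, so CV_RMSE is never smaller than the in-sample RMSE; this is what makes it a stricter screening criterion than the in-sample fit.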

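The CVSR method builds, through a series of iteration steps, a model of the general form

$$Y(t) = \beta_0 + \sum_{i=1}^{p} \beta_i X_{(i)}(t) + \varepsilon(t), \qquad t = 1, \ldots, n, \tag{A1}$$

where $Y(t)$ is the predictand for year $t$ of the $n$-yr training period; $X_{(i)}(t)$ is the $t$th observation of the predictor selected from the $m$ candidate predictors $X_1, \ldots, X_m$ at the $i$th step of the forward stepwise screening; $\beta_0, \ldots, \beta_p$ are model parameters; and $\varepsilon(t)$ is the error of the estimated model (A1). Specifically, model (A1) is established by the following steps, which select $p$ ($p < m$) predictors.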

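- Step 1: Regress the predictand $Y$ onto each of the $m$ potential predictors $X_j$ ($j = 1, \ldots, m$) to obtain the 1-predictor regression equation $\hat{Y} = a_0 + a_1 X_j$. The performance of each 1-predictor regression equation is measured by the CV_RMSE at step 1,
  $$\mathrm{CV\_RMSE}_1(j) = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left[Y(t) - \hat{Y}_{-t}(t)\right]^2},$$
  where the regression equation $\hat{Y}_{-t}$ is fitted to $\{[X_j(s), Y(s)]: s \ne t\}$, that is, all observations excluding the $t$th one. If $\mathrm{CV\_RMSE}_1(j_1)$ is the smallest CV_RMSE achieved at step 1, that is, $\mathrm{CV\_RMSE}_1(j_1) = \min_j \mathrm{CV\_RMSE}_1(j)$, the potential predictor $X_{j_1}$ is selected as the first predictor, that is, $X_{(1)} = X_{j_1}$.
- Step 2: Regress $Y$ onto $X_{(1)}$ and each of the remaining $m - 1$ potential predictors $X_j$ ($j \ne j_1$), that is, all potential predictors except $X_{j_1}$, to obtain the 2-predictor regression equation $\hat{Y} = a_0 + a_1 X_{(1)} + a_2 X_j$. The performance of each 2-predictor regression equation is measured by the CV_RMSE at step 2, $\mathrm{CV\_RMSE}_2(j)$, defined as above but with $\hat{Y}_{-t}$ fitted to $\{[X_{(1)}(s), X_j(s), Y(s)]: s \ne t\}$. If $\mathrm{CV\_RMSE}_2(j_2) = \min_{j \ne j_1} \mathrm{CV\_RMSE}_2(j)$ is the smallest CV_RMSE achieved at step 2 and, moreover, it is significantly smaller than $\mathrm{CV\_RMSE}_1(j_1)$, the potential predictor $X_{j_2}$ is selected as the second predictor, that is, $X_{(2)} = X_{j_2}$; otherwise, no further predictors are selected. To test whether the reduction from $\mathrm{CV\_RMSE}_1(j_1)$ to $\mathrm{CV\_RMSE}_2(j_2)$ is significant, *t* and *F* tests are applied to the squared-error series between the observations and the cross-validated estimates obtained at step 2 [i.e., $e_2(t) = [Y(t) - \hat{Y}_{-t}(t)]^2$, where $\hat{Y}_{-t}$ is the step-2 equation fitted without year $t$] and at step 1 [i.e., $e_1(t)$, defined analogously with the step-1 equation], comparing their mean values and variances.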
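The significance check on the squared-error series might be implemented as follows. This is a sketch under stated assumptions: the appendix does not specify whether the *t* test is paired or two-sample, the significance level, or whether both tests must pass, so we assume a one-sided paired *t* test on the means (the two series refer to the same years), a one-sided *F* test on the variance ratio, α = 0.05, and require both. It uses SciPy (`scipy.stats.ttest_rel` with the `alternative` argument, available in SciPy 1.6 and later, and the `scipy.stats.f` distribution); `significantly_smaller` is a hypothetical name.

```python
import numpy as np
from scipy import stats

def significantly_smaller(sq_err_new, sq_err_old, alpha=0.05):
    """Test whether squared cross-validation errors at step k are significantly
    smaller than those at step k - 1 (assumed criterion: both tests must pass)."""
    a = np.asarray(sq_err_new, dtype=float)
    b = np.asarray(sq_err_old, dtype=float)
    # One-sided paired t test: is the mean squared error at the new step smaller?
    p_mean = stats.ttest_rel(a, b, alternative="less").pvalue
    # One-sided F test: is the variance of the new squared errors smaller?
    f_ratio = np.var(a, ddof=1) / np.var(b, ddof=1)
    p_var = stats.f.cdf(f_ratio, len(a) - 1, len(b) - 1)
    return bool(p_mean < alpha and p_var < alpha)
```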

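Generally, at step *k*, assume that *k* − 1 predictors $X_{(1)}, \ldots, X_{(k-1)}$ have been selected from the original potential predictors $X_1, \ldots, X_m$, and that the associated smallest CV_RMSE at step *k* − 1 is

$$\mathrm{CV\_RMSE}_{k-1}(j_{k-1}) = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left[Y(t) - \hat{Y}_{-t}(t)\right]^2},$$

where the regression equation $\hat{Y}_{-t} = a_0 + \sum_{i=1}^{k-1} a_i X_{(i)}$ is fitted to $\{[X_{(1)}(s), \ldots, X_{(k-1)}(s), Y(s)]: s \ne t\}$.

Step *k*: Regress $Y$ onto $X_{(1)}, \ldots, X_{(k-1)}$ and each of the remaining $m - (k-1)$ potential predictors $X_j$ to obtain the *k*-predictor regression equation $\hat{Y} = a_0 + \sum_{i=1}^{k-1} a_i X_{(i)} + a_k X_j$. The performance of each *k*-predictor regression equation is measured by the CV_RMSE at step *k*,

$$\mathrm{CV\_RMSE}_{k}(j) = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left[Y(t) - \hat{Y}_{-t}(t)\right]^2},$$

where $\hat{Y}_{-t}$ is fitted to $\{[X_{(1)}(s), \ldots, X_{(k-1)}(s), X_j(s), Y(s)]: s \ne t\}$. If $\mathrm{CV\_RMSE}_k(j_k)$ is the smallest CV_RMSE achieved at step *k*, that is, $\mathrm{CV\_RMSE}_k(j_k) = \min_j \mathrm{CV\_RMSE}_k(j)$ over the remaining predictors, and, moreover, it is significantly smaller than $\mathrm{CV\_RMSE}_{k-1}(j_{k-1})$, the potential predictor $X_{j_k}$ is selected as the *k*th predictor, that is, $X_{(k)} = X_{j_k}$; otherwise, no further predictors are selected. The *t* and *F* tests are applied to the squared-error series between the observations and the cross-validated estimates obtained at step *k* [i.e., $e_k(t) = [Y(t) - \hat{Y}_{-t}(t)]^2$ with the step-*k* equation] and at step *k* − 1 [i.e., $e_{k-1}(t)$ with the step-(*k* − 1) equation], comparing their mean values and variances.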
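The forward screening loop of steps 1 to *k* can be sketched with NumPy as below. For brevity, the `improves` check defaults to a simple decrease in the mean squared cross-validation error (equivalently, in CV_RMSE) rather than the *t* and *F* significance tests described above; a significance test can be supplied instead as `improves(new_sq, old_sq)`. Function names are hypothetical, and the first predictor is always accepted, as in step 1.

```python
import numpy as np

def loo_errors(X, y):
    """Leave-one-out errors Y(t) - Y_hat_{-t}(t) for OLS with an intercept."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])
    err = np.empty(n)
    for t in range(n):
        keep = np.arange(n) != t
        coef, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        err[t] = y[t] - A[t] @ coef
    return err

def cvsr_forward(X, y, improves=None):
    """Forward CVSR sketch; returns selected column indices and the final CV_RMSE."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if improves is None:
        # Simplified stand-in for the t and F tests: any drop in the
        # mean squared cross-validation error counts as an improvement.
        improves = lambda new_sq, old_sq: new_sq.mean() < old_sq.mean()
    selected, best_sq = [], None
    remaining = list(range(X.shape[1]))
    while remaining:
        # Squared LOO errors for each candidate added to the current predictor set.
        trial = {j: loo_errors(X[:, selected + [j]], y) ** 2 for j in remaining}
        j_best = min(trial, key=lambda j: trial[j].mean())
        if best_sq is not None and not improves(trial[j_best], best_sq):
            break  # no (significant) reduction: stop selecting predictors
        selected.append(j_best)
        remaining.remove(j_best)
        best_sq = trial[j_best]
    score = float(np.sqrt(best_sq.mean())) if best_sq is not None else float("nan")
    return selected, score
```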

Finally, an *F* test is applied to the regression coefficients of all predictors selected by the CVSR procedure. Insignificant predictors are excluded, and the remaining predictors are used to fit the multiple linear regression equation by least squares.
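The final screening step can be sketched as a per-coefficient *F* test (the squared *t* statistic, with 1 and *n* − *p* − 1 degrees of freedom) followed by a least squares refit of the surviving predictors. The significance level α = 0.05, the one-pass (non-iterative) removal, and the helper names are assumptions; SciPy's `stats.f.sf` supplies the upper-tail probability.

```python
import numpy as np
from scipy import stats

def ols(A, y):
    """Least squares coefficients for design matrix A."""
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def screen_and_fit(X, y, alpha=0.05):
    """Drop predictors whose coefficients fail a partial F test, then refit by OLS."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef = ols(A, y)
    resid = y - A @ coef
    dof = n - p - 1
    sigma2 = resid @ resid / dof                            # residual variance estimate
    var_coef = sigma2 * np.diag(np.linalg.inv(A.T @ A))      # coefficient variances
    f_stat = coef[1:] ** 2 / var_coef[1:]                    # F(1, dof) for each slope
    p_values = stats.f.sf(f_stat, 1, dof)
    keep = [j for j in range(p) if p_values[j] < alpha]
    A_keep = np.column_stack([np.ones(n), X[:, keep]])       # intercept-only if keep is empty
    return keep, ols(A_keep, y)
```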

### APPENDIX B

#### Bootstrapping Prediction Intervals for Linear Regression Model

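In general, a linear regression model is defined as

$$Y(t) = \beta_0 + \sum_{i=1}^{p} \beta_i X_i(t) + \varepsilon(t), \tag{B1}$$

where $\varepsilon(t)$ is the residual error. A least squares fit to the *n*-yr training data {[**X**(*t*), *Y*(*t*)]: *t* = 1, … , *n*} yields

$$\hat{Y}(t) = \hat{\beta}_0 + \sum_{i=1}^{p} \hat{\beta}_i X_i(t). \tag{B2}$$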

The model is validated with independent data {[**X**(*t* + *h*), *Y*(*t* + *h*)]: *h* = 1, … , *N* − *n*} (*N* > *n*). To quantify the uncertainty of the rainfall downscaled with Eq. (B2) from $X_i(t+h)$ ($i = 1, \ldots, p$; $h = 1, \ldots, N-n$), we need to establish the cumulative distribution function *G* of the prediction error $\delta(t+h) = Y(t+h) - \hat{Y}(t+h)$. A $100(1-\alpha)\%$ prediction interval for $Y(t+h)$ is then given by

$$\left[\hat{Y}(t+h) + G^{-1}(\alpha/2),\ \hat{Y}(t+h) + G^{-1}(1-\alpha/2)\right].$$

However, as the distribution *F* of the residual variability in Eq. (B1) is unknown, we cannot obtain the distribution *G* analytically. We apply a bootstrap-resampling procedure to estimate the distribution *G*.

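First, the residuals $\hat{\varepsilon}(t) = Y(t) - \hat{Y}(t)$ ($t = 1, \ldots, n$) are calculated with Eq. (B2). For the independent test data {**X**(*t* + *h*): *h* = 1, … , *N* − *n*}, the predicted value is $\hat{Y}(t+h) = \hat{\beta}_0 + \sum_{i=1}^{p} \hat{\beta}_i X_i(t+h)$. The error distribution *F* is estimated by the empirical distribution of the residuals, which we denote *F _{n}*. This is then used to construct bootstrapped samples of the form

$$\{[\mathbf{X}(t), Y^{*}(t)]: t = 1, \ldots, n\} \quad \text{and} \quad Y^{*}(t+h),$$

with $Y^{*}(t) = \hat{Y}(t) + \varepsilon^{*}(t)$ and $Y^{*}(t+h) = \hat{Y}(t+h) + \varepsilon^{*}(t+h)$, where $\varepsilon^{*}(t)$ and $\varepsilon^{*}(t+h)$ are independently sampled from *F _{n}*, that is, they are randomly sampled with replacement from the set of residuals $\{\hat{\varepsilon}(1), \ldots, \hat{\varepsilon}(n)\}$. The superscript $^{*}$ denotes a value constructed for a particular bootstrap sample.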

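Each bootstrapped sample is used to calculate simulated estimates $\hat{\beta}_i^{*}$ (by refitting Eq. (B2) to $\{[\mathbf{X}(t), Y^{*}(t)]\}$), a predicted value $\hat{Y}^{*}(t+h) = \hat{\beta}_0^{*} + \sum_{i=1}^{p} \hat{\beta}_i^{*} X_i(t+h)$, and a prediction error $\delta^{*}(t+h) = Y^{*}(t+h) - \hat{Y}^{*}(t+h)$. The empirical distribution of $\delta^{*}(t+h)$, which we denote $G^{*}$, is then an estimate of the distribution of the bootstrap prediction errors and can be used as the distribution function *G*. Therefore, a $100(1-\alpha)\%$ prediction interval for *Y*(*t* + *h*) can be estimated as

$$\left[\hat{Y}(t+h) + G^{*-1}(\alpha/2),\ \hat{Y}(t+h) + G^{*-1}(1-\alpha/2)\right].$$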

In our rainfall downscaling analysis, the 95% confidence interval of predicted rainfall, estimated from 1000 bootstrap samples using the independent test data over 1991–2008, is shown as dashed blue curves in Fig. 5c. Bootstrap estimates of the uncertainty (50% and 95% confidence intervals from 1000 bootstrap replications) of the downscaled future rainfall are shown in Fig. 10.
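The residual-bootstrap procedure of this appendix can be sketched in NumPy as follows. The sketch assumes an OLS model with intercept, raw (unrescaled) residuals, 1000 replications by default, and the 50% and 95% levels used in Fig. 10; the function name and random seed are hypothetical, and the paper's own implementation may differ in details.

```python
import numpy as np

def bootstrap_prediction_intervals(X_train, y_train, X_new, n_boot=1000,
                                   levels=(0.50, 0.95), seed=0):
    """Residual-bootstrap prediction intervals for an OLS model with intercept."""
    rng = np.random.default_rng(seed)
    X_train = np.asarray(X_train, dtype=float)
    X_new = np.asarray(X_new, dtype=float)
    if X_train.ndim == 1:  # single predictor given as 1-D arrays
        X_train, X_new = X_train[:, None], X_new[:, None]
    y_train = np.asarray(y_train, dtype=float)
    n, m = len(y_train), len(X_new)
    A = np.column_stack([np.ones(n), X_train])
    A_new = np.column_stack([np.ones(m), X_new])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)  # Eq. (B2)
    fitted = A @ beta
    resid = y_train - fitted                             # residuals define F_n
    y_new_hat = A_new @ beta                             # Y_hat(t+h)
    deltas = np.empty((n_boot, m))
    for b in range(n_boot):
        y_star = fitted + rng.choice(resid, size=n, replace=True)         # Y*(t)
        beta_star, *_ = np.linalg.lstsq(A, y_star, rcond=None)           # refit on the bootstrap sample
        y_new_star = y_new_hat + rng.choice(resid, size=m, replace=True)  # Y*(t+h)
        deltas[b] = y_new_star - A_new @ beta_star                        # delta*(t+h)
    intervals = {}
    for level in levels:
        # Quantiles of the empirical distribution G* of the bootstrap prediction errors.
        lo, hi = np.quantile(deltas, [(1 - level) / 2, (1 + level) / 2], axis=0)
        intervals[level] = (y_new_hat + lo, y_new_hat + hi)
    return y_new_hat, intervals
```

Resampling residuals (rather than whole year pairs) keeps the predictor values fixed, which matches the construction of $Y^{*}(t)$ and $Y^{*}(t+h)$ described above.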

## REFERENCES

CO₂ precipitation changes for the Susquehanna Basin: Downscaling from the Genesis general circulation model