  • Balsamo, G., et al., 2015: ERA-Interim/Land: A global land surface reanalysis data set. Hydrol. Earth Syst. Sci., 19, 389–407, https://doi.org/10.5194/hess-19-389-2015.
  • Bao, J., J. Feng, and Y. Wang, 2015: Dynamical downscaling simulation and future projection of precipitation over China. J. Geophys. Res., 120, 8227–8243, https://doi.org/10.1002/2015JD023275.
  • Castro, C. L., R. A. Pielke, and G. Leoncini, 2005: Dynamical downscaling: Assessment of value retained and added using the Regional Atmospheric Modeling System (RAMS). J. Geophys. Res., 110, D05108, https://doi.org/10.1029/2004JD004721.
  • Charles, S. P., B. C. Bates, and J. P. Hughes, 1999: A spatiotemporal model for downscaling precipitation occurrence and amounts. J. Geophys. Res., 104, 31 657–31 669, https://doi.org/10.1029/1999JD900119.
  • Chen, D., 2000: A monthly circulation climatology for Sweden and its application to a winter temperature case study. Int. J. Climatol., 20, 1067–1076, https://doi.org/10.1002/1097-0088(200008)20:10<1067::AID-JOC528>3.0.CO;2-Q.
  • Dee, D. P., et al., 2011: The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Quart. J. Roy. Meteor. Soc., 137, 553–597, https://doi.org/10.1002/qj.828.
  • Deidda, R., 2000: Rainfall downscaling in a space-time multifractal framework. Water Resour. Res., 36, 1779–1794, https://doi.org/10.1029/2000WR900038.
  • DelSole, T., and J. Shukla, 2009: Artificial skill due to predictor screening. J. Climate, 22, 331–345, https://doi.org/10.1175/2008JCLI2414.1.
  • Fan, L., Z. Yan, D. Chen, and C. Fu, 2015: Comparison between two statistical downscaling methods for summer daily rainfall in Chongqing, China. Int. J. Climatol., 35, 3781–3797, https://doi.org/10.1002/joc.4246.
  • Frost, A. J., et al., 2011: A comparison of multi-site daily rainfall downscaling techniques under Australian conditions. J. Hydrol., 408, 1–18, https://doi.org/10.1016/j.jhydrol.2011.06.021.
  • Fu, G., S. P. Charles, and S. Kirshner, 2013a: Daily rainfall projections from general circulation models with a downscaling nonhomogeneous hidden Markov model (NHMM) for south-eastern Australia. Hydrol. Processes, 27, 3663–3673, https://doi.org/10.1002/hyp.9483.
  • Fu, G., S. P. Charles, F. H. S. Chiew, J. Teng, H. Zheng, A. J. Frost, W. Liu, and S. Kirshner, 2013b: Modelling runoff with statistically downscaled daily site, gridded and catchment rainfall series. J. Hydrol., 492, 254–265, https://doi.org/10.1016/j.jhydrol.2013.03.041.
  • Hammami, D., T. S. Lee, T. B. M. J. Ouarda, and J. Lee, 2012: Predictor selection for downscaling GCM data with LASSO. J. Geophys. Res., 117, D17116, https://doi.org/10.1029/2012JD017864.
  • Hughes, J. P., P. Guttorp, and S. P. Charles, 2002: A non-homogeneous hidden Markov model for precipitation occurrence. J. Roy. Stat. Soc., 48C, 15–30, https://doi.org/10.1111/1467-9876.00136.
  • Huth, R., J. Miksovsky, P. Stepanek, M. Belda, A. Farda, Z. Chladova, and P. Pisoft, 2015: Comparative validation of statistical and dynamical downscaling models on a dense grid in central Europe: Temperature. Theor. Appl. Climatol., 120, 533–553, https://doi.org/10.1007/s00704-014-1190-3.
  • Joyce, R. J., J. E. Janowiak, P. A. Arkin, and P. P. Xie, 2004: CMORPH: A method that produces global precipitation estimates from passive microwave and infrared data at high spatial and temporal resolution. J. Hydrometeor., 5, 487–503, https://doi.org/10.1175/1525-7541(2004)005<0487:CAMTPG>2.0.CO;2.
  • Khalili, M., V. N. Van Thanh, and P. Gachon, 2013: A statistical approach to multi-site multivariate downscaling of daily extreme temperature series. Int. J. Climatol., 33, 15–32, https://doi.org/10.1002/joc.3402.
  • Kirchmeier-Young, M. C., D. J. Lorenz, and D. J. Vimont, 2016: Extreme event verification for probabilistic downscaling. J. Appl. Meteor. Climatol., 55, 2411–2430, https://doi.org/10.1175/JAMC-D-16-0043.1.
  • Liu, W., G. Fu, C. Liu, and S. P. Charles, 2013: A comparison of three multi-site statistical downscaling models for daily rainfall in the North China Plain. Theor. Appl. Climatol., 111, 585–600, https://doi.org/10.1007/s00704-012-0692-0.
  • Liu, Y., J. Feng, X. Liu, and Y. Zhao, 2019: A method for deterministic statistical downscaling of daily precipitation at a monsoonal site in Eastern China. Theor. Appl. Climatol., 135, 85–100, https://doi.org/10.1007/S00704-017-2356-6.
  • Manzanas, R., S. Brands, D. San-Martin, A. Lucero, C. Limbo, and J. M. Gutierrez, 2015: Statistical downscaling in the tropics can be sensitive to reanalysis choice: A case study for precipitation in the Philippines. J. Climate, 28, 4171–4184, https://doi.org/10.1175/JCLI-D-14-00331.1.
  • Maraun, D., 2013: Bias correction, quantile mapping, and downscaling: Revisiting the inflation issue. J. Climate, 26, 2137–2143, https://doi.org/10.1175/JCLI-D-12-00821.1.
  • Maraun, D., et al., 2010: Precipitation downscaling under climate change: recent developments to bridge the gap between dynamical models and the end user. Rev. Geophys., 48, RG3003, https://doi.org/10.1029/2009RG000314.
  • Maraun, D., et al., 2017: Towards process informed bias correction of climate change simulations. Nat. Climate Change, 7, 764–773, https://doi.org/10.1038/NCLIMATE3418.
  • Nasseri, M., H. Tavakol-Davani, and B. Zahraie, 2013: Performance assessment of different data mining methods in statistical downscaling of daily precipitation. J. Hydrol., 492, 1–14, https://doi.org/10.1016/j.jhydrol.2013.04.017.
  • Qian, C., W. Zhou, S. K. Fong, and K. C. Leong, 2015: Two approaches for statistical prediction of non-Gaussian climate extremes: A case study of Macao hot extremes during 1912–2012. J. Climate, 28, 623–636, https://doi.org/10.1175/JCLI-D-14-00159.1.
  • San-Martin, D., R. Manzanas, S. Brands, S. Herrera, and J. M. Gutierrez, 2017: Reassessing model uncertainty for regional projections of precipitation with an ensemble of statistical downscaling methods. J. Climate, 30, 203–223, https://doi.org/10.1175/JCLI-D-16-0366.1.
  • Sheffield, J., G. Goteti, and E. F. Wood, 2006: Development of a 50-year high-resolution global dataset of meteorological forcings for land surface modeling. J. Climate, 19, 3088–3111, https://doi.org/10.1175/JCLI3790.1.
  • Shin, Y., and B. P. Mohanty, 2013: Development of a deterministic downscaling algorithm for remote sensing soil moisture footprint using soil and vegetation classifications. Water Resour. Res., 49, 6208–6228, https://doi.org/10.1002/wrcr.20495.
  • Sunyer, M. A., et al., 2015a: Inter-comparison of statistical downscaling methods for projection of extreme precipitation in Europe. Hydrol. Earth Syst. Sci., 19, 1827–1847, https://doi.org/10.5194/hess-19-1827-2015.
  • Sunyer, M. A., I. B. Gregersen, D. Rosbjerg, H. Madsen, J. Luchner, and K. Arnbjerg-Nielsen, 2015b: Comparison of different statistical downscaling methods to estimate changes in hourly extreme precipitation using RCM projections from ENSEMBLES. Int. J. Climatol., 35, 2528–2539, https://doi.org/10.1002/joc.4138.
  • Tareghian, R., and P. F. Rasmussen, 2013: Statistical downscaling of precipitation using quantile regression. J. Hydrol., 487, 122–135, https://doi.org/10.1016/j.jhydrol.2013.02.029.
  • Tolika, K., P. Maheras, M. Vafiadis, H. A. Flocasc, and A. Arseni-Papadimitriou, 2007: Simulation of seasonal precipitation and raindays over Greece: A statistical downscaling technique based on artificial neural networks (ANNs). Int. J. Climatol., 27, 861–881, https://doi.org/10.1002/joc.1442.
  • Werner, A. T., and A. J. Cannon, 2016: Hydrologic extremes—An intercomparison of multiple gridded statistical downscaling methods. Hydrol. Earth Syst. Sci., 20, 1483–1508, https://doi.org/10.5194/hess-20-1483-2016.
  • Xu, G., et al., 2015: Spatial downscaling of TRMM precipitation product using a combined multifractal and regression approach: Demonstration for South China. Water, 7, 3083–3102, https://doi.org/10.3390/w7063083.
  • Yang, C., R. E. Chandler, V. S. Isham, and H. S. Wheater, 2005: Spatial-temporal rainfall simulation using generalized linear models. Water Resour. Res., 41, W11415, https://doi.org/10.1029/2004WR003739.
  • Zhang, Q., P. Shi, V. P. Singh, K. Fan, and J. Huang, 2017: Spatial downscaling of TRMM-based precipitation data using vegetative response in Xinjiang, China. Int. J. Climatol., 37, 3895–3909, https://doi.org/10.1002/joc.4964.
  • Zhang, X., and X. Yan, 2015: A new statistical precipitation downscaling method with Bayesian model averaging: A case study in China. Climate Dyn., 45, 2541–2555, https://doi.org/10.1007/s00382-015-2491-7.
  • Zheng, X., and R. W. Katz, 2008: Mixture model of generalized chain-dependent processes and its application to simulation of interannual variability of daily rainfall. J. Hydrol., 349, 191–199, https://doi.org/10.1016/j.jhydrol.2007.10.061.
  • Zhu, X., X. Qiu, Y. Zeng, W. Ren, B. Tao, H. Pan, T. Gao, and J. Gao, 2018: High-resolution precipitation downscaling in mountainous areas over China: Development and application of a statistical mapping approach. Int. J. Climatol., 38, 77–93, https://doi.org/10.1002/joc.5162.

  • Correlation (Spearman’s R) maps for screening the best model predictors across the grid boxes, calculated between the observed precipitation (Beijing) and five large-scale variables: (a) MSLP, (b) RH850, (c) T1000, (d) U10, (e) V10, and (f) wind speed. Noncolored areas represent the grid boxes where correlations do not pass the significance test at the 0.05 level.
  • Spearman’s correlation coefficients between the modeled precipitation and the observed precipitation: training period vs validation period for (a) GLM and (b) ANN, (c) ANN vs GLM, and (d) the spatial distribution of R obtained from the ANN for the validation period.
  • Relation between Spearman’s correlation coefficients (ANN) estimated from the test dataset and training dataset in the cross validation. Each dot represents a pair of correlation coefficients from the training dataset (37 years) and the test dataset (1 year).
  • Statistics of predictors’ coefficients over all of the 83 stations, estimated for the ANN model through the cross validation: (a) mean of parameters/coefficients, calculated over the 38 different results from the cross validations and (b) the weights (coefficients) of the MSLP predictor across all stations.
  • Histograms of (a) the original precipitation and (b) the transformed precipitation for the Nanyang station. Also shown are QQ plots between the modeled (y axis) and the observed (x axis) values for (c) simulated daily logPr vs the observed logPr with Pr values of 0 removed, (d) simulated daily logPr vs the observed logPr with Pr values of 0 preserved, and (e)–(h) simulated monthly precipitation amounts vs the observed monthly precipitation amounts.
  • Annual time series of total summer precipitation (the downscaled value was multiplied by 2) from single-site models. The r represents Spearman’s correlation.
  • Annual time series of summer rainy days (a threshold was set to 0.6 mm for rainy days) from single-site models. The r represents Spearman’s correlation.
  • Monthly time series of total precipitation of downscaled gridded output at four no-gauge sites (Yantai, Bengbu, Jinzhou, and Xuyi) and the corresponding observations [for the summer months (July–September) in each year].
  • Comparison between the downscaled precipitation (area averaged) and other products [PGF (1979–2012), ERA-Interim/Land (1979–2010), and CMORPH (2003–2016)]: (a)–(c) daily and (d)–(f) monthly.
  • Time series comparison of annual summer precipitation averaged over the area between the downscaled results and the PGF product. The downscaled results were multiplied by 4.0 to match the scale of the PGF products.


Gridded Statistical Downscaling Based on Interpolation of Parameters and Predictor Locations for Summer Daily Precipitation in North China

  • 1 School of Resources and Environment, Henan Polytechnic University, Jiaozuo, Henan, China
  • 2 Key Laboratory of Regional Climate-Environment Research for Temperate East Asia, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, China
  • 3 Key Laboratory of Digital Earth Science, Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing, China
  • 4 School of Resources and Environment, Henan Polytechnic University, Jiaozuo, Henan, China

Abstract

Few statistical downscaling applications have produced gridded products that supply downscaled values for no-gauge areas, as dynamical downscaling does. In this study, a gridded statistical downscaling scheme is presented to downscale summer precipitation to a dense grid that covers North China. The main innovation of this scheme is interpolating the parameters of single-station models to this dense grid and assigning optimal predictor values according to an interpolated predictand–predictor distance function. This method can produce spatial dependence (spatial autocorrelation) and transmit the spatial heterogeneity of predictor values from the large-scale predictors to the downscaled outputs. The gridded output at no-gauge stations performs comparably to the output at the gauged stations, and the area-mean precipitation of the downscaled results is comparable to that of other products. The main value of the downscaling scheme is that it can obtain reasonable outputs for no-gauge stations.

© 2019 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Yonghe Liu, yonghe_hpu@163.com

This article is included in the Global Precipitation Measurement (GPM) special collection.

1. Introduction

The impacts of future climate change on global and local scales are an important concern for the public, and estimating plausible future climate scenarios is a challenging task for researchers. For this purpose, the main source of information is global climate models (GCMs). GCM outputs are too coarse to reflect the details on regional and local scales; therefore, downscaling is needed to resolve this scale mismatch. Statistical downscaling (SD) is a technique to infer local information (Sunyer et al. 2015a) by statistically relating local variables to the GCM large-scale variables (LSVs). Another technique is dynamical downscaling (DD) (Bao et al. 2015; Castro et al. 2005), which is computationally expensive. Usually, DD can produce gridded simulations covering a region with visually realistic spatial patterns that imply spatial dependence or spatial autocorrelation of precipitation, whereas SD mostly produces outputs at gauge sites and not at no-gauge sites. Gridded precipitation datasets are usually required in hydrological assessment (Werner and Cannon 2016) and are useful for short-term weather prediction or for reconstructing historical precipitation. In this respect, how to exploit SD techniques to obtain gridded products with less computation than DD requires is a useful research subject.

Gridded SD has been used to improve the small-scale spatial variability of remote sensing products (see Shin and Mohanty 2013; Zhang et al. 2017; Zhu et al. 2018). For SD based on LSVs, gridded products can be obtained by interpolation from station-based SD outputs. Huth et al. (2015) compared the gridded temperature outputs of regional climate models and station-network SD outputs with station observations, indicating that both DD and SD show similar spatial autocorrelations. Fu et al. (2013b) implemented three gridded downscaling schemes for daily precipitation by interpolating from stations to grid cells.

SD methods can be classified as either deterministic or stochastic (Maraun et al. 2010): the former simulates unique values (usually the means) of precipitation (Kirchmeier-Young et al. 2016; San-Martin et al. 2017), whereas the latter provides noise models to explain the variability or the extremes (Maraun et al. 2010). Stochastic downscaling is usually preferred for pure climate assessment, with respect to day-to-day variability, dry–wet spells, spatial–temporal autocorrelation, and extremes (Fan et al. 2015; Nasseri et al. 2013; Sunyer et al. 2015b; Tareghian and Rasmussen 2013). In this study, we prefer deterministic and gridded SD for two reasons. First, for daily SD, one of the scientific objectives is to reduce the uncertainty and represent daily variability in a deterministic manner. The probability distribution function of precipitation under a given atmospheric circulation is usually related to the modeled mean values. Indeed, some current stochastic SD methods are combinations of deterministic and stochastic models (see Khalili et al. 2013). Second, the number of daily samples of specific large-scale conditions is large when compared with the numbers of monthly or annual samples; therefore, variances and distributions can be directly reflected by daily outputs. In other words, on the daily scale, deterministic outputs can also reproduce the climate statistics, as long as the daily series is long enough. It is also easy to produce stochastic ensembles that are based on deterministic outputs.

In this study, a gridded SD scheme that combines predictor screening based on correlation maps (see Liu et al. 2019) with the high-resolution interpolation of parameters to the study region is designed and tested. Three questions are addressed: 1) Based on single-site models for different stations and the interpolation of parameters and predictors to the spatial domain, can the downscaling produce reasonable gridded outputs? 2) How well can the gridded outputs reflect the spatial pattern and temporal variation in precipitation on daily and monthly scales? 3) What performance can be attained in no-gauge areas by interpolating parameters and predictors?

2. Data and methods

a. Study area and data

The study region is North China, a populated area with a dense network of meteorological stations; thus it is an ideal area for testing a gridded SD scheme. Most of the northern part of North China is in a warm temperate zone, and the southern part of North China is in a subtropical zone. Most of the precipitation in North China occurs in boreal summer as a result of the warm and moist air that is brought by the East Asian monsoon. In the winter, the Siberian anticyclone controls this area, bringing cold and dry air to this region. Relative to South China, summer precipitation in North China is lower but is usually concentrated in a few periods. Most of this area is semiarid and semihumid. North China also plays an important role in China’s economy and urbanization but has limited water resources. With the rapid development of society, the demand for water resources in this area is increasing, and water plays an increasingly important role in the sustainable development of the regional ecology and economy.

A spatial domain of 32°–42°N, 110°–122°E was selected for precipitation downscaling (Fig. 1). The daily precipitation in the summer (June–September) in the region was obtained from the China Meteorological Data Sharing Service (http://data.cma.cn). In total, 83 gauging stations with complete records covering the period 1979–2016 were used to train the single-station downscaling models. Another 12 stations with a small number of missing records were used to validate the gridded output in “no gauge” areas. ERA-Interim datasets (ERAI) from the European Centre for Medium-Range Weather Forecasts (Dee et al. 2011) were obtained, and only the forecasts issued at 0000 UTC for 1200 UTC each day were used as the LSVs for the downscaling. This ERAI dataset was clipped to cover the period 1979–2016 and the entire region of China (73°–137°E, 15°–55°N). In practical downscaling applications, the LSVs simulated by GCMs usually have coarse resolutions; thus, to be more comparable to GCM outputs, this ERAI dataset was regridded from the original resolution (generally corresponding to 0.75° × 0.75°) to 1° × 1° resolution. Several gridded precipitation datasets, namely the Princeton Global Meteorological Forcing data, version 2 (PGF; Sheffield et al. 2006), the Climate Prediction Center morphing global precipitation analysis (CMORPH; Joyce et al. 2004), and ERA-Interim/Land (Balsamo et al. 2015), were acquired for comparison.

Fig. 1.

Study domain and the gauge stations. Eight stations marked by yellow dots were analyzed in more detail than the stations marked with other dots. Red dots represent the stations for calibration, and the green dots represent the no-gauge sites for validation.

Citation: Journal of Applied Meteorology and Climatology 58, 10; 10.1175/JAMC-D-18-0231.1

b. Predictor selection

Predictor selection consists of selecting among different LSVs and selecting a representative grid box within each LSV field. Predictors representing the atmospheric circulation, humidity, and temperature can be used to downscale precipitation (Maraun et al. 2010). To represent the circulation, daily mean sea level pressure (MSLP) was selected as a candidate predictor variable because it is commonly used for SD (Fan et al. 2015). Summer precipitation in the study area is related to the East Asian summer monsoon, so the wind velocity is also a useful indicator of the circulation. Here, the u-wind and v-wind velocities at 10 m (U10 and V10) were used because, in practice, near-surface variables are more widely available than those at pressure levels. Given that precipitation occurrence is sensitive to humidity and air temperature at different pressure levels, the specific humidity (SH), relative humidity (RH), and air temperature T at four pressure levels (1000, 850, 700, and 500 hPa) were selected as candidate predictors. For each candidate variable, the values at different pressure levels are highly correlated; therefore, to avoid potential collinearity, only the pressure level with the largest absolute correlation coefficient with the observed precipitation was used. This selection was combined with the gridbox selection process. The gridbox selection was performed by using the method in Liu et al. (2019): the grid box having the best correlation with the gauged precipitation was used to get predictor values. The gridbox selection in Liu et al. (2019) indicated that the best pressure levels for SH, RH, and T vary among gauges. Here, for each variable, only the pressure level at which the largest number of gauge stations showed their highest correlation was selected: 500 hPa for SH, 850 hPa for RH, and 1000 hPa for T. SH had fewer highly correlated stations than RH did; therefore, it was not used as a predictor.
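The gridbox screening described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function names (`ranks`, `spearman`, `best_grid_box`), the dictionary layout of the LSV field, and the six-day sample series are all hypothetical. The sketch assumes that "best correlation" means the largest absolute Spearman rank correlation.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0  # a constant series is treated as uncorrelated
    return cov / (var_a * var_b) ** 0.5


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def best_grid_box(station_precip, lsv_field):
    """Pick the (lon, lat) key whose series has the largest |Spearman R|."""
    return max(lsv_field, key=lambda box: abs(spearman(station_precip, lsv_field[box])))


# Hypothetical six-day sample: station precipitation (mm) and MSLP (hPa) at two boxes.
precip = [0.0, 0.0, 5.2, 12.1, 0.3, 0.0]
mslp = {
    (116.0, 40.0): [1012.0, 1011.0, 1004.0, 1001.0, 1008.0, 1013.0],
    (117.0, 40.0): [1010.0, 1009.0, 1011.0, 1008.0, 1012.0, 1007.0],
}
box = best_grid_box(precip, mslp)
```

In this toy sample, precipitation rises as pressure falls at the first box, so that box is selected even though its correlation is negative; the absolute value is what matters for screening.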

For adjacent stations, the most correlated LSV grid boxes were usually found in different but adjacent grid positions. Thus, the distance from the best-correlated LSV grid box to a gauging station is spatially correlated across the station network. Here, we call these distances the “predictand–predictor distance.” These distances can be regarded as a function of the predictand (station) locations. For each location i and LSV j, the longitudinal or latitudinal distance from the optimal predictor location to the ith predictand location (station) is denoted as the function D(i, j). For the jth variable, the [$x_i$, $y_i$, D(i, j)] tuples can be constructed for different stations, where $x_i$ and $y_i$ are the longitude and latitude of the ith station. When we define a dense longitude–latitude grid for predictands, the D(k, j) value for the kth high-resolution grid cell can be interpolated. Here, a grid cell can be regarded as the location of a virtual station, regardless of whether a true station exists. For a given gridcell location, adding the interpolated distances D(k, j) gives the predictor grid box:
$$X_{k,j} = x_{k,j} + D_x(k,j) \quad \text{and} \quad Y_{k,j} = y_{k,j} + D_y(k,j), \tag{1}$$
where $X_{k,j}$ and $Y_{k,j}$ are the longitude and latitude of the predictor grid box of the jth LSV, respectively. The $D_x(k,j)$ and $D_y(k,j)$ are the longitudinal and latitudinal distances for the kth dense grid cell, respectively, from the current predictand cell to its predictor’s grid box.
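As a rough illustration of the predictand–predictor distance idea, the sketch below interpolates station distances to a dense grid cell and then applies Eq. (1). Two choices here are assumptions rather than details from the text: inverse-distance weighting is used as a stand-in interpolation method, and the resulting location is snapped to the nearest node of a 1° LSV grid. The names `idw` and `predictor_box` and the two sample stations are hypothetical.

```python
def idw(points, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (xi, yi, value) tuples.

    Assumption: IDW is only a stand-in; the text does not name the interpolator.
    """
    num = 0.0
    den = 0.0
    for xi, yi, value in points:
        d2 = (x - xi) ** 2 + (y - yi) ** 2
        if d2 == 0.0:
            return value  # the cell coincides with a station
        weight = 1.0 / d2 ** (power / 2.0)
        num += weight * value
        den += weight
    return num / den


def predictor_box(xk, yk, dx_points, dy_points, resolution=1.0):
    """Eq. (1): add interpolated distances to the cell location.

    Snapping to the LSV grid (resolution in degrees) is an assumption.
    """
    big_x = xk + idw(dx_points, xk, yk)
    big_y = yk + idw(dy_points, xk, yk)

    def snap(value):
        return round(value / resolution) * resolution

    return snap(big_x), snap(big_y)


# Hypothetical stations: (lon, lat, distance in degrees to the best predictor box).
dx_stations = [(114.0, 36.0, 2.0), (118.0, 36.0, 4.0)]
dy_stations = [(114.0, 36.0, -1.0), (118.0, 36.0, -1.0)]

# A virtual station midway between the two gauges.
cell_box = predictor_box(116.0, 36.0, dx_stations, dy_stations)
```

Because the virtual station sits midway between the two gauges, its interpolated longitudinal distance is the average of theirs (3°), so its predictor box lies at 119°E, 35°N in this example.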

c. Statistical downscaling method

For each station, a transfer function such as a linear regression model (LM), a generalized linear model (GLM), or an artificial neural network (ANN) can be constructed to relate the gauge-measured precipitation to the predictors. Precipitation has a highly skewed distribution, so applying Pearson’s correlation test (Pearson R) or ordinary linear regression to such a variable is inappropriate (Qian et al. 2015). Instead, Spearman’s rank correlation (Qian et al. 2015) can be used to analyze the correlation between skewed variables (including precipitation) and LSVs. As described in many studies (Maraun et al. 2010), precipitation can be mathematically transformed into a variable that has a more Gaussian distribution. Here, we use a simple function to transform the daily precipitation intensity Pr (Qian et al. 2015):
$$\mathrm{logPr} = f(\mathrm{Pr}) = \log(\mathrm{Pr} + 0.25). \tag{2}$$
Here, 0.25 is used to avoid zero values (no precipitation), and logPr is the transformed Pr. Pearson correlation and LM/GLM regressions can be used on logPr. Through such a transformation, the occurrence and intensity of precipitation can also be represented in a single variable instead of being treated separately, as in many previous studies (Manzanas et al. 2015; Yang et al. 2005; Zheng and Katz 2008).
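A small sketch of the transformation in Eq. (2) and its inverse may help when converting model output back to precipitation amounts. The text does not state the base of the logarithm, so the natural logarithm is assumed here; the function names `to_log_pr` and `from_log_pr` are hypothetical.

```python
import math

OFFSET = 0.25  # constant from Eq. (2); keeps dry days (Pr = 0) finite


def to_log_pr(pr):
    """Transform daily precipitation (mm) to logPr; natural log is an assumption."""
    return math.log(pr + OFFSET)


def from_log_pr(log_pr):
    """Invert the transform; negative results are clipped to zero precipitation."""
    return max(math.exp(log_pr) - OFFSET, 0.0)


dry_day = to_log_pr(0.0)    # equals log(0.25), the lowest possible logPr
wet_day = to_log_pr(12.5)
recovered = from_log_pr(wet_day)
```

Under this assumption, every dry day maps to the same value, log(0.25), so a modeled logPr at or below that value corresponds to no precipitation after inversion.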
GLMs can represent the relationship between the mean of the predictand and predictors X = [$x_0$, $x_1$, $x_2$, …, $x_n$] (n is the number of elements in X):
$$m = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n, \tag{3}$$
where $\beta_i$ is the ith coefficient of the predictor value and m is the mean of logPr under the assumption of a gamma distribution. Equation (3) has almost the same form as a common linear regression. Here, the only difference between a GLM and an LM is the assumed distribution of precipitation: common LMs are based on a normal distribution of the log-transformed predictand, whereas GLMs assume that the log-transformed predictand has a distribution from the exponential family, such as a gamma distribution.
ANNs have been widely used for decades (Tolika et al. 2007) as a tool of artificial intelligence or machine learning. ANNs are based on a set of connected neurons to resolve the mapping between inputs and outputs (left side of Fig. 2). Each neuron has the form
$$y = L\left(\sum_{i=1}^{n} x_i w_i + b\right), \tag{4}$$
where y is the output of the neuron, L() is a nonlinear transfer function, $x_i$ is the ith element of the input vector, $w_i$ is the weight of $x_i$, n is the number of elements in the input vector, and b is a constant bias. The neurons can be divided into several layers: an input layer (the input vector), several hidden layers, and an output layer. The outputs of the neurons in one layer are used as the input vector for the next layer. The weights are parameters of an ANN and must be trained from input–output samples. When more than one neuron is in the hidden layer, multiple sets of weights must be estimated, which requires more samples in the training dataset. When the training dataset is not large enough, the ANN model is prone to overfitting. Therefore, in statistical downscaling based on finite samples, models with few parameters are preferable to models with many parameters.
Fig. 2.

Sketches of (left) the standard ANN and (right) the simplified ANN that was used for the downscaling.

Citation: Journal of Applied Meteorology and Climatology 58, 10; 10.1175/JAMC-D-18-0231.1

In this study, a very simple ANN was used as a downscaling model with only one neuron in the hidden layer (right side of Fig. 2). The link function of the hidden-layer neuron is a common logistic function, and the output-layer neuron is a simple linear transformation. This simple ANN can be expressed as
$$m = b_0 + \beta \times l\left(\sum_{i=1}^{m} x_i w_i + b_1\right), \tag{5}$$
where m is the mean of the log-transformed precipitation intensity [using Eq. (2)]; b0 and β are the constant bias and the weight in the output layer, respectively; xi and wi have the same meaning as in Eq. (4); and b1 is a constant bias in the hidden-layer neuron. Here, the purpose of using logPr instead of Pr is to give the predictand a more Gaussian-like distribution. The function l() is the nonlinear link function, which is a logistic function in this study:
$$l(x) = \frac{1}{1 + \exp(-x)}. \tag{6}$$
This simple ANN can also be regarded as a variant of logistic regression and thus belongs to the GLM family. Equation (5) is a parameter-parsimonious model with fewer parameters than those used by Zhang and Yan (2015).
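The one-neuron model of Eq. (5) can be illustrated with a minimal Python sketch. It assumes the logistic function in its standard form, 1/[1 + exp(−x)]; the function names (`logistic`, `sann1`) and all numerical values are hypothetical and chosen only for illustration, not taken from the study.

```python
import math


def logistic(x):
    """Standard logistic link: l(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def sann1(x, w, b1, beta, b0):
    """Mean of logPr from the one-neuron model.

    Computes b0 + beta * l(sum_i x_i * w_i + b1), where x holds the
    standardized predictor values and w the hidden-layer weights.
    """
    hidden = logistic(sum(xi * wi for xi, wi in zip(x, w)) + b1)
    return b0 + beta * hidden


# Hypothetical predictors and parameters (illustrative values only).
x = [0.5, -1.2, 0.3]
w = [0.4, -0.6, 0.1]
m = sann1(x, w, b1=0.0, beta=2.0, b0=-1.0)
print(m)
```

With n predictors this model has n + 3 parameters (n weights, two biases, and one output weight), which is what makes it parsimonious relative to a network with several hidden neurons.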
If two neurons are defined in the hidden layer, a simple ANN can be expressed as
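$$m = b_0 + \beta_1 \times l\left(\sum_{i=1}^{m} x_i w_{1,i} + b_1\right) + \beta_2 \times l\left(\sum_{i=1}^{m} x_i w_{2,i} + b_2\right), \tag{7}$$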
where w1,i and w2,i are the weights in the first and second neurons in the hidden layer, respectively; and b1 and b2 are constant biases in the two neurons.
In this study, the GLM of Eq. (3) and the ANN of Eq. (7) are used for comparison with the ANN in Eq. (5) during the validation step. The GLM can be fitted with the maximum likelihood method, whereas the ANN can be trained by minimizing
$$\min_{\theta} J(\theta) = \sum (y - y')^2, \tag{8}$$
where θ represents all of the parameters (biases and weights) and y and y′ are the observed and the model-simulated values, respectively. All of the above models were fitted/trained by a gradient descent algorithm that was developed by the authors.
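The authors' own gradient descent code is not described in detail, so the following is only a generic sketch of how the squared-error cost could be minimized for the one-neuron model. The learning rate, iteration count, starting values, and all names (`predict`, `loss`, `train`, `initial_params`) are assumptions made for illustration; the data are synthetic.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(params, x):
    """Return (model output, hidden activation) for one sample."""
    w, b1, beta, b0 = params
    h = logistic(sum(xi * wi for xi, wi in zip(x, w)) + b1)
    return b0 + beta * h, h


def loss(params, X, y):
    """Sum of squared differences between observed and simulated values."""
    return sum((yi - predict(params, xi)[0]) ** 2 for xi, yi in zip(X, y))


def initial_params(n):
    """Arbitrary starting point (an assumption of this sketch)."""
    return [0.1] * n, 0.0, 1.0, 0.0


def train(X, y, n_iter=2000, lr=0.05):
    """Plain batch gradient descent on the squared-error cost."""
    n, count = len(X[0]), len(X)
    w, b1, beta, b0 = initial_params(n)
    for _ in range(n_iter):
        gw, gb1, gbeta, gb0 = [0.0] * n, 0.0, 0.0, 0.0
        for xi, yi in zip(X, y):
            yhat, h = predict((w, b1, beta, b0), xi)
            err = 2.0 * (yhat - yi)            # d(cost)/d(output)
            dz = err * beta * h * (1.0 - h)    # chain rule through l()
            gb0 += err
            gbeta += err * h
            gb1 += dz
            for j in range(n):
                gw[j] += dz * xi[j]
        # Dividing by the sample count only rescales the step size.
        w = [wj - lr * gj / count for wj, gj in zip(w, gw)]
        b1 -= lr * gb1 / count
        beta -= lr * gbeta / count
        b0 -= lr * gb0 / count
    return w, b1, beta, b0


# Synthetic data generated from a known one-neuron model.
X = [[-2.0], [-1.0], [0.0], [1.0], [2.0]]
y = [-1.0 + 2.0 * logistic(row[0]) for row in X]
initial = loss(initial_params(1), X, y)
fitted = train(X, y)
final = loss(fitted, X, y)
print(initial, final)
```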

d. Streamline of the downscaling technique

The streamline is designed with eight steps. The first step is to standardize the LSV predictors for each 1° × 1° grid box:
$$A_{ij} = (X_{ij} - m_j)/\sigma_j, \tag{9}$$
where Aij and Xij are the anomaly value and original value at the ith day at the jth grid box, respectively; mj and σj are the mean and standard deviation, respectively, of the predictor that are calculated over all days at the jth grid box. In step 2, for each LSV and each single station, the Spearman’s rank correlation coefficients (Spearman’s R) between the observed precipitation and standardized predictors are calculated and the grid boxes with the largest Spearman’s R are selected to obtain predictor values.
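Steps 1 and 2 can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the authors' implementation: the standard deviation is taken as the population value, the 0.05 significance test is omitted, ties receive average ranks, and the function names and the toy data are hypothetical.

```python
import math


def standardize(values):
    """Anomaly A = (X - mean) / std over all days at one grid box."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman(a, b):
    """Spearman's R is the Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


def best_grid_boxes(precip, boxes):
    """Return (most negative, most positive) grid box by Spearman's R."""
    r = {name: spearman(precip, standardize(series))
         for name, series in boxes.items()}
    return min(r, key=r.get), max(r, key=r.get)


# Toy example: one station series and three candidate grid boxes.
precip = [0.0, 1.0, 2.0, 5.0, 9.0]
boxes = {"A": [1.0, 2.0, 3.0, 4.0, 5.0],
         "B": [5.0, 4.0, 3.0, 2.0, 1.0],
         "C": [1.0, 3.0, 2.0, 5.0, 4.0]}
negative_center, positive_center = best_grid_boxes(precip, boxes)
print(negative_center, positive_center)
```

Because Spearman's R depends only on ranks, standardizing the predictor does not change the selection; it matters later, when the selected values are fed to the regression or ANN.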

The third step is to record the position of the grid box having the best absolute correlation (it must pass the significance test at the 0.05 level) for each station and each LSV and then to calculate the distances between each station and the optimal predictor locations for this station. Step 4 is to use the selected predictor values to train each single-station downscaling model (GLM or ANN).

In step 5, the performance of each single-station downscaling model is tested and then the Pearson’s correlation coefficients (Pearson R) between the downscaled and observed logarithmic precipitation are calculated for both the calibration period and the validation period. The sixth step is to define a 0.1° × 0.1° dense grid that covers the study region. The distances D(k, j) are interpolated to the dense grid from the gauging stations. Here, a bilinear interpolation method based on Delaunay triangulation is used, and the grid cells outside the Delaunay triangulation are interpolated according to the nearest neighbors.

Step 7 is to interpolate each parameter of the GLM or ANN from the stations to the 0.1° × 0.1° high-resolution grid by bilinear interpolation based on Delaunay triangulation. Step 8 has three parts: (i) calculate the output for each dense grid cell according to the longitudinal and latitudinal distances D(k, j) (here, k represents the kth 0.1° × 0.1° dense grid cell and j represents the jth LSV) to locate the predictor grid box; (ii) interpolate the predictor values for the dense predictand grid to remove spatially abrupt changes at the boundaries between adjacent grid boxes; and (iii) directly use the parameters and predictor values that were interpolated for the kth cell to calculate the output by using one of the models in Eq. (3), (5), or (7).

In this streamline, once the D(k, j) and the parameters for all of the dense grid cells are interpolated and recorded in files, they become components of the downscaling technique and can be reused in any subsequent practical downscaling tasks.
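The interpolation used in steps 6 and 7 can be illustrated for a single triangle. The sketch below assumes that the Delaunay triangulation is already available as a list of station-index triples (it does not compute the triangulation itself); it interpolates linearly inside a triangle using barycentric weights and falls back to the nearest station outside all triangles. Function names and coordinates are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of point p with respect to triangle (a, b, c)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    l1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    l2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return l1, l2, 1.0 - l1 - l2


def interpolate(p, stations, values, triangles):
    """Linear interpolation inside the triangulation, nearest station outside."""
    for i, j, k in triangles:
        l1, l2, l3 = barycentric(p, stations[i], stations[j], stations[k])
        if min(l1, l2, l3) >= -1e-12:  # p lies inside or on the triangle
            return l1 * values[i] + l2 * values[j] + l3 * values[k]
    nearest = min(
        range(len(stations)),
        key=lambda n: (stations[n][0] - p[0]) ** 2 + (stations[n][1] - p[1]) ** 2,
    )
    return values[nearest]


# Three hypothetical stations forming one triangle; the values could be
# a model parameter or one component of D(k, j).
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
values = [0.0, 1.0, 2.0]
triangles = [(0, 1, 2)]
inside = interpolate((0.25, 0.25), stations, values, triangles)
outside = interpolate((2.0, -1.0), stations, values, triangles)
print(inside, outside)
```

The same routine would apply to each model parameter and to each distance component, since both are treated as scalar fields defined at the stations.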

e. Model training and validation

1) Training single-station models

In the second step of the streamline, only the period 1979–98 was used to calculate the Spearman’s R so as to avoid the problem of artificial skill (DelSole and Shukla 2009).

The ANNs and GLMs were trained through two calibration and validation schemes. In the first scheme, the models were trained on the basis of the period 1979–98, and their performance was assessed for the period 1999–2016 using correlation coefficients. This calibration and validation scheme was used to demonstrate whether the estimated parameters for 1979–98 remained valid for the later period and whether significant artificial skill existed because of the predictor screening. In the second scheme, cross validation was performed by leaving one year out: for each time, the data in one year were left out and then the model was trained with the remaining data. Then, the outputs for this left-out year were calculated and statistics such as the correlation coefficients R were calculated. The entire period of 1979–2016 was 38 years, so the above cross validation was conducted in 38 folds. Afterward, the averages of the estimated parameters were accepted as the final parameters to be used. Performing multiple cross validations enabled us to capture the variations in the statistics that were used for assessment.
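The leave-one-year-out scheme and the averaging of parameters across folds can be sketched as follows. The helper names are hypothetical, the three samples per year are a placeholder for the daily summer records, and the fitting step itself is left out.

```python
def leave_one_year_out(years):
    """Yield (held-out year, training indices, test indices) per fold."""
    for held in sorted(set(years)):
        train = [i for i, y in enumerate(years) if y != held]
        test = [i for i, y in enumerate(years) if y == held]
        yield held, train, test


def average_parameters(fold_params):
    """Element-wise mean of the parameter vectors estimated in each fold."""
    n = len(fold_params)
    return [sum(p[k] for p in fold_params) / n
            for k in range(len(fold_params[0]))]


# One year label per sample; here three placeholder samples per year.
years = [year for year in range(1979, 2017) for _ in range(3)]
folds = list(leave_one_year_out(years))
print(len(folds))
```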

In parameter estimation for any model with multiple predictors, some coefficients may take incorrect signs because of collinearity among the predictors. For example, air pressure is always negatively correlated with precipitation, but a positive coefficient (model parameter) for air pressure is sometimes obtained because most of the precipitation variation is explained by other predictors (such as relative humidity). In that case, the coefficients for both air pressure and the other predictors (including relative humidity) are incorrectly estimated, which is an overfitting problem. Removing air pressure from the predictors and then re-estimating the parameters for the other predictors can resolve this problem. This is similar in effect to the least absolute shrinkage and selection operator (LASSO; Hammami et al. 2012). Because no LASSO-based parameter estimation was available for the ANN in this study, the predictors with incorrect coefficients were removed manually. For both of the abovementioned schemes, the coefficient of each predictor was checked to determine whether its sign was contrary to the sign of the predictor–predictand correlation. If so, the predictor was replaced by zeros (equivalent to removing this predictor) and the model was trained a second time. The final coefficients were accepted as reliable parameters with which to proceed with further analysis.
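The sign check can be written compactly. This is a hypothetical sketch: which signed quantity counts as the "coefficient" of a predictor is left to the caller (for the one-neuron ANN, one possible choice, assumed here and not stated in the text, is the product of the output weight and the hidden-layer weight, because the logistic function is increasing).

```python
def sign(v):
    """Return -1, 0, or 1."""
    return (v > 0) - (v < 0)


def conflicting_predictors(coefficients, correlations):
    """Indices whose coefficient sign opposes the correlation sign."""
    return [i for i, (c, r) in enumerate(zip(coefficients, correlations))
            if sign(c) * sign(r) < 0]


def zero_out(X, drop):
    """Replace the dropped predictor columns by zeros before retraining."""
    return [[0.0 if j in drop else v for j, v in enumerate(row)] for row in X]


# Illustrative values: the first predictor has a positive coefficient
# although its correlation with the predictand is negative.
coefficients = [0.3, 0.2, -0.5]
correlations = [-0.4, 0.5, -0.1]
drop = set(conflicting_predictors(coefficients, correlations))
X_clean = zero_out([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], drop)
print(drop, X_clean)
```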

2) Validating the gridded outputs

The temporal variability at no-gauge locations was analyzed. Here the no-gauge location means no gauge-measured precipitation was involved in model training for the location. In this study, 12 stations having some missing values during the period 1979–2016 were reserved as the no-gauge stations to evaluate the gridded output.

3. Results and discussions

a. Parameter screening

The Spearman’s R between the observed precipitation and gridbox values of the five LSVs across the entire region of China was calculated for all stations and all grid boxes. The results for the Beijing site are shown in Fig. 3a. For the MSLP, the center of the highly correlated areas is located between the lower reaches of both the Yellow River and Yangtze River. This high-correlation zone corresponds to a low pressure system that occurs in boreal summer and is a component of the rain belt that is brought by the summer monsoon. Besides the MSLP, highly correlated areas were also found for other LSVs. No significant positively or negatively correlated center exists for the MSLP or RH850, respectively. For T1000 (Fig. 3c), U10 (Fig. 3d), and V10 (Fig. 3e), both positive and negative high-correlation areas were found. Combining the high correlation coefficients of U10 and V10 (Fig. 3f) revealed a cyclonic pattern of prevailing wind, and the Beijing area lies on the northwestern side of this cyclonic pattern. This can be explained physically: when precipitation occurs at the station of Beijing, the frontal surface inclines from the warm air mass in the southeast to the cold air mass in the northwest, and when precipitation is falling from the cloud in the frontal surface the ground at the station is still in the cold air mass below the frontal surface.

Fig. 3.

Correlation (Spearman’s R) maps for screening the best model predictors across the grid boxes, calculated between the observed precipitation (Beijing) and five large-scale variables: (a) MSLP, (b) RH850, (c) T1000, (d) U10, (e) V10, and (f) wind speed. Noncolored areas represent the grid boxes where correlations do not pass the significance test at the 0.05 level.

On the basis of this screening, eight predictors were used (Table 1): the negative high-correlation center for the MSLP, the positive high-correlation center for the RH850, and both the negative and positive centers for T1000, U10, and V10. Here, the negative (or positive) high-correlation center is a single grid box that has the largest negative (or positive) correlation in all of the high-correlation (can pass the 0.05 level significance test) grid boxes. For T1000, U10, and V10, using both the negative and positive centers has a similar effect as using predictors based on gradients between different grid points (Fu et al. 2013a; Liu et al. 2013) or circulation indices (Chen 2000; Fan et al. 2015).

Table 1.

Averaged coefficients over all stations through cross validation.

The abovementioned correlation maps show the large-scale conditions that are related to the rainfall in the Beijing area. Similar patterns for the high-correlation areas were also found for other stations. For stations that are close to each other, their high-correlation areas from the LSVs are very similar, but the high-correlation centers are slightly shifted with the changing station locations. This result indicates that the distances from the stations to the best positions of LSV predictors (the high-correlation centers) are generally similar for neighboring stations and thus could be spatially interpolated.

b. Training and validating single-station models

1) Comparison among different models

Different transfer functions were used to construct downscaling models: the common GLM [Eq. (3)], a simple ANN with a one-neuron hidden layer [SANN1; Eq. (5)], and a simple ANN with a two-neuron hidden layer [SANN2; Eq. (7)]. For the training period 1979–98, the Pearson’s correlation coefficient R between the modeled logPr and the observed logPr was estimated for each station and each model. For most stations and models, the R values for the training period (training R) were found to be generally larger than those for the validation period (validation R) (Figs. 4a,b).

Fig. 4.

Spearman’s correlation coefficients between the modeled precipitation and the observed precipitation: training period vs validation period for (a) GLM and (b) ANN, (c) ANN vs GLM, and the (d) spatial distribution of R, obtained from ANN at validation period.

Additionally, stations with relatively large training-R values tended to show relatively small differences between the training R and validation R. For example, among the stations whose SANN1 training-R values (Fig. 4b) were less than 0.63, more stations showed large discrepancies between the training-R and validation-R values.

The R values from SANN2 are slightly larger than those from SANN1; however, the additional parameters from SANN2 may have increased the probability of overfitting, which means that the advantage of SANN2 over SANN1 can be ignored. Meanwhile, the R from SANN1 is greater than that from the GLM (Fig. 4c), indicating that SANN1 had better performance than an ordinary GLM. Therefore, the final downscaling models were constructed by using SANN1.

By comparing the R values (by SANN1) across all 83 stations, the spatial pattern of R dependent on different areas was obtained: the stations in the northern and eastern sides of the region had smaller R values, especially the stations around the Bohai Sea, whereas the stations in the western and southern areas of the region had larger R values. Only four stations on the outer side of Liaodong Peninsula and on the eastern front of the Shandong Peninsula are exceptions. The above pattern reflects that the rainfall in the southern and western portions of this region is more predictable when using large-scale predictors.

2) Leave-one-out cross validation

To avoid the overfitting problem, a leave-one-out cross validation was performed on SANN1 [Eq. (5)]. That is, for each time, the samples in one year (only in summer) were left out when training the models. The training-R values were plotted against the validation R (Fig. 5), showing a significant linear relationship between them and reflecting that a higher training-R value is usually accompanied by a lower validation-R value. The ranges of training R are smaller than those of validation-R. For instance, the training R for the Beijing station ranges from 0.56 to 0.58, and the corresponding validation-R ranges from 0.3 to 0.8. Nevertheless, the size of the validation datasets is much smaller than that of the training datasets, so very low or high validation-R values cannot reflect the real performance of the model. The averages of the training Rs from 38 folds of cross validations can be regarded as the performance assessment of the model and generally had the same effect as those in Fig. 4d.

Fig. 5.

Relation between Spearman’s correlation coefficients (ANN) estimated from the test dataset and training dataset in the cross validation. Each dot represents a pair of correlation coefficients from the training dataset (37 years) and the test dataset (1 year).

To evaluate the reliability of the parameters that were estimated during the cross validation, the standard deviation (STD) and the mean of each parameter were calculated across the 38 folds of the cross validations. For all of the stations, any parameter changes in different folds are comparatively small and stable (Fig. 6a). The STDs are smaller than the absolute means (Table 1), implying that the estimated parameters are reliable, so the corresponding predictors could not be ignored in the ANN model. The largest absolute ratios (STD/mean) were produced by the four wind-based predictors, with STDs greater than 0.1 and means less than 0.4. The absolute STD/mean of the parameters (coefficients wi and biases) for the other variables are all smaller than 0.1. For the above reason, all of the averaged parameter values were accepted for the ANN (SANN1). The parameters (weights and biases) for each predictor across all of the stations showed regional patterns (Fig. 6b), indicating that the adjacent stations had spatially autocorrelated parameter values.
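The stability measure used above, the absolute ratio of the standard deviation to the mean of a parameter across folds, can be computed as in the following sketch. The use of the population standard deviation and the four sample values are assumptions for illustration; the study's own 38 fold values are not reproduced here.

```python
import statistics


def parameter_stability(fold_values):
    """Return (mean, STD, |STD / mean|) of one parameter across folds."""
    mean = statistics.fmean(fold_values)
    std = statistics.pstdev(fold_values)
    return mean, std, abs(std / mean)


# Hypothetical estimates of one weight from four folds.
fold_values = [0.38, 0.42, 0.40, 0.40]
mean, std, ratio = parameter_stability(fold_values)
print(mean, std, ratio)
```

A ratio well below 1 indicates that the sign and magnitude of the parameter are consistent across folds, which is the sense in which the text calls the estimates reliable.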

Fig. 6.

Statistics of predictors’ coefficients over all of the 83 stations, estimated for the ANN model through the cross validation: (a) mean of parameters/coefficients, calculated over the 38 different results from the cross validations and (b) the weights (coefficients) of the MSLP predictor across all stations.

3) Precipitation distributions

Precipitation amounts are usually far from a normal distribution (Fig. 7). After logarithmic transformation with Eq. (2), the distribution of the resulting logPr became closer to a normal distribution (Fig. 7). The daily distributions of summer precipitation in logarithmic transformation could generally be reproduced by the ANN model because the points were usually arranged near a straight line. When zero-value samples were excluded, the distribution could be better reproduced (Figs. 7c,d), indicating that zero values (no-rain values) play an important role in the distribution of the daily precipitation.

Fig. 7.

Histograms of (a) the original precipitation and (b) the transformed precipitation for the Nanyang station. Also shown are QQ plots between the modeled (y axis) and the observed (x axis) values for (c) simulated daily logPr vs the observed logPr with Pr values of 0 removed, (d) simulated daily logPr vs the observed logPr with Pr values of 0 preserved, and (e)–(h) simulated monthly precipitation amounts vs the observed monthly precipitation amounts.

Extreme values were usually underestimated by the models, which is reasonable because the modeled outputs are mean values of skewed distributions and the accumulation of extreme values is far larger than the accumulation of small values. This phenomenon can lead to further underestimation of the accumulated precipitation and variance over a long period. To resolve this problem, variance inflation is often needed, although such techniques are imperfect (Maraun 2013). Here, to compare with the observations on monthly or annual scales, the modeled results were simply rescaled by multiplying them by a variance inflation factor. For different gauge stations, the factors could be set to different values from 2.0 to 4.0, which were estimated as the ratio of the downscaled interannual means to the simulated ones. In this comparison, the distributions of the monthly total summer precipitation were also adequately reproduced at most stations (Figs. 7e–h). Therefore, the extremely large monthly precipitation seems to be reflected by the modeled values.

4) Variations in monthly and annual total summer precipitation

The following analyses were performed on the basis of the outputs of the single-site ANN models. The R values from the simulation were calculated from the monthly total precipitation (MTP) for several sites. The simulated monthly precipitation generally has a satisfying correlation with the observed precipitation. For example, the R values for the stations of Beijing, Hohhot, and Weihai during 1979–2016 were calculated as 0.58, 0.64, and 0.65, respectively. Such correlations may not be as large as commonly expected by modelers. Nevertheless, these correlations are very sensitive to the observed extreme values, which are stochastic in nature and have a large influence on correlations on both daily and annual scales. For the annual total summer precipitation, the simulated results generally reflected the interannual variations in the observed precipitation (Fig. 8).

Fig. 8.

Annual time series of total summer precipitation (the downscaled value was multiplied by 2) from single-site models. The r represents Spearman’s correlation.

For dry days, the models produced small precipitation values, so thresholds should be used to treat trace values as dry-day values. For each station, a threshold is calculated for the downscaled values so that the interannual total number of rainy days can match the observed ones. For most stations, the thresholds are close to 0.6 mm. Using these thresholds, the number of rainy days for each year was counted (Fig. 9). For the Xintai, Weihai, Lishi, and Nanyang stations, the modeled numbers of rainy days matched the observed ones. The correlations for annual rainy days are significantly larger than the correlations of annual total precipitation.
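One simple way to obtain such a threshold is to match the total number of wet days directly, as in the sketch below. This is an assumed procedure, not necessarily the one used in the study: an observed day is counted as rainy when its value exceeds `wet` (0 by default, an assumption), a downscaled day is rainy when it strictly exceeds the returned threshold, and ties at the threshold can make the match inexact.

```python
def rainy_day_threshold(downscaled, observed, wet=0.0):
    """Threshold such that the count of downscaled values above it
    matches the count of observed rainy days."""
    n_wet = sum(1 for v in observed if v > wet)
    ranked = sorted(downscaled, reverse=True)
    if n_wet >= len(ranked):
        return float("-inf")  # every downscaled day counts as rainy
    # Exactly n_wet values are strictly larger than ranked[n_wet]
    # when there are no ties at that position.
    return ranked[n_wet]


# Illustrative daily values (mm).
observed = [0.0, 0.0, 3.0, 0.0, 10.0, 1.0, 0.0, 0.0]
downscaled = [0.2, 0.4, 2.0, 0.5, 6.0, 0.9, 0.3, 0.1]
threshold = rainy_day_threshold(downscaled, observed)
rainy_days = sum(1 for v in downscaled if v > threshold)
print(threshold, rainy_days)
```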

Fig. 9.

Annual time series of summer rainy days (a threshold was set to 0.6 mm for rainy days) from single-site models. The r represents Spearman’s correlation.

c. Validating gridded simulation

1) Temporal performance at no-gauge sites

In the model training, 12 stations were omitted because of some missing values, so they are ideal sites for validating the gridded outputs for no-gauge areas. The series corresponding to these locations were extracted from the gridded outputs and then were compared with the daily and monthly observations. Both Spearman’s R and Pearson’s R were calculated. Specifically, Pearson’s R was calculated on the basis of the logarithmic transformation. Note that the Spearman’s R is not necessarily larger than the Pearson’s R, because of different distributions of samples. These correlations for daily precipitation are generally comparable to those estimated from the training stations (see Table 2 and Figs. 3 and 4). In general, the correlations for monthly precipitation are larger than those for daily precipitation; however, this is not always the case. After some small daily values (corresponding to dry-day values) in the downscaled outputs were replaced with zeros, the monthly precipitation generally reflected the variations as shown in the observations (Fig. 10).

Table 2.

Correlations between downscaled precipitation and the observed precipitation at no-gauge sites (represented by the 12 green dots in Fig. 1). The first two data columns represent the daily results, and the second two are the monthly results.

Fig. 10.

Monthly time series of total precipitation of downscaled gridded output at four no-gauge sites (Yantai, Bengbu, Jinzhou, and Xuyi) and the corresponding observations [for the summer months (July–September) in each year].

2) Area-averaged precipitation from the gridded output

The precipitation from the downscaled results, the PGF, and the ERA-Interim/Land and CMORPH products was spatially averaged over daily, monthly, and annual summer scales. The downscaled area-averaged precipitation was compared with those from the other three products at the daily scale (Figs. 11a–c) and the monthly scale (Figs. 11d–f). The correlation coefficients among these different precipitation products were estimated (Table 3). Here, the correlation was calculated on the basis of the original precipitation values rather than the log-transformed values. Therefore, the correlation coefficients between different products based on the area-averaged precipitation were larger than those based on the precipitation at single sites. Comparing the R values among the products showed that the correlations were comparable for each pair of products, including the pairs that involve the downscaled precipitation. The best correlation was obtained between the downscaled and ERA-Interim series (0.74). A comparatively low correlation (0.46) was obtained between the downscaled and PGF series. Generally, the correlations between the downscaled products and other products were better than those between the PGF and other products. At the monthly scale, the R² values exceeded those on the daily scale. The R² between the downscaled and ERA-Interim/Land series remained the largest (R² = 0.57) among all of the comparisons. The R² value on the monthly scale between the downscaled and PGF series was greatly improved (R² = 0.52) when compared with that on the daily scale. On the annual summer scale, some annual variations were generally reflected by the downscaling (Fig. 12), with a Pearson R of 0.67 (R² = 0.45) between the downscaled and PGF series. This value is small when compared with the R² on the monthly scale. However, only 34 samples were examined at the annual scale, so this relatively small R is also reasonable.
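Because the daily results above are quoted as R and the monthly results as the squared correlation, a short arithmetic check helps when comparing them. Assuming that 0.74 and 0.46 are the daily-scale Pearson R values, as the text suggests, squaring them gives

$$0.74^2 \approx 0.55 < 0.57, \qquad 0.46^2 \approx 0.21 < 0.52, \qquad 0.67^2 \approx 0.45,$$

which is consistent with the statements that the monthly values exceed the daily ones and that the annual value (0.45) is smaller than the monthly value for the same pair of series (0.52).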

Fig. 11.

Comparison between the downscaled precipitation (area averaged) and other products [PGF (1979–2012), ERA-Interim/Land (1979–2010), and CMORPH (2003–2016)]: (a)–(c) daily and (d)–(f) monthly.

Table 3.

Pearson’s correlation coefficients R between different precipitation (logarithmic transformed) products that were averaged over the study domain, at the daily scale.

Fig. 12.

Time series comparison of annual summer precipitation averaged over the area between the downscaled results and the PGF product. The downscaled results were multiplied by 4.0 to match the scale of the PGF products.

4. Conclusions

A gridded summer (June–September) precipitation downscaling scheme was designed that is based on the station networks in North China. The technique is extended from single-station models. The predictor screening focused on selecting optimal grid boxes across the domain of five LSVs and yielded predictors that are highly correlated with daily precipitation at each station. For all of the single-station models, the correlations between the modeled and observed daily values (logarithmic) vary between 0.4 and 0.65 and are comparatively high for precipitation estimation on a daily scale. On the monthly scale, the correlations were approximately 0.6, which is also comparatively large when compared with other similar studies (Zhang and Yan 2015). Such good performance at single stations provides a basis for gridded downscaling. Among the single-station models (GLM, SANN1, and SANN2), SANN1 was selected for the final downscaling scheme. In this study, the ANNs were designed in very simple forms to overcome the “black box” nature of a common ANN and to avoid the large number of parameters needed by a common ANN. Both versions of ANNs performed better than the GLM. SANN2 showed a small advantage over SANN1 at the expense of additional weights/parameters. SANN1 is preferable because it avoids the overfitting problem that arises from the additional weights.

In the predictor screening, for one station and an LSV, the selected optimal predictor location (grid box) with the best positive/negative correlation has a distance D(k, j) to the station. D(k, j) is a vector including two components: longitudinal distance and latitudinal distance. Therefore, for one station and one LSV, there should be four values for D(k, j): the longitudinal and latitudinal distances for a positively correlated grid box, and the two distances for a negatively correlated grid box. In this study, the D(k, j) for a positively correlated grid box of MSLP and the D(k, j) for a negatively correlated grid box of RH850 were not used. Here, an assumption is made: for one LSV and the station network, the D(k, j) values across different stations have spatial patterns. Actually, this spatial pattern assumption holds true, because for adjacent stations the high-correlation areas shown on correlation maps are in similar patterns. These spatial patterns implied that smoothed interpolations from the gauge stations could be used to obtain distance values for any positions (or cells in a dense grid). For any local position p, its optimal grid box of LSV can be calculated by adding the interpolated D(k, j) to p. Therefore, for any local position, no matter whether it corresponds to a gauge station, it is easy to know where to get the predictors from LSV fields.
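Locating the predictor grid box for an arbitrary position then reduces to adding the interpolated offset and snapping to the predictor grid. The sketch below assumes that D(k, j) is expressed in degrees, that grid boxes are 1° × 1°, and that a box is identified by its lower-left corner; the indexing convention, the function name, and the coordinates are assumptions for illustration.

```python
import math


def predictor_grid_box(lon, lat, d_lon, d_lat, resolution=1.0):
    """Add the offset D = (d_lon, d_lat) to position p = (lon, lat) and
    return the lower-left corner of the grid box that contains the result."""
    target_lon, target_lat = lon + d_lon, lat + d_lat
    return (math.floor(target_lon / resolution) * resolution,
            math.floor(target_lat / resolution) * resolution)


# Hypothetical position and interpolated offset, in degrees.
box = predictor_grid_box(116.4, 39.9, d_lon=3.2, d_lat=-6.5)
print(box)
```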

From each LSV, the model parameter estimated also presented spatial patterns across different stations. In other words, adjacent stations produced similar parameter values. Therefore, the model parameters at any local location can be estimated by using another interpolation method. Before interpolation, the parameters should be estimated by training each single-station ANN model. The validations for single stations showed that the model output could reproduce the MTP distribution, and the downscaled MTP was highly correlated with the observed MTP (R > 0.6). The variations of annual summer precipitation were roughly reflected, and the annual number of summer rainy days reflects a high performance. Meanwhile, the performance on the daily scale, as shown by the cross validation–based parameter estimation, is also satisfying.

The spatial patterns across different stations indirectly reflect that the parameters and the station-predictor distances D(k, j) were estimated reliably, although they were estimated separately for different stations. By using these two interpolation methods, we could effectively preserve the spatial autocorrelation while also transmitting the spatial heterogeneity (the spatial variations) of the values of both parameters and predictors to the modeled outputs. This approach can overcome the shortcoming of output-based interpolation (see Fu et al. 2013b; Huth et al. 2015): spatial heterogeneity is only brought to the outputs by the spatial variations of model parameters across different stations, and the spatial variations of large-scale predictors cannot be transmitted to the outputs at no-station locations. Of course, directly interpolating outputs can also reflect the heterogeneity from predictors, but only the heterogeneity at gauge stations can be accurately transmitted, and that at no-station positions is produced through interpolation, which cannot be regarded as accurate. In this study, the actual performance of the parameter-interpolation approach should have little advantage over the output-interpolation approach because of the rough resolution of the large-scale predictors; therefore, such performance comparisons need not be quantitatively analyzed.

When compared with the other three products (CMORPH, ERA-Interim/Land, and PGF), the downscaled precipitation had higher correlations with the modeled products (ERA-Interim/Land and PGF) than did the CMORPH products for the daily and monthly area-averaged precipitation. These correlations indicated that the downscaled results had no significant differences from the other three products; therefore, the SD results could be regarded as another useful gridded product for precipitation, similar to the results from dynamical models. Nevertheless, the downscaled outputs cannot reproduce the systematic structure of the storms because the stochastic nature of precipitation still plays an important role and is difficult to explain with the large-scale variables. Given that precipitation is a subdaily process, it is reasonable that the SD based on daily-scale variables cannot produce results that are as satisfying as those produced by regional circulation models that simulate at very short time steps.

5. Further remarks on spatial heterogeneity

Spatial autocorrelation (spatial dependence) and heterogeneity (spatial variability) are two main characteristics that should be reflected by precipitation downscaling. In theory, reproducing heterogeneity is more meaningful than emphasizing autocorrelation because the former is the real purpose of downscaling and the latter can be satisfied more easily. In SD, spatial heterogeneity is carried by three types of information: model parameters, large-scale predictors, and random noise.

The spatial patterns of model parameters are fixed in time, unlike the spatial patterns of large-scale predictors, which vary from day to day. In this respect, the dynamic heterogeneity carried by large-scale predictors is very important for downscaling daily precipitation. Some previous studies used the same set of predictors for all stations; in that case the predictors carry no heterogeneity, and most of the heterogeneity was represented by stochastic noise. The nonhomogeneous hidden Markov model (Frost et al. 2011; Hughes et al. 2002; Charles et al. 1999) can reproduce any spatial dependence but cannot transmit heterogeneity from large-scale predictors on a daily scale. In other words, such models use the same set of predictor values for all sites and therefore are not suitable for a large region such as North China. In this study, the parameter-interpolation method separates the heterogeneity among stations from the heterogeneity carried by large-scale predictors. Although each predictor carries heterogeneity only at a coarse resolution, the combination of multiple predictors can provide complex and meaningful heterogeneity. Thus, the parameter-interpolation method in this study can be used for regions of any size.

As an important component of the local-scale variability in the observations, the random noise is the local high-resolution or high-frequency signal that is difficult to explain with the large-scale predictors. Such noise clearly affects the reproduction of local-scale extremes. In a downscaling application, if the daily variability is not of interest to users, the noise is negligible and the downscaling is deterministic. On the other hand, if users care about the local variability, variance inflation (bias correction) should be applied, for example through a linear transformation or quantile mapping. However, bias correction from the gridbox area mean to the local observed values should be used with caution because it often leads to unexpected outputs (Maraun 2013; Maraun et al. 2017). This implies that modeling noise-related variability with deterministic methods is not an easy task. Maraun (2013) proposed two strategies to avoid the bias-correction problem. The first strategy is to correct only the mean and to neglect the variance inflation problem related to the unexplained noise. The second strategy is to combine a linear transformation with a stochastic noise model. Nevertheless, a noise model should also be introduced with caution because, apart from reproducing the local variability, the generated noise must also satisfy the spatial autocorrelation. Fortunately, many techniques exist for generating a stochastic field that models such spatial autocorrelation, such as multifractal models or other multiscale models (Deidda 2000; Xu et al. 2015). In this study, we mainly focus on downscaling the mean, which is the deterministic part of the model. For the stochastic part, the range of the local variability is always related to the mean, implying that once the mean is obtained the extremes can be easily obtained through the probability distribution function of the noise. Therefore, in most cases it is unnecessary to generate a stochastic output.
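The contrast between deterministic variance inflation and an additive noise model can be made concrete with a small synthetic sketch. It is an illustration under stated assumptions, not an analysis from this study: the series are Gaussian toy data (daily precipitation is not Gaussian), the deterministic downscaling is assumed to recover the large-scale signal exactly, and the helper names `inflate` and `mse` are hypothetical.

```python
import random
import statistics

random.seed(0)
n = 5000

# Synthetic "truth": a large-scale signal plus unexplained local noise.
signal = [random.gauss(0.0, 1.0) for _ in range(n)]
noise = [random.gauss(0.0, 1.0) for _ in range(n)]
observed = [s + e for s, e in zip(signal, noise)]

# Deterministic downscaling, assumed here to recover the signal exactly.
deterministic = list(signal)


def inflate(pred, target_sd):
    """Linear variance inflation: rescale anomalies to match a target standard deviation."""
    mean = statistics.fmean(pred)
    scale = target_sd / statistics.pstdev(pred)
    return [mean + scale * (p - mean) for p in pred]


def mse(a, b):
    """Mean squared error between two equally long series."""
    return statistics.fmean((x - y) ** 2 for x, y in zip(a, b))


sd_obs = statistics.pstdev(observed)

# Inflation attributes all of the local variance to the large-scale signal.
inflated = inflate(deterministic, sd_obs)

# Additive noise model: keep the deterministic part and add residual-sized noise.
residual_sd = statistics.pstdev([o - d for o, d in zip(observed, deterministic)])
stochastic = [d + random.gauss(0.0, residual_sd) for d in deterministic]

print("sd observed     ", sd_obs)
print("sd deterministic", statistics.pstdev(deterministic))
print("sd inflated     ", statistics.pstdev(inflated))
print("sd stochastic   ", statistics.pstdev(stochastic))
print("mse deterministic", mse(deterministic, observed))
print("mse inflated     ", mse(inflated, observed))
```

Both corrections restore the observed spread, but inflation does so by amplifying the predictor-driven signal, which increases the error against the observations in this toy setting, whereas the noise model leaves the deterministic mean untouched. The sketch ignores spatial autocorrelation of the noise, which the text identifies as a separate requirement.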

Acknowledgments

This work is funded by the key scientific research projects of universities in Henan (Grant 19A170007), the Innovation Scientists and Technicians Troop Construction Projects of Henan Province (Grant CXTD2016053), and the Henan Provincial Natural Science Foundation Project (Grant 182300410155).

REFERENCES

  • Balsamo, G., et al., 2015: ERA-Interim/Land: A global land surface reanalysis data set. Hydrol. Earth Syst. Sci., 19, 389–407, https://doi.org/10.5194/hess-19-389-2015.
  • Bao, J., J. Feng, and Y. Wang, 2015: Dynamical downscaling simulation and future projection of precipitation over China. J. Geophys. Res., 120, 8227–8243, https://doi.org/10.1002/2015JD023275.
  • Castro, C. L., R. A. Pielke, and G. Leoncini, 2005: Dynamical downscaling: Assessment of value retained and added using the Regional Atmospheric Modeling System (RAMS). J. Geophys. Res., 110, D05108, https://doi.org/10.1029/2004JD004721.
  • Charles, S. P., B. C. Bates, and J. P. Hughes, 1999: A spatiotemporal model for downscaling precipitation occurrence and amounts. J. Geophys. Res., 104, 31 657–31 669, https://doi.org/10.1029/1999JD900119.
  • Chen, D., 2000: A monthly circulation climatology for Sweden and its application to a winter temperature case study. Int. J. Climatol., 20, 1067–1076, https://doi.org/10.1002/1097-0088(200008)20:10<1067::AID-JOC528>3.0.CO;2-Q.
  • Dee, D. P., et al., 2011: The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Quart. J. Roy. Meteor. Soc., 137, 553–597, https://doi.org/10.1002/qj.828.
  • Deidda, R., 2000: Rainfall downscaling in a space-time multifractal framework. Water Resour. Res., 36, 1779–1794, https://doi.org/10.1029/2000WR900038.
  • DelSole, T., and J. Shukla, 2009: Artificial skill due to predictor screening. J. Climate, 22, 331–345, https://doi.org/10.1175/2008JCLI2414.1.
  • Fan, L., Z. Yan, D. Chen, and C. Fu, 2015: Comparison between two statistical downscaling methods for summer daily rainfall in Chongqing, China. Int. J. Climatol., 35, 3781–3797, https://doi.org/10.1002/joc.4246.
  • Frost, A. J., et al., 2011: A comparison of multi-site daily rainfall downscaling techniques under Australian conditions. J. Hydrol., 408, 1–18, https://doi.org/10.1016/j.jhydrol.2011.06.021.
  • Fu, G., S. P. Charles, and S. Kirshner, 2013a: Daily rainfall projections from general circulation models with a downscaling nonhomogeneous hidden Markov model (NHMM) for south-eastern Australia. Hydrol. Processes, 27, 3663–3673, https://doi.org/10.1002/hyp.9483.
  • Fu, G., S. P. Charles, F. H. S. Chiew, J. Teng, H. Zheng, A. J. Frost, W. Liu, and S. Kirshner, 2013b: Modelling runoff with statistically downscaled daily site, gridded and catchment rainfall series. J. Hydrol., 492, 254–265, https://doi.org/10.1016/j.jhydrol.2013.03.041.
  • Hammami, D., T. S. Lee, T. B. M. J. Ouarda, and J. Lee, 2012: Predictor selection for downscaling GCM data with LASSO. J. Geophys. Res., 117, D17116, https://doi.org/10.1029/2012JD017864.
  • Hughes, J. P., P. Guttorp, and S. P. Charles, 2002: A non-homogeneous hidden Markov model for precipitation occurrence. J. Roy. Stat. Soc., 48C, 15–30, https://doi.org/10.1111/1467-9876.00136.
  • Huth, R., J. Miksovsky, P. Stepanek, M. Belda, A. Farda, Z. Chladova, and P. Pisoft, 2015: Comparative validation of statistical and dynamical downscaling models on a dense grid in central Europe: Temperature. Theor. Appl. Climatol., 120, 533–553, https://doi.org/10.1007/s00704-014-1190-3.
  • Joyce, R. J., J. E. Janowiak, P. A. Arkin, and P. P. Xie, 2004: CMORPH: A method that produces global precipitation estimates from passive microwave and infrared data at high spatial and temporal resolution. J. Hydrometeor., 5, 487–503, https://doi.org/10.1175/1525-7541(2004)005<0487:CAMTPG>2.0.CO;2.
  • Khalili, M., V. N. Van Thanh, and P. Gachon, 2013: A statistical approach to multi-site multivariate downscaling of daily extreme temperature series. Int. J. Climatol., 33, 15–32, https://doi.org/10.1002/joc.3402.
  • Kirchmeier-Young, M. C., D. J. Lorenz, and D. J. Vimont, 2016: Extreme event verification for probabilistic downscaling. J. Appl. Meteor. Climatol., 55, 2411–2430, https://doi.org/10.1175/JAMC-D-16-0043.1.
  • Liu, W., G. Fu, C. Liu, and S. P. Charles, 2013: A comparison of three multi-site statistical downscaling models for daily rainfall in the North China Plain. Theor. Appl. Climatol., 111, 585–600, https://doi.org/10.1007/s00704-012-0692-0.
  • Liu, Y., J. Feng, X. Liu, and Y. Zhao, 2019: A method for deterministic statistical downscaling of daily precipitation at a monsoonal site in Eastern China. Theor. Appl. Climatol., 135, 85–100, https://doi.org/10.1007/S00704-017-2356-6.
  • Manzanas, R., S. Brands, D. San-Martin, A. Lucero, C. Limbo, and J. M. Gutierrez, 2015: Statistical downscaling in the tropics can be sensitive to reanalysis choice: A case study for precipitation in the Philippines. J. Climate, 28, 4171–4184, https://doi.org/10.1175/JCLI-D-14-00331.1.
  • Maraun, D., 2013: Bias correction, quantile mapping, and downscaling: Revisiting the inflation issue. J. Climate, 26, 2137–2143, https://doi.org/10.1175/JCLI-D-12-00821.1.
  • Maraun, D., et al., 2010: Precipitation downscaling under climate change: recent developments to bridge the gap between dynamical models and the end user. Rev. Geophys., 48, RG3003, https://doi.org/10.1029/2009RG000314.
  • Maraun, D., et al., 2017: Towards process informed bias correction of climate change simulations. Nat. Climate Change, 7, 764–773, https://doi.org/10.1038/NCLIMATE3418.
  • Nasseri, M., H. Tavakol-Davani, and B. Zahraie, 2013: Performance assessment of different data mining methods in statistical downscaling of daily precipitation. J. Hydrol., 492, 1–14, https://doi.org/10.1016/j.jhydrol.2013.04.017.
  • Qian, C., W. Zhou, S. K. Fong, and K. C. Leong, 2015: Two approaches for statistical prediction of non-Gaussian climate extremes: A case study of Macao hot extremes during 1912–2012. J. Climate, 28, 623–636, https://doi.org/10.1175/JCLI-D-14-00159.1.
  • San-Martin, D., R. Manzanas, S. Brands, S. Herrera, and J. M. Gutierrez, 2017: Reassessing model uncertainty for regional projections of precipitation with an ensemble of statistical downscaling methods. J. Climate, 30, 203–223, https://doi.org/10.1175/JCLI-D-16-0366.1.
  • Sheffield, J., G. Goteti, and E. F. Wood, 2006: Development of a 50-year high-resolution global dataset of meteorological forcings for land surface modeling. J. Climate, 19, 3088–3111, https://doi.org/10.1175/JCLI3790.1.
  • Shin, Y., and B. P. Mohanty, 2013: Development of a deterministic downscaling algorithm for remote sensing soil moisture footprint using soil and vegetation classifications. Water Resour. Res., 49, 6208–6228, https://doi.org/10.1002/wrcr.20495.
  • Sunyer, M. A., et al., 2015a: Inter-comparison of statistical downscaling methods for projection of extreme precipitation in Europe. Hydrol. Earth Syst. Sci., 19, 1827–1847, https://doi.org/10.5194/hess-19-1827-2015.
  • Sunyer, M. A., I. B. Gregersen, D. Rosbjerg, H. Madsen, J. Luchner, and K. Arnbjerg-Nielsen, 2015b: Comparison of different statistical downscaling methods to estimate changes in hourly extreme precipitation using RCM projections from ENSEMBLES. Int. J. Climatol., 35, 2528–2539, https://doi.org/10.1002/joc.4138.
  • Tareghian, R., and P. F. Rasmussen, 2013: Statistical downscaling of precipitation using quantile regression. J. Hydrol., 487, 122–135, https://doi.org/10.1016/j.jhydrol.2013.02.029.
  • Tolika, K., P. Maheras, M. Vafiadis, H. A. Flocasc, and A. Arseni-Papadimitriou, 2007: Simulation of seasonal precipitation and raindays over Greece: A statistical downscaling technique based on artificial neural networks (ANNs). Int. J. Climatol., 27, 861–881, https://doi.org/10.1002/joc.1442.
  • Werner, A. T., and A. J. Cannon, 2016: Hydrologic extremes—An intercomparison of multiple gridded statistical downscaling methods. Hydrol. Earth Syst. Sci., 20, 1483–1508, https://doi.org/10.5194/hess-20-1483-2016.
  • Xu, G., et al., 2015: Spatial downscaling of TRMM precipitation product using a combined multifractal and regression approach: Demonstration for South China. Water, 7, 3083–3102, https://doi.org/10.3390/w7063083.
  • Yang, C., R. E. Chandler, V. S. Isham, and H. S. Wheater, 2005: Spatial-temporal rainfall simulation using generalized linear models. Water Resour. Res., 41, W11415, https://doi.org/10.1029/2004WR003739.
  • Zhang, Q., P. Shi, V. P. Singh, K. Fan, and J. Huang, 2017: Spatial downscaling of TRMM-based precipitation data using vegetative response in Xinjiang, China. Int. J. Climatol., 37, 3895–3909, https://doi.org/10.1002/joc.4964.
  • Zhang, X., and X. Yan, 2015: A new statistical precipitation downscaling method with Bayesian model averaging: A case study in China. Climate Dyn., 45, 2541–2555, https://doi.org/10.1007/s00382-015-2491-7.
  • Zheng, X., and R. W. Katz, 2008: Mixture model of generalized chain-dependent processes and its application to simulation of interannual variability of daily rainfall. J. Hydrol., 349, 191–199, https://doi.org/10.1016/j.jhydrol.2007.10.061.
  • Zhu, X., X. Qiu, Y. Zeng, W. Ren, B. Tao, H. Pan, T. Gao, and J. Gao, 2018: High-resolution precipitation downscaling in mountainous areas over China: Development and application of a statistical mapping approach. Int. J. Climatol., 38, 77–93, https://doi.org/10.1002/joc.5162.