Comparing the Lasso Predictor-Selection and Regression Method with Classical Approaches of Precipitation Bias Adjustment in Decadal Climate Predictions

Jingmin Li, Institute of Geography and Geology, University of Wuerzburg, Würzburg, Germany

Felix Pollinger, Institute of Geography and Geology, University of Wuerzburg, Würzburg, Germany

Heiko Paeth, Institute of Geography and Geology, University of Wuerzburg, Würzburg, Germany

Abstract

In this study, we investigate the technical application of the regularized regression method Lasso for identifying systematic biases in decadal precipitation predictions from a high-resolution regional climate model (CCLM) for Europe. The Lasso approach is quite novel in climatological research. We apply Lasso to observed precipitation and a large number of predictors related to precipitation derived from a training simulation, and transfer the trained Lasso regression model to a virtual forecast simulation for testing. Derived predictors from the model include local predictors at a given grid box and EOF predictors that describe large-scale patterns of variability for the same simulated variables. A major added value of the Lasso function is the variation of the so-called shrinkage factor and its ability to eliminate irrelevant predictors and avoid overfitting. Among 18 different settings, an optimal shrinkage factor is identified that indicates a robust relationship between predictand and predictors. It turned out that large-scale patterns as represented by the EOF predictors outperform local predictors. The bias adjustment using the Lasso approach mainly improves the seasonal cycle of the precipitation prediction and, hence, improves the phase relationship and reduces the root-mean-square error between model prediction and observations. Another goal of the study pertains to the comparison of the Lasso performance with classical model output statistics and with a bivariate bias correction approach. In fact, Lasso is characterized by a similar, and regionally higher, skill than classical approaches of model bias correction. In addition, it is computationally less expensive. Therefore, we see a large potential for the application of the Lasso algorithm in a wider range of climatological applications when it comes to regression-based statistical transfer functions in statistical downscaling and model bias adjustment.

Current affiliation: Institute of Atmospheric Physics, German Aerospace Center, Oberpfaffenhofen, Germany.

Corresponding author: Jingmin Li, jingmin.li@dlr.de


1. Introduction

The least absolute shrinkage and selection operator (Lasso), developed by Robert Tibshirani in 1996 (Tibshirani 1996), is a regularized regression method with enhanced prediction accuracy and interpretability in comparison to traditional regression methods. It introduces a shrinkage factor into the regression equation and performs variable selection to avoid overfitting, a typical problem of multivariate regression (Tetko et al. 1995). Numerous scientific publications employing this method have emerged recently. It has been widely applied in diverse fields, for example, medicine and biological studies (Göbl et al. 2015; Zhang et al. 2017), cosmology (Uemura et al. 2015), biogeography (Bratsch et al. 2017; Nishar et al. 2017), sociology (Woodard et al. 2016; Liang et al. 2017), ecology (Wilson et al. 2017), environmental sciences (Larkin et al. 2017; Liu et al. 2017), and meteorology (e.g., Gao et al. 2014; Zaikarina et al. 2016).

So far, its applications in meteorology are mostly limited to the statistical downscaling of coarse-resolution datasets from climate models and reanalyses for specific locations (e.g., regions with complex topography). Gao et al. (2014) applied Lasso to downscale ERA-Interim precipitation for the Alps using data from 50 meteorological stations. Rachmawati et al. (2017) used Lasso to estimate rainfall in one specific district in Indonesia based on 49 coarse-resolution variables from the Global Precipitation Climatology Project (GPCP). Upadhyaya and Ramsankaran (2016) improved a multispectral rainfall algorithm for India using Lasso and static topographic variables derived from a digital elevation model. Besides its application to statistical downscaling, Lasso has been used for seasonal prediction of temperature: DelSole and Banerjee (2017) predicted seasonal temperature over Texas based on observational predictors or dynamical model outputs using the Lasso regularization feature. However, applications of Lasso in other fields of meteorology remain sparse and limited.

Here, we introduce the Lasso approach to the topic of bias adjustment of precipitation simulated in the framework of decadal climate predictions. Forecast simulations are known to have various biases (Eden et al. 2012), and there is partly irreducible uncertainty in climate predictions (Hawkins et al. 2016). A cost-effective bias adjustment can remove a climate model's systematic errors and, therefore, improve the accuracy of the model output (Maraun 2012). So far, decadal prediction skill scores for precipitation are generally poor (e.g., Boer et al. 2016). In contrast to temperature, precipitation is characterized by larger model biases and lower multiyear predictability independent of the lead time. Our previous study (Li et al. 2019) presented a lead-time-independent bias-adjustment method for precipitation using model output statistics (MOS). The model's systematic error was identified using an assimilation run forced by reanalysis data for training, which does not include forecast errors. The bias adjustment of systematic errors in 285 individual decadal hindcasts from a large ensemble demonstrated a noticeable improvement in prediction skill. A different bias was identified when the MOS was trained on the basis of the hindcast runs themselves. A distinct error is that decadal forecasts fail to predict a realistic seasonal cycle of precipitation (Li et al. 2019). Therefore, we focus here on monthly time series over all seasons of the year. Another reason is that the sample size increases when using monthly time series.

In this study, we focus on explaining the technical details of the Lasso approach and comparing the behavior of Lasso regression with other statistical methods. We use model output from two single decadal hindcast simulations from a high-resolution regional climate model (CCLM) (Rockel et al. 2008) for Europe as an example, to demonstrate the ability of Lasso to identify model errors in interannual variability and to achieve an appropriate bias adjustment. This example clearly visualizes the Lasso regression process and its advantages with respect to classical model output statistics. The calibration of complete hindcast ensembles that include data from 285 hindcasts in total has been discussed thoroughly in our previous study (Li et al. 2019). The objective of this study is to open the “black box” of the regularized regression method Lasso to a broader public, using two single simulations as an example. We focus on some key questions: (i) How does Lasso regularize the selection of predictors? (ii) How does Lasso regression react to a change of the shrinkage factor? (iii) What are the effects of applying Lasso regression for a bias adjustment to another independent simulation?

This paper is organized as follows. Section 2 describes data and methods. The results of the Lasso regression based on the training data are presented in section 3. Section 4 demonstrates the application of Lasso to an independent precipitation forecast (testing data) and undertakes a comparison with other statistical methods. The results are discussed in section 5 and major conclusions are drawn in section 6.

2. Data and methods

a. Data and climate model simulations

We use data from two single simulations realized with the high-resolution regional climate model CCLM, produced in the framework of the German decadal climate prediction system (Marotzke et al. 2016). They are dynamically downscaled from the coupled Earth system model MPI-ESM-HR (Müller et al. 2018), at a horizontal resolution of 0.22° (~25 km) over Europe (see model domain in Fig. 1). The hindcast data are initialized once at the start of a specific year but evolve freely over the next 10 years. They exhibit a more or less large forecast uncertainty and different types of errors. One typical error of hindcast simulations is a lead-time-dependent drift (Meehl et al. 2014; Paeth et al. 2019). This error has been found to mainly affect temperature, especially over the North Atlantic (Meehl et al. 2014; Marotzke et al. 2016; Boer et al. 2016), but not precipitation. In this study, we focus on a technique to identify and adjust systematic errors in precipitation predictions that are not characterized by a drift problem.

Fig. 1.

Long-term mean annual precipitation (mm yr−1) during the 2001–10 period from (a) E-OBS and (b) a CCLM decadal prediction, and (c) systematic differences between them. Red stars and labels indicate the example locations where we show various results of the Lasso approach in the following plots.

Citation: Monthly Weather Review 148, 10; 10.1175/MWR-D-19-0302.1

Two simulations are selected from the hindcast dataset. A hindcast run covering 1986–95 is used to train the statistical models. This is the same training data as used in Li et al. (2019). We treat another independent hindcast run covering 2001–10 as unknown data (a retrospective forecast); this dataset is used to evaluate the efficiency of the statistical training. As the main goal of the present study is the comparison of the three different methods of bias adjustment, we rely on only one exemplary pair of training and verification decades, assuming that the complete 285 hindcasts published in Li et al. (2019) provide similar results in terms of the relative skill of the methods. The quite stationary nature of the systematic model biases across all available hindcasts, as revealed in Li et al. (2019), supports this assumption.

To estimate the model deficiencies in the CCLM hindcasts, we compare CCLM predictions for precipitation from our testing simulation to the E-OBS dataset (Haylock et al. 2008). Figure 1 shows 10-yr means of annual precipitation during 2001–10 over Europe from model and observational data, and their differences. The figure shows the model's systematic bias relative to E-OBS. CCLM overestimates the precipitation amount for most regions in Europe. There is a bias of more than 200 mm yr−1 in Scandinavia, the Iberian Peninsula, France, and the eastern part of the model domain. A smaller bias of 100 mm yr−1 is found for central Europe. These systematic overestimations are on the order of 25%–50% of the present-day rainfall totals (Figs. 1b,c).

b. Statistical models

1) Lasso regression approach

Lasso is a regularized regression method that performs subset selection and regression at the same time. It shrinks the regression coefficients of irrelevant predictors toward zero and performs an adequate variable selection to avoid the overfitting problem that is typical for ordinary least squares regression (OLS).

We have multiple realizations (e.g., a time series $y_i$ of a variable $y$, which we call the predictand). For each of the $i = 1, 2, \ldots, N$ data points, we have multiple predictors $x_{ij} = (x_{i1}, x_{i2}, \ldots, x_{ip})$, where $p$ is the total number of available predictors. The OLS estimates the $j = 0, 1, \ldots, p$ regression coefficients $\beta_j$ based on the criterion of a minimized mean square error between the predictand $y_i$ and the predictions $\hat{y}_i = \beta_0 + \sum_j \beta_j x_{ij}$, according to Eq. (1):

$$\hat{\beta} = \arg\min_{\beta} \left\{ \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 \right\}. \quad (1)$$

However, when the number of predictors is large, the OLS estimation will often overfit (Tetko et al. 1995). Overfitting is a typical problem in statistical fitting (e.g., regression models) that occurs when the number of predictors is large compared with the degrees of freedom of a given sample. In our case, the sample comprises 120 data points (months) and the number of predictors is 87. Under such conditions, optimization approaches such as the least squares method will identify a linear combination that explains the total variance of the sample even if the predictors have no true relationship with the predictand. To avoid this overfitting problem, Lasso includes a shrinkage factor s in the regression.
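The overfitting danger described above can be made concrete with a minimal numpy sketch (illustrative only, not part of the study's pipeline): with 120 samples and 87 purely random predictors that have no true relationship to the predictand, an OLS fit still "explains" a large share of the sample variance.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 120, 87                      # sample size and predictor count as in the study
X = rng.normal(size=(N, p))         # predictors with no true link to y
y = rng.normal(size=N)              # independent "predictand"

# OLS fit with intercept: solve min ||A beta - y||^2
A = np.column_stack([np.ones(N), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
yhat = A @ beta

# training R^2 is high even though the predictors are pure noise
r2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)
```

With these dimensions the training R2 typically lands well above 0.5 purely by fitting noise, which is exactly the failure mode the shrinkage factor is meant to prevent.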

The idea is to remove predictors unrelated to the predictand or merely replicating information provided by other predictor variables. This shrinkage factor s controls the size of the selected predictor subsets. Thus, the equation for the Lasso regression coefficient reads

$$\hat{\beta} = \arg\min_{\beta} \left\{ \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 + s \sum_{j=1}^{q} \left| \beta_j \right| \right\}, \quad (2)$$

where q is the total number of predictors selected.

We test 18 different shrinkage factors ranging from 0.01 to 3.0 to investigate the functionality of the Lasso approach, using the training hindcast period 1986–95 and the verification period 2001–10. The optimal shrinkage factor is chosen by means of an additional step: we apply a tenfold cross validation to the training simulation 1986–95 to identify the setting with the minimum bias and to keep the training data consistent with the MOS approach.
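The cross-validation step can be sketched as follows. This is a hedged Python illustration rather than the authors' R/glmnet code, and the simple coordinate-descent fit inside it is only a stand-in for the Lasso solver.

```python
import numpy as np

def lasso_fit(X, y, s, n_iter=100):
    """Minimal Lasso via cyclic coordinate descent (soft-thresholding);
    assumes a centered y and standardized predictors X."""
    beta = np.zeros(X.shape[1])
    sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            r = y - X @ beta + X[:, j] * beta[j]        # partial residual
            rho = X[:, j] @ r
            beta[j] = np.sign(rho) * max(abs(rho) - s / 2.0, 0.0) / sq[j]
    return beta

def cv_shrinkage(X, y, factors, k=10):
    """Pick the shrinkage factor with the smallest k-fold CV error."""
    N = X.shape[0]
    folds = np.array_split(np.arange(N), k)
    cv_err = []
    for s in factors:
        err = 0.0
        for f in folds:
            train = np.setdiff1d(np.arange(N), f)
            b = lasso_fit(X[train], y[train], s)
            err += np.mean((y[f] - X[f] @ b) ** 2)      # out-of-fold error
        cv_err.append(err / k)
    return factors[int(np.argmin(cv_err))]
```

In the study the candidate list would be the 18 shrinkage factors between 0.01 and 3.0, applied per grid point.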

In this study, the predictand y is the observed monthly precipitation provided by E-OBS. Predictors are derived from the multivariate output of the training simulation. Altogether, we take 87 potential predictors from the precipitation (pr), temperature (tas), and sea level pressure (psl) fields into account. The predictors are classified into two groups: local and EOF predictors. Local predictors are the simulated time series from the nine grid points surrounding a specific E-OBS grid point. EOF predictors are constructed via empirical orthogonal function analyses over the whole model domain and represent the large-scale intra- and interannual variation of the prediction. We use the leading 20 EOF time series per variable as predictors. In the present study, EOF analysis is applied not for data reduction or to alleviate the overfitting problem, but to set up a new group of predictors that represent the temporal behavior of large-scale meteorological fields such as circulation modes and weather types. It will be shown later that local precipitation variability is indeed governed by such large-scale atmospheric structures. At each grid point, we prepare training data that consist of 120 time steps from 87 predictors and 1 predictand. This is a weakly constrained setting for statistical training, which calls for additional regularization, provided by Lasso through its shrinkage factor. After the Lasso training, subsets of predictors are selected and their respective regression coefficients are estimated. The selected predictors and the derived coefficients are then transferred to the forecast simulation to test the bias adjustment on the basis of an independent dataset. Details about the procedure are given in Li et al. (2019).
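The construction of the EOF predictor group can be sketched as follows (a minimal SVD-based version in Python; the study's actual EOF setup may differ in detail, and the field shown is synthetic):

```python
import numpy as np

def eof_predictors(field, n_eof=20):
    """Leading EOF time series (principal components) of a gridded field.

    field: array of shape (n_time, n_grid), e.g., 120 months x all grid boxes
    of one simulated variable. Returns (n_time, n_eof) predictor time series.
    """
    anom = field - field.mean(axis=0)       # monthly anomalies per grid box
    u, sval, _ = np.linalg.svd(anom, full_matrices=False)
    return u[:, :n_eof] * sval[:n_eof]      # PC time series, scaled by singular values

# Illustrative use: 120 months x 500 grid boxes of synthetic data
field = np.random.default_rng(2).normal(size=(120, 500))
pcs = eof_predictors(field)                 # shape (120, 20)
```

Repeating this for pr, tas, and psl yields the 60 EOF predictors, which are then concatenated with the 27 local predictors.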

Our statistical training and bias adjustment are conducted on continuous monthly time series covering 10 years. No seasonal stratification of the training data is applied. This allows an evaluation of the phase relationship between model prediction and observational time series. Furthermore, it provides more data for statistical training and is computationally efficient: only one regression function is needed for the bias adjustment, instead of 4 seasonal or 12 monthly regression functions.

We calculate the Lasso regularization with the “glmnet” package of the R programming language (Friedman et al. 2010). This package provides a fast algorithm of cyclical coordinate descent computed along a regularization path (Goeman 2010). Cyclical coordinate descent is an optimization algorithm: it minimizes a multidimensional function by successively minimizing along individual dimensions in a cyclic way, holding the values in the other dimensions fixed.
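As an illustration of the algorithm (a plain-Python sketch, not the optimized glmnet implementation), one cycle of coordinate descent updates each coefficient with a soft-thresholding rule while holding the others fixed:

```python
import numpy as np

def lasso_cd(X, y, s, n_iter=200):
    """Lasso via cyclical coordinate descent, minimizing
    sum((y - X beta)^2) + s * sum(|beta|) as in Eq. (2);
    assumes a centered y and standardized predictors X."""
    N, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ beta + X[:, j] * beta[j]    # residual without predictor j
            rho = X[:, j] @ r
            # soft-thresholding: irrelevant predictors are shrunk exactly to zero
            beta[j] = np.sign(rho) * max(abs(rho) - s / 2.0, 0.0) / col_sq[j]
    return beta

# Illustrative use: only the first of five predictors carries signal
rng = np.random.default_rng(1)
X = rng.normal(size=(120, 5))
y = 2.0 * X[:, 0] + 0.1 * rng.normal(size=120)
beta = lasso_cd(X, y, s=100.0)   # strong penalty eliminates the noise predictors
```

With this penalty the four irrelevant coefficients are set exactly to zero, while the informative one survives in shrunken form.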

The results are compared to two other statistical methods: model output statistics based on a classical stepwise multivariate regression (MOS) (Paeth 2011) and a two-dimensional bias adjustment with temperature and precipitation (2DTP) (Piani and Haerter 2012).

2) MOS

The MOS algorithm uses exactly the same predictor datasets and the same training simulation (hindcast 1986–95) as the Lasso algorithm. MOS is a stepwise forward regression method with an implemented cross-validation process (Paeth 2011). It also performs a predictor selection, with the predictors being selected stepwise based on the magnitude of their correlation with the predictand's current residual. The physical meaningfulness of the regression is validated via a cross-validation process: the time series of predictors and predictand from the training simulation (hindcast 1986–95) are split into two datasets, a subset with 80% of the monthly time steps on which the regression model is built and a subset with the remaining 20% of data points for its evaluation. If an added predictor is viable, the mean square error of both the training and the control dataset decreases; otherwise, the additional predictor increases the mean square error of the control dataset. Therefore, the optimal number and subset of predictors are determined from the first minimum of the forecast error. The described procedure is iterated 100 times, each time with the training dataset selected in a randomized way. This procedure consumes considerably more computing time than Lasso.
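A condensed sketch of this stepwise forward selection is given below: one randomized 80/20 split rather than the 100 iterations, and the function name and details are illustrative rather than the original implementation from Paeth (2011).

```python
import numpy as np

def stepwise_mos(X, y, holdout=0.2, seed=0):
    """Forward stepwise regression: add the predictor most correlated with
    the current residual; stop at the first minimum of the holdout error."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    idx = rng.permutation(N)
    n_val = int(holdout * N)
    val, tr = idx[:n_val], idx[n_val:]
    chosen, best_mse = [], np.inf
    resid = y[tr] - y[tr].mean()
    for _ in range(p):
        corr = [0.0 if j in chosen else
                abs(np.corrcoef(X[tr, j], resid)[0, 1]) for j in range(p)]
        trial = chosen + [int(np.argmax(corr))]
        # OLS fit with intercept on the training subset
        A = np.column_stack([np.ones(len(tr)), X[tr][:, trial]])
        beta, *_ = np.linalg.lstsq(A, y[tr], rcond=None)
        Av = np.column_stack([np.ones(n_val), X[val][:, trial]])
        mse = np.mean((y[val] - Av @ beta) ** 2)
        if mse >= best_mse:
            break                       # holdout error starts rising: stop
        best_mse, chosen = mse, trial
        resid = y[tr] - A @ beta
    return chosen
```

Repeating this over many randomized splits, as the original method does, stabilizes the selected subset at the cost of the extra computing time noted above.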

MOS has been successfully applied to the complete hindcast dataset for the bias adjustment of decadal precipitation predictions and demonstrated an improved accuracy over most parts of Europe (Li et al. 2019). In this study, we use the MOS method as a reference for the Lasso method. The estimated regression functions from both methods are examined using the independent control data (hindcast run 2001–10).

3) 2DTP

2DTP is another statistical method we use for comparison. Compared to Lasso and MOS, it is simple and time efficient. Piani and Haerter (2012) presented this precipitation bias-adjustment method using model output only for temperature and precipitation (two predictors for the regression). They showed that the 2DTP bias correction provides a more realistic copula function for temperature and precipitation that retains the dynamical link between them, in contrast to two independent 1D bias adjustments. This link between multiple variables is of high importance (e.g., for end users concerned with hydrological or biogeographical issues and for various climate impact models).

The regression equations of 2DTP are determined from our training simulation following Piani and Haerter (2012). First, a linear fit of simulated temperature (Tm) to observed temperature (To) yields a regression coefficient bT, which is used to adjust the temperature of the training simulation [Eq. (3)]. The second step is to group the temperature–precipitation (T, P) pairs of the temperature-adjusted training simulation into 10 temperature intensity quantiles. Then, for each temperature quantile group, a regression coefficient bP for the precipitation correction is determined separately by linear regression of model precipitation (Pm) on observations (Po) [Eq. (4)]:

$$b_T = f_{\mathrm{linear\ regression}}(T_m, T_o), \quad (3)$$
$$b_P^{j\mathrm{th}\ T\ \mathrm{quantile}} = f_{\mathrm{linear\ regression}}\left( P_m^{j\mathrm{th}\ T\ \mathrm{quantile}},\ P_o^{j\mathrm{th}\ T\ \mathrm{quantile}} \right). \quad (4)$$

The bias correction of forecast temperature and precipitation is then straightforward, requiring only simulated precipitation and temperature as predictors: the temperature forecast is adjusted with bT, and precipitation forecasts in a given temperature intensity quantile group are adjusted with the bP of that group.
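The two fitting steps and the forecast adjustment can be sketched as follows. This is a simplified, slope-only version in Python; Piani and Haerter (2012) give the full specification, and the helper names here are our own.

```python
import numpy as np

def fit_2dtp(Tm, To, Pm, Po, n_q=10):
    """Fit 2DTP coefficients: one temperature slope b_T, then one
    precipitation slope per temperature intensity quantile group."""
    bT = np.polyfit(Tm, To, 1)[0]                 # temperature regression slope
    T_adj = Tm * bT                               # temperature-adjusted training run
    edges = np.quantile(T_adj, np.linspace(0, 1, n_q + 1))
    groups = np.clip(np.searchsorted(edges, T_adj, side="right") - 1, 0, n_q - 1)
    bP = np.array([np.polyfit(Pm[groups == g], Po[groups == g], 1)[0]
                   for g in range(n_q)])          # one slope per quantile group
    return bT, edges, bP

def apply_2dtp(Tm, Pm, bT, edges, bP):
    """Adjust a forecast: temperature with b_T, precipitation with the b_P
    of its temperature quantile group."""
    T_adj = Tm * bT
    g = np.clip(np.searchsorted(edges, T_adj, side="right") - 1, 0, len(bP) - 1)
    return T_adj, Pm * bP[g]
```

The quantile edges estimated on the training decade are reused to assign forecast months to groups, which keeps the temperature–precipitation link intact.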

c. Evaluation

We present in this study bias adjustments for the monthly time series of the hindcast run 2001–10, using regression functions estimated from the different statistical models based on the training simulation. The adjusted time series from this hindcast prediction are tested against observational data in terms of the explained variance (R2) and the root-mean-square error (RMSE). An increase of R2 indicates an improved predictability of month-to-month precipitation variations, which can be attributed to a more realistic seasonal cycle and/or monthly anomalies. Similarly, a decrease in RMSE indicates an improvement in the simulated precipitation climatology. A significance test for correlations as suggested by Olkin (1967) is used to evaluate whether changes in R2 due to bias adjustment are significant and whether the values of R2 differ between the statistical methods. We adjust the degrees of freedom with respect to the number of predictors used for the regression.
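The two evaluation metrics can be computed as below (a minimal sketch; here R2 is taken as the squared correlation between observed and predicted series, which is one common reading of "explained variance"):

```python
import numpy as np

def scores(obs, pred):
    """Explained variance (squared correlation) and root-mean-square error."""
    r2 = np.corrcoef(obs, pred)[0, 1] ** 2
    rmse = np.sqrt(np.mean((obs - pred) ** 2))
    return r2, rmse

# Illustrative use: a perfect prediction gives R^2 = 1 and RMSE = 0
obs = np.array([10.0, 30.0, 20.0, 50.0])
r2, rmse = scores(obs, obs)
```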

We limit our evaluation in this study to this single hindcast run to focus on the technical prospects of the statistical models; similar results have been obtained for all other hindcast runs. The MOS approach, which has previously been evaluated in the light of various skill score functions (Li et al. 2019), is used in this study as a reference method for the Lasso regression. Expected results of a Lasso application to the complete decadal hindcasts are briefly discussed based on the MOS results published in our previous study.

3. Lasso characteristics in the training data

Our results are presented in two steps. First, we show results and characteristics of the Lasso approach based on the training simulation and observations, namely the influence of the Lasso shrinkage factor on the selected predictors and the regression results (section 3); second, the performance of the Lasso approach is assessed by applying it to an independent forecast simulation (section 4).

Lasso's shrinkage factor is a major constraint that determines the number of selected predictors and the fitting accuracy. To investigate its influence on the selected predictor subsets, the Lasso fit is calculated with 18 different shrinkage factors ranging from 0.01 to 3 for the complete model domain. The potential predictors (87 in total) are prepared from monthly outputs of the training simulation [see details in section 2b(1)]; the predictand is the observed monthly precipitation from E-OBS during the same time period.

Various characteristics of the Lasso approach are visualized for four locations (Berlin, Germany; Graz, Austria; Madrid, Spain; and Oulu, Finland) in Fig. 2. The results show 1) that the number of selected predictors decreases with an increase of the Lasso shrinkage factor; for example, for Berlin, about 80 out of 87 predictors are selected by Lasso with a shrinkage factor of 0.01, whereas only 24 predictors are selected with a shrinkage factor of 2.0. 2) There is noticeable spatial variation in the Lasso predictor selection: different subsets of predictors are selected at different locations. 3) EOF predictors dominate for shrinkage factors larger than 1.0. This suggests that EOF predictors, which represent the larger-scale variability of precipitation, are more relevant to observed precipitation than local predictors. Local sea level pressure (psl) predictors are obviously less important for monthly precipitation variations.

Fig. 2.

Influence of the Lasso shrinkage factor on the predictor selection at 4 exemplary locations, with the y axis displaying the complete set of predictors (87 in total). Orange lines separate the 6 different groups of predictors: 9 local predictors and 20 EOF predictors each for the variables precipitation (pr), temperature (tas), and sea level pressure (psl). The Lasso shrinkage factors (18 in total) are displayed along the x axis, ranging from 0.01 to 3.0, together with the classical MOS approach trained with the same data (bottom row, marked by a blue rectangle). The selected predictors are marked with red bars. Another blue rectangle marks the selected optimal Lasso shrinkage factor as discussed in section 4.


To evaluate the fitting accuracy, we calculate R2 and RMSE for the fitted time series in comparison with observations. With respect to the effect of the shrinkage factor on the results of the Lasso regression, in general we sacrifice regression accuracy as the shrinkage factor increases: R2 decreases by about 15%–30% as the shrinkage factor rises from 0.01 to 3.0 (Fig. 3), and the RMSE increases by about 5–10 mm month−1 over the same range (Fig. 4). Furthermore, the effect of the shrinkage factor saturates toward larger values. This effect can be explained by the relatively constant selection of predictors beyond certain thresholds of the shrinkage factor (Fig. 2).

Fig. 3.

The influence of Lasso shrinkage factors (x axis) on the explained variance (R2) from Lasso regression (black dots), compared with the original model prediction (Raw, blue line), the MOS regression (dark green line), and 2DTP regression (orange line). R2 is calculated between simulated and observed monthly precipitation during 1986–95.


Fig. 4.

As in Fig. 3, but for the root-mean-square error (RMSE).


The above results can be explained as follows. If the shrinkage factor in Eq. (2) becomes very small, Lasso reduces to the OLS approach; in our case, a sample of 120 data points (months) and 87 predictors implies that the Lasso model is then prone to overfitting. In contrast, when s becomes very large, the Lasso model reduces to a constant fit (i.e., the climatological mean), eliminating all other regression coefficients, which likely leads to underfitting.

Note that the shrinkage penalty presented in this study depends on the scale of the coefficients, and thus on the scale of the predictors. The range of shrinkage factors from 0.01 to 3 would be meaningless if the units of the predictors were changed. In regularized regression, it is therefore customary and important to rescale the predictors to identical variances.
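In code, this rescaling is a one-liner (a minimal numpy sketch):

```python
import numpy as np

def standardize(X):
    """Rescale each predictor column to zero mean and unit variance, so the
    shrinkage penalty acts on comparable coefficient scales."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Illustrative use on predictors with arbitrary mean and spread
Z = standardize(np.random.default_rng(5).normal(5.0, 3.0, size=(100, 4)))
```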

A detailed description of the classical MOS regression using the same data is given in Li et al. (2019). The blue boxes in Fig. 2 mark the final predictor selections from MOS and Lasso, which differ from each other. For example, at Oulu (Finland), MOS picked 9 tas_local and 4 EOF predictors, while Lasso chose 1 tas_local predictor along with 22 EOF predictors. Regarding regression accuracy, we first see that all approaches are effective: they increase R2 by at least 20% (Fig. 3) and reduce the RMSE by about 15–30 mm month−1 (Fig. 4). Furthermore, all methods achieve improvements in a similar range, with only small differences between them.

4. Lasso application to independent data

The selected predictors and the derived regression coefficients can be transferred to a precipitation forecast for the purpose of bias adjustment. Here, we demonstrate the approach using a decadal prediction of precipitation for the 2001–10 period as an example, to assess whether the trained statistical model achieves reliable results for another independent simulation.

a. Influence of the Lasso shrinkage factor

The time series of Lasso bias-adjusted precipitation during the 2001–10 period are shown for the location Graz in Fig. 5, together with the unadjusted prediction, the E-OBS observations, and the MOS- and 2DTP-adjusted predictions. Three general conclusions can be drawn here: 1) all methods adjust the simulated precipitation toward the observed seasonal cycle; 2) as for the training data, all methods underestimate the observed precipitation variability, and simulated extremes of monthly precipitation are eliminated in the course of the bias correction, although the Lasso adjustment shows the smallest underestimation of monthly precipitation variations; and 3) the precipitation variability is a function of the Lasso shrinkage factor, as was found for the training simulation: with increasing shrinkage factor, Lasso reduces the amplitude of the precipitation variations during the considered time period and the adjusted time series moves closer to a mean state.

Fig. 5.

Time series of precipitation (mm month−1) during 2001–10 at Graz (Austria) from (a) the decadal hindcast prediction (blue) and E-OBS (red) and (b) the bias-adjusted prediction using MOS (green), 2DTP (orange), and Lasso (black). The filled blue area marks the range of time series adjusted using Lasso shrinkage factors from 0.01 to 3.0; the largest variance corresponds to the smallest shrinkage factor (0.01). The black line corresponds to the optimal shrinkage factor.


The bias-adjustment accuracy is measured by R2 and RMSE and displayed in Figs. 6 and 7 for the 2001–10 control period. The results for MOS and 2DTP are included in the same plots as a reference. However, we first discuss the Lasso bias-adjustment accuracy in relation to the choice of the shrinkage factor. First, R2 of the Lasso bias-adjusted time series increases with the Lasso shrinkage factor, with Berlin being a notable exception (Fig. 6). This result is in contrast to the training period. For example, at Graz, R2 increases from below 10% to 30% as the Lasso shrinkage factor increases from 0.01 to 1.0, while a reduction from 60% to 40% was observed during the training period (cf. Fig. 3). Second, the RMSE decreases with an increasing Lasso shrinkage factor (Fig. 7); for example, at Graz, the bias decreases from 50 to 40 mm month−1 as the Lasso shrinkage factor increases from 0.01 to 3.0. This effect is again opposite to that for the training data.

Fig. 6.

As in Fig. 4, but for the application of bias adjustment to the independent 2001–10 period.


Fig. 7.

As in Fig. 6, but for RMSE.


These findings demonstrate the ability of the Lasso approach to avoid overfitting. With a small shrinkage factor, the regression performs well on training data but poorly on independent data. With an increasing shrinkage factor, the regression performance decreases for the training data but increases for the control data. In the ideal situation, a local maximum in the verification testing can be found; this local maximum then indicates the optimal shrinkage factor. In this study, the optimal shrinkage factor is chosen via cross validation in the training data. Yet, it turns out that in our problem setting a trade-off occurs between training and control data. If an appropriate shrinkage factor is chosen, a robust statistical relationship between predictors and predictand is established. This relationship is more stable and viable for further applications (e.g., to unknown future conditions).
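The overfitting trade-off and the cross-validated choice of the shrinkage factor can be sketched as follows. The setting is invented (a small sample with 87 candidate predictors of which only 3 are informative) and uses scikit-learn's Lasso/LassoCV rather than the study's own code:

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

rng = np.random.default_rng(0)

# Small training sample, many candidate predictors (only 3 informative):
# the classic setting where a too-small shrinkage factor overfits.
n_train, n_pred = 60, 87
X = rng.normal(size=(n_train, n_pred))
y = X[:, 0] * 2 + X[:, 1] - X[:, 2] + rng.normal(scale=1.0, size=n_train)

weak = Lasso(alpha=0.001, max_iter=50000).fit(X, y)   # barely regularized
strong = Lasso(alpha=0.5, max_iter=50000).fit(X, y)   # strongly regularized

# In-sample fit is always at least as good for the weaker penalty ...
assert weak.score(X, y) >= strong.score(X, y)
# ... but it retains far more (mostly irrelevant) predictors.
assert np.sum(weak.coef_ != 0) > np.sum(strong.coef_ != 0)

# Cross validation picks the shrinkage factor that generalizes best.
cv = LassoCV(alphas=np.logspace(-3, 1, 20), cv=5).fit(X, y)
print(f"optimal shrinkage factor: {cv.alpha_:.3f}")
```

The weakly penalized fit shines on the training sample exactly as described above, while cross validation trades some in-sample skill for robustness on independent data.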

b. Comparison of bias-adjustment methods in Europe

Using the Lasso regression function with the optimal shrinkage factor, we conduct a bias adjustment for the entire European model domain. R2 and RMSE of the Lasso-adjusted monthly precipitation are compared with the bias adjustments from MOS and 2DTP and with the nonadjusted prediction in Fig. 8. Before bias adjustment, the predicted monthly precipitation has poor R2 and high RMSE. For some regions in northern Europe and Spain, R2 reaches ~20%; in the remaining regions, R2 is lower than 5% (Fig. 8a). The RMSE is scaled with reference to the observational standard deviation. Before bias adjustment, a high bias is seen over all mountainous areas, which probably arises from the coarse-grid topographic boundary condition in the climate model. Relatively low biases are found for the other regions in Europe except the domain edges (Fig. 8b). After the bias adjustment, the bias is clearly reduced over the simulation domain, and Lasso increases R2 to 30%–50% for northern Europe. The improvement in R2 is largely caused by the improved seasonal cycle in the monthly time series. R2 does not improve over England, Germany, or France (Fig. 8c). The low values of R2 in these regions are due to the low predictability of precipitation, which cannot be enhanced by removing model biases: precipitation in these regions has weaker seasonal differences than in northern Europe. The RMSE/1STD is reduced to below 1 for most regions in Europe (Fig. 8d).
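The two gridpoint verification measures can be sketched as simple functions (a minimal numpy sketch with invented data; we assume R2 here denotes the squared correlation between the two series, and RMSE/1STD the RMSE divided by one standard deviation of the local observations):

```python
import numpy as np

def r2_score(obs, pred):
    """Squared Pearson correlation between observed and predicted series."""
    r = np.corrcoef(obs, pred)[0, 1]
    return r ** 2

def scaled_rmse(obs, pred):
    """RMSE scaled by one standard deviation of the local observations.

    Values below 1 mean the typical error is smaller than the observed
    variability at that grid point.
    """
    rmse = np.sqrt(np.mean((pred - obs) ** 2))
    return rmse / np.std(obs)

# Toy example: a prediction that reproduces the observed seasonal cycle
# with a small constant offset has high R2 and RMSE/1STD well below 1.
months = np.arange(120)
obs = 60 + 30 * np.sin(2 * np.pi * months / 12)
pred = obs + 5.0  # constant offset error
print(r2_score(obs, pred), scaled_rmse(obs, pred))
```

Applied at every grid box, these two numbers yield maps like Fig. 8: a corrected seasonal cycle drives R2 up, while RMSE/1STD below 1 marks errors smaller than the local observed variability.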

Fig. 8.

Spatial distribution of (left) R2 and (right) RMSE/1STD estimated between the decadal precipitation hindcast and observations, (from top to bottom) from the (a),(b) non-bias-adjusted model, (c),(d) Lasso-adjusted model prediction using the optimum shrinkage factor, (e),(f) MOS-adjusted model prediction, and (g),(h) 2DTP-adjusted model prediction. RMSE/1STD represents a scaled RMSE using one standard deviation of the local observational data. Black points in the left column show locations where the bias adjustment achieves a significant improvement in R2 compared to the unadjusted prediction at the 0.05 level. Because of the high spatial resolution of the climate simulation, only every second significant grid point is marked to enhance readability.


To assess the regional differences among the statistical methods in detail, we illustrate the differences in R2 between Lasso and MOS, Lasso and 2DTP, and MOS and 2DTP in Fig. 9. The results show that Lasso performs better than the other statistical models. Lasso outperforms MOS (by 5%–8% in R2) at scattered locations over northern Europe. Lasso and MOS surpass 2DTP (by 8%–10%) for most regions in northern Europe. However, for some selected regions in central and eastern Europe as well as in the Mediterranean basin, MOS and 2DTP show slightly higher R2. Our results agree with other studies (Gao et al. 2014; Seya et al. 2015) that also suggest a generally good performance of Lasso in comparison with other methods.

Fig. 9.

Differences in R2 among the statistical methods of bias adjustment for monthly precipitation during the 2001–10 period. Green points show locations where differences between statistical methods are significant at the 0.05 level. As in Fig. 8, only every second significant grid point is marked here.


Here we restrict the bias adjustment to a single independent hindcast run at the monthly time scale. The evaluation of other skill scores on the complete set of 285 hindcasts published in Li et al. (2019) supports our results. The MOS adjustment for decadal predictions of precipitation, using the same training data and predictor sets, was evaluated at lead-time-dependent multiyear time scales with the mean square error skill score (MSESS) and the continuous ranked probability skill score (CRPSS). The MSESS describes the accuracy of the hindcast ensemble mean in comparison to the reference data, and the CRPSS compares the ensemble spread uncertainty with observations (Illing et al. 2014). Our bias adjustment results in a regional improvement of 0.5 in MSESS but a largely reduced ensemble spread uncertainty in CRPSS. The reason is that variance is removed when correcting all hindcast time series toward a known seasonal cycle. Ideally, we should distinguish the bias signal from the prediction signal before the statistical training; however, our current procedure of bias adjustment is not able to do so.
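The MSESS mentioned above follows the usual skill-score form. The sketch below assumes the reference forecast is the observed climatological mean, which is a common but not necessarily the study's exact choice; the data are invented:

```python
import numpy as np

def msess(obs, hindcast, reference=None):
    """Mean square error skill score: 1 - MSE(hindcast) / MSE(reference).

    Positive values mean the hindcast beats the reference forecast;
    by default the reference is the observed climatological mean.
    """
    if reference is None:
        reference = np.full_like(obs, np.mean(obs))
    mse_h = np.mean((hindcast - obs) ** 2)
    mse_r = np.mean((reference - obs) ** 2)
    return 1.0 - mse_h / mse_r

obs = np.array([50.0, 80.0, 110.0, 70.0])
good = obs + np.array([5.0, -5.0, 5.0, -5.0])   # small errors
print(msess(obs, good))  # close to 1: clearly better than climatology
```

A perfect hindcast gives an MSESS of 1, matching the reference gives 0, and a worse-than-reference hindcast is negative, so an improvement of 0.5 is substantial.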

Regarding computing efficiency, Lasso is very effective because of its newly developed fast algorithms (Keerthi and Shevade 2007; Goeman 2010; Gorban et al. 2016). Lasso is about 5 times faster than the MOS procedure as applied in this specific study for the same task. The difference is mostly caused by the time-consuming forward stepwise predictor-selection process implemented in MOS and its 100 iterations with random resampling of the training data. In some regions across our European model domain, a stable solution is already found for a smaller number of iterations, which would shift the comparison of required computing time in favor of the MOS approach.

Compared to Lasso and MOS, 2DTP has a simple algorithm and the smallest computational demand. The 2DTP is efficient for the shown example because the adjustment of precipitation within the defined temperature quantile is equivalent to a bias adjustment of precipitation for each month separately, so the seasonal cycle of the predicted precipitation is efficiently corrected. However, 2DTP considers only two predictors; therefore, its application is limited when it comes to more complex variables or processes. Lasso and MOS, on the other hand, allow a more flexible choice among the provided predictors. They can select the most relevant predictors from a large set and, hence, have a wider application potential than 2DTP.

5. Discussion

Lasso has been demonstrated to be an efficient regularized regression method. Its regularization feature is a useful technique to prevent overfitting and to eliminate irrelevant predictors, which is especially important for the identification of robust statistical transfer functions in high-dimensional datasets. Although its current application in meteorology and climatology is still limited, it arguably has a wide potential.

The climate system is very complex, involving teleconnections and feedbacks among a large number of regional to local processes (Aguado and Burt 2015). Understanding and predicting such a chaotic system, or even its subcomponents, is challenging. Lasso is an efficient statistical data mining approach for investigating the links between variables within high-dimensional datasets. For example, Lasso can identify variables that are critical for a specific climate phenomenon and its prediction, and it helps in understanding and interpreting the variations in observational records. Chatterjee et al. (2012) designed the sparse group Lasso (SGL) model to find predictors of climate characteristics over the landmasses using ocean climate variables. Their statistical model proved efficient and outperformed the state-of-the-art approaches of that time.

Yet, there are also some limitations to the Lasso approach. The first is that regression tends to underestimate the target variable’s variability and to overestimate the probability of midrange values. For example, the time series of monthly precipitation in Fig. 5 show that regression removes most of the simulated extreme precipitation anomalies, resulting in a markedly reduced variance in the bias-corrected time series compared to the observed one. This tendency of regression has been discussed in detail in another study (Li et al. 2019). The underestimated variance in the bias-corrected time series is likely due to the fact that we have employed statistical models dedicated to the multivariate expectation. Thus, an alternative method for retaining the observed data variability is to employ a hybrid calibration model combining the Lasso approach with an extreme-value distribution (EVD) fitted to the lower and upper tails of the statistical distribution. Some studies have implemented a similar procedure for the bias adjustment of extreme precipitation (e.g., Willems et al. 2007; Golroudbary et al. 2016; Um et al. 2016). Further research is required to find a statistical model that achieves bias reduction without losing part of the variability and without straining the sample size too much through an enhanced number of model parameters. The second limitation is that Lasso estimates are based on linear regression. For nonlinear relationships, however, one may simply add predictors (x², x³, …) that are powers of the original predictor (x) to the Lasso regression: a regression on these polynomial terms can describe some nonlinear relationships. Another option is to use other machine learning or data mining approaches that describe nonlinear relationships, such as random forests (Liaw and Wiener 2002) or neural networks (Nielsen 2015).
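The polynomial workaround for nonlinear relationships can be sketched as follows. This is a toy example with an invented quadratic relationship, using scikit-learn's PolynomialFeatures and Lasso rather than the study's setup:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(7)

# Nonlinear (quadratic) relationship between predictor and predictand
x = rng.uniform(-2, 2, size=200).reshape(-1, 1)
y = 1.0 + 2.0 * x[:, 0] - 0.5 * x[:, 0] ** 2 + rng.normal(scale=0.1, size=200)

# Augment the predictor with its powers (x^2, x^3) and let Lasso pick
# the relevant polynomial terms; a linear fit in x alone cannot capture
# the curvature.
model = make_pipeline(PolynomialFeatures(degree=3, include_bias=False),
                      Lasso(alpha=0.01, max_iter=50000))
model.fit(x, y)

print(f"R^2 of polynomial Lasso fit: {model.score(x, y):.3f}")
```

Because Lasso penalizes irrelevant terms toward zero, adding powers of the predictors enlarges the predictor set without necessarily inflating the final model.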

6. Conclusions

Lasso represents a statistical regression method that can perform subset selection and regression at the same time. In the present study, it has been applied for the first time to the bias adjustment of decadal precipitation hindcasts over Europe. Lasso is trained using 87 predictors from a 10-yr training simulation. The statistical transfer functions are then transferred to another, independent decadal hindcast to assess the robustness of the statistical relationships and their potential applicability to operational decadal climate forecasts.

A major constraint of the Lasso approach is the shrinkage factor, as demonstrated in this study. Most of the predictors are selected at a small Lasso shrinkage factor. Such a regression performs well for the training simulation but poorly for the control simulation, which is a typical indication of statistical overfitting. The number of selected predictors decreases with an increasing Lasso shrinkage factor until a regionally specific break point is reached. This behavior causes a reduced performance for the training simulation but a substantial improvement for the control simulation. Thus, the resulting statistical transfer functions are more robust and pave the way for various climatological applications (e.g., the bias adjustment of operational climate or weather forecasts and of long-term climate projections).

The Lasso approach has been compared with two other statistical methods of bias adjustment: MOS and 2DTP. Lasso shows regionally better performance than MOS and 2DTP for the bias adjustment of decadal precipitation predictions across Europe. Lasso and MOS outperform 2DTP in their ability to handle large multivariate datasets. Given its performance and computing efficiency, we recommend Lasso as an alternative regression tool for bias correction, statistical downscaling, and the detection of teleconnections and physical mechanisms in Earth’s climate system.

Acknowledgments

This work was conducted in the framework of the German MiKlip project and supported by the German Federal Ministry of Education and Research (BMBF) under Grant 01 LP1129A-F. We thank the Institute of Meteorology and Climate Research at the Karlsruhe Institute of Technology, Germany, for providing the CCLM decadal predictions. We acknowledge the E-OBS dataset from the EU-FP6 project ENSEMBLES (http://ensembles-eu.metoffice.com). Support for the Twentieth Century Reanalysis Project dataset is provided by the U.S. Department of Energy, Office of Science Innovative and Novel Computational Impact on Theory and Experiment (DOE INCITE) Program, the Office of Biological and Environmental Research (BER), and the National Oceanic and Atmospheric Administration Climate Program Office.

REFERENCES

  • Aguado, E., and J. E. Burt, 2015: Understanding Weather and Climate. 7th ed. Pearson Education, 608 pp.

  • Boer, G. J., and Coauthors, 2016: The Decadal Climate Prediction Project (DCPP) contribution to CMIP6. Geosci. Model Dev., 9, 3751–3777, https://doi.org/10.5194/gmd-9-3751-2016.

  • Bratsch, S., H. Epstein, M. Buchhorn, D. Walker, and H. Landes, 2017: Relationships between hyperspectral data and components of vegetation biomass in Low Arctic tundra communities at Ivotuk, Alaska. Environ. Res. Lett., 12, 025003, https://doi.org/10.1088/1748-9326/aa572e.

  • Chatterjee, S., K. Steinhaeuser, A. Banerjee, S. Chatterjee, and A. Ganguly, 2012: Sparse group Lasso: Consistency and climate applications. Proc. 2012 SIAM Int. Conf. on Data Mining, Anaheim, CA, SIAM, 47–58, https://doi.org/10.1137/1.9781611972825.5.

  • DelSole, T., and A. Banerjee, 2017: Statistical seasonal prediction based on regularized regression. J. Climate, 30, 1345–1361, https://doi.org/10.1175/JCLI-D-16-0249.1.

  • Eden, J. M., M. Widmann, D. Grawe, and S. Rast, 2012: Skill, correction, and downscaling of GCM-simulated precipitation. J. Climate, 25, 3970–3984, https://doi.org/10.1175/JCLI-D-11-00254.1.

  • Friedman, J. H., T. Hastie, and R. Tibshirani, 2010: Regularization paths for generalized linear models via coordinate descent. J. Stat. Software, 33, 1–22, https://doi.org/10.18637/jss.v033.i01.

  • Gao, L., K. Schulz, and M. Bernhardt, 2014: Statistical downscaling of ERA-Interim forecast precipitation data in complex terrain using Lasso algorithm. Adv. Meteor., 2014, 1–16, https://doi.org/10.1155/2014/472741.

  • Göbl, C. S., L. Bozkurt, A. Tura, G. Pacini, A. Kautzky-Willer, and M. Mittlboeck, 2015: Application of penalized regression techniques in modelling insulin sensitivity by correlated metabolic parameters. PLOS ONE, 10, e0141524, https://doi.org/10.1371/journal.pone.0141524.

  • Goeman, J., 2010: L1 penalized estimation in the Cox proportional hazards model. Biom. J., 52, 70–84, https://doi.org/10.1002/bimj.200900028.

  • Golroudbary, V. R., Y. Zeng, C. M. Mannaerts, and Z. Su, 2016: Attributing seasonal variation of daily extreme precipitation events across the Netherlands. Wea. Climate Extremes, 14, 56–66, https://doi.org/10.1016/j.wace.2016.11.003.

  • Gorban, A. N., E. M. Mirkes, and A. Zinovyev, 2016: Piece-wise quadratic approximations of arbitrary error functions for fast and robust machine learning. Neural Networks, 84, 28–38, https://doi.org/10.1016/j.neunet.2016.08.007.

  • Hawkins, E., R. S. Smith, J. M. Gregory, and D. A. Stainforth, 2016: Irreducible uncertainty in near-term climate projections. Climate Dyn., 46, 3807–3819, https://doi.org/10.1007/s00382-015-2806-8.

  • Haylock, M. R., N. Hofstra, A. M. G. Klein Tank, E. J. Klok, P. D. Jones, and M. New, 2008: A European daily high-resolution gridded dataset of surface temperature and precipitation. J. Geophys. Res., 113, D20119, https://doi.org/10.1029/2008JD010201.

  • Illing, S., C. Kadow, O. Kunst, and U. Cubasch, 2014: MurCSS: A tool for standardized evaluation of decadal hindcast systems. J. Open Res. Software, 2, e24, https://doi.org/10.5334/jors.bf.

  • Keerthi, S. S., and S. Shevade, 2007: A fast tracking algorithm for generalized LARS/LASSO. IEEE Trans. Neural Networks, 18, 1826–1830, https://doi.org/10.1109/tnn.2007.900229.

  • Larkin, A., J. Geddes, R. V. Martin, Q. Xiao, Y. Liu, J. D. Marshall, M. Brauer, and P. Hystad, 2017: Global land use regression model for nitrogen dioxide air pollution. Environ. Sci. Technol., 51, 6957–6964, https://doi.org/10.1021/acs.est.7b01148.

  • Li, J., F. Pollinger, H.-J. Panitz, H. Feldmann, and H. Paeth, 2019: Bias adjustment for decadal predictions of precipitation in Europe from CCLM. Climate Dyn., 53, 1323–1340, https://doi.org/10.1007/s00382-019-04646-y.

  • Liang, X., and Coauthors, 2017: Determining climate effects on US total agricultural productivity. Proc. Natl. Acad. Sci. USA, 114, E2285–E2292, https://doi.org/10.1073/pnas.1615922114.

  • Liaw, A., and M. Wiener, 2002: Classification and regression by randomForest. R News, 2 (3), 18–22.

  • Liu, X., D. Wu, G. K. Zewdie, L. Wijeratne, C. I. Timms, A. Riley, E. Levetin, and D. J. Lary, 2017: Using machine learning to estimate atmospheric ambrosia pollen concentrations in Tulsa, OK. Environ. Health Insights, 11, 1178630217699399, https://doi.org/10.1177/1178630217699399.

  • Maraun, D., 2012: Nonstationarities of regional climate model biases in European seasonal mean temperature and precipitation sums. Geophys. Res. Lett., 39, L19705, https://doi.org/10.1029/2012GL051210.

  • Marotzke, J., and Coauthors, 2016: MiKlip: A National Research Project on decadal climate prediction. Bull. Amer. Meteor. Soc., 97, 2379–2394, https://doi.org/10.1175/BAMS-D-15-00184.1.

  • Meehl, G. A., and Coauthors, 2014: Decadal climate prediction: An update from the trenches. Bull. Amer. Meteor. Soc., 95, 243–267, https://doi.org/10.1175/BAMS-D-12-00241.1.

  • Müller, W. A., and Coauthors, 2018: A higher-resolution version of the Max Planck Institute Earth System Model (MPI-ESM1.2-HR). J. Adv. Model Earth Syst., 10, 1383–1413, https://doi.org/10.1029/2017MS001217.

  • Nielsen, M. A., 2015: Neural Networks and Deep Learning. Determination Press, http://neuralnetworksanddeeplearning.com/.

  • Nishar, A., M. K.-F. Bader, E. J. O’Gorman, J. Deng, B. Breen, and S. Leuzinger, 2017: Temperature effects on biomass and regeneration of vegetation in a geothermal area. Front. Plant Sci., 8, 249, https://doi.org/10.3389/fpls.2017.00249.

  • Olkin, I., 1967: Correlations revisited. Improving Experimental Design and Statistical Analysis, J. C. Stanley, Ed., Rand McNally, 102–108.

  • Paeth, H., 2011: Postprocessing of simulated precipitation for impact research in West Africa. Part I: Model output statistics for monthly data. Climate Dyn., 36, 1321–1336, https://doi.org/10.1007/s00382-010-0760-z.

  • Paeth, H., J. Li, F. Pollinger, W. A. Müller, H. Pohlmann, H. Feldmann, and H.-J. Panitz, 2019: An effective drift correction for dynamical downscaling of decadal global climate predictions. Climate Dyn., 52, 1343–1357, https://doi.org/10.1007/s00382-018-4195-2.

  • Piani, C., and J. O. Haerter, 2012: Two dimensional bias correction of temperature and precipitation copulas in climate models. Geophys. Res. Lett., 39, L20401, https://doi.org/10.1029/2012GL053839.

  • Rachmawati, R. N., N. H. Pusponegoro, A. Muslim, K. A. Notodiputro, and B. Sartono, 2017: Group Lasso for rainfall data modeling in Indramayu district, West Java, Indonesia. Procedia Comput. Sci., 116, 190–197, https://doi.org/10.1016/j.procs.2017.10.030.

  • Rockel, B., A. Will, and A. Hense, 2008: The regional climate model COSMO-CLM (CCLM). Meteor. Z., 17, 347–348, https://doi.org/10.1127/0941-2948/2008/0309.

  • Seya, H., D. Murakami, M. Tsutsumi, and Y. Yamagata, 2015: Application of LASSO to the eigenvector selection problem in eigenvector-based spatial filtering. Geogr. Anal., 47, 284–299, https://doi.org/10.1111/gean.12054.

  • Tetko, I. V., D. J. Livingstone, and A. I. Luik, 1995: Neural network studies. 1. Comparison of overfitting and overtraining. J. Chem. Info. Model., 35, 826–833, https://doi.org/10.1021/ci00027a006.

  • Tibshirani, R., 1996: Regression shrinkage and selection via the Lasso. J. Roy. Stat. Soc., 58B, 267–288, https://doi.org/10.1111/j.2517-6161.1996.tb02080.x.

  • Uemura, M., K. S. Kawabata, S. Ikeda, and K. Maeda, 2015: Variable selection for modeling the absolute magnitude at maximum of type Ia supernovae. Publ. Astron. Soc. Japan, 67, 55, https://doi.org/10.1093/pasj/psv031.

  • Um, M.-J., H. Kim, and J.-H. Heo, 2016: Hybrid approach in statistical bias correction of projected precipitation for the frequency analysis of extreme events. Adv. Water Resour., 94, 278–290, https://doi.org/10.1016/j.advwatres.2016.05.021.

  • Upadhyaya, S., and R. Ramsankaran, 2016: Modified-INSAT multi-spectral rainfall algorithm (M-IMSRA) at climate regional scale: Development and validation. Remote Sens. Environ., 187, 186–201, https://doi.org/10.1016/j.rse.2016.10.013.

  • Willems, P., A. Guillou, and J. Beirlant, 2007: Bias correction in hydrologic GPD based extreme value analysis by means of a slowly varying function. J. Hydrol., 338, 221–236, https://doi.org/10.1016/j.jhydrol.2007.02.035.

  • Wilson, C. H., T. T. Caughlin, S. W. Rifai, E. H. Boughton, M. C. Mack, and S. L. Flory, 2017: Multi-decadal time series of remotely sensed vegetation improves prediction of soil carbon in a subtropical grassland. Ecol. Appl., 27, 1646–1656, https://doi.org/10.1002/eap.1557.

  • Woodard, J. D., D. R. Wang, A. McClung, L. Ziska, T. Dutta, and S. McCouch, 2016: Integrating variety data into large-scale crop yield models. Proc. 2016 Agricultural & Applied Economics Association Annual Meeting, Boston, MA, Agricultural & Applied Economics Association, https://doi.org/10.22004/ag.econ.236170.

  • Zaikarina, H., A. Djuraidah, and A. H. Wigena, 2016: Lasso and ridge quantile regression using cross validation to estimate extreme rainfall. Global J. Pure Appl. Math., 12, 3305–3314.

  • Zhang, J., J. M. Cavallari, S. C. Fang, M. G. Weisskopf, X. Lin, M. A. Mittleman, and D. C. Christiani, 2017: Application of linear mixed-effects model with Lasso to identify metal components associated with cardiac autonomic responses among welders: A repeated measures study. Occup. Environ. Med., 74, 810–815, https://doi.org/10.1136/oemed-2016-104067.
Save
  • Aguado, E., and J. E. Burt, 2015: Understanding Weather and Climate. 7th ed. Pearson Education, 608 pp.

  • Boer, G. J., and Coauthors, 2016: The Decadal Climate Prediction Project (DCPP) contribution to CMIP6. Geosci. Model Dev., 9, 37513777, https://doi.org/10.5194/gmd-9-3751-2016.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bratsch, S., H. Epstein, M. Buchhorn, D. Walker, and H. Landes, 2017: Relationships between hyperspectral data and components of vegetation biomass in Low Arctic tundra communities at Ivotuk, Alaska. Environ. Res. Lett., 12, 025003, https://doi.org/10.1088/1748-9326/aa572e.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Chatterjee, S., K. Steinhaeuser, A. Banerjee, S. Chatterjee, and A. Ganguly, 2012: Sparse group Lasso: Consistency and climate applictions. Proc. 2012 SIAM Int. Conf. on Data Mining, Anaheim, CA, SIAM, 4758, https://doi.org/10.1137/1.9781611972825.5.

    • Crossref
    • Export Citation
  • DelSole, T., and A. Banerjee, 2017: Statistical seasonal prediction based on regularized regression. J. Climate, 30, 13451361, https://doi.org/10.1175/JCLI-D-16-0249.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Eden, J. M., M. Widmann, D. Grawe, and S. Rast, 2012: Skill, correction, and downscaling of GCM-simulated precipitation. J. Climate, 25, 39703984, https://doi.org/10.1175/JCLI-D-11-00254.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Friedman, J. H., T. Hastie, and R. Tibshirani, 2010: Regularization paths for generaliyed linear models via coordinate descent. J. Stat. Software, 33, 122, https://doi.org/10.18637/jss.v033.i01.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Gao, L., K. Schulz, and M. Bernhardt, 2014: Statistical downscaling of ERA-Interim forecast precipitation data in complex terrain using Lasso algorithm. Adv. Meteor., 2014, 116, https://doi.org/10.1155/2014/472741.

    • Search Google Scholar
    • Export Citation
  • Göbl, C. S., L. Boykurt, A. Tura, G. Pacini, A. Kautzkz-Willer, and M. Mittlboeck, 2015: Application of penalized regression techniques in modelling insulin sensitivity by correlated metabolic parameters. PLOS ONE, 10, e0141524, https://doi.org/10.1371/journal.pone.0141524.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Goeman, J., 2010: L1 penalized estimation in the Cox proportional hazards model. Biom. J., 52, 7084, https://doi.org/10.1002/bimj.200900028.

    • Search Google Scholar
    • Export Citation
  • Golroudbary, V. R., Y. Zeng, C. M. Mannaerts, and Z. Su, 2016: Attributing seasonal variation of daily extreme precipitation events across the Netherlands. Wea. Climate Extremes, 14, 5666, https://doi.org/10.1016/j.wace.2016.11.003.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Gorban, A. N., E. M. Mirkes, and A. Zinovyev, 2016: Piece-wise quadratic approximations of arbitrary error functions for fast and robust machine learning. Neural Networks, 84, 2838, https://doi.org/10.1016/j.neunet.2016.08.007.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hawkins, E., R. S. Smith, J. M. Gregory, and D. A. Stainforth, 2016: Irreducible uncertainty in near-term climate projections. Climate Dyn., 46, 38073819, https://doi.org/10.1007/s00382-015-2806-8.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Haylock, M. R., N. Hofstra, A. M. G. Klein Tank, E. J. Klok, P. D. Jones, and M. New, 2008: A European daily high-resolution gridded dataset of surface temperature and precipitation. J. Geophys. Res., 113, D20119, https://doi.org/10.1029/2008JD010201.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Illing, S., C. Kadow, O. Kunst, and U. Cubasch, 2014: MurCSS: A tool for standardized evaluation of decadal hindcast systems. J. Open Res. Software, 2, e24, https://doi.org/10.5334/jors.bf.

    • Search Google Scholar
    • Export Citation
  • Keerthi, S. S., and S. Shevade, 2007: A fast tracking algorithm for gereraliyed LARS/LASSO. IEEE Trans. Neural Networks, 18, 18261830, https://doi.org/10.1109/tnn.2007.900229.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Larkin, A., J. Geddes, R. V. Martin, Q. Xiao, Y. Liu, J. D. Marshall, M. Brauer, and P. Hystad, 2017: Global land use regression model for nitrogen dioxide air pollution. Environ. Sci. Technol., 51, 69576964, https://doi.org/10.1021/acs.est.7b01148.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Li, J., F. Pollinger, H.-J. Panitz, H. Feldmann, and H. Paeth, 2019: Bias adjustment for decadal predictions of precipitation in Europe from CCLM. Climate Dyn., 53, 13231340, https://doi.org/10.1007/s00382-019-04646-y.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Liang, X., and Coauthors, 2017: Determining climate effects on US total agricultural productivity. Proc. Natl. Acad. Sci. USA, 114, E2285E2292, https://doi.org/10.1073/pnas.1615922114.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Liaw, A., and M. Wiener, 2002: Classification and regression by random forest. Roy. News, 2/3, 1822.

  • Liu, X., D. Wu, G. K. Zewdie, L. Wijerante, C. I. Timms, A. Riley, E. Levetin, and D. J. Lary, 2017: Using machine learning to estimate atmospheric ambrosia pollen contrations in Tulsa, OK. Environ. Health Insights, 11, 1178630217699399, https://doi.org/10.1177/1178630217699399.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Maraun, D., 2012: Nonstationarities of regional climate model biases in European seasonal mean temperature and precipitation sums. Geophys. Res. Lett., 39, L19705, https://doi.org/10.1029/2012GL051210.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Marotzke, J., and Coauthors, 2016: MiKlip: A National Research Project on decadal climate prediction. Bull. Amer. Meteor. Soc., 97, 23792394, https://doi.org/10.1175/BAMS-D-15-00184.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Meehl, G. A., and Coauthors, 2014: Decadal climate prediction: An update from the trenches. Bull. Amer. Meteor. Soc., 95, 243267, https://doi.org/10.1175/BAMS-D-12-00241.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Müller, W. A., and Coauthors, 2018: A higher-resolution version of the Max Planck Institute Earth System Model (MPI-ESM1.2-HR). J. Adv. Model Earth Syst., 10, 13831413, https://doi.org/10.1029/2017MS001217.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Nielsen, M. A., 2015: Neural Networks and Deep Learning. Determination Press, http://neuralnetworksanddeeplearning.com/.

  • Nishar, A., M. K.-F. Bader, E. J. O’Gorman, J. Deng, B. Breen, and S. Leuzinger, 2017: Temperature effects on biomass and regeneration of vegetation in a geothermal area. Front. Plant Sci., 8, 249, https://doi.org/10.3389/fpls.2017.00249.

  • Olkin, I., 1967: Correlations revisited. Improving Experimental Design and Statistical Analysis, J. C. Stanley, Ed., Rand McNally, 102–108.

  • Paeth, H., 2011: Postprocessing of simulated precipitation for impact research in West Africa. Part I: Model output statistics for monthly data. Climate Dyn., 36, 1321–1336, https://doi.org/10.1007/s00382-010-0760-z.

  • Paeth, H., J. Li, F. Pollinger, W. A. Müller, H. Pohlmann, H. Feldmann, and H.-J. Panitz, 2019: An effective drift correction for dynamical downscaling of decadal global climate predictions. Climate Dyn., 52, 1343–1357, https://doi.org/10.1007/s00382-018-4195-2.

  • Piani, C., and J. O. Haerter, 2012: Two dimensional bias correction of temperature and precipitation copulas in climate models. Geophys. Res. Lett., 39, L20401, https://doi.org/10.1029/2012GL053839.

  • Rachmawati, R. N., N. H. Pusponegoro, A. Muslim, K. A. Notodiputro, and B. Sartono, 2017: Group Lasso for rainfall data modeling in Indramayu district, West Java, Indonesia. Procedia Comput. Sci., 116, 190–197, https://doi.org/10.1016/j.procs.2017.10.030.

  • Rockel, B., A. Will, and A. Hense, 2008: The regional climate model COSMO-CLM (CCLM). Meteor. Z., 17, 347–348, https://doi.org/10.1127/0941-2948/2008/0309.

  • Seya, H., D. Murakami, M. Tsutsumi, and Y. Yamagata, 2015: Application of LASSO to the eigenvector selection problem in eigenvector-based spatial filtering. Geogr. Anal., 47, 284–299, https://doi.org/10.1111/gean.12054.

  • Tetko, I. V., D. J. Livingstone, and A. I. Luik, 1995: Neural network studies. 1. Comparison of overfitting and overtraining. J. Chem. Info. Model., 35, 826–833, https://doi.org/10.1021/ci00027a006.

  • Tibshirani, R., 1996: Regression shrinkage and selection via the Lasso. J. Roy. Stat. Soc., 58B, 267–288, https://doi.org/10.1111/J.2517-6161.1996.TB02080.X.

  • Uemura, M., K. S. Kawabata, S. Ikeda, and K. Maeda, 2015: Variable selection for modeling the absolute magnitude at maximum of type Ia supernovae. Publ. Astron. Soc. Japan, 67, 55, https://doi.org/10.1093/pasj/psv031.

  • Um, M.-J., H. Kim, and J.-H. Heo, 2016: Hybrid approach in statistical bias correction of projected precipitation for the frequency analysis of extreme events. Adv. Water Resour., 94, 278–290, https://doi.org/10.1016/j.advwatres.2016.05.021.

  • Upadhyaya, S., and R. Ramsankaran, 2016: Modified-INSAT multi-spectral rainfall algorithm (M-IMSRA) at climate regional scale: Development and validation. Remote Sens. Environ., 187, 186–201, https://doi.org/10.1016/j.rse.2016.10.013.

  • Willems, P., A. Guillou, and J. Beirlant, 2007: Bias correction in hydrologic GPD based extreme value analysis by means of a slowly varying function. J. Hydrol., 338, 221–236, https://doi.org/10.1016/j.jhydrol.2007.02.035.

  • Wilson, C. H., T. T. Caughlin, S. W. Rifai, E. H. Boughton, M. C. Mack, and S. L. Flory, 2017: Multi-decadal time series of remotely sensed vegetation improves prediction of soil carbon in a subtropical grassland. Ecol. Appl., 27, 1646–1656, https://doi.org/10.1002/eap.1557.

  • Woodard, J. D., D. R. Wang, A. McClung, L. Ziska, T. Dutta, and S. McCouch, 2016: Integrating variety data into large-scale crop yield models. Proc. 2016 Agricultural & Applied Economics Association Annual Meeting, Boston, MA, Agricultural & Applied Economics Association, https://doi.org/10.22004/ag.econ.236170.

  • Zaikarina, H., A. Djuraidah, and A. H. Wigena, 2016: Lasso and ridge quantile regression using cross validation to estimate extreme rainfall. Global J. Pure Appl. Math., 12, 3305–3314.

  • Zhang, J., J. M. Cavallari, S. C. Fang, M. G. Weisskopf, X. Lin, M. A. Mittleman, and D. C. Christiani, 2017: Application of linear mixed-effects model with Lasso to identify metal components associated with cardiac autonomic responses among welders: A repeated measures study. Occup. Environ. Med., 74, 810–815, https://doi.org/10.1136/oemed-2016-104067.

  • Fig. 1.

    Long-term mean annual precipitation (mm yr−1) during the 2001–10 period from (a) E-OBS and (b) a CCLM decadal prediction, and (c) the systematic differences between them. Red stars and labels indicate the example locations for which results of the Lasso approach are shown in subsequent figures.

  • Fig. 2.

    Influence of the Lasso shrinkage factor on predictor selection at four example locations, with the y axis listing all 87 predictors. Orange lines separate the six groups of predictors: 9 local predictors and 20 EOF predictors each for precipitation (pr), temperature (tas), and sea level pressure (psl). The 18 Lasso shrinkage factors, ranging from 0.01 to 3.0, are displayed along the x axis, together with the classical MOS approach trained on the same data (bottom row, marked by a blue rectangle). Selected predictors are marked with red bars. A second blue rectangle indicates the selected optimum Lasso shrinkage factor, as discussed in section 4.
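The role of the shrinkage factor in predictor selection can be illustrated with a short, self-contained sketch. This is not the authors' code; it uses scikit-learn's `Lasso` on synthetic stand-in data with the study's predictor count (87), and the variable names are illustrative only.

```python
# Illustrative sketch only: how the Lasso shrinkage factor (alpha) controls
# predictor selection. X and y are synthetic stand-ins, not the study's data.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_months, n_predictors = 120, 87        # 10 yr of monthly data, 87 predictors
X = rng.standard_normal((n_months, n_predictors))
# Only predictors 0 and 5 truly influence the predictand here.
y = 2.0 * X[:, 0] + 1.0 * X[:, 5] + 0.5 * rng.standard_normal(n_months)

# A larger shrinkage factor forces more coefficients to exactly zero,
# i.e., fewer predictors are retained in the regression model.
for alpha in (0.01, 0.1, 0.5, 1.0, 3.0):
    model = Lasso(alpha=alpha).fit(X, y)
    n_selected = int(np.count_nonzero(model.coef_))
    print(f"alpha={alpha}: {n_selected} predictors selected")
```

With a very small shrinkage factor, many spurious predictors survive; with a very large one, the regression degenerates toward the intercept. This is why an intermediate optimum factor is sought in the study.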

  • Fig. 3.

    The influence of Lasso shrinkage factors (x axis) on the explained variance (R2) from Lasso regression (black dots), compared with the original model prediction (Raw, blue line), the MOS regression (dark green line), and 2DTP regression (orange line). R2 is calculated between simulated and observed monthly precipitation during 1986–95.

  • Fig. 4.

    As in Fig. 3, but for the root-mean-square error (RMSE).

  • Fig. 5.

    Time series of precipitation (mm month−1) during 2001–10 at Graz (Austria) from (a) the decadal hindcast prediction (blue) and E-OBS (red) and (b) the bias-adjusted prediction using MOS (green), 2DTP (orange), and Lasso (black). The blue shading marks the range of time series obtained with Lasso shrinkage factors from 0.01 to 3.0; the largest variance corresponds to the smallest shrinkage factor (0.01). The black line corresponds to the optimal shrinkage factor.

  • Fig. 6.

    As in Fig. 3, but for the application of the bias adjustment to the independent 2001–10 period.

  • Fig. 7.

    As in Fig. 6, but for RMSE.

  • Fig. 8.

    Spatial distribution of (left) R2 and (right) RMSE/1STD estimated between the decadal precipitation hindcast and observations, (from top to bottom) for (a),(b) the non-bias-adjusted model, (c),(d) the Lasso-adjusted prediction using the optimum shrinkage factor, (e),(f) the MOS-adjusted prediction, and (g),(h) the 2DTP-adjusted prediction. RMSE/1STD denotes the RMSE scaled by one standard deviation of the local observational data. Black points in the left column show locations where the bias adjustment achieves a significant improvement in R2 over the unadjusted prediction at the 0.05 level. Because of the high spatial resolution of the climate simulation, only every second significant grid point is marked to enhance readability.
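The scaled error measure RMSE/1STD can be written down in a few lines. This is a minimal illustration, not the authors' code; `pred` and `obs` are hypothetical sample series.

```python
# Sketch of the scaled error measure RMSE/1STD: the RMSE between prediction
# and observations, divided by one standard deviation of the observations.
# Values below 1 indicate an error smaller than the observed variability.
import numpy as np

def scaled_rmse(pred, obs):
    """RMSE of pred vs. obs, scaled by the standard deviation of obs."""
    pred, obs = np.asarray(pred, dtype=float), np.asarray(obs, dtype=float)
    rmse = np.sqrt(np.mean((pred - obs) ** 2))
    return rmse / np.std(obs)

obs = [50.0, 80.0, 120.0, 60.0, 90.0]    # hypothetical monthly precipitation (mm)
pred = [55.0, 70.0, 110.0, 65.0, 100.0]
print(f"RMSE/1STD = {scaled_rmse(pred, obs):.3f}")
```

In a spatial application, this quantity would be computed independently at every grid box from the local monthly time series, yielding one map per bias-adjustment method.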

  • Fig. 9.

    Differences in R2 among the statistical methods of bias adjustment for monthly precipitation during the 2001–10 period. Green points show locations where the differences between methods are significant at the 0.05 level. As in Fig. 8, only every second significant grid point is marked.
