Climate impact studies constitute the basis for the formulation of adaptation strategies. Usually such assessments apply statistically postprocessed output of climate model projections to force impact models. Increasingly, time series with daily resolution are used, which require high consistency, for instance with respect to transition probabilities (TPs) between wet and dry days and spell durations. However, both climate models and commonly applied statistical tools have considerable uncertainties and drawbacks. This paper compares the ability of 1) raw regional climate model (RCM) output, 2) bias-corrected RCM output, and 3) a conventional weather generator (WG) that has been calibrated to match observed TPs to simulate the sequence of dry, wet, and very wet days at a set of long-term weather stations across Switzerland. The study finds systematic biases in TPs and spell lengths for raw RCM output, but a substantial improvement after bias correction using the deterministic quantile mapping technique. For the region considered, bias-corrected climate model output agrees well with observations in terms of TPs as well as dry and wet spell durations. For the majority of cases (models and stations) bias-corrected climate model output is similar in skill to a simple Markov chain stochastic weather generator. There is strong evidence that bias-corrected climate model simulations capture the atmospheric event sequence more realistically than a simple WG.
Several regional and global climate model (RCM and GCM) ensembles have been made available over the past decades via public archives [e.g., PRUDENCE (Christensen and Christensen 2007), phase 3 of CMIP (CMIP3), NARCCAP (Mearns et al. 2009), ENSEMBLES (van der Linden and Mitchell 2009), phase 5 of CMIP (CMIP5; Taylor et al. 2012), and the Coordinated Regional Downscaling Experiment (CORDEX; Jacob et al. 2014; Kotlarski et al. 2014)]. They form the basis of state-of-the-art assessments of projected climatic conditions at global [e.g., IPCC Fourth (IPCC 2007) and Fifth Assessment Reports (IPCC 2013)] and regional scales [e.g., United Kingdom Climate Impacts Programme (UKCIP; Jenkins et al. 2008) and Swiss Climate Change Scenarios (CH2011 2011; KNMI 2014)]. Furthermore, climate model ensembles are frequently used by end users in climate impact studies that usually apply statistically postprocessed (downscaled and/or bias corrected) model output to force impact models in order to assess the consequences of climatic changes (e.g., CH2014-Impacts 2014). The majority of these techniques can be summarized as empirical–statistical downscaling (ESD) approaches (Fowler et al. 2007; Maraun et al. 2010). An ESD approach aims to remove systematic climate model biases and to downscale model output from the resolved grid scale to local conditions.
The suitability of individual ESD methods depends on the end-user application and hence requires an appropriate communication of the strengths and weaknesses related to the targeted application. Several recent studies compared the performance of different ESD approaches in cross-validation frameworks to match observed conditions and concluded that quantile mapping (QM) outperforms a number of other ESD techniques (Themeßl et al. 2011; Gudmundsson et al. 2012; Teutschbein and Seibert 2012; Räty et al. 2014; Sachindra et al. 2014). Focusing on QM, further studies addressed the representation of multiday characteristics (Addor and Seibert 2014; Wilcke et al. 2013). Uncertainties exist regarding the stationarity of the underlying transfer functions and biases (Christensen et al. 2008; Boberg and Christensen 2012; Buser et al. 2009; Maraun 2012; Bellprat et al. 2013; Kerkhoff et al. 2014), and the spatial and temporal coherence structure between different point-scale estimates (von Storch 1999; Maraun 2013). By design, QM corrects for biases in the distributional behavior of a parameter (e.g., daily precipitation amounts) but does not explicitly correct for errors in the temporal sequence (e.g., transition probabilities and spell lengths).
Besides ESD approaches, weather generators (WGs) are frequently applied to produce time series for impacts research (e.g., Richardson 1981). The main principle of WGs is to stochastically simulate precipitation series based on Markov chain (MC) model simulations. They are driven by transition probabilities (TPs) between dry and wet states, while precipitation intensity and other variables are thereafter conditioned on the precipitation occurrence (e.g., Richardson 1981; Ines et al. 2011; Keller et al. 2014). Even though WGs are constrained by observed TPs, they may underestimate the frequency of persistent weather situations (i.e., dry and wet spells) as they only account for a limited number of previous states (e.g., Semenov et al. 1998). In contrast, an adequate physical model (i.e., a GCM–RCM chain) may capture extended dry and wet spells more precisely, provided the respective memory effect is appropriately captured.
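The limited memory of a first-order chain can be made explicit with a short derivation. As a simplifying assumption that the study itself does not make, take the dry-to-dry transition probability P_DD to be constant in time. The length L of a dry spell then follows a geometric distribution:

```latex
\Pr\{L = k\} = P_{DD}^{\,k-1}\,(1 - P_{DD}), \qquad
\Pr\{L \ge k\} = P_{DD}^{\,k-1}, \qquad
\mathrm{E}[L] = \frac{1}{1 - P_{DD}}, \qquad k = 1, 2, \dots
```

For an illustrative value P_DD = 0.8 (hypothetical, not taken from the study), the mean dry spell lasts 5 days and Pr{L ≥ 15} = 0.8^14 ≈ 0.044. The tail decays exponentially regardless of the preceding weather history, which is one way to see why such a model may produce too few long spells if the real atmosphere has a longer memory.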
Given this background, the present study validates and compares the performance of an ensemble of raw and bias-corrected (QM) climate model simulations and of an ensemble of WG realizations in reproducing observed transition probabilities and spell-length statistics at a set of stations across Switzerland.
Daily precipitation observations in the period 1961–2000 at 61 operational MeteoSwiss weather stations across Switzerland are used for bias correction, evaluation, and estimation of WG parameters. Figure 1 visualizes the locations of the stations with the actual topography (Fig. 1a), a typical RCM topography (Fig. 1b), and a GCM topography (Fig. 1c). Note that many topographical features remain unresolved in the climate models. Figure 1 additionally provides the observed mean precipitation amounts (Fig. 1d) and wet day (>1 mm day−1) frequencies (Fig. 1e) in gridded and station-based observations for the period 1961–2000.
To make the presentation of the results more concise, the focus is put on a subset of 21 stations (Figs. 1b,c). To ensure that the subset of stations samples the whole range of precipitation variability, the whole set of 61 stations was sorted according to the observed wet day frequencies and every third station was selected.
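The subsampling rule can be sketched in a few lines. This is a minimal illustration, assuming the selection starts with the driest station (the text does not state the starting point) and using synthetic wet-day frequencies rather than MeteoSwiss data; the function name `select_every_third` is hypothetical.

```python
import numpy as np

def select_every_third(wet_day_freq):
    """Sort stations by wet-day frequency and keep every third one.

    Returns station indices ordered from driest to wettest.
    Starting with the driest station is an assumption of this sketch.
    """
    order = np.argsort(wet_day_freq, kind="stable")  # driest -> wettest
    return order[::3]

# Synthetic wet-day frequencies for 61 stations (illustrative values only).
rng = np.random.default_rng(0)
freq = rng.uniform(0.2, 0.5, size=61)
subset = select_every_third(freq)
print(len(subset))
```

With 61 stations, taking positions 0, 3, ..., 60 of the sorted list yields 21 stations that span the full range of wet-day frequencies.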
b. Regional climate models
A set of GCM-driven RCMs from the European Union (EU) ENSEMBLES project (van der Linden and Mitchell 2009) for the historical period 1961–2000 is considered (Table 1). The ensemble consists of 14 experiments, combining five GCMs with nine RCMs. All GCM–RCM chains provide data at a horizontal resolution of about 25 km and are forced by historical greenhouse gas concentrations. Basic results are presented for all GCM–RCM chains, but in some of the analyses emphasis is put on the individual realization run by ETH Zurich [employing the RCM Consortium for Small-Scale Modeling Model in Climate Mode (COSMO-CLM; hereafter CLM)] and its driving GCM (HadCM3Q0 at a horizontal resolution of 3.75° × 2.5°; ensemble member Q0 from the HadCM3-based perturbed physics ensemble). In addition, a CLM simulation driven by reanalysis data (ERA-40; Uppala et al. 2005) is used.
The ability of raw and bias-corrected climate model output to match observed TPs in a historical period is evaluated. For this purpose, each day of a given precipitation series (observed or modeled) is categorized into one of three states: dry, wet, or very wet. Wet and very wet days are distinguished in order to describe the character of precipitation more accurately. This is required in particular for hydrological applications such as the analysis of high-impact flooding events that affect large-scale catchments and often originate from consecutive days with intense precipitation (e.g., the Alpine flood in 2005; MeteoSchweiz 2006).
Climate models are bias-corrected using QM and the results are systematically compared against a large ensemble of WG simulations.
a. Empirical–statistical bias correction: Quantile mapping
A nonparametric empirical implementation of QM is applied (Gudmundsson et al. 2012). It considers a time-dependent correction function, calibrated for each day of the year (DOY) based on a 91-day window centered over the respective day (Themeßl et al. 2011, 2012).
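The 91-day calibration window can be illustrated with a small helper. This is a sketch under stated assumptions: the window wraps around the turn of the year, a fixed 365-day year is used, and the name `doy_window_mask` is hypothetical; the text does not specify how leap days are handled.

```python
import numpy as np

def doy_window_mask(doy, center, half_width=45, year_length=365):
    """Boolean mask of days lying within a window centered on `center`.

    The window spans 2 * half_width + 1 days (91 by default) and wraps
    around the end of the year. `doy` holds 1-based day-of-year values.
    """
    doy = np.asarray(doy)
    diff = np.abs(doy - center)
    circular = np.minimum(diff, year_length - diff)  # distance on a circle
    return circular <= half_width

# Days of one non-leap year; window centered on 1 January.
days = np.arange(1, 366)
mask = doy_window_mask(days, center=1)
print(mask.sum())
```

Applied to a multi-year series, the mask selects all days from every year that fall inside the window, and the correction function for the central DOY would be calibrated on that pooled sample.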
The principle of QM is to correct the daily precipitation amount from a climate simulation (mod), say X at time t for a grid box located over a target station, so that the distribution of the corrected values Y(X) matches the observed (obs) cumulative distribution function F, according to

$$Y(t) = F_{\mathrm{obs}}^{-1}\{F_{\mathrm{mod}}[X(t)]\}. \qquad (1)$$
In a climate scenario framework, transfer functions as defined by Y(X) in a calibration period [Eq. (1)] are assumed to be stationary and thus also valid under future conditions. Values that lie outside the range of calibrated values are typically considered by an extrapolation of the correction function [see also Themeßl et al. (2012)]. In this study, the correction of the 99th percentile is used if the calibrated range of values is exceeded.
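A minimal sketch of empirical quantile mapping may help to make the procedure concrete. It is not the implementation used in the study: it ignores the DOY-dependent windowing, interpolates linearly between the 1st and 99th percentiles, and uses synthetic gamma-distributed data. The names `fit_qm` and `apply_qm` are hypothetical. Values above the calibrated range receive the correction of the 99th percentile, as described above.

```python
import numpy as np

PROBS = np.linspace(0.01, 0.99, 99)  # 1st to 99th percentile

def fit_qm(obs_cal, mod_cal, probs=PROBS):
    """Empirical quantiles of the modeled and observed calibration samples."""
    return np.quantile(mod_cal, probs), np.quantile(obs_cal, probs)

def apply_qm(x, mod_q, obs_q):
    """Map modeled values onto the observed distribution."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # Inside the calibrated range: piecewise-linear F_obs^{-1}(F_mod(x)).
    # Below the lowest quantile np.interp returns obs_q[0].
    y = np.interp(x, mod_q, obs_q)
    # Above the calibrated range: reuse the 99th-percentile correction.
    high = x > mod_q[-1]
    y[high] = x[high] + (obs_q[-1] - mod_q[-1])
    return np.maximum(y, 0.0)  # precipitation cannot be negative

# Synthetic example (assumed distributions): the "model" drizzles too often.
rng = np.random.default_rng(42)
n = 5000
obs = np.where(rng.random(n) < 0.3, rng.gamma(0.8, 8.0, n), 0.0)
mod = rng.gamma(0.6, 5.0, n)

mod_q, obs_q = fit_qm(obs, mod)
corrected = apply_qm(mod, mod_q, obs_q)
print(np.mean(mod >= 1.0), np.mean(corrected >= 1.0), np.mean(obs >= 1.0))
```

In this toy setting the corrected wet-day frequency moves close to the observed one because low modeled amounts are mapped to values below the 1 mm threshold, which is the mechanism by which a purely distributional correction can alter the sequence of dry and wet days.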
b. Weather generator: Transition probabilities and Markov chain model
A precipitation time series X(t) at daily resolution t is classified into three states: dry [D; X(t) < 1 mm day−1], wet [W; 1 mm day−1 ≤ X(t) < obs.q50DOY], and very wet [V; X(t) ≥ obs.q50DOY], where obs.q50DOY is the 50th percentile of observed wet-day precipitation (>1 mm day−1), determined for each DOY separately. The threshold depends on the DOY because precipitation characteristics usually vary seasonally. As local-scale conditions are targeted, the observed threshold for very wet days is used for all modeling strategies. Applying the modeled 50th percentile for each modeling strategy separately can yield slightly different results but does not change the overall outcome of the analysis. Further, using a two-state (dry and wet) instead of a three-state MC model yields qualitatively similar results (not shown).
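The three-state classification can be written compactly. The following is a sketch that assumes the DOY-dependent threshold obs.q50DOY has already been computed and is passed in as an array indexed by day of year; the function name `classify_states` and the integer coding of the states are hypothetical.

```python
import numpy as np

DRY, WET, VERY_WET = 0, 1, 2

def classify_states(precip, doy, q50_doy, wet_threshold=1.0):
    """Assign state D, W, or V to each day.

    precip  : daily precipitation amounts (mm/day)
    doy     : 1-based day of year for each entry of `precip`
    q50_doy : observed wet-day median for each day of year (length 366)
    """
    precip = np.asarray(precip, dtype=float)
    thr = np.asarray(q50_doy, dtype=float)[np.asarray(doy) - 1]
    states = np.full(precip.shape, DRY, dtype=int)
    is_wet = precip >= wet_threshold
    states[is_wet] = WET
    states[is_wet & (precip >= thr)] = VERY_WET
    return states

# Toy example with a constant threshold of 5 mm/day (illustrative value).
precip = [0.0, 0.5, 1.0, 3.0, 8.0, 5.0]
doy = [1, 2, 3, 4, 5, 6]
q50 = np.full(366, 5.0)
states = classify_states(precip, doy, q50)
print(states)
```

Because the same observed threshold is applied to observations and to every model series, differences in the resulting state sequences reflect differences in the precipitation series themselves rather than in the class definition.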
The approach applied here can be described as a three-state first-order MC model (Wilks 2011), where the TPs are

$$P_{IJ} = \Pr\{X(t) \in J \mid X(t-1) \in I\}, \qquad (2)$$
with J denoting the present and I the previous day’s state (for D, W, or V) and PIJ the corresponding TP. The setup is completely defined by a combination of nine TPs.
Based on these principles, a state is assigned to each X(t) in observations and the individual model realizations. The PIJ values are calculated for each DOY based on a 91-day window centered over the respective DOY. Finally, all individual PIJ values from the different years are averaged over the entire period. Time series describing the sequence of the three states can be simulated using the estimated PIJ values as MC forcing parameters. Because the analysis is restricted to TPs and spell lengths, no second model for the wet-day precipitation amounts is needed.
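Estimating the TPs from a state sequence and using them to drive a first-order chain can be sketched as follows. For brevity the sketch assumes time-invariant TPs, whereas the study estimates them per DOY in a 91-day window; the function names and the transition matrix `P_true` are hypothetical.

```python
import numpy as np

def transition_matrix(states, n_states=3):
    """Estimate P[I, J] = Pr{state J today | state I yesterday}."""
    states = np.asarray(states)
    counts = np.zeros((n_states, n_states))
    np.add.at(counts, (states[:-1], states[1:]), 1)  # count I -> J pairs
    row_sums = counts.sum(axis=1, keepdims=True)
    with np.errstate(invalid="ignore", divide="ignore"):
        return counts / row_sums  # rows without data become NaN

def simulate_chain(P, n_days, start=0, rng=None):
    """Simulate a first-order Markov chain of states from matrix P."""
    rng = np.random.default_rng() if rng is None else rng
    states = np.empty(n_days, dtype=int)
    states[0] = start
    for t in range(1, n_days):
        states[t] = rng.choice(len(P), p=P[states[t - 1]])
    return states

# Illustrative matrix (rows: previous state D, W, V; columns: present state).
P_true = np.array([[0.80, 0.15, 0.05],
                   [0.50, 0.30, 0.20],
                   [0.40, 0.30, 0.30]])
chain = simulate_chain(P_true, 20000, rng=np.random.default_rng(1))
P_est = transition_matrix(chain)
print(np.round(P_est, 2))
```

Each row of the estimated matrix sums to 1, so six of the nine TPs are free parameters. Repeating the simulation with different seeds would give an ensemble of realizations analogous to the WG ensemble described in the text.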
c. Evaluation strategy
A split-sample approach with independent calibration (1961–80) and validation (1981–2000) periods is applied. Validation is carried out on a monthly basis, with daily values being averaged across each month. To summarize results an annual-mean skill score is defined and applied in certain analyses:

$$S = \frac{1}{m}\sum_{i=1}^{m}\frac{P_{\mathrm{mod},i}}{P_{\mathrm{obs},i}}, \qquad (3)$$
where the index i denotes months (m = 12) and Pmod (Pobs) denotes the modeled (observed) TP averaged over month i. The score S gives an estimate of the average magnitude of the fractional bias in monthly TPs, with an ideal score of 1. Note that large over- and underestimations in different parts of the year may compensate each other.
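The compensation effect can be demonstrated with a small numerical example. The sketch assumes that S is the mean over the 12 months of the ratio of modeled to observed TPs, which is one reading consistent with an ideal value of 1 and with the possibility of compensating errors; the function name and the TP values are hypothetical.

```python
import numpy as np

def annual_mean_skill(p_mod, p_obs):
    """Mean over months of modeled/observed TP (assumed form of S)."""
    p_mod = np.asarray(p_mod, dtype=float)
    p_obs = np.asarray(p_obs, dtype=float)
    return float(np.mean(p_mod / p_obs))

# Illustrative monthly TPs: the model is too low in one half of the year
# and too high in the other half.
p_obs = np.full(12, 0.5)
p_mod = np.array([0.25] * 6 + [0.75] * 6)
score = annual_mean_skill(p_mod, p_obs)
print(score)
```

Here every month is biased by 50%, yet the score equals the ideal value, so S is best read together with the monthly results rather than in isolation.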
Here, the skill of QM in correcting RCM-simulated precipitation time series and of the WG in simulating sequences of dry, wet, and very wet days is presented. Afterward, detailed results regarding TPs at the exemplary site of Chur are shown. Chur was selected because of 1) its location in a rather dry inner-alpine valley not resolved by the RCMs and 2) the good observational data quality at this site. The paper then continues with summarized results from the analysis of a 21-station subset, restricting the discussion to four prominent TPs.
Figure 2 shows results for a raw (red) versus bias-corrected (blue) model ensemble of 14 RCMs (Table 1) and 100 WG simulations (gray). The raw ensemble obviously deviates from observations. The majority of raw models overestimate wet day frequencies (Fig. 2a).
Mean precipitation (Fig. 2b) is primarily overestimated, and biases in wet day intensity (Fig. 2c) can be pronounced, but no systematic direction is apparent. The 99th percentile of 5-day accumulated precipitation (Fig. 2d) is overestimated in most cases. Raw models tend to clearly and systematically underestimate the frequency of dry spells (Figs. 2e,f), and to overestimate the frequency of wet spells (Figs. 2g,h). The raw model biases with respect to very wet spells are diverse but in many cases substantial, primarily toward too large values (Fig. 2i). Overall, biases are more pronounced for long spell lengths (Figs. 2f,h). The application of QM systematically and massively improves various precipitation diagnostics, independently of the respective raw climate model’s skill. The bias-corrected model ensemble shows skill similar to, and partly outperforms, the WG ensemble (Fig. 2i). The WG tends to underestimate the frequency of long dry spells (Fig. 2f; majority of sites), of long wet spells (Fig. 2h; at dry sites) and, in a systematic manner, of very wet spells (Fig. 2i) (see also Semenov et al. 1998).
For all modeling strategies, Fig. 3 presents probabilities of different wet (Fig. 3a) and dry (Fig. 3b) spell durations at the stations Sion (SIO) and Chur (CHU), both located in dry valleys not resolved by the RCMs (see Fig. 1). As additional information, the bottom of each panel depicts mean spell durations. At both sites observations show a pronouncedly larger fraction of dry days and accordingly longer average dry than wet spells. While an average wet period lasts for about 2 days, a typical dry spell lasts 5–6 days. Raw model simulations strongly overestimate the probabilities of wet-spell durations and in turn mean wet-spell lengths, and underestimate the length of dry spells and their average duration.
Quantile-mapped simulations massively improve the representation of spell durations. In a similar manner, the ensemble of 100 weather generator simulations shows a very reasonable agreement with observations. However, and in particular contrast to quantile-mapped climate model output, the WG systematically underestimates the probability of long dry spells and thus mean dry-spell durations.
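Spell durations of the kind discussed here can be extracted from a state sequence by run-length encoding. The sketch below assumes that a wet spell comprises both wet and very wet days (i.e., all days with at least 1 mm) and uses a short made-up sequence; the function name `spell_lengths` is hypothetical.

```python
import numpy as np

def spell_lengths(mask):
    """Lengths of consecutive runs of True values in a boolean sequence."""
    mask = np.asarray(mask, dtype=bool)
    padded = np.concatenate(([False], mask, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)   # first day of each spell
    ends = np.flatnonzero(changes == -1)    # first day after each spell
    return ends - starts

# States coded as 0 = dry, 1 = wet, 2 = very wet (illustrative sequence).
states = np.array([0, 0, 1, 1, 1, 0, 2, 2, 0, 0, 0])
dry_spells = spell_lengths(states == 0)
wet_spells = spell_lengths(states >= 1)
print(dry_spells, wet_spells, dry_spells.mean())
```

A histogram of the returned lengths, normalized by the number of spells, gives the probabilities of different spell durations, and their mean gives the average spell duration for a series.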
Figure 4 presents the annual cycle of the nine TPs that describe a three-state first-order MC in CHU. Observed TPs show a pronounced seasonal cycle. As the wet-day frequency is relatively small (0.29), TPs that describe the transition to a dry state are high, particularly DD. In January, probabilities for two very wet days to follow each other (VV; all other combinations are similarly denoted) are twice as large as in early fall. In warm months DV and VD are considerably larger than in winter.
The set of raw simulations shows a prominent underestimation of DD, WD, and VD. All other TPs are overestimated, most distinctly DW and DV in winter. This is probably due to the common tendency of climate models to simulate too many wet days (Rajczak et al. 2013).
Bias-corrected climate model realizations (QM) show a substantial overall improvement of TPs across the whole year. The QM ensemble is in line with the range of the 100 WG simulations that by and large also match observations well. Results for DW and WD even indicate a better skill for QM compared to the WG, suggesting that memory effects are better captured in QM especially in summer. In this particular example (at CHU), the QM-adjusted RCM simulations tend to underestimate VD, which is also reflected by an overestimation of VW and VV. Nonetheless, observations still lie within the simulated range. This pattern is also seen in some of the other stations considered, in particular at dry sites (not shown). Note that results are qualitatively similar when validating second-order transition probabilities (not shown).
Figure 5 presents results for the full 21-station subset (from Figs. 1b,c) and for four prominent TPs: DD, DW, WW, and DV. Additionally, the ERA-40-driven realization of the RCM CLM run at ETH Zurich and the driving GCM HadCM3Q0 itself (see Table 1) are included by specific symbols in Fig. 5.
Overall, the results confirm those for the individual site of Chur (Fig. 4). There are distinct and systematic biases in raw climate model output, and a clear reduction of biases after bias correction for a large majority of cases. Especially at dry sites, raw RCM and GCM output obviously suffers from systematic biases in TPs. For instance, DW and WW probabilities are overestimated, and DD probabilities are underestimated. Intermodel and interstation spread is large for DV, with a tendency for distinct overestimations at dry sites and underestimations at wet sites.
The raw GCM output shows characteristics qualitatively similar to its dynamically downscaled counterpart (ETHZ–HadCM3Q0) but with larger biases, especially a pronounced overestimation of DW and WW. The reanalysis-driven simulation ETHZ–ERA-40 shows a better skill in most cases. The bias-corrected versions of the three model experiments are typically very close to the observations and no systematic difference in their respective performance is apparent. This suggests that the raw model’s skill does not necessarily determine the skill of its bias-corrected version regarding the representation of TPs.
The most obvious biases are found for DV, where some raw RCMs overestimate the TP by a factor of about 5 at the driest site (SIO). In contrast, the considered GCM underestimates DV at the wettest site, Säntis (SAE), by a factor of 10. In a majority of cases, QM leads to a massive improvement in the representation of TPs.
In general, WG estimates have similar skill to the bias-corrected climate models. A notable feature is an increasing spread of the WG results in the estimates for WW at dry sites (leftmost stations in Fig. 5). This may be related to sampling issues, as two consecutive wet days occur less frequently than the other transitions.
5. Discussion and conclusions
Applying a cross-validation framework, the present study demonstrates that the well-established quantile mapping (QM) technique is able to correct raw RCM precipitation time series in order to represent observed transition probabilities (TPs) and spell-length durations at weather stations across Switzerland. A significant improvement in skill is found, despite the fact that TPs are not explicitly corrected for by the QM methodology. QM merely adjusts the frequency of wet days by correcting the simulated precipitation intensities such that they match observations. Analysis shows that this yields the systematic improvement of TPs and spell lengths in quantile-mapped climate model output.
The main findings of the present study are the following:
Raw climate model simulations possess substantial and systematic biases in the representation of (local scale) TPs, spell durations, and multiday precipitation diagnostics. For instance, climate model simulations have a tendency to overestimate the frequency of wet days.
Bias correction (QM) of climate model simulations leads to a substantial improvement of the representation of TPs and multiday diagnostics. The improvement of TPs concerns both their magnitude and the representation of their annual cycle. Spell durations agree surprisingly well with observed values after bias correction.
The applied WG captures observed TPs and spell lengths well, but is outperformed in some seasons by bias-corrected RCM integrations, in particular for long dry spells. This implies that climate model simulations capture the statistics of atmospheric event sequences more realistically than a simple first-order WG. It is likely that this is due to the short memory of the considered WG. However, this conclusion may require revision with more sophisticated WGs.
Overall, the presented study finds that the application of QM provides obvious added value for the representation of TPs and multiday precipitation characteristics in climate model data at the local scale. This is remarkable, as temporal characteristics are not explicitly corrected for by the deterministic correction of distributional biases by QM.
The general applicability of QM, however, is limited by the demands of the end user. Fields of applicability are single-site impact assessments or multisite studies that consider temporally integrated statistics. For many end-user applications the findings of the presented work indicate a promising applicability of QM. For example, agricultural impact assessments often depend on an accurate representation of local TPs. However, the straightforward application of QM may be erroneous for multisite assessments that rely on the spatiotemporal coherence of the meteorological forcing (Maraun 2013).
Given the large climatic variations among the considered stations (see Figs. 1e,f) and the large set of models used, the findings may be transferable to other settings and regions. Also, Switzerland is a particularly challenging environment in terms of regional climate modeling. This is primarily due to its complex topography and the associated influence of subgrid-scale processes, neither of which is fully resolved by current climate model ensembles.
Whether our conclusions are valid under changing climatic conditions remains a question for future research. Besides methodological uncertainties in QM that particularly concern the temporal stability of biases between models and observations (see the introduction), uncertainties regarding the proper interpretation of climate model projections themselves remain (i.e., internal variability and model and scenario uncertainty).
We acknowledge the RCM datasets from the EU-FP6 project ENSEMBLES (http://ensembles-eu.metoffice.com). We also acknowledge MeteoSwiss, in particular Christoph Frei, for providing observational data and plotting routines in R. This research was partly funded by the Swiss National Science Foundation through the SNSF Sinergia Project CRSII2_136279, “The Evolution of Mountain Permafrost in Switzerland” (TEMPS), and by the Swiss Federal Office of the Environment in the framework of the project Gefahrengrundlagen für Extremhochwasser an Aare und Rhein (EXAR).