## Abstract

This paper describes the assumptions, equations, and procedures of the RheaG weather generator algorithm (WGA). RheaG was conceived for the generation of robust daily meteorological time series, whether in static or transient climate conditions. Here we analyze its performance in four Iberian locations—Bilbao, Barcelona, Madrid, and Sevilla—with differentiated climate characteristics. To validate the RheaG WGA, we compared observed and generated meteorological time series’ statistical properties of precipitation, maximum temperature, and minimum temperature for all four locations. We also compared observed and simulated rain events spell length probabilities in all four locations. Finally, RheaG includes two weather generation procedures: one in which monthly mean values for meteorological variables are unconstrained and one in which they are constrained according to a predefined baseline climate variability. Here, we compare the two weather generation procedures included in RheaG using the observed data from Barcelona. Our results present a high agreement in the statistical properties and the rain spell length probabilities between observed and generated meteorological time series. Our results show that RheaG accurately reproduces seasonal patterns of the observed meteorological time series for all four locations, and it is even able to differentiate two climatic seasons in Bilbao that are also present in the observed data. We find a trade-off between generation procedures in which the unconstrained procedure better reproduces the variability of monthly and yearly precipitation than the constrained one, but the constrained procedure is able to keep the same climatic signal across meteorological time series. Thus, the first procedure is more accurate, but the latter is able to maintain spatial autocorrelation among generated meteorological time series.

## 1. Introduction

Ongoing human-induced climate change is already affecting both human societies and natural systems all across the globe. Its negative impacts are projected to increase in the coming decades (IPCC 2013). Simulation models are the most reliable ways to explore how changes in environmental conditions will likely modify present natural systems in the future. Future climate is unknown, but simulation models can be fed with climate projections generated statistically from the meteorological record. Thus, developing tools to provide statistically robust future meteorological time series is crucial. Moreover, even when those projections are available, most of the time their temporal or spatial scale does not fit with simulation models’ requirements. In that case, further transformations shall be applied upon global climate change projections to downscale them to a site-specific data. This fact is not trivial, as further downscaling needs to be also accurate and statistically robust both with global projections and local-scale observations. Furthermore, in most locations daily meteorological time series are not available. They also might have large gaps in their records. In both cases, current or past meteorological time series have to be reconstructed in order to feed simulation models with reliable meteorological information to evaluate their performance under current climate conditions.

There have been many approaches in the development of algorithms for meteorological series generation since the mid-twentieth century (e.g., Gabriel and Neumann 1962; Richardson 1981; Richardson and Wright 1984; Bürger 1997)—see Ailliot et al. (2015) for a review. Generally speaking, weather generation algorithms (WGA) are based on summarizing observed climate data time series into a set of statistical descriptors in order to use them to generate artificial climate data statistically similar to the observed ones (Wilks and Wilby 1999). Traditionally, climate variables included in WGA have been temperature, incident radiation, wind speed, and the most important, precipitation. Most of WGA include temporal autocorrelation between their variables in their generated data. Many of them correlate climate variables to precipitation, as it is identified as the most difficult climate variable to be properly generated (Hutchinson 1986). In contrast to most global circulation models’ (GCMs’) fluid dynamics differential-equations-based projections, WGAs tend to represent better local climate variations because of their specificity to the statistical characteristics of a given local climate time series. However, the parameterization effort of those models is substantial for each location, and they need long observed climate time series to be parameterized, so their applicability to a large area is still challenging (Semenov and Brooks 1999). Still, WGAs also have to deal with the trade-off between the number of descriptive parameters and the relative information gained per added parameter (Hawkins 2004). The aim is to avoid irrelevant or roughly noninformative parameters that increase the minimum length of the observed meteorological time series required to calibrate them. In the end, each WGA is a trade-off between the likelihood and statistical robustness of its generated meteorological time series and the number of parameters required.

With this paper we propose a new Markovian transition probability-based WGA, and we evaluate its performance when compared to four contrasting climates from the Iberian Peninsula. We first compare the absolute values of observed and generated meteorological time series at four different locations representative of the different Iberian climates—that is, Barcelona, Bilbao, Madrid, and Sevilla—to account for the verisimilitude of the generated meteorological time series compared to the observed ones. Then, we compare generated versus observed precipitation patterns, focusing our attention to rain spell length distributions. Finally, we compare the two alternative procedures included in RheaG for meteorological data generation under different climate forcing scenarios for Barcelona.

## 2. Material and methods

RheaG is a WGA programmed in R coding language (R Core Team 2017) in order to make it freely and easily available. RheaG analyzes certain statistical properties of an input observed daily climate data and then uses these properties, along with a pseudorandom number generator, to produce simulated daily meteorological data one day at a time, thus reproducing observed climate statistical patterns. Table 1 summarizes the meteorological inputs and its description in the RheaG.

Daily meteorological variables included in the RheaG are divided into two categories: the mandatory ones—that is, incident radiation *Q* (MJ ha^{−1} day^{−1}), precipitation *P* (mm day^{−1}), and maximum and minimum temperatures (Tmax and Tmin; °C)—and the discretionary ones—wind speed (Ws; m s^{−1}), atmospheric water pressure (VP; kPa), and potential evapotranspiration (PET; mm day^{−1}). The statistical properties obtained from the observed meteorological time series for Tmax, Tmin, *Q*, Ws, VP, and PET are the month daily mean and standard deviation values, along with the monthly first-order Markovian transition probability matrix. The temperature and radiation Markovian transition probability matrix is defined by the probability of observing a value above or below the monthly mean given that the value of the previous day was above or below the monthly mean. Statistical properties describing precipitation patterns in RheaG are as follows: annual mean precipitation during a rainy day; monthly mean precipitation during a rainy day; rain occurrence probability defined by 15 rain occurrence probability classes, following a *p*[2^{(n−1)} < *x* < 2^{n}] distribution, where *n* ∈ *I*{1, 2, …, 15}; and last, the monthly Markovian transition probability matrix of rain occurrence given whether it rained the previous day.

It is noteworthy that, in absence of radiation inputs, the RheaG algorithm is still able to calculate them by following Hargreaves and Samani (1982), according to Eq. (1):

where *R*_{t} is the daily radiation (MJ ha^{−1} day^{−1}); Tmax and Tmin are maximum and minimum temperatures, respectively, (°C); and *R*_{a} is the extraterrestrial radiation for a given location and moment of year (MJ ha^{−1} day^{−1}) and is calculated from the latitude of the location in degrees and the Julian day (Spencer 1971; Duffie and Beckman 1991). The empirical coefficient *K*_{t} depends on the continentality of the site. For coastal sites, *K*_{t} equals 0.20, and for the interior regions it equals 0.17, according to Allen (1995).

In RheaG, discretionary variables, if not available, are calculated daily as follows:

Saturated vapor pressure (VPS; kPa) is calculated daily following Allen et al. (1998) [Eq. (2)]:

where VPS is the saturated water vapor pressure (kPa), and Tmin is the minimum daily temperature (°C). Kimball et al. (1997) found that predawn air water vapor pressure does not reach saturation point in the most arid climates, and they proposed a correction upon VPS based on the proportion of annual precipitation related to potential evapotranspiration. If annual precipitation *P*_{365} is less than a half of potential evapotranspiration (PET_{365}), a correction is applied upon VPS following Eq. (3):

where VPS_{corrected} is the saturated water vapor pressure after applying Kimball’s correction (kPa), VPS is the estimated saturated water vapor pressure (kPa), and *P*_{365} and PET_{365} are the last 365 days precipitation and potential evapotranspiration, respectively (mm yr^{−1}).

Actual VP is calculated by changing daily Tmin mean daily temperature [i.e., (Tmax + Tmin)/2 ] in Eq. (2).

Daily PET is calculated following the FAO Penman–Monteith equation described in Allen et al. (1998) [Eq. (4)]:

where Δ is the slope vapor pressure curve (kPa °C^{−1}) given daily mean temperature; *Q* is the daily incident radiation (MJ ha^{−1} day^{−1}); *G* is the soil heat flux density (MJ ha^{−1} day^{−1}), fixed at a 10%; *γ* is the psychrometric constant (kPa °C^{−1}) given the altitude of the plot; Tmean is the daily mean temperature (°C); Ws is the daily mean wind speed (m s^{−1}); VPS is the saturation vapor pressure (kPa); and VP is the actual vapor pressure (kPa).

Wind speed is often not available in original data. In that case, Ws RheaG generates daily Ws data using a Gaussian distribution Ws_{day} = *N*(*μ*, *σ*^{2}), with prescribed monthly mean and variance values according to the previous knowledge for each location.

It is noteworthy that, despite in some of the other WGAs there are some kinds of correlations among Tmax and Tmin, *Q*, PET, and VPS with daily precipitation (e.g., Wilks and Wilby 1999; Kilsby et al. 2007), RheaG does not include any a priori correlations among them, in spite of maintaining a low number of parameters required to run the algorithm—but see the constrained generation method below. We are aware that independent meteorological data generation for each variable in some cases may lead to physically unrealistic daily combinations (e.g., high *Q* values generated during extended rain events), while still presenting consistent monthly and yearly variable values. Other approaches (e.g., Kilsby et al. 2007), based on hourly observed data, have proven to significantly improve the agreement between generated and observed meteorological time series, while they consistently kept correlations between precipitation, temperature, and radiation. However, the main purpose of RheaG WGA is to feed simulation vegetation models with site-specific meteorological time series. That kind of meteorological record is often found in the form of daily values. Besides, in the past they covered too few years to capture straightforward correlations between daily meteorological variables and daily rain intensity. Slightly negative correlations between daily precipitation during the days when *P* > 0 and Tmax (*r*_{Barcelona} = −0.22, *n*_{Barcelona} = 2065; *r*_{Sevilla} = −0.15, *n*_{Sevilla} = 1672), Tmin (*r*_{Barcelona} = −0.17, *n*_{Barcelona} = 2065; *r*_{Sevilla} = 0.01, *n*_{Sevilla} = 1672) and *Q* (*r*_{Barcelona} = −0.13, *n*_{Barcelona} = 2065; *r*_{Sevilla} = −0.18, *n*_{Sevilla} = 1672) were observed in our data, with elevated intra-annual variability (see Figs. SM1 and SM2 and Table SM1 in the online supplemental material). Furthermore, it has been observed that sensitivity to biases in input precipitation and temperature for simulation models occurs mostly at monthly—rather than daily—levels (e.g., Wood et al. 2004; Fowler and Kilsby 2007). Hence, we opted to apply the most parsimonious criteria of not accounting for correlations within variables at a daily scale in order to reduce parameter number input.

### a. Precipitation

Precipitation tends to be the main driver variable in most of WGA (Kilsby et al. 2007). First-order Markovian transition probability matrices (MM) have been largely used to model the daily occurrence of climate events (Gabriel and Neumann 1962; Caskey 1963; Stern 1980; Sonnadara and Jayewardene 2015). They are a simple but effective way to model wet and dry spells’ length, frequency, and distribution if the original meteorological series analyzed are long enough (Katz 1981). However, they only generate binomial information—rain or no rain—for a single day. Thus, rain occurrence models must be coupled to rain amount distribution models to generate the meteorological time series. Exponential, potential, gamma, or mixed exponential rainfall amount distribution models have been successfully applied in several weather generator algorithms (Wilks and Wilby 1999; Ailliot et al. 2015). RheaG uses monthly based MM to determine if it rains during a given day, given that it rained the previous day. In the case of rain presence, RheaG describes the amount of precipitation during a rainy day from 15 probability classes in *p*[2^{(n−2)}] < *P*_{amount} ≤ 2^{(n−1)}, where *n* ∈ *I*{1, 2, …, 15}, and *P*_{amount} is the amount of precipitation during a rainy day. RheaG first randomly determines the *nP*_{amount} class for a given day, from the yearly probability distribution of belonging to *n* precipitation class, obtained from observed data. Then, RheaG generates actual precipitation for this day randomly, following Eq. (5):

where *P*_{day} is the amount of precipitation during a given day (mm day^{−1}), *x* is a random number between 0 and 1, *n* refers to the precipitation class—*n* ∈ *I*{1, 2, …, 15}, *P*_{month} is the mean daily precipitation for a given month, and *P*_{year} is the yearly mean daily precipitation.

Here we evaluate generated precipitation in two ways: first, we analyze the ability of RheaG to reproduce the statistical properties of the generated meteorological time series—that is, Markovian transition probabilities, rain occurrence probability, rain intensity, and probability distribution. Second, we compare the wet spell length probability between reference climate time series and generated time series. It is not explicitly included in the input for the RheaG, although is closely linked to the Markovian transition matrix. Hence, it is a good estimator of the agreement among statistical properties of observed and generated meteorological time series. As most of the complexity in weather generation relies on accurately describing precipitation patterns (Wilks and Wilby 1999), we set the main focus of this work on precipitation patterns in order to validate RheaG meteorological time series.

### b. Maximum and minimum daily temperatures and daily radiation

Daily maximum and minimum temperatures (°C) in a monthly basis—hereafter referred as “temperature values”—have been successfully modeled from Gaussian distributions (e.g., Mearns et al. 1984; Kilsby et al. 2007), and so have been implemented in the RheaG WGA, coupled to a first-order Markovian transition matrix to account for daily autocorrelation in temperatures, following Eq. (6):

where *T*_{j} is the temperature value for a given *j* day (°C), Ref_{j} is the reference mean temperature on the *j*th day (°C), and *σ*^{2} is the standard deviation of monthly temperature values. In addition, RheaG includes a MM-based probability to account for temporal autocorrelation trends in temperature. Therefore, a given day is going to add or subtract the standard deviation to the Ref_{j} values according to a MM-based probability.

Ref_{j} is a correction upon reference mean monthly temperature. It is calculated following Eq. (7) or Eq. (8), depending on being the day *j* within the first or the second half of the month, respectively:

where Ref_{j} is the reference mean temperature on the *j*th day; *μ*_{m} is the mean monthly temperature of that given month (°C); *μ*_{(m−1)} and *μ*_{(m+1)} are the mean monthly temperature values for the previous and the following month (°C), respectively; *d*_{m} is the number of days within the month *m*; *d*_{j} is the current Julian day; and *d*_{c,m} is the Julian day at the middle of the current month. These corrections are used to smooth the transition of the generated climate time series between one month and the previous or following month, alternatively. Last, RheaG recalculates daily temperature values if Tmax ≤ Tmin for a given day in order to avoid unrealistic output values.

Surface incident radiation *Q* (MJ ha^{−1} day^{−1}) for a given day is obtained following exactly the same procedure.

### c. Generation of daily meteorological time series under no climate forcing

To validate RheaG-generated time series, we selected four climate time series from the Agencia Estatal de Meteorología (AEMET) Spanish Meteorological Service (http://www.aemet.es/es/portada). This climate time series are representative of different climates present in the Spanish mainland. Bilbao (43.17°N, 25.42°W) represents the wet, northwestern Atlantic climate; Barcelona (41.25°N, 20.73°E) represents the moist Mediterranean-type climate; Sevilla (37.09°N, 53.66°W) represents the arid Mediterranean-type climate; and Madrid (40.24°N, 34.04°W) represents the continental climate. We used AEMET climate time series from 1989 to 2009 for all locations, hereafter referred as validation locations, to account for at least 20 years (Richardson and Wright 1984) of meteorological recorded data. In addition, we selected this period to avoid the longer gaps present in the register. Gaps in temperature and radiation were filled as follows: gaps shorter than 2 days were filled by linear interpolation. Gaps larger than 2 days were interpolated linearly from nearby meteorological stations. In addition, gaps in precipitation were all interpolated linearly from nearby meteorological stations. Only meteorological stations with *R*^{2} ≥ 0.8 were used to fill those gaps. Under current climate change conditions, it is very likely (IPCC 2013) a modification of climate statistical properties. So, by using only the last 21 years available, we made sure to obtain both the current means and standard deviations, as well as the most recent Markovian probability distribution. For each location, five independent 90-yr-long daily climate data series were then generated without any monthly data restriction, or any further climate forcing, to account for the ability of RheaG to reproduce the observed climate pattern in the original meteorological time series. We refer to these runs as reference (R) runs, as they do not include further forcing than observed in generated time series.

### d. Generating time series under climate forcing

RheaG also includes the option to generate meteorological data with climate forcing. RheaG includes two procedures for doing so. The first one, called the unconstrained one because monthly values for different meteorological time series are independent, generates climate just as described above for all years required, and at the end it adds three monthly cumulative anomalies for the values of Tmax, Tmin, and *P*, respectively. These anomalies are the monthly increment per year projected in Tmax and Tmin and the yearly percentage reduction in precipitation on a monthly basis.

The second procedure for meteorological time series generation under climate forcing, so-called the constrained one, first calculates a baseline monthly variability to be included in all time series to be generated. This monthly variation is generated randomly for Tmax, Tmin, *Q*, and *P* from a truncated normal distribution centered in the mean monthly values, with a variation that is function of the observed monthly standard deviation for this variable multiplied by a Gaussian-probability-based factor. It includes the option to account for autocorrelations among variables (e.g., months with positive increase in maximum temperature with respect to the mean are more likely to have positive increases in minimum temperature with respect to the mean). Final monthly values for a given meteorological variable are calculated as follows: for a given month *i* and during a given year, the reference value (Ref_{month}) for a given variable *x* follows the equation Ref_{month} = *N*(*μ*_{x,i} + Δ*x*_{i}, *a*_{x,i}*σ*^{2}_{i}), where *μ*_{x,i} is the observed mean monthly value for this given variable, Δ*x*_{i} is the anomaly of this variable under climate change conditions for the given month, *a*_{x,i} is the variability for a given *x* variable and *i* month—with values ranging between −1.5 and +1.5 and following a truncated Gaussian distribution centered in 0; and is the observed standard deviation for the given *x* climate variable during *i* month. The intention here is to maintain the same variation signal among generated meteorological time series for a given month when they are spatially autocorrelated. The *a*_{x,i} coefficient is, then, used to maintain the same underlying climate signal for the same month among different meteorological time series. It is noticeable here that absolute individual monthly values can also be included as an input for RheaG, so this WGA is able to disaggregate the more common monthly meteorological projections into the daily climate time series required by several simulation models.

Once reference monthly values are obtained, daily Tmax, Tmin, *Q*, and *P* values are generated iteratively for each month. Then, if mean Tmax, mean Tmin, mean *Q*, or cumulative *P* is equal to reference monthly values for a given month ± 0.5°C, 1 MJ m^{−2} month^{−1}, or 2 mm month^{−1}, depending on the variable, generated daily values are accepted. Conversely, if they are not within this range, generated daily values are rejected and RheaG iteratively generates another daily values set for the given month, until generated values match the monthly specific reference monthly values.

To assess how climate data generated from a given validation location vary under different climate forcing conditions—that is, climate change—we generated climate data following three climate forcing scenarios for Barcelona (Table 2) for the 2010–2100 period, and for three climate forcing scenarios: no static present climate (SPC), moderate climate forcing according to the RCP4.5 climate forcing scenario, and strong climate forcing according to the RCP8.5 climate forcing scenario (IPCC 2013). We kept all monthly anomalies for both temperature and precipitation equal, to make the results comparison more straightforward. We then compared the original daily, monthly and annual climate data probability distributions, with the newly generated climate time series under both RCP4.5 and RCP8.5 climate forcing scenarios.

All data processing and statistical analyses have been performed using R software (R Core Team 2017).

## 3. Results

### a. Validation of generated time series statistical properties

Madrid presented the lowest mean annual precipitation from the four validation locations climate data time series analyzed, with values around 370 mm yr^{−1}. Madrid also presented the widest range of temperature values, with 13°C between mean minimum and maximum annual temperatures (Table 3). Conversely, Barcelona presented the narrowest range, with only 8°C between minimum and maximum mean annual temperatures. Bilbao presented the highest annual precipitation values, around 1100 mm yr^{−1} for the 1989–2010 period. Annual mean maximum and minimum temperatures were the highest in Sevilla, with mean values of 13° and 26°C, respectively.

Differences between observed and generated climate time series in mean annual *P*, Tmin, and Tmax values were minimal for all four locations. They diverged less than 1% for all three variables considered, except for mean annual precipitation in Sevilla, which diverged about 1.5%, with slightly higher generated precipitation values than observed ones (Table 3).

Monthly daily rain probabilities between observed and generated climate time series were almost identical (Fig. 1, 0.96 < *R*^{2} < 0.99). However, in Sevilla, generated monthly daily rain probabilities diverged more than the observed ones (RMSE = 0.025, *n* = 60). Madrid (RMSE = 0.01, *n* = 60) was the validation location where generated monthly daily rain probabilities were most similar to the observed ones, followed by Barcelona (RMSE = 0.012, *n* = 60), and Bilbao (RMSE = 0.016, *n* = 60).

MM transition probabilities of generated climate time series were also consistent with observed properties for all locations (Fig. 2; 0.91 < *R*^{2} < 0.995). RMSE of the probability of observing a nonrainy day after a nonrainy day *p*(DD) ranged from 0.007 in Madrid to 0.031 in Barcelona. It is noteworthy that the relatively high divergence in Barcelona is attributable to an anomalous February *p*(DD) in one out of five of the meteorological time series generated, which can occur when Markov chains are used to randomly generate a finite number of iterations. Additionally, the RMSE among the observed and generated probabilities of observing a rainy day after a rainy day *p*(WW) reached its lowest values in Bilbao (0.015) and its highest values in Sevilla (0.048).

Newly generated climate time series maintained the observed negative linear correlation between p(DD) and daily rain probability (Fig. 3). They maintained both the overall trend and the climate time series’ specific trend for all four sites. The analysis of Bilbao meteorological time series presented two different season-related patterns of this relationship (Fig. 4): the first one comprises the cold and wet months (from November to May), and the latter one includes the warm months (from June to October). This indicates that, given the same daily rain probability, during warm months Bilbao presents a lower p(DD) than during the wet ones within the ranges of daily rain probability registered in the reference meteorological time series. Hence, an increase in the dispersion of the rain events during the warm season is observed in Bilbao.

### b. Comparison between observed and generated rain spell length

For all four validation locations, simulated rain spell length probability distribution matched the observed rain spell length (Fig. 5). In all cases, 1-day-long spell lengths were found the most probable case. A nonlinear decrease in occurrence probability was observed at increasing spell lengths. One-day-long spell length probabilities ranged from 0.35 in Bilbao to 0.55 in Barcelona, and Madrid and Sevilla being the middle cases, with values ranging from 0.45 to 0.5, respectively. Furthermore, the steeper decreases in spell length frequency were observed in the Mediterranean-type climates, Sevilla and Barcelona.

Spearman correlation scores between observed and simulated rain spell lengths were higher than 0.99 in all cases, with differences ranging from less than 2% in Bilbao to a maximum of 6% in 1-day-long rain spell length in Sevilla. No trend was observed in the differences among observed and generated climate time series. Nevertheless, RheaG tends to better reproduce low-frequency rain spell lengths in all locations and slightly fails to reproduce the observed frequency of short rain spell lengths, with no apparent trend among locations.

### c. Weather generation under climate forcing

Interannual variation in cumulated precipitation under SPC climate forcing (Fig. 6) better represented observed interannual variability in precipitation in Barcelona (602 ± 144 mm yr^{−1}) by using the unconstrained procedure (620 ± 133 mm yr^{−1}) than the constrained one (614 ± 72 mm yr^{−1}). The same pattern was also observed in mean annual maximum temperature: unconstrained values (19.4° ± 0.22°C) were more variable than constrained ones (19.5° ± 0.2°C), thus better reproducing the observed pattern (19.4° ± 0.54°C).

Following the unconstrained procedure, higher variability of monthly precipitation was observed in Barcelona on a monthly basis (Fig. 7). In addition, using the unconstrained procedure, the probability distribution of monthly anomaly of precipitation [(*P* – PMean)/PMean] was more in accordance with observed values than using the constrained one. There were peaks in autumn precipitation that reached ~320 mm month^{−1} for all three climate forcing scenarios in the unconstrained procedure, while maximum monthly precipitation values following the constrained procedure were lower, about 150 mm month^{−1} during autumn. However, as climate change intensity increased, the high precipitation months became less frequent for the two procedures (Fig. 8). On the other hand, the constrained procedure generated less variability in monthly precipitation values, with both maximum and minimum monthly precipitation values constrained within predefined ranges. In addition, under SPC conditions, daily probability of no rain was 0.844 and 0.827 for the unconstrained and the constrained procedures, respectively, while it was 0.846 in the observed data. In the RCP4.5 scenario, daily probability of no rain increased to 0.844 and 0.841 for the unconstrained and the constrained procedures, respectively. Finally, in the extreme RCP8.5 scenario, both procedures presented similar values of 0.844.

Daily Tmax probability distribution matched almost completely observed daily Tmax distribution for both the constrained and the unconstrained procedures (Fig. 9). Median observed daily Tmax values were 18.8°C. In a SPC scenario, observed median daily Tmax values were 19.1° and 18.8°C following the unconstrained and the constrained procedures, respectively. They increased to 20.4° and 20.6°C, respectively, under an RCP4.5 forcing scenario, and finally reached 22.4° and 22.9°C, respectively, under the extreme RCP8.5 climate forcing scenario.

## 4. Discussion and conclusions

We have described and tested RheaG WGA in four contrasting climates from Iberian Peninsula. RheaG accurately reproduces the statistical properties of the reference time series and is able to generate self-consistent series of meteorological variables following preestablished climate forcing scenarios. Given the intention behind the RheaG model, which is to generate meteorological time series to feed forest simulation models that run on a daily basis, our results have shown that both the simulated precipitation trends and the temperature probability distributions matched the observed values in the field when no further climate forcing—that is, SPC scenario—is considered.

In addition, RheaG WGA is able to generate meteorological time series under climate forcing. Meteorological time series generated under climate forcing take into account both the observed climate data statistical properties and the projected changes in those statistical properties—that is, anomalies in mean monthly temperature and precipitation. When new climate change projections will become available, RheaG outputs can be easily updated accordingly.

It is noteworthy the overall good agreement of generated time series to the observed monthly *p*(DD) and *p*(WW) values in all four locations, with correlations ranging between 0.97 ≤ *R*^{2} ≤ 0.99 (Fig. 2). Moreover, this good agreement in MM transition probabilities for all locations also derive in highly correlated monthly *p*(Rain) in all locations [0.96 ≤ *R*^{2} ≤ 0.99 (Fig. 1)]. Interestingly, in the case of Bilbao, RheaG is even able to reproduce observed bimodal seasonal trend (Fig. 4) that differentiates two climate seasons within Bilbao meteorological time series. These two seasons, the warm—from June to October—and the mild—from November to May—affect the relationship between the month *p*(Rain) and the month *p*(DD). During the coldest months, there is a higher *p*(DD) for the same *p*(Rain). As *p*(DD) and *p*(WW) are negatively correlated (Geng 1986), the trend observed in Bilbao indicates a higher rain concentration in fewer rain events during the mild period than during warm period, in which rain spells are shorter and more frequent. This different seasonal behavior in Bilbao leads us to emphasize the importance of monthly characterization of meteorological time series, rather than just working with annual averages.

Rain spell length directly depends on the Markov transition probability matrix (Sonnadara and Jayewardene 2015), but it is not a direct input for the RheaG WGA. Therefore, it can be used as an indicator of suitability for the generated meteorological time series. Our results show that observed and generated meteorological time series present a very close pattern of rain spell length probabilities. Most of the inconsistencies occur at 1-day spell length probabilities, with the Sevilla climate having lower agreement and the Bilbao climate having greater agreement. On the other hand, low-probability events are well represented in all locations.

Traditionally, temporal autocorrelations in precipitation generation have been implemented in different WGA using a broad range of algorithms: the simplest ones, such as the used in RheaG WGA, are based on first-order Markov probabilities. Including second-order Markov probabilities has been found to better describe precipitation patterns and to minimize the inconsistencies in 1-day rain spell lengths observed when using only first-order Markovian matrices (e.g., Buishand 1978; Chen et al. 2012), at the expenses of higher input parameter number. Recently, more complex rain-cell clustering approaches (Rodríguez-Iturbe et al. 1987), such as the Neyman–Scott rectangular pulses (NSRP; e.g., Kilsby et al. 2007; Burton et al. 2010) or the Bartlett–Lewis rectangular pulses (Onof et al. 2000) have also been developed. The latter models have proven to better reproduce the 1-day autocorrelation lag between rainy days than MM transition-based models (Kilsby et al. 2007). Hence, they better represent the 1-day lag between rain events than the ones reported by RheaG WGA (Fig. 5). On the other hand, no differences in mean monthly daily rainfall and monthly proportion of dry days were observed between MM-based and NSRP rain generation procedures (Kilsby et al. 2007). Additionally, NSRP models, compared with the most simple first-order MM-based models, are more expensive both in terms of parameter number and in the required length of the input observed meteorological time series (Ailliot et al. 2015). Therefore, in RheaG WGA we opted to maintain an input parameter number that was as low as possible in order to maximize parsimony, as well as to allow the WGA to run with shorter input observed climate time series.

Concerning to the two procedures of weather generation included in RheaG, the weather generation procedure that better reproduces both the annual and the monthly variability in precipitation of the observed time series is the unconstrained one. However, annual and monthly values for meteorological time series generated following this option are independent among sites and scenarios, as there is no spatial or temporal autocorrelation among time series generated. This leads to uncorrelated monthly and yearly values for each one, despite all of them maintaining equal statistical properties on the long run. Thus, we consider that the unconstrained procedure is the best and fastest option for independent one-site meteorological time series generation, as their meteorological time series are more accurate and realistic.

On the other hand, the generation of meteorological time series following the constrained procedure is more computationally costly—it takes 2 min to generate a 100-yr time series—than the unconstrained one, which takes about 30 s per time series. In addition, monthly precipitation values variability is more restricted, thus obtaining monthly precipitation values only within a given range (Figs. 7 and 8). Thus, this method does not allow us to generate the low-frequency high monthly precipitation events registered. However, it maintains the same background yearly climatic variability signal among time series. Keeping this background signal is important when there is spatial autocorrelation within generated meteorological time series, so monthly and yearly climate variability background has to be equalized to all generated meteorological time series. A further improvement of the constrained procedure, according to previous work on spatial autocorrelation of daily precipitation in the United Kingdom (Burton et al. 2013), would be the training and implementation of lag-0 cross-correlation-distance (XCD) double exponential algorithms for the mandatory variables (i.e., *P*, Tmax, Tmin, and *Q*), thus effectively describing the spatial autocorrelation of meteorological data for a given region based on the distance between meteorological stations. This opens an interesting field of research, as we hypothesize that this algorithm should be able to be trained at a monthly basis for Spain, if the amount of meteorological stations available for this training is large enough. If so, the capacity of RheaG WGA to generate spatially autocorrelated monthly reference time series at stations where its monthly values are unknown will be substantially increased.

Hence, the two RheaG weather generation procedures present an interesting trade-off between keeping the background climate signal similar in a multiplot generation with the constrained procedure, and the higher consistency with the observed annual precipitation variability obtained with the unconstrained one. Deciding to use the constrained or the unconstrained version of the RheaG WGA has to be carefully considered before each RheaG application—that is, we recommend applying the unconstrained procedure for generating climate time series for individual plots. When considering multiple plots where climate is spatially autocorrelated, the constrained procedure could be better as all meteorological time series generated maintain the same baseline monthly and annual climate variability. Thus, the procedure selection must be made wisely in accordance with the spatial scale to which RheaG WGA is intended to work.

## Acknowledgments

This work was funded by MED-FORESTREAM (CGL2011-30590-C02-01) project and MEDSOUL (CGL2014-59977-C3-1-R) project. Meteorological data were provided by Spanish Meteorological Agency AEMET in the frame of the MONTES-CONSOLIDER (CSD2008-00040) project. DN-S is supported by a FPI grant of the Spanish Ministerio de Economía y Competitividad (BES-2015-072983). DNS, SS, and CG are members of the research group FORESTREAM (AGAUR, Catalonia 2017SGR976). We the authors want to acknowledge the insightful comments provided by two anonymous reviewers. Specific author contributions: DN-S, CG, and SS designed the research; DN-S wrote the main text; SS and CG provided comments to the text; CG, DN-S, and SS formulated the RheaG algorithm; DN-S coded the RheaG weather generator algorithm in R language.

## REFERENCES

*Solar Engineering of Thermal Processes.*Wiley, 944 pp.

*Agricultural Environments*, A. H. Bunting, Ed., C.A.B. International, 149–157.

*Climate Change 2013: The Physical Science Basis*. Cambridge University Press, 1535 pp., https://doi.org/10.1017/CBO9781107415324.

## Footnotes

Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/JAMC-D-18-0170.s1.

© 2019 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).