## Abstract

The effects of spatial aggregation on the skill of downscaled precipitation predictions obtained over an 8 × 8 km^{2} grid from circulation analogs for metropolitan France are explored. The Safran precipitation reanalysis and an analog approach are used to downscale the precipitation where the predictors are taken from the 40-yr ECMWF Re-Analysis (ERA-40). Prediction skill—characterized by the continuous ranked probability score (CRPS), its skill score, and its decomposition—is generally found to continuously increase with spatial aggregation. The increase is also greater when the spatial correlation of precipitation is lower. This effect is shown from an empirical experiment carried out with a fully uncorrelated dataset, generated from a space-shake experiment, where the precipitation time series of each grid cell is randomly assigned to another grid cell. The underlying mechanisms of this effect are further highlighted with synthetic predictions simulated using a stochastic spatiotemporal generator. It is shown 1) that the skill increase with spatial aggregation jointly results from the higher and lower values obtained for the resolution and uncertainty terms of the CRPS decomposition, respectively, and 2) that the lower spatial correlation of precipitation is beneficial for both terms. Results obtained for France suggest that the prediction skill indefinitely increases with aggregation. A last experiment is finally proposed to show that this is not expected to be always the case. A prediction skill optimum is, for instance, obtained when the mean areal precipitation is estimated over a region where local precipitations of different grid cells originate from different underlying meteorological processes.

## 1. Introduction

High-resolution scenarios for surface meteorological variables are often required for regional impact studies. As local surface meteorological situations are strongly influenced by the state of the atmosphere and its circulation at synoptic scales, these scenarios can be produced from statistical downscaling models (SDMs). SDMs are actually based on empirical relationships established, for recent decades and generally for a daily time step, between a selection of large-scale atmospheric variables (called the predictors) and the required local meteorological variables (called the predictands). A large number of SDMs have been proposed over the last two decades [see Maraun et al. (2010) for a review]. Often developed for local weather forecasting (e.g., Obled et al. 2002; Gangopadhyay et al. 2005; Marty et al. 2012, 2013), they can also be used to generate weather scenarios for past or future climates from outputs of climate models (e.g., Wilby et al. 1999; Hanssen-Bauer et al. 2005; Boé et al. 2007; Lafaysse et al. 2014) or to reconstruct weather scenarios from atmospheric reanalyses [specific events as in Auffray et al. (2011) or time series covering the past 50–100 years as in Kuentz et al. (2015) and Wilby and Quinn (2013)].

SDMs are frequently used to generate meteorological scenarios at one or more meteorological stations (e.g., Wetterhall et al. 2006; Frost et al. 2011). They are then often evaluated by their local prediction skill, that is, their ability to reproduce at each station the statistical properties and temporal variations of the observed predictand. Skillful predictions for the mean value of the predictand over a given spatial domain can, however, also be of major interest. For instance, in hydrological impact studies related to water resources, this especially applies for the mean areal precipitation over the considered territory. Aggregation (temporal or spatial) is actually expected to increase the skill of predictions (Buishand and Brandsma 2001; Gangopadhyay et al. 2004). The rather low prediction skill classically obtained for local precipitation is mainly due to their small-scale variability, which makes them very noisy. Aggregation allows reducing this small-scale variability by smoothing, leading in turn to increase the prediction skill.

If the proportionate relationship between prediction skill and spatial scale is well known in both weather and climate prediction communities, it is not so well known for hydrometeorologists, who often focus on small case studies and apply SDMs for the prediction of weather variables at single sites. In this context and to our knowledge, this issue was only really explored in the studies of Gangopadhyay et al. (2004) and Mezghani and Hingray (2009) for the very limited spatial context of small-sized river basins. In Gangopadhyay et al. (2004), the aggregation–skill dependency was presented for the upper Colorado River basin from a suite of circular aggregation domains with increasing size (with radii up to 150 km). In Mezghani and Hingray (2009), it was illustrated for the upper Rhone River basin with daily predictions issued in turn at the scales of the gauges, subbasins, and of the whole basin (5500 km^{2}). The present paper aims to enlarge these analyses in a broader spatial context, namely, the whole French metropolitan territory. We especially take the benefit of the high-resolution precipitation reanalysis available for this domain. This dataset allows for a rather accurate assessment of the aggregation–skill dependency. It additionally allows exploring how this relationship potentially depends on the region, on the season, and other factors.

In our work, downscaled precipitation estimates are obtained using an analog approach where analogs of the current generation day are searched for in a historical database on the basis of a similarity criterion applied to selected large-scale atmospheric variables. Precipitation observed for the *k* best analog situations are then used as probabilistic precipitation scenarios for the current day. Analog approaches have been widely used in recent years for the generation of multivariate daily weather variables at multiple sites (e.g., Buishand and Brandsma 2001; Gangopadhyay et al. 2005). Their major advantage is that they do not require restrictive assumptions concerning the joint distribution of the different predictands. As surface weather variables are sampled simultaneously from historical records for a given analog day, the generated fields are thus physically and spatially realistic and consistent within each day (because already observed). For metropolitan France, Chardon et al. (2014) showed that a unique analog circulation model can give a quasi-optimal prediction of local precipitation over large areas (up to 500 km wide). This analog model (AM) is further used in the present work to evaluate how the skill of a mean areal precipitation prediction depends on the spatial aggregation area.

This paper is organized as follows: section 2 describes the data, the AM, and the evaluation methodology. An analysis of the sensitivity of the prediction skill to spatial aggregation is carried out in section 3 and discussed in section 4 within several simulation frameworks. Our conclusions are presented in section 5.

## 2. Data, model, and evaluation

### a. Data

#### 1) Predictand

The predictand corresponds here to the daily total precipitation aggregated over a given territory in France. The mean areal precipitation is derived from the Safran (Quintana-Segui et al. 2008; Vidal et al. 2010) near-surface meteorological reanalysis that uses a grid covering metropolitan France (using the Lambert conformal conic system), made up of 8981 grid cells each measuring 8 × 8 km^{2}. For a given grid cell *p*, the spatial aggregation level (SAL) is defined as the number *n* of grid cells to be aggregated along each cardinal direction from *p*. The projections of the outer sides of the outermost grid cells defined by a given SAL delimit a square territory containing a number of grid cells equal to (2*n* + 1)^{2}. For example, the territories corresponding to SAL values equal to 9, 23, and 37 are illustrated in Fig. 1. The mean areal precipitation derived from an SAL equal to *n* thus consists of averaging the (2*n* + 1)^{2} precipitation values extracted from Safran grid cells located in the grid pad centered on *p*. Note that for a given grid cell *p*, some grid cells in the grid pad may have no precipitation data (grid cells located on the ocean or outside the boundaries of the country). For a given SAL and grid cell *p*, the time series of mean areal precipitation is only determined when at least 75% of grid cells within the corresponding grid pad have precipitation data. For a given SAL, a number of grid cells located in the bordering parts of France are thus excluded from the analysis (this number is larger for larger SALs). Finally, note that an SAL equal to 0 corresponds to a territory including only the single grid cell *p*. It thus corresponds to the raw 8 × 8 km^{2} Safran data.

#### 2) Predictors

The large-scale atmospheric predictors used for the identification of the analog dates are geopotentials fields at 1000 and 500 hPa from the 40-yr European Centre for Medium-Range Weather Forecasts (ECMWF) Re-Analysis (ERA-40; Uppala et al. 2005), with a 1.125° × 1.125° spatial resolution.

### b. Model

The AM applied in this study was initially developed for probabilistic quantitative precipitation forecasts (Obled et al. 2002; Bontron and Obled 2005; Marty et al. 2012, 2013) on several catchments in France. Recently, Chardon et al. (2014) evaluated its application for the generation of spatially coherent precipitation fields over the whole French metropolitan territory for hydrological impact studies. For a given day, a prediction follows four steps:

A seasonal filter is first applied in order to keep only candidates that belong to the same time of year using a sliding window of ±30 calendar days centered on the target date. The target date and neighboring days in a temporal window of ±5 days are excluded as candidates. The archive period is 1982–2001.

A similarity criterion between the target date and each candidate day is then computed. We chose the Teweles–Wobus score (TWS; Teweles and Wobus 1954) applied to the geopotential fields at 1000 and 500 hPa at +12 and +24 h UTC, respectively (Bontron 2004). To be coherent with past studies using the analog method (e.g., Marty et al. 2012), we have used the units (h). In addition, UTC indicates that geopotential fields taken at 1200 and 2400 (corresponding to 0000 of the day after) are used to identify the analog dates. It compares the shapes of geopotential fields, thus giving information on the large-scale atmospheric circulation and in turn on the origin of air masses.

The candidate days are then sorted according to the TWS, and the 25 nearest analog dates are retained.

The precipitation predictive for the target date is obtained from the empirical distribution of the precipitation values of the 25 nearest analogs.

The geopotential analogy domain has been optimized by maximizing the average prediction skill of the model over all Safran grid cells of the domain [the prediction skill is expressed by the continuous ranked probability skill score (CRPSS) described in section 2c]. The optimization was carried out for the period 1982–2001 for the prediction of daily precipitation only. The process for the spatial domain optimization is based on a growing rectangular domain algorithm described in Chardon et al. (2014, section 2b). The spatial limits of the resulting analogy domain are given in Fig. 1.

A single analog SDM is thus applied for all predictions. For a given prediction day, the same 25 analog dates are thus used to determine the 25 predicted values required for every Safran grid cell and every SAL.

### c. Evaluation

As each prediction is based on an ensemble of 25 analog dates, the criterion used for the evaluation of the AM is a probabilistic score. Inspired by common practice for meteorologists in ensemble prediction system (EPS) evaluation, we use a skill score based on the expected continuous ranked probability score (CRPS), called , introduced by Brown (1974) and Matheson and Winkler (1976). For a given grid cell, the obtained by an EPS denoted corresponds to

where and denote the cumulative distribution function (CDF) of the observation and the CDF derived from for prediction *i*, respectively; *x* denotes the predictand quantiles of the CDFs; and *M* is the number of predictions issued (i.e., all days of the considered *y*-yr period in the case of an annual analysis, the subset of days belonging to the considered season in case of a seasonal analysis). Note also that corresponds to the Heaviside function, where if and otherwise. A graphical interpretation of the CRPS for a given prediction *i* is illustrated in Chardon et al. (2014, their Fig. 1). Note that in our case, the CRPS was estimated for each prediction day between the aggregated precipitation value “observed” for this day and the 25 aggregated precipitation estimates obtained for the same grid pad from the 25 analog dates of this day. This estimation was done in turn for all locations and all aggregation levels, allowing estimation of the mean CRPS value for all these configurations.

An advantage of the is that it can give properties of an EPS through the decomposition proposed by Hersbach (2000):

where , , and corresponds to the terms of reliability, resolution, and uncertainty, respectively. The reliability term evaluates the statistical coherence between the predictions issued by and the observations. In practice, this consists of checking a posteriori that the *p*th percentile of the prediction is exceeded in % of the observations. The reliability term is positive and negatively oriented. When = 0, is said to be statistically reliable; higher values of indicate less reliability.

The resolution term expresses the ability of to discriminate the events with respect to those of a reference climatological EPS denoted , which basically predicts the same empirical CDF of the predictand for each prediction. In the present work, is simply based on a calendar climatology defined for each prediction day by the precipitation distribution of the days belonging to a seasonal window (±30 days) centered on the corresponding calendar day. The resolution term is a positively oriented score, that is, the larger it is, the more discriminates the events.

The uncertainty term of Hersbach’s decomposition corresponds to the that would be obtained with . By construction, is a reliable EPS. Nevertheless, it does not discriminate the events as a similar predictive is used for all prediction days belonging to the same seasonal context. Note that the climatological EPS is gridcell location dependent. In the present study, different EPSs were also determined for a given grid cell, corresponding to the different SALs considered for the predictand.

To spatially compare the prediction skill between two grid cells and between two SALs, we need to normalize the obtained by the AM by dividing it by the one obtained by (i.e., ). This leads to the CRPSS, which is expressed as

This score can be expressed as a function of the reliability, resolution, and uncertainty terms as

In practice, the was implemented by the NCAR Research Applications Laboratory into an R package named verification (available at http://cran.r-project.org/web/packages/verification/index.html). The function that we used for its computation is crpsDecomposition.R implementing Hersbach’s decomposition.

## 3. Results

### a. Prediction skill for local precipitation

The skill of the AM for local precipitation predictions (i.e., SAL equals 0) presents significant spatial variations over the whole French metropolitan domain, mainly driven by the topography (Fig. 2a). The highest CRPSS values, close to 0.4, are observed in the western part of the Massif Central mountains and in the northern Alps. Lower prediction skills, varying around 0.28, are observed in the plains and can reach 0.33 along the Atlantic coast. The lowest prediction skill is observed along the Mediterranean coast with values of CRPSS less than 0.20. This has to be related to the high variability of precipitation in this region characterized by a coefficient of variation ranging between 2.25 and 4.5 compared to lower than 2.25 for the other Safran grid cells (not shown).

The prediction skill also depends on the season. As classically obtained for midlatitudes as a result of more frequent convective precipitation in summer (e.g., Ayar et al. 2015), the CRPSS is, for instance, lower in summer than in winter (Fig. 3). Whatever the season, however, the spatial pattern of the prediction skill is roughly the same than that discussed for the annual analysis (cf. Fig. 2a): the prediction skill is considerably lower in the southeast.

Terms of the Hersbach’s decomposition are presented in the different maps of Fig. 2. A noticeable point is that the reliability term is negligible or fairly negligible compared to the other terms. For the annual evaluation and for a large number of grid cells, it is lower than 0.01 in Fig. 2b, whereas and (in Figs. 2c,d, respectively) are both for a majority of grid cells up to 0.2. This also holds for seasonal analyses (not shown here). Whatever the season, the AM can thus be considered as a reliable EPS and the CRPSS can be approximated as the ratio between the resolution and uncertainty terms:

As a result, the spatial distribution of the CRPSS only roughly depends on those obtained for the resolution and uncertainty terms. It is actually highly correlated to that for the resolution: regions with high values of correspond to those with high values of CRPSS (e.g., northern Alps, western part of the Massif Central mountains, and Atlantic coast), and vice versa.

### b. Sensitivity to spatial aggregation level

The aggregation–prediction skill relation for the French domain is shown in Fig. 4 with different CRPSS maps obtained for different values of SAL varying between 0 and 23. The higher the SAL, that is, the larger the territory considered for the aggregation, the higher the prediction skill is of the AM for every region in France. The CRPSS increases on average from 0.3 to 0.4 for values of SAL increasing from 0 to 23. No optimum is observed a priori. The increase in CRPSS actually results from both an increase in the resolution term and a decrease in the uncertainty term (Figs. 5b,c).

The aggregation–prediction skill relation depends on the region. In the example of Fig. 6a, the skill increase between SAL = 0 and SAL = 8 is larger in the south (up to 0.1) than in the north (about 0.05). Results also depend on the season. For the same aggregation configuration (from SAL = 0 to SAL = 8), the skill increase is higher in summer (in Fig. 6a; around 0.04 and 0.10) than in winter (around 0.01 and 0.06).

As shown by Fig. 6, regions where the CRPSS increase is the largest roughly correspond to regions where the spatial correlation in daily precipitation is the smallest. The same actually applies for the different seasons. For a given aggregation level, the higher skill increase obtained in summer can be related to the largest spatial variability of precipitation in this season.

## 4. Spatial correlation and skill score

The influence of the spatial correlation on the aggregation–prediction skill relation is further explored in the present section. We first consider an empirical analysis based on a “space-shake experiment.” The underlying mechanisms of this effect are further highlighted with synthetic predictions simulated using a stochastic spatiotemporal generator.

### a. Space-shake experiment

One way to evaluate the influence of spatial correlation would be to predict the mean areal precipitation that results from spatially uncorrelated precipitation. To do this, we built a new Safran dataset in which the precipitation time series of each grid cell was randomly assigned to another grid cell. Consequently, the spatial correlation between two neighboring grid cells is smaller in this new Safran dataset—referred to hereafter as “random Safran”—than in the raw Safran dataset.

The AM better predicts the mean areal precipitation with this random Safran dataset than with the original Safran dataset. Figure 7 presents results obtained for an SAL equal to 3. The CRPSS obtained in the random case varies from 0.45 to 0.53, whereas it varies from 0.2 to 0.35 in the original one. The prediction skill is even higher with this SAL than the prediction skill obtained with the highest considered SAL in the original Safran dataset (the CRPSS for an SAL equal to 23 is roughly equal to 0.45 in the original). This is mainly due to the differences in the uncertainty term, the values of being much smaller (inferior to 0.70 in Fig. 7c) in the random dataset.

This space-shake experiment is obviously equivalent to Monte Carlo simulations where each simulation consists in a random sampling of (2*n* + 1)^{2} grid cells from the whole French domain, from which the aggregation exercise is next performed (as we have 8981 grid points, the randomized Safran dataset corresponds to 8981 Monte Carlo simulations). There is therefore obviously no more geographical information in the maps presented for the results of these simulations. The CRPSS obtained for any aggregation level greater than one is thus also expected to be rather constant (obviously in space but also from one aggregation level to the other). Whatever the aggregation level, the (2*n* + 1)^{2} grid cells are actually a sample of the whole French domain. Even for a small aggregation level (say, 49 grid cells when SAL = 3), the aggregated precipitation from this experiment is thus expected to give an estimate of the mean precipitation signal from the whole French territory, that is, an estimate of a mean areal precipitation over a large aggregation domain. The high CRPSS values already obtained for an SAL of 3 in Fig. 7a were thus expected, and the fact that they are even higher than those obtained for the highest levels of aggregation in the original dataset is not a surprise (cf. Fig. 4f).

### b. Simulation experiment

The stochastic spatiotemporal generator we introduce in this section can be considered to be an SDM and will hereafter be denoted . The generation of space–time precipitation fields is not straightforward, especially given the intermittent nature of precipitation (leading to a mass point in zero), the skewed distribution of nonzero events, and the complex spatial structures that depend on the large-scale circulation type of the day (Leblois and Creutin 2013). For the sake of simplicity and without losing generality, the predictand is here represented by a Gaussian field. The generated fields are obviously not expected to represent precipitation fields. However, such a generation framework allows us to very easily control the quantities of interest like correlation and highlight the likely importance of such factors in the real data world.

In this section, the predictand and the spatiotemporal Gaussian generator will first be defined. The influence of certain regional parameters of on the prediction skill (i.e., the CRPSS and Hersbach’s decomposition terms) will then be evaluated. The role played by spatial correlation will finally be presented for the prediction of a mean areal variable.

#### 1) Experimental setup

In the following, several variables corresponding to the predictand and the scenarios issued by will be introduced. In accordance with standard practice in statistical literature, a random variable is denoted by an uppercase letter *X* whereas a realization of a random variable is indicated by a lowercase letter *x*. Variable denotes the random variable corresponding to the prediction of the random variable *X*. In this study, the letters *M*, *S*, and *N* correspond to the number of predictions issued, the number of grid cells for which a variable is generated, and the number of members of a probabilistic prediction, respectively. Indexes *i*, *s*, and *n* denote a given prediction, a given grid cell, and a given member, respectively.

##### (i) Predictand modeling

For each issued prediction *i*, we consider that the predictand is isotropic and second-order stationary for every grid cell *s* of the considered region. The spatial expected value of the predictand field varies for each prediction day *i*. From , a Gaussian field —for which the expected field is equal to 0 and the covariance matrix does not vary from day to day—is added in order to obtain the issued field :

where *N* denotes the Gaussian distribution. Here, is a realization of the signal driven by an ensemble of large-scale predictors through a downscaling link.^{1} The variable follows a standard normal distribution defined by

where corresponds to the variance of and is set to 1 for sake of simplicity.

The spatial Gaussian field structure of is defined by an exponential variogram as follows:

where and correspond to the variance and the range of the variogram, respectively.

In the following, the variable is referred to as the regional signal whereas the variable is referred to as the local field characterizing the local spatial structure and variability of the predictand.

##### (ii) The spatiotemporal Gaussian generator

Similar to the Gaussian field defined in Eq. (6), gives, for each prediction *i*, an ensemble of *N* spatial fields from the vector resulting from the combination of a mean signal over the whole territory to which a local spatial field generated from the vector is added:

where the curly brackets indicate a set of values, and is a realization of the signal representing the regional mean signal issued by . For a given prediction *i*, predicts an ensemble of *N* regional scenarios that are constant over the whole region. The expected regional error between and is equal to , such that:

The variable represents the regional mean error. It is distributed according to a Gaussian distribution of expectation and variance equal to 0 and , respectively.

For each prediction *i*, the generation algorithm is the one illustrated in Fig. 8.

Step 1: A regional mean error is generated from the distribution .

- Step 3: For each regional scenario (where the curly brackets indicate a set of values) a local field is generated over the
*S*grid cells from the random vector : In this experiment, we assume that the local spatial statistical characteristics of the issued fields are similar to those of the predictand defined in Eq. (8), that is, the covariance matrix and are similar. Scenarios generated for a given grid cell*s*and a given prediction*i*thus follow the distribution:

#### 2) Sensitivity to regional parameters

The parameter set of the experiment is the following:

the standard deviation of the regional error ,

the standard deviation between the regional means of the issued scenarios (controlling the dispersion of the regional scenarios around their expectation),

the standard deviation of the local predictand , and

the range of the variogram defining the statistical structure of the field around its mean regional value.

In the following, the spatiotemporal Gaussian generator (i.e., ) is used to evaluate the sensitivity of the prediction skill to the parameters listed above. The prediction skill of is evaluated following an approach similar to the one used in section 3b for the AM. This study is conducted along a one-dimensional axis composed of 25 regularly and systematically spaced grid cells. The number of issued predictions equals 10 000. For each grid cell *s* and each prediction *i*, the issued distribution is built from a number *N* of 50 scenarios generated by .

The ability of to predict the regional mean of the predictand is first evaluated. This corresponds to evaluating the ability of to predict the regional signal defined in Eq. (7). For each prediction *i*, the variable predicted by is the one for which the distribution is defined in Eq. (11). The prediction skill thus depends only on the standard deviation of the regional mean error and the standard deviation between the issued scenarios.

Figures 9a–c illustrate the reliability term, the resolution term, and the CRPSS for different values of and , respectively. In Fig. 9a, a reliability term value equal to 0 is obtained on the first bisector when equals . In this case, the generator is reliable. On each side of the first bisector, increases, indicating a decrease in reliability. For a given value of , increases with increasing difference between and and vice versa. The resolution term decreases with increasing and/or (Fig. 9b). Both the resolution and reliability terms are more sensitive to the standard deviation of the regional error than to . As observed in Fig. 9c, this impacts the CRPSS for which the value remains positive for larger values of compared to .

In Fig. 9, two classical behaviors are illustrated:

Increasing the value of for a given value of stretches the envelope of the issued scenarios. The majority of observations are found around the median of the issued distributions. Conversely, they are rarely found at the tails. Here is not reliable. Moreover, the larger the prediction standard deviation, the smaller is, characterizing a lower ability of to discriminate the different events. In that case, is said to be overdispersive.

Increasing the value of for a given value of puts the majority of observations outside the issued distribution or on the tails. Reliability is not achieved because observations are not uniformly distributed in the issued distributions. For values of smaller than those of , is said to be underdispersive. As pointed out by Hersbach (2000), the resolution term is very sensitive to observations found outside the range of values covered by the issued distributions. If this occurs frequently, will be low and the CRPSS will be very poor.

The results found in section 3b show that the AM can be regarded as a reliable SDM. In the following, we therefore only consider a reliable generator , that is, = . Considering a reliability term equal to 0, CRPSS becomes equal to the ratio between the resolution term and the uncertainty term [cf. Eq. (5)]. These terms are represented in Fig. 10 for a reliable . In this simulation experiment, is equal to 0.56 and does not vary as it does not depend on . Conversely, decreases linearly down to 0 as increases to 1, leading to a CRPSS equal to 0 (cf. appendix A). For larger values of , CRPSS becomes negative and no gain is obtained compared to a climatological EPS . In other words, the reliable SDM considered here performs better than a climatological EPS when the variability of the regional mean error is lower than the variability of the predictand.

This behavior can theoretically be found when the issued distribution follows a Gaussian distribution. For such a configuration, Gneiting et al. (2005) developed an analytic expression for the CRPS for a given prediction. This expression can be used to deduce the expected value for a reliable SDM . Appendix B shows that the is equal to , which corresponds here to —around 0.56—for the climatological EPS based on the climatology with a standard deviation equal to 1. Following Eq. (3), CRPSS thus becomes equal to

As the CRPSS values obtained for the AM were found to be positive (see Fig. 2a), we decided, for the sake of simplicity, to retain a pair of parameters that also leads to a positive CRPSS value, namely, = = 0.1.

#### 3) Sensitivity to spatial aggregation and correlation

To determine the sensitivity of prediction skill to both spatial aggregation and correlation, we now consider the spatial distribution of the predictand . For each scenario *n*, the experimental protocol consists of aggregating the issued scenarios over a number *K* of neighboring grid cells and in evaluating the behavior of the different evaluation criteria. In this section, the only varying parameters are the range of the exponential variogram [cf. Eq. (8)] and the number *K* of local aggregated variables. For a prediction day *i*, the random variable deduced from the aggregation of the predictand corresponding to the *K* nearest grid cells centered on a given grid cell *s* is represented by :

where represents the *K* nearest grid cells centered on grid cell *s*.

Similarly, for a given scenario *n*, the variable represents the aggregation of corresponding to the *K* nearest grid cells centered on grid cell *s*:

From the values of and defined in section 4b(2), the simulation is carried out following a one-dimensional axis composed of 25 grid cells. Note that a 2D simulation leads to similar results, as only the number of grid cells *K* considered for the aggregation influences the results. For this study, the value of was set to 0.5.

Figure 11 shows the resolution term, the uncertainty term, and the CRPSS versus the number *K* of aggregated grid cells for different ranges of the exponential variogram for the case of a reliable generator . For every tested value of , the results are similar to those found in section 3b for the AM, that is, the higher the number *K* of aggregated grid cells, the higher is the prediction skill in terms of CRPSS. This comes jointly from the increase of the resolution term and the decrease of the uncertainty term with the aggregation level.

Let us now consider the following two limiting cases:

These two limiting cases are represented in Fig. 11 by the dashed black lines. The small difference observed between the theoretical case (dashed black lines) and the curve resulting from simulations (in blue) comes from the rounding applied by the algorithm. If the predictand is spatially fully correlated, the variance of the variable does not depend on the number *K* of aggregated grid cells and remains constant, like the CRPSS [Eq. (18)]. On the other hand, if the predictand is spatially fully uncorrelated [cf. Eq. (20)], as the number *K* of aggregated grid cells increases, the variance of the variable continuously decreases, thus leading to a CRPSS increase.

In section 3b, the results suggested that the increase in prediction skill with spatial aggregation is linked to the spatial correlation level of the precipitation. This is confirmed by the simulation carried out here.

### c. Consideration of a spatial heterogeneous downscaling link

Results of experiments using the AM (cf. section 3b) and the spatiotemporal stochastic Gaussian generator (cf. section 4b) suggest that the prediction skill continuously and indefinitely increases with the SAL. However, these experiments implicitly assume that the predictands located between two distant grid cells are explained by the same large-scale information. In the case of the Gaussian generator, the prediction skill is moreover assumed to be spatially homogeneous, which is not the case in reality. This section attempts to deal with this aspect.

Synoptic predictors of precipitation depend a priori on the considered region. This is especially expected to be the case for regions located on opposite sides of large topographical barriers. A downscaling link optimized for one side of the barrier may thus not be appropriate for prediction on the other side. For example, Chardon et al. (2014) showed that an AM (based on the same analogy) optimized for the southeast of France was not well suited for the prediction of precipitation in the western part of France, because of the presence of the Massif Central mountains in between.

In the present section, our goal is to evaluate the prediction skill behavior when an AM optimized for one region is used to predict a mean areal precipitation estimated over a region under heterogeneous meteorological influences. To do this, we use the same AM introduced by Chardon et al. (2014), optimized for the prediction of a given precipitation grid cell (SE) located in the southeast of France (cf. Figs. 12a,b). In the following, this AM will be denoted , whereas the AM optimized for the whole of France, introduced in section 2b, will from now on be denoted . Both of these AMs are used to predict precipitation in the southeast quarter of France (SE; cf. Figs. 12a,b).

Similarly to the case in section 4b(3), is used to predict a mean areal precipitation estimated from precipitation spread along a one-dimensional axis represented here by a west–east transect centered on the SE grid cell (delimited by the black rectangle in Fig. 12b). Over this transect, three types of aggregation are carried out:

Aggregation toward west and east: The predictand corresponds to the aggregation of the precipitation of the

*n*nearest grid cells centered on the SE grid cell and extending on each side (west and east) of it.Aggregation toward west: The predictand is the aggregation of the precipitation of the

*n*nearest grid cells extending on the west side of the SE grid cell.Aggregation toward east: The predictand is the aggregation of the precipitation of the

*n*nearest grid cells spread on the east side of the SE grid cell.

Figure 12c represents the CRPSS as a function of the Safran precipitation grid cell added for the three considered aggregation types. Results obtained for an aggregation in both directions (west and east) are symmetrically presented for both directions (red curves). In this figure, CRPSS values obtained for (solid lines) are compared to those of (dashed lines). Figure 12b represents the spatial distribution of the CRPSS observed when is used for the local prediction of the Safran precipitation (without aggregation).

Similarly to Fig. 12c, Fig. 12d represents the CRPSS obtained for and for three other aggregation types along a south–north transect centered on the SE grid cell (cf. Figs. 12a,b). A similar color code characterizes the three aggregations, that is, red for aggregation toward south and north, orange for aggregation toward south, and blue for aggregation toward north. Gray solid and dashed lines respectively represent the prediction skill of and for the prediction of the local Safran precipitation.

In Figs. 12c and 12d, the gray lines show that the CRPSS is variable in space for both and . For example, the CRPSS decreases when going south to the SE grid cell, whereas it is rather stable when going north. Skill obtained for the prediction of a mean areal precipitation thus results from a compromise between the local prediction skill variability and the beneficial effect of the aggregation process in a homogeneous region:

If the local prediction skill of the grid cells neighboring the SE grid cell is lower than that of the SE grid cell, the negative effect of the lower prediction skill for the neighboring grid cells can easily be compensated by the positive aggregation effect. This can be observed for and when they are used to predict a precipitation spatially aggregated toward the north (cf. Fig. 12d) or the east (cf. Fig. 12c). When the aggregation is west oriented (cf. Fig. 12c), this result is only found for .

If the local prediction skill of the grid cells neighboring the SE grid cell is much lower than that of the SE grid cell, the negative effect of the lower prediction skill of the neighboring grid cells may be preponderant and may cancel the positive effect of the aggregation. This is especially the case for and when the aggregation is south oriented (Fig. 12d).

In some cases where the local CRPSS varies highly, an optimum prediction skill can be obtained for a given spatial aggregation level. This is, for instance, the case for for a west-oriented aggregation (Fig. 12c). From the first to the twelfth grid cell, the local prediction skill of the neighboring grid cells is equivalent to that of the SE grid cell: the positive effect of the good local prediction skill is added to the beneficial effect of the aggregation process. From the thirtieth grid cell, the local CRPSS decreases greatly (from 0.34 down to 0.25) and is even lower than that obtained by the SE grid cell. The negative effect of the lower CRPSS for grid cells 13–26 is here preponderant with respect to the positive effect of the aggregation, thus leading to an optimum CRPSS when only the twelve first grid cells are considered for the aggregation.

## 5. Conclusions

The sensitivity of mean areal precipitation prediction skill to spatial aggregation has been evaluated in this paper using an analog model (AM) optimized for the prediction of local Safran precipitation for the whole of France. It appears that prediction skill continuously increases with the spatial aggregation level (SAL). This increase is sensitive to the spatial correlation of precipitation of the considered region and/or season, as it is found that the prediction skill increase is higher from an uncorrelated precipitation field than from a correlated precipitation field.

For a better comprehension of the role of the spatial correlation, a simulation using a spatiotemporal stochastic generator was carried out to predict stationary Gaussian fields. As the AM was found to be a reliable SDM, the control of the parameters allowed us to build a reliable generator. An analysis of the sensitivity of predictions to the spatial aggregation and spatial correlation levels has shown that the prediction skill increase comes from both the decrease of predictand variability due to the aggregation process and from the higher resolution of the AM (and the lower uncertainty term). Large-scale predictors thus provide a better explanation of mean areal precipitation because the latter constitutes a more robust precipitation signal that is more closely related to predictor variability.

Experiments carried out in this study show results similar to those obtained by Gangopadhyay et al. (2004) and Mezghani and Hingray (2009), that is, the prediction skill increases with the spatial aggregation level. However, when a unique downscaling link is used to predict precipitation, the prediction skill does not a priori increase indefinitely with aggregation. This is particularly the case when the mean areal precipitation comes from local precipitation with underlying meteorological processes different from those represented in the downscaling link, a situation that decreases prediction skill. For the prediction of mean areal precipitation, optimum prediction skill comes from a compromise between the beneficial effect of aggregation, which reveals a more robust regional precipitation signal, and the adverse effect of including heterogeneous underlying meteorological processes.

Our work may finally also give some guidance for practical applications with SDMs. In most SDM developments, the parameters of the model (predictors, analogy domain, type of downscaling link, etc.) are classically estimated from punctual data for each station individually. Our analysis shows that the prediction skill increases with the aggregation level. This means that the errors of prediction are lower after spatial averaging. This also means that the link between the regional precipitation signal and the large-scale predictors is improved by the aggregation process by smoothing the noise coming from local precipitation variability. This also suggests that the statistical relationship is more robust. Considering spatially average variables should therefore be paid more attention in the development process of SDMs: it is expected in most cases to increase the robustness of the downscaling links and in turn the skill of predictions.

## Acknowledgments

The authors want to specially thank J. Blanchet for her relevant suggestions concerning the writing of the spatiotemporal Gaussian generator. We also thank three anonymous reviewers for their interesting comments on the previous versions of the manuscript.

### APPENDIX A

#### Prediction Skill in Terms of Average CRPS when the Standard Deviations of the Regional Error and Regional Scenarios are Equal to that of the Regional Signal of the Predictand

If the standard deviation of the regional mean error and the standard deviation of the generated regional scenarios are equal to that of the regional signal to predict, does not outperform . For a given prediction *i*, the distribution of the regional scenarios is represented by the Gaussian law . The obtained by for the prediction of is

Subtracting the mean of the regional scenarios from the predictor and the predictand, the becomes

As and both follow the same Gaussian distribution when and are equal to 1, the can be written as follows:

where represents a climatological EPS based on the climatology of .

### APPENDIX B

#### Expression of the CRPS for a Reliable SDM

Gneiting et al. (2005) expressed the CRPS for a given prediction following a Gaussian law of mean *μ* and variance as follows:

where and *ψ* correspond to the CDF and the probability density function (PDF) of a standard Gaussian law, respectively. Variable *y* is the realization of the predictand *Y*, and represents the normalized mean error of the prediction.

In this paper, *μ* corresponds to the sum of two normally distributed variables and defined in Eqs. (7) and (11). As our generator is considered as reliable, = [cf. section 4b(2)]. Replacing *y* by , *μ* by , and in Eq. (B1), the expected value of the CRPS can be expressed in three terms:

with

As the variable follows a Gaussian distribution of mean 0 and , *A* becomes equal to

According to Stein’s lemma (Stein 1981), if *F* is a function for which the two expected values and exist, then

where *x* is a realization of a random normally distributed variable of mean *μ* and variance . Replacing *F* by , by *ψ*, and *X* by the variable of mean 0 and variance 1, Eq. (B5) becomes

Introducing the equality of Eq. (B6) in Eq. (B4), *A* becomes equal to *B* and the can thus be expressed as

where

For a variable *X* following a standard Gaussian law, the expected value in Eq. (B8) is deduced using the Gauss integral,

Variable *D* becomes equal to and the expected value of CRPS for a reliable generator that predicts a Gaussian law finally becomes

## REFERENCES

## Footnotes

^{1}

The reader should take care not to confuse the variable with the number of predictions *M*. Throughout this section, the letter corresponds to the uppercase of Greek letter *μ*.