## Abstract

The paper presents an approach for conditional airmass classification based on local precipitation rate distributions. The method seeks, within the potential region, three-dimensional atmospheric predictor domains with high impact on the local-scale phenomena. These predictor domains are derived by an algorithm consisting of a clustering method, namely, self-organizing maps, and a nonlinear optimization method, simulated annealing. The findings show that the resulting spatial structures can be attributed to well-known atmospheric processes. Since the optimized predictor domains probably contain relevant information for precipitation generation, these grid points may also be potential inputs for nonlinear downscaling methods. Based on this assumption, the potential of these optimized large-scale predictors for downscaling has been investigated by applying an artificial neural network as a nonparametric statistical downscaling model. Compared to preset local predictors, using the optimized predictors improves the accuracy of the downscaled time series, particularly in summer and autumn. However, optimizing predictors by a conditional classification does not guarantee that a predictor increases the explained variance of the downscaling model. To study the contribution of each predictor to the output variance, either individually or by interactions with other parameters, the sources of uncertainty have been estimated by global sensitivity analysis, which provides model-free sensitivity measures. It is shown that predictor interactions play an important part in the modeling process and should be taken into account in the predictor screening.

## 1. Introduction

Statistical downscaling has become a well-established tool in regional and local impact assessments over the past few years. Precipitation downscaling is particularly important for impact studies, as it corrects the spatial and temporal structure of precipitation from coarse models. To resolve the scale discrepancy between global circulation models (GCMs) and local-scale weather, a vast number of algorithms and methods have been proposed and extensively tested. A comprehensive review of different precipitation downscaling approaches and their advantages and drawbacks is given by Maraun et al. (2010). Despite the extensive research on downscaling methods, there is still little consensus on the choice of useful atmospheric predictor variables (Wilby and Wigley 2000). Besides the general decision of a proper statistical downscaling model, the selection of an *informative predictor* set is crucial for the accuracy and stability of the resulting downscaled time series (Fealy and Sweeney 2007; Huth 2004; Fowler et al. 2007; Maraun et al. 2010; Bárdossy et al. 2002).

This applies both to the atmospheric variables and to the predictor domains in terms of geographical location and spatial extent, which in general receive little attention, as noted by Goodess and Palutikof (1998) and Wilby and Wigley (2000). Ideally, suitable predictors for perfect prognosis (PP) approaches should fulfill the following conditions: (i) a stationary relationship between predictors and predictands, (ii) a relationship that remains valid under future climate conditions, (iii) high predictive power (explained variance), and (iv) predictors that are reasonably well simulated by the driving dynamical model (Maraun et al. 2010). For the first two conditions it is important that the statistical relationships found can be understood in physical terms.

Some circulation-related predictors, such as sea level pressure, fulfill most requirements and are therefore often employed as predictors in PP downscaling studies. However, on small scales, the sole use of circulation predictors often turns out to be insufficient to capture the complete variability. Therefore, some studies additionally consider thermodynamical variables and the amount of available water in the atmosphere (e.g., Wilby and Wigley 1997; Murphy 2000; Beckmann and Buishand 2002). Cavazos and Hewitson (2005) assessed the relative skill and error of 29 potential atmospheric predictors from the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) reanalysis and their relevance for precipitation at 15 globally distributed locations. An artificial neural network (ANN) downscaling model was applied, and the most relevant predictors of daily precipitation were then determined by a rotated principal component analysis. Midtropospheric geopotential heights and midtropospheric specific humidity were found to be the dominant predictors at all locations. Most other predictors, however, showed a strong regional and seasonal dependence.

Only a limited number of papers have examined the predictive capability of the domain size or shape and the question to what extent the variability of neighboring grid points influences local-scale events. Wilby and Wigley (2000) examined the spatial relationships between observed daily precipitation and both mean sea level pressure and atmospheric moisture for selected regions in the United States. They found that in many cases the maximum correlation between precipitation and mean sea level pressure is not located directly above the target grid cell, whereas correlations with specific humidity were largest when the predictors were right above the predictand position. Recently, D’onofrio et al. (2010) used a weather pattern classification to optimize the spatial domain of predictors using a set of 17 atmospheric variables. They found that the optimum spatial domain is smaller than the synoptic scale and therefore suggested considering different domains for dynamical and nondynamical predictors.

The approach of this study to derive natural three-dimensional predictor domains was developed for a new weather generator, which is best driven by a number of airmass classes with unique local precipitation distributions. Rather than using a predefined set of potential predictors and fixed regular domains for airmass classification, these factors are optimized by a modified nonlinear optimization algorithm. The algorithm combines a clustering method using nonlinear unsupervised self-organizing maps (SOMs) and a generic simulated annealing (SA) algorithm (see Fig. 1). This concept respects the idea that atmospheric subspaces with irregular spatial boundaries may be more informative for finding linkages between atmospheric characteristics and local phenomena than common rigid domains.

Most of the optimized domains can be interpreted in terms of well-known atmospheric processes. Consequently, one may expect that, if these domains contain relevant information for precipitation generation, their grid points should also be suitable inputs for nonlinear downscaling methods. To establish a nonlinear empirical relationship between the local-scale predictand and the previously optimized predictors, we utilize static artificial neural networks (ANNs), as in many other downscaling studies (e.g., Cavazos and Hewitson 2005; Hewitson and Crane 1996; Schoof and Pryor 2001; Wilby et al. 1998; Coulibaly et al. 2005). However, using a set of optimized predictors still poses the fundamental question of whether all members increase the explained variance of the model’s output, either individually or through interaction with other predictors. So far, no downscaling study has paid much attention to either the interactions of individual variables or grid points or to the different sources of uncertainty in the model input. To achieve a full understanding of the model’s sensitivity pattern, it is necessary to recover the complete variance of the predictand, especially for nonlinear nonadditive models (Saltelli et al. 2006). Hence, a model-free sensitivity measure is required that is valid independently of the degree of linearity. An attractive model-free approach uses variance-based methods, which, unlike methods based on normalized derivatives, can also account for nonlinear nonadditive models. Once a robust ANN model is trained, the first- and higher-order effects are estimated by Monte Carlo (MC)-based global sensitivity analysis (GSA). Taking advantage of the MC runs, we also analyze the model output within specific bounds of interest (*factor mapping*).

The present study intends to show that physically interpretable predictor subspaces within the potential domain can be identified by conditional airmass classification, thereby helping to understand linkages between atmospheric characteristics and local phenomena. In view of the difficulties of nonlinear predictor screening, the paper also addresses the question of whether such predictor sets can improve the accuracy of nonlinear downscaling methods. In doing so, the influence of each predictor is estimated with the objective of evaluating the quality of the predictor set. The proposed approach has been applied to daily precipitation data in the Rhineland region of Germany and evaluated there.

## 2. Predictor selection

The conditional airmass classification combines a nonlinear unsupervised SOM clustering method and a generic probabilistic SA algorithm. The SOM clustering aims to find a predefined number of characteristic classes, such that each day can be assigned unambiguously to one of these classes. Rather than searching for characteristic circulation patterns, the classification is constrained by the local precipitation distributions. The selection of the predictors aims at making the precipitation distributions of every class clearly different. The selection of these predictors is performed by a SA algorithm. Both algorithms are specified in the next two sections.

### a. Self-organizing maps

The self-organizing map is one of many unsupervised clustering algorithms and has become increasingly popular, also in the atmospheric sciences, during recent decades (Crane and Hewitson 2003; Hewitson and Crane 2002; Lynch et al. 2006; Kohonen 1982). Rather than identifying characteristic regions in data space by maximizing the within-group similarity and the difference between groups, SOMs aim to find a predefined number of representative multidimensional prototypes (position vectors) in data space. These prototypes are mapped onto a lower-dimensional map space, called the Kohonen map, which is intended to preserve the topological properties of the input space. This low-dimensional representation is particularly useful for describing and visualizing high-dimensional data. At the beginning of the learning phase, the prototypes are initialized randomly in the data space and are then adjusted iteratively during the learning process.

Let *M* be a set of *μ*_{M} data points, each specified by an *n*-dimensional predictor vector **x**_{i} (e.g., air temperature, specific humidity, etc.):

*M* = {**x**_{i} ∈ ℝ^{n} | *i* = 1, …, *μ*_{M}}.

We can then define another set *N* of prototypes **n**_{i}, each composed of a randomly initialized vector **w**_{i} in the same input data space and a node vector **k**_{i} located on a discrete two-dimensional topological map (Kohonen map):

*N* = {**n**_{i} = (**w**_{i}, **k**_{i}) | *i* = 1, …, *μ*_{N}},

where *μ*_{N} is the desired number of classes. At each iteration step *t* we randomly select one of the predictor vectors **x** and calculate the metric distance between it and each prototype vector **w**_{i}. The prototype **n**_{w} with the least distance is called the “best-matching unit” (BMU). Subsequently, the vectors of the BMU and of all nodes closer than a certain time-dependent distance threshold *δ*^{t} to the winner node **k**_{w} on the topological map will be adjusted. The prototype subset to be updated is thus defined as

*N*′ = {**n**_{i} ∈ *N* | ‖**k**_{i} − **k**_{w}‖ ≤ *δ*^{t}}.

All vectors of this subset are slightly adjusted toward the selected predictor vector **x** according to the Hebb learning rule:

**w**_{i}^{t+1} = **w**_{i}^{t} + *ɛ*^{t} *h*^{t}(**k**_{i}, **k**_{w})(**x** − **w**_{i}^{t}).

The degree of adaptation is determined by the learning rate *ɛ*^{t}, the weight function *h*^{t}, and the distance between the prototype and the predictor vector. During the learning process the learning rate smoothly decreases, and the width of the weight function (a Gaussian update kernel) is reduced by a time-dependent adaptation radius.

Once no more changes in the locations of the prototypes are observed, or the maximum number of iterations is reached, the algorithm terminates. Under ideal conditions the prototypes are arranged such that they span the entire data space, with more prototypes in regions of high data density (Hewitson and Crane 2002; Kohonen 1982). These properties allow SOMs to approximate a nonlinear distribution in predictor space despite the linear distance measure. A disadvantage of SOMs is that the user must specify a large number of parameters, such as the size and shape of the update kernel, the learning rates, and the number of classes (prototypes). After training, each data point is assigned to the closest node.
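As an illustration, the learning loop described above can be sketched in a few lines. This is a minimal sketch, not the implementation used in the study: the map size, the linearly decaying learning rate, the shrinking adaptation radius, and all numerical values are placeholder assumptions.

```python
import math
import random

def train_som(data, grid=(2, 3), n_iter=2000, eps0=0.5, delta0=2.0, seed=0):
    """Minimal SOM sketch: prototypes w live in data space, nodes k on a 2D map."""
    rng = random.Random(seed)
    nodes = [(r, c) for r in range(grid[0]) for c in range(grid[1])]
    dim = len(data[0])
    # Random initialization of the prototype vectors within the data bounds.
    protos = [[rng.uniform(min(x[d] for x in data), max(x[d] for x in data))
               for d in range(dim)] for _ in nodes]
    for t in range(n_iter):
        x = rng.choice(data)  # randomly select one predictor vector
        # Best-matching unit: prototype with the least Euclidean distance.
        bmu = min(range(len(protos)),
                  key=lambda i: sum((protos[i][d] - x[d]) ** 2 for d in range(dim)))
        frac = t / n_iter
        eps = eps0 * (1.0 - frac)                 # smoothly decreasing learning rate
        delta = max(delta0 * (1.0 - frac), 0.5)   # shrinking adaptation radius
        for i, k in enumerate(nodes):
            d_map = math.dist(k, nodes[bmu])      # distance on the Kohonen map
            if d_map <= delta:                    # update only the subset N'
                h = math.exp(-d_map ** 2 / (2 * delta ** 2))  # Gaussian kernel
                for d in range(dim):
                    protos[i][d] += eps * h * (x[d] - protos[i][d])
    return protos, nodes

def assign(x, protos):
    """After training, each data point is assigned to the closest prototype."""
    return min(range(len(protos)),
               key=lambda i: sum((protos[i][d] - x[d]) ** 2 for d in range(len(x))))
```

Because every update moves a prototype a fraction of the way toward a data point, trained prototypes remain inside the bounding box of the data, consistent with the statement that they span the data space.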

### b. Simulated annealing

The problem of finding an optimal predictor subset on a discrete configuration space is an example of combinatorial minimization. Solving this problem with exhaustive search requires exponential computing time; the problem thus belongs to the class of “NP complete” problems. For a large number of grid points in the configuration space, it is impossible to explore all possible combinations; in the case study presented in this paper, the number of possible solutions is 2^{11730}. In many cases such combinatorial problems can be solved, or at least well approximated, by the heuristic probabilistic simulated annealing (SA) algorithm (Kirkpatrick et al. 1983). The concept of simulated annealing was first introduced in numerical calculations by Metropolis et al. (1953) and has since become a widely applicable heuristic optimization technique.

The iterative algorithm starts from a predefined domain configuration by setting “seeds” for each predictor. In each step, the configuration is rearranged by adding or removing one randomly selected grid point, after which the change in energy Δ*E* (cost function, see below) between the old and new configuration is computed. Based on Δ*E*, the system changes its configuration from energy *E*_{1} to energy *E*_{2} with probability

Pr(Δ*E*) = exp(−Δ*E*/*T*) for Δ*E* > 0, and Pr(Δ*E*) = 1 otherwise,

where *T* is a control parameter (an analog of temperature). If Δ*E* ≤ 0, that is, the solution becomes better, the rearrangement is always accepted [Pr(Δ*E*) = 1]. If Δ*E* > 0, that is, the solution becomes worse, the rearrangement is still accepted with the finite probability Pr(Δ*E*) to avoid getting trapped in a local minimum of the cost function. During optimization, *T* is gradually lowered (annealing schedule). The SA algorithm was implemented with the modules detailed below; see Table 1 for an overview of the settings.
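The Metropolis acceptance rule above translates directly into code. A minimal sketch (the function name and interface are illustrative, not from the paper):

```python
import math
import random

def accept(delta_e, temperature, rng=random):
    """Metropolis criterion: always accept improvements (delta_e <= 0);
    accept deteriorations with probability exp(-delta_e / T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)
```

At high temperature almost any deterioration is accepted, which lets the search escape local minima; as *T* is lowered the rule approaches pure greedy descent.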

#### 1) Rearrangements

At the beginning of each iteration step, the algorithm determines whether a new grid point is added or a seeded grid point is removed. The ratio of additions to removals varies in time, oscillating harmonically with a frequency of 0.001 between ratios of 0.7 and 0.3. This strategy reduces the chance that the algorithm gets stuck in a local optimum. When a grid point is added, the chance of selecting one of the directly surrounding grid points (0.9) is higher than that of selecting a farther grid point. If no improvement is observed after 400 consecutive iterations, the configuration is reset to the best configuration found so far.
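The oscillating addition/removal ratio could be realized, for example, as a sine wave; the paper specifies only the frequency (0.001 per iteration) and the bounds (0.3–0.7), so the sinusoidal form and phase below are assumptions:

```python
import math

def add_probability(t, freq=0.001, lo=0.3, hi=0.7):
    """Probability that the move at iteration t is an addition (sketch).
    Oscillates harmonically between `lo` and `hi` with frequency `freq`."""
    mid, amp = (hi + lo) / 2.0, (hi - lo) / 2.0
    return mid + amp * math.sin(2.0 * math.pi * freq * t)
```

With a frequency of 0.001, one full add-heavy/remove-heavy cycle takes 1000 iterations, periodically shrinking and regrowing the domains.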

#### 2) Cost function (energy)

The classification is conditioned on the local-scale precipitation distributions measured at eight weather stations. For every possible pair of classes, a *χ*^{2} test is used to determine whether the two distributions differ significantly at the 95% level. The optimization algorithm maximizes the total number of significantly different distribution pairs. Additionally, the *χ*^{2} values are summed to obtain a measure of how different the distributions are. If two model runs yield the same number of different distributions, the final decision is made on the basis of the summed *χ*^{2} value.
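The cost function can be sketched as follows. The bin edges, the +1 smoothing of empty cells, and the use of a *χ*² contingency test on binned daily amounts are assumptions for illustration; the paper does not specify these details.

```python
import numpy as np
from scipy.stats import chi2_contingency

def cost(class_samples, bins=(0.0, 0.1, 1.0, 5.0, 10.0, np.inf), alpha=0.05):
    """Score a classification: count class pairs whose binned precipitation
    distributions differ significantly (chi-square test at the 95% level),
    and sum the chi-square statistics as a tie-breaker."""
    n_diff, chi_sum = 0, 0.0
    classes = list(class_samples)
    for i in range(len(classes)):
        for j in range(i + 1, len(classes)):
            hist_i, _ = np.histogram(class_samples[classes[i]], bins=bins)
            hist_j, _ = np.histogram(class_samples[classes[j]], bins=bins)
            table = np.vstack([hist_i, hist_j]) + 1  # +1 avoids empty cells
            chi2, p, _, _ = chi2_contingency(table)
            chi_sum += chi2
            if p < alpha:
                n_diff += 1  # this pair of distributions differs significantly
    return n_diff, chi_sum
```

The optimizer would maximize `n_diff` first and use `chi_sum` only to break ties between configurations with the same number of significantly different pairs.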

#### 3) Annealing schedule

The annealing schedule is determined empirically. The temperature is held constant for 2000 reconfigurations, or for 100 successful reconfigurations, so as to approximately reach thermal equilibrium. The temperature is then lowered by a constant cooling rate of 10%. Owing to the large number of grid points, this rate does not guarantee finding the global minimum.
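Combining the acceptance rule with this schedule gives a generic SA skeleton. The sketch below uses the numbers quoted in the text (2000 reconfigurations or 100 successes per level, 10% cooling); everything else, including the toy problem in the usage note, is illustrative.

```python
import math
import random

def simulated_annealing(energy, init, neighbor, t0=1.0,
                        moves_per_level=2000, success_per_level=100,
                        cooling=0.10, n_levels=20, seed=0):
    """Generic SA skeleton: hold T for up to `moves_per_level` reconfigurations
    or `success_per_level` accepted ones, then lower T by a constant 10% rate."""
    rng = random.Random(seed)
    state, e = init, energy(init)
    best, e_best = state, e
    t = t0
    for _ in range(n_levels):
        successes = 0
        for _ in range(moves_per_level):
            cand = neighbor(state, rng)
            de = energy(cand) - e
            # Metropolis criterion: accept improvements always,
            # deteriorations with probability exp(-de / T).
            if de <= 0 or rng.random() < math.exp(-de / t):
                state, e = cand, e + de
                successes += 1
                if e < e_best:
                    best, e_best = state, e
            if successes >= success_per_level:
                break  # approximate thermal equilibrium reached at this level
        t *= (1.0 - cooling)  # constant 10% cooling rate
    return best, e_best
```

For example, minimizing the toy energy (*x* − 3)² over the integers with ±1 neighborhood moves converges to *x* = 3 despite the initially high acceptance of uphill moves.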

## 3. Global sensitivity analysis

In general, sensitivity analysis permits inferences on the different sources of uncertainty in the predictors (inputs) by decomposing the variance of the model output. The most common approaches use derivatives to assess the relative importance of individual predictors. To investigate the nonlinear and nonadditive behavior of predictors, however, it is important to also estimate higher-order effects (interactions). Such effects are efficiently estimated by decomposing the output variance with a global sensitivity analysis (GSA). This section briefly describes how model-free sensitivity measures are derived from variance-based methods. Let us assume a generic model *f* (which is replaced by the ANN later in this study),

**Y** = *f*(**X**_{1}, **X**_{2}, …, **X**_{k}),

with the corresponding total or unconditional variance *V*(**Y**). If one predictor **X**_{i} is fixed at a particular value *x*_{i}^{*}, the resulting conditional variance of **Y** is accordingly *V*(**Y** | **X**_{i} = *x*_{i}^{*}). Since this conditional variance is typically smaller than the unconditional variance, it characterizes the relative importance of the predictor **X**_{i}. That this sensitivity measure depends on the chosen value *x*_{i}^{*} makes it rather impractical. Taking, instead, the average of this measure over the valid range of **X**_{i} (with uniform distribution), the undesired dependence disappears (Saltelli et al. 1999, 2006). We obtain the following decomposition:

*V*(**Y**) = *V*[*E*(**Y** | **X**_{i})] + *E*[*V*(**Y** | **X**_{i})],

where the first term on the right-hand side, *V*[*E*(**Y** | **X**_{i})], is called the first-order effect of **X**_{i} on **Y**. The corresponding first-order sensitivity index of **X**_{i} is given by

*S*_{i} = *V*[*E*(**Y** | **X**_{i})]/*V*(**Y**).

This sensitivity index indicates the importance of individual predictors without considering interaction effects. If the model belongs to the class of additive models, the first-order indices sum to one: Σ_{i} *S*_{i} = 1. If this is not the case, the remaining variance must be explained by higher-order effects (interactions) between predictors. Interactions are an important feature, especially of nonlinear nonadditive models. The total sensitivity index *S*_{Ti} of a factor **X**_{i} comprises the first-order term and all higher-order terms in which **X**_{i} participates, thus giving information on the nonadditive character of the model. The *S*_{Ti} can be computed using

*S*_{Ti} = 1 − *V*[*E*(**Y** | **X**_{~i})]/*V*(**Y**) = *E*[*V*(**Y** | **X**_{~i})]/*V*(**Y**),

where **X**_{~i} denotes all predictors except **X**_{i}; that is, all other predictors are fixed and only **X**_{i} varies over its uncertainty range. This approach permits one, even for nonadditive models, to recover the complete variance of **Y**. The indices can be efficiently computed by Monte Carlo–based numerical procedures (Saltelli et al. 1999, 2006).
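One standard Monte Carlo scheme for these indices uses two independent sample matrices and column-swapped hybrids (Saltelli-style estimators). The sketch below assumes independent inputs uniform on [0, 1]; it is a generic illustration, not the procedure used in the paper.

```python
import numpy as np

def sobol_indices(model, k, n=20000, seed=0):
    """Monte Carlo estimates of first-order (S_i) and total-order (S_Ti)
    sensitivity indices for `model`, a function of an (n, k) input array."""
    rng = np.random.default_rng(seed)
    a = rng.random((n, k))          # two independent sample matrices
    b = rng.random((n, k))
    fa, fb = model(a), model(b)
    var = np.var(np.concatenate([fa, fb]))  # unconditional variance V(Y)
    s1, st = np.empty(k), np.empty(k)
    for i in range(k):
        ab = a.copy()
        ab[:, i] = b[:, i]          # matrix A with column i taken from B
        fab = model(ab)
        s1[i] = np.mean(fb * (fab - fa)) / var        # first-order effect
        st[i] = 0.5 * np.mean((fa - fab) ** 2) / var  # total effect (Jansen)
    return s1, st
```

For an additive test model such as *Y* = *X*₀ + 2*X*₁, the first- and total-order indices coincide (0.2 and 0.8), while for a nonadditive model *S*_{Ti} exceeds *S*_{i} by the interaction share.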

## 4. Case study: The Rhineland

### a. Data and setup

Daily precipitation measurements for 1971–2000 have been obtained from the German Weather Service (DWD) for eight weather stations in the Rhineland (Germany), providing a set of 8 × 10 592 samples. For this period complete datasets are available for all stations. The large-scale atmospheric data are derived from the 40-yr European Centre for Medium-Range Weather Forecasts Re-Analysis (ERA-40), provided on a 2.5° × 2.5° regular geographical grid. The potential predictor domain is centered on central Europe, ranging from 35° to 75°N and 12.5° to 42.5°W. Six variables on five pressure levels (1000, 850, 700, 500, and 300 hPa) have been used for classification: air temperature (*T*), specific humidity (*H*_{S}), vertical velocity (*w*), divergence (div), potential vorticity (PV), and geopotential height (*z*). Inherently, some of the predictor variables show significant correlations, for example, *H*_{S} and *T*, or div and *w*.

The algorithm was initialized by setting the seed for each predictor to the grid point closest to the precipitation observations (50°N, 5°E) in the potential predictor domain. On the high-performance clusters at the University of Aachen, several SOM clustering computations were performed with different numbers of classes, ranging from 3 to 10. First, all models that produced significantly different precipitation distributions for all classes and stations were selected (see the cost function in section 2b). From these, the model with the greatest number of classes was chosen so as to obtain as many independent classes as possible. According to this criterion, the best SOM architecture consists of six classes. An example of the cumulative precipitation distributions is given for one of the weather stations, Aachen (see Fig. 2). To ensure that the algorithm had converged, we restarted it from the optimal configuration and compared the two runs. The second run converged to the same subdomains, indicating that the algorithm found a robust optimum.

### b. Predictor domains

Figure 3 shows the optimized subdomains used for the final classification. Note that the algorithm identifies both contiguous and disjoint subdomains (e.g., *T*, PV). The shape of each predictor domain depends on the physical processes involved, as the algorithm seeks the domain that contains the most relevant information for precipitation generation for all air masses. In the following discussion, the predictor domains are analyzed with respect to their spatial structure as well as their inherent dynamic and thermodynamic characteristics.

The *H*_{S} domain (Fig. 3a) is situated directly above the study area, and its spatial extent is much smaller than those of the other subdomains. Wilby and Wigley (2000) found similar spatial correlation patterns between precipitation and *H*_{S} for six regions in the United States, tending to overlap with the target. Together with the *w* pattern (Fig. 3b), which extends upward to a height of several kilometers (300 hPa) and slopes backward toward the west, this formation captures air masses ascending along a frontal surface. This pattern is well known from rising warm-air advection ahead of developing surface lows and the condensation of water vapor in the upper atmosphere.

The div dipole (Fig. 3c) is situated between the upper (300 hPa) and ground-level (1000–700 hPa) atmosphere, with the lower region directly connected to the lower edge of the *w* domain. This is essentially due to the close linkage of the two predictors by the continuity equation. Depending on whether air masses are ascending or descending (in the *w* domain), the two div regions behave either as sinks or sources (see Table 2).

Useful information on diabatic processes can be obtained from changes in PV (Davis and Emanuel 1991; Rossa et al. 2000). As PV is the product of absolute vorticity on an isentropic surface and static stability, a release of latent heat is concentrated within the domain of enhanced PV values. Therefore, strong baroclinic zones, where much diabatic heat release takes place, usually form low-level PV anomalies (near the ground; see also Fig. 3d). Besides the diagnosis of diabatic processes, PV admits, by means of its invertibility, inferences on other meteorological fields such as geopotential height, air temperature, wind, and static stability. In the North Atlantic region, the highest cyclone track densities are found around 60°N (Ulbrich et al. 2008, 2009; Wang et al. 2006), varying slightly with the seasons. Consequently, significant changes in PV take place particularly in this zone. This is in good agreement with the obtained PV subdomain, which ranges from the northern part of Germany along the west coast of Norway up to 62.5°N. The structure also nicely reflects the comma-shaped pattern often observed in the proximity of low pressure systems.

Unlike the subdomains of the circulation and moisture predictors, the two disjoint *T* domains (Fig. 3e) are located over Spain and Iceland. This is not surprising, since the effect of temperature change due to vertical motion is usually smaller than that due to advection. There is no evidence that wetter or drier events can be explained by westerly advection of warmer or colder air masses. However, the seasonal occurrence of air masses coincides with the corresponding temperature signal. Typical summer airmass patterns, such as AM1 and AM2, are characterized by warmer air temperatures in both *T* domains.

Finally, the *z* domain is shifted northeast of the Rhineland and stretches vertically from the upper to the middle troposphere with small horizontal extent (Fig. 3f). The size of this domain contrasts strongly with that of the linear correlation map, which shows strong correlations between precipitation and *z* over great distances. Accordingly, care should be taken when using correlation as a measure of the predictive power of predictors for downscaling.

### c. Airmass classes

To create a greater understanding of the six airmass classes, and thus of their influence on local phenomena, it is key to understand the physical processes involved. These processes are closely related to the spatial domains discussed in the previous section. Some mean characteristics of the predictors and the predictand for each class are given in Table 2. The resulting composite mean sea level pressure and specific humidity maps are presented in Fig. 4. The classes can be roughly divided into three low pressure (AM1, AM3, AM5) and three high pressure systems (AM2, AM4, AM6).

Frequent rainfall events in the study area are associated with advection of humid air masses from the west and southwest. These are typical conditions for AM1 and AM3 (see Fig. 4 and Table 2). In both cases the advection is driven by a low pressure system north of the British Isles. Within the complete *w* subdomain vertical winds are negative, indicating a large-scale ascent of air masses. Once the frontal system passes the study area and the low pressure system is located above Scandinavia, uplift rates reduce and the flow changes to the northwest. This situation is typical for AM5.

During summer and autumn the air contains more moisture owing to the increased saturation vapor pressure and evaporation (see Fig. 4 and Table 2). This can be observed especially for AM1 and AM2, as indicated by their higher frequency in Fig. 5. Such quantitative fluctuations in moisture content are mainly attributable to seasonal temperature differences. Nevertheless, an increased moisture content does not imply higher mean rain rates (see Table 2). A good example is given by AM2 and AM3: although both air masses have similar moisture contents in the *H*_{S} domain, which is located directly above the study area, their mean rain rates differ greatly. Other physical quantities, such as the vapor pressure deficit, may be more useful predictors for this purpose.

The much drier classes, AM2 and AM6, show pronounced high pressure systems over central Europe accompanied by dry air masses in the midtroposphere. As a consequence of the large-scale subsidence, the divergence takes positive values in the full div subdomain. The associated adiabatic warming leads to changes in the vapor pressure deficit and hence to a reduction in relative humidity. This is evident above all in the reduced precipitation probabilities.

### d. ANN downscaling

In the next step, an empirical relationship between local-scale precipitation and the previously optimized predictors (47 predictors in total) is derived by a nonparametric static feed-forward ANN (M1). It should be mentioned that the predictor screening was performed on the same dataset used for downscaling. However, since the predictors were not explicitly optimized to improve the accuracy of the downscaling, independence is still given. In addition to the standard statistical model M1, two further ANN models are trained for comparison using different predictor settings. The statistical model M2 utilizes the same six variables as the standard model M1, but the domain is limited to the Rhineland itself. The domain is selected by computing standard product-moment correlation coefficients for a 3 × 3 gridpoint domain above the target location; for each variable, the pressure level with the strongest correlations is selected. Accordingly, the following group of predictors (54 predictors in total) was selected: *T*_{1000}, *H*_{S700}, *w*_{700}, div_{300}, PV_{500}, and *z*_{500}. The choice of predictors for model M3 (18 predictors in total) goes back to the work of Cavazos and Hewitson (2005), who found that *z*_{500} and *H*_{S700} are the most relevant controls of daily precipitation in the midlatitudes; this matches the levels used in M2 for these two variables.

By randomly selecting individual data elements, the dataset is divided into three sets: training (70%), validation (15%), and testing (15%). Each ANN model is trained using 60 different sets of random initial weights so as to avoid local optima. The number of neurons in the two hidden layers was determined by trial and error; using more than three and two neurons in the first and second hidden layers, respectively, does not lead to significant improvements in accuracy. Since the ratio of neurons to the amount of data is very low and the unbiased generalization error is constantly estimated on the test set during training, the results are unlikely to suffer from overfitting. Accordingly, the accuracy on the training set is only slightly better than on the validation and test sets.
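The random 70/15/15 split can be sketched as follows (function name and seed are illustrative):

```python
import random

def split_indices(n, f_train=0.70, f_val=0.15, seed=0):
    """Randomly split n sample indices into training (70%), validation (15%),
    and test (15%) subsets, as used for the ANN models."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # random selection of individual elements
    n_train = int(round(f_train * n))
    n_val = int(round(f_val * n))
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

Applied to the 10 592 daily samples per station, this yields roughly 7414 training, 1589 validation, and 1589 test days.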

For the evaluation the mean performance over all stations is estimated from the five best ANNs. Results of all three ANN models based on the validation dataset are shown in Tables 3 and 4. The accuracy is quantified by the explained variance (*R*^{2}), the rms error (RMSE), the skill ratio between simulated and observed standard deviation (Skill), the mean precipitation differences (DP), the ratio of the average number of dry spells (DS) with more than three days, and the ratio of the average length of these dry spells. These ratios are the number or length found in the simulated downscaled data divided by the value of these indices in the measured data. Measures are calculated separately for each season, so fluctuations in model accuracy can be studied.
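The scalar accuracy measures can be sketched as below; the dry-spell ratios are omitted here, and the exact definitions used in the paper (e.g., sign conventions for DP) are assumptions.

```python
import numpy as np

def evaluation_metrics(obs, sim):
    """Sketch of the accuracy measures: explained variance R^2, RMSE, the
    skill ratio of simulated to observed standard deviation, and the mean
    precipitation difference DP (simulated minus observed; an assumption)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    return {
        "R2": float(r ** 2),
        "RMSE": float(np.sqrt(np.mean((sim - obs) ** 2))),
        "Skill": float(np.std(sim) / np.std(obs)),
        "DP": float(np.mean(sim) - np.mean(obs)),
    }
```

A Skill value below one indicates that the downscaled series underrepresents the observed variability, which is the behavior the Skill column in Table 3 is meant to expose.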

In general, all models perform better during winter and autumn owing to less convection in this period. Other studies (e.g., Tolika et al. 2007; Cavazos and Hewitson 2005; Coulibaly et al. 2005) reported similar annual fluctuations in model performance. While there is not much improvement using the SOM optimized predictor set in spring, significant improvements are achieved in summer and autumn. Apart from this, it is noticeable that the mean precipitation amount is underestimated by almost all models and seasons. The same holds true for the number and length of consecutive dry days, indicating difficulties in modeling days without precipitation. Overall, the downscaling with the optimized domains (M1) seems to be better at reproducing the variability of precipitation; see the column Skill. The number of predictors for the models M1 (47 predictors) and M2 (54 predictors) is very similar; thus the differences cannot be explained by the number of predictors alone.

Based on the results, we assume that the ANN models predict processes with sufficient accuracy so that they may be used for prognostic modeling and the global sensitivity analysis in the next section.

### e. Global sensitivity analysis

To understand the importance of the predictors, the model’s complete sensitivity pattern is required (factor prioritization). In the following, the different sources of uncertainty in the ANN are estimated using the variance-based GSA method introduced in section 3. To take into account the spread of the ANN network weights, the first- and higher-order indices are calculated and averaged over the five best ANNs (smallest RMSE) at each station. For clarity, the indices are then also averaged over all stations.

Figure 6 shows the relative importance of individual predictors given by the total- and first-order indices. First-order indices *S _{i}* measure individual predictor contributions to the ANN output variance, while the total-order indices *S _{Ti}* also include all interaction effects. Whereas the first-order sensitivity is often (close to) zero, in all cases the *S _{Ti}* values differ significantly from zero, indicating that all variables contribute to the output variance either singularly or in combination with other variables. By subtracting *S _{i}* from *S _{Ti}*, one obtains a measure of how strongly a variable is involved in interactions.

The individual variables account for about 73% (sum of first-order indices) of the output variance; the remaining 27% is due to interactions (nonadditive effects). The ANN trained for precipitation downscaling therefore belongs to the class of nonadditive models. Some inputs contribute mainly through their interactions with other variables, for example, div in June–August; such factors nevertheless make an important contribution to the model’s output variance. According to the results, *z*, *w*, *T*, and *H _{S}* may be designated as sensitive variables, since the model output is mainly influenced by the uncertainty of these factors. There is clear evidence that these sensitivities are subject to seasonal fluctuation and, thus, that not all physics is taken into account by these models. For example, the geopotential height *z* is more important around the 500-hPa level during summer and spring, whereas the upper troposphere (300-hPa level) gains importance in winter. Conversely, *w* and *T* at the 850–500-hPa levels contribute consistently to the model output variance. Note that specific humidity *H _{S}* at the 500- and 700-hPa levels shows high values in summer and a minimum in winter.

Downscaling studies often address the question of which predictors are most responsible for producing strong precipitation events (factor mapping). To this end, model output realizations are mapped back onto the input space (Saltelli et al. 2006) using the Monte Carlo realizations from the GSA. In a first step the output is filtered into two subsets: one containing the upper 5% tail of rainy days and one containing the remaining outcomes. Most likely the two subsamples come from different (unknown) probability density functions of the predictors. Their cumulative distributions are compared by applying a two-sided Kolmogorov–Smirnov test. Figure 7 shows, for all predictors, the average Kolmogorov–Smirnov test statistic over all stations between the groups with strong and weak precipitation: the larger the value, the more important the predictor in producing strong precipitation. In general, highly influential predictors (large total-order indices) also trigger stronger events. However, apparently less important parameters, such as PV and div, can also play a decisive role in the tails of the prediction.
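The factor-mapping step can be sketched as follows: split the Monte Carlo sample by whether the output falls in the upper tail, then score each predictor by the two-sample Kolmogorov–Smirnov distance between the conditional input samples. The function names and the synthetic demonstration data are illustrative assumptions; the KS statistic is implemented directly to keep the sketch self-contained.

```python
def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: max distance between ECDFs."""
    xs, ys = sorted(x), sorted(y)
    nx, ny = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < nx and j < ny:
        if xs[i] <= ys[j]:
            i += 1
        else:
            j += 1
        d = max(d, abs(i / nx - j / ny))
    return d

def factor_mapping(inputs, outputs, tail=0.05):
    """Score each predictor column by the KS distance between its values on
    upper-tail outputs and on the remaining outputs (larger = more important)."""
    n = len(outputs)
    order = sorted(range(n), key=lambda r: outputs[r], reverse=True)
    top = set(order[: max(1, int(tail * n))])
    scores = []
    for c in range(len(inputs[0])):
        hi = [inputs[r][c] for r in range(n) if r in top]
        lo = [inputs[r][c] for r in range(n) if r not in top]
        scores.append(ks_statistic(hi, lo))
    return scores

# illustrative check: the output depends on the first input only,
# so only that column should separate the two conditional samples
import random
random.seed(1)
X = [[random.random(), random.random()] for _ in range(2000)]
y = [row[0] ** 3 for row in X]
scores = factor_mapping(X, y)
```

In this synthetic case the first column perfectly determines the tail membership, so its KS score approaches 1, while the irrelevant second column stays near zero.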

## 5. Conclusions and discussion

This study presents a new nonlinear approach for optimizing predictors and their corresponding three-dimensional domains for conditional airmass classification. Unlike common approaches, each predictor domain can take irregular spatial boundaries to obtain a wide range of precipitation distributions at each weather station. As shown in section 4a, allowing for nonlinearities in the screening process exposes clearly defined and physically interpretable, compact, and smooth predictor domains that are far from rectangular or symmetrical. Most of the domains can be assigned to specific stages of a low pressure trough. Assuming that the optimized predictors are also suitable inputs for nonlinear downscaling, the optimized predictor set has been used for an ANN downscaling in the German Rhineland. The results have been compared to those of similar models using symmetrically arranged input predictors on several pressure levels.

Although the predictors were not explicitly optimized to improve the explained variance of the downscaling, an improvement could be achieved in summer and autumn. Particular emphasis was placed on insight into predictor interactions, with the intention of indicating the sensitivities and corresponding sources of uncertainty in the predictor set. Such information may be further used to develop more parsimonious models with only small decreases in accuracy. The applied GSA turned out to be an efficient way to provide the necessary understanding of the model’s sensitivity pattern. According to the results, about one-fourth of the ANN model output variance is due to interaction effects.

Neglecting such interactions in the screening process will lead to a loss of information. In this context linear screening methods are insufficient, as they account neither for interactions nor for the nonadditivity exhibited by many nonlinear prediction algorithms. The same holds for the widely used sigma-normalized derivative sensitivity analysis, which can lead to wrong conclusions because interaction effects are missed.

Based on the GSA results, the following conclusions can be drawn for central Europe:

- The role of humidity (*H _{S}*) appears to be more relevant for summer precipitation, when subscale processes are dominant. This is also the most likely explanation for the modest horizontal expansion of the domain. The influence of *H _{S}* shows a pronounced vertical differentiation, decreasing from the 700-hPa (8%–18%) to the 300-hPa (2%–3%) level (see Fig. 6).
- Vertical velocity (*w*) at the 500-hPa level explains almost 30% of the model output variance (including interactions; see Fig. 6) in the transitional seasons. Together with the contributions at the 850- and 500-hPa levels, *w* seems to be a key element for downscaling precipitation. Unlike other predictors, its domain slopes upwind across the entire troposphere, covering a great part of central Europe (see Fig. 3d).
- Potential vorticity (PV) and divergence (div) may be classified as rather insignificant by themselves, particularly in summer and spring. During that time of year these predictors contribute almost entirely through interaction effects (see Fig. 6). For parsimonious models these predictors may be removed with little loss of accuracy.
- Temperature (*T*) fundamentally differs from the other predictors by its dipole domain (see Fig. 3a) and its distance from the investigation area, with the main center of action over Spain. Nevertheless, the mid- to high-tropospheric *T* values contribute between 10% and 18% (including interactions; see Fig. 6) to the total model output variance.
- Geopotential height (*z*) was found to be an influential predictor in all seasons. During autumn and winter the high-tropospheric region (300 hPa) seems to be more important, while there is a pronounced downward shift to the 500-hPa level in summer and spring. Interestingly, the horizontal extent of the domain is relatively small. Outside this domain, pressure also correlates with precipitation in the Rhineland, which illustrates that strong correlations do not always equate to good prediction.

In general, the decomposition of the complete model output variance by a GSA offers a convenient way to estimate the relative importance of predictors.

Although the downscaling results are convincing, better results may be obtained by additionally including appropriate time lags and scales. Altogether, predictor optimization by means of conditional airmass classification provides a good way to find characteristic atmospheric domains. However, the proposed algorithm is rather time consuming, owing to both the simulated annealing and the self-organizing maps clustering. The optimization could be accelerated by using a cheaper clustering algorithm (e.g., *k*-means), and a greedier, more efficient optimization algorithm may be sufficient to find the optimum domains. It is planned to verify this approach in other climatic regions.
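The *k*-means alternative mentioned above could look like the following minimal sketch (Lloyd's algorithm with a naive first-*k*-points initialization). This is purely illustrative of the cheaper clustering step, not the clustering actually used in the paper.

```python
def kmeans(points, k, iters=20):
    """Plain k-means (Lloyd's algorithm) on tuples of floats.
    Naive initialization: the first k points serve as initial centers."""
    centers = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(tuple(p))
        # update step: move each center to the mean of its cluster
        centers = [tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers, clusters
```

With two well-separated groups of points and one initial center in each, the algorithm recovers the grouping in a single pass; a production version would add a better initialization (e.g., k-means++) and a convergence check.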

## Acknowledgments

We gratefully acknowledge financial support by the Deutsche Forschungsgemeinschaft (DFG), Grant VE366/3-1.