1. Introduction
Recent climate research indicates the likelihood of increasing precipitation extremes in a warmer climate (e.g., Meehl et al. 2007). Indeed, such changes are already apparent as the atmosphere has warmed over the past century (Trenberth et al. 2007; Ning and Qian 2009). The underlying physics of this overall response is relatively well understood, involving basic factors such as those described by the Clausius–Clapeyron equation, which prescribes increased atmospheric water vapor mixing ratios as tropospheric temperatures warm. There is reason to believe that current-generation atmosphere–ocean general circulation models (AOGCMs) provide credible estimates of changes in the hydrological cycle at continental and larger spatial scales (Randall et al. 2007). However, the resolution of GCMs is usually too coarse to provide detailed regional information about climate change at local scales, and the parameterizations of subgrid-scale processes, such as precipitation, introduce additional uncertainty into the grid-scale projections. Understanding and projecting changes in the distribution of precipitation at the regional spatial and short temporal scales most relevant for decision making, and as input to hydrological models, therefore requires a more nuanced approach (Wagener et al. 2010).
Downscaling has become a popular technique for exploring the relationship between local-scale climate change and synoptic-scale climate forcing (Hewitson and Crane 1992b,c, 1996, 2002; Wood et al. 2002). For example, Hewitson and Crane (1992a) use an artificial neural network to demonstrate that local precipitation variability in southern Mexico results from changes in the near-surface and 500-hPa circulation fields. Wilby and Wigley (1997) describe four categories of downscaling techniques: regression methods, weather-pattern-based approaches, and stochastic weather generators, all of which are forms of statistical downscaling, and limited-area modeling, generally referred to as dynamical downscaling. Downscaling techniques have continued to evolve and their use has matured since the Intergovernmental Panel on Climate Change (IPCC) Third Assessment Report (Houghton et al. 2001). In the fourth IPCC report, Christensen et al. (2007) evaluate many downscaling methods over different regions of the world and conclude that downscaling is an effective way to enhance the regional detail of AOGCM-simulated climate.
With respect to the mid-Atlantic and Northeast regions of the United States, Crane and Hewitson (1998) apply artificial neural networks and find that anthropogenic greenhouse gas forcing leads to changes in the storm-track and humidity fields over eastern North America, which, in an early version of the Goddard Institute for Space Studies (GISS) model, resulted in a substantial increase in spring and summer rainfall. More recent high-resolution projections of future climate change across the northeastern United States, using IPCC emission scenarios combined with both statistical and dynamical downscaling, suggest temperature increases, especially at higher latitudes and inland, as well as potential changes in precipitation patterns (Hayhoe et al. 2006). According to Christensen et al. (2007), annual mean precipitation is very likely to increase in Canada and the northeastern United States, and it is likely to decrease in the southwestern United States. They also indicate that over the mid-Atlantic region, most GCMs agree on increases of annual and winter mean precipitation, while for summer only about half of the GCMs predict increases. Climate change impacts over the mid-Atlantic region identified in these studies are influenced by several key processes, including midlatitude cyclones, ENSO, and the North Atlantic Oscillation (NAO)/Arctic Oscillation (AO). The ridge and valley province of the Appalachian Mountains dominates a large part of Pennsylvania, and statistical downscaling has proven to perform well for local temperature and precipitation in similar regions of high topographic variability (Benestad 2005; Hanssen-Bauer et al. 2005; Hewitson and Crane 2006).
In many cases it is necessary to propagate climate change projections of meteorological variables, such as precipitation, through hydrological models to yield the variables of interest (e.g., streamflow or soil moisture). Hydrologic models have a much finer resolution than GCMs, and GCM output generally has to be downscaled before it becomes useful for these models. Wood et al. (2004) use three different statistical downscaling methods: linear interpolation, spatial disaggregation, and bias correction and spatial disaggregation (BCSD). Each method is applied to output from both the Parallel Climate Model (PCM) and a regional climate model (RCM) to drive the Variable Infiltration Capacity (VIC) model at a ⅛° spatial resolution (Liang et al. 1996, 1999). Comparing the hydrological model results driven by the three sets of downscaled data, they find that the BCSD method successfully reproduces the main features of the observed hydrometeorology for the retrospective climate. Maurer (2007) likewise shows, using the VIC model driven by downscaled climate change projections from 11 GCMs under both the higher-emission Special Report on Emissions Scenarios (SRES) A2 scenario and the lower-emission SRES B1 scenario, that winter streamflow over California will increase while late spring and summer flow will decrease.
Self-organizing maps (SOMs) represent a nonlinear technique that supports the analysis of variability in large multivariate and multidimensional datasets through the derivation of a spatially organized set of generalized patterns of variability from the data (Reusch et al. 2007). Cavazos (1999) uses SOMs to examine the relationships between large-scale circulation–humidity fields and local daily precipitation events in northeastern Mexico and southeastern Texas. Crane and Hewitson (2003) apply SOMs to combine the precipitation records of individual stations into regional datasets. Hewitson and Crane (2006) use SOMs to downscale synoptically controlled daily precipitation over South Africa, while Reusch and Alley (2007) find that SOM-based patterns concisely capture the spatial and temporal variability in monthly Antarctic sea ice edge position data through the examination of area anomalies of Antarctic sea ice coverage.
In this paper, we use the SOM-based downscaling methodology introduced by Hewitson and Crane (2006) to reproduce historical daily precipitation observations for stations over Pennsylvania (United States). We evaluate how well GCMs reproduce the observed synoptic-scale atmospheric conditions and assess their usefulness in projecting current precipitation variability over the region.
2. Data and methodology
a. Data
In this study, we use three sets of data for the downscaling procedure: National Centers for Environmental Prediction (NCEP) reanalysis of daily gridded atmospheric data, observed daily station precipitation data, and GCM daily gridded atmospheric data. The daily gridded atmospheric data are constructed from 6-hourly NCEP reanalysis data from 1979 to 2007 with a resolution of 2.5° × 2.5°.
The SOM procedure uses seven variables: the u and v components of the wind at 10 m and at 700 hPa, relative humidity at 850 hPa, the air temperature anomaly at 10 m, and the temperature lapse rate from 850 to 500 hPa. All seven variables are physically related to local precipitation. The u and v components of the wind determine low-level convergence and divergence; the relative humidity and surface temperature relate to the water vapor content of the lower atmosphere; and the 850–500-hPa lapse rate determines whether the initial conditions for convection are met.
Detailed comparisons (B. C. Hewitson 2009, personal communication) of the present SOM-based statistical downscaling with its application to climate regimes in Africa have shown that the additional use of specific humidity makes little difference to predictions of rainfall during the historical period but leads to predictions of greater increases in rainfall in response to future anthropogenic warming. Comparisons with dynamically downscaled estimates based on regional climate models suggest that these larger precipitation increases are overestimates. The effects of using the additional humidity parameter are discussed later in the context of our own downscaling results.
We limit our analysis to the post-1979 NCEP data, for which satellite observations improve the quality and climatological continuity of the product (Sturaro 2003; Tennant 2004). The precipitation data for Pennsylvania come from 17 stations for the period 1961–2005; the names, locations, and elevations of the stations are given in Table 1. The GCM circulation data from 1961 to 2000 are taken from the twentieth-century simulations using historical greenhouse gas concentrations [Twentieth-Century Climate in Coupled Model (20C3M) scenario] of the World Climate Research Programme (WCRP) Coupled Model Intercomparison Project, phase 3 (CMIP3), for nine different models: Canadian Centre for Climate Modelling and Analysis (CCCma) Coupled General Circulation Model, version 3.1 (CGCM3.1); Centre National de Recherches Météorologiques Coupled Global Climate Model, version 3 (CNRM-CM3); Commonwealth Scientific and Industrial Research Organisation, Mark 3.0 (CSIRO Mk3.0); Geophysical Fluid Dynamics Laboratory Climate Model, version 2.0 (GFDL CM2.0); Goddard Institute for Space Studies Model E-R (GISS-ER); L'Institut Pierre-Simon Laplace Coupled Model, version 4 (IPSL CM4); Meteorological Institute of the University of Bonn, ECHO-G Model (MIUBECHOG); Max Planck Institute (MPI) ECHAM5; and Meteorological Research Institute Coupled General Circulation Model, version 2.3.2a (MRI CGCM2.3.2a). The data and descriptions of the GCMs can be found at the WCRP CMIP3 Multi-Model Data Web site (https://esg.llnl.gov:8443/index.jsp). The GCM variables used are the same as those taken from the NCEP data. In the assessment of the downscaled product, we also use daily NCEP sea level pressure (SLP) from 1979 to 2007 and the precipitation rates of the nine GCMs from 1961 to 2000.
Table 1. Locations and elevations of the 17 stations over Pennsylvania. Here and in subsequent tables, ID indicates identifier number.
b. The downscaling procedure
The first step in the downscaling procedure used here involves training the SOMs. SOMs are analogous to a fuzzy-clustering algorithm and are usually used to visualize and characterize multivariate data distributions (Kohonen 1989, 1995). A SOM is typically depicted as a two-dimensional array of nodes, where each node is described by a vector representing the average of the surrounding points in the original data space. For an input dataset described by a matrix of n variables and m observations, each node in the SOM is described by a reference vector of length n. Training begins by assigning random values to each node's reference vector and then comparing each data record with every node vector. The reference vector that most closely matches the data vector defines the "winning" node, whose reference vector is then updated slightly toward the input data by a factor termed the "learning rate." The surrounding nodes are also updated toward the input data, but by a smaller learning rate. The entire process is repeated for multiple iterations until the differences between iterations fall below a selected threshold. This training procedure is described in detail in Crane and Hewitson (2003) and illustrated in Fig. 1 therein; a minimal code sketch of the loop follows.
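To make the training loop concrete, the following Python/NumPy sketch implements a generic SOM trainer of the kind described above. It is an illustration rather than the authors' code: the decay schedules, the fixed iteration count (used here in place of a convergence threshold), and the synthetic data are all assumptions made for brevity.

```python
import numpy as np

def train_som(data, rows=9, cols=11, n_iter=20, lr0=0.5, radius0=3.0, seed=0):
    """Minimal SOM training: find the winning node for each input record and
    nudge it (and, more weakly, its map neighbors) toward that record."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    # Initialize each node's reference vector with random values.
    nodes = rng.standard_normal((rows, cols, n_features))
    # 2D map coordinates, used to compute neighborhood distances on the grid.
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols),
                                indexing="ij"), axis=-1)
    for it in range(n_iter):
        # Learning rate and neighborhood radius shrink as training proceeds.
        lr = lr0 * (1.0 - it / n_iter)
        radius = max(radius0 * (1.0 - it / n_iter), 0.5)
        for x in data:
            # Winning node: smallest Euclidean distance to the input vector.
            dist = np.linalg.norm(nodes - x, axis=-1)
            win = np.unravel_index(np.argmin(dist), dist.shape)
            # Update the winner and its neighbors, weighted by map distance.
            gdist = np.linalg.norm(grid - np.array(win), axis=-1)
            h = np.exp(-gdist**2 / (2.0 * radius**2))
            nodes += (lr * h)[..., None] * (x - nodes)
    return nodes

# Illustrative use on synthetic data: 1000 "days" of a 133-element state vector.
som = train_som(np.random.default_rng(1).standard_normal((1000, 133)))
```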
In our application, a separate SOM with 9 × 11 = 99 nodes is trained for each station, with each node representing a characteristic atmospheric state. The choice of SOM size is ultimately subjective: fewer nodes increase generalization, while more nodes result in too few days being mapped to each node to derive a representative rainfall distribution function. However, statistical validation experiments, as described later, can be used to assess the sensitivity of the results to this choice.
Prior to the training step, the study area is divided into a 0.5° grid (Fig. 1a), and for any given target station, the nearest grid cell is identified (Fig. 1b). For example, the grid cell centered on 40.0°N, 76.5°W is the nearest cell for Harrisburg (40.22°N, 76.85°W). For each of the seven variables, 19 hexagonal grids are created, with the target grid cell at the center. Within each hexagonal grid, the four NCEP data points surrounding each of the six triangular centroids are extracted and regridded to that centroid using weights inversely proportional to the distance between the NCEP data point and the centroid. The value over each of the 19 hexagonal grids is then calculated by averaging the values of its six triangular centroids (Fig. 1c). Finally, each variable is standardized separately. The seven variables over the 19 hexagonal grids thus form a 19 × 7-element vector describing each day's atmospheric state around the station; a sketch of the regridding step follows.
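The regridding arithmetic reduces to inverse-distance weighting plus per-variable standardization. A minimal sketch follows, with the hexagonal geometry omitted; the sample values and distances are purely hypothetical.

```python
import numpy as np

def inverse_distance_average(values, distances):
    """Combine nearby NCEP points into a single centroid value using weights
    inversely proportional to distance."""
    w = 1.0 / np.asarray(distances, dtype=float)
    return float(np.sum(w * np.asarray(values)) / np.sum(w))

def standardize(series):
    """Standardize one variable over the training period (zero mean, unit std)."""
    s = np.asarray(series, dtype=float)
    return (s - s.mean()) / s.std()

# Hypothetical example: four NCEP points around one triangular centroid.
vals = [3.1, 2.7, 3.4, 2.9]            # e.g., 850-hPa relative humidity values
dists = [120.0, 95.0, 140.0, 80.0]     # distances to the centroid (km)
centroid_value = inverse_distance_average(vals, dists)
```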
Fig. 1. The SOM preprocessing (triangle: target location).
For each station, we compare the observed daily atmospheric data to the SOM nodes and map each day to one particular node. For each SOM node, we take all the days that map to that particular node and then rank the precipitation on those days from low to high. A spline is fit to the ranked precipitation data to define a continuous cumulative distribution function (CDF) of the node’s rainfall. This procedure is repeated for all the nodes in the SOM and then for all the stations.
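A sketch of the per-node CDF construction, assuming SciPy is available. The text specifies only that a spline is fit to the ranked data; the choice of a monotone (PCHIP) spline here is our assumption, made so that interpolated precipitation never decreases with probability.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def node_inverse_cdf(precip_on_node_days):
    """Build a continuous quantile function (inverse CDF) for one SOM node by
    fitting a monotone spline to the ranked precipitation values."""
    p = np.sort(np.asarray(precip_on_node_days, dtype=float))
    # Empirical cumulative probabilities for the ranked (low-to-high) values.
    q = (np.arange(1, p.size + 1) - 0.5) / p.size
    # Monotone spline: maps a probability in (0, 1) to a precipitation amount.
    return PchipInterpolator(q, p, extrapolate=False)
```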
To downscale the precipitation for a given station over a particular period, we first compare the circulation data (either from the GCMs or from the observations) to the SOM, associating each day with a node. For each day, a random number generator is used to select a value of precipitation from the CDF for the node to which the day is mapped. The procedure is repeated many times to produce an ensemble of time series for each station, any one of which can be considered a representative sample of the distribution characterizing the downscaled dataset. We chose to produce ensembles consisting of 1500 realizations for the purpose of our downscaling applications.
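A sketch of the stochastic generation step under the same assumptions, where `day_to_node` (the per-day node assignments) and `node_icdfs` (the per-node inverse CDFs from the previous sketch) are illustrative names.

```python
import numpy as np

def downscale_ensemble(day_to_node, node_icdfs, n_real=1500, seed=42):
    """For each day, draw a random quantile and read a precipitation amount off
    the inverse CDF of the SOM node to which that day's circulation maps;
    repeating yields an ensemble of equally plausible daily series."""
    rng = np.random.default_rng(seed)
    n_days = len(day_to_node)
    ensemble = np.empty((n_real, n_days))
    for r in range(n_real):
        # Keep draws inside the spline's support (extrapolate=False above).
        u = rng.uniform(0.02, 0.98, size=n_days)
        ensemble[r] = [float(node_icdfs[k](ui))
                       for k, ui in zip(day_to_node, u)]
    return ensemble
```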
3. Results
A large number of empirical downscaling methodologies are currently being applied to regional climate datasets. Many, however, simply describe the application of a technique to a single region, using one or perhaps two GCMs to drive the downscaling, and compare the results to observed monthly rainfall. Here we go further by downscaling all available GCMs from the CMIP3 archive that provide the required daily parameters. Furthermore, we focus on validation using a variety of measures to achieve a more robust downscaling that also reflects the uncertainty arising from variations among the original GCM simulations.
a. Synoptic controls on precipitation
Our downscaling approach rests on three basic assumptions: (i) precipitation at each of the stations varies as a function of the atmospheric state, (ii) the NCEP variables adequately describe that state, and (iii) those variables are to some degree a function of the larger-scale atmospheric state. One appropriate diagnostic of that larger-scale state is the SLP field. As an internal consistency check on these assumptions, we therefore assess how well the nodes of the trained SOM reflect differences in the synoptic state of the atmosphere, as indicated by their projection onto the SLP field, a field that was not used to define (train) the SOM.
The projection of the 99 SOM nodes onto the SLP field for the Harrisburg (40.0°N, 76.5°W) site is shown in Fig. 2. This site is chosen because of its central location, but results are similar for all sites. Each projection of a given node is defined by the average of the SLP field over all days mapped to that node. The figure demonstrates that, although the SLP data are not directly used in training the SOM, the SLP distributions are clearly well differentiated by the SOM nodes: similar patterns locate close to each other in the SOM space, while different patterns locate farther apart. High pressure dominates the nodes to the top and left of the SOM space, while low pressure dominates in the bottom right, with transitional nodes in between.
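The composite calculation behind such projections is simple: average the auxiliary field over the days assigned to each node. A sketch, assuming a `best` array of per-day winning-node indices obtained by mapping the daily circulation onto the SOM:

```python
import numpy as np

def node_composites(field, best, n_nodes=99):
    """Average an auxiliary daily field (e.g., SLP maps of shape
    (n_days, ny, nx)) over the days mapped to each SOM node."""
    comps = []
    for k in range(n_nodes):
        days = field[best == k]
        comps.append(days.mean(axis=0) if days.size
                     else np.full(field.shape[1:], np.nan))
    return np.stack(comps)                 # shape (n_nodes, ny, nx)
```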
Fig. 2. SLP distributions corresponding to the 99 SOM nodes (hPa).
b. GCM validation
A basic method for evaluating a particular GCM's usefulness in assessing climate change is to test the model's ability to simulate the present climate (including variability and extremes). Differences between simulations and observations should be considered insignificant if they fall within the range of unpredictable internal variability, expected differences in forcing, or uncertainties in the observed fields (Randall et al. 2007). In the present case, we compare the simulations to the observations (NCEP) by mapping the GCM fields onto the trained SOM and comparing the results with the NCEP mapping, which provides a means of assessing how well the various GCMs reproduce the atmospheric states used to differentiate characteristic rainfall distributions.
As an example, we again consider the Harrisburg (40.0°N, 76.5°W) site (Fig. 3), where each square represents one node in the SOM. The frequency of a node is the number of days mapped to that node as a percentage of all the days used in the SOM training. Figure 3a shows the frequency of days mapped to each of the 99 SOM nodes for the NCEP data for the period 1979–2007, while Figs. 3b–f show the mappings of five GCMs for the period 1961–2000: CNRM, CSIRO, GFDL, IPSL, and MRI. These five models span the range of quantization errors (discussed below) encountered among the full set of GCMs analyzed. Mapping the GCM data onto the SOM trained with the NCEP data shows how well the GCMs reproduce the atmospheric states used to differentiate characteristic rainfall distributions.
Fig. 3. Frequency distributions across the SOM nodes for atmospheric circulation from (a) NCEP and models (b) CNRM, (c) CSIRO, (d) GFDL, (e) IPSL, and (f) MRI, centered on 40.0°N, 76.5°W (%).
The NCEP frequency distribution (Fig. 3a) shows that the frequencies are fairly uniformly distributed across all the nodes, with slightly larger frequencies at the edges and corners. The GCMs also show a fairly uniform distribution across all nodes, although several of the models show centers of higher frequency that are not present in the NCEP distribution. In particular, CSIRO Mk3.0 and IPSL CM4, the two models with the largest quantization errors, show a concentration of variance in fewer nodes. These centers of higher frequency in some of the GCM mappings suggest slightly reduced variance in the model fields, but the distribution across the nodes shows that the models do recreate the atmospheric states revealed in the NCEP data, indicating that the GCMs produce realistic synoptic-scale patterns and variability across the region.
The average quantization errors (Fig. 4) measure how well the NCEP or GCM circulation fields map onto the available nodes. The quantization error for a given day is the smallest Euclidean distance between that day's input vector and its best-matching node, and thus measures how well the node's reference vector represents the mapped atmospheric state. Averaged over all days mapped to a node, it shows how closely those days are clustered in the data space or, alternatively, how much of the data space the node represents; by analogy with cluster analysis, the average quantization error represents the within-group variability. Figure 4a shows that the quantization errors over the left side of the SOM are larger than those over the center and right side, indicating larger variance for the synoptic states dominated by high pressure systems (Fig. 2). A sketch of the calculation follows.
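A sketch of this calculation, assuming `data` holds the standardized daily state vectors and `nodes` the trained SOM reference vectors from the earlier sketch:

```python
import numpy as np

def quantization_errors(data, nodes):
    """Quantization error of each day (distance to its best-matching node) and
    the per-node means shown in Fig. 4; empty nodes yield NaN."""
    flat = nodes.reshape(-1, nodes.shape[-1])            # (n_nodes, n_features)
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    best = d.argmin(axis=1)                              # winning node per day
    qe = d[np.arange(len(data)), best]                   # distance to winner
    node_means = np.array([qe[best == k].mean() if np.any(best == k) else np.nan
                           for k in range(flat.shape[0])])
    return qe, best, node_means
```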
Fig. 4. Average quantization error distributions across the SOM nodes for atmospheric circulation from (a) NCEP and models (b) CNRM, (c) CSIRO, (d) GFDL, (e) IPSL, and (f) MRI, centered on 40.0°N, 76.5°W.
The average quantization error distributions of the GCMs (Figs. 4b–f) are very similar to the corresponding distribution for the NCEP fields, although with slightly higher error values in some cases. The blank node (7, 8) in the distribution for CSIRO Mk3.0 (Fig. 4c) indicates that no days map to that node. These results suggest that although the GCMs may have reduced dimensionality compared to the observed data, there is, in some cases, greater variability within the synoptic states mapped to each node.
Recall that in the training of the SOM, each daily synoptic circulation state is treated as a location in the original multidimensional state space, with similar circulation states located close to each other and very different states located farther apart. Combining Figs. 3 and 4, we conclude that the total volume of the NCEP data in state space is larger than that of the GCMs; however, around the individual characteristic synoptic states, the spread among daily states is usually larger in each of the GCMs than in the NCEP data.
The average values and standard deviations of the mean quantization errors across NCEP and the nine GCMs are given in Fig. 5. The pattern of average values is similar to that of the NCEP quantization errors, with larger errors over the left-hand side of the SOM and smaller errors over the right-hand side (Fig. 5a); the four GCMs not shown in Fig. 4 thus have quantization error distributions similar to the five that are shown. We conclude that the distribution of variance of the synoptic circulation patterns in the GCMs is broadly similar to that of the NCEP data. The standard deviations of the mean quantization errors (Fig. 5b) show that the variability is almost uniform over all the nodes. The average over all GCMs gives a close match to the observations (NCEP). This was also demonstrated for the Pennsylvania region by Shortle et al. (2009), who show that the present-day mean temperature and rainfall simulations averaged over all GCMs in the CMIP3 archive provide a closer match to the observations than any individual model.
Fig. 5. The (a) average and (b) standard deviation of the averaged quantization errors across the SOM nodes for NCEP and the nine GCMs' circulation data centered on 40.0°N, 76.5°W.
As a simple measure of GCM performance in reproducing the observed atmospheric states, we average the mean quantization errors over the 99 nodes individually for the NCEP data and for each of the GCMs (Fig. 6). The average error for the NCEP data is about 7, and the average error for most of the GCMs is also close to 7, with slightly larger errors for CSIRO Mk3.0 and IPSL CM4. This suggests that these two models are somewhat less accurate in reproducing the observed distribution of synoptic-scale atmospheric states than the other seven models. Although the accuracy with which a particular GCM simulates the present climate does not necessarily translate directly into its ability to project the future, the inverse of the difference between the GCM and NCEP quantization errors would be one possible basis for weighting GCM output when examining projected changes for future climates over the region based on a multimodel ensemble (e.g., the CMIP3 future climate change projections), as sketched below.
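A minimal sketch of this candidate weighting scheme; the paper proposes the idea without specifying an implementation, and the guard against a zero difference is our addition.

```python
import numpy as np

def gcm_weights(gcm_qe, ncep_qe):
    """Candidate multimodel weights: inversely proportional to each GCM's
    departure from the NCEP average quantization error."""
    diff = np.abs(np.asarray(gcm_qe, dtype=float) - ncep_qe)
    w = 1.0 / np.maximum(diff, 1e-6)   # guard against a zero difference
    return w / w.sum()                 # normalize to sum to one
```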
Fig. 6. The average values of the averaged quantization errors over all the SOM nodes for NCEP and the nine GCMs' circulation data centered on 40.0°N, 76.5°W.
c. The downscaled precipitation
Figure 7 shows the calculated CDFs of daily precipitation values corresponding to the 99 SOM nodes. For the nodes located in regions of the SOM dominated by high surface pressure, most of the CDFs show low or zero precipitation amounts. As would be expected, for nodes located in regions of the SOM dominated by low surface pressure and certain transitional surface pressure patterns, precipitation amounts are much higher. The differences in the CDFs across the SOM indicate that the SOM categorization of atmospheric conditions does allow for substantial differentiation between different precipitation states and that these differences make physical sense in the context of the synoptic-scale circulation.
Fig. 7. The CDFs of daily precipitation values corresponding to the 99 SOM nodes (x axis: mm).
The previous discussion clearly demonstrates that the SOM characterization of atmospheric states captures the synoptic variability, that the different atmospheric states (represented by the SOM nodes) have different precipitation characteristics, and that the GCMs exhibit the same atmospheric states and synoptic variability. The final step in the validation procedure is to examine whether the trained SOM can be used to generate realistic precipitation time series that have the same magnitude and frequency characteristics as the observed data.
The SOM approach to downscaling precipitation acknowledges that similar atmospheric conditions can result in different observed precipitation amounts; by randomly selecting from the rainfall CDF for each node, the approach captures some of this stochastic variability. Because the downscaling is a simplification of reality, and because of its stochastic element, any individual recreation of the precipitation represents only one possible realization of the precipitation regime, which should match the observed precipitation in its fundamental statistical attributes while not necessarily matching the observed time series on a day-to-day basis. To be a valid and useful representation of actual precipitation, the downscaling needs to reproduce the characteristics required for applications such as hydrologic modeling; that is, the downscaled precipitation should exhibit the same monthly and seasonal precipitation amounts, the same day-to-day variability, and the same number of rain days per month as the observations.
In some respects, validation is simply a matter of examining these characteristic statistics and assessing whether the results are good enough for a particular application. "Good enough," of course, is a subjective judgment that depends on the application. In this case, and to demonstrate that the downscaling may have broad application, we also ask two questions: does the downscaling give a better result than simply using climatology, and is it a significant improvement over using the nearest GCM grid cell data? We seek to demonstrate that the downscaled data are a close match to the observations and that the downscaling gives an appreciable improvement over using the GCM precipitation field directly.
The first step in validating statistical downscaling is to compare the statistical properties of the downscaled time series generated by the reanalysis fields with those of the corresponding observations. Figure 8 compares the probability distributions of the observed daily precipitation and the downscaled daily precipitation from one random iteration generated by the NCEP data over 17 stations for the period 1979–2005. In the calculation, only the days on which both observed and downscaled precipitation data are available are counted [note that only precipitation events larger than 0.25 mm (0.01 in.) are considered, consistent with the threshold for defining a “rain day” used in past work; e.g., Fitzpatrick and Krishnan 1967; Hershfield 1971; Gallus and Segal 2004]. The precipitation interval used in the calculation is 1 mm, and the probabilities of extreme precipitation larger than 50 mm are considered together. From Fig. 8 it can be concluded that, although there are some differences between the downscaled and observed probabilities of daily precipitation in the range from 0.25 to 1 mm, the downscaling reproduces the probability distributions extremely well. Moreover, the observed and downscaled probabilities of the largest precipitation events—with daily precipitation greater than 50 mm—are very close, which means that the downscaling, importantly, is also effective in capturing the extreme precipitation events for each station.
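For reference, the empirical distributions of Fig. 8 can be assembled along the following lines; the function below is a sketch that encodes the 0.25-mm rain-day threshold, 1-mm bins, and pooling of events above 50 mm described above.

```python
import numpy as np

def daily_precip_pdf(precip, threshold=0.25, bin_width=1.0, cap=50.0):
    """Empirical distribution of daily precipitation as in Fig. 8: rain days
    (>= 0.25 mm) only, 1-mm bins, and all events above 50 mm pooled."""
    p = np.asarray(precip, dtype=float)
    p = p[p >= threshold]
    edges = np.append(np.arange(0.0, cap + bin_width, bin_width), np.inf)
    counts, _ = np.histogram(p, bins=edges)
    return counts / counts.sum()
```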
Fig. 8. The probability distributions of observed (black) and downscaled (gray) daily precipitation over the 17 stations in Pennsylvania during the period 1979–2005.
The downscaling also captures the temporal variability of the actual observations (Fig. 9). While we generate many (1500) realizations of the downscaled precipitation to construct our ensemble, it is extremely unlikely that any realization will reproduce the unique sequence of daily precipitation events that characterizes the actual observations. Nonetheless, it is instructive to see whether we can find members of our ensemble that not only capture the overall statistical character of the observations but also approximate well the observed sequence of monthly precipitation anomalies. Figure 9 shows the observed and downscaled monthly precipitation time series for the three stations with the largest correlation coefficients between the observed and downscaled monthly precipitation: Allentown (Fig. 9a), Harrisburg (Fig. 9b), and Towanda (Fig. 9c).
Fig. 9. Observed (blue) and downscaled (red) monthly precipitation time series for the period 1979–2005 at (a) Allentown, (b) Harrisburg, and (c) Towanda (mm).
As one important measure of statistical skill, we evaluate whether the mean downscaled precipitation estimates using the NCEP data (rain days only) are on average closer to the observations than the climatological mean for that season (Table 2; results using median rather than mean of the 1500 downscaling realizations are provided in supplementary Table S1). Ratios less than unity indicate nominally better skill than the null “no skill” prediction of climatological mean values. We used a remove-one-sample-at-a-time jackknife procedure (Efron 1982) to provide nonparametric confidence intervals (CIs) in the mean skill over all 17 stations. If the associated upper 95% confidence limit remains below unity, we conclude that the downscaling procedure yields a statistically significant improvement above the climatological no-skill baseline. The ratios for each season (and annual mean) are in fact observed to be significantly below unity, indicating, as we would hope, that the downscaling does perform better than simply invoking climatology. Among the four seasons, the least improvement occurs in summer, reflecting the fact that much of the summer precipitation is convective. In this case there is more of a stochastic element and less dependence on the synoptic circulation.
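The jackknife computation is compact enough to sketch. The normal-approximation interval is our assumption, and the input ratios below are synthetic placeholders, not values from Table 2.

```python
import numpy as np

def jackknife_mean_ci(ratios, z=1.96):
    """Remove-one-at-a-time jackknife (Efron 1982) for the mean skill ratio
    over stations; skill is significant if the upper limit stays below one."""
    x = np.asarray(ratios, dtype=float)
    n = x.size
    loo = (x.sum() - x) / (n - 1)          # leave-one-out means
    se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
    mean = x.mean()
    return mean, mean - z * se, mean + z * se

# Hypothetical per-station MSE ratios for 17 stations.
mean, lower, upper = jackknife_mean_ci(
    np.random.default_rng(1).uniform(0.7, 0.95, size=17))
```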
Table 2. Ratio of mean-square errors (MSEs), relative to the observed daily precipitation values, for mean downscaled estimates vs climatological mean values.
Continuing the skill evaluation, if the downscaling provides large-scale information that usefully informs the local distribution of precipitation, then drawing from the precipitation CDF for the appropriate SOM node should yield a narrower distribution than drawing from the climatological daily rainfall distribution for the appropriate season. In other words, accounting for the synoptic atmospheric state ought to provide additional discrimination beyond a random draw from the climatological seasonal distribution. We define the width of the respective PDFs by the interquartile range (i.e., the difference between the 75th and 25th percentiles). Table 3 tabulates a skill metric defined as the ratio of the mean squared width of the PDFs constructed from the ensemble of 1500 downscaled precipitation values to that of the observed climatological distribution. In this analysis, to obtain more precise PDFs from larger samples, we use a less restrictive rain-day definition: any day with precipitation greater than 0 mm. A ratio below unity indicates that the downscaled values are drawn from a narrower PDF than the corresponding climatological distribution, suggesting that conditioning on the large-scale atmospheric state via the downscaling procedure provides additional predictive skill beyond climatology. We once again use a jackknife procedure to estimate confidence intervals and evaluate statistical significance. Apart from a number of stations in spring (February–April), the ratios remain significantly below unity. A possible reason for the spring exception is that a large number of spring rain days are governed by synoptic circulation patterns whose CDFs span wide daily precipitation ranges.
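A simplified sketch of the width metric for a single station and season; the paper's averaging of squared widths over days is condensed here to a single ratio.

```python
import numpy as np

def iqr(x):
    """Interquartile range: 75th minus 25th percentile."""
    q75, q25 = np.percentile(x, [75, 25])
    return q75 - q25

def width_skill_ratio(downscaled, climatological):
    """Ratio of squared PDF widths; below unity means conditioning on the
    synoptic state narrows the precipitation distribution."""
    return iqr(downscaled) ** 2 / iqr(climatological) ** 2
```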
Table 3. Ratio of widths (as defined by interquartile range) of downscaled NCEP vs observed climatological precipitation distributions (days with nonzero precipitation only).
We next compare the observed and NCEP-downscaled precipitation fields with respect to several key measures: mean monthly precipitation, average monthly number of rain days, and standard deviation of monthly mean precipitation (Table 4). The comparison is performed over the 17 stations for the period 1979–2005. For the following discussion, we use a randomly selected, representative realization from the ensemble of 1500 downscaling surrogates, though similar results are obtained for any realization. The observed average monthly precipitation amounts vary from roughly 80 to 110 mm among the different sites, with an average of 98.8 mm. The downscaling slightly underestimates the mean precipitation, at 95.4 mm. This underestimation bias is statistically significant according to the jackknife error estimates and appears to result from the downscaling underestimating the magnitude of the most intense daily precipitation events (larger than 50 mm). The observed and downscaled average monthly rain days are very close for all stations, with an identical average over all stations of 10.9 days. The observed and downscaled standard deviations of monthly precipitation totals are also similar, with the 95% confidence intervals overlapping (albeit only just), indicating that the downscaling procedure is able to reproduce the observed variability.
Table 4. Comparisons between observed and NCEP-downscaled average monthly precipitation amounts, average monthly numbers of rainy days, and standard deviations of monthly precipitation over the 17 stations during the period 1979–2005.
Having validated the downscaling procedure against the late-twentieth-century observations, we then turn to the GCM simulations. We compare the same characteristics of the precipitation field for the downscaled and raw GCM precipitation data over the full data record, 1961–2000. Results are analyzed both by GCM (Table 5), averaging over the 17 stations, and by station (Table 6), averaging over the GCMs. Errors relative to observations for both the downscaled and the raw GCM precipitation are compared by model (Table 7) and by station (Table 8), while results for individual seasons are provided in the supplementary information (supplementary Tables S2–S9). Uncertainties in averages across models are determined by jackknifing with respect to the model, while uncertainties in averages across stations are determined by jackknifing with respect to the station.
Table 5. Comparisons of average monthly precipitation amount, average monthly number of rainy days, and standard deviation of monthly precipitation amount for observed, downscaled GCM, and raw GCM precipitation for all nine GCMs, averaged over stations (period 1961–2000, all months). The 95% confidence intervals are given in parentheses.
Table 6. As in Table 5, but for all 17 stations averaged over the GCMs (period 1961–2000, all months). The 95% confidence intervals are given in parentheses.
Table 7. Absolute errors with respect to observations (expressed as percent) for the average monthly precipitation amounts, average monthly numbers of rainy days, and standard deviations of monthly precipitation amounts for each GCM, averaged across the 17 stations, for all months during the period 1961–2000.
Table 8. Absolute errors with respect to observations (expressed as percent) for the average monthly precipitation amounts, average monthly numbers of rainy days, and standard deviations of monthly precipitation amounts for each station, averaged across the nine GCMs, for all months during the period 1961–2000.
These comparisons yield a number of important insights. First of all, the downscaled results are clearly closer to observations than the raw GCM results for nearly all the models and all the stations. This finding is also true for averages over the stations and averages over the models. Extremely large errors are found with the raw precipitation field for individual GCMs with respect to, for example, mean monthly precipitation (see Tables 5 and 7). However, because these errors are often of opposite sign in different models—that is, either considerably below or above the observed value—they tend to cancel. Averages over GCMs are consequently considerably closer to observations than individual GCMs. This finding is consistent with the widely reported finding (e.g., Meehl et al. 2007) that averages over multimodel ensembles often provide more faithful estimates than individual models, presumably because of the cancellation of errors specific to models. In this particular case, the errors in question likely involve the differing convective parameterization schemes used to estimate precipitation in the various models.
While the downscaled model precipitation field is closer in nearly all characteristics to the observations than the models' raw precipitation field, statistically significant biases nonetheless remain in the downscaled model estimates. Mean monthly rainfall totals are biased slightly high, in contrast to the slight underestimate found for the downscaled NCEP data (Table 4): the mean monthly downscaled precipitation over all models and stations is 103.47 mm, while the observed value is 96.87 mm. Rain-day numbers are also biased slightly high, at 11.42 days per month averaged over all models and stations versus the observed 10.89 days per month; the difference is small but statistically significant. The monthly standard deviation of the downscaled precipitation field is 53.85 mm averaged over stations and GCMs, versus 48.58 mm for the observations, a difference that is again statistically significant and suggests that the downscaled GCM precipitation is slightly more variable in time than the observations.
Some of the differences among the downscaled GCM values are the result of model bias and differences in the simulated atmospheric states. However, the similarity of the downscaled data suggests that a large portion of the differences among the raw GCM precipitation fields is due to differences in precipitation parameterization schemes.
That such bias remains in the downscaled estimates is hardly surprising: there are clearly systematic biases in the various model fields (temperature, winds, lapse rates, etc.) from which the downscaled precipitation estimates are derived, and no downscaling method can cure these ills. However, there is far greater consistency among the downscaled model estimates with respect to all diagnostics (monthly mean precipitation, rain days, and monthly standard deviation) than among the raw model precipitation values. The downscaling procedure appears to provide a more reliable determination of whether precipitation is likely and, when it is, how much an event will produce. These observations reinforce previous findings (e.g., Hewitson and Crane 2006) that bypassing the convective parameterization schemes in the models yields precipitation estimates that are in all key attributes more consistent among models, closer to observations, and likely more robust with respect to future projections, an issue we discuss further below.
Looking more closely at the computed relative errors (Tables 7 and 8), we obtain some additional insights. We see, for example, that the ability of the downscaled precipitation fields to reproduce observed characteristics at our sites is highly model dependent (Table 7). Several models, specifically CGCM3.1, CSIRO Mk3.0, and MPI ECHAM5, reproduce observed monthly mean rainfall totals with less than 5% error relative to observations; these same models reproduce the observed frequency of rain days with less than 6% relative error and the monthly standard deviation with less than 11% relative error. By contrast, relative errors remain high even for the downscaled estimates (though considerably lower than for the raw model precipitation) for certain models, specifically GISS-ER (nearly 20% for mean precipitation, roughly 12% for frequency of rain days, and roughly 24% for the standard deviation). Looking at the breakdown of error by site (Table 8), certain sites (e.g., Stroudsburg and West Chester) show particularly large errors in mean monthly precipitation (11%–13%) and monthly standard deviation (18%–26%), while errors remain small for the number of rain days. These are among the wettest sites in our network, and they also produce some of the largest discrepancies for both mean precipitation and monthly standard deviation in the downscaling of modern (NCEP) observations (Table 4). It is reasonable to speculate that the larger biases seen in the downscaled simulations at these locations relate as much to intrinsic biases in the application of the downscaling at these sites as to any features specific to the model simulations themselves.
Finally, we return to the potential sensitivity of the downscaling procedure to the precise variables used in training the SOM. As discussed earlier, future projections of precipitation based on statistical downscaling methods show some sensitivity to which humidity variables are used. Previous work in Africa, for example, indicates that use of specific humidity can lead to projections of large future increases in mean precipitation, and comparisons with parallel dynamical downscaling results suggest that these larger projections are unrealistic (B. C. Hewitson 2009, personal communication). A plausible explanation is that the exponential dependence of specific humidity on temperature leads to a large extrapolation error when projecting future precipitation in a warmer atmosphere from a training interval that contains no analog states for future atmospheric temperatures. Arguably, using relative humidity as a training variable avoids this problem.
Nonetheless, one might view the sensitivity to the humidity variables used as a reasonable measure of a key structural uncertainty in projecting future precipitation changes. In this spirit, we have performed the same analyses described earlier, but with specific humidity added as an additional large-scale predictive variable in training the SOM. This alternative procedure yields remarkably similar results, including very similar relative errors (see supplementary Tables S10–S16). Thus, the skill assessments presented in this study cannot objectively favor one choice of humidity variables over the other. In additional work involving the downscaling of future climate change projections, we intend to use the sensitivity of the projections to this choice as a measure of structural uncertainty in projecting regional changes in precipitation characteristics (Ning et al. 2011, manuscript submitted to J. Climate).
4. Conclusions
Using a specific application, reproducing historical precipitation characteristics in Pennsylvania, we demonstrated how statistical downscaling using self-organizing maps to condition local precipitation estimates on large-scale atmospheric states can yield improved representations of precipitation characteristics. Using a variety of skill metrics and internal consistency tests, we showed that the downscaling procedure applied to modern (NCEP) atmospheric observations realistically reproduces observed daily precipitation characteristics at a network of sites throughout Pennsylvania.
Next, we demonstrated that applying the same SOM procedure to a suite of nine simulations from the CMIP3 multimodel historical simulation archive yields local precipitation estimates from the model simulations that agree better both with each other and with historical observations than the raw GCM precipitation fields do. While the downscaling procedure does not entirely eliminate biases in the modeled local precipitation statistics, it reduces these biases considerably and provides greater consistency among the models, suggesting that bypassing the individual models' varying precipitation parameterization schemes can yield more robust estimates of the distribution of local precipitation from the models.
Finally, we found that similarly skillful precipitation statistics could be obtained using either of two alternative representations of humidity information in training the SOM, one in which only relative humidity is used, another in which both relative and specific humidity are used. Given that one cannot objectively distinguish, on the basis of validation against historical data alone, which of these two schemes is preferable, it is advisable to consider the sensitivity to this choice as one measure of structural error in examining downscaling results applied to future climate change projections, where the two schemes may give somewhat different results.
Acknowledgments
This work was supported by the U.S. Department of Energy (DOE). Bruce Hewitson (University of Cape Town) kindly provided the downscaling code used, and Lisa Coop (University of Cape Town) provided assistance with the downscaling implementation. The NCEP grid data and station daily precipitation data over Pennsylvania were obtained from the National Centers for Environmental Prediction (NCEP) and the DOE’s Carbon Dioxide Information Analysis Center (CDIAC) at Oak Ridge National Laboratory, respectively. The WCRP CMIP3 multimodel GCM dataset is made available by the Program for Climate Model Diagnosis and Intercomparison (PCMDI) and the WCRP’s Working Group on Coupled Modeling (WGCM). Support of this dataset is provided by the DOE’s Office of Science.
REFERENCES
Benestad, R. E., 2005: Climate change scenarios for northern Europe from multi-model IPCC AR4 climate simulations. Geophys. Res. Lett., 32, L17704, doi:10.1029/2005GL023401.
Cavazos, T., 1999: Large-scale circulation anomalies conducive to extreme precipitation events and derivation of daily rainfall in northeastern Mexico and southeastern Texas. J. Climate, 12, 1506–1523.
Christensen, J. H., and Coauthors, 2007: Regional climate projections. Climate Change 2007: The Physical Science Basis, S. Solomon et al., Eds., Cambridge University Press, 847–940.
Crane, R. G., and B. C. Hewitson, 1998: Double CO2 precipitation changes for the Susquehanna basin: Down-scaling from the GENESIS general circulation model. Int. J. Climatol., 18, 65–76.
Crane, R. G., and B. C. Hewitson, 2003: Clustering and upscaling of station precipitation records to regional patterns using self-organizing maps (SOMs). Climate Res., 25, 95–107.
Efron, B., 1982: The Jackknife, the Bootstrap and Other Resampling Plans. Society for Industrial and Applied Mathematics, 92 pp.
Fitzpatrick, E. A., and A. Krishnan, 1967: A first-order Markov model for assessing rainfall discontinuity in central Australia. Theor. Appl. Climatol., 15, 242–259.
Gallus, W. A., and M. Segal, 2004: Does increased predicted warm-season rainfall indicate enhanced likelihood of rain occurrence? Wea. Forecasting, 19, 1127–1135.
Hanssen-Bauer, I., C. Achberger, R. E. Benestad, D. Chen, and E. J. Førland, 2005: Statistical downscaling of climate scenarios over Scandinavia. Climate Res., 29, 255–268.
Hayhoe, K., and Coauthors, 2006: Past and future changes in climate and hydrological indicators in the US Northeast. Climate Dyn., 28, 381–407, doi:10.1007/s00382-006-0187-8.
Hershfield, D. M., 1971: The frequency of dry periods in Maryland. Chesapeake Sci., 12, 72–84.
Hewitson, B. C., and R. G. Crane, 1992a: Large-scale atmospheric control on local precipitation in tropical Mexico. Geophys. Res. Lett., 19, 1835–1838.
Hewitson, B. C., and R. G. Crane, 1992b: Regional climates in the GISS global circulation model: Synoptic-scale circulation. J. Climate, 5, 1002–1011.
Hewitson, B. C., and R. G. Crane, 1992c: Regional-scale climate prediction from the GISS GCM. Palaeogeogr. Palaeoclimatol. Palaeoecol., 97, 249–267.
Hewitson, B. C., and R. G. Crane, 1996: Climate downscaling: Techniques and application. Climate Res., 7, 85–95.
Hewitson, B. C., and R. G. Crane, 2002: Self-organizing maps: Applications to synoptic climatology. Climate Res., 22, 13–26.
Hewitson, B. C., and R. G. Crane, 2006: Consensus between GCM climate change projections with empirical downscaling: Precipitation downscaling over South Africa. Int. J. Climatol., 26, 1315–1337.
Houghton, J. T., Y. Ding, D. J. Griggs, M. Noguer, P. J. van der Linden, X. Dai, K. Maskell, and C. A. Johnson, Eds., 2001: Climate Change 2001: The Scientific Basis. Cambridge University Press, 881 pp.
Kohonen, T., 1989: Self-Organization and Associative Memory. 3rd ed. Springer-Verlag, 312 pp.
Kohonen, T., 1995: Self-Organizing Maps. Springer, 362 pp.
Liang, X., E. F. Wood, and D. P. Lettenmaier, 1996: Surface soil moisture parameterization of the VIC-2L model: Evaluation and modification. Global Planet. Change, 13, 195–206.
Liang, X., E. F. Wood, and D. P. Lettenmaier, 1999: Modeling ground heat flux in land surface parameterization schemes. J. Geophys. Res., 104, 9581–9600.
Maurer, E. P., 2007: Uncertainty in hydrologic impacts of climate change in the Sierra Nevada, California, under two emission scenarios. Climatic Change, 82, 309–325.
Meehl, G. A., and Coauthors, 2007: Global climate projections. Climate Change 2007: The Physical Science Basis, S. Solomon et al., Eds., Cambridge University Press, 747–845.
Ning, L., and Y. Qian, 2009: Interdecadal change in extreme precipitation over south China and its mechanism. Adv. Atmos. Sci., 26, 109–118.
Randall, D. A., and Coauthors, 2007: Climate models and their evaluation. Climate Change 2007: The Physical Science Basis, S. Solomon et al., Eds., Cambridge University Press, 589–662.
Reusch, D. B., and R. B. Alley, 2007: Antarctic sea ice: A self-organizing map-based perspective. Ann. Glaciol., 46, 391–396.
Reusch, D. B., R. B. Alley, and B. C. Hewitson, 2007: North Atlantic climate variability from a self-organizing map perspective. J. Geophys. Res., 112, D02104, doi:10.1029/2006JD007460.
Shortle, J., and Coauthors, 2009: Pennsylvania climate impact assessment report to the Department of Environmental Protection. Environment and Natural Resources Institute, The Pennsylvania State University, 7000-BK-DEP4252, 350 pp.
Sturaro, G., 2003: A close look at the climatological discontinuities present in the NCEP/NCAR reanalysis temperature due to the introduction of satellite data. Climate Dyn., 21, 309–316.
Tennant, W., 2004: Considerations when using pre-1979 NCEP/NCAR reanalyses in the Southern Hemisphere. Geophys. Res. Lett., 31, L11112, doi:10.1029/2004GL019751.
Trenberth, K. E., and Coauthors, 2007: Observations: Surface and atmospheric climate change. Climate Change 2007: The Physical Science Basis, S. Solomon et al., Eds., Cambridge University Press, 235–336.
Wagener, T., and Coauthors, 2010: The future of hydrology: An evolving science for a changing world. Water Resour. Res., 46, W05301, doi:10.1029/2009WR008906.
Wilby, R. L., and T. M. L. Wigley, 1997: Downscaling general circulation model output: A review of methods and limitations. Prog. Phys. Geogr., 21, 530–548.
Wood, A. W., E. P. Maurer, A. Kumar, and D. P. Lettenmaier, 2002: Long-range experimental hydrologic forecasting for the eastern United States. J. Geophys. Res., 107, 4429, doi:10.1029/2001JD000659.
Wood, A. W., L. R. Leung, V. Sridhar, and D. P. Lettenmaier, 2004: Hydrologic implications of dynamical and statistical approaches to downscaling climate model outputs. Climatic Change, 62, 189–216.
