1. Introduction
The description of the mean state and variability of recent climate is important for a number of purposes in global change research. These include monitoring and detecting climate change, climate model evaluation, calibration of or merging with satellite data, biogeochemical modeling, and construction of climate change scenarios (New et al. 1999). Datasets of surface climate, which describe variability in space and time (Hulme 1992; Jones 1994; Easterling et al. 1997), historically have had incomplete spatial coverage and have been of coarse resolution (≥2.5° lat–long). This is because their primary purposes, which are monitoring current climate (and its historic perspective), climate change detection, and general circulation model (GCM) evaluation, do not necessarily require spatially continuous fields or higher resolution.
There has been a growing demand for datasets with high spatial (e.g., 0.5° lat–long) and temporal (e.g., monthly or daily) resolution that are also continuous over the space–time domain of interest. Potential applications for such datasets include understanding the role of climate in biogeochemical cycling (Dai and Fung 1993; Cramer and Fischer 1996), climate change scenario construction (Carter et al. 1994; Hulme et al. 1995) and high-resolution climate model evaluation (Christensen et al. 1997). Yet there currently are few datasets that satisfy the requirement of high spatio–temporal resolution. Notable exceptions are the monthly 1971–94 Global Precipitation Climatology Project (GPCP) dataset (Rudolf et al. 1994; Xie and Arkin 1996; Xie et al. 1996; Huffman et al. 1997); the monthly 1900–88, 2.5° lat–long precipitation dataset of Dai et al. (1997a, hereinafter DAI); and the 0.5° lat–long daily dataset being developed by Piper and Stewart (1996, hereinafter PS). However, these products either cover relatively short periods (1970s–present—GPCP, PS), are limited to precipitation (GPCP, PS, DAI) and maximum and minimum temperature (PS), do not include an elevation dependence in their interpolation schemes (GPCP, PS, DAI), or have a relatively coarse resolution (DAI). A further limitation is that GPCP and PS interpolate directly from station time series: their methodology has to overcome difficulties in interpolating monthly climate over complex terrain and they cannot make use of the more extensive network of station climatological normals to define a mean climatology (see below).
In this paper, we describe the construction of a new dataset of monthly surface climate over global land areas, excluding Antarctica, for the period of 1901–96. The dataset is gridded at 0.5° lat–long resolution and comprises a suite of seven variables, namely, precipitation, wet-day frequency, mean temperature, diurnal temperature range, vapor pressure, cloud cover, and ground frost frequency.
In constructing the monthly grids, we used an “anomaly” approach, which attempts to maximize available station data in space and time (New et al. 1999). In this technique, grids of monthly anomalies relative to a standard normal period (in our case, 1961–90) were first derived. The anomaly grids were then combined with a high-resolution mean monthly climatology to arrive at fields of estimated monthly surface climate. We used the 0.5° lat–long 1961–90 climatology described in a companion paper (New et al. 1999) for this purpose.
The advantage of this approach is that the number of archived and easily obtainable station normals is far greater than that of station time series, particularly as one goes back in time. Using as many stations as possible to generate the mean fields together with an explicit treatment of elevation dependency maximizes the representation of spatial variability in mean climate. Monthly anomalies, on the other hand, tend to be more a function of large-scale circulation patterns and relatively independent of physiographic control. Therefore, a comparatively less extensive network can be used to describe the month-to-month departures from the mean climate.
We have divided the seven climatic elements into two groups, primary and secondary variables. The former, comprising precipitation, mean temperature, and diurnal temperature range, was considered to have sufficient station coverage to attempt the derivation of grids directly from station anomalies for the entire period of 1901–96. The interpolation of the primary variable anomaly grids is covered in the first half of this paper, where we also compare our dataset of primary variables over a few selected regions with some other existing long-term, but coarsely gridded, datasets.
Station networks with time series of secondary variables, namely, wet-day frequency, vapor pressure, cloud cover, and ground frost frequency, were insufficient for the derivation of anomaly fields directly from station data. We therefore used empirical relationships to derive synthetic anomalies from the gridded anomalies of primary variables and merge these with station anomalies of secondary variables over regions where such data were available. The merged anomalies were then combined with the 1961–90 normal grids mentioned above, thereby standardizing the anomalies against high-resolution observed data. This approach is described in more detail in the second half of the paper, along with an evaluation of the various empirical relationships. We end the paper with a discussion of the merits and limitations of this new dataset and our conclusions.
2. Primary variables
a. Datasets
Three global station datasets compiled by the Climatic Research Unit (CRU) form the basis for the construction of the gridded anomalies of primary variables. The precipitation (Eischeid et al. 1991; Hulme 1994, updated) and mean temperature (Jones 1994, updated) station data have been compiled by the CRU over the last 20 yr. The diurnal temperature range dataset is based on the Global Historical Climatology Network (GHCN) maximum and minimum temperature data (Easterling et al. 1997) but has been updated for more recent years by CRU and enhanced with additional station data obtained by the CRU and the U.K. Meteorological Office (Horton 1995, updated). The original data have been subjected to comprehensive quality control over the years, as described by the above authors. Updates for more recent years and additional station data collated by the CRU have also been checked for homogeneity and outliers.
The CRU precipitation data have not been corrected for gauge biases, the most significant of which is undercatch of solid precipitation in colder areas. Undercatch also varies with gauge type, so periodic instrument changes can result in inhomogeneities in the records. The correction of individual records requires detailed local meteorological and station metainformation, which are not readily available.
The station networks for all three variables exhibit a gradual increase in the total number of stations from 1901 to about 1980, after which the numbers decline (Figs. 1–3). The recent reduction in station numbers is primarily in areas with good or reasonable station coverage. However, the spatial coverage of stations reporting diurnal temperature ranges shows a more serious reduction in the 1990s. This, in due course, should be alleviated by the inclusion of mean monthly maximum and minimum temperature in the post-1995 monthly CLIMAT reports and by updated datasets for the former USSR and China, once they are included in the CRU dataset.
The station density that is required to adequately describe monthly spatial variability is characteristically greater for precipitation than for diurnal temperature range and mean temperature. For example, Dai et al. (1997a) found that zonally averaged interstation correlations for annual precipitation fall to an insignificant level (∼0.36 for N = 30) at distances of 200 km for 0°–30°N, 400 km for 30°–60°N, 300 km for 60°–90°N, 550 km for 0°–30°S, and 800 km for 30°–60°S. This compares with distances of between 1200 km and 2000 km for mean temperatures reported by Hansen and Lebedeff (1987) and Jones et al. (1997).
We build on the approach of Dai et al. (1997a) and define the correlation decay distance (CDD) as the distance at which zonally averaged interstation correlation is no longer significant at the 95% level (∼0.36 for N = 30). Our own analyses, using station records with at least 30 yr of data, indicate that the larger CDDs in the Southern Hemisphere reported by Dai et al. (1997a) do not occur when monthly precipitation anomalies are considered (Fig. 4). Indeed, we find similar CDDs for comparable latitude bands in the Northern and Southern Hemispheres (350–400 km for 0°–30°N/S and 400–500 km for 30°–60°N/S), although northern CDDs are noticeably shorter in the Northern Hemisphere summer. In addition, CDDs exhibit seasonality, particularly in the case of temperature where winter CDDs are much greater than in summer (Jones et al. 1997). Diurnal temperature range CDDs are intermediate between those of precipitation and mean temperature. We use globally averaged CDDs for each variable during the interpolation of monthly anomaly grids described in the next section.
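The CDD definition above can be illustrated with a short Python sketch. It computes the critical correlation for N = 30 from a tabulated Student's t value and then reads a CDD off a binned correlation-versus-distance curve by linear interpolation. The function names, the interpolation between bins, and the sample curve are assumptions made for the example; they are not taken from our analysis.

```python
import math

# Two-tailed 95% critical value of Student's t for 28 degrees of freedom
# (N - 2 with N = 30), taken from standard statistical tables.
T_CRIT_DF28 = 2.048

def critical_r(n, t_crit):
    """Smallest |r| that is significant for a sample of n pairs."""
    df = n - 2
    return t_crit / math.sqrt(df + t_crit ** 2)

def correlation_decay_distance(distances_km, mean_corr, r_crit):
    """Distance at which the binned mean correlation first crosses below r_crit.

    Linear interpolation is used between the two bins that bracket r_crit.
    Returns None if no such crossing is found.
    """
    pairs = list(zip(distances_km, mean_corr))
    for (d0, r0), (d1, r1) in zip(pairs, pairs[1:]):
        if r0 >= r_crit > r1:
            return d0 + (r0 - r_crit) * (d1 - d0) / (r0 - r1)
    return None

r_crit = critical_r(30, T_CRIT_DF28)

# Hypothetical zonally averaged correlations, for illustration only.
dist = [0, 200, 400, 600, 800]
corr = [1.0, 0.62, 0.40, 0.30, 0.22]
cdd = correlation_decay_distance(dist, corr, r_crit)
```

With these made-up values the threshold is crossed between the 400-km and 600-km bins, so the sketch returns a distance between those two bin centers.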
b. Anomaly interpolation
Prior to interpolation, each station time series was converted to anomalies relative to the 1961–90 mean. Series with fewer than 20 yr of data during 1961–90 were excluded from the analysis. Anomalies for mean temperature and diurnal temperature range were expressed in absolute units (i.e., °C) while precipitation was expressed as a percentage of the 1961–90 mean. We used percentage units for precipitation because the variance of precipitation is closely related to the mean. Interpolation in percentage units preserves this relationship better than interpolation in absolute units. Other transformations that preserve the variance are also possible, such as expression in units of standard deviation (Jones and Hulme 1996) or in terms of some other distribution (e.g., Diaz et al. 1989; Hutchinson 1995b). Of all these transformations, absolute and percentage anomalies are the simplest, particularly because the reexpression into absolute monthly units requires only a mean field.
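A minimal Python sketch of this conversion follows. It handles one station and one calendar month at a time. The function name, the dictionary layout, the handling of a zero normal, and the choice to express the percentage as a departure from the normal (so that zero means "equal to the 1961–90 mean") are assumptions made for illustration.

```python
def station_anomalies(series, kind, base=(1961, 1990), min_years=20):
    """Convert one calendar month's station series to anomalies.

    series: dict mapping year -> value (missing years are simply absent).
    kind:   "absolute" (e.g., temperature in deg C) or "percent" (precipitation).
    Returns None when the base period holds fewer than min_years values.
    """
    base_vals = [v for y, v in series.items() if base[0] <= y <= base[1]]
    if len(base_vals) < min_years:
        return None
    normal = sum(base_vals) / len(base_vals)
    if kind == "absolute":
        return {y: v - normal for y, v in series.items()}
    if kind == "percent":
        if normal == 0:
            return None  # percentage undefined for a zero normal (assumed handling)
        return {y: 100.0 * (v - normal) / normal for y, v in series.items()}
    raise ValueError("kind must be 'absolute' or 'percent'")

# Illustrative data only.
temps = {y: 10.0 for y in range(1961, 1991)}
temps[1995] = 11.5
t_anom = station_anomalies(temps, "absolute")

precip = {y: 50.0 for y in range(1961, 1991)}
precip[1901] = 75.0
p_anom = station_anomalies(precip, "percent")

short = station_anomalies({y: 1.0 for y in range(1980, 1991)}, "absolute")
```

The last call illustrates the exclusion rule: a series with only 11 yr of data in 1961–90 yields no anomalies.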
We investigated several methods to interpolate the monthly station anomalies to a regular 0.5° lat × 0.5° long grid. These included surface-fitting procedures such as thin-plate splines (Wahba 1990; Hutchinson and Gessler 1994; Hutchinson 1995a) and minimum-curvature splines (Franke 1982), Delaunay triangulation, Thiessen (1911) polygon area averaging, and angular-distance weighted averaging (Shepard 1984; Willmott et al. 1985). We found that the surface-fitting procedures (splines) were generally unsuitable for interpolation of anomaly fields because of the sharp spatial discontinuities that occur, particularly for precipitation. When the spline interpolation was parameterized to capture these abrupt spatial jumps in precipitation (i.e., to have a high surface roughness), the fitted surfaces exhibited considerable undershoot and overshoot in regions with poor station control. Conversely, when the interpolation was parameterized to have low surface roughness, undershoot and overshoot were reduced but at the expense of excessive smoothing and a reduced variance in the gridded monthly fields. This is in contrast to our experience in the interpolation of climatological normals (Part 1, New et al. 1999), where gradients in long-term climate are more continuous in longitude–latitude–elevation space and hence more amenable to surface fitting procedures.
Triangulation and Thiessen polygon interpolation are computationally efficient but employ a limited number of data points in the estimation of gridpoint values and take no account of station distance. Angular distance–weighted (ADW) interpolation can make use of any (user defined) number of stations and employs a distance weighting function so that stations closest to the grid point of interest carry greater weight. Gridpoint estimates from triangulation, Thiessen, and ADW methods cannot exceed the magnitude of the highest/lowest value in the contributing data points and are therefore not subject to undershoot and overshoot. We compared triangulation, Thiessen, and ADW interpolation and found that ADW performed better in areas with sparse data: it produced a less-irregular grid because more stations contribute to each gridpoint average and because those stations are weighted by distance. Consequently, ADW was used to interpolate the monthly anomalies.
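The distance-weighting idea can be sketched in Python as follows. This is a simplified illustration, not the full ADW scheme: the angular (directional isolation) term is omitted, and the exponential decay of weight with distance scaled by the CDD, the default of eight stations, and all names are assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def distance_weighted_estimate(grid_lat, grid_lon, stations, cdd_km, n_nearest=8):
    """Weighted mean of the n_nearest station anomalies.

    stations: non-empty list of (lat, lon, anomaly) tuples.
    Weights decay as exp(-distance / cdd_km); this decay form is an
    assumption, and the angular term of ADW is left out.
    """
    if not stations:
        raise ValueError("at least one station is required")
    ranked = sorted(
        (great_circle_km(grid_lat, grid_lon, lat, lon), anom)
        for lat, lon, anom in stations
    )[:n_nearest]
    weights = [math.exp(-d / cdd_km) for d, _ in ranked]
    return sum(w * a for w, (_, a) in zip(weights, ranked)) / sum(weights)

# Illustrative precipitation anomalies (percent) at three made-up stations.
stations = [(52.0, 0.0, 10.0), (52.5, 1.0, -5.0), (48.0, 5.0, 30.0)]
estimate = distance_weighted_estimate(52.0, 0.0, stations, cdd_km=450.0)
```

Because the estimate is a weighted mean with positive weights, it always lies between the smallest and largest contributing anomaly, which is the property that rules out undershoot and overshoot.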
Interpolation as a function of latitude and longitude, as in ADW, ignores the influence of elevation. As noted earlier, a large proportion of the spatial variation in monthly temperature anomalies is a function of large-scale circulation features and is relatively independent of topography (New et al. 1999). Interpolation of mean temperature and diurnal temperature range as a function of only latitude and longitude is therefore adequate. This is not necessarily true for precipitation, where inclusion of elevation as a copredictor has been shown to improve the accuracy of the anomaly interpolation in some situations (M. F. Hutchinson 1997, personal communication). However, the ADW gridding employed in this study did not permit the inclusion of elevation as a predictor. Elevation could have been included using a trivariate interpolation technique such as splines or co-kriging, but these would have resulted in the smoothing problems described above. Moreover, the inclusion of elevation as a predictor invokes a penalty by markedly reducing the degrees of freedom available for defining a fitted surface. It was only over Europe, the United States, and southern Canada that there were sufficient stations to overcome this limitation. For the above reasons, it was decided not to use elevation but to employ the same ADW interpolation in all regions.
As discussed earlier, a station is unlikely to provide useful information about the variable of interest at grid points beyond its CDD. To prevent extrapolation to unrealistic values, the interpolated anomaly fields were forced toward zero at grid points beyond the influence of any stations. This was accomplished by creating synthetic stations with anomaly values of zero in regions where there were no stations within a predefined distance chosen to be equal to the global-mean CDD. These distances were 450 km for precipitation, 750 km for diurnal temperature range, and 1200 km for mean temperature. Figures 1–3 show the areas for selected years where there are no stations within these distances. Although globally averaged CDDs were used, there is scope for the application of latitudinally or spatially varying CDDs, and this will be considered in future versions of the dataset.
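The placement of zero-anomaly synthetic stations can be sketched as below. The global-mean CDDs are those quoted above; placing one synthetic station at each uncovered grid point, the function names, and the sample coordinates are assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0

# Global-mean correlation decay distances (km) quoted in the text.
GLOBAL_MEAN_CDD_KM = {
    "precipitation": 450.0,
    "diurnal_temperature_range": 750.0,
    "mean_temperature": 1200.0,
}

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def add_zero_anomaly_stations(grid_points, stations, cdd_km):
    """Return the stations plus a zero-anomaly synthetic station at every
    grid point that has no real station within cdd_km."""
    synthetic = [
        (glat, glon, 0.0)
        for glat, glon in grid_points
        if all(great_circle_km(glat, glon, slat, slon) > cdd_km
               for slat, slon, _ in stations)
    ]
    return list(stations) + synthetic

# One made-up station on the equator and two candidate grid points.
stations = [(0.0, 0.0, 25.0)]
grid_points = [(0.0, 1.0), (0.0, 10.0)]

for_precip = add_zero_anomaly_stations(
    grid_points, stations, GLOBAL_MEAN_CDD_KM["precipitation"])
for_temp = add_zero_anomaly_stations(
    grid_points, stations, GLOBAL_MEAN_CDD_KM["mean_temperature"])
```

In this example the grid point 10° of longitude away (about 1100 km) lies beyond the precipitation CDD and receives a synthetic zero, but it is still within the mean temperature CDD and does not.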
c. Combination with climatology
We combined the interpolated anomaly fields for each month from 1901 to 1996 with the CRU 0.5° 1961–90 mean monthly climatology (New et al. 1999) to arrive at monthly grids of surface climate. This combined dataset is henceforth referred to as CRU05. The CRU 1961–90 climatology was constructed with this purpose in mind and has a number of advantages over other climatologies; chief among these is that it is strictly constrained to the period 1961–90. This permitted the addition of the anomaly fields, which were standardized against the 1961–90 period, without any biases arising from temporal sampling mismatches. The CRU climatology is also the only published climatology of global land areas that includes all of the climate elements in the anomaly dataset.
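As a minimal sketch of this combination step, the Python below adds absolute anomalies to the normal and scales the normal by percentage anomalies. The function names, the nested-list grid layout with None for cells without data, and the convention that a precipitation anomaly of zero percent means "equal to the normal" are assumptions made for the example.

```python
def combine_with_climatology(normal, anomaly, kind):
    """Rebuild a monthly value from the 1961-90 normal and an anomaly.

    kind "absolute": normal + anomaly (e.g., temperature in deg C).
    kind "percent":  normal * (1 + anomaly / 100) (e.g., precipitation),
                     assuming anomalies are percent departures from the normal.
    """
    if kind == "absolute":
        return normal + anomaly
    if kind == "percent":
        return normal * (1.0 + anomaly / 100.0)
    raise ValueError("kind must be 'absolute' or 'percent'")

def combine_grid(normal_grid, anomaly_grid, kind):
    """Apply the combination cell by cell; None marks cells without data."""
    return [
        [None if n is None or a is None else combine_with_climatology(n, a, kind)
         for n, a in zip(nrow, arow)]
        for nrow, arow in zip(normal_grid, anomaly_grid)
    ]

# Illustrative 2 x 2 temperature grid (deg C) and a single precipitation cell (mm).
temp_field = combine_grid([[12.0, None], [8.0, 15.0]],
                          [[-1.5, None], [0.0, 2.0]], "absolute")
precip_field = combine_grid([[80.0]], [[25.0]], "percent")
```

A cell with a zero anomaly simply returns the 1961–90 normal, which is how regions without station control fall back to the climatology.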
In some areas with more-sparse station coverage, the 1961–90 average of the monthly anomaly grids diverged from zero, for example, over Angola and the Democratic Republic of the Congo. This arose directly from the interpolation error in the individual anomaly fields, which, not unexpectedly, did not add up to zero. To maintain consistency, individual fields from 1961 to 1990 were adjusted so that their 1961–90 mean was zero by subtracting this mean interpolation error.
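The adjustment described above can be sketched for a single grid cell and calendar month as follows. The function name and data layout are assumptions; the short base period in the example is used only to keep the illustration small.

```python
def recenter_anomalies(anoms_by_year, base=(1961, 1990)):
    """Subtract the base-period mean from the base-period anomalies.

    anoms_by_year: dict mapping year -> interpolated anomaly at one grid
    cell for one calendar month. Years outside the base period are
    returned unchanged.
    """
    base_years = [y for y in anoms_by_year if base[0] <= y <= base[1]]
    if not base_years:
        return dict(anoms_by_year)
    offset = sum(anoms_by_year[y] for y in base_years) / len(base_years)
    return {
        y: (a - offset if base[0] <= y <= base[1] else a)
        for y, a in anoms_by_year.items()
    }

# Illustration with a three-year "base period" whose mean error is 1.0.
raw = {1960: 1.0, 1961: 0.5, 1962: 1.5, 1963: 1.0}
adjusted = recenter_anomalies(raw, base=(1961, 1963))
```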
It should be noted that a direct consequence of the relaxation of the anomaly surfaces to zero in regions with no data coverage is that the resulting monthly climate relaxes toward the 1961–90 climatology in such areas. This characteristic of the dataset is discussed in more detail elsewhere in the paper.
d. Evaluation
Major sources of error in gridded datasets of this nature are instrumental (isolated errors, systematic errors, and inhomogeneity), inadequate station coverage, and interpolation errors (Groisman et al. 1991; Dai et al. 1997a; Jones et al. 1999). Isolated errors and subtle inhomogeneities not detected during quality control do not have a significant effect at the regional scale. However, such errors are noticeable at grid points near the offending station, particularly if the network is sparse. Inadequate station coverage is the largest source of error, but there is little that can be done about this except to ensure that the existing data are error free and that the interpolation methodology makes maximum use of the available data.
Extensive evaluation of the CRU05 gridded data is beyond the scope of this paper. An intercomparison of CRU05 precipitation and several other long-term instrumental and shorter-term satellite and/or gauge-based datasets is the focus of a separate study (Hulme et al. 2000, manuscript submitted to J. Climate). In this section, a limited comparison with two precipitation datasets, one mean temperature dataset, and one diurnal temperature range dataset is presented to highlight the differences that can arise due to differing station networks and/or interpolation approaches.
1) Precipitation
Regional time series derived from CRU05 and two other precipitation datasets were compared over two rectangular regions with good and poor station coverage, respectively: the United Kingdom (49°N, 11°W–61°N, 3°E) and the Amazon basin (15°S, 70°W–5°N, 40°W). The other two datasets are those of Hulme (1994, updated) (HULME) and Dai et al. (1997a) (DAI), both of which have a spatial resolution of 2.5° lat–long. These were the only two other global datasets of monthly precipitation covering the period of 1901–96 known to the authors. Both these datasets were produced by interpolation of station anomalies, using a Thiessen polygon (HULME) and a spherical inverse-distance weighted approach (DAI), respectively. While HULME grid points are estimated using only those stations within the grid box, DAI grid points employ an influence radius of 350 km to select data. However, HULME uses a spherical angular distance weighting with an influence radius of 600 km to infill missing data at individual stations prior to the Thiessen gridding process.
Area-averaged time series for the two regions were constructed using the approach recommended by Jones and Hulme (1996). Gridpoint data were transformed to anomalies from the 1961–90 mean and expressed in standard deviation units. These were then averaged with a latitudinal weighting and back-transformed to millimeter units using the regionally averaged 1961–90 monthly means and standard deviations. The CRU05 dataset was first averaged to 2.5° resolution, again using a latitudinal weighting. Both DAI and CRU05 were masked using HULME, ensuring that these more spatially complete datasets do not have more grid points than HULME. In fact, at times, the masked DAI grids had fewer grid points than HULME because (i) the DAI land–sea mask is slightly different from that of HULME and (ii) at the beginning and end of the record, DAI had fewer contributing stations than HULME, resulting in fewer grid points with data.
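One time step of this area-averaging procedure can be sketched as below. The cosine-of-latitude form of the latitudinal weighting, the use of the same weights for the regional mean and standard deviation, and all names and values are assumptions made for the example.

```python
import math

def cos_lat_weighted_mean(values, lats):
    """Mean of values weighted by the cosine of each cell's latitude."""
    weights = [math.cos(math.radians(lat)) for lat in lats]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def regional_value(cell_values, cell_means, cell_sds, lats):
    """Regional value for one time step.

    1. Standardize each grid cell against its own 1961-90 mean and SD.
    2. Average the standardized anomalies with latitudinal weighting.
    3. Back-transform with the regionally averaged mean and SD.
    """
    z = [(v - m) / s for v, m, s in zip(cell_values, cell_means, cell_sds)]
    z_region = cos_lat_weighted_mean(z, lats)
    mean_region = cos_lat_weighted_mean(cell_means, lats)
    sd_region = cos_lat_weighted_mean(cell_sds, lats)
    return mean_region + z_region * sd_region

# Two made-up grid cells at 0 deg and 60 deg latitude (precipitation in mm).
lats = [0.0, 60.0]
means = [100.0, 50.0]
sds = [10.0, 5.0]

wet_month = regional_value([110.0, 55.0], means, sds, lats)   # both cells at +1 SD
normal_month = regional_value(means, means, sds, lats)        # both cells at the mean
```

Averaging in standard deviation units keeps high-variance cells from dominating the regional series, which is the motivation for standardizing before the weighted average.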
The resulting regional time series of annual precipitation, expressed as anomalies relative to the 1901–96 mean, are shown in Fig. 5. In both regions, the three datasets agree in broad detail but exhibit some differences. Most notably, the CRU05 time series tends to have lower interannual variance than the other two, as exhibited by overlapping 20-yr coefficients of variation (CVs; see Fig. 5). We ascribe these differences primarily to different station selection criteria in the interpolation schemes. As noted earlier, HULME only makes use of those stations situated within a 2.5° grid cell. DAI used all stations that fall within 350 km of the grid point, which results in a larger search radius (about 3° at the equator) than HULME. In contrast, we select the eight nearest stations, with more distant stations having exponentially decreasing influence. Consequently, in regions where there are fewer than eight stations within the selection region of HULME or DAI, CRU05 will be derived from more stations that are also more dispersed, both of which will tend to reduce the variance of the derived gridpoint values. This occurs at most grid points in the Amazon and at several in the United Kingdom.
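The overlapping 20-yr CV diagnostic can be sketched as follows, applied to annual totals rather than anomalies so that the mean in the denominator is meaningful. The 10-yr step between windows, the use of the sample standard deviation, and the synthetic series are assumptions made for illustration; they are not the values plotted in Fig. 5.

```python
import statistics

def overlapping_cv(values, window=20, step=10):
    """Coefficient of variation (SD / mean) in overlapping windows.

    values: annual totals in chronological order.
    step:   offset between window starts (an assumed 50% overlap by default).
    """
    out = []
    for start in range(0, len(values) - window + 1, step):
        chunk = values[start:start + window]
        out.append(statistics.stdev(chunk) / statistics.mean(chunk))
    return out

# Synthetic 96-yr series: same mean, different interannual spread.
variable = [1000.0 + 100.0 * (-1) ** i for i in range(96)]
smoothed = [1000.0 + 50.0 * (-1) ** i for i in range(96)]

cv_variable = overlapping_cv(variable)
cv_smoothed = overlapping_cv(smoothed)
```

A series built from more, and more widely dispersed, stations behaves like the second example: its mean is unchanged but its CV is lower in every window.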
In the Amazon region, all the series exhibit increased interannual variance at the beginning and end of the record. This appears to be a real signal that is reflected in the raw station data (see Fig. 5). However, HULME, and particularly DAI, show a steeper rise in interannual variance than CRU05 at the beginning of the record. This difference arises for two reasons. First, prior to 1930 and especially before 1910, there are very few contributing stations to HULME and CRU05 (and presumably DAI). Thus, for HULME and DAI, gridpoint estimates are derived from only a few stations or only a single station, which serves to increase the gridpoint variance. This does not occur for CRU05, where each gridpoint average remains a function of the eight contributing stations. If anything, the CRU05 variance will be reduced as more distant stations are included (albeit with low weights) in data-sparse years. The second cause of increased variance is the reduction in the number of grid points contributing to the Amazon series. Prior to 1910, there are only two grid points for HULME (and CRU05, when masked) and only one for DAI (when masked with HULME). Thus, the regional series approach a single gridpoint series, which will have higher variance than a multigridpoint series.
Also shown in Fig. 5 are the regional series derived from the complete CRU05 grid (i.e., not masked by HULME) over each region. In the United Kingdom, this is very similar to the masked CRU05 series because of the near-complete grid coverage in HULME throughout the series. In the Amazon, the complete and masked CRU05 series diverge markedly at the beginning and, to a lesser extent, the end of the century. In the masked series, only a few grid points in the central Amazon have data, and the greater number of grid points with data in the southeast part of the region biases the series. This southeast region coincidentally has a higher interannual variance than the rest of the Amazon, so in early years where there are few grid points in the central Amazon, this heightened variance dominates the masked series. In contrast, the spatially complete series retains an equal contribution from each grid point throughout the century, and the low-variance central Amazon reduces the effect of the southeast in all years.
The southeast Amazon is also the source of the increased variance in the station-based series at the beginning and end of the century (Fig. 5). This produces a larger increase in the variance of the masked series at the beginning of the century because the counteracting effect of the central Amazon is minimal (few grid points with data). At the end of the record, the increased variance in the southeast does not have such an influence because there is relatively good coverage over the rest of the Amazon.
Figure 5 also provides a qualitative indication of the error that may be potentially associated with gridded precipitation datasets. All three have been generated using datasets that have many stations in common but with different interpolation methods. Where the station network is poor, contrasting interpolation methods can produce quite varied results. Where the network is good, the three datasets tend to converge, but the CRU05 grids produce regional time series with slightly lower interannual variance. This reduced variance would appear to be the cost of interpolation to produce spatially continuous fields in data-sparse regions.
2) Mean temperature
Regional time series of mean annual temperature were calculated from the CRU05 dataset and the dataset of Jones (1994, updated; hereinafter JONES) for the United Kingdom and Amazon (Fig. 6). This was accomplished in the same way as precipitation except that the gridded data were transformed to degrees Celsius rather than standard deviation anomalies prior to area averaging. The station network is sparse over the Amazon, with a maximum of 25 stations, but drops off to 2 (JONES) and between 6 and 7 (CRU05) before 1950. Although the two series are well correlated, CRU05 is less variable and diverges (warm offset) quite markedly from JONES before 1950. This is primarily due to the presence of fewer stations in JONES; annual time series for the Amazon produced by Victoria et al. (1998) from a similar set of stations to CRU05 agree better with CRU05 than JONES over this period (not shown). Over the period with relatively good station coverage (1950–90), the interannual variability of JONES markedly increases while that of CRU05 remains relatively constant. Changes in the variance of both grid box and regional time series of temperature are to be expected from the method used by JONES (see discussion in Jones et al. 1997), while the CRU05 methodology is less sensitive to varying station networks.
The U.K. time series from each dataset are very similar, although CRU05 is slightly warmer than JONES over the period of 1930–60. This is most likely because JONES has specifically excluded several stations that exhibit marked urban warming (e.g., Dublin and Kew), which are used in CRU05. The effect of these stations is diluted after 1960, when approximately 100 additional U.K. stations come into the CRU05 dataset. Before 1960, CRU05 and JONES had a similar station network. Interannual variability of the two series is also very similar, although CRU05 tends to have slightly lower variance after 1960 when the number of stations increases to over 100 (cf. ∼20 for JONES). As with precipitation, this example indicates that results from the two gridding methodologies converge with increasing station coverage.
On a hemispheric and global basis, CRU05 agrees well with JONES. The major differences between the two occur before about 1940 (Fig. 7), with CRU05 being about 0.1°C warmer and 0.1°–0.2°C cooler than JONES in the Northern and Southern Hemispheres, respectively. Hemispheric averages are subject to some uncertainty due to sampling errors. Jones et al. (1997) have recently quantified these errors, which increase in the past when station coverage was sparser. In Fig. 7, the standard errors are shown as a shaded band and are calculated using the approach of Jones et al. (1997) but are limited to the land domains under study here. This was achieved by averaging the Jones et al. (1997) 5° × 5° grid-box standard errors over the domain of interest using their Eqs. (11) and (12). The number of spatial degrees of freedom was reduced by one-third (one-half) for the Northern (Southern, north of 60°S) Hemisphere to allow for the degrees of freedom that occur over the (excluded) ocean. It can be seen from Fig. 7 that it is only at the very beginning of the century that the CRU05 masked time series are more than one standard error different from JONES.
The differences between CRU05 and JONES are partly related to the extrapolation to data-sparse regions where CRU05 is relaxed toward the (warmer) 1961–90 mean when there are no stations within the correlation decay distance. This explanation is supported, in both hemispheres, by the larger, warmer offset associated with the time series calculated from the full CRU05 grid (i.e., including areas relaxed to 1961–90), compared to those calculated from the masked CRU05 grid. However, the relaxation to the 1961–90 mean does not explain the cooler bias in the masked CRU05 series in the Southern Hemisphere. In this case, the offset may be due to the use of different (but overlapping) station networks. JONES is constructed using only station time series where urban warming bias is minimal whereas CRU05 makes use of all available station data. In earlier years, over the Southern Hemisphere, urban stations made a greater relative contribution to the CRU05 network and may have produced the larger negative offset. This effect does not appear to be as marked in the Northern Hemisphere, probably because of the more extensive network of nonurban stations and the fact that urban warming was already under way to some extent in the first half of the century.
3) Diurnal temperature range
The Northern Hemisphere time series of diurnal temperature range derived from CRU05 were compared with those derived from the dataset of Easterling et al. (1997; hereinafter EAST) for the period of 1950–93 (Fig. 8). Note that the CRU05 series is constructed using the full Northern Hemisphere fields because there was no information on the space–time distribution of grid boxes with data in EAST. Both series show the marked decreasing trend in diurnal temperature range from 1950 to 1993 reported by Easterling et al. (1997), though CRU05 does not show as large a negative anomaly as EAST in 1993; this is, however, the year with the most sparse station coverage in both datasets. Prior to 1940, the CRU05 record is dominated by station data in North America and Russia and shows a similar trend to the combined long-term records from these regions reported by Karl et al. (1993). Some of the decrease in CRU05 prior to 1940 is also due to the relaxation toward the 1961–90 mean (lower) in regions that have no station control but nonetheless contribute to the hemispheric mean.
3. Secondary variables
a. Datasets
The datasets of secondary variables (wet-day frequency, vapor pressure, cloud cover, and ground frost frequency) held by CRU are less comprehensive than those of the primary variables. This is partly because CRU has only recently made efforts to obtain these variables but also because they are less widely measured than temperature and precipitation, particularly in earlier years. To date, station time series for some or all of the secondary variables have been acquired from some 70 different sources. Several of these are public domain or available for purchase, but many have been obtained through personal contacts or directly from national meteorological agencies (NMAs). These datasets are updated on an ad hoc basis as new data are obtained and more regularly with monthly CLIMAT reports (wet-day frequency, vapor pressure, and sunshine).
The distribution of stations in the CRU dataset from 1901 to 1995 is shown in Figs. 9–11. Cloud cover over the northern mid–high latitudes is fairly comprehensive from the 1950s onward, but is virtually nonexistent elsewhere, except for the 1980s, where the Hahn et al. (1994) global synoptic station dataset makes a major contribution. This will be significantly enhanced when the updated (1950s–1995) Hahn synoptic data are released in 1999 (C. Hahn 1998, personal communication).
The network of stations with vapor pressure and wet-day frequency exhibits a similar pattern to that of cloud cover but does not benefit from the inclusion of synoptic data in the 1980s or from data from the United States (efforts are currently under way to obtain long-term U.S. data), western Europe, China (for vapor pressure), and Australia. Both these datasets will be enhanced once data from the Monthly Climatic Data for the World/CLIMAT are incorporated, a process that is presently under way.
Station data for ground frost frequency (not shown) are restricted to the former Soviet Union, Canada, the United Kingdom, and a few other locations where access to daily ground/grass minimum temperature permitted the calculation of these time series.
We calculated CDDs for cloud cover, vapor pressure, and wet-day frequency at latitudes where the station network permitted (Fig. 12; 60°S–90°N for cloud cover, 0°–90°N for wet-day frequency, and 30°–90°N for vapor pressure). Cloud cover CDDs range between 500 km at mid–high latitudes and ∼1000 km at low latitudes, with a global average of ∼750 km. Vapor pressure exhibits similar CDDs to mean temperature, both in terms of distances (1000–2000 km) and their seasonal cycle, suggesting that the two are a function of the same large-scale circulation forcings. Wet-day frequency decay distances are ∼500 km at low–mid northern latitudes and ∼300 km at high northern latitudes, mirroring the latitudinal variation of precipitation CDDs.
b. Empirical relationships with primary variables
The patchy distribution of stations with secondary variable data, particularly prior to 1960, meant that interpolation of anomalies directly from station data was not feasible. This is despite the large CDDs determined for cloud cover and, particularly, vapor pressure. We therefore used the existing data to develop and/or test empirical (in the case of cloud cover and ground frost frequency) or conceptual (vapor pressure and wet-day frequency) relationships with the primary variables. These relationships were used to calculate grids of synthetic monthly anomalies. In the case of cloud cover, wet-day frequency, and vapor pressure, the synthetic grids were then blended with station anomalies in the regions where such data were available. Finally, the resultant anomaly fields were combined with the CRU 0.5° 1961–90 mean climatology fields.
1) Cloud cover
The negative correlation of diurnal temperature range with both precipitation and cloud cover has been well documented both at regional/global scales (e.g., Karl et al. 1993; Dai et al. 1997b) and at individual weather stations (e.g., Wang et al. 1993; Ruschy et al. 1991). We used this as the starting point for the development of a predictive relationship for cloud cover.
Station anomaly time series of cloud cover, precipitation, and diurnal temperature range were grouped into 5° lat–long bins. Monthly cloud cover in each bin was regressed on diurnal temperature range and precipitation. In general, cloud cover correlated better with diurnal temperature range than with precipitation (Fig. 13). The strong correlation between precipitation and diurnal temperature range (not shown) also meant that the inclusion of both climate elements in the regression resulted in little additional variation in cloud cover being explained. As a rule, correlation with diurnal temperature range is weak in arid regions due to a general absence of cloud cover. Notable exceptions were the arid west coasts of Africa and South America, where low cloud–fog associated with advection is frequent. The relationship between diurnal temperature range and cloud cover is also weak at around 60°N in winter and becomes positive in the Arctic. This is probably because the extreme cold and the absence of incoming solar radiation during high-latitude winters result in minimal modulation of surface energy balance by cloud cover. At these high latitudes, the correlation between precipitation and cloud cover is slightly stronger.
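The binning and per-bin correlation described above can be illustrated with a minimal sketch. It assumes station anomalies for one calendar month are already available as simple tuples; the function names, the record layout, and the minimum sample count are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def bin_key(lat, lon, size=5.0):
    """South-west corner (lat, lon) of the size-degree bin containing a station."""
    return (math.floor(lat / size) * size, math.floor(lon / size) * size)


def pearson(x, y):
    """Pearson correlation; returns NaN if either series has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlate_by_bin(records, size=5.0, min_samples=3):
    """Pool (lat, lon, dtr_anomaly, cloud_anomaly) records by bin and correlate.

    min_samples is an illustrative threshold, not a value from the paper.
    """
    pooled = defaultdict(lambda: ([], []))
    for lat, lon, dtr, cloud in records:
        xs, ys = pooled[bin_key(lat, lon, size)]
        xs.append(dtr)
        ys.append(cloud)
    return {
        key: pearson(xs, ys)
        for key, (xs, ys) in pooled.items()
        if len(xs) >= min_samples
    }


# Made-up anomalies: four samples fall in the 50N, 0E bin, one elsewhere.
records = [
    (52.1, 1.3, 1.0, -8.0),
    (53.0, 2.0, -0.5, 4.0),
    (51.4, 4.9, 0.2, -1.0),
    (54.9, 0.1, -1.2, 10.0),
    (12.0, 33.0, 0.3, 0.0),  # lone sample in its bin, so that bin is skipped
]
correlations = correlate_by_bin(records)
```

In this toy input the pooled bin shows a strongly negative correlation, which is the sign of relationship the text reports for most non-arid regions.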
We discarded precipitation from further analysis because of the generally better relationship between diurnal temperature range and cloud cover. A further reason for using only one predictor variable arises from the way the grids of primary variables (which form the input in the calculation of synthetic fields) were produced. In years before ∼1950, both precipitation and diurnal temperature range fields are forced toward the 1961–90 mean in regions where there is no station control (discussed earlier). This occurs more frequently with diurnal temperature range than with precipitation. Using a regression against both diurnal temperature range and precipitation could produce unrealistic synthetic cloud values where one of the predictor variables was constrained to a zero anomaly and the other was not.
At each 5° lat–long bin for which there were data, we used resistant regression (Emerson and Hoaglin 1983) to determine a predictive relationship with diurnal temperature range. Resistant regression is insensitive to isolated data errors, which is useful when the analysis is automated for a large number of data samples. We then interpolated the monthly regression coefficients to a regular 0.5° lat–long grid, assuming the coefficients for each 5° bin represented point values at the bin center. The 0.5° lat–long grids of diurnal temperature range anomalies were subsequently used as input to calculate synthetic cloud cover anomaly grids. We evaluated the resulting synthetic grids by degrading them to 2.5° lat–long resolution and comparing them to the 1982–91 monthly cloud cover grids of Hahn et al. (1994). Monthly gridpoint correlations (not shown) for the 10 yr of data in common are similar to those in Fig. 13 (top), indicating that the use of diurnal temperature range grids captures the majority of covariance between cloud cover and diurnal temperature range that occurs at individual stations.
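To show why a resistant fit is insensitive to isolated data errors, the sketch below implements a simple three-group median line. It is a single-pass variant written for illustration; the exact procedure in Emerson and Hoaglin (1983), including any iterative refinement of the slope, is not reproduced here, and the function name is hypothetical.

```python
from statistics import median


def resistant_line(x, y):
    """Fit y = intercept + slope * x using medians of the outer thirds of the data.

    Single-pass three-group line: the slope joins the median points of the
    left and right thirds (ordered by x), and the intercept is the median
    of the residuals y - slope * x over all points.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    if n < 6:
        raise ValueError("need at least 6 points to form three groups")
    third = n // 3
    left, right = pairs[:third], pairs[n - third:]
    x_left = median(p[0] for p in left)
    y_left = median(p[1] for p in left)
    x_right = median(p[0] for p in right)
    y_right = median(p[1] for p in right)
    if x_right == x_left:
        raise ValueError("x values do not spread across the outer groups")
    slope = (y_right - y_left) / (x_right - x_left)
    intercept = median(yi - slope * xi for xi, yi in pairs)
    return slope, intercept


# Made-up example: cloud anomaly = 1 - 2 * DTR anomaly, with one gross error.
dtr = list(range(9))
cloud = [1 - 2 * d for d in dtr]
cloud[4] = 50  # isolated data error
slope, intercept = resistant_line(dtr, cloud)
```

Because only medians enter the fit, the single corrupted value leaves the slope and intercept unchanged, whereas an ordinary least squares fit would be pulled toward it.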
Gridpoint data from the synthetic anomaly grids were used as artificial station data in areas where there was no station control, defined as a distance farther than 700 km from any observed data. Figure 14 provides an example of the resultant network of artificial and real stations. The combined station and synthetic data were interpolated using the method described in section 2b to produce anomaly grids at 0.5° lat–long resolution and subsequently combined with the CRU05 climatological mean fields to produce monthly grids of cloud cover for 1901–96.
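The selection of artificial stations can be sketched as a distance test between each synthetic grid point and the observed network. The 700-km threshold comes from the text; the haversine distance, the spherical earth radius, the tuple layout, and the function names are assumptions made for this illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius, assumed for this sketch


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def select_artificial_stations(grid_points, stations, min_km=700.0):
    """Keep synthetic grid points farther than min_km from every real station.

    grid_points: iterable of (lat, lon, synthetic_anomaly)
    stations:    list of (lat, lon, observed_anomaly)
    """
    artificial = []
    for glat, glon, value in grid_points:
        nearest = min(
            (great_circle_km(glat, glon, slat, slon) for slat, slon, _ in stations),
            default=math.inf,  # no stations at all: every grid point qualifies
        )
        if nearest > min_km:
            artificial.append((glat, glon, value))
    return artificial


# Made-up example: one real station near 52N, 0E and two synthetic grid points.
stations = [(52.0, 0.0, 3.0)]
grid_points = [(52.25, 0.25, 1.0), (40.25, 30.25, -2.0)]
artificial = select_artificial_stations(grid_points, stations)
blended_network = stations + artificial  # input to the anomaly interpolation
```

A brute-force search like this is adequate for illustration only; a global 0.5° grid would normally call for a spatial index.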
Because diurnal temperature range is relaxed to the 1961–90 mean in areas where there are no station data, the cloud cover grids exhibit similar behavior. Thus, prior to 1950, over most regions in the northern Tropics and Southern Hemisphere, the CRU05 cloud cover grids approach the 1961–90 climatology and have little or no interannual variability.
2) Vapor pressure
The relatively large CDDs for vapor pressure suggest that there is value in interpolating anomalies from station data where they are present and using synthetic data in regions without vapor pressure data.
While recognizing the problems inherent in converting monthly relative humidity to vapor pressure (see also New et al. 1999), it was felt that the approach was justified because expressing the converted values as anomalies removes much of the systematic bias arising from the conversion, and because station data are preferable to the alternative, namely, synthetic data.
We investigated whether the method of Kimball et al. (1997) was appropriate for estimating monthly vapor pressure by using the observed monthly data described above. We compared the correlation between observed vapor pressure and vapor pressure predicted using minimum temperature and Kimball’s estimate of dewpoint temperature. We found that in comparison with the use of minimum temperature, Kimball’s method did not explain any additional variance in observed vapor pressure in the regions for which we had observed data. In addition, in many instances, Kimball’s formula produced worse predictions of vapor pressure in warmer months at more arid sites. This suggests that the general form of the relationship does not apply to monthly data. Although a different relationship may exist for monthly data, we did not have dewpoint temperature data to attempt the definition of such a relationship. We therefore used the gridded minimum temperatures as a proxy for dewpoint temperature and hence synthetic gridpoint vapor pressure, using (1).
The accuracy of the derived monthly estimates was evaluated using the CRU monthly time series of minimum temperature and vapor pressure. Station records with year-months common to both variables were extracted and the vapor pressure was estimated using (1). The estimated and observed vapor pressure were then expressed as percentages of their respective means prior to the calculation of comparative statistics on a month-by-month basis. Correlation coefficients for January and July are shown in Fig. 15 (other months are intermediate between these two). In general, the method works better in winter than in summer and, for any particular month, better at high latitudes than at low latitudes. The method is least effective in arid regions, notably in central Asia and northwest China, most probably for reasons discussed above. Results from China are subject to additional uncertainty arising from the conversion of observed relative humidity to vapor pressure.
A similar procedure to that used for cloud cover was followed to derive blended grids of monthly vapor pressure. Monthly grids of the primary variables were used in (1) to derive grids of synthetic vapor pressure. The synthetic values were converted to anomalies relative to the 1961–90 synthetic mean. Synthetic gridpoint data, farther than a CDD of 1000 km from any observed station data, were combined with the dataset of observed anomalies and interpolated using ADW gridding. The resulting blended anomaly fields were added to the CRU05 1961–90 mean climatology to arrive at monthly grids of surface vapor pressure for 1901–96.
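The derivation of synthetic vapor pressure anomalies can be sketched as follows. Equation (1) is not reproduced in this section, so the sketch assumes a Magnus-type saturation vapor pressure formula with commonly quoted coefficients, which may differ from (1). Minimum temperature is taken as mean temperature minus one-half the diurnal temperature range, and the function names are hypothetical.

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Saturation vapor pressure (hPa) at temp_c (deg C).

    Magnus-type formula with assumed coefficients; a stand-in for Eq. (1).
    """
    return 6.108 * math.exp(17.27 * temp_c / (237.3 + temp_c))


def synthetic_vapor_pressure(tmean_c, dtr_c):
    """Vapor pressure with minimum temperature used as a dewpoint proxy."""
    tmin_c = tmean_c - 0.5 * dtr_c
    return saturation_vapor_pressure_hpa(tmin_c)


def anomalies(values_by_year, base=(1961, 1990)):
    """Anomalies relative to the mean of the values inside the base period."""
    base_values = [v for yr, v in values_by_year.items() if base[0] <= yr <= base[1]]
    base_mean = sum(base_values) / len(base_values)
    return {yr: v - base_mean for yr, v in values_by_year.items()}


# Made-up January values at one grid point: (mean temperature, DTR) by year.
primary = {1961: (5.0, 10.0), 1990: (6.0, 10.0), 1995: (7.0, 10.0)}
synthetic = {yr: synthetic_vapor_pressure(t, d) for yr, (t, d) in primary.items()}
synthetic_anomalies = anomalies(synthetic)
```

Taking anomalies against the synthetic 1961–90 mean, not the observed mean, means that any constant bias in the proxy relationship cancels before the anomalies are added to the observed climatology.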
3) Wet-day frequency
Defining a as in (5) forces predicted wet-day frequency to equal the 1961–90 mean when monthly precipitation is equal to the 1961–90 mean precipitation (Fig. 16). A value of 0.45 for x in (4) was chosen by selecting the value that resulted in the smallest mean absolute error between predicted and observed wet-day frequency in the CRU dataset of station time series. At individual stations, the optimum value of x varied between ∼0.35 and ∼0.6. Synthetic wet-day frequency values were constrained to be zero (if there was no observed precipitation) and were set always to be no greater than the number of days in the month.
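The behavior described above can be illustrated with a short sketch. Equations (4) and (5) are not reproduced in this section, so the code assumes a power-law form in which wet-day frequency scales with the ratio of monthly precipitation to its 1961–90 mean raised to the power x. That form satisfies the stated constraint (the prediction equals the mean wet-day frequency when precipitation equals its mean), but it is an assumption, as are the function and parameter names.

```python
def synthetic_wet_days(precip, precip_normal, wet_days_normal, days_in_month, x=0.45):
    """Synthetic wet-day frequency from monthly precipitation.

    Assumed form: wet_days_normal * (precip / precip_normal) ** x,
    set to zero when there is no precipitation and capped at the
    number of days in the month.
    """
    if precip <= 0.0:
        return 0.0
    if precip_normal <= 0.0:
        # The ratio is undefined when the normal is zero; returning zero
        # here is a choice made for this sketch, not taken from the paper.
        return 0.0
    wet_days = wet_days_normal * (precip / precip_normal) ** x
    return min(wet_days, float(days_in_month))


# Made-up values: precipitation in mm, wet-day frequency in days.
at_normal = synthetic_wet_days(80.0, 80.0, 12.0, 30)    # precipitation at its mean
dry_month = synthetic_wet_days(0.0, 80.0, 12.0, 30)     # no precipitation
very_wet = synthetic_wet_days(2000.0, 50.0, 25.0, 30)   # would exceed 30 days
```

With x below 1, a doubling of precipitation raises the predicted wet-day frequency by less than a factor of 2, which reflects the tendency for wetter months to have heavier as well as more frequent rain.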
The accuracy of the relationship was assessed using observed time series of precipitation and wet-day frequency in the CRU station dataset. The correlation between observed and predicted time series varies between 0.35 and 0.96 (Figs. 17 and 18). The correlation is better in humid than in subhumid regions and, at most stations, better in winter than in summer, when precipitation tends to be more frontal than convective. Predictive error exhibits a trend from positive bias at low observed wet-day frequency to negative bias at high observed wet-day frequency (Fig. 18). This is partly a function of the formulation of (4) and the upper limit (number of days in month) for synthetic wet-day frequency. Thus, at an observed frequency of 1, the error cannot be less than −1 but can have any positive value, leading to an overall positive bias. Conversely, when the observed frequency is equal to the number of days in the month, positive errors are not possible, resulting in an overall negative bias.
As with the other secondary variables, the synthetic anomaly fields were merged with observed station anomalies and combined with the 1961–90 climatology to arrive at grids of monthly wet-day frequency from 1901 to 1996.
4) Ground frost frequency
The suitability of (6) for predicting monthly ground frost frequency was tested against observed monthly time series from 120 stations in the United Kingdom (Fig. 19). Most of the predictions are within ±10% of the observed values, with a tendency for overestimation and underestimation at low and high observed frequencies, respectively. Reasons for this are essentially the same as those producing a similar pattern for wet-day frequency. Correlations between observed and predicted ground frost frequency are lowest in summer because, at any station, there are fewer months with both observed and simulated ground frost frequency days greater than zero.
For the calculation of monthly ground frost frequency fields, (6) was used with gridded minimum temperature (i.e., mean temperature minus one-half diurnal temperature range) to generate synthetic ground frost frequency anomaly fields. These were subsequently added to the CRU05 climatology to arrive at monthly ground frost frequency grids in absolute units for 1901–96. Thus, ground frost frequency is the only secondary variable derived entirely from synthetic anomalies and not merged with observed station data.
4. Discussion
We have described the construction of a spatially complete gridded dataset of monthly surface climate comprising seven variables over global land areas for the period of 1901–96. These data represent an advance over previous products for several reasons.
The dataset has a higher spatial resolution (0.5° latitude by 0.5° longitude) than other datasets of similar temporal extent.
Conversely, it extends much farther back in time than other products that have similar spatial resolution.
It encompasses a more extensive suite of surface climate variables than available elsewhere, namely, mean temperature and diurnal temperature range, precipitation and wet-day frequency, vapor pressure, cloud cover, and ground frost frequency.
The construction method ensures that strict temporal fidelity is maintained; the anomalies are calculated using the same 1961–90 period as the mean climatology to which they are applied.
These time series are of particular use in applied climatology, as spatially continuous input data to environmental simulation models. Examples include modeling biogeochemical cycling in terrestrial ecosystems and global/regional hydrological modeling. In addition, the primary variables—precipitation, mean temperature, and diurnal temperature range—are derived entirely from observed station data and represent a good independent dataset for evaluation of regional climate models (e.g., Giorgi and Francisco 2000) and satellite-derived products. The mean temperature fields are not ideally suited for climate change detection because the input dataset includes stations that have an urban warming bias. The secondary variable fields—wet-day frequency, vapor pressure, cloud cover, and ground frost frequency—were constructed using a combination of observed data and empirical relationships with the primary variables. Therefore, these secondary variables should be used with caution in such climatological applications. Nonetheless, for the first time, the secondary variables provide a century-long record of spatially complete surface climate data.
For the primary variables, a direct consequence of the anomaly interpolation methodology is relaxation of the monthly fields toward the 1961–90 mean in regions where there are no stations within the correlation decay distance. This occurs most often in earlier years, particularly for diurnal temperature range. To provide an indication of where this occurs, each monthly field has a companion field listing the distance from each grid center to the nearest station. Future research is intended to develop ways of avoiding relaxation to the 1961–90 mean. One possibility is to develop 10-yr mean climatologies for 1901–present from station data that are available and constrain missing data to the relevant 10-yr mean. This would ensure that secular change in, for example, temperature is reflected in the monthly time series data. An alternative would be to fill in missing station data in early years prior to interpolation. This could be done using regression (or an alternative prediction method) with stations that do have long-term data.
Diurnal temperature range can be used in combination with mean temperature to calculate grids of maximum and minimum temperature. The resulting gridded time series will include all of the variability contained in the mean temperature grids plus additional variability in diurnal temperature range where station data permit. In domains where monthly diurnal temperature range is relaxed to the climatology, maximum and minimum temperature will only reflect variability in mean temperature.
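The calculation is simple enough to state as a short sketch, using the relation noted earlier that minimum temperature is mean temperature minus one-half the diurnal temperature range. The function name and the sample values are hypothetical.

```python
def max_min_temperature(tmean, dtr):
    """Return (tmax, tmin) from mean temperature and diurnal temperature range."""
    half_range = 0.5 * dtr
    return tmean + half_range, tmean - half_range


# Made-up values for two years at a grid cell where DTR is relaxed to the
# climatology (10.0 in both years) while mean temperature still varies.
tmax_a, tmin_a = max_min_temperature(15.0, 10.0)
tmax_b, tmin_b = max_min_temperature(16.5, 10.0)
```

In the relaxed case shown, maximum and minimum temperature both shift by exactly the change in mean temperature, so they carry no independent variability.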
For the secondary variables, the interpolation of merged station and synthetic data makes it more difficult to provide an indication of where a monthly field is based on (i) observed data, (ii) synthetic data derived from primary variables with interannual variability, or (iii) synthetic data derived from primary variables that had been relaxed to the climatology. However, as with the primary variables, companion grids of the distance from each grid point to the nearest station were calculated. If these are used in combination with the station information for the primary variables that were used to derive the synthetic grids, some idea of the contributing inputs can be obtained. For a more qualitative indication, the maps of station distributions in Figs. 1–3 (primary variables) and Figs. 9–11 (secondary variables) can be used.
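One way a user might combine the companion distance grids is sketched below. It is a rough heuristic, not a procedure from the paper: the function name, the category labels, and the primary-variable distance threshold are hypothetical. Only the 700-km blending distance for cloud cover is taken from the text.

```python
def classify_source(dist_secondary_km, dist_primary_km, blend_km, primary_cdd_km):
    """Guess the dominant input at a grid point from nearest-station distances.

    dist_secondary_km: distance to the nearest secondary-variable station
    dist_primary_km:   distance to the nearest station of the primary
                       variable used to build the synthetic grid
    blend_km:          distance beyond which synthetic points were used
    primary_cdd_km:    distance beyond which the primary field is treated
                       as relaxed to the climatology (assumed value)
    """
    if dist_secondary_km <= blend_km:
        return "observed"             # case (i)
    if dist_primary_km <= primary_cdd_km:
        return "synthetic_variable"   # case (ii)
    return "synthetic_relaxed"        # case (iii)


# Made-up distances for three cloud cover grid points; 1200 km is a
# placeholder threshold for diurnal temperature range, not a quoted value.
near_station = classify_source(150.0, 90.0, 700.0, 1200.0)
synthetic_only = classify_source(1800.0, 400.0, 700.0, 1200.0)
relaxed = classify_source(2500.0, 2100.0, 700.0, 1200.0)
```

Because the interpolation weights several neighbors at once, a label of this kind indicates only the most likely dominant input, not an exact partition of contributions.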
The CRU05 dataset is available from the Climatic Research Unit via Dr. David Viner (d.viner@uea.ac.uk), manager of the Climate Impacts LINK Project (http://www.cru.uea.ac.uk/link).
Acknowledgments
This work was undertaken with funding for M.N. from the U.K. Natural Environment Research Council (Grant GR3/09721); for M.H. from the U.K. Department of the Environment, Transport, and the Regions (DETR; Contract EPG 1/1/48); and for P.J. from the Department of Energy (Grant DE-FG02-98ER62601). The CRU precipitation, mean temperature, and diurnal temperature range datasets have been compiled over a number of years with support from the U.K. DETR and the Department of Energy. Data obtained from the Global Historical Climatology Network and the Carbon Dioxide Information Analysis Centre supplemented the CRU archives. Other data supplied by numerous national meteorological agencies, research organizations, and private individuals made this research possible, and these contributions, while too numerous to mention by name, are gratefully acknowledged. The Climate Impacts LINK Project (UK DETR Contract EPG 1/1/16) at the Climatic Research Unit provided computing facilities for this research.
REFERENCES
Carter, T. R., M. L. Parry, H. Harasawa, and S. Nishioka, 1994: IPCC technical guidelines for assessing climate change impacts and adaptations. Rep. CGER-I015-94, University College, London and Centre for Global Environmental Research, Tsukuba, 59 pp. [Available from University College London, Gower St., London WC1E 6BT, United Kingdom.]
Christensen, J. H., B. Machenhauer, R. G. Jones, C. Schar, P. M. Ruti, M. Castro, and G. Visconti, 1997: Validation of present-day regional climate simulations over Europe: LAM simulations with observed boundary conditions. Climate Dyn., 13, 489–506.
Cramer, W., and A. Fischer, 1996: Data requirements for global terrestrial ecosystem modelling. Global Change and Terrestrial Ecosystems, B. Walker and W. Steffen, Eds., Cambridge University Press, 530–565.
Dai, A., and I. Y. Fung, 1993: Can climate variability contribute to the “missing” carbon sink? Global Biogeochem. Cycles, 7, 599–609.
——, ——, and A. D. Del Genio, 1997a: Surface observed global land precipitation variations during 1900–1988. J. Climate, 10, 2943–2962.
——, A. D. Del Genio, and I. Y. Fung, 1997b: Clouds, precipitation and temperature range. Nature, 386, 665–666.
Diaz, H. F., R. S. Bradley, and J. K. Eischeid, 1989: Precipitation fluctuations over global land areas since the late 1800s. J. Geophys. Res., 94, 1195–1210.
Easterling, D. R., and Coauthors, 1997: Maximum and minimum temperature trends for the globe. Science, 277, 364–367.
Eischeid, J. K., H. F. Diaz, R. S. Bradley, and P. D. Jones, 1991: A comprehensive precipitation dataset for global land areas. DOE/ER-6901T-H1, U.S. Department of Energy, Washington, DC, 81 pp. [Available from National Technical Information Service, U.S. Dept. of Commerce, Springfield, VA 22161.]
Emerson, J. D., and D. C. Hoaglin, 1983: Resistant lines for y versus x. Understanding Robust and Exploratory Data Analysis, D. C. Hoaglin et al., Eds., John Wiley and Sons, 129–165.
Franke, R., 1982: Smooth interpolation of scattered data by local thin plate splines. Comput. Math. Appl., 8, 273–281.
Giorgi, F., and R. Francisco, 2000: Uncertainties in regional climate change prediction: A regional analysis of ensemble simulations with the HadCM2 coupled AOGCM. Climate Dyn., 16, 169–182.
Groisman, P. Y., V. V. Koknaeva, T. A. Belokrylova, and T. R. Karl, 1991: Overcoming biases of precipitation measurement: A history of the USSR experience. Bull. Amer. Meteor. Soc., 72, 1725–1733.
Hahn, C. J., S. G. Warren, and J. London, 1994: Climatological data for clouds over the globe from surface observations, 1982–1991: The total cloud edition. Rep. NDP026A, Carbon Dioxide Information Analysis Center, Oak Ridge National Laboratory, Oak Ridge, TN, 42 pp.
Hansen, J., and S. Lebedeff, 1987: Global trends of measured surface air-temperature. J. Geophys. Res., 92, 13 345–13 372.
Horton, B., 1995: Geographical distribution of changes in maximum and minimum temperatures. Atmos. Res., 37, 101–117.
Huffman, G. J., and Coauthors, 1997: The Global Precipitation Climatology Project (GPCP) Combined Precipitation Dataset. Bull. Amer. Meteor. Soc., 78, 5–20.
Hulme, M., 1992: A 1951–80 global land precipitation climatology for the evaluation of general circulation models. Climate Dyn., 7, 57–72.
——, 1994: Global changes in precipitation in the instrumental period. Global Precipitation and Climate Change, M. Desbois and F. Désalmand, Eds., Springer-Verlag, 387–405.
——, S. C. B. Raper, and T. M. L. Wigley, 1995: An integrated framework to address climate-change (ESCAPE) and further developments of the global and regional climate modules (MAGICC). Energy Policy, 23, 347–355.
Hutchinson, M. F., 1995a: Interpolating mean rainfall using thin plate smoothing splines. Int. J. Geogr. Inf. Syst., 9, 385–403.
——, 1995b: Stochastic space–time weather models from ground-based data. Agric. For. Meteor., 73, 237–264.
——, and P. E. Gessler, 1994: Splines—More than just a smooth interpolator. Geoderma, 62, 45–67.
Jones, P. D., 1994: Hemispheric surface air temperature variability—A reanalysis and update to 1993. J. Climate, 7, 1794–1802.
——, and M. Hulme, 1996: Calculating regional climatic time series for temperature and precipitation: Methods and illustrations. Int. J. Climatol., 16, 361–377.
——, T. J. Osborn, and K. R. Briffa, 1997: Estimating sampling errors in large-scale temperature averages. J. Climate, 10, 2548–2568.
——, M. New, D. E. Parker, S. Martin, and I. G. Rigor, 1999: Surface air temperature and its changes over the past 150 years. Rev. Geophys., 37, 173–199.
Karl, T. R., and Coauthors, 1993: A new perspective on recent global warming—Asymmetric trends of daily maximum and minimum temperature. Bull. Amer. Meteor. Soc., 74, 1007–1023.
Kimball, J. S., S. W. Running, and R. Nemani, 1997: An improved method for estimating surface humidity from daily minimum temperature. Agric. For. Meteor., 85, 87–98.
New, M. G., M. Hulme, and P. D. Jones, 1999: Representing twentieth-century space–time climate variability. Part I: Development of a 1961–90 mean monthly terrestrial climatology. J. Climate, 12, 829–856.
Piper, S. C., and E. F. Stewart, 1996: A gridded global data set of daily temperature and precipitation for terrestrial biosphere modeling. Global Biogeochem. Cycles, 10, 757–782.
Priestley, C. H. B., and R. J. Taylor, 1972: On the assessment of surface heat flux and evaporation using large-scale parameters. Mon. Wea. Rev., 100, 81–92.
Rudolf, B., H. Hauschild, W. Rueth, and U. Schneider, 1994: Terrestrial precipitation analysis: Operational method and required density of point measurements. Global Precipitation and Climate Change, M. Desbois and F. Désalmand, Eds., Springer-Verlag, 173–186.
Ruschy, D. L., D. G. Baker, and R. H. Skaggs, 1991: Seasonal variation in daily temperature ranges. J. Climate, 4, 1211–1216.
Shepard, D., 1984: Computer mapping: The SYMAP interpolation algorithm. Spatial Statistics and Models, G. L. Gaile and C. J. Willmott, Eds., D. Reidel, 95–116.
Shuttleworth, J. W., 1992: Evaporation. Handbook of Hydrology, D. R. Maidment, Ed., McGraw-Hill, 4.1–4.53.
Thiessen, A. H., 1911: Precipitation averages for large areas. Mon. Wea. Rev., 39, 1082–1084.
Victoria, R. L., L. A. Martinelli, J. M. Moraes, M. V. Ballester, A. V. Krusche, G. Pellegrino, R. M. B. Almeida, and J. E. Richey, 1998: Surface air temperature variations in the Amazon region and its borders during this century. J. Climate, 11, 1105–1110.
Wahba, G., 1990: Spline Models for Observational Data. Society for Industrial and Applied Mathematics, 169 pp.
Wang, W. C., Q. Y. Zhang, D. R. Easterling, and T. R. Karl, 1993: Beijing cloudiness since 1875. J. Climate, 6, 1921–1927.
Willmott, C. J., C. M. Rowe, and W. D. Philpot, 1985: Small-scale climate maps: A sensitivity analysis of some common assumptions associated with grid point interpolation and contouring. Amer. Cartogr., 12, 5–16.
Xie, P., and P. A. Arkin, 1996: Analyses of global monthly precipitation using gauge observations, satellite estimates, and numerical model predictions. J. Climate, 9, 840–858.
——, B. Rudolf, U. Schneider, and P. A. Arkin, 1996: Gauge-based monthly analysis of global land precipitation from 1971 to 1994. J. Geophys. Res., 101, 19 023–19 034.