The validation of climate model simulations creates substantial demands for comprehensive observed climate datasets. These datasets need not only to be historically and geographically extensive, but need also to be describing areally averaged climate, akin to that generated by climate models. This paper addresses one particular difficulty found when attempting to evaluate the daily precipitation characteristics of a global climate model, namely the problem of aggregating daily precipitation characteristics from station to area.
Methodologies are developed for estimating the standard deviation and rainday frequency of grid-box mean daily precipitation time series from relatively few individual station time series. Temporal statistics of such areal-mean time series depend on the number of stations used to construct the areal means and are shown to be biased (standard deviations too high, too few raindays) if insufficient stations are available. It is shown that these biases can be largely removed by using parameters that describe the spatial characteristics of daily precipitation anomalies. These spatial parameters (the mean interstation correlation between station time series and the mean interstation probability of coincident dry days) are calculated from a relatively small number of available station time series for Europe, China, and Zimbabwe. The relationships that use these parameters are able to successfully reproduce the statistics of grid-box means from the statistics of individual stations. They are then used to estimate the statistics of grid-box means as if constructed from an infinite number of stations (for standard deviations) or 15 stations (for rainday frequencies), even if substantially fewer stations are actually available. These estimated statistics can be used for the evaluation of daily precipitation characteristics in climate model simulations, and an example is given using a simulation by the Commonwealth Scientific and Industrial Research Organisation atmosphere general circulation model. Applying the authors’ aggregation methodology to observed station data is a more faithful form of model validation than using unadjusted station time series.
Recent climate modeling studies of the enhanced greenhouse effect have indicated that one of the most important climate changes may be an increase in the frequency of heavy precipitation amounts (Gordon et al. 1992; Whetton et al. 1993; Fowler and Hennessy 1995; Gregory and Mitchell 1995). Some simulations of future climate show more (or similar) mean precipitation, but falling on fewer raindays. While there is some observational evidence to support such a change (Karl et al. 1995; Suppiah and Hennessy 1996), some doubt must remain over these model results because previous work has shown such models to have systematic errors in their simulation of rainday frequencies (Reed 1986; Wilson and Mitchell 1987; Rind et al. 1989; Gregory and Mitchell 1995).
These previous validation studies of modeled rainday frequencies have, however, two main deficiencies. First, the paucity of time series of observed precipitation data on a daily timescale, available to the scientific community, often prevents the construction of “true” grid-box mean time series for comparison with simulations. Second, such validation has been confined to single or only a few grid boxes. One of the main reasons for both of these deficiencies is the lack of continental-scale quality controlled datasets of observed daily precipitation containing many hundreds if not thousands of stations. Although there are exceptions to this complaint (e.g., the United States), in general, it is either impossible (e.g., the Tropics) or too expensive (e.g., Europe; see Hulme 1994) for datasets containing a sufficient density of daily series to be compiled.
We aim to improve upon these earlier studies by addressing these two deficiencies. A technique is developed to estimate two of the statistical properties of “true” grid-box mean precipitation from a small, finite number of stations. As an example, this technique is then applied to part of western Europe, for validation of the Commonwealth Scientific and Industrial Research Organisation (CSIRO) atmosphere general circulation model (AGCM) simulation performed for the Atmospheric Model Intercomparison Project (AMIP; Gates 1992). A more detailed validation of many AMIP model simulations, using the technique developed in the present paper, is given by Osborn and Hulme (1997, manuscript submitted to Int. J. Climatol., hereafter referred to as OH97).
Implicit in our approach is that general circulation models simulate areally averaged precipitation rather than point precipitation. Given that model processes and surface boundary fluxes are parameterized over areas rather than points, this seems to us to be a reasonable assertion. There is an alternative view, however, which sees modeled precipitation data as representing point precipitation (Skelly and Henderson-Sellers 1996). How modeled data are regarded will alter the way the model is validated—can modeled precipitation be directly compared with observed station time series or do such station time series need averaging to produce an areal mean? This paper does not examine these arguments in detail; we merely note that there are alternative views and treat modeled precipitation as an areal quantity.
Some other studies—on smaller space and timescales using weather radar data—have identified log–log linearity between the variance of precipitation amount and area, and between rainday frequency and area (Gupta and Waymire 1990, 1993). We have not used their results here because of a number of limitations that they have for our application: (i) they use a threshold of >0 mm to identify a wet day, which cannot be used with gauge data; (ii) there is insufficient spatial and temporal radar coverage over our region of interest to estimate the parameters of their functions, which are highly timescale dependent (they used 15-min rainfall amounts, we require daily amounts); (iii) the relationships may not be linear over all area sizes (Bell et al. 1990), so that extrapolation down to the area of a rain gauge or up to the area of a model grid box is uncertain. Nevertheless, the use of radar will be of increasing importance in this field, with their near-complete spatial and temporal coverage (in areas where they are available) and the fact that they take areal measurements.
The paper is structured as follows. In section 2, we describe the observed and modeled daily precipitation data used in our analysis. Section 3 presents our methodology for estimating standard deviations of observed daily precipitation for grid-box means, drawing upon existing work and illustrating its application with European data. Section 4 describes our methodology for estimating grid-box rainday frequencies from observed daily data. This method requires the establishment of a new, empirically determined parameter, which describes the mean interstation probability of coincident dry days. We establish this parameter using European data and test its universality using data from China and Zimbabwe. Section 5 then provides one example, using daily precipitation data from the CSIRO general circulation model, of how our revised estimates of observed grid-box daily precipitation standard deviation and rainday frequency can be applied to a model validation exercise. We discuss the implications of this approach for evaluating modeled daily precipitation using observed data in section 6.
2. Observed and model data
Three networks of daily precipitation data are used in this analysis—for western Europe, China, and Zimbabwe. The European dataset is a combination of daily precipitation time series from several western European countries—Belgium, France, Germany, Ireland, Italy, Netherlands, Switzerland, and the United Kingdom. A total of 185 series are included, containing data mostly from 1961 to 1987. The Chinese data were obtained from Oak Ridge National Laboratory in the United States and contain daily series from 1952 to 1983 for 180 stations in China. For Zimbabwe, daily time series for 55 stations were obtained from the Zimbabwe Meteorological Service, mostly containing data from 1951 to 1980. These networks are illustrated in Figs. 1, 2, and 3, respectively. On these figures, the station networks overlay the grid boxes of the CSIRO general circulation model, the spatial resolution of which we use to establish our methodology.
One model simulation of daily precipitation variability is evaluated as an example of how the methodology developed in this paper can be applied. This is a climate simulation by the CSIRO AGCM, performed for the AMIP exercise. This 10-yr simulation was forced by realistic prescribed monthly sea surface temperatures for the period 1979–88. The precipitation fields have been archived on a daily basis and are validated in section 5. In addition, the model grid layouts of this model and of the Colorado State University (CSU) and Geophysical Fluid Dynamics Laboratory (GFDL) AGCMs have been used in developing and testing the methodology.
3. Estimation of standard deviations of grid-box means
a. Introduction and examples
Standard deviations of daily precipitation time series give an indication of the variability of precipitation at a location or of a region. While it is not necessarily the most useful indicator of variability due to the skewness of the distribution of precipitation amounts, it is used here because the relationship between station standard deviation and the standard deviation of the grid-box means of those stations is already known and has been utilized elsewhere (e.g., Jones et al. 1997). The estimation and validation of standard deviations is used mainly, therefore, as an example of the methodology involved in estimating the statistics of grid-box means from station observations. The methodology is then extended to the more interesting case of rainday frequencies. That said, however, it is still useful to validate the standard deviation of the AGCMs’ daily precipitation—if they are realistic models they should produce a standard deviation similar to that observed. Also, as Gregory and Mitchell (1995) point out, as more stations are averaged together the skewness of their mean precipitation time series decreases; so, standard deviations of grid-box mean precipitation are a more meaningful measure of variability than are standard deviations of station precipitation.
The standard deviations we consider are those of the full time series involved—dry days are not first removed. Locations with many dry days will have a standard deviation that is biased. Again, the problem is less severe for the grid-box means considered here, since there are fewer dry days in the grid-box mean time series than there are in individual station time series.
If a grid box contains N stations with daily precipitation observations, then the variance of each station time series is s2i (i = 1, . . . , N). The mean of the variances of single-station time series within the grid box, s2i, is then simply the average of the individual s2i values (i.e., the variances of the time series are computed and then averaged together). The variance of the grid-box mean precipitation time series, S2N (i.e., the time series are averaged and then the variance of the average time series is calculated), can also be computed and is always less than s2i. This reduction is greater for greater N.
To illustrate how the variance decreases as more time series are averaged together, we define the variance of an n-station mean time series as S2n (n = 1, . . . , N) and compute it for the European station data that fall into each of the CSIRO model grid boxes (Fig. 1). To obtain the true value of S2n, the variances of every combination of n stations chosen from the N available for that box should be computed and the mean taken. For some boxes with many stations, such a calculation is not computationally feasible [e.g., there are 50!/(25!25!) ≈ 1014 such combinations of choosing n = 25 stations from a box with N = 50 stations available]. Instead, up to 15 random combinations are selected in each case, and the minimum, mean, and maximum variances are shown in Fig. 4 (for the three boxes highlighted in Fig. 1) for winter (DJF) and summer (JJA).
As n increases, S2n decreases—but eventually levels off. The box covering much of England is the only one, of the three shown, with sufficient stations to estimate this final value (i.e., S2∞). Even using the full number of stations available in each of the other boxes (i.e., the right-hand end of the curves) would not result in a value that could confidently be used to validate the “true” grid-box mean that the model produces. For the two more northerly boxes, England and northeast France–Belgium, summer precipitation time series are much more variable than winter series at single stations. The reduction in variance as n increases, however, is more rapid in winter, so that the variance of a many-station mean shows smaller seasonal differences.
The reduction in variance as n increases (Fig. 4) can be computed from the following formula [after Kagan (1966); see Yevjevich (1972) for an English version, although it follows from the standard formula for the variance of the sum of correlated random variables (e.g., Lindgren 1968)]:
where s2i is the mean of the station variances (i.e., the height of the starting point at n = 1 of the curves in Fig. 4); S2n is the variance of the combination of n station time series; and r̄ is the mean interstation correlation between all pairs of stations within the box or region being considered (see also Wigley et al. 1984).
As n increases, the variance decreases hyperbolically (i.e., more rapidly for small n). This decrease is greater for lower r̄, since less of the variance is common to all station time series if the mean correlation is low, and more of the variance cancels out when taking the average of n stations. As the number of stations continues to increase, S2n begins to level off, reaching a value of s2ir̄ when n = ∞. This was seen in the box (52.56°N, 0°) with n > 50 stations in Fig. 4. Since we know s2i and n for each grid box, all that we require is r̄.
c. Computation of correlation decay lengths
The mean interstation correlation between all pairs of stations in each grid box (r̄) could be computed directly by finding the correlations between all possible station pair combinations within a box and then averaging them. There are two disadvantages to this approach. First, it is not possible to obtain an r̄ value for grid boxes that have just one station in. Second, while the r̄ computed in this way will be representative of the particular stations available, it may be unrepresentative of the box as a whole, especially if the stations are closely spaced or very far apart (within the confines of the box).
An alternative method is to compute a correlation decay length (x0) for each station, following the methodology of Briffa and Jones (1993) and Jones et al. (1997). The correlation (r) between a station and every other station—not just those in the same grid box—is plotted against their separation distance (x). An exponential decay function is then fitted to the scatter of points; thus,
where x0 is the correlation decay length at the distance where r falls to 1/e.
If this procedure is repeated for all stations in the western European dataset (Fig. 1) and all (x, r) points are plotted on the same graph, for two seasons, then a single decay function can be fitted for the dataset as a whole (Fig. 5). It is clear that exponential decay is a reasonable function for describing the decay of daily precipitation correlations with distance. The overall decay lengths for the region show some seasonality, with a 50% longer decay length in winter than in summer.
This indicates that the characteristic scale of precipitation-causing disturbances is larger in winter than in summer, due to the fact that a higher proportion of precipitation is frontal in winter than in summer. Precipitation events associated with weather fronts are generally larger in scale than convective events, resulting in the larger winter decay length. (This dominates over the effect of leaving the dry days in the time series from which the correlations are computed, which tend to raise the decay lengths a little in the drier summer season.)
Given the dependence of x0 on the relative importance of frontal and convective precipitation, it is likely that x0 will vary spatially as well as seasonally. To assess this, the correlations between one station and all others are plotted against station separation and Eq. (2) is fitted to obtain a correlation decay length for that particular station alone. This procedure is then repeated for each station, for two seasons, and the results show considerable spatial variability (Fig. 6). [If all the station values of x0 for one season are averaged, the result is the same as that obtained when all stations were considered together for that season (i.e., the values shown in Fig. 5).]
As expected, the decay lengths are longest in the north and the west (Fig. 6). The shortest decay lengths are about one-quarter of the longest and occur in northern Italy. There is some seasonality, with shorter decay lengths in summer in accordance with the values for the region as a whole (Fig. 5), particularly away from the Mediterranean region. This matches well with the results shown in Fig. 4, since the shorter decay lengths in summer imply lower interstation correlations (i.e., r̄) and, through Eq. (1), a more rapid decrease in S2n as n increases. The weaker seasonality in x0 in the Mediterranean region also agrees with the Italian grid box shown in Fig. 4, which had similar rates of decrease of S2n in both seasons shown.
The x0 values for each station in a grid box are averaged to produce a mean x0 for that box. Equation (2) could then be integrated from 0 to X, where X is the diagonal size of the grid box (i.e., the maximum possible separation of two stations within the grid box), to obtain a mean r̄ for the box:
This integral assumes, however, that the station separation between two randomly selected stations follows a uniform rectangular distribution. The actual distribution is not rectangular (Fig. 7 shows an example from a square box), and the skew increases somewhat for nonsquare boxes. So, a probability density function of station separations, f(x), is estimated for the shape of each model grid box (the shape varies with latitude) and is used to scale the decay function before it is integrated. Thus,
which we then integrate numerically to obtain a value of r̄ for each box. The values obtained using Eq. (5), taking account of the nonrectangular distribution, are around 20% higher than those computed using Eq. (4).
This approach, of computing a correlation decay length and then integrating the decay function, provides a better estimate of r̄ than computing it directly from the available stations in that box, since it gives a value that would be obtained if there were complete and/or uniform station coverage within the box (assuming that the value of x0 obtained from the correlation decay lengths is representative for the box).
d. Computation of standard deviations
We are now in a position to estimate the standard deviations of grid-box mean precipitation from the station data. The CSIRO model grid boxes are used for this example once again. The standard deviations of the European station precipitation time series (s2i) for summer are shown in Fig. 8a and are lowest over Sardinia and southern Italy and highest over the Alps. These standard deviations can be used to compute a mean station standard deviation [i.e., (s2i)1/2] for each grid box (Fig. 8b). These latter values represent the starting points of the curves shown in Fig. 4 [i.e., (S21)1/2]. Alternatively, the precipitation time series of each station in a grid box can be averaged together, and the standard deviations of the grid-box mean time series computed [i.e., (S2N)1/2]. These values (Fig. 8c) represent the end points of the curves shown in Fig. 4.
A mean interstation correlation, r̄, has been computed for each box, following the procedure outlined in the previous section (Fig. 8d). Precipitation events have greatest spatial scale near the Atlantic coast and smallest over Italy. The s2i values (Fig. 8b) can be scaled to give an estimate of (S2N)1/2 by using Eq. 1, the r̄ value for each box (Fig. 8d), and setting n to the number of stations per box (Fig. 1). That is, the end points of the curves in Fig. 4 can be estimated from the starting points and r̄. These estimates (Fig. 8e) compare favorably with the directly computed values (Fig. 8c)—the root-mean-squared (rms) error of the estimates is just 0.2 mm day−1. All but two boxes fall within the same interval on the scale in both Figs. 8c,and 8e.
This confirms that Eq. (1) is applicable here and that our values of r̄ are reasonable. But we have not yet gained anything from this extensive analysis (i.e., Figs. 8c,and 8e are largely interchangeable and the former is much simpler to calculate). The next step is to set n = ∞ in Eq. (1) (which then reduces to S2∞ = s2ir̄) and apply it to the same European grid boxes. This, in effect, extends all the curves in Fig. 4 out to n = ∞, even in boxes with very few stations. The results are shown in Fig. 8f. We suggest that these values are a better representation of the standard deviations of grid-box means than those shown in Figs. 8c,or 8e. For example, the grid box centered over the Spanish–French border appeared to be more variable than other boxes (Figs. 8c,and 8e), but this was simply due to there being only one station in that box. The correction up to an infinite number of stations lowers the standard deviation of that box to a level similar to much of north and west France.
Similar calculations can be done for winter too. Once again, if the standard deviations of stations (Fig. 9a) are simply averaged in each box [(s2i)1/2; Fig. 9b], the result is higher than if the standard deviations of the box-mean precipitation are computed [(S2N)1/2; Fig. 9c]—although the difference is not as great as for summer, in accordance with Fig. 4. The r̄ values (Fig. 9d) show a similar pattern to those in summer (Fig. 8d), although lower everywhere. Equation 1 and r̄ are used to scale Fig. 9b to give an estimate of (S2N)1/2. The estimates (Fig. 9e) again compare favorably with the directly computed values (Fig. 9c), although the rms error is a little higher (0.5 mm day−1) than for summer.
Extending the calculation to an infinite number of stations (Fig. 9f) lowers the standard deviations and provides a better estimate of the variability of grid-box means. The boxes with many stations in them already, show little change when n is set to ∞ in Eq. (1).
This technique appears to be able to give more information than can be obtained from the available stations. It is important to note, however, that it simply uses an estimate of the variability at a point (s2i) and an estimate of the spatial scale of precipitation events (r̄) to compute the variability of an infinitely sampled grid-box mean—assuming that the mean variability and scale of the infinite stations that we do not actually have are well represented by our values of s2i and r̄. The use of the correlation decay length approach, and the fact that the x0 and r̄ fields vary quite smoothly, suggest that our estimates of r̄ are quite reliable and representative, even for boxes with very few stations. The field of s2i also varies quite smoothly in summer (Fig. 8a) but has more small-scale structure in winter (Fig. 9a). To be most accurate, therefore, this technique requires the stations used in computing s2i to be representative of the box as a whole.
4. Estimation of rainday frequencies of grid-box means
a. Introduction and examples
For our analysis of rainday frequencies we begin in a similar way, defining our statistic and computing some example values. The rainday frequency is given by the probability of a day being wet (Pw). A wet day is indicated by a precipitation total of more than 0.1 mm to account for the precision used in recording daily precipitation totals. Thus, a rainday might more accurately be called a precipitation day, since it includes days with snowfall. [Other thresholds could be used, and would produce different results, but the same methodology would be expected to hold. The 0.1-mm threshold was chosen because it is the same as that used in the European baseline climatology (Hulme et al. 1995), which we use later.] When combining a number of stations together, the probability of a day being dry (Pd = 1 − Pw) is more useful, since it is simpler to compute the days when all stations are dry (i.e., the intersection) than to compute the days when at least one station is wet (i.e., the union).
We define the dry-day probability of an n-station mean time series as Pd(n), n = 1, . . . , N, and compute it for the European station data falling in each grid box (the CSIRO model grid is used once more). The results are shown for all grid boxes with N > 3 (see Fig. 1), for DJF (Fig. 10a), and JJA (Fig. 10b). The longitude (positive eastward) and latitude of the center of each box is marked and they are plotted in the order: starting in the north, working from west to east, and then moving south.
As the number of stations (n) used in the mean time series increases, the dry-day probability (that the mean precipitation is no more than 0.1 mm) decreases. As with the variance (section 3), the decrease is more rapid for small n and eventually levels off. The summer values are higher (more dry days) for most grid boxes, but there are no consistent seasonal differences in the rate at which Pd(n) decreases with n. The number of stations in each grid box (N) is such that only the four most densely observed grid boxes (see Fig. 1) have sufficiently leveled off to estimate the true grid-box mean dry-day probability, Pd(n = ∞).
b. Theory and computation of probability decay lengths
As with the standard deviations (section 3), we wish to estimate/reproduce the decrease in Pd(n) as n increases (Fig. 10). The difficulty is that there is no previously published relationship between the two variables, as there is for the case of standard deviations [i.e., Eq. (1)]. If such a relationship exists, then it is likely to depend on the individual station dry-day frequency, Pd(n = 1), n, and some measure of the spatial scale of precipitation events. The latter must express the fraction of such events that are common to all stations, regardless of magnitudes, which is a little different to r̄, which expresses the fraction of variance that is common to all stations. We therefore derive below our own empirical relationship.
A new statistic between two station time series is defined, p, as the probability that both stations are dry on a particular day given that at least one is dry. Thus, if Xi is the event that a station xi is dry, then
If all values of p are computed between all combinations of station pairs within a grid box, and then averaged, p̄ is obtained (cf. r̄). This method may lead to a biased value if the stations are not uniformly distributed and cannot be used for boxes with a single station. Instead, the decay length methodology is again utilized to compute a probability decay length, xp.
If x1 and x2 are the same station, then p = pmax = 1. If x1 and x2 are statistically independent on the daily timescale (which presumably is the case if their separation is large relative to the spatial scale of precipitation events), then p = pmin = P(X1)P(X2)/[P(X1) + P(X2) − P(X1)P(X2)]. This minimum value, pmin, is greater than zero and increases with higher values of P(X1) and P(X2); that is, pmin is higher in regions that have a greater probability of having a dry day. Very small pmin (and hence p̄) can come about only if the sites are uncorrelated and there is a very high chance of precipitation (see the appendix for a further discussion of this point).
As two stations become more widely separated, p tends to decrease. This is not the simple exponential decay of the correlations [i.e., Eq. (2)], however, since p falls to pmin not to zero and it does not begin as high as one (for very small initial station separation, p falls very rapidly—by up to 0.3—a drop that is sometimes called a nugget). Nevertheless, p between one station and every other station (including those outside the grid box) can be plotted against separation distance, and a decay function fitted to obtain values for two decay function parameters (probability decay length, xp, and probability intercept, p0) for that station. The function fitted is a hyperbolic in x:
and it is fitted to those station pairs with a separation of 1000 km or less.
This procedure is repeated separately for each station and for each season, until a set of p0 and xp values is found. These values depend upon the spatial scale of precipitation events—as did the correlation decay length, x0, which showed smaller scales in summer than in winter, and over the Mediterranean than over north western Europe (Fig. 6). In the case of the probabilities, however, this dependence is offset by a dependence on the climate of the region. If it is a dry region, then P(Xi) will be high and pmin will also be high (i.e., even if two stations are independent, then the probability that their dry days coincide will be higher if they each have many dry days). If Pmin is relatively high, then the decay of probability with distance is not rapid and xp will have a larger scale.
The values of p0 and xp for each station in a grid box are then averaged to produce mean values for that box. As for the correlation decay function [Eq. (5)], Eq. (7) can then be integrated over the area of the box, weighted by the distribution of randomly selected station separations, to produce a mean p̄ for each grid box.
What can these p̄ values be used for? The simplest case is for computing the reduction in dry-day frequency from a one-station mean, Pd(n = 1), to the dry-day frequency of a two-station mean, Pd(n = 2). Equation (6) can be rearranged to give
since Pd(n = 2) ≡ P(X1 ∩ X2) and Pd(n = 1) is the mean of P(X1) and P(X2). The shape of this function over the possible range of values is shown in Fig. 11. For 0.3 < p̄ < 1.0, the function is well approximated by a linear function (as shown on Fig. 11)—an important point, since only one grid box in the European, Chinese, and Zimbabwe datasets used (Figs. 1, 2, and 3) had a p̄ value of less than 0.3. Given a value of p̄ for the box and the mean single-station dry-day frequency of the stations within the box, Eq. (8) or its linear approximation can be used to estimate the dry-day frequency of a two-station mean for that grid box.
c. Empirical extension to more than two stations
The reduction in dry-day frequency from a one-station mean to an n-station mean [i.e., Pd(n = n)/Pd(n = 1)] can be computed from Eq. (8) for n = 2 only. The equation cannot be extended to n > 2 because that involves the computation of P(X1 ∩ X2 ∩ X3) if n = 3, which is not a function of P(Xi) and p̄ alone but also depends upon P(X3 | X1 ∩ X2). The latter cannot be expressed as a function of the known values, and so the problem is not closed. (If a wet day is defined as having >0 mm precipitation, rather than >0.1 mm, then the problem is more tractable—an analytical solution is given and discussed in the appendix.) It may be possible to find some assumption to close the problem, based perhaps on known spatial structure or size distribution parameters of precipitation events, that allows such an expression to be made, but instead we have followed an empirical approach.
Using dry-day frequencies of the means of different numbers of European station data in each CSIRO grid box (i.e., the values shown in Fig. 10), values of Pd(n = 5)/Pd(n = 1) and Pd(n = 9)/Pd(n = 1) have been computed for each grid box with sufficient data (11 boxes with at least five stations, 4 with at least nine). These values are significantly correlated with the p̄ values for their box. The calculations were done for each season separately, but the same relationship applies to all seasons (Fig. 12). The relationship appears to be stronger for the case with nine stations (Fig. 12b, rmse of the linear fit is 0.029) than for the case with five stations (Fig. 12a, rmse is 0.057). Note that there is some weak seasonality in the data, with values from DJF (* in Fig. 12) tending to have lower values than those from JJA (▵ in Fig. 12). The grid boxes with sufficient stations to be involved in the derivation of these relationships include at least one Mediterranean box, in addition to the more densely observed northwestern European region.
d. Validation using different regions
The theoretical relationship [Eq. (8) and Fig. 11] between p̄ and the two-station dry-day probability is generally applicable. The extension to five- and nine-station averages is based on European data only, which covers a limited range of climate regions only (i.e., the range of p̄ in Fig. 12 is only 0.5 to 0.8). To validate/improve the relationships, the Chinese (Fig. 2) and Zimbabwean (Fig. 3) datasets are now utilized.
The entire procedure of computing Pd(n) and p̄ has been repeated for the two regions. The decay length approach is not used to compute p̄ here; instead, the direct approach of computing and averaging p between all pairs of stations within the box only is used (since the China and Zimbabwe regions will not be used for climate model validation at present, there is no need to get a p̄ value that is unbiased by nonuniform station distribution).
There are no CSIRO model grid boxes with N = 9 Chinese stations, but there are 14 with N = 5 or more, so one relationship can be validated. The Chinese data show a much stronger correlation between Pd(n = 5)/Pd(n = 1) and p̄ for each box and each season (Fig. 13) than did the European data (Fig. 12a). There is a distinct seasonality between the dry DJF season (high p̄ values) and the wet monsoon season (JJA, lower p̄ values); here, the p̄ statistic is dominated by the climate regime rather than by the spatial scale of precipitation events.
The Zimbabwe data are sufficiently dense to find relationships for both Pd(n = 5)/Pd(n = 1) and Pd(n = 9)/Pd(n = 1) against p̄ (Fig. 14). These relationships are very strong (correlations are 0.99 in both cases), and the data are again very seasonal (JJA is very dry and has p̄ values between 0.92 and 0.98).
Near-linear relationships do exist, therefore, between the reduction in dry-day frequency as the number of stations averaged together increases, Pd(n)/Pd(1) and the mean interstation probability statistic, p̄, for the non-European areas as well as for the European area. But are they the same relationships? Comparison of Fig. 12a, 13, and 14a indicates some difference in the best-fit line between the regions, and this is clarified in Fig. 15a, where the three datasets are combined and the best-fit lines from the individual regions and from all the points are shown. Part of the reason for the differences between the individual best-fit lines is the small range of p̄ values of the European dataset, which provides a much weaker constraint to the best-fit line. So, although the best-fit lines from the Zimbabwe and, particularly, the Chinese datasets are somewhat different, they still provide a reasonable fit to the European data. For example, the rms error of the European data about the European best-fit line is 0.051, only a little better than the rms error of the same data about the Chinese (0.062) or the Zimbabwe (0.071) best-fit lines. For the nine-station case, both the European and Zimbabwe data lie on virtually identical lines (Fig. 15b).
Given the strength of the correlations (0.94 in Fig. 15a; 0.99 in Fig. 15b), the similarity of the constituent datasets, and the range of regimes covered by the combined dataset (p̄ varies from 0.25 to 0.98), the relationships derived from the combination of European, Chinese, and Zimbabwe datasets are likely to be generally applicable.
The analysis is now extended to different numbers of stations—not just five- or nine-station averages. We would expect that the linear relationships should be less steep for low n and steeper for high n, since the number of dry days decreases as the number of stations averaged together increases. This increase in gradient will then level off, just as the curves in Fig. 10 do. All lines should pass through the point [p̄ = 1, Pd(n)/Pd(1) = 1], since if all stations in the box have their dry days at the same time (i.e., p̄ = 1) then the probability of a dry day occurring will be no different regardless of how many stations are averaged together. The best-fit lines already computed (Fig. 15) support these expectations: they both pass close to the point [1, 1] and the nine-station line (Fig. 15b) is steeper than the five-station line (Fig. 15a).
The same calculations have been performed to relate Pd(n)/Pd(1) for n = 2, . . . , 15, to p̄, and give the expected results (Fig. 16). The best-fit lines do become steeper as n increases, and the increase in gradient levels off for high n (Fig. 16c; thin continuous line shows the gradient of these lines). There are fewer values as n increases, since only a limited number of grid boxes have more than 10 stations in them.
The correlation between Pd(n)/Pd(1) and p̄ remains very high whatever value of n is used (Fig. 16a; thin continuous line). It is lowest (0.87) for n = 2 and then increases for all other values of n. Theoretically, the relationship for n = 2 should not be quite linear (Fig. 11) over the range of p̄ values, perhaps explaining the lower correlation. The increasing correlations thereafter indicate that the theoretical curvature might reduce and the linear approximation may become even more realistic. The intercept of the line with p̄ = 1 is close to 1 for all n but shows a systematic variation (Fig. 16b; thin continuous line). The intercept is greater than 1 for a low n, because the linear fit to the theoretical curve results in a slightly too high value at p̄ = 1 (again, see Fig. 11).
These linear relationships suggest the possibility of Pd(n) < 0 for a valid value of p̄, since they intersect the x axis for p̄ > 0. This intersection is realistic, however, and not just a feature of the linear functions assumed here—the nonlinear functions obtained analytically in the appendix also intersect the x axis. For the analytical solutions, there is a codependency of the relationships on Pd(1) that prevents unrealistic probabilities from being obtained.
e. Validation using different gridbox sizes
All calculations so far have used grid boxes of the same size and location as those used by the CSIRO model. Although there is some variation in size with latitude, this is only a moderate change. The relationships found with this grid should be equally applicable to other grid sizes (with the exception of very large boxes, as we shall show). Smaller boxes will show a smaller decrease in Pd(n) as n increases, since the n stations within the box will be closer and thus more similar; but this will be accounted for since p̄ will be greater. The opposite applies to larger grid boxes.
To confirm this, the calculations are repeated using the coarsest model grid (CSU,4° lat × 5° long) and the finest model grid (GFDL, 2.25° lat × 3.75° long) of those AMIP models validated by Osborn and Hulme (1997, manuscript submitted to Int. J. Climatol., hereafter referred to as OH97). These results are indicated by the dotted and dashed lines on Fig. 16, respectively. There is a little more scatter to the results (not shown), and the correlations are slightly weaker between Pd(n)/Pd(1) and p̄ (Fig. 16a). They remain, however, above 0.8. The coarser model grid (CSU) yields relationships with steeper gradients (Fig. 16c) and higher intercepts (Fig. 16b) than those from the CSIRO grid, while the finer model grid (GFDL) yields the opposite. This can be explained by reference to Fig. 11 again. Fitting a straight line to the curve gives a result that depends upon the location of the available data points in p̄ space. The coarser grid has lower values (all the points move leftward along the curve in Fig. 11), and fitting a line to that portion of the curve does indeed yield a steeper line and a higher intercept at p̄ = 1. The finer grid yields higher p̄ values (the points move to the right), and the best-fit line is less steep and has an intercept value nearer to 1 at p̄ = 1.
The same underlying relationship exists, therefore, for all three grid sizes and for all three regions, but the linear approximation to it yields somewhat different values. The differences are largest for n = 5 and become quite small for n = 15. Extremely large boxes (e.g., regional) will lead to low values of p̄, and the linear relationship derived from the finer boxes would no longer be a good approximation.
f. Computation of rainday frequencies
Using the relationships found here, it is possible to estimate the dry-day probability (and hence the number of raindays) of an n-station mean (for n = 2 to 15) from an estimate of p̄ for that box and the mean single-station dry-day probability, Pd(n = 1). The values of the intercept and gradient used are those computed from the intermediate size grid (CSIRO; thin continuous lines in Fig. 16), whose coefficients are almost equal to the mean of the coefficients from the three grids used (Fig. 16; thick continuous line).
It might be possible to extrapolate the relationship to n = ∞ by fitting some function to the curves shown in Fig. 16. The lack of a theoretical function, however, means that such an extrapolation would be rather uncertain. Figures 10 and 16 indicate that there would be little further reduction for n > 15 anyway, so n = 15 is used as our estimate of grid-box means.
The procedure is very similar to that followed for the estimation of the standard deviations of grid-box mean precipitation in section 3d. The dry-day frequencies are calculated for each station in the western European dataset and are then averaged together for each CSIRO grid box that they fall in. This value of Pd(n = 1) can be converted to the number of raindays at a station (shown in Fig. 17a for DJF). These values are equivalent to the start of the curves shown in Fig. 10a for each grid box). The end points of those curves can be directly computed by averaging together the precipitation time series that fall in each grid box and finding the frequency of dry days, Pd(n = N), in these mean time series. There are fewer dry days in these mean time series than in the individual station time series and, hence, more raindays (shown in Fig. 17b).
The values of Pd(n = N) can be estimated from the values of Pd(n = 1) and p̄ for each grid box via the relationships expressed graphically in Fig. 16. The former values were shown in Fig. 17a, and the latter values are calculated by the integration of the decay function over the grid-box area with the appropriate parameters for each grid box. The dryness of the climate regime dominates over the spatial scale of precipitation events in determining p̄; for DJF (Fig. 17c) p̄ is highest in the region with most dry days (south) and lowest in the region with fewest dry days (Scotland). The estimate of Pd(n = N) can then be made (although N is limited to a maximum of 15) using the appropriate relationship shown in Fig. 16 and converted to the number of raindays per season (Fig. 17d). This compares favorably with the directly calculated values (Fig. 17b): the rms error between the two fields is 2.5 days.
The rainday frequencies of those grid boxes with small N will be underestimated. The estimates can be extended, not to an infinite number of stations, but to a coverage of 15 stations per box. The gradient and intercept for n = 15 from Fig. 16 is used to compute Pd(n = 15) from p̄ and Pd(n = 1) for each box, increasing the number of raindays in some boxes (Fig. 17e).
Similar calculations have been done for summer too (not shown). By using this method, we have obtained a better estimate of the grid-box mean rainday frequencies than could be obtained from observed station data alone. These estimates are unbiased by the number of stations available in each box and represent a useful dataset for climate model validation studies.
We can also improve on this method by using the European baseline climatology constructed by Hulme et al. (1995). There are no daily time series available with this climatology, but there are observations of station rainday frequencies [from which Pd(n = 1) can be obtained]. Although this is a gridded climatology, the values are still representative of station statistics [a surface was fitted to the available station rainday frequencies and then sampled on a regular 0.5° × 0.5° grid (see Hulme et al. 1995 for further details)]. The climatology values of Pd(n = 1) are more accurate than those calculated from the western European stations used here, since more stations were used in its construction and elevation has been accounted for (the values are estimated at the mean elevation of each grid cell).
The climatological values of Pd(n = 1) on the fine 0.5° × 0.5° grid that fall in each CSIRO grid box are averaged together. For those CSIRO grid boxes that we have estimated p̄ for (Fig. 17d), we can then recompute Pd(n = 15) using the climatology values of Pd(n = 1). The resulting values are shown for DJF and JJA in Figs. 18a and 18b, respectively. The differences between using the baseline data or the station data are small (for winter, cf. Figs. 18a and 17e), although more noticeable in the grid boxes with few stations in Fig. 1. As would be expected for this region, there are fewer days with precipitation in summer than in winter. The seasonality is a little less, however, in these grid-box statistics than in the station statistics (see Hulme et al. 1995).
5. Validation of the CSIRO AGCM
a. Validation of daily precipitation standard deviation
As an example of the application of the methodology outlined in the present paper, we validate the AMIP simulation of the CSIRO AGCM. A more detailed validation using this methodology of a range of models is given by OH97, including a discussion of the effect of interdecadal climate variations [Wigley and Jones (1986) indicate that rainday frequencies over England and Wales can vary by up to 15% from one decade to another].
Time series of daily precipitation amounts from a 10-yr simulation of the CSIRO model (forced by sea surface temperature anomalies observed from 1979 to 1988) have been obtained. The first year of the simulation was discarded because some of the models used in the AMIP exercise exhibited initialization artefacts. From the remaining 9 yr, the standard deviation and rainday frequency were computed (not shown) at each grid box that was classified as land in the model and for which we have estimated the observed standard deviation and rainday frequency.
The difference between model standard deviations and observed standard deviations, computed on a grid box by grid box basis, has been binned into error bins of size 0.5 mm day−1 (Fig. 19a; thin line winter, thick line summer). The CSIRO model appears to have a bias toward having weaker daily variability than the observations, with the bias being larger in summer. Had the adjustment from N to infinite stations in each box not been made, the model would have appeared to be poorer than it is (Fig. 19b, where the errors are computed by comparing the model with the fields shown in Figs. 8e and 9e). The difference between the standard deviations computed from N stations and those computed from infinite stations can be similarly displayed (Fig. 19c). Boxes with high N show little difference, but boxes with low values of N show higher variability compared to the field adjusted for infinite sampling (i.e., the boxes falling to the right of the zero line in Fig. 19c are those with low values of N).
b. Validation of rainday frequencies
A similar comparison has been performed for the rainday frequencies (Fig. 20). The CSIRO model shows too few raindays in summer and too many in winter (Fig. 20a), although the distributions cross the zero error line more than for the standard deviations. Had the observed values not been adjusted up to 15 stations, then the boxes with low N would appear to have fewer raindays than they actually do (i.e., those boxes cause the distribution in Fig. 20c to shift to the left of the zero line). As a consequence, the model’s winter bias in having too many raindays would have appeared worse than it truly is, while its summer simulation would have appeared to be more realistic.
6. Summary and conclusions
The difficulty in comparing the statistical characteristics of time series from (station) point sources with time series from (grid box) areal-mean sources is highlighted. This difficulty is increased for variables whose anomalies occur with characteristic spatial scales that are less than the size of the grid boxes. Precipitation on a daily timescale is one such variable. The problem becomes even more extreme for hourly precipitation (Chen et al. 1996). With a very high density of stations with sufficiently long time series of daily data, a good estimate of grid-box mean values could be made, and the difficulty would be resolved. Such a station network, however, is only available to the scientific community for very limited regions of the world.
Evaluation of the statistics of daily precipitation simulated by climate models has been limited because of this problem. For example, Gordon et al. (1992), Whetton et al. (1993), and Fowler and Hennessy (1995) make use of model estimates of daily precipitation, yet make no attempt to evaluate the current climate model simulations of daily precipitation. In the present paper, we have developed methodologies to estimate some of the statistics of daily grid-box mean precipitation using relatively few stations, by making use of parameters that describe the spatial scale of precipitation events. Estimates have been made of the daily standard deviation and of the frequency of raindays for grid boxes covering part of western Europe. We believe these estimates provide a better field for the validation of climate models than have previously been available. A different estimate must be constructed for each model, as the grid boxes vary in size and position from one model to another. As an example of the application of this methodology, we have constructed an estimate of these observed statistics for the CSIRO model grid and evaluate the simulation of that model. Systematic biases in the simulation were found, but see OH97 for a more comprehensive validation of a range of models.
The methodology for estimating the standard deviation uses a previously established relationship between station and grid-box means that depends upon the mean interstation correlation of the station precipitation time series in a box, r̄. A few station time series are required to estimate r̄, but the relationship then allows the standard deviation of an infinitely sampled grid-box mean to be calculated. The methodology for estimating the rainday frequency of a grid-box mean time series from station time series is similar but uses a new, partly empirical, relationship that is derived in the present paper. This relationship appears universal. A similar parameter to r̄ is required, which is the mean interstation probability of coincident dry days (denoted by p̄). This parameter again depends upon the spatial scale of the anomalies, but also (and to a greater extent) upon the climate regime (in that regions with more dry days have a higher p̄).
These relationships enable a better validation to be made of the daily precipitation standard deviation and rainday frequency simulations of climate models. Precipitation totals can, of course, be validated with more ease by using existing climatologies (e.g, Hulme et al. 1995). Observed precipitation intensity on raindays can be calculated by dividing these precipitation totals by the number of raindays estimate in the present paper (see OH97). It would be useful to derive a similar methodology that would allow estimates of other statistical parameters of grid-box mean daily precipitation from relatively few station time series, such as skewness.
We also make some comments on previous validation work, as a consequence of the results found here. Reed (1986) found somewhat better agreement between model and observations when he used a six-station mean rather than a single station. Even closer agreement was found when an average of 144 stations from all over England and Wales was used for comparison. These stations were not all in the grid box being validated, however, and so would have a lower p̄ than those in the box (due to their wider separation); hence the number of raindays might be higher than for the true grid-box mean. This highlights the point that the number of raindays can be (artificially) raised as high as is required by selecting stations outside the area being validated. Gregory and Mitchell (1995) use both 3- and 11-station averages for a grid box over England to estimate the error due to the lack of complete coverage. In German and Italian grid boxes, however, they have only three stations available. The work completed here indicates that more than three stations is required to get an accurate estimate of the rainday frequency for these areas. It implies that the apparent bias to too many raindays over Germany of the model validated by Gregory and Mitchell (1995) may have been overestimated, whereas the reported good agreement over Italy may in fact mask a tendency of the model to produce too few raindays. Such possibilities should be reassessed using the methodology developed here.
This work was supported by the U.K. Department of Environment (EPG 1/1/14) and by the U.S. Department of Energy (DE-FG02-86ER60397). The AMIP model data were supplied by PCMDI, LLNL, under AMIP Diagnostic Sub-project No.21 (Surface Climatologies, P.I. Dr. P. D. Jones). Discussions with Jonathan Gregory, Phil Jones, and Keith Briffa are acknowledged, as are comments from two reviewers. David Viner (Climate Impacts LINK Project, Department of Environment EPG 1/1/16) provided computer facilities for this work.
An Analytical Expression for the Rainday Frequency of an Areal Mean
If we change the threshold for distinguishing wet and dry days from 0.1 mm to 0 mm—so that any precipitation in a grid box constitutes a rainday—then the problem of estimating the dry-day probability of an areal mean (in this case a grid box) from the mean dry-day probability of single stations becomes closed.
Let P∞ be the probability that there is no precipitation anywhere in a box, let Pd(n) be the probability that there is no precipitation at n randomly selected points in the box, and let p̄ be the mean value of p for the box [with p defined by Eq. (6)]. If the fraction of the box affected by a one-day precipitation event is, in the long-term mean, given by R, then
For n = 1, the chance of a point being dry is the sum of the probability that the entire box is dry (P∞) and the chance that there is some precipitation (1 − P∞) but that the selected point is not in the affected area (1 − R). As more points are selected (n > 1), the chance that they all miss the precipitation-affected area decreases—hence the exponent on (1 − R).
Putting n = 1 and n = 2 separately into Eq. (A1), together with Eq. (8), yields three equations with five unknowns [Pd(1), Pd(2), P∞, p̄, and R]. If we choose to remove Pd(2) and R by substitution, we obtain
Note that P∞ ≡ Pd(∞), and also that Eq. (A1) can be used to compute Pd(n) for any n.
We have, therefore, obtained an expression for the ratio of the dry-day probability of an areal mean to that of a single point (i.e., station). It is not solely dependent upon p̄, as we had assumed in the empirical relationships (with nonzero threshold) in the main part of the paper. A separate curve is, therefore, obtained for each value of Pd(1), and nine such curves are shown in panel a of Figure A1.
These results demonstrate three points of interest. First, the empirical approach may be flawed because linear relationships are used, and also ones that depend solely on p̄. The empirical data do not, however, appear to show the nonlinear character of curves such as those in Figure A1a (although a single curve cannot be chosen for comparison with the data, since each describes the statistics for an infintely sampled grid box with a different station dry-day probability). This implies that, when a nonzero threshold is used, the relationships become more linear and less dependent upon Pd(1).
Second, since Pd(∞)/Pd(1) is still dependent upon Pd(1), it is simpler to plot Pd(∞) against p̄ for various values of Pd(1) (see Figure A1b). If p̄ = 1, then Pd(∞) = Pd(1), so the height of the curves at the right-hand axis of this figure identify the single-station dry-day probabilities. As p̄ decreases, the areal-mean probability decreases—but at a greater rate for drier locations. This leads to the (initially) surprising result that the curves cross (Fig. A1b)—implying, for example, that for two boxes with the same p̄ = 0.6, the one that is drier at the station scale [Pd(1) = 0.7] actually has fewer dry days than one that is wetter at the station scale [Pd(1) = 0.5]. In fact, since p̄ depends on both Pd(1) and on the spatial scale of precipitation events, to get the same value of p̄ in the two boxes implies that the spatial scale of precipitation events in the drier box must be very much less than in the wetter box—and hence a greater chance of precipitation occurring somewhere in the box.
Third, the curves intersect with the x axis for values of p̄ > 0, implying that negative probabilities could be obtained. In fact, this is not possible—again due to the dependence of p̄ on Pd(1). If Pd(1) = 0.9, for example, then the minimum value of p̄ is 0.818 [i.e., pmin: see Eq. (6) in the main text and the subsequent discussion]. This is exactly the point at which Pd(∞) = 0; as expected, therefore, if all points are statistically independent, then the areal mean will never be dry. A negative probability can never be obtained—p̄ can be lower only if Pd(1) is lower—in which case a different curve is followed that does not intersect with the x axis until a lower value of p̄ is reached.
Corresponding author address: Dr. Timothy J. Osborn, Climatic Research Unit, University of East Anglia, Norwich NR4 7TJ, United Kingdom.