Skillful forecasts of 3-month total precipitation would be useful for decision making in hydrology, agriculture, public health, and other sectors of society. However, with some exceptions, the skill of seasonal precipitation outlooks is modest, leaving uncertainty in how to best make use of them. Seasonal precipitation forecast skill is generally lower than the skill of forecasts for temperature or atmospheric circulation patterns for the same location and time. This is attributable to the smaller-scale, more complex physics of precipitation, resulting in its “noisier” and hence less predictable character. By contrast, associated temperature and circulation patterns are larger scale, in keeping with the anomalous boundary conditions (e.g., sea surface temperature) that often give rise to them.
Using two atmospheric general circulation models forced by observed sea surface temperature anomalies, the skill of simulations of total seasonal precipitation is examined as a function of the size of the spatial domain over which the precipitation total is averaged. Results show that spatial aggregation increases skill and, by the skill measures used here, does so to a greater extent for precipitation than for temperature. Corroborative results are presented in an observational framework at smaller spatial scales for gauge rainfalls in northeast Brazil.
The findings imply that when seasonal forecasts for precipitation are issued, the accompanying guidance on their expected skills should explicitly specify to which spatial aggregation level the skills apply. Information about skills expected at other levels of aggregation should be supplied for users who may work at such levels.
Forecasts of 3-month precipitation totals are issued by some climate forecasting institutions, such as the European Centre for Medium-Range Weather Forecasts (ECMWF), the National Oceanic and Atmospheric Administration (NOAA) Climate Prediction Center (CPC), the Met Office (UKMO), and the International Research Institute for Climate Prediction (IRI). For many locations and seasons, the expected skill of such forecasts is modest, and users are given guidance on what skill levels apply to specific portions of the forecasts. While skill may be high enough to be statistically significant over a large sample of cases, it is often low in an absolute sense. Therefore, forecasts of 3-month precipitation totals are often expressed probabilistically rather than deterministically.
Predictive skill for precipitation is usually lower than that of temperature or upper-atmospheric circulation. This is largely because precipitation events are known for their “noisy,” small-scale character, due to the occurrences of discrete individual convective cells. Even nonconvective precipitation usually contains pockets of locally heavy rainfall. Within the general region experiencing a large-scale precipitation anomaly, these local points of particularly heavy rainfall are a result of the comparatively complex physics of precipitation, and are rarely forecastable at lead times beyond a couple of hours. Even when totaled over a 3-month period, substantial irregularities in the spatial pattern are likely, particularly in the Tropics where convective rainfall is most common. The larger-scale circulation pattern containing the general precipitation anomaly may be somewhat predictable several months in advance, as when it is related to a predictable anomaly in sea surface temperature (SST) such as one associated with the El Niño–Southern Oscillation (ENSO) phenomenon in the tropical Pacific.
Because noise is present in precipitation fields, it is a major factor in precipitation measurement as well. Noise in precipitation measurements arises not only from sampling variability related to incomplete observational coverage of a precipitation field having sharp irregularities, but also from inaccuracies of the rainfall recordings at the rain gauges themselves. Rain gauge errors are mainly catch reductions that are approximately linearly proportional to wind speed during the rainfall. For example, Larson and Peck (1974) found 11% mean catch deficiencies in 4.5 m s⁻¹ winds and 20% deficiencies in 9 m s⁻¹ winds. The variability of the gauge site errors is therefore a function of the gauge-to-gauge variability of the wind speed integrated over the course of a rainfall event. For most rain gauge networks, this source of noise is considerably smaller than sampling variability (Barnston 1991). Still other measurement problems arise due to station relocations and varying sampling periods among gauges in a network (Groisman et al. 1991). It is worth pointing out that while most of the sampling variability is associated with small-scale, random meteorological processes that cannot be predicted in seasonal forecasts, it is possible that some portion of the small-scale precipitation field may be due to local land surface features (e.g., orography, vegetation, or soil wetness) that are potentially predictable using more regional models. In the present study, only global models are used and this potentially predictable variability assumes the role of noise.
Because precipitation contains noise superposed on what otherwise might be a smoother field, such as that of the embedding circulation and temperature anomaly pattern, it is possible to apply a spatial filter to the precipitation total to produce a version of the precipitation that is more skillfully predicted. Integrating precipitation over a larger area is what is desired in reservoir management, and hydrologists can therefore benefit from this enhanced predictability. On the other hand, farmers and others whose livelihoods depend on highly localized precipitation cannot benefit as much or as directly from a spatially aggregated forecast. Indirect benefits are possible, for example, if water can be transported from a regional storage facility to individual users.
Let us consider in a theoretical framework how spatial aggregation might be expected to achieve noise reduction, and a higher correlation between rainfall and an external forcing signal. For simplicity, suppose we initially assume two idealized conditions: 1) that we are working across a region having a uniform response to the forcing, and 2) that the amplitude and spatial scale of the noise are also uniform throughout the region. Under these conditions, the response of the precipitation to the external signal(s) becomes more clearly identifiable as the number of independent samples across the region increases. Thus, the least favorable signal-to-noise ratio (S/N) would result from the use of a single rain gauge, and improved S/Ns would be achieved with increasingly large numbers of independent gauge samples. A lack of independent gauge data would be illustrated, for example, in the use of two redundantly located gauges, such as two gauges 10 m apart from one another. The scale of the noise “parcels” (that is, the spatial scale of the noise) might be reflected in the size of the individual cells of the typical convective rainfall (order of 1–10 km). Independence of observations with respect to the noise would be effected by spacing the gauges approximately this distance apart from one another, and noise could be reduced maximally by deploying gauges in this manner across the entire region. Under these ideal circumstances, the value of the noise (N) in S/N, in terms of the noise variance, would be reduced by the factor n, where n represents the number of effectively independent gauge samples: VAR(N) ∝ 1/n. Meanwhile, S would remain unchanged. It should be noted that a lack of independence of the gauge data may come about not only due to unfavorable gauge-spacing decisions, but also due to a large spatial scale of the noise, as when one “parcel” of noise is as large as the study region.
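As a rough illustration of the idealized argument above, the following sketch computes the correlation expected between the forcing signal and the average of n gauges when the signal variance is held fixed and the noise variance falls as 1/n. The function name and the variance values are illustrative assumptions and are not taken from this study.

```python
import math

def expected_correlation(signal_var, noise_var, n_independent):
    """Correlation between the forcing signal and the mean of n gauges,
    assuming a uniform signal and independent, equal-variance noise."""
    if n_independent < 1:
        raise ValueError("need at least one gauge")
    # Averaging n independent gauges divides the noise variance by n.
    averaged_noise_var = noise_var / n_independent
    return math.sqrt(signal_var / (signal_var + averaged_noise_var))

# Hypothetical case: noise variance four times the signal variance.
correlations = [expected_correlation(1.0, 4.0, n) for n in (1, 4, 16)]
```

Under these assumptions the correlation rises monotonically with n and approaches 1 only as the averaged noise variance becomes small relative to the signal variance.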
In the real world, the circumstances conveniently assumed above do not exist. We do not know the size of the noise parcels with sufficient accuracy because the size depends not only on the type of rainfall event typical for the region and season, but also on the proportion of the unpredictable variations attributable to this type of noise as opposed to signals other than the one or more that are expected a priori. Additional SST-forced signals may be unrecognized and/or of unknown nature, and would play the role of noise in our frame of reference. Moreover, some forms of noise may encompass spatial scales sufficient to govern a large region similarly over a given season. For example, a well-known form of noise is the North Atlantic Oscillation (NAO), which is largely unpredictable and not strongly related to SST forcing. Effects of the NAO would not be filterable upon aggregating rainfalls over regions of 10°–20° in affected regions of the Northern Hemisphere in winter, and thus no improvement in skill would be realized.
Because we often do not have adequate knowledge of the spatial character of the variability other than that associated with known and documented SST-forced signals across a given region, we cannot analytically derive a detailed or reliable profile of the geographical distribution of the effect of spatial aggregation on simulation skill for a given season. Therefore, rather than attempting an analytic approach, we aim to measure it more directly, measuring skill as a function of aggregation level, using atmospheric general circulation model (AGCM) simulations and observations over a historical period. Here also, one must be cautious about believing the results too literally, as use of data for only a few decades inevitably produces sampling errors in the results. To first order, however, in locations and seasons in which large apparent skill improvements result from spatial aggregation, we can assume that the externally forced signal is fairly homogeneous across the region and that some portion of the unpredictable variance (noise) occurs on a spatial dimension sufficiently small to allow the aggregation-sampled locations to include some independent, cancelable realizations of the noise. It may be noted that another approach to this inquiry would be measurement of the correlation structure of the precipitation as a function of spatial separation, from which the effects of aggregation could be surmised. While we do not pursue this here, more will be said about it later.
In this study, we empirically determine the degree to which spatial aggregation increases the skill of precipitation simulations of two AGCMs that are driven with prescribed observed SST as a lower boundary condition. For comparison, we perform the same exercise on 2-m air temperature to show that aggregation has less influence on the skill of a variable that is already fairly smooth. To further the case for precipitation, we examine the effect of aggregation on the strength of association between a simultaneous SST anomaly and precipitation at smaller scales using rainfall data at a set of stations in the state of Ceará in northeast Brazil.
2. Data and analysis method
The AGCM datasets used here are monthly mean simulations of precipitation and temperature from the ECHAM4.5 AGCM (Roeckner et al. 1996) and the Community Climate Model 3.2 (CCM3.2) AGCM (Hack et al. 1998; Hurrell et al. 1998; Kiehl et al. 1998), forced with simultaneous observed global SST. These data are gridded with 2.81° latitude–longitude resolution. The verifying observed data come from the Climate Anomaly Monitoring System (CAMS; Ropelewski et al. 1985) for temperature, and from the Climate Prediction Center (CPC) Merged Analysis of Precipitation (CMAP; Xie and Arkin 1997) and the University of East Anglia (New et al. 1999, 2000) for precipitation. Rain gauge precipitation data at 84 stations in the state of Ceará come from FUNCEME in Fortaleza, Brazil.
Using the AGCM simulation data, skills are determined with respect to the corresponding observed data, and the verification measure used here is the temporal correlation coefficient for a given 3-month season over the historical period of 1965–97. Skills are computed using several levels of aggregation. The lowest level of aggregation is the single grid square (or rain gauge in the Brazil study), and the higher levels of aggregation include ones using 3 adjacent squares; 9 squares; and 15, 25, and 35 squares, respectively. After examining the overall variation of skill as a function of the above levels of aggregation, attention is subsequently given to the results for 15 points versus the unaggregated results in greater detail, for reasons to be provided below. The 15 points include two points east and two points west of the central reference point, and one point north and south of each, forming three rows of five points per row. The domain of 50°N–50°S is used in all analyses, and only grid points over land are included. The analyses are applied to the four regular 3-month seasons of January–February–March (JFM), April–May–June (AMJ), etc. (Hereafter all 3-month seasons will be referred to by the first letter of each month, respectively.)
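The 15-point aggregation and the correlation skill described above can be sketched as follows. This is a minimal illustration under stated assumptions: the nested-list layout `field[year][row][col]`, the function names, and the unweighted average over land points are hypothetical choices, since the text does not specify data structures or area weighting.

```python
import math

def block_offsets(half_width=2, half_height=1):
    """(row, col) offsets of a 3-by-5 block centred on a reference point."""
    return [(dr, dc)
            for dr in range(-half_height, half_height + 1)
            for dc in range(-half_width, half_width + 1)]

def pearson(x, y):
    """Temporal correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def block_mean_series(field, row, col, land_mask):
    """Yearly unweighted mean over the land points of the 15-point block.
    field[year][row][col] holds seasonal values; land_mask[row][col] is bool.
    Points outside the grid are skipped (no longitude wrap-around)."""
    n_rows, n_cols = len(land_mask), len(land_mask[0])
    points = [(row + dr, col + dc) for dr, dc in block_offsets()
              if 0 <= row + dr < n_rows and 0 <= col + dc < n_cols
              and land_mask[row + dr][col + dc]]
    if not points:
        raise ValueError("no land points in block")
    return [sum(year[r][c] for r, c in points) / len(points) for year in field]
```

The aggregated skill at a reference point would then be `pearson(block_mean_series(model, r, c, mask), block_mean_series(obs, r, c, mask))`, to be compared with `pearson` applied to the two single-point series.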
It is assumed that the climate signals reflected in the model simulations are shared approximately equally by all 15 points in an aggregated region. Of course, this is not always the case. In an attempt to select only those cases in which this assumption shows some evidence of being met, the following requirements are imposed: The average correlation skill over the 15 grid points must be at least 0.2, and at least 10 of the 15 points must themselves have this level of skill. For locations near a coastline or a lake, where some of the adjacent grid points may be over water, an aggregation region is qualified if both of the above rules are met even if there are as few as 10 points over land. For example, a region having only 12 of the 15 points over land may qualify if at least 10 points have the required correlation skill of 0.2 and the average correlation skill over the 12 points is also at least 0.2. The result of this relaxation is that the average aggregation size is approximately 14.5 grid points instead of 15.
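The qualification rule above can be written as a short check. The function and parameter names are hypothetical; the thresholds (0.2 average skill, at least 10 points at that skill, at least 10 land points) are those stated in the text.

```python
def region_qualifies(point_skills, min_skill=0.2, min_points=10):
    """Decide whether a 15-point block qualifies for the aggregation analysis.
    point_skills: correlation skills of the land points in the block."""
    if len(point_skills) < min_points:
        return False  # too few land points
    mean_skill = sum(point_skills) / len(point_skills)
    n_skillful = sum(1 for r in point_skills if r >= min_skill)
    return mean_skill >= min_skill and n_skillful >= min_points

# Coastal example in the spirit of the text: 12 land points, 10 of them skillful.
coastal_block = [0.3] * 10 + [0.1, 0.05]
qualifies = region_qualifies(coastal_block)
```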
One might challenge the consideration of a correlation skill as low as 0.2. The low skill level is used to increase the sample of qualifying points so that the conclusions pertaining to skill increases related to aggregation will be more robust. Using a skill requirement of 0.3 results in a paucity of qualifying regions for precipitation, rendering the results less meaningful.
a. Gridded global AGCM simulations versus observations
Figure 1 shows, by model and by variable, the ratios of the temporal correlation skills for aggregated areas to skills at the respective individual central grid points, for aggregation levels ranging from 3 to 35 grid points. Results are averaged over all qualifying points (as specified above) and over all four seasons. The results show that increasing aggregation progressively increases the simulation skills for both precipitation and temperature. Further, improvements are uniformly greater for precipitation than for temperature in terms of correlation ratios. The fact that these relationships are very similar between the two independently developed AGCMs adds to their believability.
An issue pertaining to the upper portion of the aggregation range is that of the homogeneity of the climate signals over increasingly large areas. The curves in Fig. 1 show some decrease in slope beyond the 15-point aggregation, suggesting that some inhomogeneity of climate effects may be occurring. This would create a situation where skills over more than one signal area may be averaged together as effectively smaller aggregation levels. Nonetheless, the continuation of improved skill between 15 and 25 points, and particularly between 25 and 35 points, suggests that climate signal areas may often be of the order of 15° or more latitude–longitude. Adding some uncertainty to this conclusion, however, is the fact that for the 25-point and particularly the 35-point analyses the number of qualifying points is quite limited for precipitation, yielding small samples from which to form skill ratios. The analyses here are intended to determine the skill benefits of aggregation under homogeneous climate forcing conditions. Because aggregation levels of 25 or more points may bring diminishing returns in skill and leave only marginally adequate numbers of qualifying samples, the remainder of this study focuses on skills for the 15-point aggregation level and their comparison with unaggregated skills.
The percentages of grid points qualifying for the skill examination for the 15-point aggregation are shown in Table 1 for each of the two AGCMs, for precipitation and temperature, and for each of the four 3-month seasons. It is apparent that precipitation yields far fewer qualifying locations than temperature, and that the ECHAM4.5 AGCM tends to have a slightly greater number of qualifying locations than CCM3.2. Table 2 is designed identically to Table 1, but shows the average of the multipoint area average correlation skills for all qualifying locations. Here it is seen that within the set of qualifying locations, variations in skill across season and AGCM are not large. While the average skills for temperature are not much higher than those for precipitation after aggregation, the margin of difference is greater for the skills of the unaggregated central grid squares (not shown in Table 2).
The left and center portions of Table 3 show, by model, by variable, and by season, the ratios and the differences of average AGCM simulation skills at individual central grid points versus the embedding 15-point spatial average. The results show that aggregation increases the simulation skills under all conditions. Further, improvements are consistently greater for precipitation than for temperature. In terms of skill ratio, there is an approximately 20% improvement for precipitation, compared to 11%–12% for temperature. For actual skill increases, differences between precipitation and temperature are not as large, with precipitation skill improving by approximately 0.09 compared with about 0.06 for temperature. In none of the four 3-month periods for either AGCM is the temperature ratio or actual skill difference as large as the corresponding ratio or difference for precipitation. The right portion of Table 3 shows the ratios of error variance associated with unaggregated versus aggregated correlation skill, where the error variance (or unexplained variance) is defined as 1 − r², where r is the correlation coefficient. For example, if aggregation increases the correlation skill from 0.3 to 0.5, the error variance decreases from 0.91 to 0.75, and the error variance ratio is 1.21. In addition to the skill ratio and skill difference, the error variance ratio is offered as an alternate measure of changes in value or utility. A 0.2 increase in the correlation shows up as a greater error variance ratio when the original correlation is relatively high, as it is for temperature, than when it is lower, as in the case of precipitation. Table 3 shows that the error variance ratios are approximately equal for temperature and precipitation, offering a different possible interpretation from that given above about precipitation being more greatly benefited by aggregation than temperature.
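The three measures compared in Table 3 can be reproduced for the worked example in the text (a correlation rising from 0.3 to 0.5). The function names are hypothetical; the definitions follow the text, with error variance taken as 1 − r².

```python
def error_variance(r):
    """Unexplained variance for a correlation skill r."""
    return 1.0 - r * r

def skill_measures(r_single, r_aggregated):
    """Skill ratio, skill difference, and error variance ratio
    (unaggregated error variance over aggregated error variance)."""
    return {
        "skill_ratio": r_aggregated / r_single,
        "skill_difference": r_aggregated - r_single,
        "error_variance_ratio":
            error_variance(r_single) / error_variance(r_aggregated),
    }

example = skill_measures(0.3, 0.5)      # error variances 0.91 and 0.75
higher_base = skill_measures(0.6, 0.8)  # same 0.2 gain, higher starting skill
```

Comparing the two cases shows the point made above: the same 0.2 gain in correlation yields a larger error variance ratio when the starting correlation is higher.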
The geographical distribution of baseline correlation skill, ratio of aggregated to unaggregated skill, and difference between the two, are shown in Figs. 2, 3, and 4, respectively, for precipitation for each of the four 3-month seasons examined. Only results for the ECHAM4.5 AGCM are shown, as those from CCM3.2 are basically similar. Figure 2 indicates the seasonal cycle of the regions of potential predictability for precipitation, based on the variations of SST anomalies in known ocean basin locations. These regions are consistent with those noted in other studies. In JFM, for example, precipitation anomalies are expected in the southern and Ohio Valley portions of the United States in association with ENSO (Ropelewski and Halpert 1987; Barnston 1994; Mason and Goddard 2001), and in parts of southern Africa due to both ENSO and SST anomalies in the Indian Ocean (Rocha and Simmonds 1997; Goddard and Graham 1999; Mason 1998). In AMJ, rainfall in northeast Brazil is a response to both ENSO and the SST anomaly pattern in the tropical Atlantic (Moura and Shukla 1981; Ward and Folland 1991; Hastenrath and Lamb 1977; Folland et al. 2001). In JAS, eastern Australia and Indonesia have precipitation responses to ENSO (Casey 1995; Ropelewski and Halpert 1987), and the Guinea coast in Africa is sensitive both to ENSO and to the SST anomaly pattern in the eastern tropical Atlantic (Folland et al. 1986; Lamb and Peppler 1992; Wagner and da Silva 1994; Ward 1998; Goddard and Mason 2002). Rainfall in southeastern Brazil is also sensitive to the ENSO state during much of the second half of the calendar year (Grimm et al. 2000). While both the Sahel and India have known responses to ENSO and to the SST anomalies in other ocean basins during JAS (Ward 1998; Thapliyal 1997), the ECHAM4.5 is unable to reproduce these. (The CCM3.2 does qualify five contiguous grid points in the Sahel, and none in India.) 
In OND, Indonesia, southeastern Brazil, and Uruguay are sensitive to ENSO, and east equatorial Africa rainfall responds to ENSO indirectly by virtue of its effect on more local western Indian Ocean SST anomalies (Mutai et al. 1998; Goddard and Graham 1999; Indeje et al. 2000). Precipitation in many of the above-mentioned regions/seasons has been shown to be predicted with some skill in real time by the IRI over the last four years (Goddard et al. 2003), using predicted SST to drive several AGCMs including the two AGCMs used in the present study.
Some regions in addition to those mentioned above have potential SST-forced predictability (Fig. 2). Certain of these potential forecast skills have also been documented, such as rainfall in northern South America and Central America (Aceituno 1988; Ropelewski and Halpert 1987) for several seasons, parts of southwest Asia in northern winter and spring (Barlow et al. 2002; Tippett et al. 2003; and others). It is also noted that potential predictive skill may be present in regions during seasons that are climatologically fairly dry, such as northeastern Brazil in JAS. While interesting, these cases may lack practical utility.
Figures 3 and 4 show, as ratios and differences for aggregated versus nonaggregated conditions, the extent to which a 15:1 spatial aggregation process can increase potential predictability for precipitation. For example, a large increase in skill is shown in tropical eastern Africa during the short rains in OND. This suggests that there is a fairly large area, including Somalia, northern Kenya, and southern Ethiopia, that shares a common SST-forced precipitation signal but is plagued with substantial local variability that can greatly weaken the relationship in the absence of spatial aggregation. The very high predictability in northeastern Brazil in AMJ (Fig. 2) can be further improved to some extent through aggregation (Figs. 3 and 4), but the greatest benefits are realized farther west over interior Brazil where unaggregated skills are lower. Gains in skill are noteworthy in portions of the Gulf of Guinea in JAS, and in parts of Indonesia in JAS and OND. In the Sahel in JAS, results from the CCM3.2 AGCM (not shown) indicate some benefit from aggregation, in which single-point skills of 0.20–0.40 are raised by amounts of up to 0.1. A benefit is also seen in southeastern southern Africa in JFM. Aggregation enhances predictive skill by relatively smaller amounts in the United States in JFM and in extratropical eastern Australia in JAS. This is likely due to much of the precipitation being nonconvective and of a larger, synoptic scale during the local winter season, resulting in comparatively less noise to be filtered by spatial aggregation.
Figure 5 shows, for only the AMJ season, results for temperature: the baseline skill, and the ratio and the difference for aggregated versus nonaggregated data. Comparing with the upper right panels of Figs. 2, 3, and 4, it is seen that skills of 0.2 or higher are much more common for temperature than for precipitation; however, the beneficial effects of aggregation are smaller in terms of correlation ratios and differences.
b. Rain gauge observations in northeast Brazil versus Niño-3 region SST
The gridded AGCM simulation data provide fairly clear indications of the sensitivity of forecast skill to precipitation aggregation for areas equal to or greater than the nearly 3° latitude–longitude resolution supplied by the atmospheric models. However, the unaggregated grid square may actually already contain some implicit aggregation, as when the analysis method used for the gridding involves averaging of stations within the square and/or interpolation from additional stations outside of the square. Thus, the effects of aggregation may be relatively less trustworthy on the lower portion of the aggregation continuum, as represented for example by the results for three squares in Fig. 1. In addition to a need to better assess effects of aggregation on skill for smaller areas, one also may wonder at what rate the skill might continue to decrease for areas still smaller than the model grid units. In an attempt to assess these, a set of high-quality rain gauge rainfall data was obtained from FUNCEME. A set of 84 stations was used, covering the 27-yr period of 1971–97. For any month, no more than 1 year out of the 27 could be missing to qualify the station. Missing months were filled using linear regression with neighboring stations serving as predictors, with high training period correlations. The stations are spread throughout the state of Ceará, which spans approximately 3°–8°S, 38°–41°W. While not equally spaced, the stations cover the area of the somewhat cone-shaped state fairly well. The shape of the state of Ceará is such that the effective area is slightly larger than one AGCM grid box.
We examine the effects of spatial averaging in the context of the observed correlation between rainfall and the Niño-3 SST index, which is the average SST over the large area of 5°N–5°S, 90°–150°W. The Niño-3 SST index is closely related to the ENSO state (Zebiak and Cane 1987; Rasmusson and Carpenter 1982). The ENSO state is one of the two major factors well known to influence the rainfall in northeast Brazil during the northern spring rainy season, the other factor being the polarity of a north–south dipole in tropical Atlantic SST related to the intertropical convergence zone (Hastenrath 1984). For simplicity, we focus here only on the Niño-3 index, keeping in mind that the correlations with rainfall would be significantly higher if the Atlantic SST were also taken into account. For consistency with the analysis periods used before, we look at the correlations for the JFM and AMJ seasons, as the other two 3-month periods usually have very little precipitation.
To examine the effect of averaging over smaller areas than was possible in the AGCM analyses, we look at the average of the individual correlations over each of the 84 stations, followed by three levels of progressively greater area averaging. The first level divides the area into 10 subareas, each roughly 1° latitude–longitude. We then compute the average of the 10 correlations between the subarea average rainfall and Niño-3 SST. Each subarea contains approximately eight stations, with some variation. A second level of averaging collapses two adjacent subareas of the 10 described above into one larger area, resulting in 5 areas with about 17 stations per area, with some variation. Finally, the highest level of aggregation is that of the area average of the rainfalls of all 84 stations. No gridding or equal-area accommodation is made for the all-area average.
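The successive averaging levels described above can be sketched as one routine that averages station rainfall within groups and then averages the resulting correlations with the index. The dictionary layout, function names, and the tiny synthetic series are illustrative assumptions; they are not the FUNCEME data.

```python
import math

def pearson(x, y):
    """Correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean_correlation_by_level(station_series, station_group, index_series):
    """Average, over groups, of the correlation between each group's
    mean rainfall series and the index (e.g., a Nino-3 SST series)."""
    groups = {}
    for station, series in station_series.items():
        groups.setdefault(station_group[station], []).append(series)
    correlations = []
    for members in groups.values():
        area_mean = [sum(vals) / len(vals) for vals in zip(*members)]
        correlations.append(pearson(area_mean, index_series))
    return sum(correlations) / len(correlations)

# Synthetic example: two stations share a signal but carry opposite noise.
index = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noise = [2.0, -2.0, 2.0, -2.0, 2.0, -2.0]
stations = {"A": [s + e for s, e in zip(index, noise)],
            "B": [s - e for s, e in zip(index, noise)]}
single = mean_correlation_by_level(stations, {"A": "A", "B": "B"}, index)
pooled = mean_correlation_by_level(stations, {"A": "all", "B": "all"}, index)
```

Mapping every station to its own group gives the single-gauge level; mapping stations to 10 subareas, 5 areas, or one common label gives the higher levels. In the contrived example the noise cancels exactly on pooling, which real gauge data would not do.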
Table 4 shows the resulting correlations for the four aggregation levels for both 3-month seasons. Correlations with Niño-3 SST are much higher in AMJ than in JFM, providing the opportunity to examine results in a context of a low and a higher strength of association. In both cases, correlations increase as expected with increasing aggregation. There is a noticeable skill increase when considering the change from a single rain gauge to a cluster of 5–14 gauges. An increase in aggregation to about double the area (12–24 gauges) results in an insignificant increase, and the final move to an entire 84-gauge area average provides a small amount of additional noise removal. Recall from the earlier AGCM analyses that northeastern Brazil showed only a moderate amount of improvement with aggregation of rainfall at 10–15 nearby grid squares. Nonetheless, “zooming” down to single-gauge rainfalls clearly lowers the signal-to-noise ratio, with implications for forecast skill. The spatial scale of interest to farmers with small, nonirrigated fields would be best represented by rainfall as measured by a single gauge.
c. Statistical significance
An evaluation of the statistical significance of the results is important because it estimates the probability that they could have come about by chance, due to sampling variability. Table 3 shows that for all four times of the year, aggregation improves simulation skill as compared with unaggregated skill for both temperature and precipitation, on average, over all locations having a qualifying level of baseline skill. Let us test the difference between the global means of the ratios of aggregated to unaggregated correlation-measured skill for precipitation and that of temperature, using a t test. This assesses whether the greater improvements for precipitation than for temperature are statistically significant. If the results for the four nonoverlapping seasons can be considered four independent cases (to be challenged momentarily), the t test indicates that the null hypothesis that improvements for precipitation equal those for temperature has a 0.003 likelihood for a one-sided test. (A one-sided test is used because of the a priori expectation that aggregation should increase precipitation skill to a greater degree than temperature skill, due to the relatively larger amount of noise-filtering potential for precipitation.) One may legitimately doubt that the four seasons can be regarded as independent realizations due to climate regimes that may overlap the seasons or trends that encompass decades across the seasons. The t test is therefore identically repeated, except that a sample size of two seasons is used instead of four. The revised significance level is now 0.027, which is weaker but still significant. The significance of the aggregated-to-nonaggregated ratios for temperature with respect to a null hypothesis value of 1.00 is very strong (i.e., <0.001), as only a 1-sample test is required in this case. Significance for this same hypothesis test for precipitation would be still stronger.
It is clear that the overall aggregation-versus-skill results seen here for the AGCMs for the globe have not come about accidentally, and they would be expected to be replicable in independent studies.
For the aggregation study using northeast Brazil rain gauge data, statistical significance is not achieved for a Fisher Z test of the difference between the JFM correlation of 0.324 for single stations versus 0.433 for the 84-station average, with only 27 yr in the sample. Nonsignificance applies likewise to the AMJ result. Combining the results for the two seasons for a doubled sample size (assuming independence between the seasons) helps, but still leaves a 12% probability that the differences came about by chance. However, the fact that there is an increase in correlation for all three increments in aggregation level for each of the seasons is suggestive. If the results for the two seasons are regarded as independent, then there are six such positive increments, with a chance probability of (1/2)⁶ ≈ 0.02, which would be significant. While total independence of the two seasons is debatable, the authors suspect that these results would hold on a larger sample of years, seasons, and regions of the globe, based on the AGCM skill results described above.
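For readers who wish to check the order of magnitude of these figures, the sketch below applies the textbook Fisher Z comparison of two correlations, together with the sign-count probability for six increases. It assumes the two correlations come from independent samples of 27 yr each, which is only an approximation here because both share the same years and the same SST index, and it is not necessarily the exact calculation used by the authors. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def fisher_z_one_sided_p(r_low, r_high, n_low, n_high):
    """One-sided p value for r_high > r_low using Fisher's Z transform,
    treating the two correlations as coming from independent samples."""
    z_low, z_high = math.atanh(r_low), math.atanh(r_high)
    standard_error = math.sqrt(1.0 / (n_low - 3) + 1.0 / (n_high - 3))
    z_stat = (z_high - z_low) / standard_error
    return 1.0 - NormalDist().cdf(z_stat)

# JFM correlations quoted in the text: single stations vs 84-station average.
p_jfm = fisher_z_one_sided_p(0.324, 0.433, 27, 27)

# Chance probability of six increases out of six if each were a coin flip.
p_six_increases = 0.5 ** 6
```

Under these assumptions `p_jfm` lies well above conventional significance thresholds, consistent with the nonsignificance reported above, while `p_six_increases` equals 1/64.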
4. Summary, implications, and conclusions
The results indicate that under conditions of a common precipitation anomaly forcing signal, such as that generated by an SST anomaly in a critical region, the expected skill of a climate forecast increases as the averaging area (the aggregation level) is increased. This general relationship holds regardless of whether the forecast is developed using an AGCM or a statistical model, or whether one is dealing with a simultaneous diagnostic relationship or a simulation or a time-lagged forecast. The effects of noise, or unpredictable localized precipitation variability, hide an underlying spatially homogeneous relationship to a lesser extent as the averaging area is increased. This applies to a large range of area size, from an individual rain gauge to areas as large as 15° latitude–longitude. The limiting factor on the upper end of the area spectrum is the necessity to maintain a homogeneous forcing signal. Thus, the effect of aggregation on skill ceases to be beneficial, or begins to provide diminishing returns, when the area expands to a zone not clearly participating in a tendency toward a given climate anomaly, or that is responding to a different climate forcing signal.
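The noise-filtering argument above can be illustrated with a toy simulation. It is a minimal sketch, not the method of this study: it assumes a single common signal shared by all points, independent Gaussian noise at each point, and a predictor that knows the signal perfectly. The standard deviations and function names are hypothetical choices.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulated_skill(n_points, n_years=2000, signal_sd=0.5, noise_sd=1.0, seed=0):
    """Correlation between the common signal and the n_points area average."""
    rng = random.Random(seed)
    signal, area_mean = [], []
    for _ in range(n_years):
        s = rng.gauss(0.0, signal_sd)
        # Each point sees the same signal plus its own independent noise.
        local = [s + rng.gauss(0.0, noise_sd) for _ in range(n_points)]
        signal.append(s)
        area_mean.append(sum(local) / n_points)
    return pearson(signal, area_mean)


def expected_skill(n_points, signal_sd=0.5, noise_sd=1.0):
    """Theoretical counterpart: averaging divides the noise variance by n_points."""
    return signal_sd / math.sqrt(signal_sd ** 2 + noise_sd ** 2 / n_points)


skill_by_size = {n: simulated_skill(n) for n in (1, 4, 25)}
```

In this idealized setting the skill keeps rising with the number of points averaged. It says nothing about the upper limit discussed above, where the averaging area grows beyond the region sharing one forcing signal.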
We speculate that different types of curves would summarize the skill as a function of level of aggregation, due to differences in the physical mechanisms supporting the typical precipitation events among locations and times of the year. Additionally, there may be other factors contributing to precipitation patterns that are potentially predictable but unrecognized, and thus subsumed under the umbrella of noise. It is beyond this study's scope to explore the possibilities. Perhaps densely deployed gauge networks and/or quality satellite- and radar-based precipitation data could be exploited to gain a better understanding of the variety of possible influences.
One might inquire as to whether the benefits of spatial aggregation could be estimated simply by measuring the correlation structure as a function of separation distance for the precipitation totals over a gridded area (or rain gauge network) during a given season. It is reasonable to expect that in the presence of a uniform, SST-forced signal, greater skill could be harvested with aggregation when correlations decay over short distances than when neighboring gauges or grid points have higher correlations. Examination of the partial correlations over locations, removing the variance associated with the common forcing signal, would be appropriate for such an analysis. Estimation of the forcing signal could be conducted through AGCM skill diagnostics, possibly with a need for assumptions concerning the singularity of signal sources to reduce the analysis complexity. Because of the practical, “brute-force, bottom-line” orientation of this work, we have not attempted to analytically derive a relationship between the partial correlation structure (falloff as a function of separation distance) over a region and the realizable skill improvement through aggregation, nor have we examined this empirically by computing correlations after removing an estimate of the common forcing signal(s). These issues are clearly of interest, and it would be desirable to demonstrate their potential roles as shortcuts to the approach taken here.
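The expectation that weakly correlated noise leaves more skill to be gained can be written down for a deliberately simple case. The sketch below is an assumption-laden illustration and not a relationship derived in this study: it replaces the distance-dependent falloff with one fixed noise correlation between every pair of points, and the function and parameter names are hypothetical.

```python
import math


def aggregated_skill(n_points, snr, noise_corr):
    """Correlation between a common signal and the mean of n_points.

    snr is the ratio of signal variance to single-point noise variance.
    noise_corr is the (assumed uniform) correlation of the noise between
    any two points, i.e. the partial correlation after removing the signal.
    """
    # Variance of the averaged noise, relative to single-point noise variance.
    noise_var_of_mean = (1.0 + (n_points - 1) * noise_corr) / n_points
    return math.sqrt(snr / (snr + noise_var_of_mean))


def skill_ratio(n_points, snr, noise_corr):
    """Aggregated skill divided by single-point skill."""
    return aggregated_skill(n_points, snr, noise_corr) / aggregated_skill(1, snr, noise_corr)


# Same signal strength, 25 points, weakly vs. strongly correlated noise.
gain_weak = skill_ratio(25, 0.25, 0.1)
gain_strong = skill_ratio(25, 0.25, 0.8)
```

With these illustrative inputs the skill ratio is markedly larger for the weakly correlated noise, and it falls to exactly 1 when the noise is perfectly correlated across points, since averaging then filters nothing.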
There are several implications of the results shown above. First, greater skill would be expected when forecasting aggregated precipitation anomalies over a region sharing the same externally forced signal (e.g., an SST anomaly) than when forecasting precipitation over an individual cluster of local farms or over a metropolitan area within the region. This has favorable implications for predictions for large watersheds, and their resulting streamflows or reservoir levels, and disappointing implications for predicting rainfall totals over smaller individual areas, such as a 50 km² plot of pastureland or even smaller localities such as single villages or farms.
These results place a possibly unforeseen responsibility on forecast providers: that of specifying the spatial scale to which their precipitation forecast skill guidance applies. This is in addition to their responsibility to provide general skill guidance, which is already a challenge in terms of the users' understanding of the forecast. The essence of the spatial-scale aspect is that the general skill guidance is estimated with respect to a specific spatial scale of which the users, and possibly even the forecast producers, may not be aware. Prior to performing the AGCM aggregation exercise shown here, the basic skills were known at the spatial scale of the individual AGCM grid points (2.81° × 2.81°). It is not correct to regard such skills as simply the skills. If such skills were provided along with a real-time forecast (after being adjusted appropriately downward, if necessary, to reflect the uncertainty of the future SSTs), users interested in aggregated information would likely experience higher skills than those advertised, while users looking for very localized rainfall information would likely be disappointed by skills lower than those advertised over the long term. In both cases, user decisions might be made differently if the skills appropriate to the spatial scale in question were explicitly provided.
Some forecasters who notice the favorable effects of aggregation on skill may spatially smooth their forecasts so as to achieve higher hindcast skills, and provide the improved skill guidance with their forecasts. This is fair for the aggregated scale, but skills for applications at highly local scales would be even more greatly misrepresented. A solution to this problem would be to provide users with skill estimates over a range of spatial scales.
For probabilistically expressed forecasts, the level of skill contained in the forecast is not only expressed as guidance that accompanies the forecast, but is reflected in the probability forecast itself. In forecasts with relatively high expected skill, the deviation of the probabilities from climatological probabilities can be greater than when expected skill is low. Thus, for a given forecast situation, the amount of departure from climatological probabilities is greater for large than for small spatial scales. In other words, aside from accompanying skill information, even a probability forecast itself assumes a level of spatial aggregation, and would be a somewhat different forecast for another level of aggregation. In this case, the forecaster should not only provide guidance on the spatial scale assumed in the skill information, but also indicate how the forecast itself would differ if another spatial scale were of interest. Smaller spatial scales would have “flatter” (more climatological) probabilities, and the climate signal causing a departure from climatological probabilities would have a lesser net effect on the forecast and, in turn, on decisions made on its basis.
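The link between expected skill and the sharpness of a probability forecast can be made concrete with a common regression-style construction. This is a sketch under assumptions that are not taken from this study: the predictand is standardized and Gaussian, the forecast mean equals the correlation skill times a standardized predictor anomaly, and the categories are climatological terciles. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def tercile_probabilities(skill_corr, predictor_anomaly):
    """Probabilities of the below-, near-, and above-normal tercile categories.

    Assumes a standardized Gaussian predictand and a linear relation with
    correlation skill_corr (absolute value strictly less than 1).
    """
    # Climatological tercile boundaries of a standard normal distribution.
    upper = NormalDist().inv_cdf(2.0 / 3.0)
    lower = -upper
    # Regression forecast: shifted mean, spread reduced by the explained variance.
    forecast = NormalDist(skill_corr * predictor_anomaly,
                          math.sqrt(1.0 - skill_corr ** 2))
    below = forecast.cdf(lower)
    above = 1.0 - forecast.cdf(upper)
    return below, 1.0 - below - above, above


# Same predictor anomaly (+1 standard deviation), lower vs. higher skill,
# standing in for a smaller vs. a larger aggregation level.
probs_low_skill = tercile_probabilities(0.3, 1.0)
probs_high_skill = tercile_probabilities(0.6, 1.0)
```

With zero skill the function returns the climatological one-third in each category. For the same predictor anomaly, the higher-skill case places more probability in the favored category, which is the sense in which a forecast for a smaller spatial scale would be flatter.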
The main purpose of highlighting the role of spatial aggregation level in expected precipitation forecast skill is to encourage production of forecast information more useful to those in need of the information. The first step in this direction would be the addition of information about the assumed spatial scale to the information accompanying issued precipitation forecasts, particularly for tropical rainfall, for which the spatial scale matters most.
The authors wish to thank various colleagues at the IRI for several stimulating discussions on this topic; in particular, Drs. Shardul Agrawala and James Hansen. The reviewers' comments also helped to improve the paper.
Corresponding author address: Dr. Xiaofeng Gong, International Research Institute for Climate Prediction, Columbia University, 61 Route 9W, Monell #229, Palisades, NY 10964. Email: Xgong@iri.columbia.edu
The hydrological/meteorological center in the state of Ceará in northeastern Brazil.