A monthly reconstruction of precipitation beginning in 1900 is presented. The reconstruction resolves interannual and longer time scales and spatial scales larger than 5° over both land and oceans. Because data availability differs between land and ocean, the reconstruction combines two separate historical reconstructions. One analyzes interannual variations directly by fitting gauge-based anomalies to large-scale spatial modes. This direct reconstruction is used for land anomalies and interannual oceanic anomalies. The other analyzes annual and longer variations indirectly from correlations with analyzed sea surface temperature and sea level pressure. This indirect reconstruction is used for oceanic variations with time scales longer than interannual. A method of estimating reconstruction errors is also presented.
Over land the reconstruction is a filtered representation of the gauge data with data gaps filled. Over oceans the reconstruction gives an estimate of the atmospheric response to changing temperature and pressure, combined with interannual variations. The reconstruction makes it possible to evaluate global precipitation variations for periods much longer than the satellite period, which begins in 1979. Evaluations show some large-scale similarities with coupled model precipitation variations over the twentieth century, including an increasing tendency over the century. The reconstructed land and sea trends tend to be out of phase at low latitudes, similar to the out-of-phase relationship for interannual variations. This reconstruction may be used for climate monitoring, for statistical climate studies of the twentieth century, and for helping to evaluate dynamic climate models. Future work will explore improving the reconstruction through further refinement of the analysis methods and the inclusion of additional data.
Observations from a number of earth-orbiting satellites combined with rain gauge measurements make it possible to analyze global precipitation for the satellite era. Monthly precipitation analyses beginning in 1979 have been produced by the Global Precipitation Climatology Project (GPCP) (Huffman et al. 1997; Adler et al. 2003; Huffman et al. 2009) and Climate Prediction Center Merged Analysis of Precipitation (CMAP) (Xie and Arkin 1996, 1997). These land–ocean analyses are valuable for assessing global and regional climate variability in the satellite era. For climate change studies it is desirable to have longer records. Here we discuss improved methods for using the available historical observations with statistics obtained from satellite-based data to extend the global precipitation record back to 1900. This includes ocean-area precipitation, an important component of the global hydrologic cycle that could be affected by climate change.
Over the oceans both sea surface temperature (SST) and sea level pressure (SLP) have been reconstructed through the twentieth century (e.g., see Smith and Reynolds 2005; Smith et al. 2008a; Allan and Ansell 2006). Oceanic monthly reconstructions of SST and SLP anomalies are possible because they were regularly measured by ships over the twentieth century (e.g., Woodruff et al. 1998) and because of their relatively large time and space scales. The longer climate records allow SST and SLP reconstructions to be used to better understand climate variations and to validate climate models. Historical precipitation beginning in 1900 is available for many land regions from rain gauge measurements (e.g., Vose et al. 1998). However, there are many land regions where gauges are sparse, and over oceans there are no systematic gauge observations for the presatellite period.
Reconstructions of historical precipitation that include oceanic regions have been developed in an attempt to fill in these missing regions for the presatellite era. Xie et al. (2001) fit gauge data to a set of empirical orthogonal functions (EOFs) to reconstruct precipitation for the second half of the twentieth century. Their reconstruction yielded good skill in the tropical Pacific because of its ability to reconstruct variations associated with ENSO. In most other regions their reconstruction had little skill. A similar reconstruction by Efthymiadis et al. (2005) gave similar results, with little skill outside the tropics except near gauge locations. Smith et al. (2008b) produced a similar reconstruction for monthly precipitation beginning in 1900. This reconstruction, computed by fitting Global Historical Climatology Network (GHCN) (Vose et al. 1998) gauge data to a set of EOFs, will be referred to as REOF. The REOF was based on an improved satellite base analysis and was carefully tuned. Besides having high skill in the tropics, consistent with earlier studies, the REOF was found to have improved skill over Northern Hemisphere extratropical oceans. The REOF skill is lowest in the extratropical Southern Ocean. Another deficiency of the REOF is its multidecadal component, which was found to be sensitive to the gauge dataset used for the reconstruction. Evaluation of multidecadal variations is important for understanding twentieth-century climate variations, so something more than the REOF was needed.
In an attempt to better resolve multidecadal variations, we developed a canonical correlation analysis (CCA) relating fields of SST and SLP anomalies to precipitation anomalies [Smith et al. (2009a); also see Barnett and Preisendorfer (1987) for a description of CCA]. The SST and SLP anomalies have been reconstructed over oceanic regions and are historically better sampled than precipitation. Relationships for the CCA are developed using a satellite-based precipitation analysis over the satellite era. Since this reconstruction was intended to resolve large-scale multidecadal variations, annual precipitation anomalies were analyzed using their relationships to annual SST and SLP anomalies. We will refer to this reconstruction as RCCA.
Large-scale averages of the RCCA were found to compare well with the available data. The near-global average at gauge locations is consistent with averages of independent gauges. Over oceans both the RCCA and an ensemble of Fourth Assessment Report (AR4) coupled models (Randall et al. 2007) indicate increasing average precipitation on multidecadal time scales, although the RCCA increase is stronger than that from the AR4 ensemble (see Smith et al. 2009a for details). However, smaller-scale variations in the RCCA are much weaker than in the REOF.
Clearly, it is desirable to blend these two analyses, retaining the best features of each. The REOF has more reliable month-to-month variations and better spatial resolution, so its high-frequency variations should be part of the blended analysis. The RCCA has more reliable multidecadal variations, so its low-frequency variations should be part of the blended analysis.
Our previous studies show that many aspects of large-scale precipitation anomalies can be resolved using monthly or annual modes. This is particularly useful for oceanic anomalies, which are important for understanding climate variations but not well described for most of the twentieth century. Because of a lack of direct oceanic data for most of that time, it is difficult to evaluate the climate-scale anomalies and to know the accuracy of those evaluations. But there are some data available, so reconstructions are possible. We offer the present reconstruction as a useful step toward a better understanding of historical precipitation, recognizing that future improvements in data, analyses, and models could further improve our understanding of oceanic precipitation variations.
In the following sections, the input data and individual reconstructions to be blended are described in greater detail. Blending methods are then presented, followed by discussions of results and a summary. A method for estimating reconstruction error is presented in the appendix.
2. Input data and reconstructions
Here we describe the different datasets needed to compute the reconstructions. Data used for comparisons with the reconstruction are also described. A description of the reconstruction based on fitting gauge data to EOFs (REOF) and the reconstruction based on a CCA (RCCA) is provided. For both REOF and RCCA the cross-validation tests used to tune the EOF and CCA reconstructions and the skill of the individual analyses are discussed.
a. Input data
Satellite-based datasets are used for development of the reconstructions and to help validate the analyses. These base datasets are needed to compute the large-scale spatial covariance used in the reconstructions. Therefore, it is critical that they be as accurate and unbiased as possible to avoid introducing false signals into the historical reconstruction.
One satellite-based analysis used is the GPCP, mentioned above (Huffman et al. 1997; Adler et al. 2003). The current version of the GPCP is version 2.1 (GPCP.v2.1) (Huffman et al. 2009). The GPCP combines different infrared- and microwave-based satellite analyses after adjusting them to remove intersatellite biases. The combined satellite product is merged with a gauge product. The result is a global monthly precipitation analysis on a 2.5° latitude–longitude grid beginning January 1979. Changes incorporated into GPCP.v2.1 include the use of improved gauge data from the Global Precipitation Climatology Centre (GPCC) and improved adjustments for the satellite inputs. Here the GPCP.v2.1 monthly data are used from 1979 to 2008 for reconstruction model development and evaluation.
The statistical reconstruction models used here use covariance computed from the satellite base period data, beginning in 1979, to reconstruct the presatellite period, beginning in 1900. It is important that the satellite-period analysis be as free as possible from nonphysical variations to keep spurious variations out of the reconstruction. One potential problem is inhomogeneities from using satellites with different sampling times and different instruments. Problems may be especially severe at high latitudes where satellite data are less reliable. The GPCP data have been carefully constructed for climate studies, including adjustments to minimize intersatellite biases. Testing on an earlier version of the GPCP showed no apparent biases from using multiple satellite inputs (Smith et al. 2006). However, because the mix of satellite inputs changes over time, subtle biases may remain in the GPCP that could contaminate reconstruction statistics. Therefore, as the satellite base period is extended and analyses of the satellite data improve, historical reconstructions may also be improved by using these improved base data.
Gauge-based precipitation analyses are used for the REOF, and several gauge analyses are tested here. One gauge analysis is the GHCN (Vose et al. 1998), produced by the National Climatic Data Center. The GHCN is a monthly analysis on a 5° spatial grid, from 1900 to 2008. The GPCC full data reanalysis product version 4 gauge data are also used to test our reconstructions. The monthly GPCC data are available from 1901 to 2007. Descriptions of the GPCC are given by Schneider et al. (2008) and Rudolf (2005). Here we average their 2.5° data to the 5° grid. In addition we also use the University of East Anglia Climatic Research Unit (CRU) 5° monthly gauge analysis (Hulme et al. 1998). The monthly CRU analysis is available from 1900 to 1998.
All three gauge-based datasets are tested in the REOF analysis. Differences can occur because of different data included in the gauge analyses, differences in gauge adjustments and quality control, filling to replace missing stations, and differences in averaging from individual stations to 5° regions. Gauge analysis differences in areal coverage are illustrated by Fig. 1. For most of the historical period the CRU analysis shows better coverage of 5° areas than either of the others, while GHCN has the least. This figure illustrates regional coverage of filled 5° areas but not the total number of gauges. Both the GHCN and GPCC incorporate the CRU stations and use more individual station observations. Differences in the coverage are due to interpolation to fill missing data before averaging to spatial regions. The GPCC is interpolated to a regular grid before spatial averaging. There is more extensive interpolation of CRU stations before averaging, yielding greater product coverage in most years. Interpolation to fill gaps in the CRU station records will likely increase the random error for individual 5° values. However, as we discuss below, the analysis involves fitting data using spatial modes that filter out nearly all random errors, and for our analysis the spatial coverage is more critical. We test these different gauge-based datasets to evaluate each for use with our analysis method, with the understanding that the dataset that is best for our application may not be best for others.
b. Reconstruction based on EOFs
The method used for producing the reconstruction based on EOFs was described by Smith et al. (2008b). Here we summarize the method and note differences in the present REOF compared to Smith et al. (2008b), which should be consulted for more details of the method.
The REOF analysis uses a set of large-scale covariance EOFs of precipitation anomalies. The EOFs need to represent spatial scales that extend from regions where data are available to other regions. The REOF uses gauge data over land and islands. The REOF analysis is performed separately in three regions: 80°–20°S, 30°S–30°N, and 20°–80°N. This separation is done to enhance the sensitivity of the reconstruction to extratropical variability, which is generally smaller than tropical variability. After the reconstruction is computed for each region, the regional reconstructions are merged with smoothing across the overlap regions.
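The regional merge can be pictured with a short NumPy sketch. The text does not specify the smoothing used across the 30°–20°S and 20°–30°N overlap bands, so the linear latitude ramp below (full tropical weight equatorward of 20°, none poleward of 30°) is an assumption made for illustration, as are the function names.

```python
import numpy as np

def tropical_weight(lat):
    """Weight given to the tropical (30S-30N) reconstruction at each latitude.

    ASSUMED linear ramp: 1 equatorward of 20 deg, 0 poleward of 30 deg,
    so blending happens only inside the 20-30 deg overlap bands.
    """
    return np.clip((30.0 - np.abs(lat)) / 10.0, 0.0, 1.0)

def merge_regions(south, tropics, north, lat):
    """Combine three regional reconstructions (hypothetical helper).

    `south`, `tropics`, `north` have the same shape as `lat`. Values outside a
    region's domain must be finite placeholders; they receive zero weight.
    """
    w_trop = tropical_weight(lat)
    extratropics = np.where(lat < 0.0, south, north)
    return w_trop * tropics + (1.0 - w_trop) * extratropics
```

At 25°N, for example, the merged value is an equal mix of the tropical and northern reconstructions.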
For each region, a set of EOFs is computed using the GPCP.v2.1 monthly anomalies, 1979–2008. A maximum number of EOFs for each region is assigned and used for the reconstruction. For each month, the reconstruction finds the weights for each of the EOFs that minimize the mean-squared error between the reconstructed and gauge anomalies at the sampled locations. Weights are determined by fitting the available gauge data to the set of EOFs.
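A minimal sketch of the monthly fit, assuming an ordinary (unweighted) least-squares solution; the published method may apply area or error weighting that is not shown here. The EOFs are evaluated only at the 5° boxes with gauge data, the weights are solved for, and the weighted sum of the full EOFs then fills every box. The function name `fit_eofs` is hypothetical.

```python
import numpy as np

def fit_eofs(eofs, gauge_anom, sampled):
    """Least-squares fit of one month of gauge anomalies to a set of EOFs.

    eofs:       (n_modes, n_boxes) spatial modes from the satellite base period
    gauge_anom: (n_boxes,) gauge anomalies; entries where `sampled` is False are ignored
    sampled:    (n_boxes,) boolean mask of boxes with gauge data
    Returns the mode weights and the gap-filled reconstructed field.
    """
    design = eofs[:, sampled].T            # EOF values at the gauge boxes only
    weights, *_ = np.linalg.lstsq(design, gauge_anom[sampled], rcond=None)
    return weights, weights @ eofs         # weights times modes fill every box
```

Because the modes are large scale, a field that truly lies in their span is recovered everywhere from a partial sample, which is the property the reconstruction relies on.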
The maximum number of EOFs to use for each region is determined by cross-validation testing. In addition, each EOF from that maximum set must pass a screening test using the gauge sampling for each month or it will be excluded for that month’s analysis. Cross-validation testing is also used to determine the screening level to use for the REOF in each region. The screening parameter for each EOF mode is the fraction of EOF variance sampled by the available data.
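The screening parameter can be computed as the share of each mode's squared amplitude that falls in sampled boxes. The sketch below assumes that definition, with optional area weights (for example cos(latitude), an assumption not stated in the text); a mode is kept for a month only if its sampled fraction reaches the threshold, 0.15 in the present REOF. Function names are illustrative.

```python
import numpy as np

def sampled_fraction(eofs, sampled, area=None):
    """Fraction of each EOF's (optionally area-weighted) variance in sampled boxes."""
    if area is None:
        area = np.ones(eofs.shape[1])      # ASSUMED: equal box weights by default
    power = eofs ** 2 * area
    return power[:, sampled].sum(axis=1) / power.sum(axis=1)

def screen_modes(eofs, sampled, threshold=0.15, area=None):
    """Boolean mask of modes that pass the screening test for this month."""
    return sampled_fraction(eofs, sampled, area) >= threshold
```

Modes that fail the test are simply dropped from that month's least-squares fit, which keeps poorly sampled patterns from being overfit.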
Cross-validation testing uses the GPCP.v2.1 data and historical sampling to simulate reconstruction conditions for three 30-yr periods: 1900–29, 1930–59, and 1960–89. In Smith et al. (2008b), the maximum number of modes was found to be 6 for the southern extratropics, 12 for the tropics, and 11 for the northern extratropics. These settings are chosen to give low rms error in the cross-validation tests conducted using the historical sampling grids. Here the values for these three regions were found to be 5, 15, and 10, respectively. Thus, there was little change in the tuning of the maximum number of modes in each region. For all of these regions there was little change in the mean-squared error when the screening parameter was set between 0.05 and 0.15, compared to changes outside that range. In Smith et al. (2008b), the sampling fraction was set to 0.05. However, here we find that for some regions and historical grids, the error was reduced using a slightly higher screening parameter value; therefore, we use 0.15 in the REOFs. This slightly higher value requires slightly more sampling for each mode compared to the earlier REOF.
In Smith et al. (2008b) an REOF was evaluated by regressing it against climate modes. The resulting regression maps were used to show the spatial reconstruction patterns associated with each mode. Here that is repeated for two important modes, the Southern Oscillation index (SOI) and the North Atlantic Oscillation (NAO), using the same SOI and NAO index values. These regressions were used to evaluate the GHCN-based REOF, here called REOF(GHCN); the GPCC-based REOF, called REOF(GPCC); and the CRU-based REOF, called REOF(CRU). All three exhibit similar ENSO and NAO regression patterns, indicating that they all resolve these important climate modes.
The global spatial standard deviation of each historical REOF (Fig. 2) indicates how consistent each is over time. Here the monthly global spatial variance is computed and then averaged annually before taking the square root to define the spatial standard deviation of each. REOF(GHCN) has lower values before 1950, when its sampling tends to be low (Fig. 1). After about 1990, sampling for both GHCN and CRU decreases. There is a corresponding decrease in the REOF(GHCN) standard deviation in that period (Fig. 2), while the CRU record ends in 1998. Compared to REOF(GHCN), both REOF(GPCC) and REOF(CRU) have more consistent standard deviation values over most of the period. REOF(GPCC) has lower standard deviations before 1910, when it has less coverage (Fig. 1), but it also decreases in both sampling and standard deviation in the most recent years (Fig. 2). The REOF(GPCC) also indicates strong positive values associated with the 1997/98 ENSO, which the other gauge-based REOFs only partly resolve.
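The order of operations matters: variances, not standard deviations, are averaged over the year. A sketch of that statistic follows, assuming cos(latitude) area weights and removal of the monthly spatial mean; neither detail is stated in the text, and the function name is hypothetical.

```python
import numpy as np

def annual_spatial_std(anom, lat):
    """Annual spatial standard deviation of monthly anomaly maps.

    anom: (n_months, n_lat, n_lon), n_months a multiple of 12
    lat:  (n_lat,) latitudes in degrees; ASSUMED cos(lat) area weighting
    The spatial variance is computed each month, averaged over each year,
    and only then square-rooted.
    """
    w = np.cos(np.deg2rad(lat))[:, None] * np.ones(anom.shape[2])
    w = w / w.sum()
    spatial_mean = (anom * w).sum(axis=(1, 2))
    var = ((anom - spatial_mean[:, None, None]) ** 2 * w).sum(axis=(1, 2))
    return np.sqrt(var.reshape(-1, 12).mean(axis=1))
```

For a year whose months alternate between spatial variances of 1 and 9, this gives sqrt(5) rather than the mean of the monthly standard deviations (2).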
The GPCC monitoring product is incorporated into the GPCP satellite–gauge analysis, which is used to compute reconstruction statistics. Although the monitoring product is not identical to the GPCC full data reanalysis product used here, it could give the GPCC an advantage over gauge datasets not used in the GPCP. However, even more important is the strong GPCC sampling in more recent years, which allows the REOF(GPCC) standard deviation to be similar to the standard deviation of the GPCP-based REOF, referred to as the REOF(GPCP). The REOF(GPCC) standard deviation in 1997/98 is slightly higher than the REOF(GPCP) standard deviation for that time. Since the GPCP includes much more data than the GPCC, the slightly higher variance for 1997/98 may indicate errors in the REOF(GPCC). Since the REOF analysis is tuned to avoid overfitting errors when data are sparse and because that is not a sparse-sampling time for the GPCC, the higher standard deviation is likely caused by GPCC errors. In any case the differences between the two are small.
Overall, the REOF(CRU) has the most consistent standard deviation values over most of its record, and for the overlap period before 1990 it yields values similar to the REOF(GPCP). This is likely because the CRU product fills more 5° grid squares over most of its record.
For their common analysis period (1901–98) global spatial correlations between pairs of REOFs are computed to better indicate when they are most similar (Fig. 3). Correlations against the REOF(GHCN) tend to be lowest early in the reconstruction period when GHCN coverage is lowest. REOF(GPCC) and REOF(CRU) have roughly consistent correlations for their entire overlap period, and after 1950 the REOF(GHCN) and REOF(GPCC) correlations have similar values. The strongest correlations are between the REOF(GHCN) and REOF(CRU) in 1950–90. This strong correlation indicates that the two gauge analyses yield similar variations when gauge-product coverage is sufficient in both.
These intercomparisons between the three REOFs indicate that since about 1950 REOF(CRU) and REOF(GHCN) are similar, but early in the twentieth century REOF(CRU) may be more reliable owing to its better gauge-product coverage. The spatial standard deviations and correlations both indicate that the lower GHCN product coverage leads to filtering out variations in the early twentieth century. The GPCC has better coverage than either of the others in recent years; however, it may artificially inflate variations, as discussed above. Even in 1950–90 when sampling is best for all gauge analyses, REOF(GPCC) typically has higher standard deviations than either of the other two.
Because the REOF(CRU) has the most consistent variance over its entire reconstruction period, and also because of its consistency with REOF(GHCN) since 1950, we use the REOF(CRU) for 1900–78. Since the CRU gauge analysis ends in 1998 and its sampling becomes sparse near the end of the twentieth century, a different REOF is required for the end of the period. Therefore, after 1988 we will use the REOF(GPCP). From 1979 to 1988 we smoothly merge the two, using linear weights so that in 1978 the blended REOF is all REOF(CRU) and in 1989 it is all REOF(GPCP). Since the two have similar variance in the overlap period, this blending should not cause a shift in the overall variance of the analysis. Because of filtering by the EOF modes, there are no apparent jumps induced by changing the GPCP inputs over the REOF(GPCP) record. We refer to this blended analysis as the REOF(Blend). The common overlap period, 1979–88, is also used as a recentering period. The averages of anomalies are all forced to equal zero over this period in the comparisons that follow.
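The blending and recentering steps can be sketched for annual values as follows. The linear weight on REOF(GPCP) is 0 in 1978 and 1 in 1989, so for example 1983 receives 5/11 of REOF(GPCP). Treating missing years as NaN and the function names are assumptions for illustration; monthly blending would use fractional years in the same formula.

```python
import numpy as np

def gpcp_weight(year):
    """Weight on REOF(GPCP): 0 through 1978, rising linearly to 1 in 1989."""
    return np.clip((np.asarray(year, dtype=float) - 1978.0) / 11.0, 0.0, 1.0)

def blend_reofs(reof_cru, reof_gpcp, years):
    """Blend two annual series; NaNs outside each record do not leak through."""
    w = gpcp_weight(years)
    mixed = (1.0 - w) * reof_cru + w * reof_gpcp
    return np.where(w <= 0.0, reof_cru, np.where(w >= 1.0, reof_gpcp, mixed))

def recenter(series, years, start=1979, end=1988):
    """Remove the mean over the common overlap period so it averages to zero."""
    base = (years >= start) & (years <= end)
    return series - series[base].mean()
```

The nested `np.where` is used because 0 * NaN is still NaN in floating point, so a plain weighted sum would propagate the missing CRU years after 1998.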
In Smith et al. (2008b), it was shown that the REOF method realistically reconstructs interannual variations, but it may be less reliable for representing multidecadal variations. Here the reconstructions are compared for global averages over both land and ocean separately. The annual and global averages are first computed and then filtered using a 7-yr low-pass filter to more clearly show the multidecadal variations. The low-pass weights for the annual average are (0.032, 0.110, 0.220, 0.276, 0.220, 0.110, 0.032), which are close to binomial weights for a 9-yr filter with the end years eliminated.
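The quoted weights can be checked against the trimmed binomial kernel: the nine binomial coefficients of order 8 are 1, 8, 28, 56, 70, 56, 28, 8, 1, and dropping the two end values and renormalizing by 254 gives approximately 0.031, 0.110, 0.220, and 0.276. The filter itself is sketched below as a centered weighted running mean; how the series ends are treated is not stated in the text, so truncating and renormalizing the kernel there is an assumption.

```python
from math import comb
import numpy as np

# Seven-year low-pass weights as quoted in the text.
PAPER_WEIGHTS = np.array([0.032, 0.110, 0.220, 0.276, 0.220, 0.110, 0.032])

def trimmed_binomial_weights(n=9):
    """Length-n binomial weights with the two end weights dropped, renormalized."""
    c = np.array([comb(n - 1, k) for k in range(1, n - 1)], dtype=float)
    return c / c.sum()

def lowpass(annual, weights=PAPER_WEIGHTS):
    """Centered weighted running mean of an annual series (length >= 7 assumed).

    ASSUMPTION: near the ends the kernel is truncated and renormalized.
    """
    annual = np.asarray(annual, dtype=float)
    num = np.convolve(annual, weights, mode="same")
    den = np.convolve(np.ones_like(annual), weights, mode="same")
    return num / den
```

Because the kernel is symmetric and sums to one, a linear trend passes through unchanged away from the ends, while year-to-year fluctuations are strongly damped.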
The land-average multidecadal variations are similar for all three (Fig. 4), although REOF(GHCN) is more damped than the others before 1950. The REOF(GPCP) is similar to the others for most of the overlap period, but it shows a sharp increase in the last several years when the REOF(GHCN) is damped due to a dropoff in sampling. The recent increase is partly reflected in the REOF(GPCC).
Over oceans (Fig. 5) there are greater differences in the first half of the twentieth century. Differences may be increased by the fact that most ocean locations are remote from gauge sampling, making the oceanic component of the historical REOFs depend on large-scale teleconnections from the leading modes. Over land this is less critical because there are many local data to adjust the analysis. Without similar local oceanic data, the oceanic REOF multidecadal variations are less reliable, especially in the early twentieth century when sampling was sparser. In the second half of the twentieth century the ocean-area reconstruction averages are more consistent.
Differences in the oceanic multidecadal signals from earlier REOF analyses were discussed by Smith et al. (2008b), and they inspired Smith et al. (2009a) to develop an indirect reconstruction method for resolving oceanic multidecadal variations. That method reconstructs precipitation anomalies using both local and remote oceanic variables related to precipitation.
c. Reconstruction based on CCA
The reconstruction using canonical correlation analysis was discussed in detail by Smith et al. (2009a). CCA was used by Barnett and Preisendorfer (1987) to forecast North American temperatures using SST and SLP predictors. The CCA finds correlations between fields of predictors and a predictand field, which can then be used to estimate the predictand field at some other time when only the predictors are available. Historical monthly reconstructions of SST and SLP are available for the twentieth century, based mostly on ship observations of these variables. On long time scales there tend to be relationships between the SST and SLP anomalies and precipitation anomalies. This allows us to use data from the GPCP period to define the relationships and then use those relationships to reconstruct precipitation anomalies at times before the satellite period. Smith et al. (2009a) found that most relationships reflected in the RCCA have seasonal time scales or longer and, therefore, produced their RCCA for annual average anomalies. Here we do the same except that we use the updated GPCP.v2.1 data. The RCCA training period is 1979–2004, from the first year of the satellite period to the last year of the historical SLP analysis.
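To make the predictor–predictand idea concrete, here is a sketch of an EOF-prefiltered CCA in the spirit of Barnett and Preisendorfer (1987). It is not claimed to reproduce the RCCA exactly: the predictor matrix `x` would hold annual SST and SLP anomalies side by side (in practice standardized and area weighted, which is omitted here), `y` the annual GPCP anomalies for the training years, and `kx`, `ky`, and `n_modes` are free parameters. All names are hypothetical.

```python
import numpy as np

def _eof_basis(field, k):
    """Time mean, leading k EOFs (rows), and singular values of a (time x space) field."""
    mean = field.mean(axis=0)
    _, s, vt = np.linalg.svd(field - mean, full_matrices=False)
    return mean, vt[:k], s[:k]

def fit_cca(x_train, y_train, kx, ky, n_modes):
    """EOF-prefiltered CCA predicting y from x over a training period.

    Returns a predict(x_new) function and the retained canonical correlations.
    """
    n = x_train.shape[0]
    x_mean, x_eof, x_s = _eof_basis(x_train, kx)
    y_mean, y_eof, y_s = _eof_basis(y_train, ky)
    # Principal components scaled to unit variance over the training years.
    a = (x_train - x_mean) @ x_eof.T / x_s * np.sqrt(n)
    b = (y_train - y_mean) @ y_eof.T / y_s * np.sqrt(n)
    # With whitened PCs, the singular values of the cross-covariance
    # matrix are the canonical correlations.
    u, rho, vt = np.linalg.svd(a.T @ b / n)
    coef = u[:, :n_modes] @ np.diag(rho[:n_modes]) @ vt[:n_modes]

    def predict(x_new):
        a_new = (x_new - x_mean) @ x_eof.T / x_s * np.sqrt(n)
        b_hat = a_new @ coef              # predicted unit-variance predictand PCs
        return y_mean + (b_hat * y_s / np.sqrt(n)) @ y_eof
    return predict, rho[:n_modes]
```

For the historical period, `predict` would be applied to reconstructed SST and SLP anomalies for years when no satellite precipitation exists; truncating `n_modes` discards weakly coupled patterns.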
The number of RCCA modes is tuned to ensure that the optimal amount of variance is reconstructed. Here cross-validation tests are performed that compute the RCCA for each year using training data that excludes the analysis year. For each of the cross-validation years the RCCA used 9 or 10 modes, although 9 were used more often than 10 and the tenth mode never added more than a small fraction of the variance. Usually the tenth and higher modes were truncated since they accounted for less than 1% of the first mode’s variance. Thus, we perform our RCCA using 9 modes. However, testing using fewer modes showed that most multidecadal variations were retained using as few as 3 modes.
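The truncation rule and the leave-one-year-out bookkeeping can be sketched as below. `mode_variances_for` is a hypothetical stand-in for training the CCA without the held-out year and returning its mode variances in descending order; only the 1% relative-variance cutoff comes from the text.

```python
from collections import Counter
import numpy as np

def n_retained_modes(mode_variances, rel_threshold=0.01):
    """Count modes whose variance is at least rel_threshold times the first mode's."""
    v = np.asarray(mode_variances, dtype=float)
    return int(np.sum(v >= rel_threshold * v[0]))

def leave_one_year_out_counts(years, mode_variances_for):
    """Retained-mode count for each withheld year.

    `mode_variances_for` is a HYPOTHETICAL callable that trains on the given
    years and returns the mode variances in descending order.
    """
    counts = {}
    for held_out in years:
        training = [y for y in years if y != held_out]
        counts[held_out] = n_retained_modes(mode_variances_for(training))
    return counts

def most_common_count(counts):
    """Mode count chosen most often across the cross-validation years."""
    return Counter(counts.values()).most_common(1)[0][0]
```

Choosing the most frequent count, rather than the maximum, mirrors the decision to use 9 modes even though some cross-validation years retained 10.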
In our earlier studies we found that the annual average fields of precipitation anomalies are related to annual average fields of both SST and SLP. Comparing annual RCCA results against the annual GPCP for the dependent period shows that the best correlations occur in the tropics where precipitation anomalies are also strongest. Tropical values are roughly 0.6–0.9, with the highest values in the tropical Pacific associated with ENSO. Lowest correlations, typically less than 0.5, occur at high latitudes where the anomalies also tend to be weak. The global average correlation is about 0.6. If the RCCA is performed using only SST or only SLP, the correlations are lower, with global averages of roughly 0.5 for each used individually.
d. Comparing REOF and RCCA
Next we compare the REOF and RCCA estimates over both land and ocean areas. The REOF discussed here is the blend of REOF(CRU) and REOF(GPCP), REOF(Blend).
Over land the different estimates are all correlated in their interannual variations (Fig. 6). The larger variations in the CRU gauge data are more closely matched by the REOF(Blend) when it is subsampled at gauge locations (shown by the thin dotted black line). Both the REOF(Blend) and the CRU gauges indicate a positive trend over the period, but the RCCA indicates a negative trend over the period (see Table 1). All, including the CRU data, show decreased land precipitation in the 1970s. That 1970s land decrease could be associated with the oceanic 1970s increase, due to a shift of precipitation from one region to another. Because local gauge data anchor the land-area REOF(Blend), it should represent variations over land better than it represents oceanic variations. By comparison, the RCCA land analysis does not use gauges and depends heavily on teleconnections from ocean areas driven by the SSTs. In regions near gauge data, direct reconstructions using those data should be superior to indirect reconstructions. Therefore, the REOF(Blend) should be used over land regions.
Averages over ocean areas are similar for the REOF(Blend) and RCCA for most of the satellite period (Fig. 7). Before 1980 they diverge, with the REOF(Blend) indicating a negative trend and the RCCA a positive trend. In addition, the REOF(Blend) does not resolve the 1970s climate shift, which is associated with a rapid change in Pacific SSTs (Trenberth and Hurrell 1994; Zhang et al. 1997). The RCCA increasing precipitation and 1970s climate shift are modeled from correlations with the SST variations. The ocean-area average AR4 indicates a weaker but consistent positive trend, which is the theoretical response to a warming earth (Held and Soden 2006; Allan and Soden 2008). There is some observational evidence from satellites for overall increasing precipitation (Wentz et al. 2007; Adler et al. 2008). The ability of the RCCA to resolve these oceanic variations, consistent with known and theoretical climate variations, suggests that its multidecadal signal is superior to the REOF(Blend) multidecadal signal over oceans. The local SST data appear to be most important for the RCCA resolution of those oceanic variations.
For the satellite period, the REOF(GPCP) gives a filtered version of the satellite-based GPCP, and both show a similar increasing tendency over that relatively brief period. Note that if only the tropics are considered, the variations are stronger; however, they are qualitatively the same since most of the global precipitation variations occur in the tropics.
Although the multidecadal variations are not linear trends, examination of trends is useful for evaluating overall changes of different estimates. Table 1 shows trends from several reconstructions, the AR4 model ensemble, and gauges. The AR4 interannual signal is damped because the models used in the ensemble do not have phase-locked interannual variations, but they do have consistent greenhouse gas and aerosol forcing. The RCCA has a strong positive ocean-area trend, but its land-area trend is negative. The RCCA difference in sign between ocean and land trends may be due to the generally opposite tendency in precipitation anomalies associated with ENSO episodes (Adler et al. 2008). The RCCA ENSO modes are developed from interannual variations, but those ENSO modes are used for modeling variations on longer time scales that can include ENSO-like variations (Zhang et al. 1997). Because the GPCP record is relatively brief, its modes may not fully span multidecadal variations in the full analysis period. If the ENSO-like low-frequency variations have opposite land–sea precipitation tendencies, similar to interannual ENSO variations, then the opposite tendency in the RCCA may be correct. A similar but weaker ocean–land difference is evident for the AR4 ensemble trends.
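Trends like those in Table 1 are commonly computed as an ordinary least-squares slope; the text does not state the fitting method, so OLS here is an assumption. The sketch reports the slope per century in whatever units the input series uses, and the function name is hypothetical.

```python
import numpy as np

def trend_per_century(years, series):
    """ASSUMED ordinary least-squares linear trend, expressed per 100 years."""
    slope, _intercept = np.polyfit(years, series, 1)
    return 100.0 * slope
```

Applying it separately to ocean-area and land-area averages of the same reconstruction makes the opposite-sign land and ocean trends discussed above easy to compare across estimates.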
The CRU gauge data and the land REOF(Blend) both suggest that this tendency for opposite ocean–land precipitation trends may be overestimated by the RCCA. Both of those gauge-based estimates show positive trends over land areas, demonstrating the importance of local data.
3. Merging REOF(Blend) and RCCA
The REOF(Blend) uses the REOF(CRU) through 1978, REOF(GPCP) after 1988, and a smooth blend of the two in between. This step merges the REOF(Blend) with the RCCA by bias adjusting the REOF(Blend) using the RCCA multidecadal signal.
As discussed above, the multidecadal component is approximated by filtering annual averages to remove most interannual variations. In the sections above and here, we filtered over seven years using the near-binomial weights defined in section 2b. The figures in section 2 illustrate the effect of this filter. This filter was chosen for adjusting the REOF(Blend) after performing a number of tests on an earlier REOF analysis using GPCP base data and GHCN gauge data.
In section 2 we show that over the oceans the REOF(Blend) multidecadal signal is less consistent between analyses early in the twentieth century, owing to sparse data. Therefore, oceanic regions are bias adjusted so that their multidecadal variations match the RCCA multidecadal variations. The REOF(Blend) is annually averaged and filtered using the seven annual weights, while the annual RCCA analysis is similarly filtered to define its multidecadal signal. Both multidecadal signals are interpolated to monthly values, and over oceans the REOF(Blend) multidecadal signal is removed and the RCCA multidecadal signal is inserted.
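For a single oceanic grid box, the signal swap can be sketched as follows, using the seven-year weights from section 2b. Interpolating from mid-year to mid-month with `np.interp` (which holds the end values constant) and the end treatment of the filter are assumptions; the text only states that both multidecadal signals are interpolated to monthly values. The `weight` argument, 1 for an all-ocean box, anticipates the coastal weighting described next.

```python
import numpy as np

WEIGHTS = np.array([0.032, 0.110, 0.220, 0.276, 0.220, 0.110, 0.032])

def lowpass(annual, weights=WEIGHTS):
    """Centered 7-yr weighted mean; ASSUMED truncated, renormalized ends."""
    num = np.convolve(annual, weights, mode="same")
    den = np.convolve(np.ones_like(annual), weights, mode="same")
    return num / den

def to_monthly(annual_values):
    """ASSUMED linear interpolation from mid-year to mid-month times."""
    n_years = len(annual_values)
    t_year = np.arange(n_years) + 0.5
    t_month = (np.arange(12 * n_years) + 0.5) / 12.0
    return np.interp(t_month, t_year, annual_values)

def bias_adjust(reof_monthly, rcca_annual, weight=1.0):
    """Replace the REOF multidecadal signal with the RCCA one at one grid box."""
    n_years = len(rcca_annual)
    reof_annual = reof_monthly.reshape(n_years, 12).mean(axis=1)
    lf_reof = to_monthly(lowpass(reof_annual))
    lf_rcca = to_monthly(lowpass(np.asarray(rcca_annual, dtype=float)))
    return reof_monthly + weight * (lf_rcca - lf_reof)
```

Interannual and month-to-month REOF variations are untouched by this step; only the slowly varying part is exchanged.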
Land regions do not require bias adjustment since they are sampled by the gauges, which are assumed here to be unbiased. Biases in the gauge data can occur, especially in cold regions because of blowing snow. However, these biases are better understood than potential biases in oceanic analyses, and therefore we concentrate on adjusting the oceanic analysis. Since land regions are not adjusted, the adjustment weight for 5° regions that are all land is zero. The adjustment weight for regions that are all ocean is one, and for coastal and island regions the weight is between 0 and 1, depending on the fraction of land area. Because the land REOF(Blend) multidecadal signal is similar to the land RCCA multidecadal signal, coastal discontinuities in the multidecadal signal are minimal. The REOF(Blend) uses the REOF(GPCP) data after 1988, which gives it good oceanic sampling for the recent period. Thus, it should not be necessary to bias adjust it for the most recent years and for any updates to the analysis that we may wish to produce. Therefore, the bias adjustment is allowed to decay linearly from full strength in 1989 to zero in 1999. Because the REOF(GPCP) heavily filters the GPCP, we should be able to use future updated versions of the GPCP to update the analysis without introducing inhomogeneities.
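The adjustment weight combines two factors: a spatial factor that is 0 for all-land boxes and 1 for all-ocean boxes, and a temporal factor that decays linearly from full strength in 1989 to zero in 1999. A minimal sketch, assuming the spatial factor is simply the ocean fraction of the box (the text says only that it depends on the land fraction):

```python
import numpy as np

def adjustment_weight(ocean_fraction, year):
    """Bias-adjustment weight for a 5-degree box in a given year.

    ocean_fraction: 0 for all-land, 1 for all-ocean; ASSUMED linear in between.
    The weight is full strength through 1989 and decays linearly to 0 in 1999.
    """
    decay = np.clip((1999.0 - np.asarray(year, dtype=float)) / 10.0, 0.0, 1.0)
    return np.asarray(ocean_fraction, dtype=float) * decay
```

A coastal box that is 40% ocean would thus receive a weight of 0.4 before 1990 and 0.2 in 1994.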
We tested an analysis in which the RCCA annual anomalies are statistically reinjected into the bias-adjusted REOF(Blend). This could improve skill if the annual RCCA contains interannual variations that are not well represented in the annual bias-adjusted REOF(Blend), which could occur in parts of the Southern Ocean where the REOF signal is weakest. We found that the analysis with reinjected RCCA had lower variance than the bias-adjusted REOF(Blend) almost everywhere, including in the Southern Ocean. Since variance damping is undesirable, we do not use the analysis with the reinjected RCCA.
Where there are gauges, reinjecting the gauge data will enhance the accuracy of the analysis, particularly for smaller spatial scales. A user of this reconstruction who is particularly interested in land values may wish to reinject a gauge analysis, such as one of those tested here. Our major goal here is to reconstruct oceanic variations, so gauge reinjection is not performed for this version of the reconstruction.
4. Merged precipitation anomalies
Here the blended bias-adjusted reconstruction is discussed and compared with other analyses. First, the overall variations are examined to see how they may change in time. An overall measure of analysis strength is provided by the spatial standard deviation. The global spatial variance is computed monthly and then annually averaged for plotting clarity before the square root is taken. We show spatial standard deviation time series for the bias-adjusted REOF(Blend), the RCCA, the REOF(GPCP), and the GPCP (Fig. 8). For consistency, the RCCA values were interpolated to monthly values before computing spatial statistics.
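The order of operations for the spatial standard deviation (variance each month, annual average of the variances, square root last) is easy to get wrong, so a minimal NumPy sketch follows. It assumes anomalies stored as a (months × boxes) array covering whole years, user-supplied area weights (for example, cosine of latitude; an assumption, since the weighting is not stated here), and variance taken about the area-weighted spatial mean of each month.

```python
import numpy as np


def spatial_std_annual(anom, weights):
    """Annual series of the global spatial standard deviation.

    anom: (n_months, n_boxes) monthly anomalies; n_months is a multiple of 12.
    weights: (n_boxes,) area weights (assumed, e.g. cos(latitude)).
    """
    anom = np.asarray(anom, float)
    w = np.asarray(weights, float) / np.sum(weights)
    mean = anom @ w                                    # area-weighted mean per month
    var = ((anom - mean[:, None]) ** 2) @ w            # spatial variance per month
    return np.sqrt(var.reshape(-1, 12).mean(axis=1))   # annual mean first, then sqrt
```

Averaging variances rather than standard deviations keeps the annual value consistent with a pooled spatial variance for the year.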
The adjusted REOF(Blend) has a systematically higher standard deviation than the RCCA, although both indicate ENSO variations over the analysis period. Both the adjusted REOF(Blend) and the RCCA have standard deviations that are stable over time, with slight negative trends in each. Thus, the sparse sampling early in the twentieth century is not causing the analyses to be damped in that period. For REOF(GPCP) the standard deviation is consistent with values for the adjusted REOF(Blend). Both have similar averages and magnitudes of changes with ENSO episodes. This indicates that blending with the adjusted REOF(GPCP) should not cause variance jumps relative to the earlier period. The GPCP standard deviation is the largest of all because those data are not filtered with spatial modes and therefore represent all satellite and gauge spatial variations. In addition, the GPCP standard deviation has a trend from before 1990, when infrared-based satellite estimates dominate the analysis, to the later years when microwave-based satellite estimates are used. The microwave-based satellite estimates have higher spatial resolution, and this trend in the GPCP standard deviation reflects the change in instruments used rather than changes in precipitation. Filtering in the REOF(GPCP) removes this trend from the data used in the blend.
The bias-adjusted REOF(Blend) with the 7-yr low-pass filter applied is shown for averages over the global oceans (Fig. 9) and averages over all ocean and land areas (Fig. 10). Over the oceans, this is the same as the RCCA low-pass average. The uncertainty estimates over oceans, computed using methods described in the appendix, are about 0.5 mm month−1 early in the period, shrinking to less than half that by the end of the period. Much of this error is due to bias error from the RCCA, but the sampling error component also contributes to errors early in the analysis period. Near the end of the analysis period most of the error is from bias errors. Bias errors are largest before 1940 when the SST bias uncertainty is largest, which causes uncertainty in the RCCA. Note that the 1970s increase in oceanic precipitation occurs before the beginning of the GPCP period. Testing the RCCA analysis showed that it is caused by changes in SSTs in that period. The rapid increase in the oceanic RCCA is associated mostly with the main ENSO mode, caused by the ENSO-like shift in SSTs (Trenberth and Hurrell 1994; Zhang et al. 1997; Smith et al. 2009b).
Combined ocean and land averages (Fig. 10) have an increase over the twentieth century similar to the ocean averages, but the increase is not as strong when land areas are included. In addition, for the total area the 1970s climate shift is not apparent, and there are fewer variations with time scales shorter than 10 years. Thus, there is a tendency for land variations to counteract oceanic variations, likely from land-to-ocean shifts of precipitation associated with the variations. In the combined average there is more year-to-year variation in the satellite period, which could be influenced by satellite sampling in the REOF(GPCP) part of the reconstruction. However, Fig. 8 shows that the REOF filtering removes most or all additional variance from satellites, while Fig. 5 indicates that the REOF(GPCC) roughly matches the REOF(GPCP) over the common record. In addition, Fig. 10 indicates greater year-to-year variation before 1920 than over much of the period, so the more frequent variations at the end of the period may represent natural variations unrelated to sampling.
The error estimates in Fig. 10 are also smaller for the all-area average than for the ocean average, with the largest values between about 1940 and 1990 owing to an increase in the sampling error estimate over that period. After 1990 the errors are smaller. Note that the sampling error estimate used here is computed from the difference in variance between the base period and the historical reconstruction. It assumes that the variance is roughly stationary, and it does not directly measure the sampling. Thus, it should be considered a crude estimate of the sampling error. The midcentury inflation in the global all-area sampling error is likely influenced by changes in SST and SLP sampling, which affect the RCCA variance. However, the reduced variance may also represent natural reductions in the variability of important climate modes, such as ENSO and the NAO.
Next, the tendency and strength of the analysis are evaluated using the linear trend and the standard deviation over the analysis period (Fig. 11). Here trends are used as a diagnostic tool to evaluate the reconstruction. Trends may not be fully resolved by the reconstruction because the base period is not long enough to span all multidecadal variations. However, the results indicate that many multidecadal variations are resolved by our reconstruction, making this a meaningful diagnostic. Both statistics are computed from monthly anomalies, and the trend is scaled so that it may be plotted using the same shading as the standard deviation. The trends are clearly strongest over the oceans, where the RCCA multidecadal component defines them. However, there are also trends over land, including a positive trend over the eastern United States. Ocean and land trends are also consistent in several places, including parts of South America, the west coast of North America, and northern Australia.
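The two statistics mapped in Fig. 11 can be computed per grid box with an ordinary least-squares slope and a standard deviation, as in the NumPy sketch below. The sketch expresses the trend per century; the rescaling used to share a color scale with the standard deviation is not described in detail, so it is not reproduced.

```python
import numpy as np


def trend_and_std(anom, months_per_year=12):
    """Least-squares linear trend (per century) and standard deviation per box.

    anom: (n_months, n_boxes) monthly anomalies.
    Returns (trend, std), each of shape (n_boxes,).
    """
    anom = np.asarray(anom, float)
    t = np.arange(anom.shape[0]) / (100.0 * months_per_year)  # time in centuries
    t = t - t.mean()                                          # centered time axis
    trend = t @ (anom - anom.mean(axis=0)) / (t @ t)          # OLS slope per century
    return trend, anom.std(axis=0)
```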
The standard deviation shows that the strongest variations are over the tropics, but there are secondary maxima over the northern midlatitudes associated with extratropical storm tracks. There is much less variation in the Southern Ocean, where the REOF analysis has only five modes owing to the lack of gauge sampling in that region.
Much of the variation in the Southern Ocean comes from the RCCA component of the analysis, as indicated by the ratio of the low-pass-filtered standard deviation to the unfiltered standard deviation (Fig. 12, top). The low-pass standard deviation is also a large fraction of the total in the normally dry southeast Pacific and Atlantic, in the Arabian Sea, and off the west coasts of North America and North Africa. Over regions dominated by extratropical cyclones the low-pass standard deviation is a small fraction of the total. The ratio of the trend standard deviation to the low-pass standard deviation is high in many of the same places where the low-pass data account for much of the variation (Fig. 12, bottom). This indicates that where the low-pass standard deviation is relatively strong, much of its variation is explained by a linear trend. Trends account for much of the variation over the Southern Ocean, where the trend itself is positive and relatively weak (Fig. 11, top). A positive trend is also important over the eastern tropical Pacific, which influences ENSO-like variations. Off the southwest coast of North America, in the midlatitude subsidence zone, a negative trend is important. The trends are strongest over the oceans, but they can influence adjacent land regions.
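The two ratios plotted in Fig. 12 can be sketched as follows. The sketch assumes the unfiltered and low-pass anomalies are on the same time axis, and it defines the trend standard deviation as the standard deviation of the least-squares line fitted to the low-pass series (an assumed definition). Because a fitted line is the product of a slope and a centered time axis, its standard deviation is the absolute slope times the standard deviation of time.

```python
import numpy as np


def std_ratios(full, lowpass):
    """Return (low-pass std / total std, trend std / low-pass std) per box.

    full, lowpass: (n_times, n_boxes) unfiltered and low-pass anomalies on the
    same time axis. Boxes with zero variance give inf or NaN.
    """
    full, lowpass = np.asarray(full, float), np.asarray(lowpass, float)
    t = np.arange(full.shape[0], dtype=float)
    t -= t.mean()
    slope = t @ (lowpass - lowpass.mean(axis=0)) / (t @ t)
    trend_std = np.abs(slope) * t.std()      # std of the fitted line slope * t
    low_std = lowpass.std(axis=0)
    return low_std / full.std(axis=0), trend_std / low_std
```

A purely linear low-pass series gives a trend ratio of one; adding high-frequency variability to the unfiltered series lowers the first ratio.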
The differences between the overall land and ocean trends are clearer when zonal averages of each are compared (Fig. 13). The ocean-area trends are stronger, especially in the tropics. Over the oceans there is a positive trend in the tropics and a negative trend in the subtropics of each hemisphere. In the Northern Hemisphere the ocean trend is negative in the extratropics, while in the Southern Hemisphere it becomes positive south of about 40°S. The land-area trends are weaker and generally negatively correlated with the ocean-area trends. In the tropics the land trend is negative, with more positive trends in the subtropics at latitudes where the ocean trends are negative. Just south of 30°N there is a weak negative land trend, and just north of 30°N a weak positive land trend. This is consistent with slight drying in the Northern Hemisphere desert zones and increasing precipitation in eastern North America. The all-area trends are similar in shape to the ocean trends, but their magnitude is damped by averaging with the land areas.
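Separate land, ocean, and all-area zonal averages of a gridded field such as the trend can be formed with a land-fraction mask. In the NumPy sketch below, coastal boxes are assumed to contribute to both the land and ocean averages in proportion to their land and ocean fractions; latitudes with no land (or no ocean) return NaN. Boxes in one latitude band have equal area, so no additional area weighting is applied along longitude.

```python
import numpy as np


def zonal_means_by_surface(field, land_fraction):
    """Zonal means of an (n_lat, n_lon) field over ocean, land, and all areas.

    land_fraction: (n_lat, n_lon) land fraction of each box, used as a weight.
    """
    f = np.asarray(field, float)
    land = np.asarray(land_fraction, float)
    ocean = 1.0 - land
    with np.errstate(invalid="ignore", divide="ignore"):
        ocean_mean = (f * ocean).sum(axis=1) / ocean.sum(axis=1)
        land_mean = (f * land).sum(axis=1) / land.sum(axis=1)  # NaN if no land
    return ocean_mean, land_mean, f.mean(axis=1)
```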
To help compare the reconstruction multidecadal tendency with that from the AR4 model ensemble, a joint EOF (JEOF) analysis is done of the two fields. Both fields are low-pass filtered and normalized to concentrate on similarities in their multidecadal variations. About 30% of the variance is accounted for by the first JEOF mode, which shows a clear trendlike tendency with some similarities in the patterns of both fields (Fig. 14). In particular, they both indicate increasing precipitation over the Southern Ocean and in parts of the tropical Pacific. The reconstruction Southern Ocean tendency is less uniform than in AR4, and the reconstruction tropical Pacific increase is shifted east relative to AR4, but the similarity of these two signals suggests that AR4 broadly represents multidecadal variations in those regions.
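A joint EOF analysis amounts to normalizing each field, concatenating the two along the spatial axis, and decomposing the combined matrix so that each mode shares one time series. The NumPy sketch below uses a singular value decomposition. It assumes both fields are already low-pass filtered, share a common time axis, and are each divided by a single overall standard deviation; the normalization actually used may differ.

```python
import numpy as np


def joint_eof(field_a, field_b, n_modes=1):
    """Joint EOF analysis of two (n_times, n_boxes) fields on a common time axis.

    Returns (variance fractions, PCs, patterns for field_a, patterns for field_b).
    """
    def normalize(x):
        x = np.asarray(x, float)
        x = x - x.mean(axis=0)          # remove the time mean at each box
        return x / x.std()              # one scalar std per field (assumed)

    a, b = normalize(field_a), normalize(field_b)
    joint = np.hstack([a, b])           # concatenate along the spatial axis
    u, s, vt = np.linalg.svd(joint, full_matrices=False)
    var_frac = s**2 / np.sum(s**2)
    pcs = u[:, :n_modes] * s[:n_modes]  # shared time series of each mode
    na = a.shape[1]
    return var_frac[:n_modes], pcs, vt[:n_modes, :na], vt[:n_modes, na:]
```

The first returned fraction corresponds to the share of joint variance explained by the first mode (about 30% in the comparison above).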
Both fields also show decreases in the tropical Atlantic and in some midlatitude zones, but the similarities are not as strong in those regions. In particular, both show decreases in southern Europe, but the decrease is larger in AR4, and the reconstruction shows increases in the eastern Mediterranean. Both show decreases over the Pacific near, and extending into, the southwestern United States and Mexico, but the AR4 decrease is more extensive over land and less extensive over the North Pacific. Both show an increase over eastern North America, but the AR4 increase is farther north. In addition, at high northern latitudes AR4 suggests more systematic increases than the reconstruction.
Here we only perform a JEOF analysis using the ensemble of models, and we do not evaluate individual models. It is possible that some models may compare better with the reconstruction than others. Modeling groups may be able to use the reconstruction to evaluate their output over the twentieth century, which could aid the development of improved coupled models.
Historical global precipitation has been reconstructed on a 5° monthly grid beginning in 1900. Both land and ocean areas are analyzed. The land-area analysis is based on fitting the available gauge data to a set of large-scale spatial EOF modes. That analysis, referred to as REOF, was found to represent large-scale monthly variations over land. Over the oceans the REOF represents most interannual and shorter-scale variations; however, because of the scarcity of gauges, its multidecadal variations over the oceans appear to be less reliable. Therefore, the ocean-area analysis combines REOF with an analysis, referred to as RCCA, that uses canonical correlation analysis to obtain precipitation anomalies from SST and SLP anomalies. In combining them, the low-pass-filtered RCCA is used to bias adjust the ocean-area low-pass REOF. Both REOF and RCCA are developed using the GPCP data, beginning in 1979, and statistics from that period are used to reconstruct precipitation over 1900–2008. The REOF and RCCA methods were developed and described in earlier papers. Here we show how best to combine them and also develop uncertainty estimates for the combined reconstruction.
Evaluations of the reconstruction suggest that it could be of use for climate studies and model evaluation. For example, an earlier version of the reconstruction was used by Mariotti (2010) to help analyze long-term changes in the Mediterranean region. This improved version may be used to assist near-global analyses. The reconstruction shows trendlike variations over both oceans and land, with the greatest changes over tropical oceans. Trends over land are weaker than over oceans, and in the tropics and subtropics they tend to be opposite to the ocean trends. This land–sea difference is similar to the land–sea precipitation differences associated with ENSO over the satellite period (Adler et al. 2008).
The reconstruction cannot resolve finescale variations because of the filtering using spatial modes, although it should represent most large-scale variations. Because much of the reconstruction is based on a gauge dataset, any systematic errors in that dataset will influence the reconstruction. Most random error in the gauge data should be eliminated by filtering using a set of modes, with screening to remove poorly sampled modes. In addition, the RCCA component of the analysis assumes that the relationships between precipitation and the combined SST and SLP are stationary over the reconstruction period.
Error estimates are based on changes in variance over time and on bias-error estimates for the SST and SLP data that force the RCCA. Possible errors in the gauge dataset are not considered in the estimate, nor are possible errors in GPCP or in our assumption that the relationships are stationary over the reconstruction period. Errors caused by filtering the full data using the set of EOFs are also not considered here. The error estimate is thus a measure of how well historical data may be reconstructed relative to EOF-filtered satellite-era data.
In the future we will consider including other data in our analyses as well as evaluating other reconstruction methods. Possible additional data uses include the reinjection of the gauge data themselves and data from extended model-based reanalyses (Compo et al. 2006). The present reconstruction is available online to users (at http://cics.umd.edu/~tsmith/PR/PR.html).
We thank several centers for making their data easily available for this study, including the National Climatic Data Center for the GHCN and SST data, NASA for the GPCP, the Deutscher Wetterdienst for the GPCC, the University of East Anglia for the CRU, and the Met Office Hadley Centre for the SLP data (available online at www.metoffice.gov.uk/hadobs). We also thank R. Vose and two anonymous reviewers for their useful comments. This project is supported in part by the Climate Change Data and Diagnostic program element of the NOAA Climate Program Office and the Cooperative Institute for Climate Studies (NOAA Grant NA17EC1483). The contents of this paper are solely the opinions of the authors and do not constitute a statement of policy, decision, or position on behalf of NOAA or the U.S. Government.
A method for estimating the reconstruction error is outlined here. The error estimate is divided into two parts: a sampling error and a bias error. The sampling error variance is computed by finding the fraction of the variance resolved by the analysis and subtracting that from an estimate of the total variance. Here we use the REOF(GPCP) variance as the total variance since that filtered data represent the climate-scale precipitation variations that this analysis attempts to resolve using historical data.
To see how this works, consider the error variance, or mean-squared error, which is defined as

$$E^2 = \langle (R - P)^2 \rangle. \tag{1}$$
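Here R is the reconstruction and P is the true precipitation anomaly; the angle brackets denote averaging. By expanding Eq. (1) we can obtain

$$E^2 = \sigma_R^2 + \sigma^2 - 2\sigma_R \sigma r + (\langle R \rangle - \langle P \rangle)^2. \tag{2}$$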
Here $\sigma_R^2$ is the reconstruction variance, $\sigma^2$ is the true precipitation variance, and $r$ is the correlation between the reconstruction and the true precipitation. The first three terms on the rhs of Eq. (2) account for the sampling and random error variance, $E_S^2 = \sigma_R^2 + \sigma^2 - 2\sigma_R \sigma r$. The last term on the rhs of Eq. (2) accounts for the mean bias error variance, $E_B^2 = (\langle R \rangle - \langle P \rangle)^2$.
The random error, due to noise in the analysis, should be a small fraction of the total error. That is because the reconstructions are produced by filtering data using spatial modes that filter out most noise. Here we will assume that the reconstruction noise is negligible.
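The correlation squared, $r^2$, defines the fraction of the variance accounted for by the reconstruction. By ignoring random noise we can estimate this as $r^2 \approx \sigma_R^2/\sigma^2$. Substituting $r \approx \sigma_R/\sigma$ into $E_S^2$ makes it possible to estimate the sampling error variance as

$$E_S^2 \approx \sigma^2 - \sigma_R^2. \tag{3}$$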
The reconstruction anomaly variance can be estimated directly from the analysis. The sampling error variance is simply the variance not accounted for by the reconstruction.
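Equations (2) and (3) combine into a simple recipe: the sampling error variance is the base-period (REOF(GPCP)) variance minus the reconstruction variance, and the total error variance adds the squared bias error. A minimal NumPy sketch follows. Clipping a negative sampling variance to zero, which would occur if the reconstruction variance exceeded the base-period variance, is an added assumption not stated in the text.

```python
import numpy as np


def error_estimates(var_recon, var_true, bias_se):
    """Sampling, bias, and total error standard deviations.

    var_recon: reconstruction anomaly variance, sigma_R**2.
    var_true: REOF(GPCP) variance used as the 'true' variance, sigma**2.
    bias_se: bias standard error E_B (e.g. from the RCCA bias tests).
    Assumes negligible random noise, so r**2 ~ sigma_R**2 / sigma**2 (Eq. 3).
    """
    es2 = np.maximum(np.asarray(var_true, float) - np.asarray(var_recon, float), 0.0)
    eb2 = np.asarray(bias_se, float) ** 2
    return np.sqrt(es2), np.sqrt(eb2), np.sqrt(es2 + eb2)  # E_S, E_B, E
```

For example, a reconstruction variance of 0.07 against a base-period variance of 0.16 gives a sampling error of 0.3, and with a bias error of 0.4 the total error is 0.5 (in the units of the anomalies).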
For the sampling error variance, we are most interested in how well the climate-scale features of precipitation anomalies are resolved. Thus, for our estimate of the true variance we do not use the full GPCP.v2.1 variance, but rather the variance of the REOF(GPCP). The filtering removes small-scale variations; however, because of satellite sampling in the data, it retains sampling of all climate-scale variations. This gives a measure of relative error from one time to another that can help guide users of the reconstruction.
Over land the gauges that anchor the land REOF(Blend) are assumed to be unbiased, suggesting that the land bias error should be low. Here we simplify the land bias estimate by ignoring systematic differences that may arise because the available REOF modes underrepresent some variations. Thus, if all of the available REOF modes are used, then there will be no land REOF(Blend) bias error in this analysis, since the gauges are assumed to be unbiased. If fewer REOF modes are used, then the resulting errors will be represented in the sampling error component discussed above. This assumption allows the errors to be estimated using the available data.
Over oceans the multidecadal signal of both reconstructions is forced to match the RCCA multidecadal signal. The RCCA bias error variance is estimated using the bias errors likely to contaminate the SST and SLP forcing data in the RCCA. Bias errors for these forcing fields are discussed by Smith et al. (2008a) and Rayner et al. (2006) for SST and by Allan and Ansell (2006) for SLP. Because these biases are only approximately known, they are roughly estimated to evaluate their approximate influence on the RCCA bias estimates. Here the SST bias uncertainty standard error is set to its global value, which is about 0.06°C or less before 1939 (Rayner et al. 2006). From 1939 to 1941 it is damped linearly each year down to 0.015°C in 1941. It is held at that level for the remaining years of the analysis. The larger values earlier in the period are due to the need for a large historical bias adjustment in that period when different types of buckets were typically used to measure SSTs. In more recent years the sampling is more consistent and SSTs have smaller biases, accounted for by the smaller estimate for the most recent period. The SLP bias standard error has not been studied as extensively. Therefore, we here estimate it to be 0.25 times the SLP anomaly standard deviation and hold it constant in time so as to compute bias error estimates. The actual bias uncertainty should be much less than the standard deviation. We use these uncertainty estimates in RCCA tests to compute how much they influence the results.
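The bias standard errors used to force the RCCA tests can be written as simple functions of time and of the local SLP variability. The NumPy sketch below is a simplification: it holds the SST value at 0.06°C for all years before 1939 (the text says "about 0.06°C or less"), reduces it linearly to 0.015°C by 1941, and keeps it constant thereafter. The SLP value is 0.25 times the local SLP anomaly standard deviation, constant in time. The function names are hypothetical.

```python
import numpy as np


def sst_bias_se(year, early=0.06, late=0.015, ramp_start=1939.0, ramp_end=1941.0):
    """Global SST bias standard error (deg C) as a function of year."""
    frac = np.clip((np.asarray(year, float) - ramp_start) / (ramp_end - ramp_start), 0.0, 1.0)
    return early + frac * (late - early)  # 0.06 before 1939, 0.015 from 1941 on


def slp_bias_se(slp_anom_std, factor=0.25):
    """SLP bias standard error: a fixed fraction of the SLP anomaly std, constant in time."""
    return factor * np.asarray(slp_anom_std, float)
```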
Three RCCA tests are used to evaluate bias uncertainty: the first is forced with SSTs equal to their bias standard error and SLPs set to zero, the second with SSTs set to zero and SLPs set to their bias standard error, and the third with both SSTs and SLPs set to their bias standard errors. In these tests the SST bias standard error dominates the resulting bias uncertainty estimates. The standard errors from all three RCCA tests are averaged to estimate the RCCA standard errors. Typical values for 5° annual estimates are ±6 mm month−1. Area averaging reduces the magnitude of the bias error, since the modes force both positive and negative values. Averaging over all land and ocean RCCA areas reduces the bias error of the global average to 0.05 mm month−1 for 1900–38. After 1941 it is reduced to less than 0.01 mm month−1.
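The three-test procedure, and the reason area averaging shrinks the bias error, can be sketched as follows. Here `rcca` is a hypothetical callable standing in for the RCCA (mapping SST and SLP forcing fields to a precipitation anomaly field). The sketch assumes that the local standard error is the magnitude of each test's response averaged over the three tests, and that the area-average error is the magnitude of each test's area-averaged (signed) response, averaged over the tests. Because the modes produce responses of both signs, the signed area average is much smaller than the typical local value.

```python
import numpy as np


def rcca_bias_errors(rcca, sst_se, slp_se, area_w):
    """Local and area-averaged RCCA bias standard errors from three forcing tests.

    rcca: hypothetical callable (sst_field, slp_field) -> precipitation field.
    sst_se, slp_se: bias standard error fields used as forcing.
    area_w: area weights for the average over all RCCA boxes.
    """
    sst_se, slp_se = np.asarray(sst_se, float), np.asarray(slp_se, float)
    tests = [
        (sst_se, np.zeros_like(slp_se)),   # SST bias only
        (np.zeros_like(sst_se), slp_se),   # SLP bias only
        (sst_se, slp_se),                  # both biases
    ]
    responses = [np.asarray(rcca(sst, slp), float) for sst, slp in tests]
    local = np.mean([np.abs(r) for r in responses], axis=0)
    area_avg = np.mean([abs(np.average(r, weights=area_w)) for r in responses])
    return local, area_avg
```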
* Current affiliation: Department of Atmospheric Science, Colorado State University, Fort Collins, Colorado
Corresponding author address: T. Smith, NOAA/NESDIS/STAR/SCSB and CICS/ESSIC, 5825 University Research Ct., Suite 4001, College Park, MD 20740. Email: email@example.com