1. Introduction
Comprehensive documentation of the terrestrial water cycle at the global scale and its evolution over time is fundamental to understanding the earth’s climate system and assessing the impacts owing to climate change. Such documentation is also needed to characterize the memories, pathways, and feedbacks between key water, energy, and biogeochemical cycles. With such an enhanced understanding, there is the potential for research programs, such as the World Climate Research Programme (WCRP) Global Energy and Water Cycle Experiment (GEWEX) (Morel 2001) and the National Aeronautics and Space Administration (NASA) Energy and Water Cycle Study (NEWS) (NASA NEWS Science Integration Team 2007), to resolve overarching scientific goals to document and enable improved, observationally based predictions of the energy and water cycles, and “to understand how the Earth is changing and what are the consequences for life on Earth” (NASA 2003). The GEWEX long-term scientific goal is to obtain a quantitative description of weather-scale variations in the global energy and water cycles over a period of at least 20 yr, which will provide the needed scientific basis for understanding climate variability and change. Such long-term datasets have been referred to as Earth System Data Records (ESDRs) by NASA’s Making Earth Science Data Records for Use in Research Environments (MEaSUREs) program, and climate data records (CDRs) by the NOAA National Climatic Data Center and the European Organization for the Exploitation of Meteorological Satellites (EUMETSAT). More broadly, long-term, high-quality, and uninterrupted observational datasets are central to the goals of the Global Climate Observing System (GCOS) framework for the sustained monitoring of essential climate variables (ECV) in support of climate change research.
An underlying challenge in using current in situ measurements, remote sensing (RS) observations, terrestrial water cycle variables from reanalysis model output, or offline land surface models (LSMs) is resolving the uncertainty among the various estimates for a specific variable and the consistency among the variables that make up the terrestrial water budget. Rawlins et al. (2010) analyzed pan-Arctic terrestrial water budget datasets to assess the intensification of the Arctic hydrologic cycle. While they found consistency among the trends, precipitation (p) estimates from eight sources ranged from 420 to 520 mm yr⁻¹; evapotranspiration (e) from six sources ranged from 150 to 320 mm yr⁻¹; and river discharge (q) estimated as p − e ranged from 180 to 290 mm yr⁻¹ (five sources). Sahoo et al. (2011) evaluated remote-sensing-only datasets in terms of their depiction of terrestrial water budgets over 20 large global river basins. Uncertainties were found to be largest across precipitation datasets relative to other budget components. This was borne out by Tian and Peters-Lidard (2010), who found uncertainties across remote sensing precipitation products on the order of 100%–140% in mid- to high latitudes and, especially, for cold-season and light precipitation over complex terrain.
Providing data records for all scales and time periods, with sufficient accuracy and consistency for all applications, is virtually impossible from a single estimation system. Therefore, one of the core strategies of initiatives such as GEWEX and MEaSUREs is to utilize multiple estimates when providing water cycle data records to users, for example, from in situ observations, remote sensing retrievals, atmospheric reanalyses, and offline LSMs. This multisource strategy has the potential to compensate for the limitations of each individual estimation method in terms of its accuracy, spatial coverage, temporal sampling frequency, and so forth. At the same time, this strategy also raises new questions and challenges. First, how do we allow for the discrepancies among different data sources, which can be significant? If a single combined estimate of a variable is to be made from many competing sources, how do we determine the accuracy and consistency of each source and assign uncertainties to the final estimate? Another challenge is that, in many cases, estimates of budget components from different sources do not close the water budget (Pan and Wood 2006; McCabe et al. 2008; Sheffield et al. 2009; Gao et al. 2010; Sahoo et al. 2011); that is, the basic physical constraint of mass balance of water is not satisfied. For example, Sheffield et al. (2009) attempted to close the water budget for the Mississippi River basin using multiple remote sensing estimates of budget components but found significant errors that were larger than the observed streamflow, even after bias correction. Sahoo et al. (2011) found nonclosure errors of similar magnitude for multiple global river basins, on the order of 5%–25% of mean annual precipitation. If mass balance is not preserved, estimates for different variables may present conflicting information (Sheffield et al. 2009). How do we ensure such physical consistency in our estimates of individual variables when multiple sources of data are merged? With these questions and challenges in mind, this study aims to develop a strategy to integrate multiple sources of water cycle estimates into one consistent set of water cycle data records using data assimilation techniques.
Our overall goal is to develop a method for merging multiple sources of water budget variables in a consistent manner and develop a long-term global data record for the terrestrial water cycle. By utilizing information from multiple sources, it is expected that the data record will provide a best possible estimate that can be used as a baseline for various hydrological and climate studies. Nevertheless, there are two major challenges that will have to be addressed to meet this goal:
(1) how to determine the quality or error level of each individual estimate and optimally merge them; and
(2) how to ensure that the estimates are physically consistent, such that the mass balance of water is conserved among all flux and storage terms.
We apply the merging scheme to the 23-yr period 1984–2006, which provides the largest overlap of long-term, large-scale estimates from in situ, remote sensing, and model datasets. The merging can be applied over longer time periods, but this would rely more heavily on modeled datasets before the satellite period (and be subject to the associated shifts due to changing observing systems) and would raise the issue of how to merge datasets that change in number and quality. This will be addressed in future work. The analysis is performed over 32 major river basins in the world (Fig. 1) at a monthly time scale, since these scales are commensurate with the availability of measured streamflow data for the study period, which forms a robust constraint on the water budget. Section 2 discusses how each terrestrial water cycle variable is estimated and the source data used. Section 3 details the methodology for data merging and balance constraining. Results are presented and discussed in section 4, followed by conclusions in section 5.
Fig. 1. Thirty-two global basins selected for the water cycle estimation.
Citation: Journal of Climate 25, 9; 10.1175/JCLI-D-11-00300.1
2. The terrestrial water cycle and sources of data
The terrestrial water cycle consists of fluxes into and from the land surface and the water storage on and below the land surface, and is part of the larger circulation of water between the atmosphere, land, and ocean. This terrestrial water cycle is the closest to human habitats and, thus, has the most direct impacts on human lives, for example, through flooding, drought, agricultural productivity, water resources, and ecosystem health. The terrestrial water budget consists of four main terms: precipitation p, evapotranspiration e, runoff q, and total terrestrial water storage s. Note that the precipitation includes both liquid and solid forms, and the storage term includes all possible water stores (soil moisture/ice, snowpack, vegetation canopy moisture/snow, groundwater, wetlands, lakes, reservoirs, and streams). These water cycle variables can be observed in different ways.
For precipitation, long-term records come primarily from in situ (rain gauge) or ground radar measurements. The density of gauge or radar networks differs across the world, with developed countries generally having dense networks and developing regions having sparse measurements. In the past decade, there has been a dramatic growth in the estimation of precipitation from satellite remote sensing (Pan et al. 2010) because of its superior spatial coverage, especially after the launch of the Tropical Rainfall Measuring Mission (TRMM) satellite in late 1997. In this study, four in-situ-based products are used: the Climate Prediction Center (CPC) product (Chen et al. 2002), the Climatic Research Unit (CRU) product (Mitchell and Jones 2005), the Willmott–Matsuura (WM) product (Matsuura and Willmott 2010), and the Global Precipitation Climatology Centre (GPCC) product (Schneider et al. 2008). No satellite data are used because most satellite records are too short (they begin after 1998), and some are rescaled against gauge data and so are only marginally different from in situ products at the basin and monthly scale. Figure 2 shows the seasonal cycle of precipitation calculated from the four products over six representative basins. In general, there is good agreement among the products because they have many rain gauges in common. The spread is slightly higher in densely gauged basins like the Danube, possibly because of differences in gridding and undercatch correction procedures. The spread is extremely low in sparsely monitored regions such as the Amazon and Niger basins, suggesting not only a heavy overlap of the gauges used among products, but also a lack of data for procedures like undercatch correction.
For evapotranspiration, in-situ-based estimates rely on networks of flux towers. Although these networks are very sparse globally, progress has been made in upscaling flux tower estimates to global coverage (Jung et al. 2009). Large-scale estimates can also be derived from remote sensing using satellite-retrieved radiation fluxes and surface meteorological conditions. The retrieval is usually performed using an empirically based, process-based, or energy balance model of boundary layer fluxes, such as the Penman–Monteith (PM), Priestley–Taylor (PT), or Surface Energy Balance System (SEBS) models (Su 2002). Two evapotranspiration datasets are used in this study: the upscaled flux-tower-based dataset from the Max Planck Institute (MPI) (Jung et al. 2010) and the SEBS-derived estimates (Vinukollu et al. 2011) using radiation fluxes from the International Satellite Cloud Climatology Project (ISCCP; Rossow and Schiffer 1999).
Streamflow is well recorded through river gauges, and long-term datasets are available for most major rivers. We use monthly streamflow data compiled by the Global Runoff Data Centre (GRDC). There are gaps in the GRDC records. Most basins are missing no more than a few years of data, but seven (the Aral, Dnieper, Don, Limpopo, Nile, Ural, and Volga) are missing more than 40% of the months in the study period. To fill the gaps, a linear regression is performed between the LSM-simulated streamflow (details of the LSM are provided later) and all available GRDC records (back to 1950), and the missing values are filled based on the LSM-simulated values. This procedure is very similar to that used in Dai et al. (2009), except that the regression is always applied, regardless of the statistical significance of the linear relationship.
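The gap-filling step can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes a single ordinary least-squares fit over all months in which both series exist (whether the fit is done per calendar month is not stated), and the function names and streamflow values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y ~ a + b * x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def fill_gaps(observed, simulated):
    """Fill None entries in `observed` using a regression on `simulated`.

    The regression is fitted on the months where an observation exists and
    is always applied, without testing its significance.
    """
    pairs = [(s, o) for s, o in zip(simulated, observed) if o is not None]
    a, b = fit_line([s for s, _ in pairs], [o for _, o in pairs])
    return [o if o is not None else a + b * s
            for o, s in zip(observed, simulated)]


# Made-up monthly streamflow (mm/month): a gauge record with two gaps
# and the corresponding model simulation.
observed = [10.0, 14.0, None, 22.0, None, 30.0]
simulated = [8.0, 10.0, 13.0, 14.0, 16.0, 18.0]
filled = fill_gaps(observed, simulated)
print([round(v, 2) for v in filled])
```

Observed months are passed through unchanged; only the missing months take regression-based values.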
The terrestrial water storage term includes several variables such as soil moisture, snow, and lake storage. Some of these variables can be observed separately, both from in situ and satellite remote sensing, but others, such as groundwater, cannot; some variables suffer from sampling issues, such as low temporal sampling and low resolution for altimeter data and shallow sampling depth for microwave-based soil moisture retrievals. The estimates derived from the Gravity Recovery and Climate Experiment (GRACE) sensor overcome these issues and provide information on total terrestrial storage anomalies. The GRACE storage anomalies (Swenson and Wahr 2002), available from 2002 onward, are used, and estimates from the LSM are used for years prior to 2002.
In addition to observational approaches, dynamic model-based or hybrid approaches are also a good source of information. Climate/weather model reanalysis is one of the best ways to reconstruct the fluxes and states of the atmosphere and land because it provides consistent and continuous fields of all of the water cycle variables, albeit with errors due to the shortcomings of the modeling and assimilation process. We use the European Centre for Medium-Range Weather Forecasts (ECMWF) Interim Re-Analysis (ERA-Interim; Simmons et al. 2006) as an alternative source for evapotranspiration (note that ERA-Interim data start from 1989). To minimize the artifacts caused by the assimilation that generally impacts the land water balance, the predicted evapotranspiration from ERA-Interim is replaced by the values that are “inferred” from its atmospheric water budget, that is, the moisture divergence minus the precipitation and the change in the atmospheric column total moisture (Troy et al. 2010). This is referred to as “ERA-Interim Inferred” hereafter.
Land surface models (LSMs) offer a very sophisticated parameterization for the land surface, and offline LSM simulations provide reasonable estimates of the land surface states and fluxes when they are forced with high-quality surface meteorological data. Unlike in a climate or weather model, precipitation is not a prognostic variable in an LSM and has to be supplied as an input. Here we use an offline LSM simulation with the Variable Infiltration Capacity (VIC) model (Liang et al. 1994, 1996). This VIC simulation is forced with surface meteorological data from the Princeton Global Forcing (PGF) (Sheffield et al. 2006) and is an updated version of the simulation of Sheffield and Wood (2007). VIC is calibrated against streamflow data for 25 large river basins globally, and thus gives a reasonable depiction of the terrestrial water cycle at least at the monthly time scale. We use the VIC-estimated evapotranspiration within the data merging, the VIC runoff to infill missing data in GRDC, and the VIC storage change for the years before the 2002 launch of the GRACE satellites.
Fig. 2. Seasonal cycle of precipitation from different products over six representative basins for 1984–2006.
Table 1 summarizes the data sources used in this study. Figure 3 shows the seasonal cycle of evapotranspiration calculated from the four products, namely VIC, ERA-Interim Inferred, SEBS (ISCCP), and MPI, over the six basins. Unlike for precipitation, the agreement among products is very poor; for example, some products are consistently lower or higher than others. Over the Amazon, there is not even consensus on the shape of the seasonal cycle. This significant divergence among the products and the uncertainty in the true seasonal cycle exemplify the need to merge information from different sources. The method for merging the data products is discussed in the next section.
Table 1. Observational and modeling data used in the water cycle study.
Fig. 3. As in Fig. 2 but for evapotranspiration. The data for ERA-Interim Inferred are for 1989–2006.
3. Water cycle assessment procedure
To produce a single consistent set of estimates for the water cycle variables from multiple sources of data, the assessment procedure includes three basic steps, as shown in the three boxes in Fig. 4. The first step performs conventional error and bias analysis on the various observational and model-derived datasets. This step determines the error variances (or equivalently rms errors or standard errors) of all data products, and corrects any known biases in them. Based on this error information, the second step merges, variable by variable, the various estimates of the same variable into one. The third step resolves the water balance errors and ensures a perfectly closed water budget.
Fig. 4. Flowchart of the water cycle assessment procedure.
a. Conventional error and bias analysis
In this first step, various aspects of the products are examined based on information like the sensor characteristics, production methods, source data, model parameterization, calibration procedure, error assessment/validation, and so on. When possible, we identify those “best” products that are believed to be relatively unbiased and have the most reasonable temporal dynamics, and perform bias correction with respect to the best ones. Otherwise no bias correction will be performed. Then, we try to quantify the error levels in different data products from available information. The process differs by variable.
Fig. 5. Basin-mean precipitation gauge density (gauges per 10⁶ km²) during the study period. Density data for WM are not available and are assumed to be one-half of GPCC. The density–error curve is plotted on the left so that readers can look up the error level for a given gauge density.
The second type of error is caused by uncertainties in the production processes and, thus, can be approximated by their deviations from the mean of all products. The two types of errors are assumed independent and summed. The calculations are performed for each of the 12 months in a year because the error levels may vary seasonally: for each product, 12 error variances are calculated, one for each month.
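The second error type can be illustrated with a short sketch. It assumes that a product's error in a given month is approximated by its deviation from the all-product mean and that the variance is the mean squared deviation within each calendar month. The first (sampling) error type, which would be added to this, is not included. The function name, product labels, and values are hypothetical.

```python
def monthly_error_variance(products):
    """Per-product, per-calendar-month error variance.

    `products` maps a product name to a list of monthly values that starts
    in January and covers whole years. The error of a product in a given
    month is approximated by its deviation from the all-product mean.
    Returns {name: [12 variances, January first]}.
    """
    names = list(products)
    n_months = len(products[names[0]])
    # All-product mean for every month of the record.
    mean = [sum(products[k][t] for k in names) / len(names)
            for t in range(n_months)]
    result = {}
    for k in names:
        sq_dev = [[] for _ in range(12)]
        for t in range(n_months):
            sq_dev[t % 12].append((products[k][t] - mean[t]) ** 2)
        result[k] = [sum(v) / len(v) for v in sq_dev]
    return result


# Made-up two-year records (mm/month) for two hypothetical products.
record_a = [100.0 + t for t in range(24)]
record_b = [v + 4.0 for v in record_a]
variances = monthly_error_variance({"A": record_a, "B": record_b})
print(variances["A"])
```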
For evapotranspiration, the situation is quite different. First of all, there is a lack of consensus on the magnitude and shape of the seasonal cycle for places like the Amazon (Fig. 3), and a bias correction can only be based on limited validation studies conducted on individual products over limited regions (Luo et al. 2003; Vinukollu et al. 2011). The Amazon is generally an energy-limited regime for evapotranspiration (Vinukollu et al. 2011) and remote-sensing-based methods tend to work less well because of the large uncertainties in surface radiation and surface meteorology. Also, ground observations are scarce, and the models tend to offer a more reliable seasonal cycle. Therefore, we bias correct the observational products, SEBS (ISCCP) and MPI, for each of the 12 months in a year, to match the seasonal cycle of the average of VIC and ERA-Interim Inferred. For other energy-limited basins, the same bias correction is performed. For water-limited basins, including the Niger, Nile, Murray–Darling, Limpopo, and Yellow, confidence in the model-produced seasonal cycle is not significantly stronger than for the observations and remote-sensing-based estimates. However, the annual total evapotranspiration from VIC is reasonable because VIC is forced with observed rainfall and calibrated against observed runoff, both of which are better observed than evapotranspiration, and VIC forces water balance by model construct. For these basins, we bias correct the SEBS (ISCCP), MPI, and ERA-Interim Inferred to match the annual totals of VIC. After the bias correction, the error variances are calculated around the mean of all products owing to the lack of an alternative method to quantify errors from production information.
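A seasonal-cycle bias correction of this kind might look like the sketch below. It assumes an additive shift for each calendar month so that the product's 12-month climatology matches a reference climatology (here standing in for the average of VIC and ERA-Interim Inferred). Whether the study's correction is additive or multiplicative is not stated, so the additive form, the function names, and the numbers are all assumptions for illustration.

```python
def monthly_climatology(series):
    """Mean value for each calendar month (series starts in January)."""
    return [sum(series[m::12]) / len(series[m::12]) for m in range(12)]


def bias_correct(series, reference):
    """Shift `series` so its 12-month climatology matches `reference`.

    An additive correction is assumed; interannual anomalies of the
    product are left unchanged.
    """
    clim = monthly_climatology(series)
    ref = monthly_climatology(reference)
    return [v + (ref[t % 12] - clim[t % 12]) for t, v in enumerate(series)]


# Made-up two-year records (mm/month), both starting in January.
reference = [100.0 + 10.0 * (t % 12) for t in range(24)]
product = [130.0 + 5.0 * (t % 12) + (3.0 if t >= 12 else -3.0)
           for t in range(24)]
corrected = bias_correct(product, reference)
print([round(v, 1) for v in monthly_climatology(corrected)])
```

For the water-limited basins, the same idea would apply to annual totals rather than to each calendar month.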
b. Data assimilation to combine estimates
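As a minimal illustration of variance-based merging, the sketch below combines several estimates of the same variable with weights inversely proportional to their error variances. It assumes the products are unbiased and their errors mutually independent. This is the textbook minimum-variance combination and not necessarily the exact formulation used in this study. The function name and all numbers are hypothetical.

```python
def merge_estimates(values, variances):
    """Inverse-variance weighted mean of several estimates of one quantity.

    Returns (merged value, merged error variance, weights).
    """
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    weights = [w / total for w in inv]
    merged = sum(w * x for w, x in zip(weights, values))
    return merged, 1.0 / total, weights


# Made-up precipitation estimates (mm/month) and error variances (mm^2/month^2)
# from four hypothetical products for one basin and one month.
values = [200.0, 210.0, 190.0, 220.0]
error_variances = [100.0, 400.0, 400.0, 1600.0]
merged, merged_var, weights = merge_estimates(values, error_variances)
print(round(merged, 2), round(merged_var, 2), [round(w, 2) for w in weights])
```

Because the error variances are evaluated separately for each calendar month, the weights would vary seasonally as well.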





c. Closing the water balance and attribution of imbalance errors
Combined estimates of individual water budget variables from the second step do not necessarily close the water balance. The water balance errors are defined and treated using the constrained Kalman filter (CKF) as follows.
1) Water balance errors
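The imbalance can be written as the residual of the budget terms. The sketch below assumes the sign convention r = p − e − q − Δs, so that r is zero when the budget closes; the function name and the values are illustrative only.

```python
def imbalance(p, e, q, ds):
    """Water budget residual r = p - e - q - ds (zero when the budget closes)."""
    return p - e - q - ds


# Made-up merged monthly estimates for one basin (mm/month).
p_series = [180.0, 150.0, 90.0]
e_series = [95.0, 100.0, 85.0]
q_series = [70.0, 60.0, 40.0]
ds_series = [10.0, -5.0, -30.0]
residuals = [imbalance(p, e, q, ds)
             for p, e, q, ds in zip(p_series, e_series, q_series, ds_series)]
print(residuals)
```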
2) Constrained Kalman filter
The water budget imbalance errors are analyzed with a constrained Kalman filter (CKF), which is a simpler (nonensemble) form of the constrained ensemble Kalman filter (CEnKF) proposed in Pan and Wood (2006).





Several approaches have been proposed for this well-posed problem, for example, formulating it as an error minimization problem or as an orthogonal projection onto the constraint surface (Simon and Chia 2002), but they all arrive at the correct solution. The solutions have been discussed thoroughly in Simon and Chia (2002) in the context of a filtering or data assimilation problem (McLaughlin 2002) with a state constraint. For simplicity and convenient implementation with a regular Kalman filter (KF) (Kalman 1960) or an ensemble Kalman filter (EnKF) (Evensen 1994), the constraining procedure adopted in Pan and Wood (2006) is done as a postprocessing procedure, which can be performed independently after a regular filter update. Because this postprocessing approach is a standalone procedure, it is also well suited to water balance estimation applications that do not involve filtering or data assimilation.





The state error covariance εrr is calculated entry by entry according to Eq. (14). All diagonal entries (εpp, εee, εqq, and εΔsΔs) have already been calculated from the merging step and they vary from month to month. To calculate the cross-covariance εpe, we break it down into the product of the two standard deviations and the correlation, εpe = ρpe √(εpp εee), and assume that the error correlation ρpe remains unchanged in time (stationary). As a result, ρpe can be computed using the deviations from the all-product means during the merging step. Errors in q and Δs are assumed uncorrelated with errors in p or e, and mutually uncorrelated as well, owing to the lack of data for their estimation. Thus, εqp = εqe = εΔsp = εΔse = εqΔs = 0.
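The covariance construction and the balance constraint can be sketched together. The code below is a hedged illustration, not the study's implementation. It assumes the state x = (p, e, q, Δs), the linear constraint G x = 0 with G = (1, −1, −1, −1), and the standard covariance-weighted projection x′ = x − P Gᵀ (G P Gᵀ)⁻¹ G x. The cross-covariance is taken as the correlation times the product of the two standard deviations. All names and numbers are hypothetical.

```python
import math

G = [1.0, -1.0, -1.0, -1.0]  # residual r = p - e - q - ds


def build_covariance(var_p, var_e, var_q, var_ds, rho_pe):
    """4 x 4 error covariance; only p and e errors are correlated."""
    cov_pe = rho_pe * math.sqrt(var_p * var_e)
    return [
        [var_p, cov_pe, 0.0, 0.0],
        [cov_pe, var_e, 0.0, 0.0],
        [0.0, 0.0, var_q, 0.0],
        [0.0, 0.0, 0.0, var_ds],
    ]


def constrain(x, cov):
    """Project x onto G x = 0; returns (constrained state, attribution)."""
    r = sum(g * xi for g, xi in zip(G, x))            # imbalance
    cov_gt = [sum(cov[i][j] * G[j] for j in range(4)) for i in range(4)]
    s = sum(G[i] * cov_gt[i] for i in range(4))       # scalar G P G^T
    gain = [c / s for c in cov_gt]
    x_new = [xi - k * r for xi, k in zip(x, gain)]
    # Fraction of the imbalance absorbed by each variable (sums to 1).
    shares = [g * k for g, k in zip(G, gain)]
    return x_new, shares


# Made-up monthly estimates (mm/month) and error variances.
x = [180.0, 95.0, 70.0, 10.0]
cov = build_covariance(16.0, 9.0, 4.0, 1.0, rho_pe=0.5)
x_closed, shares = constrain(x, cov)
print([round(v, 3) for v in x_closed], [round(v, 3) for v in shares])
```

Variables with larger error covariance with the imbalance absorb a larger share of it, which is the basis for an attribution of imbalance errors among the budget terms.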
4. Results and discussion
Not all of the 32 basins need to be discussed in detail, and the Amazon will serve as the example for the entire water cycle assessment process since it is perhaps the most difficult but important study basin because of its extremely large size and flux volume. Difficulties in determining the budget arise from the extremely sparse observation networks relative to the magnitude of its budget terms, inadequate modeling because of the uncertainty in the inputs, and limited validation data. The merging process is shown for precipitation products over the Amazon in Fig. 6. Even though the four products do not differ significantly (top panel), their weights in the merging are quite different: GPCC takes the majority (74%), and WM, CPC, and CRU take 15%, 9%, and 2%, respectively. Such a weighting reflects the strong influence of the gauge densities of the four products over the Amazon (44, 22, 14, and 4 gauges per 10⁶ km², respectively), because these low density numbers lead to large differences in sampling errors (see the curve in Fig. 5). The influence of gauge density becomes insignificant in high-density areas such as the Danube or Mississippi, where cross-product differences will basically determine the uncertainties. Note that the merging weights differ seasonally, and the mean weights are shown in the bottom panel of Fig. 6 as a pie chart.
Fig. 6. Example of merging of precipitation products for the Amazon. Monthly time series of (top) original products and (bottom) the merged estimate. The pie chart represents the mean of the merging weights applied to each product to form the merged estimate, and the gray area marks the range (between maximum and minimum) of inputs.
Figure 7 shows the bias correction (middle panel) and merging process (bottom panel) for evapotranspiration. Over the Amazon, the bias correction adjusts all products to match the seasonal cycle of the average of VIC and ERA-Interim Inferred. As seen in Fig. 3, VIC and ERA-Interim Inferred have weak but conflicting seasonal cycles, and their average has very low seasonal variations, which is similar to MPI. The high-biased SEBS is adjusted downward significantly. In terms of merging weights, MPI and VIC tend to have very strong weights (44% and 36%), and ERA-Interim Inferred and SEBS (ISCCP) are less weighted (8% and 12%). Both ERA-Interim Inferred and SEBS (ISCCP) have relatively large spikes, which place them farther away from the all-product means and result in lower weights.
Fig. 7. As in Fig. 6 but for evapotranspiration, including (middle panel) bias-corrected versions of the products.
The same merging procedure is implemented over all 32 basins, and Figs. 8 and 9 give the merging weights of all basins for precipitation and evapotranspiration, respectively. For precipitation (Fig. 8), the GPCC product has larger weights than others over sparsely gauged areas like the Arctic region, South America, and Africa. The dominance of GPCC diminishes over Europe, North America, and Australia. For evapotranspiration (Fig. 9), VIC and MPI hold significant and similar weights in most basins, and the two products contribute more than two-thirds of the total weight. ERA-Interim Inferred takes much less weight, and SEBS (ISCCP) contributes the least (slightly more in tropical basins than in high-latitude basins). This may suggest that the remote sensing estimate is often an outlier relative to the model and tower-based estimates, even though we lack evidence and data to show which is more reliable.
Fig. 8. Mean merging weights for the precipitation products for the 32 selected basins.
Fig. 9. As in Fig. 8 but for the evapotranspiration products.
No merging is applied to the runoff or the storage change because only one product is used at a time. The merged water cycle estimates do not close the water budget, and this is shown in the left column in Fig. 10. The top panel plots all flux terms: evapotranspiration and runoff are plotted as blue and green stacked bars and the precipitation is shown by a purple line. If the water balance is satisfied, then the gap between the purple line and the top of the stacked bars should be exactly balanced by the storage change (cyan bars in the middle panel); otherwise the imbalance values are plotted as red bars in the bottom panel. The imbalances over the Amazon have a mean close to zero, that is, no long-term accumulated imbalance, and the values vary seasonally. The right column in Fig. 10 repeats the left column except that all estimates are balance constrained using the CKF. The CKF removes the imbalances by distributing them to all water cycle components according to their error covariances with the imbalances (which reflect both the error variances and the error correlations). The CKF attributes the imbalances month by month, and the mean attribution is shown as a pie chart in the bottom right panel of Fig. 10. For the Amazon, 38% of the errors are attributed to precipitation owing to the extremely sparse gauges. A very significant part is attributed to both evapotranspiration (25%) and runoff (25%). This is because, on the one hand, the cross-product discrepancy is large for evapotranspiration in the Amazon and, on the other hand, the runoff is a very large term in the Amazon, so its 5% error becomes significant.
Fig. 10. (left) Unconstrained and (right) constrained water budgets over the Amazon. The water budget imbalance is shown in the bottom panels. By construction, the constrained imbalance is zero. Also shown is the attribution of the imbalance to each variable averaged over the time period.
Figures 11 and 12 summarize the final results for the merged and balance-constrained water budget estimates. Figure 11 gives the annual time series of all water cycle components for 1984–2006 over 12 selected basins. Linear trends are calculated from the time series and the slopes are annotated in Fig. 11. The slope is shown only if the trend is statistically significant at the 95% level. The largest trends are positive, for example, the rising fluxes over the Niger and Mekong. A notable negative trend occurs for the runoff in the Murray–Darling basin in Australia. Interestingly, the storage change over Murray–Darling has a net negative accumulation in the final years of the period. This net loss of terrestrial water storage relates to the long drought over the region (Murphy and Timbal 2008) and possibly the use of groundwater for irrigation that would be identified by the GRACE data (Leblanc et al. 2009). Figure 12 gives the seasonal cycles of all water cycle components, which are as expected, and confirms the importance of snow storage and melt over northern basins like the Lena, Mackenzie, and Volga.
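The trend test can be sketched as follows. The text does not state which significance test is used, so this sketch assumes an ordinary least-squares slope with a two-sided Student's t test and no correction for autocorrelation. The hard-coded critical value 2.080 corresponds to 21 degrees of freedom (23 annual values); the function name and the synthetic series are hypothetical.

```python
import math

T_CRIT_95_DF21 = 2.080  # two-sided 95% Student's t critical value, df = 21


def linear_trend(years, values):
    """OLS slope of values against years and the slope's t statistic."""
    n = len(years)
    mx = sum(years) / n
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in years)
    slope = sum((x - mx) * (y - my) for x, y in zip(years, values)) / sxx
    intercept = my - slope * mx
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(years, values))
    std_err = math.sqrt(sse / (n - 2) / sxx)  # standard error of the slope
    return slope, slope / std_err


# Synthetic annual flux (mm/yr): a 2 mm/yr per year trend plus alternating noise.
years = list(range(1984, 2007))
values = [500.0 + 2.0 * i + (5.0 if i % 2 == 0 else -5.0) for i in range(23)]
slope, t_stat = linear_trend(years, values)
significant = abs(t_stat) > T_CRIT_95_DF21
print(f"slope = {slope:.2f} mm/yr per yr, significant: {significant}")
```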
Fig. 11. Annual time series of all water budget components after balance constraining over 12 selected basins. The slope of the linear trend (mm yr⁻²) is annotated on the corresponding time series with the significance level in brackets (annotated only if the significance level is greater than 95%). Trend and significance level are annotated in different fonts for different variables: bold font for p, normal font for e, and italic font for Δs. All time series values are in millimeters per year.
Fig. 12. Seasonal cycles of all water budget components after balance constraining for 12 selected basins.
Figure 13 presents an overview of the final results in terms of a map of the mean annual fluxes and storage releases. The blue and green stacked bars (to scale) represent the long-term mean annual evapotranspiration and runoff (mm month⁻¹). The sum of the two fluxes should be very close to the precipitation because the long-term storage change should equal zero. The cyan bars represent the mean annual maximum storage release (mm month⁻¹). This map shows the relative magnitude of all terrestrial water cycle components and the relative importance of the snowmelt. Unsurprisingly, basins like the Amazon, Congo, and Mekong are among the wettest, and snow plays a large role in the water budgets for Arctic river basins such as the Northern Dvina and Pechora.
Finally, Fig. 14 shows a map of the mean error attribution from the water balance constraining. The evapotranspiration term receives the most error attribution in a significant portion of the 32 basins, especially those in Africa, South Asia, and Australia. Precipitation receives a significant amount of error attribution in South America where the gauges are scarce and the rainfall is heavy. Attributions to the storage term are generally small except for those northern basins where the snow storage is large.
Fig. 14. Mean error attribution among water budget terms.
5. Conclusions
A systematic method is proposed to optimally combine estimates of the terrestrial water cycle from different data sources and to enforce the water balance constraint using data assimilation techniques. The method has been applied to create global long-term records of the terrestrial water budget by merging a number of global datasets including in situ observations, remote sensing retrievals, LSM simulation output, and global reanalyses. The data merging process utilizes existing knowledge of the land surface dynamical system and the characteristics of various estimation methods, such that biases and errors from different data sources can be compensated to the extent possible and the merged estimates carry the best possible confidence. The global scale of the study and the number of different variables and products involved pose a major challenge and, as a result, some of the assumptions made during the bias and error analysis, especially those for evapotranspiration products, are not supported by adequate in situ data (or by subsequent quantitative analysis). However, we believe that the resulting global water budget estimates over 32 major basins for 1984–2006 can be used as a baseline dataset for large-scale diagnostic studies, for example, integrated assessment of basin water resources, trend analysis and attribution, and climate change studies. As 23 years is still relatively short for trend analysis of variables with strong interannual variability, the next step is to improve the quality, coverage, resolution, and potential usage of such a baseline data record by including more sources of long-term data where available.
The temporal aggregation to monthly values limits the applicability of the merged dataset to study issues such as the diurnal cycle and daily variations, but such issues have a small impact when basin-average water budgets or long-term time scale analyses are carried out. There is a potential to extend the merged estimates to finer time and space scales, but a major challenge is to spatially disaggregate river runoff to discrete grids, which is an ongoing activity by the authors.
Acknowledgments
This research is supported by National Aeronautics and Space Administration (NASA) Grant NNX08AN40A (“Developing consistent Earth system data records for the global terrestrial water cycle”).
REFERENCES
Bras, R., and I. Rodríguez-Iturbe, 1976: Evaluation of mean square error involved in approximating the areal average of a rainfall event by a discrete summation. Water Resour. Res., 12, 181–184.
Chen, M., P. Xie, J. E. Janowiak, and P. A. Arkin, 2002: Global land precipitation: A 50-yr monthly analysis based on gauge observations. J. Hydrometeor., 3, 249–266.
Dai, A., I. Y. Fung, and A. D. Del Genio, 1997: Surface observed global land precipitation variations during 1900–88. J. Climate, 10, 2943–2962.
Dai, A., T. Qian, K. E. Trenberth, and J. D. Milliman, 2009: Changes in continental freshwater discharge from 1948 to 2004. J. Climate, 22, 2773–2791.
Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5), 10 143–10 162.
Gao, H., Q. Tang, C. R. Ferguson, E. F. Wood, and D. P. Lettenmaier, 2010: Estimating the water budget of major U.S. river basins via remote sensing. Int. J. Remote Sens., 31, 3955–3978, doi:10.1080/01431161.2010.483488.
Jung, M., M. Reichstein, and A. Bondeau, 2009: Towards global empirical upscaling of FLUXNET eddy covariance observations: Validation of a model tree ensemble approach using a biosphere model. Biogeosci. Discuss., 6, 5271–5304.
Jung, M., and Coauthors, 2010: Recent decline in the global land evapotranspiration trend due to limited moisture supply. Nature, 467, 951–954.
Kalman, R. E., 1960: A new approach to linear filtering and prediction problems. J. Basic Eng., 82D, 35–45.
Leblanc, M. J., P. Tregoning, G. Ramillien, S. O. Tweed, and A. Fakes, 2009: Basin-scale, integrated observations of the early 21st century multiyear drought in southeast Australia. Water Resour. Res., 45, W04408, doi:10.1029/2008WR007333.
Liang, X., D. P. Lettenmaier, E. F. Wood, and S. J. Burges, 1994: A simple hydrologically based model of land surface water and energy fluxes for GCMs. J. Geophys. Res., 99 (D7), 14 415–14 428.
Liang, X., E. F. Wood, and D. P. Lettenmaier, 1996: Surface soil moisture parameterization of the VIC-2L model: Evaluation and modifications. Global Planet. Change, 13, 195–206.
Luo, L., and E. F. Wood, 2008: Use of Bayesian merging techniques in a multimodel seasonal hydrologic ensemble prediction system for the eastern United States. J. Hydrometeor., 9, 866–884.
Luo, L., and Coauthors, 2003: Validation of the North American Land Data Assimilation System (NLDAS) retrospective forcing over the Southern Great Plains. J. Geophys. Res., 108, 8843, doi:10.1029/2002JD003246.
Matsuura, K., and C. J. Willmott, 2010: Terrestrial air temperature and precipitation: 1900–2008 gridded monthly and annual time series. Version 2.01, Center for Climatic Research, University of Delaware. [Available online at http://climate.geog.udel.edu/~climate/html_pages/Global2_Ts_2009/README.global_t_ts_2009.html.]
McCabe, M. F., E. F. Wood, R. Wojcik, M. Pan, J. Sheffield, H. Gao, and H. Su, 2008: Hydrological consistency using multi-sensor remote sensing data for water and energy cycle studies. Remote Sens. Environ., 112, 430–444.
McLaughlin, D. B., 2002: An integrated approach to hydrologic data assimilation: Interpolation, smoothing, and filtering. Adv. Water Resour., 25, 1275–1286.
Milly, P. C. D., and K. A. Dunne, 2002: Macroscale water fluxes 1. Quantifying errors in the estimation of basin mean precipitation. Water Resour. Res., 38, 1205, doi:10.1029/2001WR000759.
Mitchell, T. D., and P. D. Jones, 2005: An improved method of constructing a database of monthly climate observations and associated high-resolution grids. Int. J. Climatol., 25, 693–712, doi:10.1002/joc.1181.
Morel, P., 2001: Why GEWEX? The agenda for a global energy and water cycle program. GEWEX News, Vol. 11, No. 1, International GEWEX Project Office, Silver Spring, MD, 1–11.
Murphy, B., and B. Timbal, 2008: A review of recent climate variability and climate change in southeastern Australia. Int. J. Climatol., 28, 859–879, doi:10.1002/joc.1627.
NASA, 2003: Earth Science Enterprise Strategy. National Aeronautics and Space Administration, Washington, DC, 94 pp. [Available online at http://science.nasa.gov/media/medialibrary/2010/03/31/ESE_Strategy2003.pdf.]
NASA NEWS Science Integration Team, 2007: Predicting energy and water cycle consequences of earth system variability and change. 89 pp. [Available online at http://news.cisc.gmu.edu/doc/NEWS_implementation.pdf.]
Oki, T., T. Nishimura, and P. Dirmeyer, 1999: Assessment of annual runoff from land surface models using Total Runoff Integrating Pathways (TRIP). J. Meteor. Soc. Japan, 77, 235–255.
Pan, M., and E. F. Wood, 2006: Data assimilation for estimating terrestrial water budget using a constrained ensemble Kalman filter. J. Hydrometeor., 7, 534–547.
Pan, M., E. F. Wood, R. Wojcik, and M. McCabe, 2008: Estimation of regional terrestrial water cycle using multi-sensor remote sensing observations and data assimilation. Remote Sens. Environ., 112, 1282–1294.
Pan, M., H. Li, and E. F. Wood, 2010: Assessing the skill of satellite-based precipitation estimates in hydrologic applications. Water Resour. Res., 46, W09535, doi:10.1029/2009WR008290.
Rawlins, M. A., and Coauthors, 2010: Analysis of the Arctic system for freshwater cycle intensification: Observations and expectations. J. Climate, 23, 5715–5737.
Rodríguez-Iturbe, I., and J. M. Mejía, 1974: The design of rainfall networks in time and space. Water Resour. Res., 10, 713–728, doi:10.1029/WR010i004p00713.
Rossow, W. B., and R. A. Schiffer, 1999: Advances in understanding clouds from ISCCP. Bull. Amer. Meteor. Soc., 80, 2261–2287.
Sahoo, A. K., M. Pan, T. J. Troy, R. Vinukollu, J. Sheffield, and E. F. Wood, 2011: Reconciling the global terrestrial water budget using satellite remote sensing. Remote Sens. Environ., 115, 1850–1865, doi:10.1016/j.rse.2011.03.009.
Schneider, U., T. Fuchs, A. Meyer-Christoffer, and B. Rudolf, 2008: Global precipitation analysis products of the GPCC. GPCC Publication, 12 pp. [Available online at ftp://ftp.dwd.de/pub/data/gpcc/PDF/GPCC_intro_products_2008.pdf.]
Sheffield, J., and E. F. Wood, 2007: Characteristics of global and regional drought, 1950–2000: Analysis of soil moisture data from off-line simulation of the terrestrial hydrologic cycle. J. Geophys. Res., 112, D17115, doi:10.1029/2006JD008288.
Sheffield, J., G. Goteti, and E. F. Wood, 2006: Development of a 50-yr high-resolution global dataset of meteorological forcings for land surface modeling. J. Climate, 19, 3088–3111.
Sheffield, J., C. R. Ferguson, T. J. Troy, E. F. Wood, and M. F. McCabe, 2009: Closing the terrestrial water budget from satellite remote sensing. Geophys. Res. Lett., 36, L07403, doi:10.1029/2009GL037338.
Simmons, A. J., S. Uppala, D. Dee, and S. Kobayashi, 2006: ERA-Interim: New ECMWF reanalysis products from 1989 onwards. ECMWF Newsletter, No. 110, ECMWF, Reading, United Kingdom, 25–35. [Available online at http://www.ecmwf.int/publications/newsletters/pdf/110_rev.pdf.]
Simon, D., and T. L. Chia, 2002: Kalman filtering with state equality constraints. IEEE Trans. Aerosp. Electron. Syst., 38, 128–136.
Su, Z., 2002: The Surface Energy Balance System (SEBS) for estimation of turbulent heat flux. Hydrol. Earth Syst. Sci., 6, 85–99.
Swenson, S., and J. Wahr, 2002: Methods for inferring regional surface-mass anomalies from Gravity Recovery and Climate Experiment (GRACE) measurements of time-variable gravity. J. Geophys. Res., 107, 2193, doi:10.1029/2001JB000576.
Tang, Q., H. Gao, P. Yeh, T. Oki, F. Su, and D. P. Lettenmaier, 2010: Dynamics of terrestrial water storage change from satellite and surface observations and modeling. J. Hydrometeor., 11, 156–170.
Tian, Y., and C. D. Peters-Lidard, 2010: A global map of uncertainties in satellite-based precipitation measurements. Geophys. Res. Lett., 37, L24407, doi:10.1029/2010GL046008.
Troy, T. J., J. Sheffield, and E. F. Wood, 2010: Estimation of the terrestrial water budget over northern Eurasia through the use of multiple data sources. J. Climate, 24, 3272–3293.
Vinukollu, R. K., E. F. Wood, C. R. Ferguson, and J. B. Fisher, 2011: Global estimates of evapotranspiration for climate studies using multi-sensor remote sensing data: Evaluation of three process-based approaches. Remote Sens. Environ., 115, 801–823, doi:10.1016/j.rse.2010.11.006.
Yang, D., D. Kane, Z. Zhang, D. Legates, and B. Goodison, 2005: Bias corrections of long-term (1973–2004) daily precipitation data over the northern regions. Geophys. Res. Lett., 32, L19501, doi:10.1029/2005GL024057.