1. Introduction
Terrestrial water storage (TWS) includes all forms of water stored on or below the land surface, such as snow and ice, soil moisture, groundwater, surface water, lakes and reservoirs, canopy interception, and biomass water (Hirschi et al. 2007). TWS plays a key role in the terrestrial and global water cycles, and its components are of great importance to the climate system (Seneviratne et al. 2004; Hirschi et al. 2006a; Hirschi et al. 2007; Syed et al. 2008). Terrestrial water storage change (TWSC), defined as the spatiotemporal evolution of TWS, is an important component of the water balance. Improved quantitative estimates of TWSC are important for managing water resources, agriculture, and ecosystems (Yeh and Famiglietti 2008). TWSC estimates also support our understanding of global climate change, for instance, the impact of TWSC on sea level (Pokhrel et al. 2012) and on Earth rotation parameters such as length of day and polar motion (Chao and O’Connor 1988; Adhikari and Ivins 2016; Liu et al. 2018). Previous studies have shown that TWSC is instrumental in assessing extreme climate events such as floods, droughts, and heatwaves (Andersen et al. 2005; Reager and Famiglietti 2009; Reager et al. 2015; Yi and Wen 2016; Sinha et al. 2017).
However, observations of TWSC are extremely scarce. Before the Gravity Recovery and Climate Experiment (GRACE) satellite mission was launched in 2002, no systematic, spatiotemporally resolved, regional- to continental-scale observations of TWSC were available (Swenson et al. 2006). TWSC estimates from GRACE have been compared with in situ hydrological observations in several areas and show promising performance (Swenson et al. 2006; Strassberg et al. 2007). Because the uncertainty of GRACE-derived TWSC is smaller than that of simulation-based TWSC estimates (Alkama et al. 2010), GRACE data have been widely used at basin, regional, and continental scales (Feng et al. 2012; Li et al. 2016; Wang and Li 2016; Yang et al. 2017).
Given the limited temporal coverage of GRACE data (less than two decades), methods are needed to estimate longer TWSC records. Generally, TWSC can be estimated in two ways. Water balance methods estimate TWSC as the difference between precipitation and the sum of evapotranspiration and runoff (Tang et al. 2010), or they replace the precipitation-minus-evapotranspiration term with the equivalent quantity derived from the atmospheric water balance equation (Seneviratne et al. 2004; Hirschi et al. 2006a; Hirschi et al. 2006b; Mueller et al. 2011; Hirschi and Seneviratne 2017). Non–water balance methods estimate the major components of TWSC directly. For example, Chen and Wilson (2005) used soil moisture and snow data from the National Centers for Environmental Prediction (NCEP) reanalysis, and Hirschi et al. (2006a) used in situ soil moisture and snow cover depth observations to represent TWSC. Seneviratne et al. (2004) used soil moisture, snow cover, and groundwater observations to estimate TWSC. Alkama et al. (2010) represented TWSC as the total change in model-simulated soil moisture, snow, vegetation interception, and stream water content. Reanalysis data are often used to estimate TWSC because of their long records, global coverage, and consistency between hydrological and atmospheric variables (Hirschi et al. 2006a; Hirschi et al. 2006b; Yeh and Famiglietti 2008; Mueller et al. 2011; Hirschi and Seneviratne 2017).
Although numerous TWSC studies have applied water balance or non–water balance methods to reanalysis data, few have systematically compared the two approaches. Previous studies chose one method over another based only on theoretical reasoning rather than a direct comparison (e.g., Chen and Wilson 2005). Therefore, this study analyzes the differences among three commonly used methods for estimating TWSC: the terrestrial water balance method (hereafter PER), the atmospheric and terrestrial water balance method (hereafter AT), and the summation method (hereafter SS). To analyze the impact of the choice of reanalysis, the three methods are applied to three advanced reanalysis datasets: the NCEP–Department of Energy (DOE) Reanalysis II (NCEP R2), the ECMWF interim reanalysis (ERA-Interim, hereinafter ERA-I), and the Japanese 55-Year Reanalysis (JRA-55).
Following many previous studies (Alkama et al. 2010; Zhang et al. 2016; Hirschi and Seneviratne 2017; Zhang et al. 2017), this study uses GRACE-derived TWSC as validation data to evaluate the estimates. First, the similarity between the global TWSC distribution of each estimate and that of GRACE is analyzed. Second, the global spatial means of the estimates at monthly to yearly time scales are compared with GRACE. Third, the consistency of regional TWSC between each estimate and GRACE is evaluated through the global distributions of differences in phase, long-term average, and amplitude. Finally, possible reasons for the differences among the methods are discussed, and guidance is provided for choosing approaches that improve the accuracy of TWSC estimates.
2. Methods and data
a. Data
This study employs three reanalysis datasets, namely the NCEP R2 from NCEP, the ERA-I from the European Centre for Medium-Range Weather Forecasts, and the JRA-55 from the Japan Meteorological Agency (JMA). Precipitation, evapotranspiration, runoff, precipitable water, divergence of moisture flux, volumetric soil water layer, and snow water equivalent from these datasets are used to estimate TWSC. The GRACE Tellus LAND Grids dataset provided by NASA is used for comparison with the estimates. The Köppen–Geiger climate classification dataset is used for assessing the accuracy of the estimates.
All data in this study, except for the climate classification data, were interpolated to the NCEP T62 Gaussian grid (192 × 94, ~222 km × 222 km, the coarsest spatial resolution among all datasets) at a monthly temporal resolution. Because of the time spans of GRACE (2002–16) and JRA-55 (1979–2014), the study period extends from May 2002 to December 2014.
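The interpolation scheme is not specified in this section. A minimal sketch, assuming bilinear interpolation from a regular latitude–longitude source grid, is shown below; the T62 Gaussian latitudes are the arcsines of the 94 Gauss–Legendre nodes. The function names are hypothetical, and when coarsening flux fields an area-conservative remapping is a common alternative.

```python
import numpy as np

def t62_gaussian_grid():
    """Return (lats, lons) of the T62 Gaussian grid: 94 latitudes x 192 longitudes."""
    # Gaussian latitudes are the arcsines of the 94 Gauss-Legendre nodes
    nodes, _ = np.polynomial.legendre.leggauss(94)
    lats = np.degrees(np.arcsin(nodes))[::-1]   # ordered north to south
    lons = np.arange(192) * (360.0 / 192)        # 0 to 358.125 deg, 1.875 deg spacing
    return lats, lons

def regrid_bilinear(field, src_lats, src_lons, dst_lats, dst_lons):
    """Bilinear interpolation of field (nlat, nlon) from a regular lat/lon grid.

    src_lats must be ascending; src_lons ascending within [0, 360).
    """
    # append a wrapped column so destination longitudes near 360 deg are bracketed
    lons_ext = np.append(src_lons, src_lons[0] + 360.0)
    field_ext = np.concatenate([field, field[:, :1]], axis=1)
    # step 1: interpolate every source row along longitude
    rows = np.array([np.interp(dst_lons, lons_ext, row) for row in field_ext])
    # step 2: interpolate every resulting column along latitude
    cols = [np.interp(dst_lats, src_lats, rows[:, j]) for j in range(len(dst_lons))]
    return np.array(cols).T  # shape (len(dst_lats), len(dst_lons))
```

Because the two one-dimensional passes are linear, their combination is exactly bilinear interpolation on a regular grid.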
1) NCEP R2 dataset
NCEP R2 (https://www.esrl.noaa.gov/psd/data/gridded/data.ncep.reanalysis2.html) is an improved version of NCEP R1; it is a state-of-the-art analysis/forecast system that assimilates atmospheric, ocean, and land surface observations (see Table 1). Compared with NCEP R1, NCEP R2 corrects human errors, has higher resolution, and provides more accurate representations of soil wetness, near-surface temperature, the hydrological budget, winter precipitation in polar regions, the hydrological and energy budgets in polar regions, and snow cover (Kanamitsu et al. 2002).
Table 1. NCEP R2 data used in the study.
Because NCEP R2 does not provide air temperature, these data are taken from NCEP R1, and the sublimation data are taken from the NOAA-CIRES Twentieth Century Reanalysis version 2c, available at https://www.esrl.noaa.gov/psd/data/gridded/data.20thC_ReanV2c.html. To calculate the divergence of the atmospheric moisture flux, the wind fields are used at the same levels as the specific humidity.
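The divergence of the vertically integrated moisture flux can be sketched as below. This is a minimal illustration under stated assumptions: trapezoidal integration over the common pressure levels, centered finite differences in spherical coordinates, a regular longitude spacing, and no treatment of surface pressure or levels below ground. The function names and constants are illustrative, not taken from the study.

```python
import numpy as np

G = 9.80665     # gravitational acceleration (m s-2)
A = 6.371e6     # mean Earth radius (m)

def column_integral(x, p):
    """Trapezoidal integral of x (nlev, nlat, nlon) over pressure p (Pa), divided by g."""
    order = np.argsort(p)                   # integrate from low to high pressure
    xs, ps = x[order], p[order]
    dp = np.diff(ps)[:, None, None]
    return np.sum(0.5 * (xs[1:] + xs[:-1]) * dp, axis=0) / G

def moisture_flux_divergence(q, u, v, p, lats, lons):
    """q (kg kg-1), u and v (m s-1) on the same pressure levels p (Pa).

    Returns the divergence of the vertically integrated moisture flux (kg m-2 s-1).
    """
    qu = column_integral(q * u, p)          # zonal flux (kg m-1 s-1)
    qv = column_integral(q * v, p)          # meridional flux (kg m-1 s-1)
    phi, lam = np.radians(lats), np.radians(lons)
    cosphi = np.cos(phi)[:, None]
    dlam = lam[1] - lam[0]
    # periodic centered difference in longitude
    dqu_dlam = (np.roll(qu, -1, axis=1) - np.roll(qu, 1, axis=1)) / (2 * dlam)
    # centered difference in latitude (one-sided at the first and last rows)
    dqv_dphi = np.gradient(qv * cosphi, phi, axis=0)
    return (dqu_dlam + dqv_dphi) / (A * cosphi)
```

Since 1 kg m−2 of water equals 1 mm, multiplying the result by the number of seconds in a month gives mm per month.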
2) ERA-I dataset
The ERA-I dataset is one of the latest global atmospheric reanalyses from ECMWF and was launched in 2011. The primary goal of ERA-I was to address several problems of its predecessor, ERA-40, including the low accuracy of its hydrological cycle and the temporal inconsistency of its reanalyzed fields. By correcting deficiencies in the data assimilation, ERA-I improves the representation of moist physical processes and of the hydrological cycle (Dee et al. 2011).
This study uses seven ERA-I variables (Table 2), downloaded from http://apps.ecmwf.int/datasets/, to calculate TWSC.
Table 2. ERA-I data used in the study.
3) JRA-55 dataset
JRA-55, launched in 2013, is the second Japanese global atmospheric reanalysis project conducted by JMA. It is one of the latest and most advanced reanalysis datasets and features many improvements over its predecessor, the Japanese 25-Year Reanalysis (JRA-25). JRA-55 aims to provide a high-quality, temporally consistent dataset for studying climate change and multidecadal variability (Kobayashi et al. 2015). Compared with JRA-25, JRA-55 shows significant improvements in temporal coverage, in the temperature bias of the lower stratosphere, and in the dry land surface problem over the Amazon basin (Ebita et al. 2011). The data listed in Table 3 were downloaded from https://rda.ucar.edu and are used to estimate TWSC.
Table 3. JRA-55 data used in the study.
4) GRACE Tellus LAND Grids dataset
The GRACE Tellus LAND Grids dataset is updated monthly and is available at https://grace.jpl.nasa.gov/data/get-data/monthly-mass-grids-land/. Its processing includes the removal of atmospheric pressure and mass contributions, as well as postprocessing (Swenson and Wahr 2006), such as destriping and Gaussian filtering, to reduce errors in the GRACE observations. Because this postprocessing also removes considerable signal energy at small spatial scales, the provided land-grid scaling factors were applied to restore the energy removed by sampling and postprocessing (Landerer and Swenson 2012).
This study averages the solutions from three processing centers: the Jet Propulsion Laboratory (JPL), Geoforschungs Zentrum Potsdam (GFZ), and the Center for Space Research (CSR) at the University of Texas at Austin. Each dataset represents the TWS mass deviation relative to the January 2004–December 2009 baseline average. For comparison with the estimates, the May 2002–December 2014 time average of each dataset is also subtracted. Because the anomalies within the baseline period average to zero by construction, the May 2002–December 2014 time average is determined entirely by the anomalies in the other periods (May 2002–December 2003 and January 2010–December 2014). In addition, gaps in the GRACE time series were filled using the gap-filling algorithm presented in section 2b(1)(iv).
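The GRACE preprocessing described above can be sketched as follows. The function name is hypothetical, and applying one scaling grid to the three-center average (rather than to each center separately) is an assumption; because both operations are linear, the order does not change the result.

```python
import numpy as np

def grace_ensemble(jpl, gfz, csr, scale):
    """Average JPL, GFZ, and CSR land-grid anomalies (ntime, nlat, nlon) in mm,
    apply the land-grid scaling factors scale (nlat, nlon), and remove the
    study-period time mean in each grid cell."""
    ens = (np.asarray(jpl) + np.asarray(gfz) + np.asarray(csr)) / 3.0
    ens = ens * scale                        # restore signal damped by filtering
    return ens - np.nanmean(ens, axis=0)     # anomalies relative to the study period
```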
5) Köppen–Geiger climate classification dataset
To relate the accuracy of the TWSC estimates to climate regions, the global Köppen–Geiger climate classification dataset (Kottek et al. 2006) is used in this study. It is based on datasets from the Climatic Research Unit (CRU) of the University of East Anglia and the Global Precipitation Climatology Centre (GPCC) at the German Weather Service. The new digital Köppen–Geiger global climate classification dataset for the period 1951–2000 can be downloaded from http://koeppen-geiger.vu-wien.ac.at/present.htm.
b. Methods
This study uses three methods (PER, AT, and SS) and three reanalysis datasets (NCEP R2, ERA-I, and JRA-55) to calculate TWSC, yielding nine sets of estimates: NCEP/PER, NCEP/AT, NCEP/SS, ERA/PER, ERA/AT, ERA/SS, JRA/PER, JRA/AT, and JRA/SS. In addition, for each method, the average of the three estimates is calculated to show the general performance of the method. To assess the accuracy of these estimates, six spatial and temporal assessment indices are used to compare them with the GRACE-derived TWSC: the similarity coefficient of the spatial distribution, the Nash–Sutcliffe efficiency (NSE), the correlation coefficient, the lag time, the difference in mean value, and the fractional difference in amplitude.
1) TWSC calculation methods
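(i) Terrestrial water balance method (PER). In any given region, the TWSC can be estimated by the terrestrial water balance equation:

$$\mathrm{TWSC} = P - E - R,$$

where P is precipitation, E is evapotranspiration, and R is runoff.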
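A minimal sketch of the PER calculation follows. It assumes that all fluxes have already been converted to mm per month on the common grid; the conversion helper assumes inputs in kg m−2 s−1 (1 kg m−2 of liquid water corresponds to 1 mm of depth). Sign conventions must be checked for each dataset: ECMWF surface fluxes, for example, are conventionally positive downward, so evaporation appears with a negative sign. The function names are hypothetical.

```python
import numpy as np

SECONDS_PER_DAY = 86400.0

def flux_to_mm_per_month(flux_kg_m2_s, days_in_month):
    """Convert a water flux in kg m-2 s-1 to mm per month (1 kg m-2 of water = 1 mm)."""
    return np.asarray(flux_kg_m2_s) * SECONDS_PER_DAY * days_in_month

def per_twsc(p, e, r):
    """Terrestrial water balance: TWSC = P - E - R, all terms in mm per month.

    E is evapotranspiration taken as positive upward (water leaving the land).
    """
    return np.asarray(p) - np.asarray(e) - np.asarray(r)
```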
(ii) Combined atmospheric and terrestrial water balance method (AT).
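The body of this subsection is not included here. Assuming the standard combination, the atmospheric water balance ∂W/∂t = E − P − ∇·Q (W: precipitable water; Q: vertically integrated moisture flux) gives P − E = −∇·Q − ∂W/∂t, and substituting this into the terrestrial balance TWSC = P − E − R yields TWSC = −∇·Q − ∂W/∂t − R. The sketch below assumes all terms in mm per month and approximates ∂W/∂t by a backward monthly difference, which is a hypothetical choice; the function name is also hypothetical.

```python
import numpy as np

def at_twsc(div_q, w, r):
    """Combined atmospheric-terrestrial balance, all terms in mm per month:
        TWSC_t = -div(Q)_t - (W_t - W_{t-1}) - R_t
    div_q, r: monthly series (ntime,); w: precipitable water for each month (ntime,).
    Returns a series of length ntime - 1 (the first month has no dW/dt)."""
    div_q, w, r = (np.asarray(a, dtype=float) for a in (div_q, w, r))
    dw_dt = np.diff(w)                   # backward difference of precipitable water
    return -div_q[1:] - dw_dt - r[1:]
```

Moisture flux convergence (negative divergence) adds water to the land column, which is why ∇·Q enters with a negative sign.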
(iii) The summation method (SS).
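The body of this subsection is not included here. Given the variables listed for this study (volumetric soil water layers and snow water equivalent), a plausible SS form, assumed here, is the month-to-month change in column soil water plus snow water equivalent. Soil water in each layer is converted to mm by multiplying the volumetric content by the layer thickness; the thicknesses below are those of the ERA-I land surface scheme (0–7, 7–28, 28–100, and 100–289 cm), and other reanalyses use different layers. The function names are hypothetical.

```python
import numpy as np

# ERA-Interim soil layer thicknesses (m): 0-7, 7-28, 28-100, 100-289 cm
ERA_LAYER_THICKNESS_M = np.array([0.07, 0.21, 0.72, 1.89])

def soil_water_mm(theta, thickness_m=ERA_LAYER_THICKNESS_M):
    """theta: volumetric soil water (m3 m-3), shape (..., nlayers) -> column water (mm)."""
    return np.sum(np.asarray(theta) * thickness_m, axis=-1) * 1000.0

def ss_twsc(theta, swe_mm, thickness_m=ERA_LAYER_THICKNESS_M):
    """theta: (ntime, nlayers); swe_mm: (ntime,).
    Returns the month-to-month storage change (ntime - 1,) in mm."""
    storage = soil_water_mm(theta, thickness_m) + np.asarray(swe_mm, dtype=float)
    return np.diff(storage)              # backward difference of total storage
```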
(iv) Converting mass anomalies to change rates.
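The body of this subsection is not included here. GRACE grids are storage anomalies, whereas the estimates are monthly changes, so the anomalies must be differenced. The sketch below shows two common finite-difference choices; which one the study uses is not stated in this section, and the function name is hypothetical.

```python
import numpy as np

def anomalies_to_rates(s, scheme="centered"):
    """Convert monthly storage anomalies s (mm) to change rates (mm per month).

    scheme="centered": (s[t+1] - s[t-1]) / 2, undefined (NaN) at both ends
    scheme="backward": s[t] - s[t-1], undefined (NaN) for the first month
    """
    s = np.asarray(s, dtype=float)
    rate = np.full_like(s, np.nan)
    if scheme == "centered":
        rate[1:-1] = (s[2:] - s[:-2]) / 2.0
    elif scheme == "backward":
        rate[1:] = np.diff(s)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return rate
```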
Fig. 1. The monthly global mean of TWSC from GRACE (solid line with circles) and its Lagrange interpolation result (dashed line with crosses) from May 2002 to December 2014.
Citation: Journal of Climate 33, 2; 10.1175/JCLI-D-18-0637.1
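The figure above shows the Lagrange interpolation used to fill missing GRACE months. The number of neighboring months used in the polynomial is not given in this section; the sketch below assumes a cubic through the four valid months nearest to each gap, which is a hypothetical choice, as is the function name.

```python
import numpy as np

def lagrange_fill(t, y, n_points=4):
    """Fill NaN gaps in y(t) with a Lagrange polynomial through the n_points
    valid samples nearest in time to each missing month."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float).copy()
    valid = ~np.isnan(y)
    tv, yv = t[valid], y[valid]          # only original observations are used
    for i in np.flatnonzero(~valid):
        idx = np.argsort(np.abs(tv - t[i]))[:n_points]
        xs, ys = tv[idx], yv[idx]
        value = 0.0
        for j in range(xs.size):
            others = np.delete(xs, j)
            # Lagrange basis polynomial L_j evaluated at the missing time t[i]
            value += ys[j] * np.prod((t[i] - others) / (xs[j] - others))
        y[i] = value
    return y
```

Gaps at the ends of the record would be extrapolated by this polynomial and deserve extra caution.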
2) Assessment indices
(i) Similarity in spatial distribution. The similarity coefficient is used to quantify the level of similarity between the spatial distribution of the estimates and the GRACE data; it is calculated as described by Lin (2006):
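The equation from Lin (2006) is not reproduced in this section, so the sketch below does not implement it. As a clearly labeled stand-in, it computes an area-weighted (cosine of latitude) centered pattern correlation, a common measure of similarity between two maps that also ranges from −1 to 1. The function name and weighting are assumptions.

```python
import numpy as np

def pattern_correlation(a, b, lats):
    """Area-weighted centered pattern correlation between two maps (nlat, nlon).

    Stand-in for the similarity coefficient; NaN cells in either map are skipped.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    w = np.cos(np.radians(lats))[:, None] * np.ones(a.shape[1])
    ok = ~(np.isnan(a) | np.isnan(b))
    w, a, b = w[ok], a[ok], b[ok]
    da = a - np.sum(w * a) / np.sum(w)   # remove area-weighted spatial means
    db = b - np.sum(w * b) / np.sum(w)
    return np.sum(w * da * db) / np.sqrt(np.sum(w * da**2) * np.sum(w * db**2))
```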
(ii) Consistency in time.
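The Nash–Sutcliffe efficiency used throughout the results has the standard form NSE = 1 − Σ(Ŝt − St)² / Σ(St − S̄)², where St is the GRACE value and Ŝt the estimate. NSE = 1 is a perfect match, and NSE ≤ 0 means the estimate is no better than the GRACE mean. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency of sim against obs (1 = perfect match)."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)
```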
However, the NSE of the global mean is an overall measure of accuracy that reflects the combined global differences in fluctuation pattern, phase, mean value, and amplitude between the estimates and the GRACE data. To identify the sources of the differences in accuracy, the differences in fluctuation pattern, phase, amplitude, and mean value are computed in each grid cell.
Second, the mean values of the estimates and the GRACE data are used to show the degree of imbalance between the inputs and outputs of the water budget. Because the TWSC series in each grid cell is a monthly mass change rate, its mean value reflects the regional water balance. Given the relatively high accuracy of GRACE, the imbalance in the GRACE data can be regarded as the real change in the water cycle caused by climate change or anthropogenic influences. The difference between the imbalance of an estimate and that of the GRACE data can therefore be regarded as the bias of the estimate. The similarity between the global distributions of the mean value of each estimate and of GRACE is computed to determine whether such a bias exists.
Third, the fractional difference in amplitude between the estimates and the GRACE data is computed to assess the amplitude bias of the estimates. The amplitude is defined as the difference between the maximum and minimum values of the multiyear mean annual cycle of TWSC. To remove the effect of the large spatial variation in the magnitude of TWSC amplitudes, the fractional difference is calculated as the difference in amplitude between an estimate and the GRACE data divided by the amplitude of the GRACE data.
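The link between the mean TWSC and the net storage change can be made explicit with a short derivation. For illustration only, it assumes that TWSC is formed by backward differences of the monthly storage S_t over N months (the differencing used for the estimates is not stated in this section); the sum then telescopes:

```latex
\overline{\mathrm{TWSC}}
  = \frac{1}{N}\sum_{t=1}^{N}\left(S_t - S_{t-1}\right)
  = \frac{S_N - S_0}{N}
```

The mean therefore vanishes only if storage at the end of the period equals storage at the start; a nonzero mean is the average monthly net gain or loss of water over the period.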
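The amplitude index can be sketched as below. Calendar-month labels are passed in explicitly because the series starts in May 2002, not January; the function names are hypothetical.

```python
import numpy as np

def mean_annual_cycle(x, months):
    """Multiyear mean for each calendar month (1..12) of a monthly series x."""
    x, months = np.asarray(x, dtype=float), np.asarray(months)
    return np.array([np.nanmean(x[months == m]) for m in range(1, 13)])

def fractional_amplitude_difference(est, grace, months):
    """(A_est - A_grace) / A_grace, with A = max - min of the mean annual cycle."""
    cyc_est = mean_annual_cycle(est, months)
    cyc_grace = mean_annual_cycle(grace, months)
    amp_est = cyc_est.max() - cyc_est.min()
    amp_grace = cyc_grace.max() - cyc_grace.min()
    return (amp_est - amp_grace) / amp_grace
```

For example, a series that starts in May uses `months = (np.arange(n) + 4) % 12 + 1`.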
(iii) Uncertainty.
In this study, uncertainties are quantified in two ways. For each estimation method, the overall performance is represented by the performance of the estimate averaged over the three reanalysis datasets (hereafter the three-input average), and the uncertainty is represented by the spread among the estimates from the three reanalysis datasets. For time-varying (space-varying) assessment indices, the overall performance is represented by the time average (space average), and the uncertainty is represented as ±1 standard deviation.
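The two uncertainty summaries can be sketched as below. Both details are assumptions: that the error bars span the minimum-to-maximum range of the three reanalysis-based results, and that the ±1 standard deviation is the population value (ddof = 0). The function names are hypothetical.

```python
import numpy as np

def three_input_summary(ncep, era, jra):
    """Three-input average and spread (min, max) of an index across reanalyses."""
    stack = np.stack([np.asarray(a, dtype=float) for a in (ncep, era, jra)])
    return stack.mean(axis=0), stack.min(axis=0), stack.max(axis=0)

def mean_pm_std(series):
    """Time (or space) average of an index and its +/-1 standard deviation."""
    s = np.asarray(series, dtype=float)
    return np.nanmean(s), np.nanstd(s)
```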
3. Results
a. The differences in the evolution of the global mean on monthly, mean annual cycle, and yearly scales
Figures 2a–c show the global mean TWSC time series averaged over the three reanalysis datasets for the three methods, together with the GRACE data, on monthly, multiyear mean annual cycle, and yearly scales. The PER and AT results show large uncertainties in 2014, which appear to be outliers (Figs. 2a,c). A closer examination of the documentation and data of the reanalysis datasets indicates that this result is related to a change in the JRA-55 data processing starting in 2014: the main JRA-55 production was completed in 2013 and thereafter continued on a near-real-time basis (Kobayashi et al. 2015). We therefore treat the global mean estimates of JRA/PER and JRA/AT in 2014 as outliers. To remove the effect of this inconsistent JRA-55 input, all estimates for 2014 were removed before calculating the indices used in the rest of this study. Note that this temporal inconsistency in the JRA/PER and JRA/AT estimates affects the other indices only slightly compared with its effect on the yearly global mean changes.
Fig. 2. (a) Monthly time series, (b) multiyear mean annual cycle, and (c) yearly changes of global mean TWSC from three different methods and GRACE during the period of 2002–14, and (d) the Nash–Sutcliffe efficiency of the global mean TWSC for the estimates from three different methods and different inputs on different time scales during the period of 2002–13 (the outliers of 2014 are removed). Note that (b) shows the multiyear monthly averages of (a), and (c) is the yearly average of (a). The error bars in (a)–(c) show the uncertainty of each method, i.e., the spread of the results for the three different reanalysis inputs. The solid lines in (a)–(c) are the three-input average. In the x-axis labels of (d), M means monthly changes, A means the annual cycle, and Y means yearly changes.
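The global means in Fig. 2 are spatial averages over land grid cells. The weighting is not stated in this section; the sketch below assumes cos(latitude) area weights on the regular grid and skips NaN (non-land or missing) cells. The function name is hypothetical.

```python
import numpy as np

def global_mean(field, lats):
    """Area-weighted (cos latitude) mean of field (..., nlat, nlon) over valid cells.

    NaN cells (e.g., ocean or masked land) are excluded from both the
    weighted sum and the total weight.
    """
    field = np.asarray(field, dtype=float)
    w = np.cos(np.radians(lats))[:, None] * np.ones(field.shape[-1])
    w = np.broadcast_to(w, field.shape)
    valid = ~np.isnan(field)
    num = np.where(valid, field * w, 0.0).sum(axis=(-2, -1))
    den = np.where(valid, w, 0.0).sum(axis=(-2, -1))
    return num / den
```

Applied to a (ntime, nlat, nlon) array, it returns one global mean per month.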
As shown in Fig. 2a, the monthly global mean TWSC fluctuates strongly and, in general, the estimates are highly consistent with the GRACE data. The NSE values show that all the estimated monthly series match the GRACE data well: 0.77 for AT, 0.81 for SS, and 0.62 for PER. This common fluctuation pattern reflects the multiyear mean annual cycle. As shown in Fig. 2b, the annual cycles of the estimates and the GRACE data are similar, with the lowest values around June and the highest values around January. For the multiyear mean annual cycle, the NSE is 0.83 for AT, 0.89 for SS, and 0.62 for PER. On the yearly time scale, the NSE is 0.41 for AT, 0.81 for SS, and 0.73 for PER. On all three time scales, the three-input average of SS reproduces the GRACE global mean more accurately than those of PER and AT.
However, the accuracy of the global mean TWSC estimates of each method varies with the input dataset (Fig. 2d). For the PER method, JRA/PER yields a global mean TWSC with high accuracy on both the yearly and monthly time scales, whereas NCEP/PER performs poorly on both time scales and ERA/PER performs poorly on the yearly time scale. For the AT method, NCEP/AT and JRA/AT have relatively low accuracy for the yearly global mean change, whereas ERA/AT has high accuracy on both time scales. For the SS method, the estimates are generally highly consistent with the GRACE data, except for ERA/SS on the yearly time scale.
b. The evolution of differences in spatial similarity on monthly, mean annual cycle, and yearly scales
Averaged over the three reanalysis datasets, the monthly similarity coefficient between the global distributions of the estimates and GRACE fluctuates over time (Fig. 3a). The global distributions of TWSC derived from SS and PER are the most similar to the GRACE data, with mean similarity coefficients of 0.47 ± 0.07 and 0.42 ± 0.11, respectively, followed by AT (0.22 ± 0.24). The fluctuation in the monthly series also appears in the multiyear mean monthly values (Fig. 3b), with a cycle of about 5–7 months. The yearly variations are much more stable (Fig. 3c). Overall, the three-input average of SS performs slightly better than that of PER and much better than that of AT.
Fig. 3. (a) Monthly time series, (b) multiyear mean annual cycles, and (c) yearly changes in the similarity coefficients of global distribution between the estimates from the three methods and GRACE during the period of 2002–13, and (d) the comparison of three different methods with different inputs. Note that (b) shows the multiyear monthly averages of (a), and (c) is the yearly average of (a). The error bars in (a)–(c) show the uncertainty of each method, i.e., the spread of the results for the three different reanalysis inputs. The solid lines in (a)–(c) are the result between GRACE and the three-input averaged estimate of each method. The error bars in (d) show the uncertainty of the time-averaged similarity coefficient, represented as ±1 standard deviation. In the x-axis labels of (d), M means monthly changes, A means the annual cycle, and Y means yearly changes.
Figure 3d shows the similarity coefficients for all method/reanalysis combinations during the study period. The accuracy of each method depends on the input dataset. For NCEP R2 and ERA-I, the ranking of the three methods is similar to that of the three-input average, and NCEP/AT and ERA/AT have low accuracy for the global distribution. JRA-55, however, yields relatively high accuracy for all three methods after the 2014 outliers are removed.
c. The differences in the lag correlation between the monthly estimates and GRACE in each grid
The zero-lag correlation between the monthly estimates and the GRACE data is calculated in each grid cell to measure the similarity of the temporal evolution at the regional scale. The correlation for the average over the three reanalysis datasets is illustrated in Figs. 4a–c. About 93% of the area shows a significant zero-lag correlation with GRACE at the 0.05 level for PER and SS, compared with 68% for AT. The globally averaged zero-lag correlation coefficients for AT, PER, and SS are 0.32 ± 0.30, 0.54 ± 0.26, and 0.56 ± 0.27, respectively (all at the 0.05 significance level). AT therefore performs worst in representing the temporal evolution at the regional scale.
Fig. 4. (a)–(l) Global lag time between GRACE and estimates in the regions with insignificant zero-lag correlation between GRACE and estimates at the 0.05 level, (m) the global distribution of the regions with arid climates and polar tundra climate according to the Köppen–Geiger climate classification during the period of 1951–2000, and (n) the comparison of results in (a)–(l). Note that (a)–(c) are the results of the three-input average, and (d)–(l) are results of three different methods with three different reanalysis datasets.
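The section reports significance at the 0.05 level without naming the test. One common choice, assumed here, is Fisher's z transformation with a two-sided normal critical value; a t test on r with n − 2 degrees of freedom gives nearly identical thresholds for monthly series of this length. Serial correlation in monthly TWSC reduces the effective sample size, which this sketch ignores. The function name is hypothetical.

```python
import numpy as np

Z_CRIT_95 = 1.959963984540054  # two-sided 0.05 critical value of the standard normal

def is_significant(r, n):
    """Approximate two-sided test of H0: rho = 0 at the 0.05 level via Fisher's z.

    r may be a scalar or an array of grid-cell correlations; n is the series length.
    """
    z = np.arctanh(np.asarray(r, dtype=float)) * np.sqrt(n - 3)
    return np.abs(z) > Z_CRIT_95
```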
The zero-lag correlation results of the three methods with each reanalysis dataset are shown in Figs. 4d–l. For NCEP R2, the significantly correlated area is 37% for AT and 73% and 74% for PER and SS, respectively. For ERA-I, the significant area is 65% for AT and 85% for both PER and SS. The JRA-55 results are slightly better than those of ERA-I, with 70% for AT, 84% for PER, and 91% for SS. Regardless of the reanalysis dataset, AT performs worst for the regional temporal evolution, whereas PER and SS perform similarly and better.
For the areas where the zero-lag correlation between the estimates and GRACE is insignificant, nonzero lag correlations are further calculated. In these regions, the lag with the highest correlation is defined as the phase difference between the estimate and GRACE (Figs. 4a–l). AT agrees least with GRACE in phase, with the largest lag times (3.38 ± 1.67 months for the three-input average, 3.89 ± 1.59 months for NCEP R2, 3.36 ± 1.64 months for ERA-I, and 3.62 ± 1.52 months for JRA-55), whereas SS performs best, with the smallest lag times (2.78 ± 1.54 months for the three-input average, 2.96 ± 1.71 months for NCEP R2, 2.86 ± 1.69 months for ERA-I, and 2.78 ± 1.58 months for JRA-55).
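The lag search can be sketched as follows. The maximum lag searched is not stated in this section, so max_lag=6 months is a hypothetical default; a positive lag means the estimate trails GRACE. The function name is hypothetical.

```python
import numpy as np

def best_lag(grace, est, max_lag=6):
    """Return (lag, r): the lag in months with the highest Pearson correlation.

    Positive lag k pairs grace[t] with est[t + k], i.e., the estimate trails GRACE.
    """
    grace, est = np.asarray(grace, dtype=float), np.asarray(est, dtype=float)
    best, best_r = 0, -np.inf
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            a, b = grace[:grace.size - k], est[k:]
        else:
            a, b = grace[-k:], est[:est.size + k]
        r = np.corrcoef(a, b)[0, 1]
        if r > best_r:
            best, best_r = k, r
    return best, best_r
```

The lag time reported for a grid cell is then the absolute value of the returned lag.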
d. The differences in global distribution of the mean estimate of TWSC over the study period
Based on the principle of water balance, the average TWSC over a long period such as the study period (2002–13) should be close to zero. The mean values of the estimates and the GRACE data over the study period in each grid cell are shown in Figs. 5a–m; the results clearly reveal water imbalances across the globe, and the global distribution of these imbalances differs noticeably among the three methods. The SS-derived estimates agree best with the GRACE data, with similarity coefficients of the global distribution of 0.37 for the three-input average, 0.28 for NCEP R2, 0.21 for ERA-I, and 0.29 for JRA-55. The AT-derived estimates agree least with GRACE, with similarity coefficients of 0.02 for the three-input average, −0.01 for NCEP R2, −0.02 for ERA-I, and 0.09 for JRA-55. For the PER method, the three-input average yields 0.12 and NCEP R2 only 0.03, whereas ERA-I and JRA-55 yield 0.19 and 0.17, respectively.
Fig. 5. Globally distributed mean value (mm) of TWSC from (a)–(l) estimates and (m) GRACE, and (n) the boxplots of results in (a)–(m). Note that (a)–(c) are the results of the three-input average, and (d)–(l) are the results of three different methods and three different reanalysis datasets. The central red line in the boxes indicates the median, and the blue bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The outliers, which fall below the 25th percentile minus 1.5 times the interquartile range (IQR, being equal to the difference between 75th and 25th percentiles) or above the 75th percentile plus 1.5 times the IQR, are plotted individually using red plus signs (+).
Over the period 2002–13, the SS results, like the GRACE data, show a slightly negative mean TWSC (decreasing storage, i.e., a water deficit) in most areas of the globe and a slightly positive mean (increasing storage, i.e., a water surplus) in tropical regions of the Northern Hemisphere and in some subtropical regions of the Southern Hemisphere, such as southern Africa. In contrast, PER and AT show global patterns of decreasing and increasing storage that differ from that of GRACE, and AT even shows a largely opposite pattern. As shown in Fig. 5n, the large number of outliers in the AT and PER results may partly explain why these methods do not reproduce the GRACE pattern.
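The outliers in Fig. 5n follow the 1.5 × IQR rule stated in the figure caption. A minimal sketch of that rule, assuming NumPy's default linear percentile interpolation (the function name is hypothetical):

```python
import numpy as np

def iqr_outliers(values):
    """Return the values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR], ignoring NaNs."""
    v = np.asarray(values, dtype=float)
    v = v[~np.isnan(v)]
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    return v[(v < q1 - 1.5 * iqr) | (v > q3 + 1.5 * iqr)]
```

Counting `iqr_outliers(mean_map).size` per estimate gives a simple measure of how many grid cells fall far from the bulk of the distribution.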
e. The differences in global distribution of amplitude differences
As shown in Fig. 6, the global distribution of the fractional amplitude difference between the estimates and the GRACE data varies among methods and reanalysis datasets. Almost every estimate differs considerably from GRACE in regional TWSC amplitude.
Fig. 6. (a)–(l) Global fractional difference between the amplitude of estimates and GRACE, (m) the global amplitude (mm) of GRACE-derived TWSC, and (n) the boxplots of the results in (a)–(l). Note that (a)–(c) are the results of the three-input average, and (d)–(l) are the results of three different methods and three different reanalysis datasets. The central line in the boxes indicates the median and the bottom and top edges of the box indicate the 25th and 75th percentiles, respectively. The outliers, which fall below the 25th percentile minus 1.5 times the IQR or above the 75th percentile plus 1.5 times the IQR, are plotted individually using plus signs (+).
Regions with large amplitudes of around 200 mm (the black areas in Fig. 6m), that is, central Africa, northern Asia, and the Amazon basin, are more likely to be underestimated (the red areas in Figs. 6a–l). In contrast, regions with small amplitudes near 0 mm (the white areas in Fig. 6m), that is, eastern Australia, the Sahara Desert, southern Africa, the Great Plains, the central Andes Mountains, the Paraná basin, and most of eastern Asia, are more likely to be overestimated (the blue areas in Figs. 6a–l), especially by the AT estimates (the dark blue areas in Figs. 6b, 6e, 6h, and 6k).
According to the boxplots, the AT estimates perform worst, with median fractional differences of 0.29 for the three-input average, 0.28 for NCEP R2, 0.81 for ERA-I, and 0.87 for JRA-55. The three-input averages of the PER and SS methods, NCEP/SS, and ERA/SS perform better than the other estimates, with mean fractional differences of 0.21, 0.16, 0.03, and −0.16, respectively.
4. Discussion
a. Reasons for the unstable performance of the water balance methods
In this study, the time-averaged regional TWSC estimates based on PER and AT differed substantially from those of GRACE (Fig. 5). Eicker et al. (2016) examined the long-term consistency between GRACE-derived TWSC and TWSC estimated with the PER method from different reanalysis datasets and also found a lack of agreement, which they attributed to time-varying biases in the reanalysis data. Because TWSC in this study is a mass change rate, the time-averaged TWSC in each grid cell also reflects the regional water balance conditions.
Numerous previous studies have pointed out water budget imbalances in reanalyses (Seneviratne et al. 2004; Hirschi et al. 2006a; Yeh and Famiglietti 2008; Zeng et al. 2008; Tang et al. 2010; Mueller et al. 2011; Hirschi and Seneviratne 2017). Artificial water sources and sinks introduced by data assimilation are the main cause of these imbalances. Lorenz and Kunstmann (2012) also demonstrated the limited ability of reanalyses to close the combined terrestrial and atmospheric water budget because of the forcing by atmospheric observations. In general, water cycle variables are constrained indirectly by many observations and are thus affected by any relevant data assimilation process (Dee et al. 2011). Therefore, estimates based on water balance principles and reanalysis datasets are likely to misrepresent the actual water balance conditions at the regional scale.
Although data assimilation introduces water budget imbalances, its objective is to improve the accuracy of certain variables, and its effects differ with the methods and goals of each reanalysis. For example, NCEP R2 corrects the infiltration process by subtracting (adding) the difference between observed and modeled precipitation from (to) soil moisture (Kanamitsu et al. 2002). This correction improves the soil moisture data but adds or removes water that is not accounted for by precipitation, evapotranspiration, and runoff; thus, among the NCEP-based estimates, only NCEP/SS produces an accurate yearly global mean TWSC, whereas NCEP/PER is less accurate (Fig. 2d). Similarly, ERA-I shows improved closure of the global terrestrial and atmospheric water balance (Lorenz and Kunstmann 2012), which yields high accuracy in the yearly global mean TWSC for ERA/AT (Fig. 2d), but the regional water balance of ERA/AT has low accuracy (Fig. 5h).
In short, the water balance methods are affected by the different data assimilation approaches and are very sensitive to the water balance conditions of the datasets. Therefore, caution is needed when water balance methods are used to estimate TWSC.
b. Reason for low accuracy of the AT method for the global distribution
As shown in Fig. 4, the AT estimates have a considerable bias in some regions, which reduces the spatial accuracy of AT. Mueller et al. (2011) also found a lack of agreement between AT-derived and GRACE-derived TWSC in some basins, and their results for most regions are consistent with those of this study. Mueller et al. (2011) attributed the disagreement between AT and GRACE in some regions to the relatively high uncertainty in moisture divergence. Draper and Mills (2008) investigated the accuracy of the reanalysis moisture divergence over the Murray–Darling basin and found that the moisture flux of this basin was too small to be simulated reliably and was easily overshadowed by simulation errors. In this study, the AT-derived estimate over the Murray–Darling basin did not agree with the GRACE data, whereas the PER result agreed well (Fig. 4). As shown in Fig. 4, the regions with a large phase bias in the AT-derived estimates are mostly arid and polar tundra climate regions. A plausible explanation is that moisture flux biases affect AT directly, and the weak moisture fluxes in these regions make the relevant atmospheric quantities more difficult to observe and estimate. PER and SS, which are influenced only indirectly, are less sensitive to moisture flux biases than AT (Fig. 4).
c. Methods to improve the SS method
Although SS showed the best overall agreement with GRACE, all three methods, including SS, have some disadvantages when reanalysis data are used to estimate TWSC. For example, the amplitude of the SS-based TWSC differed substantially from that of GRACE (Fig. 6), and the similarity coefficient of the global distribution between the SS-derived estimate and GRACE was only 0.47.
One reason for the difference between SS and GRACE is that groundwater is not taken into account in the reanalysis datasets we used. Previous studies have pointed out the importance of groundwater in TWS (Alkama et al. 2010; Frappart et al. 2013; Sun et al. 2015). Zhang et al. (2017) investigated the consistency between GRACE TWSC and TWSC estimated by models with and without groundwater, and found better agreement with GRACE when groundwater was included. The lack of groundwater information could therefore have contributed to the disagreement between the estimates and GRACE. Because soil depth and terrain are unevenly distributed globally, the proportion of groundwater in TWSC varies spatially. This study showed underestimated amplitudes in regions that are rich in groundwater according to Gleeson et al. (2016), such as the Amazon basin and northern Eurasia (Fig. 6). The uneven distribution of groundwater therefore partly explains the spatially uneven differences in amplitude and global distribution between the estimates and GRACE.
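Amplitude comparisons such as the one in Fig. 6 depend on how the amplitude is defined. One common choice, assumed here because the definition is not restated in this section, is the amplitude of the annual harmonic fitted by least squares to a monthly series. The function name `annual_amplitude` and the synthetic series below are hypothetical.

```python
import numpy as np


def annual_amplitude(series):
    """Amplitude of the annual harmonic fitted to a monthly series.

    Fits x(t) = a0 + a*cos(2*pi*t/12) + b*sin(2*pi*t/12) by least squares
    and returns sqrt(a**2 + b**2).
    """
    x = np.asarray(series, dtype=float)
    t = np.arange(x.size)
    omega = 2.0 * np.pi / 12.0  # one cycle per 12 months
    design = np.column_stack([np.ones_like(t, dtype=float),
                              np.cos(omega * t), np.sin(omega * t)])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    return float(np.hypot(coef[1], coef[2]))


# Synthetic 10-year monthly series: a GRACE-like signal with amplitude 5
# and an SS-like signal with amplitude 3 and a small phase shift.
months = np.arange(120)
grace_like = 5.0 * np.sin(2 * np.pi * months / 12 + 0.3) + 2.0
ss_like = 3.0 * np.sin(2 * np.pi * months / 12 + 0.5)
print(round(annual_amplitude(ss_like) / annual_amplitude(grace_like), 3))  # 0.6
```

A ratio below 1 in a grid cell indicates that the estimate underestimates the annual amplitude there, which is the kind of pattern discussed for groundwater-rich regions.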
Another reason is the human influence on the hydrological cycle through irrigation, reservoir impoundment, and other activities, which is not fully considered in the reanalysis datasets. Such influence alters the natural hydrological variations and causes discrepancies between the estimates and GRACE. According to Tang et al. (2010), although assimilating measured data in a reanalysis implicitly accounts for human influence, the land surface schemes are not updated at the same time to reflect the changes caused by human activities. This inconsistency between the assimilated data and the land surface schemes may also have affected the TWSC estimates.
Although all methods used in this study have some disadvantages when reanalysis datasets are used to estimate TWSC, the SS-derived estimates agree well with the GRACE data. Provided that the bias in amplitude and global distribution is corrected, SS is expected to yield a highly accurate long-term TWSC series.
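This section does not specify a bias correction method, so the sketch below uses one simple option as an assumption: variance scaling against GRACE over the overlapping period, applied separately to each grid cell or region. The corrected series keeps the timing of the SS estimate but takes its mean and standard deviation from GRACE over the overlap, which addresses amplitude bias; applying it cell by cell also adjusts the spatial pattern of amplitudes. The function name `variance_scaling` and the inputs are hypothetical.

```python
import numpy as np


def variance_scaling(estimate, reference, overlap):
    """Rescale a long estimate to match a reference over an overlap period.

    estimate  : full-length series (e.g., SS-derived TWSC for one grid cell)
    reference : series of the same length; only entries where overlap is True are used
    overlap   : boolean mask marking time steps covered by the reference (e.g., GRACE)
    """
    est = np.asarray(estimate, dtype=float)
    ref = np.asarray(reference, dtype=float)
    mask = np.asarray(overlap, dtype=bool)
    e, r = est[mask], ref[mask]
    scale = r.std() / e.std()  # amplitude correction factor
    # Shift and stretch the whole record, including years without the reference.
    return r.mean() + scale * (est - e.mean())


months = np.arange(240)                             # 20 years of monthly data
ss = np.sin(2 * np.pi * months / 12)                # hypothetical SS estimate
overlap = months >= 120                             # last 10 years overlap GRACE
grace = np.where(overlap, 3.0 * ss + 1.0, np.nan)   # hypothetical reference
corrected = variance_scaling(ss, grace, overlap)
```

Here the reference is an exact linear transform of the estimate, so the corrected record equals that transform over all 20 years; with real data the correction only matches the first two moments over the overlap and cannot fix phase errors.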
5. Conclusions
To find a better approach for estimating long-term, globally distributed TWSC from reanalysis datasets, we used three methods (PER, AT, and SS) and three reanalysis datasets (NCEP R2, ERA-I, and JRA-55). For the global mean, the SS estimates averaged over the three reanalysis datasets are more consistent with the GRACE data, with a Nash–Sutcliffe efficiency coefficient (NSE) of 0.89 for the multiyear mean annual cycle and 0.81 on the yearly time scale, than those of PER (0.62 and 0.73) and AT (0.83 and 0.41). However, based on a single dataset, PER and AT obtain the best performance with JRA-55 and ERA-I, respectively. For the average over the three reanalysis datasets, the temporal evolution of the global spatial distribution of the SS estimates, with a similarity coefficient of 0.47 ± 0.07, is better than that of PER (0.42 ± 0.11, 0.54 ± 0.26) and much better than that of AT (0.22 ± 0.24, 0.32 ± 0.30). The same holds for the estimates based on NCEP R2 and ERA-I individually, whereas JRA-55 provides relatively high spatial accuracy for all three methods. The considerable overestimation of the regional long-term mean by AT and PER may be one reason why these two methods fail to reproduce a pattern similar to that of the GRACE data. The different performance of the reanalysis datasets can be explained by the different variables that each reanalysis focuses on when updating through data assimilation. In addition, the difficulty of atmospheric observation and simulation in arid and polar tundra regions is a likely reason why the AT method failed to represent the TWSC phase in over 30% of the study region.
We found that the SS-based estimate was, in general, more consistent with the GRACE data than the water balance methods (PER and AT), with relatively high accuracy in both space and time, especially for the SS estimate averaged over the three reanalysis datasets.
Therefore, among the three methods, SS has the greatest potential for estimating long-term TWSC from reanalysis data. Although SS provided the best agreement with GRACE, its estimates showed some bias in amplitude and global distribution, which was partly attributed to the missing groundwater information and the incomplete representation of human influence in the reanalysis datasets. We suggest that a bias correction be applied before the SS estimates are analyzed.
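For reference, the Nash–Sutcliffe efficiency (Nash and Sutcliffe 1970) compares the squared errors of an estimate with the variance of the observations: NSE = 1 − Σ(O_t − S_t)² / Σ(O_t − Ō)², where O is the reference series (here GRACE) and S is the estimate. NSE = 1 indicates a perfect match, NSE = 0 means the estimate is no better than the observed mean, and negative values are worse. A minimal sketch, with a hypothetical function name `nse`:

```python
import numpy as np


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency of `simulated` against `observed`."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    residual_ss = np.sum((obs - sim) ** 2)        # squared estimation errors
    variance_ss = np.sum((obs - obs.mean()) ** 2)  # spread of the observations
    return float(1.0 - residual_ss / variance_ss)


obs = np.array([1.0, 3.0, 2.0, 4.0])
print(nse(obs, obs))                            # 1.0
print(nse(obs, np.full_like(obs, obs.mean())))  # 0.0
```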
Acknowledgments
This work was financially supported by the National Key Research and Development Program of China projects (Grants 2017YFA0603702 and 2018YFE0106500), the National Program on Key Basic Research Project of China (Grant 2012CB957802), and the Danida Fellowship Centre EOForChina project (12-31359-K). Many thanks to Prof. Dr. Peter Bauer-Gottwein, Department of Environmental Engineering, Technical University of Denmark, for English editing of the manuscript. The first author also thanks the China Scholarship Council for its support.
REFERENCES
Adhikari, S., and E. R. Ivins, 2016: Climate-driven polar motion: 2003–2015. Sci. Adv., 2, e1501693, https://doi.org/10.1126/sciadv.1501693.
Alkama, R., and Coauthors, 2010: Global evaluation of the ISBA-TRIP continental hydrological system. Part I: Comparison to GRACE terrestrial water storage estimates and in situ river discharges. J. Hydrometeor., 11, 583–600, https://doi.org/10.1175/2010JHM1211.1.
Andersen, O. B., S. I. Seneviratne, J. Hinderer, and P. Viterbo, 2005: GRACE-derived terrestrial water storage depletion associated with the 2003 European heat wave. Geophys. Res. Lett., 32, L18405, https://doi.org/10.1029/2005GL023574.
Bluestein, H. B., 1992: Synoptic-Dynamic Meteorology in Midlatitudes, Vol. I, Oxford University Press, 594 pp.
Chao, B. F., and W. P. O’Connor, 1988: Global surface-water-induced seasonal variations in the Earth’s rotation and gravitational field. Geophys. J. Int., 94, 263–270, https://doi.org/10.1111/j.1365-246X.1988.tb05900.x.
Chen, J. L., and C. R. Wilson, 2005: Hydrological excitations of polar motion, 1993–2002. Geophys. J. Int., 160, 833–839, https://doi.org/10.1111/j.1365-246X.2005.02522.x.
Dee, D. P., and Coauthors, 2011: The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Quart. J. Roy. Meteor. Soc., 137, 553–597, https://doi.org/10.1002/qj.828.
Draper, C., and G. Mills, 2008: The atmospheric water balance over the semiarid Murray-Darling River basin. J. Hydrometeor., 9, 521–534, https://doi.org/10.1175/2007JHM889.1.
Ebita, A., and Coauthors, 2011: The Japanese 55-Year Reanalysis “JRA-55”: An interim report. SOLA, 7, 149–152, https://doi.org/10.2151/SOLA.2011-038.
Eicker, A., E. Forootan, A. Springer, L. Longuevergne, and J. Kusche, 2016: Does GRACE see the terrestrial water cycle “intensifying”? J. Geophys. Res., 121, 733–745, https://doi.org/10.1002/2015JD023808.
Feng, W., J. M. Lemoine, M. Zhong, and T. T. Hsu, 2012: Terrestrial water storage changes in the Amazon basin measured by GRACE during 2002–2010. Chin. J. Geophys., 55, 814–821.
Forootan, E., A. Safari, A. Mostafaie, M. Schumacher, M. Delavar, and J. L. Awange, 2017: Large-scale total water storage and water flux changes over the arid and semiarid parts of the Middle East from GRACE and reanalysis products. Surv. Geophys., 38, 591–615, https://doi.org/10.1007/s10712-016-9403-1.
Frappart, F., G. Ramillien, and J. Ronchail, 2013: Changes in terrestrial water storage versus rainfall and discharges in the Amazon basin. Int. J. Climatol., 33, 3029–3046, https://doi.org/10.1002/joc.3647.
Gleeson, T., K. M. Befus, S. Jasechko, E. Luijendijk, and M. B. Cardenas, 2016: The global volume and distribution of modern groundwater. Nat. Geosci., 9, 161–167, https://doi.org/10.1038/ngeo2590.
Hirschi, M., and S. I. Seneviratne, 2017: Basin-scale water-balance dataset (BSWB): An update. Earth Syst. Sci. Data, 9, 251–258, https://doi.org/10.5194/essd-9-251-2017.
Hirschi, M., S. I. Seneviratne, and C. Schär, 2006a: Seasonal variations in terrestrial water storage for major midlatitude river basins. J. Hydrometeor., 7, 39–60, https://doi.org/10.1175/JHM480.1.
Hirschi, M., P. Viterbo, and S. I. Seneviratne, 2006b: Basin-scale water-balance estimates of terrestrial water storage variations from ECMWF operational forecast analysis. Geophys. Res. Lett., 33, L21401, https://doi.org/10.1029/2006GL027659.
Hirschi, M., S. I. Seneviratne, S. Hagemann, and C. Schär, 2007: Analysis of seasonal terrestrial water storage variations in regional climate simulations over Europe. J. Geophys. Res., 112, D22109, https://doi.org/10.1029/2006JD008338.
Jacobson, M. Z., 2005: Fundamentals of Atmospheric Modeling. 2nd ed. Cambridge University Press, 813 pp.
Kanamitsu, M., W. Ebisuzaki, J. Woollen, S. K. Yang, J. J. Hnilo, M. Fiorino, and G. L. Potter, 2002: NCEP–DOE AMIP-II reanalysis (R-2). Bull. Amer. Meteor. Soc., 83, 1631–1643, https://doi.org/10.1175/BAMS-83-11-1631.
Kobayashi, S., and Coauthors, 2015: The JRA-55 Reanalysis: General specifications and basic characteristics. J. Meteor. Soc. Japan, 93, 5–48, https://doi.org/10.2151/jmsj.2015-001.
Kottek, M., J. Grieser, C. Beck, B. Rudolf, and F. Rubel, 2006: World map of the Köppen-Geiger climate classification updated. Meteor. Z., 15, 259–263, https://doi.org/10.1127/0941-2948/2006/0130.
Landerer, F. W., and S. C. Swenson, 2012: Accuracy of scaled GRACE terrestrial water storage estimates. Water Resour. Res., 48, W04531, https://doi.org/10.1029/2011WR011453.
Li, J. H., S. S. Wang, and F. Q. Zhou, 2016: Time series analysis of long-term terrestrial water storage over Canada from GRACE satellites using principal component analysis. Can. J. Remote Sens., 42, 161–170, https://doi.org/10.1080/07038992.2016.1166042.
Lin, S., 2006: Method of forecasting regional sandstorm process in spring on seasonal scale (in Chinese). J. Desert Res., 26, 478–483.
Liu, S., X. Mo, W. Zhao, V. Naeimi, D. Dai, C. Shu, and L. Mao, 2009: Temporal variation of soil moisture over the Wuding River basin assessed with an eco-hydrological model, in-situ observations and remote sensing. Hydrol. Earth Syst. Sci., 13, 1375–1398, https://doi.org/10.5194/hess-13-1375-2009.
Liu, S., S. Deng, X. Mo, and H. Yan, 2018: Indexing the relationship between polar motion and water mass change in a giant river basin. Sci. China Earth Sci., 61, 1065–1077, https://doi.org/10.1007/s11430-016-9211-2.
Lorenz, C., and H. Kunstmann, 2012: The hydrological cycle in three state-of-the-art reanalyses: Intercomparison and performance analysis. J. Hydrometeor., 13, 1397–1420, https://doi.org/10.1175/JHM-D-11-088.1.
Mueller, B., M. Hirschi, and S. I. Seneviratne, 2011: New diagnostic estimates of variations in terrestrial water storage based on ERA-Interim data. Hydrol. Processes, 25, 996–1008, https://doi.org/10.1002/hyp.7652.
Nash, J. E., and J. V. Sutcliffe, 1970: River flow forecasting through conceptual models. Part 1: A discussion of principles. J. Hydrol., 10, 282–290, https://doi.org/10.1016/0022-1694(70)90255-6.
Pokhrel, Y. N., N. Hanasaki, P. J. F. Yeh, T. J. Yamada, S. Kanae, and T. Oki, 2012: Model estimates of sea-level change due to anthropogenic impacts on terrestrial water storage. Nat. Geosci., 5, 389–392, https://doi.org/10.1038/ngeo1476.
Reager, J. T., and J. S. Famiglietti, 2009: Global terrestrial water storage capacity and flood potential using GRACE. Geophys. Res. Lett., 36, L23402, https://doi.org/10.1029/2009GL040826.
Reager, J. T., A. C. Thomas, E. A. Sproles, M. Rodell, H. K. Beaudoing, B. Li, and J. S. Famiglietti, 2015: Assimilation of GRACE terrestrial water storage observations into a land surface model for the assessment of regional flood potential. Remote Sens., 7, 14 663–14 679, https://doi.org/10.3390/rs71114663.
Seneviratne, S. I., P. Viterbo, D. Lüthi, and C. Schär, 2004: Inferring changes in terrestrial water storage using ERA-40 reanalysis data: The Mississippi River basin. J. Climate, 17, 2039–2057, https://doi.org/10.1175/1520-0442(2004)017<2039:ICITWS>2.0.CO;2.
Sinha, D., T. H. Syed, J. S. Famiglietti, J. T. Reager, and R. C. Thomas, 2017: Characterizing drought in India using GRACE observations of terrestrial water storage deficit. J. Hydrometeor., 18, 381–396, https://doi.org/10.1175/JHM-D-16-0047.1.
Strassberg, G., B. R. Scanlon, and M. Rodell, 2007: Comparison of seasonal terrestrial water storage variations from GRACE with groundwater-level measurements from the High Plains Aquifer (USA). Geophys. Res. Lett., 34, L14402, https://doi.org/10.1029/2007GL030139.
Su, T., and G. L. Feng, 2015: Spatial-temporal variation characteristics of global evaporation revealed by eight reanalyses. Sci. China Earth Sci., 58, 255–269, https://doi.org/10.1007/s11430-014-4947-8.
Sun, A. Y., J. Chen, and J. Donges, 2015: Global terrestrial water storage connectivity revealed using complex climate network analyses. Nonlinear Processes Geophys., 22, 433–446, https://doi.org/10.5194/npg-22-433-2015.
Swenson, S., and J. Wahr, 2006: Post-processing removal of correlated errors in GRACE data. Geophys. Res. Lett., 33, L08402, https://doi.org/10.1029/2005GL025285.
Swenson, S., P. J. F. Yeh, J. Wahr, and J. Famiglietti, 2006: A comparison of terrestrial water storage variations from GRACE with in situ measurements from Illinois. Geophys. Res. Lett., 33, L16401, https://doi.org/10.1029/2006GL026962.
Syed, T. H., J. S. Famiglietti, M. Rodell, J. Chen, and C. R. Wilson, 2008: Analysis of terrestrial water storage changes from GRACE and GLDAS. Water Resour. Res., 44, W02433, https://doi.org/10.1029/2006WR005779.
Tang, Q. H., H. L. Gao, P. Yeh, T. Oki, F. G. Su, and D. P. Lettenmaier, 2010: Dynamics of terrestrial water storage change from satellite and surface observations and modeling. J. Hydrometeor., 11, 156–170, https://doi.org/10.1175/2009JHM1152.1.
Wang, S. S., and J. H. Li, 2016: Terrestrial water storage climatology for Canada from GRACE satellite observations in 2002–2014. Can. J. Remote Sens., 42, 190–202, https://doi.org/10.1080/07038992.2016.1171132.
Yang, P., J. Xia, C. S. Zhan, Y. F. Qiao, and Y. L. Wang, 2017: Monitoring the spatio-temporal changes of terrestrial water storage using GRACE data in the Tarim River basin between 2002 and 2015. Sci. Total Environ., 595, 218–228, https://doi.org/10.1016/j.scitotenv.2017.03.268.
Yeh, P. J. F., and J. S. Famiglietti, 2008: Regional terrestrial water storage change and evapotranspiration from terrestrial and atmospheric water balance computations. J. Geophys. Res., 113, D09108, https://doi.org/10.1029/2007JD009045.
Yi, H., and L. X. Wen, 2016: Satellite gravity measurement monitoring terrestrial water storage change and drought in the continental United States. Sci. Rep., 6, 19909, https://doi.org/10.1038/srep19909.
Zeng, N., J. H. Yoon, A. Mariotti, and S. Swenson, 2008: Variability of basin-scale terrestrial water storage from a PER water budget method: The Amazon and the Mississippi. J. Climate, 21, 248–265, https://doi.org/10.1175/2007JCLI1639.1.
Zhang, L. J., H. Dobslaw, C. Dahle, I. Sasgen, and M. Thomas, 2016: Validation of MPI-ESM decadal hindcast experiments with terrestrial water storage variations as observed by the GRACE satellite mission. Meteor. Z., 25, 685–694, https://doi.org/10.1127/metz/2015/0596.
Zhang, L. J., H. Dobslaw, T. Stacke, A. Güntner, R. Dill, and M. Thomas, 2017: Validation of terrestrial water storage variations as simulated by different global numerical models with GRACE satellite observations. Hydrol. Earth Syst. Sci., 21, 821–837, https://doi.org/10.5194/hess-21-821-2017.