Accurately measuring interannual variability in terrestrial evapotranspiration (ET) is a major challenge for efforts to detect trends in the terrestrial hydrologic cycle. Based on comparisons with annual ET values derived from a terrestrial water balance analysis, past research has cast doubt on the ability of existing ET products to accurately capture interannual ET variability. Using a variety of ET estimates, this analysis reexamines this conclusion and finds that estimates of interannual ET variations obtained from a land surface model are more strongly correlated with ET estimates independently acquired from thermal infrared remote sensing than with those derived from water balance considerations. This tendency is attributed to significant interannual variations in terrestrial water storage that are neglected by the water balance approach. Overall, results demonstrate the need to reassess perceptions concerning the skill of ET estimates derived from land surface models and show the value of accurate remotely sensed ET products for the validation of interannual ET variability.
There has been a great deal of recent interest in the development of large-scale terrestrial evapotranspiration (ET) datasets for climate applications. These products can be derived via a range of remote sensing, modeling, and data assimilation approaches (Mueller et al. 2011). However, evaluating large-scale ET products at interannual time scales remains a major challenge (Zhang et al. 2012). The classical approach for verifying such products is comparison against a terrestrial water balance calculation. The instantaneous terrestrial water balance is typically based on equating changes in terrestrial water storage ΔTWS (mm) with the net sum of precipitation accumulation P, horizontal runoff flow Q, and evapotranspiration losses ET:

$$\Delta\mathrm{TWS} = P - Q - \mathrm{ET}. \tag{1}$$
This balance holds within any spatial control volume; however, it is commonly applied to discrete hydrologic units so that Q can be equated with observed streamflow at a basin outlet. Summing (1) over annual time periods (indicated using the overbar notation) leads to

$$\overline{\Delta\mathrm{TWS}} = \overline{P} - \overline{Q} - \overline{\mathrm{ET}}. \tag{2}$$
Therefore, based on (2), annual evapotranspiration can be estimated as

$$\overline{\mathrm{ET}} = \overline{P} - \overline{Q} - \overline{\Delta\mathrm{TWS}}. \tag{3}$$
At annual time scales and above, annual ΔTWS is commonly assumed to be zero. Therefore, the classical water balance (WB) approach for estimating annual ET is based on measuring annual P and annual Q and applying (3) under the assumption that annual ΔTWS = 0.
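As an illustration only, the annual water balance in (3) can be sketched in a few lines of Python. The function name `annual_et_wb` and its argument names are hypothetical and do not come from the original analysis; all inputs are assumed to be basin-average depths in mm per year.

```python
def annual_et_wb(p_annual, q_annual, dtws_annual=None):
    """Annual ET (mm) from equation (3): ET = P - Q - dTWS.

    Hypothetical helper for illustration. If dtws_annual is None, the
    classical WB assumption (annual dTWS = 0) is applied.
    """
    if dtws_annual is None:
        # Classical assumption: no net annual change in terrestrial water storage.
        dtws_annual = [0.0] * len(p_annual)
    if not (len(p_annual) == len(q_annual) == len(dtws_annual)):
        raise ValueError("all annual series must cover the same years")
    return [p - q - d for p, q, d in zip(p_annual, q_annual, dtws_annual)]


# Made-up numbers for two years (mm per year), for demonstration only.
et_classical = annual_et_wb([800.0, 900.0], [200.0, 300.0])
et_corrected = annual_et_wb([800.0, 900.0], [200.0, 300.0], [50.0, -50.0])
```

With these made-up inputs, the classical estimate is identical in both years, while supplying a nonzero storage change shifts each annual value. This is the effect the remainder of the analysis examines.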
By comparing decadal trends in WB-based ET with trends in independent ET estimates derived from modeling and remote sensing, Zhang et al. (2012) emphasized the inability of many model- and remote sensing–based products to accurately capture interannual ET trends in humid climates. Likewise, Jung et al. (2010) validated ET derived via the spatial interpolation of ground flux tower observations using independent ET estimates derived from a catchment-scale water balance analysis. However, at least in very large hydrologic basins, recent studies based on the Gravity Recovery and Climate Experiment (GRACE) have called into question the classical assertion that annual ΔTWS can be safely neglected (Zeng et al. 2012, 2014).
Here, we seek to intercompare multiple large-scale ET products with the aim of developing an improved strategy for validating their interannual variability. Results are based on 1) ET from the Noah land surface model (Noah ET), 2) ET from the remote sensing–based Atmosphere–Land Exchange Inverse (ALEXI) energy balance model (ALEXI ET), and 3) ET from a WB approach that neglects annual changes in terrestrial water storage (WB ET). For an additional analysis, annual ΔTWS from GRACE data and ET estimates from the interpolation of ground-based flux towers using the model tree ensemble (MTE) algorithm of Jung et al. (2009) (MTE ET) are also considered. Given that past work has already examined mutual biases in these products (Hain et al. 2015), our focus here is on the correlation (at zero lag) of interannual anomalies.
The analysis is divided into two scales. The first scale is defined by an east–west transect of 15 medium-sized (~40²–100² km²) unregulated basins within the U.S. southern Great Plains (SGP) region (Table 1). The second scale is consistent with five much larger (~500²–1000² km²) major basins within the Mississippi River system (Table 2). See Fig. 1 for a map of all basins. The 15 medium-sized basins described in Table 1 were selected based on a screening analysis by the Model Parameter Estimation Experiment (MOPEX) to remove basins with poor rain gauge coverage and/or excessive human regulation/impoundment of streamflow. In addition, an attempt was made to select medium-scale basins that span the strong east–west precipitation gradient across the SGP and receive a relatively low fraction of their annual precipitation as snowfall. Naturally, the impact of human streamflow regulation cannot be neglected within the major basins examined.
a. Water balance–ET
As described in (3), WB ET estimates were derived from the difference between annual precipitation and annual observed streamflow, where annual ΔTWS is assumed to be zero. In particular, daily streamflow volumes at individual basin outlets listed in Tables 1 and 2 (and mapped in Fig. 1) were obtained from the U.S. Geological Survey (USGS), normalized by basin drainage areas, and aggregated to annual (calendar year) values. Annual precipitation was based on the temporal aggregation of terrain-corrected daily rain gauge observations collected from the National Centers for Environmental Prediction (NCEP) Climate Prediction Center (CPC) and processed onto a 0.125° grid as part of phase 2 of the North American Land Data Assimilation System (NLDAS-2). More details on the NLDAS-2 project and meteorological forcing datasets can be found in Mitchell et al. (2004) and Xia et al. (2012). Following (3), WB ET was calculated for calendar years 2002–12.
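The streamflow normalization and annual aggregation described above can be sketched as follows. This is a minimal illustration, not the processing code used in the study: the function name, the input layout (a list of date and volume pairs), and the assumption that daily volumes are already expressed in cubic meters are all hypothetical.

```python
from collections import defaultdict
from datetime import date


def annual_runoff_depth_mm(daily_volumes_m3, drainage_area_km2):
    """Aggregate daily streamflow volumes to calendar-year runoff depth (mm).

    daily_volumes_m3: iterable of (datetime.date, volume in m^3) pairs
                      (assumed units; raw gauge records may need conversion).
    drainage_area_km2: basin drainage area in km^2.
    """
    if drainage_area_km2 <= 0:
        raise ValueError("drainage area must be positive")
    area_m2 = drainage_area_km2 * 1.0e6
    totals = defaultdict(float)
    for day, volume_m3 in daily_volumes_m3:
        # Volume / area gives depth in m; multiply by 1000 to get mm.
        totals[day.year] += volume_m3 / area_m2 * 1000.0
    return dict(totals)


# Made-up example: a 1000 km^2 basin with three daily records.
records = [
    (date(2003, 1, 1), 1.0e6),
    (date(2003, 1, 2), 1.0e6),
    (date(2004, 1, 1), 5.0e5),
]
q_annual = annual_runoff_depth_mm(records, 1000.0)
```

Expressing Q as a depth over the drainage area puts it in the same units (mm per year) as the gridded precipitation, so the two can be differenced directly in (3).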
The Noah ET product was based on the temporal aggregation of hourly ET predictions acquired from a 0.125°-resolution Noah land surface model simulation driven by NLDAS-2 meteorological forcing data. The NLDAS-2 hourly precipitation forcing dataset is based on the disaggregation of daily NCEP CPC data using available ground-based rain radar observations. The Noah model is a one-dimensional, physically based land surface model that calculates surface state and flux variables using prognostic energy and water balance equations. Total ET is calculated by summing hourly Noah predictions of 1) direct evaporation from the surface soil, 2) direct evaporation of canopy-intercepted precipitation, 3) transpiration via plant root uptake of water, and 4) sublimation. Annual values were then obtained by summing hourly ET within calendar years 2002–12 and spatially averaging these 0.125° summations over all basin domains indicated in Fig. 1. More information about the Noah model version implemented in NLDAS-2 (version 2.8) is given in Chen et al. (1996), Chen and Dudhia (2001), and Ek et al. (2003). Note that since WB ET and Noah ET are both derived (in part) from NLDAS-2 precipitation data, they cannot be considered wholly independent estimates.
Unlike the WB and Noah ET estimates, the ALEXI surface energy balance model produces ET using thermal infrared (TIR) remote sensing data without any precipitation input (Anderson et al. 2011). ALEXI was processed at a spatial resolution of 10 km over the period 2003–12 and forced with meteorological inputs from the North American Regional Reanalysis (NARR; Mesinger et al. 2006), TIR land surface temperature from the Geostationary Operational Environmental Satellite (GOES), and leaf area index (LAI) from the 8-day Terra MODIS product (MOD15A2), which is used to estimate vegetation cover fraction fc. Instantaneous latent heat fluxes retrieved from ALEXI are upscaled to daytime-integrated ET estimates by assuming that the ratio of latent heat flux to incoming shortwave radiation, fSUN, is self-preserved during daytime hours (Cammalleri et al. 2014). Incoming shortwave radiation inputs are taken from the NCEP Climate Forecast System Reanalysis (CFSR; Saha et al. 2010). Currently, ALEXI is not executed over snow-covered surfaces. These periods are instead gap filled with a linear interpolation of fSUN.
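The upscaling and gap-filling steps described above can be illustrated with a short sketch. It is not the ALEXI implementation: the function names are hypothetical, the latent heat of vaporization is taken as a constant 2.45 MJ kg⁻¹ (an assumption), and the series is assumed to be evenly spaced in time.

```python
LAMBDA_MJ_PER_KG = 2.45  # assumed constant latent heat of vaporization


def daytime_et_mm(le_inst_wm2, rs_inst_wm2, rs_day_mj_m2):
    """Upscale an instantaneous latent heat flux to daytime ET (mm).

    Assumes fSUN = LE / Rs at retrieval time is preserved through the day.
    """
    if rs_inst_wm2 <= 0:
        raise ValueError("instantaneous shortwave radiation must be positive")
    f_sun = le_inst_wm2 / rs_inst_wm2       # dimensionless ratio
    le_day_mj_m2 = f_sun * rs_day_mj_m2     # daytime latent energy, MJ m^-2
    return le_day_mj_m2 / LAMBDA_MJ_PER_KG  # kg m^-2, equivalent to mm


def fill_fsun_gaps(f_sun_series):
    """Linearly interpolate None entries between valid neighbors.

    Leading and trailing gaps are left as None in this sketch.
    """
    filled = list(f_sun_series)
    valid = [i for i, v in enumerate(f_sun_series) if v is not None]
    for left, right in zip(valid, valid[1:]):
        for i in range(left + 1, right):
            weight = (i - left) / (right - left)
            filled[i] = f_sun_series[left] + weight * (
                f_sun_series[right] - f_sun_series[left]
            )
    return filled


# Made-up values for demonstration only.
et_day = daytime_et_mm(300.0, 600.0, 24.5)
f_sun_filled = fill_fsun_gaps([0.2, None, None, 0.5, None])
```

Because fSUN is dimensionless, the same ratio can be applied to any daily insolation total, which is what allows the gap-filled values to be converted back to ET during snow-covered periods.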
While based on very different fundamental principles, ALEXI and Noah share some common inputs. Therefore, in order to minimize commonality in inputs between Noah ET and ALEXI ET, every effort was made to ensure that these inputs did not induce cross-correlated error in ET predictions. For instance, while both Noah and ALEXI require incoming solar radiation as a forcing, ALEXI simulations were based on radiation products generated by CFSR while Noah simulations were instead forced by radiation fields from the NARR. Likewise, while both ALEXI and Noah require fc, Noah uses a fixed monthly climatology acquired from a retrospective analysis of Advanced Very High Resolution Radiometer observations while ALEXI uses actual 8-day MODIS LAI composites to estimate fc.
For an additional analysis, MTE ET estimates were also acquired from flux tower observations and the MTE machine-learning algorithm introduced by Jung et al. (2009). The MTE upscales in situ ET measurements from a network of regional networks (FLUXNET) using the remotely sensed fraction of photosynthetically active radiation and gridded meteorological data to produce monthly gridded ET estimates at a 0.50° spatial resolution. These estimates were temporally aggregated to an annual scale (within calendar years 2002–11) and spatially averaged within the five major basins listed in Table 2.
Monthly GRACE ΔTWS data were obtained by applying the rescaling coefficients of Landerer and Swenson (2012) to gridded 0.25° GRACE ΔTWS products provided by the GeoForschungsZentrum (GFZ) and the University of Texas Center for Space Research (CSR) and averaging the resulting two fields together. December and January GRACE ΔTWS values from this unified product were averaged together to estimate 1 January ΔTWS. The difference between successive 1 January ΔTWS values was then used to obtain annual ΔTWS for calendar years 2002–12. Finally, the resulting 0.25° annual fields were spatially averaged within the five major river basins listed in Table 2.
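The conversion from monthly GRACE values to annual storage changes can be sketched as below. The function name and the dictionary layout keyed by (year, month) are hypothetical choices made for illustration, and the numbers are made up; the monthly values are assumed to be already rescaled and merged as described above.

```python
def annual_dtws_from_monthly(monthly_tws_mm):
    """Annual storage change (mm) from monthly GRACE storage values.

    monthly_tws_mm: dict mapping (year, month) to a basin-average value (mm).
    1 January storage for year y is the mean of December (y - 1) and
    January (y); the annual change for year y is the difference between
    the 1 January values of years y + 1 and y.
    """
    jan1 = {}
    for (year, month), january_value in monthly_tws_mm.items():
        previous_december = (year - 1, 12)
        if month == 1 and previous_december in monthly_tws_mm:
            jan1[year] = 0.5 * (monthly_tws_mm[previous_december] + january_value)
    return {
        year: jan1[year + 1] - jan1[year]
        for year in sorted(jan1)
        if year + 1 in jan1
    }


# Made-up monthly values (mm), for demonstration only.
monthly = {
    (2002, 12): 10.0, (2003, 1): 20.0,
    (2003, 12): 30.0, (2004, 1): 50.0,
}
dtws = annual_dtws_from_monthly(monthly)
```

Years without both bracketing December and January values are simply omitted from the result in this sketch, rather than being filled.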
Our analysis focused on calculating the (lag zero) Pearson correlation coefficient between normalized anomalies of interannual, basin-scale Noah ET variations and the interannual ET variations found in other products. Given the west-to-east increase in P within the SGP region, mean within the 15 medium-scale basins (Table 1, Fig. 1) ranges from 500 to 900 mm yr⁻¹ (Table 1). Figure 2 plots the correlation between WB ET and Noah ET sampled in each medium-scale basin, where basins are sorted according to mean . Within the driest basins, WB–Noah correlations are uniformly high. However, since the WB and Noah ET estimates are based on the same (uncertain) precipitation product, some of this correlation may be spurious because of positively correlated errors. In contrast, WB–Noah correlations become highly erratic (and frequently negligible) within wetter basins (Fig. 2). This tendency has been attributed to the inability of land models to accurately capture interannual ET variability in humid climates (Zhang et al. 2012).
However, a different interpretation emerges when also considering the correlation between Noah ET and ALEXI ET. In particular, Noah–ALEXI correlations are uniformly positive for all medium-scale basins, even the wettest basins, which exhibit very low WB–Noah correlations (Fig. 2). In addition, all sampled Noah–ALEXI correlations have interquartile sampling ranges (derived from a bootstrapping approach) that do not include zero. Such robust positive correlations occur despite the fact that Noah ET and ALEXI ET are obtained via wholly independent means and cold-season ALEXI ET is based on a simplistic temporal interpolation technique (section 2c). Therefore, Fig. 2 strongly implies that the aforementioned reduction in WB–Noah correlation within humid basins is attributable to error in WB ET and not uncertainty in Noah ET.
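The correlation and bootstrap sampling range described above can be sketched as follows. The resampling scheme (drawing years with replacement), the number of replicates, and the function names are assumptions made for illustration; the source does not specify how its bootstrap was configured. Because the Pearson coefficient is unchanged by subtracting a mean or dividing by a standard deviation, the raw annual series and their normalized anomalies give the same result.

```python
import math
import random
import statistics


def pearson(x, y):
    """Lag-zero Pearson correlation; returns None if either series is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)


def bootstrap_iqr(x, y, n_boot=1000, seed=0):
    """Interquartile range of correlations from resampling years with replacement."""
    rng = random.Random(seed)
    n = len(x)
    samples = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        r = pearson([x[i] for i in idx], [y[i] for i in idx])
        if r is not None:  # skip degenerate resamples with no variance
            samples.append(r)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return q1, q3


# Made-up annual ET series (mm) for two products over six years.
et_a = [610.0, 650.0, 590.0, 700.0, 640.0, 620.0]
et_b = [600.0, 660.0, 585.0, 690.0, 655.0, 610.0]
r_ab = pearson(et_a, et_b)
q1_ab, q3_ab = bootstrap_iqr(et_a, et_b)
```

With records of only 9 to 11 years, the sampling range is wide, so checking whether the interquartile range excludes zero is a more cautious test than reading the point estimate alone.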
An obvious error source for WB ET is the neglect of annual ΔTWS. Within the larger-scale major basins listed in Table 2, the impact of annual ΔTWS can be directly examined using GRACE ΔTWS observations. Figure 3 is analogous to Fig. 2, except applied to much larger basins within the Mississippi River system (Fig. 1, Table 2). As in Fig. 2, WB–Noah correlations are relatively high for the drier major basins (i.e., the Missouri, the Red, and the Arkansas) but fall sharply for the wetter major basins (i.e., the Ohio and the upper Mississippi). However, WB–Noah correlations are uniformly improved by avoiding the problematic assumption that annual ΔTWS = 0 and instead estimating large-scale annual ΔTWS directly from GRACE (Fig. 3). The consistent improvement implies that the neglect of annual ΔTWS is playing a significant role in reducing sampled WB–Noah correlations.
As in Fig. 2, Noah–ALEXI correlations in Fig. 3 are relatively high and remain stable across all five major basins. Sampled Noah–MTE correlations are even higher for all basins except the Ohio River basin. In particular, note that the exceptionally low (negative) value of the (non-GRACE corrected) WB–Noah correlation within the upper Mississippi basin is not reflected in the Noah–ALEXI, Noah–MTE, or GRACE-corrected WB–Noah correlations. Therefore, the observed shortcoming in the (uncorrected) WB–Noah correlation appears linked to relatively large interannual variability in surface water and snow storage within the upper Mississippi River basin. As a result, Fig. 3 supports Fig. 2 by suggesting that the decline in WB–Noah correlations within humid basins is attributable to problems with the accuracy of the WB ET benchmark and not the ability of Noah to accurately capture interannual ET variability.
The low Noah–MTE correlation sampled within the (humid) Ohio River basin (Fig. 3) runs somewhat counter to this interpretation. However, the lack of a comparable reduction in either the Noah–ALEXI or the GRACE-corrected WB–Noah correlation within the Ohio River basin implies that the reduction is due to increased error in MTE ET and not Noah ET.
5. Discussion and conclusions
Here, we examine the correlation in interannual ET variations observed via a variety of independent means. When transitioning from a dry to a wet climate, a large reduction is seen in the correlation between WB ET and Noah ET. This trend holds both along a transect of medium-scale unregulated basins in the SGP (Fig. 2) and among five large-scale major Mississippi River subbasins (Fig. 3). However, an analogous reduction with wetter climate is not observed in the correlation between Noah ET and ALEXI ET (Figs. 2 and 3). Therefore, the reduction in the WB–Noah correlation for wet climates appears to be a consequence of neglecting annual ΔTWS in water balance calculations and not reflective of any shortcoming in Noah ET.
In addition, within the major Mississippi River system basins, the introduction of GRACE-based annual ΔTWS uniformly improves the WB–Noah correlation (Fig. 3). Therefore, taken as a whole, results imply that WB-based ET calculations that neglect interannual ΔTWS do not represent a robust benchmark for the validation of interannual ET variations in relatively humid climates. Instead, a more robust verification approach appears to be the examination of correlations between model-based predictions and independently generated ET datasets and/or the use of GRACE data to refine annual water balance calculations (within sufficiently large basins). This may suggest the need to reevaluate previous work (Zhang et al. 2012) that utilized water balance approaches to conclude that model-based ET products contain little skill in capturing interannual ET variability.
The impact of annual ΔTWS on (annual) WB ET calculations has been previously noted (Rodell et al. 2007; Syed et al. 2008; Zeng et al. 2012); however, this analysis leads to several novel insights. First, by using correlations among independent ET products (and not GRACE observations) to infer the presence of significant annual ΔTWS variations, these results provide an independent source of verification for earlier studies based only on GRACE retrievals. One consequence of this is our ability to extend the observation-based analysis of annual ΔTWS down to small-scale catchments (1000–8000 km² in size) that cannot be resolved by GRACE (Table 1, Fig. 2). Despite these small-scale basins being free of any major anthropogenic impoundment (and generally clear of major snowpack storage), annual ΔTWS still appears to play a major role in any attempt to estimate annual ET via water balance considerations that neglect terrestrial water storage variations. In addition, results suggest a relatively larger impact of annual ΔTWS on WB ET variations within relatively wet climates. This tendency is at odds with earlier GRACE-based studies (which suggested greater impacts in arid climates; Zeng et al. 2012) and provides an alternative explanation for the conclusion of Zhang et al. (2012) that land surface models cannot match WB-based ET trends within relatively wet climates.
Nevertheless, several important caveats should be considered. For example, Hain et al. (2015) identified large relative biases in Noah ET for areas with extensive irrigation and/or direct groundwater extraction by plant roots. In such areas, correlations between Noah ET and ALEXI ET may be degraded and consequently unsuitable as a verification tool. In addition, all results are based on relatively short (9 or 10 years) data records because of limitations in the length of available satellite data records. Care should therefore be taken to avoid the overinterpretation of small (and potentially nonsignificant) differences in correlations. Finally, a number of obvious follow-on research topics can be defined based on the initial results presented here. Such topics include 1) examining the impact of snow water storage on WB ET by modifying the start/stop times used to define an annual average, 2) replicating the analysis for multiple land surface models, and 3) evaluating water balance calculations on subannual time scales.
Research was funded by a grant to Wade T. Crow (PI) from the Water Resources Group within the NASA Applied Sciences Program.