Water and carbon fluxes simulated over several recent decades by 12 Earth system models (ESMs) that participated in phase 5 of the Coupled Model Intercomparison Project (CMIP5) were evaluated using three functional constraints derived from both the model simulations and observations (four global datasets and 736 site-years of measurements). The three functional constraints are ecosystem water-use efficiency (WUE), light-use efficiency (LUE), and the partitioning of precipitation P into evapotranspiration (ET) and runoff based on the Budyko framework. Values of these three constraints vary significantly with time scale but should be quite conservative when averaged over multiple decades. The results showed that both WUE and LUE simulated by the ensemble mean of the 12 ESMs were generally lower than the site measurements. Simulations by the ESMs were generally consistent with the broad pattern of energy-controlled ET under wet conditions and soil water-controlled ET under dry conditions, as described by the Budyko framework. However, the value of the Budyko parameter ω obtained from fitting the Budyko curve to the ensemble model simulation (1.74) was larger than the best-fit value of ω for the observed data (1.28). Globally, the ensemble mean of multiple models, although performing better than any individual model simulation, still underestimated the observed WUE and LUE and overestimated the ratio of ET to P, as a result of overestimation of ET and underestimation of gross primary production (GPP). The results suggest that future model development should focus on improving the algorithms for the partitioning of precipitation into ecosystem ET and runoff, and the coupling of water and carbon cycles for different land-use types.
Several intercomparison studies compared simulations by the Earth system models (ESMs) participating in phase 5 of the Coupled Model Intercomparison Project (CMIP5; Taylor et al. 2012) against various observation-based in situ measurements or global datasets, including runoff, net primary productivity, and soil carbon (Alkama et al. 2013; Anav et al. 2015; Seager et al. 2013; Shao et al. 2013; Todd-Brown et al. 2013). These evaluations identified some key deficiencies in the simulated temporal dynamics and spatial patterns of specific variables or processes relating to the water or carbon cycles, but none evaluated the water and carbon cycles together, or the coupling of those two cycles, at site and global scales.
Over the last three decades, there has been an accumulation of global flux network (FLUXNET) and observation-based gross primary production (GPP) and evapotranspiration (ET) datasets, which provide a unique opportunity to evaluate two key terrestrial processes: water and carbon cycles and their coupling, as represented in the most advanced ESMs. Such intercomparisons enable evaluation of models and highlight differences among them to further our understanding of how the carbon cycle operates and interacts with the physical climate system.
In this study, we assess how the ESMs participating in CMIP5 perform against broadscale fundamental functional constraints. Functional constraints are defined here as integrated indicators that describe the combined effects of multiple ecophysiological processes driven by environmental variables; they may vary spatially but are quite conservative for a given location when averaged over multiple decades. Three functional constraints, namely ecosystem water-use efficiency (WUE), light-use efficiency (LUE), and the partitioning of precipitation into ET and runoff based on the Budyko framework, were used to evaluate the ESMs.
In ecophysiology, WUE and LUE are often used to quantify the efficiencies with which different terrestrial ecosystems use two primary resources for plant growth and ecosystem functioning: water and light (Ito and Inatomi 2012; Ruimy et al. 1999). WUE represents the interactions between carbon and water cycles at leaf, canopy, and ecosystem scales (Eamus 1991). Leaf-scale WUE and its response to meteorological variables are relatively well understood (Medlyn et al. 2011; Wong et al. 1979). However, understanding of WUE at ecosystem or global scale is still limited (Dekker et al. 2016; Keenan et al. 2013; Medlyn et al. 2015) because many factors including soil properties, nutrient availability, and vegetation structure can play a significant role in regulating WUE (Field et al. 1995; Huntington 2008; Medlyn et al. 2015). Recent research showed that ecosystem WUE was positively related to rising atmospheric CO2 concentration and an increase in leaf area index, but negatively impacted by a decline in vapor pressure deficits (Cheng et al. 2017). Despite multiple environmental factors influencing WUE and its temporal variations, ecosystem WUE was found to be quite conservative for a specific location when averaged over multiple decades (Gao et al. 2014; Huxman et al. 2004; Tang et al. 2014). This robustness in ecosystem WUE is essentially determined by the intrinsic ecosystem sensitivity to water availability across precipitation regimes, independent of temporal hydroclimatic variations (Ponce-Campos et al. 2013).
LUE is used to quantify the efficiency with which absorbed light is used in photosynthesis or dry matter production (Monteith 1972), and has been used in ecosystem models for estimating GPP (Coops et al. 2005; Potter et al. 1993). LUE has been shown to be constant over a day in many studies (Rosati and Dejong 2003; Sims et al. 2005), but can also vary significantly at different temporal and spatial scales (Goerner et al. 2011; Turner et al. 2003; Yuan et al. 2007; Zhou et al. 2016). Phenology, temperature, vapor pressure deficit, soil water content, and diffuse radiation fraction are considered to be the main factors influencing LUE (Mäkelä et al. 2008; Veroustraete et al. 2002; Yuan et al. 2007; Y. Zhang et al. 2015). However, ecosystem LUE is considered to be approximately constant over decadal time scales (Medlyn 1998; Monteith 1972; Monteith and Moss 1977), a view that is mainly based on the ecological optimization principle through adjusting plant nitrogen (N) content and Rubisco activity (Haxeltine and Prentice 1996).
In hydrology, the Budyko framework is a robust tool for estimating the partitioning of precipitation P into ET and runoff Roff as a function of aridity index (Budyko 1974; Zhang et al. 2001). Over multiannual or decadal time scales, the Budyko framework has been proven to be accurate in describing the interactions between climate and the hydrological cycle (Li et al. 2013; Yang et al. 2006; Zhang et al. 2008). In addition, the introduction of a plant-available water coefficient ω (Li et al. 2013; Yang et al. 2007; Zhang et al. 2001) in the Budyko-type equation allows the effects of vegetation on ET to be considered. Numerous studies showed the robustness of the Budyko framework in describing the joint controls of energy and water availability on ET, and the ratio of ET to Roff. The Budyko parameter ω remained nearly constant at multiannual time scales for a given catchment (Zhang et al. 2001).
The objectives of this study are to assess the performance of the 12 ESMs for estimating WUE, LUE, and rainfall partitioning at individual sites or global scale against measurements, and to identify key sources of model errors. Results of this study will guide future model development.
2. Materials and methods
a. Earth system model output
Twelve ESMs were selected from all participating models in CMIP5 based on their availability of the selected monthly outputs of carbon, energy, and climatic variables. These ESMs differ in their representations of terrestrial carbon cycle processes, as well as in their atmospheric and oceanic components, but share common experimental designs and output protocols. The 12 ESMs used in this study are listed in Table S1 of the supplemental material and the detailed descriptions of each ESM are available from previous studies (Shao et al. 2013; Todd-Brown et al. 2013). A number of experimental simulations were performed for CMIP5, but we only focus here on the “historical” simulation. The historical CMIP5 simulation that covers the 1850–2005 period allows evaluation of ESMs through a direct comparison with observations (Taylor et al. 2012). We selected the most recent 24 years covering the period of 1982–2005, which enables comparison with observation-based estimates from the global FLUXNET data and available global GPP and ET data products.
Monthly carbon, water, and energy fluxes including GPP, ET, downward and upward shortwave and longwave radiation, and climatic variables including precipitation, temperature, humidity, and atmospheric pressure were obtained online (http://cmip-pcmdi.llnl.gov/cmip5/data_portal.html). All variables were aggregated to annual values and their mean annual values were used in this study.
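The aggregation of monthly output to annual values can be sketched as follows. This is a minimal illustration, not the study's processing code: it assumes monthly mean flux rates in kg m−2 s−1 (the unit in which CMIP5 reports fluxes such as gpp, evspsbl, and pr) and a 365-day calendar, and the function names are hypothetical.

```python
# Sketch only: function names and the 365-day (no-leap) calendar are assumptions.
SECONDS_PER_DAY = 86400
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def annual_total(monthly_rates):
    """Integrate 12 monthly mean rates (kg m-2 s-1) to an annual total (kg m-2 yr-1)."""
    if len(monthly_rates) != 12:
        raise ValueError("expected 12 monthly values")
    return sum(rate * days * SECONDS_PER_DAY
               for rate, days in zip(monthly_rates, DAYS_IN_MONTH))


def multiyear_mean(annual_totals):
    """Average annual totals over the analysis period (e.g., 1982-2005)."""
    if not annual_totals:
        raise ValueError("no annual totals supplied")
    return sum(annual_totals) / len(annual_totals)


# Example: a constant water flux of 1e-5 kg m-2 s-1 for every month.
# For water, kg m-2 yr-1 is numerically equal to mm yr-1.
et_annual = annual_total([1e-5] * 12)
print(round(et_annual, 2))
```

Weighting each month by its length matters because a plain average of the 12 monthly rates would slightly overweight February.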
b. FLUXNET and global GPP and ET datasets
Site-scale data of monthly GPP and ET (FLUXNET2015) were downloaded from the global FLUXNET site (http://fluxnet.fluxdata.org/data/fluxnet2015-dataset) that included 165 stations and 1180 site years. In the FLUXNET2015 dataset, GPP was derived from observed net ecosystem exchange (NEE) and ecosystem respiration that was estimated from nighttime NEE (Papale et al. 2006; Reichstein et al. 2005). Annual ET was estimated from observed latent heat flux, in which latent heat flux was filled by a marginal distribution sampling algorithm and was corrected for any errors in energy balance closure (Reichstein et al. 2005). Ecosystem WUE was calculated as the ratio of annual GPP to annual ET. We selected site years with both annual GPP and ET available. As a result, 736 site years (93 sites) of data were selected for analysis. Geographical distribution of the 93 sites is shown in Fig. S1 of the supplemental material.
Global spatial datasets include one set of GPP and three sets of ET. One set of GPP and one set of ET were obtained from Jung et al. (2011, hereinafter JU11). The JU11 data are essentially model products derived by combining FLUXNET observations and a model tree ensemble (MTE) approach trained with satellite-derived fraction of absorbed photosynthetically active radiation (FAPAR) and gridded mean climates. The MTE statistical model employed by JU11 consists of a set of regression trees trained with the local GPP estimates obtained from measured NEE. The JU11 dataset represents the mean fluxes of GPP and ET during the period 1982–2008. Although JU11 GPP and ET are not direct measurements, they have been used as a “reference” for evaluating global models (Shao et al. 2013; Wang et al. 2012). To avoid the uncertainties caused by relying on one dataset, two other independent global ET datasets were also used as “references.” The two independent ET datasets are the Global Land Evaporation Amsterdam Model (GLEAM; Miralles et al. 2011) and the LandFlux Eval (hereinafter EVAL; Mueller et al. 2013). The GLEAM ET dataset was generated using the Priestley and Taylor equation with a variety of satellite data including microwave-derived soil moisture, land surface temperature, and vegetation density, as well as a detailed estimation of rainfall interception loss, as model inputs. The GLEAM ET data (version 3.0a) cover the period from 1980 to 2014 at 0.25° × 0.25°. The EVAL dataset covers a shorter period (1989–2005) and is an ET synthesis product at 1° × 1°, merged from a total of 14 datasets including five diagnostic datasets (derived from satellite and/or in situ observations), five land surface models, and four reanalysis ET products (Mueller et al. 2013).
c. WUE, LUE, and partitioning of rainfall within the Budyko framework
For the purposes of this study, we define ecosystem WUE as GPP per unit total ET [mmol C (mol H2O)−1],
$$\mathrm{WUE} = \frac{\mathrm{GPP}}{\mathrm{ET}}.$$
Ecosystem LUE is defined as GPP per unit incident photosynthetically active radiation (PAR), that is,
$$\mathrm{LUE} = \frac{\mathrm{GPP}}{\mathrm{PAR}}.$$
PAR is approximately calculated as half of downward shortwave radiation in each ESM with a conversion factor of 1 J = 4.6 μmol for PAR. The unit of LUE is micromole of carbon per micromole of PAR.
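The two efficiency definitions and their unit conversions can be sketched in a few lines. This is an illustrative sketch under stated assumptions: annual GPP is taken in g C m−2 yr−1, annual ET in mm yr−1, and downward shortwave radiation as an annual mean in W m−2. The function names and the molar masses are assumptions introduced here; only the factor of one-half and the 4.6 μmol J−1 conversion come from the text.

```python
# Sketch only: names and input units are assumptions, not the study's code.
M_C = 12.011        # g C per mol C
M_H2O = 18.015      # g H2O per mol H2O
PAR_FRACTION = 0.5  # PAR taken as half of downward shortwave radiation
UMOL_PER_J = 4.6    # PAR conversion factor (1 J = 4.6 umol)
SECONDS_PER_YEAR = 365 * 86400


def wue_mmol_per_mol(gpp_gc_m2_yr, et_mm_yr):
    """WUE = GPP / ET in mmol C (mol H2O)-1; 1 mm of ET equals 1 kg H2O m-2."""
    gpp_mmol = gpp_gc_m2_yr / M_C * 1000.0
    et_mol = et_mm_yr * 1000.0 / M_H2O
    return gpp_mmol / et_mol


def lue_umol_per_umol(gpp_gc_m2_yr, rsds_w_m2):
    """LUE = GPP / incident PAR in umol C (umol PAR)-1."""
    gpp_umol = gpp_gc_m2_yr / M_C * 1e6
    par_umol = PAR_FRACTION * rsds_w_m2 * UMOL_PER_J * SECONDS_PER_YEAR
    return gpp_umol / par_umol


# Hypothetical grid cell: GPP = 1000 g C m-2 yr-1, ET = 500 mm yr-1, rsds = 200 W m-2
print(round(wue_mmol_per_mol(1000.0, 500.0), 2))
print(round(lue_umol_per_umol(1000.0, 200.0), 4))
```

With these molar masses, 2 g C (kg H2O)−1 corresponds to about 3 mmol C (mol H2O)−1, which matches the 2/3 conversion quoted with the site-scale results.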
The Budyko framework states that the mean annual water balance of a given land surface is mainly controlled by available water and atmospheric demand for water (Budyko 1974). The original Budyko framework uses net radiation to describe atmospheric demand for water and does not include the effect of vegetation, which is expected to influence actual ET. Here we used the formulation proposed by Zhang et al. (2001), which includes the effect of vegetation on annual water balance through the parameter ω that is related to plant-available water in soil:
$$\frac{\mathrm{ET}}{P} = \frac{1 + \omega\phi}{1 + \omega\phi + \phi^{-1}}.$$
The parameter ϕ is the aridity index, defined as the ratio of potential ET (PET) to P, and ω is the only model parameter that varies with vegetation type, land use, and topography, with a typical value of 2.0 for forests and 0.5 for grasses (Li et al. 2013; Zhang et al. 2001).
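A best-fit ω can be obtained from paired values of ϕ and ET/P by least squares. The sketch below assumes the commonly cited form of the Zhang et al. (2001) curve, ET/P = (1 + ωϕ)/(1 + ωϕ + 1/ϕ), and uses a plain grid search. Both the function names and the fitting procedure are assumptions for illustration; the fitting method actually used in the study is not stated here.

```python
# Sketch only: assumes the commonly cited Zhang et al. (2001) form
# ET/P = (1 + w*phi) / (1 + w*phi + 1/phi); names are hypothetical.

def budyko_et_ratio(phi, omega):
    """Evaporative ratio ET/P for aridity index phi = PET/P (phi > 0)."""
    if phi <= 0:
        raise ValueError("aridity index must be positive")
    return (1.0 + omega * phi) / (1.0 + omega * phi + 1.0 / phi)


def fit_omega(phis, et_ratios, lo=0.01, hi=10.0, step=0.01):
    """Least-squares omega found by a simple grid search over [lo, hi]."""
    best_omega, best_sse = None, float("inf")
    n_steps = int(round((hi - lo) / step))
    for k in range(n_steps + 1):
        omega = lo + k * step
        sse = sum((budyko_et_ratio(p, omega) - y) ** 2
                  for p, y in zip(phis, et_ratios))
        if sse < best_sse:
            best_omega, best_sse = omega, sse
    return best_omega


# Synthetic check: data generated with omega = 1.74 should be recovered.
phis = [0.3, 0.6, 1.0, 1.5, 2.5, 4.0]
ratios = [budyko_et_ratio(p, 1.74) for p in phis]
print(round(fit_omega(phis, ratios), 2))
```

Because ET/P increases monotonically with ω at any fixed ϕ, a larger fitted ω means that more precipitation is partitioned into ET and less into runoff.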
PET was computed using the Priestley–Taylor equation
$$\mathrm{PET} = \alpha \frac{\Delta}{\Delta + \gamma} R_{\mathrm{net}},$$
where α is the Priestley–Taylor parameter (set to 1.26), Δ is the slope of the saturation vapor pressure–temperature relationship, and γ is the psychrometric constant. Net radiation Rnet is computed as
$$R_{\mathrm{net}} = \mathrm{rsds} + \mathrm{rlds} - \mathrm{rsus} - \mathrm{rlus}.$$
The four radiation fluxes rsds, rlds, rsus, and rlus are downward shortwave radiation, downward longwave radiation, upward shortwave, and upward longwave radiation, respectively. The four radiation fluxes and air temperature Tair are obtained from each ESM, and PET was computed based on Eq. (5). Combining ESM-simulated P and ET, a best-fit value of ω can be obtained by applying the Budyko framework [Eqs. (3a)–(3c)] to all ESMs.
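The PET calculation can be sketched as below. The text specifies only α = 1.26 and the four radiation terms; everything else in this sketch is an assumption made for illustration: the FAO-56 expressions for Δ and γ, a latent heat of vaporization of 2.45 MJ kg−1 to convert energy to millimetres of water, radiation in W m−2, air temperature in °C, and the function names.

```python
import math

# Sketch only: the formulas for delta and gamma (FAO-56), the latent heat
# value, and all function names are assumptions, not taken from the study.
ALPHA = 1.26              # Priestley-Taylor parameter (from the text)
LAMBDA_MJ_PER_KG = 2.45   # latent heat of vaporization (assumed)


def net_radiation(rsds, rlds, rsus, rlus):
    """Net radiation (W m-2): downward minus upward shortwave and longwave."""
    return rsds + rlds - rsus - rlus


def sat_vp_slope(t_air_c):
    """Slope of the saturation vapor pressure curve (kPa per deg C)."""
    es = 0.6108 * math.exp(17.27 * t_air_c / (t_air_c + 237.3))
    return 4098.0 * es / (t_air_c + 237.3) ** 2


def psychrometric_constant(pressure_kpa=101.3):
    """Psychrometric constant (kPa per deg C) from surface pressure."""
    return 0.000665 * pressure_kpa


def priestley_taylor_pet(rnet_w_m2, t_air_c, pressure_kpa=101.3):
    """Potential ET in mm per day from net radiation and air temperature."""
    delta = sat_vp_slope(t_air_c)
    gamma = psychrometric_constant(pressure_kpa)
    rnet_mj_day = rnet_w_m2 * 86400 / 1e6   # W m-2 -> MJ m-2 day-1
    return ALPHA * delta / (delta + gamma) * rnet_mj_day / LAMBDA_MJ_PER_KG


# Hypothetical inputs: Rnet = 100 W m-2 at 20 deg C and sea-level pressure
rnet = net_radiation(200.0, 350.0, 40.0, 410.0)
print(round(priestley_taylor_pet(rnet, 20.0), 2))
```

Dividing the annual sum of such PET values by annual P gives the aridity index ϕ used in the Budyko curve.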
d. CRU precipitation data
To determine whether the model biases in ET and GPP were caused by biases in simulated climate (especially precipitation) or by the land model components of the ESMs themselves (i.e., by the way that ecological processes are represented), a global dataset of historical climate, the Climatic Research Unit (CRU) time series (TS) data, version 4.01 (Harris et al. 2014), was used to evaluate precipitation simulated by the ESMs. The CRU gridded climate dataset is available at a monthly temporal resolution and a spatial resolution of 0.5° × 0.5° for the period 1961–2014. To match the period of the global FLUXNET data, the precipitation data from the CRU climate dataset for the period 1982–2005 were used. Although precipitation, temperature, and incoming solar radiation exert primary controls on GPP and ET, simulated precipitation was considered the most important variable and the one that differed most among ESMs.
The observed multiyear mean WUE at site scale from the FLUXNET [WUE in eddy covariance (EC)] dataset varied significantly among different land-cover types. For example, the observed WUE values for open shrublands (OSH) were significantly lower than those for evergreen needleleaf forests (ENF). Compared with the range of observed WUE, from 0 to 5 mmol C (mol H2O)−1 [Fig. 1a; 1 mmol C (mol H2O)−1 = 2/3 g C (kg H2O)−1], the ESMs estimated a much smaller range of multiyear mean WUE [from 1.0 to 2.0 mmol C (mol H2O)−1] for the selected sites (Fig. 1a). The linear correlation of site-scale WUE estimates between FLUXNET and the ensemble ESM mean was poor [WUE in ESM = 0.16 WUE in EC + 0.14, r2 = 0.22, and root-mean-square error (RMSE) = 0.86 mmol C (mol H2O)−1]. Overall, the ensemble ESM mean underestimated WUE when the observed WUE from the FLUXNET dataset was larger than 2.0 mmol C (mol H2O)−1.
Compared with the simulated WUE by ESMs, multiyear mean WUE from JU11 GPP and JU11 ET at site levels (WUE in JU11) had a much better linear correlation with WUE in EC [WUE in JU11 = 0.47 WUE in EC + 1.06, r2 = 0.36, and RMSE = 0.63 mmol C (mol H2O)−1]. Therefore, WUE in JU11 will be used to evaluate spatial patterns of WUE from ESMs.
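The slope, intercept, r2, and RMSE statistics quoted in these comparisons can be computed as in the following sketch. It assumes ordinary least squares of the simulated values on the observed values and an RMSE taken directly between the paired observed and simulated values; whether the study computed RMSE this way or from the regression residuals is not stated, and the function names are hypothetical.

```python
import math

# Sketch only: ordinary least squares and the RMSE definition are assumptions.

def linear_fit(x, y):
    """Return (slope, intercept, r2) for y regressed on x."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need two equally sized samples with n >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


def rmse(observed, simulated):
    """Root-mean-square error between paired observed and simulated values."""
    n = len(observed)
    return math.sqrt(sum((s - o) ** 2
                         for o, s in zip(observed, simulated)) / n)


# Hypothetical site-scale WUE values [mmol C (mol H2O)-1]
wue_ec = [0.8, 1.5, 2.2, 3.0, 4.1]
wue_esm = [1.2, 1.4, 1.6, 1.7, 1.9]
slope, intercept, r2 = linear_fit(wue_ec, wue_esm)
print(slope < 1.0, round(rmse(wue_ec, wue_esm), 2))
```

A slope well below 1 together with a positive intercept, as in this made-up example, is the signature of a model that compresses the observed range.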
At the global scale, the ratios of JU11 GPP to GLEAM ET (WUE in GLEAM) and to EVAL ET (WUE in EVAL) showed a spatial pattern similar to WUE in JU11 (Fig. 2). All three estimates of WUE were relatively high [>2.5 mmol C (mol H2O)−1] in Europe and South America, and low [<1.0 mmol C (mol H2O)−1] in central Asia and western North America. The three observation-based WUE estimates were highly correlated with each other (r2 = 0.69 between WUE in JU11 and WUE in GLEAM, 0.78 between WUE in JU11 and WUE in EVAL, and 0.85 between WUE in GLEAM and WUE in EVAL). On the other hand, there are considerable variations in the multiyear mean WUEs as estimated from different ESM simulations for a given region. In particular, GISS-E2-H simulated systematically higher WUE than the other 11 ESMs. Both GFDL-ESM2G and IPSL-CM5A-LR simulated higher WUE in the Northern Hemisphere (NH), but BCC_CSM1.1, CCSM4, and HadGEM2-CC simulated lower WUE in the NH than other ESMs (see Fig. S2 in the supplemental material).
The ensemble mean of multiyear mean WUE of 12 ESMs has a spatial pattern similar to those derived from the model–data products (see Fig. 2): generally low in dry areas, such as central Australia and the southern United States, and high in the tropical rain forests in Africa and South America. However, the ensemble mean of 12 ESMs underestimated the WUE in Europe and the northern temperate forest regions. Ensemble mean WUEs of 12 ESMs were significantly lower than that of JU11 when pixel-by-pixel values were compared (WUE in ESM = 0.4 WUE in JU11 and r2 = 0.78).
Mean annual WUE from three observation-based estimates varied similarly with latitude (Fig. 3). The estimated WUE from both GLEAM and EVAL datasets was smaller than JU11 because of the relatively larger ET values in GLEAM and EVAL than in JU11 (see Fig. S3 in the supplemental material). Overall, estimated WUE from three different datasets was high in the tropics, subtropics, and temperate regions, but low in the regions near 15°N, 35°N, and 30°S. At least five of the 12 ESMs (BNU-ESM, CanESM2, GISS-E2-H, IPSL-CM5A-LR, and MPI-ESM-LR) failed to reproduce this pattern of latitudinal variation in ecosystem mean annual WUE (Fig. 3). For example, in the latitudinal range of 30°S–20°N, the estimated mean annual WUE by CanESM2 was significantly greater, up to 150% more, than the estimated WUE from the three observation-based data products. The remaining seven ESMs simulated much lower WUE across different latitudes (Fig. 3). As a result, the ensemble mean WUE averaged over 12 ESMs was comparable with the mean WUE from three observation-based estimates across different latitudes.
At site scale, the ensemble mean LUE of the 12 ESMs (LUE in ESM) was significantly correlated with the observed LUE (LUE in EC) [LUE in ESM = 0.38 LUE in EC + 0.004, r2 = 0.51, and RMSE = 0.004 μmol C (μmol PAR)−1] across the selected sites (Fig. 4). The mean LUE across all the selected sites from the 12 ESMs was 0.0076 ± 0.0019 μmol C (μmol PAR)−1, as compared with the mean of 0.0086 ± 0.0053 μmol C (μmol PAR)−1 from the observations. Overall, the ensemble mean of the 12 ESMs gave higher LUE when the observed LUE was >0.015 μmol C (μmol PAR)−1 [1 μmol C (μmol PAR)−1 = 55.2 g C MJ−1], such as for evergreen broadleaf forests (EBF), and gave lower LUE when the observed LUE was <0.005 μmol C (μmol PAR)−1, such as for OSH or grasslands (GRA).
At the global scale, the LUE values for the majority of land grid cells from the 12 ESMs were less than 0.02 μmol C (μmol PAR)−1 (Fig. S4 in the supplemental material). Only 7 of the 12 ESMs simulated LUE > 0.03 μmol C (μmol PAR)−1 for some land points, and none simulated LUE > 0.04 μmol C (μmol PAR)−1. Because CCSM4 and NorESM1-M used the same land surface model, the LUE simulated by these two models was similar. BNU-ESM and MIROC-ESM simulated relatively low LUE values [<0.02 μmol C (μmol PAR)−1 for most land points], as compared to other ESMs.
The mean LUE of the 12 ESMs over land grid cells ranged from 0.001 to 0.025 μmol C (μmol PAR)−1, with an average value of 0.007 μmol C (μmol PAR)−1 (=0.37 g C MJ−1) and a standard deviation of 0.004 μmol C (μmol PAR)−1 (=0.23 g C MJ−1; Fig. 5a). Larger LUE values [>0.015 μmol C (μmol PAR)−1] were located in the tropical EBF in South America, part of central Africa, and South Asia (Fig. 5a). Relatively low LUE [<0.002 μmol C (μmol PAR)−1] was mostly located in central Asia. Regions with high ensemble-mean LUE, such as Amazonia, central Africa, and tropical Asia, also showed high standard deviation (SD), which suggests a large difference in the estimated LUE among different ESMs. Furthermore, Europe and the central United States also showed large variations in LUE estimates among the 12 ESMs (Fig. 5b). As PAR simulated by the 12 ESMs was quite similar at the global scale (see Fig. S5 in the supplemental material), differences in simulated LUE across ESMs were largely caused by the differences in the simulated GPP (see Fig. S6 in the supplemental material).
To evaluate the differences in the simulated LUE by different ESMs, we constructed a probability density function (PDF) of LUE by calculating the area fraction of land cells for a given LUE for each of the 12 ESMs (see Fig. 6). We grouped the 12 ESMs into three groups based on the peak values of their PDFs. BNU-ESM, CCSM4, HadGEM2, and CanESM2 all have an LUE peak around 0.001 μmol C (μmol PAR)−1; MIROC-ESM, INM-CM4.0, and BCC_CSM1.1 have a peak at an LUE of about 0.005 μmol C (μmol PAR)−1; and the remaining ESMs have a peak at an LUE of about 0.01 μmol C (μmol PAR)−1 (see Fig. 6).
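An area-fraction distribution of this kind can be sketched as follows. The sketch assumes a regular latitude–longitude grid, so that each land cell's area is proportional to the cosine of its latitude, and a fixed bin width; the function names, the binning, and the weighting are assumptions made for illustration rather than the study's procedure.

```python
import math

# Sketch only: cos(latitude) area weights and fixed-width bins are assumptions.

def lue_area_fractions(lue_values, latitudes_deg, bin_width=0.001):
    """Area fraction of land cells per LUE bin.

    Returns a dict keyed by bin index; the lower edge of bin k is k * bin_width.
    """
    weights = {}
    total = 0.0
    for lue, lat in zip(lue_values, latitudes_deg):
        area_weight = math.cos(math.radians(lat))
        bin_index = int(lue // bin_width)
        weights[bin_index] = weights.get(bin_index, 0.0) + area_weight
        total += area_weight
    if total == 0.0:
        raise ValueError("no land cells supplied")
    return {k: w / total for k, w in sorted(weights.items())}


def peak_bin(fractions):
    """Index of the bin holding the largest area fraction."""
    return max(fractions, key=fractions.get)


# Two hypothetical land cells: one at the equator, one at 60 degrees N
fractions = lue_area_fractions([0.0012, 0.0057], [0.0, 60.0])
print(peak_bin(fractions))
```

Grouping models by the location of this peak, as done in the text, then amounts to comparing `peak_bin` across the 12 ESMs.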
c. Budyko framework
Figure 7 shows that annual ET/P [Eq. (3a)] increased nonlinearly with ϕ, as described by the Budyko curve, for both the site-scale observations (ω = 1.28, r2 = 0.34; Fig. 7a) and the ensemble mean of the 12 ESMs (ω = 1.74, r2 = 0.48; Fig. 7b) at the selected sites, where ω is an empirical parameter [see Eqs. (3a)–(3c)]. In the context of Fig. 7, ω does not represent an “equivalent vegetation,” but rather how the land surface component of a given ESM partitions precipitation into evaporation and runoff. The best-fit value of ω for the ensemble mean of the 12 ESMs was larger than that for the FLUXNET measurements, which indicates that the ensemble mean systematically overestimated ET and underestimated runoff for a given value of ϕ.
Although the relationship between ET/P and aridity index from the ensemble mean of the 12 ESMs is quite similar to the observed (see Fig. 7), the best-fit value of ω varied from 0.88 in BCC_CSM1.1 to 2.61 in INM-CM4.0, with a high coefficient of variation of 0.27 (mean of 1.95 and SD of 0.52; see Table 1 and Fig. 8). The best-fit ω values for four other ESMs (BNU-ESM, CanESM2, GFDL-ESM2G, and IPSL-CM5A-LR) are lower than 2.0, and the ω values for the remaining seven ESMs are all larger than 2.0, although the correlation coefficients are all larger than 0.75. This divergence of ω among the 12 ESMs suggests significant differences in the partitioning of precipitation between ET and total runoff. For example, BCC_CSM1.1, with the smallest ω (0.88), partitioned a relatively small proportion of precipitation into ET rather than runoff. In contrast, INM-CM4.0, with the largest ω (2.61), partitioned a relatively large proportion of precipitation into ET. This can also be seen from the ET estimates for INM-CM4.0, which had a significantly larger ET than BCC_CSM1.1 (see Fig. S7 in the supplemental material).
Our study showed that the ensemble mean WUE of the 12 ESMs was generally lower and less variable than the observed WUE across 736 site years. The WUEs modeled by the 12 different ESMs showed little variation, with a mean value of 1.71 and standard deviation of 0.32 mmol C (mol H2O)−1 for all land-cover types except closed shrublands (CSH), as compared with the observed variation of WUE from 0 to 4 mmol C (mol H2O)−1 with a mean value of 1.96 and standard deviation of 0.93 mmol C (mol H2O)−1 [see Fig. 1 and previous reports (Beer et al. 2009; Tang et al. 2014)].
WUE simulated by the multiple ESMs followed a latitudinal pattern similar to that of the data products, which is consistent with previous findings (Ito and Inatomi 2012; Tang et al. 2014; Xue et al. 2015). This indicates that there was strong offsetting among models in the simulated WUE, with some models overpredicting WUE and others underestimating it (Mystakidis et al. 2016). The accuracy of simulated WUE depends on the accuracy of the estimates of GPP, ET, or both. According to the intermodel comparisons, uncertainties in simulated GPP were much larger than those in ET across ESMs; therefore, poor performance in simulating GPP was the most important reason for poorly simulated WUE. In addition, recent studies found that global GPP increased more than global ET in response to the increase in atmospheric CO2 concentration (Cheng et al. 2017), which resulted in an increase in WUE (Keenan et al. 2013). However, the temporal trend of WUE was not examined in this study.
The underlying causes for the differences in GPP and ET between ESMs and data products may come from differences in model input (including dynamically simulated or prescribed land cover, prescribed soil maps, and simulated climate in the models) and in model structures and parameterizations (Prentice et al. 2015). Climate forcing errors were important sources of errors in regional carbon and water simulations, and the errors can be as high as 80% at site scales (Zhao et al. 2012). Here we investigated the effects of differences in model input on simulated GPP and ET. By calculating the difference between the ESM-simulated precipitation and the CRU precipitation at each grid cell, we found that differences in model precipitation can explain up to 31% of differences in the simulated GPP (GFDL-ESM2G, see Fig. S8 in the supplemental material) and up to 25% of differences in the simulated ET (MPI-ESM-LR, see Fig. S9 in the supplemental material). Differences in precipitation had positive, linear, and significant (p < 0.001) effects on the differences in the simulated GPP and ET for all ESMs.
Apart from the errors in climate forcing (simulated by the atmospheric component in an ESM), differences in the representation of key ecosystem processes could be an important source of differences in the simulated GPP and ET. Although global distributions of model differences for GPP or ET differed among the 12 ESMs, it was possible to identify broadscale common deficiencies across models. For example, CanESM2, BNU-ESM, and BCC_CSM1.1 significantly underestimated the GPP of the Amazon rain forests; CanESM2, BCC_CSM1.1, and GFDL-ESM2G underestimated ET over parts of the Amazon rain forest; and GISS-E2-H underestimated ET over parts of the Amazon savanna. Significant underestimations of GPP and ET in these ESMs (particularly in CanESM2) over Amazonia (see Figs. S6 and S7) can be partially explained by the underestimation of precipitation in these regions by the ESMs (Shao et al. 2013). Other likely causes include inadequate representation of root functioning in the current land surface model components of ESMs, such as hydraulic redistribution (Baker et al. 2008; Oliveira et al. 2005) and the root water uptake process (Li et al. 2012), which led to underestimations of both GPP and ET. We also observed that some ESMs underestimated GPP but overestimated ET in some regions (e.g., BNU-ESM in the Amazon rain forest and over the Australian continent). Possible reasons for this include a higher sensitivity of ET than of GPP to precipitation in BNU-ESM. The differences between ESMs and data products can also be attributed to uncertainties within the datasets themselves. For example, JU11 GPP data were essentially upscaled from site-derived GPP, but there were only a few measurement sites in the Amazon region (Oliphant 2012). For ET products, there were also large spatial differences among JU11, GLEAM, and EVAL, particularly in tropical regions and North America (see Fig. S3).
In addition, the number of land-cover types prescribed in a land surface model (LSM) was not always the same across models, and the associated vegetation ecophysiological parameters (e.g., maximum carboxylation rate) and state variables [e.g., leaf area index (LAI)] varied widely (Shao et al. 2013), even though they are crucially important for the estimates of GPP and ET (Lu et al. 2013; Wang et al. 2007). These factors can be important sources of errors and uncertainties for the simulated GPP and ET, and hence for WUE.
We found that intermodel differences in LUE among the 12 ESMs showed spatial patterns quite similar to those of the differences in their GPP (see Figs. S4 and S6), because simulated incoming PAR for a given location is very similar across the 12 models (Fig. S5). This demonstrates that differences in LUE among models mainly resulted from differences in GPP: where GPP was overpredicted by some ESMs in some regions (e.g., GISS-E2-H and HadGEM2-CC in the Amazon forest), LUE was overestimated as well. However, overall, most of the 12 ESMs underestimated GPP and therefore LUE globally.
It is worth noting that, apart from model structures, other factors such as different classifications of land-cover types, the definition of LUE, canopy structure, and diffuse radiation fraction may also contribute to the significant discrepancies between the model simulations and observations (Ruimy et al. 1999; Yuan et al. 2007; L.-X. Zhang et al. 2015). Because the absorbed PAR was not available for some of the 12 ESMs and the selected sites from FLUXNET, we used the incoming PAR, calculated as half of downward shortwave radiation, for calculating LUE. Therefore the influences of some factors, such as canopy LAI, the PAR extinction coefficient, and the fraction of diffuse PAR, on the absorbed PAR (Polley et al. 2011), and therefore on LUE if defined per unit of absorbed PAR, were not accounted for in this study. However, this study found that differences in the estimated LUE among the 12 ESMs for a given location largely resulted from the differences in GPP estimates. For some tropical evergreen forests, the GPP estimated by some ESMs was very high (>5000 g C m−2 yr−1), and therefore the LUE was also very high [>0.03 μmol C (μmol PAR)−1]. Future improvement of those ESMs should focus on the regions with very high GPP and LUE values.
All of the 12 ESMs correctly captured the nonlinear dependence of ET/P, or Roff/P, on aridity, as described within the Budyko framework. This agrees with the result of Alkama et al. (2013), who found good agreement between the observed and simulated runoff by some CMIP models for most river basins globally. However, the best-fit value of the only parameter ω in the Budyko equation varied from 0.88 to 2.61 among the 12 ESMs. Large variations in the best-fit ω values strongly indicated large discrepancies in the partitioning of precipitation into ET and runoff among the 12 ESMs (see Fig. S10 in the supplemental material). The ensemble mean of the 12 ESMs tended to perform better than most individual ESMs, as judged against the best-fit value of ω from site-scale measurements. This is consistent with the results of Roderick et al. (2014) based on their analysis of CMIP3 models. The best-fit ω value (1.74) of the ensemble model mean was close to the best-fit value (1.28) from FLUXNET measurements in this research but was still less than the reported value of 2.21 [converting the parameter n to ω through ω = n + 0.72 (Yang et al. 2008)], which was based on an old version of global FLUXNET data (Williams et al. 2012). The difference in the estimated value of ω from the FLUXNET dataset between this study and Williams et al. (2012) may result from the exclusion of sites with annual ET exceeding annual precipitation in this study. The mean values of ω at basin scale were positively correlated with long-term averaged vegetation cover (Li et al. 2013), which means that ω should be quite conservative in the absence of significant changes in land use, as in the scenario used in this research. It is also likely that different interpretations of land cover and land use among the 12 ESMs contributed to the large differences in ω, and therefore in the partitioning of rainfall into ET and runoff.
The projected future carbon and water fluxes by ESMs were reported to have large uncertainties (Cox et al. 2013). To reduce uncertainties, observation-based carbon and water flux estimates or their sensitivities to a specific climatic variable as constraints can be used to constrain the projections of ESMs (Cox et al. 2013; Mystakidis et al. 2016; Wenzel et al. 2014). Such statistical relationships between future and historical simulations or observations to constrain projections of ESMs, however, can be spurious because of errors or uncertainties in models or observations themselves (Bracegirdle and Stephenson 2013). As ecological or hydrological relationships between ecosystem functioning and climatic variables have been established on theoretical (e.g., the Budyko framework) and experimental (e.g., WUE or LUE) considerations, they offer useful alternatives in constraining the projected responses of ecosystem functioning to climate change, particularly at annual or longer temporal scales (Huxman et al. 2004; Ruimy et al. 1999).
We recommend that these functional constraints from the ecophysiological and hydrological disciplines be used as an alternative strategy to constrain future projections of ESMs. For example, WUE, LUE, or the Budyko framework can be used in conjunction with FLUXNET measurements and/or other independent observations (e.g., observed runoff at basin scales) to constrain ESM simulations. The corresponding mean values of the parameters or constants of the three functional relationships, and their uncertainties (usually expressed as standard deviations), can then be quantified. We can then correlate the measured parameters or constants with those derived from ESMs for the present or the future, and force the ESM simulations to comply with these functional relationships to reduce the uncertainties of ESM projections, as in the recently reported constraining approach based on WUE (Mystakidis et al. 2016). The current research serves as a preliminary study for using such functional relationships to constrain ESMs.
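As a rough illustration of the first steps described above, the sketch below quantifies an observed functional constraint as a mean and standard deviation and flags simulated values that fall outside that range. It assumes WUE is computed as annual GPP divided by annual ET, and it only screens simulations rather than forcing them to comply; it is not the method of Mystakidis et al. (2016). All function names and numbers are hypothetical and are not taken from FLUXNET or from any ESM.

```python
from statistics import mean, stdev


def wue(gpp, et):
    """Water-use efficiency as GPP per unit ET (assumed definition)."""
    return gpp / et


def constraint_range(observed, k=1.0):
    """Return (mean - k*sd, mean + k*sd) of the observed values.

    Uses the sample standard deviation; k is an illustrative tolerance.
    """
    m = mean(observed)
    s = stdev(observed)
    return m - k * s, m + k * s


def within_constraint(value, bounds):
    """True if a simulated value lies inside the observed range."""
    lower, upper = bounds
    return lower <= value <= upper


# Hypothetical site-year pairs of (GPP, ET); units are illustrative only.
site_years = [(1200.0, 600.0), (900.0, 500.0), (1500.0, 700.0), (600.0, 400.0)]
observed_wue = [wue(gpp, et) for gpp, et in site_years]
bounds = constraint_range(observed_wue)

# Hypothetical simulated (GPP, ET) pairs for two made-up models.
model_a = wue(1000.0, 800.0)  # high ET relative to GPP
model_b = wue(1100.0, 600.0)

for name, value in (("model_a", model_a), ("model_b", model_b)):
    status = "inside" if within_constraint(value, bounds) else "outside"
    print(f"{name}: WUE = {value:.2f} is {status} the observed range")
```

The same pattern could be applied to LUE or to the best-fit Budyko parameter ω, with the tolerance k chosen to reflect how conservative the constraint is expected to be at the time scale of interest.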
Twelve CMIP5 ESMs were evaluated against one set of global GPP data, three sets of ET data, and 736 site years of FLUXNET measurements. We assessed whether the spatial variations of carbon and water fluxes simulated by the 12 ESMs were broadly consistent with our understanding, using three functional constraints: WUE, LUE, and the Budyko framework. Our results showed that most CMIP5 ESMs underestimated both WUE and LUE, as a result of overestimation of ET and underestimation of GPP. In addition, all 12 ESMs underestimated the spatial variation of the observed WUE among different land surface types. Although the partitioning of precipitation into ET and runoff simulated by the 12 CMIP5 ESMs is broadly consistent with the predictions of the Budyko framework, the best-fit value of the Budyko parameter ω was quite variable among the 12 ESMs. Therefore, future model development should focus on concurrently improving the algorithms that partition precipitation between ecosystem ET and runoff and the coupling of carbon and water cycles across different land surface types.
This work is supported by the National Key R&D Program of China (2017YFA0603603) and an Australian Research Council Discovery Early Career Researcher Award Project (DE120103022). UK Met Office authors were supported by the Joint UK BEIS/Defra Met Office Hadley Centre Climate Programme (GA01101) and the European Union’s Horizon 2020 research and innovation programme under Grant Agreement 641816 (CRESCENDO). J. T. acknowledges RCN-funded project EVA (229771) and Bjerknes-BIGCHANGE. We thank the CMIP5 modelers; all simulations were downloaded online (http://cmip-pcmdi.llnl.gov/cmip5/data_portal.html). Global flux data were provided by the global FLUXNET community (http://fluxnet.fluxdata.org/), and the data of Jung et al. (2011) are greatly appreciated.
Supplemental information related to this paper is available at the Journals Online website: https://doi.org/10.1175/JCLI-D-16-0177.s1.
Publisher’s Note: This article was revised on 25 February 2019 to add the open access designation that was missing when originally published.