1. Introduction
Many LSMs were developed and pressed into service during the 1980s and 1990s to provide lower boundary conditions for the atmospheric GCMs used in climate and weather simulation and prediction (Santanello et al. 2018). This occurred at a time when observations of key land surface variables, and the coupled processes that link the water and energy cycles between the land and atmosphere, were extremely limited. As a result, performance of coupled LSM–GCM systems has been suboptimal (Dirmeyer et al. 2018).
The observational datasets needed for validation, which combine collocated measurements of land surface states, surface fluxes, near-surface meteorology, and properties of the atmospheric column, are only recently becoming available. Early field campaigns (e.g., Sellers et al. 1992, 1995; Famiglietti et al. 1999; Jackson and Hsu 2001; Andreae et al. 2002) provided observations that helped advance theory and model parameterization development, but their short periods of operation meant that the collected data sampled only a limited part of the phase space of land–atmosphere interactions and rarely quantified interannual variability. In the mid-1990s, networks of observing stations began to be established and maintained, providing long-term datasets. A growing number of soil moisture monitoring networks have been established. Their data have been collated, homogenized, and standardized by two separate efforts (Dorigo et al. 2011, 2013, 2017; Quiring et al. 2016). Those datasets were used by Dirmeyer et al. (2016) in a first-of-its-kind multimodel multiconfiguration assessment of soil moisture simulation fidelity.
Simultaneously, efforts began in the ecological community to collect surface flux data over a variety of biomes (FLUXNET; Baldocchi et al. 2001). Over time, in consultation with interested scientific communities, FLUXNET expanded its instrumentation suite to measure soil moisture, ground heat flux, and four-component radiation, allowing detailed closure of the surface energy balance. Rigid standards for data formatting and dissemination within and across regional networks were lacking, so a global standardized and quality-controlled subset of data from many FLUXNET sites was produced (La Thuile FLUXNET dataset, see http://www.fluxdata.org) covering multiple links in the coupled land–atmosphere process chain (Santanello et al. 2011). The La Thuile dataset enabled a greater degree of model validation (e.g., Williams et al. 2009; Bonan et al. 2012; Boussetta et al. 2013; Melaas et al. 2013; Balzarolo et al. 2014; Purdy et al. 2016).
In this study, we employ the updated FLUXNET2015 synthesis dataset (Pastorello et al. 2017), expanding the multimodel multiconfiguration study of soil moisture simulations in Dirmeyer et al. (2016) to a global assessment of surface energy and water balance simulations and basic metrics of land–atmosphere coupling. Section 2 describes the observational data and models examined. The next three sections present validations of model annual means, annual cycles, and coupling metrics. We then discuss some of the pathological model behaviors that emerge from the analysis and present conclusions. Throughout the paper, we present synthesis figures. Detailed scatterplots showing results across all FLUXNET2015 sites for each model are consigned to the online supplement.
2. Data and models
The range of dates of data varies considerably among model simulations and also between individual observational sites. We analyze spatial variability and compare only climatologies (annual means or mean annual cycles) in order to minimize the effect of such asynchronicities, and we present a quantification of interannual variability. It is not the intent of this study to validate model simulations of specific events, but rather their overall coupled land–atmosphere behavior. Note also that many coupling metrics, including those used here, can be calculated for LSMs from a combination of forcing and model output, even though the LSMs are not coupled to GCMs.
a. Observed data
In situ measurements of near-surface meteorological variables, surface fluxes, and soil moisture used for model validation come from the November 2016 version of the FLUXNET2015 station dataset. Daily, monthly, and yearly data have been used; processing of the meteorological, radiation, heat flux, and surface hydrologic data, including gap filling, is described by Reichstein et al. (2005) and Vuichard and Papale (2015). Only the tier 1 (open access) data are used in this study (see Table S1 for a complete list of sites); Fig. 1 shows the spatial distribution of sites and some of the key characteristics regarding data availability. A total of 166 sites provide 1242 site-years of data, but coverage is concentrated in the midlatitudes, and there is particular underrepresentation in the tropics.
The variables processed for this analysis include surface pressure, near-surface air temperature and vapor pressure deficit, precipitation, four-component and net radiation, surface sensible and latent heat fluxes [gap filled following the method of Reichstein et al. (2005) and energy balance closure corrected], and soil water content measured at the first (shallowest) sensor. There is no consolidated information on the depth of the shallowest sensor across all sites, but typically it is at 5 or 10 cm below the surface. Vapor pressure deficit is converted to specific humidity using the Clausius–Clapeyron relationship. We have used the provided FLUXNET2015 data at the corresponding time intervals for each calculation: yearly data for annual means, monthly data for annual cycles, and daily data for calculating coupling indices.
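The conversion from vapor pressure deficit to specific humidity can be illustrated with a minimal sketch. It assumes the Bolton (1980) empirical approximation for saturation vapor pressure, which is one common closed-form stand-in for the integrated Clausius–Clapeyron relationship; the exact formulation used in the study is not stated here. The function name and the choice of units (hPa for pressure and vapor pressure deficit, °C for temperature) are illustrative assumptions.

```python
import math


def specific_humidity(vpd_hpa, temp_c, pressure_hpa):
    """Specific humidity (kg/kg) from vapor pressure deficit, air temperature,
    and surface pressure.

    Assumption: saturation vapor pressure follows the Bolton (1980)
    approximation; the study may have used a different formulation.
    """
    # Saturation vapor pressure over liquid water (hPa)
    e_sat = 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))
    # Actual vapor pressure is saturation minus the deficit (floored at zero)
    e = max(e_sat - vpd_hpa, 0.0)
    # Ratio of the molar masses of water vapor and dry air
    eps = 0.622
    return eps * e / (pressure_hpa - (1.0 - eps) * e)
```

For saturated air at 20 °C and 1000 hPa this gives roughly 0.0147 kg/kg, and the result decreases toward zero as the deficit approaches the saturation vapor pressure.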
In addition, we examine a number of gridded global precipitation products for comparison to FLUXNET2015 sites. These are listed in Table S2.
b. Model systems
Four global modeling systems are evaluated: two from operational forecast centers and two that are primarily used for research. The operational systems are from the U.S. National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Prediction (NCEP) and the European Centre for Medium-Range Weather Forecasts (ECMWF). The research systems are from the U.S. National Aeronautics and Space Administration (NASA) Global Modeling and Assimilation Office (GMAO) and the U.S. National Center for Atmospheric Research (NCAR).
Table 1 summarizes the model components and configurations. Generally, each modeling system is interrogated in three different configurations: 1) LSM only (offline), driven by gridded, observationally based meteorological analyses including downward radiation; 2) LSM coupled to GCM in a free-running mode where the coupled system evolves unconstrained after initialization; and 3) reanalysis, where the coupled LSM and GCM are constrained by data assimilation at diurnal or subdiurnal increments to represent the actual historical evolution of state variables. The NCAR model system does not have an associated reanalysis, so to keep the four-by-three matrix filled, two different reanalyses from GMAO are included. Note that when the coordinates for a FLUXNET2015 site lie within a model’s ocean grid cell, it is excluded from comparisons for that model. Thus, the number of stations compared varies from model to model depending on resolution and the land–sea mask.
Table 1. Specifications for the four land–atmosphere model systems, including time span of data and spatial resolution. Two-letter abbreviations are used: for the first letter N = NCEP, M = NASA (MERRA system), C = NCAR (Community models), E = ECMWF; for the second letter L = LSM run “offline”, C = LSM coupled to GCM, R = reanalysis (except that two MERRA reanalyses are included, so they are labeled 1 and 2).
1) NCEP
Data for the offline configuration come from an author-produced simulation using Noah LSM version 2.7.1 (Ek et al. 2003; Mitchell 2005) driven by 3-hourly gridded meteorological data from the Terrestrial Hydrology Research Group at Princeton University (Sheffield et al. 2006). The free-running coupled land–atmosphere simulation consists of a subset of 48 years from a 420-yr-long current climate simulation of CFSv2 initialized in 1980 (Shukla et al. 2018). The coupled simulation is unique among the model systems in that it also includes a coupled ocean component. However, this should have very little effect on the local coupled land–atmosphere behavior of the model. Years 2101–48 of the simulation are used, but the calendar dates have no real meaning in a fully coupled climate model so far from the initial state, wherein attributes such as atmospheric composition, solar intensity, orbital parameters, etc., are held constant at late-twentieth-century values. The latest NCEP reanalysis (CFSR; Saha et al. 2010) is also examined. It incorporates a global land data assimilation system, derived from the NASA Land Information System (LIS; Peters-Lidard et al. 2007) and driven by a blended global precipitation analysis (Xie and Arkin 1997; Xie et al. 2007), that updates the coupled analysis cycle once per day over the period 1979–2009.
2) GMAO
Two reanalyses are included for GMAO: version 1 and version 2 of the Modern-Era Retrospective Analysis for Research and Applications (MERRA; Rienecker et al. 2011; Reichle et al. 2017a). MERRA data cover the period 1980–2015. MERRA-2 is the current state-of-the-art reanalysis covering 1980–2015 (Molod et al. 2015; Gelaro et al. 2017) and is the source of most of the meteorological forcing data for the offline simulation of the Catchment LSM (GMAO 2015a,b). As part of the MERRA-2 reanalysis, the GCM-generated precipitation is corrected with observation-based precipitation before it reaches the land surface (Reichle et al. 2017b); the reanalysis meteorological fields thus feel the observed precipitation rates indirectly through the surface fluxes. Additionally, a global 36-yr offline Catchment simulation on the MERRA grid and a 16-yr coupled GEOS-5 Catchment simulation at half-degree resolution with prescribed observed SSTs were generated for this comparison.
3) NCAR
There is no operational reanalysis produced with the NCAR Community Earth System Model (CESM). However, CESM is widely used for research in the academic community, and we have generated offline and coupled simulations for this comparison. The offline simulation uses version 4.5 of the Community Land Model (CLM; Lawrence et al. 2011) driven with forcing spanning 1991–2010 from version 4 of the blended and gap-filled CRU–NCEP (CRUNCEP; Viovy 2013) 0.5° dataset (https://www.earthsystemgrid.org/dataset/ucar.cgd.ccsm4.CRUNCEP.v4.html) aggregated to the nominal 1° GCM resolution. A simulation with CLM4.5 coupled to CAM4 in CESM1.2.2 has been produced spanning 1991–2014 with specified climatological SSTs.
4) ECMWF
The offline simulation from ECMWF is with cycle 43R1 of the Hydrology Tiled ECMWF Scheme of Surface Exchanges over Land (HTESSEL) run at ~16 km resolution based on a cubic octahedral global grid (TCo639) for the period 1979–2015. This offline simulation follows ERA-Interim/land configurations closely (see Balsamo et al. 2015), forced by ERA-Interim meteorology and fluxes with an altitude correction applied to temperature, humidity, and surface pressure. This offline simulation is used to initialize the land state of the operational ECMWF hindcasts. The coupled simulation comes from the Athena Project (Kinter et al. 2013) for 1961–2007, where an older version of HTESSEL is coupled to IFS cycle 32R3 at a similarly high native horizontal resolution and specified observed SSTs, but the data have been postprocessed to a 1.125° uniform grid. ERA-Interim (Dee et al. 2011), spanning 1979–2015, provides the reanalysis configuration of data for the comparison, which used TESSEL prior to hydrology upgrades.
3. Annual means
The comparison of models to FLUXNET2015 observations of annual means amounts to an assessment of model ability to reproduce global spatial patterns (within the limitations of the uneven distribution of station locations) of the variables’ time averages. For the offline LSM simulations, meteorological forcing data are specified from gridded datasets, so their correlation to FLUXNET2015 observations is not a pure reflection of model performance as the forcing data constrain LSM behavior. Similarly, for the reanalysis products, performance reflects a combination of model characteristics, data assimilation techniques, and the distribution and quality of the data assimilated. Assimilation of observational data constrains the coupled land–atmosphere model behavior to some degree. While the free-running model simulations provide an unabridged assessment of model performance, results from the other modes of simulation are nevertheless enlightening.
As an indicator of observational uncertainty and the impact of comparing model gridbox values to field sites, we first note how a number of gridded observational precipitation products and the reanalyses validate against precipitation measurements at FLUXNET2015 locations. Figure 2 shows mean (dots) and span (whiskers) of annual precipitation totals, where the abscissa always corresponds to measurements from the FLUXNET2015 sites. For most sites, the observational products (top two rows of Fig. 2) cover the entire time span of FLUXNET2015 observations (see Table S2 for details). All reanalyses (bottom row of Fig. 2) except CFSR span the FLUXNET2015 period. Several statistics of spatial agreement are shown: Pearson’s product moment correlation coefficient rp, Spearman’s rank correlation coefficient rs, root-mean-square error (RMSE), slope of the best-fit linear regression of Y on X (Slope), and the fraction of total stations at which the span of the individual annual totals from the gridded products (vertical whiskers) overlaps the span from FLUXNET2015 sites (horizontal whiskers); this fraction is labeled “Span Diag” in Fig. 2. The last statistic tests the possibility that the FLUXNET2015 observations and gridded estimates do not come from distinct populations, that is, their ranges overlap.
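The five statistics of spatial agreement described above can be sketched as follows. This is an illustrative implementation, not the authors' code: the function and argument names are hypothetical, the rank correlation is computed as a Pearson correlation of simple ranks (so tied values are assumed absent), and the "Span Diag" fraction is interpreted as the share of sites whose two ranges of annual totals overlap.

```python
import numpy as np


def _ranks(values):
    """Ranks 0..n-1 of the values; ties are not averaged (assumes distinct values)."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(len(values))
    return ranks


def agreement_stats(x_mean, y_mean, x_min, x_max, y_min, y_max):
    """Spatial agreement between site values (x) and gridded values (y).

    x_mean, y_mean: per-site means; *_min, *_max: per-site spans of annual values.
    """
    x = np.asarray(x_mean, dtype=float)
    y = np.asarray(y_mean, dtype=float)
    x_lo, x_hi = np.asarray(x_min, float), np.asarray(x_max, float)
    y_lo, y_hi = np.asarray(y_min, float), np.asarray(y_max, float)

    # Two closed intervals overlap when each one starts before the other ends
    overlap = (y_lo <= x_hi) & (x_lo <= y_hi)

    return {
        "r_p": float(np.corrcoef(x, y)[0, 1]),                  # Pearson
        "r_s": float(np.corrcoef(_ranks(x), _ranks(y))[0, 1]),  # Spearman (no ties)
        "rmse": float(np.sqrt(np.mean((y - x) ** 2))),
        "slope": float(np.polyfit(x, y, 1)[0]),                 # regression of Y on X
        "span_diag": float(np.mean(overlap)),
    }
```

A whisker box crosses the X = Y diagonal exactly when the horizontal and vertical ranges overlap, which is why the interval test is sufficient here.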
Estimates from gridded observational datasets, which range in spatial resolution from 0.25° [Multi-Source Weighted-Ensemble Precipitation (MSWEP), TRMM] to 2.5° (GPCP), provide a plausible upper bound to the accuracy we could expect from gridded Earth system models. For the 166 (or fewer) FLUXNET2015 sites compared, which admittedly represent a rather uneven sampling of global terrestrial precipitation, three observational products score at the top: MSWEP, CPC Unified, and University of Delaware. Each has a Pearson’s correlation of nearly 0.8, a rank correlation between 0.8 and 0.9, and the highest number of stations whose ranges span the diagonal X = Y line. The lower limit for RMSE across these sites is about 240 mm. Note that all gridded products underestimate the slope, indicating the inability of large area averages to resolve local variations in average precipitation.
MERRA-2 performs on par with the best gridded observed products, largely because it reports a bias-corrected precipitation that is used as part of the assimilation process instead of model-generated precipitation as an input to the LSM (Reichle and Liu 2014). Thus, it is effectively another gridded observational dataset for precipitation. Figure S1 compares the precipitation predicted by the model physical parameterizations in MERRA-2 with the corrected version in the same fashion as Fig. 2. The correction greatly reduces bias, cuts RMSE by one-third, slightly improves spatial correlations, and increases the number of stations spanning the diagonal by 28%. CFSR significantly underperforms other reanalyses at FLUXNET2015 locations.
Precipitation is among the most difficult quantities for models to simulate. We expect among near-surface meteorological variables the lowest correlations and largest coefficient of variation for precipitation. It also has many observationally based datasets to choose from, providing a robust estimate of skill to be expected from comparing point measurements to gridded datasets. Figure 2 provides generous thresholds, particularly for correlations, to keep in mind when assessing model simulations of the terms of the surface water and energy balance. As shown below, correlations of 0.7–0.8 are a challenge for models to attain for precipitation, as well as some other water and energy budget terms.
Among near-surface meteorology (e.g., temperature and specific humidity) and downward surface fluxes (including shortwave and longwave radiation), precipitation has the greatest small-scale variability on monthly to annual time scales and is thus the most difficult land surface “forcing” to replicate at the FLUXNET2015 sites. Figures S2–S6 show the scatters and statistics for the models listed in Table 1 for these five variables. Here, the restriction that the years of the models match those at each FLUXNET2015 site is lifted, and the climatologies of the complete datasets are compared. Not surprisingly, the global distribution of annual mean temperature is very well reproduced by the models (Fig. S2), with 88%–96% of the observed variance explained. Observed specific humidity is only slightly less well correlated among the models (Fig. S3), but there is a consistent positive bias relative to FLUXNET2015 measurements. Patterns of annual mean downward radiation (Figs. S4, S5) are well simulated, with a tendency for a slight negative bias in longwave radiation (Fig. S5) and a stronger positive bias in shortwave radiation across models (Fig. S4), consistent with other assessments of model shortwave errors that depend on GCM radiative transfer parameterizations (cf. Slater 2016). Precipitation shows the least agreement; note the bottom row of Fig. S6 is not identical to that of Fig. 2 because the years compared differ. Nevertheless, the results are similar. We can consider MERRA-2 as representing the upper limit of comparison for annual precipitation when the periods do not match between models and observations. Offline Catchment actually performs slightly better than MERRA-2, and CFSv2 is generally the poorest-performing model system in the set. Free-running climate models understandably perform worse than either reanalyses or offline LSM simulations, as they are least constrained by observational data. In the case of CFSv2, there are essentially no constraints within the Earth system as an ocean model is coupled; other free-running simulations have specified SSTs.
Precipitation is a major source of error at the land surface, but so are elements of the radiation budget. We employ Taylor diagrams to synthesize the statistics of correlation across FLUXNET2015 sites; RMSE and standard deviation are normalized by observed values. Figure 3 shows that the global distribution of annual mean downward radiation terms is well simulated across all model configurations, with downward shortwave radiation performing slightly better than downward longwave radiation. Recall for the LSM-only models that downward radiation is an input forcing, and the quality of those datasets can vary significantly (Slater 2016). However, the distribution of upward shortwave radiation is rather poorly simulated, with the NCEP models showing the worst correlations and the NCAR models showing the best (yet explaining less than half of the variance). There is also a strong tendency to underrepresent the spatial variability (normalized standard deviations less than 1) of upward shortwave radiation. This degrades the simulation of net radiation, which has consistently lower correlations than downward radiation terms, yet uniformly better than upward shortwave radiation. The overlap of the spans of annual mean values from models and observations (size of the dots) generally decreases from shortwave down to longwave down to shortwave up.
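The three quantities a Taylor diagram displays for one model and one variable can be sketched as below. The sketch assumes the conventional Taylor construction, in which the RMSE is the centered (bias-removed) pattern error, and normalizes by the observed standard deviation as described above; the function name is hypothetical.

```python
import numpy as np


def taylor_stats(model, obs):
    """Correlation, normalized standard deviation, and normalized centered RMSE
    across sites for one model and one variable."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)

    sigma_obs = obs.std()
    correlation = float(np.corrcoef(model, obs)[0, 1])
    norm_std = float(model.std() / sigma_obs)

    # Centered RMSE: remove each field's mean so that bias does not contribute
    anomaly_diff = (model - model.mean()) - (obs - obs.mean())
    norm_crmse = float(np.sqrt(np.mean(anomaly_diff ** 2)) / sigma_obs)
    return correlation, norm_std, norm_crmse
```

With this normalization the three values obey the law of cosines, norm_crmse² = 1 + norm_std² − 2 · norm_std · correlation, which is what allows all three to be read from a single point on the diagram.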
Figure 3 implies discrepancies in the representation of surface albedo across models at FLUXNET2015 sites. We show a Taylor diagram for calculated albedo in Fig. 4. As there are many sites at relatively high northern latitudes that experience snow cover for some part of the year, snow albedo could specifically be a problem. However, a plot of only the JJA albedo verification shows boreal summer generally has even lower fidelity, and systematically low spatial variability, compared to the annual mean. The overlap between the spans of annual mean albedos ranges among the models from 16% to 38% of FLUXNET2015 sites, but for JJA the range is only 13%–24%.
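One way to calculate albedo from the shortwave components is sketched below. It assumes a flux-weighted definition (total reflected shortwave divided by total incoming shortwave over the averaging period) and a small threshold on incoming radiation to exclude missing or near-dark samples; both choices and the function name are illustrative assumptions, since the text does not specify how the calculation was done.

```python
import numpy as np


def mean_albedo(sw_up, sw_down, min_sw_down=1.0):
    """Flux-weighted mean albedo from upward and downward shortwave (W m-2).

    Samples with missing data or with sw_down at or below min_sw_down are
    skipped (assumed threshold, chosen only for illustration).
    """
    sw_up = np.asarray(sw_up, dtype=float)
    sw_down = np.asarray(sw_down, dtype=float)
    valid = np.isfinite(sw_up) & np.isfinite(sw_down) & (sw_down > min_sw_down)
    if not valid.any():
        return float("nan")
    return float(sw_up[valid].sum() / sw_down[valid].sum())
```

Weighting by flux rather than averaging individual ratios keeps low-light samples from dominating the estimate.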
The low variability could be explained by the fact that most LSMs, whether stand-alone or coupled, have a simple parameterization of albedo based on properties of a small number of vegetation and soil types, often specified as a climatological seasonal cycle. CLM actually calculates surface albedo based on a number of properties including vegetation density and zenith angle of the sun, which may lead to the somewhat better performance of the NCAR models. As described later, the offline NCEP LSM (identified as NL) specifies a multiyear satellite-derived monthly green vegetation fraction as a boundary condition that appears in Fig. 4 to enhance variability, while its positive biases have been noted by Xia et al. (2012). Furthermore, discrepancies between gridbox average albedo and local conditions at field sites, including the effect of vegetation differences and soil moisture on albedo (Zaitchik et al. 2013), could add spatial “noise” to the FLUXNET2015 values relative to what models are representing. Nevertheless, such discrepancies lead to a degradation in the representation of surface available energy that is partitioned between sensible, latent, and ground heat fluxes. Even an otherwise “perfect” LSM could not produce the right values of these fluxes if net radiation is incorrect. Coupled with errors in precipitation, which affect available soil moisture and thus Bowen ratios, LSMs are at a compounded disadvantage in simulating the surface water and energy budget terms.
In Fig. 5 we correlate across the stations the mean errors in key water and energy cycle quantities and present a schematic representation of the relative coupling or connectedness exhibited between terms. This also suggests how errors in the simulation or specification of one term can propagate to others through the land–atmosphere coupling process chain (cf. Santanello et al. 2011). Parameter rs is generally larger than rp because it does not overemphasize outliers and thus is used for this comparison. Ratios show the fraction of models with correlations at the 90% confidence level, and p values are based on the average correlation across models. Note that the number of included stations varies depending on the availability of observed data (recall from Fig. 1 that a number of FLUXNET2015 sites do not allow for albedo estimations) and among models depending on whether the corresponding grid box is water or land. Furthermore, the data saved from the free-running ECMWF model simulations (EC) do not allow for estimation of albedo, so 11 models are compared for albedo.
Unsurprisingly, we find surface net radiation errors correlate strongly to albedo errors, with 11 of 11 models registering significant correlations (two-tailed p values < 0.05), and the multimodel average correlation across 114–118 sites has a p value of 4 × 10⁻⁷. For net radiation versus precipitation, only 2 of 12 models (CL and M1; see Table 1) show significant correlation across 144–151 sites and p = 0.55 for the multimodel average, so no direct arrow is drawn in Fig. 5. Note that precipitation errors arise not only from misrepresentation of land–atmosphere interactions, but also from the parameterization of dynamic and thermodynamic processes (so-called “model physics”) in the GCM.
FLUXNET2015 reports both raw and Bowen-ratio-corrected heat fluxes. Corrected fluxes are available at fewer than 100 of the sites (two-tailed p = 0.05 for correlations
This analysis shows that models have troublesome errors in both the surface water and energy cycles, which make their way into the land–atmosphere coupling process chain. As a result, the degree to which weather and climate models correctly simulate feedbacks of land surface anomalies onto the atmosphere may be cast into some doubt. However, the origins of several sources of error have been identified, and their alleviation can be pursued. In section 5 we will examine directly model fidelity in simulating metrics of land–atmosphere coupling.
4. Mean annual cycle
The next criterion for models, beyond simulating the annual means among FLUXNET2015 sites, is reproducing the annual cycle. The first harmonic is fit to the 12 monthly means for each variable, determining phase and magnitude (half of the valley-to-peak distance) using a standard Fourier transform. Errors in phase and magnitude at each station, quantified across all stations with similar metrics as the annual mean, indicate skill in simulating the annual cycle. Amplitude errors are displayed in conventional scatter diagrams (see Figs. S15–S24), but to display information for phase errors, we have configured the classical scatter diagram in a polar projection (see Figs. S25–S34; the caption of Fig. S25 gives a detailed description of those plots). The whiskers in the supplemental figures again show that models frequently display a smaller range of year-to-year variability than data from FLUXNET2015 sites. This may be partially explained by the scale difference (point measurements will vary more than gridbox averages) but is also likely due to the overly deterministic nature of many model parameterizations (Palmer 2012).
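The first-harmonic fit can be sketched with a discrete Fourier transform of the 12 monthly means. The sketch treats the months as equally spaced and reports the phase as the fractional month index (0 = first month) at which the harmonic peaks; these conventions and the function name are assumptions made for illustration.

```python
import numpy as np


def first_harmonic(monthly_means):
    """Amplitude and peak timing of the annual (first) harmonic.

    Returns (amplitude, peak_index): amplitude is half the valley-to-peak
    distance of the fitted harmonic; peak_index is in [0, 12).
    """
    x = np.asarray(monthly_means, dtype=float)
    if x.size != 12:
        raise ValueError("expected 12 monthly means")

    # Coefficient of the one-cycle-per-year component
    coeff = np.fft.rfft(x)[1]
    amplitude = 2.0 * np.abs(coeff) / x.size
    # For x[n] = A cos(2*pi*(n - n0)/12), the coefficient has angle -2*pi*n0/12
    peak_index = (-np.angle(coeff) * x.size / (2.0 * np.pi)) % x.size
    return float(amplitude), float(peak_index)
```

For example, a series built as 10 + 5 cos(2π(m − 7)/12) for m = 0, …, 11 is recovered with amplitude 5 and peak index 7, and the mean value of 10 has no influence on either result.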
Taylor diagrams summarize the results across models. We focus on depictions of energy budget terms, as they reveal some of the main issues among models. Figure 6 shows model performance in simulating the amplitudes of the annual cycles of net radiation and sensible and latent heat fluxes across FLUXNET2015 sites. All model products demonstrate similar skill for net radiation, with correlations clustered between 0.64 and 0.78 and a tendency toward too large an annual cycle. Only the offline NCEP and coupled ECMWF models have a negative bias in amplitude. Latent heat flux simulations show lower skill for every model, with correlations clustering between 0.28 and 0.43. At the stations where energy-balance-corrected fluxes are provided, correlations improve to 0.37–0.50 (not shown). The positive bias is not so pervasive for latent heat; rather, it appears the positive bias in net radiation tends to be expressed in the sensible heat term. There is also a much larger spread among models for sensible heat, both in terms of correlation (0.14–0.54) and normalized standard deviation (0.78–1.50).
The models’ skill in representing the phase of the annual cycle has a similar distribution (Fig. 7). The phase of net radiation is best represented; latent and sensible heat have spatial correlations of phasing between ~0.8 and 0.92, with sensible heat phases having slightly lower fidelity in general. This is notable because the general consensus is that sensible heat flux is a simpler process to model than latent heat flux, yet it has been shown in other contexts that LSMs struggle more to simulate sensible heat flux (e.g., Best et al. 2015).
The Taylor diagram for the annual cycle of albedo (Fig. 8) shows very similar correlations of the yearly amplitude between models and observations (0.50–0.71) but a large range in standard deviation; Noah v2.7.1 (NL) shows a particularly high value contributing to large RMSE. The phase is better represented by all models, but interestingly the standard deviations are uniformly overestimated. Most models now use global MODIS-based datasets of albedo as either a parameter set or for calibration of surface radiative parameterizations, so the large intermodel spread and lack of obvious clustering within families of models is surprising.
5. Coupling metrics
Correlations between land surface state variables and surface fluxes (the terrestrial leg of coupling) and between land surface fluxes and atmospheric states or properties (atmospheric leg) may indicate feedbacks. For instance, in the terrestrial leg, positive (negative) correlation between soil moisture and latent (sensible) heat flux implies soil moisture control of fluxes (a moisture limited situation) as opposed to energy (net radiation) limited situations where atmospheric states control the fluxes. However, the variance in the driving term(s) must also be sufficiently large for a sensitivity of atmosphere to the land to have a consequential impact on climate, relative to other factors. A coupling index I can be constructed from terms in either leg:
Figure 9 synthesizes the performance of the various model configurations regarding two-legged coupling metrics linking soil moisture to boundary layer properties. The formulae for the coupling indices are indicated on the figure axes calculated from daily mean values. The terrestrial leg quantifies the combined sensitivity (correlation) of surface fluxes [here, latent heat flux (LHF)] to land states [soil moisture (SM)] with variability (standard deviation) of the flux. The atmospheric leg links surface fluxes [sensible heat flux (SHF)] to atmospheric states (LCL, which combines near-surface temperature and humidity information). Larger values denote stronger feedback linkages.
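A minimal sketch of the two indices follows. It assumes that each index is the correlation between the driving and responding variables multiplied by the standard deviation of the responding variable, which is how the terrestrial leg is described above and is consistent with the later discussion of LCL variability in the atmospheric leg. The function name and the handling of missing days are illustrative assumptions.

```python
import numpy as np


def coupling_index(driver, response):
    """Correlation of driver and response times the standard deviation of the
    response, computed from paired daily means.

    Terrestrial leg: driver = soil moisture, response = latent heat flux.
    Atmospheric leg: driver = sensible heat flux, response = LCL height.
    The result carries the units of the response variable.
    """
    driver = np.asarray(driver, dtype=float)
    response = np.asarray(response, dtype=float)

    # Use only days on which both variables are available
    valid = np.isfinite(driver) & np.isfinite(response)
    driver, response = driver[valid], response[valid]

    correlation = np.corrcoef(driver, response)[0, 1]
    return float(correlation * response.std(ddof=1))
```

A usage example under these assumptions would be `coupling_index(sm_daily, lhf_daily)` for the terrestrial leg and `coupling_index(shf_daily, lcl_daily)` for the atmospheric leg, where the array names are placeholders for daily series from one site or grid box.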
In each panel of Fig. 9, similar to the approach of Sippel et al. (2017), quantities are calculated for the three consecutive months that have the warmest average temperature according to the FLUXNET2015 data. We distinguish positive values of each metric, which indicate the existence of feedbacks from land to atmosphere, from negative values (no feedbacks) by coloring the four quadrants by their coupling regimes: red = both legs present and a full coupling pathway; green = the land leg is present, the atmospheric leg is missing; blue = atmospheric leg is present, land is missing; gray = neither leg present. The white dots show where FLUXNET2015 sites fall in this two-dimensional metric space. The colored dots are each model’s rendering of the metrics for the grid boxes containing the FLUXNET2015 sites; the color indicates the quadrant according to the FLUXNET measurements. Thus, the more colored dots that fall in the quadrant with the matching color, the better the model is reproducing the global pattern of coupling regimes.
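The season selection and the quadrant assignment can be sketched as follows. The sketch assumes that the three-month window may wrap around the calendar year (so that December through February can be selected at Southern Hemisphere sites) and that an index of exactly zero counts as no feedback; both are assumptions, and the function names are hypothetical.

```python
import numpy as np


def warmest_three_months(monthly_temp):
    """Zero-based indices of the three consecutive months with the highest
    mean temperature, allowing the window to wrap from December to January."""
    t = np.asarray(monthly_temp, dtype=float)
    # Mean of each month and the two that follow it (circular)
    window_means = (t + np.roll(t, -1) + np.roll(t, -2)) / 3.0
    start = int(np.argmax(window_means))
    return [(start + k) % 12 for k in range(3)]


def coupling_regime(terrestrial_index, atmospheric_index):
    """Quadrant color for a pair of coupling indices."""
    land = terrestrial_index > 0
    atmosphere = atmospheric_index > 0
    if land and atmosphere:
        return "red"    # both legs present
    if land:
        return "green"  # land leg only
    if atmosphere:
        return "blue"   # atmospheric leg only
    return "gray"       # neither leg present
```

Applying `coupling_regime` to the observed indices gives each dot its color, and applying it to a model's indices gives the quadrant in which that dot lands.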
The model centroid usually lies below and to the right of the observed centroid for a given coupling regime, meaning models tend to overestimate the terrestrial coupling index (the rightward offset), yet underestimate the strength of the atmospheric leg (the downward offset). Recall the number of FLUXNET2015 sites compared is not the same for each model. The percentage in each quadrant indicates how many of the FLUXNET2015 sites in that regime are correctly placed in the right quadrant. For instance, the CFSR has 76% of the FLUXNET stations exhibiting both coupling legs (red) in the correct regime. However, there are clearly many dots of other colors also in the red quadrant, showing the model places many other stations erroneously in that regime. Interestingly, none of the models put the few sites with no warm-season coupling in the gray quadrant. Overall, the reanalyses perform best: a 56.5% overall hit rate for the fully coupled regime versus 52.8% for coupled models and 44.0% for offline LSMs, and for the atmosphere-only coupling regime, 49.2% versus 33.0% for coupled models and 31.6% for offline LSMs.
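The per-regime percentages can be sketched as a simple hit rate: of the sites that the observations place in a given regime, the share that the model places in the same regime. The function name and the use of color strings as regime labels are illustrative assumptions.

```python
from collections import Counter


def regime_hit_rates(obs_regimes, model_regimes):
    """Percentage of sites in each observed regime that the model assigns to
    the same regime. Inputs are equal-length sequences of regime labels."""
    if len(obs_regimes) != len(model_regimes):
        raise ValueError("inputs must have the same length")

    totals = Counter(obs_regimes)
    hits = Counter(o for o, m in zip(obs_regimes, model_regimes) if o == m)
    # Counter returns 0 for regimes with no hits
    return {regime: 100.0 * hits[regime] / count for regime, count in totals.items()}
```

This measure does not penalize false placements: a model can score well for the red regime while also putting many sites from other regimes there, as noted above for CFSR.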
We have also examined the performance of the models for their simulation of the observed FLUXNET2015 correlations and standard deviations (the two terms in the coupling indices) separately. As implied previously for the terrestrial leg, there is a positive bias in correlations for all models except for ERA-Interim (Table 2). Bias in the standard deviation of latent heat fluxes across all sites is small for most models, so most of the positive bias in the coupling index comes from the correlation term. The model biases are even stronger in the anticorrelation between soil moisture and sensible heat flux (not shown). However, there is generally an even greater bias in correlations for the atmospheric leg (Table 2) paired in every model with an underrepresentation of the daily variability of the LCL. These two biases compound, leading to the strong underrepresentation of coupling in the atmospheric leg of land–atmosphere interactions.
Table 2. The average value of the two terms used to calculate the terrestrial and atmospheric coupling indices using data from FLUXNET2015, each model, and averages from various groupings of the models. See Table 1 for explanation of two-letter abbreviations.
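As a minimal Python sketch of how the two terms combine, each coupling index can be written as the correlation between a driver and a response multiplied by the standard deviation of the response: soil moisture and latent heat flux for the terrestrial leg, sensible heat flux and LCL height for the atmospheric leg. The function names, the choice of the sample standard deviation, and the toy values are assumptions for illustration, not the processing behind the tabulated values.

```python
import math
from statistics import mean, stdev

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def coupling_index(driver, response):
    """Correlation term times variability term.

    The result carries the units of the response variable. Using the
    sample standard deviation is an assumption of this sketch.
    """
    return pearson_r(driver, response) * stdev(response)

# Toy daily values (hypothetical, not FLUXNET2015 data).
soil_moisture = [0.10, 0.15, 0.20, 0.25, 0.30]    # m3 m-3
latent_heat = [90.0, 105.0, 135.0, 145.0, 170.0]  # W m-2
terrestrial = coupling_index(soil_moisture, latent_heat)
```

Because the index is a product, a model can reproduce an observed index through offsetting errors in the two terms, which is one reason to examine the correlation and the standard deviation separately.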
There are several caveats to note. First, the notion of calculating the atmospheric coupling leg from offline LSM simulations is only partially justifiable. It is certainly possible to calculate the correlations between surface fluxes and LCL height (which depends on near-surface meteorological data supplied as forcing to the LSM), but there is no possibility for the fluxes to affect 2-m temperature or humidity. Thus, this is more of a test of model consistency than a true diagnosis of coupling.
Second, estimates of the correlation component of the coupling indices from observed data must be closer to zero than the true values in nature, because random measurement errors will degrade correlations (Robock et al. 1995). Thus, it is not necessarily wrong that models show a stronger terrestrial coupling leg than FLUXNET2015 data. The degree of impact can be estimated for variables such as soil moisture, whose autocorrelation time scales are much longer than the daily data interval (cf. Dirmeyer et al. 2016) but can be difficult to estimate from small samples or for other quantities. Nevertheless, the fact that models routinely underestimate the strength of the atmospheric leg runs counter to being attributable to random observational errors at FLUXNET sites and likely represents real model bias.
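The attenuation described here follows from the classical result that independent random errors multiply a correlation by the square root of the product of the two variables' reliabilities (true variance divided by true plus error variance). A small Python simulation illustrates the effect. The error magnitude, sample size, seed, and function names are arbitrary assumptions chosen for illustration, not estimates for FLUXNET2015 instruments.

```python
import math
import random

def attenuation_factor(signal_var_x, noise_var_x, signal_var_y, noise_var_y):
    """Factor by which independent random errors shrink a correlation."""
    rel_x = signal_var_x / (signal_var_x + noise_var_x)
    rel_y = signal_var_y / (signal_var_y + noise_var_y)
    return math.sqrt(rel_x * rel_y)

def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_attenuation(r_true, noise_sd, n=20000, seed=0):
    """Return (error-free correlation, correlation with added noise).

    Both true series have unit variance; noise_sd is the standard
    deviation of the synthetic measurement error added to each.
    """
    rng = random.Random(seed)
    x_true, y_true, x_obs, y_obs = [], [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = r_true * x + math.sqrt(1.0 - r_true ** 2) * rng.gauss(0.0, 1.0)
        x_true.append(x)
        y_true.append(y)
        x_obs.append(x + rng.gauss(0.0, noise_sd))
        y_obs.append(y + rng.gauss(0.0, noise_sd))
    return pearson_r(x_true, y_true), pearson_r(x_obs, y_obs)

r_clean, r_noisy = simulate_attenuation(r_true=0.6, noise_sd=0.5)
expected = 0.6 * attenuation_factor(1.0, 0.25, 1.0, 0.25)
```

In this setup the noisy correlation should fall close to `expected` and below the error-free value in magnitude. The same reasoning applies to negative correlations, which are also pulled toward zero.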
Finally, the difference in scale between flux tower measurements (typically representative of conditions in an area of a square kilometer or less) and model gridbox averages (here ranging from 200 to 2 × 10⁴ km²) can affect statistics. Dirmeyer et al. (2016) showed there was little sensitivity of estimates of temporal variations in daily soil moisture to spatial-scale differences in the model gridbox range; however, the same may not be true for other terms or for correlations. The larger the averaging area, the smoother we should expect time series to be, potentially affecting estimation of coupling indices.
6. Discussion and summary
We have compared four different global model systems in multiple configurations (LSM only, LSM coupled to GCM, and reanalysis) with flux tower observations from 166 sites in the global FLUXNET2015 dataset to determine how well they reproduce the spatial distribution of annual means and the annual cycle of state variables and terrestrial surface fluxes and coupling indices between land and atmosphere. Returning to Table 2, there is a separation evident between the three classes of models. For the terrestrial leg of land–atmosphere coupling, all models appear to overestimate correlations between soil moisture and latent heat flux, with the caveat discussed previously that correlations necessarily skew low when calculated from observed data. Nevertheless, assuming as much as a 50% reduction from true correlations, it appears the reanalyses do the best job at reproducing observed correlations, followed by the free-running models and then the uncoupled LSMs. There is a similar stratification for the standard deviation of latent heat flux: reanalyses very closely represent the observed temporal variability of this flux, while coupled models and stand-alone LSMs progressively underestimate it. For the atmospheric leg, represented by the coupling index between sensible heat flux and LCL height, all classes of models severely underestimate the correlation and the day-to-day variability in the LCL. Reanalyses again do the best job at correlations, and stand-alone LSMs are the worst. Here, coupled models fare slightly better than reanalyses in representing LCL variance. Given that reanalyses are somewhat constrained by the assimilation of observations, the errors in those models do not manifest as freely, so it makes sense reanalyses should verify the best. On the other hand, offline LSMs lack some of the coupling we are trying to gauge. For example, surface sensible and latent heat fluxes cannot affect near-surface temperature and humidity in such a configuration. This prescription of near-surface states interferes with the feedback processes.
General characteristics of note are that scatter diagrams of model versus FLUXNET2015 quantities almost always show a linear regression slope indicating a wider range of variation in the observations. Models also tend to have lower interannual variability (length of whiskers) than observations suggest. These traits are consistent with scale differences between model grid cells and the area sampled by flux towers; model grid values represent areas at least 2–4 orders of magnitude larger, which particularly affects precipitation forcing. Thus, this difference is not a concern regarding model performance per se, but rather representativeness across scales.
Another general characteristic is that the models verify better against the corrected surface fluxes and quantities derived from them, wherein observed sensible and latent heat values are adjusted to close the surface energy budget. This makes sense as models close surface energy (and water) budgets by design, whereas closure is not assured in an observational setting where a number of instruments, with different calibrations and error characteristics, contribute separate terms of the surface balances. However, when the propagation of model errors through the energy and water cycles is traced (Fig. 5), EF in models shows strong sensitivity to radiation errors, implying that conservation of Bowen ratio (and thus EF) as a means to correct observed heat fluxes and close the energy balance may not be the most efficacious approach.
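For readers unfamiliar with Bowen-ratio-preserving closure, the idea can be sketched in a few lines of Python: sensible and latent heat flux are multiplied by a single factor so that their sum equals net radiation minus ground heat flux. This is a simplified illustration under that one assumption; the function names and the numbers are hypothetical, and the processing applied to the dataset itself may differ in detail.

```python
def close_energy_balance(h, le, rn, g):
    """Rescale sensible (h) and latent (le) heat flux so h + le = rn - g.

    All terms are in W m-2. Both fluxes share one scale factor, so the
    Bowen ratio h / le and the evaporative fraction are unchanged.
    """
    turbulent = h + le
    if turbulent == 0:
        raise ValueError("h + le is zero; the scale factor is undefined")
    factor = (rn - g) / turbulent
    return h * factor, le * factor

def evaporative_fraction(h, le):
    """Latent heat flux as a fraction of the total turbulent heat flux."""
    return le / (h + le)

# Hypothetical tower values with a 60 W m-2 closure gap.
h_raw, le_raw, rn, g = 100.0, 200.0, 400.0, 40.0
h_cor, le_cor = close_energy_balance(h_raw, le_raw, rn, g)
```

In this example both fluxes are scaled by 1.2, the corrected fluxes sum to the available energy of 360 W m-2, and EF is the same before and after correction. The sketch makes the limitation visible: any error in the radiation or ground heat flux terms is passed to both turbulent fluxes in fixed proportion.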
There are differences that do appear to reflect general model biases. All models and configurations show a positive bias in near-surface humidity (Figs. S3, S14) and downward shortwave radiation (Figs. S4, S17), along with a range of biases in downward longwave radiation (Fig. S5). Such radiation biases are a long-standing problem in global models (cf. Dirmeyer et al. 2006) and stem from problems in the parameterization of atmospheric radiative transfer, clouds, and aerosols in GCMs. However, not all radiative errors are atmospheric in origin; there is clear indication that LSMs struggle to represent the spatial and temporal variability of surface albedo (Figs. 4, 8).
Combined with well-known difficulties that models have in simulating precipitation (Figs. 2, S6, S15, S25), it becomes extremely challenging for models to partition available energy correctly at the surface between latent, sensible, and ground heat fluxes and to reproduce the spatiotemporal patterns of relationships between soil moisture, surface fluxes, and the lower troposphere. Errors in latent heat flux generally correlate significantly with precipitation errors, while sensible heat flux errors relate strongly to surface albedo errors. Evaporative fraction errors connect to both, but more strongly to the energy (albedo–sensible heat flux) pathway than the water (precipitation–latent heat flux) pathway. Height of the LCL, which has a strong negative bias across all models related to the positive humidity bias, has errors that correlate strongly with the water cycle pathway, but also with the energy cycle pathway.
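To make the link between a humidity bias and an LCL bias concrete, a common rule of thumb (often attributed to Espy) puts the LCL roughly 125 m higher for every degree Celsius of dewpoint depression. The Python sketch below applies that approximation to made-up temperatures. It is an assumption for illustration and not necessarily how LCL height was computed in this study.

```python
def lcl_height_m(temperature_c, dewpoint_c):
    """Approximate LCL height above the surface in meters.

    Uses the rule of thumb of about 125 m per degree Celsius of
    dewpoint depression (temperature minus dewpoint).
    """
    return 125.0 * (temperature_c - dewpoint_c)

# Hypothetical afternoon values at one site.
observed_lcl = lcl_height_m(30.0, 15.0)
# Same temperature, but a model that is too humid (dewpoint 2 degrees high).
modeled_lcl = lcl_height_m(30.0, 17.0)
lcl_bias = modeled_lcl - observed_lcl
```

Under this approximation, a 2 °C positive dewpoint bias at unchanged temperature lowers the LCL by 250 m, the same sign of error as the negative LCL bias noted above.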
The spatial distributions of the annual cycles are generally well reproduced for energy budget terms, except for upward shortwave radiation, related to the albedo problems discussed earlier. However, there is a tendency for too strong a seasonal cycle in net radiation, caused by excessive summertime downward shortwave radiation, and expressed more strongly in the annual cycle of sensible heat flux than latent heat flux. Models generally do very well representing the spatial distribution of the phasing of the annual cycle, even for precipitation (64%–92% of variance explained) and soil moisture (40%–61% of variance explained).
Finally, despite the barriers described above to models’ capacity to represent the spatiotemporal distribution of land–atmosphere coupling, we find models often do a reasonable job. Some systematic biases are evident: models consistently overestimate the strength of the terrestrial leg of coupling (namely, too strong a correlation between soil moisture and turbulent heat fluxes), yet even more clearly underestimate the strength of the atmospheric leg (both the correlation between surface fluxes and boundary layer properties and day-to-day variability of boundary layer properties). Random observational error tends to reduce correlations between observed quantities, so it is possible that models are not greatly overestimating the terrestrial leg of coupling, or perhaps are not overestimating it at all. However, we find the time series at most FLUXNET2015 sites are too short to robustly estimate the random error effects on correlation—perhaps in another 10 years we will be able to quantify these errors. Similarly, the spatial-scale differences between observations and model output may contribute to the variance differences in the atmospheric leg, but disparity in correlations between surface fluxes and LCL could only be stronger than calculated here, not weaker, because of the effect of measurement error.
LSMs forced by global gridded meteorology rather than local forcing from the tower sites themselves are handicapped to some degree (cf. Chen et al. 2018). So our most confident conclusion regarding land–atmosphere coupling is that models underrepresent the feedback of surface fluxes on boundary layer properties at FLUXNET2015 sites. We find this unique dataset has potential for model development and parameter optimization to alleviate biases in model configurations shown to mirror those used in forecasting applications (Orth et al. 2016, 2017).
Overall, we conclude that many of the long-known problems and biases in global models of the land–atmosphere portion of the climate system still exist. Nevertheless, there is a fair degree of compensation among errors, such that model representations of land–atmosphere coupling often appear fairly good. Some targets for model improvement are clear, however, as coupling linkages suggest processes where problems may lie. The representation of surface albedo (LSM) and the quantities of downward radiation at the surface (GCM) need improvement among the energy cycle terms, along with the partitioning of available energy between latent and sensible heat flux (a coupled model development problem). Precipitation errors remain large, and inconsistencies in representing soil moisture among models and between models and nature (cf. Koster et al. 2009) remain stubborn issues.
As one might expect, reanalyses tend to perform better, as they are more constrained by observational data. LSMs run offline also benefit from meteorological forcing that is highly observational in origin, but can be handicapped by their lack of two-way interaction with the lower troposphere. It should be clear from the various figures that individual models perform better or worse at simulating specific facets of land–atmosphere interactions. However, we emphasize here the commonalities among models more than the differences. This study is not primarily intended as a model intercomparison, but rather a multimodel attempt to draw model-independent conclusions about the current state of performance of land–atmosphere models (in various configurations) by comparing them with a new and unique observational dataset.
Furthermore, this study is not a final judgement, but a first look that will hopefully catalyze accelerated development and improvement in coupled land–atmosphere modeling. Application of cross-component metrics like coupling indices can reveal prime areas for model development that are not evident from piecewise evaluation of model components. The next step is intensive, focused sensitivity studies with individual models, preferably validated in the context of coupled model systems, that will zero in on the problematic parameterizations. We may also need to revisit some of the fundamental assumptions that underpin the formulations in models (e.g., Cheng et al. 2017).
It is also clear that long-term observational monitoring is highly valuable, and that value only increases with the duration of datasets at individual sites. Broader spatial coverage of flux tower sites, especially in undermonitored regions outside middle and high latitudes, would further increase their overall usefulness to model development.
Acknowledgments
This work has been primarily supported by National Aeronautics and Space Administration Grant NNX13AQ21G. NCAR model simulations were conducted with support from the National Science Foundation Grant AGS-1419445. The ERA-Interim reanalysis data are provided by ECMWF and processed by LSCE. This work uses eddy covariance data acquired and shared by the FLUXNET community (listed in Table S1), including these networks: AmeriFlux, AfriFlux, AsiaFlux, CarboAfrica, CarboEuropeIP, CarboItaly, CarboMont, ChinaFlux, Fluxnet-Canada, GreenGrass, ICOS, KoFlux, LBA, NECC, OzFlux-TERN, TCOS-Siberia, and USCCC. The FLUXNET eddy covariance data processing and harmonization was carried out by the European Fluxes Database Cluster, AmeriFlux Management Project, and Fluxdata project of FLUXNET, with the support of CDIAC and ICOS Ecosystem Thematic Center, and the OzFlux, ChinaFlux and AsiaFlux offices. Taylor diagrams were produced using a modified version of the GrADS script developed by Bin Guan. We thank Cristina Benzo for her contributions to produce Table S1 and Eleanor Blyth and two anonymous reviewers for their helpful review comments.
REFERENCES
Andreae, M. O., and Coauthors, 2002: Biogeochemical cycling of carbon, water, energy, trace gases, and aerosols in Amazonia: The LBA-EUSTACH experiments. J. Geophys. Res., 107, 8066, https://doi.org/10.1029/2001JD000524.
Baldocchi, D., and Coauthors, 2001: FLUXNET: A new tool to study the temporal and spatial variability of ecosystem-scale carbon dioxide, water vapor and energy flux densities. Bull. Amer. Meteor. Soc., 82, 2415–2434, https://doi.org/10.1175/1520-0477(2001)082<2415:FANTTS>2.3.CO;2.
Balsamo, G., and Coauthors, 2015: ERA-Interim/Land: A global land surface reanalysis data set. Hydrol. Earth Syst. Sci., 19, 389–407, https://doi.org/10.5194/hess-19-389-2015.
Balzarolo, M., and Coauthors, 2014: Evaluating the potential of large-scale simulations to predict carbon fluxes of terrestrial ecosystems over a European Eddy Covariance network. Biogeosciences, 11, 2661–2678, https://doi.org/10.5194/bg-11-2661-2014.
Best, M. J., and Coauthors, 2015: The plumbing of land surface models: Benchmarking model performance. J. Hydrometeor., 16, 1425–1442, https://doi.org/10.1175/JHM-D-14-0158.1.
Bonan, G. B., K. W. Oleson, R. A. Fisher, G. Lasslop, and M. Reichstein, 2012: Reconciling leaf physiological traits and canopy flux data: Use of the TRY and FLUXNET databases in the Community Land Model version 4. J. Geophys. Res., 117, G02026, https://doi.org/10.1029/2011JG001913.
Boussetta, S., G. Balsamo, A. Beljaars, T. Kral, and L. Jarlan, 2013: Impact of a satellite-derived leaf area index monthly climatology in a global numerical weather prediction model. Int. J. Remote Sens., 34, 3520–3542, https://doi.org/10.1080/01431161.2012.716543.
Chen, L., P. A. Dirmeyer, Z. Guo, and N. M. Schultz, 2018: Pairing FLUXNET sites to validate model representations of land use/land cover change. Hydrol. Earth Syst. Sci., 22, 111–125, https://doi.org/10.5194/hess-22-111-2018.
Cheng, Y., C. Sayde, Q. Li, J. Basara, J. Selker, E. Tanner, and P. Gentine, 2017: Failure of Taylor’s hypothesis in the atmospheric surface layer and its correction for eddy-covariance measurements. Geophys. Res. Lett., 44, 4287–4295, https://doi.org/10.1002/2017GL073499.
Dee, D. P., and Coauthors, 2011: The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Quart. J. Roy. Meteor. Soc., 137, 553–597, https://doi.org/10.1002/qj.828.
Dirmeyer, P. A., 2011: The terrestrial segment of soil moisture–climate coupling. Geophys. Res. Lett., 38, L16702, https://doi.org/10.1029/2011GL048268.
Dirmeyer, P. A., R. D. Koster, and Z. Guo, 2006: Do global models properly represent the feedback between land and atmosphere? J. Hydrometeor., 7, 1177–1198, https://doi.org/10.1175/JHM532.1.
Dirmeyer, P. A., S. Kumar, M. J. Fennessy, E. L. Altshuler, T. DelSole, Z. Guo, B. Cash, and D. Straus, 2013: Model estimates of land-driven predictability in a changing climate from CCSM4. J. Climate, 26, 8495–8512, https://doi.org/10.1175/JCLI-D-13-00029.1.
Dirmeyer, P. A., and Coauthors, 2016: Confronting weather and climate models with observational data from soil moisture networks over the United States. J. Hydrometeor., 17, 1049–1067, https://doi.org/10.1175/JHM-D-15-0196.1.
Dirmeyer, P. A., P. Gentine, M. B. Ek, and G. Balsamo, 2018: Land surface processes relevant to S2S prediction. The Gap between Weather and Climate Forecasting: Sub-Seasonal to Seasonal Prediction, A. W. Robertson and F. Vitart, Eds., Elsevier, in press.
Dorigo, W. A., and Coauthors, 2011: The International Soil Moisture Network: A data hosting facility for global in situ soil moisture measurements. Hydrol. Earth Syst. Sci., 15, 1675–1698, https://doi.org/10.5194/hess-15-1675-2011.
Dorigo, W. A., and Coauthors, 2013: Global automated quality control of in situ soil moisture data from the International Soil Moisture Network. Vadose Zone J., 12 (3), https://doi.org/10.2136/vzj2012.0097.
Dorigo, W. A., and Coauthors, 2017: ESA CCI Soil Moisture for improved Earth system understanding: State-of-the art and future directions. Remote Sens. Environ., 203, 185–215, https://doi.org/10.1016/j.rse.2017.07.001.
Ek, M. B., K. E. Mitchell, Y. Lin, E. Rogers, P. Grunmann, V. Koren, G. Gayno, and J. D. Tarpley, 2003: Implementation of Noah land surface model advances in the National Centers for Environmental Prediction operational mesoscale Eta model. J. Geophys. Res., 108, 8851, https://doi.org/10.1029/2002JD003296.
Famiglietti, J. S., and Coauthors, 1999: Ground-based investigation of soil moisture variability within remote sensing footprints during the Southern Great Plains 97 (SGP97) hydrology experiment. Water Resour. Res., 35, 1839–1851, https://doi.org/10.1029/1999WR900047.
Gelaro, R., and Coauthors, 2017: The Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2). J. Climate, 30, 5419–5454, https://doi.org/10.1175/JCLI-D-16-0758.1.
GMAO, 2015a: MERRA-2 inst1_2d_lfo_Nx: 2d, 1-hourly, instantaneous, single-level, assimilation, land surface forcings V5.12.4. Goddard Earth Sciences Data and Information Services Center, accessed 3 July 2016, https://doi.org/10.5067/RCMZA6TL70BG.
GMAO, 2015b: MERRA-2 tavg1_2d_lfo_Nx: 2d, 1-hourly, time-averaged, single-level, assimilation, land surface forcings V5.12.4. Goddard Earth Sciences Data and Information Services Center, accessed 3 July 2016, https://doi.org/10.5067/L0T5GEG1NYFA.
Jackson, T. J., and A. Y. Hsu, 2001: Soil moisture and TRMM microwave imager relationships in the Southern Great Plains 1999 (SGP99) experiment. IEEE Trans. Geosci. Remote Sens., 39, 1632–1642, https://doi.org/10.1109/36.942541.
Kinter, J. L., III, and Coauthors, 2013: Revolutionizing climate modeling with Project Athena: A multi-institutional, international collaboration. Bull. Amer. Meteor. Soc., 94, 231–245, https://doi.org/10.1175/BAMS-D-11-00043.1.
Koster, R. D., Z. Guo, P. A. Dirmeyer, R. Yang, K. Mitchell, and M. J. Puma, 2009: On the nature of soil moisture in land surface models. J. Climate, 22, 4322–4335, https://doi.org/10.1175/2009JCLI2832.1.
Lawrence, D. M., and Coauthors, 2011: Parameterization improvements and functional and structural advances in version 4 of the Community Land Model. J. Adv. Model. Earth Syst., 3, M03001, https://doi.org/10.1029/2011MS00045.
Mahanama, S. P. P., and Coauthors, 2015: Land boundary conditions for the Goddard Earth Observing System model version 5 (GEOS-5) climate modeling system—Recent updates and data file descriptions. NASA/TM-2015-104606, Vol. 39, 55 pp., https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20160002967.pdf.
Melaas, E. K., A. D. Richardson, M. A. Friedl, D. Dragoni, C. M. Gough, M. Herbst, L. Montagnani, and E. Moors, 2013: Using FLUXNET data to improve models of springtime vegetation activity onset in forest ecosystems. Agric. For. Meteor., 171–172, 46–56, https://doi.org/10.1016/j.agrformet.2012.11.018.
Mitchell, K., 2005: The community Noah land-surface model (LSM) user’s guide, version 2.7.1. NOAA/NCEP Doc., 26 pp.
Molod, A., L. Takacs, M. Suarez, and J. Bacmeister, 2015: Development of the GEOS-5 atmospheric general circulation model: Evolution from MERRA to MERRA2. Geosci. Model Dev., 8, 1339–1356, https://doi.org/10.5194/gmd-8-1339-2015.
Orth, R., E. Dutra, and F. Pappenberger, 2016: Improving weather predictability by including land surface model parameter uncertainty. Mon. Wea. Rev., 144, 1551–1569, https://doi.org/10.1175/MWR-D-15-0283.1.
Orth, R., E. Dutra, I. F. Trigo, and G. Balsamo, 2017: Advancing land surface model development with satellite-based Earth observations. Hydrol. Earth Syst. Sci., 21, 2483–2495, https://doi.org/10.5194/hess-21-2483-2017.
Palmer, T. N., 2012: Towards the probabilistic Earth-system simulator: A vision for the future of climate and weather prediction. Quart. J. Roy. Meteor. Soc., 138, 841–861, https://doi.org/10.1002/qj.1923.
Pastorello, G. Z., D. Papale, H. Chu, C. Trotta, D. A. Agarwal, E. Canfora, D. D. Baldocchi, and M. S. Torn, 2017: A new data set to keep a sharper eye on land-air exchanges. Eos, Trans. Amer. Geophys. Union, 98, https://doi.org/10.1029/2017EO071597.
Peters-Lidard, C. D., and Coauthors, 2007: High performance earth system modeling with NASA/GSFC’s Land Information System. Innovations Syst. Software Eng., 3, 157–165, https://doi.org/10.1007/s11334-007-0028-x.
Purdy, A. J., J. B. Fisher, M. L. Goulden, and J. S. Famiglietti, 2016: Ground heat flux: An analytical review of 6 models evaluated at 88 sites and globally. J. Geophys. Res. Biogeosci., 121, 3045–3059, https://doi.org/10.1002/2016JG003591.
Quiring, S. M., T. W. Ford, J. K. Wang, A. Khong, E. Harris, T. Lindgren, D. W. Goldberg, and Z. Li, 2016: North American Soil Moisture Database: Development and applications. Bull. Amer. Meteor. Soc., 97, 1441–1460, https://doi.org/10.1175/BAMS-D-13-00263.1.
Reichle, R. H., and Q. Liu, 2014: Observation-corrected precipitation estimates in GEOS-5. NASA/TM-2014-104606, Vol. 35, 18 pp., http://gmao.gsfc.nasa.gov/pubs/docs/Reichle734.pdf.
Reichle, R. H., C. S. Draper, Q. Liu, M. Girotto, S. P. Mahanama, R. D. Koster, and G. De Lannoy, 2017a: Assessment of MERRA-2 land surface hydrology estimates. J. Climate, 30, 2937–2960, https://doi.org/10.1175/JCLI-D-16-0720.1.
Reichle, R. H., Q. Liu, R. Koster, C. Draper, S. Mahanama, and G. Partyka, 2017b: Land surface precipitation in MERRA-2. J. Climate, 30, 1643–1664, https://doi.org/10.1175/JCLI-D-16-0570.1.
Reichstein, M., and Coauthors, 2005: On the separation of net ecosystem exchange into assimilation and ecosystem respiration: Review and improved algorithm. Global Change Biol., 11, 1424–1439, https://doi.org/10.1111/j.1365-2486.2005.001002.x.
Rienecker, M. M., and Coauthors, 2011: MERRA: NASA’s Modern-Era Retrospective Analysis for Research and Applications. J. Climate, 24, 3624–3648, https://doi.org/10.1175/JCLI-D-11-00015.1.
Robock, A., K. Ya. Vinnikov, C. A. Schlosser, N. A. Speranskaya, and Y. Xue, 1995: Use of midlatitude soil moisture and meteorological observations to validate soil moisture simulations with biosphere and bucket models. J. Climate, 8, 15–35, https://doi.org/10.1175/1520-0442(1995)008<0015:UOMSMA>2.0.CO;2.
Saha, S., and Coauthors, 2010: The NCEP Climate Forecast System Reanalysis. Bull. Amer. Meteor. Soc., 91, 1015–1057, https://doi.org/10.1175/2010BAMS3001.1.
Santanello, J. A., Jr., C. D. Peters-Lidard, and S. V. Kumar, 2011: Diagnosing the sensitivity of local land–atmosphere coupling via the soil moisture–boundary layer interaction. J. Hydrometeor., 12, 766–786, https://doi.org/10.1175/JHM-D-10-05014.1.
Santanello, J. A., Jr., and Coauthors, 2018: Land–atmosphere interactions: The LoCo perspective. Bull. Amer. Meteor. Soc., https://doi.org/10.1175/BAMS-D-17-0001.1, in press.
Sellers, P. J., F. G. Hall, G. Asrar, D. E. Strebel, and R. E. Murphy, 1992: An overview of the First International Satellite Land Surface Climatology Project (ISLSCP) Field Experiment (FIFE). J. Geophys. Res., 97, 18 345–18 372, https://doi.org/10.1029/92JD02111.
Sellers, P. J., and Coauthors, 1995: The Boreal Ecosystem–Atmosphere Study (BOREAS): An overview and early results from the 1994 field year. Bull. Amer. Meteor. Soc., 76, 1549–1577, https://doi.org/10.1175/1520-0477(1995)076<1549:TBESAO>2.0.CO;2.
Sheffield, J., G. Goteti, and E. F. Wood, 2006: Development of a 50-yr high-resolution global dataset of meteorological forcings for land surface modeling. J. Climate, 19, 3088–3111, https://doi.org/10.1175/JCLI3790.1.
Shukla, R. P., B. Huang, L. Marx, J. L. Kinter, and C.-S. Shin, 2018: Predictability and prediction of Indian summer monsoon by CFSv2: Implication of the initial shock effect. Climate Dyn., 50, 159–178, https://doi.org/10.1007/s00382-017-3594-0.
Sippel, S., J. Zscheischler, M. D. Mahecha, R. Orth, M. Reichstein, M. Vogel, and S. I. Seneviratne, 2017: Refining multi-model projections of temperature extremes by evaluation against land–atmosphere coupling diagnostics. Earth Syst. Dyn., 8, 387–403, https://doi.org/10.5194/esd-8-387-2017.
Slater, A. G., 2016: Surface solar radiation in North America: A comparison of observations, reanalyses, satellite, and derived products. J. Hydrometeor., 17, 401–420, https://doi.org/10.1175/JHM-D-15-0087.1.
Viovy, N., 2013: CRUNCEP data set for 1901–2010. Climate Data Gateway at NCAR, accessed 3 October 2016, https://www.earthsystemgrid.org/dataset/ucar.cgd.ccsm4.CRUNCEP.v4.html.
Vuichard, N., and D. Papale, 2015: Filling the gaps in meteorological continuous data measured at FLUXNET sites with ERA-Interim reanalysis. Earth Syst. Sci. Data, 7, 157–171, https://doi.org/10.5194/essd-7-157-2015.
Williams, M., and Coauthors, 2009: Improving land surface models with FLUXNET data. Biogeosciences, 6, 1341–1359, https://doi.org/10.5194/bg-6-1341-2009.
Xia, Y., and Coauthors, 2012: Continental-scale water and energy flux analysis and validation for the North American Land Data Assimilation System project phase 2 (NLDAS-2): 1. Intercomparison and application of model products. J. Geophys. Res., 117, D03109, https://doi.org/10.1029/2011JD016048.
Xie, P., and P. A. Arkin, 1997: Global precipitation: A 17-year monthly analysis based on gauge observations, satellite estimates, and numerical model outputs. Bull. Amer. Meteor. Soc., 78, 2539–2558, https://doi.org/10.1175/1520-0477(1997)078<2539:GPAYMA>2.0.CO;2.
Xie, P., M. Chen, A. Yatagai, T. Hayasaka, Y. Fukushima, and S. Yang, 2007: A gauge-based analysis of daily precipitation over East Asia. J. Hydrometeor., 8, 607–626, https://doi.org/10.1175/JHM583.1.
Zaitchik, B. F., J. A. Santanello, S. V. Kumar, and C. D. Peters-Lidard, 2013: Representation of soil moisture feedbacks during drought in NASA Unified WRF (NU-WRF). J. Hydrometeor., 14, 360–367, https://doi.org/10.1175/JHM-D-12-069.1.