Accounting for the Effect of Noise in Satellite Soil Moisture Data on Estimates of Land–Atmosphere Coupling Using Information Theoretical Metrics

Abedeh Abdolghafoorian, Center for Ocean–Land–Atmosphere Studies, George Mason University, Fairfax, Virginia

https://orcid.org/0000-0002-2996-6293
and
Paul A. Dirmeyer, Center for Ocean–Land–Atmosphere Studies, George Mason University, Fairfax, Virginia


Abstract

Land states can affect the atmosphere through their control of surface turbulent fluxes and the subsequent impact of those fluxes on boundary layer properties. Information theoretic (IT) metrics are ideal to study the strength and type of coupling between surface soil moisture (SM) and land surface heat fluxes (HFs) because they are nonparametric and thus appropriate for the analysis of highly complex Earth systems containing nonlinear cause-and-effect interactions that may have nonnormal distributions. Specifically, a methodology for the estimation of IT metrics from noisy time series is proposed, accounting for random errors in satellite-based SM data. Performance of the proposed method is demonstrated through synthetic tests. Efficacy of the method is greatest for estimates of entropy and mutual information involving SM; improvements to estimates of transfer entropy are significant but less stark. A global depiction of the information flow between SM and HFs is then constructed from observationally based gridded data. This is used as independent verification for two configurations of the ECMWF modeling system: unconstrained open-loop (retrospective forecasts) and constrained by data assimilation (ERA5). Compared to studies that only investigate the linear SM–HF relationships, extended regions of significant terrestrial coupling are found over the globe, as IT metrics enable detection of nonlinear dependencies. The magnitude and spatial variability of coupling strength and type from models show discrepancies with those from observations, highlighting the potential to improve SM and HF covariability within models. Although ERA5 did not perform better than the unconstrained model in very dry climates, its performance is generally superior to that of the unconstrained model across metrics.

© 2022 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Abedeh Abdolghafoorian, aabdolgh@gmu.edu


1. Introduction

Land states affect the atmosphere through their control of surface fluxes, and the subsequent impact of fluxes on boundary layer properties. However, land–atmosphere (LA) coupling type and strength are not consistently quantified and weather and climate prediction models have large differences in their diagnosed LA coupling (Abdolghafoorian and Dirmeyer 2021; Dirmeyer 2011; Guo et al. 2006; Koster et al. 2006). Observational (satellite-derived) analyses of land states and surface fluxes provide global coverage spanning many years, and they show great promise for diagnosing LA coupling. Such data can also be used to assess the performance of models, diagnose errors in their structure, and potentially improve their prediction skill. Metrics that quantify LA coupling strength provide a way to understand the processes, linkages, and pathways by which variations at the land surface can affect weather and climate (Santanello et al. 2015). Many are correlation-based metrics, including the Pearson correlation coefficient and terrestrial coupling index, which have been used to characterize the coupling between soil moisture (SM) and surface heat fluxes (e.g., Abdolghafoorian and Dirmeyer 2021; Dirmeyer 2011; Guo et al. 2006; Lawrence et al. 2007). However, these metrics can only detect linear relationships between variables and leave nonlinearity unexamined. In addition, even though “correlation does not imply causality” is a well-accepted concept in the geoscience community, only a few studies, compared with the number of correlation-based studies, have assessed causality between components of LA interactions (e.g., Ruddell and Kumar 2009; Salvucci et al. 2002; Li et al. 2020; Tawia Hagan et al. 2019).

Information theory approaches are nonparametric and are free of assumptions such as that the relationship between variables is linear or that the distribution of a variable is Gaussian; those are the assumptions that underpin typical correlation-based statistics (Hsu and Dirmeyer 2021). Therefore, information theoretic (IT) metrics are attractive for the analysis of highly complex Earth systems with nonlinear cause-and-effect interactions. IT metrics can be applied to time series variables to provide more robust statements regarding forcing and feedback interactions (Goodwell et al. 2020; Kumar et al. 2018; Ruddell et al. 2013; Lou et al. 2022; Gerken et al. 2019; Bennett et al. 2019). However, both conventional and IT metrics are compromised by the presence of errors in observational data. Specifically, random errors in observations, which are prominent in remote sensing data, can lead to underestimates of covariability and thus coupling between land and atmosphere states.

In this study, we focus on relationships between SM and turbulent surface fluxes of sensible heat (SH) and latent heat (LE), which are recognized as key factors affecting short-to-long-term Earth system prediction skill (Dirmeyer et al. 2019). LA coupling encompasses both the degree to which a surface flux responds to anomalies in land surface SM, and the robustness of that response. We employ IT metrics to study the strength and type of coupling between SM and surface heat fluxes. This includes quantifying contemporaneous dependencies (i.e., shared information) between SM and SH or LE as well as time-lagged dependencies (i.e., the degree to which antecedent SM, recognized as a source of prediction, determines the future of surface fluxes beyond the degree to which surface fluxes already disambiguate their own futures). IT metrics enable us to account for both nonlinear and linear dependencies between SM and surface heat fluxes, particularly over regions that experience transitions among various evaporation regimes (Haghighi et al. 2018; Schwingshackl et al. 2017; Short Gianotti et al. 2019; Hsu and Dirmeyer 2021), where typical correlation-based statistics usually underestimate the strength of SM–surface flux coupling.

We first construct, from observationally based gridded datasets, a global depiction of the information flow and coupling type between SM and surface heat fluxes. Given that the satellite-based measurements of soil moisture suffer from observational random errors (Dirmeyer et al. 2016; Kumar et al. 2018), the coupling metrics from observationally based datasets are subject to bias (Findell et al. 2015). Abdolghafoorian and Dirmeyer (2021) addressed the effect of random noise on correlation-based coupling metrics and obtained the “corrected” metrics from an analytical equation. One of the main contributions of this study lies in the analytical derivation of IT metrics from noisy variables and the proposition of a method to estimate noiseless IT metrics from noisy variables. We first approximate the observational errors in satellite-based SM data using established methods and then compute the strength of coupling metrics while accounting for those errors by implementing a novel and robust adjustment method.

We employ global maps of estimated noise-free information flow derived from observationally based products as independent verifications for the performance of two configurations of a global environmental modeling system (viz., unconstrained open-loop retrospective forecasts and a reanalysis constrained by data assimilation). This evaluation allows assessment of the embedded land surface model (LSM), atmospheric model components that interact at the surface, and the effect of data assimilation on the performance of the model system. Section 2 describes the in situ measurements, observational (satellite-based) products, and models used in the study. Section 3 describes the IT metrics, derivations of IT metrics from noisy variables, and the proposed method for accounting for random errors in observations. The evaluation of the proposed method, analyses of the SM–surface flux coupling strength, and categorical types in observations and models are presented and discussed in section 4. Section 5 presents a summary of the main conclusions of this study.

2. Data

In this study, we use global gridded observationally based datasets of surface soil moisture and land surface latent and sensible heat fluxes at the daily time scale to evaluate two configurations of the European Centre for Medium-Range Weather Forecasts (ECMWF) modeling system. We also use daily mean in situ measurements of surface soil moisture and heat fluxes to validate the accuracy of observationally based products. All analyses of global fields are performed at a common 1° spatial resolution and daily temporal resolution over the 20-yr period 1995–2014.

a. In situ measurements

We use daily mean measurements of surface latent heat fluxes and surface soil moisture from the FLUXNET dataset (https://fluxnet.org/data/fluxnet2015-dataset/). The top (i.e., closest to the surface) soil water content measurement is always used to represent surface soil moisture. Although subsurface soil moisture also has an influence on fluxes, especially latent heat, we use only surface soil moisture as it is observable globally by satellite and underpins the global gridded flux products we also use. We analyze the field sites with at least 4 years of data during JJA (see Table A1 in appendix A).

b. Observationally based datasets

We use a number of global or near-global gridded estimates of surface heat fluxes and soil moisture that are based directly or indirectly on observations.

1) Land surface fluxes

(i) FLUXCOM

FLUXCOM provides global (excluding unvegetated hot and cold deserts) land surface fluxes obtained using machine learning algorithms trained by heat flux measurements from FLUXNET eddy covariance towers (Jung et al. 2019). We use the remote sensing and meteorological data-based (RS+METEO) FLUXCOM product, which also uses daily ECMWF Reanalysis v5 (ERA5) meteorological data (Hersbach et al. 2020; see section 2c) as well as mean seasonal satellite-derived land surface variables from MODIS to better constrain estimates of surface fluxes.

(ii) GLEAM

The Global Land Evaporation Amsterdam Model (GLEAM) provides daily estimates of terrestrial evaporation and sensible heat flux from satellite and reanalysis datasets using a simple model based on the Priestley–Taylor equation. In this study, we use version 3.5a of the GLEAM product, which uses air temperature and radiation analyses from ERA5 (see below) and a combination of gauge-based, reanalysis, and satellite-based precipitation (Martens et al. 2017). Although sensible heat flux data are not part of the official release of GLEAM data, they are calculated in the process of estimating surface evaporation, as the residual of the energy balance, ignoring changes in energy storage, and those values are used here.

2) Soil moisture

(i) ESA CCI

The ESA Climate Change Initiative (CCI) provides the longest available observational, global soil moisture (ESA CCI SM) datasets (Dorigo et al. 2017). Here, we use version 6.1 of the ESA CCI SM combined product, which integrates various single-sensor active and passive microwave soil moisture products. Because the source of data is from polar-orbiting satellites, the product has multiple gaps across the world, e.g., seasonal gaps over northern latitudes or continuous gaps over tropical rain forests, as well as daily missing swaths depending on satellite orbits, especially earlier in the period.

(ii) GLEAM

We use the GLEAM V3.5a surface layer (0–10 cm) soil moisture in the synthetic tests only (section 4a). The GLEAM algorithm estimates root-zone SM mainly based on the parameterized physical processes and the use of extensive independent satellite data (Martens et al. 2017). This algorithm integrates different datasets of satellite soil moisture observations [viz., the SMOS Level 3 (Jacquette et al. 2010) and ESA CCI SM v5.3] using a Newtonian nudging data assimilation scheme.

c. Models

To demonstrate the methodology described below, we examine two configurations of the ECMWF modeling system: 1) unconstrained model retrospective forecasts, where its LSM is coupled to the GCM in a free-running mode, and 2) constrained model, where state observations are assimilated to the coupled LSM-GCM model as a reanalysis of past conditions (i.e., ERA5; Hersbach et al. 2020). The land surface modeling of ECMWF is based on the Hydrology Tiled ECMWF Scheme for Surface Exchanges over Land (HTESSEL). ERA5 is the first ECMWF reanalysis that directly assimilates soil moisture observations (from C-band scatterometers in addition to a variety of other observations) into the coupled model to improve the soil moisture and land surface fluxes consistency.

The required data (including land fluxes and soil moisture content) of the free-running (open-loop) mode of ECMWF are extracted from the retrospective forecasts that have been contributed to the Subseasonal to Seasonal (S2S) prediction project (Vitart et al. 2017). The frequency of initializing S2S–ECMWF reforecasts is weekly with forecasts spanning 32 days. Here, we use only days 0–6 of all four ensemble members. The S2S–ECMWF dataset provides soil moisture at various depths. We use the reported shallow soil moisture of the top 20 cm. For ERA5, we use soil moisture at the topmost (0–7 cm) of four available soil water layers. More details about the ECMWF model and its underlying HTESSEL land surface scheme can be found in Vitart (2014) and Balsamo et al. (2011), and references therein.

3. Methods

a. Information theoretic metrics

IT metrics provide a means to conduct statistical analyses of time series free of assumptions about the distributions of data or the interrelationships between variables. Let X be a discrete random variable with support Rx and probability density function (PDF) P(x). Shannon’s entropy H (Shannon 1948) is defined as the weighted average of the amount of information (uncertainty) in the variable’s possible outcomes:
$$H(X) = -\sum_{x \in R_x} \left[ P(X=x) \log_2 P(X=x) \right]. \tag{1}$$
Let Y be another discrete random variable with support Ry and probability density function P(y). The mutual information I (Cover and Thomas 2006) is a measure of all dependencies (linear and nonlinear) between X and Y, independent of the assumption of a specific functional relationship, and obtained as
$$I(X,Y) = \sum_{x \in R_x} \sum_{y \in R_y} \left[ P(X=x, Y=y) \log_2 \frac{P(X=x, Y=y)}{P(X=x)\,P(Y=y)} \right]. \tag{2}$$
Transfer entropy from X to Y is represented as T(X > Y) and measures the amount of predictive information that X (as a source of information) at time lag τ (Xτ) can provide to disambiguate Y (as a sink or target of information) beyond the amount of information that the single-point immediate history of Y (Y1) already provides about its own future (Marschinski and Kantz 2002). In other words, T quantifies the shared information between the history of X and the present state of Y given knowledge of the immediate history of Y one time step earlier, and is given by
$$T(X>Y) = I(X_\tau, Y \mid Y_1) = I(X_\tau, Y) - I(X_\tau, Y, Y_1). \tag{3}$$

The magnitude of T(X > Y) reveals how strongly the part of the history of the source variable (here, soil moisture) that is not redundant with the history of the target variable (here, surface fluxes) itself impacts the present state of the target variable.
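
As a minimal sketch of how entropy, mutual information, and transfer entropy can be estimated from data, the following Python snippet discretizes two series into equal-width bins and counts joint outcomes. The function names, the fixed number of bins, and the synthetic series are illustrative assumptions and not the authors' implementation; the conditional mutual information in T is expanded into joint entropies, I(Xτ, Y | Y1) = H(Xτ, Y1) + H(Y, Y1) − H(Y1) − H(Xτ, Y, Y1), which is a standard identity.

```python
from collections import Counter
from math import log2
import random

def discretize(values, n_bins):
    """Map each value to the index of its equal-width bin."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0          # guard against a constant series
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

def entropy(*series):
    """Joint Shannon entropy (bits) of one or more discretized series."""
    counts = Counter(zip(*series))
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())

def mutual_information(x, y):
    """I(X,Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)

def transfer_entropy(x, y, lag=1):
    """T(X>Y) = I(X_tau, Y | Y_1), written as a sum of joint entropies."""
    x_lag = x[:len(x) - lag]           # source, `lag` steps before the target
    y_now = y[lag:]                    # present state of the target
    y_prev = y[lag - 1:len(y) - 1]     # target one step before the present
    return (entropy(x_lag, y_prev) + entropy(y_now, y_prev)
            - entropy(y_prev) - entropy(x_lag, y_now, y_prev))

# Synthetic example (assumption): y follows x with a one-step delay plus noise.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(5000)]
y = [0.0] + [xi + 0.1 * rng.gauss(0, 1) for xi in x[:-1]]
xb, yb = discretize(x, 8), discretize(y, 8)
print(transfer_entropy(xb, yb), transfer_entropy(yb, xb))
```

In this toy case the information flow from x to y should clearly exceed the flow in the opposite direction, because y is built from the previous value of x.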

If relationships between variables are all assumed linear, mutual information could be considered as an analog to the Pearson’s correlation coefficient between X and Y, and transfer entropy could be considered as an equivalent to the partial correlation between Xτ and Y controlling for the effect of Y1. For Gaussian variables, Granger causality and transfer entropy are equivalent in detecting linear dependencies (Barnett et al. 2009).

We compute these IT metrics using the daily anomalies of land surface soil moisture and fluxes. To obtain anomalies of the variables, the climatology of variables for each day of the year is removed from datasets. A triangular window algorithm (i.e., a smoothing function with weights linearly decreasing with distance from the center point) with a smoothing window of 31 days (±15 days) is applied on the multiyear datasets to remove high-frequency variations (as well as sparsity in the CCI SM dataset) when calculating a climatological seasonal cycle.
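
The anomaly calculation can be sketched as follows. This is a simplified illustration under stated assumptions: each year is a list of 365 daily values (leap days dropped), missing values are marked with `None`, the window wraps around the year boundary, and the helper names are hypothetical. Missing days are skipped by renormalizing the weights, which is one plausible way to handle the sparsity mentioned above, not necessarily the one used by the authors.

```python
import math

def triangular_weights(half_width=15):
    """Weights that decrease linearly away from the centre and sum to 1."""
    raw = [half_width + 1 - abs(k) for k in range(-half_width, half_width + 1)]
    total = sum(raw)
    return [w / total for w in raw]

def smoothed_climatology(years, half_width=15):
    """Day-of-year mean over all years, smoothed with a triangular window."""
    n_days = len(years[0])
    raw = []
    for d in range(n_days):
        vals = [yr[d] for yr in years if yr[d] is not None]
        raw.append(sum(vals) / len(vals) if vals else None)
    weights = triangular_weights(half_width)
    clim = []
    for d in range(n_days):
        num = den = 0.0
        for k, w in zip(range(-half_width, half_width + 1), weights):
            v = raw[(d + k) % n_days]          # wrap around the year boundary
            if v is not None:                  # skip days with no data at all
                num += w * v
                den += w
        clim.append(num / den if den else None)
    return clim

def daily_anomalies(years, half_width=15):
    """Subtract the smoothed climatology from every year."""
    clim = smoothed_climatology(years, half_width)
    return [[None if v is None or c is None else v - c
             for v, c in zip(yr, clim)] for yr in years]

# Synthetic example (assumption): a fixed seasonal cycle plus a yearly offset.
cycle = [math.sin(2 * math.pi * d / 365) for d in range(365)]
years = [[c + off for c in cycle] for off in (-1.0, 0.0, 1.0)]
years[1][100] = None                           # one missing observation
anoms = daily_anomalies(years)
```

With this input the anomalies of each year are close to that year's offset, since the 31-day window barely attenuates an annual cycle.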

b. Effect of random observational errors on IT metrics

We use the methodology proposed by Vinnikov and Yeserkepova (1991) for the estimation of SM random observational errors in satellite data. The temporal dynamic of soil moisture corresponds to a statistical model of red noise (Delworth and Manabe 1988) and can be generated by a first-order Markov process. Therefore, the ratio of SM observational error standard deviation σ to the temporal standard deviation of SM measurements γ is estimated as
$$\frac{\sigma}{\gamma} = \sqrt{\frac{a}{1+a}}, \tag{4}$$
where a is the intercept at τ = 0 of a linear best fit of ln(r) versus τ, where r is the autocorrelation function of observed soil moisture time series at time lag τ (Robock et al. 1995).
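
One way to read this procedure is sketched below. It assumes that the true soil moisture is red noise with autocorrelation exp(−τ/T) and that the observations add independent white noise, so that the observed autocorrelation for τ ≥ 1 is scaled by the fraction of the observed variance that is signal. Under that assumption the intercept b of the fit of ln(r) against τ satisfies exp(b) = signal variance / observed variance, and σ/γ = sqrt(1 − exp(b)). The mapping between b and the parameter a of the text, the choice of lags, and the function names are assumptions made for this illustration.

```python
import math
import random

def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    mean = sum(x) / len(x)
    dev = [v - mean for v in x]
    return sum(a * b for a, b in zip(dev, dev[lag:])) / sum(d * d for d in dev)

def error_to_total_std_ratio(x, max_lag=5):
    """Estimate sigma/gamma from the lag-0 intercept of ln(r) versus lag."""
    lags = list(range(1, max_lag + 1))
    # ln(r) is only defined for positive autocorrelations (assumed here).
    log_r = [math.log(autocorrelation(x, k)) for k in lags]
    mean_l = sum(lags) / len(lags)
    mean_r = sum(log_r) / len(log_r)
    slope = (sum((l - mean_l) * (r - mean_r) for l, r in zip(lags, log_r))
             / sum((l - mean_l) ** 2 for l in lags))
    intercept = mean_r - slope * mean_l
    signal_fraction = min(math.exp(intercept), 1.0)   # signal var / observed var
    return math.sqrt(1.0 - signal_fraction)

# Synthetic check (assumption): unit-variance AR(1) signal plus white noise.
rng = random.Random(1)
phi, noise_std = 0.9, 0.5
truth, obs = 0.0, []
for _ in range(200000):
    truth = phi * truth + rng.gauss(0, math.sqrt(1 - phi ** 2))
    obs.append(truth + rng.gauss(0, noise_std))
ratio = error_to_total_std_ratio(obs)
expected = noise_std / math.sqrt(1 + noise_std ** 2)   # about 0.447
```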

It should be noted that there are other types of errors in the CCI SM products (such as errors due to applying statistical methods to merge multiple active and passive sensors in the combined product). Here, we only address random errors which can be estimated as a normal PDF. Nevertheless, the random observational errors of SM are often nontrivial. Assuming the errors resemble white noise with a normal PDF, N(0, σ), we extend the IT metrics to account directly for their effect when present in satellite-based SM data.

To isolate the role of random errors in IT metrics, let Z be a discrete random variable from a normal probability distribution, independent of X, with support Rz and PDF P(z). The PDF of the sum X + Z, representing satellite-based SM measurement, can be derived by the convolution of P(x) and P(z) as
$$P(X+Z) = P(X) * P(Z) = \sum_{z \in R_z} \left[ P(X=x)\,P(Z=z) \right]. \tag{5}$$
The convolution between P(X) and P(Z) is denoted by P(X)*P(Z).
The entropy of X + Z can be obtained using Eq. (1) as
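$$H(X+Z) = H\big(P(X)*P(Z)\big) = -\sum_{x \in R_x} \left\{ \sum_{z \in R_z} \left[ P(X=x)\,P(Z=z) \right] \times \log_2 \sum_{z \in R_z} \left[ P(X=x)\,P(Z=z) \right] \right\}. \tag{6}$$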
The effect of random observational errors on the mutual information between X + Z and Y can be addressed as well. The mutual information between X + Z and Y can be obtained as:
$$I(X+Z, Y) = H\big(P(X)*P(Z)\big) - \sum_{y \in R_y} \left[ P(Y=y)\, H\big(P(X \mid Y=y)*P(Z)\big) \right]. \tag{7}$$
The derivation of Eq. (7) is provided in appendix B.
The transfer of entropy from X + Z to Y can be obtained as
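$$T(X+Z>Y) = \sum_{y_1 \in R_{y_1}} \left[ P(Y_1=y_1)\, H\big(P(X_\tau \mid Y_1=y_1)*P(Z)\big) \right] - \sum_{y_1 \in R_{y_1}} \left\{ \sum_{y \in R_y} \left[ P(Y=y, Y_1=y_1)\, H\big(P(X_\tau \mid Y=y, Y_1=y_1)*P(Z)\big) \right] \right\}. \tag{8}$$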
The derivation of Eq. (8) is provided in appendix B.

The above analytical equations (5)–(8) can be used to obtain the IT metrics considering the effect of random noise of determinable magnitude in the source of information. Similarly, analytical equations can be derived to obtain H(X + Z + Z′), I(X + Z + Z′, Y), and T(X + Z + Z′ > Y), where Z′, analogous to Z, is also a random variable from the normal distribution N(0, σ). Adding more random noise to a time series increases its randomness and, consequently, its entropy, while the amount of information it shares with another time series (say Y) and the amount of predictive information it provides to Y both decrease. This characteristic is crucial to the utility of our proposed method, as is shown below.
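
The convolution route to the entropy of a noisy variable can be illustrated with a short sketch. It assumes that P(X) is given on equal-width bins, that the Gaussian noise is discretized on the same bin width and truncated at four standard deviations, and that the convolution is evaluated in the usual shifted form P(X + Z = s) = Σz P(X = s − z) P(Z = z). The function names and the example distribution are hypothetical.

```python
from math import exp, log2

def gaussian_kernel(sigma, bin_width, n_sigmas=4):
    """Discretize N(0, sigma) on bins of the given width; weights sum to 1."""
    half = max(1, int(round(n_sigmas * sigma / bin_width)))
    raw = [exp(-0.5 * (j * bin_width / sigma) ** 2)
           for j in range(-half, half + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def convolve(p, kernel):
    """Discrete convolution of two probability vectors."""
    out = [0.0] * (len(p) + len(kernel) - 1)
    for i, pi in enumerate(p):
        for j, kj in enumerate(kernel):
            out[i + j] += pi * kj
    return out

def entropy(p):
    """Shannon entropy (bits) of a probability vector."""
    return -sum(q * log2(q) for q in p if q > 0)

# Example (assumption): a five-bin distribution of X with unit bin width.
p_x = [0.1, 0.2, 0.4, 0.2, 0.1]
h_clean = entropy(p_x)
h_noisy_1 = entropy(convolve(p_x, gaussian_kernel(1.0, 1.0)))
h_noisy_2 = entropy(convolve(p_x, gaussian_kernel(2.0, 1.0)))
```

In this example the entropy grows as the noise standard deviation grows, which is the behaviour described above.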

Because Z is a random time series of known variance but unknown sequence, it is not possible to obtain the time series of X from X + Z by simply subtracting out the noise. Here, we propose a method to estimate the IT metrics of X by calculating IT metrics of ever noisier variables (e.g., X + Z, X + Z + Z′, X + Z + Z′ + Z″, etc.) and extrapolating a fitted curve through these estimates back to the hypothetical noiseless X. Hereafter, we call this method mitigation of noise (MoN). To obtain the IT metrics of noisy variables, one can derive additional analytical equations analogous to Eqs. (5)–(8) for further additions of noise, or one can use straightforward Monte Carlo methods with added computational expense. To provide quantitative demonstrations of the robustness of the method employing up to five levels of added noise, we use the latter.

We demonstrate the MoN method here with a synthetic test. We use daily anomalies of surface soil moisture (as a source, X) and latent heat flux (as a target, Y) directly from the GLEAM dataset for the period 1995–2014 at a grid cell with longitude 100°W, latitude 36°N. For the sake of illustration, Fig. 1a shows a 365-day period of X. We generate a random sequence Z from the Gaussian distribution N(0, σ) and add it to X to generate series XZ1 (XZ1 = X + Z, see Fig. 1b). Note that in a real-world case with satellite soil moisture data, time series X is not known and XZ1 represents the satellite SM time series. From the same Gaussian distribution, we generate another random sequence Z′ and add it to XZ1 to generate time series XZ2 (XZ2 = X + Z + Z′, see Fig. 1c). Similarly, we generate time series XZn where subscript n indicates how many times a random sequence from the Gaussian distribution N(0, σ) has been added to time series X. For example, time series XZ3 is equal to X + Z + Z′ + Z″, where Z″ is again a random sequence from N(0, σ). Figures 1a–f show one realization of time series of X and XZn for n = 1:5. We calculate the entropy of XZn for each n = 1:5. The objective of MoN is to estimate H(X) from H(XZn), n = 1:5.

Fig. 1.

Illustration of 365-day time series of (a) variable X (as source) and (b)–(f) one realization of XZn for n = 1–5, where subscript n indicates how many times a random sequence from the Gaussian distribution N(0, σ) is added to X. (g) Entropy of time series vs n.

Citation: Journal of Hydrometeorology 23, 10; 10.1175/JHM-D-21-0232.1

Figure 1g shows Shannon’s entropy of XZn versus n and a fitted power function curve, which we have found over many trials to provide the best fit. As expected, H(XZn) increases as n increases, revealing the increase of randomness in the time series. By extrapolating the fitted curve to n = 0, we obtain an estimate of H(X) = 2.864, which is shown by a blue star in Fig. 1g. The true value of H(X) = 2.852, obtained from the original time series X (Fig. 1a), is represented by a gold star in Fig. 1g for comparison. In this case, the relative error of the estimate is less than 1%, indicating the success of MoN in estimating H(X) from noisy variables XZn. Note that we used Monte Carlo simulations to estimate H(XZn), meaning that H(XZn) is the average of the entropy of M realizations of XZn. Here, M is 20, and Figs. 1b–f each show only one realization of XZn for clarity. Increasing M increases the accuracy of H(XZn) estimates, but at a higher computational cost. Using the analytical equations (instead of the Monte Carlo simulations) to obtain H(XZn) leads to an estimate of H(X) = 2.835, which is very close to the estimate of H(X) from the Monte Carlo simulation with M = 20. Similarly, MoN can be applied to estimate I(X, Y) and T(X > Y) from noisy variables XZn (see section 4a).
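
The Monte Carlo variant of MoN for entropy can be sketched as follows. Several details are assumptions made for illustration, because the text does not specify them: the power function is taken to be H(n) = a + b·n^c, the exponent c is scanned on a grid while a and b come from ordinary least squares, and entropy is computed on fixed-width bins so that added noise can only spread the histogram. The function names and the synthetic series are hypothetical, and the sketch is not expected to reproduce the numbers reported above.

```python
from collections import Counter
from math import floor, log2
import random

def binned_entropy(values, bin_width):
    """Shannon entropy (bits) on fixed-width bins anchored at zero."""
    counts = Counter(floor(v / bin_width) for v in values)
    n = len(values)
    return -sum(c / n * log2(c / n) for c in counts.values())

def fit_power_curve(levels, values):
    """Least-squares fit of values ~ a + b * n**c, scanning c on a grid."""
    best = None
    for step in range(1, 41):
        c = 0.05 * step
        u = [n ** c for n in levels]
        mu, mv = sum(u) / len(u), sum(values) / len(values)
        b = (sum((ui - mu) * (vi - mv) for ui, vi in zip(u, values))
             / sum((ui - mu) ** 2 for ui in u))
        a = mv - b * mu
        sse = sum((a + b * ui - vi) ** 2 for ui, vi in zip(u, values))
        if best is None or sse < best[0]:
            best = (sse, a, b, c)
    return best[1:]

def mon_entropy(noisy, sigma, bin_width, n_levels=5, n_realizations=20, seed=0):
    """Extrapolate the mean entropy of ever noisier series back to n = 0."""
    rng = random.Random(seed)
    sums = [0.0] * n_levels
    for _ in range(n_realizations):
        series = list(noisy)                   # level n = 1: observed series
        for level in range(n_levels):
            if level > 0:                      # add one more noise sequence
                series = [v + rng.gauss(0.0, sigma) for v in series]
            sums[level] += binned_entropy(series, bin_width)
    means = [s / n_realizations for s in sums]
    a, b, c = fit_power_curve(range(1, n_levels + 1), means)
    return a, means                            # a is the fitted value at n = 0

# Synthetic example (assumption): Gaussian "truth" plus noise of known sigma.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(5000)]
xz1 = [v + rng.gauss(0, 0.6) for v in x]
estimate, level_means = mon_entropy(xz1, 0.6, 0.25, n_realizations=5)
```

A design note: the grid scan over c avoids a nonlinear optimizer and always returns a result, at the cost of a coarser exponent. How well the extrapolation recovers the noiseless entropy depends on how closely the assumed curve matches the actual growth of entropy with n.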

c. Optimal binning for histograms

The histogram is a nonparametric density estimator that estimates the underlying PDF of the data being analyzed. IT metrics are estimated from histogram-based PDFs of SM and surface fluxes. Therefore, the accuracy of estimated metrics depends on the size and number of bins that represent the PDFs. The number of bins should be large enough to capture the major features of the data distribution while avoiding excessive numbers of empty bins, not emphasizing fine details, and suppressing estimation errors. Since the underlying PDF of the variables is unknown, following Knuth (2006), we apply Bayesian probability theory to estimate the data-based optimal number of bins, considering the histogram to be an equal bin-width piecewise-constant model of the underlying PDF from which the dataset is sampled. The number of bins is obtained by maximizing the logarithm of the posterior probability. Full details on the optimization approach can be found in Knuth (2006) and Knuth et al. (2013). To obtain the optimal bin numbers required for estimating Shannon’s entropy H, mutual information I, and transfer entropy T, we perform 1D, 2D, and 3D bin optimizations, respectively.
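
For the 1D case, the bin optimization can be sketched with the log posterior given by Knuth (2006) for an equal-width histogram with M bins, N samples, and bin counts n_k: N ln M + ln Γ(M/2) − M ln Γ(1/2) − ln Γ(N + M/2) + Σ ln Γ(n_k + 1/2), up to an additive constant. The brute-force scan over candidate bin numbers, the upper limit of 50 bins, and the function names are assumptions made for this illustration; the 2D and 3D optimizations are not shown.

```python
from math import lgamma, log
import random

def knuth_log_posterior(data, m):
    """Relative log posterior of an equal-width histogram with m bins."""
    n = len(data)
    lo, hi = min(data), max(data)              # assumes the data are not constant
    width = (hi - lo) / m
    counts = [0] * m
    for v in data:
        counts[min(int((v - lo) / width), m - 1)] += 1
    return (n * log(m) + lgamma(m / 2) - m * lgamma(0.5)
            - lgamma(n + m / 2) + sum(lgamma(c + 0.5) for c in counts))

def optimal_bin_count(data, max_bins=50):
    """Number of bins that maximizes the log posterior."""
    return max(range(1, max_bins + 1),
               key=lambda m: knuth_log_posterior(data, m))

# Synthetic example (assumption): 1000 samples from a standard normal.
rng = random.Random(0)
sample = [rng.gauss(0, 1) for _ in range(1000)]
m_opt = optimal_bin_count(sample)
```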

d. Statistical significance of metrics

We assess the statistical significance of estimated metrics (i.e., I and T) using the shuffled surrogates method (Ruddell and Kumar 2009), which has the null hypothesis that the time series of interest is a random sequence of values. In this method, first, the time series of variables are shuffled randomly in time to create a new sequence of variables. Then, the IT metric of interest is computed for the new sequence. These steps are repeated for multiple realizations (here, 500 times) resulting in a normal distribution of the metric with mean μ and standard deviation σ. A one-tailed hypothesis test is applied to determine whether the estimated metric exceeds the significance threshold Δ, with (1 − β)100% level of confidence (here, β = 0.01 and Δ = μ + 2.36σ).
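
The shuffled surrogates test can be sketched as below for mutual information between two already discretized series. The helper names, the choice to shuffle both series, and the synthetic data are assumptions made for this illustration; the multiplier 2.36 and the 500 surrogates follow the values quoted above.

```python
from collections import Counter
from math import log2, sqrt
import random

def entropy(*series):
    """Joint Shannon entropy (bits) of one or more discretized series."""
    counts = Counter(zip(*series))
    n = sum(counts.values())
    return -sum(c / n * log2(c / n) for c in counts.values())

def mutual_information(x, y):
    return entropy(x) + entropy(y) - entropy(x, y)

def significance_threshold(metric, x, y, n_surrogates=500, z=2.36, seed=0):
    """Return mu + z * sd of the metric over randomly shuffled surrogates."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_surrogates):
        xs, ys = x[:], y[:]
        rng.shuffle(xs)                        # destroy the temporal ordering
        rng.shuffle(ys)
        values.append(metric(xs, ys))
    mu = sum(values) / len(values)
    sd = sqrt(sum((v - mu) ** 2 for v in values) / (len(values) - 1))
    return mu + z * sd

# Synthetic example (assumption): y copies x about 80% of the time.
rng = random.Random(0)
x = [rng.randrange(4) for _ in range(1000)]
y = [xi if rng.random() < 0.8 else rng.randrange(4) for xi in x]
threshold = significance_threshold(mutual_information, x, y, n_surrogates=100)
is_significant = mutual_information(x, y) > threshold
```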

e. Coupling type

The relationship between two variables (i.e., the “source” and “target” of information) can be described based on the relative magnitudes of mutual information and transfer entropy. While the mutual information quantifies the synchronization between two variables, the transfer entropy expresses the cause of synchronization (coupling). Following the convention of Ruddell and Kumar (2009), if there is a significant information flow between two variables at time lag zero (i.e., mutual information is statistically significant, I > ΔI) but the predictive information flow from source to target at other time lags is not significant (i.e., transfer entropy is not statistically significant, T < ΔT) then the coupling type is called “synchronization.” If there is a significant predictive information flow from the history of the source to target (T > ΔT) but its magnitude is less than the mutual information between the pair of variables at lag zero, then the coupling type is called “feedback” (or lagged synchronization). If information flow from the history of the source to the target is significant and exceeds the shared information at lag zero (T > I), then the cause of synchronization is residual forcing of target by source and the coupling type is called “forcing.” If the two variables do not share significant information at lag zero (I < ΔI) and also the history of the source cannot disambiguate the target beyond the degree to which the target can be disambiguated by its own history (T < ΔT), the pair of variables is called “uncoupled.”
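
The classification rules above translate into a small decision function. The function and argument names are hypothetical, and the treatment of exact ties (for example T equal to I) is an assumption, since the text does not specify it.

```python
def coupling_type(mi, te, mi_threshold, te_threshold):
    """Classify a source-target pair from I, T and their significance thresholds."""
    if te > te_threshold:                      # significant predictive information
        # Forcing if the lagged flow exceeds the shared information at lag zero.
        return "forcing" if te > mi else "feedback"
    # No significant lagged flow: decide on the lag-zero information alone.
    return "synchronization" if mi > mi_threshold else "uncoupled"

# Example values (assumptions), with both thresholds set to 0.05 bits.
examples = {
    "synchronization": coupling_type(mi=0.30, te=0.01, mi_threshold=0.05, te_threshold=0.05),
    "feedback": coupling_type(mi=0.30, te=0.10, mi_threshold=0.05, te_threshold=0.05),
    "forcing": coupling_type(mi=0.08, te=0.20, mi_threshold=0.05, te_threshold=0.05),
    "uncoupled": coupling_type(mi=0.02, te=0.01, mi_threshold=0.05, te_threshold=0.05),
}
```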

To investigate the performance of models in the representation of dominant coupling types between SM and surface fluxes estimated from observations, we perform a multiclass classification evaluation. Precision, one of the main classification metrics, measures the ability of a model to identify the correct instances for each coupling type as Pr = TP/(TP + FP), where TP (the “true positives”) is the number of grid cells whose coupling type from observations is ct ∈ CT = {synchronization, feedback, forcing, uncoupled} and whose coupling type is correctly identified by the model to be ct. FP (the “false positives”) is the number of grid cells whose coupling type from observations is not ct but is incorrectly identified by the model to be ct. We use the average of the precision of the individual coupling types to evaluate the overall skill of models in representing coupling type. We also use the informedness metric to evaluate the performance of models in correctly identifying each coupling type. The informedness metric is defined as
$$\text{informedness} = \frac{TP}{TP+FN} + \frac{TN}{TN+FP} - 1, \tag{9}$$
where FN (the “false negatives”) is the number of grid cells whose coupling type from observations is ct but is incorrectly identified by the model as not ct, and TN (the “true negatives”) is the number of grid cells whose coupling type is not ct in both the observations and the model. The values of informedness range from −1 to +1, where +1 implies that all grid cells with coupling type ct are correctly identified as ct, while −1 implies that all grid cells with coupling type ct are incorrectly identified by the model as not ct.
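
A short sketch of the two scores for a single coupling type follows. The grid cells are represented as two parallel lists of labels, and the function names, the example labels, and the choice to return `None` when a denominator is zero are assumptions made for this illustration.

```python
COUPLING_TYPES = ("synchronization", "feedback", "forcing", "uncoupled")

def confusion_counts(observed, modeled, ct):
    """Count TP, FP, FN, TN for coupling type ct over paired grid cells."""
    tp = fp = fn = tn = 0
    for obs, mod in zip(observed, modeled):
        if obs == ct and mod == ct:
            tp += 1
        elif obs != ct and mod == ct:
            fp += 1
        elif obs == ct and mod != ct:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision(observed, modeled, ct):
    tp, fp, _, _ = confusion_counts(observed, modeled, ct)
    return tp / (tp + fp) if tp + fp else None

def informedness(observed, modeled, ct):
    tp, fp, fn, tn = confusion_counts(observed, modeled, ct)
    if tp + fn == 0 or tn + fp == 0:
        return None
    return tp / (tp + fn) + tn / (tn + fp) - 1

def mean_precision(observed, modeled):
    """Average precision over the coupling types for which it is defined."""
    scores = [precision(observed, modeled, ct) for ct in COUPLING_TYPES]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores)

# Example labels for six grid cells (assumption).
observed = ["forcing", "forcing", "feedback", "uncoupled", "feedback", "forcing"]
modeled = ["forcing", "feedback", "feedback", "uncoupled", "forcing", "forcing"]
```

For the coupling type "forcing" in this example, TP = 2, FP = 1, FN = 1, and TN = 2, so the precision is 2/3 and the informedness is 2/3 + 2/3 − 1 = 1/3.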

4. Results and discussion

First, we present results evaluating the performance of the proposed MoN method (section 4a). Then we examine the relationship between surface soil moisture and surface fluxes for boreal summer (June–August) over the globe using observationally based datasets. We apply the MoN method to estimate IT metrics while addressing the effect of observational random noise in the satellite-based CCI SM dataset. Then, the maps of IT metrics from observationally based datasets are used to evaluate the SM–surface flux interactions in ECMWF models.

a. Performance of the MoN method

We introduced MoN (see section 3b) through a synthetic test using daily anomalies of surface soil moisture (as a source, X) and latent heat flux (as a target, Y) from the GLEAM dataset for the period 1995–2014 at a grid cell with longitude 100°W, latitude 36°N (see Fig. 1). Figures 2a–c show entropy H, mutual information I, and transfer entropy T versus n for a range of values of the noise-to-signal ratio (NSR) from 0.1 to 0.6. The true values of H(X), I(X, Y), and T(X > Y) are shown by gold stars for comparison.

Fig. 2.

Estimating (a) entropy H(X), (b) mutual information I(X, Y), and (c) transfer entropy T(X > Y) by employing the MoN method for a range of noise to signal ratio (NSR). The results are obtained from a synthetic test using SM and LE data from GLEAM at a grid cell with latitude 36°N, longitude 100°W.

Citation: Journal of Hydrometeorology 23, 10; 10.1175/JHM-D-21-0232.1

As shown in Fig. 2a, the absolute relative error of H(X) estimates for various NSR values ranges from 0% to 2%. In comparison to H(XZ), which approximates the entropy of noisy satellite-based SM data, estimates of H(X) using MoN are 11%–95% more accurate with respect to true values of H(X) for NSR = 0.3–0.6. The estimates of H(X) and H(XZ) are almost equal for NSR = 0.1–0.2, as binning acts like a smoother and therefore the effect of small noise on entropy is insignificant.

Figure 2b shows the mutual information between source and target. As expected, adding noise to the source of information reduces the mutual information between source and target. Also, increasing the magnitude of the noise (NSR) reduces the magnitude of mutual information between source and target. The absolute relative error of I(X, Y) estimates for various NSR values ranges from 0.4% to 10%. MoN improves the estimate of I(X, Y) by 26%–99% for various NSR values.
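The loss of mutual information caused by noise in the source can be reproduced with a small, self-contained Python sketch. This is not the MoN method or the GLEAM-based test itself: the Gaussian series, the bin count, the use of bits, and the definition of NSR as the ratio of noise to signal standard deviation are all assumptions made for illustration.

```python
import numpy as np

def entropy_bits(counts):
    """Plug-in Shannon entropy (bits) of a histogram of counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information(x, y, bins=10):
    """Plug-in I(X, Y) = H(X) + H(Y) - H(X, Y) from a 2D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    h_x = entropy_bits(joint.sum(axis=1))
    h_y = entropy_bits(joint.sum(axis=0))
    return h_x + h_y - entropy_bits(joint.ravel())

rng = np.random.default_rng(0)
n = 5000
x = rng.standard_normal(n)                     # hypothetical noise-free source
y = 0.8 * x + 0.6 * rng.standard_normal(n)     # hypothetical target

mi_clean = mutual_information(x, y)
mi_noisy = {}
for nsr in (0.1, 0.3, 0.6):
    # Assumption: NSR is the noise-to-signal standard deviation ratio.
    z = nsr * x.std() * rng.standard_normal(n)
    mi_noisy[nsr] = mutual_information(x + z, y)
```

Comparing `mi_clean` with the entries of `mi_noisy` shows the estimate dropping as NSR grows, which is the degradation that a noise correction has to undo.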

Figure 2c shows the transfer of information from source to target. Similar to mutual information, increasing the magnitude of NSR degrades the transfer entropy. Compared to H and I, it appears noise has less impact on transfer entropy and MoN might be less effective as a way to retrieve the true value of T(X > Y). The absolute relative error of T(X > Y) estimates for various NSR values ranges from 0.1% to 9%. However, the estimates of T(X > Y) are 5%–90% more accurate than T(XZ > Y) with respect to true T(X > Y) for NSR = 0.3–0.6, so there is still improvement for high-noise situations.

We repeated the synthetic test over 70 grid cells around the world with a diverse range of vegetation and climate types. Figure 3 compares the estimates of IT metrics with the true values. IT metrics are normalized with respect to the minimum entropies of the source and target in order to eliminate differences caused by the number of bins used for discretization and generating multidimensional histograms. Normalized mutual information and normalized transfer entropy are represented by nI and nT, respectively. The nI(X + Z, Y) and nT(X + Z > Y) values represent the estimates of nI and nT that would be obtained from satellite-based SM data (noisy data), while nI(X, Y) and nT(X > Y) are our estimates of noise-free dependencies between X and Y. Compared to true values of nI and nT, nI(X + Z, Y) and nT(X + Z > Y) are somewhat underestimated, revealing the effect of noise in reducing the apparent coupling strength. Applying MoN improves the estimates of nI and nT, as RMSE and bias are reduced and nI and nT generally lie nearer the 1:1 line.

Fig. 3.

Evaluation of MoN performance through a set of synthetic tests: comparison between estimates of (top) normalized mutual information (nI) and (bottom) normalized transfer entropy (nT), with true values over 70 locations around the world; X + Z resembles satellite-based SM data with random errors.

In addition to synthetic tests, we evaluate the effect of addressing observational random noise in the CCI SM dataset for global observationally based IT metrics. Figure 4 compares the IT metrics obtained from the satellite-based CCI SM dataset (XZ, as a source of information) and GLEAM latent heat flux (Y, as target) with those obtained after applying the MoN method. The global means of nI(XZ, Y), nI(X, Y), nT(XZ > Y), and nT(X > Y) are 0.07, 0.13, 0.06, and 0.07, respectively. The MoN estimates are larger than the original satellite-based values, particularly over regions where the IT metrics are significant, revealing that neglecting the effect of noise in calculating IT-based LA coupling metrics could lead to misinterpretation of the dependencies between SM and surface heat fluxes. Figure 4 also indicates that the coupling type category may be misestimated if it is based on the noisy satellite-based dataset. This is an important feature for the applicability of satellite data to provide global estimates of LA coupling.

Fig. 4.

(top) Observationally based (left) nI, (center) nT, and (right) coupling type obtained from CCI SM data (XZ) and GLEAM latent heat flux data (Y). (bottom) The MoN estimates of (left) nI(X, Y), (center) nT(X > Y), and (right) coupling type (X, Y). The histogram on the lower left of the coupling type maps shows the distribution of four coupling types.

We have also compared the estimates of CCI–GLEAM IT metrics with in situ IT metrics at the locations of FLUXNET towers (results are not shown here). We observed that while nI(X, Y) and nT(X > Y) show a better agreement with in situ IT metrics than nI(XZ, Y) and nT(XZ > Y) at some locations [i.e., 42% of locations in case of nI(X, Y) and 26% of locations in case of nT(X > Y)], at other locations implementing the MoN method does not improve or even worsens the estimates of observationally based IT metrics. Note that 77% of locations show an insignificant nT(SM > LE), meaning that history of SM has no predictive information to provide for LE (i.e., SM acts like a randomly shuffled sequence). This is one of the reasons that MoN could not improve the estimates of transfer entropy at most locations—there was little signal to recover. That said, MoN did not improve synchronization and uncoupled coupling types, but improved feedback and forcing coupling types by an average of 50%. Our investigation shows that there is a significant positive correlation between the amount of improvement of IT metrics and the temporal correlation between time series of in situ heat fluxes and observationally based flux products. This suggests that if in situ and observationally based surface flux products are not in good agreement, addressing the effect of CCI observational random noise would not improve estimates of IT-based coupling metrics; the quality of surface flux data is as important as that of soil moisture data.

In addition, we find there is a significant positive correlation between the degree of IT metric improvement and the SM NSR magnitudes. This suggests that the more erroneous/noisy SM data are, the more effective the proposed method could be. It should be noted that due to representativeness errors (resulting from the mismatch in spatial scale between the grid boxes and the instrument footprint, and particularly the vegetation and soil differences at FLUXNET sites and the wider region in a satellite pixel or GLEAM grid cell), the findings from the evaluation of MoN performance through the comparison of 1° × 1° gridded observationally based IT metrics with in situ IT metrics are not definitive.

b. SM–LE linkage

Figure 5 compares the coupling type and strength of SM–LE relationships in observationally based products (CCI–GLEAM, first column, and CCI–FLUXCOM, second column), an unconstrained coupled model (ECMWF, third column), and a constrained reanalysis (ERA5, fourth column). The global distribution of normalized mutual information between SM and LE [represented by nI(SM, LE)], normalized transfer entropy from SM to LE [represented by nT(SM > LE)], and SM–LE dominant coupling types are depicted in the first, second, and third rows, respectively. A normalized IT metric for a grid cell is obtained by dividing that IT metric by min[H(SM), H(LE)] of that specific grid cell. The spatial mean of metrics over grid cells having a finite value from all three products, and the percentage of grid cells with significant values (p value < 0.01, represented by P) are shown above each map in Fig. 5. Note that observationally based IT metrics are obtained using the MoN method.
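Written out for a source X (here SM) and a target Y (here LE), the normalization described above takes the following form. This only restates the sentence above in symbols and introduces no new definition.

```latex
\[
nI(X, Y) = \frac{I(X, Y)}{\min[H(X), H(Y)]},
\qquad
nT(X > Y) = \frac{T(X > Y)}{\min[H(X), H(Y)]}
\]
```

For discretized variables, I(X, Y) cannot exceed either marginal entropy, so nI lies between 0 and 1, which makes values comparable across grid cells with different entropies.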

Fig. 5.

The global distribution of normalized mutual information, nI(SM, LE), normalized transfer entropy, nT(SM > LE), and SM–LE dominant coupling types from CCI–GLEAM, CCI–FLUXCOM, ECMWF, and ERA5 for boreal summer (June–August).

Red areas represent “hot spots” of the terrestrial leg of LA coupling in the water cycle. Unlike the correlation-based LA coupling metrics (such as the Pearson’s correlation coefficient and terrestrial coupling index), mutual information and transfer entropy have only nonnegative values. Therefore, the sign of IT metrics does not directly indicate the terrestrial versus atmospheric control of surface fluxes (Dirmeyer 2011), even though their magnitudes imply the strength of coupling. Rather, the atmospheric control of LE can be inferred from the presence of significant transfer entropy from LE to SM [i.e., nT(LE > SM)], and vice versa for terrestrial control.

Figure 5 (first row) shows regional variations in the strength of nI(SM, LE) within each product. CCI–FLUXCOM and CCI–GLEAM show a comparable spatial pattern with a spatial correlation of 0.80. Consistent with previous studies that investigated the linear relationship between SM and LE through correlation-based metrics, they both show relatively high nI(SM, LE) in dry areas (e.g., the U.S. Great Basin, Kalahari Desert, Mexico, and Australia) and relatively low nI(SM, LE) in densely vegetated areas (e.g., eastern United States and South America lowlands). This consistency is promising for benchmarking model performance.
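A spatial correlation between two gridded maps, such as the one quoted above, could be computed along the following lines. This is a minimal sketch that assumes an unweighted Pearson correlation over grid cells that are finite in both maps; whether the study applies area weighting or a different masking rule is not stated here, and the function name is hypothetical.

```python
import numpy as np

def spatial_correlation(map_a, map_b):
    """Unweighted Pearson correlation over cells finite in both maps."""
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    valid = np.isfinite(a) & np.isfinite(b)   # e.g., drop ocean or missing cells
    if valid.sum() < 2:
        return float("nan")
    return float(np.corrcoef(a[valid], b[valid])[0, 1])

# Hypothetical 2 x 2 maps with one missing cell.
map_a = np.array([[1.0, 2.0], [3.0, np.nan]])
map_b = np.array([[3.0, 5.0], [7.0, 4.0]])
r = spatial_correlation(map_a, map_b)
```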

In comparisons with observationally based data, both unconstrained and constrained configurations of ECMWF models characterize the shared information between SM and LE to be much larger, particularly over arid and semiarid regions. The nI(SM, LE) from the pairs GLEAM–ECMWF, GLEAM–ERA5, FLUXCOM–ECMWF, and FLUXCOM–ERA5 show spatial correlations of 0.50, 0.61, 0.50, and 0.50, respectively. The differences between observationally based products and models could be explained by the shortcomings of HTESSEL reported by previous studies: Martens et al. (2020) found GLEAM forced with ERA5 meteorological data outperforms ERA5 in terms of LE estimations in dry regimes, revealing HTESSEL shortcomings in capturing the response of surface energy partitioning to heat and drought stress. Best et al. (2015) found LSM, including HTESSEL, performance in estimating LE deteriorates at locations with low annual precipitation, suggesting suboptimal representation of the SM–LE relationship in LSMs during water-stressed conditions. Ukkola et al. (2016) also found LSMs, including HTESSEL, are systematically biased in their estimates of LE under water-stressed conditions. Stevens et al. (2020) found the fast water depletion from the top soil layer due to the suboptimal root distribution formulation in HTESSEL causes excessive evaporative drought. Furthermore, deficiencies in the partitioning between latent and sensible heat fluxes in HTESSEL (Trigo et al. 2015), simplified linear water stress function (which regulates evaporative stress) used in HTESSEL, large noise to signal ratios of the satellite-based soil moisture data used to constrain the model (Kumar et al. 2018), as well as generally poor performance of soil moisture data assimilation schemes over arid regions (causing unrealistic large temporal SM variations), could all be contributing causes of the discrepancies between models and observationally based data.

Note that while we have addressed the effects of SM random observational error in the calculation of observationally based IT metrics (section 3b), we have not corrected the IT metrics from ERA5 because the PDF of random errors after assimilation of noisy satellite soil moisture observations into a complex Earth system model is not known. Also note that GLEAM and FLUXCOM are forced by reanalyses of air temperature and radiation (but not precipitation in the case of GLEAM) from ERA5. This reduces the effect of input data differences on the conclusions from comparison between observationally based products and models. The percentage of significant mutual information (P) is greater than 94% in the observationally based data and models. The larger value of P compared to that from correlation-based coupling metrics (e.g., Abdolghafoorian and Dirmeyer 2021) indicates the existence of nonlinear relationships between SM and LE that cannot be detected under a linear framework (Hsu and Dirmeyer 2021). This suggests that correlation-based coupling metrics may underestimate the strength of SM–LE dependencies.

The global distribution of nT(SM > LE) depicts the regional variations within each product. The spatial patterns of nT(SM > LE) from CCI–FLUXCOM and CCI–GLEAM are not compatible, with a spatial correlation of −0.05. CCI–FLUXCOM shows relatively high nT(SM > LE) over high latitudes and densely vegetated regions such as the eastern United States (where weak SM–LE dependency is expected) and relatively low nT(SM > LE) over regions where a strong SM–LE dependency is expected such as the western United States. The spatial pattern of nT(SM > LE) from FLUXCOM is not in agreement with models either. The spatial correlations of the pairs FLUXCOM–ECMWF and FLUXCOM–ERA5 are −0.16 and −0.17, respectively. We also find a very low agreement between nT(SM > LE) from FLUXCOM and in situ nT(SM > LE) at the FLUXNET flux tower locations (i.e., the Pearson correlation coefficient = 0.13, p value > 0.30). In addition, unlike CCI–GLEAM, ECMWF, and ERA5, the spatial patterns of nI(SM, LE) and nT(SM > LE) from CCI–FLUXCOM show negative correlations. This may have to do with FLUXCOM’s machine-learning-based algorithm for determining surface fluxes from a combination of data sources, which has no physics-based algorithmic relationship between SM and LE. Based on these findings, the map of nT(SM > LE) from CCI–FLUXCOM is not used for benchmarking of models.

The percentage of grid cells with significant nT(SM > LE) is smaller than the percentage of grid cells with significant nI(SM, LE). This indicates that even though SM and LE synchronize over 95% of regions in CCI–GLEAM, as an example, predictive information from the history of SM to LE is significant over only about 49% of regions.
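To make the notion of significant predictive information from the history of the source concrete, the sketch below estimates a lag-1 transfer entropy with a plug-in histogram estimator and tests it against surrogates in which the source is randomly shuffled. The lag, the bin count, the number of surrogates, and the shuffle-based test are assumptions chosen for illustration; the formulation and significance test actually used in the study are defined in its methods section and may differ.

```python
import numpy as np

def _entropy(counts):
    """Plug-in Shannon entropy (bits) of a histogram of counts."""
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

def transfer_entropy(x, y, bins=5):
    """Plug-in lag-1 T(X > Y) = I(Y_t ; X_{t-1} | Y_{t-1}) in bits."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Columns (axes 0, 1, 2): Y_t, Y_{t-1}, X_{t-1}
    sample = np.column_stack([y[1:], y[:-1], x[:-1]])
    joint, _ = np.histogramdd(sample, bins=bins)
    h_all = _entropy(joint)                       # H(Y_t, Y_{t-1}, X_{t-1})
    h_yt_yp = _entropy(joint.sum(axis=2))         # H(Y_t, Y_{t-1})
    h_yp_xp = _entropy(joint.sum(axis=0))         # H(Y_{t-1}, X_{t-1})
    h_yp = _entropy(joint.sum(axis=(0, 2)))       # H(Y_{t-1})
    return h_yt_yp + h_yp_xp - h_yp - h_all

def shuffle_p_value(x, y, n_surrogates=200, seed=0, bins=5):
    """One-sided p value of T(X > Y) against shuffled-source surrogates."""
    rng = np.random.default_rng(seed)
    observed = transfer_entropy(x, y, bins=bins)
    null = np.array([transfer_entropy(rng.permutation(x), y, bins=bins)
                     for _ in range(n_surrogates)])
    p = (1 + np.sum(null >= observed)) / (n_surrogates + 1)
    return observed, float(p)

# Hypothetical series in which Y follows X with a one-step delay.
rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
y = np.empty_like(x)
y[0] = rng.standard_normal()
y[1:] = 0.9 * x[:-1] + 0.3 * rng.standard_normal(x.size - 1)
te_xy, p_xy = shuffle_p_value(x, y)
```

With 200 surrogates the smallest attainable p value is 1/201, so this surrogate count is just sufficient to test at the 0.01 level used for P in the maps.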

The spatial correlations between global maps of nI(SM, LE) and nT(SM > LE) from CCI–GLEAM, ECMWF, and ERA5 are 0.46, 0.73, and 0.66, respectively. Most regions, particularly semiarid regions with a strong contemporaneous SM–LE dependency, also show a strong time-lagged SM–LE dependency. However, strong nT(SM > LE) is also found over Central Asia and some regions that have shown relatively weak nI(SM, LE). Strong nT(SM > LE) is also found over regions that show weak to moderate coupling in other studies during boreal summer, such as the U.S. Midwest and parts of subarctic Canada [e.g., Abdolghafoorian and Dirmeyer (2021) and Koster et al. (2004)].

The spatial correlations across the middle row of nT(SM > LE) from the pairs GLEAM–ECMWF and GLEAM–ERA5 are 0.43 and 0.41, respectively. The differences in spatial patterns of nT(SM > LE) are due to overestimation of nT(SM > LE) from ECMWF and ERA5 over regions such as the eastern United States and Sahara and Arabian Deserts and underestimation of nT(SM > LE) from ECMWF and ERA5 over regions such as Australia and the western United States. Relatively low nT(SM > LE) from CCI–GLEAM over the Sahara and Arabian Deserts is consistent with the results of the study by Abdolghafoorian and Dirmeyer (2021), where the strength of LA coupling over these regions was found to be weak in four observationally based products (including the combination of CCI–GLEAM) when using the correlation-based terrestrial coupling index that takes into account the variability in latent heat flux over time as a response term. Very low temporal variability of LE and consequently low entropy of LE leads to a weak influence of SM on LE over extremely arid regions. A larger positive nT(SM > LE) bias of ERA5 than ECMWF over the Sahara and Arabian Deserts suggests this approach is very sensitive to errors introduced to the modeling system over extremely arid climates by the assimilation of satellite-based soil moisture. Also, it should be noted that similar to FLUXCOM, HTESSEL uses (monthly) LAI climatology in order to characterize the effect of vegetation phenology, and consequently, the effect of daily variations of vegetation is not reflected in surface fluxes anomalies. This might be responsible for some of the differences seen between HTESSEL and GLEAM, as it might explain the nT(SM > LE) differences between FLUXCOM and others.

The maps of dominant coupling type reveal how SM and LE are related to each other over the globe and whether two variables synchronize and interact strongly with no time lag and/or one variable forces the other. Based on the map of coupling types from observationally based CCI–GLEAM data, the high latitudes, Southeast China, and East India are mostly classified as the synchronization type where SM and LE are related at lag zero in JJA. These regions have been traditionally categorized in the soil moisture wet regime based on the classic view of potential water limitation impact on surface fluxes (Schwingshackl et al. 2017) where SM and LE have no linear relationship and SM does not control LE. The results here, however, suggest the existence of SM–LE coupling at lag zero.

The dominant coupling type in the western United States, Australia, the Kazakh steppes, the Kalahari Desert, and part of the Sahara Desert is classified as feedback where both mutual information and transfer of entropy are significant. However, the shared information at lag zero is more than the predictive information from the history of SM to LE. These regions are traditionally categorized in the dry soil moisture regime where a very small dynamic range of soil moisture of the regions has little impact on LE. The forcing SM–LE coupling type is observed over the tropical grasslands of Africa (savannas); cool semiarid regions in North America, South America, and Asia; and a small portion of Europe, which have been predominantly recognized as a transitional soil moisture regime in other studies (Schwingshackl et al. 2017). A few spots are categorized in the uncoupled SM–LE coupling type, mostly in Arabian and Sahara deserts where traditional classification is a very dry soil moisture regime.
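The decision logic behind the four coupling types can be sketched as a small Python function. The rules for synchronization and feedback follow the descriptions in the text above (significant mutual information without significant transfer entropy, and both significant with mutual information larger, respectively). The rules used here for forcing and uncoupled, and the handling of a significant transfer entropy without significant mutual information, are assumptions for illustration only; the formal definitions are given in the methods section of the study.

```python
def coupling_type(n_i, n_t, i_significant, t_significant):
    """Assign a dominant coupling type from normalized IT metrics.

    n_i, n_t: normalized mutual information and transfer entropy.
    i_significant, t_significant: outcome of the significance tests.
    """
    if i_significant and t_significant:
        # Both significant: feedback if lag-zero information dominates.
        # Assumption: otherwise the time-lagged link dominates (forcing).
        return "feedback" if n_i > n_t else "forcing"
    if i_significant:
        # Related at lag zero, no significant predictive information.
        return "synchronization"
    if t_significant:
        # Assumption: only the time-lagged link is significant.
        return "forcing"
    # Assumption: neither metric is significant.
    return "uncoupled"

# Hypothetical grid cells.
examples = [
    coupling_type(0.20, 0.10, True, True),
    coupling_type(0.10, 0.20, True, True),
    coupling_type(0.20, 0.01, True, False),
    coupling_type(0.01, 0.01, False, False),
]
```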

The spatial pattern of coupling type from observationally based CCI–GLEAM data is quite different from that of the unconstrained ECMWF model, where the SM–LE relationship is classified as feedback over almost all of the world (over 93% of land area), suggesting a uniform SM–LE relationship in the model regardless of climate or land cover. There is better agreement between observationally based CCI–GLEAM data and ERA5 than with the unconstrained ECMWF retrospective forecasts. The coupling type in ERA5 is accurately represented over regions such as Southeast Asia, East Africa, much of the high latitudes, and the African savannas.

Figure 6 compares the performance of ECMWF and ERA5 in representing the coupling type in comparison with observationally based CCI–GLEAM data. ECMWF and ERA5 respectively identify the SM–LE coupling type of 37.8% and 49.7% of grid cells correctly (but not assuredly for the right physical reasons). The informedness metric for synchronization, feedback, forcing, and uncoupled coupling type in ECMWF is 0.02, 0.07, −0.01, and 0.00, while in ERA5 it is 0.24, 0.27, 0.09, and 0.01, respectively. The lower overall precision and also lower informedness metric for each coupling type in ECMWF compared to ERA5 reveal the effectiveness of data assimilation in improving LA coupling in the ECMWF prediction model. It is also clear that the models skew excessively toward the feedback coupling type, and they particularly misattribute the synchronization coupling type as feedback type.

Fig. 6.

Confusion matrix for (left) ECMWF and (right) ERA5 indicating the performance of models in the representation of SM–LE coupling type. Each entry in the matrices denotes the percentage of grid cells in that categorical pair; the sum in each matrix is 100%. The percentages of grid cells that are precisely classified by models are highlighted in red.

c. SM–SH linkage

Figure 7 compares strength and coupling type of SM–SH relations in the observationally based products (i.e., CCI–GLEAM, the first column, and CCI–FLUXCOM, the second column), the unconstrained coupled model (i.e., ECMWF, the third column), and the constrained reanalysis model (i.e., ERA5, the fourth column). The global distribution of nI(SM, SH), nT(SM > SH), and SM–SH dominant coupling type are depicted in the first, second, and third rows, respectively. The spatial mean of metrics over grid cells having a finite value from all four products, and the percentage of grid cells with significant values (p value < 0.01, represented by P) are shown above each map in Fig. 7.

Fig. 7.

The global distribution of normalized mutual information, nI(SM, SH), normalized transfer entropy, nT(SM > SH), and SM–SH dominant coupling types from GLEAM, FLUXCOM, ECMWF, and ERA5 for boreal summer (June–August).

Figure 7 (first row) shows regional variations in the strength of nI(SM, SH) within each product. Even though the strength of nI(SM, SH) from CCI–FLUXCOM is weaker than CCI–GLEAM (e.g., over Australia and the Great Plains of North America), CCI–FLUXCOM and CCI–GLEAM show a comparable spatial pattern with a spatial correlation of 0.73. They both show a relatively large nI(SM, SH) over the central and western United States, India, Australia, Mexico, and South Africa and a relatively low nI(SM, SH) over densely vegetated areas (e.g., the eastern United States) and high latitudes. This consistency is promising for benchmarking model performance.

In comparison with observationally based products, models mostly show weaker contemporaneous SM–SH dependencies over the Southern Hemisphere. ERA5 represents stronger contemporaneous SM–SH dependency over India and the Sahel. The spatial patterns of the two models are in good agreement with each other, with a spatial correlation of 0.62. The spatial pattern of nI(SM, SH) from models, however, shows only a moderate degree of consistency with the observationally based products. The spatial correlations of the pairs GLEAM–ECMWF, FLUXCOM–ECMWF, GLEAM–ERA5, and FLUXCOM–ERA5 are 0.37, 0.33, 0.33, and 0.32, respectively. Unlike CCI–GLEAM and CCI–FLUXCOM, the two configurations of the ECMWF modeling system represent a relatively strong nI(SM, SH) over regions such as the eastern United States and tropical regions in Southeast Asia (where weak SM–SH dependency is expected) and a relatively weak nI(SM, SH) over regions such as Australia and South Africa (where strong SM–SH dependency is expected).

In comparison with the SM–LE relationships (Fig. 5), SM–SH dependencies are mostly weaker over the globe. ECMWF shows better agreement with CCI–GLEAM in case of SM–LE than SM–SH. This could be due to the better performance of HTESSEL in estimation of latent heat flux than sensible heat flux (Best et al. 2015), even though the representation of sensible heat flux is less complex than latent heat flux (Pitman 2003).

Figure 7 (second row) shows regional variations in the strength of nT(SM > SH) within each product. Similar to nI(SM, SH), the strength of nT(SM > SH) from observationally based data is stronger over arid and semiarid regions such as Australia, the eastern United States, and southern Africa. The regional variation of nT(SM > SH) from FLUXCOM is smaller than that from GLEAM, but there is a statistically significant positive spatial correlation between maps of nT(SM > SH) from FLUXCOM and GLEAM (i.e., 0.33).

The domain averaged nT(SM > SH) from models and observationally based datasets is comparable. However, ERA5 overestimates nT(SM > SH) over very dry areas (e.g., the Sahel). The nT(SM > SH) from models show a weaker spatial correlation with observationally based data than nI(SM, SH). The spatial correlation of pairs GLEAM–ECMWF, GLEAM–ERA5, FLUXCOM–ECMWF, and FLUXCOM–ERA5 are 0.16, 0.24, 0.07, and 0.04, respectively. As mentioned earlier, the performance of LSMs, including HTESSEL, is weaker in estimating SH than LE. It was also shown that they are outperformed even by a linear regression against downward shortwave radiation (Best et al. 2015; Haughton et al. 2016), indicating deficiencies in parameterization of SH.

Figure 7 (third row) shows the global dominant coupling type between SM and SH. GLEAM and FLUXCOM are mostly similar in classifying the SM–SH coupling type over the globe. Based on the map of coupling type from observationally based data, high latitudes, southeast China, southern Mexico, the southeastern United States, and India are mostly classified as synchronization type where SM and SH are related at the lag of zero in JJA, but there is no significant predictive information from SM to SH. The dominant coupling type in the western United States, Australia, the Kazakh steppes, and Kalahari Desert is classified as feedback where both mutual information and transfer of entropy are statistically significant; however, the shared information at lag zero is stronger than predictive information from the history of SM to SH. The forcing SM–SH coupling type is observed over a few locations, mostly located over sparsely vegetated areas. A few locations are identified as the uncoupled SM–SH coupling type, mostly in the Arabian and Sahara Deserts and some over high latitudes.

Similar to SM–LE, the unconstrained ECMWF model represents the dominant SM–SH coupling type as feedback almost everywhere over the world (more than 92% of land area), suggesting a uniform prescribed SM–SH relationship in the model regardless of the climate and land cover type of regions. This is most probably due to the misrepresentation of time-lagged SM–SH dependencies in the model, as the low spatial correlation between maps of nT(SM > SH) from ECMWF and observationally based data is evident in Fig. 7 (second row). Apparently, data assimilation improves the performance of the LSM, as the SM–SH coupling type from ERA5 shows a better agreement with observationally based data. The coupling type in ERA5 is accurately represented as synchronization over regions such as India, parts of Southeast Asia, and Mexico and as uncoupled in the Sahara and Arabian Deserts.

Figure 8 compares the performance of ECMWF and ERA5 in representing the coupling type in comparison with observationally based data. Considering GLEAM as the true benchmark, ECMWF and ERA5 precisely identify the SM–SH coupling type of 39.9% and 48.8% of grid cells, respectively. The informedness metric for synchronization, feedback, forcing, and uncoupled coupling type in ECMWF is 0.00, 0.02, 0.04, and 0.00, while in ERA5 it is 0.12, 0.15, 0.03, and 0.05, respectively. Considering FLUXCOM as the true benchmark, ECMWF and ERA5 correctly identify the SM–SH coupling type of 28.9% and 39.5% of grid cells, respectively. The informedness metric for synchronization, feedback, forcing, and uncoupled coupling type in ECMWF is 0.00, 0.01, 0.03, and 0.00, while in ERA5 it is 0.10, 0.14, 0.00, and 0.01, respectively. The lower overall precision and lower informedness metric for each coupling type in ECMWF compared to ERA5 suggest the effectiveness of data assimilation toward the improvement of LA coupling representation in the ECMWF prediction system.

Fig. 8.

Confusion matrix for (left) ECMWF and (right) ERA5 indicating the performance of models in the representation of SM–SH coupling type. Each entry in the matrices denotes the percentage of grid cells in that categorical pair; the sum in each matrix is 100%. The percentages of grid cells that are precisely classified by models are highlighted in red.

5. Conclusions

In this study, we have employed information theoretic (IT) metrics, including mutual information (I) and transfer entropy (T), to study the coupling strength and type between surface soil moisture and land surface fluxes of sensible heat (SH) and latent heat (LE), known as the terrestrial leg of LA interactions. IT metrics are nonparametric and suitable for detecting nonlinear, contemporaneous, time-lagged, and cause-and-effect dependencies. Particularly, we have derived novel IT formulations and a method to account for the effect of observational random errors in remote sensing and estimate the strength of SM–surface heat flux coupling when using noisy time series of soil moisture. The mitigation of noise (MoN) method provides a powerful means to ameliorate the degradation of coupling estimates caused by random errors in variables whose time series resemble a first-order Markov process. Performance of MoN has been demonstrated through a series of synthetic tests and a comparison with metrics calculated from in situ data. Efficacy of the correction was greatest for estimates of soil moisture entropy and for mutual information involving soil moisture; improvements to estimates of transfer entropy were significant but less stark.

Using the MoN method, we have constructed, from gridded observationally based datasets, a global depiction of the information flow and coupling type between SM and surface heat fluxes during boreal summer (June–August). We have used satellite-based soil moisture datasets from ESA CCI and observationally based surface heat flux datasets from GLEAM and FLUXCOM. The comparison between observationally based MoN-derived IT metrics and in situ IT metrics at the locations of FLUXNET towers has revealed that the more erroneous/noisy the SM data are, the more effective MoN could be in improving estimates of IT-based coupling metrics. Addressing the effect of CCI observational random noise would not improve estimates of IT-based coupling metrics if in situ and observationally based surface flux products are not in good agreement. However, the findings from cross validation (i.e., 1° × 1° gridded observationally based IT metrics against in situ IT metrics) should be interpreted with caution due to the representativeness errors resulting from the mismatch in spatial scale between the grid boxes and the instrument footprints, and particularly the vegetation and soil differences between FLUXNET sites and the wider region within a satellite pixel or GLEAM grid cell.

We have based our error estimates on the method of Vinnikov and Yeserkepova (1991) rather than using the uncertainty estimates provided in the ESA CCI soil moisture data, but the MoN method presented here could be implemented using any source of error estimates. In addition, as discussed by Abdolghafoorian and Dirmeyer (2021), there are differences in the character of the time series of surface fluxes from FLUXCOM and GLEAM. This could cause inconsistencies between observationally based maps of IT metrics and coupling type. Unlike FLUXCOM, GLEAM uses ESA CCI soil moisture. On the other hand, unlike GLEAM, FLUXCOM uses in situ surface fluxes as a basis for its estimates. These differences also could contribute to inconsistencies between observationally based maps of IT metrics and coupling type from FLUXCOM and GLEAM.

Consistent with previous studies, a relatively strong SM–surface heat flux relationship is observed over dry areas. In addition, compared to studies that only investigate the linear SM–surface heat flux relationships through correlation-based metrics (e.g., among others), significant terrestrial coupling is observed in extended regions over the globe, as IT metrics enable detection of nonlinear contemporaneous and time-lagged dependencies. Results from gridded observationally based datasets indicate stronger contemporaneous and time-lagged dependencies between soil moisture and latent heat flux than between soil moisture and sensible heat flux.

We have used the global maps of information flow from observationally based products as independent verification for the performance of two configurations of the ECMWF modeling system (viz., unconstrained open-loop retrospective forecasts and constrained reanalysis, i.e., ERA5). This includes evaluation of the embedded land surface model (i.e., HTESSEL), atmospheric model components that interact at the surface, and also the effect of data assimilation on the performance of the model system.

Our comparison reveals that both unconstrained and constrained configurations of ECMWF models overrepresent the relationship between SM and LE, particularly over arid and semiarid regions, even after mitigating the effect of noise in the observationally based CCI soil moisture product. This is also the case for the SM–SH relationship in ERA5. Possible reasons for these discrepancies, and for the low-to-moderate consistency between spatial patterns of IT metrics from models and observationally based products, are discussed in section 4. An improved formulation of the water stress function (Combe et al. 2016), an improved representation of land cover and vegetation (Nogueira et al. 2020), and a dynamically evolving rooting depth (Liu et al. 2020) may improve the representation of terrestrial LA coupling in the ECMWF modeling system. The spatial pattern of coupling type from observations is quite different from that of the unconstrained ECMWF model, where the SM–surface heat flux relationships are classified as feedback over almost all the world, suggesting a uniform SM–surface heat flux relationship in the model regardless of climate or land cover. We have seen better agreement between observationally based data and ERA5 than for the open-loop ECMWF retrospective forecasts; an important part of the improvement may be attributable to soil moisture data assimilation.

It should be noted that neither GLEAM nor FLUXCOM can be considered as ground truth, as they are themselves models fed by remotely sensed data, and consequently, the accuracy of their surface heat flux estimates is impacted by uncertainties introduced by the model structure and input data. Nevertheless, most analyses point toward the suboptimal representation of SM–surface heat flux couplings by models. The differences seen between unconstrained and constrained configurations of ECMWF models are intriguing, but this should not be taken as a conclusive assessment of these models. Rather, this study is more of a demonstration of the MoN methodology and its impact on model validation. There is room for progress on many fronts: expansion of observations, improved analyses, better use of the global coverage of remote sensing data, and higher fidelity models. The methodology presented here can help overcome the perpetual problem of observational error and its impact on quantification of multivariate linkages in the LA system.

Acknowledgments.

This research has been supported by National Aeronautics and Space Administration (NASA) Grant NNX14AM19G. Sensible heat flux data are not part of the official release of GLEAM data. We would like to extend our sincere thanks to Drs. Akash Koppa and Diego Miralles and their colleagues at Ghent University for their assistance and consultation regarding the GLEAM dataset.

Data availability statement.

All datasets used in this study (excluding sensible heat flux data from GLEAM) are publicly available: FLUXCOM at http://www.fluxcom.org/; GLEAM at https://www.gleam.eu/; ESA CCI SM at https://www.esa-soilmoisture-cci.org/; S2S reforecasts at http://apps.ecmwf.int/datasets/data/s2s/; and FLUXNET2015 at https://fluxnet.org/doi/FLUXNET2015/.

APPENDIX A

Description of FLUXNET Field Sites

Table A1 lists the selected FLUXNET flux tower sites, including ID, vegetation type, and location.

Table A1

Selected FLUXNET flux tower site IDs and locations. Further details including the vegetation class abbreviations can be found via the FLUXNET data portal (https://fluxnet.org/data/fluxnet2015-dataset/).


APPENDIX B

Derivation of IT Metrics

Mutual information between two variables X and Y can be expressed as a function of the marginal entropy of X, H(X), and the conditional entropy of X given a realization of Y, H(X|Y), as (see Fig. B1)
$$I(X,Y) = H(X) - H(X|Y). \tag{B1}$$
Fig. B1.

Venn diagram showing the relations between mutual information I(X, Y), entropy H(X), and conditional entropy H(X|Y).

Citation: Journal of Hydrometeorology 23, 10; 10.1175/JHM-D-21-0232.1

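Here, H(X|Y) is defined as the weighted average of H(X|Y = y) for each possible value of y, using P(Y = y) as the weights:
$$H(X|Y) = \sum_{y \in R_y} \left[ P(Y=y)\, H(X|Y=y) \right]. \tag{B2}$$
The H(X|Y = y) is obtained using Eq. (1) as
$$H(X|Y=y) = -\sum_{x \in R_x} \left[ P(X=x|Y=y) \log_2 P(X=x|Y=y) \right]. \tag{B3}$$
The P(X + Z|Y = y) can be obtained as a convolution of P(X|Y = y) and P(Z|Y = y) as
$$P(X+Z=x+z|Y=y) = \sum_{z \in R_z} \left[ P(X=x|Y=y) \times P(Z=z|Y=y) \right]. \tag{B4}$$
Given Z and Y are independent, P(Z|Y = y) is equal to P(Z); therefore, Eq. (B4) can be written as
$$P(X+Z=x+z|Y=y) = \sum_{z \in R_z} \left[ P(X=x|Y=y)\, P(Z=z) \right]. \tag{B5}$$
Substituting Eq. (B5) into Eq. (B3) and the obtained equation into Eq. (B2), the entropy of X + Z conditional on Y is obtained as
$$H(X+Z|Y) = -\sum_{y \in R_y} \sum_{x \in R_x} \left\{ P(Y=y) \sum_{z \in R_z} \left[ P(X=x|Y=y)\, P(Z=z) \right] \times \log_2 \sum_{z \in R_z} \left[ P(X=x|Y=y)\, P(Z=z) \right] \right\}. \tag{B6}$$
Substituting Eqs. (6) and (B6) in Eq. (B1), the mutual information between X + Z and Y is obtained as
$$I(X+Z,Y) = -\sum_{x \in R_x} \left\{ \sum_{z \in R_z} \left[ P(X=x)\, P(Z=z) \right] \log_2 \sum_{z \in R_z} \left[ P(X=x)\, P(Z=z) \right] \right\} + \sum_{y \in R_y} \sum_{x \in R_x} \left\{ P(Y=y) \sum_{z \in R_z} \left[ P(X=x|Y=y)\, P(Z=z) \right] \log_2 \sum_{z \in R_z} \left[ P(X=x|Y=y)\, P(Z=z) \right] \right\}, \tag{B7}$$
which can be presented in a simplified format as shown in Eq. (7):
$$I(X+Z,Y) = H\big(P(X) \ast P(Z)\big) - \sum_{y \in R_y} \left[ P(Y=y)\, H\big(P(X|Y=y) \ast P(Z)\big) \right].$$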
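The simplified expression above can be illustrated numerically. The following is a minimal Python sketch, not the authors' implementation: it assumes that X and Z are discretized on equally spaced bins of the same width (so that a discrete convolution of their PMFs represents the PMF of X + Z), and the function names `entropy_bits` and `noisy_mutual_information`, as well as the toy probabilities, are hypothetical choices made for illustration only.

```python
import numpy as np


def entropy_bits(pmf):
    """Shannon entropy (bits) of a discrete PMF; empty bins are skipped."""
    p = np.asarray(pmf, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def noisy_mutual_information(p_x_given_y, p_y, p_z):
    """I(X+Z, Y) = H(P(X)*P(Z)) - sum_y P(Y=y) H(P(X|Y=y)*P(Z)).

    p_x_given_y : array (n_y, n_x); row k holds P(X | Y = y_k)
    p_y         : array (n_y,); marginal PMF of Y
    p_z         : array (n_z,); PMF of the noise Z (assumed independent of Y)
    """
    p_x_given_y = np.asarray(p_x_given_y, dtype=float)
    p_y = np.asarray(p_y, dtype=float)
    p_z = np.asarray(p_z, dtype=float)
    # Marginal P(X) = sum_y P(Y=y) P(X|Y=y)
    p_x = p_y @ p_x_given_y
    # Entropy of the marginal PMF of X + Z (convolution with the noise PMF)
    h_marginal = entropy_bits(np.convolve(p_x, p_z))
    # Weighted average of the entropies of the conditional PMFs of X + Z
    h_conditional = sum(
        w * entropy_bits(np.convolve(row, p_z))
        for w, row in zip(p_y, p_x_given_y)
    )
    return h_marginal - h_conditional


# Toy inputs; all numbers are illustrative assumptions.
p_y = np.array([0.5, 0.5])
p_x_given_y = np.array([
    [0.7, 0.2, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.7],
])
no_noise = np.array([1.0])           # Z is identically zero
noise = np.array([0.25, 0.5, 0.25])  # symmetric three-bin noise PMF

mi_clean = noisy_mutual_information(p_x_given_y, p_y, no_noise)
mi_noisy = noisy_mutual_information(p_x_given_y, p_y, noise)
print(f"I(X, Y)     = {mi_clean:.3f} bits")
print(f"I(X + Z, Y) = {mi_noisy:.3f} bits")
```

With a noise PMF concentrated in a single bin the function reduces to Eq. (B1), and because Z is independent of Y, adding noise cannot increase the mutual information, which is the behavior the sketch is meant to show.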
Transfer of entropy from X to Y can be expressed as a function of the conditional entropy of Xτ given Y1, H(Xτ|Y1), and the conditional entropy of Xτ given a realization of Y and Y1, H(Xτ|Y, Y1), as (see Fig. B2)
$$T(X > Y) = H(X_\tau|Y_1) - H(X_\tau|Y,Y_1), \tag{B8}$$
where the time lag of time series is indicated by subscripts.
Fig. B2.

Venn diagram showing the relations between transfer entropy, T(X > Y), conditional entropy of Xτ given Y1, H(Xτ|Y1), and conditional entropy of Xτ given Y and Y1, H(Xτ|Y, Y1). Subscripts represent time lag of time series.


The H(Xτ|Y1) is defined as the weighted average of H(Xτ|Y1 = y1) for each possible value y1, using P(Y1 = y1) as the weights:
$$H(X_\tau|Y_1) = \sum_{y_1 \in R_{y_1}} P(Y_1=y_1)\, H(X_\tau|Y_1=y_1). \tag{B9}$$
The H(Xτ|Y1 = y1) is obtained using Eq. (1) as
$$H(X_\tau|Y_1=y_1) = -\sum_{x_\tau \in R_{x_\tau}} \left[ P(X_\tau=x_\tau|Y_1=y_1) \times \log_2 P(X_\tau=x_\tau|Y_1=y_1) \right]. \tag{B10}$$
The H(Xτ|Y, Y1) is defined as the weighted average of H(Xτ|Y = y, Y1 = y1), using the joint probability P(Y = y, Y1 = y1) as the weights:
$$H(X_\tau|Y,Y_1) = \sum_{y_1 \in R_{y_1}} \sum_{y \in R_y} \left[ P(Y=y,Y_1=y_1)\, H(X_\tau|Y=y,Y_1=y_1) \right]. \tag{B11}$$
The H(Xτ|Y = y, Y1 = y1) is obtained using Eq. (1) as
$$H(X_\tau|Y=y,Y_1=y_1) = -\sum_{x_\tau \in R_{x_\tau}} \left[ P(X_\tau=x_\tau|Y=y,Y_1=y_1) \times \log_2 P(X_\tau=x_\tau|Y=y,Y_1=y_1) \right]. \tag{B12}$$
The conditional probability of (X + Z)τ given Y and Y1 is obtained as a convolution of P(Xτ|Y = y, Y1 = y1) and P(Zτ|Y = y, Y1 = y1) as
$$P\big((X+Z)_\tau|Y=y,Y_1=y_1\big) = \sum_{z \in R_z} \left[ P(X_\tau=x_\tau|Y=y,Y_1=y_1)\, P(Z=z) \right]. \tag{B13}$$
Note that P(Zτ|Y = y, Y1 = y1) is equal to P(Zτ), given Z and Y are independent and the PDF of Z is the same as the PDF of Zτ, given Z is a random variable with an assumed normal distribution. The entropy of (X + Z)τ given Y and Y1 is obtained as
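$$H\big((X+Z)_\tau|Y,Y_1\big) = -\sum_{y_1 \in R_{y_1}} \left[ \sum_{y \in R_y} \left( P(Y=y,Y_1=y_1) \sum_{x_\tau \in R_{x_\tau}} \left\{ \sum_{z \in R_z} \left[ P(X_\tau=x_\tau|Y=y,Y_1=y_1)\, P(Z=z) \right] \log_2 \sum_{z \in R_z} \left[ P(X_\tau=x_\tau|Y=y,Y_1=y_1)\, P(Z=z) \right] \right\} \right) \right], \tag{B14}$$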
which can be presented as
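$$H\big((X+Z)_\tau|Y,Y_1\big) = \sum_{y_1 \in R_{y_1}} \sum_{y \in R_y} \left[ P(Y=y,Y_1=y_1)\, H\big(P(X_\tau|Y=y,Y_1=y_1) \ast P(Z)\big) \right]. \tag{B15}$$
Substituting Eq. (B15) into Eq. (B8), the transfer entropy from X + Z to Y is obtained as Eq. (8):
$$T(X+Z > Y) = \sum_{y_1 \in R_{y_1}} \left[ P(Y_1=y_1)\, H\big(P(X_\tau|Y_1=y_1) \ast P(Z)\big) \right] - \sum_{y_1 \in R_{y_1}} \sum_{y \in R_y} \left[ P(Y=y,Y_1=y_1)\, H\big(P(X_\tau|Y=y,Y_1=y_1) \ast P(Z)\big) \right].$$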
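The final expression for T(X + Z > Y) can be evaluated directly from a discretized joint PMF. The Python sketch below is an illustration under stated assumptions rather than the authors' code: the joint PMF is stored as a three-dimensional array indexed as (Xτ, Y, Y1), X and Z share equally spaced bins of the same width, and the names `noisy_transfer_entropy` and `entropy_bits`, along with the randomly generated toy PMF, are hypothetical.

```python
import numpy as np


def entropy_bits(pmf):
    """Shannon entropy (bits) of a discrete PMF; empty bins are skipped."""
    p = np.asarray(pmf, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def noisy_transfer_entropy(p_joint, p_z):
    """T(X+Z > Y) from p_joint[i, j, k] = P(X_tau=x_i, Y=y_j, Y_1=y_k)."""
    p_joint = np.asarray(p_joint, dtype=float)
    p_z = np.asarray(p_z, dtype=float)
    p_x_y1 = p_joint.sum(axis=1)  # P(X_tau, Y_1), shape (n_x, n_y1)
    p_y1 = p_x_y1.sum(axis=0)     # P(Y_1)
    p_y_y1 = p_joint.sum(axis=0)  # P(Y, Y_1), shape (n_y, n_y1)

    # First term: sum_{y1} P(Y_1=y1) H(P(X_tau | Y_1=y1) * P(Z))
    h_given_y1 = 0.0
    for k, w in enumerate(p_y1):
        if w > 0:
            h_given_y1 += w * entropy_bits(np.convolve(p_x_y1[:, k] / w, p_z))

    # Second term: sum_{y1} sum_y P(Y=y, Y_1=y1) H(P(X_tau | y, y1) * P(Z))
    h_given_y_y1 = 0.0
    for j in range(p_joint.shape[1]):
        for k in range(p_joint.shape[2]):
            w = p_y_y1[j, k]
            if w > 0:
                cond = p_joint[:, j, k] / w
                h_given_y_y1 += w * entropy_bits(np.convolve(cond, p_z))

    return h_given_y1 - h_given_y_y1


# Toy joint PMF (illustrative assumption): random positive weights, normalized.
rng = np.random.default_rng(0)
p_joint = rng.random((4, 3, 3))
p_joint /= p_joint.sum()

no_noise = np.array([1.0])           # Z is identically zero
noise = np.array([0.25, 0.5, 0.25])  # symmetric three-bin noise PMF

te_clean = noisy_transfer_entropy(p_joint, no_noise)
te_noisy = noisy_transfer_entropy(p_joint, noise)
print(f"T(X > Y)     = {te_clean:.4f} bits")
print(f"T(X + Z > Y) = {te_noisy:.4f} bits")
```

With a single-bin noise PMF the function reduces to Eq. (B8). Because Z is assumed independent of Y and Y1, the noisy estimate should not exceed the noise-free one, which mirrors the attenuation of coupling estimates that the method is designed to account for.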

REFERENCES

  • Abdolghafoorian, A., and P. A. Dirmeyer, 2021: Validating the land–atmosphere coupling behavior in weather and climate models using observationally based global products. J. Hydrometeor., 22, 1507–1523, https://doi.org/10.1175/JHM-D-20-0183.1.
  • Balsamo, G., F. Pappenberger, E. Dutra, P. Viterbo, and B. van den Hurk, 2011: A revised land hydrology in the ECMWF model: A step towards daily water flux prediction in a fully-closed water cycle. Hydrol. Processes, 25, 1046–1054, https://doi.org/10.1002/hyp.7808.
  • Barnett, L., A. B. Barrett, and A. K. Seth, 2009: Granger causality and transfer entropy are equivalent for Gaussian variables. Phys. Rev. Lett., 103, 238701, https://doi.org/10.1103/PhysRevLett.103.238701.
  • Bennett, A., B. Nijssen, G. Ou, M. Clark, and G. Nearing, 2019: Quantifying process connectivity with transfer entropy in hydrologic models. Water Resour. Res., 55, 4613–4629, https://doi.org/10.1029/2018WR024555.
  • Best, M. J., and Coauthors, 2015: The plumbing of land surface models: Benchmarking model performance. J. Hydrometeor., 16, 1425–1442, https://doi.org/10.1175/JHM-D-14-0158.1.
  • Combe, M., J. V. G. de Arellano, H. G. Ouwersloot, and W. Peters, 2016: Plant water-stress parameterization determines the strength of land–atmosphere coupling. Agric. For. Meteor., 217, 61–73, https://doi.org/10.1016/j.agrformet.2015.11.006.
  • Cover, T. M., and J. A. Thomas, 2006: Elements of Information Theory. 2nd ed. John Wiley & Sons, Ltd., 556 pp.
  • Delworth, T. L., and S. Manabe, 1988: The influence of potential evaporation on the variabilities of simulated soil wetness and climate. J. Climate, 1, 523–547, https://doi.org/10.1175/1520-0442(1988)001<0523:TIOPEO>2.0.CO;2.
  • Dirmeyer, P. A., 2011: The terrestrial segment of soil moisture-climate coupling. Geophys. Res. Lett., 38, L16702, https://doi.org/10.1029/2011GL048268.
  • Dirmeyer, P. A., and Coauthors, 2016: Confronting weather and climate models with observational data from soil moisture networks over the United States. J. Hydrometeor., 17, 1049–1067, https://doi.org/10.1175/JHM-D-15-0196.1.
  • Dirmeyer, P. A., P. Gentine, M. B. Ek, and G. Balsamo, 2019: Land surface processes relevant to sub-seasonal to seasonal (S2S) prediction. Sub-Seasonal to Seasonal Prediction, Elsevier, 165–181, https://doi.org/10.1016/B978-0-12-811714-9.00008-5.
  • Dorigo, W., and Coauthors, 2017: ESA CCI Soil Moisture for improved Earth system understanding: State-of-the art and future directions. Remote Sens. Environ., 203, 185–215, https://doi.org/10.1016/j.rse.2017.07.001.
  • Findell, K. L., P. Gentine, B. R. Lintner, and B. P. Guillod, 2015: Data length requirements for observational estimates of land-atmosphere coupling strength. J. Hydrometeor., 16, 1615–1635, https://doi.org/10.1175/JHM-D-14-0131.1.
  • Gerken, T., and Coauthors, 2019: Robust observations of land-to-atmosphere feedbacks using the information flows of FLUXNET. npj Climate Atmos. Sci., 2, 37, https://doi.org/10.1038/s41612-019-0094-4.
  • Goodwell, A. E., P. Jiang, B. L. Ruddell, and P. Kumar, 2020: Debates—Does information theory provide a new paradigm for Earth science? Causality, interaction, and feedback. Water Resour. Res., 56, e2019WR024940, https://doi.org/10.1029/2019WR024940.

  • Guo, Z., and Coauthors, 2006: GLACE: The Global Land–Atmosphere Coupling Experiment. Part II: Analysis. J. Hydrometeor., 7, 611–625, https://doi.org/10.1175/JHM511.1.
  • Haghighi, E., D. J. S. Gianotti, R. Akbar, G. D. Salvucci, and D. Entekhabi, 2018: Soil and atmospheric controls on the land surface energy balance: A generalized framework for distinguishing moisture-limited and energy-limited evaporation regimes. Water Resour. Res., 54, 1831–1851, https://doi.org/10.1002/2017WR021729.
  • Haughton, N., and Coauthors, 2016: The plumbing of land surface models: Is poor performance a result of methodology or data quality? J. Hydrometeor., 17, 1705–1723, https://doi.org/10.1175/JHM-D-15-0171.1.
  • Hersbach, H., and Coauthors, 2020: The ERA5 global reanalysis. Quart. J. Roy. Meteor. Soc., 146, 1999–2049, https://doi.org/10.1002/qj.3803.
  • Hsu, H., and P. A. Dirmeyer, 2021: Nonlinearity and multivariate dependencies in the terrestrial leg of land-atmosphere coupling. Water Resour. Res., 57, e2020WR028179, https://doi.org/10.1029/2020WR028179.
  • Jacquette, E., A. Al Bitar, A. Mialon, Y. Kerr, A. Quesney, F. Cabot, and P. Richaume, 2010: SMOS CATDS level 3 global products over land. Proc. SPIE, 7824, 78240K, https://doi.org/10.1117/12.865093.
  • Jung, M., and Coauthors, 2019: The FLUXCOM ensemble of global land-atmosphere energy fluxes. Sci. Data, 6, 74, https://doi.org/10.1038/s41597-019-0076-8.
  • Knuth, K. H., 2006: Optimal data-based binning for histograms. arXiv, physics/0605197, https://doi.org/10.48550/arXiv.physics/0605197.
  • Knuth, K. H., A. Gotera, C. T. Curry, K. A. Huyser, K. R. Wheeler, and W. B. Rossow, 2013: Revealing relationships among relevant climate variables with information theory. arXiv, 1311.4632, https://doi.org/10.48550/arXiv.1311.4632.
  • Koster, R. D., and Coauthors, 2004: Regions of strong coupling between soil moisture and precipitation. Science, 305, 1138–1140, https://doi.org/10.1126/science.1100217.
  • Koster, R. D., and Coauthors, 2006: GLACE: The Global Land–Atmosphere Coupling Experiment. Part I: Overview. J. Hydrometeor., 7, 590–610, https://doi.org/10.1175/JHM510.1.
  • Kumar, S., P. A. Dirmeyer, C. D. Peters-Lidard, R. Bindlish, and J. Bolten, 2018: Information theoretic evaluation of satellite soil moisture retrievals. Remote Sens. Environ., 204, 392–400, https://doi.org/10.1016/j.rse.2017.10.016.
  • Lawrence, D. M., P. E. Thornton, K. W. Oleson, and G. B. Bonan, 2007: The partitioning of evapotranspiration into transpiration, soil evaporation, and canopy evaporation in a GCM: Impacts on land-atmosphere interaction. J. Hydrometeor., 8, 862–880, https://doi.org/10.1175/JHM596.1.
  • Li, L., and Coauthors, 2020: A causal inference model based on random forests to identify the effect of soil moisture on precipitation. J. Hydrometeor., 21, 1115–1131, https://doi.org/10.1175/JHM-D-19-0209.1.
  • Liu, X., F. Chen, M. Barlage, and D. Niyogi, 2020: Implementing dynamic rooting depth for improved simulation of soil moisture and land surface feedbacks in Noah-MP-crop. J. Adv. Model. Earth Syst., 12, e2019MS001786, https://doi.org/10.1029/2019MS001786.
  • Lou, W., P. Liu, L. Cheng, and Z. Li, 2022: Identification of soil moisture–precipitation feedback based on temporal information partitioning networks. J. Amer. Water Resour. Assoc., https://doi.org/10.1111/1752-1688.12978, in press.
  • Marschinski, R., and H. Kantz, 2002: Analysing the information flow between financial time series. Eur. Phys. J., 30B, 275–281, https://doi.org/10.1140/epjb/e2002-00379-2.
  • Martens, B., and Coauthors, 2017: GLEAM v3: Satellite-based land evaporation and root-zone soil moisture. Geosci. Model Dev., 10, 1903–1925, https://doi.org/10.5194/gmd-10-1903-2017.
  • Martens, B., D. L. Schumacher, H. Wouters, J. Muñoz-Sabater, N. E. C. Verhoest, and D. G. Miralles, 2020: Evaluating the land-surface energy partitioning in ERA5. Geosci. Model Dev., 13, 4159–4181, https://doi.org/10.5194/gmd-13-4159-2020.
  • Nogueira, M., C. Albergel, S. Boussetta, F. Johanssen, and E. Dutra, 2020: On the added value of improving the spatial representation and seasonal variations of vegetation cover in land surface models for simulated land surface temperature. EGU General Assembly 2020, Online, European Geophysical Union, EGU2020-18110, https://doi.org/10.5194/egusphere-egu2020-18110.
  • Pitman, A. J., 2003: The evolution of, and revolution in, land surface schemes designed for climate models. Int. J. Climatol., 23, 479–510, https://doi.org/10.1002/joc.893.
  • Robock, A., K. Y. Vinnikov, C. A. Schlosser, N. A. Speranskaya, and Y. Xue, 1995: Use of midlatitude soil moisture and meteorological observations to validate soil moisture simulations with biosphere and bucket models. J. Climate, 8, 15–35, https://doi.org/10.1175/1520-0442(1995)008<0015:UOMSMA>2.0.CO;2.
  • Ruddell, B. L., and P. Kumar, 2009: Ecohydrologic process networks: 1. Identification. Water Resour. Res., 45, W03419, https://doi.org/10.1029/2008WR007279.
  • Ruddell, B. L., N. A. Brunsell, and P. Stoy, 2013: Applying information theory in the geosciences to quantify process uncertainty, feedback, scale. Eos, Trans. Amer. Geophys. Union, 94, 56–56, https://doi.org/10.1002/2013EO050007.
  • Salvucci, G. D., J. A. Saleem, and R. Kaufmann, 2002: Investigating soil moisture feedbacks on precipitation with tests of Granger causality. Adv. Water Resour., 25, 1305–1312, https://doi.org/10.1016/S0309-1708(02)00057-X.
  • Santanello, J. A., J. Roundy, and P. A. Dirmeyer, 2015: Quantifying the land-atmosphere coupling behavior in modern reanalysis products over the U.S. Southern Great Plains. J. Climate, 28, 5813–5829, https://doi.org/10.1175/JCLI-D-14-00680.1.
  • Schwingshackl, C., M. Hirschi, and S. I. Seneviratne, 2017: Quantifying spatiotemporal variations of soil moisture control on surface energy balance and near-surface air temperature. J. Climate, 30, 7105–7124, https://doi.org/10.1175/JCLI-D-16-0727.1.
  • Shannon, C. E., 1948: A mathematical theory of communication. Bell Syst. Tech. J., 27, 379–423, https://doi.org/10.1002/j.1538-7305.1948.tb01338.x.
  • Short Gianotti, D. J., A. J. Rigden, G. D. Salvucci, and D. Entekhabi, 2019: Satellite and station observations demonstrate water availability’s effect on continental-scale evaporative and photosynthetic land surface dynamics. Water Resour. Res., 55, 540–554, https://doi.org/10.1029/2018WR023726.
  • Stevens, D., P. M. A. Miranda, R. Orth, S. Boussetta, G. Balsamo, and E. Dutra, 2020: Sensitivity of surface fluxes in the ECMWF land surface model to the remotely sensed leaf area index and root distribution: Evaluation with tower flux data. Atmosphere, 11, 1362, https://doi.org/10.3390/atmos11121362.
  • Tawia Hagan, D. F., G. Wang, X. San Liang, and H. A. J. Dolman, 2019: A time-varying causality formalism based on the Liang–Kleeman information flow for analyzing directed interactions in nonstationary climate systems. J. Climate, 32, 7521–7537, https://doi.org/10.1175/JCLI-D-18-0881.1.
  • Trigo, I. F., S. Boussetta, P. Viterbo, G. Balsamo, A. Beljaars, and I. Sandu, 2015: Comparison of model land skin temperature with remotely sensed estimates and assessment of surface-atmosphere coupling. J. Geophys. Res. Atmos., 120, 12 096–12 111, https://doi.org/10.1002/2015JD023812.
  • Ukkola, A. M., M. G. de Kauwe, A. J. Pitman, M. J. Best, G. Abramowitz, V. Haverd, M. Decker, and N. Haughton, 2016: Land surface models systematically overestimate the intensity, duration and magnitude of seasonal-scale evaporative droughts. Environ. Res. Lett., 11, 104012, https://doi.org/10.1088/1748-9326/11/10/104012.
  • Vinnikov, K. Y., and I. B. Yeserkepova, 1991: Soil moisture: Empirical data and model results. J. Climate, 4, 66–79, https://doi.org/10.1175/1520-0442(1991)004<0066:SMEDAM>2.0.CO;2.
  • Vitart, F., 2014: Evolution of ECMWF sub-seasonal forecast skill scores. Quart. J. Roy. Meteor. Soc., 140, 1889–1899, https://doi.org/10.1002/qj.2256.
  • Vitart, F., and Coauthors, 2017: The Subseasonal to Seasonal (S2S) Prediction project database. Bull. Amer. Meteor. Soc., 98, 163–173, https://doi.org/10.1175/BAMS-D-16-0017.1.
Save
  • Abdolghafoorian, A., and P. A. Dirmeyer, 2021: Validating the land–atmosphere coupling behavior in weather and climate models using observationally based global products. J. Hydrometeor., 22, 15071523, https://doi.org/10.1175/JHM-D-20-0183.1.

    • Search Google Scholar
    • Export Citation
  • Balsamo, G., F. Pappenberger, E. Dutra, P. Viterbo, and B. van den Hurk, 2011: A revised land hydrology in the ECMWF model: A step towards daily water flux prediction in a fully-closed water cycle. Hydrol. Processes, 25, 10461054, https://doi.org/10.1002/hyp.7808.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Barnett, L., A. B. Barrett, and A. K. Seth, 2009: Granger causality and transfer entropy are equivalent for Gaussian variables. Phys. Rev. Lett., 103, 238701, https://doi.org/10.1103/PhysRevLett.103.238701.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Bennett, A., B. Nijssen, G. Ou, M. Clark, and G. Nearing, 2019: Quantifying process connectivity with transfer entropy in hydrologic models. Water Resour. Res., 55, 46134629, https://doi.org/10.1029/2018WR024555.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Best, M. J., and Coauthors, 2015: The plumbing of land surface models: Benchmarking model performance. J. Hydrometeor., 16, 14251442, https://doi.org/10.1175/JHM-D-14-0158.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Combe, M., J. V. G. de Arellano, H. G. Ouwersloot, and W. Peters, 2016: Plant water-stress parameterization determines the strength of land–atmosphere coupling. Agric. For. Meteor., 217, 6173, https://doi.org/10.1016/j.agrformet.2015.11.006.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Cover, T. M., and J. A. Thomas, 2006: Elements of Information Theory. 2nd ed. John Wiley & Sons, Ltd., 556 pp.

  • Delworth, T. L., and S. Manabe, 1988: The influence of potential evaporation on the variabilities of simulated soil wetness and climate. J. Climate, 1, 523547, https://doi.org/10.1175/1520-0442(1988)001<0523:TIOPEO>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Dirmeyer, P. A., 2011: The terrestrial segment of soil moisture-climate coupling. Geophys. Res. Lett., 38, L16702, https://doi.org/10.1029/2011GL048268.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Dirmeyer, P. A., and Coauthors, 2016: Confronting weather and climate models with observational data from soil moisture networks over the United States. J. Hydrometeor., 17, 10491067, https://doi.org/10.1175/JHM-D-15-0196.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Dirmeyer, P. A., P. Gentine, M. B. Ek, and G. Balsamo, 2019: Land surface processes relevant to sub-seasonal to seasonal (S2S) prediction. Sub-Seasonal to Seasonal Prediction, Elsevier, 165181, https://doi.org/10.1016/B978-0-12-811714-9.00008-5.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Dorigo, W., and Coauthors, 2017: ESA CCI Soil Moisture for improved Earth system understanding: State-of-the art and future directions. Remote Sens. Environ., 203, 185215, https://doi.org/10.1016/j.rse.2017.07.001.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Findell, K. L., P. Gentine, B. R. Lintner, and B. P. Guillod, 2015: Data length requirements for observational estimates of land-atmosphere coupling strength. J. Hydrometeor., 16, 16151635, https://doi.org/10.1175/JHM-D-14-0131.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Gerken, T., and Coauthors, 2019: Robust observations of land-to-atmosphere feedbacks using the information flows of FLUXNET. npj Climate Atmos. Sci., 2, 37, https://doi.org/10.1038/s41612-019-0094-4.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Goodwell, A. E., P. Jiang, B. L. Ruddell, and P. Kumar, 2020: Debates—Does information theory provide a new paradigm for Earth science? Causality, interaction, and feedback. Water Resour. Res., 56, e2019WR024940, https://doi.org/10.1029/2019WR024940.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Guo, Z., and Coauthors, 2006: GLACE: The Global Land–Atmosphere Coupling Experiment. Part II: Analysis. J. Hydrometeor., 7, 611625, https://doi.org/10.1175/JHM511.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Haghighi, E., D. J. S. Gianotti, R. Akbar, G. D. Salvucci, and D. Entekhabi, 2018: Soil and atmospheric controls on the land surface energy balance: A generalized framework for distinguishing moisture-limited and energy-limited evaporation regimes. Water Resour. Res., 54, 18311851, https://doi.org/10.1002/2017WR021729.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Haughton, N., and Coauthors, 2016: The plumbing of land surface models: Is poor performance a result of methodology or data quality? J. Hydrometeor., 17, 17051723, https://doi.org/10.1175/JHM-D-15-0171.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Hersbach, H., and Coauthors, 2020: The ERA5 global reanalysis. Quart. J. Roy. Meteor. Soc., 146, 19992049, https://doi.org/10.1002/qj.3803.

  • Hsu, H., and P. A. Dirmeyer, 2021: Nonlinearity and multivariate dependencies in the terrestrial leg of land-atmosphere coupling. Water Resour. Res., 57, e2020WR028179, https://doi.org/10.1029/2020WR028179.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Jacquette, E., A. Al Bitar, A. Mialon, Y. Kerr, A. Quesney, F. Cabot, and P. Richaume, 2010: SMOS CATDS level 3 global products over land. Proc. SPIE, 7824, 78240K, https://doi.org/10.1117/12.865093.

    • Search Google Scholar
    • Export Citation
  • Jung, M., and Coauthors, 2019: The FLUXCOM ensemble of global land-atmosphere energy fluxes. Sci. Data, 6, 74, https://doi.org/10.1038/s41597-019-0076-8.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Knuth, K. H., 2006: Optimal data-based binning for histograms. arXiv, physics/0605197, https://doi.org/10.48550/arXiv.physics/0605197.

  • Knuth, K. H., A. Gotera, C. T. Curry, K. A. Huyser, K. R. Wheeler, and W. B. Rossow, 2013: Revealing relationships among relevant climate variables with information theory. arXiv, 1311.4632, https://doi.org/10.48550/arXiv.1311.4632.

    • Search Google Scholar
    • Export Citation
  • Koster, R. D., and Coauthors, 2004: Regions of strong coupling between soil moisture and precipitation. Science, 305, 11381140, https://doi.org/10.1126/science.1100217.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Koster, R. D., and Coauthors, 2006: GLACE: The Global Land–Atmosphere Coupling Experiment. Part I: Overview. J. Hydrometeor., 7, 590610, https://doi.org/10.1175/JHM510.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Kumar, S., P. A. Dirmeyer, C. D. Peters-Lidard, R. Bindlish, and J. Bolten, 2018: Information theoretic evaluation of satellite soil moisture retrievals. Remote Sens. Environ., 204, 392400, https://doi.org/10.1016/j.rse.2017.10.016.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lawrence, D. M., P. E. Thornton, K. W. Oleson, and G. B. Bonan, 2007: The partitioning of evapotranspiration into transpiration, soil evaporation, and canopy evaporation in a GCM: Impacts on land-atmosphere interaction. J. Hydrometeor., 8, 862880, https://doi.org/10.1175/JHM596.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Li, L., and Coauthors, 2020: A causal inference model based on random forests to identify the effect of soil moisture on precipitation. J. Hydrometeor., 21, 11151131, https://doi.org/10.1175/JHM-D-19-0209.1.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Liu, X., F. Chen, M. Barlage, and D. Niyogi, 2020: Implementing dynamic rooting depth for improved simulation of soil moisture and land surface feedbacks in Noah-MP-crop. J. Adv. Model. Earth Syst., 12, e2019MS001786, https://doi.org/10.1029/2019MS001786.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Lou, W., P. Liu, L. Cheng, and Z. Li, 2022: Identification of soil moisture–precipitation feedback based on temporal information partitioning networks. J. Amer. Water Resour. Assoc., https://doi.org/10.1111/1752-1688.12978, in press.

    • Search Google Scholar
    • Export Citation
  • Marschinski, R., and H. Kantz, 2002: Analysing the information flow between financial time series. Eur. Phys. J., 30B, 275281, https://doi.org/10.1140/epjb/e2002-00379-2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Martens, B., and Coauthors, 2017: GLEAM v3: Satellite-based land evaporation and root-zone soil moisture. Geosci. Model Dev., 10, 19031925, https://doi.org/10.5194/gmd-10-1903-2017.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Martens, B., D. L. Schumacher, H. Wouters, J. Muñoz-Sabater, N. E. C. Verhoest, and D. G. Miralles, 2020: Evaluating the land-surface energy partitioning in ERA5. Geosci. Model Dev., 13, 41594181, https://doi.org/10.5194/gmd-13-4159-2020.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Nogueira, M., C. Albergel, S. Boussetta, F. Johanssen, and E. Dutra, 2020: On the added value of improving the spatial representation and seasonal variations of vegetation cover in land surface models for simulated land surface temperature. EGU General Assembly 2020, Online, European Geophysical Union, EGU2020-18110, https://doi.org/10.5194/egusphere-egu2020-18110.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Pitman, A. J., 2003: The evolution of, and revolution in, land surface schemes designed for climate models. Int. J. Climatol., 23, 479510, https://doi.org/10.1002/joc.893.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Robock, A., K. Y. Vinnikov, C. A. Schlosser, N. A. Speranskaya, and Y. Xue, 1995: Use of midlatitude soil moisture and meteorological observations to validate soil moisture simulations with biosphere and bucket models. J. Climate, 8, 1535, https://doi.org/10.1175/1520-0442(1995)008<0015:UOMSMA>2.0.CO;2.

    • Crossref
    • Search Google Scholar
    • Export Citation
  • Ruddell, B. L., and P. Kumar, 2009: Ecohydrologic process networks: 1. Identification. Water Resour. Res., 45, W03419, https://doi.org/10.1029/2008WR007279.

  • Ruddell, B. L., N. A. Brunsell, and P. Stoy, 2013: Applying information theory in the geosciences to quantify process uncertainty, feedback, scale. Eos, Trans. Amer. Geophys. Union, 94, 56, https://doi.org/10.1002/2013EO050007.

  • Salvucci, G. D., J. A. Saleem, and R. Kaufmann, 2002: Investigating soil moisture feedbacks on precipitation with tests of Granger causality. Adv. Water Resour., 25, 1305–1312, https://doi.org/10.1016/S0309-1708(02)00057-X.

  • Santanello, J. A., J. Roundy, and P. A. Dirmeyer, 2015: Quantifying the land-atmosphere coupling behavior in modern reanalysis products over the U.S. Southern Great Plains. J. Climate, 28, 5813–5829, https://doi.org/10.1175/JCLI-D-14-00680.1.

  • Schwingshackl, C., M. Hirschi, and S. I. Seneviratne, 2017: Quantifying spatiotemporal variations of soil moisture control on surface energy balance and near-surface air temperature. J. Climate, 30, 7105–7124, https://doi.org/10.1175/JCLI-D-16-0727.1.

  • Shannon, C. E., 1948: A mathematical theory of communication. Bell Syst. Tech. J., 27, 379–423, https://doi.org/10.1002/j.1538-7305.1948.tb01338.x.

  • Short Gianotti, D. J., A. J. Rigden, G. D. Salvucci, and D. Entekhabi, 2019: Satellite and station observations demonstrate water availability’s effect on continental-scale evaporative and photosynthetic land surface dynamics. Water Resour. Res., 55, 540–554, https://doi.org/10.1029/2018WR023726.

  • Stevens, D., P. M. A. Miranda, R. Orth, S. Boussetta, G. Balsamo, and E. Dutra, 2020: Sensitivity of surface fluxes in the ECMWF land surface model to the remotely sensed leaf area index and root distribution: Evaluation with tower flux data. Atmosphere, 11, 1362, https://doi.org/10.3390/atmos11121362.

  • Tawia Hagan, D. F., G. Wang, X. San Liang, and H. A. J. Dolman, 2019: A time-varying causality formalism based on the Liang–Kleeman information flow for analyzing directed interactions in nonstationary climate systems. J. Climate, 32, 7521–7537, https://doi.org/10.1175/JCLI-D-18-0881.1.

  • Trigo, I. F., S. Boussetta, P. Viterbo, G. Balsamo, A. Beljaars, and I. Sandu, 2015: Comparison of model land skin temperature with remotely sensed estimates and assessment of surface-atmosphere coupling. J. Geophys. Res. Atmos., 120, 12 096–12 111, https://doi.org/10.1002/2015JD023812.

  • Ukkola, A. M., M. G. de Kauwe, A. J. Pitman, M. J. Best, G. Abramowitz, V. Haverd, M. Decker, and N. Haughton, 2016: Land surface models systematically overestimate the intensity, duration and magnitude of seasonal-scale evaporative droughts. Environ. Res. Lett., 11, 104012, https://doi.org/10.1088/1748-9326/11/10/104012.

  • Vinnikov, K. Y., and I. B. Yeserkepova, 1991: Soil moisture: Empirical data and model results. J. Climate, 4, 66–79, https://doi.org/10.1175/1520-0442(1991)004<0066:SMEDAM>2.0.CO;2.

  • Vitart, F., 2014: Evolution of ECMWF sub-seasonal forecast skill scores. Quart. J. Roy. Meteor. Soc., 140, 1889–1899, https://doi.org/10.1002/qj.2256.

  • Vitart, F., and Coauthors, 2017: The Subseasonal to Seasonal (S2S) Prediction project database. Bull. Amer. Meteor. Soc., 98, 163–173, https://doi.org/10.1175/BAMS-D-16-0017.1.
