Causality-Structured Deep Learning for Soil Moisture Predictions

Lu Li (a), Yongjiu Dai (a), Wei Shangguan (a), Zhongwang Wei (a), Nan Wei (a), and Qingliang Li (b)

(a) Southern Marine Science and Engineering Guangdong Laboratory (Zhuhai), Guangdong Province Key Laboratory for Climate Change and Natural Disaster Studies, School of Atmospheric Sciences, Sun Yat-sen University, Guangzhou, Guangdong, China
(b) College of Computer Science and Technology, Changchun Normal University, Changchun, China

Abstract

The accurate prediction of surface soil moisture (SM) is crucial for understanding hydrological processes. Deep learning (DL) models such as the long short-term memory model (LSTM) provide a powerful method and have been widely used in SM prediction. However, few studies have achieved notably high success rates, because such models lack prior knowledge in forms such as causality. Here we present a new causality-structure-based LSTM model (CLSTM), which can learn both time interdependency and causality information for hydrometeorological applications. We applied and compared the LSTM and CLSTM methods for forecasting SM across 64 FLUXNET sites globally. The results showed that CLSTM dramatically increased predictive performance compared with LSTM. According to the Nash–Sutcliffe efficiency (NSE), more than 67% of sites showed an improvement in SM simulation larger than 10%. CLSTM also generalized much better and adapted to extreme soil conditions, such as the SM response to drought and precipitation events. By incorporating causal relations, CLSTM increased predictive ability across different lead times compared to LSTM. We also highlighted the critical role of physical information, in the form of causality structure, in improving drought prediction. Finally, CLSTM has the potential to improve predictions of other hydrometeorological variables.

© 2022 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Yongjiu Dai, daiyj6@mail.sysu.edu.cn


1. Introduction

Surface soil moisture (SM) controls the exchange of energy and water between the land surface and the atmosphere. Therefore, SM is critical for various environmental research fields, for example, weather forecasting (Koster 2004), identifying the trend of global climate variability (Seneviratne et al. 2013), and agricultural drought monitoring (Narasimhan and Srinivasan 2005). Accurate prediction of SM is important for many applications, such as drought monitoring and irrigation planning (Fang and Shen 2020).

Two main types of models are commonly used to forecast SM: process-based (PB) and machine learning (ML) models (Besaw et al. 2010; Tongal and Booij 2018; Lee et al. 2020; L. Li et al. 2022). PB models predict SM based on physical processes, such as runoff and infiltration. However, PB models still have limitations for SM prediction. First, the Richards equation used to calculate SM is highly nonlinear, so its numerical solution often suffers from strong discontinuity and instability. Second, the soil properties and soil parameters that are critical for SM forecasting have strong spatial heterogeneity, and the accuracy of these data needs to be improved. Third, other physical processes that affect soil moisture, such as surface water movement, groundwater movement, and the soil–vegetation–atmosphere coupling process, are not well characterized, which brings uncertainty to the model. ML models directly build a relationship between input features and SM, avoiding the need to describe complex physical processes. With traditional ML models (e.g., random forest), some studies have achieved satisfactory results for SM prediction (see the review by Ali et al. 2015). Compared to traditional ML models, deep learning (DL) models significantly increase the ability to process big data, resulting in better performance (LeCun et al. 2015). Among DL models, the recurrent neural network (RNN) is well suited to time series prediction. However, an RNN cannot learn long-term time persistence, which makes it hard to give precise predictions over longer temporal scales (e.g., weekly) (Hochreiter and Schmidhuber 1997). The long short-term memory model (LSTM), an effective RNN variant, uses three different gates to transfer long- and short-term memory and thereby addresses the long-term dependency problem (Hochreiter and Schmidhuber 1997). LSTM shows excellent predictive power for forecasting SM and is widely used for different hydrologic time series predictions (Read et al. 2019; Gauch et al. 2021; Li et al. 2021). For example, Fang et al. (2017) used LSTM to extend Soil Moisture Active Passive (SMAP) SM data spatiotemporally based on climatic forcing and static physiographic attributes and showed that LSTM outperformed traditional ML models. They also argued that, for time series prediction, LSTM has an advantage over other DL models [e.g., deep neural network (DNN)] in learning time dependencies, which originates from its recurrent nature. Fang et al. (2019) further indicated that long-term LSTM predictions trained on SMAP data could capture long-term trends of in situ data and mostly retain the quality of SMAP. Adeyemi et al. (2018) used LSTM to model temporal soil moisture fluxes. Their models were trained to generate a 1-day-ahead prediction of the volumetric soil moisture content based on past soil moisture, precipitation, and climatic measurements, and achieved an R² of about 0.94. Furthermore, Abbes et al. (2019) proposed an LSTM-based methodology to estimate in situ SM. The input data included the brightness temperature from SMAP, the Moderate Resolution Imaging Spectroradiometer Vegetation Water Content (MODIS-VWC), and the soil temperature, and LSTM estimated the SM values with good accuracy. O and Orth (2021) used the LSTM model to extrapolate daily soil moisture based on in situ data collected from more than 1000 stations over the globe and generated the “SoMo.ml” dataset. Hence, the outstanding performance of LSTM in SM simulation is evident.

However, Zhao et al. (2019) pointed out that pure DL models perform poorly outside their calibration range (i.e., they have poor generalization ability), for example when forecasting extremes. Zhao et al. (2019) also emphasized that pure DL models always perform well during training but perform dramatically worse when applied to independent test data. These disadvantages are to a certain extent caused by the lack of prior knowledge in pure DL models, such as physical laws and processes (e.g., energy balance) (Zhao et al. 2019; Read et al. 2019) and causal relations (Runge et al. 2015; Lehmann et al. 2020). Previous studies have incorporated physical prior knowledge into pure DL models in two different ways. First, some studies constrained DL models by introducing physical laws and processes. For example, Read et al. (2019) developed a process-guided DL (PGDL) model based on the law of conservation of energy for lake water temperature prediction, which performed better than PB and pure DL models. Zhao et al. (2019) proposed a hybrid DL model that conserves the surface energy balance for evapotranspiration prediction, indicating that the hybrid DL model extrapolated better than pure DL models over 82 FLUXNET sites. Second, some studies used pure data mining methods (causal discovery algorithms) to find underlying causal relations and used the causality information to improve prediction. For example, Kretschmer et al. (2017) proposed a response-guided causal precursor detection method (RG-CPD) to extract causal precursors, which improved stratospheric polar vortex prediction based on linear regression. Di Capua et al. (2019) further showed that incorporating causal relations (identified by RG-CPD) into an artificial neural network (ANN) could improve the prediction of Indian summer monsoon rainfall at long forecast time scales (4–5 months). However, these studies mainly focused on selecting causal drivers as the inputs of DL models rather than modifying the internal network structure of DL models according to causal relations. To address this problem, Zhang et al. (2019) were the first to use causality information to adjust the LSTM network, by adding neighborhood gates, and used the proposed network for short-term wind speed forecasting. The model was validated at one station, which confirmed its improvement over LSTM. However, the applicability of causality-structured DL models for SM forecasting under different soil conditions over the globe has not been investigated.

Causal discovery methods have recently drawn significant attention in hydrometeorological research as a way to infer causal relations from observation data (e.g., Tuttle and Salvucci 2016; Li et al. 2020). Various methods have been developed, falling mainly into four categories: Granger causality (GC), graph-based algorithms (e.g., the PC algorithm), convergent cross mapping (CCM), and structural causal models (SCM). These methods are well documented in review studies for climate research (Runge et al. 2019a) and hydrometeorological research (Ombadi et al. 2020). Among these methods, GC is the first practical method to test for causality and is widely used in hydrometeorological research because of its simplicity and good performance in testing causal interactions (see the appendix for a detailed definition). For example, Tuttle and Salvucci (2016) used a linear GC framework to identify the pattern and strength of local soil moisture–precipitation (SM–P) feedback. On this basis, Li et al. (2020) further developed a novel nonlinear GC framework to infer the SM–P feedback. These applications demonstrated the ability of GC to detect causal relations in hydrometeorological research.

In our study, we provide an alternative way to use causal relations to improve hydrometeorological prediction based on LSTM and GC (rather than only using causal discovery methods for feature selection). We generated a causality structure (a structure that contains the causal relations among different variables; see section 3) derived from pure data mining methods (correlation and GC tests). We then proposed a DL model, namely, a causality-structured LSTM model (CLSTM), whose forward propagation follows the causality structure, to improve forecasting at both short and medium forecast time scales (7–15 days), especially the ability to forecast extremes. To the best of our knowledge, this is the first attempt to investigate a causality-structured DL model for SM prediction over the globe.

The contributions of this paper are primarily threefold: 1) we propose a causality-structured DL model that achieves more accurate predictions and is amenable to different soil conditions; 2) we validate the improvements of CLSTM over LSTM in different aspects (general performance, generalization ability, and ability to predict extremes) at FLUXNET sites over the globe; and 3) we investigate the applicability and effectiveness of CLSTM from short-term to medium extended-range forecasting and discuss the impact of physically constrained prior knowledge on model performance.

The remainder of the paper is organized as follows. Section 2 describes the data from the 64 FLUXNET sites used in the study and the corresponding quality-control processes. Section 3 describes the proposed CLSTM model. Section 4 presents the experimental results with a thorough discussion, and section 5 concludes the paper.

2. Data

We used the daily dataset from FLUXNET2015 (Pastorello et al. 2020). The following quality-control procedures, based on Liu et al. (2011), were applied so that robust DL models could be built.

  1. We labeled as missing values both unrealistic data (such as discontinuities and unchanged values in the time series that physical processes could not explain) and insufficient observations (those with an SM quality flag value less than 1, where the flag is a fraction between 0 and 1 defined as the proportion of measured and good-quality gap-filled data). Such data are most likely due to changes in sensor installation and calibration (mainly at the beginning of the time series).

  2. We discarded sites with a data length of fewer than 3 years, and we excluded sites where more than 10% of the data were missing.

These requirements led to a reduction in the number of sites from 165 to 64.
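The site-screening rule in step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `keep_site` is hypothetical, "3 years" is approximated as 3 × 365 daily records, and a record counts as missing when it is `None` or NaN.

```python
from math import isnan

def keep_site(sm_series, min_days=3 * 365, max_missing_fraction=0.10):
    """Return True if a daily SM record passes both screening rules.

    Assumptions (not from the paper): 3 years = 1095 days, and missing
    values are encoded as None or NaN.
    """
    n_days = len(sm_series)
    if n_days < min_days:
        return False  # record shorter than 3 years
    n_missing = sum(1 for v in sm_series if v is None or isnan(v))
    return n_missing / n_days <= max_missing_fraction
```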

Figure S1 in the online supplemental material shows the spatial distribution of FLUXNET sites used in our study and the number of days of training data available per site. These sites are distributed globally and are mainly located in North America (15 sites), Europe (33 sites), and Australia (12 sites). The average temporal coverage of all sites is ∼8 years. The study sites cover 11 plant functional types over the globe (Table S1). Mean annual temperature, mean annual precipitation, and elevation of these sites range from −1° to 27.25°C, from 320 to 2043.77 mm, and from 6 to 3197 m, respectively.

Our study used surface SM observations as the target variable from 64 FLUXNET sites (https://fluxnet.org/data/fluxnet2015-dataset/). Table S2 shows the 13 variables used to forecast surface SM, including meteorological variables [i.e., air temperature, longwave radiation, shortwave radiation, pressure, P, wind speed, vapor pressure deficit (VPD), carbon dioxide (CO2)], energy variables (i.e., soil heat flux, latent heat flux, sensible heat flux), and land surface variables (surface soil temperature and SM). Before training the models, we gap-filled the input feature datasets by linear interpolation and normalized all feature data to speed up convergence (Grus 2019). The normalization is calculated as
$$f'_t = \frac{f_t - f_{\min}}{f_{\max} - f_{\min}}, \tag{1}$$
where $t$ is the time step, $f_t$ is the feature value at time step $t$, and $f'_t$ is the normalized value at time step $t$. The variables $f_{\max}$ and $f_{\min}$ denote the maximum and minimum values of the feature time series, respectively.
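As a minimal sketch of this min–max normalization, the following Python function rescales one feature time series to the range [0, 1]. The function name and the handling of a constant series (mapped to zeros to avoid division by zero) are assumptions made for illustration and are not taken from the paper.

```python
def min_max_normalize(values):
    """Rescale a feature time series to [0, 1] using its own min and max."""
    f_min, f_max = min(values), max(values)
    span = f_max - f_min
    if span == 0:
        # Assumption: a constant series carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(v - f_min) / span for v in values]

print(min_max_normalize([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```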

3. Model

Figure 1 shows the flowchart of the causality-structured LSTM (CLSTM). The model consists of the following two steps: 1) generate the causal structure (a structure that contains the causal relations among different variables; denoted as G in Fig. 1) from the input features (denoted as X in Fig. 1) based on four linear and nonlinear correlation and GC tests and 2) forecast SM with CLSTM based on the causal structure and the input features. Each step is detailed below.

Fig. 1.

The flowchart of the causality-structured long short-term memory model (CLSTM). The causality structure (G) is derived in step 1. The CLSTM cell in step 2 propagates forward according to the causality structure and is used to predict soil moisture. A root node is a node with no parent node, and a leaf node is a node with no child node. The mathematical notation is explained in the figure. Notably, the G shown in the figure is only an example, not a real causality structure from our study.

Citation: Journal of Hydrometeorology 23, 8; 10.1175/JHM-D-21-0206.1

a. Generate causality structure

A simple adjacency matrix (a square matrix used to represent a finite graph; each element denotes whether a pair of vertices is adjacent in the graph, 1 for adjacent and 0 for not) for the input features is generated based on the following four correlation and causality tests: a linear correlation test based on the Pearson correlation coefficient (PCC), linear Granger causality (LGC) (Granger 1969), and nonlinear Granger causality (NLGC) (Li et al. 2020). Details of these tests are given in the appendix. The causality structure is then built according to the adjacency matrix. The process includes the following three steps:

  1. The adjacency matrix is denoted as $G$. For any two input features (denoted as $i$ and $j$), $G_{i,j} = 1$ means that $i$ is a causal driver of $j$. We first calculate the PCC value. If the value is larger than 0.7 (we tuned this parameter and found 0.7 to be the best; see further discussion in section 4f) and passes the significance test, features $i$ and $j$ are significantly correlated, and we set $G_{i,j} = 1$ and $G_{j,i} = 1$; otherwise we set $G_{i,j} = 0$ and $G_{j,i} = 0$.

  2. Correlation does not imply causality: autocorrelation, common drivers, and indirect links may all result in spurious correlation (Kretschmer et al. 2017). We therefore perform both LGC and NLGC tests to infer linear and nonlinear causal relations for each pair of variables with significant correlation (i.e., a correlation value larger than 0.7 that passes the correlation significance test). If both the LGC and NLGC tests are passed, the features have a significant Granger causal relation and $G_{i,j}$ remains 1; otherwise, we set $G_{i,j} = 0$.

  3. The previous two steps give the adjacency matrix $G$. We transform it into a simple causality structure to improve the subsequent prediction. Figure S2 shows the procedure for the transformation, which includes three steps. First, we select SM as the leaf node (see step 1 in Fig. S2). Second, we select the causal drivers of SM (i.e., the blue rectangles in the adjacency matrix in Fig. S2) as the parent nodes of the leaf node (see step 2 in Fig. S2). Third, we repeat the second step for each variable, i.e., we find parent nodes for each causal driver identified in step 2 (see step 3 in Fig. S2).

To avoid overfitting of the DL model caused by a complex causal structure, we set the depth of the causal structure to three levels (the performance of different causality-structure depths is discussed in section 4f). Notably, if there are no significant causal relations between SM and other features, we set the variable with the most significant correlation as the causal driver of SM to ensure that the depth of the causality structure is equal to 3.
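The three steps above can be sketched in plain Python as follows. This is an illustrative outline, not the authors' implementation. The assumptions are: the Granger tests are supplied by the caller as a hypothetical callable `granger_test(cause, effect)` that returns `True` when both the LGC and NLGC tests pass; the significance test of the correlation is omitted; the absolute value of the PCC is compared with the threshold; and each variable is placed in the structure only once so that the graph stays acyclic.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def build_adjacency(features, granger_test, threshold=0.7):
    """G[i][j] = 1 means feature i is treated as a causal driver of feature j."""
    names = list(features)
    G = {i: {j: 0 for j in names} for i in names}
    for i in names:
        for j in names:
            if i == j:
                continue
            # Step 1: correlation screen (significance test omitted here).
            if abs(pearson(features[i], features[j])) <= threshold:
                continue
            # Step 2: keep the directed link only if the causality test passes.
            if granger_test(features[i], features[j]):
                G[i][j] = 1
    return G

def build_structure(G, features, target, depth=3):
    """Step 3: return {node: [parent nodes]}, grown upward from the leaf node."""
    parents = {}
    level, seen = [target], {target}
    for _ in range(depth - 1):
        next_level = []
        for node in level:
            # Assumption: a variable already in the structure is not reused.
            drivers = [i for i in G if G[i][node] == 1 and i not in seen]
            if node == target and not drivers:
                # Fallback described in the text: most correlated variable.
                others = [i for i in G if i != target]
                best = max(others, key=lambda i: abs(pearson(features[i], features[target])))
                drivers = [best]
            parents[node] = drivers
            seen.update(drivers)
            next_level.extend(drivers)
        level = next_level
    for node in level:
        parents.setdefault(node, [])  # nodes on the top level are root nodes
    return parents
```

In practice the `granger_test` callable could wrap any linear or nonlinear Granger test implementation; the sketch deliberately leaves that choice open.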

b. Causality-structured LSTM

Figure 2 shows the general form of the internal structure of a single node (denoted as $i$) in the CLSTM block (denoted as “Causal LSTM” in Fig. 1). For SM prediction based on LSTM, previous studies usually applied the hidden state to a fully connected (FC) layer to make the SM prediction (Fang and Shen 2020; Ouyang et al. 2021). To incorporate causality information into LSTM, for each node $i$ we introduce a new state (named the causality state, denoted as $\tilde{C}_i$) that constrains the hidden state of LSTM (denoted as $H_i$) according to the parent nodes of node $i$ in the causality structure. In this way, the causal relations for node $i$ (i.e., $\tilde{C}_i$) constrain the hidden state of node $i$ generated by the original LSTM (i.e., $H_i$). To make predictions, we apply the causality state of the leaf node (i.e., SM in our study), rather than the hidden state, to an FC layer to forecast SM, because the causality state contains both the time interdependency and the causality information. CLSTM learns time interdependency [Eqs. (2)–(6), which are the same as the calculation process of LSTM] and causality information [Eqs. (7)–(11)] for each node in the causality structure as follows.

Fig. 2.

The brief calculation structure of a single node in the causality structure of CLSTM. For node $i$, the inputs are the hidden and cell states from the last time step ($H_{i,t-1}$ and $C_{i,t-1}$), the input feature from the current time step ($X_{i,t}$), and the causality state of the parent nodes ($\tilde{C}_{P_i,t}$, set to 0 for a root node because it has no parent node) from the upper level of the causality structure. The neuron learns temporal features according to Eqs. (2)–(6) and generates the hidden and cell states of the current time step ($H_{i,t}$ and $C_{i,t}$). Then $\tilde{C}_{P_i,t}$ is used to combine and constrain $H_{i,t}$ using Eqs. (7)–(11), giving the causality state of the current time step ($\tilde{C}_{i,t}$).


For each node $i$ at time step $t$, the input includes the input features ($X_{i,t}$), the hidden and cell states from the last time step ($H_{i,t-1}$ and $C_{i,t-1}$), and the causality state from the parent nodes ($\tilde{C}_{P_i,t}$) in the upper level of the causality structure. Here $P_i$ denotes the set of indices of the parent nodes of node $i$.

First, we use Eqs. (2)–(6) to generate the hidden and cell states of time step $t$ ($H_{i,t}$ and $C_{i,t}$), which is the same as the calculation process of the LSTM model (Hochreiter and Schmidhuber 1997):

input gate,
$$I_{i,t} = \sigma(W_{i,I,X} X_{i,t} + W_{i,I,H} H_{i,t-1} + b_{i,I}); \tag{2}$$
forget gate,
$$F_{i,t} = \sigma(W_{i,F,X} X_{i,t} + W_{i,F,H} H_{i,t-1} + b_{i,F}); \tag{3}$$
current cell state,
$$C_{i,t} = F_{i,t} \odot C_{i,t-1} + I_{i,t} \odot \tanh(W_{i,C,X} X_{i,t} + W_{i,C,H} H_{i,t-1} + b_{i,C}); \tag{4}$$
output gate,
$$O_{i,t} = \sigma(W_{i,O,X} X_{i,t} + W_{i,O,H} H_{i,t-1} + b_{i,O}); \tag{5}$$
current hidden state,
$$H_{i,t} = O_{i,t} \odot \tanh(C_{i,t}). \tag{6}$$

Second, we use the causality state from the parent nodes ($\tilde{C}_{P_i,t}$) to combine and constrain the hidden state ($H_{i,t}$) by Eqs. (7)–(11) and get the causality state of the current time step for node $i$ ($\tilde{C}_{i,t}$). This includes three steps:

  1. For each node (or index) $j$ in the set of parent nodes ($P_i$), we calculate the corresponding weight for the causality state of node $j$ ($\beta_{j,t}$) from the hidden state ($H_{i,t}$) and the causality state of node $j$ ($\tilde{C}_{j,t}$):

$$\beta_{j,t} = \sigma(W_{j,\beta,\tilde{C}} \tilde{C}_{j,t} + W_{j,\beta,H} H_{i,t} + b_{j,\beta}). \tag{7}$$

  2. We combine the information of the causality states from all parent nodes of node $i$ using the weights generated in the previous step ($\beta_{j,t}$):

$$L_{i,t} = \sum_{j \in P_i} \beta_{j,t} \odot \tilde{C}_{j,t}. \tag{8}$$

  3. We calculate the weights for the combined causality information from the parent nodes ($L_{i,t}$) [Eq. (9)] and for the current hidden state ($H_{i,t}$) [Eq. (10)] and generate the causality state of node $i$ ($\tilde{C}_{i,t}$) by weighted summation [Eq. (11)]:

$$\gamma_{i,t} = \sigma(W_{i,\gamma,L} L_{i,t} + W_{i,\gamma,H} H_{i,t} + b_{i,\gamma}), \tag{9}$$
$$\delta_{i,t} = \sigma(W_{i,\delta,L} L_{i,t} + W_{i,\delta,H} H_{i,t} + b_{i,\delta}), \tag{10}$$
$$\tilde{C}_{i,t} = \gamma_{i,t} \odot L_{i,t} + \delta_{i,t} \odot H_{i,t}, \tag{11}$$

where $W_*$ are weights, $b_*$ are biases, $\sigma$ is the sigmoid function, and $\tanh$ is the hyperbolic tangent function. The symbol $\odot$ denotes pointwise multiplication. Notably, Eqs. (7)–(11) are not applied to a root node (a node that does not have any parent node, e.g., nodes 1, 2, 4, and 5 in Fig. 1). The causality state ($\tilde{C}_{i,t}$) of a root node is equal to its hidden state ($H_{i,t}$), because a root node has no information from causal drivers to constrain the hidden state. The SM value is predicted by applying the causality state of the leaf node (e.g., node 6 in Fig. 1) in the causality structure to an FC layer (the number of neurons is 1, and the activation function is tanh).
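To make the data flow of a single node concrete, the following pure-Python sketch performs one forward step: the standard LSTM gates first, then the weighting and combination of the parent causality states. It is a simplified illustration, not the authors' code. The assumptions are: all nodes share one hidden size, one set of β weights is shared across the parents of a node (the equations index these weights by parent), and the parameter dictionary layout, the random initialization, and all function names are hypothetical.

```python
import math
import random

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def add(*vectors):
    return [sum(vals) for vals in zip(*vectors)]

def mul(a, b):  # pointwise multiplication
    return [x * y for x, y in zip(a, b)]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]

def tanh(v):
    return [math.tanh(x) for x in v]

def affine(p, gate, first, x, h):
    """W_{gate,first} x + W_{gate,H} h + b_gate."""
    return add(matvec(p[gate + "_" + first], x), matvec(p[gate + "_H"], h), p[gate + "_b"])

def init_params(n_in, n_hid, seed=0):
    rng = random.Random(seed)
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    p = {}
    for gate, first, n_first in (("I", "X", n_in), ("F", "X", n_in), ("C", "X", n_in),
                                 ("O", "X", n_in), ("beta", "Ct", n_hid),
                                 ("gamma", "L", n_hid), ("delta", "L", n_hid)):
        p[gate + "_" + first] = mat(n_hid, n_first)
        p[gate + "_H"] = mat(n_hid, n_hid)
        p[gate + "_b"] = [0.0] * n_hid
    return p

def clstm_node_step(p, x, h_prev, c_prev, parent_states):
    """One time step for one node; returns (hidden, cell, causality state)."""
    i_gate = sigmoid(affine(p, "I", "X", x, h_prev))             # input gate
    f_gate = sigmoid(affine(p, "F", "X", x, h_prev))             # forget gate
    candidate = tanh(affine(p, "C", "X", x, h_prev))
    c = add(mul(f_gate, c_prev), mul(i_gate, candidate))         # cell state
    o_gate = sigmoid(affine(p, "O", "X", x, h_prev))             # output gate
    h = mul(o_gate, tanh(c))                                     # hidden state
    if not parent_states:
        return h, c, h  # root node: causality state equals hidden state
    combined = [0.0] * len(h)
    for c_tilde_j in parent_states:
        beta = sigmoid(affine(p, "beta", "Ct", c_tilde_j, h))    # weight of parent j
        combined = add(combined, mul(beta, c_tilde_j))           # weighted sum over parents
    gamma = sigmoid(affine(p, "gamma", "L", combined, h))
    delta = sigmoid(affine(p, "delta", "L", combined, h))
    c_tilde = add(mul(gamma, combined), mul(delta, h))           # causality state
    return h, c, c_tilde

# Usage: one root node feeding one leaf node, hidden size 4, one input feature each.
zeros = [0.0] * 4
root_h, _, root_state = clstm_node_step(init_params(1, 4, seed=1), [0.3], zeros, zeros, [])
leaf_h, leaf_c, leaf_state = clstm_node_step(init_params(1, 4, seed=2), [0.5], zeros, zeros, [root_state])
```

In a full model the leaf node's causality state would then be passed to the final FC layer; training (backpropagation through the structure) is outside the scope of this sketch and would normally be left to a DL framework.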

c. Model training and tuning

For training, we divided the data into training, validation, and test sets in proportions of 60%, 20%, and 20%, a split commonly applied for prediction (Shi et al. 2015). The data were split by time so that no data from the later test period were involved in training. DL models were trained on the training datasets, tuned on the validation datasets, and evaluated on the test datasets. We used feature data from the previous 7–17 days as input (the impact of the length of input features on soil moisture prediction is discussed in section 4f) and predicted SM 7 days ahead based on 13 input features (Table S2) (we also predicted soil moisture 1 and 15 days ahead; see section 4d). Notably, we trained a DL model for each site. O et al. (2020) indicated that training one model with a combination of all available data (i.e., diverse training datasets) could significantly improve the performance of LSTM. However, the aim of our study is not to achieve the best performance of DL models using all available data; rather, we want to identify the general applicability of CLSTM, especially over different soil conditions and different training data lengths. Establishing a DL model for each site has some limitations: the model is applicable only at gauged sites, and the computational cost could increase significantly. The selected variables differ among FLUXNET sites, but for the 1-, 7-, and 15-day forecasts at a given site we use the same causality structure, generated from the whole time series of the training period at that site. The causality structure therefore also differs among FLUXNET sites. Figure S3 shows three typical types of causality structure, i.e., multiple drivers of soil moisture, one causal driver of soil moisture with three levels, and one causal driver of soil moisture with two levels. Notably, we generate the causality structure from the training dataset only, which avoids including any information from the testing set.
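The chronological split and the pairing of input windows with lead-time targets can be sketched as below. The helper names are hypothetical, and the default input length of 7 days is just one value from the 7–17 day range mentioned above; the paper does not specify how windows near the split boundaries are handled, so the sketch simply builds samples over one contiguous series.

```python
def split_by_time(n_samples, train_frac=0.6, val_frac=0.2):
    """Return index ranges for a chronological train/validation/test split."""
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    return (range(0, n_train),
            range(n_train, n_train + n_val),
            range(n_train + n_val, n_samples))

def make_samples(features, target, input_len=7, lead=7):
    """Pair each window of `input_len` days with the target `lead` days
    after the last day of the window."""
    samples = []
    for end in range(input_len - 1, len(target) - lead):
        window = features[end - input_len + 1:end + 1]
        samples.append((window, target[end + lead]))
    return samples
```

Because the split is by index rather than by random shuffling, every test sample lies later in time than every training sample.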

For tuning, to compare the performance of LSTM and CLSTM fairly, we set the same structure (both have only one hidden layer and the same number of hidden cells) and tuned the hyperparameters (see Table S3) at each site. The main hyperparameters are 1) batch size, which is the number of samples put together to calculate the loss for one gradient descent step; 2) hidden size, which represents the learning capability of DL models; 3) epochs, which is the number of passes over the entire training dataset; and 4) learning rate, which controls the step size at each iteration of the gradient descent process. We trained the LSTM and CLSTM models with different hidden sizes (16, 32, 64, 128, 256) and different batch sizes (50, 100, 200) and selected the model with the minimal mean squared error (MSE) on the validation dataset. The learning rate was adaptively reduced by a multiplicative factor (set to 0.1) when the validation loss stopped improving. DL models were optimized for 50 epochs, and the model with the lowest validation loss was selected. We found that 50 epochs are enough for finding the best parameterization because the loss on the training and validation datasets does not decrease further after 50 epochs. Notably, we also tuned two hyperparameters of the causality structure (i.e., the correlation threshold value and the depth of the causality structure) through several experiments. We set the threshold value to 0.7 and the depth to 3 (see section 4f for further discussion). With this tuning process, we obtain the “best” LSTM and CLSTM models at each site (with the same model structure parameters) and can fairly compare the prediction skill of the two models. Backward propagation uses the Adam gradient descent method (Kingma and Ba 2015). The loss function of the DL models is set as the MSE between observation and prediction (detailed information is shown in Table S4).

d. Performance assessment criteria

We evaluated model performance on SM prediction with two widely used performance assessment criteria (Prasad et al. 2019; Fang et al. 2019), i.e., Nash–Sutcliffe efficiency (NSE) (Nash and Sutcliffe 1970) and root-mean-square error (RMSE). A detailed description of these two criteria is given in Table S4. NSE (which ranges from −∞ to 1) is a percentage error criterion that describes how much of the variation in the observations is explained by the model, with 1 indicating a perfect prediction. RMSE is a scaled error criterion that describes the differences between the observed and predicted time series. Smaller RMSE and larger NSE indicate better model performance. Moreover, to assess the ability of the models to predict drought events, the soil wetness deficit index (SWDI) (Martínez-Fernández et al. 2015) is used to estimate the number of extreme and severe drought events at each site. Extreme and severe drought events are defined as SWDI ≤ −10 and −10 < SWDI ≤ −5, respectively (Zhang et al. 2021). Furthermore, we define relative improvement metrics (RIM) based on the metrics used in Fang and Shen (2020). For example, the RIM of RMSE is defined as {[RMSE(Causal LSTM) − RMSE(LSTM)]/RMSE(LSTM)} × 100%. Based on the RIM of each metric, the model performances are categorized as “excellent” (RIM > 10%), “good” (0 < RIM < 10%), and “poor” (RIM < 0).
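The two criteria and the RIM categories can be sketched as follows. NSE and RMSE follow their standard definitions. The `rim` helper is an assumption made for illustration: it orients the sign so that a positive value always means that the candidate model improves on the baseline (higher NSE or lower RMSE), which is the reading under which the "excellent/good/poor" thresholds apply to both metrics. The boundary cases RIM = 10% and RIM = 0, which the text leaves unassigned, are mapped to "good" and "poor" here.

```python
from math import sqrt

def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den

def rmse(obs, pred):
    """Root-mean-square error between observed and predicted series."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def rim(baseline, candidate, higher_is_better):
    """Relative improvement in percent; positive means the candidate is better
    (sign convention assumed for this sketch)."""
    change = candidate - baseline if higher_is_better else baseline - candidate
    return change / abs(baseline) * 100.0

def category(rim_value):
    if rim_value > 10.0:
        return "excellent"
    if rim_value > 0.0:
        return "good"
    return "poor"
```

For instance, `category(rim(0.34, 0.52, higher_is_better=True))` and `category(rim(3.98, 3.47, higher_is_better=False))` both evaluate to "excellent"; the input numbers here are used purely as example values.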

4. Results and discussion

a. General performance of pure and causality-structured DL models

Table 1 shows the initial evaluation of the forecasting capability of the causality-structured DL model (CLSTM) and the pure DL model (LSTM) based on NSE and RMSE during the testing periods. CLSTM performs better than LSTM, yielding lower RMSE and larger NSE values. The median NSE increases from 0.34 to 0.52, while the median RMSE decreases from 3.98% to 3.47%, affirming that CLSTM significantly improves the weekly prediction of SM (from two different aspects: variance and difference). For further insight, the 25th and 75th percentiles, mean, maximum, minimum, and standard deviation of the two metrics over the 64 sites are also shown in Table 1. CLSTM performed better on all of these statistical indicators. For example, compared to LSTM, CLSTM increases the 75th-percentile NSE by 0.13% and decreases the 25th-percentile RMSE by 0.64%. Notably, the standard deviation of both metrics across all sites is smaller for CLSTM than for LSTM (e.g., for NSE it falls from 0.37 to 0.23), which suggests that the introduction of the causality structure increases model stability over different regions (e.g., different height, soil texture, plant cover).

Table 1

Details of the two performance metrics (NSE and RMSE) of the causality-structured long short-term memory model (CLSTM) and LSTM for the 7-day forecast.


The evidence above indicates that CLSTM outperforms LSTM overall. However, performance may vary with soil conditions, and the absolute value of RMSE alone cannot represent the improvement across regions (Prasad et al. 2019). Figure 3 shows the RIM (see section 3d) of the two metrics from LSTM to CLSTM. Following our classification, for NSE and RMSE, CLSTM is excellent at 38 and 46 sites, good at 19 and 12 sites, and poor at 7 and 6 sites, respectively, out of the 64 sites. Thirty-seven sites (57.8% of all sites) are excellent for both metrics, while only six sites are poor. Excellent and good sites are distributed globally, especially over Europe and the central and eastern United States. These outcomes show that constraining LSTM with a causality structure improves performance and stability over different soil conditions. We also observe some poor performance in Australia, Canada, the western coast of North America, South America, and Africa. We give two possible reasons for the poor performance of CLSTM over these regions: 1) The performance of DL models for SM prediction is significantly related to the temporal autocorrelation (TAC) of SM (L. Li et al. 2022). TAC represents soil memory, which is the most important factor in DL-based SM prediction. 2) The performance of DL models may be significantly influenced by the amount of training data, which contains the information on the temporal variation of soil moisture that can be learned. Furthermore, CLSTM improves less over some dry regions when compared to LSTM (Fig. S4 and Fig. 3). We think this may be because LSTM already performs well over these regions. Fang and Shen (2020, their Fig. 3) show the performance of soil moisture forecasts using LSTM. We also found that LSTM already performs very well over the western United States (the dry regions of the United States), which supports our supposition. LSTM captures the mean state of soil moisture very well, and soil moisture fluctuates less over dry regions. Therefore, LSTM already performs well there, and CLSTM can hardly extract more information from the data.
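The site classification used here can be sketched in a few lines. This is only an illustration: the exact RIM formula is given in section 3d, so the percentage form in `relative_improvement` is an assumption, and both function names are hypothetical. The thresholds (excellent above 10%, good between 0% and 10%, poor below 0%) follow the Fig. 3 caption.

```python
import numpy as np

def relative_improvement(lstm, clstm, higher_is_better=True):
    """Percentage improvement of CLSTM over LSTM for one metric.

    Assumed form (the paper's definition is in section 3d): the signed
    difference divided by the absolute LSTM score.
    """
    lstm = np.asarray(lstm, dtype=float)
    clstm = np.asarray(clstm, dtype=float)
    # For NSE a larger value is better; for RMSE a smaller value is better.
    diff = clstm - lstm if higher_is_better else lstm - clstm
    return 100.0 * diff / np.abs(lstm)

def classify(rim):
    """Label sites with the Fig. 3 thresholds.

    Values exactly at 10% or 0% are not covered by the caption; here they
    fall into the lower class.
    """
    rim = np.asarray(rim, dtype=float)
    return np.where(rim > 10.0, "excellent", np.where(rim > 0.0, "good", "poor"))
```

For example, `classify(relative_improvement(nse_lstm, nse_clstm))` would label each site for NSE, and passing `higher_is_better=False` handles RMSE.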

Fig. 3.

The relative improvement of two metrics at all sites: (a) NSE and (b) RMSE. Sites (44 sites) denoted with a star indicate that CLSTM significantly improves on LSTM at the 95% confidence level based on Student’s t test. Other sites (20 sites) are denoted with “X.” The insets show the number of sites for excellent (>10%), good (>0% and <10%), and poor (<0%) improvement by CLSTM. The inset is located over 38°–53°N, 5°W–20°E.

Citation: Journal of Hydrometeorology 23, 8; 10.1175/JHM-D-21-0206.1

We further validated the improvement under different soil memory (i.e., different soil conditions and temporal autocorrelation). Figure 4 shows the relationship between two performance metrics [NSE and normalized RMSE (NRMSE)] and TAC (calculated by the Pearson correlation; the lag window is set to 7 for weekly SM prediction). TAC is used to represent SM memory in our study. The Pearson correlation (R) of NSE with TAC is 0.59 and 0.84, and that of NRMSE with TAC is −0.25 and −0.24, for LSTM and CLSTM, respectively. Moreover, the slopes of the least squares fitting lines are positive (1.5, 1.4) for NSE and negative (−0.3, −0.3) for NRMSE, suggesting that SM memory is positively related to the DL model’s performance. SM memory (evaluated by TAC) represents the time interdependency information input to the DL model; therefore, TAC correlates positively with the performance of DL models. Under different SM memory (i.e., different soil conditions), CLSTM improves NSE and NRMSE significantly, especially in areas with relatively low TAC. In these regions, the NSE of LSTM is much lower than the average performance (see the intercept of the blue fitting line in Fig. 4a; the fitting lines are the least squares fits of performance over all stations), which indicates that LSTM cannot adequately learn the feature information at some sites compared to CLSTM. From LSTM to CLSTM, the R value of NSE with TAC increases from 0.59 to 0.84, and the NSE of CLSTM lies closer to the red fitting line, reflecting the ability of CLSTM to extract time dependency information and its applicability to different soil conditions (i.e., different soil memory).
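As a minimal sketch of how such a TAC value and fitting line could be computed, the snippet below takes the Pearson correlation between a soil moisture series and the same series shifted by 7 days, then fits a straight line of a skill score against TAC. The function names are hypothetical and the preprocessing used in the study (gap filling, the period used) is not reproduced here.

```python
import numpy as np

def lag_autocorrelation(sm, lag=7):
    """Pearson correlation between sm[t] and sm[t + lag]."""
    sm = np.asarray(sm, dtype=float)
    return np.corrcoef(sm[:-lag], sm[lag:])[0, 1]

def fit_line(tac, score):
    """Least squares line score = slope * tac + intercept over all sites."""
    slope, intercept = np.polyfit(tac, score, deg=1)
    return slope, intercept
```

Applied per site, `lag_autocorrelation` gives one TAC value, and `fit_line` over all sites gives slopes and intercepts of the kind compared in Fig. 4.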

Fig. 4.

The relationship between soil moisture memory (represented by temporal autocorrelation; the lag time is set to 7 days) and the models’ performance based on (a) NSE and (b) normalized RMSE. Blue and red lines are the least squares fitting lines of LSTM and CLSTM, respectively. The numbers are the indices of the FLUXNET sites.


b. Generalizations of pure and causality-structured DL models

The generalization ability of a DL model describes its capability to adapt properly to new, previously unseen data, such as data from different soil conditions and training records of different lengths, which is essential for predicting SM extremes. Minimizing the loss on the training dataset is not sufficient for learning and may lead to poor generalization ability (Neyshabur et al. 2017). Here, generalization ability is measured by the difference between the losses on the training and testing datasets. A large gap between training and testing loss is a clear hint that the model does not generalize well, while a smaller gap means better generalization ability. Figure 5 shows the loss values (i.e., MSE in our study) on the training and validation datasets of CLSTM (dotted lines, denoted as Ltrain_clstm, Ltest_clstm) and LSTM (solid lines, denoted as Ltrain_lstm, Ltest_lstm) during the whole training process from epoch 1 to 50. Notably, we only save the best model, which has the minimum loss on the validation dataset. Ltrain_lstm (light pink solid line) is usually lower than Ltest_lstm (dark pink solid line) by a large margin, which is a clear sign of overfitting and confirms the poor generalization of pure DL models. In contrast, Ltrain_clstm and Ltest_clstm (dotted lines) have a smaller gap than those of LSTM at nearly all 64 sites (with a few exceptions, e.g., AU-Cpr), which indicates that the causality-structure constraint leads to dramatic improvements in the generalization ability of LSTM. We also show the average training and testing loss over the 64 sites (Fig. S5), which further reinforces the result that CLSTM improves the generalization of LSTM. Notably, this measure has some limitations; for example, a model that always predicts random noise would have similar training and testing errors and would therefore appear to generalize well.
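The gap described above can be read off the loss curves directly. The sketch below is illustrative only: it assumes per-epoch MSE arrays for one model and one site, picks the epoch with the lowest held-out loss (mirroring the "save the best model" rule), and returns the held-out loss minus the training loss at that epoch. The function name is hypothetical.

```python
import numpy as np

def generalization_gap(train_loss, test_loss):
    """Return (best_epoch_index, test MSE minus train MSE at that epoch)."""
    train_loss = np.asarray(train_loss, dtype=float)
    test_loss = np.asarray(test_loss, dtype=float)
    # Epoch with the smallest held-out loss, i.e. the checkpoint that is kept.
    best = int(np.argmin(test_loss))
    return best, test_loss[best] - train_loss[best]
```

Comparing this value for LSTM and CLSTM at the same site gives a single number per model; a smaller value corresponds to the closer curves seen for CLSTM in Fig. 5.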

Fig. 5.

The loss value of training and testing datasets of LSTM (solid lines) and CLSTM (dotted lines) at all sites for a 7-day forecast. Training loss curves and testing loss curves of LSTM are denoted as light pink lines and dark pink lines, while training loss curves and testing loss curves of CLSTM are denoted as light blue lines and dark blue lines.


Furthermore, Fig. S6 shows the relationship between the RIM of NSE and the length of training data of CLSTM. Conventional statistical wisdom suggests that the size of the training data influences the performance of pure DL models. However, we found that CLSTM provides significant improvements (larger than 10%, denoted by the red line in Fig. S6) at many sites with a small training data size, suggesting that CLSTM performs better in some data-sparse regions.

c. Predictivity on extreme events of pure and causality-structured DL models

To evaluate the ability of DL models to predict the SM response to P events, we selected the five sites with the largest mean daily P (the spatial distribution of these sites is shown in Fig. S7). Figure 6 shows SM time series from CLSTM and LSTM for the five wettest sites and the corresponding P. An obvious wet bias appeared in LSTM’s prediction at the AU-Wac site, while CLSTM fit the observations very well. In wet periods, LSTM mostly tended to underestimate the response of SM to extreme P events, whereas CLSTM gave a more reasonable forecast of the magnitude of the SM response to P (green rectangles in Fig. 6). In dry periods, CLSTM showed a stable variation of SM, whereas LSTM had some unreasonable peaks (see red rectangles in Fig. 6), especially when P events happened. This phenomenon may indicate that LSTM is vulnerable to “noise” in the datasets, such as errors in the training data and slight fluctuations caused by environmental factors that do not influence the amount of soil moisture. LSTM hardly depicts accurate soil moisture fluctuations when such noise occurs, while CLSTM tends to give more reasonable estimates. We attribute this to two effects. First, generating the causality structure removes redundant variables through the correlation and causality tests (i.e., variables that have no obvious correlation with SM), which may remove some noise from the input features. Second, CLSTM constrains the output of LSTM using the causality structure in each neuron, which further corrects the amplitude and pattern of the SM response to P.

Fig. 6.

Surface soil moisture (testing period) of the observation (black line), CLSTM (red lines), and LSTM (blue lines) at the five wettest sites with the largest mean daily precipitation. Some wet periods (green rectangle) and dry periods (red rectangle) are annotated for analysis. The orange bars indicate precipitation. The latitude and longitude of five sites are listed as AU-Wac (37.4259°S, 145.1878°E), U.S.-Blo (38.8953°N, 120.6328°W), IT-Lav (45.9562°N, 11.2813°E), IT-Col (41.8494°N, 13.5881°E), and CH-Cha (47.2102°N, 8.4104°E).


To evaluate the ability of DL models to predict drought events, we selected the five sites with the most extreme drought events (the spatial distribution of these sites is shown in Fig. S7) based on SWDI (section 3d). Figure 7 shows SM time series from CLSTM and LSTM for the five driest sites and the corresponding P. LSTM tended to overestimate SM during severe and extreme drought events in most cases, which may lead to underestimating the severity and duration of drought events. LSTM also underestimated SM in some cases, whereas CLSTM usually gave better predictions in these cases (shown in the red and green rectangles in Fig. 7). CLSTM also tended to forecast more stable SM with less fluctuation than LSTM, which may add value for predicting long-term drought events.
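The drought labels can be illustrated with a short sketch. The SWDI form used below, ten times the departure of soil moisture from field capacity divided by the available water capacity, is the commonly used definition and is an assumption here; the definition applied in this study is given in section 3d. The classes follow Fig. 7 (severe between −10 and −5, extreme below −10). Function names and the field capacity and wilting point arguments are hypothetical.

```python
import numpy as np

def swdi(theta, theta_fc, theta_wp):
    """Assumed SWDI form: 10 * (theta - field capacity) / available water capacity."""
    theta = np.asarray(theta, dtype=float)
    return 10.0 * (theta - theta_fc) / (theta_fc - theta_wp)

def drought_class(index):
    """Label each value as 'extreme', 'severe', or 'other'.

    A value exactly equal to -10 is labeled 'severe' here; the caption
    does not specify that boundary.
    """
    index = np.asarray(index, dtype=float)
    return np.where(index < -10.0, "extreme",
                    np.where(index < -5.0, "severe", "other"))
```

Applying `drought_class(swdi(...))` to observed and predicted SM separately would show whether a model reproduces the observed drought class on a given day.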

Fig. 7.

Surface soil moisture (testing period) of the observation (black line), CLSTM (red lines), and LSTM (blue lines) at the five driest sites with the most extreme drought events by the soil wetness deficit index (SWDI, see Table S4). The severe drought events (−10 < SWDI < −5) and extreme drought events (SWDI < −10) are annotated as pink points and red points at the bottom of each panel, respectively. The red and green rectangles indicate that LSTM had significant overestimation and underestimation, respectively, while CLSTM can predict the extremes better. The latitude and longitude of five sites are listed as NL-Loo (52.1666°N, 5.7436°E), U.S.-WCr (45.8059°N, 90.0799°W), U.S.-MMS (39.3232°N, 86.4131°W), DE-Tha (50.9626°N, 13.5651°E), and BE-Vie (50.3049°N, 5.9981°E).


d. The application for different temporal scales

The previous results show that CLSTM significantly improves weekly SM prediction. We further evaluated the applicability of CLSTM on different time scales (1 and 15 days) at the same 64 FLUXNET sites. Consistent with the weekly results, NSE is larger and RMSE is lower for CLSTM than for LSTM (Fig. 8). As in Table 1, all statistical indicators are improved by introducing the causality structure in 1- and 15-day forecasts (see Tables S5 and S6). For instance, the median value of NSE increases from 0.72 to 0.85 for a 1-day forecast and from 0.09 to 0.26 for a 15-day forecast. Notwithstanding the value of CLSTM, the skill of long-term SM forecasts based on DL models is still relatively mediocre. Therefore, efforts should focus more on improving long-term (from weekly to subseasonal) forecasts of SM rather than short-term forecasts. In terms of the magnitude of improvement by CLSTM, 7- and 15-day forecasts show larger improvements in NSE than the 1-day forecast (see Table 1, Tables S5 and S6). The causality-structure constraint thus shows potential to improve long-term SM forecasts. The generalization ability of 1- and 15-day forecasts is also significantly improved (see Figs. S8 and S9). The gap between the training and testing losses of the two models becomes larger as the time scale increases from 1 to 15 days, suggesting that DL models tend to overfit more easily in long-term forecasts. Moreover, as the time scale increases, the improvement in generalization becomes larger, reinforcing the result that CLSTM has the potential to improve long-term SM forecasts.

Fig. 8.

The general performance (NSE and RMSE) of CLSTM and LSTM at 64 FLUXNET sites for four experiments: 1-day forecast by LSTM, 1-day forecast by CLSTM, 15-day forecast by LSTM, and 15-day forecast by CLSTM.


e. Comparison of CLSTM with different sets of causality structure

The causality structure in our study is generated by pure data mining methods, which do not involve any physical prior knowledge. Seneviratne et al. (2010) show that SM plays an important role in land energy and water balances through its links with P and evapotranspiration (or latent heat flux, denoted as LE) [see Eq. (5) and Fig. 4 in their study]. To evaluate whether simple prior knowledge can improve the weekly SM forecast of CLSTM, we manually added three sets of causal relationships, i.e., P on SM (P → SM), LE on SM (LE → SM), or both (P → SM and LE → SM), into the causality structure separately (the resulting models are named CLSTM*). Notably, we only artificially add some “causal” relations between variables (which may not obey the direction of the relationships in physical laws) to see the influence of different sets of causality structure. We trained a model at each location and selected the “best” model at each site for comparison (using the same hyperparameter tuning process as for CLSTM; see section 3c). Figure S10 shows boxplots of the general performance of LSTM, CLSTM, and the three CLSTM* variants with different sets of causality structure. In terms of general performance, CLSTM* was on par with CLSTM, suggesting that the causality structure generated by pure data mining methods adequately represents the causal relationships needed for forecasting at each site. The ability of these models to predict extreme drought events is shown in Fig. 9. In most cases, CLSTM* performed similarly to CLSTM in forecasting extreme drought events, but significant improvements of CLSTM* were observed at the U.S.-WCr, U.S.-Blo, and IT-Tor sites. Figure S11 shows the time series of SM at these three sites. CLSTM* tended to give more fluctuating time series in accordance with P events, mainly because the P → SM constraint brings the high-frequency fluctuation of P into the SM forecast.

Fig. 9.

The forecast skill of extreme drought events by CLSTM (green circles) and physically constrained CLSTM (red triangles) compared to observations (blue pentagons). The drought events are evaluated by SWDI (see section 3d). Three sites with significant improvement of physically constrained CLSTM are annotated by red rectangles.


Moreover, simple physical rules also corrected the amplitude of the SM response to P events (see Figs. S11a,b), which also benefits from P → SM. In general, CLSTM derived from pure data mining methods reaches strong performance. However, we also highlight the potential of involving physical prior knowledge to improve the forecast of extreme drought events at specific sites. We emphasize that the causality in our study is not truly a physical relationship, but this causal relationship can be used to guide LSTM toward better performance. Adding relationships based on physical knowledge to the causality structure does not improve model performance in general, but it does bring improvements at some specific sites.

f. Further discussion and possible future directions

1) Impact of different experiment settings

For the calculation of causality states [Eqs. (9)–(11)], we first generate a weight for each state by a separate neural network (NN) and then integrate all the states to generate the new causality state. We did not use the more straightforward approach, i.e., directly combining all states by an NN without generating weights first. That straightforward calculation can be expressed as
$$\tilde{C}_{i,t} = W_{i,\tilde{C},L} L_{i,t} + W_{i,\tilde{C},H} H_{i,t} + b_{i,\tilde{C}}. \tag{12}$$

Our method is inspired by the three gates in vanilla LSTM, which generates the gates (i.e., weights) first and then combines the hidden state from the last time step with the input. Previous studies (Hochreiter and Schmidhuber 1997; Fang and Shen 2020) suggested that this mechanism can alleviate the exploding and vanishing gradient problems of RNNs. Moreover, the additional parameters could also help the model learn the weight of each state more accurately, which results in better performance. We further compared our strategy [Eqs. (9)–(11)] with the more straightforward approach [Eq. (12)] and found that generating weights before integrating states slightly improves the performance at nearly the same time cost for in situ soil moisture prediction (see Table S7).
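The contrast between the two strategies can be sketched as follows. `direct_combination` follows the straightforward affine form of Eq. (12) for the two states written there as L and H. `gated_combination` is only a hypothetical stand-in for the gate-first idea (one sigmoid gate per state, then a weighted sum); it is not a reproduction of Eqs. (9)–(11), and all function and parameter names are assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def direct_combination(L, H, W_L, W_H, b):
    """Straightforward form: a single affine map of the two states."""
    return W_L @ L + W_H @ H + b

def gated_combination(L, H, W_gL, W_gH, b_gL, b_gH):
    """Hypothetical gate-first variant.

    Each state first gets its own gate in (0, 1); the gates then weight
    the states elementwise before they are summed.
    """
    g_L = sigmoid(W_gL @ L + b_gL)
    g_H = sigmoid(W_gH @ H + b_gH)
    return g_L * L + g_H * H
```

In the gated variant each state is rescaled by a bounded factor, as with the gates of a standard LSTM cell, whereas the direct form mixes the states with unbounded weights.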

For the length of input features, we selected feature data from the previous 7–17 days based on our preliminary test. We are aware that introducing multiscale information may improve the performance of DL models. We therefore also used features from the previous 7–37 days to predict soil moisture and found that the performance decreases slightly when more information is involved (Table S8). Although more input information could help the model learn the annual and seasonal cycles of processes, complex input is highly nonlinear and contains much useless information, which tends to make DL models suffer from the information entanglement effect and undermines model performance (Baltrušaitis et al. 2019).

For the parameters of the causality structure, Figs. S12 and S13 show the general performance of CLSTM with different depths (3, 4, 5) and different thresholds (0.3, 0.5, 0.7), respectively. A more complex causality structure (smaller threshold and larger depth) may lead to poor performance because of overfitting.

2) Physical meaning of the selected variables

We also show the frequency of causal drivers of soil moisture in Fig. S14. Notably, this figure gives the frequency with which each driver is selected, and it does not indicate the relative importance of the drivers. The five most frequent causal drivers of soil moisture are air temperature, soil temperature, CO2, longwave radiation, and latent heat flux. We note the following points.

  1. We found that soil temperature and air temperature influence soil moisture prediction at most sites. Soil temperature participates in local SM–P feedback by affecting the outgoing longwave radiation and ground heat fluxes (such as latent and sensible heat flux) (Venkat et al. 2003). The local SM–P feedback could significantly influence the evolution of soil moisture. Moreover, air temperature is related to evaporation and has a significantly negative effect on soil moisture, which is confirmed by previous studies (Feng and Liu 2015). Thus, soil temperature and air temperature influence soil moisture at most sites.

  2. It is interesting to see that CO2 also affects soil moisture at some sites, which reveals the importance of the plant hydraulics effect in some regions. Plants consume water during photosynthesis, which influences the variation of soil moisture. Moreover, VPD also has some impact on soil moisture prediction. VPD represents the moisture deficit of the atmosphere, which also influences evaporation. High VPD may increase evaporation and therefore lower soil moisture.

  3. Radiant flux is primarily related to the reflective capacity of the surface, rather than the absorbing capacity (Chang et al. 2020). Shortwave radiation can affect SM variation because it provides energy for evaporation. However, longwave radiation is generated based on the reflection of shortwave radiation and contains more potential information about the soil. Compared with shortwave radiation, the predictive model can more easily capture the potential relationship between soil moisture and longwave radiation (Q. Li et al. 2022).

  4. Notably, precipitation does not have an obvious correlation with soil moisture. We give two reasons. First, precipitation occurs on only a few days; thus, its effect is not obvious on average. However, because precipitation is known to have an important effect on soil moisture, we further added precipitation into the causality structure (see section 4e) to examine its impact on soil moisture prediction. Second, FLUXNET sites are mostly in forested areas, which have high interception of precipitation, so the data may not directly reflect the correlation between precipitation and soil moisture (Q. Li et al. 2022).

3) The potential benefit for coupling CLSTM with SM2RAIN methods

CLSTM could bring further benefits when coupled with other models that use soil moisture as input. For example, Brocca et al. (2014) proposed a “hydrology backward” approach, named SM2RAIN, to infer the preceding rainfall amount from variations in soil moisture. The SM2RAIN method provides good performance (R, RMSE) for rainfall amount when evaluated against the Global Precipitation Climatology Centre (GPCC) product. Moreover, the SM-derived product performs well in the detection of low to medium rainfall events. Overall, their study demonstrates the validity of the SM2RAIN method for improving rainfall estimation on a global scale. Additionally, they emphasized that more accurate soil moisture may significantly improve the inferred rainfall amount; the potential of improving rainfall estimation using accurate soil moisture is also confirmed by Ciabatta et al. (2018) and Brocca et al. (2019). In our study, we proposed CLSTM to accurately predict soil moisture 7 days ahead, and the results suggest that CLSTM can roughly infer the corresponding rainfall from historical climatic input (Figs. 6 and 7). Thus, coupling CLSTM forecasts with the SM2RAIN method has the potential to achieve better rainfall estimation.

4) The advantages, limitation, and future direction of CLSTM

In our study, CLSTM achieved markedly higher accuracy than LSTM when evaluated at 64 FLUXNET sites. We give some explanations of why CLSTM improves weekly SM forecasting:

  1. SM has a strong memory effect, which was found to range between 5 and 40 days (Orth and Seneviratne 2012) and is helpful for forecasting SM (Pan et al. 2019). CLSTM is constructed based on LSTM; therefore, CLSTM inherits the advantages of LSTM that could consider both long-term and short-term dependency of SM, which could deliver superb performance for weekly SM prediction.

  2. A careful tuning process, including constraints on the depth of the causality structure and the threshold for correlation tests, avoids the overfitting problem to some extent.

  3. In terms of model structure, CLSTM benefits from two aspects compared to LSTM: it adds parameters according to some guidance (multilevel structure and causality states), and it removes some redundant information through the causality test rather than the correlation test alone. We give some further discussion to confirm these two aspects.

CLSTM has more parameters than LSTM. This is mainly because a new state is introduced into CLSTM according to some guidance, which adds complexity to the model. To show that the better performance is not simply due to the larger number of parameters, we increased the parameters (i.e., complexity) of LSTM (denoted as LSTM-2) by adding three hidden layers and increasing the number of cells in the hidden layers, so that it has nearly the same number (slightly larger) of parameters as CLSTM. The performance of these two models at the 64 sites is shown in Fig. S15. Comparing LSTM-2 with vanilla LSTM (Table 1), the mean value of NSE over the 64 sites increases from 0.32 to 0.36, which indicates that increasing the parameters slightly improves the performance of vanilla LSTM. However, the performance of LSTM-2 is still poorer than that of CLSTM (Fig. S15), which suggests that randomly (or simply) increasing the number of parameters by adding layers or cells cannot adequately help LSTM extract the information in the training dataset. CLSTM adds parameters according to some guidance (such as causality) and provides better performance than randomly adding parameters. Furthermore, to show whether the improvement of CLSTM is derived from variable selection by the causality test, we compared CLSTM with LSTM* (i.e., an LSTM that only uses the statistical tests to select its input variables). We found that CLSTM also significantly improves on LSTM*, which further emphasizes the effect of the causality network (Table S9).

The above discussion shows the impact of introducing causality states into LSTM (adding parameters through the multilevel structure of LSTM), which gives more reasonable predictions than LSTM. To show whether the identified causality is better than “untrue” meaningless causality and “correlation” causality, we ran two groups of experiments. In the first group, we randomly set three variables as the causal drivers of soil moisture, which is unlikely to follow the causality observed in the data. In the second group, we did not perform a causality test and only used the structure generated from the linear correlation test. The results show that CLSTM performs better than the correlation-only structure, which shows the impact of the causality test (Table S10). We think this is mainly because the causality test removes some redundant relationships and thus decreases the complexity of the causal structure (similar to the pruning operation in tree-based ML models). Furthermore, a random set of variables clearly decreases the performance (by about 8% in mean NSE and 4% in mean RMSE), which further emphasizes the importance of the causality and correlation tests: they remove redundant information and provide possible causal relationships that help the model improve performance. Notably, although the identified causality benefits the model compared to meaningless causality, the improvement is smaller than that from involving a new state and using stacked LSTM layers. Wang et al. (2022) indicated that layers at different levels can extract different levels of feature patterns (e.g., the hidden representations can be more and more abstract from the top layer downward). Therefore, the cell states in each layer can learn patterns at different levels by transferring information vertically.

CLSTM also has some limitations. The aim of this study is to identify the general improvement from involving causality information in DL models rather than to achieve the best performance using all available data. In our experiment setting, soil moisture at day t + 7 is predicted using features from the past 10 days for each day (denoted t). Features from days t + 1 to t + 7, which may be the major control on soil moisture, are not involved in the model. We use this setting to simulate a real forecast scenario, in which future observations (in situ or remote sensing) of features (such as precipitation and soil moisture) cannot be obtained. Using weather predictions from a numerical model could supply the information for the gap days and may help to achieve better in situ soil moisture prediction. However, it may also introduce the systematic error of the numerical model and the error arising from the mismatch between the grid locations of the numerical model data and the in situ sites. Therefore, we emphasize that the experiment setting in our study cannot achieve the best performance for soil moisture hindcasting using an observation dataset or numerical predictions, but it shows the predictability of DL models in a real forecast scenario and presents the effect of involving causality structure in LSTM. Furthermore, although CLSTM improves the predictive ability compared to LSTM, it requires more training time than LSTM and may not be suitable for applications that need low time costs.
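The experiment setting described above amounts to a sliding-window construction of training samples. The following sketch assumes a daily feature matrix and a daily soil moisture series of equal length, and assumes that the 10-day window ends at day t inclusive; the function name and array layout are hypothetical, not taken from the released code.

```python
import numpy as np

def make_samples(features, sm, window=10, lead=7):
    """Build (X, y) pairs for a forecast `lead` days ahead.

    features: array of shape (T, F); sm: array of shape (T,).
    X has shape (N, window, F) and y has shape (N,).
    """
    features = np.asarray(features, dtype=float)
    sm = np.asarray(sm, dtype=float)
    X, y = [], []
    for t in range(window - 1, len(sm) - lead):
        X.append(features[t - window + 1 : t + 1])  # days t-9, ..., t
        y.append(sm[t + lead])                      # target at day t+7
    return np.stack(X), np.asarray(y)
```

No information from days t + 1 to t + 7 enters `X`, which is what makes the setup comparable to a real forecast.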

To our knowledge, this study is the first to use a causality-structured DL model for SM prediction and to evaluate its applicability over different soil conditions, training data lengths, and experiment settings. Here we give some future directions. First, GC represents statistical causality rather than natural causality and does not conform with the stricter definitions of causality in Pearl (2009). In the future, a better causal discovery algorithm based on graphical models, e.g., PCMCI [a combination of the PC algorithm (Spirtes and Glymour 1991) and the momentary conditional independence (MCI) test; Runge et al. 2019b], should be used to obtain a more accurate causal structure and further improve the performance. Second, causal relations may depend on the temporal scale. Thus, an ensemble CLSTM that processes causal relations on different time scales may be needed to improve performance. Third, additional efforts are required to explore CLSTM for other hydrometeorological variables and different spatial scales.

5. Conclusions

We developed a causality-structured deep learning model, named CLSTM, to predict soil moisture. The approach consists of two parts: first, a causality structure is derived from causal discovery methods (correlation and causality tests); second, a DL model (CLSTM) forward propagates according to the causality structure for SM prediction. To the best of our knowledge, this study is the first application of causality-structured DL models to hydrometeorology. The model was trained at 64 FLUXNET sites, and the performance was evaluated by RMSE and NSE. CLSTM shows improvements in three aspects. 1) The general performance is improved for four metrics. Median values of NSE and RMSE of CLSTM are 0.52 and 3.47%, respectively, while those of LSTM are 0.34 and 3.98%, respectively. CLSTM performs excellently (improvement larger than 10% in the metric) at nearly 75% of sites for the four metrics compared to LSTM. 2) Generalization ability is dramatically improved by involving causality in the DL model, and CLSTM can adapt to different soil conditions. 3) CLSTM performs better in forecasting drought events and the response of SM to P. Moreover, we confirmed the applicability of CLSTM for 1- to 15-day forecasts. We also highlighted the important role of physically constrained prior knowledge in improving the forecasting of extreme drought events. The outcomes show that adding causal relations to LSTM is a valuable constraint for improving performance and stability over different soil conditions. Notably, we first generate weights for each state rather than use an NN to directly compute a weighted sum of all states. This method was inspired by the gates in LSTM, which are used to resolve the vanishing gradient problem of RNNs. Interestingly, this method also benefits CLSTM, which confirms the usefulness of the gate mechanism in RNN-based DL models.

In this study, we focus on the improvement of the causality-structured DL model over the pure DL model rather than on exploring the best representation of causality at each site. We highlight the improvements that the causality-structured DL model achieves using only the simple statistical causality relationships derived in this study.

Acknowledgments.

We thank Dr. Kuai Fang and two anonymous reviewers for helping us improve the paper. This work was supported by the Natural Science Foundation of China under Grants U1811464 and 41730962, and the National Key R&D Program of China under Grant 2017YFA0604300. Wei Shangguan is supported by the National Science Foundation of China under Grant 41975122. Zhongwang Wei is supported by the National Science Foundation of China under Grant 42075158. Qingliang Li is supported by the National Science Foundation of China under Grant 42105144. The source code is available at https://github.com/leelew/CLSTM.

Data availability statement.

All site datasets used during this study are openly available from FLUXNET at https://fluxnet.org/data/fluxnet2015-dataset/ as cited in Pastorello et al. (2020).

APPENDIX

Definition of Statistical Tests

Here we detail the correlation and GC tests. For features a [denoted as $a = (a_1, \ldots, a_t)$] and b [denoted as $b = (b_1, \ldots, b_t)$], t is the number of time steps. We perform the tests as follows.

a. Pearson correlation

Pearson correlation is calculated as
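$$R = \frac{\sum_{i=1}^{t}(a_i - \bar{a})(b_i - \bar{b})}{\sqrt{\sum_{i=1}^{t}(a_i - \bar{a})^2}\,\sqrt{\sum_{i=1}^{t}(b_i - \bar{b})^2}}, \tag{A1}$$
where $\bar{a}$ and $\bar{b}$ are the mean values of a and b.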

b. Linear and nonlinear GC test

Here we give a brief summary of the GC test (both the linear and nonlinear algorithms); details are given in section 2b of Li et al. (2020).

To test whether $a$ is a causal driver of $b$ (i.e., $a \rightarrow b$), we first define a $p$th-order polynomial vector autoregression (VAR) model to predict $b_t$ from past information of $a$, $b$, and other features (denoted as $z$); we call Eq. (A2) the “full” model. Second, we exclude the past information of $a$ and construct a “baseline” model [Eq. (A3)].
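$$b_t = f([a_{t-p}, \ldots, a_{t-1}, b_{t-p}, \ldots, b_{t-1}, z_{t-p}, \ldots, z_{t-1}, \varepsilon_t]) \tag{A2}$$
$$b_t = f([b_{t-p}, \ldots, b_{t-1}, z_{t-p}, \ldots, z_{t-1}, \varepsilon_t]), \tag{A3}$$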
where $f$ is linear regression for the linear GC test and RF for the nonlinear GC test. Li et al. (2020) indicated that RF performs well in detecting nonlinear GC values compared to other ML models (e.g., ANN). The lag order $p$ is set to 7 for weekly SM prediction.

Finally, the null hypothesis is defined as $a \not\rightarrow b$, that is, the full model has equal or less predictive power (evaluated by MSE in our study) than the baseline model. The null hypothesis is rejected only when the full model is more accurate than the baseline model, which indicates that past information of $a$ helps predict $b$, i.e., $a \rightarrow b$. We further discuss the difference between correlation and causality and give some reasons why GC can avoid spurious relations (see supplemental material).
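The comparison between the full and baseline models can be sketched in Python for the linear case. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`lagged`, `fit_mse`, `linear_gc_test`), the single conditioning feature `z`, and the chronological train/test split with `train_frac` are all hypothetical. The held-out split is an assumption made here because a nested linear model can never have a larger in-sample MSE than its baseline. For the nonlinear test, a random forest regressor would take the place of the least-squares fit.

```python
import numpy as np


def lagged(x, p):
    """Columns x_{t-p}, ..., x_{t-1} for every target time t = p, ..., n-1."""
    n = len(x)
    return np.column_stack([x[k:n - p + k] for k in range(p)])


def fit_mse(x_train, y_train, x_test, y_test):
    """Fit ordinary least squares with an intercept; return held-out MSE."""
    design_train = np.column_stack([np.ones(len(x_train)), x_train])
    coef, *_ = np.linalg.lstsq(design_train, y_train, rcond=None)
    design_test = np.column_stack([np.ones(len(x_test)), x_test])
    return float(np.mean((y_test - design_test @ coef) ** 2))


def linear_gc_test(a, b, z, p=7, train_frac=0.7):
    """Compare the full model (lags of a, b, z) with the baseline model
    (lags of b, z) when predicting b_t. Returns (mse_full, mse_base,
    rejected), where rejected is True if the full model has lower MSE."""
    a, b, z = (np.asarray(v, dtype=float) for v in (a, b, z))
    target = b[p:]
    full = np.hstack([lagged(a, p), lagged(b, p), lagged(z, p)])
    base = np.hstack([lagged(b, p), lagged(z, p)])
    split = int(train_frac * len(target))  # chronological split (assumed)
    mse_full = fit_mse(full[:split], target[:split],
                       full[split:], target[split:])
    mse_base = fit_mse(base[:split], target[:split],
                       base[split:], target[split:])
    return mse_full, mse_base, mse_full < mse_base
```

With a synthetic series in which `b` at time t depends strongly on `a` at time t-1, the full model should reach a much lower held-out MSE than the baseline, so the null hypothesis would be rejected in the sense described above.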

REFERENCES

  • Abbes, A. B., R. Magagi, and K. Goita, 2019: Soil moisture estimation from SMAP observations using Long Short-Term Memory (LSTM). 2019 IEEE Int. Geoscience and Remote Sensing Symp., Yokohama, Japan, IEEE, 1590–1593, https://doi.org/10.1109/IGARSS.2019.8898418.

  • Adeyemi, O., I. Grove, S. Peets, Y. Domun, and T. Norton, 2018: Dynamic neural network modelling of soil moisture content for predictive irrigation scheduling. Sensors, 18, 3408, https://doi.org/10.3390/s18103408.

  • Ali, I., F. Greifeneder, J. Stamenkovic, M. Neumann, and C. Notarnicola, 2015: Review of machine learning approaches for biomass and soil moisture retrievals from remote sensing data. Remote Sens., 7, 16 398–16 421, https://doi.org/10.3390/rs71215841.

  • Baltrušaitis, T., C. Ahuja, and L. P. Morency, 2019: Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell., 41, 423–443, https://doi.org/10.1109/TPAMI.2018.2798607.

  • Besaw, L. E., D. M. Rizzo, P. R. Bierman, and W. R. Hackett, 2010: Advances in ungauged streamflow prediction using artificial neural networks. J. Hydrol., 386, 27–37, https://doi.org/10.1016/j.jhydrol.2010.02.037.

  • Brocca, L., and Coauthors, 2014: Soil as a natural rain gauge: Estimating global rainfall from satellite soil moisture data. J. Geophys. Res. Atmos., 119, 5128–5141, https://doi.org/10.1002/2014JD021489.

  • Brocca, L., and Coauthors, 2019: SM2RAIN–ASCAT (2007–2018): Global daily satellite rainfall data from ASCAT soil moisture observations. Earth Syst. Sci. Data, 11, 1583–1601, https://doi.org/10.5194/essd-11-1583-2019.

  • Chang, Y., and Coauthors, 2020: Effects of soil moisture on surface radiation balance and water-heat flux in desert steppe environment of Inner Mongolia. Pol. J. Environ. Stud., 30, 1881–1891, https://doi.org/10.15244/pjoes/127019.

  • Ciabatta, L., and Coauthors, 2018: SM2RAIN-CCI: A new global long-term rainfall data set derived from ESA CCI soil moisture. Earth Syst. Sci. Data, 10, 267–280, https://doi.org/10.5194/essd-10-267-2018.

  • Di Capua, K., and Coauthors, 2019: Long-lead statistical forecasts of the Indian summer monsoon rainfall based on causal precursors. Wea. Forecasting, 34, 1377–1394, https://doi.org/10.1175/WAF-D-19-0002.1.

  • Fang, K., and C. Shen, 2020: Near-real-time forecast of satellite-based soil moisture using long short-term memory with an adaptive data integration kernel. J. Hydrometeor., 21, 399–413, https://doi.org/10.1175/JHM-D-19-0169.1.

  • Fang, K., C. Shen, D. Kifer, and X. Yang, 2017: Prolongation of SMAP to spatiotemporally seamless coverage of continental US using a deep learning neural network. Geophys. Res. Lett., 44, 11 030–11 039, https://doi.org/10.1002/2017GL075619.

  • Fang, K., M. Pan, and C. Shen, 2019: The value of SMAP for long-term soil moisture estimation with the help of deep learning. IEEE Trans. Geosci. Remote Sens., 57, 2221–2233, https://doi.org/10.1109/TGRS.2018.2872131.

  • Feng, H., and Y. Liu, 2015: Combined effects of precipitation and air temperature on soil moisture in different land covers in a humid basin. J. Hydrol., 531, 1129–1140, https://doi.org/10.1016/j.jhydrol.2015.11.016.

  • Gauch, M., F. Kratzert, D. Klotz, G. Nearing, J. Lin, and S. Hochreiter, 2021: Rainfall–runoff prediction at multiple timescales with a single Long Short-Term Memory network. Hydrol. Earth Syst. Sci., 25, 2045–2062, https://doi.org/10.5194/hess-25-2045-2021.

  • Granger, C. W., 1969: Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37, 424–438, https://doi.org/10.2307/1912791.

  • Grus, J., 2019: Data Science from Scratch: First Principles with Python. O’Reilly Media, 406 pp.

  • Hochreiter, S., and J. Schmidhuber, 1997: Long short-term memory. Neural Comput., 9, 1735–1780, https://doi.org/10.1162/neco.1997.9.8.1735.

  • Kingma, D. P., and J. Ba, 2015: Adam: A method for stochastic optimization. arXiv, 1412.6980, https://doi.org/10.48550/arXiv.1412.6980.

  • Koster, R. D., 2004: Regions of strong coupling between soil moisture and precipitation. Science, 305, 1138–1140, https://doi.org/10.1126/science.1100217.

  • Kretschmer, M., J. Runge, and D. Coumou, 2017: Early prediction of extreme stratospheric polar vortex states based on causal precursors. Geophys. Res. Lett., 44, 8592–8600, https://doi.org/10.1002/2017GL074696.

  • LeCun, Y., Y. Bengio, and G. Hinton, 2015: Deep learning. Nature, 521, 436–444, https://doi.org/10.1038/nature14539.

  • Lee, T., J. Y. Shin, J. S. Kim, and V. P. Singh, 2020: Stochastic simulation on reproducing long-term memory of hydroclimatological variables using deep learning model. J. Hydrol., 582, 124540, https://doi.org/10.1016/j.jhydrol.2019.124540.

  • Lehmann, J., M. Kretschmer, B. Schauberger, and F. Wechsung, 2020: Potential for early forecast of Moroccan wheat yields based on climatic drivers. Geophys. Res. Lett., 47, e2020GL087516, https://doi.org/10.1029/2020GL087516.

  • Li, L., and Coauthors, 2020: A causal inference model based on random forests to identify the effect of soil moisture on precipitation. J. Hydrometeor., 21, 1115–1131, https://doi.org/10.1175/JHM-D-19-0209.1.

  • Li, L., Y. Dai, W. Shangguan, N. Wei, Z. Wei, and S. Gupta, 2022: Multistep forecasting of soil moisture using spatiotemporal deep encoder–decoder networks. J. Hydrometeor., 23, 337–350, https://doi.org/10.1175/JHM-D-21-0131.1.

  • Li, Q., Z. Wang, W. Shangguan, L. Li, Y. Yao, and F. Yu, 2021: Improved daily SMAP satellite soil moisture prediction over China using deep learning model with transfer learning. J. Hydrol., 600, 126698, https://doi.org/10.1016/j.jhydrol.2021.126698.

  • Li, Q., Y. Zhu, W. Shangguan, X. Wang, L. Li, and F. Yu, 2022: An attention-aware LSTM model for soil moisture and soil temperature prediction. Geoderma, 409, 115651, https://doi.org/10.1016/j.geoderma.2021.115651.

  • Liu, Q., and Coauthors, 2011: The contributions of precipitation and soil moisture observations to the skill of soil moisture estimates in a land data assimilation system. J. Hydrometeor., 12, 750–765, https://doi.org/10.1175/JHM-D-10-05000.1.

  • Martínez-Fernández, J., A. González-Zamora, N. Sánchez, and A. Gumuzzio, 2015: A soil water based index as a suitable agricultural drought indicator. J. Hydrol., 522, 265–273, https://doi.org/10.1016/j.jhydrol.2014.12.051.

  • Narasimhan, B., and R. Srinivasan, 2005: Development and evaluation of Soil Moisture Deficit Index (SMDI) and Evapotranspiration Deficit Index (ETDI) for agricultural drought monitoring. Agric. For. Meteor., 133, 69–88, https://doi.org/10.1016/j.agrformet.2005.07.012.

  • Nash, J. E., and J. V. Sutcliffe, 1970: River flow forecasting through conceptual models Part I—A discussion of principles. J. Hydrol., 10, 282–290, https://doi.org/10.1016/0022-1694(70)90255-6.

  • Neyshabur, B., S. Bhojanapalli, D. McAllester, and N. Srebro, 2017: Exploring generalization in deep learning. arXiv, 1706.08947, https://doi.org/10.48550/arXiv.1706.08947.

  • O, S., and R. Orth, 2021: Global soil moisture data derived through machine learning trained with in-situ measurements. Sci. Data, 8, 170, https://doi.org/10.1038/s41597-021-00964-1.

  • O, S., E. Dutra, and R. Orth, 2020: Robustness of process-based versus data-driven modeling in changing climatic conditions. J. Hydrometeor., 21, 1929–1944, https://doi.org/10.1175/JHM-D-20-0072.1.

  • Ombadi, M., P. Nguyen, S. Sorooshian, and K. Hsu, 2020: Evaluation of methods for causal discovery in hydrometeorological systems. Water Resour. Res., 56, e2020WR027251, https://doi.org/10.1029/2020WR027251.

  • Orth, R., and S. I. Seneviratne, 2012: Analysis of soil moisture memory from observations in Europe. J. Geophys. Res., 117, D15115, https://doi.org/10.1029/2011JD017366.

  • Ouyang, W., K. Lawson, D. Feng, L. Ye, C. Zhang, and C. Shen, 2021: Continental-scale streamflow modeling of basins with reservoirs: Towards a coherent deep-learning-based strategy. J. Hydrol., 599, 126455, https://doi.org/10.1016/j.jhydrol.2021.126455.

  • Pan, J., W. Shangguan, L. Li, H. Yuan, S. Zhang, X. Lu, N. Wei, and Y. Dai, 2019: Using data-driven methods to explore the predictability of surface soil moisture with FLUXNET site data. Hydrol. Processes, 33, 2978–2996, https://doi.org/10.1002/hyp.13540.

  • Pastorello, G., and Coauthors, 2020: The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data. Sci. Data, 7, 225, https://doi.org/10.1038/s41597-020-0534-3.

  • Pearl, J., 2009: Causality. 2nd ed. Cambridge University Press, 484 pp., https://doi.org/10.1017/CBO9780511803161.

  • Prasad, R., R. C. Deo, Y. Li, and T. Maraseni, 2019: Weekly soil moisture forecasting with multivariate sequential, ensemble empirical mode decomposition and Boruta-random forest hybridizer algorithm approach. Catena, 177, 149–166, https://doi.org/10.1016/j.catena.2019.02.012.

  • Read, J. S., and Coauthors, 2019: Process-guided deep learning predictions of lake water temperature. Water Resour. Res., 55, 9173–9190, https://doi.org/10.1029/2019WR024922.

  • Runge, J., R. V. Donner, and J. Kurths, 2015: Optimal model-free prediction from multivariate time series. Phys. Rev. E, 91, 052909, https://doi.org/10.1103/PhysRevE.91.052909.

  • Runge, J., and Coauthors, 2019a: Inferring causation from time series in Earth system sciences. Nat. Commun., 10, 2553, https://doi.org/10.1038/s41467-019-10105-3.

  • Runge, J., P. Nowack, M. Kretschmer, S. Flaxman, and D. Sejdinovic, 2019b: Detecting and quantifying causal associations in large nonlinear time series datasets. Sci. Adv., 5, eaau4996, https://doi.org/10.1126/sciadv.aau4996.

  • Seneviratne, S. I., T. Corti, E. L. Davin, M. Hirschi, E. B. Jaeger, I. Lehner, B. Orlowsky, and A. J. Teuling, 2010: Investigating soil moisture–climate interactions in a changing climate: A review. Earth-Sci. Rev., 99, 125–161, https://doi.org/10.1016/j.earscirev.2010.02.004.

  • Seneviratne, S. I., and Coauthors, 2013: Impact of soil moisture-climate feedbacks on CMIP5 projections: First results from the GLACE-CMIP5 experiment. Geophys. Res. Lett., 40, 5212–5217, https://doi.org/10.1002/grl.50956.

  • Shi, X., Z. Chen, H. Wang, D. Y. Yeung, W. K. Wong, and W. C. Woo, 2015: Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Proc. 28th Int. Conf. on Neural Information Processing Systems, Vol. 1, Montreal, QC, Canada, NIPS, 802–810, https://dl.acm.org/doi/10.5555/2969239.2969329.

  • Spirtes, P., and C. Glymour, 1991: An algorithm for fast recovery of sparse causal graphs. Soc. Sci. Comput. Rev., 9, 62–72, https://doi.org/10.1177/089443939100900106.

  • Tongal, H., and M. Booij, 2018: Simulation and forecasting of streamflows using machine learning models coupled with base flow separation. J. Hydrol., 564, 266–282, https://doi.org/10.1016/j.jhydrol.2018.07.004.

  • Tuttle, S., and G. Salvucci, 2016: Empirical evidence of contrasting soil moisture–precipitation feedbacks across the United States. Science, 352, 825–828, https://doi.org/10.1126/science.aaa7185.

  • Venkat, L., T. J. Jackson, and Z. Diane, 2003: Soil moisture–temperature relationships: Results from two field experiments. Hydrol. Processes, 17, 3041–3057, https://doi.org/10.1002/hyp.1275.

  • Wang, Y., H. Wu, J. Zhang, Z. Gao, J. Wang, P. Yu, and M. Long, 2022: PredRNN: A recurrent neural network for spatiotemporal predictive learning. IEEE Trans. Pattern Anal. Mach. Intell., in press, https://doi.org/10.1109/TPAMI.2022.3165153.

  • Zhang, R., and Coauthors, 2021: Assessment of agricultural drought using soil water deficit index based on ERA5-land soil moisture data in four southern provinces of China. Agriculture, 11, 411, https://doi.org/10.3390/agriculture11050411.

  • Zhang, Z., H. Qin, Y. Liu, Y. Wang, L. Yao, Q. Li, J. Li, and S. Pei, 2019: Long Short-Term Memory Network based on Neighborhood Gates for processing complex causality in wind speed prediction. Energy Convers. Manage., 192, 37–51, https://doi.org/10.1016/j.enconman.2019.04.006.

  • Zhao, W. L., and Coauthors, 2019: Physics-constrained machine learning of evapotranspiration. Geophys. Res. Lett., 46, 14 496–14 507, https://doi.org/10.1029/2019GL085291.

    • Search Google Scholar
    • Export Citation

Supplementary Materials

Save
  • Abbes, A. B., R. Magagi, and K. Goita, 2019: Soil moisture estimation from SMAP observations using Long Short-Term Memory (LSTM). 2019 IEEE Int. Geoscience and Remote Sensing Symp., Yokohama, Japan, IEEE, 15901593, https://doi.org/10.1109/IGARSS.2019.8898418.

    • Search Google Scholar
    • Export Citation
  • Adeyemi, O., I. Grove, S. Peets, Y. Domun, and T. Norton, 2018: Dynamic neural network modelling of soil moisture content for predictive irrigation scheduling. Sensors, 18, 3408, https://doi.org/10.3390%2Fs18103408.

    • Search Google Scholar
    • Export Citation
  • Ali, I., F. Greifeneder, J. Stamenkovic, M. Neumann, and C. Notarnicola, 2015: Review of machine learning approaches for biomass and soil moisture retrievals from remote sensing data. Remote Sens., 7, 16 39816 421, https://doi.org/10.3390/rs71215841.

    • Search Google Scholar
    • Export Citation
  • Baltrušaitis, T., C. Ahuja, and L. P. Morency, 2019: Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell., 41, 423443, https://doi.org/10.1109/TPAMI.2018.2798607.

    • Search Google Scholar
    • Export Citation
  • Besaw, L. E., D. M. Rizzo, P. R. Bierman, and W. R. Hackett, 2010: Advances in ungauged streamflow prediction using artificial neural networks. J. Hydrol., 386, 2737, https://doi.org/10.1016/j.jhydrol.2010.02.037.

    • Search Google Scholar
    • Export Citation
  • Brocca, L., and Coauthors, 2014: Soil as a natural rain gauge: Estimating global rainfall from satellite soil moisture data. J. Geophys. Res. Atmos., 119, 51285141, https://doi.org/10.1002/2014JD021489.

    • Search Google Scholar
    • Export Citation
  • Brocca, L., and Coauthors, 2019: SM2RAIN–ASCAT (2007–2018): Global daily satellite rainfall data from ASCAT soil moisture observations. Earth Syst. Sci. Data, 11, 15831601, https://doi.org/10.5194/essd-11-1583-2019.

    • Search Google Scholar
    • Export Citation
  • Chang, Y., and Coauthors, 2020: Effects of soil moisture on surface radiation balance and water-heat flux in desert steppe environment of Inner Mongolia. Pol. J. Environ. Stud., 30, 18811891, https://doi.org/10.15244/pjoes/127019.

    • Search Google Scholar
    • Export Citation
  • Ciabatta, L., and Coauthors, 2018: SM2RAIN-CCI: A new global long-term rainfall data set derived from ESA CCI soil moisture. Earth Syst. Sci. Data, 10, 267280, https://doi.org/10.5194/essd-10-267-2018.

    • Search Google Scholar
    • Export Citation
  • Di Capua, K., and Coauthors, 2019: Long-lead statistical forecasts of the Indian summer monsoon rainfall based on causal precursors. Wea. Forecasting, 34, 13771394, https://doi.org/10.1175/WAF-D-19-0002.1.

    • Search Google Scholar
    • Export Citation
  • Fang, K., and C. Shen, 2020: Near-real-time forecast of satellite-based soil moisture using long short-term memory with an adaptive data integration kernel. J. Hydrometeor., 21, 399413, https://doi.org/10.1175/JHM-D-19-0169.1.

    • Search Google Scholar
    • Export Citation
  • Fang, K., C. Shen, D. Kifer, and X. Yang, 2017: Prolongation of SMAP to spatiotemporally seamless coverage of continental US using a deep learning neural network. Geophys. Res. Lett., 44, 11 03011 039, https://doi.org/10.1002/2017GL075619.

    • Search Google Scholar
    • Export Citation
  • Fang, K., M. Pan, and C. Shen, 2019: The value of SMAP for long-term soil moisture estimation with the help of deep learning. IEEE Trans. Geosci. Remote Sens., 57, 22212233, https://doi.org/10.1109/TGRS.2018.2872131.

    • Search Google Scholar
    • Export Citation
  • Feng, H., and Y. Liu, 2015: Combined effects of precipitation and air temperature on soil moisture in different land covers in a humid basin. J. Hydrol., 531, 11291140, https://doi.org/10.1016/j.jhydrol.2015.11.016.

    • Search Google Scholar
    • Export Citation
  • Gauch, M., F. Kratzert, D. Klotz, G. Nearing, J. Lin, and S. Hochreiter, 2021: Rainfall–runoff prediction at multiple timescales with a single Long Short-Term Memory network. Hydrol. Earth Syst. Sci., 25, 20452062, https://doi.org/10.5194/hess-25-2045-2021.

    • Search Google Scholar
    • Export Citation
  • Granger, C. W., 1969: Investigating causal relations by econometric models and cross-spectral methods. Econometrica, 37, 424438, https://doi.org/10.2307/1912791.

    • Search Google Scholar
    • Export Citation
  • Grus, J., 2019: Data Science from Scratch: First Principles with Python. O’Reilly Media, 406 pp.

  • Hochreiter, S., and J. Schmidhuber, 1997: Long short-term memory. Neural Comput., 9, 17351780, https://doi.org/10.1162/neco.1997.9.8.1735.

    • Search Google Scholar
    • Export Citation
  • Kingma, D. P., and J. Ba, 2015: Adam: A method for stochastic optimization. arXiv, 1412.6980, https://doi.org/10.48550/arXiv.1412.6980.

  • Koster, R. D., 2004: Regions of strong coupling between soil moisture and precipitation. Science, 305, 11381140, https://doi.org/10.1126/science.1100217.

    • Search Google Scholar
    • Export Citation
  • Kretschmer, M., J. Runge, and D. Coumou, 2017: Early prediction of extreme stratospheric polar vortex states based on causal precursors. Geophys. Res. Lett., 44, 85928600, https://doi.org/10.1002/2017GL074696.

    • Search Google Scholar
    • Export Citation
  • LeCun, Y., Y. Bengio, and G. Hinton, 2015: Deep learning. Nature, 521, 436444, https://doi.org/10.1038/nature14539.

  • Lee, T., J. Y. Shin, J. S. Kim, and V. P. Singh, 2020: Stochastic simulation on reproducing long-term memory of hydroclimatological variables using deep learning model. J. Hydrol., 582, 124540, https://doi.org/10.1016/j.jhydrol.2019.124540.

    • Search Google Scholar
    • Export Citation
  • Lehmann, J., M. Kretschmer, B. Schauberger, and F. Wechsung, 2020: Potential for early forecast of Moroccan wheat yields based on climatic drivers. Geophys. Res. Lett., 47, e2020GL087516, https://doi.org/10.1029/2020GL087516.

    • Search Google Scholar
    • Export Citation
  • Li, L., and Coauthors, 2020: A causal inference model based on random forests to identify the effect of soil moisture on precipitation. J. Hydrometeor., 21, 11151131, https://doi.org/10.1175/JHM-D-19-0209.1.

    • Search Google Scholar
    • Export Citation
  • Li, L., Y. Dai, W. Shangguan, N. Wei, Z. Wei, and S. Gupta, 2022: Multistep forecasting of soil moisture using spatiotemporal deep encoder–decoder networks. J. Hydrometeor., 23, 337350, https://doi.org/10.1175/JHM-D-21-0131.1.

    • Search Google Scholar
    • Export Citation
  • Li, Q., Z. Wang, W. Shangguan, L. Li, Y. Yao, and F. Yu, 2021: Improved daily SMAP satellite soil moisture prediction over China using deep learning model with transfer learning. J. Hydrol., 600, 126698, https://doi.org/10.1016/j.jhydrol.2021.126698.

    • Search Google Scholar
    • Export Citation
  • Li, Q., Y. Zhu, W. Shangguan, X. Wang, L. Li, and F. Yu, 2022: An attention-aware LSTM model for soil moisture and soil temperature prediction. Geoderma, 409, 115651, https://doi.org/10.1016/j.geoderma.2021.115651.

    • Search Google Scholar
    • Export Citation
  • Liu, Q., and Coauthors, 2011: The contributions of precipitation and soil moisture observations to the skill of soil moisture estimates in a land data assimilation system. J. Hydrometeor., 12, 750765, https://doi.org/10.1175/JHM-D-10-05000.1.

    • Search Google Scholar
    • Export Citation
  • Martínez-Fernández, J., A. González-Zamora, N. Sánchez, and A. Gumuzzio, 2015: A soil water based index as a suitable agricultural drought indicator. J. Hydrol., 522, 265273, https://doi.org/10.1016/j.jhydrol.2014.12.051.

    • Search Google Scholar
    • Export Citation
  • Narasimhan, B., and R. Srinivasan, 2005: Development and evaluation of Soil Moisture Deficit Index (SMDI) and Evapotranspiration Deficit Index (ETDI) for agricultural drought monitoring. Agric. For. Meteor., 133, 6988, https://doi.org/10.1016/j.agrformet.2005.07.012.

    • Search Google Scholar
    • Export Citation
  • Nash, J. E., and J. V. Sutcliffe, 1970: River flow forecasting through conceptual models Part I—A discussion of principles. J. Hydrol., 10, 282290, https://doi.org/10.1016/0022-1694(70)90255-6.

    • Search Google Scholar
    • Export Citation
  • Neyshabur, B., S. Bhojanapalli, D. McAllester, and N. Srebro, 2017: Exploring generalization in deep learning. arXiv, 1706.08947, https://doi.org/10.48550/arXiv.1706.08947.

    • Search Google Scholar
    • Export Citation
  • O, S., and R. Orth, 2021: Global soil moisture data derived through machine learning trained with in-situ measurements. Sci Data, 8, 170, https://doi.org/10.1038/s41597-021-00964-1.

    • Search Google Scholar
    • Export Citation
  • O, S., E. Dutra, and R. Orth, 2020: Robustness of process-based versus data-driven modeling in changing climatic conditions. J. Hydrometeor., 21, 19291944, https://doi.org/10.1175/JHM-D-20-0072.1.

    • Search Google Scholar
    • Export Citation
  • Ombadi, M., P. Nguyen, S. Sorooshian, and K. Hsu, 2020: Evaluation of methods for causal discovery in hydrometeorological systems. Water Resour. Res., 56, e2020WR027251, https://doi.org/10.1029/2020WR027251.

    • Search Google Scholar
    • Export Citation
  • Orth, R., and S. I. Seneviratne, 2012: Analysis of soil moisture memory from observations in Europe. J. Geophys. Res., 117, D15115, https://doi.org/10.1029/2011JD017366.

    • Search Google Scholar
    • Export Citation
  • Ouyang, W., K. Lawson, D. Feng, L. Ye, C. Zhang, and C. Shen, 2021: Continental-scale streamflow modeling of basins with reservoirs: Towards a coherent deep-learning-based strategy. J. Hydrol., 599, 126455, https://doi.org/10.1016/j.jhydrol.2021.126455.

    • Search Google Scholar
    • Export Citation
  • Pan, J., W. Shangguan, L. Li, H. Yuan, S. Zhang, X. Lu, N. Wei, and Y. Dai, 2019: Using data-driven methods to explore the predictability of surface soil moisture with FLUXNET site data. Hydrol. Processes, 33, 29782996, https://doi.org/10.1002/hyp.13540.

    • Search Google Scholar
    • Export Citation
  • Pastorello, G., and Coauthors, 2020: The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data. Sci. Data, 7, 225, https://doi.org/10.1038/s41597-020-0534-3.

    • Search Google Scholar
    • Export Citation
  • Pearl, J., 2009: Causality. 2nd ed. Cambridge University Press, 484 pp., https://doi.org/10.1017/CBO9780511803161.

  • Prasad, R., R. C. Deo, Y. Li, and T. Maraseni, 2019: Weekly soil moisture forecasting with multivariate sequential, ensemble empirical mode decomposition and Boruta-random forest hybridizer algorithm approach. Catena, 177, 149166, https://doi.org/10.1016/j.catena.2019.02.012.

    • Search Google Scholar
    • Export Citation
  • Read, J. S., and Coauthors, 2019: Process-guided deep learning predictions of lake water temperature. Water Resour. Res., 55, 91739190, https://doi.org/10.1029/2019WR024922.

    • Search Google Scholar
    • Export Citation
  • Runge, J., R. V. Donner, and J. Kurths, 2015: Optimal model-free prediction from multivariate time series. Phys. Rev. E, 91, 052909, https://doi.org/10.1103/PhysRevE.91.052909.

    • Search Google Scholar
    • Export Citation
  • Runge, J., and Coauthors, 2019a: Inferring causation from time series in Earth system sciences. Nat. Commun., 10, 2553, https://doi.org/10.1038/s41467-019-10105-3.

    • Search Google Scholar
    • Export Citation
  • Runge, J., P. Nowack, M. Kretschmer, S. Flaxman, and D. Sejdinovic, 2019b: Detecting and quantifying causal associations in large nonlinear time series datasets. Sci. Adv., 5, eaau4996, https://doi.org/10.1126/sciadv.aau4996.

    • Search Google Scholar
    • Export Citation
  • Seneviratne, S. I., T. Corti, E. L. Davin, M. Hirschi, E. B. Jaeger, I. Lehner, B. Orlowsky, and A. J. Teuling, 2010: Investigating soil moisture–climate interactions in a changing climate: A review. Earth-Sci. Rev., 99, 125161, https://doi.org/10.1016/j.earscirev.2010.02.004.

    • Search Google Scholar
    • Export Citation
  • Seneviratne, S. I., and Coauthors, 2013: Impact of soil moisture-climate feedbacks on CMIP5 projections: First results from the GLACE-CMIP5 experiment. Geophys. Res. Lett., 40, 52125217, https://doi.org/10.1002/grl.50956.

    • Search Google Scholar
    • Export Citation
  • Shi, X., Z. Chen, H. Wang, D. Y. Yeung, W. K. Wong, and W. C. Woo, 2015: Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Proc. 28th Int. Conf. on Neural Information Processing Systems, Vol. 1, Montreal, QC, Canada, NIPS, 802810, https://dl.acm.org/doi/10.5555/2969239.2969329.

    • Search Google Scholar
    • Export Citation
  • Spirtes, P., and C. Glymour, 1991: An algorithm for fast recovery of sparse causal graphs. Soc. Sci. Comput. Rev., 9, 6272, https://doi.org/10.1177/089443939100900106.

    • Search Google Scholar
    • Export Citation
  • Tongal, H., and M. Booij, 2018: Simulation and forecasting of streamflows using machine learning models coupled with base flow separation. J. Hydrol., 564, 266282, https://doi.org/10.1016/j.jhydrol.2018.07.004.

    • Search Google Scholar
    • Export Citation
  • Tuttle, S., and G. Salvucci, 2016: Empirical evidence of contrasting soil moisture–precipitation feedbacks across the United States. Science, 352, 825828, https://doi.org/10.1126/science.aaa7185.

    • Search Google Scholar
    • Export Citation
  • Venkat, L., T. J. Jackson, and Z. Diane, 2003: Soil moisture–temperature relationships: Results from two field experiments. Hydrol. Processes, 17, 30413057, https://doi.org/10.1002/hyp.1275.

    • Search Google Scholar
    • Export Citation
  • Wang, Y., H. Wu, J. Zhang, Z. Gao, J. Wang, P. Yu, and M. Long, 2022: PredRNN: A recurrent neural network for spatiotemporal predictive learning. IEEE Trans. Pattern Anal. Mach. Intell., in press, https://doi.org/10.1109/TPAMI.2022.3165153.

    • Search Google Scholar
    • Export Citation
  • Zhang, R., and Coauthors, 2021: Assessment of agricultural drought using soil water deficit index based on ERA5-land soil moisture data in four southern provinces of China. Agriculture, 11, 411, https://doi.org/10.3390/agriculture11050411.

    • Search Google Scholar
    • Export Citation
  • Zhang, Z., H. Qin, Y. Liu, Y. Wang, L. Yao, Q. Li, J. Li, and S. Pei, 2019: Long Short-Term Memory Network based on Neighborhood Gates for processing complex causality in wind speed prediction. Energy Convers. Manage., 192, 3751, https://doi.org/10.1016/j.enconman.2019.04.006.

    • Search Google Scholar
    • Export Citation
  • Zhao, W. L., and Coauthors, 2019: