This paper presents a framework that enables simultaneous assimilation of satellite precipitation and soil moisture observations into the coupled Weather Research and Forecasting (WRF) and Noah land surface model through variational approaches. The authors tested the framework by assimilating precipitation data from the Tropical Rainfall Measuring Mission (TRMM) and soil moisture data from the Soil Moisture and Ocean Salinity (SMOS) satellite. The results show that assimilation of both TRMM and SMOS data can effectively improve the forecast skills of precipitation, top 10-cm soil moisture, and 2-m temperature and specific humidity. Within a 2-day time window, impacts of precipitation data assimilation on the forecasts remain relatively constant for forecast lead times greater than 6 h, while the influence of soil moisture data assimilation increases with lead time. The study also demonstrates that the forecast skills of precipitation, soil moisture, and near-surface temperature and humidity are further improved when both the TRMM and SMOS data are assimilated. In particular, the combined data assimilation reduces the prediction biases and root-mean-square errors, respectively, by 57% and 6% (for precipitation); 73% and 27% (for soil moisture); 17% and 9% (for 2-m temperature); and 33% and 11% (for 2-m specific humidity).
Numerical climate and land–atmosphere models are widely used for providing land–atmospheric predictions at different time scales. These models typically capture both atmospheric thermodynamic processes and cloud microphysics to predict the dynamics of land–atmosphere water and energy fluxes. To improve the predictions of land–atmosphere state variables and parameters, a common practice is to assimilate observations from in situ gauges, radiosondes, and satellite measurements into these numerical models. Although predictions of precipitation and soil moisture are intertwined (Case et al. 2011; Jiménez et al. 2014; Feng and Houser 2015), modern weather data assimilation systems often do not include soil moisture as a control state variable (Parrish and Derber 1992; Derber and Bouttier 1999; Barker et al. 2004; Wang et al. 2013). Therefore, the relative usefulness of assimilating satellite soil moisture observations into a coupled land–atmosphere model remains largely unknown. To this end, this paper develops a framework that allows simultaneous assimilation of satellite soil moisture and precipitation data into a coupled land–atmosphere model.
Direct assimilation of precipitation has received a lot of attention in the past years. The most common technique used for assimilation of accumulated precipitation is the four-dimensional variational data assimilation (4D-Var). Examples of global weather prediction systems capable of precipitation data assimilation include the Goddard Earth Observing System (GEOS) (Hou et al. 2000a,b, 2001, 2004; Pu et al. 2002; Lin et al. 2007), the European Centre for Medium-Range Weather Forecasts (ECMWF) operational system (Lopez and Bauer 2007; Geer et al. 2008; Lopez 2011, 2013), and the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) (Lien et al. 2016; Shao et al. 2016). On a regional scale, studies have assimilated rain rates into models such as the Weather Research and Forecasting (WRF) Model (P. Kumar et al. 2014; Lin et al. 2015) and the Japan Meteorological Agency (JMA) system (Koizumi et al. 2005). These studies have shown considerable improvement in rainfall forecasts over various spatiotemporal resolutions. However, there are several remaining issues associated with assimilation of precipitation, including (i) quick decay of the influence of assimilated information, (ii) non-Gaussian model error, (iii) inconsistency between full physics parameterization and its linearized representation, and (iv) large mismatches between observations and precipitation forecasts (Errico et al. 2007; Lopez 2007; Bauer et al. 2011).
Unlike precipitation data assimilation, soil moisture land surface data assimilation has been studied mostly offline with a land surface model that is not coupled with an atmospheric model (e.g., Dunne and Entekhabi 2006; Reichle et al. 2007; Liu et al. 2011; Peters-Lidard et al. 2011; Flores et al. 2012, 2014; S. V. Kumar et al. 2014; Zhao et al. 2016). The family of soil moisture data assimilation methods uses prescribed atmospheric forcing (e.g., precipitation and downward radiation) and updates only selected land surface states, which often include soil moisture and temperature profiles. More recently, Rasmy et al. (2011, 2012) developed a WRF-based system that is capable of updating the state of soil moisture, cloud liquid water, water vapor, rain, and snow through the assimilation of radiances from the Advanced Microwave Scanning Radiometer for Earth Observing System (AMSR-E). However, the analysis procedure of the atmospheric control states does not consider any error correlation in space. Another example is the development of a land data assimilation system semicoupled to Météo France’s Aire Limitée Adaptation Dynamique Développement International (ALADIN) weather system (Mahfouf et al. 2009; Mahfouf 2010; Draper et al. 2009, 2011a,b; Schneider et al. 2014). In this type of design, the land surface data assimilation system operates independently and provides land surface state analysis information to the land surface model coupled to the atmospheric weather model at the end of the simulation intervals (e.g., every 6 h).
In our previous work, Lin et al. (2015) assimilated ground-based radar precipitation data into the WRF Model for studying precipitation downscaling, and Lin et al. (2017) developed a soil moisture data assimilation system with the WRF–Noah model and tested it with the Soil Moisture and Ocean Salinity (SMOS) soil moisture data. In this paper, we aim to better understand how combined assimilation of precipitation and soil moisture can improve forecasting of land–atmospheric exchange and to understand their relative implications, rather than making any new algorithmic innovations to address the explained technical problems. To this end, we implement a combined variational data assimilation system to assimilate both satellite precipitation and soil moisture data into the WRF–Noah coupled land–atmosphere model. The background error covariances of the atmospheric control states and land surface soil moisture states are estimated separately using the National Meteorological Center (NMC) method (Parrish and Derber 1992). The main objective of this study is to investigate the relative impact of jointly assimilating precipitation and soil moisture data on the ability to forecast the two variables as well as atmospheric variables that control the land surface energy balance. We choose to assimilate precipitation and soil moisture retrievals directly instead of using indirect overland radiance assimilation, which is not well understood over frequency channels below 50 GHz. It is important to note that direct assimilation of ground-based precipitation rain rates has been used in the ECMWF operational forecast system (Lopez 2011, 2013). We conduct several numerical experiments with the developed assimilation system to assimilate data from the Tropical Rainfall Measuring Mission (TRMM) 3B42 version 7 precipitation (Huffman et al. 2007) using the WRF 4D-Var system and the SMOS soil moisture retrievals (Kerr et al. 2010) via a WRF–Noah one-dimensional variational data assimilation (1D-Var) system. The results are validated against several reference datasets. The results show that assimilation of both TRMM and SMOS data improves forecast skills of precipitation, soil moisture, and 2-m air temperature and specific humidity. The validation of 2-day forecasts also shows that the improvement rate due to precipitation data assimilation is nearly constant in time beyond a 6-h window, while the effects of soil moisture data assimilation increase throughout the 2-day forecasts.
The rest of the paper is organized as follows. Section 2 briefly explains the datasets, model configuration, and experiment design. In section 3, we evaluate the relative effect of combined data assimilation on predictions of precipitation, soil moisture, and 2-m air temperature and specific humidity. In section 4, we discuss the overall forecast skills and present the conclusions.
2. Datasets and methodology
This study uses three datasets in the data assimilation experiments, namely, the NCEP Final Analysis (FNL) to provide the boundary and initial conditions to our WRF experiments, the TRMM 3B42 precipitation to be assimilated into the WRF Model, and the SMOS soil moisture to be assimilated into the Noah land surface model. The 1° NCEP FNL data are produced by the Global Data Assimilation System (GDAS) on a nearly real-time scale and contain variables such as surface pressure, geopotential height, temperature, soil states, humidity, and winds. In the NCEP FNL data, the atmospheric variables are available in at least 26 levels from 10 to 1000 hPa, while the soil states are available at soil layers with thicknesses of 10, 30, 60, and 100 cm from top to bottom. The TRMM 3B42 product is retrieved from multiple satellite sensors with a temporal resolution of 3 h and a spatial resolution of 0.25° × 0.25° covering 50°S to 50°N latitudes (Huffman et al. 2007). This product uses a series of microwave and infrared estimates of precipitation and removes the bias using rain gauge observations. In addition, we use level-3 SMOS soil moisture retrieval at a spatial resolution of 25 km from the Barcelona Expert Centre, which is based on a level-2 SMOS orbital soil moisture dataset (Kerr et al. 2010).
To validate the performance of the experiments, we use the reference data from the NCEP Stage IV precipitation, the Soil Climate Analysis Network (SCAN), the Climate Reference Network (CRN), and the second version of the North American Land Data Assimilation System (NLDAS-2). The NCEP Stage IV precipitation dataset, available over the contiguous United States at a spatial resolution of 4 km, is a ground-based, radar-derived product with gauge correction (Lin and Mitchell 2005). Both the SCAN and CRN networks provide calibrated soil moisture measurements at depths of 5, 10, 20, 50, and 100 cm over the United States (Schaefer et al. 2007; Diamond et al. 2013). Figure 1 shows a map of 27 selected SCAN/CRN stations within an area of interest (see Fig. 2), over which we validate the conducted experiments. The NLDAS-2 data contain some of the best available land surface observations and model outputs over the contiguous United States with a spatial resolution of 0.125° and a temporal resolution of 1 h (Xia et al. 2012).
b. Configuration of the domain and WRF physics
This study uses WRF version 3.7.1 (Skamarock et al. 2008), compiled with GNU compilers. Figure 2 shows the configuration of a single domain that covers a large part of the Great Plains and exhibits strong land–atmosphere interaction (Koster et al. 2004, 2006). The grid spacing for the domain is 36 km. The top pressure level is set at 50 hPa, with 40 layers below. The WRF Model physics used in this study include the WRF single-moment 6-class microphysics scheme (Hong and Lim 2006), the Rapid Radiative Transfer Model longwave radiation scheme (Mlawer et al. 1997), the Dudhia shortwave radiation scheme (Dudhia 1989), the revised MM5 similarity land surface scheme (Jiménez et al. 2012), the Noah land surface model (Chen and Dudhia 2001), the Yonsei University (YSU) planetary boundary layer scheme (Hong et al. 2006), and the Kain–Fritsch cumulus scheme (Kain 2004).
c. WRF 4D-Var system and precipitation data assimilation
The WRF data assimilation (WRFDA) system, developed collaboratively by several agencies and institutes, is currently maintained by the National Center for Atmospheric Research (NCAR). This study uses the 4D-Var component of the system. The WRF 4D-Var system makes use of the incremental 4D-Var formulation to solve for the analysis increments by minimizing a quadratic cost function. The incremental 4D-Var includes tangent linear and adjoint models derived from a simplified version of the full nonlinear WRF Model. More detailed descriptions of the WRF 4D-Var system can be found in Huang et al. (2009). The standard control variables of the WRF 4D-Var system are the streamfunction, unbalanced velocity potential, unbalanced temperature, pseudorelative humidity, and unbalanced surface pressure (Barker et al. 2004). We employ the NMC method (Parrish and Derber 1992) to estimate domain-dependent, static background error covariance matrices for the standard control variables, referred to as option 5 (CV5). We use the NCEP FNL data in July 2013 as the initial and boundary conditions to produce multiple 12- and 24-h WRF forecasts and compute the background error covariance using forecasts valid at the same time but initialized 12 h apart.
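The core idea of the NMC method described above can be illustrated with a minimal sketch: differences between pairs of forecasts that are valid at the same time but initialized 12 h apart serve as a proxy for background error, and their sample covariance gives a static estimate. The function name, array layout, and synthetic data below are assumptions made for illustration only; the actual WRFDA processing of control-variable statistics is considerably more elaborate and is not reproduced here.

```python
import numpy as np

def nmc_background_covariance(fcst_24h, fcst_12h):
    """Sample covariance of 24-h minus 12-h forecast differences.

    fcst_24h, fcst_12h: arrays of shape (n_pairs, n_state). Row k holds two
    forecasts valid at the same time but initialized 12 h apart.
    (Hypothetical layout chosen for this sketch.)
    """
    diffs = np.asarray(fcst_24h, dtype=float) - np.asarray(fcst_12h, dtype=float)
    diffs = diffs - diffs.mean(axis=0)           # remove the mean difference
    return diffs.T @ diffs / (diffs.shape[0] - 1)

# Synthetic forecast pairs (not real WRF output), three state elements.
rng = np.random.default_rng(0)
fcst_12h = rng.normal(size=(50, 3))
fcst_24h = fcst_12h + rng.normal(scale=0.5, size=(50, 3))
B = nmc_background_covariance(fcst_24h, fcst_12h)
```

The result is a symmetric matrix whose diagonal holds the error variance proxy for each state element.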
In this study, we assimilate 6-h TRMM 3B42 precipitation data at its native spatial resolution into the WRF 4D-Var system. We chose 6-h accumulations because it has been shown that assimilation of precipitation accumulated at a shorter time than 6 h may not necessarily lead to improved forecasts (Lopez 2011). We employ a 6-h assimilation window and assimilate 6-h precipitation accumulation valid at the end of a 6-h cycle. In addition, we choose a threshold to discard those precipitation observations that are drastically far from the forecasts, as explained in Lin et al. (2015). To find an optimal threshold, we ran small-scale assimilation experiments over the study domain during 10–15 June 2009. We used various thresholds at 4, 6, 8, and 10 mm (6 h)−1 and found that 6 mm (6 h)−1 leads to the best forecast skills in terms of the mean absolute error and correlation.
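The screening step can be sketched as a simple departure check. The sketch assumes that "drastically far" means an absolute observation-minus-forecast difference above the chosen threshold; the exact criterion is given in Lin et al. (2015) and may differ. The function name and the sample values are hypothetical.

```python
import numpy as np

def screen_precip_obs(obs, background, threshold=6.0):
    """Boolean mask of observations retained for assimilation.

    obs, background: 6-h accumulations in mm (6 h)^-1.
    threshold: maximum allowed absolute departure (assumed criterion).
    """
    obs = np.asarray(obs, dtype=float)
    background = np.asarray(background, dtype=float)
    return np.abs(obs - background) <= threshold

# Illustrative values only.
obs = np.array([0.0, 2.5, 12.0, 7.0])
background = np.array([1.0, 9.0, 3.0, 6.5])
keep = screen_precip_obs(obs, background)   # departures: 1.0, 6.5, 9.0, 0.5
```

With the 6 mm (6 h)−1 threshold, the second and third observations in this toy example would be discarded.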
d. Noah land surface model and soil moisture data assimilation
The Noah land surface model (Chen and Dudhia 2001) is used to provide land surface heat and moisture fluxes to the WRF Model. The Noah model is configured with four soil layers with thicknesses of 10, 30, 60, and 100 cm from top to bottom. Lin et al. (2017) characterized the monthly WRF–Noah soil moisture background error at a spatial resolution of 36 km using the NMC method and 8-yr WRF–Noah model simulations. The background error was used to assimilate only SMOS soil moisture data into the WRF–Noah model using a 1D-Var algorithm. Here, we use the bias-aware soil moisture background error covariance by Lin et al. (2017). This study uses a constant value of 0.04 m3 m−3 soil moisture observation error, consistent with the overall SMOS soil moisture retrieval error (Kerr et al. 2010), over the entire study domain.
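To make the role of the background and observation errors concrete, the following sketch solves a scalar 1D-Var problem in which the observation is a direct measurement of the state. In that special case, minimizing J(x) = (x − x_b)²/σ_b² + (y − x)²/σ_o² has a closed-form solution. This is a simplification for illustration, not the bias-aware system of Lin et al. (2017); the function name, the background error value, and the sample soil moisture values are assumptions, while σ_o = 0.04 m3 m−3 follows the text.

```python
def one_dvar_scalar(x_b, y, sigma_b, sigma_o=0.04):
    """Analysis that minimizes a scalar 1D-Var cost function.

    x_b: background soil moisture (m3 m-3)
    y: observed soil moisture (m3 m-3)
    sigma_b: background error standard deviation (hypothetical here)
    sigma_o: observation error standard deviation
    """
    gain = sigma_b**2 / (sigma_b**2 + sigma_o**2)
    return x_b + gain * (y - x_b)

# Equal background and observation errors give a gain of 0.5,
# so the analysis falls midway between background and observation.
x_a = one_dvar_scalar(x_b=0.30, y=0.22, sigma_b=0.04)
```

A larger background error relative to the observation error pulls the analysis closer to the observation.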
Prior to the data assimilation, the original SMOS soil moisture data are regridded onto the 36-km grids of the study domain using the nearest-neighbor interpolation method. The regridded SMOS data are considered as the measurements of soil moisture in the top 10 cm of soil, even though it is well understood that the L-band soil moisture retrievals represent approximately the soil moisture in the top 5 cm of soil. We assimilate SMOS descending observations at 0000 UTC and ascending observations at 1200 UTC. The SMOS descending overpasses over the eastern United States (i.e., the right part of a straight-line cutoff approximately from Missouri to Michigan) are not assimilated in our experiments because of relatively large time differences between them and the assimilation time (see Lin et al. 2017).
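Nearest-neighbor regridding can be sketched as follows. The sketch assumes a brute-force search with distances measured in degrees (longitude scaled by the cosine of latitude) and ignores missing retrievals; the function name and the coordinates are hypothetical, and the tool actually used for the regridding is not specified in the text.

```python
import numpy as np

def nearest_neighbor_regrid(src_lat, src_lon, src_val, dst_lat, dst_lon):
    """Assign each destination point the value of the closest source point."""
    src_lat = np.asarray(src_lat, dtype=float).ravel()
    src_lon = np.asarray(src_lon, dtype=float).ravel()
    src_val = np.asarray(src_val, dtype=float).ravel()
    dst_lat = np.asarray(dst_lat, dtype=float)
    dst_lon = np.asarray(dst_lon, dtype=float)

    flat_lat = dst_lat.ravel()
    flat_lon = dst_lon.ravel()
    # Pairwise offsets, shape (n_destination, n_source).
    dlat = flat_lat[:, None] - src_lat[None, :]
    dlon = (flat_lon[:, None] - src_lon[None, :]) * np.cos(np.deg2rad(flat_lat))[:, None]
    nearest = np.argmin(dlat**2 + dlon**2, axis=1)
    return src_val[nearest].reshape(dst_lat.shape)

# Three source retrievals and two destination grid points (illustrative).
src_lat = [35.0, 36.0, 35.0]
src_lon = [-100.0, -100.0, -99.0]
src_val = [0.10, 0.20, 0.30]
dst_lat = np.array([[35.1, 35.9]])
dst_lon = np.array([[-99.9, -100.1]])
regridded = nearest_neighbor_regrid(src_lat, src_lon, src_val, dst_lat, dst_lon)
```

For large grids a spatial index such as a k-d tree would be a more efficient design than the pairwise distance matrix used here.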
e. Modeling framework and experimental design
Figure 3 shows the schematic of the framework of the combined data assimilation system. The framework employs the WRF 4D-Var system to assimilate TRMM 3B42 precipitation data (see section 2c) and the WRF–Noah 1D-Var system to assimilate SMOS soil moisture data (see section 2d). It is noted that the initial atmospheric control states are updated only after 4D-Var assimilation of precipitation, while the initial soil moisture states are updated after 1D-Var assimilation of soil moisture. Under this two-step data assimilation framework, we first assimilate precipitation and then soil moisture data independently. We believe that, given the different time constants of atmospheric and soil processes, this two-step assimilation is a reasonable approximation to a fully integrated simultaneous and joint assimilation. We conduct four numerical experiments during 1–28 July 2013, as follows:
OPL: The open-loop run without any data assimilation.
PDA: The 4D-Var precipitation data assimilation experiment, in which 6-h TRMM precipitation rain rates are assimilated into the WRF–Noah model.
SDA: The 1D-Var soil moisture data assimilation experiment, in which instantaneous SMOS soil moisture measurements are assimilated into the WRF–Noah model.
CDA: The combined data assimilation experiment that includes both PDA and SDA.
In each 6-h analysis cycle, we obtain the first guess, or the background state, of soil moisture from the 6-h forecasts of the previous cycle (the cycling mode), while the first guess of the atmospheric control states is directly obtained from the NCEP FNL dataset (the cold-start mode). In other words, we initialize each experiment every 6 h by using the initial conditions obtained from the NCEP FNL dataset, except for the soil moisture states. Two-day forecasts are initialized every 6 h based on the estimated states without (i.e., OPL) and with data assimilation (i.e., PDA, SDA, and CDA) (Fig. 4). Output variables from the WRF–Noah model, such as precipitation, soil moisture, and 2-m air temperature and humidity, are evaluated over the study area for quantifying the impacts of data assimilation. We evaluate (i) the simulations of soil moisture from the cycling runs and (ii) the forecasts of precipitation, air temperature, and air-specific humidity with various lead times (i.e., from 6 to 48 h) from all of the 2-day forecast runs. These 2-day forecasts lead to more than 100 forecast runs over the study area (31 × 39 spatial grids) with a large enough number (>130 000) of grid-scale samples to assure the robustness of the forecast evaluation.
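The stated sample size can be checked with simple arithmetic. The sketch assumes that a 2-day forecast is launched at every 6-h cycle of the 28-day period and that each grid cell of each run contributes one sample at a given lead time; both are assumptions about how the samples are counted.

```python
n_days = 28                        # 1-28 July 2013
inits_per_day = 24 // 6            # one initialization every 6 h
n_runs = n_days * inits_per_day    # 112 forecast runs
n_cells = 31 * 39                  # 1209 grid cells in the study area
n_samples = n_runs * n_cells       # 135408 grid-scale samples
```

Under these assumptions the count is consistent with the "more than 100 forecast runs" and ">130 000" samples reported above.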
f. Bias correction of initial soil moisture conditions
We showed in Lin et al. (2017) that there is a wet model bias in soil moisture states, largely due to the existing biases in the NCEP FNL dataset, while SMOS soil moisture retrievals are relatively less biased over the study area. Therefore, rather than rescaling satellite observations onto the model climatology commonly used in land surface data assimilation (Reichle and Koster 2004), we estimate the soil moisture bias and remove it from the input NCEP FNL data. Then, we study the impact of such soil moisture bias correction on the WRF–Noah performance. The bias estimates are obtained by comparing the NCEP FNL soil moisture with measurements from the selected CRN/SCAN gauge data. Over the study area (see Fig. 2), the NCEP FNL soil moisture data have, on average, a wet bias of 0.08 and 0.01 m3 m−3 in the top 10-cm and lower 10–40-cm layers, respectively, relative to the gauge data. Here, prior to any data assimilation, we uniformly remove the soil moisture biases over the top 40-cm soil layers. When the soil moisture is unrealistically small or negative, the WRF Model automatically sets the soil moisture to a minimum value of 0.02 m3 m−3. Nonetheless, in this study, after bias correction, none of the pixels have an initial soil moisture condition less than 0.05 m3 m−3 over the study area.
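The uniform bias removal can be sketched as a constant offset per layer followed by a lower bound. The bias values and the 0.02 m3 m−3 minimum follow the text; applying the minimum inside the same function is an assumption that mimics the behavior attributed to the WRF Model, and the function name and sample values are hypothetical.

```python
import numpy as np

def remove_wet_bias(sm_top, sm_lower, bias_top=0.08, bias_lower=0.01, floor=0.02):
    """Subtract a uniform wet bias from the top two soil layers.

    sm_top: top 10-cm soil moisture (m3 m-3)
    sm_lower: 10-40-cm soil moisture (m3 m-3)
    floor: minimum allowed soil moisture (m3 m-3)
    """
    top = np.maximum(np.asarray(sm_top, dtype=float) - bias_top, floor)
    lower = np.maximum(np.asarray(sm_lower, dtype=float) - bias_lower, floor)
    return top, lower

# Illustrative initial conditions; the second top-layer value hits the floor.
top, lower = remove_wet_bias([0.30, 0.09], [0.25, 0.20])
```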
To better understand the effects of soil moisture bias correction, we compare the ground-based soil moisture time series with the outputs from each experiment (i.e., OPL, PDA, SDA, and CDA) with and without bias correction. Figures 5a and 5b show the average soil moisture values obtained from 27 selected CRN/SCAN stations and the results within the numerical grids of the model containing those stations. The difference in soil moisture is shown in Figs. 5c and 5d. In the top 10-cm soil layer, bias correction of initial soil moisture conditions apparently leads to improvement in soil moisture simulations. However, the improvement vanishes after a few days. For instance, the difference in simulated soil moisture with and without bias correction becomes less than 0.01 m3 m−3 after nearly 2 weeks and remains even smaller later for the OPL and PDA experiments. For the SDA and CDA experiments, the soil moisture difference becomes less than 0.01 m3 m−3 after approximately 6 days and is nearly negligible (<0.001 m3 m−3) after 2 weeks (see Fig. 5c). For the lower soil layers, the effects of bias correction are still notable, but to a lesser extent than the top layer. Over the lower layer, the improvement decays slowly over time for the cases of OPL and PDA, while the impact of bias correction is negligible after four days for the SDA and CDA experiments. Overall, for a short-term case, bias correction of initial soil moisture condition can be helpful. Thus, throughout the paper, we will report only the experiments with bias-corrected soil moisture initial conditions.
g. Overview of the temperature and specific humidity analysis increments in PDA
To understand the effects of the 4D-Var system on air temperature and specific humidity, we analyze their analysis increments (analysis minus background). As SDA updates only land surface soil moisture states, we narrow down the comparison only to the results of PDA. Figure 6 shows the analysis increments of temperature and specific humidity, averaged over the entire study area (see Fig. 2) and over time from 1 to 28 July 2013. It is found that, on average, PDA increases (decreases) the temperature (humidity) in the lower atmosphere below the 500-hPa level. As is evident, the analysis increments near the land surface are particularly large compared to those at upper levels in the atmosphere.
In this section, we quantify the impact of combined data assimilation on predictions of precipitation, soil moisture, and 2-m air temperature and specific humidity. We first compare the precipitation forecasts against the NCEP Stage IV dataset (section 3a) and the soil moisture simulations against soil moisture gauging observations from SCAN and CRN (section 3b). The forecasts of 2-m temperature and specific humidity are verified against the data from NLDAS-2 (section 3c). We emphasize that the precipitation data assimilation updates atmospheric but not soil moisture states in each assimilation cycle, and the soil moisture data assimilation updates only the soil moisture state. These two are connected via sensible and latent heat fluxes through which soil moisture states can directly influence the atmospheric states of the bottom atmospheric layer. We will investigate the significance of soil moisture data assimilation on the forecast skills of precipitation and near-surface variables. We use several metrics, namely the bias, root-mean-square error (RMSE), Pearson cross-correlation coefficient ρ, equitable threat score (ETS), false alarm ratio (FAR), and bias score (BS). These metrics are commonly used for quantifying the forecast quality. A detailed explanation of these metrics is included in the appendix.
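The categorical scores can be computed from a 2 × 2 contingency table. The sketch below uses the standard textbook definitions of the ETS, FAR, and bias score, which are assumed here to match the appendix (not reproduced in this section); the function name and sample values are hypothetical.

```python
import numpy as np

def categorical_scores(forecast, observed, threshold):
    """Return (ETS, FAR, BS) for events defined as values >= threshold."""
    f = np.asarray(forecast, dtype=float) >= threshold
    o = np.asarray(observed, dtype=float) >= threshold
    hits = float(np.sum(f & o))
    false_alarms = float(np.sum(f & ~o))
    misses = float(np.sum(~f & o))
    # Hits expected from a random forecast with the same event frequencies.
    hits_random = (hits + false_alarms) * (hits + misses) / f.size
    ets = (hits - hits_random) / (hits + false_alarms + misses - hits_random)
    far = false_alarms / (hits + false_alarms)
    bs = (hits + false_alarms) / (hits + misses)
    return ets, far, bs

# Illustrative 6-h accumulations in mm (6 h)^-1, event threshold of 2.
forecast = [0, 3, 5, 12, 0, 8, 1, 0]
observed = [0, 0, 6, 10, 2, 0, 0, 0]
ets, far, bs = categorical_scores(forecast, observed, threshold=2)
```

In this toy case there are 2 hits, 2 false alarms, and 1 miss, so the bias score exceeds one, which indicates that the event is forecast more often than it is observed. The sketch does not guard against thresholds for which no event is forecast or observed, where the ratios are undefined.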
a. Precipitation forecasts
We compare the 6-h precipitation analyses over the study area during 1–28 July 2013 against the NCEP Stage IV dataset (Fig. 7). The results in Fig. 7b show that PDA has a much lower false alarm ratio than OPL, except for the extreme rainfall intensity [e.g., rain rates greater than 20 mm (6 h)−1]. In addition, Fig. 7c shows that OPL overestimates precipitation with a bias score greater than one, while PDA significantly reduces the precipitation bias. Nevertheless, the bias score is less than one for rain rates greater than 10 mm (6 h)−1 in both the OPL and PDA experiments, indicating that the intensity and location of the precipitation extremes are not properly captured. Because the false alarm and bias score metrics are improved in the PDA experiment, it is not surprising that assimilation of TRMM data leads to a higher (better) equitable threat score than the open-loop experiment (Fig. 7a). However, we can see that assimilation of SMOS soil moisture data has a marginal impact on the 6-h precipitation analyses (Fig. 7). This observation suggests that a longer time scale may be required for the atmosphere to feel the changes in surface soil moisture.
To characterize the diurnal performance of the data assimilation of satellite precipitation, in Fig. 8 we group the precipitation analyses and report the scores for four different time intervals (i.e., 0000–0600, 0600–1200, 1200–1800, and 1800–2400 UTC). As the effects of soil moisture data assimilation are marginal on the precipitation analyses, we confine our consideration only to OPL and PDA. The most noticeable difference among different time intervals is that the WRF open-loop experiment produces significant overestimation during 1200–2400 UTC (local daytime). Especially from 1800 to 2400 UTC, the bias score for the OPL experiment is significantly greater than one, which is likely because the WRF Model tends to overestimate the summertime afternoon convection. This type of daytime overestimation is also reported by Lopez (2011). As is evident, assimilation of TRMM data significantly improves the bias on a diurnal scale, as well as the false alarm ratio and the equitable threat score.
We also analyze the quality of precipitation forecasts with a lead time of up to 2 days. Figure 9 shows the quantitative metrics obtained by comparing the 48-h precipitation forecasts with the reference NCEP Stage IV precipitation during 1–28 July 2013. Note that the statistics are computed for 6-h rainfall accumulated between two successive 6-h time intervals. It is evident that assimilation of TRMM data consistently reduces the bias and RMSE for forecasts beyond the 6-h assimilation window. As previously explained, PDA leads to, on average, an increased temperature and decreased humidity in the lower atmosphere, which may reduce the availability of precipitable water and therefore decrease the amount of analysis precipitation. For the forecasts with lead times between 6 and 48 h, the reduction of the bias and RMSE [see Eqs. (A4)–(A6)] is, on average, 50% and 4%, respectively. In addition, the results show that after the first 6 h, the difference between OPL and PDA in terms of correlation is very small, which suggests that the impact of precipitation data assimilation is predominately due to improvement in the intensity of precipitation (e.g., those measured in the bias and RMSE) rather than its spatial variability. In contrast to precipitation data assimilation, assimilation of SMOS soil moisture shows only a marginal effect on the quality of precipitation forecasts with a lead time of less than 18 h. However, for the precipitation forecasts with a lead time greater than 24 h, soil moisture data assimilation reduces the bias and RMSE, on average, by 26% and 2%, respectively. This time lag might be because soil moisture only directly affects the near-surface conditions, and it takes time to have large and accumulated effects throughout the atmosphere that can ultimately influence precipitation forecasts. 
With the simultaneous assimilation of TRMM and SMOS data, the improvement in precipitation forecasts is larger than that from the independent assimilation of TRMM and SMOS data. On average, over various lead times, CDA shows a reduction in the bias and RMSE in the precipitation forecasts by 57% and 6%, respectively.
b. Soil moisture simulations
We use ground-based soil moisture measurements from 16 SCAN and 11 CRN stations (see Fig. 1) as a reference to evaluate the performance of the soil moisture simulations. Throughout this subsection, we compare top 10-cm soil moisture simulations of a 36-km grid from the study area (see Fig. 2) with pixelwise collocated gauge measurements at a depth of 5 cm [section 3b(1)]. The lower 10–40-cm soil moisture simulations are also compared with the measurements at a depth of 20 cm [section 3b(2)]. Because of the inherent uncertainties associated with grid-to-point comparison, we report the averaged statistics over all of the chosen stations.
1) Top-Layer Soil Moisture
The quality metrics including bias, RMSE, and correlation of each station are computed by comparing hourly soil moisture measurements of the gauging stations (see Fig. 1) with the numerical simulations. Figure 10 shows the mean values and various percentiles of the quality metrics for each experiment. Table 1 shows the relative improvement (RI) between the open-loop experiment and the data assimilation experiments. First, without any data assimilation, OPL overestimates the top 10-cm soil moisture simulations on average by 0.066 m3 m−3 (Fig. 10a), which is partly caused by the overestimation of precipitation discussed previously (see Fig. 7c). The assimilation of TRMM data reduces the bias by 20% in surface soil moisture simulations, which can be largely attributed to the improved precipitation analyses (see section 3a). PDA also leads to a reduction in RMSE by 10% and an increase in correlation by 9% when compared to OPL (Table 1). Second, the impact of SMOS data assimilation on the surface soil moisture simulations is substantially larger than that of precipitation data assimilation. For instance, the reductions of bias and RMSE in SDA (compared to PDA) are 57% (20%) and 22% (10%), respectively. SDA also improves the temporal variation of the time series of hourly soil moisture through improving the correlation by 25%. Third, on average, CDA leads to apparent improvement in the soil moisture simulations in terms of the bias (73%), RMSE (27%), and correlation (33%). Figures 10b and 10c also show that CDA leads to the lowest RMSE and the highest correlation, which indicates usefulness of assimilating both soil moisture and precipitation.
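The relative improvement can be illustrated with a short sketch. The definition of RI used for Table 1 is not given in this section, so the sketch assumes the fractional reduction in the magnitude of an error metric relative to the open-loop run (for correlation, where larger is better, the sign would be reversed). The open-loop bias of 0.066 m3 m−3 is taken from the text; the assimilation value of 0.018 m3 m−3 is a hypothetical number chosen only for illustration.

```python
def relative_improvement(metric_opl, metric_da):
    """Percentage reduction in the magnitude of an error metric (assumed definition)."""
    return 100.0 * (abs(metric_opl) - abs(metric_da)) / abs(metric_opl)

# Open-loop bias from the text; the second value is hypothetical.
ri_bias = relative_improvement(0.066, 0.018)
```

With these inputs the sketch returns a relative improvement of roughly 73%, the same order as the bias reduction reported for CDA.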
2) Lower-Layer Soil Moisture
The OPL experiment overestimates the lower 10–40-cm layer soil moisture by about 0.01 m3 m−3 (Fig. 10d), which appears to be much smaller than the soil moisture bias in the top 10-cm layer. In general, assimilation of TRMM and SMOS data has a marginal effect on the quality of lower-layer soil moisture simulations. Figures 10d–f show that PDA slightly improves the quality of hourly soil moisture estimates in the lower soil layer in terms of bias, RMSE, and correlation. In contrast, SDA increases the correlation coefficient but leads to a negative bias and higher RMSE than OPL, and, clearly, CDA combines the outcomes of each individual assimilation of TRMM and SMOS data. However, it can be seen that none of the three assimilation scenarios (i.e., PDA, SDA, and CDA) have significant effects in the lower soil layers. This issue is likely attributed to the lack of observability for deep soil layers in the designed data assimilation system. Other factors, such as vertical soil heterogeneity, may also play a role as the WRF–Noah model assumes homogeneous textures across the soil layers. Many past studies have also reported similar observations, highlighting challenges in improving simulations of root-zone soil moisture through data assimilation (Reichle and Koster 2005; Yin et al. 2014; Blankenship et al. 2016).
c. Air temperature and specific humidity
In this subsection, we evaluate the effect of data assimilation on predictions of temperature and specific humidity at 2 m. Throughout the WRF–Noah model integration, the temperature and specific humidity at 2 m are computed for a diagnostic purpose and are mainly affected by soil moisture via the land surface sensible and latent heat fluxes and the atmospheric states of the lower planetary boundary layer. To compare the temperature and humidity forecasts with the reference dataset, the reference NLDAS-2 data were interpolated onto the 36-km grids of the study area using the area-average conservation option within the Earth System Modeling Framework.
Figure 11 shows the bias, RMSE, and correlation obtained by comparing the 2-m temperature forecasts at lead times of 6–48 h with the NLDAS-2 data. The open-loop experiment underestimates temperature, with an average bias of −2.4 K. Assimilation of TRMM data consistently reduces the bias and RMSE over different lead times. Averaged over lead times from 6 to 48 h, PDA reduces the bias and RMSE by 11% and 5%, respectively. This reduction is likely attributable to the observed positive temperature analysis increment in the lower atmosphere (see section 2g). While the impacts of PDA remain nearly constant in time, the effect of SDA on the temperature forecasts increases with lead time. Specifically, for the temperature forecasts with a lead time of 6 h, the reduction of both bias and RMSE is 3%, while for the 2-day forecasts, the reduction of bias and RMSE due to soil moisture data assimilation reaches 11% and 6%, respectively. This observation is consistent with our previous finding that the effects of soil moisture data assimilation manifest themselves on time scales beyond 18 h (see Fig. 9). Assimilating the TRMM and SMOS data together (CDA) further improves the temperature forecasts, reducing the bias by 17% and the RMSE by 9%, averaged over all lead times. Nonetheless, the effect of data assimilation on correlation is small, which suggests that the improvements lie largely in the bias rather than in the variation of the temperature signal.
Analogously, we demonstrate the impacts of data assimilation on the forecasts of near-surface specific humidity (Fig. 12). Despite the underestimation of temperature, the open-loop experiment overestimates 2-m specific humidity by approximately 1.4 g kg−1, averaged over various lead times (Fig. 12a). Assimilation of the TRMM precipitation consistently improves the 2-m specific humidity forecasts over various lead times, reducing the bias and RMSE by 17% and 6% on average, respectively. The effects of soil moisture data assimilation are relatively small in the beginning but increase slightly at larger lead times. For forecasts with a lead time of 6 h, SDA reduces the bias in specific humidity by 15%, and the reduction reaches 21% when averaged over lead times of 24–48 h. Averaged over various lead times, assimilation of SMOS data reduces the RMSE by 7%. CDA yields the largest reductions in bias and RMSE among all the assimilation scenarios, 33% and 11%, respectively. As with temperature, the effect of data assimilation on the forecast of specific humidity is marginal in terms of the correlation coefficient.
4. Discussion and conclusions
In this study, we quantified the relative impact of assimilating TRMM precipitation and SMOS soil moisture data into the coupled WRF–Noah model. The forecasts of precipitation, soil moisture, and 2-m temperature and specific humidity were evaluated against a set of reference observations. We found that the OPL experiment tends to overestimate precipitation, especially in the middle of the day, and overestimates top 10-cm soil moisture by about 0.066 m3 m−3. In addition, the OPL experiment results in biases of −2.4 K and 1.4 g kg−1 for the forecasts of 2-m temperature and specific humidity, respectively. The overestimation of the open-loop precipitation likely causes the overestimation of soil moisture, which in turn often leads to a larger evaporation rate and therefore influences the predictions of 2-m temperature and humidity.
The statistics demonstrate the effectiveness of both precipitation and soil moisture data assimilation and further highlight the advantage of the combined data assimilation. Table 2 summarizes the relative improvement in terms of the bias and RMSE for the data assimilation experiments. The results of 2-day forecasts also show that the effect of 6-h TRMM data assimilation on the forecasts remains constant beyond a 6-h lead time. However, the effect of the SMOS data assimilation on the forecasts is relatively small for lead times of less than 18 h and becomes more pronounced for longer lead times. This delayed response is physically plausible. Understanding the assimilation impacts on the forecasts of land surface and atmospheric variables at lead times greater than 2 days requires further investigation. Ultimately, the CDA includes the features of both precipitation and soil moisture data assimilation and leads to the largest improvement.
The results and conclusions are certainly subject to the quality of initial and lateral boundary conditions, which were obtained from the NCEP FNL datasets in this study. Testing the data assimilation system with other global datasets, such as the ECMWF products, is recommended. The study can also be extended to use newer precipitation and soil moisture products, such as the Integrated Multisatellite Retrievals for Global Precipitation Measurement (IMERG) and those from the Soil Moisture Active Passive (SMAP) satellite. IMERG has a finer spatial and temporal resolution than the TRMM 3B42 dataset (Hou et al. 2014), while the estimates of soil moisture from SMAP are slightly more accurate than those from SMOS (Chan et al. 2016). High-resolution forecasts and data assimilation experiments are needed to understand the impact on the spatial organization of surface heat fluxes when precipitation forecast resolutions approach convection scales. At such resolutions, the constant precipitation error used here may not be sufficient. We are currently working on a multiplicative characterization of the precipitation observation error, as well as the inclusion of other control states, such as the rain/cloud mixing ratio, in the 4D-Var analysis procedure to better address the challenges of high-resolution data assimilation. Because we estimated the background errors for the atmospheric states and soil moisture states independently, the presented framework does not allow the precipitation observations to directly influence soil moisture states, nor the soil moisture observations to directly influence atmospheric states such as air temperature and humidity. Future research should be devoted to overcoming this limitation.
This research is part of Liao-Fan Lin’s Ph.D. dissertation (Lin 2016) and is sponsored by the NASA Precipitation Measurement Missions (PMM) science program through Grants NNX13AH35G and NNX16AE36G and by the Science Utilization of the Soil Moisture Active-Passive Mission (SUSMAP) science program through Grant NNX16AM12G. The support by the K. Harrison Brown Family Chair is also gratefully acknowledged. The NCEP FNL data were obtained from the National Weather Service, U.S. Department of Commerce, and NOAA/National Centers for Environmental Prediction (2000) (freely accessible at http://rda.ucar.edu/datasets/ds083.2/). The SMOS data (freely available at http://cp34-bec.cmima.csic.es) were obtained from the SMOS Barcelona Expert Centre, a joint initiative of the Spanish Research Council (CSIC) and Technical University of Catalonia (UPC), mainly funded by the Spanish National Program on Space. The TRMM data were obtained from the NASA PMM webpage (https://pmm.nasa.gov/index.php?q=data-access/downloads/trmm). The NCEP Stage IV data were obtained from the Earth Observing Laboratory at the NCAR (freely available at http://data.eol.ucar.edu/codiac/dss/id=21.093). The SCAN data were obtained from the Natural Resources Conservation Service (freely available at http://www.wcc.nrcs.usda.gov/scan/). The CRN soil moisture data were obtained from the National Centers for Environmental Information, NOAA (freely available at https://www.ncdc.noaa.gov/crn/). The NLDAS version 2 data were obtained from the NASA Goddard Earth Sciences Data and Information Services Center (freely available at http://disc.sci.gsfc.nasa.gov/uui/datasets?keywords=NLDAS). The WRF Model was obtained from the NCAR (freely available at http://www2.mmm.ucar.edu/wrf/users/). We appreciate these agencies for providing the models, data, and technical assistance. The authors would also like to thank three anonymous reviewers and editor Ryan Torn for their helpful comments.
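This appendix briefly explains the performance metrics used in the paper, including the bias, RMSE, ρ, ETS, FAR, and BS. These metrics are commonly used for model verification in geosciences. More details can also be found in Wilks (2006). The formulas of the bias, RMSE, and ρ are defined as follows:

$$
\mathrm{bias} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - o_i\right), \tag{A1}
$$

$$
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - o_i\right)^{2}}, \tag{A2}
$$

$$
\rho = \frac{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)\left(o_i - \bar{o}\right)}{\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(o_i - \bar{o}\right)^{2}}}, \tag{A3}
$$

where $y_i$ and $o_i$ are the model outputs and the observed references, respectively, $N$ is the number of samples, and the overbar indicates the sample mean. To quantify the impact of data assimilation, the relative improvement (RI) for the bias, RMSE, and correlation are computed as follows: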
where OL and DA refer to the open-loop and data assimilation experiments, respectively.
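As an illustration of these metrics, the following minimal Python sketch computes the bias, RMSE, and correlation from paired model and reference samples using their standard definitions. The exact RI expressions are an assumption here: the sketch takes RI to be the percentage reduction in error magnitude relative to the open loop (for bias and RMSE) and the percentage gain relative to the open loop (for correlation). All function names are hypothetical.

```python
import numpy as np


def bias(model, obs):
    """Mean difference, model minus reference (positive = overestimation)."""
    model, obs = np.asarray(model, dtype=float), np.asarray(obs, dtype=float)
    return float(np.mean(model - obs))


def rmse(model, obs):
    """Root-mean-square error between model and reference."""
    model, obs = np.asarray(model, dtype=float), np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((model - obs) ** 2)))


def correlation(model, obs):
    """Pearson correlation coefficient between model and reference."""
    model, obs = np.asarray(model, dtype=float), np.asarray(obs, dtype=float)
    dm, do = model - model.mean(), obs - obs.mean()
    return float(np.sum(dm * do) / np.sqrt(np.sum(dm ** 2) * np.sum(do ** 2)))


def ri_error(open_loop, assimilated):
    """ASSUMED form: percent reduction in error magnitude (bias or RMSE)."""
    return 100.0 * (abs(open_loop) - abs(assimilated)) / abs(open_loop)


def ri_correlation(open_loop, assimilated):
    """ASSUMED form: percent gain in correlation relative to the open loop."""
    return 100.0 * (assimilated - open_loop) / abs(open_loop)
```

Under this assumed form, for example, an open-loop bias of −2.4 K that shrinks to −2.0 K after assimilation gives `ri_error(-2.4, -2.0)`, an improvement of roughly 17%.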
The ETS, FAR, and BS metrics are computed based on a classic 2-by-2 contingency table that detects whether a rain rate exceeds a certain threshold. The table includes four components: (i) the total number of correct hits $H$, (ii) the total number of false alarms $F$, (iii) the total number of misses $M$, and (iv) the total number of occasions $Z$ on which both forecasts and observations are under a specific threshold,

$$
\begin{array}{l|cc}
 & \text{Observed above threshold} & \text{Observed below threshold} \\ \hline
\text{Forecast above threshold} & H & F \\
\text{Forecast below threshold} & M & Z
\end{array} \tag{A7}
$$

with a sample size $N = H + F + M + Z$. Based on Eq. (A7), the ETS, the FAR, and the BS are defined as follows:

$$
\mathrm{ETS} = \frac{H - H_e}{H + F + M - H_e}, \tag{A8}
$$

$$
\mathrm{FAR} = \frac{F}{H + F}, \tag{A9}
$$

$$
\mathrm{BS} = \frac{H + F}{H + M}, \tag{A10}
$$

where $H_e$ is the expected number of correct hits due to random chance, given as $H_e = (H + F)(H + M)/N$. ETS measures the fraction of observations that are predicted correctly and penalizes both false alarms and misses. ETS = 1 means a perfect forecast, while ETS ≤ 0 means that the model has no forecast skill. In Eq. (A9), FAR measures the fraction of false alarms and ranges between 0 and 1, indicating the best and the worst possible scenarios, respectively. Furthermore, Eq. (A10) indicates overestimation for BS > 1 and underestimation for BS < 1.
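The categorical scores described above can be sketched in a few lines of Python. This is a minimal illustration using the standard contingency-table definitions (see Wilks 2006), not the authors' verification code; the function and variable names and the sample rain rates are hypothetical, and zero denominators (e.g., no forecast events at all) are deliberately not handled.

```python
import numpy as np


def contingency_counts(forecast, observed, threshold):
    """Count hits, false alarms, misses, and correct negatives for one threshold."""
    f = np.asarray(forecast, dtype=float) > threshold
    o = np.asarray(observed, dtype=float) > threshold
    hits = int(np.sum(f & o))
    false_alarms = int(np.sum(f & ~o))
    misses = int(np.sum(~f & o))
    correct_negatives = int(np.sum(~f & ~o))
    return hits, false_alarms, misses, correct_negatives


def categorical_scores(hits, false_alarms, misses, correct_negatives):
    """Return (ETS, FAR, BS) from the four contingency-table counts."""
    n = hits + false_alarms + misses + correct_negatives
    # Hits expected from a random forecast with the same event frequencies.
    hits_random = (hits + false_alarms) * (hits + misses) / n
    ets = (hits - hits_random) / (hits + false_alarms + misses - hits_random)
    far = false_alarms / (hits + false_alarms)
    bs = (hits + false_alarms) / (hits + misses)
    return ets, far, bs


# Hypothetical rain rates (mm/h) evaluated against a 1 mm/h threshold.
forecast = [0.0, 2.0, 3.0, 0.5, 4.0, 0.0]
observed = [0.0, 2.5, 0.2, 1.5, 5.0, 0.1]
counts = contingency_counts(forecast, observed, threshold=1.0)
ets, far, bs = categorical_scores(*counts)
```

In this toy case the forecast has two hits, one false alarm, one miss, and two correct negatives, so the number of forecast events equals the number of observed events (BS = 1) even though the forecast is far from perfect, which is why BS is read alongside ETS and FAR rather than on its own.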
This article is included in the Global Precipitation Measurement (GPM) special collection.