Strongly coupled land–atmosphere data assimilation has not yet been implemented in operational numerical weather prediction (NWP) systems. Up to now, upper-air measurements have been assimilated mainly in atmospheric analyses, while land and near-surface data have been assimilated mainly into land surface models. Thus, this study aims to explore the benefits of assimilating atmospheric and land surface observations within the framework of strongly coupled data assimilation. Specifically, we added soil moisture as a control state within the ensemble Kalman filter (EnKF)-based Gridpoint Statistical Interpolation (GSI) and conducted a series of numerical experiments through the assimilation of 2-m temperature/humidity and in situ surface soil moisture data along with conventional atmospheric measurements such as radiosondes into the Weather Research and Forecasting (WRF) Model with the Noah land surface model. Verification against in situ measurements and analyses shows that, compared to the assimilation of conventional data, adding soil moisture as a control state and assimilating 2-m humidity can bring additional benefits to analyses and forecasts. The impact of assimilating 2-m temperature (surface soil moisture) data is positive mainly on the temperature (soil moisture) analyses but on average marginal for other variables. On average, below 750 hPa, verification against the NCEP analysis indicates that the respective RMSE reduction in the forecasts of temperature and humidity is 5% and 2% for assimilating conventional data; 10% and 5% for including soil moisture as a control state; and 16% and 11% for simultaneously adding soil moisture as a control state and assimilating 2-m humidity data.
Over land, conventional data such as in situ surface weather observations and radiosondes are routinely incorporated in numerical weather prediction (NWP). Among various types of conventional observations, surface temperature and humidity are closely intertwined with land surface soil moisture, which is considered the most critical variable in land surface modeling (Santanello et al. 2019; Lin and Pu 2018). Soil moisture directly affects near-surface temperature and humidity forecasts via estimates of sensible and latent heat fluxes. However, in current NWP practices, conventional data are assimilated mainly within the atmospheric component where atmospheric variables such as temperature, humidity, winds, and pressure are used as control analysis states while land surface soil moisture is not. To further improve near-surface weather forecasts and land–atmosphere interactions within NWP, researchers and operational centers have been studying and implementing coupled data assimilation (e.g., Duerinckx et al. 2017; de Rosnay et al. 2013; Carrera et al. 2019; Munoz-Sabater et al. 2019). In almost all cases, assimilation of both land surface and atmospheric observations has been done in a framework of weakly coupled data assimilation (WCDA), with which land surface and atmospheric data analyses are performed separately. In contrast, the idea of strongly coupled data assimilation (SCDA) in NWP is relatively new and has not yet been implemented into any operational system to date. SCDA requires the estimation of cross-model error covariance and the simultaneous assimilation of both land and atmospheric measurements (Penny et al. 2017; Penny and Hamill 2017). In a previous study (Lin and Pu 2019), we found that SCDA has the potential to mitigate discrepancies between the interface of land–atmosphere data analysis and maximize the impact of land–atmosphere observations.
Direct assimilation of near-surface atmospheric observations into NWP models has demonstrated its importance for weather forecasts over land, especially in the lower troposphere (Pu et al. 2013; Zhang and Pu 2014, Pu 2017). For instance, Ha and Snyder (2014) assimilated 2-m temperature and humidity and 10-m winds in addition to conventional upper air data into the Weather Research and Forecasting (WRF) Model, demonstrating that assimilation of surface data reduced the forecast error of near-surface temperature and humidity by approximately 7% and 15%, respectively, and the error of near-surface winds by less than 5%. Ingleby (2015) studied the added value of assimilating surface observations in the Met Office global NWP system and concluded that assimilating near-surface winds has little impact, assimilating surface pressure data improves mainly forecasts of surface pressure, and simultaneously assimilating screen-level temperature and humidity measurements improves forecasts of near-surface temperature and humidity. Benjamin et al. (2010) and James and Benjamin (2017) quantified the impact of assimilating surface data in the Rapid Update Cycle (RUC) and its later version Rapid Refresh (RAP) based on the WRF Model. These two studies found that the assimilation of surface winds, pressure, temperature, and humidity data improved summertime weather forecasts in the lower troposphere (i.e., below 800 hPa) more than the assimilation of data from sounding, aircraft, global positioning system (GPS), and Geostationary Operational Environmental Satellites (GOES).
Progress has also been made in assimilating near-surface observations such as 2-m temperature and humidity and land surface soil moisture in a WCDA framework in NWP. In WCDA, soil moisture observations are assimilated into a land surface model, while the atmospheric component is affected by the land surface model mainly through land–atmosphere interactions. In the Aire Limitée Adaptation Dynamique Développement International (ALADIN) system, several studies have assimilated 2-m temperature and humidity and soil moisture retrievals using an extended Kalman filter but generally found a marginal impact on forecasts of 2-m temperature and humidity (Mahfouf 2010; Mahfouf and Bliznak 2011; Duerinckx et al. 2017). The global systems at the European Centre for Medium-Range Weather Forecasts (ECMWF) and Environment and Climate Change Canada (ECCC) are also capable of simultaneously assimilating 2-m temperature/humidity and surface soil moisture into their land surface systems. In ECMWF and ECCC, it is found that assimilation of surface soil moisture leads to much improved model skill for both surface and root-zone soil moisture compared to the assimilation of temperature and humidity (de Rosnay et al. 2013; Carrera et al. 2019; Munoz-Sabater et al. 2019); however, the improvement in the forecasts of 2-m temperature and humidity is seen mainly when assimilating near-surface temperature and humidity. Zheng et al. (2018) assimilated satellite soil moisture retrievals into the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) using an ensemble Kalman filter (EnKF) and found improved temperature and humidity forecasts over the contiguous United States up to 500 mb. Research efforts have also focused on the assimilation of surface soil moisture measurements on a regional scale (Schneider et al. 2014; Santanello et al. 2016; Seto et al. 2016; Lin et al. 2017a,b). 
Nevertheless, no progress has yet been made in implementing strongly coupled land–atmosphere data assimilation in NWP practices.
In our recent studies (Lin and Pu 2018, 2019), we highlighted the potential of strongly coupled land–atmosphere data assimilation in a variational data assimilation framework. In the first study (Lin and Pu 2018), we found that the error correlation between surface soil moisture, temperature, and humidity within the Noah land surface model coupled to WRF (WRF-Noah) was comparable, suggesting that 1) part of the error in surface soil moisture comes from atmospheric forcing, and 2) atmospheric initial conditions could potentially be corrected via soil moisture data assimilation. Then, in subsequent work (Lin and Pu 2019), we found that, over the U.S. Great Plains, the assimilation of NASA Soil Moisture Active Passive (SMAP) soil moisture retrievals into WRF-Noah under SCDA can provide additional benefits to forecasts of near-surface temperature and humidity as well as precipitation compared to that under WCDA. For example, WCDA leads to a bias reduction of 7.3% and 19.3% in 2-day forecasts of temperature and humidity, respectively, while SCDA contributes an additional bias reduction of 2.2% (temperature) and 3.3% (humidity). As a natural extension of this previous work, to further understand the relative importance of near-surface and land surface observations in NWP, we have implemented the framework of strongly coupled land–atmosphere data assimilation using the operational Gridpoint Statistical Interpolation (GSI) with EnKF and simultaneously assimilated 2-m temperature/humidity and land surface soil moisture into WRF with the Noah land surface model. The implementation includes 1) adding the soil moisture of all four Noah soil layers as a control analysis state and 2) assimilating soil moisture observations along with conventional atmospheric data simultaneously. 
In this study, we aim to understand the impact of adding soil moisture as a control analysis state and the relative effect of assimilating surface temperature, humidity, and soil moisture data on short-term weather forecasts.
The rest of the paper is organized as follows. Section 2 explains the configuration of the WRF Model and GSI-EnKF system, the implementation of strongly coupled land–atmosphere data assimilation, the experiment design, and the corresponding evaluation metrics. Section 3 describes the ensemble structure of the implemented coupled data assimilation system and the verification of the analyses and forecasts. Section 4 includes a discussion and the conclusions.
a. Configuration of WRF-Noah model and domain
This study uses WRF version 3.9.1 with the Advanced Research WRF (ARW) solver (Skamarock et al. 2008; Powers et al. 2017). The WRF Model is a mesoscale weather prediction system for both research and operations and is currently maintained by the National Center for Atmospheric Research (NCAR). We employ the physics suite that is well tested over the contiguous United States (CONUS) (see the WRF user guide at http://www2.mmm.ucar.edu/wrf/users/docs/user_guide_V3.9/contents.html). The CONUS suite includes the new Thompson microphysics scheme (Thompson et al. 2008), the Tiedtke cumulus parameterization scheme (Tiedtke 1989; Zhang et al. 2011), the new-version Rapid Radiative Transfer Model for GCMs (RRTMG) for longwave and shortwave radiation (Iacono et al. 2008), the Mellor–Yamada–Janjić planetary boundary layer scheme (Janjić 1994), the Monin–Obukhov Eta similarity surface layer scheme (Janjić 2002), and the Noah land surface model (Chen and Dudhia 2001). The Noah land surface model has four default soil layers with thicknesses of 10, 30, 60, and 100 cm from top to bottom.
A single domain of the Lambert conformal projection is configured with a resolution of 9 km and 120 × 150 grid points (Fig. 1a). We selected a domain over the central United States because this area is known for strong summertime land–atmosphere interactions (Koster et al. 2004, 2006; Dirmeyer et al. 2009). The model top pressure level is set at 50 hPa with 40 atmospheric layers below. Due to the use of sigma levels, the pressure of each atmospheric layer varies by location. The average pressure of each layer is shown in Fig. 1b.
The NCEP 0.25° Final Analysis (FNL) is used for providing boundary conditions. The NCEP FNL is produced based on GFS with the Noah land surface model, and thus the land surface boundary conditions from GFS-Noah are inherently consistent with our WRF-Noah experiments. The NCEP conventional upper air and surface weather observations that are routinely assimilated into the NCEP Global Data Assimilation System are used. From the conventional dataset, our numerical experiments assimilate radiosonde, surface data, and radar-derived winds, while radiosonde and 2-m temperature and humidity measurements are also used for verification. The International Soil Moisture Network (ISMN) collects various independently operated soil moisture networks to provide quality-controlled data in a unified format (Dorigo et al. 2013). From ISMN, the soil moisture measurements from the Soil Climate Analysis Network (SCAN; Schaefer et al. 2007) and the Climate Reference Network (CRN; Diamond et al. 2013) are assimilated and used for verification. Note that the soil moisture observations at a depth of 5 cm are used in both data assimilation and verification, as we aim to understand how well the new implementation works. The NCEP North American Mesoscale Forecast System (NAM) 12-km analysis is used for verifying the atmospheric analyses and forecasts in the WRF-Noah experiments. The sources of these datasets are stated in the acknowledgments.
c. GSI-EnKF and the implementation of soil moisture data assimilation
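This research uses the community GSI-EnKF with version 3.6 of GSI and version 1.2 of EnKF, which is currently maintained by the NCAR Developmental Testbed Center. GSI-EnKF uses a two-step configuration. The first step is to compute the innovations for the ensemble mean and each ensemble member using the observation forward operator in GSI (Shao et al. 2016), and in the second step, the analyses of each member are computed based on the EnKF (Whitaker and Hamill 2002; Whitaker et al. 2008). Following the notation of Ide et al. (1997), the optimal analysis state $\mathbf{x}^a$ in the EnKF is described as follows (Lorenc 1986; Whitaker and Hamill 2002):

$$\mathbf{x}^a = \mathbf{x}^b + \mathbf{K}\left(\mathbf{y}^o - \mathbf{H}\mathbf{x}^b\right), \tag{1}$$

$$\mathbf{K} = \mathbf{P}^b\mathbf{H}^{T}\left(\mathbf{H}\mathbf{P}^b\mathbf{H}^{T} + \mathbf{R}\right)^{-1}. \tag{2}$$

In these two equations, $\mathbf{x}^b$ denotes the background model forecast; $\mathbf{y}^o$ is the vector of observations; $\mathbf{H}$ is the linear operator that converts the model state to the observation space; and $\mathbf{P}^b$ and $\mathbf{R}$ are the background and observation error covariance matrices, respectively.

Let us describe an ensemble mean with an overbar and a deviation from the mean with a prime. In the EnKF, $\mathbf{P}^b$ is estimated using the sample covariance from an ensemble of model forecasts and is described as

$$\mathbf{P}^b = \overline{\mathbf{x}^{\prime b}\left(\mathbf{x}^{\prime b}\right)^{T}} = \frac{1}{n-1}\sum_{i=1}^{n}\mathbf{x}_i^{\prime b}\left(\mathbf{x}_i^{\prime b}\right)^{T}, \tag{3}$$

where $n$ is the ensemble size and $\mathbf{x}_i^{\prime b}$ is the deviation of member $i$ from its ensemble mean, $\overline{\mathbf{x}}^b$. The update equations for the EnKF may be further written as

$$\overline{\mathbf{x}}^a = \overline{\mathbf{x}}^b + \mathbf{K}\left(\mathbf{y}^o - \mathbf{H}\overline{\mathbf{x}}^b\right), \tag{4}$$

$$\mathbf{x}_i^{\prime a} = \mathbf{x}_i^{\prime b} - \tilde{\mathbf{K}}\mathbf{H}\mathbf{x}_i^{\prime b}, \tag{5}$$

where $\mathbf{K}$ is the traditional Kalman gain, while $\tilde{\mathbf{K}}$ is the gain used to update deviations from the ensemble mean for each ensemble member. In GSI-EnKF, $\tilde{\mathbf{K}}$ is estimated as

$$\tilde{\mathbf{K}} = \mathbf{P}^b\mathbf{H}^{T}\left[\left(\sqrt{\mathbf{H}\mathbf{P}^b\mathbf{H}^{T} + \mathbf{R}}\right)^{-1}\right]^{T}\left[\sqrt{\mathbf{H}\mathbf{P}^b\mathbf{H}^{T} + \mathbf{R}} + \sqrt{\mathbf{R}}\right]^{-1}, \tag{6}$$

which for a single observation reduces to

$$\tilde{\mathbf{K}} = \left(1 + \sqrt{\frac{\mathbf{R}}{\mathbf{H}\mathbf{P}^b\mathbf{H}^{T} + \mathbf{R}}}\right)^{-1}\mathbf{K}. \tag{7}$$

In GSI-EnKF, observations are processed serially, one at a time. Thus, $\mathbf{H}\mathbf{P}^b\mathbf{H}^{T}$ and $\mathbf{R}$ in Eq. (7) are actually scalars in the computation.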
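As a rough illustration of the serial update described above, the following Python sketch applies a square-root ensemble update for one scalar observation. It is not the GSI-EnKF code: the function name `ensrf_single_obs`, the list-based state layout, and the assumption that the observation operator simply selects one state element are hypothetical simplifications chosen for readability.

```python
import math


def ensrf_single_obs(ensemble, h_index, y_obs, obs_var, localization=None):
    """Update an ensemble with one observation of state element h_index.

    ensemble:     list of members, each a list of floats (the state vector)
    y_obs:        observed value (scalar)
    obs_var:      observation error variance R (scalar)
    localization: optional list of weights in [0, 1], one per state element
    """
    n = len(ensemble)
    m = len(ensemble[0])
    mean = [sum(member[j] for member in ensemble) / n for j in range(m)]
    pert = [[member[j] - mean[j] for j in range(m)] for member in ensemble]

    # H x' for each member; H is assumed to pick out element h_index.
    hx_pert = [p[h_index] for p in pert]
    # Sample estimates of H P^b H^T (scalar) and P^b H^T (vector).
    hpht = sum(v * v for v in hx_pert) / (n - 1)
    pht = [sum(pert[i][j] * hx_pert[i] for i in range(n)) / (n - 1)
           for j in range(m)]
    if localization is not None:
        pht = [w * c for w, c in zip(localization, pht)]

    # Kalman gain for the mean and reduced gain for the deviations.
    gain = [c / (hpht + obs_var) for c in pht]
    alpha = 1.0 / (1.0 + math.sqrt(obs_var / (hpht + obs_var)))
    gain_pert = [alpha * k for k in gain]

    innovation = y_obs - mean[h_index]
    mean_a = [mean[j] + gain[j] * innovation for j in range(m)]
    return [[mean_a[j] + pert[i][j] - gain_pert[j] * hx_pert[i]
             for j in range(m)] for i in range(n)]
```

In a coupled setting, the state vector would hold both atmospheric and soil moisture elements, so an observation of one element can update the other through the sampled cross covariance in `pht`.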
We chose an ensemble size of 40, which is quite common in regional ensemble-based studies (e.g., Schwartz et al. 2015; McNicholas and Mass 2018). We obtained the initial and lateral boundary conditions of the ensemble members by adding random perturbations drawn from a global error covariance matrix using the RANDOMCV tool in the WRF Data Assimilation System (WRFDA) (Barker et al. 2004; Torn et al. 2006). The standard deviations of the initial perturbations are approximately 1.2 (K) for potential temperature, 1.1 (g kg−1) for specific humidity, and 2.6 (m s−1) for zonal and meridional winds within the troposphere under 200 mb. To keep a reasonable ensemble spread and avoid filter divergence, a tunable inflation coefficient can be set to adjust the posterior ensemble spread to match the prior ensemble spread (relaxation-to-prior spread; Whitaker and Hamill 2012). The inflation coefficient ranges from 0 (no inflation) to 1 (i.e., both prior and posterior ensemble spread are of the same magnitude). This study set the inflation coefficient at 0.9, a magnitude similar to Schwartz et al. (2015) and Lei et al. (2016, 2018).
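The relaxation-to-prior-spread idea can be sketched for a single grid point and variable as follows. This is a minimal illustration under the assumption that the posterior deviations are rescaled by the factor 1 + α(σ_b − σ_a)/σ_a, where σ_b and σ_a are the prior and posterior ensemble standard deviations; the function names are hypothetical and do not correspond to GSI-EnKF routines.

```python
import math


def ensemble_std(values):
    """Sample standard deviation of the ensemble (n - 1 in the denominator)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def rtps_inflate(prior, posterior, alpha=0.9):
    """Relax the posterior spread toward the prior spread.

    alpha = 0 leaves the posterior unchanged; alpha = 1 restores the
    prior spread while keeping the posterior mean.
    """
    sigma_b = ensemble_std(prior)
    sigma_a = ensemble_std(posterior)
    if sigma_a == 0.0:
        return list(posterior)  # nothing to rescale
    factor = 1.0 + alpha * (sigma_b - sigma_a) / sigma_a
    mean_a = sum(posterior) / len(posterior)
    return [mean_a + factor * (x - mean_a) for x in posterior]
```

With α = 0.9, as used in this study, a posterior spread that has collapsed to half the prior spread would be restored to 95% of the prior value.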
A set of vertical and horizontal localization parameters is used to reduce spurious correlations caused by sampling errors (Gaspari and Cohn 1999). For example, several GSI-EnKF studies (Lei et al. 2016, 2018) set the parameters at around 1250 km for horizontal localization and 1.0 scale height for vertical localization, meaning there was no impact on the analysis beyond these ranges of a given observation point. However, these GSI-EnKF studies were done mainly on a global scale and with coarser model resolution than our experiments. To find a better choice of localization on a regional scale, we performed a week of GSI-EnKF experiments, cycling every 6 h during 1–7 July 2018 by assimilating conventional data, including radiosondes, radar-derived winds, and surface pressure data using the abovementioned settings. We used various localization parameters, including a combination of horizontal parameters of 300, 500, and 800 km and vertical scale height parameters of 0.2, 0.4, and 0.6, in addition to one using a horizontal parameter of 1250 km and a vertical scale height parameter of 1.0. The WRF-Noah analyses of temperature, humidity, and winds were verified against the NAM analysis. In terms of bias and root-mean-square error (RMSE), the model skill with the parameters of 1250 km and 1.0 scale height is the worst, while that with 500 km and 0.4 scale height is considered the best (not shown). Below we continue to use the localization of 500 km (horizontal) and 0.4 scale height (vertical) for the main experiments of the study (see Table 1).
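For readers unfamiliar with the localization function, the sketch below evaluates the fifth-order piecewise rational function of Gaspari and Cohn (1999) as a weight between 1 (at the observation) and 0 (at the cutoff). It assumes that the quoted localization parameter is the distance at which the weight reaches zero, which matches the "no impact beyond these ranges" description above; the function name and argument names are illustrative only.

```python
def gaspari_cohn(distance, cutoff):
    """Localization weight for a separation `distance`.

    `cutoff` is assumed to be the distance at which the weight becomes zero
    (e.g., 500 km horizontally or 0.4 scale heights vertically); both
    arguments must use the same units.
    """
    r = 2.0 * abs(distance) / cutoff  # r = 1 at half the cutoff, 2 at the cutoff
    if r <= 1.0:
        return (-0.25 * r**5 + 0.5 * r**4 + 0.625 * r**3
                - (5.0 / 3.0) * r**2 + 1.0)
    if r < 2.0:
        weight = (r**5 / 12.0 - 0.5 * r**4 + 0.625 * r**3
                  + (5.0 / 3.0) * r**2 - 5.0 * r + 4.0 - 2.0 / (3.0 * r))
        return max(0.0, weight)  # guard against round-off near the cutoff
    return 0.0
```

The weight multiplies the sampled covariance between the observation and each state element, so a shorter cutoff suppresses distant and noisier correlations at the price of discarding some real signal.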
We have two main implementations in GSI-EnKF for assimilating soil moisture observations via strongly coupled land–atmosphere data assimilation. The first is to include the soil moisture of all four Noah soil layers as a control state, together with other commonly used control analysis states including potential temperature, specific humidity, zonal and meridional winds, and surface dry air pressure. In GSI-EnKF, the pressure level of each atmospheric control analysis state is used for vertical localization. Thus, to ensure that the added soil moisture analysis state is compatible with the localization scheme, we set all four layers of soil moisture as a state at the surface by assigning surface pressure information at a given grid point. In the second implementation, we added soil moisture as a new type of conventional data and chose a standard deviation observation error of 0.04 (m3 m−3). This error value is consistent with several other studies that assimilated soil moisture data (e.g., Lin et al. 2017a,b), but it is certainly worthy of further investigation.
To understand the effect of localization and soil moisture observation error, we performed a single observation assimilation test. We chose to assimilate a single synthetic surface soil moisture observation with an ensemble-mean innovation of 0.0138 (m3 m−3) near the center of the study domain based on the ensemble first guess fields at 0000 UTC 3 July 2018 from experiment VarwSM_CONV_SM (see Table 1). The surface pressure of this assimilation site is 971 hPa. Single data assimilation tests with observation errors of 0.04 and 0.01 (m3 m−3) were conducted. Figure 2a clearly shows that there is no observational impact on the analysis beyond 500 km, given the horizontal localization used. Figure 2c shows that the surface data have no impact on the analysis above layer 12 (i.e., approximately 700 hPa, see Fig. 1b) due to the effect of vertical localization. In addition, from Figs. 2b and 2c, we can see that the observation error of 0.01 has a uniformly larger impact on the analysis (i.e., around 60% larger) than the error of 0.04.
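The vertical extent of the increment can be checked with a short calculation. The sketch below assumes that the vertical localization distance is measured in log-pressure units (one scale height corresponding to a factor of e in pressure), which is an assumption of this illustration; the function names are hypothetical.

```python
import math


def ln_p_distance(p1_hpa, p2_hpa):
    """Vertical separation of two pressure levels in scale heights (ln p units)."""
    return abs(math.log(p1_hpa / p2_hpa))


def vertical_cutoff_pressure(obs_pressure_hpa, cutoff_scale_heights):
    """Pressure above which an observation has no influence under this assumption."""
    return obs_pressure_hpa * math.exp(-cutoff_scale_heights)


# Single-observation test site: surface pressure 971 hPa, cutoff 0.4 scale heights.
cutoff_hpa = vertical_cutoff_pressure(971.0, 0.4)
separation_700 = ln_p_distance(971.0, 700.0)
```

Under this assumption the cutoff falls near 651 hPa, and the 700-hPa level lies about 0.33 scale heights above the surface, so it is still inside the localization range. This is broadly in line with the result that layers above roughly 700 hPa receive no increment.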
d. Experimental design
Table 1 lists the numerical experiments performed in this study. The experiments were carried out from 1 to 28 July 2018, and the forecasts were cycled every 6 h. In addition to the cycling runs, we launched a 3-day forecast based on the GSI-EnKF analyses valid at 0000 UTC every 24 h during the study period to investigate the impact of data assimilation on short-range weather forecasts. Each of the experiments consists of 40 ensemble members. Overall, we have one open-loop experiment (OPNL) without any data assimilation; four data assimilation experiments using the commonly used atmospheric control analysis states (VarwoSM_*); and six data assimilation experiments that have soil moisture as an additional control analysis state (VarwSM_*). VarwoSM_CONV is also referred to as CNTL, as it serves as the control experiment with conventional data assimilation and conventional control analysis states. The conventional data in the experiments include radar-derived winds, surface pressure, and radiosonde temperature, humidity, and winds. The 2-m temperature and humidity observations are assimilated in selected experiments to understand their relative impact on forecasts. Figure 3 shows the sample size of observations at various times in a day, averaged over the study period. At 0600 and 1800 UTC, there are fewer conventional data for temperature, humidity, and winds, mainly because radiosondes are launched at 0000 and 1200 UTC.
Surface soil moisture observations from ground-based stations from SCAN and CRN are assimilated. A total of 31 soil moisture stations are spread across the study domain. We manually examined the top 10-cm soil moisture OPNL forecasts at a grid cell nearest to the SCAN/CRN stations with SCAN/CRN measurements at a depth of 5 cm every 6 h during 1–28 July 2018. We removed the stations when 1) the measurements showed substantial regular diurnal variation on dry days or no temporal variability and 2) the bias between the forecasts and measurements was greater than twice the chosen observation error (i.e., 0.04 m3 m−3). We also removed one of the dual stations that was located at the same place. As a result, the VarwSM_CONV_SM and VarwSM_CONV_T2Q2SM experiments assimilate the soil moisture measurements at a depth of 5 cm from the remaining 20 stations.
This study does not apply any bias correction to the assimilated observations because they consist of conventional atmospheric data and in situ soil moisture observations, and it is common to assimilate conventional data without bias correction. Omitting bias correction is also helpful in exploring the effectiveness of the implemented soil moisture data assimilation in GSI-EnKF.
e. Evaluation method
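To quantify the performance of the experiments, we adopt the metrics of bias, RMSE, and Pearson’s correlation coefficient (ρ). Note that the metric of the correlation coefficient is computed mainly for evaluating soil moisture. To evaluate the relative impact (RI) of each data assimilation experiment relative to OPNL in terms of these metrics, we further use the following equations:

$$\mathrm{RI}_{\mathrm{bias}} = \frac{\left|\mathrm{bias}_{\mathrm{OL}}\right| - \left|\mathrm{bias}_{\mathrm{DA}}\right|}{\left|\mathrm{bias}_{\mathrm{OL}}\right|} \times 100\%, \tag{8}$$

$$\mathrm{RI}_{\mathrm{RMSE}} = \frac{\mathrm{RMSE}_{\mathrm{OL}} - \mathrm{RMSE}_{\mathrm{DA}}}{\mathrm{RMSE}_{\mathrm{OL}}} \times 100\%, \tag{9}$$

$$\mathrm{RI}_{\rho} = \frac{\rho_{\mathrm{DA}} - \rho_{\mathrm{OL}}}{1 - \rho_{\mathrm{OL}}} \times 100\%, \tag{10}$$

where DA and OL denote the data assimilation and open-loop experiments, respectively. An RI value of 0% indicates a neutral effect due to data assimilation, while a value of 100% shows the best possible scenario. Throughout our 4-week cycling experiments during 1–28 July 2018, the first week is considered spinup, and we verify mainly the analyses and forecasts initialized after 0000 UTC 8 July.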
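The evaluation metrics can be sketched in a few lines of Python. The relative-impact functions below assume a normalization in which 0% means no change relative to the open loop and 100% means a perfect score (zero bias, zero RMSE, or a correlation of 1); that form and all function names are assumptions of this illustration rather than code from the study.

```python
import math


def bias(model, reference):
    """Mean difference, model minus reference."""
    return sum(m - r for m, r in zip(model, reference)) / len(model)


def rmse(model, reference):
    """Root-mean-square error of model against reference."""
    return math.sqrt(sum((m - r) ** 2 for m, r in zip(model, reference)) / len(model))


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ri_bias(bias_da, bias_ol):
    """Relative impact (%) on the magnitude of the bias."""
    return 100.0 * (abs(bias_ol) - abs(bias_da)) / abs(bias_ol)


def ri_rmse(rmse_da, rmse_ol):
    """Relative impact (%) on RMSE."""
    return 100.0 * (rmse_ol - rmse_da) / rmse_ol


def ri_corr(rho_da, rho_ol):
    """Relative impact (%) on correlation, normalized by the gap to a perfect 1."""
    return 100.0 * (rho_da - rho_ol) / (1.0 - rho_ol)
```

Negative values from any of the three functions indicate that the data assimilation experiment performs worse than the open loop for that metric.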
a. Examination of the GSI-EnKF ensemble structure
We first examine the ensemble structure such as the ensemble mean and spread. Figure 4 shows the ensemble mean and spread of soil moisture from selected experiments for brevity. The time–space-averaged ensemble mean and spread values were computed based on the first guess of each 6-h cycle during 1–28 July 2018. The results show that the assimilation experiments generally lead to wetter soils, especially for the upper three soil layers (Figs. 4a–d). The positive soil moisture analysis increments are directly attributed to assimilated atmospheric measurements and will be discussed later (see Figs. 5 and 6). In terms of ensemble spread, adding soil moisture as a control state often results in a reduced magnitude, particularly for the upper three layers (Figs. 4e–h). We also found that the magnitude of the GSI-EnKF ensemble spread is very comparable with that in Lin et al. (2017a), which used an approach called the National Meteorological Center method (Parrish and Derber 1992) to estimate the soil moisture background error of the WRF-Noah model analytically. From Fig. 4 in Lin et al. (2017a), the space–time-averaged error standard deviation of soil moisture in the summer over the contiguous United States is approximately 0.015, 0.007, 0.003, and 0.0015 (m3 m−3) for the layers from top to bottom, quite similar to the values in Figs. 4e–h.
We further investigate the ensemble mean and spread of the atmospheric control states by using the first guesses (i.e., 6-h forecasts from the previous cycle) every 6 h during 1–28 July 2018 and averaging over the domain and cycles (Fig. 5). In terms of the ensemble mean, Figs. 5b and 5f show that the assimilation experiments lead to cooler and wetter air in the lower troposphere, particularly below approximately 700 and 850 hPa for temperature and humidity, respectively. Assimilation of 2-m humidity data (i.e., VarwSM_CONV_Q2) results in even cooler and wetter air than assimilation of other types of atmospheric data. For winds, data assimilation results in a reduced magnitude at around 750 hPa for zonal wind and below 750 hPa for meridional wind. Regarding the ensemble spread, as expected, data assimilation leads to a reduced magnitude for all atmospheric control states (Figs. 5d,h,l,p). We also found that the magnitude of the ensemble spread in Figs. 5c, 5g, 5k, and 5o is comparable with that of the background error standard deviation in Lin and Pu (2018), estimated according to the WRF-Noah simulations of multiple years using the NMC method (see Fig. 3 in Lin and Pu 2018). This suggests that the perturbation added in the boundary conditions and the GSI-EnKF setting is reasonable on a regional scale.
Cross-variable influence is another crucial component in strongly coupled data assimilation and is illustrated in Fig. 6. We computed the temporal correlation of the analysis increments from the states of soil moisture and atmospheric variables based on the results of every 6-h cycle during 1–28 July 2018. To categorize the diurnal variability, we also calculate the correlation coefficients for three time periods using samples valid at 0000 and 1800 UTC (day), 0600 and 1200 UTC (night), and combined (all). For brevity, we examine the results for the soil moisture of all four soil layers but only for the atmospheric variables of the bottom atmospheric layer from experiment VarwSM_CONV. It is noted that there is only a marginal difference in cross-variable influence when the results from VarwSM_CONV are compared to those from other experiments (not shown). According to the results at all times, Fig. 6 shows that assimilated observations tend to impact the analysis states of temperature and humidity simultaneously, but the impact is in a contrasting direction or sign. It is also clear that the deeper soil moisture state becomes less sensitive to the assimilated atmospheric observations. In addition, soil moisture tends to react simultaneously with temperature and humidity during the analysis procedure, but the direction (or sign) of the soil moisture analysis increment is on average opposite to that of the temperature analysis increment. The winds appear to have the smallest cross-variable influence via data assimilation. In terms of diurnal variability, the magnitude of the daytime correlation is clearly much larger than that of nighttime correlation. These findings are consistent with those of Lin and Pu (2018).
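The day/night split of the increment correlations can be illustrated with a short sketch for a single grid point. The record layout (UTC hour, soil moisture increment, atmospheric increment) and the function names are hypothetical; the grouping of 0000 and 1800 UTC as day and 0600 and 1200 UTC as night follows the description above.

```python
import math

DAY_HOURS = {0, 18}    # 0000 and 1800 UTC (daytime over the central United States)
NIGHT_HOURS = {6, 12}  # 0600 and 1200 UTC


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def increment_correlations(records):
    """Correlate soil moisture and atmospheric analysis increments.

    records: list of (utc_hour, soil_moisture_increment, atmos_increment),
             one entry per assimilation cycle at a given grid point.
    Returns a dict with the correlation for 'day', 'night', and 'all' cycles.
    """
    groups = {"day": [], "night": [], "all": []}
    for hour, d_sm, d_atm in records:
        groups["all"].append((d_sm, d_atm))
        if hour in DAY_HOURS:
            groups["day"].append((d_sm, d_atm))
        elif hour in NIGHT_HOURS:
            groups["night"].append((d_sm, d_atm))
    return {name: pearson([p[0] for p in pairs], [p[1] for p in pairs])
            for name, pairs in groups.items()}
```

A negative daytime value for temperature would correspond to the opposite-sign behavior described above, in which cooler analysis increments coincide with wetter soil increments.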
We further integrate the results from Figs. 5 and 6 to understand why the assimilation of atmospheric measurements leads to a wetter soil than OPNL in those experiments with soil moisture as a control state (i.e., those whose names start with VarwSM_* in Fig. 4). Taking the analysis increments of temperature as an example (Fig. 5b), the data assimilation leads to a cooler temperature than OPNL near the surface, with negative temperature analysis increments on average. Since the analysis increments of air temperature and soil moisture act in opposite directions (or signs), as shown in Fig. 6a, it is expected that data assimilation would result in a positive analysis increment in soil moisture on average and a wetter soil, compared with OPNL. Similarly, the average analysis increment of air humidity is positive near the surface, compared to OPNL (Fig. 5f), and the analysis increments of air humidity and soil moisture act in the same direction (or sign) (Fig. 6b). Thus, there is a positive soil moisture analysis increment on average via data assimilation. In combination, the analysis increments of both temperature and humidity explain the wetter soil in the assimilation experiments with soil moisture added as a control state.
Overall, the ensemble structure of the experiments with soil moisture as a control analysis state is comparable with previous research, which further demonstrates that the experiment design and implementation with GSI-EnKF are appropriate for studying strongly coupled land–atmosphere data assimilation. We note that this study does not evaluate how the ensemble structure changes with different localization and inflation parameters. In the future, one may explore these parameters for optimal assimilation of soil moisture data in EnKF.
b. Verification of the analyses
In this section, we focus on the evaluation of the analysis from each 6-h cycle. As mentioned, we consider the first week of the experiments as spinup and then evaluate the land and atmospheric analyses during 8–28 July 2018.
The analyses of the surface top 10-cm soil moisture and root-zone 10–100-cm soil moisture from each 6-h cycle are verified against the reference ISMN stations (Fig. 7). Note that the 6-h surface soil moisture estimates during 8–28 July are verified against all of the 20 ISMN stations (see Fig. 1a), while the 6-h root-zone soil moisture estimates in the same period are compared against only 12 ISMN stations due to data availability. For the surface 10-cm soil moisture, the sites for data assimilation and verification are the same so that we can better understand how well the new GSI-EnKF implementation works. Figure 7 shows the statistics by averaging the metrics of the bias, RMSE, and temporal correlation over the ISMN stations. For the surface soil moisture, there is a small bias of around 0.014 (m3 m−3) in OPNL. Both adding soil moisture as a control state and assimilating atmospheric data lead to a wetter soil. This is consistent with Figs. 4a, 5, and 6, which indicate that the data assimilation experiments tend to be cooler and wetter near the surface, with a wetter soil, when compared with OPNL. Compared to assimilating only conventional data or assimilating 2-m temperature data, assimilating 2-m humidity data causes the largest wet soil moisture analysis increments, with degraded soil moisture analyses in terms of bias, RMSE, and correlation coefficient. VarwSM_CONV_T2Q2 shows the worst statistics in surface soil moisture analyses. In contrast, the implemented assimilation of in situ surface soil moisture appears to work well, as VarwSM_CONV_SM shows the smallest bias and RMSE and the highest correlation among all the experiments, with relative improvements of 66%, 53%, and 59% in the bias, RMSE, and correlation, respectively, according to Eqs. (8)–(10).
For root-zone soil moisture, there is a large dry bias of around −0.09 (m3 m−3) in OPNL (Fig. 7b). It is found that adding soil moisture as a control state and assimilating near-surface temperature and humidity data result in a wetter soil, which is the same as for the surface soil moisture and consistent with Figs. 4b and 4c. Consequently, the VarwSM_CONV, VarwSM_CONV_T2, VarwSM_CONV_Q2, and VarwSM_CONV_T2Q2 experiments effectively reduce the dry OPNL soil moisture bias and lead to a reduced RMSE. However, these experiments also show the lowest correlation between the experiments and the reference ISMN dataset (Fig. 7f). This is mainly because the correction of the soil moisture bias by the assimilation also modifies the temporal pattern of the soil moisture time series at several stations (not shown). Future work could be devoted to more extensive experiments from months to years to understand the impact of coupled data assimilation on root-zone soil moisture. It is also noted that degraded temporal correlation in the surface and root-zone soil moisture estimates was reported in several studies that assimilated near-surface temperature and humidity into a land surface model (Carrera et al. 2019; Munoz-Sabater et al. 2019). In contrast, assimilating in situ surface soil moisture leads to an improved temporal correlation of the root-zone soil moisture analyses but degrades the analyses in the sense of bias and RMSE. The metrics therefore do not point to a single conclusion: whenever the bias/RMSE in the root-zone soil moisture analyses is reduced, the temporal correlation degrades, and vice versa. Further verification of atmospheric variables is needed to understand how variations in soil moisture analyses affect atmospheric forecasts.
We verify the atmospheric analysis from every 6-h cycle during 8–28 July 2018 against the reference NAM dataset. CNTL is first compared with OPNL over each model layer (Fig. 8). Figures 8a–d show that the assimilation of atmospheric conventional data leads to a bias reduction in the analyses of temperature and humidity over the lower troposphere but has a marginal to negative effect on the wind analyses. In terms of RMSE, the results also confirm that the benefit of assimilating atmospheric conventional data is seen mainly in the lower troposphere for temperature and humidity. However, it is found that there is a consistent RMSE reduction in the wind analyses throughout the troposphere. In general, GSI-EnKF is shown to be useful on a regional scale.
The analyses of all other experiments are further evaluated and compared with those of CNTL. Figure 9 shows the difference in the relative improvement between each assimilation experiment with respect to CNTL averaged over all atmospheric levels and the bottom 10 layers (i.e., below approximately 750 hPa). The results indicate that adding soil moisture as a control state appears to be helpful in most cases. For assimilating conventional data, VarwSM_CONV always leads to a reduction in the bias and RMSE compared to CNTL (Fig. 9). For the experiments that assimilate 2-m temperature and humidity, adding soil moisture as a control state reduces the bias and RMSE of the temperature, humidity, and wind forecasts in general, except for a marginal degradation in the bias of humidity forecasts. Furthermore, when comparing the effect of assimilating various types of surface observations, 2-m humidity data appear to be the most beneficial for improving forecasts, especially those of temperature and humidity. The assimilation of 2-m temperature data is helpful for the temperature analyses but has a marginal to negative effect on other variables. The assimilation of 2-m humidity data leads to the largest bias and RMSE reduction in the temperature and humidity analyses compared to the assimilation of 2-m temperature or surface soil moisture data. The effect of assimilating in situ soil moisture data seems to be neutral to slightly positive, which is mainly due to the limited number of observations. Nonetheless, the assimilation of these surface data has only a marginal effect on the wind analyses. Therefore, below we will focus on the evaluation of temperature and humidity analyses and forecasts.
Figure 10 shows the statistics averaged from the temperature and humidity analyses of the experiments every 12 h during 8–28 July 2018 verified against the reference radiosonde measurements. CNTL agrees better with the reference than OPNL does (Figs. 10a,c,e,g). It is also found that the effect of adding soil moisture as a control state and assimilating surface data is confined mostly within the lower troposphere (e.g., below 700 hPa). Adding soil moisture as a control state often leads to error reduction in the temperature and humidity forecasts. For example, we can see that experiment 5 (i.e., VarwSM_CONV) results in more bias and RMSE reduction than CNTL. In the experiments that assimilate 2-m temperature and humidity data, adding soil moisture is often helpful (e.g., see experiment 2 vs 6 or experiment 3 vs 7). Furthermore, at the levels of 850 and 925 hPa, assimilating 2-m temperature data reduces the RMSE in the temperature analysis but degrades the RMSE for the humidity analysis. In contrast, assimilation of 2-m humidity data reduces the RMSE for both the temperature and humidity analyses below 850 hPa. The findings from the sounding verification generally agree with those from Figs. 8 and 9. The limited effect of 2-m temperature data assimilation on other variables could be related to the observation error used in the GSI-EnKF and requires further investigation.
Figure 11 shows the statistics averaged over all surface stations for the 2-m temperature and humidity analyses every 6 h during 8–28 July 2018 verified against the METAR stations. The overall results are consistent with the verification against the NAM and radiosonde measurements shown in Figs. 8–10. OPNL shows a warm temperature bias and a dry humidity bias, and CNTL reduces not only the bias but also the RMSE. The benefit of adding soil moisture as a control state is seen mostly in the paired comparison (i.e., with and without soil moisture as a control state), including CNTL versus VarwSM_CONV, VarwoSM_CONV_T2 versus VarwSM_CONV_T2, and VarwoSM_CONV_Q2 versus VarwSM_CONV_Q2. Among various types of data, the impact of assimilating 2-m temperature is seen mainly on the temperature analysis, and that of assimilating soil moisture data is marginal on the analyses of temperature and humidity. In contrast, assimilating 2-m humidity data yields the largest reduction in the bias and RMSE of the temperature and humidity analyses. In terms of RMSE, VarwSM_CONV_Q2 shows a relative improvement of 15% and 18% in the temperature and humidity analyses, respectively, compared to OPNL.
c. Verification of the forecasts
The temperature forecasts from the experiments initialized at 0000 UTC during 8–28 July 2018 are verified against the sounding data. Note that here we consider the forecasts with a lead time of 12, 36, and 60 h to be at night because they are valid at 1200 UTC or 0600 locally, while the forecasts with a lead time of 24, 48, and 72 h are valid in the daytime. For brevity, we show the statistics averaged over the domain and the cycles (i.e., 21 cycles) with various lead times for only the selected experiments (Figs. 12a–d,f–i) and then present the statistics for all the experiments that are further averaged over the lead times of 12, 24, 36, 48, 60, and 72 h (Figs. 12e,j). Due to the marginal effect on the analyses from the assimilation of 2-m temperature data, we have skipped the forecast runs for experiments VarwoSM_CONV_T2 and VarwSM_CONV_T2.
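The grouping of lead times into night and day can be made explicit with a short Python sketch. It assumes a fixed local offset of UTC−6, which is implied by "1200 UTC or 0600 locally", and it labels valid times exactly as the text does. The function name and the choice of 8 July as the example initialization date are illustrative only.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-6 offset, implied by "1200 UTC or 0600 locally".
LOCAL_OFFSET = timedelta(hours=-6)

def classify_lead_time(init_utc, lead_hours):
    """Return (valid time in UTC, local hour, 'night' or 'day') for one lead time."""
    valid_utc = init_utc + timedelta(hours=lead_hours)
    local_hour = (valid_utc + LOCAL_OFFSET).hour
    # Follow the grouping in the text: 1200 UTC valid times count as night,
    # 0000 UTC valid times count as day.
    if valid_utc.hour == 12:
        label = "night"
    elif valid_utc.hour == 0:
        label = "day"
    else:
        raise ValueError("only 0000 and 1200 UTC valid times are grouped in the text")
    return valid_utc, local_hour, label

# One hypothetical cycle initialized at 0000 UTC.
init = datetime(2018, 7, 8, 0, tzinfo=timezone.utc)
labels = {lead: classify_lead_time(init, lead)[2] for lead in (12, 24, 36, 48, 60, 72)}
```

With this convention, lead times of 12, 36, and 60 h fall in the night group and lead times of 24, 48, and 72 h fall in the day group.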
Figure 12a shows a strong diurnal variability in OPNL in terms of bias below 850 hPa, with a cold bias during the night and a warm bias during the day. Nonetheless, the average over the various lead times below 850 hPa is positive, consistent with Fig. 10a. The largest OPNL RMSE also appears below 850 hPa (Fig. 12f). The results indicate that the impact of assimilating conventional data is confined mainly below 850 hPa and up to a lead time of 48 h (Figs. 12b,g). Compared to the improvement in the analysis (Figs. 10a,c,e,g), the improvement in the forecasts in CNTL is small. It is likely that the effect of assimilating sounding data is strong near the sites but relatively weak farther from the sites, and therefore the benefit of assimilating sounding data quickly decreases with forecast lead time due to the spread of errors. For all other assimilation experiments, we find that adding soil moisture as a control state and assimilating 2-m humidity data further extend the impact of data assimilation up to 72 h (Figs. 12c–d,h–i). However, the additional forecast improvements are seen mainly below 850 hPa and in the daytime (i.e., 0000 UTC) while the forecasts at night (i.e., 1200 UTC) can be slightly degraded. On average, the bias of the temperature forecasts is reduced by −2.3% (degradation), 3.5%, and 5.8% in experiments CNTL, VarwSM_CONV, and VarwSM_CONV_Q2, respectively, over the levels of 850 and 925 hPa and at various lead times (Fig. 12e). In terms of RMSE, the error is reduced by 1.1%, 2.0%, and 2.1% for these three experiments (Fig. 12j).
Similarly, Fig. 13 shows the verification of the humidity forecasts against the radiosonde measurements. OPNL shows a dry bias and a large RMSE, particularly below 850 hPa (Figs. 13a,f). Assimilating conventional sounding data (CNTL) results in only a marginal impact on the humidity forecasts. Adding soil moisture as a control state and assimilating 2-m humidity data are still considered beneficial and lead to forecast improvement up to 72 h. However, there is slight degradation around 700 hPa. On average over various lead times at the levels of 850 and 925 hPa, the bias of the humidity forecasts is reduced by 2%, 7%, and 17% in CNTL, VarwSM_CONV, and VarwSM_CONV_Q2, respectively (Fig. 13e), while the RMSE reduction is 0%, 1%, and 3% for these three experiments (Fig. 13j).
To explore the assimilation experiments in more detail, we verify the temperature and humidity forecasts against the reference NAM analysis at each model grid point over the bottom 15 atmospheric layers (i.e., below 550 hPa). Similar to the verification against the sounding data, we show detailed results for only the selected experiments, including OPNL, CNTL, VarwSM_CONV, and VarwSM_CONV_Q2, and the averaged statistics over various lead times for all other experiments. Figures 14 and 15 show the verification for temperature and humidity forecasts, respectively. The overall findings are quite consistent with those from Figs. 12 and 13. The impact of data assimilation is confined mostly to the lower atmosphere below layer 10, or 750 hPa. CNTL can improve forecasts up to a lead time of 24 h, while adding soil moisture as a control state and assimilating 2-m humidity can extend the benefit of data assimilation up to at least 72 h. For temperature, the relative improvement in RMSE over the bottom 10 layers and various lead times is 4.8%, 9.5%, and 15.6% for CNTL, VarwSM_CONV, and VarwSM_CONV_Q2, respectively (Fig. 14j). Similarly, the average relative improvement in RMSE for humidity is 2%, 5%, and 11% for these three experiments (Fig. 15j). To further highlight the difference between experiments with and without soil moisture as a control state, the RMSE averaged over various lead times and the bottom 10 atmospheric layers is reported in Table 2. As is evident, the inclusion of soil moisture as a control analysis state improves forecasts compared to the use of only conventional atmospheric analysis states.
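The percentages quoted above combine two steps: averaging the RMSE over lead times and the lowest model layers, and expressing the result relative to a reference run. The Python sketch below illustrates both steps under stated assumptions. The relative improvement is taken to be the percent reduction of the error with respect to the reference run, so a negative value means degradation. The exact form of Eqs. (8)–(10) is not reproduced here, and whether the averaging precedes or follows the ratio is also an assumption. The array layout (lead time by layer, with index 0 as the lowest layer) and all numbers are synthetic.

```python
import numpy as np

def relative_improvement(err_ref, err_exp):
    """Percent reduction of an error metric relative to a reference run.

    Positive values mean the experiment has the smaller error; negative
    values mean degradation. This form is assumed, not quoted from the paper.
    """
    return 100.0 * (err_ref - err_exp) / err_ref

def mean_rmse(rmse, n_bottom_layers=10):
    """Average an RMSE array of shape (lead_time, layer) over all lead
    times and the lowest `n_bottom_layers` layers (index 0 = lowest layer)."""
    rmse = np.asarray(rmse, dtype=float)
    return rmse[:, :n_bottom_layers].mean()

# Synthetic RMSE fields: 6 lead times (12-72 h) by 15 model layers.
rmse_ref = np.full((6, 15), 2.0)   # stands in for a reference run
rmse_exp = rmse_ref.copy()
rmse_exp[:, :10] = 1.8             # smaller error in the bottom 10 layers only

improvement = relative_improvement(mean_rmse(rmse_ref), mean_rmse(rmse_exp))
```

In this toy case the experiment reduces the layer- and lead-time-averaged RMSE by about 10% relative to the reference. Swapping the two arguments of `mean_rmse` in the call would give a negative value, which corresponds to a degradation.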
Figures 16 and 17 show the temperature and humidity forecasts verified against the 2-m METAR data. The overall results are consistent with the verification against the radiosonde and NAM analysis. The benefit of data assimilation in CNTL lasts up to a day, while adding soil moisture as a control state and assimilating 2-m humidity further improve the forecasts for up to at least 3 days. In terms of temperature, the RMSE is on average reduced by 3%, 6%, and 10% in CNTL, VarwSM_CONV, and VarwSM_CONV_Q2, respectively (Fig. 16f). For humidity in these three experiments, the RMSE is on average reduced by 2%, 6%, and 12% (Fig. 17f). The averaged RMSE is reported in Table 3 to underscore the benefit of adding soil moisture as a control analysis state. The table shows that adding soil moisture as a control state improves the forecasts of 2-m temperature and humidity in each paired experiment (i.e., with and without the soil moisture control state).
4. Summary and discussion
To examine the impacts of strongly coupled land–atmosphere data assimilation on short-range weather forecasting, we included soil moisture as a control analysis variable. Then, we assimilated surface soil moisture into the Gridpoint Statistical Interpolation (GSI) with an ensemble Kalman filter (EnKF). A series of experiments using the Weather Research and Forecasting (WRF) Model with the Noah land surface model was conducted by assimilating 2-m temperature and humidity and surface soil moisture data, in addition to assimilating conventional data including radiosondes, radar-derived winds, and surface pressure measurements. It was found that under the GSI-EnKF structure, adding soil moisture as a control state and assimilating 2-m humidity data are particularly helpful for improving forecasts, while the impact of assimilating 2-m temperature (surface soil moisture) data is positive mainly for temperature (soil moisture) analyses. The assimilation of conventional data results in an improved forecast of temperature and humidity up to mostly a day, while the inclusion of soil moisture as a control state and the assimilation of 2-m humidity further extend the improved forecast up to at least 3 days.
Moreover, when combining the statistics from both surface and root-zone soil moisture, there is a dry bias in the top 1-m soil moisture estimates. We found that the inclusion of soil moisture as a control state reduces the bias and RMSE in the top 1-m soil moisture estimates but degrades the correlation metric that measures soil moisture temporal variability. In contrast, the assimilation of the in situ surface soil moisture data improves mainly the temporal correlation metric. As this study finds that adding soil moisture as a control state is overall beneficial for near-surface weather forecasts, it appears that the bias and RMSE metrics are a more direct indicator than the correlation metric for evaluating the impact of soil moisture on weather forecasts. In previous coupled data assimilation studies (de Rosnay et al. 2013; Carrera et al. 2019; Munoz-Sabater et al. 2019), there is a common finding that assimilating atmospheric data tends to degrade the soil moisture modeling skill. However, such degraded skill leads to improved near-surface weather forecasts. The reason for this finding is still an open question. Nevertheless, these previous studies often emphasized the correlation metrics rather than RMSE or bias in the evaluation of soil moisture estimates. Based on the results in this paper, we suggest that the bias and RMSE metrics should be used together with the temporal correlation metric for evaluating soil moisture estimates in future coupled data assimilation research.
Finally, we note that the experiment configurations in this study are not fully compatible with the operational NWP systems. First, due to computation constraints, our study area covers the U.S. Great Plains, where land–atmosphere interactions are relatively strong. To better understand the spatial variability of the impact of strongly coupled data assimilation (SCDA) on weather forecasts, one needs to expand the experiments to a larger spatial extent, from a continent to the entire world. Second, like several studies involving algorithm implementation (e.g., Wang et al. 2008a,b; Whitaker et al. 2008; Lien et al. 2016), we assimilated only a subset of all available conventional atmospheric measurements into WRF-Noah. Given that a large volume of satellite radiances is routinely assimilated into operational systems, further studies are needed to confirm the forecast improvements due to the implementation of SCDA in the operational NWP environment. Nonetheless, the investigation and implementation in this study demonstrate the potential application of SCDA in operational NWP. Third, this study assimilated in situ surface soil moisture measurements and demonstrated their effectiveness in reducing the bias and RMSE of surface soil moisture forecasts with GSI-EnKF under the framework of SCDA. However, in situ soil moisture data are limited by sample size (even globally) and data latency in a real-time operational environment. Future work should be devoted to the assimilation of satellite-based soil moisture observations under SCDA.
The authors appreciate Dr. Jeffrey Anderson and three anonymous reviewers for their helpful comments in improving this manuscript. This study is sponsored by NOAA NWS under Award NA16NWS4680015. The support and resources from the Center for High-Performance Computing at the University of Utah are gratefully acknowledged. The ISMN soil moisture data were obtained freely at https://ismn.geo.tuwien.ac.at/ismn/. The National Centers for Environmental Prediction/National Weather Service/NOAA/U.S. Department of Commerce (2008, 2015a,b) provided the NCEP FNL dataset (https://rda.ucar.edu/datasets/ds083.3/), the conventional sounding and surface data (https://rda.ucar.edu/datasets/ds337.0/), and the NAM analysis (http://rda.ucar.edu/datasets/ds609.0/) at no cost. The WRF Model was obtained from NCAR, freely available at http://www2.mmm.ucar.edu/wrf/users/. We appreciate these agencies for providing the model, data, and technical assistance. The second author (ZP) would also like to express her appreciation to Drs. Mike Ek and Vijay Tallapragada for their collaboration and useful comments.