Uncertainties in precipitation forcing and prestorm soil moisture states represent important sources of error in streamflow predictions obtained from a hydrologic model. An earlier synthetic twin experiment has demonstrated that error in both antecedent soil moisture states and rainfall forcing can be filtered by assimilating remotely sensed surface soil moisture retrievals. This opens up the possibility of applying satellite soil moisture estimates to address both key sources of error in hydrologic model predictions. Here, in an attempt to extend the synthetic analysis into a real-data environment, two satellite-based surface soil moisture products—based on both passive and active microwave remote sensing—are assimilated using the same dual forcing/state correction approach. A bias correction scheme is implemented to remove bias in background forecasts caused by synthetic perturbations in the ensemble filtering routines, and a triple collocation–based technique is adopted to derive rescaled observations and observation error variances. Results are largely in agreement with the earlier synthetic analysis. That is, the correction of satellite-derived rainfall forcing is able to improve streamflow prediction, especially during relatively high-flow periods. In contrast, prestorm soil moisture state correction is more efficient in improving the base flow component of streamflow. When rainfall and soil moisture state corrections are combined, the RMSE of both the high- and low-flow components of streamflow can be reduced by ~40% and ~30%, respectively. However, an unresolved issue is that soil moisture data assimilation also leads to underprediction of very intense precipitation/high-flow events.
1. Introduction
Surface soil moisture plays a key role in determining the partitioning of surface-incident rainfall between infiltration and surface runoff. As a result, the characterization of prestorm soil moisture states is an important component of most hydrologic prediction systems. With the growing availability of satellite-derived surface soil moisture retrievals (Naeimi et al. 2009; Entekhabi et al. 2010; Kerr et al. 2010), the application of data assimilation techniques in hydrological modeling has become increasingly popular. In particular, the assimilation of surface soil moisture retrievals into rainfall–runoff modeling has been demonstrated to benefit streamflow predictions via the improved constraint of antecedent soil moisture states that determine the infiltration capacity of the land surface (e.g., Aubert et al. 2003; Crow and Ryu 2009, hereafter CR09; Draper et al. 2011; Lee et al. 2011; Brocca et al. 2012; Li et al. 2013). Practical challenges in application, such as insufficient model coupling strength (Chen et al. 2011) as well as biased precipitation data (Draper et al. 2011), have also been discussed.
In addition to prestorm soil moisture level, precipitation uncertainty is another important source of error in streamflow predictions. Because of their global coverage at high temporal frequency (as high as 3 hourly), satellite precipitation measurements have become increasingly utilized in forcing hydrologic models (Hong et al. 2007; Wu et al. 2012) and are often the only source of precipitation data available for ungauged basins. However, real-time products with relatively low latency (less than daily) typically contain significant bias and errors that can propagate into other model variables during operational streamflow forecasting. Left uncorrected, such errors can significantly impact subsequent hydrologic model predictions (Harris et al. 2007; Li et al. 2009; Pan et al. 2010; Gebregiorgis et al. 2012). Recent work has also clarified that remotely sensed soil moisture has a potential role in reducing uncertainty in precipitation accumulation products (Crow et al. 2009; Pellarin et al. 2008; Brocca et al. 2013). In particular, Crow et al. (2011) introduced the Soil Moisture Analysis Rainfall Tool (SMART) to correct short-term (i.e., from daily to pentad scale) precipitation accumulation estimates using satellite-based surface soil moisture retrievals. As a result, remotely sensed surface soil moisture retrievals have the potential to address two separate sources of error in rainfall–runoff modeling: 1) uncertainty in prestorm soil moisture states and 2) error in within-storm precipitation accumulation totals. Recognizing this potential, CR09 presented a set of synthetic twin data assimilation experiments that verified that a dual-correction strategy (tasked with correcting both antecedent soil moisture states and incident rainfall forcing) is superior to competing techniques that apply data assimilation to correct either precipitation forcing or prestorm model states in isolation.
While encouraging, results in CR09 are based solely on synthetic experiments that do not capture the entire range of challenges involved in applying actual satellite-based surface soil moisture retrievals to hydrologic prediction applications. As a result, this paper attempts to build upon CR09 by testing its dual-correction soil moisture data assimilation strategy using real satellite data products. With a combination of passive microwave data from the Soil Moisture and Ocean Salinity (SMOS) satellite and active microwave observations from the Advanced Scatterometer (ASCAT), a near-daily soil moisture retrieval frequency can be achieved. Through enhancement of both precipitation and antecedent soil moisture estimates, this dual-assimilation system aims to simultaneously correct random errors in both soil moisture states and rainfall inputs (via SMART) and therefore maximize the utility of existing remotely sensed soil moisture data products for hydrologic prediction applications in ground data–poor regions. In addition, the combined use of active and passive surface soil moisture retrieval provides an important analogy for the eventual availability of simultaneous active–passive soil moisture products from the National Aeronautics and Space Administration (NASA) Soil Moisture Active Passive (SMAP) mission (Entekhabi et al. 2010).
2. Study site and data
The analysis is focused on 13 basins located in the central United States (Table 1, Fig. 1) that have been screened for adequate rain gauge coverage and low levels of anthropogenic stream regulation (Schaake et al. 2001). The chosen basins range from small to intermediate in size (700–10 000 km2) and demonstrate a general westward trend of decreasing annual precipitation and vegetation density. Climatic normal potential evapotranspiration (PET) data as well as parameter values for the Sacramento Soil Moisture Accounting (SAC-SMA, hereafter just SAC) model within each basin are obtained from the Model Parameter Estimation Experiment (MOPEX) datasets (Schaake et al. 2001). Daily mean discharge data measured at basin outlets are obtained from the U.S. Geological Survey (USGS).
The ASCAT on board the Meteorological Operational (MetOp) satellite measures radar backscatter in C band (5.255 GHz) with approximately 25-km resolution. MetOp was launched on 19 October 2006 and is in a sun-synchronous orbit, with equatorial crossing times of approximately 0930 (descending overpass) and 2130 (ascending overpass) local solar time. ASCAT surface soil moisture retrievals on a 0.25° spatial grid were obtained from Naeimi et al. (2009) by applying the Water Retrieval Package 5 (WARP5) algorithm. For details regarding retrieval of ASCAT surface soil moisture data, refer to Wagner et al. (1999) and Naeimi et al. (2009). Retrievals of ascending and descending overpasses are combined into a daily dataset. In cases where both ascending and descending retrievals for a grid cell are available on a single day, the two retrievals are averaged to obtain a single daily value.
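The daily merging rule described above (average the two overpasses when both are available, otherwise keep whichever exists) can be sketched as follows. The function name `merge_overpasses` and the use of `None` to mark a missing retrieval are illustrative assumptions, not part of the WARP5 processing chain.

```python
from typing import Optional


def merge_overpasses(ascending: Optional[float],
                     descending: Optional[float]) -> Optional[float]:
    """Combine one grid cell's ascending and descending retrievals for one day.

    None marks a missing retrieval (an illustrative convention).
    """
    available = [v for v in (ascending, descending) if v is not None]
    if not available:
        return None  # no overpass produced a retrieval on this day
    # A single value is returned unchanged; two values are averaged.
    return sum(available) / len(available)
```

Applied cell by cell and day by day, this yields the single daily dataset used in the remainder of the analysis.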
The SMOS satellite, designated for soil moisture observation, was launched on 2 November 2009 by the European Space Agency (ESA). The Microwave Imaging Radiometer using Aperture Synthesis (MIRAS) on board SMOS senses L-band microwave emission (1.400–1.427 GHz) that penetrates the top 5 cm of the soil column. Data used in this study are the SMOS operational Soil Moisture User Data Product 2, version 5.01 (SMUDP2; http://earth.esa.int/smos), retrieved using the Dobson dielectric model and processed at the National Oceanic and Atmospheric Administration (NOAA) National Environmental Satellite, Data, and Information Service (NESDIS) into a daily 0.25° gridded product as part of the Soil Moisture Operational Product System (SMOPS; a full description of SMOPS is available at www.ospo.noaa.gov/Products/land/smops/). As with ASCAT, soil moisture retrievals obtained from ascending and descending SMOS overpasses (at 0600 and 1800 local solar time) are combined into a single daily dataset. When merged in this way, the combination of ASCAT and SMOS soil moisture datasets provides near-daily coverage of the study sites.
SMART has generally been applied to the correction of real-time precipitation products derived exclusively from satellite-based observations (Crow et al. 2009, 2011). Therefore, results presented here are meant to reflect data availability restrictions commonly faced by hydrologic prediction systems operating in ground data–poor regions. Accordingly, the precipitation dataset used to force the SAC model, and subject to correction via SMART, is the daily accumulation from Tropical Rainfall Measuring Mission (TRMM) Multi-satellite Precipitation Analysis, version 7 (TMPA), 3B40RT (Ostrenga et al. 2013). This is a real-time, microwave-only product generated without any ground-based rain gauge observations. The NOAA Climate Prediction Center’s (CPC) 0.25°-grid unified gauge-based retrospective analysis of daily precipitation (Xie et al. 2010) is used as an independent source of verifying data for satellite-derived (and/or SMART corrected) rainfall accumulation estimates.
The products described above are spatially resampled to provide watershed-scale averages for all basins shown in Fig. 1. Daily precipitation is defined as the total depth of precipitation between 1200 and 1200 UTC. Accordingly, observed daily mean streamflow is obtained from hourly discharge data between 1200 and 1200 UTC measured at USGS gauges located at watershed outlets and converted to water-equivalent depth. However, the defined day for soil moisture products is shifted 12 h forward (i.e., from 0000 to 2400 UTC) in order to capture the delay between possible rainfall and resulting soil moisture.
3. Modeling and data assimilation methodology
a. Hydrologic modeling
All hydrologic modeling was based on application of the SAC hydrologic model to the 13 basins described in section 2 and Table 1. The SAC model has been used extensively for operational streamflow forecasting within medium-sized (~1000 km2) river basins in the United States (Burnash et al. 1973; Georgakakos 2006). Soil moisture accounting in the model is based on the estimation of six interdependent soil water states: upper-zone free water content (UZFWC), upper-zone tension water content (UZTWC), lower-zone tension water content (LZTWC), lower-zone free primary water content (LZFPC), lower-zone free supplemental water content (LZFSC), and basin-saturated fraction [additional impervious area (ADIMP)]. The movement of water between these states is calculated using the SAC model parameterization described in Sorooshian et al. (1993). Forced by PET and precipitation, the SAC model produces four separate runoff processes: surface infiltration excess runoff (SER) occurring when rainfall accumulation within a given time step is large enough to fill available upper-zone tension and free water storage capacity; surface saturation runoff (SSR) occurring when rainfall falls on saturated portions of the basin (as defined by ADIMP); shallow subsurface interflow (SIF) expressed as a direct function of UZFWC; and deep base flow (BF) expressed as a direct function of LZFSC and LZFPC. The SAC model was run on a daily time step for a 2.5-yr period between 12 January 2010 and 31 May 2012.
Table 2 provides a list of key SAC model parameters calibrated for the period of simulation (from 12 January 2010 to 31 May 2012). Calibrated parameter values for each of the 13 study basins shown in Fig. 1 were obtained via optimized calibration against USGS streamflow observations using the shuffled complex evolution method developed at the University of Arizona (SCE-UA) algorithm (Duan et al. 1994). Calibrated parameter sets in all 13 basins met the baseline criterion of producing a Nash–Sutcliffe efficiency of 0.50 (or higher) in daily streamflow when SAC is forced using the NOAA CPC rainfall product. Based on these calibrated parameters, Fig. 2 shows comparisons between observed streamflow and SAC predictions (forced by CPC and TMPA rainfall, respectively) for three representative basins.
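For reference, the Nash–Sutcliffe efficiency used as the calibration criterion above can be computed as in the following minimal sketch. The function name and the plain-list inputs are illustrative choices; this is the standard definition of the metric, not code from the SCE-UA calibration itself.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / (variance of observations * N).

    A value of 1 is a perfect fit; 0 means the simulation is no better
    than predicting the observed mean on every day.
    """
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0.0:
        raise ValueError("observed series has zero variance")
    return 1.0 - squared_error / spread
```

Under the criterion stated above, a calibrated parameter set would be accepted when this function returns 0.50 or more for daily streamflow.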
b. Data assimilation
Two separate data assimilation approaches are applied to integrate remotely sensed soil moisture into SAC streamflow modeling. The first approach, described in section 3b(1), is via the SMART algorithm (Crow et al. 2011), which aims to correct precipitation accumulation data before it is used to force the SAC model. The second approach is correcting the SAC soil moisture states using either an ensemble Kalman filter (EnKF) or smoother (EnKS) embedded in the SAC modeling process, described in section 3b(2). Results will be presented for the application of each approach in isolation and both modules combined (to correct both antecedent soil moisture and incident rainfall accumulations). A triple collocation (TC) procedure used to derive observation error variances and to scale observations to model space is described in section 3b(3), and the application of a bias correction technique for the EnKF/EnKS analysis is described in section 3b(4).
1) Rainfall correction using the SMART
The SMART algorithm was first described in Crow et al. (2009) and later updated in Crow et al. (2011) and Chen et al. (2012). Its basic function is the correction of satellite-based rainfall products using filtering innovations calculated during the assimilation of remotely sensed surface soil moisture products into a simple water balance model. Past work with SMART has generally focused on the assimilation of a single soil moisture retrieval product, which restricted it to the correction of precipitation accumulations within a >3-day accumulation window (Crow et al. 2009). However, here we will apply SMART to simultaneously integrate both (passive) SMOS and (active) ASCAT retrievals in an attempt to correct daily rainfall inputs required by a hydrologic model.
At its core, SMART is based on the application of an Antecedent Precipitation Index (API) model:
$$\mathrm{API}_i = \gamma_i\,\mathrm{API}_{i-1} + P'_i, \tag{1}$$
where P′ is a (potentially erroneous) rainfall accumulation estimate obtained on day i and $\gamma_i$ is a dimensionless API loss coefficient that varies according to day of year D:
$$\gamma_i = 0.85 + 0.10\cos\left(\frac{2\pi D_i}{365}\right). \tag{2}$$
The API model described by Eqs. (1) and (2) represents a simple linear modeling strategy that fails to capture both surface saturation and the impact of radiation and meteorology on the loss coefficient γ. Nevertheless, Crow et al. (2011) demonstrate that, for the specific case of applying SMART at relatively coarse spatial scales, it performs as well as more complex modeling approaches. In particular, Crow et al. (2011) demonstrated that attempts to modify Eq. (1) to account for nonlinear surface saturation generally led to degraded SMART results.
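To make the recursion concrete, the following minimal sketch steps an API model forward in time. It assumes the loss coefficient follows a cosine seasonal cycle in day of year; the default coefficients `alpha` and `beta`, the 365-day year, and the function names are assumptions made for illustration and should be checked against Eq. (2) before reuse.

```python
import math


def loss_coefficient(day_of_year, alpha=0.85, beta=0.10):
    """Seasonally varying API loss coefficient (assumed cosine form)."""
    return alpha + beta * math.cos(2.0 * math.pi * day_of_year / 365.0)


def run_api(rainfall, start_day_of_year=1, api0=0.0):
    """Integrate API_i = gamma_i * API_(i-1) + P'_i over a daily rainfall series."""
    api = api0
    series = []
    for k, rain in enumerate(rainfall):
        # Wrap the day of year so multi-year series keep cycling through 1..365.
        day = (start_day_of_year - 1 + k) % 365 + 1
        api = loss_coefficient(day) * api + rain
        series.append(api)
    return series
```

Because the model is linear in both API and rainfall, an error in P′ maps directly onto an error in API, which is the property SMART relies on when it converts soil moisture increments back into rainfall corrections.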
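When available, remotely sensed surface soil moisture retrieval(s) are used to update Eq. (1) via a Kalman filter:
$$\mathrm{API}_i^{+} = \mathrm{API}_i^{-} + \mathbf{K}_i\left(\boldsymbol{\Theta}_i - \mathbf{H}_i\,\mathrm{API}_i^{-}\right), \tag{3}$$
where the plus and minus superscripts denote values after and before Kalman filter updating on day i. Since SMOS and ASCAT data are simultaneously assimilated, the observation vector Θi is expressed as
$$\boldsymbol{\Theta}_i = \begin{bmatrix}\theta_i^{\mathrm{SMOS}} \\ \theta_i^{\mathrm{ASCAT}}\end{bmatrix}, \tag{4}$$
where $\theta_i^{\mathrm{SMOS}}$ and $\theta_i^{\mathrm{ASCAT}}$ are SMOS and ASCAT surface soil moisture retrievals that have been transformed from raw surface soil moisture retrievals to the reference space of API via a TC-based preprocessing step [see section 3b(3) for details]. Variable $\mathbf{H}_i$ is a dynamic observation operator defined as
$$\mathbf{H}_i = \begin{bmatrix}h_i^{\mathrm{SMOS}} \\ h_i^{\mathrm{ASCAT}}\end{bmatrix}, \tag{5}$$
where $h_i^{\mathrm{SMOS}}$ and $h_i^{\mathrm{ASCAT}}$ are logical numbers that have the value of one when the corresponding observation is available on day i and zero otherwise.
The Kalman gain K in Eq. (3) is defined as
$$\mathbf{K}_i = T_i^{-}\mathbf{H}_i^{\mathrm{T}}\left(\mathbf{H}_i T_i^{-}\mathbf{H}_i^{\mathrm{T}} + \mathbf{R}\right)^{-1}, \tag{6}$$
where T is the scalar error variance for API forecasts, superscript T indicates transpose, and $\mathbf{R}$ is the soil moisture observation error covariance matrix defined as
$$\mathbf{R} = \begin{bmatrix}\sigma^2_{\mathrm{SMOS}} & 0 \\ 0 & \sigma^2_{\mathrm{ASCAT}}\end{bmatrix}, \tag{7}$$
with $\sigma^2_{\mathrm{SMOS}}$ and $\sigma^2_{\mathrm{ASCAT}}$ denoting the SMOS and ASCAT retrieval error variances. Note that $\mathbf{R}$ is transformed to the error space of API via a TC preprocessing step [see section 3b(3)].
At times when API is updated via Eq. (3), T is also updated following
$$T_i^{+} = \left(1 - \mathbf{K}_i\mathbf{H}_i\right)T_i^{-}. \tag{8}$$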
Between observations, API is forecasted forward in time using Eq. (1) and T is estimated as
where Z is the model background uncertainty added during every daily forecast time step and ξ represents a conditioning of Z by assuming greater uncertainty in API forecasts when P′ > 0 (Crow et al. 2011). Following Crow et al. (2011), the values of ξ and Z are fixed at 5.0 (dimensionless) and 3 mm2, respectively.
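The update step of the filter can be illustrated with a short sketch. Because the API state is a scalar and the SMOS and ASCAT errors are treated as independent, the two retrievals are processed here one at a time, which for a linear Kalman filter gives the same result as a joint two-observation update. The function name, the `(value, error_variance)` tuple layout, and the use of `None` for a missing retrieval are illustrative assumptions; the forecast step for T is deliberately left out.

```python
def kalman_update(api_minus, t_minus, observations):
    """Update a scalar API forecast with zero, one, or two rescaled retrievals.

    observations: iterable of (value, error_variance) pairs, where value is
    None on days without a retrieval from that sensor.
    Returns the analysis API, the analysis error variance, and the analysis
    increment (analysis minus forecast).
    """
    api, t = api_minus, t_minus
    for value, error_variance in observations:
        if value is None:
            continue  # availability flag is zero: this sensor is skipped
        gain = t / (t + error_variance)
        api += gain * (value - api)
        t *= 1.0 - gain
    return api, t, api - api_minus
```

For example, `kalman_update(10.0, 4.0, [(12.0, 4.0), (None, 1.0)])` weights the forecast and the single available retrieval equally and returns an analysis of 11.0 with an increment of 1.0.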
Rainfall correction operates on the basis of analysis increments δ defined by
$$\delta_i = \mathrm{API}_i^{+} - \mathrm{API}_i^{-} = \mathbf{K}_i\left(\boldsymbol{\Theta}_i - \mathbf{H}_i\,\mathrm{API}_i^{-}\right). \tag{10}$$
As δ should correlate with recent random errors in P′, it can be leveraged to linearly correct multiday rainfall accumulations via
$$[P]_l = [P']_l + \lambda[\delta]_l, \tag{11}$$
where $[P]_l$ is the corrected rainfall accumulation, λ is a scaling factor, and l is an index of nonoverlapping accumulation windows. The square brackets represent accumulation over a multiday window. Note that the corrected rainfall is not dynamically fed back into the API model. In practice, the length of the correction window is allowed to vary according to the availability of soil moisture observations (Chen et al. 2012). In other words, each correction window is defined so that a single satellite observation is available on the last day of the window. Therefore, this approach ensures that soil moisture retrievals are intuitively leveraged to improve only past, instead of future, rainfall accumulations. The corrected rainfall accumulations are then redistributed to daily values in proportion to the original daily rainfall within each correction window. When $[P']_l$ is zero and $\lambda[\delta]_l$ is positive, the addition is applied to the last day of the correction window.
The constant λ in Eq. (11) can be calibrated at each watershed so that the root-mean-square error (RMSE) of the corrected rainfall against a benchmark precipitation product (e.g., CPC) is minimized. However, temporally and spatially fixed λ values have also been shown to yield acceptable results (Crow et al. 2009, 2011; Chen et al. 2012). Here, following earlier results presented in Crow et al. (2009), a fixed value of λ = 0.50 is applied. Negative values of corrected rainfall are reset to zero. Since such resetting may introduce a positive overall bias in the corrected rainfall, the entire corrected time series is multiplicatively rescaled to match the long-term mean of P′.
SMART has been found to overestimate the occurrence of low-intensity rainfall events when positive noise in soil moisture due to retrieval error is misidentified as rainfall signals (Crow et al. 2009, 2011). Here a modification is proposed to reduce this tendency: when the uncorrected window accumulation [P′] is zero, rainfall correction is applied to the last day of the window only when the proposed addition λ[δ] is greater than or equal to 2 mm. This 2-mm threshold represents an arbitrary rainfall intensity over which the signal from soil moisture is considered an indication of real rainfall, with full awareness of a trade-off that real rainfall signals under 2 mm may be discarded. Likewise, saturation of near-surface soil moisture can lead to conditions where soil moisture dynamics do not fully capture accumulation from large rainfall events. This limitation is discussed at length later in the paper (see section 5).
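The window-level correction rules described in this subsection can be collected into a short sketch. It covers the linear correction with λ, the reset of negative accumulations to zero, the proportional redistribution to daily values, the 2-mm rule for dry windows, and the final multiplicative rescaling. Function and argument names are hypothetical, and the handling of edge cases (for example, an all-zero corrected series) is an assumption rather than something specified in the text.

```python
def correct_window(rain, increments, lam=0.5, min_new_rain=2.0):
    """Correct one accumulation window of daily rainfall (mm).

    rain: original daily rainfall P' within the window.
    increments: daily analysis increments (delta) within the window.
    """
    total = sum(rain)
    addition = lam * sum(increments)
    if total == 0.0:
        # Dry window: add rain on the last day only if the signal is >= 2 mm.
        corrected = [0.0] * len(rain)
        if addition >= min_new_rain:
            corrected[-1] = addition
        return corrected
    # Negative corrected accumulations are reset to zero.
    corrected_total = max(total + addition, 0.0)
    # Redistribute in proportion to the original daily rainfall.
    return [p * corrected_total / total for p in rain]


def rescale_to_original_mean(corrected, original):
    """Multiplicatively rescale the corrected series to the mean of P'."""
    total_corrected = sum(corrected)
    if total_corrected == 0.0:
        return list(corrected)  # nothing to rescale (assumed edge case)
    factor = sum(original) / total_corrected
    return [p * factor for p in corrected]
```

For instance, `correct_window([2.0, 6.0], [1.0, 3.0])` raises the 8-mm window total to 10 mm and returns `[2.5, 7.5]`, preserving the original 1:3 split between the two days.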
2) State correction using ensemble Kalman filter/smoother
A 35-member EnKF is employed to assimilate satellite-based surface soil moisture retrievals into the SAC model. The choice of 35 as ensemble size is arbitrary; however, larger ensemble sizes failed to produce significant changes to presented results. The EnKF is based on the generation and propagation of a Monte Carlo ensemble of model replicates to provide the error covariance information required by the Kalman filter update equation. An ensemble of model realizations is generated by perturbing key SAC model forcings (i.e., PET and P) and by the application of direct perturbations to the six SAC model soil moisture states. Additive PET perturbations are randomly sampled from a Gaussian distribution with a mean of 0 mm and a standard deviation of 1.0 mm (negative values are reset to zero). Precipitation is perturbed by multiplying it by a random factor sampled from a mean-one, lognormal distribution with a dimensionless standard deviation of 1.0. Daily additive noise terms applied to individual soil moisture states are assumed to be serially uncorrelated and mutually independent random variables sampled from a mean-zero Gaussian distribution with a standard deviation equal to 2.0% of the total capacity of each state. Note that the integration of these perturbations in time leads to the generation of realistic auto- and cross-correlated error in SAC soil moisture states.
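A minimal sketch of how such perturbations might be drawn is given below. Two readings are assumed here and are not confirmed by the text: that the stated standard deviation of 1.0 applies to the multiplicative factor itself (not to its logarithm), and that the reset to zero applies to the perturbed PET value. Function names are hypothetical.

```python
import math
import random


def lognormal_params(mean, std):
    """Parameters (mu, sigma) of the underlying normal for a lognormal
    variable with the given mean and standard deviation."""
    sigma2 = math.log(1.0 + (std / mean) ** 2)
    return math.log(mean) - 0.5 * sigma2, math.sqrt(sigma2)


def perturb_forcing(precip, pet, rng):
    """Return one perturbed (precipitation, PET) pair in mm."""
    mu, sigma = lognormal_params(1.0, 1.0)       # mean-one multiplicative factor
    perturbed_precip = precip * rng.lognormvariate(mu, sigma)
    perturbed_pet = max(pet + rng.gauss(0.0, 1.0), 0.0)  # assumed reset rule
    return perturbed_precip, perturbed_pet


def perturb_state(value, capacity, rng):
    """Add mean-zero Gaussian noise with a std of 2% of the state capacity.
    Physical bounds are not enforced in this sketch."""
    return value + rng.gauss(0.0, 0.02 * capacity)
```

Calling `perturb_forcing` and `perturb_state` once per member and per day, with a shared `random.Random` instance, produces serially uncorrelated perturbations of the kind described above.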
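At time i, the vector of SAC model states associated with the jth Monte Carlo replicate is given as
$$\mathbf{S}_i^{j} = \left[\mathrm{UZFWC}_i^{j},\ \mathrm{UZTWC}_i^{j},\ \mathrm{LZTWC}_i^{j},\ \mathrm{LZFPC}_i^{j},\ \mathrm{LZFSC}_i^{j},\ \mathrm{ADIMP}_i^{j}\right]^{\mathrm{T}}. \tag{12}$$
This vector can be transformed into an estimate of volumetric surface soil moisture using a linear observation operator:
$$\mathbf{H}_i = \varphi\begin{bmatrix}h_i^{\mathrm{SMOS}} & h_i^{\mathrm{SMOS}} & 0 & 0 & 0 & 0 \\ h_i^{\mathrm{ASCAT}} & h_i^{\mathrm{ASCAT}} & 0 & 0 & 0 & 0\end{bmatrix}, \tag{13}$$
where $h_i^{\mathrm{SMOS}}$ and $h_i^{\mathrm{ASCAT}}$ are logical numbers that equal one when the corresponding observation is available on day i and zero otherwise. The constant φ is used to convert SAC upper-zone soil water content to volumetric soil moisture (m3 m−3):
$$\varphi = \frac{\rho}{\mathrm{UZFWC}_{\max} + \mathrm{UZTWC}_{\max}}, \tag{14}$$
where ρ is soil porosity, UZFWCmax (mm) is the maximum capacity of free water in the surface zone, and UZTWCmax (mm) is the maximum capacity of tension water in the surface zone. When a remotely sensed surface soil moisture observation is available, replicates of S are updated via
$$\mathbf{S}_i^{j+} = \mathbf{S}_i^{j-} + \mathbf{K}_i\left(\boldsymbol{\Theta}_i + \boldsymbol{\upsilon}_i^{j} - \mathbf{H}_i\mathbf{S}_i^{j-}\right), \tag{15}$$
where the observation vector Θi is defined as in section 3b(1), but is transformed to match the reference space of the SAC volumetric surface soil moisture using a TC method [see section 3b(3)]. The perturbation term υ is a 2 × 1 vector of mean-zero Gaussian random numbers with variance of
$$\mathrm{var}\left(\boldsymbol{\upsilon}\right) = \left[\sigma^2_{\mathrm{SMOS}},\ \sigma^2_{\mathrm{ASCAT}}\right]^{\mathrm{T}}. \tag{16}$$
The gain K in Eq. (15) is calculated as
$$\mathbf{K}_i = \mathbf{C}_i\mathbf{H}_i^{\mathrm{T}}\left(\mathbf{H}_i\mathbf{C}_i\mathbf{H}_i^{\mathrm{T}} + \mathbf{R}\right)^{-1}, \tag{17}$$
where $\mathbf{C}_i$ is a 6 × 6 covariance matrix of forecast error calculated from the 35-member ensemble of background S predictions, and $\mathbf{R}$ is the observation error covariance matrix defined as
$$\mathbf{R} = \begin{bmatrix}\sigma^2_{\mathrm{SMOS}} & 0 \\ 0 & \sigma^2_{\mathrm{ASCAT}}\end{bmatrix}, \tag{18}$$
which has been transformed to the error space of SAC-modeled volumetric surface soil moisture [see section 3b(3)]. Final EnKF state analyses are obtained by averaging the updated replicates of S across the ensemble.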
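The perturbed-observation update can be sketched without any matrix library by assimilating the two retrievals one after the other. This serial form is a common variant for uncorrelated observation errors; it is offered as an illustration and is not necessarily identical, member by member, to the joint two-observation update described above. The state ordering (upper-zone free and tension water in positions 0 and 1), the function names, and the absence of physical bounds on updated states are all assumptions of this sketch.

```python
import random


def enkf_update(ensemble, observations, phi, rng):
    """Serial perturbed-observation EnKF update of SAC state replicates.

    ensemble: list of members [UZFWC, UZTWC, LZTWC, LZFPC, LZFSC, ADIMP].
    observations: iterable of (value, error_variance); value is None when
    the sensor has no retrieval on this day.
    phi: factor converting upper-zone water (mm) to volumetric soil moisture.
    """
    members = [list(m) for m in ensemble]      # leave the input untouched
    n, n_states = len(members), len(members[0])
    for value, error_variance in observations:
        if value is None:
            continue
        predicted = [phi * (m[0] + m[1]) for m in members]
        mean_pred = sum(predicted) / n
        var_pred = sum((y - mean_pred) ** 2 for y in predicted) / (n - 1)
        means = [sum(m[s] for m in members) / n for s in range(n_states)]
        # Gain for each state: cov(state, predicted obs) / (var + R).
        gain = [
            sum((m[s] - means[s]) * (y - mean_pred)
                for m, y in zip(members, predicted)) / (n - 1)
            / (var_pred + error_variance)
            for s in range(n_states)
        ]
        for m, y in zip(members, predicted):
            noise = rng.gauss(0.0, error_variance ** 0.5)
            innovation = value + noise - y
            for s in range(n_states):
                m[s] += gain[s] * innovation
    return members


def ensemble_mean(members):
    """Average each state across the ensemble to form the analysis."""
    n = len(members)
    return [sum(m[s] for m in members) / n for s in range(len(members[0]))]
```

States that covary with upper-zone water in the ensemble (for example, lower-zone storages after a wet spell) are adjusted through the sampled covariances even though they are never observed directly.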
In contrast to the EnKF, where model states are only updated using concurrent observations, a 1-day-lag EnKS is also employed to use the observation at time i + 1 to update the model states between time i and i + 1 (inclusive of end points). While the SAC model is run on a daily time step, the three free water states (i.e., UZFWC, LZFPC, and LZFSC) and ADIMP actually evolve on a 3-hourly basis. Thus, with an observation at time i, the EnKS updates an augmented, 40-element state vector $\mathbf{A}_i$, which includes the six SAC soil moisture state variables at times i − 1 and i as well as the subdaily states of the four variables (i.e., UZFWC, LZFPC, LZFSC, and ADIMP) between times i − 1 and i. The matrix $\mathbf{C}_i^{A}$ is the new covariance matrix for this 40-element augmented state vector $\mathbf{A}_i$. As in the EnKF, components of this augmented covariance matrix are sampled directly from the SAC model ensemble, and the augmented state is updated with an expression analogous to Eq. (15):
$$\mathbf{A}_i^{j+} = \mathbf{A}_i^{j-} + \mathbf{C}_i^{A}\mathbf{H}_i^{A\,\mathrm{T}}\left(\mathbf{H}_i^{A}\mathbf{C}_i^{A}\mathbf{H}_i^{A\,\mathrm{T}} + \mathbf{R}\right)^{-1}\left(\boldsymbol{\Theta}_i + \boldsymbol{\upsilon}_i^{j} - \mathbf{H}_i^{A}\mathbf{A}_i^{j-}\right), \tag{19}$$
and $\mathbf{H}_i^{A}$ is a 2 × 40 matrix similar to $\mathbf{H}_i$ (i.e., elements corresponding to UZFWC and UZTWC are either $\varphi h_i^{\mathrm{SMOS}}$ or $\varphi h_i^{\mathrm{ASCAT}}$, all others being zero). As in the EnKF, final EnKS state analyses are obtained by averaging across the updated soil moisture ensemble.
3) Observation rescaling and error estimation
Satellite soil moisture retrievals typically reflect a different long-term mean and dynamic range than comparable soil moisture time series generated by the assimilation model that they must be integrated into (Reichle et al. 2004). Therefore, following Yilmaz and Crow (2013), the TC-based rescaling method proposed by Stoffelen (1998) is applied to 1) rescale ASCAT and SMOS soil moisture retrievals into a climatology consistent with the SAC and/or API model prior to data assimilation and 2) obtain error covariance information for the subsequent assimilation of ASCAT and SMOS in both the API model for SMART and the SAC model for the EnKF/EnKS implementations. Note that the surface layer in the SAC model (i.e., the upper zone) is implicitly defined by the calibrated capacity of free and tension water contents (i.e., parameters UZFWM and UZTWM; Table 2) and typically does not match the penetration depth of satellite retrievals (0–5 cm). However, by scaling the SMOS and ASCAT data to the climatology and approximate dynamic range of the model surface-layer soil moisture, the impact caused by different vertical representativeness can be minimized.
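This approach is based on the assumed availability of three independent soil moisture datasets: $\theta_x$, $\theta_y$, and $\theta_z$. Each time series $\theta$ is decomposed into a corresponding climatological anomaly time series $\theta'$ by subtracting the long-term 31-day moving average $\langle\theta\rangle_D$ from the raw data, where $\langle\theta\rangle_D$ is sampled from a multiyear dataset using a 31-day sampling window centered on the day of the year (i.e., D). This is done for two reasons. First, the observational error variance applied in a Kalman filter should reflect random error sources and not systematic errors due to, for example, differences in seasonal soil moisture climatologies (Crow and van den Berg 2010). Second, past work has demonstrated that TC-derived error variances are more accurate when calculated for anomalies relative to a seasonally varying climatology than when calculated for absolute soil moisture products (Miralles et al. 2010; Draper et al. 2012).
Here, $\theta_x$ will refer to soil moisture products acquired from the offline model integration (without any actual data assimilation). During application of TC to data assimilation activities, this represents the appropriate reference for rescaling the other two soil moisture products. Therefore, using $\theta_x$ as the model reference (API in the case of SMART rainfall correction and SAC in the case of EnKF and EnKS state correction) and $\theta_y$ and $\theta_z$ representing SMOS and ASCAT products, $\theta'_y$ is scaled to the reference space of $\theta'_x$ via
$$\theta_y^{\prime *} = c_y\left(\theta'_y - \overline{\theta'_y}\right) + \overline{\theta'_x}. \tag{20}$$
Note that $\theta_y^{\prime *}$ is the rescaled anomaly of $\theta'_y$ in the reference space of $\theta'_x$, and $\overline{\theta'_x}$ and $\overline{\theta'_y}$ are the means of $\theta'_x$ and $\theta'_y$. Following Stoffelen (1998), the scaling factor $c_y$ is calculated from
$$c_y = \frac{\overline{\left(\theta'_x - \overline{\theta'_x}\right)\left(\theta'_z - \overline{\theta'_z}\right)}}{\overline{\left(\theta'_y - \overline{\theta'_y}\right)\left(\theta'_z - \overline{\theta'_z}\right)}}, \tag{21}$$
where the overbar denotes temporal averaging. Similarly, $c_z$ can be calculated for $\theta'_z$. Then, based on a typical TC analysis, the error variances of $\theta_y^{\prime *}$ and $\theta_z^{\prime *}$ can be estimated as
$$\sigma_y^2 = \overline{\left(\theta_y^{\prime *} - \theta'_x\right)\left(\theta_y^{\prime *} - \theta_z^{\prime *}\right)}, \tag{22}$$
$$\sigma_z^2 = \overline{\left(\theta_z^{\prime *} - \theta'_x\right)\left(\theta_z^{\prime *} - \theta_y^{\prime *}\right)}, \tag{23}$$
provided that an assumption of mutually independent errors is met. These values of $\sigma_y^2$ and $\sigma_z^2$ are used to formulate the observation error covariance matrix $\mathbf{R}$ and ensure that the relative uncertainties in SMOS and ASCAT soil moisture retrieval products are properly considered during data assimilation.
Finally, $\theta_y^{\prime *}$ and $\theta_z^{\prime *}$ are transformed to the mean reference space of $\theta_x$ via
$$\theta_{y,j}^{*} = \theta_{y,j}^{\prime *} + \langle\theta_x\rangle_{D_j}, \tag{24}$$
$$\theta_{z,j}^{*} = \theta_{z,j}^{\prime *} + \langle\theta_x\rangle_{D_j}, \tag{25}$$
where j is the temporal index (day) and Dj is the corresponding Julian day D (or day of year), and $\langle\theta_x\rangle_D$ is the long-term mean of $\theta_x$ on D sampled from the multiyear time series using a 31-day sampling window centered on D. Note that the use of TC to constrain modeling (as opposed to observation) errors is complicated by the nonlinear nature of the SAC model (Crow and Yilmaz 2014). As a result, TC is used only to estimate the relative size of active and passive soil moisture retrieval errors.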
The separate treatment of seasonality and anomalies (i.e., decomposing–combining) ensures that the final rescaled datasets have similar dynamic range as the reference dataset and removes potential seasonal differences between them. Prior to the data assimilation runs, the above procedure is performed twice to provide one set of scaled error variances and observations for SMART rainfall correction (based on the API model) and a separate set for EnKF/EnKS SAC model state correction.
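The anomaly-space part of this procedure can be sketched as follows, assuming three co-located anomaly series of equal length with no missing days. The sketch uses the covariance-ratio form of the Stoffelen (1998) scaling factor and the usual cross-product estimate of the error variances; function names are hypothetical, and the climatology removal and restoration steps are omitted.

```python
def _mean(values):
    return sum(values) / len(values)


def _cov(a, b):
    mean_a, mean_b = _mean(a), _mean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)


def tc_rescale(x, y, z):
    """Rescale satellite anomalies y and z to the space of model anomalies x
    and estimate their error variances by triple collocation.

    Returns (y_rescaled, z_rescaled, error_var_y, error_var_z).
    """
    c_y = _cov(x, z) / _cov(y, z)       # scaling factor for y
    c_z = _cov(x, y) / _cov(z, y)       # scaling factor for z
    mean_x, mean_y, mean_z = _mean(x), _mean(y), _mean(z)
    y_star = [c_y * (v - mean_y) + mean_x for v in y]
    z_star = [c_z * (v - mean_z) + mean_x for v in z]
    # Cross products isolate each product's own error when errors are
    # mutually independent.
    error_var_y = _mean([(ys - xs) * (ys - zs)
                         for xs, ys, zs in zip(x, y_star, z_star)])
    error_var_z = _mean([(zs - xs) * (zs - ys)
                         for xs, ys, zs in zip(x, y_star, z_star)])
    return y_star, z_star, error_var_y, error_var_z
```

With short or noisy records the estimated variances can come out negative, so an operational implementation would need a guard that this sketch does not include.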
4) Correction of perturbation bias
Because of the nonlinear nature of most hydrologic models, an ensemble of model forecasts perturbed by mean-zero Gaussian noise can produce biased background predictions. During the application of sequential data assimilation, such bias can lead to significant cumulative water balance errors (Ryu et al. 2009). Here, the simple bias correction method introduced in Ryu et al. (2009) is applied during the EnKF/EnKS state correction analysis. Specifically, the mean bias of perturbed state variable y of an n-member ensemble at time t is sampled as
$$b_t = \frac{1}{n}\sum_{j=1}^{n} y_t^{j} - y_t^{0}, \tag{26}$$
where $y_t^{0}$ is a model forecast generated from a single open loop run without perturbing forcing and states. The perturbation bias sampled in Eq. (26) is then subtracted from each (perturbed) $y_t^{j}$ to ensure a bias-zero forecast ensemble. Past work suggests that this correction can significantly improve EnKF/EnKS profile soil moisture estimates (Ryu et al. 2009). Because of the linear nature of the API model, no comparable bias correction is applied within SMART.
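The correction amounts to recentering the perturbed ensemble on the unperturbed open loop forecast while leaving the ensemble spread untouched, as in the following minimal sketch (the function name and list-based interface are illustrative).

```python
def remove_perturbation_bias(ensemble_values, open_loop_value):
    """Subtract the mean perturbation bias from every ensemble member.

    ensemble_values: forecasts of one state variable from the n perturbed
    members at a single time step.
    open_loop_value: forecast of the same variable from the unperturbed run.
    """
    n = len(ensemble_values)
    bias = sum(ensemble_values) / n - open_loop_value
    return [value - bias for value in ensemble_values]
```

After the correction the ensemble mean equals the open loop value, and the deviations of the members from that mean (and hence the sampled error covariances) are unchanged.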
c. Data assimilation strategies
Using real satellite-based surface soil moisture retrievals (obtained from both the passive microwave SMOS sensor and the active microwave ASCAT instrument), this study examines five different data assimilation strategies for improving SAC streamflow predictions.
A rainfall-correction-only (RC) strategy based on the application of the SMART algorithm to correct rainfall forcing data prior to its input into the SAC model. Note that this approach, unlike all the other strategies introduced below, does not involve the assimilation of active–passive soil moisture retrievals into the SAC model (and thus no direct state correction).
Two separate state-correction-only (EnKF and EnKS) strategies where active–passive surface soil moisture retrievals are directly assimilated into the SAC model using EnKF or EnKS to update SAC soil moisture states. An SAC streamflow prediction is then obtained by initializing a separate SAC realization with updated model states and forcing it with uncorrected TMPA 3B40RT rainfall. In contrast with the RC strategy introduced above, no rainfall correction is attempted.
Two separate dual-correction strategies (RC–EnKF and RC–EnKS), in which both state and rainfall corrections are attempted. An illustration of these strategies is provided in Fig. 3. Here, active–passive soil moisture is assimilated into an ensemble of SAC realizations, forced by uncorrected TMPA 3B40RT rainfall data, using either an EnKF or EnKS, and a single set of corrected soil moisture states (i.e., the soil moisture analysis) is obtained by averaging the updated ensemble. As the data assimilation system marches forward in time, a new 1-day SAC model simulation is initialized every day using the previous day’s EnKF/EnKS soil moisture analysis and forced by corrected rainfall (obtained from a separate application of SMART) to produce the dual-correction streamflow estimates. Following the generation of streamflow, this 1-day SAC simulation is terminated and its state predictions are discarded. Note that feeding these state predictions back into the EnKF/EnKS data assimilation system would result in the cross correlation of modeling and observational errors and, therefore, the degradation of EnKF/EnKS state estimates (CR09).
4. Results
Prior to consideration of SAC streamflow results, the improvement of TMPA 3B40RT rainfall via SMART application is evaluated in section 4a. This is followed by an assessment of the bias correction scheme described above on SAC streamflow results in section 4b. Finally, streamflow data assimilation results for the five separate cases described in section 3c are presented in section 4c.
a. SMART rainfall correction
Based on application of the SMART approach with the proposed modifications outlined in section 3b(1), daily TMPA 3B40RT rainfall accumulations are corrected using ASCAT and SMOS soil moisture observations. These corrections are verified based on comparisons against daily rainfall estimates obtained from the CPC rain gauge analysis. The application of SMART using ASCAT and SMOS surface soil moisture retrievals substantially improves the coefficient of determination R2 and RMSE performance of daily TMPA accumulations relative to the CPC benchmark when λ is optimized on a basin-by-basin basis to minimize RMSE differences relative to CPC (Fig. 4). Moreover, replacing these optimized values with a globally fixed value of λ = 0.50 results in only a slight reduction in the performance of the SMART algorithm (Fig. 4). As a result, all subsequent SMART results are based on using a globally fixed value of λ = 0.50.
In addition to R2 and RMSE results shown in Fig. 4, changes in three categorical skill metrics are also assessed: false alarm ratio (FAR), probability of detection (POD), and threat score (TS). These metrics are calculated based on the number of events (defined as a daily rainfall accumulation in excess of a given threshold) falsely predicted, successfully captured, or missed: FAR = F/(H + F), POD = H/(H + M), and TS = H/(H + F + M),
where H is the number of events successfully predicted by a given rainfall product, F is the number of nonevents (i.e., actual rainfall below event threshold) erroneously predicted to occur, and M is the number of actual events that are missed. As a result, FAR refers to the fraction of predicted events (at a given percentile threshold) that are actually nonevents and POD refers to the fraction of all qualifying events correctly predicted. A perfect rainfall time series would yield FAR = 0, POD = 1, and TS = 1 for any given threshold value. Assuming the CPC rain gauge analysis as truth, average changes in the 13 basins in FAR, POD, and TS after SMART correction are calculated for 1-, 3-, and 5-day accumulations (Fig. 5). Given the known tendency of the CPC gauge analysis to overpredict frequency of light events due to spatial interpolation (Janowiak et al. 2004), “true” events of less than 2.0 mm are excluded from the categorical metrics analysis.
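As a minimal sketch of how these categorical metrics can be computed, the following Python function counts H, F, and M for a given threshold and returns FAR, POD, and TS using their standard contingency-table definitions. The function name and the sample accumulations are hypothetical, and the exclusion of "true" events below 2.0 mm described above is not implemented here.

```python
import numpy as np

def categorical_scores(predicted, reference, threshold):
    """Return (FAR, POD, TS) for events defined as accumulation > threshold."""
    predicted = np.asarray(predicted, dtype=float)
    reference = np.asarray(reference, dtype=float)
    pred_event = predicted > threshold
    ref_event = reference > threshold
    hits = int(np.sum(pred_event & ref_event))            # H
    false_alarms = int(np.sum(pred_event & ~ref_event))   # F
    misses = int(np.sum(~pred_event & ref_event))         # M
    nan = float("nan")  # returned when a denominator is zero
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    pod = hits / (hits + misses) if hits + misses else nan
    total = hits + false_alarms + misses
    ts = hits / total if total else nan
    return far, pod, ts

# Hypothetical daily accumulations (mm); not data from this study.
satellite = [0.0, 6.0, 12.0, 3.0, 0.0, 9.0]
gauge = [0.0, 7.0, 1.0, 8.0, 0.0, 10.0]
far, pod, ts = categorical_scores(satellite, gauge, threshold=5.0)
# Here H = 2, F = 1, M = 1, so FAR = 1/3, POD = 2/3, TS = 1/2.
```

Because TS penalizes both false alarms and misses, it can improve even when FAR rises at low thresholds or POD falls at high thresholds, which is the pattern discussed for Fig. 5.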
In contrast to the almost uniformly good R2 and RMSE results shown in Fig. 4, a number of problems emerge in Fig. 5. First, SMART frequently misinterprets soil moisture retrieval noise as low-intensity rainfall. As a result, the implementation of SMART tends to increase FAR at low-intensity thresholds. Second, SMART has a tendency to underpredict large events and therefore decrease their POD. As discussed in Crow et al. (2011) and Chen et al. (2012), this can be attributed to the saturation of the surface layer during periods of intense precipitation and the resulting difficulty of inverting a saturated soil moisture signal into an accumulation amount. Additional discussion of difficulties encountered during high precipitation/streamflow events is presented in section 5.
Nevertheless, as an integrated measure of performance that is sensitive to both FAR and POD, TS results in Fig. 5 suggest an improvement in SMART-corrected accumulations over a broad range of intensity thresholds. Although updating at near-daily frequency is possible, better categorical skill obtained at the 3- and 5-day scales (relative to daily) implies that SMART may ultimately be better suited for application in larger basins with longer streamflow concentration times.
b. Ensemble perturbation bias correction
One issue facing the EnKF approach is the biased background predictions arising from the nonlinear model physics, which could degrade the subsequent data assimilation analysis (De Lannoy et al. 2006; Ryu et al. 2009). Here, we examine the impact of the Ryu et al. (2009) bias removal procedure described in section 3b(4) on EnKF-predicted streamflow. Note that results are based on the use of uncorrected TMPA 3B40RT rainfall (as opposed to corrected output from the SMART algorithm).
For all 13 basins described in Table 1, Fig. 6a shows the impact of ensemble bias correction on the RMSE between an SAC–EnKF analysis (based on simultaneously assimilating both active ASCAT and passive SMOS surface soil moisture retrievals) and observed streamflow. Application of the approach results in reduced streamflow RMSE (versus USGS streamflow gauges) in 9 out of the 13 basins examined. RMSE results in Fig. 6a are highly sensitive to performance during peak streamflow events. To examine the impact of data assimilation on the low-flow (or base flow) component of the hydrograph, changes in RMSE (ΔRMSE) for log-transformed streamflow are also shown in Fig. 6b. Since this procedure is found to be more effective in removing bias in deep-layer soil moisture (Ryu et al. 2009), it is expected to have more impact on EnKF/EnKS base flow prediction. Indeed, this trend is reflected in the larger magnitude of ΔRMSE in log flow (Fig. 6b) versus raw flow values (Fig. 6a). In addition, forecast bias correction has a greater positive impact on the grass-/cropland-dominated western basins examined, while less improvement is noted in the wetter and more heavily vegetated eastern basins (Fig. 6).
Streamflow time series for two representative basins (basins 3 and 7 in Table 1) are shown in Fig. 7. In both basins (as well as in all other basins), EnKF-predicted low flow is persistently reduced by an almost constant volume throughout the simulation period after bias correction. This indicates a persistent positive bias generated in the EnKF perturbation process, likely more significant in the deep-layer storages, that is adjusted for via the bias correction. The cause of this positive bias is that, in this relatively arid region, deep soil layers rarely reach saturation but are often near depletion, so that positive perturbations are usually kept while negative values are frequently truncated when they result in negative soil moisture flows. During observed low-flow periods, SAC predictions are often underestimated, or even intermittent (e.g., zero flow). In the case of basin 7, EnKF-generated low flow is not enough to make up the missing volume (Fig. 7d). Although it is only decreased by a marginal amount (on the order of ~0.04 mm day−1) after bias correction, the proportional reduction is significant, and it leads to the sharp increase in proportional RMSE of log flow (i.e., highest positive ΔRMSE in Fig. 6b). However, the degrading effect on high flow is small, and overall RMSE is improved (Fig. 6a). In basin 3 (Figs. 7a,b), the EnKF run produces a consistently positive bias during the low-flow periods, which is reduced by ~40% after bias correction is applied. Overall, Figs. 6 and 7 support the ability of the Ryu et al. (2009) bias correction procedure to improve EnKF streamflow estimates. As a result, it is applied to all subsequent EnKF and EnKS analyses.
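The origin of this perturbation bias can be illustrated with a short numerical sketch. Zero-mean noise is added to a storage that sits close to its lower bound, negative values are truncated at zero, and the ensemble mean drifts above the unperturbed value. The storage value and noise level are invented for illustration. The correction step shown (shifting every member by the offset between the ensemble mean and a parallel unperturbed run) is an assumed simplification of the Ryu et al. (2009) procedure, not a reproduction of the scheme in section 3b(4).

```python
import numpy as np

rng = np.random.default_rng(0)

n_members = 10000
unperturbed = 0.5   # hypothetical deep-layer storage (mm), near depletion
noise_std = 2.0     # hypothetical perturbation standard deviation (mm)

# Zero-mean perturbations, then truncation of physically impossible
# negative storages: positive draws are kept, negative ones are clipped.
perturbed = unperturbed + rng.normal(0.0, noise_std, n_members)
truncated = np.clip(perturbed, 0.0, None)

# The truncated ensemble mean is biased high relative to the unperturbed run.
bias = float(truncated.mean() - unperturbed)

# Assumed correction: remove the mean offset from every ensemble member.
corrected = truncated - bias
```

Note that this simple shift restores the ensemble mean but can push individual members below zero again, so a practical implementation would need to handle the bounds explicitly.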
c. Streamflow results
Following the SMART-based correction of rainfall (section 4a) and the bias correction of EnKF/EnKS (section 4b), we are now ready to examine streamflow predictions associated with all five data assimilation cases introduced in section 3c. In particular, Fig. 8 summarizes the average normalized RMSE (i.e., ratio of RMSE after data assimilation to the RMSE of the SAC control run) across all 13 basins for application of 1) the RC case, where SAC is forced with SMART-corrected rainfall and no state correction; 2) the EnKF state-correction-only case, where antecedent states are updated using an EnKF but SMART is not applied; 3) the EnKS state-correction-only case, where antecedent states are updated using an EnKS but SMART is not applied; and 4) two dual (RC–EnKS and RC–EnKF) cases where simultaneous rainfall and antecedent soil moisture state correction are attempted.
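For concreteness, the normalized RMSE used here can be sketched as follows. The function names and flow values are hypothetical, and the small offset `eps` added before the log transform is an assumption made so that zero-flow days remain defined. How zero flows are handled in the actual analysis is not stated in this section.

```python
import numpy as np

def rmse(simulated, observed):
    """Root-mean-square error between two equally long series."""
    simulated = np.asarray(simulated, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((simulated - observed) ** 2)))

def normalized_rmse(assimilated, control, observed, log_flow=False, eps=0.01):
    """RMSE after data assimilation divided by RMSE of the control run.

    Values below one indicate that assimilation reduced the error.
    With log_flow=True, flows are log-transformed first (eps is an assumed
    offset guarding against zero flow), which emphasizes low-flow periods.
    """
    series = [np.asarray(s, dtype=float) for s in (assimilated, control, observed)]
    if log_flow:
        series = [np.log(s + eps) for s in series]
    da, ctrl, obs = series
    return rmse(da, obs) / rmse(ctrl, obs)

# Hypothetical daily flows (mm/day); not data from this study.
observed = [1.0, 2.0, 4.0, 8.0]
control = [2.0, 4.0, 8.0, 16.0]
assimilated = [1.5, 3.0, 6.0, 12.0]

nrmse_raw = normalized_rmse(assimilated, control, observed)
nrmse_log = normalized_rmse(assimilated, control, observed, log_flow=True)
```

In this invented example the assimilated errors are exactly half the control errors, so the raw-flow normalized RMSE is 0.5.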
Overall, results in Fig. 8 demonstrate a reduction in streamflow prediction errors (i.e., a normalized RMSE of less than one) for all five cases. However, the SMART-based RC appears to provide the most benefit for reducing the normalized RMSE of the raw flows (Fig. 8a). This is consistent with earlier synthetic results in CR09, where average normalized streamflow RMSE of ~0.5 (RC), 1.0 (EnKF), 0.75 (EnKS), and 0.45 (EnKS–RC) were obtained for 97 individual MOPEX basins. On the other hand, when low-flow components are emphasized by examining log flows in Figs. 8b and 8d, the average skill obtained by the EnKF/EnKS cases is higher than that of the RC cases, suggesting that state correction has more impact on flow estimation during interstorm, dry-down periods. As in the CR09 synthetic experiment, the dual-correction cases demonstrate virtually the same amount of overall skill during low-flow conditions as the state-correction-only EnKF and EnKS cases.
However, closer examination reveals several noteworthy differences between real-data results presented here and the earlier synthetic results in CR09. First, unlike CR09 results in which EnKF state correction alone had little positive impact on runoff prediction, the real-data EnKF case proved capable of improving RMSE in all but one basin (see Fig. 8a). This may be partially attributed to the benefit of the bias correction approach applied here (but not in CR09). Second, the normalized RMSE for the real-data EnKS case is only marginally lower than comparable EnKF case results, whereas the relative advantage of the EnKS over an EnKF was much more significant in CR09. Finally, counter to the results in CR09, the dual-correction cases (RC–EnKF and RC–EnKS) do not clearly outperform the RC case in most basins for flow peaks (manifested in RMSE; Fig. 8a). This can be largely attributed to the synergism of RC and EnKF/EnKS that tends to produce underestimated flow estimations (see discussion in section 5).
Overall, results presented here are encouraging for efforts to improve operational streamflow forecasts using remotely sensed surface soil moisture. Rainfall correction using the SMART algorithm effectively improves flow prediction, especially the high-flow component, although at the cost of underestimating some of the largest flow events (discussed below). Although less effective, state (soil moisture) correction can also improve the flow prediction and contributes more to the low-flow component. When rainfall and state corrections are combined, RMSE of both the high- and low-flow components can be significantly reduced by ~40% and ~30%, respectively.
However, a common theme running through this analysis, as well as earlier SMART validation work (see, e.g., Chen et al. 2012), is the challenge of applying soil moisture–based assimilation strategies to improve the characterization of intense precipitation/streamflow events.
The reliable prediction of high-flow episodes is, of course, a vital goal for most hydrologic prediction systems. Given known shortcomings in SMART-corrected rainfall with regard to the detection of intense precipitation (Crow et al. 2011; Chen et al. 2012), it is important to specifically assess the impact of soil moisture data assimilation on high-flow events. To demonstrate the response of high-flow events to changes in forcing (precipitation) and antecedent state (soil moisture) due to the data assimilation (DA) procedures, Fig. 9 plots the post-DA increment of streamflow Q, rainfall (i.e., P), and antecedent surface-layer soil moisture SM of “high flow” events from all basins in the RC and EnKF cases. High-flow events are defined as daily observed streamflow volume of ≥5 mm (converted to water-depth unit for each basin). Based on this threshold, a total of 233 high-flow events (accounting for 2% of total daily flow events) are identified for the 13 basins studied. As in the rainfall correction case, data assimilation (both rainfall and state correction alone) often leads to a negative skew in ΔQ for large-flow events, and the largest negative ΔQ events are primarily associated with decreased rainfall (Figs. 9a,b). Figure 9a reveals a clear tendency for SMART to reduce P (and therefore Q) for many of the high-flow events. As noted in Fig. 5, this strong skeptical bias toward high P events has a deleterious effect on the POD performance of SMART during intense rainfall events.
There appear to be at least two sources for this bias. First, the Kalman filter at the core of the SMART approach has a tendency to produce conditionally biased results. That is, in an attempt to minimize the error variance of its P predictions, SMART will systematically underpredict peak values. This problem is well known in variance-minimized rainfall estimates (Ciach et al. 2000) but has likely not received the attention it deserves in land data assimilation. This limitation is obviously serious for hydrologic applications tasked with high-flow (i.e., flood) detection and forecasting. It is worth noting that other techniques for correcting precipitation via soil moisture exist that do not utilize a variance-minimizing strategy (e.g., Brocca et al. 2013). Therefore, it is possible that other rainfall correction techniques may be better suited to estimating very large rainfall/streamflow events than SMART.
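The conditional bias of a variance-minimizing estimate can be demonstrated with a simple numerical sketch. It uses a static, scalar minimum-variance linear estimator rather than the Kalman filter inside SMART, and the skewed "rainfall" distribution and observation error level are invented for illustration. The estimator reduces overall RMSE and is unbiased on average, yet it systematically underestimates the largest events because it shrinks every observation toward the mean.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200000

# Hypothetical skewed "true" accumulations (mm) and noisy observations of them.
truth = rng.gamma(shape=0.5, scale=10.0, size=n)
obs_error_std = 5.0
obs = truth + rng.normal(0.0, obs_error_std, size=n)

# Minimum-variance linear estimate: shrink observations toward the mean
# with gain = signal variance / (signal variance + error variance).
gain = truth.var() / (truth.var() + obs_error_std**2)
estimate = truth.mean() + gain * (obs - truth.mean())

# Overall error statistics: lower RMSE and near-zero mean bias.
obs_rmse = float(np.sqrt(np.mean((obs - truth) ** 2)))
est_rmse = float(np.sqrt(np.mean((estimate - truth) ** 2)))
overall_bias = float(np.mean(estimate - truth))

# Conditional bias: mean error for the largest 5% of true events only.
large = truth >= np.quantile(truth, 0.95)
conditional_bias = float(np.mean(estimate[large] - truth[large]))
```

With these settings the conditional bias for the largest events is clearly negative even though the unconditional bias is close to zero, which mirrors the underprediction of peak P described above.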
Second, during periods of intense precipitation, surface soil moisture will reach saturation and can no longer be reliably inverted into a rainfall estimate. In practice, this makes it difficult for SMART to accurately identify the occurrence of intense rainfall events. Crow et al. (2011) attempted to address this POD problem for large events through specific modifications to the SMART rainfall correction approach—in particular, the addition of new terms into Eq. (9). However, the fact that POD problems persist here (in a follow-on analysis using different data products) implies that the changes suggested by Crow et al. (2011) to improve the large-event POD were ad hoc and not generally effective. Furthermore, because of the strong coupling between ΔP and changes in ΔQ for large events (Fig. 9), SMART POD problems (for rainfall) are likely to be carried over into streamflow results generated from the SAC hydrologic model (shown in Table 3).
In addition to rainfall correction, updating model soil moisture states is expected to have a positive impact on detecting high-flow events since soil moisture is a key factor that determines the partitioning between surface runoff and infiltration (e.g., Li et al. 2013; Marchi et al. 2010). Unfortunately, EnKF and EnKS state-correction-only cases are found to result in a net decrease of POD for high-flow events here (see Table 3). In addition to the inherent conditional bias issue discussed above, the mismatched timing of satellite overpass and rainfall event(s) may also contribute to the underestimation in predicted high-flow events by the EnKF and EnKS state-correction-only cases. When satellite overpasses occur prior to a rainfall event, observations will indicate a drier condition, while observations at a much later time relative to rainfall may not adequately capture the signal in soil moisture. Both situations will falsely introduce a negative bias into model states and subsequent streamflow conditions. In such cases, the degrading effects can be more serious for EnKS, because the instantaneous satellite measurement is used to update all 3-hourly subdaily states. Thus, the advantage of EnKS (i.e., its ability to improve runoff prediction of time j as well as j + 1, j + 2, …, with observation at time j) turns into a disadvantage; this explains the greater number of underestimated moderate events for the EnKS case relative to the EnKF case (Table 3). Such impacts can potentially be mitigated by calculating API at a finer temporal scale (e.g., 3 or 6 hourly) and by assimilating remotely sensed soil moisture retrievals that are more accurately matched to the timing of individual precipitation events.
The design of appropriate data assimilation systems to simultaneously ingest both active and passive microwave-based surface soil moisture retrievals represents a critical goal for facilitating the development of hydrologic applications for SMAP surface products. Here, satellite surface soil moisture retrievals are used in a suite of data assimilation procedures in an attempt to improve streamflow predictions obtained from a hydrologic model. This attempt is based on applying data assimilation strategies that address errors in prestorm soil moisture conditions and/or intrastorm rainfall accumulation estimates.
Unlike the earlier synthetic analysis in CR09, this analysis is based on real surface soil moisture retrievals derived from real, satellite-based active (i.e., ASCAT) and passive (i.e., SMOS) microwave sensors. By simultaneously assimilating both products, a near-daily average update frequency is obtainable, which is important for flow prediction in relatively small watersheds with short streamflow concentration temporal scales. The ensemble bias correction procedure outlined in Ryu et al. (2009) is also validated and exercised in the EnKF/EnKS routines. Another major methodological advance is our use of a TC approach for rescaling remote sensing observations and obtaining accurate estimates of their error variance statistics.
In general, results presented here are consistent with earlier synthetic results presented in CR09, which demonstrate the potential of combined rainfall–state correction (based on the assimilation of satellite-derived surface soil moisture retrievals) to improve estimation of both the peak and base flow components of total streamflow. In particular, peak flow estimation generally benefits the most from (SMART based) rainfall correction while improved low-flow predictions are primarily aided by state correction via an EnKF/EnKS. On the other hand, a number of notable differences between the results here and CR09 are also found. Here, the real-data, EnKF-only case proved capable of improving streamflow RMSE in most basins studied—a capability that was not noted in the earlier CR09 synthetic analysis. This improvement may be related to the bias correction procedure adopted here that largely removed the soil moisture bias caused by ensemble perturbation. Furthermore, unlike the CR09 results, the real-data EnKS case has little advantage over the EnKF case, and the dual-correction cases (state and rainfall) do not outperform the RC case in most basins in terms of flow peaks, suggesting that the advantage associated with the application of a smoother in the CR09 synthetic analysis may be difficult to realize in a real-data environment.
Moreover, a major unresolved challenge is the underprediction of intense precipitation/high-flow events (see section 5). Difficulties in predicting large-flow events represent, of course, a major shortcoming for any hydrologic prediction system, and future research is required for minimizing this tendency. Nevertheless, the results of this analysis illustrate the future potential of utilizing simultaneous active–passive surface soil moisture retrievals to correct state and forcing errors degrading hydrologic modeling predictions. Naturally, these efforts will receive a significant boost with the late 2014 launch of the NASA SMAP mission.
Research was funded via a grant from the NASA Precipitation Measurement Mission (PMM) program (W. T. Crow, principal investigator).
This article is included in the NASA Soil Moisture Active Passive (SMAP) – Pre-launch Applied Research Special Collection.