This study evaluates rainfall estimates from the Next Generation Weather Radar (NEXRAD), operational rain gauges, Tropical Rainfall Measuring Mission (TRMM) Multisatellite Precipitation Analysis (TMPA), and Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks Cloud Classification System (PERSIANN-CCS) as inputs to a calibrated, distributed hydrologic model. A high-density Micronet of rain gauges on the 342-km2 Ft. Cobb basin in Oklahoma was used as reference rainfall to calibrate the National Weather Service’s (NWS) Hydrology Laboratory Research Distributed Hydrologic Model (HL-RDHM) at 4-km/1-h and 0.25°/3-h resolutions. The unadjusted radar product was the overall worst product, while the stage IV radar product with hourly rain gauge adjustment had the best hydrologic skill with a Micronet relative efficiency score of −0.5, only slightly worse than the reference simulation forced by Micronet rainfall. Simulations from TRMM-3B42RT were better than those from PERSIANN-CCS-RT (a real-time version of PERSIANN-CCS) and equivalent to those from the operational rain gauge network. The high degree of hydrologic skill with TRMM-3B42RT forcing was only achievable when the model was calibrated at TRMM’s 0.25°/3-h resolution, thus highlighting the importance of considering rainfall product resolution during model calibration.
The prediction of runoff ranging from common flows to extreme, small-scale events (i.e., flash floods) requires accurate estimates of rainfall available in real time. These estimates can be provided by in situ rain gauge networks, remote sensing platforms such as weather radars and satellites, and combined, multisensor algorithms. Gourley et al. (2010b) found the skill of rainfall estimated from the aforementioned sources varied as a function of spatial scale, temporal scale, and rainfall intensity. Here, we compare the same rainfall products, but in this study they are evaluated as inputs to a calibrated, distributed hydrologic model on a densely gauged 342-km2 catchment. The hydrologic skill to be quantified for each rainfall algorithm will help guide and optimize multisensor merging approaches such as incorporating hourly and monthly rain gauge observations and downscaled microwave data to correct rainfall from remote sensing systems. While we evaluate products that are available in real time, near–real time, and in retrospect, we intend to highlight the best merging strategies for real-time applications. We elucidate whether the differing rainfall product resolutions and operational rain gauge densities are sufficient for flood and flash-flood prediction. In this latter step, we account for the rainfall product resolution differences when calibrating and evaluating the simulations from a distributed hydrologic model.
Evaluation of remote sensing algorithms is of particular interest for rainfall in ungauged basins that was previously unobserved and is now monitored operationally and quasi-globally up to the maximum latitude band of 50°N/S from instruments on board low-earth-orbiting and geostationary satellites. The drawback of satellite-derived products is that distant radiance measurements are only indirectly related to surface rainfall rates, which results in large uncertainties. Reviews of the uncertainties in high-resolution satellite rainfall products can be found in Astin (1997), Steiner et al. (2003), Gebremichael and Krajewski (2004), Hong et al. (2006), Ebert et al. (2007), Hossain and Huffman (2008), Villarini et al. (2009), and others. Despite these uncertainties, the availability of operational satellite rainfall products with quasi-global coverage has led to the demonstration of flood modeling and landslide applications on a global scale (e.g., Hong et al. 2007a,b; Yilmaz et al. 2010).
Given the recent potential to monitor rainfall globally from space, the effect of these rainfall algorithms’ accuracy and resolution on the skill of hydrologic simulations has been a topic of interest (Hossain and Anagnostou 2004; Hossain et al. 2004; Yilmaz et al. 2005; Hossain and Lettenmaier 2006). Hossain and Anagnostou (2004) assessed the potential utility of rainfall estimates from passive microwave (PM) and infrared (IR) sensors for flood prediction in medium-sized (50–500 km2) basins. Their study highlighted the complexity of satellite sensor detection capabilities and accuracy as the time and spatial scales of the flooding events became smaller. Hossain et al. (2004) examined the sensitivity of satellite PM retrieval and sampling errors on flood prediction uncertainty on a medium-sized basin in northern Italy using a semidistributed hydrologic model. Regarding temporal sampling frequencies, they found that 3-h rainfall retrievals yielded similar flood prediction uncertainties as hourly inputs, whereas the runoff prediction error amplified by a factor of three when 6-h rainfall inputs were used. Extension of these results to short-duration, extreme-flood-producing storms is one goal of the current study.
Yilmaz et al. (2005) compared mean areal rainfall estimates from satellite, radar, and rain gauges and then evaluated them using a lumped, operational hydrologic model. Differences in the hydrologic skill using Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks (PERSIANN) and rain gauge products were more noticeable in ~1000-km2 basins because random rainfall errors were less likely to cancel out in the translation process from rainfall to runoff. Nikolopoulos et al. (2010), on the other hand, found the propagated error in satellite rainfall magnified with basin drainage area. Both studies suggested evaluating satellite-based precipitation estimates for potential use in flood modeling applications for a variety of storms, basins, and distributed hydrologic models.
We intend to build upon the aforementioned studies by examining the hydrologic skill of rainfall estimates from satellite- and ground-based sensors by considering events ranging from season-scale prediction to short-duration, extreme floods. This evaluation considers the impacts of uncertainties due to rainfall algorithm accuracy while taking into account the product resolution differences in the model calibration step. The effect of spatial rainfall aggregation on peak discharge simulation for extreme flooding events was found to be significant by Sangati et al. (2009). The hydrologic evaluation framework follows the benchmark-dependent path suggested by Hossain and Lettenmaier (2006). We rely on the same Micronet rain gauge data source as Anagnostou et al. (2010) to serve as the benchmark reference rainfall. By aggregating the reference rainfall from the Micronet to the scale of the Tropical Rainfall Measuring Mission (TRMM) rainfall products and then objectively estimating the hydrologic model parameters, we are able to separate the accuracy- and resolution-dependent components of the satellite error structure. We anticipate results from this study will 1) help elucidate the limits of predictability to which satellite-based rainfall products have potential utility as inputs to real-time flood and flash-flood prediction systems relative to forcing from operational rainfall products based on radar, rain gauges, and combinations, and 2) guide the development of combined, multisensor rainfall algorithms by analyzing the impacts of adjustments from rain gauges at hourly and monthly scale, incorporation of downscaled microwave data, and rescaling rainfall to satellite-pixel resolution during model calibration.
The paper is organized as follows. Section 2 describes the study domain including the physical characteristics of the Ft. Cobb watershed, the U.S. Department of Agriculture (USDA) Agricultural Research Service (ARS) Micronet instrumentation, and details of the rainfall-runoff events constituting the wettest season in Oklahoma on record. Section 3 discusses the rainfall algorithms derived from satellite, radar, rain gauges, and combinations, and then compares them over the Ft. Cobb basin to the ARS Micronet rain gauges. The hydrologic evaluation framework is presented in section 4 including a description of the hydrologic model used, automatic estimation of the model parameters taking into account rainfall product resolutions, and the metrics developed to summarize the hydrologic performance conditioned on the different rainfall forcing. The hydrologic assessment is performed for a continuous three-month period as a function of observed discharge magnitude and for a rare, catastrophic flood. A summary of results, conclusions, and future work is provided in section 5.
2. Study domain
The USDA–ARS designated the Ft. Cobb basin as a research watershed in 2005 to study the effectiveness of conservation practices on water quality and wildlife habitat. An ARS Micronet consisting of 15 stations that measure atmospheric and soil properties was installed in the basin, which, when combined with the three U.S. Geological Survey (USGS) stream gauges located therein, makes this heavily instrumented and continuously monitored basin very attractive for conducting detailed hydrologic studies (see Fig. 1). The ARS Micronet consists of a suite of instruments that measure air temperature, rainfall, relative humidity, solar radiation, soil temperature at 5, 10, 15, and 30 cm below ground, and soil water content at 5, 25, and 45 cm below ground. In this study, we use hourly accumulated rainfall measurements from Met One tipping-bucket rain gauges. The data are quality controlled using procedures developed for the Oklahoma Mesonet as described in Shafer et al. (2000) and Fiebrich et al. (2006). We focus on the stream gauge data at the outlet of the 342-km2 subcatchment (USGS site 07325800) and the nearby rain gauges because of the large footprints of the satellite-based rainfall products. The center of the basin is 120–130 km from nearby Weather Surveillance Radar-1988 Doppler (WSR-88D) radars (KFDR and KTLX).
The Ft. Cobb basin is generally considered to be flat with elevations ranging from 379 to 564 m. The USGS National Land Cover Database designates 59% of the land area as cropland (Homer et al. 2007). The second largest designation is grassland. The soils are predominantly of the silt loam and loam type, which have deep profiles making them suitable for agriculture (Soil Survey Staff 1994; Soil Survey Staff 1996). Saturation excess is the primary runoff-producing mechanism on this basin. Typically, most of the rainfall occurs in the spring months with May receiving 150 mm of the annually averaged 800 mm. However, as described below, the anomalous behavior of the summer rains in 2007 provided for a unique, high-resolution rainfall-runoff dataset.
The focus of the study includes rainfall-runoff events from June to August 2007. The Oklahoma Climatological Survey reported June 2007 as the wettest month on record in the state of Oklahoma since records began in 1895. In addition to setting records for consecutive days with rainfall reports, the state of Oklahoma had 15 days of damaging flash floods. The most catastrophic event was from a reintensifying Tropical Storm (TS) Erin traversing the state from 17 to 20 August 2007. Details of this unusual event can be found in Arndt et al. (2009). TS Erin reintensified well after making landfall and produced 187 mm of rainfall in three hours at the Ft. Cobb Mesonet rain gauge, which was determined to have a recurrence interval of 500 yr. The flooding impacts from TS Erin in the Ft. Cobb basin cost four people their lives; three perished when their vehicle was swept off the road by floodwaters while the fourth drowned in a flooded basement. The property damage in the basin was estimated at $110,000 while the damage over the entire state of Oklahoma from TS Erin was estimated at $4,960,000 with two additional fatalities (NWS 2007). The basin-averaged rainfall accumulation was 170 mm, which yielded a peak discharge estimated at 209.5 m3 s−1 and event runoff ratio of 0.25. This rather low runoff response was likely due to errors in the rating curve used to estimate discharge from stage height measurements. Because this was an extreme flooding case, the river exceeded its banks and flooded the nearby floodplain. The actual discharge was likely significantly underestimated from the stream-gauge-estimated values. Nonetheless, hydrologic simulations use the same rating curve as the observations and will be biased in the same manner. The dataset in our study comprises hourly accumulations from the ARS Micronet rain gauges, 15-min streamflow measurements from the circled USGS station in Fig. 1, and the rainfall estimates discussed in the next section.
3. Rainfall algorithms
a. Description of products
Rainfall products used in this study are derived from the following sources: WSR-88D radars, operational rain gauges used by the National Centers for Environmental Prediction (NCEP) Environmental Modeling Center (EMC), the ARS Micronet rain gauge network previously described, TRMM Multisatellite Precipitation Analysis (TMPA) (Huffman et al. 2007), and PERSIANN Cloud Classification System (PERSIANN-CCS) (Hong et al. 2004). The temporal and spatial sampling characteristics of these sources differ, and it is the object of this study to determine the impacts of considering these resolution differences in the model calibration procedure.
The radar-based product, which we refer to simply as “radar,” was generated operationally at NCEP–EMC and later retrieved from http://data.eol.ucar.edu/codiac/dss/id=21.090. In other studies, this product has been referred to as the stage II radar-only product. Rainfall is estimated from individual radars using the standard reflectivity-to-rainfall (Z–R) relation used in the U.S. National Weather Service (NWS) (i.e., $Z = 300R^{1.4}$). The rainfall rates are summed to hourly accumulations and then merged with accumulations from adjacent WSR-88D radars using an inverse distance-weighting (IDW) scheme. The gridcell resolution of the radar product is approximately 4 km, but varies with latitude. While radars can provide rainfall estimates at a 5-min frequency, we use the hourly accumulations as inputs to the hydrologic model.
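As an illustration of the Z–R step, the following minimal sketch inverts the relation for rainfall rate and sums equally spaced scans into an hourly accumulation. The function names, the input in dBZ, and the fixed 5-min scan spacing are assumptions made for the example; the operational processing includes quality control and mosaicking steps that are not represented here.

```python
import numpy as np

def dbz_to_rain_rate(dbz, a=300.0, b=1.4):
    """Invert Z = a * R**b for the rainfall rate R (mm/h).

    `dbz` is reflectivity in dBZ; a and b default to the Z-R coefficients
    quoted in the text.
    """
    z_linear = 10.0 ** (np.asarray(dbz, dtype=float) / 10.0)  # dBZ -> mm^6 m^-3
    return (z_linear / a) ** (1.0 / b)

def hourly_accumulation(dbz_scans, scan_minutes=5.0):
    """Sum rain rates from equally spaced scans into an accumulation (mm).

    Assumes each scan's rate holds for `scan_minutes` (hypothetical spacing).
    """
    rates = dbz_to_rain_rate(dbz_scans)
    return float(np.sum(rates) * scan_minutes / 60.0)

if __name__ == "__main__":
    # Twelve 5-min scans at a constant 40 dBZ.
    print(hourly_accumulation([40.0] * 12))
```

With these coefficients a reflectivity of 40 dBZ corresponds to roughly 12 mm/h, so small reflectivity errors translate into large rainfall errors at high intensities.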
The operational gauge-based product was also generated at NCEP–EMC and retrieved from http://data.eol.ucar.edu/codiac/dss/id=21.088. In the NWS, this product is often referred to as the stage II gauge-only product; we call it the “gauge” product in this study. The product is derived from myriad rain gauge networks, all of which are automated and report in near–real time. The requirement of automatic collection and transmission of data typically means the instruments are either tipping-bucket or weighing gauges. The point estimates of rainfall are sampled on the same 4-km resolution grid [Hydrologic Rainfall Analysis Project (HRAP) grid] as the radar product using the optimal estimation technique described in Seo (1998). The gauge product did not incorporate ARS Micronet rain gauges, thus these datasets are independent.
The third NCEP–EMC product we use blends information from the above two sources and provides an opportunity for manual quality control and adjustment performed by NWS forecasters. We refer to this multisensor precipitation analysis simply as “stage IV.” Stage IV begins by mosaicking rainfall estimates from adjacent radars. A spatially variable bias field is computed on an hourly basis using collocated rain gauge amounts. The technique of radar-gauge blending has its roots in the precipitation process method (P1) originally developed at the Arkansas–Red Basin River Forecast Center. The bias field is sampled on the 4-km resolution grid using a weighted interpolation scheme. The bias is then reapplied to the radar product so that the spatial variability of rainfall resolved by radars is preserved, and the amounts are now calibrated to rain gauge accumulations. Additional details of the algorithm can be found at http://www.emc.ncep.noaa.gov/mmb/ylin/pcpanl/stage4/.
The TMPA rainfall products evaluated in this study were described by Huffman et al. (2007); a brief summary is provided here. The TRMM-3B42RT product uses information from the TRMM Combined Instrument (TCI), composed of data from the TRMM Ku-Band Precipitation Radar (PR) and Microwave Imager (TMI) on board the core satellite, microwave (MW) data from a variety of low-earth-orbiting satellites, and IR radiance on board a constellation of geostationary satellites. Huffman et al. (2009) described how TRMM-3B42RT is then scaled using the TCI. The intention of this scaling, performed in near–real time, is to minimize the need to incorporate rain gauges to remove bias. The 3B42RT product is derived entirely from remote sensing instruments, providing quasi-global rainfall coverage (50°N/S) at 3-h frequency with a spatial resolution of 0.25° × 0.25°. The second TMPA product evaluated, TRMM-3B42V6, is generated for research purposes and is available approximately 10–15 days after the end of each month. TRMM-3B42V6 incorporates monthly gauge accumulations comprising the 1.0° × 1.0° Global Precipitation Climatology Product (GPCP) as well as the 0.5° × 0.5° Climate Assessment and Monitoring System (CAMS). It is likely that gauges used to construct the gauge product are also incorporated in the GPCP and CAMS datasets. Bias ratios are computed on a monthly basis and then applied back to the 3-h, satellite-based rainfall accumulations on the same 0.25° resolution grid.
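The monthly bias adjustment described for TRMM-3B42V6 can be pictured, for a single pixel, as a ratio of monthly totals applied back to every 3-h value. The sketch below is a deliberately simplified illustration under that assumption; the function name and the single-pixel treatment are hypothetical, and the actual TMPA procedure (Huffman et al. 2007) is more involved.

```python
import numpy as np

def apply_monthly_bias(sat_3h_mm, gauge_month_mm):
    """Scale 3-h satellite accumulations so their monthly sum matches a gauge total.

    sat_3h_mm      : all 3-h satellite accumulations (mm) in one month, one pixel
    gauge_month_mm : monthly gauge-analysis accumulation (mm) for the same pixel
    Returns the adjusted series and the bias ratio that was applied.
    """
    sat_3h_mm = np.asarray(sat_3h_mm, dtype=float)
    sat_total = sat_3h_mm.sum()
    # With no satellite rain there is nothing to rescale, so leave the series as is.
    ratio = gauge_month_mm / sat_total if sat_total > 0.0 else 1.0
    return sat_3h_mm * ratio, float(ratio)

if __name__ == "__main__":
    adjusted, ratio = apply_monthly_bias([1.0, 2.0, 3.0, 4.0], 5.0)
    print(ratio, adjusted)
```

Because one ratio is shared by every 3-h value in the month, an adjustment of this kind cannot correct an individual storm whose error differs in sign from the monthly bias.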
The PERSIANN-CCS algorithm extracts cloud features from IR geostationary satellite imagery to estimate rainfall at a resolution of 0.04° × 0.04° every hour. Limitations of IR-based algorithms include the indirectness of brightness temperatures Tb at cloud top to surface rainfall rate R and establishing rain–no rain Tb thresholds. PERSIANN-CCS addresses these issues by segmenting cloud features using an artificial neural network, thus enabling the use of multiple Tb–R relationships in a single IR image. Similar to TRMM, PERSIANN-CCS has a real-time product called PERSIANN-CCS-RT and a postprocessed algorithm called PERSIANN-CCS-MW, which is generated and archived after several days’ delay. The latter algorithm adjusts the real-time product through the use of MW-based rainfall estimates available from low-earth-orbiting satellite platforms (e.g., TRMM, etc.) using a self-organizing nonlinear output model (SONO; Hong et al. 2005). The MW-based scaling to the PERSIANN-CCS-RT product is based on three monthly accumulations and then applied to each hourly rainfall accumulation. This product is referred to as PERSIANN-CCS-MW in this study. Real-time data from the current version of PERSIANN-CCS-RT are available online both at regional (http://hydis8.eng.uci.edu/CCS/) and global scales (http://hydis8.eng.uci.edu/GCCS/).
b. Rainfall comparison
Hourly rain gauge estimates constituting the ARS Micronet are sampled on a grid with 4-km mesh (shown in Fig. 1) using a two-parameter, IDW scheme. The parameters of the IDW scheme are the shape and cutoff radius of the weighting function. These parameters are optimized for each hour using a leave-one-out, cross-validation scheme. Basin averages are computed and then assumed hereafter to represent the true rainfall to evaluate other rainfall products. From June to August 2007, the basin-averaged rainfall accumulation over Ft. Cobb is 499 mm. Figure 2a shows the TRMM-3B42RT product, based entirely on remote sensing data, generally overestimates rainfall with the exception of the TS Erin case on 19 August 2007. The monthly gauge adjustments applied to yield TRMM-3B42V6 reduce the original 237-mm bias down to −117 mm. Most of the negative rainfall bias from TRMM-3B42V6 occurs with the TS Erin case. The radar product grossly overestimates basin rainfall by 391 mm. However, the hourly adjustment by gauges and forecaster quality control procedures in the stage IV product result in a bias of only 37 mm that is consistent throughout the summer. It is not known if this degree of improvement is due to the forecaster quality control procedures or application of hourly rain gauge bias adjustments. Future work should consider adjusting the TRMM-3B42RT product using hourly rather than monthly rain gauge accumulations.
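The gridding step can be sketched as a two-parameter IDW interpolator whose parameters are chosen by leave-one-out cross-validation. The weighting form (distance raised to a negative power inside a cutoff radius), the candidate parameter values, and all function names are assumptions for illustration, since the exact weighting function is not given here.

```python
import numpy as np

def idw_estimate(xy_obs, values, xy_target, power, radius):
    """IDW estimate at one point using gauges within `radius` (same units as xy)."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    values = np.asarray(values, dtype=float)
    d = np.linalg.norm(xy_obs - np.asarray(xy_target, dtype=float), axis=1)
    if np.any(d == 0.0):                 # target coincides with a gauge
        return float(values[np.argmin(d)])
    inside = d <= radius
    if not np.any(inside):               # no gauge in range: fall back to nearest
        return float(values[np.argmin(d)])
    w = d[inside] ** (-power)            # assumed weighting form
    return float(np.sum(w * values[inside]) / np.sum(w))

def loo_rmse(xy_obs, values, power, radius):
    """Leave-one-out RMSE: predict each gauge from all the others."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    values = np.asarray(values, dtype=float)
    errors = []
    for i in range(len(values)):
        keep = np.arange(len(values)) != i
        est = idw_estimate(xy_obs[keep], values[keep], xy_obs[i], power, radius)
        errors.append(est - values[i])
    return float(np.sqrt(np.mean(np.square(errors))))

def optimize_idw(xy_obs, values, powers=(1.0, 2.0, 3.0), radii=(10.0, 20.0, 40.0)):
    """Grid search over hypothetical candidate parameters for one hour of data."""
    scored = [(loo_rmse(xy_obs, values, p, r), p, r) for p in powers for r in radii]
    best_rmse, best_power, best_radius = min(scored)
    return best_power, best_radius, best_rmse

if __name__ == "__main__":
    gauges_km = [[0, 0], [10, 0], [0, 10], [10, 10], [5, 5]]
    rain_mm = [2.0, 4.0, 3.0, 6.0, 3.5]
    print(optimize_idw(gauges_km, rain_mm))
```

Repeating the search for every hour, as described above, lets the interpolation adapt to the spatial structure of each rainfall field.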
Basin-averaged rainfall from PERSIANN-CCS-RT overestimates accumulated rainfall by 160 mm (Fig. 2b). The incorporation of downscaled MW data halves this bias and thus leads to improvement in the rainfall estimates. Figure 2c shows an intercomparison of rainfall estimated by operational products available in real time. The most notable feature of this analysis is the 134-mm underestimation of rainfall by the operational rain gauge network. The temporal behavior of rainfall bias from PERSIANN-CCS-RT is similar to that of TRMM-3B42RT, but is smaller in magnitude. The ARS Micronet reference rainfall is then resampled to the spatiotemporal resolution of the 0.25°/3-h TRMM-3B42RT scale. A single satellite pixel, which corresponds to the perimeter of the grid shown in Fig. 1, covers a majority of the basin. Three-hourly rainfall rates are computed by taking the average rainfall centered at 0000, 0300, 0600, 0900, 1200, 1500, 1800, and 2100 UTC. Figure 2d shows the resampling to the satellite-pixel resolution results in a bias of 40 mm, slightly larger than that obtained with the finescale stage IV product.
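The temporal part of this resampling can be sketched as a centered 3-h average of the hourly basin-mean series. The window definition (the hour before, at, and after each nominal time) and the function name are assumptions for illustration; the exact window used in the study is not specified here.

```python
import numpy as np

def hourly_to_3h_rate(hourly_mm, first_center=0):
    """Average hourly accumulations (mm) in 3-h windows centered every third hour.

    hourly_mm    : continuous hourly series, index 0 = first hour of the record
    first_center : index of the first nominal 3-h time (e.g., 0000 UTC)
    Returns (center indices, mean rate in mm/h for each window). Windows at the
    record edges are clipped to the available hours.
    """
    hourly_mm = np.asarray(hourly_mm, dtype=float)
    n = hourly_mm.size
    centers = np.arange(first_center, n, 3)
    rates = np.empty(centers.size)
    for k, c in enumerate(centers):
        lo, hi = max(c - 1, 0), min(c + 2, n)   # hours c-1, c, c+1
        rates[k] = hourly_mm[lo:hi].mean()
    return centers, rates

if __name__ == "__main__":
    centers, rates = hourly_to_3h_rate([0.0, 3.0, 0.0, 0.0, 6.0, 0.0], first_center=1)
    print(centers, rates)
```

Averaging smooths short bursts: in the example a 3-mm and a 6-mm hourly pulse become 3-h mean rates of 1 and 2 mm/h, which is the kind of information loss the resampled calibration has to absorb.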
4. Hydrologic evaluation
The hydrologic evaluation framework utilized herein follows the benchmark-dependent path of Hossain and Lettenmaier (2006), who called for a hydrologically relevant framework for assessing satellite-based rainfall algorithms in order to realize the full potential of the planned Global Precipitation Measurement (GPM) mission. In this section, we objectively calibrate a distributed hydrologic model using reference rainfall from the ARS Micronet. The reference rainfall is input into the model at 4-km/1-h and 0.25°/3-h resolutions, which correspond to the evaluated rainfall product resolutions. Statistics are developed to evaluate simulations forced by the evaluated rainfall products in relation to the simulations forced by the reference rainfall and observed streamflow.
The rainfall-runoff generation hydrologic model employed in this study is the Sacramento model (Burnash et al. 1973). The Hydrology Laboratory Research Distributed Hydrologic Model (HL-RDHM) was described in detail by Koren et al. (2004); a brief summary is provided here. The Sacramento Soil Moisture Accounting (SAC-SMA) model is applied to each 4-km grid cell for water balance. A total of 18 parameters and six state variables are used in HL-RDHM to represent water fluxes and contents for direct runoff from impervious surfaces, evapotranspiration from tension water held in both soil zones, infiltration to the upper free water soil zone, surface runoff generation, percolation to the lower soil zone, partitioning between tension and free water, subsurface outflow, baseflow, and routing. Eleven of the 18 parameters are spatially distributed and are provided with a priori estimates based on soil types and depths (Koren et al. 2000). Surface runoff is estimated as a saturation-excess process from states describing the tension and free water contents in the upper and lower zones, which is appropriate with the flat terrain and deep-layer soils present in the Ft. Cobb basin. This surface runoff is kinematically routed downstream based on cell connectivity, slope, and flow direction derived from a digital elevation model. Two parameters describe the channel routing component. A power-law equation employing a coefficient and exponent is used to describe the relation between discharge and cross-sectional area. These parameters are found empirically using measurements of cross-sectional area and discharge at the USGS stream-gauging site circled in Fig. 1.
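The empirical estimation of the two channel routing parameters can be illustrated with a least-squares fit in log-log space. The sketch assumes the relation is written as discharge equal to a coefficient times cross-sectional area raised to an exponent; the direction of the relation, the function name, and the sample numbers are assumptions for the example, not values from the stream-gauging site.

```python
import numpy as np

def fit_power_law(area_m2, discharge_m3s):
    """Fit Q = coeff * A**expo by linear regression of log(Q) on log(A)."""
    log_a = np.log(np.asarray(area_m2, dtype=float))
    log_q = np.log(np.asarray(discharge_m3s, dtype=float))
    expo, log_coeff = np.polyfit(log_a, log_q, 1)   # slope, intercept
    return float(np.exp(log_coeff)), float(expo)

if __name__ == "__main__":
    # Hypothetical paired measurements of cross-sectional area and discharge.
    area = np.array([2.0, 5.0, 12.0, 30.0])
    flow = 0.4 * area ** 1.3
    print(fit_power_law(area, flow))
```

Fitting in log space weights relative rather than absolute errors, so the few high-flow measurements do not dominate the estimate.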
Forcing to HL-RDHM includes hourly and 3-h rainfall estimates resampled on the 4-km and 0.25° resolution grids and potential evaporation and adjustment grids (their multiplication yields potential evapotranspiration) for each month. The study period includes a model calibration period from 1 June 2005 through 30 June 2008 and hydrologic evaluation from 1 June to 31 August 2007. The model requires a lengthy (at least three years) period of data for calibration, thus the model calibration period and evaluation periods overlap. No independent validation period is used to provide metrics on the model as a forecast tool; the emphasis in the study design is on the hydrologic performance of the simulations forced by different rainfall algorithms relative to the ARS Micronet data source used for calibration; that is, it is a sensitivity study.
b. Model calibration
Vrugt et al. (2009) developed a parameter estimation method called Differential Evolution Adaptive Metropolis (DREAM) that is based on an adaptive Markov Chain Monte Carlo (MCMC) algorithm. DREAM runs multiple Markov chains in parallel to estimate the marginal posterior parameter distributions and their covariances using the sum of squared errors as the model objective. The global parameter space is explored by tuning the scale and orientation of the proposal distributions. These proposal distributions are calculated based on multiple Markov chains from the prior iteration. As such, DREAM communicates information between the chains, while each chain can be run simultaneously using parallel computing networks. The major advantage of using DREAM versus conventional, manual calibration methods is its ability to estimate multiple parameters, many of which interact, in an automatic manner. Moreover, the use of DREAM negates the subjectivity inherent in manual methods.
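The idea of building proposals from the differences between parallel chains can be illustrated with a basic differential-evolution Markov chain sampler. This is a simplified relative of DREAM and not the algorithm of Vrugt et al. (2009), which adds randomized subspace sampling, adaptive crossover probabilities, and outlier-chain handling. The toy linear model, the error standard deviation, and all function names are assumptions for the example.

```python
import numpy as np

def de_mc(log_post, n_chains, n_dim, n_iter, rng, init_scale=2.0):
    """Differential-evolution MCMC: each proposal is the current state plus a
    scaled difference of two other chains, accepted by the Metropolis rule."""
    gamma = 2.38 / np.sqrt(2.0 * n_dim)            # standard jump scale
    x = rng.normal(scale=init_scale, size=(n_chains, n_dim))
    lp = np.array([log_post(xi) for xi in x])
    history = np.empty((n_iter, n_chains, n_dim))
    for t in range(n_iter):
        for i in range(n_chains):
            others = [j for j in range(n_chains) if j != i]
            r1, r2 = rng.choice(others, size=2, replace=False)
            prop = x[i] + gamma * (x[r1] - x[r2]) + rng.normal(scale=1e-4, size=n_dim)
            lp_prop = log_post(prop)
            if np.log(rng.uniform()) < lp_prop - lp[i]:
                x[i], lp[i] = prop, lp_prop
        history[t] = x
    return history

def make_sse_log_post(t, y, sigma):
    """Gaussian log-posterior (flat prior) built from the sum of squared errors
    of a toy linear model y = theta[0] + theta[1] * t."""
    t, y = np.asarray(t, dtype=float), np.asarray(y, dtype=float)
    def log_post(theta):
        resid = y - (theta[0] + theta[1] * t)
        return -0.5 * np.sum(resid ** 2) / sigma ** 2
    return log_post

def run_toy_example(seed=0):
    """Recover intercept 2.0 and slope 0.5 from synthetic, noise-free data."""
    rng = np.random.default_rng(seed)
    t = np.arange(20.0)
    log_post = make_sse_log_post(t, 2.0 + 0.5 * t, sigma=1.0)
    hist = de_mc(log_post, n_chains=8, n_dim=2, n_iter=2000, rng=rng)
    return hist[1000:].reshape(-1, 2).mean(axis=0)   # discard first half as burn-in

if __name__ == "__main__":
    print(run_toy_example())
```

Because the jump is built from chain differences, the proposal scale and orientation adapt to the posterior automatically, which is the property the text attributes to DREAM's proposal distributions.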
It is recognized that the validity of the DREAM-estimated parameters is conditioned on there being no bias in the rainfall forcing. If bias is present in the model forcings, then it is very likely that DREAM will arrive at biased model parameters that conceal the rainfall bias. If the rainfall bias is later corrected but model parameters remain fixed, then the model will produce biased outputs, perhaps giving the impression that the rainfall forcings are in error. This model “self-adjustment process” is accounted for using a method called “assessment of rainfall inputs using DREAM” (ARID; Gourley et al. 2010a). The primary requirement of ARID is a reference or “true” rainfall dataset that is unbiased and has negligible random errors. In the unique case of Ft. Cobb, we rely on the high-density ARS Micronet rain gauge data to represent the true, unbiased rainfall. The ARS Micronet data are an independent dataset; they were not included in producing any of the gauge-based products such as gauge, stage IV, or TRMM-3B42V6.
An alternative approach to ARID would be to estimate model parameters using DREAM for each of the different rainfall forcings during a calibration period and then compare the model simulations in a validation period. In essence, each rainfall algorithm would have its own optimized parameter set (and model states). The drawback of such an approach is that the parameters would be conditioned on a potentially biased and erroneous rainfall record. Model simulations may be more accurate during validation, but they will be right for the wrong reasons. It is better practice to calibrate the hydrologic model to the true rainfall and runoff observations that represent the actual climatological response of the basin. This latter approach, which we have adopted in ARID, will make better use of GPM-era rainfall measurements that will continue to have incremental improvements and will not require a recalibration of the model following each version release. A more robust methodology, to be considered in future work, is to incorporate an algorithm that performs global parameter optimization as in DREAM but also simultaneously estimates state variables (Vrugt et al. 2005).
For a period spanning 1 June 2005 through 30 June 2008, rainfall forcing from the ARS Micronet rain gauges, sampled at 4 km/1 h and 0.25°/3 h, is used to calibrate HL-RDHM parameters within the DREAM framework. Following 647 425 function evaluations, DREAM converged on final parameter estimates for all 18 parameter multipliers applied at the 4-km/1-h resolution. Convergence was monitored and defined by the R statistic of Gelman and Rubin (1992). Recall 11 of the parameters are spatially distributed, so in actuality the number of parameter values is 36 grid cells multiplied by 11 distributed parameters plus 7 lumped parameters, or 403, for the 4-km model and 18 for the lumped 0.25° model. While DREAM returns the posterior probability distributions of the parameters, we chose the single, optimum parameter set because the hydrologic evaluation is focused on the relative hydrologic performance from the different rainfall forcings. In other words, we would arrive at the same conclusions in comparing an ensemble of simulations with different rainfall inputs as we would using the optimum parameter set. The DREAM-calibrated hydrograph is shown with observed discharge in Fig. 3. For illustrative purposes, we have transformed the discharge using the following equation as in Yilmaz et al. (2005):
where Q is the original flow and Qtrans is the transformed flow. The parameter estimation method yields a Nash–Sutcliffe coefficient of efficiency (NSCE; Nash and Sutcliffe 1970) of 0.83 and a fractional bias (FB; defined as the sum of simulated minus observed runoff and divided by the sum of observed runoff) of −24.56%. As shown in Fig. 3, the underestimation primarily is a result of the model’s inability to accurately simulate baseflow. When the ARS Micronet data are resampled to 0.25°/3-h resolution and the calibration is repeated, the NSCE drops to 0.64. The model was run fully distributed in both cases, thus the decrease in skill is due to the loss of information in the rainfall aggregation process.
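Two diagnostics used in this subsection, the R statistic of Gelman and Rubin (1992) for monitoring convergence and the NSCE and FB scores for summarizing the calibrated simulation, can be sketched as follows. The function names are hypothetical, the R statistic is written in its basic single-parameter form, and FB follows the verbal definition given above.

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for one parameter.

    chains: array-like of shape (m_chains, n_samples).
    """
    chains = np.asarray(chains, dtype=float)
    n = chains.shape[1]
    w = chains.var(axis=1, ddof=1).mean()          # mean within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)        # between-chain variance
    var_hat = (n - 1) / n * w + b / n              # pooled variance estimate
    return float(np.sqrt(var_hat / w))

def nsce(sim, obs):
    """Nash-Sutcliffe coefficient of efficiency (1 = perfect, 0 = no better than the observed mean)."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return float(1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2))

def fractional_bias(sim, obs):
    """Sum of simulated minus observed runoff over the sum of observed runoff, in percent."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return float(100.0 * np.sum(sim - obs) / np.sum(obs))

if __name__ == "__main__":
    obs = np.array([1.0, 2.0, 3.0, 4.0])
    print(nsce(0.9 * obs, obs), fractional_bias(0.9 * obs, obs))
```

Values of the R statistic close to 1 indicate that the chains are sampling the same distribution; a threshold such as 1.2 is commonly used in practice, although the threshold applied in this study is not stated here.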
c. Streamflow simulations
The next procedure in ARID replaces the ARS Micronet rainfall inputs with those described in section 3 and compares the resulting hydrographs during the evaluation period from 1 June through 31 August 2007. Figure 4 shows the observed and simulated hydrographs over the three-month comparison period. Figure 4a shows the impacts of rain gauge adjustments to the radar and TRMM-3B42RT rainfall algorithms. TRMM-3B42RT and TRMM-3B42V6 are referred to as RT-RESAMPLE and V6-RESAMPLE when the products have been input into the model calibrated by ARS Micronet rainfall resampled at 0.25°/3-h resolution. The hourly rain gauge adjustments to the stage IV rainfall yield substantial improvements over the simulations forced by radar. The radar simulation has significant erroneous flows from 10 to 17 August 2007, which makes it an outlier relative to all the other simulations. The radar rainfall product can have errors with nonprecipitating echoes because of anomalous propagation of the radar beam. This causes backscattered energy from the ground to be received by the radar, and thus produces the incorrect appearance of rainfall. The degree of improvement of V6-RESAMPLE over RT-RESAMPLE is not as obvious; further details are elucidated in forthcoming statistical comparisons. Figure 4b reveals the hydrologic impacts of utilizing downscaled MW data to create the PERSIANN-CCS-MW product. In general, the simulation from PERSIANN-CCS-MW is the same or slightly lower than PERSIANN-CCS-RT. However, this negative bias resulting from the introduction of downscaled MW data is not always correct, as seen in the time series after 16 August 2007. Figure 4c compares in situ and remote sensing algorithms that are used in real time for operational purposes either in the United States (stage IV and gauge) or globally (RT-RESAMPLE and PERSIANN-CCS-RT). The multisensor stage IV product clearly outperforms the other rainfall algorithms, while simulations forced by gauge tend to underestimate flows.
It is difficult to pinpoint obvious advantages in comparing simulations forced by PERSIANN-CCS-RT and RT-RESAMPLE. Figure 4d reveals the impacts due to resampling the reference rainfall to the TRMM data resolution and recalibrating the hydrologic model. Significant improvements are visible when the TRMM-3B42RT data are input to the model calibrated at the coarser resolution (i.e., RT-RESAMPLE). This improvement is not as obvious in the case of TRMM-3B42V6. We see a reduction of skill when using the coarser ARS-RESAMPLE as compared to the ARS Micronet forcing.
Next, we compute the FB (in %), root-mean-square error (RMSE; in m3 s−1), and Micronet-relative efficiency (MRE) for the different simulations shown in Fig. 4:
\[ \mathrm{FB} = \frac{\sum_{i=1}^{N} Q_i^{R} - \sum_{i=1}^{N} Q_i^{\mathrm{obs}}}{\sum_{i=1}^{N} Q_i^{\mathrm{obs}}} \times 100, \tag{2} \]
\[ \mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( Q_i^{R} - Q_i^{\mathrm{obs}} \right)^{2}}, \tag{3} \]
\[ \mathrm{MRE} = 1 - \frac{\sum_{i=1}^{N} \left( Q_i^{R} - Q_i^{\mathrm{obs}} \right)^{2}}{\sum_{i=1}^{N} \left( Q_i^{\mathrm{sim}} - Q_i^{\mathrm{obs}} \right)^{2}}, \tag{4} \]
where Q is the discharge at each ith 15-min time step and N is the number of time steps; a superscript of R refers to the rainfall algorithms being evaluated and obs is for observed streamflow. The MRE is similar in formulation to the more common NSCE. The difference is that the mean observed streamflow present in the denominator of NSCE is replaced with \(Q^{\mathrm{sim}}\), which is the calibrated simulation using DREAM-optimized parameters with 4-km rainfall forcing from ARS Micronet rain gauge data as shown in Fig. 3. The MRE score thus casts simulation skill in relation to the skill achievable by the calibrated simulation. A score of 0 indicates the R rainfall input results in the same efficiency that was obtained using the true ARS Micronet rainfall as input. A maximum score of 1 indicates the simulation skill exceeds that produced by model calibration and agrees perfectly with observations. MRE scores worsen as they become more negative, approaching −∞ for the least skill. Each performance metric is analyzed for a spectrum of observed flows ranging from >0.5 to >21.1 m3 s−1, thus providing a relevant statistical characterization in a hydrologic context for each algorithm.
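The three scores described above, evaluated only at time steps where the observed flow exceeds a threshold, can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the short discharge series are made up for the example, and FB and RMSE are assumed to take their standard forms.

```python
import numpy as np

def fractional_bias(q_r, q_obs):
    """FB (%): relative difference between total simulated and observed flow."""
    q_r, q_obs = np.asarray(q_r, float), np.asarray(q_obs, float)
    return float(100.0 * (q_r.sum() - q_obs.sum()) / q_obs.sum())

def rmse(q_r, q_obs):
    """Root-mean-square error, in the units of the discharge series."""
    q_r, q_obs = np.asarray(q_r, float), np.asarray(q_obs, float)
    return float(np.sqrt(np.mean((q_r - q_obs) ** 2)))

def mre(q_r, q_ref, q_obs):
    """Micronet-relative efficiency: squared error of the evaluated simulation
    relative to the squared error of the calibrated reference simulation."""
    q_r, q_ref, q_obs = (np.asarray(a, float) for a in (q_r, q_ref, q_obs))
    sse_r = np.sum((q_r - q_obs) ** 2)
    sse_ref = np.sum((q_ref - q_obs) ** 2)
    return float(1.0 - sse_r / sse_ref)

def scores_above(threshold, q_r, q_ref, q_obs):
    """Compute all three scores where observed flow exceeds the threshold."""
    q_r, q_ref, q_obs = (np.asarray(a, float) for a in (q_r, q_ref, q_obs))
    keep = q_obs > threshold
    return {
        "FB": fractional_bias(q_r[keep], q_obs[keep]),
        "RMSE": rmse(q_r[keep], q_obs[keep]),
        "MRE": mre(q_r[keep], q_ref[keep], q_obs[keep]),
    }

# Made-up 15-min discharges (m3 s-1), for illustration only.
q_obs = np.array([0.4, 0.8, 2.0, 6.0, 12.0, 5.0, 1.5, 0.6])  # observed
q_ref = np.array([0.5, 0.9, 1.8, 5.5, 11.0, 5.5, 1.7, 0.6])  # calibrated reference
q_r = np.array([0.3, 0.6, 1.5, 4.0, 8.0, 4.0, 1.2, 0.5])     # evaluated product

for thr in (0.5, 5.0):
    print(thr, scores_above(thr, q_r, q_ref, q_obs))
```

With these definitions, a simulation identical to the reference scores an MRE of 0 and one identical to the observations scores 1, matching the interpretation given in the text.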
The impact of rain gauge adjustment to remote sensing algorithms’ hydrologic performance in terms of FB is shown in Fig. 5a. The unadjusted radar product has extreme overestimation (138%) for observed flows >0.5 m3 s−1. This FB decreases to 70% for higher flows. The FB of the simulation forced by the stage IV product is approximately −10% and does not depend on flow magnitude. The application of monthly bias adjustment to 3-hourly rainfall products (i.e., V6-RESAMPLE) improves simulation bias only for flows >10.0 m3 s−1. The incorporation of bias adjustment at this scale has overcorrected the original positive FB in the RT-RESAMPLE simulation. The incorporation of MW data in PERSIANN-CCS-MW yields a very subtle reduction in FB from 8% to −6% for a majority of flows (95%) (Fig. 5b). These FBs become negligible with increasing flow exceedance threshold. Figure 5c shows a large negative FB of approximately −70% for all flow exceedance thresholds with the simulation forced by the NWS operational rain gauge network. In fact, the gauge FB is larger in magnitude than that of the RT-RESAMPLE simulation. The impact of considering rainfall product resolution in the model calibration process is shown in Fig. 5d. A negative FB of −50% for flows >0.5 m3 s−1 results when aggregating the ARS Micronet rainfall up to the same scale as the TRMM products and recalibrating the model. This has the same effect as comparing simulations from lumped, basin-averaged inputs to spatially distributed rainfall forcing. The consideration of product resolution has a positive impact on RT-RESAMPLE for a majority of flows, but this reduction in FB degrades for flows >10 m3 s−1. Improvements in the V6-RESAMPLE simulations over TRMM-3B42V6 occur with all observed discharges.
The impact of gauge adjustment on RMSE of hydrologic simulations indicates significant improvements in V6-RESAMPLE over RT-RESAMPLE and even more so in stage IV over radar (Fig. 6a). It is expected that RMSE will generally increase with discharge threshold because RMSE is not a normalized quantity. There is an increase in RMSE after incorporating downscaled MW data in PERSIANN-CCS-MW compared to PERSIANN-CCS-RT (Fig. 6b). For the real-time algorithms, the RMSE increases from stage IV to gauge to PERSIANN-CCS-RT to RT-RESAMPLE (Fig. 6c). Finally, resampling rainfall to the commensurate resolution in the model calibration process improves the V6-RESAMPLE simulation over TRMM-3B42V6, but the RMSE is worsened slightly with RT-RESAMPLE (Fig. 6d). Also, as noted previously, the model calibrated with the coarser resolution ARS Micronet rainfall (ARS-RESAMPLE) performs worse than the 4-km simulations.
The impact of gauge-adjustment strategies on MRE agrees with prior analyses in that the improvement over the radar simulation is considerable with the hourly adjusted stage IV product, resulting in an MRE of approximately −0.5 at all discharge thresholds (Fig. 7a). There is also improvement with monthly bias adjustments to yield V6-RESAMPLE. The MRE score indicates, however, that the adjustment of PERSIANN-CCS-RT with seasonal MW data results in worse performance (Fig. 7b). Also, despite both PERSIANN-CCS simulations being unbiased as a function of discharge threshold, each has reduced MRE scores with increasing flows. Simulations from stage IV forcing perform best in real time while PERSIANN-CCS-RT is the worst (Fig. 7c). Curiously, RT-RESAMPLE is only slightly worse than gauge MRE for a majority of discharge values (95%), and RT-RESAMPLE becomes more skillful for discharges >10 m3 s−1. When we consider product resolution in model calibration, we see the improvement to RT-RESAMPLE over TRMM-3B42RT is substantial (Fig. 7d). In fact, the consideration of product resolution in the nongauge-adjusted RT-RESAMPLE yields a generally higher MRE than the gauge-adjusted TRMM-3B42V6 simulation that was calibrated at 4-km resolution. This result highlights the importance of considering product spatiotemporal resolution in the model calibration step. Further improvements occur with V6-RESAMPLE, which yields an approximate MRE value of −3 that does not degrade with increasing discharges.
To combine aspects of Figs. 5–7 describing the hydrologic skill in a condensed format, we utilize two statistical measures summarizing the precision and accuracy of the simulations for the three-month period of study. Precision is measured by the MRE defined in (4) and the accuracy, or bias, is computed as follows:
where MRB (in %) is the Micronet-relative bias for simulations corresponding to each of the rainfall inputs R. An MRB of 0% indicates the simulation bias was the same as that achieved with the unbiased rainfall source used in model calibration (i.e., the calibrated simulation forced by ARS Micronet rainfall). The two statistics shown in (4)–(5) have been designed with the expectation that simulation skill will generally worsen when the rainfall forcing deviates from the “true rainfall” that was used in the calibration step.
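To illustrate how a simulation is placed in the MRE–MRB plane, the sketch below pairs the MRE with one plausible form of the MRB. The MRB expression used here is an assumption made for illustration: it is taken to be the percentage difference in total simulated flow relative to the calibrated reference simulation, which is zero when the two totals match, as the description above requires. The authors' exact definition may differ, and all names and numbers are hypothetical.

```python
import numpy as np

def mre(q_r, q_ref, q_obs):
    """Micronet-relative efficiency: 0 matches the reference, 1 is perfect."""
    q_r, q_ref, q_obs = (np.asarray(a, float) for a in (q_r, q_ref, q_obs))
    return float(1.0 - np.sum((q_r - q_obs) ** 2) / np.sum((q_ref - q_obs) ** 2))

def mrb_assumed(q_r, q_ref):
    """ASSUMED form of MRB (%): total-flow difference relative to the
    calibrated reference simulation. Not taken from the source text."""
    q_r, q_ref = np.asarray(q_r, float), np.asarray(q_ref, float)
    return float(100.0 * (q_r.sum() - q_ref.sum()) / q_ref.sum())

def mre_mrb_point(q_r, q_ref, q_obs):
    """Return the (MRE, MRB) coordinates of one simulation."""
    return mre(q_r, q_ref, q_obs), mrb_assumed(q_r, q_ref)

# Made-up discharges (m3 s-1), for illustration only.
q_obs = np.array([0.8, 2.0, 6.0, 12.0, 5.0, 1.5])
q_ref = np.array([0.9, 1.8, 5.5, 11.0, 5.5, 1.7])
q_wet = 2.0 * q_ref  # a simulation with twice the reference volume

print("reference:", mre_mrb_point(q_ref, q_ref, q_obs))
print("wet-biased:", mre_mrb_point(q_wet, q_ref, q_obs))
```

Under this assumed form the reference simulation sits at the origin of the plane, and simulations move away from it as their error and volume bias grow.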
Figure 8 shows a two-dimensional plot of MRE and MRB for the rainfall algorithms evaluated in this study. Simulation skill equivalent to that achieved by ARS Micronet inputs at 4-km resolution with optimized model parameters will have MRE and MRB values of 0. The simulation forced with stage IV rainfall is only slightly worse than the ARS Micronet forcing used for calibration, thus confirming the high quality of this multisensor, quality-controlled rainfall product. The stage IV product is derived from the NEXRAD-based radar product, which is the worst performer, with an MRE of −43 and an MRB of 150%. The black curve shows the MRE–MRB scores when we introduce bias to the ARS Micronet rainfall estimates and subsequently evaluate the resulting hydrologic simulations. This curve shows how the MRE and MRB skill scores depend on each other when bias is the only source of error. The proximity of the radar’s MRE–MRB value to this curve indicates a majority of its error is due to bias. Evidently, the hourly rain gauge adjustment in the stage IV product is very effective in removing this bias and is thus a recommended practice for hydrologic application.
There are also improvements in the MRE–MRB domain with gauge adjustments applied to the TRMM-3B42RT product to yield TRMM-3B42V6. However, the hydrologic evaluation reveals MRB values of −43% with TRMM-3B42V6 and −40% with V6-RESAMPLE, indicating a negative bias was present in the monthly gauge dataset used for adjustment. Most of this bias occurs with the TS Erin event as shown in Fig. 2. The PERSIANN-CCS-MW and PERSIANN-CCS-RT simulations are relatively unbiased, but there is a loss in hydrologic skill at the seasonal scale for the Ft. Cobb basin following the integration of seasonally downscaled MW data. The rainfall study of Gourley et al. (2010b) also found degraded performance at daily and hourly scales with PERSIANN-CCS-MW compared to PERSIANN-CCS-RT.
In comparing the products that are available in real time for operational use, we see the stage IV product is clearly the best. Note that this product requires a properly maintained, dense network of ground-based radars, automated rain gauge networks, and forecasters to manually quality control the product. Thus, it is a rather expensive and demanding product. Nonetheless, we can see the gauge correction and forecaster adjustment steps are very effective procedures because the simulation based on radar data alone (i.e., radar) is the worst. According to the MRE–MRB analysis, the model calibrated with the 0.25°/3-h resolution ARS Micronet rainfall and then forced with TRMM-3B42RT (i.e., RT-RESAMPLE) produced a simulation that is only slightly worse than gauge and better than PERSIANN-CCS-RT. The consideration of product resolution in the model calibration step is a significant outcome of this study. Figure 8 indicates significant improvements in RT-RESAMPLE and V6-RESAMPLE over the TRMM-3B42RT and TRMM-3B42V6 simulations that were from the model calibrated at 4-km resolution. One might argue that perhaps the coarser-scale model was simply calibrated better. However, we see a reduction in skill with ARS-RESAMPLE compared to the ARS Micronet simulation. Two conclusions follow. First, this result on Ft. Cobb indicates 4-km resolution rainfall results in improved hydrologic simulations over 0.25°/3-h resolution rainfall. In the case of the 342-km2 Ft. Cobb basin, this statement also means distributed rainfall inputs are better for hydrologic simulation than basin-averaged (lumped) values. Second, the scale of the rainfall product used to estimate model parameters must remain the same from calibration to validation or prediction. In other words, distributed hydrologic parameter settings are sensitive to the spatiotemporal scale of rainfall forcing.
Finally, we focus on TS Erin to determine if the seasonal statistics adequately describe the expected skill associated with this rare, extreme event. The 500-yr recurrence interval of 3-h rainfall with TS Erin quantifies its rarity. The hydrologic simulations shown in Fig. 9 indicate that TS Erin indeed poses exceptional challenges to remote sensing algorithms. Peak flow is underestimated from all hydrologic simulations. The most egregious errors occur with the TRMM-3B42V6 and V6-RESAMPLE simulations that have peak flows of only 12.2 and 22.8 m3 s−1, to be compared with the observed peak flow of 209.5 m3 s−1. Here, we see the monthly gauge adjustment to TRMM-3B42RT did not accurately apply to this event, leading to worse performance than with RT-RESAMPLE. This finding for TS Erin is in contrast to the seasonal runoff statistics. Comparisons between PERSIANN-CCS-MW and PERSIANN-CCS-RT indicate the latter algorithm is more skillful for all flows, including TS Erin, according to all analyzed statistics.
The ranking of the simulation skill from the real-time rainfall algorithms for TS Erin becomes essentially reversed from the seasonal analysis, with the notable exception of stage IV. The simulation from gauge suffers from inadequate spatial density of operational rain gauges to capture the details of high-intensity rainfall. The real-time, remote sensing algorithms (i.e., radar, PERSIANN-CCS-RT, and TRMM-3B42RT), on the other hand, are capable of observing the event with adequate resolution, but the retrieval algorithms themselves fail for this particular case. Arndt et al. (2009) noted TS Erin acquired tropical characteristics, including a warm-core vortex, during its reintensification after it had made landfall more than 500 km inland from the Gulf of Mexico, which is also considered a very infrequent occurrence. During this transition, Gourley et al. (2010a) found that rainfall estimates using radar Z with the NEXRAD Z–R relation underestimated observed rainfall amounts by 39%. They attributed this bias to a tropical drop size distribution (DSD) characterized by high concentrations of small drops, in contrast to the convective thunderstorms for which the default Z–R equation is tuned. It is quite likely that IR observations at cloud top were also unable to distinguish the efficient warm rain microphysical processes that produced the tropical DSDs and high rainfall rates. This anomalous DSD, which resulted in underestimated rainfall rates from an active microwave sensor (NEXRAD), will also appear unremarkable from space-based active and passive microwave sensors. However, recent technological advances such as dual-frequency and dual-polarization radar have the potential to improve rainfall estimation.
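The sensitivity of radar rainfall to the assumed Z–R relation can be illustrated with a short calculation. The coefficients below are not given in this study; they are the commonly cited NEXRAD convective default (Z = 300R^1.4) and a commonly cited tropical relation (Z = 250R^1.2), used here as assumptions. The sketch does not attempt to reproduce the 39% underestimate reported for TS Erin.

```python
import math

def rain_rate_from_dbz(dbz, a, b):
    """Invert Z = a * R**b for the rain rate R (mm/h).
    dbz is reflectivity in dBZ; Z is in linear units (mm^6 m^-3)."""
    z_linear = 10.0 ** (dbz / 10.0)
    return (z_linear / a) ** (1.0 / b)

def dbz_from_rain_rate(rate, a, b):
    """Forward relation: reflectivity in dBZ for a rain rate in mm/h."""
    return 10.0 * math.log10(a * rate ** b)

# Assumed coefficients (a, b); not taken from the source text.
CONVECTIVE = (300.0, 1.4)
TROPICAL = (250.0, 1.2)

for dbz in (35.0, 45.0, 50.0):
    r_conv = rain_rate_from_dbz(dbz, *CONVECTIVE)
    r_trop = rain_rate_from_dbz(dbz, *TROPICAL)
    print(f"{dbz:.0f} dBZ: convective {r_conv:.1f} mm/h, tropical {r_trop:.1f} mm/h")
```

For the same reflectivity, the tropical relation returns a higher rain rate, which is consistent with the convective default underestimating rainfall when the drop population is dominated by many small drops.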
The impact of calibrating the model with reference rainfall at the satellite-pixel resolution is significant with TS Erin, which is also the case with the seasonal analysis. That is, improved simulations result with RT-RESAMPLE and V6-RESAMPLE over the high-resolution model forced with the same inputs (i.e., TRMM-3B42RT and TRMM-3B42V6). Peak flow simulation worsens when using the reference rainfall in the coarse-resolution model as compared to the 4-km one. This result agrees with the seasonal analysis and highlights the need to consider the resolution of model inputs when estimating model parameters for future predictions. Overall, simulations of this extreme flooding event highlight the challenges that remain with rainfall retrievals based on remote sensing data such as radar reflectivity and satellite MW and IR data.
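The resampling of reference rainfall to the satellite-pixel resolution discussed above can be sketched as a block aggregation in time and space. This is a minimal illustration under stated assumptions, not the procedure used in the study: the input is taken to be hourly accumulations on a regular 4-km grid, and one 0.25° pixel is approximated by a 6 × 6 block of 4-km cells. The function name and the synthetic rainfall field are hypothetical.

```python
import numpy as np

def aggregate_rainfall(rain, t_factor=3, s_factor=6):
    """Aggregate hourly rainfall accumulations of shape (time, y, x).

    Accumulations are summed over t_factor hours, then averaged over
    s_factor x s_factor blocks of grid cells (area-mean rainfall depth).
    """
    rain = np.asarray(rain, float)
    nt, ny, nx = rain.shape
    if nt % t_factor or ny % s_factor or nx % s_factor:
        raise ValueError("grid dimensions must be multiples of the factors")
    # Sum consecutive hours into multi-hour accumulations.
    acc = rain.reshape(nt // t_factor, t_factor, ny, nx).sum(axis=1)
    # Average each spatial block to obtain the coarse-pixel depth.
    blocks = acc.reshape(acc.shape[0], ny // s_factor, s_factor,
                         nx // s_factor, s_factor)
    return blocks.mean(axis=(2, 4))

# Synthetic hourly rainfall (mm) on a 12 x 12 grid for 6 h, illustration only.
rng = np.random.default_rng(0)
hourly = rng.gamma(shape=0.5, scale=4.0, size=(6, 12, 12))

coarse = aggregate_rainfall(hourly)
print(hourly.shape, "->", coarse.shape)
print("peak 3-h cell total:", hourly.reshape(2, 3, 12, 12).sum(axis=1).max())
print("peak 3-h coarse-pixel mean:", coarse.max())
```

The aggregation conserves the total rainfall volume while smoothing local maxima, which is one reason a model calibrated at one resolution may need different parameters at another.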
5. Summary and conclusions
In this study, we analyze the same rainfall products evaluated by Gourley et al. (2010b) over the same time period, but employ a distributed hydrologic model on a densely instrumented, 342-km2 catchment in Oklahoma to assess the hydrologic skill of the rainfall algorithms. The Ft. Cobb watershed features a Micronet of several atmospheric and soil measurements including rain gauges, providing a unique opportunity to calibrate the hydrologic model parameters with an independent, unbiased rainfall dataset. After the hydrologic model parameters are automatically estimated for a 3-yr calibration period, the parameters are fixed, and the true Micronet rainfall inputs are replaced by the multisource rainfall algorithms. This procedure is repeated for the same ARS Micronet rainfall inputs that have been aggregated up to the 0.25°/3-h scale of the TRMM precipitation products. A hydrologic evaluation is then performed for the summer of 2007, which turns out to be the wettest season on record in the state of Oklahoma. Furthermore, the basin was struck by a particularly damaging flood from a reintensifying TS Erin that yielded a 3-h rainfall accumulation of 187 mm, which was determined to have a 500-yr recurrence interval.
Below, we summarize the main findings from the seasonal hydrologic evaluation of rainfall inputs from radar, satellite, gauges, and combinations. While the results from this study come from the HL-RDHM model applied to the 342-km2 Ft. Cobb basin, many of the conclusions apply to the hydrologic application of rainfall algorithms more broadly.
Bias correction to TRMM-3B42RT and the NEXRAD radar products using monthly and hourly rain gauge accumulations, respectively, leads to improvements in hydrologic skill according to all analyzed statistics. This degree of improvement is most pronounced with the stage IV product that employs adjustments on an hourly basis and also incorporates forecaster quality control.
Adjustment of PERSIANN-CCS-RT using downscaled microwave data yielding the PERSIANN-CCS-MW product results in no improvements in hydrologic simulation skill.
Consideration of rainfall product resolution in the hydrologic modeling process is found to be fundamental. After the ARS Micronet reference rainfall is aggregated to the scale of the TRMM-3B42 products and the model is recalibrated, the MRE scores improve from −28.9 to −10.8 with TRMM-3B42RT forcing and from −10.3 to −2.6 with TRMM-3B42V6 forcing.
Better simulations occurred with ARS Micronet reference rainfall forcing at 4-km/1-h scale as compared to 0.25°/3-h. This result highlights the need for high-resolution, accurate rainfall for distributed hydrologic modeling.
An intercomparison of simulations forced by rainfall algorithms that are produced in real time for operational purposes identifies the stage IV product as having the best hydrologic skill with an MRE of −0.5, which is only slightly worse than the reference simulation forced by the ARS Micronet rainfall. Simulations from TRMM-3B42RT forcing are better than PERSIANN-CCS-RT and equivalent to those forced by the operational rain gauge network (gauge). However, this high degree of skill is only achievable if the coarser TRMM product scale is taken into consideration in the model calibration process.
TS Erin was characterized by efficient, warm rain microphysical processes, which yield rather unremarkable scattering signatures from active and passive microwave measurements and give few clues as to the extreme rainfall rates at cloud top from IR measurements. As such, this event provides a unique and challenging case study for rainfall algorithms based on in situ and remote sensing data. The monthly gauge adjustment to TRMM-3B42RT yielding TRMM-3B42V6 did not accurately apply to this damaging flood case; peak flow was underestimated by an order of magnitude. Although Ebert et al. (2007) and Gourley et al. (2010b) found the monthly gauge corrections offered submonthly improvements (i.e., at daily and 3-hourly time scales), this was not the case for this extreme event. The operational rain gauge network (gauge) is too sparse to capture the high-intensity rainfall with TS Erin, causing peak flow to be underestimated by 84%. Finally, simulations from radar, PERSIANN-CCS-RT, and TRMM-3B42RT yield less than half the observed peak flow. While these remote sensing algorithms are capable of observing the rainfall event, the retrieval algorithms failed to produce the observed high-intensity rainfall rates. The tropical drop size distribution with TS Erin produced an inconspicuous scattering signature, even with an active microwave sensor.
This study highlights the need to design remote sensing rainfall algorithms so that they perform accurately for all events ranging from common to rare. These latter events are particularly challenging from an operational remote sensing perspective because of the frequently noted presence of efficient, warm rain processes, which do not have strong scattering signatures. It is possible that these events will be better detected from the planned GPM dual-frequency measurements—a topic inviting future research. Other future topics worth exploring are performing the hydrologic evaluation with a different hydrologic model, adding a data assimilation component to reduce model uncertainty, and evaluating rainfall estimates on different study basins.
Acknowledgments. Funding was provided by NOAA/Office of Oceanic and Atmospheric Research under NOAA–University of Oklahoma Cooperative Agreement NA17RJ1227, U.S. Department of Commerce. Stage II radar- and gauge-based products were provided by NCAR–EOL under sponsorship of the National Science Foundation (http://data.eol.ucar.edu/). The stage IV rainfall product was obtained at the National Weather Service’s National Precipitation Verification Unit (http://www.hpc.ncep.noaa.gov/npvu/). The authors would like to gratefully acknowledge Dr. Soroosh Sorooshian at the University of California, Irvine and Dr. George Huffman at NASA Goddard for providing the PERSIANN-CCS and TMPA products in this study, respectively. The authors would also like to thank Dr. Jasper Vrugt of the University of California, Irvine for making the DREAM parameter estimation method available to us. Computer resources for the calibration of the model were made available from the University of Oklahoma’s Supercomputing Center for Education and Research (OSCER).