Search Results
You are looking at 1 - 10 of 25 items for:
- Author or Editor: Martyn P. Clark
- Journal of Hydrometeorology
- Refine by Access: All Content
Abstract
A snow data assimilation study was undertaken in which real data were used to update a conceptual model, SNOW-17. The aim of this study is to improve the model’s estimate of snow water equivalent (SWE) by merging the uncertainties associated with meteorological forcing data and SWE observations within the model. This is done with a view to aiding the estimation of snowpack initial conditions for the ultimate objective of streamflow forecasting via a distributed hydrologic model. To provide a test of this methodology, the authors performed experiments at 53 stations in Colorado. In each case the situation of an unobserved location is mimicked, using the data at any given station only for validation; essentially, these are withholding experiments. Both ensembles of model forcing data and assimilated data were derived via interpolation and stochastic modeling of data from surrounding sources. Through a process of cross validation the error for the ensemble of model forcing data and assimilated observations is explicitly estimated. An ensemble square root Kalman filter is applied to perform assimilation on a 5-day cycle. Improvements in the resulting SWE are most evident during the early accumulation season and late melt period. However, the large temporal correlation inherent in a snowpack results in a less than optimal assimilation and the increased skill is marginal. Once this temporal persistence is removed from both model and assimilated observations during the update cycle, a result is produced that is, within the limits of available information, consistently superior to either the model or interpolated observations.
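As an illustration of the update step described above, the following is a minimal sketch of a serial ensemble square root filter for a single SWE state observed directly. It is a generic textbook form of the filter, not necessarily the variant used in the study; the function name `ensrf_update`, the ensemble size, and all numerical values are hypothetical.

```python
import numpy as np


def ensrf_update(ensemble, obs_value, obs_error_var):
    """Update a scalar-state ensemble with one direct observation (H = 1)."""
    ensemble = np.asarray(ensemble, dtype=float)
    prior_mean = ensemble.mean()
    perturbations = ensemble - prior_mean
    prior_var = perturbations.var(ddof=1)

    # Kalman gain applied to the ensemble mean.
    gain = prior_var / (prior_var + obs_error_var)
    # Reduced gain applied to the perturbations, so that no perturbed
    # observations are needed to obtain the correct posterior spread.
    alpha = 1.0 / (1.0 + np.sqrt(obs_error_var / (prior_var + obs_error_var)))

    posterior_mean = prior_mean + gain * (obs_value - prior_mean)
    posterior_perturbations = perturbations - alpha * gain * perturbations
    return posterior_mean + posterior_perturbations


rng = np.random.default_rng(0)
prior_swe = rng.normal(loc=300.0, scale=40.0, size=50)  # hypothetical SWE ensemble, mm
posterior_swe = ensrf_update(prior_swe, obs_value=340.0, obs_error_var=20.0 ** 2)
```

In this scalar case the posterior ensemble variance equals (1 - gain) times the prior variance, which is the property that distinguishes a square root filter from a naive update of every member with the full gain.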
Abstract
This paper describes a flexible method to generate ensemble gridded fields of precipitation in complex terrain. The method is based on locally weighted regression, in which spatial attributes from station locations are used as explanatory variables to predict spatial variability in precipitation. For each time step, regression models are used to estimate the conditional cumulative distribution function (cdf) of precipitation at each grid cell (conditional on daily precipitation totals from a sparse station network), and ensembles are generated by using realizations from correlated random fields to extract values from the gridded precipitation cdfs. Daily high-resolution precipitation ensembles are generated for a 300 km × 300 km section of western Colorado (dx = 2 km) for the period 1980–2003. The ensemble precipitation grids reproduce the climatological precipitation gradients and observed spatial correlation structure. Probabilistic verification shows that the precipitation estimates are reliable, in the sense that there is close agreement between the frequency of occurrence of specific precipitation events in different probability categories and the probability that is estimated from the ensemble. The probabilistic estimates have good discrimination in the sense that the estimated probabilities differ significantly between cases when specific precipitation events occur and when they do not. The method may be improved by merging the gauge-based precipitation ensembles with remotely sensed precipitation estimates from ground-based radar and satellites, or with precipitation and wind fields from numerical weather prediction models. The stochastic modeling framework developed in this study is flexible and can easily accommodate additional modifications and improvements.
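The ensemble generation step described above (correlated random fields used to extract values from per-cell cdfs) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the cells lie on a 1D transect, the spatial correlation is exponential, each cell's cdf is a mix of a dry probability and exponentially distributed wet amounts, and every probability of precipitation is greater than zero. The function name and all parameter values are hypothetical.

```python
import math
import numpy as np


def sample_precip_ensemble(pop, mean_wet, coords, corr_length, n_members, rng):
    """Draw precipitation fields by pushing correlated noise through per-cell cdfs.

    pop: probability of precipitation per cell (assumed > 0).
    mean_wet: mean wet-day amount per cell (exponential amounts are an
    illustrative assumption).
    """
    pop = np.asarray(pop, dtype=float)
    mean_wet = np.asarray(mean_wet, dtype=float)
    coords = np.asarray(coords, dtype=float)

    # Exponential spatial correlation between cells on a 1D transect.
    dist = np.abs(coords[:, None] - coords[None, :])
    corr = np.exp(-dist / corr_length) + 1e-10 * np.eye(coords.size)
    chol = np.linalg.cholesky(corr)

    # Correlated standard normal fields, one row per ensemble member.
    z = rng.standard_normal((n_members, coords.size)) @ chol.T
    # Map to cumulative probabilities with the standard normal cdf.
    u = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    u = np.clip(u, 0.0, 1.0 - 1e-12)

    # Invert the mixed cdf: dry below (1 - pop), exponential amounts above.
    wet = u > (1.0 - pop)
    u_wet = np.where(wet, (u - (1.0 - pop)) / pop, 0.0)
    return np.where(wet, -mean_wet * np.log1p(-u_wet), 0.0)


rng = np.random.default_rng(1)
cells = np.arange(50) * 2.0  # hypothetical transect with 2 km spacing
fields = sample_precip_ensemble(
    pop=np.full(50, 0.3), mean_wet=np.full(50, 5.0),
    coords=cells, corr_length=20.0, n_members=1000, rng=rng,
)
```

Because each member uses one spatially correlated field, neighboring cells tend to be wet or dry together, while the fraction of wet cells across many members stays close to the prescribed probability of precipitation.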
Abstract
The timing of snowmelt runoff (SMR) for 84 rivers in the western United States is examined to understand the character of SMR variability and the climate processes that may be driving changes in SMR timing. Results indicate that the timing of SMR for many rivers in the western United States has shifted to earlier in the snowmelt season. This shift occurred as a step change during the mid-1980s in conjunction with a step increase in spring and early-summer atmospheric pressures and temperatures over the western United States. The cause of the step change has not yet been determined.
Abstract
This paper examines an archive containing over 40 years of 8-day atmospheric forecasts over the contiguous United States from the NCEP reanalysis project to assess the possibilities for using medium-range numerical weather prediction model output for predictions of streamflow. This analysis shows the biases in the NCEP forecasts to be quite extreme. In many regions, systematic precipitation biases exceed 100% of the mean, with temperature biases exceeding 3°C. In some locations, biases are even higher. The accuracy of NCEP precipitation and 2-m maximum temperature forecasts is computed by interpolating the NCEP model output for each forecast day to the location of each station in the NWS cooperative network and computing the correlation with station observations. Results show that the accuracy of the NCEP forecasts is rather low in many areas of the country. Most apparent is the generally low skill in precipitation forecasts (particularly in July) and low skill in temperature forecasts in the western United States, the eastern seaboard, and the southern tier of states. These results outline a clear need for additional processing of the NCEP Medium-Range Forecast Model (MRF) output before it is used for hydrologic predictions.
Techniques of model output statistics (MOS) are used in this paper to downscale the NCEP forecasts to station locations. Forecasted atmospheric variables (e.g., total column precipitable water, 2-m air temperature) are used as predictors in a forward screening multiple linear regression model to improve forecasts of precipitation and temperature for stations in the National Weather Service cooperative network. This procedure effectively removes all systematic biases in the raw NCEP precipitation and temperature forecasts. MOS guidance also results in substantial improvements in the accuracy of maximum and minimum temperature forecasts throughout the country. For precipitation, forecast improvements were less impressive. MOS guidance increases the accuracy of precipitation forecasts over the northeastern United States, but overall, the accuracy of MOS-based precipitation forecasts is slightly lower than the raw NCEP forecasts.
Four basins in the United States were chosen as case studies to evaluate the value of MRF output for predictions of streamflow. Streamflow forecasts using MRF output were generated for one rainfall-dominated basin (Alapaha River at Statenville, Georgia) and three snowmelt-dominated basins (Animas River at Durango, Colorado; East Fork of the Carson River near Gardnerville, Nevada; and Cle Elum River near Roslyn, Washington). Hydrologic model output forced with measured-station data was used as “truth” to focus attention on the hydrologic effects of errors in the MRF forecasts. Eight-day streamflow forecasts produced using the MOS-corrected MRF output as input (MOS) were compared with those produced using the climatic Ensemble Streamflow Prediction (ESP) technique. MOS-based streamflow forecasts showed increased skill in the snowmelt-dominated river basins, where daily variations in streamflow are strongly forced by temperature. In contrast, the skill of MOS forecasts in the rainfall-dominated basin (the Alapaha River) was equivalent to the skill of the ESP forecasts. Further improvements in streamflow forecasts require more accurate local-scale forecasts of precipitation and temperature, more accurate specification of basin initial conditions, and more accurate model simulations of streamflow.
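The forward screening multiple linear regression used for MOS in the abstract above can be illustrated with a short sketch. It assumes a simple greedy rule that is not taken from the paper: at each step the predictor giving the largest increase in explained variance is added, and selection stops when that increase falls below `min_gain`. The function name, the threshold, and the synthetic data are all hypothetical.

```python
import numpy as np


def forward_screening(X, y, min_gain=0.01):
    """Greedy forward selection of predictors for a linear regression.

    Adds the predictor that most increases explained variance (R^2) and stops
    when the best remaining candidate adds less than min_gain (an assumed
    stopping rule). Returns the selected column indices and the fitted
    coefficients, intercept first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_predictors = X.shape
    total_ss = np.sum((y - y.mean()) ** 2)

    def fit(columns):
        # Ordinary least squares with an intercept on the chosen columns.
        design = np.column_stack([np.ones(n_samples), X[:, columns]])
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        r2 = 1.0 - np.sum((y - design @ coef) ** 2) / total_ss
        return r2, coef

    selected, best_r2 = [], 0.0
    best_coef = np.array([y.mean()])
    while len(selected) < n_predictors:
        candidates = [c for c in range(n_predictors) if c not in selected]
        scores = [fit(selected + [c]) for c in candidates]
        top = int(np.argmax([r2 for r2, _ in scores]))
        if scores[top][0] - best_r2 < min_gain:
            break
        selected.append(candidates[top])
        best_r2, best_coef = scores[top]
    return selected, best_coef


rng = np.random.default_rng(2)
predictors = rng.standard_normal((500, 5))  # hypothetical forecast variables
target = 2.0 * predictors[:, 0] - 1.5 * predictors[:, 2] + 0.1 * rng.standard_normal(500)
chosen, coefficients = forward_screening(predictors, target)
```

In this synthetic case only the two predictors that actually drive the target pass the screening; in an operational setting the equations would be fitted separately for each station, variable, and forecast lead time.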
Abstract
An effort is under way aimed at historical analysis and monitoring of the pan-Arctic terrestrial drainage system. A key element is the provision of gridded precipitation time series that can be readily updated. This has proven to be a daunting task. Except for a few areas, the station network is sparse, with large measurement biases due to poor catch efficiency of solid precipitation. The variety of gauges used by different countries along with different reporting practices introduces further uncertainty. Since about 1990, there has been serious degradation of the monitoring network due to station closure and a trend toward automation in Canada.
Station data are used to compile monthly gridded time series for the 30-yr period 1960–89 at a cell resolution of 175 km. The station network is generally sufficient to estimate the mean and standard deviation of precipitation at this scale (hence the statistical distributions). However, as the interpolation procedures must typically draw from stations well outside of the grid box bounds, grid box time series are poorly represented. Accurately capturing time series requires typically four stations per 175-km cell, but only 38% of cells contain even a single station.
Precipitation updates at about a 1-month time lag can be obtained by using the observed precipitation distributions to rescale precipitation forecasts from the NCEP-1 reanalysis via a nonparametric probability transform. While recognizing inaccuracies in the observed time series, cross-validated correlation analyses indicate that the rescaled NCEP-1 forecasts have considerable skill in some parts of the Arctic drainage, but perform poorly over large regions. Treating climatology as a first guess with replacement by rescaled NCEP-1 values in areas of demonstrated skill yields a marginally useful monitoring product on the scale of large watersheds. Further improvements are realized by assimilating data from a limited array of station updates via a simple replacement strategy, and by including aerological estimates of precipitation less evapotranspiration (P − ET) within the initial rescaling procedure. Doing a better job requires better observations and an improved atmospheric model. The new ERA-40 reanalysis may fill the latter need.
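The nonparametric probability transform mentioned above can be read as empirical quantile mapping: each forecast value is assigned its non-exceedance probability in the forecast climatology and then replaced by the observed value with the same probability. The sketch below follows that reading. The function name, the plotting-position convention, and the synthetic data are assumptions, and heavily tied values would need additional handling.

```python
import numpy as np


def probability_transform(values, forecast_climatology, observed_climatology):
    """Rescale forecasts so their distribution matches the observed one."""
    f_sorted = np.sort(np.asarray(forecast_climatology, dtype=float))
    o_sorted = np.sort(np.asarray(observed_climatology, dtype=float))
    # Plotting positions (i - 0.5) / n for each empirical cdf (assumed convention).
    p_f = (np.arange(f_sorted.size) + 0.5) / f_sorted.size
    p_o = (np.arange(o_sorted.size) + 0.5) / o_sorted.size

    # Step 1: non-exceedance probability of each value in the forecast climatology.
    prob = np.interp(values, f_sorted, p_f)
    # Step 2: observed value that has the same probability.
    return np.interp(prob, p_o, o_sorted)


rng = np.random.default_rng(3)
observed = rng.gamma(shape=2.0, scale=15.0, size=360)  # hypothetical monthly totals, mm
forecast = 2.0 * observed + 5.0                        # biased but perfectly ranked
rescaled = probability_transform(forecast, forecast, observed)
```

The transform corrects the distribution of the forecasts but not their ranking, which is why the abstract evaluates skill separately through cross-validated correlations.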
Abstract
Stations are an important source of meteorological data, but often suffer from missing values and short observation periods. Gap filling is widely used to generate serially complete datasets (SCDs), which are subsequently used to produce gridded meteorological estimates. However, the value of SCDs in spatial interpolation has scarcely been studied. Based on our recent efforts to develop an SCD over North America (SCDNA), we explore the extent to which gap filling improves gridded precipitation and temperature estimates. We address two specific questions: 1) Can SCDNA improve the statistical accuracy of gridded estimates in North America? 2) Can SCDNA improve estimates of trends in gridded data? In addressing these questions, we also evaluate the extent to which results depend on the spatial density of the station network and the spatial interpolation methods used. Results show that the improvement in statistical interpolation due to gap filling is more obvious for precipitation, followed by minimum temperature and maximum temperature. The improvement is larger when the station network is sparse and when simpler interpolation methods are used. SCDs can also notably reduce the uncertainties in spatial interpolation. Our evaluation across North America from 1979 to 2018 demonstrates that SCDs improve the accuracy of interpolated estimates for most stations and days. SCDNA-based interpolation also obtains better trend estimation than observation-based interpolation. This occurs because stations used for interpolation could change during a specific period, causing changepoints in interpolated temperature estimates and affecting the long-term trends of observation-based interpolation, which can be avoided using SCDNA. Overall, SCDs improve the performance of gridded precipitation and temperature estimates.
Abstract
Land models are increasingly used and preferred in terrestrial hydrological prediction applications. One reason for selecting land models over simpler models is that their physically based backbone enables wider application under different conditions. This study evaluates the temporal variability in streamflow simulations in land models. Specifically, we evaluate how the subsurface structure and model parameters control the partitioning of water into different flow paths and the temporal variability in streamflow. Moreover, we use a suite of model diagnostics typically not used in the land modeling community to clarify model weaknesses and identify a path toward model improvement. Our analyses show that the typical land model structure, with its functions for moisture movement between soil layers (an approximation of the Richards equation), has a distinctive signature where flashy runoff is superimposed on slow recessions. This hampers the application of land models in simulating flashier basins and headwater catchments where floods are generated. We demonstrate the added value of preferential flow in the model simulation by including macropores in both a toy model and the Variable Infiltration Capacity model. We argue that including preferential flow in land models is essential to enable their use for multiple applications across a myriad of temporal and spatial scales.
Abstract
Spatially distributed historical meteorological forcings (temperature and precipitation) are commonly incorporated into modeling efforts for long-term natural resources planning. For water management decisions, it is critical to understand the uncertainty associated with the different choices made in hydrologic impact assessments (choice of hydrologic model, choice of forcing dataset, calibration strategy, etc.). This paper evaluates differences among four commonly used historical meteorological datasets and their impacts on streamflow simulations produced using the Variable Infiltration Capacity (VIC) model. The four meteorological datasets examined here have substantial differences, particularly in minimum and maximum temperatures in high-elevation regions such as the Rocky Mountains. The temperature differences among meteorological forcing datasets are generally larger than the differences between calibration and validation periods. Of the four meteorological forcing datasets considered, there are substantial differences in calibrated model parameters and simulations of the water balance. However, no single dataset is superior to the others with respect to VIC simulations of streamflow. Also, optimal calibration parameter values vary across case study watersheds and select meteorological datasets, suggesting that there is enough flexibility in the calibration parameters to compensate for the effects of using select meteorological datasets. Evaluation of runoff sensitivity to changes in climate indicates that the choice of meteorological dataset may be as important in characterizing changes in runoff as climate change, supporting consideration of multiple sources of uncertainty in long-term planning studies.
Abstract
We propose a conceptual and theoretical foundation for information-based model benchmarking and process diagnostics that provides diagnostic insight into model performance and model realism. We benchmark against a bounded estimate of the information contained in model inputs to obtain a bounded estimate of information lost due to model error, and we perform process-level diagnostics by taking differences between modeled and observed transfer entropy networks. We use this methodology to reanalyze the recent Protocol for the Analysis of Land Surface Models (PALS) Land Surface Model Benchmarking Evaluation Project (PLUMBER) land model intercomparison project that includes the following models: CABLE, CH-TESSEL, COLA-SSiB, ISBA-SURFEX, JULES, Mosaic, Noah, and ORCHIDEE. We report that these models (i) use only roughly half of the information available from meteorological inputs about observed surface energy fluxes, (ii) do not use all information from meteorological inputs about long-term Budyko-type water balances, (iii) do not capture spatial heterogeneities in surface processes, and (iv) all suffer from similar patterns of process-level structural error. Because the PLUMBER intercomparison project did not report model parameter values, it is impossible to know whether process-level error patterns are due to model structural error or parameter error, although our proposed information-theoretic methodology could distinguish between these two issues if parameter values were reported. We conclude that there is room for significant improvement to the current generation of land models and their parameters. We also suggest two simple guidelines to make future community-wide model evaluation and intercomparison experiments more informative.
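Transfer entropy, the quantity behind the process networks described above, measures how much the current value of one variable reduces uncertainty about the next value of another beyond what that variable's own past already explains. The sketch below is a plug-in estimate at a lag of one step for integer-coded (already binned) series. It is an illustration only and not the estimator used in the paper; the function names and the synthetic series are hypothetical.

```python
from collections import Counter
import math
import numpy as np


def entropy_bits(*series):
    """Plug-in joint Shannon entropy (bits) of one or more symbol series."""
    counts = Counter(zip(*series))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def transfer_entropy(source, target):
    """Transfer entropy source -> target at lag 1, for integer-coded series."""
    source = np.asarray(source).tolist()
    target = np.asarray(target).tolist()
    t_next, t_now, s_now = target[1:], target[:-1], source[:-1]
    # TE = I(target_next ; source_now | target_now), expanded into entropies.
    return (
        entropy_bits(t_next, t_now)
        + entropy_bits(t_now, s_now)
        - entropy_bits(t_now)
        - entropy_bits(t_next, t_now, s_now)
    )


rng = np.random.default_rng(4)
driver = rng.integers(0, 2, size=5000)  # hypothetical binary forcing
response = np.roll(driver, 1)           # response copies the driver one step later
te_forward = transfer_entropy(driver, response)
te_backward = transfer_entropy(response, driver)
```

Here the forward estimate is close to 1 bit and the backward estimate close to 0, which reflects the direction of influence built into the synthetic series. Continuous variables such as surface fluxes would first need to be discretized, and the choice of bins affects the estimate.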
Abstract
The concepts of model benchmarking, model agility, and large-sample hydrology are becoming more prevalent in hydrologic and land surface modeling. As modeling systems become more sophisticated, these concepts can help improve modeling capabilities and understanding. In this paper, their utility is demonstrated with an application of the physically based Variable Infiltration Capacity model (VIC). The authors implement VIC for a sample of 531 basins across the contiguous United States, incrementally increase model agility, and perform comparisons to a benchmark. The use of a large-sample set allows for statistically robust comparisons and subcategorization across hydroclimate conditions. The benchmark is a calibrated, time-stepping, conceptual hydrologic model. This model is constrained by physical relationships such as the water balance; it complements purely statistical benchmarks because of its increased physical realism, and it permits physically motivated benchmarking using metrics that relate one variable to another (e.g., runoff ratio). The authors find that increasing model agility along the parameter dimension, as measured by the number of model parameters available for calibration, does increase model performance for calibration and validation periods relative to less agile implementations. However, as agility increases, transferability decreases, even for a complex model such as VIC. The benchmark outperforms VIC in even the most agile case when evaluated across the entire basin set. However, VIC meets or exceeds benchmark performance in basins with high runoff ratios (greater than ~0.8), highlighting the ability of large-sample comparative hydrology to identify hydroclimatic performance variations.
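The runoff ratio mentioned above relates one variable to another: total runoff divided by total precipitation over the same period. The following is a minimal sketch of that calculation, assuming both series are depths in millimetres over a common period. The function name `runoff_ratio` and the sample values are hypothetical, and the 0.8 threshold is taken from the approximate figure quoted in the abstract.

```python
import numpy as np

def runoff_ratio(runoff_mm, precip_mm):
    """Long-term runoff ratio: total runoff divided by total precipitation."""
    runoff_mm = np.asarray(runoff_mm, dtype=float)
    precip_mm = np.asarray(precip_mm, dtype=float)
    total_precip = precip_mm.sum()
    if total_precip <= 0:
        raise ValueError("total precipitation must be positive")
    return float(runoff_mm.sum() / total_precip)

# Hypothetical basin totals per period (mm); not data from the study.
precip = [10.0, 0.0, 5.0, 20.0, 15.0]
runoff = [8.0, 2.0, 4.0, 16.0, 12.0]

ratio = runoff_ratio(runoff, precip)
is_high_runoff_basin = ratio > 0.8   # approximate threshold from the abstract
print(f"runoff ratio = {ratio:.2f}, high-runoff basin: {is_high_runoff_basin}")
```

Grouping basins by a metric of this kind is what allows a large-sample comparison to show where one model meets or exceeds another.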