Development of a “Nature Run” for Observing System Simulation Experiments (OSSEs) for Snow Mission Development

Melissa L. Wrzesien (Hydrological Sciences Laboratory, NASA Goddard Space Flight Center, Greenbelt, Maryland; ESSIC, University of Maryland, College Park, College Park, Maryland)

ORCID: https://orcid.org/0000-0003-4958-9234

Sujay Kumar (Hydrological Sciences Laboratory, NASA Goddard Space Flight Center, Greenbelt, Maryland)

Carrie Vuyovich (Hydrological Sciences Laboratory, NASA Goddard Space Flight Center, Greenbelt, Maryland)

Ethan D. Gutmann (National Center for Atmospheric Research, Boulder, Colorado)

Rhae Sung Kim (Hydrological Sciences Laboratory, NASA Goddard Space Flight Center, Greenbelt, Maryland; GESTAR, Universities Space Research Association, Columbia, Maryland)

Barton A. Forman (Department of Civil and Environmental Engineering, University of Maryland, College Park, College Park, Maryland)

Michael Durand (School of Earth Sciences and Byrd Polar and Climate Research Center, The Ohio State University, Columbus, Ohio)

Mark S. Raleigh (College of Earth, Ocean, and Atmospheric Sciences, Oregon State University, Corvallis, Oregon)

Ryan Webb (Department of Civil, Construction, and Environmental Engineering, University of New Mexico, Albuquerque, New Mexico; Center for Water and the Environment, University of New Mexico, Albuquerque, New Mexico)

Paul Houser (Department of Geography and Geoinformation Sciences, George Mason University, Fairfax, Virginia)


Abstract

Snow is a fundamental component of global and regional water budgets, particularly in mountainous areas and regions downstream that rely on snowmelt for water resources. Land surface models (LSMs) are commonly used to develop spatially distributed estimates of snow water equivalent (SWE) and runoff. However, LSMs are limited by uncertainties in model physics and parameters, among other factors. In this study, we describe the use of model calibration tools to improve snow simulations within the Noah-MP LSM as the first step in an observing system simulation experiment (OSSE). Noah-MP is calibrated against the University of Arizona (UA) SWE product over a western Colorado domain. With spatially varying calibrated parameters, we run calibrated and default Noah-MP simulations for water years 2010–20. By evaluating both simulations against the UA dataset, we show that calibration decreases domain-averaged temporal RMSE and bias for snow depth from 0.15 to 0.13 m and from −0.036 to −0.0023 m, respectively, and improves the timing of snow ablation. Better snow simulation also improves estimates of model-simulated runoff in four of six study basins, though only one shows a statistically significant improvement. Spatially distributed Noah-MP snow parameters perform better than default uniform values. We demonstrate that calibrating parameters related to snow albedo calculations and rain–snow partitioning, among other processes, is a necessary step for creating a nature run that reasonably approximates true snow conditions for the OSSEs. Additionally, the inclusion of a snowfall scaling term can address biases in precipitation from meteorological forcing datasets, further improving the utility of LSMs for generating reliable spatiotemporal estimates of snow.

© 2022 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Melissa Wrzesien, melissa.l.wrzesien@nasa.gov


1. Introduction

Snow is a critical part of global and local water budgets, particularly in watersheds with headwaters in mountainous regions (Viviroli et al. 2007; Immerzeel et al. 2020). Millions of people around the world rely on snowmelt-derived runoff (Barnett et al. 2005; Li et al. 2017), especially in semiarid regions. Despite being an integral component of global and regional water balances, mountain snow accumulation remains one of the most difficult quantities to estimate in snow hydrology (Bormann et al. 2018; Dozier et al. 2016). While some mountain ranges have relatively dense in situ networks, other areas lack observations (Dozier et al. 2016), which limits our ability to extrapolate point observations to larger scales. Beyond in situ observations, remote sensing can map snow extent from space (Hall et al. 2002), but estimating snow water equivalent (SWE), and thus the water content of the snowpack, remains a significant challenge, particularly in the mountains (Lettenmaier et al. 2015; Nolin 2010; Takala et al. 2011; Vuyovich et al. 2014).

Due to limited in situ networks and uncertainty in remotely sensed observations, models are a practical alternative for developing spatiotemporal estimates of snow depth and SWE across large regions. Model intercomparison efforts have helped identify processes that are important for simulating snow, such as a multilayer snowpack (Essery et al. 2009; Etchevers et al. 2004; van den Hurk et al. 2016; Krinner et al. 2018; Rutter et al. 2009). While snow models often have complex physics and parameterizations that yield accurate simulations of snow compared with in situ observations (Dutra et al. 2012; Etchevers et al. 2004), such processes are often too computationally expensive for land surface models (LSMs) designed to run over large geographical areas. Additionally, snow models typically focus only on snowpack processes, whereas LSMs also link the snowpack to water, energy, and carbon cycle processes. Though LSMs allow for simulations across a range of spatial and temporal scales in a computationally efficient manner, the relatively simple nature of their conceptual formulations and parameterizations, compared with complex process models, increases the uncertainty of their predictions. Further, biases in model forcing data, particularly precipitation, are a major driver of model error (Raleigh et al. 2015; Schmucki et al. 2014; Henn et al. 2018), and studies suggest that reanalyses, which are often used as meteorological forcing, underestimate precipitation in mountainous areas (Henn et al. 2018; Enzminger et al. 2019; He et al. 2019). Such limitations are well documented in the literature, where it has been suggested that common LSMs, such as the Noah LSM with multiple parameterization options (Noah-MP; Niu et al. 2011), underestimate snow mass (Chen et al. 2014b; Kumar et al. 2019; Xia et al. 2017; Chen et al. 2014a). Despite these issues, LSMs are an essential tool for producing multiyear estimates of snow accumulation over continental or global study domains.

To reduce biases, models are often calibrated against reliable observation-based datasets (e.g., Ahl et al. 2008; Franz and Karsten 2013; Henn et al. 2016; Rutter et al. 2009). Calibration has a long history in operational snow modeling (e.g., Turcotte et al. 2007; Franz et al. 2008), and previous intercomparison projects explicitly considered the performance of calibrated versus noncalibrated models (Rutter et al. 2009; Essery et al. 2009). In snow and hydrological modeling, simulations are often calibrated against discharge to improve model performance (Franz and Karsten 2013; Hay et al. 2006; Ahl et al. 2008; Turcotte et al. 2007). More recently, efforts have aimed to improve snow estimation by calibrating against SWE (Chen et al. 2017; Franz et al. 2010), snow-covered area (Franz and Karsten 2013; Parajka and Blöschl 2008), or with multiobjective strategies that include two or more calibration variables (Nemri and Kinnard 2020; Parajka et al. 2007; Chen et al. 2017; Franz and Karsten 2013).

The performance of a calibrated model, however, depends on which parameters are selected for calibration, and complex LSMs such as Noah-MP have hundreds of parameters throughout the model code, some of which are hard-coded to spatially uniform values. Cuntz et al. (2016) examined over 100 Noah-MP parameters, dozens of which are hard-coded into the LSM, and showed that simulated surface runoff is sensitive to almost all of the snow parameters they selected; they concluded that some of the hard-coded parameters must be exposed during calibration in order to improve model performance. Similarly, Mendoza et al. (2015) argued that hard-coding parameters diminishes model agility; they identified several important hard-coded snow parameters that are treated as spatially uniform constants but likely vary in both time and space.

Here we calibrate Noah-MP against SWE estimates from the University of Arizona gridded observation-based snow data product (here referred to as UA; Zeng et al. 2018) in an effort to address dry biases in Noah-MP and improve snow estimation. We evaluate the impact of calibration on simulation of snow mass in a mountainous region. Since calibration will have implications beyond snow-related variables, we also examine impacts to other hydrologic processes, including runoff. The overarching motivation for the calibration is to produce a Noah-MP simulation that better approximates snow conditions through improvements to snow depth and SWE.

We aim for the calibrated simulation to be used as the “nature run” (NR) in a forthcoming snow-focused observing system simulation experiment (OSSE). OSSEs are data assimilation experiments performed to evaluate the type and impact of data to be collected by proposed missions and to assess the utility of competing mission designs and configurations (Garnaud et al. 2019; Crow et al. 2001, 2005; Wang et al. 2008; Nearing et al. 2012). These experiments also help quantify the utility of observations beyond the immediate variable of interest (e.g., the impact of assimilating snow information on other aspects of the water budget, such as streamflow). OSSEs are useful for assessing proposed observational methods and can complement field work, such as the extensive NASA SnowEx campaigns, in evaluating proposed sensors.

An NR is the foundational step of an OSSE, upon which the data assimilation experiments are built (see Fig. S1 in the online supplemental material for general steps to an OSSE). Within an OSSE, the NR simulation is considered the “true” state of the variable of interest. Therefore, NRs are developed using a high-quality model and meteorological inputs and should not have large uncertainty. Synthetic observations are then generated from the NR, after accounting for sources of errors and uncertainty associated with the anticipated sensor. The synthetic observations are assimilated into an open loop model simulation, and the assimilated result is compared back to the original NR to understand how well the proposed sensor captures the true conditions. The quality of the NR, therefore, significantly impacts the conclusions made from the OSSE. Since previous studies highlight biases in LSMs related to snow depth and SWE estimation, it is critical to reduce LSM bias and uncertainty to assess how proposed technologies perform in a variety of environments. If the NR and resulting synthetic observations are biased low, for example, it will be difficult to understand how a proposed sensor observes deep snowpacks. While an NR is not expected to be a perfect simulation, if it has a known systematic negative bias for SWE and snow depth, the assimilation experiments may not provide much information for how a sensor performs in regions where models have larger uncertainty, such as deep snow and forested regions (Kim et al. 2021). The calibration procedure described below is the first and an essential step in an OSSE designed to test potential configurations for a snow mission.
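To make the role of the NR concrete, the toy Python sketch below walks through the OSSE loop just described: a nature run, synthetic observations drawn from it, a deliberately degraded open loop, one assimilation update, and scoring against the NR. Everything in it is an illustrative assumption (gamma-distributed SWE, a 30% dry bias in the open loop, Gaussian sensor error, and a scalar Kalman-style gain); it is not the OSSE configuration used in this work.

```python
import numpy as np


def toy_osse(n_cells=1000, obs_err_sd=20.0, bg_err_sd=60.0, seed=0):
    """Toy single-time-step OSSE for SWE (mm) over independent grid cells.

    All distributions and error levels are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    # 1) Nature run: treated as the "true" SWE state.
    nature = rng.gamma(shape=4.0, scale=50.0, size=n_cells)
    # 2) Synthetic observations: NR plus an assumed unbiased sensor error.
    obs = nature + rng.normal(0.0, obs_err_sd, size=n_cells)
    # 3) Open loop: a hypothetical degraded model run with a 30% dry bias.
    open_loop = 0.7 * nature + rng.normal(0.0, 30.0, size=n_cells)
    # 4) Assimilation: scalar Kalman-style update with an assumed background error.
    gain = bg_err_sd**2 / (bg_err_sd**2 + obs_err_sd**2)
    analysis = open_loop + gain * (obs - open_loop)

    # 5) Score both runs against the NR, which is the reference in an OSSE.
    def rmse(x):
        return float(np.sqrt(np.mean((x - nature) ** 2)))

    return {"open_loop_rmse": rmse(open_loop), "analysis_rmse": rmse(analysis)}


scores = toy_osse()
print(scores)
```

Raising `obs_err_sd` shrinks the gain and the improvement over the open loop, which mirrors how an OSSE ranks candidate sensors by their error characteristics. If the NR itself were biased low, every score here would be computed against the wrong reference, which is why NR quality matters.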

In addition to producing an improved NR for the OSSE, we aim to address three research questions: 1) Can calibration address known dry biases in LSMs that cause underestimation of snow accumulation? 2) How does calibration affect streamflow, beyond the targeted snow variables? 3) Can calibration identify aspects of the model configuration that need improvement, such as the meteorological forcing data used as model boundary conditions? We also test whether calibrated Noah-MP yields snow estimates similar to those of a higher-resolution, computationally expensive model with more complex snow physics (SnowModel). We introduce the study area and calibration procedure in section 2. In section 3, we report results from the calibration experiments, and in section 4, we discuss implications and provide thoughts for future studies.

2. Data and methods

a. Model setup

We use the NASA Land Information System (LIS; Kumar et al. 2006; Peters-Lidard et al. 2007) for simulations over a western Colorado domain. The domain is selected to include previous NASA SnowEx field campaign locations, including Grand Mesa and Senator Beck (Fig. 1). LIS is a land surface modeling framework designed to be highly flexible, offering users a choice of LSM, meteorological forcing, and assimilation of in situ and remotely sensed observations, among other options. Designed to be computationally efficient, LIS can perform simulations over large regional and global domains. The central component of the LIS framework is the LSM selection; LIS offers several community-supported LSMs relevant to operations and research. Here we use Noah-MP version 4.0.1. Recent work demonstrates that Noah-MP outperforms the original Noah LSM in simulating snow (Chen et al. 2014a,b; Kim et al. 2021; Minder et al. 2016; Wrzesien et al. 2015) owing to model physics updates, including a multilayer snowpack with up to three layers. Table S1 lists the physics options selected for the Noah-MP simulation.

Fig. 1.

Elevations of the western Colorado Noah-MP domain. The black box indicates the Grand Mesa intensive observation period field site from the NASA SnowEx 2017 field campaign. Triangles mark the six evaluation points and are labeled with the evaluation site name. The inset map shows the western Colorado domain with respect to the western United States. The bottom-right plot shows the land classes for the model domain.

Citation: Journal of Hydrometeorology 23, 3; 10.1175/JHM-D-21-0071.1

In the LIS framework, Noah-MP simulates surface water and energy fluxes in response to meteorological boundary conditions supplied by LIS. Simulations run from September 2009 through July 2020 at 0.01° spatial resolution (∼1 km) and use hourly meteorological forcing data from the North American Land Data Assimilation System phase 2 (NLDAS-2; Xia et al. 2012). LIS includes statistical downscaling procedures for matching meteorological data to the spatial resolution of the LSM; here, the 1/8° NLDAS-2 forcing data are downscaled to ∼1 km through bilinear spatial interpolation. The model was first spun up for approximately 72 years by cycling through the January 1979–January 2020 forcing period once (about 41 years) and then again from January 1979 until September 2009 (about 31 years), when the simulation begins. We also simulate the same time period using the default parameters to understand how calibration affects the Noah-MP results. We distinguish between the two simulations as Noah-MP-Cal and Noah-MP-Def to represent the calibrated and default configurations, respectively.
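Bilinear regridding from a coarse regular grid to finer points can be sketched in numpy as below; the same kind of operation is used for the 1/8° NLDAS-2 forcing and, later, for rescaling UA. This is a generic implementation written for illustration, not the LIS downscaling code, and the grid coordinates are hypothetical.

```python
import numpy as np


def bilinear_regrid(coarse, lat_c, lon_c, lat_f, lon_f):
    """Bilinearly interpolate a 2D field from a coarse regular lat/lon grid.

    coarse has shape (len(lat_c), len(lon_c)); all coordinate arrays must be
    ascending. Fine points outside the coarse grid are clamped to its edge.
    """
    lat_c, lon_c = np.asarray(lat_c, float), np.asarray(lon_c, float)
    # Fractional row/column position of each fine point within the coarse grid.
    yi = np.interp(lat_f, lat_c, np.arange(lat_c.size))
    xi = np.interp(lon_f, lon_c, np.arange(lon_c.size))
    # Index of the lower-left coarse corner, kept one short of the last row/column.
    y0 = np.clip(np.floor(yi).astype(int), 0, lat_c.size - 2)
    x0 = np.clip(np.floor(xi).astype(int), 0, lon_c.size - 2)
    wy = (yi - y0)[:, None]  # weight toward the next row
    wx = (xi - x0)[None, :]  # weight toward the next column
    r, c = y0[:, None], x0[None, :]
    return ((1 - wy) * (1 - wx) * coarse[r, c] + (1 - wy) * wx * coarse[r, c + 1]
            + wy * (1 - wx) * coarse[r + 1, c] + wy * wx * coarse[r + 1, c + 1])


# Hypothetical 1/8-degree source grid and 0.01-degree target points.
lat_c = np.arange(37.0, 40.01, 0.125)
lon_c = np.arange(-109.0, -105.99, 0.125)
lat_f = np.arange(37.5, 38.5, 0.01)
lon_f = np.arange(-108.5, -107.5, 0.01)
coarse = 2.0 * lat_c[:, None] + 3.0 * lon_c[None, :]  # a linear test field
fine = bilinear_regrid(coarse, lat_c, lon_c, lat_f, lon_f)
```

Because bilinear interpolation reproduces a linear field exactly, a linear test field is a convenient check. It also illustrates a limitation: the method only smooths between coarse values and cannot add sub-grid detail such as terrain-driven gradients.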

b. Noah-MP parameter calibration

Previous studies suggest that LSMs underestimate snow accumulation, particularly in mountains (Broxton et al. 2016b; Wrzesien et al. 2017, 2018). A recent model intercomparison using an ensemble of LSM simulations from LIS highlighted the model disagreement and uncertainty of snow estimation over North America, including mountain areas (Kim et al. 2021). To improve Noah-MP simulations, we select 24 parameters for calibration (Table 1), based on previous sensitivity studies (Cuntz et al. 2016; Mendoza et al. 2015) and their relationship to modeled snow processes. In Noah-MP-Def, these parameters are either hard-coded, often to a single spatially uniform value, or provided in lookup tables that vary based on land or soil properties. In contrast, calibration yields spatially distributed parameters that can vary across the domain (Fig. 2). In addition to 23 existing parameters within Noah-MP, we include a snowfall scale factor in the calibration. Because precipitation underestimation affects the snow simulation and leads to biases throughout the snow season, the snowfall scale factor allows us to target the uncertainty resulting from biases in precipitation forcing. All 24 parameters are explored in point-scale and full-domain tests, though only the parameters that are sensitive enough to warrant calibration are described in section 3a.

Fig. 2.

Calibrated parameters after the genetic algorithm procedure. Shown here are the 11 parameters that are most sensitive to calibration.


Table 1

Calibration parameters including default values, calibration range, and average calibrated value. The calibration range reference, when applicable, is noted. Otherwise, the calibration range is ±20% of the default value.


Noah-MP is calibrated against SWE estimates from the University of Arizona dataset (UA; Zeng et al. 2018) in an optimization approach. The UA data product provides SWE at 4-km spatial resolution over the conterminous United States (Zeng et al. 2018). Estimates are provided daily between 1981 and 2020. UA is based on the assimilation of in situ measurements of both SWE and snow depth (Broxton et al. 2016a) and precipitation and temperature values from the PRISM dataset (Daly et al. 2000). UA has been evaluated against multiple datasets (Dawson et al. 2018), including airborne lidar measurements of snow depth. We note that any biases in UA SWE will likely be reflected in the calibrated parameters and the resulting simulations; however, such biases, especially in gridded observation-based data products like UA, are unavoidable.

We calibrate over water years 2007–09. This period was selected by examining domain-averaged SWE for water years 1982–2020 from the UA record. Based on domain-wide average maximum SWE and snow depth, this period includes low (2007), high (2008), and average (2009) snow conditions for the study region. Domain-wide average maximum SWE for water years 2007, 2008, and 2009 is 135.5, 231.0, and 162.4 mm, respectively, compared with a long-term mean of 163.5 mm; the corresponding maximum snow depths are 524.0, 850.1, and 536.9 mm, compared with a long-term mean of 618.5 mm.
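The low/high/average labels can be checked with simple anomaly arithmetic on the values quoted above. The ±10% tolerance used to assign labels is a hypothetical choice for illustration; the text does not state a threshold.

```python
# Values from the text: domain-wide average maximum SWE (mm) per water year.
max_swe = {2007: 135.5, 2008: 231.0, 2009: 162.4}
LONG_TERM_MEAN_SWE = 163.5  # mm, water years 1982-2020


def classify(value, mean, tol=0.10):
    """Label a year low/average/high; the +/-10% tolerance is a hypothetical choice."""
    anomaly = (value - mean) / mean
    if anomaly < -tol:
        return "low", anomaly
    if anomaly > tol:
        return "high", anomaly
    return "average", anomaly


for wy, swe in max_swe.items():
    label, anom = classify(swe, LONG_TERM_MEAN_SWE)
    print(f"WY{wy}: {swe:.1f} mm ({anom:+.1%}) -> {label}")
```

Applying the same function to the maximum snow depths labels 2009 as low (about 13% below the 618.5 mm mean), so under this tolerance the "average" label for 2009 rests on SWE rather than depth.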

For calibration, we use a genetic algorithm (GA), which is part of the LIS-Optimization and Uncertainty subsystem (Kumar et al. 2012b). The GA is a stochastic optimization method commonly used in hydrologic modeling (Duethmann et al. 2014; Isenstein et al. 2015; Shafii and De Smedt 2009; Wang 1991; Yapo et al. 1998) and is designed to mimic biological evolution: the fittest members of the population (i.e., parameter sets), as determined by comparison with an observational dataset, survive and move to the next generation. Within each generation, crossover and mutation operators produce new parameter estimates and introduce diversity in the parameter set. To ensure that good solutions are not lost between generations through crossover or mutation, an elitism strategy carries the best solution over to the next generation. Over many generations, the average fitness, which reflects the quality of the solution, tends to increase because individuals that compare favorably with observations are selected.
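A minimal, generic elitist GA in Python, sketching the loop described above. The tournament selection, blend crossover, and Gaussian mutation operators, and all rates, are assumptions for illustration; the text does not specify the operators used in the LIS GA. The population size (30) and number of generations (50) follow the values reported in the text, and the toy objective stands in for the per-cell comparison of Noah-MP output with observations.

```python
import random


def genetic_algorithm(objective, bounds, pop_size=30, generations=50,
                      crossover_rate=0.9, mutation_rate=0.1, seed=42):
    """Minimize `objective` over box `bounds` with a simple elitist GA.

    bounds: list of (low, high) per parameter. Returns (best_params, best_score).
    """
    rng = random.Random(seed)

    def tournament(pop, scores, k=3):
        # Draw k random individuals and return the fittest (lowest objective).
        idx = rng.sample(range(len(pop)), k)
        return pop[min(idx, key=lambda i: scores[i])]

    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [objective(ind) for ind in pop]
        elite = pop[min(range(pop_size), key=lambda i: scores[i])]
        next_pop = [list(elite)]  # elitism: the best solution survives unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            if rng.random() < crossover_rate:
                # Blend crossover: each gene is a random convex mix of the parents.
                child = []
                for a, b in zip(p1, p2):
                    w = rng.random()
                    child.append(w * a + (1.0 - w) * b)
            else:
                child = list(p1)
            # Gaussian mutation (sd = 10% of range) adds diversity; clip to bounds.
            for j, (lo, hi) in enumerate(bounds):
                if rng.random() < mutation_rate:
                    child[j] = min(hi, max(lo, child[j] + rng.gauss(0.0, 0.1 * (hi - lo))))
            next_pop.append(child)
        pop = next_pop
    scores = [objective(ind) for ind in pop]
    best = min(range(pop_size), key=lambda i: scores[i])
    return pop[best], scores[best]


# Toy problem: recover two hypothetical parameters from a synthetic target.
target = [0.62, 0.31]


def squared_error(params):
    return sum((p - t) ** 2 for p, t in zip(params, target))


best, score = genetic_algorithm(squared_error, bounds=[(0.0, 1.0), (0.0, 1.0)])
```

Because the elite is copied forward unchanged, the best objective value can never get worse from one generation to the next; mutation supplies the exploration that pure crossover between similar parents would lose.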

GAs reduce the risk of converging prematurely to a poor solution by evolving an ensemble of candidate solutions and by using mutation operators to introduce new, often poorer-performing, solutions. Because they do not rely on gradient information, GAs can handle local optima and discontinuities in the search space, unlike gradient-based search methods. Because GAs require an ensemble that must be run over many generations, they are computationally expensive: running 50 generations of the GA with 30 ensemble members for three water years over the study domain requires more than 480 h (over 20 days) of continuous simulation on 532 processors.
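The computational cost quoted above can be unpacked with simple arithmetic. These figures are lower bounds, since the text reports more than 480 h, and the per-simulation figure assumes, for illustration only, that processor time is shared evenly across ensemble members:

$$
\begin{aligned}
50\ \text{generations} \times 30\ \text{members} &= 1500\ \text{three-water-year simulations},\\
480\ \text{h} \div 24\ \text{h d}^{-1} &= 20\ \text{d},\\
480\ \text{h} \div 50\ \text{generations} &= 9.6\ \text{h per generation},\\
480\ \text{h} \times 532\ \text{processors} &= 255{,}360\ \text{processor-hours} \approx 170\ \text{processor-hours per simulation}.
\end{aligned}
$$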

Within LIS, the GA does not provide estimates of parameter uncertainty. Estimating parameter uncertainty would require variants of Markov chain Monte Carlo methods, such as differential evolution Markov chain (DE-MC) sampling (ter Braak and Vrugt 2008); however, such algorithms have a high computational cost, with run times an order of magnitude longer than the GA (Harrison et al. 2012), which makes them difficult to implement over a domain the size of ours. Since the primary objective of this study is to produce a better snow simulation, a thorough investigation of parameter uncertainty is omitted. More detail on GAs within the LIS framework is provided by Kumar et al. (2012b).

The GA yields calibrated values for the set of parameters that best match the observations. Parameter ranges for calibration (see Table 1) are either taken from the literature or set to ±20% of the default value, following Cuntz et al. (2016). As an objective function, we use the squared difference between the observation and the model:
$$J_i = \left(d_i^{o} - d_i^{m}\right)^2, \tag{1}$$
where $d_i^{o}$ is snow depth from the observations (UA) for grid cell $i$ and $d_i^{m}$ is snow depth from the model (Noah-MP) for grid cell $i$. We minimize $J_i$ for each grid cell $i$ independently in the calibration, resulting in parameters that vary spatially (Fig. 2). In contrast, Noah-MP-Def has spatially uniform parameters. UA, produced at 4 km, is rescaled to match the Noah-MP resolution through bilinear interpolation during calibration.

c. Evaluation datasets

In addition to comparing Noah-MP estimates to UA, we evaluate snow simulations against a suite of independent datasets using the Land Surface Verification Toolkit (LVT; Kumar et al. 2012a). First, we compare snow depth across the full domain to the Snow Data Assimilation System (SNODAS; Carroll et al. 2001), which is an operational dataset available over the contiguous United States at approximately 1-km spatial resolution. Both Noah-MP simulations are evaluated against UA and SNODAS for the full analysis period of water years 2010–20. UA and SNODAS are both reprocessed in LVT to match the spatial resolution of Noah-MP.
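Domain-averaged temporal RMSE and bias (the snow depth metrics quoted in the abstract) can be computed roughly as follows once the datasets share a grid. This is a numpy sketch, not the LVT implementation; bias is assumed to be defined as model minus observation, so a negative value indicates underestimation, consistent with the negative default bias reported in the abstract.

```python
import numpy as np


def temporal_rmse_bias(model, obs):
    """Per-cell temporal RMSE and bias (model minus observation), then domain means.

    model, obs: arrays of shape (time, cells); NaNs mark missing observations.
    """
    diff = np.asarray(model, float) - np.asarray(obs, float)
    rmse = np.sqrt(np.nanmean(diff ** 2, axis=0))  # one value per grid cell
    bias = np.nanmean(diff, axis=0)                # negative = model too shallow
    return float(np.nanmean(rmse)), float(np.nanmean(bias))
```

Computing the metric per cell before averaging over the domain keeps large errors in deep-snow cells from being diluted by a single domain-wide time series.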

We also compare to snow depth measurements from the Global Historical Climatology Network (GHCN; Menne et al. 2012); the western Colorado domain includes 79 GHCN stations with snow depth observations. Stations within the domain include a range of elevations (1467–3422 m) with an average station elevation of 2349 m. This compares to the full Noah-MP domain with elevations ranging from 1399 to 4185 m and an average elevation of 2639 m; approximately 9% of GHCN stations within the domain have elevations > 3000 m, compared to 26% of the full domain. While GHCN stations undersample higher elevations within the western Colorado domain, they provide an additional evaluation dataset for snow depth. GHCN data are available for water years 2010–16.

We also compare Noah-MP to datasets collected from the 2017 NASA SnowEx field campaign in Colorado. First, we evaluate Noah-MP against snow pit observations of snow depth and SWE from SnowEx (Elder et al. 2018) at Grand Mesa and Senator Beck, which were collected between 6 and 25 February 2017. For a spatial comparison, we evaluate Noah-MP snow depth against Airborne Snow Observatory (ASO) lidar observations of snow depth, which are produced at 3-m spatial resolution (Painter 2018). Here we use ASO flights over Grand Mesa from 8 to 16 February; though other flights are available for the 2017 field campaign, other days either included artifacts from the lidar collection or excluded portions of the mesa.

In addition to observations from SnowEx, Noah-MP is evaluated against a SnowModel simulation over Grand Mesa for the 2017 campaign, as described in Webb et al. (2020). SnowModel is a widely used snow model that simulates distributed snow properties in space and time and can be configured to simulate a single-layer or multilayer snowpack (Liston and Elder 2006a; Liston and Sturm 1998). SnowModel comprises four interconnected models: MicroMet for processing and downscaling meteorological forcing data (Liston and Elder 2006b), EnBal for calculating the energy balance of the snowpack, SnowPack for simulating the snowpack in space and time, and SnowTran-3D for computing the redistribution of snow by wind (Liston and Sturm 1998; Liston et al. 2007). Webb et al. (2020) configured SnowModel to simulate a single-layer snowpack over Grand Mesa for the 2016/17 water year to coincide with the SnowEx field campaign in February 2017. They used station observations as meteorological forcing data, including data from the Grand Mesa Study Plot (Skiles 2018), four SnowEx campaign weather stations, and three nearby Snowpack Telemetry (SNOTEL) sites. SNOTEL sites provide temperature and precipitation observations, and all other stations provide temperature, wind speed and direction, humidity, and radiation. No adjustment of precipitation or other forcing data was made, and the SnowModel simulations were independent of any snow observations. Elevation data were from the 1/3-arc-s USGS National Elevation Dataset, while vegetation data were taken from 30-m USGS LANDFIRE v.1.4 Existing Vegetation Type data (Rollins 2009) and reclassified to SnowModel vegetation types. Webb et al. (2020) ran SnowModel at multiple spatial resolutions, but here we consider SWE and snow depth outputs from their 30-m simulation. Webb et al. (2020) provide additional information on the SnowModel configuration and evaluation.

For spatial evaluations against both ASO and SnowModel, we calculate the spatial efficiency (SPAEF; Koch et al. 2018; Demirel et al. 2018), which combines histogram matching, spatial correlation coefficient, and spatial variability error to evaluate spatial patterns. SPAEF is defined as
$$\mathrm{SPAEF} = 1 - \sqrt{(\alpha - 1)^2 + (\beta - 1)^2 + (\gamma - 1)^2}, \tag{2}$$
where $\alpha = \rho(\mathrm{obs}, \mathrm{mod})$, $\beta = (\sigma_{\mathrm{mod}}/\mu_{\mathrm{mod}})/(\sigma_{\mathrm{obs}}/\mu_{\mathrm{obs}})$, and $\gamma = \sum_{j=1}^{n} \min(K_j, L_j) \big/ \sum_{j=1}^{n} K_j$. Here $\alpha$ is the Pearson correlation coefficient between the observation (ASO lidar or SnowModel simulation) and the model (Noah-MP), $\beta$ is the ratio of the coefficients of variation, which represents spatial variability, and $\gamma$ is the intersection of the observation histogram $K$ and the model histogram $L$ over $n$ bins (Swain and Ballard 1991). SPAEF has an optimal value of 1.
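A numpy sketch of the SPAEF formula defined above. The text does not say how the histograms are binned; as an assumption following common SPAEF practice, both fields are converted to z-scores and binned on shared edges, with 100 bins chosen arbitrarily.

```python
import numpy as np


def spaef(obs, mod, bins=100):
    """Spatial efficiency (SPAEF) between two maps; the optimal value is 1.

    Cells that are NaN in either map are dropped. ASSUMPTION: histograms are
    built from z-scores of each field on shared bin edges.
    """
    obs = np.asarray(obs, float).ravel()
    mod = np.asarray(mod, float).ravel()
    ok = np.isfinite(obs) & np.isfinite(mod)
    obs, mod = obs[ok], mod[ok]
    alpha = np.corrcoef(obs, mod)[0, 1]                                  # Pearson correlation
    beta = (np.std(mod) / np.mean(mod)) / (np.std(obs) / np.mean(obs))  # ratio of CVs
    z_obs = (obs - obs.mean()) / obs.std()
    z_mod = (mod - mod.mean()) / mod.std()
    edges = np.histogram_bin_edges(np.concatenate([z_obs, z_mod]), bins=bins)
    k_hist, _ = np.histogram(z_obs, bins=edges)
    l_hist, _ = np.histogram(z_mod, bins=edges)
    gamma = np.minimum(k_hist, l_hist).sum() / k_hist.sum()             # histogram intersection
    return float(1.0 - np.sqrt((alpha - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2))


rng = np.random.default_rng(1)
depth = rng.gamma(2.0, 0.5, size=(50, 50))  # hypothetical snow depth map (m)
```

Because β uses coefficients of variation and γ uses z-scores, SPAEF rewards matching spatial patterns rather than absolute magnitudes: a uniformly scaled map scores close to 1, so SPAEF is best read alongside bias metrics.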

For streamflow, we compare with natural flow estimates for four basins in the Upper Colorado River basin (UCRB) that lie completely within the model domain (see Table 6). Natural flow estimates are from the Bureau of Reclamation and are available monthly between 1901 and 2018 (Prairie and Callejo 2005). We also compare with daily, unregulated streamflow for two basins from the Catchment Attributes and Meteorology for Large-Sample Studies (CAMELS; Newman et al. 2015; Newman et al. 2014) dataset. For the two CAMELS basins, we use only streamflow observations between 2009 and 2014, and the daily streamflow is aggregated into monthly averages. Because Noah-MP does not represent human management of streamflow networks, we cannot compare model-simulated runoff with stream gauge observations affected by water diversions, dams, and other water management practices. Instead, we compare monthly gridcell-generated runoff (the sum of surface and subsurface runoff) with monthly observations over small unmanaged basins and with estimated natural flow (i.e., runoff in the absence of human management) in larger basins. Using total runoff at monthly scales as a proxy for streamflow is a valid assumption (Chow 1964) and a strategy used in other studies (e.g., Koster et al. 2010). We evaluate monthly streamflow with the Nash–Sutcliffe efficiency (NSE; Nash and Sutcliffe 1970), where a perfect fit with observations has NSE = 1, and NSE > 0 indicates that the model has better predictive skill than the mean of the observations.
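The runoff proxy and the NSE score described above reduce to a few lines of numpy. Aggregating gridcell runoff to a basin (for example, averaging over the cells inside a basin mask) is left out and would be an additional assumption.

```python
import numpy as np


def monthly_total_runoff(surface, subsurface):
    """Gridcell total runoff = surface + subsurface (same units, e.g., mm per month)."""
    return np.asarray(surface, float) + np.asarray(subsurface, float)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit; 0 equals the skill of the observed mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)
```

Because the denominator is the variance of the observations about their mean, NSE in snowmelt basins is dominated by how well the model captures the timing and size of the spring peak.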

3. Results

a. Calibration

We initially run point-scale calibration tests with 23 selected parameters from the snow modules within Noah-MP (Table 1). Noah-MP-Cal generally improved the snow ablation timing in spring months relative to Noah-MP-Def. However, maximum snow conditions remained largely underestimated, particularly for sites with deep snowpack (not shown). After implementation of a snowfall scaling factor, described below in Eq. (5), as an additional calibration parameter, test simulations resulted in snow depths in better agreement with UA estimates. Therefore, for calibration over the full domain, we include 24 spatially variable parameters: 23 from Noah-MP and an additional snowfall scale term (Fig. 2).

Though we include 24 parameters in the GA procedure, only 11 were sensitive to calibration. We consider the other 13 insensitive because their calibrated values do not show coherent spatial patterns like those in Fig. 2 and are instead spatially noisy (see Fig. S2). Some of the 11 selected parameters also show noisy, artificial patterns in parts of the domain where they were insensitive to calibration, often where less snow accumulates (Fig. 2). Despite these regions, we examine the 11 sensitive parameters further. The first four parameters are used within the CLASS snow albedo scheme (Verseghy 1991) and include minimum snow albedo (MNSNALB), maximum snow albedo (MXSNALB), the exponent in the snow albedo decay relationship (SNDECAYEXP), and the new snow mass required to cover old snow (SWEMX). These parameters are used at each time step to calculate snow albedo. First, the albedo of the snow cover for the new time step is determined as
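$$\alpha_s(t) = \mathrm{MNSNALB} + \left[\alpha_s(t-1) - \mathrm{MNSNALB}\right] \exp\left(-\mathrm{SNDECAYEXP}\,\frac{\Delta t}{3600}\right), \tag{3}$$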
where αs is snow albedo at time step t or t − 1 and Δt is the model time step. If new snow has fallen in an amount larger than SWEMX, snow albedo is refreshed to a value of MXSNALB.
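The sketch below makes the albedo update concrete. It applies Eq. (3) at each time step and then the SWEMX refresh rule, in the order described above. It is a hypothetical illustration, not the Noah-MP source. The default parameter values are assumed for demonstration, and Δt is taken in seconds. The sketch follows the text literally: it resets albedo only when new snow exceeds SWEMX, and any partial refresh for smaller snowfalls is not modeled.

```python
import math


def update_snow_albedo(alb_prev, new_snow_mm, dt_s,
                       mnsnalb=0.55, mxsnalb=0.84,
                       sndecayexp=0.01, swemx_mm=1.0):
    """One time step of the snow albedo update described above.

    Parameter defaults are illustrative assumptions, not calibrated values.
    new_snow_mm: snowfall (mm SWE) during the step; dt_s: time step in seconds.
    """
    # Eq. (3): exponential decay toward MNSNALB, with the rate expressed per hour.
    alb = mnsnalb + (alb_prev - mnsnalb) * math.exp(-sndecayexp * dt_s / 3600.0)
    # Refresh to MXSNALB when the new snow exceeds SWEMX.
    if new_snow_mm > swemx_mm:
        alb = mxsnalb
    return alb


# Ten days of hourly steps without snowfall, followed by a 5-mm snowfall.
alb = 0.84
for _ in range(240):
    alb = update_snow_albedo(alb, 0.0, 3600.0)
print(f"after 10 dry days: {alb:.3f}")
print(f"after new snow:    {update_snow_albedo(alb, 5.0, 3600.0):.3f}")
```

SNDECAYEXP multiplies Δt/3600, so the decay rate is expressed per hour. Without snowfall, the result therefore depends only on elapsed time, not on the model time step.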

The next group of calibration parameters relates to the rain–snow partitioning scheme used here, i.e., the Jordan (1991) scheme from the SNTHERM model. In this method, if air temperature is above the upper temperature limit (TULIMIT), all precipitation is rainfall. At air temperatures below the lower temperature limit (TLLIMIT), all precipitation is snowfall. For temperatures between TLLIMIT and a middle threshold (TMLIMIT), the fraction of precipitation that is frozen is a function of air temperature. At temperatures between TMLIMIT and TULIMIT, the fraction of precipitation that is frozen is set to 0.6. In the calibration procedure, the thresholds are constrained so that TLLIMIT < TMLIMIT < TULIMIT.
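The piecewise partitioning can be written as a small function. This sketch rests on stated assumptions. The threshold defaults (0.5, 2.0, and 2.5 °C) are illustrative. The text does not give the functional form between TLLIMIT and TMLIMIT, so a linear decrease from 1.0 to 0.6 is assumed here only to keep the fraction continuous. Noah-MP's own Jordan (1991) implementation may use a different ramp.

```python
def frozen_fraction(t_air_c, tllimit=0.5, tmlimit=2.0, tulimit=2.5):
    """Fraction of precipitation falling as snow (thresholds in deg C).

    The threshold defaults and the linear ramp between TLLIMIT and TMLIMIT
    are illustrative assumptions, not values taken from the study.
    """
    if not tllimit < tmlimit < tulimit:
        raise ValueError("require TLLIMIT < TMLIMIT < TULIMIT")
    if t_air_c > tulimit:
        return 0.0                      # all rain
    if t_air_c <= tllimit:
        return 1.0                      # all snow
    if t_air_c <= tmlimit:
        # Assumed linear decrease from 1.0 at TLLIMIT to 0.6 at TMLIMIT.
        w = (t_air_c - tllimit) / (tmlimit - tllimit)
        return 1.0 - 0.4 * w
    return 0.6                          # fixed fraction between TMLIMIT and TULIMIT


for t in (-2.0, 1.0, 2.2, 4.0):
    print(t, frozen_fraction(t))
```

The ordering check mirrors the TLLIMIT < TMLIMIT < TULIMIT constraint imposed during calibration.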

The remaining four parameters are from different schemes throughout Noah-MP, and three were highlighted by Mendoza et al. (2015) as key parameters for model sensitivity. These include the exponent used in the snow depletion curve (MFSNO), liquid water holding capacity (SSI), and snow surface roughness length (Z0SNO). MFSNO is used within Noah-MP to calculate the fractional portion of the grid cell that is snow covered, as shown in Eq. (4) below from Niu and Yang (2007):
$$f_{sno} = \tanh\left[\frac{h_{sno}}{2.5\, z_{0g}\, \left(\rho_{sno}/\rho_{new}\right)^{\mathrm{MFSNO}}}\right], \tag{4}$$
where hsno is snow depth, z0g is the bare soil roughness length, ρsno is bulk density of snow, and ρnew is the density of new snow, which is set to 100 kg m−3. The parameter fsno is used throughout Noah-MP to scale gridcell calculations into snow-covered and non-snow-covered fractions, including within surface radiation calculations.
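Evaluating Eq. (4) directly shows how MFSNO controls snow cover. In this hypothetical sketch, the bare-soil roughness length and snow values are assumed for illustration only. ρ_new is fixed at 100 kg m−3, as stated above.

```python
import math


def snow_cover_fraction(h_sno_m, rho_sno, z0g_m, mfsno, rho_new=100.0):
    """Eq. (4): f_sno = tanh(h_sno / (2.5 * z0g * (rho_sno / rho_new) ** MFSNO)).

    h_sno_m: snow depth (m); rho_sno: bulk snow density (kg m-3);
    z0g_m: bare-soil roughness length (m); rho_new: new-snow density (kg m-3).
    """
    if h_sno_m <= 0.0:
        return 0.0
    return math.tanh(h_sno_m / (2.5 * z0g_m * (rho_sno / rho_new) ** mfsno))


# Hypothetical 5-cm snowpack at 250 kg m-3 on ground with z0g = 0.01 m.
for mfsno in (1.0, 2.5):
    print(mfsno, round(snow_cover_fraction(0.05, 250.0, 0.01, mfsno), 3))
```

For snow denser than new snow (ρ_sno > ρ_new), a larger MFSNO enlarges the denominator and lowers f_sno for the same depth. Calibrated MFSNO values therefore directly shift how much of each grid cell is treated as snow covered.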

SSI and Z0SNO are each used only once in the Noah-MP code. SSI is included in the calculation of snow layer liquid water, which determines the rate at which meltwater is released (exfiltrated) from the bottom of the snowpack. Z0SNO is used to calculate the surface roughness length for turbulent flux calculations over snow-covered ground.

The final calibration parameter is the snowfall scaling term, SNOWF_SCALEF, which was included to address uncertainty in precipitation forcing data. SNOWF_SCALEF is described as
$$S = P\, f_{ice}\, \mathrm{SNOWF\_SCALEF}, \tag{5}$$
where S is snowfall, P is total precipitation, and fice is the fraction of the precipitation that is frozen. The snowfall scale factor is applied to frozen precipitation to reduce the bias introduced by NLDAS-2. Other studies introduce a similar precipitation scaling factor in optimization or assimilation experiments. Smyth et al. (2020), who also used NLDAS-2 for model forcing data, use a snowfall correction factor to scale precipitation at their SNOTEL study sites across the western United States. In their work, the average snowfall correction factor is 1.64, indicating that, on average, NLDAS-2 snowfall at these mountain sites must be increased by about 64%. In Smyth et al. (2020) and here, NLDAS-2 snowfall is too low and must be scaled to larger values to produce realistic snow accumulation. Other studies have also included a correction factor to address biases in snowfall from meteorological data (Magnusson et al. 2017; He et al. 2011; Franz and Karsten 2013). Errors in forcing data, particularly precipitation, have a large impact on snow modeling performance (Raleigh et al. 2015; Schmucki et al. 2014; Henn et al. 2018), and including a snowfall scaling term in the calibration procedure can help address this bias.
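Eq. (5) is a multiplicative correction on the frozen fraction of the forcing precipitation. The text states only that the factor is applied to frozen precipitation, so this minimal sketch assumes the liquid fraction P(1 − f_ice) passes through unscaled. The function name and example values are hypothetical.

```python
def partition_and_scale(precip_mm, f_ice, snowf_scalef):
    """Split forcing precipitation into snowfall and rainfall, scaling only
    the frozen part as in Eq. (5)."""
    if not 0.0 <= f_ice <= 1.0:
        raise ValueError("f_ice must lie in [0, 1]")
    snow = precip_mm * f_ice * snowf_scalef   # Eq. (5): S = P * f_ice * SNOWF_SCALEF
    rain = precip_mm * (1.0 - f_ice)          # liquid part left unscaled (assumption)
    return snow, rain


# 10 mm of forcing precipitation, 60% frozen, with the domain-average
# factor reported in this study (1.16) and a high regional value (2.5).
for factor in (1.16, 2.5):
    print(factor, partition_and_scale(10.0, 0.6, factor))

# A mean correction factor of 1.64 means the uncorrected snowfall is only
# about 1 / 1.64 (roughly 61%) of the corrected amount.
print(round(1.0 / 1.64, 2))
```

The last line restates the Smyth et al. (2020) factor as a relative deficit. The corrected snowfall is 64% larger than the forcing value; equivalently, the forcing supplies about 61% of it.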

b. SWE and snow depth evaluation

1) UA and SNODAS comparisons

(i) Full domain comparison

Figure 3 shows the time series of average SWE and average snow depth across the domain for Noah-MP-Cal, Noah-MP-Def, and the UA dataset. In nearly all cases, calibration results in more snow and later snowmelt. Occasionally, Noah-MP-Cal produces more snow accumulation than the UA dataset, such as in 2015 and 2017 (Fig. 3). Over the 11-yr simulation, Noah-MP-Cal has larger magnitudes of snow depth and SWE; average maximum SWE (depth) from Noah-MP-Cal is 166.7 mm (0.61 m), while average maximum SWE (depth) from Noah-MP-Def is 131.8 mm (0.52 m).

Fig. 3.

Time series of average SWE (mm) and average snow depth (m) over the full domain for calibrated (blue), uncalibrated (orange), and UA (black) estimates.

Citation: Journal of Hydrometeorology 23, 3; 10.1175/JHM-D-21-0071.1

Spatially, Noah-MP-Cal produces greater 1 April SWE at higher elevations across the domain, averaged over the water years 2010–20 simulation period (Fig. 4). Estimates from Noah-MP-Def have domain-wide averages similar to those of Noah-MP-Cal (Fig. 3), but the snow is less spatially variable. In contrast, snow accumulation in Noah-MP-Cal more closely follows local topography. We also compare Noah-MP-Cal and Noah-MP-Def to UA and SNODAS at six evaluation points throughout the domain that correspond to SnowEx field campaign sites (Table 2). At these points, Noah-MP-Cal generally has smaller biases and RMSE than Noah-MP-Def for the UA comparison (Table 3). Noah-MP-Cal also tends to perform better than Noah-MP-Def when evaluated against SNODAS (Table 3), with smaller bias and RMSE at all evaluation points except Fool Creek and Senator Beck, the two highest-elevation stations. For a similar comparison for SWE, see Table S2.

Fig. 4.

Average 1 April SWE (mm) for (a) the calibrated simulation, (b) the uncalibrated simulation, (c) UA observations, and (d) SNODAS observations. All plots use the same color bar.

Table 2

Details of six evaluation points, including location, elevation, and percent tree canopy cover.

Table 3

Snow depth bias and RMSE for calibrated and uncalibrated Noah-MP simulations compared to UA and SNODAS for six SnowEx field site locations and the full western Colorado domain. Bold indicates better performance, and for the overall domain comparisons, an asterisk (*) indicates a statistically significant difference between the two model performances.

To compare Noah-MP-Cal and Noah-MP-Def against UA and SNODAS over the full domain, we first calculate the SPAEF [Eq. (2)] to evaluate spatial performance. Compared to UA, Noah-MP-Cal has a SPAEF of 0.799 and Noah-MP-Def has a SPAEF of 0.508. For SNODAS, Noah-MP-Cal also has a higher SPAEF: 0.722 versus 0.460 for Noah-MP-Def. For RMSE (Fig. 5), higher elevations tend to have larger RMSE values, particularly for Noah-MP-Def compared to both SNODAS and UA. Noah-MP-Cal has high RMSE values in the central northern portion of the study domain. This area has much larger values of snow depth in Noah-MP-Cal than in Noah-MP-Def, and the snowfall scale factor from calibration is high there (up to 2.5–3, compared to the domain average of 1.16). The result is increased snowfall and higher snow accumulation (discussed in section 4c). Aside from this anomalous region and an area in the southern portion of the domain, Noah-MP-Cal generally reduces the UA snow depth RMSE (Fig. 5c), particularly at higher elevations. Averaged over the domain, Noah-MP-Cal has a slightly lower RMSE (0.13 m) than Noah-MP-Def (0.15 m) compared to UA (Table 3). The relative performance of Noah-MP-Cal and Noah-MP-Def against SNODAS is similar to that against UA. Averaged over the full domain, however, Noah-MP-Def is in slightly better agreement with SNODAS (RMSE of 0.18 m) than Noah-MP-Cal (RMSE of 0.19 m).
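Assume Eq. (2) follows the standard SPAEF formulation. That formulation combines the Pearson correlation (α), the ratio of coefficients of variation (β), and the overlap of the histograms of the z-scored fields (γ) as SPAEF = 1 − √[(α − 1)² + (β − 1)² + (γ − 1)²]. A sketch might look like the following. The shared bin edges and the √n default bin count are implementation choices assumed here. The fields should also exclude cells where the mean is zero (e.g., snow-free masks) so that the coefficients of variation are defined.

```python
import numpy as np


def spaef(sim, obs, bins=None):
    """Spatial efficiency of a simulated field against a reference field."""
    sim = np.asarray(sim, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    ok = np.isfinite(sim) & np.isfinite(obs)
    sim, obs = sim[ok], obs[ok]
    alpha = np.corrcoef(sim, obs)[0, 1]                          # pattern correlation
    beta = (sim.std() / sim.mean()) / (obs.std() / obs.mean())   # CV ratio
    zs = (sim - sim.mean()) / sim.std()                          # z-scored fields
    zo = (obs - obs.mean()) / obs.std()
    if bins is None:
        bins = max(int(round(np.sqrt(zs.size))), 1)              # assumed default
    edges = np.histogram_bin_edges(np.concatenate([zs, zo]), bins=bins)
    hs, _ = np.histogram(zs, bins=edges)
    ho, _ = np.histogram(zo, bins=edges)
    gamma = np.minimum(hs, ho).sum() / ho.sum()                  # histogram overlap
    return 1.0 - np.sqrt((alpha - 1) ** 2 + (beta - 1) ** 2 + (gamma - 1) ** 2)
```

Multiplying a field by a constant leaves every component unchanged. SPAEF therefore rewards matching spatial patterns rather than matching magnitudes, which is why it is reported alongside bias and RMSE.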

Fig. 5.

(a)–(c) Snow depth RMSE for Noah-MP-Cal and Noah-MP-Def compared to UA for the full analysis period. The right column shows the difference in RMSE values between Noah-MP-Cal and Noah-MP-Def. (d)–(f) As in (a)–(c), but compared against SNODAS.

Similar to RMSE, we also compare temporal bias over the full domain (Fig. 6). Noah-MP-Def has a negative bias for higher elevation grid cells compared to both UA and SNODAS. This suggests that Noah-MP-Def is underestimating snow accumulation in the mountains, highlighting the known dry bias of LSMs (e.g., Chen et al. 2014b; Holtzman et al. 2020; Kumar et al. 2019; Wang et al. 2019; Xia et al. 2017). Noah-MP-Cal bias spatial patterns are similar for the UA and SNODAS comparisons, with a large positive bias in the central northern portion of the domain due to anomalously high values of snow depth. Averaged over the full domain, Noah-MP-Cal versus UA has a bias of nearly zero (−0.0023 m), compared to −0.036 m for Noah-MP-Def (Table 3). For both UA and SNODAS comparisons, Noah-MP-Cal has more instances of positive bias at higher elevations (>3500 m), while these same grid cells in Noah-MP-Def tend to have negative biases. Noah-MP-Def underestimates snow accumulation at high elevations, and calibration partly addresses these biases, though it can result in too much snow in some regions.
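The temporal bias and RMSE maps in Figs. 5 and 6 reduce a (time, y, x) stack of snow depth to one value per grid cell. Below is a minimal NumPy sketch. It assumes the model and reference fields are already on the same grid, with missing data stored as NaN. The function names are hypothetical, and the 3500-m split mirrors the elevation threshold mentioned above.

```python
import numpy as np


def temporal_bias_rmse(model, ref):
    """Per-cell mean bias and RMSE over the time axis.

    model, ref: arrays shaped (time, ny, nx) on the same grid; NaN = missing.
    """
    diff = np.asarray(model, dtype=float) - np.asarray(ref, dtype=float)
    bias = np.nanmean(diff, axis=0)                 # positive = model too deep
    rmse = np.sqrt(np.nanmean(diff ** 2, axis=0))
    return bias, rmse


def split_by_elevation(metric_map, elev, threshold_m=3500.0):
    """Average a per-cell metric separately below and above an elevation threshold."""
    high = np.asarray(elev) > threshold_m
    return np.nanmean(metric_map[~high]), np.nanmean(metric_map[high])


# Hypothetical 3-day, 2 x 2 example: the model is 0.1 m too deep everywhere.
ref = np.array([[[0.5, 1.0], [1.5, 2.0]]] * 3)
model = ref + 0.1
elev = np.array([[2600.0, 3000.0], [3600.0, 3800.0]])
bias, rmse = temporal_bias_rmse(model, ref)
print(split_by_elevation(bias, elev))
```

The domain-averaged values in Table 3 may be computed differently, for example by pooling all cells and days. The per-cell form shown here corresponds to the maps.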

Fig. 6.

As in Fig. 5, but for snow depth bias.

(ii) Seasonal comparison

During the accumulation season (December–February), calibration increases the domain-averaged snow depth by almost 18%, from a −14.0% difference with Noah-MP-Def to a +1.4% difference with Noah-MP-Cal, relative to UA. RMSE also improves slightly, from 0.162 to 0.142 m. Similarly, for the peak snow season (March and April), calibration improves the snow depth percent difference from −24.8% (Noah-MP-Def) to −5.1% (Noah-MP-Cal). RMSE decreases from 0.269 m with Noah-MP-Def to 0.215 m with Noah-MP-Cal, a 20% improvement. Calibration results in large improvements for the ablation season (May–July), increasing the domain-averaged snow depth by 45.4%. Noah-MP-Def mean snow depth is 31.6% less than the UA estimate, while Noah-MP-Cal is comparable to UA, only 0.5% smaller. RMSE decreases by over 12%, from 0.0981 m with Noah-MP-Def to 0.0863 m with Noah-MP-Cal. Across the full domain, calibration addresses the underestimation of snow throughout the water year, though with slightly too much snow during the accumulation season.
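The seasonal percentages above combine differences that are each expressed relative to UA. The implied change in mean snow depth is therefore (1 + d_Cal)/(1 + d_Def) − 1, not a simple difference of percentages. Here is a short arithmetic check in Python, using only the rounded values quoted in this paragraph (the function names are illustrative):

```python
def relative_change(pct_def, pct_cal):
    """Change in mean snow depth from Def to Cal, given each run's percent
    difference relative to the same reference (UA)."""
    return (1.0 + pct_cal / 100.0) / (1.0 + pct_def / 100.0) - 1.0


def rmse_reduction(rmse_def, rmse_cal):
    """Fractional RMSE reduction from Def to Cal."""
    return (rmse_def - rmse_cal) / rmse_def


seasons = {"DJF": (-14.0, 1.4), "MA": (-24.8, -5.1), "MJJ": (-31.6, -0.5)}
for name, (d_def, d_cal) in seasons.items():
    print(name, f"{100 * relative_change(d_def, d_cal):.1f}% more snow")
print(f"MA RMSE: {100 * rmse_reduction(0.269, 0.215):.1f}% lower")
print(f"MJJ RMSE: {100 * rmse_reduction(0.0981, 0.0863):.1f}% lower")
```

With these inputs, the accumulation-season increase comes out near 17.9% ("almost 18%"), and the RMSE reductions near 20.1% and 12.0%. The ablation-season increase comes out near 45.5%, consistent with the reported 45.4% once rounding of the published percentages is allowed for.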

At the gridcell scale, Noah-MP-Cal generally has more snow accumulation and a later end to the snow season than Noah-MP-Def, as shown in Fig. 7 for Senator Beck. Point scale evaluations have a better agreement between UA and Noah-MP-Cal, with RMSE declining by 4.23 cm for peak season. During the accumulation and ablation seasons, results are different, where Noah-MP-Def has smaller bias and RMSE. Noah-MP-Cal overestimates UA in the spring for several years (Fig. 7a), with snow lingering longer than observed in UA for water years 2015, 2017, and 2019. Performance is similarly mixed at other study points (Table 4), where calibration may improve performance during all seasons (Cameron Pass, Niwot Ridge, Skyway/Grand Mesa) or may degrade performance, depending on the season (accumulation and peak for Fool Creek, ablation for Rock Creek, and accumulation and ablation for Senator Beck). Comparing SWE bias and RMSE over different seasons has similar results (Table S3).

Fig. 7.

Evaluation of calibrated and uncalibrated Noah-MP over a single point in the Senator Beck basin. (a) Time series of daily snow depth over the grid cell that contains the Senator Beck study site. Scatterplots of Noah-MP-simulated snow depth versus UA snow depth for both calibrated (blue) and uncalibrated (orange) simulations, separated into (b) accumulation (December–February), (c) peak (March–April), and (d) ablation (May–July) seasons.

Table 4

Comparison of seasonal bias and RMSE of snow depth for evaluation points from Noah-MP-Def and Noah-MP-Cal for the accumulation (December–February), peak snow (March–April), and ablation (May–July) seasons. Bold indicates better performance.

(iii) Comparison over vegetation class

We next aggregate the 20 LIS land cover classifications into five broader groups (forest, shrubland, grassland, cropland, and barren; see inset in Fig. 1 and Table S4 for statistics) and compare Noah-MP-Cal and Noah-MP-Def against both UA and SNODAS (Figs. 8a,c). For average snow depth bias, Noah-MP-Cal performs better than Noah-MP-Def across land cover groups. Most comparisons have a negative bias, indicating that Noah-MP-Cal and Noah-MP-Def have less snow than either UA or SNODAS, though the magnitude of the bias is generally smaller than 0.05 m. The exception is the barren land cover class, which has the fewest grid cells (1564, or 1.4% of the domain) and the highest average elevation (3178 m versus a domain average of 2639 m). When compared with SNODAS, Noah-MP-Def has smaller RMSE than Noah-MP-Cal. In all classes except cropland, Noah-MP-Cal has a lower RMSE when compared to UA.

Fig. 8.

(a) Snow depth bias and (c) RMSE from Noah-MP-Cal and Noah-MP-Def compared to UA and SNODAS for five aggregated land cover categories. (b) Snow depth bias and (d) RMSE for the forest land cover category separated into elevation bands. Bias and RMSE values are temporal averages from the full analysis period.

Modeling in forested regions is often challenging due to uncertainty in snow–canopy interactions (Essery et al. 2009; Krinner et al. 2018). Therefore, we further subdivide the forest class into elevation bands to isolate the impact of elevation on a land cover class with higher uncertainty (Figs. 8b,d; see Table S4 for the number of grid cells within each category). Results are similar to the full land cover comparison: Noah-MP biases are negative, and Noah-MP-Cal has smaller bias and RMSE than Noah-MP-Def. Higher elevations have larger biases and RMSEs. At forested elevations below 3000 m, Noah-MP-Cal and Noah-MP-Def have similar values of RMSE. Calibration decreases errors in the higher-elevation grid cells, where more snow often accumulates due to colder temperatures coupled with orographic lifting. We also calculate the ratio of RMSE to mean snow depth (not shown). For Noah-MP-Cal, this metric decreases with elevation, while for Noah-MP-Def, it increases above 2500 m. Much of the increase in Noah-MP-Cal RMSE with elevation is therefore due to deeper snowpacks at higher elevations. SNODAS and UA are both based on observational datasets, which likely have larger uncertainty in forests. Noah-MP-Cal is in better agreement with the observation-based gridded data products than Noah-MP-Def, but the true accuracy in forested environments is limited by a lack of observations in forests.
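The forest elevation-band analysis, including the ratio of RMSE to mean snow depth mentioned above, can be sketched as a grouped reduction. This is a hypothetical illustration. The band edges are placeholders rather than the bands used in Fig. 8, and the inputs are assumed to be already restricted to forest cells and stored as (time, cell) arrays.

```python
import numpy as np


def stats_by_band(model, ref, elev, edges):
    """Bias, RMSE, and RMSE / mean(ref) for cells grouped into elevation bands.

    model, ref: (time, ncell) snow depth; elev: (ncell,) in m; edges: band limits in m.
    Band 0 is below edges[0]; band len(edges) is at or above edges[-1].
    """
    model = np.asarray(model, dtype=float)
    ref = np.asarray(ref, dtype=float)
    band = np.digitize(elev, edges)
    results = {}
    for b in np.unique(band):
        cols = band == b
        diff = model[:, cols] - ref[:, cols]
        rmse = np.sqrt(np.nanmean(diff ** 2))
        results[int(b)] = {
            "bias": np.nanmean(diff),
            "rmse": rmse,
            "rmse_over_mean": rmse / np.nanmean(ref[:, cols]),  # normalizes for deeper snow
        }
    return results


# Hypothetical placeholder bands (m): <2500, 2500-3000, 3000-3500, >=3500.
print(stats_by_band(np.full((4, 3), 1.2), np.full((4, 3), 1.0),
                    np.array([2200.0, 2800.0, 3700.0]), [2500.0, 3000.0, 3500.0]))
```

Normalizing by the mean reference depth separates two cases: larger errors that simply reflect deeper snowpacks, and genuinely poorer relative performance. This is the distinction drawn above for Noah-MP-Cal at high elevations.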

2) GHCN comparisons

Across 79 GHCN stations, Noah-MP-Cal is less biased (0.0049 m) and has a lower RMSE (0.15 m) than Noah-MP-Def (bias of −0.04 m and RMSE of 0.20 m). Noah-MP-Cal generally reduces the Noah-MP-Def snow depth bias in the Front Range and broadly reduces RMSE across the full domain (Fig. 9). While results are generally similar between Noah-MP-Cal and Noah-MP-Def, the GHCN evaluation provides an additional, independent check that calibration improves the performance of modeled snow depth.

Fig. 9.

(a)–(c) Snow depth bias and (d)–(f) RMSE from Noah-MP-Cal and Noah-MP-Def compared to GHCN station observations. The right column shows the difference between Noah-MP-Cal and Noah-MP-Def for bias and RMSE.

3) SnowEx comparisons

Finally, we also evaluate snow depth and SWE against 264 snow pit observations from the NASA SnowEx 2017 field campaigns at Grand Mesa and Senator Beck (Fig. 10). Here we include SnowModel simulations in the comparison to consider a snow process model. For both Noah-MP and SnowModel, we compare each snow pit against the grid cell that contains it. SnowModel is kept at its native 30-m resolution, though we also tested averaging SnowModel grid cells to the Noah-MP resolution, and results were similar. The majority of pit observations (n = 224) are from Grand Mesa, where there is better agreement after calibration for snow depth (Table 5). Mean bias decreases in magnitude from −48.2 to −12.1 cm (mean percent absolute difference decreases from 32.2% to 20.0%), and RMSE decreases from 54.4 to 34.9 cm. Similarly for SWE, Noah-MP-Cal has a smaller SWE mean bias at Grand Mesa than Noah-MP-Def (−23.0 versus −160.6 mm) and a smaller RMSE (132.9 versus 185.4 mm). For SWE, SnowModel has better agreement with snow pits than either Noah-MP simulation. For snow depth, however, the performance of SnowModel and Noah-MP-Cal is comparable, with Noah-MP-Cal having smaller MAE and RMSE. The differing relative performance for snow depth and SWE may be due to different snow density estimates in SnowModel and Noah-MP-Cal. At Senator Beck (n = 40 pits), where we do not have SnowModel simulations, Noah-MP-Cal substantially degrades the Noah-MP-Def evaluation metrics for both snow depth and SWE: RMSE increases from 49.7 to 102.5 cm for snow depth and from 167.6 to 413.0 mm for SWE. This highlights the uneven performance across the domain after calibration.
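The next sketch matches snow pits to the model grid cell that contains them and then summarizes the paired differences. The regular 0.01° grid origin and the function names are assumptions for illustration. So is the definition of mean percent absolute difference used here, the mean of |model − pit| / pit.

```python
import numpy as np


def containing_cell(lat, lon, lat0, lon0, dres=0.01):
    """Row and column of the regular lat-lon cell containing each point.

    (lat0, lon0) is the south-west corner of the grid, rows are assumed to
    increase northward, and dres is the cell size in degrees.
    """
    row = np.floor((np.asarray(lat, dtype=float) - lat0) / dres).astype(int)
    col = np.floor((np.asarray(lon, dtype=float) - lon0) / dres).astype(int)
    return row, col


def paired_metrics(sim, obs):
    """Error metrics for model values sampled at observation points."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    diff = sim - obs
    return {
        "bias": diff.mean(),
        "mae": np.abs(diff).mean(),
        "rmse": np.sqrt(np.mean(diff ** 2)),
        "mean_abs_pct_diff": 100.0 * np.mean(np.abs(diff) / obs),  # assumed definition
    }


# Hypothetical usage: sample a (ny, nx) model depth field at the pit locations.
# row, col = containing_cell(pit_lat, pit_lon, lat0, lon0)
# metrics = paired_metrics(depth_field[row, col], pit_depth)
```

Sampling the single containing cell, rather than interpolating between neighbors, matches the "grid cell that contains each snow pit" approach described above.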

Fig. 10.

Comparison of (a),(c) SWE and (b),(d) snow depth between Noah-MP and SnowModel simulations and observations from snow pits during the SnowEx 2017 field campaign. In all plots, blue squares are calibrated Noah-MP, orange squares are default Noah-MP, and yellow squares are SnowModel (at native 30-m resolution). Plots (a) and (b) compare snow pit measurements for Grand Mesa and plots (c) and (d) compare for Senator Beck.

Table 5

Error metrics for snow depth and SWE comparing Noah-MP-Cal and Noah-MP-Def with snow pit observations from Grand Mesa and Senator Beck SnowEx study sites from the February 2017 field campaign. Bold indicates better performance between the two Noah-MP configurations.

Spatially, Noah-MP-Def has much lower values of snow depth than measured in the snow pits on a single day (Fig. 11). Noah-MP-Cal, on the other hand, has spatial patterns that better match the snow pit observations throughout the Grand Mesa study site, capturing the overall east–west gradient seen in the snow pit observations and in the SnowModel simulation. Calibrated Noah-MP at 1-km resolution has error metrics similar to those of an uncalibrated snow process model at 30-m resolution, but this evaluation is only possible over a small portion of the full domain.

Fig. 11.

Comparison of (a) Noah-MP-Cal, (b) Noah-MP-Def, and (c) SnowModel snow depth estimates with snow pit observations for 22 Feb 2017, over the SnowEx Grand Mesa field campaign site. SnowModel is shown at 30-m spatial resolution. Snow pit depths and model depths are on the same color scale.

Finally, we evaluate Noah-MP simulations against ASO lidar snow depth observations from SnowEx flights on 8 and 16 February 2017 (Painter 2018). Spatially, estimates from ASO, Noah-MP-Cal, and Noah-MP-Def have somewhat similar patterns on each flight day, with snow depth tending to increase toward the eastern portion of the domain (Fig. 12). ASO and Noah-MP-Cal also show that snow depth increases from north to south across the domain; Noah-MP-Def, on the other hand, has lower variability across the domain. Note the deeper band of snow in the ASO observations along the northern portion of the domain. The deeper snow here is likely due to snow accumulating at the base of the cliff. Snow persistence maps (Fig. S3) show that snow historically lingers longer along the base of the cliff, suggesting that the deeper snow depths in ASO are plausible. Noah-MP, with grid cells orders of magnitude coarser than ASO, cannot capture this fine-scale spatial pattern. For both flight days, Noah-MP-Cal has higher values of SPAEF, which indicates better spatial agreement with ASO observations. On 8 February, Noah-MP-Cal has a SPAEF value of 0.408 and Noah-MP-Def has a value of 0.253; on 16 February, Noah-MP-Cal has a SPAEF of 0.516 compared to 0.195 from Noah-MP-Def.

Fig. 12.

Comparison of snow depth from ASO flights with Noah-MP-Cal and Noah-MP-Def over Grand Mesa for 8 and 16 Feb 2017. Spatial maps are all at native resolution: 3 m for ASO and 0.01° for Noah-MP simulations. Scatterplots compare Noah-MP simulations to ASO observations, where ASO has been upscaled to 0.01° resolution.

Originally collected at 3-m spatial resolution, ASO snow depth observations are aggregated to 0.01° resolution to match the Noah-MP simulations by averaging more than 100 000 ASO 3-m grid cells within each Noah-MP grid cell. In evaluations of Noah-MP grid cells against aggregated ASO depth observations, Noah-MP-Def underestimates ASO, and Noah-MP-Cal overestimates ASO for snow depths above 1.5 m (Fig. 12). For each flight day, Noah-MP-Cal has smaller RMSE, MAE, and bias magnitude than Noah-MP-Def. This comparison suggests that calibration may lead to overestimates of snow depth in some regions but introduces more realistic spatial patterns of snow depth relative to ASO observations.
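The upscaling of ASO depths can be sketched as a binned mean. Each 3-m pixel is assigned to the 0.01° cell containing its center, and pixel values are averaged per cell. This is a hypothetical sketch that assumes pixel-center coordinates in degrees and a regular grid whose rows increase northward. It is not necessarily the exact aggregation used for Fig. 12. The final lines check the pixel count quoted above, assuming a latitude of about 39°N and roughly 111.32 km per degree.

```python
import math

import numpy as np


def block_average(values, lat, lon, lat0, lon0, nrow, ncol, dres=0.01):
    """Mean of fine-resolution samples within each coarse lat-lon cell.

    Returns (mean, count) arrays of shape (nrow, ncol); cells with no valid
    samples are NaN in `mean`. Rows are assumed to increase northward.
    """
    values = np.asarray(values, dtype=float).ravel()
    lat = np.asarray(lat, dtype=float).ravel()
    lon = np.asarray(lon, dtype=float).ravel()
    row = np.floor((lat - lat0) / dres).astype(int)
    col = np.floor((lon - lon0) / dres).astype(int)
    ok = np.isfinite(values) & (row >= 0) & (row < nrow) & (col >= 0) & (col < ncol)
    flat = row[ok] * ncol + col[ok]                        # 1-D cell index
    sums = np.bincount(flat, weights=values[ok], minlength=nrow * ncol)
    counts = np.bincount(flat, minlength=nrow * ncol)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean = sums / counts                               # 0/0 -> NaN for empty cells
    return mean.reshape(nrow, ncol), counts.reshape(nrow, ncol)


# Approximate number of 3 m x 3 m pixels in one 0.01 deg cell near 39 N.
dy = 0.01 * 111_320.0                          # m per 0.01 deg of latitude (approx.)
dx = dy * math.cos(math.radians(39.0))         # m per 0.01 deg of longitude
print(round(dx * dy / 9.0))                    # on the order of 1e5 pixels
```

Under these assumptions, the count comes out at roughly 1.07 × 10⁵ pixels per cell, consistent with the "more than 100 000" figure above.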

c. Streamflow evaluation

Beyond impacts on snow depth and SWE, calibration will affect LSM simulation of other hydrological variables. For six basins within the Colorado domain with little-to-no human management, calibration can improve streamflow estimation (Fig. 13). Of the six basins, four have higher NSE values for Noah-MP-Cal than Noah-MP-Def. After calibration, however, four of the six basins still have negative NSE values, though not all of the streamflow bias is necessarily attributable to snow. For the two basins with NSE > 0 (9072500 and 9081600), calibration improves performance, though only 9072500 has a statistically significant difference in monthly streamflow between Noah-MP-Cal and Noah-MP-Def. In two basins, both on the Gunnison River (9124700 and 9127800), calibration leads to a larger overestimation of streamflow in some evaluation years. For the Colorado River at Glenwood Springs (9072500), Noah-MP-Def largely underestimates streamflow, and calibration addresses this bias through increased runoff. In most years for most basins, Noah-MP-Cal has later peak streamflow than Noah-MP-Def, in better agreement with the observations (Fig. 13). In 9107000, where Noah-MP-Def overestimates observations, Noah-MP-Cal decreases the magnitude of the bias, though it still overestimates slightly. In 9081600, where Noah-MP-Def underestimates observed streamflow, the calibrated runoff is a better match for the observations. Calibration therefore does not simply shift snow and runoff in one direction; it can reduce both positive and negative biases. Results similar to the small basin analysis are seen across the full model domain, including higher springtime streamflow in Noah-MP-Cal compared to Noah-MP-Def (Figs. S4a–c). Peak streamflow in Noah-MP-Cal also generally occurs later in the year than in Noah-MP-Def (Fig. S4f), in agreement with later snowmelt in Noah-MP-Cal (Figs. 3 and 8).
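Comparing gridcell runoff with basin flows requires converting runoff depth to volume. The minimal sketch below assumes monthly surface and subsurface runoff depths in mm on cells of known area, plus a boolean basin mask. It also assumes each series starts in the first month of a water year, so peak timing can be read off per year. All names and inputs are hypothetical. No routing is applied, so the volume is attributed to the month in which runoff is generated.

```python
import numpy as np


def basin_monthly_volume(surface_mm, subsurface_mm, cell_area_m2, in_basin):
    """Basin runoff volume (m^3 per month) from gridcell runoff depths.

    surface_mm, subsurface_mm: (nmonth, ncell) monthly runoff depth in mm;
    cell_area_m2: (ncell,); in_basin: boolean mask (ncell,).
    """
    total_m = (np.asarray(surface_mm, float) + np.asarray(subsurface_mm, float)) / 1000.0
    area = np.asarray(cell_area_m2, float)
    return (total_m[:, in_basin] * area[in_basin]).sum(axis=1)


def peak_month_each_year(volume, months_per_year=12):
    """Index of the peak month within each complete water year."""
    v = np.asarray(volume, float)
    nyears = v.size // months_per_year
    return v[: nyears * months_per_year].reshape(nyears, months_per_year).argmax(axis=1)
```

Applying `peak_month_each_year` to both simulations and to the observed series gives a direct comparison of peak timing, one of the diagnostics discussed above.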

Fig. 13.

(a)–(d) Comparison of runoff (m³ month⁻¹) between Noah-MP simulations and estimates of naturalized flow for four subbasins in the Upper Colorado River basin. (e),(f) Comparison of runoff (m³ month⁻¹) between Noah-MP simulations and observed streamflow from USGS stream gauges for small unmanaged subbasins, selected from the CAMELS database. Stream gauge locations are shown in Fig. S4. Dashed lines in all plots show basin snow water storage (km³).

4. Discussion

a. Summary of results

Here we investigate the impact of model calibration on simulations of snow depth, SWE, and streamflow. From this calibration exercise, we aim to answer the three research questions posed in the introduction. First, calibration can address dry biases in LSMs, which often result in underestimation of snow. We show improvements not only to simulated SWE and snow depth magnitude but also to the timing of both the accumulation and ablation periods. Calibration also results in Noah-MP-Cal performing about as well as an uncalibrated, high-resolution snow process model (Table 5; Fig. 11). While evaluations of Noah-MP and SnowModel are limited to Grand Mesa, the spatial variability of snow depth across the mesa is similar in SnowModel and Noah-MP-Cal, though SnowModel simulations produce more detail with the finer spatial resolution. When both models are compared to snow pit measurements, Noah-MP-Cal actually performs better for snow depth, though its error metrics are larger for SWE. Results are similar for the high-resolution ASO lidar, where Noah-MP-Cal captures the realistic spatial variability in ASO estimates. This suggests that, over Grand Mesa at least, the calibration procedure largely improves the model simulation.

Second, impacts from calibration are observed beyond snow variables. For streamflow, Noah-MP-Cal improves estimates for four of the six study basins. Here we are limited to small unmanaged basins or reconstructed estimates of natural streamflow since Noah-MP does not include human management. We note that we are not evaluating routed streamflow here, but instead, we consider gridcell estimates of surface and subsurface runoff. Future work should consider dynamically routed streamflow in order to account for time lags between the upper reaches of the watershed and the evaluation point with the stream gauge. Even with those considerations, improved NSE metrics suggest that the increased snowpack in Noah-MP-Cal results in streamflow magnitude and timing that better matches observations.

Finally, the calibration highlights potential avenues for improving both model configuration and meteorological forcing data, though calibrated parameters may be reflective of the choice of forcing dataset (Elsner et al. 2014). The genetic algorithm procedure produces spatially varying model parameters, as compared to the spatially uniform parameters used in the default Noah-MP configuration. In particular, we highlight 10 parameters within Noah-MP that are likely candidates for further investigation. We show that allowing these parameters to vary in space results in improved model performance compared to the default, spatially uniform values. Some of the parameters, such as SNOWF_SCALEF, appear to have a relationship with elevation (cf. Fig. 2k with Fig. 1). Other parameters, such as MXSNALB and Z0SNO, appear to be more related to land class category (cf. Figs. 2b,j with Fig. 1). Future efforts should determine new estimates for these parameters, perhaps through investigation of relationships with landscape characteristics, such as elevation, vegetation class, and soil type.

Global maps of the sensitive parameters could likely improve simulation of snow without the need for a computationally expensive calibration procedure. In addition to investigating relationships for creating spatially varying parameters, work should consider whether parameters should also vary in time. Creating new estimates of spatially and temporally varying parameters could improve snow modeling without the data requirement of calibration, which would have implications for our ability to estimate global snow, regardless of data availability. Efforts to scale the snow parameters examined here to larger domains are under development, with the aim of producing spatially varying parameter estimates for all of CONUS.

In addition to the 10 parameters from Noah-MP discussed above, results from calibration demonstrate that introducing the snowfall scaling term has a large impact on the snow accumulation magnitude. This points to the need for better meteorological forcing data, particularly for precipitation at high elevations. There is often high variability between precipitation estimates from differing models and reanalyses (Decker et al. 2012; Essou et al. 2016; Henn et al. 2018; Hughes et al. 2017; Wrzesien et al. 2019), and previous studies have suggested that NLDAS-2 precipitation is too low in mountain regions (Enzminger et al. 2019; He et al. 2019; Henn et al. 2018; Smyth et al. 2020); such uncertainty will be propagated into the LSM. However, improving large-scale precipitation estimates is not trivial, and model-based precipitation estimates often outperform observation-based estimates in mountain areas (Lundquist et al. 2019), despite known model biases. If we cannot improve estimates of precipitation and snowfall in the forcing datasets, informing modeled snowpack estimates with observations of SWE and snow depth is likely the best option. This calibration procedure highlights a method for addressing biases in both meteorological forcing and the LSM itself and results in improved simulations of snow in a topographically complex region.

b. Implications for snow OSSE

As discussed, the Noah-MP-Cal simulation presented here will be used as the nature run (NR) in a snow-focused observing system simulation experiment (OSSE), where the NR is designed to approximate the “truth,” i.e., actual snow conditions. Though calibration is not a panacea for reducing all model uncertainty, the improved performance from Noah-MP-Def to Noah-MP-Cal provides compelling support for Noah-MP-Cal to be the NR for the OSSE. Of particular concern when designing the OSSE was whether the NR could address the underestimation of snow at higher elevations, which is necessary for understanding how proposed sensors will observe realistic ranges of snow conditions. Calibrating Noah-MP against UA SWE estimates reduces the negative bias for SWE and snow depth and results in snow spatial heterogeneity that better matches both UA and SNODAS.

While Noah-MP-Cal is not without error, an NR is not expected to perfectly replicate actual conditions, and no true observations are used in an OSSE. Therefore, the spatial and temporal variability in Noah-MP-Cal is adequate for approximating realistic snow conditions for the western Colorado domain. The main drawback of Noah-MP as the NR, whether in the default or calibrated configuration, is that Noah-MP does not provide estimates of snow grain size. Understanding how satellite observations are affected by snow grain size and metamorphism is a fundamentally important question (Durand et al. 2018; Nolin 2010; Foster et al. 2005). However, no models within the current LIS framework provide estimates of snow grain size, though work is ongoing to implement new snow models into LIS. While Noah-MP-Cal will be used in the OSSE described here, future work will consider a follow-on OSSE that incorporates a model that does simulate grain size.

c. Challenges with calibration

With a calibration exercise such as this one, there are a few notable challenges. Calibration can lead to domain-averaged improvements in the targeted variable, as presented here for SWE and snow depth, but it can also degrade performance in individual regions across the domain. We see this in the northern portion of the domain (Fig. 4a) to the west of the Cameron Pass evaluation site (Fig. 1). After the genetic algorithm optimization, the snowfall scale term is high in this region (Fig. 2k), resulting in snow depths and SWE values that are much larger than either UA or SNODAS. In the calibration period, SWE estimates from UA were particularly large: the 2007–09 average peak SWE for this area from UA is higher than the average peak SWE for 2010–20, so the calibration was trained on higher-than-average SWE. Such calibration-induced anomalies are often unavoidable.

Another challenge with our calibration setup is that the parameters are constant in time. Therefore, even if Noah-MP-Def performs well compared to UA, the calibrated parameters will still be applied. For example, in water years 2017, 2019, and 2020, domain-averaged SWE and snow depth from Noah-MP-Def are similar to UA (Fig. 3). Applying the calibrated parameters generally results in increased snow values, and as a result, Noah-MP-Cal overestimates SWE in these years. Calibration improves performance over the full study period (Table 3), but it does not always result in better performance for an individual year or season. As discussed above with the spatial anomalies, calibration will not result in uniformly improved performance.

All calibration procedures, including the genetic algorithm used here, require a truth dataset to calibrate against, and data availability is limited in many regions, especially at high elevations and high latitudes where much of the global snow accumulates. Therefore, while the calibration procedure presented here is a critical step for the ongoing OSSE and for improving the representation of the truth, calibrating over a well-observed Colorado domain will not necessarily improve model performance for snow globally. Results presented here may not apply to other regions with differing snow conditions, such as maritime snow in the Pacific Northwest or tundra snow at high latitudes (e.g., Kim et al. 2021). Future work will investigate similar calibration methods in other regions. We hypothesize that in regions with high precipitation uncertainty, such as mountainous regions, the snowfall scaling term will affect snow magnitude in a way similar to that presented here.

Since we calibrate only against SWE and do not include additional constraints in the objective function, such as streamflow, the calibration cannot directly address biases in other model processes. In the streamflow analyses, Noah-MP-Def does not agree well with the observed runoff (Fig. 13 and Table 6). Further observational constraints or model improvements (possibly unrelated to snow processes) are required to address the runoff biases shown here. In operational modeling, it is standard to calibrate snowmelt rates to runoff (e.g., Hay et al. 2006; Franz and Karsten 2013; Turcotte et al. 2007) in order to constrain snow ablation. Here, though, we do not calibrate against runoff. Degradation in unconstrained variables, such as runoff, is not uncommon during calibration efforts (e.g., Franz and Karsten 2013; Nemri and Kennard 2020). Future efforts could consider multicriteria objective functions to reduce biases in both snow variables and streamflow.
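
A multicriteria objective of the kind suggested above could, for example, combine a SWE error term and a streamflow term in a weighted sum. The sketch assumes a normalized RMSE for SWE, an absolute runoff volume bias for streamflow, and a single weight `w_swe`; these choices and the function names are hypothetical and do not describe the objective function used in this study, which targeted SWE only. Setting `w_swe=1.0` recovers a SWE-only objective.

```python
def normalized_rmse(sim, ref):
    """RMSE divided by the reference mean (dimensionless)."""
    rmse = (sum((s - r) ** 2 for s, r in zip(sim, ref)) / len(ref)) ** 0.5
    return rmse / (sum(ref) / len(ref))


def abs_volume_bias(sim, obs):
    """Absolute total-volume bias as a fraction of observed volume."""
    return abs(sum(s - o for s, o in zip(sim, obs))) / sum(obs)


def multicriteria_cost(swe_sim, swe_ref, q_sim, q_obs, w_swe=0.5):
    """Hypothetical weighted-sum objective; lower is better."""
    if not 0.0 <= w_swe <= 1.0:
        raise ValueError("w_swe must lie in [0, 1]")
    swe_term = normalized_rmse(swe_sim, swe_ref)
    flow_term = abs_volume_bias(q_sim, q_obs)
    return w_swe * swe_term + (1.0 - w_swe) * flow_term


swe_ref = [100.0, 300.0, 500.0, 200.0]  # hypothetical reference SWE (mm)
q_obs = [5.0, 20.0, 40.0, 10.0]         # hypothetical observed flow (m^3/s)
q_sim = [2.0 * q for q in q_obs]        # candidate that doubles runoff volume
print(multicriteria_cost(swe_ref, swe_ref, q_sim, q_obs, w_swe=1.0))  # 0.0
print(multicriteria_cost(swe_ref, swe_ref, q_sim, q_obs, w_swe=0.5))  # 0.5
```

A SWE-only objective scores this candidate as perfect even though it doubles runoff volume, which is the kind of blind spot a streamflow term would expose. The weight, or a Pareto-based search in place of a weighted sum, would need to be chosen with care.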

Table 6

Nash–Sutcliffe efficiency values for calibrated and uncalibrated Noah-MP simulations for six subbasins, identified by their USGS stream gauge IDs. Also included is the number of Noah-MP grid cells within each subbasin. Bold indicates better performance. An asterisk indicates where the monthly streamflow difference between Noah-MP-Cal and Noah-MP-Def is statistically significant at the 95% confidence level.

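
For reference, the Nash–Sutcliffe efficiency reported in Table 6 is conventionally defined as NSE = 1 − Σ(Q_sim − Q_obs)² / Σ(Q_obs − mean(Q_obs))², where 1 is a perfect match and values at or below 0 mean the simulation is no better than the observed mean. The sketch below computes it on monthly means of daily flow, following the caption's reference to monthly streamflow; the helper names and aggregation details are assumptions, not the study's code.

```python
from collections import defaultdict
from datetime import date


def monthly_means(daily):
    """Average (date, flow) pairs by calendar month, in chronological order."""
    groups = defaultdict(list)
    for day, flow in daily:
        groups[(day.year, day.month)].append(flow)
    return [sum(v) / len(v) for _, v in sorted(groups.items())]


def nse(sim, obs):
    """Nash-Sutcliffe efficiency of `sim` against `obs`."""
    if len(sim) != len(obs):
        raise ValueError("sim and obs must have the same length")
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


# Hypothetical daily flows (m^3/s) for January and February 2020.
obs_daily = ([(date(2020, 1, d), 2.0) for d in range(1, 32)]
             + [(date(2020, 2, d), 6.0) for d in range(1, 30)])
sim_daily = ([(date(2020, 1, d), 3.0) for d in range(1, 32)]
             + [(date(2020, 2, d), 5.0) for d in range(1, 30)])
print(round(nse(monthly_means(sim_daily), monthly_means(obs_daily)), 2))  # 0.75
```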

Beyond the calibration, evaluating gridded data with point observations presents additional challenges. There are significant differences between what a ∼1-m observation, such as a snow pit or a GHCN station, measures and what a ∼1000-m model grid cell simulates. Since snow depth and SWE measurements are typically point observations, this imperfect comparison is often necessary for evaluating models. However, during extensive field campaigns such as SnowEx 2017, numerous observations are made in a small domain over a short period of time. While the result is still a point-to-grid comparison, the high density of observations allows for a more complete evaluation, albeit over a limited domain. Though we acknowledge the uncertainty from scale differences, we use numerous independent comparison datasets to provide as thorough an evaluation as possible of the improved performance of Noah-MP-Cal over Noah-MP-Def.
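
One common way to reduce, though not remove, the point-to-grid mismatch when dense campaign data such as SnowEx 2017 are available is to average all observations falling within a model cell and compare only cells with enough samples. The sketch assumes point coordinates in a projected system in meters, a regular grid with a known origin, and a hypothetical minimum sample count; the function names are illustrative, and it is not presented as the evaluation procedure used in this study.

```python
from collections import defaultdict


def aggregate_points_to_grid(points, x0, y0, cell_size=1000.0, min_count=5):
    """Bin (x, y, value) points into square cells and average each cell.

    Coordinates are assumed to be projected meters; (x0, y0) is the assumed
    lower-left grid origin. Returns {(col, row): (mean, count)} for cells
    holding at least `min_count` points.
    """
    cells = defaultdict(list)
    for x, y, value in points:
        col = int((x - x0) // cell_size)
        row = int((y - y0) // cell_size)
        cells[(col, row)].append(value)
    return {k: (sum(v) / len(v), len(v))
            for k, v in cells.items() if len(v) >= min_count}


def mean_bias(model_grid, obs_cells):
    """Mean of (model - observed cell mean) over cells present in both."""
    diffs = [model_grid[k] - m for k, (m, _) in obs_cells.items() if k in model_grid]
    return sum(diffs) / len(diffs) if diffs else float("nan")


# Hypothetical snow depths (m): five points in cell (0, 0), two in cell (1, 0).
pts = [(100, 200, 1.0), (300, 400, 2.0), (500, 600, 3.0), (700, 800, 4.0),
       (900, 100, 5.0), (1500, 100, 9.0), (1600, 200, 9.0)]
cells = aggregate_points_to_grid(pts, x0=0.0, y0=0.0)
print(cells)                               # {(0, 0): (3.0, 5)}
print(mean_bias({(0, 0): 4.0}, cells))     # 1.0
```

Requiring a minimum count drops sparsely sampled cells (here, cell (1, 0)), trading spatial coverage for a cell mean that better represents the grid-scale value.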

5. Conclusions

The Noah-MP-Cal and Noah-MP-Def evaluation demonstrates that calibrating a land surface model against an observation-based SWE dataset (e.g., the University of Arizona dataset) improves model performance for snow, though not uniformly across the domain. The calibration procedure was motivated by an ongoing Observing System Simulation Experiment (OSSE) to evaluate the utility of proposed snow satellite sensors, and Noah-MP-Cal will be used as the nature run for the OSSE. That is, the improved Noah-MP simulation will act as the truth for the OSSE, from which synthetic observations will be created. However, the results presented here have important implications beyond the OSSE. We demonstrate a method for improving spatiotemporal estimates of snow, and we show that spatially uniform values of key model parameters result in worse performance. Allowing parameters to vary spatially, as in the Noah-MP-Cal simulation after a genetic algorithm optimization procedure, improves model performance for both snow depth and SWE. Future model development could consider implementing spatially distributed values of sensitive parameters, which might improve LSM simulations without the need for an initial calibration step.

Acknowledgments.

This research was supported by grants from the National Aeronautics and Space Administration (AIST18-0041, AIST18-0045, and NNH16ZDA001N). Computing was supported by resources at the NASA Center for Climate Simulation. We thank three anonymous reviewers for feedback that strengthened the manuscript.

Data availability statement.

All datasets used here for model forcing and evaluation are available through the citations provided in the text. The University of Arizona data are available for download from the National Snow and Ice Data Center (NSIDC; Broxton et al. 2019). The Noah-MP model simulations upon which this study is based are too large to be publicly archived with available resources, though all model output is stored on the NASA Discover supercomputer through the NASA Center for Climate Simulation. To replicate the simulations, interested users can access the NASA Land Information System at https://github.com/NASA-LIS/LISF.

REFERENCES

  • Aguado, E., 1985: Radiation balances of melting snow covers at an open site in the Central Sierra Nevada, California. Water Resour. Res., 21, 1649–1654, https://doi.org/10.1029/WR021i011p01649.

  • Ahl, R. S., S. W. Woods, and H. R. Zuuring, 2008: Hydrologic calibration and validation of SWAT in a snow-dominated rocky mountain watershed, Montana, U.S.A. J. Amer. Water Resour. Assoc., 44, 1411–1430, https://doi.org/10.1111/j.1752-1688.2008.00233.x.

  • Amorocho, J., and B. Espildora, 1966: Mathematical simulation of the snow melting process. Department of Water Science and Engineering, University of California, Davis, 156 pp.

  • Anderson, E. A., 1973: National Weather Service River Forecast system—Snow accumulation and ablation model. NOAA Tech. Memo. NWS HYDRO-17, 87 pp., https://www.wcc.nrcs.usda.gov/ftpref/wntsc/H&H/snow/AndersonHYDRO17.pdf.

  • Barnett, T. P., J. C. Adam, and D. P. Lettenmaier, 2005: Potential impacts of a warming climate on water availability in snow-dominated regions. Nature, 438, 303–309, https://doi.org/10.1038/nature04141.

  • Bormann, K. J., R. D. Brown, C. Derksen, and T. H. Painter, 2018: Estimating snow-cover trends from space. Nat. Climate Change, 8, 924–928, https://doi.org/10.1038/s41558-018-0318-3.

  • Broxton, P. D., N. Dawson, and X. Zeng, 2016a: Linking snowfall and snow accumulation to generate spatial maps of SWE and snow depth. Earth Space Sci., 3, 246–256, https://doi.org/10.1002/2016EA000174.

  • Broxton, P. D., X. Zeng, and N. Dawson, 2016b: Why do global reanalyses and land data assimilation products underestimate snow water equivalent? J. Hydrometeor., 17, 2743–2761, https://doi.org/10.1175/JHM-D-16-0056.1.

  • Broxton, P., X. Zeng, and N. Dawson, 2019: Daily 4 km gridded SWE and snow depth from assimilated in-situ and modeled data over the conterminous US, version 1. NASA National Snow and Ice Data Center Distributed Active Archive Center, accessed 11 May 2020, https://doi.org/10.5067/0GGPB220EX6A.

  • Carroll, T., D. Cline, G. Fall, A. Nilsson, L. Li, and A. Rost, 2001: NOHRSC operations and the simulation of snow cover properties for the coterminous US. Proc. 69th Annual Meeting of the Western Snow Conf., Sun Valley, ID, Western Snow Conference, 14 pp., https://westernsnowconference.org/node/185.

  • Chen, F., and Coauthors, 2014a: Modeling seasonal snowpack evolution in the complex terrain and forested Colorado Headwaters region: A model intercomparison study. J. Geophys. Res. Atmos., 119, 13 795–13 819, https://doi.org/10.1002/2014JD022167.

  • Chen, F., C. Liu, J. Dudhia, and M. Chen, 2014b: A sensitivity study of high-resolution regional climate simulations to three land surface models over the western United States. J. Geophys. Res. Atmos., 119, 7271–7291, https://doi.org/10.1002/2014JD021827.

  • Chen, X., D. Long, Y. Hong, C. Zeng, and D. Yan, 2017: Improved modeling of snow and glacier melting by a progressive two‐stage calibration strategy with GRACE and multisource data: How snow and glacier meltwater contributes to the runoff of the Upper Brahmaputra River basin? Water Resour. Res., 53, 2431–2466, https://doi.org/10.1002/2016WR019656.

  • Chow, V. T., 1964: Handbook of Applied Hydrology. McGraw-Hill, 1495 pp.

  • Crow, W. T., M. Drusch, and E. F. Wood, 2001: An observation system simulation experiment for the impact of land surface heterogeneity on AMSR-E soil moisture retrieval. IEEE Trans. Geosci. Remote Sensing, 39, 1622–1631, https://doi.org/10.1109/36.942540.

  • Crow, W. T., and Coauthors, 2005: An observing system simulation experiment for Hydros radiometer-only soil moisture products. IEEE Trans. Geosci. Remote Sensing, 43, 1289–1303, https://doi.org/10.1109/TGRS.2005.845645.

  • Cuntz, M., and Coauthors, 2016: The impact of standard and hard-coded parameters on the hydrologic fluxes in the Noah-MP land surface model. J. Geophys. Res. Atmos., 121, 10 676–10 700, https://doi.org/10.1002/2016JD025097.

  • Daly, C., G. H. Taylor, W. P. Gibson, T. W. Parzybok, G. L. Johnson, and P. A. Pasteris, 2000: High-quality spatial climate data sets for the United States and Beyond. Trans. ASAE, 43, 1957–1962, https://doi.org/10.13031/2013.3101.

  • Dawson, N., P. Broxton, and X. Zeng, 2018: Evaluation of remotely sensed snow water equivalent and snow cover extent over the contiguous United States. J. Hydrometeor., 19, 1777–1791, https://doi.org/10.1175/JHM-D-18-0007.1.

  • Decker, M., M. A. Brunke, Z. Wang, K. Sakaguchi, X. Zeng, and M. G. Bosilovich, 2012: Evaluation of the reanalysis products from GSFC, NCEP, and ECMWF using flux tower observations. J. Climate, 25, 1916–1944, https://doi.org/10.1175/JCLI-D-11-00004.1.

  • Demirel, M. C., J. Mai, G. Mendiguren, J. Koch, L. Samaniego, and S. Stisen, 2018: Combining satellite data and appropriate objective functions for improved spatial pattern performance of a distributed hydrologic model. Hydrol. Earth Syst. Sci., 22, 1299–1315, https://doi.org/10.5194/hess-22-1299-2018.

  • Dirmhirn, I., and F. D. Eaton, 1975: Some characteristics of the albedo of snow. J. Appl. Meteor., 14, 375–379, https://doi.org/10.1175/1520-0450(1975)014<0375:SCOTAO>2.0.CO;2.

  • Dozier, J., E. H. Bair, and R. E. Davis, 2016: Estimating the spatial distribution of snow water equivalent in the world’s mountains: Spatial distribution of snow in the mountains. Wiley Interdiscip. Rev.: Water, 3, 461–474, https://doi.org/10.1002/wat2.1140.

  • Duethmann, D., J. Peters, T. Blume, S. Vorogushyn, and A. Güntner, 2014: The value of satellite-derived snow cover images for calibrating a hydrological model in snow-dominated catchments in Central Asia. Water Resour. Res., 50, 2002–2021, https://doi.org/10.1002/2013WR014382.

  • Durand, M., C. Gatebe, E. Kim, N. Molotch, T. H. Painter, M. Raleigh, M. Sandells, and C. Vuyovich, 2018: NASA SnowEx Science Plan: Assessing approaches for measuring water in Earth’s Seasonal Snow, version 1.6. NASA, 68 pp., https://snow.nasa.gov/sites/default/files/SnowEx_Science_Plan_v1.6.pdf.

  • Dutra, E., P. Viterbo, P. M. A. Miranda, and G. Balsamo, 2012: Complexity of snow schemes in a climate model and its impact on surface energy and hydrology. J. Hydrometeor., 13, 521–538, https://doi.org/10.1175/JHM-D-11-072.1.

  • Elder, K., L. Brucker, C. Hiemstra, and H. Marshall, 2018: SnowEx17 community snow pit measurements, version 1. NASA National Snow and Ice Data Center Distributed Active Archive Center, accessed 20 July 2020, https://doi.org/10.5067/Q0310G1XULZS.

  • Elsner, M. M., S. Gangopadhyay, T. Pruitt, L. D. Brekke, N. Mizukami, and M. P. Clark, 2014: How does the choice of distributed meteorological data affect hydrologic model calibration and streamflow simulations? J. Hydrometeor., 15, 1384–1403, https://doi.org/10.1175/JHM-D-13-083.1.

  • Enzminger, T. L., E. E. Small, and A. A. Borsa, 2019: Subsurface water dominates Sierra Nevada seasonal hydrologic storage. Geophys. Res. Lett., 46, 11 993–12 001, https://doi.org/10.1029/2019GL084589.

  • Essery, R., and P. Etchevers, 2004: Parameter sensitivity in simulations of snowmelt. J. Geophys. Res., 109, D20111, https://doi.org/10.1029/2004JD005036.

  • Essery, R., and Coauthors, 2009: SNOWMIP2: An evaluation of forest snow process simulations. Bull. Amer. Meteor. Soc., 90, 1120–1136, https://doi.org/10.1175/2009BAMS2629.1.

  • Essou, G. R. C., F. Sabarly, P. Lucas-Picher, F. Brissette, and A. Poulin, 2016: Can precipitation and temperature from meteorological reanalyses be used for hydrological modeling? J. Hydrometeor., 17, 1929–1950, https://doi.org/10.1175/JHM-D-15-0138.1.

  • Etchevers, P., and Coauthors, 2004: Validation of the energy budget of an alpine snowpack simulated by several snow models (Snow MIP project). Ann. Glaciol., 38, 150–158, https://doi.org/10.3189/172756404781814825.

  • Foster, J. L., C. Sun, J. P. Walker, R. Kelly, A. Chang, J. Dong, and H. Powell, 2005: Quantifying the uncertainty in passive microwave snow water equivalent observations. Remote Sensing Environ., 94, 187–203, https://doi.org/10.1016/j.rse.2004.09.012.

  • Franz, K. J., and L. R. Karsten, 2013: Calibration of a distributed snow model using MODIS snow covered area data. J. Hydrol., 494, 160–175, https://doi.org/10.1016/j.jhydrol.2013.04.026.

  • Franz, K. J., T. S. Hogue, and S. Sorooshian, 2008: Operational snow modeling: Addressing the challenges of an energy balance model for National Weather Service forecasts. J. Hydrol., 360, 48–66, https://doi.org/10.1016/j.jhydrol.2008.07.013.

  • Franz, K. J., P. Butcher, and N. K. Ajami, 2010: Addressing snow model uncertainty for hydrologic prediction. Adv. Water Resour., 33, 820–832, https://doi.org/10.1016/j.advwatres.2010.05.004.

  • Garnaud, C., S. Bélair, M. L. Carrera, C. Derksen, B. Bilodeau, M. Abrahamowicz, N. Gauthier, and V. Vionnet, 2019: Quantifying snow mass mission concept trade-offs using an observing system simulation experiment. J. Hydrometeor., 20, 155–173, https://doi.org/10.1175/JHM-D-17-0241.1.

  • Hall, D. K., G. A. Riggs, V. V. Salomonson, N. E. DiGirolamo, and K. J. Bayr, 2002: MODIS snow-cover products. Remote Sensing Environ., 83, 181–194, https://doi.org/10.1016/S0034-4257(02)00095-0.

  • Hansen, M. C., and Coauthors, 2013: High-resolution global maps of 21st-century forest cover change. Science, 342, 850–853, https://doi.org/10.1126/science.1244693.

  • Harrison, K. W., S. V. Kumar, C. D. Peters-Lidard, and J. A. Santanello, 2012: Quantifying the change in soil moisture modeling uncertainty from remote sensing observations using Bayesian inference techniques. Water Resour. Res., 48, W11514, https://doi.org/10.1029/2012WR012337.

  • Hay, L. E., G. H. Leavesley, M. P. Clark, S. L. Markstrom, R. J. Viger, and M. Umemoto, 2006: Step wise, multiple objective calibration of a hydrologic model for a snowmelt dominated basin. J. Amer. Water Resour. Assoc., 42, 877–890, https://doi.org/10.1111/j.1752-1688.2006.tb04501.x.

  • He, C., and Coauthors, 2019: Can convection-permitting modeling provide decent precipitation for offline high-resolution snowpack simulations over mountains? J. Geophys. Res. Atmos., 124, 12 631–12 654, https://doi.org/10.1029/2019JD030823.

  • He, M., T. S. Hogue, K. J. Franz, S. A. Margulis, and J. A. Vrugt, 2011: Characterizing parameter sensitivity and uncertainty for a snow model across hydroclimatic regimes. Adv. Water Resour., 34, 114–127, https://doi.org/10.1016/j.advwatres.2010.10.002.

  • Henn, B., M. P. Clark, D. Kavetski, B. McGurk, T. H. Painter, and J. D. Lundquist, 2016: Combining snow, streamfl