1. Introduction
In this study, we compile evidence suggesting that many types of high-resolution atmospheric models (Gutowski et al. 2020) exhibit cold biases in 2-m air temperature (“T2m”) across major mountain regions. The inaccuracies in modeled T2m raise the question of how such biases should be treated and whether key processes are being adequately represented at the scales required for the skillful projection of climate change in Earth’s water towers (Viviroli and Weingartner 2004; Immerzeel et al. 2020; Siirila-Woodburn et al. 2021). The bias is defined here as the model value minus the observed value, so a cold bias of 1°C indicates that the model is colder than the observations by that amount. In the following sections, we describe 44 studies from the last decade, including both limited-area models (LAMs) and variable-resolution general circulation models (Table A1). Most notably, at the mountain-range scale, we were unable to find examples of warm biases in the published literature (though, as we will show, warm biases can occur in valley subregions of mountain ranges). For simplicity, we refer to the models used in all of these previous studies as Models of Applicable Resolution for Mountain Meteorology across Time Scales (MARMOTS), based on their common goal of producing meteorological information at scales applicable to questions related to mountain hydroclimates.
a. Mountains in a warming world.
Though they occupy a small percentage of Earth’s landmass (between 13% and 30%; Kapos et al. 2000; Körner et al. 2011, 2017; Snethlage et al. 2022), mountains have an outsized global impact as the world’s water towers (Viviroli and Weingartner 2004; Immerzeel et al. 2020; Siirila-Woodburn et al. 2021). Theoretically, mountains warm from anthropogenic climate change at rates different from low-lying regions through a variety of mechanisms related to both cryospheric changes (e.g., snow albedo feedbacks) and atmospheric thermodynamic considerations (Mountain Research Initiative EDW Working Group 2015; Palazzi et al. 2019; Hock et al. 2019). However, observational determination of elevation-dependent warming has proven more elusive (Pepin et al. 2022), and such assessments are limited by data quality, continuity, and coverage in high-elevation areas (Oyler et al. 2015b; McAfee et al. 2019; Ma et al. 2019). At the same time, T2m is a first-order control on whether precipitation falls as rain or snow (Harpold et al. 2017; Jennings et al. 2018), including during rain-on-snow events (Heggli et al. 2022), and on snowmelt timing (Musselman et al. 2021) and streamflow drought (Udall and Overpeck 2017; Gangopadhyay et al. 2022) in many mountain regions.
A number of factors complicate the spatiotemporal patterns of T2m in complex mountain terrain compared to flat, low-elevation areas. Figure 1 illustrates idealized depictions of some of these processes. Mountains are generally at high elevation, so the total mass of the atmosphere above them (and hence the surface pressure) is less than in low-lying areas. As a consequence, there is less diffuse radiation from scattering and more intense direct beam radiation (Smith 2019). At the surface, the incident radiation also depends on slope, aspect, terrain shadowing, and terrain reflection (Fig. 1a), so sun-facing aspects may have warmer temperatures (Strachan and Daly 2017). Additional complications of T2m in mountain regions may arise from the patchwork of land-cover types, namely, snow and vegetation, which impact the surface energy balance and therefore T2m. Snow has a high albedo and a high emissivity (Fig. 1b), so it both very effectively reflects incoming radiation and emits heat in the longwave (LW) (Armstrong and Brun 2008). Vegetation has high surface roughness and a lower albedo than snow (Fig. 1c) and likewise influences the surface energy balance through both radiative and turbulent exchange mechanisms (Lee et al. 2011; Schultz et al. 2017; Burakowski et al. 2018).
Cold-air pools are common features in mountain climates that result from the topography and occur especially in winter and at night during periods of light winds and clear skies (Figs. 1d,e; Daly et al. 2010; Whiteman 2000; Lundquist et al. 2008). In such cases, cold, dense air drains from aloft and settles in valley bottoms, leading to stable stratification with relatively warm air overlying the cold air near the surface. Cold-air pools may also form even with relatively minimal cold-air drainage, in cases where topography limits mixing with ambient air (Clements et al. 2003) and radiative cooling dominates. In these cases, T2m observed within the valley cold pool may be colder than T2m observed at higher altitudes (an inverted temperature profile; Fig. 1e). Such features may mix out during the day or persist for days or weeks (Fig. 1d). The mountain planetary boundary layer is complex and has only begun to be explored (Lehner and Rotach 2018; Serafin et al. 2018). Serafin et al. (2018) provide a thorough review of many unique features of the mountain planetary boundary layer and challenges for models therein.
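As a rough illustration of the inverted temperature profile described above, the following sketch fits a straight line to T2m versus station elevation. A positive slope (temperature increasing with height) indicates an inversion, whereas a negative slope indicates the usual decrease of temperature with height. The function names and the station values are hypothetical and are not drawn from any dataset used in this study.

```python
def temperature_gradient(elevations_m, temps_c):
    """Least-squares slope of temperature versus elevation, in deg C per km."""
    n = len(elevations_m)
    if n < 2 or n != len(temps_c):
        raise ValueError("need at least two paired observations")
    mean_z = sum(elevations_m) / n
    mean_t = sum(temps_c) / n
    cov = sum((z - mean_z) * (t - mean_t)
              for z, t in zip(elevations_m, temps_c))
    var = sum((z - mean_z) ** 2 for z in elevations_m)
    if var == 0:
        raise ValueError("stations must span more than one elevation")
    return 1000.0 * cov / var  # convert deg C per m to deg C per km


def is_inverted(elevations_m, temps_c):
    """True when temperature increases with elevation across the stations."""
    return temperature_gradient(elevations_m, temps_c) > 0.0


# Hypothetical valley transect: four stations spanning 600 m of elevation.
elev = [2700.0, 2900.0, 3100.0, 3300.0]
night = [-18.0, -15.0, -13.0, -12.0]  # coldest air at the valley bottom
day = [-2.0, -3.3, -4.6, -5.9]        # temperature decreasing with height

night_gradient = temperature_gradient(elev, night)
day_gradient = temperature_gradient(elev, day)
```

A single linear fit is a deliberate simplification: a shallow cold pool confined to the lowest stations can be masked when the slope is computed over the full elevation range, so comparing adjacent station pairs may be more informative in practice.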
b. MARMOTS: The keys to understanding mountain climates.
High-resolution atmospheric models are nonetheless the best available tools for assessing climate impacts in the world’s mountain regions, as uniform-resolution general circulation models are currently too coarse to resolve the mountain topography that is so fundamental for shaping mountainous climates (Rhoades et al. 2018a; Gutowski et al. 2020; Demory et al. 2014). Prein et al. (2015) discuss some of the relatively recent advancements in MARMOTS development. Numerical weather prediction models often share similar dynamical cores and parameterizations with their climate model counterparts but are intended to operate on shorter time scales with continually updated initial and boundary conditions and even state information (through nudging) that make extensive use of existing data assimilation datasets. Output from numerical weather prediction models is increasingly used as input to models for mountain hydrological research (e.g., Currier et al. 2017; Reynolds et al. 2021; Meyer et al. 2023), so numerical weather prediction studies are also considered in this review.
c. Goals and outline.
The paper is structured as follows. We start by reviewing papers evaluating MARMOTS temperature biases. We augment the literature review by analyzing T2m data from NCAR’s “high-resolution CONUS (HRCONUS)” model dataset, presented in Liu et al. (2017), which covers the entire contiguous United States (CONUS) at a 4-km grid spacing. We then pose the question: Is T2m bias truly a bias, or a by-product of model-to-observation resolution mismatches? To answer this question, we review observational capabilities across the globe and examine some of the gridded reference datasets that are frequently used to compute T2m biases.
Finally, we examine model T2m biases in the 300-km² upper East River watershed (ERW), located in the Colorado Rockies, using data collected during the Surface Atmosphere Integrated Field Laboratory (SAIL) field campaign (Feldman et al. 2023). We examine T2m from nine stations located throughout the ERW valley (spanning ∼600 m of elevation) in addition to single-site measurements of T2m covariates, namely, near-surface wind speed, 2-m specific humidity, snow skin temperature, precipitable water vapor (PWV), and cloud cover fraction. We compare observed T2m to output from the High-Resolution Rapid Refresh (HRRR) model (Benjamin et al. 2016) and the reanalysis-forced Weather Research and Forecasting (WRF; Powers et al. 2017) Model configuration described in Xu et al. (2023). We do not propose a solution to the problem of winter season T2m cold biases, as the solutions will undoubtedly require community-wide efforts, but instead seek to demonstrate the nature of the problem and illuminate paths forward for solutions.
2. MARMOTS are cold biased over mountains, particularly in winter
A review of the recent literature shows that 44 studies report winter season cold biases across the world’s major mountain regions (Fig. 2 and Table A2). Mentions of model cold bias were found by first searching major studies, such as multimodel ensemble evaluation studies and those that have been widely shared and cited in the mountain hydroclimate field. We uncovered the majority of the studies by following the chain of references from those studies. Additional studies were found on Google Scholar using search terms such as “regional climate model evaluation” and “WRF Model evaluation mountains.” While it is possible that some published model configurations have shown warm biases over mountain regions, none were found in this search.
The model biases reported in these studies are typically determined by evaluating model output against gridded meteorological observations covering the same model extent or by comparing individual weather stations to the closest model grid cells. T2m tends to reach a minimum at night (TMIN) and a maximum during the day (TMAX) and is often reported in terms of the daily average (TAVG). Unless otherwise specified, T2m refers to TAVG. The magnitude of cold biases generally ranges from 1° to 5°C (Table A2), though not all studies report a quantitative value of the bias.
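The station-based evaluation described above can be sketched as follows. The example pairs a weather station with its closest model grid cell using great-circle distance and computes the bias as model minus observation. Approximating TAVG as the mean of TMIN and TMAX is an assumption of this sketch; individual studies may instead average hourly values. All function names, coordinates, and temperatures are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_cell(station, cells):
    """Return the model grid cell whose center is closest to the station."""
    return min(cells, key=lambda c: haversine_km(
        station["lat"], station["lon"], c["lat"], c["lon"]))


def tavg(tmin_c, tmax_c):
    """Daily average approximated as the TMIN/TMAX midpoint (an assumption)."""
    return 0.5 * (tmin_c + tmax_c)


def t2m_bias(model_c, obs_c):
    """Model minus observation; negative values indicate a cold bias."""
    return model_c - obs_c


# Hypothetical station and two hypothetical model grid-cell centers.
station = {"lat": 38.96, "lon": -106.99, "tmin": -20.0, "tmax": -4.0}
cells = [
    {"lat": 38.95, "lon": -107.00, "tmin": -23.0, "tmax": -5.0},
    {"lat": 39.10, "lon": -106.80, "tmin": -15.0, "tmax": -1.0},
]

cell = nearest_cell(station, cells)
bias = t2m_bias(tavg(cell["tmin"], cell["tmax"]),
                tavg(station["tmin"], station["tmax"]))
```

In this example the matched grid cell is 2°C colder than the station. The sketch makes no adjustment for any elevation difference between the station and the grid cell.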
a. Examples of model cold bias.
Vautard et al. (2021) present a historical climate bias analysis of temperature from the 0.11° horizontal resolution European Coordinated Regional Downscaling Experiment (EURO-CORDEX) ensemble consisting of 8 global climate model drivers and 11 independent regional models. The median model is cold biased over the Alps, Pyrenees, and Scandinavian ranges. Even the “hottest” models (95th percentile), with positive temperature biases in lowlands, are too cold in mountain regions. They further show that the TMIN bias is dominated by model structural variability rather than boundary conditions. Earlier EURO-CORDEX analyses showed similar cold biases in the Alps and Scandinavian ranges (Kotlarski et al. 2014). South American CORDEX (SA-CORDEX) experiments are similarly cold biased in the Andes (Blázquez and Solman 2023; Solman and Blázquez 2019; Torrez-Rodriguez et al. 2023). Results from Australasia-CORDEX show extensive cold biases that may be related to elevation (Di Virgilio et al. 2019). North American CORDEX (NA-CORDEX) is cold biased in the Sierra Nevada (Xu et al. 2019) and possibly the Southern Rockies (McCrary et al. 2017).
Variable-resolution global models likewise demonstrate cold biases. Rhoades et al. (2018b) found extensive cold biases in variable-resolution CESM (VR-CESM) (Table A1) over the Sierra Nevada. Similar results are found again in the Sierra Nevada (Fig. 13 of Xu et al. 2018; Xu et al. 2021) as well as the Rockies (Fig. 3 of Wu et al. 2017), the Tibetan Plateau (Xu et al. 2021), and the Andes (Bambach et al. 2022). The System for Integrated Modeling of the Atmosphere-MPAS (SIMA-MPAS) model (Table A1) likewise shows a cold bias for high peaks in the western United States (WUS; X. Huang et al. 2022). The recently evaluated regionally refined mesh Energy Exascale Earth System Model (RRM-E3SM) (Table A1) demonstrates pervasive cold biases, though elevation-specific analyses are not presented (Tang et al. 2023).
The WRF Model has been applied across the globe as both a regional climate model and a numerical weather prediction model. WRF’s utility for supporting snow-water resource applications has been strongly argued given its skill in simulating orographic precipitation (Ikeda et al. 2010; Gutmann et al. 2012; He et al. 2019; Liu et al. 2011; Lundquist et al. 2019), but fewer studies scrutinize WRF’s T2m performance in mountains. Nonetheless, WRF cold biases have been reported or shown in the Sierra Nevada (Fig. 5 of Pan et al. 2011; Huang et al. 2018; Walton and Hall 2018), the Wasatch (Scalzitti et al. 2016), and the Idaho–Bitterroot (Rudisill et al. 2022; Havens et al. 2019) ranges and other interior WUS mountain ranges (Fig. 13 of Wang et al. 2018). Similar WRF biases are found in Japan, the Himalayas, and the Southern Alps (Kawase et al. 2013; Karki et al. 2017; Kropač et al. 2021). NCAR’s convection-permitting WRF simulations described in Liu et al. (2017), hereafter L2017, cover all of CONUS at a 4-km horizontal grid spacing between 2000 and 2013 using ERA-Interim lateral boundary conditions. L2017 is cold biased over major mountain ranges, particularly on snow-topped peaks (He et al. 2019). As this dataset is publicly available and one of very few covering the entire WUS at a 4-km grid spacing, we analyze L2017 in greater detail in the next section.
b. Scrutinizing T2m biases from the L2017 4-km WRF dataset.
To better illustrate the nature of T2m biases, we evaluate data from L2017 for January–March 2008 (Fig. 3a). This specific time period is chosen because it was also examined in L2017 (Fig. 11 of their paper), and they note that it was also analyzed in several other studies. We group temperature biases by selected mountain regions using the regional definitions from Snethlage et al. (2022). The T2m biases are computed against the 4-km Parameter-Elevation Regressions on Independent Slopes Model (PRISM) AN81d daily temperature (https://prism.oregonstate.edu/). In addition, we perform a landform classification that labels each grid cell as a slope, valley, or ridge-top cell using standard terrain position metrics (Lindsay 2016). The supplemental material provides additional information about the processing steps. We also compute T2m biases separately for the entire dataset (every grid cell evaluated in the model against PRISM) and for only those grid cells that encompass a weather station observation. We do this because the PRISM data should be very close to the underlying observations for those grid cells with weather stations, so we can test whether biases persist for areas with observations or whether they are primarily in locations where PRISM does not have an observation.
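To make the landform grouping more concrete, the sketch below classifies grid cells with a simple topographic position index (TPI), defined here as a cell's elevation minus the mean elevation of its immediate neighbors, and then averages the T2m bias within each class. This is a generic illustration rather than the procedure of Lindsay (2016) or the supplemental material; the 3 × 3 neighborhood, the ±50-m threshold, and all elevation and bias values are assumptions chosen for the example.

```python
from collections import defaultdict


def tpi(dem, i, j):
    """Elevation of cell (i, j) minus the mean of its in-bounds neighbors."""
    rows, cols = len(dem), len(dem[0])
    neighbors = [
        dem[r][c]
        for r in range(max(i - 1, 0), min(i + 2, rows))
        for c in range(max(j - 1, 0), min(j + 2, cols))
        if (r, c) != (i, j)
    ]
    return dem[i][j] - sum(neighbors) / len(neighbors)


def classify(dem, threshold_m=50.0):
    """Label each cell 'ridge', 'valley', or 'slope' from its TPI."""
    labels = []
    for i in range(len(dem)):
        row = []
        for j in range(len(dem[0])):
            value = tpi(dem, i, j)
            if value > threshold_m:
                row.append("ridge")
            elif value < -threshold_m:
                row.append("valley")
            else:
                row.append("slope")
        labels.append(row)
    return labels


def mean_bias_by_class(labels, bias):
    """Average the bias (model minus reference) within each landform class."""
    groups = defaultdict(list)
    for label_row, bias_row in zip(labels, bias):
        for label, value in zip(label_row, bias_row):
            groups[label].append(value)
    return {label: sum(v) / len(v) for label, v in groups.items()}


# Hypothetical elevations (m): high ground on the left, valley on the right.
dem = [
    [3200.0, 2900.0, 2600.0],
    [3200.0, 2900.0, 2600.0],
    [3200.0, 2900.0, 2600.0],
]
# Hypothetical T2m bias (deg C) on the same grid.
bias_grid = [
    [-2.0, -1.0, 0.5],
    [-2.0, -1.0, 0.5],
    [-2.0, -1.0, 0.5],
]

labels = classify(dem)
by_class = mean_bias_by_class(labels, bias_grid)
```

The same grouping function could be reused with a mask of grid cells that contain a weather station, which would allow the class means to be compared between the full grid and the station-containing subset.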
We find T2m biases are approximately normally distributed across the entire CONUS (Figs. 3b,i), with a mean near zero, but each mountain range of the WUS shows a cold bias (Figs. 3c–i) of between 0.8° and 1.4°C averaged across all mountain-range grid cells. The dry continental interior ranges (e.g., Figs. 3h,c) have more extreme cold biases than the coastal ranges (e.g., Figs. 3e,f