The High-Resolution Rapid Refresh (HRRR) model became operational at the National Centers for Environmental Prediction (NCEP) in 2014, but the HRRR’s performance over certain regions of the coterminous United States has not been well studied. In the present study, we evaluated how well version 2 of the HRRR, which became operational at NCEP in August 2016, simulates the near-surface meteorological fields and the surface energy balance at two locations in northern Alabama. We evaluated the 1-, 3-, 6-, 12-, and 18-h HRRR forecasts, as well as the HRRR’s initial conditions (i.e., the 0-h initial fields), using meteorological and flux observations obtained from two 10-m micrometeorological towers installed near Belle Mina and Cullman, Alabama. During the 8-month model evaluation period, from 1 September 2016 to 30 April 2017, we found that the HRRR accurately simulated the observations of near-surface air and dewpoint temperature (R2 > 0.95). When comparing the HRRR output with the observed sensible, latent, and ground heat flux at both sites, we found that the agreement was weaker (R2 ≈ 0.7), and the root-mean-square errors were much larger than those found for the near-surface meteorological variables. These findings motivate additional work to make the representation of surface fluxes and their coupling to the atmosphere more physically realistic in future versions of the HRRR.
The High-Resolution Rapid Refresh (HRRR) model is an hourly updating convection-allowing model that is used for short-range weather forecasts (Benjamin et al. 2016). Version 1 of the HRRR became operational for the coterminous United States in September 2014 and has been upgraded every two years since then, with version 2 of the HRRR (HRRRv2) becoming operational in August 2016, and HRRRv3 becoming operational in July 2018. Forecasts are available up to 18 h from initialization in versions 1 and 2 of the HRRR, and 36-h forecasts are available in version 3.
It is important that the HRRR simulates near-surface exchange processes of heat and moisture and modification of momentum from the land surface to the atmosphere to produce reliable and accurate weather forecasts (e.g., Smirnova et al. 2016; Lee et al. 2018; Wulfmeyer et al. 2018). For example, differences in vegetation cover, land use, soil moisture, soil temperature, and soil type lead to differences in the partitioning of energy into sensible and latent heat fluxes (e.g., Oke 1987; Segal and Arritt 1992; Brown and Arnold 1998; Pielke 2001; Kalthoff et al. 2011), resulting in finescale circulations which affect boundary layer growth and development (e.g., Courault et al. 2007). These feedbacks are highly nonlinear (e.g., Santanello et al. 2018; Wulfmeyer et al. 2018) and are hypothesized to be enhanced by a warming climate (e.g., Dirmeyer et al. 2012).
Ensuring that the HRRR is able to simulate near-surface exchange processes requires careful and thorough evaluation of the model output to identify and correct potential model biases. We focused our investigation on the southeast United States, where the only known evaluation of the HRRR’s performance is a recent study by Wagner et al. (2019) that used observations from the Atmospheric Emitted Radiance Interferometer (AERI; Knuteson et al. 2004; Turner and Blumberg 2019) installed on the Collaborative Lower Atmosphere Mobile Profiling System (CLAMPS) that was deployed near Belle Mina, Alabama. Wagner et al. used temperature and humidity profiles retrieved from AERI radiance measurements to calculate convective available potential energy (CAPE) and compared these values with output from the HRRRv1 in March and April 2016 as a component of VORTEX-Southeast (VORTEX-SE). AERI-derived CAPE observations have been shown to have reasonable skill compared to radiosondes (Blumberg et al. 2017); Wagner et al. (2019) noted that the HRRRv1 diurnal distribution of CAPE was lagged 2 to 4 h compared to the AERI observations, which was likely due to the lack of subgrid-scale clouds in that version of the HRRR and the subsequent feedback by the warmer surface on convective activity.
In the present study we used measurements from two 10-m micrometeorological towers that were installed in northern Alabama in 2016 and 2017, one of which was installed approximately 1 km northeast of CLAMPS at Belle Mina, to help evaluate HRRR’s performance over the southeast United States. We used output from the HRRRv2, which included a treatment for subgrid-scale clouds, from an 8-month model evaluation period from 1 September 2016 to 30 April 2017.
2. Datasets and models
a. Micrometeorological tower observations
Meteorological measurements were obtained from two 10-m micrometeorological towers in northern Alabama (Fig. 1a) that were installed in February 2016 to complement the rich suite of meteorological observations made during VORTEX-SE, including the CLAMPS referenced earlier. VORTEX-SE was a multiyear field experiment focused on studying the characteristics associated with the genesis of severe weather events that are (seemingly) unique to the Southeast United States. These characteristics include more variable terrain, larger and denser forested areas, and different vegetation type and coverage than in regions where previous studies have been conducted (i.e., the central and southern plains). For more details, we refer the reader to Dumas et al. (2016), Dumas et al. (2017), Wagner et al. (2019), and Lee et al. (2019).
One of the micrometeorological towers used to support VORTEX-SE was installed approximately 1 km east of CLAMPS at the Auburn University Tennessee Valley Research and Extension Center [34.69°N, 86.87°W, 189 m above mean sea level (MSL)] located 4.7 km north of Belle Mina, Alabama, which is approximately 27 km west of Huntsville (Fig. 1b). The area immediately surrounding the site was mostly flat with grazed pasture. A mixture of cotton and soybean crops was located in fields 1–2 km to the west and north of the tower. The second micrometeorological tower was installed at the North Alabama Horticulture Research Station near Cullman, AL (34.19°N, 86.80°W, 241 m MSL) (Fig. 1c). The area surrounding this tower was characterized by larger surface roughness than the Belle Mina site and consisted of ungrazed grassland as well as several fruit orchards located near the tower.
The towers at both sites were instrumented with temperature, humidity, wind, incoming and outgoing shortwave and longwave radiation, pressure, and rainfall sensors (Table 1). The sensor suite was chosen because of the sensors’ manufacturer-stated level of accuracy (cf. Table 1) and because these sensors have been used reliably for other micrometeorological studies in the eastern United States (e.g., Lee et al. 2015).
Broadband upwelling infrared measurements from a Hukseflux four-component net radiometer, model number NR-01, were used to compute skin temperature. We computed the skin temperature Ts using the Stefan–Boltzmann relationship:
$$T_s = \left( \frac{LW_{out}}{\varepsilon \sigma} \right)^{1/4}. \qquad (1)$$
In Eq. (1), LWout is the outgoing longwave radiation, ε is the surface emissivity, and σ is the Stefan–Boltzmann constant. When calculating Ts, we used a value for ε of 0.97, which is a typical value for plant leaves (e.g., Jackson 1982).
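As a rough illustration of the skin temperature calculation, the following Python sketch inverts the Stefan–Boltzmann relationship for Ts. The function name `skin_temperature` and the sample LWout values are hypothetical, and the sketch assumes (as the text implies) that the reflected incoming longwave contribution is neglected; it is not the authors' processing code.

```python
import numpy as np

SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant (W m^-2 K^-4)


def skin_temperature(lw_out, emissivity=0.97):
    """Invert LW_out = emissivity * sigma * Ts**4 for Ts (K).

    Assumption: reflected incoming longwave radiation is neglected.
    """
    lw_out = np.asarray(lw_out, dtype=float)
    return (lw_out / (emissivity * SIGMA)) ** 0.25


# Illustrative (made-up) outgoing longwave values in W m^-2.
ts_kelvin = skin_temperature([350.0, 400.0, 450.0])
print(ts_kelvin - 273.15)  # skin temperature in degrees Celsius
```

Because Ts depends on the fourth root of LWout, a given percentage error in the emissivity produces only about one-quarter of that percentage error in Ts (in kelvins).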
The sampling frequency for all meteorological variables at both sites was 1 Hz, and 1-min means were stored on three on-site dataloggers (two Campbell Scientific CR-3000 loggers and one CR-1000 logger). Hourly means of the 1-min data were computed to facilitate comparisons with the HRRR output. Measurements from a CSAT3 sonic anemometer and an EC155 closed path infrared gas analyzer, both installed at 10 m AGL, were sampled at 10 Hz and stored on the on-site dataloggers. The 10-Hz measurements were then used to compute 30-min sensible and latent heat fluxes. A second CSAT3 sonic anemometer and an EC155 analyzer were installed at 3 m AGL at both sites, although in the present study we primarily focused on the measurements at 10 m AGL.
The postprocessing applied standard corrections and coordinate rotations to all high-frequency sonic anemometer and closed path gas analyzer datasets (e.g., Meyers 2001). To this end, we first used the 10-Hz data to compute the covariances between the u, υ, and w components of the wind and the scalars. Once we computed these covariances, we applied a mathematical coordinate transformation subject to the constraints described by Meyers and Baldocchi (2005). The wind speed vectors were first computed using the mean u-, υ-, and w-wind components measured in the sonic anemometer’s coordinates. Following Tanner and Thurtell (1969), we then computed the azimuth η and elevation θ angles and used these to rotate the covariances. We corrected the rotated covariances for angle-of-attack errors (e.g., Kochendorfer et al. 2012), using an angle-of-attack correction of 1.07. Once the rotated covariances were corrected, we computed the 30-min sensible (H) and latent (LE) heat fluxes using Eqs. (2) and (3).
The terms in Eqs. (2) and (3) are the specific heat capacity corrected for moist air, the density corrected for moist air, the kinematic form of the rotated vertical temperature flux, the number of moles of dry air, the kinematic form of the rotated vertical moisture flux, and the mean temperature from the sonic anemometer. Similarly, we computed u* using the rotated covariances, following Eq. (4).
Once we computed the 30-min fluxes and u*, we performed additional screening to eliminate unrealistic estimates. To this end, we eliminated sensible and latent heat fluxes that were <−200 W m−2 or were >800 W m−2, and we filtered values of u* that were <0 m s−1 or were >2 m s−1. We then averaged the 30-min means to determine the 1-h means for comparison with the HRRR output. To calculate the ground heat fluxes from the observations, we used the gradient method (e.g., Sauer and Horton 2005). The gradient method computes the ground heat flux as a function of the soil temperatures measured at 2 and 5 cm below ground level, combined with an estimate of the soil’s thermal conductivity. The soil’s thermal conductivity was obtained from the bulk density and porosity measurements obtained from five different soil samples surrounding each of the sites.
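The processing chain described above (coordinate rotation, flux calculation, screening, and the gradient method) can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' code: it rotates the raw 10-Hz series instead of the covariances, uses the textbook forms H = ρ c_p cov(w, T) and u* = [cov(u, w)² + cov(υ, w)²]^(1/4), takes constant illustrative values for ρ and c_p, omits the angle-of-attack correction and the latent heat flux, and defines G as positive downward. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def double_rotate(u, v, w):
    """Rotate winds so that the mean crosswind and mean vertical wind are zero."""
    eta = np.arctan2(v.mean(), u.mean())        # azimuth angle
    u1 = u * np.cos(eta) + v * np.sin(eta)
    v1 = -u * np.sin(eta) + v * np.cos(eta)
    theta = np.arctan2(w.mean(), u1.mean())     # elevation angle
    u2 = u1 * np.cos(theta) + w * np.sin(theta)
    w2 = -u1 * np.sin(theta) + w * np.cos(theta)
    return u2, v1, w2


def cov(a, b):
    """Covariance of two series over the averaging period."""
    return np.mean((a - a.mean()) * (b - b.mean()))


def sensible_heat_and_ustar(u, v, w, t, rho=1.2, cp=1005.0):
    """Return H (W m^-2) and u* (m s^-1); rho and cp are assumed constants."""
    u2, v2, w2 = double_rotate(u, v, w)
    h = rho * cp * cov(w2, t)
    ustar = (cov(u2, w2) ** 2 + cov(v2, w2) ** 2) ** 0.25
    return h, ustar


def screen(values, lower, upper):
    """Replace values outside [lower, upper] with NaN."""
    values = np.asarray(values, dtype=float)
    return np.where((values < lower) | (values > upper), np.nan, values)


def ground_heat_flux(t_2cm, t_5cm, conductivity):
    """Gradient-method G (W m^-2), positive downward (assumed convention)."""
    dz = 0.05 - 0.02  # separation of the two soil temperature depths (m)
    return conductivity * (t_2cm - t_5cm) / dz


# Synthetic 30-min record at 10 Hz (18 000 samples), for illustration only.
rng = np.random.default_rng(0)
n = 18000
w = 0.3 * rng.standard_normal(n)
u = 3.0 + rng.standard_normal(n) - 0.5 * w
v = 1.0 + rng.standard_normal(n)
t = 295.0 + 0.4 * w + 0.1 * rng.standard_normal(n)

h, ustar = sensible_heat_and_ustar(u, v, w, t)
h_screened = screen([h], -200.0, 800.0)       # flux limits from the text
ustar_screened = screen([ustar], 0.0, 2.0)    # u* limits from the text
print(h_screened, ustar_screened)
```

Rotating the raw series and rotating the covariances are algebraically equivalent for this purpose; the former is used here only because it keeps the sketch short.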
The meteorological and flux datasets from both towers were mostly complete, with 91% and 83% of the data available from Belle Mina and Cullman, respectively, over the 8-month model evaluation period. The smaller percentage of data availability at Cullman was partially due to a 14-day data gap in April 2017 caused by an on-site power outage. Both towers were removed in May 2017 following the conclusion of the spring 2017 VORTEX-SE campaign.
The HRRR model is nested within the Rapid Refresh (RAP) modeling system domain. The HRRR is updated hourly and run with a 3-km grid spacing over the coterminous United States (e.g., Smith et al. 2008; Benjamin et al. 2016).
The HRRR uses the Rapid Update Cycle (RUC) land surface model (LSM) and uses an implicit scheme for computing surface fluxes [see, e.g., Smirnova et al. (1997) for more details]. The newer LSM implemented in the HRRR has nine soil levels (i.e., at 0, 1, 4, 10, 30, 60, 100, 160, and 300 cm), compared with the six levels used in earlier versions of this LSM, and has improved treatment of snow compared with earlier versions (Smirnova et al. 2016). In version 2 of the HRRR, which we used in the present study, 30-min land use information obtained from the Moderate Resolution Imaging Spectroradiometer (MODIS) was used (T. G. Smirnova 2019, personal communication). Additionally, the RAP modeling system uses the Mellor–Yamada–Nakanishi–Niino (MYNN) planetary boundary layer (PBL) mixing scheme (Nakanishi and Niino 2004, 2009). For its use in the RAP, the MYNN has been modified to prevent negative turbulent kinetic energy and to improve its mixing-length formulation (Benjamin et al. 2016). The Rapid Radiative Transfer Model Global (RRTMG) was used for computing shortwave and longwave radiation. The RRTMG was modified from the Rapid Radiative Transfer Model (RRTM; Iacono et al. 2008) to better account for aerosols (Benjamin et al. 2016). For additional details on the HRRR configuration, we refer the reader to Benjamin et al. (2016); for more details on the LSM, we refer the reader to Smirnova et al. (2016).
The grid cell that contains Belle Mina is classified as cropland, which has a mean albedo of 0.168. The cropland plant functional type is not irrigated, although we note that there are plans to include the effects of irrigation into the cropland plant functional type in subsequent versions of the RUC LSM (e.g., Smirnova et al. 2016). Important to note here, though, is that the area surrounding Belle Mina was mostly grassland; the implications of this are discussed later in the paper. In contrast, the Cullman land use type in the HRRR is woody savanna, which has a mean albedo of 0.149.
We evaluated the HRRR’s analysis and forecasts for the following variables: 2-m air temperature (Ta), skin temperature (Ts), 2-m dewpoint temperature (Td), 10-m wind speed (Wspd), the u and υ components of the wind (u and υ), friction velocity (u*), incoming shortwave radiation (SWin), outgoing shortwave radiation (SWout), incoming longwave radiation (LWin), outgoing longwave radiation (LWout), sensible heat flux (H), latent heat flux (LE), and ground heat flux (G). [We obtained these data, which are available beginning in mid-July 2016, from http://home.chpc.utah.edu/~u0553130/Brian_Blaylock/cgi-bin/hrrr_download.cgi (Blaylock et al. 2017).]
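We computed the net radiation Rn from the HRRR output as the sum of the net shortwave and net longwave radiation, as shown in Eq. (5):
$$R_n = SW_{in} - SW_{out} + LW_{in} - LW_{out}. \qquad (5)$$
We then compared this value of Rn with the observed energy balance components obtained from the net radiometer at both sites.
We also computed the net radiation from the HRRR output of H, LE, and G, with the assumption of surface energy balance closure, as shown in Eq. (6):
$$R_n = H + LE + G. \qquad (6)$$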
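The closure-based net radiation, and the radiation-component sum that it is compared against later in the paper, can be sketched in Python as follows. The function names and the sample values are hypothetical and serve only to show the bookkeeping; the sign conventions (all three heat fluxes positive away from the surface energy budget's radiative input) are an assumption.

```python
import numpy as np


def net_radiation_from_radiation(sw_in, sw_out, lw_in, lw_out):
    """Rn as net shortwave plus net longwave radiation (W m^-2)."""
    return (sw_in - sw_out) + (lw_in - lw_out)


def net_radiation_from_fluxes(h, le, g):
    """Rn under assumed surface energy balance closure (W m^-2)."""
    return h + le + g


# Illustrative midday values (made up, W m^-2).
rn_rad = net_radiation_from_radiation(600.0, 100.0, 350.0, 420.0)
rn_flux = net_radiation_from_fluxes(150.0, 200.0, 50.0)

# The difference between the two estimates is the closure residual.
residual = rn_rad - rn_flux
print(rn_rad, rn_flux, residual)
```

Comparing both estimates against the same observed Rn separates errors that originate in the radiation scheme from errors in the partitioning of energy among H, LE, and G.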
Because the HRRRv2 became operational at NCEP on 23 August 2016 and because the micrometeorological towers were removed from Belle Mina and Cullman in early May 2017, we focused on the 8-month period from 1 September 2016 to 30 April 2017 for evaluating the HRRRv2. Unlike the HRRRv1, which did not have subgrid-scale clouds, the HRRRv2 included a treatment for subgrid-scale clouds. The absence of subgrid-scale clouds in the HRRRv1 resulted in a well-known and large positive bias in SWin that created warm biases in Ta and Ts (e.g., Benjamin et al. 2016; Wagner et al. 2019), which has been improved in the HRRRv2.
We focused much of our investigation on the 1-h forecast from the HRRR for comparison with the micrometeorological tower observations discussed in the previous section. We also evaluated the HRRR analysis (i.e., its initial condition at 0 h) and at longer forecast periods (i.e., the 3-, 6-, 12-, and 18-h forecasts). We quantified the HRRR performance over the model evaluation period by computing the mean bias error (MBE), the coefficient of determination (R2), and root-mean-square error (RMSE).
We computed the MBE, R2, and RMSE using the hourly values and distinguished between daytime and nighttime periods. To remove the effects of the morning and evening transition periods, we defined daytime as between 1200 and 1600 LST (LST = UTC − 6 h), and we defined nighttime as between 0000 and 0400 LST.
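The evaluation statistics and the daytime and nighttime selection can be sketched in Python as follows. This is a minimal sketch with several assumptions that the text does not state explicitly: the MBE is taken as model minus observation, R2 as the squared Pearson correlation coefficient, the slope as that of a least squares fit of model on observation, and the period bounds as inclusive. The function names and sample data are hypothetical.

```python
import numpy as np


def evaluation_stats(model, obs):
    """Return MBE, R2, RMSE, and best-fit slope for paired hourly values."""
    model = np.asarray(model, dtype=float)
    obs = np.asarray(obs, dtype=float)
    ok = np.isfinite(model) & np.isfinite(obs)   # drop missing pairs
    m, o = model[ok], obs[ok]
    diff = m - o                                 # assumed sign: model - obs
    slope, _intercept = np.polyfit(o, m, 1)
    return {
        "MBE": float(np.mean(diff)),
        "R2": float(np.corrcoef(o, m)[0, 1] ** 2),
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
        "slope": float(slope),
    }


def utc_to_lst(hour_utc):
    """Convert an hour in UTC to LST using LST = UTC - 6 h."""
    return (np.asarray(hour_utc) - 6) % 24


def period_mask(hours_lst, start, end):
    """Boolean mask for hours within [start, end] LST (inclusive, assumed)."""
    hours_lst = np.asarray(hours_lst)
    return (hours_lst >= start) & (hours_lst <= end)


# Illustrative (made-up) hourly 2-m temperatures in degrees Celsius.
hours_utc = np.arange(24)
obs = 15.0 + 5.0 * np.sin(2.0 * np.pi * (hours_utc - 15) / 24.0)
model = obs + 0.5

hours_lst = utc_to_lst(hours_utc)
day = period_mask(hours_lst, 12, 16)     # 1200-1600 LST
night = period_mask(hours_lst, 0, 4)     # 0000-0400 LST
print(evaluation_stats(model[day], obs[day]))
print(evaluation_stats(model[night], obs[night]))
```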
a. Hourly observations
Over the 8-month period for which we evaluated the HRRR, we found good agreement between the observed and 1-h HRRR forecasts of Ta, Ts, and Td at both Belle Mina and Cullman (Table 2). The mean difference between the model and observations for these variables was less than 0.75°C, R2 exceeded 0.95 in all cases, and the slope of the relationship between the model and observations was between 0.96 and 1.03 for all three variables at both sites. Good agreement was also found when we selected only afternoon values, as shown in Table 3, and when we selected only nighttime values, as shown in Table 4. In both of these instances, the slopes of the relationships were generally ≈1, and the RMSEs did not show any significant differences between daytime and nighttime. Furthermore, we did not note any biases in the relationships for either high or low temperatures or dewpoints, suggesting that the HRRR was able to capture the range of variability across the different seasons studied. For a graphical representation of these relationships, as well as the other relationships discussed in this section, we refer the reader to the appendix.
The relationship between the observed 10-m Wspd and Wspd from the HRRR was weaker than the relationships found for the thermodynamic variables for both sites; R2 for Wspd was 0.71 and 0.70 at Belle Mina and Cullman, respectively, when focusing on the entire diurnal cycle, although we note that the slope of the relationship for Wspd was comparable with the slopes of the relationship for the thermodynamic variables at both sites (Table 2). We found similar agreement between the observations of Wspd and the HRRR during the daytime (Table 3) and nighttime (Table 4), and RMSEs were around 1 m s−1 regardless of the time period considered.
When breaking the 10-m wind into its u and υ components, we found that the υ-wind component compared better with the HRRR at both sites, with R2 around 0.86 at both sites. For the u-wind component, R2 was approximately 0.7 at Belle Mina and Cullman. Furthermore, the individual u- and υ-wind components did not show any notable differences in the MBE, R2, or RMSE between daytime and nighttime.
We found that the HRRR overestimated u* at both sites, as indicated by the slopes of this relationship of 1.13 and 1.43 at Belle Mina and Cullman, respectively (Table 2). The slope of this relationship was about the same when only daytime values were considered, but R2 decreased at both sites (Table 3). The agreement was poorest at nighttime, as the HRRR significantly overestimated u* at Cullman. We revisit the poor agreement during the nighttime later in this section when discussing the agreement between the observed and HRRR-derived fluxes.
Overall, smaller R2 and larger deviations from the mean were found for the radiation components over the period for which we evaluated the HRRR. Of the four radiation components, we found that the agreement was best for LWout with R2 of 0.95 at both sites although the other radiation components all had R2 > 0.85 when the entire diurnal cycle was considered (Table 2). Comparable values of RMSE and R2 were found during the daytime (Table 3) and nighttime (Table 4). Although the slopes of the best fit lines were near 1 for LWin and LWout, the slopes of the best fit lines were ≈0.92 for SWin at both sites, but were <0.7 for SWout when the entire diurnal cycle was considered (cf. Table 2). This became more pronounced when considering only the afternoon period (cf. Table 3). The smaller slopes and poorer relationships for SWout were due to biases in SWin and may be attributed to a surface albedo problem in the HRRR, or potentially to an incorrect treatment of subgrid-scale clouds. We investigate and discuss this in more detail in section 3d.
When comparing H, LE, and G between the HRRR 1-h forecast and observations, we found better agreement between the Cullman observations and HRRR output than between the Belle Mina observations and the HRRR. At Belle Mina, R2 for H, LE, and G was 0.66, 0.58, and 0.76, respectively, but at Cullman, R2 for these variables was around 0.7 when the entire diurnal cycle was evaluated (cf. Table 2). Additionally, the slope of the relationship between the model and observed H, LE, and G was 1.02, 0.96, and 0.91, respectively, at Belle Mina, but was 0.94, 0.82, and 1.01, respectively, at Cullman. The result of these differences between the two sites is best illustrated when computing the net radiation from the observations and comparing this with the HRRR. There was better agreement between the observations and HRRR when Rn was computed as the net sum of the incoming and outgoing shortwave and longwave radiation components (i.e., Rn = SWin − SWout + LWin − LWout), with R2 of 0.90 and 0.89 at Belle Mina and Cullman, respectively. The R2 was lower and the RMSE was larger when Rn was computed as the sum of the heat fluxes (i.e., Rn = H + LE + G), with R2 of 0.87 and 0.83 at Belle Mina and Cullman, respectively.
When we separated by time of day, we found that the relationship between the observed and HRRR H, LE, and G was weaker, as R2 was smaller and the RMSE was larger during both the daytime and nighttime. We also found that the slopes of the best-fit lines for the afternoon period (cf. Table 3) and nighttime period (cf. Table 4) were lower than those for the entire diurnal cycle (cf. Table 2), which we attributed to the larger scatter during these periods. We investigate the larger scatter in more detail in section 3c, where we describe the mean diurnal cycles of the surface energy balance components. When separating by time of day, we also found that the fluxes from the HRRR showed the poorest agreement with the observations during the nighttime, when turbulent mixing is weakest and thus fluxes are smallest. During the nighttime periods, we found R2 values of <0.3 for H and LE at both sites. We attribute the poor agreement between the observed nighttime fluxes and the HRRR fluxes, as well as the HRRR’s overestimates of u* discussed earlier in this section, to the fact that models oftentimes struggle under weak wind conditions, which are characteristic of nighttime conditions over the Southeast United States. Furthermore, under weak wind conditions during the nighttime, errors in the observed fluxes increase (e.g., Aubinet et al. 2012). Also contributing to the lower R2 between the observed and model-derived fluxes and u* during the nighttime than during the daytime is that the range of fluxes, and also u*, is smaller at night.
Although we have so far focused on relationships between the observations and the HRRR 1-h forecast, we found that the HRRR also did well reproducing the observed values at the other forecast periods. Of the HRRR model runs that we evaluated, we found that the smallest RMSEs were in the 0-h output, which indicates that the HRRR accurately reproduced the initial conditions. We did note some exceptions. In the case of G, for example, the lowest RMSE tended to occur in the longer forecast periods (i.e., the 6-, 12-, and 18-h forecasts). Although the RMSEs were only slightly smaller for the longer forecast periods when the entire diurnal cycle was considered (cf. Table 5), lower RMSEs were most evident in the daytime values (cf. Table 6), and this was found to happen at both sites. The lower RMSEs for G at the longer forecast periods suggest that the HRRR initial conditions are not in good balance but come into better agreement at the later forecast periods.
For all other variables besides G, the RMSEs increased as the forecast period increased, and this happened not only when the entire diurnal cycle was considered (Table 5), but also during the afternoon (Table 6) and nighttime (Table 7) periods. When considering the components of the surface energy balance, we found that the RMSE was typically lowest in the 1-h forecast, even when compared with output from the 0-h HRRR runs. In the next sections, we further explore these differences by investigating how well the HRRR reproduces the mean monthly diurnal cycles of the meteorological variables and surface energy balance components.
b. Mean monthly diurnal cycle of surface meteorological variables
We focused on the mean monthly diurnal cycles in three different seasons by selecting the middle month within each season, and we analyzed output from the initial conditions (i.e., 0 h) as well as the 1-, 3-, 6-, 12-, and 18-h HRRR forecasts.
We found that the HRRR has a cold bias in Ta relative to the observations during the nighttime, and these biases can be up to 1.5°C depending on the HRRR forecast period. This pattern reversed during the daytime with warm biases in the HRRR up to 1.7°C at Belle Mina (Figs. 2a–c) and Cullman (Figs. 3a–c). All the model forecasts showed the same general trends in the diurnal differences, although there were some exceptions. For example, the shorter forecast periods of the HRRR (i.e., the 1- and 3-h forecasts) were too cool during the nighttime in October at Belle Mina, with mean biases of 1.0°–1.5°C. Also, the later forecast periods (i.e., the 12- and 18-h forecasts) tended to have a warm nocturnal bias at Cullman. In general, the HRRR’s forecasted Ta was fairly consistent across forecast lengths, except in October, when at both Belle Mina (Fig. 2a) and Cullman (Fig. 3a) there were marked changes among the different forecasts during the nighttime hours. These differences between the HRRR forecasts and observations are further supported by the differences in LWout between the model and observations, as we saw larger daytime biases in LWout in the HRRR at both Belle Mina and Cullman in the mean monthly diurnal cycles of LWout (not shown). This finding is consistent with previous studies that have also found a warm daytime bias in near-surface temperatures in the HRRR (e.g., Benjamin et al. 2016). The mean diurnal cycles for Ts (not shown) somewhat contradicted the findings for LWout. We found that the largest Ts biases occurred during the daytime at Belle Mina between December and February, during which the mean observed Ts were over 3°C larger than in the HRRR.
In the case of Td, the HRRR forecasts showed a moist bias that was larger at Cullman than at Belle Mina (Figs. 2d–f, Figs. 3d–f). Differences between the model and observations tended to be smaller during the nighttime than during the daytime. At Belle Mina, the HRRR 2-m Td was about 1°C cooler than the observations during the nighttime in October and April, whereas the HRRR had a slight wet bias during the nighttime in January at Cullman. This was fairly consistent for the different model runs. The largest differences between the model and observations, sometimes up to 3°C, occurred during the daytime, depending on the model run. Again, the diurnal bias of Td for most forecast times was fairly consistent in the three seasons, except for January at both Belle Mina and Cullman, where the daytime moist bias in the HRRR increased for the longer forecast times.
When putting the above results on the differences between the modeled and observed Ta and Td into the context of previous studies, we note that the daytime warm bias that we identified in 2-m air temperatures is contradictory to findings from other evaluations of mesoscale models over the southern United States. For example, Hu et al. (2010), when evaluating three different PBL schemes commonly used in the Weather Research and Forecasting (WRF) Model [i.e., the Mellor–Yamada–Janjić (MYJ) scheme, Yonsei University (YSU) scheme, and Asymmetric Convective Model version 2 (ACM2) scheme], found that all three PBL schemes had a daytime cool bias of ≈2°C when averaged across 211 observing stations in the southern United States. Wilczak et al. (2009) found a similar bias when using the WRF-Chem Model for a study in Texas, and this daytime cool bias has been noted in studies with the RUC (e.g., Smirnova et al. 2016). Most likely, the discrepancy between our study and these previous studies is due to how the models used in these studies represented subgrid-scale clouds. Consistent with our findings evaluating the diurnal cycles of Td, though, was that the WRF simulations by Hu et al. (2010) evaluating different PBL schemes all showed a daytime moist bias.
When comparing the mean diurnal cycles of wind speed from the observations with the HRRR, we found that the HRRR forecasts simulated the mean diurnal cycle of Wspd accurately in each month at both Belle Mina (Figs. 2g–i) and Cullman (Figs. 3g–i) but overestimated the magnitude of Wspd by about 1.0–1.5 m s−1 at both sites. Of the HRRR forecast periods, the smallest errors were in the 1-h forecast, and errors tended to be lower during the daytime than during the nighttime. Interestingly, though, the wind speeds in the HRRR initial conditions (i.e., the 0-h output) were considerably different from the other forecast periods. The HRRR Wspd showed much better agreement with the observations at Cullman than at Belle Mina. The HRRR’s overestimate of Wspd is surprising because the first model level in the HRRR is 8 m AGL, although the observations were made 10 m AGL, and because the roughness lengths are higher in the HRRR than the observed values. However, the tendency for models to overestimate near-surface winds, which is attributed to too much mechanical mixing, is a finding consistent with previous work [e.g., comparing near-surface observed wind speeds with those from the National Center for Atmospheric Research (NCAR) Mesoscale Model (MM5) (e.g., Zhang and Zheng 2004) and the WRF Model (e.g., Cheng and Steenburgh 2005; Hu et al. 2010)].
c. Monthly diurnal cycle of surface fluxes
When comparing the mean monthly diurnal cycles of the different components of the surface energy balance, we found similar agreement among the different HRRR forecast periods for H, LE, and G at Belle Mina (Fig. 4) and Cullman (Fig. 5). Mean differences between the observations and HRRR forecast periods were typically within ±50 W m−2, but there were exceptions. Most notable was a significant overestimate of H in the HRRR in April at Belle Mina and Cullman. Also noteworthy was that in January, when LE is typically small, we found that the HRRR still overestimated LE by up to 100 W m−2 at Belle Mina and up to 75 W m−2 at Cullman. In the case of G, the HRRR tended to be biased low relative to the observations. The largest differences in G of nearly 100 W m−2 occurred during the daytime in April at Cullman.
We found better agreement when comparing the observed mean monthly diurnal cycles of SWnet (i.e., SWin − SWout) and LWnet (i.e., LWin − LWout) with the HRRR forecast periods at Belle Mina (Fig. 6) and Cullman (Fig. 7). We found that there was larger variability among the different HRRR forecast periods relative to the observations for SWnet than we found for LWnet. LWnet in the HRRR output showed better agreement with the observations than did SWnet; for LWnet, the HRRR was generally between 10 and 40 W m−2 lower than the observations, depending on the time of day and season. The smallest HRRR bias occurred between 0900 and 1200 LST, whereas the largest HRRR bias was typically between 1500 and 1800 LST. These time-of-day biases suggested there may be some lag between the observations and HRRR. To investigate these lags in more detail, we computed the R2 and RMSE between the observed LWnet and HRRR LWnet using temporal lags ranging from −6 h to +6 h. We found that, at both sites, the R2 (RMSE) was slightly larger (smaller) when the observations lagged the HRRR 1-h forecast by 1 h. At Belle Mina (Cullman), the R2 was 0.727 (0.712) with no lag but 0.754 (0.717) with this 1-h lag. The remaining lags showed much smaller R2 and larger RMSE. However, we found no temporal lag in SWnet. At Belle Mina (Cullman), the R2 was 0.894 (0.880) with no lag but 0.829 (0.795) with this 1-h lag, and RMSEs were smallest when there was no temporal lag.
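The lag analysis described above can be sketched in Python as follows. This is a minimal sketch, assuming gap-free hourly series on a common time axis and taking R2 as the squared Pearson correlation; the function name `lagged_r2`, the sign convention for the lag, and the synthetic series are all hypothetical rather than taken from the authors' analysis.

```python
import numpy as np


def lagged_r2(obs, model, lag):
    """R2 between model[t] and obs[t + lag] for hourly series.

    A positive lag means the observations lag the model by `lag` hours.
    """
    obs = np.asarray(obs, dtype=float)
    model = np.asarray(model, dtype=float)
    n = obs.size
    if lag > 0:
        o, m = obs[lag:], model[:n - lag]
    elif lag < 0:
        o, m = obs[:n + lag], model[-lag:]
    else:
        o, m = obs, model
    ok = np.isfinite(o) & np.isfinite(m)   # drop missing pairs
    return float(np.corrcoef(o[ok], m[ok])[0, 1] ** 2)


# Synthetic example: ten days of an idealized diurnal cycle in which the
# "observations" trail the "model" by exactly 1 h.
hours = np.arange(240)
model = np.sin(2.0 * np.pi * hours / 24.0)
obs = np.roll(model, 1)                    # obs[t] = model[t - 1]

lags = range(-6, 7)
scores = {k: lagged_r2(obs, model, k) for k in lags}
best_lag = max(scores, key=scores.get)
print(best_lag, scores[best_lag])
```

With real data the improvement at the best lag can be small, as in the LWnet results quoted above, so the lagged R2 is best read alongside the lagged RMSE rather than on its own.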
d. Surface energy balance
As differences among the different HRRR forecast periods were generally small, we used the 1-h HRRR forecast as an example to illustrate how the calculated surface energy balance compared between the observations and HRRR. As stated earlier, we found that the HRRR simulated the cycles of incoming and outgoing longwave radiation well at Belle Mina (Figs. 8a–c) and Cullman (Figs. 9a–c). However, the model overestimated LWout by up to 40 W m−2; this overestimate is most noticeable at Belle Mina and Cullman in April (cf. Figs. 8c, 9c). The overestimates of LWout in the HRRR did not, however, correspond with positive biases in Ts; the observed Ts was typically larger than the HRRR Ts, and these differences were maximized during the daytime. This discrepancy may be due to the small spatial representativeness of the Ts observations.
Furthermore, the HRRR has a tendency to underestimate SWin, likely due to an incorrect treatment of subgrid-scale clouds. As a result of the underestimates in SWin, there existed a systematic bias in SWout from the HRRR in all three months at both sites. The bias in SWout was due to the bias in SWin and also likely to biases in the surface albedo (not shown). Overall, the biases in SWin and SWout canceled out, so that net radiation, computed as the sum of SWnet and LWnet, showed good agreement with the observations.
We found larger differences between the mean diurnal cycles of the HRRR and observations when evaluating the diurnal time series of H, LE, and G (Figs. 8d–f, 9d–f). For example, in April at Belle Mina, we found that the HRRR overestimated H by ≈150 W m−2 during the midday and early afternoon, whereas LE was ≈50 W m−2 lower than the observations, which resulted in the HRRR overestimating the sum of H, LE, and G. In general, we found that afternoon LE was typically larger in the HRRR than in the observations at Belle Mina in all months except for April. The magnitude of the differences in afternoon LE between the observations and HRRR was smaller at Cullman, and the overestimate of afternoon LE in the model was present in all months except for August, September, and April. Differences between HRRR G and observed G were typically within ±40 W m−2 at both sites.
Overall, the differences found here were expected, especially the HRRR’s overestimates of LE. For example, as noted in section 2b, the plant functional type for the HRRR grid cells containing Belle Mina was classified as cropland, whereas the area surrounding Belle Mina was mostly grassland, which would be expected to have a lower LE than cropland. The smaller differences between the HRRR and Cullman observations may have occurred because the HRRR land use type was more representative of the actual land use type. Even so, differences between the observed fluxes and HRRR-derived fluxes were not insignificant, as H was up to 100 W m−2 larger in the HRRR than in the Cullman observations (cf. Fig. 9f).
When comparing energy balance closure between the observations and model, we found that the HRRR closed the energy balance, as evidenced by net radiation matching the sum of H, LE, and G throughout the entire diurnal cycle at both Belle Mina (Figs. 8g–i) and Cullman (Figs. 9g–i). Energy balance closure occurred because it is hard-coded in the HRRR’s LSM. However, the observations from both sites did not show full energy balance closure. Differences between net radiation and the sum of H, LE, and G were up to 100 W m−2 at both Belle Mina (Figs. 8g–i) and Cullman (Figs. 9g–i), which we attribute to the energy balance closure problem in micrometeorology (e.g., Aubinet et al. 1999; Foken 2008; Frank et al. 2013; Xu et al. 2017). We discuss this in more detail later in this section. For the entire evaluation period, the RMSE between net radiation and the sum of H, LE, and G at Belle Mina (Cullman) was 35.2 W m−2 (37.2 W m−2) in the observations, but 9.7 W m−2 (10.0 W m−2) in the 1-h HRRR forecast. Additionally, R2 between these two quantities at Belle Mina (Cullman) was 0.956 (0.944) in the observations, but 0.997 (0.998) in the 1-h HRRR forecast.
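As an illustration of how closure statistics of this kind can be computed, the following Python sketch derives net radiation from its four radiative components and compares it with the sum of H, LE, and G. The function names and the sign convention (all fluxes in W m−2, signed so that full closure means net radiation equals H + LE + G) are assumptions made for the example; the sketch is not the code used in the study.

```python
import math

def net_radiation(sw_in, sw_out, lw_in, lw_out):
    """Net radiation as SWnet + LWnet, with SWnet = SWin - SWout
    and LWnet = LWin - LWout (all in W m-2)."""
    return (sw_in - sw_out) + (lw_in - lw_out)

def closure_stats(rnet, h, le, g):
    """Return (mean residual, RMSE, R^2) between net radiation and H + LE + G.

    Each argument is an equal-length sequence of fluxes in W m-2.
    The residual is defined here (an assumption) as Rnet - (H + LE + G).
    """
    total = [a + b + c for a, b, c in zip(h, le, g)]
    n = len(rnet)
    resid = [r - t for r, t in zip(rnet, total)]
    mean_resid = sum(resid) / n
    rmse = math.sqrt(sum(d * d for d in resid) / n)
    mr = sum(rnet) / n
    mt = sum(total) / n
    srr = sum((r - mr) ** 2 for r in rnet)
    stt = sum((t - mt) ** 2 for t in total)
    srt = sum((r - mr) * (t - mt) for r, t in zip(rnet, total))
    # Squared Pearson correlation; undefined if either series is constant.
    r2 = (srt * srt) / (srr * stt) if srr > 0 and stt > 0 else float("nan")
    return mean_resid, rmse, r2
```

A model that enforces closure would give a residual and RMSE near zero, whereas tower observations affected by the closure problem would give a nonzero RMSE even when R2 remains high.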
To examine in more detail the relationships between the different components of the surface energy balance, we calculated the relationship between H + LE and SWnet at both Belle Mina and Cullman (Fig. 10) using the observations and HRRR 1-h forecast. Doing so allowed us to quantify how well the observations and the HRRR capture the response of the land-atmosphere system (i.e., H + LE) to a given net input into the system (i.e., SWnet). We found that the relationships between these variables were similar; R2 was >0.90 at both sites in the observations and the HRRR. However, the slope of the relationship between the two variables was much steeper in the model (0.65 and 0.63 in the HRRR output at Belle Mina and Cullman, respectively, for the entire period of record) than in the observations (0.52 and 0.51 at Belle Mina and Cullman, respectively, for the entire period of record). The larger slopes in the HRRR than in the observations were consistent across the different seasons, as summarized in Table 8. When investigating this relationship as a function of time of day, we found that the slopes of this relationship were consistently lower in the morning than in the afternoon in both the observations and HRRR output at Belle Mina and Cullman. The findings here, particularly the larger slopes in the HRRR compared to the observations, underscore important time-of-day biases present in the HRRR.
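The slope comparison above can be illustrated with a short Python sketch. It assumes an ordinary least squares fit of H + LE on SWnet, which is one common choice and is not necessarily the fitting method used in the study; the function names `ols_fit` and `response_slope` are hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least squares fit y = slope * x + intercept.

    Returns (slope, intercept, r2). Assumes x is not constant.
    """
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy) if syy > 0 else float("nan")
    return slope, intercept, r2

def response_slope(sw_net, h, le):
    """Fit the turbulent response (H + LE) against the forcing (SWnet)."""
    turbulent = [a + b for a, b in zip(h, le)]
    return ols_fit(sw_net, turbulent)
```

Applying `response_slope` separately to morning and afternoon samples, or separately to observed and modeled fluxes, gives slopes that can be compared in the same way as those reported in the text.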
The absence of complete energy balance closure is well known and is consistent with previous studies (e.g., Aubinet et al. 1999). Reasons for the energy imbalance are numerous; measurement errors are oftentimes cited (e.g., Foken 2008; Xu et al. 2017). To ensure high-quality datasets and to mitigate possible measurement errors, the closed path infrared gas analyzers used at both sites (cf. section 2a) were calibrated prior to deployment, and all observed datasets were carefully and rigorously screened (cf. section 2a). However, other measurement errors may be induced by potential errors in G and the representativeness of these measurements. As shown in, for example, Figs. 8d–f and 9d–f, G is not insignificant. Because of the inherent nature of soil variability, there can exist differences in soil properties, leading to differences in the bulk density and porosity derived from the soil samples and to differences in the soil temperatures, all of which are used to calculate G (cf. section 2a). To help mitigate these potential errors, we selected soil samples from five different sites surrounding the towers and averaged measurements from these to determine mean soil characteristics.
Besides measurement errors, horizontal advection and land surface heterogeneity are also cited as causes for the absence of energy balance closure (e.g., Foken 2008; Frank et al. 2013; Xu et al. 2017). To filter periods when there is significant horizontal advection of sensible or latent heat, we removed 30-min fluxes when the flux divergence, computed as the difference in H and LE between the measurements made 3 m AGL and 10 m AGL, exceeded 20% during the given period. However, we found that this did not impact the observed diurnal cycles of H and LE shown in Figs. 8 and 9 and cannot explain the absence of energy balance closure.
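A screening step of this kind might look like the following Python sketch. The text does not state how the 20% divergence was normalized, so the example assumes a relative difference with respect to the 10-m flux and rejects a 30-min period if either H or LE exceeds the threshold; both choices, and the function names, are assumptions made for illustration.

```python
def relative_divergence(flux_3m, flux_10m):
    """|F(3 m) - F(10 m)| / |F(10 m)|, or None when the 10-m flux is zero.

    Normalizing by the 10-m flux is an assumption of this sketch.
    """
    if flux_10m == 0:
        return None
    return abs(flux_3m - flux_10m) / abs(flux_10m)

def passes_divergence_filter(h_3m, h_10m, le_3m, le_10m, threshold=0.20):
    """True if both H and LE diverge by no more than `threshold` between levels.

    Periods with an undefined divergence (zero 10-m flux) are rejected.
    """
    for f3, f10 in ((h_3m, h_10m), (le_3m, le_10m)):
        div = relative_divergence(f3, f10)
        if div is None or div > threshold:
            return False
    return True
```

Fluxes close to zero, such as those near sunrise and sunset, make a relative threshold very sensitive, so a practical implementation would likely also need a minimum flux magnitude; that refinement is omitted here.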
e. HRRR performance during spring 2017 at Belle Mina and Cullman
We have so far shown that the HRRR simulates the observed near-surface meteorological fields well but has difficulty simulating the energy balance. To investigate the absence of energy balance closure in more detail, we focused on the period in March and April 2017 when mean afternoon sensible heat fluxes were overestimated by 120.9 ± 110.3 W m−2 (110.7 ± 62.0 W m−2) at Belle Mina (Cullman) in the 1-h HRRR forecast, with comparable overestimates in the HRRR found for the other forecast periods. When calculating the mean afternoon (i.e., 1200–1600 LST) turbulent fluxes and using these means to evaluate trends during this period, we found that the HRRR compared poorly with the observed H at Belle Mina (r = 0.29, p = 0.03), as it overestimated H at times by over 200 W m−2 (Fig. 11a). Furthermore, the HRRR 1-h forecast indicated a statistically significant increase in H during this time (r = 0.29, p = 0.02) in contrast to the observations that showed a reduction, albeit not statistically significant (r = −0.18, p = 0.16), in H.
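The calculation of mean afternoon fluxes and their trend can be sketched as follows. The example assumes that the afternoon window includes hours from 1200 up to, but not including, 1600 LST, and that the trend is summarized by the Pearson correlation between the day index and the daily afternoon mean; both are assumptions, as are the function names and the `(day, hour, flux)` record layout.

```python
import math
from collections import defaultdict

def afternoon_means(records, start_hour=12, end_hour=16):
    """Daily mean flux over the afternoon window.

    records: iterable of (day_index, hour_lst, flux) tuples.
    Returns a list of (day_index, mean_flux) sorted by day.
    """
    by_day = defaultdict(list)
    for day, hour, flux in records:
        if start_hour <= hour < end_hour:
            by_day[day].append(flux)
    return [(day, sum(v) / len(v)) for day, v in sorted(by_day.items())]

def pearson_r(x, y):
    """Pearson correlation coefficient; NaN if either series is constant."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def trend_r(daily):
    """Correlation between day index and daily afternoon mean flux."""
    days = [d for d, _ in daily]
    vals = [v for _, v in daily]
    return pearson_r(days, vals)
```

The sketch returns only r. A p value for the trend would require a significance test on r, which could be obtained with, for example, `scipy.stats.pearsonr`.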
As noted in section 2a, there was a data gap in April 2017 at Cullman that limited the interpretation of the flux evolution at this site during the period of interest. Nonetheless, we noted that mean afternoon H was overestimated at Cullman by up to 100–200 W m−2 (Fig. 12a). Neither the observed nor the HRRR-derived H exhibited a statistically significant change during this period, as r = 0.15 (p = 0.35) and r = −0.09 (p = 0.56) in the observations and HRRR, respectively.
Although the differences between the observations and the HRRR were smaller for LE than for H, with the HRRR on average 16.2 ± 119.4 and 22.8 ± 57.4 W m−2 larger than the observations at Belle Mina and Cullman, respectively, the HRRR incorrectly simulated the LE evolution during this period. The Belle Mina observations showed a statistically significant (r = 0.69, p < 0.01) increase in LE between 1 March and 30 April 2017 that was absent from the HRRR (r = −0.04, p = 0.78) (Fig. 11b). Similarly, at Cullman, we found a statistically significant increase in the observed LE (r = 0.51, p < 0.01), but this was absent in the HRRR-derived LE (r = 0.12, p = 0.46) (Fig. 12b).
The significant differences that we found between the HRRR-derived and observed H and LE at both sites can at least partially be explained by differences in soil moisture and near-surface moisture. Although we did not have soil moisture fields from the HRRR, we used HRRR rainfall as a surrogate. In doing so, we found that the HRRR underestimated rainfall during this period. For example, at Belle Mina, 201 mm of rain was recorded between 1 March and 30 April, whereas the HRRR 1-h forecast indicated 184 mm of rainfall during this period. Less precipitation during this period in the HRRR would have led to the HRRR underestimating soil moisture, contributing to the HRRR’s overestimates of H. These dry biases in the HRRR may partially help explain the differences between morning and afternoon in the slope of the relationship between SWnet and H + LE discussed in section 3d. Regardless, the HRRR’s large overestimates of H were inconsistent with what we found for Ta and Td during this period, as we would expect the model’s large overestimates of H to translate into much larger biases in Ta.
We speculate that the discrepancies noted here occurred because, as noted previously, the area surrounding Belle Mina was mostly grassland whereas it was assigned the cropland plant functional type in the HRRR. The slightly smaller biases at Cullman may be attributed to the HRRR land use type being more representative of the actual land use type, as discussed in section 3d. Furthermore, the model–data mismatch identified here may also be an artifact of HRRR’s data assimilation system; we discuss this point in more detail in the next section. Overall, though, we found that HRRR-derived mean afternoon Ta (Figs. 11c, 12c) and Td (Figs. 11d, 12d) from different length forecasts compared well with the Belle Mina and Cullman observations. For Ta, the mean difference between the observations and HRRR was −1.15° ± 0.70°C (−1.29° ± 0.60°C) at Belle Mina (Cullman); for Td, this difference was 0.17° ± 1.38°C (−0.90° ± 0.98°C) at Belle Mina (Cullman).
In summary, it was not surprising that the turbulent fluxes from the HRRR did not agree as well with the observations as did the near-surface meteorological fields. This finding was consistent with previous work (e.g., Patil et al. 2011; Sun et al. 2017) and can be partially attributed to the fact that the subgrid-scale variability in surface characteristics within the footprint of the flux tower measurements cannot be fully resolved by the HRRR.
However, the magnitude of differences between the HRRR and the observed fluxes from Belle Mina and Cullman is noteworthy, as shown by 1) the differences in the mean monthly diurnal cycles (cf. section 3c), 2) the morning versus afternoon differences in the forcing versus the response (cf. section 3d), and 3) the failure of the HRRR to capture trends in the surface flux evolution during the spring 2017 study period (cf. section 3e). Although there is unlikely to be a single cause for the discrepancies we found between the observations and the HRRR in the present study, these discrepancies are likely due to a combination of 1) errors in the HRRR’s initial/boundary conditions, 2) errors in the HRRR’s LSM, and 3) errors in the HRRR’s PBL scheme. As the HRRR LSM is modified every hour using Ta and Td observations measured 2 m AGL (Benjamin et al. 2016), errors in these initial conditions can introduce errors in the HRRR forecasts. Newer versions of the HRRR, including the HRRRv3, which became operational at NCEP in summer 2018, include modifications to the surface layer coupling coefficients and changes in the data assimilation system [e.g., giving more weight to the Gridpoint Statistical Interpolation (GSI) analysis system and changes in the digital filter initialization (DFI)]. Testing the extent to which these modifications improve the HRRR’s estimates of surface fluxes will be the subject of future studies.
Additionally, some of the biases we identified in the present study may be attributed to the PBL mixing scheme used, in addition to the HRRR’s LSM. We noted in section 3d that the HRRR underestimated SWin due to an incorrect treatment of subgrid-scale clouds. In addition to the modifications described previously, the HRRRv3 uses a MYNN PBL scheme that has been modified to an eddy diffusivity mass flux (EDMF) scheme (e.g., Angevine et al. 2018). The EDMF scheme includes better subgrid-scale clouds and is expected to reduce the radiative errors found in the present study.
Last, we note that, although the approach currently used in the HRRR may work for forecasts that are up to ≈36 h in length, as in the HRRRv3, this approach is likely to fail at longer time scales. Longer forecast periods (e.g., from subseasonal to seasonal) are more sensitive to LSM behavior than the forecast periods currently available from the HRRR; thus the current implementation is not expected to do as well at these longer forecast periods. For example, latent heat fluxes over the southeastern United States were overestimated by ≈200 W m−2 in the North American Mesoscale (NAM) model following the 2007 spring freeze in the eastern United States. This was because the late season freeze led to significant vegetation damage (e.g., Gu et al. 2008; Mulholland et al. 2009) that resulted in a significant reduction in the observed latent heat fluxes and thereby a larger partitioning of available energy into sensible heat flux. If the models are unable to correctly simulate the fluxes, they cannot be expected to produce reliable Ta and Td forecasts.
5. Summary and outlook
In the present study, we evaluated how well the HRRR reproduced near-surface meteorological fields and the surface energy balance over an 8-month period, using observations obtained from two 10-m micrometeorological towers installed in northern Alabama. We found that the HRRR reproduced the observations of air and dewpoint temperature well at both of these sites, with R2 generally >0.95 and the RMSE typically <2°C for the period over which we evaluated the HRRR. Model biases were largest during the warm season months, with overestimates in mean afternoon temperature on the order of 2°C at both sites, and smallest in the winter months when temperature differences were <1°C.
We also identified biases in the HRRR’s treatment of the components of the surface energy balance. Whereas the HRRR closed the surface energy balance and also generally simulated longwave and shortwave radiation well, there were significant biases in sensible and latent heat that varied seasonally and, to some extent, from site to site.
Overall, this study highlights some of the strengths and weaknesses of the operational HRRR forecast model. Although the HRRR accurately reproduced the near-surface meteorological fields during the study period, the HRRR has notable biases in the fluxes that cannot explain the good agreement between the observed and HRRR-derived near-surface temperature and moisture fields. This finding helps to motivate the need for additional studies of the HRRR land surface scheme so that improvements can be made to remedy these deficiencies. This will be essential as the HRRR makes forecasts for longer time scales.
We thank Mr. Mark Heuer, Mr. Randall White, Mr. Edward Dumas, and Mr. Tom Wood for their support in helping to maintain the micrometeorological measurements at Belle Mina and Cullman. We gratefully acknowledge Mr. Bobby E. Norris, Jr. of the Auburn University Tennessee Valley Research and Extension Center at Belle Mina for allowing us to install our micrometeorological tower on-site. We thank Mr. Arnold Caylor from the North Alabama Horticulture Research Station for allowing us to install our micrometeorological tower on his property at Cullman. We thank Dr. Tatiana Smirnova of the NOAA Global Systems Division (GSD) and the Cooperative Institute for Research in Environmental Sciences (CIRES) for providing additional insights into the HRRR’s LSM. We also thank Dr. John Kochendorfer of NOAA/ARL/ATDD, as well as the three anonymous reviewers, whose comments helped improve the manuscript. A portion of this work was supported by the NOAA Atmospheric Science for Renewable Energy (ASRE) program. Finally, we note that the results and conclusions, as well as any views expressed herein, are those of the authors and do not necessarily reflect those of NOAA or the Department of Commerce.
Scatterplots of the Relationships between Observations and HRRR
Here we present all 96 scatterplots from which the summary statistics shown in Tables 2, 3, and 4 were computed. To this end, Fig. A1 shows the relationship between the 1-h HRRR forecast and Belle Mina observations from 1 September 2016 to 30 April 2017 for all variables in the present study and for all time periods. The same information for Cullman is shown in Fig. A2. In Figs. A3 and A4, these relationships are shown only for daytime (i.e., 1200–1600 LST) values at Belle Mina and Cullman, respectively, and in Figs. A5 and A6 these relationships are shown only for the nighttime (i.e., 0000–0400 LST).