The Development and Accuracy Assessment of Wet Bulb Globe Temperature Forecasts

Jordan Clark (a, b; ORCID: https://orcid.org/0000-0003-0164-2497), Charles E. Konrad (a, c), and Andrew Grundstein (d)

a The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
b Nicholas Institute for Energy, Environment and Sustainability, Duke University, Durham, North Carolina
c NOAA Southeast Regional Climate Center (SERCC), Chapel Hill, North Carolina
d Department of Geography, The University of Georgia, Athens, Georgia

Abstract

Heat is the leading cause of weather-related death in the United States. Wet bulb globe temperature (WBGT) is a heat stress index commonly used for activity modification among active populations, such as outdoor workers and athletes. Despite widespread use globally, WBGT forecasts have been uncommon in the United States until recent years. This research assesses the accuracy of WBGT forecasts developed by NOAA’s Southeast Regional Climate Center (SERCC) and the Carolinas Integrated Sciences and Assessments (CISA). It also details efforts to refine the forecast by accounting for the impact of surface roughness on wind using satellite imagery. Comparisons are made between the SERCC/CISA WBGT forecast and a WBGT forecast modeled after NWS methods. Additionally, both of these forecasts are compared with in situ WBGT measurements (during the summers of 2019–21) and estimates from weather stations to assess forecast accuracy. The SERCC/CISA WBGT forecast was within 0.6°C of observations on average and showed less bias than the forecast based on NWS methods across North Carolina. Importantly, the SERCC/CISA WBGT forecast was more accurate for the most dangerous conditions (WBGT > 31°C), although this resulted in higher false alarms for these extreme conditions compared to the NWS method. In particular, this work improved the forecast for sites more sheltered from wind by better accounting for the influences of land cover on 2-m wind speed. Accurate forecasts are more challenging for sites with complex microclimates. Thus, appropriate caution is necessary when interpreting forecasts, and onsite, real-time WBGT measurements remain critical.

Significance Statement

This research assesses the accuracy of wet bulb globe temperature (WBGT) forecasts. WBGT is a heat stress index that accounts for impacts of air temperature, humidity, wind, and radiation. It is widely used in occupational, athletic, and military settings for heat stress assessment, yet WBGT forecasting in the United States is a relatively new development. These forecasts can be used by decision-makers to better plan activities. We found that WBGT forecasts by NOAA’s Southeast Regional Climate Center and Carolinas Integrated Sciences and Assessments were within 0.6°C of observations overall in North Carolina and less biased than forecasts based on methods used by the U.S. National Weather Service, which had larger, colder biases that present potential safety issues in planning.

© 2024 American Meteorological Society. This published article is licensed under the terms of the default AMS reuse license. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Jordan Clark, jordan@alumni.unc.edu

1. Introduction

Exposure to extreme heat leads to more deaths than any other weather event in the United States (CDC 2010; NWS 2020a). To prevent adverse health outcomes resulting from heat exposure, systems have been developed that provide warnings to the public during particularly dangerous periods of hot weather. These systems, often referred to as heat-health warning systems (HHWS), are based on weather forecasts of variables such as air temperature or the heat index (Kovats and Kristie 2006). While the heat index is commonly used in the United States for warning of dangerous conditions, other tools, such as the NWS Heat Risk tool, are also in use and are being adopted more widely. The heat index (Rothfusz 1990) accounts for the effects of humidity and air temperature, but notably only for shaded conditions. However, other heat stress indices, such as wet bulb globe temperature (WBGT), provide more comprehensive assessments of environmental heat stress (Budd 2008; Hondula et al. 2014).

Prior to 2018/19, WBGT was not a routinely forecast variable in the United States. In 2018, NOAA’s Southeast Regional Climate Center (SERCC) and Carolinas Integrated Sciences and Assessments (CISA) developed a WBGT forecast tool, which was operationalized in the summer of 2019 (SERCC and CISA 2023). The United States National Weather Service (NWS) released an experimental WBGT forecast in 2019 that was made operational 1 June 2022 (NWS 2019a, 2022). The research presented here assesses the accuracy of the SERCC WBGT forecast and the forecasts made with the methods used by the NWS through comparisons against observed WBGT in North Carolina and estimated WBGT at weather stations in select parts of the CONUS, providing insight on future improvements.

a. Wet bulb globe temperature

The U.S. military developed WBGT in the 1950s to reduce the incidence of heat-related casualties at training camps during times of extreme heat (Budd 2008; Yaglou and Minard 1957). WBGT is calculated by adding together three components: dry-bulb temperature, natural wet bulb temperature, and black globe temperature (Budd 2008). The black globe temperature is measured using a black globe thermometer. The temperature probe is suspended inside the black globe, and the globe itself is unshielded from radiation. The black globe temperature is an indicator of the temperature due to radiative forcing incident on human skin, including both direct and diffuse shortwave radiation and longwave radiation from the surface of Earth (Kopec 1977; Liljegren et al. 2008). The temperature of the black globe is also influenced by wind speed. The second term, the dry-bulb temperature, is a standard measure of ambient air temperature, with the temperature sensor located inside of a radiation shield that is naturally ventilated. Unlike the commonly measured and estimated psychrometric wet bulb temperature, which is similarly located in a radiation shield like the dry bulb thermometer, the natural wet bulb thermometer is measured unshielded from radiation (Liljegren et al. 2008). There is a wet wick that is wrapped around the bulb of the thermometer and the evaporation of water from this wick mimics the cooling effect of sweat evaporating off of human skin. From these three measures, WBGT is calculated according to the following equation:
$$\mathrm{WBGT} = 0.7\,\mathrm{NWB} + 0.2\,T_g + 0.1\,T_a, \tag{1}$$
where NWB is the natural wet bulb temperature, $T_g$ is the black globe temperature, and $T_a$ is the dry bulb temperature (ambient air temperature).
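
As a quick illustration of the weighting, the following minimal sketch evaluates the WBGT equation for one set of component temperatures. The function name and the sample input values are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
def wbgt(natural_wet_bulb_c: float, black_globe_c: float, dry_bulb_c: float) -> float:
    """Weighted sum of the three component temperatures (all in deg C)."""
    return 0.7 * natural_wet_bulb_c + 0.2 * black_globe_c + 0.1 * dry_bulb_c


# Hypothetical inputs for illustration only.
example = wbgt(natural_wet_bulb_c=25.0, black_globe_c=45.0, dry_bulb_c=32.0)
print(round(example, 1))
```

Because the natural wet bulb term carries 70% of the weight, a 1°C error in that component shifts WBGT by 0.7°C, whereas the same error in air temperature shifts it by only 0.1°C.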

WBGT continues to become more popular for assessing heat stress, including applications in athletics (Casa et al. 2015; Roberts et al. 2021), occupational settings (ACGIH 2017; ISO 1989, 2017; NIOSH 2016; OSHA 2017), and the military (US Department of the Army 2022). Numerous U.S. states now require measurements of WBGT to determine if it is safe for athletic practice outdoors (Grundstein et al. 2015; NCHSAA 2016), such as in North Carolina, South Carolina, and Georgia. Ultimately, WBGT is utilized based on a given value’s corresponding level of danger, often referred to as flag level. The WBGTs used to define each threshold and associated activity modifications vary depending on setting and application (such as in occupational environments or athletics). An example of common values used for thresholds is 26.7°–29.4°C (green flag), 29.5°–31.0°C (yellow flag), 31.1°–32.1°C (red flag), and 32.2°C+ (black flag). These particular threshold values, which are the thresholds utilized throughout this paper, are utilized in high school athletics in North Carolina (NCHSAA 2016) and are identical to the thresholds used by the U.S. Army (US Department of the Army 2022), except for the green flag, which for the latter begins at 27.8°C instead of 26.7°C. Flag levels denote increasing levels of danger, with green flag requiring limited modifications to activity and black flag, for example, requiring cancellation of high school football practices in various states.
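
The flag thresholds listed above can be expressed as a simple lookup. This is a sketch rather than the code used in the study: the function name and the "none" label for values below the green flag are assumptions, and each category is assumed to begin at its stated lower bound, so that values falling in the 0.1°C gaps between published ranges are assigned to the lower flag.

```python
# Lower bounds (deg C) of each flag level, ordered from most to least severe.
FLAG_LOWER_BOUNDS_C = [
    ("black", 32.2),
    ("red", 31.1),
    ("yellow", 29.5),
    ("green", 26.7),
]


def flag_level(wbgt_c: float) -> str:
    """Return the flag category for a WBGT value in deg C."""
    for name, lower_bound in FLAG_LOWER_BOUNDS_C:
        if wbgt_c >= lower_bound:
            return name
    return "none"  # assumed label for values below the green flag


print(flag_level(30.0))
```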

b. Forecasts of WBGT

Accessible forecasts of WBGT enable advance knowledge of potentially hazardous periods, which has applications in numerous sectors and settings. For example, in athletics and occupational environments, certain workouts or occupational activities may be necessary but also lead to increased heat strain. With the associated activity modification guidelines (such as work/rest ratios) and recommendations on adjustments to clothing or equipment based on WBGTs, forecasts of the index allow for planning of activities to occur on certain days or at certain times of day.

Additionally, with forecasts of WBGT available, this lays the groundwork for future use of WBGT by other sensitive groups, since it is recognized to be more comprehensive than the heat index (Budd 2008; Hondula et al. 2014). However, there has been limited research on the accuracy of WBGT forecasts. Although WBGT forecasts have been in use in some parts of the world for many years, such as in Japan (Hoshi and Inaba 2007), these forecasts have not been common in the United States.

In 2018, NOAA’s Southeast Regional Climate Center (SERCC) and Carolinas Integrated Sciences and Assessments (CISA) created a WBGT forecast tool, which was made operational in the summer of 2019 (SERCC and CISA 2023). As of 2022, this forecast tool was operational for the eastern two-thirds of the contiguous United States. Similarly, in 2019, the NWS began including WBGT forecasts in an experimental version of their gridded forecast data, the National Digital Forecast Database (NDFD) (NWS 2019a). This forecast recently became operational in 2022 (NWS 2022). However, the SERCC/CISA and NWS WBGT forecasts utilize different methods. One aspect of the research here is to assess the accuracy of these different forecast methodologies.

Complicating the forecasting of WBGT, the degree to which any weather model accurately captures the heterogeneity of land types and microclimates within a given pixel of its model output is limited, as there can be wide variation in WBGT and the microclimatic variables influencing WBGT across hundreds of meters (Verkaik et al. 2005). This variability is not well captured by weather models due to their coarser resolution, since the highest-resolution output of widely available and spatially expansive models is 2.5 km. In addition to assessing the accuracy of these WBGT forecasts, this research also assesses the utility of adding surface roughness information at scales finer than the forecast grid to better tune the downscaling of wind speeds used in calculating the WBGT forecast. The current SERCC/CISA forecast tool employs Pasquill–Gifford (PG) stability classes for the downscaling of wind from 10 to 2 m (Bowen et al. 1983; Frank et al. 2020; U.S. EPA 2000).

The importance of accurately incorporating the wind at 2 m for WBGT is paramount, given that high (hot) values of WBGT are sensitive to slight changes in environmental conditions, particularly changes in wind speed. With the sensitivity of WBGT (and the human body) to small differences in wind speed, variations in surface roughness can lead to vastly different WBGTs at the ground between nearby locations. For example, under full sun and with a dewpoint of 21.1°C and an air temperature of 30°C, a slight change in wind speed from 1.3 to 0.6 m s−1 can result in a change in WBGT of two flag categories (from yellow to black flag).

The SERCC/CISA forecast tool utilizes both the NWS National Digital Forecast Database (NDFD) and the NWS National Blend of Models (NBM) as inputs to a routine that computes the WBGT forecast (SERCC and CISA 2023). Both products forecast at hourly time steps out to 36 h in the future, but the NDFD hourly forecast changes to a 6-hourly forecast at 72 h past initialization while the NBM forecasts 3-hourly values out to 192 h. Thus, the SERCC/CISA forecast tool uses the NDFD for 1–69 h and the NBM for 72–120 h. The SERCC/CISA WBGT forecast is hereafter referred to as S-WBGT. The S-WBGT methodology supplemented with surface roughness for downscaling wind speeds is hereafter referred to as S-WBGT Z.

The forecast analysis here serves as an update to an assessment of forecast accuracy presented in Clark and Konrad (2020). For assessing accuracy, the WBGT forecasts will be compared with 1) Observed WBGT measurements collected with a WBGT meter that is compliant with standards established by the International Organization for Standardization and 2) WBGT estimated from weather stations. Satellite imagery (Sentinel 2A/2B and Landsat 8 ETM+) was used to calculate the normalized difference vegetation index (NDVI) and green vegetation fraction (GVF) to then estimate surface roughness across portions of the forecast domain for the S-WBGT Z forecast. Two sources of satellite imagery were used to assess if the higher-resolution imagery resulted in more accurate surface roughness values and thus a more accurate WBGT forecast. In addition to looking at forecast bias overall between methods, accuracy of WBGT forecasts using the NBM and NDFD are compared and variations in accuracy across space, weather condition, and time are assessed.

2. Data

a. Weather forecast data

Hourly gridded forecast data for two NWS forecast products (four runs per day: 0000, 0600, 1200, and 1800 UTC) were archived for the summers of 2019–21: National Digital Forecast Database (NDFD) and the National Blend of Models (NBM). The NDFD is a product containing a mosaic of digital weather forecasts from the NWS and National Centers for Environmental Prediction (NCEP) (NWS 2019b). The NBM is an ensemble of guidance, a statistically derived blend from numerous numerical weather prediction models (Craven et al. 2020). Two versions of the NBM were utilized here, version 3.2 (Craven et al. 2020) for the summer of 2019 and version 4.0 (which constitutes 85% of the NBM forecast data) for the summers of 2020 and 2021 (NWS 2020b). While the majority of the analysis focuses on the 24-h forecast, this incorporates a window surrounding this forecast lead time, ranging from 21 to 27 h to ensure forecasts are for relevant times of day and for larger sample sizes of forecast data points.

b. WBGT data

A weather station was collocated with a WBGT meter to provide data quality checks. The weather station was a Davis Instruments Vantage Pro 2 Plus (Fig. 1) that recorded all variables needed to estimate WBGT at 10-s intervals, detailed below. WBGT data were recorded with a WBGT meter designed to meet the specifications for a WBGT meter as outlined by the International Organization for Standardization, first used in Cooper et al. (2017) (Fig. 1). The ISO guidelines (Parsons 2006) require that the black globe have a diameter of 0.15 m and the temperature sensor should measure 20°–120°C with accuracies of ±0.5°C for 20°–50°C and ±1°C for temperatures greater than 50°C. The natural wet bulb temperature sensor should be a cylindrical shape, have a diameter of 6 ± 1 mm and length of 30 ± 5 mm, and measure the range 5°–40°C with an accuracy of ±0.5°C. The wick situated over the temperature sensor must be white and made of cotton or other water-absorbent material. Last, the dry bulb temperature sensor should be positioned within a radiation shield and measure the range 10°–60°C with a ±1°C accuracy (Parsons 2006).

Fig. 1.

Field work instruments (WBGT meters and weather station). The WBGT meter (right) with a black globe thermometer, natural wet bulb temperature probe situated in a water reservoir, and dry bulb sensor in a radiation shield. A weather station (left) and a Kestrel 5400 (middle) were also collocated with the WBGT meter.

Citation: Weather and Forecasting 39, 2; 10.1175/WAF-D-23-0076.1

The instruments were situated such that they were 1.5 m above the ground, on average, with the weather station anemometer situated highest at 2 m. Data were recorded at various intervals (2-, 5-, 10-, 30-, 60-, or 120-s intervals), depending on the location of the meter and thus the frequency at which it could be accessed. Measurements were taken at several locations throughout the summers of 2019–21: the Horace Williams Airport in Chapel Hill, North Carolina, and within suburban environments in Chapel Hill, Durham, and Shelby, North Carolina.

Two methods were used to identify when instruments were in the shadows of buildings or trees during data collection. First, we periodically captured images of the instruments to directly observe any shading. Second, we compared the measured solar radiation from the weather station with estimated clear-sky solar radiation for the location and time. This was completed to ensure the accuracy of the observed WBGT being utilized, in that shaded WBGT would skew the forecast bias assessment since the forecasts were not made for shaded WBGT. All data were collected over grassy surfaces with no obstacles between the sky and sensors (i.e., no trees). The first 15 min of data for a given instance of data collection were discarded. This allowed adequate time for the instrumentation to equilibrate to the environment after being stored indoors or in the shade (Kestrel 2021). WBGT varies rapidly over small time periods due to slight changes in insolation (e.g., from clouds) and wind speed, which fluctuate rapidly. Thus, to robustly compare with the (instantaneous) hourly time step forecast data, the Observed WBGT was averaged over the course of 20 min (10 min before and after the top of every hour).
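
The 20-min averaging around the top of each hour can be sketched as follows. This is an illustrative sketch only: the function name and the sample data are hypothetical, and treating the window endpoints (exactly 10 min before or after the hour) as inclusive is an assumption not stated in the text.

```python
from datetime import datetime, timedelta
from statistics import mean


def hourly_window_means(samples, half_window_minutes=10):
    """Average WBGT samples within +/- half_window_minutes of each top of hour.

    samples: iterable of (datetime, wbgt_c) pairs.
    Returns a dict mapping each top-of-hour datetime to the mean WBGT.
    """
    half_window = timedelta(minutes=half_window_minutes)
    buckets = {}
    for timestamp, wbgt_c in samples:
        hour_start = timestamp.replace(minute=0, second=0, microsecond=0)
        # A sample can belong to the hour it falls in or to the next hour.
        for top_of_hour in (hour_start, hour_start + timedelta(hours=1)):
            if abs(timestamp - top_of_hour) <= half_window:
                buckets.setdefault(top_of_hour, []).append(wbgt_c)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


# Hypothetical samples: the 13:20 reading falls outside the window and is ignored.
samples = [
    (datetime(2020, 7, 1, 12, 55), 30.0),
    (datetime(2020, 7, 1, 13, 5), 32.0),
    (datetime(2020, 7, 1, 13, 20), 40.0),
]
averaged = hourly_window_means(samples)
print(averaged)
```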

Station WBGT was estimated using weather stations from three networks: the Automated Surface Observing System (ASOS) network, the Automated Weather Observing System (AWOS) network, and the North Carolina Environment and Climate Observing Network (ECONet), which is maintained by the North Carolina State Climate Office. In total, 169 stations were used (130 stations from ASOS/AWOS and 39 stations from the ECONet) (Fig. 2). The 130 ASOS/AWOS stations were mainly in North Carolina. Stations in other states were selected based on the availability of Sentinel satellite imagery (detailed below). The Iowa Environmental Mesonet (IEM) archive (Iowa Environmental Mesonet 2021) was used to download the weather station data for stations on the ASOS and AWOS networks. The Climate Retrieval Observations Network of the Southeast Database housed at the North Carolina State Climate Office (NC CRONOS 2021) was used to download the ECONet weather station data. ECONet weather stations directly measured solar radiation, which is required to estimate WBGT. However, for ASOS/AWOS weather stations, total cloud cover is reported in lieu of solar radiation. The reported cloud cover was used to estimate the observed solar radiation for the stations on these networks, as detailed below. Hourly data for all stations for the heat season (1 May–30 September) 2019–21 were used.

Fig. 2.

Map of the weather stations used (by network) for station WBGT.

c. Satellite imagery and land cover

Satellite imagery from Sentinel-2A and Sentinel-2B was downloaded from the Copernicus Open Access Hub, specifically the Level-1C product (Copernicus Sentinel data 2019). Imagery included scenes covering the entire state of North Carolina and parts of South Carolina, Georgia, Alabama, Tennessee, Virginia, Pennsylvania, Illinois, Oklahoma, and Texas. All imagery utilized was captured on three dates: 29 August, 30 August, and 7 September 2019. In addition to the Sentinel imagery, Landsat 8 ETM+ Level-2 imagery was also obtained for the state of North Carolina, with a collection date of 17 August 2019. The 2019 National Land Cover Database (NLCD) was retrieved for the CONUS to provide the land cover information utilized here (Dewitz 2021).

3. Methods

Station WBGT across the ASOS/AWOS network was estimated using the Liljegren et al. (2008) methodology, which has been found to be most accurate at estimating WBGT (Lemke and Kjellstrom 2012; Patel et al. 2013). S-WBGT forecasts likewise applied the Liljegren et al. methodology for the NDFD/NBM. The necessary formulas for using the Liljegren et al. (2008) method were provided by the R package “wbgt” (Lieblich and Spector 2017).

To estimate solar radiation for ASOS/AWOS weather stations, the reported cloud cover categories were converted to a percentage cloud cover. These stations report cloud cover at several levels, using the following categories: clear (0%–5%], few (5%–25%], scattered (25%–50%], broken (50%–87%], and overcast (87%–100%] (brackets denote inclusivity) (NOAA et al. 1998). Comparing the observed solar radiation at ECONet weather stations with estimated solar radiation revealed that using the maximum value within each of these ranges produced the most accurate results, that is, clear (5%), few (10%), scattered (50%), broken (87%), and overcast (100%). Since ASOS/AWOS stations measure clouds at multiple levels, the layer with the largest amount of cloud cover was used to derive the percentage cloud cover variable.
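
The category-to-percentage conversion and the selection of the cloudiest layer can be sketched as below. The dictionary keys and the function name are hypothetical; the fractions are the values stated in the paragraph above, and requiring at least one reported layer is an assumption of this sketch.

```python
# Cloud cover fraction assigned to each reported category (values from the text).
CATEGORY_TO_FRACTION = {
    "clear": 0.05,
    "few": 0.10,
    "scattered": 0.50,
    "broken": 0.87,
    "overcast": 1.00,
}


def cloud_fraction(layer_categories):
    """Return the cloud fraction of the cloudiest reported layer."""
    if not layer_categories:
        raise ValueError("at least one cloud layer category is required")
    return max(CATEGORY_TO_FRACTION[category] for category in layer_categories)


# Hypothetical three-layer report.
print(cloud_fraction(["few", "broken", "scattered"]))
```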

With 1) the percentage cloud cover at ASOS/AWOS stations and 2) the cloud cover provided in the forecast data, the clear-sky direct radiation value (based on the time and location of a station observation or forecast data point) was modified by the percentage cloud cover to estimate solar radiation with the following equation:
$$S_{\mathrm{rad}} = R_0\left(1 - 0.75\,n^{3.4}\right), \tag{2}$$
where $n$ is the cloud cover fraction (0.0–1.0) and $R_0$ is the clear-sky direct radiation (W m−2) estimated using (3):
$$R_0 = 990\sin(\phi) - 30, \tag{3}$$
where $\phi$ is solar elevation angle (Kasten and Czeplak 1980).
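
A minimal sketch of the cloud-adjusted solar radiation estimate follows. It assumes the two relations take the forms S_rad = R0 (1 − 0.75 n^3.4) and R0 = 990 sin(φ) − 30, with φ the solar elevation angle. The function names are hypothetical, and clipping negative clear-sky values to zero at very low sun angles is an added assumption, not something stated in the text.

```python
import math


def clear_sky_radiation(solar_elevation_deg: float) -> float:
    """Clear-sky radiation R0 (W m-2) from solar elevation angle in degrees."""
    r0 = 990.0 * math.sin(math.radians(solar_elevation_deg)) - 30.0
    return max(r0, 0.0)  # assumption: no negative radiation near or below the horizon


def cloud_adjusted_radiation(solar_elevation_deg: float, cloud_fraction: float) -> float:
    """Solar radiation (W m-2) after reducing R0 by cloud cover fraction (0.0-1.0)."""
    r0 = clear_sky_radiation(solar_elevation_deg)
    return r0 * (1.0 - 0.75 * cloud_fraction ** 3.4)


# Hypothetical inputs: sun 60 degrees above the horizon under broken cloud (0.87).
print(round(cloud_adjusted_radiation(60.0, 0.87), 1))
```

The exponent of 3.4 means that small cloud fractions reduce the estimate very little, while overcast skies reduce it to one quarter of the clear-sky value.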
The forecast data and observed weather station data reported wind speed at 10 m. Except for the S-WBGT Z (detailed below), these measurements were downscaled from 10 to 2 m using the following power-law function:
$$U_Z = U_r\left(\frac{Z}{Z_r}\right)^{p}, \tag{4}$$
where $U_Z$ is the mean wind speed at height $Z$ above ground, $U_r$ is the wind speed at the reference height $Z_r$, and $p$ is the power-law exponent (Bowen et al. 1983; Frank et al. 2020; U.S. EPA 2000). The “urban” exponents, provided in Table 1, were used here. These exponents are implemented in the SERCC/CISA WBGT tool because preliminary field work revealed that the “rural” exponents resulted in 2-m wind speeds that were too fast.
Table 1. The solar radiation delta T (SRDT) method was used to determine Pasquill–Gifford stability classes and the corresponding power-law exponent (drawn from U.S. EPA 2000).

The solar radiation delta T (SRDT) method was used to determine Pasquill–Gifford stability classes and the corresponding power-law exponent (Frank et al. 2020; U.S. EPA 2000). The SRDT method serves as an indicator of atmospheric stability by using observed solar radiation during the day and vertical temperature difference at night (Frank et al. 2020; U.S. EPA 2000). For observations (or forecast data points) with wind speeds of less than 1 m s−1, the wind speed values were increased to 1 m s−1. This decision was based on the sensitivity of the anemometers installed at the weather stations that were used (NOAA et al. 1998).
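
The power-law downscaling and the 1 m s−1 minimum can be sketched as follows. The function name is hypothetical, the exponent is left as an argument because the stability-dependent values from Table 1 are not reproduced here, and applying the minimum to the 10-m input before downscaling (rather than to the 2-m result) is an assumption of this sketch.

```python
def downscale_wind_power_law(speed_ref, exponent, z=2.0, z_ref=10.0, min_speed=1.0):
    """Scale a wind speed (m s-1) from height z_ref to height z with a power law.

    exponent is the stability-dependent power-law exponent p.
    """
    # Assumption: the 1 m s-1 floor is applied to the reference-height wind.
    floored_speed = max(speed_ref, min_speed)
    return floored_speed * (z / z_ref) ** exponent


# Hypothetical exponent of 0.25, used purely for illustration.
print(round(downscale_wind_power_law(5.0, 0.25), 2))
```

Larger exponents correspond to stronger reductions between 10 and 2 m, which is why the choice between "urban" and "rural" exponent sets matters for the resulting WBGT.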

a. Surface roughness and wind speed

A variety of methods have been developed for characterizing surface roughness across different land cover and land use types. Several characteristics of the land surface have been used to assess surface roughness length, including NDVI (Bastiaanssen et al. 1998; Markert et al. 2019), GVF (Markert et al. 2019; Zeng et al. 2012), and leaf area index (LAI) (Su 2002; Zheng et al. 2014). The methodology utilized here drew upon the method developed for the Noah Land Surface Model (LSM) version 3.4.1 (Chen and Zhang 2009) since this parameterization (7) was found to be most consistent across different land cover types and climate patterns (Markert et al. 2019; Zheng et al. 2014).

The first step was to atmospherically correct the Sentinel satellite imagery using the Sentinel Application Platform (SNAP) software and the plugin Sen2Cor. The Landsat imagery was already atmospherically corrected. For both the Sentinel and Landsat imagery, the NDVI was calculated with (5):
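$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}}, \tag{5}$$
where NIR is reflectance in the near-infrared range and RED is reflectance in the red range. Following Zheng et al. (2014) and Markert et al. (2019), the GVF was calculated using (6):
$$\mathrm{GVF} = \frac{\mathrm{NDVI} - \mathrm{NDVI}_{\min}}{\mathrm{NDVI}_{\max} - \mathrm{NDVI}_{\min}}, \tag{6}$$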
where NDVI is the NDVI of a given pixel, NDVImax is the maximum NDVI for a land cover class, and NDVImin is the NDVI of bare soil (0.01). The land cover classification data were drawn from the 2019 National Land Cover Database (NLCD) (Dewitz 2021). Since the NLCD has a resolution of 30 m, the Sentinel NDVI was rescaled to match this resolution before calculating NDVImax per land cover class and the GVF.
Finally, surface roughness was calculated with (7) (Markert et al. 2019; Zheng et al. 2014):
$$Z_{0m} = (1 - \mathrm{GVF})\,Z_{0m,\min} + \mathrm{GVF} \times Z_{0m,\max}, \tag{7}$$
where GVF is green vegetation fraction, $Z_{0m,\min}$ is the minimum surface roughness in meters, and $Z_{0m,\max}$ is the maximum surface roughness in meters, with the values for the latter two provided in Table 2 (Markert et al. 2019).
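
The chain from reflectance to surface roughness can be sketched for a single pixel as below. It assumes the standard normalized-difference form for NDVI, a GVF that rescales NDVI between the bare-soil value and the land cover class maximum, and a roughness that is linearly interpolated between the class minimum and maximum. Function names and the sample inputs are hypothetical (the class-specific roughness ranges from Table 2 are not reproduced here), and clipping GVF to the range 0 to 1 is an added assumption.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def green_vegetation_fraction(ndvi_value: float, ndvi_max: float, ndvi_min: float = 0.01) -> float:
    """Rescale NDVI between bare soil (ndvi_min) and the land cover class maximum."""
    gvf = (ndvi_value - ndvi_min) / (ndvi_max - ndvi_min)
    return min(max(gvf, 0.0), 1.0)  # assumption: keep the fraction within [0, 1]


def roughness_length(gvf: float, z0_min: float, z0_max: float) -> float:
    """Interpolate roughness length (m) between the class minimum and maximum."""
    return (1.0 - gvf) * z0_min + gvf * z0_max


# Hypothetical pixel and class values for illustration only.
pixel_ndvi = ndvi(nir=0.5, red=0.1)
pixel_gvf = green_vegetation_fraction(pixel_ndvi, ndvi_max=0.9)
print(round(roughness_length(pixel_gvf, z0_min=0.1, z0_max=0.5), 3))
```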
Table 2. Surface roughness length ranges for the eight land cover classes. Drawn from Markert et al. (2019).

To incorporate the influences of surface roughness at varying spatial scales for a given location, a weighted average surface roughness was calculated. The different configurations for weighting each scale for calculating this average are provided in Table A1 in appendix A. This average used the roughness values from the following scales: 30 m (10%), 100 m (25%), 250 m (50%), and 500 m (15%). After deriving the weighted average surface roughness, the wind speeds to be used for S-WBGT Z were downscaled utilizing the following function from van Den Berg (2004):
$$u(Z) = u(Z_{\mathrm{ref}})\,\frac{\ln(Z/Z_0)}{\ln(Z_{\mathrm{ref}}/Z_0)}, \tag{8}$$
where $Z_0$ is surface roughness length in meters (weighted average surface roughness), $Z_{\mathrm{ref}}$ is the height of original wind speed (10 m), $Z$ is the height of the downscaled wind speed (2 m), and $u$ is wind speed (van Den Berg 2004). This equation differs from (4) since surface roughness is being utilized in this case. This was calculated for both the forecast data and weather station data. After downscaling the wind speeds, the S-WBGT Z was calculated, with the methods for converting cloud cover to solar radiation identical to the ones described above for S-WBGT.
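
The multi-scale weighting and the logarithmic downscaling can be sketched together as follows. This assumes the log-law ratio u(Z) = u(Zref) ln(Z/Z0) / ln(Zref/Z0) and the scale weights stated above. The function names and sample roughness values are hypothetical, and rejecting roughness lengths that are not strictly between 0 and the target height is a guard added for this sketch.

```python
import math

# Weights for each spatial scale in meters (values from the text).
SCALE_WEIGHTS = {30: 0.10, 100: 0.25, 250: 0.50, 500: 0.15}


def weighted_roughness(z0_by_scale_m: dict) -> float:
    """Weighted average roughness length (m) across the four spatial scales."""
    return sum(weight * z0_by_scale_m[scale] for scale, weight in SCALE_WEIGHTS.items())


def downscale_wind_log_law(speed_ref: float, z0: float, z: float = 2.0, z_ref: float = 10.0) -> float:
    """Scale wind speed (m s-1) from z_ref to z using a logarithmic wind profile."""
    if not 0.0 < z0 < z:
        raise ValueError("roughness length must be positive and below the target height")
    return speed_ref * math.log(z / z0) / math.log(z_ref / z0)


# Hypothetical roughness values at each scale.
z0 = weighted_roughness({30: 0.1, 100: 0.2, 250: 0.3, 500: 0.4})
print(round(z0, 2), round(downscale_wind_log_law(5.0, z0), 2))
```

Rougher surfaces (larger Z0) produce a larger reduction between 10 and 2 m, which is the mechanism by which sheltered sites receive lower forecast wind speeds.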

b. NWS WBGT

As of 2022, WBGT is a forecast parameter in the NDFD (NWS 2022). However, it is not a parameter in the current version of the NBM. Additionally, since the methods used by the NWS in their forecast WBGT product have changed over the duration of this study, the NDFD forecast WBGT values were not used directly for this assessment. Instead, the forecast WBGT was recalculated using the relevant parameters from the NBM/NDFD based on the latest methods used by the NWS. For the forecast WBGT calculated with methods used by the NWS (“NWS WBGT”), all data processing of solar radiation values and wind speed (downscaled without surface roughness) were identical to those described above for S-WBGT. Since the NDFD does not have the shortwave solar radiation parameter found in NBM v4.0, WBGT forecasts calculated from the NBM did not use this field, but instead derived solar radiation from the forecast percentage cloud cover. The NWS WBGT, however, utilizes different methods for estimating the black globe temperature and the natural wet bulb temperature (Boyer 2022). Black globe temperature estimation utilizes a slightly modified version of the methodology from Dimiceli et al. (2011). These modifications include 1) direct beam radiation capped at 0.75 (instead of 1.0) and 2) the convective heat transfer coefficient set to 0.228 (instead of 0.315) for daytime and 0 at night (T. Boyer 2022, personal communication).

In 2022, the NWS methodology for estimating the natural wet bulb temperature in their operational, gridded NDFD product was changed (Boyer 2022; T. Boyer 2022, personal communication). However, the methodology employed by the NWS before this change is used here, since it was found to perform notably better in comparison with the field observations collected in this study. That earlier methodology was a modified version of the method developed in Hunter and Minyard (1999):
$$T_n = T_w + 0.00117\,S - 0.233\,u + 1.072, \tag{9}$$
where $T_n$ is the natural wet bulb temperature (°C), $T_w$ is the psychrometric wet bulb temperature (°C), $S$ is solar irradiance (W m−2), and $u$ is 2-m wind speed (m s−1) (Boyer 2022).
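
A minimal sketch of this natural wet bulb estimate is given below. It assumes the relation reads Tn = Tw + 0.00117 S − 0.233 u + 1.072, so that solar irradiance raises and wind lowers the result. The function name and the sample inputs are hypothetical.

```python
def natural_wet_bulb(psychrometric_wet_bulb_c: float, solar_irradiance_wm2: float,
                     wind_speed_2m_ms: float) -> float:
    """Natural wet bulb temperature (deg C) from Tw, solar irradiance, and 2-m wind."""
    return (psychrometric_wet_bulb_c
            + 0.00117 * solar_irradiance_wm2
            - 0.233 * wind_speed_2m_ms
            + 1.072)


# Hypothetical inputs: Tw = 22 deg C, S = 800 W m-2, u = 2 m s-1.
tn = natural_wet_bulb(22.0, 800.0, 2.0)
print(round(tn, 3))
```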

c. Analysis

Observed WBGT data were collected throughout the summers of 2019–21 from May to September at the locations detailed above. Forecast accuracy for observed WBGT was defined as the forecast WBGT minus the observed WBGT. For station WBGT (WBGT estimated at ASOS/AWOS and ECONet stations), the accuracy was defined as the forecast WBGT minus the station WBGT, as estimated by the Liljegren et al. (2008) methodology. In addition to assessing the overall hourly forecast accuracy for each method compared to both observed WBGT and station WBGT, the forecast accuracy was stratified by hour of day and weather condition. It is worth noting that the sample sizes for the observed and station WBGT differ, given that the station WBGT consists of numerous weather stations while observed WBGT draws from field work with a limited number of instruments (Tables C1 and C2 in appendix C). Comparisons were made between the NBM and NDFD WBGT forecasts and at varying forecast lead times for these forecast products. To determine the utility of incorporating surface roughness into the forecast, comparisons were made between the forecast accuracy of the S-WBGT and S-WBGT Z. Only data points between 0600 and 2000 local time (LT) were included since this is the period during which heat stress and WBGT are highest.
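
The error definition (forecast minus observed) and the daytime filter can be sketched as below. The record layout, function name, and sample values are hypothetical, and treating both 0600 and 2000 LT as inclusive bounds is an assumption of this sketch.

```python
from statistics import mean


def mean_forecast_bias(records, first_hour=6, last_hour=20):
    """Mean of (forecast - observed) WBGT for records within the daytime window.

    records: iterable of (local_hour, forecast_wbgt_c, observed_wbgt_c).
    Returns None when no record falls inside the window.
    """
    errors = [
        forecast - observed
        for local_hour, forecast, observed in records
        if first_hour <= local_hour <= last_hour
    ]
    return mean(errors) if errors else None


# Hypothetical records: the 0500 LT record is excluded by the daytime filter.
records = [(5, 30.0, 25.0), (12, 30.5, 30.0), (15, 29.0, 30.0)]
bias = mean_forecast_bias(records)
print(bias)
```

With this sign convention, a negative value indicates a forecast that is cooler than what was observed.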

Since WBGT is used based on the corresponding flag level for a given value, the accuracy of forecasting the flag levels associated with the forecast WBGT, observed WBGT, and station WBGT was also assessed. Contingency tables for WBGT flag accuracy were created and verification statistics were calculated for each forecast method. Using the R package “verification” (NCAR Research Applications Laboratory 2015), the following statistics were calculated and compared for each flag level:

1) WBGT flag accuracy metrics (by flag level)

  1. Percent correct (%): Total of correct forecasts and correct rejections (no event forecast, no event observed) divided by total number of forecasts (Jolliffe and Stephenson 2012).

  2. Hit rate (%): percentage of observed events that were correctly forecast (e.g., black flag forecast, black flag observed, hit) (Jolliffe and Stephenson 2012).

  3. Bias score: measure indicating the direction of bias (positive/negative) in addition to the magnitude (ratio of frequency of forecasting a flag level to the frequency of observations at that flag level). Values greater than 1 correspond to positive (warm) biases. Values less than 1 correspond to negative (cool) biases (Jolliffe and Stephenson 2012; NCAR Research Applications Laboratory 2015).

  4. False alarm ratio: total number of false alarms (event forecast, but not observed) for a given flag level divided by the total number of forecasts for that flag level (Jolliffe and Stephenson 2012).

2) WBGT flag accuracy metrics (overall assessment)

  1. Gerrity skill score (GSS): verification measure for categorical forecasts that accounts for the “closeness” of categories in its assessment, e.g., green flag being closer to yellow flag but farthest from black flag (Jolliffe and Stephenson 2012; NCAR Research Applications Laboratory 2015). Values range from −1 to 1, with 1 being a perfect forecast. In a Gerrity skill score analysis of accuracy, greater credit is given to correct (and almost correct) forecasts of rare events and less credit is awarded to correct forecasts of common events (Jolliffe and Stephenson 2012).

  2. Heidke skill score (HSS): verification measure for categorical forecasts that accounts for the forecasts expected to be correct by random chance. Values range from negative infinity to 1. Scores greater than zero mean the forecast does better than random chance (Jolliffe and Stephenson 2012).
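The metrics listed above can be sketched from a contingency table as follows. This is an illustrative pure-Python version using the standard textbook definitions, not the implementation in the R package “verification”. The function names are hypothetical, table rows are taken to be forecast categories and columns observed categories, and the Gerrity score assumes that every category was observed at least once.

```python
def per_flag_scores(table, k):
    """2 x 2 scores for flag level k from a K x K table (rows: forecast, columns: observed)."""
    n = float(sum(sum(row) for row in table))
    hits = table[k][k]
    false_alarms = sum(table[k]) - hits               # forecast k, observed another level
    misses = sum(row[k] for row in table) - hits      # observed k, forecast another level
    correct_rejections = n - hits - false_alarms - misses
    return {
        "percent_correct": (hits + correct_rejections) / n,
        "hit_rate": hits / float(hits + misses),
        "bias_score": (hits + false_alarms) / float(hits + misses),
        "false_alarm_ratio": false_alarms / float(hits + false_alarms),
    }


def _joint_probabilities(table):
    n = float(sum(sum(row) for row in table))
    return [[cell / n for cell in row] for row in table]


def heidke_skill_score(table):
    """Proportion correct, adjusted for the proportion expected by chance."""
    p = _joint_probabilities(table)
    size = len(p)
    correct = sum(p[i][i] for i in range(size))
    fcst = [sum(p[i]) for i in range(size)]
    obs = [sum(p[i][j] for i in range(size)) for j in range(size)]
    expected = sum(f * o for f, o in zip(fcst, obs))
    return (correct - expected) / (1.0 - expected)


def gerrity_skill_score(table):
    """Gerrity score for ordered categories; near misses are penalized less."""
    p = _joint_probabilities(table)
    size = len(p)
    obs = [sum(p[i][j] for i in range(size)) for j in range(size)]
    a = []                       # odds ratios built from cumulative observed frequencies
    cumulative = 0.0
    for r in range(size - 1):
        cumulative += obs[r]
        a.append((1.0 - cumulative) / cumulative)
    score = 0.0
    for i in range(size):
        for j in range(size):
            lo, hi = min(i, j), max(i, j)
            weight = (sum(1.0 / x for x in a[:lo]) - (hi - lo) + sum(a[hi:])) / (size - 1)
            score += p[i][j] * weight
    return score


# Hypothetical two-category table, for illustration only
example_table = [[30, 10], [20, 40]]
example_scores = per_flag_scores(example_table, 0)
```

For a two-category table, the Gerrity score reduces to the hit rate minus the false alarm rate, which offers a quick sanity check on the weights.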

4. Results

a. Forecast bias comparisons

The research presented here had two objectives: 1) assessing the accuracy of WBGT forecasts and 2) determining if forecasts can be improved by using surface roughness to downscale wind speeds. Unless otherwise stated, forecast bias refers to the “relative bias” between forecasts and observations, with positive and negative relative bias indicating over- and underforecasting, respectively.

WBGT forecast bias compared against both observed and station WBGT varied with location, hour of day, weather conditions (e.g., temperature, humidity, wind speed), and the WBGT estimation methodology. Overall, WBGT forecasts from the NDFD were more negatively biased than those from the NBM (and had higher MAEs) (Table 3), particularly when WBGT was greater than or equal to 32.2°C (Fig. 3).

Table 3.

Observed WBGT forecast mean absolute error. S-WBGT (Z) are the forecasts using the Liljegren method, with PG stability classes (surface roughness) to downscale wind speeds. NWS WBGT refers to the WBGT forecast utilizing the NWS methods, applied to forecast data from the NBM and NDFD, respectively, in this table.

Fig. 3.

Observed WBGT forecast bias (24-h forecast): NBM and NDFD. Boxplot whiskers extend up to 1.5 times the interquartile range (e.g., top whisker is 1.5 × IQR + third quartile value). S-WBGT (Z) are the forecasts using the Liljegren method, with PG stability classes (surface roughness) to downscale wind speeds. NWS WBGT refers to the WBGT forecast utilizing the NWS methods.

Citation: Weather and Forecasting 39, 2; 10.1175/WAF-D-23-0076.1

For both observed and station WBGT, the forecast MAEs of the three forecast methods were within 0.9° and 0.4°C of one another, respectively, when WBGT was less than 31.1°C. However, at higher values of WBGT (32.2°C+), the NWS WBGT median bias and MAE (Fig. 4) increased markedly, as did the RMSE (Table 4), particularly for station WBGT. Median forecast bias for all methods relative to observed WBGT was positive (warm) when WBGT was less than or equal to 31°C. However, NWS WBGT and S-WBGT had a negative median bias when observed WBGT exceeded 31° and 32.1°C, respectively (Fig. 4). The S-WBGT Z forecast bias remained slightly positive at the highest WBGTs (Fig. 4).

Fig. 4.

WBGT relative forecast (top) bias and (bottom) MAE (24-h forecast). Boxplot whiskers extend up to 1.5 times the interquartile range (e.g., top whisker is 1.5 × IQR + third quartile value). S-WBGT (Z) are the forecasts using the Liljegren method, with PG stability classes (surface roughness) to downscale wind speeds. NWS WBGT refers to the WBGT forecast utilizing the NWS methods.

Table 4.

Mean and standard deviations (SD) of the observed/station WBGT and the three forecast WBGT values (when WBGT ≥26.7°C). RMSE and variance ratio of observed/station WBGT when WBGT ≥26.7°C and ≥29.4°C. Variance ratio is forecast:observed/station. S-WBGT (Z) are the forecasts using the Liljegren method, with PG stability classes (surface roughness) to downscale wind speeds. NWS WBGT refers to the WBGT forecast utilizing the NWS methods.

There were two notable differences between station and observed WBGT forecast accuracy (Fig. 4). First, the relative bias and MAE for both S-WBGT methods were lower when station WBGT was less than 31.1°C, and S-WBGT Z had a slightly negative bias when station WBGT was equal to or greater than 32.2°C (−0.1°C). Second, NWS WBGT bias was more negative when station WBGT was greater than or equal to 31.1°C compared to observed WBGT, and the median bias for both observed and station WBGT was negative (too cold) across all WBGTs, with median biases of −0.4° and −1.6°C when station WBGT was 26.7°–29.4°C and 32.2°C or above, respectively (Fig. 4).

b. Variations in bias

Assessing the forecast relative bias spatially (average daily maximum station WBGT) demonstrates the high variability across the region and between forecast methods (Fig. 5). WBGT forecast bias for a broader spatial domain is displayed in Fig. B1 in appendix B.

A majority of stations displayed a negative bias (from −0.8° to −0.1°C) for S-WBGT, with the most negative bias occurring at three stations located along the Appalachian Mountains (Fig. 5). However, S-WBGT Z bias was greater than S-WBGT (0°–1°C), and positive at all but ten stations, five of which were along the Appalachian Mountains (Fig. 5). The differences in bias between the two S-WBGT methods ranged from 0.2° to 1.0°C (WBGT > 28.9°C), with the largest bias difference occurring at stations with lower roughness (detailed below). Last, NWS WBGT forecast bias was notably more negative, with the coolest bias along the Appalachian Mountains, southern and western piedmont of North Carolina, and in coastal areas of North Carolina (Fig. 5).

Fig. 5.

Station WBGT daily maximum bias (24-h lead time): Regional variations across the Carolinas. S-WBGT (Z) are the forecasts using the Liljegren method, with PG stability classes (surface roughness) to downscale wind speeds. NWS WBGT refers to the WBGT forecast utilizing the NWS methods.

Forecast bias varied across the daylight hours for observed and station WBGT. When observed WBGT was equal to or greater than 29.5°C, both S-WBGT forecasts were positive at all hours (except for 1100 and 1800 LT), and the NWS WBGT was negatively biased throughout the day, except for 1600 LT (Fig. 6). Compared to 29.5°C, all biases were larger for every hour of day when WBGT was greater than 32.2°C (not pictured). Like observed WBGT, station WBGT forecast bias followed a diurnal pattern, with the biases for all methods becoming more positive (warm) around and immediately after solar noon (Fig. 6). The highest median bias for S-WBGT and S-WBGT Z occurred at 1300 and 1400 LT. Similar to observed WBGT, early afternoon was when the NWS WBGT median bias was closest to zero, but the bias remained negative (cool) throughout all hours of the day (Fig. 6).

Fig. 6.

WBGT forecast bias (24-h lead time) by hour of day (NBM) (WBGT ≥ 29.5°C). Boxplot whiskers extend up to 1.5 times the interquartile range (e.g., top whisker is 1.5 × IQR + third quartile value). S-WBGT (Z) are the forecasts using the Liljegren method, with PG stability classes (surface roughness) to downscale wind speeds. NWS WBGT refers to the WBGT forecast utilizing the NWS methods.

Furthermore, S-WBGT forecast bias was markedly greater when observed WBGT was less than 31.1°C, particularly when dewpoint temperatures were high, wind speed was relatively high and variable (wide range of values) (0.6–2.2 m s−1), and solar radiation was both low and variable (Fig. 7). When observed WBGT was greater than or equal to 30°C, there was less variation in forecast bias across the different strata. Similar patterns were seen when stratifying station WBGT forecast bias (not pictured), but with less variation in bias magnitudes.

Fig. 7.

S-WBGT bias (24-h lead time) stratified by meteorological variables (observed WBGT). Boxplot whiskers extend up to 1.5 times the interquartile range (e.g., top whisker is 1.5 × IQR + third quartile value). S-WBGT (Z) are the forecasts using the Liljegren method, with PG stability classes (surface roughness) to downscale wind speeds.

c. Forecast WBGT flag accuracy

The forecast WBGT flag accuracy results (corresponding to the flag thresholds discussed above) detailed here refer to the NBM 24-h forecast. Confusion matrices for all results are provided in Tables C1 and C2. The most notable difference between methods when assessing percentage of correct forecasts for each flag was with observed WBGT (Table 5). Both S-WBGT methods had the highest percent correct for yellow flag. For black flag observed, S-WBGT Z had a roughly 10% lower percent correct than the S-WBGT and NWS WBGT (Tables 5 and 6). The bias scores were similar and indicated underforecasting (bias score < 1) for green and yellow flags (observed) for both S-WBGT methods (Table 5). For red and black observed flags, both S-WBGT methods overforecast (bias score > 1) and the S-WBGT Z bias notably increased from red (1.07) to black flag (2.11) (Table 5). Station WBGT flag accuracy was similar; however, S-WBGT Z bias was 1.69 versus 1.07 for red flag (4) and the NWS WBGT hit rate for black flag dropped from 42% to 18% (Table 6).

Table 5.

Forecast WBGT flag accuracy compared to observed WBGT (NBM 24-h forecast). S-WBGT (Z) are the forecasts using the Liljegren method, with PG stability classes (surface roughness) to downscale wind speeds. NWS WBGT is the WBGT forecast utilizing the NWS methods.

Table 6.

Forecast WBGT flag accuracy compared to station WBGT (NBM 24-h forecast). S-WBGT (Z) are the forecasts using the Liljegren method, with PG stability classes (surface roughness) to downscale wind speeds. NWS WBGT is the WBGT forecast utilizing the NWS methods.

For observed and station WBGT flags, the false alarm ratio, which is the total number of false alarms for a given flag level divided by the total number of forecasts for that flag level (Jolliffe and Stephenson 2012), was highest for red flags by similar magnitudes across all forecast methods (Tables 5 and 6). The hit rate was notably higher for the S-WBGT methods relative to the NWS WBGT for 1) observed black flag (Table 5) and 2) red and black flags for station WBGT (Table 6). In both cases, the S-WBGT Z had the highest hit rates for black flag (94% and 75% for observed and station black flags, respectively).

Last, there were notable differences in bias between the 24- and 48-h NBM forecast for black flag, with a lower bias for all methods for the 48-h forecast (both observed and station WBGT) (Table 7). However, the hit rate was higher for the 24-h forecast, 92% versus 84% for S-WBGT, 94% versus 77% for S-WBGT Z, and 42% versus 8% for NWS WBGT (Table 7).

Table 7.

WBGT black flag accuracy (NBM 24- and 48-h forecast). S-WBGT (Z) are the forecasts using the Liljegren method, with PG stability classes (surface roughness) to downscale wind speeds. NWS WBGT is the WBGT forecast utilizing the U.S. NWS methods.

In addition to assessing the forecast accuracy for each flag level, two metrics were chosen to summarize the accuracy of each forecast method overall. For observed WBGT, the Gerrity skill score (GSS) and Heidke skill score (HSS) were relatively similar across methods (Table 8), with the S-WBGT having the highest scores, followed by S-WBGT Z and the NWS WBGT. At increasing forecast lead times, the GSS for S-WBGT Z remained relatively consistent (Table 9). In contrast to observed WBGT, at increasing forecast lead times, the station WBGT GSS for both S-WBGT methods decreased, with a GSS of 0.31 for S-WBGT, 0.41 for S-WBGT Z, and 0.18 for NWS WBGT for a 72-h forecast (Table 9).

Table 8.

Verification scores for WBGT flag forecast by method (NBM 24-h forecast). S-WBGT (Z) are the forecasts using the Liljegren method, with PG stability classes (surface roughness) to downscale wind speeds. NWS WBGT is the WBGT forecast utilizing the NWS methods.

Table 9.

Gerrity skill scores for NBM WBGT flag forecasts. S-WBGT (Z) are the forecasts using the Liljegren method, with PG stability classes (surface roughness) to downscale wind speeds. NWS WBGT is the WBGT forecast utilizing the NWS methods. Scores for forecast lead times of 24, 48, and 72 h.

d. Surface roughness

In addition to assessing the accuracy of WBGT forecasts, this research aimed to improve the S-WBGT forecast by incorporating surface roughness into the downscaling of wind speeds from 10 to 2 m. Using surface roughness to downscale wind speeds had a distinguishable impact on the forecast and resulting bias. Overall, using surface roughness in downscaling resulted in slower 2-m wind speeds compared to the winds downscaled using the SRDT method and Pasquill–Gifford stability classes (used in S-WBGT). Given the high sensitivity of WBGT to wind speed (e.g., the rapid increase in WBGT under low wind speeds), the improved (minimized) station WBGT bias for S-WBGT Z when WBGT was equal to or greater than 32.2°C can be directly related to an improved 2-m wind speed forecast with surface roughness being used (Fig. 4). Furthermore, the S-WBGT Z forecast bias showed smaller increases in positive bias at and immediately after solar noon compared to the S-WBGT and NWS WBGT (Fig. 6).
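To illustrate why a larger roughness length yields slower 2-m winds, the sketch below applies the neutral logarithmic wind profile. This is an assumption made for illustration only and is not necessarily the downscaling scheme used for S-WBGT Z; the function name and the sample roughness lengths are hypothetical.

```python
import math


def downscale_wind_log_profile(u_ref, z0, z_ref=10.0, z_target=2.0):
    """Scale wind speed from z_ref to z_target with a neutral log profile.

    u_ref : wind speed at the reference height (m s-1)
    z0    : surface roughness length (m); must be below z_target

    Assumption: neutral stability and no displacement height.
    """
    if not 0.0 < z0 < z_target:
        raise ValueError("roughness length must be positive and below the target height")
    return u_ref * math.log(z_target / z0) / math.log(z_ref / z0)


# Same 10-m wind over a smooth surface and a rougher surface
u2_smooth = downscale_wind_log_profile(3.0, 0.03)
u2_rough = downscale_wind_log_profile(3.0, 0.5)
```

Under this profile, the rougher surface returns the slower 2-m wind for the same 10-m wind, which is the direction of the effect described above.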

The correlation between surface roughness length and differences in the station WBGT between the S-WBGT and S-WBGT Z was statistically significant (p < 0.01), with a Pearson correlation coefficient of 0.63 when WBGT was greater than or equal to 31.1°C. As surface roughness increased, differences between the S-WBGT Z bias and the bias of the other methods also increased (Fig. 8). Thus, S-WBGT Z was a notably better forecast for sites with rougher surfaces.

Fig. 8.

S-WBGT vs S-WBGT Z average station WBGT forecast bias relative to surface roughness. S-WBGT (Z) are the forecasts using the Liljegren method, with PG stability classes (surface roughness) to downscale wind speeds.

Last, weighted average surface roughness with different weightings for the roughness at varying spatial scales (30, 100, 250, and 500 m) (Table A1) was not sensitive to the different configurations of weights for calculating this average (Fig. A1). Variations in the station WBGT calculated using these different surface roughness values and corresponding wind speeds ranged from −0.1° to 0.2°C, but the majority of differences (between the 25th and 75th percentiles) ranged from −0.05° to +0.05°C (Fig. A1). The analysis here thus utilized the surface roughness weighting schema of 30 m (10%), 100 m (25%), 250 m (50%), and 500 m (15%).
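The weighting schema adopted here can be sketched as a simple weighted average across the four spatial scales. The weights are those stated above; the function name and the sample roughness values are hypothetical.

```python
# Weights by spatial scale (m), as adopted in this analysis
ROUGHNESS_WEIGHTS = {30: 0.10, 100: 0.25, 250: 0.50, 500: 0.15}


def weighted_roughness(z0_by_scale, weights=ROUGHNESS_WEIGHTS):
    """Weighted average roughness length (m) for one pixel.

    z0_by_scale maps each spatial scale (m) to the roughness length
    averaged over that scale.
    """
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[scale] * z0_by_scale[scale] for scale in weights)


# Hypothetical roughness lengths (m) at each scale, for illustration only
example_z0 = weighted_roughness({30: 0.1, 100: 0.2, 250: 0.3, 500: 0.4})
```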

The second aspect of the investigation into surface roughness sought to determine if the finer resolution of Sentinel 2A/2B satellite imagery (10 m) offered significant improvement relative to Landsat 8 ETM+ (30 m). Comparisons revealed no differences between the roughness values derived from these two sources, except for negligible differences in forested areas where there were slight differences in NDVI. Given the limited differences between the two and despite the challenges associated with higher-resolution data (e.g., storage space, data processing speed, etc.), the analysis here proceeded with using the Sentinel imagery as it was already processed.

5. Discussion and conclusions

Wet bulb globe temperature (WBGT) is a heat stress index that is increasingly utilized for safeguarding health, such as in athletics and workplaces, since it is comprehensive in accounting for environmental variables influencing heat stress. However, forecasts of WBGT have not been standard; efforts to create these forecasts in the United States have been undertaken only recently, with the NWS forecast becoming operational in 2022. Furthermore, validation of the accuracy of these forecasts has been limited spatially, with respect to the number of sites with ground truth measurements, and temporally, by the limited number of days sampled for verification. The research presented here assessed the accuracy of WBGT forecasts relative to 1) WBGT observed with an ISO-compliant WBGT meter and 2) WBGT estimated at ASOS and mesonet stations. This research also evaluated efforts to improve the SERCC WBGT forecast by more accurately estimating the wind speed at 2 m based on wind speed at 10 m above the ground.

a. Overall WBGT forecast bias

Comparisons between WBGT forecasts using the NBM and NDFD revealed some differences, particularly at higher WBGTs, with the NDFD having a more negative (cold) bias. It is hypothesized that these differences are driven primarily by differences in the forecast wind speed between the two forecast products, with the NDFD wind speeds being faster than the NBM (Fig. D1 in appendix D). The NBM is used heavily to inform the NDFD gridded product (Craven et al. 2018). Thus, it is evident that the tendency for wind speeds to be increased from the baseline wind speeds forecast in the NBM has a significant impact on the performance of the NDFD WBGT forecast, particularly when WBGT is high. However, the current version of the NBM incorporated adjustments to the wind speed forecast, so it may be that this adjustment results in faster winds, and more negatively (cool) biased WBGT relative to what is seen in this study. Additionally, it is important to reiterate that the method used by the NWS for estimating the natural wet bulb component has changed from what was used here (Boyer 2022). However, this study utilized the modified Hunter and Minyard (1999) method since it was found to be more accurate.

The S-WBGT [which used the Liljegren et al. (2008) methodology for calculating WBGT] was found to be more accurate than the method used by the NWS in most instances, particularly when conditions were dangerous (WBGT > 31.1°C). When WBGT was relatively cooler (i.e., less than 29.4°C), the S-WBGT was more positively (warm) biased than the NWS WBGT. This warm bias could lead to premature or unnecessary cancellations of outdoor activities such as sports practices, which might pose logistical and economic challenges. However, at higher levels of heat stress and WBGTs, the NWS method underestimated the WBGT while the S-WBGT provided more accurate results, with a lower relative bias, lower MAE, and higher hit rate. A cold bias at extreme WBGTs (>31.1°C) could result in insufficient planning and resource allocation for hazardous periods, which is critical since it is at these WBGTs that health is more likely to be impacted by heat exposure.

These findings of higher accuracy with the S-WBGT further support existing research regarding the accuracy of the Liljegren et al. (2008) methodology (Lemke and Kjellstrom 2012; Patel et al. 2013). For red and black flags, S-WBGT was within 0.8°C of the observed WBGT. Accuracy at such high values is most critical given the use of WBGT in making decisions about the safety of outdoor activity. While this error range (0.8°C) could still lead to WBGT flag misclassification in some instances, it is sufficiently accurate when compared to the larger error ranges of common WBGT meters and established standards, e.g., from the Japanese National Institute of Occupational Safety and Health (Racinais et al. 2022). Furthermore, in this case, the S-WBGT bias erred on the side of caution (being too warm) under these most thermally stressful conditions, which is preferred here in relation to WBGT being used to safeguard health.

b. Variations in forecast bias

WBGT forecast bias varied across space, weather conditions, and hour of the day. Finding that the coldest biases for all methods were located in the Appalachian Mountains was unsurprising due to the intricate influences of terrain in that area, which are not accounted for in this work. Additionally, weather forecasts for this region are more complex during the summer with respect to cloud cover (and thus solar radiation), as terrain-induced, daytime thunderstorms are challenging to predict. Consequently, the cool outflows and debris cloud fields from these storms can potentially cause large forecast errors. Additional downstream implications of this geography arise from adiabatic warming driven by westerly winds in the lee of the mountains.

For the NWS WBGT, the central North Carolina region had the most accurate forecasts on average. It is hypothesized that this is related to the effect of wind speed on the differences in the estimated natural wet bulb temperature and black globe temperature from the different methodologies, with there being less difference at higher wind speeds. While central North Carolina is not a region with low surface roughness overall, the stations where the NWS WBGT performs better are stations where surface roughness is low, relative to neighboring stations. This is hypothesized to be the reason behind this pattern for the sparsely forested Midwest and Southern Plains, which have climatologically faster wind speeds.

Distinct variations in forecast bias were seen when stratifying by dewpoint temperature, wind speed, and solar radiation. Rapid variability in these variables (together and separately) leads to high variability in WBGT, thus increasing the likelihood of errors for the one top-of-the-hour forecast value. The sensitivity of WBGT to wind speed was revealed by the overall range of S-WBGT bias being greater when wind speeds were low (<0.6–1.0 m s−1). The high variability in the S-WBGT forecast bias under low and variable solar radiation arises partly from the difficulty in forecasting cloud cover. This cloud cover, in turn, impacts the estimated solar radiation. This is also evident in the diurnal curve of forecast bias, with the bias of all forecast methods becoming increasingly positive toward midday (e.g., the period of highest solar radiation) and then decreasing through the afternoon.

c. Forecasting WBGT flags

As was the case when assessing WBGT forecast bias at each temperature value, there were noteworthy differences in accuracy between the methods for forecasting WBGT flag levels. The analysis revealed that the S-WBGT and S-WBGT Z were superior in forecasting black flag, with higher hit rates. However, this paralleled a tendency to overforecast red and, particularly, black flag, with bias scores greater than one and higher false alarm ratios for black flag than the NWS WBGT: 0.45 (S-WBGT) and 0.55 (S-WBGT Z) compared to 0.37 for the NWS WBGT (Table 5). This pattern held for both observed and station WBGT forecast flags. However, with station WBGT, the magnitudes of the NWS WBGT underforecasting red and black flags and of the S-WBGT and S-WBGT Z overforecasting these flags were slightly higher.

Ultimately this reveals a delicate balance that results in a decision between 1) being more certain that when a black flag is forecast, a black flag will be observed; but if a black flag is not forecast, it still very well could be observed (NWS WBGT) and 2) if a black flag is forecast, it may occur and, if it is not forecast, one has higher confidence in that being true (S-WBGT methods). This latter option is particularly true for locations more sheltered from wind, for which the S-WBGT Z produces a more accurate forecast. Given the use of WBGT in protecting health, erring on the side of overforecasting black flag conditions might be preferable to underforecasting. Even if these conditions do not materialize, activities can be adjusted. However, both under- and overforecasting have logistical implications: for instance, a sudden cancellation of high school practices due to forecast changes can disrupt the work schedules of parents, who then have to pick up their children. Furthermore, the HSS and GSS support the use of the S-WBGT methods, since both methods have higher scores and higher percent correct values relative to the NWS WBGT (Table 8). While the GSS values for S-WBGT Z and NWS WBGT are very close (0.34 versus 0.32, respectively), it is important to note that the GSS accounts for the “closeness” of the categorical forecast misses. However, even though the NWS methods resulted in “close” misses when a black flag was observed, the differences in activity modifications and health implications between a red and black flag are important to consider.

d. Surface roughness

Last, given the paramount influence of wind speed on WBGT, this research addressed how land cover and the associated surface roughness affect current efforts to forecast wind speed and WBGT. Since wind speeds are measured and forecast for 10 m above the ground, they must be downscaled to 2 m to assess human heat stress. This study compared two methods of downscaling (translating) winds: 1) Pasquill–Gifford stability classes and 2) surface roughness values (derived from high-resolution satellite imagery). Incorporating more granular land cover information and surface roughness improved the WBGT forecasts. Importantly, however, this improvement was not uniform. The use of surface roughness for downscaling wind speeds results in more positively (warm) biased WBGT at sites with low surface roughness relative to sites with higher roughness. This implies that the S-WBGT Z is more reliable in complex terrain (e.g., increased tree cover or urban environments with large structures). Future work could ascertain the roughness level below which the use of surface roughness does not improve the WBGT forecast. Exploring other methods for estimating surface roughness at a reasonable scale may also prove beneficial, including incorporating the influence of differences in surface roughness from different directions (e.g., for a given point, wind from the north travels over dense forest and thus is slowed more by roughness than winds from the south that travel over open fields).

Additionally, there were negligible differences between the surface roughness values derived from the Sentinel and Landsat imagery. It is hypothesized that this is largely due to the use of the NLCD 2019 as the land cover data with which the vegetation indices from the images were paired. If land cover were classified directly from the Sentinel and Landsat imagery themselves, there might have been more of a difference. However, given the challenge of accurately classifying land cover over broad spatial areas and the need to consider surface roughness values at varying scales (e.g., 30, 100 m, etc.), substantial differences are unlikely, and any benefits of using the higher-resolution imagery would be significantly less feasible to operationalize.

This research continues to emphasize the importance of selecting the best methodologies for estimating WBGT and demonstrates the ability to forecast WBGT accurately, particularly when the conditions are dangerous. Further research should be conducted to confirm the accuracy of WBGT forecasts in comparison to in situ WBGT across the broader CONUS. As more organizations and entities begin using WBGT, accurate WBGT forecasts will continue to increase in value as they enable robust planning for outdoor activity and early warning of particularly hazardous periods.

Last, it is important to acknowledge that WBGT estimation involves multiple input variables, each with its own uncertainty. These uncertainties, such as those in downscaling wind speed and estimating solar radiation, can compound and affect the final WBGT values. While our methodology aims to minimize the effects of these uncertainties on the resulting estimations of the natural wet bulb temperature and black globe temperature, they should be considered when interpreting the results.

Overall, WBGT forecasts are more challenging for areas with complex microclimates, particularly microclimates with higher surface roughness (i.e., areas with many trees or structures). The forecasts should be used with caution in such areas. Additional steps to complement these forecasts include 1) developing a general understanding of how your specific microclimate influences WBGT and 2) comparing WBGT readings with the forecast, from which user-based bias corrections could be estimated. WBGT forecasts enable robust planning of outdoor activity; however, measurements of WBGT onsite at the time of activity remain critical.

Acknowledgments.

The authors wish to acknowledge the staff at Horace Williams Airport in Chapel Hill, North Carolina, who allowed for WBGT data collection and the North Carolina State Climate Office for access to the weather station data located there.

Data availability statement.

The methods for estimating WBGT based on standard meteorological variables are provided in Liljegren et al. (2008) and in the R package “wbgt”. The methods for creating superpixels from collecting images of cloud cover are detailed in the documentation for the R package “SuperpixelImageSegmentation”. Archived data for the National Digital Forecast Database (NDFD) are available through National Centers for Environmental Information (NCEI).

APPENDIX A

Surface Roughness Weighted Average Schema

Table A1 displays seven configurations of calculating the weighted average surface roughness across different spatial scales for a given pixel. Figure A1 displays the difference between station WBGT calculated when using weighted average surface roughness with configuration 1 (Table A1) compared to the six other configurations.

Table A1.

Surface roughness weighted average schema.

Fig. A1.

Station WBGT sensitivity to differences in weighted averages across spatial scales.

APPENDIX B

Station WBGT Bias: Regional Variations

Bias is calculated as forecast WBGT minus station WBGT. Forecast bias is for a 24-h lead time. S-WBGT (Z) are the forecasts using the Liljegren method, with PG stability classes (surface roughness) to downscale wind speeds (see Fig. B1). NWS WBGT refers to the WBGT forecast utilizing the NWS methods.

Fig. B1.

Station WBGT bias: Regional variations.

APPENDIX C

Confusion Matrices for Observed and Station WBGT Flag Forecast Accuracy Assessment

WBGT flags are as follows: 1) green flag (26.7°–29.4°C), 2) yellow flag (29.5°–31.0°C), 3) red flag (31.1°–32.1°C), and 4) black flag (32.2°C+). The assessment is for a forecast lead time of 24 h. S-WBGT (Z) are the forecasts using the Liljegren method, with PG stability classes (surface roughness) to downscale wind speeds (see Tables C1 and C2). NWS WBGT refers to the WBGT forecast utilizing the NWS methods.
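The flag categories listed here, and the tallying behind a confusion matrix, could be sketched as follows. The thresholds come from the ranges above; the "none" label for values below the green range, the function names, and the sample values are hypothetical.

```python
from collections import Counter

# Lower bound (degC) of each flag, checked from most to least severe.
FLAG_LOWER_BOUNDS = [
    (32.2, "black"),
    (31.1, "red"),
    (29.5, "yellow"),
    (26.7, "green"),
]


def wbgt_flag(wbgt_c):
    """Return the flag category for a WBGT value in degC."""
    for lower, name in FLAG_LOWER_BOUNDS:
        if wbgt_c >= lower:
            return name
    return "none"  # assumption: label for values below the green range


def confusion_counts(observed_c, forecast_c):
    """Count (observed flag, forecast flag) pairs for paired WBGT values."""
    return Counter(
        (wbgt_flag(o), wbgt_flag(f)) for o, f in zip(observed_c, forecast_c)
    )


# Hypothetical paired values (degC), not data from this study.
observed = [27.0, 30.0, 31.5, 32.5, 29.0]
forecast = [27.5, 31.2, 31.4, 32.0, 29.6]

counts = confusion_counts(observed, forecast)
for (obs_flag, fcst_flag), n in sorted(counts.items()):
    print(f"observed={obs_flag:6s} forecast={fcst_flag:6s} n={n}")
```

Pairs on the diagonal (same observed and forecast flag) are correct forecasts. A forecast flag more severe than the observed one is a false alarm for the higher category, and a less severe one is a miss.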

Table C1.

Confusion matrices for observed WBGT flag forecast accuracy assessment.

Table C2.

Confusion matrices for station WBGT flag forecast accuracy assessment.

APPENDIX D

NBM and NDFD 2-m Wind Comparison

Figure D1 displays a comparison of forecast wind speeds (24-h lead time) between the National Blend of Models (NBM) and the National Digital Forecast Database (NDFD). Boxplot whiskers extend up to 1.5 times the interquartile range (IQR) beyond the quartiles (e.g., the top whisker extends at most to the third quartile value plus 1.5 × IQR). S-WBGT Z is the SERCC/CISA forecast with wind speed downscaled to 2 m using PG stability classes (surface roughness).
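The whisker rule can be made concrete with a small sketch. It assumes the common convention in which each whisker ends at the most extreme data point inside the 1.5 × IQR fence; the quartile method and the sample wind speeds are assumptions for illustration, and plotting tools differ in how they compute quartiles.

```python
from statistics import quantiles


def whisker_summary(values):
    """Quartiles, 1.5 x IQR fences, whisker ends, and outliers for a sample."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    return {
        "q1": q1,
        "q3": q3,
        "upper_fence": upper_fence,
        "lower_fence": lower_fence,
        "whisker_low": min(inside),   # most extreme point within the lower fence
        "whisker_high": max(inside),  # most extreme point within the upper fence
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


# Hypothetical wind speeds (m/s) with one unusually high value.
winds = [1, 2, 3, 4, 5, 6, 7, 8, 20]
summary = whisker_summary(winds)
print(summary)
```

The fence is only an upper limit on whisker length: in this sample the top whisker stops at the largest value inside the fence, and the remaining value is drawn as an individual outlier.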

Fig. D1.

NBM and NDFD 2-m wind comparison.
