## Abstract

A low-level turbulence (LLT) forecasting algorithm is proposed and implemented within the Graphical Turbulence Guidance (GTG) turbulence forecasting system. The LLT algorithm provides predictions of energy dissipation rate (EDR; turbulence dissipation to the one-third power), which is the standard turbulence metric used by the aviation community. The algorithm is based upon the use of distinct log-Weibull and lognormal probability distributions in a statistical remapping technique to represent accurately the behavior of turbulence in the atmospheric boundary layer for daytime and nighttime conditions, respectively, thus accounting for atmospheric stability. A 1-yr-long GTG LLT calibration was performed using the High-Resolution Rapid Refresh operational model, and optimum GTG ensembles of turbulence indices for clear-air and mountain-wave turbulence that minimize the mean absolute percentage error (MAPE) were determined. Evaluation of the proposed algorithm with in situ EDR data from the Boulder Atmospheric Observatory tower covering a range of altitudes up to 300 m above the surface demonstrates a reduction in the error by a factor of approximately 2.0 (MAPE = 55%) relative to the current operational GTG system (version 3). In addition, the probability of detection of typical small and large EDR values at low levels is increased by approximately 15%–20%. The improved LLT algorithm is expected to benefit several nonconventional turbulence-prediction sectors such as unmanned aerial systems and wind energy.

## 1. Introduction

Turbulence forecasts are used by pilots, dispatchers, and air traffic controllers to strategically avoid regions of strong turbulence that can lead to injuries, aircraft fatigue, and damage (Sharman and Lane 2016). Current turbulence forecast methods use numerical weather prediction (NWP) model output to diagnose different sources of turbulence, including clear-air turbulence (CAT), mountain-wave turbulence (MWT), and convectively induced turbulence at mid- and upper levels, and low-level turbulence (LLT) owing to turbulence events within the atmospheric boundary layer. These turbulence forecasting techniques typically relate the large-scale atmospheric turbulence production mechanisms that are explicitly resolved with horizontal grid spacings of NWP models (~10 km) to aircraft-scale turbulence (~10–100 m), through the assumption of a downscale energy-cascade process (e.g., Lindborg 1999). An example of turbulence forecasting algorithm is the Graphical Turbulence Guidance (GTG) system, originally devised by Sharman et al. (2006), that is run operationally for public use by NOAA’s National Weather Service (https://www.aviationweather.gov/turbulence/gtg). GTG uses an ensemble average that is based on an optimum combination of turbulence diagnostics that has been recently extended by (Sharman and Pearson 2017) to 1) provide forecasts at all flight altitudes from surface to flight level (FL) 500 (50 000 ft; 1 ft = 30.48 cm), 2) explicitly provide forecasts of MWT, and 3) output energy dissipation rate (EDR; =*ε*^{1/3}, where *ε* is the turbulence dissipation rate) instead of a likelihood metric (i.e., categorical levels such as light, moderate, or severe turbulence). In addition, GTG has been extended to perform EDR nowcasts using a short-term forecast nudged to observations (Pearson and Sharman 2017). This approach has shown to be particularly useful in identifying areas of turbulence associated with convection.

The original motivation for the development of GTG was to create an operational system that could be used as a guide for making turbulence-avoidance decisions for the aviation community. Therefore, emphasis was placed on forecasting turbulence primarily at upper levels (20 000–45 000 ft: FL 200–450) and midlevels (10 000–20 000 ft), for turbulence levels relevant to commercial aircraft. Within these altitude ranges, climatological distributions derived from in situ aircraft and pilot reports (PIREPs) support the notion that EDR follows a lognormal distribution (e.g., Nastrom and Gage 1985; Cho et al. 2003; Frehlich and Sharman 2004; Sharman et al. 2014). In light of these findings, Sharman and Pearson (2017) proposed a remapping method that is based on the lognormal nature of upper-tropospheric and lower-stratospheric (UTLS) EDR (the technique is outlined in section 2). Turbulence generation mechanisms within the atmospheric boundary layer (ABL) differ from those at the UTLS region, however, mainly because of the proximity to Earth’s surface and the role of the surface energy budget. A recent study by Muñoz-Esparza et al. (2018) has shown that, in the first few hundred meters above the surface, EDR is strongly driven by the diurnal evolution of the ABL. During convective daytime ABL conditions, EDR bears a log-Weibull distribution, whereas nighttime stably stratified ABLs follow a lognormal distribution. The similarity between LLT during stable conditions and UTLS turbulence in terms of statistical behavior is attributed to upper levels being typically characterized by quasi two-dimensional stratified turbulence (e.g., Lindborg 1999). Therefore, LLT predictions may necessitate a special treatment that differs from the mid- and upper-level turbulence forecast methods.

GTG has focused up to now on mid- and upper-level turbulence aviation forecasts, but there are other existing and emerging areas that can specifically benefit from improved strategic LLT forecasts. From an aviation standpoint, the use of small unmanned aerial systems (UAS), or drones, is growing at a fast pace. A clear example of the proliferation of such technology is the initiative of NASA to develop an air traffic management system for UASs (https://utm.arc.nasa.gov/; Kopardekar et al. 2016), with a main focus of enabling safe yet efficient low-altitude operations for UASs [in the lowest 200–500 ft ( ~60–150 m)]. In particular, such systems are more than an order of magnitude smaller in size and weight than typical commercial aircraft and therefore are expected to be sensitive to smaller-magnitude turbulence encounters that do not affect commercial aircraft operations at upper levels. Another aviation application of relevance is wake vortices (e.g., Holzäpfel et al. 2016). Vortices that form in the wake of an aircraft affect the latency of takeoff/landing maneuvers, as well as its design and execution. Interaction with downstream aircraft can result in wake-vortex encounters that pose difficulties in aircraft maneuvers and can even lead to aircraft accidents. The extent of these wake vortices depends on the background atmospheric LLT, with smaller ABL EDRs resulting in longer-lived wake vortices. Also, atmospheric LLT has significant impacts on wind turbine wake characteristics and thus affects power production and wind-plant control. In particular, recent field studies have found that the nature of the turbine wake strongly depends on background EDR of the incoming ABL (Smalikho et al. 2013; Lundquist and Bariteau 2015).

All these examples evidence the relevance and potential benefits of operational LLT forecasting algorithms. Herein, we propose a new method that extends the capabilities of GTG to improve the prediction of EDR in the ABL. The remainder of the paper is structured as follows. The new algorithm to predict low-level EDR that accounts for turbulence variability throughout the diurnal cycle of the ABL is introduced in section 2. The particular turbulence diagnostics utilized, as well as the calibration of the algorithm and determination of the optimum ensemble forecast, are discussed in section 3. Validation of the proposed LLT forecast method and its comparison with the predecessor GTG are presented in section 4. Section 5 is devoted to the conclusions.

## 2. A new approach to account for diurnal variability of turbulence in the ABL

The current GTG system uses a statistical remapping technique (Sharman and Pearson 2017). The method transforms any turbulence index or diagnostic of arbitrary units *D*_{i} derived from an NWP model into an EDR value (m^{2/3} s^{−1}), applying a linear remapping approach in logarithmic space:

The coefficients *a* and *b* in Eq. (1) are calculated on the basis of the expectation operator 〈〉 and the standard deviation operator SD() of the probability distribution function (PDF) of the observations and the quantity to be remapped, using the following expressions:

where *ε*^{1/3} represents the observational “climatology” of reference.

Muñoz-Esparza et al. (2018) have recently found using data from the Experimental Planetary Boundary Layer Instrumentation Assessment (XPIA) field campaign (Lundquist et al. 2017) that probability distributions in the ABL exhibit distinct patterns during daytime and nighttime conditions. During daytime convective ABLs (i.e., CBLs), the probability distribution of EDR follows a log-Weibull distribution, characterized by

where *k* and *λ* are the shape and scale parameters of the distribution, respectively. For a log-Weibull distribution, the mean and standard deviation are given by 〈ln*x*〉 = *λ*Γ(1 + *k*^{−1}) and SD(ln*x*) = *λ*{Γ(1 + 2*k*^{−1}) − [Γ(1 + *k*^{−1})]^{2}}^{1/2}, respectively, where Γ is the gamma function. In contrast, nighttime stable ABL (i.e., SBL) conditions bear a lognormal distribution as at upper levels, defined as

with the expectation and standard deviation of the distribution given by 〈ln*x*〉 = exp(*μ* + 0.5*σ*^{2}) and SD(ln*x*) = exp(*μ* + 0.5*σ*^{2})[exp(*σ*^{2}) − 1]^{1/2}, respectively. The EDR database generated by Muñoz-Esparza et al. (2018) encompasses levels from the surface to *z* = 300 m, covering 3 months (March–May 2015) of observations at the Boulder Atmospheric Observatory (BAO) tower near Boulder, Colorado (40°03′N, 105°00′W). The tower was instrumented with 20-Hz sonic anemometers at seven heights, resulting in a database (“climatology”) of ~1.5 million EDR records.

Figure 1a shows the mean and standard deviation of the probability distributions from Muñoz-Esparza et al. (2018) as a function of height above the surface and sorted by atmospheric stability. Solid lines represent the best fit to power and linear functions for the mean and standard deviation of the distributions, respectively. These were used to derive the mean parameters of the observational climatologies for the vertical grid points that are within the first 300 m above the surface in the High-Resolution Rapid Refresh (HRRR; Smith et al. 2008; http://ruc.noaa.gov/hrrr/) operational forecasting system that runs over the contiguous United States (CONUS) and is based on the Weather Research and Forecasting (WRF) Model (Skamarock and Klemp 2008). For the HRRR model, this includes five vertical model levels approximately located at *z* = 10, 37.5, 85, 165, and 280 m. Note that these are very similar to the Rapid Refresh (RAP; Benjamin et al. 2016) forecast product, with a slight increase of the vertical spacing with respect to the HRRR of about 1.3%; therefore the same parameters can be used for both NWP forecast models as input for GTG. The height-averaged values for the log-Weibull daytime distribution are *C*_{1} = −2.1980 m^{2/3} s^{−1} and *C*_{2} = 0.6564 m^{2/3} s^{−1}, whereas nighttime lognormal distribution values are *C*_{1} = −2.7017 m^{2/3} s^{−1} and *C*_{2} = 0.7641 m^{2/3} s^{−1}. The resulting PDFs for CBL and SBL conditions as based on the previous height-averaged parameters are presented in Fig. 1b. These results clearly indicate a predominance of large EDR values during daytime conditions, whereas nighttime LLT tends to be weaker. We explored the use of height-dependent coefficients for the PDFs, but only minor differences were found, likely because of other sources of error being more dominant. Therefore, height-averaged coefficients are used for simplicity.

The main changes with respect to the previous GTG algorithm for LLT are the use of two different PDFs for daytime and nighttime conditions (log-Weibull and lognormal, respectively) and the fact that the reference PDFs are derived from ABL measurements using sonic anemometers instead of in situ aircraft and PIREPs. The advantages of this new reference database are discussed in section 4. In addition, the LLT algorithm uses the total surface heat flux *H* (W m^{−2}), which is the sum of the sensible and latent heat fluxes, to discriminate between convective (*H* ≥ 0) and stable (*H* < 0) conditions and therefore uses the appropriate PDF for the statistical remapping [Eq. (1)]. Moreover, two different suites of indices are used for daytime and nighttime conditions. This approach is beneficial since turbulence mechanisms, and therefore GTG indices, depend on ABL stability, which results in an improvement of forecast skill at low levels in comparison with the state-of-the-art GTG algorithm from Sharman and Pearson (2017).

## 3. Turbulence diagnostics, calibration, and optimum ensemble forecast

As already mentioned, GTG uses operational NWP model output as a basis for computing all of the turbulence indices, typically involving calculation of spatial gradients. The current operational version of GTG utilizes the WRF-based RAP model, which has a horizontal grid spacing of ~13 km and covers the CONUS. For global applications, GTG uses NOAA’s Global Forecast System (GFS) NWP model (https://www.ncdc.noaa.gov/data-access/model-data/model-datasets/global-forcast-system-gfs) at 0.25° horizontal resolution. The WRF-based HRRR product is also available, which has a finer horizontal grid spacing of 3 km. For the next operational release of GTG we will make a transition to the HRRR model, since we expect to improve our EDR forecasts by better representing terrain effects and therefore flow features that lead to turbulence production. Therefore, for the development of the new LLT algorithm we use the HRRR model as the input operational NWP model for GTG.

The HRRR model output is normally available every hour from the current time out to an 18-h forecast horizon. In a typical case, the accuracy of the forecast progressively degrades as the forecast lead time increases. Mesoscale NWP models, however, require a spinup time to generate the energy content in the mesoscale portion of the energy spectrum that is missing in the initial conditions, mostly containing energy at the synoptic scales. In GTG, we want to have such mesoscale variability developed, which will subsequently be related to turbulence at smaller scales that are not resolved by the model. To quantify the spinup time of the HRRR product, energy spectra for the horizontal wind speed were computed:

where *k* is the wavenumber, *n* is the index in a given spatial direction, and Δ is the corresponding grid spacing. Energy spectra of horizontal wind at 2 km above the surface are presented in Fig. 2. These results are valid at the same time, progressively decreasing the initialization time and increasing the forecast time in the same time increment, and correspond to 0, 1, 2, 3, 5, 7, and 9 h forecasts. The two cases correspond to a daytime instance (2000 UTC 9 March 2015; Fig. 2a) and a nighttime instance (1000 UTC 10 March 2015; Fig. 2b). In both cases, the initial-condition state displays a considerable energy underprediction at the mesoscales, as expected, which progressively grows with forecast time. During daytime conditions, the model reaches equilibrium (i.e., mesoscale energy content becomes fully developed) in ~3 h after its initialization. For the nighttime case, the development process appears to be slightly slower, requiring ~5 h for the energy spectrum to stabilize. We hypothesize that the presence of a convective source at the mesoscales accelerates the model spinup phase, which during nighttime is essentially dominated by an energy-cascade process and thus necessitates larger times to propagate across the entire resolvable spectrum of spatial scales. We only presented two cases to illustrate the typical behavior, but consistent conclusions were found for other days and times (not shown). On the basis of these results, forecasts with 5-h lead time were used to calibrate GTG LLT indices and for the statistical analysis presented in section 4. This time is in fact very similar to the 6-h forecast used by Sharman and Pearson (2017) to generate climatological distributions of turbulence indices. Note also that the model displays an energy dependence on the wavelength of ~*k*^{−5/3} at the mesoscales, consistent with both observations and modeling studies (e.g., Lindborg 1999; Skamarock 2004; Skamarock et al. 2014).

Sharman and Pearson (2017) provide a list of all of the turbulence indices in GTG. In addition to these, two indices that are specific to ABL turbulence were incorporated. One is an EDR estimate that is derived from a turbulent kinetic energy (TKE; *q*) budget model that is based on large-eddy simulations for convective ABLs (“MSEDR”; Moeng and Sullivan 1994):

where *u*_{*} is the friction velocity, *w*_{*} is the convective velocity scale, *z*_{i} is the ABL depth, *ϕ*_{m} is the surface layer stability function, *ζ* = *z*/*L*, *L* = is the Obukhov length, *θ* is the potential temperature, *ρ* is the density, *c*_{p} is the specific heat of air at constant pressure, and *κ* is the von Kármán constant. The other additional index is from the dissipation-term parameterization in the planetary boundary layer (PBL) scheme that is based on Kolmogorov’s hypothesis (“PBLEDR”):

where *ℓ* is the turbulent master length scale and *b*_{1} is a model constant (=24.0). Muñoz-Esparza et al. (2018) have shown that WRF simulations using the same PBL scheme that the HRRR model uses—the Mellor–Yamada–Nakanishi–Niino (MYNN) scheme (Nakanishi 2001; Nakanishi and Niino 2006)—can produce estimates of EDR with a bias reduction of ~75% after applying the statistical remapping technique described in section 2. The HRRR output unfortunately only includes TKE, and therefore *ℓ* has to be calculated from the output. The turbulent length scale is computed as follows:

where *z*_{i2} = max(*z*_{i}, 300), *h*_{1} = min(*z*_{i2}, 750), and *h*_{2} = 0.5*h*_{1}. In Eq. (9), *ℓ*_{2} is the turbulent length scale from Bougeault and Lacarrere (1989) and *ℓ*_{1} corresponds to the length scale from Nakanishi (2001), which is the harmonic mean of the surface, turbulence, and buoyancy scales:

where

with *ℓ*_{∞} = 1 × 10^{8} m. The expression for *ℓ*_{1} from Eq. (10) is valid for convective conditions. Although the master length scale also has a definition for stable ABLs, we found that Eq. (8) performed poorly under such stability, likely because of the coarse vertical grid spacing in the HRRR model, and therefore it is not included here.

To derive PDFs for the 77 indices available in GTG, we generated predictions for 1 yr (May 2016–April 2017) using the HRRR model for 5-h forecasts initialized every hour (24 times per day). We intentionally used HRRR data from a different period for two reasons: 1) to calibrate GTG with the most recent HRRR data available, since the HRRR model continuously undergoes tuning and modifications, and 2) to not include the calibration data as part of the training. A 60 × 60 km^{2} area surrounding the BAO tower location was considered, including heights <3 km above the surface. With that, and taking into account the availability of archived HRRR output, a database of 45 619 200 samples was generated. Figures 3a and 3b show PDFs for two of the indices, MSEDR and “Ellrod2” (Ellrod and Knapp 1992), filtered for daytime and nighttime conditions, respectively, to illustrate the general behavior of the probability distributions. The PDFs are in good agreement with the observed log-Weibull and lognormal structure of ABL turbulence. Fitting of the PDFs to the analytical expressions is needed to determine the mean and standard deviation of the distributions. For that purpose, a least squares error minimization technique was used, considering bins with probability of occurrence larger than 0.1% (dark-blue dots). Nevertheless, this is a relatively minor correction, since the largest PDF values dominate the error minimization.

A limitation of the observational dataset used herein is the time span of the observations (March–May 2015). To investigate the likelihood of the sonic-anemometer-derived PDFs to account properly for intra-annual variability, we use the 1-yr-long GTG predictions as a proxy and filter the indices by season to analyze the variability. For daytime conditions (Fig. 3c), there is good overall agreement, with a slight shift to larger EDR values during the spring and a higher small EDR tail for the distribution during winter conditions. We attribute these to weaker convective ABLs developing during the winter, as opposed to the spring and summer months. Nighttime turbulence presents even less intra-annual variability, with slightly weaker turbulence during summertime. These relatively small differences affirm the statistical robustness of the observations used as reference for the expected EDR distributions. Also, we emphasize that springtime downslope turbulence events are prone to occur at the BAO site for the period during which observations were available. In contrast to the midlevel and upper level, the ABL is continuously turbulent, and therefore PDFs are dominated by turbulence events mainly driven by the stability of the ABL; these are well captured with a 3-month-long dataset.

The final GTG for CAT and MWT is computed as the ensemble average of the optimum indices:

In Eq. (11), *m* is the number of indices used by the specific combination; *w*_{i} is a weighting coefficient that can be adjusted dynamically or can be statically prescribed on the basis of performance for a long-term retrospective period (Sharman et al. 2006). Here, we give the same weight to all of the indices, *w*_{i} = 1.0, since the use of equal weights has been shown to provide sufficient accuracy in the latest GTG operational release. To determine the optimum suite of turbulence indices/diagnostics, different criteria can be employed. Sharman and Pearson (2017) derived the combination of GTG indices that maximizes the area under the curve (AUC) in “relative operating characteristic” curves for a threshold of EDR = 0.22 m^{2/3} s^{−1}, corresponding to the median 1-min peak EDR for moderate turbulence reports (Sharman et al. 2014). This criterion has been also found to be optimum in studies in which the main emphasis was on forecasting events above or below a given threshold (i.e., categorical classification) but was not necessarily focused on an accurate forecast along a wide range of values (e.g., Marzban 2004; Sharman et al. 2006; Gill 2014).

As mentioned in the introduction, an accurate forecast of both large and small EDR values is desirable for LLT applications. Thus, we optimized the GTG combination to minimize the mean absolute percentage error (MAPE), defined as

where *n* is the number of observation–forecast pairs available, is the remapped index into EDR space, and EDR_{obs} is the corresponding observation. Such a statistical measure is desirable since the range of EDR values spans a few orders of magnitude, and therefore larger EDR values will dominate the optimization processes when using a nonrelative error metric. Moreover, CAT and MWT combinations are determined separately for daytime and nighttime conditions. This is done because of the differences in turbulence mechanisms for CBLs and SBLs (i.e., buoyant vs shear production), which in turn results in different statistical behavior and therefore different performances depending on atmospheric stability. Following Sharman and Pearson (2017), the optimization process is performed by employing a forward selection of turbulence diagnostics starting with the one having the smallest MAPE and adding diagnostics that decrease the MAPE of the ensemble combination [Eq. (11)] until no further decrease of MAPE is achieved. A list of the subset of optimum indices used for the GTG LLT ensemble-average CAT and MWT combinations is presented in Table 1.

It is worth mentioning that, for daytime conditions, the term *b* ln*D*_{i} is replaced by *b*(ln*D*_{i} − *α*) in Eq. (1). This is imposed to ensure that ln*D*_{i} − *α* > 1, since otherwise the gamma function in the log-Weibull distribution is not always a real number. This transformation does not affect the results, and it is accounted for in the *a* and *b* coefficients presented in Table 2, for which *α* = 25 was used. Also, and although not explicitly indicated in the index definitions, the CAT indices for SBL conditions were divided by the Richardson number, as proposed in Sharman and Pearson (2017) to improve the statistical performance of the indices.

The suite of indices used in the optimum CAT and MTW combinations is presented in Table 2, which includes the individual MAPEs and the improvement of the intermediate GTG combinations as new indices are incorporated. Optimization is performed using EDR observations/forecasts at the BAO tower for *z* = 5, 50, 100, 150, and 300 m during April 2015 (*n* = 3112). Despite the difference in altitude from upper to lower levels, the stable ABL shares many commonalities with upper-level turbulence generation mechanisms. Both are stratified, the turbulence is quasi two-dimensional, and there is typically strong vertical shear. Therefore, indices that account for these mechanisms are potentially good candidates to diagnose low-level turbulence in stable conditions.

The use of an ensemble of indices improves the performance from any single individual index. For daytime conditions, PBLEDR is the best-performing CAT index, with MAPE = 40.246%, further confirming that the simplified turbulent length scale that is used works appropriately. The MWT CBL combination has ~20% larger error than that for the CAT CBL. We attribute this lower performance to the fact that mountain waves are dampened in the presence of deeper boundary layers and larger turbulence levels (e.g., Smith and Skyllingstad 2011; Sauer et al. 2016), as typically occur during daytime conditions. In contrast, for nighttime conditions CAT and MWT ensembles exhibit essentially the same performance, with a MAPE that is ~15% larger than in the case of the CAT CBL ensemble. We speculate that this is in part attributed to the coarse vertical grid resolution of the HRRR model and to difficulties of mesoscale models in stable ABLs either to resolve phenomena such as low-level jets or to parameterize subgrid-scale processes such as turbulence intermittency, gravity waves, and Kelvin–Helmholtz instabilities (e.g., Holtslag et al. 2013; Muñoz-Esparza et al. 2017).

## 4. Validation and performance assessment

In this section, we validate the new LLT algorithm implemented in GTG using part of the XPIA campaign observations. The training of the algorithm was carried out with XPIA data from April of 2015, and the rest of the campaign was left for validation and quantification of forecast skill. Because of outages in the archived HRRR database, model output was unfortunately unavailable for May of 2015, therefore reducing the data for validation to the period 9–30 March 2015 (*n* = 2462). The RAP-based GTG ensemble results from Sharman and Pearson (2017) are also included to judge the relative performance of the new LLT algorithm. Both the HRRR and the RAP forecasts are based on the WRF Model, which uses a terrain-following pressure-based vertical coordinate. Such a choice results in a vertical coordinate that slightly changes position in time; for the specific period and heights used herein that effect has standard deviations of less than 2% of the mean value (height variability is indicated at the top of each panel in the following figures). To avoid potential errors in performing interpolation to match exactly the observation heights, the closest vertical grid point available from GTG was used.

Figures 4 and 5 show time evolutions of the GTG ensemble during the evaluation period for the CAT and MWT indices, respectively. Visual inspection reveals several aspects related to the improved performance of the new LLT algorithm (GTG LLT HRRR, also referred to herein as GTG LLT). At all heights, GTG LLT exhibits a better diurnal evolution pattern, capturing the maximum EDR peaks occurring in the early evening and the nighttime minima. The use of different PDFs for daytime and nighttime conditions that are also distinct in structure, together with two different sets of indices depending upon ABL stability, results in a realistic capturing of the diurnal evolution of EDR in the ABL. In contrast, with the original GTG algorithm in which only a lognormal distribution is used, the diurnal pattern is not properly reproduced. The original GTG ensemble often tends to forecast the growth of EDR after sunrise with a time delay. We attribute this effect to the indices used, which are calibrated employing a lognormal distribution with *C*_{1} = −2.60 m^{2/3} s^{−1}, more representative of stable ABLs and therefore not resulting in increased EDR forecasts with the onset of daytime convection. Also, the original GTG tends to overpredict forecast EDR values during nighttime conditions, even though the reference lognormal distribution is similar to the one used by the new LLT algorithm (*C*_{1} = −2.7017 m^{2/3} s^{−1}). This behavior may be attributed to the optimization criteria used for the original GTG, which are based on minimizing the AUC for EDR = 0.22 m^{2/3} s^{−1} and do not penalize overpredictions above the specified threshold.

A suite of statistical scores is presented in Table 3 to quantify the performance of the LLT algorithms. These include MAPE for the entire evaluation period as well as the daytime and nighttime relative values, and the probability of detection (POD) for intervals of ±20% and ±50% around the target value, labeled as POD_{0.2} and POD_{0.5}, respectively. For illustration purposes, we selected the median of the daytime and nighttime EDR observations, 0.14 m^{2/3} s^{−1} and 0.04 m^{2/3} s^{−1}, respectively, and considered observations within a ±15% of such median EDR. The ratio of the two selected typical values is 3.5, providing a good assessment of the performance in forecasting EDR values over a wide range. The MAPE for CAT is improved by almost a factor of 2 by the new LLT algorithm, with a reduction from 101% in the current operational GTG down to 55%. Such improvement is larger during nighttime conditions, but MAPE is still reduced by a factor of 0.5 during daytime conditions. For MWT the error is reduced by ~20%. The better performance in CBL conditions also reflects on the probability of detection, with increases in both POD_{0.2} and POD_{0.5} of ~1.5–2.0, with respect to nighttime conditions for CAT and MWT combinations in both GTG algorithms. In all cases, the new GTG LLT exhibits comparable performance for CAT and MWT combinations. The original GTG CAT ensemble presents the largest errors, however, likely originating from the lower *C*_{2} and calibration of the indices using altitudes above the ABL.

To gain further insight into the accuracy of the GTG algorithms, the absolute percentage error (APE) distributions from which MAPE values were derived are presented in Fig. 6 for the CAT and MWT combinations. For the GTG CAT ensemble (Fig. 6a), the new LLT algorithm presents the largest probabilities for small APE values, with an exponential-like behavior. This type of distribution is desirable, in which the largest number of instances are associated with the smallest errors. In contrast, the current operational GTG has a flat distribution for APE < 75%, followed by a substantial decrease for larger APEs. The cumulative APE curves are also included, to compare the performance of both GTG algorithms. A perfect algorithm will have a cumulative APE curve that is always equal to 1. The new GTG LLT forecasts (red line) are characterized by a faster growth to 1 than the current operational GTG ones (blue line), with a clear gain in performance revealed by the larger cumulative APE values all over the range of APE. For the GTG MWT ensembles presented in Fig. 6b, similar features are found, with similar cumulative APE curves for APE > 100%. In particular, the original GTG MWT ensemble has an APE distribution peaking at ~75%, whereas the new LLT algorithm forecasts result in an APE distribution that decreases the probability as APE increases in an exponential manner.

The EDR estimates from the sonic anemometers were filtered using a 10-min moving average to have a fair comparison with the turbulence forecasts from GTG. This is due to the nature of the algorithm, which assumes that energy is transferred from larger scales resolved by the underlying NWP model and cannot reproduce turbulent events that are originated at scales not explicitly resolved by the model. To illustrate better the limitations of these LLT algorithms, sonic-anemometer-derived EDR observations every 30 s are presented in Fig. 7 for 12 March 2015 together with the corresponding hourly GTG LLT forecasts. The time evolution of observed EDR reveals variability at intrahour scales, which cannot be captured with GTG or any other turbulence algorithm that relies on NWP models with grid spacings of several kilometers and in which turbulence effects are fully parameterized. Nevertheless, the GTG LLT forecasts are able to satisfactorily reproduce the EDR nature of decay during nighttime and its subsequent increase with the onset of convective daytime conditions (~1500–1600 UTC), thus providing valuable information about the mean EDR behavior at low levels. This result demonstrates that parameterized turbulence algorithms are not sufficient if the interest lies not only on the average EDR level but also on the small-scale events. For these applications, a large-eddy-simulation model coupled to a mesoscale NWP model can be used. Recent work by Muñoz-Esparza et al. (2017) has proven that such a modeling approach is promising. The much smaller grid spacing required by LES models (~10 m) precludes it from usage for operational forecast purposes at this time, however.

Observations of EDR from the BAO tower were used to evaluate the performance of the low-level GTG algorithms. Although observations from other locations are desirable, the BAO tower (decommissioned in June 2016) was the only source of ABL sonic-anemometer data required to derive EDR that was publicly available at the time when this research was carried out. We explored the use of in situ aircraft EDR estimates as an additional source of data for validation. In situ EDRs are reported in terms of maximum and median values within 1-min intervals (Sharman et al. 2014). Aircraft rapidly change altitude in the ABL because of takeoff/landing maneuvers, however, making it impossible to associate a particular reported EDR to a specific height and also mixing ABL and free-troposphere observations. Therefore, these difficulties preclude its usage in validating LLT EDR forecasts. Nevertheless, the new GTG algorithm has been qualitatively inspected over all of the CONUS. In Fig. 8, an example is shown of CAT (Figs. 8a,b) and MWT (Figs. 8c,d) GTG ensembles for the new LLT algorithm at a height above the surface *z* of approximately 168.5 m (fourth vertical grid point in the HRRR grid). These two cases are representative of daytime (Figs. 8a,c) and nighttime (Figs. 8b,d) conditions, valid at 1800 UTC 15 March 2017 and 1000 UTC 27 March 2017, respectively. The CAT ensembles reveal overall larger forecast EDR values over the entire CONUS during daytime conditions (Fig. 8a) relative to the nighttime forecast case (Fig. 8b), consistent with the role of atmospheric stability in turbulence production. Moreover, mountain regions such as the Appalachian Mountains, Rocky Mountain range, Sierra Nevada, and Cascades display locally larger EDR values during both times of the day. This is expected because of terrain-induced turbulence effects, which appear to be captured by our GTG LLT algorithm. The MWT ensembles (Figs. 8c,d) exhibit larger values than the CAT ensembles at mountain regions. We speculate that such an effect could be related to the calibration performed over relatively homogeneous terrain and that complex terrain drivers will result in consistently larger EDR values. Further analysis using observations over large periods in these regions is required to confirm this hypothesis. It is also interesting that the CAT ensemble often predicts larger EDR values over the ocean during nighttime conditions than during daytime conditions, associated with the warmer water with respect to the surrounding air resulting in convective marine boundary layers during nighttime (see Figs. 8a and 8b over the Pacific Ocean region).

## 5. Summary and conclusions

We have proposed an improved low-level turbulence forecast algorithm and have implemented it within the Graphical Turbulence Guidance system. The LLT EDR forecasting algorithm is based upon recent observational results from Muñoz-Esparza et al. (2018) and leverages the recently proposed statistical remapping technique introduced by Sharman and Pearson (2017). The new LLT algorithm uses remapping to a lognormal distribution for nighttime (stable) conditions and to a log-Weibull distribution for daytime (convective) conditions, specific to ABL turbulence. The surface total heat flux is employed to determine the instances in which the ABL is convective (*H* ≥ 0) or stable (*H* < 0). Moreover, different suites of turbulence indices are used depending upon atmospheric stability, to improve forecasting of turbulence under the different forcing mechanisms in stable and convective ABLs.

A 1-yr-long GTG calibration was performed using the WRF-based HRRR operational product, which has a horizontal grid resolution of 3 km and runs over the CONUS, on the basis of 5-h forecasts with a latency of 1 h. Determination of the optimum CAT and MWT ensemble combinations was performed by minimizing the mean absolute percentage error to identify the suite of indices that provides the best agreement not only for large-turbulence cases but for EDR across the entire range of possible values, since smaller EDRs are relevant for applications such as unmanned aerial systems, takeoff and landing maneuvers, and wind energy. Validation using EDR observations from the BAO tower for the period of 9–30 March 2015 was carried out, coinciding with the XPIA campaign. The proposed LLT algorithm reduces the MAPE with respect to the GTG implementation described in Sharman and Pearson (2017) for CAT by a factor of ~2 (MAPE = 55%). For the MWT ensemble, the improvement was not as large, ~1.1 (MAPE = 66%), but mountain-wave activity was likely not present most of the time. Moreover, the probability of detection within a ±20% and ±50% range for characteristically small and large ABL values is improved by a factor of ~1.3–2.3, respectively. In addition, EDR forecasts with GTG using the new LLT algorithm exhibit realistic features such as larger daytime values relative to nighttime over land and enhanced EDR in regions of complex terrain.

The current operational GTG (version 3) used as reference to evaluate the performance of the proposed LLT algorithm is based on the RAP model, with a horizontal grid spacing of ~13 km (~3.25 times as coarse as the HRRR model). Although some improvement by the HRRR model is expected because its higher resolution, we attribute the majority of the gain in performance by the new LLT algorithm to the use of different reference PDFs for daytime versus nighttime ABL conditions, together with the use of different suites of indices depending upon atmospheric stability and new ABL-specific indices. Quantification of the exact contribution from the higher-resolution NWP model will require recalibration of the current operational GTG, which is beyond the scope of the current study. Preliminary efforts to make a transition in the operational GTG at middle and upper levels to use of the HRRR model have revealed that the characteristics of the resulting EDR forecasts are somewhat similar in magnitude and structure, with an increase in granularity from the RAP model, therefore confirming our hypothesis.

A caveat of this study is the limitation of the validation to ABL observations at the BAO tower. We expect to leverage the current analysis to include observations from the recently performed Wind Forecast Improvement Project 2 (WFIP 2) field campaign in the surroundings of the Columbia River Gorge in the Pacific Northwest (https://www.esrl.noaa.gov/gsd/renewable/wfip2.html), and the Mountain Terrain Atmospheric Modeling and Observations (MATERHORN) campaign in Utah (Fernando et al. 2015), thus providing turbulence data in mountainous regions. Also, in the near future, in situ EDR reports from aircrafts will provide peak time within 1-min windows, thereby allowing evaluation of GTG LLT in regions near airports over the entire CONUS. The reference PDFs used here are characteristic of relatively homogeneous ABLs. It is speculated that the presence of terrain may modify the parameters of the distributions, likely requiring a specific subset of indices/distributions for LLT in complex terrain.

Low-level turbulence forecasting in complex-terrain regions is hindered by the ability of operational NWP model forecasts to reproduce these scenarios accurately. McCaffrey et al. (2017) have recently shown that EDR estimates from a one-dimensional PBL scheme in subkilometer HRRR-like forecasts in complex terrain differ from radar observations in that the output from the PBL scheme appears to be more strongly modulated by the diurnal cycle of the ABL. We hypothesize that an approach like GTG, in which turbulent indices that involve spatial gradients in both horizontal and vertical directions are utilized, may provide a way to compensate for model biases and the lack of resolution when used with an appropriate climatological description of EDR that accounts for terrain effects.

## Acknowledgments

This research was funded by NASA-Ames through Grant NNX15AQ66G. The authors are especially thankful to William Chan for stimulating discussions and encouragement and to Julie K. Lundquist for providing access to the sonic-anemometer data from the XPIA campaign used to calculate EDR estimates. The authors are grateful to Branko Kosivić and Jeremy Sauer for providing comments on an earlier version of the manuscript. All of the simulations were performed on the Yellowstone high-performance computer (ark:/85065/d7wd3xhc) at the National Center for Atmospheric Research (NCAR), provided by NCAR’s Computational and Information Systems Laboratory, sponsored by the National Science Foundation.

## REFERENCES

*16th AIAA Aviation Technology, Integration, and Operations Conf.*, Washington, DC, AIAA, AIAA 2016-3292, https://utm.arc.nasa.gov/docs/Kopardekar_2016-3292_ATIO.pdf.

*Eighth Conf. on Weather, Climate, Water and the New Energy Economy/28th Conf. on Weather Analysis and Forecasting/24th Conf. on Numerical Weather Prediction*, Seattle, WA, Amer. Meteor. Soc., J3.1, https://ams.confex.com/ams/97Annual/webprogram/Paper309313.html.

*Aviation Turbulence: Processes, Detection, Prediction.*Springer, 523 pp.

*24th Conf. on Severe Local Storms*, Savannah, GA, Amer. Meteor. Soc., 11.1, https://ams.confex.com/ams/pdfpapers/142055.pdf.

## Footnotes

^{a}

The National Center for Atmospheric Research is sponsored by the National Science Foundation.

© 2018 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).