A Comparative Study of Various Approaches for Producing Probabilistic Forecasts of Upper-Level Aviation Turbulence

Hyeyum Hailey Shin (https://orcid.org/0000-0001-5781-1777), Wiebke Deierling, and Robert Sharman

National Center for Atmospheric Research, Boulder, Colorado

Abstract

The skill of operational deterministic turbulence forecasts is impacted by the uncertainties in both weather forecasts from the underlying numerical weather prediction (NWP) models and diagnoses of turbulence from the NWP model output. This study compares various probabilistic turbulence forecasting approaches to quantify these uncertainties and provides recommendations on the most suitable approach for operational implementation. The approaches considered are all based on ensembles of NWP forecasts and/or turbulence diagnostics, and include a multi-diagnostic ensemble (MDE), a time-lagged NWP ensemble (TLE), a forecast-model NWP ensemble (FME), and combined time-lagged MDE (TMDE) and forecast-model MDE (FMDE). Both case studies and statistical analyses are provided. The case studies show that the MDE approach that represents the uncertainty in turbulence diagnostics provides a larger ensemble spread than the TLE and FME approaches that represent the uncertainty in NWP forecasts. The larger spreads of MDE, TMDE, and FMDE allow for higher probabilities of detection for low percentage thresholds at the cost of increased false alarms. The small spreads of TLE and FME result in either hits with higher confidence or missed events, highly dependent on the performance of the underlying NWP model. Statistical evaluations reveal that increasing the number of diagnostics in MDE is a cost-effective and powerful method for describing the uncertainty of turbulence forecasts, considering trade-offs between accuracy and computational cost associated with using NWP ensembles. Combining either time-lagged or forecast-model NWP ensembles with MDE can further improve prediction skill and could be considered if sufficient computational resources are available.

© 2023 American Meteorological Society. For information regarding reuse of this content and general copyright information, consult the AMS Copyright Policy (www.ametsoc.org/PUBSReuseLicenses).

Corresponding author: Hyeyum Hailey Shin, hshin@ucar.edu


1. Introduction

Improvements to numerical weather prediction (NWP) models have been and continue to be made to predict weather more accurately. However, there are still uncertainties in forecasting weather from state-of-the-art NWP models because of the growing, but still limited, observational information available to initialize these models and our incomplete understanding of atmospheric processes and their representations in NWP models, e.g., parameterizations of unresolved cloud and turbulence processes (Buizza et al. 2005; Slingo and Palmer 2011; Bauer et al. 2015). Hence, a single deterministic forecast is inevitably accompanied by some uncertainty, as it can only represent one weather state among a range of possible weather states.

These uncertainties carry over to operational aviation turbulence forecasts that are based on operational NWP models. The model resolution of operational NWP models is becoming finer, reaching a few kilometers in the case of limited-area models, e.g., the National Oceanic and Atmospheric Administration (NOAA) High-Resolution Rapid Refresh (HRRR) (Benjamin et al. 2016), the Met Office (UKMO) Unified Model (UM) Regional Atmosphere (Bush et al. 2020), and the Météo-France AROME (Seity et al. 2011); and O(10) km in the case of global models, e.g., the National Centers for Environmental Prediction (NCEP) Global Forecast System (GFS) (NCEP 2022), the UKMO UM Global Atmosphere (Walters et al. 2019), the European Centre for Medium-Range Weather Forecasts (ECMWF) High-Resolution Forecast (HRES) (Haiden et al. 2021), and the German Weather Service (DWD) Icosahedral Nonhydrostatic (ICON) model (Zängl et al. 2015). However, these model resolutions are still not fine enough to resolve the small-scale turbulence that may impact aircraft operations (on the order of a few hundred meters or smaller). This is not expected to change for some time, for global models in particular, as tremendous computing power is needed to accommodate the small grid spacing required to resolve aviation-scale turbulence. Therefore, today's operational turbulence forecasts rely on turbulence diagnostics, which are based on the assumption that the large-scale atmospheric processes resolved by NWP models and associated with the generation of aircraft-scale turbulence are tied to the unresolved aircraft-scale turbulence itself (Ellrod and Knapp 1992; Ellrod and Knox 2010; Gill and Buchanan 2014; Sharman et al. 2006; Sharman and Pearson 2017). However, these diagnostics all suffer from some uncertainty as well. First, turbulence diagnostics do not explicitly resolve aircraft-scale turbulence but only infer it from the NWP large-scale fields. Second, individual diagnostics can represent only certain aspects of the known turbulence generation mechanisms, and some of these mechanisms are themselves not completely understood. This uncertainty in diagnosing turbulence can adversely affect the skill of operational deterministic turbulence forecasts.

As with NWP models themselves, these uncertainties in aviation turbulence forecasts can be expressed through probabilistic forecasts. Probabilistic turbulence forecasts may represent the uncertainty in the NWP model by generating a range of possible weather states (Gill and Buchanan 2014; Storer et al. 2019; Lee et al. 2020), the uncertainty in diagnosing turbulence by using an ensemble of turbulence diagnostics for a single deterministic weather state (Kim et al. 2018), or both (Storer et al. 2020). Gill and Buchanan (2014) used ensemble weather forecasts from the Met Office Global and Regional Ensemble Prediction Systems (MOGREPS) to compute probabilities of turbulence events and showed that the probabilistic forecasts based on the NWP ensembles improve the overall prediction skill, in terms of the area under the relative (or receiver) operating characteristic (ROC) curve (AUC), compared to existing operational deterministic forecasts. Storer et al. (2019) also confirmed the benefits of probabilistic forecasts based on the MOGREPS and the ECMWF Ensemble Prediction System (EPS), and showed further enhancement in the skill of probabilistic forecasts by combining the two ensemble forecast models.

Such a multi-model-based turbulence forecasting system was also developed by Lee et al. (2020) using seven global operational deterministic NWP models available from The International Grand Global Ensemble (TIGGE) (Swinbank et al. 2016) database. The multi-model-based probabilistic forecasts, which use a single turbulence diagnostic, were shown to have a small ensemble spread, as indicated by the relatively flat reliability line for intermediate probabilities (from 30% to 90%) in comparison to the perfect reliability line (Lee et al. 2020). They suggested that this small spread is due at least in part to the small number of ensemble members used, i.e., seven NWP models, and in part to the fact that for the short-term forecasts relevant for aviation purposes (anywhere from 1 or 2 h to about 36-h lead) the NWP models have simply not had enough time to significantly diverge. Also, with these ensemble-model and multimodel approaches, the computational costs increase in proportion to the number of weather forecasts used, i.e., the number of ensemble members or the number of NWP models.

Conversely, the multidiagnostic approach proposed by Kim et al. (2018) is distinct from the ensemble model and multimodel approaches in that it utilizes an ensemble of individual turbulence diagnostics derived from a deterministic weather state to compute probabilities. In their comparisons using an earlier version of the NCEP GFS model, the AUC score of probabilistic forecasts was comparable to the skill of deterministic forecasts, suggesting better performance may be obtained by combining the multidiagnostic approach with NWP model ensembles.

The objective of this study is to compare various single-NWP-model-based probabilistic forecasting approaches to produce turbulence forecasts, including a multi-diagnostic ensemble, a time-lagged ensemble, a forecast-model ensemble, and combined time-lagged multi-diagnostic ensemble and forecast-model multi-diagnostic ensemble approaches. Only upper-level turbulence forecasts—flight levels (FL) ≥ 20 000 ft—are considered, and the focus is on evaluating "clear-air" turbulence (CAT) and mountain wave turbulence (MWT) performance. Results of this comparative study are expected to inform decisions to identify the best approaches (in terms of forecast skill and cost effectiveness) for producing next-generation operational probabilistic turbulence forecasts. To the best of our knowledge, none of these approaches have been directly compared to each other such that recommendations can be made on which are the most suitable for operational implementation. Section 2 describes the NWP dataset, the turbulence forecasting system, and the probabilistic forecast approaches used in this study. Section 3 presents the comparison results based on case studies and statistical evaluations. The summary and conclusions follow in the final section.

2. Methods

The procedure for producing automated aviation turbulence forecasts consists of the following three components: 1) retrieving the appropriate meteorological variables from NWP model output to be used as input into the turbulence forecasting algorithms, 2) computing one or more turbulence diagnostics derived from the NWP meteorological variables, and 3) combining diagnostics and NWP meteorological variables to derive the final deterministic and/or probabilistic turbulence forecasts. In this study, an experimental NWP model dataset was created to enable a direct and fair comparison of different probabilistic forecasting approaches within a single NWP model framework. The Graphical Turbulence Guidance (GTG) system (Sharman et al. 2006; Sharman and Pearson 2017) was used to calculate multiple turbulence diagnostics and to generate the final deterministic and probabilistic turbulence forecasts. Following are overviews of the experimental NWP dataset, the GTG system, and probabilistic forecasting approaches.

a. An experimental NWP dataset: Pseudo–global ensemble forecast system (GEFS)

An experimental NWP dataset was created and used as input into GTG, hereafter referred to as the "pseudo-GEFS." The pseudo-GEFS consists of 21-member ensemble forecasts at 12-, 18-, 24-, 30-, and 36-h lead times, run at a 13-km horizontal grid spacing on 50 vertical sigma levels with a model top at 10 hPa. This pseudo-GEFS dataset was generated through dynamical downscaling of the publicly available GEFS forecast dataset from the NOAA National Centers for Environmental Information (NCEI) (https://www.ncei.noaa.gov/products/weather-climate-models/global-ensemble-forecast), which is at 1° horizontal resolution on 26 pressure levels from 1000 to 10 hPa and consists of 21 NWP ensemble members—1 control (unperturbed) and 20 perturbation forecasts. The GEFS ensemble is configured by perturbing initial conditions and employing stochastic physics schemes (Zhou et al. 2017), and the pseudo-GEFS is designed to mimic these characteristics of the GEFS ensemble.

For dynamical downscaling, the Weather Research and Forecasting (WRF) Model version 4.2.1 was configured over the contiguous United States (CONUS) domain and run at a 13-km horizontal grid spacing. The WRF Model and simulation setup are provided in the appendix. The 1° GEFS 0-h forecasts were used to initialize the pseudo-GEFS forecasts, and the 1° GEFS forecasts were used to provide lateral boundary condition (LBC) forcings, as illustrated in Fig. 1. The 21-member pseudo-GEFS forecasts were initialized at 0600 UTC each day, so that the 12-h forecasts are valid at 1800 UTC, over a 1-yr period between 21 May 2019 and 20 May 2020.

Fig. 1. Illustration of pseudo-GEFS forecasts valid at T0 (marked by red circles). The 1° GEFS 0-h forecasts were used to initialize the pseudo-GEFS forecasts, and the 1° GEFS forecasts were used to provide LBC forcings to the pseudo-GEFS forecasts. Gray triangles show when and which GEFS forecasts are used to provide LBC forcing to the pseudo-GEFS forecasts.

The 13-km horizontal grid spacing used for the pseudo-GEFS is comparable to those currently employed in operational global deterministic forecasts—e.g., 13 km for the NCEP GFS (NCEP 2022), 10 km for the UKMO UM (Walters et al. 2019), and 9 km for the ECMWF HRES (Haiden et al. 2021)—but is somewhat higher resolution than those currently used in operational global ensemble forecasts—e.g., 25 km for the NCEP GEFS and 20 km for the UKMO MOGREPS and the ECMWF EPS. The vertical resolution is also similar to that of the "raw" GEFS forecasts (∼50–54 hybrid levels below 10 hPa), i.e., before the data were interpolated to constant pressure levels. This downscaled GEFS ensemble output is needed because high-resolution input (close to operational NWP resolution) containing all variables required by GTG is not available from the publicly available 1° GEFS data. This provides an experimental ensemble NWP system that mimics the operational NWP ensemble forecasting method. The pseudo-GEFS dataset was created over the 1-yr period between 21 May 2019 and 20 May 2020, for the 345 days on which the full 21 GEFS ensemble forecasts are available from the NOAA NCEI archive.

We begin by examining the ensemble spreads for two particular cases of observed widespread clear-air turbulence (CAT). Figure 2 shows the observations—in situ eddy dissipation rate (EDR) and pilot reports (PIREPs)—associated with the two cases. The 3 December 2019 case (Fig. 2a) was studied in detail by Trier et al. (2022) and was shown to be mainly associated with strong jet stream shears (CAT) but also with some embedded MWT over Colorado. The 11 January 2020 case (Fig. 2b) was associated with MWT over the Colorado Rockies, CAT over Illinois, and convectively induced turbulence (CIT) over the Mississippi Valley.

Fig. 2. In situ EDR and PIREP observations of moderate (MOD) and severe (SEV) turbulence at upper levels (≥20 000 ft) and within a ±1-h time window, with symbols specified in the legend, and lightning area (shaded in orange): (a) 1800 UTC 3 Dec 2019 and (b) 1800 UTC 11 Jan 2020. The cross marks in cyan denote the horizontal location of the Grand Junction station (39.12°N, 108.53°W).

The GEFS and the pseudo-GEFS forecasts for different forecast lead times are compared in terms of the ensemble spread of the 250-hPa geopotential height and wind speed for these two cases in Fig. 3. For the 3 December 2019 case (Fig. 3a), the pseudo-GEFS ensemble compares well with the GEFS ensemble in terms of the location and the amount of the ensemble spread and the increase of the spread with forecast lead time, while providing the needed meteorological variables at higher horizontal and vertical resolution into GTG. For the 11 January 2020 case (Fig. 3b), the ensemble spread of the pseudo-GEFS compares well with the GEFS ensemble spread, except for the large spread of wind speed from Mississippi all the way to Michigan and Wisconsin. This large spread in the pseudo-GEFS coexists with squall lines that developed at the time, as indicated by lightning observations (Fig. 2b); these squall lines are partially resolved in the 13-km pseudo-GEFS but remain unresolved in the 1° GEFS. This study concentrates on the prediction of CAT and MWT; therefore, the similarity between the pseudo-GEFS and GEFS over the nonconvective areas supports the use of the pseudo-GEFS as a substitute for the operational GEFS.

Fig. 3. Ensemble spreads of 250-hPa geopotential height (dam; contour lines at 1-dam intervals) and wind speed (m s−1; shaded) derived from GEFS (1°) and pseudo-GEFS (13 km) forecasts valid at (a) 1800 UTC 3 Dec 2019 and (b) 1800 UTC 11 Jan 2020, respectively, for forecast lead times of 12, 24, and 36 h.

To show the effects of the increased horizontal and vertical resolutions, vertical profiles of the gradient Richardson number (Ri = N²/S²) derived from the publicly available, vertically interpolated 1° GEFS and the 13-km pseudo-GEFS 12-h forecasts are compared in Fig. 4, together with its components: the vertical wind shear $S \equiv [(\partial u/\partial z)^{2} + (\partial \upsilon/\partial z)^{2}]^{1/2}$, where u and υ are the zonal and meridional wind components and z is height, and the square of the Brunt–Väisälä frequency $N^{2} = (g/\theta)\,\partial\theta/\partial z$, where θ is the potential temperature of dry air. The S, N², and Ri terms were calculated at every vertical level of the pseudo-GEFS and GEFS forecasts using simple centered differences. These forecasted soundings are compared against high vertical resolution radiosonde data (HVRRD) observations at the Grand Junction station in Colorado (39.12°N, 108.53°W). The location of the Grand Junction station is indicated in Fig. 2 and is close to where a number of moderate and/or severe turbulence reports were recorded. The raw HVRRD soundings at a 1-s sampling frequency are available from the NOAA NCEI (https://www.ncei.noaa.gov/data/us-radiosonde-bufr/), and we used soundings at a 5-m grid spacing that were interpolated from the 1-s raw soundings (as in Ko et al. 2019). Since no sounding observations are available at 1800 UTC for either case, we used the 1200 UTC sounding observations for the comparison. The HVRRD soundings are displayed every 50 m for better representation; however, no interpolation or smoothing was applied for the calculations of S, N², and Ri.
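For concreteness, the following minimal Python sketch (ours, not code from GTG or WRF; the function name, array conventions, and the guard against zero shear are our assumptions) shows how S, N², and Ri can be computed from a single sounding with centered differences:

```python
import numpy as np

G = 9.81  # gravitational acceleration (m s^-2)

def richardson_profile(z, u, v, theta):
    """S (s^-1), N^2 (s^-2), and Ri = N^2/S^2 at interior levels of a sounding.

    z: height (m); u, v: wind components (m s^-1); theta: potential
    temperature (K). All are 1D arrays ordered from the lowest level up.
    """
    dz = z[2:] - z[:-2]                    # layer depth spanning k-1..k+1
    dudz = (u[2:] - u[:-2]) / dz           # centered differences, nonuniform
    dvdz = (v[2:] - v[:-2]) / dz           # spacing handled by the full depth
    dthdz = (theta[2:] - theta[:-2]) / dz

    s = np.hypot(dudz, dvdz)               # vertical wind shear magnitude
    n2 = (G / theta[1:-1]) * dthdz         # squared Brunt-Vaisala frequency
    ri = n2 / np.maximum(s**2, 1e-12)      # guard against zero shear
    return s, n2, ri
```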

Fig. 4. Vertical profiles of (a),(d) the magnitude of vertical wind shear S (s−1); (b),(e) the square of the Brunt–Väisälä frequency N²; and (c),(f) the gradient Richardson number Ri = N²/S² at the Grand Junction station (39.12°N, 108.53°W) derived from the 1° GEFS (solid black, with dots indicating vertical grid spacing) and 13-km pseudo-GEFS (dashed black) 12-h forecasts valid at (top) 1800 UTC 3 Dec 2019 and (bottom) 1800 UTC 11 Jan 2020. The HVRRD observation soundings available at 1200 UTC for each case are presented for comparison (solid gray); the HVRRD soundings are displayed every 50 m for better representation. The horizontal location of the Grand Junction station is indicated in Fig. 2.

The 3 December 2019 case (Figs. 4a–c) shows that the pseudo-GEFS forecast has larger shears and consequently smaller Ri at upper levels (e.g., 11–13 km MSL), similar to the HVRRD observations, creating more favorable conditions for turbulence generation than the publicly available 1° GEFS. The 11 January 2020 case (Figs. 4d–f) also shows that the pseudo-GEFS forecast predicts larger shears and smaller Ri in the lower stratosphere. The coarse and vertically interpolated publicly available 1° GEFS forecasts (vertical grid spacings are indicated by the dots in the figure) predict Ri larger than 10 at 11–13 km MSL for the December case and above ∼8 km MSL for the January case. For both cases, the general vertical structure of the computed quantities for the most part follows the HVRRD data, but with much smoother distributions.

b. The GTG system and deterministic forecasts

For these comparative experiments the GTG system (Sharman et al. 2006; Sharman and Pearson 2017) was used to compute various turbulence diagnostics and, ultimately, probabilities of turbulence using the pseudo-GEFS weather forecasts as input. Both CAT and MWT forecasts are provided. In GTG, multiple turbulence diagnostics in various units and scales are computed from the NWP meteorological variables and remapped to the eddy dissipation rate (EDR = ε^(1/3); m2/3 s−1), the International Civil Aviation Organization (ICAO) standard metric for automated turbulence reporting. Then an ensemble mean of the remapped diagnostics is computed for CAT and MWT separately:
$$\mathrm{GTG}_{\mathrm{CAT}} = \sum_{i=1}^{N_{\mathrm{CAT}}} D_{\mathrm{CAT}}^{i} \Big/ N_{\mathrm{CAT}}, \quad (1a)$$

$$\mathrm{GTG}_{\mathrm{MWT}} = \sum_{i=1}^{N_{\mathrm{MWT}}} D_{\mathrm{MWT}}^{i} \Big/ N_{\mathrm{MWT}}, \quad (1b)$$

where $D_{\mathrm{CAT}}^{i}$ and $D_{\mathrm{MWT}}^{i}$ are the remapped (to EDR) CAT and MWT diagnostics for the ith diagnostic, respectively; and $N_{\mathrm{CAT}}$ and $N_{\mathrm{MWT}}$ are the numbers of CAT and MWT diagnostics, respectively. Finally, the maximum of the two is taken to be the final combined GTG deterministic forecast (GTGMAX):

$$\mathrm{GTG}_{\mathrm{MAX}} = \mathrm{MAX}(\mathrm{GTG}_{\mathrm{CAT}}, \mathrm{GTG}_{\mathrm{MWT}}). \quad (2)$$
Various studies have demonstrated that forecasting turbulence based on multiple turbulence diagnostics encompassing several turbulence generation mechanisms is superior to using only a single turbulence diagnostic (e.g., Sharman et al. 2006; Gill and Buchanan 2014; Sharman and Pearson 2017; Kim et al. 2018). A total of 62 CAT and 15 MWT diagnostics are available in the version of GTG used in this study, and 11 CAT and 8 MWT diagnostics (19 in total) among them are used to compute GTGMAX (Table 1). The diagnostics used here are slightly different from those used in Sharman and Pearson (2017), based on reevaluations with newer observations and updates to the NOAA Rapid Refresh (RAP) NWP model. The remapping of turbulence diagnostics to EDR and the selection of the 19 diagnostics used for the ensemble mean are the same as those calibrated for the NOAA RAP version 4 forecasts over the CONUS domain, which use the same horizontal and vertical resolution as the pseudo-GEFS model, following the procedure outlined in Sharman and Pearson (2017).
Table 1. The list of the 11 CAT and 8 MWT diagnostics used. The variable ds (m s−1) is a near-surface MWT diagnostic: ds = 0 if model terrain height (h) < 500 m or gradient(h) < 6.0 m km−1; otherwise, ds = the maximum vertical wind speed in the lowest 1500 m.

c. Probabilistic forecasts

Probabilistic turbulence forecasts are generated by calculating the percentage of ensemble members that fall within predefined EDR ranges: e.g., "light" (0.15 ≤ EDR < 0.22), "moderate" (0.22 ≤ EDR < 0.34), and "severe or greater (SOG)" (EDR ≥ 0.34) turbulence for midsized aircraft, the same ranges as those used in Sharman and Pearson (2017) based on the guidance provided by Sharman et al. (2014). Herein, we tested three stand-alone methods to produce probabilistic turbulence forecasts—a multi-diagnostic ensemble (MDE) approach, a time-lagged NWP ensemble (TLE) approach, and a forecast-model ensemble (FME) (or ensemble NWP model) approach—and two combined methods—a time-lagged multi-diagnostic ensemble (TMDE) approach and a forecast-model multi-diagnostic ensemble (FMDE) approach. Table 2 summarizes the five probabilistic forecast methods tested in this study, with details given below.

Table 2. Summary of the probabilistic forecast methods used in this study: the number of weather forecasts NWx (identical to the number of GTG forecast runs), the number of turbulence diagnostics ND (= NCAT + NMWT), the type of ensemble members (individual diagnostics vs GTGMAX), the total number of ensemble members used to compute probability NENS, and approximate computing runtimes relative to the runtime of MDE-19D as a baseline. NENS = NWx × ND if individual turbulence diagnostics are used as ensemble members; NENS = NWx if the final GTG products, GTGMAX, are used as ensemble members.

1) Multi-diagnostic ensemble (MDE) approach

The MDE approach represents the uncertainty in diagnosing turbulence for a deterministic weather state (i.e., a single NWP input into GTG). This approach interprets individual remapped turbulence diagnostics calculated in GTG as ensemble members, and the number of ensemble members (NENS) is equal to the number of turbulence diagnostics (ND = NCAT + NMWT; Table 2). The MDE approach to compute probabilities showed promise in predicting CAT and MWT based on an earlier version of NCEP’s GFS model (Kim et al. 2018). In the MDE approach, the probability of the remapped diagnostics is computed for CAT and MWT, separately, and then the maximum of the two is taken to be the final combined GTG probabilistic forecast (Prob):
$$\mathrm{Prob}_{\mathrm{CAT}} = \sum_{i=1}^{N_{\mathrm{CAT}}} I_{\mathrm{CAT}}^{i} \Big/ N_{\mathrm{CAT}} \times 100\ (\%), \quad (3a)$$

$$\mathrm{Prob}_{\mathrm{MWT}} = \sum_{i=1}^{N_{\mathrm{MWT}}} I_{\mathrm{MWT}}^{i} \Big/ N_{\mathrm{MWT}} \times 100\ (\%), \quad (3b)$$

$$\mathrm{Prob} = \mathrm{MAX}(\mathrm{Prob}_{\mathrm{CAT}}, \mathrm{Prob}_{\mathrm{MWT}}), \quad (4)$$

where $I_{\mathrm{CAT}}^{i}$ and $I_{\mathrm{MWT}}^{i}$ are binary outcomes for detecting CAT and MWT for the ith diagnostic (1 if a turbulence event is forecasted and 0 if it is not), respectively. The probabilities are computed over the entire x–y grid at all FL ≥ 20 000 ft (∼6 km).
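For illustration, a minimal Python sketch of Eqs. (3a)–(4) is given below (ours, not GTG code; the function and array names are hypothetical, with the diagnostics stacked along the leading axis):

```python
import numpy as np

def mde_probability(d_cat, d_mwt, edr_min, edr_max=np.inf):
    """MDE probability (%) for one EDR category, following Eqs. (3a)-(4).

    d_cat, d_mwt: remapped CAT/MWT diagnostics, shape (n_diag, ny, nx).
    edr_min, edr_max: category bounds, e.g., 0.22 and 0.34 for moderate.
    """
    i_cat = (d_cat >= edr_min) & (d_cat < edr_max)   # I_CAT per diagnostic
    i_mwt = (d_mwt >= edr_min) & (d_mwt < edr_max)   # I_MWT per diagnostic
    prob_cat = 100.0 * i_cat.mean(axis=0)            # Eq. (3a)
    prob_mwt = 100.0 * i_mwt.mean(axis=0)            # Eq. (3b)
    return np.maximum(prob_cat, prob_mwt)            # Eq. (4)
```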

An example of three representative ensemble members of the MDE method is shown in Figs. 5a–c, which are derived from the control (unperturbed) NWP member of the pseudo-GEFS 12-h forecasts valid at 1800 UTC 3 December 2019. Ellrod3 (Table 1) does capture moderate and severe turbulence events over eastern Colorado, as well as a cluster of light and moderate turbulence events along the Nevada–Utah border, but misses light turbulence events near the Arizona–New Mexico border (Fig. 5a). FTH/Ri (Table 1) predicts the light turbulence events near the Arizona–New Mexico border, but the turbulence events in Florida are not captured (Fig. 5b). VARE/Ri (Table 1) tends to predict larger areas of turbulence; this leads to the prediction of the light turbulence events near the Arizona–New Mexico border and over Florida, but also a large number of false alarms over the southeastern United States (Fig. 5c). These differences among the turbulence diagnostics show the uncertainty that may be associated with diagnosing turbulence.

Fig. 5. An example of three ensemble members of the MDE method: (a) Ellrod3, (b) FTH/Ri, and (c) VARE/Ri remapped to EDR scales at FL370, valid at 1800 UTC 3 Dec 2019; and an example of ensemble members of the TLE method: (d) 12-h, (e) 24-h, and (f) 36-h GTGMAX forecasts. Cross marks are light or greater (LOG) in situ EDR reports (EDR ≥ 0.15) within a ±1-h time window and ±1000-ft altitude, colored by EDR scales (m2/3 s−1).

For this comparative study, two different sets of diagnostics were tested: one uses the 19 diagnostics (11 CAT and 8 MWT) listed in Table 1, and the other uses all 77 diagnostics (62 CAT and 15 MWT) available in the GTG version used in this study; hereafter, these are referred to as MDE-19D and MDE-77D, respectively (Table 2). All diagnostics used have been mapped to EDR values using the method outlined in Sharman and Pearson (2017).

2) Time-lagged ensemble (TLE) approach

The TLE method has been successfully used in ensemble weather forecasting (e.g., Thompson et al. 2017; Xu et al. 2019), as well as in case studies of high-resolution probabilistic turbulence forecasting (Kim et al. 2015). The TLE approach consists of multiple deterministic forecasts that are derived from multiple NWP forecasts that target the same forecast valid time but differ in forecast lengths. By using different forecast lead times, the TLE method can represent the uncertainty in weather forecasts due to inherent predictability limits of atmospheric motions, constrained by model formulations of the underlying NWP model (e.g., numerics and physics parameterizations). A single NWP input provides a single ensemble mean of turbulence diagnostics, i.e., GTGMAX [Eqs. (1) and (2)], computed using the selected 19 diagnostics, and are the same as those used in the deterministic forecasts (Table 1). In the TLE approach:
$$\mathrm{Prob} = \sum_{i=1}^{N_{\mathrm{Wx}}} I_{\mathrm{GTG}}^{i} \Big/ N_{\mathrm{Wx}} \times 100\ (\%), \quad (5)$$

where $I_{\mathrm{GTG}}^{i}$ is the binary outcome for detecting turbulence events based on GTGMAX for the ith time-lagged member, and $N_{\mathrm{Wx}}$ is the number of weather forecasts, i.e., the number of time-lagged NWP ensemble members in the case of TLE. With 6-hourly forecasts out to 36 h, this method can use up to five time-lagged forecasts for 12-h probabilistic forecasts (NWx = 5, consisting of forecast lead times of 12, 18, 24, 30, and 36 h). The forecast length (e.g., 12 h) of the TLE-based probabilistic forecasts refers to the forecast length of the shortest-lead-time ensemble member that constructs the probabilistic forecasts; for example, a 12-h probabilistic forecast valid at T0 is only available when the 12-h deterministic forecast valid at T0 (i.e., the shortest-lead-time forecast) is completed. This small number of ensemble members limits the granularity of the probabilistic forecasts (i.e., intervals of 100/5 = 20% for 12-h forecasts). Further expanding the forecast lead time increases the ensemble size but may degrade the accuracy of the probabilistic forecasts because of the inclusion of more uncertain longer lead times.
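A corresponding sketch of Eq. (5) (again ours, with hypothetical array names) stacks the GTGMAX fields of the time-lagged members—or, for the FME approach below, the NWP ensemble members—along the leading axis:

```python
import numpy as np

def tle_fme_probability(gtgmax, edr_min, edr_max=np.inf):
    """Probability (%) from GTGMAX members, following Eq. (5).

    gtgmax: stacked GTGMAX fields, shape (n_wx, ny, nx); n_wx = 5 for
    TLE-5TL (12- to 36-h lags) or 21 for FME-21EM, so 12-h TLE
    probabilities fall on multiples of 20%.
    """
    i_gtg = (gtgmax >= edr_min) & (gtgmax < edr_max)  # I_GTG per member
    return 100.0 * i_gtg.mean(axis=0)
```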

An example of selected ensemble members of the TLE method is shown in Figs. 5d–f. Overall, there is more agreement among the ensemble members of TLE compared to MDE. The light turbulence events over Arizona/New Mexico and Florida, as well as the intensity of moderate and severe turbulence events over Colorado, are better captured by the 12- and 24-h forecasts (i.e., shorter lead times) than by the 36-h forecasts.

3) Forecast-model ensemble (FME) (or ensemble NWP model) approach

The FME approach represents the uncertainty in the NWP input into GTG due to NWP initial conditions and model formulations, by using the full 21 weather forecast ensemble members of the pseudo-GEFS. The pseudo-GEFS is designed to mimic the characteristics of the GEFS ensemble (section 2a and Fig. 3). The FME approach uses GTGMAX from the ensemble NWP members to compute probability, as in the TLE approach [Eq. (5)], with NWx = 21. Operationally, the TLE approach may have an advantage over the FME approach since it is (or could be) based on higher-resolution deterministic forecasts.

4) Combined approaches

The combined approaches—time-lagged multi-diagnostic ensemble (TMDE) and forecast-model multi-diagnostic ensemble (FMDE)—are designed to take into account the uncertainties in both the NWP input and turbulence diagnostics. While the TLE and FME approaches use GTGMAX of multiple NWP members to compute probabilities [Eq. (5); NENS = NWx], the TMDE and FMDE approaches use a set of individual turbulence diagnostics from each NWP ensemble member to compute probabilities. Therefore, NENS = NWx × ND (Table 2):
$$\mathrm{Prob}_{\mathrm{CAT}} = \sum_{j=1}^{N_{\mathrm{Wx}}} \sum_{i=1}^{N_{\mathrm{CAT}}} I_{\mathrm{CAT}}^{i,j} \Big/ (N_{\mathrm{Wx}} \times N_{\mathrm{CAT}}) \times 100\ (\%), \quad (6a)$$

$$\mathrm{Prob}_{\mathrm{MWT}} = \sum_{j=1}^{N_{\mathrm{Wx}}} \sum_{i=1}^{N_{\mathrm{MWT}}} I_{\mathrm{MWT}}^{i,j} \Big/ (N_{\mathrm{Wx}} \times N_{\mathrm{MWT}}) \times 100\ (\%), \quad (6b)$$

$$\mathrm{Prob} = \mathrm{MAX}(\mathrm{Prob}_{\mathrm{CAT}}, \mathrm{Prob}_{\mathrm{MWT}}). \quad (7)$$
TMDE is tested using ND = 19 (as listed in Table 1) and ND = 77, and is referred to as TMDE-5TL×19D and TMDE-5TL×77D, respectively (Table 2). FMDE is tested using ND = 19, and referred to as FMDE-21EM×19D.
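Under the same hypothetical array conventions as in the earlier sketches (ours, not GTG code), the combined probability of Eqs. (6a)–(7) amounts to pooling the binary outcomes over both the NWP members and the diagnostics:

```python
import numpy as np

def combined_probability(d_cat, d_mwt, edr_min, edr_max=np.inf):
    """TMDE/FMDE probability (%), following Eqs. (6a)-(7).

    d_cat, d_mwt: remapped diagnostics from every NWP member, shape
    (n_wx, n_diag, ny, nx); averaging over the first two axes implements
    the double sum with N_ENS = N_Wx x N_D members.
    """
    i_cat = (d_cat >= edr_min) & (d_cat < edr_max)
    i_mwt = (d_mwt >= edr_min) & (d_mwt < edr_max)
    prob_cat = 100.0 * i_cat.mean(axis=(0, 1))       # Eq. (6a)
    prob_mwt = 100.0 * i_mwt.mean(axis=(0, 1))       # Eq. (6b)
    return np.maximum(prob_cat, prob_mwt)            # Eq. (7)
```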

3. Assessments of the different probabilistic forecast approaches

The characteristics of ensembles from the different probabilistic forecast approaches and the resultant probabilistic turbulence forecasts are investigated based on two cases in section 3a. Statistical evaluations based on probabilistic forecasts over a 1-yr period follow in section 3b.

a. Case studies: Characteristics of different approaches

Ensemble spreads from the 12-h turbulence forecasts (EDR) are compared in Fig. 6 for the 3 December 2019 case, together with the corresponding deterministic EDR forecasts derived from the control (unperturbed) NWP member of the pseudo-GEFS weather forecasts. The 3 December 2019 case was dominated by CAT along a jet, with the jet core at approximately 300 hPa and winds exceeding 65 m s−1 over southern Colorado (not shown). The deterministic forecasts overlaid with in situ EDR reports show moderate and severe turbulence reports over eastern Colorado, as well as a cluster of light and moderate turbulence reports along the Nevada–Utah border (Fig. 6h). Some MWT reports were also recorded over the Colorado Rockies (Trier et al. 2022).

Fig. 6. (a)–(g) Ensemble spreads of EDR (m2/3 s−1) at FL370 derived from 12-h probabilistic forecasts valid at 1800 UTC 3 Dec 2019, together with (h) 12-h deterministic EDR forecasts derived from the control (unperturbed) NWP member of the pseudo-GEFS. In (a)–(g), in situ EDR reports of light (green cross marks), moderate (yellow cross marks), and SOG (red cross marks) turbulence within a ±1-h time window and ±1000-ft altitude are shown; in (h), the same in situ EDR reports are colored by EDR scales, with null-to-light reports in white dots.

For the TLE and FME approaches that use GTGMAX to compute probabilities [i.e., Eq. (5)], the ensemble spread is defined as the standard deviation (STD) of GTGMAX with respect to the ensemble mean:
$$\mathrm{Spread} = \left[ \sum_{i=1}^{N_{\mathrm{Wx}}} \left( \mathrm{GTG}_{\mathrm{MAX}}^{i} - \overline{\mathrm{GTG}}_{\mathrm{MAX}} \right)^{2} \Big/ N_{\mathrm{Wx}} \right]^{1/2},$$

$$\overline{\mathrm{GTG}}_{\mathrm{MAX}} = \sum_{i=1}^{N_{\mathrm{Wx}}} \mathrm{GTG}_{\mathrm{MAX}}^{i} \Big/ N_{\mathrm{Wx}}.$$
For the MDE, TMDE, and FMDE approaches that compute probabilities of CAT and MWT separately [i.e., Eqs. (3), (4) and (6), (7)], the maximum of the ensemble spreads of CAT and MWT is defined as the final ensemble spread:
$$\mathrm{Spread}_{\mathrm{CAT}} = \left[ \sum_{j=1}^{N_{\mathrm{Wx}}} \sum_{i=1}^{N_{\mathrm{CAT}}} \left( D_{\mathrm{CAT}}^{i,j} - \overline{\mathrm{GTG}}_{\mathrm{CAT}} \right)^{2} \Big/ (N_{\mathrm{Wx}} \times N_{\mathrm{CAT}}) \right]^{1/2},$$

$$\overline{\mathrm{GTG}}_{\mathrm{CAT}} = \sum_{j=1}^{N_{\mathrm{Wx}}} \sum_{i=1}^{N_{\mathrm{CAT}}} D_{\mathrm{CAT}}^{i,j} \Big/ (N_{\mathrm{Wx}} \times N_{\mathrm{CAT}}),$$

$$\mathrm{Spread}_{\mathrm{MWT}} = \left[ \sum_{j=1}^{N_{\mathrm{Wx}}} \sum_{i=1}^{N_{\mathrm{MWT}}} \left( D_{\mathrm{MWT}}^{i,j} - \overline{\mathrm{GTG}}_{\mathrm{MWT}} \right)^{2} \Big/ (N_{\mathrm{Wx}} \times N_{\mathrm{MWT}}) \right]^{1/2},$$

$$\overline{\mathrm{GTG}}_{\mathrm{MWT}} = \sum_{j=1}^{N_{\mathrm{Wx}}} \sum_{i=1}^{N_{\mathrm{MWT}}} D_{\mathrm{MWT}}^{i,j} \Big/ (N_{\mathrm{Wx}} \times N_{\mathrm{MWT}}),$$

$$\mathrm{Spread} = \mathrm{MAX}(\mathrm{Spread}_{\mathrm{CAT}}, \mathrm{Spread}_{\mathrm{MWT}}).$$
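In the same sketch notation as above (ours; array names hypothetical), these spread definitions reduce to standard deviations over the appropriate axes:

```python
import numpy as np

def spread_tle_fme(gtgmax):
    """STD of GTGMAX across members: (n_wx, ny, nx) -> (ny, nx)."""
    return gtgmax.std(axis=0)              # STD about the ensemble mean

def spread_multidiagnostic(d_cat, d_mwt):
    """Pooled STD of CAT and MWT diagnostics, combined by the maximum.

    d_cat, d_mwt: shape (n_wx, n_diag, ny, nx); n_wx = 1 for the
    stand-alone MDE approach.
    """
    return np.maximum(d_cat.std(axis=(0, 1)), d_mwt.std(axis=(0, 1)))
```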
Among the three stand-alone approaches—MDE, TLE, and FME (Figs. 6a–d)—the MDE method shows the largest ensemble spread by far; the maximum ensemble spread of MDE, exceeding 0.20 m2/3 s−1, is comparable to the intensity of moderate turbulence (EDR ≥ 0.22 m2/3 s−1), indicating the large uncertainties in turbulence forecasts. The large dispersion of the MDE method for this selected case results mainly from the spread of the CAT diagnostics, except over the Colorado Rockies, where some MWT reports were recorded, and the Alberta Rockies (cf. Figs. 7a,b). One possible reason for the small ensemble spread of the TLE and FME methods is the limits imposed by using a single model in ensemble weather forecasting (i.e., using the same model formulation, including the parameterization of subgrid-scale processes), which leads to restricted sampling of the forecast space (Slingo and Palmer 2011). Moreover, operational ensemble weather forecasting systems (e.g., GEFS) generally target medium-range forecasts (3–7 days in advance) (e.g., Zhou et al. 2017), and this may limit the ensemble spread of the shorter-term weather forecasts that turbulence forecasts rely on. Another factor contributing to the smaller spread is that the TLE and FME methods use GTGMAX to compute probabilities [Eq. (5)], and GTGMAX is based on averages of multiple diagnostics [Eqs. (1) and (2)], which can smooth out outliers that individual diagnostics may produce. However, the impact of using the averages is minimal, as can be seen in comparisons using a single diagnostic (not shown). For all approaches, large ensemble spreads are found over the areas of large deterministic EDR values (cf. Fig. 6h).
Fig. 7. Ensemble spreads of (left) CAT and (right) MWT diagnostics remapped to EDR (m2/3 s−1) derived from 12-h probabilistic forecasts based on the MDE-19D approach: (a),(b) 1800 UTC 3 Dec 2019 at FL370 and (c),(d) 1800 UTC 11 Jan 2020 at FL340.

Increasing the number of diagnostics in the MDE method further increases the peak values of the ensemble spread, e.g., over Colorado and Montana, and over the Pacific Ocean (cf. Figs. 6a,b). Overall, the spatial distributions and magnitudes of the ensemble spread of the two combined methods—TMDE-5TL×19D and FMDE-21EM×19D (Figs. 6e,f)—are similar to those of the stand-alone MDE-19D (Fig. 6a), which uses the same number of turbulence diagnostics, except that the areas of small ensemble spread (approximately 0.04–0.06 m2/3 s−1) are more widely distributed when the combined methods are used. Increasing the number of diagnostics in the combined TMDE method further smooths the areas of small ensemble spread of 0.04–0.06 m2/3 s−1, while increasing the peak values of the ensemble spread (cf. Figs. 6e,g).

Ensemble spreads for the 11 January 2020 case are compared in Fig. 8. The 11 January 2020 case has a cluster of moderate and severe turbulence reports associated with MWT over the Colorado Rockies at altitudes around FL340. A number of moderate and severe turbulence reports were also recorded at lower altitudes over the Mississippi Valley associated with strong convection, as well as CAT over Illinois (Fig. 2b). Overall, the characteristic features of the different probabilistic approaches found in the 3 December 2019 case hold, including the larger spreads of MDE, TMDE, and FMDE (cf. Figs. 6 and 8). Over the Colorado Rockies, the spread of the MWT diagnostics makes a relatively larger contribution to the dispersion of the MDE method, while the spread of the CAT diagnostics is larger elsewhere (cf. Figs. 7c,d). The ensemble spreads of the TLE and FME methods over the mountainous region are even smaller in this January case than in the 3 December case. Compared to the synoptic-scale jet streams that dominate the December case, mountain waves are less well resolved by the 13-km pseudo-GEFS NWP model. This difference in the scales of the primary turbulence generation mechanisms between the two cases may contribute to the less dispersive NWP ensembles—i.e., TLE and FME—over the MWT region in the January case.

Fig. 8. As in Fig. 6, but for 1800 UTC 11 Jan 2020 at FL340.

The results of the 12-h probabilistic forecasts of light, moderate, and SOG turbulence at FL370 for the 3 December 2019 case are displayed in Figs. 9–11, together with the corresponding 12-h deterministic forecasts. For the prediction of light turbulence (Fig. 9), MDE-19D shows a relatively wide dispersion of intermediate probabilities ranging between 20% and 80% (Fig. 9a), indicative of a relatively large ensemble spread in comparison to the TLE and FME forecasts (Figs. 9c,d). By increasing the number of diagnostics used in the MDE approach, MDE-77D predicts light turbulence over larger areas at lower probabilities, rarely exceeding 60% (Fig. 9b). Such a large ensemble spread may result in a high probability of detection for low percentage thresholds (e.g., 10%), but at the same time the number of false alarms may increase if the ensemble spread is too large. The light turbulence events over southern Arizona–New Mexico and Florida, which are captured by MDE-77D but missed by MDE-19D, are good examples of the characteristics of a large ensemble spread.

Fig. 9. (a)–(g) The 12-h probabilistic forecasts (%) of light turbulence (0.15 ≤ EDR < 0.22) at FL370 valid at 1800 UTC 3 Dec 2019, together with (h) 12-h deterministic EDR forecasts (m2/3 s−1). In situ EDR reports of light turbulence within a ±1-h time window and ±1000-ft altitude are marked by green circles in (a)–(g); in (h), in situ EDR reports of light or greater turbulence within the same time and altitude windows are colored by EDR scales, with null-to-light reports in white dots.

Fig. 10. As in Fig. 9, but for moderate turbulence (0.22 ≤ EDR < 0.34) and in situ EDR reports of moderate turbulence marked by yellow circles in (a)–(g).

Fig. 11. As in Fig. 10, but for SOG turbulence (EDR ≥ 0.34) and in situ EDR reports of SOG turbulence marked by red circles in (a)–(g).

The TLE- and FME-based forecasts show probabilities exceeding 80% in most regions where light turbulence is predicted (Figs. 9c,d), indicative of a relatively small ensemble spread (i.e., a large degree of agreement among ensemble members). Such a small spread may reduce the number of false alarms for low probability thresholds but may also increase the number of missed events, and in general its performance is highly dependent on the performance of the underlying NWP model. Also, when the TLE and FME methods capture a turbulence event, they tend to capture it at higher probabilities (i.e., with higher confidence). These characteristics of the TLE- and FME-based forecasts are better seen in the moderate and SOG turbulence forecasts (Figs. 10 and 11). TLE-5TL and FME-21EM predict moderate turbulence over Colorado at higher probabilities than the other methods, while missing the moderate events along the Nevada–Utah border and predicting the SOG events over Colorado at probabilities lower than 10% (Figs. 10c,d and 11c,d).

Combining the MDE method with the other two—i.e., TMDE and FMDE—leads to more widely spread lower probabilities compared to TLE and FME, as expected from the ensemble spread (Figs. 9e–g, 10e–g, and 11e–g). Similar to the MDE forecasts, the TMDE- and FMDE-based forecasts capture the moderate events along the Nevada–Utah border and the SOG events over Colorado, which are missed in the TLE- and FME-based forecasts. The differences between the combined approaches and the stand-alone MDE approach are not noticeable in this selected case. The probabilistic forecast results for the 11 January 2020 case are presented in Fig. 12, confirming the characteristics of the different approaches: the severe turbulence events over Colorado are predicted by the MDE approach at low probabilities (10%–20%) but entirely missed by the TLE and FME approaches, while the light and moderate turbulence events over Colorado are predicted at higher probabilities (with higher confidence) by the TLE and FME approaches.

Fig. 12. The 12-h probabilistic forecasts (%) of (a)–(d) light, (e)–(h) moderate, and (i)–(l) SOG turbulence at FL340 valid at 1800 UTC 11 Jan 2020. In situ EDR reports within a ±1-h time window and ±1000-ft altitude are marked by circles: light (green), moderate (yellow), and SOG (red) turbulence reports.

b. Statistical evaluations

Statistical evaluations were performed using three evaluation metrics: the Brier skill score (BSS), a relative (or receiver) operating characteristic (ROC) curve, and a reliability diagram (Wilks 2011; Gill and Buchanan 2014). The 12-h probabilistic forecasts over the CONUS domain valid at 1800 UTC for a 1-yr period between 21 May 2019 and 20 May 2020 were evaluated in comparison to in situ EDR reports and PIREPs converted to EDR. Each EDR report was compared to the forecasted probability at the nearest grid point to the observation. The in situ EDR reports used are the peak values recorded over 1 min of flight time and are from Delta Air Lines and Southwest Airlines flights over the CONUS domain. The in situ EDR data are reported at regular 15- or 20-min intervals and also immediately when the peak EDR exceeds 0.18 m2/3 s−1. For more details of in situ EDR reporting, refer to section 2d in Sharman et al. (2014). The PIREP turbulence intensity category (P), reported on a 0–8 scale, is mapped to EDR using the best-fit curves derived by Sharman et al. (2014):

$$\mathrm{EDR}_{\mathrm{PIREP}} = C \times R(W) \times P^{2}, \quad C = 0.0138,$$

where R(W) is a coefficient that depends on the aircraft weight class: R(W) = 0.82, 1.0, and 1.22 for the ICAO "light," "medium," and "heavy" weight-class categories, respectively (Sharman and Pearson 2017). Using only data at upper levels (20 000–60 000 ft) and within a ±1-h time window, the total number of reports from in situ EDR and PIREPs used to compute statistics was 1 280 246, including 23 095 light turbulence events (0.15 ≤ EDR < 0.22), 7260 moderate turbulence events (0.22 ≤ EDR < 0.34), and 810 SOG events (EDR ≥ 0.34). The percentages of the light, moderate, and SOG turbulence reports relative to the total number of EDR reports are 1.803%, 0.567%, and 0.063%, respectively.
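As an arithmetic illustration of this mapping (ours, not from the paper): a moderate PIREP (P = 4) from a medium-weight aircraft gives EDR_PIREP = 0.0138 × 1.0 × 4² ≈ 0.22 m2/3 s−1, which coincides with the lower bound of the moderate EDR category used here.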
The BSS measures the mean square error of probability forecasts relative to the error of reference forecasts (deterministic forecasts in the case of this study). The deterministic (reference) forecasts refer to the GTGMAX fields derived from the control (unperturbed) NWP member of the pseudo-GEFS (e.g., Fig. 6h), computed using the 19 turbulence diagnostics listed in Table 1 as described in section 2b [Eqs. (1) and (2)]. With the BSS, higher is better, with 1 being the best score and positive/negative values indicating better/worse performance than the reference forecasts:

$$\mathrm{BSS} = 1 - \mathrm{BS}/\mathrm{BS}_{\mathrm{ref}},$$

$$\mathrm{BS} = \sum_{i=1}^{N_{\mathrm{obs}}} (\mathrm{Prob}_{i} - O_{i})^{2} \Big/ N_{\mathrm{obs}},$$

$$\mathrm{BS}_{\mathrm{ref}} = \sum_{i=1}^{N_{\mathrm{obs}}} (\mathrm{Dete}_{i} - O_{i})^{2} \Big/ N_{\mathrm{obs}}.$$

Here BS and BS_ref are the mean square errors of the probabilistic and deterministic (reference) forecasts, respectively. The term $O_{i}$ is the binary outcome of a turbulence event ($O_{i}$ = 1 if an event is observed and 0 if it is not), $\mathrm{Prob}_{i}$ is the predicted probability and varies between 0 and 1, and $\mathrm{Dete}_{i}$ is the binary outcome for predicting turbulence ($\mathrm{Dete}_{i}$ = 1 if a turbulence event is predicted—e.g., GTGMAX ≥ 0.15 for LOG turbulence events—and 0 if it is not). The BSS values derived from the different probabilistic forecast approaches are compared in Table 3.
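As an illustration (ours, not the verification code used in the study), the BSS computation reduces to a few lines once each EDR report has been matched to its nearest-grid-point forecast:

```python
import numpy as np

def brier_skill_score(obs, prob, dete):
    """BSS of probabilistic forecasts against a deterministic reference.

    obs:  binary observed outcomes O_i (0 or 1)
    prob: forecast probabilities Prob_i in [0, 1]
    dete: binary deterministic outcomes Dete_i (0 or 1)
    All are 1D arrays matched one-to-one with the N_obs EDR reports.
    """
    bs = np.mean((prob - obs) ** 2)        # Brier score, probabilistic
    bs_ref = np.mean((dete - obs) ** 2)    # Brier score, reference
    return 1.0 - bs / bs_ref
```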
Table 3. The Brier skill score (BSS) of 12-h probabilistic forecasts of LOG and MOG turbulence events, derived from 345 cases over a 1-yr period from 21 May 2019 to 20 May 2020. The BSS is calculated using the 12-h deterministic forecasts as reference predictions. The best scores among the stand-alone methods and among all methods are highlighted in italics and in bold, respectively.

Among the three stand-alone approaches, the MDE method that uses all 77 diagnostics exhibits the highest BSS for light or greater (LOG) and moderate or greater (MOG) turbulence events. Notably, increasing the number of diagnostics from 19 to 77 improves the MDE-based probabilistic forecasts by incorporating a larger variety of turbulence generation mechanisms: the BSS increases from 0.353 to 0.479 for LOG and from 0.157 to 0.359 for MOG. Increasing the number of diagnostics in the MDE framework does not require any additional GTG runs (i.e., NWx = 1), so this performance gain is achievable with only a small increase in computational cost: a single MDE-77D forecast takes only 1.2–1.3 times longer to run than a single MDE-19D forecast, whereas the computational times of the TLE and FME approaches increase linearly with the number of ensemble members used (Table 2). Among the three stand-alone approaches, MDE-19D shows the second-highest BSS for LOG turbulence, and TLE-5TL for MOG turbulence. Between the two NWP ensemble approaches, the TLE approach outperforms the FME approach. This is at least in part due to the slightly larger ensemble spread of TLE than FME where turbulence events are reported (Figs. 6c,d and 8c,d).

Combining MDE with either TLE or FME also improves the BSS; the skill score improves from 0.353 to 0.437 (TMDE-5TL×19D) or 0.409 (FMDE-21EM×19D) for LOG turbulence, and from 0.157 to 0.343 (TMDE-5TL×19D) or 0.273 (FMDE-21EM×19D) for MOG turbulence. The performance gain from combining MDE with TLE is comparable to that from increasing the number of diagnostics in the MDE approach, but comes at the expense of additional computational costs proportional to the number of weather forecasts (Table 2). Increasing the number of diagnostics from 19 to 77 in the combined TMDE framework further improves the skill score; in terms of the BSS, the largest improvement over the reference (deterministic) forecasts is obtained when using the TMDE approach with 77 diagnostics.

The ROC curve and the area under the ROC curve (AUC) score measure the ability of forecasts to distinguish events from nonevents, based on the 2 × 2 contingency table (Table 4). The ROC curve is created by plotting the probability of detection [POD; "observations = yes" and "forecasts = yes"] against the false alarm rate [FAR; "observations = no" and "forecasts = yes"] at probability thresholds varying from 0% to 100%. For each probability threshold, POD and FAR are derived from the 2 × 2 contingency table (Table 4):

$$\mathrm{POD} = \frac{\mathrm{hit}}{\mathrm{hit} + \mathrm{miss}},$$

$$\mathrm{FAR} = \frac{\mathrm{false\ alarm}}{\mathrm{false\ alarm} + \mathrm{correct\ rejection}}.$$
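For illustration, a minimal sketch (ours; variable names hypothetical) of how the ROC points and the AUC can be computed from matched binary observations and forecast probabilities:

```python
import numpy as np

def roc_points(obs, prob, thresholds=np.linspace(0.0, 100.0, 21)):
    """(FAR, POD) pairs for probability thresholds from 0% to 100%.

    obs: binary outcomes (0/1); prob: forecast probabilities (%).
    A "yes" forecast at threshold t means prob >= t.
    """
    pod, far = [], []
    for t in thresholds:
        yes = prob >= t
        hit = np.sum(yes & (obs == 1))
        miss = np.sum(~yes & (obs == 1))
        fa = np.sum(yes & (obs == 0))
        cr = np.sum(~yes & (obs == 0))
        pod.append(hit / max(hit + miss, 1))   # guard against empty classes
        far.append(fa / max(fa + cr, 1))
    order = np.argsort(far)                    # left to right in FAR
    return np.asarray(far)[order], np.asarray(pod)[order]

# AUC by trapezoidal integration of the (FAR, POD) points, e.g.:
# far, pod = roc_points(obs, prob)
# auc = np.sum(np.diff(far) * (pod[1:] + pod[:-1]) / 2.0)
```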
The ROC curves of LOG and MOG turbulence events are shown in Fig. 13. The MDE, TLE, and FME methods show sharp bends in the ROC curves, particularly in the case of MOG turbulence. Where the curves bend corresponds to the POD–FAR pair of the lowest nonzero percentage threshold of each method: e.g., 9.09% in the case of MDE-19D (1/NCAT × 100), and 20% and 4.76% in the cases of TLE-5TL and FME-21EM (1/NWx × 100), respectively. These sharp bends are due to a scarcity of higher-intensity turbulence reports. The TLE and FME approaches exhibit sharper bends that are closer to the origin than those of the other approaches. These two approaches predict the lowest nonzero probabilities less often than the other approaches (Fig. 14), which is related to their small ensemble spread, resulting in more frequent predictions of either 0% or 100% probabilities and less frequent predictions of intermediate probabilities, as shown in the case studies. Therefore, the number of high-intensity turbulence reports at the lowest nonzero probability threshold is even smaller in these two approaches, leading to the sharper bends in the ROC curves and lower AUC scores than in the other approaches (Table 5). This issue of ROC curves for rare events was also noted by Casati et al. (2008) (i.e., "a tendency for the points on the ROC to cluster toward the lower left corner of the graph"), and where the ROC curve bends (or "how much of the curve is missing") depends on the lowest nonzero probability (Ben Bouallègue and Richardson 2021), consistent with our analysis results. ROC curves with such bends inevitably lead to lower AUC scores than those without them. Therefore, comparing the slopes of the ROC curves for low FAR provides a more useful comparison than comparing the AUC scores (Gill 2016; Storer et al. 2019).
Table 4. The 2 × 2 contingency table.
Fig. 13. ROC curves of (a) LOG and (b) MOG turbulence forecasts derived from MDE-19D (solid blue), MDE-77D (dotted blue), TLE-5TL (solid red), FME-21EM (solid green), TMDE-5TL×19D (solid purple), TMDE-5TL×77D (dotted purple), and FMDE-21EM×19D (solid cyan), together with a random classifier (gray).

Fig. 14. Histograms of predicted probabilities of (a) LOG and (b) MOG turbulence that correspond to in situ EDR reports and PIREPs converted to EDR (1 280 246 samples). The number of bins equals NWx in the case of TLE and FME, and the maximum of NCAT and NWx in the case of MDE, TMDE, and FMDE. The frequency of turbulence (%) shown at the bottom of each panel is the ratio of forecasts that predict turbulence (probability > 0%) to the total number of EDR observations.

Table 5. As in Table 3, but for the AUC score.

The slopes of the ROC curves presented in Fig. 13 are estimated as the ratio of the change of POD to the change of FAR, centered at FAR ∼0.2 (0.05) where the TLE and FME ROC curves of LOG (MOG) bend. A larger slope indicates better prediction skill, i.e., a larger increase of POD for the same increase of FAR over the practical regions of interest (i.e., low FAR and high POD). Similar reasoning using a partial ROC was used by Gill (2016). For LOG (MOG) turbulence, FME-21EM shows the steepest slopes for FAR < 0.2 (FAR < 0.05), followed by TLE-5TL and TMDE-5TL×77D; MDE-19D shows the least steep slope. Increasing the number of diagnostics improves the MDE-based probabilistic forecasts (i.e., the slopes of the ROC curves are steeper in MDE-77D), and combining MDE with either TLE or FME also leads to steeper slopes and therefore better prediction skill. This is consistent with the findings based on the BSS scores.

Histograms of predicted probabilities that correspond to in situ EDR reports and PIREPs, and reliability diagrams derived from the histograms, are presented in Figs. 14 and 15, respectively. As was seen in the case studies, the MDE method predicts intermediate probabilities more frequently than the TLE and FME methods and 0% probability less often. It rarely predicts 100% probability, as expected from the large disagreement among the diagnostics (Figs. 5a–c). The TLE and FME methods, on the other hand, predict both 0% and 100% probabilities more frequently than the other methods, indicative of a small ensemble spread. Combining MDE with either TLE or FME decreases the frequency of 0% probability while increasing the frequency of low probabilities between 0% and 20%. The relative frequency of forecasted events (i.e., the sum of frequencies for probability > 0% in the histograms) is much higher than the relative frequency of observed turbulence events with respect to the total number of observations (2.43% and 0.63% for LOG and MOG turbulence events, respectively). This emphasizes the overall overprediction of events, regardless of method, if forecasts are not calibrated, as presented in Fig. 15. This overprediction of turbulence events by probabilistic forecasts was also found in previous studies (e.g., Gill and Buchanan 2014; Storer et al. 2019, 2020; Lee et al. 2020). It is due in part to a suspected bias in the observations toward nonturbulent events created when pilots try to avoid regions of known turbulence (Sharman et al. 2014; Lee et al. 2020). Lee (2021) demonstrated that the reliability curves can be improved by using subsets of in situ EDR reports randomly sampled so that the ratios of turbulence events to total in situ reports are similar to those of the forecasts.

Fig. 15. Reliability diagrams of (a) LOG and (b) MOG turbulence forecasts derived from MDE-19D (solid blue), MDE-77D (dotted blue), TLE-5TL (solid red), FME-21EM (solid green), TMDE-5TL×19D (solid purple), TMDE-5TL×77D (dotted purple), and FMDE-21EM×19D (solid cyan); reliability diagrams with forecast probability divided by a calibration constant (6.0 for LOG and 12.0 for MOG) are shown in the insets. The diagonal (black solid) and horizontal (black dashed) lines indicate perfect reliability and no resolution (observed climatology), respectively.

To address the overprediction of the probabilistic forecasts, which is evident in the reliability curves (Fig. 15), recalibration can be introduced following Storer et al. (2019, 2020). Multiplying the forecast probabilities of LOG and MOG turbulence events by calibration constants of 1/6 and 1/12, respectively, results in the reliability diagrams plotted in the insets of Fig. 15. The calibration constants depend on the turbulence thresholds (i.e., LOG or MOG turbulence) and the underlying NWP models; for example, Storer et al. (2020) used 1/50 for the UKMO MOGREPS and 1/40 for the ECMWF EPS. For LOG turbulence (Fig. 15a), the MDE, TMDE, and FMDE forecasts show similar slopes that parallel the perfect reliability line (solid diagonal line) for calibrated forecast probabilities up to ∼3%. For calibrated probabilities above ∼4%, the TMDE-5TL×77D forecasts follow the perfect reliability line with only slight overprediction. The combined forecasts that use the selected diagnostics (TMDE-5TL×19D and FMDE-21EM×19D) and the MDE forecasts deviate further from the perfect reliability line (i.e., overforecast). The TLE and FME forecasts show relatively flat lines, and it is hard to improve the reliability of these two methods by modulating the calibration constant; decreasing the calibration constant exacerbates the underprediction at low probabilities, while increasing it results in overprediction at all probabilities.

For MOG turbulence (Fig. 15b), MDE-77D and the TMDE forecasts (TMDE-5TL×19D and TMDE-5TL×77D) show slopes closer to the perfect reliability line than the other forecasts, but underforecast for calibrated probabilities above ∼6%–7%. The TLE and FME forecasts show relatively flat lines, as for LOG turbulence. The improvement in statistical performance achieved by combining the NWP-based TLE or FME methods with the MDE method stems mainly from their different ensemble-spread characteristics: the larger spread of the MDE approach leads to a wider dispersion of intermediate probabilities, whereas the smaller spreads of TLE and FME concentrate probabilities at the extremes.

4. Summary and conclusions

In this study, we applied various probabilistic forecasting approaches to aviation turbulence forecasting and compared their characteristic features and prediction skills to identify the most suitable approach for operational implementation of probabilistic turbulence forecasts. These included the MDE method, which represents the uncertainty in turbulence diagnostics; the TLE and FME methods, which represent the uncertainty in weather forecasts; and the combined TMDE and FMDE methods, which account for both types of uncertainty. Two case studies were conducted to compare the characteristic features of the different approaches, focusing on the ensemble spread of the forecast EDR and the resulting probabilistic forecasts. Large ensemble spreads were found over the areas of large (deterministically predicted and observed) EDR for all methods, and the ensemble spreads can be comparable to or even larger than the intensity of MOG turbulence. This indicates large uncertainties in predicting aviation turbulence events, supporting the necessity of probabilistic forecasts. The stand-alone and combined multi-diagnostic approaches (MDE, TMDE, and FMDE) all exhibited larger spreads than either of the NWP ensemble approaches tested (TLE and FME). The larger spreads allowed for a higher probability of detection at low percentage thresholds at the cost of increased false alarms. The small spreads of TLE and FME resulted in either hits with higher confidence or missed events, depending strongly on the performance of the underlying NWP model. Operationally, the TLE approach may have an advantage over the FME approach since it is (or could be) based on higher-resolution deterministic forecasts.
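
All five methods share the same final step: the probability at a grid point is, presumably, the fraction of ensemble members whose (remapped) EDR exceeds the turbulence threshold, consistent with the bin counts described in the Fig. 14 caption, and the spread is the standard deviation across members. A minimal sketch under that assumption, with synthetic data and the EDR thresholds given in the figure captions (LOG: EDR ≥ 0.15; MOG: EDR ≥ 0.22 m2/3 s−1):

```python
import numpy as np

# Stand-in ensemble of EDR forecasts, shape (member, y, x): diagnostics
# remapped to EDR for MDE, time lags for TLE, NWP members for FME, or
# their combinations for TMDE/FMDE.
rng = np.random.default_rng(1)
edr = rng.gamma(shape=2.0, scale=0.05, size=(19, 120, 100))

spread = edr.std(axis=0)                    # ensemble spread of EDR
p_log = (edr >= 0.15).mean(axis=0) * 100.0  # P(LOG) in %, EDR >= 0.15
p_mog = (edr >= 0.22).mean(axis=0) * 100.0  # P(MOG) in %, EDR >= 0.22
```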

Each approach has advantages and disadvantages over the others; hence, statistical evaluations against in situ EDR reports and PIREPs were performed to quantify the trade-offs. The three evaluation metrics considered (the Brier skill score, the ROC curve, and the reliability diagram) all indicate superior skill for the multi-diagnostic MDE approach compared to the NWP model-based ensemble approaches, TLE and FME, especially when 77 turbulence diagnostics were used. The multi-diagnostic approach should therefore be preferred operationally for describing the uncertainty of turbulence forecasts, even when a relatively small number of turbulence diagnostics (19 in this case) is used. Moreover, the computational resources required for the MDE approach are smaller than those for the TLE or FME approaches (Table 2). Combining multiple diagnostics with either NWP time-lagged ensembles or NWP forecast-model ensembles can further improve prediction skill if sufficient computational resources are available.
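
For reference, the Brier skill score used in this comparison can be computed generically as follows; this is the textbook definition (Wilks 2011) against the sample climatology, not the authors' evaluation code.

```python
import numpy as np

def brier_skill_score(probs, obs):
    """BSS = 1 - BS/BS_ref, with the sample climatology as reference;
    `obs` is a 0/1 event flag per matched report."""
    o = np.asarray(obs, dtype=float)
    bs = np.mean((np.asarray(probs) - o) ** 2)
    bs_ref = np.mean((o.mean() - o) ** 2)   # Brier score of climatology
    return 1.0 - bs / bs_ref
```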

For the pseudo-GEFS NWP dataset tested in this study, representing the uncertainty in diagnosing turbulence was found to be more important, i.e., to provide higher forecast skill, than representing the uncertainty in the NWP. This is because the pseudo-GEFS dataset has limited spread, as can be inferred from the spread of the GEFS (Zhou et al. 2017) from which it was generated. In other words, the pseudo-GEFS ensemble spread is not large enough to represent the NWP forecast error, and this deficiency carries over to the ensemble spread of the turbulence forecasts (Figs. 6 and 8). Given this limited NWP spread, representing the uncertainty in the turbulence diagnostics plays the more important role for the NWP ensemble tested here.

Finally, it should be noted that these results are based on the GEFS NWP ensemble model; if other NWP ensemble models with larger or smaller spreads than the GEFS are considered, the results may change. Increasing the number of ensemble members of the time-lagged or forecast-model approaches can further increase the ensemble spread of these methods. The spreads of these methods can also be sensitive to the NWP model resolution, since the degree to which moist convection and mesoscale processes are resolved depends on resolution. However, it is likely that the MDE approach will still have larger spreads than these NWP-based ensembles. Implementing recently developed CIT diagnostics (e.g., Kim et al. 2021) into the MDE framework could further improve the performance of the probabilistic forecasts. For the GEFS NWP ensemble model tested in this study, probabilistic forecast skill was shown to improve as the spreads of the NWP/diagnostic ensembles increased. The majority of previous studies on probabilistic forecasts of aviation turbulence also noted a lack of spread in their ensembles (e.g., Kim et al. 2018; Lee et al. 2020), so improved forecast skill can be expected from increasing the spread. However, if NWP models with smaller forecast errors (forecast uncertainty) are considered, the results may change, as such models need smaller spreads to represent their forecast uncertainty.

Acknowledgments.

This research is in response to requirements and funding by the Federal Aviation Administration (FAA). The views expressed are those of the authors and do not necessarily represent the official policy or position of the FAA. The National Center for Atmospheric Research is sponsored by the National Science Foundation. We thank Teddie Keller (NCAR) for providing the turbulence and lightning observation plots used in Fig. 2, and Han-Chang Ko and Hye-Yeong Chun (Yonsei University) for their help with the HVRRD data processing. Matthias Steiner and Ken Stone (NCAR), and Jung-Hoon Kim (Seoul National University) are acknowledged for informative discussions that benefited this work. Junkyung Kay (NCAR) and Jiwoo Lee (LLNL) are acknowledged for discussions that helped the pseudo-GEFS model setup. Peer reviews provided by Joshua Scheck (NOAA AWC) and two anonymous reviewers helped improve and clarify the manuscript.

Data availability statement.

All simulations presented in the case studies will be made available from the corresponding author upon request. For the simulations used for statistical evaluations, relevant post-processed output including EDR reports and matching forecasts will be made available from the corresponding author upon request. The PIREPs used here can be obtained from NOAA’s Family of Services (https://weather.gov/noaaport/), and the in situ EDR data are available through NOAA’s MADIS data (https://madis-data.noaa.gov/madisPublic1/data/archive). The raw HVRRD soundings at a 1-s sampling frequency are available from the NOAA NCEI (https://www.ncei.noaa.gov/data/us-radiosonde-bufr/).

APPENDIX

Pseudo-GEFS Model Setup

The pseudo-GEFS ensemble forecasts were produced by dynamically downscaling the publicly available 1° GEFS ensemble forecasts. For the dynamical downscaling, the Weather Research and Forecasting (WRF) Model version 4.2.1 was configured over the contiguous United States (CONUS) domain (Fig. A1) on a Lambert conformal projection and run at 13-km horizontal grid spacing on 50 vertical sigma levels with a model top at 10 hPa.

Fig. A1. Pseudo-GEFS model domain with terrain heights (m).

The physics parameterizations include the Thompson microphysics scheme (Thompson et al. 2008), the Grell–Freitas convective parameterization (Grell and Freitas 2014), the Rapid Radiative Transfer Model for General Circulation Models (RRTMG) longwave and shortwave radiation schemes (Iacono et al. 2008), the Mellor–Yamada–Nakanishi–Niino (MYNN) level-2.5 planetary boundary layer and surface layer schemes (Nakanishi and Niino 2009), and the unified Noah land surface model (Tewari et al. 2004). The model resolution and physics parameterization schemes used for the pseudo-GEFS are similar to those of the NOAA RAP NWP model (Benjamin et al. 2016).
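
For readers wishing to approximate this configuration, the physics choices above map onto WRF namelist options roughly as sketched below. This is a plausible excerpt based on standard WRF option numbering, not the authors' actual namelist, and all unlisted settings are omitted.

```
&domains
 dx              = 13000,  ! 13-km horizontal grid spacing
 e_vert          = 50,     ! 50 vertical levels
 p_top_requested = 1000,   ! model top at 10 hPa (= 1000 Pa)
/
&physics
 mp_physics         = 8,   ! Thompson microphysics
 cu_physics         = 3,   ! Grell-Freitas convective scheme
 ra_lw_physics      = 4,   ! RRTMG longwave
 ra_sw_physics      = 4,   ! RRTMG shortwave
 bl_pbl_physics     = 5,   ! MYNN level-2.5 PBL
 sf_sfclay_physics  = 5,   ! MYNN surface layer
 sf_surface_physics = 2,   ! unified Noah land surface model
/
```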

REFERENCES

  • Bauer, P., A. Thorpe, and G. Brunet, 2015: The quiet revolution of numerical weather prediction. Nature, 525, 47–55, https://doi.org/10.1038/nature14956.
  • Ben Bouallègue, Z., and D. S. Richardson, 2021: On the ROC area of ensemble forecasts for rare events. Wea. Forecasting, 37, 787–796, https://doi.org/10.1175/WAF-D-21-0195.1.
  • Benjamin, S. G., and Coauthors, 2016: A North American hourly assimilation and model forecast cycle: The Rapid Refresh. Mon. Wea. Rev., 144, 1669–1694, https://doi.org/10.1175/MWR-D-15-0242.1.
  • Buizza, R., P. L. Houtekamer, Z. Toth, G. Pellerin, M. Wei, and Y. Zhu, 2005: A comparison of the ECMWF, MSC, and NCEP global ensemble prediction systems. Mon. Wea. Rev., 133, 1076–1097, https://doi.org/10.1175/MWR2905.1.
  • Bush, M., and Coauthors, 2020: The first Met Office Unified Model–JULES regional atmosphere and land configuration, RAL1. Geosci. Model Dev., 13, 1999–2029, https://doi.org/10.5194/gmd-13-1999-2020.
  • Casati, B., and Coauthors, 2008: Forecast verification: Current status and future directions. Meteor. Appl., 15, 3–18, https://doi.org/10.1002/met.52.
  • Ellrod, G. P., and D. I. Knapp, 1992: An objective clear-air turbulence forecasting technique: Verification and operational use. Wea. Forecasting, 7, 150–165, https://doi.org/10.1175/1520-0434(1992)007<0150:AOCATF>2.0.CO;2.
  • Ellrod, G. P., and J. A. Knox, 2010: Improvements to an operational clear-air turbulence diagnostic index by addition of a divergence trend term. Wea. Forecasting, 25, 789–798, https://doi.org/10.1175/2009WAF2222290.1.
  • Gill, P. G., 2016: Aviation turbulence forecast verification. Aviation Turbulence: Processes, Detection, Prediction, R. Sharman and T. Lane, Eds., Springer, 261–283, https://doi.org/10.1007/978-3-319-23630-8_13.
  • Gill, P. G., and P. Buchanan, 2014: An ensemble based turbulence forecasting system. Meteor. Appl., 21, 12–19, https://doi.org/10.1002/met.1373.
  • Grell, G. A., and S. R. Freitas, 2014: A scale and aerosol aware stochastic convective parameterization for weather and air quality modeling. Atmos. Chem. Phys., 14, 5233–5250, https://doi.org/10.5194/acp-14-5233-2014.
  • Haiden, T., M. Janousek, F. Vitart, Z. Ben-Bouallegue, L. Ferranti, C. Prates, and D. Richardson, 2021: Evaluation of ECMWF forecasts, including the 2020 upgrade. ECMWF Tech. Memo. 880, 54 pp., https://www.ecmwf.int/en/elibrary/19879-evaluation-ecmwf-forecasts-including-2020-upgrade.
  • Iacono, M. J., J. S. Delamere, E. J. Mlawer, M. W. Shephard, S. A. Clough, and W. D. Collins, 2008: Radiative forcing by long-lived greenhouse gases: Calculations with the AER radiative transfer models. J. Geophys. Res., 113, D13103, https://doi.org/10.1029/2008JD009944.
  • Kim, J.-H., W. N. Chan, B. Sridhar, and R. Sharman, 2015: Combined winds and turbulence prediction system for automated air-traffic management applications. J. Appl. Meteor. Climatol., 54, 766–784, https://doi.org/10.1175/JAMC-D-14-0216.1.
  • Kim, J.-H., R. Sharman, M. Strahan, J. W. Scheck, C. Bartholomew, J. C. H. Cheung, P. Buchanan, and N. Gait, 2018: Improvements in nonconvective aviation turbulence prediction for the World Area Forecast System. Bull. Amer. Meteor. Soc., 99, 2295–2311, https://doi.org/10.1175/BAMS-D-17-0117.1.
  • Kim, S.-H., H.-Y. Chun, D.-B. Lee, J.-H. Kim, and R. D. Sharman, 2021: Improving numerical weather prediction–based near-cloud aviation turbulence forecasts by diagnosing convective gravity wave breaking. Wea. Forecasting, 36, 1735–1757, https://doi.org/10.1175/WAF-D-20-0213.1.
  • Ko, H.-C., H.-Y. Chun, R. Wilson, and M. A. Geller, 2019: Characteristics of atmospheric turbulence retrieved from high vertical-resolution radiosonde data in the United States. J. Geophys. Res. Atmos., 124, 7553–7579, https://doi.org/10.1029/2019JD030287.
  • Lee, D.-B., 2021: Development and evaluation of global Korean aviation turbulence forecast systems using the operational numerical weather prediction model outputs. Ph.D. dissertation, Yonsei University, 143 pp., https://dcollection.yonsei.ac.kr/common/orgView/000000539276.
  • Lee, D.-B., H.-Y. Chun, and J.-H. Kim, 2020: Evaluation of multimodel-based ensemble forecasts for clear-air turbulence. Wea. Forecasting, 35, 507–521, https://doi.org/10.1175/WAF-D-19-0155.1.
  • Nakanishi, M., and H. Niino, 2009: Development of an improved turbulence closure model for the atmospheric boundary layer. J. Meteor. Soc. Japan, 87, 895–912, https://doi.org/10.2151/jmsj.87.895.
  • NCEP, 2022: Upgrade NCEP Global Forecast System to v16.3.0: Effective November 29, 2022. NCEP Service Change Notice SCN22-104 Updated, 5 pp., https://www.weather.gov/media/notification/pdf2/scn22-104_gfs.v16.3.0_aaa.pdf.
  • Seity, Y., P. Brousseau, S. Malardel, G. Hello, P. Bénard, F. Bouttier, C. Lac, and V. Masson, 2011: The AROME-France convective-scale operational model. Mon. Wea. Rev., 139, 976–991, https://doi.org/10.1175/2010MWR3425.1.
  • Sharman, R. D., and J. M. Pearson, 2017: Prediction of energy dissipation rates for aviation turbulence. Part I: Forecasting nonconvective turbulence. J. Appl. Meteor. Climatol., 56, 317–337, https://doi.org/10.1175/JAMC-D-16-0205.1.
  • Sharman, R. D., C. Tebaldi, G. Wiener, and J. Wolff, 2006: An integrated approach to mid- and upper-level turbulence forecasting. Wea. Forecasting, 21, 268–287, https://doi.org/10.1175/WAF924.1.
  • Sharman, R. D., L. B. Cornman, G. Meymaris, T. Farrar, and J. Pearson, 2014: Description and derived climatologies of automated in situ eddy-dissipation-rate reports of atmospheric turbulence. J. Appl. Meteor. Climatol., 53, 1416–1432, https://doi.org/10.1175/JAMC-D-13-0329.1.
  • Slingo, J., and T. Palmer, 2011: Uncertainty in weather and climate prediction. Philos. Trans. Roy. Soc., A369, 4751–4767, https://doi.org/10.1098/rsta.2011.0161.
  • Storer, L. N., P. G. Gill, and P. D. Williams, 2019: Multi-model ensemble predictions of aviation turbulence. Meteor. Appl., 26, 416–428, https://doi.org/10.1002/met.1772.
  • Storer, L. N., P. G. Gill, and P. D. Williams, 2020: Multi-diagnostic multi-model ensemble forecasts of aviation turbulence. Meteor. Appl., 27, e1885, https://doi.org/10.1002/met.1885.
  • Swinbank, R., and Coauthors, 2016: The TIGGE project and its achievements. Bull. Amer. Meteor. Soc., 97, 49–67, https://doi.org/10.1175/BAMS-D-13-00191.1.
  • Tewari, M., and Coauthors, 2004: Implementation and verification of the unified Noah land surface model in the WRF model. 20th Conf. on Weather Analysis and Forecasting/16th Conf. on Numerical Weather Prediction, Seattle, WA, Amer. Meteor. Soc., 14.2a, https://ams.confex.com/ams/84Annual/techprogram/paper_69061.htm.
  • Thompson, G., P. R. Field, R. M. Rasmussen, and W. D. Hall, 2008: Explicit forecasts of winter precipitation using an improved bulk microphysics scheme. Part II: Implementation of a new snow parameterization. Mon. Wea. Rev., 136, 5095–5115, https://doi.org/10.1175/2008MWR2387.1.
  • Thompson, G., M. K. Politovich, and R. M. Rasmussen, 2017: A numerical weather model’s ability to predict characteristics of aircraft icing environments. Wea. Forecasting, 32, 207–221, https://doi.org/10.1175/WAF-D-16-0125.1.
  • Trier, S. B., R. D. Sharman, D. Munoz-Esparza, and T. L. Keller, 2022: Effects of distant organized convection on forecasts of widespread clear-air turbulence. Mon. Wea. Rev., 150, 2593–2615, https://doi.org/10.1175/MWR-D-22-0077.1.
  • Walters, D., and Coauthors, 2019: The Met Office Unified Model Global Atmosphere 7.0/7.1 and JULES Global Land 7.0 configurations. Geosci. Model Dev., 12, 1909–1963, https://doi.org/10.5194/gmd-12-1909-2019.
  • Wilks, D. S., 2011: Statistical Methods in the Atmospheric Sciences. 3rd ed. International Geophysics Series, Vol. 100, Academic Press, 704 pp.
  • Xu, M., G. Thompson, D. R. Adriaansen, and S. D. Landolt, 2019: On the value of time-lag-ensemble averaging to improve numerical model predictions of aircraft icing conditions. Wea. Forecasting, 34, 507–519, https://doi.org/10.1175/WAF-D-18-0087.1.
  • Zängl, G., D. Reinert, P. Rípodas, and M. Baldauf, 2015: The ICON (ICOsahedral Non-hydrostatic) modelling framework of DWD and MPI-M: Description of the non-hydrostatic dynamical core. Quart. J. Roy. Meteor. Soc., 141, 563–579, https://doi.org/10.1002/qj.2378.
  • Zhou, X., Y. Zhu, D. Hou, Y. Luo, J. Peng, and R. Wobus, 2017: Performance of the new NCEP ensemble global forecast system in a parallel experiment. Wea. Forecasting, 32, 1989–2004, https://doi.org/10.1175/WAF-D-17-0023.1.