1. Introduction
Encounters with aircraft-scale turbulence (about 10–1000 m) are a major meteorological hazard for the aviation industry, as they can cause aircraft damage and injuries to cabin crews and passengers. As this hazard is a significant safety issue and can cost airlines millions of dollars, aviation experts and researchers have investigated ways of avoiding turbulence encounters through onboard or remote detection techniques and improved forecasting (Sharman et al. 2012; Sharman and Lane 2016).
In general, aviation turbulence is classified according to its generation source. One source is convective systems such as convective clouds or thunderstorms. Turbulence associated with convective systems [i.e., convectively induced turbulence (CIT)] can be further differentiated depending on whether it is generated outside or inside the cloud boundary (e.g., Sharman and Lane 2016; Kim et al. 2019). Moreover, turbulence generated by convective gravity wave (CGW) propagation and breaking outside clouds is called near-cloud turbulence (NCT) (Lane et al. 2012). Turbulence not associated with convective systems is classified as clear-air turbulence (CAT), which is mainly caused by strong wind shear and unbalanced flow near upper-level fronts and jet streams (e.g., Dutton and Panofsky 1970; Ellrod and Knapp 1992; Knox 1997). The propagation and breaking of mountain waves is another major source of turbulence in clear air, which is called mountain wave turbulence (MWT) (Sharman et al. 2012; Sharman and Lane 2016). As CAT, MWT, and NCT are invisible and cannot be detected even indirectly through onboard radars that detect convective clouds, avoiding them during flight is problematic.
The common method of turbulence forecasting is based on numerical weather prediction (NWP) model output, which can provide information on regions in which turbulence will likely occur in the future and allow strategic planning for the safe flight of commercial air carriers (Sharman and Pearson 2017). It is still impossible for NWP model-based forecast systems to resolve aircraft-scale turbulence explicitly, although the latest global NWP models have greatly improved with advances in computing technology. Therefore, most turbulence forecast systems assume an energy cascade from large-scale disturbances to small-scale eddies that are responsible for turbulence (e.g., Tung and Orlando 2003; Sharman et al. 2006; Kim et al. 2011) and predict potential turbulence regions by computing turbulence diagnostics from NWP model output, which resolves synoptic or mesoscale atmospheric disturbances. Most developed turbulence diagnostics are related to CAT prediction and mostly consist of evaluations of strong spatial gradients of meteorological parameters (i.e., temperature and wind components), many of which originate in jet streams and frontal systems. For MWT, Sharman and Pearson (2017) proposed new MWT diagnostics that combine CAT diagnostics with a two-dimensional MWT-related parameter derived from the low-level wind speed, terrain height, and terrain-height gradient; these were also tested together with CAT diagnostics to improve the World Area Forecast System (Kim et al. 2018).
It has been challenging to develop CIT or NCT diagnostics, as it is difficult to represent the meteorological factors related to CIT or NCT generation in current NWP models. Although the current operational turbulence forecast systems have considered CIT or NCT implicitly based on CAT diagnostics since they are intended to capture strong spatial gradients in or outside clouds (Sharman and Pearson 2017; Kim et al. 2018, 2019), CIT or NCT diagnostics that explicitly represent their generation mechanisms have not been developed so far. However, recently Kim et al. (2019) developed NCT diagnostics related to CGW breaking based on a CGW drag (CGWD) parameterization scheme originally proposed by Chun and Baik (1998), and examined the feasibility of the NCT diagnostics using the outputs of the multiscale numerical simulations for two real NCT cases.
Sharman et al. (2006) first introduced the graphical turbulence guidance (GTG) system for forecasting turbulence at upper levels over the United States. The system was developed based on an integrated approach that combines several CAT diagnostics with optimal weighting scores. They showed that the ensemble mean of the diagnostics provides better forecast skill than the individual diagnostics, which has also been demonstrated in other studies (e.g., Kim et al. 2011; Kim and Chun 2012; Gill and Buchanan 2014; Lee and Chun 2014; Kim et al. 2015, 2018). Sharman and Pearson (2017) upgraded the GTG system (GTG version 3; GTG3) by including the following key improvements: (i) the range of flight altitudes is extended from the surface to 50 000 ft; (ii) the GTG system produces a forecast of the cube root of energy dissipation rate [EDR; m^(2/3) s^(−1)], which is the standard atmospheric turbulence metric of the International Civil Aviation Organization (ICAO 2001); and (iii) new MWT diagnostics proposed in Sharman and Pearson (2017) are implemented.
In South Korea, the Korean aviation turbulence guidance (KTG) system was developed (Kim and Chun 2012; Lee and Chun 2014) using pilot reports (PIREPs) and the output of a regional operational weather forecasting system of the Korean Meteorological Administration (KMA), named the Regional Data Assimilation and Prediction System (RDAPS), based on an integrated approach similar to GTG. The KTG system consists of the 20 best-performing CAT diagnostics for East Asia and provides the ensemble mean forecasts of those diagnostics at flight level (FL) 100–FL450 (10 000–45 000 ft). KTG has been operationally used by the Aviation Meteorological Office (AMO) of Korea since 2012 (http://global.amo.go.kr). Lee and Chun (2018) extended the KTG system to cover the global region (i.e., G-KTG system) based on the operational Global Data Assimilation and Prediction System (GDAPS) of KMA with a horizontal grid spacing of 17 km.
Given that both turbulence diagnostics and the underlying NWP model have inherent uncertainties, using only deterministic forecasts will not provide the degree of confidence required for turbulence prediction. To mitigate uncertainties and provide more reliable turbulence predictions, studies on the development of probabilistic turbulence forecasts using ensemble systems have been conducted (e.g., Gill and Buchanan 2014; Kim et al. 2015, 2018; Storer et al. 2019; Lee et al. 2020). Gill and Buchanan (2014) computed probabilistic turbulence forecasts using the ensemble members of Met Office Global and Regional Ensemble Prediction System (MOGREPS; Bowler et al. 2008), while Kim et al. (2018) suggested a multi-diagnostic approach for computing probabilistic turbulence forecasts using a single forecast from the Global Ensemble Forecast System (Sela 2010). Recently, Storer et al. (2019) and Lee et al. (2020) evaluated multimodel ensemble-based probabilistic forecasts using two ensemble systems (MOGREPS and European Centre for Medium-Range Weather Forecasts; Molteni et al. 1996) and seven ensemble systems of the THORPEX Interactive Grand Global Ensemble (TIGGE; Swinbank et al. 2016), respectively. Most of those previous studies have shown that probabilistic turbulence forecasts have better performance skills than deterministic turbulence forecasts.
As the operational GDAPS of the KMA has been upgraded with a finer grid spacing of 10 km since June 2018, a new G-KTG system based on the latest GDAPS is desired to provide continuous turbulence predictions to forecasters, dispatchers, and pilots. In this study, the G-KTG system is upgraded using the latest operational GDAPS data of the KMA with a novel and advanced approach that considers all possible sources of turbulence, including CAT, MWT, and NCT. To determine the suite of individual turbulence diagnostics that optimizes the performance skill of the G-KTG, both suites of the individual turbulence diagnostics that compose the GTG3 and KTG combinations are tested. In the G-KTG system presented in this study, the NCT diagnostics developed by Kim et al. (2019) are included in addition to the CAT and MWT diagnostics. Also, a global Korean probabilistic turbulence forecast (G-KPT) system is developed using GDAPS based on a multi-diagnostic approach. The statistical performance skills of the G-KTG and G-KPT are evaluated against one year (September 2018–August 2019) of global in situ turbulence observations. Although the focus here is on the development of a turbulence forecast system for use by the KMA, the results should be generally useful for other applications as well. Section 2 describes the GDAPS data, global turbulence observations, and the methodologies of the G-KTG and G-KPT systems. In section 3, the results of the statistical evaluations of the G-KTG and G-KPT systems against one year of turbulence observations are presented. Last, a summary and conclusions are provided in section 4.
2. Data and methodology
a. GDAPS and turbulence observation data
The current operational GDAPS of the KMA is based on the Unified Model (UM) of the U.K. Met Office (Prasanna et al. 2018; https://www.kma.go.kr/aboutkma/intro/supercom/model/model_manage_2020.jsp). The latest version of GDAPS provides midterm weather forecasts over the globe with a horizontal resolution of N1280 (∼10 km in midlatitudes). The GDAPS runs four times a day (at 0000, 0600, 1200, and 1800 UTC) out to a maximum lead time of 288 h. In this study, to develop the G-KTG and G-KPT systems, the GDAPS consisting of 26 pressure levels with a model top of 0.4 hPa is used.
As in previous studies (e.g., Gill and Buchanan 2014; Sharman and Pearson 2017; Kim et al. 2018), turbulence observations from in situ equipped aircraft are used to evaluate the discrimination skills of the G-KTG and G-KPT systems against null and moderate or greater (MOG) turbulence reports. In situ EDR data from two sources are used as turbulence observations in this study. The first dataset consists of EDR estimates computed in real time using an EDR algorithm based on vertical winds (Cornman 2016; Sharman et al. 2014). The EDR estimates used were recorded by United Airlines, Southwest Airlines, and Delta Air Lines. These data are obtained from NCAR for research purposes (hereafter, NCAR-EDR). The data include 1-min mean and peak values of EDR computed continuously at 1-min intervals, which are downlinked routinely at 15- or 20-min intervals, or immediately when the estimated EDR exceeds a certain threshold (Sharman et al. 2014; Sharman and Pearson 2017).
The second dataset consists of EDR estimates converted from the derived equivalent vertical gust velocity (DEVG) (e.g., Sherman 1985; Hoblit 1988) of the Aircraft Meteorological Data Relay (AMDAR) data (Kim et al. 2017, 2020). The AMDAR data provide EDR or DEVG at various reporting frequencies from major airlines currently using EDR or DEVG estimation algorithms. The EDR and DEVG data mainly cover the Northern Hemisphere (NH) and Southern Hemisphere (SH), respectively (Kim et al. 2020). To evaluate the G-KTG and G-KPT systems using consistent turbulence observations covering both hemispheres, DEVG from AMDAR is converted into EDR using the best-fit quadratic equation established in Kim et al. (2017). Kim et al. (2017) established two best-fit quadratic equations for the conversion of DEVG into EDR based on a one-to-one comparison of the DEVG and EDR observations recorded by Boeing and Airbus aircraft. Kim et al. (2020) then evaluated the DEVG-derived EDR, estimated using the quadratic equation for the Boeing aircraft of Kim et al. (2017), against in situ flight EDR estimates and showed that the two were generally consistent, which bolsters confidence in the use of both in situ flight EDR and DEVG converted to EDR in the evaluations. In this study, following Kim et al. (2020), the DEVG-derived EDR (hereafter, AMDAR-EDR) is obtained based on the best-fit quadratic equation for Boeing aircraft established in Kim et al. (2017). In addition, because the DEVG recorded from climbing or descending aircraft can include erroneous values, the EDR conversion is performed using only the DEVG values at cruising levels above 15 000 ft (Kim et al. 2017, 2020).
In this study, null and MOG turbulence events observed from NCAR-EDR and AMDAR-EDR data within one hour of the valid NWP time (1800 UTC) for one year (September 2018–August 2019) are used to evaluate the G-KTG and G-KPT forecasts. Although the GDAPS is produced four times a day, for efficiency we focus on 1800 UTC, at which a large volume of turbulence observation data is recorded by the U.S. carriers, following previous studies (Sharman and Pearson 2017; Kim et al. 2018; Lee et al. 2020). To categorize the turbulence intensity, a threshold of EDR ≥ 0.22 m^(2/3) s^(−1) for MOG turbulence (Sharman and Pearson 2017; Kim et al. 2018) is applied to both the AMDAR-EDR and NCAR-EDR, which differs slightly from the threshold value of EDR ≥ 0.2 m^(2/3) s^(−1) established by ICAO (ICAO 2018). For null turbulence, the threshold of EDR < 0.05 m^(2/3) s^(−1) is applied to both types of turbulence observations. Although Sharman and Pearson (2017) applied the threshold of EDR < 0.15 m^(2/3) s^(−1) for null turbulence, this study uses a lower EDR threshold to reduce the large number of repetitious null turbulence events recorded at regular 1-min intervals.
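The thresholding described above can be illustrated with a minimal sketch. The function name `classify_edr` and the sample values are hypothetical and are not taken from the study; only the two threshold values come from the text. Reports that fall between the null and MOG thresholds are simply left unlabeled here, on the assumption that they are not used in the evaluation.

```python
from typing import Optional

# Thresholds quoted in the text, in m^(2/3) s^(-1).
MOG_THRESHOLD = 0.22   # EDR >= 0.22: moderate-or-greater (MOG) turbulence
NULL_THRESHOLD = 0.05  # EDR <  0.05: null turbulence


def classify_edr(edr: float) -> Optional[str]:
    """Label one EDR report as 'MOG', 'null', or None (neither category)."""
    if edr >= MOG_THRESHOLD:
        return "MOG"
    if edr < NULL_THRESHOLD:
        return "null"
    return None  # between the two thresholds: not labeled in this sketch


# Hypothetical EDR reports, for illustration only.
reports = [0.01, 0.04, 0.05, 0.12, 0.22, 0.35]
labels = [classify_edr(e) for e in reports]
```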
Figure 1 shows the horizontal distributions of the null and MOG turbulence events from the NCAR-EDR and AMDAR-EDR data for three altitude bands [upper levels: FL200–FL500 (20 000–50 000 ft), midlevels: 10 kft MSL–FL200, and low levels: surface–10 kft MSL]. The numbers of null and MOG turbulence events shown in Fig. 1 are tabulated in Table 1. At upper levels, the NCAR-EDR observations (Fig. 1a) cover the North American, North Atlantic, and Pacific regions along the major international flight routes of U.S. airlines, with the vast majority over the conterminous United States. By contrast, the null and MOG turbulence events from the AMDAR-EDR observations (Fig. 1b) are mainly reported in the SH over the Indian Ocean, Australia, South Africa, South America, and Southern Pacific Ocean where NCAR-EDR observations are rare because the major airlines operating in the SH (e.g., Qantas and South African Airways) mainly record DEVG (Kim et al. 2020). At mid- and low levels (Figs. 1c–e), most null and MOG turbulence events are observed near major airports in the United States since most turbulence events at mid- and low levels originate from the NCAR-EDR data. Some turbulence events are observed over South Africa, Australia, and New Zealand at midlevels (Fig. 1d). The results show that the use of the two EDR observation datasets facilitates the performance evaluations of G-KTG and G-KPT globally, encompassing both the NH and SH, although the observation data at mid- and low levels have limitations in that coverage is mainly over the United States. Also, note that the AMDAR data at cruising altitudes are reported at intervals of 3–7 min (WMO 2014) while the NCAR-EDR data have a temporal resolution of 1 min, thus possibly affecting the evaluation results when both observation sources are used.
Fig. 1. Spatial distributions of null- and MOG-level turbulence events at (a),(b) upper, (c),(d) mid-, and (e) low levels over the global region from (left) NCAR-EDR and (right) AMDAR-EDR observations reported within 1 h centered at 1800 UTC for 1 year (September 2018–August 2019). Null-, moderate-, and severe-level turbulence reports are depicted as green, orange, and red dots, respectively. Black boxes in (a) represent the U.S. and East Asia regions considered in the evaluation of the G-KTG combinations in section 3.
Citation: Weather and Forecasting 37, 3; 10.1175/WAF-D-21-0095.1
Table 1. Numbers corresponding to null-, moderate-, and severe-level turbulence reported from NCAR-EDR and AMDAR-EDR data at low (surface–10 kft MSL), mid- (10 kft MSL–FL200), and upper levels (FL200–FL500) over the global region within 1 h centered at 1800 UTC during a 1-yr period (September 2018–August 2019), which are used in the evaluation of G-KTG.
b. G-KTG system (deterministic)
In this study, the G-KTG is developed based on the GTG3 methodology and the three-step procedure outlined in Sharman and Pearson (2017) and Kim et al. (2018).
1) Computing individual turbulence diagnostics
The first step is to compute the individual turbulence diagnostics using the meteorological variables from the GDAPS output. In addition to CAT and MWT diagnostics, the G-KTG includes the newly developed NCT diagnostics (Kim et al. 2019), which are not yet included in most operational turbulence forecast systems. Kim et al. (2021) showed that adding NCT diagnostics to the CAT turbulence forecasts can improve the performance skill. Among the eight NCT diagnostics proposed by Kim et al. (2019), only two NCT diagnostics—CGWD and EDR induced by CGWD (EDRCGWD)—are used in this study (Kim et al. 2021). Formulations of CGWD and EDRCGWD can be found in Eqs. (1)–(10) of Kim et al. (2019, 2021).
The CAT and MWT diagnostics used for G-KTG are taken from the current ensemble of GTG3 and KTG component diagnostics. The suites of turbulence diagnostics applied to the G-KTG combination differ between low (surface–10 kft MSL), mid- (10 kft MSL–FL200), and upper levels (FL200–FL500). A list of the CAT and MWT diagnostics used in GTG3 and KTG for each altitude band is provided in Tables 2 and 3, respectively. While the GTG3 suites of turbulence diagnostics are provided for upper, mid-, and low levels (Table 2), the suites of component diagnostics from KTG are only applied to upper and midlevels (Table 3) since KTG was not developed for low levels. For MWT diagnostics, the suite used in GTG3 is applied. The detailed formulations and references of CAT and MWT diagnostics used in this study are presented in Sharman et al. (2006), Jang et al. (2009), Kim et al. (2009), and Sharman and Pearson (2017).
Table 2. Names, coefficients a and b used in the EDR remapping equation, and AUC values for individual CAT, MWT, and NCT diagnostics, which are components of the Group-1-based G-KTG combinations, derived from the GDAPS 12-h forecast outputs during a 1-yr period (September 2018–August 2019) at upper, mid-, and low levels.
Table 3. As in Table 2, but for the CAT diagnostics, which are components of the Group-2-based G-KTG combinations, at upper and midlevels. MWT and NCT diagnostics that compose the Group-2-based G-KTG are the same as shown in Table 2.
The NCT diagnostics are based on the CGWD parameterization scheme formulated by Chun and Baik (1998), which uses the convective heating rate (CHR), an essential but nonstandard output in most global NWP models including GDAPS. To overcome this limitation, Kim et al. (2021) estimated the CHR using the lapse rate and vertical velocity. The estimated CHR showed good agreement with the CHR calculated directly from the cumulus parameterization scheme [see Figs. 1 and 2 of Kim et al. (2021)] based on two months (January and July 2017) of GDAPS simulations. In this study, we estimate the CHR following the method of Kim et al. (2021) [see Eqs. (10)–(12) of Kim et al. (2021)]. The two NCT diagnostics are calculated using this postprocessed CHR and are applied only at upper levels (Table 2), since NCT diagnostics are calculated above the convective cloud top height.
2) Converting turbulence diagnostics to EDR
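For reference, the remapping that the definitions in this subsection refer to can be sketched as follows. This is a reconstruction that assumes the lognormal mapping of the GTG3 methodology (Sharman and Pearson 2017), which this study follows; D_i denotes the ith raw diagnostic, and a and b are the remapping coefficients listed in Tables 2 and 3.

```latex
\ln \varepsilon^{1/3} = a + b \,\ln D_i ,
\qquad
b = \frac{\sigma\!\left(\ln \varepsilon^{1/3}\right)}{\sigma\!\left(\ln D_i\right)} ,
\qquad
a = \overline{\ln \varepsilon^{1/3}} - b\,\overline{\ln D_i} .
```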
Here, the overbar and σ denote the mean and standard deviation, respectively, and ε^(1/3) is the climatological observed turbulence intensity in the troposphere and lower stratosphere. Based on the premise that the NWP model-based turbulence diagnostics and observed EDR follow a lognormal distribution (e.g., Cho et al. 2003; Sharman and Pearson 2017), the mean and standard deviation of ln D_i and ln ε^(1/3) for a and b are estimated based on the lognormal distribution that best represents the probability density functions (PDFs) of the turbulence diagnostics and observed EDR, respectively. In this study, the
Fig. 2. Example probability density functions (circles) of individual turbulence diagnostics used as G-KTG component diagnostics at (top) upper, (middle) mid-, and (bottom) low levels, which are derived from the GDAPS 12-h forecast of 1 year (September 2018–August 2019). The curves are the best lognormal curve fits for the range of relatively high values of the turbulence diagnostics (filled circles). The vertical dashed lines represent the values of the turbulence diagnostics that correspond to the EDR thresholds of 0.15, 0.22, and 0.34 m^(2/3) s^(−1) for light-, moderate-, and severe-level turbulence, respectively.
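The conversion of a raw diagnostic to EDR can be sketched in a few lines. This is a simplified illustration, not the procedure of the study: it assumes the coefficients a and b are obtained from the sample mean and standard deviation of ln D, whereas the text describes fitting a lognormal curve to the PDF of the diagnostic. The climatological values `LN_EDR_MEAN` and `LN_EDR_STD`, the function names, and the sample diagnostic values are all hypothetical placeholders.

```python
import math
from statistics import mean, pstdev

# Hypothetical climatological mean and standard deviation of ln(EDR);
# placeholders only, not values from the study.
LN_EDR_MEAN = -2.5
LN_EDR_STD = 0.5


def remap_coefficients(diag_values, ln_edr_mean, ln_edr_std):
    """Return (a, b) so that ln(EDR) = a + b * ln(D) matches the target
    mean and standard deviation (simplification: sample moments of ln D)."""
    ln_d = [math.log(d) for d in diag_values if d > 0.0]
    b = ln_edr_std / pstdev(ln_d)
    a = ln_edr_mean - b * mean(ln_d)
    return a, b


def to_edr(d, a, b):
    """Map one positive raw diagnostic value to EDR."""
    return math.exp(a + b * math.log(d))


# Hypothetical raw values of a single diagnostic.
diag = [1e-6, 2e-6, 5e-6, 1e-5, 4e-5]
a, b = remap_coefficients(diag, LN_EDR_MEAN, LN_EDR_STD)
edr = [to_edr(d, a, b) for d in diag]
```

Because the mapping is linear in log space with b > 0, it preserves the ordering of the raw diagnostic values while placing every diagnostic on the common EDR scale.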
3) Ensemble mean of turbulence diagnostics
In addition to the G-KTG forecasts for CAT, MWT, and NCT, three types of G-KTG combinations can be generated: (i) CAT and MWT, (ii) CAT and NCT, and (iii) CAT, NCT, and MWT. Here, the NCT-MWT combination is excluded because these turbulence diagnostics can only predict turbulence in specified source areas, such as mountains and convective systems. Each of these three combinations is derived by taking the maximum of the component G-KTG forecasts at each grid point. For example, the combination of CAT and MWT G-KTG forecasts is obtained as the maximum of the CAT and MWT G-KTG forecasts. In this study, a suite of GTG3 CAT diagnostics and a suite of KTG CAT diagnostics are compared, and the G-KTG combination sets including each of those suites of CAT diagnostics are classified into Group-1 and Group-2, respectively.
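The gridpoint-wise maximum can be sketched as below. The helper `combine_max` and the sample values are hypothetical; each input is assumed to be an already-computed G-KTG forecast (in EDR units) on the same grid, flattened to a list for simplicity.

```python
def combine_max(*fields):
    """Gridpoint-wise maximum of several forecasts on the same grid."""
    return [max(values) for values in zip(*fields)]


# Hypothetical G-KTG forecasts (EDR) at four grid points.
cat = [0.10, 0.25, 0.05, 0.30]
mwt = [0.02, 0.40, 0.01, 0.10]
nct = [0.00, 0.00, 0.35, 0.00]

max_cat_mwt = combine_max(cat, mwt)           # (i)   CAT and MWT
max_cat_nct = combine_max(cat, nct)           # (ii)  CAT and NCT
max_cat_mwt_nct = combine_max(cat, mwt, nct)  # (iii) CAT, NCT, and MWT
```

Taking the maximum rather than the mean keeps a localized signal, such as the third grid point in the NCT field, from being diluted by sources that are inactive there.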
c. G-KPT system (probabilistic)
In this study, the GDAPS-based global Korean probabilistic turbulence forecast (G-KPT) system is developed based on a multi-diagnostic approach, which can compensate for the lack of spread obtained when the members of an ensemble prediction system with a short forecast period (1–2 days) are used (Kim et al. 2018; Lee et al. 2020). This approach can also improve the forecasting performance for turbulence generated by various mechanisms. Following the multi-diagnostic ensemble approach proposed by Kim et al. (2018), the G-KPT forecast is calculated as the fraction of the GDAPS-based turbulence diagnostics, converted into EDR, that exceed the EDR threshold for MOG turbulence [EDR ≥ 0.22 m^(2/3) s^(−1)]. The G-KPT forecasts for CAT and MWT are derived based on 15 CAT and 15 MWT diagnostics, respectively. The NCT diagnostics are excluded from the G-KPT computation, as only two NCT diagnostics are considered in this study. Subsequently, the maximum of the CAT G-KPT and MWT G-KPT forecasts at each grid point is denoted as the maxG-KPT forecast. Among all the GTG3- and KTG-based CAT diagnostics, the 15 CAT diagnostics that show the best performance skill based on evaluations of the area under the relative operating characteristic (ROC) curve (AUC) statistic are chosen as the component diagnostics of the G-KPT forecast. The evaluation method is described in section 2d. The list of CAT diagnostics for the three altitude bands is shown in the appendix. For MWT forecasts, 15 MWT diagnostics including the orographic gravity wave drag parameterization of Palmer et al. (1986) (TKE_GWB) are considered, which are the same as in Sharman and Pearson (2017) and Kim et al. (2018). The MWT diagnostics used in the G-KPT forecast are also listed in the appendix.
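A minimal sketch of the multi-diagnostic probability follows. The function name `exceedance_fraction` and all sample values are hypothetical; the toy example uses four CAT and two MWT diagnostics at two grid points instead of the 15 diagnostics per source used in the study, and assumes each diagnostic has already been converted to EDR.

```python
MOG_THRESHOLD = 0.22  # m^(2/3) s^(-1), MOG threshold quoted in the text


def exceedance_fraction(edr_by_diagnostic, threshold=MOG_THRESHOLD):
    """Fraction of diagnostics at or above the threshold at each grid point.

    edr_by_diagnostic: one list per diagnostic, each holding the
    EDR-converted values at the same sequence of grid points.
    """
    n_diag = len(edr_by_diagnostic)
    return [
        sum(1 for value in point if value >= threshold) / n_diag
        for point in zip(*edr_by_diagnostic)
    ]


# Hypothetical EDR-converted diagnostics at two grid points.
cat_diags = [[0.30, 0.10], [0.25, 0.05], [0.10, 0.08], [0.40, 0.23]]
mwt_diags = [[0.05, 0.30], [0.10, 0.35]]

cat_kpt = exceedance_fraction(cat_diags)
mwt_kpt = exceedance_fraction(mwt_diags)
max_kpt = [max(c, m) for c, m in zip(cat_kpt, mwt_kpt)]
```

With N diagnostics the probability can only take the values 0, 1/N, ..., 1, so the number of component diagnostics sets the resolution of the forecast probability.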
d. Evaluation method
In this study, the statistical evaluations of the G-KTG and G-KPT forecasts are performed based on the probability of detection (POD) method (Mason 1982) that has been used in most previous studies of turbulence forecasting systems (e.g., Kim et al. 2011; Gill 2014; Kim and Chun 2016; Sharman and Pearson 2017; Storer et al. 2019). Fundamentally, the probability that the forecasts predict the occurrence of turbulence near the locations of the observed MOG turbulence events (PODY) and the probability that the forecasts do not predict the occurrence of turbulence near the locations of the observed null turbulence events (PODN) are derived by comparing each observation to the diagnostic values at the grid point closest to the observation. In this study, the forecast threshold for turbulence is set to 0.22 m^(2/3) s^(−1) (Sharman and Pearson 2017). By combining the computed PODY and PODN, the true skill statistic (TSS: TSS = PODY + PODN − 1; as recommended by Allouche et al. 2006) can be determined. Last, as in previous studies (e.g., Kim et al. 2011; Gill 2014; Kim and Chun 2016; Sharman and Pearson 2017; Storer et al. 2019; Lee et al. 2020), the AUC can also be derived. Here, the ROC curve is constructed from multiple pairs of the POFD (= 1 − PODN, the false-alarm rate) and PODY values derived by varying the EDR threshold from the minimum value to the maximum value of the forecast. The higher the PODY, PODN, TSS, and AUC values, the better the performance of the forecast.
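The scores above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the evaluation code of the study: the function names and sample values are hypothetical, the forecast values are assumed to have already been sampled at the grid point nearest each observation, and the AUC is approximated by trapezoidal integration over thresholds taken at every distinct forecast value.

```python
def pod_scores(forecast_at_mog, forecast_at_null, threshold):
    """Return (PODY, PODN) for one forecast threshold."""
    pody = sum(1 for f in forecast_at_mog if f >= threshold) / len(forecast_at_mog)
    podn = sum(1 for f in forecast_at_null if f < threshold) / len(forecast_at_null)
    return pody, podn


def tss(pody, podn):
    """True skill statistic: TSS = PODY + PODN - 1."""
    return pody + podn - 1.0


def auc(forecast_at_mog, forecast_at_null):
    """Area under the ROC curve of (POFD, PODY) pairs, trapezoidal rule."""
    values = sorted(set(forecast_at_mog) | set(forecast_at_null))
    # One extra threshold above the maximum closes the curve at (0, 0).
    thresholds = values + [values[-1] + 1.0]
    points = []
    for t in thresholds:
        pody, podn = pod_scores(forecast_at_mog, forecast_at_null, t)
        points.append((1.0 - podn, pody))  # (POFD, PODY)
    points.sort()
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical forecast EDR values at observed MOG and null locations.
fc_mog = [0.30, 0.25, 0.20, 0.40]
fc_null = [0.05, 0.10, 0.25, 0.15]

pody, podn = pod_scores(fc_mog, fc_null, threshold=0.22)
skill = tss(pody, podn)
area = auc(fc_mog, fc_null)
```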
In the statistical evaluation of the G-KPT forecast, a reliability diagram (Gill and Buchanan 2014; Kim et al. 2018; Storer et al. 2019) is additionally constructed by plotting the frequency of the observed MOG turbulence events against the forecast probability. The diagonal line in the diagram represents perfect reliability. The closer the reliability line of the forecast is to the diagonal line, the better the reliability of the forecast. If the slope of the reliability line is larger (smaller) than that of the diagonal line, the forecast tends to underforecast (overforecast) (e.g., Gill and Buchanan 2014; Kim et al. 2018). The horizontal line denotes the sample climatology, derived as the ratio of the number of observed turbulence events to the number of forecasts over the evaluation period. The farther the reliability line is from the sample climatology line, the better the resolution, which is a measure of how well the forecasts can distinguish situations with different frequencies of events.
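The construction of the reliability line can be sketched as below. The function name `reliability_curve`, the choice of five equal-width probability bins, and the sample data are assumptions made for illustration; the text does not state the binning used in the study.

```python
def reliability_curve(probabilities, observed_mog, n_bins=5):
    """Observed MOG frequency per forecast-probability bin.

    probabilities: forecast probabilities in [0, 1].
    observed_mog:  1 where MOG turbulence was observed, 0 for null.
    Returns (curve, climatology), where curve is a list of
    (bin centre, observed frequency) pairs for non-empty bins.
    """
    counts = [0] * n_bins
    hits = [0] * n_bins
    for p, o in zip(probabilities, observed_mog):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        counts[k] += 1
        hits[k] += o
    curve = [
        ((k + 0.5) / n_bins, hits[k] / counts[k])
        for k in range(n_bins)
        if counts[k] > 0
    ]
    climatology = sum(observed_mog) / len(observed_mog)
    return curve, climatology


# Hypothetical forecast probabilities and matching observations.
probs = [0.05, 0.10, 0.30, 0.35, 0.50, 0.70, 0.90, 0.95]
obs = [0, 0, 0, 1, 1, 0, 1, 1]
curve, climatology = reliability_curve(probs, obs)
frequencies = [freq for _, freq in curve]
```

Plotting `curve` against the diagonal and the horizontal `climatology` line gives the diagram described above; bins with few samples yield noisy frequencies, so in practice the bin counts are worth inspecting alongside the curve.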
3. Evaluation results
In this study, 12-h forecasts of the G-KTG and G-KPT systems valid at 1800 UTC, a lead time of operational interest for the global region (Sharman and Pearson 2017), are statistically evaluated against the two available EDR observation datasets (NCAR-EDR and AMDAR-EDR) during a 1-yr period (September 2018–August 2019). The performance of the G-KTG system is mainly evaluated based on the AUC.
a. Example of G-KTG and G-KPT forecasts
Figure 3 shows an example of various G-KTG forecasts at FL290 valid at 1800 UTC 13 February 2019. GTG3-CAT (Fig. 3a) and KTG-CAT (Fig. 3b), which are the combined CAT G-KTG forecasts of Group-1 and Group-2, respectively, both predict strong turbulence potential areas along the upper-level flow curvatures in midlatitudes, in which Kelvin–Helmholtz or inertial instabilities related to CAT generation can be induced. Although the turbulence areas in those two CAT G-KTG forecasts are quite similar, KTG-CAT shows stronger turbulence intensities and additional turbulence areas, particularly at low latitudes (30°S–30°N), likely because more CAT diagnostics (20 versus 11) are included in KTG-CAT than in GTG3-CAT. GTG3-MWT (Fig. 3c), which represents the combined MWT G-KTG forecast, predicts turbulence over the major mountainous regions worldwide, including strong turbulence areas over the western region of the United States and the Himalayas. In Fig. 3d, the NCT G-KTG forecast (maxNCT) predicts local turbulence areas at low and middle latitudes (60°S–60°N), in which convective systems often occur. This provides valuable information on turbulence potential regions at lower latitudes, which the CAT diagnostics cannot forecast well. The maxGCMN (Fig. 3e) and maxKCMN (Fig. 3f), which are the maximum of the CAT, MWT, and NCT G-KTG forecasts of Group-1 and Group-2, respectively, show turbulence potential areas related to the curved flow, mountain waves, and convective systems, with especially strong intensities over the Northeast Pacific, CONUS, Atlantic, and Tibetan Plateau.
Fig. 3. Example G-KTG 12-h forecasts of (a) GTG3-CAT, (b) KTG-CAT, (c) GTG3-MWT, (d) maxNCT, (e) maxGCMN, and (f) maxKCMN at FL290 valid at 1800 UTC 13 Feb 2019. The black contours represent the geopotential height with a 120-gpm interval.
Figure 4 shows the same G-KTG forecasts as in Fig. 3, but focused on the contiguous United States (CONUS) region and overlaid with the turbulence events observed from in situ flight EDR data. As can be seen, numerous events of MOG turbulence occurred over California, Texas, and the southeastern part of the CONUS region. Both GTG3-CAT (Fig. 4a) and KTG-CAT (Fig. 4b) forecasts indicate strong turbulence potential areas along the ridge throughout the United States and the trough in the eastern United States. Although both CAT forecasts capture the locations of MOG turbulence events well, the KTG-CAT forecast matches the observed MOG turbulence intensity better than the GTG3-CAT forecast. However, the large areas of strong turbulence in the KTG-CAT forecast increase the possibility of false alarms of MOG turbulence areas over the western United States, where no actual turbulence events were reported (Fig. 4b). In Fig. 4c, GTG3-MWT predicts high turbulence potential over the Rocky and Sierra Nevada Mountains and captures some MOG turbulence events observed over the western and eastern United States. However, GTG3-MWT overforecasts MOG intensity over Colorado and Wyoming, where actual turbulence is not observed. In Fig. 4d, the maxNCT predicts localized high-risk turbulence regions, which are likely generated by convective systems over the northeastern Pacific Ocean and California (Figs. 4g,h). The NEXRAD radar imagery (Fig. 4g) and GDAPS-derived cloud top height (Fig. 4h) confirm that the turbulence events over California occurred in the clear air near the clouds, implying the source is NCT. The moderate and severe turbulence events observed over California are well captured by the maxNCT forecast. In particular, the locations and intensities of severe turbulence events over California are predicted only by the maxNCT forecast, whereas CAT and MWT forecasts predict only moderate intensities. Unfortunately, as there are no in situ EDR data over the northeastern Pacific Ocean at that time, it is not possible to determine whether turbulence actually occurred in the regions predicted by maxNCT. In Figs. 4e and 4f, the maxGCMN and maxKCMN forecasts, which consider CAT, MWT, and NCT, predict almost all MOG turbulence events.
Fig. 4. As in Fig. 3, but focusing on the CONUS region where turbulence events occurred during the time examined. The black dot, blue asterisk, and pink triangle represent null-, moderate-, and severe-level turbulence events recorded in NCAR-EDR observations within 1 h centered at the valid time, respectively. To identify the location of clouds, NEXRAD radar reflectivity at 1755 UTC and cloud top height from GDAPS 12-h forecast valid at 1800 UTC 13 Feb 2019 are shown in (g) and (h), respectively.
Figure 5 shows an example of the CAT G-KPT, MWT G-KPT, and maxG-KPT forecasts at FL290 for the same valid time as in Fig. 3. In Fig. 5a, CAT G-KPT predicts strong turbulence likelihood along the upper anticyclonic and cyclonic flows, particularly over the Atlantic and North America. In Fig. 5b, MWT G-KPT predicts strong turbulence likelihood regions over the Himalayas and California. The turbulence areas from the two G-KPT forecasts coincide with those of relatively strong intensities shown in the G-KTG forecasts (Fig. 3), although the suites of CAT and MWT diagnostics used to construct the G-KTG and G-KPT systems are somewhat different. The maxG-KPT forecast (Fig. 5c) reflects well, as expected, all turbulence potential areas predicted by the CAT and MWT G-KPT forecasts.
As in Fig. 3, but for 12-h forecasts of (a) G-KPT for CAT, (b) G-KPT for MWT, and (c) maxG-KPT.
Figure 6 is the same as Fig. 5, except for focusing on the CONUS region superimposed on the observed EDRs. Again, the areas with high probability regions of CAT G-KPT and MWT G-KPT values shown in Figs. 5a and 5b nearly coincide with the turbulence areas predicted by the CAT G-KTG and MWT G-KTG forecasts (Figs. 4a,c), respectively. In general, the CAT G-KPT and MWT G-KPT forecasts predict well the MOG turbulence events observed over the CONUS region at the time considered (1800 UTC 13 February 2019), although the MWT G-KPT forecast misses some MOG turbulence events. Moreover, the CAT G-KPT and MWT G-KPT forecasts have high values near Colorado and Wyoming in which actual turbulence events do not occur, which increases the probability of false alarms. To understand why the turbulence is overpredicted in those areas, the simulated wind field of the GDAPS 12-h forecast was compared with that of ERA5 reanalysis data (not shown). The overall features of the anticyclonically curved flow located in the western United States are generally similar, although the wind field from GDAPS shows that the strong flow over 60 m s−1 is more widely distributed over the western United States, especially for the larger area of the jet core located near Colorado and Utah. This stronger and wider flow simulated from GDAPS is likely to lead to the overforecasting results of CAT and MWT diagnostics near Colorado and Wyoming. Consequently, maxG-KPT predicts almost all observed MOG turbulence events, but overforecasts turbulence likelihood over the western United States. These qualitative observations will be quantified in the next subsection.
As in Fig. 4, but for 12-h forecasts of (a) G-KPT for CAT, (b) G-KPT for MWT, and (c) maxG-KPT.
b. Statistical evaluation of G-KTG system
Figure 7 shows the AUC values for the combined G-KTGs based on Group-1 (using the GTG3-based CAT diagnostics) and for the individual diagnostics used in Group-1 for the three altitude bands. Here, maxGCN (maxGCM) indicates the maximum of the combined CAT G-KTG and NCT (MWT) G-KTG of Group-1. The AUC values of the individual CAT, MWT, and NCT diagnostics used in Group-1 are presented in Table 2. At all vertical levels, the AUC values of the NCT and MWT diagnostics are lower than those of the CAT diagnostics, because the NCT and MWT diagnostics predict small turbulence potential areas and miss many turbulence events. Additionally, the AUC values of the MWT and NCT diagnostics are evaluated against null and MOG turbulence events observed over MWT- and NCT-prone areas and are depicted as gray triangles in Fig. 7. In this study, the MWT- and NCT-prone areas are classified using the terrain height and cloud top height information from the GDAPS data, respectively. The MWT-prone areas are defined as the areas where the terrain height is higher than 200 m and the gradient of terrain height is larger than 5 m km−1 at each model grid point, which is consistent with the definition used by Sharman and Pearson (2017). Following Kim et al. (2021), turbulence observations reported within 80 km horizontally of a grid point where convection exists and within 4 km vertically of the cloud top height are classified as NCT events. In Fig. 7, the MWT and NCT diagnostics show better performance for turbulence events related to their respective generation sources than for all turbulence events.
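The two classification rules described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function and argument names are hypothetical, the horizontal distance to the nearest convective grid point is assumed to be precomputed, and the vertical criterion is assumed to be an absolute separation from the cloud top.

```python
def is_mwt_prone(terrain_height_m, terrain_gradient_m_per_km,
                 min_height_m=200.0, min_gradient_m_per_km=5.0):
    """Return True when a grid point is both high and steep enough.

    Thresholds follow the text (200 m and 5 m per km); names are hypothetical.
    """
    return (terrain_height_m > min_height_m
            and terrain_gradient_m_per_km > min_gradient_m_per_km)


def is_nct_event(distance_to_convection_km, obs_altitude_km, cloud_top_km,
                 max_horizontal_km=80.0, max_vertical_km=4.0):
    """Return True when an observation lies near a convective grid point.

    Assumption: "within 4 km vertically" means absolute separation from the
    cloud top height, above or below.
    """
    return (distance_to_convection_km <= max_horizontal_km
            and abs(obs_altitude_km - cloud_top_km) <= max_vertical_km)


# Example: a steep, high grid point and an observation 50 km from convection.
print(is_mwt_prone(1500.0, 12.0))      # meets both terrain criteria
print(is_nct_event(50.0, 11.0, 9.0))   # 50 km away, 2 km above cloud top
```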
AUC values (middle bar) of 12-h forecasts of G-KTG combinations based on Group-1 (GTG3) and of individual turbulence diagnostics used for composing the Group-1-based G-KTG combinations at (a) upper, (b) mid-, and (c) low levels. The evaluation using AUC is performed against null- and MOG-level turbulence observations reported within 1 h centered at 1800 UTC for 1 year (September 2018–August 2019). Upper and lower bars represent the upper and lower bound on a 95% confidence interval for AUCs of when each turbulence diagnostic is evaluated against the 1000 subsets constructed by randomly selecting a quarter of the 1-yr turbulence observations, respectively. The AUC values of MWT and NCT diagnostics against turbulence observations over the MWT- and NCT-prone areas, respectively, are depicted as a gray triangle.
At upper and low levels, the AUC value of the combined GTG3-CAT forecast is higher than that of any of the individual diagnostics (Figs. 7a,c; Tables 2 and 4), consistent with results from previous studies (e.g., Sharman et al. 2006; Kim et al. 2011; Gill and Buchanan 2014). Moreover, maxGCN at upper levels and maxGCM at low levels show similar or better performance than GTG3-CAT. However, the AUC values of maxGCM and maxGCMN are lower than even that of the best individual CAT diagnostic (here NCSU2/Ri) at upper levels, as the turbulence area is widened by including MWT diagnostics, which increases the false alarm rate. Possible methods that can reduce the overforecasting of MWT diagnostics and improve their performance are suggested as follows: (i) the terrain conditions (i.e., terrain height, terrain height gradient) considered when determining the mountainous area in which MWT diagnostics are calculated could be redefined for GDAPS, (ii) the coefficients a and b in the remapping equation could be reevaluated by changing the spatial range of model grid points considered in the PDF calculation, or by readjusting the range of diagnostic values considered in the fitting of the lognormal distribution, and (iii) instead of using only low-level wind speed to determine MW forcing, other variables, such as wind direction or imposed vertical velocity, could be used to improve the MWT diagnostics. At midlevels (Fig. 7b) the G-KTG combinations (GTG3-CAT, maxGCM) generally outperform the individual diagnostics with a few exceptions (i.e., EDR, 1/RiTW). This is likely because the suite of individual diagnostics that is optimal for the GTG3 based on the Global Forecast System (Sharman and Pearson 2017), which is a global NWP model from the National Centers for Environmental Prediction of the United States, may not be optimal for the GDAPS-based G-KTG. Also, as mentioned above, the redetermination of the remapping equations for turbulence diagnostics, which can affect the performance of the diagnostics, might be needed.
Statistical evaluation results (AUC, PODY, PODN, and TSS) of various G-KTG combinations based on Group-1 and Group-2 at upper, mid-, and low levels using all turbulence observations during a 1-yr period (September 2018–August 2019). PODY, PODN, and TSS are calculated using the EDR value of 0.22 m^{2/3} s^{−1} as a forecast and observation threshold for MOG turbulence.
To investigate the statistical significance of the AUC-based performance of the diagnostics, additional statistical evaluations are performed based on 1000 EDR subsets made by randomly selecting a quarter of the total EDR observations used in the evaluation. These AUC sensitivity tests (top and bottom bars in Fig. 7) show that the G-KTG combinations generally have less variability in performance (the difference between the maximum and minimum AUC values) than the individual turbulence diagnostics for all three altitude bands.
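The subset-based sensitivity test can be illustrated with a short Python sketch. It is only a rough outline under assumptions that are not confirmed by the text: AUC is computed here with the rank-based (Mann-Whitney) estimate, subsets are drawn without replacement, and the 95% bounds are taken as the 2.5th and 97.5th percentiles of the subset AUCs. All function names are hypothetical, and the pairwise loop is meant for small examples rather than millions of observations.

```python
import random


def auc(scores_null, scores_mog):
    """Probability that a MOG case outscores a null case (ties count half)."""
    wins = 0.0
    for m in scores_mog:
        for n in scores_null:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(scores_mog) * len(scores_null))


def auc_subset_interval(scores, labels, n_subsets=1000, fraction=0.25, seed=0):
    """AUC spread over random subsets holding `fraction` of the observations.

    labels: 1 for MOG, 0 for null. Subsets lacking one class are skipped.
    """
    rng = random.Random(seed)
    pairs = list(zip(scores, labels))
    size = max(2, int(len(pairs) * fraction))
    values = []
    for _ in range(n_subsets):
        subset = rng.sample(pairs, size)
        null = [s for s, y in subset if y == 0]
        mog = [s for s, y in subset if y == 1]
        if null and mog:
            values.append(auc(null, mog))
    if not values:
        raise ValueError("no subset contained both null and MOG cases")
    values.sort()
    lower = values[int(0.025 * (len(values) - 1))]
    upper = values[int(0.975 * (len(values) - 1))]
    return lower, upper
```

A narrow interval from `auc_subset_interval` corresponds to the short top and bottom bars described for the G-KTG combinations.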
Figure 8 is the same as Fig. 7, except it portrays the AUC performance for the diagnostics considered in Group-2 (using the KTG-based CAT diagnostics). Here, maxKCN (maxKCM) denotes the maximum of the combined CAT G-KTG and NCT (MWT) G-KTG. The AUC values of CAT diagnostics used in Group-2 at upper and midlevels are presented in Table 3. Note that the CAT diagnostics at low levels, and NCT and MWT diagnostics used in Group-2 are the same as in Group-1 as shown in Table 2. At upper levels (Fig. 8a), KTG-CAT and maxKCN have higher AUC values than the individual diagnostics, whereas maxKCM and maxKCMN have lower AUC values than some of the individual diagnostics due to the increased false alarm rate introduced by including MWT diagnostics. This is consistent with the results of the Group-1 analysis. Also, at midlevels (Fig. 8b), the AUC values of KTG-CAT and maxKCM are high but are not the highest among the diagnostics. As in the GTG3-based G-KTG (Fig. 7), these results are likely because the combination of the RDAPS-based KTG component diagnostics may not be suitable for the GDAPS-based G-KTG. Additionally, the fitting coefficients of the KTG component diagnostics may need to be reevaluated. In the AUC sensitivity test performed using 1000 observation subsets (top and bottom bars in Fig. 8), the G-KTG forecasts at upper levels show small variability in AUC among all turbulence diagnostics, which is consistent with Fig. 7. On the other hand, at midlevels, the AUC variability of the G-KTG forecasts is similar to the large variability of individual diagnostics.
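The maximum-type combinations (maxKCN, maxKCM, and their Group-1 counterparts) can be illustrated as a grid-point-wise maximum of EDR fields. The sketch below assumes that the maximum is taken independently at each grid point and represents each field as a nested list of rows; the function name is hypothetical.

```python
def max_combine(*edr_fields):
    """Grid-point-wise maximum of equally shaped 2D EDR fields."""
    if not edr_fields:
        raise ValueError("at least one field is required")
    n_rows = len(edr_fields[0])
    n_cols = len(edr_fields[0][0])
    return [[max(field[j][i] for field in edr_fields) for i in range(n_cols)]
            for j in range(n_rows)]


# Example with toy 2 x 2 fields (EDR values are illustrative only).
cat_edr = [[0.10, 0.30], [0.20, 0.05]]
nct_edr = [[0.40, 0.10], [0.00, 0.05]]
print(max_combine(cat_edr, nct_edr))
```

Because the maximum can never be smaller than any input field, the area exceeding a given EDR threshold can only grow when another turbulence source is added, which is consistent with the increased false alarm rate noted above when MWT diagnostics are included.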
As in Fig. 7, but for 12-h forecasts of G-KTG combinations based on Group-2 (KTG) and of individual turbulence diagnostics used for composing the Group-2 based G-KTG combinations at (a) upper and (b) midlevels.
Table 4 shows the various statistical skill scores (PODY, PODN, TSS, and AUC) for the G-KTG combinations of Group-1 and Group-2 for the three altitude bands. PODY, PODN, and TSS are computed by applying the forecast (EDR) threshold of 0.22 m^{2/3} s^{−1}. Among the diagnostics of Group-1 at upper levels, as shown in Fig. 7a, maxGCM and maxGCMN have lower AUC and PODN values than GTG3-CAT because including MWT diagnostics leads to overforecasting of turbulence regions. However, the widened turbulence areas of maxGCM and maxGCMN also improve the performance of detecting MOG turbulence (PODY), and as a result, maxGCMN has the highest PODY and TSS values. At mid- and low levels, maxGCM shows better performance than GTG3-CAT in the PODY, AUC, and TSS statistics. These results for the three altitude bands are also found for the G-KTG diagnostics of Group-2. Also, the AUC, PODY, and TSS values of the G-KTG diagnostics of Group-1 and Group-2 are significantly higher for upper levels than for mid- and low levels. This suggests a need for the development of better lower-level turbulence diagnostics.
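For readers who want to reproduce these threshold-based scores, the following Python sketch computes PODY, PODN, and TSS from paired forecast and observed EDR values. It assumes the standard contingency-table definitions (PODY = hits/(hits + misses), PODN = correct nulls/(correct nulls + false alarms), TSS = PODY + PODN − 1) and treats values at or above the threshold as MOG; whether the study uses "at or above" or "strictly above" is not stated here, and the function name is hypothetical.

```python
def skill_scores(forecast_edr, observed_edr, threshold=0.22):
    """Return (PODY, PODN, TSS) for one EDR threshold in m^(2/3) s^-1.

    Assumes at least one observed MOG event and one observed null event;
    otherwise a ZeroDivisionError is raised.
    """
    hits = misses = false_alarms = correct_nulls = 0
    for forecast, observed in zip(forecast_edr, observed_edr):
        forecast_yes = forecast >= threshold
        if observed >= threshold:      # observed MOG event
            if forecast_yes:
                hits += 1
            else:
                misses += 1
        else:                          # observed null event
            if forecast_yes:
                false_alarms += 1
            else:
                correct_nulls += 1
    pody = hits / (hits + misses)
    podn = correct_nulls / (correct_nulls + false_alarms)
    return pody, podn, pody + podn - 1.0


# Toy example: one hit, one miss, one false alarm, one correct null.
print(skill_scores([0.30, 0.10, 0.25, 0.05], [0.25, 0.30, 0.10, 0.02]))
```

With these definitions, widening the forecast area tends to raise PODY and lower PODN, so TSS improves only when the gain in detections outweighs the added false alarms.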
Figure 9 presents the ROC curves of the G-KTG combinations of Group-1 for the three altitude bands for each season. Here, fall is September–November (March–May), winter is December–February (June–August), spring is March–May (September–November), and summer is June–August (December–February) in the NH (SH). At upper levels, the PODY of maxGCN (maxGCMN) is slightly better than that of GTG3-CAT (maxGCM) at higher values of POFD for all seasons, although the ROC curves of GTG3-CAT (maxGCM) and maxGCN (maxGCMN) are otherwise similar. Also, the ROC curves of GTG3-CAT and maxGCN are closer to the upper-left corner of the plot than those of maxGCM and maxGCMN, which indicates that GTG3-CAT and maxGCN have better performance skills (AUC) than the other two G-KTG forecasts. At mid- and low levels, the ROC curves of maxGCM and GTG3-CAT in winter are quite similar, although the ROC curve of maxGCM is generally closer to the upper-left corner than that of GTG3-CAT for all seasons.
ROC curves of G-KTG 12-h forecasts of Group-1 at (top) upper, (middle) mid-, and (bottom) low levels for each season against null- and MOG-level turbulence observations reported within 1 h centered at 1800 UTC in (a),(e),(i) fall (September–October–November in the NH and March–April–May in the SH); (b),(f),(j) winter (December–January–February in the NH and June–July–August in the SH); (c),(g),(k) spring (March–April–May in the NH and September–October–November in the SH); and (d),(h),(l) summer (June–July–August in the NH and December–January–February in the SH). Dots represent the PODY and POFD values of G-KTG forecasts of Group-1 derived using the EDR threshold for MOG turbulence.
The AUC results of the G-KTG combinations of Group-1, shown in Fig. 9, are summarized in Table 5. At upper levels, the four G-KTG combinations provide the best performance in fall and the worst performance in summer, which is likely because most of the developed individual turbulence diagnostics are related to strong shear and/or deformation in the vicinity of the jet stream (Kim et al. 2011). Among the four G-KTG forecasts, maxGCN shows the best performance in all seasons except spring. The increase in AUC of G-KTG by adding NCT diagnostics is greater in summer than in other seasons. Therefore, adding NCT diagnostics can significantly improve the turbulence prediction skill in summer when the turbulence forecast systems usually perform poorly. In addition, at mid- and low levels, the G-KTG forecasts perform best in winter and worst in summer, and in fall and spring, the performance skills of the G-KTG forecasts are much lower than in winter. This suggests that the development of turbulence diagnostics that can better represent the mechanisms of low-level turbulence (e.g., dry thermals over hot surfaces; Sharman and Lane 2016) is needed, as discussed in connection with Table 4. Unlike at upper levels, maxGCM at mid- and low levels shows better performance than GTG3-CAT for all seasons.
Seasonal evaluation results (AUC) of various G-KTG combinations based on Group-1 and Group-2 at upper, mid-, and low levels during a 1-yr period (September 2018–August 2019).
The seasonal performance results of the G-KTG combinations of Group-2 for the two altitude bands (upper and midlevels) are represented in Fig. 10 and Table 5. In Fig. 10, it is shown that the ROC curves of the G-KTG forecasts of Group-2 are consistent with those of Group-1 for each altitude band (Fig. 9). One difference is that the PODY of maxKCM is slightly lower than that of KTG-CAT at high values of the POFD in the fall and summer.
As in Fig. 9, but for G-KTG 12-h forecasts of Group-2 at (top) upper and (bottom) midlevels.
Table 5 shows that the G-KTG forecasts perform better in winter than in other seasons for both altitude bands, except that the fall forecast performs as well as the winter forecast at upper levels. The G-KTG forecasts for both altitude bands show the worst performances in summer. Furthermore, the AUC values of maxKCN at upper levels and maxKCM at midlevels are higher than those of KTG-CAT for each season, and the increase in AUC of maxKCN relative to KTG-CAT at upper levels is greatest in summer. Those results are consistent with the results of the G-KTG of Group-1 shown in Fig. 9. Compared to the performance results of the G-KTG forecasts of Group-1 (Fig. 9), the G-KTG forecasts of Group-2 perform worse at upper levels, while the G-KTG forecasts of Group-1 and Group-2 show similar performance skills at midlevels.
To examine the performance skills of the G-KTG combinations focusing on international flight routes, the G-KTG combinations are evaluated using turbulence observations in all areas excluding the CONUS (Fig. 1a) where most turbulence observations are recorded. As almost all turbulence observations at mid- and low levels are recorded over the CONUS, only the G-KTG combinations at upper levels are evaluated. Table 6 shows the evaluation results of the upper-level G-KTG combinations of Group-1 and Group-2 over the global region excluding the CONUS. Compared to the performance results of the G-KTG forecasts over the global region including the CONUS (Table 4), all the G-KTG combinations of Group-1 and Group-2 show higher AUC and PODN values and lower PODY and TSS values outside the CONUS. Also, maxGCN and maxKCN have the highest AUC values, and maxGCMN and maxKCMN have the highest PODY and TSS values among the four G-KTG combinations of each Group, consistent with the evaluation result of the G-KTG combinations over the global region including the CONUS.
As in Table 4, but for the upper-level G-KTG combinations based on Group-1 and Group-2 evaluated against turbulence observations over the global region excluding the CONUS depicted as the rectangles in Fig. 1a. The numbers of null (MOG) turbulence observations outside the CONUS are 2 187 848 (1333), respectively.
Figure 11 shows the ROC curves of the maxGCMN and maxKCMN forecasts, which combine all types of turbulence diagnostics over the globe, U.S., and East Asia regions during the 1-yr period. Because turbulence observations at mid- and low levels over East Asia are rare, only the evaluation results of the G-KTG forecasts at upper levels are compared. Evidently, maxGCMN has high AUC values over 0.82 in all three regions. MaxKCMN also results in good performance with AUC values of 0.79–0.82 in all three regions. The AUC values of maxKCMN are slightly smaller than those of maxGCMN in all regions because the false alarm rate of maxKCMN is increased due to the relatively larger overforecasting region, as shown in Figs. 3 and 4. Table 6 and Fig. 11 indicate that the G-KTG combinations also have good enough performance over the global region excluding the CONUS.
ROC curves with AUC values of 12-h forecasts of maxGCMN (Group-1) and maxKCMN (Group-2) derived against turbulence observations reported within 1 h centered at 1800 UTC for 1 year (September 2018–August 2019) at upper levels over the (a) global, (b) U.S., and (c) East Asia (EA) regions. The considered regions of the United States and East Asia are indicated by rectangles in Fig. 1a. The numbers of null (MOG) turbulence observations used in the G-KTG evaluation for global, U.S., and EA regions are 13 809 024, 12 458 016, and 120 726 (10 280, 9481, and 96), respectively.
As mentioned in section 2a, NCAR-EDR and AMDAR-EDR use different computational algorithms and reporting strategies, and their main coverage areas are quite different. To examine the sensitivity of the turbulence observation sources, the ROC curves of the upper-level maxGCMN and maxKCMN forecasts are derived against each of NCAR-EDR and AMDAR-EDR separately, and the results are presented in Fig. 12. It is found that both the maxGCMN and maxKCMN forecasts show higher AUC values when they are evaluated against NCAR-EDR. The performance skills of maxGCMN and maxKCMN against AMDAR-EDR worsen relative to NCAR-EDR for higher values of POFD, which is likely due to the smaller number of MOG turbulence observed from AMDAR-EDR. At low values of the POFD, the performance skills of maxGCMN and maxKCMN against AMDAR-EDR are quite similar to those against NCAR-EDR. These results suggest that AMDAR-EDR converted by using the method of Kim et al. (2017) may be a valuable source of verification for global turbulence forecasts.
ROC curves with AUC values (bracket) of 12-h forecasts of (a) maxGCMN and (b) maxKCMN at upper levels against each of NCAR-EDR and AMDAR-EDR observations for 1 year (September 2018–August 2019).
c. Statistical evaluation of G-KPT system
Figure 13 shows the PODY, PODN, and TSS values of maxG-KPT for the three altitude bands during the 1-yr evaluation period. The statistical evaluations are derived by applying probability thresholds from 10% to 90% at 10% intervals. At all vertical levels, PODY and TSS decrease with increasing forecast probability along with an increase in PODN. This is due to the smaller turbulence forecast areas at higher forecast probabilities. The G-KPT forecasts at upper (lower) levels have higher (lower) TSS values for all forecast probabilities, while PODN at lower levels is higher than for upper levels. This is likely due to the relatively small turbulence area of the G-KPT forecasts in the lower-level regions of CONUS. In the evaluation results of the G-KTG forecasts in section 3b (Table 4), the TSS values of the maxGCMN forecasts at upper levels and the maxGCM forecasts at mid- and low levels (the maxKCMN forecasts at upper levels and the maxKCM forecasts at midlevels) are 0.212, 0.193, and 0.100 (0.242 and 0.166), respectively. In Fig. 13, the TSS values of the G-KPT forecasts at upper, mid-, and low levels are 0.248 at 60% forecast probability, 0.253 at 40% forecast probability, and 0.116 at 60% forecast probability, respectively, which are higher than the TSS values of G-KTG.
PODY (black), PODN (red), and TSS (blue) of maxG-KPT 12-h forecast at (a) upper, (b) mid-, and (c) low levels, which are evaluated against total turbulence observations (NCAR-EDR + AMDAR-EDR) for 1 year (September 2018–August 2019).
Figure 14 presents the ROC curves for maxG-KPT for all three altitude bands. As shown in Fig. 13, the PODN values have a narrow range from 0.9 to 1 for all forecast probabilities due to the small turbulence areas of the G-KPT forecast. This result makes the ROC curve lean to the bottom left-hand corner. Nevertheless, the AUC values of maxG-KPT at upper, mid-, and low levels are 0.715, 0.677, and 0.627, respectively. For comparison, the POD performance statistics of the representative G-KTG forecasts of Group-1 and Group-2 presented in Table 4 are also shown as a single dot in Fig. 14. It is seen that the performance of maxG-KPT is comparable with that of G-KTG for all three altitude bands.
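How an ROC curve and its AUC follow from a set of probability thresholds can be sketched as below. This is an illustrative outline rather than the procedure used in the study: it assumes that POFD = 1 − PODN, that probabilities are given as fractions between 0 and 1, and that the area is obtained with the trapezoidal rule after adding the end points (0, 0) and (1, 1). Function names are hypothetical.

```python
def roc_points(prob_forecast, is_mog, thresholds):
    """Return one (POFD, PODY) pair per probability threshold.

    Assumes the observations contain at least one MOG and one null case.
    """
    n_mog = sum(1 for y in is_mog if y)
    n_null = len(is_mog) - n_mog
    points = []
    for t in thresholds:
        hits = sum(1 for p, y in zip(prob_forecast, is_mog) if y and p >= t)
        false_alarms = sum(1 for p, y in zip(prob_forecast, is_mog)
                           if not y and p >= t)
        points.append((false_alarms / n_null, hits / n_mog))
    return points


def trapezoid_auc(points):
    """Area under the ROC curve through the points, closed at (0,0) and (1,1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))


# Thresholds from 10% to 90% at 10% intervals, as in the evaluation above.
thresholds = [k / 10 for k in range(1, 10)]
probs = [0.9, 0.7, 0.2, 0.1]
mog = [True, False, True, False]
print(trapezoid_auc(roc_points(probs, mog, thresholds)))
```

When PODN stays between 0.9 and 1 for every threshold, all computed points have POFD below 0.1, so the curve is determined mostly by the straight segment joining the last point to (1, 1). This is one way to read the statement that the curve leans toward the bottom left-hand corner.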
ROC curve of maxG-KPT 12-h forecast at (a) upper, (b) mid-, and (c) low levels, which are evaluated against total turbulence observations for 1 year (September 2018–August 2019). The PODY-POFD of G-KTG 12-h forecasts of Group-1 (black; maxGCMN at upper levels, maxGCM at mid- and low levels) and Group-2 (red; maxKCMN at upper levels, maxKCM at midlevels) derived using the EDR threshold for MOG turbulence is depicted as a dot.
The relative economic value [V; Eq. (5)] plots (Richardson 2000; Gill and Buchanan 2014; Storer et al. 2019) of the representative G-KTG forecasts of Group-1 and Group-2 as well as the maxG-KPT forecast for all altitude bands are shown in Fig. 15. For maxG-KPT, the relative economic value is estimated by varying the probability thresholds used to compute the hit rate and false alarm rate. For maxG-KPT, as the probability threshold increases, the peak relative economic value decreases and the distribution widens toward high cost–loss ratios. At upper levels, the relative economic values of G-KTG and G-KPT appear in different ranges of cost–loss ratios, although the maxima of the relative economic values of maxG-KPT with probability thresholds below 60% are higher than that of G-KTG. This implies that the choice of forecast type for decision-making depends critically on which cost–loss ratios are important to the user. Note that the probabilistic maxG-KPT forecasts with the probability thresholds of 40% at midlevels and 60% at low levels have greater relative economic values than the deterministic G-KTG forecasts for all cost–loss ratios.
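The dependence of the relative economic value on the cost–loss ratio can be explored with the short Python sketch below. It uses the commonly cited form from Richardson (2000), V = [min(α, s) − Fα(1 − s) + Hs(1 − α) − s] / [min(α, s) − sα], where α is the cost–loss ratio, s is the observed base rate of MOG turbulence, H is the hit rate, and F is the false alarm rate. Eq. (5) is not reproduced in this section, so its equivalence to this form is an assumption, and the function name is hypothetical.

```python
def relative_economic_value(hit_rate, false_alarm_rate, base_rate, cost_loss):
    """Relative economic value V for one cost-loss ratio.

    V = 1 for a perfect forecast and V = 0 for the better of the two
    climatological strategies (always protect or never protect).
    Requires 0 < cost_loss < 1 and 0 < base_rate < 1.
    """
    climate_expense = min(cost_loss, base_rate)
    forecast_expense = (false_alarm_rate * cost_loss * (1.0 - base_rate)
                        - hit_rate * base_rate * (1.0 - cost_loss)
                        + base_rate)
    perfect_expense = base_rate * cost_loss
    return ((climate_expense - forecast_expense)
            / (climate_expense - perfect_expense))


# Example: scan cost-loss ratios on a log-like grid for one (H, F) pair.
# The H, F, and base-rate values are illustrative, not taken from the study.
for ratio in (0.001, 0.01, 0.1):
    print(ratio, relative_economic_value(0.4, 0.05, 0.01, ratio))
```

For a probabilistic forecast, each probability threshold gives its own (H, F) pair and hence its own curve of V against the cost–loss ratio, which is how the family of solid lines in a plot of this kind arises.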
Relative economic value plot for G-KTG 12-h forecasts of Group-1 (dotted line) and Group-2 (dot–dashed line), and maxG-KPT 12-h forecasts (solid line) with different probability thresholds at (a) upper, (b) mid-, and (c) low levels for 1 year (September 2018–August 2019). The x axis is represented on a log scale.
Figure 16 shows the reliability diagram for maxG-KPT for all three altitude bands. Here, the observed frequency is obtained by calculating the frequency of the MOG turbulence events among the total observations (NCAR-EDR + AMDAR-EDR) captured in each binned forecast probability area. All reliability curves are located below the perfect reliability line (thin solid line), which indicates that the G-KPT forecast is substantially overforecasting for all altitude bands. This is likely due to the relatively large turbulence area calculated by the NWP model output with a much larger grid spacing than the aviation turbulence scale. The overforecasting feature of the NWP model-derived probabilistic turbulence forecasts was also found in previous studies for probabilistic turbulence forecast systems using a single turbulence diagnostic (Storer et al. 2019; Lee et al. 2020). Also, the use of several turbulence diagnostics representing different generation mechanisms can be another possible reason for overforecasting. However, the overforecasting of G-KPT may be preferable for aircraft safety as it can capture more severe turbulence events (e.g., Díaz-Fernández et al. 2021).
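The construction of a reliability curve can be sketched as follows. This is a minimal example assuming equal-width probability bins and probabilities expressed as fractions between 0 and 1; the binning actually used in the study is not stated, and the function name is hypothetical.

```python
def reliability_curve(prob_forecast, is_mog, n_bins=10):
    """Return (mean forecast probability, observed MOG frequency) per bin.

    Bins without any observation are omitted from the result.
    """
    counts = [0] * n_bins
    events = [0] * n_bins
    prob_sums = [0.0] * n_bins
    for p, y in zip(prob_forecast, is_mog):
        b = min(int(p * n_bins), n_bins - 1)   # p = 1.0 falls in the last bin
        counts[b] += 1
        prob_sums[b] += p
        if y:
            events[b] += 1
    return [(prob_sums[b] / counts[b], events[b] / counts[b])
            for b in range(n_bins) if counts[b] > 0]


# Toy example: the high-probability bin verifies only half of the time,
# so its point lies below the diagonal (overforecasting).
print(reliability_curve([0.05, 0.05, 0.95, 0.95], [False, False, True, False]))
```

A point below the diagonal means that the observed frequency is smaller than the forecast probability, which is the overforecasting signature described above.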
Reliability diagram of maxG-KPT 12-h forecast against the total turbulence observations (thick solid curves) at (a) upper, (b) mid-, and (c) low levels valid at 1800 UTC for 1 year (September 2018–August 2019). The diagonal (black thin) and horizontal (gray dotted) lines represent the perfect reliability and sample climatology, respectively. The blue dashed curves in (a) represent reliability diagram results of maxG-KPT at upper levels estimated against 100 turbulence observation subsets.
The lower the altitude band, the closer (the farther) the reliability line is to the perfect reliability line (from the sample climatology line). In particular, the G-KPT forecast at low levels at the low probabilities below 10% is very close to the perfect reliability line. The very low reliability of the G-KPT forecast at upper levels is likely due mainly to the significant lack of MOG turbulence events compared to the regularly reported null events, which may be due in part to aircraft trying to avoid already known turbulence areas (Kim et al. 2018; Lee et al. 2020). Note that some previous studies of turbulence observations (Dutton 1980; Sharman et al. 2006) found the ratio of MOG turbulence occurrence at upper levels to be about 1%. Considering this, we perform the reliability test of maxG-KPT at upper levels against 100 observation subsets that have the ratio of MOG turbulence to total observations of 1%. The results are represented as blue dashed lines in Fig. 16a. The reliability curves against the observation subsets are slightly closer to the diagonal line than the reliability result using all turbulence observations, although those curves are still below the diagonal line (overforecasting). This result suggests that if the G-KPT (or any turbulence forecast system) is evaluated using observation data that have a more realistic frequency of atmospheric turbulence, more robust and better evaluation results can be obtained.
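One way to build observation subsets with a prescribed MOG ratio is sketched below. The text does not describe how the subsets were constructed, so the approach here is an assumption made only for illustration: all MOG observations are kept and null observations are drawn at random, without replacement, until MOG events make up the target fraction. The function name is hypothetical.

```python
import random


def subset_with_mog_ratio(null_obs, mog_obs, mog_ratio=0.01, seed=0):
    """Return a list of (observation, is_mog) pairs with the target MOG ratio.

    Assumption: every MOG observation is retained and only nulls are sampled.
    """
    rng = random.Random(seed)
    n_null = round(len(mog_obs) * (1.0 - mog_ratio) / mog_ratio)
    if n_null > len(null_obs):
        raise ValueError("not enough null observations for the target ratio")
    sampled_nulls = rng.sample(list(null_obs), n_null)
    return ([(obs, False) for obs in sampled_nulls]
            + [(obs, True) for obs in mog_obs])


# Example: 2 MOG events need 198 nulls to give a 1% MOG ratio.
subset = subset_with_mog_ratio(list(range(1000)), ["mog_a", "mog_b"])
print(len(subset))
```

Calling the function with different seeds would give the independent subsets needed for repeated reliability estimates.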
4. Summary and discussion
In this study, both deterministic and probabilistic turbulence forecast systems have been developed based on the latest operational UM-based GDAPS data of the KMA with a horizontal grid spacing of ∼10 km. The deterministic G-KTG derives an ensemble mean EDR forecast by combining several EDR-scaled individual turbulence diagnostics. In the G-KTG system, the diagnostics of multiple turbulence sources (CAT, MWT, and NCT) are combined differently for each of the three altitude (upper, mid-, and low level) regions. The suites of the CAT and MWT diagnostics applied in the G-KTG combination are obtained from the suites used in GTG3 (Sharman and Pearson 2017) and KTG (Kim and Chun 2012; Lee and Chun 2014), although the performance of the system may be improved if optimization of the suites of diagnostics based on the GDAPS were performed. Finally, the G-KTG system produces ensemble forecasts for multiple turbulence sources based on various combinations of G-KTG forecasts for CAT, MWT, and NCT. In addition to the deterministic G-KTG, the probabilistic G-KPT system has also been developed using a multi-diagnostic ensemble approach to calculate the probability that 15 CAT (15 MWT) diagnostics, all mapped to EDR, exceed a certain EDR threshold. The 15 CAT diagnostics with the highest AUC values for each altitude band and the 15 MWT diagnostics from GTG3 are used. The final G-KPT forecast is derived by combining the G-KPT forecasts for CAT and MWT.
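The multi-diagnostic probability described above can be illustrated at a single grid point. The sketch below assumes that the probability is the fraction of EDR-mapped diagnostics that reach the threshold, that the MOG value of 0.22 m^(2/3) s^-1 serves as the default threshold, and that the final combination is the larger of the CAT and MWT probabilities, as the name maxG-KPT suggests. These are assumptions for illustration, and the function names are hypothetical.

```python
def exceedance_probability(edr_values, threshold=0.22):
    """Fraction of EDR-mapped diagnostics at one grid point reaching threshold."""
    if not edr_values:
        raise ValueError("at least one diagnostic value is required")
    return sum(1 for v in edr_values if v >= threshold) / len(edr_values)


def max_gkpt(cat_edr_values, mwt_edr_values, threshold=0.22):
    """Assumed combination: the larger of the CAT and MWT probabilities."""
    return max(exceedance_probability(cat_edr_values, threshold),
               exceedance_probability(mwt_edr_values, threshold))


# Example with 15 CAT and 15 MWT diagnostics at one grid point
# (EDR values are illustrative only).
cat_values = [0.30] * 6 + [0.10] * 9    # 6 of 15 exceed: 40%
mwt_values = [0.25] * 3 + [0.05] * 12   # 3 of 15 exceed: 20%
print(max_gkpt(cat_values, mwt_values))
```

With 15 members, the probability can take only 16 distinct values in steps of 1/15, which is worth keeping in mind when probability thresholds are applied at 10% intervals.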
The G-KTG and G-KPT 12-h forecasts are evaluated against null and MOG turbulence events from two in situ EDR observation data sources (NCAR-EDR and AMDAR-EDR) during a 1-yr period using various statistical skill scores. By using these two observation databases, both NH and SH turbulence events can be included in the evaluation. In the evaluation results of the G-KTG combinations with individual diagnostics using the AUC, the G-KTG combinations generally show better performance than the individual diagnostics alone. Also, according to AUC sensitivity tests using observation subsets, it is found that the G-KTG forecasts have less variability than the individual diagnostics, especially when the G-KTG combination is based on the GTG3 component diagnostics. When comparing the performance skills of the various G-KTG combinations for each altitude band, the G-KTG forecast considering all turbulence sources (CAT, MWT, and NCT) shows the highest PODY and TSS related to the MOG turbulence detection skill. In the seasonal performance evaluation, the G-KTG forecast provides the highest AUC values in winter and fall, and the lowest AUC value in summer. At upper levels, including the NCT diagnostics improves the summer performance of the G-KTG forecast more than that for other seasons. Last, the GTG3-based G-KTG forecast has a higher AUC than the KTG-based G-KTG forecast in all regions tested. Given the small number of turbulence diagnostics used in the ensemble mean and the good performance, the GTG3-based G-KTG forecast using the high-resolution GDAPS output is expected to provide better turbulence information to forecasters, dispatchers, and aviation users as an operational global turbulence forecast system of the AMO.
In the evaluation of the G-KPT forecasts using the POD-based statistical skill scores, both PODY and TSS for detecting MOG turbulence decrease at higher forecast probabilities due to the small forecast areas. Furthermore, if the forecast probability thresholds applied in the turbulence forecasting are less than 60% at upper levels, 40% at midlevels, and 60% at low levels, the G-KPT forecast shows better performance skill (TSS) than the G-KTG forecast. Examination of the relative economic value also reveals that if the probabilities of 40% at midlevels and 60% at low levels are considered as thresholds for decision-making, G-KPT would be more useful than G-KTG. However, the reliability diagrams show that the G-KPT forecast overforecasts at all altitude bands. This is likely due to the large turbulence area derived from the NWP model output with a grid spacing that is much coarser than the turbulence scale and due to the lack of MOG turbulence observations. In the reliability tests against 100 randomly selected observation subsets, which consider the observed occurrence rate of MOG turbulence in the atmosphere, the G-KPT forecast shows better reliability than when all turbulence observations are used. This implies that if the G-KPT forecast is evaluated against turbulence observations that have a more realistic occurrence rate of turbulence, better evaluation results could be obtained.
As in previous studies (e.g., Sharman et al. 2006; Kim et al. 2011; Gill and Buchanan 2014), it was shown that the simple ensemble mean approach can provide more skillful turbulence forecasts than individual diagnostics alone, even when the optimal combination of diagnostics used is determined from different NWP model inputs. Moreover, it was found that including the NCT diagnostics can improve the performance skill of the existing turbulence forecast system based only on CAT and MWT diagnostics, especially during the summer. The performance skill of the G-KTG forecast at low levels is relatively low, indicating a need for better low-level turbulence diagnostics. To develop new low-level turbulence algorithms, more studies on the mechanisms of low-level turbulence compared to turbulence observations (e.g., Muñoz-Esparza and Sharman 2018) should be conducted in the future. Regarding the operational NWP models provided by the KMA, because the deterministic forecasts have significantly higher horizontal and vertical resolutions than the ensemble forecasts, the GDAPS-based multi-diagnostic G-KPT system developed in this study can provide probabilistic turbulence information with higher resolution than that based on low-resolution ensemble forecast members.
Although this study concentrated on developing turbulence forecast systems for the KMA using the GDAPS NWP model as input, most of these findings would be expected to hold for other NWP-based forecast systems as well.
In the future, as an advanced alternative to the G-KTG and G-KPT systems, machine learning methods such as random forest or logistic regression (e.g., Tebaldi et al. 2002; Muñoz-Esparza et al. 2020) could be used to provide more skillful and accurate turbulence forecast information by flexibly selecting different turbulence diagnostics according to specific atmospheric conditions.
Acknowledgments.
The GDAPS data and in situ flight EDR and AMDAR observations for research purposes were provided by the KMA and NCAR, respectively. This work was funded by the Korea Meteorological Administration Research and Development Program under Grant KMI2018-07810.
APPENDIX
G-KPT Turbulence Diagnostics
Here, CAT and MWT diagnostics used as the G-KPT component diagnostics for each altitude band are listed in Table A1. Definitions of diagnostics are shown in Jang et al. (2009), Kim et al. (2009), and Sharman and Pearson (2017).
Table A1. List of CAT and MWT diagnostics used in the G-KPT computation for each vertical level region. For MWT diagnostics, ds is derived by combining the low-level wind speed and terrain height (Sharman and Pearson 2017).
REFERENCES
Allouche, O., A. Tsoar, and R. Kadmon, 2006: Assessing the accuracy of species distribution models: Prevalence, kappa and the True Skill Statistic (TSS). J. Appl. Ecol., 43, 1223–1232, https://doi.org/10.1111/j.1365-2664.2006.01214.x.
Bowler, N. E., A. Arribas, K. R. Mylne, K. B. Robertson, and S. E. Beare, 2008: The MOGREPS short-range ensemble prediction system. Quart. J. Roy. Meteor. Soc., 134, 703–722, https://doi.org/10.1002/qj.234.
Cho, J. Y. N., R. E. Newell, B. E. Anderson, J. D. W. Barrick, and K. L. Thornhill, 2003: Characterizations of tropospheric turbulence and stability layers from aircraft observations. J. Geophys. Res., 108, 8784, https://doi.org/10.1029/2002JD002820.
Chun, H.-Y., and J.-J. Baik, 1998: Momentum flux by thermally induced internal gravity waves and its approximation for large-scale models. J. Atmos. Sci., 55, 3299–3310, https://doi.org/10.1175/1520-0469(1998)055<3299:MFBTII>2.0.CO;2.
Cornman, L. B., 2016: Airborne in situ measurements of turbulence. Aviation Turbulence: Processes, Detection, Prediction, R. Sharman and T. Lane, Eds., Springer, 97–120.
Díaz-Fernández, J., and Coauthors, 2021: On the characterization of mountain waves and the development of a warning method for aviation safety using WRF forecast. Atmos. Res., 258, 105620, https://doi.org/10.1016/j.atmosres.2021.105620.
Dutton, J. A., and H. A. Panofsky, 1970: Clear air turbulence: A mystery may be unfolding. Science, 167, 937–944, https://doi.org/10.1126/science.167.3920.937.
Dutton, M. J. O., 1980: Probability forecasts of Clear-Air Turbulence (CAT) based on numerical output. Meteor. Mag., 109, 293–310.
Ellrod, G. P., and D. I. Knapp, 1992: An objective clear-air turbulence forecasting technique: Verification and operational use. Wea. Forecasting, 7, 150–165, https://doi.org/10.1175/1520-0434(1992)007<0150:AOCATF>2.0.CO;2.
Gill, P. G., 2014: Objective verification of World Area Forecast Centre clear air turbulence forecasts. Meteor. Appl., 21, 3–11, https://doi.org/10.1002/met.1288.
Gill, P. G., and P. Buchanan, 2014: An ensemble based turbulence forecasting system. Meteor. Appl., 21, 12–19, https://doi.org/10.1002/met.1373.
Hoblit, F. M., 1988: Gust Loads on Aircraft: Concepts and Applications. AIAA Education Series, American Institute of Aeronautics and Astronautics, 306 pp.
ICAO, 2001: Meteorological service for international air navigation. Annex 3 to the Convention on International Civil Aviation, 14th ed. ICAO International Standards and Recommended Practices Tech. Annex, 128 pp.
ICAO, 2018: Meeting of the Meteorology Panel (METP): Fourth meeting. International Civil Aviation Organization Tech. Rep. 4, 340 pp., https://www.icao.int/airnavigation/METP/Panel%20Meetings/METP4_Final%20Report.pdf.
Jang, W., H.-Y. Chun, and J.-H. Kim, 2009: A study of forecast system for clear-air turbulence in Korea. Part I: Korean Integrated Turbulence Forecasting Algorithm (KITFA) (in Korean with English abstract). Atmosphere, 19, 255–268.
Kim, J.-H., and H.-Y. Chun, 2012: Development of the Korean aviation Turbulence Guidance (KTG) system using the operational Unified Model (UM) of the Korea Meteorological Administration (KMA) and pilot reports (PIREPs) (in Korean with English abstract). J. Korean Soc. Aviat. Aeronaut., 20, 76–83, https://doi.org/10.12985/ksaa.2012.20.4.076.
Kim, J.-H., H.-Y. Chun, W. Jang, and R. D. Sharman, 2009: A study of forecast system for clear-air turbulence in Korea. Part II: Graphical Turbulence Guidance (GTG) system (in Korean with English abstract). Atmosphere, 19, 269–287.
Kim, J.-H., H.-Y. Chun, R. D. Sharman, and T. L. Keller, 2011: Evaluations of upper-level turbulence diagnostics performance using the Graphical Turbulence Guidance (GTG) system and pilot reports (PIREPs) over East Asia. J. Appl. Meteor. Climatol., 50, 1936–1951, https://doi.org/10.1175/JAMC-D-10-05017.1; Corrigendum, 50, 2193, https://doi.org/10.1175/JAMC-D-11-0188.1.
Kim, J.-H., W. N. Chan, B. Sridhar, and R. D. Sharman, 2015: Combined winds and turbulence prediction system for automated air-traffic management applications. J. Appl. Meteor. Climatol., 54, 766–784, https://doi.org/10.1175/JAMC-D-14-0216.1.
Kim, J.-H., R. Sharman, M. Strahan, J. W. Scheck, C. Bartholomew, J. C. H. Cheung, P. Buchanan, and N. Gait, 2018: Improvements in non-convective aviation turbulence prediction for the World Area Forecast System (WAFS). Bull. Amer. Meteor. Soc., 99, 2295–2311, https://doi.org/10.1175/BAMS-D-17-0117.1.
Kim, S.-H., and H.-Y. Chun, 2016: Aviation turbulence encounters detected from aircraft observations: Spatiotemporal characteristics and application to Korean aviation turbulence guidance. Meteor. Appl., 23, 594–604, https://doi.org/10.1002/met.1581.
Kim, S.-H., H.-Y. Chun, and P. W. Chan, 2017: Comparison of turbulence indicators obtained from in situ flight data. J. Appl. Meteor. Climatol., 56, 1609–1623, https://doi.org/10.1175/JAMC-D-16-0291.1.
Kim, S.-H., H.-Y. Chun, R. D. Sharman, and S. B. Trier, 2019: Development of near-cloud turbulence diagnostics based on a convective gravity wave drag parameterization. J. Appl. Meteor. Climatol., 58, 1725–1750, https://doi.org/10.1175/JAMC-D-18-0300.1.
Kim, S.-H., H.-Y. Chun, J.-H. Kim, R. D. Sharman, and M. Strahan, 2020: Retrieval of eddy dissipation rate from derived equivalent vertical gust included in Aircraft Meteorological Data Relay (AMDAR). Atmos. Meas. Tech., 13, 1373–1385, https://doi.org/10.5194/amt-13-1373-2020.
Kim, S.-H., H.-Y. Chun, D.-B. Lee, J.-H. Kim, and R. D. Sharman, 2021: Improving numerical weather prediction-based near-cloud aviation turbulence forecasts by diagnosing convective gravity wave breaking. Wea. Forecasting, 36, 1735–1757, https://doi.org/10.1175/WAF-D-20-0213.1.
Knox, J. A., 1997: Possible mechanisms of clear-air turbulence in strongly anticyclonic flows. Mon. Wea. Rev., 125, 1251–1259, https://doi.org/10.1175/1520-0493(1997)125<1251:PMOCAT>2.0.CO;2.
Lane, T. P., R. D. Sharman, S. B. Trier, R. G. Fovell, and J. K. Williams, 2012: Recent advances in the understanding of near-cloud turbulence. Bull. Amer. Meteor. Soc., 93, 499–515, https://doi.org/10.1175/BAMS-D-11-00062.1.
Lee, D.-B., and H.-Y. Chun, 2014: Development of the seasonal Korean aviation Turbulence Guidance (KTG) system using the regional Unified Model of the Korea Meteorological Administration (KMA) (in Korean with English abstract). Atmosphere, 24, 235–243, https://doi.org/10.14191/Atmos.2014.24.2.235.
Lee, D.-B., and H.-Y. Chun, 2018: Development of the Global-Korean aviation Turbulence Guidance (Global-KTG) system using the Global Data Assimilation and Prediction System (GDAPS) of the Korea Meteorological Administration (KMA) (in Korean with English abstract). Atmosphere, 28, 223–232, https://doi.org/10.14191/Atmos.2018.28.2.223.
Lee, D.-B., H.-Y. Chun, and J.-H. Kim, 2020: Evaluation of multimodel-based ensemble forecasts for clear-air turbulence. Wea. Forecasting, 35, 507–521, https://doi.org/10.1175/WAF-D-19-0155.1.
Mason, I., 1982: A model for assessment of weather forecasts. Aust. Meteor. Mag., 30, 291–303.
Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF ensemble prediction system: Methodology and validation. Quart. J. Roy. Meteor. Soc., 122, 73–119, https://doi.org/10.1002/qj.49712252905.
Muñoz-Esparza, D., and R. D. Sharman, 2018: An improved algorithm for low-level turbulence forecasting. J. Appl. Meteor. Climatol., 57, 1249–1263, https://doi.org/10.1175/JAMC-D-17-0337.1.
Muñoz-Esparza, D., R. D. Sharman, and W. Deierling, 2020: Aviation turbulence forecasting at upper levels with machine learning techniques based on regression trees. J. Appl. Meteor. Climatol., 59, 1883–1899, https://doi.org/10.1175/JAMC-D-20-0116.1.
Palmer, T. N., G. J. Shutts, and R. Swinbank, 1986: Alleviation of a systematic westerly bias in general circulation and numerical weather prediction models through an orographic gravity wave drag parametrization. Quart. J. Roy. Meteor. Soc., 112, 1001–1039, https://doi.org/10.1002/qj.49711247406.
Prasanna, V., H.-W. Choi, J. Jung, Y.-G. Lee, and B.-J. Kim, 2018: High-resolution wind simulation over Incheon international airport with the unified model’s Rose nesting suite from KMA operational forecasts. Asia-Pac. J. Atmos. Sci., 54, 187–203, https://doi.org/10.1007/s13143-018-0003-5.
Richardson, D. S., 2000: Skill and relative economic value of the ECMWF ensemble prediction system. Quart. J. Roy. Meteor. Soc., 126, 649–667, https://doi.org/10.1002/qj.49712656313.
Sela, J., 2010: The derivation of the sigma pressure hybrid coordinate semi-Lagrangian model equations for the GFS. NCEP Office Note 462, 31 pp.
Sharman, R., and T. Lane, Eds., 2016: Aviation Turbulence: Processes, Detection, Prediction. Springer, 523 pp.
Sharman, R., and J. Pearson, 2017: Prediction of energy dissipation rates for aviation turbulence. Part I: Forecasting nonconvective turbulence. J. Appl. Meteor. Climatol., 56, 317–337, https://doi.org/10.1175/JAMC-D-16-0205.1.
Sharman, R., C. Tebaldi, G. Wiener, and J. Wolff, 2006: An integrated approach to mid- and upper-level turbulence forecasting. Wea. Forecasting, 21, 268–287, https://doi.org/10.1175/WAF924.1.
Sharman, R., S. B. Trier, T. P. Lane, and J. D. Doyle, 2012: Sources and dynamics of turbulence in the upper troposphere and lower stratosphere: A review. Geophys. Res. Lett., 39, L12803, https://doi.org/10.1029/2012GL051996.
Sharman, R., L. B. Cornman, G. Meymaris, J. Pearson, and T. Farrar, 2014: Description and derived climatologies of automated in situ eddy-dissipation-rate reports of atmospheric turbulence. J. Appl. Meteor. Climatol., 53, 1416–1432, https://doi.org/10.1175/JAMC-D-13-0329.1.
Sherman, D. J., 1985: The Australian implementation of AMDAR/ACARS and the use of derived equivalent gust velocity as a turbulence indicator. Department of Defense, Defense Science and Technology Organisation, Aeronautical Research Laboratories, Structures Rep. 418, 28 pp.
Storer, L. N., P. G. Gill, and P. D. Williams, 2019: Multi-model ensemble predictions of aviation turbulence. Meteor. Appl., 26, 416–428, https://doi.org/10.1002/met.1772.
Swinbank, R., and Coauthors, 2016: The TIGGE project and its achievements. Bull. Amer. Meteor. Soc., 97, 49–67, https://doi.org/10.1175/BAMS-D-13-00191.1.
Tebaldi, C., D. Nychka, B. G. Brown, and R. Sharman, 2002: Flexible discriminant techniques for forecasting clear-air turbulence. Environmetrics, 13, 859–878, https://doi.org/10.1002/env.562.
Tung, K. K., and W. W. Orlando, 2003: The k^-3 and k^-5/3 energy spectrum of atmospheric turbulence: Quasigeostrophic two-level model simulation. J. Atmos. Sci., 60, 824–835, https://doi.org/10.1175/1520-0469(2003)060<0824:TKAKES>2.0.CO;2.
WMO, 2014: The WMO AMDAR observing systems: Benefits to airlines and aviation. World Meteorological Organization, 4 pp., https://library.wmo.int/doc_num.php?explnum_id=6376.