1. Introduction
The importance of long-range forecasts is growing as climate extremes occur more frequently than in the preindustrial period because of climate change (IPCC 2012). While future climate information is critical to various sectors, including energy, water resources, and agriculture (Kumar 2010; Sillmann et al. 2017), the changing climate and more frequent climate extremes hinder the success of long-range forecasts unless we constantly update our understanding of the atmospheric and oceanic physical mechanisms (Sillmann et al. 2017).
The representation of the real world in dynamical general circulation models (GCMs) has evolved, and forecast skill for key global drivers, such as El Niño–Southern Oscillation (ENSO), has improved considerably (e.g., Tang et al. 2018; van Oldenborgh et al. 2005). The future of dynamical climate models appears promising. However, there is still much room for improvement, especially for extratropical regions (e.g., Genthon and Armengaud 1995; Barnston et al. 1994; Ray et al. 2021). Global climate models are also complex and require enormous computing power. Numerous statistical models and statistical–dynamical hybrid models have therefore been developed for long-range forecasting, keeping pace with the persistent development of dynamical models and complementing dynamical-only approaches (e.g., Schepen et al. 2012; Rajeevan et al. 2007; van Oldenborgh et al. 2005; Badr et al. 2014).
Various statistical models have been developed for long-range forecasts in South Korea and are used along with dynamical climate models by the Korea Meteorological Administration (KMA 2018). Key predictors for monthly temperature and rainfall have been derived from the seasonal and monthly climatic characteristics of South Korea. These statistical models have been built mainly on multiple linear regression (MLR) (KMA 2018), and deterministic forecasts are produced from them for South Korea for each month of interest. The models are relatively simple to use, do not require extensive computing resources, and perform well during their analysis periods (KMA 2018). They are updated regularly for operational purposes; the regression coefficients of the MLR models are re-estimated using the extended train set.
We propose an objective long-range forecasting model based on Gaussian processes (OLRAF-GP) for objective and probabilistic long-range forecasting, addressing some of the limitations of typical statistical models. First, the proposed model is based on Gaussian processes (GPs), which can capture nonlinear interactions and produce probabilistic forecasts. Probabilistic forecasts can be obtained from ensembles of dynamical climate model outputs or from multiple statistical model outputs using methods such as Bayesian model averaging (BMA; Raftery et al. 2005) and ensemble model output statistics (EMOS; Gneiting et al. 2005), but typical statistical models provide only deterministic results. A GP, as a nonparametric Bayesian model, provides posterior predictive probability distributions of the target variable (Rasmussen and Williams 2006), from which we obtain uncertainty information for the predictions. Second, the predictors of the proposed model are objectively selected based on their relationships with the target variables using the updated train set whenever the model is run. Typical statistical models use predetermined predictors; their main advantage is that the mechanisms behind the relationships between the predictors and the target variables are well investigated. On the other hand, the number of predictors is often limited, and existing studies tend to focus on specific months or seasons. For instance, few or no predictors are suggested for the September–November period in the guidance for South Korea (KMA 2018), and updating statistical models built on predetermined predictors is not easy. Last, the proposed model uses additional information from dynamical climate model outputs for the period with no observed data. Because typical statistical models rely on climate monitoring and analysis of observed data, information is unavailable for the one to several months immediately preceding the target month. For example, when forecasting June, July, and August temperatures in May, observed data are available only through April. We fill this gap using the APEC Climate Center multimodel ensemble (APCC MME); in this example, APCC MME data from May (or from April, given a longer latency of observed data) to the target month are used. Details of the method are presented in section 3.
The objectives of this study are as follows: 1) to propose a long-range forecasting method providing probabilistic predictions based on objectively derived predictors using Gaussian processes, 2) to investigate the use of dynamical climate model results from the APCC MME as additional predictors for the period with no observed data, and 3) to interpret the objectively derived predictors in terms of previously known mechanisms. South Korea is our case study site; it is located in the midlatitudes and is greatly in need of additional predictors for long-range forecasts besides key drivers such as ENSO. We focus on summertime near-surface 2-m air temperatures, including mean daily mean temperature (TMm), mean daily minimum temperature (TNm), and mean daily maximum temperature (TXm), in June (1-month lead forecast), July (2-month lead forecast), and August (3-month lead forecast).
This paper is organized as follows. In the following section, we describe the case study site and materials. In section 3, we present methods for selecting predictors objectively and developing long-range forecast models based on Gaussian processes. Methods of analyzing mechanisms for notable predictors are also presented. Section 4 presents the results of our analyses and discussions. Section 5 completes the paper with conclusions.
2. Study site and materials
a. Study site
The case study site, South Korea, is located within 32°–39°N, 124°–132°E in East Asia (Fig. 1). The total area is approximately 100 360 km2. With four distinct seasons over the course of the year, annual total inland precipitation ranges between 1000 and 1800 mm, and annual average inland mean temperature ranges between 10° and 15°C. About half of the total precipitation occurs during boreal summer; the long-term average annual precipitation at 62 Automated Synoptic Observing System (ASOS) stations is approximately 1331.5 mm for 1991–2020, of which approximately 727.2 mm falls during June, July, and August. August is the warmest month, followed by July and June; the long-term average mean daily temperatures at the 62 ASOS stations are 25.1°, 24.6°, and 21.4°C, respectively.
The western North Pacific subtropical high (WNPSH) is known to be an important circulation system during boreal summer, affecting the East Asia weather (Lee et al. 2006; Choi and Kim 2019). The WNPSH is associated with ENSO and Indian Ocean variability (Wang and Zhang 2002; Wang et al. 2003). In addition, the WNPSH is also affected by variability in the tropical Atlantic and North Atlantic (Keenlyside and Latif 2007; Ham et al. 2013; Zhao et al. 2020; Chen et al. 2020; Myoung 2021). The intensity of the WNPSH and the location of its boundary affect the summertime temperature and precipitation of the study site (KMA 2018).
b. Materials
1) Temperature indices
Daily mean, minimum, and maximum 2-m air temperature data for 41 years (1980–2020) were obtained from the Korea Meteorological Administration (https://data.kma.go.kr) for 62 ASOS stations located in the inland area of South Korea. Monthly mean values for each station were calculated from the daily data and averaged over the 62 stations to obtain TMm, TNm, and TXm for South Korea. We focused on the summer temperatures of June, July, and August, considering the importance of their forecasts for coping with the adverse impacts of extremely hot summers. Monthly anomalies were computed relative to the 1991–2020 base period and were linearly detrended for the analyses because long-term increasing trends are present according to both the Mann–Kendall test (p = 0.002 for June and 0.027 for August) and the linear trend test (p = 0.001 for June and 0.009 for August).
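As a minimal sketch of the anomaly and detrending steps, the snippet below assumes a pandas Series of June TMm values indexed by year (1980–2020); the data and names are illustrative, not the study's actual code.

```python
import numpy as np
import pandas as pd
from scipy import stats

def detrended_anomaly(series, base=(1991, 2020)):
    """Anomaly relative to the base period, then removal of the linear trend."""
    clim = series.loc[base[0]:base[1]].mean()        # 1991-2020 climatology
    anom = series - clim                             # monthly anomaly
    years = anom.index.values.astype(float)
    slope, intercept, r, p, se = stats.linregress(years, anom.values)
    return pd.Series(anom.values - (slope * years + intercept),
                     index=anom.index)               # detrended anomaly

years = np.arange(1980, 2021)
tmm = pd.Series(21.4 + 0.03 * (years - 1980) + np.random.randn(41) * 0.5,
                index=years)                         # synthetic example series
tmm_detrended = detrended_anomaly(tmm)
```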
2) Observed climate data as predictors
We used global gridded datasets of outgoing longwave radiation (OLR), sea surface temperature (SST), precipitation (PRCP), snow cover extent (SCE), and 500- and 850-hPa geopotential height (Z500 and Z850) to derive predictors affecting summer temperatures of South Korea (see Table A1). Sea ice area (SIA) for the Barents, Bering, Kara, and Laptev Seas as well as pre-calculated monthly atmospheric and oceanic time series, such as Arctic Oscillation index (AO) and Pacific decadal oscillation index (PDO), were also used as well-known potential predictors (see Table A2).
Table 2. Four notable predictors in this study. "Correl" indicates the 34-yr correlation (1980–2013) between each predictor and the corresponding monthly mean temperature (TMm). The geographical locations of the predictors are shown in Fig. 6.
Monthly OLR data from the Climate Prediction Center (CPC), National Centers for Environmental Prediction (NCEP), and National Oceanic and Atmospheric Administration (NOAA) were downloaded from the data library of the International Research Institute for Climate and Society (IRI) of the Earth Institute, Columbia University (https://iridl.ldeo.columbia.edu/SOURCES/.NOAA/.NCEP/.CPC/.GLOBAL/.monthly/.olr/datafiles.html). The NOAA/Extended Reconstructed Sea Surface Temperature (ERSST) Version 5 data were used for monthly SST and were downloaded from the Physical Sciences Laboratory (PSL) of the Office of Oceanic and Atmospheric Research (OAR) (https://psl.noaa.gov/data/gridded/data.noaa.ersst.v5.html). Monthly PRCP data were obtained from the NOAA Global Precipitation Climatology Project (GPCP) dataset of NOAA/OAR/PSL (https://psl.noaa.gov/data/gridded/data.gpcp.html). Moreover, weekly SCE data were acquired from the NOAA Climate Data Record (CDR) of the Northern Hemisphere (NH) Snow Cover Extent (SCE) dataset (https://www.ncei.noaa.gov/data/snow-cover-extent/access/); the data were converted to monthly snow cover extent. Monthly Z500 and Z850 data were downloaded from the NOAA/OAR/PSL (https://psl.noaa.gov/data/gridded/data.ncep.reanalysis.pressure.html).
Moreover, monthly SIA values calculated for the Barents, Bering, Kara, and Laptev Seas were obtained from the National Snow and Ice Data Center (NSIDC) (https://nsidc.org/data/G02135/versions/3). Pre-calculated atmospheric and oceanic indices were obtained mostly from NOAA/OAR/PSL (https://psl.noaa.gov/data/climateindices/list/). Teleconnection indices of NAO, EA, WP, EP/NP, PNA, EA/WR, SCA, and PE were downloaded from NOAA/CPC (https://www.cpc.ncep.noaa.gov/data/teledoc/telecontents.shtml). Monthly PDO data were acquired from the Tokyo Climate Center (https://ds.data.jma.go.jp/tcc/tcc/products/elnino/decadal/pdo_month.html).
3) APCC multimodel ensemble data as predictors
Dynamical climate model results from APCC MME were used to derive additional predictors for the period with a lack of observed data. Monthly mean MME data for SST, PRCP, and Z500 calculated based on the simple composite method (SCM) during hindcast and forecast periods were obtained from the Climate Information toolKit (CLIK) (https://cliks.apcc21.org/dataset/mme/6-MON) (Table A3).
Table 3. Performance of GP-PD from 34 LOOCV train-validation sets.
3. Methods
This section explains the methods for selecting predictors objectively and for developing long-range forecast models based on Gaussian processes. Gaussian process–based models have been widely used in both the statistics and machine learning communities: GPs provide a principled, practical, probabilistic approach and enable the interpretation of model predictions (Rasmussen and Williams 2006). GPs are also relatively easy to handle and provide a well-founded framework for training and model selection (Rasmussen and Williams 2006). The Gaussian process models used in this study were developed with predetermined predictors from observations (hereafter GP-PD), with objectively derived predictors from observations (hereafter GP-OBS), and with objectively derived predictors from observations and MME (hereafter GP-MME).
The overall train period was from 1980 to 2013, and leave-one-year-out cross validation (LOOCV) was performed to formulate the models; there were thus 34 LOOCV train-validation sets. The test period for the final model evaluations was from 2014 to 2020.
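The layout of these splits can be sketched as follows; this is an illustration of the experimental design described above, not the study's code.

```python
# Leave-one-year-out cross validation over 1980-2013 (34 splits), with
# 2014-2020 held out for the final evaluation only.
train_years = list(range(1980, 2014))   # 34 years
test_years = list(range(2014, 2021))    # final test period

loocv_splits = []
for held_out in train_years:
    train = [y for y in train_years if y != held_out]
    loocv_splits.append((train, held_out))   # e.g., cv1982 validates on 1982

assert len(loocv_splits) == 34
```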
a. Selection of predictors
1) Predetermined predictors
There were a total of 12 predictors from observations used in existing statistical models to predict summer temperatures in South Korea (Table 1). Several of them were derived by analyzing dipole or tripole patterns of atmospheric and oceanic variables related to summer temperatures. Although they were originally obtained based on their relationships with TMm, we used them for all target variables (TMm, TNm, and TXm) in the GP-PD model. Detrended correlation coefficients between the predictors and target variables during the 1980–2013 period are also shown (Table 1). While most predictors are highly correlated with the target variables, the correlations of the last four predictors are relatively low because they had originally been selected based on July–August average temperatures (KMA 2018).
2) Objective selection of predictors
Selecting predictors objectively can help derive potential predictors that have not been identified or received attention in existing studies. Since the factors responsible for air temperatures in South Korea may differ notably between months, predictors were selected for each target month as well as for each temperature index and train set. Data from up to six previous months and the target month were used; if the target month is June, we used data from December of the previous year to June as potential predictors, considering both lagged and contemporaneous correlations. Global gridded datasets, sea ice area, and pre-calculated atmospheric and oceanic indices were available only through April (at most) when predictions were made in mid-May, because of the latency of observed data. APCC MME data are available for the period with no observed data.
The procedures for objective selection of predictors are described in Fig. 2. Spatially distributed correlation data with each temperature index were produced for each available monitoring month for each variable of the global gridded datasets (Table A1). Outliers farther than five standard deviations from the mean were removed. We applied Gaussian filtering with 1σ for smoothing and derived regions with correlation coefficient values larger than 0.3. We tried a cluster analysis to obtain the regions but decided to simply derive regions based on the correlation coefficient criterion so as not to limit the number of regions. Smoothing was applied only to obtain visually smoother boundaries of the derived regions; it does not significantly affect the performance. Regions smaller than 30 grid cells (2.5° × 2.5° each) were removed to avoid spurious narrow areas of high correlation. The derived regions mostly appear spatially conterminous. Time series data for each region were then obtained by zonal averaging (Fig. 2a shows an example for the target variable TMm, target month August, predictor variable Z500, and monitoring month April). In addition, the sea ice areas and pre-calculated atmospheric and oceanic indices were used directly as predictors.
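To make the procedure concrete, the following is a simplified sketch of these steps in Python (NumPy/SciPy). The absolute-value threshold (the notable predictors include negative correlations) and the unweighted regional mean are our assumptions for illustration, not necessarily the study's exact implementation.

```python
import numpy as np
from scipy import ndimage

def derive_predictor_series(corr_map, field, min_cells=30, thresh=0.3):
    """corr_map: 2D (lat x lon) correlations with the temperature index.
    field: 3D (year, lat, lon) gridded data for the same variable and month.
    Returns one regional-average time series per derived region."""
    corr = corr_map.copy()
    mu, sd = np.nanmean(corr), np.nanstd(corr)
    corr[np.abs(corr - mu) > 5 * sd] = np.nan        # drop outliers beyond 5 sd
    smooth = ndimage.gaussian_filter(np.nan_to_num(corr), sigma=1)
    mask = np.abs(smooth) > thresh                   # |r| > 0.3 regions
    labels, n = ndimage.label(mask)                  # conterminous regions
    series = []
    for k in range(1, n + 1):
        region = labels == k
        if region.sum() < min_cells:                 # skip small chance regions
            continue
        series.append(field[:, region].mean(axis=1))  # regional average
    return series
```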
We obtained additional predictors from the APCC MME for the period with no observed data (Table A3). The regions were derived using the global gridded observational datasets of the train sets, and time series data for each region were obtained by zonal averaging. Time series from the APCC MME were bias corrected against observations using the variance scaling method, a procedure of shifting and scaling to adjust the mean and variance (Teutschbein and Seibert 2012). The bias-corrected MME data were then used for the validation or test sets (Fig. 2b shows an example for the target variable TMm, target month August, predictor variable Z500, and monitoring month June).
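A sketch of the variance scaling correction, following the shift-and-scale description of Teutschbein and Seibert (2012); the synthetic series below are illustrative.

```python
import numpy as np

def variance_scaling(mme_train, obs_train, mme_new):
    """Correct mme_new using statistics of the common training period."""
    shifted = mme_new - mme_train.mean()                    # remove MME mean bias
    scaled = shifted * (obs_train.std() / mme_train.std())  # match variance
    return scaled + obs_train.mean()                        # restore observed mean

rng = np.random.default_rng(0)
obs = rng.normal(0.0, 1.0, 20)                   # synthetic observed series
mme = obs * 0.5 + 0.8 + rng.normal(0, 0.2, 20)   # biased, damped MME analog
corrected = variance_scaling(mme, obs, mme)
```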
b. Gaussian processes
A GP is a supervised learning method for solving regression or classification problems. As a stochastic process, it is a generalization of the Gaussian probability distribution to functions; it governs the properties of functions, whereas a probability distribution describes random variables (Rasmussen and Williams 2006). In supervised learning, we need observed data as inputs xi and outputs yi, and assume yi = f(xi) for some function f. As a nonparametric Bayesian model, a GP defines a prior over functions, p(f|X) = N(f|μ, K); a GP assumes that p[f(x1), …, f(xN)] is jointly Gaussian for a finite but arbitrary set of points x1, …, xN, with some mean μ and covariance K given by Kij = κ(xi, xj), where κ is a positive definite kernel function (Murphy 2012). If points xi and xj are deemed similar by the kernel, the outputs of the function at those points, f(xi) and f(xj), are expected to be similar (Murphy 2012).
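As a worked illustration of this prior, the snippet below builds the standard squared-exponential (SE) covariance, Kij = exp[−(xi − xj)²/(2l²)], and draws sample functions from N(0, K); the one-dimensional inputs and length scale are illustrative only.

```python
import numpy as np

def se_kernel(x1, x2, length_scale=1.0):
    """Squared-exponential kernel on 1D inputs."""
    d2 = (x1[:, None] - x2[None, :]) ** 2            # squared distances
    return np.exp(-0.5 * d2 / length_scale ** 2)

x = np.linspace(-5, 5, 100)
K = se_kernel(x, x) + 1e-8 * np.eye(100)             # jitter for stability
samples = np.random.multivariate_normal(np.zeros(100), K, size=3)
# Each row of `samples` is one smooth function drawn from the GP prior.
```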
c. Forecasting summer temperatures
The GaussianProcessRegressor class of the gaussian_process module, implementing the algorithm of Rasmussen and Williams (2006), from the Python scikit-learn 0.24.0 library was used. The SE kernel was implemented using the kernels.RBF function of the gaussian_process module.
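A sketch of this setup follows. The scalar versus per-predictor length scale in kernels.RBF reproduces the ISO/ANISO distinction used below; the data, the number of predictors, and the upper length scale bound (mirroring the assigned maximum of 10^5 mentioned in section 4a) are illustrative assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

n_predictors = 7
kernel_iso = RBF(length_scale=1.0)                  # one shared length scale
kernel_aniso = RBF(length_scale=np.ones(n_predictors),
                   length_scale_bounds=(1e-5, 1e5))  # one scale per predictor

X = np.random.randn(34, n_predictors)    # synthetic 34-yr train set
y = np.random.randn(34)                  # synthetic detrended anomalies

gp = GaussianProcessRegressor(kernel=kernel_iso, normalize_y=True)
gp.fit(X, y)
mean, std = gp.predict(X[:1], return_std=True)  # posterior predictive mean/std
```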
1) GP with predetermined predictors from observations
The GP model with predetermined predictors from observations (GP-PD) was developed for the 34 LOOCV train-validation sets with the isotropic (hereafter ISO) and anisotropic (hereafter ANISO) length scale matrices of the SE kernel, as shown in Eq. (4). The type of length scale matrix was chosen based on the model performance over the 34 validation sets (Fig. 3a). The model performance of GP-PD was then evaluated for the test period using the chosen type (Fig. 4a).
2) GP with objectively derived predictors from observations
The GP model with objectively derived predictors from observations (GP-OBS) was also developed for the 34 LOOCV train-validation sets. In addition to choosing between the isotropic and anisotropic length scale matrices of the SE kernel, we examined two more conditions of model formulation. We tested the performance of the model with and without restrictions on the geographical extents of the predictor locations. We expected better performance when the locations are not restricted ("locations not restricted," hereafter LNR) because this increases the chance of deriving important predictors. Meanwhile, predictors derived from restricted locations ("locations restricted," hereafter LR) tend to be preferred because they are easier to interpret based on existing studies (e.g., KMA 2018).
Areas for LR are determined considering the locations of the predetermined predictors of existing statistical models (Table 1). Predictors are not used for LR if their centroids are located in the shaded areas in Fig. 5.
We also explored whether the performance of the model can be improved by using only the more relevant predictors, adopting an additional criterion for predictor selection: the significance of correlations with the target variable over the LOOCV train sets (significance levels of 0.01, 0.05, and 0.1, hereafter P1, P5, and P10, respectively).
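A sketch of this screening criterion is shown below; the dictionary-of-series interface is our illustrative assumption.

```python
import numpy as np
from scipy.stats import pearsonr

def screen_predictors(candidates, target, alpha=0.05):
    """Keep a candidate predictor only if its correlation with the target
    over the LOOCV train set is significant at the chosen level
    (alpha = 0.01 for P1, 0.05 for P5, 0.1 for P10)."""
    kept = {}
    for name, series in candidates.items():
        r, p = pearsonr(series, target)
        if p < alpha:                 # significant at the P5 level by default
            kept[name] = series
    return kept
```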
The type of the length scale matrix, the geographical extent of the locations of predictors, and the optimal significance level were determined based on the model performance of 34 validation sets (Fig. 3b). The model performance of GP-OBS was evaluated for the test period using the chosen conditions (Fig. 4b).
3) GP with objectively derived predictors from observations and MME
Since MME data were only available for a shorter period of time (1991–2010) during the overall train period (1980–2013) and predictors from MME were intended to be used not as substitutes but as additional predictors, the same conditions chosen for GP-OBS were used for GP-MME. The model performance of GP-MME was evaluated for the test period (Fig. 4c).
d. Verification measures
The probabilistic forecasts were ultimately presented as tercile probabilistic forecasts of above-normal (AN), near-normal (NN), and below-normal (BN) temperatures for South Korea. We used four categorical verification measures: proportion correct (PC), Heidke skill score (HSS), area under the receiver operating characteristic curve (AUC), and ranked probability skill score (RPSS). The PC and HSS were applied to deterministic categorical forecasts, obtained by converting the probabilistic forecasts based on the highest probability; the AUC and RPSS were applied to the probabilistic forecasts. Reliability diagrams and frequency histograms were also examined. Large samples are required for accurate measurements of reliability (WMO 2019), whereas only the 7 years of the test period were available for assessing model performance; we therefore pooled all target months and indices when examining the reliability diagrams for the GP models.
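Sketches of three of these measures for tercile (BN/NN/AN) forecasts follow; the conventions are the standard definitions (e.g., WMO 2019), with illustrative array shapes: probs is (n, 3) and obs holds integer categories 0–2.

```python
import numpy as np

def proportion_correct(probs, obs):
    return np.mean(np.argmax(probs, axis=1) == obs)

def heidke_skill_score(probs, obs, n_cat=3):
    pred = np.argmax(probs, axis=1)
    pc = np.mean(pred == obs)
    # Expected proportion correct by chance, from the marginal frequencies
    e = sum(np.mean(pred == k) * np.mean(obs == k) for k in range(n_cat))
    return (pc - e) / (1 - e)

def rpss(probs, obs, n_cat=3):
    onehot = np.eye(n_cat)[obs]
    rps = np.mean(np.sum((np.cumsum(probs, 1) - np.cumsum(onehot, 1)) ** 2, 1))
    clim = np.full_like(probs, 1 / n_cat)        # climatological reference
    rps_clim = np.mean(
        np.sum((np.cumsum(clim, 1) - np.cumsum(onehot, 1)) ** 2, 1))
    return 1 - rps / rps_clim                    # 0 for climatology, 1 perfect
```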
e. Primary predictors and their physical mechanisms
Permutation importance scores for each predictor were calculated as the increase in mean squared error when the values of that predictor were randomly permuted. We ranked the predictors by their permutation importance scores for the train and test sets, respectively, and then chose as highly important primary predictors those whose ranks for both sets were within the top 20% of all predictors.
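A sketch of this calculation is given below; the fitted model is assumed to expose a predict method, and the repeat count is an illustrative choice.

```python
import numpy as np

def permutation_importance_mse(model, X, y, n_repeats=30, seed=0):
    """Score each predictor by the MSE increase when its column is permuted."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((model.predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            Xp = X.copy()
            rng.shuffle(Xp[:, j])              # permute one predictor column
            mse = np.mean((model.predict(Xp) - y) ** 2)
            increases.append(mse - base_mse)   # increase over baseline MSE
        scores[j] = np.mean(increases)
    return scores                              # rank predictors by this score
```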
In particular, we chose some notable predictors from GP-OBS and GP-MME for TMm to examine the objectively derived predictors against previously known mechanisms, focusing on June 2016, July 2018, and August 2018, when intense heat waves prevailed in South Korea. Large-scale circulations associated with the predictors were examined to understand their underlying physical mechanisms. This was achieved by computing interannual correlations (41 years, 1980–2020) of the predictors with Z500, OLR, and SST on monthly time scales (both lagged and contemporaneous, from January to the target month). Correlations were also calculated with previously known predictors in KMA (2018) and other studies. This study examines the physical mechanisms of the four most notable predictors (hereafter, notable predictors) (Table 2; Fig. 6).
1) The EU_Z500_Feb
The EU_Z500_Feb is one of the primary predictors of TMm in June 2016 because of its strong positive correlation with TMm in June (r = 0.67). Correlations between the index and Z500 in February (Fig. 7a) show strong positive values in the Arctic and strong negative values in the midlatitudes, especially East Asia. This atmospheric circulation pattern resembles the negative phase of the PE teleconnection pattern. Myoung (2021) showed that a negative phase of the December–February mean PE (PE_DJF) tends to cause a hotter summer and strong heat waves in South Korea: it can increase SSTs in the Philippine Sea during spring and then enhance convection there in summer, which is responsible for the development of downward motion, anticyclonic circulations, and thus high TMm over the Korean Peninsula. These processes are manifested in the correlations of the EU_Z500_Feb, i.e., 1) positive correlations with SST over the Philippine Sea in April (Fig. 7f), 2) negative correlations with OLR over the South China Sea and the Philippine Sea in June (Fig. 7h), and 3) positive correlations with Z500 over South Korea and Japan (Fig. 7g). The EU_Z500_Feb is strongly correlated with the PE_DJF (r = −0.65 for 1980–2020), supporting the inference that the impacts of the EU_Z500_Feb on TMm in June are analogous to those of the PE_DJF on summer temperatures in South Korea.
2) The IND_Z500_Jan
The IND_Z500_Jan is the primary predictor of TMm in July 2018 (r = −0.38 with TMm), indicating that TMm in July tends to be high when Z500 over India in January is low. The correlations between the IND_Z500_Jan and SST in January and February (Figs. 8c,f) show strong negative values in the eastern and central tropical Pacific but positive values in the western tropical Pacific. Negative correlations prevail over the Indian Ocean from late winter to spring (Figs. 8f,i).
3) The NA_PRCP_Feb
The NA_PRCP_Feb is the primary predictor of TMm in August 2018 (r = −0.55 with TMm), indicating that TMm in August tends to be high when precipitation over the northwestern Pacific and the southwestern United States in February is low. Similar patterns are observed in the correlations between the NA_PRCP_Feb and SST (Figs. 9c,f). These SST features resemble typical eastern Pacific La Niña events. Furthermore, correlations of the previous winter-mean Niño-3.4 with the atmospheric variables (not shown) also capture strong links with Z500 over India in January (similar to the IND_Z500_Jan) and with OLR over the northwestern Pacific and the southwestern United States in February (similar to the NA_PRCP_Feb), implying that the physical processes behind the two notable predictors can be understood in terms of the impacts of wintertime ENSO variability on July and August temperatures in South Korea.
The impact of ENSO in the previous winter on summer temperatures in Korea is indirect, operating via SST variability in the Indian Ocean, the so-called Indian Ocean capacitor effect (Xie et al. 2009, 2016). A warm springtime Indian Ocean caused by a preceding wintertime El Niño tends to induce anticyclonic circulations in the western subtropical Pacific (i.e., an enhanced WNPSH) in summer and then cyclonic circulations in East Asia, decreasing summer temperatures in South Korea. The opposite occurs under La Niña conditions: a cool springtime Indian Ocean leads to cyclonic circulations in the western subtropical Pacific (i.e., a weakened WNPSH) and anticyclonic circulations in East Asia, increasing summer temperatures in South Korea. These features are concurrently found in the correlations of the two predictors, IND_Z500_Jan and NA_PRCP_Feb, in Figs. 8j, 9g, and 9j. During the 2017/18 winter, a La Niña event occurred (December–February mean Niño-3.4 = −0.91), causing a cooler Indian Ocean in the following spring. KMA (2018) also pointed out that colder April SSTs over the eastern Indian Ocean ("tropical Indian Ocean" among the predetermined predictors in Table 1) can result in a hotter July and August.
4) The EA_PRCP_Aug
Predictors based on MME outputs are also notable. For instance, a higher TMm in August 2018 was predicted from lower values of the EA_PRCP_Aug, owing to their strong negative correlation (r = −0.83). The correlations in Fig. 10 indicate that TMm in August would increase if the MME predicts strong development of anticyclonic circulations in East Asia and, consequently, suppressed precipitation in South Korea and Japan in August. These results suggest that a reliable MME prediction can itself be a good predictor of summer temperatures in South Korea, as was the case in August 2018.
4. Results and discussions
a. Model formulation by leave-one-year-out cross validation
1) GP with predetermined predictors from observations
Based on the LOOCV over the train period, GP-PD models with the isotropic length scale matrix perform better than the models with the anisotropic length scale matrix in more cases (Table 3 and Fig. 11). Skill scores for ANISO fall below the levels of random forecasts for TMm in July (HSS = −0.01, RPSS = −0.06), TXm in July (PC = 0.24, HSS = −0.19, AUC = 0.47, RPSS = −0.23), and TXm in August (RPSS = −0.08) (Table 3). We therefore selected ISO for GP-PD.
Although GP-PD uses a small number of predictors (Table 1), some predictors are assigned very large length scale values under ANISO. For example, for TNm in June and cv1982 (where the model was trained on data from 1980, 1981, and 1983–2013 and validated on 1982), the shared length scale was 0.17 for ISO, whereas under ANISO the first predictor (SCE of March) received 0.01 and the second (SST of January) received 60 983.50. For TMm in July and cv1980, the ISO length scale was 6.27, whereas under ANISO the length scales of five of the seven predictors exceeded 1000, and the length scale for SIA of the Bering Sea in April reached the assigned maximum value (10^5).
Very large length scale values imply that the covariance is approximately independent of those predictors, effectively excluding them from the inference (Rasmussen and Williams 2006). If removing such irrelevant predictors by assigning them large length scales improved performance, ANISO would outperform ISO. However, our results show that ISO outperforms ANISO overall, for two likely reasons: we included only quite relevant predictors (from existing studies for GP-PD, and from correlation analyses for GP-OBS and GP-MME), and the train period was too short to sample all the different mechanisms. The detailed mechanisms controlling a target variable may differ from year to year, and the role of each predictor may differ accordingly. Assigning separate length scales under ANISO led to the exclusion of some predictors through very large length scales, and the ANISO models were overfitted to the train sets; assigning the same length scale to all predictors under ISO gave better overall results.
2) GP with objectively derived predictors from observations
GP-OBS models with the isotropic length scale matrix (ISO), unrestricted geographical extents of predictor locations (LNR), and the additional predictor-selection criterion at the 0.05 significance level (P5) perform better based on the LOOCV over the train period (see Table 4 and Fig. 12 for ISO and P5). There are more cases in which ISO performs better than ANISO, probably for the same reasons discussed for GP-PD (data not shown). The effect of the additional selection criterion was not large; skill scores for P5 were only slightly higher (data not shown). However, there are some notable differences in skill scores between LR and LNR (Table 4 and Fig. 12 for ISO and P5). Although there are more cases with skill scores below the levels of random forecasts for LNR (PC = 0.29, HSS = −0.06, and RPSS = −0.07 for TMm in August; RPSS = −0.16 for TXm in August) than for LR (RPSS = −0.05 and −0.07 for TMm and TXm in August, respectively), the overall performance of LNR is better than that of LR. We selected ISO, LNR, and P5 for GP-OBS and GP-MME.
Table 4. Performance of GP-OBS from 34 LOOCV train-validation sets with the ISO and P5 conditions.
LNR performs better than LR, implying that predictors located in areas that have not been considered in previous studies may also play important roles. Although we do not interpret every predictor individually, these results demonstrate the value of selecting predictors objectively.
b. Model verifications
GP-PD, GP-OBS, and GP-MME were evaluated for the test period from 2014 to 2020 (Fig. 4). There was an overlap between the predictor analysis period of GP-PD (KMA 2018) and the test period: the analysis period for the first eight predictors of GP-PD (Table 1) was 1979–2016, and the analysis period for the remaining four predictors covered at least 2014 (KMA 2018). We nevertheless used the 2014–20 test period for the final evaluations because the 2017–20 period alone was too short.
First, we compared GP-PD and GP-OBS, because both use predictors only from observations. GP-OBS outperformed GP-PD in most cases in June and July (Table 5 and Fig. 13). Although GP-PD shows higher PC and HSS scores for TXm in June, GP-OBS presents higher AUC and RPSS scores, indicating better calibration (Table 5 and Fig. 13). In August, however, GP-PD performs better for all temperature indices except the AUC of TNm (Table 5 and Fig. 13), possibly because its predictor analysis period overlaps the test period, including 2014, as mentioned above. While GP-PD correctly predicted TMm and TXm in 2014 as BN, GP-OBS predicted them as AN (Fig. 14).
Table 5. Performance of GP-PD, GP-OBS, and GP-MME for the test period.
We then compared GP-OBS and GP-MME because we used additional predictors from APCC MME for the period with no observed data for GP-MME. The effect of additional predictors from MME is not apparent for June and July; GP-OBS and GP-MME show highly comparable scores with some disagreements between verification measures for TMm in June and TXm in July (Table 5 and Fig. 13). However, GP-MME mostly outperformed GP-OBS in August, implying larger contributions of the additional predictors from MME. GP-MME even outperformed GP-PD for TMm and TNm in August (Table 5 and Fig. 13).
Overall, GP-OBS with objectively selected predictors performs better than GP-PD with predetermined predictors in June and July, and GP-MME with objectively selected predictors from observations and MME mostly outperforms both GP-OBS and GP-PD, especially in August. While the contribution of the objective selection of predictors is apparent for all target months, the contribution of the additional MME predictors is most prominent in August because the period with no observed data is longest for that target month: observed data are available only through April (at most) in mid-May, so MME data from at least May through August are used for the August target month.
We also examined the overall performance of the models for the test period using ROC curves, reliability diagrams, and frequency histograms (Fig. 15). The ROC curves of GP-OBS and GP-MME show better skill than those of GP-PD for AN and BN, although AUC scores for BN are relatively low for all three models (Fig. 15). The reliability curves are not smooth because of the small sample size (WMO 2019); there were only 63 sets of data in the test period pooled over all target months and temperature indices. Nevertheless, the reliability of the AN and NN forecasts appears improved for GP-OBS and GP-MME relative to GP-PD, with enhanced sharpness (Fig. 15). Enhanced reliability and sharpness are expected to yield better predictions; for example, the means of the probability distributions of GP-OBS and GP-MME are closer to the observed values, and the probability density is higher, in June 2016 and in July and August 2018 (Fig. 16).
5. Conclusions
We propose the OLRAF-GP model to be used for regions in great need of more predictors besides the well-known drivers, such as ENSO. We focus on summertime near-surface air temperatures, including TMm, TNm, and TXm in June (1-month lead forecast), July (2-month lead forecast), and August (3-month lead forecast).
The proposed model is based on GPs, which can capture nonlinear interactions and produce probabilistic forecasts. The predictors of the proposed model were objectively selected based on their relationships with the target variables. We compared GP-PD and GP-OBS, because both use predictors only from observations. While GP-PD still performs better in August for all temperature indices except the AUC of TNm, GP-OBS outperformed GP-PD in most cases in June and July, reflecting the contributions of the objectively derived predictors. The proposed model also uses dynamical climate model results from the APCC MME as additional predictors for the period with no observed data. Comparing GP-OBS and GP-MME, the effect of the additional MME predictors is not apparent in June and July; in August, however, GP-MME mostly outperformed GP-OBS, implying larger contributions of the additional MME predictors, and it even outperformed GP-PD for TMm and TNm. The use of the OLRAF-GP models, especially GP-MME, is expected to contribute to better forecasts of summertime temperatures in regions where existing models have been struggling.
We also attempted to identify the physical mechanisms of the notable predictors and to relate them to previously known predictors and mechanisms. As demonstrated in section 3e, the physical processes associated with the notable predictors align with those in KMA (2018) and previous studies (e.g., Xie et al. 2009, 2016; Myoung 2021). For example, the skill of the IND_Z500_Jan and the NA_PRCP_Feb as predictors of TMm in July and August, respectively, appears attributable to El Niño/La Niña conditions in the previous winter and the related Indian Ocean capacitor effect (Figs. 8 and 9). Additionally, the EU_Z500_Feb tends to increase TMm in June via its influence on SSTs in the Philippine Sea and, in turn, summer temperatures in Korea. These processes are not documented in KMA (2018), but they are analogous to the recently published impacts of the wintertime PE on summer temperatures in South Korea (Myoung 2021). These results imply that the mechanisms of the objectively selected predictors can be physically meaningful and that including such predictors can improve model performance and efficiency.
Acknowledgments.
This research was supported by the APEC Climate Center (APCC). The authors acknowledge the APCC multi-model ensemble (MME) Producing Centers for making their hindcast/forecast data available for analysis, the APCC for collecting and archiving the data, and for producing APCC MME predictions.
APPENDIX A
Observed Climate Data and APCC Multi-Model Ensemble Data
We used global gridded datasets of OLR, SST, PRCP, SCE, Z500, and Z850 to derive predictors affecting summer temperatures of South Korea (Table A1). SIA for the Barents, Bering, Kara, and Laptev Seas as well as pre-calculated monthly atmospheric and oceanic time series, such as AO and PDO, were used as well-known potential predictors (Table A2). Monthly mean MME data for SST, PRCP, and Z500 calculated based on the simple composite method (SCM) during hindcast and forecast periods were also used (Table A3).
Table A2. Sea ice and pre-calculated atmospheric and oceanic indices used as predictors.
Table A3. APCC MME datasets used to derive predictors.
APPENDIX B
Verification Measures
A receiver operating characteristic (ROC) graph has been used in many fields for visualizing and analyzing the behavior of diagnostic systems and for evaluating and comparing model performance (Fawcett 2006; Mason 1982). ROC graphs are two-dimensional, with the false positive rate on the x axis and the true positive rate on the y axis, obtained from a set of thresholds covering the range of forecast probabilities. Because a ROC curve is a two-dimensional depiction of performance, we used the AUC, the area under the ROC curve, to compare models (Hanley and McNeil 1982). The AUC is 1.0 for a perfect set of forecasts and 0.5 for random forecasts. Micro-averaged AUC values were used owing to the lack of observed data in the BN category for TMm and TXm in June during the test period.
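A sketch of the micro-averaged AUC computation for tercile forecasts is shown below; pooling all category/sample pairs avoids undefined per-category AUCs when a category has no observed cases. The probabilities are illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

probs = np.array([[0.2, 0.3, 0.5],      # illustrative tercile probabilities
                  [0.6, 0.3, 0.1],
                  [0.1, 0.5, 0.4],
                  [0.3, 0.4, 0.3]])
obs = np.array([2, 0, 1, 1])            # observed category per year
onehot = np.eye(3)[obs]                 # binarized (one-vs-rest) labels

auc_micro = roc_auc_score(onehot, probs, average="micro")
```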
The reliability of forecasts can be assessed using reliability diagrams (Wilks 1995; Hamill 1997), with forecast probability on the x axis and observed relative frequency on the y axis. A forecasting system is considered reliable if the forecast probabilities agree with the observed relative frequencies; the diagonal line of a reliability diagram therefore represents perfect reliability.
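A sketch of the binning behind such a diagram follows; the bin count is an illustrative choice.

```python
import numpy as np

def reliability_points(prob, hit, n_bins=5):
    """prob: forecast probability of an event; hit: 1 if it occurred, else 0.
    Returns mean forecast probability and observed frequency per bin."""
    edges = np.linspace(0, 1, n_bins + 1)
    idx = np.clip(np.digitize(prob, edges) - 1, 0, n_bins - 1)
    fcst = [prob[idx == b].mean() for b in range(n_bins) if (idx == b).any()]
    freq = [hit[idx == b].mean() for b in range(n_bins) if (idx == b).any()]
    return np.array(fcst), np.array(freq)   # plot against the diagonal
```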
REFERENCES
Adler, R. F., and Coauthors, 2003: The version-2 Global Precipitation Climatology Project (GPCP) Monthly Precipitation Analysis (1979–present). J. Hydrometeor., 4, 1147–1167, https://doi.org/10.1175/1525-7541(2003)004<1147:TVGPCP>2.0.CO;2.
Badr, H. S., B. F. Zaitchik, and S. D. Guikema, 2014: Application of statistical models to the prediction of seasonal rainfall anomalies over the Sahel. J. Appl. Meteor. Climatol., 53, 614–636, https://doi.org/10.1175/JAMC-D-13-0181.1.
Barnston, A. G., and R. E. Livezey, 1987: Classification, seasonality and persistence of low-frequency atmospheric circulation patterns. Mon. Wea. Rev., 115, 1083–1126, https://doi.org/10.1175/1520-0493(1987)115<1083:CSAPOL>2.0.CO;2.
Barnston, A. G., and Coauthors, 1994: Long-lead seasonal forecasts—Where do we stand? Bull. Amer. Meteor. Soc., 75, 2097–2114, https://doi.org/10.1175/1520-0477(1994)075<2097:LLSFDW>2.0.CO;2.
Bell, G. D., and J. E. Janowiak, 1995: Atmospheric circulation associated with the Midwest floods of 1993. Bull. Amer. Meteor. Soc., 76, 681–696, https://doi.org/10.1175/1520-0477(1995)076<0681:ACAWTM>2.0.CO;2.
Brier, G. W., 1950: Verification of forecasts expressed in terms of probabilities. Mon. Wea. Rev., 78, 1–3, https://doi.org/10.1175/1520-0493(1950)078<0001:VOFEIT>2.0.CO;2.
Chen, S., R. Wu, W. Chen, K. Hu, and B. Yu, 2020: Structure and dynamics of a springtime atmospheric wave train over the North Atlantic and Eurasia. Climate Dyn., 54, 5111–5126, https://doi.org/10.1007/s00382-020-05274-7.
Chiang, J. C. H., and D. J. Vimont, 2004: Analogous Pacific and Atlantic meridional modes of tropical atmosphere–ocean variability. J. Climate, 17, 4143–4158, https://doi.org/10.1175/JCLI4953.1.
Choi, W., and W.-Y. Kim, 2019: Summertime variability of the western North Pacific subtropical high and its synoptic influences on the East Asian weather. Sci. Rep., 9, 7865, https://doi.org/10.1038/s41598-019-44414-w.
Ebdon, R. A., 1960: Notes on the wind flow at 50mb in tropical and sub-tropical regions in January 1957 and January 1958. Quart. J. Roy. Meteor. Soc., 86, 540–542, https://doi.org/10.1002/qj.49708637011.
Enfield, D. B., A. M. Mestas-Nunez, D. A. Mayer, and L. Cid-Serrano, 1999: How ubiquitous is the dipole relationship in tropical Atlantic sea surface temperatures? J. Geophys. Res., 104, 7841–7848, https://doi.org/10.1029/1998JC900109.
Enfield, D. B., A. M. Mestas-Nunez, and P. J. Trimble, 2001: The Atlantic multidecadal oscillation and its relation to rainfall and river flows in the continental U.S. Geophys. Res. Lett., 28, 2077–2080, https://doi.org/10.1029/2000GL012745.
Epstein, E. S., 1969: A scoring system for probability forecasts of ranked categories. J. Appl. Meteor., 8, 985–987, https://doi.org/10.1175/1520-0450(1969)008<0985:ASSFPF>2.0.CO;2.
Fawcett, T., 2006: An introduction to ROC analysis. Pattern Recognit. Lett., 27, 861–874, https://doi.org/10.1016/j.patrec.2005.10.010.
Fetterer, F., K. Knowles, W. N. Meier, M. Savoie, and A. K. Windnagel, 2017: Sea ice index, version 3 (N_Sea_Ice_Index_Regional_Monthly_Data_G02135_v3.0.xlsx). National Snow and Ice Data Center, Boulder, CO, accessed 15 May 2020, https://doi.org/10.7265/N5K072F8.
Genthon, C., and A. Armengaud, 1995: GCM simulations of atmospheric tracers in the polar latitudes: South Pole (Antarctica) and Summit (Greenland) cases. Sci. Total Environ., 160–161, 101–116, https://doi.org/10.1016/0048-9697(95)04348-5.
Gneiting, T., A. E. Raftery, A. H. Westveld, and T. Goldman, 2005: Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Mon. Wea. Rev., 133, 1098–1118, https://doi.org/10.1175/MWR2904.1.
Graystone, P., 1959: Meteorological office discussion on tropical meteorology. Meteor. Mag., 88, 117.
Ham, Y. G., J. S. Kug, J. Y. Park, and F. F. Jin, 2013: Sea surface temperature in the north tropical Atlantic as a trigger for El Niño/Southern Oscillation events. Nat. Geosci., 6, 112–116, https://doi.org/10.1038/ngeo1686.
Hamill, T. M., 1997: Reliability diagrams for multicategory probabilistic forecasts. Wea. Forecasting, 12, 736–741, https://doi.org/10.1175/1520-0434(1997)012<0736:RDFMPF>2.0.CO;2.
Hanley, J. A., and B. J. McNeil, 1982: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology, 143, 29–36, https://doi.org/10.1148/radiology.143.1.7063747.
Heidke, P., 1926: Berechnung Des Erfolges Und Der Gute Der Windstarkevorhersagen Im Sturmwarnungsdienst (in German). Geogr. Ann., 8, 301–349, https://doi.org/10.1080/20014422.1926.11881138.
Huang, B., and Coauthors, 2017: NOAA Extended Reconstructed Sea Surface Temperature (ERSST), version 5. NOAA/National Centers for Environmental Information, accessed 15 May 2020, https://doi.org/10.7289/V5T72FNM.
IPCC, 2012: Managing the Risks of Extreme Events and Disasters to Advance Climate Change Adaptation. Cambridge University Press, 582 pp.
Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. Bull. Amer. Meteor. Soc., 77, 437–472, https://doi.org/10.1175/1520-0477(1996)077<0437:TNYRP>2.0.CO;2.
Keenlyside, N. S., and M. Latif, 2007: Understanding equatorial Atlantic interannual variability. J. Climate, 20, 131–142, https://doi.org/10.1175/JCLI3992.1.
KMA, 2018: Long-Range Forecast Guidance (III) Based on Climate Monitoring and Analyses (in Korean). KMA, 226 pp.
Kumar, A., 2010: On the assessment of the value of the seasonal forecast information. Meteor. Appl., 17, 385–392, https://doi.org/10.1002/met.167.
Lee, E.-J., S.-W. Yeh, J.-G. Jhun, and B.-K. Moon, 2006: Seasonal change in anomalous WNPSH associated with the strong East Asian summer monsoon. Geophys. Res. Lett., 33, L21702, https://doi.org/10.1029/2006GL027474.
Liebmann, B., and C. A. Smith, 1996: Description of a complete (interpolated) outgoing longwave radiation dataset. Bull. Amer. Meteor. Soc., 77, 1275–1277, http://www.jstor.org/stable/26233278.
Lindzen, R. S., and J. R. Holton, 1968: A theory of the quasi-biennial oscillation. J. Atmos. Sci., 25, 1095–1107, https://doi.org/10.1175/1520-0469(1968)025<1095:ATOTQB>2.0.CO;2.
Lorenz, E. N., 1951: Seasonal and irregular variations of the Northern Hemisphere sea-level pressure profile. J. Meteor., 8, 52–59, https://doi.org/10.1175/1520-0469(1951)008<0052:SAIVOT>2.0.CO;2.
Mantua, N. J., S. R. Hare, Y. Zhang, J. M. Wallace, and R. C. Francis, 1997: A Pacific interdecadal climate oscillation with impacts on salmon production. Bull. Amer. Meteor. Soc., 78, 1069–1079, https://doi.org/10.1175/1520-0477(1997)078<1069:APICOW>2.0.CO;2.
Mason, I., 1982: A model for assessment of weather forecasts. Aust. Meteor. Mag., 30, 291–303.
Min, Y. M., V. N. Kryjov, and S. M. Oh, 2014: Assessment of APCC multi-model ensemble prediction in seasonal climate forecasting: Retrospective (1983–2003) and real-time forecast (2008–2013). J. Geophys. Res. Atmos., 119, 12 132–12 150, https://doi.org/10.1002/2014JD022230.
Min, Y. M., V. N. Kryjov, S. M. Oh, and H. J. Lee, 2017: Skill of real-time operational forecasts with the APCC multi-model ensemble prediction system during the period 2008–2015. Climate Dyn., 49, 4141–4156, https://doi.org/10.1007/s00382-017-3576-2.
Murphy, K. P., 2012: Machine Learning: A Probabilistic Perspective. The MIT Press, 1067 pp.
Myoung, B., 2021: Recent trend of cold winters followed by hot summers in South Korea due to the combined effects of the warm western tropical Pacific and North Atlantic in spring. Environ. Res. Lett., 16, 084014, https://doi.org/10.1088/1748-9326/ac1134.
Overland, J. E., J. M. Adams, and N. A. Bond, 1999: Decadal variability of the Aleutian low and its relation to high-latitude circulation. J. Climate, 12, 1542–1548, https://doi.org/10.1175/1520-0442(1999)012<1542:DVOTAL>2.0.CO;2.
Raftery, A. E., T. Gneiting, F. Balabdaoui, and M. Polakowski, 2005: Using Bayesian model averaging to calibrate forecast ensembles. Mon. Wea. Rev., 133, 1155–1174, https://doi.org/10.1175/MWR2906.1.
Rajeevan, M., D. S. Pai, R. A. Kumar, and B. Lal, 2007: New statistical models for long-range forecasting of southwest monsoon rainfall over India. Climate Dyn., 28, 813–828, https://doi.org/10.1007/s00382-006-0197-6.
Rasmussen, C. E., and C. K. I. Williams, 2006: Gaussian Processes for Machine Learning. The MIT Press, 248 pp.
Ray, P., X. Zhou, H. Tan, J. Dudhia, and M. W. Moncrieff, 2021: Improved simulation of midlatitude climate in a new channel model compared to contemporary global climate models. Geophys. Res. Lett., 48, e2021GL093297, https://doi.org/10.1029/2021GL093297.
Robinson, D. A., T. W. Estilow, and NOAA CDR Program, 2012: NOAA Climate Data Record (CDR) of Northern Hemisphere (NH) Snow Cover Extent (SCE), version 1. NOAA/National Centers for Environmental Information, accessed 15 May 2020, https://doi.org/10.7289/V5N014G9.
Schepen, A., Q. J. Wang, and D. E. Robertson, 2012: Combining the strengths of statistical and dynamical modeling approaches for forecasting Australian seasonal rainfall. J. Geophys. Res., 117, D20107, https://doi.org/10.1029/2012JD018011.
Sillmann, J., and Coauthors, 2017: Understanding, modeling and predicting weather and climate extremes: Challenges and opportunities. Wea. Climate Extremes, 18, 65–74, https://doi.org/10.1016/j.wace.2017.10.003.
Tang, Y., and Coauthors, 2018: Progress in ENSO prediction and predictability study. Natl. Sci. Rev., 5, 826–839, https://doi.org/10.1093/nsr/nwy105.
Teutschbein, C., and J. Seibert, 2012: Bias correction of regional climate model simulations for hydrological climate-change impact studies: Review and evaluation of different methods. J. Hydrol., 456–457, 12–29, https://doi.org/10.1016/j.jhydrol.2012.05.052.
Thompson, D. W. J., and J. M. Wallace, 1998: The Arctic oscillation signature in the wintertime geopotential height and temperature fields. Geophys. Res. Lett., 25, 1297–1300, https://doi.org/10.1029/98GL00950.
Trenberth, K. E., 1997: The definition of El Niño. Bull. Amer. Meteor. Soc., 78, 2771–2777, https://doi.org/10.1175/1520-0477(1997)078<2771:TDOENO>2.0.CO;2.
Trenberth, K. E., and D. P. Stepaniak, 2001: Indices of El Niño evolution. J. Climate, 14, 1697–1701, https://doi.org/10.1175/1520-0442(2001)014<1697:LIOENO>2.0.CO;2.
Troup, A. J., 1965: The ‘southern oscillation.’ Quart. J. Roy. Meteor. Soc., 91, 490–506, https://doi.org/10.1002/qj.49709139009.
van Oldenborgh, G. J., M. A. Balmaseda, L. Ferranti, T. N. Stockdale, and D. L. T. Anderson, 2005: Did the ECMWF seasonal forecast model outperform statistical ENSO forecast models over the last 15 years? J. Climate, 18, 3240–3249, https://doi.org/10.1175/JCLI3420.1.
Wallace, J. M., and D. S. Gutzler, 1981: Teleconnections in the geopotential height field during the Northern Hemisphere winter. Mon. Wea. Rev., 109, 784–812, https://doi.org/10.1175/1520-0493(1981)109<0784:TITGHF>2.0.CO;2.
Wang, B., and Q. Zhang, 2002: Pacific–East Asian teleconnection. Part II: How the Philippine Sea anticyclone is established during El Niño development. J. Climate, 15, 3252–3265, https://doi.org/10.1175/1520-0442(2002)015<3252:PEATPI>2.0.CO;2.
Wang, B., R. Wu, and T. Li, 2003: Atmosphere–warm ocean interaction and its impacts on the Asian–Australian monsoon variation. J. Climate, 16, 1195–1211, https://doi.org/10.1175/1520-0442(2003)16<1195:AOIAII>2.0.CO;2.
Wang, C., and D. B. Enfield, 2001: The tropical Western Hemisphere warm pool. Geophys. Res. Lett., 28, 1635–1638, https://doi.org/10.1029/2000GL011763.
Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences: An Introduction. International Geophysics Series, Vol. 59, Elsevier, 467 pp.
WMO, 2019: Guidance on operational practices for objective seasonal forecasting. WMO-Rep. 1246, WMO, 91 pp.
Xie, S.-P., K. M. Hu, J. Hafner, H. Tokinaga, Y. Du, G. Huang, and T. Sampe, 2009: Indian Ocean capacitor effect on Indo-western Pacific climate during the summer following El Niño. J. Climate, 22, 730–747, https://doi.org/10.1175/2008JCLI2544.1.
Xie, S.-P., Y. Kosaka, Y. Du, K. M. Hu, J. S. Chowdary, and G. Huang, 2016: Indo-western Pacific Ocean capacitor and coherent climate anomalies in post-ENSO summer: A review. Adv. Atmos. Sci., 33, 411–432, https://doi.org/10.1007/s00376-015-5192-6.
Zhao, W., W. Chen, S. Chen, S. L. Yao, and D. Nath, 2020: Combined impact of tropical central‐eastern Pacific and North Atlantic sea surface temperature on precipitation variation in monsoon transitional zone over China during August–September. Int. J. Climatol., 40, 1316–1327, https://doi.org/10.1002/joc.6231.