1. Introduction
Anthropogenic climate change has been extensively examined in recent decades, and the main findings from these studies have been used to better understand past and future climate change and variability. Among others, the Fifth Assessment Report (AR5) of the Intergovernmental Panel on Climate Change (IPCC) concluded that human activity has been the primary cause of changes in global and regional climate since the mid-twentieth century and that climate changes will very likely continue into the future (IPCC 2013). These results are largely based on climate projection experiments that prescribe external forcings in the climate models, coordinated under phase 5 of the Coupled Model Intercomparison Project (CMIP5; Taylor et al. 2012).
Unlike climate projections that span multiple decades, near-term climate change is controlled not only by anthropogenic forcing but also by natural variability (e.g., Meehl et al. 2014). This is particularly true for climate change and variability on time scales of a decade or shorter (Smith et al. 2010; Ham et al. 2014; Meehl and Teng 2014). Smith et al. (2007) showed that interannual-to-decadal climate change and variability can be partially predicted by capturing the natural variability in a coupled model through the initialization of oceanic variables. Doblas-Reyes et al. (2013) and Meehl and Teng (2014) further showed that climate models with observed initial and boundary conditions produce a more realistic near-surface air temperature (TAS; 2-m air temperature) evolution than uninitialized simulations, particularly for the mid-1970s climate shift and the early-2000s global warming hiatus.
Seasonal-to-interannual climate predictions, like interannual-to-decadal predictions, are still in their infancy in comparison to single-season prediction or multidecadal climate projections. Since the early 2000s, several international programs, such as the Development of a European Multimodel Ensemble System for Seasonal-to-Interannual Prediction (DEMETER; Palmer et al. 2004), the Climate Prediction and Its Application to Society (CliPAS; Wang et al. 2009), and the Ensemble-Based Predictions of Climate Changes and Their Impacts (ENSEMBLES; Weisheimer et al. 2009), have assessed climate predictability on multiseasonal time scales with forecast lead times of up to 9 months. Most operational climate prediction systems have useful prediction skill for El Niño–Southern Oscillation (ENSO) out to 6–9 months (e.g., Jin et al. 2008; Wang et al. 2009; Luo et al. 2016). The JAMSTEC SINTEX-F model has demonstrated skillful ENSO prediction at lead times of up to 2 years (Luo et al. 2008), indicating the potential for seamless seasonal-to-decadal climate predictions.
The above studies suggest that initialized coupled models are promising for climate prediction at seasonal-to-decadal time scales. However, previous studies have commonly analyzed seasonal-to-interannual and interannual-to-decadal climate predictions separately. The former have typically been evaluated with operational climate predictions of up to 6–9 months; although some studies documented prediction skill at up to a 2-yr lead (e.g., Luo et al. 2008), multimodel analyses covering months to years are rare. The latter, in contrast, typically utilize 5- to 10-yr-long model simulations, but most studies have evaluated prediction skills after smoothing the variables of interest with 1-, 2–5-, or 2–9-yr averages. As such, seasonality and interannual variability have not been addressed.
The present study aims to evaluate the seasonal-to-interannual prediction skills of TAS and ENSO in state-of-the-art climate models that have participated in the CMIP5 decadal hindcast/forecast experiments, which provide 10-yr-long ensemble predictions from multiple modeling groups [here called CMIP5 decadal hindcasts (Meehl et al. 2014)]. All simulations are initialized with observed initial and boundary conditions in either November or January of selected years and integrated for up to 10 years. These simulations have already been utilized to understand the prediction skills for TAS, sea surface temperature (SST), precipitation, hurricane activity, and regional surface climate on interannual-to-decadal time scales (Smith et al. 2010; Kim et al. 2012; van Oldenborgh et al. 2012; Doblas-Reyes et al. 2013; Goddard et al. 2013; Caron et al. 2014; Meehl et al. 2014; Bellucci et al. 2015). However, regional and global prediction skills at seasonal-to-interannual time scales have not been extensively examined.
In this study, to quantify the seasonal-to-interannual prediction skills for TAS and ENSO, individual and multimodel ensemble (MME) hindcasts of up to 3–4 years are evaluated. Seasonal-mean values are used instead of annual-mean or multiyear-mean values, and only simulations initialized in the same month are used. This allows us to objectively identify the seasonal dependency of prediction skills and to quantify the maximum forecast lead time. The reliability of prediction skill and characteristics of ensemble dispersion (model error and ensemble spread) are also examined.
In the next section, a brief description of the data is presented. The two deterministic metrics used to quantify the prediction skills are described in section 3. Section 4 presents the prediction skills for TAS and ENSO in CMIP5 decadal hindcasts. A summary and discussion are presented in section 5.
2. Data
Table 1 summarizes the CMIP5 models used in this study. These models are [in IPCC AR5 acronyms (IPCC 2013)] BCC_CSM1.1 (Wu et al. 2014), CanCM4 (Arora et al. 2011), GFDL CM2.1 (Delworth et al. 2006), MIROC5 (Tatebe et al. 2012), and MPI-ESM-LR (Matei et al. 2012). Two additional sets of hindcasts, based on HadCM3 (Smith et al. 2010) with anomaly initialization (HadCM3a) and full-field initialization (HadCM3f), are examined separately. Although more than 10 models are available for decadal hindcasts in the CMIP5 archive (Meehl and Teng 2014), only these seven models, which are initialized every year since 1980, are selected in this study. Only the satellite era (after 1980) is considered, because this is when reliable TAS data over the oceans are available. Each model in this study provides at least 27 experiments with starting years between 1981 and 2011. The prediction skills of MME hindcasts are evaluated using the five models (BCC_CSM1.1, CanCM4, GFDL CM2.1, MIROC5, and MPI-ESM-LR) initialized in January of every year. Since the last BCC_CSM1.1 initialization is in 2007, the MME is evaluated with starting years from 1981 to 2007. Since HadCM3a and HadCM3f are initialized in November from 1980 to 2009 instead of in January, they are not included in the MME and are analyzed separately; MME-H denotes the ensemble mean of HadCM3a and HadCM3f. Each of these experiments consists of at least three ensemble members (Table 1) and is integrated for 10 years.
Table 1. Description of the models used in this study.

The prediction skill is evaluated by comparing the model hindcasts/forecasts with the TAS of ERA-Interim (Dee and Uppala 2009) and with the observed SST of the Extended Reconstructed Sea Surface Temperature, version 3b (ERSSTv3b; Smith et al. 2008). Observations and model data are first interpolated to a uniform 2.5° × 2.5° grid to reduce the uncertainty associated with different data resolutions. A full-field bias adjustment is then applied following the guidelines of CLIVAR (2011); this removes the model drift (i.e., time-varying bias) that may remain even when anomaly initialization is used (Smith et al. 2013; Meehl and Teng 2014). After bias correction, the ensemble mean of each experiment is calculated, and the prediction skill scores are computed from the ensemble-mean data. Likewise, the MME is constructed by averaging each model's ensemble mean rather than all individual ensemble members.
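The drift correction can be made concrete with a short Python sketch of a lead-time-dependent full-field bias adjustment in the spirit of the CLIVAR (2011) protocol; the array layout and function name are illustrative assumptions, not the code actually used here.

```python
import numpy as np

def remove_drift(hindcast, obs):
    """Lead-dependent full-field bias (drift) adjustment.

    hindcast : (n_start_years, n_leads, nlat, nlon) ensemble-mean forecasts
    obs      : same shape, aligned so obs[s, t] verifies hindcast[s, t]

    The drift is the forecast-minus-observation difference averaged over
    all start years, estimated separately at each lead time (and grid
    point), and then subtracted from every hindcast.
    """
    drift = np.nanmean(hindcast - obs, axis=0)  # one bias field per lead
    return hindcast - drift
```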
Prediction skills are primarily evaluated for the global-mean TAS and an ENSO index. ENSO, which has important implications for interannual climate variability not only in the tropics but also in the extratropics, is quantified by the Niño-3.4 index (SST averaged over 170°–120°W and 5°S–5°N). Although seasonal-to-interannual prediction typically covers a period from 1 month up to 2 years, the prediction skills in this study are evaluated over the first 3–4 years of the ensemble hindcast runs.
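For illustration, a minimal Python sketch of the Niño-3.4 index calculation follows, assuming an SST field on a regular latitude-longitude grid with longitudes in the 0°–360° convention (the function name and array layout are hypothetical).

```python
import numpy as np

def nino34(sst, lat, lon):
    """Nino-3.4 index: SST averaged over 5S-5N, 170-120W (190-240E).

    sst : (..., n_lat, n_lon) SST field; lat, lon : 1D coordinates (deg).
    """
    jlat = (lat >= -5.0) & (lat <= 5.0)
    jlon = (lon >= 190.0) & (lon <= 240.0)
    box = sst[..., jlat, :][..., jlon]
    w = np.cos(np.deg2rad(lat[jlat]))  # area weights (nearly 1 this close to the equator)
    return np.average(box.mean(axis=-1), axis=-1, weights=w)
```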
3. Method
Two deterministic metrics are used to quantify the prediction skills: the anomaly correlation coefficient (ACC) and the mean squared skill score (MSSS; Murphy 1988; Goddard et al. 2013). For hindcast anomalies f'_j and observed anomalies o'_j over n start years, the ACC is defined as

ACC = \frac{\sum_{j=1}^{n} f'_j o'_j}{\sqrt{\sum_{j=1}^{n} f'^2_j \sum_{j=1}^{n} o'^2_j}},

which measures how well the model predicts the sign and phase of the anomaly but is insensitive to amplitude errors. The MSSS compares the mean squared error (MSE) of the hindcasts with that of a climatological forecast, MSSS = 1 - MSE_f / MSE_c. For bias-adjusted anomalies, it can be decomposed following Murphy (1988) as

MSSS = ACC^2 - \left( ACC - \frac{s_f}{s_o} \right)^2. \qquad (1)

Here, s_f and s_o denote the standard deviations of the predicted and observed anomalies, and the second term on the right-hand side represents the conditional bias of the forecast. The statistical significance of both metrics is assessed at the 95% confidence level (see also DelSole and Tippett 2014).
With regard to skillful prediction, subjective values [i.e., ACC greater than 0.5–0.7 and MSSS greater than 0 (i.e., a smaller mean squared error than that of a climatological forecast)] are often adopted. In this study, a prediction is regarded as successful in both the sign and the magnitude of the anomaly when the ACC and MSSS are statistically significant and the ACC is greater than 0.5.
The characteristics of ensemble dispersion are further quantified with the model error and the ensemble spread. The model error is the root-mean-square error of each model's ensemble-mean hindcast with respect to the observation, and the ensemble spread is the root-mean-square deviation of the individual ensemble members from their ensemble mean. A spread-to-error ratio below (above) one indicates an underdispersed (overdispersed) ensemble.
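As a minimal illustration of the ACC and MSSS defined above, the two metrics can be computed for a single bias-adjusted anomaly time series as follows (a sketch, not the authors' code):

```python
import numpy as np

def acc(f, o):
    """Anomaly correlation coefficient between forecast and observed anomalies."""
    fa, oa = f - f.mean(), o - o.mean()
    return (fa * oa).sum() / np.sqrt((fa**2).sum() * (oa**2).sum())

def msss(f, o):
    """Mean squared skill score relative to a climatological forecast.

    For centered anomalies this equals the Murphy (1988) decomposition
    ACC**2 - (ACC - s_f/s_o)**2 given as Eq. (1).
    """
    mse_f = np.mean((f - o) ** 2)
    mse_c = np.mean((o - o.mean()) ** 2)  # MSE of the climatology forecast
    return 1.0 - mse_f / mse_c
```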



4. Results
a. Global-mean near-surface air temperature
The overall performance of the CMIP5 decadal hindcasts is first evaluated for TAS prediction. Figure 1a shows the time series of annual-mean TAS averaged over 60°S–70°N and 0°–360°E [global-mean near-surface air temperature (GMT)] predicted by the MME. Only the first four prediction years are shown in colors and compared with the reanalysis data (black line). The model drift is successfully removed by the bias correction, and the MME predicts the long-term trend of GMT reasonably well. A quantitative evaluation of the MME prediction skills is shown in Fig. 1b as a function of forecast lead year, with the four seasons considered separately. Here, March–May (MAM) of the first prediction year corresponds to 2–4 months after model initialization in January; likewise, December–February (DJF) of the first prediction year denotes 11–13 months after initialization. In general, the MME (solid line) shows higher ACCs than the individual models (crosses in Fig. 1b), although this is not always true, as has been demonstrated in previous studies (e.g., Palmer et al. 2004; Wang et al. 2009). The MME efficiency, defined as the difference between the MME skill and the averaged skill of the individual models (Wang et al. 2009), depends on the individual models' prediction skills and their intermodel independence (Yoo and Kang 2005). In the CMIP5 models, the MME efficiency tends to decrease with forecast lead time. This is particularly true in DJF, when several models have higher prediction skills than the MME.
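Because lead times recur throughout the discussion, the January-initialization lead convention stated above can be encoded explicitly; the following small sketch (with an illustrative helper name) reproduces the lead windows cited in the text.

```python
# Lead-month window of a target season for January-initialized hindcasts;
# lead month 0 is the initialization month itself (January of year 1).
SEASON_START = {"MAM": 2, "JJA": 5, "SON": 8, "DJF": 11}

def lead_months(season, prediction_year):
    start = SEASON_START[season] + 12 * (prediction_year - 1)
    return (start, start + 2)

assert lead_months("MAM", 1) == (2, 4)    # 2-4 months after initialization
assert lead_months("DJF", 1) == (11, 13)  # 11-13-month lead
assert lead_months("SON", 4) == (44, 46)  # the 44-46-month lead cited below
```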

Fig. 1. (a) Annual-mean global-mean (60°S–70°N) near-surface air temperature time series (°C) from ERA-Interim (black line) and CMIP5 MME (colored lines) of the first, second, third, and fourth prediction years. (b) ACC and MSSS of the GMT for annual-mean (gray), MAM (yellow), JJA (red), SON (green), and DJF (blue) seasons. Solid line and crosses indicate MME and individual hindcast runs, respectively. Closed circles indicate the MME values that are statistically significant at the 95% confidence level. (c) As in (b), but for the prediction skills calculated after removing the linear trend of GMT. (d),(e) As in (b),(c), but for the prediction skills of MME-H, HadCM3a, and HadCM3f.
In Fig. 1b, the MME and individual models have statistically significant ACCs in all seasons, at least up to the 4-yr lead time. Although not shown, useful ACC skills (above 0.5, the horizontal dashed line in Fig. 1b) are found for lead times of up to 9 years in most seasons except DJF. This is consistent with the results of Doblas-Reyes et al. (2013), who found similar GMT prediction skills using 4-yr-averaged values. The high ACCs mainly result from the global warming trend, as discussed in previous studies (Corti et al. 2012; Kim et al. 2012; Meehl et al. 2014). When the linear trend is removed, the GMT ACCs rapidly decrease and become statistically insignificant at the 95% confidence level within two or three years (Fig. 1c).
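Removing the linear trend here amounts to subtracting the least-squares fit from each time series before recomputing the ACC; in sketch form (with the acc() helper sketched in section 3):

```python
import numpy as np

def detrend(x):
    """Remove the least-squares linear trend from a 1D time series."""
    t = np.arange(x.size)
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

# e.g., detrended skill: acc(detrend(gmt_mme), detrend(gmt_obs))
```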
The ACC of the MME GMT strongly depends on the season. The highest skill is found in June–August (JJA) and September–November (SON), while the lowest skill is present in DJF (Fig. 1b). For example, the DJF prediction skill at the 11–13-month lead (i.e., the first prediction year) is similar to or even lower than the other seasons' prediction skills at the 44–46-month lead (i.e., SON in the fourth prediction year; Fig. 1b). The same is true when long-term trends are removed (Fig. 1c). This result indicates that the interannual prediction of DJF climate is systematically biased.
The MSSS, which takes into account both the ACC and the conditional bias [Eq. (1)], shows much lower skill scores and a wider intermodel spread than the ACC in all seasons (Fig. 1b). As with the ACCs, the MSSSs of GMT are lowest in DJF and highest in SON. For example, the MSSSs of SON GMT are significant at the 95% confidence level even in the fourth prediction year (44–46-month lead). In DJF, however, the MSSSs rapidly decrease with increasing lead time, and essentially no prediction skill is found at the 11–13-month lead (first prediction year in Fig. 1b). If the linear trend is removed, no significant MSSSs are found at all. This result confirms that the model biases depend on the season of interest.
As stated in section 3, the model successfully predicts both the sign and the magnitude of the anomaly if the ACC and MSSS are statistically significant and the ACC is greater than 0.5. For the annual-mean GMT, these conditions are satisfied only up to the second prediction year (Fig. 1b). This result indicates that, although the sign of the GMT anomalies is qualitatively predictable on a decadal time scale, their magnitude is predictable only on a seasonal-to-interannual time scale. If the long-term trend is removed, however, no significant prediction skill is found (Fig. 1c).
The prediction skills of the MME-H are presented in Figs. 1d and 1e. The general features are very similar to those shown in Figs. 1b and 1c (i.e., an extended ACC skill due to the long-term global warming trend and a relatively lower prediction skill in DJF than in other seasons). The annual-mean detrended GMT (gray lines in Fig. 1e) shows a slightly higher prediction skill than that of the MME. For seasonal-mean GMT, however, the MME-H shows a somewhat lower prediction skill than the MME. For instance, while the MME MSSS in JJA of the first prediction year is statistically significant at the 95% confidence level, no significant MSSS is found in the MME-H for the same prediction year. This may be partly caused by the lead time being two months longer in the MME-H (i.e., JJA of the first prediction year corresponds to a 5–7-month lead for the MME but a 7–9-month lead for the MME-H). A comparison between HadCM3a and HadCM3f reveals no systematic differences in either ACC or MSSS skills. These results indicate that GMT prediction skills are not highly sensitive to the initialization month (November vs January) or the initialization method (full-field vs anomaly initialization).
To identify the spatial structure of the model bias, the spatial distributions of the ACCs and MSSSs of MME TAS are shown in Fig. 2. Statistically significant ACCs (MSSSs) are shown by shading (black lines). For comparison with previous studies, the linear trend is not removed from the time series. As expected, prediction skills decrease with lead time, but not uniformly in space. For example, the MME TAS over the tropical Pacific is highly predictable at the 11–13-month lead (Fig. 2d). Long-lead prediction skill is also found over the North Atlantic Ocean south of Greenland and over the tropical Atlantic up to the third prediction year, which is related to the tripolar SST change at decadal time scales (Alvarez-Garcia et al. 2008). This is consistent with the long-lead predictability of North Atlantic SST and the related TAS, as documented in previous studies (Kim et al. 2012; Doblas-Reyes et al. 2013; Meehl et al. 2014; Ham et al. 2014). Although the detailed dynamical mechanism remains to be determined, one possible source of the long-lead prediction skill in the tropical Atlantic is the ocean condition of the subpolar gyre region of the North Atlantic, which is related to the Atlantic meridional overturning circulation (Dunstone et al. 2011; Ham et al. 2014; Meehl et al. 2014; H.-M. Kim et al. 2014).

Fig. 2. Shading and black lines indicate the spatial pattern of ACC (greater than 0.5) and MSSS (greater than 0.0) for MME TAS that are statistically significant at the 95% confidence level, respectively. The prediction targets are (a) MAM, (b) JJA, (c) SON, and (d) DJF seasons of the (left to right) first, second, and third prediction years.
Higher ACCs in the boreal summer and autumn are also evident in Fig. 2. Interestingly, significant ACCs and MSSSs are found over Europe and northeast Asia for more than 3 years in JJA (Fig. 2b), even though the low-frequency variability is not robust over these regions (see also Corti et al. 2012). Since these skills completely disappear in DJF (cf. Figs. 2b,d), the model prediction skill is clearly season dependent. The extended prediction skill shown in Fig. 2 can again be partly attributed to the long-term trend. Although not shown, the highest ACCs of detrended TAS appear in the central tropical Pacific for up to a 17–19-month lead, as documented in previous studies (e.g., Luo et al. 2008; Chikamoto et al. 2015).
To examine the sensitivity to the initialization month and method in more detail, a similar analysis is conducted for HadCM3a and HadCM3f (Fig. 3). In general, the spatial distribution of useful ACC skills resembles that in Fig. 2 quite well, and no significant differences between HadCM3a and HadCM3f are observed (cf. the blue and red contour lines). At short lead times (i.e., the 4–6-month lead), the full-field initialization (yellow shading; HadCM3f) tends to be more skillful than the anomaly initialization (light blue shading; HadCM3a), except at the equator. As the lead time increases, however, the difference between the two initializations becomes negligible. This result confirms that the TAS prediction skill is not strongly sensitive to the initialization month and method. Accordingly, most of the following analyses focus on the MME.

Fig. 3. ACCs of HadCM3f and HadCM3a TAS. Red and blue lines indicate the values that are statistically significant at the 95% confidence level for HadCM3f and HadCM3a, respectively. Yellow (light blue) shading indicates the regions where the ACCs of HadCM3f (HadCM3a) are larger than those of HadCM3a (HadCM3f) by more than 0.1. If the ACCs are comparable (within 0.1), gray shading is applied.
Figure 4 summarizes the regional and seasonal dependencies of the ACC prediction skill for MME TAS. Each color represents the season in which the highest ACC is found in the first, second, and third prediction years, and hatching indicates that the ACC in the selected season is statistically insignificant at the 95% confidence level. If the ACC prediction skill simply decreased with lead time, MAM (yellow) would be the best season in the first prediction year. This is partly true: over the oceans and equatorial regions, MAM (yellow) is indeed selected as the best season (Fig. 4a) as a result of the initialized SST. Over the high-latitude continents of the Northern Hemisphere, however, the highest ACCs are found in JJA (red) or SON (green). This may indicate that the initialization impact is rather weak over the Northern Hemisphere continents or that the initialization of land conditions (e.g., snow cover and soil moisture) at high latitudes is improperly treated. It is noteworthy that the TAS over the Southern Hemisphere's extratropical lands shows insignificant ACCs regardless of lead time, which is consistent with Fig. 2.

Fig. 4. The best season of ACC for MME TAS. Yellow, red, green, and blue regions indicate that the highest ACC occurred in the MAM, JJA, SON, and DJF season, respectively. The prediction years are the (a) first, (b) second, and (c) third years. Hatched regions signify that ACCs in the selected season are statistically insignificant at the 95% confidence level.
What controls the regional and seasonal dependencies of the TAS prediction skill? The MME of the individual models' errors and ensemble spreads for the first prediction year are shown in the left and middle columns of Fig. 5. The model error and ensemble spread, which quantify the models' prediction errors with respect to the observation and the members' spread about the ensemble mean (see section 3), respectively, show maximum values over the high-latitude continents of the Northern Hemisphere. They also exhibit a strong seasonality, with the largest model error and ensemble spread in DJF and the smallest in JJA, consistent with Figs. 1–4. This result indicates that the seasonal dependency of the GMT prediction skill is mainly caused by the model error over the high-latitude continents of the Northern Hemisphere. In fact, Fig. 2 shows no prediction skill over these regions in DJF.

Fig. 5. (left) MME of individual models' errors, (middle) ensemble spread of TAS, and (right) ratio between the two in the first prediction year for (a) MAM, (b) JJA, (c) SON, and (d) DJF.
The right column of Fig. 5 shows the individual models' mean spread-to-error ratios in the first prediction year. In most regions, particularly over the oceans, the models tend to be underdispersed. This is consistent with Ho et al. (2013), who noted that initialized models produce underdispersed SST forecasts at lead times shorter than two years. Interestingly, the models are overdispersed near tropical South America and the equatorial western Pacific, especially in DJF. The seasonal dependency of the spread-to-error ratio is relatively weak in comparison to those of the model error and ensemble spread.
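A sketch of how such dispersion diagnostics can be computed from an ensemble of hindcasts follows; the exact averaging conventions of section 3 may differ, and the names are illustrative.

```python
import numpy as np

def dispersion(members, obs):
    """members : (n_members, n_start_years, nlat, nlon); obs : (n_start_years, nlat, nlon).

    Returns the ensemble spread (member deviation about the ensemble mean),
    the model error (RMSE of the ensemble mean against observations), and
    their ratio; a ratio below one indicates an underdispersed ensemble.
    """
    ens_mean = members.mean(axis=0)
    spread = np.sqrt(np.mean((members - ens_mean) ** 2, axis=(0, 1)))
    error = np.sqrt(np.mean((ens_mean - obs) ** 2, axis=0))
    return spread, error, spread / error
```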
Figures 6 and 7 further illustrate the interannual variability and linear trend of TAS in the first prediction year, the two key factors in determining the ACCs and MSSSs. Here, the standard deviation of detrended TAS is used to quantify the interannual variability of TAS. Figure 6 shows that the observed interannual variability is dominant in two regions: the equatorial eastern Pacific and the high-latitude continents of the Northern Hemisphere (left column). The largest variability is observed in these regions particularly in DJF, presumably because of the strong variability of ENSO and the Arctic Oscillation. The MME variability, averaged from the five models' standard deviations, is also presented in Fig. 6 (middle column). For all seasons, the variability is severely underestimated in the Northern Hemisphere, with the largest underestimation in DJF (right column of Fig. 6). The only exception is the western to central equatorial Pacific, where the interannual variability is slightly overestimated in the model predictions. This result suggests that, while high-latitude climate variability is severely underestimated in the models, ENSO-related Pacific variability is somewhat overestimated in the western equatorial Pacific.

Fig. 6. (left) Standard deviation of detrended TAS in observations, (middle) in the first prediction year, and (right) the difference between the two for (a) MAM, (b) JJA, (c) SON, and (d) DJF.

Fig. 7. As in Fig. 6, but for the linear trend of TAS.
Figure 7 presents the linear trends of the observed and predicted TAS. The first-year predictions show significant warming trends of comparable magnitude over Eurasia and eastern Canada in most seasons. This is in contrast to the observed trends, which show a clear seasonality. Especially in DJF, the MME fails to reproduce the observed trends over the Northern Hemisphere high latitudes. For example, the strong warming trend over eastern Canada is severely underestimated in the MME prediction, and over the Eurasian continent, the predicted TAS trend exhibits a sign opposite to the observation.
These biases in interannual variability (Fig. 6d) and long-term trends (Fig. 7d) explain the poor prediction skills for Eurasian and eastern Canadian TAS (and GMT) during DJF (Fig. 2d). Although not shown, we also estimated the individual models' biases. All five models evaluated in this study consistently fail to reproduce the Eurasian cooling trend and the strong interannual variability over eastern Canada during DJF, indicating that these biases are not model dependent. As hinted in Fig. 2, the biases in DJF (11–13-month lead) are worse than those in MAM of the second prediction year (14–16-month lead).
Why do the CMIP5 decadal hindcasts fail to predict the Eurasian and eastern Canadian climate variability? Although further analyses are needed, one possible factor is the uncertainty in land surface and cryospheric processes and their coupling with the atmosphere in the models (e.g., Outten et al. 2013; Materia et al. 2014). Land surface and sea ice initializations are typically less reliable than ocean initializations because of limited observations. The coupling between high-latitude surface conditions and atmospheric processes is also not well understood, although it may play a crucial role in both regional and global atmospheric circulations (Koster et al. 2004; Overland et al. 2011; Lim et al. 2012). For example, the Arctic sea ice loss observed in recent decades is known to cool the surface air over Eurasia during the boreal winter. This so-called warm Arctic–cold continent pattern (Honda et al. 2009; Overland et al. 2011) is not well reproduced by climate models (e.g., Screen et al. 2013), and only selected models are able to capture it (e.g., B.-M. Kim et al. 2014). Both the Arctic sea ice loss and the related SST distribution over the Arctic Ocean modulate atmospheric states in high-latitude regions (e.g., Jun et al. 2014). Further studies are needed to identify the exact cause(s) of the models' poor prediction skills.
b. ENSO
As addressed earlier, the equatorial Pacific TAS is reasonably well predicted at the 11–13-month lead, partly because of successful ENSO prediction. It has been reported that ENSO can be predicted beyond 9 months (Jin et al. 2008; Luo et al. 2016) and even up to 2 years in a particular model (Luo et al. 2008). The CMIP5 MME shows successful ENSO prediction with at least a 1-yr lead time. Figure 8 presents the observed and predicted JFM-mean Niño-3.4 indices for the first (0–2-month lead; Fig. 8a), second (12–14-month lead; Fig. 8b), and third (24–26-month lead; Fig. 8c) prediction years. The Niño-3.4 index is almost perfectly predicted in the first few months, as the models are initialized in January (Fig. 8a), although the interannual variability is slightly overestimated in most models, consistent with Fig. 6a (see the equatorial Pacific). Even in the second prediction year, the MME qualitatively predicts ENSO, with an ACC of about 0.64 (Fig. 8b). This prediction skill, however, disappears by the third prediction year (Fig. 8c).

Fig. 8. (a) Niño-3.4 index time series during JFM seasons in the first prediction year (i.e., 0–2-month lead). Black and red lines represent the ERSSTv3b and CMIP5 MME, respectively. Yellow marks denote the individual hindcast runs. (b),(c) As in (a), but for the (b) second and (c) third prediction years (i.e., 12–14- and 24–26-month leads).
It is worth noting that the Niño-3.4 index shows a positive trend in the extended predictions (Fig. 8c). Although not shown, this El Niño–like warming increases with lead time regardless of the season. This is opposite to the observations [i.e., the La Niña–like cooling trend during the past two decades (e.g., McPhaden et al. 2011; Luo et al. 2012)]. It may be related to long-term trend signals in the initial conditions, which can lead to a skillful trend at short lead times but not at multiyear time scales (e.g., Luo et al. 2011).
The extended ENSO prediction skill (e.g., Fig. 8b) can affect the TAS not only in the equatorial Pacific but also in remote locations. The observed spatial pattern of the lag correlation between the seasonal-mean TAS and the previous winter's ENSO index is shown in the left column of Fig. 9. The observed ENSO index is defined as the 3-month-averaged Niño-3.4 SST index during JFM, as shown by the black lines in Fig. 8. An El Niño event in the previous winter leads to warm anomalies over the entire equatorial region and cold anomalies over the subtropical Pacific and southern United States in the following spring (Fig. 9a). This lag correlation disappears in most regions by summer. The ENSO-related TAS changes in the CMIP5 MME are illustrated in the right column of Fig. 9. The ENSO-related TAS variability pattern is well reproduced, but the correlation coefficients are too large. Unlike in the observations, the teleconnection pattern that arches from the equatorial Pacific to the subtropical Pacific and on to Alaska or Antarctica appears even at an 8-month lag. This overly persistent teleconnection is partly caused by the overestimated SST variability over the equatorial Pacific (e.g., right column of Fig. 6a) and the exaggerated persistence in the models. This result suggests that the high TAS prediction skills over the equatorial region at a 1-yr lead (see Fig. 2) partly originate from the lagged ENSO impact. Significant ACCs at low latitudes and in some midlatitude regions in the second prediction year are likely due to the combined effects of the extended ENSO prediction (over a 1-yr lag) and the lagged TAS response (an additional one- to two-season lag).
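A sketch of the lag-correlation map underlying Fig. 9, assuming aligned yearly arrays (the function name and array layout are illustrative):

```python
import numpy as np

def lag_correlation_map(nino34_jfm, tas_season):
    """Correlate the JFM Nino-3.4 index with a later season's TAS field.

    nino34_jfm : (n_years,); tas_season : (n_years, nlat, nlon), e.g. the
    following JJA, which corresponds to a 5-month lag relative to JFM.
    """
    x = nino34_jfm - nino34_jfm.mean()
    y = tas_season - tas_season.mean(axis=0)
    num = np.einsum("t,tij->ij", x, y)          # covariance numerator per grid point
    den = np.sqrt((x**2).sum() * (y**2).sum(axis=0))
    return num / den
```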

Fig. 9. (left) Spatial pattern of lag correlation between the JFM Niño-3.4 index and seasonal-mean TAS in observations, and (right) in the first prediction year. The target seasons are (a) MAM (2-month lag), (b) JJA (5-month lag), and (c) SON (8-month lag).
The ENSO prediction skill is more quantitatively evaluated in Fig. 10 using ACCs and MSSSs. A 3-month running mean is applied to the Niño-3.4 index, so "0001JAN" on the x axis indicates the prediction skill for the JFM-mean Niño-3.4 index (i.e., 0–2-month lead). As in Fig. 1, MME-H, HadCM3a, and HadCM3f, which are initialized in November, are presented separately in Figs. 10c and 10d. An extended prediction skill is found in the ACC (Fig. 10a). Somewhat surprisingly, two models (HadCM3a and MIROC5) and the MME-H show statistically significant ACCs in the third prediction year (0003JAN; 24–26-month lead), although the values are lower than 0.5. In fact, no model shows an ACC greater than 0.5 in the third prediction year. Only CanCM4 and the MME show statistically significant ACCs greater than 0.5 in the second prediction year (0002APR; 15–17-month lead). A similar result is found for the MSSS (Fig. 10b): only the MME shows a statistically significant value at the 95% confidence level in the second prediction year (0002APR), and none of the individual models shows significant MSSS skill beyond the first few months after initialization.
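The running mean and its labeling convention, in sketch form (assuming a monthly Niño-3.4 series starting at the January initialization):

```python
import numpy as np

def running_3month(x):
    """3-month running mean labeled by the first month of the window,
    so the first value of a January-initialized series is the JFM mean
    ('0001JAN', i.e., 0-2-month lead)."""
    return np.convolve(x, np.ones(3) / 3.0, mode="valid")
```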

Fig. 10. (a) ACC and (b) MSSS of the 3-month running-averaged Niño-3.4 index. Closed marks indicate the values that are statistically significant at the 95% confidence level. (c),(d) As in (a),(b), but for the prediction skills of MME-H, HadCM3a, and HadCM3f.
Figure 10 also indicates that the ENSO prediction skill does not decay linearly in time. In all models, both ACCs and MSSSs decrease in the first spring and summer and then increase in the winter. This seasonality is often called the spring predictability barrier of ENSO and is caused by the combined effects of the seasonal cycles of SST and of the ENSO growth rate (e.g., Jin et al. 2008). In the boreal spring, the mean SST over the central and eastern equatorial Pacific is higher than in other seasons, which can degrade the prediction skill through internal error growth during the boreal spring season.
The above results indicate that the CMIP5 MME can predict the boreal winter ENSO index more than one year in advance when initialized in January. This is largely consistent with the state-of-the-art seasonal climate prediction systems documented in the literature. For example, the ACC of the Niño-3.4 index in the ECMWF seasonal forecast system 3 reached about 0.6 at the 13-month lead for the 1981–2007 period (Stockdale et al. 2011). Based on the five-model MME for the same period, the CMIP5 decadal hindcasts in this study show ACC skills of approximately 0.66 in the first DJF (11–13-month lead) (Fig. 10a). Since the long-lead prediction skill of the extratropical climate partly comes from ENSO (Lee et al. 2011; Jeong et al. 2012), this result has great potential for advancing extratropical climate prediction on the interannual time scale.
Does the ENSO prediction skill depend on the initialization month or method? Figure 10d shows that the MSSS of the MME-H is not significant in the second prediction year (0002JAN); although the ACCs are significant, they are lower than 0.5 (Fig. 10c). These results indicate that the ENSO prediction skill of the MME-H is somewhat lower than that of the MME. However, given the small number of models (i.e., only two), it cannot be conclusively stated that models initialized in January have better prediction skill than those initialized in November. A further comparison between HadCM3a and HadCM3f reveals nonnegligible differences. For instance, the ENSO prediction skills decrease more slowly in HadCM3a (orange lines in Figs. 10c and 10d) than in HadCM3f (yellow lines). In particular, HadCM3a shows a statistically significant MSSS at the 3–5-month lead (0001FEB), while no significant MSSS is found for HadCM3f (Fig. 10d). However, the ENSO prediction skills for the following winter (0002JAN) are about the same in the two experiments. These results may indicate that, even if the prediction skill is sensitive to the initialization method on a time scale of a few months, it is not on a time scale of a year or longer. This is consistent with Smith et al. (2013), who found a negligible difference between the two initialization methods for long-lead predictions.
5. Summary and discussion
The present study attempts to evaluate the seasonal-to-interannual prediction skills for near-surface air temperature (TAS) and ENSO of the CMIP5 decadal hindcast/forecast experiments that have been initialized every November or January since 1980. The prediction skills are evaluated using the ACC and MSSS. Both metrics are applied to the individual models and the MME.
The global-mean TAS (GMT) exhibits significant ACCs for up to a 4-yr lead in most seasons because of the long-term global warming trend. An exception is DJF: regardless of lead time, the GMT ACCs in DJF are lower than those in other seasons. For instance, the DJF ACC in the first prediction year (11–13-month lead) is lower than the SON ACC in the fourth prediction year (44–46-month lead). Likewise, an evaluation with the MSSS shows no DJF prediction skill even for the first prediction year (11–13-month lead). The poor prediction skill in winter is partly attributed to unrealistic TAS predictions over the high-latitude continents of the Northern Hemisphere. In particular, all five models used in this study fail to reproduce the observed long-term trend and interannual variability of TAS over Eurasia and eastern Canada, resulting in large model errors and ensemble spreads in these regions. It remains unclear, however, what drives these biases over the high-latitude continents.
The ENSO prediction skills are reasonably good for more than one year. The prediction skills decrease in the boreal spring of the first prediction year, a behavior commonly known as the spring predictability barrier of ENSO, but increase again in the following cold season, presumably because of the extended memory of the subsurface signal (e.g., Luo et al. 2008). Both the ACCs and MSSSs of the MME Niño-3.4 index are statistically significant from SON of the first prediction year (i.e., 8–10-month lead; ACC = 0.66) to MAM of the second prediction year (i.e., 14–16-month lead; ACC = 0.59). This result confirms recent studies that document extended ENSO prediction skill with at least a 1-yr lead in state-of-the-art climate prediction systems (Luo et al. 2008; Stockdale et al. 2011).
The sensitivity of the prediction skill to the initialization month and method is addressed using a limited dataset. Specifically, the MME initialized in January is compared with the MME-H initialized in November. No significant difference is found in the GMT prediction skill. Although the MME shows a more extended ENSO prediction skill than the MME-H, this difference is not conclusive because of the different sample sizes. In terms of the initialization method, full-field initialization (HadCM3f) shows enhanced short-term TAS prediction skill compared to anomaly initialization (HadCM3a), except at the equator. However, this difference disappears as the lead time increases, and no significant difference is found at interannual time scales.
Although this study documents the seasonal-to-interannual prediction skills of the CMIP5 decadal hindcast/forecast runs, the detailed dynamical and physical mechanisms of the extended prediction skills are not discussed here and deserve further study. In addition, the sensitivity to the initialization month and method is not fully explored because of the limited datasets. None of the CMIP5 decadal hindcast experiments is initialized in varying months, preventing a sensitivity test to the initialization month within the same model. It is highly plausible, however, that models initialized in different seasons (e.g., summer instead of winter) would have different GMT and ENSO prediction skills (Jin et al. 2008; Wang et al. 2009). Further studies using multiple models initialized in varying months are needed.
Acknowledgments. The authors sincerely thank the reviewers for their thoughtful reviews. We acknowledge the World Climate Research Programme's Working Group on Coupled Modelling, which is responsible for CMIP, and we thank the climate modeling groups (listed in Table 1 of this paper) for producing and making available their model output. This work was funded by the Korea Meteorological Administration Research and Development Program under Grant KMIPA 2015-2100.
REFERENCES
Alvarez-Garcia, F., M. Latif, and A. Biastoch, 2008: On multidecadal and quasi-decadal North Atlantic variability. J. Climate, 21, 3433–3452, doi:10.1175/2007JCLI1800.1.
Arora, V. K., and Coauthors, 2011: Carbon emission limits required to satisfy future representative concentration pathways of greenhouse gases. Geophys. Res. Lett., 38, L05805, doi:10.1029/2010GL046270.
Bellucci, A., and Coauthors, 2015: An assessment of a multi-model ensemble of decadal climate predictions. Climate Dyn., 44, 2787–2806, doi:10.1007/s00382-014-2164-y.
Caron, L.-P., C. G. Jones, and F. Doblas-Reyes, 2014: Multi-year prediction skill of Atlantic hurricane activity in CMIP5 decadal hindcasts. Climate Dyn., 42, 2675–2690, doi:10.1007/s00382-013-1773-1.
Chikamoto, Y., and Coauthors, 2015: Skilful multi-year predictions of tropical trans-basin climate variability. Nat. Commun., 6, 6869, doi:10.1038/ncomms7869.
CLIVAR, 2011: Data and bias correction for decadal climate prediction. International CLIVAR Project Office Publication Series 150, 5 pp.
Corti, S., A. Weisheimer, T. N. Palmer, F. J. Doblas-Reyes, and L. Magnusson, 2012: Reliability of decadal predictions. Geophys. Res. Lett., 39, L21712, doi:10.1029/2012GL053354.
Dee, D. P., and S. Uppala, 2009: Variational bias correction of satellite radiance data in the ERA-Interim reanalysis. Quart. J. Roy. Meteor. Soc., 135, 1830–1841, doi:10.1002/qj.493.
DelSole, T., and M. K. Tippett, 2014: Comparing forecast skill. Mon. Wea. Rev., 142, 4658–4678, doi:10.1175/MWR-D-14-00045.1.
Delworth, T. L., and Coauthors, 2006: GFDL's CM2 global coupled climate models. Part I: Formulation and simulation characteristics. J. Climate, 19, 643–674, doi:10.1175/JCLI3629.1.
Doblas-Reyes, F. J., and Coauthors, 2013: Initialized near-term regional climate change prediction. Nat. Commun., 4, 1715, doi:10.1038/ncomms2704.
Dunstone, N. J., D. M. Smith, and E. Eade, 2011: Multi-year predictability of the tropical Atlantic atmosphere driven by the high latitude North Atlantic Ocean. Geophys. Res. Lett., 38, L14701, doi:10.1029/2011GL047949.
Goddard, L., and Coauthors, 2013: A verification framework for interannual-to-decadal predictions experiments. Climate Dyn., 40, 245–272, doi:10.1007/s00382-012-1481-2.
Ham, Y.-G., M. M. Rienecker, M. J. Suarez, Y. Vikhliaev, B. Zhao, J. Marshak, G. Vernieres, and S. D. Schubert, 2014: Decadal prediction skill in the GEOS-5 forecast system. Climate Dyn., 42, 1–20, doi:10.1007/s00382-013-1858-x.
Ho, C. K., E. Hawkins, L. Shaffrey, J. Bröcker, L. Hermanson, J. M. Murphy, D. M. Smith, and R. Eade, 2013: Examining reliability of seasonal to decadal sea surface temperature forecasts: The role of ensemble dispersion. Geophys. Res. Lett., 40, 5770–5775, doi:10.1002/2013GL057630.
Honda, M., J. Inoue, and S. Yamane, 2009: Influence of low Arctic sea-ice minima on anomalously cold Eurasian winters. Geophys. Res. Lett., 36, L08707, doi:10.1029/2008GL037079.
IPCC, 2013: Climate Change 2013: The Physical Science Basis. Cambridge University Press, 1535 pp., doi:10.1017/CBO9781107415324.
Jeong, H.-I., and Coauthors, 2012: Assessment of the APCC coupled MME suite in predicting the distinctive climate impacts of two flavors of ENSO during boreal winter. Climate Dyn., 39, 475–493, doi:10.1007/s00382-012-1359-3.
Jin, E. K., and Coauthors, 2008: Current status of ENSO prediction skill in coupled ocean–atmosphere models. Climate Dyn., 31, 647–664, doi:10.1007/s00382-008-0397-3.
Jun, S.-Y., C.-H. Ho, B.-M. Kim, and J.-H. Jeong, 2014: Sensitivity of Arctic warming to sea surface temperature distribution over melted sea-ice region in atmospheric general circulation model experiments. Climate Dyn., 42, 941–955, doi:10.1007/s00382-013-1897-3.
Kim, B.-M., S.-W. Son, S.-K. Min, J.-H. Jeong, S.-J. Kim, X. Zhang, T. Shim, and J.-H. Yoon, 2014: Weakening of the stratospheric polar vortex by Arctic sea-ice loss. Nat. Commun., 5, 4646, doi:10.1038/ncomms5646.
Kim, H.-M., P. J. Webster, and J. A. Curry, 2012: Evaluation of short-term climate change prediction in multi-model CMIP5 decadal hindcasts. Geophys. Res. Lett., 39, L10701, doi:10.1029/2012GL051644.
Kim, H.-M., Y.-G. Ham, and A. A. Scaife, 2014: Improvement of initialized decadal predictions over the North Pacific Ocean by systematic anomaly pattern correction. J. Climate, 27, 5148–5162, doi:10.1175/JCLI-D-13-00519.1.
Koster, R. D., and Coauthors, 2004: Regions of strong coupling between soil moisture and precipitation. Science, 305, 1138–1140, doi:10.1126/science.1100217.
Lee, J.-Y., B. Wang, Q. Ding, K.-J. Ha, J.-B. Ahn, A. Kumar, B. Stern, and O. Alves, 2011: How predictable is the Northern Hemisphere summer upper-tropospheric circulation? Climate Dyn., 37, 1189–1203, doi:10.1007/s00382-010-0909-9.
Lim, Y.-K., Y.-G. Ham, J.-H. Jeong, and J.-S. Kug, 2012: Improvement in simulation of Eurasian winter climate variability with a realistic Arctic sea ice condition in an atmospheric GCM. Environ. Res. Lett., 7, 044041, doi:10.1088/1748-9326/7/4/044041.
Luo, J.-J., S. Masson, S. K. Behera, and T. Yamagata, 2008: Extended ENSO predictions using a fully coupled ocean–atmosphere model. J. Climate, 21, 84–93, doi:10.1175/2007JCLI1412.1.
Luo, J.-J., S. K. Behera, Y. Masumoto, and T. Yamagata, 2011: Impact of global ocean surface warming on seasonal-to-interannual climate prediction. J. Climate, 24, 1626–1646, doi:10.1175/2010JCLI3645.1.
Luo, J.-J., W. Sasaki, and Y. Masumoto, 2012: Indian Ocean warming modulates Pacific climate change. Proc. Natl. Acad. Sci. USA, 109, 18 701–18 706, doi:10.1073/pnas.1210239109.
Luo, J.-J., C. Yuan, W. Sasaki, S. K. Behera, Y. Masumoto, T. Yamagata, J.-Y. Lee, and S. Masson, 2016: Current status of intraseasonal–seasonal-to-interannual prediction of the Indo-Pacific climate. Indo-Pacific Climate Variability and Predictability, S. K. Behera and T. Yamagata, Eds., World Scientific Series on Asia-Pacific Weather and Climate, Vol. 7, World Scientific, 324 pp.
Matei, D., H. Pohlmann, J. Jungclaus, W. Müller, H. Haak, and J. Marotzke, 2012: Two tales of initializing decadal climate prediction experiments with the ECHAM5/MPI-OM model. J. Climate, 25, 8502–8523, doi:10.1175/JCLI-D-11-00633.1.
Materia, S., A. Borrelli, A. Bellucci, A. Alessandri, P. Di Petro, P. Athanasiadis, A. Navarra, and S. Gualdi, 2014: Impact of atmosphere and land surface initial conditions on seasonal forecasts of global surface temperature. J. Climate, 27, 9253–9271, doi:10.1175/JCLI-D-14-00163.1.
McPhaden, M. J., T. Lee, and D. McClurg, 2011: El Niño and its relationship to changing background conditions in the tropical Pacific Ocean. Geophys. Res. Lett., 38, L15709, doi:10.1029/2011GL048275.
Meehl, G. A., and H. Teng, 2014: CMIP5 multi-model hindcasts for the mid-1970s shift and early 2000s hiatus and predictions for 2016–2035. Geophys. Res. Lett., 41, 1711–1716, doi:10.1002/2014GL059256.
Meehl, G. A., and Coauthors, 2014: Decadal climate prediction: An update from the trenches. Bull. Amer. Meteor. Soc., 95, 243–267, doi:10.1175/BAMS-D-12-00241.1.
Murphy, A. H., 1988: Skill scores based on the mean square error and their relationships to the correlation coefficient. Mon. Wea. Rev., 116, 2417–2424, doi:10.1175/1520-0493(1988)116<2417:SSBOTM>2.0.CO;2.
Outten, S. D., R. Davy, and I. Esau, 2013: Eurasian winter cooling: Intercomparison of reanalyses and CMIP5 data sets. Atmos. Oceanic Sci. Lett., 6, 324–331, doi:10.1080/16742834.2013.11447102.
Overland, J. E., K. R. Wood, and M. Wang, 2011: Warm Arctic–cold continents: Climate impacts of the newly open Arctic Sea. Polar Res., 30, 15 787, doi:10.3402/polar.v30i0.15787.
Palmer, T., and Coauthors, 2004: Development of a European Multimodel Ensemble System for Seasonal-to-Interannual Prediction (DEMETER). Bull. Amer. Meteor. Soc., 85, 853–872, doi:10.1175/BAMS-85-6-853.
Screen, J. A., I. Simmonds, C. Deser, and R. Tomas, 2013: The atmospheric response to three decades of observed Arctic sea ice loss. J. Climate, 26, 1230–1248, doi:10.1175/JCLI-D-12-00063.1.
Smith, D. M., S. Cusack, A. W. Colman, C. K. Folland, G. R. Harris, and J. M. Murphy, 2007: Improved surface temperature prediction for the coming decade from a global climate model. Science, 317, 796–799, doi:10.1126/science.1139540.
Smith, D. M., R. Eade, N. J. Dunstone, D. Fereday, J. M. Murphy, H. Pohlmann, and A. A. Scaife, 2010: Skillful multi-year predictions of Atlantic hurricane frequency. Nat. Geosci., 3, 846–849, doi:10.1038/ngeo1004.
Smith, D. M., R. Eade, and H. Pohlmann, 2013: A comparison of full-field and anomaly initialization for seasonal to decadal climate prediction. Climate Dyn., 41, 3325–3338, doi:10.1007/s00382-013-1683-2.
Smith, T. M., R. W. Reynolds, T. C. Peterson, and J. Lawrimore, 2008: Improvements to NOAA's historical merged land–ocean surface temperature analysis (1880–2006). J. Climate, 21, 2283–2296, doi:10.1175/2007JCLI2100.1.
Stockdale, T. N., and Coauthors, 2011: ECMWF seasonal forecast system 3 and its prediction of sea surface temperature. Climate Dyn., 37, 455–471, doi:10.1007/s00382-010-0947-3.
Tatebe, H., and Coauthors, 2012: The initialization of the MIROC climate models with hydrographic data assimilation for decadal prediction. J. Meteor. Soc. Japan, 90A, 275–294, doi:10.2151/jmsj.2012-A14.
Taylor, K. E., R. J. Stouffer, and G. A. Meehl, 2012: An overview of CMIP5 and the experiment design. Bull. Amer. Meteor. Soc., 93, 485–498, doi:10.1175/BAMS-D-11-00094.1.
van Oldenborgh, G. J., F. Doblas-Reyes, B. Wouters, and W. Hazeleger, 2012: Decadal prediction skill in a multi-model ensemble. Climate Dyn., 38, 1263–1280, doi:10.1007/s00382-012-1313-4.
Wang, B., and Coauthors, 2009: Advance and prospectus of seasonal prediction: Assessment of the APCC/CliPAS 14-model ensemble retrospective seasonal prediction (1980–2004). Climate Dyn., 33, 93–117, doi:10.1007/s00382-008-0460-0.
Weisheimer, A., and Coauthors, 2009: ENSEMBLES: A new multi-model ensemble for seasonal-to-annual predictions—Skill and progress beyond DEMETER in forecasting tropical Pacific SSTs. Geophys. Res. Lett., 36, L21711, doi:10.1029/2009GL040896.
Wu, T., and Coauthors, 2014: An overview of BCC climate system model development and application for climate change studies. J. Meteor. Res., 28, 34–56.
Yoo, J.-H., and I.-S. Kang, 2005: Theoretical examination of a multi-model composite for seasonal prediction. Geophys. Res. Lett., 32, L15711, doi:10.1029/2005GL023513.