## 1. Introduction

Prediction of decadal climate variability has become one of the major challenges in the climate science community because of its large socioeconomic impacts across the globe. While climate change projection focuses on reproducing the long-term trend, decadal prediction aims to predict the internally generated low-frequency variability superimposed on the externally forced climate change, while perhaps also correcting forced signals through initialization (Smith et al. 2007; Meehl et al. 2009; Smith et al. 2012). To assess the current status of decadal climate prediction in climate models, coordinated near-term climate prediction (or decadal prediction) experiments have been carried out under phase 5 of the Coupled Model Intercomparison Project (CMIP5; Taylor et al. 2012).

Improvement of decadal prediction through initialization has been most clearly found over the North Atlantic Ocean, resulting in prediction skill extended out to the multiyear horizon owing to decadal fluctuations of the Atlantic meridional overturning circulation (AMOC) (Keenlyside et al. 2008; Smith et al. 2010; van Oldenborgh et al. 2012; Doblas-Reyes et al. 2013; Goddard et al. 2013; Lienert and Doblas-Reyes 2013; Meehl et al. 2014; Yang et al. 2013; Ham et al. 2014). Over the North Pacific, some signs of improved prediction skill through initialization have been found in association with the Pacific decadal oscillation (PDO; Mantua et al. 1997) or the interdecadal Pacific oscillation (IPO; Keenlyside et al. 2008; Smith et al. 2010; Meehl et al. 2010; van Oldenborgh et al. 2012; Meehl and Teng 2012). Mochizuki et al. (2012) and Chikamoto et al. (2012) demonstrated that the models' ability to follow the subsurface temperature evolution in the North Pacific increases through initialization. However, decadal prediction of the North Pacific appears to be far less skillful than that of the North Atlantic, and much uncertainty remains about the mechanisms of decadal variability in the Pacific Ocean (Chikamoto et al. 2012; Guemas et al. 2012; Kim et al. 2012; Doblas-Reyes et al. 2013; Lienert and Doblas-Reyes 2013). Therefore, decadal prediction of North Pacific climate variability remains a challenge.

Decadal prediction using climate models can be improved by realistic representation of processes and feedbacks in models, following better understanding of the physical processes related to decadal variability, and by optimal assimilation methods that provide high-quality initial conditions. However, forecast error is still inevitable, as small errors in the climate model or initial conditions grow and eventually destroy forecast skill. Therefore, efforts have been made to reduce model errors through postprocessing of decadal hindcasts. A multimodel ensemble (MME) is a useful approach to reduce model uncertainty and to enhance decadal prediction skill under limited computational resources (Chikamoto et al. 2012; Kim et al. 2012; Smith et al. 2012; Doblas-Reyes et al. 2013). However, this method is not effective when most climate models suffer from common systematic errors. This raises the need for further correction of systematic errors in the predicted anomalies, an approach that has been successfully applied in seasonal prediction (Kang et al. 2004; Kang and Shukla 2006; Kim et al. 2008; Kug et al. 2008).

In this study, we attempt to correct the systematic errors in decadal hindcast anomalies, especially over the North Pacific Ocean where prediction skill is relatively low. Many previous studies used multimodel decadal hindcasts produced following the core experimental set (Taylor et al. 2012), where predictions start every five years, resulting in only 10 predictions during the period from 1961 to 2006. In this study, we assess hindcasts from additional decadal hindcast experiments initialized every year from 1961 to 2008, giving a total of 48 start dates.

Section 2 introduces details of hindcasts and observational data. Section 3 describes the decadal prediction skill and systematic errors in predicted anomalies. The skills of predictions before and after the statistical error correction are examined in section 4. A summary and discussion are provided in section 5.

## 2. Decadal hindcasts and observational data

We assess the CMIP5 decadal hindcasts conducted by six modeling centers: the UK Met Office decadal prediction system based on the Hadley Centre coupled global climate model, version 3 (HadCM3); the Geophysical Fluid Dynamics Laboratory Climate Model, version 2.1 (GFDL CM2.1); the Model for Interdisciplinary Research on Climate, version 5 (MIROC5); the Fourth Generation Canadian Coupled Global Climate Model (CanCM4); the Goddard Earth Observing System, version 5 (GEOS5); and the Max Planck Institute Earth System Model, low resolution (MPI-ESM-LR, hereafter MPI-LR). Details of each model's experimental configuration can be found in Meehl et al. (2014). The HadCM3, GFDL CM2.1, and CanCM4 experiments are based on 10-member ensembles, while MIROC5 has six members and GEOS5 and MPI-LR have three members each. The hindcasts consist of ensembles of 10-yr retrospective predictions initialized every year from 1960/61 to 2007/08. More than six modeling centers have performed decadal hindcasts with yearly start dates, but only the hindcasts from these six centers constituted complete 48-yr sets at the time of writing. The initialization date varies from November to the following January depending on the forecast system; in this study, the first forecast year starts in the first January of each hindcast. The GEOS5 predictions initialized from 1982 to 1984 are excluded because of technical problems (Ham et al. 2014). The annual mean surface temperature hindcasts are compared with surface air temperature observations from the Hadley Centre Climatic Research Unit temperature dataset, version 4 (HadCRUT4; Morice et al. 2012). All hindcast data are interpolated to the horizontal resolution of the observations, 5° longitude by 5° latitude.

Decadal hindcasts made by various groups are typically bias-adjusted by removing a time-independent climatological drift away from the initial state. The bias adjustment is performed in this conventional sense prior to the data analysis (i.e., before removal of the trend and application of the anomaly error correction). In both the observations and the hindcasts, anomalies are calculated by subtracting climatological means for the entire period from 1961 to 2008, following the method recommended by the International CLIVAR Project Office (ICPO 2011). The model forecast anomaly is calculated as

$$X'_{j\tau} = X_{j\tau} - \frac{1}{n}\sum_{j=1}^{n} X_{j\tau},$$

where *j* is the starting year (*j* = 1, 2, …, *n*; *n* = 48) and *τ* is the forecast lead time (*τ* = 1, 2, …, 10). The ensemble mean is the equally weighted average of the ensemble members in each hindcast. Temporal smoothing of the hindcasts follows Goddard et al. (2013), using 2–5- and 6–9-yr averages of forecast lead time. The verification period differs with the averaged lead time: forecasts with initial years 1961–2008 (*n* = 48) are verified for the 2–5-yr average, and those with initial years 1961–2004 (*n* = 44) for the 6–9-yr average. A 4-yr running average is applied to the observed anomalies. The linear trend is calculated at each forecast lead time and then subtracted from the anomalies. We compare the prediction skill of the initialized hindcasts with that of persistence, a commonly used statistical forecast that assumes future conditions will be the same as past conditions. For example, to calculate the persistence prediction skill of SST anomalies for the 2–5-yr averaged lead time, the detrended SST anomalies at the 2–5-yr averaged lead time are compared with the detrended SST anomaly of the initial year (year 0) over the entire period.
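The lead-time-dependent anomaly calculation above can be sketched in a few lines; the following is an illustrative numpy implementation of the ICPO-style drift removal, with array shapes and variable names chosen for the example rather than taken from the paper's processing code.

```python
import numpy as np

def forecast_anomaly(hindcast):
    """Remove the lead-time-dependent model climatology (ICPO-style method).

    hindcast: array of shape (n_starts, n_leads), ensemble-mean forecasts.
    Returns X'_{j,tau} = X_{j,tau} - mean_j X_{j,tau}, i.e. the
    climatological drift at each lead time tau is subtracted separately.
    """
    hindcast = np.asarray(hindcast, dtype=float)
    lead_climatology = hindcast.mean(axis=0)   # average over start dates j
    return hindcast - lead_climatology

# Toy example: a drift of +0.1 K per lead year is removed entirely.
rng = np.random.default_rng(0)
drift = 0.1 * np.arange(1, 11)                 # grows with lead time
raw = rng.normal(0.0, 0.2, size=(48, 10)) + drift
anom = forecast_anomaly(raw)
print(np.allclose(anom.mean(axis=0), 0.0))     # True: drift removed at each lead
```

Because the climatology is computed per lead time, any systematic drift away from the initial state, however it varies with lead, is removed by construction.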

## 3. Prediction skill and systematic errors

### a. Impact of the linear trend on decadal prediction skill

Relatively high prediction skill out to a decade and beyond in surface temperature appears over regions where external forcing dominates, indicating that a large portion of the decadal prediction skill is due to the long-term trend (Keenlyside et al. 2008; Meehl et al. 2009; Pohlmann et al. 2009; Smith et al. 2010; Chikamoto et al. 2012; Kim et al. 2012; Mochizuki et al. 2012; van Oldenborgh et al. 2012; Goddard et al. 2013; Lienert and Doblas-Reyes 2013). In decadal predictions, the trend of global mean surface temperature tends to be stronger than the observed trend, and this discrepancy increases with forecast lead time, resulting in a warm bias (Smith et al. 2007; Kim et al. 2012; van Oldenborgh et al. 2012). Figure 1 directly compares the modeled (MME) and observed linear trends. All MME results in the present analysis are based on the three best-performing models, which have relatively high prediction skill over the North Pacific Ocean (HadCM3, GFDL CM2.1, and MIROC5). The trend is calculated from the 4-yr moving-average temperature anomaly in the observations and from the 2–5-yr averaged predicted temperature anomaly in the MME. The observed temperatures in the central and subtropical North Pacific do not show strong linear trends, while the predicted warming trend is apparent over the entire globe. The spatial pattern of the trend could differ slightly among observational datasets.

To examine prediction skill beyond the global warming signal, we remove the linear trend and investigate the variability around it. In both the observations and the hindcasts, the long-term linear trend at each grid point is removed so that it does not dominate the skill estimates. The detrended 2–5- and 6–9-yr averaged surface temperature anomalies are compared with the observed detrended 4-yr moving-average surface temperature anomaly. Prediction skill is measured in terms of anomaly correlation coefficients and root-mean-square error (RMSE), both calculated from the ensemble mean of each model's hindcasts. The pattern of the correlation coefficient of total skill with the linear trend included (Fig. 1c) is similar to the pattern of the linear trend in the observations (Fig. 1a), indicating that a large part of the prediction skill results from the linear trend. The prediction skill without the trend (Fig. 1d) differs from the total skill: the highly predictable areas appear near the centers of action associated with decadal climate variability in the major ocean basins. To isolate the prediction skill of the decadal hindcasts, we focus on the detrended surface temperature anomalies hereafter.
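The detrending and skill metrics described above can be illustrated with a short sketch; the helper names are hypothetical, and per-grid-point application and significance testing are omitted.

```python
import numpy as np

def detrend(x):
    """Remove the least-squares linear trend from a 1-D time series."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def skill(obs, fcst):
    """Anomaly correlation coefficient and RMSE between detrended series."""
    o = detrend(obs)
    f = detrend(fcst)
    acc = np.corrcoef(o, f)[0, 1]
    rmse = float(np.sqrt(np.mean((f - o) ** 2)))
    return acc, rmse
```

In the paper these statistics are computed at every 5° × 5° grid point; the sketch shows only the single-series case.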

Figure 2 shows the prediction skill (correlation coefficient) for 2–5-yr averaged surface temperature anomalies without the linear trend in the MME and in the ensemble mean of each hindcast. In most of the models, high prediction skill appears over the central western North Pacific Ocean, the tropical and subpolar North Atlantic, the Indian Ocean, and the Mediterranean region, while low skill is seen in the eastern equatorial Pacific and the North Pacific Ocean. The persistence prediction (Fig. 2h) shows positive correlation coefficients over most of the North Pacific. Figure 3 shows the RMSE for the detrended surface temperature anomaly in the MME, the ensemble mean of each hindcast, and the persistence prediction. The RMSE is generally large where the correlation coefficient is low. The RMSE over the North Atlantic Ocean is smaller than that over the North Pacific Ocean, and it is large over the high latitudes, the tropical Pacific, and the eastern and northern parts of the North Pacific in all hindcasts. The RMSE of the persistence prediction is generally larger than that of the hindcasts.

Correlation coefficients for the detrended annual mean surface temperature anomaly between observations and predictions at 2–5-yr average lead time for (a) MME, (b)–(g) ensemble mean of each model, and (h) persistence prediction. Solid black line represents statistical significance at the 99% confidence level. Numbers in parentheses indicate total number of ensemble members.

Citation: Journal of Climate 27, 13; 10.1175/JCLI-D-13-00519.1


Generally, HadCM3, GFDL CM2.1, and MIROC5 show relatively higher skill than CanCM4, GEOS5, and MPI-LR in most of the Northern Hemisphere oceans, especially over the North Pacific. The relatively low prediction skill in CanCM4, GEOS5, and MPI-LR over the North Pacific could possibly be due to initialization problems or model biases (Müller et al. 2012; Ham et al. 2014), but the reason for lower prediction skill needs to be investigated in detail. Because of the low prediction skill over the Pacific Ocean, we exclude CanCM4, GEOS5, and MPI-LR in calculating the MME. Only predictions from HadCM3, GFDL CM2.1, and MIROC5 are used for calculating the MME with equal weighting on each ensemble member. The MME using three models (HadCM3, GFDL CM2.1, and MIROC5) exhibits better forecast skill than the MME using all six models (not shown), implying that the forecast skill is not just a function of the number of ensemble members. The difference of prediction skill between the three best models (HadCM3, GFDL CM2.1, and MIROC5) and three worst models (CanCM4, GEOS5, and MPI-LR) is significant with 95% confidence. Thus, a total of 26 ensemble members from the three best models are used for the MME.

### b. Systematic errors over the North Pacific Ocean

We found that the prediction skill is relatively low in the eastern North Pacific and tropical Pacific in the three best models (HadCM3, GFDL CM2.1, and MIROC5). We hypothesize that the low skill is closely linked to a systematic error in representing the dominant mode of North Pacific climate variability. To examine errors in the predicted anomalies over the North Pacific Ocean, we investigate the dominant mode of sea surface temperature (SST) variability in both observations and predictions by performing empirical orthogonal function (EOF) analysis on detrended SST anomalies. The EOF analysis is performed on the observed 4-yr moving-averaged detrended SST anomaly over the North Pacific Ocean (10°–60°N, 110°–260°E). The first EOF explains 45.7% of the total detrended SST variability, and the normalized principal component (PC) time series of the first EOF represents the PDO index (Fig. 4i). Figure 4a shows the regression of the observed detrended SST onto this leading PC time series. It is a "PDO-like" pattern in the Pacific, with negative anomalies in the interior North Pacific and positive anomalies along the eastern Pacific coast and in the tropical and subtropical Pacific (Fig. 4a).
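As an illustration of the EOF step, the leading mode of a (time × space) anomaly matrix can be extracted via the SVD; this minimal sketch omits latitude weighting and the smoothing choices described in the text, so it is an outline under stated assumptions rather than the paper's exact procedure.

```python
import numpy as np

def leading_eof(anom):
    """Leading EOF of an (n_time, n_space) anomaly matrix via SVD.

    Returns (pc_norm, eof, explained): the PC normalized to unit standard
    deviation (as for the PDO index in the text), the EOF pattern scaled so
    that pc_norm[:, None] * eof reconstructs mode 1, and the explained
    variance fraction.
    """
    centered = anom - anom.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    pc = u[:, 0] * s[0]            # unnormalized PC of mode 1
    pc_norm = pc / pc.std()
    eof = vt[0] * pc.std()         # pattern in physical units (e.g., K)
    explained = s[0] ** 2 / np.sum(s ** 2)
    return pc_norm, eof, explained
```

Note that the sign of an EOF/PC pair is arbitrary; in practice the sign is usually fixed by convention (e.g., so that a positive PDO index corresponds to cool anomalies in the central North Pacific).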

The regression of the SST anomaly (K) on the leading PC time series of observed SST anomalies: (a) observations, (b) MME, and (c)–(h) ensemble mean of each model at 2–5 yr average lead time. (i) The normalized PC time series from the first EOF of the observed SST whose value for a specific year indicates the averaged value for the following 4 yr. For example, the value in 1961 indicates the average of 1962–65.


The predicted pattern related to the observed PDO mode is obtained by regressing the predicted detrended SST anomaly onto the normalized leading PC (Fig. 4i). The regressed SST pattern of the MME at 2–5-yr lead time (Fig. 4b) shows basinwide cooling over the North Pacific, similar to the predicted linear trend (Fig. 1b). The predicted pattern captures the negative SST anomalies over the central North Pacific Ocean, where the predictive skill is relatively high (Fig. 2a). However, the MME prediction shows negative anomalies over the eastern North Pacific and tropical Pacific, where the observed pattern exhibits positive anomalies. This is also true when the regression is performed on the individual models, especially HadCM3, GFDL CM2.1, and MIROC5 (Figs. 4c–e).
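The regression maps described above are, in essence, covariances with a unit-variance index; a minimal sketch (illustrative names, no missing-data handling):

```python
import numpy as np

def regress_on_index(field, index):
    """Regression map of field (n_time, n_space) on a time index (n_time,).

    The index is normalized to zero mean and unit standard deviation, so the
    map gives anomalies (e.g., K) per standard deviation of the index.
    """
    idx = (index - index.mean()) / index.std()
    centered = field - field.mean(axis=0)
    return centered.T @ idx / len(idx)   # covariance with unit-variance index
```

Applying this with the predicted SST field and the *observed* PC as the index, as in Figs. 4b–h, shows how the forecasts project onto the observed PDO.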

Newman (2013) showed that the most notable deficiency in predicting North Pacific decadal variability in initialized hindcasts appears to be related to the second eigenmode over the Pacific, which represents decadal variability with a sufficiently long *e*-folding time (Newman 2007); the first mode corresponds to the trend, which has been removed in the present analysis. In the Pacific, the predicted SST patterns (Figs. 4b–e) resemble the second eigenmode of Newman (2007). This implies that excessive excitation of the second mode in the forecast may lead to differences between the observed and predicted first EOF. That is, the error in the dominant EOF of the forecast comes from a different contribution of predictable modes to the PDO rather than from model error in representing the PDO-related variability. This is supported by the notion that the PDO is the sum of independent red noise sources (Beran 1994; Newman 2007) rather than a single phenomenon. The different contribution of PDO-related modes is possibly due to differences in atmospheric variability (Hasselmann 1976) or in air–sea coupling strength (Barsugli and Battisti 1998) in the simulations; however, further investigation is needed to clarify the detailed mechanism.

The three models (HadCM3, GFDL CM2.1, and MIROC5) have common systematic errors in predicting the dominant mode of SST variability in the North Pacific. The regressed patterns from the models (Fig. 4) are similar to the leading modes of a singular value decomposition (SVD) analysis between the observed and predicted SST (not shown) for HadCM3, GFDL CM2.1, and MIROC5, indicating that the three models share common errors in their simulation of decadal variability in the North Pacific Ocean; in particular, the general absence of oppositely signed anomalies in the eastern North Pacific Ocean. In contrast, the leading modes of the other three models (CanCM4, GEOS5, and MPI-LR) are not strongly related to PDO-like variability, and their errors are not systematic (not shown), so the statistical error correction method cannot enhance their prediction skill. Therefore, we apply the statistical correction only to the predictions from HadCM3, GFDL CM2.1, and MIROC5, whose errors are systematic relative to the observations.

## 4. Improvement of decadal predictions with statistical error correction

Systematic errors are seen over the North Pacific Ocean in the predicted detrended SST anomalies. Here, we apply the stepwise pattern projection method (SPPM; Kug et al. 2008) to correct the systematic error in the predicted anomalies.

### a. Bias correction procedure

The selection of the predictor domain *D* is important. In general, pattern projection models use a fixed domain; however, in the method suggested by Kug et al. (2008), the predictor domain is selected optimally based on the correlation between the predictand and the predictor during the training period. To select the optimal predictor domain, we follow the same process as Kug et al. (2008). First, over the training period, the correlation coefficients between the observed anomaly at the predictand grid point and the predicted anomaly at every grid point of the predictor field are calculated.

As an example, the correction of the SST anomaly in the MME prediction at the grid point 20°N, 120°W (point A in Fig. 5) is performed; this is a grid point where the models have apparent systematic errors (Fig. 4). In the training period, the observed SST anomaly at point A serves as the predictand, and its correlation with the predicted SST anomaly at every grid point is calculated (Fig. 5a).

Correlation coefficients (contours) between the observed SST anomaly at 20°N, 120°W (red dot) with (a) the predicted SST anomaly (MME) at 2–5-yr average lead time and (b) the observed SST anomaly. The shading represents the area selected as predictors in the SPPM process.


Based on these correlation coefficients, the grid points whose correlations with the predictand exceed a threshold (shaded area in Fig. 5a) are selected as predictors, and the predicted anomalies at the selected points are projected onto the predictand to build the correction model.
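Putting these steps together, the SPPM can be sketched in a highly simplified single-step form; the threshold value and the reduction of the stepwise search to one projection are assumptions for illustration (see Kug et al. 2008 for the full method), and all names are illustrative.

```python
import numpy as np

def sppm_predict(train_pred, train_obs_point, fcst_pred, r_crit=0.3):
    """Simplified sketch of the stepwise pattern projection method (SPPM).

    train_pred: (n_train, n_space) predicted anomalies in the training period
    train_obs_point: (n_train,) observed anomaly at the target grid point
    fcst_pred: (n_space,) predicted anomaly field for the forecast case
    """
    # 1. Correlate the predictand with every predictor grid point.
    tp = train_pred - train_pred.mean(axis=0)
    to = train_obs_point - train_obs_point.mean()
    r = (tp.T @ to) / (np.sqrt((tp ** 2).sum(axis=0)) * np.sqrt((to ** 2).sum()))
    # 2. Keep only grid points whose |correlation| exceeds the threshold.
    mask = np.abs(r) >= r_crit
    # 3. Project the predictor fields onto the selected correlation pattern.
    z_train = train_pred[:, mask] @ r[mask]
    z_fcst = fcst_pred[mask] @ r[mask]
    # 4. Regress the predictand on the projected series; apply to the forecast.
    slope, intercept = np.polyfit(z_train, train_obs_point, 1)
    return slope * z_fcst + intercept
```

The projection step is what distinguishes the SPPM from pointwise regression: the corrected value at point A draws on a whole pattern of model grid points that covary with the observed anomaly there.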

### b. SST anomaly prediction with statistical error correction

The observed detrended SST anomaly (Fig. 6a) averaged over the period 1991–94, when the climate state stayed in the positive PDO phase, is shown together with predicted anomalies from 2–5- and 6–9-yr MME targeted to 1991–94 before correction (Figs. 6b,d). The SST anomaly patterns (Figs. 6b,d) resemble the regressed pattern (Fig. 4b) with the minimum temperature in the North Pacific shifted slightly to the north compared to observations, and with a negative anomaly over the entire North Pacific Ocean. The observed positive anomaly in the eastern North Pacific and tropical central Pacific is not predicted (Figs. 6b,d). After the error correction with SPPM, the prediction has been greatly improved (Figs. 6c,e). Although the strong negative anomaly in the central North Pacific still has its maximum slightly shifted to the north, the positive SST anomaly in the eastern North Pacific surrounding the negative anomalies in the central North Pacific is captured, thus enhancing the prediction skill in those regions.

The detrended SST anomalies (K) average over 1991–94 in (a) observations, and the MME prediction targeted to 1991–94 at (b),(c) 2–5- and (d),(e) 6–9-yr average lead time (left) before and (right) after correction.


Figure 7 shows the prediction skill for the detrended SST anomaly before and after correction for the 2–5-yr averaged lead time hindcasts from the MME and three individual models. Nineteen years (initial years 1990–2008) from the forecasting period are used to calculate the temporal correlation coefficient. Before statistical correction, the MME shows low prediction skill in the eastern North Pacific and tropical central Pacific. After statistical correction, the areas of high prediction skill remain high or are slightly improved (e.g., the western Pacific), while the skill is enhanced over the central and eastern North Pacific (Fig. 7a). Note that the enhanced predictive skill in the North Pacific and tropical east Pacific after correction results from correcting the systematic errors. The improvement is also clear in the individual models (Figs. 7b–d), implying that the statistical correction has a positive impact as long as the predicted anomalies contain systematic errors. Here, the SPPM has been applied to the ensemble mean of each individual model (Figs. 7b–d), and the corrected anomalies are then averaged to form the MME (Fig. 7a). However, if the SPPM is applied to individual ensemble members rather than to the ensemble mean, the prediction skill does not improve significantly after correction.

Prediction skill (correlation coefficients) for the detrended SST anomaly between observations and model predictions at 2–5-yr average lead time (left) before and (right) after correction: (top to bottom) MME, HadCM3, GFDL CM2.1, and MIROC5.


Several sensitivity tests have been performed to test the robustness of the application of SPPM. We applied SPPM for forecasts with initial years 1990–2008 with training from forecast initial years 1961–80 and compared it with training from 1961 to 1985. We also applied the SPPM to the first half of the hindcast period based on training from the second half. The PDO-related anomalies are better predicted after the SPPM correction in all cases.

Figure 8 compares the area-averaged detrended SST anomalies at 2–5-yr lead time over the eastern North Pacific (10°–30°N, 215°–250°E) and the central North Pacific (15°–40°N, 160°–210°E) before and after correction in the MME and GFDL CM2.1. We show the results from GFDL CM2.1 together with the MME because their prediction skill improves the most after correction. Before correction, the ensemble-mean SST anomaly over the eastern North Pacific predicted by the MME and GFDL CM2.1 does not capture the large positive anomalies in the early 1990s or the low-frequency variation with a decreasing trend (Figs. 8a,b). After the error correction, the anomalies resemble the observations, with positive anomalies in the 1990s, a decreasing trend up to 2000, and a slight increase between 2001 and 2005 (Figs. 8a,b). In the MME, the temporal correlation coefficient between the predicted and observed SST anomaly over the forecast period is −0.70 before correction and 0.76 after correction. The predictions over the central North Pacific also improve after correction (Figs. 8c,d).

The observed (black) and predicted (colors) detrended SST anomalies at 2–5-yr average lead time for (left) MME and (right) GFDL CM2.1 over (top) the eastern North Pacific (10°–30°N, 215°–250°E) and (bottom) the central North Pacific (15°–40°N, 160°–210°E) before (dashed lines) and after correction (solid lines). Numbers at the top of each panel indicate the temporal correlation coefficients of the SST anomaly time series before (left) and after (right) correction. The years on the *x* axes represent the first year of the target prediction period. For example, 1991 is the average of the observations and prediction targeting to 1991–94.


Figure 9 shows the spatial frequency distribution of pointwise correlation coefficients before and after correction, expressed as a probability of occurrence. The statistics are based on all grid points over the inner domain of the North Pacific (15°–45°N, 120°–245°E) and a total of 26 ensemble members from the three models (HadCM3, GFDL CM2.1, and MIROC5). The frequency is calculated in 0.1-wide bins and then divided by the total number of grid points over the selected domain to give the probability of occurrence. Before correction, the correlation coefficients are almost normally distributed and symmetric about a mean of zero (Fig. 9, black solid line). After correction, the peak of the distribution shifts from zero to 0.2 (Fig. 9, red bars), and the probability of correlations above 0.5 almost doubles, indicating higher prediction skill over the North Pacific Ocean than before correction.
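The probability-of-occurrence histogram described above can be computed as follows (bin width and normalization as in the text; the function name is illustrative):

```python
import numpy as np

def correlation_pdf(corrs, bin_width=0.1):
    """Frequency of correlation values in bins of the given width over
    [-1, 1], divided by the total number of grid points to give a
    probability of occurrence."""
    n_bins = int(round(2.0 / bin_width))
    edges = np.linspace(-1.0, 1.0, n_bins + 1)
    counts, _ = np.histogram(np.asarray(corrs), bins=edges)
    return edges, counts / np.size(corrs)
```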

Spatial frequency distribution of correlation coefficients before (line) and after (bars) error correction for a 2–5-yr average lead time. The frequency is calculated over the North Pacific Ocean (120°–245°E, 15°–45°N) for all ensemble members from HadCM3, GFDL CM2.1, and MIROC5. A two-sample Kolmogorov–Smirnov test shows that the distributions before and after error correction are significantly different from each other at the 90% confidence level.


Statistical significance of the difference in distributions is determined using a two-sample Kolmogorov–Smirnov test (Smirnov 1948). To calculate the number of independent samples, the effective number of grid points is estimated using an autocorrelation map. We chose a single point (i.e., 30°N, 185°E), where the skill improved the most, and then calculated the autocorrelation map between the temperature anomaly at that point and the anomalies at all grid points within the target region to measure the effective radius. The effective radius is defined as the length at which the autocorrelation falls to the 99% confidence level (i.e., a correlation value of 0.57), and we assume that two grid points separated by more than the effective radius are independent of each other. Finally, the total number of independent grid points in the target region (15°–45°N, 120°–245°E) is obtained by dividing the length of the domain by the effective radius. For example, if the effective radius in the longitudinal direction is 10°, the number of independent grid points in that direction is 12.5 [i.e., (245 − 120)/10 = 12.5]. The two-sample Kolmogorov–Smirnov test shows that the distributions before and after correction are significantly different from each other at the 90% confidence level. It could be argued that the 19-yr period used for validation of the statistical correction is too short. To address this issue, we changed the lengths of the training and forecasting periods to 1961–75 and 1980–2008, respectively; the result is similar to Fig. 9.
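The test above can be sketched with NumPy alone: the KS statistic is the maximum distance between the two empirical CDFs, compared against the asymptotic critical value evaluated with the effective number of independent grid points rather than the raw grid count. The function names are illustrative (the paper publishes no code).

```python
import numpy as np

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    distance between the empirical CDFs of samples x and y."""
    data = np.concatenate([x, y])
    cdf_x = np.searchsorted(np.sort(x), data, side="right") / len(x)
    cdf_y = np.searchsorted(np.sort(y), data, side="right") / len(y)
    return np.max(np.abs(cdf_x - cdf_y))

def significantly_different(x, y, eff_n, c_alpha=1.22):
    """Compare D against the asymptotic critical value c(alpha)*sqrt(2/n),
    with n the effective number of independent samples, e.g.
    (245 - 120) / 10 = 12.5 points in the longitudinal direction.
    c(0.10) ~= 1.22 corresponds to the 90% confidence level."""
    return ks_statistic(x, y) > c_alpha * np.sqrt(2.0 / eff_n)
```

Using the effective sample size makes the test far more conservative than feeding every (spatially correlated) grid point into a standard KS routine.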

This study shows that the statistical correction improves the deterministic prediction skill of multiyear forecasts. In terms of probabilistic forecasts, however, we found that the ensemble spread is underestimated relative to the forecast error when the statistical correction is applied. This is because each ensemble member is corrected too strongly, so the differences among ensemble members become too small.
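This underdispersion can be quantified with a spread-to-error ratio, sketched below under assumed array shapes (the function and its interface are illustrative, not from the paper). A well-calibrated ensemble has a ratio near one; the over-corrected ensembles described above would fall well below one.

```python
import numpy as np

def spread_error_ratio(forecasts, obs):
    """Ratio of mean ensemble spread to the RMSE of the ensemble mean.

    forecasts: (n_members, n_starts) corrected anomaly forecasts;
    obs: (n_starts,) verifying observations. A ratio well below 1
    indicates members sit too close together relative to the typical
    forecast error (underdispersion).
    """
    ens_mean = forecasts.mean(axis=0)
    spread = np.sqrt(forecasts.var(axis=0, ddof=1).mean())
    rmse = np.sqrt(np.mean((ens_mean - obs) ** 2))
    return spread / rmse
```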

## 5. Summary and discussion

The prediction skill and errors for the predicted surface temperature anomalies in CMIP5 decadal hindcasts have been assessed. Six ocean–atmosphere coupled models that are initialized every year from 1961 to 2008 and the multimodel ensemble mean of the models are compared in parallel. The prediction skill is examined by detrending the predicted anomalies. The highly predictable areas are shown to be near the centers of action of the dominant decadal climate oscillations, the PDO and AMO, while lower prediction skill appears over the tropical and subtropical North Pacific and eastern North Pacific Ocean where three models (HadCM3, GFDL CM2.1, and MIROC5) have common systematic errors in the predicted SST anomaly patterns. These three models are capable of predicting a large-scale PDO signal, although the spatial patterns associated with the PDO are systematically different from the observed pattern. By statistically correcting the systematic pattern errors using the SPPM method, based on independent data for training and forecasting periods, the prediction skill is enhanced over the North Pacific Ocean.

This study implies that the statistical error correction method can be applied successfully to decadal predictions, even though decadal prediction has a smaller number of training samples than seasonal or weather prediction. This statistical error correction leads to better results than the MME method, which has been regarded as a useful approach to enhancing decadal prediction skill under limited computational resources. To examine this point in more detail, the prediction skill for the 2–5-yr averaged SST anomaly over the North Pacific Ocean (15°–45°N, 120°–245°E) is compared (Fig. 10). Before correction, the individual models show maximum correlations around 0.2. Although the three models are averaged with equal weighting, the skill of the MME is still around 0.2. However, after the statistical correction, each prediction shows enhanced skill, and the correlation of each single corrected model exceeds the skill of the uncorrected MME. This clearly shows that the MME method is ineffective here because it cannot remove systematic errors common to the models, whereas the statistical correction has a substantial impact. On the other hand, the MME method is effective in reducing forecast errors from the models and initial conditions that are not common to different models, while the statistical correction is not. In addition, in the uninitialized simulations the spatial structure and magnitude of the PDO are generally realistic in coupled climate models (Deser et al. 2012), while our initialized simulations show systematic errors in the PDO-related pattern. Therefore, to enhance multiyear prediction over the North Pacific Ocean, the dynamical origins of the errors in the initialized hindcasts and the amplification of the errors with forecast lead time need to be further investigated.
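The equal-weight MME and the correlation comparison in Fig. 10 amount to the following computation (a sketch; `mme_skill` and the dict interface are hypothetical names, not the paper's code):

```python
import numpy as np

def mme_skill(model_forecasts, obs):
    """Correlation skill of an equal-weight multimodel ensemble mean.

    model_forecasts: dict mapping model name -> (n_starts,) forecast
    anomaly series (each already an ensemble mean for that model);
    obs: (n_starts,) observed anomalies. Models are averaged with equal
    weights, then verified with the anomaly correlation coefficient.
    """
    mme = np.mean(list(model_forecasts.values()), axis=0)
    return np.corrcoef(mme, obs)[0, 1]
```

Because the averaging is linear, an error pattern shared by every model survives into `mme`, which is why equal weighting cannot remove the common systematic errors discussed above.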

Fig. 10. The mean prediction skill (correlation coefficients) of 2–5-yr SST predictions averaged over the North Pacific Ocean (120°–245°E, 15°–45°N) in the ensemble mean of each model and the MME, before (blue) and after (red) correction.


It is shown that the statistical error correction is beneficial for decadal prediction over the North Pacific Ocean when models have common errors in simulating Pacific decadal variability. We found that the statistical error correction over the North Atlantic Ocean is not as effective as over the North Pacific, because most of the models already predict the Atlantic decadal climate variability relatively well and because there is no strong systematic error in predictions of North Atlantic SST. As a result, after the statistical correction, the prediction skill for SST over the North Pacific becomes similar to that over the North Atlantic. For example, the SPPM method greatly enhances the mean prediction skill over the North Pacific (correlation coefficient of 0.45 over 15°–45°N, 120°–245°E), which is comparable to that over the North Atlantic (correlation coefficient of 0.42 over 0°–60°N, 280°–350°E before correction, and similar scores after correction).

Given the current consensus that prediction skill over the Pacific Ocean is lower than that over the North Atlantic Ocean, which may limit the use of decadal forecasts to the Atlantic sector, this study provides a foundation for extending the utility of decadal forecasts to the global oceans, including the Pacific Ocean and the surrounding countries of the Asia–Pacific sector.

## Acknowledgments

Kim was supported by the Korea Meteorological Administration Research and Development Program under Grant APCC 2013-3141 and by the New York Sea Grant R/MARR14NJ-NY. Adam Scaife was supported by the Joint DECC/Defra Met Office Hadley Centre Climate Programme (GA01101). We thank Dr. J. S. Kug for providing us with the SPPM code and Daehyun Kang (UNIST) for providing the downloaded CMIP5 decadal hindcast data. The constructive and valuable comments of two anonymous reviewers are greatly appreciated.

## REFERENCES

Barsugli, J. J., and D. S. Battisti, 1998: The basic effects of atmosphere–ocean coupling on midlatitude variability. *J. Atmos. Sci.*, **55**, 477–493, doi:10.1175/1520-0469(1998)055<0477:TBEOAO>2.0.CO;2.

Beran, J., 1994: *Statistics for Long-Memory Processes.* Chapman and Hall, 315 pp.

Chikamoto, T., and Coauthors, 2012: Predictability of a stepwise shift in Pacific climate change during the late 1990s in hindcast experiments using MIROC. *J. Meteor. Soc. Japan*, **90A**, 1–21, doi:10.2151/jmsj.2012-A01.

Deser, C., and Coauthors, 2012: ENSO and Pacific decadal variability in Community Climate System Model version 4. *J. Climate*, **25**, 2622–2651, doi:10.1175/JCLI-D-11-00301.1.

Doblas-Reyes, F. J., and Coauthors, 2013: Initialized near-term regional climate change prediction. *Nat. Commun.*, **4**, 1715, doi:10.1038/ncomms2704.

Goddard, L., and Coauthors, 2013: A verification framework for interannual-to-decadal predictions experiments. *Climate Dyn.*, **40**, 245–272.

Guemas, V., F. J. Doblas-Reyes, F. Lienert, H. Dui, and Y. Soufflet, 2012: Identifying the causes of the poor decadal climate prediction skill over the North Pacific. *J. Geophys. Res.*, **117**, D20111, doi:10.1029/2012JD018004.

Ham, Y. G., M. M. Rienecker, M. J. Suarez, Y. Vikhliaev, B. Zhao, J. Marshak, G. Vernieres, and S. D. Schubert, 2014: Decadal prediction skill in the GEOS-5 Forecast System. *Climate Dyn.*, **42**, 1–20, doi:10.1007/s00382-013-1858-x.

Hasselmann, K., 1976: Stochastic climate models. Part I. Theory. *Tellus*, **28**, 473–485, doi:10.1111/j.2153-3490.1976.tb00696.x.

ICPO, 2011: Data and bias correction for decadal climate predictions. International CLIVAR Project Office (ICPO), CLIVAR Publication Series 150, 3 pp. [Available online at http://www.clivar.org/resources/news/data-and-bias-correction-decadal-climate-predictions-14-feb-2011.]

Kang, I.-S., and J. Shukla, 2006: Dynamic seasonal prediction and predictability of monsoon. *The Asian Monsoon,* B. Wang, Ed., Springer Praxis, 585–612.

Kang, I.-S., J.-Y. Lee, and C.-K. Park, 2004: Potential predictability of summer mean precipitation in a dynamical seasonal prediction system with systematic error correction. *J. Climate*, **17**, 834–844, doi:10.1175/1520-0442(2004)017<0834:PPOSMP>2.0.CO;2.

Keenlyside, N. S., M. Latif, J. Jungclaus, L. Kornblueh, and E. Roeckner, 2008: Advancing decadal-scale climate prediction in the North Atlantic sector. *Nature*, **453**, 84–88, doi:10.1038/nature06921.

Kim, H.-M., I.-S. Kang, B. Wang, and J.-Y. Lee, 2008: Interannual variations of the boreal summer intraseasonal variability predicted by ten atmosphere–ocean coupled models. *Climate Dyn.*, **30**, 485–496, doi:10.1007/s00382-007-0292-3.

Kim, H.-M., P. J. Webster, and J. A. Curry, 2012: Evaluation of short-term climate change prediction in multi-model CMIP5 decadal hindcasts. *Geophys. Res. Lett.*, **39**, L10701, doi:10.1029/2012GL051644.

Kug, J.-S., J.-Y. Lee, and I.-S. Kang, 2008: Systematic error correction of dynamical seasonal prediction using a stepwise pattern project method. *Mon. Wea. Rev.*, **136**, 3501–3512, doi:10.1175/2008MWR2272.1.

Lienert, F., and F. J. Doblas-Reyes, 2013: Decadal prediction of interannual tropical and North Pacific sea surface temperature. *J. Geophys. Res.*, **118**, 5913–5922, doi:10.1002/jgrd.50469.

Mantua, N. J., S. R. Hare, Y. Zhang, J. M. Wallace, and R. C. Francis, 1997: A Pacific decadal climate oscillation with impacts on salmon. *Bull. Amer. Meteor. Soc.*, **78**, 1069–1079, doi:10.1175/1520-0477(1997)078<1069:APICOW>2.0.CO;2.

Meehl, G. A., and H. Teng, 2012: Case studies for initialized decadal hindcasts and predictions for the Pacific region. *Geophys. Res. Lett.*, **39**, L22705, doi:10.1029/2012GL053423.

Meehl, G. A., and Coauthors, 2009: Decadal prediction: Can it be skillful? *Bull. Amer. Meteor. Soc.*, **90**, 1467–1485, doi:10.1175/2009BAMS2778.1.

Meehl, G. A., A. Hu, and C. Tebaldi, 2010: Decadal prediction in the Pacific region. *J. Climate*, **23**, 2959–2973, doi:10.1175/2010JCLI3296.1.

Meehl, G. A., and Coauthors, 2014: Decadal climate prediction: An update from the trenches. *Bull. Amer. Meteor. Soc.*, **95**, 243–267, doi:10.1175/BAMS-D-12-00241.1.

Mochizuki, T., and Coauthors, 2012: Decadal prediction using a recent series of MIROC global climate models. *J. Meteor. Soc. Japan*, **90A**, 373–383, doi:10.2151/jmsj.2012-A22.

Morice, C. P., J. J. Kennedy, N. A. Rayner, and P. D. Jones, 2012: Quantifying uncertainties in global and regional temperature change using an ensemble of observational estimates: The HadCRUT4 dataset. *J. Geophys. Res.*, **117**, D08101, doi:10.1029/2011JD017187.

Müller, W. A., and Coauthors, 2012: Forecast skill of multi-year seasonal means in the decadal prediction system of the Max Planck Institute for Meteorology. *Geophys. Res. Lett.*, **39**, L22707, doi:10.1029/2012GL053326.

Newman, M., 2007: Interannual to decadal predictability of tropical and North Pacific sea surface temperatures. *J. Climate*, **20**, 2333–2356, doi:10.1175/JCLI4165.1.

Newman, M., 2013: An empirical benchmark for decadal forecasts of global surface temperature anomalies. *J. Climate*, **26**, 5260–5269, doi:10.1175/JCLI-D-12-00590.1.

Pohlmann, H., J. Jungclaus, A. Köhl, D. Stammer, and J. Marotzke, 2009: Initializing decadal climate predictions with the GECCO oceanic synthesis: Effects on the North Atlantic. *J. Climate*, **22**, 3926–3938, doi:10.1175/2009JCLI2535.1.

Smirnov, N. V., 1948: Tables for estimating the goodness of fit of empirical distributions. *Ann. Math. Stat.*, **19**, 279–281, doi:10.1214/aoms/1177730256.

Smith, D. M., S. Cusack, A. W. Colman, C. K. Folland, G. R. Harris, and J. M. Murphy, 2007: Improved surface temperature prediction for the coming decade from a global climate model. *Science*, **317**, 796–799, doi:10.1126/science.1139540.

Smith, D. M., R. Eade, N. J. Dunstone, D. Fereday, J. M. Murphy, H. Pohlmann, and A. A. Scaife, 2010: Skilful multi-year predictions of Atlantic hurricane frequency. *Nat. Geosci.*, **3**, 846–849, doi:10.1038/ngeo1004.

Smith, D. M., A. A. Scaife, and B. Kirtman, 2012: What is the current state of scientific knowledge with regard to seasonal and decadal forecasting? *Environ. Res. Lett.*, **7**, 015602, doi:10.1088/1748-9326/7/1/015602.

Taylor, K. E., R. J. Stouffer, and G. A. Meehl, 2012: An overview of CMIP5 and the experiment design. *Bull. Amer. Meteor. Soc.*, **93**, 485–498, doi:10.1175/BAMS-D-11-00094.1.

van Oldenborgh, G., F. Doblas Reyes, B. Wouters, and W. Hazeleger, 2012: Decadal prediction skill in a multi-model ensemble. *Climate Dyn.*, **38**, 1263–1280, doi:10.1007/s00382-012-1313-4.

Yang, X., and Coauthors, 2013: A predictable AMO-like pattern in GFDL's fully-coupled ensemble initialization and decadal forecasting system. *J. Climate*, **26**, 650–661, doi:10.1175/JCLI-D-12-00231.1.