## 1. Introduction

The Madden–Julian oscillation (MJO) is the dominant component of tropical intraseasonal variability. It is a slow-moving, planetary-scale envelope of convection that propagates eastward across the equatorial Indian and western–central Pacific Oceans. As a naturally occurring component of the coupled ocean–atmosphere system, the MJO affects tropical precipitation, the frequency of tropical cyclones, and extratropical weather patterns (Lau and Waliser 2012). Understanding and predicting the MJO is a central problem in contemporary meteorology with large societal impacts (Zhang et al. 2013). MJO prediction is pursued with either low-order statistical models (Jiang et al. 2008; Seo et al. 2009; Kang and Kim 2010; Oliver and Thompson 2012; Kondrashov et al. 2013; Cavanaugh et al. 2014) or operational dynamical models (Gottschalck et al. 2010; Vitart and Molteni 2010; Zhang et al. 2013; Kim et al. 2014; Neena et al. 2014).

The Real-time Multivariate MJO (RMM) index (Wheeler and Hendon 2004) is one of the most popular metrics for assessing large-scale skill in MJO prediction. The RMM index combines the zonal winds in the upper and lower troposphere with the outgoing longwave radiation (OLR), a surrogate for convective activity. The low-order statistical models used to forecast the RMM indices are mainly based on multivariate regression (Maharaj and Wheeler 2005; Jiang et al. 2008; Seo et al. 2009; Kang and Kim 2010), time series analysis (Seo et al. 2009; Love and Matthews 2009; Kang and Kim 2010), and analogs (Seo et al. 2009), with model uncertainty typically represented by additive stochastic noise. These models have useful MJO prediction skill of about 15–20 days, similar to that of the operational dynamical models. Incorporating the past-noise forecasting (PNF) method into empirical statistical models (Kondrashov et al. 2013) extends empirical MJO prediction to 25 days in terms of anomaly pattern correlation; however, this model severely underestimates the predicted amplitudes, especially for strong MJO events, which limits its usefulness in practice.

Here, we improve the prediction of the RMM index in two respects. First, a recently developed systematic strategy for data-driven, physics-constrained, low-order stochastic modeling (Majda and Harlim 2013; Harlim et al. 2014) is applied to the RMM index, yielding a four-dimensional nonlinear stochastic model with two observed variables, representing the two RMM components, and two hidden variables. This low-order model involves correlated multiplicative noise, defined through energy-conserving nonlinear interactions between the observed and hidden variables, as well as additive stochastic noise. The special structure of the model allows efficient data assimilation for initializing the hidden variables. Together with the initialization of the observed variables by the singular spectrum analysis (SSA) reconstruction (Vautard and Ghil 1989) of the RMM index, this facilitates the ensemble prediction algorithm. Second, because path-wise measures such as the anomaly pattern correlation fail to detect the disparity in peak amplitudes between the observed and forecast RMM indices, an information-theoretic framework (Roulston and Smith 2002; Weisheimer et al. 2014; Branicki and Majda 2014) is applied to calibrate the model over a short training period. This framework involves generalizations of the anomaly pattern correlation and the RMS error, together with the information deficiency in the model forecast, which indicates whether the forecast RMM index has the correct amplitude.

The remainder of the paper is organized as follows. Section 2 includes the preliminaries of the RMM index and SSA-based initialization. The nonlinear-physics-constrained low-order stochastic models, as well as the prediction algorithm and data assimilation algorithm for the hidden variables, are presented in section 3. The information-theoretic framework is introduced in section 4, followed by the calibration of model parameters through information theory over a short training period. Section 5 illustrates the prediction skill of the nonlinear-physics-constrained stochastic models as well as that of the linear stochastic models. The relationship between the MJO prediction skill and El Niño–Southern Oscillation (ENSO) is investigated in this section as well. The paper is concluded in section 6.

## 2. Preliminaries

### a. RMM index

The Real-time Multivariate MJO index is a combined measure of the convection and circulation associated with tropical intraseasonal variability. It is based on the first two empirical orthogonal functions (EOFs) of the combined fields of near-equatorially averaged 850-hPa zonal wind, 200-hPa zonal wind, and satellite-observed OLR, where the OLR data are measured by the NOAA polar-orbiting satellites and the wind data are from the NCEP–NCAR reanalysis and the NCEP operational analysis. Projecting the daily observed data onto these multivariate EOFs, after removing the annual cycle and components of interannual variability, yields principal component (PC) time series that vary mostly on the intraseasonal time scale of the MJO. These two PC time series constitute the RMM index (Wheeler and Hendon 2004). Since its introduction, the RMM index has become the leading method for identifying the state of the MJO in observations (Jiang et al. 2011; Riley et al. 2011; Wang et al. 2012; Straub 2013) and model analyses (Kim et al. 2009; Gottschalck et al. 2010).
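The combined-EOF step can be sketched as follows. This is a minimal illustration, assuming anomaly fields on a common longitude grid with the annual cycle and interannual variability already removed; the function name `rmm_like_pcs` and the random input data are hypothetical, and the operational processing of Wheeler and Hendon (2004) includes details omitted here (for example, EOFs and normalization factors fixed from a base period).

```python
import numpy as np


def rmm_like_pcs(fields, n_modes=2):
    """Hypothetical helper: leading PCs of several combined anomaly fields.

    fields : list of arrays, each shaped (n_days, n_lon), holding
             near-equatorially averaged anomalies (annual cycle and
             interannual variability assumed already removed).
    Returns unit-variance PCs (n_days, n_modes) and the EOFs.
    """
    # Scale each field by its overall standard deviation so that the
    # winds and OLR contribute comparably to the combined covariance.
    normed = [np.asarray(f, float) / np.std(f) for f in fields]
    combined = np.concatenate(normed, axis=1)      # (n_days, total_points)
    combined = combined - combined.mean(axis=0)    # remove the time mean
    # The right singular vectors of the anomaly matrix are the EOFs.
    _, _, vt = np.linalg.svd(combined, full_matrices=False)
    eofs = vt[:n_modes]                            # (n_modes, total_points)
    pcs = combined @ eofs.T                        # project the daily data
    pcs = pcs / pcs.std(axis=0)                    # normalize each PC
    return pcs, eofs


# Synthetic stand-ins for u850, u200, and OLR anomalies (random, illustrative only).
rng = np.random.default_rng(0)
u850, u200, olr = (rng.normal(size=(365, 144)) for _ in range(3))
pcs, eofs = rmm_like_pcs([u850, u200, olr])
print(pcs.shape, eofs.shape)  # (365, 2) (2, 432)
```

The two columns of `pcs` play the role of RMM1 and RMM2: they have unit variance and are mutually uncorrelated by construction.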

### b. Initialization reconstruction through SSA

Initialization plays a significant role in effective short- and medium-range forecasting. However, using the raw, noisy RMM indices for initialization impedes skillful prediction, so various statistical procedures are often used to improve the initialization.

We rely on SSA (Vautard and Ghil 1989) to reconstruct the RMM indices, which then serve as the initialization for prediction. SSA is a data-adaptive, nonparametric method for spectral estimation that extends classic principal component analysis into the time-lagged domain. In the following, we apply SSA to the entire dataset (1980–2013), which contains both the training and prediction phases, with a lagged embedding window of 50 days, consistent with the intraseasonal time scale. The SSA reconstruction of the RMM indices can be understood as extracting the dominant part of the signal from the noisy time series. SSA reconstruction of the entire dataset was also used in RMM prediction by Kang and Kim (2010), to reconstruct the predicted values from autoregression models, and by Kondrashov et al. (2013), to identify the low-frequency mode. The leading two and four SSA reconstructed components [hereafter SSA(1–2) and SSA(1–4)], which account for 56% and 84% of the total energy of the RMM indices, respectively, are shown in Fig. 1. They are used as the reconstructed initialization for prediction.
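A minimal single-channel sketch of SSA reconstruction is given below, assuming one RMM component at a time. It uses the trajectory-matrix (SVD) variant of SSA rather than the Toeplitz lag-covariance estimator of Vautard and Ghil (1989), and a multichannel version would stack lagged copies of RMM1 and RMM2 in one trajectory matrix; the function name `ssa_reconstruct` is hypothetical. Keeping components `(0, 1)` corresponds to SSA(1–2), and `(0, 1, 2, 3)` to SSA(1–4).

```python
import numpy as np


def ssa_reconstruct(x, window, components):
    """Reconstruct a 1D series from selected SSA components.

    x          : 1D array (e.g., one RMM component)
    window     : embedding window M in samples (50 days in the text)
    components : 0-based indices of the modes to keep, e.g. (0, 1)
    Returns the reconstructed series and the fraction of energy retained.
    """
    x = np.asarray(x, float)
    n = x.size
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j is the lagged window x[j:j+M].
    traj = np.column_stack([x[j:j + window] for j in range(k)])  # (M, k)
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    idx = list(components)
    # Sum of the rank-one elementary matrices of the selected modes.
    approx = (u[:, idx] * s[idx]) @ vt[idx]                      # (M, k)
    # Diagonal averaging: entry (i, j) contributes to time index i + j.
    recon = np.zeros(n)
    counts = np.zeros(n)
    for i in range(window):
        recon[i:i + k] += approx[i]
        counts[i:i + k] += 1
    energy = np.sum(s[idx] ** 2) / np.sum(s ** 2)
    return recon / counts, energy


# Illustrative use on a synthetic 50-day oscillation plus noise.
rng = np.random.default_rng(0)
t = np.arange(2000)
noisy = np.sin(2 * np.pi * t / 50) + 0.5 * rng.normal(size=t.size)
smooth, energy = ssa_reconstruct(noisy, window=50, components=(0, 1))
```

Because the reconstruction at each time uses windows extending up to 49 days later, it draws on "future" values, which is exactly why the text notes that this initialization is not available in real time.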

Since the SSA(1–4) reconstruction removes only the small-scale random fluctuations in the RMM indices, prediction with SSA(1–4) initialization is expected to be skillful in the short range but to share the medium-range shortcomings of initialization based on the raw RMM indices. In contrast, the SSA(1–2) reconstruction extracts the large-scale principal components and therefore departs noticeably from the RMM indices. Consequently, SSA(1–4) initialization is suited to short-range prediction, while SSA(1–2) initialization, although it cannot improve skill at very short lead times because of this intrinsic discrepancy, can make medium-range prediction more skillful.

We point out that directly using the SSA reconstruction for initialization is not practical in real-time prediction, since it uses “future” information of the indices. Yet, because the aim of this predictability study is to present, as a proof of concept, the optimal prediction skill of the nonlinear-physics-constrained low-order stochastic model calibrated with the information-theoretic framework, SSA reconstruction based on the entire dataset is adopted here, as in previous work (Kang and Kim 2010; Kondrashov et al. 2013).

## 3. The nonlinear-physics-constrained low-order stochastic model

*υ* and

*a* (also of dimension

*γ* is the coefficient of the nonlinear interaction. All the model variables are real.

*υ* multiplying

*υ*, as well as the trivial cancelation of skew-symmetric terms involving

The low-order stochastic nonlinear models in (2) are fundamentally different from those utilized earlier (Kravtsov et al. 2005; Kondrashov et al. 2013), which allow for nonlinear interactions only between the observed variables *υ* and stochastic phase *a*. It is evident that a negative value of the stochastic damping

A more sophisticated version of (2) with additional time-periodic damping has been shown to have significant skill for determining the predictability limits of the large-scale cloud patterns of boreal winter MJO (Chen et al. 2014b). Note that these models are a special case of the models described by Majda and Harlim (2013) and Harlim et al. (2014).

### Prediction algorithm and data assimilation for the hidden variables

The full time series is divided into training and prediction periods. The training period is used to calibrate the model parameters, and the prediction skill is assessed over a later period.

The ensemble prediction algorithm is adopted by running the forecasting model (2) forward given the initial values. The initial data of the two state variables

The estimates of the hidden parameters *N* ensemble members with *N* = 50. This is a practical online data assimilation algorithm for the stochastic models in (2).
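The forecast step, running the model forward from an ensemble of initial states with *N* = 50 members, can be sketched as follows. This is a minimal sketch, assuming Euler–Maruyama time stepping with additive, diagonal noise on every variable; in (2) the multiplicative noise arises through nonlinear coupling to the stochastic hidden variables, which a user-supplied drift can encode. The drift `damped_rotation` is a hypothetical 2D placeholder, not model (2), and the data assimilation of the hidden variables is not shown.

```python
import numpy as np


def ensemble_forecast(drift, sigma, x0, lead_days, dt=0.01, seed=0):
    """Propagate an ensemble of initial states with Euler–Maruyama.

    drift : callable mapping states (N, dim) -> tendencies (N, dim)
    sigma : noise amplitudes, shape (dim,), one per variable
    x0    : initial ensemble, shape (N, dim)
    Returns the ensemble mean, the ensemble spread, and all members.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    for _ in range(int(round(lead_days / dt))):
        dw = rng.normal(0.0, np.sqrt(dt), size=x.shape)  # Brownian increments
        x = x + drift(x) * dt + sigma * dw
    return x.mean(axis=0), x.std(axis=0), x


def damped_rotation(x, d=0.1, omega=2 * np.pi / 50):
    """Hypothetical placeholder drift (NOT model (2)): damped 50-day rotation."""
    u1, u2 = x[:, 0], x[:, 1]
    return np.column_stack([-d * u1 - omega * u2, -d * u2 + omega * u1])


N = 50  # ensemble size used in the text
x0 = np.tile([1.0, 0.0], (N, 1)) + 0.1 * np.random.default_rng(1).normal(size=(N, 2))
mean, spread, members = ensemble_forecast(damped_rotation, [0.3, 0.3], x0, lead_days=25)
```

The ensemble mean serves as the point forecast at the chosen lead time, while the ensemble spread quantifies its uncertainty.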

## 4. Calibration of model parameters through information theory

The forecasting scores commonly adopted for the MJO in the literature (Lin et al. 2008; Gottschalck et al. 2010; Rashid et al. 2011) are the root-mean-square (RMS) error and the bivariate correlation, both of which are path-wise measures. However, as the following motivating example illustrates, these path-wise measures are insufficient for assessing the prediction skill of the RMM indices: they cannot measure the lack of information in the model forecast relative to the truth, so forecasts evaluated with these metrics alone can fail to capture the peaks that correspond to strong MJO events in the true signal.

To improve the prediction skill, an information-theoretic framework (Roulston and Smith 2002; Majda and Gershgorin 2010; Majda and Branicki 2012; Weisheimer et al. 2014; Branicki and Majda 2014), which incorporates both surrogates of the path-wise errors and a measure of the lack of information, is used to calibrate the model parameters in (2), requiring only a short training phase of 3 yr. These information measures are also used to assess the forecasting skill during the prediction phase.

### a. A motivating example

*n* is the number of points in the time series. Since both components of the RMM index have zero mean, the bivariate correlation is essentially the same as the anomaly pattern correlation.
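A minimal sketch of the two path-wise scores follows, assuming that (5) and (6) take the standard bivariate forms used by Lin et al. (2008) and Rashid et al. (2011); the function name `bivariate_scores` is hypothetical.

```python
import numpy as np


def bivariate_scores(obs, pred):
    """Bivariate RMS error and correlation for a two-component index.

    obs, pred : arrays of shape (n, 2) holding (RMM1, RMM2) at the same
                n verification times (pred is, e.g., an S-day-lead forecast).
    """
    obs = np.asarray(obs, float)
    pred = np.asarray(pred, float)
    n = obs.shape[0]
    # RMS of the vector error, summed over both components.
    rmse = np.sqrt(np.sum((obs - pred) ** 2) / n)
    # Uncentered correlation summed over both components; it matches the
    # anomaly pattern correlation when the index has zero mean.
    cor = np.sum(obs * pred) / (np.sqrt(np.sum(obs ** 2)) * np.sqrt(np.sum(pred ** 2)))
    return rmse, cor


obs = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
rmse, cor = bivariate_scores(obs, 0.5 * obs)  # rmse = 0.5, cor = 1.0
```

Halving the forecast amplitude leaves the correlation at exactly 1. This is the blind spot that the motivating example in Fig. 2 exposes.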

To illustrate why the two path-wise measures (5) and (6) are insufficient for assessing RMM prediction skill, Fig. 2 shows ensemble forecasts of the RMM indices at a lead time of 25 days from the low-order stochastic model in (2), using the same SSA(1–2) initialization but two different sets of model parameters. The prediction interval is from August 2005 to December 2008.

Comparing the RMM1 index (thin black) with its ensemble mean predictions (thick blue) in Figs. 2a,c, the severe underestimation of the forecast amplitudes in prediction 2 makes it much less skillful than prediction 1, even though the two have nearly the same anomaly pattern correlation and RMS error. The time-averaged PDF of prediction 1 (Fig. 2b) almost perfectly overlaps that of the truth, whereas the PDF of prediction 2 (Fig. 2d) is highly concentrated around the origin, indicating a large lack of information in the forecast statistics. The large-amplitude phases, which correspond to strong MJO events, are of greatest practical concern, and failing to capture these peaks, as prediction 2 does, renders the forecast almost useless. Yet the comparable path-wise scores fail to distinguish the two predictions, because neither traditional path-wise measure assesses the lack of information in the forecast.

### b. Information-theoretic framework

An information-theoretic framework is used as a systematic procedure to calibrate the eight parameters

Consider the following three information-theoretic measures (Branicki and Majda 2014) within a Gaussian framework:

- the *Shannon entropy* of the residual,
- the *relative entropy* of the PDF associated with the prediction compared with the truth *π*,
- the *mutual information* between the true signal and the predicted one,

*q* is the dimension of the observed variables and

Each one of the three measures provides different information about the forecasting skill. The Shannon entropy of the residual

Although the comparable path-wise skill scores of the two predictions in Fig. 2 correspond to a small difference in both the Shannon entropy and the mutual information, the disparity in amplitude of the two predictions leads to a significant difference in the relative entropy
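The three measures can be sketched as follows, assuming the standard Gaussian expressions: the entropy of a Gaussian residual, the Gaussian relative entropy split into signal and dispersion parts, and the Gaussian mutual information computed from the joint covariance of truth and prediction. The function names are hypothetical, and the sample-based wrapper estimates the time-averaged PDFs by their means and covariances.

```python
import numpy as np


def shannon_entropy_residual(R_res, drop_constant=False):
    """Gaussian entropy of the residual with covariance R_res (q x q).
    drop_constant=True removes the q/2 (1 + ln 2 pi) term."""
    q = R_res.shape[0]
    _, logdet = np.linalg.slogdet(R_res)
    const = 0.0 if drop_constant else 0.5 * q * (1.0 + np.log(2.0 * np.pi))
    return const + 0.5 * logdet


def relative_entropy(mu, R, mu_m, R_m):
    """Gaussian relative entropy P(pi, pi^M): lack of information in the
    model PDF N(mu_m, R_m) relative to the truth N(mu, R)."""
    q = R.shape[0]
    R_m_inv = np.linalg.inv(R_m)
    dmu = np.asarray(mu, float) - np.asarray(mu_m, float)
    signal = 0.5 * dmu @ R_m_inv @ dmu                 # mean (signal) error
    ratio = R @ R_m_inv
    _, logdet = np.linalg.slogdet(ratio)
    dispersion = 0.5 * (np.trace(ratio) - q - logdet)  # variance (dispersion) error
    return signal + dispersion


def mutual_information(R_joint, q):
    """Gaussian mutual information between truth (first q variables) and
    prediction (last q variables) with joint covariance R_joint."""
    _, ld_joint = np.linalg.slogdet(R_joint)
    _, ld_true = np.linalg.slogdet(R_joint[:q, :q])
    _, ld_pred = np.linalg.slogdet(R_joint[q:, q:])
    return 0.5 * (ld_true + ld_pred - ld_joint)


def information_measures(truth, pred):
    """Sample estimates from time series of shape (n, q)."""
    truth = np.asarray(truth, float)
    pred = np.asarray(pred, float)
    q = truth.shape[1]
    cov = lambda a: np.atleast_2d(np.cov(a, rowvar=False))
    S = shannon_entropy_residual(cov(truth - pred), drop_constant=True)
    P = relative_entropy(truth.mean(axis=0), cov(truth), pred.mean(axis=0), cov(pred))
    M = mutual_information(cov(np.hstack([truth, pred])), q)
    return S, P, M


# A forecast with 0.3 times the true amplitude (zero means, unit truth variance):
p_damped = relative_entropy(np.zeros(1), np.eye(1), np.zeros(1), 0.09 * np.eye(1))  # about 3.85
```

For a one-dimensional Gaussian pair with correlation ρ, the mutual information reduces to −½ ln(1 − ρ²), which is why it acts as a generalization of the anomaly pattern correlation. An amplitude-damped forecast, by contrast, shows up mainly in the dispersion part of the relative entropy.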

### c. Calibration of the model with information theory

*information criterion*, we follow the idea proposed by Branicki and Majda (2014). We assess the prediction skill in the training phase through the functional in (10), in which the constant prefactor in the Shannon entropy (7) is removed. This guarantees that the three information measures in the information criterion (10) carry weights of the same order when the model parameters are calibrated for the RMM indices.

The information criterion

What remains is to select an appropriate expression of

- approach 1, which minimizes the information criterion for a specific *S*-day lead prediction skill, where the prediction in (10) is the *S*-day lead prediction of the RMM indices; and
- approach 2, which minimizes the information criterion averaged over several lead times, where the prediction in each term is the corresponding lead-day prediction of the RMM indices.

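The MJO prediction at lead times of 15 and 25 days and the overall medium-range forecasting are of particular concern. Based on the properties of the SSA-reconstructed initialization described in section 2, we adopt the following strategies in the calibration stage:

- strategy 1a, where approach 1 is applied at a lead time of *S* = 15 days with SSA(1–4) initialization;
- strategy 1b, where approach 1 is applied at a lead time of *S* = 25 days with SSA(1–2) initialization; and
- strategy 2, where approach 2 is applied to lead times of 25, 35, and 45 days with SSA(1–2) initialization.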
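The difference between the two calibration approaches can be sketched as follows: approach 1 minimizes the criterion at one lead time, and approach 2 minimizes its average over several lead times. This is a minimal sketch under loud assumptions. `toy_criterion` is a hypothetical quadratic stand-in for (10) whose minimizer drifts with lead time, the brute-force grid search replaces whatever optimizer the authors used, and the SSA initialization that also distinguishes the strategies is not modeled.

```python
import itertools

import numpy as np


def averaged_criterion(criterion, theta, lead_days):
    """Approach 2: mean of the criterion over several lead times.
    With a single lead time this reduces to approach 1."""
    return float(np.mean([criterion(theta, s) for s in lead_days]))


def calibrate_on_grid(criterion, grids, lead_days):
    """Brute-force search for the parameter vector minimizing the criterion.

    grids : one 1D array of candidate values per parameter.
    """
    best_theta, best_value = None, np.inf
    for candidate in itertools.product(*grids):
        theta = np.array(candidate)
        value = averaged_criterion(criterion, theta, lead_days)
        if value < best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value


def toy_criterion(theta, lead):
    """Hypothetical stand-in for (10): its minimizer drifts with lead time."""
    return (theta[0] - 0.01 * lead) ** 2 + (theta[1] - 1.0) ** 2


grids = [np.linspace(0.0, 1.0, 21), np.linspace(0.0, 2.0, 21)]
theta_1a, _ = calibrate_on_grid(toy_criterion, grids, [15])          # lead time as in strategy 1a
theta_1b, _ = calibrate_on_grid(toy_criterion, grids, [25])          # lead time as in strategy 1b
theta_2, _ = calibrate_on_grid(toy_criterion, grids, [25, 35, 45])   # lead times as in strategy 2
```

With eight parameters, an exhaustive grid quickly becomes expensive, so a derivative-free optimizer would be a more practical choice. The grid is used here only to keep the sketch dependency-free.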

Trained in the period *υ* at the phases corresponding to large bursts in the observed variables (e.g., May 2002, January 2003, and April 2004) are strongly positive with small uncertainty. Starting from such phases, the stochastic antidamping *υ* than strategy 1b in (12) because of the fact that the feedback from the observations utilizing strategy 2 is weaker and thus the estimation contains more uncertainty.

Optimized parameters in the nonlinear low-order stochastic model calibrated by the information criterion (10) during the training period

Before systematically studying the prediction skill of the low-order stochastic model, we provide evidence that the parameters calibrated in the 3-yr training phase are also nearly optimal in the prediction phase. To this end, Fig. 4 shows the prediction skill for the RMM indices in an independent interval from August 2005 to December 2008 at a lead time of 25 days as a function of parameter variations around the optimal values, where the optimized parameters are given by strategy 1b in (12) (i.e., the calibration of the forecasting skill at a lead time of

## 5. Prediction results

### a. Prediction skill of the RMM indices utilizing the nonlinear-physics-constrained low-order stochastic model with different calibration strategies

In the first experiment, the nonlinear-physics-constrained low-order stochastic model is trained in the time interval

In the following, we report the prediction skill of the nonlinear-physics-constrained low-order stochastic model with the optimized parameters from Table 1 and the ensemble initialization scheme for the hidden variables described in section 3. The path-wise skill scores and the information measures for prediction as a function of lead days, *S*, utilizing different calibration strategies, are shown in Fig. 5. Recall that calibration strategy 1a aims at optimizing the parameters for prediction at a lead time of 15 days with SSA(1–4) initialization of the observed variables while strategies 1b and 2 are designed for calibrating the model parameters at a lead time of 25 days and at lead times of 25, 35, and 45 days, respectively, with SSA(1–2) initialization.

Since SSA(1–4) removes the noiselike fluctuations on the smallest scales but retains most of the information in the RMM indices, skillful short-term prediction with SSA(1–4) initialization extends to 20 days, compared with a predictability limit of 15 days when the raw RMM indices are used for initialization (not shown), both in the path-wise sense (Figs. 5a,b) and in terms of the lack of information in prediction (Fig. 5d). The skillful predictions in the time domain at lead times of *S* = 5, 10, and 15 days are shown in Fig. 6, where the small difference in time-averaged PDFs between the truth and the prediction confirms the insignificant lack of information in prediction. Yet medium-range forecasting with SSA(1–4) initialization is unskillful because of the small-scale fluctuations remaining in the initial values. In addition, as seen in Fig. 5d, since the calibration is designed to optimize the 15-day-lead prediction, the lack of information in the forecast RMM indices rises sharply at lead times beyond 30 days. We also tested a calibration strategy that minimizes the averaged information criterion at lead times of 25, 35, and 45 days with SSA(1–4) initialization (not shown) and found that skillful prediction again lasts only up to 20 days.

To extend useful prediction into the medium range, SSA(1–2) initialization is adopted. Although SSA(1–2) initialization introduces an intrinsic barrier for very short range forecasts, Fig. 5d shows that skillful prediction lasts up to 30 days for both strategies 1b and 2. The difference between the two strategies is that strategy 2 yields a smaller lack of information in the forecast RMM indices over the 30–60-day range. This is because strategy 2 targets medium-range forecasting over an interval covering the 25-, 35-, and 45-day-lead predictions, so the relative entropy remains low in the medium range (Fig. 5d). Strategy 1b, on the other hand, optimizes the prediction skill only at a lead time of *S* = 25 days and therefore has little effect on the prediction when *S* is far from 25. It is worth noting that although both the information criterion and the relative entropy *P* of strategy 1b are larger than those of strategy 2 in medium-range forecasting (*S* > 30), as expected, the RMS error (equivalently, the Shannon entropy of the residual) of strategy 1b is smaller. This again shows that assessing prediction skill with path-wise measures alone can be misleading.

Figure 7 shows the prediction in the time domain at lead times of 25, 35, and 45 days using strategy 2 and SSA(1–2) initialization of the observed variables. The prediction at a 25-day lead time is quite accurate in most phases in terms of both path-wise error and anomaly pattern correlation; in particular, the peaks and strong MJO events are well predicted. The prediction is unskillful only in strongly irregular phases (e.g., April–July 2006, November 2006, and November–December 2008), which can be attributed both to the deficiency of the SSA initialization in these phases and to the error of model (2) relative to the true physics. The predictions at lead times of 35 and 45 days nevertheless capture the main trend of the truth, and the bivariate correlation for strong MJO events remains close to 0.5 even at a 45-day lead, indicating skillful prediction. In addition, the insignificant differences in the time-averaged PDFs for the 25-, 35-, and 45-day-lead predictions imply that the lack of information in the forecast RMM indices relative to the truth is small.

Figure 8 shows the skill scores of the nonlinear-physics-constrained low-order stochastic model as a function of lead time, with the model calibrated by information-theoretic strategy 2 and SSA(1–2) initialization of the observed variables. The overall prediction is skillful up to 30 days, and useful prediction of strong MJO events extends to about 40 days; both are much improved compared with those using empirical model reduction (EMR) and PNF, shown in Fig. 2 of Kondrashov et al. (2013). The only unskillful score with the information-theoretic calibration is the bivariate correlation for weak MJO events, which is of less practical concern. Furthermore, the RMM indices forecast at a lead time of 25 days by both the EMR and PNF methods, shown in Fig. 1 of Kondrashov et al. (2013), look quite similar to prediction 2 in Fig. 2 of the motivating example, reflecting the severe underestimation of forecast amplitudes by these methods. By contrast, with information theory incorporated in the calibration stage, the forecast RMM indices here (Fig. 7a) capture all the peaks and extreme events and minimize the information deficiency in the prediction.

Finally, Fig. 9 shows long-range forecasts from the nonlinear-physics-constrained low-order stochastic model in (2) with SSA(1–2) initialization, calibrated by strategy 2. Each panel shows a prediction starting from the first day of a different month in 2007 and lasting 6 months. The ensemble mean predictions are skillful up to about 1–2 months, with the ensemble spread evolving with the same trend as the truth. Beyond that, the ensemble mean has no long-range skill, but the ensemble spread automatically signals this lack of skill, and the envelope of the ensemble predictions captures the true signal. This is a significant and attractive feature of the methods developed here.
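One simple way to quantify whether the ensemble envelope captures the truth is the fraction of verification times at which the observed value lies within the ensemble mean plus or minus *k* ensemble standard deviations. This coverage score is our illustrative choice, not a metric defined in the text; the function name `envelope_coverage` and the default `k = 2` are hypothetical.

```python
import numpy as np


def envelope_coverage(members, truth, k=2.0):
    """Fraction of verification times at which the truth lies inside
    the ensemble mean +/- k ensemble standard deviations.

    members : array (N, T) of ensemble forecasts for one RMM component
    truth   : array (T,) of observed values at the same times
    """
    members = np.asarray(members, float)
    mean = members.mean(axis=0)
    spread = members.std(axis=0)
    inside = np.abs(np.asarray(truth, float) - mean) <= k * spread
    return float(inside.mean())


members = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])   # mean 1, spread 1 at every time
coverage = envelope_coverage(members, [1.0, 3.0, 4.0])   # 2 of 3 times inside
```

A well-calibrated ensemble should keep this fraction high even at lead times where the ensemble mean itself has lost skill.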

### b. Prediction skill of the RMM indices utilizing the nonlinear-physics-constrained low-order stochastic model in different years

We now explore the prediction skill of the RMM indices in different years utilizing the nonlinear-physics-constrained low-order stochastic model with case studies. To check the prediction skill for a longer period, we modify the training phase to an early period

The RMS error and bivariate correlation for prediction in different years are shown in Figs. 10b,c. Note that the yearly averaged skill scores of the *i*th year are put in the middle of the *i*th and the

TOGA COARE was conducted from November 1992 to February 1993, where two pronounced MJO events associated with super–cloud clusters and westerly wind bursts were observed and are well reflected in the RMM indices. Actually, besides the TOGA COARE period, the RMM indices illustrate the strong MJO events with regular intraseasonal variability frequencies around 50 days throughout the whole year 1992 (Fig. 11c) and therefore a skillful overall prediction (

CINDY/DYNAMO collected unprecedented observations during October 2011–March 2012. Coincidentally, a La Niña event occurred at nearly the same time, with a sea surface temperature (SST) anomaly of about −1°C (Zhang et al. 2013). Three MJO events were observed over the tropical Indian Ocean, in late October, late November, and late December 2011. However, the first two barely reached the Pacific Ocean, partly because of the La Niña conditions, while the late-December event is not recognized by the RMM indices as an independent MJO event. All three events have high-frequency variability with a period of only 30 days, which makes them difficult to predict. In addition, a similarly fast oscillation of MJO phases is observed in the RMM indices during May and June 2011 (Fig. 11h). All these factors lead to the unskillful prediction (

Looking at the skill scores in Fig. 10, the most poorly predicted years are 1987, 1998, and 2011, when considering both the overall and strong MJO forecasting skill. The unskillful prediction in 2011 due to a large amount of high-frequency variability was explained above. The other two poorly predicted years (1987 and 1998) are accompanied by strong ENSO phases. The oscillation frequencies of the RMM indices in late 1986 and the whole year of 1987 are irregular, containing quite a few fast oscillations, and therefore the correlations for both the overall events and the strong events are below the skillful level [*Climate Dyn.*). This contrasts with the relatively strong MJO activity with nearly 50-day periodic bursts and breaks and their skillful predictions during the pre-ENSO transition phases, years 1985 [

### c. Prediction skill of the RMM indices utilizing the nonlinear-physics-constrained low-order stochastic model in the phase space

To explore how the predicted MJO progresses through different phases, we study phase diagrams of the RMM indices predicted by the nonlinear-physics-constrained low-order stochastic model. Strategy 2 is utilized in the calibration period

Four phase diagrams of RMM1 and RMM2 prediction up to 45 days are included in Fig. 12. Figure 12a shows a weak MJO period while Fig. 12c shows a moderate period and Figs. 12b,d show two strong MJO periods. The ensemble mean prediction in Figs. 12a–c succeeds in capturing the trend of the truth, which is covered by the ensemble spread. Because of the large error in the initialization, the short-range prediction in Fig. 12d is not skillful, but after a few days’ adjustment the forecasting in the medium range describes the truth accurately.

Next, we study the forecasting skill of the nonlinear low-order stochastic model in different phases. Table 2 demonstrates the skill scores with respect to the RMS error and bivariate correlation in the prediction period at a lead time of 25 days starting from and ending at different phases. In the same table, the percentages of the strong MJO events within each phase are included. It is evident that the prediction utilizing the nonlinear low-order stochastic model is skillful in all the phases. Particularly, starting from phases 2 and 3, the bivariate correlations at a lead time of 25 days are almost 0.6 and the RMS errors (1.1767 and 1.2145) are far below the climatological forecast (

Skill scores of 25-day-lead predictions from 1985 to 2013 utilizing the nonlinear low-order stochastic model starting and ending in different phases. The model parameters used in predicting the RMM index are calibrated via strategy 2 during the training period
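Sorting forecasts by phase, as in Table 2, requires mapping each (RMM1, RMM2) pair to an MJO phase and amplitude. A minimal sketch follows, assuming the standard Wheeler and Hendon (2004) convention as we understand it: eight 45° sectors counted counterclockwise from the negative RMM1 axis, so phases 2–3 lie in the lower half-plane and 6–7 in the upper half-plane. The function name is hypothetical. Any amplitude cutoff used to label events as strong would be an additional assumption, since the threshold is not given here.

```python
import numpy as np


def mjo_phase_amplitude(rmm1, rmm2):
    """Map RMM1/RMM2 values to MJO phase (1-8) and amplitude."""
    rmm1 = np.asarray(rmm1, float)
    rmm2 = np.asarray(rmm2, float)
    amplitude = np.hypot(rmm1, rmm2)
    angle = np.arctan2(rmm2, rmm1)            # in (-pi, pi]
    # Phase 1 starts at angle -pi; each phase spans pi/4.
    phase = np.floor((angle + np.pi) / (np.pi / 4)).astype(int) + 1
    phase = np.minimum(phase, 8)              # angle == pi belongs to phase 8
    return phase, amplitude


phase, amp = mjo_phase_amplitude([-1.0, 0.1, 1.0, -1.0], [-0.1, -1.0, 0.1, 0.1])
# phase -> [1, 3, 5, 8]
```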

### d. Prediction skill utilizing the linear models

To understand the role of nonlinearity in the nonlinear-physics-constrained low-order stochastic model in predicting the RMM indices, the prediction skill utilizing the following two linear models is studied.

To compare the forecasting skill of the nonlinear stochastic model with the two linear models in (14) and (15), we calibrate the two linear models during the same training period

Optimized parameters in the 2D and 4D linear models in (14) and (15) calibrated by the information criterion (10) via strategy 2 during the training period

Figure 13 compares the information measures and the information criterion in prediction, as a function of lead time in days, for the nonlinear-physics-constrained stochastic model and the two linear stochastic models with optimized parameters. The validation interval is again August 2005 to December 2008, and SSA(1–2) initialization is adopted in the prediction stage. The different models yield indistinguishable mutual information and hence almost identical anomaly pattern correlation. The Shannon entropy of the residual for the two linear models is even slightly smaller than that for the nonlinear-physics-constrained stochastic model, indicating a smaller RMS error for the linear models. However, the relative entropy reveals conspicuous information barriers for both linear models, and the barrier is more significant for the 2D linear model than for the 4D linear model because of its simpler form. These information barriers (Majda and Gershgorin 2010; Majda and Branicki 2012) imply that the two linear models fail to capture the extreme events, as shown in Figs. 14a–c, which illustrate the predictions of the three models in the time domain. Many extreme events, such as those around 10 September 2005, 15 October 2006, 15 July 2007, and 15 October 2007, are captured well by the nonlinear-physics-constrained stochastic model but missed by the two linear models; consequently, the time-averaged PDFs of the linear-model predictions (Figs. 14e,f) have significantly smaller variance than that of the RMM1 index.

The information barriers to prediction by the linear models, even with additive correlated stochastic forcing, indicate the necessity of including multiplicative noise and the nonlinear interaction between the observed and hidden variables in the low-order stochastic model. The stochastic damping, the mechanism that triggers local exponential growth of the observed variables in prediction, plays a significant role in capturing extreme events; see Chen et al. (2014b) for another example. The results from the linear models therefore emphasize once again that traditional path-wise measures alone are insufficient for assessing prediction skill and that assessing the lack of information is essential for useful prediction. We also find that the ensemble spread of the linear models in (14) and (15) is less skillful than that of the nonlinear-physics-constrained low-order stochastic model in representing the lack of information in long-range forecasting, but we omit a detailed discussion here.

## 6. Conclusions

In this paper, we predict the RMM indices utilizing suitable low-order stochastic models. The systematic physics-constrained nonlinear regression strategies for time series developed recently (Majda and Harlim 2013; Harlim et al. 2014) as well as the stochastic skeleton model (Thual et al. 2014) suggest a four-dimensional model shown in (2) with two observed variables

The failure of the standard path-wise measures (i.e., anomaly pattern correlation and RMS error) to detect the disparity in peaks between the observed and forecast RMM indices in Fig. 2 shows that these measures are insufficient. Therefore, an information-theoretic framework (Branicki and Majda 2014) is applied to calibrate the model parameters in a short training phase of 3 yr. This framework involves generalizations of the anomaly pattern correlation and the RMS error, together with the information deficiency in the model forecast. The nonlinear stochastic models in (2) show skillful prediction for 30 days on average in these metrics. More importantly, the predictions capture the amplitudes of the RMM index, and the useful skill for forecasting strong MJO events is around 40 days. In addition, the prediction at a lead time of 25 days is skillful in all eight phases. Regarding prediction in different years, the forecast is quite skillful in the pre-ENSO and post-ENSO transition years, whereas the transition to ENSO is a barrier to skillful medium-range forecasting with the low-order stochastic model. The very long-range forecasts by the nonlinear stochastic model also capture the truth within the ensemble spread, another attractive feature (see Fig. 9). Furthermore, the information barriers to prediction by the linear models imply the necessity of the nonlinear interactions between the observed and hidden variables, as well as the multiplicative noise, in these low-order stochastic models.

Many general circulation models (GCMs) and multimodel ensemble systems (Maloney and Kiehl 2002; Liess and Bengtsson 2004; Zhang et al. 2006; Vitart et al. 2007; Vitart and Molteni 2010; Fu et al. 2013; Ling et al. 2014) suffer from underestimating the MJO amplitude in prediction. In these operational models, additive noise is the main form of stochastic forcing, and path-wise skill scores are the typical measures of predictability. The improvement of the nonlinear stochastic models with multiplicative noise in (2) over the linear models with only additive noise in (14) and (15) in predicting the RMM indices (see Fig. 14) suggests that including multiplicative noise in GCMs could enhance the predicted amplitudes, especially for capturing local extreme events. Indeed, stochastic parameterizations based on multiplicative noise improve the ability of coarse-resolution models to represent the MJO (Thual et al. 2014; Deng et al. 2015). In addition, the inability of path-wise measures to capture the disparity in amplitudes when predicting the RMM indices (see Fig. 2) implies that the information-theoretic framework should also be incorporated in GCMs for predicting the MJO and other modes of climate variability.

Two issues remain for future work. First, as mentioned in section 2, SSA initialization of the observed variables is impractical for real-time prediction because it uses “future” information. It is therefore important to explore effective and practical data-smoothing approaches that use only the information in the “historical” time series. Second, as noted in section 4, the three information measures and the information criterion in this work are restricted to the Gaussian framework. Although the climatological PDFs of the RMM indices are nearly Gaussian, those of the predictions are not necessarily Gaussian (e.g., Fig. 7e). It would be interesting to investigate whether applying information theory in a non-Gaussian framework improves predictability.
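SSA itself is not reproduced here. As a simpler stand-in for the “future” information issue, the sketch below contrasts a centered moving average, which, like an SSA reconstruction in the interior of a record, mixes past and future values, with a trailing (causal) average that uses only the historical series. The series, the time index, and the window widths are hypothetical.

```python
def centered_mean(x, t, half):
    """Centered moving average at time t; uses x[t-half .. t+half], i.e. future data."""
    window = x[max(0, t - half): t + half + 1]
    return sum(window) / len(window)


def trailing_mean(x, t, width):
    """Causal moving average at time t; uses only x[t-width+1 .. t]."""
    window = x[max(0, t - width + 1): t + 1]
    return sum(window) / len(window)


x = [float(k % 7) for k in range(100)]  # hypothetical daily series
t = 50
# Alter only the values after time t, i.e. the "future"
x_future_changed = x[:t + 1] + [100.0] * (len(x) - t - 1)

print("centered:", centered_mean(x, t, 5), centered_mean(x_future_changed, t, 5))
print("trailing:", trailing_mean(x, t, 11), trailing_mean(x_future_changed, t, 11))
```

The centered estimate at time `t` changes when only later observations change, so it could not have been computed in real time; the trailing estimate is unaffected. A causal filter of this kind trades a phase lag for real-time usability.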

Finally, we point out that information theory can be applied to many other problems. Imperfect predictions from multimodel ensemble forecasts are improved with the information-theoretic framework (Branicki and Majda 2015). Applying information theory to noisy Lagrangian tracers reveals the practical information barrier as a function of the number of tracers (Chen et al. 2014c, 2015). Information theory also has many desirable properties for characterizing model error (Cai et al. 2002; Majda and Gershgorin 2010; Majda and Branicki 2012; Branicki et al. 2013) and predictive skill (Kleeman 2002; Branicki and Majda 2012; Giannakis and Majda 2012; Giannakis et al. 2012; Chen et al. 2014a).

The research of AJM is partially supported by Office of Naval Research Grant ONR MURI N00014-12-1-0912, and NC is supported as a graduate research assistant on this grant.

# APPENDIX A

## General Definitions of the Three Information Measures

This appendix presents the general definitions and properties of the three information measures introduced in section 4.

### a. The Shannon entropy

### b. The relative entropy

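The relative entropy of the prediction density $\pi^M$ with respect to the truth density $\pi$ is given by

$$
\mathcal{P}(\pi, \pi^M) = \int \pi \ln \frac{\pi}{\pi^M},
$$

which quantifies the lack of information in the statistics of the prediction.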

### c. The mutual information

The explicit forms of these information measures within a Gaussian framework are given by (7)–(9), respectively.
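For reference, the standard closed-form Gaussian expressions can be sketched in NumPy. These are textbook formulas and may differ in notation or normalization from (7)–(9), which are not reproduced in this appendix; treating `pi = N(mu, R)` as the truth and `pi_M = N(mu_m, R_m)` as the prediction is an assumption of the sketch. The relative entropy splits into a signal part from the mean difference and a dispersion part from the covariances, so it stays positive when only the forecast amplitude is wrong.

```python
import numpy as np


def shannon_entropy(R):
    """Shannon entropy (nats) of N(mu, R): n/2 * (1 + ln 2*pi) + 1/2 * ln det R."""
    R = np.atleast_2d(R)
    n = R.shape[0]
    _, logdet = np.linalg.slogdet(R)
    return 0.5 * n * (1.0 + np.log(2.0 * np.pi)) + 0.5 * logdet


def relative_entropy(mu, R, mu_m, R_m):
    """Relative entropy of N(mu_m, R_m) with respect to N(mu, R): signal + dispersion."""
    mu, mu_m = np.atleast_1d(mu), np.atleast_1d(mu_m)
    R, R_m = np.atleast_2d(R), np.atleast_2d(R_m)
    n = R.shape[0]
    d = mu - mu_m
    signal = 0.5 * d @ np.linalg.solve(R_m, d)  # mean (signal) part
    A = np.linalg.solve(R_m, R)                 # R_m^{-1} R
    _, logdet_A = np.linalg.slogdet(A)
    dispersion = 0.5 * (np.trace(A) - n - logdet_A)  # covariance (dispersion) part
    return signal + dispersion


def mutual_information(R_joint, n_x):
    """Mutual information between the first n_x components and the rest of a Gaussian."""
    R_joint = np.atleast_2d(R_joint)
    _, ld_x = np.linalg.slogdet(R_joint[:n_x, :n_x])
    _, ld_y = np.linalg.slogdet(R_joint[n_x:, n_x:])
    _, ld_xy = np.linalg.slogdet(R_joint)
    return 0.5 * (ld_x + ld_y - ld_xy)


# Hypothetical example: correct mean, but forecast variance is a quarter of the truth's
print(relative_entropy(np.zeros(2), np.eye(2), np.zeros(2), 0.25 * np.eye(2)))
print(mutual_information(np.array([[1.0, 0.8], [0.8, 1.0]]), 1))
```

In the first example the forecast has half the true amplitude in both components, and the relative entropy is 0.5 (6 - ln 16), about 1.61 nats, all of it from the dispersion part. For two scalar Gaussians with correlation ρ, the mutual information reduces to -0.5 ln(1 - ρ²), which is how it generalizes a correlation measure.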

# APPENDIX B

## Mathematical Details of the Data Assimilation and Prediction Algorithm

This appendix presents the mathematical details of the data assimilation algorithm used to estimate the hidden variables *υ* and

Note that the formulas in (B2) are optimal if and only if the signal is generated by system (B1). Since the observed signal, namely the RMM indices, is not generated by the low-order nonlinear stochastic model (B1), the evolution of the conditional Gaussian distributions in (B2) is suboptimal.
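System (B1) and the formulas (B2) are not reproduced in this section. As a hedged illustration of the same idea, the sketch below implements the standard conditionally Gaussian filtering equations (see, e.g., Liptser and Shiryaev 2001) for a hypothetical scalar toy with one observed variable `u` and one hidden stochastic damping `v`. The toy system and all parameters are assumptions, not (B1). Because the toy signal is generated by the very equations the filter assumes, this is the setting in which the filter is optimal, unlike the RMM case discussed above.

```python
import math
import random

# Hypothetical conditionally Gaussian toy system (NOT system (B1)):
#   du = [A0(u) + A1(u) v] dt + sigma_u dW_u,  A0 = -d_u u + F,  A1 = -u   (observed)
#   dv = -d_v v dt + sigma_v dW_v                                         (hidden)
# Given the path of u, v is Gaussian N(mu, R) with
#   dmu = -d_v mu dt + (R A1 / sigma_u^2) (du - (A0 + A1 mu) dt)
#   dR  = (-2 d_v R + sigma_v^2 - R^2 A1^2 / sigma_u^2) dt
d_u, F, sigma_u = 1.0, 1.0, 0.2
d_v, sigma_v = 0.5, 0.5
dt, n_steps, burn_in = 0.005, 100_000, 2_000
rng = random.Random(1)
sqdt = math.sqrt(dt)

u, v = 1.0, 0.0                        # true state
mu, R = 0.0, sigma_v**2 / (2.0 * d_v)  # filter starts from the climatological prior of v
sq_err_filter = sq_err_clim = 0.0

for k in range(n_steps):
    A0, A1 = -d_u * u + F, -u
    # Advance the truth one Euler-Maruyama step; only du is "observed"
    du = (A0 + A1 * v) * dt + sigma_u * sqdt * rng.gauss(0.0, 1.0)
    v += -d_v * v * dt + sigma_v * sqdt * rng.gauss(0.0, 1.0)
    u += du
    # Conditional Gaussian update driven by the innovation in the observed increment
    innovation = du - (A0 + A1 * mu) * dt
    mu += -d_v * mu * dt + (R * A1 / sigma_u**2) * innovation
    R += (-2.0 * d_v * R + sigma_v**2 - (R * A1 / sigma_u) ** 2) * dt
    R = max(R, 1e-12)  # guard against Euler overshoot of the Riccati equation
    if k >= burn_in:
        sq_err_filter += (mu - v) ** 2
        sq_err_clim += v**2  # error of always guessing the climatological mean 0

n_eval = n_steps - burn_in
rmse_filter = math.sqrt(sq_err_filter / n_eval)
rmse_clim = math.sqrt(sq_err_clim / n_eval)
print("RMSE of filter mean:", rmse_filter, " RMSE of climatology:", rmse_clim)
```

The hidden damping is never observed directly; it is inferred because it multiplies `u` in the observed dynamics, so the posterior mean tracks `v` markedly better than the climatological guess whenever `u` is away from zero.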

## REFERENCES

Branicki, M., and A. J. Majda, 2012: Quantifying uncertainty for predictions with model error in non-Gaussian systems with intermittency. *Nonlinearity*, **25**, 2543, doi:10.1088/0951-7715/25/9/2543.

Branicki, M., and A. J. Majda, 2014: Quantifying Bayesian filter performance for turbulent dynamical systems through information theory. *Comm. Math. Sci.*, **12**, 901–978, doi:10.4310/CMS.2014.v12.n5.a6.

Branicki, M., and A. J. Majda, 2015: An information-theoretic framework for improving imperfect dynamical predictions via Multi-Model Ensemble forecasts. *J. Nonlinear Sci.*, doi:10.1007/s00332-015-9233-1, in press.

Branicki, M., N. Chen, and A. J. Majda, 2013: Non-Gaussian test models for prediction and state estimation with model errors. *Chin. Ann. Math.*, **34B**, 29–64, doi:10.1007/s11401-012-0759-3.

Cai, D., R. Kleeman, and A. J. Majda, 2002: A mathematical framework for quantifying predictability through relative entropy. *Methods Appl. Anal.*, **9**, 425–444, doi:10.4310/MAA.2002.v9.n3.a8.

Cavanaugh, N. R., T. Allen, A. Subramanian, B. Mapes, H. Seo, and A. J. Miller, 2014: The skill of atmospheric linear inverse models in hindcasting the Madden–Julian oscillation. *Climate Dyn.*, **44**, 897–906, doi:10.1007/s00382-014-2181-x.

Chen, N., D. Giannakis, R. Herbei, and A. J. Majda, 2014a: An MCMC algorithm for parameter estimation in signals with hidden intermittent instability. *SIAM/ASA J. Uncertainty Quantif.*, **2**, 647–669, doi:10.1137/130944977.

Chen, N., A. J. Majda, and D. Giannakis, 2014b: Predicting the cloud patterns of the Madden–Julian oscillation through a low-order nonlinear stochastic model. *Geophys. Res. Lett.*, **41**, 5612–5619, doi:10.1002/2014GL060876.

Chen, N., A. J. Majda, and X. T. Tong, 2014c: Information barriers for noisy Lagrangian tracers in filtering random incompressible flows. *Nonlinearity*, **27**, 2133–2163, doi:10.1088/0951-7715/27/9/2133.

Chen, N., A. J. Majda, and X. T. Tong, 2015: Noisy Lagrangian tracers for filtering random rotating compressible flows. *J. Nonlinear Sci.*, doi:10.1007/s00332-014-9226-5, in press.

Deng, Q., B. Khouider, and A. J. Majda, 2015: The MJO in a coarse-resolution GCM with a stochastic multicloud parameterization. *J. Atmos. Sci.*, **72**, 55–74, doi:10.1175/JAS-D-14-0120.1.

Fu, X., J.-Y. Lee, P.-C. Hsu, H. Taniguchi, B. Wang, W. Wang, and S. Weaver, 2013: Multi-model MJO forecasting during DYNAMO/CINDY period. *Climate Dyn.*, **41**, 1067–1081, doi:10.1007/s00382-013-1859-9.

Giannakis, D., and A. J. Majda, 2012: Quantifying the predictive skill in long-range forecasting. Part II: Model error in coarse-grained Markov models with application to ocean-circulation regimes. *J. Climate*, **25**, 1814–1826, doi:10.1175/JCLI-D-11-00110.1.

Giannakis, D., A. J. Majda, and I. Horenko, 2012: Information theory, model error, and predictive skill of stochastic models for complex nonlinear systems. *Physica D*, **241**, 1735–1752, doi:10.1016/j.physd.2012.07.005.

Gottschalck, J., and Coauthors, 2010: A framework for assessing operational Madden–Julian oscillation forecasts: A CLIVAR MJO Working Group project. *Bull. Amer. Meteor. Soc.*, **91**, 1247–1258, doi:10.1175/2010BAMS2816.1.

Harlim, J., A. Mahdi, and A. J. Majda, 2014: An ensemble Kalman filter for statistical estimation of physics constrained nonlinear regression models. *J. Comput. Phys.*, **257**, 782–812, doi:10.1016/j.jcp.2013.10.025.

Hendon, H. H., M. C. Wheeler, and C. Zhang, 2007: Seasonal dependence of the MJO–ENSO relationship. *J. Climate*, **20**, 531–543, doi:10.1175/JCLI4003.1.

Jiang, X., D. E. Waliser, M. C. Wheeler, C. Jones, M.-I. Lee, and S. D. Schubert, 2008: Assessing the skill of an all-season statistical forecast model for the Madden–Julian oscillation. *Mon. Wea. Rev.*, **136**, 1940–1956, doi:10.1175/2007MWR2305.1.

Jiang, X., and Coauthors, 2011: Vertical diabatic heating structure of the MJO: Intercomparison between recent reanalyses and TRMM estimates. *Mon. Wea. Rev.*, **139**, 3208–3223, doi:10.1175/2011MWR3636.1.

Kang, I.-S., and H.-M. Kim, 2010: Assessment of MJO predictability for boreal winter with various statistical and dynamical models. *J. Climate*, **23**, 2368–2378, doi:10.1175/2010JCLI3288.1.

Kim, D., and Coauthors, 2009: Application of MJO simulation diagnostics to climate models. *J. Climate*, **22**, 6413–6436, doi:10.1175/2009JCLI3063.1.

Kim, H.-M., P. J. Webster, V. E. Toma, and D. Kim, 2014: Predictability and prediction skill of the MJO in two operational forecasting systems. *J. Climate*, **27**, 5364–5378, doi:10.1175/JCLI-D-13-00480.1.

Kleeman, R., 2002: Measuring dynamical prediction utility using relative entropy. *J. Atmos. Sci.*, **59**, 2057–2072, doi:10.1175/1520-0469(2002)059<2057:MDPUUR>2.0.CO;2.

Kondrashov, D., M. Chekroun, A. Robertson, and M. Ghil, 2013: Low-order stochastic model and past-noise forecasting of the Madden–Julian oscillation. *Geophys. Res. Lett.*, **40**, 5305–5310, doi:10.1002/grl.50991.

Kravtsov, S., D. Kondrashov, and M. Ghil, 2005: Multilevel regression modeling of nonlinear processes: Derivation and applications to climatic variability. *J. Climate*, **18**, 4404–4424, doi:10.1175/JCLI3544.1.

Lau, W. K., and D. E. Waliser, 2012: *Intraseasonal Variability in the Atmosphere–Ocean Climate System.* Springer, 614 pp.

Liess, S., and L. Bengtsson, 2004: The intraseasonal oscillation in ECHAM4. Part II: Sensitivity studies. *Climate Dyn.*, **22**, 671–688, doi:10.1007/s00382-004-0407-z.

Lin, H., G. Brunet, and J. Derome, 2008: Forecast skill of the Madden–Julian oscillation in two Canadian atmospheric models. *Mon. Wea. Rev.*, **136**, 4130–4149, doi:10.1175/2008MWR2459.1.

Lin, H., G. Brunet, and J. S. Fontecilla, 2010: Impact of the Madden–Julian oscillation on the intraseasonal forecast skill of the North Atlantic Oscillation. *Geophys. Res. Lett.*, **37**, L19803, doi:10.1029/2010GL044315.

Ling, J., P. Bauer, P. Bechtold, A. Beljaars, R. Forbes, F. Vitart, M. Ulate, and C. Zhang, 2014: Global versus local MJO forecast skill of the ECMWF model during DYNAMO. *Mon. Wea. Rev.*, **142**, 2228–2247, doi:10.1175/MWR-D-13-00292.1.

Liptser, R. S., and A. N. Shiryaev, 2001: *Statistics of Random Processes II: Applications.* 2nd ed. Springer, 402 pp.

Love, B. S., and A. J. Matthews, 2009: Real-time localised forecasting of the Madden–Julian oscillation using neural network models. *Quart. J. Roy. Meteor. Soc.*, **135**, 1471–1483, doi:10.1002/qj.463.

Maharaj, E. A., and M. C. Wheeler, 2005: Forecasting an index of the Madden-oscillation. *Int. J. Climatol.*, **25**, 1611–1618, doi:10.1002/joc.1206.

Majda, A. J., and S. N. Stechmann, 2009: The skeleton of tropical intraseasonal oscillations. *Proc. Natl. Acad. Sci. USA*, **106**, 8417–8422, doi:10.1073/pnas.0903367106.

Majda, A. J., and B. Gershgorin, 2010: Quantifying uncertainty in climate change science through empirical information theory. *Proc. Natl. Acad. Sci. USA*, **107**, 14 958–14 963, doi:10.1073/pnas.1007009107.

Majda, A. J., and S. N. Stechmann, 2011: Nonlinear dynamics and regional variations in the MJO skeleton. *J. Atmos. Sci.*, **68**, 3053–3071, doi:10.1175/JAS-D-11-053.1.

Majda, A. J., and M. Branicki, 2012: Lessons in uncertainty quantification for turbulent dynamical systems. *Discrete Contin. Dyn. Syst.*, **32**, 3133–3231, doi:10.3934/dcds.2012.32.3133.

Majda, A. J., and J. Harlim, 2012: *Filtering Complex Turbulent Systems.* Cambridge University Press, 357 pp.

Majda, A. J., and J. Harlim, 2013: Physics constrained nonlinear regression models for time series. *Nonlinearity*, **26**, 201–217, doi:10.1088/0951-7715/26/1/201.

Maloney, E. D., and J. T. Kiehl, 2002: MJO-related SST variations over the tropical eastern Pacific during Northern Hemisphere summer. *J. Climate*, **15**, 675–689, doi:10.1175/1520-0442(2002)015<0675:MRSVOT>2.0.CO;2.

Neena, J. M., J. Y. Lee, D. Waliser, B. Wang, and X. Jiang, 2014: Predictability of the Madden–Julian oscillation in the Intraseasonal Variability Hindcast Experiment (ISVHE). *J. Climate*, **27**, 4531–4543, doi:10.1175/JCLI-D-13-00624.1.

Oliver, E. C., and K. R. Thompson, 2012: A reconstruction of Madden–Julian oscillation variability from 1905 to 2008. *J. Climate*, **25**, 1996–2019, doi:10.1175/JCLI-D-11-00154.1.

Rashid, H. A., H. H. Hendon, M. C. Wheeler, and O. Alves, 2011: Prediction of the Madden–Julian oscillation with the POAMA dynamical prediction system. *Climate Dyn.*, **36**, 649–661, doi:10.1007/s00382-010-0754-x.

Riley, E. M., B. E. Mapes, and S. N. Tulich, 2011: Clouds associated with the Madden–Julian oscillation: A new perspective from *CloudSat*. *J. Atmos. Sci.*, **68**, 3032–3051, doi:10.1175/JAS-D-11-030.1.

Roulston, M. S., and L. A. Smith, 2002: Evaluating probabilistic forecasts using information theory. *Mon. Wea. Rev.*, **130**, 1653–1660, doi:10.1175/1520-0493(2002)130<1653:EPFUIT>2.0.CO;2.

Seo, K.-H., W. Wang, J. Gottschalck, Q. Zhang, J.-K. E. Schemm, W. R. Higgins, and A. Kumar, 2009: Evaluation of MJO forecast skill from several statistical and dynamical forecast models. *J. Climate*, **22**, 2372–2388, doi:10.1175/2008JCLI2421.1.

Stechmann, S. N., and A. J. Majda, 2015: Identifying the skeleton of the Madden–Julian oscillation in observational data. *Mon. Wea. Rev.*, **143**, 395–416, doi:10.1175/MWR-D-14-00169.1.

Straub, K. H., 2013: MJO initiation in the real-time multivariate MJO index. *J. Climate*, **26**, 1130–1151, doi:10.1175/JCLI-D-12-00074.1.

Thual, S., A. J. Majda, and S. N. Stechmann, 2014: A stochastic skeleton model for the MJO. *J. Atmos. Sci.*, **71**, 697–715, doi:10.1175/JAS-D-13-0186.1.

Vautard, R., and M. Ghil, 1989: Singular spectrum analysis in nonlinear dynamics, with applications to paleoclimatic time series. *Physica D*, **35**, 395–424, doi:10.1016/0167-2789(89)90077-8.

Vitart, F., and F. Molteni, 2010: Simulation of the Madden–Julian oscillation and its teleconnections in the ECMWF forecast system. *Quart. J. Roy. Meteor. Soc.*, **136**, 842–855, doi:10.1002/qj.623.

Vitart, F., S. Woolnough, M. Balmaseda, and A. Tompkins, 2007: Monthly forecast of the Madden–Julian oscillation using a coupled GCM. *Mon. Wea. Rev.*, **135**, 2700–2715, doi:10.1175/MWR3415.1.

Vitart, F., A. Leroy, and M. C. Wheeler, 2010: A comparison of dynamical and statistical predictions of weekly tropical cyclone activity in the Southern Hemisphere. *Mon. Wea. Rev.*, **138**, 3671–3682, doi:10.1175/2010MWR3343.1.

Wang, L., K. Kodera, and W. Chen, 2012: Observed triggering of tropical convection by a cold surge: Implications for MJO initiation. *Quart. J. Roy. Meteor. Soc.*, **138**, 1740–1750, doi:10.1002/qj.1905.

Webster, P. J., and R. Lukas, 1992: TOGA COARE: The Coupled Ocean–Atmosphere Response Experiment. *Bull. Amer. Meteor. Soc.*, **73**, 1377–1416, doi:10.1175/1520-0477(1992)073<1377:TCTCOR>2.0.CO;2.

Weisheimer, A., S. Corti, T. Palmer, and F. Vitart, 2014: Addressing model error through atmospheric stochastic physical parametrizations: Impact on the coupled ECMWF seasonal forecasting system. *Philos. Trans. Roy. Soc.*, **372A**, doi:10.1098/rsta.2013.0290.

Wheeler, M. C., and H. H. Hendon, 2004: An all-season real-time multivariate MJO index: Development of an index for monitoring and prediction. *Mon. Wea. Rev.*, **132**, 1917–1932, doi:10.1175/1520-0493(2004)132<1917:AARMMI>2.0.CO;2.

Wolter, K., and M. S. Timlin, 1993: Monitoring ENSO in COADS with a seasonally adjusted principal component index. *Proc. 17th Climate Diagnostics Workshop*, Norman, OK, NOAA/NSSL, 52–57.

Wolter, K., and M. S. Timlin, 1998: Measuring the strength of ENSO events: How does 1997/98 rank? *Weather*, **53**, 315–324, doi:10.1002/j.1477-8696.1998.tb06408.x.

Woolnough, S., F. Vitart, and M. Balmaseda, 2007: The role of the ocean in the Madden–Julian oscillation: Implications for MJO prediction. *Quart. J. Roy. Meteor. Soc.*, **133**, 117–128, doi:10.1002/qj.4.

Yanai, M., B. Chen, and W.-w. Tung, 2000: The Madden–Julian oscillation observed during the TOGA COARE IOP: Global view. *J. Atmos. Sci.*, **57**, 2374–2396, doi:10.1175/1520-0469(2000)057<2374:TMJOOD>2.0.CO;2.

Yoneyama, K., C. Zhang, and C. N. Long, 2013: Tracking pulses of the Madden–Julian oscillation. *Bull. Amer. Meteor. Soc.*, **94**, 1871–1891, doi:10.1175/BAMS-D-12-00157.1.

Zhang, C., M. Dong, H. H. Hendon, E. D. Maloney, A. Marshall, K. R. Sperber, and W. Wang, 2006: Simulations of the Madden–Julian oscillation by global weather forecast and climate models. *Climate Dyn.*, **27**, 573–592, doi:10.1007/s00382-006-0148-2.

Zhang, C., J. Gottschalck, E. D. Maloney, M. W. Moncrieff, F. Vitart, D. E. Waliser, B. Wang, and M. C. Wheeler, 2013: Cracking the MJO nut. *Geophys. Res. Lett.*, **40**, 1223–1230, doi:10.1002/grl.50244.