1. Introduction
The monsoon intraseasonal oscillation (MISO) (Sikka and Gadgil 1980; Webster et al. 1998; Goswami and Ajayamohan 2001; Lau and Waliser 2011; Kikuchi et al. 2012; Lee et al. 2013) is one of the prominent modes of tropical intraseasonal variability. As a slow-moving, planetary-scale envelope of convection propagating northeastward, it interacts strongly with the boreal summer monsoon rainfall over South Asia. Because of its interaction with the mean monsoon circulation and other modes of tropical variability, the propagation of the MISO is more complex than that of the eastward-propagating Madden–Julian oscillation (MJO) (Zhang 2005). The MISO plays a crucial role in determining the onset and demise of the Indian summer monsoon and affects the seasonal rainfall amount over the Indian subcontinent (Murakami et al. 1986; Goswami and Ajayamohan 2001; Goswami et al. 2003; Gadgil 2003). Therefore, both real-time monitoring and accurate extended-range forecasts of MISO phases are important, with large socioeconomic impacts over the Indian subcontinent (Sahai et al. 2013; Abhilash et al. 2014a).
Both dynamical models (Wang et al. 2005; Pattanaik and Kumar 2010; Acharya et al. 2011; Nair et al. 2014) and low-order statistical models (Rajeevan et al. 2007; DelSole and Shukla 2002) are widely used for predicting the MISO. While operational dynamical models capture more refined structures, low-order models target the large-scale features and are therefore computationally efficient. Prediction with low-order models relies on effective MISO indices, which are typically given by a few principal components (PCs) that explain the intraseasonal variability of the high-dimensional raw data. Once the indices are predicted, a spatiotemporal reconstruction using the spatial bases associated with these PCs yields the forecast of the large-scale spatial patterns.
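To make the low-order idea concrete, the following is a minimal sketch using classical EOF analysis computed via the singular value decomposition, not the NLSA indices used later in this article. It assumes the raw data are stored as a (time x space) matrix; the function names are hypothetical. The leading PCs play the role of the low-order indices, and multiplying (possibly predicted) PCs by the associated spatial bases recovers the large-scale patterns.

```python
import numpy as np

def eof_indices(data, n_modes=2):
    """Leading PCs (indices) and EOFs (spatial bases) of a (time x space) array via SVD."""
    mean = data.mean(axis=0)                 # time mean at each grid point
    anomalies = data - mean
    u, s, vt = np.linalg.svd(anomalies, full_matrices=False)
    pcs = u[:, :n_modes] * s[:n_modes]       # low-order indices, shape (time, n_modes)
    eofs = vt[:n_modes]                      # spatial bases, shape (n_modes, space)
    return pcs, eofs, mean

def reconstruct(pcs, eofs, mean):
    """Map (observed or predicted) indices back to large-scale spatial patterns."""
    return mean + pcs @ eofs
```

In a forecast setting, `pcs` would be replaced by the predicted indices while `eofs` and `mean` come from the training data.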
Several indices have been proposed for the real-time monitoring and extended-range forecast of the MISO. The Indian Institute of Tropical Meteorology (IITM) relies on an index based on extended empirical orthogonal function (EEOF) analysis of longitudinally averaged daily rainfall anomalies for the extended-range prediction of the MISO (Suhas et al. 2013; Sahai et al. 2013; Abhilash et al. 2014a). Another well-known MISO index (Lee et al. 2013) mimics the real-time multivariate MJO (RMM) index (Wheeler and Hendon 2004) and is based on a multivariate EOF analysis of daily anomalies of the zonal wind at 850 hPa and outgoing longwave radiation (OLR). Other MISO indices (Kikuchi et al. 2012; Goswami et al. 1999) are based on similar EOF and EEOF techniques or their analog, multichannel singular spectrum analysis (MSSA; Krishnamurthy and Shukla 2007). These covariance-based approaches generally capture the spatiotemporal MISO patterns reasonably well and isolate the northeastward-propagating intraseasonal band from the high-frequency band (Suhas et al. 2013; Abhilash et al. 2014a,b). Yet, the seasonal extraction and longitudinal averaging used in computing these indices are sometimes ad hoc and can lead to a loss of predictive information or to mixing with other modes. In addition, these covariance-based techniques may be inadequate for capturing the rare or extreme events of complex nonlinear dynamics (Crommelin and Majda 2004), which have significant societal and economic impacts.
Recently, Sabeerali et al. (2017) developed a new MISO index based on the nonlinear Laplacian spectral analysis (NLSA; Giannakis and Majda 2012a,b) technique. NLSA is a nonlinear data analysis technique that combines ideas from lagged embedding (Packard et al. 1980; Sauer et al. 1991), machine learning (Coifman and Lafon 2006; Belkin and Niyogi 2003), adaptive weights, and spectral entropy criteria to extract spatiotemporal modes of variability from high-dimensional time series. These modes are computed from the eigenfunctions of a discrete analog of the Laplace–Beltrami operator, which can be thought of as a local analog of the temporal covariance matrix employed in EOF and EEOF techniques, but adapted to the nonlinear geometry of data generated by complex dynamical systems. A key advantage of NLSA over classical covariance-based techniques is that it requires no ad hoc preprocessing of the data, such as detrending or spatiotemporal filtering of the full dataset, and it captures both intermittency and low-frequency variability (Giannakis and Majda 2012a,b, 2013; Giannakis et al. 2012b). The NLSA-based MISO index therefore provides an objective identification of MISO patterns from noisy precipitation data. In addition, as reported in Sabeerali et al. (2017), compared with the modes extracted via EEOF analysis, the NLSA MISO modes have higher memory and predictability; stronger amplitude and higher fractional explained variance over the western Pacific, Western Ghats, and adjoining Arabian Sea regions; and a more realistic representation of the regional heat sources over the Indian and Pacific Oceans. Other applications in which NLSA captures both intermittent and low-frequency modes in the climate, atmosphere, and ocean, beyond the capability of EOF and EEOF, can be found in previous studies (Székely et al. 2016a,b; Slawinska and Giannakis 2017; Giannakis and Majda 2011, 2012a; Brenowitz et al. 2016).
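The sketch below illustrates the two ingredients named above, lagged embedding and Laplacian eigenfunctions, under simplifying assumptions. It uses a plain diffusion-maps construction (Coifman and Lafon 2006) with a fixed Gaussian bandwidth `eps` on delay-embedded data. It deliberately omits NLSA's adaptive (phase-velocity) kernel weights and the projection that produces spatiotemporal modes, so it shows the idea rather than reproducing the NLSA algorithm. The backward ordering of the delay coordinates and all names are assumptions.

```python
import numpy as np

def lagged_embedding(x, q):
    """Delay coordinates: row i holds snapshots s, s-1, ..., s-q+1 with s = i + q - 1."""
    n_t = x.shape[0] - q + 1
    return np.hstack([x[q - 1 - j:q - 1 - j + n_t] for j in range(q)])

def laplacian_eigenfunctions(x, q, eps, n_eig=3):
    """Leading eigenpairs of a diffusion-maps Markov operator on lagged-embedded data."""
    xe = lagged_embedding(x, q)
    sq = np.sum(xe ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * xe @ xe.T, 0.0)  # pairwise squared distances
    k = np.exp(-d2 / eps)                        # isotropic Gaussian kernel (NLSA uses adaptive weights)
    deg = k.sum(axis=1)
    k = k / np.outer(deg, deg)                   # alpha = 1 normalization reduces sampling-density effects
    d = k.sum(axis=1)
    s = k / np.sqrt(np.outer(d, d))              # symmetric matrix similar to the Markov matrix D^{-1} K
    evals, evecs = np.linalg.eigh(s)
    order = np.argsort(evals)[::-1][:n_eig]
    phi = evecs[:, order] / np.sqrt(d)[:, None]  # right eigenvectors of the Markov matrix
    return evals[order], phi
```

The leading eigenvalue is 1 with a constant eigenvector; the next eigenvectors are the slow, data-adapted temporal patterns that play the role of the PCs in EOF analysis.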
In this article, we develop a prediction framework for the large-scale MISO precipitation. This is achieved in two steps: 1) predicting the NLSA-based MISO indices and 2) reconstructing the predicted large-scale spatiotemporal patterns of the MISO precipitation. In the first step, a physics-constrained low-order stochastic model (Majda and Harlim 2013; Harlim et al. 2014) is developed to describe and predict the NLSA MISO indices. This model contains two MISO variables and two hidden variables, which interact through energy-conserving nonlinear terms and involve both correlated multiplicative noise and additive stochastic noise. The special structure of this nonlinear stochastic model allows an effective data assimilation algorithm for determining the initial ensemble of the hidden variables, which facilitates the ensemble prediction scheme. Note that this nonlinear low-order stochastic modeling framework has been shown to have significant skill in determining the predictability limits of the large-scale cloud patterns of both the boreal winter MJO and boreal summer intraseasonal oscillations (Chen et al. 2014; Chen and Majda 2015a) and in improving the prediction skill of the RMM indices (Chen and Majda 2015b). In the second step, an effective and practical spatiotemporal reconstruction algorithm is designed for obtaining the predicted large-scale spatiotemporal patterns of the MISO precipitation. By incorporating a “predicted spatial basis” determined entirely from the training data, this reconstruction method overcomes a fundamental difficulty of most data decomposition techniques with lagged embedding, namely that the reconstruction requires information beyond the predicted range of the time series.
Several related issues are also addressed in this article. First, because sufficient observational data are lacking in many real-world situations, a short training phase is usually preferred from a practical point of view. To this end, we compare both the statistical features and the prediction skill obtained with a short 3-yr training period with those obtained with the 10-yr period of the default setup. It is shown that the model is able to capture and predict the main characteristics of the MISO indices even with such a short training period. Second, since most tropical rainfall is convective, OLR is a potential candidate for assessing precipitation in the tropics. To examine whether OLR is a good proxy for describing the MISO precipitation, the parameter values calibrated from the OLR monsoon modes (Chen and Majda 2015a) are adopted in the low-order model, and the skill of predicting the MISO precipitation indices is studied. Third, a lagged embedding window of intraseasonal length in NLSA is shown to be crucial both for capturing the main MISO characteristics and for the predictability of the MISO precipitation.
The remainder of this article is organized as follows. Section 2 describes the precipitation dataset and the MISO indices obtained from the NLSA technique. Section 3 presents the physics-constrained low-order nonlinear stochastic model as well as the calibration and the effective prediction algorithm. The results of predicting the MISO indices are reported in section 4, and the prediction of the spatiotemporal reconstructed patterns is shown in section 5. Section 6 discusses the possibility of shortening the training period to only 3 years, the strong connection between OLR and precipitation in describing and predicting the MISO, and the significance of adopting the lagged embedding window with intraseasonal time length. Summary and conclusions are included in section 7.
2. The precipitation MISO indices from NLSA
In this study, the MISO indices for the period 1998–2013 are extracted from the daily Global Precipitation Climatology Project (GPCP) rainfall data (Huffman et al. 2001) over the Asian summer monsoon region (20°S–30°N, 30°–140°E), using the NLSA algorithm. The spatial resolution of the GPCP dataset is 1° × 1°, amounting to
NLSA is applied to the daily GPCP dataset with a lagged embedding window of size
It was shown in Sabeerali et al. (2017) that the NLSA MISO modes display the key characteristics of the MISO, such as the associated northeastward-propagating anomalies. A case study there also revealed three consecutive MISO events in the NLSA MISO modes during the boreal summer of 2004, whose onset and demise phases are highly consistent with observations. These facts indicate that the time series depicted in Fig. 1a give a reasonable representation of the full life cycle of the northward-propagating boreal summer convection band and can be used to determine the phase and amplitude of the poleward-propagating rainfall anomalies associated with the MISO. Hereafter, we refer to the two time series in Fig. 1a as the “MISO indices.”
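Because the two indices form a quadrature pair, a phase and an amplitude can be read off from them in the usual way for two-component oscillation indices. The sketch below is a hypothetical helper, not the article's procedure: it assumes eight phase bins numbered counterclockwise from the positive first-index axis, a convention chosen for illustration only.

```python
import numpy as np

def miso_amplitude_phase(u1, u2, n_phases=8):
    """Amplitude, phase angle in [0, 2*pi), and phase bin (1..n_phases) of a two-index state."""
    amp = np.hypot(u1, u2)
    angle = np.mod(np.arctan2(u2, u1), 2 * np.pi)
    # Assumed convention: bins counted counterclockwise from the positive u1 axis;
    # the final modulo guards against angle rounding up to exactly 2*pi.
    phase = np.floor(angle / (2 * np.pi / n_phases)).astype(int) % n_phases + 1
    return amp, angle, phase
```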
3. The low-order nonlinear stochastic model and the prediction algorithm
a. The model
The physics-constrained nonlinear low-order stochastic model [(1) and (2)] has been shown to have significant skill for determining the predictability limits of the large-scale cloud patterns of both the boreal winter MJO and boreal summer intraseasonal oscillations (Chen et al. 2014; Chen and Majda 2015a) as well as for improving the prediction skill of the RMM indices (Wheeler and Hendon 2004) by incorporating a new information-theoretic strategy in the training phase (Chen and Majda 2015b).
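Equations (1) and (2) are not reproduced in this section. The sketch below therefore assumes the structure of the physics-constrained models in Chen et al. (2014) and Chen and Majda (2015a): two observed MISO variables u1 and u2 with a seasonally varying large-scale damping, a hidden stochastic damping v, and a hidden stochastic phase w. The variables are coupled through an energy-conserving term with coefficient `gamma`. All parameter names are hypothetical, and the integrator is a simple Euler–Maruyama scheme.

```python
import numpy as np

def simulate_miso_model(p, n_steps, dt=0.1, seed=0, x0=(0.0, 0.0, 0.0, 0.0)):
    """Euler-Maruyama integration of an assumed four-variable physics-constrained model.

    Assumed equations (hypothetical parameter names):
      du1 = [-d_u(t) u1 + gamma v u1 - (a + w) u2] dt + sigma_u dW1
      du2 = [-d_u(t) u2 + gamma v u2 + (a + w) u1] dt + sigma_u dW2
      dv  = [-d_v v - gamma (u1^2 + u2^2)] dt + sigma_v dW3
      dw  = [-d_w w + w_hat] dt + sigma_w dW4
    with d_u(t) = du0 + du1 sin(omega_f t + phi). The gamma terms cancel in the
    drift of (u1^2 + u2^2 + v^2) / 2, so they exchange energy without creating it.
    """
    rng = np.random.default_rng(seed)
    u1, u2, v, w = x0
    out = np.empty((n_steps, 4))
    for n in range(n_steps):
        damp = p["du0"] + p["du1"] * np.sin(p["omega_f"] * n * dt + p["phi"])
        freq = p["a"] + w                          # deterministic frequency plus stochastic phase
        dW = rng.standard_normal(4) * np.sqrt(dt)
        u1, u2, v, w = (                           # right-hand side uses the old state
            u1 + (-damp * u1 + p["gamma"] * v * u1 - freq * u2) * dt + p["sigma_u"] * dW[0],
            u2 + (-damp * u2 + p["gamma"] * v * u2 + freq * u1) * dt + p["sigma_u"] * dW[1],
            v + (-p["dv"] * v - p["gamma"] * (u1 ** 2 + u2 ** 2)) * dt + p["sigma_v"] * dW[2],
            w + (-p["dw"] * w + p["w_hat"]) * dt + p["sigma_w"] * dW[3],
        )
        out[n] = (u1, u2, v, w)
    return out
```

Running many such trajectories from a common initial condition gives the kind of ensemble forecast described in section 3c.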
b. Calibration of the nonlinear stochastic model
The parameters of the stochastic model in (1) and (2) are calibrated by systematically minimizing the information distance (see appendix A) of the highly non-Gaussian equilibrium PDF of the stochastic model compared with that of the actual data (Majda and Gershgorin 2010, 2011) and minimizing the root-mean-squared (RMS) error in the autocorrelations of the two MISO variables
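The calibration criterion combines a PDF-based information distance with an autocorrelation mismatch. A minimal sketch of such a cost follows. It assumes that the information distance is the relative entropy between histogram estimates of the observed and model PDFs, and that the two terms are simply added with a hypothetical weight. The exact form and weighting used in appendix A may differ.

```python
import numpy as np

def relative_entropy(p, q, dx, floor=1e-12):
    """Discrete relative entropy sum p*log(p/q)*dx between two PDFs on a common grid."""
    p = np.maximum(p, floor)                 # avoid log(0) in empty bins
    q = np.maximum(q, floor)
    return float(np.sum(p * np.log(p / q)) * dx)

def autocorrelation(x, max_lag):
    """Sample autocorrelation function for lags 0..max_lag."""
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / len(x) / var for k in range(max_lag + 1)])

def calibration_cost(obs, model, bins, max_lag, weight=1.0):
    """Hypothetical cost: relative entropy of PDFs plus weighted RMS error of the
    autocorrelations, summed over the columns (MISO variables) of obs and model."""
    dx = bins[1] - bins[0]
    cost = 0.0
    for j in range(obs.shape[1]):
        p, _ = np.histogram(obs[:, j], bins=bins, density=True)
        q, _ = np.histogram(model[:, j], bins=bins, density=True)
        acf_err = autocorrelation(obs[:, j], max_lag) - autocorrelation(model[:, j], max_lag)
        cost += relative_entropy(p, q, dx) + weight * np.sqrt(np.mean(acf_err ** 2))
    return cost
```

A parameter search would evaluate this cost on long model simulations for candidate parameter sets and keep the minimizer.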
Optimal parameters for the nonlinear low-order stochastic model [(1)] are shown in the first row (the parameters
Figures 1d–f compare the statistics of the nonlinear stochastic model [(1) and (2)] with these parameters against those of the two MISO indices. Figure 1d shows that the stochastic model captures the correlation functions almost perfectly over a 3-month duration, as well as the timing of the wiggles that appear at lags of around one year. Figure 1e shows that the nonlinear stochastic model reproduces the fat tails of the highly non-Gaussian PDFs of the two MISO indices, which cannot be captured by linear models (Chen and Majda 2015a). Figure 1f reveals that the power spectra of the stochastic model match those of the MISO indices (Fig. 1b) very well within the intraseasonal band from 30 to 60 days, which contains the most power. Note that in the absence of the stochastic damping and stochastic phase, the model fails to capture the highly non-Gaussian PDFs, the autocorrelation functions, and the power spectra simultaneously even with the large-scale damping
c. Prediction algorithm and data assimilation of the hidden variables
An ensemble prediction algorithm, which involves running the forecast model [(1)] forward in time given the initial values, is adopted for the MISO indices. The initial data of the two MISO variables
In fact, the equations in (1) are a conditional Gaussian system with respect to the observed variables
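The conditional Gaussian structure can be made concrete as follows. The sketch assumes a model in which the hidden stochastic damping v and stochastic phase w enter the equations for the observed pair (u1, u2) linearly, as in Chen et al. (2014). Given an observed path of u, the posterior of (v, w) is then Gaussian, and its mean and covariance obey closed-form (Kalman–Bucy type) equations. The sketch discretizes these equations with time step `dt`; the parameter names, the Euler discretization, and the simple equilibrium prior are all assumptions.

```python
import numpy as np

def filter_hidden(u, p, dt):
    """Discretized conditional Gaussian filter for the hidden variables (v, w).

    u : (n, 2) observed MISO path sampled every dt days
    p : dict of hypothetical parameters (du0, du1, omega_f, phi, a, gamma,
        dv, dw, w_hat, sigma_u, sigma_v, sigma_w)
    Returns posterior means (n, 2) and covariances (n, 2, 2) of (v, w).
    """
    n = u.shape[0]
    a1 = np.diag([-p["dv"], -p["dw"]])                 # linear drift of the hidden variables
    q_hidden = np.diag([p["sigma_v"] ** 2, p["sigma_w"] ** 2])
    inv_obs_cov = 1.0 / p["sigma_u"] ** 2              # observation noise assumed isotropic
    # Simple prior: equilibrium statistics of the two OU-type hidden processes,
    # ignoring the -gamma*|u|^2 forcing of v (a simplifying assumption).
    mu = np.array([0.0, p["w_hat"] / p["dw"]])
    R = np.diag([p["sigma_v"] ** 2 / (2 * p["dv"]), p["sigma_w"] ** 2 / (2 * p["dw"])])
    means, covs = np.empty((n, 2)), np.empty((n, 2, 2))
    means[0], covs[0] = mu, R
    for k in range(n - 1):
        u1, u2 = u[k]
        damp = p["du0"] + p["du1"] * np.sin(p["omega_f"] * k * dt + p["phi"])
        # Observed drift written as A0(u) + A1(u) @ (v, w)
        A0 = np.array([-damp * u1 - p["a"] * u2, -damp * u2 + p["a"] * u1])
        A1 = np.array([[p["gamma"] * u1, -u2], [p["gamma"] * u2, u1]])
        # Hidden drift written as a0(u) + a1 @ (v, w)
        a0 = np.array([-p["gamma"] * (u1 ** 2 + u2 ** 2), p["w_hat"]])
        innovation = (u[k + 1] - u[k]) - (A0 + A1 @ mu) * dt
        gain = R @ A1.T * inv_obs_cov
        mu = mu + (a0 + a1 @ mu) * dt + gain @ innovation
        R = R + (a1 @ R + R @ a1.T + q_hidden - gain @ A1 @ R) * dt
        R = 0.5 * (R + R.T)                            # keep the covariance symmetric
        means[k + 1], covs[k + 1] = mu, R
    return means, covs
```

An initial ensemble of hidden states at the forecast start could then be drawn with `np.random.default_rng().multivariate_normal(means[-1], covs[-1], size=n_members)`, paired with the observed values of u1 and u2.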
4. Results of predicting the MISO indices
The skill scores of the ensemble mean prediction as a function of lead time (days) in each year from 2008 to 2013 are shown in Fig. 2. Among these years, 2010 has useful predictions for about 20 days, while 2011 and 2013 have skillful predictions for around 25–30 days. In some years, such as 2008, 2009, and 2012, the prediction skill extends to about 50 days. In general, prediction using this nonlinear stochastic model shows much higher skill than prediction based on the conventional EEOF-based indices (Suhas et al. 2013).
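Section 6b counts a prediction as useful when the RMS error is below the equilibrium standard deviation of the truth and the pattern correlation exceeds 0.5. A minimal sketch of these two scores and the resulting useful lead time follows. It assumes forecasts are arranged as (lead time x verification time) arrays and that the pattern correlation is centered; both choices are illustrative assumptions.

```python
import numpy as np

def rms_error(pred, truth):
    """RMS error along the last axis (verification times)."""
    return np.sqrt(np.mean((pred - truth) ** 2, axis=-1))

def pattern_correlation(pred, truth):
    """Centered correlation along the last axis (centering is an assumption)."""
    p = pred - pred.mean(axis=-1, keepdims=True)
    t = truth - truth.mean(axis=-1, keepdims=True)
    return np.sum(p * t, axis=-1) / np.sqrt(np.sum(p ** 2, axis=-1) * np.sum(t ** 2, axis=-1))

def useful_lead(rmse, corr, truth_std, corr_threshold=0.5):
    """Number of initial lead times for which RMSE < truth_std and corr > threshold."""
    ok = (rmse < truth_std) & (corr > corr_threshold)
    bad = np.flatnonzero(~ok)
    return len(ok) if bad.size == 0 else int(bad[0])
```

Here row `tau` of `pred` holds the ensemble mean forecasts made `tau` days before each verification time.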
The 15- and 25-day lead predictions for 2008, 2009, and 2010 are shown in the top panels of Fig. 3. Both the phase and the amplitude of MISO activity play important roles in determining the prediction skill in different years. For example, 2008 has strong and regular MISO activity throughout the monsoon season, which results in long predictability, whereas the signal-to-noise ratio in 2010 is smaller than in the other years, and thus the predictability is greatly reduced. Note that although 2009 is a drought year with weak MISO activity during the late monsoon season (September), the MISO activity in the other months of the 2009 boreal summer remains strong, and the overall prediction skill is high. Given the limited sample size (12 years) of our analysis, it is hard to derive a relationship between the predictability of the MISO and the interannual variability of the monsoon. However, it appears that drought years do not necessarily have low predictability.
In addition to the ensemble mean prediction, the ensemble spread, which represents the predictive uncertainty, is another important indicator of prediction skill. The bottom panels of Fig. 3 show the ensemble predictions, including the ensemble spread, for 2008, 2009, and 2010, starting from three different dates that correspond, respectively, to a transition from the quiescent phase to the active phase (1 April), the active mature phase (1 June), and the decaying phase of MISO activity (1 October). Although the ensemble mean predictions from the 1 April starting date have no long-range skill, the envelope of the ensemble predictions contains the true signal and forecasts both the summer active and the winter quiescent phases. The forecasts from 1 June clearly have skill in both the ensemble mean and the ensemble spread for moderate to long lead times. The forecasts starting from 1 October have both an accurate mean and a small ensemble spread for very long lead times.
Twin prediction experiments with the perfect nonlinear stochastic model in (1) and (2) are easy to perform: 10-yr training segments of data generated from the model are used to make 6-yr forecasts. Significantly, this internal prediction skill of the stochastic model is comparable to its skill in predicting the MISO indices from observations (not shown here). This supports the conclusion that the nonlinear stochastic model in (1) and (2) can accurately determine the predictability limits of the two MISO indices.
5. The spatiotemporal reconstruction
With the predicted MISO indices in hand, the next step is to recover the predicted large-scale MISO patterns in physical space. This requires the spatiotemporal reconstruction that combines the predicted MISO indices and the associated spatial bases.
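The article's direct and improved reconstruction algorithms [(6) and (7)] are not reproduced here. The sketch below only illustrates why the standard reconstruction of a lagged-embedding mode needs information beyond the predicted range. It assumes a backward delay embedding, in which the embedded vector at time i holds snapshots i, i-1, ..., i-q+1. It also assumes a rank-one mode given by a (q x d) lagged spatial pattern and a temporal pattern, with the lagged copies averaged back onto physical time as in singular spectrum analysis. All names are hypothetical.

```python
import numpy as np

def reconstruct_lagged(spatial_pattern, temporal_pattern):
    """Average the q lagged copies of a rank-one lagged-embedding mode onto physical time.

    spatial_pattern : (q, d) array; block j is the pattern for lag j
                      (backward embedding: column i holds x_i, x_{i-1}, ..., x_{i-q+1}).
    temporal_pattern: (n,) array for embedded times 0..n-1.
    Returns an (n - q + 1, d) array. Snapshot x_i sits in block j of the embedded
    vector at time i + j, so reconstructing time i uses temporal values up to
    i + q - 1; the last q - 1 steps of a forecast therefore cannot be reconstructed.
    """
    q, d = spatial_pattern.shape
    n = temporal_pattern.shape[0]
    out = np.zeros((n - q + 1, d))
    for j in range(q):
        out += np.outer(temporal_pattern[j:n - q + 1 + j], spatial_pattern[j])
    return out / q
```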
a. Method
b. Prediction of the spatially and temporally reconstructed precipitation fields
Figure 4a shows three phase-space diagrams of the predicted MISO indices, starting from 1 July 2009, 1 June 2008, and 1 June 2013, each covering lead times up to 30 days. Among the three cases, the prediction for July 2009 is highly skillful, whereas the skill for June 2008 is moderate. The true signal in June 2013 has a weak amplitude, and the corresponding prediction is far from the truth.
We demonstrate the prediction of spatiotemporal patterns based on the improved method [(7)], where the ensemble mean prediction is used for the spatiotemporal reconstruction. The skill scores of the predicted spatiotemporal patterns for each of the three periods are shown in Fig. 4b. Consistent with the MISO indices, July 2009 has the highest prediction skill, and the useful prediction lasts for 40 days. A higher pattern correlation is found in predicting the spatiotemporal patterns of June 2008 than those of June 2013, with useful predictions for June 2008 up to around 22 days. These results indicate that the predicted spatiotemporal patterns at different times depend largely on the corresponding MISO indices: an accurate prediction of the stochastic phase and amplitude usually results in a good reconstruction of the spatiotemporal patterns. Note that, unlike the skill scores for the MISO indices, the skill scores for the spatiotemporal patterns do not decrease monotonically with lead time, nor does the error approach zero at very short lead times. These facts are due to the approximation of the spatial basis
Figures 5 and 6 compare the true and predicted spatiotemporal patterns for July 2009 and June 2008, respectively. The predicted patterns for all of July 2009 are highly consistent with the truth, especially over the Indian subcontinent and the Bay of Bengal. In contrast, although the prediction for June 2008 is skillful overall up to a lead time of about 20 days, significant errors in the spatiotemporal patterns appear at longer lead times because the model fails to predict the precipitation in regions such as the Indian Ocean.
c. Comparison of different prediction algorithms for the spatiotemporal fields
To further understand the skill of predicting the spatiotemporal MISO patterns, the skill scores of using three different prediction algorithms are compared in Fig. 7. In addition to the improved algorithm [(7)], the other two methods applied here are the prediction using the direct algorithm [(6)] and the persistence prediction. The persistence method assumes that the conditions at the time of the forecast will not change, and it is typically used as a baseline for prediction. On the other hand, the direct algorithm [(6)] requires the predicted time series up to
First, because of the oscillatory nature of the MISO, the persistence prediction is skillful for only 5–6 days in July 2009 and June 2008, when the signal is strong and moderate, respectively. After that, the skill of the persistence prediction deteriorates rapidly and becomes much worse than that of the other two methods, although the persistence skill reemerges after around 40 days when another MISO event appears. In June 2013, although the short-term skill of the persistence prediction is higher than that of the other two methods, the true MISO signal in this month is very weak, which makes the MISO prediction much less significant. Next, we compare the direct approach [(6)] and the improved method [(7)]. In the strong MISO month of July 2009, the direct method is slightly better (pattern correlation ~ 0.9) than the improved method (pattern correlation ~ 0.8), and both methods are skillful up to 40 days. In this month, the prediction of the time series is quite accurate, and therefore the approximate error in
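The persistence baseline and the reemergence of its skill for an oscillatory signal can be illustrated with a short sketch. The function name and array layout are hypothetical. For each lead time, the forecast is simply the value at the initial time, paired with the verifying value `tau` days later.

```python
import numpy as np

def persistence_forecast(series, max_lead):
    """Persistence baseline: the value at the initial time is the forecast for every lead.

    Returns (pred, truth), each of shape (max_lead + 1, n - max_lead, ...), where
    row tau pairs each initial value with the value observed tau steps later.
    """
    n_init = series.shape[0] - max_lead
    pred = np.stack([series[:n_init] for _ in range(max_lead + 1)])
    truth = np.stack([series[tau:tau + n_init] for tau in range(max_lead + 1)])
    return pred, truth
```

For a pure 40-day oscillation, the persistence forecast is perfectly anticorrelated with the truth at a 20-day lead and perfectly correlated again at a 40-day lead. This mirrors, in idealized form, the reemergence of persistence skill after about 40 days.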
6. Model robustness and sensitivity to key parameters
a. Prediction with a 3-yr short training period
A typical situation in climate science is that only a short record of observational data is available. This leads to one of the fundamental difficulties in prediction with most nonparametric methods, which require a large amount of training data. Suitable models that can describe the essential characteristics of the data are usually preferred, since they allow for a much shorter training period. Recall that in the previous sections 10 years of observations (1998–2007) were used for model calibration, and the prediction skill was assessed for the remaining 6 years (2008–13). Although this 10-yr training window is already much shorter than that required by most nonparametric methods, it is important to understand whether an even shorter training period suffices for the nonlinear model to extract the essential information from the observations.
To this end, a very short training period comprising only the first three years of the time series (1998–2000) is adopted here for model calibration. Since the MISO occurs only in boreal summer and the average duration of one event is roughly 40 days, this 3-yr training period contains only about 10 events, which is a small sample. Figure 8 compares the statistics of the MISO time series obtained with training periods of different lengths, including this short 3-yr training period (1998–2000), the 10-yr training period adopted in the previous sections (1998–2007), and the full analysis period (1998–2013). The fact that the statistics of the 10-yr training period and the full analysis period match almost perfectly indicates that the 10-yr training period is sufficient to obtain unbiased information. The 3-yr training period, which includes one weak year (1998), one moderate year (1999), and one strong year (2000) of MISO activity, also has statistics highly consistent with those of the full analysis period, including the non-Gaussian fat-tailed PDFs, the power spectra, and the autocorrelations up to 1.5 months. Therefore, the key features of the full MISO indices are well reflected in this short 3-yr training period. Because of the robustness of the model parameters (appendix A), the parameters calibrated on this 3-yr training period are nearly the same as the optimal parameters shown in Table 1. Importantly, this short training period allows the prediction skill to be studied over a longer period, going back to 2001, and the results are briefly reported here.
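As a rough check of the event count, assume that MISO events occur only during June–September (122 days per season) and last about 40 days each:

$$
N_{\text{events}} \approx 3 \times \frac{122\ \text{days}}{40\ \text{days}} \approx 9,
$$

which is consistent with the figure of about 10 events quoted above.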
Figure 9 shows the skill scores and the predicted signals based on the ensemble mean prediction for 2001–07, analogous to those in Figs. 2 and 3 for 2008–13. The useful prediction in each of these 7 years exceeds 25 days, and in particular the skillful predictions in 2001, 2003, and 2007 exceed 40 days. Among these 7 years, 2002 and 2004 are recorded as drought years. A significant error is found in predicting the subdued MISO activity during August and September of 2002, which explains why its overall prediction skill is lower than that of most other years. In contrast, the major error in predicting the MISO indices of 2004 is due to the model’s failure to capture the extremely slow oscillation frequency during August and September.
We have also checked the model statistics and prediction skill using each set of three consecutive years between 1998 and 2013 as the training period. Despite differences in the signal variance due to the varying strength of MISO activity across years, the fat tails of the non-Gaussian PDFs, the peaks of the power spectra, and the autocorrelations up to 1.5 months all resemble those of the full MISO time series. Notably, the ensemble prediction skill does not deteriorate significantly across the different training periods.
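The rolling-window check can be organized as in the sketch below. It enumerates every 3-yr window in 1998–2013 and provides one simple scalar summary of tail heaviness, the sample excess kurtosis. This summary is our illustrative choice and not necessarily a statistic used in the study; the function names are hypothetical.

```python
import numpy as np

def consecutive_windows(first_year, last_year, length=3):
    """All runs of `length` consecutive years within [first_year, last_year]."""
    return [tuple(range(y, y + length)) for y in range(first_year, last_year - length + 2)]

def excess_kurtosis(x):
    """Sample excess kurtosis; positive values indicate fatter-than-Gaussian tails."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 4) - 3.0)
```

Each window's index segment would be used to recalibrate the model, and its tail statistics compared with those of the full 1998–2013 record.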
b. MISO indices based on different lagged embedding window sizes and the corresponding prediction skill
Recall that the two MISO indices shown in Fig. 1a and studied throughout this article were obtained by applying NLSA to the precipitation data with a lagged embedding window of length
Figure 10 shows the resulting MISO indices by applying NLSA with
Also shown in appendix C (Figs. C1–C4) are the spatiotemporal patterns of the year 2004 boreal summer monsoon season with different values of q. As discussed in Sabeerali et al. (2017) it is found that the patterns with
Figure 11 shows the prediction skill with different values of q. Here useful prediction is defined in the same way as that in section 4: first, the RMS error in the prediction is less than the standard deviation of the truth at the equilibrium and, second, the pattern correlation between the predicted signal and the truth is above 0.5. In addition to illustrating the prediction skill for the whole year, the prediction skill conditioned on the boreal summertime (June–September) is also emphasized. As expected, with the decrease in q, the overall prediction skill deteriorates. Nevertheless, conditioned on the boreal summertime, the prediction with
c. Significant prediction skill of the precipitation MISO indices with parameters calibrated from the OLR dataset
Most tropical rainfall is convective, which implies that OLR, a proxy for convection, is a potential candidate for describing precipitation in the tropics. Positive (negative) OLR anomalies are associated with reduced (increased) cloudiness and hence with suppressed (enhanced) deep convection. Because of the strong relationship between OLR and tropical precipitation anomalies in describing the MISO, it is important to compare the MISO modes based on OLR and on precipitation, and to assess the skill of the low-order nonlinear stochastic model [(1) and (2)] in predicting the MISO indices with parameters calibrated from the OLR dataset.
In Chen and Majda (2015a), the low-order nonlinear stochastic model [(1) and (2)] was adopted to predict the two boreal summer intraseasonal oscillation (BSISO) modes obtained by applying NLSA to brightness temperature, a variable highly correlated with OLR, within the tropical belt from 15°S to 30°N. The dataset used there was the Cloud Archive User Service (CLAUS) version 4.7. As shown in Székely et al. (2016a), the spatial patterns of the BSISO modes initiate in the Indian Ocean and propagate northeastward, essentially the same as the MISO precipitation modes used here (Sabeerali et al. 2017). Figure 12 compares the time series and the associated statistics of the OLR BSISO indices with those of the MISO precipitation indices. In addition to the intermittent time series, the power spectra, autocorrelation functions, and cross-correlation functions of the two kinds of indices are all quite similar. Both the OLR BSISO and MISO precipitation indices have non-Gaussian fat-tailed PDFs, although the variance of the OLR indices is relatively smaller.
Next, instead of comparing the predictability limits of the OLR and precipitation indices with their respective optimal parameters, a cross-validation type of experiment is adopted here to assess the prediction skill. Namely, the parameters calibrated for the OLR BSISO in Chen and Majda (2015a) are used in the low-order model [(1) and (2)] to predict the precipitation MISO indices. For simplicity, these are referred to as the OLR-based parameters. The OLR-based parameters are listed in the second row of Table 1 with two minor modifications. First, since the time series in Chen and Majda (2015a) started in September instead of January, the phase parameter ϕ from Chen and Majda (2015a) is modified accordingly. Second, because of the general negative correlation between OLR and precipitation, the sign of the oscillation frequency a from Chen and Majda (2015a) is flipped. As shown in Table 1, the OLR-based parameters are quite similar to the optimal parameters used in the previous sections.
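The two modifications can be sketched as a small parameter transformation. This assumes that ϕ enters the model through a seasonal term of the form sin(ω_f t + ϕ) with ω_f = 2π/365 day⁻¹, and that moving the time origin from 1 September to 1 January corresponds to a shift of 122 days. Both are assumptions made for illustration, and the dictionary keys are hypothetical.

```python
import numpy as np

def olr_to_precip_params(olr_params, day_shift=122, year_length=365.0):
    """Hypothetical transfer of OLR-calibrated parameters to the precipitation indices.

    - flip the sign of the oscillation frequency `a` (OLR and rainfall anomalies
      are negatively correlated);
    - shift the seasonal phase `phi` so that sin(w_f * t_old + phi_old) equals
      sin(w_f * t_new + phi_new), where t_old = t_new + day_shift
      (1 Sep -> 1 Jan is 30 + 31 + 30 + 31 = 122 days).
    """
    p = dict(olr_params)                    # leave the input dictionary unchanged
    p["a"] = -p["a"]
    p["phi"] = float(np.mod(p["phi"] + 2 * np.pi * day_shift / year_length, 2 * np.pi))
    return p
```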
Figure 13 compares the 25-day lead predictions of the low-order model with the optimal parameters calibrated in section 3b and with the parameters taken from the OLR data in Chen and Majda (2015a). The RMS error and pattern correlation of the prediction using the OLR-based parameters remain nearly the same as those using the optimal parameters. This, together with the comparable statistics shown in Fig. 12, confirms a strong (negative) correlation between OLR and precipitation anomalies in describing and predicting the MISO (Sabeerali et al. 2017). Based on these findings, it would also be interesting and valuable to develop a low-order model using combined OLR and precipitation data, which is left for future work.
It is worth pointing out that, since the variance of the OLR BSISO indices is smaller than that of the MISO precipitation indices, the prediction with the OLR-based parameters tends to underestimate the amplitude of the MISO variability. For example, Figs. 13d and 13e show that the peaks in July 2008, June 2009, and August 2013 obtained with the OLR-based parameters are slightly weaker than those obtained with the optimal parameters. Since these extreme events are usually associated with the nonlinear and non-Gaussian features of the underlying system, errors in the extreme events may not be accurately captured by linear measures such as the RMS error and pattern correlation. In contrast, the underestimation of the amplitude is clearly indicated by the information measure [see (A1)] (Chen and Majda 2015b; Majda and Gershgorin 2010, 2011), which assesses the lack of information in the PDFs, as shown in Fig. 13c.
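To illustrate why an information measure flags an amplitude deficit that the RMS error and pattern correlation can miss, assume that (A1) is the relative entropy between the true PDF p and the model PDF p^M, as in Majda and Gershgorin (2010). For Gaussian approximations of the two PDFs it has a closed form. This is only an illustration, since the actual MISO PDFs are non-Gaussian:

$$
\mathcal{P}(p, p^M) = \int p \ln\frac{p}{p^M}\,dx = \ln\frac{\sigma_M}{\sigma} + \frac{\sigma^2 + (\mu - \mu_M)^2}{2\sigma_M^2} - \frac{1}{2}.
$$

Consider hypothetical values with equal means, a true standard deviation σ = 1, and an underestimated model standard deviation σ_M = 0.8. Then the formula gives ln 0.8 + 1/1.28 - 0.5 ≈ 0.058 > 0, so the variance deficit alone produces a strictly positive information loss even when the mean is perfect.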
7. Conclusions
A recently developed nonlinear data analysis technique NLSA (Giannakis and Majda 2012a,b, 2013) has been applied to the raw daily GPCP rainfall dataset without detrending or spatiotemporal filtering (Sabeerali et al. 2017). The resulting MISO precipitation mode contains two time series that have non-Gaussian fat-tailed PDFs as a consequence of intermittency. We predict the large-scale MISO precipitation in two steps.
In the first step, a physics-constrained nonlinear stochastic model (Majda and Harlim 2013; Harlim et al. 2014) is developed to calibrate and predict the MISO indices. This physics-constrained low-order stochastic model contains two MISO variables and two hidden variables that interact through energy-conserving nonlinear terms, and it involves both correlated multiplicative noise and additive stochastic noise. The model succeeds in capturing the observed non-Gaussian PDFs, power spectra, and autocorrelations of the MISO indices. An effective data assimilation algorithm that determines the initial ensemble of the hidden variables facilitates the ensemble prediction scheme. It is shown in section 4 that the low-order nonlinear stochastic model skillfully predicts the MISO indices, with useful lead times ranging from 20 to 50 days depending on the year.
In the second step, an effective and practical spatiotemporal reconstruction algorithm is developed (section 5), which overcomes a fundamental difficulty of most data decomposition techniques with lagged embedding, namely that the reconstruction requires information beyond the predicted range of the time series. The prediction skill of the reconstructed spatiotemporal patterns is consistent with that of the MISO indices.
A few issues are addressed in section 6. First, the model calibration and prediction with a short 3-yr training period are studied. The resulting statistics and prediction skill do not deteriorate significantly compared with those based on a 10-yr training period. This suggests, from a practical point of view, an advantage of the low-order nonlinear model [(1)] over most nonparametric methods in predicting the MISO indices (Alexander et al. 2017). Second, the NLSA MISO indices obtained with different lagged embedding window sizes are compared. The resulting MISO indices with shorter lagged embedding window sizes (
The simple spatiotemporal reconstruction strategy proposed in this article does not account for differences among the spatial patterns conditioned on different MISO phases. Applying a phase decomposition method to the NLSA spatial modes is therefore a potential way to improve the spatiotemporal reconstruction in the prediction stage. Yet, the phase decomposition method has an obvious drawback: the predicted spatiotemporal patterns are discontinuous in time whenever the corresponding spatial basis switches from one phase to another. One remedy is to introduce a smooth transition between phases, for example through a convolution with a Gaussian kernel. Alternatively, the clustering method (Giannakis et al. 2012a) is a promising technique for recovering more detailed features of the spatial basis conditioned on different phases. In addition, exploring the causal relationships between the MISO and other modes is another potential way to improve MISO predictions. These strategies remain future work.
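One possible reading of the Gaussian-kernel remedy is sketched below in Python. It assumes that phase-composite spatial patterns are available for n equally spaced MISO phases, defined by the angle of the two-component index (u1, u2), and it replaces the hard phase assignment with Gaussian weights in the wrapped angular distance to each phase center, scaled by the index amplitude. The phase convention, the width parameter, the amplitude scaling, and the function names are assumptions for illustration, not a method evaluated in this article.

```python
import numpy as np


def phase_centers(n_phases):
    """Centers of n equal phase bins covering (-pi, pi]."""
    return -np.pi + (np.arange(n_phases) + 0.5) * 2.0 * np.pi / n_phases


def hard_pattern(u1, u2, phase_patterns):
    """Phase decomposition: use the composite of the bin containing the index angle."""
    n_phases = phase_patterns.shape[0]
    theta = np.arctan2(u2, u1)
    k = int(np.floor((theta + np.pi) / (2.0 * np.pi / n_phases))) % n_phases
    return np.hypot(u1, u2) * phase_patterns[k]


def blended_pattern(u1, u2, phase_patterns, width):
    """Smooth alternative: Gaussian weights in wrapped angular distance to each center."""
    theta = np.arctan2(u2, u1)
    dist = np.angle(np.exp(1j * (theta - phase_centers(phase_patterns.shape[0]))))
    w = np.exp(-0.5 * (dist / width) ** 2)
    return np.hypot(u1, u2) * ((w / w.sum()) @ phase_patterns)


patterns = np.eye(8)  # stand-in composites: one "grid point" per phase
b = -np.pi + 2.0 * np.pi / 8  # boundary between phases 0 and 1
below = (np.cos(b - 1e-6), np.sin(b - 1e-6))
above = (np.cos(b + 1e-6), np.sin(b + 1e-6))
jump_hard = np.abs(hard_pattern(*above, patterns) - hard_pattern(*below, patterns)).max()
jump_smooth = np.abs(blended_pattern(*above, patterns, 0.3)
                     - blended_pattern(*below, patterns, 0.3)).max()
```

Here jump_hard measures the jump of the hard phase-decomposition pattern across a phase boundary, while jump_smooth is the corresponding change of the blended pattern; the width parameter controls how far the blending reaches into neighboring phases.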
Acknowledgments
The research of A.J.M. is partially supported by the Office of Naval Research Grant ONR MURI N00014-16-1-2161 and the New York University Abu Dhabi Research Institute. N.C. is supported as a postdoctoral fellow under A.J.M.’s ONR MURI grant. C.T.S., R.S.A., and A.J.M. also acknowledge support from the Monsoon Mission of the Ministry of Earth Sciences (MoES), Government of India (Grant MM/SERP/NYU/2014/SSC-01/002). The research of C.T.S. and R.S.A. is also supported by the New York University Abu Dhabi Research Institute. The authors thank Dimitrios Giannakis for useful discussions.
APPENDIX A
Calibration of the Nonlinear Stochastic Model with Information Theory
The sensitivity of the prediction to the parameters is studied by randomly drawing suboptimal parameters from the interval bounded by the two dotted lines in each panel of Fig. A1. The prediction skill obtained with these randomly drawn suboptimal parameters is comparable to that obtained with the optimal parameters.
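Information-theoretic calibration in this line of work typically uses relative entropy (Kleeman 2002; Majda and Gershgorin 2010, 2011) to measure how far a model PDF is from the observed one. The Python sketch below shows a histogram estimate of the relative entropy between two samples, together with the random drawing of suboptimal parameter sets used in the sensitivity test above. The exact cost function of appendix A is not reproduced here; the bin count, the probability floor, the parameter names, the intervals, and the Student's t stand-in for a fat-tailed index are hypothetical.

```python
import numpy as np


def relative_entropy(p_samples, q_samples, bins=50, floor=1e-12):
    """Histogram estimate of P(p, q) = sum_k p_k log(p_k / q_k) on a shared grid.

    A small floor keeps log(p/q) finite in bins where q has no samples.
    """
    lo = min(p_samples.min(), q_samples.min())
    hi = max(p_samples.max(), q_samples.max())
    edges = np.linspace(lo, hi, bins + 1)
    p = np.histogram(p_samples, bins=edges)[0] + floor
    q = np.histogram(q_samples, bins=edges)[0] + floor
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))


def draw_suboptimal(bounds, n_draws, rng):
    """Draw parameter sets uniformly from per-parameter intervals {name: (lo, hi)}."""
    return [{name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
            for _ in range(n_draws)]


rng = np.random.default_rng(3)
observed = rng.standard_t(df=4, size=20000)  # stand-in for a fat-tailed MISO index
gaussian_fit = rng.normal(0.0, observed.std(), size=20000)
kl = relative_entropy(observed, gaussian_fit)
bounds = {"d_u": (0.6, 1.0), "gamma": (0.8, 1.2)}  # hypothetical intervals
draws = draw_suboptimal(bounds, n_draws=5, rng=rng)
```

Each entry of draws is one suboptimal parameter set; in a sensitivity test of this kind, the model would be rerun with each set and its prediction skill compared with that of the optimal parameters.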
APPENDIX B
Mathematical Details of Effective Data Assimilation and Prediction Algorithm
Figure B1 shows the posterior mean and posterior variance of the stochastic damping υ and the stochastic phase
APPENDIX C
Spatiotemporal Patterns with Different Lagged Embedding Window Sizes
Figures C1–C3 show the spatiotemporal patterns for boreal summer (June–September) 2004 obtained by NLSA with lagged embedding window sizes
REFERENCES
Abhilash, S., A. K. Sahai, S. Pattnaik, B. N. Goswami, and A. Kumar, 2014a: Extended range prediction of active-break spells of Indian summer monsoon rainfall using an ensemble prediction system in NCEP Climate Forecast System. Int. J. Climatol., 34, 98–113, https://doi.org/10.1002/joc.3668.
Abhilash, S., and Coauthors, 2014b: Prediction and monitoring of monsoon intraseasonal oscillations over Indian monsoon region in an ensemble prediction system using CFSv2. Climate Dyn., 42, 2801–2815, https://doi.org/10.1007/s00382-013-2045-9.
Acharya, N., S. C. Kar, U. Mohanty, M. A. Kulkarni, and S. K. Dash, 2011: Performance of GCMs for seasonal prediction over India—A case study for 2009 monsoon. Theor. Appl. Climatol., 105, 505–520, https://doi.org/10.1007/s00704-010-0396-2.
Alexander, R., Z. Zhao, E. Székely, and D. Giannakis, 2017: Kernel analog forecasting of tropical intraseasonal oscillations. J. Atmos. Sci., 74, 1321–1342, https://doi.org/10.1175/JAS-D-16-0147.1.
Belkin, M., and P. Niyogi, 2003: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput., 15, 1373–1396, https://doi.org/10.1162/089976603321780317.
Branicki, M., N. Chen, and A. J. Majda, 2013: Non-Gaussian test models for prediction and state estimation with model errors. Chin. Ann. Math., 34B, 29–64, https://doi.org/10.1007/s11401-012-0759-3.
Brenowitz, N. D., D. Giannakis, and A. J. Majda, 2016: Nonlinear Laplacian spectral analysis of Rayleigh–Bénard convection. J. Comput. Phys., 315, 536–553, https://doi.org/10.1016/j.jcp.2016.03.051.
Chen, N., and A. J. Majda, 2015a: Predicting the cloud patterns for the boreal summer intraseasonal oscillation through a low-order stochastic model. Math. Climate Wea. Forecasting, 1, 1–20, https://doi.org/10.1515/mcwf-2015-0001.
Chen, N., and A. J. Majda, 2015b: Predicting the real-time multivariate Madden–Julian oscillation index through a low-order nonlinear stochastic model. Mon. Wea. Rev., 143, 2148–2169, https://doi.org/10.1175/MWR-D-14-00378.1.
Chen, N., and A. J. Majda, 2016: Filtering nonlinear turbulent dynamical systems through conditional Gaussian statistics. Mon. Wea. Rev., 144, 4885–4917, https://doi.org/10.1175/MWR-D-15-0437.1.
Chen, N., A. J. Majda, and D. Giannakis, 2014: Predicting the cloud patterns of the Madden–Julian oscillation through a low-order nonlinear stochastic model. Geophys. Res. Lett., 41, 5612–5619, https://doi.org/10.1002/2014GL060876.
Coifman, R. R., and S. Lafon, 2006: Diffusion maps. Appl. Comput. Harmon. Anal., 21, 5–30, https://doi.org/10.1016/j.acha.2006.04.006.
Crommelin, D. T., and A. J. Majda, 2004: Strategies for model reduction: Comparing different optimal bases. J. Atmos. Sci., 61, 2206–2217, https://doi.org/10.1175/1520-0469(2004)061<2206:SFMRCD>2.0.CO;2.
DelSole, T., and J. Shukla, 2002: Linear prediction of Indian monsoon rainfall. J. Climate, 15, 3645–3658, https://doi.org/10.1175/1520-0442(2002)015<3645:LPOIMR>2.0.CO;2.
Gadgil, S., 2003: The Indian monsoon and its variability. Annu. Rev. Earth Planet. Sci., 31, 429–467, https://doi.org/10.1146/annurev.earth.31.100901.141251.
Ghil, M., and Coauthors, 2002: Advanced spectral methods for climatic time series. Rev. Geophys., 40, 1003, https://doi.org/10.1029/2000RG000092.
Giannakis, D., and A. J. Majda, 2011: Time series reconstruction via machine learning: Revealing decadal variability and intermittency in the North Pacific sector of a coupled climate model. 2011 Conf. on Intelligent Data Understanding (CIDU), Mountain View, CA, NASA, 107–117.
Giannakis, D., and A. J. Majda, 2012a: Comparing low-frequency and intermittent variability in comprehensive climate models through nonlinear Laplacian spectral analysis. Geophys. Res. Lett., 39, L10710, https://doi.org/10.1029/2012GL051575.
Giannakis, D., and A. J. Majda, 2012b: Nonlinear Laplacian spectral analysis for time series with intermittency and low-frequency variability. Proc. Natl. Acad. Sci. USA, 109, 2222–2227, https://doi.org/10.1073/pnas.1118984109.
Giannakis, D., and A. J. Majda, 2013: Nonlinear Laplacian spectral analysis: Capturing intermittent and low-frequency spatiotemporal patterns in high-dimensional data. Stat. Anal. Data Min., 6, 180–194, https://doi.org/10.1002/sam.11171.
Giannakis, D., A. J. Majda, and I. Horenko, 2012a: Information theory, model error, and predictive skill of stochastic models for complex nonlinear systems. Physica D, 241, 1735–1752, https://doi.org/10.1016/j.physd.2012.07.005.
Giannakis, D., W.-W. Tung, and A. J. Majda, 2012b: Hierarchical structure of the Madden–Julian oscillation in infrared brightness temperature revealed through nonlinear Laplacian spectral analysis. 2012 Conf. on Intelligent Data Understanding, Boulder, CO, NASA, 55–62, https://doi.org/10.1109/CIDU.2012.6382201.
Golyandina, N., V. Nekrutkin, and A. A. Zhigljavsky, 2001: Analysis of Time Series Structure: SSA and Related Techniques. CRC Press, 320 pp.
Goswami, B. N., and R. S. Ajayamohan, 2001: Intraseasonal oscillations and interannual variability of the Indian summer monsoon. J. Climate, 14, 1180–1198, https://doi.org/10.1175/1520-0442(2001)014<1180:IOAIVO>2.0.CO;2.
Goswami, B. N., V. Krishnamurthy, and H. Annamalai, 1999: A broad-scale circulation index for the interannual variability of the Indian summer monsoon. Quart. J. Roy. Meteor. Soc., 125, 611–633, https://doi.org/10.1002/qj.49712555412.
Goswami, B. N., R. S. Ajayamohan, P. K. Xavier, and D. Sengupta, 2003: Clustering of synoptic activity by Indian summer monsoon intraseasonal oscillations. Geophys. Res. Lett., 30, 1431, https://doi.org/10.1029/2002GL016734.
Harlim, J., A. Mahdi, and A. J. Majda, 2014: An ensemble Kalman filter for statistical estimation of physics constrained nonlinear regression models. J. Comput. Phys., 257, 782–812, https://doi.org/10.1016/j.jcp.2013.10.025.
Huffman, G. J., R. F. Adler, M. M. Morrissey, D. T. Bolvin, S. Curtis, R. Joyce, B. McGavock, and J. Susskind, 2001: Global precipitation at one-degree daily resolution from multisatellite observations. J. Hydrometeor., 2, 36–50, https://doi.org/10.1175/1525-7541(2001)002<0036:GPAODD>2.0.CO;2.
Kikuchi, K., B. Wang, and Y. Kajikawa, 2012: Bimodal representation of the tropical intraseasonal oscillation. Climate Dyn., 38, 1989–2000, https://doi.org/10.1007/s00382-011-1159-1.
Kleeman, R., 2002: Measuring dynamical prediction utility using relative entropy. J. Atmos. Sci., 59, 2057–2072, https://doi.org/10.1175/1520-0469(2002)059<2057:MDPUUR>2.0.CO;2.
Kondrashov, D., M. D. Chekroun, A. W. Robertson, and M. Ghil, 2013: Low-order stochastic model and “past-noise forecasting” of the Madden–Julian oscillation. Geophys. Res. Lett., 40, 5305–5310, https://doi.org/10.1002/grl.50991.
Kravtsov, S., D. Kondrashov, and M. Ghil, 2005: Multilevel regression modeling of nonlinear processes: Derivation and applications to climatic variability. J. Climate, 18, 4404–4424, https://doi.org/10.1175/JCLI3544.1.
Krishnamurthy, V., and J. Shukla, 2007: Intraseasonal and seasonally persisting patterns of Indian monsoon rainfall. J. Climate, 20, 3–20, https://doi.org/10.1175/JCLI3981.1.
Lau, W. K.-M., and D. E. Waliser, 2011: Intraseasonal Variability in the Atmosphere–Ocean Climate System. 2nd ed. Springer, 614 pp.
Lee, J.-Y., B. Wang, M. C. Wheeler, X. Fu, D. E. Waliser, and I.-S. Kang, 2013: Real-time multivariate indices for the boreal summer intraseasonal oscillation over the Asian summer monsoon region. Climate Dyn., 40, 493–509, https://doi.org/10.1007/s00382-012-1544-4.
Liptser, R. S., and A. N. Shiryaev, 2001: II. Applications. Vol. 2, Statistics of Random Processes, Springer, 402 pp.
Majda, A. J., and B. Gershgorin, 2010: Quantifying uncertainty in climate change science through empirical information theory. Proc. Natl. Acad. Sci. USA, 107, 14 958–14 963, https://doi.org/10.1073/pnas.1007009107.
Majda, A. J., and B. Gershgorin, 2011: Improving model fidelity and sensitivity for complex systems through empirical information theory. Proc. Natl. Acad. Sci. USA, 108, 10 044–10 049, https://doi.org/10.1073/pnas.1105174108.
Majda, A. J., and M. Branicki, 2012: Lessons in uncertainty quantification for turbulent dynamical systems. Discrete Contin. Dyn. Syst., 32, 3133–3221, https://doi.org/10.3934/dcds.2012.32.3133.
Majda, A. J., and Y. Yuan, 2012: Fundamental limitations of ad hoc linear and quadratic multi-level regression models for physical systems. Discrete Contin. Dyn. Syst., 17B, 1333–1363, https://doi.org/10.3934/dcdsb.2012.17.1333.
Majda, A. J., and J. Harlim, 2013: Physics constrained nonlinear regression models for time series. Nonlinearity, 26, 201–217, https://doi.org/10.1088/0951-7715/26/1/201.
Murakami, T., L.-X. Chen, and A. Xie, 1986: Relationship among seasonal cycles, low-frequency oscillations, and transient disturbances as revealed from outgoing longwave radiation data. Mon. Wea. Rev., 114, 1456–1465, https://doi.org/10.1175/1520-0493(1986)114<1456:RASCLF>2.0.CO;2.
Nair, A., U. Mohanty, A. W. Robertson, T. Panda, J.-J. Luo, and T. Yamagata, 2014: An analytical study of hindcasts from general circulation models for Indian summer monsoon rainfall. Meteor. Appl., 21, 695–707, https://doi.org/10.1002/met.1395.
Packard, N. H., J. P. Crutchfield, J. D. Farmer, and R. S. Shaw, 1980: Geometry from a time series. Phys. Rev. Lett., 45, 712, https://doi.org/10.1103/PhysRevLett.45.712.
Pattanaik, D. R., and A. Kumar, 2010: Prediction of summer monsoon rainfall over India using the NCEP Climate Forecast System. Climate Dyn., 34, 557–572, https://doi.org/10.1007/s00382-009-0648-y.
Rajeevan, M., D. S. Pai, R. A. Kumar, and B. Lal, 2007: New statistical models for long-range forecasting of southwest monsoon rainfall over India. Climate Dyn., 28, 813–828, https://doi.org/10.1007/s00382-006-0197-6.
Sabeerali, C. T., R. S. Ajayamohan, D. Giannakis, and A. J. Majda, 2017: Extraction and prediction of indices for monsoon intraseasonal oscillations: An approach based on nonlinear Laplacian spectral analysis. Climate Dyn., 49, 3031–3050, https://doi.org/10.1007/s00382-016-3491-y.
Sahai, A. K., and Coauthors, 2013: Simulation and extended range prediction of monsoon intraseasonal oscillations in NCEP CFS/GFS version 2 framework. Curr. Sci., 104, 1394–1408.
Sauer, T., J. A. Yorke, and M. Casdagli, 1991: Embedology. J. Stat. Phys., 65, 579–616, https://doi.org/10.1007/BF01053745.
Sikka, D. R., and S. Gadgil, 1980: On the maximum cloud zone and the ITCZ over Indian longitudes during the southwest monsoon. Mon. Wea. Rev., 108, 1840–1853, https://doi.org/10.1175/1520-0493(1980)108<1840:OTMCZA>2.0.CO;2.
Slawinska, J., and D. Giannakis, 2017: Indo-Pacific variability on seasonal to multidecadal time scales. Part I: Intrinsic SST modes in models and observations. J. Climate, 30, 5265–5294, https://doi.org/10.1175/JCLI-D-16-0176.1.
Suhas, E., J. M. Neena, and B. N. Goswami, 2013: An Indian monsoon intraseasonal oscillations (MISO) index for real time monitoring and forecast verification. Climate Dyn., 40, 2605–2616, https://doi.org/10.1007/s00382-012-1462-5.
Székely, E., D. Giannakis, and A. J. Majda, 2016a: Extraction and predictability of coherent intraseasonal signals in infrared brightness temperature data. Climate Dyn., 46, 1473–1502, https://doi.org/10.1007/s00382-015-2658-2.
Székely, E., D. Giannakis, and A. J. Majda, 2016b: Initiation and termination of intraseasonal oscillations in nonlinear Laplacian spectral analysis-based indices. Math. Climate Wea. Forecasting, 2, 1–25, https://doi.org/10.1515/mcwf-2016-0001.
Takens, F., 1981: Detecting strange attractors in turbulence. Lect. Notes Math., 898, 366–381, https://doi.org/10.1007/BFb0091924.
Thomson, D. J., 1982: Spectrum estimation and harmonic analysis. Proc. IEEE, 70, 1055–1096, https://doi.org/10.1109/PROC.1982.12433.
Tung, W.-W., D. Giannakis, and A. J. Majda, 2014: Symmetric and antisymmetric convection signals in the Madden–Julian oscillation. Part I: Basic modes in infrared brightness temperature. J. Atmos. Sci., 71, 3302–3326, https://doi.org/10.1175/JAS-D-13-0122.1.
Wang, B., Q. Ding, X. Fu, I.-S. Kang, K. Jin, J. Shukla, and F. Doblas-Reyes, 2005: Fundamental challenge in simulation and prediction of summer monsoon rainfall. Geophys. Res. Lett., 32, L15711, https://doi.org/10.1029/2005GL022734.
Webster, P. J., V. O. Magaña, T. N. Palmer, J. Shukla, R. A. Tomas, M. Yanai, and T. Yasunari, 1998: Monsoons: Processes, predictability, and the prospects for prediction. J. Geophys. Res., 103, 14 451–14 510, https://doi.org/10.1029/97JC02719.
Wheeler, M. C., and H. H. Hendon, 2004: An all-season real-time multivariate MJO index: Development of an index for monitoring and prediction. Mon. Wea. Rev., 132, 1917–1932, https://doi.org/10.1175/1520-0493(2004)132<1917:AARMMI>2.0.CO;2.
Zhang, C., 2005: Madden–Julian Oscillation. Rev. Geophys., 43, RG2003, https://doi.org/10.1029/2004RG000158.