## Abstract

Motivated by an attempt to augment dynamical models in predicting the Madden–Julian oscillation (MJO), and to provide a realistic benchmark to those models, the predictive skill of a multivariate lag-regression statistical model has been comprehensively explored in the present study. The predictors of the benchmark model are the projection time series of the leading pair of EOFs of the combined fields of equatorially averaged outgoing longwave radiation (OLR) and zonal winds at 850 and 200 hPa, derived using the approach of Wheeler and Hendon. These multivariate EOFs serve as an effective filter for the MJO without the need for bandpass filtering, making the statistical forecast scheme feasible for the real-time use. Another advantage of this empirical approach lies in the consideration of the seasonal dependence of the regression parameters, making it applicable for forecasts all year-round. The forecast model exhibits useful extended-range skill for a real-time MJO forecast. Predictions with a correlation skill of greater than 0.3 (0.5) between predicted and observed unfiltered (EOF filtered) fields still can be detected over some regions at a lead time of 15 days, especially for boreal winter forecasts. This predictive skill is increased significantly when there are strong MJO signals at the initial forecast time. The analysis also shows that predictive skill for the upper-tropospheric winds is relatively higher than for the low-level winds and convection signals. Finally, the capability of this empirical model in predicting the MJO is further demonstrated by a case study of a real-time “hindcast” during the 2003/04 winter. Predictive skill demonstrated in this study provides an estimate of the predictability of the MJO and a benchmark for the dynamical extended-range models.

## 1. Introduction

Since its discovery in the early 1970s, the significant role of the Madden–Julian oscillation (MJO; Madden and Julian 1994) as a component of the tropical variability has been widely recognized. The MJO activities have been found to be intimately associated with onset and active/break conditions of the Asian (Yasunari 1979; Lau and Chan 1986; Goswami 2005; Waliser 2006) and Australian (Hendon and Liebmann 1990; Wheeler and McBride 2005) monsoons. The MJO significantly modulates tropical cyclone genesis (Maloney and Hartmann 2000; Mo 2000; Higgins and Shi 2001). Moreover, the westerly wind burst associated with the enhanced MJO phase over the western Pacific may act as stochastic forcing in triggering El Niño–Southern Oscillation (ENSO; e.g., Moore and Kleeman 1999; McPhaden 1999; Kessler and Kleeman 2000). The direct impacts of the MJO on the evolution of extratropics through the convective heating variability are also evident (e.g., Weickmann 1983; Liebmann and Hartmann 1984). Ferranti et al. (1990) demonstrated that the skill of European Centre for Medium-Range Weather Forecasts in the extratropics is significantly improved when the errors associated with the representation of the tropical intraseasonal oscillation are minimized.

In addition to short-range weather prediction with typical lead time of days, and seasonal-to-interannual climate prediction with typical lead times of seasons, recently there has been growing interest in subseasonal forecasts with lead times on the order of weeks (e.g., Winkler et al. 2001; Schubert et al. 2002; Waliser et al. 2003a; Waliser 2005). In particular, the important role of the MJO as a basis for developing and exploiting subseasonal predictions has been highlighted (Schubert et al. 2002). These growing interests in the subseasonal forecast led to the recent development of an experimental MJO prediction project (Waliser et al. 2006) as well as other real-time efforts (e.g., Webster and Hoyos 2004; see also information online at http://www.icess.ucsb.edu/asr/mjo_forecasts.htm, http://www.cdc.noaa.gov/map/lim, http://www.bom.gov.au/bmrc/clfor/cfstaff/matw/maproom/OLR_modes/index.htm, http://www.bom.gov.au/bmrc/clfor/cfstaff/matw/maproom/RMM).

It is an intuitive notion that the predictability of a phenomenon is proportional to its own period or lifetime (e.g., Van den Dool and Saha 1990), namely, approximately 50 days for the MJO. However, current operational numerical weather prediction (NWP) models show rather limited predictive skill for the MJO, with useful skill only up to lead times of about 7–10 days (e.g., Waliser et al. 1999; Hendon et al. 2000; Jones et al. 2000; Waliser 2005; Seo et al. 2005). This limited predictive skill for the MJO in current NWP models mainly could be ascribed to the model deficiencies in representing the deep convection of the MJO, rather than the reaching of an intrinsic limit of predictability of the MJO. Thus, in order to explore the theoretical limit of predictability for the MJO, a natural avenue adopted was the development of empirical models [see Waliser (2005, 2006 for more complete discussion].

The first study based on the empirical model of the MJO was conducted by Von Storch and Xu (1990). They developed a scheme based on principal oscillation pattern analysis of equatorial 200-hPa velocity potential anomalies. Several years later, as an attempt to explore the feasibility of employing such empirical models to augment operational long-range forecasting procedures, Waliser et al. (1999) assessed the predictive skills of the empirical models based on a field-to-field singular value decomposition (SVD) of lagged maps of 30–70-day filtered outgoing longwave radiation (OLR) and upper-tropospheric zonal wind. Lo and Hendon (2000), then, developed a forecast scheme by using the first leading pair of empirical orthogonal functions (EOFs) of OLR and three leading EOFs of 200-hPa streamfunction. Jones et al. (2004), on the other hand, used a combined EOF analysis of 20–70-day bandpass-filtered OLR, 850- and 200-hPa zonal winds, and multiple lag regression to assess forecast skill over the bulk of the tropics. The model utilizes the first five principal components (PCs) from the combined EOF analysis and the five most recent values of the PCs. With quite a different approach, Mo (2001) used a combination of singular spectral analysis and maximum entrophy methods for the monitoring and forecast of the OLR. Meanwhile, Wheeler and Weickmann (2001) conducted prediction of convectively coupled tropical modes by employing filtering techniques based on tropical wave theory.

The forecast skill of the MJO obtained by these empirical approaches is generally on the order of 20–25 days of lead time for the dynamical fields and about 15 days for rainfall–OLR, surpassing performances by most dynamical NWP models. While this result seems rather encouraging for subseasonal forecasting, the main hurdle of these approaches in the real-time application is considered to be the extraction of the low-frequency signals without the use of temporal filtering as utilized in most of the aforementioned schemes. For this purpose, Wheeler and Hendon (2004, hereinafter WH04) developed an MJO index based on the first two combined EOFs of equatorially averaged OLR and zonal winds at 850 and 200 hPa, from which the annual cycle and an estimate of the interannual variability have been subtracted. Their method removes the necessity to perform time filtering to identify the MJO, making it feasible for the real-time MJO monitoring–forecast. Based on this approach, Maharaj and Wheeler (2005) predicted the time series of two leading PCs based on an autoregressive model. While this approach has been implemented in an operational forecast for the MJO at the Australian Bureau of Meteorological Research Center (BMRC), a comprehensive documentation of predictive skills based on this approach is still missing.

In the present study, we attempt to report in detail the predictive skills of an empirical forecast model for the MJO based on WH04’s approach. Such work is necessary in light of the following rationale: (a) to provide assessment of how well a statistical model might perform for the *real-time* forecast of the MJO; (b) to quantify the potential predictability of the MJO for the subseasonal forecast; (c) to provide a more realistic benchmark by which to assess the predictive skill of extended-range prediction from numerical forecast models; and (d) to explore the feasibility of an experimental study that augments a dynamical forecast model via assimilation of an empirical forecast, as will be discussed in the summary section. The organization of this paper is as follows. The datasets and methodology of a combined EOF employed for this study are described in section 2. In section 3, details of the empirical forecast model and predictive skill will be described. Then, sensitivity of the predictive skill to the number of leading PCs and number of the most recent values of the PCs in the forecast model, as well as its dependence on the MJO state in the initial condition, are illustrated in section 4. In section 5, application of this empirical model for the real-time forecast will be discussed. A summary and discussion are provided in section 6.

## 2. Data and methodology

### a. Datasets and initial analysis

Following WH04, we use the combined fields of OLR and zonal wind at the 850- and 200-hPa levels to construct an index of the MJO. The OLR data, a good proxy for tropical convection (Waliser et al. 1993), were processed by the National Oceanic and Atmospheric Administration (NOAA; Liebmann and Smith 1996). The zonal winds at 850 (u850) and 200 (u200) hPa are based on the National Centers for Environmental Prediction (NCEP)/Department of Energy (DOE) Global Reanalysis-2 (Kanamitsu et al. 2002). Both the OLR and wind data have horizontal resolution of 2.5° × 2.5° and cover the global tropics (30°S–30°N) for the 1983–2004 period. Additionally, for the purpose of removing the interannual variability from the data, the monthly mean NOAA Optimum Interpolation Sea Surface Temperature, version 2 (OISST V2; Reynolds et al. 2002), with a horizontal resolution of 1° × 1° during 1983–2004, is also used to construct an index of ENSO.

Before conducting the combined EOF analysis, the seasonal cycle (time mean and first three harmonics of climatological annual cycle for the period of 1983–2004) is removed from each grid point of OLR, u850, and u200 fields. Then, the interannual variability (IAV) associated with ENSO is removed. The removal of the IAV is necessary because time mean anomalies associated with the mature phase of El Niño–La Niña resemble the phase of the MJO when convection is centered near the date line–Maritime Continent (Lo and Hendon 2000; WH04). The IAV of each variable is obtained by the variability that is linearly related to an ENSO index, where the latter is based on the time series of the first rotated EOF of SSTs over the Indian and Pacific sectors (50°S–60°N, 30°E–70°W).^{1} Monthly regression parameters against the ENSO index for each variable at each grid point are calculated and then interpolated to daily values to form a 365-day seasonally dependent regression relationship. At any given time, the IAV component of each field can be derived based on the observed ENSO index and regression parameter at that particular day, and is subtracted from the value at each grid point. Finally, a 120-day mean of the previous 120 days is subtracted to further remove any information of IAV, decadal variability, and trends (WH04; also see http://www.bom.gov.au/bmrc/clfor/cfstaff/matw/maproom/RMM for more details of these procedures).

### b. Combined EOF analysis

After the above procedures for the removal of certain low-frequency components, an EOF analysis (Kutzbach 1967) is performed based on the correlation matrix^{2} of the combined daily fields of equatorially averaged (15°S–15°N) OLR, u850, and u200 over global longitudes for 1983–2004. This combined EOF yields a leading pair of PC time series that vary mostly on the intraseasonal time scale. It serves as an effective filter for the MJO without the need for time filtering, making the PC time series an effective index for real-time MJO monitoring/forecasting (WH04).

Figure 1 illustrates the leading pair of combined EOFs (OLR, u200, and u850). Immediately evident are the zonal wavenumber-1 and -2 structures of the MJO. The first EOF pattern shows enhanced convection over the eastern Indian Ocean–Maritime Continent, with perturbation westerlies (easterlies) in the lower (upper) troposphere on the western side of convection center; while the second EOF displays enhanced convection over the western Pacific with associated perturbation winds. The first and second EOFs contribute to 11% and 10% of the total combined variances, respectively, and are well separated from other EOFs based on North et al.’s (1982) formula (Fig. 2). It has been widely documented that these two leading EOFs represent the same propagating MJO mode at different phases. Thus, PC time series of the two leading EOFs are used as the MJO index, a subsample of which, for the period of 2003–04, is displayed in Figs. 3a,b. This MJO index has also been shown to be advantageous for a measure of interannual modulation of the MJO over that of some previous studies (WH04). The combined amplitude of the two PCs (Fig. 3c) is further used as the index of MJO amplitude, and it will be used to determine amplitudes of the MJO at the initial forecast time when performing real-time empirical forecasts of the MJO in the following sections.

## 3. Empirical forecast model and validation

### a. Multivariate lag-regression model

Based on PC time series obtained above, a linear lag-regression model is constructed. Although in the present study we will mainly demonstrate forecast results of OLR and zonal winds at 850 and 200 hPa, a similar approach can be easily applied for the forecasts of various variables and at multiple vertical levels. Note that the predictand in the forecast model is also subject to the removal of low-frequency components.

For predictand *X* at a particular grid point, the lag-regression model can be written as

where *t*_{0} is the time at the forecast point and *τ* is the forecast lead. Here, *N* is the number of total PCs included in this forecast model, *M* is the number of lagged days used for the prediction, *C _{j}*,

*is the lag-regression parameter for particular PC*

_{k}*at the*

_{k}*j*th day earlier than

*t*

_{0}. Note that there is one forecast model for

*X*at each grid point and each lead time

*τ*.

Because two leading EOFs are capable of capturing the essential features of the MJO as previously discussed, we mainly consider a baseline for the MJO forecast by only including the first two leading PCs at the most recent point in time, that is, *N* = 2, *M* = 1 in Eq. (1). Then, the forecast model at a particular grid point can be simplified as

where *β*_{1} and *β*_{2} are the lag-regression parameters of PC_{1} and PC_{2} at forecast time *t*_{0} against the predictand *X*, which are obtained based on independent historical observations. Note that seasonally varying lag-regression parameters are considered in this forecast model, that is, the monthly based lag-regression coefficients are calculated with the observed variable *X* and corresponding PCs in that particular month. (The data points for each regression are about 600 and slightly vary with month and time lag.) Then, the monthly based regression coefficients are linearly interpolated to daily values for 365 days of a year.

### b. Model skills

In this section, the predictive skill of the forecast model for OLR, u850, and u200 for the 1983–2004 period is assessed. A cross-validation approach is adopted in order to efficiently exploit the limited observations, that is, when conducting the predictions for a particular year, the observations during the 1983–2004 period except that particular year are used to obtain the lag-regression coefficients *β*_{1} and *β*_{2} in Eq. (2). Then, with the observed PC time series in that year, prediction can be conducted based on the forecast model.

Because the MJO has the maximum amplitude over the tropical region, the forecast model is confined to the global tropics (30°S–30°N). The prediction has been performed at each grid point for each variable, including OLR, u850, and u200, at every fifth day from the beginning of each month for a 30-day-period daily forecast. The statistics of predictive skill for the boreal summer season are calculated based on forecasts starting from 1 June to 31 August, and those for boreal winter are based on forecasts from 1 December to the end of February.

Figure 4 illustrates model predictive skill for OLR (solid lines) by providing pattern correlation coefficients between forecasts and observed EOF-filtered OLR patterns as a function of forecast lead. The skill obtained by persistence is also shown by dashed lines. Because the combined EOF is based on equatorially averaged fields, the 2D spatial pattern of EOF-filtered OLR is defined based on the linear regression equation of OLR against two leading PCs at lag 0, that is, Eq. (2) with *τ* = 0. Given the observed parameters of two leading PCs and their corresponding simultaneous regression coefficients, the OLR perturbation value can be obtained at each grid point. The pattern correlations shown in Fig. 4 are calculated over the global tropics (30°S–30°N) at the original resolution of the datasets, that is, 2.5° × 2.5° for both OLR and zonal wind fields as displayed in the following figures.^{3} The predictive skill for boreal summer [June–August (JJA)] and winter [December–February (DJF)] is displayed separately. It is obvious that the forecast model exhibits superior skill over the persistence for both winter and summer predictions. Also clearly evident is that the predictive skill for the MJOs during winter is generally higher than that during summer, which is in accord with the perception that MJOs are generally better organized and exhibit stronger amplitudes during boreal winter. The pattern correlation coefficient for the winter prediction is about 0.35 at 15-day forecast lead and 0.19 for the summer prediction. This lead of skill is relatively lower than that obtained in previous studies, for example, Waliser et al. (1999) and Jones et al. (2004), which have illustrated correlation of about 0.6 at 15-day lead for winter MJOs. However, note again that in these aforementioned studies, bandpass filtering with an intraseasonal period has been applied before constructing the empirical models, while such a technique has not been employed here for the purpose of real-time application. Also, more predictors included in the forecast models of these previous studies could also be responsible for this difference (e.g., Jones et al. 2004). Nevertheless, it is worth mentioning that the predictive skill for the MJO will depend significantly on spatial location as well as the MJO amplitude at the forecast initial time. The predictive skill as shown in Fig. 4 actually reflects an average level over the global tropics with inclusion of many weak MJO events and quiescent periods. As will be shown later, over some particular locations, especially with strong MJO events during the forecast period, the present statistical model can still exhibit very encouraging predictive skill at a long forecast lead time.

Figure 5 presents spatial patterns of temporal correlation between forecast and EOF-filtered observations of OLR at forecast leads from 5 to 20 days for both boreal winter and summer seasons. For winter, the decrease of skill scores with forecast lead first appears over the eastern equatorial Pacific and in the vicinity of Gulf of Mexico and Caribbean Sea. As forecast lead increases, areas with poor predictive skills expand to the North and central Pacific. In contrast, relatively high skill scores persist over a number of regions, including the Maritime Continent, south-central Pacific, and South America continent. In particular, a correlation as high as 0.5 is exhibited to the north of Australia, the central Pacific around 10°S, and over Brazil at the lead of 15 days (Fig. 5c). At the lead of 20 days, correlation scores greater than 0.3 are still evident over these regions.

Consistent with the results shown by Fig. 4, the skill scores for the boreal summer (Figs. 5e–h) at each forecast lead are generally lower than the corresponding winter predictions. Relatively higher skill scores during summer are found over the South Asian monsoon region, including the Arabian Sea and Bay of Bengal, and the Maritime Continent, as well as the northeastern Pacific and South America, with scores of about 0.3 at a lead of 15 days. Relatively low skill scores are observed over the central Pacific and north of Australia.

Next, a stricter test of model predictive skill is performed by illustrating pattern correlation coefficients between predicted OLR, u850, and u200 against their corresponding unfiltered daily anomalies. In this case, the observed daily anomalies are only subject to the removal of the annual cycle and interannual variations associated with ENSO. Results are displayed in Fig. 6 for both winter and summer. Immediately noticeable is that the model shows much higher predictive skill for u200 than OLR and u850, which is particularly true during boreal winter. Skill scores for u850 are comparable to that for OLR during winter, while they are slightly higher than the latter during summer. Because observed unfiltered anomaly fields are used when calculating the correlation, the skill scores shown in this figure are generally lower than those shown in Fig. 4.

Then, similar to Fig. 5, the spatial distributions of temporal correlation coefficients between predicted OLR, u850, and u200 against their corresponding observed unfiltered perturbation patterns for both winter and summer are demonstrated in Figs. 7, 8 and 9, respectively. Figure 7 shows the spatial correlation pattern for OLR. The spatial distribution of high correlation shown in this figure is largely similar to that by Fig. 5, although the magnitude is considerably lower here. For instance, regions with relatively high skill scores for the winter predictions are located over northern Australia, equatorial Africa, the central South Pacific, and Brazil, while for summer predictions, high predictive skills are found over the Arabian Sea, where the cross-equatorial Somali jet resides, the eastern equatorial Indian Ocean, and the Maritime Continent regions, as well as Brazil. It is particularly noteworthy that although unfiltered OLR patterns have been employed for this figure, the model still exhibits encouraging skills over north of Australia with correlation as high as 0.3 for the winter predictions.

Similarly, Fig. 8 shows a spatial correlation distribution of u850 at various forecast leads. For both seasons, the regions with relatively higher scores are located over elongated zones over the global tropics. While the high skill score zone is largely located to south of the equator during boreal winter, it shifts to north of the equator with stronger amplitude over the Indian Ocean and western Pacific sectors. Then, Fig. 9 displays the correlation distribution pattern of u200. Much higher correlation scores as compared to OLR and u850 can be found over vast regions in the Eastern Hemisphere, especially during boreal winter. At a forecast lead of 15 days, most regions of the equatorial Indian Ocean show skill scores greater than 0.3. In particular, correlation scores as high as 0.5 can even be detected over small regions near the tip of the Indian Peninsula and Indochina during winter. During boreal summer (Fig. 9, right panels), higher predictive skill for u200 is mainly located over the South Indian Ocean off the Africa coast, the Maritime Continent, and the Atlantic Ocean to the east of Brazil (e.g., Fig. 9g).

To summarize, although the present empirical model shows relatively lower predictive skills as compared with those by previous studies subject to bandpass filtering of the intraseasonal time period, it still exhibits very promising skill over some particular regions, even based on the validation against unfiltered observed fields. Note that this skill can be further dramatically increased with knowledge of the state of the MJO at the initial time of the forecast (e.g., Von Storch and Baumhefner 1991; Goswami and Xavier 2003; Waliser et al. 2003b; Fu et al. 2007), as will also be illustrated in the following section.

## 4. Sensitivity tests

As previously noted, the main advantage of the current forecast model is its potential feasibility for the real-time application by avoiding a bandpass filtering to extract the intraseasonal signals. In this section, we attempt to explore the degrees of skill that could be gained by employing a number of variations to the model.

### a. Impact of time filtering with an intraseasonal period

For the purpose of clarity, we refer to the forecast model described in the previous section based on the most recent PC values of combined EOF of OLR, u850, and u200 as the control experiment (Ctrl_exp). Then, two additional experimental forecasts have been conducted following a similar method adopted in the Ctrl_exp, except that OLR, u850, and u200 fields are further subject to time filtering to highlight the intraseasonal variability after the removal of seasonal cycle and interannual variability associated with ENSO. The filtered fields are then used for the combined EOF analysis to get the two leading PCs, which consist of the predictors in the forecast model, and are also used for training the model to get the regression parameters. In one experiment, a Lanczos time filtering is performed by keeping the intraseasonal period of 20–70 days (Exp_20_70d; Duchon 1979). This experiment mimics the schemes used in many previous studies (e.g., Waliser et al. 1999; Jones et al. 2004). In another experiment, a low-pass time filtering with 20-day cutoff is applied (Exp_20d_cutoff). This low-pass filtering to remove the higher-frequency noise in this experiment is similar to some extent with the study by Lo and Hendon (2000), in which a T12 spatial truncation was applied to serve this purpose by assuming that the high-frequency temporal and high-wavenumber spatial variations tend to occur concomitantly.

Additionally, a third experimental forecast Exp_pentad is conducted, which is the same as Ctrl_exp except that 5-day (pentad) mean variables are used to construct and train the empirical model instead of the daily values used in Ctrl_exp. The application of pentad means may also effectively remove high-frequency noise. However, it is noted that because application of pentad means are involved, the forecast needs to be started 2 days earlier for a real-time forecast.

The predictive skills from these three experiments are illustrated in Fig. 10, which shows the pattern correlations between predicted and EOF-filtered OLR during boreal winter. While the application of low-pass time filtering (Exp_20d_cutoff) is seen to improve the skill modestly, the bandpass filtering (Exp_20_70d) does bring a significant increase of predictive skill over Ctrl_exp. The pattern correlation score is about 0.56 at the forecast lead of 15 days by Exp_20_70d, which is 0.33 in Ctrl_exp, as previously mentioned. Moreover, a correlation of greater than 0.3 can persist up to lead times of 25 days in Exp_20_70d. This result is largely consistent with those described in the aforementioned previous studies. Moreover, it is found that the pentad mean forecast (Exp_pentad) only exhibits a very limited increase of predictive skills over Ctrl_exp. Considering that a lead time of 2 days is lost due to application of pentad mean calculations, the skill by this experiment does not show much advantage over Ctrl_exp.

### b. Inclusion of more PCs

In the control experiment, only two leading PCs of the combined EOF are employed as predictors in constructing the forecast model. In this section, the dependence of model predictive skill on the number of PCs is explored. We have conducted a series of tests by including up to five PCs at the most recent time in the empirical model, that is, *N* = 2, 3, 4, 5 and *M* = 1 in Eq. (1).

Figures 11a,b illustrate the predictive skill of OLR as a function of forecast lead by considering various numbers of PCs in the model. (Note that the experiment with two PCs is identical to that of Ctrl_exp.) Here, the pattern correlation is calculated between the forecast OLR fields against their corresponding observed unfiltered perturbation patterns. The correlations at forecast lead 0 provide the variances of the observed perturbation fields explained by retained PCs. It is shown that inclusion of more PCs beyond the leading pair only slightly increases the explained variance of the total field. Also, it is evident that inclusion of more PCs in the forecast model only moderately increases predictive skills for both boreal winter and summer forecasts. The spatial pattern of temporal correlation coefficients between the predicted and observed unfiltered OLR at the forecast lead of 20 days based on the model with five PCs are displayed in Figs. 11c,d for both boreal winter and summer cases. Compared to the correlation distribution in Ctrl_exp (Figs. 7d,h), the skill gained for OLR by inclusion of more PCs is mainly confined over the central (eastern) equatorial Pacific during boreal winter (summer). Further inspection suggests that limited improvement of predictive skills over these regions is mainly brought about by including the third PC in the forecast model (figure not shown). It is illustrated by Kessler (2001) that the third EOF of OLR over the tropics largely captures the eastward shift of MJO activity to the east of the date line during El Niño events. Thus, inclusion of the third PC in the forecast model may enhance the predictive skills by capturing the MJO activities over these regions.

### c. Inclusion of additional lags of the PCs

In this section, we attempt to test whether additional information on the state of the MJO earlier in time can add to the skill of predicting the MJO similar to that by Jones et al. (2004). In contrast with Ctrl_exp, in which only the MJO state at the most recent point in time (*t*_{0}) is included, we have conducted additional forecasts in which parameters of two PCs at 5, 10, 15, and up to 20 days prior to *t*_{0} in the forecast model are considered in addition to only those at *t*_{0}, that is, *N* = 2 and *M* = 1, 2, 3, 4, and 5 pentad lags in Eq. (1). (Here the forecast with one pentad lag is identical to Ctrl_exp.) Predictive skills by these experiments for boreal winter and summer are demonstrated in Fig. 12. The results generally suggest that inclusion of more information of the two leading PCs in the past does not bring much gain in predictive skill.

### d. Sensitivity to MJO strength

It has been noted that predictive skills of such empirical models could dramatically depend on the state of MJO in the initial condition. Lo and Hendon (2000) demonstrated that the empirical scheme readily beats dynamical extended-range forecast model for lead times greater than 6 days when the MJO is active in the initial condition, while the empirical model exhibits rather limited skills when the MJO is quiescent. Similar findings are also discussed by other studies (e.g., Von Storch and Baumhefner 1991; Goswami and Xavier 2003; cf. Jones et al. 2000; Waliser et al. 2003b).

Next, we conduct two experimental forecasts to illustrate the dependence of model forecast skill on the MJO amplitude at the initial time. These forecasts are similar to Ctrl_exp, except that they are only performed with strong or weak MJO signals at the initial time. Strong versus weak MJO conditions are determined based on the combined amplitude of PC_{1} and PC_{2} (Fig. 3c), with the amplitudes of strong MJO events exceeding 1.5 (black line) and weak MJO events less than 1 (gray line). Note that this information for the MJO amplitude is available in real time.

Figure 13 shows the pattern correlations with forecast leads based on forecasts of all cases (i.e., Ctrl_exp), and cases with strong and weak MJO signals at the initial forecast time, respectively, for both boreal winter and summer seasons. The pattern correlation coefficients derived based on the predicted and observed unfiltered OLR fields are displayed in Figs. 13a,b. It is readily seen that predictive skills are dramatically increased when there are strong MJO signals at the initial condition. For instance, for both winter and summer predictions, the skill levels at a forecast lead of 15 days by the predictions for the strong MJO events are nearly comparable to that of a 5-day forecast in the control experiments.

Significant improvement of the predictive skill for the prediction with strong MJO signals at the initial forecast time for both boreal winter and summer can also be clearly evident in Fig. 14, which displays the temporal correlation distribution of OLR, u850, and u200 against the unfiltered fields at the forecast lead of 20 days. Compared to previous figures (e.g., Figs. 7d,h, 8d,h, 9d,h) the patterns are generally similar, however the maxima of correlation coefficients for each variable are markedly increased.

Note that correlation scores shown in Figs. 13a,b also contain information of model predictive skill for the weather-scale systems. Because the present empirical model is mainly designed for predicting the MJO, the pattern correlations between the predicted and EOF-filtered OLR fields are also illustrated in Figs. 13c,d. For the real-time prediction, forecasts may only be performed when there are strong MJO signals at the initial forecast time. Thus, skill scores for the strong MJO events as shown in Figs. 13c,d may provide particularly valuable estimates of the baseline capability of this empirical model in predicting the MJO. The results again clearly illustrate that the predictive skill for the MJO will increase dramatically if there are strong MJO signals at the initial forecast time, especially for the winter predictions. A pattern correlation of about 0.5 is evident at the lead time of 15 days for the winter MJOs, which is 0.33 for all-case predictions.

### e. Advantage by considering seasonally dependent regression parameters

As previously mentioned, seasonally dependent regression parameters are considered in the present empirical model, making it applicable for forecasts all year-round. In this section, we will illustrate how extra predictive skill can be gained by adopting seasonally dependent regression coefficients. For this purpose, an additional experiment has been conducted (Exp_ noseasn), which is similar to the control experiment, except that the regression coefficients for the two leading PCs are calculated based on the all-year data and do not vary with season.

Comparison of the skill scores by this experiment, as measured by pattern correlation to unfiltered observational anomalies, to those by the control experiment are illustrated in Fig. 15. It is evident that extra predictive skill can indeed be gained by employing the seasonally dependent regression parameters as in Ctrl_ exp, especially for summer predictions. Note that the distribution of regression coefficients based on the all-year data in Exp_noseasn is very close to the winter pattern in Ctrl_exp, which mainly describes an eastward-propagating MJO component along the equator; while in addition to this eastward-propagating component, the summer regression pattern in Ctrl_exp also describes observed meridional migrating signals over the Asian monsoon region. Thus, these results indicate that the extra predictive skill gained by employing seasonally varying regression parameters could be through a better description of the relationship between the PC time series and predictand (e.g., distribution of regression coefficients).

## 5. Real-time forecast example

In this section, we briefly describe the approach used to perform the real-time forecast based on the present forecast model. First, the leading EOF patterns of combined fields of OLR, u850, and u200 can be derived based on long-period historical observations. Then, once given the real-time observed OLR, u850, and u200, the removal of the seasonal cycle for these fields can be performed in real time, assuming the seasonal cycle is stationary and can be defined using independent data. Next, with the real-time observed daily SST pattern, a real-time ENSO index can be obtained by projecting this SST pattern onto the leading rotated EOF of SST. The removal of the interannual variability associated with ENSO in these daily fields can be simply accomplished by subtracting the anomalies predicted by the linear regression relation between these daily fields and the leading PC of the rotated EOF of SST. Then, the value of the previous 120-day mean for each variable is further removed at each grid point. After being subject to the removal of seasonal cycle and interannual variations in the real-time daily observations, the parameters of PC_{1} and PC_{2} can be derived by projecting combined fields of OLR, u850, and u200 to the leading pair of combined EOFs. Once PC_{1} and PC_{2} are ready, a real-time forecast can be performed for various variables at the multiple vertical levels based on the lag-regression model, in which the lag-regression coefficients have been determined based on historical datasets. Actually, an operational forecast for the MJO is being routinely conducted at the BMRC of Australia based on this approach (see the aforementioned Web site for details).

Last, we illustrate an example of a real-time “hindcast” for the OLR during the 2003/04 winter (from 1 November 2003 to ∼30 May 2004). In this case, the regression parameters of PC_{1} and PC_{2} are derived based on the observations from 1983–2002, while the observed daily OLR, u850, and u200 as well as the daily NOAA OISST pattern linearly interpolated from its weekly values during the 2003/04 winter are employed to make the real-time daily hindcasts. In Fig. 16a, the eastward-propagating features associated the MJO during the 2003/04 winter season are portrayed by the Hovmöller diagram of observed EOF-filtered OLR (averaged over 10°S–10°N). The predicted counterparts of OLR at forecast leads of 5, 10, and 15 days are displayed by Figs. 16b–d, respectively. It is readily seen that several major eastward-propagating MJO events observed during this winter season are very well predicted, even in the forecast with a lead time of 15 days. The pattern correlation coefficients for the OLR between the observations and forecasts at 5, 10, and 15 days during this period are 0.72, 0.48, and 0.31, respectively. Nevertheless, the decaying amplitudes of predicted OLR with an increase of forecast lead are also clearly discerned. Furthermore, Fig. 17 demonstrates a particular forecast result by providing time evolution of spatial patterns of OLR (shading) and u200 (contours) based on the EOF-filtered observations (left panels) and predictions (right panels) with the initial forecast time of 1 February 2004. It clearly shows that the observed essential features associated with this MJO event in the OLR and u200 fields are well predicted by the empirical model.

## 6. Summary

While the dynamical extended-range forecast models generally exhibit rather limited predictive skills for the MJO at the current stage, the empirical models display much more superior performance and are able to provide useful predictions at a forecast lead of 2–3 weeks. Empirical models, thus, become main avenues to explore the predictability of the MJO and to test the baseline capability in forecasting the MJO in the dynamical extended-range forecast models. On the other hand, it is natural to expect that better representation of MJO information produced by the empirical models may further augment dynamical extended-range predictions. In a current ongoing project, we attempt to explore the feasibility of a “hybrid forecasts” methodology, by which the predicted large-scale, slow-varying MJO circulations generated by an empirical model are to be assimilated into a dynamical model. If this activity results in showing an improvement in forecast skill of either the large-scale circulation pattern and/or the MJO-influenced synoptic features, it will not only provide a near-term means to improve extended-range predictions but also unequivocally motivate the need to improve intrinsic capability in regards to MJO simulation–prediction in the dynamical models. Because development of this project will need an empirical model for predicting the MJO circulations, it becomes the main motivation of the present study.

A multivariate lag-regression model is employed for this purpose. As a first step to our ultimate goal, predictive skills of this empirical model have been comprehensively explored in the present study. The predictors of this regression model include two principal components of first leading pair of the combined EOF of equatorially averaged OLR, u850, and u200 (WH04). This combined EOF can serve as an effective filter for the MJO without the need for bandpass filtering, which has been employed by many previous studies, thus making this scheme feasible for real-time applications. Another advantage of this empirical scheme lies in the consideration of the seasonal dependence of regression parameters, making it applicable for the MJO forecasts all year-round. Also note that although operational forecasts for the MJO based on this scheme have been routinely conducted at the BMRC of Australia, a systematic documentation of predictive skills by this approach is still missing, which is another motivation of present study.

The analysis illustrates that the present empirical model exhibits useful extended-range skill for the real-time MJO predictions. Useful predictions with correlation greater than 0.3 (0.5) between prediction and observed unfiltered (EOF filtered) fields can be still detected over some regions at the forecast lead of 15 days, especially in the forecast for boreal winter MJOs. Furthermore, a significant dependence of the predictive skills on the initial condition of the MJO is demonstrated. When there are strong MJO signals at the initial forecast time, the predictive skills would be greatly improved. Correlation coefficients greater than 0.4 to ∼0.5 between the predicted and observed unfiltered variables can be discerned over vast regions at the forecast lead of 20 days. The analysis also shows that predictive skills for the upper-tropospheric winds are relatively higher than the low-level winds and convection signals. Finally, the capability of this empirical model in predicting the MJO is further demonstrated by a “hindcast” case study during the winter of 2003 to ∼2004. The results show that the observed major eastward-propagating MJO events during that winter are well captured by the forecasts even at the forecast lead of 15 days, although decaying amplitudes in the predicted fields with the increase of forecast lead are evident. While the empirical model demonstrated in this study provides an estimate of the predictability of the MJO and benchmark for the dynamical extended-range models, it is still a relatively simple model. Improvements should be considered in the future that might consider multivariate and/or nonlinear approaches.

## Acknowledgments

This research was carried out at the Jet Propulsion Laboratory (JPL), California Institute of Technology (Caltech), under a contract with NASA; X. Jiang and D. Waliser were jointly supported by the Research and Technology Development program and Human Resources Development Fund at JPL, as well as the NASA Modeling, Analysis and Prediction program. We also thank K. Weickmann, G. Kiladis, S. Woolnough, A. Vintzileos, and another anonymous reviewer for their insightful comments, which led to considerable improvements of earlier versions of this manuscript.

## REFERENCES

**,**

**,**

**,**

**.**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

**,**

## Footnotes

*Corresponding author address:* Xianan Jiang, Jet Propulsion Laboratory/California Institute of Technology, MS 183-501, 4800 Oak Grove Drive, Pasadena, CA 91109. Email: xianan.jiang@jpl.nasa.gov

^{1}

The EOF analysis of SST is performed on a 4° × 4° grid. The rotation of EOFs is based on the varimax algorithm constraint (Richman 1986).

^{3}

Note that because of the original spatial resolution of T63 truncation in the NCEP–DOE Global Reanalysis 2 model, the effective horizontal resolution of the reanalysis data on pressure levels is about 5° in the zonal direction. Thus, pattern correlation coefficients shown here could be slightly underestimated, since roughly 4 times as many grid points than needed are used for the calculation.