## Abstract

A multivariable linear regression model is constructed based on the status of the Madden–Julian oscillation (MJO) and persistence in order to forecast wintertime surface air temperature anomalies over North America out to 4 pentads (20 days). The current and previous states of the MJO are utilized as predictors, based on the Real-time Multivariate (RMM) indices of Wheeler and Hendon. Beyond the persistence-driven first pentad, potentially useful skill is mainly observed during strong MJO events in phases 3, 4, 7, and 8, which correspond to a dipole diabatic heating anomaly in the tropical Indian Ocean and western Pacific. This skill is largely centered over the eastern United States and the Great Lakes region during pentads 2 and 3.

## 1. Introduction

Subseasonal prediction of the atmosphere with lead times between 2 weeks and 2 months presents a distinct challenge compared to the problems of weather and short-term climate prediction (Schubert et al. 2002; Waliser 2005). In terms of traditional weather prediction, 6–10 days is the order for which skillful forecasts may typically be expected (Thompson 1957; Lorenz 1965, 1982; Palmer 1993; van den Dool 1994). Such forecasts are based on the initial conditions observed in the atmosphere which can be exploited by numerical weather prediction (NWP) models (Lin and Brunet 2009; Waliser et al. 2003). Seasonal forecasts on the other hand have lead times on the order of months and are based upon boundary conditions that vary on relatively long interannual time scales, such as tropical sea surface temperature (SST) anomalies (Lin and Brunet 2009; Waliser et al. 2003). The oceans serve this purpose well as they have the ability to impart some predictability into the atmosphere because of their rather large thermal and mechanical inertia (Derome et al. 2001).

Improved subseasonal forecasts would likely benefit short- and long-range forecasts as well. In the short range, the skill of traditional weather forecasts and their associated statistics may be occasionally extended, while the climate prediction problem would benefit with a more accurate treatment of the atmospheric variability that occurs on the subseasonal time scale (Waliser 2005). The difficulty of the subseasonal prediction problem was foreseen by von Neumann (1960). Progress would first be made with the short-range problem where the initial conditions dominate, followed by the long-range problem (climate), which is governed in part by the adjacent boundary conditions. Only then could the intermediate (subseasonal) problem, where the initial and boundary conditions must both be accounted for, be addressed adequately (Waliser 2005; von Neumann 1960). The increasing interest in this problem reflects a certain level of maturity in the more traditional “weather” and “climate” problems (Waliser 2005; Jiang et al. 2008); however, much remains to be desired on the subseasonal time scale as model imperfections along with the amplification of initial errors in the flow contribute to NWP difficulties (Lin and Brunet 2009). It then becomes incumbent to seek other potential sources of predictability that are not treated well by such models. Winkler et al. (2001) suggest that improving extratropical predictability may be more of a tropical diabatic heating issue as opposed to merely addressing tropical SSTs.

The Madden–Julian oscillation (MJO) is the dominant source of intraseasonal variability in the tropics, with a period of 30–60 days (Madden and Julian 1971). It is characterized by deep convection generally residing between the Indian Ocean and the central Pacific coupled with large-scale overturning zonal circulations that extend throughout the entire depth of the troposphere (Madden and Julian 1972; Zhang 2005). With zonal wind and precipitation fields on the order of 1 and 1–3 zonal wavenumbers, it is perhaps not surprising that this phenomenon may be discerned in the observations even without filtering. With this large scale, remote influences outside of the tropics may be expected as well (Zhang 2005; Lin and Brunet 2009).

Interest in the MJO is not merely driven by the phenomenon itself, but by the myriad of nonlocal influences it has upon the atmosphere. The influence of the MJO upon the various monsoons including the Australian (Hendon and Liebmann 1990; Wheeler and McBride 2005), the Asian (Yasunari 1979; Lau and Chan 1986; Goswami 2005), the North American (Mo 2000; Higgins and Shi 2001), and the South American (Jones and Carvalho 2002) is well documented. The relationship between the MJO and tropical cyclones (TCs) has also been investigated. Maloney and Shaman (2008), Maloney and Hartmann (2000), and Mo (2000) are examples of studies of the MJO's impact upon TC activity in the Atlantic basin while Camargo et al. (2008) consider the same issue in the Pacific basin. The exact link between the MJO and ENSO (El Niño–Southern Oscillation) remains controversial (Zhang 2005). This is likely due in part to the nonlinear, or stochastic nature of their interaction (Tang and Yu 2008). Westerly wind bursts associated with the MJO are thought to excite oceanic Kelvin waves and induce changes in the thermocline structure, which directly impacts the state of ENSO (McPhaden 2004; Tang and Yu 2008; Kessler and Kleeman 2000). And Zhang and Gottschalck (2002) argue that the interannual variability of Kelvin wave forcing by the MJO may be more important than individual MJO events in terms of influencing ENSO.

Anomalous tropical convection is known to have a broad impact on the general circulation (Liebmann and Hartmann 1984), so it is not surprising that the MJO has been shown to have a significant impact upon the extratropical circulation. Positive (negative) anomalous tropical heating results in large-scale ascent (descent) and thus, divergence (convergence) in the upper troposphere (Sardeshmukh and Hoskins 1988; Shabbar 2006). Such mass sources or sinks act as a source of Rossby waves that propagate into each hemisphere (e.g., Hoskins and Karoly 1981). These wave trains extend well into the extratropics and are thought to project themselves upon the Pacific–North American circulation pattern (PNA), the North Atlantic Oscillation (NAO), and the Arctic Oscillation (Thompson and Wallace 1998; Cassou 2008; Ferranti et al. 1990; Lin et al. 2009). Such atmospheric modes have a significant impact upon intraseasonal variability in North America. Other meteorological variables affected by the MJO include precipitation over North America (Mo and Higgins 1998; Lin et al. 2010), wintertime surface air temperature (SAT) variability in the Arctic (Vecchi and Bond 2004), and a nearly global rainfall signal (Donald et al. 2006).

Given the impact that the MJO has on the extratropics, it is apparent that it must be taken into account when forecasting on subseasonal time scales. As NWP models have just begun to forecast the MJO with some modest skill (Kang and Kim 2010; Vitart and Molteni 2010; Rashid et al. 2011), this source of predictability is still underutilized. Statistical forecast models do perform fairly well; however, the inherent heterogeneity of this phenomenon and its lack of absolute periodicity limits the usefulness of statistical approaches to some degree (Waliser 2005). Without an adequate treatment of the MJO, it would seem rather difficult to bridge the gap between extended-range weather forecasting and the subseasonal regime. However, Yao et al. (2011) suggest a different approach to the issue of the MJO–extratropical interaction. Given the fact that it takes roughly a week for a diabatic heating signal in the tropics to propagate into North America (Lin et al. 2007) and about 2 weeks for the extratropical response to fully develop (Jin and Hoskins 1995), it would seem natural to use current information of the MJO to forecast relevant atmospheric properties in North America. In Yao et al. (2011), the MJO was represented by an empirical orthogonal function (EOF) analysis of outgoing longwave radiation (OLR). The most recent second principal component was used with some success in a linear regression model to predict intraseasonal temperature anomalies over the North American winter.

Whereas most previous studies sought simultaneous connections between the MJO and the midlatitude atmospheric variable in question, Lin and Brunet (2009) found significant correlations between the MJO and Canadian wintertime surface temperatures with a lag of up to 3 pentads. In this instance, a potential connection between the NAO and Rossby wave trains excited by tropical forcing was noted. Here, we seek to extend the analysis of Yao et al. (2011) by formulating a multilinear regression model of current and past components of the MJO in order to predict wintertime surface air temperatures over North America. We focus on the boreal winter due to the fact that the MJO is more active and thus, more influential on the extratropics (Hendon et al. 2000). The stronger upper-level midlatitude westerlies observed during winter provide a more favorable environment for Rossby wave propagation from the tropical diabatic heating (Hoskins and Karoly 1981). The hope is a more complete model than that of Yao et al. (2011) would be able to demonstrate useful forecast skill on the submonthly time scale during opportune MJO events. This would serve to potentially extend weather forecast lead times at least occasionally via the impact of the MJO on the extratropical circulation. The data used in this study are addressed in section 2, followed by a description of the statistical model and overall results in section 3. The performance of the model during strong MJO events is given in section 4 and final conclusions are discussed in section 5.

## 2. Data

### a. Temperature

Daily averaged SAT data were obtained from the National Centers for Environmental Prediction (NCEP) North American Regional Reanalysis (Mesinger et al. 2006), a localized, high-resolution (32 km) extension of the NCEP–National Center for Atmospheric Research (NCAR) global reanalysis (Kalnay et al. 1996). The daily values were averaged into nonoverlapping pentad data and winter was defined in a December–January–February (DJF) fashion from 1979/80 to 2008/09 for a total of 30 winters. Each winter consists of 18 pentads starting from 2 December for a total of 540 pentads. The final pentad of each winter always spans 25 February to 1 March, regardless of any potential leap days, the presence of which means the “pentad” is actually an average over 6 days.

The time mean and the first two harmonics of the entire 30-yr pentad climatology define the seasonal cycle, and are removed from each point. To eliminate interannual variability the mean for each winter is then removed, leaving the intraseasonal variability. The entire temperature analysis follows from Yao et al. (2011) except that the global reanalysis was utilized in that case (Kalnay et al. 1996) and only 29 winters spanning from 1979/80 to 2007/08 were used. Note we are dealing with temperature anomalies as opposed to the temperatures themselves; throughout this study, the terms shall be used interchangeably.

### b. MJO

EOFs have commonly been employed in characterizing the MJO as seen in Maloney and Hartmann (1998) and Kessler (2001). A similar approach was taken in Wheeler and Hendon (2004) to calculate the Real-time Multivariate MJO (RMM) index. Here the daily RMM values are obtained from the Australian Bureau of Meteorology website (http://cawcr.gov.au/staff/mwheeler/maproom/RMM/). In Wheeler and Hendon (2004), the EOFs were derived from the combined fields of meridionally averaged OLR and zonal winds at 200 and 850 hPa between 15°S and 15°N. The first two modes explain 25% of the variance in the aforementioned fields, representing enhanced convection over the Maritime Continent and the Pacific Ocean, respectively (Wheeler and Hendon 2004). The associated principal components (PCs) are thus denoted as RMM1 and RMM2; they largely vary on the intraseasonal time scale as desired. The seasonal cycle and interannual variability have already been removed from these time series before being organized in the same pentad fashion as the temperature data. This approach differs from that of Yao et al. (2011), where OLR data alone are used as a proxy for convection, and thus the MJO. Despite the different methodologies, the resulting EOFs between Yao et al. (2011) and Wheeler and Hendon (2004) are similar. The correlation between the first PC of Yao et al. (2011) and RMM1 is 0.51, while the correlation between the second PC and RMM2 is −0.77.

## 3. The statistical model

### a. Methodology

Our task is to produce temperature forecasts over North America during the winter season. Yao et al. (2011) used the following simple regression model:

Here PC2 represents the most recent second principal component of the OLR. The multilinear regression model utilized in the present study was chosen through testing the use of different combinations of MJO states several pentads in the past. The expression that was selected exhibited forecast skill as well as relative simplicity:

where *T* is the temperature anomaly to be forecast; [*α*_{1}(*t*), *α*_{2}(*t*), *β*_{1}(*t*), *β*_{2}(*t*), *γ*(*t*)] are the regression coefficients to be solved for; RMM1 and RMM2 are the principal components of Wheeler and Hendon (2004) that represent the MJO; and *t* = 1, 2, 3, 4 represents the forecast pentad. Thus, the current and past states of the MJO (one pentad in the past) are used to predict future SAT anomalies. The MJO indices and the autocorrelation term (current temperature pentad) were regressed against the future (observed) temperature anomalies to obtain the coefficients. One issue to be dealt with however is the significant intercorrelation to be expected among the different predictors in (2). The multicollinearity between the predictors would likely result in dubious regression coefficient values if the standard numerical solution of the normal equation was utilized, namely:

where *a* = [*α*_{1}(*t*), *α*_{2}(*t*), *β*_{1}(*t*), *β*_{2}(*t*), *γ*(*t*)], represents the independent variable matrix consisting of the MJO indices and the temperature persistence term, and represents the dependent variable vector consisting of the future temperature anomalies. Therefore, the ridge regression technique (van Gaans and Vriend 1990) was applied, whereby the standard least squares problem is regularized in order to remove any multicollinearity issues. However, it is noteworthy that the forecast results were generally quite similar whether using the conventional regression analysis (not shown) or ridge regression though the regression coefficients did exhibit an appreciable difference.

The method of *cross validation* (Michaelsen 1987) is utilized to construct (2). One winter is removed from consideration and the remaining 29 winters are used to determine the regression coefficients. The regression model is then tested on the deleted winter using its RMM components and the current temperature anomaly as predictors and the future SAT anomaly as the predictand. This process is repeated for all 30 winters; thus, a forecast is completed for each season based on regression coefficients independent of the season under consideration.

### b. Results

The temporal correlation between the observed and forecast temperatures based on (2) is shown in Fig. 1. Statistically significant correlations are depicted at the 0.01 level via a Student's *t* test. Tests have shown that the pentad 1 forecast is dominated by the persistence term. By pentads 2 and 3, forecast skill is observed from the eastern seaboard of the United States to the southern Canadian prairies. One may surmise this is based on the MJO signal from the temperature composites shown in Fig. 3 of Lin and Brunet (2009). The pentad 4 forecast sees skill extend into Alaska, which is due to the autoregressive term as the wintertime temperature becomes negatively correlated with itself throughout western Canada. Forecast skill associated with the MJO terms was seen to decline substantially into pentad 5 and disappear completely thereafter. The results of (2) were compared with the same analysis done with only the RMM2(0) term (not shown), which is akin to the expression of Yao et al. (2011). Skill during pentad 1 for the latter method is nonexistent due to the absence of the temperature persistence term. A correlation advantage of up to ~0.1 is seen with (2) for pentad 2 over the eastern United States, while the forecast skill for the last 2 pentads is comparable for both models, with the main difference seen in western Canada due to the temperature persistence term. This comparison demonstrates the benefit of including temperature persistence as seen in pentads 1 and 4, while pentads 1 and 2 are the main beneficiaries of the additional RMM terms.

The overall results of (2) generally show low skill beyond the first pentad. Although the forecast is of little practical use, the results indicate that the MJO does modulate outcomes over North America to some degree. Here we have assumed a linear relationship between the MJO and these temperatures, an assumption that somewhat limits the performance of the regression model. Having noted this, as we consider more specific MJO cases later on, more skill shall be found.

### c. The regression coefficients

One may gain insight into the observed forecast skill by comparing the regression coefficients. Figures 2a–d depict the lagged regression coefficient of SAT with respect to RMM2(0) averaged over all 30 winters, with the shading indicative of areas where the correlation is statistically significant at the 0.01 level. Pentad 1 shows no impact while a statistically significant region begins to build into the eastern United States and the Great Lakes region in pentad 2. This influence strengthens and expands into pentads 3 and 4. From these results, several observations can be made. The significant coefficients observed are negative, so it is apparent that the RMM2(0) (enhanced convection in the western/central tropical Pacific) component of the MJO corresponds to negative temperature anomalies over these regions. The coefficient values are also stronger over the last two pentads as opposed to pentad 2, so the influence is primarily seen over the 11–20-day time span. This is consistent with the idea stated earlier that it takes roughly a week (about 2 pentads) for the influence of the MJO to reach North America and about 2 weeks (about 3 pentads) for the response to fully develop. The coefficient values associated with RMM2(−1) (not shown) are rather intuitive, with the correlation patterns simply shifted ahead by one pentad. The maximum influence here is seen with pentads 2 and 3. However, the coefficient values are weaker than their RMM2(0) counterparts.

A different picture is seen with the RMM1 terms (enhanced convection in eastern Indian Ocean–Maritime Continent). Figures 2e–h depict the regression coefficients associated with RMM1(0). For pentad 1, rather large coefficient values are seen across much of the eastern United States and the Great Lakes region. The values are much smaller, though still significant into pentad 2, before dissipating into the last two pentads. Comparing the coefficient values associated with RMM1(0) with those of RMM2(0) (Figs. 2a–d), we see that they are rather similar apart from sign and a time lag. Significance is largely focused over the first two pentads for RMM1(0), while the last two pentads are emphasized with RMM2(0). This is due to the fact that a negative RMM2 leads a positive RMM1 by 9 days (about 2 pentads; see Fig. 6 of Wheeler and Hendon 2004). Considering that the atmospheric response to a tropical diabatic heating anomaly centered near the Maritime Continent associated with RMM1 is weak as discussed in Lin et al. (2010), the positive regressions in Figs. 2e–h for pentads 1–2 are likely mainly the result of a negative phase of RMM2 about 2 pentads in the past.

We may also consider the regression associated with RMM1(−1) (not shown). It mainly appears to be a one pentad leading version of the RMM1(0) results, although it is noteworthy that for pentad 1, the highest regression coefficient values have shifted into the Great Plains region of the United States. Finally, the autocorrelation coefficient is depicted in Figs. 2i–l. For the first pentad it is statistically significant throughout North America. After essentially disappearing during pentad 2, it begins to reemerge throughout western North America during pentad 3 before strengthening further into pentad 4, a result that was alluded to earlier.

From these regression results, a clear picture emerges in terms of the individual contributions to forecast skill (Fig. 1). Pentad 1 is dominated by the temperature persistence term T(0), in addition to some contribution from RMM1(0) and RMM1(−1). Pentad 2 sees significant contributions from the RMM2(−1) and RMM1(0) terms, while RMM2(0) has a somewhat lesser influence. Pentad 3 is dominated by RMM2(0) and RMM2(−1), while pentad 4 sees skill mainly from RMM2(0) and *T*(0).

## 4. Strong MJO events

Although the skill observed with all MJO cases considered (Fig. 1) is statistically significant, the results fall far short of being potentially useful in the context of operational weather forecasting. Here we focus on specific cases whereby more useful information may be extracted. By placing RMM1 and RMM2 in phase space, one may define the amplitude of the MJO with the standard Euclidean metric. Figure 3 depicts the forecast skill during very strong MJO events with an amplitude greater than 2. It is qualitatively similar to Fig. 1 (with the exception of pentad 4, where statistically significant skill largely disappears) with much higher correlation values apparent. It should be noted that for all forecast subsets considered in this chapter, the regression coefficients were not recomputed from the previous section.

### Phase dependence

Other subsets may be formed by considering the *phase* of the MJO, a quantity that is reflected by the location of the RMM vector in phase space and which yields its physical location. According to Lin and Brunet (2009), the MJO's influence on Canadian wintertime SAT is primarily seen in phases 3, 4, 7, and 8—in particular phases 3 and 7 when the tropical diabatic heating anomaly has a dipole structure. This is significant in light of the aforementioned discussion of the influence of the MJO. EOF2 (RMM2) has a similar anomaly structure to phases 3 and 7, namely, a dipole pattern with convection anomalies over the eastern Indian Ocean and the western Pacific (Yao et al. 2011). These are the MJO phases that have a stronger impact on North American SAT predictability. Forecast skill associated with RMM1 on the other hand is in part a result of the cross correlation with RMM2.

The subset of very strong MJO events was divided into two phase groups. The first consists of phases 3, 4, 7, and 8 while the second consists of 1, 2, 5, and 6. We first consider the phase group (3, 4, 7, 8), which should be more favorable for forecast skill based on the above discussion. During pentad 1 (Fig. 4), correlation values above 0.5 extend from the eastern seaboard to Alaska, a result of persistence and the MJO as noted previously. By pentad 2, significant skill is maintained throughout much of the eastern United States and the Ohio Valley. This signal is somewhat more localized during pentad 3, with a prominent 0.6 maximum in the U.S. Midwest. A rather abrupt change occurs for pentad 4, however, as significant skill disappears. To summarize, beyond pentad 1, correlation values above 0.5 are largely constrained to the eastern United States and the Great Lakes region during pentads 2 and 3 (6–15 days out).

The potential usefulness of this model may be increased by relaxing the forecast demands somewhat. The observed and the forecast temperatures were divided into three equally probable categories—*above normal*, *near normal*, and *below normal*—by considering the terciles of the normal distributions of each temperature set. This is done by defining a threshold as 0.43 times both of the standard deviations (Kharin et al. 2009). Figure 5 shows the fraction of forecasts made by the regression model (for MJO amplitude greater than 2; phases 3, 4, 7, 8) that were found to be correct using this procedure; in other words, how often it correctly predicted the category of the observed temperature anomaly. The shading indicates areas where the 90% bootstrap confidence threshold is achieved. The results are qualitatively similar to the correlation map (Fig. 4). This time, however, forecast scores above 0.5 extend into western Canada during pentads 2 and 3 (more than 50% chance of being correct), while scattered statistically significant scores are observed during pentad 4. The results through the first three pentads indicate that categorical temperature pentad forecasts may be of some practical use.

## 5. Summary and conclusions

A multilinear regression model based on the principal components that characterize the MJO and persistence was formulated in order to forecast wintertime surface temperature anomalies over North America up to 4 pentads (20 days) out. This was based on an extension of the regression technique utilized in the work of Yao et al. (2011), which only considered the current state of the MJO in making its predictions. In our case the current and the past state of the MJO were both used as predictors, as was persistence. The performance of the statistical model is highly dependent upon the magnitude and phase of the MJO. Potentially useful forecast skill is mainly seen during strong MJO events in phases 3, 4, 7, and 8. Beyond the persistence-dominated first pentad, model skill mainly resides in the eastern United States and the Great Lakes region, with correlation values up to ~0.6 during pentads 2 and 3 (6–15 days out). In contrast during phases 1, 2, 5, and 6 skill was essentially nonexistent after the first pentad. These phase dependent results are consistent with Lin and Brunet (2009), as they demonstrated that Canadian wintertime temperatures were influenced by the MJO mainly when in phases 3, 4, 7, and 8 as opposed to phases 1, 2, 5, and 6. Forecasting in one of three equiprobable categories was also considered, with potentially useful results observed during pentads 1–3 for strong MJO events in phase group (3, 4, 7, and 8). Such forecasts may provide relevant information in terms of possible temperature regimes a few weeks ahead.

The purpose of this study was to address the issue of subseasonal weather prediction. By utilizing atmospheric phenomena that occur on the intraseasonal time scale, predictability may be extended to lead times that are not feasible by traditional weather prediction methods. Thus, the MJO with a typical time scale on the order of 30–60 days is seen as a prime candidate to exploit for this purpose (Madden and Julian 1971; Waliser 2005). Over the past few years, much attention has been given to improving forecasts of the MJO for this very reason (Gottschalck et al. 2010). This study takes a different approach by exploiting the fact that the associated extratropical response takes an appreciable amount of time to develop. Thus, current and past information of the MJO was used. This work may provide a basis for which the MJO may be directly utilized in empirical models in order to enhance the validity of extended-range weather forecasting in the extratropics up to lead times of at least 20 days. Much useful information for operational weather forecasts on subseasonal time scales probably remains to be extracted from the MJO. This study may also aid in developing appropriate benchmarks for numerical models at such lead times.

## Acknowledgments

This research was made possible by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant to the second author. The authors thank two anonymous reviewers whose comments and suggestions helped to improve the paper.

## REFERENCES

*Intraseasonal Variability in the Atmosphere–Ocean Climate System,*W. K. M. Lau and D. E. Waliser, Eds., Springer, 19–61.

*Dynamics of Climate,*R. Pfeffer, Ed., Pergamon Press, 9–11.

*Intraseasonal Variability in the Atmosphere–Ocean Climate System,*W. K. M. Lau and D. E. Waliser, Eds., Springer, 389–423.

*Intraseasonal Variability in the Atmosphere–Ocean Climate System,*W. K.-M. Lau and D. E. Waliser, Eds., Springer, 125–173.