This study evaluates the role of the interdecadal Pacific oscillation (IPO) in modulating the El Niño–Southern Oscillation (ENSO)–precipitation relationship. The standard IPO index is described together with several alternatives that were derived using a low-frequency ENSO filter, demonstrating that an equivalent IPO index can be obtained as a low-frequency version of ENSO. Several statistical artifacts that arise from using a combination of raw and smoothed ENSO indices in modeling the ENSO–precipitation teleconnection are then described. These artifacts include the potentially spurious identification of low-frequency variability in a response variable resulting from the use of smoothed predictors and the potentially spurious modulation of a predictor–response relationship by the low-frequency version of the predictor under model misspecification. The role of the IPO index in modulating the ENSO–precipitation relationship is evaluated using a global gridded precipitation dataset, based on three alternative statistical models: stratified, linear, and piecewise linear. In general, the information brought by the IPO index, beyond that already contained in the Niño-3.4 index, is limited and not statistically significant. An exception is in northeastern Australia using annual precipitation data, and only for the linear model. Stratification by the IPO index induces a nonlinear ENSO–precipitation relationship, suggesting that the apparent modulation by the IPO is likely to be spurious and attributable to the combination of sample stratification and model misspecification. Caution is therefore required when using smoothed climate indices to model or explain low-frequency variability in precipitation.
The El Niño–Southern Oscillation (ENSO) phenomenon is a coupled ocean–atmosphere mode of climate variability that influences weather patterns globally, and its alternate phases of El Niño and La Niña lead to enhanced or diminished risk of droughts, floods, cyclones, bushfires, heatwaves, and other meteorological extremes (Nicholls and Wong 1990; Ropelewski and Halpert 1987). Because of the economic and societal disruption that is often attributed to ENSO extremes, significant research has been conducted to better understand its dynamics (Battisti and Hirst 1989; Bjerknes 1969; Neelin et al. 1998; Suarez and Schopf 1988), predict its occurrence (Barnston et al. 2012; Cane et al. 1986; Latif et al. 1998), and describe its teleconnection with societally important hydrometeorological variables such as temperature, precipitation, and streamflow (Dai and Wigley 2000; Horel and Wallace 1981; Ropelewski and Halpert 1987; Trenberth et al. 1998; Walker 1924).
Although ENSO is commonly described as a pseudoperiodic phenomenon with a spectral peak in the 2–7-yr band (Cane 2004; McPhaden et al. 2006; Torrence and Compo 1998; Trenberth 1997), there has been greater recent attention paid to its lower-frequency dynamics, with extended periods of reduced or elevated ENSO frequency and/or intensity identified in the twentieth-century instrumental record (Rasmusson et al. 1994; Wang and Ropelewski 1995) and in longer-term paleo reconstructions (Biondi et al. 2001; D’Arrigo et al. 2001). This low-frequency variability is associated with two lower-frequency indices, described below.
The Pacific decadal oscillation (PDO), defined as the leading principal component of Pacific sea surface temperature poleward of 20°N (Mantua et al. 1997; Zhang et al. 1997). This index is sometimes described as the “reddened” response to both atmospheric noise and the ENSO phenomenon (Newman et al. 2003), and it has a greater degree of decadal variability than ENSO.
The interdecadal Pacific oscillation (IPO) index, defined as the second principal component of low-frequency filtered global sea surface temperature (Parker et al. 2007; Power et al. 1999). This index is often described as exhibiting “ENSO-like” decadal patterns (Power and Colman 2006; Power et al. 2006), although the relative importance of the decadal variability increases away from the equator (Power and Colman 2006).
There has been considerable debate about whether the low-frequency variability is externally forced or arises because of the internal dynamics of ENSO. For example, Wang and Ropelewski (1995) suggest that multidecadal and ENSO-scale variability are distinct, and Gu and Philander (1997) provide a mechanism by which an influx of water from the extratropics can drive variability in the tropics at interdecadal time scales. An alternative view is given by Jain and Lall (2001), who use the Zebiak and Cane (1987) model of tropical ENSO dynamics to show that it is possible to obtain significant interdecadal variability without requiring external low-frequency forcings, with the lower-frequency variations arising because of the nonlinear dynamical behavior of the ENSO system (see also Fedorov and Philander 2000; Kirtman et al. 2013; Newman et al. 2003; Power and Colman 2006; Walland et al. 2000; Wittenberg 2009). Power et al. (2006) furthermore showed that the IPO index can be derived as a low-frequency representation of several ENSO indices, particularly when focusing on the period subsequent to 1945 when the quality of the SST measurements is highest. This implies that, regardless of the mechanism that causes ENSO to vary at interdecadal time scales, this variability is contained within standard indices of ENSO such as the Southern Oscillation index (SOI) and the Niño-3.4 index. A recent review summarizing this debate is given in Liu (2012).
In addition to interdecadal variability in ENSO itself, the associations between ENSO and hydrological variables such as precipitation and streamflow also appear to vary at interdecadal frequencies, suggesting that the IPO and PDO may “modulate” the relationship between ENSO and various hydrometeorological variables. For example, the IPO and PDO have been shown to influence the relationship between ENSO and precipitation in Australia (Power et al. 1999), North America (Gershunov and Barnett 1998; Khedun et al. 2014), southern Africa, and northern India (Dong and Dai 2015), as well as drought (Henley et al. 2013) and flood (Franks and Kuczera 2002; Ishak et al. 2013; Kiem et al. 2003; Micevski et al. 2006; Pui et al. 2011; Verdon et al. 2004) risk in Australia. This information has been used as the basis for developing models of flood risk conditional on the IPO phase (Kiem et al. 2003), and also as an input to climate-informed stochastic models for drought risk (Henley et al. 2011), suggesting the importance of understanding the nature and causes of interdecadal variability for water resources applications.
The apparent modulation of the ENSO–precipitation teleconnection by the IPO is difficult to interpret physically, particularly in light of the finding by Power et al. (2006) that the IPO index can be derived as a smoothed version of ENSO indices. Consider the case where a given year exhibits a strong El Niño but the surrounding years are all La Niña events: for that year the ENSO index is likely to have the opposite sign to the IPO index. In this case, how can the atmosphere “know” the state of ENSO (the leading mode of SST variability) in these surrounding years? The atmosphere has limited internal memory beyond approximately 10–14 days, with the seasonal predictability of atmospheric variables such as precipitation attributed to the relatively longer time scales of variability in the atmosphere’s lower boundary layer: the land surface and the sea surface temperature (SST) field (Goddard et al. 2001; Palmer and Anderson 1994). Furthermore, land surface precipitation responds almost instantaneously to SST forcing at seasonal or annual time scales (Westra and Sharma 2010), with any identified lags being due to the persistence in the SST field. Note that unlike for precipitation, the issue of memory is easier to explain for drought and flood risk, given the role of catchment moisture stores in influencing flood risk (e.g., Pathiraja et al. 2012; Pui et al. 2011).
A resolution to this issue was proposed by Power et al. (2006), who showed that the apparent modulation could arise because of a combination of the use of smoothing in deriving the IPO index, and the apparent nonlinearity (or “asymmetry”) in the association between ENSO and precipitation. This asymmetry has been observed in Australia in several other studies (Cai et al. 2010; King et al. 2013; Sun et al. 2014), although Cai et al. (2010) and King et al. (2013) both suggest that the degree of asymmetry in the relationship between ENSO and precipitation may itself change as a function of the IPO. However, the precise way in which smoothing can lead to the apparent modulation in the teleconnection between the IPO/PDO and hydroclimatic variables such as precipitation has received limited attention in the literature.
This paper extends the work of Power et al. (2006), using theoretical and empirical results to assess the implications of using smoothed time series in statistical modeling of data with nonlinear predictor–response relationships. Taking as a starting point the working hypothesis that the IPO index can be obtained as a smoothed version of standard representations of ENSO (discussed further in section 3), we examine potential issues associated with using smoothed data as a covariate to predict seasonal and annual global precipitation (we focus on the IPO rather than the PDO in this research, because the PDO is derived using only the SST data in the extratropical northern Pacific, and therefore its relationship with ENSO is likely to be more complex). We also investigate the conditions under which the apparent modulation of the ENSO–precipitation relationship by the IPO index can be explained by the application of smoothed data, or by the nonlinearity of the association between ENSO and precipitation.
Given that many of the studies that find a modulation use stratified analyses in which the relationships between ENSO and precipitation and/or streamflow are stratified by both the phase of ENSO and by the IPO (Khedun et al. 2014; Kiem et al. 2003; Verdon et al. 2004), we devote particular attention to this class of model. We then describe how artifacts can be introduced from a combination of 1) using a predictor along with its smoothed version in developing predictor–response relationships, 2) stratifying according to the ENSO phase when the true ENSO–precipitation relationship is in fact continuous, and 3) assuming that a linear ENSO–precipitation relationship is modulated by the IPO when the true ENSO–precipitation relationship is nonlinear.
The remainder of this paper is structured as follows. The data used for our analysis are described first (section 2), followed by a demonstration that IPO-like indices can be derived from standard indices of the ENSO phenomenon (section 3). The theoretical implications of using smoothed predictors under model misspecification are then described (section 4), followed by an analysis using global seasonal and annual precipitation data (section 5). Practical implications of using smoothed data are discussed in section 6, followed by conclusions in section 7.
A global precipitation product, the global sea surface temperature anomaly (SSTA) field, and indices of ENSO and the IPO were used as the basis for our investigations. The extended sea surface temperature anomaly dataset was obtained from Kaplan et al. (1998) and Reynolds and Smith (1994), and is available on a 5° latitude by 5° longitude global grid. This dataset was obtained from the International Research Institute for Climate and Society data library (http://iridl.ldeo.columbia.edu/). The Niño-3.4 index was also obtained from this data library, and is defined as the average SSTA calculated over the domain bounded by 5°S–5°N, 120°–170°W.
The updated IPO data developed by Parker et al. (2007) using the Hadley Centre Sea Surface Temperature version 2 dataset (HadSST2; Rayner et al. 2006) were downloaded from the Institute of Global Environment and Society (www.iges.org/c20c/IPO_v2.doc). The IPO index is derived by projecting the HadSST2 field onto the second empirical orthogonal function (EOF) applied to the low-frequency filtered HadSST2 covariance matrix. The low-frequency filtered HadSST2 field is obtained by applying an 11-yr low-pass Chebychev filter to the SST data. The “smoothed” IPO index is then obtained by projecting the low-frequency filtered HadSST2 data directly onto this EOF, while the “unsmoothed” IPO index is obtained by projecting the unfiltered monthly HadSST2 field onto the same EOF.
The Global Precipitation Climatology Centre (GPCC) full data reanalysis product (v6) from 1901 to 2010 was downloaded from http://www.esrl.noaa.gov/psd/data/gridded/data.gpcc.html (Schneider et al. 2011, 2014). The data are available in the form of monthly gridded precipitation depth, and the 2.5° latitude by 2.5° longitude resolution product was used for our analysis. To minimize the potential for spurious results because of insufficient sampling, only data from 1921 to 2005 were used (see Fig. 1 for station availability over this period). The data were aggregated to seasonal and annual time scales.
3. The IPO index as a smoothed version of ENSO
A foundation of our analysis is that the IPO index can be represented as a smoothed version of standard representations of ENSO, and we begin by showing that IPO-like indices can be obtained through smoothing standard representations of the ENSO phenomenon.
Figure 2, top panel, shows the annual time series of the Niño-3.4 index, the unsmoothed IPO index, and the first principal component (PC1) of the detrended global sea surface temperature field (obtained by removing the linear trend at each grid point). All series were standardized and the sign of PC1 was reversed to facilitate comparison between indices. These series are highly correlated, with a correlation coefficient of 0.94 between the unsmoothed IPO index and PC1, 0.93 between the Niño-3.4 index and PC1, and 0.85 between the Niño-3.4 index and the unsmoothed IPO index.
To approximate the smoothed IPO index from Parker et al. (2007) (Fig. 2, bottom), we applied a local polynomial regression (loess, with span = 0.35) to the unsmoothed IPO index, with the span selected to maximize the similarity with Parker et al.’s (2007) smoothed IPO index. Our loess-smoothed IPO index is clearly very similar to the standard smoothed IPO index, despite the substantial differences in how each series was derived. A moving average filter was also applied to the unsmoothed IPO index using a 13-yr centered window, and again the time series exhibit similar features. Finally, the loess smoothed series of the PC1 and Niño-3.4 indices are presented, and both series show similar features to the IPO index except for some differences in the Niño-3.4 series prior to 1945. Similar findings were obtained by Power et al. (2006), who suggested that the difference between the Niño-3.4 index and the IPO index prior to 1945 might be due to higher levels of observational uncertainty in the earlier period of record. Interestingly, visual inspection of the raw (unsmoothed) IPO and Niño-3.4 series (Fig. 2, top) shows that the raw time series are highly correlated throughout the period of record, with the correlation coefficient between the Niño-3.4 and IPO indices being 0.89 for the period prior to 1945 and 0.84 for the period from 1945 onward.
An alternative perspective can be obtained by presenting the correlation between both the Niño-3.4 and unsmoothed IPO time series and the raw sea surface temperature data (Fig. 3). The magnitude and spatial pattern of the correlation coefficients are almost indistinguishable, suggesting a high degree of similarity between the series.
This analysis corresponds closely to that conducted by Power et al. (2006), who noted high levels of association between series of the SOI, Niño-3.4, and the IPO after applying a low-pass spectral filter with 13-yr cutoff, particularly in the period from 1945 when confidence in the SST observations in the tropical Pacific increased significantly (Kaplan et al. 1998). Power et al. (2006) also applied a low-pass filter to the first principal component of the unfiltered sea surface temperature data and obtained a correlation coefficient of 0.97 with the IPO index.
We conclude that the IPO index can be obtained as a low-frequency representation of standard measures of the ENSO phenomena. We stress that this conclusion does not depend on an understanding of the presence of, or physical reasons for, low-frequency variability in ENSO, or whether it is internally or externally forced; all that is required for the arguments in the following section is that the IPO index can be obtained as a smoothed version of ENSO. We now examine possible implications of using smoothed series when statistically modeling the association with hydrometeorological series such as precipitation.
4. Smoothing, stratification, and model misspecification
Here we present theoretical results that describe implications and possible artifacts that result from relating unsmoothed and smoothed predictors (e.g., the Niño-3.4 index and the IPO index) to a response variable (e.g., precipitation), including the case where the statistical model has been misspecified.
a. A simplified representation of the IPO–ENSO–precipitation relationship
Consider the situation in which a predictor is associated with response via the linear model:
where a and b are constants, and and are the variances of X and ε, respectively. We also construct a moving average of the predictor:
where ω is the moving average window (assumed to be an odd number to simplify notation). For all three time series, the time index t = 1: NT can be thought of as representing successive years, or possibly a season over successive years. Several statistical properties of the joint distribution (Xt, Yt, Rt) are described in section a of the appendix.
For this example, Xt represents a climate index of interannual variability (e.g., some ENSO index), except that the Xt values are assumed to be independent and identically distributed (iid), whereas ENSO indices typically exhibit some autocorrelation (e.g., Torrence and Compo 1998). Also, Rt represents a precipitation variable, which in this example depends linearly on Xt. Since Rt only depends on Xt with iid residuals, Rt is not expected to exhibit autocorrelation. Finally, Yt is a smoothed version of Xt, which is designed to be analogous to the IPO index being derived as a smooth version of the Niño-3.4 index as described in section 3. Here, smoothing is implemented through a moving average with a window of size ω centered on the current time step, and is analogous to the moving average filter of ENSO depicted in Fig. 2. It is acknowledged that the standard derivation of the IPO index (which involves low-pass filtering of the SSTA field followed by an EOF analysis) is considerably more complex than indicated in Eq. (3); however, this simplified representation enables the derivation of analytical results that highlight possible effects of smoothing.
b. Autocorrelation of the smoothed predictor and its relationship to the response
A first remark is that smoothing, by construction, creates apparent low-frequency variability. In the case of the moving average used in Eq. (3), it can be shown that the autocorrelation function of Yt decreases linearly as a function of the time lag τ [i.e., ρ(τ) = max(1 − |τ|/ω, 0)], creating the low-frequency behavior. This is a pure consequence of the moving average filter and does not denote any existing low-frequency component in the raw variable Xt (which is iid here).
A second remark is that the averaging window encompasses the current time step t, so that Xt and Yt are statistically dependent, with correlation coefficient ρ = ω−1/2 (see proof in section b of the appendix). In other words, knowing the value taken by Yt provides some information on the value of Xt. As a consequence, any variable that depends on Xt may also depend on Yt through the information brought by Yt about Xt. More formally, the distribution of Rt knowing that Yt takes the value y is (see proof in section c of the appendix)
yielding a correlation coefficient between Rt and Yt equal to
Equation (4) shows that while Rt is defined without any reference to Yt, its conditional distribution knowing Yt = y does depend on y. More precisely, the relationship between Rt and Yt is similar to the relationship between Rt and Xt, with the same b + ay linear relationship, but with residual variance increased from to .
In practice, the correlation between Rt and Yt implies that one may spuriously attribute some low-frequency variability in precipitation to a low-frequency climate mode with the following reasoning:
The smoothed index Yt exhibits low-frequency variability, but this might be a pure consequence of filtering.
Precipitation is related to the smoothed index, but this might be a consequence of precipitation being related to the unsmoothed index, which may or may not exhibit a low-frequency component.
An alternative perspective of the relationship between Rt and Yt can be obtained by examining Rt conditional on alternative values of Yt. Following the approach of Thyer et al. (2006), we evaluate the probability that Rt in the “wet” phase of Yt [denoted as ] is less than Rt in the “dry” phase [denoted as ], for some separation measure Δy. This probability is denoted as , where indicates that the distributions are “highly similar” with little or no separation, and indicates that the distributions are less similar and have a higher separation.
We can express Δy as function of the number of standard deviations (c) of σy (i.e., Δy = cσy = cσxω−1/2) and refer to c as the “degree of separation.” Since a represents the coefficient of the linear relationship between Xt and Rt, while σε represents the noise in this relationship, we can also introduce a pseudo signal-to-noise ratio as d = a/σε. This allows us to develop an expression for in terms of c and d, which is given as Eq. (A4) in section d of the appendix.
Using a typical value of ω = 13, as used for the IPO index in Fig. 2, we can plot the “similarity measure” as a function of the typical ranges of c and d. This is shown in Fig. 4, with the similarity measures decreasing toward values closer to zero, suggesting the existence of distinct wet and dry states with a high separation, and with increasing values of both c and d. This is in contrast to the model in Eqs. (1)–(3) that has no low-frequency variability and no distinct wet and dry states. To provide a guide to the range of values for , Thyer et al. (2006) found values of 0.13–0.30 for annual rainfall from capital sites around Australia. This example illustrates that the distribution of precipitation can vary substantially when conditioning on smoothed indices, and highlights the potential for misleading conclusions when using this form of analysis to investigate the nature of low-frequency variability in precipitation.
c. Spurious modulation due to stratifying on multiple correlated climate indices
An additional issue is linked to the practice of stratifying the precipitation data according to a categorical discretization of the climate index. For instance, one may separate precipitation data according to the sign of the ENSO index, and evaluate whether the distribution of precipitation during positive ENSO phase (ENSO+) and negative ENSO phase (ENSO−) are significantly different. In this section, we show that this approach might create spurious modulations when several correlated variables are used: typically, a climate index and its smoothed version.
Assume a category for climate index Xt is defined as Xt ∈ Ix. As an illustration, an El Niño event might be defined as Niño-3.4 index >0.5°C, yielding the interval Ix = (0.5; +∞). Similarly, a category for the smoothed climate index Yt can be defined as Yt ∈ Iy. It is then of interest to consider the distribution of the following conditional random variables: (Rt | Xt ∈ Ix) (representing precipitation stratified by ENSO category) and (Rt | Xt ∈ Ix, Yt ∈ Iy) (representing precipitation stratified by both ENSO and IPO categories).
It is possible to derive explicit formulas for these two conditional distributions (section e of the appendix), and they are illustrated in Fig. 5. The gray curve shows the marginal distribution of precipitation, representing what one knows about precipitation in the absence of any external information. If we receive information that Xt > 0, the distribution shifts toward higher precipitation values (black curve), as a consequence of the Xt − Rt linear relationship of Eq. (1). When conditioning on both Xt > 0 and Yt > 0, one may intuitively expect the same distribution when conditioning on Xt > 0 alone (black curve) since, by construction, Rt is defined based on Xt alone. This is actually not the case: the distribution of Rt knowing that both Xt > 0 and Yt > 0 is slightly higher (light red curve) than when conditioning only on Xt > 0. Conversely, the distribution of precipitation knowing that Xt > 0 and Yt < 0 (light blue curve) is slightly lower than when conditioning only on Xt > 0. The difference between the light blue and light red curves could be incorrectly interpreted as a “modulation” of the ENSO effect by the IPO phase. This difference is even larger when conditioning on “extreme” Yt values (dark red and blue curves).
The observations in Fig. 5 can be explained as follows. The distribution of Rt conditional on a given value Xt = x is indeed independent of Yt. However, the distribution of Rt conditional on a range of Xt values (e.g., Xt > 0) depends on the information available on Yt [see Eq. (A5) in section e of the appendix]. In the particular case of Fig. 5, knowing that Yt is larger than 1 (i.e., that the moving average of Xt is larger than 1) suggests that Xt is less likely to be close to zero. In contrast, if the (more precise) information that, for example, Xt = 2 was available, then any information on Yt would become irrelevant.
d. Spurious modulation due to model misspecification
Assessing the presence of a modulation of the ENSO–precipitation relationship by the IPO requires the assumption of a statistical model that describes this modulation. A simple way to derive such a model is to assume a linear ENSO–precipitation relationship, whose slope varies depending on the IPO phase. This can be written as
Equation (6) defines the distribution of Rt conditional on the values taken by both Xt and Yt, where μ0, μ1, and γ are regression coefficients. It is also interesting to evaluate the distribution of Rt conditional on Xt alone. It can be shown (see section f of the appendix) that with the assumptions made in Eqs. (2) and (3), this distribution is a mixture of two Gaussian distributions, with the following mean:
with F(u; m, s2) denoting the cumulative distribution function of a Gaussian distribution with mean m and variance s2, evaluated at u. Equation (7) has an important consequence: it shows that the initial shape of the Xt–Rt relationship, as defined within each Yt phase, is not preserved when precipitation is conditioned on Xt alone. This is illustrated in Fig. 6: while the Xt–Rt relationship is linear within each Yt phase, it is not linear anymore when one ignores the information on Yt. Interestingly, when x → +∞, the expected precipitation conditional on Xt = x tends toward the expected precipitation conditional on both Xt = x and Yt > 0 (and similarly when x → −∞). This can be explained as follows: when a very large Xt value occurs, it is very likely that Yt is positive (since the latter is the moving average of the former), so that the relationship of the positive Yt phase holds.
A statistical model that assumes a linear Xt–Rt relationship within each Yt phase therefore does not result in a linear overall Xt–Rt relationship. This may be problematic in cases where the true Xt–Rt relationship is nonlinear: indeed, the statistical model may use the Yt modulation to induce some curvature in the overall Xt–Rt relationship (as shown by the black curve in Fig. 6), therefore detecting a spurious Yt modulation effect that is in fact attributable to a model misspecification (i.e., assuming a linear Xt–Rt relationship within each Yt phase when the true relationship is nonlinear).
This can be illustrated as follows. Assume that the true Xt–Rt relationship is no longer linear, but rather piecewise linear, as suggested by several recent studies (e.g., Sun et al. 2014). This leads to modifying Eq. (1) as follows:
We then fit the model of Eq. (6), corresponding to the illustration in Fig. 6. If parameter γ is found to be nonzero, this would indicate there is a modulation of the Xt–Rt relationship by Yt. Note that this model is misspecified: while the true precipitation depends nonlinearly on only Xt, the model assumes that precipitation depends linearly on Xt, but that the slope of the relationship may vary according to the Yt phase.
The model misspecification results in the spurious detection of a modulation effect. This is illustrated by a Monte Carlo experiment using 1000 simulations of precipitation simulated according to Eq. (9) (n = 85, σx = 1, ω = 7, b = 40, a− = −8, a+ = 0, and σε = 5). Figure 7 shows that the γ parameter is positive in approximately 91% of the 1000 simulations, providing apparent evidence of modulation. The model in Eq. (6) is compensating for its structural inadequacy by detecting a spurious modulation effect.
Figure 8 yields additional insights into this compensation mechanism. It corresponds to one particular realization over the 1000 Monte Carlo replications described above. A first observation is that most largely negative Yt events (blue dots) correspond to negative Xt values: this is not surprising, and is simply a consequence of Yt being a smoothed version of Xt. Another observation is that the mean of Rt | Xt = x no longer depends linearly on x. It therefore appears that the misspecified model of Eq. (6) is trying to recreate the nonlinear relationship of Eq. (9) by means of the modulation term.
5. Application to global precipitation data
The theoretical results described in the previous section show that a predictor–response relationship can be spuriously “modulated” by the smoothed version of the predictor when stratifying and under model misspecification. Using a global monthly precipitation product together with standard indices of ENSO and the IPO, we now evaluate whether it is possible to detect an apparent ENSO–precipitation modulation by the IPO, and then examine whether alternative model formulations reduce or eliminate this modulation. We begin by describing three alternative model formulations for examining the role of ENSO–precipitation modulation by the IPO.
a. Three formulations of the ENSO–precipitation relationship
The following three statistical models will be considered:
In Eqs. (10)–(12), the residual term εt is assumed to follow a Gaussian distribution with mean zero and unknown standard deviation σε. In each formulation, the Niño-3.4 index described in section 2 was used to represent the ENSO phenomenon, and the IPO index developed by Parker et al. (2007) was used to represent the IPO. Figure 9 illustrates the three formulations:
The stratified formulation assumes constant mean precipitation within each combination of ENSO/IPO phase. The difference in mean precipitation between ENSO phases is modulated by IPO: it is equal to 2μ1 during negative IPO phase (IPO−) and to 2(μ1 + γ) during positive IPO phase (IPO+). The parameter γ can therefore be interpreted as the modulation of the ENSO effect by the IPO phase. Note that this formulation depends only on the sign of ENSO; its strength is not considered.
The linear formulation assumes a linear relationship between the mean precipitation and ENSO, with the slope of this relationship differing between positive and negative IPO phases. The difference in the slope between IPO phases is controlled by the modulation parameter γ.
The piecewise linear formulation is similar to the previous model, except that the linear relationship is replaced by a piecewise linear relationship, allowing for different impacts on the mean rainfall during La Niña and El Niño phases. As a consequence, two distinct modulation parameters γ1 and γ2 are introduced.
b. Estimation and testing
All three formulations introduced in section 5a can be written as linear statistical models (see section g of the appendix). Beware of the possible confusion with the adjective “linear” here: in the statistical sense, a linear model is linear with respect to its parameters. All three models in Fig. 9 are linear with respect to their parameters, so that the standard statistical framework of linear models can be used for parameter estimation and hypothesis testing. However, only the second model assumes a linear relationship (or more precisely, an affine relationship) between ENSO and precipitation within each IPO phase. Note also that all formulations of section 5a assume normally distributed residuals, which may be problematic in some regions of the world where the distribution of seasonal/annual precipitation is asymmetric. To circumvent this problem, a normal score transformation is applied to the raw seasonal/annual precipitation values. Last, all formulations of section 5a assume that the standard deviation of residuals σε remains constant between IPO phases. This assumption has been evaluated by testing the equality of residual variances between IPO phases using a standard F test. In most cases no significant difference was found, with the few significant differences being safely attributable to the error level of the test (not shown). All computations have been implemented with the package lm in R.
We are particularly interested in assessing the significance of ENSO and modulation effects. Tests of the significance of ENSO and modulation parameters are easy to apply on a site-by-site basis, and this local procedure is repeated over all sites with available data. However even in the absence of any ENSO/modulation effect on precipitation, we expect to find locally significant effects in a percentage of sites close to the error level of the test. It is therefore necessary to assess significance at the global scale or, in other words, to determine which sites show globally significant effects, knowing that the local tests have been repeated over Ns sites. The false detection rate (FDR) approach (Benjamini and Hochberg 1995; Renard et al. 2008; Ventura et al. 2004; Wilks 2006) is used for this purpose. The results presented below will therefore distinguish two types of significance: local significance (related to the p value of a local test) and FDR significance (denoting sites where effects remain significant after accounting for the repetition of the test over a large number of sites).
c. Results at the seasonal scale
We commence by examining the results from each formulation for individual 3-month periods (seasons), defined as January–March (JFM), April–June (AMJ), July–September (JAS), and October–December (OND). Results below are for the season OND; results for other seasons (not shown) are qualitatively similar, but they all exhibit less pronounced ENSO/modulation effects.
Figure 10 shows the significance of ENSO effects. The stratified and linear formulations yield similar results: a locally significant ENSO effect is detected in many places, and in most cases this effect remains FDR significant. The spatial patterns are in agreement with the well-known spatial distribution of ENSO–precipitation teleconnections reported in other papers (e.g., Ropelewski and Halpert 1987). In comparison to the first two formulations, the piecewise linear formulation, which distinguishes ENSO effects during El Niño and La Niña phases, exhibits fewer FDR-significant effects. This is likely to be due to the higher complexity of the piecewise linear formulation, which uses five parameters (plus the residual standard deviation) versus three parameters for the other formulations. The additional complexity can result in higher p values for significance tests, decreasing the number of FDR-significant effects.
Figure 11 shows the significance of the modulation effects. They are much less significant than the ENSO effects for all three formulations, suggesting that the information brought by the IPO index beyond the information already contained in the Niño-3.4 index is quite limited. In addition, modulation effects never reach FDR significance: this further suggests that while locally significant modulations may be detected in certain cases, the hypothesis that they are all false detections attributable to the repetition of the test over many sites cannot be excluded.
d. Results at the annual scale
While results for all four seasons suggest that modulation effects are not FDR significant for all formulations, results at the annual scale are slightly different. Figure 12 shows that FDR-significant modulations exist in northeastern Australia with a linear formulation (top-right panel). However, the modulations are not FDR significant with a stratified formulation (top-left panel) or with a piecewise linear formulation (bottom panels).
The particular case of northeastern Australia is investigated further by plotting the normal-scored precipitation against ENSO values for one cell located in the dark red region in the top-right panel of Fig. 12. Figure 13 shows the ENSO–precipitation relationship for this site, along with the ENSO-conditional precipitation distribution according to each of the three formulations. These distributions are derived from the distribution of precipitation conditional on both the Niño-3.4 and IPO indices specified in Eqs. (10)–(12), as described in section 4d and section f of the appendix.
Figure 13 highlights an apparent nonlinearity in the ENSO–precipitation relationship. Since a linear formulation cannot describe such nonlinearity, the model is using the modulation parameter to recreate the nonlinear ENSO–precipitation relationship suggested by the data (see the curvature of the conditional mean in Fig. 13b). This closely corresponds to the synthetic example given in section 4d (cf. Figs. 8 and 13b). Conversely, the piecewise linear formulation is able to describe such nonlinearity by using very different slopes during El Niño and La Niña phases (Fig. 13c) and therefore does not need a modulation effect to induce this behavior. This explains why modulation effects are not FDR significant with this formulation.
6. Discussion and implications
The previous section showed that the modulation of the ENSO–precipitation relationship was not FDR significant in all locations globally for all seasons, and was FDR significant for the annual data only for a region in northeastern Australia and only for the linear formulation. When the ENSO–precipitation relationship was modeled for this specific region, a nonlinear association could be detected, suggesting that the apparent modulation by the IPO could be explained by the nonlinear ENSO–precipitation association. In this section we describe potential implications of these findings, starting with alternative physical interpretations associated with nonlinear ENSO–precipitation relationships compared to one modulated by the IPO.
a. Physical interpretation of alternative models
Linear ENSO–precipitation models have often been assumed for convenience (Hoerling et al. 1997); however, there are no a priori reasons to assume that the association is linear. In fact, numerous papers have highlighted the apparent nonlinearity in the ENSO–precipitation association (Nicholls and Wong 1990; Power et al. 2006; Sun et al. 2014), including the potential for asymmetry depending on whether the ENSO is positive or negative (Sun et al. 2014; Zhang et al. 2014). It has been suggested that the nonlinearity (or asymmetry) of the relationship is due to the asymmetry in midtropospheric circulation, and attendant thermodynamic controls on deep convection (Hoerling et al. 1997; Zhang et al. 2014).
In contrast to a nonlinear ENSO–precipitation relationship, the physical mechanism that could explain the modulation of the ENSO–precipitation relationship by a smoothed version of ENSO is less clear. Continuing with the argument made in section 3 that the IPO index can be represented as a smoothed version of standard representations of ENSO such as the Niño-3.4 index, then the IPO index for a given year t can be divided into three distinct components: the ENSO series at time t, the ENSO series in previous years, and the ENSO series in future years. If there is a relationship between ENSO and precipitation at time t, then it is not surprising that a relationship between the IPO index and precipitation can also be detected, since the IPO index is partly made up of ENSO variability at time t (see theoretical discussion in section 4b). However, assuming that this component is already accounted for by directly modeling the ENSO–precipitation relationship, then the residual influence of the IPO (i.e., after accounting for ENSO at time t) must be due to the ENSO state in the years before, or after, year t. Using the simplified representation in Eq. (3) and assuming a 13-yr window as was adopted in Fig. 2, this corresponds to the parts and , respectively.
It is difficult to provide a physical interpretation for the modulating role of ENSO up to six years in the past on the teleconnection between ENSO at time t and precipitation at time t. As discussed in the introduction, the atmosphere itself has limited internal memory beyond approximately 10–14 days (Goddard et al. 2001; Palmer and Anderson 1994), and land surface precipitation appears to respond almost instantaneously to SST forcing at seasonal or annual time scales (Westra and Sharma 2010). Although there is evidence that atmospheric variables such as equatorial winds can affect decadal variability of ENSO (Wang and An 2002), this variability occurs as part of a coupled system with the ocean, rather than being independent of it; therefore it is unclear how the atmosphere can maintain multiyear memory in the absence of that variability being detected in the SSTA field.
It is possible that some of the information on historical ENSO variability transfers to non-ENSO SST variability at time t (e.g., through thermocline depth variations or some form of atmospheric bridge; McPhaden et al. 2011; Wang and An 2002), which could then influence the ENSO–precipitation relationship at time t. However, the orthogonality of the ENSO signal with other elements of SST variability (based on ENSO being described by the first principal component of SST variability, which is by definition orthogonal to the remainder of the SST field), combined with the fact that the variance accounted for by each non-ENSO mode of variability is much smaller than the variance accounted for by ENSO (e.g., Westra and Sharma 2010), suggests that this is unlikely to fully explain the apparent modulation of the ENSO–precipitation relationship. Even less clear than the role of ENSO variability up to six years in the past, however, is the physical relevance of the state of ENSO up to six years into the future with regard to precipitation at time t.
Although it is not possible to prove conclusively that the IPO does not modulate the ENSO–precipitation teleconnection, we have shown how an apparent modulation can arise spuriously because of the nonlinear ENSO–precipitation association and, furthermore, that the effect of the IPO after accounting for ENSO is no longer FDR significant, with the exception of a small region in northeastern Australia using annual data for one of the three model formulations described here. Combining this with the difficulty in physically explaining the mechanism causing IPO modulation of the ENSO–precipitation teleconnection as described in this section, we suggest that the apparent IPO modulation of the ENSO–precipitation relationship may be spurious and attributable to the use of a combination of a smoothed time series, stratification by ENSO phase, and model misspecification.
b. Use of the IPO index for prediction
It is sometimes suggested that use of the IPO index can lead to improved short-term forecasts of hydrological variables, with potential applications for operational flood management, infrastructure maintenance, and reservoir management (Kiem et al. 2003; Verdon et al. 2004). The justification for this is because of the persistence of the IPO index itself, so that knowledge of the current phase of the IPO will provide significant information on the future state of the IPO (as discussed in Power and Colman 2006).
As argued in section 4b, the persistence in the IPO index can be induced by smoothing the data, regardless of whether the underlying data is autocorrelated or not. Furthermore, given that smoothing of the IPO index is centered on time t, it is not possible to obtain a “real time” estimate of the IPO. This is reflected in the moving average IPO formulation described in Eq. (3), in which assuming a 13-yr moving window one needs to know ENSO up to six years into the future; similarly, the official website for the IPO (www.iges.org/c20c/IPO_v2.doc) cautions that “future values [of SSTs] may significantly affect the filtered values for about the most recent five years, so these should be regarded with caution.”
We therefore suggest that to improve forecasts of precipitation, streamflow, and other hydrometeorological variables, research efforts should focus on improving ENSO predictions (Barnston et al. 2012; Cane et al. 1986; Latif et al. 1998) or better understanding the role of other climate modes of variability such as the Indian Ocean dipole (Cai et al. 2011), the North Atlantic Oscillation (Qian et al. 2000), or sea surface temperatures more generally (Westra and Sharma 2010). However, given the potential for spurious effects when using smoothed data, it is generally recommended that relevant climate modes be represented using unsmoothed indices in preference to smoothed indices when developing predictive models.
We note that the identification of potentially spurious effects when using smoothed series does not negate the possibility that parts of the climate system exhibit significant low-frequency variability, which may be expressed through large-scale climate modes or which may be found in hydroclimatic variables such as precipitation (Dai 2013). Better understanding of the nature of this low-frequency variability might ultimately provide a basis for developing decadal-scale climate forecasts, with research in this area having intensified in recent years (Kirtman et al. 2013).
To this end, a number of frequency-based methods are available, and can been used to 1) assess the significance of one or several low-frequency components in a hydroclimatic time series (e.g., Boé and Habets 2014); 2) assess the presence of significant coherence between two series at one or several frequency levels (e.g., through the use of wavelet coherence analyses; Park and Mann 2000; Dieppois et al. 2013); and 3) develop predictive models of one or several hydroclimatic variables (e.g., Rajagopalan et al. 1998). However, our results indicate that simply using a combination of unsmoothed and smoothed climatic series of the same or closely related phenomena (such as ENSO and the IPO) can lead to misleading conclusions and should probably not be used in a predictive context.
c. Extensions to other surface variables
This paper has focused on the effects of the IPO on seasonal and annual precipitation. The implications for other atmospheric variables, including precipitation extremes or atmospheric temperature, are unknown but given the limited persistence in these variables (other than through persistence in the atmosphere’s lower boundary), it is likely that the conclusions would be consistent with those presented here.
Much of the research into the IPO has focused on hydrological variables such as seasonal streamflow (Verdon et al. 2004) and flood risk (Kiem et al. 2003) rather than directly on precipitation. In such cases, there is likely to be significant persistence in streamflow time series, because of the role of catchment moisture stores (e.g., soil moisture and groundwater) in imparting low-frequency behavior to these hydrological variables (Pathiraja et al. 2012). In these cases, a significant IPO–streamflow association might remain after accounting for ENSO at time t, for the following reasons:
The precipitation up to k time steps in the past influences the catchment moisture store at time t, as a result of the persistence in the soil moisture and/or groundwater levels in the catchment.
The precipitation over the period from t − k to t − 1 may be associated with ENSO over the same time period, with at least some of the ENSO variability from this period being incorporated into the definition of the IPO index at time t.
Although this provides a mechanism by which the prior states of ENSO can influence streamflow, it is likely to be preferable to explicitly simulate the persistence structure of soil moisture and/or groundwater in a catchment, since the dynamics and time scales of the moisture stores will vary markedly from one catchment to the next whereas the time scale of the IPO index is a fixed quantity.
This study evaluated the role of the interdecadal Pacific oscillation (IPO) in modulating the El Niño–Southern Oscillation (ENSO)–precipitation relationship. The outcomes of the study are the following:
The IPO index can be derived as a smoothed version of standard representations of ENSO.
It is theoretically possible to spuriously attribute modulation of the ENSO–precipitation teleconnection to the IPO, whereas in reality the apparent modulation is due to a combination of data stratification and model misspecification.
Based on observational data of global land surface precipitation, it was found that the information brought by the IPO index beyond the information already contained in ENSO is quite limited, and could be explained by the role of stratification and model misspecification (i.e., assuming a linear model to simulate a nonlinear predictor–response association).
This paper has focused entirely on the possible modulation of the teleconnection between ENSO and precipitation by the IPO index; as such, the analysis does not require assumptions as to whether the ENSO phenomenon itself exhibits low-frequency variability, and whether this low-frequency variability in ENSO is forced internally or externally to the ENSO system.
We also described a number of potential issues associated with using the IPO index to statistically model the ENSO–precipitation relationship, including the difficulty in physically explaining the mechanism that causes the modulation, and we caution against the use of the IPO index for predicting future precipitation. In the absence of the identification of a physical mechanism of how the IPO modulates the ENSO–precipitation relationship, the IPO index is not recommended as the basis for statistical modeling of seasonal or annual precipitation because of the potential for statistical artifacts when stratifying and using smoothed series to simulate the ENSO–precipitation relationship.
S. Westra was supported by Discovery project DP120100338. M. Thyer was supported by Australian Research Council Discovery project DP1094796.
a. Preliminaries: The joint distribution of (Xt, Yt, Rt)
The following definitions link the random variables Xt, Yt, and Rt:
We start by showing that the joint distribution of (Xt, Yt, Rt) is a trivariate normal distribution. To achieve this, it suffices to show that any linear combination Z = α1Xt + α2Yt + α3Rt is univariate normal,
Therefore, Z can be written as a constant plus a linear combination of independent normal variables, since for any k ≠ 0, the random variables Xt+k, Xt, and εt are independent and normal by construction. Since any linear combination of independent normal variables remains normal (i.e., the normal family is stable), it follows that Z is also normal for any value of (α1, α2, α3), and therefore (Xt, Yt, Rt) is trivariate normal.
We now derive the mean and covariance matrix of (Xt, Yt, Rt). It is straightforward to show that the mean vector is equal to μ = (0, 0, b). For the covariance matrix, we start by deriving the marginal variances.
The covariance terms are computed as follows:
Finally, the joint distribution of (Xt, Yt, Rt) is
Most subsequent results are derived from this trivariate normal distribution. In particular, we will make use of the following well-known result for conditional distributions of a multivariate normal distribution (Mardia et al. 1976). Let Z be a random variable following a multivariate normal distribution with mean μ and covariance matrix Σ. Let us further assume that Z, μ, and Σ are partitioned as
The distribution of (Z1 | Z2 = z) is then multivariate normal, with the following mean and covariance:
b. Correlation between Xt and Yt
This correlation is directly obtained from Eq. (A1) as
c. Dependence between Rt and Yt
We first deduce the conditional distribution of (Rt | Yt = y) from Eq. (A2):
The correlation between Rt and Yt is directly obtained from Eq. (A1) as
d. Rt conditional on Yt
Consider the following conditional distributions:
where R±y is shorthand for the distribution of Rt conditional on Yt = ±Δy, and where Δy represents a measure of the separation between the two conditional distributions. As the two distributions R±y are normally distributed, the probability that R+y is less than R−y can be evaluated as
where Φ(Z) is the cumulative probability function for the standard Gaussian distribution.
If we express Δy as function of the number of standard deviations (c) of σy (i.e., Δy = cσy = cσxω−1/2), we can refer to c as the “degree of separation.” Hence, the numerator of Eq. (A3) can be rewritten as
To further simplify the numerator, we assume that the Xt have been standardized so that σx = 1. As a represents the coefficient of the linear relationship between Xt and Rt, while σε represents the noise in this relationship, we define a pseudo signal-to-noise ratio as d = a/σε. Using these relationships, the denominator in Eq. (A3) can be rewritten as
so that Eq. (A3) simplifies to
e. Stratified distributions of Rt
The probability density function of (Rt | Xt ∈ Ix) can be derived using Bayes’s theorem as follows:
In this expression, the denominator and the second term of the numerator are known since the marginal distributions of Rt and Xt can be deduced directly from Eq. (A1). To compute the first term in the numerator, one needs to derive the distribution of (Xt | Rt = r). By applying Eq. (A2),
The pdf of (Rt | Xt ∈ Ix) is hence equal to
where we introduced the following notation:
f(u; m, s2) is the pdf of a univariate normal variable with mean m and variance s2, evaluated at u, and
Φ(I; μ, Σ) is the probability that a multivariate normal variable with mean μ and variance Σ belongs to I. Note that there is no explicit formula for this probability, but it can be approximated numerically (see, e.g., package mvtnorm in R).
With a similar reasoning, the pdf of (Rt | Xt ∈ Ix, Yt ∈ Iy) is
f. Distribution of Rt given Xt in a modulation model
The following statistical model is assumed in this section:
The objective is to derive the distribution of Rt conditional on the value taken by Xt. This can be done by applying the law of total probability as follows:
To compute the terms of this equation, one needs to determine the distribution of Yt conditional on Xt = x. By applying Eq. (A2),
where F(u; m, s2) is the cumulative distribution function of a univariate normal variable with mean m and variance s2, evaluated at u.
The distribution of (Rt | Xt = x) is therefore a mixture of two normal distributions. Its expectation can be derived as
g. Rewriting of the three formulations as linear statistical models
Using the notation
the models can be rewritten as follows: