## 1. Introduction

In a thought-provoking paper, Trenberth and Hoar (1996, hereafter TH) suggested that the 1990–95 El Niño–Southern Oscillation (ENSO) event was very unusual and may provide evidence of global warming and climate change associated with increased greenhouse gas concentrations in the atmosphere. Their argument is based on a statistical analysis of the 1882–1995 time series (Fig. 1) of seasonal (4 seasons yr^{−1}) Darwin sea level pressure (DSLP). They fit an autoregressive-moving average (ARMA) model of order (3, 1) to the seasonal 1882–1981 DSLP time series. Based on simulations from the ARMA model, they assign a return period of 2000 yr to the mean anomaly associated with the 1990–95 event. The return period associated with the 22-season-long 1990–95 run of positive anomaly (PA) was reported as 8850 yr.

The estimation of recurrence intervals for multivariate extreme values (e.g., a run) of time series requires the specification of a time series model and its attributes. The choice of model may affect the outcome. Although the ARMA model is a plausible choice, investigation of alternate models is useful. Specifically, we (a) directly explore evidence of nonstationarity in ENSO occurrence over the record, and (b) examine alternate representations (e.g., Markov chains) that focus more directly on PA runs and can model nonstationary processes. An analysis of the DSLP time series from this perspective is presented here.

## 2. Overview

An ARMA model for DSLP assumes stationary, linear dynamics underlying the ENSO occurrence process. The assumption of linear dynamics in particular may be improved on, as seen in Fig. 2. This figure shows a locally weighted regression (LOESS; Cleveland and Devlin 1988) surface fitted to the scatterplot of DSLP(*t* − 2), DSLP(*t* − 1), and DSLP(*t*); the fitted surface can be thought of as the state transition function from DSLP(*t* − 2), DSLP(*t* − 1) to DSLP(*t*). The nonlinear state transition function has an “S” shape that is suggestive of a binary or two-state system with switching between the two states. This is consistent with paradigms of ENSO that consider nonlinear underlying dynamics (Tziperman et al. 1994, 1995; Graham and White 1988) for the tropical ocean–atmosphere system.

Trenberth and Hoar applied the ARMA model to a continuous random variable and identified runs of positive (PA) or negative (NA) anomalies relative to the climatological mean. The run lengths are used to assess the return period associated with the 22-season-long 1990–95 PA run. The two states (PA, NA) can be thought of as the state space of a binary discrete random variable. The mean run length of PA or NA may be biased downward in this process, as random crossings of the mean DSLP may be more likely in a model defined with a continuous rather than a discrete random variable. Conversely, there is a loss of information if one works directly with a discrete representation. Nevertheless, a discrete representation allows a more direct focus on the run length statistics. Nonparametric time series models are useful to address the possibility of nonlinear dependence and to explore nonstationarity.

The 1882–1995 DSLP series is converted into a binary sequence (PA = 1, NA = 0), hereafter referred to as BDSLP, with anomalies defined relative to the 1882–1981 DSLP mean. The sign of the anomaly depends on the choice of the period used to compute the mean. Different choices may influence conclusions. Consequently, we used the same period as TH to allow a consistent comparison with their results. Nonstationarities in the rate of occurrence of PA events over the record are then identified using a kernel intensity estimator (Solow 1991; Rajagopalan and Lall 1995b). The kernel intensity estimate can be thought of as a filtered representation of the BDSLP series using a filter with a bandwidth optimized by cross validation.
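The conversion to a binary sequence can be sketched in a few lines of Python. This is only an illustration: the function name `to_binary_anomalies`, the toy pressure values, and the treatment of values exactly equal to the base-period mean (counted here as NA) are assumptions, not details given in the text.

```python
from statistics import mean

def to_binary_anomalies(values, base_slice):
    """Return 1 for a positive anomaly (PA) and 0 for a negative anomaly (NA).

    Anomalies are taken relative to the mean over `base_slice` (the fixed
    base period). Ties with the mean are counted as NA, which is an
    assumption made for this sketch.
    """
    base_mean = mean(values[base_slice])
    return [1 if v > base_mean else 0 for v in values]

# Hypothetical toy pressures (hPa), not real Darwin data.
toy_dslp = [1009.0, 1011.0, 1010.5, 1008.0, 1012.0, 1011.5]

# Use the first four values as the base period, analogous to computing the
# mean over 1882-1981 and applying it to the full 1882-1995 series.
bdslp = to_binary_anomalies(toy_dslp, slice(0, 4))
print(bdslp)
```

Changing `base_slice` changes the mean and can flip the sign of individual anomalies, which is the sensitivity to the base period noted above.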

Markov chain (MC) representations of the BDSLP series are then considered. A second-order, homogeneous MC is first fitted to several segments of the historical record, and differences in state transition probabilities across these segments are noted. Kernel (weighted moving average) methods (Rajagopalan et al. 1996) are then used to estimate the state transition probabilities of a first-order nonhomogeneous MC as a continuous function of time over the historical record. These two representations of nonstationarity in ENSO occurrence are compared. Finally, we make simulations from a second-order homogeneous MC model and a first-order nonhomogeneous MC model to assess the probability of exceedance of runs longer than 21 seasons (the 1990–95 event).

## 3. PA occurrence rate

For a homogeneous Poisson process, the number of events in a time interval of length *T* is a random variable *n*(*T*) with a Poisson distribution with mean *λT*:

$$p[n(T) = k] = \frac{(\lambda T)^{k}\, e^{-\lambda T}}{k!}, \qquad (1)$$

where *λ* is called the rate or intensity parameter. A nonhomogeneous Poisson process is one for which the rate *λ*(*t*) is presumed to vary over time. If *T* is taken to be one season, *λ*(*t*) is interpretable as the time-varying probability of occurrence of a PA event [hereafter, *P*_{PA}(*t*)]. Serial dependence of PA values is not considered, and hence this model is appropriate to investigate variation in the average rate of incidence of PA over time, but not for investigations of run length properties.
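As a small worked illustration of the Poisson distribution with mean λT, the sketch below evaluates the probability of exactly *k* events. The function name `poisson_pmf` and the numerical values are illustrative assumptions, not quantities from the analysis.

```python
import math

def poisson_pmf(k, rate, interval):
    """Probability of exactly k events in `interval` for a homogeneous
    Poisson process with constant `rate` (events per unit time)."""
    mean_count = rate * interval
    return mean_count ** k * math.exp(-mean_count) / math.factorial(k)

# Illustrative values only: a rate of 0.5 events per season over 4 seasons.
probabilities = [poisson_pmf(k, rate=0.5, interval=4) for k in range(60)]
total = sum(probabilities)  # should be very close to 1
print(round(total, 6))
```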

A kernel estimator is used to recover *P*_{PA}(*t*) from the record, through an optimal, weighted moving average of the rate of occurrence of PA over time. Here, we use the discrete kernel estimator of Rajagopalan and Lall (1995a), in which *H* is the characteristic function for PA, with *H*(*t*_{j}) = 1 if DSLP > 0 at time *t*_{j} and 0 otherwise; *t*_{1}, *t*_{2}, . . . , *t*_{n} are the time indices from the start to the end of the record; *K*(·) is a kernel or weight function centered at the time *t*; and *h* is an integer bandwidth or averaging interval.

The bandwidth *h* is selected by minimizing a least squares cross-validation (LSCV) function [Rajagopalan and Lall 1995a, their Eq. (16)], in which *P*^{*}_{PA}(*t*_{i}) is the estimate at time index *t*_{i} obtained by dropping a PA event at time *t*_{i} from the dataset. The LSCV optimal bandwidth *h* for the BDSLP time series is 35 seasons. The rate function is consequently estimated using a 70-season (≈17 yr) weighted moving window. The estimated rate is plotted in Fig. 3. The estimates of the first and the last 35 seasons lack a full complement of data on one side of the window and are disregarded. If the occurrence process were a homogeneous Poisson process, then the constant rate *λ* would be the inverse of the average number of seasons of PA in the entire record. In this case, it turns out to be 0.5 and is shown as a dotted line in Fig. 3. The rate of occurrence of PA at the turn of the last century is similar to that in recent times. It appears to be almost constant during the 1920–65 period and starts to increase from about 1970. This is consistent with recent observations and links to interdecadal changes in climate throughout the Pacific basin (Trenberth 1990; Trenberth and Hurrell 1994), and reflects an interdecadal to century-scale variation.

The choice of bandwidth reflects a trade-off between bias and variance in the estimate of the rate of occurrence. A reduction in the bandwidth potentially reduces the bias but increases the variance, since a smaller sample is being used; an increase in the bandwidth has the opposite effect.
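A weighted moving average of this kind can be sketched as follows. The kernel used here is a quadratic discrete kernel normalized so that its weights sum to one over integer offsets from −*h* to *h*; that specific form, the function names, and the toy sequence are assumptions made for illustration and may differ from the estimator of Rajagopalan and Lall (1995a). The bandwidth is passed in directly; the LSCV selection step is not reproduced.

```python
def quadratic_discrete_kernel(offset, h):
    """Weight for an integer offset (t - t_j); zero outside |offset| <= h.

    The constant 3h / (4h^2 - 1) makes the weights sum to one over the
    offsets -h..h (assumed kernel form, for illustration).
    """
    if abs(offset) > h:
        return 0.0
    norm = 3.0 * h / (4.0 * h * h - 1.0)
    return norm * (1.0 - (offset / h) ** 2)

def kernel_rate(binary, h):
    """Kernel-weighted moving average of a 0/1 sequence.

    Returns None for the first and last h time steps, where the window
    lacks a full complement of data on one side.
    """
    n = len(binary)
    rates = []
    for t in range(n):
        if t < h or t > n - 1 - h:
            rates.append(None)
            continue
        window = range(t - h, t + h + 1)
        rates.append(sum(quadratic_discrete_kernel(t - j, h) * binary[j]
                         for j in window))
    return rates

# Hypothetical binary sequence (1 = PA, 0 = NA), not the real BDSLP series.
toy_bdslp = [1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1]
toy_rates = kernel_rate(toy_bdslp, h=2)
print(toy_rates)
```

A larger `h` averages over more seasons and gives a smoother but potentially more biased rate, which is the trade-off described above.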

## 4. Homogeneous Markov chain

Markov chain models (e.g., Gabriel and Neumann 1962; Hopkins and Robillard 1964; Guzman and Torrez 1985) of time series are attractive because of their nonparametric nature, ease of application, interpretability, ability to approximate nonlinear state transition functions, and well-developed literature. Discrete parameter MC models are considered here. For a *k*th-order MC, the state (PA or NA) at the current time *t* is presumed to depend only on the states in the preceding *k* time steps, *t* − 1, . . . , *t* − *k*. The MC is defined through data-based estimates of state transition probabilities. For a Markov chain of order one, the transition probabilities needed are the probability of an NA following a PA, *P*_{PANA} = *a*_{1}, and the probability of a PA following an NA, *P*_{NAPA} = *a*_{2}; *P*_{PAPA} and *P*_{NANA} are (1 − *a*_{1}) and (1 − *a*_{2}), respectively. Tong (1975) and Gates and Tong (1976) proposed Akaike’s information criterion (AIC) for choosing the order *k*.

For the seasonal 1882–1981 BDSLP time series, corresponding to the DSLP series modeled by TH, this criterion suggests a second-order MC. Eight permutations of PA and NA are possible for the three-season sequences considered. The probabilities *P*(PA_{t}|PA_{t−1}, PA_{t−2}) and *P*(NA_{t}|NA_{t−1}, NA_{t−2}) are 0.77 and 0.72, respectively, suggesting the possibility of long run lengths in either state.
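The estimation of transition probabilities and the choice of order can be sketched as below. Transition probabilities are estimated as relative frequencies, and the order is chosen with the generic form AIC = −2 log *L* + 2 × (number of free parameters); this generic form, the function names, and the toy sequence are assumptions for illustration and are not necessarily the exact criterion of Tong (1975).

```python
import math
from collections import Counter

def transition_probabilities(binary, order):
    """Estimate P(state_t | previous `order` states) by relative frequency."""
    pair_counts, history_counts = Counter(), Counter()
    for t in range(order, len(binary)):
        history = tuple(binary[t - order:t])
        pair_counts[(history, binary[t])] += 1
        history_counts[history] += 1
    return {key: count / history_counts[key[0]]
            for key, count in pair_counts.items()}

def aic(binary, order, max_order):
    """Generic AIC for a two-state chain with 2**order free parameters.

    The likelihood skips the first `max_order` values so that every
    candidate order is scored on the same observations.
    """
    pair_counts, history_counts = Counter(), Counter()
    for t in range(max_order, len(binary)):
        history = tuple(binary[t - order:t])
        pair_counts[(history, binary[t])] += 1
        history_counts[history] += 1
    log_lik = sum(count * math.log(count / history_counts[history])
                  for (history, _), count in pair_counts.items())
    return -2.0 * log_lik + 2.0 * 2 ** order

def select_order(binary, max_order=3):
    """Return the order in 0..max_order with the smallest AIC."""
    return min(range(max_order + 1), key=lambda k: aic(binary, k, max_order))

# Hypothetical sequence that strictly alternates, so order one suffices.
alternating = [t % 2 for t in range(40)]
print(select_order(alternating))
print(transition_probabilities(alternating, 1))
```

For a second-order chain, the key `((1, 1), 1)` in the returned dictionary corresponds to *P*(PA_{t}|PA_{t−1}, PA_{t−2}), with histories ordered from oldest to most recent.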

The transition probabilities for a second-order Markov chain applied to three subperiods, 1882–1921, 1922–61, and 1962–88, are also summarized in Table 1. The transition probabilities *P*(PA_{t}|PA_{t−1}, PA_{t−2}) and *P*(NA_{t}|NA_{t−1}, NA_{t−2}) for the three time segments indicate that the probability for long runs of PA or NA varied considerably over the last 100 yr, and the current probability levels for persistence of PA are reaching those for the start of the century. Using the transition probabilities from the 100-yr period, 1000 Monte Carlo simulations, each 40 yr in length (the approximate length of each subperiod), were made and the 95% and 90% confidence limits for each transition probability were estimated. The transition probabilities of the early and recent subperiods were significantly different at the 90% confidence level.

One hundred simulations, each of length 100000 yr, were made with the second-order MC fitted from the 100-yr transition probabilities. The average return period of runs of PA greater than or equal to 22 seasons was 575 yr. Simulations were also made from the second-order MC fitted from each of the three time segments. The average return periods for a run of PA of length greater than or equal to 22 seasons were found to be 79, 3250, and 660 yr using the models based on 1882–1921, 1922–61, and 1962–88, respectively.
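A simulation of this kind might be sketched as follows. The return period is computed here as the simulated length in years divided by the number of PA runs of at least the given length, which is an assumed definition. Of the four probabilities below, 0.77 and 0.28 (= 1 − 0.72) follow from the values quoted above; the two for mixed histories are hypothetical placeholders, so the printed value is not expected to reproduce the return periods reported in the text.

```python
import random

def simulate_second_order(p_pa_given, n_seasons, rng):
    """Simulate a two-state, second-order Markov chain.

    `p_pa_given[(s_tm2, s_tm1)]` is the probability of PA (1) at time t
    given the states at t-2 and t-1. The chain starts from (PA, PA).
    """
    states = [1, 1]
    for _ in range(n_seasons - 2):
        p = p_pa_given[(states[-2], states[-1])]
        states.append(1 if rng.random() < p else 0)
    return states

def count_long_runs(states, min_length):
    """Count maximal runs of 1s whose length is at least `min_length`."""
    count, run = 0, 0
    for s in states:
        if s == 1:
            run += 1
        else:
            if run >= min_length:
                count += 1
            run = 0
    if run >= min_length:
        count += 1
    return count

def return_period_years(states, min_length, seasons_per_year=4):
    """Simulated years per qualifying run (infinite if none occurred)."""
    n_runs = count_long_runs(states, min_length)
    years = len(states) / seasons_per_year
    return years / n_runs if n_runs else float("inf")

p_pa_given = {
    (1, 1): 0.77,  # P(PA | PA, PA), quoted in the text
    (0, 0): 0.28,  # 1 - P(NA | NA, NA) = 1 - 0.72
    (0, 1): 0.60,  # hypothetical placeholder
    (1, 0): 0.40,  # hypothetical placeholder
}
rng = random.Random(0)
sim = simulate_second_order(p_pa_given, n_seasons=40_000, rng=rng)
print(return_period_years(sim, min_length=22))
```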

## 5. Nonhomogeneous Markov chain

Nonstationarity in the serial dependence attributes of the BDSLP series is investigated using a nonhomogeneous MC with time-varying transition probabilities estimated using kernel methods (Rajagopalan et al. 1996). Only a first-order MC model was considered to simplify the analysis.

The estimators of *P*_{PANA}(*t*) and *P*_{NAPA}(*t*) are Eqs. (5) and (6), in which *H*, *K*, and *h* have the definitions provided earlier. To complete the set of transition probabilities, note that *P*_{PAPA}(*t*) = 1 − *P*_{PANA}(*t*) and *P*_{NANA}(*t*) = 1 − *P*_{NAPA}(*t*).

The transition probability estimates at any time *t* are obtained by using the information from time points in the range [*t* − *h*_{(·)}, *t* + *h*_{(·)}], with the contribution to the estimate determined by the discrete kernel as given in Eq. (3). Rajagopalan et al. (1996) propose an LSCV measure for selecting the bandwidths for the estimators in Eqs. (5) and (6).
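One plausible form of such a time-varying estimator is a ratio of kernel-weighted counts: weighted transitions from one state to another near time *t*, divided by weighted occurrences of the starting state near time *t*. The sketch below implements that form under stated assumptions; the ratio form, the unnormalized quadratic weights (the normalization cancels in the ratio), and all names are illustrative and may differ from the estimators of Rajagopalan et al. (1996).

```python
def kernel_weight(offset, h):
    """Unnormalized quadratic weight; zero outside |offset| <= h."""
    if abs(offset) > h:
        return 0.0
    return 1.0 - (offset / h) ** 2

def transition_probability(binary, t, h, from_state, to_state):
    """Kernel estimate of P(state_i = to_state | state_{i-1} = from_state)
    localized around time index t with bandwidth h.

    Returns None if `from_state` carries no weight inside the window.
    """
    numerator = denominator = 0.0
    first = max(1, t - h)
    last = min(len(binary) - 1, t + h)
    for i in range(first, last + 1):
        if binary[i - 1] != from_state:
            continue
        w = kernel_weight(t - i, h)
        denominator += w
        if binary[i] == to_state:
            numerator += w
    return numerator / denominator if denominator > 0 else None

# Hypothetical binary sequence (1 = PA, 0 = NA), not the real BDSLP series.
toy = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1]
p_papa = transition_probability(toy, t=8, h=5, from_state=1, to_state=1)
p_pana = transition_probability(toy, t=8, h=5, from_state=1, to_state=0)
print(p_papa, p_pana)
```

By construction the two estimates from the same starting state sum to one, matching the complementarity of the transition probabilities.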

To allow for a graphical comparison with the rate parameter estimated for PA events, the bandwidths *h*_{PN} and *h*_{NP} were chosen to be 35 seasons, and the estimators [Eqs. (5) and (6)] were applied to the 1882–1995 data. The transition probabilities *P*_{PAPA}(*t*) and *P*_{NANA}(*t*) are shown in Fig. 4 (the other two probabilities are complementary). The dotted line in each plot represents the transition probability estimated by fitting a one-step homogeneous MC to the BDSLP series. As before, the estimates of the first 35 and the last 35 seasons are disregarded. The trend in *P*_{PAPA}(*t*) is of interest. First, we see that it is consistent with the *P*_{PAPA}(*t*) values estimated for the three segments using a homogeneous MC. Second, it is generally similar to the trend in *P*_{NANA}(*t*) during 1891–1970. The transition probabilities are high through 1882–1920, then decrease with minima around 1930 and 1955–60, and then increase until 1970. During 1970–86, *P*_{PAPA}(*t*) continues to increase, while *P*_{NANA}(*t*) decreases sharply to 0.4. This may represent an interesting shift in the regime associated with the underlying dynamics of the system. The general trends (Fig. 3) in the probability of PA occurrence, *P*_{PA}(*t*), are consistent with the trends in *P*_{PAPA}(*t*). This is a consequence of the fact that each PA event is several seasons long. The variations in *P*_{PAPA}(*t*) are more pronounced than those in *P*_{PA}(*t*).

The probability of a PA event in any season has recently reached levels as high as or higher than those at the turn of the last century (Fig. 3). Given the high recent values of *P*_{PA}(*t*), one may be tempted to argue for a change in the dynamics of the underlying system. However, an examination of the variations of *P*_{PAPA}(*t*) reveals that the magnitude of this statistic is now approaching levels reached during 1882–1920. This statistic is indicative of the potential for a spell of PA values. The observation that *P*_{PAPA}(*t*) was also high during the 1882–1920 period, when CO_{2} was lower, seems to suggest that natural variability of the system, rather than a monotonic response to the monotonic increase in CO_{2}, is a more plausible explanation for the observed phenomena.

The persistence of the system in the state PA or NA as indicated by *P*_{PAPA}(*t*) or *P*_{NANA}(*t*) is significantly different from that indicated by *P*_{PA}(*t*) or *P*_{NA}(*t*). This is implicit in the fact that the optimal order of the MC is greater than one and is also established from an analysis of *P*_{PAPA}(*t*) − [*P*_{PA}(*t*)]^{2} and *P*_{NANA}(*t*) − [*P*_{NA}(*t*)]^{2}, respectively. An interesting observation from the plots of *P*_{PAPA}(*t*) and *P*_{NANA}(*t*) is that the dynamics of the system are likely different for the period at the end of the last century and the recent period. While the *P*_{PA}(*t*), *P*_{PAPA}(*t*), and *P*_{NA}(*t*) values are similar for the two periods, the *P*_{NANA}(*t*) values are quite different (high at the end of the nineteenth century and low during the recent period). Thus, one would expect PA events to have spell lengths similar to those in the earlier period, but NA events to have shorter spells. The higher recent values of *P*_{PA}(*t*) are consistent with the interpretation that while the PA spell length attributes have not changed, the frequency with which PA spells are initiated has increased, since the NA spells are now shorter. Indeed, *P*_{NANA}(*t*) is the lowest it has been over the record and provides more evidence of a long-term trend than any of the other statistics. The issue of whether this is a consequence or sign of natural variability or of greenhouse gas–related warming remains unresolved.

One hundred thousand simulations, each 100 yr long, were made from the first-order nonhomogeneous MC fitted to the 1882–1981 BDSLP data. The process used here is analogous to drawing samples that are statistically similar to the 1882–1981 record, with Markov chain probabilities changing over time within each sample in a manner consistent with the timeline of the original data. The average return period for a PA run of 22 seasons is about 350 yr. A 1000-yr run from the Zebiak–Cane model (Zebiak and Cane 1987) for ENSO produced a return period of about 330 yr for warm spells of length greater than or equal to 22 seasons. Thus, this model of the tropical Pacific, with no feedbacks from elsewhere, is capable of producing long warm spells. Internal oscillations of the tropical ocean–atmosphere system are consequently a plausible cause of the unusually long spells.
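Drawing one sample from a first-order chain whose probabilities change along the timeline can be sketched as below. The probability curves used here are hypothetical linear ramps chosen only to make the example run; they are not the estimates shown in Fig. 4, and the function names are illustrative.

```python
import random

def simulate_nonhomogeneous(p_papa, p_nana, rng, first_state=1):
    """Simulate a two-state, first-order chain with time-varying
    persistence probabilities p_papa[t] and p_nana[t]."""
    states = [first_state]
    for t in range(1, len(p_papa)):
        stay = p_papa[t] if states[-1] == 1 else p_nana[t]
        if rng.random() < stay:
            states.append(states[-1])      # remain in the current state
        else:
            states.append(1 - states[-1])  # switch to the other state
    return states

def longest_run(states, value=1):
    """Length of the longest run of `value` in the sequence."""
    best = run = 0
    for s in states:
        run = run + 1 if s == value else 0
        best = max(best, run)
    return best

n_seasons = 400  # 100 yr at 4 seasons per year
# Hypothetical curves: PA persistence rising, NA persistence falling.
p_papa = [0.70 + 0.10 * t / (n_seasons - 1) for t in range(n_seasons)]
p_nana = [0.75 - 0.25 * t / (n_seasons - 1) for t in range(n_seasons)]

rng = random.Random(42)
sample = simulate_nonhomogeneous(p_papa, p_nana, rng)
print(longest_run(sample))
```

Repeating the draw many times and recording how often a PA run of at least 22 seasons appears would give a Monte Carlo estimate of the exceedance probability for a record of this length.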

## 6. Conclusions

An objective of the work presented here was to explore the sensitivity of the conclusions from the ARMA analysis of DSLP presented by TH to (a) model form and (b) likely nonstationarities. The rarity of the 1990–95 event in their analysis led TH to argue for anthropogenic effects rather than natural decadal-scale variability. One could argue that the low-frequency variability is adequately captured by their ARMA model since the variance spectrum of the data is adequately reproduced by the fitted model. However, second moment properties (the spectrum) and linear dynamics (the ARMA structure) need not constitute a necessary and sufficient description of a stochastic process.

The nonparametric methods used here permit such sensitivity analyses while retaining the general Markovian constructs inherent in ARMA modeling. Models of lags 0, 1, and 2 were used. The state variable used in the models is directly the indicator of the state of interest (PA or NA), rather than a surrogate (DSLP) from which runs of this variable are inferred, as with the ARMA model. Nonstationarity is addressed directly. One observes that there is considerable sensitivity to assumptions regarding model form, to the segment of the record used, and to explicit consideration of nonstationarity in modeling the process. In the analyses reported here, the return periods are considerably smaller than those obtained using the ARMA analysis. Indeed, the average return period of 350 yr from the nonhomogeneous MC suggests that the 1990–95 event could occur in a 114-yr record with a probability of 0.28, which is not a very rare event at all. The spectral analysis of BDSLP suggests spectral power at interannual (around a 4-yr band) and interdecadal (around 17 yr) frequencies.
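The quoted probability of 0.28 is consistent with a simple calculation, sketched below. The text does not state how the figure was obtained; the sketch assumes that qualifying events occur as a Poisson process with rate equal to the inverse of the return period, and the function name is illustrative.

```python
import math

def prob_at_least_one(return_period_yr, record_yr):
    """Probability of at least one event in a record of `record_yr` years,
    assuming Poisson occurrences at rate 1 / return_period_yr."""
    return 1.0 - math.exp(-record_yr / return_period_yr)

p_mc = prob_at_least_one(350, 114)     # nonhomogeneous MC return period
p_arma = prob_at_least_one(8850, 114)  # return period reported by TH
print(round(p_mc, 2), round(p_arma, 3))
```

Treating each year as an independent trial, 1 − (1 − 1/350)^114, gives nearly the same value.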

This paper also illustrates the point that, given a limited record of 100 yr of serially correlated data, estimates of the return period of extreme runs of data are likely to be subject to a very high degree of variability and sensitivity to model assumptions. The time series is too short to determine the true return time of the 5-yr 1990–95 event. A return period of 350 yr is just as likely (and defensible) as TH’s estimate of 8850 yr, because such differences in return period can result from subtle (and equally defensible) differences in the models that are used to fit the existing data.

In summary, we would like to argue that conclusions as to anthropogenic factors as the cause of the 1990–95 event may be somewhat premature, in the absence of any direct statistical or mechanistic results that relate it to the historical evolution of greenhouse gases. At the same time, the likelihood of such factors being responsible cannot be ruled out. It may be useful to consider effects of anthropogenic factors as superimposed on natural decadal-scale variability rather than viewing them as mutually exclusive causes of the observed variability. Thus, we cannot rule out the possibility that greenhouse warming made the recent persistence event more likely, even though the variability in the 100-yr period appears to provide an adequate explanation for the increase in the recent frequency of PA events.

## Acknowledgments

This work is supported by NOAA Grants UCSIO CU01556601D and NA36GP0074-02. Thanks are due to the anonymous reviewers for helpful comments and suggestions in improving the manuscript.

## REFERENCES

Cleveland, W. S., and S. J. Devlin, 1988: Locally weighted regression: An approach to regression analysis by local fitting. *J. Amer. Stat. Assoc.,* **83,** 596–610.

Cox, D. R., and V. Isham, 1980: *Point Processes.* Chapman and Hall, 188 pp.

Diggle, P. J., 1985: A kernel method for smoothing point-process data. *Appl. Stat.,* **34,** 138–147.

Gabriel, K. R., and J. Neumann, 1962: A Markov chain model for daily rainfall occurrence at Tel Aviv. *Quart. J. Roy. Meteor. Soc.,* **88,** 90–95.

Gates, P., and H. Tong, 1976: On Markov chain modeling to some weather data. *J. Appl. Meteor.,* **15,** 1145–1151.

Graham, N. E., and W. B. White, 1988: El Niño cycle: A natural oscillator of the Pacific Ocean–atmosphere system. *Science,* **240,** 1293–1302.

Guzman, A. G., and C. W. Torrez, 1985: Daily rainfall probabilities: Conditional upon prior occurrence and amount of rain. *J. Climate Appl. Meteor.,* **24,** 1009–1014.

Hopkins, J. W., and P. Robillard, 1964: Some statistics of daily rainfall occurrence for the Canadian prairie provinces. *J. Appl. Meteor.,* **3,** 600–602.

Rajagopalan, B., and U. Lall, 1995a: A kernel estimator for discrete distributions. *J. Nonparam. Stat.,* **4,** 409–426.

Rajagopalan, B., and U. Lall, 1995b: Seasonality of precipitation along a meridian in the western United States. *Geophys. Res. Lett.,* **22** (9), 1081–1084.

Rajagopalan, B., U. Lall, and D. G. Tarboton, 1996: A nonhomogeneous Markov model for daily precipitation simulation. *ASCE J. Hydrol. Eng.,* **1** (1), 33–40.

Solow, A. R., 1991: The nonparametric analysis of point process data: The freezing history of Lake Konstanz. *J. Climate,* **4,** 116–119.

Tong, H., 1975: Determination of the order of a Markov chain by Akaike’s information criterion. *J. Appl. Prob.,* **12,** 488–497.

Trenberth, K. E., 1984: Signal versus noise in the Southern Oscillation. *Mon. Wea. Rev.,* **112,** 326–332.

Trenberth, K. E., 1990: Recent observed interdecadal climate changes in the Northern Hemisphere. *Bull. Amer. Meteor. Soc.,* **71,** 988–993.

Trenberth, K. E., and J. W. Hurrell, 1994: Decadal atmosphere–ocean variations in the Pacific. *Climate Dyn.,* **9,** 303–319.

Trenberth, K. E., and T. J. Hoar, 1996: The 1990–1995 El Niño–Southern Oscillation event: Longest on record. *Geophys. Res. Lett.,* **23** (1), 57–60.

Tziperman, E., L. Stone, M. Cane, and H. Jarosh, 1994: El Niño chaos: Overlapping of resonances between the seasonal cycle and the Pacific Ocean–atmosphere oscillator. *Science,* **264,** 72–74.

Tziperman, E., M. Cane, and S. Zebiak, 1995: Irregularity and locking to the seasonal cycle in an ENSO prediction model as explained by the quasi-periodicity route to chaos. *J. Atmos. Sci.,* **52,** 293–306.

Waymire, E., and V. K. Gupta, 1981: The mathematical structure of rainfall representations, 2, A review of the theory of point processes. *Water Resour. Res.,* **17** (5), 1273–1286.

Zebiak, S. E., and M. A. Cane, 1987: A model El Niño–Southern Oscillation. *Mon. Wea. Rev.,* **115,** 2262–2278.

Fig. 2. Scatterplot of DSLP(*t* − 2) and DSLP(*t* − 1) vs DSLP(*t*) along with the fitted LOESS surface, for the period 1882–1991.

Citation: Journal of Climate 10, 9; 10.1175/1520-0442(1997)010<2351:AEOAAV>2.0.CO;2

Fig. 3. Intensity or rate function *P*_{PA}(*t*) estimated by the discrete kernel estimator (DKE). The dotted line is the constant rate estimated from the BDSLP data.

Fig. 4. One-step time-varying transition probabilities (a) *P*_{PAPA}(*t*) and (b) *P*_{NANA}(*t*) for the nonhomogeneous MC estimated using kernel estimators. The dotted line in each frame is the corresponding one-step homogeneous transition probability estimated from the BDSLP data.

Table 1. Transition probability matrix. The values within the parentheses (from left to right) are for the three time segments: 1882–1921, 1922–61, and 1962–88.

*Lamont-Doherty Earth Observatory Contribution Number 5667.