## 1. Introduction

Theoretical methods from which synthetic time series can be constructed provide a means to overcome the limitations of our inevitably short climate records. The algorithm devised in this study is a synthetic sea surface temperature (SST) anomaly generator, which successfully reproduces the equatorial Pacific SST anomaly patterns associated with the El Niño–Southern Oscillation (ENSO) and other identifiable SST anomaly signatures.

ENSO is defined quantitatively by its SST signature, using indices such as the Japan Meteorological Agency (JMA) index (discussed in section 4), but is more broadly defined as a coupled ocean–atmosphere phenomenon in which anomalous increases (decreases) of SST in the eastern equatorial Pacific are preceded by anomalous westerly winds (strengthened easterly trade winds) in the central equatorial Pacific and by the associated excitation of oceanic internal wave dynamics.

Extreme variability in the equatorial Pacific has important implications for the Tropics as well as for higher latitudes. Quantities such as air temperature, SST, and specific humidity in the western equatorial Pacific (WEP) are large relative to those at other latitudes because of intense, year-round solar radiation. The persistent trade winds across the equatorial Pacific basin, in conjunction with the high SSTs (≥25°C), result in very high surface latent heat fluxes and evaporation in the WEP. Anomalous SST signatures therefore imply a displacement of large heat and moisture transfers, corresponding convective perturbations, and associated atmospheric wave formation (Salby 1996). Such perturbations in the Tropics have been shown to be associated with anomalous patterns in pressure, temperature, and moisture in extratropical latitudes (Yarnal 1985).

We develop a method that successfully reproduces the SST signatures associated with warm and cold events in order to study ENSO extremes. Algorithms for generating synthetic data are developed using frequency-domain analyses, which extract information about the contribution of oscillations at different frequencies to the variability of the time series. Thompson and O’Brien (1973) developed a technique to produce realizations of wind stress data, given a known sample spectrum of wind speed. An analytic approximation of the kinetic energy spectrum was derived from the Fourier coefficients of wind speed. Random, antisymmetric phases [*θ*(*f*) = −*θ*(−*f*)] are applied to the analytic spectrum, and the wind stress realization is obtained via the inverse Fourier transform. More recently, Theiler et al. (1992) used a frequency-domain method that involves no analysis of the spectrum: synthetic data are generated from the original amplitude spectrum simply by randomizing the phases under the same constraint. Theiler’s method, called the method of surrogate data, was applied by Elsner and Tsonis (1993) to investigate the existence of nonlinearity in monthly sea level differences between Tahiti and Darwin. The datasets generated using these methods remain statistically indistinguishable from the observed realizations and thereby produce realistic time series from which valid statistical inferences can be made.

Thompson and O’Brien utilize a functional fit to the amplitude spectrum, which assumes that such an analytical approximation accurately captures the distribution of variance with frequency needed to produce consistent realizations. The method developed in this study takes the work of Thompson and O’Brien a step further. We assert that the analytic approximation to the amplitude function should be a deterministic variance distribution within the frequency bands attributable to physical processes, such as ENSO. The remaining portions of the amplitude function are modeled with the theoretical spectra for red and white noise processes, which are scaled to applicable amplitudes determined directly from the original spectrum. Thus, we allow the rest of the spectrum to represent random interactions between actual physical processes, correlated noise, or random noise. Determining an appropriate functional fit therefore requires more extensive spectral analysis than existing methods do; the procedure is described in section 3.

By validating ENSO indices of a sample of synthetic data against the data reconstructed by Smith et al. (1996), we find that the frequency of ENSO occurrences is retained, as is the approximate number of ENSO warm and cold events in a 40-yr period. Furthermore, the synthetic data have approximately the same average ENSO event duration as the Smith et al. (1996) data.

In the following, section 2 describes the Smith et al. (1996) SST dataset and the empirical orthogonal function (EOF) technique that makes a frequency-domain analysis of these data possible. The methods used to determine an appropriate theoretical model for the amplitude spectra are described in section 3. In section 4, ENSO indices are used to validate a sample set of synthetic data against the Smith et al. (1996) data, and ENSO event extremes are quantified in terms of expected return periods for warm and cold events of a given SST magnitude. All findings are summarized in section 5.

## 2. Data

The data used for this project are the EOF reconstructed SSTs (Smith et al. 1996). The complete dataset is global, has a 2° lat × 2° long spatial resolution, and extends from 1950 to 1992. We utilize only the equatorial Pacific region (30°S–30°N and 120°E–90°W) during 1953–92.

The reconstruction technique of Smith et al. (1996) uses monthly SST fields from January 1982 to December 1993 (Reynolds and Smith 1994). These data have been analyzed on a 1° grid using the method of optimum interpolation (OI). The OI technique incorporates both in situ and satellite data. The satellite data improve the overall spatial coverage of the SST data, while the in situ data allow for the correction of satellite SST retrieval bias due to atmospheric aerosol content (Reynolds and Smith 1994). The reconstruction methodology is described in Smith et al. (1996), and is summarized in the following sentences. The in situ data that are found to be without systematic bias and represent sufficient spatial coverage, as determined by Smith et al. (1996), extend from 1950 to the present. However, inadequacies such as inconsistent spatial and temporal in situ data coverage have resulted in gridded analyses that contain bull’s-eyes in data-rich regions and gaps in other regions. The improved spatial coverage provided by the OI data is utilized to fill spatial gaps and produce a smooth SST field for observations prior to 1982. The spatial components of an EOF analysis on the OI data are used as basis functions. These functions are fit to existing data, such as the Comprehensive Ocean Atmosphere Data Set (Slutz et al. 1985) from 1950 to 1982. The result is a reconstructed SST dataset for the period 1950 to 1992 on the 2° grid.

Monthly SST climatology data on a 2° grid are utilized for the computation of SST anomalies in this study. These data were spatially averaged from the monthly mean OI SSTs from 1982 to 1993 on a 1° grid (Smith et al. 1996).

### a. EOF analysis method

The EOF analysis method is used by Smith et al. (1996) for data reconstruction, and in this study it allows the application of classical time series analysis. The technique is especially useful for data covering large spatial domains. Rewriting the data as orthogonal spatial and temporal components extracts the information needed to describe the variability.

Consider data $y$ that vary both spatially and temporally. The EOF representation is

$$y(s, t) = \sum_{j=1}^{M} S_j(s)\,T_j(t), \tag{1}$$

where $t$ is time, $s$ represents horizontal coordinate points, $S_j$ are spatial fluctuations in the data (spatial patterns), $T_j$ are temporal fluctuations (principal components), and $M$ is the number of orthogonal patterns (equal to the dimension of $s$ or $t$, whichever is smaller). The method requires the construction of the covariance matrix of the original data, $\mathbf{C}$, and the solution of the eigenvalue problem

$$\mathbf{C}\mathbf{x} = \mathbf{x}\boldsymbol{\lambda}, \tag{2}$$

where $\boldsymbol{\lambda}$ is a diagonal matrix of $M$ eigenvalues and $\mathbf{x}$ is the matrix whose columns are the eigenvectors $x_j$. The $M$ elements of $\mathbf{x}$ are orthonormal,

$$\sum_{k} E_i(k)\,E_j(k) = \delta_{ij}, \tag{3}$$

where $\delta$ is the Kronecker delta (defined as unity where the indices are equal and zero otherwise), $k$ is a space or time index, and $E$ is either a spatial or temporal component. Using this relation, it can be shown that the sum of the squared elements of each nonnormalized component is the variance of the corresponding EOF. The individual components of the EOF analysis are then ordered by descending percent variance contribution. The physical mechanisms that account for the largest amount of variability in a region are associated with the largest eigenvalues.
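As a minimal illustration of the EOF eigenvalue problem described above, the plain-Python sketch below extracts only the leading EOF by power iteration on the covariance matrix. It assumes the anomalies are arranged with months as rows and grid points as columns, and it normalizes the spatial pattern to unit length, so the variance of the resulting PC equals the eigenvalue. The function name `leading_eof` and the toy matrix are hypothetical; a full 480 × 2093 analysis would use a library eigensolver or singular value decomposition instead.

```python
import math


def leading_eof(data, iters=500):
    """Leading EOF of a (time x space) matrix via power iteration on its covariance.

    Returns (pattern, pc, eigenvalue, variance_fraction). The spatial pattern has unit
    length, so the variance of the PC equals the eigenvalue.
    """
    nt, ns = len(data), len(data[0])
    # Remove the time mean at each grid point to work with anomalies
    means = [sum(row[s] for row in data) / nt for s in range(ns)]
    anom = [[row[s] - means[s] for s in range(ns)] for row in data]
    # Spatial covariance matrix C (ns x ns)
    cov = [[sum(anom[t][i] * anom[t][j] for t in range(nt)) / nt for j in range(ns)]
           for i in range(ns)]
    # Power iteration converges to the eigenvector with the largest eigenvalue,
    # assuming the start vector is not orthogonal to it
    v = [1.0] * ns
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(ns)) for i in range(ns)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit-length vector v
    lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(ns)) for i in range(ns))
    total = sum(cov[i][i] for i in range(ns))  # total variance = trace of C
    # Principal component: projection of the anomalies onto the spatial pattern
    pc = [sum(anom[t][s] * v[s] for s in range(ns)) for t in range(nt)]
    return v, pc, lam, lam / total
```

In the toy matrix used to check it, the first two columns carry uncorrelated signals with variances 2.5 and 1.0, so the leading EOF is concentrated on the first grid point and explains 2.5/3.5 ≈ 71% of the variance.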

The EOF analysis for this study is applied to the Smith et al. (1996) SST anomalies in the selected equatorial Pacific domain. The spatial and temporal dimensions of the data matrix are 2093 and 480, respectively. The spatial domain contains 2250 grid points; however, 157 of them correspond to land and are thus omitted from the data matrix. We find that the first principal component (PC1) constitutes approximately 46% of the total variance in the SSTs for this region. The second principal component (PC2) constitutes approximately 10% of the total variance.

### b. Spectra of PCs 1 and 2

The raw amplitude spectra of each PC are of primary interest for identifying the pseudoperiodicities and associated physical processes that contribute to the variability. For example, the largest spectral peaks of PC1 are mainly attributable to ENSO (Fig. 1). There are pronounced peaks at frequencies near 0.01670 and 0.02355 cpm (periods of roughly 60 and 42 months), which correspond well with the timing of SST anomalies in the eastern equatorial Pacific (EEP) according to the delayed action oscillator theory of ENSO (Philander 1990; Allan et al. 1996). The theory holds that a 12–15-month delay exists between Rossby wave–induced SST anomaly maxima and minima in the EEP. The SST maximum occurs during an ENSO warm event. A westerly wind anomaly over the central/eastern equatorial Pacific deepens the thermocline by exciting downwelling Kelvin waves that propagate eastward along the equator, and it simultaneously excites off-equatorial upwelling Rossby waves that propagate westward. The upwelling Rossby waves reflect off the western boundary of the equatorial Pacific and propagate eastward as upwelling Kelvin waves, which reverse the sign of the SST anomaly in the EEP (Allan et al. 1996). This negative feedback from wave dynamics first breaks down the warm event and later leads to enhanced EEP upwelling, decreasing SSTs, and a reversal of the westerly wind anomaly; the resulting SST minimum is indicative of an ENSO cold event. Based on the speed of wave propagation and the timing of excitation, the process typically results in SST maxima and minima occurring 12–15 months apart (Allan et al. 1996).
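The peak frequencies above correspond to periods of 1/0.01670 ≈ 60 months (about 5 yr) and 1/0.02355 ≈ 42 months (about 3.5 yr). The sketch below is a hypothetical helper, not the paper's code: it computes a one-sided amplitude spectrum |X_k|/N of a monthly series with a direct DFT and reports the dominant peak as a frequency in cpm and a period in months. For a 480-month record the frequency spacing is 1/480 cpm.

```python
import cmath
import math


def amplitude_spectrum(x):
    """One-sided DFT amplitudes |X_k|/N of a monthly series, with frequencies in cpm.

    A direct O(N^2) DFT keeps the sketch dependency-free; an FFT would be used in practice.
    """
    n = len(x)
    mean = sum(x) / n
    freqs, amps = [], []
    for k in range(n // 2 + 1):
        s = sum((x[t] - mean) * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k / n)  # cycles per month for a 1-month sampling interval
        amps.append(abs(s) / n)
    return freqs, amps


def dominant_peak(x):
    """Frequency (cpm) and period (months) of the largest nonzero-frequency amplitude."""
    freqs, amps = amplitude_spectrum(x)
    k = max(range(1, len(amps)), key=lambda i: amps[i])
    return freqs[k], 1.0 / freqs[k]
```

A 60-month sinusoid sampled for 480 months, for example, peaks at bin k = 8, that is, at 8/480 ≈ 0.0167 cpm.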

The dominant physical mechanisms represented by spectral peaks in PC2 are ENSO and the decadal and quasibiennial oscillations (Fig. 2). The physics of the quasibiennial oscillation (QBO) are not well known; however, the QBO is detected in the SST fluctuations of the equatorial Pacific region, as well as in the troposphere and stratosphere (Salby 1996). The peak in the raw amplitude spectrum of PC2 at a period of approximately 10 yr (*f* ∼ 0.008 cpm) supports a fluctuation in equatorial Pacific SST patterns on decadal timescales. There is evidence that the amount of solar radiation entering the earth’s atmosphere (and the associated surface heating) varies on an approximately decadal timescale, possibly because of sunspot activity (Barnett 1989). Thus, the decadal SST variability could be a result of the sunspot cycle. Enfield and Cid S. (1991) argue that variations in ENSO characteristics on decadal timescales are inversely related to sunspot activity: when sunspot activity is low, ENSO occurs more frequently. It is also possible that the variability is related to internal ocean dynamics. Although the physical mechanisms are not known, the decadal variability of SSTs in the Tropics remains pronounced (IOC 1992).

The three aforementioned quasiperiodicities are considered to prevail over all other SST anomaly fluctuations, such as the intraseasonal, seasonal, or other oscillations, in the first two principal components. The amplitude peaks associated with the interannual (ENSO), biennial, and decadal oscillations are large relative to other spectral peaks and thus account for the most variance. Hence, the term *select physical processes* in this paper refers to the interannual, quasibiennial, and decadal oscillations collectively.

## 3. Methodology

The goal in constructing synthetic time series from existing, observed SST data is to determine the important characteristics contained in the observed set and develop a theoretical model function from which those data could have been produced. We approach the model development by first determining how many principal components need to be modeled, based on the meaningful information contained in the probability structure of their amplitude spectra. We then determine mathematical functions that describe both the amplitude and phase spectra in the frequency domain for those EOFs that contain signal information, that is, information on SST fluctuations due to the dominant physical processes in this region. These are called *significant EOFs* or, correspondingly, *significant principal components* (PCs), and are determined, in part, by using methods of EOF truncation.

One common method of EOF truncation assumes that the components that remain distinguishable from noise will show a steep slope relative to adjacent eigenvalues when plotted on a Scree graph (Fig. 3). The eigenvalues that produce a nearly horizontal line on this graph are most likely components of uncorrelated noise and do not represent relevant signals in the data (Wilks 1995). The Scree graph for equatorial Pacific SSTs implies a truncation point at the second principal component.

Another method of truncation determines which modes describe the select physical processes discussed in section 2 by analyzing the Fourier amplitude functions of each PC. The SST fluctuations associated with these physical processes appear as peak values in the amplitude function at the corresponding frequencies. Recall from section 2 that the ENSO signal appears in the amplitude function of PC1, while the ENSO, biennial, and decadal variabilities are found in PC2. No other large amplitude peaks in the amplitude functions of PCs 3 and higher are identified as recognizable, dominant physical processes. Therefore, the first and second components are categorized as *significant PCs* because their spectra contain peaks attributable to physical processes. This result is consistent with the Scree truncation.

A Kolmogorov–Smirnov (K-S) goodness-of-fit test is applied to the spectra of PCs 3–11 to determine whether the remaining PCs can be considered random noise. The range 3–11 is selected because the slope between these eigenvalues on the Scree diagram does not appear horizontal, so these PCs may not simply represent uncorrelated noise. The K-S test compares the magnitude of the maximum difference between the integrated spectrum of the data and that of purely random noise (Priestley 1981). A confidence limit is selected a priori. The null hypothesis states that the sample cumulative distribution function (cdf) of the data is not distinguishable from the theoretical cdf of a random process. Principal components 3–11 *are* distinguishable from white noise at the 95% confidence level, so they are not simply uncorrelated noise. Their spectra contain much larger amplitude peaks at lower frequencies than at higher frequencies. We deduce that some temporal correlation exists in PCs 3–11; however, the low-frequency amplitude peaks cannot be attributed to any obvious and well-described properties. We have therefore selected the term *red noise* to describe the portion of the temporal fluctuation that is assumed to be neither deterministic nor random. Red noise is a common term for correlated noise whose spectrum has larger variance at lower frequencies. For example, the spectra of autoregressive processes with coefficients greater than zero are known as red noise processes (Wilks 1995).
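One common form of this test is Bartlett's cumulative periodogram test, sketched below in plain Python: the normalized integrated periodogram is compared with the straight line expected for a flat (white) spectrum, and the largest gap D is compared with a K-S critical value. The exact normalization and critical values used in the paper are not given here, so the asymptotic 95% constant 1.358 and the q − 1 effective sample size are assumptions, as is the function name.

```python
import cmath
import math
import random


def cumulative_periodogram_test(x, c_alpha=1.358):
    """K-S-type test of a series against white noise.

    Compares the normalized integrated periodogram with the straight line expected for
    a flat spectrum. c_alpha = 1.358 is the asymptotic 95% K-S constant (an assumption).
    Returns (D, critical_value, reject_white_noise).
    """
    n = len(x)
    mean = sum(x) / n
    q = (n - 1) // 2  # Fourier frequencies strictly between 0 and Nyquist
    power = []
    for k in range(1, q + 1):
        s = sum((x[t] - mean) * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(s) ** 2)
    total = sum(power)
    cum, d = 0.0, 0.0
    for j, p in enumerate(power, start=1):
        cum += p
        d = max(d, abs(cum / total - j / q))  # largest gap from the white-noise line
    critical = c_alpha / math.sqrt(q - 1)
    return d, critical, d > critical


if __name__ == "__main__":
    rng = random.Random(1)
    red = [0.0]
    for _ in range(239):
        red.append(0.9 * red[-1] + rng.gauss(0.0, 1.0))  # AR(1) red noise
    print(cumulative_periodogram_test(red))
```

Red noise concentrates its integrated spectrum at low frequencies, so D is large and white noise is rejected, which mirrors the result reported for PCs 3–11.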

We expect the spatial and temporal variations in the data associated with EOFs of low percent variance to be purely random. Therefore, PCs 12–480 are taken to represent white noise. These PCs constitute only 15% of the total variance in the system. Individually, their observed variances are less than or equal to 2.0°C², with an average observed variance of only 0.04°C². Additionally, the lines on the Scree diagram become most nearly horizontal at PCs 12 and higher. For these reasons, we assume that random numbers, rescaled by the equipartitioned original variance, can represent the random fluctuations in both space and time for PCs 12–480.

A distinction has been made, for the purpose of EOF truncation, between the portions of the amplitude functions that represent select physical processes, correlated noise, and stochastic noise. Thus, we will develop mathematical functions to denote the three separate amplitude models: a *deterministic amplitude model,* a *red noise model,* and a *white noise model* to correspond to these three distinctions, respectively.

### a. White noise amplitude model

The white noise frequency range extends from a cutoff frequency $f_c$ to the Nyquist frequency, 1/(2 months). The value $f_c$ is the frequency separating the low-frequency red noise or deterministic (physical) signal from the higher frequencies in the amplitude function. This frequency, dividing correlated from uncorrelated information, is chosen subjectively from the spectrum of each PC. We select $f_c$ to be large enough that we do not model relevant physical information as uncorrelated noise. Let $M$ represent the sample mean of the amplitude values from the original spectrum in this white noise frequency range. Then, the white noise model is defined over all frequencies such that

$$A_w(f_i) = M,$$

where $i$ is the frequency index. From this model, the time domain white noise variance can be computed. The remaining variance is attributed to either red noise processes or deterministic processes.
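A minimal sketch of the white noise model, assuming one-sided amplitudes defined as |X_k|/N (the normalization is not stated in the text) and a hypothetical toy spectrum. With that convention, Parseval's theorem gives the time domain variance of the flat model in closed form as (N − 1)M² for an N-point series, because every nonzero frequency bin carries amplitude M. The model is set to zero at f = 0, consistent with the zero-mean convention described in section 4; the function names are hypothetical.

```python
def white_noise_model(amps, freqs, f_c):
    """Flat white noise amplitude model A_w(f_i) = M.

    M is the mean of the original one-sided amplitudes (|X_k|/N) with f >= f_c; the model
    is zero at f = 0 so that it carries no temporal mean.
    """
    tail = [a for a, f in zip(amps, freqs) if f >= f_c]
    m = sum(tail) / len(tail)
    return m, [0.0] + [m] * (len(amps) - 1)


def white_noise_variance(m, n):
    """Time domain variance of an N-point series whose N - 1 nonzero DFT bins all have
    amplitude m (Parseval's theorem with amplitudes |X_k|/N)."""
    return (n - 1) * m * m
```

For a 10-month toy series with f_c = 0.25 cpm, the three amplitudes above the cutoff average to M = 1, and the flat model implies a time domain variance of 9.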

### b. Red noise amplitude model

An autoregressive (AR) model of order $K$ describes the red noise $X$ over time,

$$X(t) = \sum_{k=1}^{K} \phi(k)\,X(t-k) + \varepsilon(t),$$

where $K$ denotes a chosen maximum lag, $\phi(k)$ are the autoregressive coefficients for lags $k = 1, 2, \ldots, K$, and $\varepsilon(t)$ is white noise.

An estimate of the autocorrelation function is needed to determine $\phi$. We are interested in the approximate correlation of *only* the red noise over time. The sample autocorrelation functions of the PCs cannot be used, as they contain information from the deterministic processes as well as red noise. Thus, we select an indicative red noise autocorrelation function that drops off exponentially such that

$$\rho(k) = e^{-k/\gamma},$$

where $k = 0, 1, 2, 3, \ldots, K$ represents the lag and $\gamma$ represents a reference lag. Such an estimate of the autocorrelation function represents a temporal correlation for the first few lags and approaches zero correlation at higher lags. The autoregression coefficients are determined from the autocorrelation function by solving the Yule–Walker equations for all lags (Wilks 1995). The matrix equation

$$\mathbf{R}\,\boldsymbol{\Phi} = \mathbf{r},$$

where $\mathbf{R}$ is the $K \times K$ matrix with elements $\rho(|i-j|)$, $\boldsymbol{\Phi} = [\phi(1), \ldots, \phi(K)]^{\mathsf{T}}$, and $\mathbf{r} = [\rho(1), \ldots, \rho(K)]^{\mathsf{T}}$, is solved, and the coefficients $\phi$ are determined for all lags ($k = 1, 2, 3, \ldots, K$). They are used to fit an autoregressive model, initially in the time domain. The Fourier transform of the autoregressive model represents a preliminary red noise amplitude model in the frequency domain.
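The red noise steps can be sketched in plain Python, assuming the exponential form ρ(k) = exp(−k/γ) for the indicative autocorrelation and hypothetical values K = 5 and γ = 12 months. The sketch solves the Yule–Walker system by Gaussian elimination and evaluates the AR amplitude response σ_e/|1 − Σ φ(k) exp(−2πifk)|, with σ_e an assumed innovation scale, to obtain a preliminary red noise amplitude model. All function names are hypothetical.

```python
import cmath
import math


def exponential_acf(max_lag, gamma):
    """Assumed red noise autocorrelation rho(k) = exp(-k / gamma) for k = 0..max_lag."""
    return [math.exp(-k / gamma) for k in range(max_lag + 1)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def yule_walker(rho):
    """AR coefficients phi(1..K) from autocorrelations rho(0..K) by solving R Phi = r."""
    k = len(rho) - 1
    big_r = [[rho[abs(i - j)] for j in range(k)] for i in range(k)]  # Toeplitz matrix
    return solve(big_r, rho[1:])


def ar_amplitude(phi, freqs, sigma_e=1.0):
    """Amplitude response sigma_e / |1 - sum_k phi(k) exp(-2 pi i f k)| of the AR model."""
    out = []
    for f in freqs:
        denom = 1 - sum(p * cmath.exp(-2j * math.pi * f * (k + 1)) for k, p in enumerate(phi))
        out.append(sigma_e / abs(denom))
    return out
```

One property of this particular choice: an exactly exponential autocorrelation is the autocorrelation of an AR(1) process, so the solver returns φ(1) = exp(−1/γ) with coefficients near zero at higher lags, and the resulting amplitude falls off monotonically with frequency. A different indicative autocorrelation would generally yield nonzero higher-order coefficients.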

The red noise model must be scaled to match the time domain variance of the red noise processes being modeled. Thus, we need to quantify the time domain variance of the red noise process for each PC. To do this, we developed a method called the *overlap technique.* The modeled red noise can be scaled to the peak value of the original amplitude function in the low frequency range 0.0–0.007 cpm.

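The time domain variance of this model represents an upper-bound guess for red noise, $\sigma^2_{R\max}$. The red noise variance $\sigma^2_R$ lies between a lower bound $\sigma^2_{R\min}$ and this upper bound:

$$\sigma^2_{R\min} \le \sigma^2_R \le \sigma^2_{R\max}.$$

The estimate of $\sigma^2_R$ then determines the deterministic variance,

$$\sigma^2_{\mathrm{det}} = \sigma^2_{\mathrm{total}} - \sigma^2_R - \sigma^2_w,$$

where $\sigma^2_w$ is the white noise variance and $\sigma^2_{\mathrm{total}}$ is the total time domain sample variance of the PC.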

### c. Deterministic amplitude model

The amplitude peaks that occur at or near the frequencies of the select physical processes will be modeled with rescaled probability density functions (PDFs) in the frequency domain (Fig. 5). PDFs are selected because the area under the PDF curve provides meaningful probability information for each frequency (Bendat and Piersol 1986). Each of the select physical processes is considered aperiodic and is therefore best represented as a deterministic distribution about its peak occurrence frequency. A good example is the well-known range of 2–7 yr for the return interval of an ENSO event. An amplitude peak for ENSO would reflect the *distribution* of occurrence probability over each frequency in this range, rather than a single frequency. We argue that such an amplitude distribution over the frequency range of a physical process is deterministic.

The three PDFs used here were determined to closely fit the shape of the distribution about the frequency of the amplitude peak: the Gaussian, Rayleigh, and Maxwell PDFs. The Rayleigh PDF is a special case of the more general Weibull PDF, and the Rayleigh and Maxwell PDFs are both special cases of the chi distribution (with two and three degrees of freedom, respectively).

$F(x)$ represents the distribution and $\Delta x$ is the sampling interval. Equation (13) reiterates that the area under the PDF is a probability. Moreover, the shape of the distribution of probability over a given frequency range marks the important difference between these PDFs.
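The deterministic peak shapes can be sketched as follows. The parameterizations are assumptions, since the paper's own expressions are not reproduced in this text: the Gaussian is centered on the peak frequency with a chosen width, and the Rayleigh and Maxwell scales are set so that each PDF's mode falls on the peak frequency. The helper `rescale_to_peak` is hypothetical and simply scales a sampled PDF to an observed peak amplitude. Sampling on the 1/480 cpm grid of a 480-month record shows that Σ F(x)Δx stays close to one.

```python
import math


def gaussian_pdf(f, f0, width):
    """Gaussian PDF centered on the peak frequency f0 with standard deviation width."""
    return math.exp(-((f - f0) ** 2) / (2 * width ** 2)) / (width * math.sqrt(2 * math.pi))


def rayleigh_pdf(f, f_peak):
    """Rayleigh PDF whose scale equals its mode, so the peak sits at f_peak."""
    if f < 0:
        return 0.0
    return f / f_peak ** 2 * math.exp(-f ** 2 / (2 * f_peak ** 2))


def maxwell_pdf(f, f_peak):
    """Maxwell PDF with scale a = f_peak / sqrt(2), which places its mode at f_peak."""
    if f < 0:
        return 0.0
    a = f_peak / math.sqrt(2)
    return math.sqrt(2 / math.pi) * f ** 2 / a ** 3 * math.exp(-f ** 2 / (2 * a ** 2))


def rescale_to_peak(values, target_peak):
    """Scale a sampled PDF so its maximum equals an observed spectral peak amplitude."""
    top = max(values)
    return [target_peak * v / top for v in values]
```

Compared with the Gaussian, the Maxwell shape rises slowly from zero frequency and has a longer high-frequency tail, which is the kind of asymmetric drop-off the text describes.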

### d. Specific model development

To construct the model for each individual PC, we will utilize the aforementioned amplitude models as needed to represent the probability distribution over the frequencies of select physical processes. Further, we compute the total time domain sample variance directly from the original time series of each PC. The corresponding modeled time series must retain this variance.

For the significant principal components, the modeled amplitude function is composed of a root-mean-square sum of all three amplitude models. Much of the significant variability occurs in the frequency range 0.0–0.1 cpm, which corresponds to timescales longer than 10 months. We assume that all three portions of the amplitude model overlap in this range and contribute to the larger variance at low frequencies. Physically, this assumption could imply an interaction between deterministic processes, linearly correlated noise, and purely random noise.
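Reading the "root-mean-square sum" as the square root of the summed squared amplitudes (an interpretation; the text does not spell it out), the time domain variances of the three models add by Parseval's theorem. That additivity is what lets each model be scaled to its own share of the total variance. The sketch assumes one-sided amplitudes |X_k|/N; the helper names and toy spectra are hypothetical.

```python
import math


def parseval_variance(amps, n):
    """Variance of a real N-point series from one-sided amplitudes |X_k|/N, k = 0..N//2."""
    var = 0.0
    for k, a in enumerate(amps):
        if k == 0:
            continue  # the f = 0 term is the mean, not variance
        if n % 2 == 0 and k == n // 2:
            var += a * a  # the Nyquist bin has no negative-frequency twin
        else:
            var += 2 * a * a  # +f and -f bins contribute equally
    return var


def scale_to_variance(amps, n, target):
    """Rescale an amplitude model so its time domain variance equals target."""
    gain = math.sqrt(target / parseval_variance(amps, n))
    return [gain * a for a in amps]


def rms_sum(*models):
    """Combine models as sqrt(A_det^2 + A_R^2 + A_w^2), frequency by frequency."""
    return [math.sqrt(sum(m[i] ** 2 for m in models)) for i in range(len(models[0]))]
```

Scaling a deterministic, a red, and a white toy model to variances of 3.0, 1.5, and 0.5 and combining them this way gives a model with total variance 5.0, mirroring the variance budget used to partition each PC.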

$f_o$ is the Gaussian peak frequency. Moreover, the ENSO signal also has a peak at a frequency of 0.0353 cpm (2.4 yr). However, the rate at which the amplitude peak drops off with increasing frequency is most closely reproduced with a rescaled Maxwell PDF:

$f$ are the frequencies and $c$ is a constant given by

$f_{oM}$

$f_c$ to $f_{\mathrm{Nyq}}$ methodology. Since no deterministic amplitudes contribute to these components, the low frequency peaks are modeled using the red noise model methodology. However, the red noise variance is determined simply as

$$\sigma^2_R = \sigma^2_{\mathrm{total}} - \sigma^2_w.$$

$a_i$ and $b_i$ such that

$(-\pi, \pi)$, and are antisymmetric about the zero frequency, so $\theta(f) = -\theta(-f)$. No obvious correlation between adjacent phases was found in these spectra. Thus, the final model phases are justifiably random numbers in this interval. The model phase function is $e^{j\theta(f)}$. In the frequency domain, we therefore multiply our model amplitude by our modeled phases for each PC. The inverse Fourier transform of each product yields a synthetic time series. These are the modeled temporal components.
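A minimal sketch of the synthesis step, assuming the modeled amplitudes are one-sided values |X_k|/N for k = 0, ..., N/2. Each bin receives a uniform random phase on (−π, π), and the matching negative-frequency bin is set to its complex conjugate, which enforces θ(f) = −θ(−f) and makes the inverse transform real. The Nyquist bin of an even-length series must be real, so only its sign is randomized here; the text does not address that detail. The function name `synthesize` is hypothetical.

```python
import cmath
import math
import random


def synthesize(amps, n, rng):
    """Real N-point series from one-sided model amplitudes amps[k] = |X_k|/N, k = 0..N//2.

    Phases are uniform on (-pi, pi) and antisymmetric, theta(-f) = -theta(f), so the
    spectrum is Hermitian and the inverse DFT is real. amps[0] should be 0 (zero mean).
    """
    spectrum = [0j] * n
    spectrum[0] = n * amps[0]
    for k in range(1, n // 2 + 1):
        theta = rng.uniform(-math.pi, math.pi)
        if n % 2 == 0 and k == n // 2:
            # The Nyquist bin must be real; only its sign can be randomized
            spectrum[k] = n * amps[k] * (1.0 if theta >= 0 else -1.0)
        else:
            spectrum[k] = n * amps[k] * cmath.exp(1j * theta)
            spectrum[n - k] = spectrum[k].conjugate()  # enforces theta(-f) = -theta(f)
    # Inverse DFT as a direct O(N^2) sum; imaginary parts cancel up to rounding
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


if __name__ == "__main__":
    print(synthesize([0.0] + [0.3] * 8, 16, random.Random(42))[:4])
```

Because only the phases are random, every realization has exactly the variance implied by the amplitude model, while the time series themselves differ from seed to seed.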

Finally, a model is needed to represent the random spatial and temporal SST fluctuations of PCs 12–480. A single model PC can represent the necessary random fluctuations in both space and time and can be rescaled to match the *observed variance* of all 469 remaining components. The observed variance is obtained from the sum of the squared spatial patterns of all remaining EOFs. Thus, we construct a single model spatial pattern that consists of Gaussian random noise with equipartitioned observed standard deviation over each point in our spatial domain. By simply generating Gaussian random numbers and rescaling by the observed standard deviation for all grid points in the domain for all 480 months, a final modeled EOF is produced.

We are now able to produce a complete dataset of synthetic SST anomalies for the equatorial Pacific by projecting the modeled temporal components on the associated spatial patterns. By taking the product of the spatial and temporal parts of the first 11 modeled PCs, we project the modeled temporal variation onto the original spatial patterns. The true standard deviation is now modeled in the first 11 respective EOFs. By adding the final modeled EOF to EOFs 1–11, we have constructed a complete synthetic dataset for the same spatial domain and temporal extent as the Smith et al. (1996) reconstructed SST anomalies.
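The reprojection and the final random EOF can be sketched as below. The sketch assumes that "equipartitioned" means the summed variance of the unmodeled EOFs is divided equally among the grid points. The function names, the argument `residual_variance`, and the array layout (patterns indexed [mode][grid point], PCs indexed [mode][month]) are hypothetical.

```python
import math
import random


def reproject(spatial, temporal):
    """field[t][s] = sum_j S_j(s) T_j(t) for modeled PCs and their original patterns."""
    n_modes, ns, nt = len(spatial), len(spatial[0]), len(temporal[0])
    return [[sum(spatial[j][s] * temporal[j][t] for j in range(n_modes)) for s in range(ns)]
            for t in range(nt)]


def noise_eof(residual_variance, ns, nt, rng):
    """Gaussian noise field whose variance is split equally over the ns grid points.

    residual_variance is the summed variance of the unmodeled EOFs (12-480 in the text).
    """
    sd = math.sqrt(residual_variance / ns)
    return [[rng.gauss(0.0, sd) for _ in range(ns)] for _ in range(nt)]


def synthetic_field(spatial, temporal, residual_variance, rng):
    """Reprojected modeled EOFs plus the single random 'final modeled EOF'."""
    field = reproject(spatial, temporal)
    noise = noise_eof(residual_variance, len(spatial[0]), len(temporal[0]), rng)
    return [[f + e for f, e in zip(f_row, e_row)] for f_row, e_row in zip(field, noise)]


if __name__ == "__main__":
    out = synthetic_field([[1, 0], [0, 2]], [[1, 2, 3], [4, 5, 6]], 0.1, random.Random(0))
    print(out)
```

In the actual application, `spatial` would hold the 11 original patterns over the 2093 ocean grid points and `temporal` the 11 modeled PCs over 480 months.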

## 4. Results

The modeled significant PCs are compared to the corresponding PCs from the Smith et al. (1996) data to validate the individual component models. Then, a sample of complete synthetic datasets is compared to the complete Smith et al. (1996) dataset, using the ENSO SST signature for the JMA domain as the standard of comparison. The frequencies of ENSO events are reproduced, and the average numbers of warm and cold events in a 40-yr period are retained in the synthetic data. Statistical inference is made regarding the expected return period of an extreme ENSO event: an extreme warm event reaching an SST anomaly magnitude of 2°C occurs more frequently than a cold event of equivalent magnitude.

### a. Comparison of model PCs with original PCs

The model amplitude functions for PCs 1 and 2 are analytical approximations to the original amplitude functions that represent a good fit to the relevant spectral peaks and to the overall distribution of red and white noise amplitudes over frequency. The model for PC1 (Fig. 6) represents a good fit to the amplitude distribution corresponding to the ENSO pseudoperiodicities, which constitute the largest percent variance for all ENSO events in the Smith et al. (1996) data. Moreover, the largest amount of variance in the spectrum is modeled with deterministic ENSO peaks, while much less variance is attributed to the red and white noise processes. The model for the spectrum of PC2 (Fig. 7) captures the amplitude distributions of the decadal and biennial pseudoperiodicities, as well as the remaining ENSO pseudoperiodicity. The ENSO peak in PC2 constitutes much less variance than the ENSO peaks of PC1. The red noise peaks, in both PC1 and PC2, give accurate variance contribution at frequencies smaller than 0.007 cpm, which could conceivably represent the leakage of longer-scale signals into the spectra of the 40-yr set.

The statistics of the original data are retained in each synthetic time series generated with this method. The modeled PCs satisfy the normalization criterion given by Eq. (3) in section 2, and the final reprojected EOFs retain the original variances. The red, white, and deterministic amplitude models are defined to be zero at *f* = 0, the frequency-domain equivalent of a zero temporal mean. Thus, any small nonzero mean resulting from the computation of anomalies from climatology data is removed for the model development process and added back at the end.

The autocorrelation function is an important statistic of the realization, as it represents the dependence of the value of the data at one instant on the value separated from it by an interval (or lag), *τ.* Periodicities occurring in the time series appear as sinusoidal fluctuations with increasing lag in the autocorrelation function. Both modeled components closely reproduce the temporal correlation of the Smith et al. (1996) data out to approximately 100 lags (Figs. 8 and 9). The modeled and original autocorrelation functions (ACFs) for PC1 contain the ENSO pseudoperiodicities at lags of 40–60 months (3–6 yr) (Fig. 8). Moreover, the Smith et al. (1996) ACF for PC1 falls within two standard deviations of the mean modeled ACF for PC1. Likewise, the autocorrelation function for PC2 shows pseudoperiodicity, with peak correlations associated with the decadal, ENSO, and biennial oscillations at lags of 20–30 months (1.6–2.5 yr), 50 months (4 yr), and 65 months (5.4 yr) (Fig. 9). The Smith et al. (1996) ACF for PC2 also falls within two standard deviations of the mean modeled ACF for PC2.
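A plain-Python sketch of this comparison: a biased sample ACF, r(k) = c(k)/c(0), and a lag-by-lag envelope of the ensemble mean ± 2 standard deviations against which an observed ACF can be checked. Using the biased estimator and the population standard deviation across runs are assumptions, since the text specifies neither; the function names are hypothetical.

```python
import math
import statistics


def sample_acf(x, max_lag):
    """Biased sample autocorrelation r(k) = c(k) / c(0), k = 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    return [sum(d[t] * d[t + k] for t in range(n - k)) / n / c0 for k in range(max_lag + 1)]


def envelope(acfs):
    """Lag-by-lag ensemble mean and mean +/- 2 standard deviations across ACFs."""
    lower, mean, upper = [], [], []
    for values in zip(*acfs):
        m, s = statistics.mean(values), statistics.pstdev(values)
        mean.append(m)
        lower.append(m - 2 * s)
        upper.append(m + 2 * s)
    return lower, mean, upper


def within_envelope(acf, lower, upper):
    """True if an observed ACF stays inside the ensemble envelope at every lag."""
    return all(lo <= r <= hi for r, lo, hi in zip(acf, lower, upper))


if __name__ == "__main__":
    wave = [math.sin(2 * math.pi * t / 60) for t in range(480)]
    print(sample_acf(wave, 60)[60])  # a 60-month cycle reappears at lag 60
```

For a 60-month sinusoid, the ACF is strongly negative at a 30-month lag and strongly positive at a 60-month lag, which is the sinusoidal signature of a periodicity described above.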

Thus, by determining an analytical approximation to the amplitude spectra of PCs 1 and 2, we have constructed a theoretical amplitude function from which the observed data can be derived, and from which statistically consistent synthetic data are derived. We have developed, via spectral analysis, a useful variance decomposition for the original amplitude spectrum into two or three distinct amplitude functions for each PC being modeled. As a result, we have gained a much better understanding of the possible physical mechanisms that compose the amplitude structure and what amount of the variability is likely attributable to noise—either correlated or uncorrelated.

### b. Complete dataset comparison

A consistent point of comparison is necessary to determine whether the model produces synthetic data that are a valid reproduction of the original data. Ten independent synthetic SST time series are produced, and the JMA index criteria are applied to each.

The JMA index is computed as a 5-month running mean of spatially averaged SST anomalies for the region 4°S–4°N and 150°–90°W. From this index, an ENSO warm event is identified by having an SST anomaly value greater than or equal to 0.5°C for a minimum of six consecutive months (Japan Meteorological Agency 1991). Since the synthetic data do not necessarily correspond to particular months or years, we do not include the JMA criteria that October–December be three of the six months in the event. Following Sittel (1994), a symmetric definition will be utilized for the identification of ENSO cold events.
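The event criteria above can be sketched as follows, assuming the input is the monthly SST anomaly already averaged over 4°S–4°N, 150°–90°W, and that the 5-month running mean is centered (so two months are lost at each end). As in the text, the October–December condition is omitted, and cold events use the symmetric threshold of −0.5°C. The function names are hypothetical.

```python
def running_mean(x, window=5):
    """Centered running mean; the first and last window // 2 values are dropped."""
    half = window // 2
    return [sum(x[i - half:i + half + 1]) / window for i in range(half, len(x) - half)]


def find_events(index, threshold=0.5, min_months=6):
    """Warm and cold events as (start, end) index pairs, end inclusive.

    Warm: index >= threshold for at least min_months consecutive months.
    Cold: the symmetric definition, index <= -threshold.
    """
    def runs(sign):
        events, start = [], None
        for i, v in enumerate(list(index) + [0.0]):  # sentinel closes a final run
            if sign * v >= threshold:
                if start is None:
                    start = i
            elif start is not None:
                if i - start >= min_months:
                    events.append((start, i - 1))
                start = None
        return events

    return runs(+1), runs(-1)


def mean_duration(events):
    """Average event length in months."""
    return sum(end - start + 1 for start, end in events) / len(events) if events else 0.0
```

Counting the warm and cold events and their mean durations for each synthetic run gives the statistics compared with the Smith et al. (1996) values in the next paragraph.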

The Smith et al. (1996) JMA time series contains 11 warm events and 8 cold events. The average number of ENSO events from the JMA time series for the 10 runs of synthetic data are 10 warm and 7 cold events. A total of 95% of the mean number of events from the synthetic data will fall within the range of 9.4–11.0 warm events in a 40-yr period and between 6.3 and 7.5 cold events in a 40-yr period. Thus, the Smith et al. (1996) values for the mean number of events fall within the range of two standard deviations from the mean number of synthetic events. Moreover, the average durations of ENSO events for the Smith et al. (1996) data are 10 months for cold events and 13 months for warm events. In the synthetic data, the average duration of an ENSO event is 12 months for a cold event and 15 months for a warm event. Thus, the JMA time series for the synthetic data closely reproduces the ENSO frequency and duration of the original data.

The ENSO events are characterized by an east or west displacement of isotherms that disrupts the usual regions of intense tropical convection. In a warm event, the 28°C isotherm can be displaced to the central and eastern equatorial Pacific, and in an extreme case such as the 1982–83 El Niño, the isotherm can be displaced as far as 90°W (Fig. 10a). Tropical convection has been associated with SSTs equal to or in excess of 28°C (Enfield 1989). During a cold event, the displacement of the 28°C isotherm is toward the western equatorial Pacific. A comparison of the Smith et al. (1996) SST data to a single synthetic SST run illustrates that the truncations selected for the deterministic, red, and white noise accurately reproduce the balance of signal and noise (Fig. 10).

### c. Characteristics of future, unobserved ENSO events

When the magnitude, duration, and/or frequency of extreme events are of interest, an extreme value analysis is applicable. Here, the maximum SST anomaly magnitude of each event constitutes the sample of extreme values, and a distribution representative of these extreme data is selected.

The extreme value analysis is applied to the JMA indices of all 10 sets of synthetic data. The sample of extreme values consists of the maximum SST anomaly magnitude of each event, pooled over all 10 sets, with separate samples for warm and cold events. Plotting the extreme values in ascending order of magnitude on a log–log plot gives the return period of an event of a given magnitude (Gumbel 1958). For a warm event, a maximum SST anomaly magnitude of 2°C can be expected approximately once every eight warm events (Fig. 11a). For a cold event, a magnitude of 1.8°C can be expected approximately once every eight cold events (Fig. 11b).
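A minimal sketch of the empirical return-period calculation follows. It assumes the Weibull plotting position m/(n+1), a common choice in Gumbel-type analyses, where m is the rank of a maximum counted from the largest. Return periods are in units of events, not years. Both function names are hypothetical.

```python
import numpy as np


def empirical_return_periods(maxima):
    """Return (maxima sorted ascending, return period in events)."""
    x = np.sort(np.asarray(maxima, dtype=float))
    n = x.size
    # Rank counted from the largest value: the largest maximum has m = 1.
    m = np.arange(n, 0, -1)
    return x, (n + 1) / m


def magnitude_for_return_period(maxima, target_events):
    """Interpolate the magnitude expected once every `target_events` events."""
    x, T = empirical_return_periods(maxima)
    # Interpolate in log(T), matching a logarithmic return-period axis.
    return np.interp(np.log(target_events), np.log(T), x)
```

Applied to the pooled warm-event maxima, `magnitude_for_return_period(maxima, 8)` estimates the magnitude expected once every eight warm events, the quantity read from Fig. 11a.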

According to Enfield and Cid S. (1991), the most probable return period of ENSO warm events is 3–4 yr, whereas that of strong and very strong category events (Quinn and Neal 1987) is 9–12 yr. We find that warm events with maximum SST anomaly magnitudes of 1.4°C recur every 9–12 yr, assuming an ENSO return period of 3–4 yr (Fig. 11a). Lau (1985) used statistical methods to estimate the average return period of super ENSOs as 30–40 yr. A warm event with a maximum SST anomaly magnitude of 2.0°C is expected once every 7–10 events, corresponding to 30–40 yr for an ENSO return period of approximately 4 yr. For cold events, a maximum SST anomaly magnitude of 1.7°C has a return period of approximately 30–40 yr, again assuming an ENSO return period of 4 yr.
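The conversion from event-based to year-based return periods used above is a single multiplication. Let $T_{\mathrm{ev}}$ be the return period in events read from Fig. 11 and $\tau$ the mean interval between successive ENSO events of the same type. The value $T_{\mathrm{ev}} \approx 3$ for 1.4°C is inferred here from the quoted 9–12 yr rather than read from the figure.

$$
\begin{aligned}
T_{\mathrm{yr}} &\approx T_{\mathrm{ev}}\,\tau,\\
1.4^{\circ}\mathrm{C}:&\quad T_{\mathrm{ev}} \approx 3,\ \tau = 3\text{--}4\ \mathrm{yr} \;\Rightarrow\; T_{\mathrm{yr}} \approx 9\text{--}12\ \mathrm{yr},\\
2.0^{\circ}\mathrm{C}:&\quad T_{\mathrm{ev}} = 7\text{--}10,\ \tau \approx 4\ \mathrm{yr} \;\Rightarrow\; T_{\mathrm{yr}} \approx 28\text{--}40\ \mathrm{yr}.
\end{aligned}
$$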

## 5. Conclusions

The principal components of the Smith et al. (1996) reconstructed SST dataset are used to determine the dominant physical processes in SST anomaly fluctuations via spectral analysis. An EOF analysis of the Smith et al. (1996) data determines and ranks the spatial and temporal modes. The dominant physical processes are identified by their spectral peaks and percent variance contributions. PC1 describes 46% of the total variance and contains the ENSO pseudoperiodicities on timescales between 2.4 and 6.5 yr. PC2 contains the ENSO (3.6 yr), biennial, and decadal pseudoperiodicities as large-amplitude peaks. PCs 3–11 contain correlated red noise and band-limited white noise. Therefore, based on the spectral analysis and the determination of the significant EOFs, the amplitude functions can be divided into three distinct parts. The deterministic, red noise, and white noise amplitude models are separate analytical functions, developed via the methodology described in section 3, that approximate these parts of the principal component spectra. Random phases are applied to the spectral amplitude models, which permits the generation of numerous statistically indistinguishable SST anomaly datasets.
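The random-phase step can be sketched in the spirit of the surrogate-data method of Theiler et al. (1992). The sketch takes an amplitude spectrum on the `numpy.fft.rfft` frequency grid and attaches phases drawn uniformly on [0, 2π). In the study, the modeled deterministic, red noise, and white noise amplitude functions would supply `amplitude`. The uniform phase distribution, the function name, and the use of NumPy's real FFT are assumptions of this illustration.

```python
import numpy as np


def random_phase_series(amplitude, n, rng, mean=0.0):
    """Build a real series of length n whose rfft magnitudes equal `amplitude`."""
    amplitude = np.asarray(amplitude, dtype=float)
    if amplitude.size != n // 2 + 1:
        raise ValueError("amplitude must have n // 2 + 1 rfft bins")
    phases = rng.uniform(0.0, 2.0 * np.pi, size=amplitude.size)
    spectrum = amplitude * np.exp(1j * phases)
    # The zero-frequency bin carries the mean and must be real.
    spectrum[0] = n * mean
    # For even n the Nyquist bin must also be real; keep only a random sign.
    if n % 2 == 0:
        spectrum[-1] = amplitude[-1] * rng.choice([-1.0, 1.0])
    # irfft treats negative frequencies as conjugates, theta(-f) = -theta(f),
    # so the returned series is real-valued.
    return np.fft.irfft(spectrum, n=n)
```

Because only the phases change, each realization keeps the prescribed amplitude spectrum and therefore the same variance. Repeated calls with one generator, for example `rng = np.random.default_rng()`, produce as many realizations as needed.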

The time series generation method is useful for producing a large sample of SST anomaly data for statistical testing. Because the synthetic data retain the statistics of the observed data (specifically the ACF), valid statistical inferences can be made about processes such as ENSO. In section 4, we found that an extreme ENSO warm event with a maximum SST anomaly magnitude of 2°C has a return period of approximately eight warm events.
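One way to check that the ACF is retained is to compare the mean sample ACF of several synthetic runs with that of the observed PC. The appendix ACF figures show two standard errors above and below the mean. The sketch below assumes the biased sample ACF, which divides by n, and uses hypothetical function names.

```python
import numpy as np


def sample_acf(x, max_lag):
    """Biased sample autocorrelation (divides by n) for lags 0..max_lag."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    n = d.size
    c0 = np.dot(d, d) / n
    return np.array([np.dot(d[: n - k], d[k:]) / (n * c0) for k in range(max_lag + 1)])


def acf_envelope(runs, max_lag):
    """Mean ACF across runs with bounds two standard errors above and below."""
    acfs = np.array([sample_acf(r, max_lag) for r in runs])
    mean = acfs.mean(axis=0)
    se = acfs.std(axis=0, ddof=1) / np.sqrt(acfs.shape[0])
    return mean, mean - 2.0 * se, mean + 2.0 * se
```

Overlaying `sample_acf(observed_pc, max_lag)` on the envelope from ten synthetic runs of the same PC gives a comparison of the kind shown in the ACF figures for PC1 and PC2.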

These synthetic data could also serve as forcing input for coupled ocean–atmosphere or atmospheric numerical models. Combining the results of this study with current modeling studies of coupled air–sea interaction could help define the envelope of atmospheric response to the SST anomaly forcing associated with ENSO and other pseudoperiodicities.

## Acknowledgments

We would like to thank the scientists and staff at the Center for Ocean–Atmospheric Prediction Studies (COAPS), particularly Mark Bourassa and Mark Verschell, for their input and expertise. We would also like to acknowledge the support of this research by the Department of Defense Grant N00014-93-1-1132 (JMC) and the base support for COAPS by the Physical Oceanography section of the Office of Naval Research.

## REFERENCES

Allan, R., J. Lindesay, and D. Parker, 1996: *El Niño Southern Oscillation and Climatic Variability.* CSIRO, 405 pp.

Barnett, T., 1989: A solar-ocean relation: Fact or fiction? *Geophys. Res. Lett.,* **16,** 803–806.

Bendat, J. S., and A. G. Piersol, 1986: *Random Data Analysis and Measurement Procedures.* Wiley-Interscience, 566 pp.

Elsner, J. B., and A. A. Tsonis, 1993: Nonlinear dynamics established in the ENSO. *Geophys. Res. Lett.,* **20,** 213–216.

Enfield, D. B., 1989: El Niño past and present. *Rev. Geophys.,* **27,** 159–187.

——, and L. Cid S., 1991: Low-frequency changes in El Niño–Southern Oscillation. *J. Climate,* **4,** 1137–1146.

Gumbel, E. J., 1958: *Statistics of Extremes.* Columbia University Press, 375 pp.

IOC, 1992: *Oceanic Interdecadal Climate Variability.* IOC Tech. Series 40, UNESCO, 40 pp. [Available from NOAA Central Library, IOC Depository for USA, 1315 East–West Highway, 2d floor, SS MC3, Silver Spring, MD 20910.]

Japan Meteorological Agency, 1991: Climate charts of sea surface temperatures of the western north Pacific and the global ocean. Japan Meteorological Agency, Tokyo, Japan, 51 pp. [Available from Japan Meteorological Agency, 3-5 Otemachi-1, Chiyoda-Ku, Tokyo 100, Japan.]

Lau, K.-M., 1985: Elements of stochastic–dynamical theory of the long-term variability of the El Niño–Southern Oscillation. *J. Atmos. Sci.,* **42,** 1552–1558.

Philander, S. G., 1990: *El Niño, La Niña, and the Southern Oscillation.* Academic Press, 299 pp.

Priestly, M., 1981: *Spectral Analysis and Time Series.* Academic Press, 653 pp.

Quinn, W. H., and V. T. Neal, 1987: El Niño occurrences over the past four and a half centuries. *J. Geophys. Res.,* **92** (C13), 14 449–14 461.

Reynolds, R. W., and T. Smith, 1994: Improved global sea surface temperature analyses using optimum interpolation. *J. Climate,* **7,** 929–948.

Salby, M. L., 1996: *Fundamentals of Atmospheric Physics.* Academic Press, 627 pp.

Sittel, M. C., 1994: Marginal probabilities of the extremes of ENSO events for temperature and precipitation in the southeastern United States. Tech. Rep. 94-1, Center for Ocean Atmospheric Prediction Studies, The Florida State University, 55 pp. [Available from COAPS, 2135 East Paul Dirac Drive, R. M. Johnson Bldg., Tallahassee, FL 32310.]

Slutz, R., S. J. Lubker, J. Hiscox, S. Woodruff, R. Jenne, D. Joseph, P. Steurer, and J. D. Elms, 1985: *COADS, Comprehensive Ocean-Atmosphere Data Set. Release 1.* NOAA/Environmental Research Laboratories, Climate Research Program, Boulder, CO, 268 pp. [Available from NOAA-CIRES, Climate Diagnostics Center, 325 Broadway, Boulder, CO 80303.]

Smith, T. M., R. W. Reynolds, R. E. Livezey, and D. C. Stokes, 1996: Reconstruction of historical sea surface temperatures using empirical orthogonal functions. *J. Climate,* **9,** 1403–1420.

Theiler, J. S., A. Longtin, B. Galdrikian, and J. Farmer, 1992: Testing for nonlinearity in time series: The method of surrogate data. *Physica D,* **58,** 77–94.

Thompson, J. D., and J. J. O’Brien, 1973: Time-dependent coastal upwelling. *J. Phys. Oceanogr.,* **3,** 33–46.

Wilks, D. S., 1995: *Statistical Methods in the Atmospheric Sciences.* Academic Press, 467 pp.

Yarnal, B., 1985: Extratropical teleconnections with El Niño–Southern Oscillation (ENSO) events. *Prog. Phys. Geogr.,* **9,** 315–352.

## APPENDIX

### Red Noise Model Parameters for PCs 1 through 11

Given Eq. (6) in section 3b, a separate value of the cutoff lag Γ is needed for each PC to determine the sample ACF corresponding to the red noise portion of the amplitude spectrum. The value selected depends on the magnitude of the amplitude peak(s) at frequencies below 0.007 cpm, as well as on the rate of drop-off of the low-frequency amplitudes. In addition, a value *X*_{o} is needed as the initial value (first guess) of the autoregressive time series model. Because the time domain variance changes with *X*_{o}, it acts essentially as a *scale parameter* for the red noise model. The values of Γ (shape parameter) and *X*_{o} (scale parameter) selected for the red noise amplitude models of PCs 1–11 are listed in Table A1.
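Equation (6) is not reproduced in this section, so the sketch below does not implement it. As a hedged illustration of how a cutoff lag Γ shapes a red noise spectrum, it uses a standard Blackman–Tukey lag-window estimate instead. The sample ACF is tapered with a Tukey window, truncated at lag Γ, and cosine-transformed. The Tukey window, the two-sided density normalization for monthly sampling (frequencies in cpm), and the function name are all assumptions. The square root of the result would give an amplitude spectrum.

```python
import numpy as np


def lag_window_spectrum(acf, variance, cutoff_lag, freqs):
    """Two-sided spectral density from an ACF truncated at `cutoff_lag`.

    acf[k] is the autocorrelation at lag k months (acf[0] == 1); freqs are in
    cycles per month on [0, 0.5]. Uses a Tukey (cosine) lag window.
    """
    acf = np.asarray(acf, dtype=float)
    if acf.size <= cutoff_lag:
        raise ValueError("acf must extend at least to the cutoff lag")
    k = np.arange(1, cutoff_lag + 1)
    w = 0.5 * (1.0 + np.cos(np.pi * k / cutoff_lag))  # tapers to 0 at lag Γ
    freqs = np.asarray(freqs, dtype=float)
    # Sum over lags of w_k * rho_k * cos(2 pi f k) for every frequency.
    terms = (w * acf[k])[None, :] * np.cos(2.0 * np.pi * np.outer(freqs, k))
    return variance * (1.0 + 2.0 * terms.sum(axis=1))
```

A larger Γ retains more lags and lets sharper low-frequency structure through, whereas a smaller Γ smooths the spectrum. This is one way to picture the trade-off involved in choosing Γ from the low-frequency amplitude peaks.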

Plot of the amplitude spectrum of principal component 2. Note the amplitude peaks representing decadal (*f* ∼ 0.008 cpm, ∼10 yr), intradecadal (*f* ∼ 0.023 cpm, ∼3.6 yr), and biennial (*f* ∼ 0.04 cpm, ∼2 yr) pseudoperiodicities.

Citation: Monthly Weather Review 126, 11; 10.1175/1520-0493(1998)126<2809:TGOSSS>2.0.CO;2

Plot of the eigenvalues [representing variance contribution (°C^{2})] as a function of eigenvalue rank. The slope of the line between the eigenvalues is related to the amount of signal information in those EOFs. The eigenvalue is known as the *observed variance* of the component.

Amplitude spectrum of the first principal component with the modeled amplitude spectrum overlaid. The solid line is the amplitude spectrum of the original PC1. The dotted line is the modeled amplitude spectrum.

Amplitude spectrum of the second principal component with the modeled amplitude spectrum overlaid. The solid line is the amplitude spectrum of the original PC2. The dotted line is the modeled amplitude spectrum.

The shaded region is an example overlap region for the maximum possible red noise variance and the corresponding deterministic amplitude model. The time domain variance of the overlap region is Δ*σ*^{2}_{R}.

Example Rayleigh and Maxwell PDFs. The Maxwell distribution has a sharper peak and drops off more quickly than the Rayleigh.

Mean modeled autocorrelation function for 10 model runs of the first principal component (solid line). The dashed lines denote the bounds two standard errors above and below the mean. The dash–dot line is the autocorrelation function for PC1 of the Reynolds data.

Mean modeled autocorrelation function for 10 model runs of the second principal component (solid line). The dashed lines denote the bounds two standard errors above and below the mean. The dash–dot line is the autocorrelation function for PC2 of the Reynolds data.

(a) Hovmöller diagram of Reynolds SSTs (in degrees Celsius) along the equator. (b) Hovmöller diagram of synthetic SSTs (in degrees Celsius) along the equator.

(a) Plot of the return period of a given SST anomaly magnitude per warm event. (b) Plot of the return period of a given SST anomaly magnitude per cold event.

Table A1. Final values of the selected cutoff lag Γ and of the initial value of the autoregression, *X*_{o}, for PCs 1–11.