## 1. Introduction

Difficulties in the interpretation of climatic time series center on two issues. The first is recognizing multidecadal variability from relatively short time series of a century or perhaps a little longer. The second is isolating a low-frequency signal from a time series with large interannual or several-year variability. For example, only 37% of the winter interannual variance of the Aleutian low sea level pressure time series is on timescales greater than 5 yr (Overland et al. 1999). There is current interest in decadal and multidecadal variability of Northern Hemisphere time series. One facet of this interest is that possible climate change may occur as an increase in amplitude or a persistent phase of ongoing large-scale atmospheric variability (Palmer 1999). An additional interest is the impact of low-frequency variability on ecosystems. Many species such as salmon are adapted to interannual variability, but are strongly modulated by interdecadal changes (Mantua et al. 1997). Because of the age structure of marine organisms, Hare and Mantua (2000) in fact conclude that monitoring North Pacific ecosystems might allow for an earlier identification of sign changes than is possible from monitoring climate data alone. It is often difficult to establish the statistical superiority of one model over other models because the available time series are relatively short. Even though we are on less certain ground, additional information may be useful in establishing a preference for one model among competing models (von Storch and Zwiers 1999).

Because of these limitations, it is likely that the true nature of North Pacific time series of climate processes will remain unknown for a long time to come. In such a situation it is important to understand the potential consequences of model choices for interpreting the underlying process. For example, Minobe (1999) suggests bidecadal and pentadecadal oscillations for the Aleutian low sea level pressure time series. A second concept is regimes, in which objective change-point techniques suggest significantly different sections in the time series, for example, before and after 1977 (Overland et al. 1999; Hare and Mantua 2000). A chaotic model suggests rapid changes at transition points but an eventual return to the vicinity of previous locations in state space (Overland et al. 2000). A purely stochastic white and red noise view is suggested by Pierce (2001) for the North Pacific and by Wunsch (1999) for the North Atlantic. An assumption of a white noise model is that energy is distributed to all frequencies in equal amounts. As Wunsch and Pierce point out, there is by definition considerable low-frequency energy in a white noise process. What is of interest in interpreting climate time series is whether there is an enhanced contribution to the variance at low frequencies. This enhancement might suggest a physical feedback process, such as air–ocean interaction; this would permit some measure, however small, of enhanced predictability.

This paper investigates the influence of model choice in representing North Pacific atmospheric processes. In the North Pacific region, there are distinctly different weather and teleconnection patterns between winter and summer. In this paper we concentrate on the winter regime, which is defined here as the average Aleutian low sea level pressure field over the 5-month period from November to March (Trenberth and Paolino 1980). This time series is referred to as the North Pacific (NP) index. The NP index is a surface index associated with the Pacific–North American (PNA) pattern of hemispheric variability in the troposphere. The NP index also serves as a measure of atmospheric forcing of North Pacific Ocean variability.

We investigate whether the NP index has characteristics of a long-memory process, which is a broadband process that exhibits a persistent dependence between distant observations (Beran 1994). A default stochastic model suggested by several authors (von Storch and Zwiers 1999) is the autoregressive moving average (ARMA) process. We contrast a simple ARMA model, namely, a first-order autoregressive [AR(1)] model with a simple long-memory model, namely, a fractionally differenced (FD) process. Both are fit to the NP index. By inspecting various diagnostic statistics, we conclude that the AR(1) and FD models are equally viable for the North Pacific and that, given the amount of available data, we cannot expect to be able to distinguish between the two models. While the two models have similar behavior for short-term prediction, the FD model suggests different zero crossing behavior. The AR(1) and FD models also lead to different physical models for the NP index. While an AR(1) model corresponds to a discretized first-order differential equation with a single damping constant, an FD model corresponds to an aggregation of such equations involving many different damping constants (Beran 1994). This suggests that the NP index is influenced by several physical processes, with damping constants spanning a range of temporal scales. Long-memory behavior was encountered in hydrologic studies in the 1950s for Nile River flow and is known in that literature as the Hurst effect. Long memory is also suggested for soil fertility and detrended global temperature anomalies. These processes no doubt include multiple factors.
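The aggregation interpretation can be illustrated with a small numerical sketch: summing independent AR(1) processes whose damping constants span several timescales produces an overall autocorrelation that decays far more slowly than any single exponential. The damping constants below are illustrative choices, not values from this study.

```python
# Aggregate ACVS of independent AR(1) processes with unit innovation
# variance: s_phi(tau) = phi^tau / (1 - phi^2).  The damping constants
# are illustrative values spanning several timescales.
phis = [0.1, 0.5, 0.9, 0.99]

def ar1_acvs(phi, tau):
    """Autocovariance at lag tau of an AR(1) with unit innovation variance."""
    return phi ** tau / (1.0 - phi ** 2)

def aggregate_acs(tau):
    """Autocorrelation at lag tau of the sum of the independent AR(1)s."""
    num = sum(ar1_acvs(p, tau) for p in phis)
    den = sum(ar1_acvs(p, 0) for p in phis)
    return num / den

# A single moderately damped component decorrelates almost completely
# by lag 30, while the aggregate remains substantially correlated.
ratio_single = ar1_acvs(0.5, 30) / ar1_acvs(0.5, 0)
ratio_aggregate = aggregate_acs(30)
```

The slowly damped components dominate the aggregate at long lags, which is the mechanism by which a mixture of first-order equations mimics long memory.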

We present our analysis of the NP index in section 2, after which we compare this analysis with a similar one for a somewhat longer North Pacific time series, namely, a temperature record from Sitka, Alaska (section 3). We discuss the implications of the two stochastic models applied to the North Pacific in section 4 in terms of run lengths, which could serve as one definition for regimelike behavior. We state our conclusions in section 5.

## 2. Statistical models for the NP index

In this section we consider two Gaussian stationary models for the NP index. This time series consists of *N* = 100 annual values for the years 1900–99 (Fig. 1); an examination of the empirical quantiles of the series versus the corresponding theoretical quantiles for a Gaussian distribution indicates that the Gaussian assumption is reasonable (Chambers et al. 1983). The two models we consider are an AR(1) process and an FD process, both of which are completely determined by three parameters and hence can be regarded as being equally simple. For each process, one of the parameters is its expected value, and another effectively controls the shape of the spectral density function (SDF) and the corresponding autocovariance sequence (ACVS). The final parameter merely adjusts the levels (heights) of the SDF and ACVS. The essential difference between these two models is that an AR(1) process postulates a rapidly decaying ACVS whereas the ACVS for an FD process decays much more slowly. This qualitative difference is what is meant in the literature when an AR(1) process is said to have “short memory” whereas a comparable FD model is said to have “long memory.”
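As a sketch of the Gaussian check mentioned above, the empirical quantiles of a series can be compared with the corresponding theoretical Gaussian quantiles (the basis of the quantile–quantile plots of Chambers et al. 1983). The series below is synthetic stand-in data, not the NP index.

```python
# Quantile-quantile comparison against a Gaussian distribution using
# only the standard library.  The synthetic series is a stand-in for
# the NP index (mean chosen near its reported sample mean of 1009.8).
import random
from statistics import NormalDist, mean, stdev

random.seed(42)
series = [random.gauss(1009.8, 2.0) for _ in range(100)]  # stand-in data

def gaussian_qq(x):
    """Return (theoretical, empirical) quantile pairs for a Q-Q plot."""
    n = len(x)
    xs = sorted(x)
    nd = NormalDist(mean(x), stdev(x))
    # plotting positions (i + 0.5)/n, a common convention
    theo = [nd.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theo, xs))

pairs = gaussian_qq(series)

# For Gaussian data the points should hug the 1:1 line; a crude
# summary is the correlation between the two coordinates.
tbar = mean(p[0] for p in pairs)
ebar = mean(p[1] for p in pairs)
num = sum((t - tbar) * (e - ebar) for t, e in pairs)
den = (sum((t - tbar) ** 2 for t, _ in pairs)
       * sum((e - ebar) ** 2 for _, e in pairs)) ** 0.5
qq_corr = num / den
```

A correlation near one indicates that the Gaussian assumption is reasonable; marked curvature in the quantile pairs would indicate otherwise.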

In what follows, we first define each model and outline a procedure for estimating the model parameters from the NP index, from which we learn that the autocorrelation in this series is fairly weak overall. We then consider some goodness-of-fit tests (model diagnostics) for both short- and long-memory models, from which we conclude that both models are quite reasonable for the NP index and that, from a statistical point of view, each model fits equally well. Next we explore how well we can expect to discriminate between short- and long-memory models, given a time series with the length and characteristics of the NP index. We conclude that, since the overall autocorrelation in the NP index is weak, we would need much more than *N* = 100 yr of data to be able to reject a short-memory model if in fact the NP index were a realization of an FD process (and vice versa).

### a. A short-memory model for the NP index

We assume that the NP index can be regarded as a realization of a portion *X*_{0}, *X*_{1}, … , *X*_{N−1} of a stationary Gaussian AR(1) process; that is, we assume

$$X_t - \mu_X = \phi(X_{t-1} - \mu_X) + \epsilon_t, \quad (1)$$

where *μ*_{X} ≡ *E*{*X*_{t}} and |*ϕ*| < 1, while {ϵ_{t}} is a Gaussian white noise process with mean zero and variance *σ*^{2}_{ϵ} (the innovation ϵ_{t} is the difference between *X*_{t} and the best linear predictor of *X*_{t} based upon prior values *X*_{t−1}, *X*_{t−2}, …). This process has three parameters, namely, *μ*_{X}, *ϕ,* and *σ*^{2}_{ϵ}. The latter two determine its SDF *S*_{X}( · ) and its ACVS {*s*_{X,τ}}:

$$S_X(f) = \frac{\sigma^2_\epsilon}{|1 - \phi e^{-i2\pi f}|^2} \quad \text{and} \quad s_{X,\tau} = \frac{\sigma^2_\epsilon\,\phi^{|\tau|}}{1 - \phi^2}. \quad (2)$$

When *ϕ* = 0, the AR(1) process becomes a white noise process. We can thus assess the null hypothesis that the NP index is a realization of a Gaussian white noise process by estimating *ϕ* and then ascertaining whether or not the estimated value is significantly different from zero.

Given the NP index, we estimate *μ*_{X} using the sample mean *X̄* ≡ (1/*N*) Σ_{t} *X*_{t}, after which we recenter the series by forming *X̃*_{t} ≡ *X*_{t} − *X̄.* We then compute maximum likelihood (ML) estimates of *ϕ* and *σ*^{2}_{ϵ} based upon the model *X̃*_{t} = *ϕX̃*_{t−1} + ϵ_{t}, that is, Eq. (1) with *X̃*_{t} replacing *X*_{t} and with *μ*_{X} set to zero. (A possible refinement is to use the ML approach to estimate *μ*_{X} along with *ϕ* and *σ*^{2}_{ϵ}, but this refinement is unlikely to alter our results materially relative to simply using the sample mean as the estimator of *μ*_{X}; see section 8.2 of Beran 1994.) For completeness, details on the formulation of the ML estimators *ϕ̂* and *σ̂*^{2}_{ϵ} are given in appendix A.

For large *N,* the ML estimators of *ϕ* and *σ*^{2}_{ϵ} are approximately Gaussian distributed with variances given approximately by var{*ϕ̂*} ≈ (1 − *ϕ*^{2})/*N* and var{*σ̂*^{2}_{ϵ}} ≈ 2*σ*^{4}_{ϵ}/*N.* Approximate 95% confidence intervals (CIs) for *ϕ* and for *σ*_{ϵ} can thus be constructed using, respectively,

$$\hat{\phi} \pm 1.96\sqrt{\frac{1 - \hat{\phi}^2}{N}} \quad \text{and} \quad \hat{\sigma}_\epsilon \pm 1.96\,\hat{\sigma}_\epsilon\sqrt{\frac{1}{2N}}. \quad (3)$$

A set of residuals (sometimes called observed innovations or observed prediction errors) that can be examined to evaluate the adequacy of the model is given by

$$\hat{\epsilon}_t \equiv \tilde{X}_t - \hat{\phi}\tilde{X}_{t-1}, \quad t = 1, \ldots, N-1. \quad (4)$$

Application of the ML procedure to the NP index yields the estimates of *ϕ* and *σ*_{ϵ} and associated 95% CIs given in the upper third of Table 1. Note that the interval for *ϕ* just barely misses zero, so there is evidence that the true *ϕ* differs from zero at an observed level of significance of approximately 0.05. Since this result depends on the large sample approximation for the variance of *ϕ̂,* we verified it with a Monte Carlo experiment in which we generated simulated series of length *N* = 100 from an AR(1) process with parameter *ϕ* = 0.21 (i.e., the observed *ϕ̂*); the empirical distribution of the resulting estimates yielded a 95% CI for *ϕ* in perfect agreement with the one stated in Table 1.

The upper left-hand plot of Fig. 2 shows the theoretical autocorrelation sequence (ACS) *ρ*_{X,τ} ≡ *s*_{X,τ}/*s*_{X,0} for an AR(1) process with parameter *ϕ̂* (thick curve), along with the sample ACS *ρ̂*_{τ} for the NP index (plotted as deviations from zero), where

$$\hat{\rho}_\tau \equiv \frac{\sum_{t=0}^{N-1-\tau} \tilde{X}_t \tilde{X}_{t+\tau}}{\sum_{t=0}^{N-1} \tilde{X}_t^2}. \quad (5)$$

Also shown are upper and lower 95% CIs (thin curves) for the ACS under the assumption that the NP index is a realization of a white noise process (see Corollary 6.3.6.2 of Fuller 1996 for details). The lower left-hand plot shows an SDF estimate (thick smooth curve) obtained by substituting *ϕ̂* and *σ̂*^{2}_{ϵ} into Eq. (2), along with the periodogram *Ŝ*(*f*_{k}) for the NP index (thin jagged curve); that is,

$$\hat{S}(f_k) \equiv \frac{1}{N}\left|\sum_{t=0}^{N-1} \tilde{X}_t e^{-i2\pi f_k t}\right|^2,$$

where *f*_{k} ≡ *k*/*N,* 1 ≤ *k* < *N*/2. In the left-hand part of this plot is a confidence interval about a circle. If we move this interval such that the center of the circle is positioned at a particular *Ŝ*(*f*_{k}), then we have a 95% confidence interval for the true SDF at frequency *f*_{k} [this interval is based upon the standard assumption that *Ŝ*(*f*_{k}) is proportional to a chi-square random variable with 2 degrees of freedom]. Such an interval in fact traps the theoretical AR(1) SDF at all but 2 of the 50 Fourier frequencies, indicating that there are no serious discrepancies between the NP index and the fitted AR(1) model.
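The fitting steps above (recenter by the sample mean, estimate *ϕ,* form the residuals, build the Eq. (3) interval) can be sketched as follows. For simplicity the sketch uses conditional least squares, which approximates the exact ML estimator of appendix A for large *N,* and the simulated series is a stand-in for the NP index.

```python
# Sketch of AR(1) fitting: conditional least squares as a large-N
# approximation to ML, with the large-sample 95% CI for phi.
import random

random.seed(0)
N, phi_true, sigma = 100, 0.21, 1.0

# simulate a zero-mean AR(1): X_t = phi * X_{t-1} + eps_t
x = [random.gauss(0, sigma)]
for _ in range(N - 1):
    x.append(phi_true * x[-1] + random.gauss(0, sigma))

xbar = sum(x) / N
xt = [v - xbar for v in x]  # recentered series

# conditional least squares estimate of phi
phi_hat = (sum(xt[t] * xt[t - 1] for t in range(1, N))
           / sum(v * v for v in xt[:-1]))

# residuals (observed innovations) and innovations-variance estimate
resid = [xt[t] - phi_hat * xt[t - 1] for t in range(1, N)]
sig2_hat = sum(r * r for r in resid) / len(resid)

# approximate 95% CI for phi: phi_hat +/- 1.96 * sqrt((1 - phi_hat^2)/N)
half_width = 1.96 * ((1 - phi_hat ** 2) / N) ** 0.5
ci = (phi_hat - half_width, phi_hat + half_width)
```

With *N* = 100 the half-width of the interval is roughly 0.2, which is why an estimate near 0.21 only "just barely misses zero."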

### b. A long-memory model for NP

We now assume that the NP index can be regarded as a realization of a portion *Y*_{0}, *Y*_{1}, … , *Y*_{N−1} of a stationary Gaussian FD process. By definition such a process has a first moment given by *μ*_{Y} ≡ *E*{*Y*_{t}} and an SDF and ACVS given by

$$S_Y(f) = \frac{\sigma^2_\varepsilon}{[4\sin^2(\pi f)]^{\delta}} \quad (6)$$

and

$$s_{Y,\tau} = \frac{\sigma^2_\varepsilon\,(-1)^\tau\,\Gamma(1 - 2\delta)}{\Gamma(1 + \tau - \delta)\,\Gamma(1 - \tau - \delta)}, \quad (7)$$

where *σ*^{2}_{ε} > 0 and −1/2 < *δ* < 1/2 (Granger and Joyeux 1980; Hosking 1981). The parameter *σ*^{2}_{ε} is the variance of the innovations, while the parameter *δ* controls the memory of the process and causes an FD process to exhibit long memory when 0 < *δ* < 1/2. At low frequencies, we have *S*_{Y}(*f*) ≈ *σ*^{2}_{ε}/|2*πf*|^{2δ}, so the SDF of an FD process is approximately proportional to a power law |*f*|^{α} with exponent *α* = −2*δ.* It follows from Eq. (7) and standard relationships for the Γ function (see, e.g., Abramowitz and Stegun 1964) that the variance of this process is given by

$$s_{Y,0} = \frac{\sigma^2_\varepsilon\,\Gamma(1 - 2\delta)}{\Gamma^2(1 - \delta)} \quad (8)$$

and that, for *τ* ≥ 1, the ACVS can be computed recursively using the formula

$$s_{Y,\tau} = s_{Y,\tau-1}\,\frac{\tau + \delta - 1}{\tau - \delta}. \quad (9)$$

When *δ* = 0, the FD process becomes a white noise process, so we can assess the null hypothesis of white noise for the NP index based upon an estimate of *δ.*

The AR(1) and FD models differ qualitatively in how fast their ACVSs decay. As the lag *τ* increases, the ACVS for an AR(1) model decreases exponentially, that is, *s*_{X,τ} ∝ *ϕ*^{|τ|}, whereas the ACVS for an FD model decays at a much slower rate since *s*_{Y,τ} ∝ |*τ*|^{2δ−1} approximately for large *τ.* For the AR(1) model, we can define a measure of the decorrelation time (or integral timescale) as

$$\tau_D \equiv \sum_{\tau=-\infty}^{\infty} \frac{s_{X,\tau}}{s_{X,0}} = \frac{1 + \phi}{1 - \phi}$$

(von Storch and Zwiers 1999). This measure is sometimes interpreted as "the time needed for any correlation between *X*_{t} and *X*_{t+τ} to die out" (Yaglom 1987); that is, the subseries *X*_{n⌈τD⌉}, *n* = … , −1, 0, 1, … , can be regarded as a reasonable approximation to a white noise process, where ⌈*τ*_{D}⌉ is the smallest integer greater than or equal to *τ*_{D}. If we attempt to define an analogous decorrelation time for an FD model, we find that *τ*_{D} = ∞ because the ACVS now decays so slowly that it does not sum to a finite value. Thus, whereas AR(1) models are associated with finite decorrelation times, FD models are not. A related qualitative difference between the two models is reflected in their SDFs. The SDF *S*_{X}( · ) for an AR(1) model is finite at *f* = 0, has a slope of zero at *f* = 0, and varies by less than approximately a factor of 2 when *f* ∈ [0, 1/(*πτ*_{D})], which says that an AR(1) model is an approximation to band-limited white noise with a cutoff frequency of 1/(*πτ*_{D}). By contrast, the SDF for an FD model increases unboundedly as *f* decreases to zero, and hence band-limited white noise is not a good approximation for an FD model. Finally, we note that as *ϕ* → 1, an AR(1) model converges to a random walk, which is a nonstationary process that can be taken to have infinite variance and an SDF that is proportional to a power law with *α* = −2. An AR(1) model with *ϕ* close to unity and an FD model thus both have SDFs that can be approximated by a power law, but we note two qualitative differences. First, the approximation *S*_{X}(*f*) ∝ |*f*|^{−2} for an AR(1) model breaks down as *f* decreases below 1/(2*πτ*_{D}), whereas the approximation *S*_{Y}(*f*) ∝ |*f*|^{−2δ} for an FD model becomes better and better as *f* → 0. Second, the power-law exponent is fixed at *α* = −2 for an AR(1) model no matter what the model parameters *ϕ* and *σ*^{2}_{ϵ} are, whereas for an FD model the exponent *α* = −2*δ* can take any value dictated by the allowable range of *δ.* The FD model thus offers more flexibility in accurately portraying the low-frequency properties of climatic time series that do not have the highly dispersive characteristics of a random walk (e.g., sample variances that increase unboundedly as the sample size *N* increases).

To fit the FD model, which has parameters *μ*_{Y}, *δ,* and *σ*^{2}_{ε}, we proceed as in the AR(1) case: we estimate *μ*_{Y} using the sample mean *Ȳ,* recenter the series by forming *Ỹ*_{t} ≡ *Y*_{t} − *Ȳ,* and then compute ML estimates *δ̂* and *σ̂*^{2}_{ε} of *δ* and *σ*^{2}_{ε} (details on the formulation of these estimators are given in appendix A). For large *N,* the ML estimators *δ̂* and *σ̂*^{2}_{ε} are approximately Gaussian distributed with variances given approximately by var{*δ̂*} ≈ 6/(*π*^{2}*N*) and var{*σ̂*^{2}_{ε}} ≈ 2*σ*^{4}_{ε}/*N.* Approximate 95% CIs for *δ* and *σ*_{ε} are given by

$$\hat{\delta} \pm 1.96\sqrt{\frac{6}{\pi^2 N}} \quad \text{and} \quad \hat{\sigma}_\varepsilon \pm 1.96\,\hat{\sigma}_\varepsilon\sqrt{\frac{1}{2N}}. \quad (10)$$

As is true when fitting an AR(1) model, the ML procedure leads to a set of residuals ε̂_{t} that we can use to evaluate the adequacy of the fitted FD model (see appendix A for details).

Application of the ML procedure to the NP index yields the estimates and 95% CIs shown in Table 1. As was true for the AR(1) parameter *ϕ,* the interval for *δ* just barely misses zero, so evidently the true *δ* differs from zero at an observed level of significance of approximately 0.05. To ascertain the validity of this CI, we carried out a Monte Carlo experiment analogous to the one for the AR(1) case and obtained a CI of [0.01, 0.33] for *δ,* which is very close to the interval [0.02, 0.32] reported in Table 1 (details about how to create the simulated FD series are given in appendix B). The upper right-hand plot of Fig. 2 shows the theoretical ACS for an FD process with parameter *δ̂* (thick curve), along with the sample ACS for the NP index, while the lower right-hand plot compares the SDF obtained by substituting *δ̂* and *σ̂*^{2}_{ε} into Eq. (6) with the periodogram. As in the AR(1) case, these plots indicate no serious discrepancies between the NP index and the fitted FD model.
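The variance and recursion in Eqs. (8) and (9) translate directly into code. The sketch below evaluates the FD ACS for *δ* = 0.17 (the estimate reported for the NP index) with unit innovations variance and contrasts its slow decay with exponential decay.

```python
# FD autocovariance via Eq. (8) for lag 0 and the Eq. (9) recursion
# for lags tau >= 1 (unit innovations variance unless specified).
import math

def fd_acvs(delta, max_lag, sigma2=1.0):
    """ACVS of an FD(delta) process at lags 0..max_lag."""
    # s_0 = sigma2 * Gamma(1 - 2*delta) / Gamma(1 - delta)^2
    s0 = sigma2 * math.exp(math.lgamma(1 - 2 * delta)
                           - 2 * math.lgamma(1 - delta))
    s = [s0]
    for tau in range(1, max_lag + 1):
        # s_tau = s_{tau-1} * (tau + delta - 1) / (tau - delta)
        s.append(s[-1] * (tau + delta - 1) / (tau - delta))
    return s

acvs = fd_acvs(0.17, 100)
acs = [v / acvs[0] for v in acvs]  # autocorrelation sequence

# Long memory: the ACS decays roughly like tau^(2*delta - 1), far more
# slowly than an exponential raised to the same lag.
ar1_like = acs[1] ** 100  # what exponential decay would give at lag 100
```

At lag 100 the FD autocorrelation is still of order 10⁻², whereas an exponential decay starting from the same lag-1 value would be vanishingly small.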

### c. Goodness-of-fit tests for short- and long-memory models

Figure 2 indicates that, when we take their sampling variability into consideration, the sample ACS and the periodogram for the NP index are visually in reasonably good agreement with the corresponding theoretical quantities derived from the fitted AR(1) and FD models. A more quantitative approach for assessing the adequacy of these models is to consider four well-known goodness-of-fit test statistics. The results of one or more of these tests could in principle lead us to favor one model over the other. The first test statistic *T*_{1} is an SDF test that compares the periodogram for the NP index to the SDF corresponding to the fitted model. The remaining three test statistics make use of the residuals ϵ̂_{t} and ε̂_{t} obtained in the process of fitting the AR(1) and FD models. These tests are built upon the idea that, if the proposed model is in fact correct, the residuals should be approximately a realization of a white noise process. The first such statistic *T*_{2} is the cumulative periodogram test, while the remaining test statistics *T*_{3} and *T*_{4} are variations on the portmanteau test, which looks at the squares of a small number of sample autocorrelations of the residuals (the sample ACSs for ϵ̂_{t} and ε̂_{t} are shown in, respectively, the left- and right-hand plots of Fig. 3). Details about *T*_{j}, *j* = 1, … , 4, are given in appendix C, but the manner in which we use each test statistic is quite similar. Thus, based upon a computed *T*_{j} and a predetermined significance level *α,* we can reject the null hypothesis that the NP index is a realization from one of the fitted models if *T*_{j} exceeds the (1 − *α*) × 100% percentage point *Q*_{j}(1 − *α*) for the statistic under the null hypothesis. If, for example, we let *α* = 0.05, we would incorrectly reject the null hypothesis on about 5% of the occasions when in fact it is true.
Alternatively, if we do not want to use a prespecified significance level, we can compute the observed critical level *α̂*_{j}, namely, the smallest significance level at which the computed *T*_{j} would lead to rejection of the null hypothesis. If *α̂*_{j} is close to zero, there is strong evidence against the null hypothesis; if, on the other hand, *α̂*_{j} is larger than the usual choices of *α* (e.g., 0.05 or 0.01), we have no real reason to reject the null hypothesis.

Table 2 gives the results of applying the four goodness-of-fit tests to the AR(1) and FD models for the NP index (in keeping with recommendations in the literature, we set *K* = *N*/20 = 5 when computing and evaluating *T*_{3} and *T*_{4}, but we obtained comparable results with *K* = 10). For the sake of comparison, we also used each test statistic on the NP index itself; that is, we entertained the null hypothesis that the NP index is a realization of a white noise (WN) process. None of the four tests rejects the null hypothesis at the 0.05 level of significance for either the AR(1) or FD models, and all reject the null hypothesis of white noise for the NP index itself. The observed critical levels *α̂*_{j} for the FD model are somewhat larger than those for the AR(1) model for each *T*_{j}, which might suggest that the FD model is slightly better than the AR(1) model; however, at best this is quite weak evidence. The main conclusion we can draw from these tests is that the short- and long-memory models are quite comparable and that both models are to be preferred over a simple white noise model.
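As an illustration of the portmanteau idea underlying *T*_{3} and *T*_{4}, the sketch below computes a Ljung–Box-type statistic from the first *K* = 5 sample autocorrelations of a residual series and compares it with the 95% point of a chi-square distribution with 5 degrees of freedom. The exact statistics used in this study are defined in appendix C; the residual series here is synthetic white noise.

```python
# Ljung-Box-style portmanteau statistic: a weighted sum of squared
# sample autocorrelations of the residuals at lags 1..K.  Under the
# white noise null it is approximately chi-square with K degrees of
# freedom (for residuals of a fitted model, the degrees of freedom
# would be reduced by the number of fitted parameters).
import random

def ljung_box(resid, K=5):
    n = len(resid)
    rbar = sum(resid) / n
    r = [v - rbar for v in resid]
    denom = sum(v * v for v in r)
    q = 0.0
    for k in range(1, K + 1):
        rho_k = sum(r[t] * r[t - k] for t in range(k, n)) / denom
        q += rho_k * rho_k / (n - k)
    return n * (n + 2) * q

random.seed(1)
white = [random.gauss(0, 1) for _ in range(100)]
q_stat = ljung_box(white)

CHI2_95_DF5 = 11.07  # 95% point of chi-square with 5 degrees of freedom
rejects_white_noise = q_stat > CHI2_95_DF5
```

Applied to genuinely white residuals, the statistic should exceed the 95% point only about 5% of the time.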

### d. Discriminating between short- and long-memory models

Suppose that the NP index is in fact a realization of an FD process with parameters given by the estimates *δ̂* and *σ̂*^{2}_{ε}, but that we fit an AR(1) model to it, obtaining a parameter estimate *ϕ̂.* The residuals from the fitted AR(1) model can then be regarded as a realization of a process, say *V*_{t}, whose SDF is given by

$$S_V(f) = |1 - \hat{\phi}e^{-i2\pi f}|^2\,\frac{\hat{\sigma}^2_\varepsilon}{[4\sin^2(\pi f)]^{\hat{\delta}}}.$$

The creation of *V*_{t} amounts to subjecting an FD process to a prewhitening filter in which the filter is in fact appropriate for an AR(1) process. If the fitted AR(1) and FD models are in fact comparable over a range of frequencies whose lower limit is approximately equal to the inverse of the total time span of the available data, we might expect a goodness-of-fit statistic to be unable to distinguish between *V*_{t} and a white noise process, but, by making *N* sufficiently large, we can expect to make the distinction. Conversely, we can swap the roles of the AR(1) and FD processes in this exercise, leading to a set of residuals that are a realization of a process, say *W*_{t}, with SDF given by

$$S_W(f) = [4\sin^2(\pi f)]^{\hat{\delta}}\,\frac{\hat{\sigma}^2_\epsilon}{|1 - \hat{\phi}e^{-i2\pi f}|^2}.$$

Andersson (1998) has previously studied the implications of mismatching short- and long-memory models in the context of forecasting economic time series.

To determine how large *N* must be before we can reasonably expect to reject the null hypothesis that a fitted model is adequate when in fact the time series is a realization of a different process, we conducted a series of Monte Carlo experiments that yielded an estimate of the probability of rejecting the null hypothesis for sample sizes ranging from *N* = 400 up to *N* = 6000 (see appendix D for details). Figure 4 shows plots of the probability of rejection as a function of sample size under the two scenarios. The left-hand plot in this figure is for the case when we fit an AR(1) model to a realization of an FD process whose parameters are set to the estimates *δ̂* and *σ̂*^{2}_{ε} obtained from the NP index; for the probability of rejection to become large, this plot indicates that we would need approximately *N* = 2500, 1700, 500, and 500 when using, respectively, the *T*_{1}, *T*_{2}, *T*_{3}, and *T*_{4} test statistics. The right-hand plot shows that, when the roles of the AR(1) and FD processes are swapped, we would now need *N* = 750 when using the *T*_{2}, *T*_{3}, and *T*_{4} test statistics and an *N* in excess of 4000 for *T*_{1}. All of these sample sizes are considerably larger than the *N* = 100 values that make up the NP index, thus reinforcing the notion that, given the weak overall correlation that is exhibited by the NP index and given the amount of data that is available to us, we cannot hope to distinguish between short- and long-memory models.

## 3. Statistical models for Sitka air temperature

Let us now consider the same two statistical models for the Sitka, Alaska, winter air temperature time series (Fig. 5). This time series consists of 146 data values collected over a 168-yr period (1829–1996), so there are 22 yr for which there are no recorded values. Sitka lies in the eastern Gulf of Alaska. Winter temperature anomalies relate to changes in the wind field, with more southerly winds producing warm anomalies. These winds would respond in part to both the intensity and the east–west location of the Aleutian low. For comparison, we fit AR(1) and FD models both to the original unequally sampled series and to an equally sampled version of the Sitka series formed by linearly interpolating values for the missing years. While the interpolated series can be handled using exactly the same ML estimation procedures as in the case of the NP index, the unequally sampled Sitka series requires an adaptation of these procedures that can deal with missing values (see appendix A for details). The resulting estimates and corresponding 95% confidence intervals for the uninterpolated and interpolated series are displayed in, respectively, the middle and bottom thirds of Table 1 [for the uninterpolated series, we used Monte Carlo experiments to verify that the CIs based upon Eqs. (3) and (10) with *N* = 146 are indeed accurate].

The estimated *ϕ* and *δ* parameters for the uninterpolated series are quite comparable to the estimates for the NP index. It is interesting to note that the estimates of *ϕ* for the NP index (0.21) and the Sitka series (0.18) are both close to an estimate of 0.15 for the North Atlantic oscillation (NAO) found by Stephenson et al. (2000); moreover, the corresponding estimates of *δ* (0.17 and 0.18, respectively) are both close to the NAO estimate of 0.15 for this parameter in a fractionally integrated AR(1) model determined by those same authors. For the Sitka series the estimates of *ϕ* and *δ* for the interpolated series are somewhat higher than those for the uninterpolated series, suggesting that a slightly stronger degree of autocorrelation has been artificially introduced by the interpolation procedure. As was done in Fig. 2 for the NP index, Fig. 6 shows the sample ACS and periodogram for the interpolated series, along with the theoretical ACSs and SDFs corresponding to the AR(1) and FD processes that were fit to the uninterpolated series. Based upon these plots and the goodness-of-fit tests, we can conclude, as for the NP index, that the AR(1) and FD models are quite comparable for the Sitka series and that there is no statistical evidence to favor one model over the other.
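The interpolation step can be sketched as follows; the years and temperatures below are toy values, not the actual Sitka record.

```python
# Linear interpolation of missing years: each gap year is filled by
# interpolating between the nearest observed years on either side.
def fill_gaps(years, values):
    """Interpolate values onto every year in the full span."""
    obs = dict(zip(years, values))
    full = list(range(years[0], years[-1] + 1))
    out = []
    for y in full:
        if y in obs:
            out.append(obs[y])
        else:
            # nearest observed years before and after the gap
            y0 = max(v for v in obs if v < y)
            y1 = min(v for v in obs if v > y)
            w = (y - y0) / (y1 - y0)
            out.append((1 - w) * obs[y0] + w * obs[y1])
    return full, out

years = [1829, 1830, 1833, 1834]   # toy record with 1831-32 missing
temps = [1.5, 2.0, -1.0, 0.5]
full_years, full_temps = fill_gaps(years, temps)
```

Because interpolated values lie on a line between their neighbors, they vary more smoothly than real observations would, which is consistent with the slightly inflated autocorrelation estimates noted above.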

## 4. Discussion

### a. Implications of short- versus long-memory models

Based upon the previous sections, there is no statistical reason to prefer an AR(1) process over an FD process as a model for the NP and Sitka series (or vice versa). Both processes depend upon three parameters, so both have the same degree of simplicity. We thus cannot appeal to the principle of Occam's razor here to make a case for one process over the other. Nonetheless, the fact that the two processes appear to describe both series equally well does not mean that there are no potentially important implications if we arbitrarily select one of them to model certain statistical properties of these series. As an illustration of this fact, here we consider the extent to which the two processes lend support to the notion of "regimes" in the NP index.

Loosely speaking, a regime is an interval of time during which a time series remains predominantly either above or below its long-term average value. To clarify this idea, let us consider Fig. 1, which shows the NP index (thin curve) along with a 5-yr running average of the index (thick curve) and a horizontal line indicating the sample mean of the entire series (1009.8). In the NP index itself there appear to be intervals over which the index is predominantly above its sample mean. For example, from 1901 to 1923, all of the NP values were above the sample mean with the exception of the ones for 1905 and 1919. This stretch of 23 yr would constitute a positive regime and is clearly identified in the 5-yr running average (see also Minobe 1999). The idea behind the running average is to quantify the notion of "predominantly above," and, while the choice of 5 yr is admittedly subjective, it is in keeping with smoothing procedures typically applied to climatological time series in the literature to reduce the influence of interannual variability. After 1923, we can see that the running averages are predominantly (but not strictly) below the sample mean up to 1946. Based upon this visual inspection, we might be tempted to deem this 23-yr interval to be a negative regime (and to formulate a hypothesis that a "typical" regime lasts 23 yr), but this is obviously a subjective judgment that is open to valid criticism. If we instead take the definition of a regime to be a contiguous stretch over which a 5-yr running average is strictly above or below the sample mean, then the period from 1924 to 1946 breaks up into 7 regimes (4 of length 1 yr, and 1 each of lengths 3, 7, and 9 yr). If in fact climatological series such as the NP index were to exhibit regimes with typical sizes, we could presumably use this information to help predict when a switch from, say, a positive to a negative regime is about to occur.

While the predictability of regime shifts for the NP index is open to question when we view the series as a realization of either a short- or long-memory process, it is nonetheless of interest to see how the fitted FD and AR(1) models impact what we would deduce about the distribution of regime sizes. Knowledge of this distribution gives us some idea as to how compatible these two processes are with the idea of regimes, or at least tells us which process is more likely to generate realizations that supporters of the regime idea would deem to be realistic. While it is difficult to determine the distribution of regime sizes analytically, it is easy to do so via Monte Carlo experiments (von Storch and Zwiers 1999, p. 205). To do so, we generated 1000 realizations of size 1024 from a zero mean FD process whose parameter *δ* is dictated by our fitted FD model for the NP index. In order to account for the uncertainty in the parameter estimate *δ̂,* we used a different parameter *δ*_{k} for the *k*th such realization; that is, we selected *δ*_{k} from a Gaussian distribution with mean *δ̂* and variance 6/(*Nπ*^{2}). We need not concern ourselves with uncertainty in the estimate for *σ*^{2}_{ε} because regime sizes do not depend upon the innovations variance. We used two definitions of a regime. Let *x*_{t}, *t* = 0, … , 1023, denote one of the realizations. In the first definition, the starting index *t*_{s} for a positive regime is one for which *x*_{ts} > 0 and *x*_{ts−1} ≤ 0, while the ending index *t*_{e} is the one satisfying *t*_{e} ≥ *t*_{s}, *x*_{te} > 0, *x*_{te+1} ≤ 0, and *x*_{ts+k} > 0 for *k* = 0, … , *t*_{e} − *t*_{s} (a negative regime is defined analogously by reversing the inequalities). In the second definition, we first form the 5-point running average *y*_{t} = (*x*_{t−2} + *x*_{t−1} + *x*_{t} + *x*_{t+1} + *x*_{t+2})/5 and then identify regimes by applying the first definition to this 5-point running average. Here we wish to suppress the influence of the large interannual variability. For either definition, the regime size is taken to be *t*_{e} − *t*_{s} + 1. We also only tabulated "fully expressed" regimes; that is, regimes that might have started prior to index *t* = 0 or after index *t* = 1023 were not used.
This procedure, in fact, biases the distribution somewhat toward smaller regime sizes, but this bias should be relatively small and can be lessened by increasing the size of each realization beyond 1024. An analogous procedure was followed for the AR(1) model.
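The regime tabulation under the first definition can be sketched as below: a regime is a contiguous stretch of one sign, and runs touching either end of the realization are discarded as not fully expressed. The short test series is illustrative, and zero values are counted with the negative side for simplicity.

```python
# Tabulate fully expressed regime sizes: contiguous same-sign runs,
# dropping runs that touch either end of the series (values of exactly
# zero are grouped with the negative side here for simplicity).
def regime_sizes(x):
    sizes, start = [], 0
    sign = lambda v: 1 if v > 0 else -1
    for t in range(1, len(x)):
        if sign(x[t]) != sign(x[t - 1]):
            if start > 0:            # drop the run touching the start
                sizes.append(t - start)
            start = t
    return sizes                     # the run touching the end is dropped

def running_mean5(x):
    """5-point running average (the second regime definition)."""
    return [sum(x[t - 2:t + 3]) / 5 for t in range(2, len(x) - 2)]

series = [1.0, 2.0, -1.0, -2.0, -0.5, 3.0, 1.0, -4.0]
sizes = regime_sizes(series)  # one negative run of 3, one positive run of 2
```

For the second definition, `regime_sizes(running_mean5(x))` is applied instead, which suppresses single-year sign flips.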

Figure 7 summarizes the results of the Monte Carlo experiments. Here we plot the empirically determined probability of a regime size being greater than or equal to a specified length for the AR(1) model (thin curves) or FD model (thick) when used with the first or second definition for a regime (left- and right-hand plots, respectively). We see that the FD model tends to yield regime sizes that are longer than those for the AR(1) model. For example, if we consider the typical regime size of 23 yr suggested by our visual inspection of the first half of the NP index, we see that a run of this length or longer is 4 times more likely to occur in the 5-yr running averages with the FD model than with the AR(1) model. As a second example, a run of 35 yr is 10 times more likely to occur with the FD model. Thus, even though the statistics for short runs are quite comparable in both models, the FD model suggests a greater likelihood of observing long runs, which is in keeping with visual analyses that inspired the notion of regimes.

In the previous sections we have noted that the FD model is as viable a model for the NP and Sitka series as the AR(1) model. In this section one notion of regimelike behavior is presented in which extended intervals between zero crossings are shown to be consistent with the FD model but not well supported by the AR(1) model. Note that, since the FD model is also stochastic, regimelike behavior based on zero crossings does not necessarily require a deterministic oscillation model. The fact that FD models are more supportive of regimelike behavior allows us to make a practical discrimination between AR(1) and FD models based upon auxiliary information. For example, the evidence of regimes in several biological systems in the North Pacific is strong, particularly for salmon (Mantua et al. 1997). Our FD model is consistent with the quite reasonable hypothesis that the physical environment in the North Pacific is a contributing factor to the regimes observed in these biological systems. Biological systems are capable of oscillatory behavior driven by white noise; however, the food chain in the North Pacific is quite narrow and short, and physical–biological links are plausible (Gargett 1997). Auxiliary information from the physical system can also be used to lend support to the FD model over the AR(1) model. The maximum likelihood analysis of Haines and Hannachi (1995) suggests that the PNA pattern, and thus the NP index, has preferred bimodal states. Another plausible mechanism for persistence or a long-memory effect is ocean–atmosphere feedback in the North Pacific (Latif and Barnett 1994). Feldstein (2000) suggests that interannual variability of the PNA pattern arises both from climate noise and from external forcing, which might be consistent with the level of persistence suggested by the FD model.

### b. Interpretation and adequacy of long-memory models

From Table 1 we see that the estimated values of the FD parameter *δ* for both the NP and Sitka series are around 0.2. The allowable range of *δ* for stationary FD models with long-memory dependence is 0 < *δ* < 0.5. As *δ* approaches zero, an FD process approaches white noise, which has “no memory” in the sense that its random variables are pairwise uncorrelated. At the other extreme, as *δ* approaches a half, realizations from the FD process exhibit a strong long-memory effect. To get a better idea of how to interpret *δ,* Fig. 8 shows two columns of simulated FD series (thin curves), each with four rows. Each row corresponds to a different choice of *δ.* From top to bottom, these are *δ* = 0.02 (the lower end of the 95% CI for *δ* for the NP index, which is quite close to white noise), 0.17 (the estimated value for the NP index), 0.32 (the upper end of the 95% CI, which corresponds to a moderate long-memory effect) and 0.45 (a value in the upper allowable range for *δ* corresponding to a strong long-memory effect). All four processes have zero mean and unit innovations variance, so they only differ in the choice of *δ.* All four time series in a given column were formed using the *same* realization of white noise so that differences among these series can be attributed entirely to *δ* (see the discussion in appendix B). Note that, as the degree of the long-memory effect increases, we see a more regimelike structure in the series; that is, there is a greater tendency for the series to be above (or below) the process mean of zero for long stretches of time. To quantify this, let us consider 5-point running averages (the thick curves on each plot). 
As *δ* increases, the total number of runs in the 5-point running averages tends to decrease (24, 16, 16, and 12 in, respectively, top to bottom left-hand plots; 20, 15, 13, and 11 in right-hand plots), while the length of the longest run increases (12, 17, 18 and 23, left-hand plots; 11, 21, 24, and 25, right-hand plots). These results are not linear in *δ.* In fact, the best estimate of *δ* = 0.17 for the NP index has similar behavior to *δ* = 0.32, and is substantially different from the white noise model.
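The shared-noise construction behind Fig. 8, in which a single white noise realization drives FD processes with different values of *δ,* can be sketched numerically. This is an illustrative approximation, not the simulation used in the paper (which employs the exact Davies–Harte method of appendix B): here the FD series is built from the truncated causal moving average representation with weights ψ_{0} = 1 and ψ_{k} = ψ_{k−1}(k − 1 + δ)/k, and `fd_from_noise` and `run_stats` are our own helper names.

```python
import numpy as np

def fd_from_noise(z, delta):
    """Drive an (approximate) FD process with a given white noise series,
    using the truncated causal moving average weights
    psi_0 = 1, psi_k = psi_{k-1} * (k - 1 + delta) / k."""
    n = len(z)
    psi = np.ones(n)
    for k in range(1, n):
        psi[k] = psi[k - 1] * (k - 1 + delta) / k
    return np.convolve(z, psi)[:n]   # y_t = sum_{k<=t} psi_k z_{t-k}

def run_stats(x, width=5):
    """Count sign runs of the `width`-point running average of x."""
    smooth = np.convolve(x, np.ones(width) / width, mode="valid")
    changes = np.nonzero(np.diff(np.sign(smooth)))[0]
    lengths = np.diff(np.concatenate(([0], changes + 1, [len(smooth)])))
    return len(lengths), lengths.max()

rng = np.random.default_rng(1)
z = rng.standard_normal(512)              # one shared noise realization
for delta in (0.02, 0.17, 0.32, 0.45):    # the values considered in Fig. 8
    n_runs, longest = run_stats(fd_from_noise(z, delta))
    print(f"delta = {delta:.2f}: {n_runs} runs, longest = {longest}")
```

Because all four series are filtered versions of the same noise, any difference in their run statistics is attributable to *δ* alone, mirroring the construction described above.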

We can thus interpret *δ* as an indicator of how much regimelike structure there is in a time series: if *δ* is close to zero, there is very little tendency for the series to remain above its process mean for long stretches of time, whereas the opposite is true when *δ* is close to a half. If we consider that the time series has both short-term (interannual) variability and long-term memory, then even the modest value of *δ* = 0.17 is enough to change the zero crossing behavior and produce regimelike behavior in the 5-yr running means. Thus the estimated *δ* parameters for both the NP and Sitka series are significantly different from zero (i.e., neither series can reasonably be taken to be a realization of a white noise process), and the size of *δ* suggests there is moderate long-memory structure, based on run statistics. The appeal of FD models is that they have a single parameter (*δ*) that can help us understand whether a particular climatological series exhibits weak, moderate, or strong long-memory characteristics. For the NP and Sitka series, we can conclude from the estimated *δ* that there are structures compatible with the notion of regimes, but neither series can be characterized as being dominated by a single strong long-memory process. In addition, by inspection of Figs. 1, 4, and 7, there is a broad distribution of zero crossing intervals. If the FD model were the true underlying process for the NP index, then even though regimes are a major feature, prediction would be problematic.

## 5. Conclusions

We have compared a first-order autoregressive [AR(1)] model with a fractionally differenced (FD) model applied to two North Pacific (NP) time series, the winter NP sea level pressure index that is centered on the Aleutian low region, and the winter average of the monthly temperature records from Sitka, Alaska. Both models reduce to white noise when one of their model parameters is zero. For both time series, this parameter is (just barely) significantly different from zero at a 95% level of confidence, and hence there is evidence to say that both time series have significant serial correlation. The AR(1) model has a rapid drop-off of the autocovariance sequence, which essentially models the large interannual variability of the time series. The autocovariance sequence for the FD model has a similar drop for short lags, but also has a long tail of small but positive correlations at longer lags, which is termed “long memory” in the statistical literature. This has been referred to as the Hurst effect in hydrology. The AR(1) and FD models lead to different physical models for the NP index. While an AR(1) model corresponds to a discretized first-order differential equation with a single damping constant, an FD model corresponds to an aggregation of such equations involving many different damping constants (Beran 1994).
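The contrast between the two autocovariance shapes can be computed directly. A minimal sketch: the AR(1) autocorrelation is φ^{|τ|}, while the FD autocorrelation follows the standard recursion ρ_{τ} = ρ_{τ−1}(τ − 1 + δ)/(τ − δ) (Hosking 1981). The values φ = 0.2 and δ = 0.17 below are illustrative choices for this comparison, not restatements of the fitted parameters in Table 1.

```python
import numpy as np

def ar1_acs(phi, lags):
    """AR(1) autocorrelation: rho_tau = phi^|tau| (rapid exponential decay)."""
    return phi ** np.abs(lags)

def fd_acs(delta, maxlag):
    """FD autocorrelation via rho_tau = rho_{tau-1}*(tau-1+delta)/(tau-delta),
    the slowly decaying 'long memory' tail."""
    rho = np.ones(maxlag + 1)
    for tau in range(1, maxlag + 1):
        rho[tau] = rho[tau - 1] * (tau - 1 + delta) / (tau - delta)
    return rho

fd = fd_acs(0.17, 30)
for tau in (1, 5, 10, 30):
    print(f"lag {tau:2d}: AR(1) {ar1_acs(0.2, tau):.2e}   FD {fd[tau]:.2e}")
```

The printout shows the behavior described above: the two sequences are comparable at lag 1, but by lag 30 the AR(1) correlation is negligible while the FD correlation remains small but clearly positive.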

The statistical analysis of the winter-averaged NP index shows that the AR(1) and FD models fit equally well. A similar analysis using the longer Sitka air temperature series corroborates this result. As with the AR(1) model, fitting the FD model to a given time series involves the estimation of just three parameters; hence the models are equally parsimonious. The FD model has the additional property that, unlike the AR(1) model, it creates regimelike behavior in which the winter-averaged NP index tends to remain above or below the mean for a number of years. This is true even when the low-frequency variance is a relatively small percentage of the total, as it is for the NP and Sitka series.

To fit the AR(1) and FD models to climatic series, we have adopted a rigorous statistical approach appropriate for the problem at hand. This approach includes maximum likelihood estimation of model parameters (adapted, in the case of the Sitka series, to handle missing values in a time series without the need for a questionable interpolation scheme); use of Monte Carlo experiments to verify large sample approximations to the variance of the estimated parameters; use of goodness-of-fit test statistics to evaluate the fitted models; and an evaluation of the performance of these test statistics in the presence of incorrect models. This approach should prove useful to investigators who wish to examine other climatic datasets from a short- versus long-memory perspective.

Based on synthetic time series derived from both the AR(1) and FD models, we show that it would take a time series of several hundred years to discriminate between the two models as being the underlying process for the North Pacific. In such a situation with relatively short time series and large interannual variability, we are left with the less attractive option of comparing models rather than claiming that one model is statistically more appropriate than another. Hence, in modeling climate variability in the North Pacific, it is necessary to rely on model-to-model and time-series-to-time-series comparisons, and to bring in additional information outside the time series in order to choose between models. For example, both of the distinct North Pacific time series under study, the NP index and the Sitka air temperatures, have a fitted FD model with nearly the same parameter value, *δ* = 0.17 and *δ* = 0.18. Physical arguments can also be brought in as additional information. For example, in the North Pacific there are atmosphere–ocean models that suggest feedbacks on decadal scales. The PNA teleconnection pattern has been shown to have bimodal behavior. Perhaps further evidence for an FD model over an AR(1) model comes from biological time series (Hare and Mantua 2000). Well-established regime behavior seen in the biology of the region, such as geographic changes in salmon populations, supports evidence for shifts in the physical system near 1925, 1947, and 1977. Biological systems could amplify or filter climate variability, but the different responses of individual species suggest complicated behavior. However, the strongest statement that we can make from our analysis is that regimelike behavior for the North Pacific, based on the long-memory model, cannot be ruled out on statistical grounds.

One important point that our work draws attention to is the interpretation of *δ,* the parameter in the FD model that determines its long-memory characteristics. This parameter varies from *δ* = 0.0 for white noise to just below 0.5 for a strong long-memory effect. However, if we consider regimelike behavior based on interval statistics for zero crossings, then the behavior of *δ* is nonlinear. The *δ* parameter is a measure of the tendency to form regimes. The value of the parameter for the two North Pacific time series is *δ* ≈ 0.17, yet its behavior in terms of run lengths was similar to that for *δ* = 0.32, and both of these values had behavior closer to *δ* = 0.45 than to *δ* = 0.02. Apparently, the small displacement contributions from long periods are enough that weak excursions at interannual scales do not cross the zero level. Because the FD model is completely stochastic, it cannot be used to make deterministic predictions of the beginning and duration of regimes. We show, however, that North Pacific time series are consistent with moderate regimelike behavior, based on the FD model. The results of our comparison of the FD model with the AR(1) model leave room for further characterization and potential prediction of North Pacific climate processes.

## Acknowledgments

We thank the editor and two referees for quite helpful comments that improved the exposition of this paper. This contribution was supported in part by the North Pacific Marine Research Initiative, Steller Sea Lion Research, and the EPA through the National Research Center for Statistics and the Environment at the University of Washington.

## REFERENCES

Abramowitz, M., and I. A. Stegun, Eds., 1964: *Handbook of Mathematical Functions*. U.S. Government Printing Office, 1046 pp. (Reprinted by Dover Publications, 1968.)

Andersson, M. K., 1998: On the effects of imposing or ignoring long memory when forecasting. Working Paper Series in Economics and Finance No. 225, Department of Economic Statistics, Stockholm School of Economics, 14 pp.

Beran, J., 1994: *Statistics for Long Memory Processes*. Chapman and Hall, 315 pp.

Box, G. E. P., and D. A. Pierce, 1970: Distribution of residual autocorrelations in autoregressive integrated moving average time series models. *J. Amer. Stat. Assoc.,* **65,** 1509–1526.

Chambers, J. M., W. S. Cleveland, B. Kleiner, and P. A. Tukey, 1983: *Graphical Methods for Data Analysis*. Duxbury Press, 395 pp.

Davies, R. B., and D. S. Harte, 1987: Tests for Hurst effect. *Biometrika,* **74,** 95–101.

Feldstein, S. B., 2000: The timescale, power spectra, and climate noise properties of teleconnection patterns. *J. Climate,* **13,** 4430–4440.

Fuller, W. A., 1996: *Introduction to Statistical Time Series*. 2d ed. Wiley-Interscience, 698 pp.

Gargett, A. E., 1997: Physics to fish: Interactions between physics and biology on a variety of scales. *Oceanography,* **10,** 128–131.

Granger, C. W. J., and R. Joyeux, 1980: An introduction to long-memory time series models and fractional differencing. *J. Time Series Anal.,* **1,** 15–29.

Haines, K., and A. Hannachi, 1995: Weather regimes in the Pacific from a GCM. *J. Atmos. Sci.,* **52,** 2444–2462.

Hare, S. R., and N. J. Mantua, 2000: Empirical evidence for North Pacific regime shifts in 1977 and 1989. *Progress in Oceanography,* Vol. 47, Pergamon, 103–146.

Hosking, J. R. M., 1981: Fractional differencing. *Biometrika,* **68,** 165–176.

Jones, R. H., 1980: Maximum likelihood fitting of ARMA models to time series with missing observations. *Technometrics,* **22,** 389–395.

Kay, S. M., 1981: Efficient generation of colored noise. *Proc. IEEE,* **69,** 480–481.

Latif, M., and T. P. Barnett, 1994: Causes of decadal climate variability over the North Pacific and North America. *Science,* **266,** 634–637.

Ljung, G. M., and G. E. P. Box, 1978: On a measure of lack of fit in time series models. *Biometrika,* **65,** 297–303.

Mantua, N. J., S. R. Hare, Y. Zhang, and J. M. Wallace, 1997: A Pacific interdecadal climate oscillation with impacts on salmon production. *Bull. Amer. Meteor. Soc.,* **78,** 1069–1079.

Milhøj, A., 1981: A test of fit in time series models. *Biometrika,* **68,** 177–187.

Minobe, S., 1999: Resonance in bidecadal and pentadecadal climate oscillations over the North Pacific: Role in climate regime shifts. *Geophys. Res. Lett.,* **26,** 855–858.

Overland, J. E., J. M. Adams, and N. A. Bond, 1999: Decadal variability of the Aleutian low and its relation to high-latitude circulation. *J. Climate,* **12,** 1542–1548.

Overland, J. E., J. M. Adams, and H. O. Mofjeld, 2000: Chaos in the North Pacific: Spatial modes and temporal irregularity. *Progress in Oceanography,* Vol. 47, Pergamon, 337–354.

Palma, W., and N. H. Chan, 1997: Estimation and forecasting of long-memory time series with missing values. *J. Forecast.,* **16,** 395–410.

Palmer, T. N., 1999: A nonlinear dynamical perspective on climate prediction. *J. Climate,* **12,** 575–591.

Percival, D. B., and A. T. Walden, 1993: *Spectral Analysis for Physical Applications: Multitaper and Conventional Univariate Techniques*. Cambridge University Press, 583 pp.

Pierce, D. W., 2001: Distinguishing coupled ocean–atmosphere interactions from background noise in the North Pacific. *Progress in Oceanography,* Pergamon, in press.

Priestley, M. B., 1981: *Spectral Analysis and Time Series*. Academic Press, 890 pp.

Stephens, M. A., 1974: EDF statistics for goodness of fit and some comparisons. *J. Amer. Stat. Assoc.,* **69,** 730–737.

Stephenson, D. B., V. Pavan, and R. Bojariu, 2000: Is the North Atlantic oscillation a random walk? *Int. J. Climatol.,* **20,** 1–18.

Trenberth, K. E., and D. A. Paolino, 1980: The Northern Hemisphere sea level pressure data set: Trends, errors, and discontinuities. *Mon. Wea. Rev.,* **108,** 855–872.

von Storch, H., and F. W. Zwiers, 1999: *Statistical Analysis in Climate Research*. Cambridge University Press, 484 pp.

Wood, A. T. A., and G. Chan, 1994: Simulation of stationary Gaussian processes in [0,1]^{d}. *J. Comput. Graph. Stat.,* **3,** 409–432.

Wunsch, C., 1999: The interpretation of short climate records, with comments on the North Atlantic and Southern Oscillations. *Bull. Amer. Meteor. Soc.,* **80,** 245–255.

Yaglom, A. M., 1987: *Correlation Theory of Stationary and Related Random Functions I: Basic Results*. Springer-Verlag, 526 pp.

## APPENDIX A

### Maximum Likelihood Estimation

Suppose **U** ≡ [*U*_{0}, *U*_{1}, … , *U*_{N−1}]^{T} is a vector of random variables (RVs) that form a portion of a real-valued Gaussian stationary process with zero mean and ACVS {*s*_{U,τ}: *τ* = … , −1, 0, 1, …}. Let Σ be the covariance matrix for **U**; that is, the (*j,k*)th element of Σ is given by *s*_{U,j−k}, where 0 ≤ *j, k* ≤ *N* − 1. The joint probability density function for these RVs can be written as

*f*(**U**) = (2*π*)^{−N/2} |Σ|^{−1/2} exp(−**U**^{T}Σ^{−1}**U**/2), (A1)

where |Σ| and Σ^{−1} are, respectively, the determinant and inverse of Σ. Suppose now that {*s*_{U,τ}} and hence Σ are completely determined by a vector **a** of *K* unknown parameters, where typically *K* ≪ *N.* Given **U**, we can regard the right-hand side of Eq. (A1) as an implicit function of **a** known as the likelihood function:

*L*(**a** | **U**) ≡ (2*π*)^{−N/2} |Σ(**a**)|^{−1/2} exp[−**U**^{T}Σ^{−1}(**a**)**U**/2]. (A2)

The maximum likelihood (ML) estimator **â** for **a** is the vector that maximizes *L*(**a** | **U**) as a function of **a**; equivalently, the ML estimator is the vector that minimizes

−2 log *L*(**a** | **U**) = *N* log(2*π*) + log |Σ(**a**)| + **U**^{T}Σ^{−1}(**a**)**U**. (A3)

When dealing with a time series with missing values (e.g., the Sitka series), we can reformulate the above by letting **U** just contain the random variables corresponding to the actual observations and by deleting all rows and columns of Σ corresponding to the missing values. The ML estimators satisfy a number of optimality criteria and hence are generally to be preferred over other estimators, particularly when dealing with small sample sizes (see, e.g., section 5.2 of Priestley 1981).

#### MLEs for an AR(1) process

Here we take **U** to be [*X̃*_{0}, *X̃*_{1}, … , *X̃*_{N−1}]^{T}, where the recentered time series {*X̃*_{t}} is assumed to obey the model *X̃*_{t} = *ϕX̃*_{t−1} + ϵ_{t}. The process {ϵ_{t}} is taken to be Gaussian white noise with mean zero and variance *σ*^{2}_{ϵ}. The ML estimators of *ϕ* and *σ*^{2}_{ϵ} can be obtained as follows. The estimator *ϕ̂* of *ϕ* is the value of *ϕ* that minimizes the reduced (or profile) log likelihood function, namely,

*l*^{(AR)}(*ϕ*) ≡ *N* log[*C*(*ϕ*)/*N*] − log(1 − *ϕ*^{2}), where *C*(*ϕ*) ≡ (1 − *ϕ*^{2})*X̃*^{2}_{0} + Σ_{t=1}^{N−1} (*X̃*_{t} − *ϕX̃*_{t−1})^{2}

(for details, see, e.g., section 9.8 of Percival and Walden 1993). Differentiation of the above yields

*A*^{(AR)}(*ϕ*) ≡ (*N*/2)(1 − *ϕ*^{2})*C*′(*ϕ*) + *ϕC*(*ϕ*),

which is a cubic polynomial in *ϕ.* The desired estimator *ϕ̂* is the real root of *A*^{(AR)}(*ϕ*) = 0 that minimizes *l*^{(AR)}(*ϕ*). The corresponding ML estimator of *σ*^{2}_{ϵ} is given by *σ̂*^{2}_{ϵ} = *C*(*ϕ̂*)/*N.* When dealing with a time series with missing values, the above formulation does not apply, but we can make use of a Kalman filtering (state space) formulation of an AR(1) process to compute the ML estimators for *ϕ* and *σ*^{2}_{ϵ} (Jones 1980).
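The AR(1) fitting procedure can be sketched numerically. This is a sketch, not the paper's code: it assumes the profile likelihood form *l*(φ) = *N* log[*C*(φ)/*N*] − log(1 − φ²) with *C*(φ) = (1 − φ²)*X̃*₀² + Σ(*X̃*_t − φ*X̃*_{t−1})², following section 9.8 of Percival and Walden (1993), and it locates the minimum by a simple grid scan rather than by solving the cubic equation; the function name `ar1_mle` is our own.

```python
import numpy as np

def ar1_mle(x):
    """Exact Gaussian ML estimates (phi, sigma2) for a zero-mean AR(1)
    series, found by scanning the reduced log likelihood
    l(phi) = N*log(C(phi)/N) - log(1 - phi^2) on a grid over (-1, 1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)

    def C(phi):
        # sum of squares: stationary term for x[0] plus innovation terms
        return (1 - phi**2) * x[0]**2 + np.sum((x[1:] - phi * x[:-1])**2)

    phis = np.linspace(-0.999, 0.999, 4001)
    ll = np.array([n * np.log(C(p) / n) - np.log(1 - p**2) for p in phis])
    phi_hat = phis[np.argmin(ll)]
    return phi_hat, C(phi_hat) / n    # sigma2_hat = C(phi_hat)/N
```

A grid scan is adequate here because the reduced likelihood is a smooth function of a single parameter on (−1, 1); a root finder applied to the cubic would be the more efficient choice.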

#### MLEs for fractionally differenced processes

Here we take **U** to be [*Ỹ*_{0}, *Ỹ*_{1}, … , *Ỹ*_{N−1}]^{T}, where the recentered time series {*Ỹ*_{t}} is assumed to obey an FD process with parameters *δ* and *σ*^{2}_{ε}. We can compute the ML estimate of *δ* as follows. We first compute the partial autocorrelation sequence (PACS) *ϕ*_{t,t}, *t* = 1, … , *N* − 1, which is given by *ϕ*_{t,t} = *δ*/(*t* − *δ*) (Hosking 1981). The PACS is used to recursively compute the coefficients of the best linear predictor of *Ỹ*_{t} given *Ỹ*_{t−1}, … , *Ỹ*_{0} for *t* = 2, … , *N* − 1. These coefficients are given by

*ϕ*_{t,k} = *ϕ*_{t−1,k} − *ϕ*_{t,t}*ϕ*_{t−1,t−k} for *k* = 1, … , *t* − 1,

and the corresponding prediction errors are

*e*_{t} = *Ỹ*_{t} − Σ_{k=1}^{t} *ϕ*_{t,k}*Ỹ*_{t−k}, *t* = 1, … , *N* − 1

(we define *e*_{0} to be *Ỹ*_{0}). We also use the PACS to compute a sequence {*υ*_{t}} relating var{*e*_{t}} to var{*e*_{0}} = var{*Ỹ*_{t}} [the latter is given by Eq. (8)]:

*υ*_{t} ≡ Π_{k=1}^{t} (1 − *ϕ*^{2}_{k,k}), with *υ*_{0} ≡ 1, so that var{*e*_{t}} = *υ*_{t} var{*e*_{0}}.

Given *Ỹ*_{t}, the sequences {*ϕ*_{t,k}}, {*e*_{t}}, and {*υ*_{t}} are all implicit functions of *δ* and are entirely determined by it. Define *ẽ*_{t} ≡ *e*_{t}/*υ*^{1/2}_{t}. The ML estimator *δ̂* is the value of *δ* that minimizes the reduced log likelihood function

*l*^{(FD)}(*δ*) ≡ *N* log(Σ_{t=0}^{N−1} *ẽ*^{2}_{t}) + Σ_{t=0}^{N−1} log(*υ*_{t}).

Once *δ̂* has been found by numerical minimization, the corresponding ML estimator *σ̂*^{2}_{ε} of *σ*^{2}_{ε} is obtained by rescaling the average of the squared normalized prediction errors by the ratio of var{*Ỹ*_{t}} to *σ*^{2}_{ε} implied by Eq. (8) evaluated at *δ̂.* When there are missing values, an analogous state space formulation can be used (Palma and Chan 1997).
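The Levinson-type recursion above translates into a short routine. This is a sketch under the assumptions stated in this appendix (PACS *ϕ*_{t,t} = δ/(t − δ), normalized prediction errors, reduced likelihood minimized over δ); the grid search stands in for a proper one-dimensional optimizer, and the function names are our own.

```python
import numpy as np

def fd_profile_loglik(y, delta):
    """Reduced log likelihood of a zero-mean FD process, computed with the
    Levinson-type recursion driven by the PACS phi_{t,t} = delta/(t - delta)."""
    n = len(y)
    e = np.empty(n)            # one-step prediction errors
    v = np.empty(n)            # var{e_t} / var{e_0}
    e[0], v[0] = y[0], 1.0
    phi = np.empty(0)          # predictor coefficients phi_{t-1,1..t-1}
    for t in range(1, n):
        pacs = delta / (t - delta)
        # phi_{t,k} = phi_{t-1,k} - pacs * phi_{t-1,t-k}, then append phi_{t,t}
        phi = np.concatenate((phi - pacs * phi[::-1], [pacs]))
        e[t] = y[t] - phi @ y[t - 1::-1]
        v[t] = v[t - 1] * (1 - pacs**2)
    etil2 = e**2 / v           # squared normalized prediction errors
    return n * np.log(etil2.sum()) + np.log(v).sum()

def fd_mle_delta(y, grid=np.linspace(0.01, 0.49, 49)):
    """Grid-search ML estimate of delta (a crude stand-in for a proper
    one-dimensional numerical minimization over 0 < delta < 0.5)."""
    return grid[np.argmin([fd_profile_loglik(y, d) for d in grid])]
```

For a white noise input the minimizing δ should sit near the lower end of the grid, while a series with regimelike persistence pushes the estimate upward.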

## APPENDIX B

### Simulation of AR(1) and FD Processes

We used the method described by Kay (1981) to simulate a time series of length *N* whose statistical properties are dictated by the AR(1) model of Eq. (1) with *μ*_{X} = 0. To do so, we first used a pseudorandom number generator on a digital computer to obtain *N* independent and identically distributed deviates *Z*_{0}, *Z*_{1}, … , *Z*_{N−1} from a Gaussian distribution with zero mean and unit variance. We set *X*_{0} = *σ*_{ϵ}*Z*_{0}/(1 − *ϕ*^{2})^{1/2}, so that var{*X*_{0}} is the stationary variance *σ*^{2}_{ϵ}/(1 − *ϕ*^{2}), and then computed *X*_{t} = *ϕX*_{t−1} + *σ*_{ϵ}*Z*_{t} for *t* = 1, … , *N* − 1 to obtain the rest of the time series (note that a realization from a model with *μ*_{X} ≠ 0 can be obtained by simply adding *μ*_{X} to the generated *X*_{0}, *X*_{1}, … , *X*_{N−1}).
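The recipe in the preceding paragraph translates directly into code. A minimal sketch (the function name and its defaults are ours):

```python
import numpy as np

def simulate_ar1(n, phi, sigma_eps, mu=0.0, seed=None):
    """Simulate an AR(1) series: draw X_0 from the stationary distribution,
    then iterate X_t = phi*X_{t-1} + sigma_eps*Z_t (Kay 1981)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = sigma_eps * z[0] / np.sqrt(1 - phi**2)  # var = sigma^2/(1 - phi^2)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + sigma_eps * z[t]
    return x + mu   # shift by the process mean, as noted above
```

Starting from the stationary distribution avoids the burn-in transient that would otherwise arise from initializing at zero.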

To simulate a time series drawn from the FD model, say *Y*_{0}, *Y*_{1}, … , *Y*_{N−1}, we started with 2*N* deviates *Z*_{0}, *Z*_{1}, … , *Z*_{2N−1} and used the exact simulation method described in Davies and Harte (1987) and Wood and Chan (1994), which works by transforming these 2*N* deviates into the desired series of length *N* via the following steps. First we computed the real-valued sequence

*S*_{k} ≡ Σ_{τ=0}^{2N−1} *s*°_{Y,τ} e^{−i2πkτ/(2N)}, *k* = 0, 1, … , *N,*

where *s*_{Y,τ} is defined in Eqs. (8) and (9) and {*s*°_{Y,τ}} is the circularized ACVS defined below. Note that the *S*_{k} sequence constitutes the first *N* + 1 values of the discrete Fourier transform (DFT) of the sequence {*s*°_{Y,τ}} of length 2*N* given by *s*_{Y,0}, *s*_{Y,1}, … , *s*_{Y,N−1}, *s*_{Y,N}, *s*_{Y,N−1}, *s*_{Y,N−2}, … , *s*_{Y,1}. The DFT can be efficiently computed using a standard fast Fourier transform algorithm if the desired sample size *N* is a power of 2 (if we want, say, *N* = 1000, we can always simulate a series whose length is 1024 and then discard either the first or last 24 values). We then computed the complex-valued sequence

𝒱_{0} = *S*^{1/2}_{0}*Z*_{0}; 𝒱_{k} = (*S*_{k}/2)^{1/2}(*Z*_{2k−1} + *iZ*_{2k}) for 1 ≤ *k* < *N*; 𝒱_{N} = *S*^{1/2}_{N}*Z*_{2N−1}; and 𝒱_{k} = 𝒱^{*}_{2N−k} for *N* < *k* ≤ 2*N* − 1,

where the asterisk denotes complex conjugation. Finally we used an inverse DFT algorithm to compute the real-valued sequence

*Y*_{t} = (2*N*)^{−1/2} Σ_{k=0}^{2N−1} 𝒱_{k} e^{i2πkt/(2N)}, *t* = 0, 1, … , 2*N* − 1.

The desired simulation is given by *Y*_{0}, … , *Y*_{N−1} (i.e., the values *Y*_{N}, … , *Y*_{2N−1} that are returned by the inverse DFT algorithm were not used).

## APPENDIX C

### Goodness-of-Fit Tests

Here we describe the four statistical tests that we used to assess the adequacy of short- and long-memory models. In what follows, we let *ê*_{t} stand for the residuals under either the AR(1) model or the FD model.

#### Spectral density function test

Let *Ŝ*(*f*_{k}) be the periodogram for the NP index at the Fourier frequency *f*_{k} ≡ *k*/*N* as given in Eq. (6). Let *S*(*f*_{k}; *θ̂*) be the SDF of the fitted model evaluated at *f*_{k}, where *θ̂* is the vector of estimated model parameters; i.e., *θ̂* = [*ϕ̂*, *σ̂*^{2}_{ϵ}]^{T} for the AR(1) model and *θ̂* = [*δ̂*, *σ̂*^{2}_{ε}]^{T} for the FD model. Letting *M* be the integer part of (*N* − 1)/2, the SDF test statistic is given by

*T*_{1} ≡ (*M*/2*π*) Σ_{k=1}^{M} *R*^{2}_{k} / [Σ_{k=1}^{M} *R*_{k}]^{2}, where *R*_{k} ≡ *Ŝ*(*f*_{k})/*S*(*f*_{k}; *θ̂*).

Under the null hypothesis of a correct model, this test statistic is asymptotically normally distributed with mean 1/*π* and variance 2/(*π*^{2}*N*) (for details, see Milhøj 1981 and section 10.2 of Beran 1994). We can reject the null hypothesis at level of significance *α* when (*N*/2)^{1/2}(*πT*_{1} − 1) exceeds the upper (1 − *α*) × 100% percentage point *Q*_{1}(1 − *α*) for the standard normal distribution; for example, *Q*_{1}(0.975) ≐ 1.96. The critical level *α̂* for *T*_{1} is given by 1 − Φ((*N*/2)^{1/2}[*πT*_{1} − 1]), where Φ( · ) is the cumulative distribution function for a standard normal random variable.

#### Cumulative periodogram test

Let *Ŝ*_{ê}(*f*_{k}) be the periodogram for *ê*_{t} at the Fourier frequency *f*_{k} ≡ *k*/*N,* i.e., the right-hand side of Eq. (6) with *X̃*_{t} replaced by *ê*_{t}. We form the normalized cumulative periodogram

*P*_{k} ≡ Σ_{j=1}^{k} *Ŝ*_{ê}(*f*_{j}) / Σ_{j=1}^{M} *Ŝ*_{ê}(*f*_{j}), *k* = 1, … , *M* − 1,

where, as before, *M* is the integer part of (*N* − 1)/2. The test statistic *T*_{2} is given by max{*D*^{+}, *D*^{−}}, where

*D*^{+} ≡ max_{1≤k≤M−1} [*k*/(*M* − 1) − *P*_{k}] and *D*^{−} ≡ max_{1≤k≤M−1} [*P*_{k} − (*k* − 1)/(*M* − 1)].

We reject the null hypothesis of white noise at the *α* level of significance if *T*_{2} exceeds the upper *α* × 100% percentage point *Q*_{2}(1 − *α*) for *T*_{2} under the null hypothesis. To a good approximation, we have

*Q*_{2}(1 − *α*) ≐ *C*(1 − *α*)/[(*M* − 1)^{1/2} + 0.12 + 0.11/(*M* − 1)^{1/2}],

where *C*(0.9) = 1.224, *C*(0.95) = 1.358, and *C*(0.99) = 1.628 (Stephens 1974). We can get some idea as to what the critical level *α̂* is by comparing *T*_{2} to *Q*_{2}(0.90), *Q*_{2}(0.95), and *Q*_{2}(0.99).
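The cumulative periodogram test can be sketched as follows, using the Fourier frequencies and the Stephens (1974) approximation described above; the function name is our own.

```python
import numpy as np

def cumulative_periodogram_test(resid):
    """Kolmogorov-Smirnov-type test that residuals are white noise,
    based on the normalized cumulative periodogram. Returns the
    statistic T2 and approximate critical values (Stephens 1974)."""
    resid = np.asarray(resid, dtype=float)
    n = len(resid)
    m = (n - 1) // 2
    # periodogram at Fourier frequencies k/n, k = 1..m
    pgram = np.abs(np.fft.fft(resid)[1:m + 1]) ** 2 / n
    P = np.cumsum(pgram)[: m - 1] / pgram.sum()   # P_k, k = 1..m-1
    k = np.arange(1, m)
    d_plus = np.max(k / (m - 1) - P)
    d_minus = np.max(P - (k - 1) / (m - 1))
    t2 = max(d_plus, d_minus)
    denom = np.sqrt(m - 1) + 0.12 + 0.11 / np.sqrt(m - 1)
    crit = {0.90: 1.224 / denom, 0.95: 1.358 / denom, 0.99: 1.628 / denom}
    return t2, crit
```

For white noise the cumulative periodogram should climb roughly linearly, so a large *T*_{2} flags residual spectral structure; a pure sinusoid, for instance, concentrates the cumulative sum at one frequency and is rejected decisively.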

#### Portmanteau tests

This test assesses whether the sample ACS of the residuals, *ρ̂*_{ê,τ}, at lags *τ* = 1, … , *K* is consistent with a hypothesis of zero mean white noise, where *K* is taken to be relatively small in relation to the sample size *N* [the sample ACS is defined as in Eq. (5) with *X̃*_{t} replaced by *ê*_{t}]. Here we consider two variations on the portmanteau test, namely, the Box–Pierce test statistic *T*_{3} and the Ljung–Box–Pierce test statistic *T*_{4}, given by, respectively,

*T*_{3} ≡ *N* Σ_{τ=1}^{K} *ρ̂*^{2}_{ê,τ} and *T*_{4} ≡ *N*(*N* + 2) Σ_{τ=1}^{K} *ρ̂*^{2}_{ê,τ}/(*N* − *τ*)

(Box and Pierce 1970; Ljung and Box 1978). For either test statistic, we reject the null hypothesis of white noise at significance level *α* when the statistic exceeds the (1 − *α*) × 100% percentage point *Q*_{3}(1 − *α*) = *Q*_{4}(1 − *α*) for the chi-square distribution with *K* − 1 degrees of freedom. If we let *χ*^{2}_{K−1}( · ) denote the cumulative distribution function for this chi-square distribution, the critical levels for the two test statistics are given by 1 − *χ*^{2}_{K−1}(*T*_{3}) and 1 − *χ*^{2}_{K−1}(*T*_{4}).
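The Ljung–Box–Pierce statistic *T*_{4} is straightforward to compute; a minimal sketch (the function name is our own):

```python
import numpy as np

def ljung_box(resid, K=10):
    """Ljung-Box-Pierce portmanteau statistic:
    T4 = N*(N+2) * sum_{tau=1..K} acs[tau]^2 / (N - tau)."""
    resid = np.asarray(resid, dtype=float)
    n = len(resid)
    r = resid - resid.mean()
    denom = np.sum(r**2)
    taus = np.arange(1, K + 1)
    acs = np.array([np.sum(r[tau:] * r[:-tau]) / denom for tau in taus])
    return n * (n + 2) * np.sum(acs**2 / (n - taus))
```

The returned value would be compared with a chi-square quantile with *K* − 1 degrees of freedom; for *K* = 10 the 95% point is about 16.9.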

## APPENDIX D

### Performance of Test Statistics under Incorrect Models

Here we give some details on the Monte Carlo experiments used to determine the probability of rejecting an incorrect model using one of the goodness-of-fit test statistics discussed in appendix C. Suppose first that the NP index is in fact a realization of an FD process with parameters *δ̂* and *σ̂*^{2}_{ε}. Given a sample size *N,* we can generate a realization from this process as described in appendix B. We then fit an AR(1) model to this realization using the maximum likelihood method described in appendix A and apply all four goodness-of-fit test statistics to assess the hypothesis that the AR(1) model is an adequate fit. We repeat this procedure *M* times and keep track of the number of times *M*_{j} that the test statistic *T*_{j} rejects the null hypothesis. Our estimate of the probability that *T*_{j} will reject the null hypothesis is given by *p̂*_{j} ≡ *M*_{j}/*M.* The theory of the binomial distribution says that, for large *M,* the estimator *p̂*_{j} should be approximately normally distributed with a mean value given by the true rejection probability *p*_{j} and a variance given by *p*_{j}(1 − *p*_{j})/*M.* We can thus estimate the standard error in *p̂*_{j} using [*p̂*_{j}(1 − *p̂*_{j})/*M*]^{1/2}. By letting *M* = 2500, we found that the estimated standard errors were no larger than 0.02. The left-hand plot of Fig. 4 shows *p̂*_{j} as a function of sample size for the four test statistics. By reversing the roles of the AR(1) and FD processes, we obtain the right-hand plot.
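The binomial standard error calculation is a one-liner; with *M* = 2500 the worst case, at *p̂*_{j} = 0.5, is exactly 0.01, consistent with the bound quoted above (the function name is our own).

```python
import math

def rejection_prob_se(p_hat, M):
    """Standard error of a Monte Carlo estimate p_hat of a rejection
    probability based on M independent replications: sqrt(p(1-p)/M)."""
    return math.sqrt(p_hat * (1 - p_hat) / M)

# worst case at p_hat = 0.5 with M = 2500 replications
print(rejection_prob_se(0.5, 2500))  # 0.01
```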

Table 1. The AR(1) and FD process parameter estimates for the NP index, uninterpolated Sitka air temperature, and interpolated Sitka air temperature series.

Table 2. Model goodness-of-fit tests for the NP index.

\* Pacific Marine Environmental Laboratory Contribution Number 2249.