## 1. Introduction

Time series obtained from various observational and modeling activities may provide crucial information for understanding and predicting atmospheric and climatic variability. However, traditional methods of time series analysis involving linear parametric models are “based on certain probabilistic assumptions about the nature of the physical process that generates the time series of interest. Such mathematical assumptions are rarely, if ever, met in practice” (Ghil et al. 2002). One such common assumption is that observations follow a normal distribution. Yet, distributions of many meteorological and climatological variables are not normal, such as the velocity field in a turbulent flow (Lesieur 1997), the precipitation amount, or the economic damage from extreme weather events (Katz 2002; Katz et al. 2002), and new advances in statistics have made it clear that even very slight departures from normality can be a source of concern (e.g., Wilcox 2003).

Consider also time series *W* of the vertical velocity of wind recorded under Project Lake-Effect Snow Studies (LESS) in the winter of 1983/84 (Agee and Gilbert 1989). Figure 1 shows a segment of 4096 values (corresponding to about 14.3 km) from the record of *W* taken at 50 m above Lake Michigan with 70 m s^{−1} flight speed and 20-Hz sampling rate. After being subjected to the test for stationarity (Gluhovsky and Agee 1994), these data can be considered as stationary, unlike the entire record. The sample mean, variance, and skewness computed from the record in Fig. 1 are, respectively, 0.03, 1.11, and 0.84. In statistics, confidence intervals (CIs) are used to decide how much importance is reasonable to attach to such numbers, our “best guesses,” that by themselves are not guaranteed to be close to the real time series parameters (mean, variance, and skewness) they are supposed to estimate. For example, positive skewness of *W* (implying that *W* is not normal) indicated by the large sample skewness (0.84) would be reasonably confirmed by a CI for the skewness containing only positive numbers. Another (climatological) example of a time series with large sample skewness (0.92) is shown in Fig. 2. This is the monthly Palmer drought index (PDI) for Arizona, division 6, for the period of 1904–2006.

Because the data-generating mechanism is usually unknown, the common practice is to assume a *linear* parametric model for it [thus assuming a *normal* time series; e.g., autoregressive moving average (ARMA) model] and then to estimate the model from the observed record and compute CIs for parameters of the underlying time series based on the estimated model. In section 2, we demonstrate with Monte Carlo simulations of a model nonlinear time series that even “small” nonlinearities in the *real* data-generating mechanism may render useless the inference (90% CIs for the variance of the time series) based on estimated linear parametric models. After that, in section 3, we show how modern resampling (bootstrap) methods (e.g., Efron and Tibshirani 1993; Davison and Hinkley 1997; Politis et al. 1999; Lahiri 2003) may be employed to obtain reliable inference without making questionable assumptions about the data-generating mechanism. These and other nonparametric methods have become increasingly popular in time series analysis (Fan and Yao 2003). In this paper, we argue that nonparametric methods should be used when making inferences regarding meteorological or climatological datasets, because these methods make few assumptions and do not require that linear or nonlinear models be fitted to the series. “The basic idea of nonparametric inference is to use data to infer an unknown quantity while making as few assumptions as possible” (Wasserman 2006).

## 2. Conventional confidence intervals

A 90% CI is the range of numbers containing an unknown parameter with *coverage probability* of 0.90. This implies that if instead of one time series record (a finite collection of observations made sequentially over time or along a path line) commonly available in practice, an enormous number of such records of equal lengths were obtainable, and from each record a CI were to be computed, then 90% of the resulting CIs would contain the parameter. Such coverage probability [often referred to as *nominal* or *target* coverage probability; e.g., Davison and Hinkley (1997)] is attained only if all assumptions underlying the method for the CI construction are met. This is typically not the case in geosciences, so that the *actual* coverage probability may differ (sometimes considerably) from the target level. Intervals with confidence levels other than 90% (e.g., 95% or 99%) are often used in various applications (the higher the confidence level is, the wider is the interval).

In this study, to get an idea of how much in error one can possibly be when computing CIs for parameters of observed time series from estimated linear models, this commonly accepted procedure was subjected to Monte Carlo simulations with a model time series. The Monte Carlo simulations permit one to determine the actual coverage probability of such CIs using its probabilistic interpretation given above. Thus, our Monte Carlo simulations were conducted by generating 1000 records of a model nonlinear time series, fitting to each record a linear model, and computing from this model the 90% CI for the variance of the data-generating time series. Last, from the resulting set of 1000 CIs, the actual coverage probability was determined as the fraction of those among them that contain the “true” variance (which is known from the data-generating model employed in this experiment).
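
This coverage-checking procedure is easy to sketch in code. The following Python fragment (an illustrative sketch, not the code used in this study) estimates the actual coverage of a nominal 90% CI by Monte Carlo; for concreteness it checks the textbook normal-theory interval for the mean of independent Gaussian data, for which all underlying assumptions hold and the nominal coverage should nearly be attained:

```python
import numpy as np

rng = np.random.default_rng(0)

def coverage(ci_from_record, true_param, gen_record, n_records=1000):
    """Fraction of Monte Carlo CIs that contain the true parameter."""
    hits = 0
    for _ in range(n_records):
        lo, hi = ci_from_record(gen_record())
        hits += lo <= true_param <= hi
    return hits / n_records

# Illustration: nominal 90% normal-theory CI for the mean of i.i.d. N(0, 1) data.
def ci_mean_90(x):
    se = x.std(ddof=1) / np.sqrt(len(x))
    return x.mean() - 1.645 * se, x.mean() + 1.645 * se

actual = coverage(ci_mean_90, 0.0, lambda: rng.normal(size=200))
# 'actual' should fall close to the nominal 0.90 here, since all
# assumptions behind the interval are met by construction.
```

In the experiments below, the same skeleton is used, but with nonlinear data-generating models and CIs derived from fitted linear models, which is where the nominal and actual coverages part ways.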

### a. Coverage probabilities of CIs for a linear time series

Consider the first-order autoregressive [AR(1)] process

*X*_{t} = *ϕX*_{t−1} + *ε*_{t},   (1)

where |*ϕ*| < 1 is a constant and *ε*_{t} is white noise (a sequence of uncorrelated random variables with zero mean and variance *σ*^{2}_{ε}). The AR(1) process is widely used in studies of climate as a default model for correlated time series (e.g., Katz and Skaggs 1981; von Storch and Zwiers 1999; Percival et al. 2004).

A conventional 90% CI for the variance of *X*_{t} is given by

*σ̂*^{2}_{X} ± 1.645*σ*^{2}_{X}[2(1 + *ϕ*^{2})/*n*(1 − *ϕ*^{2})]^{1/2},   (2)

where the sample variance *σ̂*^{2}_{X}, an estimate of the “true” variance of *X*_{t}, *σ*^{2}_{X} = *σ*^{2}_{ε}/(1 − *ϕ*^{2}), is computed from data. When *σ*^{2}_{X} in Eq. (2) is unknown (which is usually the case), it must be estimated from data (the unknown parameters *ϕ* and *σ*^{2}_{ε} are commonly estimated). Equation (2) follows from the fact that *σ̂*^{2}_{X} is asymptotically normal with mean *σ*^{2}_{X} and standard error *σ*^{2}_{X}[2(1 + *ϕ*^{2})/*n*(1 − *ϕ*^{2})]^{1/2} (e.g., Priestley 1981; Brockwell and Davis 1991). For brevity, the CI defined by Eq. (2) will be denoted as CI(2).
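
As a concrete illustration, CI(2) can be computed from a single record as follows (a minimal Python sketch, not code from the original study; using the lag-1 sample autocorrelation as the estimate of *ϕ* is our choice here):

```python
import numpy as np

def ci2(x, z=1.645):
    """Nominal 90% CI for the variance of an AR(1) series per Eq. (2),
    with the unknown parameters replaced by sample estimates."""
    n = len(x)
    xc = x - x.mean()
    var_hat = np.mean(xc**2)                            # sample variance
    phi_hat = np.sum(xc[1:] * xc[:-1]) / np.sum(xc**2)  # lag-1 autocorrelation
    se = var_hat * np.sqrt(2 * (1 + phi_hat**2) / (n * (1 - phi_hat**2)))
    return var_hat - z * se, var_hat + z * se

# AR(1) record with phi = 0.67 and white-noise variance 1 - phi^2,
# so that the true variance of X_t equals 1 (the setting of section 2a).
rng = np.random.default_rng(1)
phi, n = 0.67, 1024
x = np.empty(n)
x[0] = rng.normal()  # stationary start: the marginal distribution is N(0, 1)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal(scale=np.sqrt(1 - phi**2))
lo, hi = ci2(x)  # an interval of width roughly 0.2 around the sample variance
```

With these parameter values the width of CI(2) comes out near the 0.22 figure reported in section 2b.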

In our simulations, realizations of length *n* = 1024 were generated from the model in Eq. (1) with *ϕ* = 0.67 and Gaussian white noise with zero mean and variance *σ*^{2}_{ε} = 1 − *ϕ*^{2} ≈ 0.55 (which makes *σ*^{2}_{X} = 1). At the chosen value of *ϕ*, about 1000 data points from Eq. (1) [and of the model in Eq. (3) below] allow the same accuracy in the estimation of variance as 400 independent normal observations (see, e.g., Priestley 1981). In practice, when only one record is available, determination of the optimal block size in subsampling (see below) requires the record length to be a power of 2 (1024 = 2^{10}).

Pretending, as would be the case in reality, that the data-generating mechanism is unknown, an AR(1) model was fitted to each such realization, and the goodness of fit of the model was confirmed by commonly employed diagnostic checking procedures (residual analysis, portmanteau test; e.g., Brockwell and Davis 1991). Not surprisingly, the coverage probability of CI(2) in this case is about its nominal value, 0.90, because the data-generating time series was indeed AR(1). In the next section, a “real life” situation with effects of nonlinearity is explored.

### b. Coverage probabilities of CIs for a nonlinear time series

To explore how nonlinearity affects such inference, time series were generated from the *modified* model

*Y*_{t} = *X*_{t} + *a*(*X*^{2}_{t} − 1),   (3)

where *X*_{t} is the same as in section 2a [i.e., the model in Eq. (1)] and *a* is a constant [*a* = 0 corresponds to the model in Eq. (1)]. Linear models may match the first two moments (mean and variance) of observed time series, but they have zero skewness, whereas a nonlinear model may be capable of matching all three moments. Note that at *a* = 0.14 the mean, variance, and skewness of *Y*_{t} are, respectively, 0, 1 + 2*a*^{2} ≈ 1.04, and 6*a* + 8*a*^{3} ≈ 0.86, close to the corresponding sample characteristics (0.04, 1.11, and 0.84) of the vertical velocity time series *W* discussed in the introduction. Thus *Y*_{t} might provide a better description for *W* than do linear models.
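
The stated moments of Eq. (3) depend only on the (standard normal) marginal distribution of *X*_{t}, so they can be verified with a quick numerical check (illustrative Python, using independent N(0, 1) draws for the marginal):

```python
import numpy as np

a = 0.14
# Exact moments of Y = X + a*(X^2 - 1) for standard normal X:
#   E[Y] = 0,  var(Y) = 1 + 2a^2,  E[Y^3] = 6a + 8a^3.
var_exact = 1 + 2 * a**2       # ≈ 1.04
m3_exact = 6 * a + 8 * a**3    # ≈ 0.86

# Empirical check with a long record of independent N(0, 1) draws;
# these marginal moments do not depend on the temporal dependence of X_t.
rng = np.random.default_rng(2)
x = rng.normal(size=1_000_000)
y = x + a * (x**2 - 1)
# y.mean(), y.var(), and np.mean(y**3) should be close to 0, 1.04, and 0.86
```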

The procedure described in section 2a was repeated, but now with time series generated from the nonlinear model in Eq. (3) for various values of *a* (0.05, 0.1, 0.15, 0.20, 0.25, 0.30, and 0.35). One thousand realizations of the time series in Eq. (3) were generated; again, an AR(1) model was fitted to each realization, and as before it passed the common residual-based postfitting diagnostic checks.

The results for the coverage probabilities of 90% CI(2) shown in Fig. 3, however, are very different from those for the linear time series. They demonstrate that for a *nonlinear* time series, the actual coverage turns out to be considerably less than nominal (0.90). This means that CI(2) now becomes too narrow to provide the desired 0.90 coverage (and for *a* above 0.2 it is practically useless), so that there is a large probability that it *does not* contain the parameter it is supposed to estimate. We found that the widths of CI(2) remain the same (around 0.22) for all values of *a*, whereas CIs that do provide the desired 0.90 coverage prove to be 1.5 (*a* = 0.20) and 2.3 (*a* = 0.35) times as wide.

## 3. Subsampling confidence intervals

Subsampling (Politis et al. 1999) is a resampling method that makes reliable inference possible from only *one* realization of a time series. It works under even weaker (practically no) assumptions than other bootstrap methods, thus delivering us from having to rely on questionable assumptions about data. Subsampling is based on the values of the statistic of interest recomputed over subsamples of a record of time series *Y*_{t}, that is, *blocks* of consecutive observations of the same length *b* (the block size) sufficient to retain the dependence structure of the time series. One block of size *b* is shown in brackets below in a record containing *n* observations, which therefore contains *n* − *b* + 1 such blocks:

*Y*_{1}, *Y*_{2}, …, [*Y*_{i}, *Y*_{i+1}, …, *Y*_{i+b−1}], …, *Y*_{n}.

To construct an accurate 90% CI for the variance *when the model is known*, one could (following the probabilistic interpretation of CIs in the beginning of section 2) generate a very large number of realizations from the model, compute the sample variance from each realization, estimate the 0.05 and 0.95 quantiles of its distribution, and use them as a 90% (percentile) CI. In practice, that is, *when the model is unknown*, subsampling comes to the rescue by replacing computer-generated realizations from the known model with subsamples of size *b* from the only available realization of the observed time series.
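
A minimal Python sketch of such a subsampling CI for the variance is shown below; it follows the standard equal-tailed construction of Politis et al. (1999) with rate √*b*, which is not necessarily the exact symmetric-percentile variant used in this study:

```python
import numpy as np

def subsampling_ci_var(y, b, alpha=0.10):
    """Equal-tailed subsampling CI for the variance: approximate the law of
    sqrt(n)*(var_hat - var) by sqrt(b)*(var_block - var_hat) computed over
    all n - b + 1 blocks of b consecutive observations."""
    n = len(y)
    var_hat = y.var()
    blocks = np.lib.stride_tricks.sliding_window_view(y, b)  # (n-b+1, b)
    roots = np.sqrt(b) * (blocks.var(axis=1) - var_hat)
    q_lo, q_hi = np.quantile(roots, [alpha / 2, 1 - alpha / 2])
    return var_hat - q_hi / np.sqrt(n), var_hat - q_lo / np.sqrt(n)

# One record from the nonlinear model of Eq. (3) with a = 0.14:
rng = np.random.default_rng(4)
phi, n, a = 0.67, 1024, 0.14
x = np.empty(n)
x[0] = rng.normal()
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal(scale=np.sqrt(1 - phi**2))
y = x + a * (x**2 - 1)
lo, hi = subsampling_ci_var(y, b=80)  # compare with the true variance 1 + 2a^2
```

Note that nothing in the construction assumes a parametric (let alone linear) model for *Y*_{t}; only stationarity and weak dependence are needed.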

A crucial step in subsampling is the choice of the block size *b*. In fact, the optimal choice of the block size is the most difficult practical problem in subsampling, shared by all blocking methods. The asymptotic conditions for consistency of the subsampling method (Politis et al. 1999),

*b* → ∞ and *b*/*n* → 0 as *n* → ∞

(i.e., the block size *b* needs to tend to infinity with the sample size *n*, but at a slower rate), do not give much guidance for the choice of *b* in the practical case of a finite sample. Subsampling CIs given below were computed based on the optimal block size (*b* = 80) determined through Monte Carlo simulations with the model in Eq. (3). In practice, when typically only one record of a time series is available, the optimal block size can be determined using a technique suggested by Gluhovsky et al. (2005), which is based on a version of the circular bootstrap (Politis and Romano 1992) and provides a different and more practical approach to optimal block size selection than that employed by Gluhovsky and Agee (2002).

In Monte Carlo simulations for computing the coverage probabilities of subsampling CIs, realizations of length *n* = 1024 were again generated from Eq. (3) for various values of *a* (0.00, 0.05, 0.1, 0.15, 0.20, 0.25, 0.30, and 0.35), and the subsampling (symmetric percentile) CI for the variance of *Y*_{t} was computed for each realization. Coverage probabilities of 90% subsampling CIs for the variance of *Y*_{t} [the model in Eq. (3)] are presented in Fig. 4 by a solid curve. Unlike CI(2), which did not grow with *a* and thus suffered diminishing coverage, the subsampling CIs expand with increasing *a*, so that their coverage is close to the target (0.90) and remains practically the same for all values of *a*. *Calibration* allows one to achieve coverage even closer to the target: one might replace nominal 90% CIs providing an actual coverage of 0.86 (at *a* = 0) with nominal 95% CIs providing an actual coverage noticeably closer to the target (0.90 at *a* = 0, 0.87 at *a* = 0.35), as seen from the dotted curve in Fig. 4. In practice, calibration can be carried out using a model time series that shares certain statistical properties with the one under study [e.g., Eq. (3) with *a* = 0.14 for the vertical velocity time series *W*].
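
The calibration step can be sketched as a simple search over nominal levels: run Monte Carlo on a surrogate model resembling the data and keep the nominal level whose actual coverage lands closest to the target. The sketch below is illustrative only, and the helper names (`make_record`, `ci_at_level`) are hypothetical placeholders, not functions from the study:

```python
import numpy as np

def calibrate_level(make_record, true_param, ci_at_level,
                    levels=(0.90, 0.95, 0.99), target=0.90, reps=1000):
    """Return the nominal level whose Monte Carlo coverage on a
    surrogate model comes closest to the target coverage."""
    best, best_gap = None, float("inf")
    for lev in levels:
        hits = sum(
            lo <= true_param <= hi
            for lo, hi in (ci_at_level(make_record(), lev) for _ in range(reps))
        )
        gap = abs(hits / reps - target)
        if gap < best_gap:
            best, best_gap = lev, gap
    return best

# Toy check with i.i.d. N(0, 1) data and the classical z-interval for the
# mean: since this interval already attains its nominal coverage, the
# calibrated level should coincide with the nominal 0.90.
_z = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}
_rng = np.random.default_rng(3)
chosen = calibrate_level(
    lambda: _rng.normal(size=100), 0.0,
    lambda x, lev: (x.mean() - _z[lev] * x.std(ddof=1) / 10,
                    x.mean() + _z[lev] * x.std(ddof=1) / 10))
```

For the problem of this section, `make_record` would generate realizations of Eq. (3) with *a* = 0.14 and `ci_at_level` would compute the subsampling CI at the given nominal level.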

## 4. Summary and conclusions

This study has addressed the problem of obtaining reliable statistical inference from atmospheric and climatic time series. Two motivating examples that signify the need to depart from ubiquitous linear models were chosen, one meteorological (vertical velocity *W*) and one climatological (PDI), whose nonzero sample skewnesses (0.84 and 0.92, respectively) indicate possible nonlinearity. In practice, a linear parametric model is commonly assumed for the time series under study (often a questionable assumption), then the model is estimated from the time series record, and CIs for parameters of the time series are computed based on the estimated linear model.

To investigate how nonlinearities may affect statistical inference based on *linear* models, a first-order autoregressive process, typically used as a default model for correlated time series in climate studies, was altered with a nonlinear component. It was demonstrated that when a time series is nonlinear (which is often the case because they originate from an inherently nonlinear system), the CIs for its variance obtained from the estimated linear model are inferior and can become misleading, whereas those obtained through a subsampling method are valid for both the linear and nonlinear time series. The CIs for the variance, and not for the skewness, were chosen to focus on issues essential for this paper (subsampling vs methods based on questionable assumptions) because 1) CIs for the variance are also important, 2) formulas for CIs like Eq. (2) are derived assuming normality and hence zero skewness, and 3) the skewness is notoriously more difficult for estimation in terms of the required amount of data (e.g., Gluhovsky and Agee 1994; Lenschow et al. 1994).

Meteorological observations are more likely to have adequate record lengths for nonparametric inference, whereas many climatological time series (such as the global annual mean surface temperature, with only about 140 data points) are often too short [even for choosing the best linear model for the observed time series as shown by Percival et al. (2004)]. On the other hand, general circulation models, for example, can provide data volumes that are sufficiently large for reliable inference, which can be obtained using resampling methods.

In this paper, subsampling was performed on the linear models and the nonlinear models but not on the real datasets. A future research project will involve subsampling analysis of datasets from observations and numerical models, including subsampling CIs for the skewness to ascertain nonlinearity in observed time series.

This work was supported by National Science Foundation Grants ATM-0514674 and ATM-0541491.

## REFERENCES

Agee, E. M., and S. R. Gilbert, 1989: An aircraft investigation of mesoscale convection over Lake Michigan during the 10 January 1984 cold air outbreak. *J. Atmos. Sci.*, **46**, 1877–1897.

Brockwell, P. J., and R. A. Davis, 1991: *Time Series: Theory and Methods*. Springer, 577 pp.

Davison, A. C., and D. V. Hinkley, 1997: *Bootstrap Methods and Their Application*. Cambridge University Press, 582 pp.

Efron, B., and R. Tibshirani, 1993: *An Introduction to the Bootstrap*. Chapman and Hall, 436 pp.

Fan, J., and Q. Yao, 2003: *Nonlinear Time Series*. Springer, 551 pp.

Ghil, M., and Coauthors, 2002: Advanced spectral methods for climatic time series. *Rev. Geophys.*, **40**, 1003, doi:10.1029/2000RG000092.

Gluhovsky, A., and E. Agee, 1994: A definitive approach to turbulence statistical studies in planetary boundary layers. *J. Atmos. Sci.*, **51**, 1682–1690.

Gluhovsky, A., and E. Agee, 2002: Improving the statistical reliability of data analysis from atmospheric measurements and modeling. *Mon. Wea. Rev.*, **130**, 761–765.

Gluhovsky, A., M. Zihlbauer, and D. N. Politis, 2005: Subsampling confidence intervals for parameters of atmospheric time series: Block size choice and calibration. *J. Statist. Comput. Simul.*, **75**, 381–389.

Katz, R. W., 2002: Techniques for estimating uncertainty in climate change scenarios and impact studies. *Climate Res.*, **20**, 167–185.

Katz, R. W., and R. H. Skaggs, 1981: On the use of autoregressive-moving average processes to model meteorological time series. *Mon. Wea. Rev.*, **109**, 479–484.

Katz, R. W., M. B. Parlange, and P. Naveau, 2002: Statistics of extremes in hydrology. *Adv. Water Res.*, **25**, 1287–1304.

Lahiri, S. N., 2003: *Resampling Methods for Dependent Data*. Springer, 374 pp.

Lenschow, D. H., J. Mann, and L. Kristensen, 1994: How long is long enough when measuring fluxes and other turbulence statistics? *J. Atmos. Oceanic Technol.*, **11**, 661–673.

Lesieur, M., 1997: *Turbulence in Fluids*. Kluwer, 515 pp.

Percival, D. B., J. E. Overland, and H. O. Mofjeld, 2004: Modeling North Pacific climate time series. *Time Series Analysis and Applications to Geophysical Systems*, D. R. Brillinger, E. A. Robinson, and F. P. Schoenberg, Eds., The IMA Volumes in Mathematics and Its Applications Series, Vol. 139, Springer, 151–167.

Politis, D. N., and J. P. Romano, 1992: A circular block-resampling procedure for stationary data. *Exploring the Limits of Bootstrap*, R. LePage and L. Billard, Eds., John Wiley and Sons, 263–270.

Politis, D. N., J. P. Romano, and M. Wolf, 1999: *Subsampling*. Springer, 347 pp.

Priestley, M. B., 1981: *Spectral Analysis and Time Series*. Academic Press, 890 pp.

von Storch, H., and F. W. Zwiers, 1999: *Statistical Analysis in Climate Research*. Cambridge University Press, 484 pp.

Wasserman, L., 2006: *All of Nonparametric Statistics*. Springer, 276 pp.

Wilcox, R. R., 2003: *Applying Contemporary Statistical Techniques*. Academic Press, 608 pp.