Abstract

Signal detection from noisy data by rejecting a noise null hypothesis depends critically on a priori assumptions regarding the background noise and the associated statistical methods. Rejecting one kind of noise null hypothesis cannot rule out the possibility that the detected oscillations are generated from the stochastic processes of another kind. This calls for an adaptive null hypothesis based on general characteristics of the noise that is present. In this paper, a new method is developed for identifying signals from data based on the finding that true physical signals in a well-sampled time series cannot be destroyed or eliminated by resampling the time series with fractional sampling rates through linear interpolation. Therefore, the significance of signals could be tested by checking whether the signals persist in the true time–frequency spectral representation during resampling. This hypothesis is based on the general characteristics of noise as revealed by empirical mode decomposition, an adaptive data analysis method without linear or stationary assumptions, and without any predefinition of the background noise. Applications of this method to synthetic time series, solar spot number, and sea surface temperature time series illustrate its power in identifying characteristics of background noise without any a priori knowledge.

1. Introduction

Identification of signals in noisy data is always a challenging task. The confidence we attribute to a potential signal depends critically on a priori assumptions of the background noise as well as the appropriate statistical model related to it (Mann and Lees 1996). One of the well-established statistical methods to distinguish signal from noise is through the rejection of the white-noise null hypothesis (Davenport and Root 1958). Since the white noise has a flat spectrum, an exact statistical significance test can be established by using the known probability distribution of periodogram of white noise to estimate the likelihood that a given periodogram exceeds some thresholds or likelihood, so that a peak in a given periodogram is indeed caused by random fluctuations in a noise spectrum (Priestley 1981).

Rejection of the white-noise null hypothesis does not shed any light on the existence of possible oscillations from a large class of oscillatory phenomena contaminated by red noise (with a power spectrum biased toward low frequencies but without any physical significant oscillations), which is often encountered in ocean and climate observations (Stephenson et al. 2000), paleoclimate records (Braun et al. 2009), and astrophysical sources (Vaughan 2005). To be useful for this kind of data, null hypothesis of red noise must be considered. Accordingly, several methods have been developed based on variable Fourier spectrum (Schulz and Mudelsee 2002) and singular spectrum analysis (Allen et al. 1996). In most of these models, a fundamental form of background noise is assumed to be the first-order autoregressive (AR1) process, which is one of the simplest statistical models to represent the memory ability of system-integrating stochastic forcing (Hasselmann 1976). Testing against the AR1 null hypothesis generally involves two steps. The first step is to estimate two free parameters: the noise level and the lag-1 autocorrelation coefficient using linear or nonlinear regression on the original time series. The second step is to gauge the significance of the spectrum peaks of the time series relative to the estimated AR1 background noise (Schulz and Mudelsee 2002). Since AR1 does not support oscillations, rejection of this null hypothesis could be used to identify potential oscillations in the original time series.

Though these methods have been used successfully for statistical significance test of oceanic and climatic time series (Allen and Smith 1996; Feldstein 2000; Ditlevsen et al. 2007; Paluš and Novotná 2008), several issues have not been fully addressed.

First, these methods are based on the spectral shapes of noise, which is only valid when detecting signals from linear time series. For nonlinear processes, when applying Fourier transformation, harmonics of physical signals are often mixed with spectrum of background noise, making the isolation of noise from signals difficult.

Second, these methods construct the confidence level in frequency domain based on Fourier analysis, masking the temporary variability of either signal or noise, which is only valid when detecting signals from a stationary time series.

Third, the confidence attributed to a signal depends critically on assumptions that the background noise has predefined form, such as white noise or a first-order autoregressive noise. However, for real data from the natural world, it is quite common that neither signals nor noise in the data is of the known forms before analysis. Rejection of the null hypothesis on white or AR1 noise may falsely indicate some random oscillations as statistically significant.

Facing the challenge of detecting signals from nonlinear and nonstationary time series, Wu and Huang (2004) and Flandrin et al. (2004) investigated the characteristics of uniform-distributed white noise and fractional Gaussian noise, respectively, using the empirical mode decomposition (EMD) method (Huang et al. 1998). EMD method only uses local extrema information of a time series to decompose the data into a set of intrinsic mode functions (IMFs) without any linear or stationary assumptions (Chen et al. 2010; Wu et al. 2011). It is found that under EMD, any white noise can be decomposed into IMFs with identical Fourier spectra shape shifting in the axes of frequency and power spectrum density. This property is used to assign the statistical significance of IMFs for any noisy data with white-noise contamination (Wu and Huang 2004).

However, the method proposed by Wu and Huang (2004) depends on the white-noise null hypothesis. If a time series is contaminated by red noise, the significance assigned to IMFs is problematic. A simple example is that all IMFs of an AR1 process can pass the statistical significance test under the white-noise null hypothesis because AR1 certainly is not white noise and does not share its characteristics. A direct way to solve this problem is to establish a statistical significance test scheme against an AR1-noise null hypothesis, but a more fundamental question is this: Is there any general characteristic of noise, and is it possible to identify signals from noisy data without predefinition of the noise in data?

These concerns motivated us to establish an adaptive null hypothesis based on the general characteristics of noise without linear or stationary assumptions. Through this study, we found an empirical common characteristic shared by white and red noise: white or red noise can be separated into IMFs having a mean period nearly twice the value of previous IMF because EMD is an efficient dyadic filter (Wu and Huang 2004; Flandrin et al. 2004), but when resampling the same time series using factional sampling rate through linear interpolation, their IMFs cannot be preserved at its original position but are shifted toward low frequencies when the sampling rate is increased. This common characteristic of noise is identified without using any linear and stationary assumptions; therefore, it could serve as an adaptive null hypothesis to detect potential signals from data without any preknowledge of the background noise.

The paper is organized as follows. A general characteristic of white and red noise based on EMD is presented in section 2. An adaptive null hypothesis to detect signals from noise is proposed in section 3, which is applied in synthesized time series and observations from the real world in section 4. Section 5 gives the discussion and conclusions. For completeness, a brief introduction of EMD and Hilbert–Huang transformation (HHT) is given in the  appendix.

2. Common characteristics of white and red noise based on EMD

In this section, EMD is used to study the characteristics of two types of noise. The first kind is from the AR1 process, which has been widely used as background noise for geophysical processes (Ghil and Childress 1987; Zwiers and von Storch 1995). The second kind of noise is a fractional Gaussian noise (fGn), which is proposed to reproduce the “Hurst phenomenon,” a general term to describe the tendency of having long-term memory of a process (Mandelbrot and van Ness 1968). Studying of this kind of noise is for the purpose of comparison with the previous results provided by Flandrin et al. (2004).

a. Characteristics of first-order autoregressive processes

AR process assumes the time series under study to be a linear combination of its past values, , , … , plus the white noise,

 
formula

where the coefficients are the autoregressive parameters, the model order, the sampling rate, and the white noise with the standard deviation . Once the AR parameters are determined, the nonnormalized power spectrum of the time series is given by

 
formula

where denotes the frequency.

An example of an AR1 process with parameters and is first analyzed. The theoretical spectra of AR1 can be easily obtained using Eq. (2).

The 5000 independent AR1 time series with length 210 are generated to study the general characteristics of AR1 noise. EMD decomposes each time series into six IMFs. Figure 1a shows the 5000-ensemble mean of Hilbert–Huang spectra (HHS) as a function of frequency and time, and its marginal spectra (HHMS, time integration of HHS) as a function of frequency, of each IMF of original AR1 process, respectively. Since the AR1 process is statistically stationary, of Monte Carlo samples gives no more information than the derived . Both of them are shown for comparison with the nonstationary examples later.

Fig. 1.

(a) Time–frequency spectra (HHS) of each IMF of AR1 process. (left)–(right) IMFs 1, 2, 3, 4, and 5 and the HHMS of IMFs 1–5 (red through yellow color lines) and the total HHMS of all IMFs (blue dashed line). (b) The HHMS of IMFs 1–5 are redrawn after normalized by its maximum value and aligned at the position of maxima.

Fig. 1.

(a) Time–frequency spectra (HHS) of each IMF of AR1 process. (left)–(right) IMFs 1, 2, 3, 4, and 5 and the HHMS of IMFs 1–5 (red through yellow color lines) and the total HHMS of all IMFs (blue dashed line). (b) The HHMS of IMFs 1–5 are redrawn after normalized by its maximum value and aligned at the position of maxima.

There are several noticeable characteristics of the AR1 process deserving special attention:

  1. of IMFs shows obvious increasing energy when the frequency decreases. This is a typical characteristic of red noise, which is evident in the energy of each IMF.

  2. of each IMF is of a Gaussian-bell shape with some distortions. As shown in Fig. 1b, if of each IMF is normalized by its maximum value that follows a well-defined scaling rule, the resulting spectrum has an identical dependence on lnT, where T is the period. This feature is similar to that of white noise (Wu and Huang 2004; Flandrin et al. 2004).

  3. Although there exists a spectrum gap between the of two adjacent IMFs, adding of all IMFs together yields that of original AR1, and is equal to the theoretical smooth spectrum of AR1 obtained using Eq. (2), except at the very-low-frequency band owing to computational accuracy. This gives a general relationship between spectrum of each IMF and total spectrum of the original AR1 noise.

From the above observations, the following question emerged: For an AR1 process with a smooth theoretical spectrum, which factors determine the frequency of spectral peaks of their IMFs?

To answer this question, we carry out Monte Carlo experiments on changing parameters and of an AR1 process and introduce spectrum-weighted mean frequency (SWMF) to represent the mean frequency of of each IMF as follows:

 
formula

where is instantaneous frequency, and are limits of the interested frequency, and the subscript k denotes the kth IMF. Obviously, for a spectrum with a dominant frequency, the SWMF equals this frequency, but for a spectra peak with a wide dynamics range, SWMF is the mean frequency of the spectrum. The results of Monte Carlo experiments show that different sets of parameters of AR1 processes may lead to different distortions of Gaussian-bell shape of individual , but will not change of each IMF significantly (not shown). This result is consistent with the dyadic filter property of EMD (Flandrin et al. 2004; Wu and Huang 2004) and implies that changing the sampling rate of original time series may change the mean frequency of AR1’s IMFs.

To illustrate this property, we use fractional sampling rate [1.1, 1.2, … , 1.9] from to to resample the original AR1 process. The reason to use the fractional sampling rate is based on the fact that the EMD method acts as a dyadic filter; therefore, the IMF of resampled time series with sampling rate is very likely to have the same mean period as that of the next IMF of the original time series with sampling rate . To resample the original AR1 process, the linear interpolation method is applied, because it will not add extrema to the original time series during interpolation, and hence will not introduce spurious periods into the resampled time series. After resampling the original AR1 process using the fractional sampling rate [1.1, 1.2, … , 1.9], the EMD method is applied to decompose the resulting time series into their IMFs. The 5000-ensemble mean and of each IMF are shown in Figs. 2a and 2b, respectively. It is found that when the sampling rate increases from 1.0 to 1.9 gradually, the spectrum of one IMF of the resulting time series shifts toward lower frequency. Since the AR1 process is statistically stationary, gives similar information as that of . The shift of the spectral peaks of one time series to another one with different fractional sampling rate presents one important characteristic of the AR1 noise: changing the sampling rate of the original time series will change the mean period of its corresponding IMF.

Fig. 2.

(a) Ensemble mean of HHS of each IMF of the AR1 processes with sampling rates from (left) to (right) . The abscissa is time, from beginning to the end of whole time series. The ordinate is frequency. Notice the different scales are used for different IMF. (b) Ensemble mean of normalized HHMS of each IMF of the AR1 process with sampling rates from to , as a function of logarithm of their periods. IMFs 1–5 of the AR1 process with each sampling rate are shown in one color group from left to right (from high frequency to low). (c) Ensemble mean of SWMF of IMFs 1–5 of the AR1 process normalized by its maximum value and shifted left with , where subscript k denotes the kth IMF and subscript i denotes the ith resampled time series. (d) Normalized SWMF of each IMF. For better visualization, it is slightly shifted right for the time series with different sampling rates. The error bar denotes one standard deviation of 5000-ensemble mean. The black dashed line with slope 0.5 illustrates the property of EMD as adaptive dyadic filter.

Fig. 2.

(a) Ensemble mean of HHS of each IMF of the AR1 processes with sampling rates from (left) to (right) . The abscissa is time, from beginning to the end of whole time series. The ordinate is frequency. Notice the different scales are used for different IMF. (b) Ensemble mean of normalized HHMS of each IMF of the AR1 process with sampling rates from to , as a function of logarithm of their periods. IMFs 1–5 of the AR1 process with each sampling rate are shown in one color group from left to right (from high frequency to low). (c) Ensemble mean of SWMF of IMFs 1–5 of the AR1 process normalized by its maximum value and shifted left with , where subscript k denotes the kth IMF and subscript i denotes the ith resampled time series. (d) Normalized SWMF of each IMF. For better visualization, it is slightly shifted right for the time series with different sampling rates. The error bar denotes one standard deviation of 5000-ensemble mean. The black dashed line with slope 0.5 illustrates the property of EMD as adaptive dyadic filter.

The impact of the sampling rate on the original time series can be presented by normalizing the of each IMF obtained by different fractional resampling rates by its maximum value, and then shifting them left with , as shown in Fig. 2c. All normalized have the same shape. This characteristic can be described quantitatively by comparing the SWMF of each IMF with different sampling rates (Fig. 2d). To emphasize the gradual shift of mean frequency of IMFs during the resampling procedure, the SWMF of each IMF is normalized by the SWMF of the IMF of the original time series—that is, , where subscript k denotes the kth IMF, subscript i denotes the ith resampled time series, and 0 corresponds to the original time series. Obviously, during resampling with the sampling rate increasing from to 1.9, of each IMF decreases gradually toward 0.5; that is, . For convenience, we use to denote the SWMF of the kth IMF for all the resampled time series.

For the parameter , the AR1 process is reduced to white noise, which has already been discussed in Wu and Huang (2004). Applying the same resampling and analysis procedures, we can derive the SWMF of white noise as shown in Fig. 3 with the same property as that of the AR1 noise time series.

Fig. 3.

As in Fig. 2d, but for the AR1 process with the parameter (i.e., the white noise).

Fig. 3.

As in Fig. 2d, but for the AR1 process with the parameter (i.e., the white noise).

We also analyzed other high-order AR processes with Monte Carlo sets of parameters and found that this feature is very stable; that is, the spectra of IMFs of time series in one frequency band will shift their peaks toward lower frequency during the fractional resampling procedure. However, it should be noted that some high-order AR processes may support oscillation itself. For such oscillations, resampling will not change the position of their theoretical peaks.

b. Characteristics of fractional Gaussian noise

Another type of noise process, fGn, is used to investigate whether the previous property is only valid for autoregressive noise processes. The statistical properties of fGn depend on one single scale parameter, H, its Hurst exponent (Embrechts and Maejima 2002; Flandrin et al. 2004). The special case is , corresponding to white noise. When , the spectrum is high pass, corresponding to ultraviolet noise, and when , the spectrum shows the properties of red noise with a “1/f”-type spectral divergence. Here, we investigate its characteristics under the fractional resampling procedure.

Different parameter H values show different scale variability of noise processes; for example, when the parameter H satisfies , the spectrum of fGn noise is biased toward high frequency. Notice that the fractional resampling procedure has no special requirements for the spectral distribution of the original time series; it is expected that no significantly different characteristics are introduced during the fractional resampling procedure. We carried out comprehensive experiments with different parameters, and only the results with parameter are presented here. As shown in Fig. 4, when resampling fGn noise using fractional resampling rates, of each IMF of the resampled time series also shifts their mean-frequency-band position toward lower frequency.

Fig. 4.

As in (a) Fig. 2b and (b) Fig. 2d, but for the fGn processes with . When , the spectrum is biased toward high frequency (short period).

Fig. 4.

As in (a) Fig. 2b and (b) Fig. 2d, but for the fGn processes with . When , the spectrum is biased toward high frequency (short period).

c. Difference using true time–frequency and mean-frequency spectral representation

The above analysis, based on the EMD method, reveals a general characteristic of AR1 and fGn noise, which is independent of definition or parameters. This section investigates whether the noise presents this characteristic using Fourier transformation, which is essentially the mean-frequency spectral representation. We resampled the AR1 noise processes studied in section 2a and computed the Fourier spectrum of the resulted time series, respectively. As shown in Fig. 5, the Fourier spectrum of time series keeps unchanged during the resampling procedure except at the Nyquist frequency band, owing to the different sampling rate. The main reason is that the Fourier-based method is based on globally constant frequency, which is essentially a mean value because of the integral transformation. As a result, the difference between the spectra owing to different sampling rates will vanish when computing time integration (Huang et al. 2011). On the contrary, HHT defines frequency as the time derivative of the phase function, which gives true time–frequency spectral presentation of nonlinear and nonstationary time series (Huang et al. 2009). Therefore, this property is actually one fundamental characteristic of noise processes, based on EMD and HHT analysis method.

Fig. 5.

Fourier spectrum of the fractional resampled time series of AR1 processes studied in section 2a.

Fig. 5.

Fourier spectrum of the fractional resampled time series of AR1 processes studied in section 2a.

By using of this property, an adaptive null hypothesis could be proposed so that we could detect signals from noisy data without preknowledge or assumptions of background noise and its statistical models.

3. Adaptive null hypothesis and statistical significance test

a. Adaptive null hypothesis

The finding in section 2 implies that a potential oscillation of data with red noise could be identified by testing whether the mean position of its spectrum peak is stable or not when resampling time series using fractional sampling rates. Based on this finding, an adaptive hypothesis for detecting signals from noisy data is proposed as follows:

  • The null hypothesis H0: The time series under investigation contains nothing but random noise.

  • The alternative hypothesis H1: Real signals are present in the data.

For any time series whose sampling rate is and length is , satisfying , where is the data points, the statistical test of H0 against H1 is as follows.

1) Step 1: Resampling the time series using fractional sampling rates

As a dyadic filter as EMD acts, when the sampling rate exceeds , the mean period of the first IMF of resampled time series would be close to that of the second IMF of the original time series. Therefore, the fractional sampling rates are chosen as any real number in the region of .

For any given sampling rate, the original time series is resampled using linear interpolation. The data points of resampled time series with fractional sampling rate is , the largest integer less than , where is number of resampling rates. The resulting time series are .

2) Step 2: Decomposing the resulting time series into their IMFs

Based on Wu and Huang (2004, 2009), the maximum number of IMFs of a dataset is close to the logarithm of the data length. For the new resampled data, the total number of the IMFs lies in , which is at most one less than that of the original time series. Since our analysis mainly focuses on the IMFs with high frequency (compared with data length), we just use the same IMF number for all the resampled data; that is, . The resulting IMFs of each time series are .

3) Step 3: Computing instantaneous frequency (period), Hilbert–Huang spectrum, and marginal spectrum of all

Several methods can be used to compute instantaneous frequency (period), , and (Huang et al. 2009). In this paper, we mainly use the generalized zero-crossing method, which is accurate to ¼ period of intrawave frequency modulations. In the resampling step (step 1), the sampling rate and total number of data points are changed, which must be born in mind when computing the instantaneous frequency and spectrum.

4) Step 4: Computing spectrum-weighted mean-frequency of

When the instantaneous frequency (period) of time series with different sampling rates are determined, we could compute the SWMF of every IMF, which can be normalized by so that the SWMF of original time series equals to 1 for all its IMFs.

b. Confidence interval

For the white or red noise, will deviate from the unit line gradually when the fractional sampling rate increases in the range of . Therefore, for any time series, the deviation of from could be used as an indicator to determine whether an IMF of time series is a potential oscillation, or is likely from stochastic processes. Constructing a confidence interval for this deviation is difficult because the null hypothesis proposed in section 3a is an adaptive one, which has no predefined functional form or probability distribution of the background noise, hence no general distribution of .

Nevertheless, the deviation of from could provide some information of the signal-to-noise ratio of a time series in specified frequency band. Let us study the case of a regular sharp-crested Stokes wave, with a mean period of 14.7 and logarithm of mean-period 3.9, and an AR1 process added as noisy background, with various amplitude (signal-to-noise ratio) 0.1, 0.2, and 0.3, respectively (Fig. 6). We choose resampling rate as [1.0:0.01:1.9] and compute their and .

Fig. 6.

Noise (blue) and noise-contaminated signal (red). The signals are the regular Stokes wave with different amplitudes: 0.1, 0.2, and 0.3. The mean period of the signal is 14.7, whose logarithm is 3.9, which is used in Fig. 7.

Fig. 6.

Noise (blue) and noise-contaminated signal (red). The signals are the regular Stokes wave with different amplitudes: 0.1, 0.2, and 0.3. The mean period of the signal is 14.7, whose logarithm is 3.9, which is used in Fig. 7.

As Fig. 7 shows, a Stokes wave with weak amplitude (0.1) has little impact on the original red noise process; its energy band shown in keeps moving upward to low frequency when the resampling rate is increased. But when the amplitude ratio reaches 0.3, the energy band of remains unchanged (Fig. 7d), indicating the existence of the potential significant signal in this frequency band. An interesting case is found for the signal with the amplitude ratio 0.2 (Fig. 7c). There exists two spectrum bands when resampling rate exceeds 1.2, one of them going upward with increasing resampling rate, whereas the other keeps its positions within a narrow band of mean period 3.9 (same as that of the added Stokes wave). Comparing with Fig. 7a, we could find that the signal has similar energy to that of the original noise at low frequency. For this case, comparing the energy contribution is not sufficient to identify which one is signal and which one is noise, unless we have a priori knowledge of the background noise. But with the help of the fractional resampling technique, we could determine which energy band is probably from signal against the adaptive null hypothesis.

Fig. 7.

The normalized HHMS of all IMFs of the time series with different fractional resampling rates. The original time series are given in Fig. 5, including noise and the signal with different amplitudes: (a) 0.0, (b) 0.1, (c) 0.2, and (d) 0.3. The abscissa is the sampling rate and the ordinate is the logarithm scale of mean period.

Fig. 7.

The normalized HHMS of all IMFs of the time series with different fractional resampling rates. The original time series are given in Fig. 5, including noise and the signal with different amplitudes: (a) 0.0, (b) 0.1, (c) 0.2, and (d) 0.3. The abscissa is the sampling rate and the ordinate is the logarithm scale of mean period.

Investigation of the pattern of is only qualitative. The quantitative description is given by the SWMF for every IMF. As shown in Fig. 8, during the fraction resampling procedure, decreases gradually from 1 to 0.5, but for the third IMF with the amplitude ratio of 0.3, is quite close to the unit line compared with other IMFs, revealing a potential signal in this frequency band.

Fig. 8.

SWMF of IMFs 1–4 of sample background noise added with the signal with amplitude (a) 0.0, (b) 0.1, (c) 0.2, and (d) 0.3. The black dashed line with slope 0.5 is as in Fig. 2c. The legend in (b)–(d) is as in (a).

Fig. 8.

SWMF of IMFs 1–4 of sample background noise added with the signal with amplitude (a) 0.0, (b) 0.1, (c) 0.2, and (d) 0.3. The black dashed line with slope 0.5 is as in Fig. 2c. The legend in (b)–(d) is as in (a).

c. Nonstationary processes

One of the important advantages of HHT is its ability to derive a true time–frequency spectrum for nonstationary processes adaptively. With this property, the method proposed in section 3a can be extended in detecting intermittent signals in a nonstationary time series, which is shown in Fig. 9.

Fig. 9.

(a) 400 time series of SST from a randomly chosen 5° longitude × 5° latitude grid box in the North Pacific during 2007–09. (top) The original observed SST without the modulated annual cycle and long-term variability. (middle) The constructed data based on (top) with regular sinusoidal wave sections added following the steps in the text. (bottom) The difference between (top) and (middle). (b) As in Fig. 2a, but for the nonstationary synthetic time series given in (a). Notice the time–frequency Hilbert–Huang spectrum of IMF 3 remaining unchanged during the two specific periods: October 2007–March 2008 and April–October 2009. (c) The SWMF for the third IMF of the nonstationary synthetic time series given in (a). The abscissa is the sampling rate and the ordinate is time.

Fig. 9.

(a) 400 time series of SST from a randomly chosen 5° longitude × 5° latitude grid box in the North Pacific during 2007–09. (top) The original observed SST without the modulated annual cycle and long-term variability. (middle) The constructed data based on (top) with regular sinusoidal wave sections added following the steps in the text. (bottom) The difference between (top) and (middle). (b) As in Fig. 2a, but for the nonstationary synthetic time series given in (a). Notice the time–frequency Hilbert–Huang spectrum of IMF 3 remaining unchanged during the two specific periods: October 2007–March 2008 and April–October 2009. (c) The SWMF for the third IMF of the nonstationary synthetic time series given in (a). The abscissa is the sampling rate and the ordinate is time.

To construct a nonstationary time series, we choose a background noise from a real-world observation, the 3-day-averaged daily remotely sensed sea surface temperature records in a randomly chosen 5° longitude × 5° latitude grid box in the North Pacific during the period from 2007 to 2009 (Wentz et al. 2000). We first remove the annual cycle using the EMD method (Wu et al. 2008), and then replace two sections of the third IMF of the data with a regular sinusoidal wave with same mean period of 35 days and same mean amplitude as that of the third IMF. These two sections are during October 2007–March 2008 and April–October 2009. Figure 9a compares the derived 400 nonstationary time series with their counterparts.

To investigate whether artificially embedded signals during the periods from October 2007 to March 2008 and from April to October 2009 can be identified, the constructed nonstationary time series are first resampled using fractional sampling rates of 1.1, 1.2, … , and 1.9 days, and the resulting time series are decomposed into their IMFs using EMD. The of each IMF is computed and 400-ensemble-averaged is shown in Fig. 9b. The first two IMFs present continuous shift of the time–frequency spectrum toward low frequency when the resampling rate is increased. This illustrates that there is no potential signals in these two frequency bands. The of the third IMF presents two staircases, which coincide with the periods of embedded signals. During the other periods, the of the third IMF also shows shifting toward low frequencies with increasing resampling rate, suggesting that there is no significant signals.

The quantitative identification of this intermittent signal could be implemented by computing as a function of time and sampling rate; that is,

 
formula

for each IMF. Figure 9c shows the distribution of for the third IMF. Obviously, the two specific periods, October 2007–March 2008 and April–October 2009, stand out as remains close to the unit line during the fractional resampling procedure, whereas during the other periods, keeps decreasing toward 0.5.

4. Case studies

Having established the adaptive null hypothesis for background noise and the fractional resampling procedure, the characteristics of background noise in climatic data are studied in this section. Two examples are presented.

a. Solar cycle

The best-documented observation of the Sun is the sunspots, which are cool regions on the photosphere of the Sun that appear as dark spots compared to surrounding facular, which are brighter. The observations of the sunspot number can be traced back to the early seventeenth century. Since 1979, when satellite measurements of the absolute solar irradiance become available, it has been established that the sunspot number correlates with the intensity of the solar radiation, with increased solar radiation during the active periods as measured by the sunspot number (Lean 1987, 1991). Therefore, the study of sunspots number time series is helpful for understanding the historical variability of solar irradiation, as well as its impacts on Earth’s climate. The first firm relationship between sunspot number and global surface temperature was established by Coughlin and Tung (2004). Later, Tung et al. (2008) and Zhou and Tung (2010) also detected a sunspot cycle on the ocean surface temperature record. These discoveries made the study of the relationship between solar variability and climate even more compelling (Beer et al. 2000; Van Loon and Meehl 2008; Gray et al. 2010; and references therein). In this paper, the propose fractional resampling technique (FRT) method is applied to study the characteristics of background noise of the sunspot number time series.

Monthly averages of the sunspot numbers available at http://solarscience.msfc.nasa.gov/greenwch/spot_num.txt are analyzed in this section. The data covers the period from 1750 to present (Fig. 10a). The sunspot number has a clear period of about 11 yr, shown as the fourth IMF in Fig. 10b after decomposed by EMD. What is new in this analysis is the characteristics of the first two IMFs. These two IMFs have an obvious 11-yr-period amplitude modulation. However, through the analysis of FRT, it is found that these two IMFs have similar characteristics as noise; that is, the SWMF of IMFs 1–3 keeps decreasing toward 0.5 under fractional resampling. This indicates that the background noise of the sunspot number is modulated by the general solar cycle, with intensive oscillation during sunspot maximum and weak oscillation during sunspot minimum. Comparing the amplitudes of IMF 1 and 4 shows that during the sunspot maximum period, the amplitude of noise sometimes may exceed the half amplitude of solar cycle with about an 11-yr period. Whether and how this background noise of solar activity influences Earth’s climate variability at interannual time scales remain to be studied.

Fig. 10.

(a) Monthly averages of the sunspot numbers during 1750–2011. Data are available at http://solarscience.msfc.nasa.gov/greenwch/spot_num.txt. (b) IMFs of sunspot numbers time series. The fourth IMF is the solar cycle with mean period about 11 yr. First three IMFs are of high-frequency and the last IMF low-frequency variability. (c) SWMF of IMFs 1–4 of sunspot numbers time series. The blue line is SWMF of solar cycle.

Fig. 10.

(a) Monthly averages of the sunspot numbers during 1750–2011. Data are available at http://solarscience.msfc.nasa.gov/greenwch/spot_num.txt. (b) IMFs of sunspot numbers time series. The fourth IMF is the solar cycle with mean period about 11 yr. First three IMFs are of high-frequency and the last IMF low-frequency variability. (c) SWMF of IMFs 1–4 of sunspot numbers time series. The blue line is SWMF of solar cycle.

b. Sea surface temperature

The next application of the FRT method is to study the characteristics of the background noise of remote sensing sea surface temperature (SST) provided by the Advanced Microwave Scanning Radiometer for Earth Observing System (EOS) (AMSR-E, available at http://www.ssmi.com). The SST data during the period from 1 January 2003 to 31 December 2011 are used, covering eight complete annual cycles. Because of its ability to see through clouds, AMSR-E provides an uninterrupted view of the SST, enabling the study of the SST variability in the intraseasonal time scale.

We choose two 5° longitude × 5° latitude areas, one in the North Pacific (25°–30°N, 180°–175°E, referred to as NP; Fig. 11a) and one in the Indian Ocean (15°–20°N, 85°–90°E, referred to as IN; Fig. 12a), and study the area-averaged SST. In IN area, the semiannual cycle of SST driven by the monsoon has nearly the same magnitude of its annual cycle, whereas in the NP area, there is no semiannual variability. After decomposing the two SST time series into their IMFs and computing the HHMS, we could find that in the NP area, the strongest spectral peak represents the annual cycle. In addition to the annual peak, there are three spectral peaks at intraseasonal time scales, with mean periods at about 90, 30, and 15 days (Fig. 11c). In the IN area, there are significant peaks on annual and semiannual time scales. Besides these two spectral peaks with clear physical mechanism, there are also three spectral peaks at intraseasonal time scales, with mean periods of about 60, 30, and 15 days (Fig. 12c). Several authors previously attributed these intraseasonal spectral peaks to tidal mixing because the mean periods of these peaks are very close to the tidal cycles (Ffield and Gordon 1996; Souza and Pineda 2001). However, through the significance test by the FRT method, we could find that these intraseasonal spectral peaks are not stable when resampling the SST time series using a different sampling rate. In the NP area, as Fig. 11d shows, their SWMF keeps decreasing toward 0.5 during the fractional resampling procedure, which means that IMFs 2–4 have similar characteristics as the noise. In the IN area, as Fig. 12d shows, the FRT method could identify both annual and semiannual cycle, whose SWMF remains nearly unchanged under fractional resampling, but for IMFs 2–3, the SWMF shows similar characteristics as noise. The spatial distribution of the background noise of global SST will be reported in another paper.

Fig. 11.

(a) Area-averaged SST in the North Pacific (25°–30°N, 180°–175°E) during 2003–11. Data are available at http://www.ssmi.com/amsr/amsr_browse.html. (b) IMFs of SST time series in the North Pacific. (c) Hilbert–Huang marginal spectrum of IMFs in (b). The blue line denotes HHMS of the original time series. The mean periods of four obvious spectral peaks are about 15, 30, 90, and 365 days. (d) SWMF of the SST time series of the NP. Only the annual cycle (blue line) stays close to 1 while others decrease toward 0.5 when sampling rate increases.

Fig. 11.

(a) Area-averaged SST in the North Pacific (25°–30°N, 180°–175°E) during 2003–11. Data are available at http://www.ssmi.com/amsr/amsr_browse.html. (b) IMFs of SST time series in the North Pacific. (c) Hilbert–Huang marginal spectrum of IMFs in (b). The blue line denotes HHMS of the original time series. The mean periods of four obvious spectral peaks are about 15, 30, 90, and 365 days. (d) SWMF of the SST time series of the NP. Only the annual cycle (blue line) stays close to 1 while others decrease toward 0.5 when sampling rate increases.

Fig. 12.

As in Fig. 11, but for the area-averaged SST in the Indian Ocean (15°–20°N, 85°–90°E). The semiannual component stands out compared with that in the North Pacific.

Fig. 12.

As in Fig. 11, but for the area-averaged SST in the Indian Ocean (15°–20°N, 85°–90°E). The semiannual component stands out compared with that in the North Pacific.

These examples illustrated the power of the proposed FRT method for identifying background noise of time series from the real world.

5. Discussion and conclusions

To detect potential signals from noisy data, one needs to reject the null hypothesis that they are simply noise. That null hypothesis should have a noise model consistent with the particular background noise in the data involved. However, we usually have little information of either signal or background noise. Therefore, it would be difficult to construct the corresponding null hypothesis. In this paper, an adaptive null hypothesis of noise is constructed through a general characteristic of AR1 noise and fractional Gaussian noise revealed by using the EMD method and the fractional resampling technique (FRT).

With the examples presented in the paper, it is shown that the proposed FRT has several advantages for detecting signals from noisy data:

  1. It is based on one of the general characteristics of noise processes, without predefined function form or a priori knowledge of background noise. This makes the method effective when dealing with various real applications, in which neither signals nor noise is known before analysis.

  2. This method is based on the EMD method, which is developed mainly for analyzing nonlinear and nonstationary time series. Notice that both the null hypothesis and the testing methods do not involve linear or stationary assumptions. Therefore, this method is valid for nonlinear and nonstationary processes, which are very often the case in real applications.

There are, however, some problems or limitations of the proposed method that may require further study.

First, the conclusion drawn by this method is only qualitative. As in the examples shown in Figs. 8c, 8d, and 10c, the SWMF of the IMF also decreases with increasing resampling rate, although the SWMF . In such a case one may tend to believe that this IMF represents a meaningful signal; however, the uncertainty is unknown. This is because it is hard to establish the general distribution of the SWMF for the derivation of the confidence level if there is no predefinition of the functional form or probability distribution of the background noise of the studied time series.

Second, the method is only valid for studying high-frequency components (compared with data length and sampling rate). One reason for this is that FRT itself might introduce spurious low-frequency oscillation, making its separation from the original low-frequency variability separation difficult. Another reason is that, because the low-frequency component of noise usually has stronger energy, only the signals with significant energy can be identified, which is consistent with Wu and Huang (2004).

Third, rejecting the adaptive null hypothesis proposed in this paper cannot rule out the possibility that the data is from a higher order autoregressive process. This difficulty is associated with the classic issue: What is a valid random oscillation and what is noise? The demarcation is somewhat arbitrary. The resolution is more philosophical than technical.

Acknowledgments

This work was supported by the National Basic Research Program of China 2012CB957802, National Science Foundation of China 41176029, Public Science and Technology Research Funds Projects of Ocean 201105019, and Chinese Polar Research Project CHINARE2012-04-04. ZW is supported by U.S. National Science Foundation Grant AGS-1139479, and NEH was supported by grants from NSC (NSC95-2119-M-008-031-MY3 and NSC97-2627-B-008-007) and a grant from NCU (965941).

APPENDIX

Empirical Mode Decomposition and Hilbert–Huang Transformation

Hilbert–Huang transformation (HHT), consisting of empirical mode decomposition (EMD) and Hilbert spectral analysis, is a newly developed adaptively data analysis method for studying nonlinear and nonstationary time series. A detailed introduction of the method is given in Huang et al. (1998) and a thorough review on its applications to geophysical studies is in Huang and Wu (2008). A brief introduction of EMD is given as follows.

EMD decomposes a time series into a set of intrinsic mode functions in the form

 
formula

where is the residual, which has at most one extremum representing either a mean trend or a constant. The term is the jth intrinsic mode function (IMF) of original time series, which can be written as

 
formula

where and are amplitude- and frequency-modulated functions of time, respectively.

The IMFs are extracted through a sifting process for any time series: 1) locate all maxima and minima and connect all maxima (minima) with a cubic spline as upper (lower) envelope of the time series; 2) compute the difference between the time series and the mean of upper and lower envelopes to yield a new time series ; 3) for the time series , repeat steps 1 and 2 until upper and lower envelopes are symmetric with respect to zero mean under the certain criteria, then an IMF, , is derived as the time series ; and 4) subtract from original time series to yield a residual and treat as the original time series and repeat steps 1–3 until the residual becomes a monotonic function or a function with only one extremum; then, the whole sifting process is completed.

When the IMFs of original time series are derived, the physically meaningful instantaneous frequency and instantaneous amplitude of each IMF can be determined efficiently using Hilbert transformation, which is helpful for understanding the nonlinear and nonstationary processes. For each IMF , its Hilbert transform is

 
formula

With Hilbert transform of IMF, an analytic function is obtained as

 
formula

where , and instantaneous amplitude and instantaneous phase function are derived as follows:

 
formula

Then the instantaneous frequency is given as differential of instantaneous phase function; that is,

 
formula

Because an IMF is derived as a narrow band function, the instantaneous frequency defined using the above procedure can give a physically meaningful representation of nonlinear and nonstationary process (Huang et al. 2009).

REFERENCES

REFERENCES
Allen
,
M. R.
, and
L. A.
Smith
,
1996
:
Monte Carlo SSA: Detecting irregular oscillations in the presence of colored noise
.
J. Climate
,
9
,
3373
3404
.
Beer
,
J.
,
W.
Mende
, and
R.
Stellmacher
,
2000
:
The role of the sun in climate forcing
.
Quat. Sci. Rev.
,
19
,
403
415
.
Braun
,
H.
,
P.
Ditlevsen
,
J.
Kurths
, and
M.
Mudelsee
,
2009
:
Limitations of red noise in analyzing Dansgaard-Oeschger events
.
Climate Past
,
5
,
1803
1818
.
Chen
,
X. Y.
,
Z.
Wu
, and
N. E.
Huang
,
2010
:
The time-dependent intrinsic correlation based on the empirical mode decomposition
.
Adv. Adap. Data Anal.
,
2
,
233
265
.
Coughlin
,
K.
, and
K. K.
Tung
,
2004
:
Eleven-year solar cycle signal throughout the lower atmosphere
.
J. Geophys. Res.
,
109
, D21105,
doi:10.1029/2004JD004873
.
Davenport
,
W. B.
, and
W. L.
Root
,
1958
: An Introduction to the Theory of Random Signals and Noise. McGraw-Hill, 408 pp.
Ditlevsen
,
P. D.
,
K. K.
Andersen
, and
A.
Svensson
,
2007
:
The DO-climate events are probably noise induced: Statistical investigation of the claimed 1470 years cycle
.
Climate Past
,
3
,
129
134
.
Embrechts
,
P.
, and
M.
Maejima
,
2002
: Selfsimilar Processes. Princeton University Press, 152 pp.
Feldstein
,
S.
,
2000
:
Teleconnections and ENSO: The timescale, power spectra, and climate noise properties
.
J. Climate
,
13
,
4430
4440
.
Ffield
,
A.
, and
A. L.
Gordon
,
1996
:
Tidal mixing signatures in the Indonesian Seas
.
J. Phys. Oceanogr.
,
26
,
1924
1937
.
Flandrin
,
P.
,
G.
Rilling
, and
P.
Goncalves
,
2004
:
Empirical mode decomposition as a filter bank
.
IEEE Signal Process. Lett.
,
11
,
112
114
.
Ghil
,
M.
, and
S.
Childress
,
1987
: Topics in Geophysical Fluid Dynamics: Atmospheric Dynamics, Dynamo Theory, and Climate Dynamics. Springer, 527 pp.
Gray
,
L. J.
, and
Coauthors
,
2010
:
Solar influences on climate
.
Rev. Geophys.
,
48
, RG4001,
doi:10.1029/2009RG000282
.
Hasselmann
,
K.
,
1976
:
Stochastic climate models Part I. Theory
.
Tellus
,
28
,
473
485
.
Huang
,
N. E.
, and
Z.
Wu
,
2008
:
A review on Hilbert-Huang transform: Method and its applications to geophysical studies
.
Rev. Geophys.
,
46
,
1
23
.
Huang
,
N. E.
, and
Coauthors
,
1998
:
The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis
.
Proc. Roy. Soc. London
,
454A
,
903
–995.
Huang
,
N. E.
,
Z.
Wu
,
S. R.
Long
,
K. C.
Arnold
,
X. Y.
Chen
, and
K.
Blank
,
2009
:
On instantaneous frequency
.
Adv. Adap. Data Anal.
,
1
,
177
229
.
Huang
,
N. E.
,
X. Y.
Chen
,
M.
Lo
, and
Z.
Wu
,
2011
:
On Hilbert spectral representation: A true time-frequency representation for nonlinear and nonstationary data
.
Adv. Adap. Data Anal.
,
3
,
63
93
.
Lean
,
J.
,
1987
:
Solar ultraviolet irradiance variations: A review
.
J. Geophys. Res.
,
92
(D1),
839
868
.
Lean
,
J.
,
1991
:
Variations in the Sun’s radiative output
.
Rev. Geophys.
,
29
,
505
535
.
Mandelbrot
,
B. B.
, and
J. W.
van Ness
,
1968
:
Fractional Brownian motions, fractional noises and applications
.
SIAM Rev.
,
10
,
422
437
.
Mann
,
M. E.
, and
J. M.
Lees
,
1996
:
Robust estimation of background noise and signal detection in climatic time series
.
Climatic Change
,
33
,
409
445
.
Paluš
,
M.
, and
D.
Novotná
,
2008
: Detecting oscillations hidden in noise: Common cycles in atmospheric, geomagnetic and solar data. Nonlinear Time Ser. Anal. Geosci.,112, 327–353.
Priestley
,
M. B.
,
1981
: Spectral Analysis and Time Series. Vol. 1. Academic Press, 653 pp.
Schulz
,
M.
, and
M.
Mudelsee
,
2002
:
REDFIT: Estimating red-noise spectra directly from unevenly spaced paleoclimatic time series
.
Comput. Geosci.
,
28
,
421
426
.
Souza
,
A. J.
, and
J.
Pineda
,
2001
:
Tidal mixing modulation of sea-surface temperature and diatom abundance in Southern California
.
Cont. Shelf Res.
,
21
,
651
666
.
Stephenson
,
D. B.
,
V.
Pavan
, and
R.
Bojariu
,
2000
:
Is the North Atlantic Oscillation a random walk?
Int. J. Climate
,
20
,
1
18
.
Tung
,
K.-K.
,
J.
Zhou
, and
C. D.
Camp
,
2008
:
Constraining model transient climate response using independent observations of solar-cycle forcing and response
.
Geophys. Res. Lett.
,
35
, L17707,
doi:10.1029/2008GL034240
.
Van Loon
,
H.
, and
G. A.
Meehl
,
2008
:
The response in the Pacific to the sun’s decadal peaks and contrasts to cold events in the Southern Oscillation
.
J. Atmos. Sol. Terr. Phys.
,
70
,
1046
1055
.
Vaughan
,
S.
,
2005
:
A simple test for periodic signals in red noise
.
A&A
,
431
,
391
403
.
Wentz
,
J.
,
C.
Gentemann
,
D.
Smith
, and
D.
Chelto
,
2000
:
Satellite measurements of sea surface temperature through clouds
.
Science
,
288
,
847
850
.
Wu
,
Z.
, and
N. E.
Huang
,
2004
:
A study of the characteristics of white noise using the empirical mode decomposition method
.
Proc. Roy. Soc. London
,
460A
,
1597
–1611.
Wu
,
Z.
, and
N. E.
Huang
,
2009
:
Ensemble empirical mode decomposition: A noise-assisted data analysis method
.
Adv. Adap. Data Anal.
,
1
,
1
41
.
Wu
,
Z.
,
E. K.
Schneider
,
B. P.
Kirtman
,
E. S.
Sarachik
,
N. E.
Huang
, and
C. J.
Tucker
,
2008
:
The modulated annual cycle: An alternative reference frame for climate anomalies
.
Climate Dyn.
,
31
,
823
841
.
Wu
,
Z.
,
N. E.
Huang
, and
X. Y.
Chen
,
2011
:
Some considerations on physical analysis of data
.
Adv. Adap. Data Anal.
,
3
,
95
113
.
Zhou
,
J. S.
, and
K. K.
Tung
,
2010
:
Solar cycle in 150 years of global sea surface temperature data
.
J. Climate
,
23
,
3234
3248
.
Zwiers
,
F. W.
, and
H.
von Storch
,
1995
:
Taking serial correlation into account in tests of the mean
.
J. Climate
,
8
,
336
351
.