## 1. Introduction

Regression patterns are useful diagnostics of multivariate variability associated with a prespecified time series. For instance, patterns of trend coefficients often are used to quantify the spatial structure of climate change (see Trenberth et al. 2007), and the relation between El Niño and seasonal climate has been investigated using regression patterns at least since Walker (1923). However, a major limitation of regression patterns is the absence of a rigorous significance test—while testing the significance of a *single* regression coefficient is standard, testing the significance of multiple coefficients simultaneously is not straightforward. First, the multiplicity of tests needs to be taken into account. If individual coefficients are tested at the 5% significance level and the tests are independent, then 5% of the coefficients, on average, would be expected to be found significant even if no relation existed. Second, the dependence of the tests needs to be taken into account. In general, variables in gridded datasets tend to be correlated with nearby variables, implying that if a particular variable is found to be significant, then nearby variables will tend to be deemed significant as well.
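For illustration, the 5% expectation under independence can be verified with a short Monte Carlo calculation. This is a sketch with arbitrary sample sizes, not tied to any dataset used in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 62, 2000                      # sample size, number of "grid points"
x = rng.standard_normal(N)           # prespecified index (pure noise here)
Y = rng.standard_normal((N, M))      # gridded data, independent of x

xc = x - x.mean()
Yc = Y - Y.mean(axis=0)
r = xc @ Yc / np.sqrt((xc @ xc) * (Yc ** 2).sum(axis=0))   # pointwise correlations
t = r * np.sqrt((N - 2) / (1 - r ** 2))                    # t statistic per point
frac_rejected = (np.abs(t) > 2.0003).mean()   # 2.0003 = 97.5th pctl of t, 60 dof
print(frac_rejected)                          # close to 0.05
```

Even though no relation exists between the index and the data, roughly 5% of the pointwise tests reject the null hypothesis.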

Walker (1910) appears to have been the first to recognize the problem of multiple testing with regression patterns, or equivalently correlation patterns, and to propose a solution for it. Specifically, Walker proposed adjusting the significance threshold of individual tests to ensure that the probability of at least one false rejection of the null hypothesis was a predefined threshold. The latter is often called the *experimentwise* error rate. However, this approach assumes the tests are independent, which is a dubious assumption for gridded data.
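Under the independence assumption, Walker's adjustment amounts to choosing the per-test level so that the experimentwise error rate equals the target. A minimal sketch (the value of *M* below is purely illustrative):

```python
M = 1000          # number of local tests (illustrative)
alpha = 0.05      # desired experimentwise error rate

# Per-test level such that P(at least one false rejection among M
# independent tests) equals alpha
alpha_walker = 1.0 - (1.0 - alpha) ** (1.0 / M)
print(alpha_walker)   # far stricter than 0.05

# Check: probability of at least one rejection among M independent tests
p_any = 1.0 - (1.0 - alpha_walker) ** M
print(p_any)          # recovers alpha
```

Note how severe the adjustment is: for 1000 tests the per-test level drops to roughly 5 × 10^{−5}, which is why the independence assumption matters so much.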

Livezey and Chen (1983) proposed testing the “field significance” of a regression pattern by Monte Carlo techniques. The core of this technique is to replace the prespecified time series by random numbers drawn from a similar distribution, calculate the regression coefficients between the resulting time series and the gridded data, and count the number of rejections of the null hypothesis. This procedure is then repeated many times and the number of rejections is recorded for each regression pattern. The original regression pattern is said to have field significance if the number of rejections exceeds a threshold derived from the upper tail of rejections based on random time series. However, this approach is based on counting the number of rejections regardless of the spatial location or degree of significance. These shortcomings could be important if the spatial correlation structure varies significantly in the domain or if some coefficients are highly significant.

Wilks (2006) proposed testing field significance based on the false discovery rate, defined as the expected proportion of rejected null hypotheses that are actually true. The procedure is to perform *M* hypothesis tests for the *M* variables and record the corresponding *p* values. The *p* values are then sorted in increasing order. For a given significance level *α*, the largest value of *k* such that the *k*th smallest *p* value is less than *αk*/*M* is determined, and the hypotheses corresponding to the *k* smallest *p* values are rejected. The main shortcomings of this approach are that it assumes the hypothesis tests are independent and that it ignores spatial structure in the dependencies, though Wilks argues that the approach is relatively insensitive to correlations among the local tests.
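The step-up rule described above can be sketched as follows. The *p* values here are invented for illustration, and `fdr_reject` is a hypothetical helper, not a function from any cited work:

```python
import numpy as np

def fdr_reject(p, alpha=0.05):
    """Boolean mask of hypotheses rejected at false discovery rate alpha."""
    p = np.asarray(p)
    M = p.size
    order = np.argsort(p)
    thresh = alpha * np.arange(1, M + 1) / M   # alpha*k/M for the sorted p values
    below = p[order] < thresh
    reject = np.zeros(M, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest k with p_(k) < alpha*k/M
        reject[order[: k + 1]] = True          # reject the k smallest p values
    return reject

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.36]
print(fdr_reject(p).sum())                     # number of rejected hypotheses
```

For these illustrative *p* values only the two smallest survive the step-up comparison, even though five of them fall below the nominal 0.05 level.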

In this paper, we argue that multivariate regression provides a natural framework for testing the significance of a regression pattern relative to the null hypothesis that the regression coefficients vanish simultaneously. A statistic for testing the null hypothesis can be derived from the likelihood ratio test. This statistic depends on the ratio of determinants of relevant covariance matrices and, as such, simultaneously accounts for the multiplicity and interdependence of the tests. The statistic depends only on the canonical correlations between predictors and predictands, thereby revealing a connection to canonical correlation analysis. Furthermore, the statistic and its distribution are the same when the predictors and predictands are reversed. This symmetry allows the field significance test to be reduced to a standard univariate significance test. These results are discussed in section 2. However, the proposed statistic cannot be evaluated when the number of coefficients exceeds the sample size, reflecting the fact that testing more hypotheses than data is ill conceived. To formulate a proper significance test and to mitigate overfitting that occurs in any optimization problem, we follow a standard approach of applying the test in a reduced space spanned by the leading principal components of the data. The number of principal components is chosen based on cross-validation experiments. In contrast to most studies, we do not select the model that minimizes the cross-validated error but instead estimate a confidence interval for the minimum cross-validated error, and then select the most parsimonious model (i.e., the model with the fewest principal components) whose error is within the selected confidence interval. This additional step avoids picking complex models whose error is close to those of much simpler models. Details of this procedure are discussed in section 3. 
In section 4, the framework is used to diagnose multivariate trends in annual mean SST data and winter mean 300-hPa zonal wind data. Conclusions are given in section 5.

## 2. Hypothesis testing in multivariate regression

We argue that field significance of a regression pattern can be framed as a hypothesis test in multivariate regression. To justify this argument, we review certain results of multivariate regression that lay a foundation for field significance. Proofs and further details can be found in standard multivariate texts (Mardia et al. 1979; Johnson and Wichern 2002; Press 2005).

### a. Testing multivariate hypotheses

Multivariate regression involves one or more *y* variables and one or more *x* variables. The *y* variables are called *predictands* and the *x* variables are called *predictors*. Let *M* be the number of predictands and *K* be the number of predictors. The *N* observed values of the predictors can be collected in the *N* × *K* matrix **X**, and the *N* observed values of the predictands in the *N* × *M* matrix **Y**. The multivariate regression model is then

**Y** = **XB** + **E**, (1)

where **B** is the *K* × *M* matrix of regression coefficients and **E** is the *N* × *M* matrix of random errors, whose rows are assumed to be drawn independently from a normal distribution with zero mean and *M* × *M* covariance matrix **Σ**. The matrix **Σ** is a positive definite matrix that accounts for spatial correlations and variance structure in the data. The parameters **B** and **Σ** can be estimated by maximizing the likelihood of the data. Differentiating the log-likelihood with respect to **B** and setting the result to zero gives the estimation equations; because **Σ** is positive definite, the derivative vanishes if and only if the term in parentheses vanishes, which requires that the coefficients equal the least squares estimates

**B̂** = (**X**^{T}**X**)^{−1}**X**^{T}**Y**. (4)

Note that the least squares solution (4) does not depend on **Σ**, which is fortunate since it is unknown. This independence occurs because the errors are independent in time. If the errors were correlated in time, then generalized least squares would be needed.

A convenient property of the least squares solution (4) is that each column of **B̂** depends on **Y** only through the corresponding column of **Y**, so multivariate estimation decomposes into separate univariate regressions. Unfortunately, this decoupling of multivariate *estimation* does not carry over to multivariate *hypothesis testing*. In particular, the individual elements of **B** cannot be tested simply by applying separate univariate *F* tests to each column of **Y**, because doing so ignores the multiplicity and interdependence of the tests.

In this paper, we consider the particular hypothesis that all regression coefficients vanish simultaneously, **B** = **0**. The likelihood ratio test of this hypothesis leads to the statistic

Λ = det(**Ê**^{T}**Ê**) / det(**Y**^{T}**Y**), (7)

where **Ê** = **Y** − **XB̂** is the matrix of least squares residuals. Under the null hypothesis, Λ follows Wilks' lambda distribution with parameters *M*, *K*, and *N* − *K* − 1 (where centering has been taken into account). The hypothesis is rejected for *small* values of Λ.

The statistic Λ provides a basis for testing a multivariate hypothesis in a manner that accounts for the multiplicity and interdependency of the tests. Note that the statistic depends on determinants of covariance matrices and therefore depends on correlations between predictands and correlations between residuals. The multiplicity of hypothesis tests also is taken into account.
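A numerical sketch of the determinant-ratio statistic, using synthetic data in place of any real field (the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K, M = 62, 2, 5
X = rng.standard_normal((N, K))
Y = rng.standard_normal((N, M)) + 0.5 * X @ rng.standard_normal((K, M))
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)                            # center both data matrices

B = np.linalg.lstsq(X, Y, rcond=None)[0]          # least squares coefficients
E = Y - X @ B                                     # residuals under full model
Lam = np.linalg.det(E.T @ E) / np.linalg.det(Y.T @ Y)
print(Lam)         # between 0 and 1; small values argue against B = 0
```

Because the residual covariance can only shrink relative to the total covariance, the ratio lies between 0 and 1, with small values providing evidence against the null hypothesis.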

### b. Remarks on the hypothesis test

Canonical correlation analysis (CCA) finds a linear combination of one set of variables (say, *x*) that is maximally correlated with a linear combination of another set of variables (say, *y*). The maximum squared correlation is given by the leading eigenvalue *ρ*^{2} of the associated eigenvalue problem, called the *squared canonical correlation*. Since the determinant of a matrix equals the product of its eigenvalues, it follows that the statistic can be written equivalently as

Λ = (1 − *ρ*_{1}^{2})(1 − *ρ*_{2}^{2}) ⋯ (1 − *ρ*_{min(*M*,*K*)}^{2}), (12)

where *ρ*_{i} is the *i*th canonical correlation. Thus, the statistic for testing the hypothesis depends only on the canonical correlations between predictors and predictands.
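The identity between the determinant ratio and the product over canonical correlations can be checked numerically. The QR/SVD route to the canonical correlations below is a standard computation, not code from this paper, and the data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, M = 80, 3, 6
X = rng.standard_normal((N, K))
Y = rng.standard_normal((N, M)) + 0.4 * X @ rng.standard_normal((K, M))
X = X - X.mean(axis=0)
Y = Y - Y.mean(axis=0)

# Lambda from determinants of residual and total covariance matrices
B = np.linalg.lstsq(X, Y, rcond=None)[0]
E = Y - X @ B
lam_det = np.linalg.det(E.T @ E) / np.linalg.det(Y.T @ Y)

# Canonical correlations: singular values of Qx' Qy from thin QR factors
Qx, _ = np.linalg.qr(X)
Qy, _ = np.linalg.qr(Y)
rho = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
lam_cca = np.prod(1.0 - rho ** 2)

print(np.isclose(lam_det, lam_cca))   # True
```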

The fact that the statistic Λ depends only on the canonical correlations is not surprising. The hypothesis **B** = **0** is equivalent to the hypothesis that no linear combination of *x* is correlated with any linear combination of *y*. That is, the hypothesis holds if and only if all the canonical correlations vanish.

An important property of Λ is that it is symmetric with respect to the *x* and *y* variables. This follows from the fact that Λ defined in (12) depends only on the canonical correlations, and canonical correlations are symmetric with respect to the predictors and predictands. A more direct demonstration is to note that reversing the *x* and *y* variables in (10) leads to the same value of the statistic, and that Wilks' lambda distribution with parameters *M*, *K*, *N* − *K* − 1 is the same as that with parameters *K*, *M*, *N* − *M* − 1 (Anderson 1984, theorem 8.4.2). Thus, both the value of Λ and its distribution under the null hypothesis of independence are invariant to reversing *x* and *y*. It follows that testing the hypothesis of independence between *x* and *y* by multivariate regression or by canonical correlation analysis yields precisely the same decision rule, regardless of which variables are identified as predictors and predictands. These equivalences are somewhat surprising, given that the regression coefficients for predicting *y* given *x* differ from those for predicting *x* given *y*. Nevertheless, these equivalences are natural because the concept of independence does not depend on which variables are identified as predictors or predictands. A similar symmetry occurs in line fitting: fitting the line *y* = *ax* + *ϵ* yields a different slope than fitting the line *x* = *a*′*y* + *ϵ*′, but testing the hypothesis *a* = 0 leads to the same decision rule as testing the hypothesis *a*′ = 0, which in turn is equivalent to testing the hypothesis that *x* and *y* are uncorrelated.
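The line-fitting symmetry can be checked directly. The data below are synthetic, and the *t* statistics for both slopes are computed from centered data:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50
x = rng.standard_normal(N)
y = 0.3 * x + rng.standard_normal(N)
x = x - x.mean()
y = y - y.mean()

def slope_t(u, v):
    """t statistic for the slope of v regressed on u (centered data)."""
    a = (u @ v) / (u @ u)                       # least squares slope
    resid = v - a * u
    se = np.sqrt((resid @ resid) / (len(u) - 2) / (u @ u))
    return a / se

r = (x @ y) / np.sqrt((x @ x) * (y @ y))
t_corr = r * np.sqrt((N - 2) / (1 - r ** 2))    # t statistic of the correlation

# Slopes differ, but all three t statistics coincide
print(np.allclose([slope_t(x, y), slope_t(y, x)], t_corr))   # True
```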

The hypothesis of independence can also be expressed in terms of covariance matrices. In particular, the hypothesis that the coefficients vanish in the model for predicting *y* given *x* is equivalent to **Σ**_{yx} = **0**, and the hypothesis that the coefficients vanish in the model for predicting *x* given *y* is equivalent to **Σ**_{xy} = **0**. These two hypotheses are equivalent because **Σ**_{yx} is the transpose of **Σ**_{xy}, and hence the vanishing of any one of these coefficient or cross-covariance matrices implies the vanishing of the other three.

_{yx}### c. Application to testing field significance

Field significance corresponds to the case *K* = 1 in model (1). In this case, **X** is a column vector and **B** is a *row* vector. The column vector **X** is identified with a prespecified time series, for example, a linear trend or a climate index. Both the predictor and predictands are assumed to be centered. In this context, the least squares estimate (4) is called the *regression pattern*, because each element of the vector gives the coefficient for predicting the corresponding element of *y* given *x*. We are interested in testing the hypothesis that all regression coefficients vanish simultaneously, **B** = **0**.

By the symmetry discussed in section 2b, this test is equivalent to the test with *x* and *y* reversed. Accordingly, we consider the model (14), but with only one predictand *x*, and test the hypothesis that the coefficients for predicting *x* from *y* vanish. In this case the statistic reduces to Λ = 1 − *R*^{2}, where *R*^{2} is the *multiple correlation* between *x* and *y*, which is a scalar. The equivalent statistic

*F* = (*R*^{2}/*M*) / [(1 − *R*^{2})/(*N* − *M* − 1)] (18)

has an *F* distribution with *M* and *N* − *M* − 1 degrees of freedom. In this way, standard results from univariate regression can be invoked to solve the multivariate hypothesis associated with field significance. A direct algebraic proof that the statistic Λ in (7) reduces to (18) when *K* = 1 is given in appendix A.

The appearance of *R*^{2} in the significance test is not surprising. As discussed in section 2b, the multivariate hypothesis is equivalent to testing whether all linear combinations of *y* are independent of *x*. The multiple correlation is in fact defined as the largest possible correlation between *x* and a linear combination of **y** (Anderson 1984, definition 2.5.4), and the weights of the maximizing linear combination of **y** are proportional to the least squares coefficients (16). For one predictor, the canonical correlation reduces to the multiple correlation.

A direct demonstration that CCA leads to the above results is given in appendix B.
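A sketch of the reduced test, with synthetic data standing in for the gridded field. The statistic follows the *F* form quoted above; the data here are pure noise, so the statistic should be unremarkable:

```python
import numpy as np

rng = np.random.default_rng(4)
N, M = 62, 10
x = rng.standard_normal(N)                 # prespecified time series
Y = rng.standard_normal((N, M))            # stand-in gridded data (pure noise)
x = x - x.mean()
Y = Y - Y.mean(axis=0)

# Reversed regression: predict x from the M predictand series
beta = np.linalg.lstsq(Y, x, rcond=None)[0]
R2 = 1.0 - ((x - Y @ beta) ** 2).sum() / (x ** 2).sum()

# F statistic (18); compare with the F distribution, M and N - M - 1 dof
F = (R2 / M) / ((1.0 - R2) / (N - M - 1))
print(F)
```

The resulting value would be compared against the upper tail of the *F* distribution with *M* and *N* − *M* − 1 degrees of freedom to decide field significance.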

## 3. Solution based on PCs

The field significance test proposed in the previous section cannot be applied when the number of regression coefficients exceeds the sample size, because the test depends on the inverse of a sample covariance matrix that is singular in this case. Moreover, the test is prone to overfitting whenever *M* is not a small fraction of the sample size *N*. To mitigate these issues, we reduce the dimension of the data by projecting the data onto a few leading PCs.

### a. Calculation details

Let the leading *K* principal components of the data be collected in a matrix whose *K* columns are the *K* principal components. With this definition, we make the substitution of the principal components for the original predictands in the regression model. The canonical correlation analysis is then between *x* and the PCs of *y*. The associated time series is found by fitting the model for predicting *x* from the PCs. The pattern used to predict *x*, called the *projection pattern* or the pattern of *canonical weights* (27), has weights proportional to the correlations between *x* and each PC. The time series for predicting *x*, called the *canonical time series* (28), is obtained by projecting the data onto the projection pattern. Because the PCs are uncorrelated, the squared multiple correlation between *x* and the PCs is the sum of the squared correlations between *x* and the individual PCs.
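A minimal sketch of the projection step, with random data standing in for the gridded field and an artificial embedded trend (all parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N, M, K = 62, 200, 3                      # years, grid points, retained PCs
trend = np.arange(N) - (N - 1) / 2.0      # centered linear trend
Y = rng.standard_normal((N, M))
Y += 0.05 * np.outer(trend, rng.standard_normal(M))   # embed a weak trend
Y = Y - Y.mean(axis=0)

# Leading K principal components via the SVD of the centered data
U, s, Vt = np.linalg.svd(Y, full_matrices=False)
pcs = U[:, :K] * s[:K]

# Squared multiple correlation between the trend and the K PCs
beta = np.linalg.lstsq(pcs, trend, rcond=None)[0]
R2 = 1.0 - ((trend - pcs @ beta) ** 2).sum() / (trend ** 2).sum()
print(R2)
```

The reduced-space test then applies the univariate machinery of section 2c to this *R*^{2} with *M* replaced by the truncation *K*.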

### b. Selecting the PCs to include in the regression model

Recall that testing the significance of the model for predicting *y* given *x* is equivalent to testing the significance of the model for predicting *x* given *y*. In the latter case, however, the choice of the number of PCs is a *model selection problem*, for which a variety of strategies have been proposed (Hastie et al. 1995). Therefore, we consider the model for predicting *x* from the PCs, with a mean term *μ* included for completeness. Here, there is one predictand but *K* predictors. The question then is which and how many PCs to include in the model.

Because the PCs are uncorrelated, if the PCs are ordered by the magnitude of their correlation with *x*, then selecting the first *K* PCs gives the greatest canonical correlation of any *K* PCs. That is, the best subset of PCs can be obtained without searching all possible subsets, because the contribution of each PC to the canonical correlation is uncorrelated with the other PCs. However, selecting PCs in this way is a form of *best subset regression*, also called screening or "data fishing." Such selection methods invalidate standard hypothesis tests and lead to poor prediction models (Rencher and Pun 1980; Freedman 1983; Flack and Chang 1987; Pinault 1988; DelSole and Shukla 2009). Using Monte Carlo techniques, we have confirmed that screening substantially inflates the canonical correlations, and hence the *F* statistic, and that the final canonical correlation is not statistically significant when screening is taken into account.
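The screening effect can be illustrated with a small Monte Carlo sketch. The parameters are arbitrary, and uncorrelated Gaussian series stand in for the PCs; the quantity compared is the sum of squared correlations, which approximates *R*^{2} for uncorrelated predictors:

```python
import numpy as np

rng = np.random.default_rng(6)
N, n_pcs, K, trials = 62, 50, 5, 200
r2_first, r2_best = [], []
for _ in range(trials):
    pcs = rng.standard_normal((N, n_pcs))      # stand-ins for uncorrelated PCs
    x = rng.standard_normal(N)                 # index unrelated to the PCs
    r = np.array([np.corrcoef(pcs[:, i], x)[0, 1] for i in range(n_pcs)])
    r2_first.append((r[:K] ** 2).sum())        # fixed choice: leading K PCs
    r2_best.append(np.sort(r ** 2)[-K:].sum()) # screened choice: best K of 50
print(np.mean(r2_best) > np.mean(r2_first))    # screening inflates R^2 -> True
```

Even though the index is pure noise, picking the *K* most correlated PCs out of many systematically inflates the apparent skill, which is exactly why a screened correlation cannot be referred to the standard null distribution.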

Instead, we select the number of PCs by leave-one-out cross validation. Let the superscript (*n*) denote the least squares estimates of the regression coefficients and of the mean term *μ* derived from the data leaving out the *n*th sample. Note that the mean term *μ* is needed because the predictors and predictands do not have zero mean when a sample is withheld. This procedure is repeated for *n* = 1, 2, … , *N* so that all samples will have been withheld exactly once. The cross-validated mean square error (CVMSE) is then defined as the mean square difference between each withheld value and its prediction.

In the absence of cross validation, the skill score 1 − MSE/var[*x*] reduces to the squared multiple correlation, which in turn is the variance explained by the predictors. By analogy, the cross-validated skill score (CVSS) is a measure of the amount of variance the model explains in a cross-validated sense. Negative values of CVSS imply that the model has larger cross-validated error than a prediction based on the mean. The confidence interval for CVSS is the appropriate linear transformation of the confidence interval of CVMSE.
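The selection procedure can be sketched as follows, assuming a one-standard-error-style rule for the confidence interval of the minimum (a simplification; the exact interval construction used in the paper is not reproduced here). The data are synthetic, with only the first PC related to *x*:

```python
import numpy as np

rng = np.random.default_rng(7)
N, max_k = 62, 10
pcs = rng.standard_normal((N, max_k))
x = 0.8 * pcs[:, 0] + rng.standard_normal(N)     # only PC 1 matters here

def loo_errors(Z, x):
    """Squared leave-one-out errors for predicting x from columns of Z."""
    errs = np.empty(len(x))
    for n in range(len(x)):
        keep = np.arange(len(x)) != n
        A = np.column_stack([np.ones(keep.sum()), Z[keep]])  # mean term mu
        coef = np.linalg.lstsq(A, x[keep], rcond=None)[0]
        errs[n] = (x[n] - np.r_[1.0, Z[n]] @ coef) ** 2
    return errs

cvmse = [loo_errors(pcs[:, :k], x).mean() for k in range(1, max_k + 1)]
best = int(np.argmin(cvmse))                     # index of minimum CVMSE
se = loo_errors(pcs[:, :best + 1], x).std(ddof=1) / np.sqrt(N)

# Most parsimonious truncation whose CVMSE is within one SE of the minimum
chosen = next(k for k in range(max_k) if cvmse[k] <= cvmse[best] + se) + 1
print(chosen)
```

By construction the parsimonious choice is never more complex than the minimizing model, which is the point of the extra step described above.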

### c. Explained variance

A summary measure of the variance accounted for by the regression is the fraction of explained variance (FEV). Note that **Σ**_{x} is a scalar in the case of a single predictor, so the FEV is a scalar.

_{x}## 4. Application to climate time series

### a. Data

The datasets used in this paper are the annual mean SST from the extended reconstruction sea surface temperature analysis, version 3b (Smith and Reynolds 2004), which is on a 2° × 2° grid, and the December–February (DJF) mean 300-hPa zonal wind in the Northern Hemisphere from the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) reanalysis, which is on a 2.5° × 2.5° grid (Kalnay et al. 1996; Kistler et al. 2001). The period analyzed in both sets is 1948–2009. The 1948–2009 mean of each variable at each grid point is subtracted prior to analysis.

We are interested in diagnosing multivariate trends. Accordingly, the prespecified time series **X** is chosen to be a linear function of year with zero mean. The time series increases in increments of 0.1 yr^{−1}, so that the regression coefficients have units of per decade.

### b. Diagnosing trends in annual mean SST

The pattern of trend coefficients for annual mean SST during the period 1948–2009 is shown in Fig. 1a. This regression pattern is calculated from the least squares solution (4).

The squared canonical correlation *ρ*^{2} (or equivalently, the squared multiple correlation) between the trend and the leading PCs, as a function of the number of PCs used to represent the data, is shown in Fig. 2a. For reference, *ρ*^{2} for the first PC is 68%, which is statistically significant. Nevertheless, the first three PCs can be combined linearly to increase the squared canonical correlation to 92%. Comparison with the 5% significance curve for *ρ*^{2}, calculated from (18) and shown as the dashed line in Fig. 2a, indicates that the canonical correlation is significant for all PC truncations. However, the majority of this significance arises from the first three PCs, and very little gain in correlation occurs using more than three PCs.

(a) The squared canonical correlation (dots) and associated 5% significance level (dashed) between a linear trend and the annual mean SST during 1948–2009, as a function of the number of PCs used to represent the data. (b) The cross-validated skill score (38) for predicting the trend based on the PCs of annual mean SST, as a function of the number of PCs. The error bar shows the standard error of the maximum skill score, and the vertical dashed line indicates the selected number of PCs. (c) The correlation between the trend and individual PCs (bars), and the corresponding 5% significance levels (red lines).

Citation: Journal of Climate 24, 19; 10.1175/2011JCLI4105.1

We perform cross validation as described in section 3b. To illustrate the impact of leaving one year out, an extreme example based on 50 PCs is shown in Fig. 3. In this example, all years except 1959 were used to construct the projection pattern (27). The figure shows that the projection pattern gives a reasonable prediction of the trend in all years except 1959, the year that was left out of the analysis. The large prediction error in the withheld year is a classic symptom of overfitting. Cross validation involves repeating this procedure for each year in turn and collecting the statistics of the resulting prediction errors.

Prediction of a linear trend (dots) using as predictors the leading 50 PCs of annual mean SST. The prediction model was estimated by the least squares method using all years during 1948–2009 except 1959. The trend function being fitted is shown as the solid line. For reference, the year 1959 is indicated by a thin vertical dashed line.

Citation: Journal of Climate 24, 19; 10.1175/2011JCLI4105.1

The cross-validated skill score (38) of predicting the trend based on the leading PCs of annual mean SST is shown in Fig. 2b. The figure shows that the maximum skill score occurs at six PCs. However, the figure shows that the skill scores using 3–8 PCs also lie within the estimated confidence interval of the maximum. By the principle of parsimony, we choose the simplest model (i.e., the model with the fewest number of PCs) that lies within the confidence interval of the maximum skill score, namely, the model with three PCs.

The correlation between the trend and individual PCs is shown in Fig. 2c; also shown as horizontal lines are the associated 5% significance levels. The figure shows that only the first three PCs have statistically significant trends. It is interesting to note that the *ρ*^{2} shown in Fig. 2a is proportional to the sum of the squared correlations between the trend and the individual PCs up to and including the PC indicated on the abscissa (as discussed in section 3a).

The trend pattern derived from the first three PCs is shown in Fig. 1b; this pattern is derived from (25). The similarity between the EOF-filtered trend pattern (Fig. 1b) and the regression pattern (Fig. 1a) indicates that there is little loss of trend information by using only the first three global EOFs to represent the data. For comparison, we also show the leading EOF in Fig. 1c. As anticipated from the fact that the leading PC contains a significant trend, the canonical trend pattern and the leading EOF are similar. The main difference is that the leading EOF has stronger amplitudes in the equatorial Pacific, suggesting a stronger contribution due to ENSO. The canonical component explains 24% of the SST spatial variance compared to 27% explained by the leading EOF. Thus, although the canonical component explains slightly less variance than the leading EOF, it explains significantly more long-term trend (e.g., *ρ*^{2} are 92% versus 68%).

The canonical projection pattern is shown in Fig. 1d. This pattern is projected onto the SST to produce the canonical time series. The pattern has positive amplitudes concentrated in the Indian Ocean and south subtropical oceans, and negative amplitudes in the northern oceans. The projection pattern differs most significantly from the regression pattern by having negative amplitudes in the central equatorial Pacific, as well as generally weaker amplitudes in the western tropical Pacific. The similarity between the projection and regression patterns suggests that a significant fraction of the trend can be recovered by projecting the regression pattern onto the SSTs. However, this approach would not be optimal—differences between the two patterns imply that more trend can be captured by taking differences in SST along the equatorial Pacific, which presumably filters out interannual ENSO variability.

The time series for the leading EOF and the canonical component are shown in Figs. 4a and 4b, respectively, as computed from (22) and (28), respectively. The time series for the canonical pattern exhibits a stronger trend component than the leading PC, as anticipated from the fact that the *ρ*^{2} for the leading EOF and canonical component are 68% and 92%, respectively. The correlation between the leading PC and the Niño-3.4 index is about 0.68, while the correlation between the canonical component and the Niño-3.4 index is 0.25. Loosely speaking, the canonical projection vector “filters out” ENSO noise to enhance the trend. Note that this filtering is based on spatial structure, not on temporal smoothing, which allows the amplitude of the canonical pattern to be monitored year by year.

Time series for (a) the leading PC and (b) canonical trend pattern and for annual average SST during the period 1948–2009.

Citation: Journal of Climate 24, 19; 10.1175/2011JCLI4105.1

Most of the above results are insensitive to the number of PCs. The main exceptions are that the projection pattern shown in Fig. 1d tends to exhibit more small-scale noise as more PCs are included in the CCA, and the canonical time series tends to get smoother and approach an exact linear trend as the number of PCs increases (not shown). Nevertheless, the physical pattern shown in Fig. 1b and the *ρ*^{2} shown in Fig. 2a are virtually indistinguishable for PC truncations greater than three.

### c. Diagnosing trends in DJF 300-hPa zonal wind

The trend pattern for December–February 300-hPa zonal wind is shown in Fig. 5a. The figure reveals a “tripole” pattern with mostly positive amplitudes along 30°–45° latitude and negative amplitudes north and south. When superimposed on the background mean flow, this trend pattern indicates that the downstream end of the jet core is shifting eastward and that winds outside the jet core are decelerating by about 2 m s^{−1} decade^{−1}.

(a) The point-by-point regression coefficients (shading) for the trend of DJF 300-hPa zonal wind (in meters per second per decade), (b) the trend pattern (shading) as represented by the first 21 PCs, (c) the leading EOF (shading), and (d) corresponding canonical projection pattern for the trend. The contours in (a)–(c) show the corresponding mean wind.

Citation: Journal of Climate 24, 19; 10.1175/2011JCLI4105.1

The squared canonical correlations between a linear trend and the leading PCs are shown in Fig. 6a. We see a dramatic difference compared to the canonical correlations for SST (Fig. 2a): the *ρ*^{2} is negligible for the first few PCs, but then it increases and becomes statistically significant only after about 9 PCs. The cross-validated skill score of predicting the trend based on the PCs is shown in Fig. 6b. The figure shows that the maximum skill occurs at 28 PCs. Nevertheless, the skill scores for 21–35 PCs lie within the confidence interval of the maximum. Thus, the simplest model that is within the confidence interval of the best model is the 21-PC model. The 21-PC model also has statistically significant canonical correlation (as seen in Fig. 6a). The correlation between the trend and individual PCs, shown in Fig. 6c, reveals that the first two PCs do not have statistically significant trends. Indeed, only PCs 3, 9, 10, and 13 have statistically significant trends. This example illustrates the difficulty of choosing a single truncation based on the statistical significance of the trend in individual PCs.

(a) The squared canonical correlation (solid) and associated 5% significance level (dashed) between a linear trend and DJF 300-hPa zonal wind during 1948–2009, as a function of the number of PCs used to represent the data. (b) The CVSS (38) for predicting the trend based on the PCs of DJF 300-hPa zonal wind, as a function of the number of PCs. The error bar shows the standard error for the maximum skill score, and the vertical dashed line indicates the selected number of PCs. (c) The correlation between the trend and individual PCs (bars), and the corresponding 5% significance levels (red lines).

Citation: Journal of Climate 24, 19; 10.1175/2011JCLI4105.1

The regression pattern derived from the leading 21 PCs is shown in Fig. 5b. The similarity between this EOF-filtered pattern and the point-by-point regression pattern (Fig. 5a) indicates that there is little loss of trend information by using only 21 global PCs. For comparison, we also show the leading EOF in Fig. 5c. While the leading EOF also shows a tripole pattern, it is shifted southward relative to the trend pattern and generally has stronger amplitudes and more distinctive centers of action. The leading EOF explains 21% of the variance, whereas the canonical pattern explains only 4% of the variance. Thus, the canonical pattern represents a relatively small fraction of the seasonal variability in winds.

The projection pattern for the trend is shown in Fig. 5d. This pattern differs considerably from the canonical pattern and the leading EOF. In general, the projection pattern tends to have its largest amplitudes in regions where the background zonal wind is weakest. To clarify the reason for this, we show in Fig. 7 the standard deviation of the wind. A comparison between Figs. 5d and 7 shows that the largest amplitudes in the projection pattern tend to be in regions where the wind *variance* is weakest. While regions of weak variance may have weak trends, they also have less noise and therefore potentially larger signal-to-noise ratio. By weighting low-variance regions more strongly than regions with high variance, the projection pattern is able to filter out noise to enhance the signal-to-noise ratio.

Standard deviation (shading) and mean (contours) of the DJF 300-hPa zonal wind (m s^{−1}) during 1948–2009.

Citation: Journal of Climate 24, 19; 10.1175/2011JCLI4105.1

The time series for the leading EOF and the canonical component are shown in Figs. 8a and 8b, respectively. The canonical time series exhibits a much stronger trend component than the leading PC, as expected. The squared correlation between the trend and the canonical component is 83% compared to 0.2% for the leading PC—a stark contrast in trends. This example provides a dramatic illustration of the fact that the leading principal components can fail to capture the trend. The leading principal component is highly correlated with the Niño-3.4 index (the correlation coefficient is 0.64), suggesting a relation with ENSO. In contrast, the correlation coefficient between the zonal wind canonical component and Niño-3.4 index is 0.05, suggesting that the trend component has little to do with ENSO.

Time series for the (a) leading PC and (b) canonical time series for the trend for DJF 300-hPa zonal wind during the period 1948–2009.


## 5. Conclusions

This paper argues that multivariate regression provides a natural framework for testing the field significance of a regression pattern. Specifically, a regression model for predicting *y* given a single *x* leads naturally to the regression pattern. A test for the hypothesis that all regression coefficients vanish can then be derived from the likelihood ratio test. The statistic derived from the likelihood ratio test equals a ratio of determinants of relevant covariance matrices. This ratio takes into account both the correlations between predictors and predictands and the multiplicity of tests. It turns out that the statistic is invariant to reversing *x* and *y*, implying that testing the field significance of a regression pattern for predicting *y* given *x* *is equivalent* to testing the model for predicting *x* given *y*. In this way, the field significance test is reduced to a standard univariate hypothesis test.
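The ratio-of-determinants statistic and its invariance to reversing *x* and *y* can be checked numerically. The following is a minimal sketch with synthetic data (not the paper's code); `wilks_lambda` is a hypothetical helper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 100, 5                       # samples, number of grid points

x = rng.standard_normal(n)          # prespecified time series
Y = rng.standard_normal((n, M))     # gridded data
Y[:, 0] += 0.5 * x                  # inject a weak relation at one point

def wilks_lambda(pred, resp):
    """Wilks' Lambda = det(residual covariance) / det(total covariance)
    for the multivariate regression resp ~ pred (intercept included)."""
    X = np.column_stack([np.ones(len(resp)), pred])
    B, *_ = np.linalg.lstsq(X, resp, rcond=None)
    E = resp - X @ B                 # regression residuals
    C = resp - resp.mean(axis=0)     # centered predictand
    return np.linalg.det(E.T @ E) / np.linalg.det(C.T @ C)

lam_yx = wilks_lambda(x, Y)                   # predict Y given x
lam_xy = wilks_lambda(Y, x.reshape(-1, 1))    # predict x given Y
# the two statistics coincide: the test is invariant to reversal
```

Small values of the statistic indicate a strong relation; the reversal invariance means either regression direction yields the same decision rule.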

In general, the statistic derived from the likelihood ratio test depends only on the canonical correlations between predictors and predictands. This connection between multivariate regression and canonical correlation analysis follows naturally from the fact that vanishing of the regression coefficients implies that the predictors and predictands are independent. More generally, if *x* and *y* are independent, then the regression coefficients for predicting *y* given *x* vanish, the regression coefficients for predicting *x* given *y* vanish, and the canonical correlations vanish. Thus, each hypothesis leads to precisely the same decision rule because they are fundamentally the same—they each imply that *x* and *y* are independent. This reasoning also explains why the test is the same if *y* and *x* are reversed, since the concept of independence does not depend on which variables are identified as predictors or predictands.
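This dependence on the canonical correlations alone can be sketched numerically: the ratio of determinants equals the product of (1 − ρ_i²) over the canonical correlations, here computed as singular values of a whitened cross-covariance (synthetic data, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, q = 200, 3, 4
X = rng.standard_normal((n, p))
Y = rng.standard_normal((n, q))
Y[:, :2] += X[:, :2]                          # couple two canonical pairs

Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
Sxx, Syy, Sxy = Xc.T @ Xc, Yc.T @ Yc, Xc.T @ Yc

# canonical correlations = singular values of the whitened cross-covariance
Lx = np.linalg.cholesky(Sxx)
Ly = np.linalg.cholesky(Syy)
K = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
rho = np.linalg.svd(K, compute_uv=False)

# Wilks' Lambda two ways: ratio of determinants vs product of (1 - rho_i^2)
lam_det = (np.linalg.det(Syy - Sxy.T @ np.linalg.solve(Sxx, Sxy))
           / np.linalg.det(Syy))
lam_cca = np.prod(1 - rho**2)
```

The agreement of `lam_det` and `lam_cca` reflects the identity underlying the equivalence of the regression and CCA formulations.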

The fact that field significance can be addressed naturally through multivariate regression raises the question of why it was overlooked in the past. Part of the answer is perhaps that most climate scientists are unfamiliar with the likelihood ratio test in the context of multivariate regression, owing to its statistical and mathematical sophistication. Also, the test cannot be applied when the number of variables exceeds the number of samples, and perhaps no one thought to apply EOF filtering to reduce the number of variables. Finally, the decision rule from the likelihood ratio test depends on Wilks' U distribution, whose critical values are not conveniently tabulated. That this distribution reduces to the *F* distribution for the case of one predictor does not seem to be mentioned often, even in the statistics literature.
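The reduction to the *F* distribution for one predictor can be checked numerically: the transformed statistic ((1 − Λ)/Λ)(N − M − 1)/M equals the standard overall F statistic for the reversed regression of the predictor on the M predictands. A minimal sketch with synthetic data (the degrees of freedom follow the one-predictor case discussed here):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 80, 6
Y = rng.standard_normal((N, M))          # M predictands (grid points)
x = Y[:, 0] + rng.standard_normal(N)     # one prespecified time series

# Wilks' Lambda for the multivariate regression Y ~ x
Xd = np.column_stack([np.ones(N), x])
B, *_ = np.linalg.lstsq(Xd, Y, rcond=None)
E = Y - Xd @ B
Yc = Y - Y.mean(axis=0)
lam = np.linalg.det(E.T @ E) / np.linalg.det(Yc.T @ Yc)

# F statistic implied by Lambda for a single predictor
F_from_lam = (1 - lam) / lam * (N - M - 1) / M

# standard overall F for the reversed regression x ~ Y
Yd = np.column_stack([np.ones(N), Y])
b, *_ = np.linalg.lstsq(Yd, x, rcond=None)
resid = x - Yd @ b
R2 = 1 - (resid @ resid) / ((x - x.mean()) @ (x - x.mean()))
F_reg = R2 / (1 - R2) * (N - M - 1) / M
```

The two F values coincide, so the multivariate field significance test reduces to an ordinary univariate regression F test when one series is prespecified.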

A variety of other field significance tests have been proposed in the literature (see introduction). These alternative approaches lead to different decision rules, raising questions of interpretation. For instance, what is the meaning of a test that disagrees with CCA? An advantage of the proposed framework is that the decision rule for field significance is always consistent with that derived from multivariate regression and canonical correlation analysis.

When the number of regression coefficients exceeds the sample size, the field significance test cannot be evaluated because certain covariance matrices become singular. This singularity is more than a practical limitation: it signifies a fundamental problem. Field significance involves testing that all regression coefficients vanish simultaneously. If the number of coefficients exceeds the sample size, then field significance is tantamount to testing more hypotheses than there are data, which is ill conceived. To pose a proper field significance test, some constraints need to be imposed.

We followed the standard approach of reducing the number of variables by using only a small number of leading principal components of the data. Furthermore, since testing field significance is equivalent to testing the significance of a model with reversed predictors and predictands, the problem of choosing the number of PCs is equivalent to a *model selection problem*, for which many strategies exist. We specifically avoided selecting PCs that were individually correlated with the predictand, as this strategy is a form of screening that leads to biased skill estimates. We instead selected PCs in the order of their explained variances, which avoids the screening problem but may lead to poor prediction models. To reduce the latter possibility, the number of ordered PCs was chosen based on cross-validation experiments in which the model predicts the prespecified time series given the PCs. In contrast to most studies, we do not select the model that minimizes the cross-validated error. Instead, we estimate a confidence interval for the minimum cross-validated error, and then select the most parsimonious model (i.e., the model with the fewest principal components) whose error falls within that confidence interval. This additional step avoids picking complex models whose error is very close to that of much simpler models. While we believe this procedure merits attention, we also emphasize that model selection is the issue most in need of further research.
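The selection procedure can be sketched as follows. This is a simplified illustration (the hypothetical helper `select_num_pcs` uses a one-standard-error-style criterion as a stand-in for the confidence interval on the minimum cross-validated error; it is not the authors' exact implementation):

```python
import numpy as np

def select_num_pcs(pcs, target, max_k, n_folds=5, seed=0):
    """Choose the fewest leading PCs whose cross-validated error lies
    within one standard error of the minimum cross-validated error."""
    n = len(target)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    errs = np.empty((max_k, n_folds))
    for k in range(1, max_k + 1):
        # PCs enter in order of explained variance (no screening)
        X = np.column_stack([np.ones(n), pcs[:, :k]])
        for j, test in enumerate(folds):
            train = np.setdiff1d(np.arange(n), test)
            b, *_ = np.linalg.lstsq(X[train], target[train], rcond=None)
            errs[k - 1, j] = np.mean((target[test] - X[test] @ b) ** 2)
    mean_err = errs.mean(axis=1)
    se = errs.std(axis=1, ddof=1) / np.sqrt(n_folds)
    best = mean_err.argmin()
    # most parsimonious model whose error is within the error bar of the minimum
    within = np.where(mean_err <= mean_err[best] + se[best])[0]
    return within.min() + 1

# usage on synthetic data: the target depends mainly on the first PC
rng = np.random.default_rng(0)
pcs = rng.standard_normal((120, 10))
target = pcs[:, 0] + 0.1 * rng.standard_normal(120)
k = select_num_pcs(pcs, target, max_k=10)
```

The key design choice is the final step: rather than taking `argmin` of the cross-validated error, the rule backs off to the simplest model statistically indistinguishable from the minimum.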

Because the significance test in multivariate regression applies to arbitrary predictors and predictands, the orthogonality of the PCs is not essential to the field significance test—any set of variables, correlated or uncorrelated, can be used, and the statistical hypothesis test automatically accounts for correlations in the data.

The proposed field significance test was applied to diagnose trends in annual mean sea surface temperature and in December–February 300-hPa zonal wind in the Northern Hemisphere. The squared canonical correlation between the SST and trend is 92% compared with a squared correlation of 68% for the leading principal component alone. The canonical trend pattern and the leading EOF of SST are similar, except that the EOF generally has larger amplitudes, especially in the equatorial and North Pacific. Overall, much of the trend information in SST is contained in the leading EOF. The situation differs dramatically for upper-level zonal wind. Specifically, the leading EOF of zonal wind has no statistically significant trend, whereas the squared canonical correlation with the trend is 83%.

The canonical projection pattern is a spatial filter that maximizes the signal-to-noise ratio of the prespecified time series, and hence is analogous to climate change “fingerprints” that maximize the signal-to-noise ratio of the climate change signal (Hasselmann 1979; Hasselmann 1997; Allen and Tett 1999). To the extent that 50-yr trends are dominated by the response to anthropogenic and natural forcing (see DelSole et al. 2011), the canonical projection vector can serve as a “fingerprint” for *real-time monitoring* of the response to climate forcing.

The field significance test proposed in this paper makes several assumptions that may fail in practice. For instance, the test assumes that the errors are independent in time. In principle, autocorrelated errors can be taken into account by postulating an appropriate model for the autocorrelation, such as a multivariate autoregressive model, and deriving the corresponding likelihood ratio, although this increases the number of parameters to be estimated. The test also assumes that the data are distributed as a multivariate Gaussian. Muirhead and Waternaux (1980) show that the likelihood ratio test is sensitive to departures from normality, especially long-tailed distributions, even for large sample sizes. In cases in which violation of the underlying assumptions compromises the test, alternative techniques, such as those based on the bootstrap or Monte Carlo methods, may be preferable.
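As one example of such an alternative, a permutation (Monte Carlo) version of the test can be sketched as below. The helper `permutation_field_significance` is hypothetical; note that a simple permutation itself assumes exchangeability (no autocorrelation in the prespecified series), so a block permutation would be needed to preserve serial dependence:

```python
import numpy as np

def permutation_field_significance(x, Y, n_perm=999, seed=0):
    """Estimate a p value for field significance by permuting the
    prespecified series and recomputing Wilks' Lambda each time."""
    rng = np.random.default_rng(seed)

    def wilks_lambda(xv):
        X = np.column_stack([np.ones(len(xv)), xv])
        B, *_ = np.linalg.lstsq(X, Y, rcond=None)
        E = Y - X @ B
        Yc = Y - Y.mean(axis=0)
        return np.linalg.det(E.T @ E) / np.linalg.det(Yc.T @ Yc)

    lam0 = wilks_lambda(x)              # small Lambda = strong relation
    null = [wilks_lambda(rng.permutation(x)) for _ in range(n_perm)]
    # fraction of permutations with Lambda at least as extreme (small)
    return (1 + sum(l <= lam0 for l in null)) / (n_perm + 1)
```

The permutation approach makes no Gaussian assumption, at the cost of extra computation and the exchangeability caveat above.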

The proposed test applies to any prespecified time series, not just trends. In particular, the proposed framework can be used to quantify the relation between a particular climate index and other variables, for example, the relation between ENSO and global precipitation. The framework (aside from EOF filtering) can be applied to variables with different units and different natural variances, since the component is invariant to nonsingular linear transformations of the data (see Anderson 1984, p. 490). Also, the framework can be generalized to multiple prespecified time series, such as orthogonal Fourier harmonics. The treatment of missing data within the proposed field significance test needs further investigation.

## Acknowledgments

We thank Michael Tippett, Tony Barnston, and J. Shukla for their insightful discussions about this research, which led to improvements and clarifications. We also thank three anonymous reviewers for their constructive comments and for pointing out important references. This research was supported by the National Science Foundation (Grants ATM0332910, ATM0830062, ATM0830068), the National Aeronautics and Space Administration (Grants NNG04GG46G, NNX09AN50G), and the National Oceanic and Atmospheric Administration (Grants NA04OAR4310034, NA09OAR4310058, NA05OAR4311004). The views expressed herein are those of the authors and do not necessarily reflect the views of these agencies.

## APPENDIX A

### Equivalence of the Statistics Λ and *F* for One Predictor

When there is a single predictor, the statistic Λ is related to the *F* distribution as (Anderson 1984, theorem 8.4.5)

$$F = \frac{1 - \Lambda}{\Lambda} \, \frac{N - M - 1}{M},$$

where *F* has an *F* distribution with *M* and *N* − *M* − 1 degrees of freedom. This shows the equivalence between the statistics Λ and *F* in the case of a single predictor.

## APPENDIX B

### CCA for One Predictor

The canonical component relating *x* and *y* from CCA is based on solving the eigenvalue problem

$$\boldsymbol{\Sigma}_{yy}^{-1} \boldsymbol{\Sigma}_{yx} \boldsymbol{\Sigma}_{xx}^{-1} \boldsymbol{\Sigma}_{xy} \mathbf{q} = \rho^{2} \mathbf{q},$$

where, for a single predictor, **Σ**_{xx} is a scalar and **Σ**_{yx} is a vector. Let

$$\mathbf{q} = \gamma \, \boldsymbol{\Sigma}_{yy}^{-1} \boldsymbol{\Sigma}_{yx},$$

where *γ* is an arbitrary normalization constant. This eigenvector solution agrees with (16) for an appropriate choice of *γ*.
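The one-predictor solution can be checked numerically. The sketch below (synthetic data, sample covariances in place of population quantities) verifies that the vector Σ_yy^{-1}Σ_yx attains the canonical correlation:

```python
import numpy as np

rng = np.random.default_rng(3)
n, M = 300, 4
x = rng.standard_normal(n)                 # single predictor
Y = rng.standard_normal((n, M))
Y[:, 0] += x                               # M predictands, one related to x

xc = x - x.mean()
Yc = Y - Y.mean(axis=0)
Syy = Yc.T @ Yc / n
Syx = Yc.T @ xc / n
sxx = xc @ xc / n

# for one predictor the eigenvalue problem has rank one, and the
# canonical vector for y is proportional to Syy^{-1} Syx
q = np.linalg.solve(Syy, Syx)

# correlation of x with the canonical component Y q ...
u = Yc @ q
rho_direct = abs(u @ xc) / np.sqrt((u @ u) * (xc @ xc))
# ... equals the canonical correlation from the quadratic form
rho_cca = np.sqrt(Syx @ np.linalg.solve(Syy, Syx) / sxx)
```

Any rescaling of `q` (the constant γ) leaves the correlation unchanged, which is why the normalization is arbitrary.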

## REFERENCES

Allen, M. R., and S. F. B. Tett, 1999: Checking for model consistency in optimal fingerprinting. *Climate Dyn.*, **15**, 419–434.

Anderson, T. W., 1984: *An Introduction to Multivariate Statistical Analysis*. Wiley-Interscience, 675 pp.

DelSole, T., and J. Shukla, 2009: Artificial skill due to predictor screening. *J. Climate*, **22**, 331–345.

DelSole, T., M. K. Tippett, and J. Shukla, 2011: A significant component of unforced multidecadal variability in the recent acceleration of global warming. *J. Climate*, **24**, 909–926.

Flack, V. F., and P. C. Chang, 1987: Frequency of selecting noise variables in subset regression analysis: A simulation study. *Amer. Stat.*, **41**, 84–86.

Freedman, D. A., 1983: A note on screening regression equations. *Amer. Stat.*, **37**, 152–155.

Hasselmann, K., 1979: On the signal-to-noise problem in atmospheric response studies. *Meteorology of the Tropical Ocean*, D. B. Shaw, Ed., Royal Meteorological Society, 251–259.

Hasselmann, K., 1997: Multi-pattern fingerprint method for detection and attribution of climate change. *Climate Dyn.*, **13**, 601–611.

Hastie, T., A. Buja, and R. Tibshirani, 1995: Penalized discriminant analysis. *Ann. Stat.*, **23**, 73–102.

Johnson, R. A., and D. W. Wichern, 2002: *Applied Multivariate Statistical Analysis*. 5th ed. Prentice-Hall, 767 pp.

Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. *Bull. Amer. Meteor. Soc.*, **77**, 437–471.

Kistler, R., and Coauthors, 2001: The NCEP–NCAR 50-Year Reanalysis: Monthly means CD-ROM and documentation. *Bull. Amer. Meteor. Soc.*, **82**, 247–267.

Livezey, R. E., and W. Chen, 1983: Statistical field significance and its determination by Monte Carlo techniques. *Mon. Wea. Rev.*, **111**, 46–59.

Mardia, K. V., J. T. Kent, and J. M. Bibby, 1979: *Multivariate Analysis*. Academic Press, 518 pp.

Muirhead, R. J., and C. M. Waternaux, 1980: Asymptotic distributions of canonical correlation analysis and other multivariate procedures for nonnormal populations. *Biometrika*, **67**, 31–43.

Muller, K. E., 1982: Understanding canonical correlations through the general linear model and principal components. *Amer. Stat.*, **36**, 342–354.

Pinault, S. C., 1988: An analysis of subset regression for orthogonal designs. *Amer. Stat.*, **42**, 275–277.

Press, S. J., 2005: *Applied Multivariate Analysis*. Dover, 671 pp.

Rencher, A. C., and F. C. Pun, 1980: Inflation of *R*² in best subset regression. *Technometrics*, **22**, 49–53.

Smith, T. M., and R. W. Reynolds, 2004: Improved extended reconstruction of SST (1854–1997). *J. Climate*, **17**, 2466–2477.

Trenberth, K. E., and Coauthors, 2007: Observations: Surface and atmospheric climate change. *Climate Change 2007: The Physical Science Basis*, S. Solomon et al., Eds., Cambridge University Press, 235–336.

Walker, G. T., 1910: Correlation in seasonal variations of weather. *Mem. Indian Meteor. Dept.*, **21**, 22–45.

Walker, G. T., 1923: Correlation in seasonal variations of weather. VIII. A preliminary study of world weather. *Mem. Indian Meteor. Dept.*, **24**, 75–131.

Widmann, M., 2005: One-dimensional CCA and SVD, and their relationship to regression maps. *J. Climate*, **18**, 2785–2792.

Wilks, D. S., 2006: On “field significance” and the false discovery rate. *J. Appl. Meteor. Climatol.*, **45**, 1181–1189.