## 1. Introduction

The identification of coupled patterns is important to many areas of climate research, including teleconnection and attribution studies (Smoliak et al. 2015), seasonal climate forecasting (Wilks 2008), climate field reconstruction using proxy records (Smerdon et al. 2011), and linear inverse modeling (DelSole and Tippett 2008). While early research focused on the linear dependence between two climate fields (Bretherton et al. 1992; Tippett et al. 2008), other studies have also recognized the importance of nonlinear dependence (Ramírez et al. 2006; Cannon and Hsieh 2008; Ortiz Beviá et al. 2010; Evans et al. 2014).

Two main methods have previously been used to investigate the nonlinear coupling between two fields: neural net canonical correlation analysis (NNCCA) (Hsieh 2001; Wu et al. 2005; Cannon and Hsieh 2008) and kernel canonical correlation analysis (KCCA) (Arenas-García et al. 2013). In NNCCA, parametric mapping functions are used to link one-dimensional projections of the climate fields, while in KCCA, the climate fields are abstractly mapped into a higher-dimensional space (or feature space) and linear canonical correlation analysis (CCA) is performed in the feature space. KCCA has not been widely used in the climate sciences (perhaps for the reason below), but Lima et al. (2009) used the related method of kernel principal component analysis (KPCA). Both NNCCA and KCCA have issues. For KCCA (and KPCA), the time components are not linear combinations of the predictor field or the response field, which can make the KCCA (KPCA) components difficult to interpret. For NNCCA, the projection weights are found by optimization methods, and NNCCA typically relies on iterative decomposition (i.e., the components are extracted one at a time), so NNCCA is computationally expensive. Also, NNCCA is not appropriate for high-dimensional data and hence is typically applied to a low-dimensional principal component subspace of the original data fields. In both methods, the kernel (or mapping) functions are generally arbitrarily chosen.

The main aims of this study are to develop a new method for investigating nonlinear dependence between two climate fields and to illustrate it with examples. The new method, gradient-based kernel canonical correlation analysis (gKCCA), builds on the ideas of Fukumizu and Leng (2014), who introduced gradient-based kernel regression with a univariate response variable. This paper describes a gradient-based kernel method for coupled fields (section 2) and its connections with CCA (sections 2a, 2b, and 2d) and the linear model (section 2c). Cross validation for nonlinear models is discussed in section 3. Studies of the tropical Pacific Ocean have suggested that the system oscillates at preferred frequencies, such as annual, quasi-biennial (QB), and quasi-quadrennial (QQ) frequencies (Bejarano and Jin 2008), but the nature and the spatial structure of the coupling between these modes remain unclear. In section 4, the gKCCA method is used to investigate nonlinear coupling between modes with different frequencies in the Pacific Ocean. Finally, the sensitivity of the gKCCA method to different levels of noise and different kernel functions is investigated (section 5).

## 2. Methods

### a. Overview

*j* in

*j*th column vector of matrix

*d*. The prime symbol denotes matrix transposition, and

*ρ* are the diagonal elements of the

Notation used.

*k* and column

*d* in matrix

The gKCCA method has two steps:

- Find , the subspace of that is linearly or nonlinearly related to .
- Model the functions and find , by performing CCA on and a nonlinear augmentation of .

### b. The gradient field and its eigenvectors (or estimating )

This section explains how to do step 1 and illustrates it using a simulated example. Some extra details of the method are found in appendix A and supplementary sections 1–3, and cross-references are given as necessary.

So, given only

One solution is to find the directions that maximize the gradient of

To find the directions in which the average absolute gradient is the steepest, an estimate of the gradient field

*n* is the same as in the input space, but the number of variables

*p* can be much greater than in the input space. For some feature spaces, the inner product of the vector of mapping functions is a kernel function; that is,

*σ* [Eqs. (9) and (10)]. It is convenient to reexpress this parameter in terms of the median Euclidean distance

The gradient field for the example [Eq. (7)], calculated using Eq. (8) for two different values of *σ*, is shown in Fig. 1b. For

Both linear CCA and gKCCA [Eqs. (8)–(12)] were used to estimate

### c. The linear model in kernel form

*λ* is a ridge parameter as in ridge regression (see, e.g., Cannon 2009). Higher values of the ridge parameter give relatively more weight to the principal components of the predictor matrix that have large eigenvalues, which reduces overfitting of the response.
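The effect of the ridge parameter can be sketched numerically. The example below assumes the standard ridge estimator, in which the fitted response along a principal component of the predictor matrix with eigenvalue *s* is scaled by *s*/(*s* + *λ*); the exact scaling used in this paper's equations may differ. The function name `ridge_shrinkage` and the eigenvalues are hypothetical and chosen only for illustration.

```python
def ridge_shrinkage(eigenvalues, lam):
    """Shrinkage factor s / (s + lam) for each principal component.

    Assumes the standard ridge estimator; `eigenvalues` are the (positive)
    eigenvalues of the predictor cross-product matrix.
    """
    return [s / (s + lam) for s in eigenvalues]


# Illustrative eigenvalues: one large, one moderate, one small component.
eigenvalues = [10.0, 1.0, 0.1]

for lam in (0.0, 0.1, 1.0):
    factors = ridge_shrinkage(eigenvalues, lam)
    # With lam = 0 nothing is shrunk; as lam grows, components with
    # small eigenvalues are damped much more than those with large ones.
    print(lam, [round(f, 3) for f in factors])
```

Under this assumption, increasing the ridge parameter leaves the leading components nearly untouched while suppressing the trailing ones, which is the sense in which large-eigenvalue components receive more weight.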

*ε* is a ridge parameter (which has the same effect as

*λ* above). The gradient vector of the linear model [Eq. (14)] can now be reexpressed as follows: where

Different kernel functions

### d. Modeling and estimating

Section 2b showed how to estimate the matrix

There are several possible basis functions that would be suitable here including piecewise linear (appendix B), piecewise polynomial, or orthogonal polynomial basis functions. For any particular application, the parameter

Figure 2 shows

### e. Explained variance and prediction

Also, once

## 3. Cross validation

### Cross-validation metrics for linear and nonlinear models

The gKCCA method has two main parameters that need to be estimated: *σ* [Eq. (10) in Eq. (19b)] and *ε* [Eq. (19b)], for a given

*d* [e.g., Eq. (4)], which is the number of model dimensions. That is,

*d* is a model parameter. It is possible that

For gKCCA, measures of linear association between

## 4. Nonlinearly coupled oscillations in the Pacific Ocean

### a. Introduction

In this section, gKCCA is used to investigate nonlinear coupling in the Pacific Ocean.

It is likely that ENSO has multiple modes of variability, each with its own characteristic time scale, that can coexist (Barnett 1991). This aspect of ENSO has been investigated by several researchers using the Zebiak–Cane (ZC) model (Yang et al. 1997; Roulston and Neelin 2003; Bejarano and Jin 2008). For example, Bejarano and Jin (2008) ran the full ZC model over a wide range of values of a two-dimensional parameter space, the parameters being the upper-ocean layer thickness and the percentage wind stress. The ZC model exhibited three main regimes:

- A quasi-quadrennial (QQ) regime, which dominated for deep upper-layer thickness (150 m) and high scaled wind stress
- A quasi-biennial (QB) regime, which dominated for shallow upper-layer thickness (130 m) and high scaled wind stress
- A mixed regime, where the QQ and QB modes coexisted, for average upper-layer thickness (140 m) and average scaled wind stress

Earlier work on the interaction between the QQ and QB modes was conducted by Barnett (1991), Yang et al. (1997), and Roulston and Neelin (2003). Using bicoherence to analyze global SST data (1950–87), Barnett (1991) showed that the QQ and QB modes were quadratically coupled, while the QB mode and the annual mode were not quadratically coupled. Yang et al. (1997) and Roulston and Neelin (2003) performed experiments with a linearized version and a lite nonlinear version of the ZC model, respectively. In the full ZC model, SST is a nonlinear function of thermocline depth, and atmospheric heating is a nonlinear function of surface-wind convergence, but in the Roulston and Neelin model only the former function is nonlinear (hence lite nonlinear version). Both Yang et al. (1997) and Roulston and Neelin (2003) performed their model experiments without a seasonal cycle. Yang et al. (1997) found that the most unstable mode was a quasi-biennial oscillation, and the linearized model produced no quasi-quadrennial oscillation, suggesting that the QQ mode might arise from a nonlinear interaction with the QB mode. In contrast, Roulston and Neelin (2003) found that the QB mode was stable and only became oscillatory as a result of nonlinear coupling to an unstable QQ mode. Also, the nonlinearity in their model did not produce a quasi-biennial spectral peak without the presence of a stable separate QB mode. That is, the QB spectral peak was not simply a subharmonic of the QQ mode.
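As a purely illustrative sketch of what quadratic coupling between these time scales can look like (it is not the bicoherence analysis of Barnett 1991, nor a version of the ZC model), squaring an idealized oscillation with a 4-yr period yields a constant plus an oscillation with a 2-yr period, by the identity cos²θ = (1 + cos 2θ)/2. The period, the monthly sampling, and the function names are assumptions chosen for illustration.

```python
import math

QQ_PERIOD = 4.0  # years; idealized quasi-quadrennial period (assumed)


def qq_signal(t):
    """Idealized QQ oscillation at time t (decimal years)."""
    return math.cos(2.0 * math.pi * t / QQ_PERIOD)


def squared_qq(t):
    """Quadratic nonlinearity applied to the QQ oscillation."""
    return qq_signal(t) ** 2


def constant_plus_qb(t):
    """Constant plus an oscillation at half the QQ period (2 yr)."""
    return 0.5 + 0.5 * math.cos(2.0 * math.pi * t / (QQ_PERIOD / 2.0))


# Twenty years at monthly resolution.
times = [month / 12.0 for month in range(240)]

# The two expressions agree to rounding error at every time step.
max_diff = max(abs(squared_qq(t) - constant_plus_qb(t)) for t in times)
print(max_diff < 1e-12)
```

A quadratic interaction of this kind locks the phase of the 2-yr component to that of the 4-yr component, which is the type of phase relationship that bicoherence is designed to detect.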

### b. Data and analysis

The main data used in this study are from the Extended Reconstructed Sea Surface Temperature (ERSST) dataset (Huang et al. 2016; NOAA 2017), which spans the years 1854 to present and has a spatial resolution of *p* and *q* were

*ε* and *σ*.

For both regCCA and gKCCA the model parameters were estimated using gridded parameter values and leave-half-out cross-validation, with the cross-validation metric *ε* in gKCCA. For *σ* in gKCCA, 10 different values were used:

In addition, the two other parameters

*σ* and

*ε*, a multivariate phase randomization test (Schreiber and Schmitz 2000) was performed to test if the coupling between the components

*ε*. The nonlinear coupling between fields was tested using a time-domain version of the bicoherence statistic

*b*: where

### c. Results

For regCCA, the cross-validated parameter estimates were

Figure 3 shows the correlation between the paired components

Table 2 contains the results of the multivariate phase randomization test applied to the leading three gKCCA subspaces. The bicoherence of the first component pair is significantly different from the null hypothesis bicoherence distribution, while the bicoherences of the second and third subspaces are not significantly different. This demonstrates that the quadratic phase coupling between the

Table 2. Bicoherence results from the multivariate phase randomization test.

Figure 4 shows the correlation between the paired components

The second pair of gKCCA components

## 5. Discussion

In this section, the gKCCA method is further investigated with respect to the effect of noise on the estimation of the parameters *σ* and *ε* and the application of other kernels. These aspects are explored using a third simulated example.

*t* is the time in decimal years (with monthly resolution), and

*z*-score operator. Both the

In the first experiment, the gKCCA method was applied using a Gaussian kernel. The parameter *ε* were estimated using gridded parameter values (the same as in section 4b). The leading subspace, for the three different values of SNR, are shown in Fig. 6 (top row): the quadratic relationship between *ε* was the same for all three values of SNR. The parameter *σ* were simply due to the randomness in the simulations. This example suggests that for the Gaussian kernel the parameters *σ* and *ε* are insensitive to the level of noise in

*k* is the power of the polynomial. Thus for this experiment there are also two parameters to be estimated:

*ε* [Eq. (19b)] and

*k*. The parameter

*ε* was estimated using the same gridded parameter set as above. The parameter

*k* was estimated using the set

*k* was the same

*ε* showed values of

*ε*) is not expected when using the polynomial kernel. In this example, using

The investigation of nonlinearities in climate science has been hindered by the lack of methods which are relatively easy to use and computationally efficient. There are many areas in climate science where gKCCA could be applied, and two further examples are given here. The first is in the area of climate reconstructions using climate proxies, such as tree rings. Tree rings carry a climate signal that is both nonlinear and multivariate (Evans et al. 2014). It would be interesting to test how well gKCCA reconstructs temperature and rainfall variation in the context of pseudoproxy experiments with nonlinear proxy models, such as VS-Lite (Vaganov-Shashkin-Lite), a tree-ring model (Tolwinski-Ward et al. 2011). VS-Lite is an example of a proxy model that defines a nonlinear function between climate variables (temperature, precipitation) and the climate proxy (tree rings). The gKCCA method should be able to find such a nonlinear function. The second example is in the area of predictability. For example, in the Lorenz model, the state variables at one point in time are nonlinearly dependent on the state at past times. This nonlinear predictability has been investigated using a neural network (Cannon 2006), and it would be interesting to see how gKCCA performs. The gKCCA method can also be developed further. For example, it is possible that there could be nonlinear coupling between fields with propagating signals. Complex CCA (i.e., CCA with complex vectors) can be used to investigate linear coupling between such fields (Schreier 2008), so complex gKCCA needs to be developed in order to investigate nonlinear coupling between fields with propagating signals.

## 6. Conclusions

Gradient-based kernel dimension reduction relies on two powerful mathematical ideas: (i) the gradient vector

This paper has introduced the gKCCA method and investigated several aspects of the method, including nonlinear cross validation, sensitivity to noise, and the application of different kernel functions. The gKCCA method has at least two model parameters: the Gram matrix regularization parameter *ε* and other parameters belonging to the kernel function, such as Gaussian *σ* or polynomial power *k*. To estimate these parameters, nonlinear cross validation was performed using the adjusted multiple coefficient of determination from an augmented linear model as the cross-validation metric. Experiments with different levels of noise, and different kernels, show that gKCCA is able to recover the underlying nonlinearity between two fields. The application of gKCCA to observed SST data (ERSST) shows a significant quadratic coupling between the low-pass (4–6 years) field and the high-pass (2–3 years) field for the leading gKCCA subspace. The high-frequency pattern has stronger anomalies in the central Pacific than in the east Pacific, compared to other studies investigating nonlinear interactions, but the methods used in these other studies were different. Future studies of these components using climate models are needed.

Further investigations of the application of gKCCA to nonlinear questions in climate science are needed. Further methodological experiments, including the application of a wider range of kernel functions (e.g., Laplacian, periodic, and Matern kernels), are also required. Finally, methodological developments are possible, such as the development of gKCCA for fields with nonlinear coupling between propagating components.

The Julia package CoupledFields contains source code for gKCCA (https://github.com/Mattriks/CoupledFields). We thank several anonymous reviewers for their useful suggestions that helped improve the manuscript.

# APPENDIX A

## Factorizing the Polynomial and Gaussian Kernel

*k*) is the feature space vector of the Gaussian kernel. The sum of polynomial kernels in Eq. (A2c) means that the Gaussian kernel feature space vector in Eq. (A2e) contains the monomials of all polynomial kernel feature space vectors. That is, the Gaussian kernel feature space vector is of infinite length.
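The relationship between the Gaussian kernel and a sum of polynomial kernels can be checked numerically. The sketch below assumes the Gaussian kernel has the common form exp(−‖x − y‖²/(2σ²)) and that the polynomial kernels are the homogeneous kernels (x′y)^k; the normalization in this appendix's equations may differ. Under those assumptions the Gaussian kernel equals exp(−(‖x‖² + ‖y‖²)/(2σ²)) multiplied by the series Σ_k (x′y)^k/(σ^{2k} k!), so a truncated series should approach the exact value as more terms are kept. All function names and test vectors are hypothetical.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, sigma):
    """Gaussian kernel, assumed form exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gaussian_kernel_series(x, y, sigma, n_terms):
    """Truncated expansion of the Gaussian kernel in polynomial kernels.

    Each term (x'y)^k / (sigma^(2k) k!) is a scaled homogeneous
    polynomial kernel of power k.
    """
    envelope = math.exp(-(dot(x, x) + dot(y, y)) / (2.0 * sigma ** 2))
    scaled = dot(x, y) / sigma ** 2
    series = sum(scaled ** k / math.factorial(k) for k in range(n_terms))
    return envelope * series


# Illustrative inputs (not taken from the paper).
x = [0.5, -1.0]
y = [1.0, 0.3]
sigma = 1.5

exact = gaussian_kernel(x, y, sigma)
errors = [abs(gaussian_kernel_series(x, y, sigma, n) - exact)
          for n in (1, 2, 4, 8)]
print(errors)  # truncation error for 1, 2, 4, and 8 terms
```

Because the exact kernel is recovered only in the limit of infinitely many terms, this is consistent with the statement that the Gaussian feature space vector is of infinite length.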

The above shows (for the polynomial and Gaussian kernels) that a kernel matrix

# APPENDIX B

## Piecewise Linear Basis Matrix

*k*% quantile of
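One common way to build a piecewise linear basis is sketched below, as an assumption rather than as the exact construction of this appendix: knots are placed at chosen quantiles of a variable, and the basis matrix holds the variable itself plus one hinge column max(x − knot, 0) per knot. The function names, the linear-interpolation quantile rule, and the sample values are hypothetical.

```python
def quantile(values, prob):
    """Quantile by linear interpolation between order statistics."""
    ordered = sorted(values)
    position = prob * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    frac = position - lower
    return ordered[lower] * (1.0 - frac) + ordered[upper] * frac


def piecewise_linear_basis(x, probs):
    """Return (basis, knots) for a hinge-function basis.

    Each row of `basis` is [x_i, max(x_i - knot_1, 0), ...], so a linear
    combination of the columns is continuous and changes slope at each knot.
    """
    knots = [quantile(x, p) for p in probs]
    basis = [[xi] + [max(xi - knot, 0.0) for knot in knots] for xi in x]
    return basis, knots


# Illustrative values with a single knot at the median.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
basis, knots = piecewise_linear_basis(x, [0.5])
for row in basis:
    print(row)
```

A linear model fitted to such columns can then represent a different slope on either side of each knot.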

## REFERENCES

Arenas-García, J., K. B. Petersen, G. Camps-Valls, and L. K. Hansen, 2013: Kernel multivariate analysis framework for supervised subspace learning. *IEEE Signal Process. Mag.*, **30**, 16–29, doi:10.1109/MSP.2013.2250591.

Barnett, T. P., 1991: The interaction of multiple time scales in the tropical climate system. *J. Climate*, **4**, 269–285, doi:10.1175/1520-0442(1991)004<0269:TIOMTS>2.0.CO;2.

Bejarano, L., and F.-F. Jin, 2008: Coexistence of equatorial coupled modes of ENSO. *J. Climate*, **21**, 3051–3067, doi:10.1175/2007JCLI1679.1.

Bretherton, C., C. Smith, and J. M. Wallace, 1992: An intercomparison of methods for finding coupled patterns in climate. *J. Climate*, **5**, 541–560, doi:10.1175/1520-0442(1992)005<0541:AIOMFF>2.0.CO;2.

Cannon, A. J., 2006: Nonlinear principal predictor analysis: Application to the Lorenz system. *J. Climate*, **19**, 579–589, doi:10.1175/JCLI3634.1.

Cannon, A. J., 2009: Negative ridge regression parameters for improving the covariance structure of multivariate linear downscaling models. *Int. J. Climatol.*, **29**, 761–769, doi:10.1002/joc.1737.

Cannon, A. J., and W. W. Hsieh, 2008: Robust nonlinear canonical correlation analysis: application to seasonal climate forecasting. *Nonlinear Processes Geophys.*, **15**, 221–232, doi:10.5194/npg-15-221-2008.

Chung, C., and S. Nigam, 1999: Weighting of geophysical data in principal component analysis. *J. Geophys. Res.*, **104**, 16 925–16 928, doi:10.1029/1999JD900234.

DelSole, T., and M. K. Tippett, 2008: Predictable components and singular vectors. *J. Atmos. Sci.*, **65**, 1666–1678, doi:10.1175/2007JAS2401.1.

Evans, M. N., J. E. Smerdon, A. Kaplan, S. E. Tolwinski-Ward, and J. F. González-Rouco, 2014: Climate field reconstruction uncertainty arising from multivariate and nonlinear properties of predictors. *Geophys. Res. Lett.*, **41**, 9127–9134, doi:10.1002/2014GL062063.

Fukumizu, K., and C. Leng, 2014: Gradient-based kernel dimension reduction for regression. *J. Amer. Stat. Assoc.*, **109**, 359–370, doi:10.1080/01621459.2013.838167.

Hsieh, W. W., 2001: Nonlinear canonical correlation analysis of the tropical Pacific climate variability using a neural network approach. *J. Climate*, **14**, 2528–2539, doi:10.1175/1520-0442(2001)014<2528:NCCAOT>2.0.CO;2.

Huang, B., and Coauthors, 2016: Further exploring and quantifying uncertainties for Extended Reconstructed Sea Surface Temperature (ERSST) version 4 (v4). *J. Climate*, **29**, 3119–3142, doi:10.1175/JCLI-D-15-0430.1.

Lim, Y., S. Jo, J. Lee, H.-S. Oh, and H.-S. Kang, 2012: An improvement of seasonal climate prediction by regularized canonical correlation analysis. *Int. J. Climatol.*, **32**, 1503–1512, doi:10.1002/joc.2368.

Lima, C. H. R., U. Lall, T. Jebara, and A. G. Barnston, 2009: Statistical prediction of ENSO from subsurface sea temperature using a nonlinear dimensionality reduction. *J. Climate*, **22**, 4501–4519, doi:10.1175/2009JCLI2524.1.

Martínez-Gómez, E., M. T. Richards, and D. S. P. Richards, 2014: Distance correlation measures for discovering associations in large astrophysical databases. *Astrophys. J.*, **781**, 39, doi:10.1088/0004-637X/781/1/39.

Monahan, A. H., and A. Dai, 2004: The spatial and temporal structure of ENSO nonlinearity. *J. Climate*, **17**, 3026–3036, doi:10.1175/1520-0442(2004)017<3026:TSATSO>2.0.CO;2.

NOAA, 2017: Extended Reconstructed Sea Surface Temperature, version 4. NOAA/OAR/ESRL/PSD, accessed 16 January 2017. [Available online at http://www.esrl.noaa.gov/psd/data/gridded.]

Ortiz Beviá, M. J., I. Pérez-González, F. J. Alvarez-García, and A. Gershunov, 2010: Nonlinear estimation of El Niño impact on the North Atlantic winter. *J. Geophys. Res.*, **115**, D21123, doi:10.1029/2009JD013387.

Ramírez, M. C., N. J. Ferreira, and H. F. C. Velho, 2006: Linear and nonlinear statistical downscaling for rainfall forecasting over southeastern Brazil. *Wea. Forecasting*, **21**, 969–989, doi:10.1175/WAF981.1.

Roulston, M. S., and J. D. Neelin, 2003: Nonlinear coupling between modes in a low-dimensional model of ENSO. *Atmos.–Ocean*, **41**, 217–231, doi:10.3137/ao.410303.

Schreiber, T., and A. Schmitz, 2000: Surrogate time series. *Physica D*, **142**, 346–382, doi:10.1016/S0167-2789(00)00043-9.

Schreier, P. J., 2008: A unifying discussion of correlation analysis for complex random vectors. *IEEE Trans. Signal Process.*, **56**, 1327–1336, doi:10.1109/TSP.2007.909054.

Scott, D. W., 2015: The curse of dimensionality and dimension reduction. *Multivariate Density Estimation: Theory, Practice, and Visualization*, D. W. Scott, Ed., John Wiley and Sons, 217–240, doi:10.1002/9781118575574.ch7.

Smerdon, J. E., A. Kaplan, D. Chang, and M. N. Evans, 2011: A pseudoproxy evaluation of the CCA and RegEM methods for reconstructing climate fields of the last millennium. *J. Climate*, **24**, 1284–1309, doi:10.1175/2010JCLI4110.1.

Smoliak, B. V., J. M. Wallace, P. Lin, and Q. Fu, 2015: Dynamical adjustment of the Northern Hemisphere surface air temperature field: Methodology and application to observations. *J. Climate*, **28**, 1613–1629, doi:10.1175/JCLI-D-14-00111.1.

Tippett, M. K., T. DelSole, S. J. Mason, and A. G. Barnston, 2008: Regression-based methods for finding coupled patterns. *J. Climate*, **21**, 4384–4398, doi:10.1175/2008JCLI2150.1.

Tolwinski-Ward, S. E., M. N. Evans, M. K. Hughes, and K. J. Anchukaitis, 2011: An efficient forward model of the climate controls on interannual variation in tree-ring width. *Climate Dyn.*, **36**, 2419–2439, doi:10.1007/s00382-010-0945-5.

Wilks, D. S., 2008: Improved statistical seasonal forecasts using extended training data. *Int. J. Climatol.*, **28**, 1589–1598, doi:10.1002/joc.1661.

Wu, A., W. W. Hsieh, and A. Shabbar, 2005: The nonlinear patterns of North American winter temperature and precipitation associated with ENSO. *J. Climate*, **18**, 1736–1752, doi:10.1175/JCLI3372.1.

Yang, X.-Q., Q. Xie, and S.-S. Huang, 1997: Unstable interannual oscillation modes in a linearized coupled ocean-atmosphere model and their association with ENSO. *Meteor. Atmos. Phys.*, **62**, 161–177, doi:10.1007/BF01029700.