Constructed Analogs and Linear Regression

Michael K. Tippett, International Research Institute for Climate and Society, Columbia University, Palisades, New York, and Center of Excellence for Climate Change Research, Department of Meteorology, King Abdulaziz University, Jeddah, Saudi Arabia

and
Timothy DelSole, George Mason University, Fairfax, Virginia, and Center for Ocean–Land–Atmosphere Studies, Calverton, Maryland


Abstract

The constructed analog procedure produces a statistical forecast that is a linear combination of past predictand values. The weights used to form the linear combination depend on the current predictor value and are chosen so that the linear combination of past predictor values approximates the current predictor value. The properties of the constructed analog method have previously been described as being distinct from those of linear regression. However, here the authors show that standard implementations of the constructed analog method give forecasts that are identical to linear regression forecasts. A consequence of this equivalence is that constructed analog forecasts based on many predictors tend to suffer from overfitting just as in linear regression. Differences between linear regression and constructed analog forecasts only result from implementation choices, especially ones related to the preparation and truncation of data. Two particular constructed analog implementations are shown to correspond to principal component regression and ridge regression. The equality of linear regression and constructed analog forecasts is illustrated in a Niño-3.4 prediction example, which also shows that increasing the number of predictors results in low-skill, high-variance forecasts, even at long leads, behavior typical of overfitting. Alternative definitions of the analog weights lead naturally to nonlinear extensions of linear regression such as local linear regression.

Corresponding author address: M. K. Tippett, International Research Institute for Climate and Society, The Earth Institute of Columbia University, Lamont Campus, 61 Route 9W, Palisades, NY 10964. E-mail: tippett@iri.columbia.edu


1. Introduction

A general prediction problem is to find the best estimate of a quantity y given a related quantity x. We refer to vectors y and x as the predictand and predictor, respectively. Examples of typical earth science prediction problems are as follows: x is the current sea surface temperature and y is its future state (Penland and Magorian 1993); x is a prescribed CO2 concentration and y is global surface temperature (Krueger and Von Storch 2011); x is a large-scale climate feature and y is an associated small-scale climate feature (Robertson et al. 2012). In principle, the probability distribution of y for a particular value of predictor x = x0 (the conditional distribution) can be computed from physical laws or estimated from data. In either case, the mean of that distribution (the conditional mean) is the best forecast in the sense of minimizing the expected squared error. When x and y have a joint Gaussian distribution, the best forecast, as well as its uncertainty, is given by linear regression (LR).

The idea of conditional averaging is also found in the constructed analog (CA) method (Van den Dool 1994, 2006), a statistical forecast method that has been applied in a variety of geophysical problems (e.g., Van den Dool et al. 2003; Maurer and Hidalgo 2008; Hawkins et al. 2011). A prediction yCA is made for a particular value of the predictor x = x0 by searching through historical data for values of y corresponding to values of x that are close to x0, so-called analogs. The CA method expresses the current predictor state x0 as a weighted linear combination of past states and makes a prediction by applying those same weights to the corresponding values of y, an averaging procedure reminiscent of the conditional mean. The CA has previously been described as differing from LR in two fundamental ways. First, it has been claimed that by making no assumption of a linear relation between predictor and predictand, CA captures nonlinearity. Second, it has been claimed that since CA is not based on minimizing the mean squared error of the predictions, there is no danger of overfitting. Here we show that typical implementations of CA do not have these properties, and, in fact, CA forecasts are identical to LR forecasts.

The paper is organized as follows. In section 2 we review the least squares problems that arise in the formulations of LR and CA, and use the matrix pseudoinverse to show that simple (without predictor truncation or regularization) implementations of the two methods give identical forecasts. In section 3, we identify situations where the simple implementation overfits the data and show that a recommended CA implementation is the same as principal component regression. In section 4, we show that another common CA implementation corresponds to ridge regression. In section 5, we show that LR and CA predictions of the Niño-3.4 index are identical and may have large variance even at long leads. In section 6, we present and illustrate some nonlinear regression methods that follow naturally from modifications to CA. A summary and discussion are given in section 7.

2. Linear regression, constructed analogs, and pseudoinverses

We use the following matrix notation for the training data. Let $\mathbf{X}$ be the $N_x \times N_t$ matrix of predictor data; $N_x$ is the number of predictor variables and $N_t$ is the number of time samples. Each column of $\mathbf{X}$ contains the predictor variables ($\mathbf{x}$) at a particular time; each row of $\mathbf{X}$ contains the time series of a particular predictor variable. Likewise, let $\mathbf{Y}$ be the $N_y \times N_t$ matrix of predictand data; $N_y$ is the number of predictand variables. Let $\mathbf{x}_0$ be the $N_x \times 1$ column vector of predictor variables to be used in a forecast. We assume that predictors and predictands are expressed as anomalies. More generally, a row of ones can be included in $\mathbf{X}$ to account for an intercept term.
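To make these conventions concrete, the following minimal NumPy sketch (an illustration, not code from the paper) sets up data in the layout assumed throughout: columns index time, rows index variables, and anomalies are formed by removing row means.

```python
import numpy as np

# Data layout used in this paper: columns = times, rows = variables.
Nx, Ny, Nt = 3, 2, 6
X = np.arange(Nx * Nt, dtype=float).reshape(Nx, Nt)  # predictors, Nx x Nt
Y = np.zeros((Ny, Nt))                               # predictands, Ny x Nt
X -= X.mean(axis=1, keepdims=True)                   # anomalies: remove row means
X_with_intercept = np.vstack([np.ones(Nt), X])       # optional intercept row
```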

Linear regression finds the $N_y \times N_x$ matrix $\mathbf{B}$ of regression coefficients such that the norm of the residuals
$$\|\mathbf{Y} - \mathbf{B}\mathbf{X}\|^2 \tag{1}$$
is minimized. The notation $\|\cdot\|^2$ denotes the square of the Frobenius norm, which is the sum of the squares of the entries of the matrix or vector to which it is applied. The linear regression forecast $\mathbf{y}_{\mathrm{LR}}$ is
$$\mathbf{y}_{\mathrm{LR}} = \mathbf{B}\mathbf{x}_0. \tag{2}$$
Practically, computing the matrix $\mathbf{B}$ of regression coefficients by direct minimization of (1) may be ill posed (there is no unique solution when $N_x > N_t$) or ill advised (overfitting can lead to poor performance on independent data when $N_x$ is comparable to $N_t$).
The CA method also involves a linear least squares minimization problem. In the CA method, $\mathbf{x}_0$ is expressed as a weighted sum of past states (columns of $\mathbf{X}$), and a prediction is formed by applying those same weights to the columns of $\mathbf{Y}$. Specifically, CA finds the $N_t \times 1$ column vector $\mathbf{a}$ of weights that minimizes
$$\|\mathbf{x}_0 - \mathbf{X}\mathbf{a}\|^2 \tag{3}$$
and then makes a prediction $\mathbf{y}_{\mathrm{CA}}$ by applying those weights to the columns of $\mathbf{Y}$:
$$\mathbf{y}_{\mathrm{CA}} = \mathbf{Y}\mathbf{a}. \tag{4}$$
The linear least squares problems appearing in the formulations of LR and CA look quite different. For instance, the matrix $\mathbf{B}$ of LR coefficients multiplies the data on the left to combine different predictors, while the vector $\mathbf{a}$ of CA weights multiplies the data on the right to combine different times. Also, CA involves fitting $\mathbf{x}_0$ while LR fits $\mathbf{Y}$. One of the least squares problems is always underdetermined and the other overdetermined unless $N_x = N_t$. We will use the pseudoinverse of the data matrix $\mathbf{X}$ to solve both linear least squares problems and show that the resulting LR and CA forecasts are identical. In particular, (1) is minimized by
$$\mathbf{B} = \mathbf{Y}\mathbf{X}^+, \tag{5}$$
where $\mathbf{X}^+$ is the pseudoinverse of $\mathbf{X}$, a quantity that we will define and discuss later (Hansen 1998). When $N_x > N_t$, (1) is underdetermined, its minimizer is not unique, and $\mathbf{B} = \mathbf{Y}\mathbf{X}^+$ is the minimizer with the minimum value of $\|\mathbf{B}\|^2$. The pseudoinverse commutes with the transpose in that $(\mathbf{X}^+)^T = (\mathbf{X}^T)^+$. For this reason, the minimizer of (3) can also be expressed using the pseudoinverse, and
$$\mathbf{a} = \mathbf{X}^+\mathbf{x}_0. \tag{6}$$
When $N_x < N_t$, (3) is underdetermined, and $\mathbf{a} = \mathbf{X}^+\mathbf{x}_0$ is the minimizer with minimum norm. We refer to these direct minimizing solutions as providing "simple" implementations of LR and CA. Substituting the simple minimizers of (5) and (6) into the definitions of the LR and CA predictions, (2) and (4), respectively, we see that
$$\mathbf{y}_{\mathrm{LR}} = \mathbf{Y}\mathbf{X}^+\mathbf{x}_0 = \mathbf{y}_{\mathrm{CA}}. \tag{7}$$
Remarkably, the simple linear regression and constructed analog predictions are identical.
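The identity (7) is easy to verify numerically. The following sketch (an illustration with synthetic data, not code from the paper) computes the simple LR and CA forecasts with NumPy's pseudoinverse and confirms that they agree:

```python
import numpy as np

# Synthetic training data: Nx predictors, Ny predictands, Nt samples.
rng = np.random.default_rng(0)
Nx, Ny, Nt = 5, 2, 40
X = rng.standard_normal((Nx, Nt))   # predictor matrix (columns = times)
Y = rng.standard_normal((Ny, Nt))   # predictand matrix
x0 = rng.standard_normal((Nx, 1))   # current predictor state

# Simple LR: B = Y X^+, forecast y_LR = B x0  [Eqs. (2) and (5)]
B = Y @ np.linalg.pinv(X)
y_LR = B @ x0

# Simple CA: a = X^+ x0, forecast y_CA = Y a  [Eqs. (4) and (6)]
a = np.linalg.pinv(X) @ x0
y_CA = Y @ a

# The two forecasts agree to machine precision [Eq. (7)].
print(np.allclose(y_LR, y_CA))  # True
```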

3. Connection to principal component regression

While the simple LR implementation does solve the least squares problem and find the best fit to the data, it does so using all of the predictors. Such an approach is ill advised when the number of predictors is comparable to the number of samples since overfitting may result in poor predictions on independent data. To see this point more clearly, let us return to the matter of actually defining the pseudoinverse. The pseudoinverse of $\mathbf{X}$ is defined using its singular value decomposition (SVD):
$$\mathbf{X} = \mathbf{U}\mathbf{S}\mathbf{V}^T, \tag{8}$$
where $\mathbf{U}$ and $\mathbf{V}$ are orthogonal square matrices of size $N_x \times N_x$ and $N_t \times N_t$, respectively, and $\mathbf{S}$ is a diagonal $N_x \times N_t$ matrix with nonnegative entries (Golub and Van Loan 1996). The so-called economical SVD is
$$\mathbf{X} = \hat{\mathbf{U}}\hat{\mathbf{S}}\hat{\mathbf{V}}^T, \tag{9}$$
where $\hat{\mathbf{U}}$ and $\hat{\mathbf{V}}$ retain the columns of $\mathbf{U}$ and $\mathbf{V}$, respectively, corresponding to the nonzero diagonal elements of $\mathbf{S}$, and the elements of the square diagonal matrix $\hat{\mathbf{S}}$ are strictly positive; the number of positive diagonal entries of $\mathbf{S}$ is at most min($N_x$, $N_t - 1$) for anomaly data. The pseudoinverse of $\mathbf{X}$ is defined to be
$$\mathbf{X}^+ = \hat{\mathbf{V}}\hat{\mathbf{S}}^{-1}\hat{\mathbf{U}}^T. \tag{10}$$
The matrix $\hat{\mathbf{S}}$ is square with positive diagonal entries and is thus invertible. Therefore, the simple LR and CA forecasts are
$$\mathbf{y}_{\mathrm{LR}} = \mathbf{y}_{\mathrm{CA}} = \mathbf{Y}\hat{\mathbf{V}}\hat{\mathbf{S}}^{-1}\hat{\mathbf{U}}^T\mathbf{x}_0. \tag{11}$$
In the language of principal component analysis (PCA), the columns of the matrices $\hat{\mathbf{U}}$ and $\sqrt{N_t - 1}\,\hat{\mathbf{V}}$ are the empirical orthogonal functions (EOFs) and principal components (PCs), respectively, of the anomaly data $\mathbf{X}$. The factors of $\sqrt{N_t - 1}$ serve to normalize the PCs to have unit variance since the columns of $\hat{\mathbf{V}}$ are unit vectors with zero mean. Principal component regression (PCR) arises from taking the PCs as predictors rather than the original data in $\mathbf{X}$. If we were to use all of the PCs as predictors [simple PCR (SPCR)], we would find the matrix $\mathbf{B}_{\mathrm{SPCR}}$ of regression coefficients that minimizes
$$\left\|\mathbf{Y} - \sqrt{N_t - 1}\,\mathbf{B}_{\mathrm{SPCR}}\hat{\mathbf{V}}^T\right\|^2. \tag{12}$$
This linear least squares problem can be solved by finding the pseudoinverse of $\sqrt{N_t - 1}\,\hat{\mathbf{V}}^T$, which is $(N_t - 1)^{-1/2}\,\hat{\mathbf{V}}$. Therefore,
$$\mathbf{B}_{\mathrm{SPCR}} = \frac{1}{\sqrt{N_t - 1}}\,\mathbf{Y}\hat{\mathbf{V}}. \tag{13}$$
The simple PCR forecast $\mathbf{y}_{\mathrm{SPCR}}$ is obtained by applying $\mathbf{B}_{\mathrm{SPCR}}$ to the PC amplitudes of $\mathbf{x}_0$, which are $\sqrt{N_t - 1}\,\hat{\mathbf{S}}^{-1}\hat{\mathbf{U}}^T\mathbf{x}_0$. Hence,
$$\mathbf{y}_{\mathrm{SPCR}} = \mathbf{Y}\hat{\mathbf{V}}\hat{\mathbf{S}}^{-1}\hat{\mathbf{U}}^T\mathbf{x}_0 = \mathbf{y}_{\mathrm{LR}} = \mathbf{y}_{\mathrm{CA}}. \tag{14}$$
Therefore, the LR and CA forecasts with the simple minimizers are the same as the simple PCR forecast, which uses all of the PCs as predictors. Such an approach overfits the data and has poor prediction skill on independent data unless the number of samples is substantially larger than the number of predictors.
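A numerical sketch of this equivalence, again with synthetic data and assuming the normalization conventions above, checks that the SPCR forecast built from the economical SVD reproduces the pseudoinverse forecast (7):

```python
import numpy as np

rng = np.random.default_rng(1)
Nx, Ny, Nt = 5, 2, 40
X = rng.standard_normal((Nx, Nt))
X -= X.mean(axis=1, keepdims=True)           # anomalies (zero-mean rows)
Y = rng.standard_normal((Ny, Nt))
x0 = rng.standard_normal((Nx, 1))

# Economical SVD: X = Uh Sh Vh^T  [Eq. (9)]
Uh, s, VhT = np.linalg.svd(X, full_matrices=False)

c = np.sqrt(Nt - 1)                          # PC normalization factor
B_spcr = (Y @ VhT.T) / c                     # Eq. (13)
pc_amp = c * np.diag(1.0 / s) @ Uh.T @ x0    # PC amplitudes of x0

y_SPCR = B_spcr @ pc_amp                     # Eq. (14)
y_LR = Y @ np.linalg.pinv(X) @ x0            # Eq. (7)
print(np.allclose(y_SPCR, y_LR))             # True
```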
To obtain more robust CA weights in the case where the number $N_x$ of predictors is comparable to or exceeds the number $N_t$ of samples, Van den Dool (2006) proposed projecting $\mathbf{x}_0$ and $\mathbf{X}$ onto a truncated set of EOFs. We use the tilde notation to denote such a truncation, with $\tilde{\mathbf{x}}_0 = \ddot{\mathbf{U}}\ddot{\mathbf{U}}^T\mathbf{x}_0$ and $\tilde{\mathbf{X}} = \ddot{\mathbf{U}}\ddot{\mathbf{U}}^T\mathbf{X}$, and the double-dot notation ($\ddot{\mathbf{U}}$, $\ddot{\mathbf{S}}$, $\ddot{\mathbf{V}}$) to denote the truncation of the SVD. Computing the CA weights with the truncated data gives
$$\tilde{\mathbf{a}} = \tilde{\mathbf{X}}^+\tilde{\mathbf{x}}_0 = \ddot{\mathbf{V}}\ddot{\mathbf{S}}^{-1}\ddot{\mathbf{U}}^T\mathbf{x}_0, \tag{15}$$
and applying these weights to $\mathbf{Y}$ gives as prediction
$$\tilde{\mathbf{y}}_{\mathrm{CA}} = \mathbf{Y}\tilde{\mathbf{a}} = \mathbf{Y}\ddot{\mathbf{V}}\ddot{\mathbf{S}}^{-1}\ddot{\mathbf{U}}^T\mathbf{x}_0, \tag{16}$$
where $\sqrt{N_t - 1}\,\ddot{\mathbf{S}}^{-1}\ddot{\mathbf{U}}^T\mathbf{x}_0$ are the (truncated) PC amplitudes of $\mathbf{x}_0$. From the previous discussion leading to (14), we recognize (16) as the PCR forecast based on the truncated set of PCs. Computing CA weights with data projected onto a truncated set of EOFs gives the same forecast as PCR using the same truncated set of PCs.

The choice of the number of PCs to use in the calculation of the CA weights has exactly the same effect on the forecast as the choice of the number of PCs to use in PCR. In both cases, using too many PCs leads to overfitting.
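The following sketch (synthetic data, NumPy) illustrates the truncation equivalence: CA weights computed from EOF-truncated data per (15) and (16) give the same forecast as regressing on the leading PCs. Since normalization of the PCs does not change the resulting forecast, unnormalized PC time series are used for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)
Nx, Ny, Nt, k = 50, 1, 30, 5    # more predictors than samples; keep k EOFs
X = rng.standard_normal((Nx, Nt))
Y = rng.standard_normal((Ny, Nt))
x0 = rng.standard_normal((Nx, 1))

U, s, VT = np.linalg.svd(X, full_matrices=False)
Uk, sk, VkT = U[:, :k], s[:k], VT[:k, :]     # truncated SVD (double-dot terms)

# CA weights from EOF-truncated data [Eq. (15)] ...
a_trunc = VkT.T @ np.diag(1.0 / sk) @ Uk.T @ x0
y_CA_trunc = Y @ a_trunc                     # Eq. (16)

# ... equal the PCR forecast using the same k PCs.
pc_train = np.diag(sk) @ VkT                 # k PC time series (rows)
B_pcr = Y @ np.linalg.pinv(pc_train)         # regression on the PCs
y_PCR = B_pcr @ (Uk.T @ x0)                  # PC amplitudes of x0
print(np.allclose(y_CA_trunc, y_PCR))        # True
```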

4. Connection to ridge regression

Another approach to the linear least squares problems in (1) and (3) that appear in the formulations of LR and CA is ridge regression, also known as Tikhonov regularization. The regularized solutions of (1) and (3) are
$$\mathbf{B}_\delta = \mathbf{Y}\mathbf{X}^T(\mathbf{X}\mathbf{X}^T + \delta\mathbf{I})^{-1} \tag{17}$$
and
$$\mathbf{a}_\delta = (\mathbf{X}^T\mathbf{X} + \delta\mathbf{I})^{-1}\mathbf{X}^T\mathbf{x}_0, \tag{18}$$
respectively, where $\mathbf{I}$ is the appropriately sized identity matrix and the ridge parameter $\delta$ is a positive scalar (Hansen 1998). The regularized solutions are well defined irrespective of the parameters $N_x$ and $N_t$. The matrix $\mathbf{B}_\delta$ is precisely that used in ridge regression, and Van den Dool (2006) suggested using $\mathbf{a}_\delta$ in CA. Remarkably, the resulting forecasts $\mathbf{y}_{\mathrm{LR},\delta}$ and $\mathbf{y}_{\mathrm{CA},\delta}$ are identical:
$$\mathbf{y}_{\mathrm{LR},\delta} = \mathbf{Y}\mathbf{X}^T(\mathbf{X}\mathbf{X}^T + \delta\mathbf{I})^{-1}\mathbf{x}_0 = \mathbf{Y}(\mathbf{X}^T\mathbf{X} + \delta\mathbf{I})^{-1}\mathbf{X}^T\mathbf{x}_0 = \mathbf{y}_{\mathrm{CA},\delta}, \tag{19}$$
where we have used the push-through matrix identity $(\mathbf{X}^T\mathbf{X} + \delta\mathbf{I})^{-1}\mathbf{X}^T = \mathbf{X}^T(\mathbf{X}\mathbf{X}^T + \delta\mathbf{I})^{-1}$. Use of ridging in computing the CA weights or in computing the LR coefficients results in identical forecasts.
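A numerical check of (17)–(19) on synthetic data, where the push-through identity guarantees agreement of the two ridge forecasts:

```python
import numpy as np

rng = np.random.default_rng(3)
Nx, Ny, Nt, delta = 20, 2, 15, 0.1
X = rng.standard_normal((Nx, Nt))
Y = rng.standard_normal((Ny, Nt))
x0 = rng.standard_normal((Nx, 1))

# Ridge LR coefficients [Eq. (17)] and ridge CA weights [Eq. (18)]
B_ridge = Y @ X.T @ np.linalg.inv(X @ X.T + delta * np.eye(Nx))
a_ridge = np.linalg.inv(X.T @ X + delta * np.eye(Nt)) @ X.T @ x0

# By the push-through identity the forecasts coincide [Eq. (19)].
print(np.allclose(B_ridge @ x0, Y @ a_ridge))  # True
```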
The ridge regression solution is directly related to the pseudoinverse-based solution since an equivalent definition of the pseudoinverse is
$$\mathbf{X}^+ = \lim_{\delta \to 0^+} \mathbf{X}^T(\mathbf{X}\mathbf{X}^T + \delta\mathbf{I})^{-1}. \tag{20}$$
Consequently,
$$\lim_{\delta \to 0^+} \mathbf{y}_{\mathrm{LR},\delta} = \mathbf{Y}\mathbf{X}^+\mathbf{x}_0 = \mathbf{y}_{\mathrm{LR}} = \mathbf{y}_{\mathrm{CA}}. \tag{21}$$
The ridge regression forecast in the limit of δ going to zero is the same as the LR or CA forecast with a simple minimizer. This result is consistent with the interpretation of ridge regression as solving the least squares problems subject to a constraint on the size of the solution (DelSole 2007).
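The limit (20)–(21) can also be observed numerically; in this synthetic sketch, shrinking δ drives the ridge weights toward the minimum-norm pseudoinverse weights:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((6, 25))
x0 = rng.standard_normal((6, 1))

# As delta shrinks, the ridge CA weights approach the
# minimum-norm (pseudoinverse) weights [Eqs. (20) and (21)].
a_pinv = np.linalg.pinv(X) @ x0
for delta in (1e-1, 1e-4, 1e-8):
    a_ridge = X.T @ np.linalg.inv(X @ X.T + delta * np.eye(6)) @ x0
    print(delta, np.linalg.norm(a_ridge - a_pinv))
# The printed difference decreases toward zero with delta.
```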

5. Example: Niño-3.4 prediction

A typical application of CA and LR is the prediction of the Niño-3.4 index (Van den Dool 2006). We consider forecasts made at the beginning of July and take as predictors the gridded April–June sea surface temperature (SST) anomaly in the region from 40°S to 40°N from the extended reconstructed SST (ERSST) dataset, version 3b (Smith and Reynolds 2004). The historical data used to form $\mathbf{X}$ and $\mathbf{Y}$ come from the 49-yr period 1955–2003, and the anomalies are computed with respect to the same period. The predictand $\mathbf{y}$ is the 3-month average Niño-3.4 anomaly with respect to the 1971–2000 period, taken from the extended Kaplan dataset (Kaplan et al. 1998), at leads extending to lead 22; denoting July–September 2005 as the zero-month-lead forecast, lead 22 is April–June 2007. Here the initial condition $\mathbf{x}_0$ is the April–June 2005 SST anomaly, and $\mathbf{y}$ consists of the Niño-3.4 index from April–June 2005 to April–June 2007, 25 leads in all. Forecasts are made based on varying numbers of area-weighted EOFs; no ridging is used.

Figure 1 shows that CA and PCR forecasts based on the same number of EOFs are identical. On the other hand, forecasts based on different numbers of EOFs can vary greatly. Forecasts using 10 EOFs show little variability, while those with 25 or more show considerable variability. This particular set of forecasts verifies well against observations out to a lead of nearly two years. The skill of forecasts made in July for the following March–May (lead 8) was computed for the period 1955–2003 using the entire dataset and using leave-one-out cross validation (CV) applied to the LR coefficients and CA weights; the PCs were computed using the full dataset. The CV skill of the 10-EOF forecasts is the highest, and as the number of EOFs increases, the resulting forecasts have lower CV skill and greater variance (Table 1). On the other hand, the in-sample correlation increases as the number of EOFs increases, and the in-sample ratio of forecast to climatological variance is equal to the in-sample correlation. The variance of the cross-validated forecasts is greater than the climatological variance when 25 or more EOFs are used. The reason for this behavior is that the in-sample explained variance and the variance of the regression coefficient estimates, both of which are increasing functions of the number of predictors, contribute to the variance of the cross-validated forecasts. The behavior of the CV forecasts, especially those with more than 10 EOFs, is consistent with overfitting: the in-sample skill is substantially greater than the CV skill, and the CV skill is inconsistent with the forecast variance.
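The SST and Niño-3.4 data used above are not reproduced here, but the following sketch shows, on stand-in synthetic data, the form such a leave-one-out calculation can take, with the PCs computed once from the full dataset as described above and only the regression coefficients re-estimated for each withheld year:

```python
import numpy as np

# Leave-one-out cross validation of a PCR forecast on synthetic data.
rng = np.random.default_rng(5)
Nx, Nt, k = 200, 49, 10                      # grid points, years, EOFs kept
X = rng.standard_normal((Nx, Nt))            # stand-in predictor anomalies
y = rng.standard_normal(Nt)                  # stand-in predictand

U, s, VT = np.linalg.svd(X, full_matrices=False)
Z = VT[:k, :]                                # k PC time series (rows)

y_cv = np.empty(Nt)
for t in range(Nt):
    keep = np.arange(Nt) != t                # withhold year t
    b = y[keep] @ np.linalg.pinv(Z[:, keep]) # regression coefficients
    y_cv[t] = b @ Z[:, t]                    # forecast for the withheld year

corr = np.corrcoef(y, y_cv)[0, 1]            # cross-validated skill
var_ratio = y_cv.var() / y.var()             # forecast/climatological variance
print(corr, var_ratio)
```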

Fig. 1. Constructed analog (CA) and principal component regression (PCR) forecasts along with observations (obs) of the three-month-average Niño-3.4 index. Forecasts are made at the beginning of July and extend through April–June of 2007. The numbers in the legend indicate the number of EOFs retained.

Table 1. Skill and ratio of forecast to climatological variance of in-sample and leave-one-out cross-validated (CV) forecasts made at the beginning of July for the following March–May average (lead 8) of the Niño-3.4 index during the period 1955–2003.

6. Nonlinear CA

Our demonstration that CA forecasts are identical to LR forecasts depends on the weights being defined as the solution of the least squares problem in (3), and a particular solution being chosen in the underdetermined case. Other characterizations of the weights lead to quite different methods. Before considering other methods of computing weights, we examine the properties of the CA weights in more detail, focusing on the case when $N_x < N_t$. In this case, (3) does not have a unique solution, and using the pseudoinverse or ridge regression selects a particular solution for the weights. The simple minimizer weights are
$$\mathbf{a} = \mathbf{X}^+\mathbf{x}_0 = \hat{\mathbf{V}}\hat{\mathbf{S}}^{-1}\hat{\mathbf{U}}^T\mathbf{x}_0. \tag{22}$$
The form of (22) means that for any $\mathbf{x}_0$, the vector $\mathbf{a}$ of weights is a linear combination of the columns of $\hat{\mathbf{V}}$. Since the columns of $\hat{\mathbf{V}}$ span the same linear space as the rows of $\mathbf{X}$, the weights are a linear combination of the rows of $\mathbf{X}$. In other words, for some $N_x \times 1$ vector $\mathbf{b}$,
$$\mathbf{a} = \mathbf{X}^T\mathbf{b}; \tag{23}$$
in particular, $\mathbf{b} = \hat{\mathbf{U}}\hat{\mathbf{S}}^{-2}\hat{\mathbf{U}}^T\mathbf{x}_0$, and in the case that the predictors are PCs, $\mathbf{b} = \mathbf{x}_0$. Equation (23) means that the weights, viewed as a function of the data, lie on a hyperplane perpendicular to the $(N_x + 1) \times 1$ vector $[\mathbf{b}^T, -1]^T$. Because the weights are linear functions of the data, data with values near $\mathbf{x}_0$ do not receive the largest weights, nor do data far from $\mathbf{x}_0$ receive the smallest weights. The CA weights do not measure the distance of $\mathbf{x}_0$ to the training data values. In particular, if $\mathbf{x}_0$ is a "natural analog" and has the same values as a column of $\mathbf{X}$, the weights are not concentrated on that column of $\mathbf{X}$.
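A small sketch (synthetic data) makes the natural-analog point concrete: even when x0 duplicates a training column, the minimum-norm weights (22) are spread across all times while still fitting x0 exactly.

```python
import numpy as np

# Even when x0 exactly matches one training state (a "natural analog"),
# the minimum-norm CA weights are spread across all times rather than
# concentrated on the matching column.  Here Nx < Nt.
rng = np.random.default_rng(6)
X = rng.standard_normal((3, 12))
x0 = X[:, [7]].copy()                 # x0 equals training column 7

a = np.linalg.pinv(X) @ x0            # Eq. (22)
print(np.round(a.ravel(), 2))         # weight on column 7 is typically not dominant
print(np.allclose(X @ a, x0))         # yet the fit is exact: X a = x0
```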
Modifying the definition of the CA weights so that they are a function of the distance between the data and $\mathbf{x}_0$ results in nonlinear statistical prediction algorithms with weights that depend nonlinearly on the data. Importantly, in the case when $N_x < N_t$, such a modification requires neither changing the least squares problem in (3) nor the forecast equation in (4), but rather involves constructing alternative solutions to (3), that is, ones without the constraint that $\|\mathbf{a}\|^2$ be minimized. For instance, in the k-nearest neighbors (KNN) algorithm, the elements of the weight vector are all zero except for those corresponding to the k columns of $\mathbf{X}$ that are closest to $\mathbf{x}_0$, which have value 1/k (Hastie et al. 2009). Explicitly, the ith KNN weight is
$$a_i = \begin{cases} 1/k, & \mathbf{x}_i \in C_k(\mathbf{x}_0) \\ 0, & \text{otherwise}, \end{cases} \tag{24}$$
where $C_k(\mathbf{x}_0)$ is the set of k columns of $\mathbf{X}$ nearest to $\mathbf{x}_0$. The KNN prediction is the average of the columns of $\mathbf{Y}$ corresponding to the k columns of $\mathbf{X}$ nearest to $\mathbf{x}_0$. Kernel methods generalize KNN by using weights that are a smoothly decreasing function of the distance between the columns of $\mathbf{X}$ and $\mathbf{x}_0$. In particular, the ith kernel smoother (KS) weight is
$$a_i = \frac{K(\mathbf{x}_i, \mathbf{x}_0, \lambda)}{\sum_{j=1}^{N_t} K(\mathbf{x}_j, \mathbf{x}_0, \lambda)}, \tag{25}$$
where the kernel function $K(\mathbf{x}, \mathbf{x}_0, \lambda)$ is a smoothly decreasing, positive function of the distance between $\mathbf{x}$ and $\mathbf{x}_0$, and $\lambda$ is a parameter that determines how quickly the kernel function decreases to zero. Local linear regression (LLR) is another kernel method; it computes the weights using generalized least squares, with data close to $\mathbf{x}_0$ receiving more emphasis. Specifically,
$$\mathbf{a} = \mathbf{W}\mathbf{X}^T(\mathbf{X}\mathbf{W}\mathbf{X}^T)^{-1}\mathbf{x}_0, \tag{26}$$
where $\mathbf{W}$ is an $N_t \times N_t$ diagonal matrix that depends on $\mathbf{x}_0$ and whose ith diagonal entry is
$$W_{ii} = K(\mathbf{x}_i, \mathbf{x}_0, \lambda). \tag{27}$$
We applied these methods to 30 samples of univariate data generated by
$$y = x^2 + \epsilon, \tag{28}$$
where x and ε are Gaussian distributed with mean zero and unit variance. A row of ones is included in $\mathbf{X}$ to account for a possible intercept term. Figure 2a shows that LR–CA fails to capture the nonlinear relation. The KNN fit with k = 5 is noisy and piecewise constant with discontinuities. A Gaussian kernel smoother (GKS) with a standard deviation of 0.35 and LLR (with the same Gaussian kernel) give similar results, with LLR showing an advantage near the boundaries of the data. It is important to note that the performance of KNN depends on the choice of k, while the performance of the GKS and LLR depends on the kernel parameter λ. Here we have selected fairly arbitrary values for these parameters that give good performance. However, like the regression coefficients, these parameters should be chosen objectively in a way that avoids overfitting.
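The following sketch implements the weight definitions (24)–(27) on a toy version of this experiment; the quadratic relation used to generate the data is an assumption standing in for (28), and the kernel width and k are the fairly arbitrary values quoted above:

```python
import numpy as np

rng = np.random.default_rng(7)
Nt = 30
x = rng.standard_normal(Nt)                    # univariate predictor
y = x**2 + rng.standard_normal(Nt)             # assumed nonlinear truth plus noise
X = np.vstack([np.ones(Nt), x])                # row of ones for the intercept

def knn_weights(x0, k=5):
    """KNN weights, Eq. (24): 1/k on the k nearest samples, else 0."""
    a = np.zeros(Nt)
    a[np.argsort(np.abs(x - x0))[:k]] = 1.0 / k
    return a

def gauss_kernel(x0, lam=0.35):
    return np.exp(-0.5 * ((x - x0) / lam) ** 2)

def gks_weights(x0, lam=0.35):
    """Kernel smoother weights, Eq. (25): normalized kernel values."""
    kvals = gauss_kernel(x0, lam)
    return kvals / kvals.sum()

def llr_weights(x0, lam=0.35):
    """Local linear regression weights, Eqs. (26)-(27)."""
    W = np.diag(gauss_kernel(x0, lam))
    x0vec = np.array([1.0, x0])                # intercept entry plus predictor value
    return W @ X.T @ np.linalg.solve(X @ W @ X.T, x0vec)

# Each method forecasts y at x0 as a weighted sum of past y values.
x0 = -0.5
for name, a in [("KNN", knn_weights(x0)),
                ("GKS", gks_weights(x0)),
                ("LLR", llr_weights(x0))]:
    print(name, y @ a)
```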
Fig. 2. (a) Data (plus signs) generated by (28) fit by linear regression (LR)–constructed analog (CA), k-nearest neighbors (KNN), Gaussian kernel smoother (GKS), and local linear regression (LLR). The "truth" curve is the expected value of y given x. (b) The CA, KNN, GKS, and LLR weights for x0 = −0.5. The LLR weights are divided by 4 for display purposes.

The CA, KNN, GKS, and LLR weights are quite different for x0 = −0.5 as shown in Fig. 2b. The sum of the weights is one for all methods due to the intercept term. A clear feature of the CA weights is that they are a linear function of the data values and display no maximum near x0. This behavior is general as discussed earlier. The KNN weights are zero except for the five data points nearest to x0 where they are ⅕. The GKS weights have largest values near x0 and decrease to zero as the distance to x0 increases. The LLR weights are locally linear near x0 with values that go to zero far from x0.

7. Summary and discussion

While the constructed analog (CA) statistical forecast method has previously been described as having properties that are distinct from those of linear regression (LR; Van den Dool 2006), we have shown here that, with comparable treatment of the data, CA and LR produce identical forecasts, and therefore the properties of CA are the same as those of LR. In particular, CA forecasts are linear functions of the predictors and subject to overfitting. When EOF truncation is used in the CA calculation, the resulting forecast is the same as that given by principal component regression (PCR) based on the same EOFs. Likewise, using ridging in the calculation of CA weights results in the same forecast as does ridge regression.

These results were illustrated in an example where sea surface temperature was used to predict the Niño-3.4 index. The CA and PCR forecasts based on the same number of PCs were identical. When many PCs were used, the forecasts showed high variance, even at long leads, but low cross-validated skill, a symptom of overfitting. The equivalence between LR and CA depends on the precise definition of the weights. Allowing the weights to depend nonlinearly on the data leads naturally to generalizations of CA such as kernel smoothers and local linear regression, which we have illustrated with an example.

In practice, LR forecasts are observed to differ from CA forecasts. Moreover, forecasts from different implementations of LR also differ. For instance, LR-based statistical forecasts of ENSO, including CA, have quite different properties (Barnston et al. 2012). Use of distinct datasets may explain some of these differences. However, it must be recognized that many linear regression forecasts, with significant variations in skill, can be constructed from a given dataset of predictors and predictands. There are two primary sources of this variety. First, the predictors or predictands can be truncated, and the regression developed on the truncated data. Principal component analysis and canonical correlation analysis are commonly used methods for truncating the data that enter a LR. The resulting forecasts depend on the truncation choices, as illustrated here in the Niño-3.4 example, where the forecasts depend strongly on the number of principal components retained as predictors. Linear inverse models and autoregressive methods usually project both the predictors and predictands onto EOFs (DelSole and Chang 2003); CA generally projects only the predictors, thus leading to different forecasts. Second, there are a variety of methods for estimating the LR coefficients. In addition to the classic least squares method, there are shrinkage methods such as ridge regression and the lasso (Hastie et al. 2009). CA often uses ridging; PCR does not, again leading to different forecasts. Appropriate choices of data truncation and coefficient estimation method are key to developing a skillful LR forecast.

Acknowledgments

The authors thank Huug van den Dool for his generous and helpful comments, and two anonymous reviewers for their useful suggestions. MKT is supported by grants from the National Oceanic and Atmospheric Administration (Grants NA05OAR4311004 and NA08OAR4320912) and the Office of Naval Research (Grant N00014-12-1-0911). TD gratefully acknowledges support from grants from the NSF (Grant 0830068), the National Oceanic and Atmospheric Administration (Grant NA09OAR4310058), and the National Aeronautics and Space Administration (Grant NNX09AN50G). The views expressed herein are those of the authors and do not necessarily reflect the views of NOAA or any of its subagencies.

REFERENCES

• Barnston, A. G., M. K. Tippett, M. L. L'Heureux, S. Li, and D. G. DeWitt, 2012: Skill of real-time seasonal ENSO model predictions during 2002–2011. Is our capability increasing? Bull. Amer. Meteor. Soc., 93, 631–651.
• DelSole, T., 2007: A Bayesian framework for multimodel regression. J. Climate, 20, 2810–2826.
• DelSole, T., and P. Chang, 2003: Predictable component analysis, canonical correlation analysis, and autoregressive models. J. Atmos. Sci., 60, 409–416.
• Golub, G. H., and C. F. Van Loan, 1996: Matrix Computations. 3rd ed. The Johns Hopkins University Press, 694 pp.
• Hansen, P., 1998: Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion. Society for Industrial and Applied Mathematics, 247 pp.
• Hastie, T., R. Tibshirani, and J. Friedman, 2009: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 768 pp.
• Hawkins, E., J. Robson, R. Sutton, D. Smith, and N. Keenlyside, 2011: Evaluating the potential for statistical decadal predictions of sea surface temperatures with a perfect model approach. Climate Dyn., 37, 2495–2509.
• Kaplan, A., M. A. Cane, Y. Kushnir, A. C. Clement, M. B. Blumenthal, and B. Rajagopalan, 1998: Analyses of global sea surface temperature 1856–1991. J. Geophys. Res., 103 (C9), 18,567–18,589.
• Krueger, O., and J.-S. Von Storch, 2011: A simple empirical model for decadal climate prediction. J. Climate, 24, 1276–1283.
• Maurer, E. P., and H. G. Hidalgo, 2008: Utility of daily vs. monthly large-scale climate data: An intercomparison of two statistical downscaling methods. Hydrol. Earth Syst. Sci., 12, 551–563.
• Penland, C., and T. Magorian, 1993: Prediction of Niño-3 sea surface temperatures using linear inverse modeling. J. Climate, 6, 1067–1076.
• Robertson, A. W., J.-H. Qian, M. K. Tippett, V. Moron, and A. Lucero, 2012: Downscaling of seasonal rainfall over the Philippines: Dynamical versus statistical approaches. Mon. Wea. Rev., 140, 1204–1218.
• Smith, T. M., and R. W. Reynolds, 2004: Improved extended reconstruction of SST (1854–1997). J. Climate, 17, 2466–2477.
• Van den Dool, H., 1994: Searching for analogues, how long must we wait? Tellus, 46A, 314–324.
• Van den Dool, H., 2006: Empirical Methods in Short-Term Climate Prediction. Oxford University Press, 240 pp.
• Van den Dool, H., J. Huang, and Y. Fan, 2003: Performance and analysis of the constructed analogue method applied to U.S. soil moisture over 1981–2001. J. Geophys. Res., 108, 8617, doi:10.1029/2002JD003114.