1. Introduction
In meteorology, oceanography, and other fields, it is often necessary to check whether two quantities x and y are linearly related. Usually this involves the calculation of a correlation coefficient and a regression coefficient. Both coefficients are useful; the sample correlation coefficient
In practice, “noise” reduces the correlation coefficient and affects the accuracy of the regression coefficient. Noise can be due to measurement error or to different physical processes in x and y that are extraneous to the linear relationship common to both. While measurement error can sometimes be estimated for each variable, error due to possible physical influences in the real data is often not known. So there is a large class of regression problems in meteorology, oceanography, and other fields in which the signal-to-noise ratio in both variables is unknown.
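To make these effects concrete, the sketch below records the standard errors-in-variables result, written with nx and ny denoting noise-to-signal variance ratios (a labeling adopted here only for this illustration): noise in x biases the ordinary least squares slope toward zero, and noise in either variable lowers the squared correlation.

```latex
% Illustrative errors-in-variables sketch (notation introduced here):
%   x = X + \epsilon,   y = \alpha X + \delta,
% with X, \epsilon, \delta mutually uncorrelated and noise-to-signal
% variance ratios n_x = var(\epsilon)/var(X), n_y = var(\delta)/[\alpha^2 var(X)].
\[
  \rho_{xy}^{2} \;=\; \frac{\operatorname{cov}(x,y)^{2}}{\operatorname{var}(x)\,\operatorname{var}(y)}
               \;=\; \frac{1}{(1+n_x)(1+n_y)},
  \qquad
  \alpha_{\mathrm{OLS}} \;=\; \frac{\operatorname{cov}(x,y)}{\operatorname{var}(x)}
               \;=\; \frac{\alpha}{1+n_x}.
\]
```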















Many methods (Ricker 1973; Jolicoeur 1975; Riggs et al. 1978; McArdle 1988, 2003; Frost and Thompson 2000) have been devised to obtain a best estimate for the true regression coefficient when both variables are noisy, but such estimates generally depend on the relative size of the noise in the two variables.
In this paper, we overcome this difficulty by noting that when the relative size of the noise for each variable is unknown, we have no basis for choosing between the variables. Consequently, we must assume that the noise in each variable is equally likely, subject to the constraint that the sample correlation coefficient for the given set of data is known. This equal-likelihood noise assumption, subject to the known sample correlation coefficient, enables us to determine explicitly the confidence intervals for the true regression coefficient.
The rest of the paper is organized as follows: in the next section we discuss an unbiased estimate for the true regression coefficient when nothing is known about the noise in each variable. Then, in section 3, we examine the probability density function for α/αGMR in the limiting large-M case. Section 4 obtains confidence intervals for the true regression coefficient for finite M, section 5 applies the results to an example, and section 6 consists of concluding remarks.
2. An unbiased estimate for the true regression coefficient
Consider the problem of estimating the true regression coefficient between x and y when nothing is known or assumed about the noise in x and y except for the constraint provided by the sample correlation coefficient. One estimate for the true regression coefficient α of y on x is



















Note that
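As a numerical companion to this section, the short Python sketch below computes the sample correlation coefficient and a geometric mean regression (GMR) slope from M data pairs. The function name is ours, and the slope formula used, sgn(r̂) s_y/s_x, is the standard GMR estimator (the geometric mean of the y-on-x OLS slope and the reciprocal of the x-on-y OLS slope); it is offered as an illustration of how αGMR can be computed in practice, not as a transcription of the equations above.

```python
import numpy as np


def gmr_fit(x, y):
    """Return the sample correlation coefficient and the GMR slope.

    The slope is sign(r_hat) * s_y / s_x, the standard geometric mean
    regression estimator (assumed here to correspond to alpha_GMR).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    r_hat = np.corrcoef(x, y)[0, 1]
    alpha_gmr = np.sign(r_hat) * np.std(y, ddof=1) / np.std(x, ddof=1)
    return r_hat, alpha_gmr


# Illustrative use with synthetic data: true slope 2, noise in both variables.
rng = np.random.default_rng(1)
X = rng.normal(size=200)
x = X + 0.5 * rng.normal(size=200)
y = 2.0 * X + 1.0 * rng.normal(size=200)
print(gmr_fit(x, y))
```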
3. The probability density function for α/αGMR for the limiting M → ∞ case

Given M data pairs, we can calculate both a correlation coefficient












(Fig. 1 caption) The hyperbolic curve segments defined by (18).
Note that other assumptions about the distribution of nx and ny along the curve (18) are not justifiable. For example, if we assume nx is uniformly distributed along the curve, then because of the hyperbolic form of (18), ny is not uniformly distributed along the curve. Different distributions would be inappropriate, since we have no way of distinguishing the noise in x and y; all we have is the knowledge that the noise pair (nx, ny) lies somewhere along the curve defined by (18) and illustrated in Fig. 1.
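For orientation, one explicit curve of this kind, written under the assumption (used only for this illustration) that nx and ny are noise-to-signal variance ratios, is the hyperbola

```latex
% One explicit constraint curve of this type, assuming n_x and n_y are
% noise-to-signal variance ratios and \hat{r} is the sample correlation:
\[
  (1+n_x)(1+n_y) \;=\; \frac{1}{\hat{r}^{\,2}},
  \qquad n_x \ge 0,\; n_y \ge 0,
\]
% a hyperbolic segment in the (n_x, n_y) plane running between the
% endpoints (0, \hat{r}^{-2}-1) (all of the noise in y) and
% (\hat{r}^{-2}-1, 0) (all of the noise in x).
```

The normalization of (18) in the text may differ, but the qualitative geometry is as described above: a monotonic hyperbolic segment whose endpoints place all of the noise in one variable or the other.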




























(Figure caption) The length of the hyperbolic curves.
While it is useful to have obtained results for the limiting large M case, in practice we are usually faced with the problem of determining α given a finite number M of hard-won data points and finite M estimates
4. Confidence intervals for the true regression coefficient for finite M






















We found the required confidence intervals numerically from (43) and (44), sampling the zero-mean, unit-variance variables X*, ɛ*, and δ* from independent standard normal distributions. The numerical calculation proceeds by first obtaining a random r from a uniform distribution over the interval [−1, 1]. For that r we then randomly sample a point (nx, ny) from a uniform distribution along the curve AB corresponding to r². With nx and ny in hand, we use the independent normal distributions of X*, ɛ*, and δ* to obtain M random samples of x* and y* from (43) and (44) and hence obtain the correlation coefficient and regression coefficient estimate for that realization.
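The following Python sketch illustrates this Monte Carlo procedure. The explicit constraint curve and generating equations in it are our illustrative readings of (18) and of (43)–(44), namely x* = X* + √nx ɛ* and y* = αX* + √ny δ* with (1 + nx)(1 + ny) = 1/r²; the published equations may be normalized differently, the uniform arc-length sampling is done by numerical interpolation, and M = 26 simply mirrors the example of section 5.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_noise_pair(r2, rng, n_grid=2001):
    """Draw (n_x, n_y) uniformly by arc length along the constraint curve.

    Assumed (illustrative) form of the curve:
        (1 + n_x)(1 + n_y) = 1 / r2,   n_x >= 0, n_y >= 0,
    whose endpoints place all of the noise in one variable or the other.
    """
    n_max = 1.0 / r2 - 1.0
    nx = np.linspace(0.0, n_max, n_grid)
    ny = 1.0 / (r2 * (1.0 + nx)) - 1.0
    # Cumulative arc length along a polyline approximation of the curve.
    s = np.concatenate(([0.0], np.cumsum(np.hypot(np.diff(nx), np.diff(ny)))))
    u = rng.uniform(0.0, s[-1])          # uniform position along the curve
    nx_u = np.interp(u, s, nx)           # invert arc length -> n_x
    ny_u = 1.0 / (r2 * (1.0 + nx_u)) - 1.0
    return nx_u, ny_u


def synthetic_pairs(M, alpha, nx, ny, rng):
    """M pairs (x*, y*) from an assumed reading of (43)-(44):
        x* = X* + sqrt(n_x) eps*,   y* = alpha X* + sqrt(n_y) del*,
    with X*, eps*, del* independent zero-mean, unit-variance normals."""
    X = rng.normal(size=M)
    x = X + np.sqrt(nx) * rng.normal(size=M)
    y = alpha * X + np.sqrt(ny) * rng.normal(size=M)
    return x, y


M = 26                                    # number of data pairs (cf. section 5)
records = []
for _ in range(100_000):
    r = rng.uniform(-1.0, 1.0)
    if r * r < 1.0e-6:                    # avoid overflow of 1/r^2; negligible mass
        continue
    nx, ny = sample_noise_pair(r * r, rng)
    alpha = np.sign(r)                    # true coefficient for standardized variables
    x, y = synthetic_pairs(M, alpha, nx, ny, rng)
    r_hat = np.corrcoef(x, y)[0, 1]
    alpha_gmr_hat = np.sign(r_hat) * np.std(y, ddof=1) / np.std(x, ddof=1)
    records.append((r_hat, alpha / alpha_gmr_hat))
records = np.array(records)

# 95% interval for alpha / alpha_GMR-hat, conditional on r_hat near an observed value.
r_obs = 0.6
sel = np.abs(records[:, 0] - r_obs) < 0.01
low, high = np.percentile(records[sel, 1], [2.5, 97.5])
print(f"95% interval for alpha/alpha_GMR given r_hat ~ {r_obs}: ({low:.2f}, {high:.2f})")
```

Conditioning the simulated realizations on r̂ near an observed value in this way is what produces the upper (U) and lower (L) confidence limits discussed below.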







Since we have an analytical solution for

(Figure caption) Upper (U) and lower (L) 95% confidence interval limits.
5. An example


Bunge and Clarke (2009) tested the validity of (46) using monthly pressure and wind stress data from 1978 to 2003 inclusive. Since the data are autocorrelated, the number of degrees of freedom in the data is not the number of months of data, but rather the number of years of data because El Niño time series can be thought of as independent 12-month segments [see, e.g., Fig. 2.14 of Clarke (2008)]. As there are 26 yr of data, M = 26. Also, Bunge and Clarke found from the correlation of the time series in (46) and their standard deviations that


Note that the above confidence interval takes into account both the finite number of points M and our uncertainty about the noise. By comparison, the standard ordinary least squares regression of the left-hand side of (46) on the right-hand side gives a 95% confidence interval of (138 m, 204 m). This interval is smaller than that in (47), but it is a confidence interval for the M =
6. Concluding remarks
Two reviewers’ comments made us think that we should point out here the difference between linear prediction and our goal of estimating the true regression coefficient
The preceding is related to, but separate from, our goal of finding the true regression coefficient α between the variables Y and X given noisy realizations y and x as stated in (2) and (3). In our case, if it is known that the “noise”-to-signal ratio in at least one of the variables x and y (say x) is small, then the ordinary least squares regression coefficient is nearly unbiased and can be used [see (9) with small nx]. But when the noise-to-signal ratio is unknown for both variables, the ordinary least squares regression coefficient is biased. In that case the confidence intervals that are calculated for
We thank F. Huffer for helpful comments on an early version of our paper and the National Science Foundation for financial support (Grants OCE-0850749 and OCE-1155257).
REFERENCES
Barker, F., Y. C. Soh, and R. J. Evans, 1988: Properties of the geometric mean functional relationship. Biometrics, 44, 279–281.
Bunge, L., and A. J. Clarke, 2009: A verified estimation of the El Niño index Niño-3.4 since 1877. J. Climate, 22, 3979–3992.
Clarke, A. J., 2008: An Introduction to the Dynamics of El Niño & the Southern Oscillation. Academic Press, 324 pp.
Emery, W. J., and R. E. Thomson, 2001: Data Analysis Methods in Physical Oceanography. 2nd rev. ed. Elsevier, 638 pp.
Frost, C., and S. G. Thompson, 2000: Correcting for regression dilution bias: Comparison of methods for a single predictor variable. J. Roy. Stat. Soc., A163, 173–189.
Garrett, C., and B. Petrie, 1981: Dynamical aspects of the flow through the Strait of Belle Isle. J. Phys. Oceanogr., 11, 376–393.
Jolicoeur, P., 1975: Linear regressions in fisheries research: Some comments. J. Fish. Res. Board Canada, 32, 1491–1494.
Kendall, M. G., and A. Stuart, 1973: The Advanced Theory of Statistics. 3rd ed. Vol. 2, Griffin, 723 pp.
McArdle, B. H., 1988: The structural relationship: Regression in biology. Can. J. Zool., 66, 2329–2339.
McArdle, B. H., 2003: Lines, models, and errors: Regression in the field. Limnol. Oceanogr., 48, 1363–1366.
Ricker, W. E., 1973: Linear regressions in fishery research. J. Fish. Res. Board Canada, 30, 409–434.
Ricker, W. E., 1975: A note concerning Professor Jolicoeur’s comments. J. Fish. Res. Board Canada, 32, 1494–1498.
Riggs, D. S., J. A. Guarnieri, and S. Addelman, 1978: Fitting straight lines when both variables are subject to error. Life Sci., 22, 1305–1360.
Sokal, R. R., and F. J. Rohlf, 1995: Biometry: The Principles and Practice of Statistics in Biological Research. 3rd ed. W. H. Freeman and Co., 887 pp.
Sprent, P., and G. R. Dolby, 1980: The geometric mean functional relationship. Biometrics, 36, 547–550.