Abstract

Climate “normals” are statistical estimates of present and/or near-future climate means for such quantities as seasonal temperature or precipitation. In a changing climate, simply averaging a large number of previous years of data may not be the best method for estimating normals. Here eight formulations for climate normals, including the recently proposed “hinge” function, are compared in artificial- and real-data settings. Although the hinge function is attractive conceptually for representing accelerating climate changes simply, its use is in general not yet justified for divisional U.S. seasonal temperature or precipitation. Averages of the most recent 15 and 30 yr have performed better during the recent past for U.S. divisional seasonal temperature and precipitation, respectively; these averaging windows are longer than those currently employed for this purpose at the U.S. Climate Prediction Center.

1. Introduction

Climate “normals” are statistical estimates of present and/or near-future climate means for such quantities as monthly or seasonal temperature or precipitation. These estimates play a large role in societal perceptions of climate (Hulme et al. 2009), are used in design and planning settings of economic significance (e.g., Dixon and Shulman 1984; Kunkel and Court 1990), and provide simple, empirical forecasts at lead times of nearly 1 year and longer (Huang et al. 1996; O’Lenic et al. 2008; Livezey and Timofeyeva 2008; van den Dool 2007). Indeed, climate normals may provide the only appreciable prediction skill for seasonal forecasts in some circumstances (Livezey and Timofeyeva 2008), and their use as a baseline to anchor seasonal forecast anomalies corresponding, for example, to ENSO predictability has also been advocated (Livezey et al. 2007).

Climate normals are conventionally estimated as averages over some number of the most recently observed annual values. In a purely stationary climate, where the underlying climatic data-generating process is unchanging through time, the best averaging period will be the full length of the climatic record, unless problems such as measurement or reporting errors in earlier years result in older data being less reliable or data inhomogeneities (Menne et al. 2009; Williams et al. 2012) yield inconsistent estimates for the normals. To the extent that climate means are not stationary, shorter averaging periods will yield more accurate estimates. The averaging period recommended by the World Meteorological Organization (WMO) is 30 yr, with decadal updating (Arguez and Vose 2011; van den Dool 2007), although annually updated 30-yr normals are expected to be more representative of conditions at the end of, and after, the averaging period (Arguez and Vose 2011; Livezey et al. 2007), even when changes in the underlying climate means are modest and slow. Shorter annually updated averaging periods are used operationally at the U.S. Climate Prediction Center (CPC) for forecasts of seasonally averaged temperature (10 yr) and seasonal total precipitation (15 yr) (Huang et al. 1996; O’Lenic et al. 2008).

Various studies have considered the problem of optimizing the averaging period for climate normals by finding that period yielding best hindcasts for the immediately subsequent year (Beaumont 1957; Dixon and Shulman 1984; Enger 1959; Huang et al. 1996; Lamb and Changnon 1981; Sabin and Shulman 1985), with the result now often called the optimal climate normal (OCN; Huang et al. 1996). The maximum averaging period usually considered is 30 yr, in which case these predictions can be optimized over n − 30 target years in an n-year training sample. This fitting procedure may yield spurious “artificial skill” because many candidate averaging periods are considered within the same dataset used to test them, so that such results must be treated cautiously (Wilks 1996).

In a provocative paper, Livezey et al. (2007) argue that more sophisticated methods of defining climate normals are now necessary because of recent and ongoing climate changes, especially for temperature normals. In particular they observe that the character of recent temperature nonstationarity is not a simple trend or a low-frequency quasiperiodicity, but instead often appears to be a quasi-stationary period until approximately 1975, followed by roughly linearly increasing means since that time. They propose defining climate normals consistent with this pattern using a two-phase regression model (e.g., Lund and Reeves 2002; Percival and Rothrock 2005; Solow 1987) with zero slope in the earlier period, which they call a “hinge” function.

The solid piecewise linear function in Fig. 1a illustrates the hinge model for annually averaged temperatures over the conterminous United States, computed as unweighted averages of the 102 approximately equal-area “megadivisions” described in appendix A of Livezey et al. (2007). Figure 1 shows data for 1931–2011, because the underlying divisional data for earlier years include regression estimates (Guttman and Quayle 1996). The clear warming trend from the mid-1970s onward is captured by the upward slope of the hinge function after 1977, which is the fitted change point. Figure 1b shows the corresponding data for annual precipitation, where the gradually increasing mean represented by the least squares linear trend (solid line) appears to provide an adequate description of the nonstationarity in the mean of the spatially aggregated annual U.S. precipitation climate. Of course the annually and nationally aggregated results in Fig. 1 are not equally representative of all divisions and seasons, as described in appendix A of Livezey et al. (2007) for these data through 2005.

Fig. 1.

Time series of annually averaged U.S. (a) temperatures and (b) precipitation for 1931–2011. The solid piecewise linear function in (a) is the hinge function [Eq. (6)] with the change point at c = 1977, fitted using data through 2010 (circles). The dashed piecewise linear function is the fitted continuous two-phase regression [Eq. (8)] with the change point at 1972. The solid line in (b) is the ordinary least squares trend [Eq. (5)].

Fig. 1.

Time series of annually averaged U.S. (a) temperatures and (b) precipitation for 1931–2011. The solid piecewise linear function in (a) is the hinge function [Eq. (6)] with the change point at c = 1977, fitted using data through 2010 (circles). The dashed piecewise linear function is the fitted continuous two-phase regression [Eq. (8)] with the change point at 1972. The solid line in (b) is the ordinary least squares trend [Eq. (5)].

Following an analysis of synthetic data based on an underlying hinge-climatological generating process, Livezey et al. (2007) suggest that hinge extrapolations be used for climate normals when their trends are sufficiently strong and that OCN-based normals should be used otherwise. They did not quantitatively specify when these two approaches should be used, however, or examine the performance of their proposed protocol using real data. The study presented here examines this and other approaches to defining normals using the U.S. megadivisional data in section 4. Prior to that, section 2 formally defines the hinge and other candidate functions for defining climate normals, and the performance of these candidates in an artificial-data setting will be examined in section 3. Section 5 will conclude with discussion and recommendations for climate normals in both the present and potential future stages of the changing climate.

2. Alternative climate-normal formulations

a. Fixed averaging periods

The conventional default estimator for climate normals, both here and in general practice, is the arithmetic average over a previous k = 30-yr period. Let yi, i = 1, … , n, denote the data of interest (e.g., average temperature or total precipitation, for a given month, season, or annual period) in year i. The k = 30-yr normal for target year i is then

 
formula

Equation (1) indicates annual recomputation of normals rather than the decadal updating recommended by WMO. Nevertheless, use of Eq. (1) with k = 30 (i.e., ) will be referred to as WMO30 in the following. Necessarily, n > 30 here and in the following, so that k = 30-yr averages can be considered.

The shorter averaging periods k = 10 and k = 15 are used with Eq. (1) by CPC to define climate normals for seasonal temperature and precipitation, respectively, which are applied as simple forecast tools for representing the effects of recent climate nonstationarity (Huang et al. 1996; O’Lenic et al. 2008). These fixed averaging periods were arrived at following examination of geographically and seasonally varying OCN averaging windows and compromising on constant values to simplify and regularize operational forecasting procedures (Huang et al. 1996). Here, use of Eq. (1) with k = 10 () and k = 15 () will be referred to as CPC10 and CPC15, respectively.

b. Optimal climate normals

Equation (1) can be optimized within an n-year training-data sample by choosing that averaging period k* that minimizes the mean-square error (MSE):

 
formula

over 1 ≤ k ≤ 30. The best averaging period k* is chosen as the one that produces the smallest mean-square error over all of the M = n − 30 available training-data windows of length 30. The resulting OCN estimate is . That is, each of the M data values yi, 31 ≤ in, is predicted by the average of the k* most recent values preceding it.

In a nonstationary climate for which the underlying mean is increasing or decreasing linearly throughout the training-data record, each of the M training-data windows is statistically equivalent, and optimization of the averaging period k* can be expressed quantitatively as a compromise between random averaging error (which decreases with increasing k) and systematic trend bias (which increases with increasing k) (Livezey et al. 2007). Indeed, Livezey et al. (2007) note that the resulting optimal k* could be used to define the averaging period k in Eq. (1), and they provide a table illustrating how this k* varies as a function of the (assumed known) recent climate trend, and the magnitude of the climate noise controlling the averaging error.

If the character of the underlying climate nonstationarity is nonlinear, however, the OCN training-data windows have different statistical characteristics. If the underlying climate change is accelerating (e.g., if the true climate mean is increasing at a faster rate in later years), as approximated for example by the solid hinge-function mean trend in Fig. 1a, then the less recent windows will yield smaller errors for longer averaging periods, which will tend to dilute the tendency for the more recent data windows (which should be more representative, on average, of the current and near-future climate) to yield shorter averaging periods. In a similar way, if the underlying climate change is decelerating, then the less recent windows will dilute the tendency of the more recent windows to yield longer averaging periods.

Consider the possibility of computing OCNs using only the most recent m of the available M = n − 30 training-data windows. For each m, a different optimal averaging length k*(m) may be computed, which minimizes

 
formula

and which yields the OCN estimates , 1 ≤ mM. For small m the estimates will tend to be more representative of ongoing nonlinear climate changes on average (will have lower bias) but will be subject to large sampling variations (high variance) because they will be based on shorter training series. For larger m, and in the limit of m = M = n − 30, the sampling variations will be minimized by the maximum training-data size, but on average the resulting will be less representative of the more recent climate nonstationarity.

The sequence of OCN estimates form a series that will tend to increase with decreasing m if the underlying climate change is accelerating. Jointly, they can be used to estimate an extrapolated OCN, as the intercept of the regression

 
formula

This estimate will be called the M-fold optimal climate normal, or OCNM:

 
formula

Although the smaller-m estimates will be more representative on average, they will be subject to more sampling variability, so that the intercept a = OCNM in Eq. (4) should be estimated using weighted regression, as detailed in  appendix A.

c. Regression methods

The remaining methods to be considered here for estimating nonstationary climate normals involve extrapolating linear regressions into the near future. The first is the trend estimate defined by a simple least squares linear regression estimated using the full n-member training-data sample,

 
formula

where the predictors i consist of the integers from 1 to n and the ɛi are the regression residuals. An example is the linear trend line shown in Fig. 1b for annual precipitation averaged over the conterminous United States. Here for example the training data (circles) are the n = 80 years 1931–2010, and extrapolating the regression line yields the estimated normal of = 814 mm for 2011. The observed average precipitation for 2011 (the “X” in the figure) was 769 mm.

Livezey et al. (2007) propose extrapolating climate normals using the hinge regression

 
formula

where τi = ic, c is the hinge point or change point, and I() is the indicator function whose value is 1 if its argument is true and is 0 otherwise. The subscripts on the regression parameters indicate that they pertain to that portion of the function that occurs after the change point, with the remaining two parameters constrained to a1 = a2 and b1 = 0. Following Livezey et al. (2007), it will be convenient to nondimensionalize the regression slope, expressing it as

 
formula

where the denominator is the sample standard deviation of the regression residuals.

The solid piecewise linear function in Fig. 1a is an example hinge regression, fit to training data (circles) for the n = 80 years 1931–2010, which extrapolates to the nonstationary normal = 12.11°C for 2011. The actual average temperature for 2011 was 11.97°C (the “X” in the figure). Here β = 0.0305/0.698 = 0.0437. For a given value of the change point, the parameters of Eq. (6) are estimated by setting the first c predictors to zero and setting the last nc predictors to τi = ic, which are the integers from 1 to nc. Livezey et al. (2007) proposed fixing the change point at 1975 and considering only temperature data from 1940 onward, on the basis of large-scale diagnostics and modeling of modern climate change (e.g., Livezey and Smith 1999; Solomon et al. 2007) together with inspection of the seasonal U.S. divisional data through 2005. The performance of hinge-function extrapolations using these constraints for prediction of the 2006–11 data that were not available to Livezey et al. (2007) will be investigated below. Hinge-normal forecasts will also be made by treating the change point as an additional parameter to be estimated, as in Fig. 1a, which requires repeating the fitting procedure for each c to be considered. To avoid unstable parameter estimates for c very near the beginnings or ends of the training-data records, 5 ≤ cn − 5 will be enforced.

Equation (6) is a special case of the two-phase linear regression model (e.g., Lund and Reeves 2002; Quandt 1958), in which separate regressions are fit to the portions of a time series on either side of a change point, so that the assumption in Eq. (6) of stationary mean up to and including the change point is relaxed. Solow (1987) considered a slightly restricted version of this model, which constrains the two-phase regression function to be continuous at the change point:

 
formula

Here the intercept term for the later regression phase is a2 = a1+b1c. Again, when the change point c is treated as a free parameter it will be constrained to 5 ≤ cn − 5. The dashed piecewise linear function in Fig. 1a is an example of such a continuous two-phase regression, yielding the nonstationary normal = 12.17°C when extrapolated to 2011, which is very similar to the result obtained from Eq. (6). In addition, performance of Eq. (8) with the change point fixed at 1975, and trained only on data from 1940 onward, will also be investigated for seasonal temperatures.

When the change point c in Eqs. (6) or (8) is treated as a free parameter, the respective regressions are fitted for all allowable change points, and that parameter set is chosen as best which maximizes the log likelihood, assuming independent Gaussian residuals (Quandt 1958),

 
formula

Equation (9) is applicable also to the simple linear regression in Eq. (5), by specifying c = n, so that estimation of the parameters a2 and b2 is moot. Equation (9) can therefore be used to compare the three regression models for a given time series. For the temperature data in Fig. 1a, the log likelihoods for the regressions in Eqs. (6) and (8) are −11.26 and −3.671, respectively, so that Eq. (8) would be preferred for these data according to a likelihood ratio test (e.g., Wilks 2011). For the precipitation data plotted in Fig. 1b, the log likelihoods for Eqs. (5), (6), and (8), respectively, are 11.50, 11.86, and 13.92, suggesting that the simple linear regression is to be preferred for these data, according to both the Akaike information criterion and Bayesian information criterion (e.g., Wilks 2011) statistics.

Livezey et al. (2007) recommend using generalized least squares (i.e., accounting for residual autocorrelation) when fitting the regression models considered here, although they noted that this more elaborate estimation procedure leads to only small gains over conventional least squares. It is shown in  appendix B that autocorrelation of residuals from Eq. (6) are consistent with at most weak underlying time dependence, for seasonal (i.e., 3-month averages of temperature and 3-month totals for precipitation) 1931–2010 time series for the 102 U.S. megadivisions. Similar results (not shown) are obtained also for the residuals from Eqs. (5) and (8).

3. Best normals under hinge-function climatic nonstationarity

Livezey et al. (2007) presented theoretical analyses of various formulations for climate normals under the assumption of linear nonstationarity of the mean. A somewhat different picture emerges if a nonlinear nonstationarity is assumed, however. Simulations described in this section will assume that the underlying climate nonstationarity is in the form of the hinge function [Eq. (6)], which might be regarded as a simple piecewise linear approximation to gradually accelerating nonlinear climate changes of the kind investigated by Epstein (1982).

Figure 2 summarizes the results of 100 000 stochastic simulations each, for regularly spaced values of the parameters β and τn over the ranges indicated on the axes. For each of 61 values of β, 105 time series of length n = 101 were generated according to Eq. (6), with c = 45 and Gaussian residuals having σɛ = 1. These random series were then truncated after 46, 47, 48, … , 100 time points to yield series having τn ranging from 1 to 55. This setup might be regarded as representing annual time series beginning in 1931, with the change point occurring in 1975, which are reanalyzed each year beginning in 1976. The double-headed arrow at the bottom of the figure then indicates values of τn corresponding to change points in the range 1970–80, relative to the present time. Each of the eight candidate climate-normals formulations described in section 2 [with change point c treated as a free parameter in Eqs. (6) and (8)] was fit to each of the 105 realizations for each (β, τn) pair, and the regions indicated in Fig. 2 show that formulation yielding minimum prediction MSE.

Fig. 2.

Best (according to prediction MSE) climate-normal formulations assuming hinge-function [Eq. (6)] nonstationarity in the underlying climate means, evaluated for each parameter combination of β and τn, over 100 000 synthetic time series of length n = 101.

Fig. 2.

Best (according to prediction MSE) climate-normal formulations assuming hinge-function [Eq. (6)] nonstationarity in the underlying climate means, evaluated for each parameter combination of β and τn, over 100 000 synthetic time series of length n = 101.

For very weak trends (small |β|) and/or years near the estimated change point (small τn), the conventional WMO30 provides the best climate-normal specification. Conversely, for sufficiently large |β| and τn, Fig. 2 shows that the correct (i.e., hinge) climate normal is most accurate. Other formulations are seen to be best at intermediate values of β and τn, ranging from linear trends for weak β and large τn, progressively through CPC15, CPC10, OCNM, and OCN for progressively increasing |β| and decreasing τn. The continuous two-phase regression [Eq. (8)] is not chosen as best overall for any of the parameter combinations when the underlying climate nonstationarity is defined by Eq. (6).

The practical utility of the parameter-space regions shown in Fig. 2 is unfortunately limited by sampling variability in the model-fitting process and in particular by the need to estimate the parameters β and τn by fitting Eq. (6) to finite data series. Figure 3 illustrates the problem for 100 synthetic time series each, generated from Eq. (6) with β = 0.2 and τn = 40 (upper-right collections of small dots), β = 0.05 and τn = 30 (small dots in the middle portion of the figure, representing data series whose generating process is similar in character to that in Fig. 1a), and β = 0.05 and τn = 10 (plus signs in the middle and left portions of the figure). In each case the true underlying β and τn pairs are shown as the heavy dots, and collections of estimated parameter pairs from the 100 sample time series are shown as the small symbols.

Fig. 3.

Sample estimates for 100 (β, τn) pairs fit to synthetic time series from Eq. (6) with true underlying values β = 0.2 and τn = 40 (upper-right collections of small dots), β = 0.05 and τn = 37 (small dots in the middle portion of the figure), and β = 0.05 and τn = 10 (plus signs in the middle and left portions of the figure). In each case the true underlying β and τn are shown as the heavy dots. Dashed curves show selected isolines of t2 [Eq. (10)] for positive β. Thick gray curves reproduce boundaries from Fig. 2.

Fig. 3.

Sample estimates for 100 (β, τn) pairs fit to synthetic time series from Eq. (6) with true underlying values β = 0.2 and τn = 40 (upper-right collections of small dots), β = 0.05 and τn = 37 (small dots in the middle portion of the figure), and β = 0.05 and τn = 10 (plus signs in the middle and left portions of the figure). In each case the true underlying β and τn are shown as the heavy dots. Dashed curves show selected isolines of t2 [Eq. (10)] for positive β. Thick gray curves reproduce boundaries from Fig. 2.

For large underlying β and τn (upper-right group of points), the corresponding sample estimates are sufficiently precise that they cluster tightly around the true value, and each would correctly indicate that a hinge normal would be the appropriate choice according to the regions shown in Fig. 2 (and reproduced as the gray lines in Fig. 3). For weaker trends and/or shorter post-change-point training-data records, however, the sample estimates for β and τn can be very poor. In such cases a sample estimate for β and τn may well fall in a different region of the figure than the true (and unknown) β and τn, leading to incorrect and therefore suboptimal choices for the formulation representing the normals, even if it is known (hypothetically) that the true underlying climatic stationarity is governed by a hinge function. In all three cases there is a strong tendency for negative covariance of the estimation errors, so that underestimates of τn generally occur in conjunction with overestimates of β, and vice versa.

Assuming that the underlying climate nonstationarity is reasonably well represented by the hinge function [Eq. (6)], it should be possible [as noted by Livezey et al. (2007)] to correctly choose the hinge normal to extrapolate a given finite training time series, if the fitted hinge regression is sufficiently strong. A convenient statistic for this purpose is the square of (to accommodate also negative β) the conventional t statistic for the regression slope,

 
formula

where, as before, τn = nc. The middle equality in Eq. (10) expresses the square of the standard error of an estimated regression slope in terms of the estimated residual variance divided by the sum of squared anomalies of the predictor variables (e.g., Wilks 2011). The final equality in Eq. (10) follows from the fact that the sample variance of any τn consecutive integers is τn(τn + 1)/12 and shows that the t2 statistic will be large for sufficiently large |β| and τn. The dashed curves in Fig. 3 show four levels of t2, for positive β (only).

When fitting Eq. (6) to an individual time series, we see only sample estimates of β and τn, corresponding to the small symbols in Fig. 3, and not the true values indicated by the large dots. For sufficiently large t2, however, it can be concluded with high probability that the true (β, τn) pair is within the hinge region indicated in Fig. 2. To investigate how large this t2 must be, 10 000 sample time series were generated according to Eq. (6), at each of 50 true underlying (β, τn) pairs located along the lower boundary of this region, with distributions of resulting t2 tabulated in each case. The 75th, 90th, 95th, and 99th percentiles of these 10 000-member distributions of t2 statistics are approximately 20, 25, 30, and 40, respectively. That is, sample t2 values of these magnitudes indicate approximately the corresponding probabilities for the underlying true (β, τn) pair being within the hinge region of Fig. 2. These are very stringent criteria for adopting the hinge function to represent a climate normal, and they again assume that it is known that the true climate-mean nonstationarity is a hinge function. For example t2 = 10 corresponds to a hypothesis test rejecting H0: β = 0, with the nominal significance level p = 0.0016; and t2 = 20 corresponds to the nominal p = 0.000 008. The conclusion here is that, even if the mean of the underlying data-generating process is truly governed by a hinge function, a large sample t2 value is required to be confident that its parameters are within the hinge region of Figs. 2 and 3 and therefore that the fitted hinge normal will be the most effective choice for prediction of future data.

4. Best normals for U.S. divisional temperature and precipitation, 1994–2011

In this section, the alternative formulations for climate normals described in section 2 are tested on the seasonal megadivisional data that are described in Livezey et al. (2007). The megadivisions are 102 regions in the conterminous United States that have been derived from the 344 National Climatic Data Center (NCDC) divisions (Guttman and Quayle 1996), by aggregating some of the smaller NCDC divisions located mainly in the eastern United States to yield approximately equal areas. The underlying data are monthly values from the “TD9640” dataset, obtained from NCDC for the period January 1931–February 2012. Seasonal (i.e., averages and totals for consecutive 3-month periods, for temperature and precipitation, respectively) values for January–March 1931 through December–February (DJF) 2011 are used, which in the following will be labeled according to the year of the initial month of each season.

Nine-month-ahead (e.g., using data through February 2011 to predict DJF 2011) hindcasts have been computed for each of the 12 seasons for the periods 1994–2011 and 2006–11 to allow comparison with the results in both Huang et al. (1996) and Livezey et al. (2007). Each of the climate-normals formulations described in section 2 has been recomputed for each prediction using all prior years as training data (and, in addition for temperature forecasts using the piecewise linear regressions, only from 1940 onward), so that the results emulate out-of-sample forecasts that could have been achieved operationally during the 1994–2011 period that were not available to Huang et al. (1996) and the 2006–11 period that were not available to Livezey et al. (2007). For example, prediction equations for a given season in 1994 were trained using previous values for that season from the years 1931–93. Equations predicting 2011 were trained using all data through 2010.

Figure 4 shows performance of the various normals formulations for the temperature specifications relative to WMO30, in terms of the reduction of variance (RV):

 
formula

This statistic is numerically equal to 1 minus the conventional MSE-based skill score (e.g., Wilks 2011), and was denoted as MSE* in Wilks (1996). Values smaller than RV = 1 indicate hindcasts that are more accurate than the baseline.

Fig. 4.

The RV [Eq. (11)] for nine normals formulations predicting seasonal temperature, relative to WMO30, for 2006–11. Open bars show results for the indicated spatial stratifications, shaded bars show nationally aggregated results, vertical whiskers show 90% bootstrap confidence intervals, and stars indicate the best-performing formulation in each case. Shown are (a) winter (from DJF through FMA), (b) spring [from March–May (MAM) through May–July (MJJ)], (c) summer (from JJA through ASO), and (d) autumn [from September–November (SON) through November–January (NDJ)].

Fig. 4.

The RV [Eq. (11)] for nine normals formulations predicting seasonal temperature, relative to WMO30, for 2006–11. Open bars show results for the indicated spatial stratifications, shaded bars show nationally aggregated results, vertical whiskers show 90% bootstrap confidence intervals, and stars indicate the best-performing formulation in each case. Shown are (a) winter (from DJF through FMA), (b) spring [from March–May (MAM) through May–July (MJJ)], (c) summer (from JJA through ASO), and (d) autumn [from September–November (SON) through November–January (NDJ)].

Results for the 2006–11 period not available to Livezey et al. (2007) are shown in Fig. 4 to evaluate Livezey et al.’s conclusions using independent data. For the two piecewise linear normals formulations [Eqs. (6) and (8)], results are presented both for training data beginning in 1931 and the change point c estimated as a free parameter (labeled “c estd.”) and, following the prescription in Livezey et al. (2007), for training data limited to 1940 onward while fixing the change point at 1975 (“c = 1975”). Within each seasonally stratified panel of Fig. 4, the results are in addition stratified geographically, with the “west” region composed of the U.S. states to the west of Montana, Wyoming, Colorado, and New Mexico, inclusive (roughly west of 104° longitude) and the “east” region corresponding to the area east of the Mississippi River (east of Wisconsin, Illinois, Kentucky, Tennessee, and Mississippi, inclusive, or east of roughly 90° longitude), with the “central” region between. These boundaries have been chosen to correspond roughly to hinge-trend regions indicated by Livezey et al. (2007) in their Figs. 1a and A1.

Collectively, the panels of Fig. 4 show that the relative accuracy of the temperature-normals projections is worst in “winter” [DJF–February–April (FMA)], is generally best in “summer” [June–August (JJA)–August–October (ASO)], and is intermediate in the transition seasons. In the two transition seasons the fixed averaging period of CPC15 yields the most accurate projections in two of the three geographical stratifications and for the nationally aggregated results (gray bars). In summer the continuous two-phase regression [Eq. (8)] provides the best accuracy in most cases. For the winters during 2006–11 the least squares linear trend extrapolations were most accurate, except in the western region, where the OCN exhibits the best RV. WMO30 yielded most accurate extrapolations (all RVs > 1) in winter, except for the eastern region. Winter and autumn are also the only seasons in which there are consistent differences among the regions, with the western division generally being forecast least accurately and the eastern division being forecast most accurately.

Figure 4 also shows clearly, using fully independent data, that the equation-fitting prescription of Livezey et al. (2007) is justified, that is, that hinge normals for temperature extrapolations should be fit using data from 1940 onward and using the fixed change point c = 1975. The resulting hindcasts are consistently better than those allowing an estimated change point, except for the western region during the winter and spring seasons. In a similar way, fixing the change point at 1975 is also effective at improving the continuous two-phase regressions.

Figure 5 presents corresponding temperature-projection results for the 1994–2011 period that was not available to Huang et al. (1996). Here it is clear that CPC15 yielded the best results, except for the western region and national aggregation during summer. For this longer period, seasonal and geographical differences are less pronounced than for the shorter-period results in Fig. 4. Fixing the change point at 1975 again yields better extrapolated normals than allowing c to be a free parameter, although this is not a fully out-of-sample result since these data include the 1994–2005 period available to Livezey et al. (2007).

Fig. 5.

As in Fig. 4, but for 1994–2011.

Fig. 5.

As in Fig. 4, but for 1994–2011.

Figure 6 shows RVs relative to WMO30 for nationally and annually aggregated results for the 2006–11 period that was not available to Livezey et al. (2007) (open bars), and the 1994–2011 period that was not available to Huang et al. (1996) (shaded bars). Figure 6a shows that, contrary to the recommendation of Huang et al. (1996) that fixed 10-yr averaging periods should provide the best overall temperature-normals specifications, the 15-yr average (CPC15) has performed best overall since their data cutoff and also during the most recent 6 yr. In a similar way, for precipitation (Fig. 6b), none of the alternative normals formulations for precipitation has been more accurate overall since 1994 than the simple WMO30, contradicting the Huang et al. (1996) recommendation that CPC15 should be used for this purpose. Figure 6b also shows that, similar to the temperature results in Figs. 4 and 6a, fixing the change point at 1975 for specifying precipitation normals (here, using training data from 1931) improves both hinge and continuous two-phase forecasts of the nonstationary means, including particularly during the 2006–11 period that was not available to Livezey et al. (2007).

Fig. 6.

Annually and nationally aggregated RVs [Eq. (11)] relative to WMO30 for normals formulations predicting seasonal (a) temperature and (b) precipitation for 2006–11 (open bars) and 1994–2011 (gray bars). The vertical whiskers show 90% bootstrap confidence intervals, and the stars indicate the best-performing formulation in each case.

Fig. 6.

Annually and nationally aggregated RVs [Eq. (11)] relative to WMO30 for normals formulations predicting seasonal (a) temperature and (b) precipitation for 2006–11 (open bars) and 1994–2011 (gray bars). The vertical whiskers show 90% bootstrap confidence intervals, and the stars indicate the best-performing formulation in each case.

Figures 46 also plot 90% bootstrap confidence intervals for the averaged RVs, as the thin vertical whiskers. Differences in the underlying sampling variability for the various normals formulations and for temperature in contrast to precipitation are most easily seen in the aggregated results of Fig. 6. For temperature, the least sampling uncertainty is exhibited by the conventional linear regressions, followed closely by CPC15 and OCN. The generally shorter averaging periods of CPC10 and OCNM yield yet larger sampling uncertainty. The hinge and two-phase regressions exhibit the largest sampling variability, and fixing the change point at 1975 reduces this uncertainty only slightly. Sampling uncertainty for the precipitation projections is consistently less than for temperature, presumably because the smaller spatial scale of precipitation variations yields larger effective sample sizes. In a similar way, sampling uncertainty for the longer 1994–2011 period is consistently smaller than for the shorter 2006–11 period.

Figure 7 shows forecast biases, again for the 2006–11 (open bars) and 1994–2011 (shaded bars) periods. For temperature hindcasts (Fig. 7a) during the longer period, both WMO30 and linear trend extrapolation yield strong negative (i.e., cold, or underforecasting) biases, as they fail to keep up with the accelerating temperature increases after the mid-1970s that are evident in Fig. 1a. Shorter averaging periods (CPC15 and CPC10) exhibit smaller negative biases than WMO30, and the hinge and continuous two-phase regressions show much better bias characteristics than do the ordinary linear regressions. Similarly, the more data-adaptive OCNM exhibits a smaller negative bias than does the conventional OCN. The bias characteristics for temperature projections during the 2006–11 period are very different because most of these years (most notably 2008 and 2009) were below the trend established over the previous decades, yielding positive (warm, overforecasting) biases during this relatively short sample for all of the methods except WMO30 and linear regression. Precipitation biases (Fig. 7b) are more consistent between the two time periods than are the temperature biases in Fig. 7a. The relatively small biases for linear trends in precipitation in Fig. 7b reflect the quasi-linearity of the precipitation trends exemplified by the overall average in Fig. 1b, in contrast to the accelerating trend exhibited by the national and annual temperature averages in Fig. 1a.

Fig. 7.

As in Fig. 6, but for forecast biases.

Fig. 7.

As in Fig. 6, but for forecast biases.

Livezey et al. (2007) suggested a “hybrid” procedure for defining climate normals, in which the hinge function is used to extrapolate recent trends in cases in which the fitted hinge model is sufficiently strong, with OCN being used otherwise. They did not specify a quantitative criterion for choosing between the two. Figure 8 shows results for this hybrid procedure, in terms of RMSEs of seasonal temperature predictions, again for the 2006–11 period that was not available to Livezey et al. (2007). Here the change point is fixed at c = 1975, hinge functions are fit using data from 1940 onward, and use of the fitted hinge functions is invoked at minimum t2 levels ranging from t2 > 3 to t2 > 20 (i.e., never using the hinge). The six curves in Fig. 8 show the progressive decreases in RMSE when the indicated alternative normals formulations are used for cases at or below the specified t2 thresholds. Best results are obtained when the alternative normals formulation is WMO30 (although results for CPC15 are very similar), consistent with the open bars in Fig. 6a. In none of the cases does use of the hinge normal above any t2 level improve the RMSE over the t2 > 20 case, for which the hinge function is never used. Similar results (not shown) are obtained also for the continuous two-phase regression, for both piecewise linear methods when the change point is allowed to differ from 1975 and when predictions for 1994–2005 are included also, although in that case the rankings for the different alternative methods are consistent with the shaded bars in Fig. 6a. Therefore, the Livezey et al. (2007) “hybrid” procedure cannot yet be recommended for projecting U.S. megadivisional normals.

Fig. 8.

RMSE for seasonal temperature predictions for all seasons and divisions, 2006–11, if hinge-normal predictions with change point c fixed at 1975 are used when t2 values are above those specified on the horizontal axis and the six indicated alternatives are used otherwise. Parenthetical values indicate numbers of hindcasts (of 7272 total) for which t2 is equal to or exceeds the indicated values.

Fig. 8.

RMSE for seasonal temperature predictions for all seasons and divisions, 2006–11, if hinge-normal predictions with change point c fixed at 1975 are used when t2 values are above those specified on the horizontal axis and the six indicated alternatives are used otherwise. Parenthetical values indicate numbers of hindcasts (of 7272 total) for which t2 is equal to or exceeds the indicated values.

5. Summary and conclusions

The performance of eight formulations for computing climate normals has been examined here in both artificial-data and real-data settings. Assuming underlying climate nonstationarities defined by the “hinge” function [Eq. (6)], fitting that functional form to sample time series is not the best approach for specifying climate normals unless those fitted regressions are very highly significant—exhibiting t2 values [Eq. (10)] larger than perhaps 30 or more. The reason for this very stringent requirement derives from the large sampling variability in the fitted hinge-function parameters unless both β and τn are large. When the hinge-function climate nonstationarity is weaker, because of shallow slopes (small β) or short times since the change point (small τn), or both, other candidate formulations for the climate normals perform better even when the hinge function is the correct statistical model.

Huang et al. (1996) studied the OCN formulation [Eq. (2)] in the context of U.S. divisional temperature and precipitation data and concluded that fixed averaging periods [Eq. (1)] of k = 10 (CPC10) for temperature and k = 15 (CPC15) for precipitation were the best single definitions for the respective seasonal normals when constrained by operational considerations. The results presented here, including especially those for the 1994–2011 period that was not available to Huang et al. (1996), show that the fixed averaging period with k = 15 (CPC15) has performed best for the seasonal temperature hindcasts and that k = 30 (WMO30) was best for seasonal precipitation. These results are also consistent with those in Wilks (1996), in which the same 1961–93 target period as in Huang et al. (1996) was used. At present the best single approach to defining climate normals for U.S. divisional data apparently continues to be fixed averaging periods with k = 15 yr for seasonal temperature and k = 30 yr for seasonal precipitation. Attempts to optimize these on a case-by-case basis (i.e., using OCN) have yielded inferior results here, including for the 1994–2011 period not used in Huang et al. (1996) to examine the OCN-, CPC10-, and CPC15-based normals.

Although an attractive idea, the suggestion of Livezey et al. (2007) to specify climate normals using the hinge function in cases in which the fitted hinge regressions are sufficiently strong and using a more conventional formulation otherwise did not provide improvements over the more conventional formulations alone for U.S. megadivisional data during 2006–11. The parenthetical subsample sizes on the horizontal axis of Fig. 7 show that only about 1.6% of the plotted cases exhibit t2 > 10, so that reference to Fig. 3 suggests that the fitted hinge functions are not yet strong enough to be preferred for specifying climate normals. Another potential contributor to the disappointing performance of this proposed method may be data inhomogeneities in the observations underlying the megadivisional data, which may have led to underestimation of the fitted trends (Menne et al. 2009; Williams et al. 2012; R. E. Livezey 2012, personal communication). Accordingly, a repetition of this study using homogenized (Williams et al. 2012) data would be warranted. To the extent that the climate continues to warm in the coming decades at a rate comparable to that indicated in Fig. 1a, it is very possible that the hinge and continuous two-phase regressions may become preferred methods, at least for locations and seasons exhibiting the most prominent piecewise linear historical trends. As of 2011 τn ≈ 35 yr, so that if the present warming trend continues (i.e., to the extent that βs do not change appreciably), Eq. (10) indicates that t2 values will approximately double in the next decade and quadruple over the next two decades.

A new climate-normal statistic, the M-fold OCN, or OCNM, was introduced here. As detailed in  appendix A, it is formulated as a weighted regression of conventional OCN estimates, computed over all M = n − 30 of the most recent training windows, as an attempt to capture accelerating (or decelerating) climate changes better than does the conventional OCN. This approach often yielded more accurate temperature normals than hinge or continuous two-phase regression extrapolations for the winter and spring seasons (Fig. 4) but was generally less accurate for the summer–autumn portion of the year. Its use should also be reconsidered periodically, particularly to the extent that ongoing climate changes may appear to further accelerate in the future.

Acknowledgments

I thank Marina Timofeyeva for supplying the data used to aggregate NCDC climate divisions to the CPC megadivisions and Bob Livezey for valuable comments on earlier drafts of this paper. This research was supported by the National Science Foundation under Grant AGS-1112200.

APPENDIX A

M-Fold OCN

In a nonstationary climate in which the change in an underlying mean value is accelerating (or decelerating), ordinary OCNs will yield an apparently best averaging period k that is a compromise between smaller (or larger) k for more recent averaging periods and vice versa for less recent averaging periods. Equation (3) expresses the fact that a best averaging period k*(m) can be estimated for each of the most recent m training-data windows, m = 1, … , M = n − 30; each yielding in turn an estimate for the OCN of .

Equation (4) expresses the calculation of the M-fold OCN, OCNM, as the intercept of a regression summarizing the linear relationship between the and m, in effect extrapolating the to m = 0. Although the small-m estimates are expected to be more representative of current conditions in a nonlinearly nonstationary climate, they will be estimated with fewer data values and so will be more subject to sampling variations. Therefore the appropriate estimation procedure will be weighted regression (e.g., Draper and Smith 1981), in which the variance of each is specified as 1/m. That is,

 
formula

where

 
formula

and the data matrix and predictand vector y are

 
formula
 
formula

Substituting into Eq. (A1) and using analytic results for sums of the integers and squared integers from 1 to M yields the OCNM estimate:

 
formula

It is also possible to formulate Eq. (4a) as a quadratic function of the number of training-data windows used, but that approach yielded slightly inferior out-of-sample hindcasts for the U.S. megadivisional data considered in this paper.

APPENDIX B

Autocorrelation of Residuals from Hinge Fits

Livezey et al. (2007) recommend adjusting for autocorrelation of the residuals when fitting the hinge-function regression model in Eq. (6). Some of these sample residual autocorrelations are larger in absolute value than 0.3 for the U.S. megadivision seasonal temperature and precipitation data for 1931–2010, but overall they are consistent with, at most, weak underlying autocorrelation.

Figure B1 shows histograms of autocorrelations of residuals from hinge regressions for the seasonal megadivisional temperature and precipitation time series. For the temperature autocorrelations in Fig. B1a the change point has been fixed at 1975, although this has little impact on the results relative to allowing unconstrained c. In each plot there are 102 (megadivisions) × 12 (3-month seasons) = 1224 sample autocorrelations, which have been transformed using the inverse hyperbolic tangent (i.e., Fisher Z transform),

 
formula

The sampling distribution of these transformed correlations is expected to be Gaussian, with variance (n − 3)−1 = 1/76 = 0.1152 (e.g., Wilks 2011), because the lagged correlations have been computed using n = 79 pairs of consecutive years.

Fig. B1.

Histograms of (Fisher Z transformed) sample autocorrelations of residuals from hinge regressions [Eq. (6)] representing U.S. megadivision (a) temperature and (b) precipitation time series for 3-month seasons, 1931–2010. Superimposed Gaussian probability density functions show theoretical sampling distributions for underlying data-generating processes exhibiting zero residual autocorrelation.

Fig. B1.

Histograms of (Fisher Z transformed) sample autocorrelations of residuals from hinge regressions [Eq. (6)] representing U.S. megadivision (a) temperature and (b) precipitation time series for 3-month seasons, 1931–2010. Superimposed Gaussian probability density functions show theoretical sampling distributions for underlying data-generating processes exhibiting zero residual autocorrelation.

For the temperature autocorrelations in Fig. B1a, the sample mean, standard deviation, and skewness coefficients are 0.041, 0.137, and 0.305, respectively, and so are shifted slightly to the right and are slightly more variable and more positively skewed than the theoretical Gaussian distribution (smooth curve), but they appear to be consistent with a sample of correlations drawn from a process whose underlying mean is only slightly larger than zero. For the precipitation autocorrelations (Fig. B1b), the mean, standard deviation, and skewness coefficients are −0.008, 0.106, and 0.048, respectively, suggesting that they are completely consistent with having been drawn from a process with zero underlying autocorrelation.

Similar results (not shown) are obtained for the residuals from simple linear regressions [Eq. (5)] and continuous two-phase regressions [Eq. (8)], fit to the same data. Note that in a nonstationary climate autocorrelation of the data (or of data anomalies from a constant mean) should be substantially larger than those for the regression anomalies shown in Fig. B1. For example, in a warming climate, anomalies from the overall mean will tend to be mainly negative early in the record and mainly positive late in the record and so will be generally similar to anomalies for adjacent years, yielding a tendency for positive autocorrelation even if the corresponding regression residuals are essentially uncorrelated.

REFERENCES

REFERENCES
Arguez
,
A. A.
, and
R. S.
Vose
,
2011
:
The definition of the standard WMO climate normal
.
Bull. Amer. Meteor. Soc.
,
92
,
699
704
.
Beaumont
,
R. T.
,
1957
:
A criterion for selection of length of record for moving arithmetic mean for hydrological data
.
Trans. Amer. Geophys. Union
,
38
,
198
200
.
Dixon
,
K. W.
, and
M. D.
Shulman
,
1984
:
A statistical evaluation of the predictive abilities of climatic averages
.
J. Climate Appl. Meteor.
,
23
,
1542
1552
.
Draper
,
N. R.
, and
H.
Smith
,
1981
: Applied Regression Analysis. John Wiley and Sons, 709 pp.
Enger
,
I.
,
1959
:
Optimum length of record for climatological estimates of temperature
.
J. Geophys. Res.
,
64
,
779
787
.
Epstein
,
E. S.
,
1982
:
Detecting climate change
.
J. Appl. Meteor.
,
21
,
1172
1182
.
Guttman
,
N. B.
, and
R. G.
Quayle
,
1996
:
A historical perspective of U.S. climate divisions
.
Bull. Amer. Meteor. Soc.
,
77
,
293
303
.
Huang
,
J.
,
H. M.
van den Dool
, and
A. G.
Barnston
,
1996
:
Long-lead seasonal temperature prediction using optimal climate normals
.
J. Climate
,
9
,
809
817
.
Hulme
,
M.
,
S.
Dessai
,
I.
Lorenzoni
, and
D. R.
Nelson
,
2009
:
Unstable climates: Exploring the statistical and social constructions of “normal” climate
.
Geoforum
,
40
,
197
206
.
Kunkel
,
K. E.
, and
A.
Court
,
1990
:
Climatic means and normals—A statement of the American Association of State Climatologists
.
Bull. Amer. Meteor. Soc.
,
71
,
201
204
.
Lamb
,
P. J.
, and
S. A.
Changnon
Jr.
,
1981
:
On the “best” temperature and precipitation normals: The Illinois situation
.
J. Appl. Meteor.
,
20
,
1383
1390
.
Livezey
,
R. E.
, and
T. M.
Smith
,
1999
:
Covariability of aspects of North American climate with global sea surface temperatures on interannual to interdecadal time scales
.
J. Climate
,
12
,
289
302
.
Livezey
,
R. E.
, and
M. M.
Timofeyeva
,
2008
:
The first decade of long-lead U.S. seasonal forecasts: Insights from a skill analysis
.
Bull. Amer. Meteor. Soc.
,
89
,
843
854
.
Livezey
,
R. E.
,
K. Y.
Vinnikov
,
M. M.
Timofeyeva
,
R.
Tinker
, and
H. M.
van den Dool
,
2007
:
Estimation and extrapolation of climate normals and climatic trends
.
J. Appl. Meteor. Climatol.
,
46
,
1759
1776
.
Lund
,
R.
, and
J.
Reeves
,
2002
:
Detection of undocumented change points: A revision of the two-phase regression model
.
J. Climate
,
15
,
2547
2554
.
Menne
,
M. J.
,
C. N.
Williams
Jr.
, and
R. S.
Vose
,
2009
:
The U.S. Historical Climatology Network monthly temperature data, version 2
.
Bull. Amer. Meteor. Soc.
,
90
,
993
1007
.
O’Lenic
,
E. A.
,
D. A.
Unger
,
M. S.
Halpert
, and
K. S.
Pelman
,
2008
:
Developments in operational long-range climate prediction at CPC
.
Wea. Forecasting
,
23
,
496
515
.
Percival
,
D. B.
, and
D. A.
Rothrock
,
2005
:
“Eyeballing” trends in climate time series: A cautionary note
.
J. Climate
,
18
,
886
891
.
Quandt
,
R. E.
,
1958
:
The estimation of the parameters of a linear regression system obeying two separate regimes
.
J. Amer. Stat. Assoc.
,
53
,
873
880
.
Sabin
,
T. E.
, and
M. D.
Shulman
,
1985
:
A statistical evaluation of the efficiency of the climatic normal as a predictor
.
J. Climatol.
,
5
,
63
77
.
Solomon
,
S.
,
D.
Qin
,
M.
Manning
,
M.
Marquis
,
K.
Averyt
,
M. M. B.
Tignor
,
H. L.
Miller
Jr.
, and
Z.
Chen
, Eds.,
2007
: Climate Change 2007: The Physical Science Basis. Cambridge University Press, 996 pp.
Solow
,
A. R.
,
1987
:
Testing for climate change: An application of the two-phase regression model
.
J. Climate Appl. Meteor.
,
26
,
1401
1405
.
van den Dool
,
H. M.
,
2007
: Empirical Methods in Short-Term Climate Prediction. Oxford University Press, 215 pp.
Wilks
,
D. S.
,
1996
:
Statistical significance of long-range “optimal climate normal” temperature and precipitation forecasts
.
J. Climate
,
9
,
827
839
.
Wilks
,
D. S.
,
2011
. Statistical Methods in the Atmospheric Sciences. 3rd ed. Academic Press, 676 pp.
Williams
,
C. N.
,
M. J.
Menne
, and
P. W.
Thorne
,
2012
:
Benchmarking the performance of pairwise homogenization of surface temperatures in the United States
.
J. Geophys. Res.
,
117
,
D05116
,
doi:10.1029/2011JD016761
.