## 1. Introduction

Climate “normals” are statistical estimates of present and/or near-future climate means for such quantities as monthly or seasonal temperature or precipitation. These estimates play a large role in societal perceptions of climate (Hulme et al. 2009), are used in design and planning settings of economic significance (e.g., Dixon and Shulman 1984; Kunkel and Court 1990), and provide simple, empirical forecasts at lead times of nearly 1 year and longer (Huang et al. 1996; O’Lenic et al. 2008; Livezey and Timofeyeva 2008; van den Dool 2007). Indeed, climate normals may provide the only appreciable prediction skill for seasonal forecasts in some circumstances (Livezey and Timofeyeva 2008), and their use as a baseline to anchor seasonal forecast anomalies corresponding, for example, to ENSO predictability has also been advocated (Livezey et al. 2007).

Climate normals are conventionally estimated as averages over some number of the most recently observed annual values. In a purely stationary climate, where the underlying climatic data-generating process is unchanging through time, the best averaging period will be the full length of the climatic record, unless problems such as measurement or reporting errors in earlier years result in older data being less reliable or data inhomogeneities (Menne et al. 2009; Williams et al. 2012) yield inconsistent estimates for the normals. To the extent that climate means are not stationary, shorter averaging periods will yield more accurate estimates. The averaging period recommended by the World Meteorological Organization (WMO) is 30 yr, with decadal updating (Arguez and Vose 2011; van den Dool 2007), although annually updated 30-yr normals are expected to be more representative of conditions at the end of, and after, the averaging period (Arguez and Vose 2011; Livezey et al. 2007), even when changes in the underlying climate means are modest and slow. Shorter annually updated averaging periods are used operationally at the U.S. Climate Prediction Center (CPC) for forecasts of seasonally averaged temperature (10 yr) and seasonal total precipitation (15 yr) (Huang et al. 1996; O’Lenic et al. 2008).

Various studies have considered the problem of optimizing the averaging period for climate normals by finding that period yielding best hindcasts for the immediately subsequent year (Beaumont 1957; Dixon and Shulman 1984; Enger 1959; Huang et al. 1996; Lamb and Changnon 1981; Sabin and Shulman 1985), with the result now often called the optimal climate normal (OCN; Huang et al. 1996). The maximum averaging period usually considered is 30 yr, in which case these predictions can be optimized over *n* − 30 target years in an *n*-year training sample. This fitting procedure may yield spurious “artificial skill” because many candidate averaging periods are considered within the same dataset used to test them, so that such results must be treated cautiously (Wilks 1996).

In a provocative paper, Livezey et al. (2007) argue that more sophisticated methods of defining climate normals are now necessary because of recent and ongoing climate changes, especially for temperature normals. In particular they observe that the character of recent temperature nonstationarity is not a simple trend or a low-frequency quasiperiodicity, but instead often appears to be a quasi-stationary period until approximately 1975, followed by roughly linearly increasing means since that time. They propose defining climate normals consistent with this pattern using a two-phase regression model (e.g., Lund and Reeves 2002; Percival and Rothrock 2005; Solow 1987) with zero slope in the earlier period, which they call a “hinge” function.

The solid piecewise linear function in Fig. 1a illustrates the hinge model for annually averaged temperatures over the conterminous United States, computed as unweighted averages of the 102 approximately equal-area “megadivisions” described in appendix A of Livezey et al. (2007). Figure 1 shows data for 1931–2011, because the underlying divisional data for earlier years include regression estimates (Guttman and Quayle 1996). The clear warming trend from the mid-1970s onward is captured by the upward slope of the hinge function after 1977, which is the fitted change point. Figure 1b shows the corresponding data for annual precipitation, where the gradually increasing mean represented by the least squares linear trend (solid line) appears to provide an adequate description of the nonstationarity in the mean of the spatially aggregated annual U.S. precipitation climate. Of course the annually and nationally aggregated results in Fig. 1 are not equally representative of all divisions and seasons, as described in appendix A of Livezey et al. (2007) for these data through 2005.

Following an analysis of synthetic data based on an underlying hinge-climatological generating process, Livezey et al. (2007) suggest that hinge extrapolations be used for climate normals when their trends are sufficiently strong and that OCN-based normals should be used otherwise. They did not quantitatively specify when these two approaches should be used, however, or examine the performance of their proposed protocol using real data. The study presented here examines this and other approaches to defining normals using the U.S. megadivisional data in section 4. Prior to that, section 2 formally defines the hinge and other candidate functions for defining climate normals, and the performance of these candidates in an artificial-data setting will be examined in section 3. Section 5 will conclude with discussion and recommendations for climate normals in both the present and potential future stages of the changing climate.

## 2. Alternative climate-normal formulations

### a. Fixed averaging periods

*k*= 30-yr period. Let

*y*,

_{i}*i*= 1, … ,

*n*, denote the data of interest (e.g., average temperature or total precipitation, for a given month, season, or annual period) in year

*i*. The

*k*= 30-yr normal for target year

*i*is thenEquation (1) indicates annual recomputation of normals rather than the decadal updating recommended by WMO. Nevertheless, use of Eq. (1) with

*k*= 30 (i.e.,

_{30}in the following. Necessarily,

*n*> 30 here and in the following, so that

*k*= 30-yr averages can be considered.

The shorter averaging periods *k* = 10 and *k* = 15 are used with Eq. (1) by CPC to define climate normals for seasonal temperature and precipitation, respectively, which are applied as simple forecast tools for representing the effects of recent climate nonstationarity (Huang et al. 1996; O’Lenic et al. 2008). These fixed averaging periods were arrived at following examination of geographically and seasonally varying OCN averaging windows and compromising on constant values to simplify and regularize operational forecasting procedures (Huang et al. 1996). Here, use of Eq. (1) with *k* = 10 (*k* = 15 (_{10} and CPC_{15}, respectively.

### b. Optimal climate normals

*n*-year training-data sample by choosing that averaging period

*k** that minimizes the mean-square error (MSE):over 1 ≤

*k*≤ 30. The best averaging period

*k** is chosen as the one that produces the smallest mean-square error over all of the

*M*=

*n*− 30 available training-data windows of length 30. The resulting OCN estimate is

*M*data values

*y*, 31 ≤

_{i}*i*≤

*n*, is predicted by the average of the

*k** most recent values preceding it.

In a nonstationary climate for which the underlying mean is increasing or decreasing linearly throughout the training-data record, each of the *M* training-data windows is statistically equivalent, and optimization of the averaging period *k** can be expressed quantitatively as a compromise between random averaging error (which decreases with increasing *k*) and systematic trend bias (which increases with increasing *k*) (Livezey et al. 2007). Indeed, Livezey et al. (2007) note that the resulting optimal *k** could be used to define the averaging period *k* in Eq. (1), and they provide a table illustrating how this *k** varies as a function of the (assumed known) recent climate trend, and the magnitude of the climate noise controlling the averaging error.

If the character of the underlying climate nonstationarity is nonlinear, however, the OCN training-data windows have different statistical characteristics. If the underlying climate change is accelerating (e.g., if the true climate mean is increasing at a faster rate in later years), as approximated for example by the solid hinge-function mean trend in Fig. 1a, then the less recent windows will yield smaller errors for longer averaging periods, which will tend to dilute the tendency for the more recent data windows (which should be more representative, on average, of the current and near-future climate) to yield shorter averaging periods. In a similar way, if the underlying climate change is decelerating, then the less recent windows will dilute the tendency of the more recent windows to yield longer averaging periods.

*m*of the available

*M*=

*n*− 30 training-data windows. For each

*m*, a different optimal averaging length

*k**(

*m*) may be computed, which minimizesand which yields the OCN estimates

*m*≤

*M*. For small

*m*the estimates will tend to be more representative of ongoing nonlinear climate changes on average (will have lower bias) but will be subject to large sampling variations (high variance) because they will be based on shorter training series. For larger

*m*, and in the limit of

*m*=

*M*=

*n*− 30, the sampling variations will be minimized by the maximum training-data size, but on average the resulting

*m*if the underlying climate change is accelerating. Jointly, they can be used to estimate an extrapolated OCN, as the intercept of the regressionThis estimate will be called the

*M*-fold optimal climate normal, or OCN

_{M}:Although the smaller-

*m*estimates will be more representative on average, they will be subject to more sampling variability, so that the intercept

*a*= OCN

_{M}in Eq. (4) should be estimated using weighted regression, as detailed in appendix A.

### c. Regression methods

*n*-member training-data sample,where the predictors

*i*consist of the integers from 1 to

*n*and the

*ε*are the regression residuals. An example is the linear trend line shown in Fig. 1b for annual precipitation averaged over the conterminous United States. Here for example the training data (circles) are the

_{i}*n*= 80 years 1931–2010, and extrapolating the regression line yields the estimated normal of

*τ*

_{i}=

*i*–

*c*,

*c*is the hinge point or change point, and

*I*() is the indicator function whose value is 1 if its argument is true and is 0 otherwise. The subscripts on the regression parameters indicate that they pertain to that portion of the function that occurs after the change point, with the remaining two parameters constrained to

*a*

_{1}=

*a*

_{2}and

*b*

_{1}= 0. Following Livezey et al. (2007), it will be convenient to nondimensionalize the regression slope, expressing it aswhere the denominator is the sample standard deviation of the regression residuals.

The solid piecewise linear function in Fig. 1a is an example hinge regression, fit to training data (circles) for the *n* = 80 years 1931–2010, which extrapolates to the nonstationary normal *β* = 0.0305/0.698 = 0.0437. For a given value of the change point, the parameters of Eq. (6) are estimated by setting the first *c* predictors to zero and setting the last *n* − *c* predictors to *τ _{i}* =

*i*−

*c*, which are the integers from 1 to

*n*−

*c*. Livezey et al. (2007) proposed fixing the change point at 1975 and considering only temperature data from 1940 onward, on the basis of large-scale diagnostics and modeling of modern climate change (e.g., Livezey and Smith 1999; Solomon et al. 2007) together with inspection of the seasonal U.S. divisional data through 2005. The performance of hinge-function extrapolations using these constraints for prediction of the 2006–11 data that were not available to Livezey et al. (2007) will be investigated below. Hinge-normal forecasts will also be made by treating the change point as an additional parameter to be estimated, as in Fig. 1a, which requires repeating the fitting procedure for each

*c*to be considered. To avoid unstable parameter estimates for

*c*very near the beginnings or ends of the training-data records, 5 ≤

*c*≤

*n*− 5 will be enforced.

*a*

_{2}=

*a*

_{1}+

*b*

_{1}

*c*. Again, when the change point

*c*is treated as a free parameter it will be constrained to 5 ≤

*c*≤

*n*− 5. The dashed piecewise linear function in Fig. 1a is an example of such a continuous two-phase regression, yielding the nonstationary normal

*c*in Eqs. (6) or (8) is treated as a free parameter, the respective regressions are fitted for all allowable change points, and that parameter set is chosen as best which maximizes the log likelihood, assuming independent Gaussian residuals (Quandt 1958),Equation (9) is applicable also to the simple linear regression in Eq. (5), by specifying

*c*=

*n*, so that estimation of the parameters

*a*

_{2}and

*b*

_{2}is moot. Equation (9) can therefore be used to compare the three regression models for a given time series. For the temperature data in Fig. 1a, the log likelihoods for the regressions in Eqs. (6) and (8) are −11.26 and −3.671, respectively, so that Eq. (8) would be preferred for these data according to a likelihood ratio test (e.g., Wilks 2011). For the precipitation data plotted in Fig. 1b, the log likelihoods for Eqs. (5), (6), and (8), respectively, are 11.50, 11.86, and 13.92, suggesting that the simple linear regression is to be preferred for these data, according to both the Akaike information criterion and Bayesian information criterion (e.g., Wilks 2011) statistics.

Livezey et al. (2007) recommend using generalized least squares (i.e., accounting for residual autocorrelation) when fitting the regression models considered here, although they noted that this more elaborate estimation procedure leads to only small gains over conventional least squares. It is shown in appendix B that autocorrelation of residuals from Eq. (6) are consistent with at most weak underlying time dependence, for seasonal (i.e., 3-month averages of temperature and 3-month totals for precipitation) 1931–2010 time series for the 102 U.S. megadivisions. Similar results (not shown) are obtained also for the residuals from Eqs. (5) and (8).

## 3. Best normals under hinge-function climatic nonstationarity

Livezey et al. (2007) presented theoretical analyses of various formulations for climate normals under the assumption of linear nonstationarity of the mean. A somewhat different picture emerges if a nonlinear nonstationarity is assumed, however. Simulations described in this section will assume that the underlying climate nonstationarity is in the form of the hinge function [Eq. (6)], which might be regarded as a simple piecewise linear approximation to gradually accelerating nonlinear climate changes of the kind investigated by Epstein (1982).

Figure 2 summarizes the results of 100 000 stochastic simulations each, for regularly spaced values of the parameters *β* and *τ _{n}* over the ranges indicated on the axes. For each of 61 values of

*β*, 10

^{5}time series of length

*n*= 101 were generated according to Eq. (6), with

*c*= 45 and Gaussian residuals having

*σ*= 1. These random series were then truncated after 46, 47, 48, … , 100 time points to yield series having

_{ε}*τ*ranging from 1 to 55. This setup might be regarded as representing annual time series beginning in 1931, with the change point occurring in 1975, which are reanalyzed each year beginning in 1976. The double-headed arrow at the bottom of the figure then indicates values of

_{n}*τ*corresponding to change points in the range 1970–80, relative to the present time. Each of the eight candidate climate-normals formulations described in section 2 [with change point

_{n}*c*treated as a free parameter in Eqs. (6) and (8)] was fit to each of the 10

^{5}realizations for each (

*β*,

*τ*) pair, and the regions indicated in Fig. 2 show that formulation yielding minimum prediction MSE.

_{n}For very weak trends (small |*β*|) and/or years near the estimated change point (small *τ _{n}*), the conventional WMO

_{30}provides the best climate-normal specification. Conversely, for sufficiently large |

*β*| and

*τ*, Fig. 2 shows that the correct (i.e., hinge) climate normal is most accurate. Other formulations are seen to be best at intermediate values of

_{n}*β*and

*τ*, ranging from linear trends for weak

_{n}*β*and large

*τ*, progressively through CPC

_{n}_{15}, CPC

_{10}, OCN

_{M}, and OCN for progressively increasing |

*β*| and decreasing

*τ*. The continuous two-phase regression [Eq. (8)] is not chosen as best overall for any of the parameter combinations when the underlying climate nonstationarity is defined by Eq. (6).

_{n}The practical utility of the parameter-space regions shown in Fig. 2 is unfortunately limited by sampling variability in the model-fitting process and in particular by the need to estimate the parameters *β* and *τ _{n}* by fitting Eq. (6) to finite data series. Figure 3 illustrates the problem for 100 synthetic time series each, generated from Eq. (6) with

*β*= 0.2 and

*τ*= 40 (upper-right collections of small dots),

_{n}*β*= 0.05 and

*τ*= 30 (small dots in the middle portion of the figure, representing data series whose generating process is similar in character to that in Fig. 1a), and

_{n}*β*= 0.05 and

*τ*= 10 (plus signs in the middle and left portions of the figure). In each case the true underlying

_{n}*β*and

*τ*pairs are shown as the heavy dots, and collections of estimated parameter pairs from the 100 sample time series are shown as the small symbols.

_{n}For large underlying *β* and *τ _{n}* (upper-right group of points), the corresponding sample estimates are sufficiently precise that they cluster tightly around the true value, and each would correctly indicate that a hinge normal would be the appropriate choice according to the regions shown in Fig. 2 (and reproduced as the gray lines in Fig. 3). For weaker trends and/or shorter post-change-point training-data records, however, the sample estimates for

*β*and

*τ*can be very poor. In such cases a sample estimate for

_{n}*β*and

*τ*may well fall in a different region of the figure than the true (and unknown)

_{n}*β*and

*τ*, leading to incorrect and therefore suboptimal choices for the formulation representing the normals, even if it is known (hypothetically) that the true underlying climatic stationarity is governed by a hinge function. In all three cases there is a strong tendency for negative covariance of the estimation errors, so that underestimates of

_{n}*τ*generally occur in conjunction with overestimates of

_{n}*β*, and vice versa.

*β*) the conventional

*t*statistic for the regression slope,where, as before,

*τ*=

_{n}*n*−

*c*. The middle equality in Eq. (10) expresses the square of the standard error of an estimated regression slope in terms of the estimated residual variance divided by the sum of squared anomalies of the predictor variables (e.g., Wilks 2011). The final equality in Eq. (10) follows from the fact that the sample variance of any

*τ*consecutive integers is

_{n}*τ*(

_{n}*τ*+ 1)/12 and shows that the

_{n}*t*

^{2}statistic will be large for sufficiently large |

*β*| and

*τ*. The dashed curves in Fig. 3 show four levels of

_{n}*t*

^{2}, for positive

*β*(only).

When fitting Eq. (6) to an individual time series, we see only sample estimates of *β* and *τ _{n}*, corresponding to the small symbols in Fig. 3, and not the true values indicated by the large dots. For sufficiently large

*t*

^{2}, however, it can be concluded with high probability that the true (

*β*,

*τ*) pair is within the hinge region indicated in Fig. 2. To investigate how large this

_{n}*t*

^{2}must be, 10 000 sample time series were generated according to Eq. (6), at each of 50 true underlying (

*β*,

*τ*) pairs located along the lower boundary of this region, with distributions of resulting

_{n}*t*

^{2}tabulated in each case. The 75th, 90th, 95th, and 99th percentiles of these 10 000-member distributions of

*t*

^{2}statistics are approximately 20, 25, 30, and 40, respectively. That is, sample

*t*

^{2}values of these magnitudes indicate approximately the corresponding probabilities for the underlying true (

*β*,

*τ*) pair being within the hinge region of Fig. 2. These are very stringent criteria for adopting the hinge function to represent a climate normal, and they again assume that it is known that the true climate-mean nonstationarity is a hinge function. For example

_{n}*t*

^{2}= 10 corresponds to a hypothesis test rejecting

*H*

_{0}:

*β*= 0, with the nominal significance level

*p*= 0.0016; and

*t*

^{2}= 20 corresponds to the nominal

*p*= 0.000 008. The conclusion here is that, even if the mean of the underlying data-generating process is truly governed by a hinge function, a large sample

*t*

^{2}value is required to be confident that its parameters are within the hinge region of Figs. 2 and 3 and therefore that the fitted hinge normal will be the most effective choice for prediction of future data.

## 4. Best normals for U.S. divisional temperature and precipitation, 1994–2011

In this section, the alternative formulations for climate normals described in section 2 are tested on the seasonal megadivisional data that are described in Livezey et al. (2007). The megadivisions are 102 regions in the conterminous United States that have been derived from the 344 National Climatic Data Center (NCDC) divisions (Guttman and Quayle 1996), by aggregating some of the smaller NCDC divisions located mainly in the eastern United States to yield approximately equal areas. The underlying data are monthly values from the “TD9640” dataset, obtained from NCDC for the period January 1931–February 2012. Seasonal (i.e., averages and totals for consecutive 3-month periods, for temperature and precipitation, respectively) values for January–March 1931 through December–February (DJF) 2011 are used, which in the following will be labeled according to the year of the initial month of each season.

Nine-month-ahead (e.g., using data through February 2011 to predict DJF 2011) hindcasts have been computed for each of the 12 seasons for the periods 1994–2011 and 2006–11 to allow comparison with the results in both Huang et al. (1996) and Livezey et al. (2007). Each of the climate-normals formulations described in section 2 has been recomputed for each prediction using all prior years as training data (and, in addition for temperature forecasts using the piecewise linear regressions, only from 1940 onward), so that the results emulate out-of-sample forecasts that could have been achieved operationally during the 1994–2011 period that were not available to Huang et al. (1996) and the 2006–11 period that were not available to Livezey et al. (2007). For example, prediction equations for a given season in 1994 were trained using previous values for that season from the years 1931–93. Equations predicting 2011 were trained using all data through 2010.

_{30}, in terms of the reduction of variance (RV):This statistic is numerically equal to 1 minus the conventional MSE-based skill score (e.g., Wilks 2011), and was denoted as MSE* in Wilks (1996). Values smaller than RV = 1 indicate hindcasts that are more accurate than the baseline.

Results for the 2006–11 period not available to Livezey et al. (2007) are shown in Fig. 4 to evaluate Livezey et al.’s conclusions using independent data. For the two piecewise linear normals formulations [Eqs. (6) and (8)], results are presented both for training data beginning in 1931 and the change point *c* estimated as a free parameter (labeled “*c* estd.”) and, following the prescription in Livezey et al. (2007), for training data limited to 1940 onward while fixing the change point at 1975 (“*c* = 1975”). Within each seasonally stratified panel of Fig. 4, the results are in addition stratified geographically, with the “west” region composed of the U.S. states to the west of Montana, Wyoming, Colorado, and New Mexico, inclusive (roughly west of 104° longitude) and the “east” region corresponding to the area east of the Mississippi River (east of Wisconsin, Illinois, Kentucky, Tennessee, and Mississippi, inclusive, or east of roughly 90° longitude), with the “central” region between. These boundaries have been chosen to correspond roughly to hinge-trend regions indicated by Livezey et al. (2007) in their Figs. 1a and A1.

Collectively, the panels of Fig. 4 show that the relative accuracy of the temperature-normals projections is worst in “winter” [DJF–February–April (FMA)], is generally best in “summer” [June–August (JJA)–August–October (ASO)], and is intermediate in the transition seasons. In the two transition seasons the fixed averaging period of CPC_{15} yields the most accurate projections in two of the three geographical stratifications and for the nationally aggregated results (gray bars). In summer the continuous two-phase regression [Eq. (8)] provides the best accuracy in most cases. For the winters during 2006–11 the least squares linear trend extrapolations were most accurate, except in the western region, where the OCN exhibits the best RV. WMO_{30} yielded most accurate extrapolations (all RVs > 1) in winter, except for the eastern region. Winter and autumn are also the only seasons in which there are consistent differences among the regions, with the western division generally being forecast least accurately and the eastern division being forecast most accurately.

Figure 4 also shows clearly, using fully independent data, that the equation-fitting prescription of Livezey et al. (2007) is justified, that is, that hinge normals for temperature extrapolations should be fit using data from 1940 onward and using the fixed change point *c* = 1975. The resulting hindcasts are consistently better than those allowing an estimated change point, except for the western region during the winter and spring seasons. In a similar way, fixing the change point at 1975 is also effective at improving the continuous two-phase regressions.

Figure 5 presents corresponding temperature-projection results for the 1994–2011 period that was not available to Huang et al. (1996). Here it is clear that CPC_{15} yielded the best results, except for the western region and national aggregation during summer. For this longer period, seasonal and geographical differences are less pronounced than for the shorter-period results in Fig. 4. Fixing the change point at 1975 again yields better extrapolated normals than allowing *c* to be a free parameter, although this is not a fully out-of-sample result since these data include the 1994–2005 period available to Livezey et al. (2007).

Figure 6 shows RVs relative to WMO_{30} for nationally and annually aggregated results for the 2006–11 period that was not available to Livezey et al. (2007) (open bars), and the 1994–2011 period that was not available to Huang et al. (1996) (shaded bars). Figure 6a shows that, contrary to the recommendation of Huang et al. (1996) that fixed 10-yr averaging periods should provide the best overall temperature-normals specifications, the 15-yr average (CPC_{15}) has performed best overall since their data cutoff and also during the most recent 6 yr. In a similar way, for precipitation (Fig. 6b), none of the alternative normals formulations for precipitation has been more accurate overall since 1994 than the simple WMO_{30}, contradicting the Huang et al. (1996) recommendation that CPC_{15} should be used for this purpose. Figure 6b also shows that, similar to the temperature results in Figs. 4 and 6a, fixing the change point at 1975 for specifying precipitation normals (here, using training data from 1931) improves both hinge and continuous two-phase forecasts of the nonstationary means, including particularly during the 2006–11 period that was not available to Livezey et al. (2007).

Figures 4–6 also plot 90% bootstrap confidence intervals for the averaged RVs, as the thin vertical whiskers. Differences in the underlying sampling variability for the various normals formulations and for temperature in contrast to precipitation are most easily seen in the aggregated results of Fig. 6. For temperature, the least sampling uncertainty is exhibited by the conventional linear regressions, followed closely by CPC_{15} and OCN. The generally shorter averaging periods of CPC_{10} and OCN_{M} yield yet larger sampling uncertainty. The hinge and two-phase regressions exhibit the largest sampling variability, and fixing the change point at 1975 reduces this uncertainty only slightly. Sampling uncertainty for the precipitation projections is consistently less than for temperature, presumably because the smaller spatial scale of precipitation variations yields larger effective sample sizes. In a similar way, sampling uncertainty for the longer 1994–2011 period is consistently smaller than for the shorter 2006–11 period.

Figure 7 shows forecast biases, again for the 2006–11 (open bars) and 1994–2011 (shaded bars) periods. For temperature hindcasts (Fig. 7a) during the longer period, both WMO_{30} and linear trend extrapolation yield strong negative (i.e., cold, or underforecasting) biases, as they fail to keep up with the accelerating temperature increases after the mid-1970s that are evident in Fig. 1a. Shorter averaging periods (CPC_{15} and CPC_{10}) exhibit smaller negative biases than WMO_{30}, and the hinge and continuous two-phase regressions show much better bias characteristics than do the ordinary linear regressions. Similarly, the more data-adaptive OCN_{M} exhibits a smaller negative bias than does the conventional OCN. The bias characteristics for temperature projections during the 2006–11 period are very different because most of these years (most notably 2008 and 2009) were below the trend established over the previous decades, yielding positive (warm, overforecasting) biases during this relatively short sample for all of the methods except WMO_{30} and linear regression. Precipitation biases (Fig. 7b) are more consistent between the two time periods than are the temperature biases in Fig. 7a. The relatively small biases for linear trends in precipitation in Fig. 7b reflect the quasi-linearity of the precipitation trends exemplified by the overall average in Fig. 1b, in contrast to the accelerating trend exhibited by the national and annual temperature averages in Fig. 1a.

Livezey et al. (2007) suggested a “hybrid” procedure for defining climate normals, in which the hinge function is used to extrapolate recent trends in cases in which the fitted hinge model is sufficiently strong, with OCN being used otherwise. They did not specify a quantitative criterion for choosing between the two. Figure 8 shows results for this hybrid procedure, in terms of RMSEs of seasonal temperature predictions, again for the 2006–11 period that was not available to Livezey et al. (2007). Here the change point is fixed at *c* = 1975, hinge functions are fit using data from 1940 onward, and use of the fitted hinge functions is invoked at minimum *t*^{2} levels ranging from *t*^{2} > 3 to *t*^{2} > 20 (i.e., never using the hinge). The six curves in Fig. 8 show the progressive decreases in RMSE when the indicated alternative normals formulations are used for cases at or below the specified *t*^{2} thresholds. Best results are obtained when the alternative normals formulation is WMO_{30} (although results for CPC_{15} are very similar), consistent with the open bars in Fig. 6a. In none of the cases does use of the hinge normal above any *t*^{2} level improve the RMSE over the *t*^{2} > 20 case, for which the hinge function is never used. Similar results (not shown) are obtained also for the continuous two-phase regression, for both piecewise linear methods when the change point is allowed to differ from 1975 and when predictions for 1994–2005 are included also, although in that case the rankings for the different alternative methods are consistent with the shaded bars in Fig. 6a. Therefore, the Livezey et al. (2007) “hybrid” procedure cannot yet be recommended for projecting U.S. megadivisional normals.

## 5. Summary and conclusions

The performance of eight formulations for computing climate normals has been examined here in both artificial-data and real-data settings. Assuming underlying climate nonstationarities defined by the “hinge” function [Eq. (6)], fitting that functional form to sample time series is not the best approach for specifying climate normals unless those fitted regressions are very highly significant—exhibiting *t*^{2} values [Eq. (10)] larger than perhaps 30 or more. The reason for this very stringent requirement derives from the large sampling variability in the fitted hinge-function parameters unless both *β* and *τ _{n}* are large. When the hinge-function climate nonstationarity is weaker, because of shallow slopes (small

*β*) or short times since the change point (small

*τ*), or both, other candidate formulations for the climate normals perform better even when the hinge function is the correct statistical model.

_{n}Huang et al. (1996) studied the OCN formulation [Eq. (2)] in the context of U.S. divisional temperature and precipitation data and concluded that fixed averaging periods [Eq. (1)] of *k* = 10 (CPC_{10}) for temperature and *k* = 15 (CPC_{15}) for precipitation were the best single definitions for the respective seasonal normals when constrained by operational considerations. The results presented here, including especially those for the 1994–2011 period that was not available to Huang et al. (1996), show that the fixed averaging period with *k* = 15 (CPC_{15}) has performed best for the seasonal temperature hindcasts and that *k* = 30 (WMO_{30}) was best for seasonal precipitation. These results are also consistent with those in Wilks (1996), in which the same 1961–93 target period as in Huang et al. (1996) was used. At present the best single approach to defining climate normals for U.S. divisional data apparently continues to be fixed averaging periods with *k* = 15 yr for seasonal temperature and *k* = 30 yr for seasonal precipitation. Attempts to optimize these on a case-by-case basis (i.e., using OCN) have yielded inferior results here, including for the 1994–2011 period not used in Huang et al. (1996) to examine the OCN-, CPC_{10}-, and CPC_{15}-based normals.

Although an attractive idea, the suggestion of Livezey et al. (2007) to specify climate normals using the hinge function in cases in which the fitted hinge regressions are sufficiently strong and using a more conventional formulation otherwise did not provide improvements over the more conventional formulations alone for U.S. megadivisional data during 2006–11. The parenthetical subsample sizes on the horizontal axis of Fig. 7 show that only about 1.6% of the plotted cases exhibit *t*^{2} > 10, so that reference to Fig. 3 suggests that the fitted hinge functions are not yet strong enough to be preferred for specifying climate normals. Another potential contributor to the disappointing performance of this proposed method may be data inhomogeneities in the observations underlying the megadivisional data, which may have led to underestimation of the fitted trends (Menne et al. 2009; Williams et al. 2012; R. E. Livezey 2012, personal communication). Accordingly, a repetition of this study using homogenized (Williams et al. 2012) data would be warranted. To the extent that the climate continues to warm in the coming decades at a rate comparable to that indicated in Fig. 1a, it is very possible that the hinge and continuous two-phase regressions may become preferred methods, at least for locations and seasons exhibiting the most prominent piecewise linear historical trends. As of 2011 *τ _{n}* ≈ 35 yr, so that if the present warming trend continues (i.e., to the extent that

*β*s do not change appreciably), Eq. (10) indicates that

*t*

^{2}values will approximately double in the next decade and quadruple over the next two decades.

A new climate-normal statistic, the *M*-fold OCN, or OCN_{M}, was introduced here. As detailed in appendix A, it is formulated as a weighted regression of conventional OCN estimates, computed over all *M* = *n* − 30 of the most recent training windows, as an attempt to capture accelerating (or decelerating) climate changes better than does the conventional OCN. This approach often yielded more accurate temperature normals than hinge or continuous two-phase regression extrapolations for the winter and spring seasons (Fig. 4) but was generally less accurate for the summer–autumn portion of the year. Its use should also be reconsidered periodically, particularly to the extent that ongoing climate changes may appear to further accelerate in the future.

I thank Marina Timofeyeva for supplying the data used to aggregate NCDC climate divisions to the CPC megadivisions and Bob Livezey for valuable comments on earlier drafts of this paper. This research was supported by the National Science Foundation under Grant AGS-1112200.

# APPENDIX A

*M*-Fold OCN

In a nonstationary climate in which the change in an underlying mean value is accelerating (or decelerating), ordinary OCNs will yield an apparently best averaging period *k* that is a compromise between smaller (or larger) *k* for more recent averaging periods and vice versa for less recent averaging periods. Equation (3) expresses the fact that a best averaging period *k**(*m*) can be estimated for each of the most recent *m* training-data windows, *m* = 1, … , *M* = *n* − 30; each yielding in turn an estimate for the OCN of

*M*-fold OCN, OCN

_{M}, as the intercept of a regression summarizing the linear relationship between the

*m*, in effect extrapolating the

*m*= 0. Although the small-

*m*estimates are expected to be more representative of current conditions in a nonlinearly nonstationary climate, they will be estimated with fewer data values and so will be more subject to sampling variations. Therefore the appropriate estimation procedure will be weighted regression (e.g., Draper and Smith 1981), in which the variance of each

*m*. That is,whereand the data matrix

**y**areSubstituting into Eq. (A1) and using analytic results for sums of the integers and squared integers from 1 to

*M*yields the OCN

_{M}estimate:

It is also possible to formulate Eq. (4a) as a quadratic function of the number of training-data windows used, but that approach yielded slightly inferior out-of-sample hindcasts for the U.S. megadivisional data considered in this paper.

# APPENDIX B

## Autocorrelation of Residuals from Hinge Fits

Livezey et al. (2007) recommend adjusting for autocorrelation of the residuals when fitting the hinge-function regression model in Eq. (6). Some of these sample residual autocorrelations are larger in absolute value than 0.3 for the U.S. megadivision seasonal temperature and precipitation data for 1931–2010, but overall they are consistent with, at most, weak underlying autocorrelation.

*c*. In each plot there are 102 (megadivisions) × 12 (3-month seasons) = 1224 sample autocorrelations, which have been transformed using the inverse hyperbolic tangent (i.e., Fisher

*Z*transform),The sampling distribution of these transformed correlations is expected to be Gaussian, with variance (

*n*− 3)

^{−1}= 1/76 = 0.115

^{2}(e.g., Wilks 2011), because the lagged correlations have been computed using

*n*= 79 pairs of consecutive years.

For the temperature autocorrelations in Fig. B1a, the sample mean, standard deviation, and skewness coefficients are 0.041, 0.137, and 0.305, respectively, and so are shifted slightly to the right and are slightly more variable and more positively skewed than the theoretical Gaussian distribution (smooth curve), but they appear to be consistent with a sample of correlations drawn from a process whose underlying mean is only slightly larger than zero. For the precipitation autocorrelations (Fig. B1b), the mean, standard deviation, and skewness coefficients are −0.008, 0.106, and 0.048, respectively, suggesting that they are completely consistent with having been drawn from a process with zero underlying autocorrelation.

Similar results (not shown) are obtained for the residuals from simple linear regressions [Eq. (5)] and continuous two-phase regressions [Eq. (8)], fit to the same data. Note that in a nonstationary climate autocorrelation of the data (or of data anomalies from a constant mean) should be substantially larger than those for the regression anomalies shown in Fig. B1. For example, in a warming climate, anomalies from the overall mean will tend to be mainly negative early in the record and mainly positive late in the record and so will be generally similar to anomalies for adjacent years, yielding a tendency for positive autocorrelation even if the corresponding regression residuals are essentially uncorrelated.

## REFERENCES

Arguez, A. A., , and R. S. Vose, 2011: The definition of the standard WMO climate normal.

,*Bull. Amer. Meteor. Soc.***92**, 699–704.Beaumont, R. T., 1957: A criterion for selection of length of record for moving arithmetic mean for hydrological data.

,*Trans. Amer. Geophys. Union***38**, 198–200.Dixon, K. W., , and M. D. Shulman, 1984: A statistical evaluation of the predictive abilities of climatic averages.

,*J. Climate Appl. Meteor.***23**, 1542–1552.Draper, N. R., , and H. Smith, 1981:

*Applied Regression Analysis*. John Wiley and Sons, 709 pp.Enger, I., 1959: Optimum length of record for climatological estimates of temperature.

,*J. Geophys. Res.***64**, 779–787.Epstein, E. S., 1982: Detecting climate change.

,*J. Appl. Meteor.***21**, 1172–1182.Guttman, N. B., , and R. G. Quayle, 1996: A historical perspective of U.S. climate divisions.

,*Bull. Amer. Meteor. Soc.***77**, 293–303.Huang, J., , H. M. van den Dool, , and A. G. Barnston, 1996: Long-lead seasonal temperature prediction using optimal climate normals.

,*J. Climate***9**, 809–817.Hulme, M., , S. Dessai, , I. Lorenzoni, , and D. R. Nelson, 2009: Unstable climates: Exploring the statistical and social constructions of “normal” climate.

,*Geoforum***40**, 197–206.Kunkel, K. E., , and A. Court, 1990: Climatic means and normals—A statement of the American Association of State Climatologists.

,*Bull. Amer. Meteor. Soc.***71**, 201–204.Lamb, P. J., , and S. A. Changnon Jr., 1981: On the “best” temperature and precipitation normals: The Illinois situation.

,*J. Appl. Meteor.***20**, 1383–1390.Livezey, R. E., , and T. M. Smith, 1999: Covariability of aspects of North American climate with global sea surface temperatures on interannual to interdecadal time scales.

,*J. Climate***12**, 289–302.Livezey, R. E., , and M. M. Timofeyeva, 2008: The first decade of long-lead U.S. seasonal forecasts: Insights from a skill analysis.

,*Bull. Amer. Meteor. Soc.***89**, 843–854.Livezey, R. E., , K. Y. Vinnikov, , M. M. Timofeyeva, , R. Tinker, , and H. M. van den Dool, 2007: Estimation and extrapolation of climate normals and climatic trends.

,*J. Appl. Meteor. Climatol.***46**, 1759–1776.Lund, R., , and J. Reeves, 2002: Detection of undocumented change points: A revision of the two-phase regression model.

,*J. Climate***15**, 2547–2554.Menne, M. J., , C. N. Williams Jr., , and R. S. Vose, 2009: The U.S. Historical Climatology Network monthly temperature data, version 2.

,*Bull. Amer. Meteor. Soc.***90**, 993–1007.O’Lenic, E. A., , D. A. Unger, , M. S. Halpert, , and K. S. Pelman, 2008: Developments in operational long-range climate prediction at CPC.

,*Wea. Forecasting***23**, 496–515.Percival, D. B., , and D. A. Rothrock, 2005: “Eyeballing” trends in climate time series: A cautionary note.

,*J. Climate***18**, 886–891.Quandt, R. E., 1958: The estimation of the parameters of a linear regression system obeying two separate regimes.

,*J. Amer. Stat. Assoc.***53**, 873–880.Sabin, T. E., , and M. D. Shulman, 1985: A statistical evaluation of the efficiency of the climatic normal as a predictor.

,*J. Climatol.***5**, 63–77.Solomon, S., , D. Qin, , M. Manning, , M. Marquis, , K. Averyt, , M. M. B. Tignor, , H. L. Miller Jr., , and Z. Chen, Eds., 2007:

*Climate Change 2007: The Physical Science Basis.*Cambridge University Press, 996 pp.Solow, A. R., 1987: Testing for climate change: An application of the two-phase regression model.

,*J. Climate Appl. Meteor.***26**, 1401–1405.van den Dool, H. M., 2007:

*Empirical Methods in Short-Term Climate Prediction*. Oxford University Press, 215 pp.Wilks, D. S., 1996: Statistical significance of long-range “optimal climate normal” temperature and precipitation forecasts.

,*J. Climate***9**, 827–839.Wilks, D. S., 2011.

*Statistical Methods in the Atmospheric Sciences*. 3rd ed. Academic Press, 676 pp.Williams, C. N., , M. J. Menne, , and P. W. Thorne, 2012: Benchmarking the performance of pairwise homogenization of surface temperatures in the United States.

,*J. Geophys. Res.***117**, D05116, doi:10.1029/2011JD016761.