## 1. Introduction

It is well known that daily temperature normals computed by simply averaging the 30 values for each day of the year (i.e., “raw daily normals”) constitute a fairly noisy time series because of sampling variability (see Fig. 1). This effect is exacerbated if missing values are present in the data record. There are numerous ways to calculate daily temperature normals such that high-frequency noise is suppressed, yielding a smooth representation of the annual cycle.^{1} Two common approaches are time series filtering and interpolation. The filtering approach applies a low-pass filter to the raw daily normals. The interpolation approach fits a curve through the 12 monthly normals. The interpolation approach is extremely useful in situations where the underlying daily data are not available, which is not uncommon when dealing with international data, as nations are more likely to make their monthly data freely available than their daily data (T. Peterson 2013, personal communication).

A factor complicating the computation of daily temperature normals is the need to homogenize temperature time series. The U.S. surface temperature record is known to contain inhomogeneities due to station moves, instrumentation changes, and other changes in observing practices (Menne et al. 2010). For example, the transition from manual observations to the Automated Surface Observing System (ASOS) by the National Weather Service in the 1990s is known to have introduced inhomogeneities into those stations' time series (Guttman and Baker 1996). Homogenization is essential for ensuring that climate normals are as representative of the current observing practices as possible at the time of computation. Currently, the U.S. surface temperature record is only homogenized at the monthly (and annual) scale (Menne and Williams 2009; Menne et al. 2009), whereas daily temperature values from the Global Historical Climatology Network–Daily (GHCN-Daily) are not (Menne et al. 2012), even though the mean monthly temperatures are derived from daily observations. Figure 1 illustrates this difference. The raw daily normals for minimum temperature at the Reno, Nevada, station are ~2°F (~1.1°C) cooler than the homogenized monthly normals for each month of the year. As articulated by Menne et al. (2009), this station's minimum temperature time series was homogenized to account for an apparent urbanization signal, resulting in warmer temperature values over 1981–2010 and a substantially reduced upward trend over the last three decades that is more in line with the trends at neighboring stations (see their Fig. 8 for more information).

Thus, we developed a technique for computing daily normals with the following properties:

- Day-to-day sampling variability is removed.
- The monthly average of the daily temperatures exactly equals the corresponding monthly normal for each month, thereby “passing through” the monthly homogeneity adjustments to the daily scale.
- There are no intermonth discontinuities (which can be evident in other methods when monthly varying homogeneity adjustments are applied).
- The shape of the annual cycle is influenced by the daily data.

A method that satisfies all four properties is an approach we call the constrained harmonic fit, which was utilized to compute NOAA's 1981–2010 daily temperature normals (Arguez et al. 2012). Harmonic analysis (without constraints) has been shown to be a useful tool for estimating an annual cycle (see examples 8.8–8.11 in Wilks 2006). By incorporating constraints, we can ensure that the average of the resulting daily temperature normals for each month exactly equals the corresponding monthly temperature normal. The constrained harmonic fit approach is described in greater detail in section 2. Examples are shown in section 3, followed by a summary in section 4.

## 2. Data and methodology

Daily maximum temperatures (tmax) and minimum temperatures (tmin) were extracted from GHCN-Daily for 2041 U.S. stations. The data values in GHCN-Daily have undergone extensive quality control as described by Durre et al. (2010), resulting in some values that are flagged as erroneous. Each station has at least 25 nonflagged values available for each Julian day (*t*), for both tmax and tmin, over the 1981–2010 time period. The raw daily normals *y*(*t*) are calculated as the simple arithmetic mean of the valid values for each Julian day. The homogenized monthly normals of tmin and tmax are also used. The generalized approach presented here does not depend on the manner in which monthly temperature normals are calculated. For the purposes of this study, we assume that accurate monthly temperature normals have been computed a priori for use in calculating daily temperature normals.
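As a rough illustration of the raw daily normal calculation, the sketch below averages the valid values for each day of the year and enforces the 25-value minimum described above. The function name `raw_daily_normals`, the use of `None` for flagged or missing values, and the year-by-day list layout are assumptions made for this example; they are not part of GHCN-Daily.

```python
from statistics import fmean


def raw_daily_normals(values_by_year, min_valid=25):
    """Average the valid values for each day of the year.

    values_by_year: one list per year, each with one entry per day.
    None marks a flagged or missing value (a hypothetical convention).
    Returns the list of daily means y(t); raises if a day has too few values.
    """
    n_days = len(values_by_year[0])
    normals = []
    for day in range(n_days):
        valid = [year[day] for year in values_by_year if year[day] is not None]
        if len(valid) < min_valid:
            raise ValueError(f"day {day + 1} has only {len(valid)} valid values")
        normals.append(fmean(valid))
    return normals


# Toy input: 30 "years" of 3 days each, with one missing value on day 2.
years = [[50.0 + i, 60.0, 70.0] for i in range(30)]
years[0][1] = None
y = raw_daily_normals(years)
```

Days with missing values are averaged over fewer years, which is one reason the raw series is noisier when the record has gaps.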

The harmonic fit *h*(*t*) takes the familiar form

$$h(t) = A_0 + \sum_{k=1}^{M} \left[ A_k \cos(\omega_k t) + B_k \sin(\omega_k t) \right], \tag{1}$$

where *t* ranges from 1 to *N* = 365, *ω*_{k} = 2*πk*/*N*, and *M* is the number of harmonics used. If *M* = *N*/2, then *h*(*t*) = *y*(*t*); the curve *y*(*t*) has been reconstructed exactly via the superposition of sines and cosines. If *M* ≪ *N*/2, then *h*(*t*) represents a heavily smoothed version of *y*(*t*), equivalent to a low-pass filter. If no constraints are applied, then we can solve for the coefficient vectors **A** and **B** in (1) via least squares minimization of the following cost function:

$$J = \sum_{t=1}^{N} \left[ h(t) - y(t) \right]^2. \tag{2}$$

This system of linear equations (2*M* + 1 equations and 2*M* + 1 unknowns) can be solved fairly easily using singular value decomposition (SVD). For this unconstrained case, the coefficient vectors **A** and **B** are more commonly calculated using a Fourier transform.
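For the unconstrained case, the least squares coefficients can be written directly as Fourier sums, because the sines and cosines are orthogonal over a complete, evenly sampled year. The sketch below assumes a gap-free series with *t* = 1, ..., *N* and *M* < *N*/2; the function names and the synthetic signal are illustrative only.

```python
import math


def harmonic_coefficients(y, n_harmonics):
    """Return (A, B): A[0] is the mean, A[k] and B[k] belong to harmonic k.

    Assumes y is a complete, evenly sampled annual series (t = 1..N) and
    n_harmonics < N/2, so the least squares solution reduces to Fourier sums.
    """
    n = len(y)
    a = [sum(y) / n]
    b = [0.0]  # placeholder so that a[k] and b[k] share the same index
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k / n
        a.append(2.0 / n * sum(v * math.cos(w * t) for t, v in enumerate(y, start=1)))
        b.append(2.0 / n * sum(v * math.sin(w * t) for t, v in enumerate(y, start=1)))
    return a, b


def harmonic_fit(a, b, n):
    """Evaluate h(t) for t = 1..n from the coefficient lists."""
    return [
        a[0] + sum(a[k] * math.cos(2.0 * math.pi * k * t / n)
                   + b[k] * math.sin(2.0 * math.pi * k * t / n)
                   for k in range(1, len(a)))
        for t in range(1, n + 1)
    ]


# Synthetic check: a two-harmonic signal should be recovered with M = 2.
N = 365
signal = [55.0 + 20.0 * math.cos(2.0 * math.pi * t / N)
          - 3.0 * math.sin(4.0 * math.pi * t / N) for t in range(1, N + 1)]
A, B = harmonic_coefficients(signal, 2)
h = harmonic_fit(A, B, N)
```

Applied to noisy raw daily normals with a small *M*, the same two functions would return the smoothed curve rather than an exact reconstruction.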

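To require that the monthly averages of *h*(*t*) equal the monthly normals, the least squares cost function is augmented with one constraint term per month:

$$J = \sum_{t=1}^{N} \left[ h(t) - y(t) \right]^2 + \lambda_{jan} \left[ \frac{1}{31} \sum_{t=1}^{31} h(t) - T_{jan} \right] + \lambda_{feb} \left[ \frac{1}{28} \sum_{t=32}^{59} h(t) - T_{feb} \right] + \cdots + \lambda_{dec} \left[ \frac{1}{31} \sum_{t=335}^{365} h(t) - T_{dec} \right], \tag{3}$$

where *T*_{jan} is the monthly temperature normal for January, *T*_{feb} is the monthly temperature normal for February, etc. The *λ* terms are Lagrange multipliers that impose the constraints. The above formulation follows the “earlier tradition” or the “Lagrangian function approach” described by Kalman (2009). Once again, we can solve this linear system of equations using SVD.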
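One way to see why the constrained problem remains linear is to write it in matrix form. The notation here is introduced only for illustration and is an assumption of this note: $\mathbf{X}$ is the $N \times (2M+1)$ matrix whose columns are the constant, cosine, and sine basis functions; $\mathbf{c}$ stacks the coefficients in **A** and **B**; $\mathbf{C}$ is the $12 \times (2M+1)$ matrix that averages each basis function over each month; and $\mathbf{T}$ is the vector of the 12 monthly normals. Setting the derivatives of the Lagrangian with respect to $\mathbf{c}$ and $\boldsymbol{\lambda}$ to zero gives

$$
J = \lVert \mathbf{X}\mathbf{c} - \mathbf{y} \rVert^2 + \boldsymbol{\lambda}^{\mathsf{T}} \left( \mathbf{C}\mathbf{c} - \mathbf{T} \right)
\quad \Longrightarrow \quad
\begin{pmatrix} 2\mathbf{X}^{\mathsf{T}}\mathbf{X} & \mathbf{C}^{\mathsf{T}} \\ \mathbf{C} & \mathbf{0} \end{pmatrix}
\begin{pmatrix} \mathbf{c} \\ \boldsymbol{\lambda} \end{pmatrix}
=
\begin{pmatrix} 2\mathbf{X}^{\mathsf{T}}\mathbf{y} \\ \mathbf{T} \end{pmatrix}.
$$

This is a square system of $2M + 13$ equations in the $2M + 1$ coefficients and the 12 multipliers, which is the kind of system an SVD-based solver can handle even when it is rank deficient.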

If the number of harmonics *M* is ≥6, then the constraints may be met exactly. Otherwise, the linear system is overdetermined and, though SVD can yield a solution for the coefficients, the constraints will not be met, as we will illustrate in the next section. After experimenting with *M* ≥ 6 and visually inspecting the smoothing provided on several samples, we chose to set *M* equal to 6 for this exercise, although *M* > 6 can be a viable alternative for other applications so long as the potential for overfitting is accounted for. The constrained harmonic fit was calculated for tmax and tmin for all 2041 stations.
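The following is a minimal sketch of a constrained harmonic fit, assuming NumPy is available and using synthetic data in place of station records. It stacks the zero-gradient conditions of the Lagrangian into one linear system and solves it with `np.linalg.lstsq`, which is SVD based. The function names, the matrix names `X`, `C`, and `K`, and the synthetic `raw` and `targets` series are assumptions made for illustration, not the operational implementation.

```python
import numpy as np

MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
EDGES = np.cumsum([0] + MONTH_LENGTHS)  # 0-based start/end index of each month


def monthly_means(daily):
    """Average a 365-value daily series over each calendar month."""
    return np.array([daily[s:e].mean() for s, e in zip(EDGES[:-1], EDGES[1:])])


def design_matrix(n_days, n_harmonics):
    """Columns are 1, cos(w_k t), sin(w_k t) for k = 1..M and t = 1..N."""
    t = np.arange(1, n_days + 1)
    cols = [np.ones(n_days)]
    for k in range(1, n_harmonics + 1):
        w = 2.0 * np.pi * k / n_days
        cols += [np.cos(w * t), np.sin(w * t)]
    return np.column_stack(cols)


def constrained_harmonic_fit(y, monthly_normals, n_harmonics=6):
    """Harmonic fit to y(t) with monthly means tied to monthly_normals."""
    y = np.asarray(y, dtype=float)
    X = design_matrix(y.size, n_harmonics)
    # Row m of C turns the coefficients into the month-m mean of h(t).
    C = np.vstack([X[s:e].mean(axis=0) for s, e in zip(EDGES[:-1], EDGES[1:])])
    n_coef, n_con = X.shape[1], C.shape[0]
    # Zero-gradient conditions of the Lagrangian, stacked as one linear
    # system in the coefficients and the 12 multipliers.
    K = np.block([[2.0 * X.T @ X, C.T], [C, np.zeros((n_con, n_con))]])
    rhs = np.concatenate([2.0 * X.T @ y, np.asarray(monthly_normals, dtype=float)])
    solution, *_ = np.linalg.lstsq(K, rhs, rcond=None)  # SVD-based least squares
    return X @ solution[:n_coef]


# Synthetic stand-ins for y(t) and the homogenized monthly normals.
t = np.arange(1, 366)
raw = 60.0 + 20.0 * np.cos(2.0 * np.pi * (t - 200) / 365.0) + np.sin(1.7 * t)
targets = monthly_means(raw) + np.linspace(1.5, 2.5, 12)

# Largest monthly mismatch for M = 6 (constraints met) and M = 3 (not met).
gap_m6 = np.abs(monthly_means(constrained_harmonic_fit(raw, targets, 6)) - targets).max()
gap_m3 = np.abs(monthly_means(constrained_harmonic_fit(raw, targets, 3)) - targets).max()
```

In this sketch `gap_m6` is zero to within rounding, while `gap_m3` is not, mirroring the statement that fewer than six harmonics cannot satisfy all 12 constraints.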

## 3. Results

The raw daily tmax normal curve for Salinas, California (Fig. 2a), clearly demonstrates an annual cycle that deviates from a classic sinusoidal curve. In fact, using a single harmonic (*M* = 1), the constrained harmonic fit would place the annual peak in early August, whereas the daily data indicate the warmest time of year in Salinas is September–early October. Salinas is representative of many stations in the western United States that have a steeper transition during fall and a more gradual transition during spring. Both the cubic spline interpolation and the constrained harmonic fit (*M* = 6) do a respectable job of capturing the annual cycle in Salinas [the maximum absolute difference between the two is ~0.5°F (~0.3°C)], although the cubic spline interpolation is incapable of resolving intramonth fluctuations (Fig. 2b). For example, the raw daily normals clearly indicate the presence of a local minimum in late July, a feature that is captured by the constrained harmonic fit, yet the cubic spline interpolation increases monotonically from June through August.

The raw daily tmax normal curve for Beaver City, Nebraska (Fig. 3), exhibits strong continentality, in sharp contrast to the much narrower annual range of tmax in coastal California as seen in Salinas. The raw daily normals are noticeably warmer (~2.5°F; ~1.4°C) than the cubic spline interpolation or the constrained harmonic fit, as the monthly homogenization procedure had the net effect of lowering the monthly tmax values for Beaver City (in contrast to Reno tmin; Fig. 1). The constrained harmonic fit is shifted toward cooler tmax values to accommodate this, whereas the cubic spline interpolation is simply fit through the cooler homogenized normals. The two methods result in daily temperature normals that differ by as much as ±1.5°F (±0.8°C) for specific days, particularly in the warmest and coldest months, although the annual means differ by less than 0.1°F (0.06°C).

Consider how the cubic spline interpolation performs in December for Beaver City tmax. The December tmax normal is 40.8°F (4.9°C), and the average of the constrained harmonic fit values for December is also 40.8°F (4.9°C), given the hard constraint. The cubic spline interpolation is anchored at 40.8°F (4.9°C) on 16 December (the middle of the month), but early December values range from 41° to 46°F (5.0°–7.8°C), whereas late December values bottom out near 40°F (4.4°C). As a result, the mean of the cubic spline interpolation for December is over 0.7°F (0.4°C) warmer than the December normal (see inset in Fig. 3). This highlights the inability of the cubic spline interpolation to preserve the homogenized monthly mean. This leads the cubic spline interpolation to underestimate peaks in summer and overestimate minima in winter, resulting in a slight underestimation of total variance with respect to the raw daily normals as well as the constrained harmonic fit approach, which does not result in lowered total variance (not shown).

Overall, the mean difference between the constrained harmonic fit and the cubic spline interpolation results for the 2041 stations considered is essentially zero for both tmax and tmin. However, individual values can vary to some degree, as seen in Figs. 2 and 3. The average absolute difference between the two approaches is 0.3°F (0.2°C) for both tmax and tmin. In terms of maintaining consistency between the monthly temperature normals and the resulting daily temperature normals, the constrained harmonic fit approach with *M* = 6 has zero discrepancy, by design, for all station months. For both tmax and tmin, the monthly means of the cubic spline interpolation typically differ from the monthly normals by 0.2°F (0.1°C) in an absolute sense, and the discrepancy can be as high as 1.0°F (0.6°C). As intimated by Fig. 3, the largest inconsistencies tend to occur in summer and winter months near the maxima and minima for the year.

An alternative to these methods is to apply a low-pass filter to the raw daily normals, such as successive passes of the 1–2–1 filter (see von Storch and Zwiers 1999). However, this simple approach cannot guarantee preservation of the mean. Further, low-pass filtering typically results in reduced variance (e.g., see Arguez et al. 2008), which can lead to severe underestimation and overestimation of highs and lows, respectively.
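To make the filtering alternative concrete, the sketch below applies successive passes of the 1–2–1 filter to a synthetic series. Wrapping the filter around the end of the year is an assumption of this example, as are the function name `filter_121`, the number of passes, and the synthetic `raw` series.

```python
import math


def filter_121(series, passes=1):
    """Apply the 1-2-1 filter `passes` times, wrapping around the year."""
    out = list(series)
    n = len(out)
    for _ in range(passes):
        # out[i - 1] wraps to the last value when i == 0.
        out = [0.25 * out[i - 1] + 0.5 * out[i] + 0.25 * out[(i + 1) % n]
               for i in range(n)]
    return out


# Synthetic raw daily normals: an annual cycle plus high-frequency wiggles.
N = 365
raw = [60.0 + 20.0 * math.cos(2.0 * math.pi * (t - 200) / N)
       + 2.0 * math.sin(1.7 * t) for t in range(1, N + 1)]
smooth = filter_121(raw, passes=20)

# Change in the January mean and loss of annual range caused by filtering.
january_shift = sum(smooth[:31]) / 31 - sum(raw[:31]) / 31
range_loss = (max(raw) - min(raw)) - (max(smooth) - min(smooth))
```

With the wrap-around used here the annual mean is unchanged, but individual monthly means shift and the annual range shrinks, which is the behavior the paragraph above cautions against.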

Other potential options involve a multistep approach. For example, one could pass the monthly adjustments down to either the daily data or to the raw daily normals series (e.g., Vincent et al. 2002). Then, an unconstrained harmonic fit could be used to derive the daily temperature normals. Again, this method does not guarantee preservation of the monthly normals. Another approach is to compute the unconstrained harmonic fit of the raw daily normals, and then adjust the values in each month up or down to match the corresponding monthly normal. However, this introduces intermonth discontinuities, as seen when it is applied to the raw daily tmin normals for Hartford, Connecticut (Fig. 4).
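The discontinuity produced by the last multistep option can be illustrated with a short sketch. A smooth synthetic curve stands in for an unconstrained harmonic fit, and each month is then shifted by a constant so that its mean matches a target monthly normal. The `adjustments` values are hypothetical and chosen only to make the effect visible; the function names are likewise illustrative.

```python
import math

MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def monthly_means(daily):
    """Mean of a 365-value series over each calendar month."""
    means, start = [], 0
    for length in MONTH_LENGTHS:
        means.append(sum(daily[start:start + length]) / length)
        start += length
    return means


def shift_months_to_normals(daily, monthly_normals):
    """Add one constant per month so each monthly mean equals its normal."""
    adjusted, start = [], 0
    for length, current, target in zip(MONTH_LENGTHS, monthly_means(daily), monthly_normals):
        adjusted.extend(v + (target - current) for v in daily[start:start + length])
        start += length
    return adjusted


# Smooth synthetic curve standing in for an unconstrained harmonic fit.
smooth = [60.0 + 20.0 * math.cos(2.0 * math.pi * (t - 200) / 365.0)
          for t in range(1, 366)]
# Hypothetical month-by-month homogeneity adjustments (not from the paper).
adjustments = [0.0, 0.5, -0.3, 0.2, 0.4, -0.1, 0.0, 0.3, -0.2, 0.1, 0.6, -0.4]
normals = [m + a for m, a in zip(monthly_means(smooth), adjustments)]
adjusted = shift_months_to_normals(smooth, normals)

# Extra day-to-day step created at the January/February boundary.
extra_step = (adjusted[31] - adjusted[30]) - (smooth[31] - smooth[30])
```

The monthly means now match the targets, but the step between 31 January and 1 February grows by the difference between the two months' adjustments, so the curve is no longer continuous across month boundaries.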

As stated earlier, the constrained harmonic fit does not preserve the mean monthly normals when fewer than six harmonics are used. Table 1 shows, as a function of *M* (the number of harmonics used), the root-mean-square errors (RMSEs) between the homogenized monthly normals and the monthly averages of the constrained harmonic fit values for the four time series considered in this investigation. For example, the *M* = 1 case for Salinas tmax, which was displayed graphically in Fig. 2a, has an RMSE value of 1.69°F (0.9°C). The constraints in (3) ensure that the RMSE will equal 0 for every month when *M* ≥ 6, which is reflected in Table 1.
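The RMSE described above can be sketched as follows, assuming it is taken over the 12 months of a single series. The function names and the flat test series are illustrative and do not reproduce any value in Table 1.

```python
import math

MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def monthly_means(daily):
    """Mean of a 365-value series over each calendar month."""
    means, start = [], 0
    for length in MONTH_LENGTHS:
        means.append(sum(daily[start:start + length]) / length)
        start += length
    return means


def monthly_rmse(daily_normals, monthly_normals):
    """RMSE between monthly normals and monthly means of the daily normals."""
    squared = [(m - t) ** 2
               for m, t in zip(monthly_means(daily_normals), monthly_normals)]
    return math.sqrt(sum(squared) / len(squared))


# A flat daily series matches flat monthly normals, and misses by the offset
# when every monthly normal is 1 degree warmer.
flat = [50.0] * 365
rmse_match = monthly_rmse(flat, [50.0] * 12)
rmse_offset = monthly_rmse(flat, [51.0] * 12)
```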

Table 1. RMSEs in °F (°C) between the homogenized monthly normal and the monthly average of the daily normals calculated using the constrained harmonic fit as a function of *M*.

## 4. Summary

The constrained harmonic fit effectively smooths the raw daily normals without introducing discontinuities, and ensures preservation of the mean monthly normals. The cubic spline interpolation approach results in a smooth and continuous daily temperature normals curve. However, it does not always capture intraseasonal fluctuations as seen in Fig. 2a, and it does not guarantee preservation of the mean monthly normals as documented in Fig. 3.

The constrained harmonic fit described here offers a one-step approach for computing daily temperature normals satisfying the four properties listed in the introduction. It is particularly useful when monthly temperatures are homogenized but the underlying daily data values are not. However, it also accommodates inherent inconsistencies between monthly and daily data caused by other factors, such as separate quality control of monthly and daily temperatures or the manner in which monthly temperature normals are computed. We recommend the constrained harmonic fit approach for computation of daily temperature normals when the underlying daily data are available and monthly temperature data are deemed to be of superior quality versus the daily data.

## Acknowledgments

The authors kindly acknowledge I. Durre, R. Vose, M. Squires, X. Yin, J. Nielsen-Gammon, J. Christy, and J. Curtis for providing feedback on this approach. We also thank three anonymous reviewers for providing actionable feedback that helped strengthen the manuscript.

## REFERENCES

Arguez, A., P. Yu, and J. J. O'Brien, 2008: A new method for time series filtering near endpoints. *J. Atmos. Oceanic Technol.*, **25**, 534–546.

Arguez, A., I. Durre, S. Applequist, R. S. Vose, M. F. Squires, X. Yin, R. R. Heim Jr., and T. W. Owen, 2012: NOAA's 1981–2010 U.S. Climate Normals: An overview. *Bull. Amer. Meteor. Soc.*, **93**, 1687–1697.

Durre, I., M. J. Menne, B. E. Gleason, T. G. Houston, and R. S. Vose, 2010: Comprehensive automated quality assurance of daily surface observations. *J. Appl. Meteor. Climatol.*, **49**, 1615–1633.

Easterling, D. R., and T. C. Peterson, 1995: A new method for detecting and adjusting for undocumented discontinuities in climatological time series. *Int. J. Climatol.*, **15**, 369–377.

Guttman, N. B., and C. B. Baker, 1996: Exploratory analysis of the difference between temperature observations recorded by ASOS and conventional methods. *Bull. Amer. Meteor. Soc.*, **77**, 2865–2873.

Kalman, D., 2009: Leveling with Lagrange: An alternative view of constrained optimization. *Math. Mag.*, **82**, 186–196.

Menne, M. J., and C. N. Williams Jr., 2009: Homogenization of temperature series via pairwise comparisons. *J. Climate*, **22**, 1700–1717.

Menne, M. J., C. N. Williams Jr., and R. S. Vose, 2009: The United States Historical Climatology Network monthly temperature data version 2. *Bull. Amer. Meteor. Soc.*, **90**, 993–1007.

Menne, M. J., C. N. Williams Jr., and M. A. Palecki, 2010: On the reliability of the U.S. surface temperature record. *J. Geophys. Res.*, **115**, D11108, doi:10.1029/2009JD013094.

Menne, M. J., I. Durre, R. S. Vose, B. E. Gleason, and T. G. Houston, 2012: An overview of the Global Historical Climatology Network-Daily database. *J. Atmos. Oceanic Technol.*, **29**, 897–910.

Owen, T. W., and T. Whitehurst, 2002: United States Climate Normals for the 1971–2000 period: Product descriptions and applications. Preprints, *Third Symp. on Environmental Applications: Facilitating the Use of Environmental Information*, Orlando, FL, Amer. Meteor. Soc., J4.3. [Available online at https://ams.confex.com/ams/annual2002/techprogram/paper_26747.htm.]

Peterson, T. C., and D. R. Easterling, 1994: Creation of homogeneous composite climatological reference series. *Int. J. Climatol.*, **14**, 671–679.

Vincent, L. A., X. Zhang, B. R. Bonsal, and W. D. Hogg, 2002: Homogenization of daily temperatures over Canada. *J. Climate*, **15**, 1322–1334.

von Storch, H., and F. W. Zwiers, 1999: *Statistical Analysis in Climate Research.* Cambridge University Press, 484 pp.

Wilks, D. S., 2006: *Statistical Methods in the Atmospheric Sciences.* 2nd ed. International Geophysics Series, Vol. 59, Academic Press, 627 pp.

^{1} The term “annual cycle” is often used to refer to a single cosine, sine, or harmonic even though the annual march of temperatures for some locations can be highly asymmetric. Here, we refer to an annual cycle as the annual march of temperatures whose morphology can vary from station to station.