## 1. Introduction

Leith (1978) proposed a novel method for empirically correcting a dynamical forecast model in which a term is added to the governing equations that subtracts the predicted tendency error at each time step. This type of empirical correction method differs from after-the-fact correction methods, such as subtracting the bias of a forecast, in that the former involves modifying the dynamical equations while the latter involves modifying the forecast after the unmodified model generates the forecast. Recently, DelSole et al. (2008) concluded that empirical correction methods can substantially improve the bias of dynamical forecasts, but otherwise have relatively little impact on the remaining forecast errors. This conclusion differs from that of other papers, most notably Johansson and Saha (1989), DelSole and Hou (1999), Yang and Anderson (2000), and Danforth et al. (2007), which conclude that a state-independent empirical correction can improve the forecast skill of a model. Given the significant benefits that might follow from improving the skill of an operational weather forecast model, it is important to investigate whether an empirical correction method can indeed improve forecast skill.

The study of DelSole et al. (2008) was based on the Center for Ocean–Land–Atmosphere Studies, version 3.2, (COLAv3.2) model, which is not an operational weather forecast model. In addition, the analyses used to verify the forecasts were generated by a data assimilation system using a different dynamical model. Inconsistencies between the initial condition and dynamical model could be detrimental to empirical correction methods because the correction coefficients are estimated from very short forecasts, which might be dominated by spurious gravity waves. Last, the results from the COLA model revealed that the tendency error at individual grid points was noisy, suggesting that the empirical correction coefficients might be estimated more accurately if the correction were based on large-scale basis functions, such as spherical harmonics, which would filter small-scale noise. The purpose of this paper is to test whether the main conclusions of DelSole et al. (2008) hold for an operational weather forecast model, with a validation dataset that is dynamically consistent with the model, and using large-scale spherical harmonics instead of grid points for estimating correction terms.

In section 2, we review the National Centers for Environmental Prediction (NCEP) operational Global Forecast System (GFS) model, datasets, and method used in this study. The main results of our experiments are presented in section 3. We conclude the paper with a summary and discussion.

## 2. Dynamical model, data, and method

The dynamical model used in this study is the NCEP GFS that has been operational since May 2007 (Moorthi et al. 2001; Saha et al. 2006), except that the resolution used here is T126 (equivalent to a latitude–longitude grid of 0.94° × 0.94°) instead of T384, and 64 sigma levels are used instead of 64 hybrid vertical levels.

The verification dataset used in this study is the sigma analysis produced by the NCEP Global Data Assimilation System (GDAS). Importantly, the GDAS assimilates data into the GFS model itself; therefore, the verification data and the initial conditions are dynamically consistent with the forecast model.

The tendency error at each lead time is approximated as

$$e_m = \frac{x_f(\tau_m) - x_a(\tau_m)}{\tau_m}, \qquad (1)$$

where $m$ is an index of forecast lead time; $x_f$ and $x_a$ are the forecast and analysis fields, respectively; and $\tau_m$ is the forecast lead time. This approximation differs from that used in DelSole et al. (2008) in that here we divide forecast errors by lead time (nevertheless, the two estimation methods give similar results).

DelSole et al. (2008) found that the short-term forecast errors at individual grid points deviated substantially from the linear growth assumption, but the climatological mean tendency errors were large scale. This result suggests that a more accurate estimate of tendency errors can be obtained by computing tendency errors for individual spectral coefficients rather than individual grid points, since the large-scale structures will project strongly on the mean tendency errors and hence filter small-scale noise. Accordingly, we truncated the GFS forecasts to T62 and estimated the tendency error at each day for each spectral coefficient independently using Eq. (1).
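The estimate in Eq. (1) can be sketched for a single spectral coefficient as follows; this is a minimal illustration, with array layout and function names of our own invention, not the operational code.

```python
import numpy as np

def tendency_error(x_f, x_a, taus):
    """Estimate the mean tendency error of one spectral coefficient
    by averaging (forecast - analysis) / lead_time over short leads,
    as in Eq. (1).

    x_f, x_a : arrays of forecast and analysis values at the lead
               times in taus (hypothetical layout).
    taus     : lead times in days, e.g. 0.25, 0.5, 0.75, 1.0.
    """
    taus = np.asarray(taus, dtype=float)
    return np.mean((np.asarray(x_f) - np.asarray(x_a)) / taus)

# A coefficient whose error grows linearly at 0.1 per day yields a
# tendency error of 0.1, regardless of which leads are averaged.
taus = np.array([0.25, 0.5, 0.75, 1.0])   # 6, 12, 18, 24 h
x_a = np.zeros(4)
x_f = 0.1 * taus
err = tendency_error(x_f, x_a, taus)
```

Dividing by lead time before averaging is what makes the four leads comparable under the linear-growth assumption.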

The correction terms were estimated from the operational GFS forecasts during the period 1 June 2005–28 February 2007. This period was chosen for the following reasons. Strictly, the empirical correction terms should be recomputed each time the model changes. Unfortunately, the GFS model was updated 20 times during 2001–08, or about once every 6 months (at the time of writing, documentation of these changes could be found online at http://www.emc.ncep.noaa.gov/gmb/STATS/html/model_changes.html). The impacts of these changes on the forecast errors are not documented in detail. Instead of recomputing the correction after every change, we pooled forecasts from different versions of the GFS when we judged the changes to be small. In May 2005, the GFS was changed significantly, including a new land model, higher horizontal resolution, enhanced orographic height, and modified vertical diffusion. In addition, several changes were made to the analysis scheme, including new satellite inputs. Changes to the GFS after May 2005 were less drastic. We estimated the correction terms using the data before and after May 2005 and detected significant differences, which led us to use only forecasts after May 2005. Although changes after May 2005 might also corrupt the estimates of instantaneous errors, we will show that the empirical correction terms clearly reduce the forecast bias, indicating that a significant part of the mean tendency errors after May 2005 was consistent across model changes.

The climatological tendency errors were determined by fitting the tendency errors at 6-h intervals between 1 June 2005 and 28 February 2007 to a weighted sum of a constant term plus a sine and cosine with an annual period. The fits were performed for each spectral coefficient independently using the method of least squares. Only fits that were statistically significant at the 5% significance level were retained; otherwise, the climatological tendency error of the spectral coefficient was set to zero.
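The annual-cycle fit for one spectral coefficient can be sketched with an ordinary least-squares regression; the significance screening (retaining only 5%-significant fits) is omitted here, and the function name and sampling are illustrative assumptions.

```python
import numpy as np

def fit_annual_cycle(t_days, e):
    """Least-squares fit of tendency errors e(t) to
    a0 + a1*sin(2*pi*t/365.25) + a2*cos(2*pi*t/365.25)."""
    omega = 2.0 * np.pi / 365.25
    A = np.column_stack([np.ones_like(t_days),
                         np.sin(omega * t_days),
                         np.cos(omega * t_days)])
    coef, *_ = np.linalg.lstsq(A, e, rcond=None)
    return coef, A @ coef    # coefficients and fitted climatology

# Synthetic check: recover a known annual cycle from noisy 6-hourly
# samples spanning 21 months.
rng = np.random.default_rng(0)
t = np.arange(0, 638, 0.25)
truth = 1.0 + 0.5 * np.sin(2 * np.pi * t / 365.25)
coef, fitted = fit_annual_cycle(t, truth + 0.01 * rng.standard_normal(t.size))
```

With 6-hourly sampling over 21 months, the three regression coefficients are well constrained even in the presence of noise.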

In the experiments reported below, the empirical correction was applied to four prognostic variables: temperature *T*, zonal wind *U*, meridional wind *V*, and humidity *Q*. Surface pressure was not corrected, on the physical grounds that correcting it could upset mass balance. The correction terms were incorporated by integrating the GFS for one time step and then adding the product of the estimated tendency error and the model time step as an extra forcing term. The control experiments were conducted using the original GFS model without empirical correction.
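The corrected integration step amounts to the following, shown here with a toy drift model standing in for the GFS; the callable interface is a hypothetical simplification.

```python
import numpy as np

def step_with_correction(state, model_step, tendency_error, dt):
    """One corrected time step: integrate the model, then subtract the
    estimated climatological tendency error times the time step as an
    extra forcing.

    model_step     : callable advancing the state by dt (stand-in for
                     the dynamical model)
    tendency_error : estimated tendency error, same shape as state
    """
    state = model_step(state, dt)
    return state - tendency_error * dt

# Toy model with a constant spurious drift of +0.2 per unit time; in
# this idealized case the correction cancels the drift exactly.
drift = 0.2
toy_model = lambda s, dt: s + drift * dt
s = np.array([1.0])
for _ in range(10):
    s = step_with_correction(s, toy_model, np.array([drift]), dt=0.1)
```

In the real model the drift is state- and season-dependent, so the cancellation is only approximate, which is why the corrected runs still have nonzero bias.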

One might be tempted to assume that correction forcings for wind and temperature should obey thermal wind balance. However, the empirical correction terms represent tendencies, not states, and there is little reason to believe that tendency errors should be in thermal wind balance. For instance, if all tendency errors arose from incorrect radiative warming, then this error could be eliminated completely by adding a suitable cooling term to the temperature equation. In this case, the correction terms would not obey thermal wind balance because the momentum equations would have no compensating forcing. More generally, just as the true diabatic heating and friction are not balanced, we do not expect the empirical correction terms, which effectively parameterize missing physics, to be balanced. Although tendency errors in temperature eventually produce wind errors after a sufficiently long time, and the resulting errors may even be in thermal wind balance, the tendency errors that caused these long-term errors need not be balanced.

We explored simple flow-dependent models of tendency errors. However, the correlation coefficient between the anomalous tendency error (i.e., difference from the climatology) and simultaneous spectral coefficient was generally less than 0.1, indicating a negligible relation between tendency errors and simultaneous state. Accordingly, we concluded that simple flow-dependent corrections were not worthwhile.
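The flow-dependence check reduces to a correlation between two time series per coefficient; the following sketch, with synthetic data and invented names, illustrates the calculation.

```python
import numpy as np

def anomaly_correlation(tendency_err, state, clim_err):
    """Correlation between the anomalous tendency error (departure
    from its climatology) and the simultaneous state, for one
    spectral coefficient."""
    anom = tendency_err - clim_err
    return np.corrcoef(anom, state)[0, 1]

# With unrelated series the correlation is near zero, which is the
# situation that led us to reject simple flow-dependent corrections.
rng = np.random.default_rng(1)
r = anomaly_correlation(rng.standard_normal(2000),
                        rng.standard_normal(2000), 0.0)
```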

The forecast error at fixed lead time is decomposed into bias and random components as

$$b = \langle x_f - x_a \rangle, \qquad r = x_f - x_a - b,$$

so that the total mean square error satisfies

$$\left[\langle (x_f - x_a)^2 \rangle\right] = \left[b^2\right] + \left[\langle r^2 \rangle\right],$$

where the angle brackets denote a time average for fixed lead time and the square brackets denote a spatial average over the Northern Hemisphere.
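The bias/random decomposition can be computed directly from forecast and analysis arrays; this sketch uses an assumed (time, point) layout and verifies the exact identity MSE = bias² + random variance.

```python
import numpy as np

def decompose_error(x_f, x_a):
    """Split forecast errors at fixed lead into bias and random parts.

    x_f, x_a : arrays of shape (n_times, n_points).
    Returns bias b (n_points,) and random residual r (n_times, n_points).
    """
    err = x_f - x_a
    b = err.mean(axis=0)   # angle brackets: time average at fixed lead
    r = err - b            # random component, zero time mean
    return b, r

rng = np.random.default_rng(2)
x_a = rng.standard_normal((30, 100))
x_f = x_a + 0.5 + 0.1 * rng.standard_normal((30, 100))  # bias 0.5 + noise
b, r = decompose_error(x_f, x_a)
mse = ((x_f - x_a) ** 2).mean(axis=0)
# At every point, mse == b**2 + mean(r**2) holds by construction.
```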

The verification period for this study is 1–30 March 2007, with each forecast initialized at 0000 UTC of each day. Consequently, the time average indicated by the angle brackets is equivalent to a 30-day average.

## 3. Results

Figure 1 shows a time series and the fitted climatology of the tendency errors in Eq. (1) for temperature at sigma level 0.2, corresponding to the spherical harmonic of degree 2 and order 0 (or, equivalently, zonal wavenumber 0). For this spectral coefficient, the model tends to have a positive bias in boreal summer and a negative bias in boreal winter. The spread of the correction terms about the climatology is generally larger for 6- and 12-h lead times than for 18- and 24-h lead times. The figure reveals that the tendency error for this spectral coefficient undergoes a substantial annual cycle that, if accounted for in the model integration, may reduce the bias in this component.

To investigate the vertical structure of the temperature biases, we computed the temperature bias at each grid point, squared it, and then computed the zonal average. The zonal mean of the squared 5-day temperature bias in the control and corrected runs during the independent period of March 2007 is shown in Fig. 2. The strongest biases occur throughout the troposphere poleward of 30°. The difference between the squared biases of the control and corrected runs, shown in Fig. 2c, is primarily negative, indicating that the empirical correction reduces the biases nearly everywhere.
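The latitude-height cross section of Fig. 2 is obtained by averaging the squared bias over longitude; a minimal sketch, with an assumed (level, lat, lon) array layout:

```python
import numpy as np

def zonal_mean_squared_bias(bias):
    """Zonal mean of the squared bias on a (nlev, nlat, nlon) grid,
    giving a latitude-height cross section as in Fig. 2."""
    return (bias ** 2).mean(axis=-1)   # average over longitude

# Toy field with stronger bias poleward, mimicking the observed pattern.
bias = np.zeros((5, 10, 20))
bias[:, 7:, :] = 1.5
cross_section = zonal_mean_squared_bias(bias)
```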

Testing the statistical significance of the above results is not straightforward because the number of degrees of freedom for the 30 forecasts is unclear: owing to the temporal autocorrelation of synoptic and planetary waves, the 30 forecasts are not independent. An alternative is to test the results on additional independent data. Accordingly, after this paper was submitted and returned for revisions, we applied the correction to another independent period, namely March 2008. The vertical distribution of the temperature bias was similar in this second independent period (not shown). Analogous plots for wind and humidity (not shown) reveal that the empirical correction also reduces the bias in these variables, but the spatial structure of these biases differed between March 2007 and March 2008, suggesting that these reductions may not be statistically significant.

The horizontal structure of the temperature bias in the control and empirically corrected model at 5 days is shown in Fig. 3. Figure 3a reveals that the temperature bias of the control run is generally negative, indicating a cold bias. The temperature bias for the corrected run, shown in Fig. 3b, is substantially smaller in most regions. The spatial structure of the correction terms for temperature (Fig. 3c) is generally opposite to that of bias, but it is not a mere mirror image of bias.

Figures 4a,b show the decomposition of error into total, bias, and random components for temperature and zonal wind in March 2007. We show results for sigma level 0.2 because the bias error is largest at this level (see Fig. 2). For temperature, the empirical correction reduces the bias by about 50% at 5 days (Fig. 4a). However, the random error of the corrected runs is slightly worse than that of the control runs; thus, the reduction in total MSE arises solely from the reduction in bias. For zonal wind, the errors of the corrected runs are marginally smaller than those of the control runs (Fig. 4b). The analogous plots for meridional wind and humidity (not shown) also reveal little to no benefit from empirical correction.

Since the empirical correction has a dominant impact on temperature errors and relatively little impact on wind errors, the question arises as to how much bias reduction would result from a temperature-only correction. To address this question, we conducted experiments in which only temperature was corrected. The results for March 2008 are shown in Figs. 4e,f. Comparison of Figs. 4c and 4e reveals that the temperature bias reduction is greater in the experiments with *U*, *V*, *T*, and *Q* corrections than with temperature-only correction, whereas the wind biases are nearly the same in either case. Comparison of the results for March 2007 yields the same conclusion (not shown). We conclude that temperature-only correction captures well over one-half of the bias reduction due to correcting *U*, *V*, *T*, and *Q*, but the latter gives the best bias reduction.

The weak impact of the correction on wind errors can be understood from its weak impact on *gradients* in temperature bias. To demonstrate this fact, we show in Fig. 5 the square root of the spatially averaged MSE in temperature gradient, called the gradient error (GE), defined as

$$\mathrm{GE} = \sqrt{\left[\langle \lvert \nabla (x_f - x_a) \rvert^2 \rangle\right]},$$

where the square brackets again denote averaging over the Northern Hemisphere. The figure shows that the gradient error is nearly the same in the corrected and control runs. Since thermal wind balance relates temperature gradients to wind gradients, it follows that the empirical correction negligibly impacts the thermal wind because it has only a minor impact on the temperature gradient error. In essence, the empirical correction improves the spatially uniform part of the temperature bias and hardly affects the gradient errors.
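The key property here is that a spatially uniform bias contributes nothing to the gradient error. The following sketch demonstrates this on a regular grid, using plane geometry as a stand-in for the spherical calculation; the function name and spacings are illustrative.

```python
import numpy as np

def gradient_error(bias, dlat_km=100.0, dlon_km=100.0):
    """Root-mean-square horizontal gradient of a bias field on a
    regular grid (a plane-geometry stand-in for the sphere).

    bias : 2D array (nlat, nlon).
    """
    dbdy, dbdx = np.gradient(bias, dlat_km, dlon_km)
    return np.sqrt(np.mean(dbdx ** 2 + dbdy ** 2))

# A spatially uniform bias has zero gradient error: removing it
# changes the bias but leaves GE unchanged, which is why the
# correction can reduce bias without improving the gradient error.
uniform = np.full((18, 36), 2.0)
ge_uniform = gradient_error(uniform)
```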

It is of interest to compare the bias improvement due to empirical correction with that due to simply subtracting the bias from the forecast. Following Saha (1992), we call the latter method "after the fact" correction. We estimated the forecast bias at each lead time using the forecasts initialized at 0000 UTC of each day in March 2006 (a total of 30). Figure 6 shows the bias as a function of forecast time for after-the-fact correction and empirical correction. The empirical correction produces less bias throughout the 5-day period than after-the-fact correction, for both temperature and zonal wind. This conclusion holds even if the 30 forecasts are split into two sets of 15 and the bias is computed separately for each set (not shown). We speculate that after-the-fact correction removes less bias than the empirical correction because its bias was estimated from only one month in one year, whereas the empirical correction estimated the bias by fitting 21 consecutive months to a sinusoid with an annual period. While it may seem unfair to compare corrections derived from very different sample sizes, no other data are available for computing monthly biases: there is only one March in the period June 2005–February 2007, and operational forecasts earlier than May 2005 have significantly different biases. A more innovative after-the-fact correction might be constructed by fitting the biases to an annual cycle, thereby utilizing several months of data, but this is not the traditional method for correcting systematic errors. We emphasize that the empirical correction derived from 24-h forecast errors improves the bias at each lead time up to at least 5 days, whereas an after-the-fact correction requires estimating the bias at each lead time separately.
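After-the-fact correction is simply a subtraction of a lead-time-dependent bias estimated from a training set; a minimal sketch with hypothetical array shapes:

```python
import numpy as np

def after_the_fact_correct(x_f, bias_by_lead):
    """Subtract a lead-time-dependent bias estimate from forecasts.

    x_f          : array (n_forecasts, n_leads, n_points)
    bias_by_lead : array (n_leads, n_points), estimated at each lead
                   separately from a training set of forecasts.
    """
    return x_f - bias_by_lead[np.newaxis, :, :]

# Idealized case: a bias that grows with lead and is perfectly
# estimated from the training set is removed completely.
rng = np.random.default_rng(3)
truth = rng.standard_normal((30, 20, 50))
x_f = truth + np.linspace(0, 1, 20)[None, :, None]  # bias grows with lead
bias = (x_f - truth).mean(axis=0)                   # training estimate
corrected = after_the_fact_correct(x_f, bias)
```

In practice the training and verification sets differ, so the sampling error of `bias_by_lead` limits the achievable improvement, which is the point made above.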

One counterintuitive result is that the empirical correction improves the bias while making the random error slightly worse (Fig. 4). This can be explained by the fact that the two forecasts have different variances. Consider Fig. 7a, which shows the square root of the spatially averaged variance of temperature as a function of lead time. The variance of the analysis is steady, while the variance of the uncorrected forecasts tends to decrease with lead time. In contrast, the variance of the corrected forecasts is larger than that of the control forecasts and is closer to that of the analysis. In general, it is not fair to compare the MSEs of two forecasts with very different variances. One way to compensate for variance differences is to divide the MSE by the sum of the variances of the forecast and the verification: if the forecast is useless, then the forecast and verification are independent and the normalized MSE is equal to 1. Thus, the normalized MSE (NMSE) accounts for variations in MSE caused by variations in forecast variance. The NMSEs for the corrected and control runs are shown in Fig. 7b. The figure shows that the empirical correction reduces the normalized random error after 4 days, implying that the increased random error in the corrected model is an artifact of the difference in total variance between the two forecasts. Note that the empirical correction also improves the normalized bias.
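The normalization can be stated compactly; this sketch checks the stated property that an entirely useless forecast scores about 1. The function name is ours.

```python
import numpy as np

def nmse(x_f, x_a):
    """Normalized MSE: MSE divided by the sum of the forecast and
    verification variances, so a useless forecast scores about 1."""
    mse = np.mean((x_f - x_a) ** 2)
    return mse / (np.var(x_f) + np.var(x_a))

# An independent "forecast" of a standard normal verification has
# MSE close to the sum of the two variances, hence NMSE near 1.
rng = np.random.default_rng(4)
x_a = rng.standard_normal(10000)
useless = rng.standard_normal(10000)
score = nmse(useless, x_a)
```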

## 4. Conclusions and discussion

This paper investigated the degree to which an empirical correction method can improve the GFS operational forecasts. The empirical correction method subtracts the climatological tendency error of winds, temperature, and moisture at each time step, as estimated from forecast errors at 6, 12, 18, and 24 h. Simple flow-dependent models of tendency errors were investigated but rejected because the tendency errors in the spectral coefficients were only weakly correlated with the simultaneous state. We found that forcing the model with the negative of the climatological tendency errors significantly reduced the temperature bias (e.g., a 50% reduction at 5 days). Analysis of horizontal and vertical maps of the forecast biases demonstrates that the empirical correction reduces the temperature biases nearly everywhere. The analogous maps for winds also reveal that the correction improves the wind biases, but the relative magnitudes were smaller and the structure was sensitive to the verification data. Thus, the empirical correction primarily improves the bias in temperature. The consistent reduction of temperature bias at most points and all lead times in two independent years strongly suggests that our results are not due to sampling fluctuations.

Interestingly, the corrected model had larger random errors than the control model. The increase in random error was attributed to the fact that the variance of the forecasts differed substantially between the two models. In general, it is not fair to compare MSEs of forecasts with different variances. One way to compensate for the change in variance is to divide the MSE by the sum of the variance of the forecast and verification—if the forecast is useless, then the forecast and verification are independent and the normalized MSE is equal to 1. We found that the MSE normalized in this way was (marginally) smaller for the corrected model than for the control model, implying that the increased random error in the corrected model is an artifact of the change in total variance of the two forecasts.

We conclude that forcing a model by (the negative of) its climatological tendency errors can substantially reduce the temperature bias, but generally does not reduce the random error variance. This conclusion is consistent with that of DelSole et al. (2008) and Saha (1992), despite numerous differences in models, methodology, initial conditions, and variables being corrected. In contrast, this conclusion differs significantly from those of Johansson and Saha (1989), DelSole and Hou (1999), Yang and Anderson (2000), and Danforth et al. (2007). We note that the latter studies were based on idealized models rather than an operational forecast model. We suggest that the different conclusions arise from the fact that operational models generally have smaller biases than the idealized models that have been used to justify the claim that empirical corrections can improve forecast skill [see DelSole et al. (2008) for further elaboration].

We found that empirical correction reduced the temperature bias more than after-the-fact correction. However, the bias in after-the-fact correction was estimated only from 1 month of data, whereas the empirical correction used a bias estimated from fitting 21 consecutive months to a sinusoid with an annual period. This difference in sample sizes was unavoidable: there is only one March in the period June 2005–February 2007, and operational forecasts earlier than May 2005 have significantly different biases. The problem of limited data for computing systematic errors is persistent in operational centers that continuously update forecast models. However, Saha (1992) and DelSole et al. (2008) show that reasonable empirical correction forcing can be estimated from just 1 month of data, suggesting a benefit to this type of correction strategy as compared with traditional after-the-fact-type strategies.

The benefits of empirical correction to medium- or seasonal-range forecast models might be greater than for short-range forecast models because the correction may reduce climate drift and thereby improve forecast skill. The result of applying an empirical correction to the NCEP Climate Forecast System will be reported in a future paper.

## Acknowledgments

This research was supported by the National Oceanic and Atmospheric Administration (Grant NA06OAR4310001). We thank Cathy Thiaw and Bert Katz for helping us obtain the GDAS data on sigma coordinates and for running the numerical experiments reported in this paper. We also thank Editor Dr. Thomas Hamill and two anonymous reviewers for thoughtful comments that led to substantial clarifications, and the Center for Ocean–Land–Atmosphere Studies for providing the computational resources required by this project.

## REFERENCES

Danforth, C. M., E. Kalnay, and T. Miyoshi, 2007: Estimating and correcting global weather model error. *Mon. Wea. Rev.*, **135**, 281–299.

DelSole, T., and A. Y. Hou, 1999: Empirical correction of a dynamical model. Part I: Fundamental issues. *Mon. Wea. Rev.*, **127**, 2533–2545.

DelSole, T., M. Zhao, P. A. Dirmeyer, and B. P. Kirtman, 2008: Empirical correction of a coupled land–atmosphere model. *Mon. Wea. Rev.*, **136**, 4063–4076.

Johansson, A., and S. Saha, 1989: Simulation of systematic error effects and their reduction in a simple model of the atmosphere. *Mon. Wea. Rev.*, **117**, 1658–1675.

Leith, C. E., 1978: Objective methods for weather prediction. *Annu. Rev. Fluid Mech.*, **10**, 107–128.

Moorthi, S., H.-L. Pan, and P. Caplan, 2001: Changes to the 2001 NCEP operational MRF/AVN global analysis/forecast system. NWS Tech. Procedures Bull. 484, 14 pp. [Available online at http://www.nws.noaa.gov/om/tpb/484.htm.]

Saha, S., 1992: Response of the NMC MRF model to systematic-error correction within integration. *Mon. Wea. Rev.*, **120**, 345–360.

Saha, S., and Coauthors, 2006: The NCEP Climate Forecast System. *J. Climate*, **19**, 3483–3517.

Yang, X.-Q., and J. L. Anderson, 2000: Correction of systematic errors in coupled GCM forecasts. *J. Climate*, **13**, 2072–2085.