## 1. Introduction

It is well known that coupled ice–ocean–land–atmosphere models exhibit climate drift; that is, their mean states drift from the observed climatological state. To suppress drift in coupled ocean–atmosphere models, Manabe et al. (1991) proposed the *flux adjustment* method and Kirtman et al. (1997) proposed the *anomaly coupling* method. The question arises as to whether suppression of climate drift leads to improved forecasts and simulations. Furthermore, how do different methods for suppressing climate drift compare with each other? The purpose of this paper is to shed light on these questions by investigating the performance of three *simple* correction strategies within the context of a coupled land–atmosphere model. A focus on coupled land–atmosphere models is appropriate because, despite successes in reducing drift in coupled ocean–atmosphere models, precipitation biases have been only modestly reduced (Phillips and Gleckler 2006) and, consequently, coupled land–atmosphere models still exhibit significant drift. This lag in model development arises for several reasons, including a lack of global verification datasets for land surface variables, particularly soil moisture; the greater complexity and nonlinearity of sensible and latent heat fluxes at the land surface compared to the ocean surface; and the great variation of land surface dynamics in space and time. As long as these difficulties remain, empirical correction strategies for land–atmosphere models will remain attractive.

A novel strategy for reducing climate drift was proposed by Leith (1978). In this approach, the tendency errors of a model, that is, the errors in the rate of change of a variable, are fitted to a linear function of the prognostic variables, and then the resulting model for the tendency errors is subtracted from the dynamical equations to produce a new set of equations with less tendency error. Leith (1978) showed that the resulting empirically corrected model would preserve the mean and preserve the instantaneous rate of change of second moments. This strategy has been shown to reduce the forecast error and improve the simulation quality of idealized imperfect models (Faller and Lee 1975; Faller and Schemm 1977; DelSole and Hou 1999; Achatz and Branstator 1999). Unfortunately, estimation of the full linear model for a climate system is prohibitive, although recently Danforth et al. (2007) proposed promising approximate methods. In this paper we apply Leith’s method only for state-independent correction terms, which is equivalent to using a forcing term equal to (minus) the climatological mean tendency error. To estimate this term, Klinker and Sardeshmukh (1992) used the average one-time-step forecast, starting at each 6-hourly analysis within that month. This novel method decouples errors among different parameterizations and different locations, since the model is integrated only for one time step. However, Klinker and Sardeshmukh (1992) found considerable similarity between the 1-day forecast error and the tendency errors computed from one-time-step integrations. Consequently, we estimate the tendency errors from 1-day forecasts. We call this correction strategy *nudging based on tendency errors.*

In addition to the above strategy, we consider two others: 1) relaxation and 2) nudging based on long-term biases. In the *relaxation method,* the state is relaxed toward a reference state at a rate that is chosen empirically. In *nudging based on long-term biases*, the state is nudged at a rate that opposes the biases that develop over the time scale of interest. The relative merits of these strategies are straightforward to understand. Relaxation methods adaptively adjust the state toward the climatology, but also damp deviations about the climatology. Nudging based on long-term biases opposes the biases that ultimately have the largest amplitude, but the dynamical response to such nudging may be highly nonlinear and nonlocal, and thus may not behave as planned. Nudging based on tendency errors captures the fastest growing errors in the early stages of development, before nonlinear and nonlocal effects become important, but the tendency errors that contribute to long-term biases may not be detectable in the day-to-day variations of tendency errors. Nudging based on tendency errors has no tunable parameters, whereas the other two methods involve tunable parameters associated with relaxation rates.

If a dynamical model were linear, then empirical correction would remove *only* the bias and hence provide no advantages compared to after-the-fact correction, such as simply replacing the climatological mean forecast with the observed climatological mean state. Hence, bias correction is worthwhile only to the extent that the model is nonlinear. Unfortunately, the available studies do not consistently support the hypothesis that bias correction improves forecast skill. For instance, Johansson and Saha (1989) found that a state-independent correction in a barotropic model improved forecast skill, whereas Saha (1992) found that slowly varying corrections in an atmospheric GCM did not. DelSole and Hou (1999) found that state-independent corrections did not improve forecast skill, but this result might be due to the fact that the contrived model errors were state dependent. Yang and Anderson (2000) found that state-independent nudging for ocean temperature in a coupled atmosphere–ocean GCM reduced bias and improved skill in the tropical Pacific. Similarly, Danforth et al. (2007) found that nudging based on the climatologically averaged tendency errors could improve the anomaly correlation of simple atmospheric models. These mixed results demonstrate that the effectiveness of empirical correction depends on the forecast model. Therefore, only direct verification with a realistic land–atmosphere model can settle whether bias correction can improve forecast skill.

Our investigation of empirical correction strategies can be regarded as more definitive than previous studies in several respects. First, we apply corrections to the coupled system, that is, to both the atmospheric and land components of the model. Second, our sample size is large; the correction coefficients are estimated from 10 years of daily forecasts, and then tested on 10 independent years. Third, more than one correction strategy is investigated. Fourth, the impact of the empirical correction method on different time scales is examined, from days to seasons.

In the next section, we review the dynamical model and datasets used in this study. The empirical correction strategies are discussed in detail in section 3, and our measures of forecast performance are discussed in section 4. Our main results are discussed in section 5. We conclude with a summary and discussion. In a forthcoming companion paper (Zhao et al. 2008, manuscript submitted to *J. Hydrometeor.,* hereafter ZDD), we report upon our results from applying the empirical correction method to the land and atmospheric components separately, which sheds light on dynamical interactions between the land and atmosphere.

## 2. Dynamical model and data

The dynamical model used in this study is the Center for Ocean–Land–Atmosphere land–atmosphere model, version 3.2 (COLA V3.2), with specified sea surface temperatures. The details of this model are reviewed in Misra et al. (2007). Among the major differences relative to the earlier version, we mention that V3.2 predicts soil temperature and wetness in six layers (instead of three), with four embedded within the root zone. In addition, several atmospheric model parameters were tuned to produce reasonable energy balances over the oceans.

The dataset used to validate the atmospheric forecast is the National Centers for Environmental Prediction–National Center for Atmospheric Research (NCEP–NCAR) reanalysis product (Kalnay et al. 1996). To avoid errors in the atmospheric fields arising from the interpolation of data to the model grid, we extracted the reanalysis data in spectral form on the same 28 sigma levels as were used in the data assimilation system. Furthermore, the COLA V3.2 model was run at exactly the same resolution as the NCEP–NCAR reanalysis model, namely T62L28, with identical topography.

The dataset used to validate the land forecast is the Global Offline Land Surface Dataset (GOLD; Dirmeyer and Tan 2001), version 2. An updated GOLD dataset using the same land model as the COLA V3.2 was generated specifically for this project to ensure consistency between the offline and COLA V3.2 land surface models, and to provide data at 6-hourly intervals. We consider forecasts only during the summer months of June–August (JJA). The empirical correction terms were estimated during the period 1982–91, while the empirically corrected model was verified using data from the independent period 1992–2001.

## 3. Estimation of the correction coefficients

Consider a dynamical model of the form

$$\frac{d\mathbf{x}}{dt} = g(\mathbf{x}), \tag{1}$$

where $\mathbf{x}$ is a state vector and $g(\mathbf{x})$ is a tendency (i.e., a rate of change) computed from a dynamical model, such as a general circulation model. We seek a forcing $\epsilon$ such that the model

$$\frac{d\mathbf{x}}{dt} = g(\mathbf{x}) - \epsilon \tag{2}$$

has less tendency error than (1). The approach of Leith (1978) leads $\epsilon$ to be identified as the tendency error. The tendency error is estimated as the slope of the least squares line fit between forecast errors and lead time. The fitting uses the 6-, 12-, 18-, and 24-h forecast errors, and is performed at each grid point and sigma level individually. We find that model (2) is unstable if the daily tendency error is used. Stable integrations can be produced by smoothing the tendency errors in time using the formula

$$\epsilon_N = \left(1 - \frac{1}{N^*}\right)\epsilon_{N-1} + \frac{1}{N^*}\, f_N, \tag{3}$$

where $f_N$ is the estimated tendency error on the $N$th day, and $N^*$ is a fixed constant. This equation is an autoregressive model for $\epsilon$ with a decorrelation time of $N^*$ days (where the “decorrelation time” is the sum of the autocorrelations over all lead times). Results for $N^* = 1$ and 2 were indistinguishable from each other, while those for $N^* = 5$ were systematically worse (in a mean square error sense); hence, results are presented only for $N^* = 2$.
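The estimation and smoothing steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function names are hypothetical, the text does not say whether the line fit includes an intercept (both options are exposed), and the recursion is assumed to take the first-order autoregressive form $\epsilon_N = (1 - 1/N^*)\,\epsilon_{N-1} + f_N/N^*$, which has a decorrelation time of $N^*$ days.

```python
import numpy as np

LEAD_DAYS = np.array([6.0, 12.0, 18.0, 24.0]) / 24.0  # lead times in days


def tendency_error(errors, leads=LEAD_DAYS, through_origin=False):
    """Slope of the least squares line of forecast error versus lead time.

    errors: array of shape (n_leads, ...) holding forecast-minus-analysis
    differences; trailing axes (grid point, level) are fitted independently.
    """
    errors = np.asarray(errors, dtype=float)
    leads = np.asarray(leads, dtype=float)
    # Reshape lead times so they broadcast against the trailing axes.
    t = leads.reshape((-1,) + (1,) * (errors.ndim - 1))
    if through_origin:
        # Assumption: the forecast error vanishes at zero lead time.
        return (t * errors).sum(axis=0) / (t ** 2).sum()
    t_anom = t - t.mean()
    e_anom = errors - errors.mean(axis=0)
    return (t_anom * e_anom).sum(axis=0) / (t_anom ** 2).sum()


def smooth_tendency_errors(f_daily, n_star=2.0):
    """Assumed AR(1) smoother: eps_N = (1 - 1/N*) eps_{N-1} + f_N / N*."""
    f_daily = np.asarray(f_daily, dtype=float)
    eps = np.empty_like(f_daily)
    prev = f_daily[0]  # eps_0 = f_1, so that eps_1 = f_1
    for n, f_n in enumerate(f_daily):
        prev = (1.0 - 1.0 / n_star) * prev + f_n / n_star
        eps[n] = prev
    return eps
```

With `n_star=1` the smoother returns the daily estimates unchanged; larger values give more weight to earlier days.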

A critical issue concerns the initialization of the model. We adopt the following procedure, which is illustrated in Fig. 1. For each initial condition, two 24-h forecasts are computed: one based on the uncorrected model (1) and one based on the corrected model (2). The uncorrected 24-h forecast gives the 6-, 12-, 18-, and 24-h forecast errors, from which the tendency errors for that day are computed, as described in the previous paragraph. Having computed the tendency error for the $N$th day, denoted $f_N$, the recursive relation (3) is used to compute the forcing term $\epsilon_N$ for that day. [For the very first forecast, we initialize the recursive relation (3) with $\epsilon_0 = f_1$, which gives $\epsilon_1 = f_1$.] Then, the corrected model (2) is integrated for 24 h using the forcing term $\epsilon_N$ for the appropriate day. The end of the 24-h corrected forecast is then used as the initial condition for the next iteration of this procedure.

Initial conditions produced by the above procedure can be interpreted as model states nudged toward the instantaneous analysis. Hence, it is appropriate to call an initial condition generated this way an assimilation. This assimilation is reminiscent of the incremental analysis update method of Schubert et al. (1993). In ZDD, we discuss experiments in which only subsets of variables were assimilated, as tabulated in Table 1.
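To make the cycling concrete, the following toy example applies the procedure to a scalar model. Everything here is an assumption made for illustration: the "analysis" is a constant, the model is a linear relaxation toward a biased equilibrium, the fit of error against lead time is taken through the origin, and the recursion is assumed to be $\epsilon_N = (1 - 1/N^*)\,\epsilon_{N-1} + f_N/N^*$. None of the names or numbers come from COLA V3.2.

```python
import numpy as np

ANALYSIS = 1.0   # constant "truth" for the toy problem (hypothetical)
MODEL_EQ = 1.5   # equilibrium of the biased toy model g(x) = MODEL_EQ - x
LEADS = np.array([0.25, 0.5, 0.75, 1.0])  # 6, 12, 18, 24 h in days


def forecast(x0, t, eps=0.0):
    """Exact solution of dx/dt = (MODEL_EQ - x) - eps started from x0."""
    x_eq = MODEL_EQ - eps
    return x_eq + (x0 - x_eq) * np.exp(-t)


def cycle(n_days, n_star=2.0, x0=ANALYSIS):
    """Run the daily estimate-smooth-correct cycle; return states and forcings."""
    states, forcings = [], []
    eps = None
    for _ in range(n_days):
        # Uncorrected 24-h forecast gives errors at the four lead times.
        errors = forecast(x0, LEADS) - ANALYSIS
        # Tendency error for the day: slope of a line through the origin.
        f_n = (LEADS * errors).sum() / (LEADS ** 2).sum()
        # First day: eps_0 = f_1, so eps_1 = f_1.
        eps_prev = f_n if eps is None else eps
        eps = (1.0 - 1.0 / n_star) * eps_prev + f_n / n_star
        # Corrected 24-h forecast; its end state starts the next cycle.
        x0 = forecast(x0, 1.0, eps)
        states.append(x0)
        forcings.append(eps)
    return np.array(states), np.array(forcings)


states, forcings = cycle(60)
print(f"final state {states[-1]:.3f}, final forcing {forcings[-1]:.3f}")
```

In this toy the cycled state settles between the analysis and the model's own equilibrium, a simple picture of how such initial conditions behave like states nudged toward the analysis without matching it exactly.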

The empirical correction will be applied only to three atmospheric variables—temperature (T0), zonal velocity (U0), and meridional velocity (V0)—and to two land variables—soil temperature (ST) and soil wetness (SW). Correction of the water vapor concentration had little impact on the forecast and was considered inappropriate given its questionable accuracy in the reanalysis. Surface pressure was not corrected because of technical difficulties arising from its being integrated in spectral space, in contrast to the other variables that are integrated in grid space. Applying corrections to the top atmospheric levels produced instability. Accordingly, corrections were applied only to the lowest 22 sigma levels, corresponding to sigma values of 0.08–1.0.

Since the reanalysis is not “truth,” the estimated tendency error is not the true tendency error, but rather the tendency *difference* between the reanalysis and the model. Thus, any bias in the reanalysis will contaminate the estimated *ϵ* term. However, we will show that the empirical correction method substantially reduces biases beyond lead times from which the tendency errors were estimated, suggesting that the true tendency errors are captured by the estimation procedure.

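In the relaxation method, the corrected model is

$$\frac{d\mathbf{x}}{dt} = g(\mathbf{x}) - \frac{\mathbf{x} - \mathbf{x}_c}{\tau_R}, \tag{5}$$

where $\tau_R$ is a relaxation time scale and $\mathbf{x}_c$ is a “climatological” field taken to be the monthly average field for the 1982–91 period. A value of $\tau_R = 5$ days was chosen because this is comparable to the time scale of the error growth of atmospheric variables. The resulting runs are identified as “relax to climatology” in Table 1.

In nudging based on long-term biases, the model is forced with the rescaled and sign-reversed monthly bias,

$$\frac{d\mathbf{x}}{dt} = g(\mathbf{x}) - \frac{b_M}{\tau_R},$$

where $b_M$ is the monthly mean uncorrected forecast minus the monthly mean reanalysis. The associated runs are identified as “nudging, long term” in Table 1.

Finally, we consider a correction that depends linearly on the local state, $ax + b$, in which the coefficients $a$ and $b$ are chosen to minimize $\langle(\epsilon_N - ax - b)^2\rangle$ at each grid point individually. The resulting model is denoted “linear, flow dependent” in Table 1.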
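The least squares coefficients have a closed form at each grid point: $a = \mathrm{cov}(x, \epsilon_N)/\mathrm{var}(x)$ and $b = \langle\epsilon_N\rangle - a\langle x\rangle$. A minimal NumPy sketch follows; the function name and the array layout (days along the first axis, grid points along the remaining axes) are assumptions for illustration, not the authors' code.

```python
import numpy as np


def fit_flow_dependent(x, eps):
    """Per-grid-point least squares fit of eps ~ a * x + b.

    x, eps: arrays of shape (n_days, ...) with the sample (time) axis first.
    Returns (a, b, r), where r is the sample correlation between x and eps.
    """
    x = np.asarray(x, dtype=float)
    eps = np.asarray(eps, dtype=float)
    xa = x - x.mean(axis=0)
    ea = eps - eps.mean(axis=0)
    var_x = (xa ** 2).mean(axis=0)
    var_e = (ea ** 2).mean(axis=0)
    cov = (xa * ea).mean(axis=0)
    a = cov / var_x
    b = eps.mean(axis=0) - a * x.mean(axis=0)
    r = cov / np.sqrt(var_x * var_e)
    return a, b, r


# Synthetic check: a constant error plus a weak dependence on the state.
rng = np.random.default_rng(0)
x = rng.normal(size=(900, 3))
eps = 0.5 + 0.1 * x + rng.normal(size=(900, 3))
a, b, r = fit_flow_dependent(x, eps)
```

A small `|r|`, as in the synthetic case, indicates that the fitted slope `a` adds little beyond the constant term `b`.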

We attempted to include a correction that depended on the time of day, by fitting forecast errors at 6, 12, 18, and 24 h to the sine and cosine functions with 24-h periods. Unfortunately, the resulting empirically corrected model performed worse than did models with corrections that were independent of the time of day. This poor performance is likely due to the fact that 6 h is too short to resolve the diurnal cycle (it is only one harmonic separated from the Nyquist frequency). These runs will not be discussed in the remainder of the paper.

## 4. Skill measures

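Forecast errors are measured by the mean square error (MSE), which decomposes into a squared bias and a random component:

$$\mathrm{MSE} = \langle (x_f - x_a)^2 \rangle = \left(\langle x_f \rangle - \langle x_a \rangle\right)^2 + \left\langle \left[(x_f - \langle x_f \rangle) - (x_a - \langle x_a \rangle)\right]^2 \right\rangle .$$

The mean square error normalized by the sum of the forecast variance and analysis variance is

$$\mathrm{NMSE} = \frac{\langle (x_f - x_a)^2 \rangle}{\sigma_f^2 + \sigma_a^2},$$

where $x_f$ and $x_a$ are the forecast and analysis fields, respectively; the brackets $\langle\,\rangle$ denote a time average while holding the lead time fixed; and $\sigma_f^2$ and $\sigma_a^2$ are the 10-yr variances of the forecast and analyses, respectively, at each grid point, sigma level, and calendar day. We also define a normalized random squared error as

$$\mathrm{NRSE} = \frac{\left\langle \left[(x_f - \langle x_f \rangle) - (x_a - \langle x_a \rangle)\right]^2 \right\rangle}{\sigma_f^2 + \sigma_a^2}. \tag{12}$$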
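As an illustration of how these measures fit together, the sketch below computes the mean square error, its bias and random parts, and the two normalized scores for a set of forecast and analysis pairs at a fixed lead time. The function name and the explicit formulas (squared bias plus random error, with each score divided by $\sigma_f^2 + \sigma_a^2$) are assumptions based on the verbal definitions in the text.

```python
import numpy as np


def skill_measures(x_f, x_a):
    """Error decomposition over the sample axis (axis 0) at a fixed lead time."""
    x_f = np.asarray(x_f, dtype=float)
    x_a = np.asarray(x_a, dtype=float)
    mse = ((x_f - x_a) ** 2).mean(axis=0)
    bias = x_f.mean(axis=0) - x_a.mean(axis=0)
    # Random error: mean square difference of the anomalies.
    anom_diff = (x_f - x_f.mean(axis=0)) - (x_a - x_a.mean(axis=0))
    random = (anom_diff ** 2).mean(axis=0)
    # Sum of forecast and analysis variances (population variance, ddof=0).
    var_sum = x_f.var(axis=0) + x_a.var(axis=0)
    return {
        "mse": mse,
        "bias_sq": bias ** 2,
        "random": random,
        "nmse": mse / var_sum,
        "nrse": random / var_sum,
    }
```

For a forecast that is unbiased and uncorrelated with the analysis, `nmse` is close to one, which is why values above one are read as no skill.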

The skill of weather predictions is measured by applying the above metrics to instantaneous states $x_f$ and $x_a$. The skill beyond weather prediction can be measured by interpreting the states $x_f$ and $x_a$ as monthly or seasonal mean states, with the brackets $\langle\,\rangle$ denoting 10-yr averages of these states. We will apply these metrics separately to the total field and anomaly field, defined as the total field minus the 10-yr average field.

Results for atmospheric variables are displayed on horizontal cross sections on sigma level 0.13, corresponding to approximately 130 hPa, and on vertical cross sections of zonal mean quantities. Results for land variables are displayed on the second soil level, which has depths ranging from 4 to 14 cm, depending on the vegetation. Results for other soil layers mimic those for the second soil level, except for a general decrease in error amplitude with depth.

## 5. Results

### a. Comparison of different empirical correction schemes

We now show the performance of various empirical correction schemes. The root-mean-square errors (RMSEs) of selected schemes are shown in Fig. 2. In general, results for U0 and V0 are similar and so only V0 is shown. First, note that the time scale of the error growth is about 5 days for V0 and longer for other variables, suggesting that the time scale $\tau_R$ used in relaxation and nudging schemes should be no smaller than 5 days.

Second, note that the errors for different runs are similar, except for temperature errors in the run corresponding to nudging based on long-term biases (the “v run” indicated by gray triangles). In the latter run, the correction scheme enhances the error in the atmospheric temperature in the second week of the integration, in contrast to other schemes. Furthermore, the run becomes numerically unstable after 15 July. (The “kink” at 1 July is due to the abrupt increase in the monthly bias across the month boundary.) The performance of this scheme might be improved by tuning the time scale $\tau_R$, but such tuning probably could not improve the errors at the 2-week time scale and at longer time scales simultaneously, because of their opposite relations to the error of the uncorrected model. Similar results are found at other model levels. These results show that simply forcing a model with the rescaled and sign-reversed monthly biases does not improve the model forecasts at all time scales. Accordingly, we give no further consideration to this scheme.

Third, note that errors for the linear, flow-dependent correction (i.e., the t run) are nearly indistinguishable from those for nudging based on tendency errors (i.e., the p run). Consistent with this, the absolute correlations between the instantaneous tendency error $\epsilon_N$ and the local state are generally less than 0.2, implying a relatively weak linear relation between forecast error and local state. Similar results were found at other model levels. Since linear, flow-dependent correction schemes involve more parameters without clear benefits in forecast skill, we conclude that they are not worthwhile, at least for this model, and we consider them no further.

Fourth, note that the errors in the wind variables for the relaxation method (i.e., the u run) are less than those for nudging based on tendency errors (i.e., the p run). In contrast, relaxation tends to degrade the errors in land variables, especially in the first week. This early degradation is probably due to the fact that the land variables evolve on slower time scales, implying that the 5-day relaxation time is too short for land variables. We suspect that a longer relaxation time, say 10–15 days, could improve the errors of the land variables, but could degrade errors in the wind variables. Presumably, applying different relaxation rates to different variables could improve the overall forecast, but this entails further experimentation and tuning.

The decomposition of the error into total, bias, and random components is shown in Fig. 3. Nudging based on tendency errors consistently reduces the bias more than do relaxation methods (the tiny exceptions to this rule in the wind variables are not considered to be statistically significant). In contrast, relaxation consistently reduces the random error more than nudging based on tendency errors. The reason for this difference will be discussed shortly.

The globally averaged variances $[\sigma_f^2]$ and $[\sigma_a^2]$ as a function of lead time are shown in Fig. 4. We see that the relaxation scheme (i.e., the u run) significantly underestimates the observed variances of most variables, presumably because the associated correction term (5) contains linear dissipation. In contrast, nudging based on tendency errors tends to preserve the variance or slightly underestimate it. The fact that relaxation fails to preserve variance diminishes its usefulness as an empirical correction strategy. Furthermore, if we account for this damping by normalizing the random error variance by the sum of the individual variances, as in (12), then we find that the normalized random error variance is always larger for the relaxation scheme than for nudging based on tendency errors (see Fig. 5). Thus, the reduction in random error variance due to relaxation is an artifact of the decrease in variance. In contrast, the normalized random error variance for nudging based on tendency errors is nearly the same as for the uncorrected model.

In summary, nudging based on long-term biases is not a suitable correction strategy because it produces clearly inferior mean square errors in temperature (see Fig. 2) and leads to numerical instability. Also, relaxation is not a suitable correction scheme because it excessively damps forecast variance (see Fig. 4). These conclusions hold consistently for different levels and all local regions we have examined. Having eliminated two out of the three correction strategies, we conclude that the best correction strategy is nudging based on tendency errors. This scheme performs nearly the best in each case for the variables SW, ST, and T0, and it preserves the observed variance of all variables, unlike relaxation methods. In addition, nudging based on tendency errors does not involve adjustable tuning parameters (the parameter $N^*$ is used for estimating the errors; it is not a parameter in the correction scheme). Accordingly, in the remainder of this paper, we focus primarily on nudging based on tendency errors (i.e., the p run).

### b. Nudging based on tendency errors

The global mean square errors of models with monthly updated nudging terms were virtually indistinguishable from those based on daily updated nudging terms (p versus m runs). This result, plus the marginal improvement due to linear, flow-dependent corrections, implies that the tendency error is dominated by a nearly constant term, at least in the variables ST, SW, and T0. Interestingly, empirically correcting all variables except the wind variables U0 and V0 gave virtually identical mean square errors to runs in which all variables were nudged, suggesting that empirical correction of winds has relatively little benefit.

The spatial structure of the bias in the control and empirically corrected models is shown in Fig. 6. The top two panels in Fig. 6 show that empirical correction removes the long-term temperature bias at upper levels of the atmosphere. The analogous plot for zonal wind and meridional wind (not shown) reveals no overall bias reduction, consistent with Fig. 2, although the spatial structure of the bias differs. The remaining panels in Fig. 6 reveal that the control model has a significant warm–dry bias on all continents, which is substantially reduced in the empirically corrected model. On the other hand, the cold–wet biases in the control model are aggravated in the empirically corrected model (e.g., Alaska, northern Canada, and Russia). This asymmetry appears to arise because the precipitation in the corrected model generally increases over land regions that have a significant moisture bias. Specifically, in the first month of integration, the correction does indeed reduce the cold–wet biases. However, the precipitation field (not shown) also increases over regions with a wet bias and thus reinforces the original bias. At the end of the 3-month integration, the wet bias reemerges owing to the enhanced precipitation. The precipitation also is enhanced over the dominant dry regions, but this opposes the original bias. We do not understand why the model precipitation responds in this way.

The August mean correction coefficients derived from the 1982–91 assimilation runs for atmospheric temperature, soil temperature, and soil wetness are illustrated in Fig. 7 (other months give nearly the same structures). The atmospheric temperature correction terms act to warm the upper troposphere and cool the lower troposphere, consistent with counteracting the strong cooling bias in the upper troposphere seen in Fig. 6. The soil temperature and soil wetness correction coefficients are predominantly mirror images of the associated biases seen in Fig. 6. The zonal momentum correction terms (not shown) are dominated by small-scale structures everywhere except in the south polar regions. However, the error of the zonal momentum is actually worse in many regions in the corrected run than in the control run. Similarly, the meridional correction terms (not shown) are localized to the tropics, but the resulting empirically corrected run is not better in an RMSE sense than the control run. Interestingly, the average 1-day forecast errors, shown in the right panels in Fig. 7, have nearly the same structure as the tendency errors estimated by fitting the 6-, 12-, 18-, and 24-h forecast errors to a line (the left panels in Fig. 7), suggesting that the structure of the tendency errors is roughly the same on time scales up to 1 day and, thus, can be estimated with simple averaging methods.

The spatial structure of the correction terms for atmospheric temperature (top panels in Fig. 7) differs from that of the long-term biases (top left panel in Fig. 6). Thus, the forcing for removing biases in atmospheric temperature is not a mirror image of the long-term biases, owing to the difference between short- and long-term biases.

### c. Skill of seasonal mean forecasts

In this section we consider the skill of forecasts of seasonal means. First, we consider the skill of absolute seasonal means; no climatological mean state is subtracted from either the forecast or analyses. The mean square error normalized by the sum of the forecast variance and analysis variance (NMSE) is shown in Fig. 8. We see that the skill of the model with nudging based on tendency errors (i.e., p runs) is superior to that of both the original forecast (i.e., a runs) and models with relaxation (i.e., u runs). However, the NMSE of all forecasts exceeds one, indicating no skill.

Now consider the forecast skill of anomalies about the 1992–2001 mean, shown in Fig. 9 and measured by NRSE in (12). First note that the NRSE values are much smaller than those without subtraction of the climatology. This result implies that bias correction due to empirical correction is not as effective as after-the-fact correction. We also applied a cross-validated after-the-fact correction, in which the climatology being subtracted is estimated from all years except the forecast year, but this analysis simply raised the error amplitudes without changing the order of the errors. Second, the performance levels of empirical correction strategies are mixed: Nudging based on tendency errors (i.e., p runs) improves NRSE almost all the time in soil temperature but rarely in atmospheric temperature or soil wetness, whereas relaxation improves NRSE almost all the time in soil wetness and rarely in atmospheric temperature. Mixed results also are obtained if pattern correlation is used to measure skill.

Maps of the correlation between the seasonal mean forecast and verification at each grid cell (not shown) are very similar for corrected and uncorrected forecasts, implying that the empirical correction has only a minor impact on the local correlation skill of seasonal mean forecasts.

The above results show that the investigated correction methods do not lead to consistent improvements in seasonal anomaly forecasts relative to simple postprocessing methods, at least for this model and with our correction strategy. Furthermore, empirical correction does not avoid the need for postprocessing methods, since further improvements in skill can be made by subtracting the climatological mean bias from the empirically corrected forecast.

### d. Sensitivity to sample size and initial condition

To test the sensitivity of the results to sample size, we estimated the tendency error in each month from 10 one-day forecasts, starting on the 15th of each month. The initial conditions were obtained from the assimilation (i.e., j runs). The resulting NMSE values (i.e., w runs), shown in Fig. 10, reveal that nudging estimated from 1 calendar day of data performs almost as well as nudging based on 30 calendar days of data (per month). However, this conclusion is somewhat misleading because the relevant initial conditions were obtained from prior nudged forecasts, which require integrations longer than 1 day. To check the sensitivity of the results to the type of initial conditions, we repeated the above experiments, except this time using initial conditions drawn directly from the reanalysis, with no initialization or nudging. These experiments constitute the z runs in Table 1 and the corresponding RMSEs are shown in Fig. 10. We see that nudging based on tendencies as computed from initial conditions drawn directly from the reanalysis does not perform as well as that based on tendencies computed from nudged initial conditions. Inspection of the correction coefficients (not shown) reveals that the correction terms have similar structure to those based on the nudged initial conditions (i.e., similar to those shown in Fig. 7), but with an amplitude about 5 times smaller. The smaller amplitudes plausibly explain why the correction is not as effective as the corrections with larger amplitudes. It is ironic that less accurate initial conditions lead to better correction coefficients.

A plausible explanation for the above result is the following. The nudged initial conditions were constructed by nudging the model toward the 6-hourly reanalysis at each step. However, the correction terms do not fully compensate for the bias (there is a drift away from the observed climatology), so the model state drifts away from the reanalysis states, thereby increasing the apparent tendency errors. Therefore, there is some merit in inflating the correction terms estimated from tendency errors, depending on the type of initial condition, in order to compensate for both the biases and the transient adjustment. We verified that the correction coefficients estimated from the assimilations produced with $N^* = 5$ days (i.e., the e runs) were nearly the same as those with $N^* = 2$ days (i.e., the j runs), even though the drift in the 5-day case is much larger than the drift in the 2-day case. We also performed an experiment in which we inflated by 70% all correction coefficients derived from the 2-day time-scale filter (these are the y runs in Table 1), and found that the resulting model performed worse than those without inflation. Thus, the RMSE is degraded when the coefficients for the p runs are increased (y runs) or decreased (z runs), suggesting that our estimation strategy is optimal.

## 6. Summary and discussion

This paper investigated empirical correction strategies based on adding forcing terms to an atmospheric GCM, and tested the hypothesis that a bias correction can improve forecast skill. We found that the best empirical correction scheme was nudging based on tendency errors. This method involves estimating the tendency errors of prognostic variables based on short forecasts—say lead times less than 24 h—and then subtracting these tendency errors at every time step. We focused primarily on state-independent corrections in which the correction term equals (minus) the climatological mean tendency error. This method significantly reduced biases in long-term forecasts of temperature and soil moisture, and preserved the variance of the forecast field, unlike relaxation methods. Nudging based on long-term biases degraded the skill on 2-week time scales and produced numerical instabilities. Linear, local, flow-dependent correction terms yielded no detectable improvement in a mean square error sense compared to nudging based on tendency errors. Nudging based on tendency errors was just as effective if terms were updated monthly rather than daily, and even if corrections for the momentum equations were omitted. These results indicate that most bias arises from errors in the model thermodynamics.

The forecast skill of empirically corrected models was investigated in detail. In the first 5 days, little or no improvement in random error variance was detected. Beyond 2 weeks, the random error saturates and the improvement in the mean square error arises primarily from bias correction. If the climatological means of the forecasts are not subtracted, then nudging based on tendency errors clearly improves the mean square errors of the monthly and seasonal means. If the climatological means of the forecasts are subtracted, so that only anomalies are compared, then empirical correction methods do not consistently improve the forecast skill. Moreover, improvement due to subtracting the mean bias is greater than that due to empirical correction alone. Results for global averages and seasonal means were shown, but examination of local geographic averages and monthly means yielded the same conclusions. Also, the corrected model did not always improve the skill on a point-by-point basis. These results lead us to conclude that, for our particular model and correction strategies, the primary benefit of empirical correction methods is to reduce biases; they do not consistently improve forecast skill of long-term averages.

Remarkably, the mean tendency error computed from ten 1-day forecasts yielded nearly the same reduction in mean square error as tendency errors estimated from 10 yr of daily data in each month. However, the tendency errors estimated from initial conditions drawn directly from the reanalysis were smaller by about a factor of 5 than those estimated from the nudged initial conditions, though the two had similar structures. This amplitude discrepancy is consistent with the fact that the nudged initial conditions drift away from the reanalysis, thereby increasing the magnitude of the apparent tendency errors. It is ironic that less accurate initial conditions lead to better correction coefficients. Further experiments in which the correction coefficients were either amplified or damped produced worse forecasts, suggesting that the amplitude given by our estimation strategy is close to optimal.

Our main conclusion—that nudging based on tendency errors fails to improve random error variance—agrees with some studies but contradicts others. Unfortunately, different studies use different models, data, and methodologies, so comparison is not straightforward.

Nevertheless, we suggest that the contradictory results can be explained by assuming that a bias correction can improve forecast skill only if the bias is sufficiently large. That is, a large bias implies strong climate drift and relatively useless forecasts, whereas a small bias probably does not degrade forecast skill. For example, Saha (1992) and DelSole and Hou (1999) found that nudging based on tendency errors did not substantially improve skill, consistent with the fact that the bias in their models was always less than 10% for time scales less than 5 days. In contrast, Yang and Anderson (2000) found that state-independent nudging improved skill, but their original model could not beat the skill of a persistence forecast for the Niño-3 region in the first 3 months. Similarly, Danforth et al. (2007) found that state-independent correction improved skill, but the bias in their models presumably was large since they used idealized models (such as a quasigeostrophic model) to forecast the NCEP–NCAR Reanalysis. Although Johansson and Saha (1989) found that a state-dependent correction improved forecast skill, their bias fluctuated from being dominant on short time scales to constituting about 30% of the total error after 20 days.

In addition to the magnitude of the bias, there are several other model-dependent aspects of this study that could affect the results. First, the COLA V3.2 model is not an operational weather forecast model and thus may have larger errors than operational models. Also, the analyses used to verify the forecasts were generated by a data assimilation system that used a different dynamical model. Inconsistencies between the initial conditions and the dynamical model could lead to spurious gravity waves that dominate the initial tendency errors. Finally, tendency errors were estimated at individual grid points, whereas less noisy estimates might be obtained using large-scale spherical harmonics, which would filter small-scale noise. These issues will be addressed in a forthcoming paper (Yang et al. 2008) in which empirical correction is applied to the National Oceanic and Atmospheric Administration’s (NOAA) Global Forecast System (GFS) using observations that have been assimilated with the same model.

Nudging based on tendency errors does not directly attribute systematic errors to specific physical processes. Nevertheless, Klinker and Sardeshmukh (1992) used similarities between tendency errors and the tendencies from physical parameterizations to infer that a major source of error in their model arose from the specification of orographically induced gravity wave drag. Similarly, our method might be useful in model development since it represents the error in the form of a tendency that can be compared directly to other terms in the tendency equation.

## Acknowledgments

We thank Zhichang Guo for providing the updated GOLD dataset used in this work and Larry Marx for extensive assistance on coding. We also thank two anonymous reviewers for their comments that led to substantial improvements in the presentation. This research was supported by the National Science Foundation (ATM0332910, EAR-0233320), National Aeronautics and Space Administration (NNG04GG46G), and the National Oceanic and Atmospheric Administration (NA04OAR4310034).

## REFERENCES

Achatz, U., and G. Branstator, 1999: A two-layer model with empirical linear corrections and reduced order for studies of internal climate variability. *J. Atmos. Sci.*, **56**, 3140–3160.

Danforth, C. M., E. Kalnay, and T. Miyoshi, 2007: Estimating and correcting global weather model error. *Mon. Wea. Rev.*, **135**, 281–299.

DelSole, T., and A. Y. Hou, 1999: Empirical correction of a dynamical model. Part I: Fundamental issues. *Mon. Wea. Rev.*, **127**, 2533–2545.

Dirmeyer, P. A., and L. Tan, 2001: A multi-decadal global land-surface data set of state variables and fluxes. COLA Tech. Rep. 102, 43 pp. [Available from the Center for Ocean–Land–Atmosphere Studies, 4041 Powder Mill Rd., Suite 302, Calverton, MD 20705.]

Faller, A. J., and D. K. Lee, 1975: Statistical corrections to numerical prediction equations. *Mon. Wea. Rev.*, **103**, 845–855.

Faller, A. J., and C. E. Schemm, 1977: Statistical corrections to numerical prediction equations. II. *Mon. Wea. Rev.*, **105**, 37–56.

Johansson, A., and S. Saha, 1989: Simulation of systematic error effects and their reduction in a simple model of the atmosphere. *Mon. Wea. Rev.*, **117**, 1658–1675.

Kalnay, E., and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. *Bull. Amer. Meteor. Soc.*, **77**, 437–471.

Kirtman, B. P., J. Shukla, B. Huang, Z. Zhu, and E. K. Schneider, 1997: Multiseasonal predictions with a coupled tropical ocean–global atmosphere system. *Mon. Wea. Rev.*, **125**, 789–808.

Klinker, E., and P. D. Sardeshmukh, 1992: The diagnosis of mechanical dissipation in the atmosphere from large-scale balance requirements. *J. Atmos. Sci.*, **49**, 608–626.

Leith, C. E., 1978: Objective methods for weather prediction. *Annu. Rev. Fluid Mech.*, **10**, 107–128.

Manabe, S., R. J. Stouffer, M. J. Spelman, and K. Bryan, 1991: Transient responses of a coupled ocean–atmosphere model to gradual changes of atmospheric CO2. Part I: Annual mean response. *J. Climate*, **4**, 785–818.

Misra, V., and Coauthors, 2007: Validating and understanding the ENSO simulation in two coupled climate models. *Tellus*, **59A**, 292–308, doi:10.1111/j.1600-0870.2007.00231.x.

Phillips, T. J., and P. J. Glecker, 2006: Evaluation of continental precipitation in 20th century climate simulations: The utility of multimodel statistics. *Water Resour. Res.*, **42**, W03202, doi:10.1029/2005WR004313.

Saha, S., 1992: Response of the NMC MRF model to systematic-error correction within integration. *Mon. Wea. Rev.*, **120**, 345–360.

Schubert, S. D., R. B. Rood, and J. Pfaendtner, 1993: An assimilated dataset for earth science applications. *Bull. Amer. Meteor. Soc.*, **74**, 2331–2342.

Yang, X., T. DelSole, and H-L. Pan, 2008: Empirical correction of the NCEP Global Forecast System. *Mon. Wea. Rev.*, in press.

Yang, X-Q., and J. L. Anderson, 2000: Correction of systematic errors in coupled GCM forecasts. *J. Climate*, **13**, 2072–2085.

RMSEs of soil wetness (sw), soil temperature (st, K), global atmospheric temperature (t0, K), and global atmospheric meridional velocity (v0, m s^{−1}), as a function of time. The five curves within each panel indicate the following: black and plain, control run with no correction (a run); red and plus sign (+), nudging based on 2-day filtered tendency errors (p run); blue and circle, relaxation to climatology (u run); green and square, linear, flow-dependent correction (t run); and gray and triangle, nudging based on long-term error (v run). The model level of each variable is indicated in each panel.

Citation: Monthly Weather Review 136, 11; 10.1175/2008MWR2344.1

RMSE averaged over the period 1992–2001 (solid), and decomposed into bias (dotted) and random (dashed) components. The black curves identify the uncorrected model (a runs), the red curves identify nudging based on tendency errors (p runs), and the blue curves identify the relaxation runs (u runs). The model level of each variable is indicated in each panel.

Square root of area-weighted spatial variance computed over the period 1992–2001 for the same fields used in Fig. 2. The four curves in each panel indicate fields from the following: black and plus sign, control run with no correction (a run); red and circle, nudging based on 2-day filtered tendency errors (p run); blue and square, relaxation to climatology (u run); and green and plain, the reanalysis.

NMSEs computed over the period 1992–2001 for the same variables as used in Fig. 2. The three curves in each panel indicate the following: black and plain, the control run with no correction (a run); red and plus sign, nudging based on 2-day filtered tendency errors (p run); and blue and circle, relaxation to climatology (u run).

August mean bias for the uncorrected COLA model (a run − analysis) and the empirically corrected COLA model using nudging based on tendency errors (p run − analysis), for (top) zonal mean temperature (marked t0), (middle) soil wetness at the second land level (marked sw), and (bottom) soil temperature at the second land level (marked st).

(left) The August mean empirical correction coefficients from the p runs of Table 1 and (right) the August mean 24-h forecast error for (top) atmospheric temperature (zonal average, K day^{−1}), (middle) soil wetness (vertical average, day^{−1}), and (bottom) soil temperature (vertical average, K day^{−1}).

NMSEs of the JJA mean forecasts by various models for a subset of variables in Fig. 2. NMSE is the total mean square divided by the sum of the forecast variance and analysis variance. The curves in each panel indicate the following: plus symbol, the control (a run); closed circle, nudging based on tendency errors (p run); open square, relaxation (u run); and dashed, constant = 1.

Same as in Fig. 8 but for the NMSE of the JJA *anomalies.* The curves in each panel indicate the following: plus symbol, the control (a run); closed circle, nudging based on tendency errors (p run); open square, relaxation (u run); and dashed, constant = 1.

RMSEs of soil wetness (sw), soil temperature (st, K), atmospheric temperature (t0, K), atmospheric meridional velocity (v0, m s^{−1}), and atmospheric zonal velocity (u0, m s^{−1}) as a function of time. The four curves within each panel indicate the following: black and plain, the control run with no correction (a run); red and plus sign, nudging based on 10-yr mean tendency errors (p run); blue and circle, nudging based on ten 1-day mean tendency errors, starting from assimilated ICs (w run); green and square, nudging based on ten 1-day mean tendency errors, starting from nonassimilated ICs (z run).

Empirical correction experiments. Symbols Y and N indicate whether the variable in question was assimilated (yes) or not (no). The other entries are explained in section 3.