## 1. Introduction

Over the last decade many important applications of the backward integration of the adjoint of the linear tangent model have been introduced in the literature. These include the generation of singular vectors for ensemble prediction (e.g., Molteni et al. 1996), four-dimensional variational data assimilation (e.g., Lewis and Derber 1985; Le Dimet and Talagrand 1986; Courtier et al. 1994), forecast sensitivity to the initial conditions (Rabier et al. 1996; Pu et al. 1997a), and targeted observations (e.g., Rohaly et al. 1998; Pu et al. 1998).

Among advanced methods for data assimilation, four-dimensional variational data assimilation (4D-Var) is the approach that has received the most attention in recent years (e.g., Derber 1989; Courtier et al. 1994; M. Zupanski 1993). A simplified version has been implemented recently at the European Centre for Medium-Range Weather Forecasts (ECMWF; Bouttier and Rabier 1997; Rabier et al. 1997), and at the time of this writing, the National Centers for Environmental Prediction (NCEP) is testing a 4D-Var system for the Eta Model.

In 4D-Var a cost function is defined as the squared distance between a model integration and the observations over a given assimilation interval. Lorenc (1986, 1988) showed that for a linear perfect model, if (a) a background error term is added to the cost function at the beginning of the assimilation period, and (b) the background error covariance at the initial time is the same as that used in the Kalman filter (KF), then the 4D-Var analysis at the end of the interval is the same as that obtained using the KF. This makes 4D-Var attractive, because it is much less expensive than the KF (see also Daley 1991; Thepaut et al. 1993).

4D-Var provides initial conditions for a model integration that is close to the observations, but it also has some disadvantages.

It is difficult to include forecast error covariances in the cost function except at the beginning of the interval, which forces the use of short assimilation intervals in order to keep the impact of model errors small. It is obvious from chaos theory (Lorenz 1963) that even with a perfect model, one would not want to perform 4D-Var over, for example, a 2-week data assimilation interval, since the 4D-Var analysis would be given by the state of the model after a 2-week integration, when predictability has been lost (Pires et al. 1996). Even if the assimilation interval is reduced to a shorter period, such as 6–24 h, the neglect of model errors during the forecast can lead to unrealistic results (Menard and Daley 1996). There have been attempts to include simple evolving model errors (e.g., Derber 1989; D. Zupanski 1997), but much remains to be done in this area.

4D-Var has a large computational cost compared to 3D-Var (typically 10–100 or more iterations are required for convergence, equivalent to about 30–300 model integrations per day). ECMWF, for example, has a powerful supercomputer about 25 times faster than a Cray C90, and has been running a model at a horizontal resolution of T213. Nevertheless, ECMWF had to make several simplifying assumptions in their implementation of 4D-Var (such as using a lower horizontal resolution model of T63 and a short assimilation window) in order to reduce the computational cost (Bouttier and Rabier 1997; Rabier et al. 1997).

Recently Wang et al. (1997) suggested the use of backward model integrations in order to accelerate the convergence of 4D-Var (without including a background error term in the cost function). Pu et al. (1997a) showed that for the problem of forecast sensitivity, closely related to 4D-Var, a backward integration with the “quasi-inverse” of the tangent linear model (TLM) gave results far superior to those obtained using the adjoint model. The quasi-inverse model is simply the model integrated backward in time, with the sign of the dissipative terms reversed in order to avoid computational blowup. It can be applied to either the tangent linear or the full nonlinear model, each of which has advantages for different applications. The quasi-inverse linear (QIL) method has been tested successfully at NCEP in several different applications, for example, forecast error sensitivity analysis and data assimilation (Pu et al. 1997a), and adaptive observations (Pu et al. 1998).

Wang et al. (1997) adopted the quasi-inverse approach for their adjoint Newton algorithm (ANA) and applied it to a simplified 4D-Var problem with simulated data, using the Advanced Regional Prediction System (ARPS; Xue et al. 1995), with impressive results. They assumed the availability of a complete set of observations at the end of the assimilation interval, and showed that the ANA converged in an order of magnitude fewer iterations, and to an error level an order of magnitude smaller, than the conventional adjoint approach applied to the same (simplified) 4D-Var problem.

Kalnay and Pu (1998) generalized the Wang et al. (1997) approach by including a background term in the cost function and further simplified the method. The background error term in the cost function allows using incomplete sets of observations, but in order to maintain the efficiency of the method, it is necessary to estimate the background error term at the end of the assimilation interval, rather than at the beginning as in the Lorenc (1986) formulation. We have further generalized the method to allow for the assimilation of data at different times, rather than only at the end of the interval, as in Wang et al. (1997). The results suggest that the quasi-inverse model may be used in data assimilation for accelerating convergence and reducing spinup problems, although problems may arise when the method is tested on comprehensive atmospheric models.

In this paper we first introduce the quasi-inverse approach for the forecast sensitivity problem, and then formulate a closely related variational assimilation problem using the quasi-inverse model (section 2). We show that if the cost function has no background term, and a complete set of observations is available (as was assumed in many classical 4D-Var papers), the new method solves the 4D-Var minimization problem efficiently, and is in fact equivalent to the Newton algorithm but without having to compute a Hessian. If the background term is included but computed at the end of the interval, the minimization can still be carried out very efficiently, but in this case the method is closer to a 3D-Var formulation in which the analysis is attained through a model integration. For this reason, we call the method “inverse 3D-Var” (I3D-Var).

In section 3 we introduce a simple “model” (viscous Burgers’ equation), which includes effects mimicking the three main components of atmospheric models: advection, large-scale instabilities, and dissipative processes. Using this simple model we show the effects of applying a linear tangent model, its adjoint, the exact inverse, and the quasi-inverse model, and compare one iteration of inverse 3D-Var and adjoint 4D-Var. In section 4 we present preliminary results comparing inverse 3D-Var and the adjoint 4D-Var using Burgers’ model and Lorenz’s model (Lorenz 1963). We show that when the background term is ignored and complete fields of noisy observations are available at multiple times, the inverse 3D-Var method still minimizes the same cost function as 4D-Var (but much more efficiently). Section 5 discusses several topics related to possible applications of inverse 3D-Var: assimilation of data at multiple time levels, research on a storm-scale model with reversible clouds for storm-forecast initialization, and the problem of random observational errors, which may amplify during the backward integration (Reynolds and Palmer 1998).

## 2. Formulation of inverse 3D-Var

### a. Forecast sensitivity

The forecast sensitivity problem consists of finding the initial perturbation *δ***x**_{0} that “optimally” corrects a perceived forecast error at the final time *t*. In what follows, *M* is a nonlinear forecast, that is, **x**_{t} = *M*(**x**_{0}); *A* is the analysis; *E* = *M* − *A* is the perceived error; *L* is the linear propagator (linear tangent model integrated forward in time); and *L** is its adjoint with respect to the metric used in the definition of the inner product: 〈**x**, *L***y**〉 = 〈*L***x**, **y**〉 for any pair of vectors **x**, **y**. Then *δ***x**_{0} is the solution of

*Lδ***x**_{0} = −*E*.

#### 1) Adjoint approach (Rabier et al. 1996; Pu et al. 1997b)

In the adjoint approach, the inner product of a state vector **x** is given by 〈**x**, **x**〉 = **x**^{T}*W*^{2}**x**, the total energy of the state vector **x**. With this inner product, 〈**x**, *L***y**〉 = 〈*L***x**, **y**〉 defines the adjoint of *L* with respect to the total energy norm, *L** = *W*^{−2}*L*^{T}*W*^{2}, where *L*^{T} is the adjoint of *L* with respect to the Euclidean norm (transpose of *L*). The error cost function is then

*J* = ½〈*E*, *E*〉 = ½〈*M*(**x**_{0}) − *A*, *M*(**x**_{0}) − *A*〉.

A perturbation *δ***x**_{0} in the initial conditions will lead to a change in the cost function

*δJ* = 〈*M*(**x**_{0}) − *A*, *δ***x**_{t}〉 = 〈*M*(**x**_{0}) − *A*, *Lδ***x**_{0}〉 = 〈*L**[*M*(**x**_{0}) − *A*], *δ***x**_{0}〉 = 〈*L***E*, *δ***x**_{0}〉.

Since *δJ* = 〈**∇***J*(**x**_{0}), *δ***x**_{0}〉, the gradient of the cost function with respect to the initial conditions is given by

**∇***J*(**x**_{0}) = *L***E*.

The gradient of *J* has the same units as the state vector **x**, but it depends on the choice of norm. The adjoint procedure requires in addition an estimation of an appropriate amplitude *α*, after which the adjoint sensitivity correction becomes

*δ***x**_{0} = −*α***∇***J*(**x**_{0}).
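As a minimal numerical sketch (in Python, with arbitrary illustrative values that are not part of the original formulation), the defining identity of the energy-norm adjoint, *L** = *W*^{−2}*L*^{T}*W*^{2}, can be checked directly:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
L = rng.standard_normal((n, n))   # arbitrary linear propagator (toy example)
w = rng.uniform(0.5, 2.0, n)      # weights defining the energy metric
W2 = np.diag(w**2)                # W^2

def inner(a, b):
    """Total-energy inner product <a, b> = a^T W^2 b."""
    return a @ W2 @ b

# adjoint with respect to the energy norm: L* = W^{-2} L^T W^2
L_star = np.linalg.inv(W2) @ L.T @ W2

x = rng.standard_normal(n)
y = rng.standard_normal(n)

# the defining identity <x, L y> = <L* x, y> holds to rounding error
assert np.isclose(inner(x, L @ y), inner(L_star @ x, y))
```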

#### 2) Quasi-inverse approach

In the quasi-inverse approach, instead of descending along the gradient, we solve the sensitivity equation directly by integrating the TLM backward:

*δ***x**_{0} = −*L*^{−1}*E*.

The QIL approximation to *L*^{−1} consists of simply running the TLM backward (changing the sign of Δ*t,* and also changing the sign of the dissipative terms to avoid computational blowup). Pu et al. (1997a) found that this is a rather accurate approximation to the dry-dynamics inverse model. It solves a deterministic problem, so that there is no need to find an optimal amplitude, as required by the adjoint method. Reynolds and Palmer (1998) compared this method with running the exact inverse (using a Runge–Kutta time scheme and no change in sign for dissipation). They found that the presence of dissipation during the backward integration had a beneficial effect of a small reduction in noisiness.

Note that the inverse solution is not “optimal” like the adjoint solution, since there is no constraint on the size of *δ***x**. However, the inverse approach can be considered to be “perfect”: it reaches in a single step the same solution (**∇***J* ≅ 0) that the adjoint approach would reach after many iterations. It should be pointed out that Lorenc (1988) integrated the NCEP nested grid model backward with a change of sign in the physics, but in his experiments he was attempting to approximate the adjoint model, not the inverse model.
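The contrast between the two sensitivity corrections can be sketched numerically. The toy propagator below is diagonal, with illustrative growth and damping factors standing in for the dynamics and the dissipative terms whose sign the quasi-inverse flips; it is an idealization, not the NCEP configuration:

```python
import numpy as np

amp  = np.array([2.0, 1.0, 0.5])     # dynamical amplification over the interval
damp = np.array([0.99, 0.98, 0.90])  # mild damping over the interval
L    = np.diag(amp * damp)           # forward tangent linear propagator
L_qi = np.diag(damp / amp)           # quasi-inverse: dynamics inverted, damping kept

E = np.array([1.0, 0.5, 0.25])       # perceived forecast error at the final time

# quasi-inverse sensitivity: one backward integration, no amplitude tuning
dx_qi = -L_qi @ E
res_qi = E + L @ dx_qi               # residual final error after the correction

# adjoint sensitivity: dx = -alpha * L^T E, with alpha from an exact line search
g = L @ (L.T @ E)
alpha = (E @ g) / (g @ g)
dx_adj = -alpha * (L.T @ E)
res_adj = E + L @ dx_adj

assert np.linalg.norm(res_qi) < 0.1 * np.linalg.norm(E)   # near-perfect correction
assert np.linalg.norm(res_qi) < np.linalg.norm(res_adj)   # smaller than one adjoint step
```

With mild damping, the quasi-inverse correction removes almost all of the final error in a single backward step, whereas a single optimally scaled adjoint step mainly reduces the fastest-growing mode.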

### b. Inverse 3D-Var

Wang et al. (1997) sought to solve a 4D-Var problem using what they denote the “estimated Newton descent direction,” rather than the descent direction provided by the standard adjoint approach. For this purpose they needed to approximate the inverse TLM, which they did by adopting the QIL method. In the experiments they performed with simulated data and the adiabatic version of the ARPS model, they obtained convergence in an order of magnitude fewer iterations with the new method (ANA), and a decrease of the cost function to a level an order of magnitude smaller than with the adjoint approach. They assimilated simulated data only at the end of the interval, and only the full model field, so that they did not need a background error term in the cost function.

In this subsection we generalize their approach by including both data and background in the cost function, with appropriate error covariances. To maintain the ability to solve the minimization efficiently, however, the background term is estimated at the end of the interval, rather than at the beginning as in Lorenc (1986). Our derivation is also considerably simpler than the ANA method, and our method does not use line minimization, as in Wang et al. (1997). As a result, our method is about twice as fast as the ANA approach.

Assume that a vector of observations **y**^{o} is available at the end of the assimilation interval *t*, with *δ***x** = **x**^{a} − **x**^{b} and *δ***y** = **y**^{o} − **H**(**x**^{b}). Here **x**^{a} and **x**^{b} are the analysis and first guess, respectively, and **H** is the (generally nonlinear) forward observation operator. The cost function is defined as the squared distance to the background at the end of the interval (weighted by the inverse of the background error covariance *B*), plus the distance to the observations (weighted by the inverse of the observational error covariance *R*), also at the end of the interval:

*J* = ½(*Lδ***x**)^{T}*B*^{−1}(*Lδ***x**) + ½(*HLδ***x** − *δ***y**)^{T}*R*^{−1}(*HLδ***x** − *δ***y**),

where *δ***x** (the control variable) is the difference between the analysis and the background (at the present iteration) at the beginning of the assimilation window, *L* and *L*^{T} are, as before, the TLM and its adjoint, and *H* is the tangent linear version of the forward observation operator **H**. Taking the gradient of *J* with respect to the initial change *δ***x**, we obtain

**∇***J* = *L*^{T}[*B*^{−1}*Lδ***x** + *H*^{T}*R*^{−1}(*HLδ***x** − *δ***y**)].  (8)

In the standard approach, the cost function is minimized iteratively:

**x**_{k+1} = **x**_{k} + *a*_{k}**p**_{k},

where at each iteration *k*, the vector **p**_{k} represents a search direction, and the positive scalar *a*_{k} is the step length. All minimization algorithms require the computation of the search direction, which is a function of **∇***J*. For example, **p**_{k} = −**∇***J* for the steepest descent algorithm; **Q****p**_{k} = −**∇***J* for the Newton method, where **Q** is the Hessian of the cost function; and **S****p**_{k} = −**∇***J* for the quasi-Newton method, where **S** is an approximation of the Hessian. The iterations are continued until **∇***J* becomes very small and the minimum of *J* is reached. For some algorithms (e.g., LBFGS), each iteration requires a few corrections (or function calls) to compute the approximate Hessian, so that the number of direct and adjoint integrations required by 4D-Var can be significantly larger than the number of iterations.

The inverse 3D-Var approach, by contrast, directly finds the *δ***x** that makes **∇***J* = 0 for small *δ***x**. From (8) we can eliminate the adjoint operator, and obtain the “perfect” solution given by

*Lδ***x** = (*B*^{−1} + *H*^{T}*R*^{−1}*H*)^{−1}*H*^{T}*R*^{−1}*δ***y**.

With an approximation of *L*^{−1} at hand (the quasi-inverse model obtained by integrating the tangent linear model backward, but changing the sign of frictional terms), we can apply it and obtain

*δ***x** = *L*^{−1}(*B*^{−1} + *H*^{T}*R*^{−1}*H*)^{−1}*H*^{T}*R*^{−1}*δ***y**.  (11)

This can be interpreted as starting from the 3D-Var analysis increment at the end of the interval and integrating it backward with the TLM or an approximation of it. If we do not include the forecast error covariance term *B*^{−1}, (11) reduces to the ANA algorithm of Wang et al. (1997), except that we do not need to run a minimization algorithm, although a few quasi-inverse iterations are needed because of the discrepancy between the full nonlinear model and the linear model. We have tested the inverse 3D-Var with the ARPS model and found that, for this reason, the inverse 3D-Var is computationally about twice as fast as the Wang et al. (1997) ANA scheme.
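The increment (11) and the gradient (8) can be checked with a small matrix sketch (random illustrative operators; the exact inverse of *L* stands in for the quasi-inverse):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 4                                    # state and observation dimensions
L = rng.standard_normal((n, n)) + 2*np.eye(n)  # invertible TLM propagator (toy)
H = rng.standard_normal((p, n))                # tangent linear observation operator
Binv = np.eye(n)                               # B^{-1} (end of window), toy value
Rinv = 4.0 * np.eye(p)                         # R^{-1}, toy value
dy = rng.standard_normal(p)                    # innovation at the end of the window

# inverse 3D-Var: compute the 3D-Var increment at the end of the window,
# then map it back to the initial time with the (here exact) inverse of L
dx_final = np.linalg.solve(Binv + H.T @ Rinv @ H, H.T @ Rinv @ dy)
dx0 = np.linalg.solve(L, dx_final)

# gradient of the cost function at dx0:
# grad J = L^T [B^{-1} L dx + H^T R^{-1} (H L dx - dy)]
grad = L.T @ (Binv @ (L @ dx0) + H.T @ Rinv @ (H @ (L @ dx0) - dy))
assert np.allclose(grad, 0)                    # the increment solves grad J = 0
```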

### c. Equivalence of inverse 3D-Var and the Newton minimization algorithm

Assume that the minimum of the cost function is attained at **x** + *δ***x**, and that our present estimate of the solution is **x**. Then, by Taylor expansion,

**∇***J*(**x** + *δ***x**) = **∇***J*(**x**) + ∇^{2}*J*(**x**)*δ***x** + *O*(*δ***x**^{2}) = 0,  (12)

where ∇^{2}*J*(**x**) is the Hessian matrix, ∇^{2}*J*_{i,j} = ∂^{2}*J*/∂**x**_{i}∂**x**_{j}. The Newton algorithm, which has a quadratic rate of convergence, solves the rhs of (12), **∇***J*(**x**) + ∇^{2}*J*(**x**)*δ***x** = 0. Therefore the Newton iteration is given by

*δ***x** = −[∇^{2}*J*(**x**)]^{−1}**∇***J*(**x**).

With the truncated Newton method, the Hessian term [∇^{2}*J*(**x**)*δ***x** in (12)] is obtained approximately by the difference of gradients, while with the adjoint truncated Newton method (Wang et al. 1995) it is obtained exactly by solving the second-order adjoint. Although both methods can reduce the computing cost of the full-Newton iterations and show more efficient performance than quasi-Newton methods (Wang et al. 1995), they still require many iterations and are computationally expensive.

For the cost function defined above, the Hessian is

∇^{2}*J* = *L*^{T}(*B*^{−1} + *H*^{T}*R*^{−1}*H*)*L*,

and, since the gradient evaluated at the background (*δ***x** = 0) is **∇***J* = −*L*^{T}*H*^{T}*R*^{−1}*δ***y**, the first Newton iteration gives

*δ***x**_{1} = [*L*^{T}(*B*^{−1} + *H*^{T}*R*^{−1}*H*)*L*]^{−1}*L*^{T}*H*^{T}*R*^{−1}*δ***y** = *L*^{−1}(*B*^{−1} + *H*^{T}*R*^{−1}*H*)^{−1}*H*^{T}*R*^{−1}*δ***y**.

The inverse 3D-Var algorithm solves exactly the same problem but takes advantage of the fact that the lhs of (12), **∇***J*(**x** + *δ***x**) = 0, can be solved directly [cf. Eqs. (8) and (11)]. Therefore the inverse 3D-Var iteration (11) is identical to the Newton algorithm iteration (assuming the quasi-inverse approximates the true inverse), but it is not necessary to compute the Hessian or the gradient, only to integrate the linear tangent model backward.

The results of Wang et al. (1997) and Pu et al. (1997b) support considerable optimism for this method. For a quadratic function, the Newton algorithm (and the equivalent inverse 3D-Var) converges in a single iteration. Since the cost functions used in 4D-Var are close to quadratic functions, inverse 3D-Var can be considered equivalent to perfect preconditioning of the simplified 4D-Var problem.
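The equivalence of the first Newton step and the inverse 3D-Var increment can be verified numerically under the same assumptions (illustrative operators; *H* = *I* for simplicity):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
L = rng.standard_normal((n, n)) + 2*np.eye(n)  # invertible TLM (toy)
H = np.eye(n)                                  # observe the full state
Binv = np.eye(n)                               # B = I at the end of the window
Rinv = 4.0 * np.eye(n)                         # R = 0.25 I
dy = rng.standard_normal(n)                    # innovation at the final time

A = Binv + H.T @ Rinv @ H
hess = L.T @ A @ L                             # Hessian of the quadratic cost
grad0 = -L.T @ H.T @ Rinv @ dy                 # gradient at dx = 0
dx_newton = -np.linalg.solve(hess, grad0)      # one full Newton step

# inverse 3D-Var increment, Eq. (11), with the exact inverse of L
dx_i3dvar = np.linalg.solve(L, np.linalg.solve(A, H.T @ Rinv @ dy))
assert np.allclose(dx_newton, dx_i3dvar)       # identical for the quadratic cost
```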

### d. Multiple time levels of data

If there are data at different time levels we can choose to bring the data increments to the same initial time level (as shown schematically in Fig. 1) so that the increments corresponding to the different data can be averaged, with weights that may depend on the time level or the type of data. For applications in which “knowing the future” is allowed, such as reanalysis, the observational increments could be brought to the center of an interval, and used for the final analysis. In section 4 we show that, in a simple nonlinear model with complete data, when increments are brought to the same initial time, we solve a separate minimization for each time level, but that in fact (at least for this model) the I3D-Var minimizes the same multiple-level cost function as the simplified 4D-Var problem.
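A scalar sketch of this averaging (with an illustrative growth factor and noise level, not a configuration from the paper) shows the error reduction obtained by bringing noisy innovations from several later times back to the initial time:

```python
import numpy as np

# Scalar toy model x_{j+1} = lam * x_j: propagate innovations observed at
# times 1..n_times back to t0 with the inverse propagator, then average.
rng = np.random.default_rng(3)
lam, n_times, n_trials, sigma = 1.05, 8, 2000, 0.1

err_single, err_avg = [], []
for _ in range(n_trials):
    dx0_true = rng.standard_normal()               # true initial increment
    innov = np.array([lam**j * dx0_true + sigma * rng.standard_normal()
                      for j in range(1, n_times + 1)])
    back = innov / lam**np.arange(1, n_times + 1)  # increments brought back to t0
    err_single.append(back[0] - dx0_true)          # using one time level only
    err_avg.append(back.mean() - dx0_true)         # averaging all time levels

# averaging over n time levels reduces the random error roughly like n^{-1/2}
assert np.std(err_avg) < np.std(err_single)
```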

## 3. Burgers’ equation example

### a. Simple TLM, adjoint, and inverse model formulation

To illustrate these ideas we consider, as a simple “model,” the viscous Burgers’ equation linearized about a basic flow *u*; that is, the full flow is *u* + *δu*, and the perturbation evolves according to

∂*δu*/∂*t* + *u* ∂*δu*/∂*x* + *δu* *du*/*dx* = *ν* ∂^{2}*δu*/∂*x*^{2},  (16)

where *ν* is a diffusion coefficient, and we assume that the basic flow *u*(*x*) is a slowly varying function of *x*, and neglect its time changes.

The TLM (16) has solutions of the form *δu* = *A*(*t*)*e*^{ik(x−ut)}, with the amplitude governed by d*A*/d*t* = −(*du*/*dx*)*A* − *νk*²*A*. The first term on the right-hand side represents large-scale instabilities (growing when *du*/*dx* < 0), whereas the second term represents small-scale dissipative processes. The imaginary exponent represents the effects of large-scale advection. The solution at time *t* is

*δu*(*t*) = *e*^{−ikut}*e*^{−(du/dx)t}*e*^{−νk²t}*δu*(0),

that is,

final perturbation = (large-scale advection)(large-scale instability)(diffusion)(initial perturbation).

For a single Fourier mode, the linear propagator at time *t* is then

*L* = *e*^{−(du/dx+νk²)t−ikut},

its adjoint (complex conjugate) is

*L** = *e*^{−(du/dx+νk²)t+ikut},

its exact inverse is

*L*^{−1} = *e*^{(du/dx+νk²)t+ikut},

and the quasi-inverse (backward integration with the sign of the diffusion term reversed) is

*L̃*^{−1} = *e*^{(du/dx−νk²)t+ikut}.

Note that *L*^{−1}*L* = *I*, whereas

*L***L* = *e*^{−2(du/dx+νk²)t} and *L̃*^{−1}*L* = *e*^{−2νk²t}.

Thus the quasi-inverse recovers the initial perturbation up to a weak residual diffusive damping, whereas the adjoint applied to *L* amplifies the growing modes and damps the decaying ones.
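These propagator identities can be checked directly for one Fourier mode (parameter values are illustrative):

```python
import numpy as np

# One Fourier mode of the linearized Burgers' equation (toy parameters):
k, u, dudx, nu, t = 3.0, 1.0, -0.1, 0.005, 1.0  # dudx < 0: a growing mode

L     = np.exp(-(dudx + nu*k**2)*t - 1j*k*u*t)  # forward propagator
L_adj = np.exp(-(dudx + nu*k**2)*t + 1j*k*u*t)  # adjoint (complex conjugate)
L_inv = np.exp( (dudx + nu*k**2)*t + 1j*k*u*t)  # exact inverse
L_qi  = np.exp( (dudx - nu*k**2)*t + 1j*k*u*t)  # quasi-inverse (diffusion sign flipped)

assert np.isclose(L_inv * L, 1.0)                            # exact inverse: L^{-1}L = I
assert np.isclose(L_qi * L, np.exp(-2*nu*k**2*t))            # residual damping e^{-2 nu k^2 t}
assert np.isclose(L_adj * L, np.exp(-2*(dudx + nu*k**2)*t))  # L*L for this mode
assert (L_adj * L).real > 1.0                                # adjoint amplifies this growing mode
```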

### b. Application of 4D-Var and inverse 3D-Var to Burgers’ equation

To apply inverse 3D-Var to this model, we take **H** = *I* (observations are made in the model variable space). We assume *B* = *αU*^{2}*I* and *R* = *U*^{2}*I*, where *U*^{2} is the observational error variance, and *αU*^{2} is the background error variance. Here *α* ≪ 1 corresponds to small background errors (a good forecast) and *α* ≫ 1 to a poor forecast (relative to the observations). If we neglect diffusion, the inverse 3D-Var solution (11) for each Fourier mode is

*δu*(0) = [*α*/(1 + *α*)]*e*^{(du/dx)t}*e*^{ikut}*δu*^{obs},  (28)

where *δu*^{obs} is the observational increment at the end of the interval. Note that for decaying modes (*du*/*dx* > 0) the initial increments will be larger than at the final time. This is not of concern, because when integrated forward, they will decay to their proper observed values.

By contrast, the first (steepest descent) iteration of the adjoint 4D-Var gives

*δu*_{1} = −*a***∇***J*_{0} = *a* *e*^{−(du/dx)t}*e*^{ikut}*δu*^{obs},

where *a* is an appropriately chosen amplitude. Like the forecast sensitivity problem, it moves the observational increment backward in time, but enhancing the growing modes during the adjoint integration. It should be noted that when the 4D-Var is iterated, eventually it should converge to the same solution (28).

## 4. Numerical experiments

We have performed preliminary experiments with the NCEP global model (Pu et al. 1997a), and with two simple models, viscous Burgers’ equation and the Lorenz (1963) model.

### a. The NCEP global model

The application to the global NCEP model was a forecast sensitivity approach. The 24-h forecast error was estimated from the 24-h analysis, and the difference between the analysis and the forecast was integrated backward, using the TLM of the NCEP global model, with the sign of surface friction and horizontal diffusion changed. The results were very encouraging, indicating that the correction of the forecast was considerably better than using the adjoint sensitivity approach, even when the latter was iterated five times (Pu et al. 1997a). Several important points should be noted:

- Ideally, the TLM integrated backward should have a reversible formulation of the Hamiltonian (energy conserving) dynamics. In practice, the NCEP model has only approximately reversible dynamics (e.g., the Robert time filter is slightly diffusive). This introduces additional diffusion during the backward integration; the subject is further discussed in the next subsection.

- A known nonlinear solution was integrated backward and again forward over 24 h. The error in reproducing the full nonlinear perturbation with the TLM and the quasi-inverse TLM was about 10% in both total and kinetic energy throughout the model atmosphere, except near the surface, where the effect of changing the sign of friction is most important and where the error reached about 25%.

- The amplitude of the quasi-inverse sensitivity was much larger than that of the adjoint sensitivity. This is because the adjoint sensitivity focuses only on the fastest growing modes [cf. Eq. (24)], whereas the quasi-inverse sensitivity includes both growing and decaying modes (and the latter grow during the backward integration). This may result in unwanted noise growth, and needs to be handled carefully.

### b. Burgers’ equation

We performed some numerical tests with viscous Burgers’ equation (16). It should be noted that the model had been originally programmed using the Lax scheme (e.g., Anderson et al. 1984), which is highly diffusive and far from reversible (S. K. Park, personal communication 1998). Such a scheme would not be appropriate for a method that requires approximating the inverse of the model by running it backward. This was easily solved by rewriting the model with a leapfrog scheme for advection (with a forward first time step) and DuFort-Frankel for diffusion (e.g., Anderson et al. 1984). Therefore the numerical scheme was fully reversible except for the first forward time step. This allowed us to compare the effects of using the exact inverse linear model (IL) and the QIL as long as the Reynolds number was large enough (i.e., low dissipation) for the exact inverse to remain computationally stable.
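A minimal sketch of such a reversible discretization (leapfrog for advection plus DuFort-Frankel for diffusion on a periodic domain; grid sizes and coefficients are illustrative, not those used in the experiments) shows that the backward recursion inverts the forward steps exactly, apart from roundoff:

```python
import numpy as np

nx, nsteps = 64, 20
dx, dt, nu = 1.0/nx, 0.05, 1e-5
r, c = nu*dt/dx**2, dt/dx

def step_forward(um1, u):
    """Leapfrog/DuFort-Frankel step: (u^{n-1}, u^n) -> u^{n+1}."""
    up, un = np.roll(u, -1), np.roll(u, 1)
    return ((1 - 2*r)*um1 - c*u*(up - un) + 2*r*(up + un)) / (1 + 2*r)

def step_backward(u, up1):
    """Exact algebraic inverse of step_forward: (u^n, u^{n+1}) -> u^{n-1}."""
    up, un = np.roll(u, -1), np.roll(u, 1)
    return ((1 + 2*r)*up1 + c*u*(up - un) - 2*r*(up + un)) / (1 - 2*r)

x = dx * np.arange(nx)
u0 = 0.05 * np.sin(2*np.pi*x)
u1 = u0.copy()                     # simple initialization of the two time levels

a, b = u0, u1                      # the pair (u^n, u^{n+1})
for _ in range(nsteps):            # integrate forward
    a, b = b, step_forward(a, b)
for _ in range(nsteps):            # integrate backward to the initial time
    a, b = step_backward(a, b), a

assert np.allclose(a, u0, atol=1e-8)   # initial condition recovered
```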

The results were excellent. Figure 2a shows the cost function for a case in which the first guess included errors of 50%, and the observations were exact. As in Wang et al. (1997), the observation field at the end of the assimilation interval was complete, and the cost function did not include a background term. Unless noted otherwise, the results described below are computed with the QIL. The 4D-Var minimization was performed using the LBFGS algorithm (Liu and Nocedal 1989), based on a limited-memory quasi-Newton method. It should be noted that the parameter controlling the directional derivative condition in the line search (GTOL) of the LBFGS algorithm has to be chosen appropriately for each problem; after some experimentation, we chose GTOL = 0.1, which we found to be optimal for the 4D-Var in our case. For an assimilation window of 71 time steps (Fig. 2a), the inverse 3D-Var converged to less than 10^{−12} of its initial value in four iterations, whereas the adjoint algorithm required 11 iterations (and 17 computations of the gradient, each involving a forward and a backward integration) in order to converge to 10^{−10}. For longer assimilation windows, the advantages of inverse 3D-Var became more apparent. For example, when the assimilation window was extended to 106 time steps (Fig. 2b), inverse 3D-Var converged to 10^{−10} in only five iterations. The 4D-Var, on the other hand, converged to the same value in 34 iterations and almost 80 computations of the gradient (each involving a forward and backward integration of the model or its adjoint). Smaller choices of GTOL resulted in a lack of convergence.

Figure 3 shows the performance of the two methods for the same case as in Fig. 2 but with an assimilation window of 101 time steps and different numbers of observations. The inverse 3D-Var converges to 10^{−12} of its original value after three iterations, but 4D-Var with data only at the end of the interval requires 44 equivalent model integrations to converge to 10^{−10}. If we provide 4D-Var with complete observations at every time step, it converges to 10^{−10} of the original cost function in 12 time integrations. Many other experiments, including observational and background errors, were performed with uniformly good results. Some of the conclusions from these experiments (S. K. Park 1998, personal communication) are:

- Results from inverse 3D-Var are very good in essentially every case. In general, for large Reynolds number (low dissipation), the QIL converges slightly faster than the exact IL. For small Reynolds number, the exact IL becomes unstable, but the QIL still converges fast.

- We also tested the results of having multiple time levels in the observations, including random observational errors with maximum amplitude of 10% of the total range. We followed the approach of Fig. 1; that is, we brought the observational increments (innovations) from the different observational times backward to the same initial time level. Averaging the simultaneous increments gives very good results, and improves the forecasts beyond the assimilation window roughly like *n*^{−½}, where *n* is the number of time levels in the observations. Additional iterations are performed by integrating the nonlinear model forward from the updated initial conditions, and integrating the observational increments backward. The results of the forecasts with one iteration of inverse 3D-Var were comparable to those of 20 iterations of 4D-Var, whereas three iterations of inverse 3D-Var resulted in much better forecasts (Fig. 4). It is important to note that in this experiment, the inverse 3D-Var approach in practice minimizes the same total cost function as the variational approach, even though it is only guaranteed to minimize one observation level at a time (Table 1).

### c. Experiments with Lorenz 3-variable model

Inverse 3D-Var was also tested and compared with regular (adjoint) 4D-Var using the Lorenz (1963) three-variable model. Figure 5 shows the evolution of the cost function using two different library minimization algorithms for the 4D-Var approach [limited-memory quasi-Newton or LBFGS, and Fletcher–Reeves conjugate gradient or FR_CG; see Navon and Legler (1987) for details] and the inverse 3D-Var. The results are similar to those obtained with Burgers’ equation: inverse 3D-Var reduces the cost function to 10^{−10} in three iterations and to 10^{−22} in five iterations (Fig. 5). The conjugate gradient and quasi-Newton methods converge to 10^{−14} in about 20 and 14 iterations, respectively (where each iteration includes several forward nonlinear and backward adjoint integrations). Figure 6 shows the cost function in the (*X*_{0}, *Y*_{0}) space and the descent paths of the three algorithms. The fact that inverse 3D-Var is equivalent to a Newton algorithm is apparent from the directness of its convergence: both the descent direction and the amplitude of the step are optimal. Several additional experiments with random errors in the initial conditions, multiple levels of observations, and observations of a subset of variables and their combinations have also given uniformly excellent results (J. Gao 1998, personal communication).
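The backbone of these experiments, backward integration of the Lorenz (1963) model, can be sketched as follows (step sizes and the initial state are illustrative): with a complete, perfect observation at the end of a short window, a single backward integration recovers the initial state.

```python
import numpy as np

def lorenz63(s, sigma=10.0, rho=28.0, beta=8.0/3.0):
    """Tendencies of the Lorenz (1963) three-variable model."""
    x, y, z = s
    return np.array([sigma*(y - x), x*(rho - z) - y, x*y - beta*z])

def rk4(s, dt, nsteps):
    """Standard fourth-order Runge-Kutta; a negative dt integrates backward."""
    for _ in range(nsteps):
        k1 = lorenz63(s)
        k2 = lorenz63(s + 0.5*dt*k1)
        k3 = lorenz63(s + 0.5*dt*k2)
        k4 = lorenz63(s + dt*k3)
        s = s + (dt/6.0)*(k1 + 2*k2 + 2*k3 + k4)
    return s

x0_true = np.array([1.0, 3.0, 25.0])
x_obs = rk4(x0_true, 0.001, 500)     # "observation": full state at t = 0.5
x0_rec = rk4(x_obs, -0.001, 500)     # backward integration to t = 0
assert np.allclose(x0_rec, x0_true, atol=1e-5)
```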

## 5. Discussion

In the following discussion we first consider the relationship between the traditional 6-h cycle for 3D-Var, 4D-Var, and inverse 3D-Var (Cohn 1997; Courtier 1997). In 3D-Var (or previously in optimal interpolation) the observations were lumped together within a ±3-h window, and were assumed to have been taken at the center of the interval. For example, an analysis at 1200 UTC included observations taken between 0900 UTC and 1500 UTC, but assumed to be observed at 1200 UTC. For observations made, for example, at 1000 UTC, this introduces two (relatively small) errors: the innovations (observations minus background) are computed with respect to a forecast at the wrong time, and the innovations are applied at the wrong time (1200 UTC instead of 1000 UTC). The first error can be easily corrected within 3D-Var: the background can be computed at the time of the observation, rather than at the center of the window. This correction is currently done at NCEP. However, only 4D-Var corrects the second error: 4D-Var (as implemented at ECMWF) computes the initial conditions valid at 0900 UTC that best fit the data at their correct times throughout the 0900–1500 UTC interval. It minimizes a cost function that includes the distance to the background at 0900 UTC, plus the distance to the observations at their correct times (binned into 1-h intervals). The 4D-Var “analysis” at 1200 UTC is defined as the 3-h forecast from the optimal initial conditions at 0900 UTC. Because the minimization required about 80 iterations before reaching a satisfactory level, ECMWF has used a T63 model for the minimization, whereas the forecast model has T213 resolution.

Inverse 3D-Var offers some additional flexibility: if observations are complete, it allows transporting all the innovations from 0900 to 1500 UTC to the desired time (1200 UTC) essentially exactly (Fig. 1). The innovations at 1200 UTC can then be analyzed with a 3D-Var that includes different background weights depending on the length of the forecast. In general, however, the observations are not complete, and a background error covariance needs to be introduced into the cost function. In that case, the inverse 3D-Var analysis at the end of the assimilation interval is equivalent to 3D-Var, but the analysis is reached through a model integration, which can be advantageous in reducing spinup problems. When knowledge of “future” observations is available (as in reanalysis), and the goal is to optimize the analysis (rather than to improve the forecast), the inverse 3D-Var can also be used, as suggested by the forecast sensitivity applications. In addition, it may be possible to use the inverse 3D-Var as a first iteration in the complete 4D-Var problem, thus acting as a kind of preconditioner.

We have seen that inverse 3D-Var has several potential advantages, including accuracy, efficiency, and flexibility, and these have been apparent in the simple model experiments. It also has some potentially serious disadvantages, but we believe they can be overcome with further development and experimentation:

Growth of noise that projects on decaying modes during the backward integration. This is a very serious problem, but it need not be a “show-stopper” since those errors will decay again during the next forward integration. The results of Pu et al. (1997a) for integrations 24 h and longer, and those obtained here with the Burgers’ equation and observational noise are quite encouraging in this respect. It should be noted that the results obtained by Reynolds and Palmer (1998) when studying this problem are in full agreement with those of Pu et al. (1998) and of the present paper. They found that analysis uncertainties grew during the backward integration, but that during the forward integration they decayed again. As a result the forecast error reduction achieved at the end of the interval using the quasi-inverse was equivalent to that derived using the pseudo-inverse method with 60–90 singular vectors, but it was obtained at a computational cost several orders of magnitude smaller.

Physical processes are generally not parameterized in a reversible form in atmospheric models. This is also serious, but to some extent it can be overcome. For example, moist convective processes can be simplified and parameterized in a reversible manner through the first hour of model integration (Jerry Straka 1998, personal communication). We are testing this idea with the ARPS model, where we plan to use a reversible parameterization of convection to “phase correct” the background field when, for example, the model predicts a squall line shifted in space and time relative to the observations.

The basic hydrodynamics of a model may not be written in a reversible fashion. If the numerical discretization of the hydrodynamics is excessively dissipative, some rewriting may be required, as discussed in section 4. Slightly dissipative schemes, when integrated backward, also fall within the “QIL” (quasi-inverse linear) approach: they dissipate both forward and backward.
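As a sketch of why a slightly dissipative scheme can dissipate in both directions, consider a one-dimensional periodic advection-diffusion model in which the backward (quasi-inverse) step reverses the sign of advection but leaves the sign of the diffusion term unchanged; all parameter values below are illustrative, not taken from the paper's experiments:

```python
import numpy as np

# Minimal sketch of the quasi-inverse linear (QIL) idea on a 1-D periodic
# advection-diffusion model: the backward step reverses the advection
# (time reversal) but keeps the diffusion coefficient positive, so the
# scheme damps the solution both forward and backward instead of blowing up.
n, dx, dt = 64, 1.0, 0.1
c, nu = 1.0, 0.5  # advection speed, diffusion coefficient (illustrative)

def step(u, direction):
    """One explicit step; direction = +1 forward, -1 quasi-inverse backward."""
    adv = -direction * c * (np.roll(u, -1) - np.roll(u, 1)) / (2 * dx)
    dif = nu * (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx**2  # sign unchanged
    return u + dt * (adv + dif)

x = np.arange(n) * dx
u0 = np.sin(2 * np.pi * x / (n * dx))  # smooth initial wave
u_b = step(u0, -1)                     # quasi-inverse step: amplitude shrinks
print(np.abs(u_b).max() < np.abs(u0).max())  # True: dissipates backward too
```

Reversing the diffusion sign instead (the exact inverse) would amplify small-scale noise without bound, which is precisely what the QIL choice avoids.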

The truly dissipative processes in the atmospheric model are not reversible, or may be reversible only over short intervals. If dissipation is a major factor during an assimilation window, it will not be well represented by inverse 3D-Var. On the other hand, both our results and those of Reynolds and Palmer (1998) suggest that in most cases the quasi-inverse approximation slightly improves the results.

Finally, it remains to be demonstrated with more complex systems that, in nonlinear integrations, this method provides an improvement over what can be attained with 3D-Var alone.

We are currently planning to test the inverse 3D-Var approach with the ARPS model, combining it with a 3D-Var analysis that includes Doppler radar observations of radial velocity and reflectivity, using a tangent linear model with simplified reversible physics. If successful, we will also attempt to apply the method to the NCEP global model, for example in the second phase of the global Reanalysis project (Kalnay et al. 1996), in which “future” data are available during the assimilation.

## Acknowledgments

This work was started at NCEP’s Environmental Modeling Center. We would like to express our special gratitude to David Zhi Wang and his coauthors for introducing similar ideas on the acceleration of 4D-Var, and to Jerry Straka for developing a reversible parameterization of convection. Discussions with Kelvin Droegemeier, Keith Brewster, Alan Shapiro, and Brian Fiedler at the University of Oklahoma; Mark Iredell, Wan-Shu Wu, Zoltan Toth, M. Zupanski, and D. Zupanski at NCEP; R. Menard and S. Cohn at NASA DAO; and Carolyn Reynolds and Tim Palmer were also helpful. We are especially grateful to Olivier Talagrand, Andrew Lorenc, Francois Bouttier, Istvan Szunyogh, Jim Purser, John Derber, I. Michael Navon, and David Parrish for their insightful comments on an early version of the paper.

The support and encouragement of Peter Lamb (CIMMS), Kelvin Droegemeier (CAPS), and Fred Carr (School of Meteorology), and Ron McPherson and Stephen Lord (NCEP) are gratefully acknowledged. This research was supported by DOC NOAA under Grant NA67J0160 through CIMMS and NSF under Grant ATM91-20009 through CAPS.

## REFERENCES

Anderson, D. A., J. C. Tannehill, and R. H. Pletcher, 1984: *Computational Fluid Mechanics and Heat Transfer.* Hemisphere Publishing, 599 pp.

Bouttier, F., and F. Rabier, 1997: The operational implementation of 4D-Var. *ECMWF Newsl.,* **78,** 2–5.

Cohn, S. E., 1997: An introduction to estimation theory. *J. Meteor. Soc. Japan,* **75,** 257–288.

Courtier, P., 1997: Variational methods. *J. Meteor. Soc. Japan,* **75,** 211–218.

——, J.-N. Thepaut, and A. Hollingsworth, 1994: A strategy for operational implementation of 4D-Var using an incremental approach. *Quart. J. Roy. Meteor. Soc.,* **120,** 1367–1388.

Daley, R., 1991: *Atmospheric Data Analysis.* Cambridge University Press, 457 pp.

Derber, J. C., 1989: A variational continuous assimilation technique. *Mon. Wea. Rev.,* **117,** 2437–2446.

Gill, P. E., W. Murray, and M. H. Wright, 1981: *Practical Optimization.* Academic Press, 401 pp.

Ide, K., P. Courtier, M. Ghil, and A. Lorenc, 1997: Unified notation for data assimilation: Operational, sequential and variational. *J. Meteor. Soc. Japan,* **75,** 181–197.

Kalnay, E., and Z.-X. Pu, 1998: Application of the quasi-inverse method to accelerate 4D-Var. Preprints, *12th Conf. on Numerical Weather Prediction,* Phoenix, AZ, Amer. Meteor. Soc., 41–42.

——, and Coauthors, 1996: The NCEP/NCAR 40-Year Reanalysis Project. *Bull. Amer. Meteor. Soc.,* **77,** 437–471.

Le Dimet, F.-X., and O. Talagrand, 1986: Variational algorithms for analysis and assimilation of meteorological observations: Theoretical aspects. *Tellus,* **38A,** 97–110.

Lewis, J. M., and J. C. Derber, 1985: The use of adjoint equations to solve a variational adjustment problem with advective constraints. *Tellus,* **37A,** 309–322.

Liu, D. C., and J. Nocedal, 1989: On the limited memory BFGS method for large scale minimization. *Math. Program.,* **45,** 503–528.

Lorenc, A. C., 1986: Analysis methods for numerical weather prediction. *Quart. J. Roy. Meteor. Soc.,* **112,** 1177–1194.

——, 1988: A practical approximation to optimal four-dimensional objective analysis. *Mon. Wea. Rev.,* **116,** 730–745.

Lorenz, E. N., 1963: Deterministic nonperiodic flow. *J. Atmos. Sci.,* **20,** 130–141.

Menard, R., and R. Daley, 1996: The application of Kalman smoother theory to the estimation of 4DVAR error statistics. *Tellus,* **48A,** 221–237.

Molteni, F., R. Buizza, T. N. Palmer, and T. Petroliagis, 1996: The ECMWF Ensemble Prediction System: Methodology and validation. *Quart. J. Roy. Meteor. Soc.,* **122,** 73–119.

Pires, C., R. Vautard, and O. Talagrand, 1996: On extending the limits of variational assimilation in nonlinear chaotic systems. *Tellus,* **48A,** 96–121.

Pu, Z.-X., E. Kalnay, J. Sela, and I. Szunyogh, 1997a: Sensitivity of forecast errors to initial conditions with a quasi-inverse linear model. *Mon. Wea. Rev.,* **125,** 2479–2503.

——, ——, J. Derber, and J. Sela, 1997b: An inexpensive technique for using past forecast errors to improve future forecast skill. *Quart. J. Roy. Meteor. Soc.,* **123,** 1035–1054.

——, S. J. Lord, and E. Kalnay, 1998: Forecast sensitivity with dropwindsonde data and targeted observations. *Tellus,* **50A,** 391–410.

Rabier, F., E. Klinker, P. Courtier, and A. Hollingsworth, 1996: Sensitivity of forecast errors to initial conditions. *Quart. J. Roy. Meteor. Soc.,* **122,** 121–150.

——, and Coauthors, 1997: The ECMWF operational implementation of four-dimensional variational assimilation. ECMWF Research Department Tech. Memo. 240, 62 pp. [Available from ECMWF, Shinfield Park, Reading RG2 9AX, United Kingdom.]

Reynolds, C., and T. N. Palmer, 1998: Decaying singular vectors and their impact on analysis and forecast corrections. *J. Atmos. Sci.,* **55,** 3005–3023.

Rohaly, G. D., R. H. Langland, and R. Gelaro, 1998: Identifying regions where the forecast of tropical cyclone tracks is most sensitive to initial condition uncertainty using adjoint methods. Preprints, *12th Conf. on Numerical Weather Prediction,* Phoenix, AZ, Amer. Meteor. Soc., 337–340.

Thepaut, J.-N., R. N. Hoffman, and P. Courtier, 1993: Interactions of dynamics and observations in a 4-dimensional variational assimilation. *Mon. Wea. Rev.,* **121,** 3393–3414.

Wang, Z., I. M. Navon, X. Zou, and F. X. Le Dimet, 1995: A truncated Newton optimization algorithm in meteorology applications with analytic Hessian/vector products. *Comput. Optim. Appl.,* **4,** 241–262.

——, K. K. Droegemeier, L. White, and I. M. Navon, 1997: Application of a new adjoint Newton algorithm to the 3D ARPS storm-scale model using simulated data. *Mon. Wea. Rev.,* **125,** 2460–2478.

Xue, M., K. K. Droegemeier, V. Wong, A. Shapiro, and K. Brewster, 1995: ARPS version 4.0 users’ guide. Center for Analysis and Prediction of Storms, University of Oklahoma, 380 pp. [Available from Center for Analysis and Prediction of Storms, University of Oklahoma, Norman, OK 73019.]

Zupanski, D., 1997: A general weak constraint applicable to operational 4DVAR data assimilation systems. *Mon. Wea. Rev.,* **125,** 2274–2292.

Zupanski, M., 1993: Regional four-dimensional variational data assimilation in a quasi-operational forecasting environment. *Mon. Wea. Rev.,* **121,** 2396–2408.

Figure caption: Variations in the cost function computed from different initial conditions, generated by the same procedure as in Fig. 4, except that each variational assimilation (adjoint and ensemble inverse) was stopped after 1–5 iterations.